The recent media storm generated by OpenAI’s release of a technical paper on the GPT-2 model (and their controversial decision to withhold publication of the full model, citing fears of potential misuse) will not have escaped most people’s notice. Headlines such as “Could AI replace human writers?” (Financial Times, 22 March 2019) and “New AI fake text generator too dangerous to release” (The Guardian, 14 February 2019) may seem somewhat sensationalist, but a look under the bonnet of GPT-2 quickly reveals why people are so excited. While GPT-2 has not been fully open sourced, it is the second Transformer-based model to be published in recent months, following Google’s release of the fully open source BERT model in November 2018. For those with the required knowledge base in deep learning methodologies, this marks the beginning of an era of rapid advancement in Natural Language Processing (NLP). OpenAI’s and Google’s recent publications will undoubtedly have a significant impact on how we interact with IT systems and, perhaps more importantly, how those systems interact with us over the coming 3-5 years.
What makes Transformer-based models different?
These two Transformer-based models have captured headlines not only because they are open source, but also because of their unprecedented performance. And rightfully so. The significant difference between Transformer-based models and their predecessors is that they employ multi-headed self-attention mechanisms, which let the models relate every word in a sequence to every other word in a single step, retaining context without the sequential processing (and the heavy compute cost) that recurrent architectures require. Google AI’s paper “Attention Is All You Need”, published in 2017, first introduced the Transformer architecture and explains how this novel kind of neural network uses multi-headed attention to map the complex relationships between every word in a given text. Multi-headed attention is what makes these models particularly well suited to language processing (rather than speech or image recognition, for example). The image below demonstrates the evolution of word contextualisation in NLP models: first we had shallowly bidirectional models such as ELMo, then unidirectional models such as the original GPT, and more recently we have seen the evolution of deeply bidirectional word contextualisation. GPT-2 uses the same process for word contextualisation as the original GPT model but significantly scales up the model size and training data, with impressive results!
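To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention with multiple heads, the core operation described above. The dimensions, random projections and toy embeddings are illustrative stand-ins, not taken from any of the models discussed; in a real Transformer the projection matrices are learned and the attention output is followed by further layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output position is a weighted mix of all value vectors,
    # so every word can attend to every other word in one step.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq) pairwise relevance
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 4 "words", model dimension 8, 2 heads of size 4.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))  # stand-in word embeddings

# One set of query/key/value projections per head (learned in practice).
heads = []
for _ in range(n_heads):
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
    heads.append(out)

# Concatenating the heads restores the model dimension.
multi_head_output = np.concatenate(heads, axis=-1)
print(multi_head_output.shape)
```

Running several heads in parallel is what lets the model capture different kinds of relationship (syntactic, referential, topical) between the same pair of words at once.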
Extracting value from open source AI
The public release of these two Transformer-based models has caused a surge of interest in how NLP models of this calibre can be put to use in business. However, while both models have significantly outstripped any previous performance metrics, they offer limited business value until they have been adapted, fine-tuned and deployed for a particular business use-case. The truly exciting thing about this new era of open source AI is that if you have the expertise to train these models, you can uncover a whole new layer of effortlessly generated business intelligence which would otherwise remain out of reach.
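The adaptation step mentioned above usually means keeping a pretrained encoder frozen (or nearly so) and training a small task-specific head on labelled business data. The sketch below illustrates that idea only: the "encoder" is a fixed random projection standing in for a pretrained Transformer, and the task is synthetic; a real pipeline would use a framework such as PyTorch or TensorFlow with an actual pretrained model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen, pretrained encoder: a fixed non-linear projection
# from raw inputs to feature vectors. In a real system this would be a
# Transformer (e.g. BERT) producing contextual embeddings.
W_frozen = 0.3 * rng.normal(size=(10, 8))
def encode(x):
    return np.tanh(x @ W_frozen)

# A tiny synthetic binary-classification task standing in for a business
# use-case such as labelling inbound emails.
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)

# "Fine-tuning": the encoder stays frozen; only a small logistic-regression
# head is trained on top of its features with plain gradient descent.
F = encode(X)
w, b = np.zeros(F.shape[1]), 0.0
lr = 0.5
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))   # predicted probabilities
    grad = (p - y) / len(y)              # gradient of the log-loss
    w -= lr * F.T @ grad
    b -= lr * grad.sum()

accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because only the small head is trained, this kind of adaptation needs far less labelled data and compute than training a language model from scratch, which is what makes open source pretrained models commercially attractive.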
FeedStock’s deep learning models
We embrace the challenge of staying one step ahead of the market and assisting our clients in their digital transformation. FeedStock’s NLP models have been trained on hundreds of thousands of expertly annotated data points and a vast bank of unannotated data, with the result that they can classify email content into as many as 30 highly nuanced categories with accuracy ratings as high as 97.8%. This allows our clients to identify revenue-generating insights from their inbound and outbound communication flows. The models can pick out specific companies, sectors or products from email and chat communication (according to our clients’ requirements) to give an overview of trending topics within different communication types, such as client requests. Over the past three years, FeedStock’s data science team has trained and fine-tuned its models so they are applicable across a wide variety of business sectors, and we are continuously performing R&D to further develop their capabilities, such as classifying communication in over 100 languages. As we move forward into the era of open source AI and machine-generated insights, one thing is clear: the potential to streamline business operations and optimise sales performance is limitless for those who deploy deep learning technologies.