From the course: Next Generation AI: An Intro to GPT-3

What is GPT-3?

- GPT-3 stands for Generative Pre-trained Transformer, version three. A name like that was obviously created by people who know nothing about marketing. It's certainly a mouthful. Joking aside, GPT-3 uses advanced algorithms, that is, sets of rules for problem-solving that computers follow, along with lots of data to create original, human-like text and other output. Sophisticated natural language processing (NLP) produces natural language generation (NLG). The quality of the output GPT-3 produces is so good that it is hard, even impossible some would say, to discern it from human-generated text. GPT-3 can create meaningful stories, poems, emails, chatbot responses, even software code, with just a few prompts from a human. For example, take a look at how GPT-3, given some training data, was able to change legal jargon into plain English. I'm sure you'll agree that this is helpful. I know I often struggle with reading legal contracts. Later on, I will provide many compelling examples.

GPT-3 is based on a generative model of language. Generative models use existing knowledge of language to predict which words may come next, based on a series of previous words. This is where we get the word generative in Generative Pre-trained Transformer, or GPT. As a simple example, if you type "once upon a," the model will predict "time" as the next word, based on its analysis of a large corpus of content.

The knowledge in this case, what we call training data, is acquired from a variety of sources, including Wikipedia and Common Crawl, a free dataset of about a trillion words derived from crawling the internet. To help make the system smart, so to speak, text is randomly removed from the acquired content and the software is trained to fill in the correct missing words. This is where we get the word pre-trained in Generative Pre-trained Transformer.

Powering all of this is a type of AI called deep learning, which is based on 70 years of research in neural networks. These networks power the AI learning process by consuming training data. For example, for an AI-powered recognition system to learn what a bicycle is and identify it in a picture, it must analyze large volumes of existing bicycle pictures. It's called a neural network because it loosely mimics the function of the brain. The network consists of a web of connected nodes, or computational units. The type of neural network used in GPT-3 is called a transformer. It's particularly good at taking text and reusing it in another context or word sequence while maintaining meaning. This is where we get the word transformer in Generative Pre-trained Transformer. In a neural network, data moves through nodes based on certain criteria. One of these criteria is the weight, or strength, of the connections between nodes, a concept I'll return to shortly. If the criteria are not met, the data does not get processed. If they are, it moves on to another node. This repeats until the data arrives, transformed, at the end of the neural network.

What makes GPT-3 a game changer relative to its two prior versions is the number of model parameters. GPT-2 had 1.5 billion. GPT-3 has 175 billion. It is currently 10 times the size of the next largest model, Turing NLG, developed by Microsoft. In other words, it's really, really big. But what are these parameters? A model parameter is a variable whose value can be estimated from data. The data I'm referring to here is the training data that is fed to GPT-3. In the process of learning from that data, the algorithm stores these values in the model.
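To make the idea of a generative, next-word-predicting model concrete, here is a minimal toy sketch in Python. It is my own illustration, not GPT-3's actual mechanism: it simply counts, in a tiny hand-made corpus, which word most often follows a given three-word context, and those stored counts stand in for the role that the 175 billion learned parameters play in GPT-3.

```python
# Toy next-word predictor: count which word most often follows each
# three-word context in a tiny corpus, then predict the likeliest one.
# (An n-gram counter, used only to illustrate the prediction objective;
# GPT-3 itself is a transformer neural network trained on far more text.)
from collections import Counter, defaultdict

corpus = (
    "once upon a time there was a princess . "
    "once upon a time there lived a king . "
    "once upon a midnight dreary i pondered ."
).split()

# "Training": store how often each word follows each three-word context.
# These stored counts are the toy stand-in for model parameters.
follow_counts = defaultdict(Counter)
for i in range(len(corpus) - 3):
    context = tuple(corpus[i:i + 3])
    follow_counts[context][corpus[i + 3]] += 1

def predict_next(context_words):
    """Return the most frequent next word seen after this context."""
    counts = follow_counts[tuple(context_words)]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next(["once", "upon", "a"]))  # -> "time"
```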
Each parameter helps increase the probability that a text prediction is accurate or relevant. Intuitively, the more parameters that exist, the higher the likelihood of quality predictions. In GPT-3, a sentence beginning "once upon a" has a close-to-guaranteed probability of the correct completion.

Next, I want to talk briefly about few-shot learning. One test of AI capability is the degree to which humans must provide it with answers in advance to get a correct outcome. It is generally acknowledged that the fewer examples an AI needs in order to predict accurately, the better the quality of the AI solution. For example, let's say we want to use AI to translate from English to French. We enter the word cheese and wait for the AI to translate it. Since the algorithm doesn't know what language to translate it into, it's not going to be able to do anything. This is called zero-shot learning. In other words, we provided no guidance. However, if we instead feed the AI "friend equals ami" and then query what cheese equals, now it has something to work with. This is one-shot learning. GPT-3 looks for a pattern using its vast language model, and it may get the right answer. Finally, if we give GPT-3 more examples, we're into few-shot learning: house equals maison, friend equals ami, rain equals pluie. Query what cheese equals, and the likelihood of GPT-3 getting the right answer increases greatly. Fromage, of course. Pretty cool, right? Depending on what GPT-3 is being tasked with, it has been shown to provide good results in each of zero-, one-, and few-shot learning.

Now, finally, unlike its predecessors, GPT-3 is not being made freely available. While the public API will be available to use under certain conditions, in late 2020 Microsoft assumed control of the source code. It will be fascinating to see how Microsoft incorporates GPT-3 capabilities into its products such as Word, Excel, and Bing, and how it provides it as a service on its Azure platform. GPT-3 is certainly not without some significant weaknesses and challenges; I'll explore those later. As is the case with all innovation and scientific breakthroughs, Microsoft and others will build on the groundbreaking new baseline of GPT-3 to create even more powerful AI in the years ahead.
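To show how the zero-, one-, and few-shot translation prompts described above differ in practice, here is a minimal sketch assuming the GPT-3-era OpenAI Python library and its legacy Completion endpoint. The "davinci" engine name, the arrow prompt format, and the parameter values are illustrative choices, not anything prescribed in the course.

```python
# Zero-, one-, and few-shot prompts for the English-to-French example.
# Assumes the legacy OpenAI Python client (openai < 1.0) with a GPT-3
# Completion endpoint; engine name and settings are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

ZERO_SHOT = "cheese =>"                      # no guidance at all
ONE_SHOT = "friend => ami\ncheese =>"        # one worked example
FEW_SHOT = (                                 # several worked examples
    "house => maison\n"
    "friend => ami\n"
    "rain => pluie\n"
    "cheese =>"
)

def complete(prompt):
    """Ask the model to continue the prompt with a few tokens."""
    response = openai.Completion.create(
        engine="davinci",   # a base GPT-3 model of that era
        prompt=prompt,
        max_tokens=5,
        temperature=0,      # keep the output as deterministic as possible
        stop="\n",          # stop at the end of the completed line
    )
    return response.choices[0].text.strip()

for name, prompt in [("zero-shot", ZERO_SHOT),
                     ("one-shot", ONE_SHOT),
                     ("few-shot", FEW_SHOT)]:
    print(name, "->", complete(prompt))
# The few-shot prompt is the one most likely to come back with "fromage".
```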
