From the course: Next Generation AI: An Intro to GPT-3

Brief background on natural language processing (NLP)


- One of the core features of GPT-3 is its ability to ingest large amounts of existing content and use it to create original output that emulates human-written text. The quality of that output is so high that it's often difficult to distinguish from something written by a person. This is not the only feature of GPT-3, as we'll see later, but it is its central feature. To enable and support this high-quality output, a number of AI techniques are employed. In particular, GPT-3 utilizes an approach at the intersection of AI, computer science, and human language called natural language processing, or NLP. Understanding the basics of NLP can help you further contextualize and appreciate the breakthroughs in GPT-3.

Let's begin with an example of NLP in action. Many of us now issue voice commands to digital devices, such as the popular virtual assistants Amazon Alexa and Google Home. We also make all manner of requests of our smartphones and laptops by voice. These devices listen to our words and try to make sense of them: they derive meaning, process the input, and then execute some action, such as providing directions or playing your favorite album. This is done through NLP. The same technology that makes sense of voice commands can also be applied to text input on a computer. This enables functions such as chatbots, search engines, language translation, spellcheck, and sentiment analysis, a process that classifies the emotion in subjective language. This is NLP hard at work in the background. It's complex, state-of-the-art software.

At a high level, NLP generally works like this. The software first needs to pre-process the text and sentences to provide some form of structure that can be used as the basis for interpretation. One or more of the following steps can be used (a short code sketch of these steps appears below). First, the sentence is broken into its individual words. This is called tokenization, and the individual words are known as tokens. Unnecessary punctuation is removed. Second, words can be identified and tagged as nouns, verbs, adjectives, pronouns, et cetera. Third, stemming is applied. This is where words are standardized and put in context by reducing them to their root form. For example, the words banks, banker, and banking are all associated with the root word bank. This is the stem. That root word is also used to assign the context for all the others: a financial institution, not the act of turning an aircraft.

Once the text has been pre-processed, a machine learning, or ML, algorithm is used to interpret the data. These algorithms use statistical models, built from vast volumes of example data called training data, to suggest what action to perform. When the pre-processed text is analyzed by the ML algorithm, it looks for words, phrases, and patterns of text that are familiar from the training data. If there is a high probability that the words and context are understood, the NLP system now knows the topic of the text (a small classification sketch also appears below).

Finally, NLP can be used for natural language generation, or NLG. This process analyzes and pre-processes text, interprets it, and then uses insights from the input to generate new content (see the toy generation sketch at the end of this section). We'll see this capability applied in GPT-3.
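
To make the pre-processing steps concrete, here is a minimal sketch of tokenization, part-of-speech tagging, and stemming. The course does not name a specific tool; this sketch uses the NLTK library as one common choice, and the example sentence is invented for illustration.

```python
# A minimal sketch of the three pre-processing steps described above,
# using the NLTK library (one common choice; the course names no tool).
import nltk
from nltk.stem import PorterStemmer

# One-time downloads of NLTK's tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The banks were banking on the banker."

# 1. Tokenization: split the sentence into individual words (tokens),
#    then drop tokens that are only punctuation.
tokens = [t for t in nltk.word_tokenize(sentence) if t.isalpha()]

# 2. Part-of-speech tagging: label each token as a noun, verb, etc.
tagged = nltk.pos_tag(tokens)

# 3. Stemming: reduce words to a root form, so "banks" and "banking"
#    both become "bank". (Simple stemmers are imperfect: the Porter
#    stemmer leaves "banker" unchanged, for example.)
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

print(tagged)
print(stems)
```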
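
The interpretation step can be illustrated with a tiny statistical text classifier trained on labeled example data. This sketch uses scikit-learn's Naive Bayes model with invented training sentences; it shows the general idea of learning topics from training data, not the far larger neural network GPT-3 actually uses.

```python
# A minimal sketch of interpretation: a statistical model trained on
# labeled examples (training data) assigns a topic to new text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: example sentences paired with topic labels.
training_texts = [
    "deposit money at the bank and check your account balance",
    "the bank approved the loan and set the interest rate",
    "the pilot banked the aircraft into a steep turn",
    "the plane banked left during the final approach",
]
training_labels = ["finance", "finance", "aviation", "aviation"]

# Bag-of-words features plus Naive Bayes: a classic statistical NLP model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(training_texts, training_labels)

# The model estimates how probable each topic is for new, unseen text.
new_text = ["she opened a savings account at the bank"]
print(model.predict(new_text))        # -> ['finance']
print(model.predict_proba(new_text))  # probability per topic
```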
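
Finally, here is a toy sketch of natural language generation: it learns word-to-word patterns from input text and uses them to produce new text. This simple Markov chain only illustrates the concept; GPT-3 generates text with a vastly more powerful neural language model.

```python
# A toy illustration of NLG: learn which words follow which in the
# input, then generate new text from those observed patterns.
import random
from collections import defaultdict

source_text = (
    "the bank approved the loan and the bank set the rate "
    "and the customer thanked the bank for the loan"
)

# "Training": record every word that follows each word in the input.
words = source_text.split()
transitions = defaultdict(list)
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

# "Generation": starting from a seed word, repeatedly pick a plausible
# next word based on the patterns seen in the training text.
random.seed(3)
word = "the"
output = [word]
for _ in range(12):
    followers = transitions.get(word)
    if not followers:
        break
    word = random.choice(followers)
    output.append(word)

print(" ".join(output))
```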
