Hey, today is #MindblowingMonday 🤯!
I want to tell you about Language Models, a type of machine learning technique behind most of the recent hype in natural language processing.
❓ Want to know more about them?
🧵👇
A language model is a computational representation of human language that captures which sentences are more likely to appear in a given language.
🎩 Formally, a language model is a probability distribution over the sentences in a language.
❓ What are they used for?
👇
They come in many flavors.
The simplest language model is the *unigram model*, also called a *bag of words* (BOW).
👉 In BOW, each word is assigned a probability P_i, and the probability of a sentence is computed assuming all words are independent.
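A minimal sketch of this, with a made-up toy corpus, could look like:

```python
from collections import Counter

# Toy bag-of-words language model (the corpus here is hypothetical).
corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)
total = sum(counts.values())

def word_prob(w):
    # P_i: relative frequency of word i in the corpus
    return counts[w] / total

def sentence_prob(sentence):
    # Independence assumption: P(sentence) = product of word probabilities
    p = 1.0
    for w in sentence.split():
        p *= word_prob(w)
    return p

print(sentence_prob("the cat sat"))  # (3/9) * (2/9) * (1/9) ≈ 0.0082
```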
But of course, this isn't true.
For example, "water" is a more commonly used word than "philosophy", but the phrase "philosophy is the mother of science" is arguably much more likely than the phrase "water is the mother of science".
💡 The likelihood of a phrase depends upon all its words.
This dependency can be modelled with an n-gram model, in which the likelihood of a word is computed with respect to the words that precede it in a given phrase (in a window of size n).
💡 If we start a phrase with "philosophy", it is more likely to see the word "science" than "shark".
If you want to capture phrases of length n=10, you need N^10 numbers, where N is the number of words in the language!
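For n=2, the idea can be sketched with simple counts (again, a hypothetical toy corpus):

```python
from collections import Counter

# Toy bigram (n = 2) model: P(word | previous word) from raw counts.
# Real models need smoothing for unseen pairs; omitted here for brevity.
corpus = "philosophy is the mother of science . water is wet .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def next_word_prob(prev, word):
    # Conditional probability of `word` given the previous word
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(sentence):
    words = sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= next_word_prob(prev, word)
    return p

# "is" is followed by "the" in 1 of its 2 occurrences:
print(next_word_prob("is", "the"))  # 0.5
```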
Neural language models tackle this by jointly learning a vectorial representation for all words (aka an embedding) and some mathematical operation among them that approximates the likelihood.
The most popular neural language model is possibly *word2vec*, trained to predict a word given a small window around it.
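The core of word2vec's skip-gram training can be sketched in plain NumPy; the corpus, dimensions, and learning rate below are all made up for illustration:

```python
import numpy as np

# Toy skip-gram sketch: learn one vector per word such that a word's
# vector predicts its neighbors (hypothetical corpus and hyperparameters).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, D))   # input embeddings (one row per word)
W_out = rng.normal(0, 0.1, (D, V))  # output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# (center, context) training pairs within a window of size 1
pairs = [(idx[corpus[i]], idx[corpus[j]])
         for i in range(len(corpus))
         for j in (i - 1, i + 1)
         if 0 <= j < len(corpus)]

lr = 0.1
for _ in range(100):  # plain SGD over all pairs
    for c, o in pairs:
        h = W_in[c]                 # embedding of the center word
        p = softmax(h @ W_out)      # predicted context distribution
        grad = p.copy()
        grad[o] -= 1.0              # gradient of the cross-entropy loss
        W_in[c] -= lr * (W_out @ grad)
        W_out -= lr * np.outer(h, grad)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words used in similar contexts ("cat"/"dog") drift toward similar vectors
print(cosine(W_in[idx["cat"]], W_in[idx["dog"]]))
```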
Popular examples are BERT and the family of GPT models, of which GPT-3 recently took Twitter by surprise with its ability to speak nonstop about anything, often without much sense.
These models also inherit the biases of their training data. For example, the phrase "boy is a programmer" is considered more likely by a model than "girl is a programmer", simply because the Internet has more examples of the first phrase.
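Even the simple bag-of-words model from before reproduces this effect; here is a sketch with deliberately skewed, made-up training data:

```python
from collections import Counter

# Hypothetical skewed corpus: one phrasing simply occurs more often.
sentences = ["boy is a programmer"] * 3 + ["girl is a programmer"] * 1
words = " ".join(sentences).split()
counts = Counter(words)
total = sum(counts.values())

def bow_prob(sentence):
    # Unigram model: product of relative word frequencies
    p = 1.0
    for w in sentence.split():
        p *= counts[w] / total
    return p

# The model prefers whichever phrasing was more frequent in its data
print(bow_prob("boy is a programmer") > bow_prob("girl is a programmer"))  # True
```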
As usual, if you like this topic, reply in this thread or @ me at any time. Feel free to ❤️ like and 🔁 retweet if you think someone else could benefit from knowing this stuff.
🧵 Read this thread online at https://apiad.net/tweetstorms/mindblowingmonday-languagemodels
Stay curious 🖖
- 📃 https://en.wikipedia.org/wiki/Language_model
- 🗞️ https://arxiv.org/abs/2005.14165
- 💻 https://github.com/huggingface/transformers
- 🎥 https://youtu.be/89A4jGvaaKk
- 🎥 https://youtu.be/_x9AwxfjxvE