Got my invite to the @OpenAI GPT-3 API from @gdb. I actually think it deserves more hype than it’s getting, but not necessarily for the magical reasons Twitter touts. Why? My quick thoughts and impressions: (1/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
First, let me summarize the API documentation. A user has access to 4 models of varying sizes. They cannot fine-tune the models. There’s basically only one function: the user inputs some text (“priming”) and the model predicts the next several tokens (roughly, words or word pieces). (2/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
When I first played with the API, I was stumped. Why couldn’t I replicate the stellar Twitter demos? Was everyone just sharing cherry-picked text samples? I wanted the model to identify basic patterns in some unstructured data, but it gave garbage when I input just the data. (3/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                         @notsleepingturk suggested I reformat my inputs as tuples of (unstructured data, example pattern, indicator if the pattern exists). The model could then easily “autocomplete” tuples with missing indicators. Damn. Priming is obviously an art. (4/11)
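To make the tuple idea concrete, here is a minimal sketch of how such a prompt could be assembled. The example rows, the "contains a date" pattern, and the `build_prompt` helper are all made up for illustration; the exact format that was suggested isn't shown in the thread, and no API call is made here.

```python
from typing import List, Optional, Tuple

# Each example is (unstructured data, candidate pattern, indicator).
# An indicator of None marks the tuple we want the model to complete.
# All names and data below are hypothetical, for illustration only.
Example = Tuple[str, str, Optional[bool]]


def build_prompt(examples: List[Example]) -> str:
    """Render examples as text tuples, leaving a missing indicator open."""
    lines = []
    for data, pattern, indicator in examples:
        prefix = f"({data!r}, {pattern!r}, "
        if indicator is None:
            # Left unfinished so the model can "autocomplete" the indicator.
            lines.append(prefix)
        else:
            lines.append(f"{prefix}{indicator})")
    return "\n".join(lines)


examples = [
    ("order #4821 shipped 2020-07-14", "contains a date", True),
    ("call me back when you can", "contains a date", False),
    ("invoice due 2020-08-01", "contains a date", None),
]
prompt = build_prompt(examples)
print(prompt)
```

The resulting string would be sent as the priming text; the model’s next few tokens are then read as the missing indicator.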
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
So why is GPT-3 so hype? It’s amazingly powerful *if* you know how to prime the model well. It’s going to change the ML paradigm: instead of constructing giant training sets for models, we’ll be crafting a few examples for models to do “few-shot” extrapolation from. (5/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
@sharifshameem cracked the skill of priming in his demos. We don’t see what he prepends the demo input with before sending it to the API. Figuring out how to prime models properly will be the key to successfully utilizing language models in the future. (6/11) https://twitter.com/sharifshameem/status/1282676454690451457
                        
                            
                            
                            
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
Let’s deep-dive into how, in theory, one can be a ⭐️ primer. The model’s goal is to maximize the log-likelihood of successive tokens given the primed input. In English, this ends up being: find similar patterns in the training set, and output similar successive tokens. (7/11)
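As a rough sketch of that objective in symbols (the notation is my own, not from the API docs): write the primed input as tokens $x_1, \dots, x_m$ and the continuation as $y_1, \dots, y_n$, with $\theta$ standing for the model’s parameters. The log-likelihood of a continuation factorizes token by token:

```latex
\log p_\theta(y_1, \dots, y_n \mid x_1, \dots, x_m)
  = \sum_{t=1}^{n} \log p_\theta\left(y_t \mid x_1, \dots, x_m,\, y_1, \dots, y_{t-1}\right)
```

Training pushes this quantity up on the training text, so at generation time every token is chosen conditioned on the priming text plus whatever has been generated so far. That is why changing the priming changes which continuations look likely.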
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        GPT-3 is a great example of the “garbage-in-garbage-out” principle. If you prime poorly, you get shitty results. But since the models are probably trained on basically every piece of data on the Internet, chances are if you prime well, you’ll get intelligent outputs. (8/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        I like to think of these language models as “children with infinite memory.” Children’s skills are not all that refined, but they have basic pattern-matching skills. Coupled with a superpower to memorize the entire world, well, couldn’t they be extremely useful? (9/11)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
What else is so hype? The API’s best model is 350 GB. Serving this monstrosity efficiently and cheaply is an entirely new software problem for the industry. If @OpenAI cracks this, they can become the AWS of modeling. (10/11)
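One way to arrive at a figure like 350 GB, as a back-of-the-envelope sketch: assume the largest model is the 175-billion-parameter GPT-3 and that each parameter is stored as a 16-bit float. Both are assumptions on my part, as is the hypothetical 16 GB accelerator used to show why serving is hard; the count covers weights only, not activations or any other overhead.

```python
import math


def weights_gb(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def devices_needed(model_gb: float, device_memory_gb: float) -> int:
    """Minimum number of devices needed just to hold the weights."""
    return math.ceil(model_gb / device_memory_gb)


# Assumption: 175B parameters, 2 bytes each (16-bit floats).
size = weights_gb(175_000_000_000, 2)
print(size)  # 350.0

# Assumption: a hypothetical accelerator with 16 GB of memory.
print(devices_needed(size, 16.0))  # 22
```

Even under these generous assumptions, a single request touches weights spread across many devices, which is the serving problem the tweet points at.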
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
TLDR, if this takes off:
1) Expect the next generation of good ML practitioners to be way more creative. It’s taking me a while to wrap my head around how to prime this model to get cool demos, lol.
2) Startups will move away from training their own in-house models. (11/11)
                    
                