Doctor GPT-3
or: How I Learned to Stop Worrying and Love the Artificial Intelligence

This week's newsletter is 6,000 words on #gpt3 :

- how it works
- if the hype is deserved
- how to detect it
- if it’s going to plunder our jobs

https://avoidboringpeople.substack.com/p/doctor-gpt-3

Thread 👇

1/45
GPT-3 was created by OpenAI, a company trying to "make sure artificial general intelligence benefits all of humanity," i.e. the robots don't kill us all.

2/45
It's a general language model; think of it as an amazing autocomplete function

Because it's a general model, it can solve many different types of tasks

You could ask it to write a paragraph about unicorns, translate a sentence, generate programming code, or more
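
To make that concrete, here's roughly what calling it looked like through the beta API at the time (a sketch only; the engine name and parameters are illustrative, and the client library has changed since):

    import openai  # the beta-era OpenAI client; access was waitlisted

    openai.api_key = "YOUR_KEY"  # placeholder

    # Same model, different task -- only the prompt changes
    response = openai.Completion.create(
        engine="davinci",  # the largest GPT-3 engine in the beta
        prompt="Translate English to French:\nsea otter => loutre de mer\ncheese =>",
        max_tokens=16,
        temperature=0.3,
    )
    print(response["choices"][0]["text"])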

3/45
You might think it'd be worse than specialised models, but that's surprisingly not the case.

Impressively, in many areas it can be as good as, if not better than, state-of-the-art models

4/45
If you could only pick one model, you'd probably want to use GPT. It's the Simone Biles of the AI community, being top at many events and great at the rest.

5/45
There's been a lot of hype ever since the viral tweet by @sharifshameem, which came a few weeks after the actual release of GPT-3.

If you look at search trends, the itsy-bitsy bump is the actual release. The large spike is after the viral tweet.

6/45
What's different this time, especially compared to GPT-2?

GPT-3 gives better results, largely due to the increased data used in training and increased model parameters

7/45
@minimaxir also points out that GPT-3 allows for longer text generation output and makes better use of prompts to the model compared to before.

Overall, he estimates GPT-3's results are usable 30-40% of the time, vs around 10% for GPT-2

8/45
That increased accuracy has led to many viral demo videos of code generation, Google Sheets plugins, and legalese generation.

9/45 https://twitter.com/pavtalk/status/1285410751092416513?s=20
That said, if you'd seen GPT-2 results last year, you wouldn't have been as surprised. The outputs then were already impressive.

The general public's hype has gone from 0 to 100; the people in the space probably from 50 to 70.

10/45
Potential considerations on the model include:

- Slow to generate output
- Cherry-picking of examples
- Everyone's using the same model
- Bias in training

per @minimaxir

11/45
Funnily enough, @ykilcher also pointed out that the model is so expensive to train that when the researchers found an error in the training, they couldn't retrain it due to cost concerns

12/45
GPT-3 is based on the Transformer model, a different architecture from the older recurrent sequence-to-sequence ones

13/45
I go into more detail in the post, but in short, those older seq2seq models take inputs one at a time, perform some function on each, and get a temporary output

14/45
They then repeat that process for all the input words, in a sequence

15/45
And use that to predict the output words, in a sequence as well

16/45
The problem with doing this sequentially is that it takes a long time. You can't parallel process any of it, since each step is dependent on the previous one.

This is why these are known as "recurrent" neural networks.
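
A toy numpy sketch of that bottleneck (my illustration, not any real model): each step needs the previous step's hidden state, so the loop can't be parallelised.

    import numpy as np

    def rnn_step(hidden, token_vec, W_h, W_x):
        # one "function" applied per input word
        return np.tanh(W_h @ hidden + W_x @ token_vec)

    d = 8  # tiny hidden size, just for illustration
    W_h, W_x = np.random.randn(d, d), np.random.randn(d, d)
    hidden = np.zeros(d)

    for token_vec in np.random.randn(5, d):  # 5 input "words", one at a time
        hidden = rnn_step(hidden, token_vec, W_h, W_x)  # step t depends on step t-1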

17/45
Enter the transformer model.

GPT-3 stands for Generative Pretrained Transformer 3.

It's based on the transformer model, but actually a variation on it, as @JayAlammar pointed out to me

18/45
I elaborate in the post, but what the transformer does is split the inputs into more features

19/45
And then perform more functions on them in parallel, rather than sequentially

20/45
A transformer runs the input through many layers of functions, then uses part of the result to predict the output word by word, running those predictions through more functions as well
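
Here's a toy numpy sketch of the core self-attention step (my own illustration, not GPT-3's actual code): every position is handled in one matrix multiply, rather than one word at a time.

    import numpy as np

    def self_attention(X, W_q, W_k, W_v):
        # every position attends to every other position in one shot,
        # so the whole sequence is processed in parallel
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
        return weights @ V

    seq_len, d = 5, 8  # tiny sizes for illustration
    X = np.random.randn(seq_len, d)  # all 5 input "words" at once
    W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
    out = self_attention(X, W_q, W_k, W_v)  # shape (5, 8): one output per position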

21/45
GPT-3, though, is a transformer-decoder model, which per my understanding is a tweak on the above that drops the encoding layers entirely.

Supposedly this saves compute by roughly halving the number of parameters

22/45
The gargantuan number of parameters GPT uses comes from the number of layers and functions it's performing.

e.g. if you transform each word into 1,000 numbers, you need around 1,000 functions at every step
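
Back-of-the-envelope, using the sizes reported in the GPT-3 paper (96 layers, each word turned into a vector of 12,288 numbers) and ignoring embeddings, biases, and layer norms:

    d_model = 12288  # each word becomes a vector of 12,288 numbers
    n_layers = 96

    attention_params = 4 * d_model * d_model  # Q, K, V and output projections
    feedforward_params = 2 * d_model * (4 * d_model)  # two dense layers, 4x wider inside
    per_layer = attention_params + feedforward_params

    print(f"{n_layers * per_layer / 1e9:.0f}B")  # ~174B, close to the headline 175B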

23/45
Are there ways to detect GPT written text?

There are a few surprisingly simple ways of doing so:

- frequency analysis

- using another model to check the text

24/45
Firstly, because of the hyperparameters used in GPT-3, the frequency of words generated will not follow distributions expected from normal humans, as pointed out by @gwern

25/45 https://twitter.com/gwern/status/1285304763219836933?s=20
For example, copying the first appendix poetry sample from the GPT-3 paper into Grover by @allen_ai shows that Grover thinks it was machine-written
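
Grover isn't the only option. A rough sketch of the same "use another model" idea with GPT-2 via the @huggingface transformers library (my illustration, not Grover's actual method; API details may differ by version): text sampled from a similar model tends to look suspiciously "predictable" to GPT-2.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def avg_log_likelihood(text):
        ids = tokenizer.encode(text, return_tensors="pt")
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
        return -loss.item()  # higher = more "predictable" text

    # Unusually high scores are a hint of machine-generated text, not proof.
    print(avg_log_likelihood("The quick brown fox jumps over the lazy dog."))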

27/45
There’ll of course be false positives and false negatives, but having such methods available makes me less fearful of the dangers of fake, machine-generated text.

28/45
Since both of the tools above rely on structural features of the models, it seems likely that as long as the models have hyperparameters to adjust, the text generated would be identifiable

29/45
Access to the GPT-3 API is waitlisted, but there are workarounds:

- OpenAI released the code for GPT-2
- The team at @huggingface built a user-friendly GPT-2 site https://transformer.huggingface.co/ (you can also run GPT-2 locally; see the sketch after this list)
- @aarontay posted that @AiDungeon might have backdoor access to GPT-3
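
If you just want to poke at GPT-2 locally, something like this works with the @huggingface transformers library (a sketch; defaults may have changed since):

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    print(generator("GPT-3 is", max_length=40)[0]["generated_text"])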

30/45
GPT-3 doesn’t quite pass the Turing test yet, as @lacker shows.

31/45
However, we should expect such models to continue improving.

The eventual use cases for GPT are going to be even more creative and expansive than we think.

Demos are going to keep surprising us.

32/45
At the same time, there’s this unending unease that the robots are here for our jobs.

I think that’s less likely, though open to being proven wrong

33/45
If getting more efficient tools were a major job-killer, Excel would have wiped out half of all office jobs years ago.

What’s more likely to happen is that there’s an even higher reward to specialisation in your career

34/45
Think of it this way: does it take more or less expertise for you to correct your kid’s homework as they get older?

You start off editing spelling mistakes; you end up needing to know calculus.

35/45
Additionally, people seem to think all the inputs and outputs will be clean and readily usable.

As @vboykis has pointed out before, data cleaning is usually the majority of the time spent in ML:

https://vicki.substack.com/p/were-still-in-the-steam-powered-days

36/45
If you had a custom GPT model for yourself, you'd likely spend most of your time cleaning the data for training.

If you didn't, you'd likely spend most of your time cleaning the output.

37/45
I'd controversially propose that GPT-3 tells us more about ourselves as humans than about computers.

It shows that we have a surprisingly wide range of tolerance for variation in the inputs we receive, whether that's prose, poetry, or pieces of music.

38/45
Perhaps it's that appreciation for ambiguity, that welcoming of the weird, that separates our synapse signals from bits and bytes.

39/45
Or perhaps we'll have to rethink what it means to be alive.



40/45
Thanks to @gwern and @JayAlammar for answering questions about GPT-3.

And @nbashaw for edits

42/45
This thread not written by GPT-3.

Yet.

45/45
You can follow @Leonlinsx.