. @OpenAI GPT-3 Thoughts and Takeaways
Demos are fun, but let's discuss the details.
This thread covers sentence completion, trade-offs, few-shot learning, fine-tuning, technical takeaways, industry impacts, ethics, fun facts, and open questions.
cc @gdb
(1/13)

There are
(2/13)
Completion Parameters
Prompt - Input text.
Max_tokens - Output token length.
Temperature - Lower = less random + more deterministic. Higher = more “creative.”
Top_p - Diversity via nucleus sampling.
Frequency_Penalty - Higher = less repetition.
(3/13)
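
To make these parameters concrete, here is a minimal sketch of a completion request. It assumes the beta-era Python client, its `openai.Completion.create` call, and the `davinci` engine name; the parameter names are the ones listed above.

```python
import os
import openai  # assumed: beta-era OpenAI Python client

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hedged sketch of a completion call using the parameters from this tweet.
response = openai.Completion.create(
    engine="davinci",           # assumed base GPT-3 engine name
    prompt="Once upon a time",  # Prompt: input text
    max_tokens=32,              # Max_tokens: output token length
    temperature=0.7,            # lower = more deterministic, higher = more "creative"
    top_p=0.9,                  # Top_p: diversity via nucleus sampling
    frequency_penalty=0.5,      # higher = less repetition
)
print(response["choices"][0]["text"])
```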

(4/13)

GPT-3 thinks it's 10 years old and wants to be a doctor when it grows up because it wants to help people.
The playground is a fun toy, but the API makes running GPT-3 easier than running a linear regression using @scikit_learn.
(5/13)
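
For comparison, the scikit-learn baseline being invoked is itself only a few lines; this generic fit/predict sketch (toy data, not from the thread) is the bar the API is being measured against:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y = 2x + 1, just to show the usual fit/predict ceremony.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)                 # you still have to shape the features yourself
print(model.predict([[10.0]]))  # versus sending a single prompt string to the API
```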

I discovered an interesting trade-off between random creativity and reproducible logic while experimenting with the GRE multiple-choice sentence completion task: increasing the temperature (i.e., the creativity) decreased the accuracy.
(6/13)
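
Here's a rough sketch of the kind of sweep behind that observation. The GRE-style item, the keyed answer, and the `accuracy_at_temperature` helper are hypothetical, and the client call is the same assumed beta-era interface as above.

```python
import os
import openai  # assumed: beta-era OpenAI Python client

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical GRE-style sentence-completion item (not from the thread).
PROMPT = (
    "Choose the word that best completes the sentence.\n"
    "The critic's review was so ____ that the director refused to read past the first line.\n"
    "Options: glowing, scathing, tedious, ambiguous\n"
    "Answer:"
)
ANSWER = "scathing"

def accuracy_at_temperature(temp, trials=10):
    """Fraction of sampled completions that contain the keyed answer."""
    hits = 0
    for _ in range(trials):
        resp = openai.Completion.create(
            engine="davinci",
            prompt=PROMPT,
            max_tokens=3,
            temperature=temp,
        )
        hits += ANSWER in resp["choices"][0]["text"].lower()
    return hits / trials

# Sweep the temperature; in the experiment above, accuracy dropped as it rose.
for t in (0.0, 0.3, 0.7, 1.0):
    print(t, accuracy_at_temperature(t))
```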
I wonder if GPT-3 would struggle to write a math book, because you'd want the prose to be creative but the math to be logical and repeatable. You probably wouldn't want a math book that was creatively written and then claimed 2+2=5.
(7/13)

Back in the day (a few months ago), you needed to fine-tune a pre-trained model on a task-specific supervised dataset.
Today, with GPT-3, you get similar results by simply prepending a few task-specific examples to the prompt at inference time.
(8/13)
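
As a picture of what "prepending a few examples" means, here's a self-contained sketch that builds a few-shot prompt for a made-up sentiment task (the examples and labels are purely illustrative):

```python
# Few-shot prompting: prepend labeled examples to the prompt instead of
# fine-tuning. The task, examples, and labels below are made up.
FEW_SHOT_EXAMPLES = [
    ("I loved every minute of it.", "positive"),
    ("The plot made no sense at all.", "negative"),
    ("Easily the best meal I've had this year.", "positive"),
]

def build_prompt(new_review: str) -> str:
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in FEW_SHOT_EXAMPLES]
    blocks.append(f"Review: {new_review}\nSentiment:")
    return "\n\n".join(blocks)

print(build_prompt("The service was painfully slow."))
# The resulting string is sent as the prompt at inference time; the model is
# expected to continue it with "negative" -- no gradient updates involved.
```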

Zero-shot performance improves steadily with model size.
Few-shot performance increases more rapidly.
Larger models are better at in-context learning.
Graph from paper: https://arxiv.org/pdf/2005.14165.pdf
(9/13)

@OpenAI will be competing with AI-as-an-API startups, like @rev, and big tech companies with ML solutions, like @googlecloud.
Bigger models need better hardware, so companies will need to upgrade their ML serving infrastructure to keep up.
(10/13)

The paper talks about social impact and potential misuse. @OpenAI enabled the “Flag Toxicity” filter by default and lets us send feedback about “unsafe” content. They’re also working on a semantically-deep toxicity filter built on the API.
(11/13)

GPT - June 2018 release date, 117M parameters, ~5GB training set.
GPT-2 - February 2019, 1.5B, 40GB.
GPT-3 - June 2020, 175B, 570GB.
GPT-4 - June 2021, 1.5T, 5.7TB.
GPT-4 predicted by GPT-3.
(12/13)

How deep is the model's understanding?
How do we optimize the sampling parameters? Random search (see the sketch below)?
How do we evaluate the model in general, and its sensitivity to priming in particular?
If anyone has any ideas, please feel free to reply.
(13/13)
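
On the random-search question, here's a toy sketch of what that could look like over the sampling parameters from earlier in the thread. The scoring function is a placeholder; in practice it would call the API and measure task accuracy.

```python
import random

def score(params):
    """Placeholder objective: in practice, call the API with these sampling
    parameters and return accuracy on a held-out task."""
    return 1.0 - 0.5 * params["temperature"] - 0.2 * abs(params["frequency_penalty"] - 0.5)

best_params, best_score = None, float("-inf")
for _ in range(50):
    candidate = {
        "temperature": random.uniform(0.0, 1.0),
        "top_p": random.uniform(0.5, 1.0),
        "frequency_penalty": random.uniform(0.0, 1.0),
    }
    s = score(candidate)
    if s > best_score:
        best_params, best_score = candidate, s

print(best_params, best_score)
```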