The gradient descent algorithm should become a standard part of high school calculus courses. This material is a tiny extension of existing calculus material, but vital to the future of information technology through its role in artificial intelligence. 1/
[Disclaimer: opinions in this thread are my own, and not my employer's.] In this thread: what is gradient descent and how does it fit into AI? How does this serve students? How does this material fit into calculus? Is this material at the right level for high schoolers? 2/
Background: Mathematical optimization is core to many economic sectors. Off the top of my head: prominent in operations research and logistics, engineering (for pretty much every engineering field in one form or another), economics, and more. 3/
If you have a decision-making problem that you can model mathematically, you can write a mathematical optimization problem whose solution is the best possible decision (when the model is good). 4/
This is one of the most important concepts in the economy, given how many sectors it touches. So if a student has mastery over formulating and solving optimization problems, they have a foot in the door to a huge range of careers. 5/
"Gradient descent" is an algorithm for finding solutions to optimization problems. It's surprisingly simple, but the simplicity is profound: it is enormously versatile and you can use it to make progress on _any_ mathematical optimization problem. 6/
It doesn't always give you a perfect solution, and it is by no means the right tool for every task. But it can get you really, really far, depending on various technical conditions. 7/
The cost/benefit ratio for learning it is ridiculous: it's a tiny extension of basic calculus. It wouldn't add more than a couple of pages and a single problem section to any standard textbook. But it unlocks a whole world of problem-solving. 8/
Importantly, gradient descent has become the backbone of an emerging technology field: artificial intelligence (AI) powered by machine learning (ML). The connection is a bit tricky to explain, but I'll give it a shot below: 9/
Gradient descent is the algorithm that makes ML-based modules called "neural networks" capable of doing basic intelligence tasks. By an intelligence task, I mean something like looking at a picture and recognizing that there's a bird in it, or picking the best move in a game. 10/
Neural networks are now being used by companies like Google and Facebook to provide many services to their users, including translation ( https://en.wikipedia.org/wiki/Google_Neural_Machine_Translation), image classification ( https://engineering.fb.com/ml-applications/advancing-state-of-the-art-image-recognition-with-deep-learning-on-hashtags/), improvements to search, and more ( https://ai.googleblog.com/2020/01/google-research-looking-back-at-2019.html). 11/
The basic formula is [data + neural networks + gradient descent = something remarkably useful]. And while this technology is already in use in the applications I mentioned above, there are many more in development: applications are abundant and demand is high. 12/
The increased demand for AI is also driving a boom in computer hardware markets ( https://www.businesswire.com/news/home/20190828005343/en/Artificial-Intelligence-AI-Chip-Market-Global-Opportunity) that's translating directly into AI-centric jobs there as well ( https://www.nvidia.com/en-us/about-nvidia/careers/). 14/
Putting it all together: ML-based AI technology is going to be central to economic trends over the next decade ( http://reports.weforum.org/future-of-jobs-2018/workforce-trends-and-strategies-for-the-fourth-industrial-revolution/), and gradient descent is a cornerstone concept required for working in ML or understanding ML. 15/
How does gradient descent fit into a calculus curriculum? Simply put: it's already halfway there! The groundwork is laid but underemphasized and incomplete. Key related concepts (like extrema of functions and Euler's method) are presented already, so it's a small change to add this as well. 16/
The basic lesson fits in three tweets. 17/
You want to solve a problem min_x f(x). Try an initial guess of x_0, and pick a small "learning rate" alpha. "Descend" towards the solution by iterating x_{i+1} = x_i - alpha * f'(x_i). For small enough alpha, you eventually get close to a local minimum of f. 18/
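The update rule above can be sketched in a few lines of code. This is a minimal illustration, not part of the original thread: the example function f(x) = (x - 3)^2, the learning rate, and the step count are all made-up choices for demonstration.

```python
def gradient_descent(df, x0, alpha, steps):
    """Iterate x_{i+1} = x_i - alpha * df(x_i) and return the final x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * df(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose derivative is 2*(x - 3).
# The true minimum is at x = 3; starting from x_0 = 0, the iterates
# walk "downhill" toward it.
x_min = gradient_descent(df=lambda x: 2 * (x - 3), x0=0.0, alpha=0.1, steps=100)
print(x_min)  # close to 3.0
```

Note that alpha matters: too large and the iterates overshoot and diverge; too small and convergence is slow. That trade-off is itself a nice classroom discussion.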
Intuitively, this is like trying to walk down a hill while blindfolded. All you have to do is check the slope of the ground under your feet (the derivative of the function), and take small steps down the slope (gradient "descent"). 19/
From there you can connect it to regression: finding a "line of best fit" by iteratively minimizing mean-squared-error with gradient descent. A student who learns this material now has a conceptual foundation for modern ML techniques. 20/
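The regression connection can also be made concrete. A hedged sketch, not from the thread itself: the data points, learning rate, and step count below are invented for illustration, and the gradients are just the partial derivatives of mean-squared error with respect to the slope m and intercept b.

```python
def fit_line(xs, ys, alpha=0.01, steps=5000):
    """Fit y ~ m*x + b by gradient descent on mean-squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((m*x + b - y)^2).
        dm = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
        db = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
        m, b = m - alpha * dm, b - alpha * db
    return m, b

# Example: points lying exactly on y = 2x + 1, so we expect m ~ 2, b ~ 1.
m, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(m, b)
```

Training a neural network is conceptually the same loop, just with many more parameters and a more complicated function in place of m*x + b.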
Is this material at the right level for high schoolers, and should high schoolers be learning it? Yes and yes! Right level: as mentioned, a direct and small extension of existing material. 21/
This is neatly in-line with recommendations made by educational initiatives like AI4K12 ( https://github.com/touretzkyds/ai4k12/wiki), which recommends that high school students should be prepared to train simple neural networks. Gradient descent is necessary for that. 22/
To sum up: putting gradient descent into the high school calculus curriculum is almost certainly worth it, given how little modification to existing course material it would take, and how much upside it creates. 23/ #education #AIeducation #mathchat #iteachmath @CollegeBoard
This is my first time trying to put anything out there for math education Twitter, so hello to people I don't know! :) If I seem way off base about anything, please feel free to let me know in replies. 24/24
You can follow @jachiam0.