1/ Random thought: Deep Learning model development has seemed to me to be a lot like cooking. Typical example: Images looking a little strange here? Train on this other loss to tone it down a bit. How much? Until it's done!
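The "season to taste" recipe above can be sketched in a few lines. This is a hypothetical illustration, not code from the thread: the loss values and the `aux_weight` knob are stand-ins for whatever objective you're tuning.

```python
def total_loss(main_loss: float, aux_loss: float, aux_weight: float = 0.1) -> float:
    """Combine a primary objective with an auxiliary loss.

    aux_weight is the 'seasoning': nudge it up if the artifact you're
    fighting persists, down if the auxiliary term starts dominating,
    and stop when the outputs look right.
    """
    return main_loss + aux_weight * aux_loss

# e.g. a reconstruction loss plus a small auxiliary penalty
combined = total_loss(0.8, 2.0, aux_weight=0.1)
print(combined)
```

In a real training loop the same weighted sum would apply to tensor-valued losses before backprop; the cooking analogy is that `aux_weight` is tuned by eye, not derived.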
2/ I'm not claiming this is typical. I really don't know. I'm in my own little world. But I can tell you this: From day one of getting into deep learning I viewed the endeavor as incredibly complex. Tackling incredibly complex things successfully requires different tools.
3/ Here's the thing about high complexity: It's very easy to fool yourself into reaching wrong conclusions about cause and effect. In fact that's probably the norm. Why is that? Well, a good part of it is that our working memory simply isn't equipped to deal with many moving parts.
4/ You've probably experienced the severe difficulty of trying to recall more than 7-8 digits that you've just seen. Now imagine that you're trying to reason about a system with millions of parameters, many more connections, and innumerable interactions with the outside world.
5/ What's the answer to getting this complexity under control (somewhat)? Well, good abstractions (ideally "lossless" ones) go a long way. Turning many things into one vastly reduces cognitive load and increases your chances of success.
6/ This is why people who say "chemistry is just physics" or "biology is just chemistry" are not just being annoyingly reductionist, but are also quite frankly wrong. These abstracted levels of study are absolutely necessary if we are to make sense of the complexity.
7/ Hence when I set out to deal with deep learning, I didn't think just digging deep into math and trying to view everything through that lens was going to get me far. For one, I'm just ok with math. Passable. It'd be a big, non-motivating uphill battle.
8/ But more importantly, I thought it would be much more productive to home in on good abstractions for the immense complexity you're dealing with. That is, to develop the skill of intuiting the interactions of the system at a high level.
9/ What does that even look like? Well, first I think @fastdotai gives a great foundation for that, as they start with big-picture abstractions that were clearly hard-earned and are passed on to students in a form that's very easy to digest. That was huge for me.
10/ This is in contrast to the bottom-up method you see in fields like math, where the reward of actually using it practically and intuitively never seems within your grasp until years down the road.
11/ But another key component is simply bludgeoning yourself with hard-earned experience. You can't help but pick up on insightful patterns if you keep paying attention to what you think is going to happen vs. what actually happens in your models.
12/ I think this is pretty similar to what Malcolm Gladwell describes in "Outliers", the book that famously popularized the now somewhat derided 10,000-hour rule of achieving expertise.
13/ The definition of expertise in that book, though, I think is great: you start as a novice by carefully overthinking every step and sticking to "recipes", without necessarily understanding the tasks of the field.
14/ But then gradually, you switch from the "recipes" to more of a reliance on "pattern recognition".

Pattern recognition, or "intuition", seems to develop in a way analogous to neural networks: you just have to keep iterating with good real-world feedback!
You can follow @citnaj.