1) Tesla's Autopilot driver assist product is now largely feature complete, but how much further does it have to progress to achieve reliability 3-5x greater than the average human & allow removal of human supervision?
...
2) This is the March of 9s.
Tesla’s self driving strategy was chosen & optimised from the start for this moment: putting a system & infrastructure in place, free of data or hardware bottlenecks, to allow largely automated progress for a feature complete AP on the March of 9s.
3) But how far do they have left to March?
Is Tesla 99.9% there, does it have 9,000x further to go, or is it 33% there? It’s all a matter of perspective.
And most importantly, does all this mean Tesla is 1 year or 10 years away from Robotaxi-level reliability?
4) I’ll get into the details soon, but to skip to the conclusion – I’m leaning towards 33% of the way there and 2 years away from 3x human reliability, with 2021-2024 the most likely 3 year period at ~40% odds.
5) On a high level first: Tesla’s solution to the March of 9s is “Operation Vacation”. Aside from efficient human labeling & tweaks to algorithms & driving policy, it is largely automated (a toy code sketch follows the list):
A) Fleet disengagements/shadow mode disagreements generate a priority edge case class list
6) B) Fleet collects data when cars see these cases (~1.2bn fleet miles/month)
C) New data is labelled & used to retrain neural nets. The system iteratively improves its predictions via self-supervised learning, using real knowledge of objects’ future locations.
7) D) System tests the new neural nets in simulation against test cases & chooses the best NN.
E) NN & broader architecture is rolled out in shadow mode and beta to test real world performance.
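To make the loop concrete, here is a toy, self-contained Python sketch of steps A-E. Every function, variable & number here is a hypothetical placeholder for illustration, not Tesla’s actual pipeline:

```python
import random

random.seed(0)  # deterministic toy run

def priority_cases(fleet_events):
    # A) rank edge case classes by disengagement / shadow-mode disagreement counts
    return sorted(fleet_events, key=fleet_events.get, reverse=True)

def fleet_collect(case):
    # B) stand-in for the fleet uploading clips whenever cars encounter this case
    return [f"{case}_clip_{i}" for i in range(1000)]

def retrain(score, clips):
    # C) stand-in for labelling + retraining: more clips nudge the (noisy) score up
    return score + 0.001 * len(clips) * random.random()

def simulate(candidate_score):
    # D) stand-in for scoring a candidate net against the simulation test library
    return candidate_score + random.gauss(0, 0.05)

net_score = 1.0
fleet_events = {"cut_in": 900, "road_debris": 350, "odd_traffic_light": 120}
for case in priority_cases(fleet_events):
    candidates = [retrain(net_score, fleet_collect(case)) for _ in range(4)]
    net_score = max(candidates, key=simulate)  # keep the best net in simulation
    # E) the chosen net would now roll out in shadow mode / beta
    print(f"addressed '{case}'; shadow-deploying net with score {net_score:.2f}")
```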
8) Now to the numbers:
First let’s define reliability as the probability of zero accidents per 5 seconds of driving (5 seconds chosen somewhat arbitrarily, as this is likely the video length used for training Tesla’s NNs).
9) A) I’ll assume a self driving car is competing against a ~90% baseline (no controls, just luck) – 1 nine.
B) Humans crash in the US every ~100k miles (only serious enough to be reported to NHTSA every ~265k miles, which with ~1.8 cars per crash leads to their 1 crash/478k miles data).
10) Translated into a probability of no crash per 5 seconds (using a 31mph average speed), 1 crash per 100k miles is 99.99996%, or ~6.4 nines. So getting to 3-5x humans would require ~7 nines.
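A quick Python check of that arithmetic (assumptions as above: 31mph average speed, ~1 human crash per 100k miles):

```python
import math

MPH = 31.0
MILES_PER_5S = MPH * 5 / 3600          # ~0.043 miles driven per 5s interval

def nines(p_fail):
    # express a per-5s failure probability as a count of 9s
    return -math.log10(p_fail)

p_crash = MILES_PER_5S / 100_000       # human: ~1 crash per 100k miles
print(f"P(no crash per 5s) = {1 - p_crash:.7%}")     # ~99.99996%
print(f"human: {nines(p_crash):.1f} nines")          # ~6.4
print(f"3x human: {nines(p_crash / 3):.1f} nines")   # ~6.8
print(f"5x human: {nines(p_crash / 5):.1f} nines")   # ~7.1
```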
11) C) As a rough estimate, I guess Tesla FSD Beta has a disengagement every ~3 miles in the city & Autopilot every ~80 miles on highways. Time & speed weighted, this puts Tesla at ~34 miles per disengagement & a probability of no disengagement per 5 seconds of 99.87%.
12) So Tesla is 99.9% there? Or does it have to reduce its disengagement probability per 5 seconds another ~8,900x, from 0.13% to 0.000014%, to get to 3x humans?
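The same arithmetic gives that gap (another minimal Python check; 31mph & the rough ~34 miles/disengagement estimate are assumed from above):

```python
MILES_PER_5S = 31 * 5 / 3600                 # ~0.043 miles per 5s interval

p_diseng = MILES_PER_5S / 34                 # ~1 disengagement per 34 miles
print(f"P(no disengagement per 5s) = {1 - p_diseng:.2%}")   # ~99.87%

p_target = (MILES_PER_5S / 100_000) / 3      # 3x better than ~1 crash/100k miles
print(f"gap to 3x human: ~{p_diseng / p_target:,.0f}x")     # ~8,800x, i.e. the ~8,900x above to rounding
```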
I don’t think either of these is the right way to look at it.
13) This is how I think about it:
For Tesla to solve another 9 it has to get from 99.9% to 99.99% – this requires solving new edge case classes that add up to 0.09% disengagements per 5 seconds.
14) Say solving this 9 requires solving 100 new edge case classes. Then each of those classes on average individually leads to 0.09%/100 = 0.0009% disengagements per 5 seconds.
15) So they solve these; now they are at 99.99%. To solve the next 9 they need to solve edge cases which add up to 0.009%. They can do this by solving another 100 edge case classes, this time averaging 0.00009% problems per 5s.
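This pattern repeats for each 9: the failure budget shrinks 10x each time, so with a fixed ~100 new classes per 9, the average per-class frequency also drops 10x. A small illustration in Python (the 100-classes-per-9 count is the illustrative assumption above, not a known figure):

```python
CLASSES_PER_NINE = 100        # assumed new edge case classes solved per 9

p_fail = 1e-3                 # start at 99.9%, roughly FSD Beta today
for step in range(4):         # the four 9s left to reach ~3x human (~7 nines)
    budget = p_fail - p_fail / 10              # failure mass to remove this 9
    per_class = budget / CLASSES_PER_NINE      # average frequency per class
    print(f"{1 - p_fail:.{3 + step}%}: solve {CLASSES_PER_NINE} classes "
          f"of ~{per_class:.7%} each")
    p_fail /= 10
```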
16) This is key; the latest 9 took the same amount of labeling work & Autopilot-team problem solving as the prior 9 – it didn’t need 10x more work. What it did require, though, is access to 10x more data & 10x more data filtering & processing (but the same amount of actual data collection, not 10x more).
17) This is where Tesla’s fleet vision strategy shines. It already has a fleet driving ~1.2bn miles/month with HW2+ supercomputers able to do data filtering & processing. So to access 10x more data Tesla simply has to scale up utilisation rates of its customer car fleet.
18) So in the example above, getting to the 99.999% level (2 nines past current FSD Beta) required solving 100 new edge case classes averaging 0.00009% occurrence per 5s. Each of these would happen ~25,000 times per month across the Tesla fleet’s 1.2bn miles (~38mn hours).
19) So if Tesla needs e.g. 1,000 examples of each of these for the NN training solution, it still only has to capture data from 1,000/25,000 = 4% of the fleet’s encounters with these cases.
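Checking that fleet math (1.2bn miles/month & 31mph are from above; the 1,000 training examples per class is an assumption for illustration):

```python
FLEET_MILES = 1.2e9                          # fleet miles per month
MPH = 31
hours = FLEET_MILES / MPH                    # ~38.7mn fleet hours/month
intervals = hours * 3600 / 5                 # 5-second intervals per month

p_class = 9e-7                               # one 0.00009%-per-5s edge case class
encounters = intervals * p_class
print(f"~{encounters:,.0f} encounters/month per class")        # ~25,000

NEEDED = 1000                                # assumed training examples required
print(f"capture just {NEEDED / encounters:.0%} of encounters") # ~4%
```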
20) So in summary, I’d say Tesla has solved two 9s with FSD Beta & has to solve four more to get to 3x humans – hence ~33% there. But when was the starting point? I think @karpathy started working on video-based neural nets in 2018.
21) But probably this latest architecture has only been broadly settled on for ~1 year. Hence Tesla's March of 9s strategy for FSD Beta has progressed 33% in 1 year & thus could get to 3x human in another 2 years.
22) But Tesla is venturing very far into the unknown here. We do not know if they will hit other architecture or hardware bottlenecks, or local maxima, that slow their progress. Fortunately Tesla’s agility will help it respond to challenges as they come.
23) Ideally Tesla will achieve 3x with the current HW3 hardware, but next year manufacturing capacity will be ~1 million cars/year, so even if they have to make changes to computing power, camera definition, laser camera cleaners, radar etc, they will still very rapidly build a large fleet.
24) Some important notes on this thread: Tesla is solving edge case classes & not specific edge cases; the system has some level of generalization ability.
25) So many very rare edge cases (with a frequency associated with a later nine) can already be handled by Tesla’s cars by generalising solutions across its playbook of already trained & solved edge case classes.
26) And the larger & more varied the number of edge case classes solved in Tesla’s NNs, the better & better it will get at generalising to never before seen situations.
27) Another important note: all the numbers above are purely illustrative. The real situation is much more complicated.
28) In particular we do not know the real frequency distribution of edge case classes (particularly the ones which aren’t already solved by generalisation between past class solutions).
29) It could be there is a sudden fatter tail of far rarer new edge case classes needed to solve the next 9; this would then need more actual work & labeling than the past 9 as well as access to even rarer events.
30) Fortunately with Tesla’s fleet driving 13.5bn miles a year it has a lot of fleet driving experience utilisation runway to play with.
31) One argument against a fatter & fatter tail of ungeneralisable edge cases: if it were true, it seems very unlikely Tesla & Waymo etc would have got as far as they already have.
32) For example, if there were say 1 million different 0.0000009%-per-5-second edge case classes, this frequency group of problems would add up to 0.9% problems per 5 seconds – & it would have been impossible to solve even the second 9.
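A one-line check on that (hypothetical figures from the example above):

```python
n_classes = 1_000_000   # hypothetical count of ultra-rare, ungeneralisable classes
p_each = 9e-9           # 0.0000009% failure probability per 5s, each
print(f"combined: {n_classes * p_each:.1%} failures per 5s")  # 0.9% -- a floor
# that caps reliability near ~99.1%, below the second 9 (99.9%)
```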
33) If I haven’t lost & confused you too much already, here are some more threads on Tesla’s Autopilot:

General summary of the rationale for Tesla’s Data/Intelligence Heavy Vision strategy: https://twitter.com/ReflexFunds/status/1292186746764038145?s=20
34) Thread on the detection bottleneck holding back the previous Autopilot architecture & how the new 4D architecture may have solved this: https://twitter.com/ReflexFunds/status/1292887757879222272?s=20
35) Thread on the human resource bottlenecks to Tesla's Autopilot performance and how they've worked to solve this with Operation Vacation: https://twitter.com/ReflexFunds/status/1292891381229379585?s=20