How can Tesla progress from its current Autopilot product to Robotaxis?
1) I think the AP driver assist product is an extremely valuable feature; Human+AP already looks statistically safer than human alone & it makes long distance driving far less tiring.
2) However, AP still looks far from capable of driving fully autonomously. Why is this, given Tesla's full access to data from the ~1 billion miles per month Tesla’s fleet is driving?
3) I put this down to two key reasons: A) Architecture capability bottleneck & B) Staff bandwidth bottleneck for solving more & more driving scenarios/problems.
4) (Prior to the release of Tesla’s in-house neural net chip & HW3 in March 2019 Tesla also had a compute bottleneck – but current AP is still far from utilising the new HW3 computer.)
5) Note: The following thread is heavily simplified & the technical details of the neural network architectures & training procedures are all far more complicated than summarised below.
6) What is the architecture capability bottleneck & can it be solved?
In my view Tesla’s current 2D image-based Autopilot architecture is likely incapable of solving “pseudo lidar” to a sufficient level to allow Robotaxis. But I don’t think it was ever planned or expected to.
7) I think there is truth to the “Waymo is winning” consensus argument that Tesla’s camera-based neural networks cannot currently measure object distance & velocity as accurately as the high-spec, multi-thousand-dollar Lidars available today.
8) This is currently a key bottleneck to Tesla’s AP capabilities.
However, I think Tesla’s plan always involved using a new architecture to solve this problem – this was the key reason for developing the HW3 chip & the key project @Karpathy has been working on since he joined.
9) I think the current AP solves pseudo lidar using 2 frames of 2D labelled images processed from each camera separately. It was trained initially using radar measurements to label distance & velocity. They then use driving policy/symbolic AI to predict objects' future paths.
10) In 2018 a new NN “AKnet” was discovered in Teslas (but never activated) which fed all camera images into a single neural net.
Since then I think @Karpathy has been working to perfect this architecture & to feed multiple seconds of video from all cameras into a single NN.
11) This has required fundamental AI breakthroughs to achieve but I believe the broadly final architecture is now in developer cars & is planned to be pushed to the fleet before year end – thus eliminating Tesla’s current Autopilot neural net architecture.
12) Why may this new architecture be a giant leap to solving pseudo lidar?
With multiple cameras, the neural net can combine several parallax calculations to estimate object distance. With full video frames it has multiple reference points from which to calculate velocity.
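To make the parallax point concrete, here is a minimal sketch of the textbook two-camera relation (distance = focal length x baseline / disparity), plus velocity taken from two distance estimates over time. This is not Tesla's method: a neural net would learn something like this implicitly, and the function names, focal length, camera spacing & pixel shifts below are all made-up illustrative values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Textbook pinhole stereo relation: Z = f * B / d.

    focal_px     -- focal length in pixels (assumed value)
    baseline_m   -- spacing between the two cameras in metres (assumed value)
    disparity_px -- horizontal shift of the object between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def radial_velocity(depth_t0_m, depth_t1_m, dt_s):
    """Closing speed from two depth estimates; negative means approaching."""
    return (depth_t1_m - depth_t0_m) / dt_s


# Hypothetical numbers: the object's disparity grows as it gets closer.
z0 = depth_from_disparity(1000.0, 0.5, 10.0)   # distance at the first frame
z1 = depth_from_disparity(1000.0, 0.5, 12.5)   # distance one second later
v = radial_velocity(z0, z1, 1.0)
print(z0, z1, v)
```

With more cameras & more frames there are many such pairs to combine, which is the sense in which extra views give the net more reference points.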
13) But most importantly, it is able to use self-supervised learning to continuously improve the models it uses to make these calculations. For example, the neural net can identify a car at second 1, & then predict the car's position at second 3...
14)... (implicitly requiring measurements of distance & velocity). But if its prediction proves incorrect & doesn’t match the position of the car at second 3 then the NN parameters can be updated & cycled continuously until its predictions get more & more accurate.
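A toy sketch of that predict, compare & update loop, heavily simplified: a single learned number (a velocity estimate) stands in for the NN parameters, & the later frame itself is the label, so no human labelling is needed. The function name, learning rate & positions are hypothetical & chosen only for illustration; a real system would train a large network on video, not one parameter on a list.

```python
def train_velocity(positions, dt, lr=0.1, epochs=200):
    """Learn v so that position[t] + v * dt predicts position[t + 1].

    positions -- observed 1D positions at equally spaced times (toy data)
    dt        -- time between observations in seconds
    """
    v = 0.0  # initial guess for the single learnable parameter
    for _ in range(epochs):
        for x_now, x_next in zip(positions, positions[1:]):
            predicted = x_now + v * dt      # predict the future position
            error = predicted - x_next      # compare with what actually happened
            v -= lr * 2.0 * error * dt      # gradient step on the squared error
    return v


# Toy observations: a car moving 2 metres per second.
observed = [0.0, 2.0, 4.0, 6.0]
v_hat = train_velocity(observed, dt=1.0)
print(round(v_hat, 3))
```

The point is only that every prediction error is free training signal, so the estimate keeps improving as more video comes in.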
15) This should allow a pathway for the pseudo lidar system to match lidar’s accuracy at distance & velocity measurements, but key here is that lidar only sees a generic object – it has no idea what that object is.
16) Tesla’s 4D system combines detection with prediction – not only will it learn & extrapolate simple laws of physics for a generic object, it also knows what that object is, & thus future path predictions can include knowledge like: if it's a car it may brake or change direction.
17) With most of Tesla’s fleet now installed with a HW3 computer, Tesla has the compute capacity to move to these much larger & more complicated neural nets with its current hardware.
18) So, this is why in my prior thread I said Tesla’s planned Q4 public rollout of its 4D video-based neural nets is the day of reckoning for Tesla’s Robotaxi vision. Either 4D self-supervised learning makes Tesla’s pseudo lidar approach equivalent to lidar...
19) ... & Robotaxis just become a matter of using more fleet data to tick off edge cases, or it doesn’t get close enough & there is a lot more fundamental algorithmic improvement & computing resource progress needed.
20) So, we should soon know if Tesla has solved the architecture capability bottleneck. What about the staff bandwidth bottleneck?
(This has got rather long, so I’ll continue in a later thread).
21) Thread part 2:
You can follow @ReflexFunds.