Thread by @haltakov, Computer vision for self-driving cars There are different computer vision [...]

Computer vision for self-driving cars 🧠 🚙

There are different computer vision problems you need to solve in a self-driving car.

▪️ Object detection
▪️ Lane detection
▪️ Drivable space detection
▪️ Semantic segmentation
▪️ Depth estimation
▪️ Visual odometry

Details 👇


Object Detection 🚗 🚶‍♂️ 🚦 🛑

This is one of the most fundamental tasks: we need to know where other cars and people are, and which signs, traffic lights and road markings need to be considered. Objects are identified by 2D or 3D bounding boxes.

Relevant methods: R-CNN, Fast(er) R-CNN, YOLO
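
Detectors usually return several overlapping 2D boxes for the same object, so a common post-processing step is to compare boxes by intersection over union (IoU) and keep only the best one (non-maximum suppression). A minimal sketch; the box format, the threshold and the sample detections are assumptions for illustration, not the output of any particular detector:

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    # Visit boxes from highest to lowest confidence
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with an already kept one
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two boxes on the same car, one on a pedestrian
detections = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 120, 360, 260)]
confidences = [0.9, 0.8, 0.75]
kept = nms(detections, confidences)
print(kept)
```

Here the second box overlaps heavily with the first one and is dropped, while the pedestrian box survives.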


Distance Estimation 📏

After you know what objects are present and where they are in the image, you need to know where they are in the 3D world.

Since the camera is a 2D sensor, you first need to estimate the distance to the objects.

Relevant methods: Kalman Filter, Deep SORT
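
One simple way to get a first distance guess from a single camera is the pinhole model: an object of known real height appears smaller the farther away it is. The noisy per-frame values can then be smoothed with a Kalman filter. A rough sketch, assuming a known focal length and object height (both made up here) and a deliberately simple one-dimensional filter:

```python
def distance_from_height(focal_px: float, real_height_m: float, box_height_px: float) -> float:
    """Pinhole model: an object of height H at distance Z is f * H / Z pixels tall."""
    return focal_px * real_height_m / box_height_px


class ScalarKalman:
    """1D Kalman filter with a constant-distance (random walk) motion model."""

    def __init__(self, x0: float, p0: float, process_var: float, meas_var: float):
        self.x = x0           # current distance estimate in meters
        self.p = p0           # variance of the estimate
        self.q = process_var  # how much the true distance may drift per frame
        self.r = meas_var     # variance of a single measurement

    def update(self, z: float) -> float:
        # Predict: the distance is assumed unchanged, the uncertainty grows
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


# Hypothetical values: 1000 px focal length, 1.5 m tall car,
# bounding box heights (in pixels) over five frames
FOCAL_PX = 1000.0
CAR_HEIGHT_M = 1.5
box_heights = [75, 74, 77, 76, 75]

raw = [distance_from_height(FOCAL_PX, CAR_HEIGHT_M, h) for h in box_heights]
filt = ScalarKalman(x0=raw[0], p0=4.0, process_var=0.01, meas_var=1.0)
smoothed = [filt.update(z) for z in raw[1:]]
print(raw)
print(smoothed)
```

A real tracker would typically also estimate the relative velocity and handle the association of detections across frames, which is the part Deep SORT addresses.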


Lane Detection 🛣️

Another critical piece of information the car needs is where the lane boundaries are. You need to detect not only lane markings, but also curbs, grass edges, etc.

There are different methods to do that, from traditional edge-detection-based methods to CNNs.
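
To give a feeling for the traditional edge-based approach, here is a toy sketch that looks for bright stripes in a single grayscale image row using the horizontal intensity gradient. The synthetic row, the threshold and the function name are assumptions for illustration; a real pipeline works on full images and fits a lane model on top.

```python
from typing import List


def marking_centers(row: List[int], threshold: int = 60) -> List[float]:
    """Return the center column of each bright stripe in one grayscale image row."""
    # Horizontal gradient: intensity difference between neighbouring pixels
    grad = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    centers: List[float] = []
    start = None
    for i, g in enumerate(grad):
        if g >= threshold:
            # Dark to bright: first pixel of a marking
            start = i + 1
        elif g <= -threshold and start is not None:
            # Bright to dark: pixel i is the last pixel of the marking
            centers.append((start + i) / 2)
            start = None
    return centers


# Synthetic row: dark asphalt (50) with two bright lane markings (200)
row = [50] * 40
for col in range(8, 11):
    row[col] = 200
for col in range(30, 33):
    row[col] = 200

centers = marking_centers(row)
lane_center = sum(centers) / len(centers)
print(centers, lane_center)
```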


Driving Path Prediction ⤴️

An alternative is to train a neural network that directly outputs the trajectory the car needs to drive. This can be used as a substitute for centering between the lane markings, for example when they are not visible.


Drivable Space Detection ⭕️

The goal here is to detect which parts of the image represent space that the car can physically drive on.

The methods here are usually very similar to the semantic segmentation methods (see below).
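
Once you have a per-pixel drivable / not drivable mask, one possible compact representation is the free-space boundary: for every image column, how far up from the bottom of the image the car could drive. A small sketch on a hand-made mask; the mask values and the function name are assumptions for illustration:

```python
from typing import List, Optional


def free_space_boundary(mask: List[List[int]]) -> List[Optional[int]]:
    """For each column, return the topmost row that is drivable and connected
    to the bottom image row through drivable pixels, or None if there is none."""
    height, width = len(mask), len(mask[0])
    boundary: List[Optional[int]] = []
    for col in range(width):
        top = None
        # Scan upwards from the bottom row until the first non-drivable pixel
        for r in range(height - 1, -1, -1):
            if not mask[r][col]:
                break
            top = r
        boundary.append(top)
    return boundary


# 1 = drivable, 0 = not drivable; row 0 is the top of the image
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
boundary = free_space_boundary(mask)
print(boundary)
```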

Drivable Space Detection https://abs.twimg.com/emoji/v2/... draggable=

The goal here is to detect which parts of the image represent the space where the car can physically drive onto.The methods here are usually very similar to the semantic segmentation methods (see below)." title="Drivable Space Detection https://abs.twimg.com/emoji/v2/... draggable="false" alt="⭕️" title="Fetter großer Kreis" aria-label="Emoji: Fetter großer Kreis">The goal here is to detect which parts of the image represent the space where the car can physically drive onto.The methods here are usually very similar to the semantic segmentation methods (see below)." class="img-responsive" style="max-width:100%;"/>

Semantic Segmentation 🎨

Not all parts of the image can be described by a bounding box or a lane model, e.g. trees, buildings, the sky. Semantic segmentation methods classify each pixel in the image.

Relevant methods: Fully Convolutional NN, UNet, PSPNet
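
Whatever the network architecture, the last step is typically the same: the model outputs one score per class for every pixel and the label is the class with the highest score. A tiny sketch of that step; the class list and the scores are made up for illustration:

```python
from typing import List

# Hypothetical class list; real datasets have many more classes
CLASSES = ["road", "car", "sky"]


def segment(scores: List[List[List[float]]]) -> List[List[int]]:
    """scores[row][col] holds one score per class; return the best class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A 2x2 "image" with per-class scores for every pixel
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]],
]
labels = segment(scores)
names = [[CLASSES[i] for i in row] for row in labels]
print(names)
```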


Depth Estimation 📐

The goal is to estimate the distance to every pixel in the image, in order to have a better 3D model of the surroundings.

Methods like stereo and structure-from-motion are now being replaced by self-supervised deep learning models working on single images.
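
For the classic stereo case, the depth of a pixel follows directly from its disparity between the left and the right image: Z = f * B / d, with focal length f in pixels, baseline B in meters and disparity d in pixels. A short sketch; the camera parameters are made-up values:

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical stereo rig: 700 px focal length, 0.5 m between the cameras
FOCAL_PX = 700.0
BASELINE_M = 0.5

depths = [depth_from_disparity(d, FOCAL_PX, BASELINE_M) for d in (35.0, 7.0, 0.0)]
print(depths)
```

Note how a small disparity error matters much more for far objects than for near ones, because depth is inversely proportional to disparity.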

Depth Estimation https://abs.twimg.com/emoji/v2/... draggable=

The goal is to estimate the distance to every pixel in the image, in order to have a better 3D model of the surrounding.Methods like stereo and structure-from-motion are now being replaces by self-supervised deep learning models working on single images." title="Depth Estimation https://abs.twimg.com/emoji/v2/... draggable="false" alt="📐" title="Geodreieck" aria-label="Emoji: Geodreieck">The goal is to estimate the distance to every pixel in the image, in order to have a better 3D model of the surrounding.Methods like stereo and structure-from-motion are now being replaces by self-supervised deep learning models working on single images." class="img-responsive" style="max-width:100%;"/>

Visual Odometry 🎥

While we know the movement of the car from the wheel sensors and the IMU, estimating the actual movement from the camera images can be more accurate, for example for the pitch angle.

Visual odometry estimates the 6 DoF movement (3D translation and 3D rotation) of the camera between two frames.
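
The rotation part of the 6 DoF motion is usually handled as a 3x3 rotation matrix, from which angles like the pitch can be read back. A small sketch of that conversion, assuming the common Z-Y-X (yaw, pitch, roll) convention; the convention and the sample values are assumptions, and the formulas are only valid for pitch angles below 90 degrees in magnitude:

```python
import math
from typing import List, Tuple


def rotation_zyx(roll: float, pitch: float, yaw: float) -> List[List[float]]:
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def euler_from_rotation(R: List[List[float]]) -> Tuple[float, float, float]:
    """Recover (roll, pitch, yaw) from a Z-Y-X rotation matrix."""
    # Clamp to guard against values slightly outside [-1, 1] from rounding
    pitch = -math.asin(max(-1.0, min(1.0, R[2][0])))
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw


# Hypothetical frame-to-frame rotation: the car pitches slightly while braking
R = rotation_zyx(roll=0.001, pitch=0.02, yaw=-0.005)
roll, pitch, yaw = euler_from_rotation(R)
print(roll, pitch, yaw)
```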


Summary 🏁

There are of course many other computer vision problems that may be helpful, but this thread will give you an overview of the most important ones.

As you can see, deep learning methods (and especially CNNs) nowadays dominate all aspects of computer vision...

If you liked this thread and want to read more about self-driving cars and machine learning, follow me @haltakov!

I have many more threads like this planned 😃
