A quick, non-technical explanation of Dropout.

(As easy as I could make it.)

🧵👇
Remember those two kids from school that sat together and copied from each other during exams?

They aced every test but were hardly brilliant, remember?

Eventually, the teacher had to seat them apart. That was the only way to force them to learn.

👇
The same happens with neural networks.

Sometimes, a few hidden units create associations that, over time, provide most of the predictive power, forcing the network to ignore the rest.

This is called co-adaptation, and it prevents the network from generalizing properly.

👇
We can solve this problem the same way the teacher did: by breaking up the associations that prevent the network from learning.

This is what Dropout is for.

During training, Dropout randomly removes some of the units. This forces the network to learn in a balanced way.

👇
Units may or may not be present during a round of training.

Now every unit is on its own and can't rely on other units to do its work. Each one has to work harder by itself.

Dropout works very well, and it's one of the main mechanisms to reduce overfitting.

👇
Here is an example of how Dropout works.

In this case, we are dropping 50% of all the units.

Notice how the dropped units become zero and the remaining units are scaled up to account for the missing ones.
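
Here's a minimal sketch of that behavior in plain NumPy, assuming the standard "inverted Dropout" convention where the surviving units are scaled by 1 / (1 - rate):

```python
import numpy as np

rng = np.random.default_rng()

# Ten units, all set to 1.0 so the effect of Dropout is easy to see.
units = np.ones(10)

# Drop each unit with probability 0.5 (a 50% Dropout rate).
rate = 0.5
mask = rng.random(units.shape) >= rate

# Dropped units become zero; surviving units are scaled by 1 / (1 - rate) = 2
# so the expected sum of activations stays the same.
result = units * mask / (1 - rate)

print(result)  # e.g. [2. 0. 0. 2. 2. 0. 2. 0. 2. 2.] — which units drop changes on every call
```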

👇
You can follow @svpino.