I always get Normalization and Standardization mixed up.

But they are different.

Here are some notes about them and why we should care.

🧵👇
Feature scaling is key for a lot of Machine Learning algorithms to work well.

We usually want all of our features on the same scale.

👇
Imagine we are working with a dataset of workers.

"Age" will range between 16 and 90.
"Salary" will range between 15,000 and 150,000.

Huge disparity!

Salary will dominate any comparisons because of its magnitude.

We can fix that by scaling both features.
👇
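Here's a quick sketch of the problem (a minimal example; the numbers are made up). A 62-year age gap barely moves the raw Euclidean distance between two workers, while a modest salary gap dominates it:

```python
import numpy as np

# Two workers: very different ages, nearly identical salaries.
worker_a = np.array([18, 40_000])   # [age, salary]
worker_b = np.array([80, 41_000])

# Raw Euclidean distance is dominated by the salary column.
distance = np.linalg.norm(worker_a - worker_b)
print(distance)  # ~1001.9 — the 62-year age gap barely registers
```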
We can normalize or standardize these features.

The former squeezes values into the [0, 1] range.

The latter rescales values to have zero mean and unit standard deviation.

👇
Here are some handwritten notes about Normalization.

👇
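Since the notes were an image, here's the same idea in code: a minimal sketch of min-max normalization using scikit-learn's MinMaxScaler (the sample data is made up):

```python
# Min-max normalization: x' = (x - min) / (max - min)
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[18,  40_000],
                 [35,  80_000],
                 [80, 150_000]])  # columns: age, salary

scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
print(scaled)
# Every column now lives in [0, 1] (values rounded):
# [[0.     0.   ]
#  [0.274  0.364]
#  [1.     1.   ]]
```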
Here are some handwritten notes about Standardization.

👇
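And the code version of standardization, a minimal sketch using scikit-learn's StandardScaler on the same made-up data:

```python
# Standardization: x' = (x - mean) / std
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[18,  40_000],
                 [35,  80_000],
                 [80, 150_000]])  # columns: age, salary

scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(scaled.mean(axis=0))  # ~[0, 0] — each column centered at zero
print(scaled.std(axis=0))   # [1, 1]  — each column has unit std deviation
```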
Which one is better?

Well, it depends.

There's really no way to determine which of these two methods works better without actually trying them. (Thanks to @javaloyML for pointing this out to me.)

It's a good idea to treat these as tunable settings that we can experiment with.
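One way to do that: treat the scaler as just another hyperparameter in a scikit-learn Pipeline and let cross-validation pick the winner. A sketch — the dataset and model here are arbitrary stand-ins for your own:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),   # placeholder; the grid swaps it out
    ("model", KNeighborsClassifier()),
])

# The "scaler" step itself is the hyperparameter being searched.
grid = GridSearchCV(
    pipeline,
    param_grid={"scaler": [MinMaxScaler(), StandardScaler()]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # whichever scaler cross-validated better
```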