practical MLE tip: if you know your distribution isn’t Gaussian, min-max normalize instead of standardizing (standardization implicitly assumes a roughly bell-shaped distribution) https://twitter.com/svpino/status/1318930792232456192
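a minimal sketch of the two options, using numpy on a made-up skewed feature column (the numbers are just for illustration):

```python
import numpy as np

# a hypothetical skewed (non-Gaussian) feature column
x = np.array([1.0, 2.0, 2.5, 3.0, 100.0])

# min-max normalization: rescales to [0, 1] without assuming a bell curve
x_minmax = (x - x.min()) / (x.max() - x.min())

# standardization: centers on the mean and divides by the std dev,
# which is most meaningful when the data is roughly Gaussian
x_std = (x - x.mean()) / x.std()
```

min-max output is always bounded in [0, 1]; standardized output is unbounded and, on skewed data like this, the z-scores don’t mean what the Gaussian intuition suggests.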
practical deep learning tip: if you have serious outliers in your dataset, you may need to “clamp” your z-scores after standardization, since a single extreme value can otherwise produce a huge input to the network. min-max normalization is a bit safer in deep learning in this respect, because every value is guaranteed to land between 0 and 1
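a quick sketch of clamping with numpy. the data is synthetic, and the ±3 threshold is a common convention, not something prescribed in the thread:

```python
import numpy as np

# 20 typical values plus one extreme outlier (synthetic example)
x = np.array([10.0] * 20 + [500.0])

z = (x - x.mean()) / x.std()

# clamp ("clip") z-scores so the outlier can't blow up the input range;
# +/-3 is a common convention, tune it for your data
z_clamped = np.clip(z, -3.0, 3.0)

# min-max, by contrast, is bounded in [0, 1] by construction
x_minmax = (x - x.min()) / (x.max() - x.min())
```

note that with min-max the outlier still compresses the inliers toward 0, so clamped z-scores and min-max make different trade-offs.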
practical SWE tip: treat your preprocessing pipeline as an artifact, just like the model. your pipeline becomes data -> preprocessor -> model -> output. this way you don’t need to copy-paste preprocessing code when productionizing; you just load the saved preprocessor
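a minimal sketch of the idea with pickle and a tiny hand-rolled scaler. `MinMaxPreprocessor` here is a stand-in for whatever real preprocessor you use (e.g. a sklearn scaler), not code from the thread:

```python
import pickle
import numpy as np

class MinMaxPreprocessor:
    """Tiny stand-in for a real preprocessor (e.g. a sklearn scaler)."""
    def fit(self, X):
        self.min_ = X.min(axis=0)
        self.range_ = X.max(axis=0) - self.min_
        return self

    def transform(self, X):
        return (X - self.min_) / self.range_

# fit once at training time...
prep = MinMaxPreprocessor().fit(np.array([[0.0], [5.0], [10.0]]))

# ...serialize it alongside the model as an artifact...
blob = pickle.dumps(prep)

# ...and load the exact same transformation in production,
# instead of re-implementing it by hand
prep_loaded = pickle.loads(blob)
out = prep_loaded.transform(np.array([[5.0]]))
```

the key property: the statistics learned at training time (min, range) travel with the artifact, so production inputs are transformed identically to training inputs.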
it’s mind-blowing that very little of this stuff is taught in school! guess it’s because academics work with perfect, vetted datasets 😏
practical ML tip: you *must* do some form of normalization before kNN or linear models (logistic regression, etc). these models are sensitive to the scale of each feature! it matters less for tree-based models, but it still doesn’t hurt
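here’s why scale matters for kNN specifically, sketched with three made-up points (numbers are invented for illustration): on raw [income, age] features the income axis dominates the Euclidean distance, and min-max scaling can flip which point is the nearest neighbor.

```python
import numpy as np

# three hypothetical points with features on very different scales:
# [income, age] -- numbers are made up for illustration
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])  # similar income, very different age
c = np.array([60_000.0, 26.0])  # different income, nearly the same age

def dist(u, v):
    """Euclidean distance, as used by vanilla kNN."""
    return float(np.sqrt(((u - v) ** 2).sum()))

# unscaled, the income axis dominates: b looks much closer to a than c does
d_ab, d_ac = dist(a, b), dist(a, c)

# after min-max scaling each feature, both features contribute
# comparably, and the nearest neighbor of a flips from b to c
X = np.stack([a, b, c])
Xs = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_ab_s, d_ac_s = dist(Xs[0], Xs[1]), dist(Xs[0], Xs[2])
```

the same reasoning applies to linear models: without scaling, the coefficient on a large-scale feature has to be tiny, which distorts regularization and makes coefficients hard to compare.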
You can follow @sh_reya.