practical MLE tip: if you know your distribution isn’t Gaussian, min-max normalize instead of standardizing https://twitter.com/svpino/status/1318930792232456192
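a minimal sketch of the two options with scikit-learn (the toy feature values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [10.0]])  # toy non-Gaussian feature

# standardization: zero mean, unit variance -- implicitly assumes a roughly Gaussian shape
X_std = StandardScaler().fit_transform(X)

# min-max normalization: rescales to [0, 1] with no distributional assumption
X_mm = MinMaxScaler().fit_transform(X)
```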
practical deep learning tip: if you have serious outliers in your dataset, you may need to “clamp” your z-scores after standardization. min-max normalization is slightly more robust in deep learning because all values are guaranteed to land between 0 and 1, so no single input can blow up activations or gradients
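for example, clamping z-scores with numpy after standardizing (the ±3 cutoff here is an assumed, common choice, not from the tweet):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # 100.0 is a serious outlier

z = StandardScaler().fit_transform(X)

# "clamp" the z-scores so the outlier can't dominate training;
# the +/-3 cutoff is a judgment call
z_clamped = np.clip(z, -3.0, 3.0)
```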
practical SWE tip: treat your preprocessing pipeline as an artifact, just like the model, so your pipeline becomes data -> preprocessor -> model -> output. this way you don’t need to copy-paste preprocessing code when productionizing; you can just load the preprocessor
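a rough sketch of this pattern with scikit-learn’s Pipeline and joblib (the toy dataset and the filename are assumptions):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, y_train = make_classification(n_samples=200, random_state=0)

# preprocessing + model bundled as one artifact: data -> preprocessor -> model -> output
pipe = Pipeline([
    ("preprocessor", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# persist the whole pipeline...
joblib.dump(pipe, "pipeline.joblib")

# ...and in production, just load it -- no copy-pasted preprocessing code
pipe = joblib.load("pipeline.joblib")
preds = pipe.predict(X_train[:5])  # raw features in, predictions out
```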
it’s mind-blowing that very little of this stuff is taught in school! guess it’s because academics work with perfect, vetted datasets

practical ML tip: you *must* do some form of normalization before kNN or linear models (logistic regression, etc.), because these models are sensitive to the scale of each feature! scaling is less important for tree-based models but still doesn’t hurt
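a minimal illustration with kNN (the toy features are made up to exaggerate the scale mismatch):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# feature 0 spans ~0-1, feature 1 spans ~100-9000
X = np.array([[0.1, 9000.0], [0.2, 100.0], [0.9, 8000.0], [0.8, 200.0]])
y = np.array([0, 0, 1, 1])

# without scaling, feature 1 completely dominates the distance metric
knn_raw = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# with scaling, both features contribute to the distances
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1)).fit(X, y)
```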