Before PCA (i.e., SVD), I preprocess with three principles:
1) sqrt any features that are counts. log any feature with a heavy tail.
2) localization is noise. *regularize* when you normalize.
3) and my favorite rule, the Cheshire cat rule

explanations in the 🧵 below: https://twitter.com/seanjtaylor/status/1297706506196905985
1)

count data and data with heavy tails can mess up the PCA.

PCA prefers things that are "homoscedastic" (which is my favorite word to ASMR and I literally do it in class)

sqrt and log are "variance stabilizing transformations". They typically fix it!
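In case it's useful, here is a minimal Python/numpy sketch of rule 1 (not code from this thread; the toy data are made up, and I use log1p instead of a bare log to dodge zeros):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.poisson(3.0, size=500),         # a count feature
    rng.lognormal(0.0, 1.0, size=500),  # a heavy-tailed feature
    rng.normal(0.0, 1.0, size=500),     # an already well-behaved feature
]).astype(float)

X[:, 0] = np.sqrt(X[:, 0])    # sqrt for counts
X[:, 1] = np.log1p(X[:, 1])   # log (log1p to handle zeros) for heavy tails

X_c = X - X.mean(axis=0)      # center, then PCA via SVD
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)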
2) localization

if you make a histogram of a component (or loading) vector and it has really big outliers, that is localization. It's bad. It means the vector is noise.

Here is a better diagnostic that my lab uses: https://github.com/karlrohe/LocalizationDiagnostic
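If you just want a quick programmatic version of the histogram check, here is a rough heuristic sketch in Python (this is not the code in the LocalizationDiagnostic repo; the kurtosis cutoff is an arbitrary illustration):

import numpy as np
from scipy.stats import kurtosis

def looks_localized(v, kurtosis_cutoff=10.0):
    # flag a component/loading vector whose mass sits on a few huge entries
    v = np.asarray(v, dtype=float)
    return kurtosis(v, fisher=True) > kurtosis_cutoff

# usage: check each right singular vector from an SVD
# flagged = [k for k in range(Vt.shape[0]) if looks_localized(Vt[k])]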
To address localization, I would suggest normalizing by *regularized* row/column sums. This works like fucking magic. Not even kidding.

Before learning this trick from @kamalikac I had given up on spectral techniques.
Let A be your matrix. Define rs to contain the row sums and cs to contain the column sums. Define

D_r = Diagonal(1 / sqrt(rs + mean(rs)))
D_c = Diagonal(1 / sqrt(cs + mean(cs)))

Do SVD on

D_r A D_c

Adding mean(rs) and mean(cs) is what makes it regularized.
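In numpy, that recipe looks roughly like this (my sketch, assuming a dense nonnegative A; for big sparse matrices you would swap np.linalg.svd for scipy.sparse.linalg.svds):

import numpy as np

def regularized_svd(A, k=10):
    # SVD of D_r A D_c with regularized row/column sums
    A = np.asarray(A, dtype=float)
    rs = A.sum(axis=1)                   # row sums
    cs = A.sum(axis=0)                   # column sums
    d_r = 1.0 / np.sqrt(rs + rs.mean())  # adding mean(rs) is the regularization
    d_c = 1.0 / np.sqrt(cs + cs.mean())
    L = d_r[:, None] * A * d_c[None, :]  # D_r A D_c without building diagonal matrices
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]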
If you want to know why it works so well... this is my best shot:

paper: https://papers.nips.cc/paper/8262-understanding-regularized-spectral-clustering-via-graph-conductance.pdf

thread">https://papers.nips.cc/paper/826... on the paper: https://twitter.com/karlrohe/status/1011269017582137346?s=20

YouTube">https://twitter.com/karlrohe/... summary of the paper: https://www.youtube.com/watch?v=lOCoa3hYR4Y&feature=youtu.be

Again,">https://www.youtube.com/watch... the diagnostic to assess localization: https://github.com/karlrohe/LocalizationDiagnostic">https://github.com/karlrohe/...
3) the Cheshire cat rule.

“One day Alice came to a fork in the road and saw a Cheshire cat in a tree. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ was his response. ‘I don’t know,’ Alice answered. ‘Then,’ said the cat, ‘it doesn’t matter.’”
In unsupervised learning, we often don't quite know where we are going.

So, is it ok to down-weight, discard, or interact the features? Try it out and see where it takes you!