I have a Gaussian distribution like this in 2d. You know its variance is 1 but don't know its mean. I randomly pick a point (x₁,x₂) according to this distribution and tell you. You try to guess the mean.

Your best guess is (x₁,x₂).

But this is not true in 3d!!!

(1/n)
This is called Stein's paradox, and it shocked the world of statistics in 1956!

But I need to state it precisely. An "estimator" is a function that provides a guess of the mean given the sample point x∈Rⁿ. We seek an estimator with smallest average squared error.

(2/n)
It's easy to create an estimator that does well *sometimes*. For example, suppose your estimator ignores the sample point and always guesses the mean is 0. Then if the mean of my Gaussian actually is zero, you'll do great!

We want an estimator that does well *always*.

(3/n)
We say one estimator "strictly dominates" another if its average squared error is never bigger, regardless of the Gaussian's mean - and it's actually smaller for at least one choice of the Gaussian's mean.

Got it? 🧐

(4/n)
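To pin down that definition, the risk and the dominance relation can be written out (standard notation, not from the thread):

```latex
% Risk of an estimator \delta at true mean \theta, with X \sim N(\theta, I_n):
R(\theta, \delta) \;=\; \mathbb{E}_{X \sim N(\theta, I_n)}\,\bigl\|\delta(X) - \theta\bigr\|^{2}.

% \delta strictly dominates \delta' when
R(\theta, \delta) \le R(\theta, \delta') \quad \text{for all } \theta \in \mathbb{R}^n,
\qquad
R(\theta_0, \delta) < R(\theta_0, \delta') \quad \text{for some } \theta_0.
```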
In 2d, no estimator strictly dominates the obvious one, where you guess the mean is the sample point (x₁,x₂) that I've randomly chosen from my Gaussian distribution.

In 3 or more dimensions, there ARE estimators that dominate the obvious one!!! Utterly shocking!!!

(5/n)
For example, in 3d you can take the sample point x = (x₁,x₂,x₃), move towards the origin by a distance 1/||x||, and take that as your estimate of the mean. This estimator strictly dominates the obvious one where you guess x.

INSANE!!!

Proof:

https://en.wikipedia.org/wiki/Proof_of_Stein%27s_example

(6/n)
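You can check the 3d claim numerically. A quick Monte Carlo sketch (the true mean (1,2,3) is just an illustrative choice, unknown to the guesser):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, 2.0, 3.0])   # hypothetical true mean

# Draw many sample points x ~ N(mean, I) in 3d.
x = rng.normal(mean, 1.0, size=(200_000, 3))

# Obvious estimator: guess x itself.
# Stein-type estimator: move x toward the origin by distance 1/||x||,
# i.e. multiply x by (1 - 1/||x||^2).
norm2 = np.sum(x**2, axis=1, keepdims=True)
stein = x * (1 - 1 / norm2)

err_obvious = np.mean(np.sum((x - mean)**2, axis=1))
err_stein = np.mean(np.sum((stein - mean)**2, axis=1))
print(err_obvious, err_stein)   # the Stein-type error comes out smaller
```

The obvious estimator's average squared error is exactly 3 (the trace of the covariance); the shrunken one beats it for every choice of the true mean, though by less as the mean gets far from the origin.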
The proof is just a computation.

In fact you don't need to move toward the *origin*. You could choose *any* point p and always move the sample point x towards that point by a distance 1/||x-p||. This estimator strictly dominates the obvious one.

(7/n)
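The same Monte Carlo experiment works shrinking toward any fixed point p (again a sketch; p and the true mean are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([1.0, 2.0, 3.0])   # hypothetical true mean
p = np.array([3.0, 0.0, 1.0])      # an arbitrary fixed point to shrink toward

x = rng.normal(mean, 1.0, size=(200_000, 3))

# Move x toward p by distance 1/||x - p||.
d2 = np.sum((x - p)**2, axis=1, keepdims=True)
shifted = p + (x - p) * (1 - 1 / d2)

err_obvious = np.mean(np.sum((x - mean)**2, axis=1))
err_shifted = np.mean(np.sum((shifted - mean)**2, axis=1))
print(err_obvious, err_shifted)
```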
So today my mind is a wreck. "Common sense" has been disproved, and I haven't yet found the new intuitions that make the truth seem reasonable to me. This is the life of a scientist. I've always sought this out. Things always make sense eventually.

(8/n)
One strange clue. Larry Brown showed that no estimator can strictly dominate the obvious one in n dimensions if and only if n-dimensional Brownian motion is "recurrent" - i.e. with probability one it returns arbitrarily close to its starting point. This is true only for n < 3.

(9/n)
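You can see Brown's dichotomy numerically with simple random walks, a discrete stand-in for Brownian motion: count how often a walk revisits its start. A sketch; the step cutoff is arbitrary, so the 2d fraction only creeps toward 1 while the 3d fraction stays bounded away from it:

```python
import numpy as np

rng = np.random.default_rng(2)

def return_fraction(dim, n_walks=1000, n_steps=4000):
    """Fraction of simple lattice random walks that revisit their start."""
    returned = 0
    for _ in range(n_walks):
        axes = rng.integers(0, dim, size=n_steps)   # which coordinate moves
        signs = rng.choice([-1, 1], size=n_steps)   # which direction
        steps = np.zeros((n_steps, dim), dtype=int)
        steps[np.arange(n_steps), axes] = signs
        pos = np.cumsum(steps, axis=0)              # walk positions
        returned += bool(np.any(np.all(pos == 0, axis=1)))
    return returned / n_walks

frac2d = return_fraction(2)   # recurrent: tends to 1 as n_steps grows
frac3d = return_fraction(3)   # transient: return probability is about 0.34
print(frac2d, frac3d)
```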
Larry Brown's argument is here:

https://projecteuclid.org/euclid.aoms/1177693318

Here's a nice intro to Stein's paradox:

http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf

Do any of you intuitively understand why reducing variance at the expense of introducing bias can help? Hmm, maybe I get it... 🤔

(10/n, n = 10)
You can follow @johncarlosbaez.