I have a Gaussian distribution in 2d. You know its variance is 1 in each coordinate (with independent coordinates), but you don't know its mean. I randomly pick a point (x₁,x₂) according to this distribution and tell you. You try to guess the mean.

Your best guess is (x₁,x₂).

But this is not true in 3d!!!

(1/n)
This is called Stein's paradox, and it shocked the world of statistics in 1956!

But I need to state it precisely. An "estimator" is a function that provides a guess of the mean given the sample point x ∈ Rⁿ. We seek an estimator with the smallest average squared error, where the average is over the randomness of the sample point.
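
Here's a tiny numerical sketch of that average squared error (my own illustration, assuming numpy; the helper name "risk" is mine, not standard):

    import numpy as np

    rng = np.random.default_rng(0)

    def risk(estimator, mean, n_trials=200_000):
        # Monte Carlo estimate of the average squared error E||estimator(x) - mean||²
        # for one sample x ~ N(mean, I).
        mean = np.asarray(mean, dtype=float)
        x = rng.normal(mean, 1.0, size=(n_trials, mean.size))  # one sample per trial
        return np.mean(np.sum((estimator(x) - mean) ** 2, axis=1))

    obvious = lambda x: x  # the obvious estimator: guess the sample point itself

    print(risk(obvious, [1.0, 2.0]))  # ≈ 2.0: in n dimensions its risk is n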

(2/n)
It's easy to create an estimator that does well *sometimes*. For example, suppose your estimator ignores the sample point and always guesses the mean is 0. Then if the mean of my Gaussian actually is zero, you'll do great!

We want an estimator that does well *always*.
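
Concretely, here's that zero-guessing estimator under the risk sketch above (again my own illustration, reusing the same numpy setup):

    zero = lambda x: np.zeros_like(x)  # ignore the sample, always guess the origin

    print(risk(zero, [0.0, 0.0]))  # ≈ 0: perfect when the true mean really is 0
    print(risk(zero, [5.0, 5.0]))  # ≈ 50: awful otherwise - the error is ||mean||²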

(3/n)
We say one estimator "strictly dominates" another if its average squared error is never bigger, regardless of the Gaussian's mean - and it's actually smaller for at least one choice of the Gaussian's mean.

Got it? 🧐

(4/n)
In 2d, no estimator strictly dominates the obvious one, where you guess the mean is the sample point (x₁,x₂) that I've randomly chosen from my Gaussian distribution.

In 3 or more dimensions, there ARE estimators that dominate the obvious one!!! Utterly shocking!!!

(5/n)
For example, in 3d you can take the sample point x = (x₁,x₂,x₃), move it towards the origin by a distance 1/||x||, and take that as your estimate of the mean. (Equivalently: multiply x by 1 − 1/||x||². This is the James-Stein estimator.) It strictly dominates the obvious one where you guess x.

INSANE!!!
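
Before the proof, a numerical sanity check of the domination claim (my own sketch, reusing the risk helper from above; "james_stein" is my name for the rule):

    def james_stein(x):
        # Scale x by (1 - 1/||x||²), i.e. move it toward the origin by 1/||x||.
        norm_sq = np.sum(x ** 2, axis=1, keepdims=True)
        return (1.0 - 1.0 / norm_sq) * x

    obvious = lambda x: x
    for mean in ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 0.0, 0.0]):
        print(mean, risk(obvious, mean), risk(james_stein, mean))
    # The james_stein column comes out below 3.0 (the obvious estimator's risk
    # in 3d) at every mean tried: never worse, strictly better. Domination.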

Proof:

https://en.wikipedia.org/wiki/Proof_of_Stein%27s_example

(6/n)
The proof is just a computation.

In fact you don't need to move toward the *origin*. You could choose *any* point p and always move the sample point x towards that point by a distance 1/||x-p||. This estimator strictly dominates the obvious one.
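
Same check for an arbitrary shrinkage target p (my own sketch again, built on the risk helper above):

    def shrink_toward(p):
        p = np.asarray(p, dtype=float)
        def estimator(x):
            d = x - p
            norm_sq = np.sum(d ** 2, axis=1, keepdims=True)
            return p + (1.0 - 1.0 / norm_sq) * d  # move toward p by 1/||x - p||
        return estimator

    # Even a wild choice of p beats guessing x, whatever the true mean is:
    print(risk(shrink_toward([7.0, -2.0, 3.0]), [1.0, 1.0, 1.0]))  # < 3.0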

(7/n)
So today my mind is a wreck. "Common sense" has been disproved, and I haven't yet found the new intuitions that make the truth seem reasonable to me. This is the life of a scientist. I've always sought this out. Things always make sense eventually.

(8/n)
One strange clue. Larry Brown showed that no estimator can strictly dominate the obvious one in n dimensions if and only if n-dimensional Brownian motion is "recurrent" - i.e. with probability one it keeps coming back arbitrarily close to its starting point. This is true only for n < 3.
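
A crude way to see that threshold (a sketch of mine: it uses simple random walks as a discrete stand-in for Brownian motion, and only watches finitely many steps):

    import numpy as np

    rng = np.random.default_rng(1)

    def return_fraction(dim, n_walks=500, n_steps=5000):
        # Fraction of walks (each step: ±1 along a random axis) that revisit
        # their starting point within n_steps.
        pos = np.zeros((n_walks, dim), dtype=int)
        came_back = np.zeros(n_walks, dtype=bool)
        for _ in range(n_steps):
            axis = rng.integers(0, dim, size=n_walks)
            sign = rng.choice([-1, 1], size=n_walks)
            pos[np.arange(n_walks), axis] += sign
            came_back |= ~pos.any(axis=1)
        return came_back.mean()

    for dim in (1, 2, 3):
        print(dim, return_fraction(dim))
    # 1d and 2d keep creeping toward 1 as n_steps grows (recurrent);
    # 3d stalls around 0.34 however long you wait (transient).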

(9/n)
Larry Brown's argument is here:

https://projecteuclid.org/euclid.aoms/1177693318

Here's a nice intro to Stein's paradox:

http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf

Do any of you intuitively understand why reducing variance at the expense of introducing bias can help? Hmm, maybe I get it... 🤔
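
One way to poke at that numerically (my sketch, assuming the james_stein helper from earlier): split the mean squared error into squared bias plus variance at one fixed mean.

    mean = np.array([1.0, 1.0, 1.0])
    x = rng.normal(mean, 1.0, size=(200_000, 3))

    for name, guesses in (("obvious", x), ("james_stein", james_stein(x))):
        bias_sq = np.sum((guesses.mean(axis=0) - mean) ** 2)  # squared bias
        variance = np.sum(guesses.var(axis=0))                # total variance
        print(name, bias_sq, variance, bias_sq + variance)
    # Shrinking biases the guesses toward the origin, but it cuts the variance
    # by more than the squared bias it adds, so the total error goes down.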

(10/n, n = 10)