OK, quick thread on the 3 major types of statistical issues in seroprevalence studies—that is, studies that try to figure out how many people had COVID-19, like the one released in NY yesterday. These are:

1) Test accuracy (false negatives/positives)
2) Sample selection
3) Lags
Test accuracy has gotten the most discussion here. These tests aren't perfect. They produce false positives (it says you had it when you didn't) and false negatives (it says you didn't when you did).
Intuitively, the false positive problem should be easy to see. If a test produces, say, 1-2% false positives among healthy people, then results saying that 2-3% of people tested positive in a given region just won't really tell you very much.
Technically, the issue is not the false positive rate per se but that *we don't know what the true false positive rate is*. If we knew false positives were *exactly* 1%, we could correct for it. But it could be 0.5% or 2.5% or who knows—maybe not what the manufacturer claims.
Unless these tests are *very* accurate—and they probably aren't, since we're still in the early stages of this crisis—this renders antibody studies not terribly useful in places where the underlying incidence of the disease is low (say, under 5%).
However, this should be less of a problem in places that have had a medium-to-bad epidemic; say, Belgium or London—or certainly NYC. Studies in those places should therefore be more reliable.

Further, there is *also* the issue of false negatives.
You can follow @NateSilver538.