This thread racked up over 34K retweets. It's an expertly written thread that makes many important points. But its central premise – that incorrectly pooling data is the key problem – is at best misleading and at worst wrong. #epitwitter #statstwitter 1/13 https://twitter.com/mbeckett/status/1278750652160634880?s=20
It's an insidious example of armchair epidemiology: “The experts can't (fully) explain what's going on, but I can. It's simple. The real insight is from [waves hands] [insert cool paradox here].” Let me explain why. 2/13
I'm a data scientist, health economist and public health policy professor. My research uses rigorous statistical methods to draw accurate conclusions from non-random-sample data. Much of my work develops data-driven policy for another deadly respiratory disease: #tuberculosis. 3/13
Most of the tweets in this “Trillion $ Question” thread make excellent points. But this sets up a dangerous cognitive trap. You let your guard down, accept the main point without deeply examining it, click ❤️ and get back to your weekend. Which, to be honest, I almost did. 4/13
Experts in epidemiology, biostatistics and health policy have offered 3 main explanations for the southern surge. Instead of just adding his thoughts to the discussion, Becker mostly dismisses the experts' explanations to elevate his main point. 5/13 https://twitter.com/mbeckett/status/1278750656438784001?s=20
He proposes that experts are looking at the data in the wrong way. Only he avoided the pitfall. 6/13 https://twitter.com/mbeckett/status/1278750657495707648?s=20
Simpson's paradox isn't really a paradox. At its heart it’s about confounding, not pooling. Splitting the sample by factor X is analogous to adjusting/controlling for factor X. So you’d be unwisely comparing the full sample (unadjusted for X) with sub-samples (adjusted for X). 7/13
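To make the confounding point concrete, here is a minimal sketch. The counts, the period labels ("early", "late") and the age groups are invented for illustration and are not real COVID-19 data. The death rate rises within each age group, yet the pooled rate falls, simply because the age mix of cases shifts.

```python
# Hypothetical counts, invented purely for illustration (not real data).
# Keys: (period, age_group) -> (cases, deaths)
counts = {
    ("early", "young"): (100, 1),
    ("early", "old"): (900, 90),
    ("late", "young"): (900, 18),
    ("late", "old"): (100, 12),
}


def stratum_rate(period, age):
    """Deaths per case within one age group (i.e. adjusted for age)."""
    cases, deaths = counts[(period, age)]
    return deaths / cases


def pooled_rate(period):
    """Deaths per case across all ages (i.e. unadjusted for age)."""
    cases = sum(c for (p, _), (c, _) in counts.items() if p == period)
    deaths = sum(d for (p, _), (_, d) in counts.items() if p == period)
    return deaths / cases


def standardized_rate(period, weights):
    """Direct standardization: apply one fixed age mix to both periods."""
    return sum(w * stratum_rate(period, age) for age, w in weights.items())


for period in ("early", "late"):
    print(
        period,
        "young:", stratum_rate(period, "young"),
        "old:", stratum_rate(period, "old"),
        "pooled:", pooled_rate(period),
        "standardized:", standardized_rate(period, {"young": 0.5, "old": 0.5}),
    )
```

In this toy setup the pooled rate drops between periods while both age-specific rates and the age-standardized rate go up. The disagreement comes from comparing an unadjusted number with adjusted ones, not from pooling as such.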
But even when you split the data by county as he suggests, the curves are affected by (1) changes in the composition of cases and (2) the time lag between case detection and hospitalization/death, both of which are mostly addressed by the 3 explanations the experts have raised. 8/13 https://twitter.com/mbeckett/status/1278750656438784001?s=20
Patterns in the data depend on who we see (who is exposed, who gets tested, how severe their illness is) and when we see them (lead-time bias, severity), so they are unlikely to fully reflect the underlying patterns. In other words, we face time-varying selection bias, regardless of pooling. 9/13
If you look at the data and fail to take into account either changes in the demographic composition of cases or changes in the lag between case detection and hospitalization/death, you risk drawing the wrong conclusions and making poor predictions. 10/13
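The lag problem can be sketched the same way. Every number here is an assumption chosen for illustration: a fixed 2% fatality rate, a 14-day lag from detection to death, and 5% daily growth in cases. While cases are growing, dividing today's deaths by today's cases understates the fatality rate, even though nothing about the disease has changed.

```python
# All parameters are hypothetical, chosen only to illustrate the lag effect.
TRUE_CFR = 0.02   # assumed share of detected cases that eventually die
LAG_DAYS = 14     # assumed delay between case detection and death
GROWTH = 1.05     # assumed daily growth factor in detected cases
DAYS = 60

# Detected cases grow exponentially; deaths echo the cases from LAG_DAYS earlier.
cases = [100 * GROWTH ** t for t in range(DAYS)]
deaths = [
    TRUE_CFR * cases[t - LAG_DAYS] if t >= LAG_DAYS else 0.0
    for t in range(DAYS)
]


def naive_ratio(t):
    """Same-day deaths divided by same-day cases (ignores the lag)."""
    return deaths[t] / cases[t]


def lag_adjusted_ratio(t, lag=LAG_DAYS):
    """Deaths on day t divided by the cases detected `lag` days earlier."""
    if t < lag:
        raise ValueError("need at least `lag` days of history")
    return deaths[t] / cases[t - lag]


day = DAYS - 1
print("naive:", naive_ratio(day))
print("lag-adjusted:", lag_adjusted_ratio(day))
```

With these assumed values the naive ratio comes out at roughly half the true rate, while the lag-adjusted ratio recovers it. In real data the lag is not a single known number, which is part of why the experts' explanations are more cautious.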
Pointing to Simpson’s paradox as “the simple answer” leads people to draw the wrong conclusions about how to interpret the data. Most perniciously, it discounts the experts’ explanations, which are more complex and more cautious, as they should be. 11/13
Everyone wants a simple answer. But experts rarely provide one when there are many forces at work. They know that giving credence to one simple answer can blind us to the rest of the story. 12/13
Time-varying selection effects give us every reason to believe that the next 2 months are going to be brutal for the south. Becker and I strongly agree on this point. The die is cast and the storm is coming. 13/13 https://twitter.com/mbeckett/status/1278750662038302720?s=20
You can follow @ZoeMcLaren.