There’s a new paper circulating today about “risk factors” for COVID19 which is getting misinterpreted in a pretty common way: applying conclusions about causation to results obtained via methods designed only for finding correlations.

It’s time for a #tweetorial!
Here is the study that inspired this tweetorial. 👇🏼

They looked at a truly huge number of people presenting to medical care in the UK and then compared how common it was for people to die in hospital from COVID across a whoooooole bunch of different types of people. https://twitter.com/StfnFlsch/status/1258401834995273728
Based on those comparisons, they highlight some characteristics which correlated with COVID death as potentially carrying risks or benefits.

Some agree with what we already know: eg older age, certain comorbidities.

Others are counterintuitive: especially “current smoker” status
Why is that counterintuitive? Current smokers should be expected to, on average, have less healthy lungs than never smokers (and maybe even former smokers), and we know COVID19 can kill people by attacking their lungs.
This is where the “disease detective” skill set of an epidemiologist comes in.

An important principle of epidemiology is that we don’t just want to calculate numbers, we want to develop an understanding of the world.
That means we should never just accept the numeric results at face value.

Instead, we think of all the ways we could have gotten that number by accident or mistake, and we try to rule them out, and see what we learn along the way.
Okay so let’s look more closely at these smoking results.

This is a table from the paper. The first column of numbers comes from a logistic regression of COVID death on smoking adjusting for age and sex only. The second column adjusts for a whole big list of other variables.
What do these numbers mean?

Well, the 1.80 number means that, in this sample, the ex-smokers who presented to care were 80% more likely to die in hospital from COVID over the study period when compared to never smokers in that same sample and with the same age & sex.
When we look at the current smokers, the first numeric column tells us *they* were 25% more likely to die than never smokers of the same age and sex in the sample.

....Wait a minute!!! You’re probably thinking “hey, didn’t you just say current smokers were protected??”
I sure did. And that’s what a lot of people are saying about this study, and it’s where the common interpretation mistake comes in.

Look at the second column👇🏼

This gives us estimates of the same quantity but now comparing people who are the same on a huge number of features.
If we compare current smokers to never smokers, with the same age, sex, heart disease, lung disease, cancer, weight, etc etc, then the current smokers were 14% less likely to die in hospital with COVID.

This sounds like an apples to apples comparison, but it’s not!
There are lots of things that might be different about current and never smokers that the researchers didn’t include—that is, confounders.

For example, social determinants of health, parental smoking habits, circumstances of childhood & young adulthood, etc.
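To make the confounding idea concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the `deprived` flag is a hypothetical stand-in for an unmeasured confounder like social determinants of health, and none of the probabilities come from the paper. In this toy world smoking has *no* effect on death at all.

```python
import random

def risk_ratio(rows):
    """Risk of death in smokers divided by risk of death in non-smokers."""
    deaths = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for smoker, death in rows:
        totals[smoker] += 1
        deaths[smoker] += death
    return (deaths[True] / totals[True]) / (deaths[False] / totals[False])

def simulate_confounding(n=200_000, seed=1):
    """Toy data: 'deprived' raises both smoking and death; smoking does nothing."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        deprived = rng.random() < 0.3                          # hypothetical confounder
        smoker = rng.random() < (0.5 if deprived else 0.15)    # confounder -> smoking
        death = rng.random() < (0.10 if deprived else 0.02)    # confounder -> death only
        people.append((deprived, smoker, death))
    return people

people = simulate_confounding()

# Ignoring the confounder: smokers look like they are at higher risk.
crude = risk_ratio([(s, d) for _, s, d in people])

# Comparing like with like on the confounder: the apparent effect goes away.
within_deprived = risk_ratio([(s, d) for u, s, d in people if u])
within_not_deprived = risk_ratio([(s, d) for u, s, d in people if not u])

print(f"crude risk ratio:            {crude:.2f}")
print(f"risk ratio, deprived:        {within_deprived:.2f}")
print(f"risk ratio, not deprived:    {within_not_deprived:.2f}")
```

The crude comparison shows an association that vanishes once we compare people with the same value of the confounder. The catch in real data is that we can only do this for confounders we actually measured.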
Now, I *think* the authors aren’t actually trying to estimate a causal effect & instead want to identify things that could help us find high-risk people.

So confounding might not matter for them (see linked tweetorial), except that *other people* are interpreting this causally. https://twitter.com/EpiEllie/status/1214641734900224003
The problem is, the authors used the same set of adjustment factors for all the characteristics they were interested in.

This is where we get into trouble with a causal interpretation: they adjust for age and sex, and also for *all the other characteristics they are assessing*
Let’s add one of those other characteristics to our schematic & see why that’s problematic👇🏼

Uh-oh!

Heart disease is not really a direct confounder for current smoking and COVID hospital death—it’s actually sometimes *caused* by smoking.
There are two problems in this schematic that appear when we adjust for heart disease.

First, heart disease due to long-term smoking might be part of *how* smoking affects COVID death (if it does), and by adjusting we remove that part of the causal effect from our result.
Second, because we are comparing people who are never & current smokers but have the same heart disease status, there must be some other cause of heart disease among the never smokers.

Here, I use social determinants of health as the other cause, but I bet you can think of more!
If the other cause of heart disease affects COVID19, hospitalization, &/or death in hospital with COVID, we have a problem😱

Adjusting for heart disease will make current smoking look related to COVID hospital death *even if it’s not*!

This is “M bias”—a type of collider bias.
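Here is a minimal simulation sketch of that second problem, following the cartoon schematic described above. All of the numbers are invented for illustration, and `deprived` is a hypothetical stand-in for "social determinants of health". In this toy world smoking has *no* effect on death; it only causes heart disease.

```python
import random

def risk_ratio(rows):
    """Risk of death in smokers divided by risk of death in non-smokers."""
    deaths = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for smoker, death in rows:
        totals[smoker] += 1
        deaths[smoker] += death
    return (deaths[True] / totals[True]) / (deaths[False] / totals[False])

def simulate_collider(n=200_000, seed=0):
    """Toy data: smoking -> heart disease <- deprived -> death."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        deprived = rng.random() < 0.3          # independent of smoking here
        # Heart disease has two causes, so it is a collider on this path.
        p_heart = 0.05 + 0.30 * smoker + 0.30 * deprived
        heart_disease = rng.random() < p_heart
        # Death depends on deprivation only; smoking is deliberately left out.
        death = rng.random() < (0.10 if deprived else 0.02)
        people.append((smoker, heart_disease, death))
    return people

people = simulate_collider()

# No adjustment: smoking and death are unrelated, as they should be.
crude = risk_ratio([(s, d) for s, _, d in people])

# "Adjusting" by comparing only people with the same heart disease status.
with_hd = risk_ratio([(s, d) for s, h, d in people if h])
without_hd = risk_ratio([(s, d) for s, h, d in people if not h])

print(f"crude risk ratio:                {crude:.2f}")
print(f"risk ratio, heart disease:       {with_hd:.2f}")
print(f"risk ratio, no heart disease:    {without_hd:.2f}")
```

Among people with heart disease, the non-smokers are more likely to have gotten it through the other cause, which is the one that actually raises the risk of death. So within that group smoking ends up looking protective, even though it does nothing in this simulation.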
Bringing this all back together we can see that there are two main errors here.

First, the paper was not clear enough about the descriptive goals of its analysis.

They don’t define causal questions (see 👇🏼tweetorial for more), and they aren’t trying to obtain causal answers. https://twitter.com/EpiEllie/status/1071949100290068483
I used a very simple cartoon schematic to explain the results, but the real set of relationships is much more complicated.

@David_Simons_UK tried to draw out relationships between all the characteristics the authors considered. https://twitter.com/David_Simons_UK/status/1258543096461004801
If the set of relationships in his schematic is true, then it implies the *only* confounder of the relationship between current smoking and hospital death with COVID19 among the characteristics they considered was ethnicity.
And if *that* is true, then none of the results for smoking in the table provide us with a valid and believable estimate of the causal effect of smoking on hospital death with COVID19 in the UK, or even whether there’s any effect!

(Even ignoring the whole causal question issue!) https://twitter.com/EpiEllie/status/1071949100290068483
To wrap up, I want to stress that just because a study isn’t designed to identify or estimate causal effects does not mean it doesn’t have value.

Descriptive epidemiology is very important, & a key part of pandemic response. We need studies like this to help explore the data.
Estimating causal effects is important too, but it’s hard and takes time.

Right now, we need rapid information on COVID19, like who to expect to show up at our hospitals.

This study can *help* scientists with that but it’s *not* the final word.

Tl;dr: Don’t start smoking!
You can follow @EpiEllie.