There are a lot of things I could say about this article, but in the interest of fairness, the authors at least aren’t wrong about the need to use the best, most valid inputs we can find for our models.
Epidemiologists & modelers know this well. https://twitter.com/statnews/status/1386077576330944521
But, if we want to know whether action A or action B will lead to the best outcome (eg, the fewest covid cases in school children, teachers, & staff), simply looking at the data is not enough.
This is a causal question & we need causal methods https://academic.oup.com/aje/article/186/2/131/3904485?login=true
We have great causal methods for understanding real-world data when we want to make decisions about chronic disease, but there are several problems with using them to decide which action to take in a pandemic. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
The most important problem, IMO, is that the *only* way to get real-world data about an infectious disease is to let real-world infections happen.
That means letting real-world hospitalizations, real-world deaths, and real-world long covid happen too!
This is WHY we use models: they can help us make intelligent predictions about what the data might have been able to tell us, WITHOUT letting those infections, deaths, and illnesses happen.
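To make that concrete, here is a minimal sketch of the idea, not any model from the papers linked in this thread. It is a toy deterministic SIR model that compares two hypothetical actions, where the only difference between them is an assumed transmission rate. Every number (the `beta` values, `gamma`, the population size) is made up for illustration.

```python
def simulate_sir(beta, gamma=0.1, population=1000.0, initial_infected=1.0, days=365):
    """Cumulative infections from a toy discrete-time SIR model (daily steps)."""
    s = population - initial_infected  # susceptible
    i = initial_infected               # currently infectious
    cumulative = initial_infected      # everyone ever infected so far
    for _ in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        cumulative += new_infections
    return cumulative


# Hypothetical actions: A leaves transmission as is, B halves it (assumed values).
action_a = simulate_sir(beta=0.30)
action_b = simulate_sir(beta=0.15)

print(f"Action A: about {action_a:.0f} infections per 1000 people")
print(f"Action B: about {action_b:.0f} infections per 1000 people")
```

The comparison happens entirely inside the model, so nobody has to get sick to produce it. The catch is that the answer is only as good as the inputs, which is why the choice of `beta` under each action matters so much.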
The second problem is that all of our fantastic methods that help us decide on action A vs B for chronic disease start to fall apart when we’re dealing with infectious disease.
Infections violate a core condition for our causal methods to work: independence between individuals.
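Here is a rough sketch of what that violation (often called interference) looks like, using a toy two-group SIR model with invented parameters. The function name `unvaccinated_attack_rate` and all values are assumptions for illustration only. It tracks the share of *unvaccinated* people who get infected as vaccine coverage among everyone else changes.

```python
def unvaccinated_attack_rate(coverage, beta=0.3, gamma=0.1, efficacy=0.9,
                             population=1000.0, days=365):
    """Share of unvaccinated people infected in a toy two-group SIR model.

    Assumes one unvaccinated seed case, so coverage must leave room for it.
    """
    seed = 1.0
    start_unvaccinated = population * (1 - coverage)
    s_u = start_unvaccinated - seed   # unvaccinated susceptibles
    s_v = population * coverage       # vaccinated susceptibles
    i = seed                          # currently infectious (either group)
    for _ in range(days):
        force = beta * i / population           # per-person daily infection risk
        new_u = force * s_u
        new_v = force * (1 - efficacy) * s_v    # vaccine reduces susceptibility
        s_u -= new_u
        s_v -= new_v
        i += new_u + new_v - gamma * i
    return 1 - s_u / start_unvaccinated


risk_low_coverage = unvaccinated_attack_rate(coverage=0.2)
risk_high_coverage = unvaccinated_attack_rate(coverage=0.8)

print(f"Unvaccinated risk at 20% coverage: {risk_low_coverage:.2f}")
print(f"Unvaccinated risk at 80% coverage: {risk_high_coverage:.2f}")
```

If individuals were independent, an unvaccinated person's risk would not change with other people's vaccination status. In this toy model it does, and that dependence is exactly what standard causal methods assume away.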
This is a big problem, but it’s not one without solutions.
Many very smart epidemiologists have worked out fantastic causal tools for understanding what we can expect when we take action A vs B for infectious diseases. https://pubmed.ncbi.nlm.nih.gov/28133589/
Even better, we can combine those causal infectious disease tools with our causal modeling tools. https://academic.oup.com/aje/advance-article-abstract/doi/10.1093/aje/kwab040/6140873
Here’s an example of where we did this for HIV prevention: https://academic.oup.com/aje/advance-article-abstract/doi/10.1093/aje/kwaa239/5943465?redirectedFrom=fulltext
Unfortunately, a causal model requires causal inputs, and these aren’t easy to get.
Sometimes, they are actually *impossible* for us to estimate from any data, even a randomized trial. https://journals.sagepub.com/doi/full/10.1177/0272989X19894940
When that happens, we often couldn’t answer the question using real-world data either, even if we were willing to wait for it.
But we can use a little bit of real-world data to improve our causal models. https://journals.sagepub.com/doi/abs/10.1177/0272989X17738753
How to make good decisions about hard questions using the best data we have, but without waiting for more people to get sick and die so that we could have perfect data, is my scientific area of expertise.
Several papers in this
are my best attempts to get better answers sooner.
I have been studying how to do this for more than a decade, and there are still many things I & others don’t know yet.
The one thing I definitely know: “real-world” data are never simple.