1. When plotting epidemic curves or death totals, should we divide by population size? Here on twitter this question has generated a lot more heat than light.

The answer is a bit subtle and so while I’ve tweeted about this before I want to address it in more detail.
2. Unfortunately this issue has become politicized because if you start at some fixed number of deaths and look at total cases, the pandemic in the US looks among the worst in the world.
3. If instead you divide by population size to show prevalence (and if you omit China and a few other nations from the map) the US looks quite good.
4. So which is right? Should you divide by population size, or not? I think you can, but you need to start each country from a constant *fraction* of cases, not a constant number of cases. Then you can see prevalence data, but you are not misled by comparisons between countries.
5. Today I was talking with my colleague Ben Kerr @evokerr about how best to illustrate what is going on with these different graphs, and he came up with a very nice framework for thinking about the problem. I'd like to share that, and his illustrations, here.
6. Why might per capita cases (infected individuals or fatalities) in a large country such as the U.S. appear to be increasing more slowly than per capita cases in smaller countries, even when there is no difference in the actual rate of increase?
7. Imagine a large country. Different regions experience an epidemic at different times; in the diagram below, the country comprises four regions. The first outbreak is in the green region. Later, an infected individual moving south starts the outbreak in the blue region.
8. After further delays, dispersal initiates outbreaks in the purple and then orange regions as well. Within our four-region country, the number of cases is doubling every time step. Once an outbreak begins in any region, all else equal, it also increases approximately two-fold each time step.
9. For instance, if we realign the blue region such that we start its timeline upon its first case, we can see the epidemic curves (cases as a function of time) for the country and this region are in lockstep with one another.
10. Now if we plot the per capita cases for the country from the time of its first case, and compare this to the per capita cases for each region *started at their respective first cases*, it looks like the country's outbreak is much milder than those in its own regions.
11. What is going on here is that the starting position for the country as a whole is not properly aligned at the left side of the graph with the starting positions for its various regions. The above graph that @evokerr put together is really useful to see what is going on.
12. At the time of the first case in the country, the per-capita prevalence for the country includes a set of disease-free regions. The y-intercept for the country is lower than the intercepts for each smaller region at the starts of their respective outbreaks.
13. While the difference in intercepts looks small because of the linear scaling on the y-axis, exponential growth quickly reveals the initial differences in starting heights.
14. In this example the difference in epidemic trajectories is spurious. The regions and the country are increasing at the same rate—but different regions are doing it asynchronously. This asynchrony translates to a problematic comparison when graphing per-capita cases.
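Here is a minimal sketch of the four-region example in Python. The region population, the outbreak start times, and the exact doubling rule are my own illustrative assumptions, not numbers taken from @evokerr's graphs.

```python
# Toy model of the four-region country described above.
# All numbers are hypothetical, chosen only for illustration.
REGION_POP = 1_000_000        # assumed population of each region
START_TIMES = [0, 2, 4, 6]    # assumed time step of each region's first case
STEPS = 12


def region_cases(t):
    """Cases in one region t steps after its first case (doubling each step)."""
    return 2 ** t if t >= 0 else 0


def country_cases(t):
    """Total cases in the country t steps after the country's first case."""
    return sum(region_cases(t - s) for s in START_TIMES)


country_pop = REGION_POP * len(START_TIMES)

# Both series are indexed by "steps since own first case".
region_per_capita = [region_cases(t) / REGION_POP for t in range(STEPS)]
country_per_capita = [country_cases(t) / country_pop for t in range(STEPS)]

for t in range(STEPS):
    ratio = country_per_capita[t] / region_per_capita[t]
    print(f"t={t:2d}  region={region_per_capita[t]:.2e}  "
          f"country={country_per_capita[t]:.2e}  country/region={ratio:.2f}")
```

Under these assumptions the country's per-capita curve sits below the region's at every step, at roughly a quarter to a third of its height, even though both are doubling each step once all four regions are seeded.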
15. Now, in the real world, different countries or regions may be experiencing different rates of epidemic spread. But if one wants to make an apples-to-apples comparison, then one would compare per capita trajectories from the same starting fraction infected, like this.
16. In other words, it's OK to look at per-capita measures, but if you are plotting things on a linear scale, you need to make sure that the starting points are aligned in terms of their per-capita values. Anything else runs the risk of misleading your viewers.
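A rough sketch of that alignment, again in Python: each per-capita series is started at the first time step where it reaches a common threshold fraction. The helper `align_at_fraction`, the threshold of 10 cases per million, and the toy four-region model are all hypothetical choices of mine for illustration.

```python
# Same hypothetical toy model: four equal regions, staggered starts,
# cases doubling each step after a region's first case.
REGION_POP = 1_000_000
START_TIMES = [0, 2, 4, 6]
STEPS = 16
THRESHOLD = 1e-5              # assumed starting fraction: 10 cases per million


def region_cases(t):
    """Cases in one region t steps after its first case."""
    return 2 ** t if t >= 0 else 0


def align_at_fraction(per_capita, threshold):
    """Drop the points before the series first reaches the threshold fraction."""
    for i, value in enumerate(per_capita):
        if value >= threshold:
            return per_capita[i:]
    return []  # threshold never reached


country_pop = REGION_POP * len(START_TIMES)
region_pc = [region_cases(t) / REGION_POP for t in range(STEPS)]
country_pc = [sum(region_cases(t - s) for s in START_TIMES) / country_pop
              for t in range(STEPS)]

region_aligned = align_at_fraction(region_pc, THRESHOLD)
country_aligned = align_at_fraction(country_pc, THRESHOLD)

for k, (r, c) in enumerate(zip(region_aligned, country_aligned)):
    print(f"k={k:2d}  region={r:.2e}  country={c:.2e}  country/region={c / r:.2f}")
```

With the starting points aligned this way, the two curves begin within a factor of two of each other and stay there. The remaining gap comes only from the discrete time steps in this toy model, not from any difference in growth rate.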
17. Finally, for some audiences, it's natural to use a logarithmic scale on the vertical axis. When doing so, you get the best of all worlds: slopes reflect growth rates, heights reflect prevalence, and relative timing is maintained. ( https://datausa.io/coronavirus )
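To see why the log scale behaves this way, suppose as an idealization (my assumption, not a claim about any real dataset) that cases grow exponentially, $C(t) = C_0 e^{rt}$, in a population of size $N$. Then

$$\log\frac{C(t)}{N} = \log C_0 - \log N + r\,t.$$

Dividing by population shifts the line down by $\log N$ but leaves the slope $r$ untouched, so on a log axis the growth rate can be read from the slope regardless of how the curves are aligned.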
18. I hope this helps clear up some of the confusion swirling around the use of per-capita versus absolute measures when plotting epidemic curves. The answer, as I see it, is that you can do either—so long as you do so properly. Thanks again to @evokerr for the ideas and graphs.
You can follow @CT_Bergstrom.