I've been hearing a lot of arguments out there about the graphs tracking the progress of COVID-19 — in particular people who insist that they "MUST MAKE PER CAPITA ADJUSTMENT".

Let me explain why treating this as the best way to display the data is *a model assumption*.
... and it's one that is almost certainly wrong early on in the progress of COVID-19 or in countries that are effective at containing the spread.

Let's say this is a human network:
The per capita normalization makes sense on this network if the distribution of infections looks something like this — i.e. most people in the network are close to an affected person.

That is to say, it's an assumption that COVID-19 is already widely distributed.
However, if the distribution of people affected (pink) looks more like this (widespread within strongly connected subgraphs, but with bottlenecks between them rather than spread throughout the network), then per capita normalization misrepresents the rate, and totals are a better representation.
The reason per capita normalization misrepresents the rate in this case is that you end up dividing by a total that includes (possibly large) unaffected subgraphs (dashed circle).

It's not too different from randomly adding the population of Nigeria to Italy's and dividing.
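
To put rough numbers on that dilution, here is a minimal sketch. All figures are hypothetical and chosen only for illustration (they are not real case counts), and the variable names are my own: an outbreak confined to one tightly connected cluster, with a much larger unaffected subgraph included in the same reporting region.

```python
# Hypothetical numbers, for illustration only (not real data).
cluster_population = 1_000      # people in the strongly connected, affected subgraph
unaffected_population = 9_000   # people in subgraphs the outbreak has not reached
cases = 100                     # all cases are inside the affected cluster

# Rate among the people who are actually exposed to the outbreak.
rate_within_cluster = cases / cluster_population

# Per capita rate over the whole region, unaffected subgraphs included.
rate_whole_region = cases / (cluster_population + unaffected_population)

# How much the regional denominator dilutes the within-cluster rate.
dilution_factor = (cluster_population + unaffected_population) / cluster_population

print(rate_within_cluster, rate_whole_region, dilution_factor)
```

The case count is identical in both calculations; only the choice of denominator changes, and that choice is the model assumption.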
That's what makes the per capita normalization a model-dependent assumption. Either COVID-19 —
1. Is already widely spread in all closely connected network components (per capita is good)
2. Is not yet widely spread or is mitigated by e.g. social distancing (per capita is bad)
To put it in more explicit terms, doing a per capita adjustment for WA state includes me in the denominator despite the fact that I've been at home for weeks.

But in truth, neither 1 nor 2 is a perfect model, so either representation can show useful information.
PS: looking at rates on log graphs implicitly makes the per capita adjustment if COVID-19 is widespread, and doesn't if it's not.

That's because d/dt log X = (1/X) dX/dt so that if X ~ population, you are dividing by X ~ population.
But that can also make comparison hard because the same mathematical operation (taking a logarithm) on data from widespread COVID-19 vs just in subgraphs introduces a model-dependent component.
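
As a quick numerical check of the identity d/dt log X = (1/X) dX/dt, here is a small sketch on a synthetic series. The exponential curve, the growth rate, and the function names are assumptions made for illustration, not fitted to any real data.

```python
import math

def log_slope(series, dt=1.0):
    """Finite-difference estimate of d/dt log X between consecutive points."""
    return [(math.log(b) - math.log(a)) / dt for a, b in zip(series, series[1:])]

def relative_rate(series, dt=1.0):
    """Finite-difference estimate of (1/X) dX/dt between consecutive points."""
    return [((b - a) / dt) / a for a, b in zip(series, series[1:])]

# Synthetic cumulative counts growing exponentially (hypothetical values).
growth_rate = 0.05
counts = [100 * math.exp(growth_rate * day) for day in range(10)]

slopes = log_slope(counts)
relative = relative_rate(counts)

# The two estimates agree up to finite-difference error, which shrinks with dt.
for s, rel in zip(slopes, relative):
    print(round(s, 4), round(rel, 4))
```

The slope on the log graph is the change divided by the current level X, so what that slope means depends on what X is tracking: something close to the whole population, or just an affected subgraph.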

It may be different for different countries!
You can follow @infotranecon.