When folks estimated a herd immunity threshold (HIT), not only did many miss how heterogeneity in transmission and susceptibility lowers the HIT, but they may also have missed important biases in how we estimate R0:

Thread:
Estimation of R0 essentially boils down to measuring an exponential growth rate. When cases go 1 -> 2 -> 4 -> 8 on days 0 -> 2 -> 4 -> 6, we measure R0 by noting the epidemic is doubling every 2 days, and incorporate some additional info on how long it takes for one person to infect another.
The time it takes for one person to infect another (the serial interval / generation-time distribution) is reasonably well estimated, but the way we estimate growth rates in cases and deaths may be biased if R0 varies across regions.
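(To make that concrete, here's a minimal sketch in Python of how a doubling time plus a generation time turn into an R0 estimate. The fixed 5-day generation time is an illustrative assumption; real estimates use the full serial-interval/generation-time distribution, e.g. Wallinga & Lipsitch 2007.)

```python
import numpy as np

def growth_rate_from_doubling_time(doubling_time_days):
    """Exponential growth rate r (per day) implied by a doubling time."""
    return np.log(2) / doubling_time_days

def r0_from_growth_rate(r, generation_time_days=5.0):
    """R0 implied by growth rate r, assuming a *fixed* generation time.

    This is the simplest special case of R = 1 / M(-r), where M is the
    moment-generating function of the generation-interval distribution.
    """
    return np.exp(r * generation_time_days)

r = growth_rate_from_doubling_time(2.0)  # doubling every 2 days
print(f"r = {r:.2f}/day, implied R0 = {r0_from_growth_rate(r):.1f}")
```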
We measured the growth rates of epidemics early on by looking at how fast cases grew in a country. The problem? A country's case counts quickly come to reflect its fastest-growing local epidemic.

If you measure temperature in a room with a fire and an explosion, T quickly looks like the explosion.
Cases early in the epidemic came from densely populated metropolitan areas: Wuhan, Milan, New York City, the whole NE corridor megalopolis, etc. So measuring R0 early in the epidemic likely runs the same risk as measuring the infection fatality rate (IFR) from confirmed cases: we're biased towards observing the worst.
Mathematically, this is a consequence of measuring growth rates in pooled counts. I showed this effect in the supplementary information of a paper on how to infer the rate of spillover when pooling events from multiple sources: our inferences look like the dominant source.

https://royalsocietypublishing.org/doi/10.1098/rstb.2018.0331
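(A quick toy simulation of that pooling effect, not taken from the paper: two hypothetical regional epidemics with true growth rates of 0.10/day and 0.35/day, pooled into one national count, with a naive log-linear fit on top.)

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(30)                        # days since the start of reporting
r_true = np.array([0.10, 0.35])          # per-day growth rates of two regions
x0 = np.array([20.0, 20.0])              # equal initial expected case counts

# Expected and observed (Poisson-noised) daily counts in each region
expected = x0[:, None] * np.exp(np.outer(r_true, t))
observed = rng.poisson(expected)
pooled = observed.sum(axis=0)            # what the national case curve looks like

# Naive growth-rate estimate: slope of log(pooled counts) against time
r_pooled = np.polyfit(t, np.log(pooled), 1)[0]
print(f"pooled estimate:    {r_pooled:.3f} per day")
print(f"average true rate:  {r_true.mean():.3f} per day")
# The pooled estimate lands near the faster region's 0.35/day,
# well above the 0.225/day average of the true rates.
```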
Consider epidemic i with expected cases x(i,t) on day t. When we estimate growth rates of counts for epidemic i via our usual tools (generalized linear models and their relatives), we work with the log of expected counts,

r(i,t) = log(x(i,t))

and read off the growth rate as the slope of r(i,t) over time.

But when we pool cases, they have mean

X(t)=x(1,t)+...+x(n,t)
And when we estimate how X(t) grows exponentially, we work with

r(t)=log(X(t))
=log(x(1,t)+...+x(n,t))
=log(exp(r(1,t))+...+exp(r(n,t)))

The right side above looks ugly, but it has a name and an intuition: it's called the LogSumExp function, and it's a kind of "smooth maximum".
A smooth maximum function takes a set of values (here, the log-counts r(i,t)) and outputs something close to the largest of them.

https://en.m.wikipedia.org/wiki/Smooth_maximum
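(Spelling out the step that connects log-counts to growth rates, using only the definitions above: differentiate the LogSumExp in time.)

```latex
\frac{d\,r(t)}{dt}
  = \frac{d}{dt}\log\big(x(1,t)+\cdots+x(n,t)\big)
  = \sum_{i=1}^{n}\frac{x(i,t)}{X(t)}\,\frac{d\,r(i,t)}{dt}
```

So the pooled growth rate is a case-share-weighted average of the individual growth rates, and because those weights are exactly the softmax of the log-counts, they pile onto the fastest-growing epidemic as it comes to dominate X(t).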
Imagine you bought a portfolio of three stocks, and one earned +8% a year while the other two earned +2%. As the fast-growing stock comes to dominate your portfolio, your portfolio's average growth and day-to-day jumps begin to resemble those of the fastest-growing stock. Same idea!
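(A toy version of that portfolio in Python, with hypothetical numbers and continuously compounded rates for simplicity.)

```python
import numpy as np

rates = np.array([0.08, 0.02, 0.02])      # annual growth rates of three stocks
value0 = np.array([100.0, 100.0, 100.0])  # equal initial investments

for years in (1, 10, 30, 60):
    value = (value0 * np.exp(rates * years)).sum()
    annualized = np.log(value / value0.sum()) / years
    print(f"{years:>2} yr horizon: portfolio growth ~ {annualized:.3f}/yr")
# Starts near the 4% average and creeps toward the 8% of the dominant stock.
```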
When we measured R0 early in the epidemic, our estimates looked more like the "maximum" epidemic: the one with the fastest growth rate.

Not only should we be suspicious about HITs estimated assuming homogeneity, but we should also revisit R0 given likely biases in early estimates.
Deaths in NY doubled every 2.6 days in March, whereas WA and CA doubled far more slowly, and Japan doubled every 7 days. Different places and populations clearly have different r(i,0) and R0, and consequently different HITs, and our early estimates may have been biased towards the maximum/worst cases.
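(Plugging those doubling times into the same simple conversion as before, again with an illustrative fixed 5-day generation time.)

```python
import numpy as np

generation_time = 5.0                        # days; an illustrative assumption
doubling_times = {"NY deaths, March": 2.6, "Japan": 7.0}

for place, t_double in doubling_times.items():
    r = np.log(2) / t_double                 # growth rate per day
    r0 = np.exp(r * generation_time)         # fixed-generation-time approximation
    hit = 1 - 1 / r0                         # homogeneous-mixing HIT, for reference
    print(f"{place}: r = {r:.2f}/day, implied R0 = {r0:.1f}, naive HIT = {hit:.0%}")
```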
For a simple take-home: the growth rate of an average/pool of exponentially growing processes is not equal to the average of their growth rates.

The R0 of the pooled ("average") epidemic isn't the average of the R0s; it's closer to the highest R0 in the set.

Science!!!
You can follow @Alex_Washburne.