What is Case Fatality Rate and how do you calculate it?

A thread.
This seems trivial, right? What percentage of your cases die? That's a simple question.

And after coronavirus has run its course it will be fairly simple. CFR=deaths/cases.

(Note that cases is also just recoveries+deaths).

Done.
Note: because it is "case" fatality rate, we can ignore for the purpose of this discussion all those people who were never diagnosed. The reality is that some of those undiagnosed people will die, but to make this simple, we're only interested in cases, not infections.
But what about just calculating the CFR now from available data? Or at least, estimating it.

Everybody's first dumb attempt is dumb and wrong:
CFR = current deaths/current cases
Most of us realize that this is wrong because the current cases were mostly just diagnosed in the last few days, and they haven't had time to die yet. And because of exponential growth, the number of current cases who will eventually die is DRAMATICALLY higher than current deaths.
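To put a rough number on that, here is a minimal sketch of a toy outbreak. Everything in it is an assumption made up for illustration, not real data: a 3% true CFR, new cases growing 20% a day, and every death landing exactly 14 days after diagnosis.

```python
TRUE_CFR = 0.03        # assumed fraction of cases that eventually die
GROWTH = 1.2           # assumed daily growth factor for new cases
DAYS_TO_DEATH = 14     # assumed fixed delay from diagnosis to death
DAYS = 40              # length of the toy outbreak so far

# New cases diagnosed on each day (kept fractional for simplicity).
new_cases = [100 * GROWTH ** t for t in range(DAYS)]

# Deaths reported on day t come from cases diagnosed DAYS_TO_DEATH days earlier.
new_deaths = [
    TRUE_CFR * new_cases[t - DAYS_TO_DEATH] if t >= DAYS_TO_DEATH else 0.0
    for t in range(DAYS)
]

# The "first attempt": current deaths divided by current cases.
naive_cfr = sum(new_deaths) / sum(new_cases)
print(f"true CFR {TRUE_CFR:.1%}, naive estimate {naive_cfr:.2%}")
```

In this toy model the naive estimate comes out more than ten times too low, purely because most of the cases in the denominator are too new to have died yet.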
So this leads to...
Everybody's second dumb attempt:
CFR = current deaths/(current recoveries+deaths)

Well that seems better, right? It's only resolved cases and that's what we want, right?

No.
The problem here is similar to the first problem, but in the opposite direction.

After diagnosis (the point at which people become "cases"), death happens much sooner than recovery. One to two *weeks* sooner.

So you're still not seeing complete case outcomes.
You get deaths being VERY overrepresented, as recoveries came from cases three to four weeks ago, and deaths came from cases that were new last week (i.e. a lot more cases to select from).
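A minimal sketch of that bias, again with invented numbers: a 3% true CFR, 20% daily growth, deaths exactly 14 days after diagnosis and recoveries exactly 24 days after diagnosis (so deaths arrive 10 days sooner). Real delays vary from case to case, so treat this as an illustration only.

```python
TRUE_CFR = 0.03          # assumed fraction of cases that eventually die
GROWTH = 1.2             # assumed daily growth factor for new cases
DAYS_TO_DEATH = 14       # assumed fixed delay from diagnosis to death
DAYS_TO_RECOVERY = 24    # assumed fixed delay from diagnosis to recovery
DAYS = 40

new_cases = [100 * GROWTH ** t for t in range(DAYS)]

def resolved_by(day: int, delay: int) -> float:
    """Cases diagnosed early enough to have resolved by `day`, given a fixed delay."""
    return sum(new_cases[: max(day - delay + 1, 0)])

today = DAYS - 1
deaths = TRUE_CFR * resolved_by(today, DAYS_TO_DEATH)
recoveries = (1 - TRUE_CFR) * resolved_by(today, DAYS_TO_RECOVERY)

# The "second attempt": deaths divided by closed cases.
closed_case_cfr = deaths / (deaths + recoveries)
print(f"true CFR {TRUE_CFR:.1%}, closed-case estimate {closed_case_cfr:.1%}")
```

Here the estimate lands several times above the true value, because the deaths are drawn from a much larger and more recent pool of cases than the recoveries.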
So there are some more sophisticated ways of calculating CFR that people advocate for, like this one:
CFR=deaths today/cases n days ago.

Where n is the average number of days it takes for someone to die after diagnosis.
Well that's great but first of all, if we get n wrong by even one day, because of exponential growth, we're going to get wildly different numbers.

But also, starting n days ago, and counting deaths today, we're assuming the rest will all recover and that's a horrible assumption.
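The sensitivity to n can be sketched with a toy outbreak. The numbers are assumptions for illustration: 3% true CFR, 20% daily growth, and a death delay of exactly 14 days. With a perfectly fixed delay the formula works when n is right, which real data never guarantees.

```python
from itertools import accumulate

TRUE_CFR = 0.03        # assumed fraction of cases that eventually die
GROWTH = 1.2           # assumed daily growth factor for new cases
DAYS_TO_DEATH = 14     # assumed fixed delay from diagnosis to death
DAYS = 40

new_cases = [100 * GROWTH ** t for t in range(DAYS)]
new_deaths = [
    TRUE_CFR * new_cases[t - DAYS_TO_DEATH] if t >= DAYS_TO_DEATH else 0.0
    for t in range(DAYS)
]
total_cases = list(accumulate(new_cases))    # running totals, as usually reported
total_deaths = list(accumulate(new_deaths))

def lagged_cfr(n: int) -> float:
    """Total deaths today divided by total cases n days ago."""
    today = DAYS - 1
    return total_deaths[today] / total_cases[today - n]

for n in (12, 13, 14, 15, 16):
    print(f"n={n}: {lagged_cfr(n):.2%}")
```

Each day of error in n moves the estimate by roughly the daily growth factor, so being off by one day in either direction already spans a range of more than 40% in this sketch.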
To overcome this problem, instead of focusing on lag between diagnosis and death, you can focus on lag between death and recovery:
CFR=deaths x days ago/(recoveries today + deaths x days ago).
Where x is the extra time to wait after those earlier deaths to see whether the remaining cases recover.
The idea is that recoveries today + deaths x days ago is your actual cases. These cases are all closed, and no other cases are expected to close. So this is better.

But again, this is very fragile depending on the value of x.
For example if I use x as 9 days I get 5.8% CFR. 10 days gives 4.1% and 11 days gives 3.5%.
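The same fragility shows up in a toy model. This sketch does not use the real data behind the percentages above; it assumes a 3% true CFR, 20% daily growth, deaths exactly 14 days after diagnosis and recoveries exactly 24 days after, so the "right" x would be 10.

```python
from itertools import accumulate

TRUE_CFR = 0.03          # assumed fraction of cases that eventually die
GROWTH = 1.2             # assumed daily growth factor for new cases
DAYS_TO_DEATH = 14       # assumed fixed delay from diagnosis to death
DAYS_TO_RECOVERY = 24    # assumed fixed delay from diagnosis to recovery
DAYS = 40

new_cases = [100 * GROWTH ** t for t in range(DAYS)]
new_deaths = [
    TRUE_CFR * new_cases[t - DAYS_TO_DEATH] if t >= DAYS_TO_DEATH else 0.0
    for t in range(DAYS)
]
new_recoveries = [
    (1 - TRUE_CFR) * new_cases[t - DAYS_TO_RECOVERY] if t >= DAYS_TO_RECOVERY else 0.0
    for t in range(DAYS)
]
total_deaths = list(accumulate(new_deaths))
total_recoveries = list(accumulate(new_recoveries))

def shifted_cfr(x: int) -> float:
    """Deaths x days ago divided by (recoveries today + deaths x days ago)."""
    today = DAYS - 1
    old_deaths = total_deaths[today - x]
    return old_deaths / (total_recoveries[today] + old_deaths)

for x in (8, 9, 10, 11, 12):
    print(f"x={x}: {shifted_cfr(x):.2%}")
```

Only x=10 recovers the assumed 3%. A smaller x overshoots and a larger x undershoots, which is the same direction of drift as in the real numbers above.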

But there are still big problems with all of these methods.
The first problem is that most people are naively using totals instead of new cases and deaths on those days. (The numbers I found above were with this naive calculation).

An easy fix but inadequate because...
The real problem with everything above is that daily totals cannot reliably tell us anything about individual case outcomes.

Some people die sooner, some later. Some recover more quickly than others. Some cases were diagnosed early, and some diagnosed later.
And because of exponential growth, there's no simple assumption or average like "n days of lag" that will correctly and accurately adjust for those variations. Mathematically those averages just don't work.
Especially since those variations themselves change over time. When testing is good we catch more cases early. When testing is behind, as it was early on, and as it is right now, we catch more and more cases late.
But there is a way to accurately and correctly calculate CFR right now. We just need data that is not publicly available.
If you had a database of all individual cases and their individual status, start date, end date, outcome, it would again be trivial to calculate CFR:
CFR=deaths/cases**

** but ONLY looking at cases that are all old enough to have resolved regardless of outcome.
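If such a database existed, the calculation might look something like this sketch. The `Case` record, the `cohort_cfr` function, the 28-day window, and the sample rows are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Case:
    diagnosed: date
    outcome: Optional[str]  # "died", "recovered", or None if still open

def cohort_cfr(cases: list, as_of: date, min_age_days: int = 28) -> float:
    """CFR over cases diagnosed at least min_age_days before as_of."""
    cutoff = as_of - timedelta(days=min_age_days)
    cohort = [c for c in cases if c.diagnosed <= cutoff]
    if not cohort:
        raise ValueError("no cases old enough")
    if any(c.outcome is None for c in cohort):
        # The window was too short: some old cases have not resolved yet.
        raise ValueError("cohort still has open cases; increase min_age_days")
    return sum(c.outcome == "died" for c in cohort) / len(cohort)

# Made-up records, not real data.
cases = [
    Case(date(2020, 3, 1), "recovered"),
    Case(date(2020, 3, 2), "died"),
    Case(date(2020, 3, 3), "recovered"),
    Case(date(2020, 3, 5), "recovered"),
    Case(date(2020, 3, 28), "died"),  # too recent, excluded from the cohort
    Case(date(2020, 3, 30), None),    # too recent, excluded from the cohort
]
print(cohort_cfr(cases, as_of=date(2020, 4, 5)))  # 0.25
```

The key design choice is that cases are selected by diagnosis date alone, never by outcome, so a recent death cannot sneak into the numerator ahead of the slower recoveries from the same period.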
Which highlights another problem. If we did have this data (note that the CDC ought to have it, I would hope), we could only use cases that existed three or four weeks ago. If we have to go back four weeks to ensure 100% case completion, we only had 118 cases then.
Still, the CDC ought to have a pretty good idea of the true CFR if they have that database of individual statuses.

But frankly at this point, I don't trust them to give us accurate information.

I'd much rather have an anonymized database of case statuses available publicly.
Adding on to this thread from early April with a point that I left out. It's there implicitly but it should be said explicitly. Case Fatality Rate depends very heavily on what a "case" is. So comparing CFR between countries, or even states, is probably pointless.
For example, if we think, when all is said and done, our _infection_ fatality rate is 1%, then for people who do a horrible job of testing (like the United States), our CFR could EASILY be 5% or more.
Basically we're defining "case" in this country to mean "you look like you are about to die".
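The arithmetic behind that 1% versus 5% gap can be sketched as follows. The function name and the detection fractions are assumptions for illustration, and the formula assumes every death happens among diagnosed cases, which makes it an upper bound rather than an exact relationship.

```python
def cfr_from_ifr(ifr: float, detected_fraction: float) -> float:
    """CFR implied by an IFR when only a fraction of infections become cases.

    Assumes every death occurs among diagnosed cases, so this is an upper bound.
    """
    if not 0 < detected_fraction <= 1:
        raise ValueError("detected_fraction must be in (0, 1]")
    return ifr / detected_fraction

# Hypothetical detection fractions, from perfect testing down to 10%.
for frac in (1.0, 0.5, 0.2, 0.1):
    cfr = cfr_from_ifr(0.01, frac)
    print(f"{frac:.0%} of infections diagnosed -> CFR {cfr:.1%}")
```

Under these assumptions, diagnosing only one infection in five is enough to turn a 1% infection fatality rate into a 5% case fatality rate.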
You can follow @thomasafine.