There has been a lot of talk about the IHME Covid-19 projection model. @EpiEllie & I have a chat about it in tomorrow's @casualinfer episode; here is a quick description of what is going on, with a focus on the *uncertainty*
https://covid19.healthdata.org/united-states-of-america
1/14
When I look at models, I usually start with two things:
What method is being used?
What data is it based on?
Let's start with the methods!
2/14
The IHME model is estimating the log of the cumulative death rate for a given state at a given time
Using curve fitting¹
parametrized with info about the state's social distancing
--
¹ in particular it is a non-linear mixed effects model https://ihmeuw-msca.github.io/CurveFit/methods/
3/14
Since the IHME model is trying to estimate a *curve* there are two important pieces:
When will deaths "peak"
How many deaths will there be at the "peak"
4/14
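To make the "peak" idea concrete, here is a minimal sketch. It assumes the cumulative curve has a Gaussian error function shape; the parameter names (`alpha`, `beta`, `p`) and all the numbers are made up for illustration and are not IHME estimates. Under that assumption, `beta` sets *when* daily deaths peak and `p * alpha / sqrt(pi)` sets *how high* the peak is.

```python
import math

def cumulative_death_rate(t, alpha, beta, p):
    """Assumed ERF-shaped cumulative curve: p/2 * (1 + erf(alpha * (t - beta)))."""
    return 0.5 * p * (1.0 + math.erf(alpha * (t - beta)))

def daily_death_rate(t, alpha, beta, p):
    """Derivative of the cumulative curve: a Gaussian bump centred on beta."""
    return p * alpha / math.sqrt(math.pi) * math.exp(-((alpha * (t - beta)) ** 2))

# Hypothetical parameter values, chosen only to draw a plausible-looking curve
alpha, beta, p = 0.1, 30.0, 50.0

days = range(0, 61)
# When will deaths peak? The day with the largest daily rate.
peak_day = max(days, key=lambda t: daily_death_rate(t, alpha, beta, p))
# How many deaths at the peak? The daily rate on that day.
peak_height = daily_death_rate(peak_day, alpha, beta, p)
# The quantity described above as being modelled: log of the cumulative rate
log_cumulative_at_peak = math.log(cumulative_death_rate(peak_day, alpha, beta, p))

print(peak_day, round(peak_height, 3), round(log_cumulative_at_peak, 3))
```

In this sketch the peak lands on day `beta`, at which point half of the eventual total `p` has accumulated.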
To estimate these, the IHME model has two sources of info:
the current death rate over time for the state
the social distancing measures being implemented
5/14
This information is combined with some global info as well
In the short run, the model is impacted more by the state's data
In the long run, they use info from locations that have seemingly already reached a peak: Wuhan, 5 in Italy, 2 in Spain
6/14
OKAY now that we know what the IHME model is doing, let's get to the good stuff - where is the uncertainty?
There is uncertainty that the model itself will accurately predict what will happen (it's based on a Gaussian error function - is that right?)
7/14
There is uncertainty in the distributional assumptions of the model
Even if the model is correctly specified, there is uncertainty in the parameter estimation (this is a mixed effects model, so there is uncertainty associated with both the fixed and the random effects)
8/14
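Here is a rough sketch of how parameter uncertainty alone can spread out a projection. It assumes an error-function-shaped curve whose daily peak height is `p * alpha / sqrt(pi)`, and it draws the two parameters independently from normal distributions. The point estimates and standard errors are hypothetical, not IHME output, and a real mixed effects fit would not treat the parameters as independent.

```python
import math
import random

def peak_daily_rate(alpha, p):
    """Peak height of the daily curve under the assumed ERF-shaped cumulative curve."""
    return p * alpha / math.sqrt(math.pi)

random.seed(0)  # fixed seed so the draws are reproducible

# Hypothetical point estimates and standard errors (illustration only)
alpha_hat, alpha_se = 0.10, 0.01
p_hat, p_se = 50.0, 5.0

# Draw parameter values, push each draw through the curve, and sort the results
draws = sorted(
    peak_daily_rate(random.gauss(alpha_hat, alpha_se), random.gauss(p_hat, p_se))
    for _ in range(10_000)
)

# Approximate 2.5% and 97.5% quantiles of the projected peak height
lower, upper = draws[249], draws[9749]
point = peak_daily_rate(alpha_hat, p_hat)

print(f"point estimate {point:.2f}, interval ({lower:.2f}, {upper:.2f})")
```

The interval here reflects only the assumed parameter uncertainty; none of the other sources in this thread are in it.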
There may be systematic uncertainty in the reported state-by-state death data. Why? Fewer deaths may be reported on weekends; if systems are overrun, COVID-19-related deaths may go unreported¹; etc.
--
¹ https://www.npr.org/sections/coronavirus-live-updates/2020/04/08/829506542/after-deaths-at-home-in-nyc-officials-plan-to-count-many-as-covid-19
9/14
There may be random uncertainty in the reported state-by-state death data
There is uncertainty in the reported information coming from cities that seem to have already peaked
10/14
So let's recap on the uncertainty in the IHME model:
model choice
model parameters
model estimation
data from the states (systematic)
data from the states (random)
data from the "peaked" locations
11/14
In the original model (pre-last week) the error bands you saw only accounted for , since then the model was updated so that the uncertainty also accounts for out-of-sample uncertainty, which I believe covers
12/14
The shaded red region in the model is the *uncertainty* the model accounts for, just two of the 6:
model choice
model parameters
model estimation
data from the states (systematic)
data from the states (random)
data from the "peaked" locations
13/14
This is not unusual or bad! It is just good to keep in mind the uncertainty that these projections carry with them. If all of the uncertainty we've talked about today was quantified, it's possible we'd basically have no answers to go off of
https://twitter.com/hspter/status/1246955939946803202?s=20
14/14
Think I missed something important? Please let me know!