This thread is a fun little analysis I did of the CDC's delayed reporting of death statistics. If you are allergic to graphs, you have been warned.
Disclaimer: I have no professional background in pandemic or statistical analysis. This was just a hobby project. (1/16)
Had a conversation with a family member a month ago about the CDC's reported death counts. They noted that despite the rising number of cases, which was all over the media, the number of deaths reported by the CDC kept going down. (2/16)
After some research, I learned that the CDC's death statistics are not accurate right away. It takes time for hospitals and morgues to report deaths and their causes to the CDC, so the number of deaths recorded for a given week keeps increasing the further you get from that week. (3/16)
When you go back and compare the death reports from July to the current death reports, you can start to see the beginning of the second hump. (4/16)
So my questions are:
1) How long must you wait before you can trust CDC death reports?
2) How can we use that to predict the number of deaths the CDC will eventually report?
(5/16)
To start, I collected the reported death counts roughly once a week, going as far back as I could.
Sources: http://archive.org and https://www.cdc.gov/nchs/nvss/vsrr/covid19/index.htm
Next, I made a graph of all the death curves on top of each other to visualize how delayed the reporting was. (6/16)
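A minimal sketch of how the weekly snapshots could be stored and lined up, assuming each snapshot maps a week-ending date to the deaths reported for that week so far. The dates, counts, and the `history_for_week` helper are hypothetical illustrations, not real CDC figures or the thread's actual workflow.

```python
from datetime import date

# Hypothetical snapshots: report date -> {week-ending date: deaths reported so far}.
# These numbers are made up for illustration; they are NOT real CDC figures.
snapshots = {
    date(2020, 7, 7): {date(2020, 6, 20): 4000, date(2020, 6, 27): 2500},
    date(2020, 7, 14): {date(2020, 6, 20): 4600, date(2020, 6, 27): 3700,
                        date(2020, 7, 4): 2100},
}


def history_for_week(snapshots, week):
    """Return (days since the week ended, deaths reported) for every snapshot
    that includes `week`, sorted by lag. Overlaying these per-week histories
    is what makes the reporting delay visible."""
    points = []
    for report_date, counts in snapshots.items():
        if week in counts:
            lag_days = (report_date - week).days
            points.append((lag_days, counts[week]))
    return sorted(points)
```

For example, `history_for_week(snapshots, date(2020, 6, 20))` gives `[(17, 4000), (24, 4600)]`: the same week, seen 17 and then 24 days after it ended.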
For any given week, it takes anywhere from 3 to 6 weeks for the death count to start converging on its final value, and it still gets revised after that. What I needed was a better way to visualize that convergence. (7/16)
My next plot shows how the reported death count converges over time for each week. The x-axis is the number of days between the week in question and the date the data was reported. The y-axis is deaths normalized by the number of deaths reported for that week 6 weeks later. (8/16)
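The normalization step can be sketched like this, assuming each week's history is a list of `(lag_days, deaths)` pairs. Because snapshots were only taken about once a week, this sketch divides by the snapshot closest to 42 days (6 weeks); that nearest-snapshot rule and the `normalize_history` name are assumptions for illustration.

```python
def normalize_history(points, ref_lag_days=42):
    """Divide every reported count by the count reported closest to
    `ref_lag_days` after the week, giving the y-values of the convergence plot.

    `points` is a list of (lag_days, deaths) pairs for a single week.
    Picking the nearest snapshot is an assumption, since snapshots were
    collected only about once a week.
    """
    if not points:
        raise ValueError("no snapshots for this week")
    _, ref_deaths = min(points, key=lambda p: abs(p[0] - ref_lag_days))
    if ref_deaths == 0:
        raise ValueError("reference count is zero; cannot normalize")
    return [(lag, deaths / ref_deaths) for lag, deaths in points]
```

With made-up counts, `normalize_history([(10, 500), (24, 800), (42, 1000), (56, 1040)])` returns `[(10, 0.5), (24, 0.8), (42, 1.0), (56, 1.04)]`: values below 1 are still-incomplete counts, and values above 1 show revisions after week 6.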
An exponential curve was fit through the points using JMP. I tried normalizing at more and fewer than 6 weeks, but 6 weeks was a good balance between accuracy and the number of available data points. (Never said I was a good statistician.) (9/16)
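The fit itself was done in JMP, and its exact model isn't spelled out here. As a rough stand-in, this sketch assumes a saturating exponential y = a - b·exp(-c·x) (x = lag in days, y = normalized deaths) and fits it in plain Python: for each candidate rate c the model is linear in a and b, so those are solved by ordinary least squares, and the c with the smallest squared error wins. The model form, the rate grid, and the function name are all assumptions.

```python
import math


def fit_saturating_exponential(xs, ys, rates=None):
    """Least-squares fit of y = a - b*exp(-c*x) via a grid search over c.

    For a fixed c, y = a + b*z with z = -exp(-c*x), which is a straight line
    in z, so a and b come from simple linear regression.
    """
    if rates is None:
        rates = [i / 1000 for i in range(1, 501)]  # c from 0.001 to 0.5 per day
    n = len(xs)
    mean_y = sum(ys) / n
    best = None
    for c in rates:
        zs = [-math.exp(-c * x) for x in xs]
        mean_z = sum(zs) / n
        szz = sum((z - mean_z) ** 2 for z in zs)
        if szz == 0:
            continue  # all x identical: slope is undefined for this c
        b = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / szz
        a = mean_y - b * mean_z
        sse = sum((y - (a + b * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, a, b, c = best
    return a, b, c
```

Splitting the problem this way avoids any dependencies; with SciPy available, `scipy.optimize.curve_fit` would be the more usual tool.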
One way to read the equation is that in the long term (x → ∞), deaths approach 7.5% more than the count reported at week 6. The equation for predicting total deaths is derived from this one and is shown below. (10/16)
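One plausible way to turn the convergence curve into a predictor, sketched under assumptions: take the fitted curve as f(x) = a - b·exp(-c·x) with a = 1.075 (the 7.5% long-term excess over week 6). If D(x) deaths have been reported x days after a week, then f(x) ≈ D(x)/D(42), so the week-6 count is about D(x)/f(x) and the long-term total is about a·D(x)/f(x). The values of b and c below are hypothetical placeholders, not the fitted constants, and this may differ from the exact equation in the thread's image.

```python
import math

# a = 1.075 reflects the 7.5% long-term excess over week 6.
# b and c are HYPOTHETICAL placeholders, not the actual fitted values.
A, B, C = 1.075, 0.9, 0.05


def fraction_of_week6(lag_days, a=A, b=B, c=C):
    """Fitted curve: deaths reported after `lag_days`, relative to the week-6 count."""
    return a - b * math.exp(-c * lag_days)


def predict_total_deaths(reported, lag_days, a=A, b=B, c=C):
    """Scale a partial count up to the predicted long-term total.

    reported / f(lag) estimates the week-6 count; multiplying by the
    asymptote a then gives the x -> infinity value.
    """
    f = fraction_of_week6(lag_days, a, b, c)
    if f <= 0:
        raise ValueError("fitted curve is non-positive at this lag")
    return reported * a / f
```

The correction shrinks as the lag grows: early counts get scaled up a lot, while a count taken long after the week is returned almost unchanged.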
I applied the prediction algorithm to various historical points to understand its behavior. The predictions both overestimate and underestimate, depending on the situation. Bottom right shows the original graph from July 7th; the prediction was unable to anticipate the second hump. (11/16)
Every prediction was compared to the most current estimates (for weeks at least 4 weeks old) to determine its accuracy (some week 1 data points are off screen). The standard deviations of these predictions were also calculated (note the log y-axis). (12/16)
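A minimal sketch of that accuracy check, assuming each prediction is recorded as `(lag_weeks, predicted, current_estimate)` and that error is measured relative to the current estimate. Whether the thread used relative or absolute error isn't stated, so that choice, along with the function name, is an assumption.

```python
import statistics
from collections import defaultdict


def error_by_lag_week(records):
    """Summarize prediction accuracy for each lag week.

    `records` holds (lag_weeks, predicted, current_estimate) tuples, where
    current_estimate is the latest count for a week at least 4 weeks old.
    Returns {lag_weeks: (mean relative error, sample standard deviation)}.
    """
    errors = defaultdict(list)
    for lag_weeks, predicted, current in records:
        errors[lag_weeks].append((predicted - current) / current)
    summary = {}
    for lag_weeks, errs in sorted(errors.items()):
        # stdev needs at least two samples; report NaN otherwise
        spread = statistics.stdev(errs) if len(errs) > 1 else float("nan")
        summary[lag_weeks] = (statistics.mean(errs), spread)
    return summary
```

A shrinking standard deviation from one lag week to the next is the signal that predictions at that lag become trustworthy.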
Week 1 estimates are way off and should be taken with a grain of salt. Week 2 estimates start to approach the right value but are still inaccurate. Things become reasonable around week 3 or 4. The graphs below show the predictions with and without week 2. (13/16)
In conclusion, the quickest way to accurately predict the actual number of deaths is to wait 3 weeks after the data is initially released and then increase that number by 40% (error margin ±15%), or wait 4 weeks and increase it by 22% (error margin ±6.9%). (14/16)
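As a worked example of the rule of thumb: 1,000 deaths reported 3 weeks after release becomes an estimate of about 1,400 (roughly 1,190 to 1,610), and 1,000 reported at 4 weeks becomes about 1,220 (roughly 1,136 to 1,304). This sketch treats each error margin as a fraction of the estimated total, which is an assumption; the function name is hypothetical.

```python
# (multiplier, relative error margin) keyed by weeks waited after initial
# release, from the conclusion above. Treating the margin as a fraction of
# the estimated total is an assumption.
RULES = {3: (1.40, 0.15), 4: (1.22, 0.069)}


def estimate_final_deaths(reported, weeks_waited):
    """Return (estimate, low, high) for the eventual death count."""
    if weeks_waited not in RULES:
        raise ValueError("rule of thumb is only given for 3 or 4 weeks")
    multiplier, margin = RULES[weeks_waited]
    estimate = reported * multiplier
    return estimate, estimate * (1 - margin), estimate * (1 + margin)
```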
Finally, this graph shows where we are currently, along with the prediction and its error bars. The last two data points predict a drop in deaths, but the error makes the last point worthless and the one before it questionable. Overall, it doesn't seem to be getting better. (15/16)
(I have no qualifications for producing this data. I take no responsibility for any deaths caused by the inappropriate use of this data to justify government policy. Though I would be shocked to learn government policy was based on any data.) (16/16)