I’ve been asked to explain why I thought the high reported number of #COVID19 cases in Sheffield was misleading, so here it is:

Essentially, the COVID19 tracker is a type of crude disease surveillance that tots up cases. But, ALL surveillance systems have limitations.
1/...
Firstly, surveillance does not measure all cases of disease, only what’s been reported or tested. I.e. it's a proxy measure for real incidence of disease. You only see the tip of the iceberg and you don’t always know the size of the iceberg you can’t see.
2/...
If you test more, you will find more cases, i.e. you will see more of the iceberg. In the case of Sheffield, put simply they’ve been testing a whole lot more so it’s not a surprise they’ve uncovered more of the iceberg (ascertainment bias).
3/...
Surveillance data is susceptible to bias. A different test, or test protocol, or reporting procedure, or observer/reporter behaviour, can all introduce bias in what’s reported. Not every case presents to a doctor. Not every case is tested. Not every case is reported.
4/...
So imagine a situation where different areas have an identical number of cases. If they have different levels of testing, you get a range of different numbers of cases reported. Now imagine if those different areas have different numbers of cases and test/report differently...
5/...
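To put rough numbers on that thought experiment, here's a minimal Python sketch. Everything in it is made up for illustration: the area names, the 1,000 true cases and the detection fractions are assumptions, not real data from Sheffield or anywhere else.

```python
# Hypothetical illustration only: all figures below are invented.
# Every area is assumed to have the SAME true number of infections.
TRUE_CASES = 1000

# Assumed share of true cases that get tested and reported in each area.
ascertainment = {"Area A": 0.05, "Area B": 0.10, "Area C": 0.30}


def reported_cases(true_cases, fraction_detected):
    """Number of cases that actually show up in the surveillance data."""
    return round(true_cases * fraction_detected)


reported = {
    area: reported_cases(TRUE_CASES, fraction)
    for area, fraction in ascertainment.items()
}

for area, count in reported.items():
    print(f"{area}: {count} reported out of {TRUE_CASES} true cases")
```

In this toy setup the area that tests most reports six times as many cases as the area that tests least, even though the underlying number of infections is identical.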
Making sense of the data is not straightforward either. Even if testing availability & protocols were identical, it is tricky to compare between different areas as every district, town, city, country will have different demographic profiles.
6/...
That’s why comparison of cases of COVID19 between different areas in the UK is difficult, and likewise comparisons between countries worldwide. I always remind my students to consider whether data is fact or fiction due to artefact, error, bias or simple random variation.
7/...
Currently, confirmed cases are more likely to be patients with more severe illness who have ended up in hospital, so a lot of folk with milder infections out in the community won’t have been detected, diagnosed or reported.
8/...
Also, if an area tests more of its health workers for the infection, it is likely to find more cases, as they are a higher risk group for infection (through frequent, prolonged exposure to ill, infectious patients). So health workers are not reflective of the wider population.
9/...
So if comparing cases is not a reliable measure of disease, what is? One option is to look at deaths as one might think reported deaths are more reliable (death notification is a legal requirement in the UK).
But...
10/...
Trouble is, even diagnosis of cause of death is not 100% foolproof (doctors are fallible and sometimes get it wrong!). Patients who die in the community may not have been diagnosed with COVID19 or have it reported on their death certificates.
11/...
The next best way to find out how prevalent COVID19 infections are in the community is through seroprevalence estimates, where we survey blood samples from the population for antibodies to the virus. This will give us a sense of what % of the population is or has been infected.
12/...
So lots of caveats when interpreting surveillance data. One might question why bother? Well if you assume every area is reporting uniformly badly then what’s helpful are the trends and it is the trends we need to look out for. With a large pinch of salt.
You can follow @andrewleedr.