It's been coming up a lot lately, so I thought I'd do a bit of a thread on CONVENIENCE SAMPLES and why they aren't great for assessing POPULATION PREVALENCE of a disease
In other words - how many people have had COVID-19?
1/n
2/n So, the basic idea here is simple. We want to know about people who have (or in this case, have had) a disease
How do we find that out?
3/n The traditional method is to do a large, randomly-sampled study involving dialing up tens of thousands of people across a population and surveying them + doing lots of blood tests
But this is EXPENSIVE
4/n Running a proper statistically representative process, getting all the people to answer their phones and give you bloods...even if the cost per person is low, multiply that by 10,000-100,000 and the cost can be prohibitive
5/n Which brings us to the idea of a CONVENIENCE SAMPLE
Why is it called a convenience sample? (hint: answer is in the name)
6/n Yes, convenience samples are just that - convenient
Usually, they are groups of people that you are ALREADY TESTING for some reason that you can either add another test on to or survey
7/n I have used this method in the past to look at the burden of diabetes in-hospital and GP clinics - we looked at people who were already getting blood tests, and added one extra test for diabetes (and science!) https://www.sciencedirect.com/science/article/abs/pii/S0168822718318862
8/n But there's an issue here
We have selected these people very specifically. They are not a random, representative sample - they were people ALREADY GETTING blood tests which means they are probably different in LOTS OF WAYS to the general population
9/n So in our lovely study of a convenience sample of diabetes tests, we can't say anything about how much diabetes there is in the community (population prevalence)!
All we can talk about is diabetes IN THE PATIENTS TESTED
10/n "But God", you ask, with a common autocorrect mistake, "what does this have to do with COVID-19?"
Well, reader, this is where we get to antibody testing
11/n You see, when you get sick with a new disease, your body produces antibodies*
We can then test for these antibodies to see if you've had the disease before*
*oversimplified, plz don't murder me immunologists
12/n If you run an antibody test on a large group of people, it's called a serosurvey (because antibody tests are also known as serology in sciency terms)
13/n Now, a lot of places (countries, states, colleges) have run serosurveys and had a grand old time of it. This is why you keep seeing those news articles saying that x% of people in a place have had COVID-19 already
14/n The problem is, some of these serosurveys used CONVENIENCE SAMPLES
Just like we discussed earlier, that makes them a bit problematic
15/n My co-authors and I, in our systematic review of age-stratified IFRs for COVID-19, looked into just how problematic
The answer: a whole lot https://www.medrxiv.org/content/10.1101/2020.07.23.20160895v4
16/n For example, one study in Tokyo that used a CONVENIENCE SAMPLE found that 3.8% of people had had COVID-19 in the sample tested
But a proper randomized sample found just 0.1% - 38 times lower!
17/n In England, a CONVENIENCE SAMPLE of blood donors implied that 1 in 12 people had had COVID-19, but a large representative sample found it was just 1 in 20
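To put those two gaps side by side, here's a quick back-of-the-envelope sketch. It only uses the figures quoted above; the variable and function names are mine, made up for illustration.

```python
# Prevalence figures quoted in the thread (convenience vs representative sample).
tokyo_convenience = 0.038        # 3.8% in the Tokyo convenience sample
tokyo_random = 0.001             # 0.1% in the randomized sample
england_convenience = 1 / 12     # blood donors
england_representative = 1 / 20  # large representative sample


def overestimate_ratio(convenience, representative):
    """How many times higher the convenience estimate is than the representative one."""
    return convenience / representative


tokyo_ratio = overestimate_ratio(tokyo_convenience, tokyo_random)
england_ratio = overestimate_ratio(england_convenience, england_representative)

print(f"Tokyo: convenience estimate is about {tokyo_ratio:.0f}x higher")
print(f"England: convenience estimate is about {england_ratio:.2f}x higher")
```

Note that the size of the gap is wildly different in the two places, which is part of why you can't just apply a fixed correction.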
18/n The problem is, these CONVENIENCE SAMPLES are systematically biased. They are of people who are different to the general population in ways that can be very difficult to measure and/or understand
19/n Blood donors, for example, are young and healthy by design. But the people who have been (generously) giving blood during the pandemic might also be...well, a bit odd
20/n They're going to great personal lengths to sacrifice for the rest of us ungrateful buggers, which might indicate that they're more likely to socialize, more likely to mingle, and thus more likely to get infected
We JUST DON'T KNOW
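If you want to see how this kind of bias could creep in, here's a toy simulation. Everything in it is invented for illustration: I'm assuming a hypothetical "sociable" trait that makes people both more likely to be infected and more likely to turn up in the convenience sample. Nobody has measured these numbers, and real donors may differ in other ways entirely.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

POPULATION = 200_000
# All of these probabilities are made up for illustration only.
P_SOCIABLE = 0.2                        # share of people who mingle a lot
P_INFECTED = {True: 0.10, False: 0.02}  # infection risk, keyed by sociable or not
P_DONATES = {True: 0.05, False: 0.01}   # chance of being in the convenience sample

# Each person is recorded as (infected, donates).
people = []
for _ in range(POPULATION):
    sociable = random.random() < P_SOCIABLE
    infected = random.random() < P_INFECTED[sociable]
    donates = random.random() < P_DONATES[sociable]
    people.append((infected, donates))


def prevalence(group):
    """Share of a group that has been infected."""
    return sum(infected for infected, _ in group) / len(group)


true_prevalence = prevalence(people)
random_estimate = prevalence(random.sample(people, 5_000))
convenience_estimate = prevalence([p for p in people if p[1]])

print(f"Whole population:   {true_prevalence:.3f}")
print(f"Random sample:      {random_estimate:.3f}")
print(f"Convenience sample: {convenience_estimate:.3f}")
```

In this made-up world the random sample lands near the truth and the convenience sample overshoots, even though donating doesn't cause infection. The catch is that in real life we don't get to see the "sociable" column, so we can't tell how big the overshoot is, or even which direction it goes.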
21/n And this is the problem with convenience samples, generally
We cannot use them to estimate population prevalence (how many people have had COVID-19), because they aren't representative of society as a whole
22/n So if you see a headline that says "x% of people infected with COVID-19!" take a leaf out of my mentor's book and ask:
"WHAT'S THE DENOMINATOR?"
It's a vitally important question
23/n THIS DOESN'T MEAN THAT CONVENIENCE SAMPLES ARE USELESS
I use them in my research. They are brilliant for quick, cheap tracking of rates of infection IN SELECT GROUPS
They also provide a brilliant window into change OVER TIME
24/n For example, if you sample blood donors every week for a year, you've got an amazing insight into the changing nature of the pandemic
THIS IS MASSIVELY IMPORTANT AND VERY CHEAP
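Here's a rough sketch of why a biased sample can still be good for tracking change. The weekly numbers and the bias factor are invented, and the big assumption (mine, not something the data guarantees) is that donors are over-represented among the infected by the same factor every week.

```python
# Hypothetical true cumulative prevalence over four weeks (invented numbers).
true_prevalence = [0.010, 0.015, 0.024, 0.030]

# Assumed constant over-representation of infection among donors (also invented).
BIAS_FACTOR = 1.7

# What the donor serosurvey would report under that assumption.
donor_prevalence = [p * BIAS_FACTOR for p in true_prevalence]


def weekly_growth(series):
    """Ratio of each week's value to the previous week's value."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


true_growth = weekly_growth(true_prevalence)
donor_growth = weekly_growth(donor_prevalence)

for week, (t, d) in enumerate(zip(true_growth, donor_growth), start=2):
    print(f"Week {week}: true growth {t:.2f}x, donor growth {d:.2f}x")
```

The donor numbers are wrong as a level in every single week, but the week-to-week growth matches because the constant factor cancels out. If the mix of donors shifts over time, that cancellation breaks down, so the trend is only as trustworthy as the assumption that the group stays similar.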