It's been coming up a lot lately, so I thought I'd do a bit of a thread on CONVENIENCE SAMPLES and why they aren't great for assessing POPULATION PREVALENCE of a disease

In other words - how many people have had COVID-19?

1/n
2/n So, the basic idea here is simple. We want to know about people who have (or in this case, have had) a disease

How do we find that out?
3/n The traditional method is a large, randomly sampled study: you dial up tens of thousands of people across a population, survey them and do lots of blood tests

But this is EXPENSIVE
4/n Running a properly representative process means getting all of those people to answer their phones and give you bloods. Even if the cost per person is low, multiply it by 10,000-100,000 people and the total can be prohibitive
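Why does a representative study need so many people? Here is a minimal Python sketch of the textbook sample-size formula for estimating a proportion from a simple random sample, n = z^2 * p * (1 - p) / e^2. The 1% prevalence and the ±0.2 percentage-point margin are hypothetical numbers chosen for illustration, and the formula ignores non-response, clustering and imperfect tests, all of which push the real number higher.

```python
import math


def sample_size_for_prevalence(expected_prevalence: float, margin: float, z: float = 1.96) -> int:
    """People needed in a simple random sample so the 95% CI half-width is about `margin`.

    Normal-approximation formula n = z^2 * p * (1 - p) / e^2.
    Assumption: no design effect, no non-response, a perfect test.
    """
    p = expected_prevalence
    n = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n)  # you can't recruit a fraction of a person


# Hypothetical target: prevalence around 1%, estimated to within +/- 0.2 percentage points
print(sample_size_for_prevalence(0.01, 0.002))
```

Under those made-up assumptions that works out to roughly 9,500 people for a single estimate, before anyone fails to pick up the phone.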
5/n Which brings us to the idea of a CONVENIENCE SAMPLE

Why is it called a convenience sample? (Hint: the answer is in the name)
6/n Yes, convenience samples are just that - convenient

Usually, they are groups of people you are ALREADY TESTING for some other reason, so you can either add another test or survey them
7/n I have used this method in the past to look at the burden of diabetes in hospitals and GP clinics - we looked at people who were already getting blood tests, and added one extra test for diabetes (and science!) https://www.sciencedirect.com/science/article/abs/pii/S0168822718318862
8/n But there's an issue here

We have selected these people very specifically. They are not a random, representative sample - they were people ALREADY GETTING blood tests which means they are probably different in LOTS OF WAYS to the general population
9/n So in our lovely study of a convenience sample of diabetes tests, we can't say anything about how much diabetes there is in the community (population prevalence)!

All we can talk about is diabetes IN THE PATIENTS TESTED
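To make the "already getting blood tests" problem concrete, here is a toy Python simulation. Every number in it is hypothetical: a town of 100,000, a true diabetes prevalence of 6%, and the assumption that people with diabetes are five times as likely to be having routine bloods. It compares a random sample with a convenience sample of everyone who is already being tested.

```python
import random

random.seed(1)

# All parameters are hypothetical, invented for illustration only
POP_SIZE = 100_000
TRUE_PREVALENCE = 0.06
P_TESTED_IF_DIABETES = 0.5   # assumption: people with diabetes get bloods more often
P_TESTED_IF_NOT = 0.1

# Each person is a (has_diabetes, already_getting_bloods) pair
population = []
for _ in range(POP_SIZE):
    has_diabetes = random.random() < TRUE_PREVALENCE
    p_tested = P_TESTED_IF_DIABETES if has_diabetes else P_TESTED_IF_NOT
    already_tested = random.random() < p_tested
    population.append((has_diabetes, already_tested))


def prevalence(people):
    """Share of the given people who have diabetes."""
    return sum(has for has, _ in people) / len(people)


# Convenience sample: everyone who is already having bloods taken
convenience = [person for person in population if person[1]]
# Random sample of the same size drawn from the whole population
random_sample = random.sample(population, len(convenience))

print(f"true prevalence:    {prevalence(population):.3f}")
print(f"random sample:      {prevalence(random_sample):.3f}")
print(f"convenience sample: {prevalence(convenience):.3f}")
```

Under these made-up assumptions the convenience sample lands near 24% (0.5 × 0.06 divided by 0.5 × 0.06 + 0.1 × 0.94), while the random sample stays close to the true 6%. The convenience number is accurate for the patients tested; it just isn't the community prevalence.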
10/n "But God", you ask, with a common autocorrect mistake, "what does this have to do with COVID-19?"

Well, reader, this is where we get to antibody testing
11/n You see, when you get sick with a new disease, your body produces antibodies*

We can then test for these antibodies to see if you've had the disease before*

*oversimplified, plz don't murder me immunologists
12/n If you run an antibody test on a large group of people, it's called a serosurvey (because antibody tests are also known as serology in sciency terms)
13/n Now, a lot of places (countries, states, colleges) have run serosurveys and had a grand old time of it. This is why you keep seeing those news articles saying that x% of people in a place have had COVID-19 already
14/n The problem is, some of these serosurveys used CONVENIENCE SAMPLES

Just like we discussed earlier, that makes them a bit problematic
16/n For example, one study in Tokyo that used a CONVENIENCE SAMPLE found that 3.8% of the people tested had had COVID-19

But a proper randomized sample found just 0.1% - 38 times lower!
17/n In England, a CONVENIENCE SAMPLE of blood donors implied that 1 in 12 people had had COVID-19, but a large representative sample found it was just 1 in 20
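The size of these gaps is easy to check. A tiny Python sketch using only the figures quoted in tweets 16 and 17:

```python
# Figures quoted in the thread
tokyo_convenience = 0.038          # 3.8%
tokyo_random = 0.001               # 0.1%
england_donors = 1 / 12            # about 8.3%
england_representative = 1 / 20   # 5%


def overestimate_ratio(convenience, representative):
    """How many times larger the convenience estimate is than the representative one."""
    return convenience / representative


print(f"Tokyo:   {overestimate_ratio(tokyo_convenience, tokyo_random):.0f}x")
print(f"England: {overestimate_ratio(england_donors, england_representative):.2f}x")
```

So the Tokyo convenience sample was about 38 times too high, and the English blood-donor sample about two thirds too high (20/12 ≈ 1.67).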
18/n The problem is, these CONVENIENCE SAMPLES are systematically biased. They are of people who are different to the general population in ways that can be very difficult to measure and/or understand
19/n Blood donors, for example, are healthy by design (you have to meet health criteria to donate) and tend to be younger than average. But the people who have been (generously) giving blood during the pandemic might also be...well, a bit odd
20/n They're going to great personal lengths to sacrifice for the rest of us ungrateful buggers, which might indicate that they're more likely to socialize, more likely to mingle, and thus more likely to get infected

We JUST DON'T KNOW
21/n And this is the problem with convenience samples, generally

We cannot use them to estimate population prevalence (how many people have had COVID-19), because they aren't representative of society as a whole
22/n So if you see a headline that says "x% of people infected with COVID-19!" take a leaf out of my mentor's book and ask:

"WHAT'S THE DENOMINATOR?"

It's a vitally important question
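A prevalence is just a fraction, and the denominator decides who it describes. A small Python sketch with entirely hypothetical numbers (500 positive antibody tests, 5,000 people tested, a town of 100,000), assuming the test itself is accurate:

```python
def prevalence(positives, denominator):
    """Positives divided by the group the number is supposed to describe."""
    return positives / denominator


positives = 500      # hypothetical: positive antibody tests in a convenience sample
tested = 5_000       # hypothetical: people who were actually tested
residents = 100_000  # hypothetical: everyone living in the town

# Denominator = people tested: this describes the tested group only
print(f"{prevalence(positives, tested):.1%} of the people tested")
# Denominator = whole town: only a floor, since untested residents may also have been infected
print(f"at least {prevalence(positives, residents):.1%} of the town")
```

Same numerator, a 20-fold difference, and neither figure is the town's true prevalence. That's why the headline needs its denominator.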
23/n THIS DOESN'T MEAN THAT CONVENIENCE SAMPLES ARE USELESS

I use them in my research. They are brilliant for quick, cheap tracking of rates of infection IN SELECT GROUPS

They also provide a brilliant window into change OVER TIME
24/n For example, if you sample blood donors every week for a year, you've got an amazing insight into the changing nature of the pandemic

THIS IS MASSIVELY IMPORTANT AND VERY CHEAP
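Why can a biased sample still track change? A hypothetical Python sketch: the epidemic curve, the weekly sample of 2,000 donors and the 1.5x donor bias are all invented, and the key assumption (labelled in the code) is that the bias stays roughly constant over time. The donor level is wrong every week, but it rises when the population rises.

```python
import random

random.seed(2)

# Hypothetical epidemic: true population seroprevalence climbs from 1% to 10% over 10 weeks
true_prevalence = [0.01 * week for week in range(1, 11)]
BIAS = 1.5           # assumption: donors are 1.5x as likely as average to have been infected, every week
SAMPLE_SIZE = 2_000  # hypothetical number of donors tested each week


def weekly_donor_estimate(p_true):
    """Simulate one week of testing donors whose infection risk is inflated by BIAS."""
    p_donor = min(1.0, p_true * BIAS)
    positives = sum(random.random() < p_donor for _ in range(SAMPLE_SIZE))
    return positives / SAMPLE_SIZE


donor_series = [weekly_donor_estimate(p) for p in true_prevalence]

for week, (p_true, p_donor) in enumerate(zip(true_prevalence, donor_series), start=1):
    print(f"week {week:2d}: population {p_true:.1%}, donors {p_donor:.1%}")
```

If the bias itself drifts over time (say, donors change their behaviour), the trend becomes less trustworthy too.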
25/n You just can't use those results to tell how many people in the rest of society have gotten COVID-19

But that doesn't mean the results aren't helpful at all
You can follow @GidMK.