It's been coming up a lot lately, so I thought I'd do a bit of a thread on CONVENIENCE SAMPLES and why they aren't great for assessing the POPULATION PREVALENCE of a disease

In other words - how many people have had COVID-19?

1/n
2/n So, the basic idea here is simple. We want to know about people who have (or in this case, have had) a disease

How do we find that out?
3/n The traditional method is to do a large, randomly sampled study: dialing up tens of thousands of people across a population, surveying them, and doing lots of blood tests

But this is EXPENSIVE
4/n Running a proper statistically representative process, getting all the people to answer their phones and give you bloods... even if the cost per person is low, multiply that by 10,000-100,000 and the cost can be prohibitive
5/n Which brings us to the idea of a CONVENIENCE SAMPLE

Why is it called a convenience sample? (hint: the answer is in the name)
6/n Yes, convenience samples are just that - convenient

Usually, they are groups of people that you are ALREADY TESTING for some reason, so you can either add another test on or survey them
7/n I have used this method in the past to look at the burden of diabetes in hospital and GP clinics - we looked at people who were already getting blood tests, and added one extra test for diabetes (and science!) https://www.sciencedirect.com/science/article/abs/pii/S0168822718318862
8/n But there's an issue here

We have selected these people very specifically. They are not a random, representative sample - they were people ALREADY GETTING blood tests which means they are probably different in LOTS OF WAYS to the general population
9/n So in our lovely study of a convenience sample of diabetes tests, we can't say anything about how much diabetes there is in the community (population prevalence)!

All we can talk about is diabetes IN THE PATIENTS TESTED
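To make the selection problem concrete, here's a minimal Python sketch with entirely made-up numbers. It assumes a hypothetical hidden trait ("unwell") that makes someone both more likely to have the disease AND more likely to already be getting blood tests. The names `simulate`, `unwell`, `p_disease` and `p_tested`, and every probability in it, are illustrative assumptions, not figures from the diabetes study.

```python
import random

def simulate(n=100_000, seed=1):
    """Compare true population prevalence with prevalence in a convenience sample.

    All probabilities are hypothetical, chosen only to illustrate selection bias.
    """
    rng = random.Random(seed)
    people = []  # (has_disease, in_convenience_sample) for each person
    for _ in range(n):
        unwell = rng.random() < 0.20            # hypothetical hidden trait
        p_disease = 0.15 if unwell else 0.05    # unwell people more likely to have the disease...
        p_tested = 0.50 if unwell else 0.05     # ...and more likely to already be getting blood tests
        people.append((rng.random() < p_disease, rng.random() < p_tested))

    true_prev = sum(d for d, _ in people) / n
    sampled = [d for d, in_sample in people if in_sample]
    sample_prev = sum(sampled) / len(sampled)
    return true_prev, sample_prev

true_prev, sample_prev = simulate()
print(f"whole population:   {true_prev:.1%}")
print(f"convenience sample: {sample_prev:.1%}")
```

With these inputs the expected values work out to roughly 7% in the whole population versus roughly 12% among the people already being tested. The sample isn't wrong about the people in it; it just isn't about everyone else.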
10/n "But God", you ask, with a common autocorrect mistake, "what does this have to do with COVID-19?"

Well, reader, this is where we get to antibody testing
11/n You see, when you get sick with a new disease, your body produces antibodies*

We can then test for these antibodies to see if you've had the disease before*

*oversimplified, plz don't murder me, immunologists
12/n If you run an antibody test on a large group of people, it's called a serosurvey (because antibody tests are also known as serology in sciency terms)
13/n Now, a lot of places (countries, states, colleges) have run serosurveys and had a grand old time of it. This is why you keep seeing those news articles saying that x% of people in a place have had COVID-19 already
14/n The problem is, some of these serosurveys used CONVENIENCE SAMPLES

Just like we discussed earlier, that makes them a bit problematic
16/n For example, one study in Tokyo that used a CONVENIENCE SAMPLE found that 3.8% of the people tested had had COVID-19

But a proper randomized sample found just 0.1% - 38 times lower!
17/n In England, a CONVENIENCE SAMPLE of blood donors implied that 1 in 12 people had had COVID-19, but a large representative sample found it was just 1 in 20
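A quick bit of arithmetic on those two examples, as a rough Python sketch. The figures are the ones quoted in this thread; `overestimate_factor` is just a hypothetical helper name. Using `Fraction` keeps "1 in 12" and "1 in 20" exact.

```python
from fractions import Fraction

def overestimate_factor(convenience, representative):
    """How many times higher the convenience-sample estimate is than the representative one."""
    return convenience / representative

# Estimates as quoted in the thread: (convenience sample, representative sample)
examples = {
    "Tokyo": (Fraction(38, 1000), Fraction(1, 1000)),  # 3.8% vs 0.1%
    "England": (Fraction(1, 12), Fraction(1, 20)),     # 1 in 12 vs 1 in 20
}

for place, (convenience, representative) in examples.items():
    factor = overestimate_factor(convenience, representative)
    print(f"{place}: {float(convenience):.1%} vs {float(representative):.1%}, "
          f"about {float(factor):.1f}x higher")
```

1 in 12 is about 8.3% and 1 in 20 is 5%, so the blood donor estimate was roughly 1.7 times the representative one: a much smaller gap than Tokyo's 38-fold difference, but still a substantial overestimate.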
18/n The problem is, these CONVENIENCE SAMPLES are systematically biased. They are of people who are different to the general population in ways that can be very difficult to measure and/or understand
19/n Blood donors, for example, are young and healthy by design. But the people who have been (generously) giving blood during the pandemic might also be...well, a bit odd
20/n They're going to great personal lengths to sacrifice for the rest of us ungrateful buggers, which might indicate that they're more likely to socialize, more likely to mingle, and thus more likely to get infected

We JUST DON'T KNOW
21/n And this is the problem with convenience samples, generally

We cannot use them to estimate population prevalence (how many people have had COVID-19), because they aren't representative of society as a whole
22/n So if you see a headline that says "x% of people infected with COVID-19!" take a leaf out of my mentor's book and ask:

"WHAT'S THE DENOMINATOR?"

It's a vitally important question
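One way to keep the denominator from getting lost is to never report a percentage without saying who it's a percentage OF. A minimal Python sketch: the `PrevalenceEstimate` class and the numbers (300 positives among 5,000 blood donors) are hypothetical, purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrevalenceEstimate:
    """A proportion that always carries its denominator with it (hypothetical helper)."""
    positives: int
    denominator: int
    denominator_label: str  # WHO was tested, e.g. "blood donors tested", not "everyone"

    @property
    def proportion(self) -> float:
        return self.positives / self.denominator

    def __str__(self) -> str:
        return f"{self.proportion:.1%} of {self.denominator_label} (n={self.denominator:,})"

est = PrevalenceEstimate(positives=300, denominator=5_000, denominator_label="blood donors tested")
print(est)  # 6.0% of blood donors tested (n=5,000)
```

"6%" on its own invites the reader to assume the denominator is everyone, when here it's only the donors who were tested.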
23/n THIS DOESN'T MEAN THAT CONVENIENCE SAMPLES ARE USELESS

I use them in my research. They are brilliant for quick, cheap tracking of rates of infection IN SELECT GROUPS

They also provide a brilliant window into change OVER TIME
24/n For example, if you sample blood donors every week for a year, you've got an amazing insight into the changing nature of the pandemic

THIS IS MASSIVELY IMPORTANT AND VERY CHEAP
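Why can repeated samples track change even when the level is biased? Here's a rough Python sketch under one big ASSUMPTION that isn't in the thread: that blood donors over-represent infection by a roughly constant factor from week to week. The weekly counts, `hypothetical_bias`, and `relative_changes` are all made up for illustration.

```python
import math

def relative_changes(series):
    """Week-over-week ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(series, series[1:])]

# Made-up weekly results from a fixed-size donor sample.
donors_per_week = 1_000
weekly_positives = [20, 24, 31, 40, 38, 30]
donor_prev = [p / donors_per_week for p in weekly_positives]

# ASSUMPTION: donors over-represent infection by a constant (in practice unknown) factor.
hypothetical_bias = 1.5
population_prev = [p / hypothetical_bias for p in donor_prev]

donor_trend = relative_changes(donor_prev)
population_trend = relative_changes(population_prev)

for week, change in enumerate(donor_trend, start=2):
    print(f"week {week}: {change:.2f}x the previous week")

# The level is off by the bias factor, but the week-to-week trend is the same.
print(all(math.isclose(a, b) for a, b in zip(donor_trend, population_trend)))
```

If that assumption breaks (say, donors start acting differently partway through the year), the trend can be distorted too, so treat this as a sketch of the idea rather than a guarantee.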
25/n You just can't use those results to tell how many people in the rest of society have gotten COVID-19

But that doesn't mean the results aren't helpful at all
You can follow @GidMK.