1/ A tweetorial on false positives & transposing the conditional!

Or why interpreting type 1 error can be so confusing.

This is a common topic that cuts across medical testing, data reporting, and null hypothesis significance testing (NHST).
2/ Let's start with some basic terms!

In binary classification, say for medical test results, there are 4 basic categories.

True positive
True negative
False positive
False negative
3/ We are interested in the False positive.

A false positive medical test result is when a patient DOES NOT have a condition, but the test (predicted condition) is Positive. See the table below.
4/ A False Positive in statistical hypothesis testing is when the null hypothesis (H0) is true, but the test has a result compatible with the alternative hypothesis (H1).

In the table below, this is like detecting a signal, when no signal is present.
5/ In fairy tales, a False Positive is also known as a False Alarm. It's the story of the boy who cries wolf when no wolves are present.
6/ It's important to note that a False Positive is a *category*.

By itself, a False Positive *is not* a fraction, or a probability, or a percentage, or a rate. It is only a count.

Below is an example of a study, where 180 test results were counted as False Positives.
7/ Unsurprisingly, False Positives (FP) & False Negatives (FN) are considered Errors.

In the 1920s, the statisticians Neyman and Pearson identified these possible errors of NHST, and creatively decided to label them as "Type 1 Errors" (FP) and "Type 2 Errors" (FN).
8/ Mathematically, the "Type 1 error rate" is the ratio of False Positives to the total # of actual negatives (cases where the condition is absent):
FP / (FP + TN)

Even though "Type 1 errors" (count) is not the same as "Type 1 error rate" (ratio), we often use the terms interchangeably.

It's so confusing!
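To make the count vs. rate distinction concrete, here is a minimal Python sketch. Only the 180 false positives echo the example study above; the true negative count is a made-up placeholder, and the function name is just for illustration.

```python
# fp = 180 echoes the example study; tn is a hypothetical placeholder.
fp = 180    # False Positives (a count)
tn = 1620   # True Negatives (a count, assumed for illustration)

def type1_error_rate(fp: int, tn: int) -> float:
    """False Positives divided by all actual negatives: FP / (FP + TN)."""
    return fp / (fp + tn)

rate = type1_error_rate(fp, tn)
print(f"Type 1 errors (count): {fp}")
print(f"Type 1 error rate:     {rate:.2f}")
```

With these assumed numbers the count is 180 but the rate is 0.10: same category, two very different quantities.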
9/ Type 1 error rate can also be expressed as a probability (α):

Prob(FP | H0 true): If the null hypothesis is true, it's the probability of falsely rejecting it.

Prob(FP | condition absent): If the condition/disease is absent, it's the probability of falsely diagnosing it.
10/ OK! Following so far? Hopefully this has set the stage for the fun part of the tweetorial.

Because now we dive into the rabbit hole!
11/ Scenario 1: Let's say we have a positive medical test result, but are unsure if a disease is truly present. We want to know if the test result is wrong, i.e. a False Positive or Type 1 Error.

How do we determine the chances that this is a false positive? What do we calculate?
12/ The correct answer is "Other".

We want to know: given a positive result, what is the probability that the condition is absent? Prob(Condition absent | Pos Test).

But Type 1 Error Rate (α) is Prob(FP | Condition absent), the inverse (transposed conditional) of our question!
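A rough Python sketch of the two conditionals, computed from the same 2x2 table. All four counts are hypothetical and chosen only to show how far apart the two numbers can be when the condition is uncommon.

```python
# All counts below are hypothetical placeholders, not data from any study.
tp, fn = 90, 10      # condition present: True Positives, False Negatives
fp, tn = 180, 1620   # condition absent:  False Positives, True Negatives

# Type 1 error rate (alpha): Prob(FP | Condition absent)
# Denominator = everyone WITHOUT the condition.
alpha = fp / (fp + tn)

# The question we actually asked: Prob(Condition absent | Pos Test)
# Denominator = everyone with a POSITIVE test.
prob_absent_given_pos = fp / (fp + tp)

print(f"Prob(FP | Condition absent)       = {alpha:.2f}")
print(f"Prob(Condition absent | Pos Test) = {prob_absent_given_pos:.2f}")
```

Under these assumed counts α is 0.10, yet about two thirds of the positive results belong to patients without the condition. Same numerator, different denominator.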
13/ So in our scenario, the "Type 1 error rate" (α) does not help us determine the probability of having a false positive.

Or, to be doubly confusing: in our scenario the "Type 1 Error Rate" (α) does not help us determine the probability of having a "Type 1 Error".
14/ Now, to be triply confusing: "Type 1 Error" is often used interchangeably with "Type 1 Error Rate". As such, the "Type 1 error rate" (α) is often defined as just "the probability of type 1 error".

Got that?
15/ Yet the (α) defined as "probability of a Type 1 error" is still the conditional inverse of our attempt in the previously stated scenario to calculate the "probability of a false positive" or "probability of a Type 1 error".

Are you confused? I'm feeling pretty confused.
16/ What is the solution out of this semantic and transposed conditional quagmire?

The best advice I've heard is this:

1) use conditional statements for probability
2) limit usage of the phrase "type 1 error" to study design
3) avoid pos/neg labels for non-binary test results
17/ Scenario 1: We have a positive test result, but are unsure if the disease/condition is truly present.

Confusing Q: Is this a false positive or type 1 error?
Good Q: Given the positive test result, what is the probability that the condition is absent?
18/ Scenario 2: A study has made a new discovery, and we want to determine the chances it is wrong.

Wrong Q: What is the chance of type 1 error rate (α)?
Confusing Q: What is the chance of a false positive?
Good Q: Given the study results, what is the probability that the discovery is false?
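One way to sketch the "Good Q" for Scenario 2 is Bayes' rule. This simplifies "the study results" down to "the test came out significant at level α", and it needs two inputs the thread does not supply: the power of the study and the prior chance that the alternative (H1) is true. Both values below, and the function name, are assumptions for illustration only.

```python
def prob_discovery_false(alpha: float, power: float, prior_h1: float) -> float:
    """Prob(H0 true | significant result), via Bayes' rule.

    alpha    : Prob(significant | H0 true), the type 1 error rate
    power    : Prob(significant | H1 true)
    prior_h1 : assumed Prob(H1 true) before seeing the study
    """
    false_pos = alpha * (1 - prior_h1)   # significant AND H0 true
    true_pos = power * prior_h1          # significant AND H1 true
    return false_pos / (false_pos + true_pos)

# Hypothetical inputs: alpha = 0.05, power = 0.80, 1 in 10 tested ideas is real.
result = prob_discovery_false(alpha=0.05, power=0.80, prior_h1=0.10)
print(f"Prob(discovery is false | significant result) = {result:.2f}")
```

With these assumed inputs the answer is 0.36, not 0.05. Change the prior or the power and the answer moves, which is exactly why α alone cannot answer the question.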
19/ When discussing False Positives, always keep in mind

i) am I discussing the "category" and counts?
ii) am I discussing rates & probability?
iii) for probability questions, what is the conditional order?
20/ End

I hope this thread has been a helpful introduction to the topic. Feel free to post corrections or comments.
You can follow @raj_mehta.