🚨Working paper alert!🚨
"Scaling up fact-checking using the wisdom of crowds"

We find that 10 laypeople rating just headlines match the performance of professional fact-checkers who researched the full articles, using a set of URLs flagged by an internal FB algorithm

https://psyarxiv.com/9qdza/ 
Here's where the *wisdom of crowds* comes in

Crowd judgments have been shown to perform well in guessing tasks, medical diagnoses, and market predictions

Plus politically balanced crowds can't be accused of bias

BUT can crowds actually do a good job of evaluating news articles?
We set out to answer this question

It was critical to use *representative* articles; otherwise it's unclear whether the findings would generalize

So we partnered with the FB Community Review team and got 207 URLs flagged for fact-checking by an internal FB algorithm https://www.axios.com/facebook-fact-checking-contractors-e1eaeb8b-54cd-4519-8671-d81121ef1740.html
Next we had 3 professional fact-checkers research each article & rate its accuracy

First surprise: They disagreed more than you might expect!

The avg correlation between the fact-checkers' ratings was .62

On half the articles, 1 fact-checker disagreed with the other 2; on the other half, all 3 agreed
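(A toy sketch of how that agreement number can be computed; made-up ratings, not the paper's code:)

```python
import numpy as np
from itertools import combinations

# Hypothetical data: rows = 207 articles, cols = 3 fact-checkers' accuracy ratings
rng = np.random.default_rng(0)
fc_ratings = rng.integers(1, 8, size=(207, 3)).astype(float)  # e.g., a 1-7 scale

# Average pairwise Pearson correlation between the three fact-checkers
pairwise = [np.corrcoef(fc_ratings[:, i], fc_ratings[:, j])[0, 1]
            for i, j in combinations(range(3), 2)]
print(f"avg pairwise correlation: {np.mean(pairwise):.2f}")
```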
Then we recruited N=1,128 laypeople from MTurk to rate the same articles (20 per turker)

For scalability, they read & rated just each headline + lede, not the full article

Half were shown the URL domain; the other half got no source info

Our Q: How well do layperson ratings predict fact-checker ratings?
We created politically balanced crowds & correlated their avg ratings with the avg fact-checker ratings

The crowd does quite well:

With as few as 10 laypeople, the crowd's avg rating is as correlated with the average fact-checker rating as the fact-checkers' ratings are correlated with each other!!
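(Rough sketch of that balanced-crowd analysis; data layout & function name are hypothetical, and for simplicity it assumes a dense ratings matrix, whereas in the study each turker rated only 20 articles:)

```python
import numpy as np

def balanced_crowd_corr(lay, party, fc_mean, k=10, n_draws=1000, seed=0):
    """Avg correlation between balanced-crowd mean ratings and mean
    fact-checker ratings, over random crowd draws.

    lay:     (n_articles, n_raters) layperson ratings
    party:   (n_raters,) array of 'D'/'R' labels
    fc_mean: (n_articles,) mean fact-checker rating per article
    """
    rng = np.random.default_rng(seed)
    dems = np.where(party == "D")[0]
    reps = np.where(party == "R")[0]
    corrs = []
    for _ in range(n_draws):
        # politically balanced crowd: k/2 Democrats + k/2 Republicans
        idx = np.concatenate([rng.choice(dems, k // 2, replace=False),
                              rng.choice(reps, k // 2, replace=False)])
        crowd_mean = lay[:, idx].mean(axis=1)
        corrs.append(np.corrcoef(crowd_mean, fc_mean)[0, 1])
    return float(np.mean(corrs))
```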
Next, we used layperson ratings to predict the modal categorical rating fact-checkers gave each headline (1 = True, 0 = Not True)

Overall AUC = .86
AUC > .9 for articles where fact-checkers were unanimous
AUC > .75 for articles where one fact-checker disagreed with the other 2

Pretty damn good!
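(That AUC can be computed with scikit-learn's roc_auc_score; toy numbers below, not the study data:)

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical inputs: modal fact-checker label per article (1 = True, 0 = Not True)
# and the balanced crowd's mean accuracy rating as the prediction score
fc_modal_label = np.array([1, 0, 0, 1, 0, 1, 1, 0])
crowd_mean_rating = np.array([6.1, 2.3, 3.0, 5.5, 1.8, 4.9, 6.4, 2.7])

print(f"AUC = {roc_auc_score(fc_modal_label, crowd_mean_rating):.2f}")
```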
Finally, we asked whether some crowds did better than others

Answer: Yes & no

Crowds that were 1) Dem 2) high CRT (Cognitive Reflection Test) 3) high political knowledge did better than their 1) Rep 2) low CRT 3) low PK counterparts, but DIDN'T outperform the overall crowd!

The crowd needn't be all experts to match expert judgment
Caveats:
1) Individuals still fell for misinfo, but *crowds* did well
2) Need to protect against coordinated attacks (e.g., randomly poll users rather than relying on Reddit-style opt-in voting)
3) Not a representative sample, but the point is that *some* laypeople can do well (FB could hire turkers!)
4) This was pre-COVID
Overall, we think crowdsourcing is a really promising avenue for platforms trying to scale their fact-checking programs!

Led by @_JenAllen @AaArechar w @GordPennycook

Thanks to the FB Community Review team and to everyone who gave comments

Would love to hear your thoughts too!🎉