I've been talking a lot about the bad study about Twitter bots and COVID-19 that has been shared widely over the weekend. I'd like to highlight a GOOD study on the same topic, mostly as an example of what I would like to see more of from researchers, & what media should look for.
The paper is "COVID-19 on Twitter: Bots, Conspiracies, and Social Media Activism", by Emilio Ferrara (@emilio__ferrara) of USC. It's a preprint, which means it's not yet peer reviewed, but is available for the public and experts (like me) to evaluate.

https://arxiv.org/abs/2004.09531 
The paper analyzes ~100M tweets from Jan 21 to Mar 12. It gives the exact keywords used to find the tweets, the algorithm used to judge how bot-like an account is (Botometer in this case), and the statistical methods used to analyze the bot scores.
Even more importantly, the entire data set is available here: https://github.com/echen102/COVID-19-TweetIDs (though it's just the IDs so there will be some deleted/suspended tweet content missing)

So first off, the researcher is clear about what he's doing, and the work is reproducible.
Ferrara makes reasonable assumptions. He says that accts w/ a 10% or less "bot" score are likely human, and accts w/ a 90%+ "bot" score are likely bots. Anything in between is too uncertain so he throws it out bc we can't know if the behavior is from bots or from people.
Of course, some of those <10% scoring accounts might be bots and some of the >90% scoring accounts might be human. But these are pretty high thresholds. Compare this to the CMU lab's previous studies, which use a 60% likelihood threshold as a cutoff for "we consider this a bot".
The paper is also forthcoming about what is guesswork and what is measured fact, prefacing speculation with phrases like "We can only speculate that..."
The paper lays bare its natural language processing analysis too. Check out this figure where he filters for the top 10 distinctive 3-word phrases in likely bot vs human accounts. The words alone tell a qualitative story, and we're given a time series for even more context.
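For anyone who wants a feel for what "distinctive 3-word phrases" means in practice, here is a rough sketch. It is not the paper's method (I'm not reproducing its exact ranking here): it just counts trigrams in two groups of tweets and ranks them by a crudely smoothed ratio of relative frequencies. The function names and the scoring formula are my own assumptions for illustration.

```python
from collections import Counter


def trigrams(text):
    """Split a tweet on whitespace and return its lowercase 3-word phrases."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def distinctive_trigrams(group_a, group_b, top_n=10):
    """Rank trigrams that are over-represented in group_a relative to group_b.

    Hypothetical scoring (an assumption, not the paper's): ratio of relative
    frequencies, with +1 added to each count so that phrases missing from
    group_b do not cause a division by zero.
    """
    a = Counter(t for tweet in group_a for t in trigrams(tweet))
    b = Counter(t for tweet in group_b for t in trigrams(tweet))
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1

    def score(t):
        return ((a[t] + 1) / total_a) / ((b[t] + 1) / total_b)

    return sorted(a, key=score, reverse=True)[:top_n]
```

Calling `distinctive_trigrams(bot_tweets, human_tweets)` and then swapping the arguments gives the two lists you would compare side by side, where `bot_tweets` and `human_tweets` are plain lists of tweet text that you supply.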
The paper also asks: if there are bots that try to interfere with discourse, are there bots that try to help inform the public? He looks at the data and finds that yes, these exist too. I've attached the conclusion here, which is a headline-repelling "it's complicated".
And finally, the author expends significant effort explaining the limitations of the data and of the study.

All in all, a great paper. Interesting enough that I might go ahead and try and reproduce the results and play with the data myself.
Researchers: please take note of these practices and try to emulate them.

Journalists: if an academic comes to you with exciting results of a study, ask to see the paper, and make sure it looks like this. (If it's about bots, send it to me! I'll happily opine on its legitimacy.)
Also @emilio__ferrara is guest editing an upcoming COVID-19 issue of the Journal of Computational Social Science. Here's the call for papers. I expect it will be good reading when it's out. https://www.springer.com/journal/42001/updates/17993070
For my prior threads on the bad news items that have been going around, see here: https://twitter.com/tinysubversions/status/1264681091450892289
Made a mistake earlier. The paper's thresholds are not Botometer scores below 10% and above 90%; they are scores below the 10th *percentile* and above the 90th *percentile*. This translates to Botometer scores of < 0.04 and > 0.44 (am auditing the data today, more soon)
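To make the score-vs-percentile distinction concrete, here is a minimal sketch of percentile-based cutoffs. It assumes you already have one bot score per account in a dict; the function names, the labels, and the linear-interpolation percentile are my choices for illustration, not something taken from the paper's code.

```python
import math


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one score")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_vals[lo]
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def label_accounts(scores):
    """scores: dict of account name -> bot score in [0, 1].

    Returns (low_cut, high_cut, labels). Accounts strictly below the 10th
    percentile are labeled likely human, accounts strictly above the 90th
    percentile are labeled likely bot, and everything else is discarded.
    """
    vals = sorted(scores.values())
    low_cut = percentile(vals, 10)
    high_cut = percentile(vals, 90)
    labels = {}
    for acct, s in scores.items():
        if s < low_cut:
            labels[acct] = "likely human"
        elif s > high_cut:
            labels[acct] = "likely bot"
        else:
            labels[acct] = "discarded"
    return low_cut, high_cut, labels
```

The cutoffs depend entirely on how the scores are distributed in the sample. That is how a "90th percentile" account can have a raw score as low as 0.44.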
Folks, Botometer is not looking accurate when I manually review accounts flagged w/ a 90th percentile bot score. Gonna have to write up something substantial and it's probably gonna take longer than a day.

Still: this paper is good science bc I can check its work like this!
I almost wonder if Botometer should be renamed to Normiemeter because it seems to flag a lot of activity that is just normal interaction w/ twitter by people who are not Terminally Online (people who exclusively use the "share this article" intent feature on news sites to tweet)
Can you imagine the headlines? "50% of accounts tweeting about COVID are normies"
As an ethnographer friend of mine pointed out: Botometer's false positives might just be another case of tech people assuming that everyone uses technology exactly like they do
What's amusing/depressing is that the higher the bot score, the less likely the account is to be a bot and the more likely it is to be just a person who doesn't understand Twitter very well and mostly crossposts from their very real Facebook account (or similar)
0.5 bot score: account created a month ago that only RTs accounts created a month ago

0.95 bot score: grandpa sharing from facebook who probably forgot he even has a twitter account hooked up to his facebook account
so far I have not found a *single* account scoring higher than 0.9 via botometer that does not appear to just be........... an older person using tech in ways that are considered uncool or gauche by techies
This has nothing to do with botometer but I remember a researcher saying "we consider an 8 digit number in an account name to be a sign that it's a bot because that's what twitter offers by default when you sign up" ... the implicit assumption being people would change it (1/4)
But if you sign up for a Twitter account in 2020, you literally don't have the option to choose a username! They give you a username that is something like name12345678, and then you have to go into your settings and manually change it, and they don't prompt you to do this! (2/4)
If you are not very technical & don't like to poke around in your application settings, and you're not social media savvy and therefore don't understand that the Twitter equivalent of [email protected] is a corny username, you're not gonna change it. Doesn't make you a bot. (3/4)
In fact if I were building a botnet one of the first things I would do is make non-default usernames so as not to appear fishy. I wouldn't be surprised if name12345678 accounts are more likely to be simply less-computer-literate humans rather than bots. (4/4)
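For illustration, the "8 digits in the name" heuristic boils down to something like the sketch below. The exact pattern is an assumption based on the name12345678 shape described above; I don't know the precise format Twitter generates, and `looks_default` is a made-up helper name.

```python
import re

# Assumed shape of a default handle: letters/underscores, then exactly 8 digits.
DEFAULT_STYLE = re.compile(r"^[A-Za-z_]+\d{8}$")


def looks_default(username):
    """True if the handle matches the assumed default-username shape.

    This only says the username was probably never changed. It says nothing
    about whether a human or a bot is behind the account.
    """
    return bool(DEFAULT_STYLE.match(username))
```

A check like this flags people who never opened their settings, which is exactly the problem with treating it as a bot signal.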
update: found one account but it's a news aggregator, hardly salacious stuff https://twitter.com/tinysubversions/status/1265346449031618560
I'm like the miracle sudoku guy in reverse -- working through the puzzle and realizing that, no, none of this actually works
Let me tell you: being reverse miracle sudoku guy fuckin sucks
(thread continued here) https://twitter.com/tinysubversions/status/1266146859938009088
You can follow @tinysubversions.