The Twitter API v2 for academics makes it a lot easier to collect complete event data

I currently have a 4-step process that I finally ran successfully end to end on a real event this week, so I thought it might be useful to share

1/
1. Try to use the filter stream to get as much of the data as possible

I think the 1% threshold still applies here (you can only stream all filtered tweets if they're <1% of all tweets at any time). If that limit is a real problem for you, you can fall back on the next step
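
A minimal sketch of what setting up step 1 might look like, assuming the v2 filtered-stream rules endpoint accepts a JSON body of the form {"add": [{"value": ..., "tag": ...}]}. The keywords, the tag, and the helper name build_rule_payload are hypothetical placeholders, and the authenticated POST itself is left out:

```python
import json

# Assumed v2 endpoint for managing filtered-stream rules.
RULES_URL = "https://api.twitter.com/2/tweets/search/stream/rules"


def build_rule_payload(keywords, tag):
    """Combine event keywords into a single OR rule (hypothetical helper)."""
    # Quote multi-word phrases so they are matched as exact phrases.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    value = "(" + " OR ".join(terms) + ")"
    return {"add": [{"value": value, "tag": tag}]}


# Placeholder keywords; swap in the terms for your own event.
payload = build_rule_payload(["#exampleevent", "example event"], "event-keywords")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to RULES_URL with a bearer token before connecting to the stream. Requesting the conversation_id tweet field on the stream saves work in step 3.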
2. Use the full archive endpoint to backfill the early event data that you missed

This is slightly less comprehensive than streaming. Streaming matches your filters on both a tweet's text AND a quoted tweet's text. Search only matches on a tweet's own text, not on the text of any tweet it quotes
Also, the completeness of your data will vary based on how quickly you're able to run the full archive search

Note: this is where you need to have the academic track
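
A rough sketch of the backfill request for step 2, assuming the full-archive search endpoint takes query, start_time, end_time, max_results, and next_token parameters with RFC 3339 timestamps. The query and both timestamps are made-up placeholders, and backfill_params is a hypothetical helper that only builds the parameters rather than sending anything:

```python
from datetime import datetime, timezone

# Assumed v2 full-archive search endpoint (academic track).
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def to_rfc3339(dt):
    """Format a timezone-aware datetime as a UTC RFC 3339 timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def backfill_params(query, event_start, stream_start, next_token=None):
    """Parameters covering the gap between event start and stream start."""
    params = {
        "query": query,
        "start_time": to_rfc3339(event_start),
        "end_time": to_rfc3339(stream_start),
        "max_results": 500,  # assumed page-size ceiling for this endpoint
        "tweet.fields": "conversation_id,created_at,author_id",
    }
    if next_token:
        # Pass the token from the previous page to keep paginating.
        params["next_token"] = next_token
    return params


# Placeholder times: the event began six hours before the stream did.
params = backfill_params(
    '#exampleevent OR "example event"',
    datetime(2021, 3, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2021, 3, 1, 6, 0, tzinfo=timezone.utc),
)
print(params)
```

Reusing the same keywords as the stream rule keeps the two collections comparable, apart from the quoted-tweet difference noted above.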
3. Use all of the conversation IDs of all the tweets from the streaming and the search to get all their reply threads

This is new with v2! There's no excuse anymore not to actually have the conversations that happen around events in your data
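
One way step 3 could be sketched, assuming the conversation_id: search operator and a 1,024-character query limit (my understanding of the academic track limit, so treat it as an assumption). The helper conversation_queries is hypothetical: it dedupes the IDs and packs them into as few OR queries as fit under the limit:

```python
def conversation_queries(conversation_ids, max_len=1024):
    """Pack unique conversation IDs into OR queries no longer than max_len."""
    queries = []
    current = ""
    for cid in sorted(set(conversation_ids)):
        term = f"conversation_id:{cid}"
        candidate = term if not current else f"{current} OR {term}"
        if len(candidate) > max_len and current:
            # The current query is full: store it and start a new one.
            queries.append(current)
            current = term
        else:
            current = candidate
    if current:
        queries.append(current)
    return queries


# Placeholder IDs; real ones come from the streamed and searched tweets.
queries = conversation_queries(["3", "1", "2", "1"], max_len=40)
print(queries)
```

Each resulting query would be run through the full archive search, so reply collection counts against the same monthly cap as everything else.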
4. (optional) Collect the user timelines of those in your dataset

This is so that you have contextual data on what people were saying prior to a particular event. When we focus just on event data, we lose the bigger picture

But a warning:
Limit how much of the timelines you collect. If you try to get all 3,200 tweets for every user, you'll skyrocket past the 10,000,000 tweets/month cap

I'm currently trying to go just 2 weeks back for every user. This is the last thing I'm running so we'll see if I hit the cap
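
To make the warning concrete, here is a back-of-the-envelope sketch using only the two numbers above (3,200 tweets per timeline, 10,000,000 tweets per month). The 50,000-user dataset and the event date are invented for illustration, and timeline_start is a hypothetical helper for the two-week cutoff:

```python
from datetime import datetime, timedelta, timezone

MONTHLY_CAP = 10_000_000  # tweets/month cap mentioned above
MAX_TIMELINE = 3_200      # most tweets retrievable per user timeline


def worst_case_tweets(n_users, per_user=MAX_TIMELINE):
    """Upper bound on tweets pulled if every user has a full timeline."""
    return n_users * per_user


def timeline_start(event_start, days_back=14):
    """RFC 3339 cutoff for collecting only the weeks before the event."""
    cutoff = event_start.astimezone(timezone.utc) - timedelta(days=days_back)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


# Full timelines for more than this many users would exceed the cap.
max_full_users = MONTHLY_CAP // MAX_TIMELINE
print(max_full_users)

# Hypothetical dataset of 50,000 users: the worst case is 16x the cap.
print(worst_case_tweets(50_000) / MONTHLY_CAP)

# Placeholder event date; the cutoff lands two weeks earlier.
print(timeline_start(datetime(2021, 3, 15, tzinfo=timezone.utc)))
```

So full timelines only fit under the cap for about 3,125 users, and that is before counting the stream, backfill, and reply tweets. Passing the cutoff as a start_time on each timeline request (assuming the endpoint supports that parameter) is what keeps the total manageable.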
So if we use Twitter API v2 in this way, what does this give us?

1) complete event data, including early trending that is missed before something goes viral
2) full conversations around the event
3) contextual timeline data so we can situate the event in a broader setting
One potential issue that there isn't a way around with v2 is the 10,000,000 tweets/month cap. So this pipeline is only going to work on moderate-sized events. Not something large or ongoing like the presidential election or national vaccine rollouts
My code is fairly tailored to our lab machine so it probably isn't particularly useful, but I hope this thread is helpful for thinking through how to use the Twitter API v2 for event data

Also, totally forgot the thread numbers. Time for the weekend
You can follow @ryanjgallag.