Quick thread highlighting a potentially hot take for fellow data nerds about the @FiveThirtyEight NFL Prediction Model (yes, they do much more than just electoral politics):

I've long suspected that the model's pre-season Elo ratings imply too much parity among teams.

(1/n)
Let's start with how a team's pre-season Elo is determined.

In short, it's a blend: two-thirds comes from Vegas odds for the upcoming season, and one-third from last season's ending Elo, reverted 33% toward the mean.

Full explanation by the @FiveThirtyEight team pictured
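For concreteness, here's a minimal sketch of that blend as I read it. The league-mean rating of 1505 and the idea of a ready-made "Vegas-implied Elo" input are assumptions on my part; how the model actually converts win totals into Elo points is in the pictured explanation, not this sketch.

```python
# A rough sketch of the blend described above (not FiveThirtyEight's code).
# Assumptions: a league-average Elo of 1505 and a "vegas_elo" input that
# someone has already derived from the Vegas win totals.

LEAGUE_MEAN = 1505    # assumed league-average rating
REVERSION = 1 / 3     # revert last season's rating 33% toward the mean
VEGAS_WEIGHT = 2 / 3  # two-thirds weight on the Vegas-derived rating


def reverted_elo(last_season_elo: float) -> float:
    """Pull last season's ending Elo one-third of the way back to the mean."""
    return last_season_elo + REVERSION * (LEAGUE_MEAN - last_season_elo)


def preseason_elo(last_season_elo: float, vegas_elo: float) -> float:
    """Blend: 2/3 Vegas-implied rating, 1/3 mean-reverted prior rating."""
    return VEGAS_WEIGHT * vegas_elo + (1 - VEGAS_WEIGHT) * reverted_elo(last_season_elo)


# Example: a team that ended last season at 1650 but whose Vegas win total
# implies only a ~1550 rating lands around 1567.
print(preseason_elo(last_season_elo=1650, vegas_elo=1550))
```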

(2/n)
My suspicion is that the decision to revert toward the system's mean Elo by specifically 33% is the issue. Some interesting questions: Why 33%? Was that the optimal backtested value? If so, should we be concerned about data dredging?
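If I wanted to poke at the "why 33%?" question myself, the crude experiment would be a grid search over reversion fractions, scoring each with something like a Brier score on game outcomes. A rough sketch of that idea (emphatically not FiveThirtyEight's actual procedure), assuming the game-level CSV linked later in this thread with season/date/team/score columns, a 1505 mean, and a standard K=20 Elo update:

```python
import pandas as pd

URL = "https://raw.githubusercontent.com/fivethirtyeight/data/master/nfl-elo/nfl_elo.csv"
MEAN, K = 1505, 20  # assumed league mean and K-factor


def brier_for_reversion(games: pd.DataFrame, reversion: float) -> float:
    """Mean Brier score of a bare-bones Elo model that reverts every rating
    toward MEAN by `reversion` at the start of each season."""
    ratings, season, total, n = {}, None, 0.0, 0
    for g in games.itertuples():
        if g.season != season:  # new season: apply the reversion
            season = g.season
            ratings = {t: r + reversion * (MEAN - r) for t, r in ratings.items()}
        r1, r2 = ratings.get(g.team1, MEAN), ratings.get(g.team2, MEAN)
        p1 = 1 / (1 + 10 ** ((r2 - r1) / 400))  # home field, rest, QBs all ignored
        result = 1.0 if g.score1 > g.score2 else (0.0 if g.score1 < g.score2 else 0.5)
        total, n = total + (p1 - result) ** 2, n + 1
        ratings[g.team1] = r1 + K * (result - p1)  # standard Elo update
        ratings[g.team2] = r2 - K * (result - p1)
    return total / n


games = pd.read_csv(URL).dropna(subset=["score1", "score2"]).sort_values("date")
for rev in (0.0, 0.25, 1 / 3, 0.5):
    print(f"reversion {rev:.2f} -> Brier {brier_for_reversion(games, rev):.4f}")
```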

(3/n)
This only matters (in the context of a sports model 😜) if it manifests in the predictions.

The best check might be to look at the forecasted full-season records of teams immediately prior to the start of the season, when the Elos described above are running through the model.
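To make "running the Elos through the model" concrete: at its core such a forecast is just the Elo win-probability formula applied to each game and averaged over many simulated seasons. A toy sketch, with a fake schedule of average opponents and none of the home-field/rest/QB adjustments the real model has:

```python
import random


def win_prob(elo_a: float, elo_b: float) -> float:
    """Standard Elo win probability for team A against team B."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))


def expected_wins(team_elo: float, opponent_elos: list[float], sims: int = 20_000) -> float:
    """Average wins across many simulated seasons (toy version: opponents'
    ratings are held fixed and games are treated as independent)."""
    total = 0
    for _ in range(sims):
        total += sum(random.random() < win_prob(team_elo, opp) for opp in opponent_elos)
    return total / sims


# Hypothetical bottom-of-the-league team (~1350 Elo) against a 16-game
# slate of roughly average (~1500) opponents: about 4.7 expected wins.
print(expected_wins(1350, [1500] * 16))
```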

(4/n)
As pictured, the model forecasted that no team would *on average* (across tens of thousands of simulations) finish the 2020 season with a record worse than 5 wins and 11 losses. How does that compare with how the worst teams in the league have fared historically?

(5/n)
Since the NFL moved to a 16-game season in 1978, the number of full seasons in which no team finished with a record worse than 5 wins and 11 losses is... zero.

Worse than 4 wins and 12 losses? Just once (2003).

78% of seasons have had a team that won 2 games or fewer (!!!).
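(Those tallies are easy to reproduce from the game-level CSV linked later in this thread. Column names like season, playoff, team1, and score1 are my reading of that file, and strike-shortened seasons aren't handled, so treat this as a sketch:)

```python
import pandas as pd

URL = "https://raw.githubusercontent.com/fivethirtyeight/data/master/nfl-elo/nfl_elo.csv"
df = pd.read_csv(URL)
# Regular-season games since 1978 that have actually been played
# (assumes the playoff column is blank for regular-season games).
df = df[(df["season"] >= 1978) & df["playoff"].isna() & df["score1"].notna()]

# Wins per team per season (ties ignored for brevity).
home = df.assign(team=df["team1"], win=(df["score1"] > df["score2"]).astype(int))
away = df.assign(team=df["team2"], win=(df["score2"] > df["score1"]).astype(int))
wins = pd.concat([home, away]).groupby(["season", "team"])["win"].sum()

worst = wins.groupby("season").min()  # each season's worst win total
print((worst >= 5).sum())    # seasons where no team was worse than 5-11
print((worst <= 2).mean())   # share of seasons with a <=2-win team
```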

(6/n)
Importantly, this *is not* damning in and of itself. After all, mean reversion is real (and, like, the whole point of this thread...)

That is, *most* teams that actually finish with a 2-14 record project as "better than 2-14" on average across tens of thousands of simulations. But by how much?
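A toy way to see the point: take a hypothetical team whose "true" per-game win probability is about 28%, i.e. roughly a 4.5-win team on average over 16 games. A simple binomial calculation says it still finishes with 2 wins or fewer around 13% of the time:

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial P(wins <= k) over an n-game season with per-game win prob p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# A "true" ~4.5-win team (p = 0.28) over a 16-game season:
print(prob_at_most(2, 16, 0.28))  # ~0.13, i.e. roughly one season in seven or eight
```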

(7/n)
An efficient way to learn whether this is worth exploring further (or whether my hypothesis is blatantly wrong) would be to run the current model on each historical season (n=41) and determine the probability that *any team* finishes with 2 wins or fewer.
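Sketching what that check could look like: simulate a league of 32 teams from their preseason Elos and count how often the worst simulated record is 2 wins or fewer. The schedule below is fake (opponents drawn at random) and the Elo spread is invented, so this only illustrates the mechanics, not the model:

```python
import random


def win_prob(a: float, b: float) -> float:
    return 1 / (1 + 10 ** ((b - a) / 400))


def p_any_team_two_wins_or_fewer(elos: list[float], games: int = 16, sims: int = 10_000) -> float:
    """Fraction of simulated seasons in which at least one team wins <= 2 games.
    Toy schedule: each team's opponents are sampled from the other ratings."""
    hits = 0
    for _ in range(sims):
        worst = games + 1
        for i, elo in enumerate(elos):
            opponents = [e for j, e in enumerate(elos) if j != i]
            wins = sum(random.random() < win_prob(elo, random.choice(opponents)) for _ in range(games))
            worst = min(worst, wins)
        hits += worst <= 2
    return hits / sims


# Hypothetical league: 32 preseason Elos spread evenly between 1350 and 1650.
league = [1350 + 300 * i / 31 for i in range(32)]
print(p_any_team_two_wins_or_fewer(league))
```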

(8/n)
Using these 41 data points, we could calculate the value of 'x' that the model would have used to complete the statement: "We expect that in ['x'%] of seasons, there will be a team that finishes with 2 wins or fewer".

Does the model view it as extremely unlikely?
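Mechanically, 'x' would just be the average of those 41 per-season probabilities. The numbers below are placeholders, not model output:

```python
# Placeholders standing in for the 41 per-season probabilities the model
# would produce (one per season since 1978).
per_season_probs = [0.45, 0.62, 0.38]  # ...41 values in practice

x = sum(per_season_probs) / len(per_season_probs)
print(f"Model: a <=2-win team in {x:.0%} of seasons (vs. 78% historically)")
```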

(9/n)
Unfortunately, this is where we start losing visibility.

There's a downloadable CSV of the historical data (linked below), but as far as I can tell, it can't be reconciled to the distribution of potential records for each team across many simulations.

https://github.com/fivethirtyeight/data/tree/master/nfl-elo

(10/n)
If I'm misinterpreting the data and it actually can be reconciled to the model's predicted records for each team at the start of every season, hopefully someone reading this can show me how!

(11/n)
Would also love to hear anyone's thoughts on this, not least from @NateSilver538, @jayboice, and @Neil_Paine ... especially any insider commentary on what they discussed when thinking about reversion to the mean in this particular model!

(12/n)
Lastly, I'll highlight that the point is by no means to disparage the model or the team ... quite the opposite! I love their work, and I am not actually any good at this stuff myself -- I'm commenting from a place of recreational interest.

(13/13)