Thread by @nickwan, i asked some days ago about the value of a drive and [...]

i asked some days ago about the value of a drive and whether you can assess if a drive was good or bad irrespective of the outcome of the drive. like if a drive featured high value plays but the outcome was not a TD, it isn't necessarily a "bad" drive https://twitter.com/nickwan/status/1305570840726122501

so i messed around and came up with a solution. you could use the average EPA of a drive and the EPA variability of a drive (minus the drive-ending play) to get a rough estimation of drive value but i wanted to connect those values to something like EP
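
a minimal sketch of those two drive-level numbers, assuming play-by-play rows of (drive id, EPA) in play order. the toy values, the `drive_features` name, and the choice of population SD are all assumptions for illustration, not pulled from the notebook:

```python
from collections import defaultdict
from statistics import mean, pstdev

# toy play-by-play rows: (drive_id, epa), in play order. values are made up.
plays = [
    ("d1", 0.5), ("d1", 1.0), ("d1", -0.3), ("d1", 2.1),
    ("d2", -0.4), ("d2", 0.2), ("d2", -1.5),
]

def drive_features(plays):
    """mean and SD of EPA per drive, ignoring each drive's last play."""
    by_drive = defaultdict(list)
    for drive_id, epa in plays:
        by_drive[drive_id].append(epa)
    feats = {}
    for drive_id, epas in by_drive.items():
        body = epas[:-1]  # drop the drive-ending play
        if not body:
            continue  # one-play drive: nothing left to summarize
        feats[drive_id] = {
            "epa_mean": mean(body),
            "epa_sd": pstdev(body),  # population SD; sample SD is an equally fair choice
            "n_plays": len(body),
        }
    return feats

features = drive_features(plays)
```

dropping the last play is what keeps the drive's ending (TD, pick, punt) out of the summary.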

so i took EPA mean, SD, the initial drive starting yard line, and the time left in the game and trained a model that returned EP given these drive-level values. mind you, there are no drive-ending plays in this so there are no explicit tells for how the drive ends
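
the thread doesn't say which model was trained or exactly which EP value was the target, so treat this as a stand-in: plain least-squares regression from the four drive-level inputs to an EP number. the rows, the 0-1 scaling of yard line and time, and the `fit_linear` / `predict` helpers are all hypothetical:

```python
def fit_linear(X, y):
    """ordinary least squares via the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # gauss-jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]

def predict(coefs, row):
    """intercept plus weighted sum of the drive-level inputs."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# made-up drives: (epa_mean, epa_sd, start yard line / 100, fraction of game left)
X = [
    (0.40, 0.50, 0.75, 0.90), (-0.10, 0.30, 0.60, 0.70),
    (0.25, 0.80, 0.35, 0.50), (-0.30, 0.20, 0.80, 0.30),
    (0.10, 0.60, 0.50, 0.10), (0.55, 0.40, 0.20, 0.60),
    (-0.20, 0.70, 0.45, 0.85), (0.00, 0.10, 0.90, 0.40),
]
# made-up "true" weights, only used to generate a toy EP target to fit against
true_coefs = [1.0, 3.0, -0.5, -2.0, 0.2]
y = [predict(true_coefs, row) for row in X]

coefs = fit_linear(X, y)
drive_value = [predict(coefs, row) for row in X]  # model output = drive value
```

the point is the shape of the thing: drive-level inputs in, an EP-scale number out, and that output is what gets called drive value.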

theoretically, the model output is the value of the drive regardless of how the drive ended. so how did it perform?

not bad, actually. so if you separate the top drive values from the bottom drive values (median split) and plot them on EP, high-value drives end up with higher EP than low-value drives. there's also bleed between these groups, which suggests good drives can end in not-good outcomes
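
a rough sketch of that median split, assuming each drive is a (drive value, EP) pair. the numbers are invented and `median_split` is a hypothetical helper:

```python
from statistics import mean, median

# made-up (drive_value, ep) pairs
drives = [(0.2, 1.1), (1.5, 3.0), (-0.4, 0.3), (2.2, 2.4), (0.9, 1.9), (-1.0, 2.0)]

def median_split(drives):
    """split EP values by whether the drive value is above the median."""
    cut = median(v for v, _ in drives)
    high = [ep for v, ep in drives if v > cut]
    low = [ep for v, ep in drives if v <= cut]
    return cut, high, low

cut, high, low = median_split(drives)

# "bleed": low-value drives whose EP still beats the weakest high-value drive
overlap = [ep for ep in low if ep > min(high)]
```

comparing `mean(high)` to `mean(low)` gives the group difference, and a non-empty `overlap` is the bleed between groups.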

i wanted to know if this was sort of an effect of the number of plays and this popped out. the fewer plays in a drive, the more strongly the value is associated with the outcome. but by the time you get to 7 plays, it's harder to determine if a drive was "good"

for transparency, mean of EPA for a drive does something

so i wanted to know whether teams with good drives also scored more often? so i took team-seasons, took the mean of the drive value and correlated that with the points scored in a season. this actually ended up with a stronger correlation than just EPA mean
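
here's one way that team-season check could look, assuming a plain pearson correlation (the thread doesn't name the correlation used). team codes, drive values, and point totals are made up:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# made-up per-drive values: (team, season, drive_value)
drive_values = [
    ("AAA", 2019, 1.0), ("AAA", 2019, 2.0),
    ("BBB", 2019, 0.5), ("BBB", 2019, 0.7),
    ("CCC", 2019, 2.5), ("CCC", 2019, 1.5),
]
# made-up season point totals
points = {("AAA", 2019): 380, ("BBB", 2019): 300, ("CCC", 2019): 420}

def pearson(xs, ys):
    """pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

# group drives into team-seasons, then average the drive value
by_team = defaultdict(list)
for team, season, value in drive_values:
    by_team[(team, season)].append(value)

keys = sorted(by_team)
team_mean_value = [mean(by_team[k]) for k in keys]
team_points = [points[k] for k in keys]
r = pearson(team_mean_value, team_points)
```

running the same thing with team-season mean EPA in place of mean drive value gives the comparison number.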

(sorry about the crazy titles yall, i'm just copying and pasting straight up from my notebook)

so to some extent, i felt pretty okay with this simple approach to drive value. so like... let's test this shit. if we look at last night's game, the quality of NE's drives (especially in the 4Q) was really good, while both teams' 3Q drives were subpar

this also sheds some light on those first two drives from seattle. one was "bad" because it ended in a pick 6 and the other was a TD but both were actually above average drives, irrespective of outcome

best drive quality this week? that's the browns

was the cowboys game luck? dallas actually out-drove atlanta the majority of the game (albeit pretty close). atlanta lost some pace in the 3Q. so drive-wise, perhaps the game was a lot closer than the score suggested at halftime

how about the worst quality games? the worst driving team this past week was the jets. the niners didn't fare much better, but they out-drove the jets consistently from 2Q onward. unsurprisingly, that's also where the win% swings heavily towards SF

the lowest drive quality game of the week was perhaps also the most fun to watch -- the OT KC comeback vs the chargers. both teams actually had fairly poor drives -- KC particularly so basically until part way into the 3Q -- and LAC was actually *better* in terms of drives

but KC ended up with something my model can't capture. and that's pretty fun (or frustrating, depending on how data-centric you are)

anyway. a wise friend once said "if you torture your data, it will confess". unsure if i'm doing that. as i said before, you may be alright with just taking the EPA average of a drive; this is just one step beyond that point imo

if you are in my discord, you read all this already! sorry i haven't streamed -- internet is ass and i move in a little under two weeks. but as always, my analytics are open for yall: https://colab.research.google.com/drive/1vziZs1yDqribhKrAl3_NsCabBA_GQhAe?usp=sharing

and if you want more of this content (sports analytics or data sci in general), tune into my streams http://twitch.tv/nickwan_datasci

oh right, two things:
1) this is (and probably will always be) a work in progress and created to be as simple as possible. no definitives here, all up for modification
2) i encourage you to take the notebook and fiddle with it https://colab.research.google.com/drive/1vziZs1yDqribhKrAl3_NsCabBA_GQhAe?usp=sharing

never be scared to be falsified by better science yall. records were meant to be broken, science was meant to go through rigorous methodology
