I’ve been playing #FPL for three years now. I don’t know much about football (soccer). But I do know a bit about statistics and economics, so I’ve built a model and applied some theory to both FPL and Draft. Thread here if you’re interested in FPL and statistics (mute otherwise)
I’m originally from Vancouver and am a hockey guy (go @Canucks). There’s no way I will learn about all the teams and players in the PL. But is there a way I can get a leg up and avoid eye-testing endless goalless draws between West Ham and West Brom? Bloody well hope so...
A number of different sites offer individual player point predictions, each taking into account different variables. Sites like @FFH_HQ , @foomnianalytics, and @FantasyFootyFix (disclosure: I’m a minor angel investor in @FFH_HQ)
Lots of thought has gone into their models... by smart people who know WAY more than me about football. Each model has its own assumptions and imperfections, so alone they’re fraught with bias, statistical noise and random variance. This isn’t a diss, it’s an objective reality.
Enter ensemble models. This is a machine learning technique where you take multiple models, each with their own imperfections, and combine them. The whole becomes greater than the sum of the parts. Background reading: https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2
A quick aside: I have a busy day job and a one-year-old son, so I’ve not had too much time to dedicate to this. It’s been a spare-time, for-fun thing. I’m sure it can be done better; I whipped it up in Excel, which has acknowledged limitations. ML/stats people, pls don’t judge🤷‍♂️
Right. Step one: get predictions from a bunch of models. I used Foomni, both prediction methods from Hub, and Fix. I also used last year’s total points, and xP from Hub’s OPTA data. So six different predictions in all, per player. If a player is new in PL, I ignored the last two.
Side Note: for points last year, and xP, I extrapolated across a 38g season. Yes, that’s imperfect. All the models are imperfect. That’s kind of the point of this exercise. Probably weighting the models would be better, but I mean, this is supposed to be a hobby, not a job.
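For the code-minded, a minimal Python sketch of that extrapolation, assuming it's a straight scale-up by games played (the thread doesn't say whether games or minutes were used). The function name and the example numbers are made up for illustration.

```python
def extrapolate_to_season(points, games_played, season_games=38):
    """Scale a points total to a full season at the same points-per-game rate."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return points * season_games / games_played


# Hypothetical player: 120 points from 30 games
print(extrapolate_to_season(120, 30))  # 152.0
```

A player who featured in all 38 games keeps the same total, which is a handy sanity check.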
Side note: each site uses slightly different ways to list players, positions and teams. So it took me a while to clean the data so it all matched. I’ve probably missed some detail or mis-coded a few players. Would be easier if like 20 different players weren’t called Pereira!
Ok, so now I’ve got a load of data. But: I found a problem! Each prediction method has a different distribution of data. Some over-score goalies, some under-score midfielders, and so on. So next step was to independently normalise each prediction.
Side note: normalising (strictly speaking, standardising) is where you 1) treat each data set as roughly normally distributed; and 2) rescale it so that the mean is 0 and the standard deviation is 1. This ensures you can compare apples with apples. Background reading: https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Probability/BS704_Probability9.html
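A rough Python sketch of that rescaling (z-scores), in case it helps. It assumes the population standard deviation; the spreadsheet may well have used the sample version. The two "sites" and their numbers are invented.

```python
from statistics import mean, pstdev


def standardise(scores):
    """Rescale scores so they have mean 0 and standard deviation 1."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all scores are identical, cannot standardise")
    return [(s - mu) / sigma for s in scores]


# Two hypothetical sites rating the same three players on very different scales
site_a = [200.0, 150.0, 100.0]  # e.g. season point totals
site_b = [6.0, 5.0, 4.0]        # e.g. points per game
print(standardise(site_a))  # roughly [1.22, 0.0, -1.22]
print(standardise(site_b))  # roughly [1.22, 0.0, -1.22]
```

Once both lists are on the same scale, the two sites turn out to agree exactly on these three players, which the raw numbers hide.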
Right: so now we have 6 prediction methods, all on the same rating scale. I then took the arithmetic mean, and presto! Relative point predictions for about 200 of the top rated players (200 is the limit for linear optimisation in Excel, more on that coming up) 🙄
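The averaging step could look something like this in Python. It's a sketch only: marking a missing source as `None` (for players new to the PL) is my assumption about how to handle the gaps, and the z-scores are invented.

```python
def ensemble_score(z_scores):
    """Arithmetic mean of the available z-scores; None marks a missing source."""
    available = [z for z in z_scores if z is not None]
    if not available:
        raise ValueError("no predictions available for this player")
    return sum(available) / len(available)


# Hypothetical z-scores from the six sources
established = [1.0, 0.5, 1.5, 2.0, 0.0, 1.0]
# A player new to the league: no last-season points, no xP
newcomer = [0.5, 1.5, 1.0, 1.0, None, None]

print(ensemble_score(established))  # 1.0
print(ensemble_score(newcomer))     # 1.0
```

Every source gets equal weight here, matching the plain arithmetic mean in the thread; a weighted version would multiply each z-score by a per-source weight before averaging.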
But here’s a problem: there are billions of possible team combinations here! And while I’m sure some know for sure that Werner will smash it for a new team, or that Rashford secretly stubbed his golden toe, I don’t know this info and tbh don’t care to learn it. So how can I pick?
Enter linear optimisation. This is where you have a linear mathematical problem with many outcomes... far more outcomes than the feeble human brain can comprehend (no offence, loser humans)😜 background reading: https://en.m.wikipedia.org/wiki/Linear_programming
Back in business school in Canada, I studied decision modelling in Excel. If you set up your variables (players) and constraints (budget and number of players, max players per team etc) then Excel will determine the optimal (well, technically near optimal) solution.
You can only have 200 variables (players to select from) as a limit in Excel, so I couldn’t include Barry Horowitz or some other jabroni from Fulham. But that doesn’t matter, because I’m only concerned with picking good players!
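To make the setup concrete, here's a toy Python sketch of the same kind of problem: maximise predicted score subject to a budget, a squad size and a per-club cap. Big caveat: this is NOT linear programming and not what Excel does. It's a brute-force search over every combination, which only works because the invented pool has six players; a real 200-player pool needs a proper solver. Position rules are left out to keep it short, and all names, prices and scores are made up.

```python
from collections import Counter
from itertools import combinations

# (name, club, price in £m, predicted score) -- all values invented
POOL = [
    ("A", "Red", 6.0, 9.0),
    ("B", "Red", 5.0, 8.0),
    ("C", "Blue", 4.0, 5.0),
    ("D", "Blue", 5.5, 7.0),
    ("E", "Green", 4.5, 6.0),
    ("F", "Green", 7.0, 9.5),
]


def best_squad(pool, squad_size, budget, max_per_club):
    """Try every squad of the given size and keep the best feasible one."""
    best, best_score = None, float("-inf")
    for squad in combinations(pool, squad_size):
        if sum(p[2] for p in squad) > budget:
            continue  # over budget
        clubs = Counter(p[1] for p in squad)
        if max(clubs.values()) > max_per_club:
            continue  # too many players from one club
        score = sum(p[3] for p in squad)
        if score > best_score:
            best, best_score = squad, score
    return best, best_score


squad, score = best_squad(POOL, squad_size=3, budget=15.5, max_per_club=1)
print([p[0] for p in squad], score)  # ['B', 'D', 'E'] 21.0
```

Note the result skips the two highest-scoring players (A and F) because pairing either with decent teammates blows the budget. That's the same flavour of trade-off the full model makes.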
Anyways, I ran it - and the results were interesting. Different from what #FPL twitter says you’re supposed to do. Everyone is going hard on £12m midfielders, and sacrificing defence and goalies as a result. But... my model isn’t so sure that’s wise...
So here’s the big reveal. I’ve assumed I’ll have a £4m goalie and a £4m D, plus one F or M at £4.5m. That leaves me w £87.5m across 12 players to play with (instead of 11, to ensure 1 good bench cover). And here is the “optimal” team from my linear optimisation:
And here are the top 20 predicted scorers overall. Interesting to note that the correlation coefficient between predicted score and price is 0.81. So broadly, FPL prices indicate expected points. Neato.
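For anyone wanting to check a figure like that themselves, a small Python sketch of the Pearson correlation coefficient, which I'm assuming is the statistic meant here (it's what Excel's CORREL returns). The prices and scores below are invented, so the printed value has nothing to do with the 0.81 above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical prices (£m) and ensemble scores for five players
prices = [12.0, 9.5, 7.0, 5.5, 4.5]
scores = [2.1, 1.4, 1.6, 0.3, -0.2]
print(round(pearson(prices, scores), 2))
```

A value near 1 means pricier players tend to have higher predicted scores; a value near 0 would mean price tells you little.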
Prob a few surprises in the optimal team! No Man City, for example. Perhaps the rotation risk, or having too many potential scorers, or the high average prices, has reduced the efficacy of individual performers. I’m just guessing here btw.
Another interesting point is the heaviness in D, and avoiding premium M except for Salah. The efficiency of pounds spent on D, according to my (admittedly unproven) method, outweighs the opportunity cost of over-investing in Ms.
Now, without including captain points, the team is predicted to get 2127 points. Of course, transfers will be needed and so forth. Some Pukki or Ings will raise their foot as the deal of the season. But, that is impossible to predict at this stage.
And yeah, Sterling or someone else will score a hattrick and people will revel in their (lucky) superiority. My model is bad at short-term variance, and instead focuses on longitudinal success. Any given player can have a lucky week, and if you guessed right, enjoy it.
BUT: you will never be able to reliably predict luck. Don’t kid yourself.
So, there you go. Will this method make me do better than my pretty lame 230k finish last year? Maybe. But if it doesn’t, at least I can blame the machine... and never forget the fact that football sucks and ice hockey is a way way way better sport 😂😂😂
BTW I've just added a new thread about #FPLDraft - if you're interested in Draft, check it out here: https://twitter.com/ParryMalm/status/1303728381515751425?s=20
You can follow @ParryMalm.