Let’s talk prediction models.
My background is in health, and I’m working on a prediction model at the moment (although I can’t talk about it until after it’s released by Government, probably end of 2021). So I’ll be talking from that lens, but the same principles apply. https://twitter.com/callapilla/status/1330635260292263936
Basically when we do this kind of analysis, we’re trying to come up with a description of the world, using mathematics. Something like “people who have high blood pressure tend to have more heart disease”, only with more Greek letters in it. This is a “model”.
A model is basically a box with three things: lots of knobs on, somewhere we can feed data in, and somewhere we can get predictions out. We then feed in a whole bunch of historical data and twiddle the knobs until we get the result we want. See https://xkcd.com/1838/ here.
Once we finish twiddling, we have a “trained model” (or a “fitted model”, depending on whether you’re a computer scientist or a statistician) that we can then feed new data and get predictions out of. Ideally, the new predictions will be approximately right.
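If you like to see the knobs, here’s a toy sketch of that in Python, with completely made-up numbers and scikit-learn’s LogisticRegression standing in for the box:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical data (entirely hypothetical): systolic blood pressure (mmHg)
# and whether that person went on to develop heart disease (1 = yes).
blood_pressure = np.array([[110], [125], [132], [145], [158], [170], [118], [162]])
heart_disease = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# "Twiddling the knobs": fitting picks the coefficients (the Greek letters)
# that best describe the historical data.
model = LogisticRegression()
model.fit(blood_pressure, heart_disease)

# The trained/fitted model: feed in new data, get predictions out.
new_patients = np.array([[120], [165]])
print(model.predict_proba(new_patients)[:, 1])  # predicted risk of heart disease
```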
However, there’s always some people with high blood pressure who don’t get heart disease. Maybe this is because the world is just kind of random; there’s always some element of chance. This is called “aleatoric uncertainty” - randomness happens, sometimes predictions are wrong.
This is the source of the 95% accuracy figure in the article - “there’s just some people who don’t fit this model, because, well, shit happens, amiright?”
Well, no. Because there’s another problem: “epistemic uncertainty”.
Basically, this is the idea that, well, what if your model itself is wrong? What if your description of the world is woefully incomplete?
Now I fit firmly into the George Box, uh, box here - all models are wrong, but some are useful. Recognising how the model might be wrong is the most important thing that I can do as someone trying to model the world.
This takes me away from treating the results as inherently valuable, and reminds me that it’s what you do with the model that is the most important thing.
This is why we don’t look at someone with high blood pressure and say “well, you’re going to get heart disease!” We treat them.
We look at the results of our model and dig underneath the high blood pressure. We try and find the underlying cause, whether it’s diet or exercise or genetics or whatever, and we come up with a plan to break the causal chain that leads to heart disease.
Importantly, this breaks our model. Suddenly the person who we predicted would get heart disease doesn’t.
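Here’s another toy sketch of that (again, entirely made-up numbers): fit a model on the “old world”, treat everyone it flags as high risk, and watch its predictions stop coming true - which is exactly what we wanted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Old world": higher blood pressure strongly predicts heart disease.
bp_old = rng.normal(140, 20, size=(500, 1))
risk_old = 1 / (1 + np.exp(-(bp_old[:, 0] - 140) / 10))
disease_old = rng.binomial(1, risk_old)

model = LogisticRegression().fit(bp_old, disease_old)

# "New world": everyone the model flags as high risk gets treated, so the
# relationship the model learned no longer holds for them.
bp_new = rng.normal(140, 20, size=(500, 1))
treated = model.predict_proba(bp_new)[:, 1] > 0.5
risk_new = np.where(treated, 0.05, 1 / (1 + np.exp(-(bp_new[:, 0] - 140) / 10)))
disease_new = rng.binomial(1, risk_new)

# The model still predicts high risk for the treated people, but far fewer
# of them actually get sick: we broke the model, on purpose.
print("predicted to get heart disease:", treated.mean())
print("actually got it, among those predicted:", disease_new[treated].mean())
```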
Epistemic uncertainty says that there will always be ways to break the model.
The model does not tell the future: it highlights ways for us to make the future that we want.
This is why the modelling of COVID outbreaks has to be reworked when we make social or policy changes to try and flatten the curve: the model is epistemically incomplete. It’s not just that it happened to get the predictions wrong: it approximates a world that no longer exists.
To design a model that predicts who will die of heart disease and then use that model to condemn those people to death is profoundly unethical.
To decide a priori who will be a “criminal” and to punish them based on that decision is no less unethical.
(and don’t get me started on the social construction of crime pls)
All models are wrong.
However, some are useful. Within them, they contain the seeds of their own destruction: the ways in which we can break the models to avoid predicted harms.
Any model that doesn’t lead you to new understandings is, to my mind, useless at best, and harmful at worst. Algorithmic cruelty is no less cruel because you made a computer do it.