Thread by @kareem_carr, The reason machine learning algorithms show bias is that the goal of [...]

🔥 Kareem Carr 🔥

kareem_carr

The reason machine learning algorithms show bias is that the goal of these algorithms is to learn ALL the patterns in the data including the biases. The "bias" is actually the gap between what the data scientist THINKS is being learned and what's actually being learned.

🧵

An interesting feature of this bias is it's subjective. It depends on what the data scientist INTENDED to learn from the data. For all we know, the data scientist intended to learn all the patterns in the data, racism and all. In which case, there is no bias.

Generally, machine learning does not require us to be specific about what patterns we are trying to learn. It just vaguely picks up all of them. This means we often have no clue what was learned and if it is what we intended to learn.

Traditional statistics isn't like this. In statistics, the first step is specifying what patterns you want to detect. This requires you to have some kind of theory about the structure of the data. Most importantly, this allows you to check if your theory is wrong.

This issue is a huge weakness of the machine learning approach. The vagueness about what is being learned means that we have to do a lot of work after we fit the model to understand the properties of the model itself. In practice, this work is often not done.

The reason we need to do the work is that we can't rely on theory to tell us what the model learned, so we must measure it. This means looking at how the model behaves in order to see if it's racist, sexist or has other biases we might care about.
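One simple way to "measure it" is to compare how often the model gives a positive prediction for each group. The sketch below is only an illustration under assumptions: the function names, the toy predictions and the group labels "a" and "b" are all made up, and a gap in positive rates is just one of many possible measurements, not a complete test for bias.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def largest_rate_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: one model output and one group label per row.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(predictions, groups)
gap = largest_rate_gap(rates)
print(rates, gap)
```

A large gap does not by itself say why the model behaves differently across groups. It only tells you that there is something to look into.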

As we see with the many examples of racist algorithms, many of the people using machine learning mistakenly think that they can rely on their intuitions to guess what kinds of patterns are in their dataset and what kind of patterns their algorithms are learning. This is naive.

I think the solution to racism in algorithms (and other biases of this kind) is to be more hands-on about understanding the processes that created the data your model uses and more proactive and explicit about checking that your models have the properties you think they have.
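As a rough illustration of being explicit about a property you think your model has, here is a sketch of one possible check: change only a sensitive attribute and see whether the prediction changes. Everything here is hypothetical, including `toy_model`, the `income` and `group` fields and the thresholds. The toy model stands in for a fitted model that was meant to use income only.

```python
def flips_under_attribute_swap(model, rows, attribute, values):
    """Return rows whose prediction changes when only `attribute` is varied."""
    changed = []
    for row in rows:
        # Predict once per candidate value, holding every other field fixed.
        outputs = {model({**row, attribute: v}) for v in values}
        if len(outputs) > 1:
            changed.append(row)
    return changed


# Hypothetical stand-in for a fitted model: it also keys on the group field.
def toy_model(row):
    threshold = 50 if row["group"] == "a" else 60
    return int(row["income"] >= threshold)


rows = [
    {"income": 40, "group": "a"},
    {"income": 55, "group": "b"},
    {"income": 70, "group": "a"},
]

changed = flips_under_attribute_swap(toy_model, rows, "group", ["a", "b"])
print(changed)
```

A check like this only catches direct use of the attribute. A model can still pick up the same pattern through other fields that are correlated with it, which is why understanding how the data was created matters too.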

🧵

