Data over a certain size is impossible to anonymize.

That size is ridiculously small; far smaller than you'd think. A few hundred data points can reveal an enormous number of patterns.
It takes 33 bits of information to uniquely identify one out of 7 billion humans.

It takes 20 bits to identify one out of a million.

It takes 10 bits to identify one out of a thousand.
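
That's just arithmetic: the number of bits needed to single out one of N people is log2(N), rounded up. A quick check, sketched in Python:

```python
import math

def bits_to_identify(population: int) -> int:
    """Bits needed to uniquely distinguish one person out of `population`."""
    return math.ceil(math.log2(population))

print(bits_to_identify(7_000_000_000))  # 33
print(bits_to_identify(1_000_000))      # 20
print(bits_to_identify(1_000))          # 10
```

Every yes/no fact a dataset captures about you is worth up to one of those bits.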
Any usefully large dataset collected about human behaviour becomes impossible to anonymize. Patterns will reveal themselves, and will correlate with other patterns.
Any dataset of human behaviour large enough to serve one purpose (say, targeted advertising) will inevitably be able to serve other purposes (say, identifying pregnancy, or determining sexuality).
There is nothing preventing, say, data tracking your activity levels, such as steps per day, from revealing which supermarket you shop at.
Large supermarket chains know exactly how many steps the average customer takes through their stores. They run models to optimize how much product you must walk past to check off a shopping list.
Your Fitbit data indicates which supermarket you go to, to a sufficiently motivated party. From that, it indicates where you likely live.

Even if you stripped out any reference to location, that pattern is still there.
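
A minimal sketch of that kind of linkage, with every name and number below invented for illustration: one dataset holds a week of step counts under a pseudonym, another holds them under real names (a loyalty program, say), and correlating the daily patterns joins the two without any location field.

```python
# Hypothetical linkage attack: re-identify a pseudonymous step-count
# record by correlating it against a second, named dataset.
# All data below is invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# "Anonymized" fitness data: pseudonym -> steps per day for one week.
anonymized = {
    "user_4f2a": [8200, 7900, 12400, 8100, 8000, 15300, 4200],
}

# A second dataset that carries real names (a loyalty program, say).
named = {
    "Alice": [8150, 7950, 12300, 8050, 8100, 15200, 4300],
    "Bob":   [3100, 3300, 2900, 3200, 3000, 9800, 10200],
    "Carol": [11000, 10800, 11200, 10900, 11100, 2100, 2300],
}

for pseudonym, series in anonymized.items():
    best = max(named, key=lambda name: pearson(series, named[name]))
    print(pseudonym, "is most likely", best)  # -> Alice
```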
After enough data is collected, it's possible to exploit it for patterns far different from the ones intended.
There is no informed consent possible on large datasets. There is no way you could know what pattern eventually emerges from the data, or limit which patterns can be gleaned from it.
From a person's tweet schedule it's possible to identify their work hours, whether they're a parent, whether they're married, whether they're in school, which school they're in, and which classes they're taking.
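
A sketch of how little that takes, with timestamps invented for illustration: bucket the tweets by hour of day, and the quiet weekday stretch is a decent guess at work or school hours.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps for one account, invented for illustration.
timestamps = [
    "2017-03-06 07:45", "2017-03-06 12:10", "2017-03-06 18:30",
    "2017-03-07 07:50", "2017-03-07 12:05", "2017-03-07 19:15",
    "2017-03-08 07:40", "2017-03-08 12:20", "2017-03-08 18:55",
]

# Count tweets per hour of the day.
by_hour = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M").hour
                  for t in timestamps)

# Daytime hours with no tweets at all suggest when this person
# is at work or in class.
quiet = [h for h in range(8, 18) if by_hour[h] == 0]
print("tweets per hour:", dict(sorted(by_hour.items())))
print("likely occupied during hours:", quiet)
```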
Basically: surveillance can't *help* but become ubiquitous. It's impossible to collect large data about human behaviour without that data revealing far more than what was intended.

Once you can collect enough, the surveillance is inevitable.
It's an outcome of the mere existence of that data.

There is no technological way to limit what patterns that data can reveal, or to narrow it to just the domain sought.

You will identify pregnant teenagers just by what hats they buy.
There's also no social way to prevent that data from revealing all the patterns it holds. GDPR can't stop a sufficiently motivated person from intentionally or unintentionally discovering a pattern in that data.
Once that data exists, all the patterns within it exist, just waiting to be correlated with each other, waiting to be used for unexpected purposes entirely unlike the originally intended use.
Consider the data you are providing right now, just by when you tweet.

What patterns are there, to be found among a large enough population of users?

Could you identify which company a person was laid off from, knowing this plus their location?
Could you identify not only what company, but what their previous job was?
Could you identify not only what their previous job was, but whether they are likely to own or rent a home?
Could you identify, just from that data, whether they have a car or use public transport?
Could you identify that one person, out of their entire city, from that data?
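
A toy version of that last question, with everything below invented: reduce each person in a city of a million to six yes/no attributes that tweet timing might plausibly suggest, and count how many people match one profile.

```python
import math
import random

random.seed(0)

# Hypothetical: a city of 1,000,000 people, each reduced to six yes/no
# attributes that tweet timing might plausibly reveal (parent, office
# hours, rail commuter, recently laid off, night owl, weekend poster).
# Each person is a 6-bit integer; everything here is invented.
city = [random.getrandbits(6) for _ in range(1_000_000)]

# Suppose the tweet schedule implies one specific six-bit profile.
profile = 0b101101

matches = sum(1 for person in city if person == profile)
print(f"{matches} of 1,000,000 match all six inferences")  # roughly 15,600

# Each independent attribute is one bit; twenty such bits single out one person.
print("bits to reach one person:", math.ceil(math.log2(1_000_000)))  # 20
```

Six bits leaves thousands of candidates; twenty independent bits leaves one.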