Heading to Facebook today for a fireside chat with @SolomonMg about The Ethical Algorithm. In preparing I was looking into re-identification attacks against production systems that purport to protect privacy, and read about Aircloak's system Diffix. A short thread. 1/
Diffix provides an interactive system by which users can query data, and returns answers that are perturbed with small amounts of noise. But despite the name, it doesn't promise differential privacy. They are proud of this! On their website they write that: 2/
"Aircloak’s approach has engineered away the need for a privacy budget by producing tailored pseudo-random noise values that do not average away... which in turn leads to the ability to ask as many queries of your dataset as you desire." So they claim to do away with... 3/
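To see what "noise that does not average away" means, here is a minimal sketch (my own illustration with made-up names and parameters, not Aircloak's actual mechanism): fresh noise per query can be averaged out by simply repeating the query, whereas noise seeded deterministically by the query text returns the same perturbed answer every time.

```python
import hashlib
import random

TRUE_ANSWER = 100.0  # hypothetical true count for some query

def fresh_noise_answer(rng: random.Random) -> float:
    # Fresh random noise each time the query is answered.
    return TRUE_ANSWER + rng.gauss(0, 5)

def sticky_noise_answer(query: str) -> float:
    # "Sticky" noise: derived deterministically from the query text,
    # so repeating the identical query returns the identical answer.
    seed = int.from_bytes(hashlib.sha256(query.encode()).digest()[:8], "big")
    return TRUE_ANSWER + random.Random(seed).gauss(0, 5)

rng = random.Random(0)
# Repeating a fresh-noise query 10,000 times averages the noise away:
avg_fresh = sum(fresh_noise_answer(rng) for _ in range(10_000)) / 10_000
print(avg_fresh)  # very close to 100.0

# A sticky-noise query gives the same perturbed answer every time:
print(sticky_noise_answer("q1") == sticky_noise_answer("q1"))  # True
```

Sticky noise does defeat naive repeat-and-average, but an attacker can still average over many syntactically different queries with the same meaning --- which is roughly the flavor of the attacks described later in this thread.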
One of the main weaknesses of differential privacy: a limited privacy budget! But wait a minute --- the inability to accurately answer an unlimited number of queries isn't a limitation of differential privacy, but a consequence of what is called the "Fundamental Law of Information Recovery"! 4/
We've known since 2003 from the work of @Kobbini and Dinur that answering sufficiently many queries to sufficiently high accuracy allows one to reconstruct the entire dataset exactly, and so to violate -any- reasonable notion of privacy! So how can Aircloak get around this? 5/
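The reconstruction idea can be shown in a toy brute-force sketch (hypothetical tiny parameters of my choosing; the actual 2003 attack uses linear programming to scale): given enough noisy subset-sum answers, *any* candidate database consistent with them must agree with the real one almost everywhere.

```python
import itertools
import random

rng = random.Random(42)
n = 8                                   # tiny hypothetical database: one secret bit per user
secret = [rng.randint(0, 1) for _ in range(n)]

E = 1                                   # every answer is accurate to within +/- E
m = 60                                  # number of random subset-sum queries
queries = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
answers = [sum(q[i] * secret[i] for i in range(n)) + rng.choice([-E, 0, E])
           for q in queries]

# Attack: find any candidate database consistent with all noisy answers.
# (The true database always qualifies, so the search always succeeds.)
for cand in itertools.product([0, 1], repeat=n):
    if all(abs(sum(q[i] * cand[i] for i in range(n)) - a) <= E
           for q, a in zip(queries, answers)):
        break

wrong_bits = sum(c != s for c, s in zip(cand, secret))
print(wrong_bits)  # almost certainly 0: the database is recovered
```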
The answer is that they cannot. :-) In fact the 2003 attack works against Diffix, as Aloni Cohen and @Kobbini recently showed: https://arxiv.org/pdf/1810.05692.pdf Separately, @yvesalexandre and colleagues demonstrated an attack that makes only 32 queries per user: https://www.usenix.org/conference/usenixsecurity19/presentation/gadotti 6/
Aircloak claims to have changed their system to defeat these specific attacks: https://aircloak.com/break-my-lifes-work-and-ill-pay-you-handsomely/ but still claims to be able to answer an unlimited number of queries accurately, and to satisfy GDPR privacy requirements. Color me skeptical. 7/
The history of data privacy before differential privacy looked like this: privacy researchers would propose some system of heuristics to anonymize data. Then, clever attackers would come along and find an exploit. The researchers would patch it up, and this would repeat. 8/
It was a losing game for the privacy side. The advent of differential privacy put a stop to this cycle by offering rigorous guarantees. And the concept of accurate access to data as a necessarily limited and budgeted resource is fundamental (as we've known since 2003) 9/
and not specific to differential privacy. The attacks on Diffix were white hat in that they were done by researchers who published their work, but they could have easily been "black hat" and we'd never know. Real guarantees are important when we are worried about adversaries. 10/
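The budgeted-access idea is easy to sketch with the Laplace mechanism (a minimal illustration; `BudgetedServer` and its parameters are my own invention, not any production system): each answer spends privacy budget under basic sequential composition, and once the total is exhausted the server must refuse.

```python
import math
import random

class BudgetedServer:
    """Answers sensitivity-1 counting queries under a total privacy budget."""

    def __init__(self, total_epsilon: float, seed: int = 0):
        self.remaining = total_epsilon
        self.rng = random.Random(seed)

    def _laplace(self, scale: float) -> float:
        # Inverse-CDF sampling of Laplace noise with the given scale.
        u = self.rng.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

    def answer(self, true_count: float, epsilon: float) -> float:
        # Basic sequential composition: each answer spends epsilon.
        if epsilon > self.remaining + 1e-12:   # tolerance for float round-off
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        # Laplace noise with scale 1/epsilon gives epsilon-DP for a
        # sensitivity-1 counting query.
        return true_count + self._laplace(1.0 / epsilon)

server = BudgetedServer(total_epsilon=1.0)
for _ in range(10):
    server.answer(true_count=100.0, epsilon=0.1)   # ten answers at eps = 0.1
try:
    server.answer(true_count=100.0, epsilon=0.1)   # budget is now spent
except RuntimeError as err:
    print(err)                                     # privacy budget exhausted
```

Note the trade-off the budget forces: splitting a fixed total epsilon across more queries means a smaller epsilon (and so more noise) per answer --- exactly the limit on accurate access the thread describes.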
You can find the paper for the 2nd attack on Diffix here: https://www.usenix.org/conference/usenixsecurity19/presentation/gadotti (Thanks @jugander!)
You can follow @Aaroth.