joss wright, oxford internet institute (plenary): privacy-preserving data analysis - mechanisms and...


Upload: iscienceeu

Post on 05-Dec-2014

Category:

Education


DESCRIPTION

Network of Excellence Internet Science Summer School. The theme of the summer school is "Internet Privacy and Identity, Trust and Reputation Mechanisms". More information: http://www.internet-science.eu/

TRANSCRIPT

Page 1: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Privacy Mechanisms Notable Cases State of the Art Conclusions

Privacy-Preserving Data Analysis
Mechanisms and Formal Guarantees

Joss Wright
[email protected]

Oxford Internet Institute
Oxford University

Joss Wright Privacy-Preserving Data Analysis: 1/57

Page 2: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Privacy

What is privacy?

Many definitions in different areas of application.
A useful definition: informational self-determination.

Enable data subjects to control how, in what way, and to whom their data is made available.

Page 3: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Privacy

What is privacy?

Within the privacy-enhancing technologies community:

Protecting the relations between communicating parties from observation.
Context privacy.
Anonymous communications.

Preventing deduction of identities or attributes from collections of data.
Data privacy.

Strongly related concepts, but surprisingly separate fields of research.

Page 4: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Data Privacy

Protection of individual data subjects from identification.

Typically we work within the context of statistical queries on databases.

Counts, averages, histogram queries, etc.

Page 5: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Model

Consider a database as made up of a number of rows, each representing a single, unique individual, with columns showing attributes.

Not all databases are like this, but it’s useful for mechanism design and gives sufficient generality.

Page 6: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Model

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200

Page 7: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Actors

Data subjects
Owners of the data

Holders and publishers of data
Recipients of data

Attacker

Page 8: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Trust in the System

Where do we place trust in the system?

Subjects
Need not be trusted, as they control their own data.

Publishers
May need to be trusted in how they gather the data.
If you expect them to control release, they must be trusted.

Data Recipients
Adversarial and malicious.

Page 9: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Basic Mechanisms

Anonymization
Remove explicit identifiers such as names.

Privacy-preserving data mining
Restrict queries to preserve privacy or results.
Preferably enforced by the data publisher.

Data perturbation
Alter data to prevent undesirable inferences from being drawn.

Page 10: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization

Remove names or other obvious identifiers from data.
Problems arise with quasi-identifiers.

Combinations of record values that uniquely identify individuals.
These can be difficult to specify or even detect.
Exacerbated by the fact that data from external sources may combine with the database to form a quasi-identifier.
We’ll come back to this.

Page 11: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200
Charles  31   187
David    27   168

Page 12: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200
Charles  31   187
David    27   168

Page 13: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200
Charles  31   187
David    30   168

Red values are unique, therefore quasi-identifiers.

Page 14: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200
Charles  31   187
David    30   168

Blue values are unique combinations, and so quasi-identifiers.
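
As a rough illustration of the two slides above, the following sketch lists which rows are singled out by each attribute on its own and by the combination of attributes. The table is the one printed on this slide. The helper names (`unique_rows`, `quasi_identifier_report`) are made up for this example and are not from the talk.

```python
from collections import Counter
from itertools import combinations

# Toy table as printed on this slide.
ROWS = [
    {"Name": "Joss", "Age": 31, "Height": 168},
    {"Name": "Alice", "Age": 30, "Height": 144},
    {"Name": "Bob", "Age": 25, "Height": 200},
    {"Name": "Charles", "Age": 31, "Height": 187},
    {"Name": "David", "Age": 30, "Height": 168},
]


def unique_rows(rows, columns):
    """Return the names of rows whose values on `columns` occur exactly once."""
    def key(row):
        return tuple(row[c] for c in columns)

    counts = Counter(key(row) for row in rows)
    return [row["Name"] for row in rows if counts[key(row)] == 1]


def quasi_identifier_report(rows, attributes):
    """Map every non-empty attribute combination to the rows it singles out."""
    report = {}
    for size in range(1, len(attributes) + 1):
        for columns in combinations(attributes, size):
            report[columns] = unique_rows(rows, columns)
    return report


report = quasi_identifier_report(ROWS, ["Age", "Height"])
for columns, names in report.items():
    print(columns, names)
```

On this table, age alone singles out one person and height alone singles out three, while the pair (age, height) singles out every row. The number of combinations grows exponentially with the number of attributes, which is one reason quasi-identifiers are hard to enumerate in practice.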

Page 15: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Anonymization Methods

One of the most well-known anonymizing mechanisms applied to data is k-anonymity.

Each record in a database should be indistinguishable, on its quasi-identifiers, from at least (k − 1) other records in the database.
Any given record therefore describes at least k people.

The probability that you are identified by that record is at most 1/k.

Page 16: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

k-anonymity

Name     Age  Height
Joss     31   168
Alice    30   144
Bob      25   200
Charles  31   187
David    27   168

Page 17: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

k-anonymity

Name     Age      Height
Joss     [25-35]  ≤180
Alice    [25-35]  ≤180
Bob      [25-35]  >180
Charles  [25-35]  >180
David    [25-35]  ≤180
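
A minimal sketch of how the k in k-anonymity could be measured for the raw and generalised tables above. It assumes that Age and Height are the quasi-identifiers and that k is the size of the smallest group of rows sharing the same quasi-identifier values. The function name `k_anonymity` is illustrative only.

```python
from collections import Counter

# Raw and generalised tables from the slides: (Name, Age, Height).
RAW = [
    ("Joss", 31, 168),
    ("Alice", 30, 144),
    ("Bob", 25, 200),
    ("Charles", 31, 187),
    ("David", 27, 168),
]
GENERALISED = [
    ("Joss", "[25-35]", "≤180"),
    ("Alice", "[25-35]", "≤180"),
    ("Bob", "[25-35]", ">180"),
    ("Charles", "[25-35]", ">180"),
    ("David", "[25-35]", "≤180"),
]


def k_anonymity(rows, qi_indices):
    """Return the size of the smallest group sharing quasi-identifier values."""
    groups = Counter(tuple(row[i] for i in qi_indices) for row in rows)
    return min(groups.values())


k_raw = k_anonymity(RAW, qi_indices=(1, 2))
k_generalised = k_anonymity(GENERALISED, qi_indices=(1, 2))
print(k_raw, k_generalised)
```

The raw table is only 1-anonymous because every (age, height) pair is unique. After generalisation the smallest group has two members, so the table is 2-anonymous with respect to these two columns.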

Page 18: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

k-anonymity Applied

This is not a hypothetical issue.
When Sweeney proposed k-anonymity, she demonstrated the risks.

Took postcode, date of birth and sex from a published voter register.
Took anonymized published medical records.
Identified the record belonging to a former governor of Massachusetts.
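
The linkage described above can be sketched as a join on the shared quasi-identifier. All records below are invented placeholders, not real data, and the function name `link` is hypothetical. The sketch reports a match only when exactly one voter shares the quasi-identifier values of a medical record.

```python
# Hypothetical voter register: names published alongside quasi-identifiers.
VOTERS = [
    {"name": "Voter A", "postcode": "P1", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "postcode": "P1", "dob": "1950-01-01", "sex": "F"},
    {"name": "Voter C", "postcode": "P2", "dob": "1960-05-05", "sex": "F"},
    {"name": "Voter D", "postcode": "P2", "dob": "1960-05-05", "sex": "F"},
]

# Hypothetical "anonymized" medical records: names removed, attributes kept.
MEDICAL = [
    {"postcode": "P1", "dob": "1950-01-01", "sex": "M", "diagnosis": "condition-1"},
    {"postcode": "P2", "dob": "1960-05-05", "sex": "F", "diagnosis": "condition-2"},
]


def link(voters, medical, keys=("postcode", "dob", "sex")):
    """Re-identify medical records whose quasi-identifier matches one voter."""
    index = {}
    for voter in voters:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])

    matches = []
    for record in medical:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match is a re-identification
            matches.append((names[0], record["diagnosis"]))
    return matches


matches = link(VOTERS, MEDICAL)
print(matches)
```

Here the first medical record is re-identified, while the second stays ambiguous because two voters share its quasi-identifier values. That ambiguity is the protection k-anonymity aims to provide.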

Page 19: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Beyond k-anonymity

k-anonymity gives a basic level of anonymization that prevents an individual being simply re-identified from their published attributes.

There are, naturally, more subtle issues.

We may still be able to infer sensitive information about a person,even if we can’t directly identify them.

Page 20: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

k-anonymity ensures that an individual is indistinguishable from a group of other individuals, preventing their direct re-identification.

It could be, however, that attributes shared by the entire group are sensitive.

Page 21: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

Name     Age  Height  Illness
Joss     31   168     Flu
Alice    30   144     Flu
Bob      25   200     HIV
Charles  31   187     HIV
David    27   168     Flu

Page 22: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

Name     Age      Height  Illness
Joss     [25-35]  ≤180    Flu
Alice    [25-35]  ≤180    Flu
Bob      [25-35]  >180    HIV
Charles  [25-35]  >180    HIV
David    [25-35]  ≤180    Flu

Page 23: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

Name     Age      Height  Illness
Joss     [25-35]  ≤180    Flu
Alice    [25-35]  ≤180    Flu
Bob      [25-35]  >180    HIV
Charles  [25-35]  >180    HIV
David    [25-35]  ≤180    Flu

Page 24: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

Name     Age      Height  Illness
Joss     [25-35]  ≤200    Flu
Alice    [25-35]  ≤200    Flu
Bob      [25-35]  ≤200    HIV
Charles  [25-35]  ≤200    HIV
David    [25-35]  ≤200    Flu

Page 25: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

l-diversity

l-diversity ensures that not only are all users k-anonymous, but that each group of users shares a variety of sensitive attributes.

Variations ensure that all sensitive attributes are evenly or sufficiently distributed to avoid high probability association of user with attribute.

One notable extension is t-closeness, which ensures that the distribution of attributes in the group is close to the distribution across the entire table.
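
A small sketch of the simplest variant, often called distinct l-diversity: l is the smallest number of distinct sensitive values found in any quasi-identifier group. The two tables are the generalised ones from the preceding slides. The function name and the choice of this particular variant are assumptions made for illustration.

```python
from collections import defaultdict

# Generalised tables from the slides: (Name, Age, Height, Illness).
SPLIT = [
    ("Joss", "[25-35]", "≤180", "Flu"),
    ("Alice", "[25-35]", "≤180", "Flu"),
    ("Bob", "[25-35]", ">180", "HIV"),
    ("Charles", "[25-35]", ">180", "HIV"),
    ("David", "[25-35]", "≤180", "Flu"),
]
MERGED = [(name, age, "≤200", illness) for name, age, _, illness in SPLIT]


def l_diversity(rows, qi_indices, sensitive_index):
    """Smallest number of distinct sensitive values in any group."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[i] for i in qi_indices)].add(row[sensitive_index])
    return min(len(values) for values in groups.values())


l_split = l_diversity(SPLIT, qi_indices=(1, 2), sensitive_index=3)
l_merged = l_diversity(MERGED, qi_indices=(1, 2), sensitive_index=3)
print(l_split, l_merged)
```

In the split table each height group shares a single illness, so knowing someone's group reveals their illness even though the table is 2-anonymous. Merging the height ranges yields one group with two distinct illnesses, at the cost of coarser data.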

Page 26: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Perturbation

The above approaches maintain the consistency of the database.

One of the oldest ideas is simply to replace genuine values with perturbed values that maintain almost-correct desirable properties.
For numeric quantities this can simply be the addition of random noise according to some appropriate distribution.

Obviously this works best for numerical data.
For categories, this can result in attributes being re-assigned in a variety of ways.
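
One way to picture numeric perturbation is to add independent zero-mean noise to each value and check that an aggregate such as the mean is nearly preserved. The Gaussian distribution, the noise level and the synthetic heights below are all assumptions chosen for illustration; the slides do not prescribe a particular distribution.

```python
import random
import statistics


def perturb(values, sigma, rng):
    """Add independent zero-mean Gaussian noise to each numeric value."""
    return [value + rng.gauss(0.0, sigma) for value in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
heights = [rng.uniform(140, 200) for _ in range(10_000)]  # synthetic data
noisy = perturb(heights, sigma=5.0, rng=rng)

true_mean = statistics.mean(heights)
noisy_mean = statistics.mean(noisy)
print(round(true_mean, 2), round(noisy_mean, 2))
```

Individual values move by several centimetres, while the mean over many records changes only slightly because the noise averages out. With few records the aggregate would be distorted much more.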

Page 27: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Permutation

Sensitive attributes can be swapped between data records, maintaining statistical quantities such as aggregate counts, averages and distribution of data.
This has to be performed sensitively with respect to the required analyses.

Typically on an ad-hoc, per-database basis.
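
A minimal sketch of attribute swapping, assuming the simplest possible scheme: shuffle one sensitive column across all records. The table is the illness table from the earlier slides, and `swap_column` is an illustrative name rather than an established API.

```python
import random
from collections import Counter

ROWS = [
    {"Name": "Joss", "Age": 31, "Height": 168, "Illness": "Flu"},
    {"Name": "Alice", "Age": 30, "Height": 144, "Illness": "Flu"},
    {"Name": "Bob", "Age": 25, "Height": 200, "Illness": "HIV"},
    {"Name": "Charles", "Age": 31, "Height": 187, "Illness": "HIV"},
    {"Name": "David", "Age": 27, "Height": 168, "Illness": "Flu"},
]


def swap_column(rows, column, rng):
    """Return copies of `rows` with the values of `column` shuffled."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


swapped = swap_column(ROWS, "Illness", random.Random(1))
print(Counter(row["Illness"] for row in swapped))
```

The per-illness counts are unchanged, but any relationship between illness and the other columns is destroyed. This is why swapping has to be designed around the analyses that must remain valid.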

Page 28: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Sweeney’s k-anonymity Re-identification

In 2001, Sweeney set out to prove the ideas behind k-anonymity.

Took publicly available voter registration data and published, anonymized medical records (GIC healthcare data).
At the time of the data collection, William Weld was the governor of Massachusetts.

According to the voter records, only six people in Cambridge, Massachusetts shared his birth date.
Of those six, three were male.
Only one lived within his (5-digit) ZIP code.

Page 29: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Sweeney’s k-anonymity Re-identification

The anonymized medical records contained over 100 attributes detailing diagnoses, procedures and medications.
Sweeney calculated that 87% of US citizens were uniquely identifiable through the quasi-identifier of {sex, date of birth, 5-digit ZIP}.

53% from {sex, date of birth, city}.
18% from {sex, date of birth, county}.

Page 30: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Netflix Prize

Netflix wanted to improve its film recommendation algorithm.

Published a database of over 100,000,000 film ratings by roughly 500,000 subscribers between 1999 and 2005.

A million dollar prize was offered for an algorithm that would improve the recommendations given to users by a given degree of accuracy.

“...all customer identifying information has been removed.”

Page 31: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Netflix Prize

Narayanan and Shmatikov disagreed.
Combined Netflix data with IMDb data to re-identify a large number of users.

Linked Netflix ratings to IMDb profiles.
Showed the entire viewing history of many users.
Demonstrated how information such as political preference could be extracted from the available data.
Proof of concept algorithm used IMDb. Easily adaptable for alternative information sources.

Page 32: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Netflix Prize

With 8 film ratings, 96% of subscribers can be uniquely identified.

With 2 ratings, and dates, 64% can be completely deanonymized.

With 2 ratings, and dates, 89% can be reduced to a possible 8 users.

Page 33: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Netflix Prize Redux

Following this publication, Netflix’s response was...

... to announce a second Netflix prize containing more data points, including age, zip code, gender and previously-chosen films.

Eventually cancelled, but only in response to legal action from customers and concerns from the US Federal Trade Commission.

Page 34: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Mechanisms Revisited

The mechanisms we’ve looked at so far are:

Typically ad-hoc, based on the desired utility: the purpose for which the data will be used.
Without formal guarantees, such as a quantifiable probability that individuals could be re-identified.
Sensitive to auxiliary information from external data sources.

Page 35: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Mechanisms Revisited

We can also consider privacy mechanisms as falling into one of two families:

Non-interactive
Anonymize the data somehow, then release it.

Interactive
Keep the database secret, and only release results to queries.

Page 36: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Non-Interactive Mechanisms

Historically, the main way of doing things.
Including most of the methods we’ve looked at so far.

A major limitation of this approach to anonymization is that it requires you to fix the utility before you release the data.

Data is either useless and anonymous,
or useful and identifiable.
It is difficult to predict interactions with data that might be released in the future.

Page 37: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Interactive Mechanisms

In interactive mechanisms, the data is never released.

Instead, queries are sent to the holder of the database, who releases an answer.

This approach is taken by the current state of the art: differential privacy.
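
The interactive setting can be sketched as a curator object that keeps the rows private and answers only counting queries. Adding Laplace noise with scale 1/epsilon to a count is the standard differentially private choice, but the slides up to this point do not specify a mechanism, so the class name, the parameter `epsilon` and the noise choice here are assumptions for illustration.

```python
import random


class Curator:
    """Holds the database privately and answers counting queries only."""

    def __init__(self, rows, epsilon, rng):
        self._rows = list(rows)
        self._epsilon = epsilon  # smaller epsilon means more noise
        self._rng = rng

    def _laplace(self):
        # The difference of two i.i.d. exponentials with rate epsilon
        # is Laplace-distributed with scale 1 / epsilon.
        return (self._rng.expovariate(self._epsilon)
                - self._rng.expovariate(self._epsilon))

    def noisy_count(self, predicate):
        """Return the number of matching rows plus Laplace noise."""
        true_count = sum(1 for row in self._rows if predicate(row))
        return true_count + self._laplace()


ROWS = [
    {"Name": "Joss", "Age": 31},
    {"Name": "Alice", "Age": 30},
    {"Name": "Bob", "Age": 25},
    {"Name": "Charles", "Age": 31},
    {"Name": "David", "Age": 27},
]
curator = Curator(ROWS, epsilon=1.0, rng=random.Random(0))
answer = curator.noisy_count(lambda row: row["Age"] >= 30)
print(answer)
```

This sketch does not limit how many queries are asked. Repeating the same query and averaging the answers recovers the true count, so a real deployment would also need to track and cap the total privacy loss across queries.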

Page 38: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Differential Privacy

In 1978, Dalenius stated the following desirable property for privacy-preserving statistical databases:

“A statistical database should reveal nothing about an individual that could not be learned without access to the database.”

This is impossible, largely due to the existence of auxiliary external information that can be combined with the data in the database.

Page 39: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Differential Privacy

‘Suppose one’s height were considered a sensitive piece of information, and that revealing the height of an individual were a privacy breach. Assume that a database yields the average heights of women of different nationalities. An adversary who has access to the statistical database and the auxiliary information “Terry Gross is two inches shorter than the average Lithuanian woman” learns Terry Gross’ height, while anyone learning only the auxiliary information, without access to the average heights, learns relatively little.’

– Dwork

Page 40: Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Differential Privacy

Critically, this privacy breach occurs whether or not Terry Gross’ data is in the database.

Differential Privacy

Rather than guaranteeing that a privacy breach will not occur, differential privacy guarantees that the privacy breach will not occur due to the data in the database.

Reformulated: Anything that can happen if your data is in the database could have happened even if your data weren’t in the database.

This neatly accommodates any and all possible auxiliary information available now or in the future.

It also divorces the privacy mechanism from the nature of the underlying data, providing a general mechanism.

Differential Privacy Core

A randomised function $K$ achieves $\epsilon$-differential privacy if, for any two databases $D_1, D_2$ differing on at most one element, and all $S \subseteq \mathrm{Range}(K)$:

$$\Pr[K(D_1) \in S] \le e^{\epsilon} \times \Pr[K(D_2) \in S]$$
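
As an illustrative check of this definition (not taken from the talk), the sketch below uses randomised response on a one-record database as a hypothetical mechanism K: it reports the true bit with probability p and the flipped bit otherwise. It enumerates every subset S of the output range and tests the inequality in both directions. The function names, the value p = 0.75, and the choice of mechanism are assumptions made for the example.

```python
import math
from itertools import chain, combinations

def randomised_response_dist(true_bit, p):
    """Output distribution of a toy mechanism K on a one-record database:
    report the true bit with probability p, the flipped bit otherwise."""
    return {true_bit: p, 1 - true_bit: 1.0 - p}

def all_subsets(outcomes):
    """Every subset S of the mechanism's range, including the empty set."""
    items = list(outcomes)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def satisfies_dp(dist1, dist2, epsilon, tol=1e-12):
    """Check Pr[K(D1) in S] <= e^epsilon * Pr[K(D2) in S] for all S, both ways round."""
    bound = math.exp(epsilon)
    for s in all_subsets(dist1.keys()):
        p1 = sum(dist1[o] for o in s)
        p2 = sum(dist2[o] for o in s)
        if p1 > bound * p2 + tol or p2 > bound * p1 + tol:
            return False
    return True

p = 0.75                              # probability of answering truthfully (assumed)
epsilon = math.log(p / (1 - p))       # ln 3, the tightest epsilon for this toy mechanism
d1 = randomised_response_dist(0, p)   # database D1 holds the bit 0
d2 = randomised_response_dist(1, p)   # neighbouring database D2 holds the bit 1

print(satisfies_dp(d1, d2, epsilon))      # the bound holds at epsilon = ln 3
print(satisfies_dp(d1, d2, epsilon / 2))  # a stricter epsilon is violated
```

The second call fails because the event "output is 0" is three times as likely under D1 as under D2, which exceeds the smaller bound.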

Differential Privacy Core

Alternatively:

$$\frac{\Pr[K(D_1) \in S]}{\Pr[K(D_2) \in S]} \le e^{\epsilon}$$

The ratio between the two probabilities is bounded by $e^{\epsilon}$.

The Exponential Function

[Figure: plot of the exponential function e^ε for ε from 0 to 5; the vertical axis runs from 0 to 150.]
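
To give a feel for how quickly the bound loosens as ε grows, the short sketch below prints e^ε for a few values. The helper name `max_probability_ratio` and the chosen values of ε are illustrative assumptions.

```python
import math

def max_probability_ratio(epsilon):
    """Largest factor by which the probability of any outcome may differ
    between two neighbouring databases under epsilon-differential privacy."""
    return math.exp(epsilon)

for eps in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]:
    print(f"epsilon = {eps:<4} -> e^epsilon = {max_probability_ratio(eps):8.2f}")
```

At ε = 0 the two databases must behave identically; by ε = 5 an outcome may be nearly 150 times more likely under one database than the other.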

Differential Privacy Core

Translated: for any calculation that you make on a database, any result you get is (almost) equally probable if you add a person, and thus a single record, to that database.

Alternatively put: two databases that differ in a single record should be indistinguishable, up to the chosen probability bound, when accessed via the privacy mechanism.

Achieving Differential Privacy

How do we achieve this guarantee?

There are a variety of mechanisms proposed in the literature, but Dwork’s original suggestion remains popular:

Appropriately chosen random noise is added to the result of a query of arbitrary complexity.

Because the noise is added to the query result rather than to the stored data, the original database retains its accuracy.

The Laplace distribution provides desirable properties for the appropriate noise.

The Laplace Distribution
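
For reference, the standard Laplace density with location µ and scale λ (symbols chosen here for illustration, not taken from the slide) can be written as:

```latex
\[
  \mathrm{Lap}(x \mid \mu, \lambda)
    = \frac{1}{2\lambda} \exp\!\left( -\frac{|x - \mu|}{\lambda} \right)
\]
```

The density is symmetric about µ and its tails decay exponentially, so shifting µ by a fixed amount changes the density at any point by at most a fixed multiplicative factor.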

Achieving Differential Privacy

How do we know how much noise to add?

We use the L1-sensitivity of the function to bound the noise:

Defined as the maximum amount by which the query result could change if a single record were added to the database.

Recall that our guarantee is based around indistinguishability between similar databases.

As an example: the count function (e.g. “How many people in the database are left-handed?”) can change by at most one.

Other query types differ, but many complex queries have manageable L1-sensitivity.
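
The calibration described on this slide can be sketched as follows. Assuming the standard Laplace mechanism, noise is drawn with scale equal to the L1-sensitivity divided by ϵ. The data set, the function names `laplace_noise` and `noisy_count`, and the choice ϵ = 0.5 are illustrative assumptions rather than anything from the talk.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as the difference of two
    independent exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng):
    """Counting query with Laplace noise. Adding or removing one record changes
    a count by at most 1, so the L1-sensitivity is 1 and the scale is 1/epsilon."""
    sensitivity = 1.0
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: each record notes whether a person is left-handed.
rng = random.Random(0)
people = [{"left_handed": i % 10 == 0} for i in range(1000)]  # 100 are left-handed
answer = noisy_count(people, lambda r: r["left_handed"], epsilon=0.5, rng=rng)
print(round(answer, 2))  # a value near 100; the exact figure depends on the seed
```

A query with larger sensitivity (a sum over unbounded values, say) would need proportionally more noise for the same ϵ, which is why bounded, low-sensitivity queries are attractive.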

Properties of Differential Privacy

Use of the Laplace distribution to add noise provably adds the smallest amount required to preserve privacy.

The multiplicative factor used in the guarantee is scalable for higher or lower guarantees.

Stronger guarantees (smaller values of ϵ) decrease the likelihood that databases can be distinguished as a result of queries, but make results less accurate.
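
The trade-off can be made concrete under one assumption: that noise is drawn from a Laplace distribution with scale equal to sensitivity divided by ϵ, in which case the expected absolute error equals that scale. The helper name `expected_abs_error` and the sample values are hypothetical.

```python
def expected_abs_error(sensitivity, epsilon):
    """Mean absolute error of Laplace noise with scale sensitivity / epsilon.
    For Laplace(0, s) the expected absolute value is exactly s."""
    return sensitivity / epsilon

for eps in [0.01, 0.1, 1.0, 10.0]:
    err = expected_abs_error(sensitivity=1.0, epsilon=eps)
    print(f"epsilon = {eps:>5}: expected |noise| = {err:g}")
```

For a count query, tightening ϵ from 1 to 0.01 moves the typical error from about one record to about a hundred.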

Differential Privacy Illustrated

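[Figure: two probability curves, Pr[x], centred on the true results µ1 and µ2, with a and b marked as possible noisy results.]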

Differential Privacy Illustrated (Explanation)

In the previous slide, let µ1 and µ2 be two “true” results of a query, such as a count function, from each of two databases that differ in a single record.

With random noise added, drawn from the Laplace distribution, both a and b are possible “noisy” results of the query for either database.

Importantly, the ratio between the probability of a given noisy result, such as a or b, based on µ1, and the probability of that result based on µ2, is bounded by a constant, e^ϵ, wherever the result falls.
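
A numerical sketch of this picture, assuming a count query (sensitivity 1), ϵ = 0.5, and true counts of 100 and 101 for the two databases; all of these values and the function names are illustrative. It evaluates the two Laplace densities at several candidate noisy results and checks that their ratio stays between e^-ϵ and e^ϵ.

```python
import math

def laplace_pdf(x, mu, scale):
    """Density of the Laplace distribution centred on mu."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

epsilon = 0.5
scale = 1.0 / epsilon        # count query: sensitivity 1 (assumed example)
mu1, mu2 = 100.0, 101.0      # true counts from two neighbouring databases

def density_ratio(x):
    """How much more likely the noisy result x is under mu1 than under mu2."""
    return laplace_pdf(x, mu1, scale) / laplace_pdf(x, mu2, scale)

def within_bound(ratio, eps, tol=1e-9):
    """True if the ratio lies in [e^-eps, e^eps], allowing for rounding error."""
    return math.exp(-eps) - tol <= ratio <= math.exp(eps) + tol

for x in [95.0, 99.0, 100.5, 102.0, 110.0]:
    r = density_ratio(x)
    print(f"x = {x:6.1f}  ratio = {r:.4f}  within bound: {within_bound(r, epsilon)}")
```

Outside the interval between µ1 and µ2 the ratio sits exactly at one of the two limits; between them it varies but never leaves the bound.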

Properties of Differential Privacy

Differentially private queries are neatly composable in two senses:

A complex sequence of queries can be given to the database owner, each of which depends on the accurate result of the previous query. At the end, only the final result need be perturbed.

The result of a differentially private query exhausts some amount of the privacy guarantee. Further queries can be made until this budget is exhausted.

At this point the database should be destroyed!
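
The budget idea can be sketched as a simple accountant, assuming basic sequential composition, in which the ϵ values of successive queries on the same data add up. The class name `PrivacyBudget`, its methods, and the numbers used are hypothetical and not drawn from any particular system.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under sequential composition, where the
    epsilons of successive queries on the same data add up."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Reserve epsilon for one query, or refuse if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)

budget = PrivacyBudget(total_epsilon=1.0)
for query_epsilon in [0.5, 0.25, 0.25]:
    budget.charge(query_epsilon)   # each answered query uses up part of the budget
print(budget.remaining)            # 0.0

try:
    budget.charge(0.1)             # any further query is refused
except RuntimeError as err:
    print(err)                     # privacy budget exhausted
```

Once `remaining` reaches zero, no further answers can be released without weakening the overall guarantee.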

Practical Application

Privacy Integrated Queries (PINQ)

For practical application, we do not want database owners to need to understand the theory.

There is now a simple database query language, similar to SQL, that automatically enforces differential privacy guarantees.

Has been used in academic analyses, but not commercially.

Practical Application

Smart Grids

Recent work by Danezis demonstrates differentially-private smart metering for electrical grids.

Injects noise in billing by increasing the amount you pay.

Rapidly gets very expensive, but gives quantifiable privacy goals.

Future Work

Differential privacy is a very strong guarantee. How effectively can it be weakened?

Distributed settings for data sources and noise addition.

Streaming, or otherwise changing, data rather than static databases.

Lessons

A step back: anonymizing data is hard.

We are only just beginning to realise just how hard.

Differential privacy, and PINQ, are good examples of how to go about this and what limitations we face.

Netflix and other examples show that these risks are not isolated or theoretical.

This is before we look at Facebook, Google, Amazon.

Lessons

If you are in a position where you need to anonymize data, think very carefully about how you treat the data, and what you release.

Eyeballing data, and removing obvious linkages, is not even close to sufficient.

Do it if you want to, but don’t claim it’s anonymized.

The most important principle is data minimisation.

Only gather what you need.

Only use it for what you (initially) need.

Only share it when you must.
