Differential privacy
Marco Gaboardi, School of Computing, University of Dundee


Page 1:

Marco Gaboardi, School of Computing, University of Dundee

Differential privacy

Page 2: Private Queries?

Differential Privacy: motivation (A. Haeberlen, USENIX Security, August 12, 2011)

Motivation: Protecting privacy

Data:
19144  02-15-1964  flu
19146  05-22-1955  brain tumor
34505  11-01-1988  depression
25012  03-12-1972  diabetes
16544  06-14-1956  anemia
...

Fair insurance plan? Rescind policy?

"I know Bob is born 05-22-1955 in Philadelphia…"

A first approach: anonymization. Using different correlated anonymized data sets, one can learn private data.

Page 3: Private Queries?

(Same motivation slide as Page 2, annotated: "query", "answer", "medical correlation?".)

Page 4: Private Queries?

(Same motivation slide as Page 2, annotated: "query", "answer", and the analyst's question "Does Joe have cancer? I know he traveled to Atlantis...".)

Page 5: Private Queries?

(Same motivation slide as Page 2, annotated: "Does Joe have cancer?".)

Page 6: Attacker

(Same motivation slide as Page 2.)

Page 7:

A classic case...

Page 8: Adding statistical noise

Differential Privacy: the idea (A. Haeberlen, USENIX Security, August 12, 2011)

Promising approach: Differential privacy

Private data:
N(flu, >1955)?  →  826 ± 10
N(brain tumor, 05-22-1955)?  →  3 ± 700  (noise)

Differential Privacy: ensuring that the presence/absence of an individual has a negligible statistical effect on the query's result.

Trade-off between utility and privacy.

Page 9: Adding statistical noise

(Same slides as Pages 2 and 8, annotated: "query", "answer + noise", "medical correlation?".)

Page 10: Adding statistical noise

(Same slides as Pages 2 and 8, annotated with the attacker's thought "I know he traveled to Atlantis...".)

Page 11: Adding statistical noise

(Same slides as Pages 2 and 8.)

Page 12:

The promise of Differential Privacy

“You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available.”

(C. Dwork and A. Roth, 2014)

Page 13: (ε,δ)-Differential Privacy

Definition. Given ε, δ ≥ 0, a probabilistic query Q : db → R is (ε,δ)-differentially private iff for all b1, b2 : db differing in one row and for every S ⊆ R:

Pr[Q(b1) ∈ S] ≤ exp(ε) · Pr[Q(b2) ∈ S] + δ

Page 14: (ε,δ)-Differential Privacy

(Same definition as Page 13, annotated: Q is "a query returning an answer with noise".)

Page 15: (ε,δ)-Differential Privacy

(Same definition as Page 13, annotated: ε and δ are the "privacy parameters".)

Page 16: (ε,δ)-Differential Privacy

(Same definition as Page 13, annotated: the one-row difference captures "a notion of individual data we want to protect".)

Page 17: (ε,δ)-Differential Privacy

(Same definition as Page 13, annotated: the quantification over S ⊆ R ranges "over all the possible “bad” outcomes".)

Page 18: A way to achieve it: adding noise

• Suppose that we have x1, ..., xn ∈ [0,1];
• we want to compute mean(x1, ..., xn) = μ;
• release a noised version: μ° = μ + Laplace noise ~ ε.
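The noised mean above can be sketched in a few lines. The slide only says "Laplace noise ~ ε"; the concrete scale 1/(n·ε) below is the standard choice, since changing one xi ∈ [0,1] moves the mean by at most 1/n (the query's sensitivity). The function names are illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(xs, eps, rng=None):
    """eps-DP release of mean(xs) for xs in [0,1].

    The mean has sensitivity 1/n, so the Laplace scale is 1/(n*eps).
    """
    rng = rng or random.Random()
    n = len(xs)
    mu = sum(xs) / n
    return mu + laplace_noise(1.0 / (n * eps), rng)

rng = random.Random(0)
xs = [0.2, 0.4, 0.6, 0.8] * 250   # 1000 values in [0,1], true mean 0.5
print(private_mean(xs, eps=0.5, rng=rng))  # close to 0.5 (noise scale 0.002)
```

With 1000 records the noise is tiny; with 10 records the same ε would give scale 0.2, which is the utility/privacy trade-off from Page 8 in miniature.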

Page 19: A way to achieve it: adding noise

(Same slide as Page 18, annotated: the release is "ε-differentially private".)

Page 20: ε-Differential Privacy

[Figure: the output distributions Q(b) and Q(b ∪ {x}) compared.]

Page 21: Probability of a bad event

Bad Event

log( Pr[Q(b ∪ {x}) ∈ S] / Pr[Q(b) ∈ S] ) ≤ ε
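One way to see where this bound comes from: for a counting query answered with Laplace noise of scale 1/ε, the output densities on neighboring databases differ pointwise by a factor of at most e^ε, so the log-ratio above is at most ε for every outcome set S. A quick numeric check (the concrete query answers 42 and 43 are illustrative):

```python
import math

def laplace_pdf(x, mu, scale):
    # Density of Laplace(mu, scale) at x.
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

eps = 0.5
scale = 1.0 / eps          # counting query: sensitivity 1, scale 1/eps
q_b, q_bx = 42.0, 43.0     # answers on b and on b ∪ {x}: they differ by 1

# The pointwise log-ratio of the two output densities never exceeds eps.
worst = max(
    abs(math.log(laplace_pdf(t, q_bx, scale) / laplace_pdf(t, q_b, scale)))
    for t in (q_b + 0.01 * k for k in range(-1000, 1000))
)
print(worst <= eps + 1e-9)  # True
```

Since the density ratio is bounded everywhere, integrating over any S preserves the bound, which is exactly the slide's inequality.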

Page 22: Probability of a bad event

Bad Event

Probability of the bad event:
without my data: 16%
with my data: 16.8%
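The two percentages on this slide pin down the ε at work: pure ε-DP requires 0.168 ≤ e^ε · 0.16, so the smallest consistent ε is ln(0.168/0.16). A one-line check (assuming δ = 0):

```python
import math

# Smallest eps consistent with the bad-event probability rising
# from 16% (without my data) to 16.8% (with my data), under pure eps-DP.
eps = math.log(0.168 / 0.16)
print(round(eps, 3))  # 0.049
```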

Page 23: (ε,δ)-Differential Privacy

Bad Event

Overall, ε is a good bound.

Page 24: (ε,δ)-Differential Privacy

Bad Event

Overall, ε is a good bound; it can fail with probability δ.

Page 25: Noise vs Accuracy

• We run a data analysis in the first place to get an accurate answer.
• We have several ways to estimate how good our answer is. Two examples:
  - comparing the answer with the output of the noise-free computation;
  - comparing the answer with the expected value of the query on the population.

Page 26: Accuracy of the analysis

60%:  10.2 ± 0.5
80%:  10.2 ± 1
90%:  10.2 ± 2

(α,β)-accurate
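A concrete reading of (α,β)-accuracy for the Laplace mechanism (a sketch; the slide's numbers are illustrative and need not come from a Laplace tail): Pr[|Lap(b)| > α] = e^(−α/b), so the half-width needed for confidence 1−β is α = b·ln(1/β), and higher confidence forces a wider ± interval, as in the table above.

```python
import math

def laplace_halfwidth(scale, beta):
    # Smallest alpha with Pr[|Lap(scale)| > alpha] = exp(-alpha/scale) <= beta.
    return scale * math.log(1.0 / beta)

scale = 1.0
for conf in (0.60, 0.80, 0.90):
    beta = 1.0 - conf
    print(conf, round(laplace_halfwidth(scale, beta), 3))
# The interval grows with the confidence level, mirroring the slide's table.
```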

Page 27: Privacy-Accuracy trade-off

(The "Differential Privacy: the idea" slide from Page 8, annotated with the mechanism M(ε,δ).)

Page 28: A simple DP mechanism

query → [Laplace Noise ~ ε] → ε-differentially private query

(Same "Differential Privacy: the idea" slide as Page 8.)

Page 29: Composition

• We can easily compose differentially private data analyses, with the guarantee degrading gracefully.

ε-DP Query
ε-DP Query
ε-DP Query
...

Page 30: Composition

(Same slide as Page 29, annotated: the n queries taken together form an n·ε-DP Query.)

Page 31: Composition

(Same slide as Page 29.)

Page 32: Composition

(Same slide as Page 29, annotated: "ε-DP Query".)
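Sequential composition (n runs of an ε-DP query are together n·ε-DP) is what privacy-budget accounting implements in practice: a curator adds up the ε spent per query and refuses to answer once the total budget is reached. A minimal sketch; the class name and API are invented for illustration:

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_eps):
        self.total_eps = total_eps
        self.spent = 0.0

    def run(self, query_eps, query):
        # Refuse to answer once the overall budget would be exceeded.
        if self.spent + query_eps > self.total_eps:
            raise RuntimeError("privacy budget exhausted")
        self.spent += query_eps
        return query()

budget = PrivacyBudget(total_eps=1.0)
for _ in range(3):
    budget.run(0.25, lambda: "noised answer")   # stand-in for an eps-DP query
print(budget.spent)  # 0.75; one more 0.25-DP query fits, then refusal
```

This basic accountant is pessimistic; advanced composition theorems give tighter totals, but the simple sum is the guarantee the slides state.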

Page 33:

Linking different data

Page 34: Tools

• The composition properties make it possible to build tools that automatically ensure differential privacy:
  - PINQ (Microsoft Research)
  - Airavat (U. of Texas)
  - Fuzz (U. of Pennsylvania)
  - CertiPriv (IMDEA Software, Madrid)
  - GUPT (UC Berkeley)

Page 35: Natural differential privacy?

Page 36: The role of ε and δ

• ε is:
  - a bound on the increase in the probability of a bad event happening if I decide to participate in a study,
  - an overall privacy budget for a given collection of data.
• δ, instead, is the probability that the analysis will fail to provide me the guarantee.

Page 37: Privacy-Accuracy trade-off

(Same annotated slide as Page 27.)

Page 38: The analyst's view

• The mechanism has an error probability A_M(N, ε, δ).
• Goal: obtain an answer with error probability smaller than a target α:

  A_M(N, ε, δ) ≤ α

• Don't exceed the budget B.

Page 39: The individual's view

• He has a cost function f on the output of M,
• he has a worst-case cost W,
• he is compensated with S.
• Goal: participate if he is compensated for the cost:

(e^ε − 1)·E[f] + δ·W ≤ S

Page 40: The individual's view

(Same slide as Page 39, annotated: the term (e^ε − 1)·E[f] is the increase in the cost if he participates.)

Page 41: The individual's view

(Same slide as Page 39, annotated: the term δ·W is the cost in the case of failure.)

Page 42: Combining the two views

(e^ε − 1)·E[f]·N + δ·W·N ≤ B
A_M(N, ε, δ) ≤ α

[Figure 2: Feasible ε, N for accuracy α and budget B. Axes: ε and N; one curve for α, one for B.]

For the budget constraint, let the budget be B = 4.5×10^5. To estimate each individual's base cost, we need to reason about the individual's costs that might be affected by this study.

For the sake of example, suppose that the health insurance company does not know that the individual smokes, and the individual is afraid that the insurance company will discover this and raise her premiums. Taking some average figures, the average health insurance premium for smokers is $1274 more, compared to nonsmokers [20]. Thus, some participants fear a price increase of $1274.[10]

To calculate how much individuals should be compensated, we reason about the probability that the insurance company finds out that an individual smokes, even if she does not participate in the study. This is not impossible: perhaps an insurance agency employee observes the individual smoking outside. However, it also is not very likely: insurance agencies generally do not spy on individuals, trying to catch smokers.

So, suppose the participants think there is a moderate, 20% chance that the insurance company finds out that they are smokers, even if they do not participate.[11] Thus, we can upper bound the base cost by E = 0.20 · 1274 = 254.8; this is the cost the participants expect, even if they do not participate.

Plugging these numbers in, Equation (8) is satisfied and the study is feasible. For instance, ε = 8.4×10^−4 and N = 2×10^6 satisfy the original accuracy and budget constraints; each participant is compensated (e^ε − 1)·E = $0.22, for a total cost of 4.4×10^5 < B = 4.5×10^5.

Figure 1 gives a pictorial representation of the constraints. For a fixed accuracy α, the blue curve (marked α) contains values of ε, N that achieve error α. The blue shaded region (above the α curve) shows points that are feasible for that accuracy: there, ε, N give accuracy better than α. The red curve (marked B) and red shaded region (below the B curve) show the same thing for a fixed budget B. The intersection of the two regions (the purple area) contains values of ε, N that satisfy both the accuracy constraint and the budget constraint. Figure 3 shows the curves for different set values of α and B.

[10] Note that it would not make sense to include the participant's current health insurance cost as part of the bad event: participating in a study will not make it more likely that participants need to pay for health insurance (presumably, they are already paying for it). Rather, participating in a study may lead to a payment increase, which is the bad event.

[11] Presumably, nonsmokers would expect much lower cost. They should not expect zero cost, since there is always a chance that the insurance company wrongly labels them a smoker, but this happens with low probability.

[Figure 3: Constant accuracy curves for α1 < α2; constant budget curves for B1 < B2. Axes: ε and N.]

A more realistic example: answering many queries. The previous example has a significant drawback in that the mechanism only answers a single query. Any realistic study, medical or otherwise, will need many more queries. In this section, we will see how to carry out calculations for a more sophisticated algorithm from the privacy literature: the multiplicative weights exponential mechanism (MWEM) [13, 14].

MWEM is a mechanism that can answer a large number (exponential in N) of counting queries: queries of the form "What fraction of the records in the database satisfy property P?" For a concrete example, suppose that the space of records is bit strings of length 20, i.e. X = {0,1}^20. Each individual's bit string can be thought of as a list of attributes: the first bit might encode the gender, the second bit might encode the smoking status, the third bit might encode whether the age is above 50 or not, etc. Then, queries like "What fraction of subjects are male, smokers and above 50?", or "What proportion of subjects are female nonsmokers?" are counting queries.

We will use an accuracy bound for MWEM [14]. For a data universe X, set of queries C and number of records N, the ε-private MWEM answers all queries in C to within additive error T with probability at least 1 − β, where

T = ( 128 ln|X| · ln( 32|C| ln|X| / (β T²) ) / (ε N) )^(1/3).

To fit this into our model, we define the accuracy function A(ε, N) to be the probability of exceeding error T on any query, i.e. β above. Solving, we can set

A(ε, N) := β = (32|C| ln|X| / T²) · exp( −ε N T³ / (128 ln|X|) ),

fixing the accuracy constraint A(ε, N) ≤ α. The budget constraint is (e^ε − 1)·E·N ≤ B, like the previous example.

For the various parameters, suppose we want X = {0,1}^5, so |X| = 2^5, and accuracy T = 0.2 for 20% error. Further, we want to answer 1000 queries, so |C| = 1000. For the budget, suppose the individuals remain worried about their health insurance premiums, so E = 254.8. If the budget B > 2.9×10^8, the constraints are satisfiable: take ε = 0.2, N = 5×10^6, when each participant is paid (e^ε − 1)·E = 56.4. In Section 8, we will see a different version of MWEM with better costs.
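The budget arithmetic in the MWEM example is easy to re-derive. A quick sanity check of the per-participant payment (e^ε − 1)·E and the resulting total, using the values stated in the text above (E = 254.8, ε = 0.2, N = 5×10^6):

```python
import math

E = 0.20 * 1274                    # base cost bound E = 254.8 from the text

# MWEM example: eps = 0.2; each participant is paid (e^eps - 1) * E.
pay = (math.exp(0.2) - 1) * E
print(round(pay, 1))               # 56.4, matching the text

# With N = 5e6 participants, the total payout stays under the stated budget.
total = pay * 5e6
print(total < 2.9e8)               # True
```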

Page 43: Who is developing DP?

• Computer scientists from different research areas: algorithms, machine learning, programming languages, systems and networks, cryptography, databases;
• statisticians and data analysts;
• researchers in social science, medicine?

Page 44: Research projects

• "Putting differential privacy to work" (University of Pennsylvania)
• "Privacy tools for sharing research data" (Harvard University)
• "Enabling medical research with differential privacy" (University of Illinois)

Page 45: Who is interested in DP?

• Facebook ads platform,
• the California Public Utilities Commission,
• the US Census Bureau's "OnTheMap",
• Orange security lab,
• a few start-ups doing third-party analyses,
• several research projects.

Page 46:

Thanks!