Towards Robustness in Query Auditing Shubha U. Nabar Stanford University VLDB 2006 Joint Work With B. Marthi, K. Kenthapadi, N. Mishra, R. Motwani

Page 1: Towards Robustness in Query Auditing

Towards Robustness in Query Auditing

Shubha U. Nabar, Stanford University

VLDB 2006

Joint Work With B. Marthi, K. Kenthapadi, N. Mishra, R. Motwani

Page 2: Towards Robustness in Query Auditing

Data Mining vs Privacy

• Large amount of data available in digital form

• Statisticians query data to mine useful trends

• Potential for privacy breaches

Page 3: Towards Robustness in Query Auditing

Online Query Auditing

• Given a stream of queries over a DB containing private information, when should queries be denied to protect privacy?

• Our focus:
  - Statistical DBs: census, hospital, employee
  - Only one private attribute, e.g., salary, disease
  - Statistical queries over private attribute: sum, max, mean
  - Stream of queries of single type from single user

Page 4: Towards Robustness in Query Auditing

Online Query Auditing

Company Database

Name   Age  Sex  Salary
Alice  23   F    42K
Bob    25   M    50K
Carl   30   M    80K
Dave   21   M    35K

Adversary's query: sum of salaries of female employees
Answer: 42,000
Adversary's inference: Alice’s salary = $42,000!
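The leak on this slide can be reproduced in a few lines. This is a minimal sketch, assuming the company table is held as a list of tuples; the names `employees` and `sum_salary` are illustrative and do not come from the talk.

```python
# Toy copy of the company database from the slide: (name, age, sex, salary).
employees = [
    ("Alice", 23, "F", 42_000),
    ("Bob", 25, "M", 50_000),
    ("Carl", 30, "M", 80_000),
    ("Dave", 21, "M", 35_000),
]


def sum_salary(rows, predicate):
    """Answer a statistical sum query over the private salary attribute."""
    return sum(salary for (name, age, sex, salary) in rows if predicate(name, age, sex))


# The query looks like an aggregate, but its query set holds a single person.
answer = sum_salary(employees, lambda name, age, sex: sex == "F")
query_set = [name for (name, age, sex, _) in employees if sex == "F"]
```

Because `query_set` contains only Alice, the "aggregate" answer is exactly her salary.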

Page 5: Towards Robustness in Query Auditing

Online Query Auditing

• In general, more complex queries can be posed and answers put together to deduce information

• Task of auditor: deny query when answer to current and past queries can be “stitched together” to leak information.

Page 6: Towards Robustness in Query Auditing

Our Contributions

• Auditor for max queries

• Auditor for combinations of max and min queries

• A first analysis of the utility of an auditing scheme

Page 7: Towards Robustness in Query Auditing

Related Work

• Perturbing data itself [W ‘65, AS ‘00, EGS ’03, CDMSW ‘05]

• Perturbing results supplied to user [DN ‘03, DMNS ‘06]

Statisticians unhappy with addition of noise

Auditors provide exact answers if at all

Page 8: Towards Robustness in Query Auditing

Previous Work

• Restricting Size and Overlap of Queries [Dobkin, Jones, Lipton ‘79]

• Offline Auditing [Chin ‘86]

• Auditing for Boolean Attributes [Kleinberg, Papadimitriou, Raghavan ‘03]

• Auditing Compliance with a Hippocratic Database [Agrawal, Bayardo, Faloutsos, Kiernan, Rantzau, Srikant ’04]

• Simulatable Auditing [Kenthapadi, Mishra, Nissim ‘05]

Page 9: Towards Robustness in Query Auditing

Naïve Auditor

If answer to current query causes an element to be determined, deny

Company Database

Name   Age  Sex  Salary
Alice  23   F    42K
Bob    25   M    50K
Carl   30   M    80K
Dave   21   M    35K

Adversary's query 1: max salary{Alice, Bob, Carl}
Answer: 80,000

Adversary's query 2: max salary{Alice, Bob}
Answer: denied

Adversary's inference: Carl’s salary = $80,000!
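The failure of the naïve auditor can be traced in code. The sketch below is an assumption-laden illustration, not the auditor from the talk: `pinned_elements` uses a simplified test (an element is pinned when it is the only member of some query set whose upper bound still equals that query's answer), and all function names are hypothetical.

```python
def pinned_elements(answered):
    """Elements whose value is fixed by a list of answered max queries.

    answered: list of (frozenset_of_names, answer) pairs.
    Simplified check (an assumption of this sketch): an element is pinned if
    it is the only member of some query set whose upper bound equals the answer.
    """
    upper = {}
    for members, ans in answered:
        for m in members:
            upper[m] = min(upper.get(m, float("inf")), ans)
    pinned = set()
    for members, ans in answered:
        extreme = [m for m in members if upper[m] == ans]
        if len(extreme) == 1:
            pinned.add(extreme[0])
    return pinned


def naive_max_auditor(salaries, queries):
    """Answer each max query unless the true answer would pin some element."""
    answered, replies = [], []
    for members in queries:
        true_answer = max(salaries[m] for m in members)
        if pinned_elements(answered + [(members, true_answer)]):
            replies.append(None)  # denied; the decision depended on the true answer
        else:
            answered.append((members, true_answer))
            replies.append(true_answer)
    return replies


salaries = {"Alice": 42_000, "Bob": 50_000, "Carl": 80_000, "Dave": 35_000}
replies = naive_max_auditor(
    salaries,
    [frozenset({"Alice", "Bob", "Carl"}), frozenset({"Alice", "Bob"})],
)
```

The second reply is a denial, yet the denial itself tells the attacker that max{Alice, Bob} is below 80,000, which pins Carl's salary.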

Page 10: Towards Robustness in Query Auditing

Simulatability

• Denials based on answer to current query may cause privacy breach

• Solution: If attacker can simulate and predict decision to deny ⇒ denials do not leak information

• Auditor: If there is any dataset consistent with past answers in which current query causes breach, deny
  - Attacker can check condition himself
  - Denials do not leak information

Page 11: Towards Robustness in Query Auditing

Goal

Find online, efficient, simulatable, high-utility auditors for various classes of queries

Page 12: Towards Robustness in Query Auditing

Definition of Privacy Breach

• Full Disclosure: some private data point can be uniquely determined, e.g. max{xa, xb, xc} = 10 and max{xa, xb} = 8 ⇒ xc = 10

• Partial Disclosure (probabilistic compromise): significant change in attacker’s confidence about some private data value

Page 13: Towards Robustness in Query Auditing

Probabilistic Compromise

• Private data known to be drawn according to D

• Range of each data point divided into intervals

[Figure: query qt posed to the SDB and answer at returned; prior and posterior distributions of a data point, each plotted over [0, 1].]

Page 14: Towards Robustness in Query Auditing

Outline

• Problem Statement
• Previous Work
• Auditing Max Queries
• Auditing Max and Min Queries
• Utility
• Future Work

See paper for auditing against full disclosure

Page 15: Towards Robustness in Query Auditing

Skeleton of Probabilistic Auditor

1. Attacker poses query qt

2. Attacker has posterior distribution over answer to qt, given previous answers

3. Auditor repeatedly:

   a. Samples possible answer from this distribution

   b. Checks if sampled answer will change attacker’s belief about some data point

4. If qt “unsafe” in significant fraction of samples, deny

Need to estimate posterior distributions in 2. and 3b.
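The skeleton above can be sketched as a small function. This is only an outline under stated assumptions: `sample_answer` and `is_unsafe` are hypothetical callables standing in for the posterior estimates of steps 2 and 3b, and `num_samples` and `threshold` are illustrative knobs, not values from the talk.

```python
import random


def probabilistic_audit(sample_answer, is_unsafe, num_samples=1000, threshold=0.1):
    """Return "deny" or "answer" for the current query.

    sample_answer: hypothetical callable that draws one possible answer from
        the attacker's posterior over the answer, given past answers.
    is_unsafe: hypothetical callable that reports whether a sampled answer
        would change the attacker's belief about some data point too much.
    num_samples, threshold: illustrative parameters (assumptions).
    """
    unsafe = sum(1 for _ in range(num_samples) if is_unsafe(sample_answer()))
    return "deny" if unsafe / num_samples > threshold else "answer"


# Toy stand-ins so the sketch runs end to end.
rng = random.Random(0)
decision_safe = probabilistic_audit(lambda: rng.random(), lambda a: False)
decision_risky = probabilistic_audit(lambda: rng.random(), lambda a: True)
```

The decision uses only sampled answers, never the true answer to the current query, which is what lets an attacker simulate it.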

Page 16: Towards Robustness in Query Auditing

Probabilistic Max Auditor

• Assumption: dataset drawn uniformly at random from set of duplicate-free points in [α, β]^n

For each xi, the prior is uniform on [α, β], so the prior probability of any interval is proportional to its length

• Given answers to set of queries, what are posterior probabilities?

Page 17: Towards Robustness in Query Auditing

Probabilistic Max Auditor

• Given queries q1…qt and answers a1…at create synopsis Bmax

• Bmax contains predicates

[max(S1) = a1], [max(S2) < a2]…

The sets Si are disjoint

• Bmax enables succinct representation of audit trail

• Bmax enables computation of posterior probabilities

Page 18: Towards Robustness in Query Auditing

Determining Posterior Probabilities

max{xa, xb, xc} = 0.75

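[Figure: the region max{xa, xb, xc} = 0.75 drawn on axes xa, xb, xc, with axis points (0.75, 0, 0), (0, 0.75, 0), (0, 0, 0.75); bars for Pr{xa ∈ [0, 0.25]}, Pr{xa ∈ [0.25, 0.5]}, Pr{xa ∈ [0.5, 0.75)}, Pr{xa = 0.75}.]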

• Pr{xa = 0.75} = 1/3, since any one of xa, xb or xc is equally likely to be max

• With remaining 2/3 probability, xa is uniformly distributed in [0,0.75)
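The two bullets above give every number the figure needs. A minimal sketch with exact fractions, assuming the slide's reasoning (each of the k points is equally likely to be the maximum, and otherwise is uniform below the answer); the function name `posterior_after_max` is illustrative.

```python
from fractions import Fraction


def posterior_after_max(k, answer, intervals):
    """Posterior for one of k points after learning that their max is `answer`.

    Follows the slide's reasoning: with probability 1/k the point equals the
    answer; otherwise it is uniform on [0, answer).
    intervals: list of (lo, hi) pairs inside [0, answer).
    Returns (probability of equalling the answer, per-interval probabilities).
    """
    p_is_max = Fraction(1, k)
    rest = 1 - p_is_max
    answer = Fraction(answer)
    p_intervals = [rest * (Fraction(hi) - Fraction(lo)) / answer for lo, hi in intervals]
    return p_is_max, p_intervals


quarter = Fraction(1, 4)
p_max, p_bins = posterior_after_max(
    3, 3 * quarter, [(0, quarter), (quarter, 2 * quarter), (2 * quarter, 3 * quarter)]
)
```

Each of the three intervals receives (2/3)(1/3) = 2/9, and together with Pr{xa = 0.75} = 1/3 the probabilities sum to 1.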

Page 19: Towards Robustness in Query Auditing

Probabilistic Max Auditor

1. Attacker poses query qt

2. Attacker has posterior distribution over answer to qt, given previous answers

3. Auditor repeatedly:

   a. Samples possible answer from this distribution

   b. Checks if sampled answer will change attacker’s belief about some data point

4. If qt “unsafe” in significant fraction of samples, deny

Can give guarantees on probability that adversary learns new information

Page 20: Towards Robustness in Query Auditing

Outline

• Problem Statement
• Previous Work
• Auditing Max Queries
• Auditing Max and Min Queries
• Utility
• Future Work

Page 21: Towards Robustness in Query Auditing

Probabilistic Max-and-Min Auditor

• Computing posterior probabilities becomes harder

• Given queries, create synopsis so that a data point occurs in at most one max and one min predicate

Page 22: Towards Robustness in Query Auditing

Equivalent Graph Coloring Problem

Predicates:

max{xa, xb, xc} = 1
min{xa, xb} = 0.2
max{xd, xe} = 2
min{xc, xd, xe} = 0.5

[Figure: graph whose nodes are labeled with the query sets {a, b, c}, {a, b}, {d, e}, {c, d, e}.]

Every valid coloring corresponds to a set of consistent datasets
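For this small example the colorings can be enumerated by brute force. The sketch rests on an assumption the slide does not spell out: a coloring picks, for each predicate, one element of its set to attain the stated value, and it is valid when no element is picked twice (the four answers are distinct, so one element cannot attain two of them). The names `predicates` and `valid_colorings` are illustrative.

```python
from itertools import product

# Predicates from the slide: (kind, element set, value).
predicates = [
    ("max", ("a", "b", "c"), 1.0),
    ("min", ("a", "b"), 0.2),
    ("max", ("d", "e"), 2.0),
    ("min", ("c", "d", "e"), 0.5),
]


def valid_colorings(preds):
    """Enumerate colorings: one witness element per predicate.

    Assumed validity rule (not stated on the slide): no element may be the
    witness of two predicates.
    """
    sets = [elems for _, elems, _ in preds]
    return [c for c in product(*sets) if len(set(c)) == len(c)]


colorings = valid_colorings(predicates)
```

One such coloring is `("c", "a", "d", "e")`: xc = 1, xa = 0.2, xd = 2, xe = 0.5, with xb free to take any other value strictly between 0.2 and 1.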

Page 23: Towards Robustness in Query Auditing

Probabilistic Max-and-Min Auditor

We show:

• Can sample a consistent dataset according to the posterior distribution by sampling a valid coloring according to distribution P

• Can sample a valid coloring according to P using a Markov chain over colorings

• Can use sampled colorings to answer questions about posterior distribution of data points up to arbitrary precision

See paper for details

Page 24: Towards Robustness in Query Auditing

Outline

• Problem Statement
• Previous Work
• Auditing Max Queries
• Auditing Max and Min Queries
• Utility
• Future Work

Page 25: Towards Robustness in Query Auditing

Utility

• Several dimensions of utility:
  - How many queries are answered?
  - What kinds of queries are answered?
  - What can be computed?
  - “Price of simulatability”

• Expected time to first denial

Page 26: Towards Robustness in Query Auditing

Utility of Sum Auditor

• Consider full disclosure

• No prior knowledge – data points come from unbounded range

• Queries chosen uniformly at random

Page 27: Towards Robustness in Query Auditing

Sum Auditor

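Answered sum queries written as a linear system:

\[
\begin{pmatrix}
1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \end{pmatrix}
=
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}
\]

Row reduction gives (x_d appears in no query, so the last row determines x_e):

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \end{pmatrix}
=
\begin{pmatrix} a_2 - a_4 + a_3 \\ a_4 - a_3 \\ a_1 - a_3 \\ a_4 - a_2 \end{pmatrix}
\]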
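The row reduction on this slide is the check a sum auditor for full disclosure has to perform. Below is a minimal sketch, assuming queries are given as 0/1 rows over (xa, xb, xc, xd, xe); the function name `disclosed_by_sums` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def disclosed_by_sums(query_rows):
    """Return indices of elements fixed by answered sum queries.

    query_rows: 0/1 lists, one per query, marking which elements are summed.
    An element is fixed when the reduced row echelon form contains a row with
    a single nonzero entry, i.e. its unit vector lies in the row space.
    """
    rows = [[Fraction(v) for v in r] for r in query_rows]
    n_cols = len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [v / lead for v in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == len(rows):
            break
    disclosed = set()
    for row in rows:
        nonzero = [i for i, v in enumerate(row) if v != 0]
        if len(nonzero) == 1:
            disclosed.add(nonzero[0])
    return disclosed


queries = [
    [1, 0, 1, 0, 1],  # a1 = xa + xc + xe
    [1, 1, 0, 0, 0],  # a2 = xa + xb
    [1, 0, 0, 0, 1],  # a3 = xa + xe
    [1, 1, 0, 0, 1],  # a4 = xa + xb + xe
]
disclosed = disclosed_by_sums(queries)
```

For the four queries on the slide, indices 0, 1, 2 and 4 (xa, xb, xc, xe) are disclosed, while xd, which appears in no query, is not. The check looks only at the query sets, never at the answers.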

Page 28: Towards Robustness in Query Auditing

Utility of Sum Auditor

• We show that the expected time to first denial is ≥ n/4 and ≤ n + lg n

• Good news for large databases – answers not riddled with denials

• Can’t do much better

• Once n-1 independent queries are answered, at least half the queries will be denied on average

Page 29: Towards Robustness in Query Auditing

Utility of Sum Auditor

• Reality
  - Users do not choose queries uniformly at random
  - Users cannot query arbitrary subsets of the data
  - Database frequently updated, so old information becomes irrelevant
    e.g. q1 = xa + xb + xc; then xa is modified; q2 = xa + xb will no longer be denied

• Denials may not be so frequent in reality

Page 30: Towards Robustness in Query Auditing

Utility: Experiments

Plot 1: Sum queries chosen uniformly at random
Plot 2: Sum queries with updates
Plot 3: 1 dimensional range sum queries

Page 31: Towards Robustness in Query Auditing

Future Work

• Ways to proactively enhance utility
  - Deny innocuous queries in the present in the hope that more can be answered in the future

• Ward off denial of service attacks

• Devise auditors, study utility for more complex queries

• Remove assumptions about prior knowledge

• Solution to collusion