TRANSCRIPT
Raef Bassily, Computer Science & Engineering, Pennsylvania State University
New Tools for Privacy-Preserving Statistical Analysis
IBM Research Almaden
February 23, 2015
Privacy in Statistical Databases

[Diagram: individuals x1, x2, …, xn send their data to a curator; users (government, researchers, businesses, or a malicious adversary) send queries to the curator and receive answers.]
• Two conflicting goals: Utility vs. Privacy
• Balancing these goals is tricky:
  - No control over external sources of information (the internet, social networks, anonymized datasets).
  - Ad-hoc anonymization schemes are unreliable: [Narayanan-Shmatikov’08], [Korolova’11], [Calandrino et al.’12], …
• We need algorithms with robust, provable privacy guarantees.
This work
Gives efficient algorithms for statistical data analyses with optimal accuracy under rigorous, provable privacy guarantees.
Differential privacy [DMNS’06, DKMMN’06]
[Diagram: algorithm A, with its local random coins, run on dataset x = (x1, x2, …, xn) and on a neighbor x’ = (x1, x2’, …, xn).]

Datasets x and x’ are called neighbors if they differ in one record.

Require: neighbor datasets induce close distributions on outputs.

Def.: A randomized algorithm A is (ε, δ)-differentially private if, for all neighbor datasets x and x’, and for all events S,
    Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x’) ∈ S] + δ.

"Almost the same" conclusions will be reached from the output regardless of whether any individual opts into or out of the data set. Think of ε as a small constant and δ as cryptographically small (much smaller than 1/n).

Worst-case definition: DP gives the same guarantee regardless of the attacker’s side information.

Two regimes:
• ε-differential privacy (δ = 0)
• (ε, δ)-differential privacy (δ > 0)
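To make the definition concrete, here is a minimal sketch (not from the talk) of the textbook Laplace mechanism: a counting query changes by at most 1 between neighbor datasets, so adding Laplace(1/ε) noise to the count satisfies ε-differential privacy. Function names are illustrative.

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    # A count has sensitivity 1 between neighbor datasets,
    # so Laplace(1/epsilon) noise yields epsilon-DP.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Averaging many independent runs recovers the true count, while any single release reveals little about any one record.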
Two models for private data analysis
Centralized model:
[Diagram: individuals send x1, x2, …, xn to a trusted curator, who runs algorithm A.]
A is differentially private w.r.t. datasets of size n.

Local model:
[Diagram: each individual i applies a local randomizer Qi to xi and sends the report yi to an untrusted curator, who runs algorithm B on (y1, y2, …, yn).]
Each Qi is differentially private w.r.t. datasets of size 1.
This talk
1. Differentially private algorithms for:
   • Convex empirical risk minimization in the centralized model
   • Estimating succinct histograms in the local model
2. Generic framework for relaxing differential privacy
Example of Convex ERM: Support Vector Machines
• Goal: classify data points of different "types" by finding a hyperplane separating two different "types" of data points.
• Many applications. Medical studies: disease classification based on protein structures (tested positive vs. tested negative).
• The coefficients of the hyperplane are the solution of a convex optimization problem defined by the data set.
• The hyperplane is given by a linear combination of only a few data points, called support vectors.
Convex empirical risk minimization

• Dataset D = (d1, …, dn).
• Convex constraint set C.
• Loss function L(θ; D) = Σi ℓ(θ; di), where ℓ(·; d) is convex for all d.
• Goal: find a "parameter" θ ∈ C that minimizes L(θ; D).
• Output θ̂ ∈ C such that the excess risk
      L(θ̂; D) − min over θ ∈ C of L(θ; D)
  is small.
Other examples
• Median
• Linear regression
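To fix ideas, here is a minimal non-private baseline for this ERM setup: minimize the average hinge loss (the SVM objective) over the unit ball C by projected stochastic subgradient descent. The step-size schedule and all names are illustrative assumptions, not the talk's private algorithm.

```python
import math
import random

def project_unit_ball(theta):
    # Project theta onto the unit Euclidean ball C.
    norm = math.sqrt(sum(t * t for t in theta))
    return [t / norm for t in theta] if norm > 1 else theta

def hinge_subgrad(theta, x, y):
    # Subgradient of the hinge loss max(0, 1 - y * <theta, x>).
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    if margin >= 1:
        return [0.0] * len(theta)
    return [-y * xi for xi in x]

def erm_pgd(data, dim, steps=300, lr=0.1):
    # Projected stochastic subgradient descent for hinge-loss ERM over C.
    theta = [0.0] * dim
    for t in range(1, steps + 1):
        x, y = random.choice(data)
        g = hinge_subgrad(theta, x, y)
        theta = [ti - (lr / math.sqrt(t)) * gi for ti, gi in zip(theta, g)]
        theta = project_unit_ball(theta)
    return theta
```

The dual-form observation above applies here: the converged θ is a combination of a few support vectors, which is exactly why releasing it naively leaks data points.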
Why is privacy hard to maintain in ERM?

• Dual form of SVM: the solution typically contains a subset of the exact data points in the clear.
• Median: the minimizer is always a data point.
Private convex ERM [Chaudhuri-Monteleoni ’08, Chaudhuri-Sarwate ’11]

• Studied by [Chaudhuri et al. ’11, Rubinstein et al. ’11, Kifer-Smith-Thakurta ’12, Smith-Thakurta ’13, …]
• Privacy: A is differentially private in its input dataset (for a fixed convex set C and loss L).
• Utility is measured by the (worst-case) expected excess risk, over A’s random coins.
• Best previous work [Chaudhuri et al. ’11, Kifer et al. ’12] addresses a special case (smooth loss functions); applying it to many problems (e.g., SVM, median, …) introduces large additional error.
Contributions [B, Smith, Thakurta ’14]

• This work improves on previous excess risk bounds.
1. New algorithms with optimal excess risk, assuming:
   • The loss function is Lipschitz.
   • The parameter set C is bounded.
   (A separate set of algorithms handles strongly convex losses.)
2. Matching lower bounds.
Results (dataset size n; normalized bounds: the loss is 1-Lipschitz on a parameter set C of diameter 1):

Privacy    | Excess risk                    | Technique
ε-DP       | optimal (matching lower bound) | Exponential sampling (inspired by [McSherry-Talwar ’07])
(ε, δ)-DP  | optimal (matching lower bound) | Noisy stochastic gradient descent (rigorous analysis of, and improvements to, [McSherry-Williams ’10], [Jain-Kothari-Thakurta ’12], and [Chaudhuri-Sarwate-Song ’13])
Exponential sampling
• Define a probability distribution over C that weights each θ proportionally to exp(−ε L(θ; D)/(2Δ)), where Δ is the sensitivity of the loss. This is an instance of the exponential mechanism [McSherry-Talwar ’07].
• Output a sample from C according to this distribution.
• Efficient construction based on rapidly mixing MCMC:
  - Uses [Applegate-Kannan ’91] as a subroutine.
  - Provides a purely multiplicative convergence guarantee.
  - Does not follow directly from existing results.
• Tight utility analysis via a "peeling" argument that exploits the structure of convex functions: the level sets A1, A2, … are decreasing in volume, which bounds the probability of outputting a point with large excess risk.
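Over a finite candidate set, the exponential mechanism underlying this algorithm can be sketched directly. (The talk's contribution is sampling efficiently from the continuous set C via MCMC; the finite grid here is a simplifying assumption for illustration.)

```python
import math
import random

def exponential_mechanism(candidates, loss, epsilon, sensitivity):
    # Sample a candidate with probability proportional to
    # exp(-epsilon * loss(c) / (2 * sensitivity)).
    losses = [loss(c) for c in candidates]
    m = min(losses)  # shift by the min for numerical stability; cancels in normalization
    weights = [math.exp(-epsilon * (l - m) / (2 * sensitivity)) for l in losses]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point rounding
```

Low-loss candidates are exponentially more likely to be chosen, yet swapping one record shifts each probability by at most an e^ε factor.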
Noisy stochastic gradient descent

• Run SGD with noisy gradient queries for sufficiently many iterations.
• Our contributions:
  - Tight privacy analysis (stochastic privacy amplification).
  - Running SGD for many iterations (T = n² iterations) ⇒ optimal excess risk.
• Remarks:
  - The stochastic part is only for efficiency.
  - Empirically, [CSS ’13] showed that a few iterations are enough in some cases.
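A sketch of the noisy-SGD template: at each of T = n² steps, take a stochastic gradient at a random example, add Gaussian noise, step, and project back onto the constraint set. The noise scale `sigma` below is a heuristic placeholder, not the paper's exact privacy calibration, and all names are illustrative.

```python
import math
import random

def noisy_sgd(data, grad, dim, epsilon, delta, diameter=1.0):
    # Noisy SGD template over the Euclidean ball of the given diameter.
    n = len(data)
    T = n * n  # T = n^2 iterations, as in the talk
    # Heuristic per-coordinate noise scale (illustrative only).
    sigma = math.sqrt(T * math.log(1.0 / delta)) / (n * epsilon)
    theta = [0.0] * dim
    for t in range(1, T + 1):
        d = random.choice(data)          # stochastic gradient at one example
        g = grad(theta, d)
        noise = [random.gauss(0.0, sigma) for _ in range(dim)]
        eta = diameter / math.sqrt(t)    # decaying step size
        theta = [ti - eta * (gi + ni) for ti, gi, ni in zip(theta, g, noise)]
        norm = math.sqrt(sum(ti * ti for ti in theta))
        if norm > diameter:              # project back onto the ball
            theta = [ti * diameter / norm for ti in theta]
    return theta
```

The stochastic sampling of one example per step is what the "stochastic privacy amplification" analysis exploits: each record participates in only a fraction of the noisy updates.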
Generalization error

• For a distribution P over records, the generalization error at θ is the expected loss of θ on a fresh sample from P.
• For any distribution P, the excess-risk guarantees translate into generalization-error bounds for the output of the corresponding algorithms: there is an ε-DP algorithm, and an (ε, δ)-DP algorithm, achieving correspondingly small generalization error.
• For generalized linear models, the generalization error we get is optimal.
This talk

1. Differentially private algorithms for:
   • Convex empirical risk minimization in the centralized model
   • Estimating succinct histograms in the local model
2. Generic framework for relaxing differential privacy
A conundrum

[Diagram: n users, each holding an item such as Finance.com, Fashion.com, or WeirdStuff.com, report to an untrusted server, which wants to answer questions like "How many users like Business.com?"]

How can the server compute aggregate statistics about users without storing user-specific information?

• A set of items (e.g., websites): [d] = {1, …, d}.
• A set of users: [n].
• The frequency of an item a is f(a) = (♯ users holding a)/n.
Succinct histograms

[Diagram: a histogram of frequencies over items 1, 2, 3, …, d−2, d−1, d, with a few heavy hitters such as Finance.com, Fashion.com, and WeirdStuff.com.]

Goal: produce a succinct histogram, i.e., a list of frequent items ("heavy hitters") and estimates of their frequencies, while providing rigorous privacy guarantees to the users.

Succinct histogram = a short list of pairs (v, f̂(v)) for some k ≪ d items; every item not on the list implicitly gets the estimate 0.
Local model of Differential Privacy
Algorithm Q is ε-local differentially private (LDP) if, for any pair of items v, v’ ∈ [d] and all events S,
    Pr[Q(v) ∈ S] ≤ e^ε · Pr[Q(v’) ∈ S].

[Diagram: each user i holds item vi, applies local randomizer Qi, and sends the differentially private report zi; the server aggregates z1, …, zn into a succinct histogram.]

LDP protocols for frequency estimation are used:
• in the Chrome web browser (RAPPOR) [Erlingsson-Korolova-Pihur ’14]
• as a basis for other estimation tasks [Dwork-Nissim ’04]
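The simplest LDP frequency-estimation primitive is generalized randomized response, a classic building block rather than this talk's protocol: each user reports their true item with probability e^ε/(e^ε + d − 1) and a uniformly random other item otherwise, and the server debiases the empirical report frequencies. Names below are illustrative.

```python
import math
import random

def rr_report(item, d, epsilon):
    # Generalized randomized response over [d] = {0, ..., d-1}:
    # report the true item w.p. e^eps / (e^eps + d - 1), otherwise a
    # uniformly random *other* item.  Each report is eps-LDP.
    p_true = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if random.random() < p_true:
        return item
    other = random.randrange(d - 1)
    return other if other < item else other + 1

def rr_estimate(reports, d, epsilon):
    # Debias the empirical report frequencies into frequency estimates.
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = 1.0 / (math.exp(epsilon) + d - 1)
    counts = [0] * d
    for z in reports:
        counts[z] += 1
    return [((c / n) - q) / (p - q) for c in counts]
```

Note that both reporting and estimation enumerate [d], which is infeasible when d is the set of all URLs; avoiding that is exactly what the succinct-histogram protocol is for.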
Performance measures

• Error is measured by the worst-case estimation error: the maximum over items a of |f̂(a) − f(a)|.
• A protocol is efficient if it runs in time poly(log(d), n).
• Communication complexity is measured by the number of bits transmitted per user.
• Note: d is very large (e.g., the number of all possible URLs); log(d) = ♯ of bits needed to describe a single URL.
Contributions [B, Smith ‘15]

1. An efficient ε-LDP protocol with optimal error:
   • Runs in time poly(log(d), n).
   • Estimates all frequencies up to the optimal error.
2. A matching lower bound on the error.
3. A generic transformation reducing the communication complexity to 1 bit per user.

• Previous protocols either ran in too much time [Mishra-Sandler ’06, Hsu-Khanna-Roth ’12, EKP ’14] or had larger error [HKR ’12].
• The best previous lower bound on the error was weaker.
Design paradigm

• Reduction from a simpler problem with a unique heavy hitter (the UHH problem).
• UHH: at least a certain fraction of the users have the same item, while the rest have ⊥ (i.e., "no item").
• An efficient protocol with optimal error for UHH ⇒ an efficient protocol with optimal error for the general problem.
Construction for the UHH problem

• Each user has either v* or ⊥; v* is unknown to the server.
• Goal: find v* and estimate f(v*).
• Pipeline: each user encodes v* with an error-correcting code, passes the codeword through a noising operator (similar to [Duchi et al. ’13]), and sends the noisy report; the server rounds the aggregate and runs the decoder.
• Key idea: what governs success is the signal-to-noise ratio; decoding succeeds when the fraction of users holding v* is large enough relative to the noise.
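A toy, non-private simulation of the encode/noise/decode pipeline: the real protocol uses an error-correcting code and an ε-LDP noising operator, but here the codewords are shared random ±1 vectors and the noise is plain Gaussian, purely to illustrate the signal-to-noise intuition. All names are hypothetical.

```python
import random

def make_codebook(d, m, seed=0):
    # Public random ±1 codewords of length m, one per item in [d].
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(m)] for _ in range(d)]

def uhh_simulation(codebook, v_star, n, frac, noise_std, rng):
    # Toy UHH pipeline: roughly a `frac` fraction of n users encode v_star
    # and add Gaussian noise; the rest contribute pure noise ("no item").
    # The server averages the reports and decodes by maximum correlation.
    m = len(codebook[0])
    avg = [0.0] * m
    for _ in range(n):
        signal = codebook[v_star] if rng.random() < frac else [0.0] * m
        for j in range(m):
            avg[j] += (signal[j] + rng.gauss(0.0, noise_std)) / n
    scores = [sum(c * a for c, a in zip(cw, avg)) for cw in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)
```

Averaging over n users shrinks the per-coordinate noise by a factor of √n, so decoding succeeds once the holding fraction is large relative to noise_std/√n, which is the signal-to-noise condition stated above.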
Construction for the general setting

Key insight:
• Decompose the general scenario into multiple instances of UHH via hashing: each user hashes their item into one of K buckets, and one copy of the UHH protocol is run per bucket.
• Run these parallel copies of the UHH protocol on the resulting instances.
• Hashing guarantees that, w.h.p., every heavy hitter (every item whose frequency is large enough) is allocated a "collision-free" copy of the UHH protocol.
Recap: construction of succinct histograms

Parallel copies of the efficient private protocol for a unique heavy hitter (UHH) combine into an efficient private protocol estimating all heavy hitters: it runs in time poly(log(d), n) and estimates all frequencies up to the optimal error.
Transforming to a protocol with 1-bit reports

• Generate a public random string si for each user.
• User i sends a single biased bit Bi = Gen(Qi, vi, si).
• Conditioned on Bi = 1, the public string si has the same distribution as the output of the local randomizer Qi. So the server takes si as user i’s report if Bi = 1, and ignores user i otherwise.
• This transformation works for any local protocol, not only heavy hitters. Key idea: all that matters is the distribution of the output of each local randomizer.
• The public string does not depend on private data, so it can be generated by the untrusted server.
• For our heavy-hitters protocol, this transformation gives essentially the same error and computational efficiency (Gen can be computed in time O(log(d))).
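The 1-bit reduction can be sketched as rejection sampling against a public reference distribution: conditioned on the user's bit being 1, the public string is distributed exactly as the local randomizer's output. The function below plays the role of Gen for a small finite-output randomizer; the discrete setup and all names are illustrative assumptions.

```python
import math
import random

def one_bit_report(v, q_dist, ref_dist, c, rng):
    # 1-bit reduction: the server publishes s ~ ref_dist; the user sends
    # B = 1 with probability q_dist(v)[s] / (c * ref_dist[s]).
    # Conditioned on B = 1, s is distributed exactly as Q(v).
    r, acc, s = rng.random(), 0.0, 0
    for s, p in enumerate(ref_dist):  # sample the public string s
        acc += p
        if r <= acc:
            break
    accept_p = q_dist(v)[s] / (c * ref_dist[s])
    b = 1 if rng.random() < accept_p else 0
    return s, b
```

Here c must upper-bound q_dist(v)[s]/ref_dist[s] over all v and s, so the acceptance probability never exceeds 1; the server keeps si as user i's report exactly when Bi = 1.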
This talk

1. Differentially private algorithms for:
   • Convex empirical risk minimization in the centralized model
   • Estimating succinct histograms in the local model
2. Generic framework for relaxing differential privacy
Attacker’s side information

[Diagram: a curator holding x1, …, xi, …, xn answers queries from an attacker, who also draws on the internet, social networks, and anonymized datasets.]

The attacker’s side information is the main reason privacy is hard.
[Diagram: the same setting, but with an omniscient attacker who knows everything except xi.]

Differential privacy is robust against arbitrary side information. But attackers typically have limited knowledge.

Contributions [B, Groce, Katz, Smith ’13]:
• A rigorous framework for formalizing and exploiting limited adversarial information: coupled-worlds privacy.
• Algorithms with higher accuracy than is possible under differential privacy.
Exploiting the attacker’s uncertainty [BGKS’13]

[Diagram: the curator answers queries from an attacker whose side information lies in a restricted class Δ.]

Given some restricted class Δ of attacker’s knowledge: for any side information in Δ, the output of A must "look the same" to the attacker regardless of whether any single individual is in or out of the computation.
Distributional Differential Privacy [BGKS’13]
[Diagram: A, with its local random coins, run on the dataset with entry xi present vs. with xi removed.]

A is (ε, δ)-DDP if, for any distribution on the data set from the class Δ, any index i, any value v of a data entry, and any event S, the output distributions of A with and without the i-th entry are (ε, δ)-close conditioned on xi = v.

This implies: for any distribution in Δ and any i, with probability at least 1 − δ, almost the same inferences will be made about Alice whether or not Alice’s data is present in the data set.
What can we release exactly and privately?

Under modest distributional assumptions, we can release several exact statistics while satisfying DDP:
• Sums, whenever the data distribution has a small uniform component.
• Histograms constructed from a random sample from the population.
• Stable functions: those with a small probability that the output changes when any single entry of the dataset changes.
Conclusions
• Privacy is a pressing concern in "Big Data", but hard to define intuitively.
• Differential privacy is a sound, rigorous approach: robust against arbitrary side information.
• This work:
  - The first efficient differentially private algorithms with optimal accuracy guarantees for essential tasks in statistical data analysis.
  - A generic definitional framework for privacy, relaxing DP.