
From Stability to Differential Privacy

Abhradeep Guha Thakurta, Yahoo! Labs, Sunnyvale

Thesis: Stable algorithms yield differentially private algorithms

Differential privacy: A short tutorial

Privacy in Machine Learning Systems

[Figure: individuals contribute records d_1, d_2, ..., d_{n-1}, d_n to a trusted learning algorithm, which releases summary statistics (1. classifiers, 2. clusters, 3. regression coefficients) to users; an attacker may also observe the released outputs.]

Privacy in Machine Learning Systems

[Figure: the individuals' records d_1, d_2, ..., d_{n-1}, d_n feed into the learning algorithm, whose outputs are released to users.]

Two conflicting goals:

1. Utility: Release accurate information

2. Privacy: Protect privacy of individual entries

Balancing the tradeoff is a difficult problem:

1. Netflix prize database attack [NS08]

2. Facebook advertisement system attack [Korolova11]

3. Amazon recommendation system attack [CKNFS11]

Data privacy is an active area of research:

• Computer science, economics, statistics, biology, social sciences …


Differential Privacy [DMNS06, DKMMN06]

Intuition:
• The adversary learns essentially the same thing irrespective of your presence or absence in the data set
• Data sets D and D′ that differ in one record are called neighboring data sets
• Requirement: neighboring data sets induce close distributions on outputs

[Figure: the randomized algorithm M, with its random coins, is run on a data set containing record d_1 and on the neighboring data set with d_1 replaced; the two output distributions M(D) and M(D′) must be close.]

Differential Privacy [DMNS06, DKMMN06]

Definition:
A randomized algorithm M is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element
• for all sets of answers S

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of ε as a small constant and δ as negligible in n, where n = # of data samples
• Composition: the ε's and δ's add up over multiple executions
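For concreteness, the basic composition property can be written out as follows (a standard fact about differential privacy, stated here for reference rather than taken from the slides):

```latex
% Basic composition: if M_1, ..., M_k are run on the same data set and each
% M_i is (\epsilon_i, \delta_i)-differentially private, then the combined
% release M(D) = (M_1(D), ..., M_k(D)) satisfies
\[
  M \ \text{is}\
  \Bigl(\textstyle\sum_{i=1}^{k} \epsilon_i,\ \sum_{i=1}^{k} \delta_i\Bigr)
  \text{-differentially private.}
\]
```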

Semantics of Differential Privacy

Laplace Mechanism [DMNS06]

Let D be a data set and let f be a function mapping data sets to ℝ^k

Sensitivity: S(f) = max over neighboring data sets D, D′ of ‖f(D) − f(D′)‖₁

1. Sample a random vector b whose coordinates are drawn i.i.d. from Lap(S(f)/ε)
2. Output f(D) + b

Theorem (Privacy): The algorithm is ε-differentially private
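A minimal Python sketch of the Laplace mechanism (illustrative only: the function name, the counting query, and the toy data are assumptions, not part of the talk):

```python
import numpy as np

def laplace_mechanism(data, f, sensitivity, epsilon, rng=None):
    """Release f(data) with Laplace noise of scale sensitivity/epsilon.

    `f` maps a data set to a numeric array; `sensitivity` is the L1
    sensitivity of f over neighboring data sets.
    """
    rng = rng or np.random.default_rng()
    true_answer = np.asarray(f(data), dtype=float)
    noise = rng.laplace(scale=sensitivity / epsilon, size=true_answer.shape)
    return true_answer + noise

# Example: a counting query ("how many records exceed 0.5?").
# Changing one record changes the count by at most 1, so sensitivity = 1.
data = np.array([0.1, 0.7, 0.9, 0.4, 0.6])
print(laplace_mechanism(data, lambda d: np.sum(d > 0.5), sensitivity=1.0, epsilon=0.5))
```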

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Perturbation stability (a.k.a. zero local sensitivity)

Perturbation Stability

[Figure: a function f applied to a data set D produces an output f(D).]

Stability of f at D: the output does not change on changing any one entry of D
Equivalently, the local sensitivity of f at D is zero

Distance to Instability Property

• Definition: A function f is k-stable at a data set D if
  • for any data set D′ within distance k of D (at most k entries changed), f(D′) = f(D)
• Distance to instability: the largest k such that f is k-stable at D
• Objective: Output f(D) while preserving differential privacy

[Figure: within the space of all data sets, D lies inside the region of stable data sets; the distance to instability is the distance from D to the nearest unstable data set.]

Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.'13]

A Meta-algorithm: Propose-Test-Release (PTR)

1. Compute a noisy distance to instability: d̂ = dist(D) + Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D); else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: If f is sufficiently stable at D (distance to instability large enough), then w.h.p. the algorithm outputs f(D)

Basic tool: Laplace mechanism
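A minimal sketch of the propose-test-release pattern, assuming the caller supplies a `distance_to_instability` function with global sensitivity 1 and that the standard threshold log(1/δ)/ε is used (names are illustrative):

```python
import math
import numpy as np

def propose_test_release(data, f, distance_to_instability, epsilon, delta, rng=None):
    """Release the exact value f(data) only when the data set is provably
    far from instability; otherwise return None ("bottom").

    `distance_to_instability(data)` must have global sensitivity 1, so that
    adding Lap(1/epsilon) noise to it preserves differential privacy.
    """
    rng = rng or np.random.default_rng()
    noisy_distance = distance_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_distance > math.log(1.0 / delta) / epsilon:
        return f(data)   # stable: safe to release the exact answer
    return None          # possibly unstable: refuse to answer
```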

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

This Talk

Sample and aggregate framework [NRS07, Smith11, Smith T.'13]

Sample and Aggregate Framework

[Figure: the data set D is subsampled into blocks D_1, ..., D_m; the algorithm is run on each block, and an aggregator combines the m block outputs into a single output.]

Sample and Aggregate Framework

Theorem: If the aggregator is (ε, δ)-differentially private, then the overall framework is (ε, δ)-differentially private

Assumption: Each entry appears in at most one data block

Proof: Each data entry affects only one data block
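A minimal sketch of the sample-and-aggregate skeleton with disjoint blocks (the helper names and the assumption that `data` is a NumPy array are mine):

```python
import numpy as np

def sample_and_aggregate(data, learner, private_aggregator, m, rng=None):
    """Split the data into m disjoint blocks, run the (non-private) learner
    on each block, and combine the m block outputs with a differentially
    private aggregator. Since each record lands in exactly one block, the
    overall procedure inherits the aggregator's privacy guarantee.
    """
    rng = rng or np.random.default_rng()
    index_blocks = np.array_split(rng.permutation(len(data)), m)   # disjoint blocks
    block_outputs = [learner(data[idx]) for idx in index_blocks]   # one model per block
    return private_aggregator(block_outputs)
```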

A differentially private aggregator using PTR framework [Smith T.’13]

Assumption: a discrete set of r possible outputs S_1, ..., S_r

[Figure: each data block D_1, ..., D_m casts a vote for one candidate output; the votes form a count histogram over S_1, S_2, ..., S*, ..., S_r.]

A Differentially Private Aggregator

Function f: the candidate output with the maximum number of votes

PTR + Report-Noisy-Max Aggregator

1. Compute the gap between the vote counts of the two highest-scoring candidates
2. If the gap plus Laplace noise exceeds the threshold, then return the top-voted candidate S*; else return ⊥

Observation: the distance to instability is (essentially) the gap between the counts of the highest and the second-highest scoring model
Observation: The algorithm is always computationally efficient
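A minimal sketch of this vote-and-test aggregator, under the simplifying assumption that the raw gap between the top two vote counts is used as the stability proxy and compared against the PTR threshold log(1/δ)/ε (the paper's exact scaling may differ):

```python
import math
from collections import Counter
import numpy as np

def private_vote_aggregator(block_outputs, epsilon, delta, rng=None):
    """Let each block's (hashable) output act as a vote; release the winner
    only if its lead over the runner-up survives a noisy threshold test.
    Returns None when the test fails.
    """
    rng = rng or np.random.default_rng()
    votes = Counter(block_outputs)
    ranked = votes.most_common(2)
    top_model, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    noisy_gap = (top_count - runner_up_count) + rng.laplace(scale=1.0 / epsilon)
    if noisy_gap > math.log(1.0 / delta) / epsilon:
        return top_model   # clear winner: release it
    return None            # votes too close: refuse to answer
```

This slots directly into the `private_aggregator` argument of the sample-and-aggregate sketch above.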

Analysis of the aggregator under subsampling stability [Smith T.’13]

Subsampling Stability

[Figure: random subsamples D_1, ..., D_m are drawn from the data set D (with replacement), and the function f is run on each.]

Stability: f is subsampling stable at D if f(D_i) = f(D) with probability at least 3/4 over the choice of a random subsample D_i

A Private Aggregator using Subsampling Stability

[Figure: voting histogram (in expectation) over the candidates S_1, S_2, ..., S*, ..., S_r; the true answer S* receives roughly 3m/4 of the votes, while every other candidate receives far fewer (roughly m/4 or less).]

• Each block: sample each entry from D with probability q
• Each entry of D appears in roughly q·m data blocks

PTR + Report-Noisy-Max Aggregator

• D_1, ..., D_m: sample each entry from D with probability q to form each block
• Each entry of D appears in about q·m data blocks with high probability
• If the noisy gap between the top two vote counts exceeds the threshold, then return the top-voted candidate S*; else return ⊥

A Private Aggregator using Subsampling Stability

Theorem: The above algorithm is (ε, δ)-differentially private

Theorem: If f is subsampling stable at D, then with high probability the true answer f(D) is output

Notice: the utility guarantee does not depend on the number of candidate models

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Sparse linear regression in high-dimensions and the LASSO

Sparse Linear Regression in High Dimensions (p ≫ n)

• Data set: D = {(x_1, y_1), ..., (x_n, y_n)}, where x_i ∈ ℝ^p and y_i ∈ ℝ
• Assumption: data generated by a noisy linear system

y_i = ⟨x_i, θ*⟩ + w_i

[Figure: the response y_i equals the inner product of the feature vector x_i with the p × 1 parameter vector θ*, plus field noise w_i.]

• Data normalization: the feature vectors are appropriately normalized
• The noise w_i is sub-Gaussian

Sparse Linear Regression in High Dimensions (p ≫ n)

• Data set: D = (X, y), where X ∈ ℝ^{n×p} and y ∈ ℝ^n
• Assumption: data generated by a noisy linear system

y = Xθ* + w

[Figure: the n × 1 response vector y equals the n × p design matrix X times the p × 1 parameter vector θ*, plus the n × 1 field-noise vector w.]

• Sparsity: θ* has only s non-zero entries
• Bounded norm: the norm of θ* is bounded (up to an arbitrarily small constant)

Model selection problem: Find the non-zero coordinates of θ*

Sparse Linear Regression in High Dimensions (p ≫ n)

[Figure: y = Xθ* + w, with response vector y (n × 1), design matrix X (n × p), parameter vector θ* (p × 1), and field noise w (n × 1).]

Model selection: find the non-zero coordinates (i.e., the support) of θ*

Solution: the LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, ...]: θ̂ ∈ argmin_θ (1/2n) ‖y − Xθ‖₂² + λ‖θ‖₁
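A minimal non-private sketch of LASSO-based support recovery on synthetic data (the data-generating choices, the regularization scale, and the use of scikit-learn's `Lasso` are my assumptions, not the talk's exact setup):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                      # high-dimensional: p >> n, sparse theta*

X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[rng.choice(p, size=s, replace=False)] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)        # y = X theta* + w

# LASSO: minimize (1/2n) ||y - X theta||_2^2 + lam * ||theta||_1
lam = 2 * 0.1 * np.sqrt(np.log(p) / n)      # ~ 2 * sigma * sqrt(log p / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

print("recovered support:", sorted(np.flatnonzero(np.abs(theta_hat) > 1e-8)))
print("true support:     ", sorted(np.flatnonzero(theta_star)))
```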

Consistency of the LASSO Estimator

Consistency conditions* [Wainwright06, ZY07], with Γ the support of the underlying parameter vector θ*:

• Incoherence: the off-support columns X_{Γ^c} are only weakly correlated with the support columns X_Γ
• Restricted Strong Convexity: the design restricted to the support, X_Γ, is well conditioned (its minimum eigenvalue is bounded away from zero)

[Figure: y = Xθ* + w, with the design matrix X split into the support columns X_Γ and the off-support columns X_{Γ^c}.]

Theorem*: Under proper choice of λ and n, the support of the LASSO estimator θ̂ equals the support of θ*

Stochastic Consistency of the LASSO

Consistency conditions* [Wainwright06, ZY07]: incoherence and restricted strong convexity, with Γ the support of θ*

Theorem [Wainwright06, ZY07]: If each data entry is drawn from a suitably well-behaved (e.g. sub-Gaussian) distribution, then the assumptions above are satisfied w.h.p.
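For orientation, the scaling usually quoted in this line of work (a standard fact about the LASSO, not read off the slides) is roughly:

```latex
% Non-private support recovery for the LASSO with sparsity s and dimension p:
% sample size and regularization on the order of
\[
  n = \Omega\bigl(s \log p\bigr),
  \qquad
  \lambda \asymp \sigma \sqrt{\tfrac{\log p}{n}},
\]
% under which \mathrm{supp}(\hat{\theta}) = \mathrm{supp}(\theta^*) with high probability.
```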

We show [Smith, T.'13]:

Consistency conditions ⟹ Proxy conditions (efficiently testable with privacy) ⟹ Perturbation stability

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Interlude: A simple subsampling-based private LASSO algorithm [Smith, T.'13]

Notion of Neighboring Data Sets

[Figure: a data set D = (design matrix, response vector) has n rows and p feature columns; row i is the pair (x_i, y_i).]

D and D′ are neighboring data sets if D′ is obtained from D by replacing a single row (x_i, y_i) with (x_i′, y_i′)

Recap: Subsampling Stability

[Figure: random subsamples D_1, ..., D_m are drawn from the data set D (with replacement), and the function f is run on each.]

Stability: f is subsampling stable at D if f(D_i) = f(D) with probability at least 3/4 over the choice of a random subsample D_i

Recap: PTR + Report-Noisy-Max Aggregator

Assumption: all candidate models come from a discrete set S_1, ..., S_k

[Figure: the function f is run on each block D_1, ..., D_m; each block votes for one candidate model, producing a count histogram over S_1, S_2, ..., S*, ..., S_k.]

Recap: PTR + Report-Noisy-Max Aggregator

• D_1, ..., D_m: sample each entry from D with probability q to form each block
• Each entry of D appears in about q·m data blocks with high probability
• Fix the subsampling parameters (q and m) appropriately
• If the noisy gap between the top two vote counts exceeds the threshold, then return the top-voted candidate S*; else return ⊥

Subsampling Stability of the LASSO

Stochastic assumptions: each data entry is drawn from a suitably well-behaved distribution, and the noise w is sub-Gaussian

[Figure: y = Xθ* + w, with response vector y (n × 1), design matrix X (n × p), parameter vector θ* (p × 1), and field noise w (n × 1).]

Subsampling Stability of the LASSO

Stochastic assumptions: each data entry is drawn from a suitably well-behaved distribution, and the noise w is sub-Gaussian

Theorem [Wainwright06, ZY07]: Under proper choice of λ and n, the support of the LASSO estimator equals the support of θ*

Theorem: Under proper choice of λ, m and n, the output of the aggregator equals the support of θ* w.h.p.

Notice: the sample size required by the subsampling-based private algorithm is larger than in the non-private case (there is a gap in the required scale of n)

Perturbation stability based private LASSO and optimal sample complexity [Smith,T.’13]

Recap: Distance to Instability Property

• Definition: A function f is k-stable at a data set D if for any data set D′ within distance k of D, f(D′) = f(D)
• Distance to instability: the largest k such that f is k-stable at D
• Objective: Output f(D) while preserving differential privacy

[Figure: within the space of all data sets, D lies inside the region of stable data sets, at some distance from the unstable data sets.]

Recap: Propose-Test-Release Framework (PTR)

1. Compute a noisy distance to instability: d̂ = dist(D) + Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D); else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: If f is sufficiently stable at D, then w.h.p. the algorithm outputs f(D)

TBD: some query with global sensitivity one that can stand in for the distance to instability

Instantiation of PTR for the LASSO

LASSO: θ̂(D) ∈ argmin_θ (1/2n) ‖y − Xθ‖₂² + λ‖θ‖₁

• Set the function f(D) = support of the LASSO estimator θ̂(D)
• Issue: for this f, the distance to instability might not be efficiently computable

From [Smith, T.'13]:

Consistency conditions ⟹ Proxy conditions (efficiently testable with privacy) ⟹ Perturbation stability

This talk: establishing this chain of implications for the LASSO

Perturbation Stability of the LASSO

LASSO: θ̂(D) ∈ argmin_θ (1/2n) ‖y − Xθ‖₂² + λ‖θ‖₁

Theorem: The consistency conditions for the LASSO are sufficient for perturbation stability

Proof Sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at θ̂
2. Show that support(θ̂) is stable via a "dual certificate" argument on stable instances
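For reference, the KKT stationarity condition for the LASSO objective used in step 1 has the following standard form (reconstructed from the usual analysis, not copied from the slides):

```latex
% \hat{\theta} minimizes J_D(\theta) = \tfrac{1}{2n}\|y - X\theta\|_2^2 + \lambda\|\theta\|_1
% iff 0 \in \partial J_D(\hat{\theta}), i.e. there is a subgradient z of the l1 norm with
\[
  \frac{1}{n} X^{\top}\bigl(y - X\hat{\theta}\bigr) = \lambda z,
  \qquad
  z_j = \operatorname{sign}(\hat{\theta}_j)\ \text{if } \hat{\theta}_j \neq 0,
  \quad |z_j| \le 1\ \text{otherwise.}
\]
```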

Perturbation Stability of the LASSO

Proof Sketch: compare the subgradient optimality conditions on the two neighboring data sets:
• J_D: the LASSO objective on D, with minimizer θ̂ satisfying 0 ∈ ∂J_D(θ̂)
• J_D′: the LASSO objective on the neighboring data set D′, with minimizer θ̂′ satisfying 0 ∈ ∂J_D′(θ̂′)

Perturbation Stability of the LASSO

Proof Sketch: argue using the optimality conditions of θ̂ and θ̂′:
1. No zero coordinate of θ̂ becomes non-zero in θ̂′ (use the mutual incoherence condition)
2. No non-zero coordinate of θ̂ becomes zero in θ̂′ (use the restricted strong convexity condition)

Perturbation Stability Test for the LASSO

Γ: support of θ̂; Γ^c: complement of the support of θ̂

Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of the design restricted to Γ (i.e. of (1/n) X_Γ^T X_Γ) is sufficiently large
• Strong stability: the (absolute) coordinates of the gradient of the least-squared loss on Γ^c stay well below the regularization level λ

Intuition: strong convexity ensures supp(θ̂) ⊆ supp(θ̂′)
1. Strong convexity ensures that the change from θ̂ to θ̂′ is small
2. If the non-zero coordinates of θ̂ are large enough, they therefore remain non-zero in θ̂′
3. The consistency conditions imply that the non-zero coordinates are indeed large
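A minimal sketch of such a two-part check (simplified, with placeholder thresholds; the real test in [Smith, T.'13] is more involved):

```python
import numpy as np

def lasso_stability_test(X, y, theta_hat, lam, rsc_threshold, margin_threshold):
    """Simplified stability check for a LASSO solution theta_hat.

    Passes when (i) the design restricted to the support is strongly convex
    enough and (ii) the off-support gradient coordinates of the least-squares
    loss stay well below the regularization level lam. The thresholds are
    placeholders, not the constants from the paper.
    """
    n, p = X.shape
    support = np.flatnonzero(np.abs(theta_hat) > 1e-10)
    off_support = np.setdiff1d(np.arange(p), support)

    # Restricted strong convexity: min eigenvalue of (1/n) X_G^T X_G.
    X_g = X[:, support]
    min_eig = np.linalg.eigvalsh(X_g.T @ X_g / n).min() if support.size else np.inf

    # Strong stability: gradient of the least-squares loss, (1/n) X^T (X theta - y),
    # must keep a margin below lam on the complement of the support.
    grad = X.T @ (X @ theta_hat - y) / n
    margin = lam - np.abs(grad[off_support]).max() if off_support.size else np.inf

    return (min_eig >= rsc_threshold) and (margin >= margin_threshold)
```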

Geometry of the Stability of the LASSO

[Figure: the LASSO objective plotted along a coordinate in Γ^c, with the minimizer θ̂ sitting at the kink created by the ℓ₁ penalty.]

Intuition: strong stability ensures that no zero coordinate of θ̂ becomes non-zero in θ̂′
• For the minimizer to move along a coordinate in Γ^c, the perturbation to the gradient of the least-squared loss has to be large

[Figure: the LASSO objective along a coordinate in Γ^c; the ℓ₁ penalty gives the objective slopes of roughly −λ and +λ on either side of the minimizer θ̂, so a small perturbation of the smooth part cannot move the minimizer off zero.]

Geometry of the Stability of the LASSO

Gradient of the least-squared loss: −X^T(y − Xθ̂), with coordinates a_1, ..., a_p split between Γ and Γ^c

• Strong stability: for every coordinate in Γ^c, zero retains a valid sub-gradient for the LASSO objective (the corresponding gradient coordinate is strictly smaller than λ in magnitude)

[Figure: as above, the LASSO objective along a coordinate in Γ^c, with slopes ±λ around the minimizer θ̂.]

Making the Stability Test Private (Simplified)

Test for Restricted Strong Convexity: threshold the minimum-eigenvalue statistic
Test for strong stability: threshold the gradient-margin statistic

Issue: used directly, these two test statistics can be highly sensitive to a single data entry

Our solution: a proxy distance built from surrogate statistics g_1 and g_2
• the proxy distance has global sensitivity of one
• under the consistency conditions, g_1 and g_2 are both large and insensitive

1. Compute the proxy distance as a function of g_1 and g_2
2. If the noisy proxy distance exceeds the threshold, then return support(θ̂); else return ⊥

Private Model Selection with Optimal Sample Complexity

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: Under the consistency conditions and a proper choice of the parameters, w.h.p. the support of θ* is output

Nearly optimal sample complexity

Thesis: Stable algorithms yield differentially private algorithms

Two notions of stability:

1. Perturbation stability

2. Subsampling stability

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Concluding Remarks

1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing learning algorithms

• Example: learning with non-convex models [Bilenko,Dwork,Rothblum,T.]

2. Propose-test-release framework is an interesting tool if one can compute distance to instability efficiently

3. Open problem: Private high-dimensional learning without assumptions like incoherence and restricted strong convexity
