## The Price of Privacy and the Limits of LP decoding

Kunal Talwar, MSR SVC

[Dwork, McSherry, Talwar, STOC 2007]
## Teaser

Compressed Sensing: if x ∈ R^N is k-sparse, take M ~ C·k·log(N/k) random Gaussian measurements. Then L1 minimization recovers x.

- For what k does this make sense (i.e., M < N)?
- How small can C be?
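As a concrete illustration of what "L1 minimization" means here, a minimal sketch (my toy example using scipy, not code from the talk): basis pursuit, min |x'|_1 subject to Bx' = y, written as a linear program by splitting x' into positive and negative parts.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, k = 100, 5
M = int(2.5 * k * np.log2(N / k))   # M ~ C k log(N/k) measurements (toy C)
B = rng.standard_normal((M, N))     # random Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
y = B @ x

# L1 minimization (basis pursuit): min |x'|_1  s.t.  B x' = y.
# Split x' = p - q with p, q >= 0 to obtain a linear program.
c = np.ones(2 * N)
A_eq = np.hstack([B, -B])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N),
              method="highs")
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x))  # ~0 when M is large enough
```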
![Page 3: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/3.jpg)
Privacy motivation
Coding setting
Results
Proof Sketch
Outline
![Page 4: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/4.jpg)
Database of information about individualsE.g. Medical history, Census data, Customer
info.Need to guarantee confidentiality of individual
entriesWant to make deductions about the database;
learn large scale trends.E.g. Learn that drug V increases likelihood of
heart diseaseDo not leak info about individual patients
Setting
Curator
Analyst
## Dinur and Nissim [2003]

Simple model (easily justifiable):
- Database: n-bit binary vector x
- Query: vector a
- True answer: dot product a·x
- Response: a·x + e = true answer + noise

Blatant non-privacy: attacker learns n − o(n) bits of x.

Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private, even against a polynomial-time adversary asking O(n log² n) random questions.
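A minimal sketch of how such an attack runs (my illustration with toy parameters; the actual DiNi analysis uses O(n log² n) random queries and noise o(√n)): find any fractional database consistent with all the noisy answers, then round each coordinate to a bit.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, E = 50, 400, 2              # toy sizes; E is the curator's noise bound
x = rng.integers(0, 2, size=n)    # secret database (n bits)
A = rng.integers(0, 2, size=(m, n))           # random 0/1 queries
y = A @ x + rng.integers(-E, E + 1, size=m)   # noisy answers, |noise| <= E

# Find any fractional x' in [0,1]^n with |A x' - y| <= E entrywise
# (a feasibility LP: zero objective), then round to the nearest bit.
res = linprog(c=np.zeros(n),
              A_ub=np.vstack([A, -A]),
              b_ub=np.concatenate([y + E, E - y]),
              bounds=[(0, 1)] * n,
              method="highs")
x_hat = np.round(res.x).astype(int)
print("bits recovered:", (x_hat == x).sum(), "of", n)  # typically almost all
```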
![Page 6: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/6.jpg)
Privacy has a PriceThere is no safe way to avoid increasing the
noise as the number of queries increases
Applies to Non-Interactive SettingAny non-interactive solution permitting answers
that are “too accurate” to “too many” questions is vulnerable to the DiNi attack.
This work : what if most responses have small error, but some can be arbitrarily off?
Implications
![Page 7: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/7.jpg)
Real vector x 2 Rn
Matrix A 2 Rmxn with i.i.d. Gaussian entriesTransmit codeword Ax 2 Rm
Channel corrupts message. Receive y=Ax +eDecoder must reconstruct x, assuming e has
small support small support: at most m entries of e are non-
zero.
Error correcting codes: Model
ChannelEncoder Decoder
![Page 8: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/8.jpg)
The Decoding problem
min support(e')such that
y=Ax'+e'x' 2 Rn
solving this would give the original message x.
min |e'|1
such that
y=Ax'+e'x' 2 Rn
this is a linear
program; solvable in poly time.
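A minimal sketch of this LP decoder (my toy example with scipy; sizes are illustrative), using the standard variable-splitting trick e' = p − q with p, q ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, rho = 30, 120, 0.1          # illustrative sizes: m = 4n, 10% corruptions
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
e = np.zeros(m)
bad = rng.choice(m, size=int(rho * m), replace=False)
e[bad] = 10 * rng.standard_normal(len(bad))   # arbitrary corruptions on rho*m entries
y = A @ x + e

# LP decoding: min |e'|_1  s.t.  y = A x' + e'.
# Variables are (x', p, q) with e' = p - q and p, q >= 0.
c = np.concatenate([np.zeros(n), np.ones(m), np.ones(m)])
A_eq = np.hstack([A, np.eye(m), -np.eye(m)])
bounds = [(None, None)] * n + [(0, None)] * (2 * m)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
x_hat = res.x[:n]
print("recovery error:", np.linalg.norm(x_hat - x))  # ~0 for small rho
```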
## LP decoding works

Theorem [Donoho / Candès-Rudelson-Tao-Vershynin]: For an error rate ρ < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

This talk: how large an error rate can LP decoding tolerate?
## Results

Let ρ* = 0.2390318914495168038956510438285657…

Theorem 1: For any ρ < ρ*, there exists c such that if A has i.i.d. Gaussian entries and m = cn rows, and if for k = ρm the error e is within ℓ1 distance δ of some vector e_k of support at most k, then LP decoding reconstructs x' where |x' − x|_2 is O(δ/√n).

Theorem 2: For any ρ > ρ*, LP decoding can be made to fail, even if m grows arbitrarily.
![Page 11: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/11.jpg)
In the privacy setting: Suppose, for <*, the curatoranswers (1- ) fraction of questions within error o(√n)answers fraction of the questions arbitrarily.Then the curator is blatantly non-private.
Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from §1.
Attack works in non-interactive setting as well. Also leads to error correcting codes over finite
alphabets.
Results
## In compressed sensing lingo

Theorem 1: For any ρ < ρ*, there exists c such that if B has i.i.d. Gaussian entries and M = (1 − c)N rows, then for k = ρN and any vector x ∈ R^N, given Bx, LP decoding reconstructs x' where

|x − x'|_2 ≤ (C/√N) · inf { |x − x_k|_1 : |x_k|_0 ≤ k }
![Page 13: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/13.jpg)
Let * = 0.2390318914495168038956510438285657…
Theorem 1 (=0): For any <*, there exists c such that if A has i.i.d. Gaussian entries with m=cn rows, and if the error vector e has support at most m, then LP decoding accurately reconstructs x.
Proof sketch…
Rest of Talk
![Page 14: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/14.jpg)
Scale and translation invariance
LP decoding is scale and translation invariant
Thus, without loss of generality, transmit x = 0
Thus receive y = Ax+e = e
If reconstruct z 0, then |z|2 = 1
Call such a z bad for A.
Ax
Ax’
y
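In symbols, a one-line check of this reduction (the LP is equivalent to min_{x'} |y − Ax'|_1):

$$
\min_{x'} \, |y - Ax'|_1 \;=\; \min_{z'} \, |e - Az'|_1 \qquad (x' = x + z',\ y = Ax + e),
$$

so the decoder errs on input Ax + e exactly when it errs on input e with x = 0 (translation invariance); and since |λe − A(λz')|_1 = |λ| · |e − Az'|_1 (scale invariance), any bad z ≠ 0 can be rescaled to |z|_2 = 1.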
![Page 15: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/15.jpg)
Proof Outline
Proof:Any fixed z is very unlikely to be bad for A:
Pr[z bad] · exp(-cm)
Net argument to extend to Rn:Pr[9 bad z] · exp(-c’m)
Thus, with high probability, A is such that LP decoding never fails.
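The standard shape of the net step (my sketch; the paper's constants may differ): an ε-net of the unit sphere in R^n has at most (3/ε)^n points, so a union bound gives

$$
\Pr[\exists\,\text{bad } z] \;\lesssim\; (3/\epsilon)^n \, e^{-cm} \;=\; \exp\!\big(n\ln(3/\epsilon) - cm\big) \;\le\; e^{-c'm}
$$

once m = c₀n with c₀ large enough, together with a continuity step showing that a bad z makes its nearest net point nearly bad.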
## Suppose z is bad…

z bad: |Az − e|_1 < |A·0 − e|_1 ⇒ |Az − e|_1 < |e|_1

Let e have support T. Without loss of generality (taking the worst-case error), e|_T = (Az)|_T.

Thus z bad: |(Az)|_{T^c}|_1 < |(Az)|_T|_1 ⇒ |(Az)|_T|_1 > ½ |Az|_1

[Figure: the received word y = e and the codeword Az, split into the coordinates in T and in T^c]
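Spelling out the chain (using e|_T = (Az)|_T and e|_{T^c} = 0):

$$
|Az - e|_1 \;=\; \big|(Az - e)|_T\big|_1 + \big|(Az)|_{T^c}\big|_1 \;=\; \big|(Az)|_{T^c}\big|_1,
\qquad |e|_1 \;=\; \big|(Az)|_T\big|_1,
$$

so badness |Az − e|_1 < |e|_1 is exactly |(Az)|_{T^c}|_1 < |(Az)|_T|_1; adding |(Az)|_T|_1 to both sides gives |Az|_1 < 2 |(Az)|_T|_1.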
## Suppose z is bad…

- A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian (since |z|_2 = 1).
- Let W = Az; its entries W_1, …, W_m are i.i.d. Gaussians.
- z bad ⇒ ∑_{i∈T} |W_i| > ½ ∑_i |W_i|. Recall: |T| ≤ ρm.
- Define S_ρ(W) to be the sum of the magnitudes of the top ρ fraction of the entries of W.
- Thus z bad ⇒ S_ρ(W) > ½ S_1(W): few Gaussians carrying a lot of mass!
## Defining ρ*

Let us look at E[S_ρ].

Let w* be such that E[ |W| · 1{|W| ≥ w*} ] = ½ E[|W|].

Let ρ* = Pr[|W| ≥ w*]. Then E[S_ρ*] = ½ E[S_1].

Moreover, for any ρ < ρ*, E[S_ρ] ≤ (½ − ε) E[S_1] for some ε = ε(ρ) > 0.

[Figure: density of |W|, with the top-ρ* tail beyond w* carrying half the expected mass]
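For the standard Gaussian this threshold can be made explicit: E[|W|; |W| ≥ w] = 2φ(w) and E|W| = √(2/π), so the defining equation reduces to e^{−w*²/2} = ½, i.e. w* = √(2 ln 2) and ρ* = erfc(√(ln 2)). A quick numerical sanity check (my addition, not from the talk):

```python
import math

# For W ~ N(0,1): w* solves exp(-w*^2/2) = 1/2, and rho* = Pr[|W| >= w*].
w_star = math.sqrt(2 * math.log(2))
rho_star = math.erfc(math.sqrt(math.log(2)))
print(w_star)    # 1.1774100226...
print(rho_star)  # 0.2390318914... matches the slide's constant
```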
![Page 19: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/19.jpg)
S depends on many independent Gaussians.
Gaussian Isoperimetric inequality implies:With high probability, S(W) close to E[S].S1 similarly concentrated.
Thus Pr[z is bad] · exp(-cm)
Concentration of measure
E[S*] =½ E[S1]
E[S]
![Page 20: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/20.jpg)
## Beyond ρ*

- For ρ > ρ*, E[S_ρ] > (½ + ε) E[S_1].
- A similar measure-concentration argument shows that any fixed z is bad with high probability.
- Thus LP decoding fails w.h.p. beyond ρ*.
- (The Donoho/CRTV experiments used a random error model.)

[Figure: density of |W|, as on the previous slides]
![Page 21: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/21.jpg)
Compressed Sensing:If x 2 RN is k-sparseTake M ~ Ck log N/k random Gaussian measurements
Then L1 minimization recovers x.
For what k does this make sense (i.e M < N)? How small can C be?
Teaser
k < * N ≈ 0.239 N
C > (* log 1/ * )–1 ≈ 2.02
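Checking the slide's numbers (my arithmetic, assuming the log is base 2, which reproduces the stated 2.02): at k = ρ*N, the measurement count M = C·k·log₂(N/k) equals N exactly when C = 1/(ρ* log₂(1/ρ*)).

```python
import math

rho_star = math.erfc(math.sqrt(math.log(2)))   # 0.2390318914...
C = 1 / (rho_star * math.log2(1 / rho_star))   # threshold value of C at k = rho*·N
print(C)  # ≈ 2.026
```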
![Page 22: The Price of Privacy and the Limits of LP decoding](https://reader035.vdocuments.net/reader035/viewer/2022070500/56816843550346895dde188f/html5/thumbnails/22.jpg)
Tight threshold for Gaussian LP decodingTo preserve privacy: lots of error in lots of
answers.
Similar results hold for +1/-1 queries.
Inefficient attacks can go much further:Correct (½-) fraction of wild errors.Correct (1-) fraction of wild errors in the list
decoding sense.Efficient Versions of these attacks?
Dwork-Yekhanin: (½-) using AG codes.
Summary