Foundations of Privacy
Lecture 5
Lecturer: Moni Naor
Recap of last week's lecture
• The Exponential Mechanism
  – Differential privacy
  – May yield utility/approximation
  – Is defined and evaluated by considering all possible answers
• Counting Queries
  – The BLR Algorithm
  – Efficient Algorithm
Synthetic DB: Output is a DB

[Figure: a user issues query 1, query 2, … to a sanitizer holding the database and receives answer 1, answer 2, answer 3]

Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating each query on the output DB.
• Software and people compatible
• Consistent answers
Counting Queries
• Queries with low sensitivity

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants in D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is anyway inherent in statistical analysis.

Assume all queries are given in advance (non-interactive).

[Figure: universe U, database D of size n, query c]
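To make the definition concrete, here is a minimal sketch (not from the slides) of evaluating a counting query exactly, before any privacy mechanism is applied. A database is simply a list of items from the universe, and a query is a 0/1 predicate:

```python
# A database is a list of items from a universe U; a counting query is a
# predicate c: U -> {0,1}; the (non-private) answer is the number of rows
# of the database that satisfy c.

def counting_query(db, predicate):
    """Exact answer: how many entries of db satisfy predicate."""
    return sum(1 for row in db if predicate(row))

# Toy universe: small integers; toy predicate: "is the entry even?"
db = [1, 2, 2, 4, 7, 8]
print(counting_query(db, lambda x: x % 2 == 0))  # 4 entries are even
```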
The BLR Algorithm [Blum Ligett Roth 2008]

For DBs F and D: dist(F,D) = max_{q∈C} |q(F) − q(D)|

Algorithm on input DB D: sample from a distribution on DBs of size m (m < n), where DB F gets picked w.p. ∝ e^(−ε·dist(F,D)).

Intuition: far-away DBs get smaller probability.
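A minimal brute-force sketch of this sampler for a tiny universe (an assumption throughout: to compare databases of different sizes, distances are taken between fractional answers, a detail the slides leave implicit). It enumerates all size-m databases and samples one with probability proportional to e^(−ε·dist):

```python
# Brute-force BLR sampler sketch: enumerate every size-m multiset over the
# universe and pick one with probability proportional to exp(-eps * dist).
# Only feasible for toy parameters; the slides discuss the real running time.
import itertools, math, random

def answer(db, predicate):
    return sum(1 for row in db if predicate(row))

def dist(F, D, queries, n, m):
    # Fractional answers, so size-m F and size-n D are comparable.
    return max(abs(answer(F, q) / m - answer(D, q) / n) for q in queries)

def blr_sample(D, universe, queries, m, eps, rng=random.Random(0)):
    n = len(D)
    candidates = list(itertools.combinations_with_replacement(universe, m))
    weights = [math.exp(-eps * dist(F, D, queries, n, m)) for F in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

universe = [0, 1]
D = [0, 0, 1, 1, 1, 1]
queries = [lambda x: x == 1]
print(blr_sample(D, universe, queries, m=3, eps=5.0))
```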
Counting Queries

Goal (non-interactive): output a sample F of size m that approximates D on all of the given predicates c.
The BLR Algorithm: Error Õ(n^{2/3} log|C|)

• There exists F_good of size m = Õ((n/α)² · log|C|) s.t. dist(F_good, D) ≤ α, so Pr[F_good] ∝ e^(−εα).
• For any F_bad with dist(F_bad, D) ≥ 2α: Pr[F_bad] ∝ e^(−2εα).
• Union bound: ∑_{bad DBs F_bad} Pr[F_bad] ∝ |U|^m · e^(−2εα).
• For α = Õ(n^{2/3} log|C|): Pr[F_good] ≫ ∑ Pr[F_bad].
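The comparison in the last bullet can be made explicit. The following informal sketch (not from the slides; constants and all logarithmic factors are folded into Õ) shows where the bound on α comes from:

```latex
\Pr[F_{\mathrm{good}}] \gg \textstyle\sum \Pr[F_{\mathrm{bad}}]
\;\Longleftarrow\; e^{-\varepsilon\alpha} \gg |U|^{m}\, e^{-2\varepsilon\alpha}
\;\Longleftrightarrow\; \varepsilon\alpha \gtrsim m \log |U|
  = \tilde{O}\!\left(\tfrac{n^{2}}{\alpha^{2}}\,\log|C|\,\log|U|\right)
\;\Longleftrightarrow\; \alpha^{3} \gtrsim \tilde{O}\!\left(\tfrac{n^{2}\log|C|}{\varepsilon}\right)
```

so α = Õ(n^{2/3}) up to the logarithmic factors, matching the error bound in the slide's title.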
The BLR Algorithm: Running Time

Generating the distribution by enumeration: need to enumerate every size-m database, where m = Õ((n/α)² · log|C|).

Running time ≈ |U|^{Õ((n/α)² · log|C|)}
Conclusion
• Offline algorithm, 2ε-differential privacy for any set C of counting queries
• Error α is Õ(n^{2/3} log|C| / ε)
• Super-polynomial running time: |U|^{Õ((n/α)² · log|C|)}
Can we Efficiently Sanitize?

The good news: if the universe is small, we CAN sanitize efficiently, in time poly(|C|, |U|).

The bad news: we cannot do much better, namely, we cannot sanitize in time sub-poly(|C|) AND sub-poly(|U|).
How Efficiently Can We Sanitize?

[Figure: a 2×2 grid of runtime regimes, sub-poly vs. poly in |C| against sub-poly vs. poly in |U|; the poly(|C|, |U|) cell is the good news above, and the remaining cells are marked "?"]
The Good News: Can Sanitize When the Universe is Small

Efficient sanitizer for query set C:
• DB size n ≥ Õ(|C|^{o(1)} log|U|)
• Error is ≈ n^{2/3}
• Runtime poly(|C|, |U|)
• Output is a synthetic database

Compare to [Blum Ligett Roth]: n ≥ Õ(log|C| log|U|), runtime super-poly(|C|, |U|).
Recursive Algorithm

C_0 = C ⊇ C_1 ⊇ C_2 ⊇ … ⊇ C_b

Start with DB D and a large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink the query set by a (small) factor.
Recursive Algorithm

C_0 = C ⊇ C_1 ⊇ C_2 ⊇ … ⊇ C_b

Start with DB D and a large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink the query set by a (small) factor.
End of the recursion: sanitize D w.r.t. the small query set C_b.
The output is good for all queries in the small set C_{i+1}:
• Extract utility on almost all queries in the larger set C_i
• Fix the remaining "underprivileged" queries in the larger set C_i
Recursive Algorithm Overview

Want to sanitize DB D for query set C. Say we have a sanitizer A′ for smaller subsets C′ ⊆ C, and A′ outputs a small synthetic database. Choose a random C′ ⊆ C and sanitize D for C′ using A′.

"Magic" (why?): the sanitization gives accurate answers on all but a small subset B ⊆ C.

Fix the "underprivileged" queries in B "manually" (how? where?).

[Figure: the set C, with subset C′ handled by A′ and subset B fixed manually]
Sanitize for Few Queries, Get Utility for Almost All

Consider an m-bit synthetic-DB output y of A′ vs. DB D. If y is "bad" for a query set B_y of fractional size ≥ m/s, then

Pr_{C′}[C′ ∩ B_y = ∅] ≤ (1 − m/s)^{|C′|} ≈ e^{−m}

W.h.p., simultaneously for all potential m-bit output DBs y with a large set B_y of bad queries, C′ intersects B_y (an Occam's-razor argument over the 2^m possible outputs y).

So y* = A′(D), being good for all of C′, is good for almost all of C.
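A small numeric sanity check (not from the slides; the parameter values are illustrative) of the two inequalities above: a random subset C′ of s queries almost surely hits any fixed bad set of fractional size m/s, and even a union bound over all 2^m possible m-bit outputs y leaves the failure probability small.

```python
# Check that (1 - m/s)^s is close to e^{-m}, and that the union bound
# over all 2^m potential m-bit outputs y is still small.
import math

m, s = 20, 1000          # synthetic-DB bit-size and subsampled query count
miss = (1 - m / s) ** s  # Pr[a fixed bad set of fractional size m/s is missed]
print(miss, math.exp(-m))  # the two quantities are of the same order
print((2 ** m) * miss)     # union bound over all 2^m outputs y stays small
```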
How to Get a Synthetic DB? The Syntheticizer

Problem: we need a small synthetic DB, but we have a large output of some other form.

Lemma ["Syntheticizer"]: Given a sanitizer A with α-accuracy and arbitrary output, we can produce a sanitizer A′ with 2α-accuracy whose output is a synthetic DB of size Õ(log|C| / α²). The runtime is poly(|U|, |C|).

Transform the output into a synthetic DB using linear programming: a variable per item in U, a constraint per query in C.
The Linear Program
• Run the sanitizer A and use it to get differentially private counts v_c for all the concepts in C. The database is never used again, which gives privacy.
• Come up with a low-weight fractional database that approximates these counts.
• Transform this fractional database into a standard synthetic database by rounding the fractional counts.
• For all i ∈ U: a variable x_i
• For all c ∈ C: a constraint

  v_c − α ≤ ∑_{i s.t. c(i)=1} x_i ≤ v_c + α
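A minimal sketch of this LP in Python. Assumptions not in the slides: the solver is scipy.optimize.linprog, the objective minimizes total weight to get a low-weight fractional database, and the toy universe, queries, and counts are invented for illustration.

```python
# Syntheticizer LP sketch: one nonnegative variable x_i per universe item,
# and for each query c the two-sided constraint
#   v_c - alpha <= sum_{i: c(i)=1} x_i <= v_c + alpha.
import numpy as np
from scipy.optimize import linprog

def fractional_db(universe, queries, noisy_counts, alpha):
    n_vars = len(universe)
    A_ub, b_ub = [], []
    for c, v in zip(queries, noisy_counts):
        row = np.array([1.0 if c(i) else 0.0 for i in universe])
        A_ub.append(row)            # sum_{i: c(i)=1} x_i <= v_c + alpha
        b_ub.append(v + alpha)
        A_ub.append(-row)           # sum_{i: c(i)=1} x_i >= v_c - alpha
        b_ub.append(-(v - alpha))
    res = linprog(c=np.ones(n_vars), A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(0, None)] * n_vars, method="highs")
    return res.x if res.success else None

universe = [0, 1, 2, 3]
queries = [lambda i: i % 2 == 0, lambda i: i >= 2]
x = fractional_db(universe, queries, noisy_counts=[2.0, 1.5], alpha=0.25)
print(x)
```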
The Linear Program
• Why is there a fractional solution? The real (integer) database is one example!
• Rounding:
  – Scale the fractional database so that its total weight is 1
  – Round each fractional point down to the closest multiple of α/|U|
  – Treat the rounded fractional database as an integer synthetic database of size at most |U|/α
  – If it is too large, subsample
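The rounding step can be sketched as follows (assumptions: the value of α and the toy fractional database are illustrative, and the final subsampling step is omitted):

```python
# Turn a fractional database into a small integer synthetic DB: normalize
# to total weight 1, round each weight down to a multiple of alpha/|U|,
# then emit each universe item with multiplicity weight/(alpha/|U|).
import math

def round_to_synthetic(universe, x, alpha):
    total = sum(x)
    unit = alpha / len(universe)            # rounding granularity alpha/|U|
    synthetic = []
    for item, w in zip(universe, x):
        k = math.floor((w / total) / unit)  # round down to multiple of unit
        synthetic.extend([item] * k)        # k copies of this universe item
    return synthetic                        # size is at most |U|/alpha

db = round_to_synthetic([0, 1, 2, 3], x=[0.0, 2.0, 1.0, 1.0], alpha=0.5)
print(len(db), db)
```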
How Do We Use a Synthetic DB?

Why a synthetic DB?
1. Easy to "shrink" DBs by subsampling Õ(log|C|/α²) DB items
2. Gives counts for every query: the output is well-defined even for queries that were not around when sanitizing
Utility for All Queries: First Attempt

Sanitizing a small C′ is easy ("brute force"), and we can "shrink" the result using the syntheticizer.

Subsample a small C′; this works for all but a few queries. Repeat many times and take the majority.

Doesn't work: "underprivileged" queries can fall outside every subsample.

[Figure: the set C with subsampled subsets C′, C″ and a bad set B]
Utility for All Queries: Fix the "Underprivileged"

Lemma: Given a query set C and a diff. private sanitizer A that
1. works for every C′ ⊆ C with |C′| = s, and
2. outputs a synthetic DB of size ≤ m,
we get a sanitizer for C with utility on all queries. Need DB size n ≥ Õ(|C|m/s).
Proof Outline

Subsample a small C′ and get a synthetic DB that works for all but a few (≈ |C|m/s) "underprivileged" queries.

Now "manually" correct those few. "Brute force": release noisy counts v_c (noise ≈ |C|m/s).

We also need to say which queries are underprivileged, and that depends on the DB D. What about privacy?

Key point: regardless of D, almost all queries are strongly privileged. Release a noisy indicator vector. For the privacy analysis, we need only consider the ≈ |C|m/s potentially underprivileged queries.
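The "release noisy counts" step is typically done with the Laplace mechanism (an assumption here: the slides do not name the noise distribution). Each counting query has sensitivity 1, so releasing k counts with per-count Laplace noise of scale k/ε is ε-differentially private by composition:

```python
# Sketch: release noisy counts for a fixed list of queries via the Laplace
# mechanism. One person changes each count by at most 1 (sensitivity 1);
# k counts with noise scale k/eps compose to eps-differential privacy.
import random

def noisy_counts(db, queries, eps, rng=random.Random(0)):
    scale = len(queries) / eps              # per-count Laplace scale
    counts = []
    for q in queries:
        true_count = sum(1 for row in db if q(row))
        # A Laplace(scale) sample is the difference of two exponentials.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        counts.append(true_count + noise)
    return counts

db = [1, 2, 2, 4, 7, 8]
vs = noisy_counts(db, [lambda x: x % 2 == 0, lambda x: x > 3], eps=1.0)
print(vs)
```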
Recursive Algorithm: Recap

C_0 = C ⊇ C_1 ⊇ C_2 ⊇ … ⊇ C_b

Start with DB D and a large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink by a factor f.
Recursive Algorithm: Recap

C_0 = C ⊇ C_1 ⊇ C_2 ⊇ … ⊇ C_b

Start with DB D and a large query set C.
• Repeatedly choose a random subset C_{i+1} of C_i: shrink by a factor f
• Sanitize D w.r.t. the small C_b (use the "brute force" sanitizer)
• The syntheticizer transforms the output into a small synthetic DB
• Fix the "underprivileged" queries (need n ≥ Õ(f))
• Lose a 2^b factor in accuracy; "brute force" needs n ≥ 2^b|C_b|

n ≥ |C|^{o(1)} by trading off b and f
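The recursion above can be sketched as a skeleton (every helper name here is a hypothetical stand-in, not from the lecture; the stubs in the usage line exist only to make the skeleton run):

```python
# Schematic skeleton of the recursive algorithm: shrink the query set b
# times by a factor f, sanitize at the base with brute force, then on the
# way back up convert to a small synthetic DB and patch the few
# underprivileged queries of the larger set.
import random

def recursive_sanitize(D, queries, depth, f, rng,
                       brute_force, syntheticize, fix_underprivileged):
    if depth == 0:
        return brute_force(D, queries)       # small set: sanitize directly
    subset = rng.sample(queries, max(1, len(queries) // f))
    out = recursive_sanitize(D, subset, depth - 1, f, rng,
                             brute_force, syntheticize, fix_underprivileged)
    synth = syntheticize(out, subset)        # shrink to small synthetic DB
    return fix_underprivileged(D, synth, queries)  # patch bad queries

# Trivial stub helpers just to exercise the control flow:
out = recursive_sanitize([1, 2, 3], list(range(16)), depth=2, f=4,
                         rng=random.Random(0),
                         brute_force=lambda D, q: D,
                         syntheticize=lambda o, q: o,
                         fix_underprivileged=lambda D, s, q: s)
print(out)
```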
And Now… Bad News

Runtime cannot be sub-poly in both |C| and |U|:
• Output is a synthetic DB (as in the positive result)
• General output

The Exponential Mechanism cannot be implemented efficiently.

Want hardness… got crypto?
The Bad News

For large C and U we can't get efficient sanitizers!
• Output is a synthetic DB (as in the positive result)
• General output

The Exponential Mechanism cannot be implemented efficiently.

Want hardness… got crypto?
Digital Signatures

Digital signature scheme with key pair (sk, vk); can be built from any one-way function [NaYu, Ro].

[Figure: pairs (m_1, sig(m_1)), (m_2, sig(m_2)), …, (m_n, sig(m_n)) are valid signatures under vk; it is hard to forge a new valid pair (m′, sig(m′))]
Signatures ⇒ No Synthetic DB

Universe: pairs (m, s) of message and signature.
Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk.

[Figure: the sanitizer gets n valid message/signature pairs under vk and outputs pairs (m′_1, s_1), …, (m′_k, s_k)]

Utility: most output pairs are valid signatures under the same vk. By unforgeability, those pairs must appear in the input, so there is no privacy!
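A toy illustration of this query family (an assumption: an HMAC tag stands in for the "signature"; the actual argument requires an unforgeable public-key signature scheme, which HMAC is not):

```python
# Toy stand-in for the hardness construction: universe items are
# (message, tag) pairs, and the query c_vk asks whether the tag verifies.
# HMAC is symmetric and only illustrates the shape of the predicate.
import hmac, hashlib

def sign(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()

def c_vk(key, item):
    """The counting-query predicate: 1 iff the pair verifies under key."""
    msg, tag = item
    return 1 if hmac.compare_digest(sign(key, msg), tag) else 0

key = b"verification-key-stand-in"
db = [(m, sign(key, m)) for m in [b"alice", b"bob", b"carol"]]
print(sum(c_vk(key, item) for item in db))  # every DB row satisfies c_vk
forged = (b"mallory", b"\x00" * 32)
print(c_vk(key, forged))                    # an invalid pair fails the query
```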
Can We Output a Synthetic DB Efficiently?

[Figure: the 2×2 grid of runtime regimes, sub-poly vs. poly in |C| and in |U|, with the open cells marked "?"]
Where is the Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries
Hardness on Average

Universe: triples (vk, m, s) of key, message, signature.
Queries: c_i(vk, m, s): the i-th bit of ECC(vk)
         c_v(vk, m, s): 1 iff s is a valid signature under vk

[Figure: the sanitizer gets n triples (m_j, sig(m_j), vk), all valid signatures under the same vk, and outputs triples (m′_j, s_j, vk′_j)]

Are these output keys related to vk? Yes! At least one is vk!
Hardness on Average

Samples: triples (vk, m, s) of key, message, signature.
Queries: c_i(vk, m, s): the i-th bit of ECC(vk)
         c_v(vk, m, s): 1 iff s is a valid signature under vk

[Figure: sanitizer outputs pairs (m′_1, s_1), …, (m′_k, s_k) with keys vk′_1, …, vk′_k]

∀i, 3/4 of the vk′_j agree with ECC(vk)[i]
⇒ ∃ vk′_j s.t. ECC(vk′_j) and ECC(vk) are 3/4-close
⇒ vk′_j = vk (by the error-correcting code)
⇒ m′_j appears in the input. No privacy!
Can We Output a Synthetic DB Efficiently?

[Figure: the 2×2 runtime grid again, now annotated: hardness from signatures, and hardness on average using PRFs]
General Output Sanitizers

Theorem: traitor-tracing schemes exist if and only if sanitizing is hard. There is a tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.

The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.
Traitor Tracing: The Problem
• A center transmits a message to a large group
• Some users leak their keys to pirates
• The pirates construct a clone: an unauthorized decryption device
• Given a pirate box, we want to find who leaked the keys

[Figure: the center broadcasts E(Content); a pirate box built from keys K1, K3, K8 decrypts the content]

The traitors' "privacy" is violated!
Equivalence of TT and Hardness of Sanitizing

Traitor tracing ↔ sanitizing is hard:
• key ↔ database entry
• ciphertext ↔ query
• TT pirate (using a collection of keys) ↔ sanitizer for a distribution of DBs (a collection of entries)
Traitor Tracing ⇒ Hard Sanitizing

Theorem: If there exists a TT scheme with
– ciphertext length c(n), and
– key length k(n),
we can construct:
1. a query set C of size ≈ 2^{c(n)},
2. a data universe U of size ≈ 2^{k(n)}, and
3. a distribution D on n-user databases with entries from U
such that D is "hard to sanitize": there exists a tracer that can extract an entry of D from any sanitizer's output, violating its privacy!

The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.