privacy preserving similarity detection for data analysisleontiad/publications/csar2013.pdf ·...
TRANSCRIPT
![Page 1: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/1.jpg)
Privacy preserving similarity detection for data analysis
Iraklis Leontiadis1 Melek Önen1 Refik Molva1 M.J. Chorley2 G.B. Colombo2
CSAR 2013
1Eurecom - France 2Cardiff - UK
![Page 2: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/2.jpg)
Privacy vs Utility
Data A1,A2,A3,…An
Data B1,B2,B3,… Bn
Clustering
Similarity
Privacy preserving similarity detection for data analysis 2
.
.
.
? ? ? ? ? ?
Personality test
![Page 3: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/3.jpg)
Naïve solutions
• Encrypt data with standard crypto – Renders operations infeasible.
• Data separation – Vertical separation is not always applicable.
• Anonymizing techniques – Don’t protect individuals data.
Privacy preserving similarity detection for data analysis 3
![Page 4: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/4.jpg)
Our Approach
• Combine crypto with data processing
User Data Data analysis
Alice 𝐴1′, …𝐴𝐴′ 𝐹(𝐴1′, …𝐴𝐴′)
Bob 𝐵1′, …𝐵𝐴′
𝐹(𝐵1′, …𝐵𝐴′)
𝐹(𝐴1, …𝐴𝐴) = 𝐹(𝐴1′, …𝐴𝐴′)
Data A’1,A’2,A’3,…A’n
Data B’1,B’2,B’3,… B’n
Privacy preserving similarity detection for data analysis 4
.
.
.
![Page 5: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/5.jpg)
Outline
• Our solution – Cosine similarity – Privacy with Geometrical Transformations
• Security Analysis • Performance Evaluation
– Hierarchical clustering – Results
• Looking Ahead
Privacy preserving similarity detection for data analysis 5
![Page 6: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/6.jpg)
Cosine similarity
A
B θo
1 1 w1 w2
…
w4
w3
wn
F1
Dictionary
F2
“Next CSAR workshop will be held in Karlsruhe”
“Next CSAR workshop will be held in London”
A= 1 1
1 1 0 1 1
1 1
1 1 0 1
1
1
1
1
1
B=
…
…
…
…
Privacy preserving similarity detection for data analysis 6
![Page 7: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/7.jpg)
Random Scaling
• Data encoded as unique vectors in ℝ𝐴
• φr:ℝ𝐴 → ℝ𝐴 s.t:
cos a, b = cos φr1(a),φr2(b)
• Random scaling
– r ⟵ℝ𝑛
– S(r, A) = r ∙ A =r ⋯⋮ 𝑟 ⋮
⋯ 𝑟∙ A
Privacy preserving similarity detection for data analysis 7
θo θo
![Page 8: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/8.jpg)
Vector Rotation
• Rotation by a common angle λ°
– R λ° a = a ∙cos (λ°) ⋯ sin (λ°)
⋮ ⋮−sin (λ°) ⋯ cos (λ°)
• φr = a ∙R λ° a ∙ 𝑆𝑟(a)
F1’
F2’
θo
F1
F2
Privacy preserving similarity detection for data analysis 8
![Page 9: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/9.jpg)
Our solution
Privacy preserving similarity detection for data analysis 9
Dimension reduction
Random Scaling
A S(r1, A1) = A1
A2
A3
S(r2, A2) =
S(r3, A3) =
r1 ∙
r2 ∙
r3 ∙
Rotation
R λ° r1 ∙ A1 =
R λ° r2 ∙ A2 =
R λ° r3 ∙ A3 =
R λ° ∙ r1 ∙
R λ° ∙ r2 ∙
R λ° ∙ r3 ∙
![Page 10: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/10.jpg)
Security analysis
𝑉′1 = R λ° (S r1,𝑑1,𝑑2 , S r2, 𝑑3,𝑑4 , S r3,𝑑1𝑑5 )
Privacy preserving similarity detection for data analysis 10
• Internal:
– Rotation angle is known.
• External:
– Rotation angle remains unknown.
![Page 11: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/11.jpg)
Security analysis cont’d
Privacy preserving similarity detection for data analysis 11
Per user equivalent coefficient are exposured as auxiliary information
∙𝐜𝐨𝐨 (𝝀𝝀) ⋯ 𝐨𝐬𝐬 (𝝀𝝀)
⋮ ⋮−𝐨𝐬𝐬 (𝝀𝝀) ⋯ 𝐜𝐨𝐨 (𝝀𝝀)
?
∙ r1
∙ r2
∙ r3
∙ r1
∙ r2
∙ r3
![Page 12: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/12.jpg)
Evaluation
• 173 users willing to run 4sqPersonality test • 5 factor personality test
– Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
Privacy preserving similarity detection for data analysis 12
![Page 13: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/13.jpg)
Clustering approach
• Hierarchical Agglomerative clustering (HAC) – Input: n points and N*N similarity matrix – Output: Single cluster containing all n points C=MakeSingletonClusters(); for i=0 to i=n: Find “closest” clusters c1,c2; Merge(c1,c2); RecomputeDistances(C); if #C=1 exit();
Agglomerative: O(n3) Divisible: O(2n)
Privacy preserving similarity detection for data analysis 13
Cosine Similarity
![Page 14: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/14.jpg)
Results
![Page 15: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/15.jpg)
Recap
1. Pairwise cosine similarity for multidimensional vectors.
2. Geometrical transformations compatible with cosine similarity.
Privacy preserving similarity detection for data analysis 15
![Page 16: Privacy preserving similarity detection for data analysisleontiad/publications/CSAR2013.pdf · Cosine Similarity . Single-linkage vs complete linkage:\爀䤀渀ꀀ猀椀渀最氀攀ⴀ氀椀渀欀](https://reader036.vdocuments.net/reader036/viewer/2022081613/5fbc9387e23fba095d031ea9/html5/thumbnails/16.jpg)
Looking Ahead
• Other privacy preserving similarity detection algorithms.
• Privacy preserving data analysis algorithms: – MAX,MIN
Thank you! Iraklis Leontiadis
Privacy preserving similarity detection for data analysis 16