Privacy Preserving K-means Clustering on Vertically Partitioned Data
Presented by: Jaideep Vaidya
Joint work: Prof. Chris Clifton
Overview
• Global Problem
  – Privacy Preserving Distributed Data Mining
• Specific Problem
  – Clustering (K-Means)
• For
  – Vertically Partitioned Data
• Using
  – Cryptographic Tools
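Vertical Partitioning of Data

Global Database View: TID, Brain Tumor?, Diabetes?, Model, Battery

Medical Records

| TID | Brain Tumor? | Diabetes? |
|-----|--------------|-----------|
| RPJ | Yes          | Diabetic  |
| CAC | No Tumor     | No        |
| PTR | No Tumor     | Diabetic  |

Cell Phone Data

| TID | Model | Battery |
|-----|-------|---------|
| RPJ | 5210  | Li/Ion  |
| CAC | none  | none    |
| PTR | 3650  | NiCd    |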
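As a small illustration of vertical partitioning, the sketch below models the two sites' tables as Python dictionaries keyed by TID and joins them into the global view. The names `medical_records`, `cell_phone_data` and `global_view` are assumptions made for this example. In the privacy-preserving setting, this plain join is precisely what the sites are not willing to perform.

```python
# Each site holds different attributes for the same entities (same TIDs).
medical_records = {
    "RPJ": {"brain_tumor": "Yes", "diabetes": "Diabetic"},
    "CAC": {"brain_tumor": "No Tumor", "diabetes": "No"},
    "PTR": {"brain_tumor": "No Tumor", "diabetes": "Diabetic"},
}
cell_phone_data = {
    "RPJ": {"model": "5210", "battery": "Li/Ion"},
    "CAC": {"model": "none", "battery": "none"},
    "PTR": {"model": "3650", "battery": "NiCd"},
}


def global_view(*partitions):
    """Join vertical partitions on TID (hypothetical helper, illustration only)."""
    shared_tids = set.intersection(*(set(p) for p in partitions))
    view = {}
    for tid in sorted(shared_tids):
        row = {}
        for partition in partitions:
            row.update(partition[tid])  # attributes from each site, side by side
        view[tid] = row
    return view


view = global_view(medical_records, cell_phone_data)
print(view["RPJ"])
```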
Is the problem trivial?
Privacy Preserving Data Mining
• Perturbation
  – Agrawal & Srikant, Agrawal & Aggarwal
  – Rizvi & Haritsa, Evfimievski et al.
• Cryptographic
  – Lindell & Pinkas, Du & Zhan
  – Vaidya & Clifton, Kantarcioglu & Clifton
Secure Multiparty Computation (SMC)
• Given a function $f$ and $n$ inputs $x_1, x_2, \ldots, x_n$, distributed at $n$ sites, compute the result $y = f(x_1, x_2, \ldots, x_n)$ while revealing nothing to any site except its own input(s) and the result.
Results
• Cluster assignment for entities
  – Not private
• Cluster centers
  – Semi-private
[Figure: example values 2.3, 34, 19, 15.5, 5210, Li/Ion, Piezo]
Secure K-means clustering
Arbitrarily select k starting points $\mu'_1, \mu'_2, \ldots, \mu'_k$
Repeat
  – Assign $\mu'_1, \mu'_2, \ldots, \mu'_k$ to $\mu_1, \mu_2, \ldots, \mu_k$ respectively
  – (Re)assign each object to the closest cluster based on distance from the mean
  – Re-compute the cluster means $\mu'_1, \mu'_2, \ldots, \mu'_k$
Until no change
K-means clustering
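For reference, here is a minimal non-secure sketch of the k-means loop on a single site, assuming numeric tuples as points and squared Euclidean distance. The function names and the `seed` and `max_iter` parameters are illustrative choices, not part of the presented protocol.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain (non-private) k-means over a list of numeric tuples."""
    rng = random.Random(seed)
    new_means = rng.sample(points, k)  # arbitrarily select k starting points (mu')
    labels = []
    for _ in range(max_iter):
        means = new_means  # assign mu' to mu
        # (Re)assign each object to the closest cluster.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, means[i]))
            for p in points
        ]
        # Re-compute the cluster means (mu').
        new_means = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_means.append(means[i])  # keep the old mean for an empty cluster
        if new_means == means:  # until no change
            break
    return new_means, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(points, k=2)
print(labels)
```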
Assigning objects to closest cluster
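For every object/entity, with sites $P_1, P_2, \ldots, P_r$ and clusters $i = 1, \ldots, k$, where $x_{ij}$ is the distance component for cluster $i$ held at site $P_j$:

Compute $\arg\min_{i = 1 \ldots k} \sum_{j = 1}^{r} x_{ij}$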
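The sketch below shows why the closest cluster can be found from per-site components: each site computes a distance component for every cluster from its own attributes, and the closest cluster minimizes the sum of those components. It assumes squared Euclidean distance and adds the components in the clear, so it is not private. The helper names are hypothetical.

```python
def local_components(local_attrs, local_means):
    """Site j's vector (x_1j, ..., x_kj), computed from its own attributes only."""
    return [
        sum((a - m) ** 2 for a, m in zip(local_attrs, mean))
        for mean in local_means
    ]


def closest_cluster(component_vectors):
    """argmin over clusters i of the sum over sites j of x_ij (done in the clear)."""
    k = len(component_vectors[0])
    totals = [sum(vec[i] for vec in component_vectors) for i in range(k)]
    return min(range(k), key=totals.__getitem__)


# One object, r = 2 sites, k = 3 clusters.
site1_attrs = (1.0, 2.0)
site1_means = [(0.0, 0.0), (1.0, 2.0), (5.0, 5.0)]
site2_attrs = (3.0,)
site2_means = [(3.0,), (9.0,), (3.0,)]

X1 = local_components(site1_attrs, site1_means)  # [5.0, 0.0, 25.0]
X2 = local_components(site2_attrs, site2_means)  # [0.0, 36.0, 0.0]
closest = closest_cluster([X1, X2])              # totals are [5.0, 36.0, 25.0]
print(closest)
```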
Key Idea
• Disguise site components with random values
• Compare distances while revealing only comparison result
• Permute order of clusters to conceal meaning of comparison results
Closest Cluster Computation
• 3 special sites: P1, P2 and Pr
• P1 generates
  – r random vectors $V_i$ such that $\sum_{i=1}^{r} V_i = 0$
  – Permutation $\pi$ (over 1 .. k)
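A minimal sketch of what P1 generates, assuming integer masks drawn from a bounded range (the `bound` parameter and the function names are assumptions for this example): the first r - 1 vectors are random and the last one is chosen so that every coordinate sums to zero.

```python
import random


def zero_sum_vectors(r, k, rng, bound=1000):
    """Return r integer vectors of length k whose element-wise sum is zero (r >= 2)."""
    vectors = [[rng.randint(-bound, bound) for _ in range(k)] for _ in range(r - 1)]
    # The last vector cancels the others coordinate by coordinate.
    vectors.append([-sum(column) for column in zip(*vectors)])
    return vectors


def random_permutation(k, rng):
    """Return a random permutation of the indices 0 .. k-1."""
    perm = list(range(k))
    rng.shuffle(perm)
    return perm


rng = random.Random(1)
vectors = zero_sum_vectors(r=4, k=3, rng=rng)
perm = random_permutation(3, rng)
print([sum(column) for column in zip(*vectors)])  # every coordinate sums to 0
```

Because the masks cancel only when all r disguised vectors are added, any single disguised vector reveals nothing useful about a site's components on its own.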
Permutation Protocol (Du and Atallah ’01)
  – A holds $\pi$ and $V$; B holds $X$
  – B → A: $E(X)$, $E$
  – A → B: $E(\pi(X + V))$
  – B decrypts to obtain $\pi(X + V)$
Homomorphic encryption: $E_k(x) * E_k(y) = E_k(x + y)$
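The sketch below walks through the permutation protocol with a toy additively homomorphic scheme. It assumes Paillier encryption with tiny hard-coded primes, which is one scheme with the property $E_k(x) * E_k(y) = E_k(x + y)$; the slides do not name the scheme, the key size is far too small for real use, and all function names are illustrative. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with g = n + 1 (insecure key size, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m, rng):
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % (n * n)


def permutation_protocol(X, V, perm, rng):
    # B generates a key pair, encrypts X, and sends E(X) with the public key.
    n, private_key = keygen()
    encrypted_x = [encrypt(n, x, rng) for x in X]
    # A computes E(X + V) homomorphically, then permutes the ciphertexts.
    encrypted_sum = [
        add_encrypted(n, c, encrypt(n, v % n, rng)) for c, v in zip(encrypted_x, V)
    ]
    permuted = [encrypted_sum[perm[i]] for i in range(len(perm))]
    # B decrypts and learns only the permuted, disguised vector (values mod n).
    return [decrypt(n, private_key, c) for c in permuted]


result = permutation_protocol(
    X=[5, 12, 7], V=[100, 200, 300], perm=[2, 0, 1], rng=random.Random(0)
)
print(result)
```

In this sketch, position `i` of the output holds element `perm[i]` of X + V; that indexing convention is a choice made here, not something the slides specify.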
Closest Cluster Computation
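Stage 1
  – P1 holds $\pi$ and the vectors $V_i$
  – P2 → P1: $E_2(X_2)$, $E_2$
  – P1 → P2: $E_2(\pi(X_2 + V_2))$
  – Pr → P1: $E_r(X_r)$, $E_r$
  – P1 → Pr: $E_r(\pi(X_r + V_r))$
Stage 2
  – P1, P3, ..., Pr-1 → Pr: $\pi(X_1 + V_1)$, $\pi(X_3 + V_3)$, ..., $\pi(X_{r-1} + V_{r-1})$
  – Pr: $\sum_{i \neq 2} \pi(X_i + V_i)$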
Closest Cluster Computation
• Stage 3
  – P2 and Pr determine $i$, the index of the cluster with minimum distance
• Stage 4
  – P1 computes and broadcasts $\pi^{-1}(i)$
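To make the data flow of the four stages concrete, here is a plaintext simulation under strong simplifying assumptions: encryption is omitted, and the Stage 3 comparison between P2 and Pr is done by adding their two vectors in the clear, which the actual protocol must avoid. The function name and the mask range are hypothetical; at least three sites are assumed, with index 0 standing for P1, index 1 for P2 and the last index for Pr.

```python
import random


def closest_cluster_simulation(components, rng):
    """components[j][i] is site j's distance component for cluster i (r >= 3 sites)."""
    r, k = len(components), len(components[0])
    # P1 generates r masks that sum to zero and a secret permutation.
    masks = [[rng.randint(-1000, 1000) for _ in range(k)] for _ in range(r - 1)]
    masks.append([-sum(column) for column in zip(*masks)])
    perm = list(range(k))
    rng.shuffle(perm)
    # Stage 1: every site ends up with its permuted, disguised vector.
    disguised = [
        [X[perm[i]] + V[perm[i]] for i in range(k)]
        for X, V in zip(components, masks)
    ]
    # Stage 2: all sites except P2 contribute to the sum held by Pr.
    y = [sum(d[i] for j, d in enumerate(disguised) if j != 1) for i in range(k)]
    # Stage 3: P2 and Pr find the permuted index of the minimum total.
    totals = [a + b for a, b in zip(y, disguised[1])]
    permuted_index = min(range(k), key=totals.__getitem__)
    # Stage 4: P1 undoes the permutation; position i came from cluster perm[i].
    return perm[permuted_index]


components = [[5, 0, 25], [0, 36, 0], [1, 1, 1]]  # true totals: 6, 37, 26
winner = closest_cluster_simulation(components, random.Random(7))
print(winner)
```

The masks cancel in the totals, so the minimum is taken over the true summed distances, while the permutation hides which cluster the winning position refers to from everyone but P1.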
When to stop?
• Locally compute difference in means
• Globally known threshold
• Use simple random-adding technique to disguise actual values
  – First party adds a random value to its distance and sends to next party
  – Each party adds its value to the total and sends it on
  – Last party compares with first party’s random value + threshold
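The random-adding check can be sketched as follows. This assumes that "stop" means the summed difference in means is at or below the threshold, and it performs the final comparison in the clear, whereas the last and first parties would need to compare without revealing the random value. The function name and mask range are illustrative.

```python
import random


def should_stop(local_differences, threshold, rng):
    """Ring-style masked sum of each party's local difference in means."""
    mask = rng.randrange(1, 10**6)            # random value known only to the first party
    running = local_differences[0] + mask     # first party disguises its value
    for difference in local_differences[1:]:
        running += difference                 # each party adds its value and passes it on
    # Last party compares the masked total with the first party's mask + threshold.
    return running <= mask + threshold


rng = random.Random(42)
print(should_stop([1, 2, 0], threshold=5, rng=rng))
print(should_stop([4, 3], threshold=5, rng=rng))
```

Intermediate parties only ever see a running total offset by the unknown mask, so they cannot read off the partial sums of the parties before them.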
Communication Cost
• r parties, n data elements, m-bit distances

| Method              | Bits       | Rounds |
|---------------------|------------|--------|
| Basic Algorithm     | O(knr)     | O(r+k) |
| Optimized Algorithm | O(kmr)     | O(r)   |
| Generic Method      | O(kmnr^3)  | 1      |
| Non-Secure Method   | O(n)       | 1      |
Conclusion
• Presented a solution for the Privacy Preserving K-Means Clustering problem
• How to use clusters?
• Will parties share required information for the possible benefits?
• Improve Efficiency
• Working on EM-Clustering, implementations