privacy leak in egonets - sharad · kumar sharad, computer lab, university of cambridge george...

1
Privacy Leak in Egonets Kumar Sharad, Computer Lab, University of Cambridge George Danezis, Microsoft Research Cambridge http://www.cl.cam.ac.uk/ ~ ks640/ [email protected] Abstract A major Telecom released anonymized communication sub-graphs of a few thousand randomly selected subscribers for scientific analysis. To obfuscate the interactions between the sub-graphs the common nodes were renamed. However, this is not enough to preserve privacy. Extracted Sub-graphs The sub-graphs tend to have nodes in common which can be re-identified by analysing the graph topology. Recovering the full communication graph could lead to severe privacy breaches. Re-identification By observing the neighbourhood of nodes we can assign a match probability to pairs of nodes across ego-nets. The nodes at a distance of 1-hop from the ego are easier to re-identify. After we have a match score between all possible pairs across two ego-nets, an optimal matching can be found. Results I Almost all the common 1-hop nodes were re-identified with over 98% success probability. I A significant portion of the 2-hop nodes were re-identified with over 75% (often over 90%) success probability. Conclusion I It is hard to release high dimensional data whilst preserving privacy. I Anonymizing real world data does not work in general.

Upload: others

Post on 21-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Privacy Leak in Egonets - Sharad · Kumar Sharad, Computer Lab, University of Cambridge George Danezis, Microsoft Research Cambridge Created Date 20131217134244Z

Privacy Leak in EgonetsKumar Sharad, Computer Lab, University of Cambridge

George Danezis, Microsoft Research Cambridge

http://www.cl.cam.ac.uk/~ks640/ [email protected]

Abstract

A major Telecom released anonymizedcommunication sub-graphs of a fewthousand randomly selectedsubscribers for scientific analysis. Toobfuscate the interactions between thesub-graphs the common nodes wererenamed. However, this is not enoughto preserve privacy.

Extracted Sub-graphs

The sub-graphs tend to have nodes incommon which can be re-identified byanalysing the graph topology.

Recovering the full communicationgraph could lead to severe privacybreaches.

Re-identification

By observing the neighbourhood ofnodes we can assign a matchprobability to pairs of nodes acrossego-nets. The nodes at a distance of1-hop from the ego are easier tore-identify.

After we have a match score betweenall possible pairs across two ego-nets,an optimal matching can be found.

Results

I Almost all the common 1-hop nodeswere re-identified with over 98%success probability.

I A significant portion of the 2-hopnodes were re-identified with over 75%(often over 90%) success probability.

Conclusion

I It is hard to release high dimensionaldata whilst preserving privacy.

I Anonymizing real world data does notwork in general.