privacy-preserving anonymization of set value data manolis terrovitis, nikos mamoulis university of...
Post on 23-Dec-2015
Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis (University of Hong Kong)
Panos Kalnis (National University of Singapore, www.comp.nus.edu.sg/~kalnis)
2
Motivation
Attacker can see up to m items of a transaction
Any m items can be known; there is no fixed quasi-identifier
No distinction between sensitive and non-sensitive items
[Figure: Helen and her transaction: Beer, 0% Milk, Pregnancy test]
3
Motivation (cont.)
Database
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
…
Mary: Wine, Beer, Full-fat Milk
Published
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
…
tn: Wine, Beer, Full-fat Milk
Attacker: find all transactions that contain Beer & 0% Milk
Published after generalization (0% Milk, 2% Milk and Full-fat Milk all become Milk)
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
…
tn: Wine, Beer, Milk
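A minimal Python sketch of the attacker's query, using only the four transactions visible on the slide (the elided "…" rows are left out). The mapping `GENERALIZE` and the helper names are hypothetical; the sketch only shows how generalizing the milk variants changes what the query can single out.

```python
# Toy data copied from the slide; the elided transactions ("…") are omitted.
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Hypothetical mapping: every milk variant is published as the generic item "Milk".
GENERALIZE = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}


def generalize(db):
    """Replace each item by its generalization (items without one are kept)."""
    return {tid: {GENERALIZE.get(item, item) for item in items} for tid, items in db.items()}


def matching(db, query):
    """Ids of the transactions that contain every item of the query."""
    return sorted(tid for tid, items in db.items() if query <= items)


print(matching(published, {"Beer", "0% Milk"}))           # ['t1']
print(matching(generalize(published), {"Beer", "Milk"}))  # ['t1', 'tn']
```

On the raw published data the query isolates t1; after generalization the attacker can no longer ask for 0% Milk, and the closest query matches two transactions.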
4
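k^m-anonymity
Set of items: $I = \{o_1, o_2, \ldots, o_{|I|}\}$
Transaction: $t \subseteq I$
Database: $D = \{t_1, t_2, \ldots, t_{|D|}\}$
Query terms: $qs \subseteq I$, $|qs| \le m$
Query result: $res = \{t \mid t \in D,\ qs \subseteq t\}$
k^m-anonymity: $|res| = 0$ or $|res| \ge k$ for every query $qs$ with $|qs| \le m$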
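A brute-force Python sketch of the k^m-anonymity condition: it computes |res| for every itemset of at most m items that occurs in D, because itemsets that never occur satisfy |res| = 0 automatically. Function names are hypothetical, items are assumed to be mutually comparable (e.g. strings), and the enumeration grows quickly with m, so this is only meant for small examples.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(db, m):
    """|res| for every itemset of 1..m items that occurs in at least one transaction."""
    counts = Counter()
    for t in db:
        items = sorted(set(t))  # assumes comparable items, e.g. strings
        for size in range(1, m + 1):
            counts.update(combinations(items, size))
    return counts


def is_km_anonymous(db, k, m):
    """True if every query of at most m items matches 0 or at least k transactions."""
    # Itemsets that never occur are absent from the Counter: |res| = 0 is allowed.
    return all(count >= k for count in itemset_supports(db, m).values())


db = [{"a1", "b1"}, {"a1", "b1"}, {"a1", "b2"}]
print(is_km_anonymous(db, k=2, m=1))  # False: {b2} matches a single transaction
```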
5
Related Work: k-Anonymity [Swe02]
(a) Microdata
Age    ZipCode       Disease
42     25000         Flu
46     35000         AIDS
50     20000         Cancer
54     40000         Gastritis
48     50000         Dyspepsia
56     55000         Bronchitis
Quasi-identifier: Age, ZipCode
(b) 2-anonymous microdata
Age    ZipCode       Disease
42-46  25000-35000   Flu
42-46  25000-35000   AIDS
50-54  20000-40000   Cancer
50-54  20000-40000   Gastritis
48-56  50000-55000   Dyspepsia
48-56  50000-55000   Bronchitis
Not suitable for high-dimensional data
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
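A small Python sketch that checks the two tables of this slide: a table is k-anonymous when every combination of quasi-identifier values (here Age and ZipCode) is shared by at least k rows. The constant and function names are illustrative, not from the slide.

```python
from collections import Counter

# (a) Microdata and (b) its 2-anonymous version, copied from the slide.
MICRODATA = [
    (42, 25000, "Flu"), (46, 35000, "AIDS"), (50, 20000, "Cancer"),
    (54, 40000, "Gastritis"), (48, 50000, "Dyspepsia"), (56, 55000, "Bronchitis"),
]
ANONYMIZED = [
    ("42-46", "25000-35000", "Flu"), ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"), ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"), ("48-56", "50000-55000", "Bronchitis"),
]
QUASI_IDENTIFIER = (0, 1)  # column positions of Age and ZipCode


def is_k_anonymous(rows, k, qi=QUASI_IDENTIFIER):
    """Every combination of quasi-identifier values must be shared by at least k rows."""
    groups = Counter(tuple(row[i] for i in qi) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(MICRODATA, 2))   # False
print(is_k_anonymous(ANONYMIZED, 2))  # True
```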
6
Related Work: L-diversity in Transactions
[GTK08] G. Ghinita, Y. Tao, P. Kalnis, “On the Anonymization of Sparse High-Dimensional Data”, ICDE, 2008
Requires knowing in advance which items are sensitive and which are non-sensitive
7
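Our Approach: Employs Generalization
Generalization hierarchy, e.g. $A = \{a_1, a_2\}$: the generalized item A replaces a1 and a2
Information loss:
$$NCP(p) = \begin{cases} 0, & p \text{ is a leaf node} \\ |u_p| / |I|, & \text{otherwise} \end{cases}$$
where $u_p$ is the set of leaf items under node $p$
Example: k = 2, m = 2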
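A short Python sketch of the NCP (normalized certainty penalty) of a hierarchy node. The two-level hierarchy with root `ALL` over A = {a1, a2} and B = {b1, b2} is an assumption made for illustration, as are the function names.

```python
# Hypothetical hierarchy: ALL -> {A, B}, A -> {a1, a2}, B -> {b1, b2}.
HIERARCHY = {"ALL": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}


def leaves_under(node):
    """u_p: the set of leaf items in the subtree rooted at node."""
    children = HIERARCHY.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))


ALL_ITEMS = leaves_under("ALL")  # I, the set of original items


def ncp(node):
    """0 for a leaf item, |u_p| / |I| for a generalized item."""
    if node not in HIERARCHY:
        return 0.0
    return len(leaves_under(node)) / len(ALL_ITEMS)


for node in ("a1", "A", "ALL"):
    print(node, ncp(node))  # a1 0.0, then A 0.5, then ALL 1.0
```

Generalizing to a node that covers more leaves costs more, and replacing everything by the root costs the maximum of 1.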
8
Lattice of Generalizations
9
Count Tree
[Figure: count tree for example transactions over items a1, a2, b1, b2 and their generalizations A, B, with a support count at each node]
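A Python sketch of one plausible count tree: each transaction is extended with the generalizations of its items, and every sorted combination of up to m items is inserted into a prefix tree whose nodes keep a support counter. The hierarchy `PARENT`, the class and method names, and the choice to skip combinations that hold an item together with its own generalization are assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical hierarchy (child -> parent); A and B are the generalized items.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def ancestors(item):
    """All generalizations of an item, walking up the hierarchy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


class CountTree:
    """Prefix tree over sorted itemsets; each node stores the support of its path."""

    def __init__(self, m):
        self.m = m
        self.root = {}  # item -> [support, children]

    def add(self, transaction):
        # Extend the transaction with the generalizations of its items.
        extended = set(transaction)
        for item in transaction:
            extended |= ancestors(item)
        items = sorted(extended)
        for size in range(1, self.m + 1):
            for combo in combinations(items, size):
                # Skip combinations holding an item together with its own generalization.
                if any(a in ancestors(b) or b in ancestors(a)
                       for a, b in combinations(combo, 2)):
                    continue
                node = self.root
                for prefix_item in combo[:-1]:
                    node = node.setdefault(prefix_item, [0, {}])[1]
                node.setdefault(combo[-1], [0, {}])[0] += 1

    def support(self, itemset):
        """Number of added transactions that contain the itemset (0 if absent)."""
        node, entry = self.root, None
        for item in sorted(itemset):
            entry = node.get(item)
            if entry is None:
                return 0
            node = entry[1]
        return entry[0] if entry else 0


tree = CountTree(m=2)
tree.add({"a1", "b1"})
tree.add({"a2", "b1"})
print(tree.support({"A", "b1"}))   # 2
print(tree.support({"a1", "b1"}))  # 1
```

Sharing prefixes lets combinations such as (A, b1) and (A, B) reuse the path for A, which keeps the support counts of many overlapping itemsets compact.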
10
Optimal Algorithm
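The slide does not spell out the optimal algorithm. As a stand-in, here is a brute-force Python sketch: it enumerates every cut of a generalization hierarchy (the candidates in the lattice of generalizations), keeps the cuts whose generalized database is k^m-anonymous, and returns the one with the least information loss. The hierarchy, the occurrence-weighted NCP average used as the cost, and all names are assumptions; this is not the authors' algorithm, although its exhaustive search hints at why an optimal solution is slow.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical hierarchy, assumed for illustration.
HIERARCHY = {"ALL": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}


def leaves(node):
    children = HIERARCHY.get(node)
    return {node} if not children else set().union(*(leaves(c) for c in children))


N_ITEMS = len(leaves("ALL"))


def cuts(node="ALL"):
    """Every cut of the subtree at node: keep node itself, or combine one cut per child."""
    result = [frozenset([node])]
    children = HIERARCHY.get(node)
    if children:
        for parts in product(*(cuts(c) for c in children)):
            result.append(frozenset().union(*parts))
    return result


def cut_node(item, cut):
    """The node of the cut that replaces a leaf item."""
    return next(n for n in cut if item in leaves(n))


def is_km_anonymous(db, k, m):
    counts = Counter()
    for t in db:
        for size in range(1, m + 1):
            counts.update(combinations(sorted(t), size))
    return all(c >= k for c in counts.values())


def information_loss(db, cut):
    """Average NCP over all original item occurrences (an assumed aggregate)."""
    losses = [0.0 if n not in HIERARCHY else len(leaves(n)) / N_ITEMS
              for t in db for n in (cut_node(i, cut) for i in t)]
    return sum(losses) / len(losses)


def optimal_cut(db, k, m):
    """Exhaustively test every cut; return (loss, cut) of the cheapest k^m-anonymous one."""
    best = None
    for cut in cuts():
        generalized = [frozenset(cut_node(i, cut) for i in t) for t in db]
        if is_km_anonymous(generalized, k, m):
            loss = information_loss(db, cut)
            if best is None or loss < best[0]:
                best = (loss, cut)
    return best


db = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
loss, cut = optimal_cut(db, k=2, m=2)
print(loss, sorted(cut))  # 0.25 ['A', 'b1', 'b2']
```

Here the cuts {A, b1, b2} and {a1, a2, B} tie at 0.25; the loop keeps the first one it meets.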
11
“Direct” Anonymization
COUNT({a1,a2})=1
Solves each “problem” (an itemset of at most m items whose support is below k) independently
12
“Apriori-based” Anonymization
Construct the count-tree incrementally
Prune unnecessary branches
13
Small Datasets (2-15K, BMS-WebView2)
|I|=40..60, k=100, m=3
14
Small Datasets (BMS-WebView2)
|D|=10K, k=100, m=1..4
15
Apriori Anonymization for Large Datasets
[Figure: running times, with labels 500 sec, 10 sec and 100 sec]
|D|     |I|
515K    1657
59K     497
77K     3340
k = 5, m = 3
16
Points to Remember
Anonymization of Transactional Data
Attacker knows m items
Any m items can be the quasi-identifier
Global recoding method
Optimal solution: too slow
Apriori Anonymization: fast and low information loss
On-going work
Local recoding (sort by Gray order and partition)
Transactional data in streaming environments
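The slide only names the local-recoding idea, “sort by Gray order and partition”. A Python sketch of one plausible reading, under stated assumptions: each transaction is encoded as a bit vector over a fixed, hypothetical item order, transactions are sorted by their position in binary-reflected Gray code order, and the sorted list is cut into groups of at least k. The generalization applied inside each group is not shown, and all names are illustrative.

```python
# Hypothetical item order: the first item becomes the most significant bit.
ITEMS = ["a1", "a2", "b1", "b2"]


def to_bits(transaction):
    """Encode a transaction as a bit vector over ITEMS."""
    code = 0
    for item in ITEMS:
        code = (code << 1) | (item in transaction)
    return code


def gray_rank(code):
    """Position of `code` in binary-reflected Gray code order (inverse Gray code)."""
    rank, shift = code, code >> 1
    while shift:
        rank ^= shift
        shift >>= 1
    return rank


def gray_partition(db, k):
    """Sort transactions by Gray order, then cut the list into groups of at least k."""
    ordered = sorted(db, key=lambda t: gray_rank(to_bits(t)))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # fold a short tail into the previous group
    return groups


db = [{"a1"}, {"b2"}, {"a1", "b2"}, {"b1", "b2"}, {"a2"}]
for group in gray_partition(db, k=2):
    print([sorted(t) for t in group])
```

Consecutive Gray codes differ in one bit, so neighbouring transactions in this order tend to share items, which is what makes grouping them before generalization attractive.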
17
Bibliography on LBS Privacy
http://anonym.comp.nus.edu.sg