Center for Secure Information Systems
Concordia Institute for Information Systems Engineering
k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure
Wen Ming Liu1, Lingyu Wang1, and Lei Zhang2
1 Concordia University, 2 George Mason University
ICDT 2010
CIISE / CSIS, March 23, 2010
Agenda
Background
K-Jump Strategy
Data Utility Comparison
Conclusion
Agenda
Background
    Example
    Algorithm anaive and asafe
K-Jump Strategy
Data Utility Comparison
Conclusion
Example – Data Holder’s View
Name DoB Condition
Alice 1990 flu
Bob 1985 cold
Charlie 1974 cancer
David 1962 cancer
Eve 1953 headache
Fen 1941 toothache
Micro-Data Table t0
Name: identifier. DoB: quasi-identifier. Condition: sensitive attribute.
Goal: release a generalized table that satisfies 2-diversity.
Data holder’s generalization algorithm: consider generalization function g1 and then g2, in order.

DoB Condition
1980~1999 flu, cold
1960~1979 cancer, cancer
1940~1959 headache, toothache
Generalization g1(t0): 2-diversity? No, the 1960~1979 group contains only cancer.

DoB Condition
1970~1999 flu, cold, cancer
1940~1969 cancer, headache, toothache
Generalization g2(t0): 2-diversity? Yes. Released!
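The data holder's procedure on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: 2-diversity is read as "at least two distinct sensitive values in every group", and the names `generalize`, `is_2_diverse`, and `a_naive` are hypothetical helpers, not taken from the paper.

```python
# Toy micro-data table t0 from the example: (name, year of birth, condition).
T0 = [
    ("Alice", 1990, "flu"),
    ("Bob", 1985, "cold"),
    ("Charlie", 1974, "cancer"),
    ("David", 1962, "cancer"),
    ("Eve", 1953, "headache"),
    ("Fen", 1941, "toothache"),
]

# Generalization functions written as lists of inclusive DoB intervals.
G1 = [(1980, 1999), (1960, 1979), (1940, 1959)]
G2 = [(1970, 1999), (1940, 1969)]


def generalize(table, intervals):
    """Group the sensitive values by DoB interval (names are dropped)."""
    groups = {interval: [] for interval in intervals}
    for _name, dob, condition in table:
        for lo, hi in intervals:
            if lo <= dob <= hi:
                groups[(lo, hi)].append(condition)
                break
    return groups


def is_2_diverse(groups):
    """Assumed reading of 2-diversity: >= 2 distinct values in every group."""
    return all(len(set(values)) >= 2 for values in groups.values())


def a_naive(table, functions):
    """Release the first generalization that passes the check, else None."""
    for index, intervals in enumerate(functions, start=1):
        groups = generalize(table, intervals)
        if is_2_diverse(groups):
            return index, groups
    return None


released = a_naive(T0, [G1, G2])
print(released)
```

Under these assumptions g1(t0) is rejected because the 1960~1979 group holds cancer twice, and g2(t0) is the table that gets released.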
Example (cont.) – Adversary’s View
Name DoB Condition
Alice 1990
Bob 1985
Charlie 1974
David 1962
Eve 1953
Fen 1941
Public Knowledge
DoB Condition
1970~1999 flu
cold
cancer
1940~1969 cancer
headache
toothache
Released Generalization g2(t0)
Name: identifier. DoB: quasi-identifier. Condition: sensitive attribute.

Adversary: what can the adversary infer?
Name DoB Condition
Alice 1990 ???
Bob 1985 ???
Charlie 1974 ???
David 1962 ???
Eve 1953 ???
Fen 1941 ???
Unknown Micro-Data Table t0

Permutation set (rows A to F stand for Alice to Fen; col = cold, can = cancer, hac = headache, tac = toothache)
  t1 t2 t3 t4 … t35 t36
A flu flu col col … can can
B col can flu can … flu col
C can col can flu … col flu
D can can can can … tac tac
E hac hac hac hac … hac hac
F tac tac tac tac … can can

Attacker knows: the generalization, the public knowledge, and the privacy property.
Goal: guess what the micro-data table is.
The three persons in each group may have the three conditions in any order, which gives 3! × 3! = 36 candidate tables (t1 to t36).
These would be the adversary’s best guesses of the micro-data table if the released generalization were his/her only knowledge. However …
Example (cont.) – Adversary Simulating the Algorithm
However, the adversary also knows the generalization algorithm and can simulate it to exclude some invalid guesses.
DoB Condition
1980~1999 ???
???
1960~1979 ???
???
1940~1959 ???
???
Generalization g1(ti)
Name DoB Condition
Alice 1990 ???
Bob 1985 ???
Charlie 1974 ???
David 1962 ???
Eve 1953 ???
Fen 1941 ???
Possible Table ti
DoB Condition
1980~1999 ???
???
1960~1979 ???
???
1940~1959 ???
???
Checked but unused: Generalization g1(t0)
From the permutation set to the disclosure set
The adversary runs the algorithm on every table ti of the permutation set (t1, t2, t3, t4, …, t35, t36) and checks g1(ti):
If g1(ti) satisfies privacy, the algorithm would have released g1(ti) rather than g2(ti), so ti is excluded.
If g1(ti) violates privacy, the algorithm would have moved on to g2, as observed, so ti stays in the disclosure set.
Disclosure set: t1, t3, t7, t9
Name DoB t1 t3 t7 t9
Alice 1990 flu cold flu cold
Bob 1985 cold flu cold flu
Charlie 1974 cancer cancer cancer cancer
David 1962 cancer cancer cancer cancer
Eve 1953 headache headache toothache toothache
Fen 1941 toothache toothache headache headache
Is this a valid guess of the micro-data table? Let’s check it using the algorithm!
[Figure labels: Simulating the algorithm; Mental image]
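The adversary's simulation can be reproduced by brute force on the toy example. The sketch below assumes the same reading of 2-diversity as before (at least two distinct conditions per group); the helper names are hypothetical. It enumerates the permutation set of the released g2(t0) and keeps only the tables for which g1 would have been rejected.

```python
from itertools import permutations

NAMES = ["Alice", "Bob", "Charlie", "David", "Eve", "Fen"]
DOB = {"Alice": 1990, "Bob": 1985, "Charlie": 1974,
       "David": 1962, "Eve": 1953, "Fen": 1941}
G1 = [(1980, 1999), (1960, 1979), (1940, 1959)]
# Released g2(t0): each DoB interval with its bag of conditions.
RELEASED_G2 = {
    (1970, 1999): ["flu", "cold", "cancer"],
    (1940, 1969): ["cancer", "headache", "toothache"],
}


def members(interval):
    """People (public knowledge) whose DoB falls into the interval."""
    lo, hi = interval
    return [name for name in NAMES if lo <= DOB[name] <= hi]


def permutation_set(released):
    """All name -> condition tables consistent with the released groups."""
    tables = [{}]
    for interval, conditions in released.items():
        people = members(interval)
        tables = [
            {**table, **dict(zip(people, order))}
            for table in tables
            for order in sorted(set(permutations(conditions)))
        ]
    return tables


def satisfies_2_diversity(table, intervals):
    """Assumed check: every group holds at least two distinct conditions."""
    return all(len({table[n] for n in members(iv)}) >= 2 for iv in intervals)


per = permutation_set(RELEASED_G2)
# g2 was released, so g1 must have failed on the real table.
ds = [t for t in per if not satisfies_2_diversity(t, G1)]

print(len(per), len(ds))
print({t["Charlie"] for t in ds}, {t["David"] for t in ds})
```

The permutation set has 36 tables and the disclosure set has 4. In all four, Charlie and David both have cancer, which is exactly what the released table alone would not reveal.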
Decision Process of Safe and Unsafe Algorithms
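[Figure: evaluation paths of anaive and asafe over the generalization functions g1, g2, …, gi, …, gn applied to t0. In the anaive path each iteration is labeled with its permutation set (per1, per2, …, pern); in the asafe path each iteration is labeled with its disclosure set and permutation set (ds1/per1, ds2/per2, …, dsn/pern). Every evaluation has the outcomes Y and N.]
Legend: box = the ith iteration; diamond = an evaluation of the privacy property; per = permutation set; ds = disclosure set.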
Most existing generalization algorithms (which do not consider this problem):
Evaluate the permutation set (the adversary’s mental image of the micro-data table without knowledge of the algorithm).
Safe generalization algorithms (Zhang’07ccs, ….):
Evaluate the disclosure set instead (the adversary’s mental image of the micro-data table after simulating the algorithm).
Agenda
Background
K-Jump Strategy
    The Algorithm Family ajump(k)
    Properties of ajump(k)
Data Utility Comparison
Conclusion
The Algorithm Family ajump(k)
[Figure: evaluation path of ajump(k) over g1, g2, …, g2+k, …, gn applied to t0. Each iteration is labeled with its permutation set (per) and disclosure set (ds), and every evaluation of the privacy property has the outcomes Y and N.]
naive strategy: evaluate the privacy property on the permutation set only (efficient but unsafe)
safe strategy: evaluate the privacy property on the disclosure set directly (safe but costly)
k-jump strategy: penalize by jumping over the next k-1 iterations
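The control flow of the k-jump strategy can be sketched as follows. Two things are assumptions here, not statements from the slides: the jump is taken when the permutation set passes but the disclosure set fails, and the two checks are supplied as hypothetical callbacks `p_per(i)` and `p_ds(i)`. In the real algorithm the disclosure set itself depends on the jump distance, which these fixed stubs do not model.

```python
from typing import Callable, Optional


def a_jump(n: int, k: int,
           p_per: Callable[[int], bool],
           p_ds: Callable[[int], bool]) -> Optional[int]:
    """Return the index i of the released g_i, or None if nothing is released.

    p_per(i) / p_ds(i) are hypothetical callbacks telling whether the privacy
    property holds on the permutation set / disclosure set of g_i(t0).
    """
    i = 1
    while i <= n:
        if not p_per(i):
            i += 1      # permutation set fails: try the next function
        elif p_ds(i):
            return i    # both checks pass: release g_i(t0)
        else:
            i += k      # assumed penalty: jump over the next k-1 functions
    return None


# Stub checks for illustration only: per passes from g2 on, ds from g4 on.
def per_ok(i: int) -> bool:
    return i >= 2


def ds_ok(i: int) -> bool:
    return i >= 4


print(a_jump(6, 1, per_ok, ds_ok))
print(a_jump(6, 3, per_ok, ds_ok))
```

With these stubs a jump distance of 1 releases g4, while a jump distance of 3 skips from g2 to g5 and releases g5. With k = 1 nothing is skipped, so the loop simply moves to the next function after a failed disclosure-set check.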
Properties of ajump(k)
Computation of the disclosure set
    asafe: to compute ds(gi(t0)), one must first compute ds(gj(t)) for all t in per(gi(t0)) and j = 1, 2, …, i-1.
    ajump: to compute ds(gi(t0)) (2<i<2+k), one no longer needs to compute ds(g2(t)) for all t in per(gi(t0)).
ds(g1(t0)) and ds(g2(t0))
    ds(g1(t0)) = per(g1(t0))
    ds(g2(t0)) is independent of the distance vector.
Size of the family
    There are (n-1)! different jump distance vectors.
Agenda
Background
K-Jump Strategy
Data Utility Comparison
    Construction for Theorem 1: 1-jump and i-jump (1<i) incomparable
    Construction for Theorem 2: i-jump and j-jump (1<i<j) incomparable
    Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable
    Construction for Proposition 2: Reusing generalization functions
    Results on asafe and ajump(1)
Conclusion
Construction for Theorem 1: 1-jump and i-jump (1<i) incomparable
QID g1 g2 g3 …
A C0 C0 C0 …
B C1 C1 C1 …
C C2 C2 C2 …
D C3 C3 C3 …
E C4 C4 C4 …
F C5 C5 C5 …
G C6 C6 C6 …
H C6 C6 C6 …
I C6 C6 C6 …
J C7 C7 C7 …
K C7 C7 C7 …
L C8 C8 C8 …
M C8 C8 C8 …
N C9 C9 C9 …
O C9 C9 C9 …
privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
S1 S2 S3 S4
A C0 C0 C0 C0
B C1 C1 C1 C1
C C2 C2 C2 C2
D C3 C3 C3 C3
E C4 C4 C4 C4
F C5 C5 C5 C5
G C6 C6 C6 C6
H C6 C6 C6 C6
I C6 C8/C9 C7/C9 C7/C8
J C7 C6 C6 C6
K C7 C8 C7 C7
L C8 C9 C9 C8
M C8 C8/C9 C7/C9 C7/C8
N C9 C7 C8 C9
O C9 C7 C8 C9
# 4320 1152 1152 1152
Each remaining table belongs to one of the four disjoint sets S1, S2, S3, and S4.
Construction for Theorem 1 (cont.): 1-jump and i-jump (1<i)
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing the remaining tables using g2.
S2, S3, and S4 cannot be disclosed under g2.
S1 S101 S102 S103
A C0 C0 C0 C0
B C1 C1 C1 C1
C C2 C2 C2 C2
D C3 C3 C3 C3
E C4 C4 C4 C4
F C5 C5 C5 C5
G C6 C6 C6 C6
H C6 C6 C6 C6
I C6 C6 C6 C6
J C7 C8 C7 C7
K C7 C8 C7 C7
L C8 C9 C9 C8
M C8 C9 C9 C8
N C9 C7 C8 C9
O C9 C7 C8 C9
# 4320 288 288 288
|S1’| = 864 (= 3 × 288)
a. The subsets of S1 in which N and O both have C7, both have C8, or both have C9 cannot be disclosed under g2.
S1 S111 S112 S113
A C0 C0 C0 C0
B C1 C1 C1 C1
C C2 C2 C2 C2
D C3 C3 C3 C3
E C4 C4 C4 C4
F C5 C5 C5 C5
G C6 C6 C6 C6
H C6 C6 C6 C6
I C6 C6 C6 C6
J C7 C7 C7 C7
K C7 C8 C8 C7
L C8 C9 C8 C8
M C8 C9 C9 C9
N C9 C7 C7 C8
O C9 C8 C9 C9
# 4320 1152 1152 1152
|S1\S1’| = 3456 (= 3 × 1152)
b. For ajump(i), all tables in S1\S1’ will be excluded from ds_3^i(t0).
$ds_3^i(t_0) = S_1' \cup S_2 \cup S_3 \cup S_4$
Satisfied!
S1 S111 S1111 S1112
A C0 C0 C0 C0
B C1 C1 C1 C1
C C2 C2 C2 C2
D C3 C3 C3 C3
E C4 C4 C4 C4/C5
F C5 C5 C5 C6
G C6 C6 C6 C6
H C6 C6 C6 C4/C5
I C6 C6 C6 C6
J C7 C7 C7 C7
K C7 C8 C8 C8
L C8 C9 C9 C9
M C8 C9 C9 C9
N C9 C7 C7 C7
O C9 C8 C8 C8
# 4320 1152 576 576
c. For ajump(1), the disclosure sets of all tables in S1\S1’ under g2 do not satisfy the privacy property.
$ds_3^1(t_0) = S_1 \cup S_2 \cup S_3 \cup S_4$
The ratio of I being associated with C6 is 5/9. Violated!
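The 5/9 figure can be rechecked from the "#" rows of the tables. The sketch below only looks at the one pair the slide singles out (I associated with C6), not at the full privacy property, and relies on the tables shown: I has C6 in every table of S1 and in no table of S2, S3, or S4.

```python
from fractions import Fraction

# Table counts taken from the "#" rows of the slides.
S1, S2, S3, S4 = 4320, 1152, 1152, 1152
S1_PRIME = 864   # |S1'|
S1_REST = 3456   # |S1 \ S1'|
assert S1_PRIME + S1_REST == S1

# i-jump (1 < i): S1 \ S1' is excluded from the disclosure set under g3.
ds3_i_jump = S1_PRIME + S2 + S3 + S4
# 1-jump: all of S1 remains in the disclosure set under g3.
ds3_1_jump = S1 + S2 + S3 + S4

# Only tables from S1 associate I with C6.
ratio_i_jump = Fraction(S1_PRIME, ds3_i_jump)
ratio_1_jump = Fraction(S1, ds3_1_jump)
print(ratio_i_jump, ratio_1_jump)
```

The first ratio is 1/5, below the 1/2 bound, and the second is 5/9, above it, which matches the "Satisfied!" and "Violated!" labels for this pair.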
Show the evaluation paths by figures.
Construction for Theorem 2: i-jump and j-jump (1<i<j) incomparable
g1 g2 g3 … gj gj+1 gj+2 …
C0 C0 C0 … C0 C0 C0 …
C1 C1 C1 … C1 C1 C1 …
C2 C2 C2 … C2 C2 C2 …
C3 C3 C3 … C3 C3 C3 …
C4 C4 C4 … C4 C4 C4 …
S S S … S S S …
S S S … S S S …
C5 C5 C5 … C5 C5 C5 …
C6 C6 C6 … C6 C6 C6 …
C7 C7 C7 … C7 C7 C7 …
C8 C8 C8 … C8 C8 C8 …
C9 C9 C9 … C9 C9 C9 …
… … … … … … … …
The case where i-jump has better utility than j-jump is relatively easy to construct. We only show the construction for the other case.
For this construction, generalization gj+2 will be released for j-jump, while gj+i+1 or a later one will be released for i-jump.
Construction for Theorem 2 (cont.): i-jump and j-jump (1<i<j)
Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable
QID g1 g2 g3 g2'
A C1 C1 C1 C1
B C2 C2 C2 C2
C C3 C3 C3 C3
D C4 C4 C4 C4
E C5 C5 C5 C5
F C3 C3 C3 C3
G C3 C3 C3 C3
the jump distance is 1;
the privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
Without reusing g2:
1. The table cannot be disclosed under g1(.) or g3(.).
2. To compute ds2: each table belongs to one of the three disjoint sets S1, S2, and S3.

g2 S1 S2 S3
A C1 C1 C1/C2 C1/C2
B C2 C2 C3 C3
C C3 C3 C1/C2 C1/C2
D C4 C4 C3 C4
E C5 C5 C3 C5
F C3 C3 C4 C3
G C3 C3 C5 C3
# 72 24 8 8 (S1, S2, and S3 together: 40)

$ds_2 = S_1 \cup S_2 \cup S_3$
Violated! The table will lead to disclosing nothing!
Construction for Proposition 2: Reusing generalization functions
the jump distance is 1;
the privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
g2 is reused as g2’:
To calculate ds2’, the tables that can be disclosed under g1, g2, and g3 must be excluded from per2’.

g3 S1 S2 S3
A C1 C1 C1/C2 C1/C2
B C2 C2 C3 C3
C C3 C3 C1/C2 C1/C2
D C4 C4 C3 C4
E C5 C5 C3 C5
F C3 C3 C4 C3
G C3 C3 C5 C3
# 24 8 8 (together: 40)

1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
Construction for Proposition 2 (cont.): Reusing generalization functions
S1 S11 S12 S13
A C1 C1 C1 C1
B C2 C2 C2 C2
C C3 C3 C3 C3
D C4 C3 C3 C4
E C5 C4/C5 C3 C5
F C3 C3 C4 C3
G C3 C4/C5 C5 C3
# 24 16 4 4
3. S1 can be further divided into three disjoint subsets S11, S12, and S13:
a. S12 and S13 cannot be disclosed under g3.
S1 S11 tA SA1 SA2
A C1 C1 C1 C3 C1/C2/C4
B C2 C2 C2 C3 C1/C2/C4
C C3 C3 C3 C1 C3
D C4 C3 C3 C2 C3
E C5 C4/C5 C4 C4 C1/C2/C4
F C3 C3 C3 C3 C3
G C3 C4/C5 C5 C5 C5
# 24 16 120 12 36
b. The tables in subset S11 can be disclosed under g3.
To compute ds3(t0 in S11), with tA as one instance:
A. Exclude any table t for which p(per1(t)) = true. Each remaining table belongs to one of the two disjoint sets SA1 and SA2.
B. These subsets cannot be disclosed under g2 either.
S12 S13 S2 S3
A C1 C1 C1/C2 C1/C2
B C2 C2 C3 C3
C C3 C3 C1/C2 C1/C2
D C3 C4 C3 C4
E C3 C5 C3 C5
F C4 C3 C4 C3
G C5 C3 C5 C3
# 4 4 8 8
g2 is reused as g2’:
$ds_{2'} = S_{12} \cup S_{13} \cup S_2 \cup S_3$
The ratio of D being associated with C3 and the ratio of E being associated with C3 are both 0.5, which is the highest ratio.
Satisfied!
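The 0.5 ratios can be rechecked from the table of S12, S13, S2, and S3. The sketch below copies the table counts and the values of D and E from that table; it checks only these two individuals against C3, not every pair.

```python
from fractions import Fraction

# (number of tables, value of D, value of E) per subset, from the slide table.
SUBSETS = {
    "S12": (4, "C3", "C3"),
    "S13": (4, "C4", "C5"),
    "S2": (8, "C3", "C3"),
    "S3": (8, "C4", "C5"),
}

total = sum(count for count, _, _ in SUBSETS.values())
d_with_c3 = sum(count for count, d, _ in SUBSETS.values() if d == "C3")
e_with_c3 = sum(count for count, _, e in SUBSETS.values() if e == "C3")

ratio_d = Fraction(d_with_c3, total)
ratio_e = Fraction(e_with_c3, total)
print(total, ratio_d, ratio_e)
```

Both ratios come out as 12/24 = 1/2, which sits exactly on the bound of the privacy property.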
Results on asafe and ajump(1)
1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values:
Lemma 3: p(per(t0)) = false ⇒ p(any of its subsets) = false
Corollary 1: The algorithm asafe has the same data utility as ajump(1).
2. When the privacy property falls under other cases:
Lemma 4: The ds3 under asafe is a subset of that under ajump(1).
Theorem 5: The data utility of asafe and ajump(1) is generally incomparable.
Conclusion
We have proposed a novel k-jump strategy for micro-data disclosure.
Transform a given generalization algorithm into a large number of safe algorithms.
Show the data utility is generally incomparable by constructing counter-examples.
Practical impact: make a secret choice.
Further Result and Future Work
Future studies:
Study more efficient safe algorithms.
Employ statistical methods to compare different k-jump algorithms.
Further investigate the opportunity in reusing generalization functions.
Further results in the extended version of this paper:
Computational complexity: [formula garbled in extraction; the recoverable symbols are O, max, |per|, k, and n]
Making a secret choice among unsafe algorithms does not yield a safe solution.
Thank you!