Privacy-Aware Data Management in Information Networks


Page 1: Privacy-Aware Data Management in Information Networks

Privacy-Aware Data Management in Information Networks

Michael Hay, Cornell University; Kun Liu, Yahoo! Labs; Gerome Miklau, Univ. of Massachusetts Amherst; Jian Pei, Simon Fraser University; Evimaria Terzi, Boston University

SIGMOD 2011 Tutorial 1

Page 2: Privacy-Aware Data Management in Information Networks

Romantic connections in a high school

Bearman et al. The structure of adolescent romantic and sexual networks. American Journal of Sociology, 2004. (Image drawn by Newman) 2

Page 3: Privacy-Aware Data Management in Information Networks

[Figure: sexual contact network from a gang-associated STD outbreak, Colorado Springs, 1989-91. Panel A: the largest component (n = 410); panel B: the core of the largest component (n = 107). Node labels: G = gonorrhoea, GT = gonorrhoea and chlamydia, T = chlamydia.]

Potterat et al. Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs. Sexually Transmitted Infections, 2002.

Sexual and injecting drug partners

3

Page 4: Privacy-Aware Data Management in Information Networks

[Figure: snowball sample of a mobile phone network, from New Journal of Physics 9 (2007) 179 (http://www.njp.org/). Figure 2 shows the number of nodes in the sample as a function of extraction distance for several choices of the source node and their average; Figure 3 shows a sample of the network with the source node at the centre, bulk nodes, and surface nodes.]

J. Onnela et al. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 2007.

Social ties derived from a mobile phone network

4

Page 5: Privacy-Aware Data Management in Information Networks

Facebook

~600 million nodes, billions of relationships

network data set

complex platform for sharing

5

Page 6: Privacy-Aware Data Management in Information Networks

Privately managing enterprise network data

Personal Privacy in Online Social Networks

Data: Enterprise collects data or observes interactions of individuals.

Control: Enterprise controls dissemination of information.

Goal: permit analysis of aggregate properties; protect facts about individuals.

Challenges: privacy for networked data, complex utility goals.

Data: Individuals contribute their data thru participation in OSN.

Control: Individuals control their connections, interactions, visibility.

Goal: reliable and transparent sharing of information.

Challenges: system complexity, leaks thru inference, unskilled users.

6

Page 7: Privacy-Aware Data Management in Information Networks

• Privately Managing Enterprise Network Data

• Goals, Threats, and Attacks

• Releasing transformed networks (anonymity)

• Releasing network statistics (differential privacy)

• Personal Privacy in Online Social Networks

• Understanding privacy risk

• Managing privacy controls

Outline of tutorial

60 minutes

30 minutes

7

Page 8: Privacy-Aware Data Management in Information Networks

Data model

Nodes:

ID | Age | HIV
Alice | 25 | Pos
Bob | 19 | Neg
Carol | 34 | Pos
Dave | 45 | Pos
Ed | 32 | Neg
Fred | 28 | Neg
Greg | 54 | Pos
Harry | 49 | Neg

Edges:

ID1 | ID2
Alice | Bob
Bob | Carol
Bob | Dave
Bob | Ed
Dave | Ed
Dave | Fred
Dave | Greg
Ed | Greg
Ed | Harry
Fred | Greg
Greg | Harry

8

Page 9: Privacy-Aware Data Management in Information Networks

Sensitive information in networks

• Disclosing attributes

• Disclosing edges

• Disclosing properties

• node degree, clustering, etc.

• properties of neighbors (e.g. mostly friends with republicans)

9

Page 10: Privacy-Aware Data Management in Information Networks

Goals in analyzing networks

• Properties of the degree distribution

• Motif analysis

• Community structure

• Processes on networks: routing, rumors, infection

• Resiliency / robustness

• Homophily

• Correlation / causation

Can we permit analysts to study networks without revealing sensitive information about participants?

Example analyses

10

Page 11: Privacy-Aware Data Management in Information Networks

Naive anonymization

Original network

NaiveAnonymization

DATA OWNER ANALYST

Naive anonymization is a transformation of the network in which identifiers are replaced with random numbers.

Alice Bob Carol

Dave Ed

Fred Greg Harry

Naively anonymized network: nodes relabeled with random numbers 1-8

Good utility: output is isomorphic to the original network

Secret mapping: Alice→6, Bob→8, Carol→5, Dave→7, Ed→2, Fred→3, Greg→4, Harry→1

11
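A minimal sketch of naive anonymization in Python (the language is my choice, not the tutorial's): node names and edges follow the running example from slide 8, and the helper name is illustrative.

```python
import random

# Running example: the data owner's hidden network (names + edges).
nodes = ["Alice", "Bob", "Carol", "Dave", "Ed", "Fred", "Greg", "Harry"]
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
         ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
         ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry")]

def naive_anonymize(nodes, edges, seed=None):
    """Replace identifiers with random numbers; the edge structure is left
    unchanged, so the released graph is isomorphic to the original."""
    rng = random.Random(seed)
    ids = list(range(1, len(nodes) + 1))
    rng.shuffle(ids)
    mapping = dict(zip(nodes, ids))          # kept secret by the data owner
    released = [(mapping[u], mapping[v]) for (u, v) in edges]
    return mapping, released

secret_mapping, released_edges = naive_anonymize(nodes, edges, seed=0)
print(released_edges)   # what the analyst sees: numbered nodes, same topology
```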

Page 12: Privacy-Aware Data Management in Information Networks

Protection under naive anonymization

• Two primary threats:

• Node re-identification: adversary is able to deduce that node x in the naively anonymized network corresponds to an identified individual Alice in the hidden network.

• Edge disclosure: adversary is able to deduce that two identified individuals Alice and Bob are connected in the hidden network.

• With no external information: good protection

• Who is Alice? one of {1,2,3,4,5,6,7,8}

• Alice and Bob connected? 11/28 likelihood

12

Page 13: Privacy-Aware Data Management in Information Networks

Adversaries with external information

• Structural knowledge

• often assumed limited to small radius around node

• “Alice has degree 2” or “Bob has two connected neighbors”

• Information can be precise or approximate

• External information may be acquired from a specific attack, or we may assume a category of knowledge as a bound on adversary capabilities.

External information: facts about identified individuals and their relationships in the hidden network.

13

Page 14: Privacy-Aware Data Management in Information Networks


Naively Anonymized Network 14

unique or partial node re-identification

Bob

External information

Alice

Matching attack: the adversary matches external information to a

naively anonymized network.

Matching attacks

Page 15: Privacy-Aware Data Management in Information Networks

Attacks on naively anonymized networks

• Success of a matching attack depends on:

• descriptiveness of external information

• structural diversity in the network

• With external information: weaker protection

• Who is Alice? one of {1,2,3,4,5,6,7,8}

• Who is Alice, if her degree is known to be 4? one of {2,4,7,8}

• Alice and Bob connected?

15

Page 16: Privacy-Aware Data Management in Information Networks

Local structure is highly identifying

[Figure: re-identification risk in a Friendster network of ~4.5 million nodes. X-axis: strength of the adversary's knowledge (H1 = degree, H2 = neighbors' degrees, H3, H4 = increasingly large neighborhoods); y-axis: fraction of the population, broken down by candidate-set size: [1] uniquely identified, [2-4], [5-10], [11-20], [>21] well-protected.]

[Hay, PVLDB 08] 16

Page 17: Privacy-Aware Data Management in Information Networks

Active attack on an online network

• Goal: disclose edge between two targeted individuals.

• Assumption: adversary can alter the network structure, by creating nodes and edges, prior to naive anonymization.

• In blogging network: create new blogs and links to other blogs.

• In email network: create new identities, send mail to identities.

• (Harder to carry out this attack in a physical network)

[Backstrom, WWW 07] 17

Page 18: Privacy-Aware Data Management in Information Networks

Active attack on an online network

1 Attacker creates a distinctive subgraph of nodes and edges.

2 Attacker links subgraph to target nodes in the network.

Naive anonymization

3 Attacker finds matches for pattern in naively anonymized network.

4 Attacker re-identifies targets and discloses structural properties.

Alice

Bob


[Backstrom, WWW 07] 18

Page 19: Privacy-Aware Data Management in Information Networks

Results of active attack

• Given a network G with n nodes, it is possible to construct a pattern subgraph with k = O(log(n)) nodes that will be unique in G with high probability.

• injected subgraph is chosen uniformly at random.

• the number of subgraphs of size k that appear in G is small relative to the number of all possible subgraphs of size k.

• The pattern subgraph can be efficiently found in the released network, and can be linked to as many as O(log²(n)) target nodes.

• In a 4.4-million-node LiveJournal friendship network, the attack succeeds w.h.p. with 7 pattern nodes.

[Backstrom, WWW 07] 19
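A simplified sketch of the injection steps of the active attack (steps 1-2 of slide 18), assuming the networkx library is available; the pattern-matching step (finding the injected subgraph in the naively anonymized release) is omitted, and this is not the full algorithm of [Backstrom, WWW 07].

```python
import itertools
import random
import networkx as nx  # assumed available; any graph library would do

def inject_pattern(G, targets, k, p=0.5, seed=0):
    """Add k new attacker nodes with random internal edges (a subgraph that is
    unique with high probability when k = O(log n)), then link attacker nodes
    to the targets so they can be located after naive anonymization.
    Simplified: each target is linked to a single pattern node."""
    rng = random.Random(seed)
    pattern = [f"attacker_{i}" for i in range(k)]
    G.add_nodes_from(pattern)
    for u, v in itertools.combinations(pattern, 2):
        if rng.random() < p:                  # random internal structure
            G.add_edge(u, v)
    for node, target in zip(itertools.cycle(pattern), targets):
        G.add_edge(node, target)              # distinctive links to the targets
    return pattern
```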

Page 20: Privacy-Aware Data Management in Information Networks

Auxiliary network attack

• Goal: re-identify individuals in a naively anonymized target network

• Assumptions:

• An un-anonymized auxiliary network exists, with overlapping membership.

• There is a set of seed nodes present in both networks, for which the mapping between target and auxiliary is known.

• Starting from seeds, mapping is extended greedily.

• Using Twitter (target) and Flickr (auxiliary), true overlap of ~30000 individuals, 150 seeds, 31% re-identified correctly, 12% incorrectly.

[Narayanan, OAKL 09] 20
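A minimal sketch of the greedy seed-and-extend idea, using plain dictionaries; this is a heavily simplified stand-in, not the scoring and eccentricity heuristics of [Narayanan, OAKL 09].

```python
def propagate(target_adj, aux_adj, seeds, min_score=2):
    """Greedy propagation: repeatedly map an unmapped target node to the
    auxiliary node that shares the most already-mapped neighbors with it.
    target_adj / aux_adj: dicts node -> set of neighbors; seeds: dict target -> aux."""
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for t in target_adj:
            if t in mapping:
                continue
            scores = {}
            for nbr in target_adj[t]:
                if nbr in mapping:
                    for cand in aux_adj.get(mapping[nbr], ()):
                        if cand not in used:
                            scores[cand] = scores.get(cand, 0) + 1
            if scores:
                best, score = max(scores.items(), key=lambda kv: kv[1])
                if score >= min_score:        # only map when the evidence is strong
                    mapping[t] = best
                    used.add(best)
                    changed = True
    return mapping
```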

Page 21: Privacy-Aware Data Management in Information Networks

Summary

• Naive anonymization may be good for utility...

• ... but it is not sufficient for protecting sensitive information in networks.

• an individual’s connections in the network can be highly identifying.

• external information may be available to adversary from outside sources or from specific attacks.

• Conclusion: stronger protection mechanisms are required.

21

Page 22: Privacy-Aware Data Management in Information Networks

Questions & challenges

• What is the correct model for adversary external information?

• How do attributes and structural properties combine to increase identifiability and worsen attacks?

• Are there additional attacks on naive anonymization (or other forms of anonymization)?

Next: How can we strengthen the protection offered by a released network while preserving utility ?

22

Page 23: Privacy-Aware Data Management in Information Networks

• Privately Managing Enterprise Network Data

• Goals, Threats, and Attacks

• Releasing transformed networks (anonymity)

• Releasing network statistics (differential privacy)

• Personal Privacy in Online Social Networks

• Understanding privacy risk

• Managing privacy controls

Outline of tutorial

23

Page 24: Privacy-Aware Data Management in Information Networks

Releasing data vs. statistics


• Releasing transformed networks

Q → safe answer

• Releasing “safe” network statistics

To prevent adversary attack, release transformed network

• transformations obscure identifying node features

• while hopefully preserving global topology

24

Page 25: Privacy-Aware Data Management in Information Networks

• A graph G( V, E ) is k-degree anonymous if every node in V has the same degree as at least k-1 other nodes in V.

Transform for degree anonymity

25


[Liu, SIGMOD 08]

1-degree anonymous

2-degree anonymous
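A small sketch checking the definition above; the degrees are those of the running example from slide 8.

```python
from collections import Counter

def is_k_degree_anonymous(degrees, k):
    """k-degree anonymous: every degree value is shared by at least k nodes
    (equivalently, each node has k-1 others with the same degree)."""
    return all(count >= k for count in Counter(degrees).values())

# Every degree value appears at least twice, so the example graph is
# 2-degree anonymous but not 3-degree anonymous.
example = [1, 4, 1, 4, 4, 2, 4, 2]
print(is_k_degree_anonymous(example, 2), is_k_degree_anonymous(example, 3))
```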

Page 26: Privacy-Aware Data Management in Information Networks

• Problem: Given a graph G( V, E ) and integer k, find minimal set of

edges E’ such that graph G( V, E ∪ E’) is k-degree anonymous.

• Approach: Use dynamic programming to find the minimum change to the degree sequence.

• Challenge: may not be possible to realize degree sequence through edge additions.

• Example: V = {a, b, c}, E = { (b,c) }. Degree sequence is [0,1,1]. Min. change yields [1,1,1] but not realizable (without self-loops).

• Algorithm: draws on ideas from graph theory to construct a graph with minimum, or near minimum, edge insertions.

Algorithm for degree anonymization

26

[Liu, SIGMOD 08]
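A sketch of the degree-sequence step under the edge-addition-only setting described above (degrees may only be raised); the graph-construction step of [Liu, SIGMOD 08], which realizes the anonymized sequence as an actual graph, is omitted.

```python
def anonymize_degree_sequence(degrees, k):
    """DP sketch: with degrees sorted in non-increasing order, raise degrees as
    little as possible (L1 cost) so they split into groups of size >= k that
    share a common value; group sizes can be limited to [k, 2k-1]."""
    d = sorted(degrees, reverse=True)
    n = len(d)

    def group_cost(i, j):                     # raise d[i..j] up to d[i]
        return sum(d[i] - d[t] for t in range(i, j + 1))

    INF = float("inf")
    best = [INF] * (n + 1)                    # best[m] = cost for the first m degrees
    choice = [0] * (n + 1)
    best[0] = 0
    for m in range(k, n + 1):
        for t in range(max(0, m - 2 * k + 1), m - k + 1):
            if best[t] == INF:
                continue
            c = best[t] + group_cost(t, m - 1)
            if c < best[m]:
                best[m], choice[m] = c, t
    out, m = d[:], n                          # rebuild the anonymized sequence
    while m > 0:
        t = choice[m]
        out[t:m] = [d[t]] * (m - t)
        m = t
    return out, best[n]

print(anonymize_degree_sequence([4, 4, 4, 4, 2, 2, 1, 1], 3))  # -> ([4,4,4,4,2,2,2,2], 2)
```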

Page 27: Privacy-Aware Data Management in Information Networks

• Degree anonymization is an instance of a more general paradigm. Many of the proposed approaches follow this paradigm.

A common problem formulation

27

Given input graph G,

• Consider a set 𝒢 of candidate graphs, each G* in 𝒢 reachable from G by certain graph transformations

• Find G* in G such that G* satisfies privacy( G*, ... ), and

• Minimizes distortion( G, G* )

Page 28: Privacy-Aware Data Management in Information Networks

Privacy as resistance to attack

• Adversary capability: knowledge of...

• attributes

• degree

• subgraph neighborhood

• structural knowledge beyond immediate neighborhood

• Attack outcome

• Node re-identification

• Edge disclosure

28

Page 29: Privacy-Aware Data Management in Information Networks

Kinds of transformations

• Transformations considered in literature can be classified into three categories

• Directed alteration

• Generalization

• Random alteration

29

Page 30: Privacy-Aware Data Management in Information Networks

Directed alteration

• Transform network by adding (or removing) edges

• [Liu, SIGMOD 08] insert edges to achieve degree anonymity

• [Zhou, ICDE 08] neighborhood anonymity, labels on nodes

• [Zou, PVLDB 09] complete anonymity (k isomorphic subgraphs)

• [Cheng, SIGMOD 10] complete anonymity and bounds on edge disclosure

30


Page 31: Privacy-Aware Data Management in Information Networks


Generalization

• Transform network by clustering nodes into groups

• [Cormode, PVLDB 08] attribute-based attacks (graph structure unmodified) on bipartite graphs, prevents edge disclosure

• [Cormode, PVLDB 09] similar to above but for arbitrary interaction graphs (attributes on nodes and edges)

• [Hay, PVLDB 08, VLDBJ 10] summarize graph topology in terms of node groups; anonymity against arbitrary structural knowledge

31


Page 32: Privacy-Aware Data Management in Information Networks


Random alteration

• Transform network by stochastically adding, removing, or rewiring edges

• [Ying, SDM 08] random rewiring subject to utility constraint (spectral properties of graph must be preserved).

• [Liu, SDM 09] randomization to hide sensitive edge weights

• [Wu, SDM 10] exploits spectral properties of graph data to filter out some of the introduced noise.

32


Page 33: Privacy-Aware Data Management in Information Networks

Other work in network transformation

• Other works

• [Zheleva, PinKDD 07] predicting sensitive hidden edges from released graph data (nodes and non-sensitive edges).

• [Ying, SNA-KDD 09] comparison of randomized alteration and directed alteration.

• [Bhagat, WWW 10] releasing multiple views of a dynamic social network.

• Surveys:

• [Liu, Next Generation Data Mining 08]

• [Zhou, SIGKDD 08]

• [Hay, Privacy-Aware Knowledge Discovery 10]

• [Wu, Managing and Mining Graph Data 10]

33

Page 34: Privacy-Aware Data Management in Information Networks

Evaluating impact on utility

• After transformations, graph is released to public. Analyst measures transformed graph in place of original. What is impact on utility?

• Graph remains useful if it is "similar" to the original. How do we measure similarity?

• Related questions arise in statistical modeling of networks and assessing model fitness [Goldenberg, Foundations 10] [Hunter, JASA 08]

• Common approach to evaluating utility: empirically compare transformed graph to original graph in terms of various network properties

34
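A minimal sketch of this empirical comparison, assuming the networkx library; the property set is illustrative, not the exact one used in the cited experiments.

```python
import networkx as nx  # assumed available

def utility_report(original, transformed):
    """Compare a transformed (undirected) graph against the original on a few
    common network properties; small gaps suggest the release is still useful."""
    def summary(G):
        degs = [d for _, d in G.degree()]
        giant = G.subgraph(max(nx.connected_components(G), key=len))
        return {
            "avg_degree": sum(degs) / G.number_of_nodes(),
            "clustering": nx.average_clustering(G),
            "avg_path_len": nx.average_shortest_path_length(giant),
        }
    a, b = summary(original), summary(transformed)
    return {k: (a[k], b[k], abs(a[k] - b[k])) for k in a}
```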

Page 35: Privacy-Aware Data Management in Information Networks

Impact on network properties

[Figure: impact of anonymization on the degree distribution, path lengths, and clustering (frequency plots), plus the average distortion of each property as a function of k (k = 2, 5, 10, 20, |V|), compared against the original graph.]

Algorithm from Hay, PVLDB 08; experiments on a version of the HepTh network (2.5K nodes, 4.7K edges)

[Hay, PVLDB 08] 35

Page 36: Privacy-Aware Data Management in Information Networks

Limitations

• Utility

• Transformation may distort some properties: some analysts will find transformed graph useless

• Lack of formal bounds on error: analyst uncertain about utility

• Privacy

• Defined as resistance to a specific class of attacks; vulnerable to unanticipated attacks?

• Inspired by k-anonymity; doomed to repeat that history? (See survey [Chen, Foundations and Trends in Database 09].)

36

Page 37: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Goals, Threats, and Attacks

• Releasing transformed networks (anonymity)

• Releasing network statistics (differential privacy)

• Differential privacy

• Degree sequence

• Subgraph counts

• Personal Privacy in Online Social Networks37

Page 38: Privacy-Aware Data Management in Information Networks

Releasing data vs. statistics

• Releasing transformed networks
  Ease of use: good
  Protection: anonymity
  Accuracy: no formal guarantees

• Releasing "safe" network statistics: Q(G) + noise (output perturbation) → safe answer
  Ease of use: bad for practical analyses
  Protection: formal privacy guarantee
  Accuracy: provable bounds

38

Page 39: Privacy-Aware Data Management in Information Networks

When are aggregate statistics safe to release?

• “Safe” statistics should report on properties of a group, without revealing properties of individuals.

• We often want to release a combination of statistics. Still safe?

• What if adversary uses external information along with statistics? Still safe?

• Dwork, McSherry, Nissim, Smith [Dwork, TCC 06] proposed differential privacy as a rigorous standard for safe release.

• Many existing results for tabular data; relatively few results for network data.

39

Page 40: Privacy-Aware Data Management in Information Networks

The differential guarantee

Two databases are neighbors if they differ by at most one tuple

DATA OWNER → q → ANALYST

Query q: number of 'B' students. The analyst receives q̃(D), a noisy answer on D.

D:
name | gender | grade
Alice | Female | A
Bob | Male | B
Carl | Male | A

D' (a neighbor of D, with Bob's tuple removed):
name | gender | grade
Alice | Female | A
Carl | Male | A

Neighbors are indistinguishable given the output: q̃(D) ≈ q̃(D')

40

Page 41: Privacy-Aware Data Management in Information Networks

Differential privacy

A randomized algorithm A provides ε-differential privacy if, for all neighboring databases D and D', and for any set of outputs S:

Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S]

epsilon is a privacy parameter

Epsilon is usually small: e.g., if ε = 0.1 then e^ε ≈ 1.10

smaller epsilon = stronger privacy

[Dwork, TCC 06] 41

Page 42: Privacy-Aware Data Management in Information Networks

Calibrating noise

• How much noise is necessary to ensure differential privacy?

• Noise large enough to hide “contribution” of individual record.

• Contribution measured in terms of query sensitivity.

42

Page 43: Privacy-Aware Data Management in Information Networks

Query sensitivity

The sensitivity of a query q is

Δq = max_{D,D'} | q(D) − q(D') |

where D, D' are any two neighboring databases.

Query q | Sensitivity Δq
q1: Count tuples | 1
q2: Count('B' students) | 1
q3: Count(students with property X) | 1
q4: Median(age of students) | ~ max age

[Dwork, TCC 06] 43

Page 44: Privacy-Aware Data Management in Information Networks

q(D) + Laplace( Δq / ε )

The Laplace mechanism

The following algorithm for answering q is ε-differentially private:

ALaplace

Mechanism

[Dwork, TCC 06]

true answer

sample from scaled distribution

privacy parameter; sensitivity of q

[Figure: Laplace noise distributions for Δq = 1, ε = 1.0 and Δq = 1, ε = 0.5, showing the overlap between the output distributions with Bob out vs. Bob in.]

44
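A minimal sketch of the Laplace mechanism; the Laplace sample is drawn by inverting the CDF of a uniform draw, and the function name is illustrative.

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Epsilon-differentially private answer for a query with the given
    sensitivity: add Laplace noise with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                    # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise

# Counting queries have sensitivity 1, e.g. the number of 'B' students above.
print(laplace_mechanism(true_answer=1, sensitivity=1, epsilon=0.5))
```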

Page 45: Privacy-Aware Data Management in Information Networks

Differentially private algorithms

• Any query can be answered (but perhaps with lots of noise)

• Noise determined by privacy parameter epsilon and the sensitivity (both public)

• Multiple queries can be answered (details omitted)

• Privacy guarantee does not depend on assumptions about the adversary (caveats omitted, see [Kifer, SIGMOD 11])

Survey paper on differential privacy: [Dwork, CACM 10]

45

Page 46: Privacy-Aware Data Management in Information Networks

Adapting differential privacy for networks

• For networks, what is the right notion of “differential object?”

• Hide individual’s “evidence of participation” [Kifer, SIGMOD 11]

• An edge? A set of k edges? A node (and incident edges)?

• More discussion in [Hay, ICDM 09] [Kifer, SIGMOD 11]

• Choice impacts utility

• Existing work considers only edge, and k-edge, differential privacy.

46

A participant’s sensitive information is not a single edge.

Page 47: Privacy-Aware Data Management in Information Networks

What can we learn accurately?

• What can we learn accurately about a network under edge or k-edge differential privacy?

• Basic approach:

• Express desired task as one or more queries.

• Check query sensitivity

• if High: not promising, but sometimes representation matters.

• if Low: maybe promising, but may still require work.

47

Page 48: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Goals, Threats, and Attacks

• Releasing transformed networks (anonymity)

• Releasing network statistics (differential privacy)

• Differential privacy

• Degree sequence

• Subgraph counts

• Personal Privacy in Online Social Networks48

Page 49: Privacy-Aware Data Management in Information Networks

The degree sequence can be estimated accurately

• Degree sequence: the list of degrees of each node in a graph.

• A widely studied property of networks.

Alice Bob Carol

Dave Ed

Fred Greg Harry

[1,1,2,2,4,4,4,4]

[Figure: inverse cumulative degree distribution of an Orkut crawl; x-axis: degree, y-axis: fraction of nodes.]

[Hay, PVLDB 10] [Hay, ICDM 09]

49

Page 50: Privacy-Aware Data Management in Information Networks

Two basic queries for degrees

Frequency of each degree: cnt_i = count of nodes with degree i; F = [cnt_0, cnt_1, ..., cnt_{n-1}]

Degree of each node: deg_A = degree of node A; D = [deg_A, deg_B, ...]

G and G' are neighboring graphs differing by one edge (here G' = G with the Dave-Ed edge removed):

D(G) = [1,4,1,4,4,2,4,2]
D(G') = [1,4,1,3,3,2,4,2]
ΔD = 2

F(G) = [0,2,2,0,4,0,0,0]
F(G') = [0,2,2,2,2,0,0,0]
ΔF = 4

50

Page 51: Privacy-Aware Data Management in Information Networks

[Figure: degree CCDF Pr[X ≥ x] for the Orkut graph, comparing the original distribution with the noisy estimates obtained from D and F.]

These queries are both flawed

• D requires independent samples from Laplace(2/ε) in each component.

• F requires independent samples from Laplace(4/ε) in each component.

• Thus Mean Squared Error is Θ(n/ε²)


[Hay, PVLDB 10] [Hay, ICDM 09]

New technique allows improved error of O(d log³(n)/ε²)

(where d is # of unique degrees)

51

Page 52: Privacy-Aware Data Management in Information Networks

An alternative query for degrees

Degree of each node, ranked: rnk_i = the i-th smallest degree; S = [rnk_1, rnk_2, ..., rnk_n]

S(G) = [1,1,2,2,4,4,4,4]
S(G') = [1,1,2,2,3,3,4,4]
ΔS = 2

Degree of each node: deg_A = degree of node A; D = [deg_A, deg_B, ...]

D(G) = [1,4,1,4,4,2,4,2]
D(G') = [1,4,1,3,3,2,4,2]
ΔD = 2

(G and G' are the example graphs from slide 50.)

52

Page 53: Privacy-Aware Data Management in Information Networks

S(G) = [10, 10, ..., 10, 10, 14, 18, 18, 18, 18]

• The output of the sorted degree query, once noise is added, is not (in general) sorted.

• We derive a new sequence by computing the closest non-decreasing sequence, i.e. minimizing L2 distance.

Using the sort constraint

[Figure: true degree sequence S(G), noisy observations (ε = 2), and the inferred degree sequence, plotted as degree vs. rank; the annotated point is the 19th smallest degree + noise.]

[Hay, PVLDB 10] [Hay, ICDM 09]

53
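A minimal sketch of the L2 projection onto non-decreasing sequences using the standard pool-adjacent-violators algorithm; the noisy input values below are made up.

```python
def closest_nondecreasing(noisy):
    """Pool-adjacent-violators: project noisy sorted-degree answers onto the
    set of non-decreasing sequences, minimizing L2 distance."""
    blocks = []                               # each block stores [sum, count]
    for y in noisy:
        blocks.append([y, 1])
        # merge adjacent blocks whenever their means are out of order
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

# noisy answers to the ranked-degree query S(G); the projection restores monotonicity
print(closest_nondecreasing([1.4, 0.2, 2.5, 1.9, 4.8, 3.6, 4.1, 4.4]))
```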

Page 54: Privacy-Aware Data Management in Information Networks

Experimental results, continued

powerlaw α=1.5, n=5M

livejournal n=5.3M

orkut n=3.1M

ε=.001

ε=.01

[Figure: inverse cumulative degree distributions (fraction vs. degree) comparing original, noisy, and inferred sequences on each dataset and ε setting.]

54

Page 55: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Goals, Threats, and Attacks

• Releasing transformed networks (anonymity)

• Releasing network statistics (differential privacy)

• Differential privacy

• Degree sequence

• Subgraph counts

• Personal Privacy in Online Social Networks55

Page 56: Privacy-Aware Data Management in Information Networks

Subgraph counting queries

• Given query graph H, return the number of subgraphs of G that are isomorphic to H.

• Importance

• Used in statistical modeling: exponential random graph models

• Descriptive statistics: clustering coefficient from 2-star, triangle

56

2-star 3-star triangle 2-triangle
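A small sketch of these counts on a toy adjacency structure: the global clustering coefficient is 3 × (number of triangles) / (number of 2-stars). The graph below is made up.

```python
from itertools import combinations

def two_stars_and_triangles(adj):
    """Count 2-stars and triangles in a simple undirected graph given as a
    dict of adjacency sets."""
    two_stars = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    triangles = 0
    for v, nbrs in adj.items():
        for x, y in combinations(nbrs, 2):
            if y in adj[x]:
                triangles += 1
    return two_stars, triangles // 3          # each triangle is seen from 3 vertices

adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
s2, tri = two_stars_and_triangles(adj)
print(s2, tri, 3 * tri / s2)                  # clustering coefficient
```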

Page 57: Privacy-Aware Data Management in Information Networks

Subgraph counts have high sensitivity

• QTRIANGLE: return the number of triangles in the graph

• High sensitivity due to a "pathological" worst-case graph. If the input is not pathological, can we obtain accurate answers?

...

n-2 nodes

A B

...

n-2 nodes

A B

G G’

Q_TRIANGLE(G) = 0    Q_TRIANGLE(G') = n−2

High sensitivity: ΔQ_TRIANGLE = O(n)

57

Page 58: Privacy-Aware Data Management in Information Networks

Local sensitivity

• Tempting, but flawed, idea is to add noise proportional to local sensitivity.

• Local sensitivity of q on G: maximum difference in query answer between G and a neighbor G’.

• Example shows the problem of using local sensitivity (from [Smith, IPAM 10]): database D is a set of numbers, query q is the median

LS(G) = max_{G' ∈ N(G)} | q(G) − q(G') |

D  = [ 0...0 (×(n−3)/2), 0, 0, 0, c...c (×(n−3)/2) ]    LS(D) = 0
D' = [ 0...0 (×(n−3)/2), 0, 0, c, c...c (×(n−3)/2) ]    LS(D') = c

58

Page 59: Privacy-Aware Data Management in Information Networks

Instance-based noise

• Two general approaches to adding instance-based noise

• Smooth sensitivity Compute a smooth upper bound on local sensitivity [Nissim, STOC 07].

• Noisy sensitivity Use differentially private mechanism to get noisy upper “bound” on local sensitivity [Behoora, PVLDB 11] [Dwork, STOC 09].

• Instance-based noise can require modest relaxation of differential privacy to account for (very low probability) “bad” events.

59

Page 60: Privacy-Aware Data Management in Information Networks

Differentially private subgraph counts

• For k-stars and triangles, smooth sensitivity is efficiently computable

• For k-triangles with k ≥ 2

• Computing smooth sensitivity NP-Hard.

• However, it can be estimated using noisy sensitivity approach

• Empirical and theoretical analysis:

• Generally, instance-based noise is not much larger than the local sensitivity

• However, for k-triangles on real data, the local sensitivity is sometimes large (relative to the actual number of k-triangles).

[Behoora, PVLDB 11] 60

Page 61: Privacy-Aware Data Management in Information Networks

Alternative representations

• Number of k-stars in a graph can be computed from the degree sequence

• In other words, an answer to the high sensitivity k-star query can be derived from results of the degree sequence estimator.

• Would be interesting to compare error of this approach with instance-based noise approach of [Behoora, PVLDB 11].

k-stars(G) = Σ_{v ∈ G} ( deg(v) choose k )

61
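A one-line sketch of the identity above; plugging in a differentially private degree-sequence estimate gives a private k-star estimate without answering the high-sensitivity k-star query directly.

```python
from math import comb

def k_stars_from_degrees(degrees, k):
    """k-stars(G) = sum over nodes v of C(deg(v), k)."""
    return sum(comb(d, k) for d in degrees)

print(k_stars_from_degrees([1, 4, 1, 4, 4, 2, 4, 2], k=2))   # 2-stars of the example graph
```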

Page 62: Privacy-Aware Data Management in Information Networks

Other work on releasing network statistics

• [Rastogi, PODS 09] Subgraph counting queries under an alternative model of adversarial privacy. Expected error Θ(log² n) instead of Θ(n) for a restricted class of adversaries.

• [Machanavajjhala, PVLDB 11] Investigates recommender systems that use friends’ private data to make recommendations.

• Lower bound on accuracy of differentially private recommender

• Experimental analysis shows poor utility under reasonable privacy.

62

Page 63: Privacy-Aware Data Management in Information Networks

Open questions

• For graph analysis X, how accurately can we estimate X under edge or node differential privacy?

• Lower bounds on accuracy under node differential privacy?

• Is it socially acceptable to offer weaker privacy protection to high-degree nodes (as in k-edge differential privacy)?

• Can we generate accurate synthetic networks under differential privacy?

63

Page 64: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
  – Managing your privacy control
  – Summary and open questions

64

Page 65: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
    • What is privacy risk to online social-networking users
    • The sad situation
  – Understanding your privacy risk
  – Managing your privacy control
  – Summary and open questions

65

Page 66: Privacy-Aware Data Management in Information Networks

Information sharing in social networks

Millions of users share details of their personal lives

with vast networks of friends, and often, strangers

Courtesy of: http://www.contrib.andrew.cmu.edu/.../mygroup.html

66

Page 67: Privacy-Aware Data Management in Information Networks

What is privacy risk to social-networking users?

The information you share explicitly, e.g., name, age, gender, phone, address, employer, etc. can lead to identity theft.

67

Page 68: Privacy-Aware Data Management in Information Networks

What is privacy risk to social-networking users?

The information you did not share explicitly can also be derived from your public profile, friendship connections or even micro-targeted advertising systems.

68

Page 69: Privacy-Aware Data Management in Information Networks

The sad situation...

How to prevent my ex from seeing my status updates?

How to hide my friend list in the search results?

How to prevent the applications my friends installed from accessing my information?

All my friends have shared their hometown and phone number, maybe I should also do this?

I enjoyed sharing my daily activities with the world. But any adverse effects?

My God, what information I have shared all these years and who can view these information?

69

Page 70: Privacy-Aware Data Management in Information Networks

The sad situation... (cont.)

• You have control on what information you want to share, who you want to connect with

• You do not have a comprehensive and accurate idea of the information you have explicitly and implicitly disclosed

• Setting online privacy is time consuming and many of you choose to accept the default setting

• Eventually you lose control... and you are facing privacy risk

70

Page 71: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
    • Privacy risk due to what you shared explicitly
    • Privacy risk due to what you shared implicitly
    • Tools to visualize your privacy policies
  – Managing your privacy control
  – Summary and open questions

71

Page 72: Privacy-Aware Data Management in Information Networks

Privacy risk due to what you shared explicitly

• Privacy risk is measured by Privacy Score [Liu, ICDM 09]

• Privacy score takes into account what info you've shared and who can view that info

72

Page 73: Privacy-Aware Data Management in Information Networks

Basic premises of privacy score

• Sensitivity: The more sensitive the information released by a user, the higher his privacy risk.

• Visibility: The wider the information about a user spreads, the higher his privacy risk.

73

mother’s maiden name is more sensitive than mobile-phone number

home address known by everyone poses higher risks than by friends

[Liu, ICDM 09]

Page 74: Privacy-Aware Data Management in Information Networks

The framework

74

Privacy Score of User j due to Profile Item i

sensitivity of profile item i visibility of profile item i

name, or gender, birthday, address, phone number, degree, job, etc.

Page 75: Privacy-Aware Data Management in Information Networks

The framework (cont.)

75

Privacy Score of User j due to Profile Item i

sensitivity of profile item i visibility of profile item i

Overall Privacy Score of User j

name, or gender, birthday, address, phone number, degree, job, etc.

Page 76: Privacy-Aware Data Management in Information Networks

The item response theory (IRT) approach

[Figure: an n × N response matrix R with rows Profile Item_1 (birthday), ..., Profile Item_i (cell phone #), ..., Profile Item_n and columns User_1, ..., User_j, ..., User_N. R(i, j) = 1 if user j shares item i, and 0 if not. IRT parameters: the profile item's discrimination, the profile item's sensitivity, the user's attitude (e.g., conservative or extrovert), and profile item i's true visibility.]

76

Page 77: Privacy-Aware Data Management in Information Networks

Calculating privacy score using IRT

Sensitivity: Visibility:

Overall Privacy Score of User j

byproducts: profile item's discrimination and user's attitude

All parameters can be estimated using Maximum Likelihood Estimation and Expectation-Maximization.

77
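A minimal sketch of the score itself, PR(j) = Σ_i sensitivity_i × visibility_ij, using the simple observed-proportion variant (sensitivity estimated from how rarely an item is shared, visibility read off the response matrix); the full IRT fit via MLE/EM in [Liu, ICDM 09] is not implemented here, and the matrix below is made up.

```python
def privacy_scores(R):
    """Simplified (non-IRT) privacy scores from an n x N response matrix R,
    where R[i][j] = 1 if user j shares profile item i, else 0.
    Sensitivity of item i ~ how rarely it is shared; PR(j) = sum_i sens_i * R[i][j]."""
    n, N = len(R), len(R[0])
    sensitivity = [1.0 - sum(row) / N for row in R]   # rarely shared => more sensitive
    return [sum(sensitivity[i] * R[i][j] for i in range(n)) for j in range(N)]

# rows: profile items (e.g., birthday, cell phone #); columns: users
R = [[1, 1, 0],
     [0, 1, 0],
     [1, 1, 1]]
print(privacy_scores(R))
```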

Page 78: Privacy-Aware Data Management in Information Networks

Interesting results from user study

78

Sensitivity of the Profile Items Computed by the IRT Model

Survey

Information-sharing preferences of 153 users on 49 profile items such as name, gender, birthday, political views, address, phone number, degree, job, etc. are collected.

Statistics

• 49 profile items

• 153 users from 18 countries/regions

• 53.3% are male and 46.7% are female

• 75.4% are in the age of 23 to 39

• 91.6% hold a college degree or higher

• 76.0% spend 4+ hours online per day

Average Privacy Scores Grouped by Geo Regions

Page 79: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
    • Privacy risk due to what you shared explicitly
    • Privacy risk due to what you shared implicitly
    • Tools to visualize your privacy policies
  – Managing your privacy control
  – Summary and open questions

79

Page 80: Privacy-Aware Data Management in Information Networks

Privacy risk due to what you shared implicitly

• Privacy risk is measured by how much your private information can be inferred

• Private information can be inferred from
  – Your public profile, friendships, group memberships, etc.

• Private information can be inferred using
  – Majority voting [Becker, W2SP 09], [Zheleva, WWW 09]
  – Community detection [Mislove, WSDM 10]
  – Classification [Zheleva, WWW 09], [Lindamood, WWW 09]

80

Page 81: Privacy-Aware Data Management in Information Networks

Inference attack: majority voting

Basic Premise: birds of a feather flock together

[Becker, W2SP 09]
[Zheleva, WWW 09]

[Figure: a small friendship network of five users with profiles such as:
  [Gender]: female; [Political Views]: conservative; [Group]: texas conservatives; [Interests]: cooking, arts
  [Gender]: female; [Political Views]: liberal; [Group]: legalize same sex marriage; [Interests]: fashion and apparel
  [Gender]: ?; [Political Views]: ?; [Group]: texas conservatives; [Interests]: jewelry and shoes
  [Gender]: female; [Political Views]: liberal; [Group]: every time i find out a cute boy is conservative a little part of me dies; [Interests]: N/A
  [Gender]: N/A; [Political Views]: N/A; [Group]: texas conservatives; [Interests]: N/A]

81

Page 82: Privacy-Aware Data Management in Information Networks

Inference attack: community detection

Users with common attributes often form dense communities.

[Mislove, WSDM 10]

[Figure: the same five-user friendship network and profiles as on slide 81.]

82

Page 83: Privacy-Aware Data Management in Information Networks

Inference attack: community detection

Users with common attributes often form dense communities.

[Mislove, WSDM 10]

[Figure: the same five-user friendship network and profiles as on slide 81.]

83

Page 84: Privacy-Aware Data Management in Information Networks

Inference attack: classification

User | legalize same sex marriage | every time i find out a cute boy ... | Texas conservatives | Political views
A | 0 | 0 | 1 | ?
B | 1 | 0 | 0 | liberal
C | 0 | 1 | 0 | liberal
D | 0 | 0 | 1 | conservative

[Figure: the same five-user friendship network and profiles as on slide 81.]

[Zheleva, WWW 09]

84

Page 85: Privacy-Aware Data Management in Information Networks

Inference attack: classification

[Figure: the same five-user friendship network and profiles as on slide 81.]

[Lindamood, WWW 09]

85

Page 86: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
    • Privacy risk due to what you shared explicitly
    • Privacy risk due to what you shared implicitly
    • Tools to visualize your privacy policies
  – Managing your privacy control
  – Summary and open questions

86

Page 87: Privacy-Aware Data Management in Information Networks

Tools to visualize privacy policies

• Visualizations significantly impact users' understanding of their privacy settings [Mazzia, CHI 11], [Lipford, CHI 10]

87

Page 88: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
  – Managing your privacy control
    • Privacy management for individuals
    • Collaborative privacy management for shared contents
  – Summary and open questions

88

Page 89: Privacy-Aware Data Management in Information Networks

Privacy management for individuals

• Social navigation [Liu, ICDM 09], [Besmer, SOUPS 10]

• Preventing inference attacks [Lindamood, WWW 09]

• Learning privacy preferences with limited user inputs [Fang, WWW 10], [Shehab, WWW 10]

89

Page 90: Privacy-Aware Data Management in Information Networks

Social navigation

Social navigation helps users make better privacy decisions using community knowledge and expertise.

[Figure: privacy-setting choices annotated with community scores.]

[Liu, ICDM 09]
[Besmer, SOUPS 10]

90

Page 91: Privacy-Aware Data Management in Information Networks

Preventing inference attacks

Remove/hide risky links, profiles or groups that contributed most to the inference attacks.

[Lindamood, WWW 09]

91

Page 92: Privacy-Aware Data Management in Information Networks

Learning privacy preferences

Learning privacy preferences with limited user inputs and automatically configuring privacy settings for the user.

Figure courtesy of Lujun Fang and Kristen LeFevre.

[Fang, WWW 10]
[Shehab, WWW 10]

92

Page 93: Privacy-Aware Data Management in Information Networks

The framework

• View privacy preference model as a classifier
  – View each friend as a feature vector
  – Predict class label (allow or deny; share or not share)

• Key Design Questions:
  – How to construct features for each friend?
  – How to solicit user inputs in order to get labeled data?

93

Page 94: Privacy-Aware Data Management in Information Networks

Constructing features for each friend

[Figure: the user's friends clustered into groups G0, G1, G2, G3, G20, G21, G22.]

          Age | Sex | G0 | G1 | G2 | G20 | G21 | G22 | G3 | Obama Fan | Pref. Label (DOB)
(Alice)    25 |  F  |  0 |  1 |  0 |  0  |  0  |  0  |  0 |     1     | allow
(Bob)      18 |  M  |  0 |  0 |  1 |  1  |  0  |  0  |  0 |     0     | deny
(Carol)    30 |  F  |  1 |  0 |  0 |  0  |  0  |  0  |  0 |     0     | ?

Figure courtesy of Lujun Fang and Kristen LeFevre.

[Fang, WWW 10] [Shehab, WWW 10]

also see [Jones, SOUPS 10], [Danezis, AISec 09]

94

Page 95: Privacy-Aware Data Management in Information Networks

Soliciting user inputs

• Ask user to label specific friends
  – e.g., "Would you like to share your Date of Birth with Alice Adams?"

• Choose informative friends using an active learning approach
  – Uncertainty sampling

95

Page 96: Privacy-Aware Data Management in Information Networks

Uncertainty sampling

• Start with labeled friends and unlabeled friends

• Sampling proceeds in rounds
  – Ask user to label one friend v from the unlabeled set
  – v chosen based on an uncertainty estimate:
    • Train a Bayesian classifier using the labeled friends
    • For each v in the unlabeled set, estimate p_allow and p_deny
    • Choose the v that maximizes uncertainty = −(p_allow log p_allow + p_deny log p_deny)

• User can quit at any time

• Train the preference model (final classifier) using the labeled friends
  – Use it to label the remaining friends

96
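A minimal sketch of the selection step, assuming a classifier that returns p_allow for a friend; the names and probabilities below are illustrative, not from [Fang, WWW 10].

```python
import math

def pick_most_uncertain(classifier, unlabeled):
    """Active-learning step: choose the unlabeled friend that maximizes the
    entropy -(p_allow*log(p_allow) + p_deny*log(p_deny))."""
    def entropy(p):
        q = 1.0 - p
        return -sum(x * math.log(x) for x in (p, q) if x > 0)
    return max(unlabeled, key=lambda friend: entropy(classifier(friend)))

# Toy classifier returning p_allow for each friend.
p_allow = {"alice": 0.9, "bob": 0.55, "carol": 0.1}
print(pick_most_uncertain(lambda f: p_allow[f], ["alice", "bob", "carol"]))  # -> "bob"
```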

Page 97: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
  – Managing your privacy control
    • Privacy management for individuals
    • Collaborative privacy management for shared contents
  – Summary and open questions

97

Page 98: Privacy-Aware Data Management in Information Networks

Collaborative privacy management

Photos (or other shared content) uploaded to social networking sites are usually controlled by single users who are not the actual or sole stakeholders.

98

Page 99: Privacy-Aware Data Management in Information Networks

Collaborative privacy management (cont.)

• The Challenge
  – Each co-owner might have a different and possibly contrasting privacy preference
  – How to choose a privacy setting to maximize overall benefits?

• An attempt: Clarke tax mechanism [Squicciarini, WWW 09]
  – each owner indicates her perceived benefit at each privacy level (share with no one, share with friends, etc.)
  – the system finds the best privacy preference that maximizes the overall social benefit
  – each owner pays a certain tax to the system to compensate others' loss
  – the mechanism prevents an owner from untruthfully declaring her benefit to manipulate outcomes to her advantage

[Squicciarini, WWW 09]

99
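A sketch of a generic Clarke (pivot) tax over discrete privacy levels, to illustrate the idea; it is not necessarily the exact mechanism of [Squicciarini, WWW 09], and the benefit numbers below are made up.

```python
def clarke_tax(benefits):
    """benefits[owner][setting] = that owner's declared benefit for each privacy
    level. Choose the setting maximizing total declared benefit; each owner pays
    the loss her report imposes on the others (her externality)."""
    settings = benefits[0].keys()
    total = {s: sum(b[s] for b in benefits) for s in settings}
    chosen = max(total, key=total.get)
    taxes = []
    for b in benefits:
        others = {s: total[s] - b[s] for s in settings}   # welfare without this owner
        taxes.append(max(others.values()) - others[chosen])
    return chosen, taxes

benefits = [{"no one": 5, "friends": 3, "everyone": 0},
            {"no one": 0, "friends": 4, "everyone": 6}]
print(clarke_tax(benefits))   # -> ('friends', [2, 2])
```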

Page 100: Privacy-Aware Data Management in Information Networks

Outline of tutorial

• Privately Managing Enterprise Network Data

• Personal Privacy in Online Social Networks
  – Information sharing in social networks
  – Understanding your privacy risk
  – Managing your privacy control
    • Privacy management for individuals
    • Collaborative privacy management for shared contents
  – Summary and open questions

100

Page 101: Privacy-Aware Data Management in Information Networks

Summary

• You have certain control of the info you are sharing

• You often cannot estimate the long-term risk vs. short-term gain

• Algorithms to measure potential privacy risks due to info shared either explicitly or implicitly

• Models to alleviate your burden on privacy management

101

Page 102: Privacy-Aware Data Management in Information Networks

Open questions

• A widely accepted privacy score that boosts public awareness of the privacy risk

• An end-to-end practical system to measure and manage privacy online

102

Page 103: Privacy-Aware Data Management in Information Networks

Privately managing enterprise network data

Personal Privacy in Online Social Networks

Data: Enterprise collects data or observes interactions of individuals.

Control: Enterprise controls dissemination of information.

Goal: permit analysis of aggregate properties; protect facts about individuals.

Challenges: privacy for networked data, complex utility goals.

Data: Individuals contribute their data thru participation in OSN.

Control: Individuals control their connections, interactions, visibility.

Goal: reliable and transparent sharing of information.

Challenges: system complexity, leaks thru inference, unskilled users.

Page 104: Privacy-Aware Data Management in Information Networks

Open questions and future directions

• Anonymity: models of adversary knowledge, new attacks, new network transformations, improved utility evaluation.

• Differential privacy: adapting privacy definition to networks, mechanisms for accurate estimates of new network statistics, synthetic network generation, error-optimal mechanisms,

• Extended data model: attributes on nodes/edges, dynamic network data.

Page 105: Privacy-Aware Data Management in Information Networks

References

• [Backstrom, WWW 07] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW 2007.

• [Becker, W2SP 09] J. Becker and H. Chen. Measuring privacy risk in online social networks. In W2SP 2009.

• [Behoora, PVLDB 11] I. Behoora, V. Karwa, S. Raskhodnikova, A. Smith, G. Yaroslavtsev. Private Analysis of Graph Structure. In PVLDB 2011.

• [Besmer, CHI 10] A. Besmer and H. Lipford. Moving beyond untagging: photo privacy in a tagged world. In CHI 2010.

• [Besmer, SOUPS 10] A. Besmer, J. Watson, and H. Lipford. The impact of social navigation on privacy policy configuration. In SOUPS 2010.

• [Bhagat, WWW 10] S. Bhagat, G. Cormode, B. Krishnamurthy, D. Srivastava. Privacy in dynamic social networks. In WWW 2010.

• [Campan, PinKDD 08] A. Campan and T. M. Truta. A clustering approach for data and structural anonymity in social networks. In PinKDD 2008.

Page 106: Privacy-Aware Data Management in Information Networks

References (continued)

• [Chen, Foundations and Trends in Databases 09] B. Chen, D. Kifer, K. LeFevre, and A. Machanavajjhala. Privacy-Preserving Data Publishing. In Foundations and Trends in Databases 2009.

• [Cheng, SIGMOD 10] J. Cheng, A. Wai-Chee Fu, and J. Liu. K-Isomorphism: Privacy Preserving Network Publication against Structural Attacks. In SIGMOD 2010.

• [Cormode, PVLDB 08] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang: Anonymizing bipartite graph data using safe groupings. In PVLDB 2008.

• [Cormode, PVLDB 09] G. Cormode, D. Srivastava, S. Bhagat, and B. Krishnamurthy. Class-based graph anonymization for social network data. In PVLDB 2009.

• [Danezis, AISec 09] G. Danezis. Inferring privacy policies for social networking services. In AISec 2009.

• [Dwork, CACM 10] C. Dwork. A firm foundation for privacy. In CACM 2010.

• [Dwork, STOC 09] C. Dwork and J. Lei. Differential privacy and robust statistics. In STOC 2009.

Page 107: Privacy-Aware Data Management in Information Networks

References (continued)

• [Dwork, TCC 06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC 2006.

• [Fang, WWW 10] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In WWW 2010.

• [Goldenberg, Foundations 10] A. Goldenberg, S. Fienberg, A. Zheng, E. Airoldi. A Survey of Statistical Network Models. In Foundations and Trends in Machine Learning 2010.

• [Hay, ICDM 09] M. Hay, C. Li, G. Miklau, and D. Jensen. Accurate estimation of the degree distribution of private networks. In ICDM 2009.

• [Hay, Privacy-Aware Knowledge Discovery 10] M. Hay, G. Miklau, and D. Jensen. Enabling Accurate Analysis of Private Network Data. In Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques 2010.

• [Hay, PVLDB 08] M. Hay, G. Miklau, D. Jensen, and D. Towsley. Resisting structural identification in anonymized social networks. In PVLDB 2008.

• [Hay, PVLDB 10] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially-private queries through consistency. In PVLDB 2010.

Page 108: Privacy-Aware Data Management in Information Networks

References (continued)

• [Hay, VLDBJ 10] M. Hay, G. Miklau, D. Jensen, D. Towsley, and C. Li. Resisting structural re-identification in anonymized social networks. In VLDB Journal 2010.

• [Hunter, JASA 08] D. Hunter, S. Goodreau, and M. Handcock. Goodness of fit of social network models. In JASA 2008.

• [Jones, SOUPS 10] S. Jones, E. O’Neill. Feasibility of structural network clustering for group-based privacy control in social networks. In SOUPS 2010.

• [Kifer, SIGMOD 11] D. Kifer and A. Machanavajjhala. No Free Lunch in Data Privacy. In SIGMOD 2011.

• [Krishnamurthy, WWW 09] B. Krishnamurthy, C. Wills. Privacy diffusion on the web: a longitudinal perspective. In WWW 2009.

• [Lindamood, WWW 09] J. Lindamood, R. Heatherly, M. Kantarcioglu, B. Thuraisingham. Inferring private information using social network data. In WWW 2009.

• [Lipford, CHI 10] H. Lipford, J. Watson, M. Whitney, K. Froiland, R. Reeder. Visual vs. compact: a comparison of privacy policy interfaces. In CHI 2010.

Page 109: Privacy-Aware Data Management in Information Networks

References (continued)

• [Liu, ICDM 09] K. Liu and E. Terzi. A framework for computing privacy scores of users in online social networks. In ICDM 2009.

• [Liu, Next Generation Data Mining 08] K. Liu, K. Das, T. Grandison, and H. Kargupta. Privacy-Preserving Data Analysis on Graphs and Social Networks. In Next Generation of Data Mining 2008.

• [Liu, SDM 09] L. Liu, J. Wang, J. Liu, and J. Zhang. Privacy Preservation in Social Networks with Sensitive Edge Weights. In SDM 2009.

• [Liu, SIGMOD 08] K. Liu and E. Terzi. Towards identity anonymization on graphs. In SIGMOD 2008.

• [Machanavajjhala, PVLDB 11] A. Machanavajjhala, A. Korolova, and A. Das Sarma. Personalized Social Recommendations: Accurate or Private? In PVLDB 2011.

• [Mazzia, CHI 11] A. Mazzia, K. LeFevre, and E. Adar. A tool for privacy comprehension. In CHI 2011.

Page 110: Privacy-Aware Data Management in Information Networks

References (continued)

• [Mislove, WSDM 10] A. Mislove, B. Viswanath, K. Gummadi, P. Druschel. You are who you know: Inferring user profiles in online social networks. In WSDM 2010.

• [Narayanan, OAKL 09] A. Narayanan and V. Shmatikov. De-anonymizing social networks. In Security and Privacy 2009.

• [Nissim, STOC 07] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC 2007.

• [Rastogi, PODS 09] V. Rastogi, M. Hay, G. Miklau, and D. Suciu. Relationship privacy: Output perturbation for queries with joins. In PODS 2009.

• [Seigneur, Trust Management 04] J. Seigneur and C. Damsgaard Jensen. Trading privacy for trust. In Trust Management 2004.

• [Shehab, WWW 10] M. Shehab, G. Cheek, H. Touati, A. Squicciarini, and P. Cheng. Learning based access control in online social networks. In WWW 2010.

• [Smith, IPAM 10] A. Smith. In IPAM Workshop on Statistical and Learning-Theoretic Challenges in Data Privacy 2010.

Page 111: Privacy-Aware Data Management in Information Networks

References (continued)

• [Squicciarini, WWW 09] A. Squicciarini, M. Shehab, F. Paci. Collective privacy management in social networks. In WWW 2009.

• [Wu, Managing and Mining Graph Data 10] X. Wu, X. Ying, K. Liu, and L. Chen. A Survey of Algorithms for Privacy-Preservation of Graphs and Social Networks. In Managing and Mining Graph Data 2010.

• [Wu, SDM 10] L. Wu, X. Ying, and X. Wu. Reconstruction from Randomized Graph via Low Rank Approximation. In SDM 2010.

• [Ying, SDM 08] X. Ying and X. Wu. Randomizing social networks: a spectrum preserving approach. In SDM 2008.

• [Ying, SNA-KDD 09] X. Ying, K. Pan, X. Wu, and L. Guo. Comparisons of Randomization and K-degree Anonymization Schemes for Privacy Preserving Social Network Publishing. In SNA-KDD 2009.

• [Zheleva, PinKDD 07] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. In PinKDD 2007.

Page 112: Privacy-Aware Data Management in Information Networks

References (continued)

• [Zheleva, WWW 09] E. Zheleva and L. Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In WWW 2009.

• [Zhou, ICDE 08] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks. In ICDE 2008.

• [Zhou, KIS 10] B. Zhou and J. Pei. k-Anonymity and l-Diversity Approaches for Privacy Preservation in Social Networks against Neighborhood Attacks. In Knowledge and Information Systems: An International Journal 2010.

• [Zhou, SIGKDD 08] B. Zhou, J. Pei, and W. Luk. A Brief Survey on Anonymization Techniques for Privacy Preserving Publishing of Social Network Data. In SIGKDD Explorations 2008.

• [Zou, PVLDB 09] L. Zou, L. Chen, and M. T. Özsu. K-automorphism: A general framework for privacy preserving network publication. In PVLDB 2009.