large graph mining - department of computer science and ... › ~hillol › ngdm07 › abstracts ›...

68
CMU SCS Large Graph Mining Christos Faloutsos CMU

Upload: others

Post on 05-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

CMU SCS

Large Graph Mining

Christos Faloutsos

CMU

Page 2: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 2

CMU SCS

Thank you!

• Hillol Kargupta

Page 3: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 3

CMU SCS

Outline

• Problem definition / Motivation

• Static & dynamic laws; generators

• Tools: CenterPiece graphs; fraud detection

• Conclusions

Page 4: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 4

CMU SCS

Motivation

Data mining: ~ find patterns (rules, outliers)

• Problem#1: How do real graphs look like?

• Problem#2: How do they evolve?

• Problem#3: How to generate realistic graphs

TOOLS

• Problem#4: Who is the ‘master-mind’?

• Problem#5: Fraud detection

Page 5: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 5

CMU SCS

Problem#1: Joint work with

Dr. Deepayan Chakrabarti

(CMU/Yahoo R.L.)

Page 6: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 6

CMU SCS

Graphs - why should we care?

Internet Map

[lumeta.com]

Food Web

[Martinez ’91]

Protein Interactions

[genomebiology.com]

Friendship Network

[Moody ’01]

Page 7: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 7

CMU SCS

Graphs - why should we care?

• IR: bi-partite graphs (doc-terms)

• web: hyper-text graph

• ... and more:

D1

DN

T1

TM

... ...

Page 8: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 8

CMU SCS

Graphs - why should we care?

• network of companies & board-of-directors

members

• ‘viral’ marketing

• web-log (‘blog’) news propagation

• computer network security: email/IP traffic

and anomaly detection

• ....

Page 9: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 9

CMU SCS

Problem #1 - network and graph

mining

• How does the Internet look like?

• How does the web look like?

• What is ‘normal’/‘abnormal’?

• which patterns/laws hold?

Page 10: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 10

CMU SCS

Graph mining

• Are real graphs random?

Page 11: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 11

CMU SCS

Laws and patterns

• Are real graphs random?

• A: NO!!

– Diameter

– in- and out- degree distributions

– other (surprising) patterns

Page 12: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 12

CMU SCS

Solution#1

• Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

Page 13: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 13

CMU SCS

Solution#1’: Eigen Exponent E

• A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

Page 14: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 14

CMU SCS

But:

How about graphs from other domains?

Page 15: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 15

CMU SCS

More power laws:

• web hit counts [w/ A. Montgomery]

Web Site Traffic

log(in-degree)

log(count)

Zipf

userssites

``ebay’’

Page 16: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 16

CMU SCS

epinions.com

• who-trusts-whom

[Richardson +

Domingos, KDD

2001]

(out) degree

count

trusts-2000-people user

Page 17: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 17

CMU SCS

Motivation

Data mining: ~ find patterns (rules, outliers)

• Problem#1: How do real graphs look like?

• Problem#2: How do they evolve?

• Problem#3: How to generate realistic graphs

TOOLS

• Problem#4: Who is the ‘master-mind’?

• Problem#5: Fraud detection

Page 18: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 18

CMU SCS

Problem#2: Time evolution

• with Jure Leskovec

(CMU/MLD)

• and Jon Kleinberg (Cornell –

sabb. @ CMU)

Page 19: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 19

CMU SCS

Evolution of the Diameter

• Prior work on Power Law graphs hints

at slowly growing diameter:

– diameter ~ O(log N)

– diameter ~ O(log log N)

• What is happening in real data?

Page 20: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 20

CMU SCS

Evolution of the Diameter

• Prior work on Power Law graphs hints

at slowly growing diameter:

– diameter ~ O(log N)

– diameter ~ O(log log N)

• What is happening in real data?

• Diameter shrinks over time

Page 21: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 21

CMU SCS

Diameter – ArXiv citation graph

• Citations among

physics papers

• 1992 –2003

• One graph per

year

time [years]

diameter

Page 22: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 22

CMU SCS

Diameter – “Autonomous

Systems”

• Graph of Internet

• One graph per

day

• 1997 – 2000

number of nodes

diameter

Page 23: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 23

CMU SCS

Diameter – “Affiliation Network”

• Graph of

collaborations in

physics – authors

linked to papers

• 10 years of data

time [years]

diameter

Page 24: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 24

CMU SCS

Diameter – “Patents”

• Patent citation

network

• 25 years of data

time [years]

diameter

Page 25: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 25

CMU SCS

Temporal Evolution of the Graphs

• N(t) … nodes at time t

• E(t) … edges at time t

• Suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

Page 26: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 26

CMU SCS

Temporal Evolution of the Graphs

• N(t) … nodes at time t

• E(t) … edges at time t

• Suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

• A: over-doubled!

– But obeying the ``Densification Power Law’’

Page 27: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 27

CMU SCS

Densification – Physics Citations

• Citations among physics papers

• 2003:

– 29,555 papers, 352,807 citations

N(t)

E(t)

??

Page 28: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 28

CMU SCS

Densification – Physics Citations

• Citations among physics papers

• 2003:

– 29,555 papers, 352,807 citations

N(t)

E(t)

1.69

Page 29: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 29

CMU SCS

Densification – Physics Citations

• Citations among physics papers

• 2003:

– 29,555 papers, 352,807 citations

N(t)

E(t)

1.69

1: tree

Page 30: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 30

CMU SCS

Densification – Physics Citations

• Citations among physics papers

• 2003:

– 29,555 papers, 352,807 citations

N(t)

E(t)

1.69clique: 2

Page 31: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 31

CMU SCS

Densification – Patent Citations

• Citations among

patents granted

• 1999

– 2.9 million nodes

– 16.5 million

edges

• Each year is a

datapointN(t)

E(t)

1.66

Page 32: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 32

CMU SCS

Densification – Autonomous Systems

• Graph of

Internet

• 2000

– 6,000 nodes

– 26,000 edges

• One graph per

day

N(t)

E(t)

1.18

Page 33: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 33

CMU SCS

Densification – Affiliation

Network

• Authors linked

to their

publications

• 2002

– 60,000 nodes

• 20,000 authors

• 38,000 papers

– 133,000 edgesN(t)

E(t)

1.15

Page 34: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 34

CMU SCS

Motivation

Data mining: ~ find patterns (rules, outliers)

• Problem#1: How do real graphs look like?

• Problem#2: How do they evolve?

• Problem#3: How to generate realistic graphs

TOOLS

• Problem#4: Who is the ‘master-mind’?

• Problem#5: Fraud detection

Page 35: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 35

CMU SCS

Problem#3: Generation

• Given a growing graph with count of nodes N1,

N2, …

• Generate a realistic sequence of graphs that will

obey all the patterns

Page 36: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 36

CMU SCS

Problem Definition

• Given a growing graph with count of nodes N1, N2, …

• Generate a realistic sequence of graphs that will obey all the patterns

– Static PatternsPower Law Degree Distribution

Power Law eigenvalue and eigenvector distribution

Small Diameter

– Dynamic PatternsGrowth Power Law

Shrinking/Stabilizing Diameters

Page 37: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 37

CMU SCS

Problem Definition

• Given a growing graph with count of nodes

N1, N2, …

• Generate a realistic sequence of graphs that

will obey all the patterns

• Idea: Self-similarity

– Leads to power laws

– Communities within communities

– …

Page 38: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 38

CMU SCS

Adjacency matrix

Kronecker Product – a Graph

Intermediate stage

Adjacency matrix

Page 39: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 39

CMU SCS

Kronecker Product – a Graph

• Continuing multiplying with G1 we obtain G4 and

so on …

G4adjacency matrix

Page 40: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 40

CMU SCS

Kronecker Product – a Graph

• Continuing multiplying with G1 we obtain G4 and

so on …

G4adjacency matrix

Page 41: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 41

CMU SCS

Kronecker Product – a Graph

• Continuing multiplying with G1 we obtain G4 and

so on …

G4adjacency matrix

Page 42: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 42

CMU SCS

Properties:

• We can PROVE that

– Degree distribution is multinomial ~ power law

– Diameter: constant

– Eigenvalue distribution: multinomial

– First eigenvector: multinomial

• See [Leskovec+, PKDD’05] for proofs

Page 43: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 43

CMU SCS

Problem Definition

• Given a growing graph with nodes N1, N2, …

• Generate a realistic sequence of graphs that will obey all

the patterns

– Static Patterns

Power Law Degree Distribution

Power Law eigenvalue and eigenvector distribution

Small Diameter

– Dynamic Patterns

Growth Power Law

Shrinking/Stabilizing Diameters

• First and only generator for which we can prove

all these properties

������������

��������

Page 44: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 44

CMU SCS

(Q: how to fit the parm’s?)

A:

• Stochastic version of Kronecker graphs +

• Max likelihood +

• Metropolis sampling

• [Leskovec+, ICML’07]

Page 45: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 45

CMU SCS

Experiments on real AS graphDegree distribution Hop plot

Network valueAdjacency matrix eigen values

Page 46: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 46

CMU SCS

Conclusions

• Kronecker graphs have:

– All the static properties

Heavy tailed degree distributions

Small diameter

Multinomial eigenvalues and eigenvectors

– All the temporal properties

Densification Power Law

Shrinking/Stabilizing Diameters

– We can formally prove these results

����

����

����

����

����

Page 47: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 47

CMU SCS

Motivation

Data mining: ~ find patterns (rules, outliers)

• Problem#1: How do real graphs look like?

• Problem#2: How do they evolve?

• Problem#3: How to generate realistic graphs

TOOLS

• Problem#4: Who is the ‘master-mind’?

• Problem#5: Fraud detection

Page 48: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 48

CMU SCS

Problem#4: MasterMind – ‘CePS’

• w/ Hanghang Tong,

KDD 2006

• htong <at> cs.cmu.edu

Page 49: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 49

CMU SCS

Center-Piece Subgraph(Ceps)

• Given Q query nodes

• Find Center-piece ( )

• App.

– Social Networks

– Law Inforcement, …

• Idea:

– Proximity -> random walk with restarts

A C

B

b≤

Page 50: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 50

CMU SCS

Case Study: AND query

R. Agrawal Jiawei Han

V. Vapnik M. Jordan

Page 51: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 51

CMU SCS

Case Study: AND query

R. Agrawal Jiawei Han

V. Vapnik M. Jordan

H.V.

Jagadish

Laks V.S.

Lakshmanan

Heikki

Mannila

Christos

Faloutsos

Padhraic

Smyth

Corinna

Cortes

15 1013

1 1

6

1 1

4 Daryl

Pregibon

10

2

1

13

1

6

Page 52: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 52

CMU SCS

Case Study: AND query

R. Agrawal Jiawei Han

V. Vapnik M. Jordan

H.V.

Jagadish

Laks V.S.

Lakshmanan

Heikki

Mannila

Christos

Faloutsos

Padhraic

Smyth

Corinna

Cortes

15 1013

1 1

6

1 1

4 Daryl

Pregibon

10

2

1

13

1

6

Page 53: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 53

CMU SCS

Conclusions

• Q1:How to measure the importance?

• A1: RWR+K_SoftAnd

• Q2:How to do it efficiently?

• A2:Graph Partition (Fast CePS)

– ~90% quality

– 150x speedup (ICDM’06)

A C

B

Page 54: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 54

CMU SCS

Motivation

Data mining: ~ find patterns (rules, outliers)

• Problem#1: How do real graphs look like?

• Problem#2: How do they evolve?

• Problem#3: How to generate realistic graphs

TOOLS

• Problem#4: Who is the ‘master-mind’?

• Problem#5: Fraud detection

Page 55: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 55

CMU SCS

E-bay Fraud detection

w/ Polo Chau &

Shashank Pandit, CMU

Page 56: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 56

CMU SCS

E-bay Fraud detection

• lines: positive feedbacks

• would you buy from him/her?

Page 57: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 57

CMU SCS

E-bay Fraud detection

• lines: positive feedbacks

• would you buy from him/her?

• or him/her?

Page 58: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 58

CMU SCS

E-bay Fraud detection - NetProbe

Page 59: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 59

CMU SCS

OVERALL CONCLUSIONS

• Graphs pose a wealth of fascinating

problems

• self-similarity and power laws work,

when textbook methods fail!

• New patterns (shrinking diameter!)

• New generator: Kronecker

Page 60: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 60

CMU SCS

Promising directions

• Reaching out

– Sociology, epidemiology; physics, ++…

– Computer networks, security, intrusion det.

– Num. analysis (tensors)

time

IP-source

IP-destination

Page 61: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 61

CMU SCS

Promising directions – cont’d

• Scaling up, to Gb/Tb/Pb

– Storage Systems

– Parallelism (hadoop/map-reduce)

Page 62: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 62

CMU SCS

E.g.: self-* system @ CMU

• >200 nodes

• 40 racks of computing equipment

• 774kw of power.

• target: 1 PetaByte

• goal: self-correcting, self-securing, self-monitoring, self-...

Page 63: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 63

CMU SCS

DM for Tera- and Peta-bytes

Two-way street:

<- DM can use such infrastructures to find

patterns

-> DM can help such infrastructures become

self-healing, self-adjusting, ‘self-*’

Page 64: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 64

CMU SCS

References

• Hanghang Tong, Christos Faloutsos, and Jia-Yu

Pan Fast Random Walk with Restart and Its

Applications ICDM 2006, Hong Kong.

• Hanghang Tong, Christos Faloutsos Center-Piece

Subgraphs: Problem Definition and Fast

Solutions, KDD 2006, Philadelphia, PA

Page 65: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 65

CMU SCS

References

• Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations KDD 2005, Chicago, IL. ("Best Research Paper" award).

• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication(ECML/PKDD 2005), Porto, Portugal, 2005.

Page 66: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 66

CMU SCS

References

• Jure Leskovec and Christos Faloutsos, Scalable Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA

• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks WWW 2007, Banff,

Alberta, Canada, May 8-12, 2007.

• Jimeng Sun, Dacheng Tao, Christos Faloutsos Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA

Page 67: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 67

CMU SCS

References

• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix

Decomposition for Large Sparse Graphs, SDM,

Minneapolis, Minnesota, Apr 2007. [pdf]

Page 68: Large Graph Mining - Department of Computer Science and ... › ~hillol › NGDM07 › abstracts › ... · Large Graph Mining Christos Faloutsos CMU. NGDM 2007 C. Faloutsos 2 CMU

NGDM 2007 C. Faloutsos 68

CMU SCS

Contact info:

www. cs.cmu.edu /~christos

(w/ papers, datasets, code, etc)