large graph mining – patterns, tools and cascade analysis by christos faloutsos

Upload: bigmine

Post on 16-Apr-2017


TRANSCRIPT

Mining large graphs

Large Graph Mining - Patterns, Explanations and Cascade Analysis
Christos Faloutsos
CMU

CMU SCS

KDD'13 BIGMINE, (c) 2013, C. Faloutsos

Thank you!
  Wei FAN, Albert BIFET, Qiang YANG, Philip YU

Roadmap
  Introduction - Motivation
    Why study (big) graphs?
  Part#1: Patterns in graphs
  Part#2: Cascade analysis
  Conclusions
  [Extra: ebay fraud; tensors; spikes]

Graphs - why should we care?
  Internet Map [lumeta.com]
  Food Web [Martinez 91]
  >$10B revenue; >0.5B users
  web-log (blog) news propagation
  computer network security: email/IP traffic and anomaly detection
  Recommendation systems
  ...
  Many-to-many db relationship -> graph

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
    Static graphs
    Time-evolving graphs
    Why so many power-laws?
  Part#2: Cascade analysis
  Conclusions

Part 1: Patterns & Laws

Laws and patterns
  Q1: Are real graphs random?
  A1: NO!!
    Diameter
    in- and out- degree distributions
    other (surprising) patterns
  Q2: why no good cuts?
  A2:
  So, let's look at the data

Solution# S.1
  Power law in the degree distribution [SIGCOMM99]
  Plot: log(degree) vs. log(rank) for internet domains (att.com, ibm.com); slope -0.82
  Q: So what?
  A1: # of two-step-away pairs (= friends of friends, F.O.F.): O(d_max^2) ~ 10M^2
    ~0.8PB -> a data center(!) [DCO @ CMU]
    Gaussian trap
  Such patterns -> New algorithms
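The slope on the rank-degree plot can be estimated with an ordinary least-squares fit in log-log space. The sketch below is only an illustration: the function name `loglog_slope` and the synthetic degree list are assumptions, not part of the talk. The synthetic data follows an exact power law with the slide's exponent (-0.82), so the fit should recover it.

```python
import math

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), rank 1 = largest."""
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic_degrees = [1000.0 * rank ** -0.82 for rank in range(1, 201)]
slope = loglog_slope(synthetic_degrees)
print(round(slope, 2))
```

On real degree data the points scatter around the line, so the fitted slope is only an estimate.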

Solution# S.2: Eigen Exponent E
  A2: power law in the eigenvalues of the adjacency matrix (A x = λ x)
  Plot: eigenvalue vs. rank of decreasing eigenvalue (May 2001)
  Exponent = slope; E = -0.48

Roadmap
  Introduction - Motivation
  Problem#1: Patterns in graphs
    Static graphs: degree, diameter, eigen, triangles
    Time evolving graphs
  Problem#2: Tools

Solution# S.3: Triangle Laws
  Real social networks have a lot of triangles
    Friends of friends are friends
  Any patterns?
    2x the friends, 2x the triangles?

Triangle Law: #S.3 [Tsourakakis ICDM 2008]
  Panels: SN, Reuters, Epinions
  X-axis: degree; Y-axis: mean # triangles
  n friends -> ~n^1.6 triangles

Triangle Law: Computations [Tsourakakis ICDM 2008]
  But: triangles are expensive to compute (3-way join; several approx. algos) - O(d_max^2)
  Q: Can we do that quickly?
  A: Yes!
    #triangles = 1/6 Sum(λ_i^3), where A x = λ x
    (and, because of skewness (S2), we only need the top few eigenvalues! - O(E))
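The eigenvalue formula rests on the identity Sum(λ_i^3) = trace(A^3): each triangle contributes six closed walks of length 3. The sketch below checks that identity on a toy graph by comparing trace(A^3)/6 with brute-force enumeration. It computes the exact trace, not the top-few-eigenvalues approximation mentioned on the slide. The graph and the function names are illustrative assumptions.

```python
from itertools import combinations

def matmul(a, b):
    """Dense product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangles_by_trace(adj):
    """trace(A^3) / 6, which equals (1/6) * sum of lambda_i^3."""
    a3 = matmul(matmul(adj, adj), adj)
    return sum(a3[i][i] for i in range(len(adj))) // 6

def triangles_by_enumeration(adj):
    """Count node triples whose three edges are all present."""
    n = len(adj)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])

# Hypothetical toy graph: a 4-clique (nodes 0-3) plus a pendant node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
n_nodes = 5
adj = [[0] * n_nodes for _ in range(n_nodes)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

trace_count = triangles_by_trace(adj)
brute_count = triangles_by_enumeration(adj)
print(trace_count, brute_count)
```

For a large graph one would replace the dense cube with a few leading eigenvalues from a sparse eigensolver, which is where the skewed spectrum (S.2) pays off.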

Triangle counting for large graphs?
  Anomalous nodes in Twitter (~3 billion edges)
  [U Kang, Brendan Meeder, +, PAKDD11]


Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
    Static graphs
      Power law degrees; eigenvalues; triangles
      Anti-pattern: NO good cuts!
    Time-evolving graphs
  ...
  Conclusions

Background: Graph cut problem
  Given a graph, and k
  Break it into k (disjoint) communities
  (assume: block diagonal = cavemen graph)
  Example: k = 2

Many algos for graph partitioning
  METIS [Karypis, Kumar +]
  2nd eigenvector of Laplacian
  Modularity-based [Girvan+Newman]
  Max flow [Flake+]

Strange behavior of min cuts
  Subtle details: next
  Preliminaries: min-cut plots of usual graphs
  NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
  Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.

Min-cut plot
  Do min-cuts recursively.
  Axes: log(mincut-size / #edges) vs. log(# edges)
  N nodes; mincut size = sqrt(N)
  Each new min-cut adds a point; slope = -0.5
  [arrow: better cut]
  For a d-dimensional grid, the slope is -1/d
  For a random graph (and clique), the slope is 0
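The -1/d slope for grids follows from a short counting argument. The derivation below is a sketch under simplifying assumptions that are not stated on the slides: a grid with equal side length in every dimension, the balanced cut taken perpendicular to one axis, and boundary effects ignored.

```latex
% d-dimensional grid with N nodes: side length n = N^{1/d}.
\begin{align*}
  \text{mincut}(N) &\approx n^{d-1} = N^{(d-1)/d} = N^{1 - 1/d}, \\
  \#\text{edges}(N) &\approx d\,N, \\
  \frac{\text{mincut}}{\#\text{edges}} &\approx \frac{1}{d}\, N^{-1/d}
      \;\propto\; (\#\text{edges})^{-1/d}, \\
  \log\frac{\text{mincut}}{\#\text{edges}} &\approx -\frac{1}{d}\,\log(\#\text{edges}) + \text{const}.
\end{align*}
% d = 2: mincut = \sqrt{N} and slope -1/2, matching the example plot.
% Read backwards, an observed slope of -0.4 corresponds to d = 1/0.4 = 2.5.
```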

Experiments
  Datasets:
    Google Web Graph: 916,428 nodes and 5,105,039 edges
    Lucent Router Graph: undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
    User Website Clickstream Graph: 222,704 nodes and 952,580 edges
  NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy

Min-cut plot
  What does it look like for a real-world graph?

Experiments
  Used the METIS algorithm [Karypis, Kumar, 1995]
  Axes: log(mincut-size / #edges) vs. log(# edges)
  Google Web graph
    Values along the y-axis are averaged
    "lip" for large # edges
    Slope of ~ -0.4 corresponds to a 2.5-dimensional grid!
    [arrow: better cut]
  Same results for other graphs too
    Lucent Router graph, Clickstream graph: slopes ~ -0.57 and ~ -0.45

Why no good cuts?
  Answer: self-similarity (few foils later)

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
    Static graphs
    Time-evolving graphs
    Why so many power-laws?
  Part#2: Cascade analysis
  Conclusions

Problem: Time evolution
  with Jure Leskovec (CMU -> Stanford)
  and Jon Kleinberg (Cornell - sabb. @ CMU)
  Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005

T.1 Evolution of the Diameter
  Prior work on Power Law graphs hints at slowly growing diameter:
    [diameter ~ O(N^(1/3))]
    diameter ~ O(log N)
    diameter ~ O(log log N)
    (i.e., as the network grows, the distances between nodes slowly grow)
  What is happening in real data?
  Diameter shrinks over time

T.1 Diameter - Patents
  Patent citation network
  25 years of [email protected] M nodes
  16.5 M edges
  Plot: diameter vs. time [years]

T.2 Temporal Evolution of the Graphs
  N(t) ... nodes at time t
  E(t) ... edges at time t
  Suppose that N(t+1) = 2 * N(t)
  Q: what is your guess for E(t+1) =? 2 * E(t)
    (Say, k friends on average)
  A: over-doubled! ~3x
    But obeying the "Densification Power Law"
  Gaussian trap
  [Plots: log-log vs. lin-lin]

T.2 Densification - Patent Citations
  Citations among patents [email protected] M nodes
  16.5 M edges
  Each year is a datapoint
  Plot: E(t) vs. N(t); slope 1.66
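The "~3x" answer follows directly from the slope: if E(t) grows as N(t)^a, then multiplying the node count by g multiplies the edge count by g^a. A minimal sketch of that arithmetic, using the slope 1.66 from the patent plot (the function name is an illustrative assumption):

```python
DENSIFICATION_EXPONENT = 1.66  # slope from the patent citation plot

def edge_growth_factor(node_growth_factor, exponent=DENSIFICATION_EXPONENT):
    """If E ~ N^a, multiplying N by g multiplies E by g^a."""
    return node_growth_factor ** exponent

factor = edge_growth_factor(2.0)
print(round(factor, 2))
```

An exponent of exactly 1 would give the naive "2x edges" guess (constant average degree); any exponent above 1 means the graph densifies over time.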

MORE Graph Patterns
  RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD'09.
  Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal)
  Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
    Why so many power-laws?
    Why no good cuts?
  Part#2: Cascade analysis
  Conclusions

2 Questions, one answer
  Q1: why so many power laws?
  Q2: why no good cuts?
  A (possible): Self-similarity = fractals = RMAT ~ Kronecker graphs

20'' intro to fractals
  Remove the middle triangle; repeat
  -> Sierpinski triangle
  (Bonus question - dimensionality?
    >1 (inf. perimeter (4/3)))
  Self-similarity -> no char. scale -> power laws, eg:
    2x the radius, 3x the #neighbors nn(r)
      nn(r) = C r^(log3/log2)
      Fractal dim. = log3/log2 = 1.58
    [plane] 2x the radius, 4x neighbors
      nn = C r^(log4/log2) = C r^2
      Fractal dim. = 2
  Reminder: Densification P.L. (2x nodes, ~3x edges): E(t) vs. N(t), slope 1.66
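The exponents on these slides come from a one-line argument. As a sketch, assume the neighbor count has the power-law form nn(r) = C r^D and apply the "2x the radius" observations:

```latex
\begin{align*}
  nn(2r) = 3\,nn(r) \;&\Rightarrow\; C\,(2r)^{D} = 3\,C\,r^{D}
      \;\Rightarrow\; 2^{D} = 3
      \;\Rightarrow\; D = \frac{\log 3}{\log 2} \approx 1.58, \\
  nn(2r) = 4\,nn(r) \;&\Rightarrow\; 2^{D} = 4
      \;\Rightarrow\; D = \frac{\log 4}{\log 2} = 2 .
\end{align*}
% The same argument applied to nodes and edges: 2x the nodes with ~3x the
% edges gives an exponent of about log 3 / log 2, close to the fitted 1.66.
```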

How does self-similarity help in graphs?
  A: RMAT/Kronecker generators
    With self-similarity, we get all power-laws, automatically,
    And small/shrinking diameter
    And "no good cuts"
  R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA
  Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, in PKDD 2005, Porto, Portugal

Graph gen.: Problem dfn
  Given a growing graph with count of nodes N1, N2, ...
  Generate a realistic sequence of graphs that will obey all the patterns
    Static Patterns
      S1 Power Law Degree Distribution
      S2 Power Law eigenvalue and eigenvector distribution
      Small Diameter
    Dynamic Patterns
      T2 Growth Power Law (2x nodes; 3x edges)
      T1 Shrinking/Stabilizing Diameters

Kronecker Graphs
  [Figure: adjacency matrix -> intermediate stage -> adjacency matrix]
  Continuing multiplying with G1 we obtain G4 and so on
  [Figure: G4 adjacency matrix]
  Holes within holes; communities within communities
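The "continuing multiplying with G1" step is a repeated Kronecker product of the initiator adjacency matrix with itself. The sketch below is a plain-Python illustration of the deterministic version. The 3-node initiator is a hypothetical example (the slides do not give the entries of G1), and the function names are assumptions.

```python
def kronecker(a, b):
    """Kronecker product of two square 0/1 matrices given as lists of lists."""
    nb = len(b)
    size = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]

def kronecker_power(g1, k):
    """G_k = G_1 (x) G_1 (x) ... (x) G_1, with k factors."""
    result = g1
    for _ in range(k - 1):
        result = kronecker(result, g1)
    return result

# Hypothetical 3-node initiator: a chain 0-1-2 with self-loops.
g1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
g4 = kronecker_power(g1, 4)

ones_in_g1 = sum(map(sum, g1))
ones_in_g4 = sum(map(sum, g4))
print(len(g4), ones_in_g1, ones_in_g4)
```

Each step multiplies the node count by 3 and the number of nonzero entries by 7 for this initiator, so the edge count grows as a power of the node count. That is the densification pattern, and the nested block structure is what produces "communities within communities".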

Properties:
  We can PROVE that
    Degree distribution is multinomial ~ power law
    Diameter: constant
    Eigenvalue distribution: multinomial
    First eigenvector: multinomial
  [new] Self-similarity -> power laws

Problem Definition
  Given a growing graph with nodes N1, N2, ...
  Generate a realistic sequence of graphs that will obey all the patterns
    Static Patterns
      Power Law Degree Distribution
      Power Law eigenvalue and eigenvector distribution
      Small Diameter
    Dynamic Patterns
      Growth Power Law
      Shrinking/Stabilizing Diameters
  First generator for which we can prove all these properties

Impact: Graph500
  Based on RMAT (= 2x2 Kronecker)
  Standard for graph benchmarks
  http://www.graph500.org/
  Competitions 2x year, with all major entities: LLNL, Argonne, ITC-U. Tokyo, Riken, ORNL, Sandia, PSC, ...
  R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA
  "To iterate is human, to recurse is divine"

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
    Q1: Why so many power-laws?
      A: real graphs -> self similar -> power laws
    Q2: Why no good cuts?
  Part#2: Cascade analysis
  Conclusions

Q2: Why no good cuts?
  A: self-similarity
    Communities within communities within communities ...

Kronecker Product - a Graph (REMINDER)
  Continuing multiplying with G1 we obtain G4 and so on
  [Figure: G4 adjacency matrix]
  Communities within communities within communities ...
    e.g., Linux users, Mac users, win users
  How many communities? 3? 9? 27?
  A: one - but not a typical, block-like community

[Plot: # songs vs. age]
  Communities?
  (Gaussian) Clusters?
  Piece-wise flat parts?
  Wrong questions to ask!

Summary of Part#1
  *many* patterns in real graphs
    Small & shrinking diameters
    Power-laws everywhere
    Gaussian trap
    no good cuts
  Self-similarity (RMAT/Kronecker): good model

Part 2: Cascades & Immunization

Why do we care?
  Information Diffusion
  Viral Marketing
  Epidemiology and Public Health
  Cyber Security
  Human mobility
  Games and Virtual Worlds
  Ecology
  ...

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
  Part#2: Cascade analysis
    (Fractional) Immunization
    Epidemic thresholds
  Conclusions

Fractional Immunization of Networks
  B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
  SDM 2013, Austin, TX

Whom to immunize?
  Dynamical Processes over networks
  [US-MEDICARE NETWORK 2005]
    Each circle is a hospital
    ~3,000 hospitals
    More than 30,000 patients transferred
  Problem: Given k units of disinfectant, whom to immunize?
  CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!
  Hospital-acquired inf.: 99K+ lives, $5B+ per year

Fractional Asymmetric Immunization
  [Figure: Hospital <-> Another Hospital]
  Drug-resistant Bacteria (like XDR-TB)
  Problem: Given k units of disinfectant, distribute them to maximize hospitals saved @ 365 days

Straightforward solution:
  Simulation:
    1. Distribute resources
    2. infect a few nodes
    3. Simulate evolution of spreading (10x, take avg)
    4. Tweak, and repeat step 1
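The simulation loop above can be sketched in a few lines. This is not the authors' code: the SIS dynamics, the rule that a protected node can never be infected (a simplification of fractional immunization), the parameter values, the toy star network and all function names are assumptions made for illustration.

```python
import random

def simulate_sis(neighbors, protected, beta, delta, seeds, steps, rng):
    """Run one SIS cascade and return how many nodes are infected at the end.

    Simplifying assumption: a protected node can never be infected.
    """
    infected = set(seeds) - protected
    for _ in range(steps):
        newly_infected = set()
        for node in infected:
            for nb in neighbors[node]:
                if nb not in infected and nb not in protected and rng.random() < beta:
                    newly_infected.add(nb)
        recovered = {node for node in infected if rng.random() < delta}
        infected = (infected - recovered) | newly_infected
    return len(infected)

def average_footprint(neighbors, protected, runs=10, seed=0):
    """Average final footprint over `runs` simulations ("10x, take avg")."""
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    total = 0
    for _ in range(runs):
        seeds = rng.sample(nodes, 2)  # infect a few nodes
        total += simulate_sis(neighbors, protected, beta=0.3, delta=0.2,
                              seeds=seeds, steps=50, rng=rng)
    return total / runs

# Hypothetical toy network: a star with hub 0 and leaves 1..9.
neighbors = {0: list(range(1, 10))}
neighbors.update({leaf: [0] for leaf in range(1, 10)})

# "Tweak, and repeat": try every single-node allocation and keep the best one.
scores = {node: average_footprint(neighbors, protected={node}) for node in neighbors}
best_node = min(scores, key=scores.get)
everyone_protected = average_footprint(neighbors, protected=set(neighbors))
print(best_node, scores[best_node], everyone_protected)
```

Even on this toy graph, every candidate allocation costs `runs` full simulations, which is why exhaustive simulation becomes impractical on networks with thousands of nodes.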

Running Time (wall-clock time)
  Simulations: > 1 week
  SMART-ALLOC: 14 secs
  > 30,000x speed-up!
  [arrow: better]

Experiments
  K = 120
  Plot: # infected vs. # epochs; uniform vs. SMART-ALLOC
  [arrow: better]

What is the silver bullet?
  A: Try to decrease connectivity of graph
  Q: how to measure connectivity?
    Avg degree? Max degree?
    Std degree / avg degree?
    Diameter?
    Modularity?
    Conductance (~min cut size)?
    Some combination of above?
  A: first eigenvalue of adjacency matrix
    (not avg degree, max degree, diameter, modularity, or conductance)
  Q1: why??
  (Q2: dfn & intuition of eigenvalue?)

Why eigenvalue?
  A1: "G2 theorem" and "eigen-drop":
    For (almost) any type of virus
    For any network
    -> no epidemic, if small-enough first eigenvalue (λ1) of adjacency matrix
  Heuristic: for immunization, try to min λ1
  The smaller λ1, the closer to extinction.

G2 theorem (~10 pages proof)
  Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks, B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos, IEEE ICDM 2011, Vancouver, Canada
  (extended version, in arxiv: http://arxiv.org/abs/1004.0060)

Our thresholds for some models
  s = effective strength
  s < 1: below threshold

  Models                                          Effective Strength (s)    Threshold (tipping point)
  SIS, SIR, SIRS, SEIR                            s = λ · (β/δ)             s = 1
    (no immunity, temp. immunity, w/ incubation)
  SIV, SEIV                                       s = λ · (...)             s = 1
  (H.I.V.)                                        s = λ · (...)             s = 1

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
  Part#2: Cascade analysis
    (Fractional) Immunization
    intuition behind λ1
  Conclusions

Intuition for λ
  "Official" definitions:
    Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A - xI)].
    Also: A x = λ x
  Neither gives much intuition!
  "Un-official" Intuition
    For "homogeneous" graphs, λ == degree
    λ ~ avg degree, done right, for skewed degree distributions

Largest Eigenvalue (λ)
  Three example graphs, N = 1000 nodes:
    λ ≈ 2
    λ ≈ √N (31.67)
    λ = N - 1 (999)
  better connectivity -> higher λ
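These values can be reproduced numerically. The sketch below assumes the three pictured graphs are a chain, a star and a clique, which are the standard examples with eigenvalues near 2, sqrt(N-1) and N-1. It uses power iteration on A + I, where the shift by the identity is an implementation choice to avoid oscillation on bipartite graphs such as the chain and the star. N = 20 keeps the pure-Python run fast; the slide uses N = 1000.

```python
import math

def largest_eigenvalue(adj, iterations=2000):
    """Power iteration on A + I; returns the Rayleigh quotient x^T A x."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

def empty(n):
    return [[0] * n for _ in range(n)]

def chain(n):
    adj = empty(n)
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj

def star(n):
    adj = empty(n)
    for i in range(1, n):
        adj[0][i] = adj[i][0] = 1
    return adj

def clique(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

N = 20
lam_chain = largest_eigenvalue(chain(N))    # known value: 2*cos(pi/(N+1)), just under 2
lam_star = largest_eigenvalue(star(N))      # known value: sqrt(N-1)
lam_clique = largest_eigenvalue(clique(N))  # known value: N-1
print(lam_chain, lam_star, lam_clique)
```

The ordering chain < star < clique illustrates the slide's point that λ tracks connectivity, which average degree alone does not capture: the chain and the star have almost the same number of edges.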

Examples: Simulations - SIR (mumps)
  (a) Infection profile: fraction of infections vs. time ticks
  (b) "Take-off" plot: footprint vs. effective strength
  PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Examples: Simulations - SIRS (pertussis)
  (a) Infection profile: fraction of infections vs. time ticks
  (b) "Take-off" plot: footprint vs. effective strength
  PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Immunization - conclusion
  In (almost any) immunization setting,
    Allocate resources so as to
    Minimize λ1
    (regardless of virus specifics)
  Conversely, in a market penetration setting
    Allocate resources to
    Maximize λ1

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
  Part#2: Cascade analysis
    (Fractional) Immunization
    Epidemic thresholds
  What next?
  Acks & Conclusions
  [Tools: ebay fraud; tensors; spikes]

Challenge #1: Connectome - brain wiring
  Which neurons get activated by "bee"
  How wiring evolves
  Modeling epilepsy
  N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

Challenge #2: Time evolving networks / tensors
  Periodicities? Burstiness?
  What is typical behavior of a node, over time
  Heterogeneous graphs (= nodes w/ attributes)

Roadmap
  Introduction - Motivation
  Part#1: Patterns in graphs
  Part#2: Cascade analysis
    (Fractional) Immunization
    Epidemic thresholds
  Acks & Conclusions
  [Tools: ebay fraud; tensors; spikes] (off line)

Thanks
  Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
  Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies

Faloutsos et al, 8/10/13

Project info: PEGASUS
  www.cs.cmu.edu/~pegasus
  Results on large graphs: with Pegasus + hadoop + M45
  Apache license
  Code, papers, manual, video
  Prof. U Kang, Prof. Polo Chau

Cast
  Akoglu, Leman
  Chau, Polo
  Kang, U
  McGlohon, Mary
  Tong, Hanghang
  Prakash, Aditya
  Koutra, Danai
  Beutel, Alex
  Papalexakis, Vagelis

CONCLUSION#1 - Big data
  Large datasets reveal patterns/outliers that are invisible otherwise

CONCLUSION#2 - self-similarity
  powerful tool / viewpoint
    Power laws; shrinking diameters
    Gaussian trap (eg., F.O.F.)
    no good cuts
    RMAT - graph500 generator

CONCLUSION#3 - eigen-drop
  Cascades & immunization: G2 theorem & eigenvalue
  CURRENT PRACTICE vs. OUR METHOD [US-MEDICARE NETWORK 2005]: ~6x fewer!
  14 secs: > 30,000x speed-up!

References
  D. Chakrabarti, C. Faloutsos: Graph Mining - Laws, Tools and Case Studies, Morgan Claypool 2012
  http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

TAKE HOME MESSAGE:
  Cross-disciplinarity

QUESTIONS?
