Large Graph Mining – Patterns, Tools and Cascade Analysis, by Christos Faloutsos
TRANSCRIPT
Mining large graphs
Large Graph Mining - Patterns, Explanations and Cascade Analysis
Christos Faloutsos
CMU
CMU SCS
Thank you!
Wei FAN, Albert BIFET, Qiang YANG, Philip YU
KDD'13 BIGMINE (c) 2013, C. Faloutsos

Roadmap
Introduction - Motivation
  Why study (big) graphs?
Part#1: Patterns in graphs
Part#2: Cascade analysis
Conclusions
[Extra: ebay fraud; tensors; spikes]

Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez 91]
>$10B revenue
>0.5B users

Graphs - why should we care?
web-log (blog) news propagation
computer network security: email/IP traffic and anomaly detection
Recommendation systems
....
Many-to-many db relationship -> graph

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
  Static graphs
  Time-evolving graphs
  Why so many power-laws?
Part#2: Cascade analysis
Conclusions

Part 1: Patterns & Laws

Laws and patterns
Q1: Are real graphs random?
A1: NO!!
  Diameter
  in- and out- degree distributions
  other (surprising) patterns
Q2: why no good cuts?
A2:
So, let's look at the data

Solution# S.1
Power law in the degree distribution [SIGCOMM99]
[Figure: internet domains, axes log(rank) and log(degree); att.com and ibm.com marked; slope -0.82]
Q: So what?
A1: # of two-step-away pairs (= friends of friends, F.O.F.): O(d_max^2) ~ 10M^2
  ~0.8PB -> a data center(!)
  DCO @ CMU
  Gaussian trap
Such patterns -> New algorithms
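
A minimal sketch of why a skewed degree distribution makes friends-of-friends expensive: a node with d neighbors sits in the middle of d(d-1)/2 two-step paths, so a single hub can dominate the total. The toy graph and the function name `two_step_paths` are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict

def two_step_paths(edges):
    """Count length-2 paths (wedges) centered at each node of an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    # A node with d neighbors is the midpoint of d*(d-1)/2 distinct two-step paths.
    return {node: len(nb) * (len(nb) - 1) // 2 for node, nb in neighbors.items()}

# Hypothetical toy graph: one hub (node 0) with 1000 neighbors, plus a 1000-node chain.
hub_edges = [(0, i) for i in range(1, 1001)]
chain_edges = [(i, i + 1) for i in range(2000, 2999)]
wedges = two_step_paths(hub_edges + chain_edges)
hub_share = wedges[0] / sum(wedges.values())
print(f"hub wedges: {wedges[0]}, share of all wedges: {hub_share:.4f}")
```

In this toy case the hub alone accounts for nearly all two-step paths, which is the quadratic blow-up in d_max that the slide warns about.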

Solution# S.2: Eigen Exponent E
A2: power law in the eigenvalues of the adjacency matrix (A x = λ x)
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48]

Roadmap
Introduction - Motivation
Problem#1: Patterns in graphs
  Static graphs
    degree, diameter, eigen, Triangles
  Time evolving graphs
Problem#2: Tools

Solution# S.3: Triangle Laws
Real social networks have a lot of triangles
  Friends of friends are friends
Any patterns?
  2x the friends, 2x the triangles?

Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Figure panels: SN, Reuters, Epinions; X-axis: degree; Y-axis: mean # triangles]
n friends -> ~n^1.6 triangles

Triangle Law: Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute (3-way join; several approx. algos) - O(d_max^2)
Q: Can we do that quickly?
A: Yes!
  #triangles = 1/6 Sum (λ_i^3)
  (and, because of skewness (S2), we only need the top few eigenvalues! - O(E))
  where A x = λ x
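
The eigenvalue identity above can be checked on a small graph. This is a minimal sketch that uses a dense eigendecomposition from NumPy for simplicity; the O(E) claim on the slide would need a sparse top-k eigensolver instead. The toy graph and function names are assumptions for illustration.

```python
import numpy as np

def triangles_exact(A):
    """Exact triangle count of an undirected simple graph: trace(A^3) / 6."""
    return int(round(np.trace(A @ A @ A) / 6))

def triangles_top_eigs(A, k):
    """Approximate count from the k eigenvalues of largest magnitude."""
    eigvals = np.linalg.eigvalsh(A)  # A is symmetric, so eigenvalues are real
    top = eigvals[np.argsort(-np.abs(eigvals))[:k]]
    return float(np.sum(top ** 3) / 6)

# Hypothetical toy graph: a 5-clique (10 triangles) plus a pendant path 4-5-6.
n = 7
A = np.zeros((n, n))
for i in range(5):
    for j in range(i + 1, 5):
        A[i, j] = A[j, i] = 1
for u, v in [(4, 5), (5, 6)]:
    A[u, v] = A[v, u] = 1

exact = triangles_exact(A)
full = triangles_top_eigs(A, n)   # all eigenvalues: matches the exact count
rough = triangles_top_eigs(A, 1)  # top eigenvalue only: a coarse estimate
print(exact, round(full, 6), round(rough, 2))
```

Using all n eigenvalues reproduces the exact count; using fewer trades accuracy for speed, which pays off when the spectrum is as skewed as pattern S.2 suggests.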

Triangle counting for large graphs?
Anomalous nodes in Twitter (~3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD11]

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
  Static graphs
    Power law degrees; eigenvalues; triangles
    Anti-pattern: NO good cuts!
  Time-evolving graphs
....
Conclusions

Background: Graph cut problem
Given a graph, and k
Break it into k (disjoint) communities
(assume: block diagonal = cavemen graph)
[Figure: example with k = 2]

Many algos for graph partitioning
METIS [Karypis, Kumar +]
2nd eigenvector of Laplacian
Modularity-based [Girvan+Newman]
Max flow [Flake+]

Strange behavior of min cuts
Subtle details: next
  Preliminaries: min-cut plots of usual graphs
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.

Min-cut plot
Do min-cuts recursively.
[Figure: log (mincut-size / #edges) against log (# edges)]
N nodes: Mincut size = sqrt(N)
New min-cut
Slope = -0.5
Better cut

Min-cut plot
[Figure, two panels: log (mincut-size / #edges) against log (# edges)]
For a d-dimensional grid, the slope is -1/d
For a random graph (and clique), the slope is 0
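
The -1/d slope for a grid follows from a short counting argument, sketched here under the assumption of a d-dimensional grid with N nodes and side length N^{1/d} (constants are dropped). Halving the grid cuts one (d-1)-dimensional face, and each node has about d edges:

```latex
\[
\text{mincut}(N) \approx N^{(d-1)/d}, \qquad E \approx d\,N
\]
\[
\frac{\text{mincut}}{E} \approx \frac{1}{d}\,N^{-1/d}
\quad\Longrightarrow\quad
\log\frac{\text{mincut}}{E} \approx -\frac{1}{d}\,\log E + \text{const}
\]
```

For d = 2 this gives mincut = sqrt(N) and slope -0.5, matching the earlier min-cut plot; read in reverse, a measured slope of -0.4 corresponds to d = 2.5.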

Experiments
Datasets:
  Google Web Graph: 916,428 nodes and 5,105,039 edges
  Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
  User Website Clickstream Graph: 222,704 nodes and 952,580 edges
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy

Min-cut plot
What does it look like for a real-world graph?
[Figure: log (mincut-size / #edges) against log (# edges); shape left as a question mark]

Experiments
Used the METIS algorithm [Karypis, Kumar, 1995]
Google Web graph
  Values along the y-axis are averaged
  lip for large # edges
  Slope of -0.4, corresponds to a 2.5-dimensional grid!
[Figure: log (mincut-size / #edges) against log (# edges); slope ~ -0.4; annotation: better cut]

Experiments
Same results for other graphs too
[Figure panels: Lucent Router graph, Clickstream graph; log (mincut-size / #edges) against log (# edges); slopes ~ -0.57 and ~ -0.45]

Why no good cuts?
Answer: self-similarity (few foils later)

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
  Static graphs
  Time-evolving graphs
  Why so many power-laws?
Part#2: Cascade analysis
Conclusions

Problem: Time evolution
with Jure Leskovec (CMU -> Stanford)
and Jon Kleinberg (Cornell, sabb. @ CMU)
Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005

T.1 Evolution of the Diameter
Prior work on Power Law graphs hints at slowly growing diameter:
  [diameter ~ O(N^(1/3))]
  diameter ~ O(log N)
  diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
[Speaker notes: Diameter first, DPL second. Check diameter formulas. As the network grows the distances between nodes slowly grow.]

T.1 Diameter - Patents
Patent citation network
25 years of [email protected] M nodes
16.5 M edges
[Figure: diameter against time [years]]

T.2 Temporal Evolution of the Graphs
N(t): nodes at time t
E(t): edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is your guess for E(t+1) =? 2 * E(t)
  (Say, k friends on average)
A: over-doubled! ~ 3x
  But obeying the "Densification Power Law"
Gaussian trap
[Figure annotations: log-log, lin-lin]

T.2 Densification - Patent Citations
Citations among patents [email protected] M nodes
16.5 M edges
Each year is a datapoint
[Figure: E(t) against N(t); slope 1.66]
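
A minimal sketch of how the densification exponent can be read off snapshot data, and of why an exponent of 1.66 means roughly 3x the edges for 2x the nodes. The snapshot values are synthetic, generated to follow the power law exactly; they are not the patent data.

```python
import numpy as np

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    slope, _intercept = np.polyfit(np.log(nodes), np.log(edges), 1)
    return slope

# Hypothetical yearly snapshots generated to follow E = c * N^1.66 exactly.
a = 1.66
nodes = np.array([1e4, 2e4, 4e4, 8e4, 1.6e5])
edges = 0.5 * nodes ** a

fitted = densification_exponent(nodes, edges)
growth_per_doubling = 2 ** fitted
print(f"exponent ~ {fitted:.2f}; doubling N multiplies E by ~{growth_per_doubling:.2f}")
```

On real snapshots the points scatter around the line, so the fitted slope is an estimate rather than an exact value.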

MORE Graph Patterns
RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD09.
Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal)
Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies, Oct. 2012, Morgan Claypool.

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
  ...
  Why so many power-laws?
  Why no good cuts?
Part#2: Cascade analysis
Conclusions

2 Questions, one answer
Q1: why so many power laws
Q2: why no good cuts?
A (possible): Self-similarity = fractals = RMAT ~ Kronecker graphs

20'' intro to fractals
Remove the middle triangle; repeat -> Sierpinski triangle
(Bonus question - dimensionality? >1 (inf. perimeter (4/3)) ...)
Self-similarity -> no char. scale -> power laws, eg:
  2x the radius, 3x the #neighbors nn(r)
  nn(r) = C r^(log3/log2)
  Fractal dim. = log3/log2 = 1.58
Compare a plain 2-d plane:
  2x the radius, 4x neighbors
  nn = C r^(log4/log2) = C r^2
Reminder: Densification P.L. (2x nodes, ~3x edges)
[Figure: E(t) against N(t); slope 1.66]
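
A small numeric check of the "2x the radius, 3x the neighbors" rule on a discrete Sierpinski triangle. The grid construction (repeated Kronecker products of a 2x2 pattern) and the use of a corner-anchored square as the "radius" are simplifying assumptions made for this sketch.

```python
import numpy as np

def sierpinski(levels):
    """0/1 grid of side 2**levels holding a discrete Sierpinski triangle."""
    base = np.array([[1, 0], [1, 1]])
    grid = np.array([[1]])
    for _ in range(levels):
        grid = np.kron(base, grid)  # each 1 in base becomes a copy of the current grid
    return grid

grid = sierpinski(8)
radii = 2 ** np.arange(1, 9)
# Filled cells inside the corner square of side r play the role of nn(r).
counts = np.array([grid[:r, :r].sum() for r in radii])
slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
print(counts[:4], round(float(slope), 3))
```

Each doubling of r triples the count, so the log-log slope comes out at log3/log2, the fractal dimension quoted on the slide.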

How does self-similarity help in graphs?
A: RMAT/Kronecker generators
  With self-similarity, we get all power-laws, automatically,
  And small/shrinking diameter
  And "no good cuts"
R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, in PKDD 2005, Porto, Portugal

Graph gen.: Problem dfn
Given a growing graph with count of nodes N1, N2, ...
Generate a realistic sequence of graphs that will obey all the patterns
Static Patterns
  S1 Power Law Degree Distribution
  S2 Power Law eigenvalue and eigenvector distribution
  Small Diameter
Dynamic Patterns
  T2 Growth Power Law (2x nodes; 3x edges)
  T1 Shrinking/Stabilizing Diameters

Kronecker Graphs
[Figure: adjacency matrix -> intermediate stage -> adjacency matrix]

Kronecker Graphs
Continuing multiplying with G1 we obtain G4 and so on ...
[Figure: G4 adjacency matrix]
Holes within holes;
Communities within communities
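
The repeated multiplication with G1 can be sketched with NumPy's Kronecker product. The 3-node initiator below is a hypothetical example (a path with self-loops), not necessarily the one pictured on the slides, and the count reported is nonzero adjacency entries (self-loops and both edge directions included).

```python
import numpy as np

def kronecker_power(G1, k):
    """Adjacency matrix G_k = G1 (x) G1 (x) ... (x) G1, with k factors."""
    Gk = G1.copy()
    for _ in range(k - 1):
        Gk = np.kron(Gk, G1)
    return Gk

# Hypothetical 3-node initiator: a path 0-1-2 with self-loops.
G1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

for k in range(1, 5):
    Gk = kronecker_power(G1, k)
    print(k, Gk.shape[0], int(Gk.sum()))  # power, #nodes, #nonzero entries

G4 = kronecker_power(G1, 4)
exponent = np.log(G1.sum()) / np.log(G1.shape[0])
```

Nodes grow as 3^k and nonzero entries as 7^k, so entries scale as nodes^(log7/log3): a densification power law falls out of the construction, with the exponent set by the initiator.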

Properties:
We can PROVE that
  Degree distribution is multinomial ~ power law
  Diameter: constant
  Eigenvalue distribution: multinomial
  First eigenvector: multinomial
new: Self-similarity -> power laws

Problem Definition
Given a growing graph with nodes N1, N2, ...
Generate a realistic sequence of graphs that will obey all the patterns
Static Patterns
  Power Law Degree Distribution
  Power Law eigenvalue and eigenvector distribution
  Small Diameter
Dynamic Patterns
  Growth Power Law
  Shrinking/Stabilizing Diameters
First generator for which we can prove all these properties

Impact: Graph500
Based on RMAT (= 2x2 Kronecker)
Standard for graph benchmarks
http://www.graph500.org/
Competitions 2x year, with all major entities: LLNL, Argonne, ITC-U. Tokyo, Riken, ORNL, Sandia, PSC, ...
R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA
To iterate is human, to recurse is divine

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
  ...
  Q1: Why so many power-laws?
    A: real graphs -> self similar -> power laws
  Q2: Why no good cuts?
Part#2: Cascade analysis
Conclusions

Q2: Why no good cuts?
A: self-similarity
  Communities within communities within communities ...

Kronecker Product - a Graph (REMINDER)
Continuing multiplying with G1 we obtain G4 and so on ...
[Figure: G4 adjacency matrix; annotations: Linux users, Mac users, win users]
Communities within communities within communities ...
How many Communities? 3? 9? 27?
A: one, but not a typical, block-like community

Communities? (Gaussian) Clusters? Piece-wise flat parts?
[Figure axes: age, # songs]
Wrong questions to ask!

Summary of Part#1
*many* patterns in real graphs
  Small & shrinking diameters
  Power-laws everywhere
  Gaussian trap
  no good cuts
Self-similarity (RMAT/Kronecker): good model

Part 2: Cascades & Immunization

Why do we care?
Information Diffusion
Viral Marketing
Epidemiology and Public Health
Cyber Security
Human mobility
Games and Virtual Worlds
Ecology
........

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
Part#2: Cascade analysis
  (Fractional) Immunization
  Epidemic thresholds
Conclusions

Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
SDM 2013, Austin, TX

Whom to immunize?
Dynamical Processes over networks
  Each circle is a hospital
  ~3,000 hospitals
  More than 30,000 patients transferred
  [US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?

Whom to immunize?
[Figure: CURRENT PRACTICE vs. OUR METHOD, US-MEDICARE NETWORK 2005]
~6x fewer!
Hospital-acquired inf.: 99K+ lives, $5B+ per year

Fractional Asymmetric Immunization
[Figure: Hospital, Another Hospital]
Drug-resistant Bacteria (like XDR-TB)
Problem: Given k units of disinfectant, distribute them to maximize hospitals saved @ 365 days

Straightforward solution:
Simulation:
  1. Distribute resources
  2. infect a few nodes
  3. Simulate evolution of spreading (10x, take avg)
  4. Tweak, and repeat step 1

Running Time
[Figure: wall-clock time, Simulations vs. SMART-ALLOC]
Simulations: > 1 week
SMART-ALLOC: 14 secs
> 30,000x speed-up!

Experiments
K = 120
[Figure: # infected against # epochs; curves: uniform, SMART-ALLOC]

What is the silver bullet?
A: Try to decrease connectivity of graph
Q: how to measure connectivity?
  Avg degree? Max degree?
  Std degree / avg degree?
  Diameter?
  Modularity?
  Conductance (~min cut size)?
  Some combination of above?
A: first eigenvalue of adjacency matrix
Q1: why??
(Q2: dfn & intuition of eigenvalue?)

Why eigenvalue?
A1: G2 theorem and eigen-drop:
  For (almost) any type of virus
  For any network
  -> no epidemic, if small-enough first eigenvalue (λ1) of adjacency matrix
Heuristic: for immunization, try to min λ1
The smaller λ1, the closer to extinction.
Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks, B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos, ICDM 2011, Vancouver, Canada
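
To make the "min λ1" heuristic concrete, here is a brute-force illustration of eigen-drop: remove each node in turn and see which removal lowers the largest eigenvalue the most. This is only a sketch on a hypothetical toy graph; it removes whole nodes, whereas the fractional method in the talk spreads resources across nodes, and it is not the SMART-ALLOC algorithm.

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])

def best_node_to_remove(A):
    """Return (node, drop): the single node whose removal lowers lambda_1 the most."""
    lam = largest_eigenvalue(A)
    best_node, best_drop = None, -1.0
    for node in range(A.shape[0]):
        keep = [i for i in range(A.shape[0]) if i != node]
        drop = lam - largest_eigenvalue(A[np.ix_(keep, keep)])
        if drop > best_drop:
            best_node, best_drop = node, drop
    return best_node, best_drop

# Hypothetical toy graph: star with center 0 and 6 leaves, plus a triangle on nodes 7, 8, 9.
n = 10
A = np.zeros((n, n))
for leaf in range(1, 7):
    A[0, leaf] = A[leaf, 0] = 1
for u, v in [(7, 8), (8, 9), (7, 9)]:
    A[u, v] = A[v, u] = 1

node, drop = best_node_to_remove(A)
print(node, round(drop, 3))
```

Here the star center is the best target: removing it takes λ1 from sqrt(6) down to 2 (the triangle), while removing a triangle node leaves λ1 unchanged.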

Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks
B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos
IEEE ICDM 2011, Vancouver
extended version, in arxiv: http://arxiv.org/abs/1004.0060
G2 theorem
~10 pages proof

Our thresholds for some models
s = effective strength
s < 1 : below threshold

Models                  Effective Strength (s)    Threshold (tipping point)
SIS, SIR, SIRS, SEIR    s = [formula lost]        s = 1
SIV, SEIV               s = [formula lost]
(H.I.V.)                s = [formula lost]

[Annotations on the models: No immunity; Temp. immunity; w/ incubation]

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
Part#2: Cascade analysis
  (Fractional) Immunization
  intuition behind λ1
Conclusions

Intuition for λ
Official definitions:
  Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A - xI)].
  Also: A x = λ x
Neither gives much intuition!
Un-official Intuition
  For homogeneous graphs, λ == degree
  λ ~ avg degree
    done right, for skewed degree distributions

Largest Eigenvalue (λ)
[Figure: three example graphs]
λ ≈ 2, λ = sqrt(N), λ = N-1
N = 1000 nodes: λ ≈ 2, λ = 31.67, λ = 999
better connectivity -> higher λ
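
The three values can be reproduced numerically under the assumption that the pictured graphs are a chain, a star, and a clique (the transcript does not name them). The helper names are illustrative.

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])

def chain(n):
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1  # link each node to its successor
    return A

def star(n):
    A = np.zeros((n, n))
    A[0, 1:] = A[1:, 0] = 1  # node 0 is the hub
    return A

def clique(n):
    return np.ones((n, n)) - np.eye(n)  # every pair connected, no self-loops

n = 1000
lam_chain, lam_star, lam_clique = (largest_eigenvalue(g(n)) for g in (chain, star, clique))
print(round(lam_chain, 3), round(lam_star, 3), round(lam_clique, 3))
```

The chain stays just below 2, the star gives sqrt(N-1) (about 31.6, close to the slide's middle value), and the clique gives N-1 = 999, in line with "better connectivity -> higher λ".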

Examples: Simulations - SIR (mumps)
[Figure: (a) Infection profile: fraction of infections against time ticks; (b) Take-off plot: footprint against effective strength]
PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Examples: Simulations - SIRS (pertussis)
[Figure: (a) Infection profile: fraction of infections against time ticks; (b) Take-off plot: footprint against effective strength]
PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Immunization - conclusion
In (almost any) immunization setting,
  Allocate resources, such that to
  Minimize λ1
  (regardless of virus specifics)
Conversely, in a market penetration setting
  Allocate resources to
  Maximize λ1

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
Part#2: Cascade analysis
  (Fractional) Immunization
  Epidemic thresholds
What next?
Acks & Conclusions
[Tools: ebay fraud; tensors; spikes]

Challenge #1: Connectome - brain wiring
Which neurons get activated by "bee"
How wiring evolves
Modeling epilepsy
N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

Challenge #2: Time evolving networks / tensors
Periodicities? Burstiness?
What is typical behavior of a node, over time
Heterogeneous graphs (= nodes w/ attributes)

Roadmap
Introduction - Motivation
Part#1: Patterns in graphs
Part#2: Cascade analysis
  (Fractional) Immunization
  Epidemic thresholds
Acks & Conclusions
[Tools: ebay fraud; tensors; spikes] (Off line)

Thanks
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
Faloutsos et al., 8/10/13

Project info: PEGASUS
www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus + hadoop + M45
Apache license
Code, papers, manual, video
Prof. U Kang
Prof. Polo Chau

Cast
Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tong, Hanghang
Prakash, Aditya
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis

CONCLUSION#1 - Big data
Large datasets reveal patterns/outliers that are invisible otherwise

CONCLUSION#2 - self-similarity
powerful tool / viewpoint
  Power laws; shrinking diameters
  Gaussian trap (eg., F.O.F.)
  no good cuts
  RMAT - graph500 generator

CONCLUSION#3 - eigen-drop
Cascades & immunization: G2 theorem & eigenvalue
[Figure: CURRENT PRACTICE vs. OUR METHOD, US-MEDICARE NETWORK 2005]
~6x fewer!
14 secs; > 30,000x speed-up!

References
D. Chakrabarti, C. Faloutsos: Graph Mining - Laws, Tools and Case Studies, Morgan Claypool 2012
http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

TAKE HOME MESSAGE: Cross-disciplinarity
QUESTIONS?