data mining --- mining graphs - usf dept. of computer …lohall/dm/cis6930_dm_gm.pdf · ·...

Data mining --- mining graphs

CAP5771.1University of South Florida

Xiaoning Qian

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Today’s Lecture

1. Complex networks

2. Graph representation for networks

3. Markov chain

4. Viral propagation

5. Google’s PageRank

CAP5771.1

1. M Faloutsos, P Faloutsos, C Faloutsos, On power-lawrelationships of the Internet topology. Comput. Commun. Rev.29:251-262, 1999

2. JM Kleinberg, Navigation in a small world—it is easier to findshort chains between points in some networks than others. Nature,406:845, 2000

3. AL Barabasi, Linked: The New Science of Networks. Cambridge,MA, 2002

4. DJ Watts, The “new” science of networks. Annu. Rev. Sociol.30:243–70, 2004

5. DL Alderson, Catching the “network science” bug: Insight andopportunity for the operations researcher. Operations Research56(5): 1047-1065, 2008

6. MEJ Newman, Networks: An Introduction. Oxford, 2010

“New” Science of Networks

CAP5771.1

The first network/graph problem Find a tour crossing every bridge just once, Euler, 1735

Network Science

CAP5771.1

Bridges of Königsberg

1. S Milgram, The small world problem. Psychol.Today 2:60–67, 1967

“New” Science Unprecedented number of empirical networks Much larger scale networks Visualization does not convey enough information

Computer are much more powerful Highly interdisciplinary

Network Science

CAP5771.1

Mining networks/graphs

Topology/structure of complex networks Global: degrees, centrality, connectivity, etc.

Scale-free (power-law) networks: 6 degree separation?

Local: clustering (community), network motifs, etc.

Dynamics/behavior of complex networks Global: the topological effect on dynamics

How information, virus, disease, rumors, etc. propagate?

Local: how individual nodes behave

Network Science

CAP5771.1

Complex Networks (Yeast PPI)

CAP5771.1

Complex Networks (Yeast signaling)

CAP5771.1

Complex Networks (food web)

CAP5771.1

Complex Networks (friendship)

CAP5771.1

Complex Networks (romantic relation)

CAP5771.1

Complex Networks (author citation)

CAP5771.1

Complex Networks (Internet)

CAP5771.1

Complex Networks (Web)

CAP5771.1

Mathematics of Networks (Graphs)

CAP5771.1

What is a network/graph? A collection of vertices/nodes joined by edges Different types of vertices and edges:

Directed vs. Undirected Weighed vs. Binary Labeled vs. Nonlabeled Bipartite graphs Hypergraphs

Mathematically,G = {V, E}

CAP5771.1

Undirected network: Undirected network: <v<vii, , vvjj> > ∈∈ E => < E => < vvjj, v, vii > > ∈∈ E E

CAP5771.1

Adjacency matrix L Symmetric for undirected graphs Square matrix for (self-)graphs; rectangular for bipartite

graphsLijij = eijij if <v<vii, , vvjj> > ∈∈ E E

Matrix analysis for graph mining!

Simple graphs, connected graphs, complete graphs, …

CAP5771.1

Node degree cii

The number of edges incident with vertex vvii

Neighbor set Input-, output-degrees Degree distributions (power-law) …

Trail (distinct edges), path (distinct nodes), cycle, cut, …

Sequential data

Markov chain

CAP5771.1

What is a Markov chain?

Finite Markov chain -- (Q, P) Q = {q1, q2, …, qs} : a finite set of states P : state transition probability matrix Given a sequence of observations: The probability of the sequence is:

For first-order time-homogeneous Markov chain:

Hence,

CAP5771.1

Finite Markov chain -- (Q, P) Q = {B, q1, q2, …, qs} : a finite set of states P : state transition probability matrix

initial state probability:

The probability of a sequence can be expressed with P:

Note: The output are states at each time -- states areobservable!!

What is a Markov chain?

CAP5771.1

3-state Markov chain model for the weather:

Q = {Rain (or snow), Cloudy, Sunny};

P is given in the figure; Initial state probability

An example

0.40.4

0.30.3

0.20.2

0.60.6

0.80.8

0.10.1

CAP5771.1

Chapman-Kolmogorov equationp(xn) = P(n-1) p(x1)

Limiting distribution (stationary/steady-state distribution) Irreducibility, Periodicity, Ergocity

p = P p

How to solve p? Eigen-decomposition of P Power method

Chapman-Kolmogorov Equation

CAP5771.1

Random walk on graphs (network diffusion) is a Markovprocess.

Random walk on graphs

CAP5771.1

The algorithm of Google---PageRank

What’s behind Google?

CAP5771.1

PageRank

CAP5771.1

What is an important Webpage?There are many Webpages pointing to it

Important Webpage point to more important Webpage

Importance diffuses based on links between Webpages

Vertices: Webpages; Edges: hyperlinks;

HITS: JM Kleinberg

PageRank

CAP5771.1

Diffusion (Random walk) on Web

pi: importance for page i; Lij: link from page j to i;

Hence, the problem becomes a Markov chain problem(diffusion process):

PageRank

CAP5771.1

Diffusion (Random walk with restart) on Web

pi: importance for page i; Lij: link from page j to i;

PageRank

CAP5771.1

Diffusion (Random walk with restart) on Web

Matrix form:

PseudocountDiffusion factor

PageRank

CAP5771.1

How do we solve this?λ λ

Note that p is simply for ranking and the absolutevalues are not critical! WLOG, we assume

Hence, the problem becomes a Markov chain problem(diffusion process):

Viral propagation

CAP5771.1

How does the virus spread over the network? Will it become an “epidemic” outbreak? How fast the virus will die out or become “epidemic”? How we should design “robust” networks to prevent

cascading failures?

* A Ganesh et. al., The effect of network topology on the spread of epidemics. INFOCOM, 2005

* D Chakrabati, Tools for large graph mining. Ph.D. Thesis, CMU, 2005

Mathematical Epidemiology

CAP5771.1

SIR (Susceptible-Infective-Recovered) model

SIS (Susceptible-Infective-Susceptible) model Catching the disease from Infective neighbors (birth rate): β

Recover rate: δ

Epidemic threshold: τ

SIS model

CAP5771.1

Sum and Product rules in probability!!

SIS model is again a Markov process!

SIS model

CAP5771.1

SIS model

CAP5771.1

With appropriate approximations, we can derivep(vi

t=susceptible) = p(vit-1=susceptible) ζi + p(vi

t-1=infective) δ

1-p(vit) = [1-p(vi

t-1)] ζi + p(vit-1) δ

pt =[ βL + (1-δ)I ] pt-1

SIS model

CAP5771.1

With appropriate approximations, we can derivept

=[ βL + (1-δ)I ] pt-1

Eigen-decomposition of the matrix S= [ βL + (1-δ)I ]

pp pp 00

LLHence,

SIS model

CAP5771.1

With appropriate approximations, we can derivept

=[ βL + (1-δ)I ] pt-1

Eigen-decomposition of the matrix S= [ βL + (1-δ)I ]

Epidemic threshold:

LLHence,

Networks/graphs are everywhere and require new toolsto study them efficiently and effectively.

Random walk (Markov chain) on graphs and itsextension can be a useful technique to “mine” complexnetworks/graphs PageRank Viral propagation

Have you learned anything? :)

I am teaching Biological Network Analysis, Spring 2012.

Summary

CAP5771.1

data mining --- mining graphs - usf dept. of computer …lohall/dm/cis6930_dm_gm.pdf · ·...

Documents

roc graphs: notes and practical considerations for data...

mining web graphs for recommendations.bak

netmine: mining tools for large graphs

visualizing bibliographic databases as graphs and mining...

mining, indexing, and similarity search in graphs and...

mining social-network graphs · mining social-network...

mining large graphs: spectral methods, tensors and influence...

graphscope : parameter-free mining of large time-evolving...

mining large graphs

multi-phase process mining: building instance graphs

association analysis (7) (mining graphs). frequent subgraph...

data mining with graphs and matrices · fei wang, tao li,...

mining large graphs and fraud...

topological summaries: using graphs for chemical searching...

mining large graphs part 3: case studies

mining, indexing, and similarity search in graphs and...

net-ray: visualizing and mining billion-scale graphs

cmu scs mining graphs and tensors christos faloutsos cmu

mining behavior graphs for “backtrace” of noncrashing...

mining frequent closed graphs on evolving data streams