Data mining --- mining graphs

CAP5771.1 University of South Florida

Xiaoning Qian

Xiaoning Qian, 10/13/11

Today’s Lecture

1. Complex networks

2. Graph representation for networks

3. Markov chain

4. Viral propagation

5. Google’s PageRank


1. M Faloutsos, P Faloutsos, C Faloutsos, On power-law relationships of the Internet topology. Comput. Commun. Rev. 29:251-262, 1999

2. JM Kleinberg, Navigation in a small world: it is easier to find short chains between points in some networks than others. Nature 406:845, 2000

3. AL Barabasi, Linked: The New Science of Networks. Cambridge, MA, 2002

4. DJ Watts, The “new” science of networks. Annu. Rev. Sociol. 30:243–70, 2004

5. DL Alderson, Catching the “network science” bug: Insight and opportunity for the operations researcher. Operations Research 56(5):1047-1065, 2008

6. MEJ Newman, Networks: An Introduction. Oxford, 2010

“New” Science of Networks

The first network/graph problem: find a tour crossing every bridge just once (Euler, 1735)

Network Science

Bridges of Königsberg

1. S Milgram, The small world problem. Psychol. Today 2:60–67, 1967

“New” science: unprecedented number of empirical networks; much larger-scale networks; visualization does not convey enough information; computers are much more powerful; highly interdisciplinary


Mining networks/graphs

Topology/structure of complex networks
Global: degrees, centrality, connectivity, etc.
Scale-free (power-law) networks: six degrees of separation?
Local: clustering (community), network motifs, etc.

Dynamics/behavior of complex networks
Global: the topological effect on dynamics
How do information, viruses, diseases, rumors, etc. propagate?
Local: how individual nodes behave

Complex Networks (example figures): Yeast PPI; Yeast signaling; food web; friendship; romantic relation; author citation; Internet; Web

Mathematics of Networks (Graphs)

What is a network/graph? A collection of vertices/nodes joined by edges.

Different types of vertices and edges: directed vs. undirected; weighted vs. binary; labeled vs. unlabeled; bipartite graphs; hypergraphs

Mathematically, G = {V, E}

Undirected network: $\langle v_i, v_j \rangle \in E \Rightarrow \langle v_j, v_i \rangle \in E$

Adjacency matrix $L$: $L_{ij} = e_{ij}$ if $\langle v_i, v_j \rangle \in E$

Symmetric for undirected graphs; square matrix for (self-)graphs; rectangular for bipartite graphs

Matrix analysis for graph mining!

Simple graphs, connected graphs, complete graphs, …

Node degree $c_i$: the number of edges incident with vertex $v_i$

Neighbor set; input- and output-degrees; degree distributions (power-law); …

Trail (distinct edges), path (distinct nodes), cycle, cut, …
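The adjacency matrix and node degrees defined above can be illustrated with a minimal sketch. The four-node edge list `EDGES` and the helper names `adjacency_matrix` and `degrees` are hypothetical and chosen only for illustration; the graph is assumed undirected and unweighted, so every entry is 0 or 1.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected, unweighted graph."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] = 1
        L[j][i] = 1  # undirected: <v_i, v_j> in E implies <v_j, v_i> in E
    return L


def degrees(L):
    """Node degree c_i = number of edges incident with v_i (row sum of L)."""
    return [sum(row) for row in L]


# Hypothetical example graph: a triangle 0-1-2 with a pendant node 3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
L = adjacency_matrix(4, EDGES)
c = degrees(L)
print(L)
print(c)
```

For an undirected graph the matrix is symmetric, so row sums and column sums give the same degrees; for a directed graph they would give the two different degree types instead.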

Sequential data

Markov chain

What is a Markov chain?

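Finite Markov chain: (Q, P)
$Q = \{q_1, q_2, \dots, q_s\}$: a finite set of states
P: state transition probability matrix

Given a sequence of observations $x_1, x_2, \dots, x_T$, the probability of the sequence is:

$$p(x_1, \dots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_1, \dots, x_{t-1})$$

For first-order time-homogeneous Markov chain:

$$p(x_t \mid x_1, \dots, x_{t-1}) = p(x_t \mid x_{t-1})$$

Hence,

$$p(x_1, \dots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})$$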

Finite Markov chain: (Q, P)
$Q = \{B, q_1, q_2, \dots, q_s\}$: a finite set of states
P: state transition probability matrix

initial state probability:

The probability of a sequence can be expressed with P:

Note: the outputs are the states at each time, so the states are observable!


3-state Markov chain model for the weather:

Q = {Rain (or snow), Cloudy, Sunny};

P is given in the figure; Initial state probability

[Figure, “An example”: state-transition diagram over the states R, C, S with edge probabilities 0.4, 0.3, 0.3, 0.2, 0.2, 0.6, 0.8, 0.1, 0.1]
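As a worked illustration of how a first-order chain scores a state sequence, here is a minimal sketch for the weather example. Both tables are assumptions: `TRANS` arranges the nine edge probabilities in the way the classic three-state weather example does (the figure's actual edge assignment may differ), and `INIT` is a hypothetical uniform initial distribution, since the slide does not give one.

```python
# ASSUMPTION: TRANS[cur][nxt] = p(next state = nxt | current state = cur).
# This arrangement of the nine probabilities is not confirmed by the slide.
TRANS = {
    "R": {"R": 0.4, "C": 0.3, "S": 0.3},
    "C": {"R": 0.2, "C": 0.6, "S": 0.2},
    "S": {"R": 0.1, "C": 0.1, "S": 0.8},
}

# ASSUMPTION: uniform initial state probability (hypothetical).
INIT = {"R": 1 / 3, "C": 1 / 3, "S": 1 / 3}


def sequence_probability(states, init=INIT, trans=TRANS):
    """p(x_1..x_T) = p(x_1) * product over t of p(x_t | x_{t-1})."""
    prob = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= trans[prev][cur]
    return prob


# Probability of sunny, sunny, rain under the assumed tables.
prob_ssr = sequence_probability(["S", "S", "R"])
print(prob_ssr)
```

Because the states themselves are observed, no summation over hidden paths is needed; the sequence probability is a single product of table lookups.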

Chapman-Kolmogorov equation: $p(x_n) = P^{(n-1)} p(x_1)$

Limiting distribution (stationary/steady-state distribution): irreducibility, periodicity, ergodicity

$p = P p$

How to solve for p? Eigen-decomposition of P; power method
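The power method mentioned above can be sketched as repeated application of $p \leftarrow P p$ until the vector stops changing. This is a minimal sketch, not a definitive implementation: the function name `stationary_distribution` is hypothetical, `P` is assumed column-stochastic (entry `P[i][j]` is the probability of moving from state j to state i, matching $p = P p$), and the example matrix is the assumed arrangement of the weather chain's probabilities in the order R, C, S.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power method: iterate p <- P p from a uniform start until convergence."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [sum(P[i][j] * p[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# ASSUMPTION: column-stochastic weather matrix, states ordered R, C, S.
# Each column sums to 1; the edge assignment is not confirmed by the slide.
P_WEATHER = [
    [0.4, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.3, 0.2, 0.8],
]

p_star = stationary_distribution(P_WEATHER)
print(p_star)
```

The iteration converges here because the assumed chain is irreducible and aperiodic; for a periodic chain the plain power method can oscillate instead of settling.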

Chapman-Kolmogorov Equation

Random walk on graphs (network diffusion) is a Markov process.
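One way to see the connection is to turn an adjacency matrix into a transition matrix by normalizing each column by the node's degree. The sketch below assumes an undirected, unweighted graph and the column convention $p = P p$; the helper names `random_walk_matrix` and `step` and the three-node path graph are hypothetical.

```python
def random_walk_matrix(L):
    """Column-normalize adjacency matrix L: P[i][j] = L[i][j] / degree of j."""
    n = len(L)
    col_sums = [sum(L[i][j] for i in range(n)) for j in range(n)]
    return [
        [L[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def step(P, p):
    """One step of the walk: p_next = P p."""
    n = len(p)
    return [sum(P[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical path graph 0 - 1 - 2.
L_PATH = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
P_PATH = random_walk_matrix(L_PATH)

# On an undirected graph, the degree-proportional vector is stationary.
p_deg = [0.25, 0.5, 0.25]
p_next = step(P_PATH, p_deg)
print(p_next)
```

The degree-proportional vector is left unchanged by one step, which is the usual stationary distribution of a simple random walk on an undirected graph.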

Random walk on graphs

The algorithm of Google---PageRank

What’s behind Google?

PageRank

What is an important Webpage?
There are many Webpages pointing to it.
A Webpage pointed to by important Webpages is itself more important.
Importance diffuses along the links between Webpages.

Vertices: Webpages; Edges: hyperlinks;

HITS: JM Kleinberg


Diffusion (Random walk) on Web

$p_i$: importance of page $i$; $L_{ij}$: link from page $j$ to page $i$;

[Equation in λ]

Hence, the problem becomes a Markov chain problem (diffusion process):

[Equation in λ]

Diffusion (Random walk with restart) on Web

$p_i$: importance of page $i$; $L_{ij}$: link from page $j$ to page $i$;

[Equation in λ]

Matrix form:

[Equation in λ, with terms labeled “Pseudocount” and “Diffusion factor”]

How do we solve this?

[Equation in λ]

Note that p is used only for ranking, so its absolute values are not critical! WLOG, we assume [normalization of p]

Hence, the problem becomes a Markov chain problem (diffusion process):

[Equation in λ]
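A random walk with restart can be solved by the same power iteration used for any Markov chain. The sketch below assumes one commonly used formulation, $p_i = (1-\lambda)/N + \lambda \sum_j L_{ij}\, p_j / c_j$ with $\sum_i p_i = 1$, where $c_j$ is the number of outgoing links of page $j$; the exact equation on the slide may differ. Reading $(1-\lambda)/N$ as the pseudocount and $\lambda$ as the diffusion factor is also an assumption, as are the function name `pagerank`, the default `lam=0.85`, and the uniform treatment of pages with no outgoing links.

```python
def pagerank(links, lam=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for a random walk with restart.

    links[j] lists the pages that page j links to (hypothetical input format).
    """
    n = len(links)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [(1.0 - lam) / n] * n  # restart term (assumed pseudocount)
        for j, outs in enumerate(links):
            if outs:
                share = lam * p[j] / len(outs)
                for i in outs:
                    new_p[i] += share
            else:
                # ASSUMPTION: a page with no out-links spreads its mass uniformly.
                for i in range(n):
                    new_p[i] += lam * p[j] / n
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical three-page Web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
LINKS = [[1, 2], [2], [0]]
ranks = pagerank(LINKS)
print(ranks)
```

Because only the ordering of the entries matters for ranking, fixing the sum of `p` to 1 loses nothing and turns the update into a proper Markov chain step.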

Viral propagation

How does the virus spread over the network?
Will it become an “epidemic” outbreak?
How fast will the virus die out or become “epidemic”?
How should we design “robust” networks to prevent cascading failures?

* A Ganesh et al., The effect of network topology on the spread of epidemics. INFOCOM, 2005

* D Chakrabarti, Tools for large graph mining. Ph.D. Thesis, CMU, 2005

Mathematical Epidemiology

SIR (Susceptible-Infective-Recovered) model

SIS (Susceptible-Infective-Susceptible) model
Catching the disease from infective neighbors (birth rate): β
Recovery rate: δ
Epidemic threshold: τ

SIS model

Sum and Product rules in probability!!

SIS model is again a Markov process!

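With appropriate approximations, we can derive

$$p(v_i^t = \text{susceptible}) = p(v_i^{t-1} = \text{susceptible})\,\zeta_i + p(v_i^{t-1} = \text{infective})\,\delta$$

$$1 - p(v_i^t) = [1 - p(v_i^{t-1})]\,\zeta_i + p(v_i^{t-1})\,\delta$$

and

$$p^t = [\beta L + (1-\delta) I]\, p^{t-1}$$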
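To see how the per-node update relates to the linear matrix form, the sketch below implements both for one time step. It rests on an assumption the slide does not spell out: $\zeta_i$ is taken to be the probability that node $i$ receives no infection from its neighbors, $\zeta_i = \prod_{j \in \text{neighbors}(i)} (1 - \beta\, p_j^{t-1})$. The function names `sis_step` and `sis_step_linear` and the triangle graph are hypothetical.

```python
def sis_step(p, L, beta, delta):
    """Per-node update: 1 - p_i^t = (1 - p_i^{t-1}) * zeta_i + p_i^{t-1} * delta."""
    n = len(p)
    new_p = []
    for i in range(n):
        # ASSUMPTION: zeta_i = product over neighbors j of (1 - beta * p_j).
        zeta = 1.0
        for j in range(n):
            if L[i][j]:
                zeta *= 1.0 - beta * p[j]
        new_p.append(1.0 - ((1.0 - p[i]) * zeta + p[i] * delta))
    return new_p


def sis_step_linear(p, L, beta, delta):
    """Linearized update: p^t = [beta * L + (1 - delta) * I] p^{t-1}."""
    n = len(p)
    return [
        beta * sum(L[i][j] * p[j] for j in range(n)) + (1.0 - delta) * p[i]
        for i in range(n)
    ]


# Hypothetical triangle graph with a small initial infection probability.
L_TRI = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
p0 = [0.01, 0.01, 0.01]
exact = sis_step(p0, L_TRI, beta=0.1, delta=0.2)
linear = sis_step_linear(p0, L_TRI, beta=0.1, delta=0.2)
print(exact, linear)
```

The two updates agree up to second-order terms in the infection probabilities, which is why the linear form is a reasonable approximation while the infection is still small.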

Eigen-decomposition of the matrix $S = \beta L + (1-\delta) I$: its eigenvalues are $\lambda_{i,S} = 1 - \delta + \beta \lambda_{i,L}$, where $\lambda_{i,L}$ are the eigenvalues of $L$.

Hence, $p^t = S^t p^0$, which decays to zero when the largest eigenvalue satisfies $\lambda_{1,S} < 1$.

Epidemic threshold: $\tau = 1/\lambda_{1,L}$ (under this linear model, the infection dies out when $\beta/\delta < \tau$).
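The threshold can be estimated numerically from the adjacency matrix alone. The sketch below assumes the result $\tau = 1/\lambda_{1,L}$ for a connected undirected graph, with $\lambda_{1,L}$ the largest eigenvalue of $L$, and the die-out condition $\beta/\delta < \tau$. The function names are hypothetical, and the power iteration is run on $L + I$ (a shift, which is a choice made here and not taken from the slides) so that it also converges on bipartite graphs.

```python
def largest_eigenvalue(L, iters=200):
    """Largest eigenvalue of a symmetric nonnegative matrix via power iteration.

    Iterates on L + I so the dominant eigenvalue is strictly largest in
    magnitude, then subtracts the shift.
    """
    n = len(L)
    v = [1.0] * n
    shifted = 1.0
    for _ in range(iters):
        w = [v[i] + sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(x) for x in w)  # max-norm estimate of lambda_1 + 1
        v = [x / shifted for x in w]
    return shifted - 1.0


def epidemic_threshold(L):
    """tau = 1 / lambda_1(L), assuming L has at least one edge."""
    return 1.0 / largest_eigenvalue(L)


def dies_out(L, beta, delta):
    """True when beta / delta is below the threshold."""
    return beta / delta < epidemic_threshold(L)


# Hypothetical graphs: a triangle and a star with four leaves.
L_TRIANGLE = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
L_STAR = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
tau_triangle = epidemic_threshold(L_TRIANGLE)
tau_star = epidemic_threshold(L_STAR)
print(tau_triangle, tau_star)
```

Graphs with a larger leading eigenvalue have a lower threshold, so better-connected networks sustain an infection at smaller values of β/δ.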

Summary

Networks/graphs are everywhere and require new tools to study them efficiently and effectively.

Random walks (Markov chains) on graphs and their extensions can be a useful technique to “mine” complex networks/graphs: PageRank; viral propagation

Have you learned anything? :)

I am teaching Biological Network Analysis, Spring 2012.