data mining --- mining graphs - usf dept. of computer …lohall/dm/cis6930_dm_gm.pdf ·  ·...

Post on 26-Apr-2018

217 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Data mining --- mining graphs

CAP5771.1University of South Florida

Xiaoning Qian

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Today’s Lecture

1. Complex networks

2. Graph representation for networks

3. Markov chain

4. Viral propagation

5. Google’s PageRank

CAP5771.1

1. M Faloutsos, P Faloutsos, C Faloutsos, On power-lawrelationships of the Internet topology. Comput. Commun. Rev.29:251-262, 1999

2. JM Kleinberg, Navigation in a small world—it is easier to findshort chains between points in some networks than others. Nature,406:845, 2000

3. AL Barabasi, Linked: The New Science of Networks. Cambridge,MA, 2002

4. DJ Watts, The “new” science of networks. Annu. Rev. Sociol.30:243–70, 2004

5. DL Alderson, Catching the “network science” bug: Insight andopportunity for the operations researcher. Operations Research56(5): 1047-1065, 2008

6. MEJ Newman, Networks: An Introduction. Oxford, 2010

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

“New” Science of Networks

CAP5771.1

The first network/graph problem Find a tour crossing every bridge just once, Euler, 1735

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Network Science

CAP5771.1

Bridges of Königsberg

1. S Milgram, The small world problem. Psychol.Today 2:60–67, 1967

“New” Science Unprecedented number of empirical networks Much larger scale networks Visualization does not convey enough information

Computer are much more powerful Highly interdisciplinary

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Network Science

CAP5771.1

Mining networks/graphs

Topology/structure of complex networks Global: degrees, centrality, connectivity, etc.

Scale-free (power-law) networks: 6 degree separation?

Local: clustering (community), network motifs, etc.

Dynamics/behavior of complex networks Global: the topological effect on dynamics

How information, virus, disease, rumors, etc. propagate?

Local: how individual nodes behave

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Network Science

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (Yeast PPI)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (Yeast signaling)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (food web)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (friendship)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (romantic relation)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (author citation)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (Internet)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Complex Networks (Web)

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Mathematics of Networks (Graphs)

CAP5771.1

What is a network/graph? A collection of vertices/nodes joined by edges Different types of vertices and edges:

Directed vs. Undirected Weighed vs. Binary Labeled vs. Nonlabeled Bipartite graphs Hypergraphs

Mathematically,G = {V, E}

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Mathematics of Networks (Graphs)

CAP5771.1

Undirected network: Undirected network: <v<vii, , vvjj> > ∈∈ E => < E => < vvjj, v, vii > > ∈∈ E E

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Mathematics of Networks (Graphs)

CAP5771.1

Adjacency matrix L Symmetric for undirected graphs Square matrix for (self-)graphs; rectangular for bipartite

graphsLijij = eijij if <v<vii, , vvjj> > ∈∈ E E

Matrix analysis for graph mining!

Simple graphs, connected graphs, complete graphs, …

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Mathematics of Networks (Graphs)

CAP5771.1

Node degree cii

The number of edges incident with vertex vvii

Neighbor set Input-, output-degrees Degree distributions (power-law) …

Trail (distinct edges), path (distinct nodes), cycle, cut, …

Sequential data

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Markov chain

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

What is a Markov chain?

Finite Markov chain -- (Q, P) Q = {q1, q2, …, qs} : a finite set of states P : state transition probability matrix Given a sequence of observations: The probability of the sequence is:

For first-order time-homogeneous Markov chain:

Hence,

CAP5771.1

Finite Markov chain -- (Q, P) Q = {B, q1, q2, …, qs} : a finite set of states P : state transition probability matrix

initial state probability:

The probability of a sequence can be expressed with P:

Note: The output are states at each time -- states areobservable!!

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

What is a Markov chain?

CAP5771.1

3-state Markov chain model for the weather:

Q = {Rain (or snow), Cloudy, Sunny};

P is given in the figure; Initial state probability

RR

CC

SS

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

An example

0.40.4

0.30.3

0.30.3

0.20.2

0.20.2

0.60.6

0.80.8

0.10.1

0.10.1

CAP5771.1

Chapman-Kolmogorov equationp(xn) = P(n-1) p(x1)

Limiting distribution (stationary/steady-state distribution) Irreducibility, Periodicity, Ergocity

p = P p

How to solve p? Eigen-decomposition of P Power method

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Chapman-Kolmogorov Equation

CAP5771.1

Random walk on graphs (network diffusion) is a Markovprocess.

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Random walk on graphs

CAP5771.1

The algorithm of Google---PageRank

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

What’s behind Google?

CAP5771.1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

PageRank

CAP5771.1

What is an important Webpage?There are many Webpages pointing to it

Important Webpage point to more important Webpage

Importance diffuses based on links between Webpages

Vertices: Webpages; Edges: hyperlinks;

HITS: JM Kleinberg

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

PageRank

CAP5771.1

Diffusion (Random walk) on Web

pi: importance for page i; Lij: link from page j to i;

λ λ

Hence, the problem becomes a Markov chain problem(diffusion process):

λ λ

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

PageRank

CAP5771.1

Diffusion (Random walk with restart) on Web

pi: importance for page i; Lij: link from page j to i;

λ λ

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

PageRank

CAP5771.1

Diffusion (Random walk with restart) on Web

Matrix form:

λ λ

PseudocountDiffusion factor

λ λ

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

PageRank

CAP5771.1

How do we solve this?λ λ

Note that p is simply for ranking and the absolutevalues are not critical! WLOG, we assume

Hence, the problem becomes a Markov chain problem(diffusion process):

λ λ

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Viral propagation

CAP5771.1

How does the virus spread over the network? Will it become an “epidemic” outbreak? How fast the virus will die out or become “epidemic”? How we should design “robust” networks to prevent

cascading failures?

* A Ganesh et. al., The effect of network topology on the spread of epidemics. INFOCOM, 2005

* D Chakrabati, Tools for large graph mining. Ph.D. Thesis, CMU, 2005

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Mathematical Epidemiology

CAP5771.1

SIR (Susceptible-Infective-Recovered) model

SIS (Susceptible-Infective-Susceptible) model Catching the disease from Infective neighbors (birth rate): β

Recover rate: δ

Epidemic threshold: τ

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

SIS model

CAP5771.1

????

Sum and Product rules in probability!!

SIS model is again a Markov process!

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

SIS model

CAP5771.1

Sum and Product rules in probability!!

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

SIS model

CAP5771.1

Sum and Product rules in probability!!

With appropriate approximations, we can derivep(vi

t=susceptible) = p(vit-1=susceptible) ζi + p(vi

t-1=infective) δ

1-p(vit) = [1-p(vi

t-1)] ζi + p(vit-1) δ

and

pt =[ βL + (1-δ)I ] pt-1

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

SIS model

CAP5771.1

With appropriate approximations, we can derivept

=[ βL + (1-δ)I ] pt-1

Eigen-decomposition of the matrix S= [ βL + (1-δ)I ]

LL

pp pp 00

LLHence,

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

SIS model

CAP5771.1

With appropriate approximations, we can derivept

=[ βL + (1-δ)I ] pt-1

Eigen-decomposition of the matrix S= [ βL + (1-δ)I ]

Epidemic threshold:

LLHence,

LL

Networks/graphs are everywhere and require new toolsto study them efficiently and effectively.

Random walk (Markov chain) on graphs and itsextension can be a useful technique to “mine” complexnetworks/graphs PageRank Viral propagation

Have you learned anything? :)

I am teaching Biological Network Analysis, Spring 2012.

Xiaoning Qian (xqian@cse.usf.edu) 10/13/11

Summary

CAP5771.1

top related