Non-parametric Bayesian models for complex networks · 2012. 6. 8. · • draw clustering...
TRANSCRIPT
Non-parametric Bayesian Models for Complex Networks
Morten Mørup
Associate Professor
Section for Cognitive Systems (CogSys)
DTU Informatics
Technical University of Denmark
Joint work with
Lars Kai Hansen Mikkel N. Schmidt Tue Herlau
The father of Graph Theory is considered to be Leonhard Euler
The bridges of Königsberg
Problem: Find a walk through the city that crosses each bridge once and only once.
Leonhard Euler (1707-1783)
Euler proved, using a graph representation, that the problem has no solution. He proved further that a walk returning to its starting point requires all vertices to have even degree, and that a walk ending elsewhere requires exactly two vertices of odd degree (degree of a vertex: the number of edges, i.e. bridges, touching the vertex, i.e. land mass).
30-05-2012 Visionday, DTU Informatics 2 DTU Informatik, Danmarks Tekniske Universitet
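The degree argument can be checked directly. The following is a minimal sketch, assuming the historical layout of seven bridges between four land masses; the labels A to D and the helper names are illustrative and do not come from the slides.

```python
from collections import Counter

# Assumed labels: A is the central island, B and C are the two river banks,
# D is the eastern land mass. Each tuple is one bridge (a multigraph edge).
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def degrees(edges):
    """Count the edge endpoints touching each vertex (parallel edges count separately)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def odd_vertices(edges):
    """Return the sorted vertices whose degree is odd."""
    return sorted(v for v, d in degrees(edges).items() if d % 2 == 1)

deg = degrees(BRIDGES)
odd = odd_vertices(BRIDGES)
print(dict(deg))
print(odd)
```

Every land mass has odd degree here, so the degree condition fails and no such walk exists.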
Graphs and their adjacency matrices
A graph G(V,E) with vertices V and edges E can be represented by the corresponding adjacency matrix A such that Aij=1 if there is a link from vertex i to vertex j and Aij=0 otherwise.
Undirected Graph / Directed Graph
G(V,E) V: set of vertices/nodes, E: set of edges/links
V={1,2,3,4,5,6}
EUndirected={(1,2),(1,5),(2,5),(2,3),(3,4),(4,5),(4,6)}
EDirected={(1,5),(2,1),(2,5),(3,2),(3,4),(4,5),(5,2),(6,4)}
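As a small illustration, the two edge lists above can be turned into adjacency matrices as follows. This is a sketch in plain Python; the function name `adjacency_matrix` is an illustrative choice, not something from the slides.

```python
def adjacency_matrix(n, edges, directed):
    """Build an n x n 0/1 adjacency matrix from 1-indexed (i, j) edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1          # link from vertex i to vertex j
        if not directed:
            A[j - 1][i - 1] = 1      # undirected links are symmetric
    return A

E_UNDIRECTED = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)]
E_DIRECTED = [(1, 5), (2, 1), (2, 5), (3, 2), (3, 4), (4, 5), (5, 2), (6, 4)]

A_und = adjacency_matrix(6, E_UNDIRECTED, directed=False)
A_dir = adjacency_matrix(6, E_DIRECTED, directed=True)

for row in A_und:
    print(row)
```

The undirected matrix is symmetric, while the directed one need not be: the link (2,1) sets A21 but leaves A12 at zero.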
Bipartite Graphs
[Figure: bipartite graph between the vertex sets V(1) and V(2)]
V(1)={1,2,3}
V(2)={1,2,3,4}
EBipartite={(1,2) ,(2,2),(2,3),(2,4),(3,1),(3,4)}
Complex networks pervade both science and popular culture
Large-scale networks with complex patterns of connections between elements abound in both nature and man-made systems.
Biology: Protein Interaction, Food web, Epidemic Spread, Connectome
Social sciences: Friendship network, Collaboration network, Sexual relations
Economics: Trade network, Business Organization
Technology: Telecommunication network, Airline connections, The internet
Basic premise: Local connectivity leads to collective behavior and for such collective behavior to be detectable (i.e., by statistical machine learning methods) one needs to analyze large datasets.
Our research focus: Bayesian non-parametric modeling of relational data
Models we consider can:
1. Adapt to the complexity of the relational data (Bayesian non-parametrics)
2. Devise a generative model for graphs (a statistical process for creating networks)
3. Be used to predict missing entries (evaluate significance of extracted patterns)
4. Form easily interpretable representations of structure (facilitate human understanding)
5. Scale (the computational cost of the models we research grows with the number of edges of the graph)
Modeling challenges we consider
Bayesian learning and the principle of parsimony
The explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.
William of Ockham
To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize.
Thomas Bayes
Bayesian learning embodies Occam’s razor, i.e. complex models are penalized.
David J.C. MacKay (among others)
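The "multiply and normalize" recipe can be illustrated on a discrete toy problem. This is only a sketch: the three candidate link probabilities, the uniform prior and the observation of 3 links among 4 vertex pairs are made-up values for illustration.

```python
def posterior(prior, likelihood):
    """Multiply the prior by the likelihood pointwise, then normalize to sum to one."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)              # the normalizing constant
    return [u / evidence for u in unnormalized]

# Hypothetical candidate values for a link probability theta, with a uniform prior.
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical data: 3 links and 1 non-link among 4 vertex pairs (Bernoulli likelihood).
likelihood = [t ** 3 * (1 - t) for t in thetas]

post = posterior(prior, likelihood)
for t, p in zip(thetas, post):
    print(t, round(p, 4))
```

The posterior mass shifts towards the candidate that best explains the observed links, while still summing to one.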
Bayesian Statistical Learning
Devise generative process for the data, i.e. a recipe for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process.
Evaluate model performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist hypothesis testing with the generative model as null-distribution.
The Infinite Relational Model/Stochastic Blockmodel
(A prominent Bayesian generative model for graphs)
Learning Systems of Concepts with an Infinite Relational Model (AAAI 2006)
Charles Kemp, Josh Tenenbaum, Thomas Griffiths, Takeshi Yamada, Naonori Ueda
Infinite Hidden Relational Models (UAI 2006)
Zhao Xu, Kai Yu, Volker Tresp, Hans-Peter Kriegel
The Stochastic Blockmodel and the Infinite Relational Model
Graph with elements Aij given by the relation j ”defers to” i
Holland et al. (1983), Wasserman & Anderson (1987), Snijders et al. (1997), Nowicki & Snijders (2001)
Wasserman, S., & Faust, K. Social network analysis: Methods and applications. New York, Cambridge University Press, 1994.
Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T. and Ueda, N. ”Learning systems of concepts with an infinite relational model”, AAAI 2006.
Kemp, C., Griffiths, T.L., Tenenbaum, J.B. ”Discovering latent classes in relational data”, MIT 2004.
Xu, Z., Yu, K., Tresp, V., Kriegel, H.-P. ”Infinite Hidden Relational Models”, UAI 2006.
Non-parametric modeling by the Chinese Restaurant Process (CRP)
Imagine a (Chinese) restaurant with countably infinitely many tables, labeled 1, 2, ... Customers (i.e. vertices) walk in and sit down at some table (cluster). The tables are chosen according to the following random process.
1. The first customer always chooses the first table.
2. The ith customer chooses:
a) the first unoccupied table with probability: α/(i-1+α)
b) an occupied table with probability: mk/(i-1+α)
where mk is the number of people already sitting at that table and α > 0 is the concentration parameter of the CRP.
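The seating process above can be simulated in a few lines. This is a minimal sketch using only the standard library; the function name `sample_crp`, the 0-based table indices and the choice α = 1 are illustrative assumptions.

```python
import random

def sample_crp(n, alpha, rng):
    """Seat n customers by the Chinese Restaurant Process.

    Returns z (0-based table index per customer) and m (number of customers per table).
    """
    z, m = [], []
    for i in range(1, n + 1):            # i-th customer, 1-based as in the text
        # Unnormalized weights: m_k for each occupied table, alpha for a new table.
        # random.choices normalizes them, here by their sum i - 1 + alpha.
        weights = m + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(m):
            m.append(1)                  # open the first unoccupied table
        else:
            m[k] += 1                    # join an occupied table
        z.append(k)
    return z, m

z, m = sample_crp(10, alpha=1.0, rng=random.Random(0))
print(z)
print(m)
```

The number of occupied tables is not fixed in advance; it grows with the number of customers at a rate governed by α, which is what makes the model non-parametric.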
[Figure: example seating of customers 1 to 10 at the restaurant tables]
See also tutorial:
Generative model of the IRM
(A statistical recipe for simulating networks)
• Draw clustering assignment from CRP
• Draw the relations between the generated clusters
• Draw the graph
Main Assumption: Relationships are conditionally independent given cluster assignments.
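The three steps of the recipe can be sketched as follows. The transcript does not preserve the slide's distributions, so this sketch assumes the common IRM choices: a Beta prior on the link probability between each pair of clusters and Bernoulli-distributed links in a directed graph without self-links. The names `sample_irm_graph`, `beta_plus`, `beta_minus` and `eta` are illustrative.

```python
import random

def sample_irm_graph(n, alpha, beta_plus, beta_minus, rng):
    """Simulate a directed graph from an IRM-style generative process (sketch)."""
    # Step 1: draw the clustering assignment z from the CRP.
    z, sizes = [], []
    for _ in range(n):
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [alpha])[0]
        if k == len(sizes):
            sizes.append(0)              # new cluster
        sizes[k] += 1
        z.append(k)
    K = len(sizes)

    # Step 2: draw the relations between clusters (assumed Beta prior).
    eta = [[rng.betavariate(beta_plus, beta_minus) for _ in range(K)]
           for _ in range(K)]

    # Step 3: draw the graph; links are independent given the cluster assignments.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < eta[z[i]][z[j]]:
                A[i][j] = 1
    return z, eta, A

z, eta, A = sample_irm_graph(12, alpha=1.0, beta_plus=1.0, beta_minus=1.0,
                             rng=random.Random(1))
print(z)
```

Step 3 is where the main assumption enters: once z and eta are fixed, each entry of A is drawn independently of all others.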
(slide adapted from Elena Zheleva and Galileo Namata ”Stochastic Block Models: A Survey”)
Different kinds of relational systems
Kemp, C., Griffiths, T.L. Tenenbaum, J.B. ”Discovering latent classes in relational data”, MIT 2004.
IRM extends well to many types of graphs
[Figure: example graphs of four types: Undirected, Directed, Bipartite, Multi-graph]
From the generative model
We can write down the joint likelihood
and we observe that the between-cluster link probabilities can be analytically integrated out (i.e., collapsed):
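The equations of this slide are not preserved in the transcript. The following is a hedged reconstruction rather than the slide's own notation, assuming the usual Beta-Bernoulli IRM for a directed graph without self-links. All symbols are assumptions: η_ab is the link probability from cluster a to cluster b with a Beta(β⁺, β⁻) prior, K is the number of clusters, m_k the size of cluster k, n the number of vertices, and N⁺_ab and N⁻_ab the numbers of links and non-links from cluster a to cluster b.

```latex
\begin{align}
P(A, Z, \eta \mid \alpha, \beta^+, \beta^-)
  &= \underbrace{\frac{\alpha^{K}\,\Gamma(\alpha)}{\Gamma(n+\alpha)}
     \prod_{k=1}^{K} \Gamma(m_k)}_{P(Z \mid \alpha)}
     \;\prod_{a,b} \frac{\eta_{ab}^{\beta^+ - 1}\,(1-\eta_{ab})^{\beta^- - 1}}{B(\beta^+, \beta^-)}
     \;\prod_{i \neq j} \eta_{z_i z_j}^{A_{ij}}\,(1-\eta_{z_i z_j})^{1-A_{ij}} \\
P(A, Z \mid \alpha, \beta^+, \beta^-)
  &= \int P(A, Z, \eta \mid \alpha, \beta^+, \beta^-)\, d\eta
   = P(Z \mid \alpha) \prod_{a,b}
     \frac{B(N^+_{ab} + \beta^+,\; N^-_{ab} + \beta^-)}{B(\beta^+, \beta^-)}
\end{align}
```

The integral is available in closed form because the Beta prior is conjugate to the Bernoulli likelihood; B denotes the Beta function.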
From the joint distribution
We can use Bayes' theorem to obtain the posterior distribution of zi
And infer Z by Gibbs sampling one vertex at a time from the above posterior distribution.
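A Gibbs sweep of this kind can be sketched as below. This is a deliberately naive illustration, not the authors' implementation: it assumes a collapsed Beta-Bernoulli IRM for a directed graph without self-links, recomputes the full collapsed log joint for every candidate assignment instead of using incremental updates, and all names (`log_joint`, `gibbs_sweep`, `bp`, `bm`) are hypothetical.

```python
import math
import random

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_joint(A, z, alpha, bp, bm):
    """Collapsed log P(A, Z): CRP prior times Beta-Bernoulli marginal likelihood."""
    n = len(z)
    labels = sorted(set(z))
    idx = {k: t for t, k in enumerate(labels)}
    K = len(labels)
    sizes = [z.count(k) for k in labels]
    # CRP prior on the partition.
    lp = (K * math.log(alpha) + math.lgamma(alpha) - math.lgamma(n + alpha)
          + sum(math.lgamma(m) for m in sizes))
    # Count links and vertex pairs between every ordered pair of clusters.
    links = [[0] * K for _ in range(K)]
    pairs = [[0] * K for _ in range(K)]
    for i in range(n):
        for j in range(n):
            if i != j:
                links[idx[z[i]]][idx[z[j]]] += A[i][j]
                pairs[idx[z[i]]][idx[z[j]]] += 1
    for a in range(K):
        for b in range(K):
            non_links = pairs[a][b] - links[a][b]
            lp += log_beta(links[a][b] + bp, non_links + bm) - log_beta(bp, bm)
    return lp

def gibbs_sweep(A, z, alpha, bp, bm, rng):
    """Resample each z_i in turn from its conditional posterior given the rest."""
    z = list(z)
    for i in range(len(z)):
        new_label = max(z) + 1                        # a label no vertex uses
        candidates = sorted(set(z[:i] + z[i + 1:])) + [new_label]
        logp = []
        for k in candidates:
            z[i] = k
            logp.append(log_joint(A, z, alpha, bp, bm))
        top = max(logp)                               # subtract max for stability
        weights = [math.exp(l - top) for l in logp]
        z[i] = rng.choices(candidates, weights=weights)[0]
    return z

# Toy graph with two planted blocks of four vertices, linked only within blocks.
block = [0, 0, 0, 0, 1, 1, 1, 1]
A = [[1 if i != j and block[i] == block[j] else 0 for j in range(8)]
     for i in range(8)]
z = [0] * 8
rng = random.Random(0)
for _ in range(20):
    z = gibbs_sweep(A, z, alpha=1.0, bp=1.0, bm=1.0, rng=rng)
print(z)
```

Because the conditional posterior of zi is proportional to the joint with zi set to each candidate cluster, normalizing the candidate joints gives exactly the distribution to sample from. A practical sampler would update the link counts incrementally rather than recomputing them.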
By sampling we can infer z that maximizes
Permutation of graph according to z
How do we evaluate the quality of the inferred parameters?
• Compare with ground truth if available
Vs.
• Evaluate how well the model predicts edges/links or accounts for structure in the data, using the generative process as null-model.
Ground truth evaluation: Use a metric that is permutation invariant, such as Mutual Information
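Permutation invariance matters because cluster labels are arbitrary: relabeling the clusters should not change the score. The sketch below computes the plain (unnormalized) mutual information in nats between two clusterings; whether the slides use a normalized variant is not stated, and the example label vectors are made up.

```python
import math
from collections import Counter

def mutual_information(z_true, z_est):
    """Mutual information (in nats) between two clusterings of the same vertices."""
    n = len(z_true)
    joint = Counter(zip(z_true, z_est))   # co-occurrence counts of label pairs
    count_true = Counter(z_true)
    count_est = Counter(z_est)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))) with probabilities given by counts / n
        mi += (c / n) * math.log(c * n / (count_true[a] * count_est[b]))
    return mi

z_true = [0, 0, 0, 1, 1, 1]
z_relabeled = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
z_other = [0, 1, 0, 1, 0, 1]       # a different partition

print(mutual_information(z_true, z_true))
print(mutual_information(z_true, z_relabeled))
print(mutual_information(z_true, z_other))
```

The first two scores coincide because the two label vectors describe the same partition, while the third is lower.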
When ground truth is not available
• Evaluate the model's predictive ability
Two commonly used measures based on prediction:
1) Generalization error, i.e. evaluate the predictive distribution on hold-out data (cross-validation)
2) Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) on hold-out data (cross-validation)
• Frequentist testing of structure in the network
Use the generative model to draw graphs and compare the properties of these graphs with those of the actual graph, using the generative model as null-distribution.
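The AUC measure listed above can be computed directly from the predicted link probabilities of held-out vertex pairs. This sketch uses the rank interpretation of the AUC (the probability that a randomly chosen held-out link receives a higher score than a randomly chosen held-out non-link); the scores and labels are made-up values for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]   # held-out links
    neg = [s for s, y in zip(scores, labels) if y == 0]   # held-out non-links
    if not pos or not neg:
        raise ValueError("need at least one held-out link and one non-link")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted link probabilities for four held-out vertex pairs.
scores = [0.9, 0.4, 0.5, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # → 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of links above non-links.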
Relational modeling of animal features
– 50 Animals
– 85 Features
[Figure: Animal × Feature matrix]
Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T. and Ueda, N. ”Learning systems of concepts with an infinite relational model”, AAAI 2006.
Relational modeling of countries, their characteristics and relations
– 14 Countries
– 54 Predicates
– 90 Features
[Figure: Country × Feature matrix (Characteristics) and Country × Country matrices (Relations)]
Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T. and Ueda, N. ”Learning systems of concepts with an infinite relational model”, AAAI 2006.
Example of our use of the IRM on fMRI data
Our Generative Model of Hierarchical Structure:
Stochastic Block Model/IRM
Clauset et al., Nature 2008; Roy et al., NIPS 2006
(Herlau, Mørup, Schmidt, Hansen, Cognitive Information Processing, 2012)
Synthetic data
Zachary’s Karate Club
Our model
Existing binary tree model
(Herlau, Mørup, Schmidt, Hansen, Cognitive Information Processing, 2012)
Our Generative Model of Community Structure
“The organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters.” (Fortunato, 2010)
(Mørup & Schmidt, in press, Neural Computation 2012)
Our applications so far...
• User recommendation on social/internet services such as Facebook and Netflix
• Topic modeling of large databases, e.g. PubMed and NYTimes
• Music preferential behaviour in large groups of people attending the Roskilde Festival
• Modeling functional and structural connectivity in the brain
• Link prediction in social networks