Bayesian Complementary Clustering, MCMC and Anglo-Saxon placenames
Giacomo Zanella
Department of Statistics, University of Warwick, Coventry, UK
14 March 2014
Introduction Problem considered The model MCMC MCMC for 2D kD Real Data
Overview
1. Motivation: historical problem.
2. Modeling part: literature, our approach, model definition.
3. Computational part: MCMC heuristics.
4. Real data analysis.
5. Related problems and future steps.
A classic problem: Cluster Analysis
Aim: organizing objects into groups whose members are “similar”.
Geometrical interpretation: separate points into clusters made of close points.
Figure: A point pattern being divided into two clusters.
Giacomo Zanella (University of Warwick) Bayesian Complementary Clustering, MCMC and Anglo-Saxon placenames 14/03/2014 1 / 60
Popular approaches to Cluster Analysis
Deterministic approach
• K-means clustering,
• Hierarchical clustering,
• ...
Probabilistic approach (model-based clustering)
• Mixture of Gaussians,
• Bayesian cluster models (BCM):
  • Inferences on centers;
  • Inferences on intensity measure;
  • Inferences on cluster partition;
• ...
Figure: k-means clustering.
Figure: Model-based clustering.
Bayesian Cluster Models
Definition (Cluster point process)
A cluster process is the superposition $\bigcup_{z \in \mathbf{z}} \mathbf{x}_z$ of a collection of independent daughter point processes $\mathbf{x}_z$, indexed by the points $z$ of a center point process $\mathbf{z}$.
Figure: Centre process z.
Figure: Cluster process x.
Observed points x = {x1, ..., xn(x)}
Unobserved centers z = {z1, ..., zN(z)} and partition ρ = {C1, ..., CN(ρ)}
Bayesian Cluster models: Example 1
Kang, Jian, et al. "Meta analysis of functional neuroimaging data via Bayesian spatial point processes." Journal of the American Statistical Association 106.493 (2011): 124-134.
Figure: Application of a Bayesian Cluster Model in functional neuroimaging: inferences on activation centers given peak activation locations.
Model-based Clustering: Example 2
Hill, B.J., Kendall, W.S. & Thonnes, E., 2012. Fibre-generated Point Processes and Fields of Orientations. Annals of Applied Statistics, 6(3), pp. 994-1020.
Figure: Fibre clustering. Application to fingerprints.
Our clustering problem: original motivation
Problem posed by John Blair (History Professor from Oxford).
Figure: Reconstruction of an Anglo-Saxon settlement in West Stow, Suffolk. (Image borrowed from John Blair)
Empirical Observations - 1
Stretton, Newton, Burton, Carlton in the region of Gt. Glen
Empirical Observations - 2
Stratton, Charlton, Kingston, Burton in the region of Dorchester
Structure of administrative clusters
Need for cluster analysis
The data as a marked point process
Historical Questions - 1
Is there statistical support for the “administrative clusters hypothesis” coming from the geographical locations of settlements?
Historical Questions - 2
What is the typical intra-cluster dispersion σ?
Historical Questions - 3
Which portion of the settlements are clustered together?
Which placenames tend to cluster together?
Can we provide a list of the clusters most strongly supported by the analysis?
Model requirements
Marked Cluster Point Process: each point is assigned a mark (color), in this case its placename.
Complementary clustering: two points of the same color are not admitted in the same cluster.
Inferences on the cluster partition:
π(ρ|x) ∝ π(ρ) π(x|ρ)
x = {x1, ..., xn} ↔ Observed points
ρ = {C1, ..., CN} ↔ Unobserved partition
BCM with inferences on the partitions
We are given n observed points x = {x1, ..., xn}.
We denote a partition of x by ρ = {C1, ...,CN(ρ)}.
Random Partition Model (RPM)
1. Define a prior distribution π(ρ) for the partition of n elements into clusters (π must be exchangeable with respect to both cluster and point labels).
2. Define the distribution of x|ρ (usually by defining the distribution h_s(·) of a cluster with s elements and supposing that the clusters are independent).
3. Perform Bayesian inference on the partition:
$$ \pi(\rho \mid x) \propto \pi(\rho)\, \pi(x \mid \rho) = \pi(\rho) \prod_{j=1}^{N(\rho)} h_{|C_j|}(x_{C_j}) $$
Data Generation Model for x|ρ
Cluster centers: z ∼ Inhomogeneous Poisson Point Process
Data Generation Model for x|ρ
Locations: i.i.d. Gaussians conditioned on the cluster centers being their means
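The two generation steps can be sketched as a small simulation. This is only an illustration under stated assumptions: the centre process is taken to be homogeneous on a rectangle (the slides use an inhomogeneous intensity), and the names `simulate_clusters`, `intensity`, and `size_weights` are hypothetical. Colours within a cluster are drawn without replacement, which enforces the complementary constraint.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_clusters(intensity, width, height, sigma, k, size_weights, seed=0):
    """Return a list of (x, y, colour, cluster_id) tuples.

    Assumption: homogeneous Poisson centres on [0, width] x [0, height].
    """
    rng = random.Random(seed)
    n_centres = sample_poisson(intensity * width * height, rng)
    points = []
    for cluster_id in range(n_centres):
        cx, cy = rng.uniform(0, width), rng.uniform(0, height)
        # Cluster size between 1 and k, drawn with the given weights.
        size = rng.choices(range(1, k + 1), weights=size_weights)[0]
        # Distinct colours: no two points of the same colour share a cluster.
        for colour in rng.sample(range(k), size):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma), colour, cluster_id))
    return points
```

Calling `simulate_clusters(0.5, 10, 10, 0.3, 3, [1, 1, 1])` gives a synthetic three-colour pattern whose true partition is recorded in the last tuple entry.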
Prior on the partition π(ρ)
Main idea: pass from a partition of n points into many little clusters to a partition of n points into k big clusters.
Exchangeable prior on partitions
ρ = {C1, ..., CN(ρ)} partition of {1, ..., n}.
Nl(ρ) := #{Cj : |Cj| = l}, l = 1, ..., n.
π(ρ) exchangeable over cluster and point indices
⇒ π(ρ) depends only on N1(ρ), ..., Nn(ρ) (e.g. Dirichlet process)
Complementary clustering with k types
Nk+1(ρ) = ... = Nn(ρ) = 0
⇒ π(ρ) depends only on N1(ρ), ..., Nk(ρ)
Yl(ρ) := l Nl(ρ), the number of points in clusters of size l
$$ \Rightarrow \sum_{l=1}^{k} Y_l(\rho) = n \quad \text{and} \quad Y_l \equiv 0 \pmod{l} $$
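As a small illustration of these summary statistics, the sketch below computes N_l and Y_l for a partition given as a list of clusters. The function name `size_counts` is hypothetical, and the example partition is the one used later in the slides for the two-colour case.

```python
from collections import Counter


def size_counts(partition, k):
    """Return (N, Y) where N[l-1] is the number of clusters of size l
    and Y[l-1] = l * N[l-1] is the number of points in such clusters."""
    sizes = Counter(len(cluster) for cluster in partition)
    if any(size > k for size in sizes):
        raise ValueError("complementary clustering allows at most k points per cluster")
    N = [sizes.get(l, 0) for l in range(1, k + 1)]
    Y = [l * N[l - 1] for l in range(1, k + 1)]
    return N, Y


N, Y = size_counts([{1}, {2, 6}, {3}, {4, 7}, {5}], k=2)
```

Here the Y_l sum to the number of points, and each Y_l is a multiple of l by construction.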
Prior on the partition π(ρ)
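ρ partition ↔ (Y1(ρ), ..., Yk(ρ))
Distribution on the multivariate vector
$$ \Pr(Y_1 = y_1, \dots, Y_k = y_k) \propto \begin{cases} \dfrac{n(x)!}{y_1! \cdots y_k!}\, p_1^{y_1} \cdots p_k^{y_k} & \text{if } \sum_{l=1}^{k} y_l = n(x) \text{ and } y_l \equiv 0 \pmod{l}, \\ 0 & \text{otherwise,} \end{cases} $$
with p = (p1, ..., pk) ∼ Dir(1, ..., 1).
Distribution on the partition
$$ \pi(\rho) \propto \frac{1}{\eta(\rho)}\, \frac{n(x)!}{Y_1(\rho)! \cdots Y_k(\rho)!}\, p_1^{Y_1(\rho)} \cdots p_k^{Y_k(\rho)}, $$
where $\eta(\rho) = \#\{\rho' \mid Y_1(\rho') = Y_1(\rho), \dots, Y_k(\rho') = Y_k(\rho)\}$ is the number of partitions sharing the same vector of counts as ρ.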
Posterior distribution
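Random objects
p: probability vector of cluster sizes
ρ: partition into clusters
σ: intra-cluster dispersion
x: observed point process
$$ \pi(\rho, \sigma, p \mid x) \propto \pi(p)\, \pi(\sigma)\, \pi(\rho \mid p)\, \pi(x \mid \rho, \sigma) \propto \pi(\sigma)\, \pi(\rho \mid p) \prod_{j=1}^{N(\rho)} h_{|C_j|}(x_{C_j}) $$
$$ \propto \pi(\sigma) \prod_{l=1}^{k} \left( \frac{N_l!}{(l N_l)!} \right) \prod_{j=1}^{N(\rho)} \left( c_{|C_j|} \exp\left( -\frac{\sum_{i \in C_j} (x_i - \bar{x}_{C_j})^2}{2\sigma^2} \right) \right), $$
where $\bar{x}_{C_j}$ is the centroid of cluster $C_j$ and
$$ c_s = \binom{k}{s}^{-1} \frac{p_s^{s}}{s\, W\, (2\pi\sigma^2)^{s-1}}. $$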
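The partition-dependent part of this posterior can be evaluated on the log scale. The sketch below is only an illustration and rests on several assumptions: it reads the constant as c_s = C(k,s)^(-1) p_s^s / (s W (2πσ²)^(s-1)), takes the points to be planar, treats W as the area of the observation window, and omits the priors on σ and p. The function name and argument layout are hypothetical.

```python
import math


def log_posterior_partition(points, partition, p, sigma, k, area):
    """Unnormalised log posterior of a partition for fixed p and sigma.

    points: list of (x, y); partition: list of lists of point indices;
    p: list of k positive probabilities; area: window area W (assumed).
    """
    log_post = 0.0
    sizes = [len(cluster) for cluster in partition]
    # Combinatorial factor prod_l N_l! / (l N_l)!.
    for l in range(1, k + 1):
        n_l = sizes.count(l)
        log_post += math.lgamma(n_l + 1) - math.lgamma(l * n_l + 1)
    for cluster in partition:
        s = len(cluster)
        cx = sum(points[i][0] for i in cluster) / s
        cy = sum(points[i][1] for i in cluster) / s
        # Sum of squared distances from the cluster centroid.
        ss = sum((points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2 for i in cluster)
        log_c = (-math.log(math.comb(k, s)) + s * math.log(p[s - 1])
                 - math.log(s * area) - (s - 1) * math.log(2 * math.pi * sigma ** 2))
        log_post += log_c - ss / (2 * sigma ** 2)
    return log_post
```

With two nearby points of different colours, pairing them scores higher than leaving them as singletons, while for two distant points the ordering reverses.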
Posterior distribution
Observed point process x is the superposition of all the clusters.
Posterior distribution
We are interested in the posterior distribution of ρ: π(ρ|x)
Intractability of π(ρ|x)
π(ρ|x) is a probability measure on Pn, the set of partitions of {1, ..., n}.
We are interested in E[f(ρ)] for ρ ∼ π(·|x).
Problems
• π(·|x) is known only up to a normalizing constant.
• Normalizing π(·|x) or sampling exactly from π(·|x) is extremely inefficient (the order of Pn is between n! and n^n).
Finding ρmax = argmax_ρ π(ρ|x)?
Sampling exactly from π(·|x) is infeasible. Can we at least find the maximum?
Find ρmax ↔ Optimal Assignment Problem
2D Optimal Assignment Problem
Solvable in polynomial time, O(n^3), with the Hungarian algorithm (Optimal Transportation Theory).
kD Optimal Assignment Problem (k ≥ 3)
NP-hard optimization problem. Not approximable in polynomial time with any deterministic algorithm (not in APX).
Figure: Optimal assignment with 50 red and 50 blue points.
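To make the two-colour assignment problem concrete, here is a brute-force sketch for tiny inputs. It assumes a complete matching and a fixed σ, so that only the Gaussian term varies and the best matching minimises the total squared distance. It enumerates all n! permutations and is not the Hungarian algorithm, which solves the same problem in O(n^3); the function name is hypothetical.

```python
from itertools import permutations


def best_complete_matching(red, blue):
    """Return the tuple perm such that red[r] is matched to blue[perm[r]]
    and the total squared distance is minimal (brute force, tiny n only)."""
    def cost(perm):
        return sum((rx - blue[j][0]) ** 2 + (ry - blue[j][1]) ** 2
                   for (rx, ry), j in zip(red, perm))
    return min(permutations(range(len(blue))), key=cost)


red = [(0.0, 0.0), (5.0, 5.0)]
blue = [(5.0, 4.0), (0.0, 1.0)]
matching = best_complete_matching(red, blue)  # red 0 -> blue 1, red 1 -> blue 0
```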
Feasible approach: Markov chain Monte Carlo
MCMC approach
Simulate an ergodic Markov chain (Xn)n≥0 with stationary distribution π. Estimate I = Eπ[f(X)] with
$$ I_n := \frac{1}{n} \sum_{k=t+1}^{t+n} f(X_k). $$
The SLLN for Markov chains gives In → I a.s. under mild conditions.
Question: How to design a Markov chain whose stationary distribution is π? Is it easier than sampling from or normalizing π?
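The ergodic average can be illustrated on a toy chain. The sketch below discards a burn-in of t steps and then averages f over the next n states; the two-state chain and all names are assumptions made for the example, not part of the talk.

```python
import random


def ergodic_average(f, step, x0, burn_in, n, seed=0):
    """Average f over n consecutive states after discarding burn_in steps."""
    rng = random.Random(seed)
    x = x0
    for _ in range(burn_in):
        x = step(x, rng)
    total = 0.0
    for _ in range(n):
        x = step(x, rng)
        total += f(x)
    return total / n


def two_state_step(x, rng):
    """Toy chain on {0, 1}: stay in 0 w.p. 0.9, stay in 1 w.p. 0.8."""
    stay = 0.9 if x == 0 else 0.8
    return x if rng.random() < stay else 1 - x


estimate = ergodic_average(lambda x: x, two_state_step, x0=0, burn_in=1000, n=200000)
```

For this chain the stationary distribution is (2/3, 1/3), so the estimate of E[X] should be close to 1/3.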
Reversibility and Stationarity
Reversibility condition
A transition kernel P is reversible with respect to π if, for all (x, y) ∈ X × X,
$$ \pi(x)\, P(x, y) = \pi(y)\, P(y, x). \tag{1} $$
(1) implies πP = π, i.e. π is a stationary distribution of (Xn)n≥0 driven by P.
Figure: “Probability flow” between x and y starting from π.
Metropolis-Hastings algorithm idea: given a transition kernel Q(x, y), make it reversible with respect to π by suppressing some moves with the correct probabilities.
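On a finite state space this idea can be checked directly. The sketch below builds the Metropolis-Hastings kernel from a proposal matrix Q and a target vector π, so that reversibility and stationarity can be verified numerically; the three-state example and the function name are assumptions for illustration.

```python
def metropolis_kernel(pi, Q):
    """Build the Metropolis-Hastings transition matrix for target pi and proposal Q."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and Q[x][y] > 0:
                accept = min(1.0, pi[y] * Q[y][x] / (pi[x] * Q[x][y]))
                P[x][y] = Q[x][y] * accept
        # Rejected proposals keep the chain at x.
        P[x][x] = 1.0 - sum(P[x])
    return P


pi = [0.2, 0.3, 0.5]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
P = metropolis_kernel(pi, Q)
# Stationarity: (pi P)[y] should equal pi[y] for every y.
pi_P = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
```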
Metropolis-Hastings algorithm
Obtain Xn+1 from Xn as follows:
1. Sample the proposed move X' ∼ Q(Xn, ·).
2. Compute the acceptance probability
$$ \alpha(X_n, X') = \min\left\{ 1, \frac{f(X')\, Q(X', X_n)}{f(X_n)\, Q(X_n, X')} \right\}, $$
where f denotes the target density, which needs to be known only up to a normalizing constant.
3. With probability α(Xn, X') set Xn+1 = X', otherwise set Xn+1 = Xn.
Very general and flexible, but performance depends heavily on the choice of Q.
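A single step of the algorithm fits in a few lines. The sketch below is generic: `target` is an unnormalised density, `propose` draws from Q, and `q(a, b)` returns Q(a, b). The ring-shaped toy target used to exercise it is an assumption for the example.

```python
import random


def mh_step(x, target, propose, q, rng):
    """One Metropolis-Hastings step from state x."""
    y = propose(x, rng)
    ratio = target(y) * q(y, x) / (target(x) * q(x, y))
    # Accept the proposed state with probability min(1, ratio).
    return y if rng.random() < min(1.0, ratio) else x


weights = [1.0, 2.0, 3.0, 4.0]  # unnormalised target on {0, 1, 2, 3}


def ring_propose(x, rng):
    """Move one step left or right on a ring of four states."""
    return (x + rng.choice([-1, 1])) % 4


def run_chain(n_steps, seed=0):
    rng = random.Random(seed)
    x, visits = 0, [0, 0, 0, 0]
    for _ in range(n_steps):
        x = mh_step(x, lambda s: weights[s], ring_propose, lambda a, b: 0.5, rng)
        visits[x] += 1
    return [v / n_steps for v in visits]
```

After many steps the visit frequencies returned by `run_chain` should approach the normalised weights (0.1, 0.2, 0.3, 0.4).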
2D case (i.e. 2 colors)
Why start from 2D?
• Simpler: use it to design a good proposal Q for the MH algorithm (understand what can go wrong and how to solve it).
• ρmax = argmax_{ρ∈Pn} π(ρ|x) is available: reliable check for the MCMC.
• It allows us to explore pairwise interaction among placenames in the dataset.
2D case (i.e. 2 colors)
Posterior Sample Space for 2D
Partial matchings contained in a complete bipartite graph.
partition ρ = {{1}, {2, 6}, {3}, {4, 7}, {5}} ↔ matching X(ρ)
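The correspondence between partitions and partial matchings can be written out explicitly. In this sketch the colouring (points 1 to 5 red, points 6 and 7 blue) is an assumption chosen to fit the example partition, and both function names are hypothetical.

```python
def partition_to_matching(partition, is_red):
    """Map a partition with clusters of size 1 or 2 to a set of (red, blue) edges."""
    edges = set()
    for cluster in partition:
        if len(cluster) > 2:
            raise ValueError("two-colour clusters have at most two points")
        if len(cluster) == 2:
            a, b = sorted(cluster)
            if is_red(a) == is_red(b):
                raise ValueError("a cluster cannot contain two points of the same colour")
            edges.add((a, b) if is_red(a) else (b, a))
    return edges


def matching_to_partition(edges, points):
    """Inverse map: matched pairs become clusters, unmatched points become singletons."""
    matched = {point for edge in edges for point in edge}
    return [set(edge) for edge in edges] + [{p} for p in points if p not in matched]


rho = [{1}, {2, 6}, {3}, {4, 7}, {5}]
edges = partition_to_matching(rho, is_red=lambda point: point <= 5)
```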
Proposal distribution (I)
Proposal distribution Q(Xold, Xnew)
1. Pick a red point i and a blue point j uniformly at random.
2. Propose the corresponding move (add/remove/switch).
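One possible reading of the add/remove/switch moves is sketched below, with the matching stored as a dictionary from red to blue indices. The slides do not say what happens when both i and j are already matched to other points; this sketch assumes that no move is proposed in that case. The function name is hypothetical.

```python
def propose_move(matching, i, j):
    """Return the matching proposed by selecting red point i and blue point j.

    matching: dict mapping each matched red index to its blue partner.
    """
    new = dict(matching)
    blue_to_red = {b: r for r, b in matching.items()}
    if matching.get(i) == j:
        del new[i]                      # remove the existing edge {i, j}
    elif i not in matching and j not in blue_to_red:
        new[i] = j                      # add the edge {i, j}
    elif i in matching and j not in blue_to_red:
        new[i] = j                      # switch: i leaves its partner for j
    elif i not in matching and j in blue_to_red:
        del new[blue_to_red[j]]         # switch: j leaves its partner for i
        new[i] = j
    # Both matched elsewhere: assumed to leave the state unchanged.
    return new
```

Every branch keeps the state a valid partial matching, since each red and each blue point appears in at most one edge.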
Proposal distribution (II)
vertices ↔ states of the MC
edges ↔ moves allowed
Proposal Distribution
Xnew ∼ Unif(N(Xold)),
where N(X) is the set of states connected to X.
Question
Do we have good mixing?
Figure: Markov chain represented by a graph.
Uniform proposal leads to bad mixing
Low acceptance rate: Q(Xold, Xnew) often proposes to link two far-away points.
Possible solution: change Q to take into account the geometry of the problem.
Instead of picking a red point i and a blue point j uniformly at random, choose them according to some q(i, j).
$$ Q(X_{old}, X_{new}) \propto \begin{cases} q(i, j), & \text{if } X_{old} \text{ goes to } X_{new} \text{ by choosing } \{i, j\}, \\ 0, & \text{if } X_{new} \notin N(X_{old}). \end{cases} $$
Question
What is the optimal choice of q(i, j)?
Different proposal distributions
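Three proposals
1. q(i, j) ∝ 1;
2. q(i, j) ∝ π(Xnew);
3. q(i, j) ∝ √π(Xnew),
where Xnew is the proposed state.
Intuitive idea
Given the set of allowed moves, good mixing will be obtained when
$$ \frac{Q(X_{old}, X_{new})}{Q(X_{new}, X_{old})} \approx \frac{\pi(X_{new})}{\pi(X_{old})}. $$
In general,
$$ \frac{Q(X_{old}, X_{new})}{Q(X_{new}, X_{old})} = \frac{q_{X_{old}}(i, j)}{q_{X_{new}}(i, j)}. $$
For proposal 2:
$$ \frac{q_{X_{old}}(i, j)}{q_{X_{new}}(i, j)} = \frac{\pi(X_{new}) / \pi(N(X_{old}))}{\pi(X_{old}) / \pi(N(X_{new}))} = \frac{\pi(X_{new})}{\pi(X_{old})} \cdot \frac{\pi(N(X_{new}))}{\pi(N(X_{old}))} \approx \frac{\pi(X_{new})}{\pi(X_{old})} \cdot \frac{\pi(X_{new})}{\pi(X_{old})} = \left( \frac{\pi(X_{new})}{\pi(X_{old})} \right)^{2}. $$
For proposal 3:
$$ \frac{q_{X_{old}}(i, j)}{q_{X_{new}}(i, j)} = \frac{\sqrt{\pi(X_{new})} / \pi(N(X_{old}))}{\sqrt{\pi(X_{old})} / \pi(N(X_{new}))} = \frac{\sqrt{\pi(X_{new})}}{\sqrt{\pi(X_{old})}} \cdot \frac{\pi(N(X_{new}))}{\pi(N(X_{old}))} \approx \frac{\sqrt{\pi(X_{new})}}{\sqrt{\pi(X_{old})}} \cdot \frac{\sqrt{\pi(X_{new})}}{\sqrt{\pi(X_{old})}} = \frac{\pi(X_{new})}{\pi(X_{old})}. $$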
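The three weightings differ only in how the target values of the neighbouring states are turned into proposal probabilities. The sketch below normalises the weights over the neighbours of the current state; note that for the square-root proposal it normalises by the sum of the square roots, which is one way to read the normalising constant written as π(N(·)) in the slides. The function name and the `kind` labels are hypothetical.

```python
import math


def proposal_probs(neighbour_targets, kind):
    """Proposal probabilities over the neighbours of the current state.

    neighbour_targets: unnormalised target values pi(X_new), one per neighbour.
    kind: "uniform" (q ∝ 1), "pi" (q ∝ pi), or "sqrt" (q ∝ sqrt(pi)).
    """
    weight = {"uniform": lambda t: 1.0, "pi": lambda t: t, "sqrt": math.sqrt}[kind]
    weights = [weight(t) for t in neighbour_targets]
    total = sum(weights)
    return [w / total for w in weights]


# Two neighbours, the second four times more probable under the target.
uniform = proposal_probs([1.0, 4.0], "uniform")  # [0.5, 0.5]
greedy = proposal_probs([1.0, 4.0], "pi")        # [0.2, 0.8]
balanced = proposal_probs([1.0, 4.0], "sqrt")    # [1/3, 2/3]
```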
Compare performances of the three proposals (I)
1) q(i , j) ∝ 1 2) q(i , j) ∝ π(Xnew ) 3) q(i , j) ∝√π(Xnew )
Compare performances of the three proposals (II)
1) q(i , j) ∝ 1 2) q(i , j) ∝ π(Xnew ) 3) q(i , j) ∝√π(Xnew )
Multiple proposal scheme
Convergence Diagnostic (I)
Figure: The summary statistic in (a) and (b) is the number of different edges from a fixed matching. In (c) the intensity of gray represents the percentage of time the link has been present in the MCMC.
Convergence Diagnostic (II)
Figure: Left: G&R diagnostic for a 10-dimensional summary of 5 independent runs of the MCMC. Right: average of the value of D obtained by comparing 5 independent runs.
Let $p_{ij} = P_\pi[\{i, j\} \in X]$ and let $p^{(1)}_{ij}$, $p^{(2)}_{ij}$ be the values estimated by two independent MCMC runs. $D_{12} := \sup_{\{i,j\} \in E} |p^{(1)}_{ij} - p^{(2)}_{ij}|$ can be used as a measure of proximity.
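The statistic D can be computed from the sampled matchings of two runs. In this sketch each run is a list of matchings and each matching is a set of (i, j) edges; the function names and the tiny example runs are assumptions for illustration.

```python
from collections import Counter


def edge_frequencies(samples):
    """Fraction of sampled matchings in which each edge appears."""
    counts = Counter(edge for matching in samples for edge in matching)
    return {edge: count / len(samples) for edge, count in counts.items()}


def sup_distance(freq_a, freq_b):
    """Largest absolute difference between two sets of edge frequencies."""
    edges = set(freq_a) | set(freq_b)
    return max((abs(freq_a.get(e, 0.0) - freq_b.get(e, 0.0)) for e in edges), default=0.0)


run_1 = [{(0, 0)}, {(0, 0), (1, 1)}]
run_2 = [{(0, 1)}, {(0, 0)}]
d_12 = sup_distance(edge_frequencies(run_1), edge_frequencies(run_2))
```

Edges never visited by a run are treated as having estimated probability zero.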
Convergence Diagnostic (III)
Single/multiple    q(i, j) ∝    ESS for       ESS for    Time for      Time for
proposal scheme                 10^4 steps    10 sec     GR < 1.002    D < 0.05
Single             √π(Xnew)     83.46         64.65      20 · 10^4     28 · 10^4
Single             π(Xnew)      30.19         23.38      49 · 10^4     48 · 10^4
Multiple (l=4)     √π(Xnew)     204.54        128.56     6 · 10^4      7 · 10^4
Table: The values refer to a synthetic sample with 100 + 100 points. The Effective Sample Size (ESS) and Gelman & Rubin diagnostic are estimated using the coda package. The running time is evaluated using R software running on a desktop computer with an Intel i7 processor.
Multimodality for the complete matching case
For pnoise → 0, π(X) shows multimodality.
⇒ the MCMC gets stuck in local maxima.
We used Simulated Tempering and tested on extreme cases (artificial cycles).
Theoretical bounds for mixing times?
2-color case: Monomer-dimer systems
Given a graph G = (V, E) with edge weights w : E → [0, ∞), the state space is {0, 1}^E with probability distribution
$$ \pi(X) \propto \prod_{e : X(e) = 1} w(e) \left( \prod_{i \in V} \mathbb{1}(\deg_X(i) \le 1) \right). \tag{2} $$
Jerrum and Sinclair [1996] use canonical paths arguments to prove that
$$ \tau_X(\varepsilon) \le 4\, (\#E)(\#V)\, w'^2 \left( \log(\#E)\, \#E + \log(\varepsilon^{-1}) \right), \qquad w' = \max\left\{ 1, \max_{e \in E} w(e) \right\}. $$
Experiment with 100 points (50 blue + 50 red)
JS bound: 5 · 10^14 steps
G&R diagnostic: 10^6 steps
General k dimensional case
Posterior Sample Space for kD
Partial matchings (i.e. hypergraphs of degree at most 1) contained in a complete k-partite hypergraph.
Figure: A complete 3-partite hypergraph.
Figure: A matching in a 3-partite hypergraph. The corresponding partition is {{1}, {2, 3, 4}}.
Reducing kD to 2D (I)
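$$ \pi(\rho \mid x) \propto \prod_{l=1}^{k} \left( \frac{N_l!}{(l N_l)!} \right) \prod_{j=1}^{N(\rho)} c_{|C_j|} \exp\left( -\frac{\sum_{i \in C_j} (x_i - \bar{x}_{C_j})^2}{2\sigma^2} \right). $$
Lemma
For any $x_1, \dots, x_s, z \in \mathbb{R}^n$ and $\bar{x} = s^{-1} \sum_{i=1}^{s} x_i$ it holds that
$$ \sum_{i=1}^{s} (x_i - \bar{x})^2 = \sum_{i=1}^{s} (x_i - z)^2 - s\, (\bar{x} - z)^2. $$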
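The lemma follows from a one-line expansion, sketched here for completeness (the slides state it without proof). Squares of vectors denote squared Euclidean norms, and the cross term vanishes because the deviations from the mean sum to zero.

```latex
\begin{align*}
\sum_{i=1}^{s} (x_i - z)^2
  &= \sum_{i=1}^{s} \bigl( (x_i - \bar{x}) + (\bar{x} - z) \bigr)^2 \\
  &= \sum_{i=1}^{s} (x_i - \bar{x})^2
     + 2\, (\bar{x} - z) \cdot \sum_{i=1}^{s} (x_i - \bar{x})
     + s\, (\bar{x} - z)^2 \\
  &= \sum_{i=1}^{s} (x_i - \bar{x})^2 + s\, (\bar{x} - z)^2 ,
\end{align*}
```

since $\sum_{i=1}^{s} (x_i - \bar{x}) = 0$. Rearranging gives the stated identity.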
Reducing kD to 2D (II)
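Corollary
For any $x_1, \dots, x_s \in \mathbb{R}^n$, $\bar{x} = s^{-1} \sum_{i=1}^{s} x_i$ and $\bar{x}^{(s-1)} = (s-1)^{-1} \sum_{i=1}^{s-1} x_i$ it holds that
$$ \sum_{i=1}^{s} (x_i - \bar{x})^2 = \sum_{i=1}^{s-1} (x_i - \bar{x}^{(s-1)})^2 + \frac{s-1}{s}\, (x_s - \bar{x}^{(s-1)})^2. $$
Proof
Follows from the previous lemma by putting $z = \bar{x}^{(s-1)}$.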
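Read as an update rule, the corollary says that adding a point to a cluster changes the within-cluster sum of squares by (s−1)/s times the squared distance between the new point and the old centroid. A minimal sketch of that update follows; the function name and argument order are hypothetical.

```python
def add_point(mean, ss, size, x):
    """Update the centroid and sum of squares of a cluster after adding point x.

    mean: centroid of the current cluster (tuple); ss: its sum of squared
    distances from the centroid; size: its current number of points (s - 1).
    """
    s = size + 1
    diff = [xi - mi for xi, mi in zip(x, mean)]
    new_ss = ss + (s - 1) / s * sum(d * d for d in diff)
    new_mean = tuple(mi + d / s for mi, d in zip(mean, diff))
    return new_mean, new_ss


# Build the cluster {(0,0), (2,0), (1,3)} one point at a time.
mean, ss = (0.0, 0.0), 0.0
mean, ss = add_point(mean, ss, 1, (2.0, 0.0))
mean, ss = add_point(mean, ss, 2, (1.0, 3.0))
```

The result agrees with computing the centroid (1, 1) and the sum of squares 8 directly from the three points.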
Reducing kD to 2D (III)
$$ \pi(\rho \mid \sigma, p, x) = \pi^{2D}(\rho^{2D} \mid \sigma, p, x^{2D}) $$
where
• x2D = x2D(x, ρ, i) is the 2-color point process obtained by keeping the points of the i-th color and replacing the others with their cluster centroids.
• ρ2D = ρ2D(ρ, i) is the induced matching.
• π2D is the posterior distribution of the 2-color case with slight changes.
kD Algorithm
kD-Algorithm
Obtain ρnew from ρold as follows:
1) Sample a color i ∼ U({1, ..., k});
2) Evaluate x2D_old = x2D(x, ρold, i) and ρ2D_old = ρ2D(ρold, i);
3) Obtain ρ2D_new by performing a move of the 2D-Algorithm, using π2D(·|σ, x2D_old) as target distribution and ρ2D_old as current state;
4) Obtain ρnew from ρ2D_new.
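Step 2 of the algorithm, the reduction to a two-colour problem, can be sketched as follows. The data layout is an assumption: points are coordinate tuples, `colours[a]` is the colour of point a, and the partition is a list of lists of point indices. Each centroid is returned together with the number of points it replaces, which a 2D target would presumably need (the "slight changes" mentioned in the slides are not spelled out there).

```python
def reduce_to_2d(points, colours, partition, i):
    """Collapse a k-colour configuration to a 2-colour one for colour i.

    Returns (kept, centroids, matching): kept lists the indices of the
    colour-i points, centroids lists (centre, n_points) pairs for the rest of
    each cluster, and matching pairs positions in kept with positions in
    centroids for clusters that contain both.
    """
    kept, centroids, matching = [], [], []
    for cluster in partition:
        own = [a for a in cluster if colours[a] == i]
        others = [a for a in cluster if colours[a] != i]
        if own:
            kept.append(own[0])  # at most one point of colour i per cluster
        if others:
            dim = len(points[others[0]])
            centre = tuple(sum(points[a][d] for a in others) / len(others)
                           for d in range(dim))
            centroids.append((centre, len(others)))
        if own and others:
            matching.append((len(kept) - 1, len(centroids) - 1))
    return kept, centroids, matching


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
colours = [0, 1, 2, 1]
kept, centroids, matching = reduce_to_2d(points, colours, [[0, 1, 2], [3]], i=0)
```

In the example, point 0 is kept and matched to the centroid (1, 1) of points 1 and 2, while point 3 becomes an unmatched centroid of size one.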
kD algorithm results
Format of the Dataset
COUNTY   PLACE     PARISH/TOWNSHIP           GRID REF     DATE OF FIRST EVIDENCE
BRK      Bourton   Bourton                   SU 230870    c. 1200
BUC      Bierton   Bierton with Broughton    SP 836152    DB
BUC      Bourton   Buckingham                SP 710333    DB
CHE      Burton    Burton (T)                SJ 509639    DB
CHE      Burton    Burton (T)                SJ 317743    1152
CHE      Buerton   Buerton (T)               SJ 682433    DB
Table: Data available regarding the first 6 settlements with the name Burton.
Data Cleaning
Operations performed
1. Placenames: Burton, Bourton, Bierton, Buerton, etc. → Burton.
2. Locations: SP 836 152 identifies a 100m × 100m square; SP 83 15 identifies a 1km × 1km square. In both cases the placename is placed at the centre of the square.
3. “Multiple” settlements
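The location rule in step 2 can be sketched as a small parser. It returns the centre of the referenced square as an offset in metres within the 100 km grid square named by the two letters; converting the letters themselves to national coordinates is deliberately left out, and the function name is hypothetical.

```python
def grid_ref_centre(ref):
    """Centre of the square identified by a grid reference.

    'SP 836 152' -> ('SP', 83650.0, 15250.0), offsets in metres from the
    south-west corner of the 100 km square 'SP'.
    """
    parts = ref.split()
    letters, digits = parts[0], "".join(parts[1:])
    if not digits.isdigit() or len(digits) % 2:
        raise ValueError("expected an even number of digits after the letters")
    half = len(digits) // 2
    resolution = 10 ** (5 - half)  # side of the square in metres
    easting = int(digits[:half]) * resolution + resolution / 2
    northing = int(digits[half:]) * resolution + resolution / 2
    return letters, easting, northing


fine = grid_ref_centre("SP 836 152")   # 100 m square
coarse = grid_ref_centre("SP 83 15")   # 1 km square
```

The two example references from the slide map to centres 150 m east and 250 m north of each other, which shows the size of the error introduced by the coarser references.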
Number of Settlements
Placenames           total     # with      # of couples      # of couples
                     number    1km acc.    (by historians)   (by proximity)
Aston/Easton         90        0           1                 8
Bolton               17        1           1                 0
Burh-Stall           29        2           1                 0
Burton               109       2           1                 7
Centres              46        0           0                 0
Charlton/Charlcot    98        3           7                 1
Chesterton           9         0           0                 0
Claeg                84        13          0                 5
Draycot/Drayton      55        1           0                 2
Eaton                33        1           1                 5
Kingston             71        1           1                 1
Knighton             26        1           0                 0
Newbold              34        3           1                 0
Newton               191       5           4                 5
Norton               74        1           8                 1
Stratton             37        0           5                 0
Sutton               101       2           4                 5
Tot                  77        17          1                 1
Walton/Walcot        51        4           1                 0
Weston               85        3           3                 2
Total                1317      60          40                43
Density estimation
Homogeneous and inhomogeneous K-cross functions
Interaction plot based on K-cross functions
Using our model: three names considered
Using our model: higher-order interaction
Figure : Three placenames are considered: Charlton, Newton and Norton. Estimated posterior distributions for σ, p1, p2 and p3 are shown in red. Prior distributions are shown in blue.
Considering 11 placenames
Future steps - Possible developments
Short term:
• Analyze the real dataset more carefully: sensitivity analysis, consider all 20 placenames.
• Consider heterogeneity among placenames?
• Discuss results with historians for the interpretation.
Longer term:
• (Computation) Theoretical results on optimal proposals for Metropolis-Hastings on discrete spaces?
• (Computation) A more careful comparison with other sampling schemes for clustering and data association problems.
• (Applications) Other contexts where complementary clustering occurs?
Acknowledgments
Prof. Wilfrid Kendall for the supportive and wise supervision, Prof. John Blair for the collaboration, EPSRC for funding.
Thank you
Second problem of the MCMC
Problem: MCMC gets stuck in local maxima.
Simulated Tempering
Aim: overcome multimodality.
Procedure:
• Consider an artificial sample space X_0 ∪ ... ∪ X_s made of s + 1 copies of the original sample space X.
• Assign to each copy X_j a probability distribution π_j with density proportional to f_π^{β_j} (a tempered version of π), with β_0 = 1.
• Run an MCMC on the new space and keep only the iterations lying in X_0.
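The procedure above can be sketched on a toy example. This is a minimal illustration and not the sampler used in the talk: the bimodal target `log_f`, the temperature ladder `betas`, the step size and the optional `log_weights` (pseudo-prior weights on the copies, uniform by default) are all assumptions made here. The state is a pair (x, j), where j indexes the copy X_j; each iteration does one random-walk Metropolis move within the current copy and one proposed move to a neighbouring copy.

```python
import math
import random


def log_f(x):
    """Toy bimodal target (assumed): equal mixture of N(-3, 1) and N(3, 1)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m)) - 0.5 * math.log(2.0 * math.pi)


def simulated_tempering(log_f, betas, n_iter, step=1.0, log_weights=None, seed=0):
    """Return the x-values of the iterations that lie in the copy with beta_0 = 1."""
    rng = random.Random(seed)
    if log_weights is None:
        log_weights = [0.0] * len(betas)  # uniform weights on the copies (assumption)
    x, j = 0.0, 0
    kept = []
    for _ in range(n_iter):
        # Within-copy move: random-walk Metropolis targeting f^{beta_j},
        # with a wider proposal at hotter (smaller beta) levels.
        y = x + rng.gauss(0.0, step / math.sqrt(betas[j]))
        if math.log(1.0 - rng.random()) < betas[j] * (log_f(y) - log_f(x)):
            x = y
        # Between-copy move: propose j - 1 or j + 1 with equal probability;
        # proposals outside the ladder are rejected, so the proposal is symmetric.
        k = j + rng.choice((-1, 1))
        if 0 <= k < len(betas):
            log_ratio = (betas[k] - betas[j]) * log_f(x) + log_weights[k] - log_weights[j]
            if math.log(1.0 - rng.random()) < log_ratio:
                j = k
        # Keep only the iterations lying in X_0 (the untempered copy).
        if j == 0:
            kept.append(x)
    return kept


samples = simulated_tempering(log_f, betas=[1.0, 0.5, 0.25, 0.1], n_iter=50_000)
frac_right = sum(s > 0 for s in samples) / len(samples)
print(len(samples), round(frac_right, 2))
```

With uniform weights the chain spends most of its time in the hotter copies, so only a small share of iterations is kept; choosing `log_weights` close to minus the log normalizing constant of each tempered density would balance the time spent in each copy.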
Testing Simulated Tempering
Density estimations