Clustering on the Simplex
DESCRIPTION
Clustering on the Simplex. Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark. Joint work with Christian Walder, DTU Informatics.
Informatics and Mathematical Modelling / Intelligent Signal Processing
EMMDS 2009, July 3rd, 2009
Clustering on the Simplex
Morten Mørup, DTU Informatics
Intelligent Signal Processing, Technical University of Denmark
Joint work with
Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
Christian Walder, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
Clustering
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. (Wikipedia)
Clustering approaches: the K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979)
Assignment step (S): assign each data point to the cluster with the closest mean.
Update step (C): calculate the new mean of each cluster.
Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).
Drawbacks: the problem is NP-complete (Megiddo and Supowit, 1984).
Relaxations of the hard assignment problem:
Annealing approaches based on a temperature parameter (as T→0 the original clustering problem is recovered) (see for instance Hofmann and Buhmann, 1997)
Fuzzy clustering (Hathaway and Bezdek, 1988)
Expectation Maximization (Mixture of Gaussians)
Spectral clustering
Previous relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
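The two refinement steps can be sketched in a few lines of NumPy (a minimal illustration of Lloyd's algorithm, not the speaker's implementation; all names are illustrative):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's iterative refinement: alternate the assignment step (S)
    and the update step (C) until the means stop changing, i.e. no
    single reassignment improves the objective (1-spin stability)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # initial means
    for _ in range(n_iter):
        # Assignment step: nearest mean for every point
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        S = d.argmin(axis=1)
        # Update step: recompute each cluster mean (keep empty clusters fixed)
        C_new = np.array([X[S == j].mean(axis=0) if (S == j).any() else C[j]
                          for j in range(k)])
        if np.allclose(C_new, C):
            break
        C = C_new
    return S, C
```

Note that 1-spin stability is only a local guarantee; the loop can converge to a poor local optimum, which is the motivation for the relaxations listed above.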
From the K-means objective to Pairwise Clustering
Pairwise Clustering (Buhmann and Hofmann, 1994)
For a similarity matrix K, choosing K = XᵀX makes the pairwise clustering objective equivalent to the K-means objective.
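Since the slide formulas did not survive extraction, here is the standard form of this equivalence (reconstructed; s_k denotes the binary assignment vector of cluster k):

```latex
% K-means objective, binary assignments s_{ki} \in \{0,1\}, \sum_k s_{ki} = 1:
\min_{S}\ \sum_{k}\sum_{i} s_{ki}\,\lVert x_i - \mu_k \rVert^2 ,\qquad
\mu_k = \frac{\sum_i s_{ki}\, x_i}{\sum_i s_{ki}}
% Expanding the square gives \sum_i \lVert x_i \rVert^2
%   - \sum_k s_k^\top K s_k / (s_k^\top s_k) with K = X^\top X,
% so minimizing the K-means objective is equivalent to
\max_{S}\ \sum_k \frac{s_k^\top K s_k}{s_k^\top s_k}
% Pairwise clustering is the same objective for a general similarity matrix K.
```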
Although Clustering is hard there is room to be simple(x) minded!
Binary Combinatorial (BC) vs. Simplicial Relaxation (SR)
The simplicial relaxation (SR) admits solving the pairwise clustering problem with standard continuous optimization,
for instance by normalization-invariant projected gradient ascent:
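As an illustration, here is a plain projected-gradient sketch (my own simplification, not the paper's normalization-invariant update): ascend the relaxed objective Σ_k s_kᵀKs_k/(s_kᵀs_k) and project each column of S back onto the simplex by Euclidean projection:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def sr_clustering(K, k, n_iter=200, lr=0.01, seed=0):
    """Projected gradient ascent on the simplicially relaxed pairwise
    clustering objective sum_c s_c^T K s_c / (s_c^T s_c), where S is
    (k x n) with every column constrained to the simplex."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    S = rng.random((k, n))
    S /= S.sum(axis=0)                            # columns on the simplex
    for _ in range(n_iter):
        num = np.einsum('cn,nm,cm->c', S, K, S)   # s_c^T K s_c
        den = (S * S).sum(axis=1)                 # s_c^T s_c
        grad = 2 * (S @ K) / den[:, None] - 2 * (num / den**2)[:, None] * S
        S = np.apply_along_axis(project_simplex, 0, S + lr * grad)
    return S
```

Per Theorems 1 and 2 cited later in the talk, hard (binary) assignments are recovered at stationarity; a finite number of iterations will in general only approach them.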
Synthetic data example: K-means vs. SR-clustering
The brown and grey clusters each contain 1000 data points in R², whereas the remaining clusters each have 250 data points.
The SR-clustering algorithm is driven by high-density regions.
[Figure: solutions for SR-clustering (init=1), SR-clustering (init=0.01), and Lloyd's K-means]
SR-clustering solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity.
[Results: K-means vs. SR-clustering (init=1) vs. SR-clustering (init=0.01) for 10, 50, and 100 components]
SR-clustering for kernel-based semi-supervised learning
(Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009)
Kernel-based semi-supervised learning based on pairwise clustering
The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.
Class labels can be handled by explicitly fixing the corresponding assignments; must-link and cannot-link constraints can be absorbed into the kernel.
Hence the problem more or less reduces to a standard SR-clustering problem for the estimation of S.
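One way to absorb the links into the kernel (a sketch in the spirit of the cited kernel-based semi-supervised methods; the weight w is an illustrative parameter, not one from the talk):

```python
import numpy as np

def constrained_kernel(K, must, cannot, w=1.0):
    """Return a copy of the similarity matrix K where must-link pairs
    are rewarded (+w) and cannot-link pairs penalized (-w), so that a
    standard pairwise clustering of the modified kernel tends to
    respect the supervision."""
    K = K.copy()
    for i, j in must:
        K[i, j] += w
        K[j, i] += w
    for i, j in cannot:
        K[i, j] -= w
        K[j, i] -= w
    return K
```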
At stationarity, the gradients of the elements in each column of S that are 1 are larger than those of the elements that are 0. The impact of the supervision can thus be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm; this is a convex optimization problem. The Lagrange multipliers thereby give a measure of conflict between the data and the supervision.
Digit classification with one mislabeled observation from each class.
Community Detection in Complex Networks
Communities/modules: natural divisions of network nodes into densely connected subgroups (Newman & Girvan, 2003).
[Figure: a graph G(V,E) and its adjacency matrix A; a community detection algorithm yields an assignment S, whose induced permutation P gives the block-structured permuted adjacency matrix PAPᵀ]
Common community detection objectives:
Hamiltonian (Fu & Anderson, 1986; Reichardt & Bornholdt, 2004)
Modularity (Newman & Girvan, 2004)
Both are generic problems of the same form.
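For concreteness, the Newman-Girvan modularity of a hard partition can be computed as follows (a standard formula, sketched here since the slide's equations were lost in extraction):

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity:
    Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j),
    where k_i is the degree of node i and 2m the total degree."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m          # modularity matrix
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    return (B * same).sum() / two_m
```

Two equal-sized, fully separated communities give the textbook value Q = 0.5, which makes a convenient sanity check.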
Again we can make an exact relaxation to the simplex!
SR-clustering of complex networks
The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.
So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when the problems are relaxed to the simplex.
However, simplex constraints also hold promising data mining properties of their own!
The Convex Hull
Def: The convex hull (convex envelope) of X ∈ ℝ^{M×N} is the minimal convex set containing X. (Informally, it can be described as a rubber band wrapped around the data points.)
Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex hull grows exponentially with the dimensionality of the data, O(log^{M−1}(N)) (Dwyer, 1988).
The Principal Convex Hull (PCH)
Def: The best convex set of size K according to some measure of distortion D(·|·) (Mørup et al., 2009). (Informally, it can be described as a less flexible rubber band that wraps most of the data points.)
The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints, X ≈ XCS; "principal" is meant in terms of the Frobenius norm.
C: gives the fractions in which observations in X are used to form each feature (distinct aspect). In general C will be very sparse!
S: gives the fraction by which each observation resembles each distinct aspect XC.
(Note that when K is large enough, the PCH recovers the convex hull.)
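A minimal alternating projected-gradient sketch of the PCH model X ≈ XCS (my own illustration with a fixed step size; the published algorithm may differ):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def pch(X, k, n_iter=500, lr=1e-3, seed=0):
    """Minimize ||X - X C S||_F^2 by alternating gradient steps on
    C (n x k) and S (k x n), projecting each column back onto the
    simplex after every step."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    C = np.apply_along_axis(project_simplex, 0, rng.random((n, k)))
    S = np.apply_along_axis(project_simplex, 0, rng.random((k, n)))
    for _ in range(n_iter):
        R = X - X @ C @ S                  # residual
        C = C + lr * (X.T @ R @ S.T)       # descent step on C
        C = np.apply_along_axis(project_simplex, 0, C)
        R = X - X @ C @ S
        S = S + lr * ((X @ C).T @ R)       # descent step on S
        S = np.apply_along_axis(project_simplex, 0, S)
    return C, S
```

The simplex constraint on the columns of C makes each aspect XC a convex combination of observations, and the constraint on the columns of S makes each reconstruction a convex combination of aspects, exactly mirroring the two constraints above.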
Relation between the PCH model, low-rank decompositions, and clustering approaches
PCH naturally bridges clustering and low-rank approximations!
Two important properties of the PCH model
The PCH model is invariant to affine transformation and scaling.
The PCH model is unique up to permutation of the components.
A feature extraction example
The extracted features have more contrast than those obtained by clustering approaches; as such, PCH aims for distinct aspects/regions in the data.
The PCH model strives to attain Platonic ”Ideal Forms”
PCH model for PET data (Positron Emission Tomography)
The data contain 3 components: high-binding regions, low-binding regions, and non-binding regions. Each voxel is given by its concentration fraction of these regions.
[Figure: the extracted aspects XC and the corresponding voxel fractions S]
NMF spectroscopy of samples of mixtures of propanol, butanol, and pentanol.
Collaborative filtering example
Medium-size and large-size MovieLens data (www.grouplens.org)
Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users
Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users
Conclusion
The simplex offers unique data mining properties.
Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, pairwise clustering, and community detection in graphs.
SR enables solving binary combinatorial problems using standard solvers from continuous optimization.
The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms.
No need for an annealing parameter; hard assignments are guaranteed at stationarity (Theorems 1 and 2).
Semi-supervised learning can be posed as a continuous optimization problem, with the associated Lagrange multipliers giving an evaluation measure for each supervised constraint.
Conclusion cont.
The Principal Convex Hull (PCH) is formed by two types of simplex constraints.
It extracts distinct aspects of the data, which is relevant for data mining in general wherever low-rank approximation and clustering approaches have been invoked.
A reformulation of "Lex Parsimoniae"
"Simplicity is the ultimate sophistication." – Leonardo da Vinci
→ "Simplexity is the ultimate sophistication."
"The simplest explanation is usually the best." – William of Ockham
→ "The simplex explanation is usually the best."
The presented work is described in:
M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted.
M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.