Clustering on the Simplex

Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
Informatics and Mathematical Modelling / Intelligent Signal Processing
EMMDS 2009, July 3rd, 2009


Page 1: Clustering on the Simplex


Clustering on the Simplex

Morten Mørup DTU Informatics

Intelligent Signal Processing, Technical University of Denmark

Page 2: Clustering on the Simplex


Joint work with

Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark

Christian Walder, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark

Page 3: Clustering on the Simplex


Clustering

Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. (Wikipedia)

Page 4: Clustering on the Simplex


Clustering approaches

K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979):
Assignment step (S): assign each data point to the cluster with the closest mean value.
Update step (C): calculate the new mean value for each cluster.

Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).

Drawbacks: the problem is NP-complete (Megiddo and Supowit, 1984).

Relaxations of the hard assignment problem:
Annealing approaches based on a temperature parameter (as T→0 the original clustering problem is recovered) (see for instance Hofmann and Buhmann, 1997)
Fuzzy clustering (Hathaway and Bezdek, 1988)
Expectation Maximization (Mixture of Gaussians)
Spectral Clustering

Previous relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
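The K-means iterative refinement above can be sketched as follows; this is a minimal NumPy illustration of Lloyd's algorithm on our part, not code from the talk:

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: X is an (N, M) data matrix."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # means from K random points
    for _ in range(n_iter):
        # Assignment step (S): each point goes to the closest mean.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        # Update step (C): recompute each cluster mean (keep old if empty).
        new_mu = np.array([X[z == k].mean(0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break  # no mean changed: the refinement has converged
        mu = new_mu
    return z, mu
```

Convergence here only means no mean moved; as the slide notes, this local optimum is merely 1-spin stable, not globally optimal.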

Page 5: Clustering on the Simplex


From the K-means objective to Pairwise Clustering

K-means objective

Pairwise Clustering (Buhmann and Hofmann, 1994)

K is a similarity matrix; with K = X^T X the pairwise clustering objective is equivalent to the K-means objective.
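This equivalence is easy to verify numerically. The sketch below (our own, with a toy assignment) checks that the K-means cost equals trace(K) minus the pairwise within-cluster similarity score when K = X^T X:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 30))      # M=5 features, N=30 points as columns
z = np.repeat([0, 1, 2], 10)      # a hard assignment into 3 clusters
S = np.eye(3)[z].T                # 3 x N binary assignment matrix

K = X.T @ X                       # inner-product similarity matrix

# K-means cost: squared distance of each column to its cluster mean.
mu = np.stack([X[:, z == k].mean(1) for k in range(3)], axis=1)
kmeans_cost = ((X - mu[:, z]) ** 2).sum()

# Pairwise clustering score: within-cluster similarity over cluster size.
score = sum(S[k] @ K @ S[k] / S[k].sum() for k in range(3))

# Minimizing the K-means cost == maximizing the pairwise score.
assert np.isclose(kmeans_cost, np.trace(K) - score)
```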

Page 6: Clustering on the Simplex


Although clustering is hard, there is room to be simple(x) minded!

Binary Combinatorial (BC) Simplicial Relaxation (SR)

Page 7: Clustering on the Simplex


The simplicial relaxation (SR) admits standard continuous optimization for solving the pairwise clustering problem,

for instance by normalization-invariant projected gradient ascent:
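A rough sketch of such a scheme; the Euclidean projection onto the probability simplex, the step size, and the iteration count are our assumptions, not details from the talk:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def sr_cluster(K, n_clusters, n_iter=200, step=1e-2, seed=0):
    """Maximize sum_k (s_k K s_k) / n_k over column-stochastic S."""
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    S = rng.random((n_clusters, N))
    S /= S.sum(0)                                   # start on the simplex
    for _ in range(n_iter):
        n = np.maximum(S.sum(1, keepdims=True), 1e-9)  # soft cluster sizes
        SK = S @ K
        # Gradient of sum_k (s_k K s_k) / n_k with respect to S.
        G = 2 * SK / n - np.sum(SK * S, 1, keepdims=True) / n ** 2
        S = np.apply_along_axis(project_simplex, 0, S + step * G)
    return S
```

The projection keeps every column of S on the simplex throughout, so the iterate always admits a probabilistic assignment interpretation.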

Page 8: Clustering on the Simplex


Synthetic data example: K-means vs. SR-clustering

The brown and grey clusters each contain 1000 data points in R^2, whereas the remaining clusters each have 250 data points.

Page 9: Clustering on the Simplex


The SR-clustering algorithm is driven by high-density regions.

Page 10: Clustering on the Simplex


SR-clustering (init=1) SR-clustering (init=0.01) Lloyd’s K-means

Thus, the solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity.

Page 11: Clustering on the Simplex


[Figure: results for 10, 50, and 100 components; rows: K-means, SR-clustering (init=1), SR-clustering (init=0.01)]

Page 12: Clustering on the Simplex


SR-clustering for Kernel based semi-supervised learning

(Basu et al, 2004, Kulis et al. 2005, Kulis et al, 2009)

Kernel based semi-supervised learning based on pairwise clustering

Page 13: Clustering on the Simplex


The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.

Page 14: Clustering on the Simplex


Class labels can be handled by explicit fixing; must-links and cannot-links can be absorbed into the kernel.

Hence the problem more or less reduces to the standard SR-clustering problem for the estimation of S.
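One common way to absorb such links, sketched here in the spirit of Kulis et al. (the symmetric additive weighting is our assumption): reward must-linked pairs and penalize cannot-linked pairs directly in the kernel:

```python
import numpy as np

def supervised_kernel(K, must, cannot, w=1.0):
    """Add weight w for must-links and subtract it for cannot-links."""
    Ks = K.astype(float).copy()
    for i, j in must:                  # reward must-linked pairs
        Ks[i, j] += w
        Ks[j, i] += w
    for i, j in cannot:                # penalize cannot-linked pairs
        Ks[i, j] -= w
        Ks[j, i] -= w
    return Ks
```

Clustering the modified kernel with the plain SR-clustering machinery then trades off data similarity against the supervision weight w.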

Page 15: Clustering on the Simplex


At stationarity, the gradients of the elements in each column of S that are 1 are larger than those of the elements that are 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm; this is a convex optimization problem.

The Lagrange multipliers thereby give a measure of conflict between the data and the supervision.

Page 16: Clustering on the Simplex


Digit classification with one mislabeled data observation from each class.

Page 17: Clustering on the Simplex


Community Detection in Complex Networks

Communities/modules: natural divisions of network nodes into densely connected subgroups (Newman & Girvan 2003)

G(V,E)

Adjacency matrix A

Community detection algorithm

Permuted adjacency matrix PAP^T

Permutation P of the graph from the clustering assignment S

Page 18: Clustering on the Simplex


Common Community detection objectives

Hamiltonian (Fu & Anderson, 1986, Reichardt & Bornholdt, 2004)

Modularity (Newman & Girvan, 2004)

Generic problems of the form
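For concreteness, the Newman-Girvan modularity of a hard assignment can be computed as follows; this is a self-contained sketch of the standard formula, not code from the talk:

```python
import numpy as np

def modularity(A, z):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [z_i == z_j]."""
    k = A.sum(1)                       # node degrees
    two_m = k.sum()                    # twice the number of edges
    same = z[:, None] == z[None, :]    # same-community indicator
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m
```

Two disconnected triangles, split into their natural communities, give the well-known value Q = 0.5.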

Page 19: Clustering on the Simplex


Again we can make an exact relaxation to the simplex!

Page 20: Clustering on the Simplex


Page 21: Clustering on the Simplex


Page 22: Clustering on the Simplex


SR-clustering of complex networks

The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.

Page 23: Clustering on the Simplex


So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex.

However, simplex constraints also hold promising data mining properties of their own!

Page 24: Clustering on the Simplex


The Convex Hull

Def: The convex hull/convex envelope of X ∈ R^(M×N) is the minimal convex set containing X. (Informally, it can be described as a rubber band wrapped around the data points.)

Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex set grows exponentially with the dimensionality of the data, O(log^(M-1)(N)) (Dwyer, 1988).

The Principal Convex Hull (PCH)

Def: The best convex set of size K according to some measure of distortion D(·|·) (Mørup et al. 2009). (Informally, it can be described as a less flexible rubber band that wraps most of the data points.)
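To make the convex hull definition concrete, here is a self-contained 2-D hull routine (Andrew's monotone chain); this helper is our own illustration, not code from the talk:

```python
import numpy as np

def convex_hull_2d(points):
    """Return hull vertices in counter-clockwise order for an (N, 2) array."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return np.array(pts)
    def cross(o, a, b):
        # Positive if o->a->b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                      # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])
```

The "rubber band" intuition: points strictly inside the band are discarded, only the extreme points survive as hull vertices.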

Page 25: Clustering on the Simplex


The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints; "principal" is meant in terms of the Frobenius norm:

X ≈ XCS

C: gives the fractions in which the observations in X are used to form each feature (distinct aspect). In general C will be very sparse!
S: gives the fraction by which each observation resembles each distinct aspect in XC.

(Note that when K is large enough, the PCH recovers the convex hull.)
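A rough sketch of fitting the PCH factorization X ≈ XCS by alternating projected gradient steps; the step size, iteration count, and the simplex projection routine are our assumptions, not details from the talk:

```python
import numpy as np

def proj_cols_simplex(V):
    """Project each column of V onto the probability simplex."""
    def proj(v):
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)
    return np.apply_along_axis(proj, 0, V)

def fit_pch(X, K, n_iter=500, step=1e-3, seed=0):
    """X: (M, N). Returns C (N, K) and S (K, N) with simplex columns."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    C = proj_cols_simplex(rng.random((N, K)))
    S = proj_cols_simplex(rng.random((K, N)))
    for _ in range(n_iter):
        R = X @ C @ S - X                            # residual of X C S vs X
        C = proj_cols_simplex(C - step * (X.T @ R @ S.T))  # gradient in C
        R = X @ C @ S - X
        S = proj_cols_simplex(S - step * ((X @ C).T @ R))  # gradient in S
    return C, S
```

The columns of C mix observations into the K aspects XC, and the columns of S express each observation as a convex combination of those aspects.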

Page 26: Clustering on the Simplex


Relation between the PCH model, low rank decomposition and clustering approaches

PCH naturally bridges clustering and low-rank approximations!

Page 27: Clustering on the Simplex


Two important properties of the PCH model

The PCH model is invariant to affine transformation and scaling.

The PCH model is unique up to permutation of the components.

Page 28: Clustering on the Simplex


A feature extraction example

More contrast in the features than obtained by clustering approaches. As such, PCH aims for distinct aspects/regions in the data.

The PCH model strives to attain Platonic ”Ideal Forms”

Page 29: Clustering on the Simplex


PCH model for PET data (Positron Emission Tomography)

The data contain 3 components:
High-binding regions
Low-binding regions
Non-binding regions
Each voxel is given as a concentration fraction of these regions.

[Figure: estimated XC and S]

Page 30: Clustering on the Simplex


NMR spectroscopy of samples of mixtures of propanol, butanol, and pentanol.

Page 31: Clustering on the Simplex


Collaborative filtering example

Medium-size and large-size MovieLens data (www.grouplens.org):
Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users.
Large size: 10,000,054 ratings of 10,677 movies by 71,567 users.

Page 32: Clustering on the Simplex


Conclusion

The simplex offers unique data mining properties.

Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, Pairwise Clustering, and Community detection in graphs.

SR enables solving binary combinatorial problems using standard solvers from continuous optimization.

The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms: no need for an annealing parameter, and hard assignments are guaranteed at stationarity (Theorems 1 and 2).

Semi-supervised learning can be posed as a continuous optimization problem, with the associated Lagrange multipliers giving an evaluation measure for each supervised constraint.

Page 33: Clustering on the Simplex


Conclusion cont.

The Principal Convex Hull (PCH) is formed by two types of simplex constraints.

It extracts distinct aspects of the data and is relevant for data mining in general, wherever low-rank approximation and clustering approaches have been invoked.

Page 34: Clustering on the Simplex


A reformulation of "Lex Parsimoniae"

Simplicity is the ultimate sophistication.
Simplexity is the ultimate sophistication.
- Leonardo da Vinci

The simplest explanation is usually the best.
The simplex explanation is usually the best.
- William of Ockham

The presented work is described in:
M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted.
M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.