
Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties

Toby Dylan Hocking, Armand Joulin, Francis Bach, Jean-Philippe Vert

INRIA – Sierra team, Laboratoire d’Informatique de l’ÉNS; Mines ParisTech – CBIO, INSERM U900, Institut Curie

Paris, France

The clustering problem: different approaches

Clustering: assign labels to n points in p dimensions, X ∈ R^{n×p}.

Methods: K-means, hierarchical, mixture models, spectral (Ng et al. 2001).

Issues: hierarchy, convexity, greediness, stability, interpretability.

Clusterpath: relaxing a hard fusion penalty

Hard-thresholding of differences is a combinatorial problem:

min_{α ∈ R^{n×p}} ||α − X||²_F subject to Σ_{i<j} 1_{α_i ≠ α_j} ≤ t

Relaxation:

Σ_{i<j} ||α_i − α_j||_q w_ij ≤ t

The Lagrange form is useful for optimization algorithms:

α*(X, λ, q, w) = argmin_{α ∈ R^{n×p}} (1/2) ||α − X||²_F + λ Σ_{i<j} ||α_i − α_j||_q w_ij

The clusterpath of X is the path of optimal α* obtained by varying λ, for fixed weights w_ij ∈ R₊ and norm q ∈ {1, 2, ∞}. Related work: “fused lasso” (Tibshirani and Saunders, 2005), “grouping pursuit” (Shen and Huang, 2010), “sum of norms” (Lindsten et al., 2011).
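As a concrete illustration of the Lagrange form, the sketch below evaluates the penalized objective and minimizes it for q = 2 on a toy problem by plain subgradient descent. This is only a numerical illustration; the function names (`clusterpath_objective`, `solve_l2`) and the crude solver are ours, not the paper’s dedicated active-set algorithm.

```python
import numpy as np

def clusterpath_objective(alpha, X, lam, w):
    """f(alpha) = 1/2 ||alpha - X||_F^2 + lam * sum_{i<j} w_ij ||alpha_i - alpha_j||_2."""
    fit = 0.5 * np.sum((alpha - X) ** 2)
    n = len(X)
    pen = sum(w[i, j] * np.linalg.norm(alpha[i] - alpha[j])
              for i in range(n) for j in range(i + 1, n))
    return fit + lam * pen

def solve_l2(X, lam, w, steps=2000):
    """Crude subgradient descent on the l2 clusterpath objective (toy sketch)."""
    alpha = X.copy()
    n = len(X)
    for t in range(1, steps + 1):
        g = alpha - X                      # gradient of the fit term
        for i in range(n):
            for j in range(i + 1, n):
                d = alpha[i] - alpha[j]
                nrm = np.linalg.norm(d)
                if nrm > 1e-12:            # subgradient of the fusion penalty
                    g[i] += lam * w[i, j] * d / nrm
                    g[j] -= lam * w[i, j] * d / nrm
        alpha -= g / (1.0 + lam * n) / np.sqrt(t)   # decaying step size
    return alpha
```

For two 1-d points at 0 and 1 with identity weights, a small λ leaves them apart while a large λ pulls them together, which is exactly the fusion behavior the penalty is designed to induce.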

Norm and weights control the clusterpath

[Figure: clusterpaths of the same data X for norms q = 1, 2, ∞ and for several values of the Gaussian weight parameter γ.]

Geometric interpretation: the constraint bounds the area between points.

[Figure: three panels on points X1, X2, X3 with ℓ1, ℓ2, and ℓ∞ constraint geometry: identity weights at t = Ω(X); decreasing weights w12, w13 after the join α_C = α_2 = α_3, t < Ω(X); and decreasing weights w12, w13, w23 at t = Ω(X).]

We propose dedicated algorithms for each norm

Norm  Properties                   Algorithm    Complexity   Problem sizes
1     piecewise linear, separable  path         O(pn log n)  large ≈ 10^5
2     rotation invariant           active-set   O(n^2 p)     medium ≈ 10^3
∞     piecewise linear             Frank-Wolfe  unknown*     medium ≈ 10^3

*each iteration has complexity O(n^2 p).

Outline of the ℓ1 path algorithm

Condition sufficient for optimality:

0 = α_i − X_i + λ Σ_{j≠i : α_i≠α_j} w_ij sign(α_i − α_j) + λ Σ_{j≠i : α_i=α_j} w_ij β_ij,

with |β_ij| ≤ 1 and β_ij = −β_ji (Hoefling 2009).

1. For λ = 0 the solution α = X is optimal. We initialize the clusters C_i = {i} and coefficients α_i = X_i for all i.
2. As λ increases, the solutions follow straight lines.
3. Taking the derivative of the optimality condition with respect to λ and summing over all points in a cluster C leads to:

   dα_C/dλ = v_C = (1/|C|) Σ_{j∉C} Σ_{i∈C} w_ij sign(α_j − α_C)

4. When 2 clusters C1 and C2 fuse, they form a new cluster C = C1 ∪ C2 with v_C = (|C1| v_1 + |C2| v_2)/(|C1| + |C2|).
5. Stop when all the points merge at the mean X̄.
6. Combine dimensions using λ values.
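For intuition, the steps above can be sketched in the special case of 1-d data with identity weights (w_ij = 1), where each cluster’s velocity reduces to (#points above) − (#points below) and only adjacent clusters fuse. The function name and output format are illustrative, not the clusterpath package API.

```python
import numpy as np

def l1_clusterpath_1d(x):
    """Sketch of the l1 path for 1-d data, identity weights.

    Returns a list of (lambda, cluster coordinates, clusters) events,
    one per fusion, ending with all points at the mean."""
    order = np.argsort(x)
    alpha = np.sort(x).astype(float)            # cluster coordinates, ascending
    clusters = [[int(i)] for i in order]        # original point indices
    lam, n = 0.0, len(x)
    path = [(lam, alpha.copy(), [list(c) for c in clusters])]
    while len(alpha) > 1:
        sizes = np.array([len(c) for c in clusters])
        below = np.cumsum(sizes) - sizes        # points strictly below each cluster
        above = n - np.cumsum(sizes)            # points strictly above
        v = (above - below).astype(float)       # velocity v_C = #above - #below
        # smallest lambda increment at which two adjacent clusters meet
        dt = (alpha[1:] - alpha[:-1]) / (v[:-1] - v[1:])
        k = int(np.argmin(dt))
        lam += dt[k]
        alpha = alpha + dt[k] * v               # clusters move along straight lines
        clusters[k] = clusters[k] + clusters[k + 1]
        del clusters[k + 1]
        alpha = np.delete(alpha, k + 1)
        path.append((lam, alpha.copy(), [list(c) for c in clusters]))
    return path
```

Since adjacent velocities always satisfy v_k > v_{k+1} here, every gap shrinks and the division in `dt` is safe; the path ends when all points fuse at the mean, matching step 5.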

ℓ1 clusterpath of 10 points in 2d

[Figure: optimal values of the ℓ1 clusterpath, plotted in the (α_1, α_2) plane and along the regularization path λ. The path joins the left cluster on α_1 before joining the right cluster; the solution at λ = 0.18 yields 2 clusters.]

Free software! http://clusterpath.r-forge.r-project.org/

- Dedicated C++ optimization algorithms with an R interface.
- Calculates the exact ℓ1 clusterpath for identity weights.
- Active-set algorithm for the ℓ1 and ℓ2 clusterpath with general weights.
- R interface to the Python cvxmod clusterpath solver.
- Clusterpath visualizations in 2d, 3d, and animations.
- Coming soon: picking the number of clusters automatically!

Future work

- Necessary and sufficient conditions for cluster splitting?
- Automatically learning weights and the number of clusters?
- Applications to solving proximal problems.

Clustering performance and timings

Cluster using the prior knowledge that there are 2 clusters. Quantify partition correspondence using the Normalized Rand Index (Hubert and Arabie, 1985): 1 for perfect correspondence, 0 for completely random assignment. Results for 2 non-convex interlocking half-moons in 2d:

Clustering method          Rand  SD    Seconds  SD
eexp spectral clusterpath  0.99  0.00  8.49     2.64
eexp spectral kmeans       0.99  0.00  3.10     0.08
ℓ2 clusterpath             0.95  0.12  29.47    2.31
e01 Ng et al. kmeans       0.95  0.19  7.37     0.42
e01 spectral kmeans        0.91  0.19  3.26     0.21
Gaussian mixture           0.42  0.13  0.07     0.00
average linkage            0.40  0.13  0.05     0.00
kmeans                     0.26  0.04  0.01     0.00
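The Normalized Rand Index of Hubert and Arabie can be computed from the contingency table of the two partitions. A self-contained sketch (the function name is ours; scikit-learn’s `adjusted_rand_score` computes the same quantity):

```python
import numpy as np

def normalized_rand_index(labels_a, labels_b):
    """Adjusted (normalized) Rand index (Hubert and Arabie, 1985):
    1 for perfect correspondence, ~0 for random assignment."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    # map labels to 0..k-1 and build the contingency table
    ca = np.unique(a, return_inverse=True)[1]
    cb = np.unique(b, return_inverse=True)[1]
    table = np.zeros((ca.max() + 1, cb.max() + 1), dtype=np.int64)
    np.add.at(table, (ca, cb), 1)
    comb2 = lambda m: m * (m - 1) // 2          # "choose 2", elementwise
    sum_ij = comb2(table).sum()                 # pairs together in both
    sum_a = comb2(table.sum(axis=1)).sum()      # pairs together in A
    sum_b = comb2(table.sum(axis=0)).sum()      # pairs together in B
    expected = sum_a * sum_b / comb2(len(a))    # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

The adjustment by `expected` is what makes a random partition score near 0 rather than at a data-dependent baseline, which is why the table above can compare methods on a common scale.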

Similar performance to spectral clustering, and learns a tree:

The weighted ℓ2 clusterpath applied to the iris data:

[Figure: scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width showing the weighted ℓ2 clusterpath tree on the iris data, colored by species: setosa, versicolor, virginica.]

Performance for several model sizes

[Figure: Normalized Rand index (bigger means better agreement with known clusters) versus number of clusters (2 to 11), for the iris and moons data, comparing clusterpath with γ = 0.5, γ = 2, and γ = 10 against GMM, HC, and kmeans.]
