clusterability a theoretical study margareta ackerman joint work with shai ben-david

38
CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Upload: phebe-quinn

Post on 03-Jan-2016

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

CLUSTERABILITYA THEORETICAL STUDYMargareta Ackerman

Joint work with Shai Ben-David

Page 2: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

The theory-practice gap

Clustering is one of the most widely used tools for exploratory data analysis. Social Sciences Biology Astronomy Computer Science

.

.

.

All apply clustering to gain a first understanding of the structure of large data sets.

Yet, there is distressingly little theoretical understanding of

clustering.

Page 3: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Inherent obstacles

• Clustering is not well defined.

There is a wide variety of different clustering tasks, with different (often implicit) measures of quality.

• In most practical clustering tasks there is no clear ground truth to evaluate your solution by.

(in contrast with classification tasks, in which you can have a hold out labeled set to evaluate the classifier against).

Page 4: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Common solutions

Objective utility functions Sum Of In-Cluster Distances, Average Distances to Center Points, Cut Weight, Spectral Clustering, etc. (Shmoys, Charikar, Meyerson, Luxburg, ..)Analyze the computational complexity of discrete optimization problems.

Consider a restricted set of distributions (“generative models”): Ex. Mixtures of Gaussians [Dasgupta ‘99], [Vempala, ’03], [Kannan et al ‘04], [Achlitopas, McSherry ‘05].Recover the parameters of the model generating the data.

Many more….

Page 5: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Quest for a General Theory

What can we say independently of any

specific algorithm,

specific objective function,

or specific generative data model

?Axioms of clusteringEx. Clustering functions [Puzicha, Hofmann, Buhmann ‘00], [Kleinberg ‘02]Axioms of clustering quality measures [Ackerman and Ben-David, ‘08]. Find axioms that define clustering.

Page 6: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Why study clusterability?

Even if a data set has no meaningful structure, any clustering algorithm will find some partition of the data set.

Clusterability aims to determine whether a data set can be meaningfully clustered.

Notions of clusterability quantify the degreeof clustered structure in a data set.

Page 7: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Our contributions

We set out to explore notions of clusterability, compare them, and find patterns of similarity.

Computational complexity of clustering Data sets that are more clusterable are computationally

easier to cluster well.Hardness Determining whether a data set is clusterable is usually NP-

hard. Comparison Notions of clusterability are pairwise inconsistent.

Page 8: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 9: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Definitions and Notation

A k-clustering of X is a k-partition of X

A loss function (or cost function) is a function that takes a clustering and outputs a real number. Ex. k-median, k-means.

Let OPTL,k(X) denote the minimal loss over all

k-clusterings of X.

OPTL,k(X) = min{L(C): C a k-clustering of X}.

Page 10: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Definitions and Notation

A clustering C = {X1, X2, … , Xk } is center-based if there are centers ci in Xi, such that all points in Xi are closer to ci than to any other center.

Page 11: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 12: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Better clusterability implies that data is easier to cluster well

In most formulations, clustering is an NP-hard problem. (ex. k-means and k-median clustering, correlation clustering, ect.)

When a data set has no significant clustered structure, there is no sense in clustering it.

Clustering makes sense only on data sets that have meaningful clustering structure.

We show that the more clusterable a data set is, the easier it is to cluster well.

Clustering is hard only when there isn’t sufficient clustering structure in the data set.

Page 13: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 14: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Center Perturbation Clusterability:A high level definition

Call a clustering optimal when it has optimal cost by some loss function L.

If a clustering looks like the optimal clustering, we might expect that its cost is also near-optimal (in well-clusterable data sets.)

A data set X is CP-clusterable if all clusterings that are structurally similar to an optimal clustering of X have near-optimal cost.

Page 15: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Center Perturbation Clusterability

Two center-based clusterings C and C’ are ε-close if there existcenters c1, c2, c3 ,…, ck of C and c’1, c’2, c’3 ,…, c’k of C’ s.t.

|ci -c’i| ≤ ε.

Definition: A data set X is (ε,δ)-CP Clusterable (for k) if for every clustering C of X that is ε-close to some optimal k-clustering of X,

L(C) ≤(1+ δ) OPTL,k(X) .

Page 16: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Good PC-clusterability implies that it is easy to cluster well

This result holds with any loss function where the optimal clusterings are center-based.

Theorem: Given a data set X in Rm that is (rad(X)/sqrt(l),δ)-PC clusterable

(for k), there is an algorithm that runs in time polynomial in n, and outputs a k-clustering C such that

L(C) ≤(1+ δ)OPTL,k(X).

Page 17: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Proof that PC-clusterability implies that it is easy to cluster well

Let an l-sequence denote a collection of l elements of X (not necessarily distinct).

Algorithm 1: For each k-tuple of l-sequences of X S := centers of mass of the l-sequences Ctemp := the clustering that S induces on X If L(Ctemp ) < L(C) C := Ctemp Return C

Page 18: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Proof that PC-clusterability implies that it is easy to cluster well

By Maurey’s result, there is a clustering C’ examined by Algorithm 1 that is rad(X)/sqrt(l )-close to an optimal clustering of X.

Since Algorithm 1 selects the minimal loss clustering of the ones it examines, L(C)≤ L(C‘).

Since X is (rad(X)/sqrt(l),δ)-PC clusterable,

L(C)≤ L(C ‘) ≤ (1+ δ) OPTL,k(X).

Theorem [Maurey, 1981]:

For any fix l ≥1, and each x in the convex hull of X in Rm, there exist x1, x2, ,…, xl ɛ X such that the average of the xi’s is at most rad(X)/sqrt(l ) away from x.

Page 19: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 20: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Worst Pair Ratio Clusterability: Preliminaries

The width of a clustering is the maximum distance between points in the same cluster (over all clusters).

The split of a clustering is the minimum distance between points in different clusters.

Introduced by Epter, Krishnamoorthy, and Zaki in 1999.

Definition: The Worst Pair Ratio of X (w.r.t. k) is

WPRk(X)=max{split(C)/width(C) : C a k-clustering of X}.

Page 21: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Better WPR-clusterability implies that it is easier to cluster well

Theorem: Given a data set X (on n elements) where WPRk (X) > 1, we can find a k-clustering of X with the

maximal split over width ratio in time O(n 2 log n).

Page 22: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 23: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Separability clusterability

Separability measures how much is gained in the transition from k-1 to k clusters.

In the original definition, L is k-means.

Introduced by Ostrovsky, Rabani, Schulman, and Swamy in 2006.

Definition:A data set X is (k,ε)-separable if

OPTL,k(X) ≤ εOPTL,k-1(X).

Page 24: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Better separability implies that it is easier to cluster well

Theorem [Theorem 4.13, Ostrovsky et al. 2006]:

Given a (k, ε2)-separable data set X in Rm, for small enough ε, we can find a k-clustering with k-means loss at most (1- ε 2)/(1-37 ε 2)OPTk-means, k(X) away from the optimal, with probability at least 1- O(ε2) in time O(nm).

Page 25: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 26: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Variance Ratio – Preliminaries.

)(||

||)( 2

1i

k

i

i XX

XCW

2

1

))()((||

||)( XsscenterOfMaXsscenterOfMa

X

XCB i

k

i

i

Introduced by Zhang in 2001.

The within cluster variance of C is

The between cluster variance of C is

Definition:The Variance Ratio of a data set X (for k) is Vark(X) = max{B(C)/W(C) : C a k-clustering of X}

Page 27: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Better VR clusterability implies that it is easier to cluster well

Proof:- We can show that VR 2(X) = 1/S 2(X) -1.

- The result follows from a theorem by Ostrovsky et al. for 2-clusterings [Theorem 3.5, Ostrovsky et al. 2006].

Theorem:

Given a (2, ε2)-separable data set X in Rm, we can find a 2-clustering with k-means loss at most 1-Θ(ε2) away from the optimal, with probability at least 1- O(ε2) in time O(nm).

Page 28: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Summary of notions of clusterability

Center-perturbation: Whenever a clustering is structurally similar to the optimal

clustering, its cost is near-optimal.

Separability:

Loss of the optimal k-clustering/Loss of the optimal (k-1)-clustering

Variance Ratio: Between-cluster variance/Within-cluster variance.

Worst Pair Ratio: Split/width.

Page 29: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 30: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Comparing notions of clusterability

An arrow from notion A to notion B indicates good clusterability by notion A implies good clusterability by notion B.

Worst Pair

Ratio

Variance Ratio

Separability

Center-Perturbation

No two notions are equivalent.

Page 31: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 32: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Computational complexity of determining the clusterability value

What is the computational complexity of determining the clusterability of a data set?

It is NP-hard to determine whether a data set is (k,ε)-separable.

It is NP-hard to find the Variance Ratio of a data set.

It a data set is well-clusterable by WPR, then the WPR can found in polynomial time.

Page 33: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Outline

Definitions and notation Clusterability and the complexity of clustering Introduce a new notion of clusterability: PC-clusterability Worst pair ratio (Epter, Krishnamoorthy, and Zaki, 1999)

Separability (Ostrovsky, Rabani, Schulman, and Swamy, 2006)

Variance ratio (Zhang, 2001)

Comparison of notions of clusterability The hardness of determining clusterability Summary and future work

Page 34: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Summary

We initiate a study of clusterability and introduce a new notion of clusterability.

We show that three previous and the new notion of clusterabilty are pairwise distinct.

Better clusterability implies that it is easier to cluster well for distinct notions of clusterability.

Determining the degree of clusterability is usually NP-hard.

Page 35: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Future work

Property: The more clusterable a data set is, the easier it is, computationally, to find a near-optimal clustering of the data.

Does this property hold for other natural notions of clusterability?

Can clusterability be axiomatized? Can it be shown that the above property holds for

all reasonable notions of clusterability?

Page 36: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Appendix: Variance Ratio does not imply the other notions

Select a data set X1 with arbitrarily poor (k-1)-clusterability according to any other notion B (separability, Worst Pair Ratio, or Center-Perturbation)

Create data set X2 by taking X1 and adding a single point very far away, making it’s own cluster in the optimal k-clustering.

X2 can have arbitrarily good Variance Ratio.

Clusterability is the same on data sets X1 and X2 by notion B.

X1 X2

Page 37: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

Thinking about clusterability:Are these data sets clusterable?

Clusters come in different shapes and sizes.

Page 38: CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David

What happens with noise, outliers, and “structureless” data?