


Dynamic Characterization of Cluster Structures for Robust and Inductive Support Vector Clustering

Jaewook Lee, Member, IEEE, and Daewon Lee

Abstract—A topological and dynamical characterization of the cluster structures described by the support vector clustering is developed. It is shown that each cluster can be decomposed into its constituent basin level cells and can be naturally extended to an enlarged clustered domain, which serves as a basis for inductive clustering. A simplified weighted graph preserving the topological structure of the clusters is also constructed and is employed to develop a robust and inductive clustering algorithm. Simulation results are given to illustrate the robustness and effectiveness of the proposed method.

Index Terms—Clustering, kernel methods, support vector machines, inductive learning, dynamical systems.

1 INTRODUCTION

THE support vector clustering (SVC) methods [1], [7], [8] are recently emerged algorithms to characterize the support of a high-dimensional distribution, inspired by the support vector machines (SVMs) [2], [9], and have been successfully applied to solve some difficult and diverse clustering or outlier detection problems [1], [3], [6], [8]. These methods map data points by means of a kernel to a high-dimensional feature space and find a sphere with minimal radius that contains most of the mapped data points in the feature space. This sphere, when mapped back to the data space, can separate into several components, each enclosing a separate cluster of points, as in [1]. They have some advantages over other clustering algorithms in their ability to generate cluster boundaries of arbitrary shape and to deal with outliers by employing a soft margin constant that allows the sphere in feature space not to enclose all points, although some model selection problems arise from the difficulties in choosing a suitable kernel parameter [1], [6].

In this paper, we explore the topological structure of the clusters generated by the SVC utilizing an associated dynamical system and show that each cluster can be decomposed into its constituent (dynamically invariant) sets, the so-called basin level cells, of the constructed system. This decomposition not only facilitates the cluster labeling of each sample data point, but also makes it possible to assign cluster labels to unknown data points, thereby providing a way to partition the whole data space into separate clustered domains for inductive clustering.

We also construct a simplified weighted graph preserving the topological structure of the clusters by introducing the concepts of an adjacency and a transition point between basin cells. The constructed graph is applied to developing a robust method to differentiate between the decomposed basin level cells, which is crucial for the correct cluster labeling of the sample data points. The proposed method is shown through simulation to improve and extend the clustering ability of the traditional SVC algorithms.

2 A REVIEW OF SUPPORT VECTOR CLUSTERING

The support vector clustering methods build cluster boundaries which enclose the data points by computing a set of contours generated by a so-called trained kernel support function. Following the derivation of [1], [8] in this section, we construct a trained kernel support function as follows: Let $\{x_i\} \subset X$ be a given data set of $N$ points, with $X \subset \mathbb{R}^n$ the data space. Using a nonlinear transformation $\Phi$ from $X$ to some high-dimensional feature space, we look for the smallest enclosing sphere of radius $R$ described by the constraints

$$\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad (1)$$

where $a$ is the center. Introducing the Lagrangian with penalty term

$$L = R^2 - \sum_j \left(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\right)\beta_j - \sum_j \xi_j \mu_j + C \sum_j \xi_j,$$

the solution of the primal problem (1) can be obtained by solving its dual problem:

$$\max \; W = \sum_j \Phi(x_j)^2 \beta_j - \sum_{i,j} \beta_i \beta_j \, \Phi(x_i) \cdot \Phi(x_j)$$
$$\text{subject to } 0 \le \beta_j \le C, \quad \sum_j \beta_j = 1, \quad j = 1, \ldots, N. \qquad (2)$$

Only those points with $0 < \beta_j < C$ lie on the boundary of the sphere and are called support vectors (SVs). Points with $\beta_j = C$ lie outside the boundaries and are called bounded support vectors (BSVs).
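For concreteness, here is a minimal sketch of how the dual (2) might be solved numerically. The helper names and the choice of a general-purpose SLSQP solver are ours, not the paper's; a dedicated QP or SMO-style solver would be preferable for large $N$.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, q):
    # K[i, j] = exp(-q * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-q * np.maximum(d2, 0.0))

def solve_svc_dual(X, q, C):
    """Maximize W(beta) of (2) subject to 0 <= beta_j <= C and sum_j beta_j = 1."""
    N = X.shape[0]
    K = gaussian_kernel(X, q)
    neg_W = lambda b: -(np.diag(K) @ b - b @ K @ b)   # minimize -W(beta)
    res = minimize(neg_W, np.full(N, 1.0 / N), method='SLSQP',
                   bounds=[(0.0, C)] * N,
                   constraints=({'type': 'eq', 'fun': lambda b: b.sum() - 1.0},))
    return res.x, K   # dual coefficients beta and the kernel matrix
```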

The trained kernel support function (TKSF), defined by the radial distance of the image of $x$ from the sphere center, is then given by

$$f(x) = R^2(x) = \|\Phi(x) - a\|^2 = K(x, x) - 2 \sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j), \qquad (3)$$

where the Gaussian kernel $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) = \exp(-q\|x_i - x_j\|^2)$ with width parameter $q$ is used. One distinguishing feature of the trained kernel support function is that cluster boundaries can be constructed by a set of contours that enclose the points in data space, given by $\{x : f(x) = \bar{r}\}$, where $\bar{r} = R^2(x_i)$ for any support vector $x_i$. (See Fig. 2a.)

Specifically, the level set, $L_f(\bar{r})$, of $f(\cdot)$ is decomposed into several different clusters:

$$L_f(\bar{r}) \stackrel{\mathrm{def}}{=} \{x : f(x) \le \bar{r}\} = C_1 \cup \cdots \cup C_p, \qquad (4)$$

where the $C_i$, $i = 1, \ldots, p$, are different clusters corresponding to the disjoint connected sets and $p$ is the number of clusters determined by $R^2(\cdot)$.

3 CLUSTER DECOMPOSITION

Since the clusters generated by the SVC correspond to the connected components of the level set $L_f(\bar{r})$ described by (4), the cluster structure can be analyzed by exploring a topological property (e.g., connectedness) of the level sets. To characterize the topological property of the level set $L_f(\bar{r})$, in this paper, we build a dynamical system associated with the trained kernel support function, $f$, and show that each connected component, $C_i$, of $L_f(\bar{r})$ is exactly composed of dynamically invariant sets, the so-called basin level cells, of the constructed system. This decomposition will be shown to facilitate the cluster labeling of each sample data point and to extend the clusters to enlarged clustered domains that constitute the whole data space.

3.1 Dynamical System Formulation

Consider a (negative) gradient dynamical system associated with the trained kernel support function, $f$, described by

$$\frac{dx}{dt} = -\nabla f(x). \qquad (5)$$

The existence of a unique solution (or trajectory) $x(\cdot) : \mathbb{R} \to \mathbb{R}^n$ for each initial condition $x(0)$ is guaranteed since the function $f$ in (3) is twice differentiable and the norm of $\nabla f$ is bounded [4].


A state vector $\bar{x}$ satisfying the equation $\nabla f(\bar{x}) = 0$ is called an equilibrium point (or critical point) of system (5). We say that an equilibrium point $\bar{x}$ of (5) is hyperbolic if the Hessian matrix of $f$ at $\bar{x}$, denoted by $\nabla^2 f(\bar{x})$, has no zero eigenvalues. Note that all the eigenvalues of $\nabla^2 f(\bar{x})$ are real since it is symmetric. A hyperbolic equilibrium point is called 1) an (asymptotically) stable equilibrium point (or SEP) if all the eigenvalues of its corresponding Hessian are positive, 2) an unstable equilibrium point (or UEP) if all the eigenvalues of its corresponding Hessian are negative, or 3) a saddle point otherwise. A hyperbolic equilibrium point $\bar{x}$ is called an index-$k$ saddle point if its Hessian has exactly $k$ negative eigenvalues. A set $K$ in $\mathbb{R}^n$ is called a positively (negatively) invariant set of (5) if every trajectory of (5) starting in $K$ remains in $K$ for all $t \ge 0$ ($t \le 0$).

The next proposition states the complete stability and the positive invariance property of each cluster in (4) under process (5).

Proposition 1 [6]. For a given trained kernel support function $f$, suppose that for any $x_0 \in \mathbb{R}^n$, each connected component of the level set $L_f(r) = \{x : f(x) \le r\}$ is compact, where $r = f(x_0)$. Then, (5) is completely stable, i.e., every trajectory of (5) approaches one of the equilibrium points of (5). Furthermore, each connected component of the level set $L_f(r)$ is positively invariant, i.e., if a point is on a connected component of $L_f(r)$, then its entire positive trajectory lies on the same component.

Remark. From the form of a Gaussian kernel, it can be shown that a trained Gaussian kernel support function $f$ satisfies the condition of this proposition; that is, each connected component of the set $\{x : f(x) \le f(x_0)\}$ is compact. Unless otherwise specified, in this paper, we will assume that the trained kernel support function $f$ satisfies this condition.
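As an illustration, the descent process (5) can be simulated with a simple explicit Euler scheme. The analytic gradient below follows from (3) for the Gaussian kernel; the step size and stopping rule are our own choices, and any standard ODE integrator could be substituted.

```python
def grad_f(x, X, beta, q):
    """Gradient of (3): since K(x, x) is constant for the Gaussian kernel,
    grad f(x) = 4q * sum_j beta_j K(x_j, x) (x - x_j)."""
    k_x = np.exp(-q * np.sum((X - x)**2, axis=1))
    return 4.0 * q * ((beta * k_x) @ (x - X))

def flow_to_sep(x0, X, beta, q, step=0.05, tol=1e-6, max_iter=5000):
    """Follow dx/dt = -grad f(x) until (approximately) reaching an equilibrium."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x, X, beta, q)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x   # the SEP whose basin contains x0 (up to numerical accuracy)
```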

3.2 Cluster Decomposition

One important notion introduced in this paper is that of a basin cell and a basin level cell, which helps us to decompose the whole data space into several separate clustered domains.

Definition 1. 1) The basin of attraction of an SEP, $s$, is defined as

$$A(s) := \{x(0) \in \mathbb{R}^n : \lim_{t \to \infty} x(t) = s\},$$

and the closure of the basin $A(s)$, denoted by $\bar{A}(s)$, is called a basin cell. The boundary of the basin cell defines the basin cell boundary, denoted by $\partial A(s)$. 2) The basin level set of $s$, relative to a level value $r$, is defined as the set of points in the level set $L_f(r)$ that converge to $s$ when (5) is applied; that is,

$$B_r(s) := \{x(0) \in L_f(r) : \lim_{t \to \infty} x(t) = s\},$$

and the closure of the basin level set $B_r(s)$, denoted by $\bar{B}_r(s)$, is called a basin level cell.

From a topological point of view, $A(s)$ and $\bar{A}(s)$ are both connected and invariant [4]. We next present a result showing that each cluster can be decomposed into the basin level cells of its constituent SEPs.

Theorem 1. Let $s_i$, $i = 1, \ldots, l$, be the set of all the stable equilibrium points of system (5) in a (nonempty) level set $L_f(r)$. Then the following holds:

1. Each basin level cell is given by
$$\bar{B}_r(s_i) = \bar{A}(s_i) \cap L_f(r)$$
and is connected and positively invariant.

2. The level set $L_f(r)$ is decomposed into the basin level cells; that is,
$$L_f(r) = \{x : f(x) \le r\} = \bigcup_{i=1}^{l} \bar{B}_r(s_i). \qquad (6)$$

3. If $s_{i_k}$, $k = 1, \ldots, l_i$, is the set of all the stable equilibrium points in a cluster $C_i$, then
$$C_i = \bigcup_{k=1}^{l_i} \bar{B}_r(s_{i_k}). \qquad (7)$$

Proof.

1. Since $\bar{A}(s_i)$ and $L_f(r)$ are both positively invariant, their intersection, i.e., the basin level cell
$$\bar{B}_r(s_i) = \bar{A}(s_i) \cap L_f(r),$$
is also positively invariant. The connectedness of the basin level set can be proven in the same way as that of the basin of attraction, except that the starting points are restricted to be in $L_f(r)$ [5]. Since the closure of a connected set is connected, the basin level cell $\bar{B}_r(s_i)$ is connected.

2. By the complete stability property of system (5) from Proposition 1, we have $\mathbb{R}^n = \bigcup_{i=1}^{p} \bar{A}(s_i)$, where $s_i$, $i = l+1, \ldots, p$, is the set of all the stable equilibrium points of system (5) outside the level set $L_f(r)$. Since $\bar{A}(s_i) \cap L_f(r) = \emptyset$ for $i = l+1, \ldots, p$, we have
$$L_f(r) = \bigcup_{i=1}^{p} \left(\bar{A}(s_i) \cap L_f(r)\right) = \bigcup_{i=1}^{p} \bar{B}_r(s_i) = \bigcup_{i=1}^{l} \bar{B}_r(s_i).$$

3. Since the level set $L_f(r)$ consists of several different clusters $C_i$ as in (4), the result follows immediately. □

Theorem 1 implies that, for any sample data point $x \in L_f(r)$, there exists a corresponding SEP, say $s_i$, such that $x \in \bar{B}_r(s_i)$, and the cluster to which a data point $x$ belongs is identical to the cluster to which its corresponding SEP, $s_i$, belongs. Therefore, the cluster labeling of each sample data point can be accomplished by identifying the cluster label of its corresponding SEP $s_i$, to which the data point converges under (5). (See Fig. 2a.)

Theorem 1 also serves as a basis to extend the clusters given in (4) to enlarged clusters as follows: Since any point in the basin cell $\bar{A}(s_i)$ enters the basin level cell $\bar{B}_r(s_i)$ when (5) is applied, $\bar{A}(s_i)$, which is connected and invariant, can be considered a natural extension of $\bar{B}_r(s_i)$. Therefore, if $s_{i_k}$, $k = 1, \ldots, l_i$, is the set of all the stable equilibrium points in a cluster $C_i$, then the cluster $C_i$ can be extended to an enlarged cluster $C_i^E \supset C_i$ given by

$$C_i^E = \bigcup_{k=1}^{l_i} \bar{A}(s_{i_k}). \qquad (8)$$

Moreover, since the basins $A(s_i)$ are disjoint and the whole data space $\mathbb{R}^n$ is composed of the basin cells $\bar{A}(s_{i_k})$, the whole space is partitioned into separate clustered domains, i.e.,

$$\mathbb{R}^n = C_1^E \cup \cdots \cup C_p^E. \qquad (9)$$

Therefore, this extension provides us with a natural way to assign cluster labels to unknown data points, which is one distinguishing feature of our dynamical system approach to clustering.

4 CHARACTERIZATION OF THE CLUSTER STRUCTURE

In the previous section, we have shown that we can differentiate between points that belong to different clusters by differentiating between their corresponding SEPs. In this section, by introducing the concepts of an adjacency and a transition point, we explore the topological structure of the clusters generated by the SVC and construct a weighted graph simplifying the cluster structure,


which provides us with a robust way to differentiate between the SEPs that belong to different clusters.

4.1 Adjacency and Transition Point

Other important notions introduced in this paper are the adjacency and the transition point. Before giving these definitions, we present a result showing a dynamic relationship between a stable equilibrium point (SEP) and an index-one saddle point lying on its basin cell boundary.

Proposition 2 ([5]). Let $s_a$ be an SEP of (5). Then, there exists an index-one saddle point $d \in \partial A(s_a)$ such that the 1D unstable manifold¹ (or curve) $W^u(d)$ of $d$ converges to another stable equilibrium point, say $s_b$.

This result leads to the following definition:

Definition 2. Two SEPs, $s_a$ and $s_b$, are said to be adjacent to each other if there exists an index-one saddle point $d \in \bar{A}(s_a) \cap \bar{A}(s_b)$. Such an index-one saddle point, $d$, is called a transition point between $s_a$ and $s_b$.

Note that, in our definition, the existence of the transition point is necessary for the adjacency of two SEPs, in addition to the condition that the two basin cells intersect. For example, in Fig. 2a, the basin cells $\bar{A}(s_6)$ and $\bar{A}(s_8)$ of the two SEPs $s_6$ and $s_8$ intersect (in only one point), but they are not adjacent to each other in terms of our definition since there does not exist a transition point between them. By contrast, the two SEPs $s_6$ and $s_7$ are adjacent to each other since there exists a transition point between them.

We next derive a result showing that the transition points alone are sufficient to determine whether two adjacent SEPs are in the same cluster.

Theorem 2. Let $s_i$, $s_j$ be any two adjacent SEPs of (5). If $f(d) < r$ for a transition point $d$ between $s_i$ and $s_j$, then $s_i$ and $s_j$ are in the same cluster of $L_f(r)$.

Proof. Let $d$ be a transition point between $s_i$ and $s_j$. Since $d \in \bar{A}(s_i) \cap \bar{A}(s_j)$ and $f(d) < r$, we have

$$d \in \bar{A}(s_i) \cap \bar{A}(s_j) \cap L_f(r) = \bar{B}_r(s_i) \cap \bar{B}_r(s_j).$$

Since $\bar{B}_r(s_i)$, $\bar{B}_r(s_j)$, and their intersection are all connected, $\bar{B}_r(s_i) \cup \bar{B}_r(s_j)$ is connected. Therefore, $s_i$ and $s_j$ are in the same cluster of $L_f(r)$. □

Remark. The converse of this theorem is not generally true. See Fig. 1a. In this figure, two adjacent SEPs, $s_1$ and $s_3$, are in the same cluster of a level set $L_f(r)$, but their transition point, $d_3$, has a value $f(d_3)$ greater than $r$.

4.2 A Simplified Graph for Cluster Identification

The concepts of adjacent SEPs and transition points enable us to build a weighted graph $G_r = (V, E)$, describing the connections between the SEPs, with the following elements:

1. The vertices $V$ of $G_r$ are the SEPs $s_1, \ldots, s_p$ of (5) with $f(s_i) < r$, $i = 1, \ldots, p$.

2. The edges $E$ of $G_r$ are defined as follows: $\langle s_i, s_j \rangle \in E$, with edge weight $\omega\langle s_i, s_j \rangle = f(d_i)$, if there is a transition point $d_i$ between $s_i$ and $s_j$ with $f(d_i) < r$. (Note that the edge weights $f(d_i)$ always take positive values from (3).)

The constructed graph $G_r = (V, E)$ simplifies the cluster structures of the level set $L_f(r)$ and gives us insight into the topological structures of the clusters. The next theorem, one of the main results of this paper, establishes the equivalence of the topological structures between the graph $G_r$ and the clusters of $L_f(r)$. (See Fig. 2b.)
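One possible concrete representation of $G_r$ is an adjacency dictionary; the input convention here (each transition point given as a pair of SEP indices plus its function value) is our own.

```python
def build_graph(num_seps, transitions, r):
    """G_r as an adjacency dict. transitions is a list of (i, j, f_d) triples,
    one per transition point d between SEPs s_i and s_j; an edge of weight f_d
    is added only when f_d < r, per the definition of E."""
    adj = {v: {} for v in range(num_seps)}
    for i, j, f_d in transitions:
        if f_d < r:
            adj[i][j] = f_d
            adj[j][i] = f_d
    return adj
```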

Theorem 3. Each connected component of the graph $G_r$ corresponds to a cluster of the level set $L_f(r)$. That is, $s_i$ and $s_j$ are in the same connected component of the graph $G_r$ if and only if $s_i$ and $s_j$ are in the same cluster of the level set $L_f(r)$.

Proof ("only if" part). Let $s_i$ and $s_j$ be in the same connected component of the graph $G_r$. Then, there is a sequence

$$s_i = s_{i_0}, s_{i_1}, \ldots, s_{i_{h-1}}, s_{i_h} = s_j$$

such that $\langle s_{i_{k-1}}, s_{i_k} \rangle \in E$ for each $k = 1, \ldots, h$. Since $\langle s_{i_{k-1}}, s_{i_k} \rangle \in E$ implies, by the construction of the graph $G_r$, that there is a transition point $d_k$ between $s_{i_{k-1}}$ and $s_{i_k}$ with $f(d_k) < r$, Theorem 2 shows that $s_{i_{k-1}}$ and $s_{i_k}$ are in the same cluster of $L_f(r)$ for each $k = 1, \ldots, h$. Therefore, the sequence $s_{i_0}, s_{i_1}, \ldots, s_{i_h}$ lies in the same cluster and, in particular, $s_i$ and $s_j$ are in the same cluster of $L_f(r)$.

("if" part). Let $s_i$ and $s_j$ be in the same cluster, say $C_i$, of $L_f(r)$. By Theorem 2, there is a sequence

$$s_i = s_{i_0}, s_{i_1}, \ldots, s_{i_{h-1}}, s_{i_h} = s_j$$

such that $\bar{B}_r(s_{i_{k-1}}) \cup \bar{B}_r(s_{i_k})$ is connected for each $k = 1, \ldots, h$. We will prove by contradiction that $\langle s_{i_{k-1}}, s_{i_k} \rangle \in E$ for each $k = 1, \ldots, h$. Suppose that $\langle s_{i_{k-1}}, s_{i_k} \rangle \notin E$ for some $k$. Then, by Theorem 2, the transition point $d_{i_k}$ between $s_{i_{k-1}}$ and $s_{i_k}$ would have $f(d_{i_k}) > r$, which implies that $\bar{B}_r(s_{i_{k-1}}) \cup \bar{B}_r(s_{i_k})$ is not connected. This is a contradiction. Therefore, $\langle s_{i_{k-1}}, s_{i_k} \rangle \in E$ for each $k = 1, \ldots, h$ and, so, $s_i$ and $s_j$ are in the same connected component of the graph $G_r$. □


1. The unstable manifold of an equilibrium point $\bar{x}$ is defined as $W^u(\bar{x}) = \{x(0) \in \mathbb{R}^n : \lim_{t \to -\infty} x(t) = \bar{x}\}$.

Fig. 1. (a) The shaded region represents a cluster of $L_f(r)$, whose boundary is denoted by the solid line. The $s_i$ represent the SEPs and the $d_i$, denoted by "+," represent the transition points. The $\bar{A}(s_i)$, encircled by dashed lines, represent the basins of the $s_i$, and the shaded regions $\bar{B}_r(s_i)$, separated by dashed lines, represent the basin level sets of the $s_i$, showing that $\bar{B}_r(s_i) = \bar{A}(s_i) \cap L_f(r)$. (b) Geometric illustration of a transition point between two adjacent SEPs.


This theorem implies that we can differentiate between the SEPs that belong to different clusters by constructing the weighted graph $G_r$.

SEPs that belong to different clusters by constructing the weighted

graph Gr.

5 CLUSTERING

As shown in the previous section, the weighted graph $G_r = (V, E)$ plays an important role in understanding the cluster structure in a simple and implementable way. In this section, utilizing the results developed so far, we suggest a strategy to construct the graph $G_r$ from a given data set and give an illustrative example to show how the data set can be clustered with this strategy. First, we propose the following conceptual algorithm to construct a graph $G_r$ from a given data set.

Algorithm 1. Constructing the Graph $G_r$

Given a data set $D = \{x_k = (x_k^1, \ldots, x_k^n)^T\}_{k=1}^N$ (or its reduced-dimensional representation via a linear PCA, for example);

A.0. // Initialization //
Construct a trained kernel support function $f$ given in (3). Set $a_i = \min_k x_k^i$ and $b_i = \max_k x_k^i$, $i = 1, \ldots, n$.

A.1. // Constructing the vertices $V$ and decomposing data points into several disjoint sets //
$M = 0$; $V = \emptyset$; // the set of stable equilibrium points
for each data point $x_k \in D$, $k = 1, \ldots, N$, do
    numerically integrate (5) starting from $x_k$ until it reaches an SEP, say, $\bar{x}_k$;
    if $\bar{x}_k \notin V$ // create a new group $\langle s_{M+1} \rangle$
        then $s_{M+1} \leftarrow \bar{x}_k$; $V \leftarrow \{s_{M+1}\} \cup V$; $x_k \in \langle s_{M+1} \rangle$; and $M \leftarrow M + 1$;
        else find $s_i \in V$ such that $\bar{x}_k = s_i$ and set $x_k \in \langle s_i \rangle$;
end

A.2. // Finding the equilibrium points of system (5) //
1) Divide the region with $a_i \le x^i \le b_i$, $i = 1, \ldots, n$, into several hypercubes. The length of the $i$th edge of each hypercube is $(b_i - a_i)/t$, where $t$ is the user-defined step length. Therefore, the total number of hypercubes is $t^n$.
2) Randomize the order of the data points and initialize visit$(U_i)$ = False for each hypercube $U_i$.
for each data point $x_k \in D$, $k = 1, \ldots, N$, do
    find the hypercube $U_i$ enclosing $x_k$;
    if visit$(U_i)$ = False
        then find a solution of $\nabla f(x) = 0$ starting from $x_k$; visit$(U_i)$ = True;
end

A.3. // Constructing the edges $E$ //
Identify the index-one saddle points $d_i$ with $f(d_i) < r$, $i = 1, \ldots, q$, from the equilibrium points obtained in (A.2) by checking the eigenvalues of the Hessian $\nabla^2 f(d_i)$.
for $i = 1$ to $q$ do
    1) Find a unit-length eigenvector $v_i$ corresponding to the negative eigenvalue of $\nabla^2 f(d_i)$. Set $x_i^+ = d_i + \epsilon v_i$ and $x_i^- = d_i - \epsilon v_i$ for some small $\epsilon > 0$.
    2) Numerically integrate (5) starting from $x_i^+$ and $x_i^-$ until they approach the SEPs, say $s_i^+$ and $s_i^-$, respectively. Set $\langle s_i^+, s_i^- \rangle \in E$.
end
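Step A.3 admits a direct numerical sketch: classify each candidate equilibrium by its Hessian spectrum and trace the unstable manifold in both directions. The analytic Hessian below follows from (3) for the Gaussian kernel, while the perturbation size `eps` and the SEP-matching rule are our own choices; `tksf` and `flow_to_sep` are the helpers sketched earlier.

```python
def hessian_f(x, X, beta, q):
    """Hessian of (3) for the Gaussian kernel:
    H = 4q (sum_j beta_j k_j) I - 8q^2 sum_j beta_j k_j (x - x_j)(x - x_j)^T."""
    k_x = np.exp(-q * np.sum((X - x)**2, axis=1))
    diff = x - X                                     # shape (N, n)
    H = 4.0 * q * np.sum(beta * k_x) * np.eye(x.shape[0])
    return H - 8.0 * q**2 * (diff.T * (beta * k_x)) @ diff

def edges_from_saddles(candidates, seps, X, beta, q, const, r, eps=1e-2):
    """A.3 (sketch): keep index-one saddles d with f(d) < r, perturb along the
    unstable eigenvector, and flow to the two adjacent SEPs to form an edge."""
    E = []
    for d in candidates:
        w, V = np.linalg.eigh(hessian_f(d, X, beta, q))
        if np.sum(w < 0) != 1 or tksf(d, X, beta, q, const) >= r:
            continue                      # not an index-one saddle below level r
        v = V[:, 0]                       # eigh sorts ascending: w[0] is the negative one
        i = np.argmin(np.linalg.norm(seps - flow_to_sep(d + eps * v, X, beta, q), axis=1))
        j = np.argmin(np.linalg.norm(seps - flow_to_sep(d - eps * v, X, beta, q), axis=1))
        if i != j:
            E.append((int(i), int(j), tksf(d, X, beta, q, const)))
    return E
```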

To determine the connected components of the constructed graph $G_r$ and to assign the cluster label to each data point, we next suggest the following method, which makes repeated calls to the well-known depth-first search algorithm:

Algorithm 2. Labeling

Given a graph $G_r = (V, E)$; set index = 0;
for each vertex $s_i \in V$ do // Initialization //
    visit$(s_i)$ = False; label$(s_i)$ = index;
end
for each vertex $s_i \in V$ do // Determining the connected components of $G_r$ //
    if visit$(s_i)$ = False
        then index = index + 1; call DFS$(s_i,$ index$)$;
end

subprocedure DFS$(s_i,$ index$)$ // Depth-first search //
begin
    1) visit$(s_i)$ = True; label$(s_i)$ = index;
    2) for each vertex $s_j$ such that $\langle s_i, s_j \rangle \in E$ do
        if visit$(s_j)$ = False, then call DFS$(s_j,$ index$)$;
    end
end

for each SEP $s_i \in V$, $i = 1, \ldots, M$, do // Labeling the data points //
    label$(x_k)$ = label$(s_i)$ for all $x_k \in \langle s_i \rangle$ obtained in (A.1);
end


Fig. 2. (a) There are three clusters, $C_1$, $C_2$, $C_3$ (the regions whose boundaries are represented by solid lines), and each cluster consists of its constituent basin level cells, e.g., $\bar{B}(s_1) \cup \bar{B}(s_2)$. (b) Clustering by Algorithm 2. The SEPs $s_i$ represent the 10 vertices of the constructed weighted graph $G_r$. The solid lines with arrows represent the unstable manifolds of the transition points $d_i$, forming the eight edges of $G_r$ that connect adjacent SEPs. The graph $G_r$ determines three extended clustered groups, $C_1^E$, $C_2^E$, $C_3^E$, whose boundaries are represented by solid lines. A test data point, denoted by "o," for example, converges to an SEP $s_1 \in C_1^E$, implying that the point belongs to $C_1^E$.


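An iterative rendering of Algorithm 2's depth-first search, operating on the adjacency dictionary sketched in Section 4.2 (an explicit stack replaces the recursion; otherwise the logic is the same):

```python
def connected_components(adj):
    """Label each vertex of G_r with the index of its connected component."""
    label, index = {}, 0
    for s in adj:
        if s in label:
            continue
        index += 1
        stack = [s]                       # iterative depth-first search
        while stack:
            u = stack.pop()
            if u not in label:
                label[u] = index
                stack.extend(v for v in adj[u] if v not in label)
    return label
```

Each data point $x_k \in \langle s_i \rangle$ then simply inherits label$(s_i)$.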

To illustrate the proposed algorithms, see Fig. 2. In this example, applying Algorithm 1, we obtain 10 SEPs, $s_i$, the vertices of $G_r$, and eight transition points, $d_i$, where the unstable manifolds of the $d_i$ that connect two adjacent SEPs represent the eight edges of $G_r$, as shown in Fig. 2b. Algorithm 2 is then applied to this constructed graph and we obtain the three connected components of $G_r$ by identifying the cluster labels of the $s_i$, thereby assigning the cluster label of each sample data point.

Remark. The computation of transition points is crucial in constructing a graph whose topological structure is equivalent to that of the clusters described by a trained kernel support function, as in Theorem 3. Moreover, it helps us to correctly assign a cluster label to each SEP from the vertices consisting of only SEPs. To illustrate this, consider the widely used complete graph (CG) labeling strategy [1], [7], [11] applied to a restricted set of the SEPs $\{s_k\}_{k=1}^M$. The CG strategy builds an adjacency matrix $A_{kl}$ between pairs $s_k$ and $s_l$: $A_{kl} = 1$ if $f(y) \le r$ for all the points $y$ on the line segment connecting $s_k$ and $s_l$, and $A_{kl} = 0$ otherwise. This strategy does not work correctly for the example in Fig. 2a. In this example, the SEP $s_7$ is assigned to a different cluster from the group $\{s_i\}_{i=3}^6$ since, for each $i = 3, \ldots, 6$, $f(s_7 + \lambda(s_i - s_7)) \not\le r$ for some $\lambda \in [0, 1]$.

6 EXPERIMENTAL RESULTS

To demonstrate the performance of the proposed method empirically, we have conducted simulations on several clustering problems: 2D-P100, 2D-P400, and 2D-P1000 are data sets obtained from [1], [6], and 2D-N400, 2D-N800, and 2D-N1000 are data sets artificially generated from the same multimodal distribution with different sizes to compare the time complexity of the various methods. twocircles and threecirclesjoined are well-known clustering data sets from [13], and iris, waveform, satimage, and shuttle are widely used classification data sets from the UCI repository [14].

The proposed method (Proposed) is compared with five different SVC methods: Complete Graph (CG) [1], Delaunay Diagram (DD) [10], Minimum Spanning Tree (MST), K-Nearest Neighbors (K-NN), and Reduced Complete Graph (R-CG) [6]. Here, R-CG differs from the proposed method in that it applies the CG algorithm to assign the cluster labels of the SEPs. The criteria for comparison are the cluster labeling error rate and the average CPU time of cluster labeling. Here, the labeling error rate means the percentage of mislabeled data with respect to the clusters determined by the trained kernel support function (TKSF) of SVC in (3). This measure evaluates how accurately a chosen labeling method discriminates a connected component (cluster) from the other separate components (clusters) determined by the TKSF.² The parameter values $(C, q)$ can be tuned by cross-validation to obtain a reasonable clustering result (e.g., a low misclassification error rate for an a priori known, labeled data set). In our simulation, to focus on the comparison of cluster labeling error rates, we have chosen the $(C, q)$ values by fixing $C = 1$ (not using soft margins) and choosing an appropriate width $q$ for each data set, and then used the same values to compare the labeling performance of the different SVC methods.

Experimental results are shown in Table 1 and Fig. 3. As shown in Table 1, the cluster labeling task is more computationally intensive than the SVC training task (QP time) and is frequently error-prone. In the cluster labeling task, CG shows relatively good labeling accuracy, but it has a heavy computational burden and does not work correctly for some problems, as in Fig. 3a. DD shows relatively good labeling accuracy with moderate time complexity, but is very impractical for high-dimensional data sets. MST and KNN show poorer labeling performance than the other methods. R-CG shows a fast computing time with moderate labeling error rates, but fails frequently for data sets with highly curved cluster boundaries, as in 2D-N400. On the large-scale real data sets (satimage, shuttle), the cluster labeling results of only R-CG and the proposed method are available; those of the other methods are not available due to their heavy time complexity and large memory requirements. As a result, the proposed method shows the overall best labeling accuracy with relatively good time complexity.


2. The cluster labeling error rates reported in Table 1 are conceptually different from misclassification error rates for classification problems. For example, if two points within the same connected component are labeled as the same cluster by the TKSF, the labeling error has a zero value in our measure, but the misclassification error can have a nonzero value if these points have different class labels.

TABLE 1
The Experimental Results
(QP time: SVC training time. —: Not available, due to running out of memory or exceeding the computing time.)


In summary, the proposed method has several nice features: First, it is more robust and effective in cluster labeling because it checks the function values of the transition points instead of checking the function values along whole line segments, as other methods do (see Figs. 3 and 2a). Second, it focuses on the SEPs and the transition points between components rather than on the SVs after an SVC training step. Finally, it is inductive, i.e., it can easily assign a cluster label to unseen test data just by checking the cluster label of the corresponding SEP, whereas other methods have to retrain on the data set including a new data point in order to assign a cluster label to the added point.

7 CONCLUSION

In this paper, we have introduced the concept of a basin level cell and developed a topological characterization of the clusters described by the SVC. We have shown that each cluster can be decomposed into its constituent basin level cells and can be extended to an enlarged clustered domain, thereby partitioning the whole data space into separate clustered domains for inductive clustering. We have also constructed a weighted graph that simplifies the topological structure of the clusters by introducing the concepts of a transition point and an adjacency, and established an equivalence of the topological structures between the constructed graph and the clusters. To illustrate the developed theoretical results, a clustering algorithm employing the constructed graph was given and shown, through simulation, to yield a significant improvement in the accuracy and in the scope of cluster labeling over other SVC methods.

Until now, the cluster structures described by the support vector clustering methods had been poorly analyzed. We expect that our topological and dynamical characterization will pave the way for many new extensions to the support vector clustering methods.

ACKNOWLEDGMENTS

This work was supported by the Korea Science and Engineering Foundation (KOSEF) under grant number R01-2005-000-10746-0.

REFERENCES

[1] A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik, "Support Vector Clustering," J. Machine Learning Research, vol. 2, pp. 125-137, 2001.
[2] C.J. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[3] F. Camastra and A. Verri, "A Novel Kernel Method for Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 801-805, May 2005.
[4] H.K. Khalil, Nonlinear Systems. New York: Macmillan, 1992.
[5] J. Lee and H.-D. Chiang, "A Dynamical Trajectory-Based Methodology for Systematically Computing Multiple Optimal Solutions of General Nonlinear Programming Problems," IEEE Trans. Automatic Control, vol. 49, no. 6, pp. 888-899, June 2004.
[6] J. Lee and D. Lee, "An Improved Cluster Labeling Method for Support Vector Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 461-464, Mar. 2005.
[7] B. Scholkopf, J. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1472, 2001.
[8] D.M.J. Tax and R.P.W. Duin, "Support Vector Domain Description," Pattern Recognition Letters, vol. 20, pp. 1191-1199, 1999.
[9] V.N. Vapnik, "An Overview of Statistical Learning Theory," IEEE Trans. Neural Networks, vol. 10, pp. 988-999, Sept. 1999.
[10] J. Yang, V. Estivill-Castro, and S.K. Chalup, "Support Vector Clustering through Proximity Graph Modelling," Proc. Ninth Int'l Conf. Neural Information Processing (ICONIP '02), pp. 898-903, 2002.
[11] R. Jensen, D. Erdogmus, J.C. Principe, and T. Eltoft, "The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space," Advances in Neural Information Processing Systems (NIPS), vol. 17, pp. 625-632, Cambridge, Mass.: MIT Press, 2005.
[12] M. Girolami, "Mercer Kernel Based Clustering in Feature Space," IEEE Trans. Neural Networks, vol. 13, no. 4, pp. 780-784, July 2002.
[13] A.Y. Ng, M.I. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems (NIPS), vol. 14, 2002.
[14] UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2006.



Fig. 3. Experimental comparison on the 2D-N400 data set. Parameter values are $q = 20$ and $C = 1$, and the bold solid line is the cluster boundary determined by the trained kernel support function. The cluster labels, assigned by the labeling methods, are indicated by different symbols. (a) CG. (b) DD. (c) MST. (d) KNN. (e) R-CG. (f) Proposed.