COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION

I-Ling Chen 1, Bor-Chen Kuo 1, Chen-Hsuan Li 2, Chih-Cheng Hung 3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.


Page 1: I-Ling Chen 1 , Bor-Chen Kuo 1 , Chen-Hsuan Li 2 ,  Chih-Cheng Hung 3

COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION

I-Ling Chen 1, Bor-Chen Kuo 1, Chen-Hsuan Li 2, Chih-Cheng Hung 3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.

2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.

3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.

Page 2:

Outline

• Introduction
– Statement of problems
– The objective
• Literature Review
– Support Vector Machines (kernel method)
– Multiple Classifier System (random subspace method, dynamic subspace method)
– An Optimal Kernel Method for Selecting the RBF Kernel Parameter
• Optimal Kernel-based Dynamic Subspace Method
• Experimental Design and Results
• Conclusion and Future Work

Page 3:

INTRODUCTION

Page 4:

Hughes Phenomenon (Hughes, 1968)

Also called the curse of dimensionality or peaking phenomenon: with a small sample size N and a high dimensionality d, performance is low when N ≪ d.

Page 5:

Proposed by Vapnik and Coworkers (1992, 1995, 1996, 1997, 1998)

It is robust against the Hughes phenomenon. (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006)

SVM includes

Kernel Trick

Support Vector Learning

Support Vector Machines (SVM)

Page 6:

The Goal of Kernel Method for Classification

The samples in the same class can be mapped into the same area.

The samples in the different classes can be mapped into the different areas.

Page 7:

SV learning tries to learn a linear separating hyperplane for a two-class classification problem via a given training set.

Illustration of SV learning with the kernel trick: a nonlinear feature mapping $\Phi$ maps the original space into the feature space. In the feature space, SV learning finds the optimal hyperplane $w^T \Phi(x) + b = 0$ with margins $w^T \Phi(x) + b = \pm 1$; the support vectors are the samples lying on the margins, and the class labels are $y_i = \pm 1$.

Support Vector Learning
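As a concrete illustration, the following sketch trains an SVM with the RBF kernel trick on a toy two-class problem; scikit-learn and the toy data are our assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class problem with labels y_i in {+1, -1}, as on the slide.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# The RBF kernel implicitly maps samples into a feature space where a
# linear separating hyperplane w^T Phi(x) + b = 0 is learned.
clf = SVC(kernel="rbf", C=10.0).fit(X, y)

# Support vectors are the training samples lying on or inside the
# margins w^T Phi(x) + b = +/-1.
print("support vectors:", len(clf.support_vectors_))
print("predictions:", clf.predict([[-2.0, -2.0], [2.0, 2.0]]))
```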

Page 8:

Multiple Classifier System

There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets.

(Ho, T. K. ,1998 ; Yang, J-M., Kuo, B-C., Yu,P-T. & Chuang, C-H. 2010)

Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.

Approaches to building classifier ensembles.

Page 9:

THE FRAMEWORK OF RANDOM SUBSPACE METHOD(RSM) BASED ON SVM (HO, 1998)

Given the learning algorithm, SVM, and the ensemble size, S.
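The RSM loop can be sketched as follows (a minimal illustration, assuming scikit-learn's SVC as the base SVM; the helper name `rsm_svm` is hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

def rsm_svm(X, y, X_test, S=11, r=5, seed=0):
    """Random subspace method: train S SVMs, each on r randomly chosen
    features, and fuse their predictions by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(S):
        feats = rng.choice(X.shape[1], size=r, replace=False)
        clf = SVC(kernel="rbf").fit(X[:, feats], y)
        votes.append(clf.predict(X_test[:, feats]))
    votes = np.asarray(votes)
    # Majority vote over the S base classifiers for each test sample.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy high dimensional data: two shifted Gaussian classes, labels 0/1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 20)), rng.normal(1.5, 1, (40, 20))])
y = np.array([0] * 40 + [1] * 40)
pred = rsm_svm(X, y, X, S=11, r=5)
```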

Page 10:

THE INADEQUACIES OF RSM

Given the learning algorithm, SVM, and the ensemble size, S.

* Irregular rule
Each individual feature potentially possesses different discriminative power for classification. A randomized feature-selection strategy is unable to distinguish informative features from redundant ones.

* Implicit number
How should a suitable subspace dimensionality be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.

Page 11:

• Two importance distributions
– Importance distribution of feature weight (W distribution): models the selection probability of each feature.
– Importance distribution of subspace dimensionality (R distribution): automatically determines a suitable subspace size.
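A minimal sketch of how the two distributions could be used to draw one subspace (the weights and the observed accuracies here are placeholders, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 191  # number of bands in the Washington, DC Mall data

# W distribution: normalize per-feature weights (in DSM, LDA separability
# or re-substitution accuracy) into selection probabilities.
w = rng.random(d)            # placeholder weights
W = w / w.sum()

# R distribution: kernel-smooth observed (subspace size, accuracy)
# pairs into a density over candidate dimensionalities.
sizes = np.arange(1, d + 1)
obs = {10: 0.80, 30: 0.88, 60: 0.85}   # placeholder observations
h = 10.0                                # Gaussian smoothing bandwidth
R = np.zeros(d)
for s, acc in obs.items():
    R += acc * np.exp(-0.5 * ((sizes - s) / h) ** 2)
R /= R.sum()

# Draw one subspace: a size from R, then that many distinct features from W.
r = rng.choice(sizes, p=R)
feats = rng.choice(d, size=r, replace=False, p=W)
```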

[Figure: the DSM workflow. The importance distribution of subspace dimensionality (R distribution) is initialized as R0 and updated by kernel smoothing; the importance distribution of feature weight (W distribution) is built either from the class separability of LDA for each feature or from the re-substitution accuracy of each feature under a base classifier (ML, SVM, kNN, BCC). Each panel plots density (%) against the feature index or the dimensionality of the subspace.]

DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)

Page 12:

THE FRAMEWORK OF DSM BASED ON SVM

Given the learning algorithm, SVM, and the ensemble size, S.

Page 13:

INADEQUACIES OF DSM

Given the learning algorithm, SVM, and the ensemble size, S.

* Kernel function
The SVM algorithm provides an effective way to perform supervised classification. However, the kernel function critically influences the performance of SVM.

* Time-consuming
Choosing a proper kernel function, or a better kernel parameter, for SVM is quite important yet ordinarily time-consuming. In particular, in DSM the updated R distribution is obtained from the re-substitution accuracy.

Page 14:

The performance of SVM depends on choosing a proper kernel function and proper parameters for it.

Li, Lin, Kuo, and Chu (2010) present a novel criterion for automatically choosing a proper parameter σ of the RBF kernel function.

An Optimal Kernel Method for Selecting RBF Kernel Parameter

Gaussian Radial Basis Function (RBF) kernel :

$$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right), \quad \sigma \in \mathbb{R} \setminus \{0\}, \qquad 0 < k(x, z) \le 1$$

In the feature space determined by the RBF kernel, the norm of every sample is one, and the kernel values are positive. Hence, the samples will be mapped onto the surface of a hypersphere.
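A small sketch verifying these RBF kernel properties and illustrating a parameter search; the separability score below is an assumed stand-in, not the actual criterion of Li et al. (2010):

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); 0 < k <= 1, k(x, x) = 1."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, (30, 4))   # class 0
X1 = rng.normal(3, 1, (30, 4))   # class 1

# Hypersphere property: k(x, x) = 1 and all kernel values are positive,
# so every sample maps onto the surface of a unit hypersphere.
K = rbf_kernel(X0, X0, sigma=1.0)
assert np.allclose(np.diag(K), 1.0) and (K > 0).all()

# Separability-style score (assumed stand-in for the Li et al. criterion):
# mean within-class kernel value minus mean between-class kernel value.
def criterion(sigma):
    within = 0.5 * (rbf_kernel(X0, X0, sigma).mean()
                    + rbf_kernel(X1, X1, sigma).mean())
    return within - rbf_kernel(X0, X1, sigma).mean()

sigmas = np.linspace(0.1, 10.0, 100)
best = sigmas[np.argmax([criterion(s) for s in sigmas])]
```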

Page 15:

Kernel-based Dynamic Subspace Method (KDSM)

Page 16:

THE FRAMEWORK OF KDSM

[Figure: the KDSM framework. The original dataset X is processed by the optimal RBF kernel algorithm (an optimal parameter for each dimension of the kernel function), yielding an L-dimensional kernel space. The per-band class separability gives a kernel-based W distribution, which, combined with kernel smoothing, forms the kernel-based feature-selection distribution M_dist. Subspaces X̃ = FS(X, M_dist, w) are drawn to build a subspace pool (reduced dataset); multiple classifiers are trained on the pool and combined by decision fusion (majority voting). The procedure repeats until the performance of classification is stable.]

Page 17:

Experiment Design

Algorithm   Description
SVM_CV      A single SVM with the CV method, without any dimension reduction
SVM_OP      A single SVM with the OP method, without any dimension reduction
DSM_WACC    DSM with the re-substitution accuracy as the feature weights
DSM_WLDA    DSM with the separability of Fisher's LDA as the feature weights
KDSM        The kernel-based dynamic subspace method proposed in this research

OP: the optimal kernel method for choosing the RBF parameter
CV: 5-fold cross-validation

We use a grid search over the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper value of the RBF kernel parameter 2σ², and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper value of the penalty parameter for the slack variables, which controls the margins.
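This grid search can be sketched with scikit-learn's GridSearchCV (an assumption; the slides do not name an implementation). Note that SVC's `gamma` corresponds to 1/(2σ²), and its `C` is the penalty on the slack variables:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in data; the actual experiments use hyperspectral images.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# gamma = 1 / (2 sigma^2), so searching 2 sigma^2 over [0.01, 10]
# corresponds to this gamma grid; C is searched over the slide's set.
param_grid = {
    "gamma": (1.0 / np.linspace(0.01, 10.0, 20)).tolist(),
    "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```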

Page 18:

Hyperspectral Image data

EXPERIMENTAL DATASET

IR image of Washington, DC Mall (d = 191 bands), 7 classes.

Category (no. of labeled data): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840).

Page 19:

Experimental Results

Method               SVM_CV   SVM_OP   DSM_WACC    DSM_WLDA    KDSM
Case 1 Accuracy (%)  83.66    83.79    85.49       87.47       88.64
Case 1 CPU time (s)  30.35    3.10     6045.31     2188.62     155.31
Case 2 Accuracy (%)  86.39    87.89    88.74       89.43       92.53
Case 2 CPU time (s)  116.02   6.65     21113.75    4883.92     308.26
Case 3 Accuracy (%)  94.69    95.31    95.94       96.94       97.43
Case 3 CPU time (s)  5858.18  376.99   1165048.6   220121.62   17847.7

There are three cases in Washington, DC Mall, where N_i is the number of training samples in class i and N is the number of all training samples:
case 1: N_i = 20 (N = 140 < d); case 2: N_i = 40 (N = 280 > d); case 3: N_i = 300 (N = 2100 ≫ d).

Page 20:

Experiment Results in Washington, DC Mall

The outcome of classification by the various multiple classifier systems (Ratio: CPU time relative to KDSM):

Method      Case 1 Accuracy  Ratio    Case 2 Accuracy  Ratio    Case 3 Accuracy  Ratio
DSM_WACC    85.49%           38.924   88.74%           68.493   95.94%           65.277
DSM_WLDA    87.47%           14.092   89.43%           15.844   96.94%           12.333
KDSM        88.64%           1        92.53%           1        97.43%           1
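The Ratio column is consistent with the CPU times reported earlier; a quick check (values copied from the tables above):

```python
# CPU times (sec) copied from the results table for Washington, DC Mall.
cpu = {
    1: {"DSM_WACC": 6045.31, "DSM_WLDA": 2188.62, "KDSM": 155.31},
    2: {"DSM_WACC": 21113.75, "DSM_WLDA": 4883.92, "KDSM": 308.26},
    3: {"DSM_WACC": 1165048.6, "DSM_WLDA": 220121.62, "KDSM": 17847.7},
}
# Each Ratio entry in the table equals CPU time divided by KDSM's CPU time.
for case, t in cpu.items():
    for method in ("DSM_WACC", "DSM_WLDA"):
        print(case, method, round(t[method] / t["KDSM"], 3))
```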

Page 21:

Classification Maps with N_i = 20 in Washington, DC Mall

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 22:

Classification Maps (roof) with N_i = 40

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 23:

Classification Maps with N_i = 300 in Washington, DC Mall

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 24:

In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF kernel parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high dimensional datasets.

The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in every case of the Washington, DC Mall dataset.

Moreover, these results show that, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computation time.

Conclusions

Page 25:


Thank You