COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION

I-Ling Chen 1, Bor-Chen Kuo 1, Chen-Hsuan Li 2, Chih-Cheng Hung 3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.


Page 1: I-Ling Chen 1 , Bor-Chen Kuo 1 , Chen-Hsuan Li 2 ,  Chih-Cheng Hung 3

COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION

I-Ling Chen 1, Bor-Chen Kuo 1, Chen-Hsuan Li 2, Chih-Cheng Hung 3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.

2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.

3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.

Page 2:

Outline

• Introduction
– Statement of problems
– The objective
• Literature Review
– Support Vector Machines (kernel method)
– Multiple Classifier System (random subspace method, dynamic subspace method)
– An Optimal Kernel Method for Selecting the RBF Kernel Parameter
• Optimal Kernel-based Dynamic Subspace Method
• Experimental Design and Results
• Conclusion and Future Work

Page 3:

INTRODUCTION

Page 4:

Hughes Phenomenon (Hughes, 1968)

Also called the curse of dimensionality or peaking phenomenon: with a small sample size N and a high dimensionality d, performance is low when N ≪ d.

Page 5:

Proposed by Vapnik and Coworkers (1992, 1995, 1996, 1997, 1998)

It is robust against the Hughes phenomenon. (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006)

SVM includes

Kernel Trick

Support Vector Learning

Support Vector Machines (SVM)

Page 6:

The Goal of Kernel Method for Classification

The samples in the same class can be mapped into the same area.

The samples in the different classes can be mapped into the different areas.

Page 7:

SV learning tries to learn a linear separating hyperplane for a two-class classification problem via a given training set.

Illustration of SV learning with the kernel trick: a nonlinear feature mapping $\Phi$ maps the original space into the feature space. In the feature space, SV learning finds the optimal hyperplane $w^T \Phi(x) + b = 0$ with margins $w^T \Phi(x) + b = \pm 1$; the support vectors are the samples lying on the margins, and the class labels are $y_i = \pm 1$.

Support Vector Learning
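As a concrete illustration, the following sketch trains an SVM with the RBF kernel trick on a toy two-class problem; scikit-learn and the toy data are our assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class problem with labels y_i in {+1, -1}, as on the slide.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# The RBF kernel implicitly maps samples into a feature space where a
# linear separating hyperplane w^T Phi(x) + b = 0 is learned.
clf = SVC(kernel="rbf", C=10.0).fit(X, y)

# Support vectors are the training samples lying on or inside the
# margins w^T Phi(x) + b = +/-1.
print("support vectors:", len(clf.support_vectors_))
print("predictions:", clf.predict([[-2.0, -2.0], [2.0, 2.0]]))
```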

Page 8:

Multiple Classifier System

There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets.

(Ho, T. K. ,1998 ; Yang, J-M., Kuo, B-C., Yu,P-T. & Chuang, C-H. 2010)

Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.

Approaches to building classifier ensembles.

Page 9:

THE FRAMEWORK OF RANDOM SUBSPACE METHOD(RSM) BASED ON SVM (HO, 1998)

Given the learning algorithm, SVM, and the ensemble size, S.
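The RSM loop can be sketched as follows (a minimal illustration, assuming scikit-learn's SVC as the base SVM; the helper name `rsm_svm` is hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

def rsm_svm(X, y, X_test, S=11, r=5, seed=0):
    """Random subspace method: train S SVMs, each on r randomly chosen
    features, and fuse their predictions by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(S):
        feats = rng.choice(X.shape[1], size=r, replace=False)
        clf = SVC(kernel="rbf").fit(X[:, feats], y)
        votes.append(clf.predict(X_test[:, feats]))
    votes = np.asarray(votes)
    # Majority vote over the S base classifiers for each test sample.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy high dimensional data: two shifted Gaussian classes, labels 0/1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 20)), rng.normal(1.5, 1, (40, 20))])
y = np.array([0] * 40 + [1] * 40)
pred = rsm_svm(X, y, X, S=11, r=5)
```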

Page 10:

THE INADEQUACIES OF RSM

Given the learning algorithm, SVM, and the ensemble size, S.

* Irregular rule
Each individual feature potentially possesses different discriminative power for classification. A randomized feature-selection strategy is unable to distinguish informative features from redundant ones.

* Implicit number
How should a suitable subspace dimensionality be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.

Page 11:

• Two importance distributions
– Importance distribution of feature weight (W distribution): models the selection probability of each feature.
– Importance distribution of subspace dimensionality (R distribution): automatically determines a suitable subspace size.
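A minimal sketch of how the two distributions could be used to draw one subspace (the weights and the observed accuracies here are placeholders, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 191  # number of bands in the Washington, DC Mall data

# W distribution: normalize per-feature weights (in DSM, LDA separability
# or re-substitution accuracy) into selection probabilities.
w = rng.random(d)            # placeholder weights
W = w / w.sum()

# R distribution: kernel-smooth observed (subspace size, accuracy)
# pairs into a density over candidate dimensionalities.
sizes = np.arange(1, d + 1)
obs = {10: 0.80, 30: 0.88, 60: 0.85}   # placeholder observations
h = 10.0                                # Gaussian smoothing bandwidth
R = np.zeros(d)
for s, acc in obs.items():
    R += acc * np.exp(-0.5 * ((sizes - s) / h) ** 2)
R /= R.sum()

# Draw one subspace: a size from R, then that many distinct features from W.
r = rng.choice(sizes, p=R)
feats = rng.choice(d, size=r, replace=False, p=W)
```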

[Figure: the DSM workflow. The importance distribution of subspace dimensionality (R distribution) is initialized as R0 and updated by kernel smoothing; the importance distribution of feature weight (W distribution) is built either from the class separability of LDA for each feature or from the re-substitution accuracy of each feature under a base classifier (ML, SVM, kNN, BCC). Each panel plots density (%) against the feature index or the dimensionality of the subspace.]

DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)

Page 12:

THE FRAMEWORK OF DSM BASED ON SVM

Given the learning algorithm, SVM, and the ensemble size, S.

Page 13:

INADEQUACIES OF DSM

Given the learning algorithm, SVM, and the ensemble size, S.

* Kernel function
The SVM algorithm provides an effective way to perform supervised classification. However, the kernel function critically influences the performance of SVM.

* Time-consuming
Choosing a proper kernel function, or a better kernel parameter, for SVM is quite important yet ordinarily time-consuming. In particular, in DSM the updated R distribution is obtained from the re-substitution accuracy.

Page 14:

The performance of SVM depends on choosing a proper kernel function and proper parameters for it.

Li, Lin, Kuo, and Chu (2010) present a novel criterion for automatically choosing a proper parameter σ of the RBF kernel function.

An Optimal Kernel Method for Selecting RBF Kernel Parameter

Gaussian Radial Basis Function (RBF) kernel :

$$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right), \quad \sigma \in \mathbb{R} \setminus \{0\}, \qquad 0 < k(x, z) \le 1$$

In the feature space determined by the RBF kernel, the norm of every sample is one, and the kernel values are positive. Hence, the samples will be mapped onto the surface of a hypersphere.
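A small sketch verifying these RBF kernel properties and illustrating a parameter search; the separability score below is an assumed stand-in, not the actual criterion of Li et al. (2010):

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); 0 < k <= 1, k(x, x) = 1."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, (30, 4))   # class 0
X1 = rng.normal(3, 1, (30, 4))   # class 1

# Hypersphere property: k(x, x) = 1 and all kernel values are positive,
# so every sample maps onto the surface of a unit hypersphere.
K = rbf_kernel(X0, X0, sigma=1.0)
assert np.allclose(np.diag(K), 1.0) and (K > 0).all()

# Separability-style score (assumed stand-in for the Li et al. criterion):
# mean within-class kernel value minus mean between-class kernel value.
def criterion(sigma):
    within = 0.5 * (rbf_kernel(X0, X0, sigma).mean()
                    + rbf_kernel(X1, X1, sigma).mean())
    return within - rbf_kernel(X0, X1, sigma).mean()

sigmas = np.linspace(0.1, 10.0, 100)
best = sigmas[np.argmax([criterion(s) for s in sigmas])]
```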

Page 15:

Kernel-based Dynamic Subspace Method (KDSM)

Page 16:

THE FRAMEWORK OF KDSM

[Figure: the KDSM framework. The original dataset X is processed by the optimal RBF kernel algorithm (an optimal parameter for each dimension of the kernel function), yielding an L-dimensional kernel space. The per-band class separability gives a kernel-based W distribution, which, combined with kernel smoothing, forms the kernel-based feature-selection distribution M_dist. Subspaces X̃ = FS(X, M_dist, w) are drawn to build a subspace pool (reduced dataset); multiple classifiers are trained on the pool and combined by decision fusion (majority voting). The procedure repeats until the performance of classification is stable.]

Page 17:

Experiment Design

Algorithm   Description
SVM_CV      A single SVM with the CV method, without any dimension reduction
SVM_OP      A single SVM with the OP method, without any dimension reduction
DSM_WACC    DSM with the re-substitution accuracy as the feature weights
DSM_WLDA    DSM with the separability of Fisher's LDA as the feature weights
KDSM        The kernel-based dynamic subspace method proposed in this research

OP: the optimal kernel method for choosing the RBF parameter
CV: 5-fold cross-validation

We use a grid search over the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper value of the RBF kernel parameter 2σ², and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper value of the penalty parameter for the slack variables, which controls the margins.
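This grid search can be sketched with scikit-learn's GridSearchCV (an assumption; the slides do not name an implementation). Note that SVC's `gamma` corresponds to 1/(2σ²), and its `C` is the penalty on the slack variables:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in data; the actual experiments use hyperspectral images.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# gamma = 1 / (2 sigma^2), so searching 2 sigma^2 over [0.01, 10]
# corresponds to this gamma grid; C is searched over the slide's set.
param_grid = {
    "gamma": (1.0 / np.linspace(0.01, 10.0, 20)).tolist(),
    "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```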

Page 18:

Hyperspectral Image data

EXPERIMENTAL DATASET

IR image of Washington, DC Mall (d = 191 bands), 7 classes.

Category (no. of labeled data): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840).

Page 19:

Experimental Results

Method               SVM_CV   SVM_OP   DSM_WACC    DSM_WLDA    KDSM
Case 1 Accuracy (%)  83.66    83.79    85.49       87.47       88.64
Case 1 CPU time (s)  30.35    3.10     6045.31     2188.62     155.31
Case 2 Accuracy (%)  86.39    87.89    88.74       89.43       92.53
Case 2 CPU time (s)  116.02   6.65     21113.75    4883.92     308.26
Case 3 Accuracy (%)  94.69    95.31    95.94       96.94       97.43
Case 3 CPU time (s)  5858.18  376.99   1165048.6   220121.62   17847.7

There are three cases in Washington, DC Mall, where N_i is the number of training samples in class i and N is the number of all training samples:
case 1: N_i = 20 (N = 140 < d); case 2: N_i = 40 (N = 280 > d); case 3: N_i = 300 (N = 2100 ≫ d).

Page 20:

Experiment Results in Washington, DC Mall

The outcome of classification by the various multiple classifier systems (Ratio: CPU time relative to KDSM):

Method      Case 1 Accuracy  Ratio    Case 2 Accuracy  Ratio    Case 3 Accuracy  Ratio
DSM_WACC    85.49%           38.924   88.74%           68.493   95.94%           65.277
DSM_WLDA    87.47%           14.092   89.43%           15.844   96.94%           12.333
KDSM        88.64%           1        92.53%           1        97.43%           1
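The Ratio column is consistent with the CPU times reported earlier; a quick check (values copied from the tables above):

```python
# CPU times (sec) copied from the results table for Washington, DC Mall.
cpu = {
    1: {"DSM_WACC": 6045.31, "DSM_WLDA": 2188.62, "KDSM": 155.31},
    2: {"DSM_WACC": 21113.75, "DSM_WLDA": 4883.92, "KDSM": 308.26},
    3: {"DSM_WACC": 1165048.6, "DSM_WLDA": 220121.62, "KDSM": 17847.7},
}
# Each Ratio entry in the table equals CPU time divided by KDSM's CPU time.
for case, t in cpu.items():
    for method in ("DSM_WACC", "DSM_WLDA"):
        print(case, method, round(t[method] / t["KDSM"], 3))
```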

Page 21:

Classification Maps with N_i = 20 in Washington, DC Mall

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 22:

Classification Maps (roof) with N_i = 40

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 23:

Classification Maps with N_i = 300 in Washington, DC Mall

□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow

SVM_CV SVM_OP

DSM_WACC DSM_WLDA KDSM

Page 24:

In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF kernel parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high dimensional datasets.

The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in every case of the Washington, DC Mall dataset.

Moreover, these results show that, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computation time.

Conclusions

Page 25:


Thank You