
Parameter Optimization for Support Vector Machine Based on Nested Genetic Algorithms

Pin Liao, Xin Zhang, and Kunlun Li
College of Science and Technology, Nanchang University, Nanchang, China

Email: [email protected], [email protected], [email protected]

Yang Fu

Tian Ge Interactive Holdings Limited, Beijing, China

Email: [email protected]

Mingyan Wang and Sensen Wang

Information Engineering School, Nanchang University, Nanchang, China

Email: [email protected], [email protected]

Abstract—Support Vector Machine (SVM) is a popular and landmark classification method based on the idea of structural risk minimization, and it has been adopted extensively across numerous domains such as pattern recognition, regression, and ranking. To achieve satisfactory generalization, the penalty and kernel function parameters of SVM must be carefully determined. This paper presents an original method based on two nested real-valued genetic algorithms (NRGA), which optimizes the parameters of SVM efficiently and speeds up the parameter optimization by orders of magnitude compared to traditional methods that optimize all the parameters simultaneously. As illustrated by the experimental results on gender classification of facial images, the proposed parameter optimization method, NRGA, can develop an SVM classifier quickly with superior classification accuracy owing to its efficiency and the resulting search power.

Index Terms—support vector machine, parameter optimization, genetic algorithm, nested optimization method, gender classification of facial images

I. INTRODUCTION

Support Vector Machine (SVM) [1], [2] is a well-known and successful two-class classification method based on statistical learning theory. SVM is also known as a maximum-margin classifier: it maximizes the geometric margin between two classes in order to reach a minimum generalization error. SVM has therefore been employed successfully in many diverse classification applications, mainly because of its extraordinary generalization performance.

Properly chosen SVM parameters are critical for achieving high generalization performance. The parameters of SVM to be optimized are the penalty parameter C and the kernel function parameters. The penalty parameter is introduced to make a trade-off between minimizing the empirical risk and


minimizing the model complexity. The kernel function parameters, such as the gamma γ of the radial basis function (RBF) kernel, define the non-linear mapping from the input space to some high-dimensional feature space. For convenience, this work uses RBF as the kernel function of SVM, without loss of generality.

Hence, this paper proposes a novel method to optimize the parameters of SVM by nesting two real-valued genetic algorithms (NRGA). NRGA is tremendously efficient compared to existing parameter optimization techniques, which search all the parameters simultaneously. In NRGA, the inner-loop real-valued genetic algorithm (RGA) optimizes the penalty factor C with the kernel function parameters held fixed, and the outer-loop RGA is in charge of optimizing the kernel function parameters. As a result, in each step of the outer loop, the kernel values remain unchanged and can be reused for all iterations of the inner loop. Therefore, NRGA can reduce the computing time of SVM parameter optimization by orders of magnitude (dependent on the iteration number of the inner loop) compared to traditional optimization approaches.

The remainder of this paper is organized as follows. Section II reviews pertinent literature on parameter optimization for SVM. Section III gives a brief introduction to SVM. Section IV describes the parameter optimization method based on two nested RGAs. Section V presents the experimental results from applying the proposed method to gender classification of facial images. Conclusions are given in Section VI.

II. RELATED WORK

The search for optimal SVM parameters plays a crucial role in establishing an efficient SVM model with high prediction accuracy and stability. Improper parameter settings result in poor classification performance, while the optimal categorization accuracy of SVM arises from finding optimal parameters.


It was suggested by Vapnik in [2] that the parameters can be set directly using a priori knowledge of the specific problem to be solved. However, this method is usually neither practical nor effective, and it is seldom adopted by researchers.

As a simple and direct algorithm for finding workable values of the SVM parameters [3]-[5], grid search is apt to be time-consuming, and the obtained solutions rely significantly on the selected grid granularity and search space. In [3], a "grid search" on C and γ using cross-validation was recommended, because the authors did not feel safe using methods that avoid an exhaustive parameter search through approximations or heuristics, and they considered that the grid search could be easily parallelized since each (C, γ) is independent. An approach was presented in [4] for selecting SVM parameters based on ideas from design of experiments, which starts with a very coarse grid covering the entire search range and iteratively refines both the grid resolution and the search boundaries, keeping the number of samples at each step nearly constant. A heuristic search strategy was presented in [5] to determine a good combination of parameter values while avoiding the high computational cost of grid search.
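For concreteness, here is a minimal sketch of such a grid search, using scikit-learn's SVC and cross_val_score rather than the tooling used in this paper; the function name and grid resolution are our assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def grid_search(X, y, c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Exhaustively score every (C, gamma) grid point by 5-fold cross-validation."""
    best = (None, None, -np.inf)  # (C, gamma, CV accuracy)
    for c_exp in c_exps:
        for g_exp in g_exps:
            C, gamma = 2.0 ** c_exp, 2.0 ** g_exp
            acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
            if acc > best[2]:
                best = (C, gamma, acc)
    return best
```

Each grid point is scored independently, which is why [3] notes that the search parallelizes trivially.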

The genetic algorithm (GA) has been widely adopted for parameter setting of SVM [6]-[8], as it tends to be good at finding generally good global solutions. In [6], a GA-based method was proposed to simultaneously optimize the feature subset and the parameters of SVM. In [7], an RGA was employed to optimize the parameters of SVM for predicting bankruptcy, since an RGA is more straightforward, faster, and more efficient than the binary GA. In [8], the asymptotic behaviors of SVM are fused with a GA, which thereby directs the search toward the straight line of optimal generalization error in the hyperparameter space.

Gradient descent has also been employed to determine the SVM parameters [9], [10]. In [9], the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton algorithm was used to arrive at good values of the hyperparameters. A method for tuning SVM parameters used a gradient descent algorithm over the set of parameters to minimize estimates of the generalization error [10]. As is well known, gradient descent is sensitive to the initial parameters and usually converges to a local optimum, so it becomes impractical for non-convex optimization problems.

As a generic probabilistic metaheuristic for global optimization, simulated annealing (SA) has been applied to optimize SVM parameters [11], [12], because it is excellent at avoiding local optima and is generally good at finding an approximate global optimum. Two classical techniques, GA and SA, were evaluated for optimizing SVM parameters in [11]; the two methods obtained similar results, while GA tended to be faster and SA needed fewer parameter settings. In [12], a modified SA (MSA) algorithm was developed and combined with linear least squares and gradient descent paradigms for parameter optimization in SVM.

As an optimization method based on the biological immune principle of living beings, the artificial immune technique can effectively avoid premature convergence and guarantee the diversity of solutions. Thus, several artificial immune algorithm (AIA) based approaches have been presented for SVM parameter optimization [13]-[16]. An artificial-immune-based parameter optimization of SVM was introduced in [13], which was able to effectively resolve the conflict between local search and global search in the process of feature selection and parameter selection. An artificial immunization algorithm was used to optimize the parameters of SVM in [14] to improve the overall capability of the SVM classifier for fault diagnosis of a turbo pump rotor, where the concentration and variety of the antibodies are utilized to select these parameters automatically. In [15], an immune algorithm was given to search for the optimal parameters of a sphere-shaped SVM for constructing a 3D model of encephalic tissues; it not only has fast convergence and powerful search ability, but also avoids the phenomena of degeneration and immaturity. A multi-objective artificial immune algorithm was developed in [16] to obtain optimal multiple-kernel and penalty parameters, which not only maximizes the accuracy rate but also minimizes the number of support vectors.

Particle swarm optimization (PSO) is a method for global numerical optimization based on swarm intelligence, which makes few or no assumptions about the optimization problem. An approach based on PSO was developed for parameter determination of SVM [17]. The developed approach not only tunes the parameter values of SVM but also selects an optimal feature subset, maximizing the classification accuracy rate of SVM. The experimental results demonstrate that the classification accuracy rate of the developed approach surpasses those of grid search and many other approaches, and that the developed PSO + SVM approach achieves results similar to GA + SVM.

III. SUPPORT VECTOR MACHINE FOR CLASSIFICATION

This section gives a brief introduction to SVM for classification; a more comprehensive description of SVM is provided in [2]. SVM is a supervised learning algorithm which can solve linear and nonlinear binary classification problems. The standard two-class soft-margin SVM classification problem is considered in this paper, which seeks a maximum-margin hyperplane that separates the two classes of samples with low generalization error.

Given training samples of two classes x_i ∈ ℝ^d, i = 1, …, n, with corresponding labels y_i ∈ {−1, 1}, where d is the dimension of the data points x_i and n is the number of training samples, training SVM requires the solution of the following (primal) quadratic programming (QP) optimization problem to attain the weight vector w and the offset b:

$$
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{s.t.}\quad y_{i}\left(w^{T}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\quad i = 1,\dots,n \tag{1}
$$

A data point x_i can be mapped into a high-dimensional space by the mapping function φ(·) if the problem is nonlinear.


The penalty coefficient C controls the width of the margin, and the non-negative slack variables ξ_i measure the degree of misclassification of the data points x_i. The generalization ability of the separating hyperplane is determined by the margin, calculated as 2/‖w‖. The minimization in formulation (1) is known as structural risk minimization, which optimizes the trade-off between empirical error minimization and margin maximization. Usually the primal problem (1) is solved via its dual problem (2):

$$
\min_{\alpha}\quad \frac{1}{2}\alpha^{T}Q\alpha - e^{T}\alpha
\qquad \text{s.t.}\quad y^{T}\alpha = 0,\quad 0 \le \alpha_{i} \le C,\quad i = 1,\dots,n \tag{2}
$$

where e is the vector of all ones, α_i is a Lagrange multiplier, and Q is an n × n symmetric positive semidefinite matrix whose elements are given by $Q_{ij} = y_i y_j K(x_i, x_j)$, where $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$ is called the kernel function.
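As a small illustrative sketch (NumPy; our own code, not part of the paper's C++ implementation, and the function names are ours), the RBF kernel matrix and the matrix Q of problem (2) can be computed as follows:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) over all training pairs."""
    sq = np.sum(X ** 2, axis=1)
    dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * dists)

def q_matrix(X, y, gamma):
    """Q[i, j] = y_i * y_j * K(x_i, x_j), as in the dual problem (2)."""
    return np.outer(y, y) * rbf_kernel_matrix(X, gamma)
```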

SVM is trained by solving the QP problem (2) on the training data points to obtain the value of each α_i. Subsequently, the following function is used to classify any new data point x:

$$
g(x) = \mathrm{sign}\left(\sum_{i=1}^{n} y_{i}\,\alpha_{i}\,K(x_{i}, x) + \beta\right)
$$

where the bias term β is computed during training as well. Depending on the selected kernel, β may implicitly be part of the kernel function; it is not required if the RBF kernel is chosen.
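A minimal NumPy sketch of this decision function, assuming the coefficients alpha and bias beta have already been obtained from training (the names are ours):

```python
import numpy as np

def classify(X_train, y_train, alpha, beta, x_new, gamma):
    """g(x) = sign(sum_i y_i * alpha_i * K(x_i, x) + beta), with an RBF kernel."""
    k = np.exp(-gamma * np.sum((X_train - x_new) ** 2, axis=1))
    return np.sign(np.dot(y_train * alpha, k) + beta)
```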

IV. PARAMETER OPTIMIZATION BASED ON NESTING REAL-VALUED GENETIC ALGORITHMS

Linear, polynomial, sigmoid, and RBF kernels are commonly employed in SVM. RBF is used particularly widely, for its simplicity with only one parameter γ and its tendency to yield good generalization. Thus, RBF is adopted in this study as the kernel function of SVM without loss of generality, since the parameters of other kernel functions can be sought with the same technique.

Training SVM involves the solution of a large convex QP optimization problem. To decrease the optimization cost, heuristic algorithms have been developed that decompose the large QP problem into a series of sub-problems. Typically, the sequential minimal optimization (SMO) [18] algorithm employs only two working samples and optimizes the reduced sub-problems with a simple analytic method. LIBSVM [19], an improved SMO algorithm, makes use of second-order information to accelerate convergence. Therefore, LIBSVM is adopted to train SVM in this work.
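For reference, LIBSVM exposes C and γ directly through its option string; a hedged example with its Python bindings (libsvm.svmutil), where the data file and the parameter values are placeholders:

```python
from libsvm.svmutil import svm_read_problem, svm_train

# Labels and features in LIBSVM's sparse text format (hypothetical file)
y, x = svm_read_problem("train.txt")

# '-c' sets the penalty C, '-g' the RBF gamma, '-v 5' requests 5-fold cross-validation
cv_accuracy = svm_train(y, x, "-c 4 -g 0.00225 -v 5")
```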

As a search heuristic based on Darwinian natural selection and genetics in biological systems, the GA has been popularly and effectively applied to many optimization problems. GA is well suited to complex non-linear optimization problems, since it requires neither gradient information nor a priori knowledge of model properties. In a GA, a population of candidate solutions to an optimization problem is evolved toward better solutions using operators inspired by natural evolution, such as selection, crossover, and mutation. Solutions are traditionally coded in binary as strings of 0s and 1s.

The real-valued GA (RGA) is quite different from the traditional binary GA (BGA): it uses a real value as a parameter in a chromosome, so an RGA needs no encoding and decoding process to compute the fitness values of individuals. Hence, an RGA is more direct, quicker, and more convenient than a BGA. Accordingly, the two parameters of SVM, C and γ, directly form the chromosome in this work.

The fitness value in the RGA is determined by the cross-validation accuracy rate, and crossover, mutation, and selection operators are employed to produce the offspring of the current population. The roulette-wheel technique is used to select which chromosomes survive into the next generation.
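A compact sketch of roulette-wheel selection (NumPy; illustrative code, not the GAlib routine actually used in this work):

```python
import numpy as np

def roulette_wheel_select(population, fitness, n_survivors, rng=None):
    """Sample survivors with probability proportional to fitness."""
    rng = rng or np.random.default_rng()
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(population), size=n_survivors, p=p)
    return [population[i] for i in idx]
```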

The crossover operation on real-valued chromosomes is given as follows:

$$
G_1^{new} = \min\left(G_1^{old},\, G_2^{old}\right) + 0.5\,\left|G_1^{old} - G_2^{old}\right|
$$

$$
G_2^{new} = \max\left(G_1^{old},\, G_2^{old}\right) - 0.5\,\left|G_1^{old} - G_2^{old}\right|
$$

where the pair of parent chromosomes before the crossover operation is denoted $G_1^{old}$ and $G_2^{old}$, and the pair of new chromosomes after the crossover operation is denoted $G_1^{new}$ and $G_2^{new}$.

The mutation operation adopted here replaces a gene with a random real value drawn from a given range.
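The two operators can be sketched as follows. Note that the crossover here is a standard random-blend (intermediate) variant rather than necessarily the exact operator above, and the blend factor is our assumption:

```python
import random

def crossover(g1_old, g2_old):
    """Blend crossover: each offspring is a convex combination of the two parents."""
    a = random.random()  # random blend factor (an assumption of this sketch)
    return a * g1_old + (1.0 - a) * g2_old, (1.0 - a) * g1_old + a * g2_old

def mutate(gene, low, high, rate=0.05):
    """With probability `rate`, replace the gene by a random value in [low, high]."""
    return random.uniform(low, high) if random.random() < rate else gene
```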

The computational complexity of SVM parameter optimization can be reduced by orders of magnitude using NRGA compared to the traditional methods [3]-[17], all of which optimize the penalty factor and the kernel parameter simultaneously. In the traditional methods, the kernel matrix values must be recomputed for each {C, γ} chromosome in the population, and this is the most computationally expensive part of SMO. NRGA instead optimizes the SVM parameters by nesting two loops of RGA: the inner-loop RGA optimizes C and the outer-loop RGA optimizes γ, so that at each outer step, with one fixed γ, the kernel matrix values stay fixed and can be reused for all the iterations of the inner loop, which correspond to many different values of C in a population. As a remarkable result, the computation cost can be reduced by orders of magnitude, dependent on the size of the C population. It is also notable that NRGA searches the same global range as the traditional methods do.

Fig. 1 shows the flowchart of NRGA for SVM parameter optimization by nesting two loops of RGA. The algorithm of NRGA can be expressed as follows (a code sketch of these nested loops is given after the list):

Figure 1. The flowchart of NRGA

1. Initialize a real-valued γ population.
2. Evaluate the fitness of each γ. For each γ:
   2.1 Initialize a real-valued C population.
   2.2 Evaluate the fitness of each C with the fixed γ, i.e.:
       a) Train SVM classifiers using cross-validation with each C and the fixed γ.
       b) Calculate the fitness corresponding to the cross-validation accuracy rate.
   2.3 Set the best fitness attained over the C population with the fixed γ as the fitness of that γ.
   2.4 Go to Step 3 if the inner termination criterion is satisfied; otherwise go to Step 2.5.
   2.5 Evolve a new C population by crossover, mutation, and selection. Go to Step 2.2.
3. Go to Step 5 if the outer termination criterion is satisfied; otherwise go to Step 4.
4. Evolve a new γ population by crossover, mutation, and selection. Go to Step 2.
5. Select the final {C, γ} corresponding to the best fitness (i.e., the highest cross-validation accuracy).
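The following Python sketch captures the nested structure and the kernel-reuse idea; scikit-learn's precomputed-kernel SVC stands in for the paper's C++/LIBSVM/GAlib implementation, and the population sizes, generation counts, and helper names are our assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def rbf_kernel_matrix(X, gamma):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) over all training pairs."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def evolve(genes, fitness, low, high, rng, mut_rate=0.05):
    """One RGA generation: roulette-wheel selection, blend crossover, mutation."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    parents = rng.choice(genes, size=len(genes), p=p)              # selection
    a = rng.random(len(genes))
    children = a * parents + (1.0 - a) * rng.permutation(parents)  # blend crossover
    mask = rng.random(len(genes)) < mut_rate                       # mutation
    children[mask] = rng.uniform(low, high, mask.sum())
    return children

def nrga(X, y, outer_gens=5, inner_gens=5, pop=10, seed=0):
    """Outer RGA evolves gamma; inner RGA evolves C while the kernel matrix is reused."""
    rng = np.random.default_rng(seed)
    gammas = 2.0 ** rng.uniform(-15, 3, pop)   # gamma range recommended in [3]
    best = (None, None, -np.inf)               # (C, gamma, CV accuracy)
    for _ in range(outer_gens):
        gamma_fitness = []
        for gamma in gammas:
            K = rbf_kernel_matrix(X, gamma)    # computed once per gamma ...
            Cs = 2.0 ** rng.uniform(-5, 15, pop)
            g_best = -np.inf
            for _ in range(inner_gens):        # ... and reused for every C here
                fit = [cross_val_score(SVC(C=C, kernel="precomputed"),
                                       K, y, cv=5).mean() for C in Cs]
                i = int(np.argmax(fit))
                g_best = max(g_best, fit[i])
                if fit[i] > best[2]:
                    best = (Cs[i], gamma, fit[i])
                Cs = evolve(Cs, fit, 2.0 ** -5, 2.0 ** 15, rng)
            gamma_fitness.append(g_best)
        gammas = evolve(gammas, gamma_fitness, 2.0 ** -15, 2.0 ** 3, rng)
    return best
```

Because the kernel matrix is built once per γ and shared across every C and every inner generation, the kernel computation cost shrinks by a factor on the order of the C population size times the inner generation count, which is the source of the orders-of-magnitude speedup.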

V. EXPERIMENTS

The platform used is a Dell x86 server with dual 2.4 GHz Intel Xeon E5620 CPUs and 72 GB RAM, running Windows Server 2003. Microsoft Visual C++ 2010 is the development environment. GAlib, a C++ genetic algorithm library [20], is adopted; it was developed for easily trying various objective functions, representations, genetic operators, and genetic algorithms.

To assess the performance of NRGA for SVM parameter optimization, we conducted experiments on gender classification of facial images using a large labeled database created by us. The database includes over 20,000 facial images collected from the Internet. Additional synthetic facial images were derived from the original images by slight geometric transforms (translation, scaling, rotation, and mirror reflection), yielding a training set of 80,000 samples and a cross-validation set of 20,000 samples. In addition, the testing set consists of 4,400 original facial images that are disjoint from the training and cross-validation samples. A feature vector of dimension 4,880 is extracted using Gabor filters.

In the experiments, the search range of C is set to [2^−5, 2^15] and the search range of γ to [2^−15, 2^3], as recommended by [3]. For each RGA in NRGA, the population size is set to 200, the crossover rate to 0.9, and the mutation rate to 0.05.

TABLE I. ACCURACY RATES OF DIFFERENT METHODS

Method             C        γ          Accuracy (%)
Grid Search        4        0.00225    96.9119
Traditional RGA    1.213    0.00829    97.1617
NRGA               1.095    0.00894    97.2071


Table I shows that NRGA outperforms both grid search and the traditional RGA, which search all the parameters simultaneously. NRGA is far more efficient than the traditional methods while searching the same global range; its computational complexity is reduced by orders of magnitude, dependent on the size of the C population. The superior performance of NRGA likely results from this efficiency, which allows it to examine many more candidate solutions and thus to find a better optimum than the other traditional methods.

VI. CONCLUSION

In this paper, a novel SVM parameter optimization method, NRGA, is proposed. Based on two nested RGAs, NRGA reduces the computation cost by orders of magnitude (related to the size of the C population) compared to traditional optimization techniques that search all the parameters simultaneously. The efficiency and effectiveness of NRGA are illustrated by the experimental results on gender classification of facial images.

ACKNOWLEDGMENT

The authors wish to thank Professor Chih-Jen Lin and his research team for kindly providing the LIBSVM tool, and Dr. Matthew Wall for GAlib. This research is partially sponsored by the Jiangxi Province Education Department science and technology project (GJJ13086) and the Nanchang Key Laboratory of Pattern Recognition.

REFERENCES

[1] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[2] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer-Verlag, 1999.
[3] C. W. Hsu, C. C. Chang, and C. J. Lin. (2003). A practical guide to support vector classification. Technical Report. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[4] C. Staelin, "Parameter selection for support vector machines," Technical Report HPL-2002-354 (R.1), HP Laboratories Israel, 2003.
[5] J. Wang, X. Wu, and C. Zhang, "Support vector machines based on k-means clustering for real-time business intelligence systems," International Journal of Business Intelligence and Data Mining, vol. 1, no. 1, pp. 54-64, 2005.
[6] C. L. Huang and C. J. Wang, "A GA-based feature selection and parameters optimization for support vector machines," Expert Systems with Applications, vol. 31, no. 2, pp. 231-240, 2006.
[7] C. H. Wu, G. H. Tzeng, Y. J. Goo, and W. C. Fang, "A real-valued genetic algorithm to optimize the parameters of support vector machine for predicting bankruptcy," Expert Systems with Applications, vol. 32, no. 2, pp. 397-408, 2007.
[8] M. Y. Zhao, C. Fu, L. P. Ji, K. Tang, and M. T. Zhou, "Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes," Expert Systems with Applications, vol. 38, pp. 5197-5204, 2011.
[9] S. S. Keerthi, "Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms," IEEE Trans. on Neural Networks, vol. 13, no. 5, pp. 1225-1229, 2002.
[10] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Machine Learning, vol. 46, no. 1, pp. 131-159, 2002.
[11] F. Imbault and K. Lebart, "A stochastic optimization approach for parameter tuning of support vector machines," in Proc. 17th International Conf. Pattern Recognition, vol. 4, 2004, pp. 597-600.
[12] H. Q. Wang, D. S. Huang, and B. Wang, "Optimisation of radial basis function classifiers using simulated annealing algorithm for cancer classification," Electronics Letters, vol. 41, no. 11, pp. 630-632, 2005.
[13] H. G. Zhou and C. D. Yang, "Using immune algorithm to optimize anomaly detection based on SVM," in Proc. IEEE International Machine Learning and Cybernetics Conference, Dalian, China, 2006, pp. 4257-4261.
[14] S. Yuan and F. Chu, "Fault diagnosis based on support vector machine with parameter optimization by artificial immunisation algorithm," Mechanical Systems and Signal Processing, vol. 21, no. 3, pp. 1318-1330, 2007.
[15] L. Guo, L. Wang, Y. Wu, W. Yan, and X. Shen, "Research on 3D modeling for head MRI image based on immune sphere-shaped support vector machine," in Proc. IEEE Eng. Med. Biol. Soc., Lyon, France, 2007, pp. 1082-1085.
[16] I. Aydin, M. Karakose, and E. Akin, "A multi-objective artificial immune algorithm for parameter optimization in support vector machine," Applied Soft Computing, vol. 11, pp. 120-129, 2011.
[17] S. W. Lin, K. C. Ying, S. C. Chen, and Z. J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," Expert Systems with Applications, vol. 35, pp. 1817-1824, 2008.
[18] J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. MIT Press, 1999, pp. 185-208.
[19] C. C. Chang and C. J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[20] GAlib Web Site. [Online]. Available: http://lancet.mit.edu/ga
[21] P. Liao et al., "Nesting genetic algorithms for parameter optimization of support vector machine," in Proc. International Academic Conf. Information Science and Communication Engineering, 2014.

Pin Liao was born in 1975. He received the B.S. degree in computer science from Nanchang University, Nanchang, China in 1996, the M.S. degree in pattern recognition and intelligent systems from Beijing Institute of Technology, Beijing, China in 1999, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2003. He joined the faculty of the College of Science and Technology, Nanchang University, China, in 2005, where he is currently a Professor. His current research interests include face recognition, computer vision, neural networks, and machine learning.

Yang Fu received the B.S. degree in software engineering in 1996 and the M.S. degree in computer system organization in 2012, both from Nanchang University, Nanchang, China. He is now with Tian Ge Interactive Holdings Limited, Beijing, China.

Xin Zhang received the B.S. degree in computing mathematics in 1999 and the M.S. degree in computer science in 2005, from Southwest Jiaotong University, Chengdu, China. He joined the faculty of the College of Science and Technology, Nanchang University, China, in 2005, where he is currently an Associate Professor. His current research interests include pattern recognition, data mining, and machine learning.
