


Neurocomputing 211 (2016) 159–171

Contents lists available at ScienceDirect

Neurocomputing


journal homepage: www.elsevier.com/locate/neucom

Novel Bayesian inference on optimal parameters of support vector machines and its application to industrial survey data classification

Jingjing Zhong, Peter W. Tse, Dong Wang *

Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China

Article info

Article history:
Received 1 September 2015
Received in revised form 28 December 2015
Accepted 28 December 2015
Available online 7 June 2016

Keywords:
Support vector machine
Particle filter
State space model
Cross-validation accuracy

http://dx.doi.org/10.1016/j.neucom.2015.12.132
0925-2312/© 2016 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail addresses: [email protected] (J. Zhong), [email protected] (P.W. Tse), [email protected] (D. Wang).

Abstract

Engineering Asset Management (EAM) is a recently attractive discipline that aims to address the valuable contributions of asset management to an organization's success. As of today, there is no specific method to evaluate the performance of EAM standards. This paper aims to fill this gap and rank the performance of asset management automatically after conducting a survey, instead of evaluating questionnaires, analyzing results, and ranking performances in a tedious process. Hence, it is necessary to develop intelligent data classification to simplify the whole procedure. Among many supervised learning methods, support vector machine attracts much attention for binary classification problems, and its extension, namely multiple support vector machines, is able to solve multiclass classification problems. It is crucial to find optimal parameters of support vector machines prior to their use for prediction of unknown testing data sets. In this paper, novel Bayesian inference on optimal parameters of support vector machines is proposed. Firstly, a state space model is constructed to find the relationship between the parameters of support vector machines and a guess cross-validation accuracy. Here, the guess cross-validation accuracy aims to prevent support vector machines from overfitting. Secondly, particle filter is introduced to iteratively find posterior probability density functions of the parameters of support vector machines. Then, optimal parameters of support vector machines can be found from the posterior probability density functions. Ultimately, survey data collected from industry are used to validate the effectiveness of the proposed Bayesian inference method. Comparisons with some randomly selected parameters are conducted to highlight the superiority of the proposed method. The results show that the proposed Bayesian inference method can result in both high training and testing accuracies.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Engineering Asset Management as a discipline addresses the valuable contributions of asset management to an organization's success [1]. Good asset management is becoming an expected practice in mature organizations all over the world. PAS 55:2008, the first publicly available specification for optimized management of physical assets, was developed by a consortium of 50 organizations from 15 different industry sectors in 10 countries. Given the popularity of PAS 55, after consultation with industry and professional bodies around the world, the specification was put forward in 2009 to the International Standards Organization as the basis for a new ISO standard for asset management. This was approved, and the resulting ISO 55000 family of standards has been developed with 31 participating countries [2]. An EAM certificate provides recognized credibility in good practice and corporate governance, and a robust platform for developing further improvements.


A number of utility service providers have obtained asset management certificates by hiring well-known consultancy companies to perform auditing and performance assessment on EAM. However, many small and medium-sized enterprises (SMEs) cannot afford to hire renowned consultancy companies to guide them in obtaining the required certificates and to provide professional suggestions to optimize asset management. Therefore, one purpose of the current research in EAM is to build an intelligent system that can automatically classify the different performance levels of a particular company and then identify the most suitable practice in EAM for that company after benchmarking against information and performances given by other companies. Furthermore, PAS-55 only lists general guidelines on what elements are required to be accomplished so as to obtain the certificate in EAM. There is no specific method to evaluate the standard implementations and measure performance for managing assets. This paper aims to fill this gap and rank performance levels of asset management automatically and rapidly after conducting a survey, instead of evaluating questionnaires, analyzing results, and ranking their performances in a tedious process. To achieve this goal, a novel Bayesian inference method for finding optimal parameters of support vector machines is proposed in this paper. The


major reason why support vector machines are adopted is that they have many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems [3–5]. Moreover, after optimizing support vector machines, this new method not only accomplishes the requirements for ranking performance levels, but also improves the effectiveness and efficiency of analyzing and measuring performance levels in a simplified and low-cost way. Furthermore, the predicted performance levels can be provided to other small and medium-sized enterprises (SMEs) and industries for benchmarking and to support further surveys and research.

The novelties of the proposed Bayesian inference method are summarized as follows. Firstly, a state space model is constructed to establish the relationship between the parameters of support vector machines and a guess cross-validation accuracy. Here, the guess cross-validation accuracy aims to alleviate the overfitting problem in the training process of support vector machines. Secondly, particle filter is introduced to iteratively obtain posterior probability density functions of the parameters of support vector machines. According to our literature review, the particle filter has been reported for one-dimensional optimization [6], wind farm layout design [7], slurry pump prognosis [8], bearing fault diagnosis [9], etc. However, its use for optimization of supervised learning methods, particularly support vector machines, is very limited and seldom reported. The contents reported in this paper could be used to clarify how the particle filter is able to find optimal parameters of support vector machines. Moreover, because support vector machine is just one kind of supervised learning method, the proposed Bayesian inference on optimal parameters of support vector machines can be extended to optimize the parameters of other supervised learning methods.

The rest of this paper is outlined as follows. Fundamental theories related to the proposed Bayesian inference method are briefly reviewed in Section 2. The novel Bayesian inference method for finding optimal parameters of support vector machines for multiclass classification problems is proposed in Section 3. Industrial survey data are analyzed in Section 4 to demonstrate the effectiveness of the proposed Bayesian inference method, and comparisons with some randomly selected parameters are conducted. Conclusions are drawn at last.

2. Fundamental theories related to the proposed method

2.1. Support vector machine for binary classification problems and its extension for multiclass classification problems

Support vector machine [3] is a popular supervised learning method for many binary classification problems. Its fundamental theory is introduced in the following. Given a training data set $T = \{(y_i, v_i) \mid y_i \in \mathbb{R}^p,\ v_i \in \{-1, 1\}\}_{i=1}^{n}$. Here, $y_i$ is a $p$-dimensional real vector and $v_i$ is a binary label, which belongs to either $-1$ or $1$. In some cases, if the training data set is linearly separable, two hyperplanes can be used to separate the training data set. Moreover, it is required that no training data are located in the margin of the two hyperplanes. Considering these two points, the margin between the two hyperplanes should be maximized as much as possible. Therefore, solving the optimization problem given by Eq. (1) finds a maximum-margin hyperplane for binary classification problems:

$$\arg\min_{\omega}\ \frac{1}{2}\|\omega\|^2 \quad \text{subject to}\ v_i(\omega \cdot y_i - c) \ge 1, \tag{1}$$

where $\omega$ is a normal vector to the hyperplane, $\cdot$ is the dot product operator, and $c$ is an offset of the hyperplane from the origin along the normal vector.

In some cases, if the constraints used in Eq. (1) cannot be satisfied, slack variables $\xi_i$ and an error penalty constant $C$ should be used to trade off a large margin against an error penalty. Then, Eq. (1) is revised as:

$$\arg\min_{\omega,\xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to}\ v_i(\omega \cdot y_i - c) \ge 1 - \xi_i,\ \xi_i \ge 0. \tag{2}$$

To find a solution to Eq. (2), a Lagrangian equation with Lagrange multipliers $\alpha_i$ and $\beta_i$ is constructed as follows:

$$\arg\min_{\omega,\xi,c}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left(v_i(\omega \cdot y_i - c) - 1 + \xi_i\right) - \sum_{i=1}^{n}\beta_i\xi_i,\quad \alpha_i, \beta_i \ge 0. \tag{3}$$

The derivative of Eq. (3) with respect to $\omega$ results in:

$$\omega = \sum_{i=1}^{n}\alpha_i v_i y_i. \tag{4}$$

The derivative of Eq. (3) with respect to $c$ results in:

$$\sum_{i=1}^{n}\alpha_i v_i = 0. \tag{5}$$

Then, after Eqs. (4) and (5) are substituted into Eq. (3), an alternative form of Eq. (3) is given as follows:

$$\arg\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j v_i v_j\,(y_i \cdot y_j) \quad \text{subject to}\ C \ge \alpha_i \ge 0,\ i = 1, \dots, n,\ \ \sum_{i=1}^{n}\alpha_i v_i = 0. \tag{6}$$

By solving Eq. (6), a linear decision function for binary classification problems is obtained as follows:

$$f(y) = \operatorname{sign}(\omega \cdot y - c) = \operatorname{sign}\left(\left(\sum_{i=1}^{n}\alpha_i v_i y_i\right) \cdot y - c\right) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i v_i\,(y_i \cdot y) - c\right), \tag{7}$$

where only a small number of the $\alpha_i$ are non-zero, and their corresponding training data are called support vectors.

Additionally, if the training data are linearly inseparable, kernel methods [10–12] should be used to map the training data to a high-dimensional space, in which it is possible to linearly separate the training data. Here, the kernel function should satisfy Mercer's theorem. In practice, two kernel functions, the polynomial and the Gaussian radial kernel function, are popular. Compared with the polynomial kernel function, the Gaussian radial kernel function is preferable because it has fewer parameters and performs well on non-linear classification problems. In this paper, the Gaussian radial kernel function is chosen. Therefore, Eq. (7) is modified as follows:

$$f(y) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i v_i \exp\left(-\gamma\|y_i - y\|^2\right) - c\right), \tag{8}$$

where $\gamma$ is the kernel parameter and $\|\cdot\|$ is the modulus of a $p$-dimensional real vector.
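As a concrete illustration, the decision rule of Eq. (8) can be evaluated directly from a set of support vectors, their coefficients, and the kernel parameter. The following is a minimal sketch with illustrative names, not the LIBSVM implementation used later in the paper:

```python
import math

def rbf_decision(y, support_vectors, alphas, labels, gamma, c):
    # Eq. (8): f(y) = sign( sum_i alpha_i * v_i * exp(-gamma * ||y_i - y||^2) - c )
    score = -c
    for y_i, alpha_i, v_i in zip(support_vectors, alphas, labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(y_i, y))
        score += alpha_i * v_i * math.exp(-gamma * sq_dist)
    return 1 if score >= 0 else -1
```

A test point near a positive support vector yields $+1$, and one near a negative support vector yields $-1$, since the Gaussian kernel decays with squared distance.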

The aforementioned theory related to support vector machine can only be used for binary classification problems. If a multiclass problem is required to be solved, it is necessary to reduce the multiclass problem to multiple binary classification problems, namely multiple support vector machines. In other words, multiple support vector machines are required to be built for multiclass classification problems. There are two popular strategies, namely the one-against-all strategy and the one-against-one strategy. For details, please refer to [3].
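The one-against-one strategy can be sketched as a majority vote over all pairwise binary classifiers. Here `binary_predict` is an illustrative stand-in for a trained pairwise support vector machine, not part of any particular library:

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(y, class_labels, binary_predict):
    # binary_predict(a, b, y) returns the winning label (a or b) of the
    # binary classifier trained on classes a and b; the class collecting
    # the most pairwise votes is the multiclass prediction.
    votes = Counter(binary_predict(a, b, y) for a, b in combinations(class_labels, 2))
    return votes.most_common(1)[0][0]
```

With $M$ classes, this requires $M(M-1)/2$ binary machines, whereas one-against-all requires only $M$ but each is trained on the full, more imbalanced data set.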

According to Eqs. (6) and (8), given the training data set, it is not difficult to conclude that the two parameters, namely the kernel parameter $\gamma$ and the error penalty constant $C$, should be optimized so as to achieve good performance in predicting an unknown testing data set. In this paper, particle filter based Bayesian inference on optimal parameters of support vector


machines for multiclass classification problems is presented in Section 3. Prior to its formal presentation, the fundamental theory of particle filter is reviewed in Section 2.2.

2.2. Particle filter

In Bayesian statistical inference, particle filter, or sequential Monte Carlo [13,14], is a Monte Carlo methodology for solving filtering problems. Its fundamental theory is introduced in the following. Given some measurements $z_{1:k}$, if $N_s$ random particles $\{x_k^i\}_{i=1}^{N_s}$ and their associated weights $\{\omega_k^i\}_{i=1}^{N_s}$ can be directly drawn from a true probability density function $p(x_k \mid z_{1:k})$, the true probability density function is expressed as follows:

$$p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N_s}\omega_k^i\,\delta(x_k - x_k^i), \tag{9}$$

where $\delta(\cdot)$ is the Dirac delta function and $x_k$ is the state or parameter at iteration $k$. However, in some cases, because it is intractable to directly draw the random particles from the true posterior density function, it is necessary to resort to an importance function $q(x_k \mid z_{1:k})$ to generate the random particles. Moreover, if the true posterior density function $p(x_k \mid z_{1:k})$ is proportional to an analytical function $\pi(x_k \mid z_{1:k})$, the weight $\omega_k^i$ is proportional to [13,14]:

$$\omega_k^i \propto \frac{\pi(x_k^i \mid z_{1:k})}{q(x_k^i \mid z_{1:k})} \propto \frac{p(x_k^i \mid z_{1:k})}{q(x_k^i \mid z_{1:k})},\quad i = 1, 2, \dots, N_s. \tag{10}$$

Additionally, if $q(x_k^i, x_{k-1}^i \mid z_{1:k}) = q(x_k^i \mid x_{k-1}^i, z_{1:k})\, q(x_{k-1}^i \mid z_{1:k-1})$ and $p(x_k^i \mid z_{1:k}) \propto p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)\, p(x_{k-1}^i \mid z_{1:k-1})$ are considered, Eq. (10) can be iteratively updated as follows [13,14]:

$$\omega_k^i \propto \frac{p(x_k^i \mid z_{1:k})}{q(x_k^i \mid z_{1:k})} \propto \frac{p(x_{k-1}^i \mid z_{1:k-1})\, p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_{k-1}^i \mid z_{1:k-1})\, q(x_k^i \mid x_{k-1}^i, z_{1:k})} \propto \omega_{k-1}^i\,\frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_{1:k})},\quad i = 1, 2, \dots, N_s. \tag{11}$$

If $q(x_k^i \mid x_{k-1}^i, z_{1:k}) = q(x_k^i \mid x_{k-1}^i, z_k)$ and weight normalization are considered, Eq. (11) is modified as:

$$\omega_k^i = \omega_{k-1}^i\,\frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)} \Bigg/ \sum_{i=1}^{N_s}\omega_{k-1}^i\,\frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)},\quad i = 1, 2, \dots, N_s. \tag{12}$$

In many practical cases, to simplify Eq. (12), it is preferable to choose the importance function as the prior distribution, namely $q(x_k^i \mid x_{k-1}^i, z_k) = p(x_k^i \mid x_{k-1}^i)$ [13,14]. Finally, Eq. (12) is simplified as:

$$\omega_k^i = \omega_{k-1}^i\, p(z_k \mid x_k^i) \Bigg/ \sum_{i=1}^{N_s}\omega_{k-1}^i\, p(z_k \mid x_k^i),\quad i = 1, 2, \dots, N_s. \tag{13}$$

According to the aforementioned theory of particle filter, it is clear that, given the measurements $z_{1:k}$, the probability density function $p(x_k \mid z_{1:k})$ can be posteriorly estimated, which makes it possible to infer an optimal state or parameter. In Section 3, based on particle filter, a novel Bayesian inference method for finding optimal parameters of support vector machines will be proposed.
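The simplified weight update of Eq. (13) amounts to a likelihood-weighted renormalization of the previous weights, which can be sketched in a few lines (an illustrative sketch; names are ours):

```python
def update_weights(prev_weights, likelihoods):
    # Eq. (13): w_k^i = w_{k-1}^i * p(z_k | x_k^i) / sum_j w_{k-1}^j * p(z_k | x_k^j)
    unnormalized = [w * l for w, l in zip(prev_weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

For example, two equally weighted particles with likelihoods 1 and 3 receive posterior weights 0.25 and 0.75.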

3. Novel Bayesian inference on optimal parameters of support vector machines for multiclass classification problems

As mentioned in Section 2.1, for the use of support vector machines for multiclass classification problems, such as evaluation of performance of EAM, the two parameters, namely the kernel parameter $\gamma$ and the error penalty constant $C$, must be optimized to achieve both high training and testing accuracies. Moreover, to avoid the overfitting problem, a well-known procedure called cross-validation [15] should be employed. In $K$-fold cross-validation, the training data set is divided into $K$ subsets of equal size. One subset is used as validation data to test support vector machines, which are trained by the remaining $K-1$ subsets. In turn, each of the $K$ subsets is used once as validation data. Cross-validation accuracy (CVA) is defined as the percentage of validation data that are correctly classified. Consequently, in this paper, finding optimal parameters of support vector machines becomes a two-dimensional optimization problem, which aims to maximize cross-validation accuracy. To posteriorly infer optimal parameters of support vector machines, the following state space model is constructed:

$$\gamma_k = \gamma_{k-1}, \tag{14}$$

$$C_k = C_{k-1}, \tag{15}$$

$$\mathrm{GCVA}_k = \mathrm{CVA}(\gamma_k, C_k) - m_k, \tag{16}$$

where $m_k$ is the measurement noise and it is subject to a uniform distribution $U(0, 100)$. Here, the range of the uniform distribution is from 0 to 100 because CVA is always limited to a range from 0% to 100%. GCVA is the abbreviation of guess cross-validation accuracy, which will be constructed later. According to Bayes' theorem, the posterior probability density function $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$ can be inferred from Eqs. (14) to (16).

Using Bayes' theorem, the joint posterior density function $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$ is represented by:

$$p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k}) = \frac{p(\mathrm{GCVA}_k \mid \gamma_k, C_k)\, p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k-1})}{p(\mathrm{GCVA}_k \mid \mathrm{GCVA}_{1:k-1})} = \frac{p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k-1})}{\iint p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k-1})\, d\gamma_k\, dC_k},\quad \mathrm{CVA}(\gamma_k, C_k) \ge \mathrm{GCVA}_k. \tag{17}$$

It is interesting to find that $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$ can be iteratively calculated according to Eq. (17). Moreover, if GCVA is monotonically increasing, optimal parameters of support vector machines can be posteriorly inferred from Eq. (17) by considering the following inequality:

$$0 \le \mathrm{GCVA}_1 \le \mathrm{GCVA}_2 \le \dots \le \mathrm{GCVA}_k \le \mathrm{CVA}(\gamma_k, C_k) \le \mathrm{CVA}(\gamma_{\mathrm{optimal}}, C_{\mathrm{optimal}}) \le 100. \tag{18}$$

The details for realizing Eqs. (17) and (18) by using particle filter are clarified as follows.

Step 1. $N_s = 1000$ random particles $\{\gamma_0^i, C_0^i\}_{i=1}^{N_s}$ with equal weights $\{\omega_0^i\}_{i=1}^{N_s} = 1/N_s$ are drawn from a uniform distribution. Here, $\gamma$ is limited to a range from $2^{-15}$ to $2^{3}$, and $C$ is limited to a range from $2^{-5}$ to $2^{15}$. According to the theory of cross-validation, these random particles result in 1000 cross-validation accuracies. Then, arrange these cross-validation accuracies from the lowest value to the highest value. The first guess cross-validation accuracy $\mathrm{GCVA}_1$ is obtained by taking the 50th percentile of the 1000 cross-validation accuracies. Then, the posterior probability density function $p(\gamma_1, C_1 \mid \mathrm{GCVA}_1)$ is derived as follows:

$$p(\gamma_1, C_1 \mid \mathrm{GCVA}_1) = \frac{\sum_{i=1}^{N_s}\omega_0^i\,\delta\big((\gamma_1, C_1) - (\gamma_0^i, C_0^i)\big)}{\iint \sum_{i=1}^{N_s}\omega_0^i\,\delta\big((\gamma_1, C_1) - (\gamma_0^i, C_0^i)\big)\, d\gamma_1\, dC_1} = \sum_{i=1}^{N_s}\omega_1^i\,\delta\big((\gamma_1, C_1) - (\gamma_0^i, C_0^i)\big), \tag{19}$$

where $\omega_1^i$ is calculated by:

$$\omega_1^i = \begin{cases} \omega_0^i \Big/ \sum_{i=1}^{N_s}\omega_0^i, & \mathrm{CVA}(\gamma_0^i, C_0^i) \ge \mathrm{GCVA}_1, \\[4pt] 0, & \mathrm{CVA}(\gamma_0^i, C_0^i) < \mathrm{GCVA}_1, \end{cases} \quad i = 1, 2, \dots, N_s. \tag{20}$$
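Interpreting the normalization in Eq. (20) as running over the surviving particles (so that the retained weights sum to one, an assumption made explicit here), the weight update can be sketched as:

```python
def truncate_weights(weights, cvas, gcva):
    # Eq. (20): zero the weights of particles whose CVA falls below the
    # guess CVA, then renormalize over the surviving particles.
    kept = [w if s >= gcva else 0.0 for w, s in zip(weights, cvas)]
    total = sum(kept)
    return [w / total for w in kept]
```

For instance, four equally weighted particles with CVAs 50, 60, 70, 80 and a guess CVA of 65 yield weights 0, 0, 0.5, 0.5.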


Fig. 1. The flowchart of the proposed Bayesian inference method for finding optimal parameters of support vector machines for multiclass classification problems.


According to Eq. (20), if the cross-validation accuracies $\mathrm{CVA}(\gamma_0^i, C_0^i)$ are smaller than the first guess cross-validation accuracy, their associated weights are set to zero. In other words, part of the random particles are discarded owing to their low cross-validation accuracies. It should be noted that after a few iterations, most of the weights become negligible and the variance of the weights increases. This problem is called the degeneracy problem in particle filter. To relieve this problem, a systematic resampling method should be conducted. First of all, cumulate all weights by using $c_1 = 0$ and $c_i = c_{i-1} + \omega_1^i$ to form a cumulative function. Draw a value $a_1$ from a uniform distribution $U[0, N_s^{-1}]$. Then, for $a_j = a_1 + N_s^{-1}(j-1)$, $j = 1, 2, \dots, N_s$, move along the cumulative function: while $a_j \ge c_i$, set $i = i + 1$; then set $\gamma^j = \gamma^i$ and $C^j = C^i$. At last, all weights are set to $1/N_s$.
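The systematic resampling step above can be sketched as follows (a minimal illustration; `particles` may be any list, e.g. of $(\gamma, C)$ pairs):

```python
import random

def systematic_resample(particles, weights):
    # One uniform draw a1 in [0, 1/Ns], then a fixed stride of 1/Ns along
    # the cumulative weight function; afterwards all weights become 1/Ns.
    ns = len(particles)
    cumulative, total = [], 0.0
    for w in weights:
        total += w
        cumulative.append(total)
    a1 = random.uniform(0.0, 1.0 / ns)
    resampled, i = [], 0
    for j in range(ns):
        a_j = a1 + j / ns
        while a_j > cumulative[i]:
            i += 1
        resampled.append(particles[i])
    return resampled
```

A particle with weight $w$ is copied roughly $w N_s$ times, so zero-weight particles are dropped while high-weight particles are duplicated, which counteracts degeneracy at the cost of reduced particle diversity.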

Step 2. Draw $N_s$ random particles $\{\gamma_{k-1}^i, C_{k-1}^i\}_{i=1}^{N_s}$ with equal weights $\{\omega_{k-1}^i\}_{i=1}^{N_s} = 1/N_s$ from the posterior distribution $p(\gamma_{k-1}, C_{k-1} \mid \mathrm{GCVA}_{1:k-1})$. According to the theory of cross-validation, these random particles result in 1000 cross-validation accuracies. Arrange these cross-validation accuracies from the lowest value to the highest value. The $k$th guess cross-validation accuracy $\mathrm{GCVA}_k$ is obtained by taking the 50th percentile of the 1000 cross-validation accuracies. To keep the guess cross-validation accuracy monotonically increasing, the $k$th guess cross-validation accuracy must be larger than all of the previous guess cross-validation accuracies. If not, the largest previous guess cross-validation accuracy is employed as the $k$th guess cross-validation accuracy. The posterior probability density function $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$ is estimated by:

$$p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k}) = \frac{\sum_{i=1}^{N_s}\omega_{k-1}^i\,\delta\big((\gamma_k, C_k) - (\gamma_{k-1}^i, C_{k-1}^i)\big)}{\iint \sum_{i=1}^{N_s}\omega_{k-1}^i\,\delta\big((\gamma_k, C_k) - (\gamma_{k-1}^i, C_{k-1}^i)\big)\, d\gamma_k\, dC_k} = \sum_{i=1}^{N_s}\omega_k^i\,\delta\big((\gamma_k, C_k) - (\gamma_{k-1}^i, C_{k-1}^i)\big), \tag{21}$$

where $\omega_k^i$ is calculated by:

$$\omega_k^i = \begin{cases} \omega_{k-1}^i \Big/ \sum_{i=1}^{N_s}\omega_{k-1}^i, & \mathrm{CVA}(\gamma_{k-1}^i, C_{k-1}^i) \ge \mathrm{GCVA}_k, \\[4pt] 0, & \mathrm{CVA}(\gamma_{k-1}^i, C_{k-1}^i) < \mathrm{GCVA}_k, \end{cases} \quad i = 1, 2, \dots, N_s. \tag{22}$$

Accordingly, if the cross-validation accuracies $\mathrm{CVA}(\gamma_{k-1}^i, C_{k-1}^i)$ are smaller than the $k$th guess cross-validation accuracy $\mathrm{GCVA}_k$, their associated weights are set to zero. Resample $N_s$ random particles $\{\gamma_k^i, C_k^i\}_{i=1}^{N_s}$ from Eq. (21) by the aforementioned systematic resampling method.

Step 3. Increase $k = k + 1$ and repeat Step 2 until $k$ exceeds a specified maximum iteration number. It should be noted that the calculation time of the proposed Bayesian inference method grows as the specified maximum iteration number and the number of random particles increase. Therefore, the specified maximum iteration number should not be set to a large value; it is equal to 4 in this paper so as to reduce calculation time. Because the posterior probability density function $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$ is estimated, optimal parameters of support vector machines can be derived by taking the random particle corresponding to the $L$th percentile of all the cross-validation accuracies generated from the posterior probability density function $p(\gamma_k, C_k \mid \mathrm{GCVA}_{1:k})$. In this paper, $L$ is set to 100.

Step 4. Support vector machines with the optimal parameters can be used to predict the unknown testing data set for multiclass classification problems.

The proposed Bayesian inference method for finding optimal parameters of support vector machines for multiclass classification problems is summarized in Fig. 1.
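Putting Steps 1–3 together, the iterative scheme can be sketched as below. The `cva` argument is an illustrative stand-in for the actual cross-validation call (in the paper, LIBSVM); the $2^x$ search ranges, $N_s = 1000$, four iterations, and $L = 100$ follow the steps above, and the Eq. (20)/(22) truncation is realized by keeping only particles whose CVA reaches the guess CVA:

```python
import random

def infer_optimal_parameters(cva, ns=1000, max_iter=4):
    # Step 1: draw particles uniformly over the exponent ranges
    # gamma in [2^-15, 2^3], C in [2^-5, 2^15].
    particles = [(2.0 ** random.uniform(-15, 3), 2.0 ** random.uniform(-5, 15))
                 for _ in range(ns)]
    gcva = 0.0
    for _ in range(max_iter):
        scores = [cva(g, c) for g, c in particles]
        # Guess CVA: the 50th percentile, kept monotonically increasing (Eq. (18)).
        gcva = max(gcva, sorted(scores)[ns // 2])
        # Eqs. (20)/(22): discard particles whose CVA falls below the guess
        # CVA, then resample uniformly among the survivors.
        survivors = [p for p, s in zip(particles, scores) if s >= gcva]
        particles = [random.choice(survivors) for _ in range(ns)]
    # Step 3 with L = 100: return the particle with the highest CVA.
    return max(particles, key=lambda p: cva(*p))
```

In practice each `cva` call runs a full $K$-fold cross-validation, which dominates the runtime; this is why the maximum iteration number is kept small in the paper.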

4. A case study in Hong Kong

In Hong Kong (HK), certificates related to EAM have been awarded to a number of public utilities corporations, such as China Light and Power Co. Ltd. (CLP), Mass Transit Railway Corporation (MTRC), the Hong Kong and China Gas Co. Ltd. (TG), etc. Some E&M buildings, small and medium-sized enterprises (SMEs), services organizations, and plants, as a substantial part of EAM, have not adopted the EAM standard completely. The consultancy fee for accomplishing an asset management certificate is extremely high, and no specialized department or organization evaluates performance levels for asset management in Hong Kong. In order to investigate the performance of asset management and EAM standard applications, a sampling survey was conducted. According to the PAS-55 guideline, a structured questionnaire was designed, and all questionnaires were sent to 40 Operation and Maintenance (OM) departments/companies in Hong Kong. The objects of investigation included public services of the HK government and commercial, residential, industrial, and composite building portfolios.

The questionnaire covered general information about companies, information management standards, and significance levels for standard implementation. There were 60 questions in total, and the main part focused on guidelines for information management in EAM. General information included the categories of operations of respondents, the number of employees, the number of years in business operation, and the total operation and maintenance cost of buildings/plants. Then, the questionnaire listed the criteria of asset information management which needed to be applied. Besides filling in the questionnaire, face-to-face interviews were conducted by a researcher, who could explain more about the purpose and specific meaning of the questions for respondents. This improved the accuracy and efficiency of the survey data. In addition, some questions were required to be discussed in order to obtain comprehensive information. After respondents finished the questionnaire, the researcher checked whether anything was missed or overlooked. During the whole process, the respondents could communicate with the researcher and ask for further explanations of the listed questions. Consequently, the integrity and validity checks were


completed for all questionnaires. For the details of the questionnaires, please see the Appendix.

The information management standard in PAS-55 lists 53 criteria, which belong to four parts. The 40 departments/companies stand for 40 samples, and there are 53 attributes for each sample. Therefore, the collected survey data consist of 40 samples with 53 attributes, and their associated performance levels are 1, 2 and 3. The performance levels (the labels of the support vector machines) are defined as follows. Performance level 1 denotes a satisfactory performance in standard adoption, with a high percentage of achievements ranging from 75% to 100%. Performance level 2 shows an average performance compared with performance levels 1 and 3; the percentage of standards adoption is from 45% to 74%. Performance level 3 stands for companies/organizations with a low percentage of standard application, in a range of 20% to 44%. They may ignore some critical criteria, or they may not comply with the standard step by step. Hence, a few potential risks should be emphasized and some processes are required to be immediately improved.

Half of the samples are regarded as the training data set, and the rest are treated as the testing data set. Because the sample size of the training data set is only 20, 5-fold cross-validation is used in the proposed Bayesian inference method for finding optimal parameters of support vector machines. The optimal parameters found by the proposed Bayesian inference method are listed in Table 1. The support vector machines with the optimal parameters are used to predict the training data set and result in a high training accuracy of 100%. Moreover, the support vector machines with the same optimal parameters are used to predict the unknown testing data set and generate a high testing accuracy of 95%. To highlight the superiority of the proposed Bayesian inference method, some randomly selected parameters are used in support vector machines for processing the same training and testing data sets. The results are tabulated in Table 1, where it is clear that the randomly selected parameters cause the overfitting problem of support vector machines. Moreover, we observed that the higher the parameter $C$, the higher the training accuracy; and the lower the parameter $\gamma$, the higher the testing accuracy. Nevertheless, to optimize support vector machines and achieve both high training and testing accuracies, the parameters $C$ and $\gamma$ should be jointly optimized.

In this paper, all codes were written in MATLAB. To realize the support vector machines, we used the well-known MATLAB package LIBSVM developed by Chang and Lin [16]. To realize the Bayesian inference on the optimal parameters of the support vector machines, we developed the MATLAB codes ourselves. The calculation time of the proposed method, measured in MATLAB on a computer with a 2.33 GHz CPU and 2 GB of RAM, was 15 s. The calculation time was not particularly short because it was affected by the number of random particles and the computer performance.

5. Discussion

The final testing results show that 19 out of 20 samples were correctly classified. The 5% misclassification was possibly caused by samples collected from different business natures, whose performance requirements and critical businesses may not be completely consistent. What is more, there is not a huge amount of data available to support the training and testing samples,

Table 1
Parameters of support vector machines and the training and testing accuracies.

The optimal parameters obtained by the proposed method:
C = 3.1591 × 10^4, γ = 3.7764 × 10^−4
Training accuracy: 100%; testing accuracy: 95%

since asset management departments are seldom set up in companies at present. For further applications of the proposed method, after more companies fill in the questionnaire, the method will automatically classify each company's performance. According to the classification results, respondents will become aware of their current performance and improve it with reference to the engineering asset management guidelines and specifications. Moreover, through investigations of different companies and organizations, the data sample sizes will be continuously increased and updated, and the prediction accuracy will be improved as well.

6. Conclusion

In this paper, an intelligent classification method for multiclass classification problems, such as the evaluation of the performance of EAM, was developed by using optimized support vector machines. To find the optimal parameters of the support vector machines, a novel Bayesian inference method was proposed. Firstly, a state space model was constructed to establish the relationship between the parameters of the support vector machines, namely the kernel parameter γ and the error penalty constant C, and the guess cross-validation accuracies. The guess cross-validation accuracies were monotonically increasing and aimed to alleviate the overfitting problem of the support vector machines. Secondly, a particle filter was introduced that uses a number of random particles and their associated weights to infer the posterior probability density functions of the parameters of the support vector machines. Once the posterior probability density functions were derived, the optimal parameters of the support vector machines could be found. The industrial survey data were investigated to illustrate how the proposed Bayesian inference method works. The results showed that the proposed Bayesian inference method is able to produce a high training accuracy of 100% and a high testing accuracy of 95%, while the randomly selected parameters can only produce high training accuracies with low testing accuracies. In other words, if the parameters of the support vector machines for evaluating the performance of EAM are incorrectly chosen, the overfitting problem happens.
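The inference loop summarized above can be illustrated with a toy sequential Monte Carlo sketch (a schematic Python example, not the authors' MATLAB implementation: `cv_accuracy` stands in for the 5-fold cross-validation accuracy of an RBF-kernel SVM, the exponential weighting is one plausible likelihood choice, and the log-scale search ranges are invented for illustration):

```python
import math
import random

def particle_filter_svm_params(cv_accuracy, n_particles=200, n_iterations=10,
                               sigma=0.2, seed=0):
    """Toy sequential Monte Carlo search over (log10(C), log10(gamma)).

    Particles are weighted by the accuracy of their (C, gamma) pair and then
    resampled, so the particle cloud gradually concentrates in
    high-accuracy regions of the parameter space.
    """
    rng = random.Random(seed)
    # Initialize particles uniformly over a broad (assumed) log-scale range.
    particles = [(rng.uniform(-2, 5), rng.uniform(-5, 2))
                 for _ in range(n_particles)]
    for _ in range(n_iterations):
        # Weight each particle by an exponentiated accuracy score.
        weights = [math.exp(10 * cv_accuracy(10 ** lc, 10 ** lg))
                   for lc, lg in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Resample in proportion to the weights, then jitter (state transition).
        resampled = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(lc + rng.gauss(0, sigma), lg + rng.gauss(0, sigma))
                     for lc, lg in resampled]
    # Report the posterior mean as the parameter estimate.
    mean_lc = sum(lc for lc, _ in particles) / n_particles
    mean_lg = sum(lg for _, lg in particles) / n_particles
    return 10 ** mean_lc, 10 ** mean_lg
```

For instance, if the accuracy surface peaks near C = 10^3 and γ = 10^−3, the particle cloud drifts toward that neighborhood over the iterations.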

In this paper, we provided a relatively new idea, namely Bayesian inference, as an alternative way to optimize support vector machines. In our future work, we will theoretically and thoroughly compare the proposed method with other optimization methods, such as the genetic algorithm [17], particle swarm optimization [18], etc. Furthermore, we will consider applying the proposed Bayesian inference to optimize the parameters of other supervised learning methods, because the support vector machine is just one kind of supervised learning method.

Acknowledgement

The work described in this paper was partly supported by a grant from the National Natural Science Foundation of China (Project No. 51505307), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 122513) and a grant from City University of Hong Kong (Project No. 7004251). The authors would like to thank the reviewers for their valuable comments. Additionally, the first author would like to express her sincere appreciation for Dr. Dong Wang's comments and suggestions.

Table 1 (continued)
Some randomly selected parameters:
C = 1,        γ = 1 × 10^−3:  training accuracy 45%,  testing accuracy 35%
C = 1 × 10^3, γ = 10^−1:      training accuracy 100%, testing accuracy 60%
C = 1 × 10^3, γ = 1:          training accuracy 100%, testing accuracy 35%
C = 1 × 10^4, γ = 10:         training accuracy 100%, testing accuracy 35%

J. Zhong et al. / Neurocomputing 211 (2016) 159–171

Appendix


References

[1] J.E. Amadi-Echendu, R. Willett, K. Brown, T. Hope, J. Lee, J. Mathew, N. Vyas, B.-S. Yang, What is engineering asset management?, in: Definitions, Concepts and Scope of Engineering Asset Management, Springer, Harrogate, United Kingdom, 2010, pp. 3–16.

[2] J. Woodhouse, ISO 55000, 2013.

[3] V. Vapnik, The Nature of Statistical Learning Theory, Springer Science & Business Media, New York, NY, USA, 2013.

[4] S. Ding, X. Hua, Recursive least squares projection twin support vector machines for nonlinear classification, Neurocomputing 130 (2014) 3–9.

[5] Q. He, C. Du, Q. Wang, F. Zhuang, Z. Shi, A parallel incremental extreme SVM classifier, Neurocomputing 74 (2011) 2532–2540.

[6] E. Zhou, M.C. Fu, S.I. Marcus, A particle filtering framework for randomized optimization algorithms, in: Proceedings of the 40th Conference on Winter Simulation, Winter Simulation Conference, 2008, pp. 647–654.

[7] Y. Eroğlu, S.U. Seçkiner, Wind farm layout optimization using particle filtering approach, Renew. Energy 58 (2013) 95–107.

[8] D. Wang, P.W. Tse, Prognostics of slurry pumps based on a moving-average wear degradation index and a general sequential Monte Carlo method, Mech. Syst. Signal Process. 56 (2015) 213–229.

[9] D. Wang, S. Sun, P.W. Tse, A general sequential Monte Carlo method based optimal wavelet filter: a Bayesian approach for extracting bearing fault features, Mech. Syst. Signal Process. 52–53 (2015) 293–308.

[10] B. Schölkopf, C.J. Burges, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, USA, 1999.

[11] O.A. Arqub, A.-S. Mohammed, S. Momani, T. Hayat, Numerical solutions of fuzzy differential equations using reproducing kernel Hilbert space method, Soft Comput. (2015) 1–20.

[12] O.A. Arqub, Adaptation of reproducing kernel algorithm for solving fuzzy Fredholm–Volterra integrodifferential equations, Neural Comput. Appl. (2015) 1–20.

[13] M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Process. 50 (2002) 174–188.

[14] A. Doucet, A.M. Johansen, A tutorial on particle filtering and smoothing: fifteen years later, Handb. Nonlinear Filter. 12 (2009) 656–704.

[15] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, Springer, Berlin, 2001.

[16] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27.

[17] O.A. Arqub, Z. Abo-Hammour, Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm, Inf. Sci. 279 (2014) 396–415.

[18] J. Kennedy, Particle swarm optimization, in: Encyclopedia of Machine Learning, Springer, New York, USA, 2010, pp. 760–766.

Jingjing Zhong is a PhD candidate in the Department of Systems Engineering & Engineering Management, City University of Hong Kong. She received her Master's in Management, Economics and Industrial Engineering from Politecnico di Milano in Italy in 2009. Her research interests include the Engineering Asset Management standard PAS-55, performance management, benchmarking and maintenance management.

Ir Dr. Peter W. Tse is currently the Group Leader of the Smart Engineering Asset Management Laboratory (SEAM) and the Director of the Croucher Optical Non-destructive Testing Laboratory in the Department of Systems Engineering and Engineering Management at the City University of Hong Kong (CityU). SEAM was established through generous donations from the industry of Hong Kong. The mission of SEAM is to provide support to industry for achieving near-zero breakdown of equipment and maintaining high-quality services through the smart management of assets. As of today, SEAM has research collaboration/consultancy projects with over 30 international and local companies. Dr. Tse is an O-Committee Member of the Technical Committees of Non-Destructive Testing (TC 135), Safety of Machinery (TC 199) and Mechanical Vibration and Shock (TC 108) of the International Organization for Standardization (ISO). Currently he is a registered Professional Engineer in Canada and a Chartered Engineer in the United Kingdom. He has been awarded the PCN (Personnel Certification in Non-Destructive Testing) Certificate of Competence in Condition Monitoring from the British Institute of Non-Destructive Testing (BINDT) and completed the Vibration Training Course in Vibration Analysis Level 2 in accordance with ISO 18436 Part 2. As of today, he has published over 250 articles in various international journals, proceedings, and professional reports.

Dr. Dong Wang graduated from City University of Hong Kong in 2015. He is interested in condition monitoring and fault diagnosis, prognosis and health management, signal processing, data mining, non-destructive testing and statistical modeling. He has published over 30 SCI-indexed journal papers, most of which are highly ranked by Journal Citation Reports. His research works appear in Journal of Sound and Vibration, Journal of Vibration and Acoustics (ASME Transactions), Mechanical Systems and Signal Processing, IEEE Transactions on Instrumentation and Measurement, Review of Scientific Instruments, Measurement Science and Technology, Measurement, Journal of Power Sources, Journal of Vibration and Control, Applied Soft Computing, etc. He is a reviewer for 28 SCI-indexed journals, such as IIE Transactions, Mechanical Systems and Signal Processing, Journal of Sound and Vibration, Measurement Science and Technology, IEEE Signal Processing Letters, IEEE Transactions on Industrial Electronics, IEEE Transactions on Signal Processing, IEEE Transactions on Reliability, IEEE Transactions on Instrumentation and Measurement, Microelectronics Reliability, Measurement, Reliability Engineering & System Safety, Journal of Vibration and Control, etc. In recognition of his contributions to Journal of Sound and Vibration, he was awarded Outstanding Reviewer status in 2014. He was a lead guest editor for Shock and Vibration in 2015. Currently, he is a lead guest editor for Advances in Mechanical Engineering and an editorial board member for the Journal of Low Frequency Noise, Vibration and Active Control.