Hindawi Publishing Corporation
The Scientific World Journal
Volume 2013, Article ID 574748, 5 pages
http://dx.doi.org/10.1155/2013/574748

Research Article
The Generalization Error Bound for the Multiclass Analytical Center Classifier

Zeng Fanzi and Ma Xiaolong

Key Laboratory for Embedded and Network Computing of Hunan Province, Hunan University, Changsha 410082, China

Correspondence should be addressed to Zeng Fanzi; zengfanzi@126.com

Received 29 October 2013; Accepted 5 December 2013

Academic Editors: J. Shu and F. Yu

Copyright © 2013 Z. Fanzi and M. Xiaolong. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents the multiclass classifier based on the analytical center of feasible space (MACM). The classifier is formulated as a quadratically constrained linear optimization and does not need to repeatedly construct binary classifiers to separate a single class from all the others. Its generalization error upper bound is proved theoretically, and experiments on benchmark datasets validate the generalization performance of MACM.

1. Introduction

Multiclass classification is an important and ongoing research subject in machine learning, with immense applications such as machine vision [1, 2], text and speech categorization [3, 4], natural language processing [5], and disease diagnosis [6, 7]. Two kinds of approaches have been proposed to solve the multiclass classification problem [8]. The first approach extends a binary classifier to handle the multiclass case directly; this includes neural networks, decision trees, support vector machines, naive Bayes, and K-nearest neighbors. The second approach decomposes the multiclass classification problem into several binary classification tasks. Several methods are used for this decomposition: one-versus-all [9], all-versus-all [10], and error-correcting output coding [11].

The one-versus-all approach reduces the problem of classifying among K classes to K binary problems, where each problem discriminates a given class from the other K - 1 classes.

For the all-versus-all method, a binary classifier is built to discriminate between each pair of classes, while discarding the rest of the classes. This requires building K(K - 1)/2 binary classifiers for a K-class problem. When testing a new example, voting is performed among the classifiers and the class with the maximum number of votes wins.

Error-correcting output coding works by training N binary classifiers to distinguish between the K different classes. Each class is given a codeword of length N according to a binary matrix M, each row of which corresponds to a certain class.

The above multiclass classification algorithms need to construct binary classifiers repeatedly to separate a single class from all the others in a K-class problem, which leads to daunting computation and low classification efficiency. Reference [12] proposes the multiclass support vector machine (MSVM), which corresponds to a simple quadratic optimization and does not require repeatedly constructing binary classifiers. However, the support vector machine corresponds to the center of the largest inscribed hypersphere of the feasible space. When the feasible space, that is, the space of hypotheses consistent with the training data, is elongated or asymmetric, the support vector machine is not effective [13].
To address the above problems, a multiclass classifier based on the analytical center of feasible space (MACM) is proposed. To validate its generalization performance theoretically, its generalization error upper bound is formulated and proved, and experiments on benchmark datasets confirm the generalization performance of MACM.

2. Multiclass Analytical Center Classifier

To facilitate the discussion of the multiclass analytical center classifier, the following definitions are introduced.



Definition 1 (chunk). A vector $\mathbf{v} = (v_1, \ldots, v_{kd}) \in \mathbb{R}^{kd}$ is broken into $k$ chunks $(\mathbf{v}_1, \ldots, \mathbf{v}_k)$, where the $i$th chunk is $\mathbf{v}_i = (v_{(i-1)d+1}, \ldots, v_{id})$.

Definition 2 (expansion). Let $\mathrm{Vec}(\mathbf{x}, i) \in \mathbb{R}^{kd}$ be the vector obtained by embedding $\mathbf{x} \in \mathbb{R}^d$ into the $kd$-dimensional space, writing the coordinates of $\mathbf{x}$ in the $i$th chunk of a vector in $\mathbb{R}^{kd}$; $\mathbf{0}_\ell$ denotes the zero vector of length $\ell$. Then $\mathrm{Vec}(\mathbf{x}, i)$ can be written formally as the concatenation of three vectors, $\mathrm{Vec}(\mathbf{x}, i) = (\mathbf{0}_{(i-1)d}, \mathbf{x}, \mathbf{0}_{(k-i)d}) \in \mathbb{R}^{kd}$. Define $\mathrm{Vec}(\mathbf{x}, i, j) = \mathrm{Vec}(\mathbf{x}, i) - \mathrm{Vec}(\mathbf{x}, j)$ as the vector in $\mathbb{R}^{kd}$ in which $\mathbf{x}$ is embedded in the $i$th chunk and $-\mathbf{x}$ in the $j$th chunk.

Definition 3. Given a sample $(\mathbf{x}, y) \in \mathbb{R}^d \times \{1, \ldots, k\}$, its expansion is defined as $G(\mathbf{x}, y) = \{\mathrm{Vec}(\mathbf{x}, i, j) \mid i = y,\ j = 1, \ldots, k,\ j \neq i\}$; the expansion of the whole sample set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots\}$ is defined as $G(S) = \bigcup_{(\mathbf{x}, y) \in S} G(\mathbf{x}, y)$.
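Definitions 1-3 are straightforward to realize in code. The sketch below is an illustrative NumPy implementation (function names are ours, not the paper's) of the chunk embedding Vec(x, i), the difference embedding Vec(x, i, j), and the expansion G(x, y), using the 1-based chunk indexing of the definitions.

```python
import numpy as np

def vec(x, i, k):
    """Vec(x, i) of Definition 2: x written into the i-th chunk (1-based) of a k*d zero vector."""
    d = x.shape[0]
    out = np.zeros(k * d)
    out[(i - 1) * d: i * d] = x
    return out

def vec_pair(x, i, j, k):
    """Vec(x, i, j) = Vec(x, i) - Vec(x, j): x in chunk i and -x in chunk j."""
    return vec(x, i, k) - vec(x, j, k)

def expansion(x, y, k):
    """G(x, y) of Definition 3: one difference embedding per competing class j != y."""
    return [vec_pair(x, y, j, k) for j in range(1, k + 1) if j != y]

# toy usage: a 2-dimensional point of class 1 in a 3-class problem gives two rows in R^6
for row in expansion(np.array([0.5, -1.0]), y=1, k=3):
    print(row)
```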

Definition 4 (piecewise linear separability). The point sets $A^i \in \mathbb{R}^{d \times m_i}$, $i = 1, \ldots, k$ (where $k$ is the number of class labels and $m_i$ the number of samples belonging to the $i$th class), are piecewise linearly separable if there exist $\mathbf{w}^i \in \mathbb{R}^d$, $\gamma^i \in \mathbb{R}$, $i = 1, \ldots, k$, where $d$ is the dimension of a point, such that

$A^i_\ell \mathbf{w}^i - \gamma^i > A^i_\ell \mathbf{w}^j - \gamma^j$, $\quad i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$.  (1)

Definition 5 (piecewise linear classifier). Assume $\mathbf{w} = (\mathbf{w}^1, \ldots, \mathbf{w}^k)$, where $(\mathbf{w}^1, \ldots, \mathbf{w}^k) \in \mathbb{R}^{k(d+1)} = \mathbb{R}^{d+1} \times \cdots \times \mathbb{R}^{d+1}$. Given a new point $\mathbf{x} \in \mathbb{R}^{d+1}$, a piecewise linear classifier is a function $f : \mathbb{R}^{d+1} \to \{1, \ldots, k\}$ defined as

$f(\mathbf{x}) = \arg\max_{i = 1, \ldots, k} \mathbf{w}^i \mathbf{x}'$,  (2)

where $\arg\max$ returns a class label corresponding to the maximum value. (A sketch of this classifier in code appears after problem (9) below.)

To simplify the notation for the formulation of the multiclass analytical center classifier, we consider an augmented weight space as follows. Let

$A^i_\ell = [A^i_\ell, 1] \in \mathbb{R}^{d+1}$, $\quad \mathbf{w}^i = [\mathbf{w}^i, \gamma^i] \in \mathbb{R}^{d+1}$, $\quad i = 1, \ldots, k$, $\ell = 1, \ldots, m_i$;  (3)

then inequality (1) can be rewritten as

$A^i_\ell \mathbf{w}^i - A^i_\ell \mathbf{w}^j > 0$, $\quad i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$.  (4)

Let $\mathbf{w} = (\mathbf{w}^1, \ldots, \mathbf{w}^k) \in \mathbb{R}^{k(d+1)} = \mathbb{R}^{d+1} \times \cdots \times \mathbb{R}^{d+1}$. According to Definition 2, embedding $A^i_\ell$ into the space $\mathbb{R}^{k(d+1)}$, inequality (4) takes the following form:

$(\mathrm{Vec}(A^i_\ell, i) - \mathrm{Vec}(A^i_\ell, j))\,\mathbf{w} > 0$, $\quad i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$.  (5)

Consider that

$\mathrm{Vec}(A^i_\ell, i, j) = \mathrm{Vec}(A^i_\ell, i) - \mathrm{Vec}(A^i_\ell, j)$, $\quad i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$.  (6)

Thus inequality (5) can be rewritten as follows:

$\mathrm{Vec}(A^i_\ell, i, j)\,\mathbf{w} > 0$, $\quad i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$.  (7)

Inequality (7) represents the feasible space of $\mathbf{w}$ in the higher-dimensional space $\mathbb{R}^{k(d+1)}$. Similar to binary classification based on the analytical center of version space [8], we define the slack variables $S_{ij\ell} = \mathrm{Vec}(A^i_\ell, i, j)\,\mathbf{w}$, $i, j = 1, \ldots, k$, $i \neq j$, $\ell = 1, \ldots, m_i$, and obtain the following minimization problem, whose solution corresponds to the analytical center of the higher-dimensional space:

$\min\ \Phi(\mathbf{w}) = -\sum_{\ell=1}^{m_i} \sum_{i, j = 1,\ i \neq j}^{k} \ln S_{ij\ell}$
s.t. $h(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}\mathbf{w}' - 1 = 0$.  (8)

To further simplify the formulation of the multiclass analytical center classifier, we introduce the following notation: $M = k(k-1)\sum_{i=1}^{k} m_i$ and $B = \{\mathrm{Vec}(A^i_\ell, i, j) \mid i, j = 1, \ldots, k,\ i \neq j,\ \ell = 1, \ldots, m_i\} \in \mathbb{R}^{M \times k(d+1)}$; let $B_i$ denote the $i$th row vector of $B$. Then the optimization problem (8) can be rewritten as follows:

$\min\ \Phi(\mathbf{w}) = -\sum_{i=1}^{M} \ln S_i$, where $S_i = B_i \mathbf{w}$,
s.t. $h(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}\mathbf{w}' - 1 = 0$.  (9)

After solving the optimization problem (9) to obtain the optimal weight $\mathbf{w}^{*}$, we have a piecewise linear classifier $f : \mathbb{R}^{d+1} \to \{1, \ldots, k\}$ computed in the following way:

$f(\mathbf{x}) = \arg\max_{i = 1, \ldots, k} \mathbf{w}^{*}_i \mathbf{x}'$,  (10)

where $\arg\max$ returns a class label corresponding to the maximum value.

If the dataset is not piecewise linearly separable, a kernel function is used to map the data into a high-dimensional linear space.
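As a rough illustration of how problems (8)-(9) and classifier (10) could be implemented, the sketch below builds the constraint matrix B from augmented points as in (3) and (7), approximates the analytical center by projected gradient ascent on the sphere $\|\mathbf{w}\|^2 = 2$ (a simple substitute for whatever constrained solver the authors used, which the paper does not specify), and predicts with the argmax rule (10). It assumes the data are piecewise linearly separable and that a strictly feasible starting point is supplied.

```python
import numpy as np

def build_B(X, y, k):
    """Constraint matrix B of problem (9): one row Vec(A_l^i, i, j) per sample and competing class."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])    # augmented points [x, 1] as in (3)
    d1 = Xa.shape[1]
    rows = []
    for x, yi in zip(Xa, y):
        for j in range(1, k + 1):
            if j != yi:
                row = np.zeros(k * d1)
                row[(yi - 1) * d1: yi * d1] = x      # +x in chunk y_i
                row[(j - 1) * d1: j * d1] = -x       # -x in chunk j
                rows.append(row)
    return np.asarray(rows)

def analytic_center(B, w0, lr=0.01, iters=2000):
    """Approximate solution of min -sum ln(B w) s.t. ||w||^2 = 2, by projected gradient ascent."""
    w = np.sqrt(2.0) * w0 / np.linalg.norm(w0)       # start on the constraint sphere
    assert np.all(B @ w > 0), "the starting point must be strictly feasible"
    for _ in range(iters):
        grad = B.T @ (1.0 / (B @ w))                 # gradient of sum ln(B w)
        step = lr
        w_new = np.sqrt(2.0) * (w + step * grad) / np.linalg.norm(w + step * grad)
        while np.any(B @ w_new <= 0):                # backtrack to keep all slacks positive
            step *= 0.5
            w_new = np.sqrt(2.0) * (w + step * grad) / np.linalg.norm(w + step * grad)
        w = w_new
    return w

def predict(w, X, k):
    """Classifier (10): split w into k chunks and return argmax_i w_i* x' as a label in 1..k."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    W = w.reshape(k, -1)
    return np.argmax(Xa @ W.T, axis=1) + 1
```

Any off-the-shelf constrained optimizer could replace the projected gradient loop; the essential ingredients are the log-barrier objective over the rows of B and the spherical normalization constraint.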

3. Generalization Error Bound of the Multiclass Analytical Center Classifier

To analyze the generalization error bound theoretically, we introduce the definitions of classification margin and data radius and then derive the margin-based generalization error bound of MACM.


Definition 6 (classification margin). Given the linear classifier $h(\mathbf{x}_j) = \arg\max_{i = 1, \ldots, k} \mathbf{w}_i \mathbf{x}'_j$, the classification margin of the sample $(\mathbf{x}_j, y_j) \in \mathbb{R}^d \times \{1, \ldots, k\}$ is defined as follows:

$r_j = \min_{l = y_j,\ i = 1, \ldots, k,\ i \neq l} (\mathbf{w}_l \mathbf{x}'_j - \mathbf{w}_i \mathbf{x}'_j)$.  (11)

For the whole training set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}$, the minimal margin is as follows:

$\mathrm{margin}_S(h) = \min_{(\mathbf{x}_j, y_j) \in S} r_j$.  (12)

Definition 7 (data radius). Given the dataset $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}$, the data radius is defined as follows:

$\varsigma(S) = \max_{\mathbf{x}_i \in S} \|\mathbf{x}_i\|$.  (13)
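Definitions 6 and 7 translate directly into a few lines of code. The sketch below (with names of our choosing) computes the per-sample margin r_j of (11), the minimal margin of (12), and the data radius of (13) for a weight matrix W whose rows are the per-class weight vectors.

```python
import numpy as np

def sample_margin(W, x, label):
    """r_j of (11): score of the true class minus the best competing score."""
    scores = W @ x
    others = np.delete(scores, label - 1)
    return scores[label - 1] - np.max(others)

def min_margin(W, X, y):
    """margin_S(h) of (12): the smallest sample margin over the training set."""
    return min(sample_margin(W, x, label) for x, label in zip(X, y))

def data_radius(X):
    """Data radius of (13): the largest Euclidean norm among the points."""
    return float(np.max(np.linalg.norm(X, axis=1)))
```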

Theorem 8. Define the data radius of the dataset $S$ as $\varsigma(S)$ and the data radius of the dataset $\mathrm{Vec}(S, i, j)$ as $\varsigma(\mathrm{Vec}(S, i, j))$. If $\varsigma(S) \le R$, then $\varsigma(\mathrm{Vec}(S, i, j)) \le 2R$.

Proof. Consider the following:

$\varsigma(\mathrm{Vec}(S, i, j)) = \max_{\mathbf{x}_l \in S} \|\mathrm{Vec}(\mathbf{x}_l, i, j)\| = \max_{\mathbf{x}_l \in S} \|\mathrm{Vec}(\mathbf{x}_l, i) - \mathrm{Vec}(\mathbf{x}_l, j)\| \le \max_{\mathbf{x}_l \in S} (\|\mathrm{Vec}(\mathbf{x}_l, i)\| + \|\mathrm{Vec}(\mathbf{x}_l, j)\|) = \max_{\mathbf{x}_l \in S} \|\mathrm{Vec}(\mathbf{x}_l, i)\| + \max_{\mathbf{x}_l \in S} \|\mathrm{Vec}(\mathbf{x}_l, j)\|$.  (14)

Because $\|\mathrm{Vec}(\mathbf{x}, i)\| = \|(\mathbf{0}, \mathbf{x}, \mathbf{0})\| = \|\mathbf{x}\|$, $\varsigma(\mathrm{Vec}(S, i, j)) \le 2\max_{\mathbf{x}_i \in S}\|\mathbf{x}_i\| = 2R$. This ends the proof of Theorem 8.

Theorem 9 (see [14]). Consider thresholding of a real-valued function class $\mathcal{H}$ with unit weight vectors $\|\mathbf{w}\| = 1$ on the inner product space $X$, and fix a margin $r \in \mathbb{R}^{+}$. For any probability distribution $\mathcal{D}$ on $X \times \{-1, 1\}$ with support in a ball of radius $R$ around the origin, with probability $1 - \delta$ over $m$ random samples $S$, any hypothesis $h \in \mathcal{H}$ with $\mathrm{margin}_S(h) \ge r$ on $S$ has error no more than

$\mathrm{err}_{\mathcal{D}}(h) = \varepsilon(r, m, \delta, R) = \dfrac{2}{m}\left(\dfrac{64R^2}{r^2}\log\dfrac{emr}{8R^2}\log\dfrac{32m}{r^2} + \log\dfrac{4}{\delta}\right)$,  (15)

provided $m > 2/\varepsilon$ and $64R^2/r^2 < m$.

From Definition 3 and inequality (7), to correctly classify the samples $A^i$, $i = 1, \ldots, k$, the condition $G(A)\mathbf{w} = G(\bigcup_{i=1}^{k} A^i)\,\mathbf{w} > \mathbf{0}$ must be satisfied. Here one introduces the sample pairs $P^{+}(A) = \{G(A), +\mathbf{1}\} \in \mathbb{R}^{k(d+1)} \times \{+1\}$ and $P^{-}(A) = \{-G(A), -\mathbf{1}\} \in \mathbb{R}^{k(d+1)} \times \{-1\}$, where $\mathbf{0}$ and $\mathbf{1}$ denote vectors of the corresponding dimension with all elements 0 or 1, respectively. So one can construct the new sample set $P(A) = P^{+}(A) \cup P^{-}(A) \in \mathbb{R}^{k(d+1)} \times \{-1, +1\}$.
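The paired sample set P(A) used in Theorems 10 and 11 amounts to stacking the expansion rows with label +1 together with their negations labeled -1. A minimal sketch (ours, not the authors' code) follows; the origin symmetry it produces is exactly what allows the bias b to be dropped in the proof of Theorem 10 below.

```python
import numpy as np

def paired_samples(G_rows):
    """P(A) = P+(A) U P-(A): each expansion row with label +1 and its negation with label -1."""
    G = np.asarray(G_rows)
    X = np.vstack([G, -G])
    y = np.concatenate([np.ones(len(G)), -np.ones(len(G))])
    return X, y
```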

Theorem 10. The binary classification of the sample set $P(A)$ by the analytical center classifier is equivalent to the multiclass classification of the sample set $A = \{A^i\}_{i=1}^{k}$ by the multiclass analytical center classifier.

Proof. Assume that $\{X^{+}_i, Y^{+}\} \in P^{+}(A)$ and $\{X^{-}_i, Y^{-}\} \in P^{-}(A)$, $i = 1, \ldots, |P^{+}(A)|$; then the binary classification is to solve the following feasibility problem:

$Y^{+}(X^{+}_i \mathbf{w} + b) > 0$, $\quad i = 1, \ldots, |P^{+}(A)|$,
$Y^{-}(X^{-}_i \mathbf{w} + b) > 0$, $\quad i = 1, \ldots, |P^{-}(A)|$.  (16)

Suppose the bias $b$ equals 0, because $P^{-}(A)$ and $P^{+}(A)$ are symmetric about the origin. The feasibility constraints can then be rewritten as follows:

$X^{+}_i \mathbf{w} > 0$, $\quad i = 1, \ldots, |P^{+}(A)|$.  (17)

The feasibility constraints (17) define the feasible space of the weight vector $\mathbf{w} \in \mathbb{R}^{k(d+1)}$; the binary classification by the analytical center classifier can be formulated as follows:

$\min\ \Phi(\mathbf{w}) = -\sum_{i=1}^{|P^{+}(A)|} \ln(X^{+}_i \mathbf{w})$
s.t. $\tfrac{1}{2}\mathbf{w}'\mathbf{w} - 1 = 0$.  (18)

Because $X^{+}_i \in G(A)$ and $|P^{+}(A)| = |G(A)|$, problem (18) is equivalent to problem (8). This ends the proof of Theorem 10.

Theorem 11. Consider the classifier set $\mathcal{H}$ from Definition 5 with $\sum_{i=1}^{k}\|\mathbf{w}_i\| = 1$ on the inner product space $X$, where $h : \mathbb{R}^{d+1} \to \{1, \ldots, k\}$, and fix a margin $r \in \mathbb{R}^{+}$. For any probability distribution $\mathcal{D}$ on $X \times \{1, \ldots, k\}$ with support in a ball of radius $R$ around the origin, with probability $1 - \delta$ over $m$ random samples $S$, any hypothesis $h \in \mathcal{H}$ with $\mathrm{margin}_S(h) \ge r$ on $S$ has error no more than

$\mathrm{err}_{\mathcal{D}}(h) = \varepsilon(r, m, \delta, R, k) = \dfrac{4(k-1)}{m}\left(\dfrac{256R^2}{r^2}\log\dfrac{emr}{32R^2}\log\dfrac{32m}{r^2} + \log\dfrac{4}{\delta}\right)$,  (19)

provided $m > 2/\varepsilon$ and $64R^2/r^2 < m$.

Proof. Because the samples in $P(S)$ are not independent, the generalization error bound cannot be obtained from Theorem 9 directly. Theorem 9 is, however, independent of the sample distribution, so we can construct a new sample distribution $\mathcal{D}'$. According to the new distribution and the dataset $S$, we generate an independent sample set $P'(S)$ with $m$ samples; that is, for every $(\mathbf{x}, y) \in S$, define $P'(\mathbf{x}, y)$ as the point sampled uniformly at random from $P(S)$ according to the distribution $\mathcal{D}'$; then we have $P'(S) = \bigcup_{(\mathbf{x}, y) \in S} P'(\mathbf{x}, y)$. From Theorem 8, the data radius $\varsigma(P'(S))$ of $P'(S)$ satisfies $\varsigma(P'(S)) \le 2R$. The generalization error of hypothesis $h$ over $P'(S)$, from Theorem 9, can be calculated as follows:

$\mathrm{err}_{\mathcal{D}'}(h) = \varepsilon(r, m, \delta, R) = \dfrac{2}{m}\left(\dfrac{256R^2}{r^2}\log\dfrac{emr}{32R^2}\log\dfrac{32m}{r^2} + \log\dfrac{4}{\delta}\right)$.  (20)

Let event $E_i$ denote that a sample in $P(S)$ is wrongly classified, and let event $C$ denote that a misclassification occurs in $P'(S)$. From the above analysis, the misclassification of any sample in $P(S)$ causes the misclassification of the corresponding point in $P'(S)$, so the probabilities of events $E_i$ and $C$ satisfy the following inequality:

$P(E_i) \le P(C)$.  (21)

Because each original sample contributes $2(k-1)$ points to $P(S)$, the probability of sample misclassification in $P(S)$ is written as follows:

$P\left(\bigcup_{i=1}^{2(k-1)} E_i\right)$.  (22)

From the union bound, we have the following inequality:

$P\left(\bigcup_{i=1}^{2(k-1)} E_i\right) \le \sum_{i=1}^{2(k-1)} P(E_i) \le \sum_{i=1}^{2(k-1)} P(C)$.  (23)

So the generalization error of hypothesis $h$ over $S$ is

$\mathrm{err}_{\mathcal{D}}(h) = \varepsilon(r, m, \delta, R, k) = \sum_{i=1}^{2(k-1)} \dfrac{2}{m}\left(\dfrac{256R^2}{r^2}\log\dfrac{emr}{32R^2}\log\dfrac{32m}{r^2} + \log\dfrac{4}{\delta}\right) = \dfrac{4(k-1)}{m}\left(\dfrac{256R^2}{r^2}\log\dfrac{emr}{32R^2}\log\dfrac{32m}{r^2} + \log\dfrac{4}{\delta}\right)$.  (24)

This ends the proof of Theorem 11.
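For concreteness, the bound of Theorem 11 can be evaluated numerically. The sketch below is a direct transcription of (19)/(24) under our reading of the reconstructed logarithm arguments; it says nothing about the tightness of the bound.

```python
import numpy as np

def macm_error_bound(r, m, delta, R, k):
    """epsilon(r, m, delta, R, k) of equation (19)."""
    main = (256.0 * R**2 / r**2) * np.log(np.e * m * r / (32.0 * R**2)) * np.log(32.0 * m / r**2)
    return 4.0 * (k - 1) / m * (main + np.log(4.0 / delta))

# example: 10^7 samples of radius at most 1, margin 1, 3 classes, confidence 95%
print(macm_error_bound(r=1.0, m=10**7, delta=0.05, R=1.0, k=3))
```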

4. Computational Experiments

In this section we present computational results comparing the multiclass analytical center classifier (MACM) and the multiclass support vector machine (MSVM) [12]. A description of each of the datasets follows this paragraph. The kernel function for the piecewise nonlinear MACM and MSVM methods is $k(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}\mathbf{x}'_i / N + 1)^n$, where $n$ is the desired polynomial degree.
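The polynomial kernel above is a one-liner; in the sketch below, N is treated as a fixed scaling constant and n as the polynomial degree (the paper does not spell out how N is chosen, so this is an assumption).

```python
import numpy as np

def poly_kernel(x, xi, N, n):
    """k(x, x_i) = (x x_i' / N + 1)^n with scaling constant N and degree n."""
    return (np.dot(x, xi) / N + 1.0) ** n
```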

Wine Recognition Data. The wine dataset uses the chemical analysis of wine to determine the cultivar. There are 178 points with 13 features. This is a three-class dataset distributed as follows: 59 points in class 1, 71 points in class 2, and 48 points in class 3.

Glass Identification Database. The Glass dataset is used to identify the origin of a sample of glass through chemical analysis. This dataset comprises six classes of 214 points with 9 features. The distribution of points by class is as follows: 70 float-processed building windows, 17 float-processed vehicle windows, 76 non-float-processed building windows, 13 containers, 9 tableware, and 29 headlamps.

Table 1: The generalization performance (%) of MACM and MSVM.

Dataset   Classifier   Degree of polynomial
                       1        3
Wine      MACM         97.74    98.65
          MSVM         97.19    97.75
Glass     MACM         56.46    69.38
          MSVM         55.14    66.15

Table 1 contains the results for MACM and MSVM on the wine and glass datasets. As anticipated, MACM produces better testing generalization than MSVM.

5. Summary

In this paper, a multiclass analytical center classifier based on the analytical center of feasible space, which corresponds to a simple quadratically constrained linear optimization, is proposed. To validate its generalization performance theoretically, its generalization error upper bound is formulated and proved. Experiments on the wine recognition and glass identification datasets show that the multiclass analytical center classifier outperforms the multiclass support vector machine in generalization error.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61370096 and 61173012 and by the Key Project of the Natural Science Foundation of Hunan Province under Grant no. 12JJA005.

References

[1] J. S. Prakash, K. A. Vignesh, C. Ashok, and R. Adithyan, "Multiclass support vector machines classifier for machine vision application," in Proceedings of the International Conference on Machine Vision and Image Processing, pp. 197-199, 2013.
[2] A. J. Joshi, F. Porikli, and N. P. Papanikolopoulos, "Scalable active learning for multiclass image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2259-2273, 2012.
[3] S. Ai-Xiang, L. Ming-Hui, H. Shun-Liang, and Z. Jun, "A new hypersphere multi-class support vector machine applied in text classification," in Proceedings of the IEEE 3rd International Conference on Communication Software and Networks (ICCSN '11), pp. 478-481, Xi'an, China, May 2011.
[4] L. La, Q. Guo, D. Yang, and Q. Cao, "Multiclass boosting with adaptive group-based kNN and its application in text categorization," Mathematical Problems in Engineering, vol. 2012, Article ID 793490, 24 pages, 2012.
[5] Y. Ni, C. J. Saunders, S. Szedmak, and M. Niranjan, "Structure learning for natural language processing," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP '09), pp. 1-6, September 2009.
[6] H. Ankıshan and D. Yılmaz, "Comparison of SVM and ANFIS for snore related sounds classification by using the largest Lyapunov exponent and entropy," Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 238937, 13 pages, 2013.
[7] P. Adarsh and D. Jeyakumari, "Multiclass SVM-based automated diagnosis of diabetic retinopathy," in Proceedings of the International Conference on Communication and Signal Processing (ICCSP '13), pp. 206-210, Melmaruvathur, India, April 2013.
[8] M. Aly, Survey on Multiclass Classification Methods, 2005.
[9] A. O. Hatch and A. Stolcke, "Generalized linear kernels for one-versus-all classification: application to speaker recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), Toulouse, France, May 2006.
[10] M. A. Bagheri, G. A. Montazer, and S. Escalera, "Error correcting output codes for multiclass classification: application to two image vision problems," in Proceedings of the 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP '12), pp. 508-513, Shiraz, Iran, May 2012.
[11] C. W. Hsu and C. J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, 2002.
[12] E. J. Bredensteiner and K. P. Bennett, "Multicategory classification by support vector machines," Computational Optimization and Applications, vol. 12, no. 1-3, pp. 53-79, 1999.
[13] T. B. Trafalis and A. M. Malyscheff, "An analytic center machine," Machine Learning, vol. 46, no. 1-3, pp. 203-223, 2002.
[14] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2000.


Page 2: Research Article The Generalization Error Bound for the ...downloads.hindawi.com/journals/tswj/2013/574748.pdf · e Scientic World Journal Den ition (chunk). A vector, k =(V1,...,V

2 The Scientific World Journal

Definition 1 (chunk) A vector k = (V1 V

119896119889) isin R119896119889 is

broken into 119896 chunks (k1 k

119896) where the 119894th chunk k

119894=

(V(119894minus1)lowast119889+1

V119894lowast119889)

Definition 2 (expansion) Let Vec(119909 119894) isin R119896119889 be a vectorwhere x isin R119889 is embedded in 119896119889 dimensions space bywriting the coordinates of x in the 119894th chunk of a vectorin R119896119889 0ℓ denotes the zero vector of length ℓ ThenVec(x 119894) can be written formally as the concatenation of threevectors Vec(x 119894) = (0(119894minus1)lowast119889 x 0(119896minus119894)lowast119889) isin R119896119889 And defineVec(x 119894 119895) = Vec(x 119894) minus Vec(x 119895) as a vector where x isembedded in the 119894th chunk and minusx is embedded in the 119895thchunk of a vector inR119896119889

Definition 3 Given the sample (x 119910) isin R119889 times 1 119896its expansion is defined as 119866(x 119910) = Vec(x 119894 119895) | 119894 =119910 119895 = 1 119896119894 the expansion of the whole sample set 119878 =(x1 1199101) (x2 1199102) is defined as 119866(119878) = ⋃

(x119910)isin119878 119866(x 119910)

Definition 4 (piecewise linear separability) The point sets119860119894isin R119889times119898119894 119894 = 1 119896 (119896 represents the class labels and 119898

119894

the number of samples belonging to 119894th class) are piecewiselinear separable if there exists w119894 isin R119889 120574119894 isin R 119894 = 1 119896where 119889 represents the dimension of point such that

119860119894

ℓw119894 minus 120574119894 gt 119860119894

ℓw119895 minus 120574119895

119894 119895 = 1 119896 119894 = 119895 ℓ = 1 119898119894

(1)

Definition 5 (piecewise linear classifier) Assume w = (w1

w119896) where (w

1 w

119896) isin R119896lowast(119889+1) = R119889+1 times sdot sdot sdot timesR119889+1

Given a new point x isin R119889+1 a piecewise linear classifier is afunction 119891 R119889+1 rarr 1 119896 as follows

119891 (x) = argmax119894=1119896

w119894x1015840 (2)

where argmax returns to a class label corresponding to themaximum value

To simplify the notation for the formulation of multiclassanalytical center classifier we consider an augmented weightspace as follows

Let

119860119894

ℓ= [119860119894

1] isin R

119889+1 w119894 = [w

119894

120574] isin R

119889+1

119894 = 1 119896 ℓ = 1 119898119894

(3)

then inequality (1) can be rewritten as

119860119894

ℓw119894 minus 119860119894

ℓw119895 gt 0 119894 119895 = 1 119896 119894 = 119895 ℓ = 1 119898

119894

(4)

Let w = (w1 w

119896) isin R119896lowast(119889+1) = R119889+1timessdot sdot sdottimesR119889+1 Accord-

ing to Definition 2 embedding 119860119894ℓinto R119896lowast(119889+1) space

inequality (4) has the following form

(Vec (119860119894ℓ 119894) minus Vec (119860119894

ℓ 119895)) w gt 0

119894 119895 = 1 119896 119894 = 119895 ℓ = 1 119898119894

(5)

Consider that

Vec (119860119894ℓ 119894 119895) = Vec (119860119894

ℓ 119894) minus Vec (119860119894

ℓ 119895)

119894 119895 = 1 119896 119894 = 119895 ℓ = 1 119898119894

(6)

Thus inequality (6) can be rewritten as follows

Vec (119860119894ℓ 119894 119895) w gt 0 119894 119895 = 1 119896 119894 = 119895 ℓ = 1 119898

119894

(7)

Inequality (7) represents the feasible space of w in thehigher dimension spaceR119896lowast(119889+1) Similar to the binary classi-fication based on the analytical center of version space [8] wedefine the slack variable 119878

119894119895ℓ= Vec(119860119894

ℓ 119894 119895)w 119894 119895 = 1 119896

119894 = 119895 ℓ = 1 119898119894and then have the following minimization

problem whose solver corresponds to the analytical center ofhigher dimension space

min Φ (w) = minus119898119894

sum

ℓ=1

119896

sum

119894 = 119895119894119895=1

ln 119878119894119895ℓ

st ℎ (w) = 12ww1015840 minus 1 = 0

(8)

In order to further simplify the formulation of multiclassanalytical center classifier we introduce some notations asfollows119872 = 119896lowast(119896minus1)lowastsum

119896

119894=1119898119894 119861 = Vec(119860119894

ℓ 119894 119895) | 119894 119895 = 1

119896 119894 = 119895 ℓ = 1 119898119894 isin R119872times119896lowast(119889+1) let 119861

119894represent the

119894th row vector of 119861 Then the optimization problem (8) canbe rewritten as follows

min Φ (w) = minus119872

sum

119894=1

ln 119878119894

where 119878119894= 119861119894w

st ℎ (w) = 12ww1015840 minus 1 = 0

(9)

After solving the optimization problem (9) to get theoptimal weight wlowast we have a piecewise linear classifier 119891 R119889+1 rarr 1 119896 computed in the following way

119891 (x) = arg max119894=1119896

wlowast119894x1015840 (10)

where argmax returns to a class label corresponding to themaximum value

If the dataset is not piecewise linear separable the kernelfunction is used to map the data into high dimension linearspace

3 Generalization Error Bound ofMulticlass Analytical Center Classifier

In order to analyze the generalization error bound theoreti-cally we introduce the definition of classification margin anddata radius and then deduce themargin-based generalizationerror bound of MACM

The Scientific World Journal 3

Definition 6 (classification margin) Given the linear classi-fier ℎ(x

119895) = arg max

119894=1119896w119894x1015840

119895 the classification margin of

the sample (x119895 119910119895) isin R119889 times 1 119896 is defined as follows

119903119895= min119897=119910119895 119894=1119896119897

(w119897x1015840

119895minus w119894x1015840

119895) (11)

For the whole training set 119878 =

(x1 1199101) (x2 1199102) (x

119898 119910119898) the minimal margin is

as followsmargin

119878(ℎ) = min

(x119895 119910119895)isin119878119903119895 (12)

Definition 7 (data radius) Given dataset 119878 = (x1 1199101) (x2

1199102) (x

119898 119910119898) the data radius is defined as follows

120589 (119878) = maxx119894isin119878

1003817100381710038171003817x1198941003817100381710038171003817 (13)

Theorem 8 Define data radius of dataset 119878 as 120589(119878) and dataradius of dataset Vec(119878 119894 119895) as 120589(Vec(119878 119894 119895)) if 120589(119878) le 119877 then120589(Vec(119878 119894 119895)) le 2119877

Proof Consider the following

120589 (Vec (119878 119894 119895))

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894 119895)1003817100381710038171003817

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894) minus Vec (x119897 119895)1003817100381710038171003817

le maxx119897isin119878

(1003817100381710038171003817Vec (x119897 119894)

1003817100381710038171003817 +1003817100381710038171003817Vec (x119897 119895)

1003817100381710038171003817)

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894)1003817100381710038171003817 +max

x119897isin1198781003817100381710038171003817Vec (x119897 119895)

1003817100381710038171003817

(14)

Because Vec(x 119894) = (0 x 0) = x 120589(Vec(119878 119894 119895)) le 2 lowastmaxx119894isin119878x119894 = 2119877 This ends the proof of Theorem 8

Theorem 9 (see [14]) Consider thresholding of a real-valuedfunction H with unit weight vectors w = 1 on the innerproduct space 119883 and fix margin 119903 isin R+ For any probabilitydistribution D on 119883 times minus1 1 with support in a ball of radius119877 around the origin with probability 1 minus 120575 over 119898 randomsamples 119878 any hypothesis ℎ isin H with margin

119878(ℎ) ge 119903 on 119878

has error more thanerrD (ℎ) = 120576 (119903 119898 120575 119877)

=2

119898(641198772

1199032log emr

81198772log 32119898

1199032+ log 4

120575)

(15)

provided119898 gt 2120576 6411987721199032 lt 119898From Definition 3 and inequality (7) it is shown that

to correctly classify the sample 119860119894 119894 = 1 119896 119866(119860)w =

119866(⋃119896

119894=1119860119894)w gt 0 is satisfied Here one introduces the

samplesrsquo pairs 119875+(119860) = 119866(119860) +1 isin R119896lowast(119889+1) times +1

and 119875minus(119860) = minus119866(119860) minus1 isin R119896lowast(119889+1) times minus1 where 0 1

denote the corresponding dimension vector with elements 0 or 1respectively So one can construct the new samplesrsquo set 119875(119860) =119875+(119860)⋃119875

minus(119860) isin R119896lowast(119889+1) times minus1 +1

Theorem 10 The binary classification of sample 119875(119860) byanalytical center classifier is equivalent to the multiclass clas-sification of sample 119860 = 119860119894119896

119894=1by multiclass analytical center

classifier

Proof Assume that 119883+119894 119884+ isin 119875+(119860) 119883

minus

119894 119884minus isin 119875minus(119860) 119894 =

1 |119875+(119860)| then binary classification is to solve the

following feasible problem

119884+(119883+

119894w + 119887) gt 0 119894 = 1

1003816100381610038161003816119875+ (119860)1003816100381610038161003816

119884minus(119883minus

119894w + 119887) gt 0 119894 = 1

1003816100381610038161003816119875minus (119860)1003816100381610038161003816

(16)

Suppose the bias 119887 equals 0 because 119875minus(119860) and 119875

+(119860) are

symmetrical on the origin The feasible constraints can berewritten as follows

119883+

119894w gt 0 119894 = 1

1003816100381610038161003816119875+ (119860)1003816100381610038161003816 (17)

The feasible constraints (17) define the feasible space ofweightvector w isin R119896lowast(119889+1) the binary classification by analyticalcenter classifier can be formulated as follows

min Φ (w) = minus|119875+(119860)|

sum

119894=1

119883+

119894w gt 0

st 1

2w1015840w minus 1 = 0

(18)

Because119883+119894isin 119866(119860) and |119875

+(119860)| = |119866(119860)| the problem (18) is

equivalent to problem (8)This ends the proof ofTheorem 10

Theorem 11 Consider the classifiersrsquo set H from Definition 5with sum

119894=1119896w119894 = 1 on the inner product space 119883 where

ℎ R119889+1 rarr 1 119896 and fix margin 119903 isin R+ For anyprobability distributionD on 119883 times 1 119896 with support in aball of radius119877 around the origin with probability 1minus120575 over119898random samples 119878 any hypothesis ℎ isinH with margin

119878(ℎ) ge 119903

on 119878 has error more than

errD (ℎ) = 120576 (119903 119898 120575 119877 119896)

=4 (119896 minus 1)

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(19)

provided119898 gt 2120576 6411987721199032 lt 119898

Proof Because the sample in 119875(119878) is not independentthe generalization error bound cannot be attained fromTheorem 9 Theorem 9 is independent of the sample dis-tribution so we can construct a new sample distributionD1015840 According to the new distribution and dataset 119878 togenerate the independent sample set 1198751015840(119878) with 119898 samplesthat is for every (x 119910) isin 119878 define 1198751015840(x 119910) as the pointsampled uniformly and randomly from 119875(119878) according tothe distribution D1015840 then we have 1198751015840(119878) = ⋃

(x119910)isin119878 1198751015840(x 119910)

From Theorem 8 the data radius 120589(1198751015840(119878)) of 1198751015840(119878) satisfies

4 The Scientific World Journal

120589(1198751015840(119878)) le 2119877 The generalization error of hypothesis ℎ over

1198751015840(119878) fromTheorem 9 can be calculated as follows

errD1015840 (ℎ) = 120576 (119903 119898 120575 119877)

=2

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(20)

Event 119864119894which denotes a sample in 119875(119878) is wrongly

classified and event 119862 which denotes the misclassificationoccurs in1198751015840(119878) From the above analysis themisclassificationof any sample in 119875(119878) causes the misclassification of the pointin 1198751015840(119878) so that the probability of events 119864

119894and119862 satisfies the

following inequality

119875 (119864119894) le 119875 (119862) (21)

Because the cardinality of119875(119878) equals 2(119896minus1) the probabilityof sample misclassification in 119875(119878) is written as follows

119875(

2(119896minus1)

119894=1

119864119894) (22)

From union bound theorem we have the following inequal-ity

119875(

2(119896minus1)

119894=1

119864119894) le

2(119896minus1)

sum

119894=1

119875 (119864119894) le

2(119896minus1)

sum

119894=1

119875 (119862) (23)

So the generalization error of hypothesis ℎ over 119878 is

errD (ℎ) = 120576 (119903 119898 120575 119877 119896)

=

2(119896minus1)

sum

119894=1

2

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

=4 (119896 minus 1)

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(24)

This ends the proof of Theorem 11

4 Computational Experiments

In this section we present the computational results compar-ing multiclass analytical center classifier (MACM) and mul-ticlass support vector machine (MSVM) [12] A descriptionof each of the datasets follows this paragraph The kernelfunction for the piecewise nonlinear MACM and MSVMmethods is 119896(x x

119894) = (xx1015840

119894119873 + 1)

119899 where 119899 is the desiredpolynomial

Wine Recognition Data The wine dataset uses the chemicalanalysis of wine to determine the cultivarThere are 178 pointswith 13 features This is a three class dataset distributed asfollows 59 points in class 1 71 points in class 2 and 48 pointsin class 3

Glass Identification Database The Glass dataset is used toidentify the origin of a sample of glass through chemical

Table 1 The generalization error of MACM and MSVM

Dataset Classifier Degree of polynomial1 3

Wine M-ACM 9774 9865M-SVM 9719 9775

Glass M-ACM 5646 6938M-SVM 5514 6615

analysis This dataset is comprised of six classes of 214points with 9 features The distribution of points by classis as follows 70 float processed building windows 17 floatprocessed vehicle windows 76 nonfloat processed buildingwindows 13 containers 9 tableware and 29 headlamps

Table 1 contains the results for MACM and MSVM onwine and glass datasets As anticipated MACM producesbetter testing generalization than MSVM

5 Summary

In this paper the multiclass analytical center classifier basedon the analytical center of feasible space which correspondsto a simple quadratic constrained linear optimization isproposed At the same time in order to validate its generaliza-tion performance theoretically its generalization error upperbound is formulated and proved By the experiments onwine recognition and glass identification dataset it is shownthat the multiclass analytical center classifier outperformsmulticlass support vector machine in generalization error

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China under Grant nos 61370096 and61173012 and the Key Project of Natural Science Foundationof Hunan Province under Grant no 12JJA005

References

[1] J S Prakash K A Vignesh C Ashok and R Adithyan ldquoMulti-class support vector machines classifier for machine visionapplicationrdquo in Proceedings of the International Conference onMachine Vision and Image Processing pp 197ndash199 2013

[2] A J Joshi F Porikli and N P Papanikolopoulos ldquoScalableactive learning for multiclass image classificationrdquo IEEE Trans-actions on Pattern Analysis andMachine Intelligence vol 34 no11 pp 2259ndash22273 2012

[3] S Ai-Xiang L Ming-Hui H Shun-Liang and Z Jun ldquoA newhypersphere multi-class support vector machine applied intext classificationrdquo in Proceedings of the IEEE 3rd InternationalConference on Communication Software and Networks (ICCSNrsquo11) pp 478ndash481 Xirsquoan China May 2011

[4] L La Q Guo D Yang and Q Cao ldquoMulticlass boostingwith adaptive group-based kNN and its application in textcategorizationrdquo Mathematical Problems in Engineering vol2012 Article ID 793490 24 pages 2012

[5] Y Ni C J Saunders S Szedmak and M Niranjan ldquoStructurelearning for natural language processingrdquo in Proceedings of the

The Scientific World Journal 5

IEEE International Workshop on Machine Learning for SignalProcessing (MLSP rsquo09) pp 1ndash6 September 2009

[6] H Ankıshan and D Yılmaz ldquoComparison of SVM and ANFISfor snore related sounds classification by using the largestLyapunov exponent and entropyrdquo Computational and Mathe-matical Methods in Medicine vol 2013 Article ID 238937 13pages 2013

[7] P Adarsh and D Jeyakumari ldquoMulticlass SVM-based auto-mated diagnosis of diabetic retinopathyrdquo in Proceedings ofthe International Conference on Communication and SignalProcessing (ICCSP rsquo13) pp 206ndash210 Melmaruvathur IndiaApril 2013

[8] M Aly Survey on Multiclass Classification Methods 2005[9] A O Hatch andA Stolcke ldquoGeneralized linear Kernels for one-

versus-all classification application to speaker recognitionrdquo inProceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo06) Toulouse FranceMay 2006

[10] M A Bagheri G A Montazer and S Escalera ldquoError cor-recting output codes for multiclass classification applicationto two image vision problemsrdquo in Proceedings of the 16th CSIInternational Symposium on Artificial Intelligence and SignalProcessing (AISP rsquo12) pp 508ndash513 Shiraz Iran May 2012

[11] C W Hsu and C J Lin ldquoA comparison of methods formulticlass support vector machinesrdquo IEEE Transactions onNeural Networks vol 13 no 2 pp 415ndash425 2002

[12] E J Bredensteiner and K P Bennett ldquoMulticategory classifica-tion by support vector machinesrdquo Computational Optimizationand Applications vol 12 no 1ndash3 pp 53ndash79 1999

[13] T B Trafalis and A M Malyscheff ldquoAn analytic centermachinerdquoMachine Learning vol 46 no 1ndash3 pp 203ndash223 2002

[14] N Cristianini and J Shawe-Taylor An Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press New York NY USA 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article The Generalization Error Bound for the ...downloads.hindawi.com/journals/tswj/2013/574748.pdf · e Scientic World Journal Den ition (chunk). A vector, k =(V1,...,V

The Scientific World Journal 3

Definition 6 (classification margin) Given the linear classi-fier ℎ(x

119895) = arg max

119894=1119896w119894x1015840

119895 the classification margin of

the sample (x119895 119910119895) isin R119889 times 1 119896 is defined as follows

119903119895= min119897=119910119895 119894=1119896119897

(w119897x1015840

119895minus w119894x1015840

119895) (11)

For the whole training set 119878 =

(x1 1199101) (x2 1199102) (x

119898 119910119898) the minimal margin is

as followsmargin

119878(ℎ) = min

(x119895 119910119895)isin119878119903119895 (12)

Definition 7 (data radius) Given dataset 119878 = (x1 1199101) (x2

1199102) (x

119898 119910119898) the data radius is defined as follows

120589 (119878) = maxx119894isin119878

1003817100381710038171003817x1198941003817100381710038171003817 (13)

Theorem 8 Define data radius of dataset 119878 as 120589(119878) and dataradius of dataset Vec(119878 119894 119895) as 120589(Vec(119878 119894 119895)) if 120589(119878) le 119877 then120589(Vec(119878 119894 119895)) le 2119877

Proof Consider the following

120589 (Vec (119878 119894 119895))

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894 119895)1003817100381710038171003817

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894) minus Vec (x119897 119895)1003817100381710038171003817

le maxx119897isin119878

(1003817100381710038171003817Vec (x119897 119894)

1003817100381710038171003817 +1003817100381710038171003817Vec (x119897 119895)

1003817100381710038171003817)

= maxx119897isin119878

1003817100381710038171003817Vec (x119897 119894)1003817100381710038171003817 +max

x119897isin1198781003817100381710038171003817Vec (x119897 119895)

1003817100381710038171003817

(14)

Because Vec(x 119894) = (0 x 0) = x 120589(Vec(119878 119894 119895)) le 2 lowastmaxx119894isin119878x119894 = 2119877 This ends the proof of Theorem 8

Theorem 9 (see [14]) Consider thresholding of a real-valuedfunction H with unit weight vectors w = 1 on the innerproduct space 119883 and fix margin 119903 isin R+ For any probabilitydistribution D on 119883 times minus1 1 with support in a ball of radius119877 around the origin with probability 1 minus 120575 over 119898 randomsamples 119878 any hypothesis ℎ isin H with margin

119878(ℎ) ge 119903 on 119878

has error more thanerrD (ℎ) = 120576 (119903 119898 120575 119877)

=2

119898(641198772

1199032log emr

81198772log 32119898

1199032+ log 4

120575)

(15)

provided119898 gt 2120576 6411987721199032 lt 119898From Definition 3 and inequality (7) it is shown that

to correctly classify the sample 119860119894 119894 = 1 119896 119866(119860)w =

119866(⋃119896

119894=1119860119894)w gt 0 is satisfied Here one introduces the

samplesrsquo pairs 119875+(119860) = 119866(119860) +1 isin R119896lowast(119889+1) times +1

and 119875minus(119860) = minus119866(119860) minus1 isin R119896lowast(119889+1) times minus1 where 0 1

denote the corresponding dimension vector with elements 0 or 1respectively So one can construct the new samplesrsquo set 119875(119860) =119875+(119860)⋃119875

minus(119860) isin R119896lowast(119889+1) times minus1 +1

Theorem 10 The binary classification of sample 119875(119860) byanalytical center classifier is equivalent to the multiclass clas-sification of sample 119860 = 119860119894119896

119894=1by multiclass analytical center

classifier

Proof Assume that 119883+119894 119884+ isin 119875+(119860) 119883

minus

119894 119884minus isin 119875minus(119860) 119894 =

1 |119875+(119860)| then binary classification is to solve the

following feasible problem

119884+(119883+

119894w + 119887) gt 0 119894 = 1

1003816100381610038161003816119875+ (119860)1003816100381610038161003816

119884minus(119883minus

119894w + 119887) gt 0 119894 = 1

1003816100381610038161003816119875minus (119860)1003816100381610038161003816

(16)

Suppose the bias 119887 equals 0 because 119875minus(119860) and 119875

+(119860) are

symmetrical on the origin The feasible constraints can berewritten as follows

119883+

119894w gt 0 119894 = 1

1003816100381610038161003816119875+ (119860)1003816100381610038161003816 (17)

The feasible constraints (17) define the feasible space ofweightvector w isin R119896lowast(119889+1) the binary classification by analyticalcenter classifier can be formulated as follows

min Φ (w) = minus|119875+(119860)|

sum

119894=1

119883+

119894w gt 0

st 1

2w1015840w minus 1 = 0

(18)

Because119883+119894isin 119866(119860) and |119875

+(119860)| = |119866(119860)| the problem (18) is

equivalent to problem (8)This ends the proof ofTheorem 10

Theorem 11 Consider the classifiersrsquo set H from Definition 5with sum

119894=1119896w119894 = 1 on the inner product space 119883 where

ℎ R119889+1 rarr 1 119896 and fix margin 119903 isin R+ For anyprobability distributionD on 119883 times 1 119896 with support in aball of radius119877 around the origin with probability 1minus120575 over119898random samples 119878 any hypothesis ℎ isinH with margin

119878(ℎ) ge 119903

on 119878 has error more than

errD (ℎ) = 120576 (119903 119898 120575 119877 119896)

=4 (119896 minus 1)

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(19)

provided119898 gt 2120576 6411987721199032 lt 119898

Proof Because the sample in 119875(119878) is not independentthe generalization error bound cannot be attained fromTheorem 9 Theorem 9 is independent of the sample dis-tribution so we can construct a new sample distributionD1015840 According to the new distribution and dataset 119878 togenerate the independent sample set 1198751015840(119878) with 119898 samplesthat is for every (x 119910) isin 119878 define 1198751015840(x 119910) as the pointsampled uniformly and randomly from 119875(119878) according tothe distribution D1015840 then we have 1198751015840(119878) = ⋃

(x119910)isin119878 1198751015840(x 119910)

From Theorem 8 the data radius 120589(1198751015840(119878)) of 1198751015840(119878) satisfies

4 The Scientific World Journal

120589(1198751015840(119878)) le 2119877 The generalization error of hypothesis ℎ over

1198751015840(119878) fromTheorem 9 can be calculated as follows

errD1015840 (ℎ) = 120576 (119903 119898 120575 119877)

=2

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(20)

Event 119864119894which denotes a sample in 119875(119878) is wrongly

classified and event 119862 which denotes the misclassificationoccurs in1198751015840(119878) From the above analysis themisclassificationof any sample in 119875(119878) causes the misclassification of the pointin 1198751015840(119878) so that the probability of events 119864

119894and119862 satisfies the

following inequality

119875 (119864119894) le 119875 (119862) (21)

Because the cardinality of119875(119878) equals 2(119896minus1) the probabilityof sample misclassification in 119875(119878) is written as follows

119875(

2(119896minus1)

119894=1

119864119894) (22)

From union bound theorem we have the following inequal-ity

119875(

2(119896minus1)

119894=1

119864119894) le

2(119896minus1)

sum

119894=1

119875 (119864119894) le

2(119896minus1)

sum

119894=1

119875 (119862) (23)

So the generalization error of hypothesis ℎ over 119878 is

errD (ℎ) = 120576 (119903 119898 120575 119877 119896)

=

2(119896minus1)

sum

119894=1

2

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

=4 (119896 minus 1)

119898(2561198772

1199032log emr

321198772log 32119898

1199032+ log 4

120575)

(24)

This ends the proof of Theorem 11

4 Computational Experiments

In this section we present computational results comparing the multiclass analytical center classifier (MACM) and the multiclass support vector machine (MSVM) [12]. A description of each of the datasets follows this paragraph. The kernel function for the piecewise nonlinear MACM and MSVM methods is $k(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}\mathbf{x}_i'/N + 1)^n$, where $n$ is the desired polynomial degree; a small sketch of this kernel is given below.
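The following is a minimal sketch of this polynomial kernel. The normalizer $N$ is assumed here to be the number of features, which the paper does not spell out, and all names and data in the snippet are illustrative assumptions.

```python
import numpy as np

def poly_kernel(X, Z, n):
    """Polynomial kernel k(x, x_i) = (x . x_i' / N + 1)^n between the rows of
    X and Z.  N is taken to be the number of features, an assumption about
    the normalizer used in the experiments."""
    N = X.shape[1]
    return (X @ Z.T / N + 1.0) ** n             # Gram matrix

# Illustrative use on random data (not the wine or glass data).
X = np.random.default_rng(0).random((5, 13))    # 5 points, 13 features
K1 = poly_kernel(X, X, n=1)                     # degree-1 kernel, as in Table 1
K3 = poly_kernel(X, X, n=3)                     # degree-3 kernel, as in Table 1
print(K1.shape, K3.shape)
```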

Wine Recognition Data. The wine dataset uses the chemical analysis of wine to determine the cultivar. There are 178 points with 13 features. This is a three-class dataset distributed as follows: 59 points in class 1, 71 points in class 2, and 48 points in class 3.

Glass Identification Database. The Glass dataset is used to identify the origin of a sample of glass through chemical analysis. This dataset comprises six classes of 214 points with 9 features. The distribution of points by class is as follows: 70 float-processed building windows, 17 float-processed vehicle windows, 76 non-float-processed building windows, 13 containers, 9 tableware, and 29 headlamps.

Table 1: The generalization error of MACM and MSVM.

Dataset   Classifier   Degree of polynomial
                       1        3
Wine      MACM         97.74    98.65
          MSVM         97.19    97.75
Glass     MACM         56.46    69.38
          MSVM         55.14    66.15

Table 1 contains the results for MACM and MSVM on the wine and glass datasets. As anticipated, MACM produces better testing generalization than MSVM.

5. Summary

In this paper, the multiclass analytical center classifier, which is based on the analytical center of feasible space and corresponds to a simple quadratic constrained linear optimization, is proposed. At the same time, in order to validate its generalization performance theoretically, its generalization error upper bound is formulated and proved. Experiments on the wine recognition and glass identification datasets show that the multiclass analytical center classifier outperforms the multiclass support vector machine in generalization error.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61370096 and 61173012 and the Key Project of the Natural Science Foundation of Hunan Province under Grant no. 12JJA005.

References

[1] J. S. Prakash, K. A. Vignesh, C. Ashok, and R. Adithyan, "Multiclass support vector machines classifier for machine vision application," in Proceedings of the International Conference on Machine Vision and Image Processing, pp. 197–199, 2013.

[2] A. J. Joshi, F. Porikli, and N. P. Papanikolopoulos, "Scalable active learning for multiclass image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2259–2273, 2012.

[3] S. Ai-Xiang, L. Ming-Hui, H. Shun-Liang, and Z. Jun, "A new hypersphere multi-class support vector machine applied in text classification," in Proceedings of the IEEE 3rd International Conference on Communication Software and Networks (ICCSN '11), pp. 478–481, Xi'an, China, May 2011.

[4] L. La, Q. Guo, D. Yang, and Q. Cao, "Multiclass boosting with adaptive group-based kNN and its application in text categorization," Mathematical Problems in Engineering, vol. 2012, Article ID 793490, 24 pages, 2012.

[5] Y. Ni, C. J. Saunders, S. Szedmak, and M. Niranjan, "Structure learning for natural language processing," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP '09), pp. 1–6, September 2009.

[6] H. Ankıshan and D. Yılmaz, "Comparison of SVM and ANFIS for snore related sounds classification by using the largest Lyapunov exponent and entropy," Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 238937, 13 pages, 2013.

[7] P. Adarsh and D. Jeyakumari, "Multiclass SVM-based automated diagnosis of diabetic retinopathy," in Proceedings of the International Conference on Communication and Signal Processing (ICCSP '13), pp. 206–210, Melmaruvathur, India, April 2013.

[8] M. Aly, Survey on Multiclass Classification Methods, 2005.

[9] A. O. Hatch and A. Stolcke, "Generalized linear kernels for one-versus-all classification: application to speaker recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), Toulouse, France, May 2006.

[10] M. A. Bagheri, G. A. Montazer, and S. Escalera, "Error correcting output codes for multiclass classification: application to two image vision problems," in Proceedings of the 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP '12), pp. 508–513, Shiraz, Iran, May 2012.

[11] C. W. Hsu and C. J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.

[12] E. J. Bredensteiner and K. P. Bennett, "Multicategory classification by support vector machines," Computational Optimization and Applications, vol. 12, no. 1–3, pp. 53–79, 1999.

[13] T. B. Trafalis and A. M. Malyscheff, "An analytic center machine," Machine Learning, vol. 46, no. 1–3, pp. 203–223, 2002.

[14] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2000.
