
Page 1: Smooth Support Vector Machines for Classification and Regression

Yuh-Jye Lee

National Taiwan University of Science and Technology

International Summer Workshop on the Economics, Financial and Managerial Applications of Computational Intelligence

August 16-20, 2004

Page 2: Fundamental Problems in Data Mining

Supervised learning:
- Classification problems
- Regression problems
- Feature selection: too many features could degrade generalization performance (curse of dimensionality)

Unsupervised learning:
- Clustering algorithms
- Association rules

Page 3: Binary Classification Problem (A Fundamental Problem in Data Mining)

Find a decision function (rule) to discriminate between two categories of data.

- Supervised learning in Machine Learning
- Discriminant Analysis in Statistics: Fisher Linear Discriminant
- Decision Tree, Neural Network, k-NN and Support Vector Machines, etc.
- Successful applications: marketing, bioinformatics, fraud detection

Page 4: Bankruptcy Prediction

Binary classification of firms: solvent vs. bankrupt.

The data are financial indicators from middle-market capitalization firms in Benelux. From a total of 422 firms, 74 went bankrupt and 348 were solvent. The variables to be used in the model as explanatory inputs are 40 financial indicators such as liquidity, profitability and solvency measurements.

T. Van Gestel, B. Baesens, J. A. K. Suykens, M. Espinoza, D. Baestaens, J. Vanthienen and B. De Moor, "Bankruptcy Prediction with Least Squares Support Vector Machine Classifiers", International Conference on Computational Intelligence for Financial Engineering, 2003.

Page 5: Binary Classification Problem, Linearly Separable Case

[Figure: the two point sets $A+$ and $A-$ (labeled Solvent and Bankrupt), separated by the plane $x'w + b = 0$, with bounding planes $x'w + b = +1$ and $x'w + b = -1$ and normal vector $w$.]

Page 6: Why Use Support Vector Machines? Powerful Tools for Data Mining

- The SVM classifier is an optimally defined surface
- SVMs have a good geometric interpretation
- SVMs can be generated very efficiently
- Can be extended from the linear to the nonlinear case:
  - Typically nonlinear in the input space
  - Linear in a higher-dimensional "feature" space
  - Implicitly defined by a kernel function
- Have a sound theoretical foundation, based on Statistical Learning Theory

Page 7: Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: the bounding planes $x'w + b = +1$ and $x'w + b = -1$ around the classes $A+$ and $A-$, with normal vector $w$; the margin between the planes is $\frac{2}{\|w\|_2}$.]

Page 8: Algebra of the Classification Problem, Linearly Separable Case

Given $l$ points in the $n$-dimensional real space $R^n$, represented by an $l \times n$ matrix $A$. Membership of each point $A_i$ in the classes $A+$ or $A-$ is specified by an $l \times l$ diagonal matrix $D$: $D_{ii} = +1$ if $A_i \in A+$ and $D_{ii} = -1$ if $A_i \in A-$.

Separate $A-$ and $A+$ by two bounding planes such that:

$A_iw + b \geq +1$ for $D_{ii} = +1$, and $A_iw + b \leq -1$ for $D_{ii} = -1$.

More succinctly: $D(Aw + eb) \geq e$, where $e = [1, 1, \ldots, 1]' \in R^l$.

Page 9: Support Vector Classification (Linearly Separable Case)

Let $S = \{(x^1, y_1), (x^2, y_2), \ldots, (x^l, y_l)\}$ be a linearly separable training sample, represented by the matrices

$A = \begin{bmatrix} (x^1)' \\ (x^2)' \\ \vdots \\ (x^l)' \end{bmatrix} \in R^{l \times n}, \quad D = \begin{bmatrix} y_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & y_l \end{bmatrix} \in R^{l \times l}$

Page 10: Robust Linear Programming, Preliminary Approach to SVM

$\min_{w,b,\xi} \; e'\xi$ s.t. $D(Aw + eb) + \xi \geq e$, $\xi \geq 0$   (LP)

where $\xi$ is a nonnegative slack (error) vector. The term $e'\xi$, the 1-norm measure of the error vector, is called the training error. For the linearly separable case, at a solution of (LP): $\xi = 0$.
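As a concrete illustration, (LP) can be handed to any linear programming solver. Below is a minimal sketch using SciPy's linprog, stacking the variables as $z = [w, b, \xi]$; the function name and solver choice are illustrative, not from the slides.

```python
# Sketch of the robust LP (LP): min e'xi  s.t.  D(Aw + eb) + xi >= e, xi >= 0.
import numpy as np
from scipy.optimize import linprog

def robust_lp_svm(A, y):
    l, n = A.shape
    e = np.ones(l)
    c = np.concatenate([np.zeros(n + 1), e])          # cost only on the slacks
    # D(Aw + eb) + xi >= e rewritten row-wise as -y_i*(A_i w + b) - xi_i <= -1
    G = np.hstack([-y[:, None] * A, -y[:, None], -np.eye(l)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * l
    res = linprog(c, A_ub=G, b_ub=-e, bounds=bounds, method="highs")
    w, b, xi = res.x[:n], res.x[n], res.x[n + 1:]
    return w, b, xi
```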

Page 11: [Figure: non-separable data, with points marked x and o on either side of the bounding planes; points that violate their bounding plane carry positive slacks $\xi_i$, $\xi_j$.]

Page 12: Two Different Measures of Training Error

1-Norm Soft Margin:

$\min_{(w,b,\xi)\in R^{n+1+l}} \; \frac{1}{2}\|w\|_2^2 + Ce'\xi$ s.t. $D(Aw + eb) + \xi \geq e$, $\xi \geq 0$

2-Norm Soft Margin:

$\min_{(w,b,\xi)\in R^{n+1+l}} \; \frac{1}{2}\|w\|_2^2 + \frac{C}{2}\|\xi\|_2^2$ s.t. $D(Aw + eb) + \xi \geq e$

The margin is maximized by minimizing the reciprocal of the margin.

Page 13: 1-Norm Soft Margin Dual Formulation

The Lagrangian for the 1-norm soft margin:

$L(w, b, \xi, \alpha, r) = \frac{1}{2}w'w + Ce'\xi + \alpha'[e - D(Aw + eb) - \xi] - r'\xi$

where $\alpha \geq 0$ and $r \geq 0$.

Setting the partial derivatives with respect to the primal variables to zero:

$\frac{\partial L}{\partial w} = w - A'D\alpha = 0$
$\frac{\partial L}{\partial b} = e'D\alpha = 0$
$\frac{\partial L}{\partial \xi} = Ce - \alpha - r = 0$

Page 14: Dual Maximization Problem for 1-Norm Soft Margin

Dual:

$\max_{\alpha\in R^l} \; e'\alpha - \frac{1}{2}\alpha'DAA'D\alpha$ s.t. $e'D\alpha = 0$, $0 \leq \alpha \leq Ce$

The corresponding KKT complementarity conditions:

$0 \leq \alpha \perp D(Aw + eb) + \xi - e \geq 0$
$0 \leq \xi \perp \alpha - Ce \leq 0$

Page 15: Slack Variables for 1-Norm Soft Margin ($w^* = A'D\alpha^*$)

- Nonzero slack can only occur when $\alpha_i^* = C$
- The points for which $0 < \alpha_i^* < C$ lie on the bounding planes; this helps us find $b^*$
- The contribution of an outlier to the decision rule is at most $C$
- The trade-off between accuracy and regularization is controlled directly by $C$

Page 16: Tuning Procedure

[Figure: tuning curve of correctness versus the parameter value; the overfitting region is marked.]

The final value of the parameter is the one with the maximum testing set correctness!

Page 17: SVM as an Unconstrained Minimization Problem

$\min_{w,b,\xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$ s.t. $D(Aw + eb) + \xi \geq e$, $\xi \geq 0$   (QP)

At the solution of (QP): $\xi = (e - D(Aw + eb))_+$, where $(\cdot)_+ = \max\{\cdot\,, 0\}$. Hence (QP) is equivalent to the nonsmooth SVM:

$\min_{w,b} \; \frac{C}{2}\|(e - D(Aw + eb))_+\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$

- Changes (QP) into an unconstrained minimization problem
- Reduces n+1+m variables to n+1 variables

Page 18: Smooth the Plus Function: Integrate

Step function: $x_*$ → Sigmoid function: $\frac{1}{1 + e^{-\beta x}}$ (plotted with $\beta = 5$)
Plus function: $x_+$ → p-function: $p(x, \beta)$ (plotted with $\beta = 5$)

Integrating the sigmoid gives the p-function:

$p(x, \beta) := x + \frac{1}{\beta}\log(1 + e^{-\beta x})$
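A quick numerical sketch of this construction (NumPy assumed; not part of the slides). The p-function is written with logaddexp for numerical stability, and its gap to the plus function shrinks as $\beta$ grows:

```python
import numpy as np

def plus(x):
    return np.maximum(x, 0.0)                      # plus function x_+

def p(x, beta=5.0):
    # p(x, beta) = x + (1/beta) log(1 + e^(-beta x)), the integral of the sigmoid
    return x + np.logaddexp(0.0, -beta * x) / beta

x = np.linspace(-3.0, 3.0, 7)
print(np.abs(p(x, 5.0) - plus(x)).max())           # small, and -> 0 as beta grows
```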

Page 19: SSVM: Smooth Support Vector Machine

Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(\cdot, \beta)$ gives our SSVM:

$\min_{(w,b)\in R^{n+1}} \; \frac{C}{2}\|p(e - D(Aw + eb), \beta)\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$

Here $p(\cdot, \beta)$ is an accurate smooth approximation of $(\cdot)_+$, obtained by integrating the sigmoid function of neural networks (the sigmoid is a smoothed step function). The solution of SSVM converges to the solution of the nonsmooth SVM as $\beta$ goes to infinity. (Typically, $\beta = 5$.)
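In code, the SSVM objective is a few lines. This is a sketch under the assumption that y holds the ±1 labels, i.e. y = diag(D), with p as sketched above:

```python
import numpy as np

def p(x, beta):
    return x + np.logaddexp(0.0, -beta * x) / beta

def ssvm_objective(w, b, A, y, C=1.0, beta=5.0):
    r = 1.0 - y * (A @ w + b)                      # r = e - D(Aw + eb)
    return 0.5 * C * np.sum(p(r, beta) ** 2) + 0.5 * (w @ w + b * b)
```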

Page 20: Newton-Armijo Method: Quadratic Approximation of SSVM

The sequence $\{(w^i, b^i)\}$, generated by solving a quadratic approximation of SSVM, converges to the unique solution $(w^*, b^*)$ of SSVM at a quadratic rate.

- At each iteration we solve a linear system of n+1 equations in n+1 variables
- Complexity depends on the dimension of the input space
- Converges in 6 to 8 iterations
- A stepsize might need to be selected

Page 21: Newton-Armijo Algorithm

Start with any $(w^0, b^0) \in R^{n+1}$. Having $(w^i, b^i)$, stop if $\nabla\Phi_\beta(w^i, b^i) = 0$; else:

(i) Newton direction: solve $\nabla^2\Phi_\beta(w^i, b^i)\,d^i = -\nabla\Phi_\beta(w^i, b^i)$ for $d^i$

(ii) Armijo stepsize: $(w^{i+1}, b^{i+1}) = (w^i, b^i) + \lambda_i d^i$, with $\lambda_i \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$

The iterates converge globally and quadratically to the unique solution in a finite number of steps.
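A generic sketch of this iteration (illustrative only; phi, grad and hess stand for $\Phi_\beta$ and its derivatives, with z stacking (w, b)):

```python
import numpy as np

def newton_armijo(phi, grad, hess, z0, tol=1e-8, max_iter=100):
    z = z0.astype(float)
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:            # stop if the gradient vanishes
            break
        d = np.linalg.solve(hess(z), -g)       # (i) Newton direction
        lam = 1.0                              # (ii) Armijo stepsize 1, 1/2, 1/4, ...
        while phi(z + lam * d) > phi(z) + 1e-4 * lam * (g @ d):
            lam *= 0.5                         # backtrack until sufficient decrease
        z = z + lam * d
    return z
```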

Page 22: Why Use a Stepsize?

Without a stepsize, the pure Newton iteration, built from the quadratic model

$g(x) = f(x^i) + f'(x^i)(x - x^i) + \frac{1}{2}f''(x^i)(x - x^i)^2$,

cannot converge to the optimum solution, for example on

$f(x) = -\frac{1}{6}x^6 + \frac{1}{4}x^4 + 2x^2$.

Page 23: Comparisons of SSVM with Other SVMs

Each cell gives tenfold test set correctness % / CPU time in seconds. SSVM is solved via linear equations, SVM$\|\cdot\|_2^2$ via QP, and SVM$\|\cdot\|_1$ via LP.

Dataset (m x n)               SSVM (Linear Eqns.)   SVM||.||_2^2 (QP)   SVM||.||_1 (LP)
Cleveland Heart   297 x 13    86.13 / 1.63          72.12 / 67.55       84.55 / 18.71
BUPA Liver        345 x 6     70.33 / 1.05          69.86 / 124.23      64.03 / 19.94
Pima Indians      768 x 8     78.12 / 1.54          77.07 / 1138.0      74.47 / 286.59
WPBC (24 months)  155 x 32    83.47 / 2.32          82.02 / 12.50       71.08 / 6.25
WPBC (60 months)  110 x 22    68.18 / 1.03          61.83 / 4.91        66.23 / 3.72
Ionosphere        351 x 34    89.63 / 3.69          89.17 / 128.15      86.10 / 42.41

The best correctness in each row (shown in red on the original slide) is achieved by SSVM on all six datasets.

Page 24: Two-Spiral Dataset (94 White Dots & 94 Red Dots)

Page 25: [Figure: the nonlinear map $\phi: X \to F$ sends each point of the input space $X$ to its image $\phi(\cdot)$ in the feature space $F$.]

Page 26: Dual Representation of SVM

(Key of kernel methods: $w = A'D\alpha^* = \sum_{i=1}^{l} y_i\alpha_i^*A_i'$; remember $A_i' = x^i$.)

The hypothesis is determined by $(\alpha^*, b^*)$:

$h(x) = \mathrm{sgn}(\langle x, A'D\alpha^*\rangle + b^*) = \mathrm{sgn}\big(\sum_{i=1}^{l} y_i\alpha_i^*\langle x^i, x\rangle + b^*\big) = \mathrm{sgn}\big(\sum_{\alpha_i^* > 0} y_i\alpha_i^*\langle x^i, x\rangle + b^*\big)$
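As a sketch (NumPy assumed, names illustrative), the dual decision rule needs only the inner products of x with the support vectors:

```python
import numpy as np

def h(x, A, y, alpha, b):
    sv = alpha > 1e-8                  # support vectors: alpha_i* > 0
    # sgn( sum_i y_i alpha_i <x^i, x> + b ) over the support vectors only
    return np.sign((y[sv] * alpha[sv]) @ (A[sv] @ x) + b)
```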

Page 27: Linear Machine in Feature Space

Let $\phi: X \to F$ be a nonlinear map from the input space to some feature space.

The classifier will be in the form (primal):

$f(x) = \big(\sum_{i=1}^{?} w_i\phi_i(x)\big) + b$

(the upper limit "?" is the dimension of $F$, which may be huge or even infinite)

Make it in the dual form:

$f(x) = \big(\sum_{i=1}^{l} \alpha_i y_i\langle\phi(x^i)\cdot\phi(x)\rangle\big) + b$

Page 28: Kernel: Represent Inner Product in Feature Space

Definition: A kernel is a function $K: X\times X \to R$ such that for all $x, z \in X$

$K(x, z) = \langle\phi(x)\cdot\phi(z)\rangle$, where $\phi: X \to F$.

The classifier will become:

$f(x) = \big(\sum_{i=1}^{l} \alpha_i y_i K(x^i, x)\big) + b$

Page 29: A Simple Example of a Kernel

Polynomial kernel of degree 2: $K(x, z) = \langle x, z\rangle^2$. Let $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, $z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \in R^2$ and define the nonlinear map $\phi: R^2 \to R^3$ by $\phi(x) = \begin{bmatrix} x_1^2 \\ x_2^2 \\ \sqrt{2}\,x_1x_2 \end{bmatrix}$.

Then $\langle\phi(x), \phi(z)\rangle = \langle x, z\rangle^2 = K(x, z)$.

There are many other nonlinear maps $\psi(x)$ that satisfy the same relation $\langle\psi(x), \psi(z)\rangle = \langle x, z\rangle^2 = K(x, z)$.
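The identity is easy to check numerically (a small sketch, NumPy assumed):

```python
import numpy as np

def phi(x):
    # the explicit degree-2 feature map (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

rng = np.random.default_rng(0)
x, z = rng.standard_normal(2), rng.standard_normal(2)
print(phi(x) @ phi(z), (x @ z) ** 2)   # equal up to rounding error
```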

Page 30: Power of the Kernel Technique

Consider a nonlinear map $\phi: R^n \to R^p$ that consists of distinct features of all the monomials of degree $d$. Then $p = \binom{n+d-1}{d}$.

For example: $n = 10$, $d = 10$ gives $p = \binom{19}{10} = 92378$.

Is it necessary to compute $\phi$ explicitly? We only need to know $\langle\phi(x), \phi(z)\rangle$, and this can be achieved with $K(x, z) = \langle x, z\rangle^d$.
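The count and the shortcut, in a few lines (a sketch; math.comb evaluates the binomial coefficient):

```python
import math
import numpy as np

n, d = 10, 10
print(math.comb(n + d - 1, d))         # 92378 explicit monomial features

x, z = np.ones(n), np.full(n, 2.0)
print((x @ z) ** d)                    # K(x, z) = <x, z>^d, no feature map needed
```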

Page 31: More Examples of Kernels

$K(A, B): R^{m\times n} \times R^{n\times l} \to R^{m\times l}$

For $A \in R^{m\times n}$, $a \in R^m$, $\mu \in R$, and an integer $d$:

- Polynomial kernel: $(AA' + \mu aa')^d$, the power taken componentwise (the linear kernel $AA'$ is the case $\mu = 0$, $d = 1$)
- Gaussian (radial basis) kernel: $K(A, A')_{ij} = e^{-\mu\|A_i - A_j\|_2^2}$, $i, j = 1, \ldots, m$

The $ij$-entry of $K(A, A')$ represents the "similarity" of the data points $A_i$ and $A_j$.
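A vectorized sketch of the Gaussian kernel matrix (NumPy assumed; mu is the kernel width parameter):

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2), using ||a-b||^2 = |a|^2 + |b|^2 - 2 a'b
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))
```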

Page 32: Smooth Support Vector Machines for Classification and ... · Smooth Support Vector Machines for Classification and Regression Yuh-Jye Lee National Taiwan University of Science and

The value of kernel function represents the inner product of two training points in feature space

Kernel functions merge two steps1. map input data from input space to

feature space (might be infinite dim.)2. do inner product in the feature space

Kernel TechniqueBased on Mercer’s Condition (1909)

Page 33: Nonlinear SVM Motivation

Linear SVM (linear separator: $x'w + b = 0$):

$\min_{w,b,\xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$ s.t. $D(Aw + eb) + \xi \geq e$, $\xi \geq 0$   (QP)

By QP "duality", $w = A'D\alpha$. Maximizing the margin in the "dual space" gives:

$\min_{\alpha,b,\xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2)$ s.t. $D(AA'D\alpha + eb) + \xi \geq e$, $\xi \geq 0$

Dual SSVM, with separator $x'A'D\alpha + b = 0$:

$\min_{\alpha,b} \; \frac{C}{2}\|p(e - D(AA'D\alpha + eb), \beta)\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2)$

Page 34: Nonlinear Smooth SVM

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$\min_{\alpha,b} \; \frac{C}{2}\|p(e - D(K(A, A')D\alpha + eb), \beta)\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2)$

- Use the Newton-Armijo algorithm to solve the problem
- Each iteration solves m+1 linear equations in m+1 variables
- The nonlinear classifier depends on the entire dataset

Nonlinear classifier: $K(x', A')D\alpha + b = 0$
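Putting the pieces together, a sketch of the nonlinear SSVM objective in the dual variables (K would be precomputed, e.g. with the gaussian_kernel sketch above; y = diag(D) holds the ±1 labels):

```python
import numpy as np

def p(x, beta):
    return x + np.logaddexp(0.0, -beta * x) / beta

def nonlinear_ssvm_objective(alpha, b, K, y, C=1.0, beta=5.0):
    # K = K(A, A') is the m x m kernel matrix
    r = 1.0 - y * (K @ (y * alpha) + b)    # e - D(K(A,A') D alpha + eb)
    return 0.5 * C * np.sum(p(r, beta) ** 2) + 0.5 * (alpha @ alpha + b * b)
```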

Page 35: Difficulties with Nonlinear SVM for Large Problems

- The nonlinear kernel $K(A, A') \in R^{l\times l}$ is fully dense
- Need to generate and store $O(l^2)$ entries: long CPU time to compute the dense kernel matrix, and it may run out of memory while storing it
- Computational complexity depends on the number of examples: the complexity of nonlinear SSVM is about $O((l+1)^3)$
- The separating surface depends on almost the entire dataset, so the entire dataset must be stored even after solving the problem

Page 36: Solving the SVM with a Massive Dataset

Standard optimization techniques require that the data be held in memory, which limits the SVM to datasets of a few thousand points.

Solution I: SMO (Sequential Minimal Optimization)
- Solve the sub-optimization problem defined by the working set (size = 2)
- Increase the objective function iteratively

Solution II: RSVM (Reduced Support Vector Machine)

Page 37: Support Vector Regression (Linear Case: $f(x) = x'w + b$)

Given the training set $S = \{(x^i, y_i) \mid x^i \in R^n,\ y_i \in R,\ i = 1, \ldots, l\}$, find a linear function $f(x) = x'w + b$, where $(w, b)$ is determined by solving a minimization problem that guarantees the smallest overall empirical error made by $f(x) = x'w + b$.

Motivated by SVM:
- $\|w\|_2$ should be as small as possible
- Some tiny errors should be discarded

Page 38: $\varepsilon$-Insensitive Loss Function

The loss made by the estimation function $f$ at the data point $(x^i, y_i)$ is the $\varepsilon$-insensitive loss

$|y_i - f(x^i)|_\varepsilon = \max\{0,\ |y_i - f(x^i)| - \varepsilon\}$

That is,

$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \leq \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$

If $\xi \in R^n$, then $|\xi|_\varepsilon \in R^n$ is defined componentwise: $(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon$, $i = 1, \ldots, n$.
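Componentwise, this is a one-line NumPy expression (sketch):

```python
import numpy as np

def eps_insensitive(xi, eps):
    # |xi|_eps = max(0, |xi| - eps), applied componentwise
    return np.maximum(0.0, np.abs(xi) - eps)

print(eps_insensitive(np.array([-0.05, 0.08, 0.30]), eps=0.1))   # [0. 0. 0.2]
```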

Page 39: $\varepsilon$-Insensitive Linear Regression

[Figure: data points around the regression line $f(x) = x'w + b$ inside a tube of half-width $\varepsilon$; points outside the tube contribute errors $y_j - f(x^j) - \varepsilon$ above it and $f(x^k) - y_k - \varepsilon$ below it.]

Find $(w, b)$ with the smallest overall error.

Page 40: Smooth Support Vector Machines for Classification and ... · Smooth Support Vector Machines for Classification and Regression Yuh-Jye Lee National Taiwan University of Science and

ï - insensitive Support Vector Regression Model

Motivated by SVM:||w||2 should be as small as possible

Some tiny error should be discarded

min(w,b,ø)∈Rn+1+m

21||w||22 + Ce0 ø| |ï

where ø| |ï ∈ Rm, ( ø| |ï)i = max(0, Aiw+ bà yi| | à ï)

Page 41: Reformulated $\varepsilon$-SVR as a Constrained Minimization Problem

$\min_{(w,b,\xi,\xi^*)\in R^{n+1+2m}} \; \frac{1}{2}w'w + Ce'(\xi + \xi^*)$

subject to

$y - Aw - eb \leq e\varepsilon + \xi$
$Aw + eb - y \leq e\varepsilon + \xi^*$
$\xi, \xi^* \geq 0$

This is a minimization problem with n+1+2m variables and 2m constraints, which enlarges the problem size and the computational complexity of solving it.

Page 42: Five Widely Used Loss Functions

Page 43: SV Regression by Minimizing the Quadratic $\varepsilon$-Insensitive Loss

We minimize $\|(w, b)\|_2^2$ at the same time (Occam's razor: the simplest is the best), which gives the problem strong convexity. We have the following (nonsmooth) problem:

$\min_{(w,b,\xi)\in R^{n+1+l}} \; \frac{1}{2}(\|w\|_2^2 + b^2) + \frac{C}{2}\||\xi|_\varepsilon\|_2^2$

where $(|\xi|_\varepsilon)_i = |y_i - (w'x^i + b)|_\varepsilon$.

Page 44: $\varepsilon$-Insensitive Loss Function

$|x|_\varepsilon = (x - \varepsilon)_+ + (-x - \varepsilon)_+$

Page 45: Quadratic $\varepsilon$-Insensitive Loss Function

$|x|_\varepsilon^2 = ((x - \varepsilon)_+ + (-x - \varepsilon)_+)^2 = (x - \varepsilon)_+^2 + (-x - \varepsilon)_+^2$

since $(x - \varepsilon)_+ \cdot (-x - \varepsilon)_+ = 0$.

Page 46: Use the $p_\varepsilon^2$-Function to Replace the Quadratic $\varepsilon$-Insensitive Function

Replace $|x|_\varepsilon^2$ with

$p_\varepsilon^2(x, \beta) = (p(x - \varepsilon, \beta))^2 + (p(-x - \varepsilon, \beta))^2$

where $p$ is defined by $p(x, \beta) = x + \frac{1}{\beta}\log(1 + e^{-\beta x})$.

[Figure: the p-function with $\beta = 10$: $p(x, 10)$, $x \in [-3, 3]$.]

Page 47: [Figure: $|x|_\varepsilon^2$ and its smooth approximation $p_\varepsilon^2(x, \beta)$ for $\varepsilon = 1$, $\beta = 5$.]
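A sketch reproducing this comparison numerically (NumPy assumed):

```python
import numpy as np

def p(x, beta):
    return x + np.logaddexp(0.0, -beta * x) / beta

def p2eps(x, eps, beta):
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

x = np.linspace(-3.0, 3.0, 121)
exact = np.maximum(0.0, np.abs(x) - 1.0) ** 2    # |x|_eps^2 with eps = 1
print(np.abs(p2eps(x, 1.0, 5.0) - exact).max())  # small smoothing gap
```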

Page 48: $\varepsilon$-Insensitive Smooth Support Vector Regression

$\min_{(w,b)\in R^{n+1}} \; \Phi_{\varepsilon,\beta}(w, b) := \frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(A_iw + b - y_i, \beta)$

which smooths $\frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} |A_iw + b - y_i|_\varepsilon^2$.

- This is a strongly convex minimization problem without any constraints
- The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve the problem
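In code, $\Phi_{\varepsilon,\beta}$ is again only a few lines (a sketch, with p2eps as above):

```python
import numpy as np

def p(x, beta):
    return x + np.logaddexp(0.0, -beta * x) / beta

def p2eps(x, eps, beta):
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

def ssvr_objective(w, b, A, y, C=1.0, eps=0.1, beta=5.0):
    r = A @ w + b - y                      # residuals A_i w + b - y_i
    return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(p2eps(r, eps, beta))
```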

Page 49: Smooth Support Vector Machines for Classification and ... · Smooth Support Vector Machines for Classification and Regression Yuh-Jye Lee National Taiwan University of Science and

Nonlinear -SVRï

w = A0ë, ë ∈ Rm

Based on duality theorem and KKT–optimalityconditions

y ù Aw + eb

y ù AA0ë + eb

y ù K(A,A0)ë + ebIn nonlinear case :

Page 50: Nonlinear SVR

Let $K(A, B): R^{m\times n} \times R^{n\times l} \to R^{m\times l}$, with $A \in R^{m\times n}$ and $B \in R^{n\times l}$, so that $K(A_i, A') \in R^{1\times m}$.

$\min_{(\alpha,b)\in R^{m+1}} \; \frac{1}{2}\|\alpha\|_2^2 + C\sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon$

Nonlinear regression function: $f(x) = K(x, A')\alpha + b$

Page 51: Nonlinear Smooth Support Vector $\varepsilon$-Insensitive Regression

$\min_{(\alpha,b)\in R^{m+1}} \; \frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(K(A_i, A')\alpha + b - y_i, \beta)$

the smooth version of $\frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon^2$.
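And its code counterpart, together with the regression function $f(x) = K(x, A')\alpha + b$ (a sketch; the kernel matrix and rows would be precomputed, e.g. with the gaussian_kernel sketch above):

```python
import numpy as np

def p(x, beta):
    return x + np.logaddexp(0.0, -beta * x) / beta

def p2eps(x, eps, beta):
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

def nonlinear_ssvr_objective(alpha, b, K, y, C=1.0, eps=0.1, beta=5.0):
    r = K @ alpha + b - y                  # K = K(A, A'), m x m
    return 0.5 * (alpha @ alpha + b * b) + 0.5 * C * np.sum(p2eps(r, eps, beta))

def predict(Kx, alpha, b):
    return Kx @ alpha + b                  # Kx = K(x, A'), row of kernel values
```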

Page 52: Numerical Results

- Training set and testing set (slice method)
- A Gaussian kernel is used to generate the nonlinear $\varepsilon$-SVR in all experiments
- The reduced kernel technique is utilized when the training dataset is bigger than 1000 points
- Error measure: 2-norm relative error $\frac{\|y - \hat{y}\|_2}{\|y\|_2}$, where $y$ denotes the observations and $\hat{y}$ the predicted values

Page 53: 101 Data Points in $R \times R$

$f(x) = 0.5\,\mathrm{sinc}(\frac{\pi}{10}x) + \text{noise}$, $x \in [-1, 1]$, 101 points

Noise: mean = 0, $\sigma = 0.04$

Nonlinear SSVR with kernel $e^{-\mu\|x^i - x^j\|_2^2}$
Parameters: $\nu = 50$, $\mu = 5$, $\varepsilon = 0.02$
Training time: 0.3 sec.
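A sketch of generating this dataset, under the assumed reading of the garbled formula as sinc(πx/10) with sinc(t) = sin(t)/t (note np.sinc(u) computes sin(πu)/(πu)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 101)                      # 101 points in [-1, 1]
y = 0.5 * np.sinc(x / 10.0) + rng.normal(0.0, 0.04, size=x.size)
```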

Page 54: First Artificial Dataset

$f(x) = 0.5\,\mathrm{sinc}(\frac{\pi}{30}x)$ + random noise with mean = 0, standard deviation 0.04

$\varepsilon$-SSVR: training time 0.016 sec., error 0.059
LIBSVM: training time 0.015 sec., error 0.068

Page 55: 481 Data Points in $R^2 \times R$

[Figure: estimated function vs. original function.]

Noise: mean = 0, $\sigma = 0.4$
Parameters: $\nu = 50$, $\mu = 1$, $\varepsilon = 0.5$
Training time: 9.61 sec.
Mean absolute error (MAE) over the 49 x 49 mesh points: 0.1761

Page 56: Using the Reduced Kernel: $K(A, A') \in R^{28900\times 300}$

[Figure: estimated function vs. original function.]

Noise: mean = 0, $\sigma = 0.4$
Parameters: $C = 10000$, $\mu = 1$, $\varepsilon = 0.2$
Training time: 22.58 sec.
MAE over the 49 x 49 mesh points: 0.0513

Page 57: Real Datasets

[Table: descriptions of the real datasets used in the experiments.]

Page 58: Linear $\varepsilon$-SSVR: Tenfold Numerical Results

[Table: tenfold results for linear $\varepsilon$-SSVR.]

Page 59: Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (1/2)

[Table: tenfold results for nonlinear $\varepsilon$-SSVR, part 1.]

Page 60: Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (2/2)

[Table: tenfold results for nonlinear $\varepsilon$-SSVR, part 2.]

Page 61: Conclusion

- We introduced SVMs for classification and regression; SVMs can be extended from the linear to the nonlinear case by using the kernel trick.
- We applied the smoothing technique to SVMs to propose smooth SVMs for classification and regression.
- We also described the Newton-Armijo algorithm for solving smooth SVMs, which has been shown to converge globally and quadratically to the solution in a finite number of steps.
- The numerical results show the effectiveness and correctness of linear and nonlinear smooth SVMs.
- Our smooth SVM formulations only need to solve a system of linear equations iteratively, instead of solving a convex quadratic program.