SVM_Lecture.ppt

Page 1:

By: Moataz Al-Haj
Vision Topics Seminar (University of Haifa)
Supervised by Dr. Hagit Hel-Or

Page 2:

Page 3:

- Introduction to SVM: history and motivation
- Problem definition
- The SVM approach: the linearly separable case
- SVM: the non-linearly separable case:
  - VC dimension
  - The kernel trick: discussion of kernel functions.
  - Soft margin: introducing the slack variables and
    discussing the trade-off parameter "C".
  - Procedure for choosing an SVM model that best
    fits our problem ("k-fold").
- Some applications of SVM.
- Conclusion: the advantages and drawbacks of SVM.

Page 4:
Before starting

1- Throughout the lecture, if you see underlined red-colored
text, then click on this text for further information.
2- Let me introduce you to "Nodnikit": she is an outstanding
student. She asks many questions, but sometimes
these questions are key questions that help us understand
the material more in depth. Also, the notes that she gives are
very helpful. Hi!

Page 5:
Introduction to SVM: History and motivation

- Support Vector Machine (SVM) is a supervised
learning algorithm developed by Vladimir Vapnik, and
it was first heard of in 1992, introduced by Vapnik,
Boser and Guyon in COLT-92. (It is said that Vladimir
Vapnik mentioned its idea in 1979 in one of his papers,
but its major development was in the 90's.)
- For many years Neural Networks was the ultimate
champion; it was the most effective learning
algorithm.

TILL SVM CAME!

Page 6:
Introduction to SVM: History and motivation cont'd

- SVM became popular because of its success in
handwritten digit recognition (on NIST, 1998). It gave
accuracy comparable to sophisticated and
carefully constructed neural networks with elaborated
features in a handwriting recognition task. [1]
- It is a much more effective "off the shelf" algorithm than
Neural Networks: it generalizes well on unseen data, is
easier to train, and doesn't have any local
optima, in contrast to neural networks, which may have
many local optima and take a lot of time to converge.

Page 7:
Introduction to SVM: History and motivation cont'd

- SVM has successful applications in many
complex, real-world problems such as text and
image classification, handwriting recognition,
data mining, bioinformatics, medicine and
biosequence analysis, and even the stock market!
- In many of these applications SVM is the best
choice.
- We will further elaborate on some of these
applications later in this lecture.

Page 8:
Problem definition:

- We are given a set of n points (vectors) $x_1, x_2, \ldots, x_n$,
such that $x_i$ is a vector of length m,
and each belongs to one of two classes, which we label
by "+1" and "-1".
- So our training set is:

  $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, with $\forall i \;\; x_i \in R^m, \; y_i \in \{+1, -1\}$

- We want to find a separating hyperplane

  $w \cdot x + b = 0$

that separates these points into the two classes,
"the positives" (class "+1") and "the negatives" (class "-1").
(Assuming that they are linearly separable.)

So the decision function will be $f(x) = \mathrm{sign}(w \cdot x + b)$.
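A minimal sketch of this decision function in Python/NumPy; the particular w and b below are made-up illustrative values, not from the lecture:

```python
import numpy as np

def decision(x, w, b):
    """Linear SVM decision function: f(x) = sign(w . x + b)."""
    return np.sign(np.dot(w, x) + b)

# Toy hyperplane in R^2 (illustrative values only).
w = np.array([1.0, -1.0])
b = 0.5
print(decision(np.array([2.0, 0.0]), w, b))  # +1: the "positive" side
print(decision(np.array([0.0, 2.0]), w, b))  # -1: the "negative" side
```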

Page 9: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 9/69

 A se!aratin)

y!re!lane 

1i y   = −

1i y   = +

Se!aratin) Hy!er!lane

But Tere are many !ossibilities

  for suc y!er!lanes 44

( ) ( ) f x sign w x b= × +

0w x b× + =

1 x

2 x

Page 10: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 10/69

Se!aratin) Hy!er!lanes

1i y   = −1i y   = +

Des6 Tere are many !ossible se!aratin) y!er!lanes

It could be

ic one sould .e

coose4

tis one or tis or tis or maybeE(4

Page 11: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 11/69

$oosin) a se!aratin) y!er!lane:

i x

' x

( )   f x

-Su!!ose .e coose te y!re!lane ,seen belo. tat is

close to some sam!le  (

- #o. su!!ose .e ave a ne. !oint tat sould be in class

*-0+ and is close to ( Fsin) our classification functiontis !oint is misclassified4

i x

i x

' x

Poor )eneralization4

,Poor !erformance onunseen data

( ) ( ) f x sign w x b= × +

Page 12: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 12/69

$oosin) a se!aratin) y!er!lane:

i x' x

-Hy!er!lane sould be as far as !ossible from any

sam!le !oint(

-Tis .ay a ne. data tat is close to te oldsam!les .ill be classified correctly(

7ood )eneralization4

Page 13: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 13/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

-Te SVM idea is to ma1imize te distance bet.een

Te y!er!lane and te closest sam!le !oint(

In te o!timal y!er-

!lane:

Te distance to te

closest ne)ative !oint G

Te distance to teclosest !ositive !oint(

 Aa4 I see 4

Page 14: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 14/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

i x

   M  a  r  )    i  n

  d   d

  d

SVM>s )oal is to ma1imize te Mar)in .ic is t.ice

te distance *d+ bet.een te se!aratin) y!er!lane

and te closest sam!le(

y it is te best-/obust to outliners as

.e sa. and tus

stron) )eneralization

ability(-It !roved itself to ave

better !erformance on

test data in bot

!ractice and in teory(

Page 15: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 15/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

i x

   M  a  r  )    i  n

  d   d

  d

Tese are

Su!!ort

Vectors

Su!!ort vectors are te sam!les closest to te

se!aratin) y!er!lane(

e .ill see latter tat te

8!timal y!er!lane is

com!letely defined by

te su!!ort vectors(

84 So tis is .ere

te name came from4

Page 16: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 16/69

w

w

iγ  

i x

 p

"et us loo& at our decision

boundary :Tis se!aratin)

y!er!lane e3uation is :

ere

#ote tat is orto)onal to

te se!aratin) y!er!lane and

its len)t is 0(

, ,m mw R x R b R∈ ∈ ∈

iγ  "et be te distance bet.een te y!er!lane and

Some trainin) e1am!le ( So is te len)t of te

se)ment from ! to (i x

iγ  

i

 x

w

w

0t w x b+ =

 SVM : "inear se!arable case(

ormula for te Mar)in

Page 17: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 17/69

i

γ  

i x

 p

w

w

! is !oint on te y!re!lane

so ( 8n te oter

and  (i i

w p x

wγ = − ×

define

#ote tat if .e can)ed to and to tis

.ill not affect since  (

0t 

w p b+ =

( ) 0t 

i i

ww x b

wγ  − + =

i

i

w x b

wγ   

× +

=

w   wα    b   bα d 

 SVM : "inear se!arable case(

t t w x b w x b

w w

α α 

α 

+ +=

ormula for te Mar)in cont>

1.. 1..min min

t i

ii n i n

w x bd 

wγ  

∈ ∈

+= =
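A quick numeric check of this distance formula; the hyperplane and sample points below are made-up illustrative values:

```python
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5 (made-up hyperplane)
b = -5.0
X = np.array([[3.0, 4.0], [1.0, 1.0], [4.0, 2.0]])  # toy samples

# gamma_i = (w . x_i + b) / ||w||, the distance of x_i to the hyperplane
gammas = (X @ w + b) / np.linalg.norm(w)
print(gammas)                   # per-sample distances: [4.  0.4 3. ]
print(np.min(np.abs(gammas)))   # d: distance of the closest sample
```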

Page 18: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 18/69

-"et be a sam!le !oint closet to

Te boundary( Set

,.e can rescale . and b(

-or uni3ueness set for 

any sam!le closest to teboundary(

So no.

' x

' 1t w x b+ =

i

 x

2m

w

=

  d  d

  m

Te Mar)in

1t 

iw x b+ =

 SVM : "inear se!arable case( 

'   1t w x b

d w w

+= =

ormula for te Mar)in cont>

' x

Page 19: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 19/69

To find te o!timal se!aratin) y!er!lane 6 SVM

aims to ma1imize te mar)in:

-Ma1imize 

2

m w=

suc

tat:For 1, 1

For 1, 1

i i

i i

 y b

 y b

= + + ≥

= − + ≤ −

w x

w x

Minimize 

21

2 w

( ) 1T 

i i y b+ ≥w x

suc

tat:

SVM : "inear se!arable case( 

indin) te o!timal y!er!lane:

e transformed te !roblem into a form tat can be

efficiently solved( e )ot an o!timization !roblem .it a

conve1 3uadratic objective .it only linear constrains and

al.ays as a sin)le )lobal minimum(
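As a sketch, this convex QP can be handed to an off-the-shelf solver. Here is a minimal illustration with scikit-learn's linear SVC (our choice of library, not the lecture's); a very large C approximates the hard-margin problem above:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Very large C approximates the hard-margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # w and b of the optimal hyperplane
print(clf.support_vectors_)        # the samples that define it
```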

Page 20: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 20/69

-8ur o!timization !roblem so far:

( ) 1T 

i i y b+ ≥w x

21minimize

2w

s(t(

-e .ill solve tis !roblem by introducin) "a)ran)e

multi!liers  associated .it te constrains: iα 

2

1

1minimize ( , , ) ( ( ) 1)

2

. 0

n

 p i i i

i

i

 L w b w y x w b

 s t 

α α 

α 

=

= − × + −

I do remember te

"a)ran)e Multi!liers

from $alculus4

SVM : "inear se!arable case( 

Te o!timization !roblem:

Page 21: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 21/69

So our !rimal o!timization !roblem no.:

e star solvin) tis !roblem:

0 p L

b

∂=

0 p L∂

=∂w 1

n

i i i

i

 yα =

= ∑w x

1

0n

i i

i

 yα =

=∑

SVM : "inear se!arable case( 

Te o!timization !roblem cont>:

2

1

1minimize ( , , ) ( ( ) 1)

2. 0

n

 p i i i

i

i

 L w b w y x w b

 s t 

α α 

α =

= − × + −

Page 22: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 22/69

By substitutin) te above results in te !rimal

!roblem and doin) some mat mani!ulation .e )et:"a)ran)ian %ual Problem:

 

1 0 0

1

1maximaize ( )

2

. 0 0

n n nt 

 D i i j i j i j

i i j

n

i i i

i

 L y y x x

 s t and y

α α α α  

α α 

= = =

=

= −

≥ =

∑ ∑∑

are no. our variables6 one for eac sam!le

!oint (1 2

{ , ,........., }nα α α α  =

SVM : "inear se!arable case(

Inroducin) Te "e)ran)in %ual Problem(

i x

Page 23: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 23/69

Fsin) te ''T ,'arus-'un-Tuc&er condition: 

( )( ) 1 0T 

i i i y bα    + − =w xi∀

-e can calculate  *b+ by ta&in) * i+ suc tat  :0iα   >

Must be 1

( ) 1 0 ( {1, 1})t t t 

i i i i i i

i

 y w x b b w x y w x y y

+ − = ⇒ = − = − ∈ −

-$alculatin) *.+ .ill be done usin) .at .e ave 

found above : i i i

i

w y xα = ∑

-Fsually 6Many of te -s are zero so te

calculation of *.+ as a lo. com!le1ity(iα 

SVM : "inear se!arable case(

indin) *.+ and *b+ for te boundary  :t w x b+

Page 24: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 24/69

-Sam!les .it are te Su!!ort Vectors: te

closest sam!les to te se!aratin) y!er!lane( 

0iα   >

-So (

-And suc tat is a su!!ort vector(

 -e see tat te se!aratin) y!er!lane is

com!letely defined by te su!!ort vectors(

1 SV

n

i i i i i i

i i y yα α 

= ∈= =∑ ∑

w x x

i ib y w x= −   i x

t w x b+

-#o. our %ecision unction is:

( ) ( ) ( )t 

i i i

i SV 

 f x sign w x b sign y x x bα 

= + = × +∑

SVM : "inear se!arable case(

Te im!ortance of te Su!!ort Vectors : 

Page 25: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 25/69

1 0 0

1

1maximaize ( )

2

. 0 0

n n nt 

 D i i j i j i j

i i j

n

i i i

i

 L y y x x

 s t and y

α α α α  

α α 

= = =

=

= −

≥ =

∑ ∑∑

( ) D

 L   α  $an be o!timized usin) a JP soft.are( Some e1am!les

are "o3o6 c!le16 etc( ,see tt!:KK...(numerical(rl(ac(u&K3!K3!(tml

-But for SVM te most !o!ular JP is Se3uential Minimal

8!timization ,SM8: It .as introduced by Lon $( Platt in

0555(And it is .idely used because of its efficiency (9;

-Tis is a 3uadratic !ro)rammin) ,JP !roblem(

 A )lobal ma1imum of can al.ays be found( ) D L   α 

SVM : "inear se!arable case( 

Some notes on te dual !roblem:
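A minimal sketch of solving this dual with a generic solver (SciPy's SLSQP here, our assumption; real SVM packages use SMO) on a toy 2-D set, then recovering w and b as on the previous slides:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j x_i . x_j

# Maximize L_D(a)  <=>  minimize -L_D(a)
neg_LD = lambda a: 0.5 * a @ G @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}   # sum_i a_i y_i = 0
res = minimize(neg_LD, np.zeros(n), bounds=[(0, None)] * n, constraints=cons)

a = res.x
w = (a * y) @ X                 # w = sum_i a_i y_i x_i
sv = np.argmax(a)               # any i with a_i > 0 (a support vector)
b = y[sv] - w @ X[sv]           # b = y_i - w^t x_i, from the KKT condition
print(w, b)
```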

Page 26: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 26/69

at if te sam!le !oints are not linearly

se!arable 4

 %efinition: *Te V$ dimension of a class of functions fi N is te

ma1imum number of !oints tat can be se!arated ,sattered 

into t.o classes in all !ossible .ays by fi N (+ 9O;

-if .e loo& at any ,non -collinear tree !oints in 2d !lane teycan be "inearly se!arated:

Tese ima)es above are ta&en fromE(

 Te V$ dimension for a set of oriented lines in is ( 2 R

V$ ,Va!ni&-$ervonen&is %imension

Page 27: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 27/69

our !oints not

se!arable in

By a y!re!lane

But can be se!arable in

  By a y!re!lane

2 R

3 R

-+Te V$ dimension of te set of orientedy!er!lanes in is nC0(+ 9O;n R

-Tus it is al.ays !ossible6 for a finite set of !oints

to find a dimension .ere all !ossible se!aration

of te !oint set can be acieved by a y!er!lane(

V$ %imension cont>

Page 28: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 28/69

'ey idea: ma! our !oints .it a ma!!in) function

to a s!ace of sufficiently i) dimension so tat tey .ill be se!arable by a y!re!lane:

0   x

or e1am!le :a non linearly se!arable in one dimension: 

ma!!in) data to t.o-dimensional s!ace .it  2

( ) ( , ) x x xφ    =

0   x

 x2

( ) xφ 

o.46 no. .e can

use te linear SVM

.e learned in tis

i)er dimensional

s!ace4

#on-linear SVM :

Ma!!in) te data to i)er dimension

 -In!ut s!ace: te s!ace .ere te !oints #i are located

-eature s!ace: te s!ace of φ,#i after transformation
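A tiny sketch of this exact map: sample values and the separating threshold below are ours, but the construction is the slide's $\phi(x) = (x, x^2)$. Points on the line that no single threshold separates become line-separable in the feature space:

```python
import numpy as np

x = np.array([-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5])  # 1-D inputs
y = np.array([1, 1, 1, 1, -1, -1, -1])  # outer points vs. inner points

# No single threshold on x separates the classes, but after
# phi(x) = (x, x^2) the horizontal line x2 = 2 does:
phi = np.column_stack([x, x ** 2])
pred = np.where(phi[:, 1] > 2.0, 1, -1)   # hyperplane x^2 - 2 = 0
print(np.all(pred == y))                  # True: separable in feature space
```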

Page 29: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 29/69

#on "inear SVM:

-To solve a non linear classification !roblem .it a

linear classifier all .e ave to do is to substitute  ( ) xφ 

Instead of 1 every.ere .ere 1 a!!ears in te

o!timization !roblem:

1 1 1 1

1maximize ( ) . 0 0

2

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α α α  = = = =

= − ≥ =∑ ∑∑ ∑

#o. it .ill be:

1 1 1 1

1maximize ( ) ( ) ( ) . 0 02

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α φ φ α α  = = = =

= − ≥ =∑ ∑∑ ∑

Te decision function .ill be: ( ) ( ( )) ( ( ) )t  g x f x sign w x bφ φ = = × +

Ma!!in) te data to i)er dimension cont>

$lic& ere to see a demonstration of ma!!in) te data to a

i)er dimension so tat te can be linearly s!arable(

Page 30: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 30/69

#on "inear SVM :

 An illustration of te al)oritm:

Page 31: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 31/69

But $om!utations in te feature s!ace can be costly because it may

be i) dimensional 4

Te 'ernel Tric&:

Tat>s ri)t 46 .or&in) in i) dimensional

s!ace is com!utationally e1!ensive(

-But luc&ily te &ernel tric& comes to rescue: 

If .e loo& a)ain at te o!timization !roblem:

1 1 1 1

1maximize ( ) ( ) ( ) . 0 0

2

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α φ φ α α  = = = =

= − ≥ =∑ ∑∑ ∑

 And te decision function: 

#o need to &no. tis ma!!in) e1!licitly nor do .e need to

&no. te dimension of te ne. s!ace6 because .e only use 

te dot !roduct of feature vectors in bot te trainin) and test(

1

( ( )) ( ( ) ) ( ( ) (   ) )n

t t 

i i i

i

  f x sign w x b sign y x x bφ φ α φ φ  =

= + = +∑

Page 32: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 32/69

 A kernel function is defined as a function tat

corres!onds to a dot !roduct of t.o feature vectors

in some e1!anded feature s!ace:

( , ) ( ) ( )T 

i j i j K    φ φ ≡x x x x

#o. .e only need to com!ute and .e don>t

need to !erform com!utations in i) dimensional

s!ace e1!licitly( Tis is .at is called te 'ernel Tric&( 

( , )i j

 K x x

Te 'ernel Tric&:

Page 33: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 33/69

'ernel Tric&: $om!utational savin) of te &ernel tric&

?1am!le Juadratic Basis function: ,Andre. Moore 

Te cost of

com!utation is: 

2( )O m

,m is te dimension of in!ut

ere as te corres!ondin) 'ernel is :2( , ) ( 1) K a b a b= × +

( )O mTe cost of com!utation is:

To believe me tat it

is really te real

'ernel :
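A numeric sanity check, as a sketch: the √2-scaled quadratic basis below follows Andrew Moore's construction, and the assertion is that $(a \cdot b + 1)^2$ equals $\phi(a)^t \phi(b)$:

```python
import numpy as np
from itertools import combinations

def phi_quadratic(x):
    """Quadratic basis: 1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j (i<j)."""
    feats = [1.0]
    feats += list(np.sqrt(2.0) * x)
    feats += list(x ** 2)
    feats += [np.sqrt(2.0) * x[i] * x[j]
              for i, j in combinations(range(len(x)), 2)]
    return np.array(feats)

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
lhs = (a @ b + 1.0) ** 2                    # O(m) kernel evaluation
rhs = phi_quadratic(a) @ phi_quadratic(b)   # O(m^2) explicit dot product
print(np.isclose(lhs, rhs))                 # True
```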

Page 34: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 34/69

Hi)er 8rder Polynomials ,rom Andre. Moore

( ) ( )kl k l k l  

Q y y x xφ φ =

/ is te number of sam!les6 m is te dimension of te

sam!le !oints(

1 ,k l R≤ ≤

Page 35: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 35/69

,a&a te 7ram matri1:

-Te central structure in &ernel macines

-Information Qbottlenec&>: contains all necessaryinformation for te learnin) al)oritm(

-one of its most interestin) !ro!erties: Mercer>s

Teorem(  based on notes from ...(su!!ort-vectors(com

'G

Te 'ernel Matri1

Page 36: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 36/69

Mercer>s Teorem:

-A function is a &ernel ,tere e1ists a

suc tat Te 'ernel matri1

is Symmetric Positive Semi-definite(

( , )i j K x x

-Anoter version of mercer>s teorem tat isn>t

related to te &ernel matri1 is: function 

is a &ernel 

( , )i j

 K x x

2

( )

( )

( , ) ( ) ( ) 0

  for any suc tat 

is fi

 g u

 g u du

 K u ! g 

nit" t"n

u g ! dud! ≥

∫ 

∫ 7reat46 so &no.

.e can cec& if

+ ' *is a &ernel

.itout te need

to &no.  ( ) xφ 

( ) xφ 

( , ) ( ) ( )T 

i j i j K    φ φ ≡x x x x
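A sketch of checking the finite-sample Mercer condition: build the Gram matrix of a candidate kernel on some points and verify it has no negative eigenvalues. The Gaussian RBF and the random points are just our example choices:

```python
import numpy as np

def rbf(xi, xj, sigma=1.0):
    """Gaussian RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # random sample points

K = np.array([[rbf(a, b) for b in X] for a in X])  # Gram matrix
eigs = np.linalg.eigvalsh(K)                       # K is symmetric
print(eigs.min() >= -1e-10)  # True: positive semi-definite (up to rounding)
```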

Page 37: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 37/69

?1am!les of 'ernels:

-Some common coices ,te first t.o al.ays

satisfyin) Mercer>s condition:

-Polynomial &ernel   ( , ) ( 1)t p

i j i j K x x x x= +

-7aussian /adial Basis unction */B+ ,data is lifted

to infinite dimension:2

2

1( , ) exp( )

2i j i j K x x x x

σ  

= − −

-In fact6 SVM model usin) a si)moid &ernel function is

e3uivalent to a t.o-layer6 feed-for.ard neural net.or&(  

( , ) tanh( )i j i j K x x kx x   δ = × −

δ 

-Si)moidal : ,it is not a

&ernel for every & and ( 

Page 38: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 38/69

Ta&en from ,$SI R2R SVM

lecture 9<;

#o. .e can

ma&e com!le1

&ernels from

sim!le ones:

Modularity 4

Ma&in) 'ernels:

Page 39: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 39/69

Im!ortant 'ernel Issues:

Ho. to verify tat risin) to i)er dimension usin) a

s!ecific &ernel .ill ma! te data to a s!ace in .ic

tey are linearly se!arable

Ho. to &no. .ic 'ernel to use

or most of te &ernel function .e don>t &no. te corres!ondin)ma!!in) function so .e don>t &no. to .ic dimension .e

rose te data( So even tou) risin) to i)er dimension

increases te li&eliood tat tey .ill be se!arable .e can>t

)uarantee tat ( e .ill see a com!romisin) solution for tis

!roblem(

-Tis is a )ood 3uestion and actually still an o!en 3uestion6

many researces ave been .or&in) to deal .it tis issue

but still .e don>t ave a firm ans.er( It is one of te .ea&ness

of SVM( e .ill see an a!!roac to tis issue latter(

( ) xφ 

I ave some 3uestions

on &ernels( I .rote tem

on te board( 

Page 40: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 40/69

Im!ortant 'ernel Issues:

e sa. tat te 7aussian /adial Basis 'ernel lifts te

data to infinite dimension so our data is al.ays

se!arable in tis s!ace so .y don>t .e al.ays use

tis &ernel irst of all .e sould decide .ic to use in tis &ernel ,

(

Secondly6A stron) &ernel 6.ic lifts te data to infinite dimension6

sometimes may lead us te severe !roblem of  8verfittin):

 

σ 2

2

1exp( )

2  i j x x

σ − −

Sym!toms of overfittin):

0-"o. mar)in !oor classification !erformance(

2-"ar)e number of su!!ort vectors Slo.s do.n te

com!utation(
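A small sketch of the second symptom with scikit-learn (our stand-in tool, not the lecture's): shrinking σ, i.e. raising gamma = 1/(2σ²), drives up the number of support vectors on a fixed dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a simple nonlinear rule

for gamma in (0.1, 1.0, 100.0):           # gamma = 1 / (2 sigma^2)
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    # A very narrow RBF (huge gamma) memorizes: many support vectors.
    print(gamma, clf.n_support_.sum())
```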

Page 41: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 41/69

-If .e loo& at te &ernel matri1 ten it is almost dia)onal(

Tis means tat te !oints are orto)onal and only similar to

itself(

 All tese tin)s lead us to say tat our &ernel function is notreally ade3uate( Since it does not )eneralize )ood over te

data(

-It is )ood to say tat 7aussian radial basis function ,/B is

.idely used6 BFT not alone because teir )ot to be a tool torelease some !ressure of tis stron) &ernel(

Im!ortant 'ernel Issues:

In addition to te above !roblems 6 anoter !roblem is tat

sometimes te !oints are linearly se!arable but te mar)in is

"o. :

Page 42: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 42/69

Im!ortant 'ernel Issues:

"inearly se!arable

But lo. mar)in4

 All tese !roblems leads us to te com!romisin)

solution:

Soft Mar)in4

Page 43: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 43/69

Soft Mar)in:

-e allo. *error+  in classification( e use *slac&+

Variables ,one for eac sam!le(

iξ 

1 2, ,...... nξ ξ ξ 

0 1i

ξ < <

0 1iξ < <

1iξ   >

i

ξ  Is te deviation error

from ideal !lace for

sam!le i: 

-If ten sam!le i is

on te .ron) side of te

y!er!lane(

1iξ   >

-If ten sam!le i is

on te ri)t side of te

y!er!lane but .itin tere)ion of te mar)in(

0 1iξ < <

Page 44: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 44/69

Soft Mar)in:

Ta&en from 900;

Page 45: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 45/69

Soft Mar)in: 

Te !rimal o!timization !roblem

-e can)e te constrains to

instead of   (

8ur o!timization !roblem no. is:

( ) 1 0t 

i i i i y w x b iξ ξ + ≥ − ∀ ≥

( ) 1t 

i i y w x b i+ ≥ ∀

2

1

1minimize

2

n

i

i

#    ξ =

+   ∑w

Suc tat:   ( ) 1 0t 

i i i i y w x b iξ ξ + ≥ − ∀ ≥

0#  >  is a constant( It is a &ind of !enalty on te 

term ( It is a tradeoff bet.een te mar)in and te

trainin) error( It is a .ay to control overfittin) alon)

.it te ma1imum mar)in a!!roac90;(

1

n

i

i

ξ =

Page 46: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 46/69

8ur dual o!timization !roblem no. is:

1 1 1

1maximize

2

n n nT 

i i j i j i j

i i j

 y yα α α = = =

−∑ ∑∑   x x

Suc tat:1

0 0

n

i i i

i

# i and yα α =

≤ ≤ ∀ =∑-e can find *.+ usin) :

1

n

i i i

i

w y xα =

= ∑-To com!ute *b+ .e ta&e any and solve for  *b+( 0 i   # α < <

[ ( ) 1] 0t i i i y w x bα    + − =

Soft Mar)in: 

Te %ual ormulation(

ic value for *$+

sould .e coose(

( ) 1 (poi

0 ( ) 1

nts 0)

0 ( ) 1

i i i

i

i i i i

i i

# y w x b wit

 y w x b

# y w x b

α 

α 

α ξ 

= ⇒ + >

< < ⇒ + =

= ⇒ + < >

Page 47: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 47/69

-*$+ !lays a major role in controllin) overfittin)(

-"ar)er $ less trainin) sam!les tat are not in ideal !osition,.ic means less trainin) error tat affects !ositively te

$lassification Performance ,$P But smaller mar)in ,affects

ne)atively te ,$P ($ lar)e enou) may lead us to overffitin)

,too muc com!licated classifier tat fits only te trainin) set  -Smaller $ more trainin) sam!les tat are not in ideal

!osition ,.ic means more trainin) error tat affects ne)atively

te $lassification Performance ,$P But lar)er Mar)in ,)ood for

,$P( $ small enou) may lead to underffitin) ,nave classifier

-indin) te */i)t+ value for *$+ is one of te major

!roblems of SVM:

Soft Mar)in: 

Te *$+ Problem
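A sketch of this trade-off with scikit-learn (our choice of tool): as C grows, fewer samples are allowed inside the margin, so the number of support vectors drops while the margin 2/||w|| shrinks:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)      # two overlapping blobs

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(C, clf.n_support_.sum(), round(margin, 3))
```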

Page 48: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 48/69

Too muc sim!le4 Too muc com!licated4

Soft Mar)in: 

Te *$+ Problem: 8verfittin) and Fnderfittin)

Fnder-ittin) 8ver-ittin)

Trade-8ff 

Based on 02 and

Page 49: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 49/69

SVM :#onlinear case

$ecipe and %odel selection procedure& 

-In most of te real-.orld a!!lications of SVM .e combine .at

.e learned about te &ernel tric& and te soft mar)in and use

tem to)eter :

i 1 1 1

1

1maximize ( , )

2

0 0

n n n

i i j i j i j

i j

n

i i i

i

 y y K x x

constrain"d to # i and y

α α α 

α α 

= = =

=

≤ ≤ ∀ =

∑ ∑ ∑

-e solve for usin) a Juadratic Pro)rammin) soft.are( α 

-To find *b+ .e ta&e any and solve0i

  # α < <   [ ( ) 1] 0t 

i i i y w x bα    + − =

-Te $lassification function .ill be:1( ) ( ( , ) )

n

i i ii g x sign y K x x bα =

= +

1

( ) ( " ))" (n

 j j j

 j

w y x $o n""d to find b"caus" w" may not knw   ow xα φ φ =

= ∑

1 1

( ( ( )) ( ) ) 1 ( , )n n

i j j j i i j j j i

 j j

 y y x x b b y y K x xα φ φ α  = =

⇒ + = ⇒ = −∑ ∑

SVM # li
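A compact sketch tying the recipe together in NumPy: the dual is solved with the generic SciPy approach sketched earlier (SMO in real packages), an RBF kernel stands in for K, and b and g(x) follow the formulas above. The data, C, σ, and the interior-point pick are all our illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def K(a, b, sigma=1.0):                   # Gaussian RBF kernel
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

# Toy data: class +1 inside a ring of class -1 (illustrative only).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1.0, -1.0)

n, C = len(y), 10.0
Gram = np.array([[K(xi, xj) for xj in X] for xi in X])
Q = (y[:, None] * y[None, :]) * Gram

neg_LD = lambda a: 0.5 * a @ Q @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}
a = minimize(neg_LD, np.zeros(n), bounds=[(0, C)] * n, constraints=cons).x

i = np.argmin(np.abs(a - C / 2))          # heuristic pick of some 0 < a_i < C
b = y[i] - np.sum(a * y * Gram[:, i])     # b = y_i - sum_j a_j y_j K(x_j, x_i)

def g(x):  # g(x) = sign(sum_i a_i y_i K(x_i, x) + b)
    return np.sign(np.sum(a * y * np.array([K(xi, x) for xi in X])) + b)

print(np.mean([g(xi) == yi for xi, yi in zip(X, y)]))  # training accuracy
```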

Page 50: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 50/69

0 divide randomly all te available trainin) e1am!les into K

e3ual-sized subsets(

2 use all but one subset to train te SVM .it te cosen !ara>( use te eld out subset to measure classification error(

re!eat Ste!s 2 and for eac subset(

R avera)e te results to )et an estimate of te )eneralization

error of te SVM classifier(

 

SVM:#onlinear case

%odel selection procedure

-e ave to decide .ic erne! function and *$+ value to use(

-+In !ractice a "a#ssian radial basis or a lo. de)ree !olynomial

&ernel is a )ood start(+ 9$n%re&.oore; 

- e start cec&in) .ic set of !arameters ,suc as $

or if .e coose 7aussian radial basis are te mosta!!ro!riate by $ross-Validation ,'- fold , 9 @ ; :

σ 
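As a sketch of these five steps (scikit-learn is our stand-in; the lecture's own example uses LibSVM), k-fold cross-validation for one parameter setting looks like this:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)   # toy target

# 5-fold CV estimate of generalization error for one (C, gamma) setting.
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma=0.5), X, y, cv=5)
print(1.0 - scores.mean())   # estimated generalization error
```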

Page 51: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 51/69

-Te SVM is tested usin) tis !rocedure for various !arameter

settin)s( In te end6 te model .it te smallest )eneralization

error is ado!ted( Ten .e train our SVM classifier usin) tese

!arameters over te .ole trainin) set(

- or 7aussian /B tryin) e1!onentially )ro.in) se3uences of$ and is a !ractical metod to identify )ood !arameters :

  - A )ood coice is te follo.in) )rid:

 

SVM:#onlinear case

%odel selection procedure cont'

Tis )rid is su))ested by "ibSVM ,An inte)rated and easy-

to-use tool for SVM classifier

σ 

1

1 1 3

2 , 2 ,......, 2

2 , 2 ,...., 2

σ 

− −

− −

=

=
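A sketch of that grid search with scikit-learn's GridSearchCV (our adaptation: its gamma parameter plays the role of 1/(2σ²)):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Exponentially growing grid, in the spirit of the LibSVM guide.
grid = {"C": 2.0 ** np.arange(-5, 16, 2),
        "gamma": 2.0 ** np.arange(-15, 4, 2)}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print(search.best_params_)   # the (C, gamma) with the smallest CV error
```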

Page 52: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 52/69

SVM:#onlinear case 

%odel selection procedure& e#ample

Tis e1am!le is !rovided in te libsvm )uide( In tis e1am!le

tey are searcin) te *best+ values for *$+ and  for an

/B 'ernel for a )iven trainin) usin) te model selection

!rocedure .e sa. above( 

σ 

*2 , 2# 

is a good coic"

σ    −= =

SVM M lti l l ifi ti , t t

Page 53: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 53/69

Tere are t.o basic a!!roaces to solve q-class !roblems, .it SVMs ,90=;6900;:0- 8ne vs( 8ters:

.or&s by constructin) a *re)ular+ SVM for eac class i tat

se!arates tat class from all te oter classes ,class * i+ !ositiveand *not i+ ne)ative( Ten .e cec& te out!ut of eac of te

3 SVM classifiers for our in!ut and coose te class i tat its

corres!ondin) SVM as te ma1imum out!ut(

2% >

iω 

2-Pair.ise ,one vs one:e construct */e)ular+ SVM for eac !air of classes ,so .econstruct 3,3-0K2 SVMs( Ten .e use *ma1-.ins+ votin)

strate)y: .e test eac SVM on te in!ut and eac time an

SVM cooses a certain class .e add vote to tat class( Ten

.e coose te class .it i)est number of votes(

SVM or Multi-class classification: ,more tan t.o

classes

( ( ) )t  g x w x b= +

SVM M lti l l ifi ti t>
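A sketch of both schemes via scikit-learn (our stand-in, which wraps LibSVM): SVC trains the q(q-1)/2 pairwise classifiers internally, while LinearSVC follows the one-vs-rest scheme:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)        # 3 classes

ovo = SVC(kernel="linear").fit(X, y)     # one-vs-one: q(q-1)/2 pairwise SVMs
ovr = LinearSVC().fit(X, y)              # one-vs-rest: q "class i vs. not i" SVMs

print(ovo.predict(X[:3]), ovr.predict(X[:3]))
```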

Page 54: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 54/69

-More advanced metod to im!rove !air.ise metod includes

usin) decision )ra!s to determine te class selected in a

similar manner to &noc&out tournaments: 

#ample of advanced pair"ise

SV%. T!e numbers -* encode

t!e classes. Ta+en from,

-Bot mentioned metods above )ive in avera)e com!arable

accuracy results ,.ere as te second metod is relatively

slo.er tan te first (-Sometimes for certain a!!lication one metod is !referable

over te oter(

SVM or Multi-class classification cont>:

A li ti f SVM

Page 55: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 55/69

 A!!lications of SVM:

e .ill see no. some a!!lications for SVM from different

fields and elaborate on one of tem .ic is facial e1!ression

reco)nition( or more a!!lications you can visit:tt!:KK...(clo!inet(comKisabelleKProjectsKSVMKa!!list(tml

0- Hand.ritten di)it reco)nition: Te Success of SVM in Tis

a!!lication made it !o!ular:

0(0U test error rate for SVM in #IST ,055@( Tis is te same

as te error rates of a carefully constructed neural net.or&6

"e#et tat .as made *by and+ (90;

Page 56: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 56/69

Today SVM is te best classification metod for and.ritten

di)it reco)nition 90=;:

2- Anoter field tat uses SVM is Medicine: it is used indetecting Microcalcifications in Mammo)rams .ic is an

indicator for breast cancer6 usin) SVM( when compared to

several other existing methods, the proposed SVM

framework offers the best performance [ 8  

 A!!lications of SVM: continued

A li ti f SVM ti d

Page 57: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 57/69

-SVM even as uses in Stoc& mar&et field is Stoc& Mar&et: 

o.4 many

a!!lications

for SVM4

 A!!lications of SVM: continued

A!!lications of SVM:

Page 58: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 58/69

acial ?1!ression /eco)nition: based on acial ?1!ression

/eco)nition Fsin) SVM by Pili!! Micel et al 95;:

-Human bein)s naturally and intuitively use facial e1!ression

as an im!ortant and !o.erful modality to communicate teir

emotions and to interact socially(-acial e1!ression constitutes RR !ercent of te effect of a

communicated messa)e(

-In tis article facial e1!ression are divided into si1 basic *!ea&+

emotion classes : an)er6 dis)ust6 fear6 joy6 sorro.6 sur!riseN,Te neutral state is not a *!ea&+ emotion class

 A!!lications of SVM:

acial ?1!ression /eco)nition

A li ti f SVM

Page 59: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 59/69

-Tree basic !roblems a facial e1!ression analysis a!!roac

needs to deal .it:

0-face detection in a still ima)e or ima)e se3uence :

Many articles as dealt .it tis !roblem suc as ViolaLones(

e assume a full frontal vie. of te face(2-acial e1!ression data e1traction:

-An Automatic trac&er e1tracts te !osition of 22 facial features

from te video stream ,or an ima)e if .e are .or&in) .it still

ima)e(-or eac e1!ression6 a vector of feature dis!lacements is

calculated by ta&in) te ?uclidean distance bet.een feature

locations in a neutral state of te face and a *!ea&+ frame

re!resentative of te e1!ression(

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 60: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 60/69

-acial e1!ression classification: e use Te SVM metod

.e sa. to construct our classifier and te vectors of feature

dis!lacements for te !revious sta)e are our in!ut(

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 61: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 61/69

 A!!lications of SVM:

acial ?1!ression /eco)nition

vectors of feature dis!lacements

A!!lications of SVM:

Page 62: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 62/69

-A set of 0= e1am!les for eac basic emotion ,in still ima)es

.as used for trainin)6 follo.ed by classification of 0R unseen

e1am!les !er emotion( Tey used libsvm as te underlyin)

SVM classifier(

-At first Tey used te standard SVM classification usin)

linear &ernel and tey )ot <@U accuracy(-Ten .it subse3uent im!rovements includin) selection of

a &ernel function ,tey cose /B and te ri)t *$+

customized to te trainin) data6 te reco)nition accuracyboosted u! to @<(5U4-Te uman Qceilin)> in correctly classifyin) facial e1!ressions

into te si1 basic emotions as been establised at 50(<U by

?&man riesen

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 63: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 63/69

e see some !articular combinations suc as ,fear vs( dis)ust are arder

to distin)uis tan oters(

-Ten tey moved to constructin) teir classifier for streamin) video

rater tan still ima)es: $lic& ere for a demoof facial e1!ression

reco)nition ,from

anoter source but

also used SVM

 A!!lications of SVM:

acial ?1!ression /eco)nition

Te Advanta)es of SVM:

Page 64: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 64/69

►SVMs ave been demonstrated su!erior classification

 Accuracies to neural net.or&s and oter metods in many

 A!llications(90=;:

 

►Based on a stron) and nice Teory90=;:

  -In contrast to !revious *blac& bo1+ learnin) a!!roaces6 SVMs allo.

for some intuition and uman understandin)(

►Trainin) is relatively easy90;:

-#o local o!timal6 unli&e in neural net.or&  -Trainin) time does not de!end on dimensionality of

feature s!ace6 only on fi1ed in!ut s!ace tan&s to te &ernel tric&(

 W7enerally avoids over-fittin) 90;:

  - Tradeoff bet.een classifier com!le1ity and error can be

  controlled e1!licitly(

Te Advanta)es of SVM:

-)eneralize .ell even in i) dimensional s!aces under small trainin) set

conditions( Also it is robust to noise90=;

Te %ra.bac&s of SVM:

Page 65:
The Drawbacks of SVM:

► It is not clear how to select a kernel function in a principled
manner [2].

► What is the right value for the "trade-off" parameter "C"? [1]
- We have to search manually for this value, since we
don't have a principled way to find it.

► Tends to be expensive in both memory and computational
time, especially for multiclass problems [2]:
- This is why some applications use SVMs for verification
rather than classification. This strategy is computationally
cheaper, since SVMs are called just to solve difficult cases.
[10]

Page 66:
Software: popular implementations

SVMlight: http://svmlight.joachims.org/
By Joachims; one of the most widely used SVM
classification and regression packages. Distributed as C++
source and binaries for Linux, Windows, Cygwin, and
Solaris. Kernels: polynomial, radial basis function, and neural
(tanh).

LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM (Library for Support Vector Machines) is developed
by Chang and Lin; also widely used. Developed in C++ and
Java, it also supports multi-class classification, weighted
SVM for unbalanced data, cross-validation, and automatic
model selection. It has interfaces for Python, R, Splus,
MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear,
polynomial, radial basis function, and neural (tanh).

Page 67:

That's all folks!!

Check the next slides for references.

Page 68:
References:

[1] Martin Law: SVM lecture for CSE 802, CS department,
MSU.
[2] Andrew Moore: "Support Vector Machines", CS school,
CMU.
[3] Vikramaditya Jakkula: "Tutorial on Support Vector
Machines", School of EECS, Washington State University.
[4] Andrew Ng: "Support Vector Machines", Stanford
University.
[5] Nello Cristianini: "Support Vector and Kernel Machines",
BIOwulf Technologies, www.support-vectors.net.
[6] Carlos Thomaz: "Support Vector Machines", Intelligent
Data Analysis and Probabilistic Inference.

Page 69:
References cont'd:

[7] Greg Hamerly: SVM lecture (CSI 5325).
[8] "Support Vector Machine Learning for Detection of
Microcalcifications in Mammograms", Issam El-Naqa et al.
[9] "Facial Expression Recognition Using Support Vector
Machines", Philipp Michel and Rana El Kaliouby, University of
Cambridge.
[10] "Support Vector Machines for Handwritten Numerical String
Recognition", Luiz S. Oliveira and Robert Sabourin.
[11] "A Practical Guide to Support Vector Classification",
Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin.