SVM_Lecture.ppt

Page 1:

By: Moataz Al-Haj
Vision Topics Seminar (University of Haifa)
Supervised by Dr. Hagit Hel-Or

Page 2:

Page 3:

- Introduction to SVM: history and motivation
- Problem definition
- The SVM approach: the linearly separable case
- SVM: the non-linearly separable case:
  - VC dimension
  - The kernel trick: discussion of kernel functions.
  - Soft margin: introducing the slack variables and
    discussing the trade-off parameter "C".
  - Procedure for choosing an SVM model that best
    fits our problem ("k-fold").
- Some applications of SVM.
- Conclusion: the advantages and drawbacks of SVM.

Page 4:
Before starting

1- Throughout the lecture, if you see underlined red-colored
text, then click on this text for further information.
2- Let me introduce you to "Nodnikit": she is an outstanding
student. She asks many questions, but sometimes
these questions are key questions that help us understand
the material more in depth. Also, the notes that she gives are
very helpful. Hi!

Page 5:
Introduction to SVM: History and motivation

- Support Vector Machine (SVM) is a supervised
learning algorithm developed by Vladimir Vapnik, and
it was first heard of in 1992, introduced by Vapnik,
Boser and Guyon in COLT-92. (It is said that Vladimir
Vapnik mentioned its idea in 1979 in one of his papers,
but its major development was in the 90's.)
- For many years Neural Networks was the ultimate
champion; it was the most effective learning
algorithm.

TILL SVM CAME!

Page 6:
Introduction to SVM: History and motivation cont'd

- SVM became popular because of its success in
handwritten digit recognition (on NIST, 1998). It gave
accuracy comparable to sophisticated and
carefully constructed neural networks with elaborated
features in a handwriting recognition task. [1]
- It is a much more effective "off the shelf" algorithm than
Neural Networks: it generalizes well on unseen data, is
easier to train, and doesn't have any local
optima, in contrast to neural networks, which may have
many local optima and take a lot of time to converge.

Page 7:
Introduction to SVM: History and motivation cont'd

- SVM has successful applications in many
complex, real-world problems such as text and
image classification, handwriting recognition,
data mining, bioinformatics, medicine and
biosequence analysis, and even the stock market!
- In many of these applications SVM is the best
choice.
- We will further elaborate on some of these
applications later in this lecture.

Page 8:
Problem definition:

- We are given a set of n points (vectors) $x_1, x_2, \ldots, x_n$,
such that $x_i$ is a vector of length m,
and each belongs to one of two classes, which we label
by "+1" and "-1".
- So our training set is:

  $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, with $\forall i \;\; x_i \in R^m, \; y_i \in \{+1, -1\}$

- We want to find a separating hyperplane

  $w \cdot x + b = 0$

that separates these points into the two classes,
"the positives" (class "+1") and "the negatives" (class "-1").
(Assuming that they are linearly separable.)

So the decision function will be $f(x) = \mathrm{sign}(w \cdot x + b)$.
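A minimal sketch of this decision function in Python/NumPy; the particular w and b below are made-up illustrative values, not from the lecture:

```python
import numpy as np

def decision(x, w, b):
    """Linear SVM decision function: f(x) = sign(w . x + b)."""
    return np.sign(np.dot(w, x) + b)

# Toy hyperplane in R^2 (illustrative values only).
w = np.array([1.0, -1.0])
b = 0.5
print(decision(np.array([2.0, 0.0]), w, b))  # +1: the "positive" side
print(decision(np.array([0.0, 2.0]), w, b))  # -1: the "negative" side
```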

Page 9: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 9/69

 A se!aratin)

y!re!lane 

1i y   = −

1i y   = +

Se!aratin) Hy!er!lane

But Tere are many !ossibilities

  for suc y!er!lanes 44

( ) ( ) f x sign w x b= × +

0w x b× + =

1 x

2 x

Page 10: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 10/69

Se!aratin) Hy!er!lanes

1i y   = −1i y   = +

Des6 Tere are many !ossible se!aratin) y!er!lanes

It could be

ic one sould .e

coose4

tis one or tis or tis or maybeE(4

Page 11: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 11/69

$oosin) a se!aratin) y!er!lane:

i x

' x

( )   f x

-Su!!ose .e coose te y!re!lane ,seen belo. tat is

close to some sam!le  (

- #o. su!!ose .e ave a ne. !oint tat sould be in class

*-0+ and is close to ( Fsin) our classification functiontis !oint is misclassified4

i x

i x

' x

Poor )eneralization4

,Poor !erformance onunseen data

( ) ( ) f x sign w x b= × +

Page 12: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 12/69

$oosin) a se!aratin) y!er!lane:

i x' x

-Hy!er!lane sould be as far as !ossible from any

sam!le !oint(

-Tis .ay a ne. data tat is close to te oldsam!les .ill be classified correctly(

7ood )eneralization4

Page 13: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 13/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

-Te SVM idea is to ma1imize te distance bet.een

Te y!er!lane and te closest sam!le !oint(

In te o!timal y!er-

!lane:

Te distance to te

closest ne)ative !oint G

Te distance to teclosest !ositive !oint(

 Aa4 I see 4

Page 14: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 14/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

i x

   M  a  r  )    i  n

  d   d

  d

SVM>s )oal is to ma1imize te Mar)in .ic is t.ice

te distance *d+ bet.een te se!aratin) y!er!lane

and te closest sam!le(

y it is te best-/obust to outliners as

.e sa. and tus

stron) )eneralization

ability(-It !roved itself to ave

better !erformance on

test data in bot

!ractice and in teory(

Page 15: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 15/69

$oosin) a se!aratin) y!er!lane(

Te SVM a!!roac: "inear se!arable case

i x

   M  a  r  )    i  n

  d   d

  d

Tese are

Su!!ort

Vectors

Su!!ort vectors are te sam!les closest to te

se!aratin) y!er!lane(

e .ill see latter tat te

8!timal y!er!lane is

com!letely defined by

te su!!ort vectors(

84 So tis is .ere

te name came from4

Page 16: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 16/69

w

w

iγ  

i x

 p

"et us loo& at our decision

boundary :Tis se!aratin)

y!er!lane e3uation is :

ere

#ote tat is orto)onal to

te se!aratin) y!er!lane and

its len)t is 0(

, ,m mw R x R b R∈ ∈ ∈

iγ  "et be te distance bet.een te y!er!lane and

Some trainin) e1am!le ( So is te len)t of te

se)ment from ! to (i x

iγ  

i

 x

w

w

0t w x b+ =

 SVM : "inear se!arable case(

ormula for te Mar)in

Page 17: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 17/69

i

γ  

i x

 p

w

w

! is !oint on te y!re!lane

so ( 8n te oter

and  (i i

w p x

wγ = − ×

define

#ote tat if .e can)ed to and to tis

.ill not affect since  (

0t 

w p b+ =

( ) 0t 

i i

ww x b

wγ  − + =

i

i

w x b

wγ   

× +

=

w   wα    b   bα d 

 SVM : "inear se!arable case(

t t w x b w x b

w w

α α 

α 

+ +=

ormula for te Mar)in cont>

1.. 1..min min

t i

ii n i n

w x bd 

wγ  

∈ ∈

+= =
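A quick numeric check of this distance formula; the hyperplane and sample points below are made-up illustrative values:

```python
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5 (made-up hyperplane)
b = -5.0
X = np.array([[3.0, 4.0], [1.0, 1.0], [4.0, 2.0]])  # toy samples

# gamma_i = (w . x_i + b) / ||w||, the distance of x_i to the hyperplane
gammas = (X @ w + b) / np.linalg.norm(w)
print(gammas)                   # per-sample distances: [4.  0.4 3. ]
print(np.min(np.abs(gammas)))   # d: distance of the closest sample
```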

Page 18: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 18/69

-"et be a sam!le !oint closet to

Te boundary( Set

,.e can rescale . and b(

-or uni3ueness set for 

any sam!le closest to teboundary(

So no.

' x

' 1t w x b+ =

i

 x

2m

w

=

  d  d

  m

Te Mar)in

1t 

iw x b+ =

 SVM : "inear se!arable case( 

'   1t w x b

d w w

+= =

ormula for te Mar)in cont>

' x

Page 19: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 19/69

To find te o!timal se!aratin) y!er!lane 6 SVM

aims to ma1imize te mar)in:

-Ma1imize 

2

m w=

suc

tat:For 1, 1

For 1, 1

i i

i i

 y b

 y b

= + + ≥

= − + ≤ −

w x

w x

Minimize 

21

2 w

( ) 1T 

i i y b+ ≥w x

suc

tat:

SVM : "inear se!arable case( 

indin) te o!timal y!er!lane:

e transformed te !roblem into a form tat can be

efficiently solved( e )ot an o!timization !roblem .it a

conve1 3uadratic objective .it only linear constrains and

al.ays as a sin)le )lobal minimum(
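As a sketch, this convex QP can be handed to an off-the-shelf solver. Here is a minimal illustration with scikit-learn's linear SVC (our choice of library, not the lecture's); a very large C approximates the hard-margin problem above:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Very large C approximates the hard-margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # w and b of the optimal hyperplane
print(clf.support_vectors_)        # the samples that define it
```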

Page 20: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 20/69

-8ur o!timization !roblem so far:

( ) 1T 

i i y b+ ≥w x

21minimize

2w

s(t(

-e .ill solve tis !roblem by introducin) "a)ran)e

multi!liers  associated .it te constrains: iα 

2

1

1minimize ( , , ) ( ( ) 1)

2

. 0

n

 p i i i

i

i

 L w b w y x w b

 s t 

α α 

α 

=

= − × + −

I do remember te

"a)ran)e Multi!liers

from $alculus4

SVM : "inear se!arable case( 

Te o!timization !roblem:

Page 21: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 21/69

So our !rimal o!timization !roblem no.:

e star solvin) tis !roblem:

0 p L

b

∂=

0 p L∂

=∂w 1

n

i i i

i

 yα =

= ∑w x

1

0n

i i

i

 yα =

=∑

SVM : "inear se!arable case( 

Te o!timization !roblem cont>:

2

1

1minimize ( , , ) ( ( ) 1)

2. 0

n

 p i i i

i

i

 L w b w y x w b

 s t 

α α 

α =

= − × + −

Page 22: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 22/69

By substitutin) te above results in te !rimal

!roblem and doin) some mat mani!ulation .e )et:"a)ran)ian %ual Problem:

 

1 0 0

1

1maximaize ( )

2

. 0 0

n n nt 

 D i i j i j i j

i i j

n

i i i

i

 L y y x x

 s t and y

α α α α  

α α 

= = =

=

= −

≥ =

∑ ∑∑

are no. our variables6 one for eac sam!le

!oint (1 2

{ , ,........., }nα α α α  =

SVM : "inear se!arable case(

Inroducin) Te "e)ran)in %ual Problem(

i x

Page 23: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 23/69

Fsin) te ''T ,'arus-'un-Tuc&er condition: 

( )( ) 1 0T 

i i i y bα    + − =w xi∀

-e can calculate  *b+ by ta&in) * i+ suc tat  :0iα   >

Must be 1

( ) 1 0 ( {1, 1})t t t 

i i i i i i

i

 y w x b b w x y w x y y

+ − = ⇒ = − = − ∈ −

-$alculatin) *.+ .ill be done usin) .at .e ave 

found above : i i i

i

w y xα = ∑

-Fsually 6Many of te -s are zero so te

calculation of *.+ as a lo. com!le1ity(iα 

SVM : "inear se!arable case(

indin) *.+ and *b+ for te boundary  :t w x b+

Page 24: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 24/69

-Sam!les .it are te Su!!ort Vectors: te

closest sam!les to te se!aratin) y!er!lane( 

0iα   >

-So (

-And suc tat is a su!!ort vector(

 -e see tat te se!aratin) y!er!lane is

com!letely defined by te su!!ort vectors(

1 SV

n

i i i i i i

i i y yα α 

= ∈= =∑ ∑

w x x

i ib y w x= −   i x

t w x b+

-#o. our %ecision unction is:

( ) ( ) ( )t 

i i i

i SV 

 f x sign w x b sign y x x bα 

= + = × +∑

SVM : "inear se!arable case(

Te im!ortance of te Su!!ort Vectors : 

Page 25: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 25/69

1 0 0

1

1maximaize ( )

2

. 0 0

n n nt 

 D i i j i j i j

i i j

n

i i i

i

 L y y x x

 s t and y

α α α α  

α α 

= = =

=

= −

≥ =

∑ ∑∑

( ) D

 L   α  $an be o!timized usin) a JP soft.are( Some e1am!les

are "o3o6 c!le16 etc( ,see tt!:KK...(numerical(rl(ac(u&K3!K3!(tml

-But for SVM te most !o!ular JP is Se3uential Minimal

8!timization ,SM8: It .as introduced by Lon $( Platt in

0555(And it is .idely used because of its efficiency (9;

-Tis is a 3uadratic !ro)rammin) ,JP !roblem(

 A )lobal ma1imum of can al.ays be found( ) D L   α 

SVM : "inear se!arable case( 

Some notes on te dual !roblem:
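A minimal sketch of solving this dual with a generic solver (SciPy's SLSQP here, our assumption; real SVM packages use SMO) on a toy 2-D set, then recovering w and b as on the previous slides:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j x_i . x_j

# Maximize L_D(a)  <=>  minimize -L_D(a)
neg_LD = lambda a: 0.5 * a @ G @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}   # sum_i a_i y_i = 0
res = minimize(neg_LD, np.zeros(n), bounds=[(0, None)] * n, constraints=cons)

a = res.x
w = (a * y) @ X                 # w = sum_i a_i y_i x_i
sv = np.argmax(a)               # any i with a_i > 0 (a support vector)
b = y[sv] - w @ X[sv]           # b = y_i - w^t x_i, from the KKT condition
print(w, b)
```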

Page 26: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 26/69

at if te sam!le !oints are not linearly

se!arable 4

 %efinition: *Te V$ dimension of a class of functions fi N is te

ma1imum number of !oints tat can be se!arated ,sattered 

into t.o classes in all !ossible .ays by fi N (+ 9O;

-if .e loo& at any ,non -collinear tree !oints in 2d !lane teycan be "inearly se!arated:

Tese ima)es above are ta&en fromE(

 Te V$ dimension for a set of oriented lines in is ( 2 R

V$ ,Va!ni&-$ervonen&is %imension

Page 27: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 27/69

our !oints not

se!arable in

By a y!re!lane

But can be se!arable in

  By a y!re!lane

2 R

3 R

-+Te V$ dimension of te set of orientedy!er!lanes in is nC0(+ 9O;n R

-Tus it is al.ays !ossible6 for a finite set of !oints

to find a dimension .ere all !ossible se!aration

of te !oint set can be acieved by a y!er!lane(

V$ %imension cont>

Page 28: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 28/69

'ey idea: ma! our !oints .it a ma!!in) function

to a s!ace of sufficiently i) dimension so tat tey .ill be se!arable by a y!re!lane:

0   x

or e1am!le :a non linearly se!arable in one dimension: 

ma!!in) data to t.o-dimensional s!ace .it  2

( ) ( , ) x x xφ    =

0   x

 x2

( ) xφ 

o.46 no. .e can

use te linear SVM

.e learned in tis

i)er dimensional

s!ace4

#on-linear SVM :

Ma!!in) te data to i)er dimension

 -In!ut s!ace: te s!ace .ere te !oints #i are located

-eature s!ace: te s!ace of φ,#i after transformation
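A tiny sketch of this exact map: sample values and the separating threshold below are ours, but the construction is the slide's $\phi(x) = (x, x^2)$. Points on the line that no single threshold separates become line-separable in the feature space:

```python
import numpy as np

x = np.array([-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5])  # 1-D inputs
y = np.array([1, 1, 1, 1, -1, -1, -1])  # outer points vs. inner points

# No single threshold on x separates the classes, but after
# phi(x) = (x, x^2) the horizontal line x2 = 2 does:
phi = np.column_stack([x, x ** 2])
pred = np.where(phi[:, 1] > 2.0, 1, -1)   # hyperplane x^2 - 2 = 0
print(np.all(pred == y))                  # True: separable in feature space
```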

Page 29: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 29/69

#on "inear SVM:

-To solve a non linear classification !roblem .it a

linear classifier all .e ave to do is to substitute  ( ) xφ 

Instead of 1 every.ere .ere 1 a!!ears in te

o!timization !roblem:

1 1 1 1

1maximize ( ) . 0 0

2

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α α α  = = = =

= − ≥ =∑ ∑∑ ∑

#o. it .ill be:

1 1 1 1

1maximize ( ) ( ) ( ) . 0 02

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α φ φ α α  = = = =

= − ≥ =∑ ∑∑ ∑

Te decision function .ill be: ( ) ( ( )) ( ( ) )t  g x f x sign w x bφ φ = = × +

Ma!!in) te data to i)er dimension cont>

$lic& ere to see a demonstration of ma!!in) te data to a

i)er dimension so tat te can be linearly s!arable(

Page 30: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 30/69

#on "inear SVM :

 An illustration of te al)oritm:

Page 31: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 31/69

But $om!utations in te feature s!ace can be costly because it may

be i) dimensional 4

Te 'ernel Tric&:

Tat>s ri)t 46 .or&in) in i) dimensional

s!ace is com!utationally e1!ensive(

-But luc&ily te &ernel tric& comes to rescue: 

If .e loo& a)ain at te o!timization !roblem:

1 1 1 1

1maximize ( ) ( ) ( ) . 0 0

2

n n n nt 

 D i i j i j i j i i i

i i j i

 L y y x x s t yα α α α φ φ α α  = = = =

= − ≥ =∑ ∑∑ ∑

 And te decision function: 

#o need to &no. tis ma!!in) e1!licitly nor do .e need to

&no. te dimension of te ne. s!ace6 because .e only use 

te dot !roduct of feature vectors in bot te trainin) and test(

1

( ( )) ( ( ) ) ( ( ) (   ) )n

t t 

i i i

i

  f x sign w x b sign y x x bφ φ α φ φ  =

= + = +∑

Page 32: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 32/69

 A kernel function is defined as a function tat

corres!onds to a dot !roduct of t.o feature vectors

in some e1!anded feature s!ace:

( , ) ( ) ( )T 

i j i j K    φ φ ≡x x x x

#o. .e only need to com!ute and .e don>t

need to !erform com!utations in i) dimensional

s!ace e1!licitly( Tis is .at is called te 'ernel Tric&( 

( , )i j

 K x x

Te 'ernel Tric&:

Page 33: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 33/69

'ernel Tric&: $om!utational savin) of te &ernel tric&

?1am!le Juadratic Basis function: ,Andre. Moore 

Te cost of

com!utation is: 

2( )O m

,m is te dimension of in!ut

ere as te corres!ondin) 'ernel is :2( , ) ( 1) K a b a b= × +

( )O mTe cost of com!utation is:

To believe me tat it

is really te real

'ernel :
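A numeric sanity check, as a sketch: the √2-scaled quadratic basis below follows Andrew Moore's construction, and the assertion is that $(a \cdot b + 1)^2$ equals $\phi(a)^t \phi(b)$:

```python
import numpy as np
from itertools import combinations

def phi_quadratic(x):
    """Quadratic basis: 1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j (i<j)."""
    feats = [1.0]
    feats += list(np.sqrt(2.0) * x)
    feats += list(x ** 2)
    feats += [np.sqrt(2.0) * x[i] * x[j]
              for i, j in combinations(range(len(x)), 2)]
    return np.array(feats)

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
lhs = (a @ b + 1.0) ** 2                    # O(m) kernel evaluation
rhs = phi_quadratic(a) @ phi_quadratic(b)   # O(m^2) explicit dot product
print(np.isclose(lhs, rhs))                 # True
```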

Page 34: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 34/69

Hi)er 8rder Polynomials ,rom Andre. Moore

( ) ( )kl k l k l  

Q y y x xφ φ =

/ is te number of sam!les6 m is te dimension of te

sam!le !oints(

1 ,k l R≤ ≤

Page 35: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 35/69

,a&a te 7ram matri1:

-Te central structure in &ernel macines

-Information Qbottlenec&>: contains all necessaryinformation for te learnin) al)oritm(

-one of its most interestin) !ro!erties: Mercer>s

Teorem(  based on notes from ...(su!!ort-vectors(com

'G

Te 'ernel Matri1

Page 36: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 36/69

Mercer>s Teorem:

-A function is a &ernel ,tere e1ists a

suc tat Te 'ernel matri1

is Symmetric Positive Semi-definite(

( , )i j K x x

-Anoter version of mercer>s teorem tat isn>t

related to te &ernel matri1 is: function 

is a &ernel 

( , )i j

 K x x

2

( )

( )

( , ) ( ) ( ) 0

  for any suc tat 

is fi

 g u

 g u du

 K u ! g 

nit" t"n

u g ! dud! ≥

∫ 

∫ 7reat46 so &no.

.e can cec& if

+ ' *is a &ernel

.itout te need

to &no.  ( ) xφ 

( ) xφ 

( , ) ( ) ( )T 

i j i j K    φ φ ≡x x x x
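A sketch of checking the finite-sample Mercer condition: build the Gram matrix of a candidate kernel on some points and verify it has no negative eigenvalues. The Gaussian RBF and the random points are just our example choices:

```python
import numpy as np

def rbf(xi, xj, sigma=1.0):
    """Gaussian RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # random sample points

K = np.array([[rbf(a, b) for b in X] for a in X])  # Gram matrix
eigs = np.linalg.eigvalsh(K)                       # K is symmetric
print(eigs.min() >= -1e-10)  # True: positive semi-definite (up to rounding)
```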

Page 37: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 37/69

?1am!les of 'ernels:

-Some common coices ,te first t.o al.ays

satisfyin) Mercer>s condition:

-Polynomial &ernel   ( , ) ( 1)t p

i j i j K x x x x= +

-7aussian /adial Basis unction */B+ ,data is lifted

to infinite dimension:2

2

1( , ) exp( )

2i j i j K x x x x

σ  

= − −

-In fact6 SVM model usin) a si)moid &ernel function is

e3uivalent to a t.o-layer6 feed-for.ard neural net.or&(  

( , ) tanh( )i j i j K x x kx x   δ = × −

δ 

-Si)moidal : ,it is not a

&ernel for every & and ( 

Page 38: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 38/69

Ta&en from ,$SI R2R SVM

lecture 9<;

#o. .e can

ma&e com!le1

&ernels from

sim!le ones:

Modularity 4

Ma&in) 'ernels:

Page 39: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 39/69

Im!ortant 'ernel Issues:

Ho. to verify tat risin) to i)er dimension usin) a

s!ecific &ernel .ill ma! te data to a s!ace in .ic

tey are linearly se!arable

Ho. to &no. .ic 'ernel to use

or most of te &ernel function .e don>t &no. te corres!ondin)ma!!in) function so .e don>t &no. to .ic dimension .e

rose te data( So even tou) risin) to i)er dimension

increases te li&eliood tat tey .ill be se!arable .e can>t

)uarantee tat ( e .ill see a com!romisin) solution for tis

!roblem(

-Tis is a )ood 3uestion and actually still an o!en 3uestion6

many researces ave been .or&in) to deal .it tis issue

but still .e don>t ave a firm ans.er( It is one of te .ea&ness

of SVM( e .ill see an a!!roac to tis issue latter(

( ) xφ 

I ave some 3uestions

on &ernels( I .rote tem

on te board( 

Page 40: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 40/69

Im!ortant 'ernel Issues:

e sa. tat te 7aussian /adial Basis 'ernel lifts te

data to infinite dimension so our data is al.ays

se!arable in tis s!ace so .y don>t .e al.ays use

tis &ernel irst of all .e sould decide .ic to use in tis &ernel ,

(

Secondly6A stron) &ernel 6.ic lifts te data to infinite dimension6

sometimes may lead us te severe !roblem of  8verfittin):

 

σ 2

2

1exp( )

2  i j x x

σ − −

Sym!toms of overfittin):

0-"o. mar)in !oor classification !erformance(

2-"ar)e number of su!!ort vectors Slo.s do.n te

com!utation(
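A small sketch of the second symptom with scikit-learn (our stand-in tool, not the lecture's): shrinking σ, i.e. raising gamma = 1/(2σ²), drives up the number of support vectors on a fixed dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a simple nonlinear rule

for gamma in (0.1, 1.0, 100.0):           # gamma = 1 / (2 sigma^2)
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    # A very narrow RBF (huge gamma) memorizes: many support vectors.
    print(gamma, clf.n_support_.sum())
```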

Page 41: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 41/69

-If .e loo& at te &ernel matri1 ten it is almost dia)onal(

Tis means tat te !oints are orto)onal and only similar to

itself(

 All tese tin)s lead us to say tat our &ernel function is notreally ade3uate( Since it does not )eneralize )ood over te

data(

-It is )ood to say tat 7aussian radial basis function ,/B is

.idely used6 BFT not alone because teir )ot to be a tool torelease some !ressure of tis stron) &ernel(

Im!ortant 'ernel Issues:

In addition to te above !roblems 6 anoter !roblem is tat

sometimes te !oints are linearly se!arable but te mar)in is

"o. :

Page 42: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 42/69

Im!ortant 'ernel Issues:

"inearly se!arable

But lo. mar)in4

 All tese !roblems leads us to te com!romisin)

solution:

Soft Mar)in4

Page 43: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 43/69

Soft Mar)in:

-e allo. *error+  in classification( e use *slac&+

Variables ,one for eac sam!le(

iξ 

1 2, ,...... nξ ξ ξ 

0 1i

ξ < <

0 1iξ < <

1iξ   >

i

ξ  Is te deviation error

from ideal !lace for

sam!le i: 

-If ten sam!le i is

on te .ron) side of te

y!er!lane(

1iξ   >

-If ten sam!le i is

on te ri)t side of te

y!er!lane but .itin tere)ion of te mar)in(

0 1iξ < <

Page 44: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 44/69

Soft Mar)in:

Ta&en from 900;

Page 45: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 45/69

Soft Mar)in: 

Te !rimal o!timization !roblem

-e can)e te constrains to

instead of   (

8ur o!timization !roblem no. is:

( ) 1 0t 

i i i i y w x b iξ ξ + ≥ − ∀ ≥

( ) 1t 

i i y w x b i+ ≥ ∀

2

1

1minimize

2

n

i

i

#    ξ =

+   ∑w

Suc tat:   ( ) 1 0t 

i i i i y w x b iξ ξ + ≥ − ∀ ≥

0#  >  is a constant( It is a &ind of !enalty on te 

term ( It is a tradeoff bet.een te mar)in and te

trainin) error( It is a .ay to control overfittin) alon)

.it te ma1imum mar)in a!!roac90;(

1

n

i

i

ξ =

Page 46: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 46/69

8ur dual o!timization !roblem no. is:

1 1 1

1maximize

2

n n nT 

i i j i j i j

i i j

 y yα α α = = =

−∑ ∑∑   x x

Suc tat:1

0 0

n

i i i

i

# i and yα α =

≤ ≤ ∀ =∑-e can find *.+ usin) :

1

n

i i i

i

w y xα =

= ∑-To com!ute *b+ .e ta&e any and solve for  *b+( 0 i   # α < <

[ ( ) 1] 0t i i i y w x bα    + − =

Soft Mar)in: 

Te %ual ormulation(

ic value for *$+

sould .e coose(

( ) 1 (poi

0 ( ) 1

nts 0)

0 ( ) 1

i i i

i

i i i i

i i

# y w x b wit

 y w x b

# y w x b

α 

α 

α ξ 

= ⇒ + >

< < ⇒ + =

= ⇒ + < >

Page 47: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 47/69

-*$+ !lays a major role in controllin) overfittin)(

-"ar)er $ less trainin) sam!les tat are not in ideal !osition,.ic means less trainin) error tat affects !ositively te

$lassification Performance ,$P But smaller mar)in ,affects

ne)atively te ,$P ($ lar)e enou) may lead us to overffitin)

,too muc com!licated classifier tat fits only te trainin) set  -Smaller $ more trainin) sam!les tat are not in ideal

!osition ,.ic means more trainin) error tat affects ne)atively

te $lassification Performance ,$P But lar)er Mar)in ,)ood for

,$P( $ small enou) may lead to underffitin) ,nave classifier

-indin) te */i)t+ value for *$+ is one of te major

!roblems of SVM:

Soft Mar)in: 

Te *$+ Problem
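A sketch of this trade-off with scikit-learn (our choice of tool): as C grows, fewer samples are allowed inside the margin, so the number of support vectors drops while the margin 2/||w|| shrinks:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)      # two overlapping blobs

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(C, clf.n_support_.sum(), round(margin, 3))
```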

Page 48: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 48/69

Too muc sim!le4 Too muc com!licated4

Soft Mar)in: 

Te *$+ Problem: 8verfittin) and Fnderfittin)

Fnder-ittin) 8ver-ittin)

Trade-8ff 

Based on 02 and

Page 49: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 49/69

SVM :#onlinear case

$ecipe and %odel selection procedure& 

-In most of te real-.orld a!!lications of SVM .e combine .at

.e learned about te &ernel tric& and te soft mar)in and use

tem to)eter :

i 1 1 1

1

1maximize ( , )

2

0 0

n n n

i i j i j i j

i j

n

i i i

i

 y y K x x

constrain"d to # i and y

α α α 

α α 

= = =

=

≤ ≤ ∀ =

∑ ∑ ∑

-e solve for usin) a Juadratic Pro)rammin) soft.are( α 

-To find *b+ .e ta&e any and solve0i

  # α < <   [ ( ) 1] 0t 

i i i y w x bα    + − =

-Te $lassification function .ill be:1( ) ( ( , ) )

n

i i ii g x sign y K x x bα =

= +

1

( ) ( " ))" (n

 j j j

 j

w y x $o n""d to find b"caus" w" may not knw   ow xα φ φ =

= ∑

1 1

( ( ( )) ( ) ) 1 ( , )n n

i j j j i i j j j i

 j j

 y y x x b b y y K x xα φ φ α  = =

⇒ + = ⇒ = −∑ ∑

SVM # li
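A compact sketch tying the recipe together in NumPy: the dual is solved with the generic SciPy approach sketched earlier (SMO in real packages), an RBF kernel stands in for K, and b and g(x) follow the formulas above. The data, C, σ, and the interior-point pick are all our illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def K(a, b, sigma=1.0):                   # Gaussian RBF kernel
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

# Toy data: class +1 inside a ring of class -1 (illustrative only).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1.0, -1.0)

n, C = len(y), 10.0
Gram = np.array([[K(xi, xj) for xj in X] for xi in X])
Q = (y[:, None] * y[None, :]) * Gram

neg_LD = lambda a: 0.5 * a @ Q @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}
a = minimize(neg_LD, np.zeros(n), bounds=[(0, C)] * n, constraints=cons).x

i = np.argmin(np.abs(a - C / 2))          # heuristic pick of some 0 < a_i < C
b = y[i] - np.sum(a * y * Gram[:, i])     # b = y_i - sum_j a_j y_j K(x_j, x_i)

def g(x):  # g(x) = sign(sum_i a_i y_i K(x_i, x) + b)
    return np.sign(np.sum(a * y * np.array([K(xi, x) for xi in X])) + b)

print(np.mean([g(xi) == yi for xi, yi in zip(X, y)]))  # training accuracy
```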

Page 50: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 50/69

0 divide randomly all te available trainin) e1am!les into K

e3ual-sized subsets(

2 use all but one subset to train te SVM .it te cosen !ara>( use te eld out subset to measure classification error(

re!eat Ste!s 2 and for eac subset(

R avera)e te results to )et an estimate of te )eneralization

error of te SVM classifier(

 

SVM:#onlinear case

%odel selection procedure

-e ave to decide .ic erne! function and *$+ value to use(

-+In !ractice a "a#ssian radial basis or a lo. de)ree !olynomial

&ernel is a )ood start(+ 9$n%re&.oore; 

- e start cec&in) .ic set of !arameters ,suc as $

or if .e coose 7aussian radial basis are te mosta!!ro!riate by $ross-Validation ,'- fold , 9 @ ; :

σ 
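As a sketch of these five steps (scikit-learn is our stand-in; the lecture's own example uses LibSVM), k-fold cross-validation for one parameter setting looks like this:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)   # toy target

# 5-fold CV estimate of generalization error for one (C, gamma) setting.
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma=0.5), X, y, cv=5)
print(1.0 - scores.mean())   # estimated generalization error
```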

Page 51: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 51/69

-Te SVM is tested usin) tis !rocedure for various !arameter

settin)s( In te end6 te model .it te smallest )eneralization

error is ado!ted( Ten .e train our SVM classifier usin) tese

!arameters over te .ole trainin) set(

- or 7aussian /B tryin) e1!onentially )ro.in) se3uences of$ and is a !ractical metod to identify )ood !arameters :

  - A )ood coice is te follo.in) )rid:

 

SVM:#onlinear case

%odel selection procedure cont'

Tis )rid is su))ested by "ibSVM ,An inte)rated and easy-

to-use tool for SVM classifier

σ 

1

1 1 3

2 , 2 ,......, 2

2 , 2 ,...., 2

σ 

− −

− −

=

=
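A sketch of that grid search with scikit-learn's GridSearchCV (our adaptation: its gamma parameter plays the role of 1/(2σ²)):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Exponentially growing grid, in the spirit of the LibSVM guide.
grid = {"C": 2.0 ** np.arange(-5, 16, 2),
        "gamma": 2.0 ** np.arange(-15, 4, 2)}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print(search.best_params_)   # the (C, gamma) with the smallest CV error
```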

Page 52: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 52/69

SVM:#onlinear case 

%odel selection procedure& e#ample

Tis e1am!le is !rovided in te libsvm )uide( In tis e1am!le

tey are searcin) te *best+ values for *$+ and  for an

/B 'ernel for a )iven trainin) usin) te model selection

!rocedure .e sa. above( 

σ 

*2 , 2# 

is a good coic"

σ    −= =

SVM M lti l l ifi ti , t t

Page 53: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 53/69

Tere are t.o basic a!!roaces to solve q-class !roblems, .it SVMs ,90=;6900;:0- 8ne vs( 8ters:

.or&s by constructin) a *re)ular+ SVM for eac class i tat

se!arates tat class from all te oter classes ,class * i+ !ositiveand *not i+ ne)ative( Ten .e cec& te out!ut of eac of te

3 SVM classifiers for our in!ut and coose te class i tat its

corres!ondin) SVM as te ma1imum out!ut(

2% >

iω 

2-Pair.ise ,one vs one:e construct */e)ular+ SVM for eac !air of classes ,so .econstruct 3,3-0K2 SVMs( Ten .e use *ma1-.ins+ votin)

strate)y: .e test eac SVM on te in!ut and eac time an

SVM cooses a certain class .e add vote to tat class( Ten

.e coose te class .it i)est number of votes(

SVM or Multi-class classification: ,more tan t.o

classes

( ( ) )t  g x w x b= +

SVM M lti l l ifi ti t>
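A sketch of both schemes via scikit-learn (our stand-in, which wraps LibSVM): SVC trains the q(q-1)/2 pairwise classifiers internally, while LinearSVC follows the one-vs-rest scheme:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)        # 3 classes

ovo = SVC(kernel="linear").fit(X, y)     # one-vs-one: q(q-1)/2 pairwise SVMs
ovr = LinearSVC().fit(X, y)              # one-vs-rest: q "class i vs. not i" SVMs

print(ovo.predict(X[:3]), ovr.predict(X[:3]))
```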

Page 54: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 54/69

-More advanced metod to im!rove !air.ise metod includes

usin) decision )ra!s to determine te class selected in a

similar manner to &noc&out tournaments: 

#ample of advanced pair"ise

SV%. T!e numbers -* encode

t!e classes. Ta+en from,

-Bot mentioned metods above )ive in avera)e com!arable

accuracy results ,.ere as te second metod is relatively

slo.er tan te first (-Sometimes for certain a!!lication one metod is !referable

over te oter(

SVM or Multi-class classification cont>:

A li ti f SVM

Page 55: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 55/69

 A!!lications of SVM:

e .ill see no. some a!!lications for SVM from different

fields and elaborate on one of tem .ic is facial e1!ression

reco)nition( or more a!!lications you can visit:tt!:KK...(clo!inet(comKisabelleKProjectsKSVMKa!!list(tml

0- Hand.ritten di)it reco)nition: Te Success of SVM in Tis

a!!lication made it !o!ular:

0(0U test error rate for SVM in #IST ,055@( Tis is te same

as te error rates of a carefully constructed neural net.or&6

"e#et tat .as made *by and+ (90;

Page 56: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 56/69

Today SVM is te best classification metod for and.ritten

di)it reco)nition 90=;:

2- Anoter field tat uses SVM is Medicine: it is used indetecting Microcalcifications in Mammo)rams .ic is an

indicator for breast cancer6 usin) SVM( when compared to

several other existing methods, the proposed SVM

framework offers the best performance [ 8  

 A!!lications of SVM: continued

A li ti f SVM ti d

Page 57: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 57/69

-SVM even as uses in Stoc& mar&et field is Stoc& Mar&et: 

o.4 many

a!!lications

for SVM4

 A!!lications of SVM: continued

A!!lications of SVM:

Page 58: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 58/69

acial ?1!ression /eco)nition: based on acial ?1!ression

/eco)nition Fsin) SVM by Pili!! Micel et al 95;:

-Human bein)s naturally and intuitively use facial e1!ression

as an im!ortant and !o.erful modality to communicate teir

emotions and to interact socially(-acial e1!ression constitutes RR !ercent of te effect of a

communicated messa)e(

-In tis article facial e1!ression are divided into si1 basic *!ea&+

emotion classes : an)er6 dis)ust6 fear6 joy6 sorro.6 sur!riseN,Te neutral state is not a *!ea&+ emotion class

 A!!lications of SVM:

acial ?1!ression /eco)nition

A li ti f SVM

Page 59: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 59/69

-Tree basic !roblems a facial e1!ression analysis a!!roac

needs to deal .it:

0-face detection in a still ima)e or ima)e se3uence :

Many articles as dealt .it tis !roblem suc as ViolaLones(

e assume a full frontal vie. of te face(2-acial e1!ression data e1traction:

-An Automatic trac&er e1tracts te !osition of 22 facial features

from te video stream ,or an ima)e if .e are .or&in) .it still

ima)e(-or eac e1!ression6 a vector of feature dis!lacements is

calculated by ta&in) te ?uclidean distance bet.een feature

locations in a neutral state of te face and a *!ea&+ frame

re!resentative of te e1!ression(

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 60: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 60/69

-acial e1!ression classification: e use Te SVM metod

.e sa. to construct our classifier and te vectors of feature

dis!lacements for te !revious sta)e are our in!ut(

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 61: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 61/69

 A!!lications of SVM:

acial ?1!ression /eco)nition

vectors of feature dis!lacements

A!!lications of SVM:

Page 62: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 62/69

-A set of 0= e1am!les for eac basic emotion ,in still ima)es

.as used for trainin)6 follo.ed by classification of 0R unseen

e1am!les !er emotion( Tey used libsvm as te underlyin)

SVM classifier(

-At first Tey used te standard SVM classification usin)

linear &ernel and tey )ot <@U accuracy(-Ten .it subse3uent im!rovements includin) selection of

a &ernel function ,tey cose /B and te ri)t *$+

customized to te trainin) data6 te reco)nition accuracyboosted u! to @<(5U4-Te uman Qceilin)> in correctly classifyin) facial e1!ressions

into te si1 basic emotions as been establised at 50(<U by

?&man riesen

 A!!lications of SVM:

acial ?1!ression /eco)nition

A!!lications of SVM:

Page 63: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 63/69

e see some !articular combinations suc as ,fear vs( dis)ust are arder

to distin)uis tan oters(

-Ten tey moved to constructin) teir classifier for streamin) video

rater tan still ima)es: $lic& ere for a demoof facial e1!ression

reco)nition ,from

anoter source but

also used SVM

 A!!lications of SVM:

acial ?1!ression /eco)nition

Te Advanta)es of SVM:

Page 64: SVM_Lecture.ppt

7/25/2019 SVM_Lecture.ppt

http://slidepdf.com/reader/full/svmlectureppt 64/69

►SVMs ave been demonstrated su!erior classification

 Accuracies to neural net.or&s and oter metods in many

 A!llications(90=;:

 

►Based on a stron) and nice Teory90=;:

  -In contrast to !revious *blac& bo1+ learnin) a!!roaces6 SVMs allo.

for some intuition and uman understandin)(

►Trainin) is relatively easy90;:

-#o local o!timal6 unli&e in neural net.or&  -Trainin) time does not de!end on dimensionality of

feature s!ace6 only on fi1ed in!ut s!ace tan&s to te &ernel tric&(

 W7enerally avoids over-fittin) 90;:

  - Tradeoff bet.een classifier com!le1ity and error can be

  controlled e1!licitly(

Te Advanta)es of SVM:

-)eneralize .ell even in i) dimensional s!aces under small trainin) set

conditions( Also it is robust to noise90=;

Te %ra.bac&s of SVM:

Page 65:
The Drawbacks of SVM:

► It is not clear how to select a kernel function in a principled
manner [2].

► What is the right value for the "trade-off" parameter "C"? [1]
- We have to search manually for this value, since we
don't have a principled way to find it.

► Tends to be expensive in both memory and computational
time, especially for multiclass problems [2]:
- This is why some applications use SVMs for verification
rather than classification. This strategy is computationally
cheaper, since SVMs are called just to solve difficult cases.
[10]

Page 66:
Software: popular implementations

SVMlight: http://svmlight.joachims.org/
By Joachims; one of the most widely used SVM
classification and regression packages. Distributed as C++
source and binaries for Linux, Windows, Cygwin, and
Solaris. Kernels: polynomial, radial basis function, and neural
(tanh).

LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM (Library for Support Vector Machines) is developed
by Chang and Lin; also widely used. Developed in C++ and
Java, it also supports multi-class classification, weighted
SVM for unbalanced data, cross-validation, and automatic
model selection. It has interfaces for Python, R, Splus,
MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear,
polynomial, radial basis function, and neural (tanh).

Page 67:

That's all folks!!

Check the next slides for references.

Page 68:
References:

[1] Martin Law: SVM lecture for CSE 802, CS department,
MSU.
[2] Andrew Moore: "Support Vector Machines", CS school,
CMU.
[3] Vikramaditya Jakkula: "Tutorial on Support Vector
Machines", School of EECS, Washington State University.
[4] Andrew Ng: "Support Vector Machines", Stanford
University.
[5] Nello Cristianini: "Support Vector and Kernel Machines",
BIOwulf Technologies, www.support-vectors.net.
[6] Carlos Thomaz: "Support Vector Machines", Intelligent
Data Analysis and Probabilistic Inference.

Page 69:
References cont'd:

[7] Greg Hamerly: SVM lecture (CSI 5325).
[8] "Support Vector Machine Learning for Detection of
Microcalcifications in Mammograms", Issam El-Naqa et al.
[9] "Facial Expression Recognition Using Support Vector
Machines", Philipp Michel and Rana El Kaliouby, University of
Cambridge.
[10] "Support Vector Machines for Handwritten Numerical String
Recognition", Luiz S. Oliveira and Robert Sabourin.
[11] "A Practical Guide to Support Vector Classification",
Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin.