Lecture 12 - SVM


Page 1: Lecture12 - SVM

Introduction to Machine Learning

Lecture 12: Support Vector Machines

Albert Orriols i Puig (aorriols@salle.url.edu)

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture12 - SVM

Recap of Lecture 11
1st generation NN: Perceptrons and others

Also multi-layer perceptrons


Page 3: Lecture12 - SVM

Recap of Lecture 11
2nd generation NN

Some people figured out how to adapt the weights of internal layers

Seemed to be very powerful and able to solve almost anything


The reality showed that this was not exactly true

Page 4: Lecture12 - SVM

Today’s Agenda

Moving to SVM
Linear SVM

The separable case
The non-separable case

Non-Linear SVM


Page 5: Lecture12 - SVM

Introduction
SVM (Vapnik, 1995)

Clever type of perceptron

Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe

A clever optimization technique is used to select the best subset of features

Many NN researchers switched to SVM in the 1990s because they worked better

Here, we’ll take a slow path into SVM concepts


Page 6: Lecture12 - SVM

Shattering Points with Oriented Hyperplanes
Remember the idea:

I want to build hyperplanes that separate points of two classes

In a two-dimensional space, hyperplanes are just lines

E.g.: Linear Classifier

Which is the best separating line?

Remember, a hyperplane is represented by the equation

W · X + b = 0
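As an illustrative sketch only (not from the slides), the decision rule of such a linear classifier fits in a few lines of Python; the function name and the toy values are made up for the example:

```python
import numpy as np

def predict(W, b, X):
    """Classify each row of X with the hyperplane W·x + b = 0 (labels +1/-1)."""
    return np.sign(X @ W + b)

# A line in 2-D acting as the separating hyperplane
W = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 0.0], [0.0, 2.0]])
print(predict(W, b, X))  # [ 1. -1.]
```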


Page 7: Lecture12 - SVM

Linear SVM
I want the line that maximizes the margin between examples of both classes!

Support Vectors


Page 8: Lecture12 - SVM

Linear SVM
In more detail

Let’s assume two classes: yi = {-1, 1}

Each example is described by a set of features x (x is a vector; for clarity, we will mark vectors in bold in the remainder of the slides)

The problem can be formulated as follows. All training examples must satisfy (in the separable case) the constraints written out below.

These constraints can be combined into a single inequality, also shown below.

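Presumably the standard separable-case conditions:

    W \cdot x_i + b \geq +1   for y_i = +1
    W \cdot x_i + b \leq -1   for y_i = -1

which combine into

    y_i (W \cdot x_i + b) - 1 \geq 0   for all i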

Page 9: Lecture12 - SVM

Linear SVM
What are the support vectors?

Let’s find the points that lie on the hyperplane H1

Their perpendicular distance to the origin is

Let’s find the points that lie on the hyperplane H2

Their perpendicular distance to the origin is

The margin is:
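Filling in the standard expressions (presumably the ones shown on the slide):

    H1: W \cdot x + b = +1,  perpendicular distance to the origin |1 - b| / \|W\|
    H2: W \cdot x + b = -1,  perpendicular distance to the origin |-1 - b| / \|W\|
    margin between H1 and H2 = 2 / \|W\|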


Page 10: Lecture12 - SVM

Linear SVM
Therefore, the problem is

Find the hyperplane that minimizes

Subject to
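In the usual notation, the separable-case problem is presumably:

    minimize    \frac{1}{2} \|W\|^2
    subject to  y_i (W \cdot x_i + b) - 1 \geq 0   for all i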

But let us change to the Lagrange formulation because:
The constraints will be placed on the Lagrange multipliers themselves (easier to handle)

Training data will appear only in the form of dot products between vectors


Page 11: Lecture12 - SVM

Linear SVM
The Lagrangian formulation comes to be
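Presumably the standard primal Lagrangian of the problem above:

    L_P = \frac{1}{2} \|W\|^2 - \sum_i \alpha_i y_i (W \cdot x_i + b) + \sum_i \alpha_i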

Where αi are the Lagrange multipliers

So, now we need to:
Minimize Lp w.r.t. w, b

Simultaneously require that the derivatives of Lp w.r.t. α vanish

All subject to the constraints αi ≥ 0


Page 12: Lecture12 - SVM

Linear SVM
Transformation to the dual problem

This is a convex problem

We can equivalently solve the dual problem

That is, maximize LD

w.r.t. αi

Subject to the constraints

And with αi ≥ 0
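Written out, the dual is presumably:

    L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j

    subject to  \sum_i \alpha_i y_i = 0  and  \alpha_i \geq 0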


Page 13: Lecture12 - SVM

Linear SVM

This is a quadratic programming problem. You can solve it with many methods such as gradient descent

We’ll not see these methods in class
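As a practical aside (not part of the slides): libraries such as scikit-learn already wrap a QP solver, so a linear SVM can be trained in a few lines; the toy data below is invented for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1.0)  # C only matters in the non-separable case
clf.fit(X, y)

print(clf.support_vectors_)        # training points with non-zero alpha_i
print(clf.predict([[2.5, 2.5]]))   # classify a new point
```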


Page 14: Lecture12 - SVM

The Non-Separable Case
What if I cannot separate the two classes?

We will not be able to solve the Lagrangian formulation proposed

Any idea?


Page 15: Lecture12 - SVM

The Non-Separable Case
Just relax the constraints by permitting some errors


Page 16: Lecture12 - SVM

The Non-Separable Case
That means that the Lagrangian is rewritten

We change the objective function to be minimized to
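Presumably the standard soft-margin objective, with one slack variable \xi_i per training example:

    minimize    \frac{1}{2} \|W\|^2 + C \sum_i \xi_i
    subject to  y_i (W \cdot x_i + b) \geq 1 - \xi_i,   \xi_i \geq 0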

Therefore, we are maximizing the margin and minimizing the error

C is a constant to be chosen by the user

The dual problem becomes

Subject to the constraints shown below
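Presumably the same L_D as in the separable case,

    L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j

    now subject to  0 \leq \alpha_i \leq C  and  \sum_i \alpha_i y_i = 0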


Page 17: Lecture12 - SVM

Non-Linear SVM
What happens if the decision function is not a linear function of the data?

In our equations, data appears in the form of dot products xi · xj

Wouldn’t you like to have polynomial, logarithmic, … functions to fit the data?


Page 18: Lecture12 - SVM

Non-Linear SVM

The kernel trick
Map the data into a higher-dimensional space

Mercer’s theorem: any continuous, symmetric, positive semi-definite kernel function K(x, y) can be expressed as a dot product in a high-dimensional space

Now, we have a kernel function

An example

All we have talked about still holds when using the kernel function

The only difference is that now my function will be
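Presumably the standard kernelized decision function, where the s_i are the N_s support vectors:

    f(x) = \mathrm{sign}\left( \sum_{i=1}^{N_s} \alpha_i y_i K(s_i, x) + b \right)

A classic example of the mapping behind a kernel: in two dimensions, K(x, y) = (x \cdot y)^2 corresponds to the explicit feature map \Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), since \Phi(x) \cdot \Phi(y) = (x \cdot y)^2.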


Page 19: Lecture12 - SVM

Non-Linear SVM
Some typical kernels
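The kernels usually listed at this point (presumably the ones on the slide) are:

    Polynomial:    K(x, y) = (x \cdot y + 1)^p
    Gaussian RBF:  K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))
    Sigmoid:       K(x, y) = \tanh(\kappa \, x \cdot y - \delta)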

A visual example of a polynomial kernel with p=3


Page 20: Lecture12 - SVM

Some Further Issues
We have to classify data

Described by nominal attributes and continuous attributes

Probably with missing values

That may have more than two classes

How does SVM deal with them?
SVM is defined over continuous attributes: no problem!

Nominal attributes: map them into a continuous space

Multiple classes: build SVMs that discriminate each pair of classes (one way to do both is sketched below)
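A hedged way to realise these two points with scikit-learn (an illustration, not the course’s prescribed tooling): one-hot encode the nominal attributes and let SVC build the pairwise (one-vs-one) classifiers; the data below is made up.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Made-up data: one nominal attribute and one continuous attribute
nominal = np.array([['red'], ['green'], ['blue'], ['red']])
continuous = np.array([[0.2], [1.5], [3.1], [2.7]])
y = np.array([0, 1, 2, 1])                      # three classes

# Map the nominal attribute into a continuous space (one-hot encoding)
X = np.hstack([OneHotEncoder().fit_transform(nominal).toarray(), continuous])

# SVC trains one SVM per pair of classes (one-vs-one) internally
clf = SVC(kernel='linear', decision_function_shape='ovo')
clf.fit(X, y)
print(clf.predict(X))
```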


Page 21: Lecture12 - SVM

Some Further Issues
I’ve seen lots of formulas… But I want to program an SVM builder. How do I get my SVM?

We have already mentioned that there are many methods to solve the quadratic programming problem

Many algorithms have been designed specifically for SVM

One of the most significant: Sequential Minimal Optimization (SMO)

Currently, there are many new algorithms


Page 22: Lecture12 - SVM

Next Class

Association Rules


Page 23: Lecture12 - SVM

Introduction to Machine Learning

Lecture 12: Support Vector Machines

Albert Orriols i Puig (aorriols@salle.url.edu)

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull