SVM and Kernel Machines
DESCRIPTION
SVM and Kernel Machines: a quick introduction and description.
TRANSCRIPT
Support Vector Machines and Kernel Machines
- Nawal K Sharma
Remember the XOR problem?
• It is the problem of getting a neural network to produce an identical output for input patterns that have nothing in common: the inputs (0,1) and (1,0) must both map to 1, while (0,0) and (1,1) both map to 0, so no single linear boundary can separate the two classes.
Support Vector Machines (SVM)
• Method for supervised learning problems
– Classification
– Regression
• Two key ideas
– Assuming linearly separable classes, learn a separating hyperplane with maximum margin
– Expand the input into a high-dimensional space to deal with linearly non-separable cases (such as the XOR problem)
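The first key idea, the maximum-margin hyperplane, can be illustrated with a minimal sketch (not from the slides; it assumes scikit-learn is available, and the toy data points are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters in 2-D.
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],   # class -1
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# kernel='linear' with a very large C approximates the hard-margin SVM:
# it finds the separating hyperplane w.x + b = 0 with maximum margin.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
```

Among all hyperplanes that separate the two clusters, this is the one whose distance to the nearest training point (the margin) is largest.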
Support vectors
The training points that are nearest to the separating function are called support vectors.
What is the output of our decision function for these points?
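For a hard-margin SVM the answer is ±1: the support vectors lie exactly on the margin. This can be checked numerically (my own sketch, assuming scikit-learn; the four collinear points are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # large C ~ hard-margin SVM
clf.fit(X, y)

# The support vectors are the training points nearest the hyperplane;
# the decision function evaluates to exactly -1 or +1 on them.
print(clf.support_vectors_)                         # the margin points
print(clf.decision_function(clf.support_vectors_))  # values of magnitude ~1
```

The outer points [0, 0] and [4, 4] play no role in the solution; only the margin points carry nonzero weight.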
Non-linear SVMs
• Transform x → φ(x)
• The linear algorithm depends only on the inner products x·xi, hence the transformed algorithm depends only on φ(x)·φ(xi)
• Use a kernel function K(x,y) such that K(x,y) = φ(x)·φ(y)
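The identity K(x,y) = φ(x)·φ(y) can be verified concretely for the degree-2 polynomial kernel (a standard textbook example, not from the slides; the feature map φ below is the usual one for 2-D inputs):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def K(x, y):
    # Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# The kernel computes the inner product in the transformed 3-D space
# without ever forming phi explicitly -- the "kernel trick".
print(K(x, y), np.dot(phi(x), phi(y)))  # both equal (1*3 + 2*0.5)^2 = 16.0
```

The saving matters in practice: for high-degree kernels or the RBF kernel the transformed space is huge or infinite-dimensional, yet K(x,y) stays cheap to evaluate.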
Using SVM for classification
• Prepare the data matrix
• Select the kernel function to use
• Execute the training algorithm using a QP solver to obtain the αi values
• Unseen data can be classified using the αi values and the support vectors
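The four steps above can be sketched end-to-end (a hypothetical example assuming scikit-learn, whose SVC solves the underlying QP internally; the XOR data from the earlier slides makes a convenient linearly non-separable test case, and the gamma value is an arbitrary choice):

```python
import numpy as np
from sklearn.svm import SVC

# 1. Prepare the data matrix: the XOR problem, not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# 2. Select the kernel function (here the RBF kernel).
# 3. Train: SVC solves the dual QP to obtain the alpha values.
clf = SVC(kernel='rbf', gamma=2.0)
clf.fit(X, y)

# 4. Classify unseen data using the alphas and the support vectors.
print(clf.predict([[0.1, 0.1], [0.9, 0.1]]))  # points near (0,0) and (1,0)
print(clf.dual_coef_)                         # the (signed) alpha values
```

A linear kernel would fail here, but in the RBF-induced feature space the four XOR points become separable.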
Making new kernels from the old
New kernels can be made from valid kernels by allowed operations: addition, multiplication, and rescaling by a positive constant all give a proper kernel, since the resulting Gram matrix remains positive semi-definite.

K(x1, x2) = K1(x1, x2) + K2(x1, x2)
K(x1, x2) = λ K1(x1, x2), with λ > 0
K(x1, x2) = K1(x1, x2) K2(x1, x2)

Also, given a real-valued function f(x) over inputs x, the following is a valid kernel:

K(x1, x2) = f(x1) f(x2)
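These closure rules can be spot-checked numerically (my own illustration, assuming NumPy; the points are random and the linear and RBF kernels stand in for K1 and K2): the smallest eigenvalue of each combined Gram matrix should be non-negative up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))  # six random 2-D points

# Gram matrices of two known-valid kernels on the same points.
G1 = X @ X.T                                         # linear kernel
sq = np.sum(X**2, axis=1)
G2 = np.exp(-(sq[:, None] + sq[None, :] - 2 * G1))   # RBF kernel, gamma=1

def min_eig(G):
    # Smallest eigenvalue of a symmetric matrix.
    return np.linalg.eigvalsh(G).min()

# Sum, positive rescaling, and elementwise product of valid Gram
# matrices stay positive semi-definite (min eigenvalue >= ~0).
for G in (G1 + G2, 3.0 * G1, G1 * G2):
    print(min_eig(G))
```

The product rule is the least obvious of the three; it is the Schur product theorem — the elementwise product of positive semi-definite matrices is positive semi-definite.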
Applications
• Handwritten digits recognition
• Text categorisation
• Face detection
• DNA analysis
• …many others
Discriminative versus generative classification methods
• SVMs learn the discrimination boundary directly; they are called discriminative approaches.
• This contrasts with learning a model for each class, as e.g. Bayesian classification does; the latter is called a generative approach.
• SVMs try to avoid overfitting in high-dimensional spaces (cf. regularisation).
Conclusions
• SVMs learn linear decision boundaries (cf. perceptrons)
– They pick the hyperplane that maximises the margin
– The optimal hyperplane turns out to be a linear combination of the support vectors
• Nonlinear problems are transformed into a higher-dimensional space using kernel functions; in the transformed space there is a better chance that the classes will be linearly separable.
Resources
• Software & practical guide to SVM for beginners: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• Kernel machines website: http://www.kernel-machines.org/
• Burges, C. J. C.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998. Available from http://svm.research.bell-labs.com/SVMdoc.html
• Cristianini & Shawe-Taylor: SVM book (in the School library)