SVM and Kernel Machines
DESCRIPTION
SVM and Kernel Machines: a quick introduction and description.
TRANSCRIPT
Support Vector Machines and Kernel Machines
- Nawal K Sharma
Remember the XOR problem?
• It is the problem of getting a neural network to produce an identical output for input patterns that have nothing in common: the inputs (0,1) and (1,0) must both map to 1, while (0,0) and (1,1) both map to 0, so no single linear boundary can separate the two classes.
Support Vector Machines (SVM)
• Method for supervised learning problems
– Classification
– Regression
• Two key ideas
– Assuming linearly separable classes, learn a separating hyperplane with maximum margin
– Expand the input into a high-dimensional space to deal with linearly non-separable cases (such as the XOR problem)
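The first key idea, the maximum-margin hyperplane, can be illustrated with a minimal sketch (not from the slides; it assumes scikit-learn is available, and the toy data points are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters in 2-D.
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],   # class -1
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# kernel='linear' with a very large C approximates the hard-margin SVM:
# it finds the separating hyperplane w.x + b = 0 with maximum margin.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
```

Among all hyperplanes that separate the two clusters, this is the one whose distance to the nearest training point (the margin) is largest.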
Support vectors
The training points that are nearest to the separating function are called support vectors.
What is the output of our decision function for these points?
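For a hard-margin SVM the answer is ±1: the support vectors lie exactly on the margin. This can be checked numerically (my own sketch, assuming scikit-learn; the four collinear points are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # large C ~ hard-margin SVM
clf.fit(X, y)

# The support vectors are the training points nearest the hyperplane;
# the decision function evaluates to exactly -1 or +1 on them.
print(clf.support_vectors_)                         # the margin points
print(clf.decision_function(clf.support_vectors_))  # values of magnitude ~1
```

The outer points [0, 0] and [4, 4] play no role in the solution; only the margin points carry nonzero weight.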
Non-linear SVMs
• Transform x → φ(x)
• The linear algorithm depends only on the inner products x·xi, hence the transformed algorithm depends only on φ(x)·φ(xi)
• Use a kernel function K(x,y) such that K(x,y) = φ(x)·φ(y)
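The identity K(x,y) = φ(x)·φ(y) can be verified concretely for the degree-2 polynomial kernel (a standard textbook example, not from the slides; the feature map φ below is the usual one for 2-D inputs):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def K(x, y):
    # Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# The kernel computes the inner product in the transformed 3-D space
# without ever forming phi explicitly -- the "kernel trick".
print(K(x, y), np.dot(phi(x), phi(y)))  # both equal (1*3 + 2*0.5)^2 = 16.0
```

The saving matters in practice: for high-degree kernels or the RBF kernel the transformed space is huge or infinite-dimensional, yet K(x,y) stays cheap to evaluate.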
Using SVM for classification
• Prepare the data matrix
• Select the kernel function to use
• Execute the training algorithm using a QP solver to obtain the αi values
• Unseen data can be classified using the αi values and the support vectors
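The four steps above can be sketched end-to-end (a hypothetical example assuming scikit-learn, whose SVC solves the underlying QP internally; the XOR data from the earlier slides makes a convenient linearly non-separable test case, and the gamma value is an arbitrary choice):

```python
import numpy as np
from sklearn.svm import SVC

# 1. Prepare the data matrix: the XOR problem, not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# 2. Select the kernel function (here the RBF kernel).
# 3. Train: SVC solves the dual QP to obtain the alpha values.
clf = SVC(kernel='rbf', gamma=2.0)
clf.fit(X, y)

# 4. Classify unseen data using the alphas and the support vectors.
print(clf.predict([[0.1, 0.1], [0.9, 0.1]]))  # points near (0,0) and (1,0)
print(clf.dual_coef_)                         # the (signed) alpha values
```

A linear kernel would fail here, but in the RBF-induced feature space the four XOR points become separable.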
Making new kernels from the old
New kernels can be made from valid kernels by allowed operations: addition, multiplication, and rescaling by a positive constant all give a proper kernel, since the resulting Gram matrix remains positive semi-definite.

K(x1, x2) = K1(x1, x2) + K2(x1, x2)
K(x1, x2) = λ K1(x1, x2), with λ > 0
K(x1, x2) = K1(x1, x2) K2(x1, x2)

Also, given a real-valued function f(x) over inputs x, the following is a valid kernel:

K(x1, x2) = f(x1) f(x2)
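These closure rules can be spot-checked numerically (my own illustration, assuming NumPy; the points are random and the linear and RBF kernels stand in for K1 and K2): the smallest eigenvalue of each combined Gram matrix should be non-negative up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))  # six random 2-D points

# Gram matrices of two known-valid kernels on the same points.
G1 = X @ X.T                                         # linear kernel
sq = np.sum(X**2, axis=1)
G2 = np.exp(-(sq[:, None] + sq[None, :] - 2 * G1))   # RBF kernel, gamma=1

def min_eig(G):
    # Smallest eigenvalue of a symmetric matrix.
    return np.linalg.eigvalsh(G).min()

# Sum, positive rescaling, and elementwise product of valid Gram
# matrices stay positive semi-definite (min eigenvalue >= ~0).
for G in (G1 + G2, 3.0 * G1, G1 * G2):
    print(min_eig(G))
```

The product rule is the least obvious of the three; it is the Schur product theorem — the elementwise product of positive semi-definite matrices is positive semi-definite.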
Applications
• Handwritten digits recognition
• Text categorisation
• Face detection
• DNA analysis
• …many others
Discriminative versus generative classification methods
• SVMs learn the discrimination boundary directly; they are called discriminative approaches.
• This contrasts with learning a model for each class, as e.g. Bayesian classification does; the latter is called a generative approach.
• SVMs try to avoid overfitting in high-dimensional spaces (cf. regularisation).
Conclusions
• SVMs learn linear decision boundaries (cf. perceptrons)
– They pick the hyperplane that maximises the margin
– The optimal hyperplane turns out to be a linear combination of the support vectors
• Nonlinear problems are transformed into a higher-dimensional space using kernel functions; in the transformed space there is a better chance that the classes will be linearly separable.
Resources
• Software & practical guide to SVM for beginners: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• Kernel machines website: http://www.kernel-machines.org/
• Burges, C. J. C.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998. Available from http://svm.research.bell-labs.com/SVMdoc.html
• Cristianini & Shawe-Taylor: SVM book (in the School library)