VC theory, Support vectors and Hedged prediction technology
VC theory, Support vectors and Hedged prediction technology
Overfitting in classification
• Assume a family C of classifiers of points in feature space F. A family of classifiers is a map from C × F to {0,1} (negative and positive class).
• For each subset X of F and each c in C, c(X) defines a partitioning of X into two classes.
• C shatters X if every partitioning of X is accomplished by some c in C.
• If some point set X of size d is shattered by C, then the VC-dimension is at least d.
• If no point set of d+1 elements can be shattered by C, then the VC-dimension is at most d.
VC-dimension of hyperplanes
• The set of points (thresholds) on the line shatters any two points, but no three.
• The set of lines in the plane shatters any three non-collinear points, but no set of four points.
• Any d+2 points in E^d can be partitioned into two blocks whose convex hulls intersect (Radon's theorem).
• The VC-dimension of hyperplanes in E^d is thus d+1.
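The one-dimensional case above can be checked by brute force. A minimal sketch in pure Python (function names are hypothetical): enumerate the labelings achievable by a threshold classifier of the form "x is positive iff s·(x − t) > 0", and compare against all 2^n labelings.

```python
def threshold_labelings(points):
    """All labelings of `points` achievable by a 1D 'hyperplane':
    classify x as positive iff s*(x - t) > 0 for a sign s and threshold t."""
    xs = sorted(points)
    # Candidate thresholds: one outside each end, plus midpoints between neighbours.
    cands = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    achieved = set()
    for s in (+1, -1):
        for t in cands:
            achieved.add(tuple(1 if s * (x - t) > 0 else 0 for x in points))
    return achieved

def shatters(points):
    """True iff every +/- labeling of `points` is achieved by some threshold."""
    return len(threshold_labelings(points)) == 2 ** len(points)
```

Any two distinct points are shattered, while three collinear points never are (the labeling +,−,+ is unreachable), matching a VC-dimension of 2 for 1D hyperplanes.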
Why VC-dimension?
• Elegant and pedagogical, not very useful.
• Bounds the future error of a classifier; PAC-learning.
• Exchangeable distribution of (x_i, y_i).
• For the first N points, the training error for c is the observed error rate for c.
• The goodness of selecting from C the classifier with best performance on the training set depends on the VC-dimension h:
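The formula on the slide image is lost in this transcript; the standard Vapnik bound it presumably showed states that, with probability at least 1 − η over the sample of size N,

```latex
R(c) \;\le\; R_{\mathrm{emp}}(c)
  + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}},
```

where R(c) is the true error, R_emp(c) the training error, and h the VC-dimension of C. Note the bound loosens as h grows and tightens as N grows.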
Classify with hyperplanes
Frank Rosenblatt (1928 – 1971)

Pioneering work in classifying by hyperplanes in high-dimensional spaces.

Criticized by Minsky and Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation. A predecessor to SVM/kernel methods.
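Rosenblatt's learning rule fits in a few lines. A minimal perceptron sketch in pure Python (names hypothetical), which converges when the data are linearly separable:

```python
def perceptron(samples, epochs=100):
    """Rosenblatt's perceptron: learn (w, b) so that sign(w.x + b)
    matches labels y in {-1, +1}; converges if the data are separable."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            # Misclassified (or on the boundary): nudge the hyperplane toward x.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            break  # all points correctly classified
    return w, b

# AND-like, linearly separable data in the plane
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), +1)]
w, b = perceptron(data)
```

Note the rule finds *some* separating hyperplane, not a wide-margin one — the gap SVMs later addressed.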
Find parallel hyperplanes
• Separate examples by wide-margin hyperplanes (classification).
• Enclose examples between hyperplanes (regression).
• If necessary, non-linearly map examples to a high-dimensional space where they are better separated.
Find parallel hyperplanes
Classification
Red: true separating plane.
Blue: wide-margin separation in sample.
Classify by the plane between the blue planes.
Find parallel hyperplanes
Regression
Red: true central plane.
Blue: narrowest margin enclosing sample.
New x_k: predict y_k so that (x_k, y_k) lies on the mid-plane (dotted).
From vector to scalar product
Soft Margins
Quadratic programming goes through also with soft margins. Specification of the softness constant C is part of most packages.

However, no prior rule for setting C is established, and experimentation is necessary for each application. The choice is between narrowing the margin, allowing more outliers, and using a more liberal kernel (to be described).
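The optimization referred to above is, in its standard soft-margin primal form,

```latex
\min_{w,\,b,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i\,(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,
```

where the slack variables ξ_i measure margin violations. A large C penalizes outliers heavily (hard-margin-like behaviour); a small C tolerates more violations in exchange for a wider margin.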
SVM packages
• Inputs: x_i, y_i, plus kernel and softness information.
• The only output is the vector of dual coefficients α; non-zero coefficients indicate support vectors.
• The hyperplane is obtained by w = Σ_i α_i y_i x_i, with b computed from any support vector.
Kernel Trick
Kernel Trick

Example: 2D space (x1, x2). Map to a higher-dimensional space (c1*x1, c2*x2, c3*x1^2, c4*x1*x2, c5*x2^2, c6).

K(x,y) = (x·y + 1)^2 = 2*x1*y1 + 2*x2*y2 + x1^2*y1^2 + x2^2*y2^2 + 2*x1*x2*y1*y2 + 1 = Φ(x)·Φ(y),

where Φ(x) = Φ((x1, x2)) = (√2*x1, √2*x2, x1^2, √2*x1*x2, x2^2, 1).

Hyperplanes in the feature space are mapped back to conic sections in R^2!
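The kernel identity can be verified numerically. A small sketch in pure Python (Φ written as `phi`), comparing the explicit feature-space scalar product against the kernel evaluated directly in R^2:

```python
import math

def K(x, y):
    """Polynomial kernel (x.y + 1)^2 on R^2."""
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2

def phi(x):
    """Explicit feature map with phi(x).phi(y) = K(x, y)."""
    s = math.sqrt(2)
    return (s * x[0], s * x[1], x[0] ** 2, s * x[0] * x[1], x[1] ** 2, 1.0)

x, y = (1.0, 2.0), (3.0, 4.0)
lhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # scalar product in feature space
rhs = K(x, y)                                     # kernel evaluated in R^2
```

For x = (1, 2) and y = (3, 4), both sides equal (3 + 8 + 1)^2 = 144 — the point of the trick being that the right-hand side never constructs the feature space.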
Kernel Trick

Gaussian kernel: K(x,y) = exp(-||x-y||^2 / (2σ^2)).
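The Gaussian kernel is equally easy to evaluate directly; a sketch in pure Python (the bandwidth σ is an assumed parameter, here defaulted to 1):

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2*sigma^2)).
    The implicit feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

K(x, x) = 1 for every point, the kernel is symmetric, and it decays toward 0 as points move apart — σ plays a role analogous to C: it must be tuned per application.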