proximal plane classification kdd 2001 san francisco august 26-29, 2001 glenn fung olvi mangasarian...
Proximal Plane Classification
KDD 2001
San Francisco August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Second Annual Review, June 1, 2001
Data Mining Institute University of Wisconsin - Madison
Key Contributions
Fast new support vector machine classifier
An order of magnitude faster than standard classifiers
Extremely simple to implement
4 lines of MATLAB code
NO optimization packages (LP,QP) needed
Outline of Talk
(Standard) Support vector machine (SVM) classifiers
Proximal support vector machine (PSVM) classifiers
Geometric motivation
Linear PSVM classifier
Nonlinear PSVM classifier
Full and reduced kernels
Numerical results
Correctness comparable to standard SVM
Much faster classification!
2-million points in 10-space in 21 seconds
Compared to over 10 minutes for standard SVM
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A-, bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, normal vector $w$, margin $2/\|(w,\gamma)\|_2$]
Proximal Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[Figure: point sets A+ and A-, parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, normal vector $w$, margin $2/\|(w,\gamma)\|_2$]
SVM as an Unconstrained Minimization Problem
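Changing to the 2-norm and measuring the margin in $(w,\gamma)$ space gives:
$$\text{(QP)}\quad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0$$
At the solution of (QP):
$$y = (e - D(Aw - e\gamma))_+, \quad \text{where } (\cdot)_+ = \max\{\cdot, 0\}$$
Hence (QP) is equivalent to:
$$\min_{w,\gamma}\ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$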
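The equivalence can be made concrete with a small example. The following minimal plain-Python sketch evaluates the unconstrained objective and the slack $y = (e - D(Aw - e\gamma))_+$ at a given $(w,\gamma)$. The function names `plus` and `svm_objective` and the list-of-rows data layout are our own assumptions, not from the talk. $D$ is stored as its diagonal `d` of +1/-1 labels.

```python
def plus(t):
    """Componentwise plus function: (t)_+ = max{t, 0}."""
    return [max(ti, 0.0) for ti in t]


def svm_objective(A, d, w, gamma, nu):
    """Return (value, y) for nu/2 * ||(e - D(Aw - e*gamma))_+||^2 + 1/2 * ||(w, gamma)||^2.

    A is a list of m rows of length n; d holds the +1/-1 labels (the diagonal of D).
    y is the slack vector that is optimal in (QP) for this fixed (w, gamma).
    """
    margins = [di * (sum(a * wj for a, wj in zip(row, w)) - gamma) for row, di in zip(A, d)]
    y = plus([1.0 - mi for mi in margins])
    value = 0.5 * nu * sum(yi * yi for yi in y) + 0.5 * (sum(wj * wj for wj in w) + gamma ** 2)
    return value, y


# Three 1-D points with w = 1, gamma = 0 (planes x = 1 and x = -1):
# x = 2 lies beyond its plane, x = 0.5 violates it, x = -1 sits exactly on its plane.
value, y = svm_objective([[2.0], [0.5], [-1.0]], [1, 1, -1], w=[1.0], gamma=0.0, nu=1.0)
print(value, y)  # 0.625 [0.0, 0.5, 0.0]
```

Only the violating point gets a nonzero slack. Points on or beyond their bounding plane cost nothing in the first term.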
PSVM Formulation
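We have from the QP SVM formulation, with the inequality constraint replaced by an equality:
$$\text{(QP)}\quad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$
This simple but critical modification changes the nature of the optimization problem tremendously!
Solving for $y$ in terms of $w$ and $\gamma$ gives:
$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$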
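The only change from the SVM objective is that the plus function disappears. As a result, every point is pulled toward its own plane $x'w = \gamma + d_i$ from either side, including a correctly classified point that lies well beyond it. A small plain-Python sketch shows this. The name `psvm_objective` and the list-of-rows layout are our own assumptions.

```python
def psvm_objective(A, d, w, gamma, nu):
    """nu/2 * ||e - D(Aw - e*gamma)||^2 + 1/2 * ||(w, gamma)||^2, with no plus function.

    A is a list of m rows of length n; d holds the +1/-1 labels (the diagonal of D).
    """
    residual = [1.0 - di * (sum(a * wj for a, wj in zip(row, w)) - gamma)
                for row, di in zip(A, d)]
    return 0.5 * nu * sum(r * r for r in residual) + 0.5 * (sum(wj * wj for wj in w) + gamma ** 2)


# Three 1-D points with w = 1, gamma = 0: the point x = 2 (label +1) lies beyond its
# plane x = 1, yet still contributes (1 - 2)^2 = 1 to the squared residual.
print(psvm_objective([[2.0], [0.5], [-1.0]], [1, 1, -1], w=[1.0], gamma=0.0, nu=1.0))  # 1.125
```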
Advantages of New Formulation
Objective function remains strongly convex
An explicit exact solution can be written in terms of the problem data
PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
Exact leave-one-out correctness can be obtained in terms of the problem data
Linear PSVM
We want to solve:
$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$
Setting the gradient equal to zero gives a nonsingular system of linear equations.
Solution of the system gives the desired PSVM classifier
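The gradient step can be written out explicitly. Stack $z = [w;\gamma]$ and let $H = [A \;\; -e]$, so that $Aw - e\gamma = Hz$. The only other fact needed is $D^2 = I$, which holds because $D$ is diagonal with $\pm 1$ entries:

$$
\begin{aligned}
f(z) &= \frac{\nu}{2}\|e - DHz\|_2^2 + \frac{1}{2}\|z\|_2^2 \\
\nabla f(z) &= -\nu H'D(e - DHz) + z = 0 \\
&\iff \nu H'Hz + z = \nu H'De \\
&\iff \left(\frac{I}{\nu} + H'H\right) z = H'De
\end{aligned}
$$

For any $\nu > 0$ the matrix $I/\nu + H'H$ is symmetric positive definite. This is why the system is nonsingular.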
Linear PSVM Solution
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De, \quad \text{where } H = [A \;\; -e]$$
The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.
$n$ is usually much smaller than $m$.
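The system depends on the data only through $H'H$ and $H'De$. Both can therefore be accumulated in a single pass over the $m$ rows while storing only $O(n^2)$ numbers. The plain-Python sketch below illustrates this. `accumulate_normal_equations` is a hypothetical helper, not part of the talk, and this is not the authors' incremental PSVM.

```python
def accumulate_normal_equations(rows, labels, n):
    """One pass over (row, label) pairs, returning H'H ((n+1) x (n+1)) and H'De ((n+1) vector).

    Row i of H is [A_i, -1]; since D is diagonal, (H'De)_j = sum_i d_i * H_ij.
    Memory use depends on n only, not on the number of rows m.
    """
    HtH = [[0.0] * (n + 1) for _ in range(n + 1)]
    HtDe = [0.0] * (n + 1)
    for row, di in zip(rows, labels):
        h = list(row) + [-1.0]  # one row of H = [A  -e]
        for j in range(n + 1):
            HtDe[j] += di * h[j]
            for k in range(n + 1):
                HtH[j][k] += h[j] * h[k]
    return HtH, HtDe


# Rows can come from any iterable, e.g. a generator reading a large file.
HtH, HtDe = accumulate_normal_equations(iter([[1.0], [2.0]]), [1, -1], n=1)
print(HtH, HtDe)  # [[5.0, -3.0], [-3.0, 2.0]] [-1.0, 0.0]
```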
Linear Proximal SVM Algorithm
Input $A, D$
Define $H = [A \;\; -e]$
Calculate $v = H'De$
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$
Classifier: $\operatorname{sign}(x'w - \gamma)$
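The algorithm maps directly onto a few lines of code. Below is a minimal plain-Python sketch. It assumes dense lists of rows for $A$, the label list `d` for the diagonal of $D$, and a hand-written Gaussian elimination routine. `solve`, `psvm_train` and `psvm_classify` are hypothetical names; a practical implementation would call a linear-algebra library instead.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (M square, nonsingular)."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def psvm_train(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H)[w; gamma] = H'De with H = [A  -e]; return (w, gamma)."""
    m, n = len(A), len(A[0])
    H = [list(row) + [-1.0] for row in A]                              # H = [A  -e]
    v = [sum(d[i] * H[i][j] for i in range(m)) for j in range(n + 1)]  # v = H'De
    M = [[(1.0 / nu if j == k else 0.0) + sum(H[i][j] * H[i][k] for i in range(m))
          for k in range(n + 1)] for j in range(n + 1)]                # I/nu + H'H
    r = solve(M, v)
    return r[:n], r[n]


def psvm_classify(x, w, gamma):
    """sign(x'w - gamma); a value of exactly 0 is sent to +1 here (an arbitrary tie-break)."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) - gamma >= 0 else -1


A = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
d = [1, 1, -1, -1]
w, gamma = psvm_train(A, d, nu=10.0)
print([psvm_classify(x, w, gamma) for x in A])  # [1, 1, -1, -1]
```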
Nonlinear PSVM Formulation
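Linear PSVM (linear separating surface: $x'w = \gamma$):
$$\text{(QP)}\quad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$
By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:
$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$
Replace $AA'$ by a nonlinear kernel $K(A,A')$:
$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$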
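After a change of variables, the kernelized problem has exactly the form of the linear one. Let $\bar u = Du$. Because $D^2 = I$, we have $u = D\bar u$ and $\|\bar u\|_2 = \|u\|_2$, so:

$$
\begin{aligned}
&\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2 \\
=\ &\min_{\bar u,\gamma}\ \frac{\nu}{2}\|e - D(K(A,A')\bar u - e\gamma)\|_2^2 + \frac{1}{2}\|(\bar u,\gamma)\|_2^2
\end{aligned}
$$

This is the linear PSVM problem with $A$ replaced by $K(A,A')$ and $w$ by $\bar u$. The same gradient argument therefore applies with $H = [K(A,A') \;\; -e]$. The separating surface $K(x',A')Du = \gamma$ becomes $K(x',A')\bar u = \gamma$.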
The Nonlinear Classifier
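The nonlinear classifier (separating surface):
$$K(x', A')Du = \gamma$$
where $K(A,B): R^{m\times n} \times R^{n\times l} \to R^{m\times l}$ is a nonlinear kernel, e.g.:
Gaussian (Radial Basis) Kernel: $K(A,A')_{ij} = \exp(-\mu\|A_i - A_j\|_2^2), \quad i,j = 1,\ldots,m$
Polynomial Kernel: $(AA' + \mu aa')^d_\bullet$ (componentwise $d$-th power)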
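Here is the Gaussian kernel as a plain-Python function, for concreteness. The name `gaussian_kernel`, the list-of-rows layout, and passing the rows of $B'$ as `C` are our own conventions. Only the Gaussian kernel is sketched, and the width $\mu$ is a user-chosen parameter.

```python
import math


def gaussian_kernel(A, C, mu):
    """Gaussian kernel K(A, C') with entries exp(-mu * ||A_i - C_j||_2^2).

    A is a list of m rows and C a list of l rows (all of length n), so the result is
    m x l, matching K(A, B): R^{m x n} x R^{n x l} -> R^{m x l} with B = C'.
    """
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C]
            for Ai in A]


K = gaussian_kernel([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], mu=0.5)
print(len(K), len(K[0]))  # 2 3
print(K[0][0])            # 1.0
```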
Nonlinear PSVM
Defining $H$ slightly differently, $H = [K(A,A') \;\; -e]$, and setting the gradient equal to zero as in the linear case, we obtain:
$$\begin{bmatrix} \bar u \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De, \quad \text{where } \bar u = Du$$
Here, the linear system to solve is of size $(m+1) \times (m+1)$.
However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.
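The following sketch shows how a reduced kernel shrinks the system. It replaces the square $K(A,A')$ by a rectangular $K(A,\bar A')$ built from $\bar m$ randomly chosen rows $\bar A$ of $A$, so the system becomes $(\bar m+1)\times(\bar m+1)$. This is a simplified illustration under our own assumptions: uniform random row selection, a Gaussian kernel, and hypothetical function names. It is not presented as the exact RSVM procedure.

```python
import math
import random


def solve(M, b):
    """Gaussian elimination with partial pivoting for a square nonsingular M."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gaussian_kernel(A, C, mu):
    """m x l matrix with entries exp(-mu * ||A_i - C_j||^2)."""
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C] for Ai in A]


def reduced_psvm_train(A, d, nu, mu, mbar, seed=0):
    """Solve (I/nu + H'H)[ubar; gamma] = H'De with H = [K(A, Abar')  -e], of size (mbar+1)."""
    Abar = random.Random(seed).sample(A, mbar)            # assumption: uniform random rows
    H = [row + [-1.0] for row in gaussian_kernel(A, Abar, mu)]  # m x (mbar+1)
    p = mbar + 1
    v = [sum(di * h[j] for di, h in zip(d, H)) for j in range(p)]   # H'De
    M = [[(1.0 / nu if j == k else 0.0) + sum(h[j] * h[k] for h in H) for k in range(p)]
         for j in range(p)]                                         # I/nu + H'H
    r = solve(M, v)
    return Abar, r[:mbar], r[mbar]


def reduced_psvm_classify(x, Abar, ubar, gamma, mu):
    """sign(K(x', Abar') ubar - gamma), with 0 sent to +1."""
    k = gaussian_kernel([x], Abar, mu)[0]
    return 1 if sum(ki * ui for ki, ui in zip(k, ubar)) - gamma >= 0 else -1


A = [[0.0], [0.1], [3.0], [3.1]]
d = [1, 1, -1, -1]
Abar, ubar, gamma = reduced_psvm_train(A, d, nu=100.0, mu=1.0, mbar=2)
print(len(ubar))  # 2: the system solved was 3 x 3 rather than 5 x 5
print([reduced_psvm_classify(x, Abar, ubar, gamma, mu=1.0) for x in [[0.05], [2.9]]])
```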
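Nonlinear Proximal SVM Algorithm
(The linear algorithm with $A$ replaced by $K = K(A,A')$ and $w$ by $\bar u = Du$.)
Input $A, D$
Define $K = K(A,A')$ and $H = [K \;\; -e]$
Calculate $v = H'De$
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} \bar u \\ \gamma \end{bmatrix} = v$
Classifier: $\operatorname{sign}(K(x', A')\bar u - \gamma)$, where $\bar u = Du$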
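Below is a compact plain-Python sketch of the nonlinear algorithm with the full kernel. It rests on our own assumptions: a Gaussian kernel with a hand-picked $\mu$, dense Gaussian elimination for the $(m+1)\times(m+1)$ solve, and hypothetical function names. It returns $\bar u = Du$ directly, so the classifier evaluates $K(x',A')\bar u - \gamma$.

```python
import math


def solve(M, b):
    """Gaussian elimination with partial pivoting for a square nonsingular M."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gaussian_kernel(A, C, mu):
    """m x l matrix with entries exp(-mu * ||A_i - C_j||^2)."""
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C] for Ai in A]


def nonlinear_psvm_train(A, d, nu, mu):
    """Solve (I/nu + H'H)[ubar; gamma] = H'De with H = [K(A, A')  -e]; return (ubar, gamma)."""
    m = len(A)
    H = [row + [-1.0] for row in gaussian_kernel(A, A, mu)]            # H = [K  -e]
    v = [sum(di * h[j] for di, h in zip(d, H)) for j in range(m + 1)]   # v = H'De
    M = [[(1.0 / nu if j == k else 0.0) + sum(h[j] * h[k] for h in H) for k in range(m + 1)]
         for j in range(m + 1)]                                         # I/nu + H'H
    r = solve(M, v)
    return r[:m], r[m]


def nonlinear_psvm_classify(x, A, ubar, gamma, mu):
    """sign(K(x', A') ubar - gamma), with 0 sent to +1."""
    k = gaussian_kernel([x], A, mu)[0]
    return 1 if sum(ki * ui for ki, ui in zip(k, ubar)) - gamma >= 0 else -1


# XOR: not linearly separable, but separated by the Gaussian-kernel surface.
A = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
d = [1, 1, -1, -1]
ubar, gamma = nonlinear_psvm_train(A, d, nu=100.0, mu=2.0)
print([nonlinear_psvm_classify(x, A, ubar, gamma, mu=2.0) for x in A])  # [1, 1, -1, -1]
```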
PSVM MATLAB Code
function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A);e=ones(m,1);H=[A -e];
v=(d'*H)';                    % v=H'*D*e;
r=(speye(n+1)/nu+H'*H)\v;     % solve (I/nu+H'*H)r=v
w=r(1:n);gamma=r(n+1);        % getting w,gamma from r
Linear PSVM Comparisons with Other SVMs
Much Faster, Comparable Correctness
| Data set (m x n) | PSVM ten-fold test % | PSVM time (sec.) | SSVM ten-fold test % | SSVM time (sec.) | SVMlight ten-fold test % | SVMlight time (sec.) |
|---|---|---|---|---|---|---|
| WPBC (60 mo.) 110 x 32 | 68.5 | 0.02 | 68.5 | 0.17 | 62.7 | 3.85 |
| Ionosphere 351 x 34 | 87.3 | 0.17 | 88.7 | 1.23 | 88.0 | 2.19 |
| Cleveland Heart 297 x 13 | 85.9 | 0.01 | 86.2 | 0.70 | 86.5 | 1.44 |
| Pima Indians 768 x 8 | 77.5 | 0.02 | 77.6 | 0.78 | 76.4 | 37.00 |
| BUPA Liver 345 x 6 | 69.4 | 0.02 | 70.0 | 0.78 | 69.5 | 6.65 |
| Galaxy Dim 4192 x 14 | 93.5 | 0.34 | 95.0 | 5.21 | 94.1 | 28.33 |
Linear PSVMComparisons on Larger Adult Dataset
Much Faster & Comparable Correctness
Testing correctness % / running time (sec.); attributes = 123

| Dataset size (Train, Test) | PSVM | LSVM | SSVM | SOR | SMO | SVMlight |
|---|---|---|---|---|---|---|
| (11221, 21341) | 84.48 / 2.5 | 84.84 / 38.9 | 84.79 / 14.1 | 84.37 / 18.8 | - / 17.0 | 84.68 / 306.6 |
| (16101, 16461) | 84.78 / 3.7 | 85.01 / 60.5 | 84.96 / 21.5 | 84.62 / 24.8 | - / 35.3 | 84.83 / 667.2 |
| (22697, 9865) | 85.16 / 5.2 | 85.35 / 92.0 | 85.35 / 29.0 | 85.06 / 31.3 | - / 85.7 | 85.17 / 1425.6 |
| (32562, 16282) | 84.56 / 7.4 | 85.05 / 140.9 | 85.02 / 44.5 | 84.96 / 83.9 | - / 163.6 | 85.05 / 2184.0 |
Linear PSVM vs LSVM 2-Million Dataset
Over 30 Times Faster
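| Dataset | Method | Training correctness % | Testing correctness % | Time (sec.) |
|---|---|---|---|---|
| NDC "Easy" | LSVM | 90.86 | 91.23 | 658.5 |
| NDC "Easy" | PSVM | 90.80 | 91.13 | 20.8 |
| NDC "Hard" | LSVM | 69.80 | 69.44 | 655.6 |
| NDC "Hard" | PSVM | 69.84 | 69.52 | 20.6 |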
Nonlinear PSVM: Spiral Dataset (94 Red Dots & 94 White Dots)
Nonlinear PSVM Comparisons
| Data set (m x n) | PSVM ten-fold test % | PSVM time (sec.) | SSVM ten-fold test % | SSVM time (sec.) | LSVM ten-fold test % | LSVM time (sec.) |
|---|---|---|---|---|---|---|
| Ionosphere 351 x 34 | 95.2 | 4.60 | 95.8 | 25.25 | 95.8 | 14.58 |
| BUPA Liver 345 x 6 | 73.6 | 4.34 | 73.7 | 20.65 | 73.7 | 30.75 |
| Tic-Tac-Toe 958 x 9 | 98.4 | 74.95 | 98.4 | 395.30 | 94.7 | 350.64 |
| Mushroom (*) 8124 x 22 | 88.0 | 35.50 | 88.8 | 307.66 | 87.8 | 503.74 |

(*) A rectangular kernel of size 8124 x 215 was used.
Conclusion
PSVM is an extremely simple procedure for generating linear and nonlinear classifiers
PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space for a linear classifier
Comparable test set correctness to standard SVM
Much faster than standard SVMs: typically an order of magnitude less time.
Future Work
Extension of PSVM to multicategory classification
Massive data classification using an incremental PSVM
Parallel extension and implementation of PSVM