Proximal Plane Classification

KDD 2001, San Francisco, August 26-29, 2001

Glenn Fung & Olvi Mangasarian

Second Annual Review, June 1, 2001

Data Mining Institute, University of Wisconsin - Madison

Key Contributions

Fast new support vector machine classifier

An order of magnitude faster than standard classifiers

Extremely simple to implement

4 lines of MATLAB code

NO optimization packages (LP,QP) needed

Outline of Talk

(Standard) support vector machine (SVM) classifiers

Proximal support vector machine (PSVM) classifiers

    Geometric motivation

    Linear PSVM classifier

    Nonlinear PSVM classifier

        Full and reduced kernels

Numerical results

    Correctness comparable to standard SVM

    Much faster classification!

        2 million points in 10-space in 21 seconds

        Compared to over 10 minutes for standard SVM

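Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: point sets A+ and A- with the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, the normal vector $w$, and the margin $\frac{2}{\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2}$ between the planes]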

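Proximal Vector Machines: Fitting the Data using two parallel Bounding Planes

[Figure: point sets A+ and A- clustered around the parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with the normal vector $w$ and the distance $\frac{2}{\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2}$ between the planes]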

SVM as an Unconstrained Minimization Problem

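Changing to 2-norm and measuring margin in $(w, \gamma)$ space:

$$\text{(QP)}\qquad \min_{y \ge 0,\, w,\, \gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e$$

At the solution of (QP): $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+ = \max\{\cdot, 0\}$

Hence (QP) is equivalent to:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2$$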

PSVM Formulation

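We have from the QP SVM formulation, with the inequality constraint replaced by an equality:

$$\text{(QP)}\qquad \min_{w,\gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$

This simple but critical modification changes the nature of the optimization problem tremendously!

Solving for $y$ in terms of $w$ and $\gamma$ gives:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2$$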

Advantages of New Formulation

Objective function remains strongly convex

An explicit exact solution can be written in terms of the problem data

PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space

Exact leave-one-out correctness can be obtained in terms of the problem data

Linear PSVM

We want to solve:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2$$

Setting the gradient equal to zero gives a nonsingular system of linear equations.

The solution of this system gives the desired PSVM classifier.
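The step from the objective to the linear system can be spelled out as follows. This is a sketch of the standard calculation, not taken from the slides: the stacked variable $z$ is notation introduced here for illustration, $H = [A \;\; -e]$, and the identity $DD = I$ holds because $D$ is diagonal with $\pm 1$ entries.

```latex
\begin{align*}
z &= \begin{bmatrix} w \\ \gamma \end{bmatrix}, \qquad
D(Aw - e\gamma) = DHz, \qquad
f(z) = \frac{\nu}{2}\|e - DHz\|_2^2 + \frac{1}{2}\|z\|_2^2 \\
\nabla f(z) &= -\nu H'D(e - DHz) + z
             = -\nu H'De + \nu H'Hz + z = 0 \\
\Longrightarrow\quad
\left(\frac{I}{\nu} + H'H\right) z &= H'De
\end{align*}
```

The matrix $\frac{I}{\nu} + H'H$ is positive definite for any $\nu > 0$, which is why the system is nonsingular.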

Linear PSVM Solution

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

Here, $H = [A \;\; -e]$

The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$

$n$ is usually much smaller than $m$

Linear Proximal SVM Algorithm

Input $A, D$

Define $H = [A \;\; -e]$

Calculate $v = H'De$

Solve $\left(\frac{I}{\nu} + H'H\right) \begin{bmatrix} w \\ \gamma \end{bmatrix} = v$

Classifier: $\mathrm{sign}(w'x - \gamma)$

Nonlinear PSVM Formulation

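Linear PSVM (linear separating surface: $x'w = \gamma$):

$$\text{(QP)}\qquad \min_{w,\gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|u, \gamma\|_2^2$$

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(K(A, A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|u, \gamma\|_2^2$$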

The Nonlinear Classifier

The nonlinear classifier: $K(x', A')Du = \gamma$

$K(A, B) : R^{m \times n} \times R^{n \times l} \longmapsto R^{m \times l}$

Where $K$ is a nonlinear kernel, e.g.:

Polynomial Kernel: $(AA' + \mu aa')^d_\bullet$

Gaussian (Radial Basis) Kernel: $\varepsilon^{-\mu\|A_i - A_j\|_2^2},\ i, j = 1, \ldots, m$
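As an illustration of the Gaussian kernel, the following MATLAB sketch builds the kernel matrix for two small sets of row-wise points. The data and the value of mu are hypothetical choices for illustration only; the squared distances are computed with the expansion ||a||^2 + ||b||^2 - 2a'b.

```matlab
% Gaussian kernel matrix with entries exp(-mu * ||A_i - B_j||^2).
% A is m-by-n, B is k-by-n (rows are points). Data and mu are hypothetical.
A  = [0 0; 1 0; 0 2];
B  = A;
mu = 0.5;

sqA = sum(A.^2, 2);                           % m-by-1 squared row norms of A
sqB = sum(B.^2, 2);                           % k-by-1 squared row norms of B
dist2 = bsxfun(@plus, sqA, sqB') - 2*(A*B');  % m-by-k squared distances
dist2 = max(dist2, 0);                        % guard against negative round-off
K = exp(-mu * dist2);                         % m-by-k kernel matrix

disp(K);
```

With B = A the result is the square, symmetric matrix K(A, A') with ones on the diagonal.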

Nonlinear PSVM

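Defining $H$ slightly differently: $H = [K(A, A') \;\; -e]$

Similar to the linear case, setting the gradient equal to zero, we obtain:

$$\begin{bmatrix} \hat{u} \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De, \qquad \hat{u} = Du$$

Since $D$ is diagonal with $\pm 1$ entries, $u = D\hat{u}$ and $\|u\|_2 = \|\hat{u}\|_2$.

Here, the linear system to solve is of size $(m+1) \times (m+1)$

However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.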

Nonlinear Proximal SVM Algorithm

Input $A, D$

Define $K = K(A, A')$, $H = [K \;\; -e]$

Calculate $v = H'De$

Solve $\left(\frac{I}{\nu} + H'H\right) \begin{bmatrix} \hat{u} \\ \gamma \end{bmatrix} = v$

Classifier: $\mathrm{sign}(K(x', A')\hat{u} - \gamma)$, where $\hat{u} = Du$
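The nonlinear algorithm can be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not the authors' code: the XOR-like toy data, the values of nu and mu, and the helper name gauss are all hypothetical, and the solved vector is named uhat to stand for Du.

```matlab
% Toy XOR-like data (hypothetical): not linearly separable. Rows are points.
A  = [0 0; 1 1; 0 1; 1 0];
d  = [1; 1; -1; -1];                % class labels, d = diag(D)
nu = 100;                           % hypothetical parameter value
mu = 2;                             % hypothetical Gaussian kernel parameter

% Gaussian kernel between the rows of X and the rows of Y
gauss = @(X, Y) exp(-mu * max(bsxfun(@plus, sum(X.^2,2), sum(Y.^2,2)') - 2*(X*Y'), 0));

m = size(A, 1);
e = ones(m, 1);
K = gauss(A, A);                    % K = K(A, A'), m-by-m
H = [K -e];
v = H' * (d .* e);                  % v = H'*D*e
r = (eye(m+1)/nu + H'*H) \ v;       % solve (I/nu + H'*H) r = v
uhat  = r(1:m);                     % uhat plays the role of D*u
gamma = r(m+1);

% Classify the training points with sign(K(x', A')*uhat - gamma)
pred = sign(gauss(A, A) * uhat - gamma);
disp([d pred]);
```

The system solved here has m+1 unknowns, so for large m the full kernel becomes the bottleneck, which is the motivation for reduced kernels.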

PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A);e=ones(m,1);H=[A -e];
v=(d'*H)';                    % v=H'*D*e;
r=(speye(n+1)/nu+H'*H)\v;     % solve (I/nu+H'*H)r=v
w=r(1:n);gamma=r(n+1);        % getting w,gamma from r
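A short usage sketch may help show how the classifier is applied once w and gamma are known. The data set and the value of nu below are made up for illustration, and the body of psvm is inlined so that the script runs on its own.

```matlab
% Synthetic two-class data (hypothetical, not from the talk). Rows are points.
A  = [2 2; 3 3; 2 3; 3 2; -2 -2; -3 -3; -2 -3; -3 -2];
d  = [1; 1; 1; 1; -1; -1; -1; -1];  % class labels, d = diag(D)
nu = 1;                             % hypothetical parameter value

% Same steps as the body of psvm(A,d,nu)
[m, n] = size(A);
e = ones(m, 1);
H = [A -e];
v = (d' * H)';                      % v = H'*D*e
r = (speye(n+1)/nu + H'*H) \ v;     % solve (I/nu + H'*H) r = v
w = r(1:n);
gamma = r(n+1);

% Classify with sign(x'*w - gamma) and measure training correctness
pred = sign(A*w - gamma);
correctness = 100 * mean(pred == d);
fprintf('Training correctness: %.1f%%\n', correctness);
```

Only an (n+1)-by-(n+1) system is solved, regardless of the number of points m.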

Linear PSVM Comparisons with Other SVMs

Much Faster, Comparable Correctness

Ten-fold test set correctness (%) and time (sec.)

Data Set          m x n       PSVM              SSVM              SVM light
                              Test %   Time     Test %   Time     Test %   Time
WPBC (60 mo.)     110 x 32    68.5     0.02     68.5     0.17     62.7     3.85
Ionosphere        351 x 34    87.3     0.17     88.7     1.23     88.0     2.19
Cleveland Heart   297 x 13    85.9     0.01     86.2     0.70     86.5     1.44
Pima Indians      768 x 8     77.5     0.02     77.6     0.78     76.4     37.00
BUPA Liver        345 x 6     69.4     0.02     70.0     0.78     69.5     6.65
Galaxy Dim        4192 x 14   93.5     0.34     95.0     5.21     94.1     28.33

Linear PSVM Comparisons on Larger Adult Dataset

Much Faster & Comparable Correctness

Testing correctness (%) and running time (sec.); Attributes = 123; "-" means no correctness value is given

Dataset Size      PSVM            LSVM            SSVM            SOR             SMO             SVM light
(Train, Test)     %      Time     %      Time     %      Time     %      Time     %      Time     %      Time
(11221, 21341)    84.48  2.5      84.84  38.9     84.79  14.1     84.37  18.8     -      17.0     84.68  306.6
(16101, 16461)    84.78  3.7      85.01  60.5     84.96  21.5     84.62  24.8     -      35.3     84.83  667.2
(22697, 9865)     85.16  5.2      85.35  92.0     85.35  29.0     85.06  31.3     -      85.7     85.17  1425.6
(32562, 16282)    84.56  7.4      85.05  140.9    85.02  44.5     84.96  83.9     -      163.6    85.05  2184.0

Linear PSVM vs LSVM: 2-Million Dataset

Over 30 Times Faster

Dataset       Method   Training Correctness %   Testing Correctness %   Time (sec.)
NDC "Easy"    LSVM     90.86                    91.23                   658.5
NDC "Easy"    PSVM     90.80                    91.13                   20.8
NDC "Hard"    LSVM     69.80                    69.44                   655.6
NDC "Hard"    PSVM     69.84                    69.52                   20.6

Nonlinear PSVM: Spiral Dataset

94 Red Dots & 94 White Dots

Nonlinear PSVM Comparisons

Ten-fold test set correctness (%) and time (sec.)

Data Set       m x n       PSVM              SSVM              LSVM
                           Test %   Time     Test %   Time     Test %   Time
Ionosphere     351 x 34    95.2     4.60     95.8     25.25    95.8     14.58
BUPA Liver     345 x 6     73.6     4.34     73.7     20.65    73.7     30.75
Tic-Tac-Toe    958 x 9     98.4     74.95    98.4     395.30   94.7     350.64
Mushroom *     8124 x 22   88.0     35.50    88.8     307.66   87.8     503.74

* A rectangular kernel of size 8124 x 215 was used
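A rectangular (reduced) kernel can be sketched as follows: the kernel is evaluated between all m points and a small subset of them, so the linear system shrinks from m+1 to mbar+1 unknowns. This is a hedged illustration only. The two-ring data, the parameter values, and the names Abar, Kr and gauss are hypothetical, and the subset is taken with a fixed stride for reproducibility, whereas a reduced kernel method would typically pick it at random.

```matlab
% Hypothetical toy data: an inner ring (class +1) and an outer ring (class -1)
t = linspace(0, 2*pi, 21)';  t(end) = [];      % 20 distinct angles
A = [0.5*cos(t) 0.5*sin(t); 2*cos(t) 2*sin(t)];
d = [ones(20,1); -ones(20,1)];                 % class labels, d = diag(D)
nu = 10;                                       % hypothetical parameter value
mu = 1;                                        % hypothetical kernel parameter

% Gaussian kernel between the rows of X and the rows of Y
gauss = @(X, Y) exp(-mu * max(bsxfun(@plus, sum(X.^2,2), sum(Y.^2,2)') - 2*(X*Y'), 0));

idx  = 1:4:size(A,1);                          % fixed-stride subset (assumption)
Abar = A(idx, :);                              % mbar-by-n reduced set
[m, mbar] = deal(size(A,1), size(Abar,1));

e  = ones(m, 1);
Kr = gauss(A, Abar);                           % rectangular kernel, m-by-mbar
H  = [Kr -e];
v  = H' * (d .* e);                            % v = H'*D*e
r  = (eye(mbar+1)/nu + H'*H) \ v;              % (mbar+1)-by-(mbar+1) system
uhat  = r(1:mbar);
gamma = r(mbar+1);

% Classify with the rectangular kernel in place of the full one
pred = sign(gauss(A, Abar) * uhat - gamma);
fprintf('Training correctness: %.1f%%\n', 100 * mean(pred == d));
```

Here the system has 11 unknowns instead of 41, which mirrors the reduction from 8124 to 215 columns in the Mushroom experiment.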

Conclusion

PSVM is an extremely simple procedure for generating linear and nonlinear classifiers

PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space for a linear classifier

Comparable test set correctness to standard SVM

Much faster than standard SVMs: typically an order of magnitude less running time.

Future Work

Extension of PSVM to multicategory classification

Massive data classification using an incremental PSVM

Parallel extension and implementation of PSVM
