Scaling multi-class Support Vector Machines using inter-class confusion



TRANSCRIPT

Page 1: Scaling multi-class Support Vector Machines using inter-class confusion

Scaling multi-class Support Vector Machines using inter-class confusion

Authors: Shantanu Godbole, Sunita Sarawagi, Soumen Chakrabarti
Advisor: Dr. Hsu
Graduate student: Ching-wen Hong

Page 2: Scaling multi-class Support Vector Machines using inter-class confusion

Content

1. Motivation
2. Objective
3. Introduction: (1) SVM; (2) using SVM to solve multi-class problems; (3) the method presented in this paper
4. Our approach: (1) the hierarchical approach; (2) the GraphSVM algorithm
5. Experimental evaluation
6. Conclusion
7. Personal opinion

Page 3: Scaling multi-class Support Vector Machines using inter-class confusion

Motivation

Solve multi-class problems.

Page 4: Scaling multi-class Support Vector Machines using inter-class confusion

Objective

SVMs excel at two-class discriminative learning problems, and their accuracy is high.

SVMs are hard to apply to multi-class problems because the training time is long.

The naïve Bayes (NB) classifier is much faster than SVM in training time.

We propose a new technique for multi-way classification which exploits the accuracy of SVMs and the speed of NB classifiers.

Page 5: Scaling multi-class Support Vector Machines using inter-class confusion

Introduction

1. SVM:
Input: a training set S = {(x_1, y_1), …, (x_N, y_N)}, where each x_i is a vector and y_i ∈ {1, -1}.
Output: a classifier f(x) = w · x + b.
For example, in medical diagnosis: x_i = (age, sex, blood, …, genome, …), and y_i indicates the risk of cancer.
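As a concrete illustration of this input/output setting (not code from the paper), the following minimal Python sketch trains a linear SVM on a toy two-class dataset with scikit-learn; the feature values and labels are invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy training set S = {(x_i, y_i)}: each x_i is a feature vector, y_i in {1, -1}.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5],   # three examples of class +1
              [6.0, 1.0], [7.0, 0.5], [8.0, 1.5]])  # three examples of class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Fit a linear classifier f(x) = w . x + b.
clf = LinearSVC(C=1.0)
clf.fit(X, y)

print("w =", clf.coef_, " b =", clf.intercept_)
print("f(x) for a new point:", clf.decision_function([[2.5, 3.0]]))
```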

Page 6: Scaling multi-class Support Vector Machines using inter-class confusion

1. Linear SVM

Page 7: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 8: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 9: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 10: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 11: Scaling multi-class Support Vector Machines using inter-class confusion

2. Using SVM to solve multi-class problems

1. The "one-vs-others" approach
For each of the N classes, we construct a one-vs-others (yes/no) SVM for that class alone.
The winning SVM is the one which says yes and whose margin is largest among all SVMs.
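A minimal sketch of this one-vs-others scheme (not the paper's code), using scikit-learn's LinearSVC for the N binary classifiers; taking the largest decision value approximates the "says yes with the largest margin" rule, and the toy data is invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_others(X, y, classes):
    """Train one binary (this class vs. all others) linear SVM per class."""
    models = {}
    for c in classes:
        y_binary = np.where(y == c, 1, -1)
        models[c] = LinearSVC(C=1.0).fit(X, y_binary)
    return models

def predict_one_vs_others(models, X):
    """Pick, for each example, the class whose SVM gives the largest decision value."""
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Toy 3-class data, invented for illustration.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
models = train_one_vs_others(X, y, classes=[0, 1, 2])
print(predict_one_vs_others(models, X))
```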

Page 12: Scaling multi-class Support Vector Machines using inter-class confusion

Using SVM to solve multi-class problems

2. The accumulated-votes approach
Construct SVMs between all possible pairs of classes.
The winning class is the one with the largest number of accumulated votes.
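A sketch of this pairwise-voting scheme under the same toy setup (scikit-learn's SVC performs one-vs-one voting internally, but the explicit loop below makes the vote counting visible); all names and data here are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def train_pairwise(X, y, classes):
    """Train one binary SVM for every pair of classes."""
    models = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y, [a, b])
        models[(a, b)] = LinearSVC(C=1.0).fit(X[mask], y[mask])
    return models

def predict_by_votes(models, X, classes):
    """Each pairwise SVM casts one vote per example; the class with most votes wins."""
    index = {c: k for k, c in enumerate(classes)}
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for clf in models.values():
        for row, predicted in enumerate(clf.predict(X)):
            votes[row, index[predicted]] += 1
    return np.array(classes)[votes.argmax(axis=1)]

# Toy 3-class data, invented for illustration.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
classes = [0, 1, 2]
print(predict_by_votes(train_pairwise(X, y, classes), X, classes))
```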

Page 13: Scaling multi-class Support Vector Machines using inter-class confusion

3. The method presented in this paper

1. Exploit the scalability of NB classifiers w.r.t. the number of classes together with the accuracy of SVMs.
First stage: use a multi-class NB classifier to obtain a confusion matrix.
Second stage: use SVMs with the "one-vs-others" approach.

Page 14: Scaling multi-class Support Vector Machines using inter-class confusion

OUR APPROACH

Confusion matrix: computed with NB on a held-out validation dataset.
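A minimal sketch of this step: fit a fast NB classifier and compute its confusion matrix on held-out data with scikit-learn. The feature counts below are random, so the resulting matrix is not meaningful; the point is only the two-step recipe.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Toy bag-of-words counts and labels, invented for illustration.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 20))
y = rng.integers(0, 4, size=300)

# Hold out part of the training data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fast multi-class NB classifier, then its confusion matrix on the held-out data.
nb = MultinomialNB().fit(X_train, y_train)
C = confusion_matrix(y_val, nb.predict(X_val))  # rows: true class, columns: predicted class
print(C)
```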

Page 15: Scaling multi-class Support Vector Machines using inter-class confusion

OUR APPROACH

Page 16: Scaling multi-class Support Vector Machines using inter-class confusion

Hierarchical Approach

Top level (L1): a classifier (NB or SVM) discriminates amongst the top-level clusters of labels.
Second level (L2): we build multi-class SVMs within each cluster of classes.
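A minimal sketch of such a two-level classifier, assuming the clusters of labels are already given (in the paper they come from the confusion matrix); `cluster_of` and the toy data are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def train_hierarchical(X, y, cluster_of):
    """L1: NB over cluster ids; L2: one multi-class SVM per cluster of classes."""
    cluster_labels = np.array([cluster_of[c] for c in y])
    l1 = MultinomialNB().fit(X, cluster_labels)
    l2 = {}
    for k in set(cluster_of.values()):
        mask = cluster_labels == k
        l2[k] = LinearSVC(C=1.0).fit(X[mask], y[mask])
    return l1, l2

def predict_hierarchical(l1, l2, X):
    """Route each example to its predicted cluster, then classify within that cluster."""
    clusters = l1.predict(X)
    return np.array([l2[k].predict(x.reshape(1, -1))[0]
                     for k, x in zip(clusters, X)])

# Toy data: 4 classes grouped into 2 clusters, invented for illustration.
rng = np.random.default_rng(1)
X = np.vstack([rng.poisson(lam, size=(30, 10)) for lam in (1, 2, 6, 8)]).astype(float)
y = np.repeat([0, 1, 2, 3], 30)
cluster_of = {0: "low", 1: "low", 2: "high", 3: "high"}
l1, l2 = train_hierarchical(X, y, cluster_of)
print(predict_hierarchical(l1, l2, X[:5]))
```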

Page 17: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

We compare four methods:
MCNB (one-vs-others)
MCSVM (one-vs-others)
Hier-NB (L1: NB, L2: NB)
Hier-SVM (L1: NB, L2: SVM)

Page 18: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

Page 19: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

Page 20: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

NB-L2 (89.01%) combined with NB-L1 (93.56%) gives Hier-NB (83.28%), versus MCNB (85.27%).

SVM-L2 with NB-L1 (92.04%) gives Hier-SVM (86.12%), versus MCSVM (89.66%).

The main reason for the low accuracy of the hierarchical approaches is the compounding of errors at the two levels.

This led us to design a new algorithm, GraphSVM.

Page 21: Scaling multi-class Support Vector Machines using inter-class confusion

The GraphSVM algorithm

1. Obtain a confusion matrix with a fast multi-class NB classifier M1.
For each class i, F(i) = {classes mis-classified as class i more than a threshold t% of the time}.
In Figure 1, with i = alt.atheism and t = 3%, F(alt.atheism) = {talk.religion.misc, soc.religion.christian}.

2. Train a multi-class classifier M2(i) to distinguish among the classes {i} ∪ F(i).
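A sketch of the confusion-set step, assuming the rows of the confusion matrix are true classes, the columns are predicted classes, and t is taken as a fraction of a class's examples (the paper's exact normalization may differ); the matrix values below are invented for illustration.

```python
import numpy as np

def confusion_sets(C, class_names, t=0.03):
    """
    For each class i, F(i) = classes whose examples are predicted as i
    more than a fraction t of the time.  C[j, i] counts examples of true
    class j predicted as class i.
    """
    row_totals = C.sum(axis=1, keepdims=True)
    rates = C / np.maximum(row_totals, 1)  # per-class mis-classification rates
    F = {}
    for i, name in enumerate(class_names):
        F[name] = [class_names[j] for j in range(len(class_names))
                   if j != i and rates[j, i] > t]
    return F

# Tiny example confusion matrix over three classes, invented for illustration.
names = ["alt.atheism", "talk.religion.misc", "soc.religion.christian"]
C = np.array([[90,  6,  4],
              [ 8, 85,  7],
              [ 5,  9, 86]])
print(confusion_sets(C, names, t=0.03))
# A second-stage classifier M2(i) would then be trained on the classes {i} | F(i).
```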

Page 22: Scaling multi-class Support Vector Machines using inter-class confusion

Experimental evaluation

1. Datasets
20-newsgroups: 18,828 articles from 20 Usenet groups. We randomly chose 70% of the documents for training and 30% for testing (see the loading sketch below).
Reuters-21578: 135 classes, 8,819 training documents and 1,887 test documents.
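A hedged sketch of loading and splitting the 20-newsgroups data with scikit-learn; scikit-learn's copy is not necessarily the exact 18,828-document version used in the paper, and the 70/30 split here is just a random split for illustration.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Fetch the 20-newsgroups corpus and make a 70%/30% random train/test split.
data = fetch_20newsgroups(subset="all")
X = TfidfVectorizer().fit_transform(data.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, data.target, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)
```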

Page 23: Scaling multi-class Support Vector Machines using inter-class confusion

Overall comparison

Page 24: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with number of classes

Page 25: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with number of classes

Page 26: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with training set size

Page 27: Scaling multi-class Support Vector Machines using inter-class confusion

Effect of the threshold parameter

Page 28: Scaling multi-class Support Vector Machines using inter-class confusion

Conclusion

GraphSVM is accurate and efficient on multi-class problems.

GraphSVM outperforms SVMs w.r.t. training time and memory requirements.

GraphSVM is very simple to understand and requires negligible coding, yet it is useful for very large classification tasks (tens of thousands of classes and millions of instances).

Page 29: Scaling multi-class Support Vector Machines using inter-class confusion

Personal opinion

GraphSVM may perform worse at high positive values of the threshold t.

It is nice that the accuracy of GraphSVM is otherwise not much affected by the threshold t.