Scaling multi-class Support Vector Machines using inter-class confusion



TRANSCRIPT

Page 1: Scaling multi-class Support Vector Machines using inter-class confusion

Scaling multi-class Support Vector Machines using inter-class confusion

Authors: Shantanu Godbole, Sunita Sarawagi, Soumen Chakrabarti
Advisor: Dr. Hsu
Graduate student: Ching-wen Hong

Page 2: Scaling multi-class Support Vector Machines using inter-class confusion

Content

1. Motivation
2. Objective
3. Introduction: (1) SVM; (2) using SVM to solve multi-class problems; (3) the method presented in this paper
4. Our approach: (1) the hierarchical approach; (2) the GraphSVM algorithm
5. Experimental evaluation
6. Conclusion
7. Personal opinion

Page 3: Scaling multi-class Support Vector Machines using inter-class confusion

Motivation

Solve multi-class problems.

Page 4: Scaling multi-class Support Vector Machines using inter-class confusion

Objective

SVMs excel at two-class discriminative learning problems, and their accuracy is high.

SVMs are hard to apply to multi-class problems because the training time is long.

The naïve Bayes (NB) classifier is much faster than SVM in training time.

We propose a new technique for multi-way classification which exploits the accuracy of SVMs and the speed of NB classifiers.

Page 5: Scaling multi-class Support Vector Machines using inter-class confusion

Introduction

1. SVM:
Input: a training set S = {(x_1, y_1), …, (x_N, y_N)}, where each x_i is a vector and y_i ∈ {1, -1}.
Output: a classifier f(x) = w · x + b.
For example, in medical diagnosis: x_i = (age, sex, blood, …, genome, …), and y_i indicates the risk of cancer.
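As a concrete illustration of this input/output setting (not code from the paper), the following minimal Python sketch trains a linear SVM on a toy two-class dataset with scikit-learn; the feature values and labels are invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy training set S = {(x_i, y_i)}: each x_i is a feature vector, y_i in {1, -1}.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5],   # three examples of class +1
              [6.0, 1.0], [7.0, 0.5], [8.0, 1.5]])  # three examples of class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Fit a linear classifier f(x) = w . x + b.
clf = LinearSVC(C=1.0)
clf.fit(X, y)

print("w =", clf.coef_, " b =", clf.intercept_)
print("f(x) for a new point:", clf.decision_function([[2.5, 3.0]]))
```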

Page 6: Scaling multi-class Support Vector Machines using inter-class confusion

1. Linear SVM

Page 7: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 8: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 9: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 10: Scaling multi-class Support Vector Machines using inter-class confusion

Linear SVM

Page 11: Scaling multi-class Support Vector Machines using inter-class confusion

2. Using SVM to solve multi-class problems

1. The "one-vs-others" approach
For each of the N classes, we construct a one-vs-others (yes/no) SVM for that class alone.
The winning SVM is the one which says yes and whose margin is largest among all SVMs.
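A minimal sketch of this one-vs-others scheme (not the paper's code), using scikit-learn's LinearSVC for the N binary classifiers; taking the largest decision value approximates the "says yes with the largest margin" rule, and the toy data is invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_others(X, y, classes):
    """Train one binary (this class vs. all others) linear SVM per class."""
    models = {}
    for c in classes:
        y_binary = np.where(y == c, 1, -1)
        models[c] = LinearSVC(C=1.0).fit(X, y_binary)
    return models

def predict_one_vs_others(models, X):
    """Pick, for each example, the class whose SVM gives the largest decision value."""
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Toy 3-class data, invented for illustration.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
models = train_one_vs_others(X, y, classes=[0, 1, 2])
print(predict_one_vs_others(models, X))
```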

Page 12: Scaling multi-class Support Vector Machines using inter-class confusion

Using SVM to solve multi-class problems

2. The accumulated-votes approach
Construct SVMs between all possible pairs of classes.
The winning class is the one with the largest number of accumulated votes.
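A sketch of this pairwise-voting scheme under the same toy setup (scikit-learn's SVC performs one-vs-one voting internally, but the explicit loop below makes the vote counting visible); all names and data here are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def train_pairwise(X, y, classes):
    """Train one binary SVM for every pair of classes."""
    models = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y, [a, b])
        models[(a, b)] = LinearSVC(C=1.0).fit(X[mask], y[mask])
    return models

def predict_by_votes(models, X, classes):
    """Each pairwise SVM casts one vote per example; the class with most votes wins."""
    index = {c: k for k, c in enumerate(classes)}
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for clf in models.values():
        for row, predicted in enumerate(clf.predict(X)):
            votes[row, index[predicted]] += 1
    return np.array(classes)[votes.argmax(axis=1)]

# Toy 3-class data, invented for illustration.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
classes = [0, 1, 2]
print(predict_by_votes(train_pairwise(X, y, classes), X, classes))
```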

Page 13: Scaling multi-class Support Vector Machines using inter-class confusion

3. The method presented in this paper

1. Exploit the scalability of NB classifiers w.r.t. the number of classes together with the accuracy of SVMs.
First stage: use a multi-class NB classifier to obtain a confusion matrix.
Second stage: use SVMs with the "one-vs-others" approach.

Page 14: Scaling multi-class Support Vector Machines using inter-class confusion

OUR APPROACH

Confusion matrix: computed with NB on a held-out validation dataset.
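A minimal sketch of this step: fit a fast NB classifier and compute its confusion matrix on held-out data with scikit-learn. The feature counts below are random, so the resulting matrix is not meaningful; the point is only the two-step recipe.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Toy bag-of-words counts and labels, invented for illustration.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 20))
y = rng.integers(0, 4, size=300)

# Hold out part of the training data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fast multi-class NB classifier, then its confusion matrix on the held-out data.
nb = MultinomialNB().fit(X_train, y_train)
C = confusion_matrix(y_val, nb.predict(X_val))  # rows: true class, columns: predicted class
print(C)
```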

Page 15: Scaling multi-class Support Vector Machines using inter-class confusion

OUR APPROACH

Page 16: Scaling multi-class Support Vector Machines using inter-class confusion

Hierarchical Approach

Top level (L1): a classifier (NB or SVM) discriminates amongst the top-level clusters of labels.
Second level (L2): we build multi-class SVMs within each cluster of classes.
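A minimal sketch of such a two-level classifier, assuming the clusters of labels are already given (in the paper they come from the confusion matrix); `cluster_of` and the toy data are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def train_hierarchical(X, y, cluster_of):
    """L1: NB over cluster ids; L2: one multi-class SVM per cluster of classes."""
    cluster_labels = np.array([cluster_of[c] for c in y])
    l1 = MultinomialNB().fit(X, cluster_labels)
    l2 = {}
    for k in set(cluster_of.values()):
        mask = cluster_labels == k
        l2[k] = LinearSVC(C=1.0).fit(X[mask], y[mask])
    return l1, l2

def predict_hierarchical(l1, l2, X):
    """Route each example to its predicted cluster, then classify within that cluster."""
    clusters = l1.predict(X)
    return np.array([l2[k].predict(x.reshape(1, -1))[0]
                     for k, x in zip(clusters, X)])

# Toy data: 4 classes grouped into 2 clusters, invented for illustration.
rng = np.random.default_rng(1)
X = np.vstack([rng.poisson(lam, size=(30, 10)) for lam in (1, 2, 6, 8)]).astype(float)
y = np.repeat([0, 1, 2, 3], 30)
cluster_of = {0: "low", 1: "low", 2: "high", 3: "high"}
l1, l2 = train_hierarchical(X, y, cluster_of)
print(predict_hierarchical(l1, l2, X[:5]))
```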

Page 17: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

We compare four methods:
MCNB (one-vs-others)
MCSVM (one-vs-others)
Hier-NB (L1: NB, L2: NB)
Hier-SVM (L1: NB, L2: SVM)

Page 18: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

Page 19: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

Page 20: Scaling multi-class Support Vector Machines using inter-class confusion

Evaluation of the hierarchical approach

NB-L2 (89.01%) combined with NB-L1 (93.56%) gives Hier-NB (83.28%), versus MCNB (85.27%).

SVM-L2 with NB-L1 (92.04%) gives Hier-SVM (86.12%), versus MCSVM (89.66%).

The main reason for the low accuracy of the hierarchical approaches is the compounding of errors at the two levels.

This led us to design a new algorithm, GraphSVM.

Page 21: Scaling multi-class Support Vector Machines using inter-class confusion

The GraphSVM algorithm

1. Obtain a confusion matrix with a fast multi-class NB classifier M1.
For each class i, F(i) = {classes mis-classified as class i more than a threshold t% of the time}.
In Figure 1, with i = alt.atheism and t = 3%, F(alt.atheism) = {talk.religion.misc, soc.religion.christian}.

2. Train a multi-class classifier M2(i) to distinguish among the classes {i} ∪ F(i).
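A sketch of the confusion-set step, assuming the rows of the confusion matrix are true classes, the columns are predicted classes, and t is taken as a fraction of a class's examples (the paper's exact normalization may differ); the matrix values below are invented for illustration.

```python
import numpy as np

def confusion_sets(C, class_names, t=0.03):
    """
    For each class i, F(i) = classes whose examples are predicted as i
    more than a fraction t of the time.  C[j, i] counts examples of true
    class j predicted as class i.
    """
    row_totals = C.sum(axis=1, keepdims=True)
    rates = C / np.maximum(row_totals, 1)  # per-class mis-classification rates
    F = {}
    for i, name in enumerate(class_names):
        F[name] = [class_names[j] for j in range(len(class_names))
                   if j != i and rates[j, i] > t]
    return F

# Tiny example confusion matrix over three classes, invented for illustration.
names = ["alt.atheism", "talk.religion.misc", "soc.religion.christian"]
C = np.array([[90,  6,  4],
              [ 8, 85,  7],
              [ 5,  9, 86]])
print(confusion_sets(C, names, t=0.03))
# A second-stage classifier M2(i) would then be trained on the classes {i} | F(i).
```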

Page 22: Scaling multi-class Support Vector Machines using inter-class confusion

Experimental evaluation

1. Datasets
20-newsgroups: 18,828 articles from 20 Usenet groups. We randomly chose 70% of the documents for training and 30% for testing (see the loading sketch below).
Reuters-21578: 135 classes, 8,819 training documents and 1,887 test documents.
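A hedged sketch of loading and splitting the 20-newsgroups data with scikit-learn; scikit-learn's copy is not necessarily the exact 18,828-document version used in the paper, and the 70/30 split here is just a random split for illustration.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Fetch the 20-newsgroups corpus and make a 70%/30% random train/test split.
data = fetch_20newsgroups(subset="all")
X = TfidfVectorizer().fit_transform(data.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, data.target, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)
```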

Page 23: Scaling multi-class Support Vector Machines using inter-class confusion

Overall comparison

Page 24: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with number of classes

Page 25: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with number of classes

Page 26: Scaling multi-class Support Vector Machines using inter-class confusion

Scalability with training set size

Page 27: Scaling multi-class Support Vector Machines using inter-class confusion

Effect of the threshold parameter

Page 28: Scaling multi-class Support Vector Machines using inter-class confusion

Conclusion

GraphSVM is accurate and efficient on multi-class problems.

GraphSVM outperforms SVMs w.r.t. training time and memory requirements.

GraphSVM is very simple to understand and requires negligible coding, yet it is useful for very large classification tasks (tens of thousands of classes and millions of instances).

Page 29: Scaling multi-class Support Vector Machines using inter-class confusion

Personal opinion

GraphSVM may perform worse at high positive values of the threshold t.

It is nice that the accuracy of GraphSVM is otherwise not much affected by the threshold t.