Comments from Pre-submission Presentation


Page 1: Comments from Pre-submission Presentation

Comments from Pre-submission Presentation

Q: Check why kNN is so much lower (about 10%) than SVM on the Reuters and 20 Newsgroups corpora.

A: Refer to the following four references: [Joachims 98], [Debole 03 STM], [Dumais 98 Inductive], [Yang 99 Re-examination].

Page 2: Comments from Pre-submission Presentation

[Joachims 98] [Debole 03] [Dumais 98] Results on the Reuters Corpus

                 Bayes   Rocchio  C4.5   kNN   SVM (linear)  SVM (poly)  SVM (rbf)
Micro-BEP (%)    69.84   79.14    77.78  82.5  84.2          86          86

                 kNN    SVM (linear)
Micro-F1         85.4   92.0

                 NBayes  DT    SVM (linear)
Micro-BEP        81.5    88.4  92.0

Page 3: Comments from Pre-submission Presentation

[Yang 99 Re-examination] Significance Test

Micro-level analysis (s-test)

SVM > kNN >> {LLSF, NNet} >> NB

Macro-level analysis

{SVM, kNN, LLSF} >> {NB, NNet}

Error-rate based comparison

{SVM, kNN} > LLSF > NNet >> NB
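
For context, the s-test here is usually described as a paired sign test over the per-decision correctness of two classifiers. A minimal sketch under that assumption (the function name, inputs, and the exact tail computation are illustrative, not from the slides):

    from math import comb

    def sign_test(decisions_a, decisions_b, gold):
        """Two-sided sign test on parallel lists of binary decisions."""
        n_a = sum(a == g != b for a, b, g in zip(decisions_a, decisions_b, gold))  # only A correct
        n_b = sum(b == g != a for a, b, g in zip(decisions_a, decisions_b, gold))  # only B correct
        n = n_a + n_b
        if n == 0:
            return 1.0                         # the two classifiers never disagree
        k = min(n_a, n_b)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * tail)              # two-sided binomial p-value with p = 0.5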

Page 4: Comments from Pre-submission Presentation

Comments from Pre-submission Presentation

2. Explain why BEP and F1 are used in Chapter 7.

- Add a reference.

Page 5: Comments from Pre-submission Presentation

Breakeven point (1)

BEP was first proposed by Lewis [1992]. He himself later pointed out that BEP is not a good effectiveness measure, because:

1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial;

2. having P = R is not necessarily desirable, and it is not clear that a system achieving a high BEP can be tuned to score high on other effectiveness measures.
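
For reference (standard definitions, not taken from the slides), precision, recall, F1 and the interpolated breakeven point can be written as:

    P = \frac{TP}{TP + FP}, \qquad
    R = \frac{TP}{TP + FN}, \qquad
    F_1 = \frac{2PR}{P + R}

    \mathrm{BEP} \approx \frac{P^{\ast} + R^{\ast}}{2},
    \quad \text{where } (P^{\ast}, R^{\ast}) \text{ is the closest precision/recall pair obtained while varying the decision threshold.}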

Page 6: Comments from Pre-submission Presentation

Breakeven point (2)

Yang [1999 Re-examination] also noted that, when for no parameter value P and R come close enough to each other, the interpolated breakeven may not be a reliable indicator of effectiveness.

Page 7: Comments from Pre-submission Presentation

Comments from Pre-submission Presentation

3. Adding more qualitative analysis would be better.

Page 8: Comments from Pre-submission Presentation

Analysis and Proposal: Empirical observation

Feature      Category 00_acq              Category 03_earn
             idf     rf      chi2         idf     rf      chi2
acquir       3.553   4.368   850.66       3.553   1.074    81.50
stake        4.201   2.975   303.94       4.201   1.082    31.26
payout       4.999   1       10.87        4.999   7.820    44.68
dividend     3.567   1.033   46.63        3.567   4.408   295.46

Comparison of the idf, rf and chi2 values of four features in two categories of the Reuters corpus
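
A minimal sketch of how such per-category statistics can be computed from document counts. It assumes the relevance-frequency form rf = log2(2 + a / max(1, c)) used in the supervised term-weighting literature and the standard one-degree-of-freedom chi-square statistic; the function name, the log base of idf, and the variable names are illustrative, not taken from the slides:

    import math

    def term_category_stats(a, b, c, d):
        """a: positive-category docs containing the term, b: positive docs without it,
        c: negative docs containing the term, d: negative docs without it."""
        n = a + b + c + d
        df = a + c                                 # document frequency over the collection
        idf = math.log(n / df)                     # inverse document frequency
        rf = math.log2(2 + a / max(1, c))          # relevance frequency (assumed form)
        # one-degree-of-freedom chi-square between term occurrence and category
        chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))
        return idf, rf, chi2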

Page 9: Comments from Pre-submission Presentation

Comments from Pre-submission Presentation

4. In Chapter 7, remove Joachims' results; referring to them via a quotation is fine.

Page 10: Comments from Pre-submission Presentation

Comments from Pre-submission Presentation

5. Tone down “best” claims:

use “to our knowledge (experience, understanding)”.

Pay attention to this usage when giving presentations.

Page 11: Comments from Pre-submission Presentation

Introduction: Other Text Representations

• Word senses (meanings) [Kehagias 2001]

the same word assumes different meanings in different contexts

• Term clustering [Lewis 1992]

group words with a high degree of pairwise semantic relatedness

• Semantic and syntactic representation [Scott & Matwin 1999]

relationships between words, e.g. phrases, synonyms and hypernyms

Page 12: Comments from Pre-submission Presentation

Introduction: Other Text Representations

• Latent Semantic Indexing [Deerwester 1990]: a feature reconstruction technique

• Combination approach [Peng 2003]: combines two types of indexing terms, i.e. words and 3-grams

In general, these higher-level representations did not show good performance in most cases

Page 13: Comments from Pre-submission Presentation

Literature Review: Knowledge-based Representation

• Theme Topic Mixture Model (a graphical model) [Keller 2004]

• Using keywords from summarization [Li 2003]

Page 14: Comments from Pre-submission Presentation

Literature Review: 2. How to weight a term (feature)

[Salton 1988] elaborated three considerations:

1. term occurrences closely represent the content of a document

2. other factors with discriminating power help separate relevant documents from irrelevant ones

3. the effect of document length should be taken into account

Page 15: Comments from Pre-submission Presentation

Literature Review: 2. How to weight a term (feature)

1. Term Frequency Factor

Binary representation (1 for present and 0 for absent)

Term frequency (tf): number of times a term occurs in a document

log(tf): a log operation to scale down the effect of unfavorably high term frequencies

Inverse term frequency (ITF)
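
A small sketch of these term-frequency variants for a single term in a single document; the log form shown is the common 1 + log(tf) convention, which the slides do not fix explicitly, and ITF is left out because its exact definition is not given here:

    import math

    def tf_variants(tf):
        """tf: raw count of the term in the document."""
        binary = 1 if tf > 0 else 0                   # present / absent
        log_tf = 1 + math.log(tf) if tf > 0 else 0    # dampens very high term frequencies
        return binary, tf, log_tf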

Page 16: Comments from Pre-submission Presentation

Literature Review: 2. How to weight a term (feature)

2. Collection Frequency Factor

idf: the most commonly used factor

Probabilistic idf: also known as the term relevance weight

Feature selection metrics: chi^2, information gain, gain ratio, odds ratio, etc.

Page 17: Comments from Pre-submission Presentation

Literature Review: 2. How to weight a term (feature)

3. Normalization Factor

Combine the above two factors using multiplication.

To eliminate the length effect, we use cosine normalization to limit the term weights to the range (0, 1)
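
A minimal end-to-end sketch of the weighting scheme outlined on the last three slides: a term-frequency factor multiplied by a collection-frequency factor (plain idf here) and then cosine-normalized per document. The exact variants used in the thesis may differ; this is only an illustration.

    import math
    from collections import Counter

    def tfidf_cosine(docs):
        """docs: list of token lists; returns one {term: weight} dict per document."""
        n = len(docs)
        df = Counter(t for doc in docs for t in set(doc))        # document frequency
        idf = {t: math.log(n / df[t]) for t in df}               # collection-frequency factor

        weighted = []
        for doc in docs:
            tf = Counter(doc)
            w = {t: (1 + math.log(tf[t])) * idf[t] for t in tf}  # tf factor x idf factor
            norm = math.sqrt(sum(v * v for v in w.values()))     # cosine normalization
            weighted.append({t: (v / norm if norm else 0.0) for t, v in w.items()})
        return weighted

    # e.g. tfidf_cosine([["stake", "acquire", "acquire"], ["dividend", "payout"]])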