Download - 1 Using Bins to Empirically Estimate Term Weights for Text Categorization Carl Sable (Columbia University) Kenneth W. Church (AT&T)

1

Using Bins to Empirically Estimate Term Weights

for Text Categorization

Carl Sable (Columbia University)

Kenneth W. Church (AT&T)

http://www.cs.columbia.edu/~sable/research/photos/cnt15187.jpg












2

Binning Overview

I. Task and Corpus: Multimedia news documents

II. Related Work:
– Naïve Bayes
– Smoothing & Speech Recognition
– Binning in Information Retrieval

III. Our Proposal:
– Use bins for Text Categorization

IV. Results and Evaluation:
– Binning: rarely hurts, sometimes helps

V. Reuters:
– Standard benchmark evaluation

VI. Conclusions: Robust version of Naïve Bayes

3

Outdoor Indoor




4

Clues for Indoor/Outdoor: Text (as opposed to Vision)

Denver Summit of Eight leaders begin their first official meeting in the Denver Public Library, June 21.

Villagers look at the broken tail-end of the Fokker 28 Biman Bangladesh Airlines jet December 23, a day after it crash-landed near the town of Sylhet, in northeastern Bangladesh.

5

Event Categories

Politics, Struggle, Disaster, Crime, Other

6

Manual Categorization Tool

7

Related Work

• Naïve Bayes

• Jelinek, 1998
– Smoothing techniques for Speech Recognition
– Deleted Interpolation (binning)

• Umemura and Church, 2000
– Applied binning to Information Retrieval

$$c = \arg\max_{c_j \in C} P(c_j) \prod_i P(w_i \mid c_j)$$
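As a rough illustration of the Naïve Bayes decision rule on this slide (pick the category that maximizes the prior times the product of per-word probabilities), here is a minimal sketch. The function names, the toy captions, and the add-one smoothing are assumptions made for the example; they are not taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (words, category) pairs. Returns counts needed for classification."""
    cat_counts = Counter(cat for _, cat in docs)          # documents per category
    word_counts = defaultdict(Counter)                    # word counts per category
    for words, cat in docs:
        word_counts[cat].update(words)
    vocab = {w for words, _ in docs for w in words}
    return cat_counts, word_counts, vocab

def classify(words, cat_counts, word_counts, vocab):
    """Return argmax over categories of log P(c) + sum_i log P(w_i | c)."""
    total_docs = sum(cat_counts.values())
    best_cat, best_score = None, -math.inf
    for cat, n_docs in cat_counts.items():
        score = math.log(n_docs / total_docs)             # log prior
        total_words = sum(word_counts[cat].values())
        for w in words:
            if w not in vocab:
                continue                                  # ignore unseen words
            # Add-one smoothing: an assumption of this sketch, not the slides' method.
            p = (word_counts[cat][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

# Hypothetical toy training data.
toy_docs = [
    (["conference", "meeting"], "indoor"),
    (["airplane", "crash"], "outdoor"),
    (["earthquake", "village"], "outdoor"),
]
model = train_naive_bayes(toy_docs)
print(classify(["airplane"], *model))
print(classify(["conference"], *model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.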

8

Bin System: Naïve Bayes + Smoothing

• Binning: based on smoothing in speech recognition

• Not enough training data to estimate weights (log likelihood ratios) for each word
– But there would be enough training data if we group words with similar “features” into a common “bin”

• Estimate a single weight for each bin
– This weight is assigned to all words in the bin

• Credible estimates even for small counts (zeros)

9

Intuition        Word        Indoor Freq  Outdoor Freq  IDF  Burstiness
Clearly Indoor   conference  15           0             2.5  0
Clearly Indoor   bed         1            0             4.5  0
Clearly Outdoor  airplane    0            2             5.4  1
Clearly Outdoor  earthquake  0            4             4.6  1
Unclear          Gore        1            1             4.5  1
Unclear          ceremony    5            6             3.9  0

10

“airplane”

• Sparse data
• First half of training set:
– “airplane” appears in
  • 2 outdoor documents
  • 0 indoor documents
• Infinitely more likely to be outdoor???
• Assign “airplane” to bins of words with similar features (e.g., IDF, burstiness, counts)
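One way to picture the bin assignment is to group words by a key built from their features. The sketch below is only an illustration: the `bin_key` function, the choice to round IDF down to an integer, and the word "helicopter" with its feature values are all hypothetical. The slides do not say how the features are quantized.

```python
import math
from collections import defaultdict

def bin_key(indoor_df, outdoor_df, idf, burstiness):
    """Hypothetical bin key: exact counts, integer part of IDF, burstiness flag."""
    return (indoor_df, outdoor_df, math.floor(idf), burstiness)

def assign_bins(features):
    """Group words whose feature tuples map to the same key."""
    bins = defaultdict(list)
    for word, feats in features.items():
        bins[bin_key(*feats)].append(word)
    return dict(bins)

# (indoor freq, outdoor freq, IDF, burstiness)
features = {
    "airplane": (0, 2, 5.4, 1),     # values from the slides
    "earthquake": (0, 4, 4.6, 1),   # values from the slides
    "helicopter": (0, 2, 5.1, 1),   # made-up word and values, for illustration only
}
bins = assign_bins(features)
for key, words in bins.items():
    print(key, words)
```

Under these assumptions "airplane" shares a bin with the made-up "helicopter", so its weight would be estimated from the pooled evidence of both words rather than from its own two occurrences.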

11

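Lambdas: Weights

• First half of training set: Assign words to bins
• Second half of training set: Calibrate
– Average weights over words in bin

$$P(\text{obs} \mid \text{bin}) = \frac{1}{|\text{bin}|} \sum_{\text{word} \in \text{bin}} \frac{DF(\text{word})}{|\text{docs}|}$$

$$\lambda_{\text{bin}} = \log_2 P(\text{obs} \mid \text{bin})$$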
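The calibration step (average the per-word document frequency, as a fraction of all documents, over the words in a bin) can be sketched as follows. The function names and all counts below are hypothetical. The sketch also assumes that the calibration half is split by category, so that an indoor estimate and an outdoor estimate can be compared as a log ratio.

```python
import math

def p_obs_given_bin(bin_words, doc_freq, num_docs):
    """Average over the words in the bin of DF(word) / |docs|."""
    if not bin_words or num_docs <= 0:
        raise ValueError("need a non-empty bin and a positive document count")
    return sum(doc_freq.get(w, 0) / num_docs for w in bin_words) / len(bin_words)

def bin_log_ratio(bin_words, df_indoor, n_indoor, df_outdoor, n_outdoor):
    """log2 of the indoor estimate over the outdoor estimate for one bin.

    Raises an error if either estimate is zero; pooling many words into a
    bin is what makes that unlikely in practice.
    """
    p_in = p_obs_given_bin(bin_words, df_indoor, n_indoor)
    p_out = p_obs_given_bin(bin_words, df_outdoor, n_outdoor)
    return math.log2(p_in / p_out)

# Hypothetical bin and hypothetical calibration counts.
bin_words = ["airplane", "helicopter", "glacier", "volcano"]
df_indoor = {"helicopter": 1}
df_outdoor = {"airplane": 3, "helicopter": 2, "glacier": 1, "volcano": 2}

weight = bin_log_ratio(bin_words, df_indoor, 1000, df_outdoor, 1000)
print(weight)
```

Every word in the bin would then receive the same weight, including words that never occurred in one of the categories.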

12

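Lambdas for “airplane”: 14 times more likely to be outdoor than indoor

$$P(\text{obs} \mid \text{indoor bin}) = 2.11 \times 10^{-4}$$

$$P(\text{obs} \mid \text{outdoor bin}) = 2.90 \times 10^{-3}$$

$$\log_2 \frac{P(\text{obs} \mid \text{indoor bin})}{P(\text{obs} \mid \text{outdoor bin})} = -3.78$$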
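The arithmetic behind the "14 times" claim can be checked directly from the two bin estimates on this slide, 2.11 × 10^-4 for the indoor bin and 2.90 × 10^-3 for the outdoor bin. The variable names are chosen for the example.

```python
import math

p_indoor = 2.11e-4    # P(obs | indoor bin), from the slide
p_outdoor = 2.90e-3   # P(obs | outdoor bin), from the slide

lam = math.log2(p_indoor / p_outdoor)   # log likelihood ratio (negative favors outdoor)
odds_outdoor = p_outdoor / p_indoor     # how many times more likely outdoor is

print(round(lam, 2))           # -3.78
print(round(odds_outdoor, 1))  # 13.7, i.e. roughly 14
```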

13

Binning Credible Log Likelihood Ratios

Intuition        Word        Lambda  Indoor Freq  Outdoor Freq  IDF  Burstiness
Clearly Indoor   conference   5.9    15           0             2.5  0
Clearly Indoor   bed          4.6    1            0             4.5  0
Clearly Outdoor  airplane    -3.8    0            2             5.4  1
Clearly Outdoor  earthquake  -4.9    0            4             4.6  1
Unclear          Gore         0.7    1            1             4.5  1
Unclear          ceremony    -0.3    5            6             3.9  0
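To see how such weights might be used, the sketch below scores a caption by summing the lambdas of its words, using the six values from the table. Summing the weights and thresholding at zero is an assumption of this example (it mirrors a Naïve Bayes log-odds score with the prior ignored), as is giving words outside the table a weight of zero. The function names are made up.

```python
# Lambda values copied from the table; positive favors indoor, negative outdoor.
LAMBDAS = {
    "conference": 5.9,
    "bed": 4.6,
    "airplane": -3.8,
    "earthquake": -4.9,
    "Gore": 0.7,
    "ceremony": -0.3,
}

def caption_score(words, lambdas=LAMBDAS):
    """Sum of per-word weights; words without a weight contribute 0 (assumption)."""
    return sum(lambdas.get(w, 0.0) for w in words)

def classify_caption(words, lambdas=LAMBDAS):
    """Label a caption indoor if its summed weight is positive, else outdoor."""
    return "indoor" if caption_score(words, lambdas) > 0 else "outdoor"

print(classify_caption(["Gore", "conference", "ceremony"]))
print(classify_caption(["airplane", "ceremony"]))
```

Note how the near-zero weights for "Gore" and "ceremony" barely move the score, while a single strongly weighted word such as "conference" or "airplane" decides the outcome.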

14

Evaluation

• Mutually exclusive categories

• Performance measured by overall accuracy:

$$\text{Accuracy} = \frac{\#\ \text{correct predictions}}{\#\ \text{total predictions}}$$
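For mutually exclusive categories, overall accuracy is simply the fraction of predictions that match the correct label. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(predicted, gold):
    """Number of correct predictions divided by the total number of predictions."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Made-up predictions and correct labels for four documents.
predicted = ["indoor", "outdoor", "indoor", "indoor"]
gold = ["indoor", "outdoor", "outdoor", "indoor"]
print(accuracy(predicted, gold))  # 0.75
```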

15

Bins: Robust Version of Naïve Bayes
Performance is often similar, but can be much better

[Figure: accuracy (%) of Bins vs. Naïve Bayes. Panels: Indoor/Outdoor; Events: Politics, Struggle, Disaster, Crime, Other.]

16

Bins: Robust Version of Naïve Bayes
Performs well against other alternatives

[Figure: accuracy (%) of Bins, Naïve Bayes, Rocchio 1, KNN, PrInd, SVM, MaxEnt, Rocchio 2, and Density. Panels: Indoor/Outdoor; Events: Politics, Struggle, Disaster, Crime, Other.]

17

Reuters
http://www.research.att.com/~lewis/reuters21578.html

• Common corpus for comparing methods
– Over 10,000 articles, 90 topic categories

• Modified method to output multiple categories for each document
– One category per document: Indoor/outdoor & politics/struggle/disaster/crime/other
– Multiple (0 or more) categories per document: Reuters

Doc #5: grain, wheat, corn, barley, oat, sorghum
Doc #9: earn
Doc #448: gold, acq, platinum

18

Evaluation for Reuters: Accuracy → Precision/Recall (F)

• Accuracy is misleading when documents are assigned multiple categories

• Use precision & recall instead

• F-measure: combines precision & recall

• Macro-averaging vs. micro-averaging
– Macro: average over categories
– Micro: average over documents

• Macro usually lower
– Since small categories are hard

Contingency Table:

                 “yes” is correct  “no” is correct
Assigned “yes”   a                 b
Assigned “no”    c                 d

$$p = \frac{a}{a + b}, \qquad r = \frac{a}{a + c}, \qquad F_1 = \frac{2pr}{p + r}$$
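The contingency-table definitions translate directly into code. The sketch below computes precision, recall, and F1 from the cells a, b, and c, then contrasts macro-averaging (mean of per-category F1) with micro-averaging. Micro-averaging is implemented here in the usual way, by pooling the cells across categories before computing F1. The function names and the per-category counts are made up for illustration.

```python
def prf(a, b, c):
    """Precision, recall, F1 from contingency cells: a = correct "yes",
    b = assigned "yes" but "no" is correct, c = assigned "no" but "yes" is correct."""
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_f1(tables):
    """Average the per-category F1 scores; every category counts equally."""
    return sum(prf(*cells)[2] for cells in tables.values()) / len(tables)

def micro_f1(tables):
    """Pool a, b, c over all categories, then compute a single F1."""
    a = sum(cells[0] for cells in tables.values())
    b = sum(cells[1] for cells in tables.values())
    c = sum(cells[2] for cells in tables.values())
    return prf(a, b, c)[2]

# Hypothetical (a, b, c) counts: one large, easy category and one small, hard one.
tables = {"earn": (90, 10, 10), "gold": (1, 1, 3)}
print(macro_f1(tables))
print(micro_f1(tables))
```

With these made-up counts the macro average comes out well below the micro average, because the small category's low F1 carries the same weight as the large category's high F1.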

19

Bins: Robust Version of Naïve Bayes
Performance is often similar, but can be much better

[Figure: Reuters results for NB vs. Bin. Panels: Micro-F1 (axis 79% to 87%); Macro-F1 (axis 35% to 55%).]

20

Bins: Robust Version of Naïve Bayes

[Figure: Reuters results for SVM, KNN, LSF, NNet, NB, and Bin. Panels: Micro-F1 (axis 79% to 87%); Macro-F1 (axis 35% to 55%).]

21

Conclusions

• Binning: Robust version of Naïve Bayes
– Often helps, rarely hurts
– Smoothing: borrowed from Speech Recognition
– Reliable log-likelihood ratios even for small counts:
  • airplane: 2 outdoor docs, 0 indoor docs
    – 14 times more likely to be outdoor than indoor

• Three Evaluations
– Indoor vs. Outdoor (mutually exclusive categories)
– Events (mutually exclusive categories)
– Reuters (many-to-many)
