Page 1: Using Bins to Empirically Estimate Term Weights for Text Categorization

Carl Sable (Columbia University), Kenneth W. Church (AT&T)


Page 2: Binning Overview

I. Task and Corpus: Multimedia news documents

II. Related Work:
   – Naïve Bayes
   – Smoothing & Speech Recognition
   – Binning in Information Retrieval

III. Our Proposal:
   – Use bins for Text Categorization

IV. Results and Evaluation:
   – Binning: rarely hurts, sometimes helps

V. Reuters:
   – Standard benchmark evaluation

VI. Conclusions: Robust version of Naïve Bayes

Page 4: Clues for Indoor/Outdoor: Text (as opposed to Vision)

Denver Summit of Eight leaders begin their first official meeting in the Denver Public Library, June 21.

Villagers look at the broken tail-end of the Fokker 28 Biman Bangladesh Airlines jet December 23, a day after it crash-landed near the town of Sylhet, in northeastern Bangladesh.

Page 5: Event Categories

Politics, Struggle, Disaster, Crime, Other

Page 6: Manual Categorization Tool

Page 7: Related Work

• Naïve Bayes

• Jelinek, 1998
   – Smoothing techniques for Speech Recognition
   – Deleted Interpolation (binning)

• Umemura and Church, 2000
   – Applied binning to Information Retrieval

$$c^{*} = \arg\max_{c_j \in C} P(c_j) \prod_i P(w_i \mid c_j)$$
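A minimal Python sketch of the Naïve Bayes decision rule above. The toy counts and the add-one smoothing are assumptions for illustration only; the binning approach in the following slides replaces such per-word smoothing.

```python
import math
from collections import Counter

# Toy per-class training counts (hypothetical, not the paper's data).
class_docs = {"indoor": 50, "outdoor": 50}
word_counts = {
    "indoor":  Counter({"conference": 15, "bed": 1, "ceremony": 5}),
    "outdoor": Counter({"airplane": 2, "earthquake": 4, "ceremony": 6}),
}
vocab = set().union(*word_counts.values())

def classify(doc_words):
    """Return argmax_c P(c) * prod_i P(w_i | c), computed in log space."""
    total_docs = sum(class_docs.values())
    best_class, best_score = None, float("-inf")
    for c, counts in word_counts.items():
        score = math.log(class_docs[c] / total_docs)      # log P(c)
        denom = sum(counts.values()) + len(vocab)          # add-one smoothing
        for w in doc_words:
            score += math.log((counts[w] + 1) / denom)     # log P(w | c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(classify(["airplane", "ceremony"]))   # -> "outdoor"
```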

Page 8: Bin System: Naïve Bayes + Smoothing

• Binning: based on smoothing in speech recognition

• Not enough training data to estimate weights (log likelihood ratios) for each word
   – But there would be enough training data if we group words with similar “features” into a common “bin”

• Estimate a single weight for each bin
   – This weight is assigned to all words in the bin

• Credible estimates even for small counts (zeros)

Page 9: Intuition

Intuition        Word        Indoor Freq  Outdoor Freq  IDF  Burstiness
Clearly Indoor   conference  15           0             2.5  0
Clearly Indoor   bed         1            0             4.5  0
Clearly Outdoor  airplane    0            2             5.4  1
Clearly Outdoor  earthquake  0            4             4.6  1
Unclear          Gore        1            1             4.5  1
Unclear          ceremony    5            6             3.9  0

Page 10: “airplane”

• Sparse data

• First half of training set:
   – “airplane” appears in
      • 2 outdoor documents
      • 0 indoor documents

• Infinitely more likely to be outdoor???

• Assign “airplane” to bins of words with similar features (e.g., IDF, burstiness, counts)
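A minimal sketch of grouping words into bins by quantized features such as IDF, burstiness, and counts. The bucket boundaries and the bin_key function are hypothetical, not the paper's exact binning scheme; the feature values come from the table on Page 9.

```python
from collections import defaultdict

def bin_key(idf, burstiness, indoor_freq, outdoor_freq):
    """Map a word's features to a coarse bin identifier.

    The cut-offs below are illustrative only; the paper bins on similar
    features (IDF, burstiness, counts) but its exact scheme differs.
    """
    idf_bucket = round(idf)                                     # quantize IDF
    count_bucket = (min(indoor_freq, 2), min(outdoor_freq, 2))  # cap small counts
    return (idf_bucket, burstiness, count_bucket)

# Feature values taken from the Page 9 table: (IDF, burstiness, indoor, outdoor).
features = {
    "conference": (2.5, 0, 15, 0),
    "bed":        (4.5, 0, 1, 0),
    "airplane":   (5.4, 1, 0, 2),
    "earthquake": (4.6, 1, 0, 4),
}

# Words with similar features land in a common bin (airplane and earthquake here).
bins = defaultdict(list)
for word, feats in features.items():
    bins[bin_key(*feats)].append(word)
```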

Page 11: Lambdas: Weights

• First half of training set: Assign words to bins

• Second half of training set: Calibrate
   – Average weights over words in bin

$$P(\text{obs} \mid \text{bin}) = \frac{1}{|\text{bin}|} \sum_{\text{word} \in \text{bin}} \frac{DF(\text{word})}{|\text{docs}|}$$

$$\lambda_{\text{bin}} = \log_2 P(\text{obs} \mid \text{bin})$$
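A minimal sketch of calibrating one weight per bin on the second half of the training set, following the formula above. The document frequencies and bin contents are hypothetical; the final per-word weight as a difference of per-category lambdas follows the worked example on the next slide.

```python
import math

def lambda_for_bin(words_in_bin, doc_freq, num_docs):
    """P(obs | bin) = average of DF(word)/|docs| over words in the bin;
    the bin weight is log2 of that probability."""
    p_obs = sum(doc_freq.get(w, 0) / num_docs for w in words_in_bin) / len(words_in_bin)
    return math.log2(p_obs)

# Hypothetical calibration data: document frequencies measured on the
# second half of the training set, separately for each category.
indoor_df  = {"airplane": 0, "glider": 1, "helicopter": 0}
outdoor_df = {"airplane": 3, "glider": 2, "helicopter": 4}
bin_words = ["airplane", "glider", "helicopter"]   # words assigned to the same bin

lam_indoor  = lambda_for_bin(bin_words, indoor_df, num_docs=1000)
lam_outdoor = lambda_for_bin(bin_words, outdoor_df, num_docs=1000)

# Term weight for every word in this bin:
# log2 [ P(obs | indoor bin) / P(obs | outdoor bin) ]  (see the next slide).
weight = lam_indoor - lam_outdoor
```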

Page 12: Lambdas for “airplane”: 14 times more likely to be outdoor than indoor

$$P(\text{obs} \mid \text{indoor bin}) = 2.11 \times 10^{-4}$$

$$P(\text{obs} \mid \text{outdoor bin}) = 2.90 \times 10^{-3}$$

$$\lambda = \log_2 \frac{P(\text{obs} \mid \text{indoor bin})}{P(\text{obs} \mid \text{outdoor bin})} = -3.78$$
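A quick arithmetic check of the slide's numbers (only the two probabilities above are used):

```python
import math

p_indoor, p_outdoor = 2.11e-4, 2.90e-3
lam = math.log2(p_indoor / p_outdoor)   # ~ -3.78
ratio = 2 ** abs(lam)                   # ~ 13.7, i.e. about 14 times more likely outdoor
```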

Page 13: Binning Credible Log Likelihood Ratios

Intuition        Word        Lambda  Indoor Freq  Outdoor Freq  IDF  Burstiness
Clearly Indoor   conference  5.9     15           0             2.5  0
Clearly Indoor   bed         4.6     1            0             4.5  0
Clearly Outdoor  airplane    -3.8    0            2             5.4  1
Clearly Outdoor  earthquake  -4.9    0            4             4.6  1
Unclear          Gore        0.7     1            1             4.5  1
Unclear          ceremony    -0.3    5            6             3.9  0

Page 14: Evaluation

• Mutually exclusive categories

• Performance measured by overall accuracy:

$$\text{Accuracy} = \frac{\#\,\text{correct predictions}}{\#\,\text{total predictions}}$$

Page 15: Bins: Robust Version of Naïve Bayes

Performance is often similar, but can be much better

[Bar charts (y-axes 70% to 90% and 81% to 89%): overall accuracy of Bins vs. Naïve Bayes on Indoor/Outdoor and on Events (Politics, Struggle, Disaster, Crime, Other).]

Page 16: Bins: Robust Version of Naïve Bayes

Performs well against other alternatives

[Bar charts (y-axes 70% to 90% and 81% to 89%): overall accuracy comparing Bins, Naïve Bayes, Rocchio 1, KNN, PrInd, SVM, MaxEnt, Rocchio 2, and Density on Indoor/Outdoor and on Events (Politics, Struggle, Disaster, Crime, Other).]

Page 17: Reuters
http://www.research.att.com/~lewis/reuters21578.html

• Common corpus for comparing methods
   – Over 10,000 articles, 90 topic categories

• Modified method to output multiple cats for each doc
   – One category per document
      • Indoor/outdoor & politics/struggle/disaster/crime/other
   – Multiple (0 or more) categories per document
      • Reuters

Doc #5: grain, wheat, corn, barley, oat, sorghum
Doc #9: earn
Doc #448: gold, acq, platinum

Page 18: Evaluation for Reuters: Accuracy → Precision/Recall (F)

• Accuracy is misleading when documents are assigned multiple categories

• Use precision & recall instead

• F-measure: combines precision & recall

• Macro-averaging vs. micro-averaging
   – Macro: average over categories
   – Micro: average over documents

• Macro usually lower
   – Since small categories are hard

Contingency Table:

                  “yes” is correct   “no” is correct
Assigned “yes”    a                  b
Assigned “no”     c                  d

$$p = \frac{a}{a + b} \qquad r = \frac{a}{a + c} \qquad F_1 = \frac{2 \cdot p \cdot r}{p + r}$$
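A minimal sketch of computing F1 from per-category contingency counts and then macro- and micro-averaging; the category names and counts below are hypothetical.

```python
def f1(a, b, c):
    """a = true positives, b = false positives, c = false negatives."""
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical per-category contingency counts (a, b, c).
counts = {"earn": (900, 50, 40), "grain": (30, 10, 20), "platinum": (1, 2, 4)}

# Macro-F1: average the per-category F1 scores (small categories count equally,
# which is why macro is usually lower).
macro_f1 = sum(f1(*abc) for abc in counts.values()) / len(counts)

# Micro-F1: pool the counts over all categories, then compute one F1.
A = sum(a for a, _, _ in counts.values())
B = sum(b for _, b, _ in counts.values())
C = sum(c for _, _, c in counts.values())
micro_f1 = f1(A, B, C)
```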

Page 19: Bins: Robust Version of Naïve Bayes

Performance is often similar, but can be much better

[Bar charts: Reuters Micro-F1 (y-axis 79% to 87%) and Macro-F1 (y-axis 35% to 55%) for NB and Bin.]

Page 20: Bins: Robust Version of Naïve Bayes

[Bar charts: Reuters Micro-F1 (y-axis 79% to 87%) and Macro-F1 (y-axis 35% to 55%) for SVM, KNN, LSF, NNet, NB, and Bin.]

Page 21: Conclusions

• Binning: Robust version of Naïve Bayes
   – Often helps, rarely hurts
   – Smoothing: borrowed from Speech Recognition
   – Reliable log-likelihood ratios even for small counts:
      • airplane: 2 outdoor docs, 0 indoor docs
         – 14 times more likely to be outdoor than indoor

• Three Evaluations
   – Indoor vs. Outdoor (mutually exclusive categories)
   – Events (mutually exclusive categories)
   – Reuters (many-to-many)