![Page 1: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/1.jpg)
TextClassificationandNaïveBayes
TheTaskofTextClassification
Many slides are adapted from slides by Dan Jurafsky
![Page 2: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/2.jpg)
Isthisspam?
![Page 3: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/3.jpg)
WhowrotewhichFederalistpapers?• 1787-8:anonymousessaystrytoconvinceNewYorktoratifyU.SConstitution: Jay,Madison,Hamilton.
• Authorshipof12ofthelettersindispute• 1963:solvedbyMosteller andWallaceusingBayesianmethods
JamesMadison AlexanderHamilton
![Page 4: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/4.jpg)
Maleorfemaleauthor?1. By1925present-dayVietnamwasdividedintothreeparts
underFrenchcolonialrule.ThesouthernregionembracingSaigonandtheMekongdeltawasthecolonyofCochin-China;thecentralareawithitsimperialcapitalatHuewastheprotectorateofAnnam…
2. Claraneverfailedtobeastonishedbytheextraordinaryfelicityofherownname.Shefoundithardtotrustherselftothemercyoffate,whichhadmanagedovertheyearstoconverthergreatestshameintooneofhergreatestassets…
S.Argamon,M.Koppel,J.Fine,A.R.Shimoni,2003.“Gender,Genre,andWritingStyleinFormalWrittenTexts,”Text,volume23,number3,pp.321–346
![Page 5: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/5.jpg)
Positiveornegativemoviereview?• unbelievablydisappointing• Fullofzanycharactersandrichlyappliedsatire,andsomegreatplottwists
• thisisthegreatestscrewballcomedyeverfilmed
• Itwaspathetic.Theworstpartaboutitwastheboxingscenes.
5
![Page 6: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/6.jpg)
Whatisthesubjectofthisarticle?
• Antogonists andInhibitors
• BloodSupply• Chemistry• DrugTherapy• Embryology• Epidemiology• …
6
MeSH SubjectCategoryHierarchy
?
MEDLINE Article
![Page 7: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/7.jpg)
TextClassification
• Assigningsubjectcategories,topics,orgenres• Spamdetection• Authorshipidentification• Age/genderidentification• LanguageIdentification• Sentimentanalysis• …
![Page 8: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/8.jpg)
TextClassification:definition• Input:– adocumentd– afixedsetofclassesC= {c1,c2,…,cJ}
• Output:apredictedclassc Î C
![Page 9: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/9.jpg)
ClassificationMethods:Hand-codedrules
• Rulesbasedoncombinationsofwordsorotherfeatures– spam:black-list-addressOR(“dollars”AND“have beenselected”)
• Accuracycanbehigh– Ifrulescarefullyrefinedbyexpert
• Butbuildingandmaintainingtheserulesisexpensive
![Page 10: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/10.jpg)
ClassificationMethods:SupervisedMachineLearning
• Input:– adocumentd– afixedsetofclassesC= {c1,c2,…,cJ}– Atrainingsetofm hand-labeleddocuments(d1,c1),....,(dm,cm)
• Output:– alearnedclassifierγ:dà c
10
![Page 11: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/11.jpg)
ClassificationMethods:SupervisedMachineLearning
• Anykindofclassifier– Naïve Bayes– Logisticregression,maxent– Support-vectormachines– k-NearestNeighbors
– …
![Page 12: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/12.jpg)
TextClassificationandNaïveBayes
TheTaskofTextClassification
![Page 13: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/13.jpg)
TextClassificationandNaïveBayes
FormalizingtheNaïve BayesClassifier
![Page 14: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/14.jpg)
NaïveBayesIntuition
• Simple(“naïve”)classificationmethodbasedonBayesrule
• Reliesonverysimplerepresentationofdocument– Bagofwords
![Page 15: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/15.jpg)
Bayes’RuleAppliedtoDocumentsandClasses
•Foradocumentd andaclassc
P(c | d) = P(d | c)P(c)P(d)
![Page 16: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/16.jpg)
Naïve BayesClassifier(I)
cMAP = argmaxc∈C
P(c | d)
= argmaxc∈C
P(d | c)P(c)P(d)
= argmaxc∈C
P(d | c)P(c)
MAP is “maximum a posteriori” = most likely class
Bayes Rule
Dropping the denominator
![Page 17: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/17.jpg)
Naïve BayesClassifier(II)
cMAP = argmaxc∈C
P(d | c)P(c)
Document d represented as features x1..xn
= argmaxc∈C
P(x1, x2,…, xn | c)P(c)
![Page 18: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/18.jpg)
Naïve BayesClassifier(IV)
How often does this class occur?
cMAP = argmaxc∈C
P(x1, x2,…, xn | c)P(c)
O(|X|n•|C|)parameters
We can just count the relative frequencies in a corpus
Couldonlybeestimatedifavery,verylargenumberoftrainingexampleswasavailable.
![Page 19: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/19.jpg)
MultinomialNaïve BayesIndependenceAssumptions
• BagofWordsassumption:Assumepositiondoesn’tmatter
• ConditionalIndependence:AssumethefeatureprobabilitiesP(xi|cj)areindependentgiventheclassc.
P(x1, x2,…, xn | c)
P(x1,…, xn | c) = P(x1 | c)•P(x2 | c)•P(x3 | c)•...•P(xn | c)
![Page 20: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/20.jpg)
ThebagofwordsrepresentationI love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet.
γ(
)=c
![Page 21: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/21.jpg)
Thebagofwordsrepresentation
γ(
)=cgreat 2love 2
recommend 1
laugh 1happy 1
... ...
![Page 22: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/22.jpg)
Planning GUIGarbageCollection
Machine Learning NLP
parsertagtrainingtranslationlanguage...
learningtrainingalgorithmshrinkagenetwork...
garbagecollectionmemoryoptimizationregion...
Test document
parserlanguagelabeltranslation…
Bagofwordsfordocumentclassification
...planningtemporalreasoningplanlanguage...
?
![Page 23: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/23.jpg)
ApplyingMultinomialNaiveBayesClassifierstoTextClassification
cNB = argmaxc j∈C
P(cj ) P(xi | cj )i∈positions∏
positions ¬ allwordpositionsintestdocument
![Page 24: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/24.jpg)
TextClassificationandNaïveBayes
FormalizingtheNaïve BayesClassifier
![Page 25: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/25.jpg)
TextClassificationandNaïveBayes
Naïve Bayes:Learning
![Page 26: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/26.jpg)
LearningtheMultinomialNaïve BayesModel
• Firstattempt:maximumlikelihoodestimates– simplyusethefrequenciesinthedata
Sec.13.3
P̂(wi | cj ) =count(wi,cj )count(w,cj )
w∈V∑
P̂(cj ) =doccount(C = cj )
Ndoc
![Page 27: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/27.jpg)
Parameterestimation
• Createmega-documentfortopicj byconcatenatingalldocsinthistopic– Usefrequencyofw inmega-document
fractionoftimeswordwi appearsamongallwordsindocumentsoftopiccj
P̂(wi | cj ) =count(wi,cj )count(w,cj )
w∈V∑
![Page 28: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/28.jpg)
ProblemwithMaximumLikelihood• Whatifwehaveseennotrainingdocumentswiththewordfantastic and
classifiedinthetopicpositive (thumbs-up)?
• Zeroprobabilitiescannotbeconditionedaway,nomattertheotherevidence!
P̂("fantastic" positive) = count("fantastic", positive)count(w, positive
w∈V∑ )
= 0
cMAP = argmaxc P̂(c) P̂(xi | c)i∏
Sec.13.3
![Page 29: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/29.jpg)
Laplace(add-1)smoothing:unknownwords
P̂(wu | c) = count(wu,c)+1
count(w,cw∈V∑ )
#
$%%
&
'(( + V +1
Addoneextrawordtothevocabulary,the“unknownword”wu
=1
count(w,cw∈V∑ )
#
$%%
&
'(( + V +1
![Page 30: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/30.jpg)
UnderflowPrevention:logspace• Multiplyinglotsofprobabilitiescanresultinfloating-pointunderflow.• Sincelog(xy)=log(x)+log(y)
– Bettertosumlogsofprobabilitiesinsteadofmultiplyingprobabilities.• Classwithhighestun-normalizedlogprobabilityscoreisstillmost
probable.
• Modelisnowjustmaxofsumofweights
cNB = argmaxc j∈C
logP(cj )+ logP(xi | cj )i∈positions∑
![Page 31: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/31.jpg)
TextClassificationandNaïve Bayes
Naïve Bayes:Learning
![Page 32: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/32.jpg)
TextClassificationandNaïveBayes
MultinomialNaïve Bayes:AWorkedExample
![Page 33: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/33.jpg)
Choosingaclass:P(c|d5)
P(j|d5) 1/4*(2/10)3 *2/10 *2/10≈0.00008
Doc Words Class
Training 1 Chinese BeijingChinese c
2 ChineseChineseShanghai c3 ChineseMacao c4 TokyoJapanChinese j
Test 5 ChineseChineseChineseTokyo Japan ?
33
ConditionalProbabilities:P(Chinese|c)=P(Tokyo|c)=P(Japan|c)=P(Chinese|j)=P(Tokyo|j)=P(Japan|j)=
Priors:P(c)=P(j)=
34 1
4
P̂(w | c) = count(w,c)+1count(c)+ |V |
P̂(c) = Nc
N
(5+1)/(8+7)=6/15(0+1)/(8+7)=1/15
(1+1)/(3+7)=2/10(0+1)/(8+7)=1/15
(1+1)/(3+7)=2/10(1+1)/(3+7)=2/10
3/4*(6/15)3 *1/15 *1/15≈0.0002
µ
µ
+1
![Page 34: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/34.jpg)
Summary:NaiveBayesisNotSoNaive• RobusttoIrrelevantFeatures
IrrelevantFeaturescanceleachotherwithoutaffectingresults
• Verygoodindomainswithmanyequallyimportantfeatures
DecisionTreessufferfromfragmentation insuchcases– especiallyiflittledata
• Optimaliftheindependenceassumptionshold:Ifassumedindependenceiscorrect,thenitistheBayesOptimalClassifierforproblem
• Agooddependablebaselinefortextclassification– Butwewillseeotherclassifiersthatgivebetteraccuracy
![Page 35: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/35.jpg)
TextClassificationandNaïveBayes
MultinomialNaïve Bayes:AWorkedExample
![Page 36: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/36.jpg)
TextClassificationandNaïveBayes
TextClassification:Evaluation
![Page 37: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/37.jpg)
The2-by-2contingencytable
correct notcorrectselected tp fp
notselected fn tn
![Page 38: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/38.jpg)
Precisionandrecall• Precision:%ofselecteditemsthatarecorrectRecall:%ofcorrectitemsthatareselected
correct notcorrectselected tp fp
notselected fn tn
![Page 39: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/39.jpg)
Acombinedmeasure:F• AcombinedmeasurethatassessestheP/RtradeoffisFmeasure(weightedharmonicmean):
• PeopleusuallyusebalancedF1measure– i.e.,withb =1(thatis,a =½): F =2PR/(P+R)
RPPR
RP
F+
+=
−+= 2
2 )1(1)1(1
1ββ
αα
![Page 40: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/40.jpg)
Confusionmatrixc• Foreachpairofclasses<c1,c2>howmanydocumentsfromc1 wereincorrectlyassignedtoc2?– c3,2:90wheatdocumentsincorrectlyassignedtopoultry
40
Docsintestset AssignedUK
Assignedpoultry
Assignedwheat
Assignedcoffee
Assignedinterest
Assignedtrade
TrueUK 95 1 13 0 1 0
Truepoultry 0 1 0 0 0 0
Truewheat 10 90 0 1 0 0
Truecoffee 0 0 0 34 3 7
Trueinterest - 1 2 13 26 5
Truetrade 0 0 2 14 5 10
![Page 41: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/41.jpg)
PerclassevaluationmeasuresRecall:Fractionofdocsinclassi classifiedcorrectly:
Precision:Fractionofdocsassignedclassi thatareactually
aboutclassi:
Accuracy:(1- errorrate)Fractionofdocsclassifiedcorrectly: 41
ciii∑
ciji∑
j∑
ciic ji
j∑
ciicij
j∑
Sec. 15.2.4
![Page 42: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/42.jpg)
Micro- vs.Macro-Averaging– Ifwehavemorethanoneclass,howdowecombinemultipleperformancemeasuresintoonequantity?
• Macroaveraging:Computeperformanceforeachclass,thenaverage.Averageonclasses
• Microaveraging:Collectdecisionsforeachinstancefromallclasses,computecontingencytable,evaluate.Averageoninstances
42
Sec. 15.2.4
![Page 43: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/43.jpg)
Micro- vs.Macro-Averaging:Example
Truth:yes
Truth:no
Classifier:yes 10 10
Classifier:no 10 970
Truth:yes
Truth:no
Classifier:yes 90 10
Classifier:no 10 890
Truth:yes
Truth:no
Classifier:yes 100 20
Classifier:no 20 1860
43
Class1 Class2 MicroAve.Table
Sec.15.2.4
• Macroaveraged precision:(0.5+0.9)/2=0.7• Microaveraged precision:100/120=.83• Microaveraged scoreisdominatedbyscoreoncommonclasses
![Page 44: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/44.jpg)
DevelopmentTestSetsandCross-validation
• Metric:P/R/F1orAccuracy• Unseentestset– avoidoverfitting (‘tuningtothetestset’)– moreconservativeestimateofperformance
– Cross-validationovermultiplesplits• Handlesamplingerrorsfromdifferentdatasets
– Poolresultsovereachsplit– Computepooleddev setperformance
Trainingset Development Test Set TestSet
TestSet
TrainingSet
TrainingSetDev Test
TrainingSet
Dev Test
Dev Test
![Page 45: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of](https://reader031.vdocuments.net/reader031/viewer/2022021816/5a79cf6e7f8b9ab05f8c9efa/html5/thumbnails/45.jpg)
TextClassificationandNaïveBayes
TextClassification:Evaluation