naïvebayes - georgia institute of technologybboots3/cs4641-fall2018/lecture10/10… · training...
TRANSCRIPT
Naïve Bayes
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
EssentialProbabilityConcepts• Marginalization:
• ConditionalProbability:
• Bayes’Rule:
• Independence:
2
P (A | B) =P (A ^B)
P (B)
P (A ^B) = P (A | B)⇥ P (B)
A??B $ P (A ^B) = P (A)⇥ P (B)
$ P (A | B) = P (A)
P (A | B) =P (B | A)⇥ P (A)
P (B)
A??B | C $ P (A ^B | C) = P (A | C)⇥ P (B | C)
P (B) =X
v2values(A)
P (B ^A = v)
DensityEstimation
3
RecalltheJointDistribution...
4
alarm ¬alarmearthquake ¬earthquake earthquake ¬earthquake
burglary 0.01 0.08 0.001 0.009¬burglary 0.01 0.09 0.01 0.79
HowCanWeObtainaJointDistribution?Option1: Elicititfromanexperthuman
Option2: Builditupfromsimplerprobabilisticfacts• e.g,ifweknew
P(a) = 0.7 P(b|a) = 0.2 P(b|¬a) = 0.1then,wecouldcomputeP(a Ù b)
Option3: Learnitfromdata...
5BasedonslidebyAndrewMoore
LearningaJointDistribution
BuildaJDtableforyourattributesinwhichtheprobabilitiesareunspecified
Then,fillineachrowwith:
records ofnumber totalrow matching records)row(ˆ =P
A B C Prob0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 ?
A B C Prob0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
FractionofallrecordsinwhichA andB aretruebutC isfalse
Step1: Step2:
Slide©AndrewMoore
DensityEstimation• OurjointdistributionlearnerisanexampleofsomethingcalledDensityEstimation
• ADensityEstimatorlearnsamappingfromasetofattributestoaprobability
DensityEstimator
ProbabilityInputAttributes
Slide©AndrewMoore
RegressorPredictionofreal-valuedoutput
InputAttributes
DensityEstimation
Compareitagainstthetwoothermajorkindsofmodels:
ClassifierPredictionofcategoricaloutput
InputAttributes
DensityEstimator
ProbabilityInputAttributes
Slide©AndrewMoore
EvaluatingDensityEstimation
TestsetAccuracy
?
TestsetAccuracy
Test-setcriterionforestimatingperformance
onfuturedata
RegressorPredictionofreal-valuedoutput
InputAttributes
ClassifierPredictionofcategoricaloutput
InputAttributes
DensityEstimator
ProbabilityInputAttributes
Slide©AndrewMoore
• Givenarecordx,adensityestimatorM cantellyouhowlikelytherecordis:
• Thedensityestimatorcanalsotellyouhowlikelythedatasetis:– UndertheassumptionthatallrecordswereindependentlygeneratedfromtheDensityEstimator’sJD(thatis,i.i.d.)
EvaluatingaDensityEstimator
P̂ (x | M)
P̂ (dataset | M) = P̂ (x1 ^ x2 ^ . . . ^ xn | M) =nY
i=1
P̂ (xi | M)
dataset
SlidebyAndrewMoore
ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset
mpg modelyear maker
good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe
SlidebyAndrewMoore
ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset
mpg modelyear maker
good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe
P̂ (dataset | M) =nY
i=1
P̂ (xi | M)
= 3.4⇥ 10�203 (in this case)
SlidebyAndrewMoore
LogProbabilities• Fordecentsizeddatasets,thisproductwillunderflow
• Therefore,sinceprobabilitiesofdatasetsgetsosmall,weusuallyuselogprobabilities
log
ˆP (dataset | M) = log
nY
i=1
ˆP (xi | M) =
nX
i=1
log
ˆP (xi | M)
P̂ (dataset | M) =nY
i=1
P̂ (xi | M)
= 3.4⇥ 10�203 (in this case)
BasedonslidebyAndrewMoore
ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset
mpg modelyear maker
good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe
SlidebyAndrewMoore
log
ˆP (dataset | M) =
nX
i=1
log
ˆP (xi | M)
= �466.19 (in this case)
Pros/ConsoftheJointDensityEstimatorTheGoodNews:• WecanlearnaDensityEstimatorfromdata.• Densityestimatorscandomanygoodthings…– Cansorttherecordsbyprobability,andthusspotweirdrecords(anomalydetection)
– Candoinference– IngredientforBayesClassifiers(comingverysoon...)
TheBadNews:• Densityestimationbydirectlylearningthejointisimpractical,mayresultinadversebehavior
SlidebyAndrewMoore
CurseofDimensionality
SlidebyChristopherBishop
TheJointDensityEstimatoronaTestSet
• Anindependenttestsetwith196carshasamuchworselog-likelihood– Actuallyit’sabillionquintillionquintillionquintillionquintilliontimeslesslikely
• Densityestimatorscanoverfit......andthefulljointdensityestimatoristheoverfittiest ofthemall!
SlidebyAndrewMoore
Copyright©AndrewW.Moore
OverfittingDensityEstimators
Ifthis everhappens,thejointPDElearnstherearecertaincombinationsthatareimpossible
log
ˆP (dataset | M) =
nX
i=1
log
ˆP (xi | M)
= �1 if for any i, ˆP (xi | M) = 0
SlidebyAndrewMoore
TheNaïveBayesClassifier
19
Bayes’Rule• RecallBaye’s Rule:
• Equivalently,wecanwrite:
whereX isarandomvariablerepresentingtheevidenceandY isarandomvariableforthelabel(ourhypothesis)
• Thisisactuallyshortfor:
whereXj denotestherandomvariableforthej th feature20
P (hypothesis | evidence) = P (evidence | hypothesis)⇥ P (hypothesis)
P (evidence)
P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)
P (X1 = xi,1 ^ . . . ^Xd = xi,d)
P (Y = yk | X = xi) =P (Y = yk)P (X = xi | Y = yk)
P (X = xi)
NaïveBayesClassifierIdea:Usethetrainingdatatoestimate
Then,useBayesruletoinferfornewdata
• Estimatingthejointprobabilitydistributionisnotpractical
21
P (X | Y ) P (Y )and.P (Y |Xnew)
P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)
P (X1 = xi,1 ^ . . . ^Xd = xi,d)
P (X1, X2, . . . , Xd | Y )
Easytoestimatefromdata
Unnecessary,asitturnsout
Impractical,butnecessary
NaïveBayesClassifierProblem:estimatingthejointPDorCPDisn’tpractical– Severelyoverfits,aswesawbefore
However,ifwemaketheassumptionthattheattributesareindependentgiventheclasslabel,estimationiseasy!
• Inotherwords,weassumeallattributesareconditionallyindependentgivenY
• Oftenthisassumptionisviolatedinpractice,butmoreonthatlater…
22
P (X1, X2, . . . , Xd | Y ) =dY
j=1
P (Xj | Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
23
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
24
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
25
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
26
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
27
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
28
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
29
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
30
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...
31
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...
32
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = 1... ...
33
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
LaplaceSmoothing• Noticethatsomeprobabilitiesestimatedbycountingmightbezero– Possibleoverfitting!
• FixbyusingLaplacesmoothing:– Adds1toeachcount
where– cv isthecountoftraininginstanceswithavalueofv for
attributej andclasslabelyk– |values(Xj)| isthenumberofvaluesXj cantakeon
34
P (Xj = v | Y = yk) =cv + 1X
v02values(Xj)
cv0 + |values(Xj)|
TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
35
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...
36
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = ?... ...
37
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:
P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = 2/3... ...
38
Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes
P (Xj | Y ) P (Y )
UsingtheNaïveBayesClassifier• Now,wehave
– Inpractice,weuselog-probabilitiestopreventunderflow
• Toclassifyanewpointx,
39
P (Y = yk | X = xi) =P (Y = yk)
Qdj=1 P (Xj = xi,j | Y = yk)
P (X = xi)Thisisconstantforagiveninstance,andsoirrelevanttoourprediction
h(x) = argmax
yk
P (Y = yk)
dY
j=1
P (Xj = xj | Y = yk)
= argmax
yk
logP (Y = yk) +
dX
j=1
logP (Xj = xj | Y = yk)
j th attributevalueofx
h(x) = argmax
yk
P (Y = yk)
dY
j=1
P (Xj = xj | Y = yk)
= argmax
yk
logP (Y = yk) +
dX
j=1
logP (Xj = xj | Y = yk)
40
TheNaïveBayesClassifierAlgorithm
• Foreachclasslabelyk– EstimateP(Y = yk) fromthedata– Foreachvaluexi,j ofeachattributeXi
• EstimateP(Xi = xi,j | Y = yk)
• Classifyanewpointvia:
• Inpractice,theindependenceassumptiondoesn’toftenholdtrue,butNaïveBayesperformsverywelldespitethis
h(x) = argmax
yk
logP (Y = yk) +
dX
j=1
logP (Xj = xj | Y = yk)
TheNaïveBayesGraphicalModel
• Nodesdenoterandomvariables• Edgesdenotedependency• Eachnodehasanassociatedconditionalprobabilitytable(CPT),conditioneduponitsparents
41
Attributes(evidence)
Labels(hypotheses)
…
Y
X1 Xi Xd…
P(Y)
P(X1 | Y) P(Xi | Y) P(Xd | Y)
ExampleNBGraphicalModel
42
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Data:
ExampleNBGraphicalModel
43
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yesno
Data:
ExampleNBGraphicalModel
44
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Data:
ExampleNBGraphicalModel
45
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yesrainy yessunny norainy no
Data:
ExampleNBGraphicalModel
46
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Data:
ExampleNBGraphicalModel
47
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yescold yeswarm nocold no
Data:
ExampleNBGraphicalModel
48
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3
Data:
ExampleNBGraphicalModel
49
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3
Humid Play? P(Humid|Play)high yesnorm yeshigh nonorm no
Data:
ExampleNBGraphicalModel
50
…
Play
Sky Temp Humid…
Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3
Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3
Data:
ExampleUsingNBforClassification
Goal: Predictlabelforx =(rainy,warm,normal)51
…
Play
Sky Temp Humid
…
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3
Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3
h(x) = argmax
yk
logP (Y = yk) +
dX
j=1
logP (Xj = xj | Y = yk)
ExampleUsingNBforClassification
Predictlabelfor:x =(rainy,warm,normal)
52
…
Play
Sky Temp Humid
…
Play? P(Play)yes 3/4no 1/4
Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3
Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3
Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3
predictPLAY
NaïveBayesSummaryAdvantages:• Fasttotrain(singlescanthroughdata)• Fasttoclassify• Notsensitivetoirrelevantfeatures• Handlesrealanddiscretedata• Handlesstreamingdatawell
Disadvantages:• Assumesindependenceoffeatures
53SlidebyEamonn Keogh