naïvebayes - georgia institute of technologybboots3/cs4641-fall2018/lecture10/10… · training...

53
Naïve Bayes Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Upload: others

Post on 12-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

Naïve Bayes

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

EssentialProbabilityConcepts• Marginalization:

• ConditionalProbability:

• Bayes’Rule:

• Independence:

2

P (A | B) =P (A ^B)

P (B)

P (A ^B) = P (A | B)⇥ P (B)

A??B $ P (A ^B) = P (A)⇥ P (B)

$ P (A | B) = P (A)

P (A | B) =P (B | A)⇥ P (A)

P (B)

A??B | C $ P (A ^B | C) = P (A | C)⇥ P (B | C)

P (B) =X

v2values(A)

P (B ^A = v)

Page 3: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

DensityEstimation

3

Page 4: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

RecalltheJointDistribution...

4

alarm ¬alarmearthquake ¬earthquake earthquake ¬earthquake

burglary 0.01 0.08 0.001 0.009¬burglary 0.01 0.09 0.01 0.79

Page 5: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

HowCanWeObtainaJointDistribution?Option1: Elicititfromanexperthuman

Option2: Builditupfromsimplerprobabilisticfacts• e.g,ifweknew

P(a) = 0.7 P(b|a) = 0.2 P(b|¬a) = 0.1then,wecouldcomputeP(a Ù b)

Option3: Learnitfromdata...

5BasedonslidebyAndrewMoore

Page 6: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

LearningaJointDistribution

BuildaJDtableforyourattributesinwhichtheprobabilitiesareunspecified

Then,fillineachrowwith:

records ofnumber totalrow matching records)row(ˆ =P

A B C Prob0 0 0 ?

0 0 1 ?

0 1 0 ?

0 1 1 ?

1 0 0 ?

1 0 1 ?

1 1 0 ?

1 1 1 ?

A B C Prob0 0 0 0.30

0 0 1 0.05

0 1 0 0.10

0 1 1 0.05

1 0 0 0.05

1 0 1 0.10

1 1 0 0.25

1 1 1 0.10

FractionofallrecordsinwhichA andB aretruebutC isfalse

Step1: Step2:

Slide©AndrewMoore

Page 7: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

DensityEstimation• OurjointdistributionlearnerisanexampleofsomethingcalledDensityEstimation

• ADensityEstimatorlearnsamappingfromasetofattributestoaprobability

DensityEstimator

ProbabilityInputAttributes

Slide©AndrewMoore

Page 8: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

RegressorPredictionofreal-valuedoutput

InputAttributes

DensityEstimation

Compareitagainstthetwoothermajorkindsofmodels:

ClassifierPredictionofcategoricaloutput

InputAttributes

DensityEstimator

ProbabilityInputAttributes

Slide©AndrewMoore

Page 9: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

EvaluatingDensityEstimation

TestsetAccuracy

?

TestsetAccuracy

Test-setcriterionforestimatingperformance

onfuturedata

RegressorPredictionofreal-valuedoutput

InputAttributes

ClassifierPredictionofcategoricaloutput

InputAttributes

DensityEstimator

ProbabilityInputAttributes

Slide©AndrewMoore

Page 10: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

• Givenarecordx,adensityestimatorM cantellyouhowlikelytherecordis:

• Thedensityestimatorcanalsotellyouhowlikelythedatasetis:– UndertheassumptionthatallrecordswereindependentlygeneratedfromtheDensityEstimator’sJD(thatis,i.i.d.)

EvaluatingaDensityEstimator

P̂ (x | M)

P̂ (dataset | M) = P̂ (x1 ^ x2 ^ . . . ^ xn | M) =nY

i=1

P̂ (xi | M)

dataset

SlidebyAndrewMoore

Page 11: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset

mpg modelyear maker

good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe

SlidebyAndrewMoore

Page 12: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset

mpg modelyear maker

good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe

P̂ (dataset | M) =nY

i=1

P̂ (xi | M)

= 3.4⇥ 10�203 (in this case)

SlidebyAndrewMoore

Page 13: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

LogProbabilities• Fordecentsizeddatasets,thisproductwillunderflow

• Therefore,sinceprobabilitiesofdatasetsgetsosmall,weusuallyuselogprobabilities

log

ˆP (dataset | M) = log

nY

i=1

ˆP (xi | M) =

nX

i=1

log

ˆP (xi | M)

P̂ (dataset | M) =nY

i=1

P̂ (xi | M)

= 3.4⇥ 10�203 (in this case)

BasedonslidebyAndrewMoore

Page 14: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleSmallDataset:MilesPerGallonFromtheUCIrepository(thankstoRossQuinlan)• 192recordsinthetrainingset

mpg modelyear maker

good 75to78 asiabad 70to74 americabad 75to78 europebad 70to74 americabad 70to74 americabad 70to74 asiabad 70to74 asiabad 75to78 america: : :: : :: : :bad 70to74 americagood 79to83 americabad 75to78 americagood 79to83 americabad 75to78 americagood 79to83 americagood 79to83 americabad 70to74 americagood 75to78 europebad 75to78 europe

SlidebyAndrewMoore

log

ˆP (dataset | M) =

nX

i=1

log

ˆP (xi | M)

= �466.19 (in this case)

Page 15: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

Pros/ConsoftheJointDensityEstimatorTheGoodNews:• WecanlearnaDensityEstimatorfromdata.• Densityestimatorscandomanygoodthings…– Cansorttherecordsbyprobability,andthusspotweirdrecords(anomalydetection)

– Candoinference– IngredientforBayesClassifiers(comingverysoon...)

TheBadNews:• Densityestimationbydirectlylearningthejointisimpractical,mayresultinadversebehavior

SlidebyAndrewMoore

Page 16: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

CurseofDimensionality

SlidebyChristopherBishop

Page 17: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TheJointDensityEstimatoronaTestSet

• Anindependenttestsetwith196carshasamuchworselog-likelihood– Actuallyit’sabillionquintillionquintillionquintillionquintilliontimeslesslikely

• Densityestimatorscanoverfit......andthefulljointdensityestimatoristheoverfittiest ofthemall!

SlidebyAndrewMoore

Page 18: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

Copyright©AndrewW.Moore

OverfittingDensityEstimators

Ifthis everhappens,thejointPDElearnstherearecertaincombinationsthatareimpossible

log

ˆP (dataset | M) =

nX

i=1

log

ˆP (xi | M)

= �1 if for any i, ˆP (xi | M) = 0

SlidebyAndrewMoore

Page 19: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TheNaïveBayesClassifier

19

Page 20: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

Bayes’Rule• RecallBaye’s Rule:

• Equivalently,wecanwrite:

whereX isarandomvariablerepresentingtheevidenceandY isarandomvariableforthelabel(ourhypothesis)

• Thisisactuallyshortfor:

whereXj denotestherandomvariableforthej th feature20

P (hypothesis | evidence) = P (evidence | hypothesis)⇥ P (hypothesis)

P (evidence)

P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)

P (X1 = xi,1 ^ . . . ^Xd = xi,d)

P (Y = yk | X = xi) =P (Y = yk)P (X = xi | Y = yk)

P (X = xi)

Page 21: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

NaïveBayesClassifierIdea:Usethetrainingdatatoestimate

Then,useBayesruletoinferfornewdata

• Estimatingthejointprobabilitydistributionisnotpractical

21

P (X | Y ) P (Y )and.P (Y |Xnew)

P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)

P (X1 = xi,1 ^ . . . ^Xd = xi,d)

P (X1, X2, . . . , Xd | Y )

Easytoestimatefromdata

Unnecessary,asitturnsout

Impractical,butnecessary

Page 22: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

NaïveBayesClassifierProblem:estimatingthejointPDorCPDisn’tpractical– Severelyoverfits,aswesawbefore

However,ifwemaketheassumptionthattheattributesareindependentgiventheclasslabel,estimationiseasy!

• Inotherwords,weassumeallattributesareconditionallyindependentgivenY

• Oftenthisassumptionisviolatedinpractice,butmoreonthatlater…

22

P (X1, X2, . . . , Xd | Y ) =dY

j=1

P (Xj | Y )

Page 23: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

23

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 24: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

24

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 25: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

25

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 26: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

26

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 27: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

27

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 28: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

28

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 29: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

29

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 30: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

30

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 31: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...

31

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 32: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...

32

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 33: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayesEstimate anddirectlyfromthetrainingdatabycounting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = 1... ...

33

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 34: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

LaplaceSmoothing• Noticethatsomeprobabilitiesestimatedbycountingmightbezero– Possibleoverfitting!

• FixbyusingLaplacesmoothing:– Adds1toeachcount

where– cv isthecountoftraininginstanceswithavalueofv for

attributej andclasslabelyk– |values(Xj)| isthenumberofvaluesXj cantakeon

34

P (Xj = v | Y = yk) =cv + 1X

v02values(Xj)

cv0 + |values(Xj)|

Page 35: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

35

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 36: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

36

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 37: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = ?... ...

37

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 38: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TrainingNaïveBayeswithLaplaceSmoothingEstimate anddirectlyfromthetrainingdatabycountingwithLaplacesmoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = 2/3... ...

38

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 39: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

UsingtheNaïveBayesClassifier• Now,wehave

– Inpractice,weuselog-probabilitiestopreventunderflow

• Toclassifyanewpointx,

39

P (Y = yk | X = xi) =P (Y = yk)

Qdj=1 P (Xj = xi,j | Y = yk)

P (X = xi)Thisisconstantforagiveninstance,andsoirrelevanttoourprediction

h(x) = argmax

yk

P (Y = yk)

dY

j=1

P (Xj = xj | Y = yk)

= argmax

yk

logP (Y = yk) +

dX

j=1

logP (Xj = xj | Y = yk)

j th attributevalueofx

h(x) = argmax

yk

P (Y = yk)

dY

j=1

P (Xj = xj | Y = yk)

= argmax

yk

logP (Y = yk) +

dX

j=1

logP (Xj = xj | Y = yk)

Page 40: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

40

TheNaïveBayesClassifierAlgorithm

• Foreachclasslabelyk– EstimateP(Y = yk) fromthedata– Foreachvaluexi,j ofeachattributeXi

• EstimateP(Xi = xi,j | Y = yk)

• Classifyanewpointvia:

• Inpractice,theindependenceassumptiondoesn’toftenholdtrue,butNaïveBayesperformsverywelldespitethis

h(x) = argmax

yk

logP (Y = yk) +

dX

j=1

logP (Xj = xj | Y = yk)

Page 41: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

TheNaïveBayesGraphicalModel

• Nodesdenoterandomvariables• Edgesdenotedependency• Eachnodehasanassociatedconditionalprobabilitytable(CPT),conditioneduponitsparents

41

Attributes(evidence)

Labels(hypotheses)

Y

X1 Xi Xd…

P(Y)

P(X1 | Y) P(Xi | Y) P(Xd | Y)

Page 42: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

42

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Data:

Page 43: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

43

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yesno

Data:

Page 44: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

44

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Data:

Page 45: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

45

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yesrainy yessunny norainy no

Data:

Page 46: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

46

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Data:

Page 47: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

47

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yescold yeswarm nocold no

Data:

Page 48: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

48

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3

Data:

Page 49: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

49

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3

Humid Play? P(Humid|Play)high yesnorm yeshigh nonorm no

Data:

Page 50: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleNBGraphicalModel

50

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3

Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

Data:

Page 51: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleUsingNBforClassification

Goal: Predictlabelforx =(rainy,warm,normal)51

Play

Sky Temp Humid

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3

Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

h(x) = argmax

yk

logP (Y = yk) +

dX

j=1

logP (Xj = xj | Y = yk)

Page 52: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

ExampleUsingNBforClassification

Predictlabelfor:x =(rainy,warm,normal)

52

Play

Sky Temp Humid

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky|Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp|Play)warm yes 4/5cold yes 1/5warm no 1/3cold no 2/3

Humid Play? P(Humid|Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

predictPLAY

Page 53: NaïveBayes - Georgia Institute of Technologybboots3/CS4641-Fall2018/Lecture10/10… · Training Naïve Bayes Estimate and directly from the training data by counting! P(play) = 3/4

NaïveBayesSummaryAdvantages:• Fasttotrain(singlescanthroughdata)• Fasttoclassify• Notsensitivetoirrelevantfeatures• Handlesrealanddiscretedata• Handlesstreamingdatawell

Disadvantages:• Assumesindependenceoffeatures

53SlidebyEamonn Keogh