classification+,naïve+bayes...

28
Classification , Naïve Bayes with MLE/MAP Aarti Singh Machine Learning 101315 Sept 9, 2019

Upload: others

Post on 06-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Classification+, Naïve+Bayeswith+MLE/MAP

Aarti&Singh

Machine&Learning&101315Sept&9,&2019

Page 2: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Naïve&Bayes&Classifier

2

• Bayes'Classifier'with'additional'“naïve”'assumption:– Features'are'independent'given'class:

• Has'fewer'parameters,'and'hence'requires'fewer'training'data

Linear&instead&of&Quadratic&or&Exponential&in&d!

Page 3: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Hand%written+digit+recognition

3

0,$1,$2,$…,$9

Input,$X$(images$of$hand:written$digits) Label,$Y

Black:white$images$– d:dim$binary$(0/1)$vector

Discrete+Naïve+Bayes+model:

P(Y$=$y)$=$py for$all$y$in$0,$1,$…,$9 p0,$p1,$…,$p9 (sum$to$1)

P(Xi=xi|Y =$y)$: one$probability$value$for$each$y,$pixel$i

Akin+to+roll+of+a+dice+(Multinomial)

Akin+to+a+coin+flip+for+each+y+and+pixel+i (Bernoulli)

Page 4: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MLE$for$Bernoulli,$Multinomial

4

Bernoulli(!)P(Heads)1=1!,11P(Tails)1=116!

“Frequency of roll y”p̂y,MLE =

↵yPy ↵y

p̂1,MLE , . . . , p̂6,MLE

“Frequency of heads”

Multinomial(!={p1,p2,..,p6})P(1)1=1p1,1P(2)1=1p2,1…,1P(6)1=1p61=116(p1+…p5)

Page 5: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Naïve&Bayes Algo – Discrete&features

• Training'Data

• Maximum'Likelihood'Estimates– For'Class'probability'

– For'class'conditional'distribution

• NB'Prediction'for'test'data

5

P̂ (xi|y) =

Page 6: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Issues%with%Naïve%Bayes

6

• Issue%1: Usually,)features)are)not)conditionally)independent:

Nonetheless,)NB)is)the)single)most)used)classifier)particularly))))when)data)is)limited,)works)well

• Issue%2: Typically)use)MAP)estimates)instead)of)MLE)since)insufficient)data)may)cause)MLE)to)be)zero.

Page 7: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Insufficient*data*for*MLE

7

• What&if&you&never&see&a&training&instance&where&X1=a&when&Y=b?– e.g.,&b={SpamEmail},&a&={‘Earn’}– P(X1=&a&|&Y&=&b)&=&0

• Thus,&no&matter&what&the&values&X2,…,Xd take:

• What&now???

=&0

Page 8: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Naïve&Bayes Algo – Discrete&features

• Training'Data

• Maximum'A'Posteriori'(MAP)'Estimates'– add'm'“virtual”'datapts

Assume'priors'

MAP'Estimate

Now,&even&if&you&never&observe&a&class/feature&posterior&probability&never&zero.

8

#'virtual'examples'with'Y'='b

Page 9: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Max$A$Posteriori$(MAP)$estimation

Justification+for+adding+virtual+examples• Assume+a+prior+distribution+for+parameters+! (reflects+prior+

belief+in+what+values+! is+likely+to+take)

9

• Choose+! that+maximizes+a+posterior+probability

Page 10: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MAP$estimation$for$Bernoulli$r.v.

10

Choose(! that(maximizes(a(posterior(probability

MAP(estimate(of(probability(of(head:

Page 11: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Beta%distribution

11

More&concentrated&as&values&of&!H,&!T increase

Beta(2,3) Beta(20,30)

Page 12: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MAP$estimation$for$Bernoulli$r.v.

12

Choose(! that(maximizes(a(posterior(probability

MAP(estimate(of(probability(of(head:Count(of(H/T(simply(get(added(to(parameters

Page 13: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Beta%conjugate%prior

13

As%n =%!H +%!T increases,%posterior%distribution%becomes%more%concentrated

Beta(2,3) Beta(20,30)

After%observing%1%Tail After%observing%18%Heads%and%28%Tails

Page 14: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MAP$estimation$for$Bernoulli$r.v.

14

Choose(! that(maximizes(a(posterior(probability

MAP(estimate(of(probability(of(head:

Equivalent(to(adding(extra(coin(flips((βH(D 1(heads,(βT D 1(tails)

Mode(of(Betadistribution

As$we$get$more$data,$effect$of$prior$is$“washed$out”

Count(of(H/T(simply(get(added(to(parameters

Page 15: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MLE$vs.$MAP

15When is MAP same as MLE?

! Maximum)Likelihood)estimation)(MLE)Choose)value)that)maximizes)the)probability)of)observed)data

! Maximum)a"posteriori (MAP))estimationChoose)value)that)is)most)probable)given)observed)data)and)prior)belief

Page 16: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

How$to$learn$parameters$from$data?MLE,$MAP

(Continuous$case)

16

Page 17: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Naïve&Bayes&with&continuous&features

17

Training&Data:

Gaussian&Naïve&Bayes&model:P(Y&=&y)&=&py for&all&y&in&0,&1,&2,&…,&9 p0,&p1,&…,&p9 (sum&to&1)

P(Xi=xi|Y =&y)&~&N(μ(y)i,&σ2i&(y))&for&each&y&and&each&pixel&i

1

…&n&greyscaleimages

…&n&labels

Input,&X

Label,&Y

Each&image&represented&as&a&vector&of&intensity&values&at&the&d&pixels&(features)

=

2

664

X1

X2

. . .Xd

3

775

2

X

Page 18: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Gaussian'distribution

18

Data,'D'=

• Parameters:'''µ – mean,'!2 2 variance

• Sleep'hrs'are'i.i.d.:– Independent events– Identically'distributed'according'to'Gaussian'distribution

6543 7 8 9 Sleep'hrs

Page 19: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

19

1p2⇡�2

1

(2⇡�2)n/2

Maximum'Likelihood'Estimation'(MLE)

Page 20: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

MLE$for$Gaussian$mean$and$variance

Note: MLE)for)the)variance)of)a)Gaussian)is)biased– Expected)result)of)estimation)is)not$true)parameter!)– Unbiased)variance)estimator:

20

Page 21: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Naïve&Bayes&Algo – continuous&features• Training'Data• Maximum'Likelihood'Estimates– For'Class'probability'

– For'class'conditional'distribution

• NB'Prediction'for'test'data

21

P̂ (xi|y) = N(µ̂i(y)

, �̂i2(y))

P̂ (xi|y) = N(µ̂i(y)

, �̂i2(y))

MLE estimates

Page 22: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

22

Maximum(likelihood(estimates:

jth training)imageith pixel)in)jth training)image

y)class

Naïve)Bayes)Algo)– continuous)features

1

n

nX

j=1

xj

µ̂i(y)=

1Pj 1(Y (j)=y)

X

j

X(j)i 1(Y (j)=y)

1

n� 1

nX

j=1

(xj � µ̂MLE)2

�̂i2(y) =

1Pj 1(Y (j)=y) � 1

X

j

(X(j)i � µ̂(y)

i )21(Y (j)=y)

Page 23: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Parameters(! =((μ,σ2)• Mean(μ:(Gaussian(prior

• Variance(σ2:(Inverse<Wishart(Distribution

MAP$estimation$for$Gaussian$r.v.

23

As$we$get$more$data,$effect$of$prior$is$“washed$out”

=(N(",#2)

Page 24: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Applications+of+Naïve+Bayes• Text%classification

Documents%! TopicEmails%! Spam

Bag%of%words%example%discussed%in%recitation

• Image%classificationHand;written%characters%%! Digit,%LetterBrain%scans% ! words,%objects%person%is%reading/

watching

24

Page 25: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

GNB$for$classifying$word$category

25

~1%mm%resolution

~2%images%per%sec.

15,000%voxels/image

non:invasive,%safe

measures%Blood%Oxygen%Level%Dependent%(BOLD)%response

[Mitchell%et%al.03]

Page 26: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Gaussian'Naïve'Bayes:'Learned'µvoxel,word

26

15,000'voxelsor'features

10'training'examples'orsubjects'perclass'(12'word'categories)

[Mitchell'et'al.03]

Page 27: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

Learned'Naïve'Bayes Models'–Means'for'P(BrainActivity |'WordCategory)

27

Animal)wordsPeople)wordsPairwise classification)accuracy:)85% [Mitchell)et)al.03]

Page 28: Classification+,Naïve+Bayes with+MLE/MAPaarti/Class/10315_Fall19/lecs/Lecture4.pdfClassification+,Naïve+Bayes with+MLE/MAP Aarti&Singh Machine&Learning&101315 Sept&9,&2019

What%you%should%know…

28

• Optimal*decision*using*Bayes Classifier• Naïve*Bayes classifier

– What’s*the*assumption– Why*we*use*it– How*do*we*learn*it*(MLE,*MAP)– Why*is*MAP*estimation*important