![Page 1: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005 · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding](https://reader033.vdocuments.net/reader033/viewer/2022060914/60a7e4bd36668605a90ec8e3/html5/thumbnails/1.jpg)
Naïve Bayes and Logistic Regression
Machine Learning 10-701
Tom M. Mitchell
Center for Automated Learning and Discovery
Carnegie Mellon University
September 27, 2005
Required reading:
• Mitchell draft chapter (see course website)
Recommended reading:
• Mitchell, 6.10 (text learning example)
• Bishop, Chapter 3.1.3, 3.1.4
• Ng and Jordan paper
Naïve Bayes and Logistic Regression
• Design learning algorithms based on our understanding of probability
• Two of the most widely used
• Interesting relationship between these two
• Generative and Discriminative classifiers
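Bayes Rule
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
Which is shorthand for:
$$(\forall i, j)\quad P(Y = y_i \mid X = x_j) = \frac{P(X = x_j \mid Y = y_i)\,P(Y = y_i)}{P(X = x_j)}$$
where $Y$ is a random variable and $y_i$ is its ith possible value.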
Equivalently:
$$(\forall i, j)\quad P(Y = y_i \mid X = x_j) = \frac{P(X = x_j \mid Y = y_i)\,P(Y = y_i)}{\sum_k P(X = x_j \mid Y = y_k)\,P(Y = y_k)}$$
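A minimal numerical sketch of Bayes rule with the denominator computed as a sum over the values of Y. The class names and probabilities below are made up for illustration and are not from the lecture.

```python
def posterior(prior, likelihood):
    """Return P(Y=y | X=x_obs) for every y.

    prior:      dict mapping y -> P(Y=y)
    likelihood: dict mapping y -> P(X=x_obs | Y=y) for one observed x_obs
    """
    # Numerator of Bayes rule for each class: P(X=x_obs | Y=y) * P(Y=y)
    joint = {y: likelihood[y] * prior[y] for y in prior}
    # Denominator: P(X=x_obs), obtained by summing the numerators over y
    evidence = sum(joint.values())
    return {y: p / evidence for y, p in joint.items()}


# Hypothetical numbers: 20% of emails are spam, and the observed word
# appears in 60% of spam emails but only 10% of non-spam emails.
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.6, "ham": 0.1}
post = posterior(prior, likelihood)
print(post)
```

With these assumed numbers the posterior probability of spam comes out to roughly 0.6, even though the prior was only 0.2.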
Common abbreviation:
Bayes Classifier
Training data: [table of training examples with columns X and Y]
Learning = estimating P(X|Y), P(Y)
Classification = using Bayes rule to calculate P(Y | Xnew)
Bayes Classifier
Training data: [table of training examples with columns X and Y]
How shall we represent P(X|Y), P(Y)?
How many parameters must we estimate?
Full joint P(X1, ..., Xn | Y) usually impractical!
Naïve Bayes
Suppose X = ⟨X1, …, Xn⟩, Y discrete-valued
Naïve Bayes assumes
$$P(X_1, \dots, X_n \mid Y) = \prod_i P(X_i \mid Y)$$
i.e., that Xi and Xj are conditionally independent given Y, for all i≠j
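Conditional Independence
Definition: X is conditionally independent of Y given Z, if the probability distribution governing X is independent of the value of Y, given the value of Z
$$(\forall i, j, k)\quad P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$$
Which we often write
$$P(X \mid Y, Z) = P(X \mid Z)$$
E.g.,
$$P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$$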
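To make the definition concrete, here is a small sketch that builds a joint distribution over three boolean variables in which X and Y each depend only on Z, then checks numerically that P(X | Y, Z) equals P(X | Z). All probability values are arbitrary assumptions chosen for illustration.

```python
from itertools import product

# Assumed (made-up) distributions: X and Y each depend only on Z.
p_z = {0: 0.3, 1: 0.7}           # P(Z=z)
p_x1_given_z = {0: 0.9, 1: 0.2}  # P(X=1 | Z=z)
p_y1_given_z = {0: 0.5, 1: 0.1}  # P(Y=1 | Z=z)


def bern(p_one, value):
    """Probability of a boolean value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


# Joint P(X, Y, Z) = P(Z) P(X|Z) P(Y|Z), keyed by (x, y, z).
joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(pred):
    """Sum the joint over all assignments satisfying pred(x, y, z)."""
    return sum(p for (x, y, z), p in joint.items() if pred(x, y, z))


def p_x_given_yz(x, y, z):
    return joint[(x, y, z)] / prob(lambda a, b, c: b == y and c == z)


def p_x_given_z(x, z):
    return prob(lambda a, b, c: a == x and c == z) / prob(lambda a, b, c: c == z)


# Largest difference between P(X|Y,Z) and P(X|Z) over all assignments.
max_gap = max(
    abs(p_x_given_yz(x, y, z) - p_x_given_z(x, z))
    for x, y, z in product((0, 1), repeat=3)
)
print(max_gap)
```

Because the joint was constructed to factor through Z, the gap is zero up to floating-point rounding; a joint built without that factorization would generally show a nonzero gap.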
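Naïve Bayes uses the assumption that the Xi are conditionally independent, given Y
then:
$$P(X_1, X_2 \mid Y) = P(X_1 \mid X_2, Y)\,P(X_2 \mid Y) = P(X_1 \mid Y)\,P(X_2 \mid Y)$$
in general:
$$P(X_1, \dots, X_n \mid Y) = \prod_i P(X_i \mid Y)$$
How many parameters needed now for P(X|Y)? P(Y)?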
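One way to answer the parameter-count question, sketched under the assumption that every Xi is boolean and Y takes k values (the slide does not fix these). The function names are hypothetical helpers, not part of any library.

```python
def full_joint_params(n, k=2):
    """Free parameters for P(X1..Xn | Y) with no independence assumption.

    For each of the k classes there are 2**n joint settings of the boolean
    attributes, and one is determined by the sum-to-1 constraint.
    """
    return k * (2 ** n - 1)


def naive_bayes_params(n, k=2):
    """Free parameters for P(X | Y) under the Naive Bayes assumption.

    Each boolean attribute needs one parameter P(Xi=1 | Y=y) per class.
    """
    return k * n


def prior_params(k=2):
    """Free parameters for P(Y): k values, minus one for the sum-to-1 constraint."""
    return k - 1


for n in (3, 10, 30):
    print(n, full_joint_params(n), naive_bayes_params(n), prior_params())
```

For n = 30 boolean attributes and boolean Y, the unrestricted model needs over two billion parameters while Naive Bayes needs 60, which is the practical motivation for the assumption.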
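Naïve Bayes classification
Bayes rule:
$$P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k)\,P(X_1, \dots, X_n \mid Y = y_k)}{\sum_j P(Y = y_j)\,P(X_1, \dots, X_n \mid Y = y_j)}$$
Assuming conditional independence:
$$P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k)\,\prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j)\,\prod_i P(X_i \mid Y = y_j)}$$
So, classification rule for Xnew = ⟨X1, …, Xn⟩ is:
$$Y^{new} \leftarrow \arg\max_{y_k}\; P(Y = y_k) \prod_i P(X_i^{new} \mid Y = y_k)$$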
Naïve Bayes Algorithm
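• Train Naïve Bayes (examples)
for each* value yk
estimate $\pi_k \equiv P(Y = y_k)$
for each* value xij of each attribute Xi
estimate $\theta_{ijk} \equiv P(X_i = x_{ij} \mid Y = y_k)$
• Classify (Xnew)
$$Y^{new} \leftarrow \arg\max_{y_k}\; P(Y = y_k) \prod_i P(X_i^{new} \mid Y = y_k)$$
* parameters must sum to 1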
Estimating Parameters: Y, Xi discrete-valued
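Maximum likelihood estimates:
$$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$$
$$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}$$
where $\#D\{c\}$ is the number of examples in the training set D that satisfy condition c.
MAP estimates (uniform Dirichlet priors):
$$\hat{\pi}_k = \frac{\#D\{Y = y_k\} + l}{|D| + lK}$$
$$\hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\} + l}{\#D\{Y = y_k\} + lJ}$$
where K is the number of values of Y, J is the number of values of Xi, and l sets the strength of the prior.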
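The following is a minimal sketch of training and classification for discrete-valued attributes, not the course's reference code. It assumes smoothed counts of the add-a-constant form that a uniform Dirichlet prior produces; the parameter name `pseudo`, the function names, and the toy weather data are all hypothetical. Setting `pseudo=0` would give maximum likelihood estimates, but then any zero count makes the logarithm undefined. The sketch also assumes every attribute value seen at classification time appeared in training.

```python
import math
from collections import Counter


def train_nb(examples, pseudo=1.0):
    """examples: list of (x, y) pairs, x a tuple of discrete attribute values."""
    n_attrs = len(examples[0][0])
    total = len(examples)
    class_counts = Counter(y for _, y in examples)
    classes = sorted(class_counts)
    # Observed values of each attribute (defines J for that attribute).
    values = [sorted({x[i] for x, _ in examples}) for i in range(n_attrs)]
    # Count of examples with attribute i equal to v and class y.
    joint_counts = Counter((i, x[i], y) for x, y in examples for i in range(n_attrs))

    # Smoothed estimate of P(Y = y).
    prior = {
        y: (class_counts[y] + pseudo) / (total + pseudo * len(classes))
        for y in classes
    }
    # Smoothed estimate of P(Xi = v | Y = y).
    theta = {
        (i, v, y): (joint_counts[(i, v, y)] + pseudo)
        / (class_counts[y] + pseudo * len(values[i]))
        for i in range(n_attrs)
        for v in values[i]
        for y in classes
    }
    return prior, theta


def classify(x_new, prior, theta):
    """Return the class maximizing P(Y=y) * prod_i P(Xi = x_new[i] | Y=y)."""
    def log_score(y):
        # Sum of logs avoids numerical underflow from multiplying many small numbers.
        return math.log(prior[y]) + sum(
            math.log(theta[(i, v, y)]) for i, v in enumerate(x_new)
        )
    return max(prior, key=log_score)


# Toy data (made up): attributes are (outlook, temperature), label is "play?".
examples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
]
prior, theta = train_nb(examples)
print(classify(("rainy", "hot"), prior, theta))
print(classify(("sunny", "hot"), prior, theta))
```

Working in log space does not change the argmax, since the logarithm is monotonic.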
Learning to classify text documents
• Classify which emails are spam
• Classify which emails are meeting invites
• Classify which web pages are student home pages
How shall we represent text documents for Naïve Bayes?
Baseline: Bag of Words Approach
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
…
Zaire 0
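A bag-of-words vector like the one above can be sketched as follows. The tokenizer (lowercase, letters only), the tiny vocabulary, and the sample sentence are assumptions made for illustration; the lecture does not specify them.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a document to word counts over a fixed vocabulary, ignoring word order."""
    # Assumed tokenization: lowercase the text and keep runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Words outside the vocabulary are dropped; absent words get count 0.
    return {word: counts[word] for word in vocabulary}


# Hypothetical vocabulary and document.
vocab = ["aardvark", "about", "africa", "all", "gas", "oil", "zaire"]
doc = "All about Africa: all about oil and gas."
features = bag_of_words(doc, vocab)
print(features)
```

Each vocabulary entry becomes one attribute Xi, so the Naïve Bayes assumption here amounts to treating word occurrences as independent given the document class.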
For code, see www.cs.cmu.edu/~tom/mlbook.html (click on “Software and Data”)
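What if we have continuous Xi?
E.g., image classification: Xi is the ith pixel
Gaussian Naïve Bayes (GNB): assume
$$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$
Sometimes assume variance
• is independent of Y (i.e., σi),
• or independent of Xi (i.e., σk),
• or both (i.e., σ)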
Estimating Parameters: Y discrete, Xi continuous
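Maximum likelihood estimates:
$$\hat{\mu}_{ik} = \frac{1}{\sum_j \delta(Y^j = y_k)} \sum_j X_i^j\,\delta(Y^j = y_k)$$
$$\hat{\sigma}_{ik}^2 = \frac{1}{\sum_j \delta(Y^j = y_k)} \sum_j \left(X_i^j - \hat{\mu}_{ik}\right)^2 \delta(Y^j = y_k)$$
where the superscript j refers to the jth training example, and δ(x)=1 if x true, else 0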
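A minimal pure-Python sketch of Gaussian Naïve Bayes with maximum likelihood estimates: a per-class, per-attribute mean and variance (dividing by the class count, not count minus one), plus a class prior estimated by relative frequency. The function names and toy data are hypothetical, and the sketch assumes every class has nonzero variance in every attribute.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Return {label: (prior, means, variances)} estimated from the data."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per attribute
        means = [sum(col) / n for col in columns]
        # Maximum likelihood variance: mean squared deviation within the class.
        variances = [
            sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(x_new, params):
    """Pick the class with the largest log prior plus summed log densities."""
    def score(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(x_new, means, variances)
        )
    return max(params, key=score)


# Toy data (made up): two continuous attributes, two classes.
X = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [12.0, 2.0]]
y = ["a", "a", "b", "b"]
params = fit_gnb(X, y)
print(params["a"])
print(predict([2.5, 3.0], params), predict([11.0, 0.5], params))
```

In practice a small constant is often added to each variance to avoid division by zero when an attribute is constant within a class; that safeguard is omitted here to keep the estimates purely maximum likelihood.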
Example: GNB for classifying mental states
~1 mm resolution
~2 images per sec.
15,000 voxels/image
non-invasive, safe
measures Blood Oxygen Level Dependent (BOLD) response
[Figure: typical impulse response; time axis marked at 10 sec]
Brain scans can track activation with precision and sensitivity
Gaussian Naïve Bayes: Learned $\mu_{voxel,word}$
P(BrainActivity | WordCategory = People)
Learned Bayes Models – Means for P(BrainActivity | WordCategory)
[Figure panels: Animal words, People words]
Pairwise classification accuracy: 85%
Plot of single-voxel classification accuracies.
Gaussian naïve Bayes classifier
(yellow and red are most predictive).
Subject 1 Subject 2 Subject 3
What you should know:
• Learning (generative) classifiers based on Bayes rule
• Conditional independence
– What it is
– Why it’s important
• Naïve Bayes assumption and its consequences
• Naïve Bayes with discrete inputs, continuous inputs