04 Logistic Regression
Logistic Regression
Some slides adapted from Dan Jurafsky and Brendan O’Connor
Instructor: Wei Xu
Naïve Bayes Recap
• Bag of words (order independent)
• Features are assumed independent given class
Q: Is this really true?
The problem with assuming conditional independence
• Correlated features → double-counting evidence
  – Parameters are estimated independently
• This can hurt classifier accuracy and calibration
Logistic Regression
• Doesn’t assume features are independent
• Correlated features don’t “double count”
What are “Features”?
• A feature function, f
  – Input: a document, D (a string)
  – Output: a feature vector, X
What are “Features”?
Features don’t have to be just “bag of words”
Feature Templates
• Typically “feature templates” are used to generate many features at once
• For each word w:
  – ${w}_count
  – ${w}_lowercase
  – ${w}_with_NOT_before_count
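Templates like the ones above might be expanded along these lines; the function name and the exact feature-naming scheme here are illustrative, not prescribed by the slides:

```python
def featurize(doc):
    # Expand the slide's templates for each word; negation detection here
    # is a deliberately crude illustration (looks one token back).
    tokens = doc.split()
    features = {}
    for i, w in enumerate(tokens):
        key = w.lower()
        features[key + "_count"] = features.get(key + "_count", 0) + 1
        features[key + "_lowercase"] = int(w == w.lower())
        if i > 0 and tokens[i - 1].lower() in ("not", "n't"):
            k = key + "_with_NOT_before_count"
            features[k] = features.get(k, 0) + 1
    return features
```

For example, `featurize("did not like it")` includes `like_with_NOT_before_count = 1`.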
Logistic Regression: Example
• Compute Features:
• Assume we are given some weights:
Logistic Regression: Example
• Compute features
• We are given some weights
• Compute the dot product:
Logistic Regression: Example
• Compute the dot product:
• Compute the logistic function: convert the linear combination (the dot product) into a probability in [0, 1]
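The dot-product-then-squash pipeline can be sketched in a few lines; the weights and feature values below are made up purely for illustration:

```python
import math

def predict_prob(weights, features):
    # Linear combination: the dot product of weights and feature values
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Logistic (sigmoid) function: squashes z into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights and feature counts
weights = {"great": 1.9, "awful": -2.1, "bias": 0.1}
features = {"great": 2, "awful": 1, "bias": 1}
p = predict_prob(weights, features)  # sigmoid(2*1.9 - 2.1 + 0.1) = sigmoid(1.8) ≈ 0.86
```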
The Logistic function
Logistic Regression
• (Log) Linear Model – similar to Naïve Bayes
The Dot Product
• Intuition: a weighted sum of features
• All linear models have this form
Log Linear Model
• A log-linear model is a mathematical model whose logarithm is a linear combination of the parameters, i.e. it has the form exp(θ ⋅ f(x))
Naïve Bayes as a log-linear model
• Q: what are the features?
• Q: what are the weights?
Naïve Bayes as a Log-Linear Model
Naïve Bayes as a Log-Linear Model
In both naïve Bayes and logistic regression we compute the dot product!
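One way to make the correspondence concrete (with made-up probabilities): treat the log probabilities as weights and the word counts as features, and the naïve Bayes class score is exactly a dot product:

```python
import math

# Hypothetical naive Bayes parameters for a single class y:
log_prior = math.log(0.5)            # log P(y)
log_prob = {"good": math.log(0.2),   # log P(w | y)
            "movie": math.log(0.1)}

# Naive Bayes class score: log P(y) + sum over words of count(w) * log P(w | y)
counts = {"good": 2, "movie": 1}
nb_score = log_prior + sum(c * log_prob[w] for w, c in counts.items())

# The same score as a dot product theta . f(x): the weights are log
# probabilities (plus the log prior on an always-on bias feature),
# and the features are word counts.
theta = dict(log_prob, bias=log_prior)
f = dict(counts, bias=1)
dot_product = sum(theta[k] * f[k] for k in f)
```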
NB vs. LR
• Both compute the dot product
• NB: sum of log probabilities
• LR: logistic function
NB vs. LR: Parameter Learning
• Naïve Bayes:
  – Learn conditional probabilities independently, by counting
• Logistic Regression:
  – Learn weights jointly
LR: Learning Weights
• Given: a set of feature vectors and labels
• Goal: learn the weights
LR: Learning Weights
(Figure: documents, their feature vectors, and their labels)
Q: what parameters should we choose?
• What is the right value for the weights?
• Maximum Likelihood Principle:
  – Pick the parameters that maximize the probability of the y labels in the training data, given the observations x
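Written out, this principle chooses the weights that maximize the conditional likelihood of the training labels; maximizing the log-likelihood is equivalent, since log is monotonic:

```latex
\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} P(y_i \mid x_i; \theta)
             = \arg\max_{\theta} \sum_{i=1}^{n} \log P(y_i \mid x_i; \theta)
```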
Maximum Likelihood Estimation
Maximum Likelihood Estimation
• Unfortunately, there is no closed-form solution (unlike naïve Bayes, where there was one)
Closed Form Solution
• A closed-form solution is a direct formula that can be evaluated in a fixed number of steps, with no loops or iteration
• e.g. the sum of the integers from 1 to n

Iterative algorithm:

    s = 0
    for i in 1 to n
        s = s + i
    end for
    print s

Closed-form solution:

    s = n(n + 1)/2
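The two approaches can be run side by side to confirm they agree (a minimal Python sketch):

```python
def sum_iterative(n):
    # Iterative algorithm: n additions
    s = 0
    for i in range(1, n + 1):
        s = s + i
    return s

def sum_closed_form(n):
    # Closed-form solution: one multiplication and one division
    return n * (n + 1) // 2

# Both give the same answer
assert sum_iterative(100) == sum_closed_form(100) == 5050
```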
Maximum Likelihood Estimation
• Solution:
  – Iteratively climb the log-likelihood surface, using the derivative with respect to each weight
• Luckily, the derivatives turn out to have a nice form
Gradient Ascent
• While not converged: for each feature j, compute the derivative and add it to the corresponding weight
• ℓ(θ): training-set log-likelihood
• ∇ℓ(θ): gradient vector
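The loop above can be sketched for binary logistic regression, where the derivative of the log-likelihood with respect to weight j is the sum over examples of (y_i − p_i) · x_ij. The toy data, step size, and iteration count below are illustrative choices, not from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_ascent(X, y, step=0.1, iters=1000):
    # Batch gradient ascent on the training-set log-likelihood.
    # Derivative for weight j: sum_i (y_i - p_i) * x_ij
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(t * x for t, x in zip(theta, xi)))
            for j, xij in enumerate(xi):
                grad[j] += (yi - p) * xij
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta

# Toy data: first feature is a bias term; label is 1 when the second feature is positive
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = gradient_ascent(X, y)
```

After training, sigmoid(θ ⋅ x) is close to 1 for the positive examples and close to 0 for the negative ones.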