Intro to Logistic Regression
TRANSCRIPT
Slide 1
Logistic Regression
Jacquelyn Victoria & Tamer Wahba
Slide 2
Slide Ownership
Jacquelyn Victoria: slides 3 to 9
Tamer Wahba: slides 10 to 15
Slide 3
Regression Analysis + Classification
How can we predict a nominal class using regression analysis?
Consider a binary class:
Each instance x is a vector of feature values
Our output values, or class labels, are restricted to 0 or 1, i.e. f(x) ∈ {0, 1}
We need an h(x) where 0 < h(x) < 1
We need a function which exhibits this behavior
Slide 4
Logistic Function: the Sigmoid Function σ(x) = 1 / (1 + e^(−x))
Asymptotes at y = 1 and y = 0
Easy to specify a threshold (σ(0) = 0.5)
The output is interpreted as P(y = 1)
As a result: hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx)), where θ is a vector of weights
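The two formulas above can be sketched in plain Python; the function names here are my own, not from the slides:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x); theta and x are equal-length lists of floats."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)
```

Since σ(0) = 0.5, the sign of θᵀx alone decides which side of the 0.5 threshold a prediction falls on.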
Slide 5
Cost Function
Need to find an hθ(x) that is a logistic function representing our data
Need to find θ to fit our data
Cost of a single prediction: −log(hθ(x)) if y = 1, −log(1 − hθ(x)) if y = 0
Averaged over m training examples: J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log hθ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾)) ]
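A small sketch of this cost function, assuming the standard cross-entropy form written above (names are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ],
    where h_i = sigmoid(theta^T x_i)."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m
```

A quick sanity check: with θ = 0 every prediction is 0.5, so J(θ) = log 2 regardless of the data.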
Slide 6
Gradient Descent
In order to find the minimum, we can use the partial derivative of J(θ):
do {
    θⱼ := θⱼ − α Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾   (simultaneously for every j)
} until θ converges
where α is the learning rate (almost always between 0 and 1; 0.1 to 0.3 is usually a good range)
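The loop above can be sketched as batch gradient descent in plain Python. This is a minimal version: it runs a fixed number of iterations rather than testing θ for convergence, and the function names are my own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.3, iters=5000):
    """Batch gradient descent for logistic regression.
    X: rows of feature vectors with x[0] = 1 for the intercept; y: 0/1 labels."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # gradient of J(theta): (1/m) * sum_i (h_theta(x_i) - y_i) * x_i
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
            for j in range(n):
                grad[j] += err * xi[j] / m
        # update every theta_j simultaneously
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```

A production version would stop when the update to θ falls below a tolerance, matching the "until θ converges" condition on the slide.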
Slide 7
Maximum Likelihood Estimation
Equivalently, maximize the log-likelihood ℓ(θ) with gradient ascent:
do {
    θⱼ := θⱼ + α Σᵢ (y⁽ⁱ⁾ − hθ(x⁽ⁱ⁾)) xⱼ⁽ⁱ⁾
} until θ converges
θ can also be calculated using Iteratively Reweighted Least Squares
Multinomial data uses Softmax Regression
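The slide names Iteratively Reweighted Least Squares without details. One common form is Newton's method on the log-likelihood: θ ← θ + (XᵀWX)⁻¹ Xᵀ(y − p) with W = diag(pᵢ(1 − pᵢ)). A sketch under that assumption, restricted to a two-parameter model (intercept t0, slope t1) so the 2x2 system can be solved in closed form:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def irls(xs, ys, iters=25):
    """Newton/IRLS for logistic regression with rows [1, x]:
    theta += (X^T W X)^{-1} X^T (y - p), W = diag(p_i * (1 - p_i))."""
    t0 = t1 = 0.0
    for _ in range(iters):
        a = b = d = g0 = g1 = 0.0
        for x, yv in zip(xs, ys):
            p = sigmoid(t0 + t1 * x)
            w = p * (1.0 - p)
            a += w          # X^T W X entry (1,1)
            b += w * x      # X^T W X entries (1,2) and (2,1)
            d += w * x * x  # X^T W X entry (2,2)
            g0 += yv - p    # X^T (y - p), intercept component
            g1 += (yv - p) * x
        det = a * d - b * b
        # multiply the gradient by the inverse of [[a, b], [b, d]]
        t0 += (d * g0 - b * g1) / det
        t1 += (-b * g0 + a * g1) / det
    return t0, t1
```

Newton steps typically converge in far fewer iterations than gradient ascent; a useful check is that at the maximum-likelihood solution the residuals y − p sum to zero whenever an intercept is included.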
Slide 8
Interpreting the Hypothesis
Recall that σ(0) = 0.5 and that hθ(x) = σ(θᵀx)
So hθ(x) ≥ 0.5 exactly when θᵀx ≥ 0: the set of points where θᵀx = 0 is the decision boundary
(Plot: the decision boundary separating the two classes in the x1-x2 plane)
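To make the threshold concrete: with hypothetical weights θ = (−3, 1, 1) (my choice for illustration, not from the slides), θᵀx = −3 + x1 + x2, so the decision boundary is the line x1 + x2 = 3:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, chosen for illustration: the boundary
# theta^T x = 0 becomes the line x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]

def predict(x1, x2):
    """Predict class 1 when h_theta(x) >= 0.5, i.e. when theta^T x >= 0."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if sigmoid(z) >= 0.5 else 0
```

Points below the line (like the origin) get class 0; points on or above it get class 1.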
Slide 9
Interpreting hθ
I want to create a model that gives the probability that I will pass a test given how many hours I have studied.

| Hours | 0.50 | 0.75 | 1.00 | 1.25 | 1.50 | 1.75 | 1.75 | 2.00 | 2.25 | 2.50 | 2.75 | 3.00 | 3.25 | 3.50 | 4.00 | 4.25 | 4.50 | 4.75 | 5.00 | 5.50 |
|-------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| Pass  | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 1    | 0    | 1    | 0    | 1    | 0    | 1    | 1    | 1    | 1    | 1    | 1    |

Using this generated model, calculate my probability of passing given that I have studied 3 hours:
P(pass | study time = 3) = 0.61
(source)
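Fitting the table above reproduces the slide's number. This sketch uses a Newton/IRLS fit (two parameters: intercept and hours); the fitted probability at 3 hours comes out near 0.61:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Data from the slide: hours studied and pass (1) / fail (0).
hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def fit(xs, ys, iters=25):
    """Newton's-method fit of P(pass) = sigmoid(t0 + t1 * hours)."""
    t0 = t1 = 0.0
    for _ in range(iters):
        a = b = d = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(t0 + t1 * x)
            w = p * (1.0 - p)
            a += w; b += w * x; d += w * x * x   # X^T W X
            g0 += y - p; g1 += (y - p) * x       # X^T (y - p)
        det = a * d - b * b
        t0 += (d * g0 - b * g1) / det
        t1 += (-b * g0 + a * g1) / det
    return t0, t1

t0, t1 = fit(hours, passed)
p_at_3 = sigmoid(t0 + t1 * 3.0)   # about 0.61
```

The slope t1 is positive, as expected: more hours of study raises the predicted probability of passing.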
Slide 10
Logistic Regression Compared to Other Classifiers
Naive Bayes
Support Vector Machines
Decision Trees
Slide 11
vs Decision Tree
Assumptions:
DT: decision boundaries parallel to the axes
LR: one smooth boundary
Decision trees can be used when there are multiple decision boundaries
Slide 12
vs Naive Bayes
Feature weights:
NB: each weight is set independently, depending on the class
LR: weights are set together, such that the decision function tends to be high for positive classes and low for negative classes
Correlated features do not receive undue weight in logistic regression, unlike in Naive Bayes
Slide 13
vs Support Vector Machine
Both attempt to find a hyperplane separating the training samples
SVM: finds the solution with the maximum margin
LR: finds any solution that separates the instances
SVM is a hard classifier, while LR is probabilistic
Slide 14
Advantages
Works well with diagonal decision boundaries
Does not give undue weight to correlated features
Probabilistic outcomes
Disadvantages
Requires a large sample size for stable results
Slide 15
Use Cases
Categorical outcomes
Large sample data
Minimal preprocessing
Slide 16
For more info...
Helpful links to go into more depth with Logistic Regression:
Stanford Open Course (Logit regression section)
Logit Regression Tutorial (exercises in MATLAB)
Logit Regression Tutorial (no code)
How to use Logit Regression in Python
How to use Logit Regression in R
How to use Logit Regression in Java using Weka