TRANSCRIPT
07/02/2017
1
Data Mining Linear and Logistic Regression
Michael Li
1 of 26 © University of Stirling 2017 CSCU9T6 Information Systems
Regression
• In statistical modelling, regression analysis is a statistical process for estimating the relationships among variables.
• Regression models are built from data to predict the average you would expect one variable to have, given you know the value of one or more others.
• Simple linear regression maps one variable onto the mean value of another.
Example: weight-height relation
[Figure: scatter plot of Weight against Height, with a fitted straight line]

y_i = b·x_i + a + ε_i
Simple Linear Regression
• To find the best values for a and b, simple linear regression uses a method known as ordinary least squares (OLS)
• Least squares means that the sum of the squared distance between each data point and its associated prediction is minimised
• That is, it minimises
Σ_{i=1}^{n} (y_i − ŷ_i)²
Finding a and b
• In the case of simple linear regression, a and b can be calculated as follows:
b = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

a = ȳ − b·x̄
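The slope and intercept formulas above can be sketched directly in Python; the height/weight pairs here are invented for illustration, not the lecture's dataset.

```python
# Ordinary least squares for y = b*x + a, computed exactly from the
# closed-form formulas above. The height/weight pairs are invented.

def ols_fit(xs, ys):
    """Return (a, b) minimising the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

heights = [150, 160, 170, 180, 190]
weights = [55, 62, 68, 76, 84]
a, b = ols_fit(heights, weights)
print(f"weight = {b:.3f} * height + {a:.3f}")
```

On data that lies exactly on a line, the formulas recover the line's true slope and intercept.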
Multiple Regression
• With multiple inputs, the general form of linear regression is

y_i = b_0 + b_1·x_1i + b_2·x_2i + b_3·x_3i + …

or, in matrix form, Y = Xb

• The parameters in b are calculated as

b = (XᵀX)⁻¹·XᵀY
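The normal-equations solution b = (XᵀX)⁻¹XᵀY can be sketched in pure Python with a small hand-rolled Gaussian elimination routine; the data matrix here, including its leading column of ones for the intercept, is made up for illustration.

```python
# Multiple linear regression via the normal equations (X^T X) b = X^T Y.
# A minimal sketch with no external libraries; X's first column of ones
# carries the intercept b0.

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def regress(X, Y):
    """Return b minimising ||Y - Xb||^2 by solving (X^T X) b = X^T Y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
    return solve(XtX, XtY)

# Data generated from y = 1 + 2*x1 + 3*x2 (intercept column first).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
Y = [1, 3, 4, 6, 8]
b_hat = regress(X, Y)
print(b_hat)  # close to [1.0, 2.0, 3.0]
```

In practice a stats package or linear algebra library does this step; the explicit solve is only to make the formula concrete.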
Stats Packages
• Many statistics packages (such as SPSS) offer multiple regression
• Multiple regression assumes there is a linear relationship between the inputs and the output
• Widely used in many fields
– Trend line
– Risk of investment
Logistic Regression
• But what if one of the variables is a class, rather than a number?
• For example, let’s say we have data describing height and gender
• When we want to predict height from gender, it is easy – just calculate the average height of males and that of females, and that is it
• What if you want to predict gender from height?
Logistic Regression
• There is no ‘average’ gender for a given height
• Better to predict the probability of being male (or female) given a height value
• One way to do this is to recode the classes, for example Male =0 and Female = 1
• Then you can do a regression
Linear Class Regression
• Problems
– Probability values go outside [0,1]
– Violates other assumptions made by linear regression
P(c|x) = ax + b

[Figure: gender code (0 or 1) plotted against height, with the fitted line y = −0.0277x + 2.1686 running outside the [0, 1] range]
There is a Better Way
• Leave the class labels as they are (Male, Female, in this case)
• Calculate a probability based on log odds
Odds
• The odds of an event c (being male, for example) are

P(c) / (1 − P(c))

• 0.5/0.5 = 1
• 0.75/0.25 = 3
• So odds mean ‘times as probable’
Odds and Probability
• Odds lack a desirable symmetry: the odds of male are not the opposite of the odds of female
[Figure: Probability plotted against Odds; the curve is asymmetric]
Log Odds
• Note that ln(x) = -ln(1/x)
• So we take the log odds and get a function known as the logit
ln( P(c) / (1 − P(c)) )

[Figure: Probability plotted against Log Odds (Logit); a symmetric S-shaped curve]
Logistic Regression
• Instead of trying to predict P(c|x)=ax + b
• We can predict the log odds given x

ln( P(c|x) / (1 − P(c|x)) ) = ax + b

• Solving this equation (later ...) gives us the logistic regression curve we need
Logit to Probability
• OK, but if I say “The logit of x being male is 0.8”, you may not know what I mean
• We can get back to probabilities:
ln( P(c|x) / (1 − P(c|x)) ) = ax + b

P(c|x) / (1 − P(c|x)) = e^(ax + b)

P(c|x) = e^(ax + b) / (1 + e^(ax + b)) = 1 / (1 + e^−(ax + b))
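The final line of the derivation is the logistic (sigmoid) function; a minimal sketch, with made-up values for a and b:

```python
# The logistic function maps a logit (log odds) back to a probability.
# The coefficients a and b below are hypothetical, for illustration only.
import math

def logit_to_prob(logit):
    """P = 1 / (1 + e^-logit), the inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-logit))

a, b = 0.8, -1.2            # hypothetical fitted values
for x in (0.0, 1.5, 3.0):
    print(x, logit_to_prob(a * x + b))
```

A logit of 0 corresponds to a probability of exactly 0.5; large positive or negative logits approach 1 or 0.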
Finding a and b
• All we need to do now is solve the set of equations that result from plugging our data into

P(c|x) = 1 / (1 + e^−(ax + b))

• But there is a problem
• For a given x (height) we don’t have a probability measure, we have a 1 or 0
Maximum Likelihood
• Let’s say we want to guess a parameter that predicts a probability (which, in this case, we do, but the method is more general)
• We can test a candidate value for the parameter using Maximum Likelihood
• Likelihood is the reverse of a conditional probability:
L(x|y) = P(y|x)
Maximum Likelihood
• Consider tossing a coin
• We want to know the probability distribution of tossing this coin
• Assume that we observed 40 heads in 100 tosses. What is the probability of a head?
Maximum Likelihood
L(x|y) = P(y|x)

• 40 heads in 100 tosses
• Compare two candidate models: P(head) = 0.5 and P(head) = 0.4
• The observed data are more likely under P(head) = 0.4, so it is the maximum-likelihood estimate
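A quick check of the two candidate values, using the binomial likelihood of the slide's data (40 heads in 100 tosses):

```python
# Likelihood of candidate values of P(head) given 40 heads in 100 tosses,
# computed as the binomial probability of the observed data.
from math import comb

def likelihood(p, heads=40, tosses=100):
    """L(p | data) = C(tosses, heads) * p^heads * (1 - p)^(tosses - heads)."""
    return comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)

print(likelihood(0.5))   # likelihood if the coin were fair
print(likelihood(0.4))   # likelihood at the observed frequency
```

The observed frequency 0.4 gives the higher likelihood, which is exactly why it is the maximum-likelihood estimate.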
Likelihood of a Model
• Call our data set D and imagine we want to estimate a single parameter, a
• The likelihood of the parameter, given the data is
• The probability of the data is
L(a|D) = P(D|a)

P(D|a) = ∏_{d∈D} p(d|a)
Likelihood of a Model
• The likelihood of a model is a measure of how well the parameters guess at the true distribution, without ever needing to know the true distribution
• Note that P(c|x) does not appear in the formula, and we don’t need to know it
• P(d|a) is the estimate by the model of the probability of each data point
Maximum Likelihood Logistic Regression
1. Pick a value for a and b
2. Plug those values into P(x) = 1 / (1 + e^−(ax + b)) for every value of x in the data
3. Find the product of all of these values by multiplying them together (using 1 − P(x) for data points in the other class)
4. Record that value as the likelihood
5. Choose better values for a and b and repeat
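The five steps above can be sketched as a brute-force grid search; the tiny height/gender dataset (1 = female here) and the grid ranges are invented for illustration.

```python
# Brute-force maximum likelihood: evaluate the likelihood on a grid of
# (a, b) pairs and keep the best. Data and grid ranges are made up.
import math

def prob(x, a, b):
    """P(class 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

def likelihood(data, a, b):
    """Product of P for class-1 points and (1 - P) for class-0 points."""
    lik = 1.0
    for x, y in data:
        p = prob(x, a, b)
        lik *= p if y == 1 else 1.0 - p
    return lik

data = [(150, 1), (155, 1), (160, 1), (170, 0), (180, 0), (185, 0)]

# Steps 1-5: try many (a, b) values and record the most likely pair.
best = max(((a / 10.0, float(b), likelihood(data, a / 10.0, b))
            for a in range(-20, 21) for b in range(-50, 51)),
           key=lambda t: t[2])
print(best)
```

Real implementations replace the grid with an iterative optimiser, but the idea of "choose better values and repeat" is the same.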
Log Likelihood
• One more problem to fix ...
• Multiplying many small probabilities together soon suffers from arithmetic underflow – the number is too small to represent or compare
• The solution is to take logs and sum because
ln(a·b) = ln(a) + ln(b)
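A small demonstration of why this matters; the probabilities are arbitrary:

```python
# Multiplying many small probabilities underflows to zero; summing
# their logs keeps the value representable.
import math

probs = [0.01] * 500

product = 1.0
for p in probs:
    product *= p
print(product)      # underflows to 0.0

log_lik = sum(math.log(p) for p in probs)
print(log_lik)      # about -2302.6, perfectly representable
```

Since ln is monotonic, maximising the log likelihood finds the same a and b as maximising the likelihood itself.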
An example of using logistic regression
• Can I get a mortgage with my credit rating?
Credit score   Result
85             1
75             1
73             0
64             0
69             1
…

[Figure: fitted logistic curve of P(Fail) against Credit score, labelled P(Fail | score)]
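The five visible rows of the table can be fitted by gradient ascent on the log likelihood; a rough sketch (the “…” row is omitted because its values are not shown, and Result = 1 is taken as the positive class):

```python
# Logistic regression on the five visible (score, result) rows, fitted
# by gradient ascent on the log likelihood. Scores are standardised so
# a fixed step size behaves well.
import math

scores = [85, 75, 73, 64, 69]
results = [1, 1, 0, 0, 1]

mean = sum(scores) / len(scores)
std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
xs = [(s - mean) / std for s in scores]

a, b, rate = 0.0, 0.0, 0.1
for _ in range(5000):
    preds = [1.0 / (1.0 + math.exp(-(a * x + b))) for x in xs]
    # Gradient of the log likelihood with respect to a and b.
    a += rate * sum((y - p) * x for x, y, p in zip(xs, results, preds))
    b += rate * sum(y - p for y, p in zip(results, preds))

def p_result_1(score):
    """Predicted P(Result = 1) for a raw credit score."""
    return 1.0 / (1.0 + math.exp(-(a * (score - mean) / std + b)))

print(p_result_1(85), p_result_1(64))
```

With only five rows the fitted curve is very rough, but it already predicts a higher probability of Result = 1 for higher scores.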
Logistic regression
• “Rule of Ten”: a widely used rule of thumb states that logistic regression models give stable estimates for the explanatory variables only if the data contain a minimum of about ten events per explanatory variable.
• Sampling: as a rule of thumb, sampling controls at a rate of five times the number of cases will produce sufficient control data.
• Convergence: in some instances the model may not reach convergence.