
Upload: ellen-rich

Post on 17-Dec-2015


TRANSCRIPT

Page 1:

Overview of Logistic Regression and its SAS Implementation

• Logistic regression is widely used in finance, marketing research and clinical studies when the dependent variable is dichotomous, representing an event or a non-event. Because ordinary linear regression was routinely used for such data before modern statistical packages made logit analysis practical, we will first compare the statistical assumptions of logistic regression with those of ordinary least squares linear regression.

• Next we will examine PROC LOGISTIC as implemented in SAS and discuss the basic statistical output needed to understand the logistic regression results.

• We will then discuss how to set up and interpret logistic regression when the dependent variable has more than two outcomes.

• We will conclude the presentation by comparing PROC LOGISTIC with other SAS procedures that can also perform logistic regression.

Page 2:

Examples of discrete responses:

– Getting a disease vs. not getting a disease

– Good, medium and bad credit risks

– Responders vs. non-responders (in both marketing and clinical trial studies)

– Married vs. unmarried

– Guilty vs. not guilty

Page 3:

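Comparing linear and logistic regression

• Linear Probability Model

$$p_i = \alpha + \beta x_i$$

• Logit Model

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

$$p_i = \frac{1}{1 + \exp(-\alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik})}$$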
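
The contrast between the two models can be sketched numerically. The snippet below is a minimal illustration in Python (not SAS), using hypothetical coefficients `alpha` and `beta` that are not taken from the presentation: the linear probability model can return values below 0 or above 1, while the logistic form always stays strictly between 0 and 1, and the logit of the logistic probability recovers the linear predictor.

```python
import math

def linear_probability(alpha, beta, x):
    """Linear probability model: p = alpha + beta * x (not bounded)."""
    return alpha + beta * x

def logistic_probability(alpha, beta, x):
    """Logit model solved for p: p = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only to make the contrast visible.
alpha, beta = -0.5, 0.05

for x in (0, 20, 40):
    lin = linear_probability(alpha, beta, x)
    log_p = logistic_probability(alpha, beta, x)
    print(f"x={x:2d}  linear={lin:+.3f}  logistic={log_p:.3f}")
```

With these made-up values the linear model gives a negative "probability" at x = 0 and a value above 1 at x = 40, which is the interpretive problem the logit link avoids.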

Page 4:

Why can linear regression work reasonably well on binary dependent variables?

Assumptions of the linear regression model:

1) $y_i = a + b x_i + \varepsilon_i$

2) $E(\varepsilon_i) = 0$

3) $\mathrm{Var}(\varepsilon_i) = \sigma^2$

4) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$

5) $\varepsilon_i \sim \mathrm{Normal}$

| Assumption | Consequence of violation | Notes |
|---|---|---|
| 1 | Biased parameter estimates | Parameters are hard to interpret, except as a linear approximation to a nonlinear function. Predictions can be < 0 or > 1. |
| 2 | Biased intercept estimate | |
| 3 | Unbiased estimates, but biased variance of $\hat{b}$ | Biased confidence intervals |
| 4 | Same as 3 | |
| 5 | The t and F statistical tests cannot be used for the regression model | The estimates may still be approximately normal if the sample size is large. |

• If 1) and 2) are true for a binary dependent variable, it can be shown that 3) and 5) are necessarily false. However, the consequences may not be as serious as you might expect.
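
The claim that 3) and 5) must fail can be checked with a short derivation. This is a sketch under the stated assumptions only: $y_i$ takes the values 0 and 1, and 1) and 2) hold, so that $p_i = \Pr(y_i = 1) = a + b x_i$.

$$\varepsilon_i = y_i - p_i = \begin{cases} 1 - p_i & \text{with probability } p_i \\ -p_i & \text{with probability } 1 - p_i \end{cases}$$

$$E(\varepsilon_i) = p_i (1 - p_i) - (1 - p_i)\, p_i = 0$$

$$\mathrm{Var}(\varepsilon_i) = p_i (1 - p_i)^2 + (1 - p_i)\, p_i^2 = p_i (1 - p_i)$$

Because $p_i$ changes with $x_i$, the error variance is not a constant $\sigma^2$, so 3) fails; and because $\varepsilon_i$ can take only two values, it cannot be normally distributed, so 5) fails.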

Page 5:

Logistic regression for binary response variables

Basic Syntax:

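/* Fit the model, save the parameter estimates, and request the
   classification table and ROC data. */
proc logistic data=chdage1 outest=parms descending;
    model chd = age /
        selection = stepwise
        ctable pprob = (0 to 1 by 0.1)
        outroc = roc1;
run;

/* Apply the saved parameter estimates to a data set. */
proc score data=chdage1 score=parms out=scored type=parms;
    var age;
run;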

In the events/trials syntax, you specify two variables that contain count data for a binomial experiment. These two variables are separated by a slash. The value of the first variable, events, is the number of positive responses (or events). The value of the second variable, trials, is the number of trials.
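
To make the events/trials layout concrete, here is a minimal sketch in Python (not SAS) that collapses individual binary records into one row of counts per covariate value. The records and the name `to_events_trials` are hypothetical and serve only to illustrate the shape of the data; the resulting two counts correspond to the events and trials variables described above.

```python
from collections import defaultdict

# Hypothetical individual-level records: (age, chd), with chd = 1 for an event.
records = [(30, 0), (30, 1), (30, 0), (40, 1), (40, 1), (40, 0), (40, 1)]

def to_events_trials(rows):
    """Collapse (key, 0/1 outcome) rows into {key: (events, trials)}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [events, trials]
    for key, outcome in rows:
        counts[key][0] += outcome  # number of positive responses
        counts[key][1] += 1        # number of trials
    return {key: tuple(pair) for key, pair in sorted(counts.items())}

grouped = to_events_trials(records)
for age, (events, trials) in grouped.items():
    print(f"age={age}  events={events}  trials={trials}")
```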

Page 6:

Interpretation of SAS output - continued

• Model Convergence Status:

– Convergence: the difference in parameter estimates between successive iterations is small enough.

• Model Fit Statistics:

– Likelihood function: $L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$

– -2 Log L = -2 * log(likelihood)

– AIC = -2 * log(max likelihood) + 2 * k

– SIC = -2 * log(max likelihood) + log(N) * k (labeled SC in the SAS output)

– Here k is the number of estimated parameters and N is the number of observations.

• Testing Global Null Hypothesis: BETA=0

– Likelihood ratio: -2 * [ln(L intercept) - ln(L intercept + covariates)]

– Score: based on the 1st and 2nd derivatives of log(L)

– Wald: (coefficient / std error)^2
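
The fit statistics can be reproduced by hand from fitted probabilities. The sketch below is in Python (not SAS) and uses a tiny hypothetical data set; `p_full` is an invented set of fitted probabilities, not a maximum likelihood fit. It evaluates the Bernoulli log-likelihood, -2 Log L, AIC and SC, and the likelihood ratio chi-square, taken here as -2 times the difference between the intercept-only and full log-likelihoods.

```python
import math

def log_likelihood(y, p):
    """log L = sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def fit_statistics(log_lik, k, n):
    """-2 Log L, AIC and SC for k estimated parameters and n observations."""
    neg2 = -2.0 * log_lik
    return {"-2 Log L": neg2,
            "AIC": neg2 + 2 * k,
            "SC": neg2 + math.log(n) * k}

# Hypothetical data: four observations, two events.
y = [0, 0, 1, 1]
p_null = [0.5] * 4             # intercept-only model: p_i = overall event rate
p_full = [0.2, 0.4, 0.6, 0.8]  # made-up probabilities from a one-covariate model

ll_null = log_likelihood(y, p_null)
ll_full = log_likelihood(y, p_full)

# Likelihood ratio chi-square for the global null hypothesis.
lr_chi_square = -2.0 * (ll_null - ll_full)

print(fit_statistics(ll_null, k=1, n=len(y)))
print(fit_statistics(ll_full, k=2, n=len(y)))
print("LR chi-square:", round(lr_chi_square, 4))
```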

Page 7:

Interpretation of SAS output - continued

• Analysis of Maximum Likelihood Estimates

– Parameter estimates and significance tests

• Odds Ratio Estimates

– Odds: $O_i = \exp\left(\sum_{j=0}^{k} \beta_j x_{ij}\right)$, with $x_{i0} = 1$ for the intercept

– Odds ratio: O_i / O_j per unit change in a covariate.

• Association of Predicted Probabilities and Observed Responses

– Pairs: 43 (event) * 57 (non event) = 2451

– Concordant (0 has the lower predicted probability, 1 has the higher)

– Discordant (0 has the higher predicted probability, 1 has the lower)

– Tie: all other pairs

• The ROC curve is used to visualize the model's predictive strength.
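
Both ideas on this slide reduce to simple arithmetic. The sketch below is in Python (not SAS), with hypothetical coefficients and predicted probabilities. It checks that a one-unit change in a covariate multiplies the odds by exp(beta), and it counts concordant, discordant and tied event/non-event pairs. Ties are counted here only when two probabilities are exactly equal, which is a simplifying assumption of this sketch.

```python
import math
from itertools import product

def logistic(alpha, beta, x):
    """Predicted probability from a one-covariate logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def pair_association(event_probs, nonevent_probs):
    """Count (concordant, discordant, tied) over all event/non-event pairs."""
    concordant = discordant = tied = 0
    for p_event, p_nonevent in product(event_probs, nonevent_probs):
        if p_event > p_nonevent:
            concordant += 1   # event has the higher predicted probability
        elif p_event < p_nonevent:
            discordant += 1   # event has the lower predicted probability
        else:
            tied += 1
    return concordant, discordant, tied

# Hypothetical coefficients: odds ratio for a one-unit increase in x.
alpha, beta = -5.0, 0.1
odds_ratio = odds(logistic(alpha, beta, 51)) / odds(logistic(alpha, beta, 50))
print("odds ratio:", round(odds_ratio, 4), " exp(beta):", round(math.exp(beta), 4))

# Hypothetical predicted probabilities for 3 events and 3 non-events.
event_probs = [0.8, 0.6, 0.3]
nonevent_probs = [0.7, 0.3, 0.1]
print("pairs:", len(event_probs) * len(nonevent_probs))
print("concordant, discordant, tied:", pair_association(event_probs, nonevent_probs))
```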

Page 8:

Interpretation of SAS output - continued

Classification Table:

– The model classifies an observation as an event if its estimated probability is greater than or equal to a given probability cutpoint.

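| Prob. Level | Correct: Event | Correct: Non-Event | Incorrect: Event | Incorrect: Non-Event | Correct (%) | Sensitivity (%) | Specificity (%) | False POS (%) | False NEG (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 57 | 0 | 43 | 0 | 57 | 100 | 0 | 43 | . |
| 0.1 | 57 | 1 | 42 | 0 | 58 | 100 | 2.3 | 42.4 | 0 |
| 0.2 | 55 | 7 | 36 | 2 | 62 | 96.5 | 16.3 | 39.6 | 22.2 |
| 0.3 | 51 | 19 | 24 | 6 | 70 | 89.5 | 44.2 | 32 | 24 |
| 0.4 | 50 | 25 | 18 | 7 | 75 | 87.7 | 58.1 | 26.5 | 21.9 |
| 0.5 | 45 | 27 | 16 | 12 | 72 | 78.9 | 62.8 | 26.2 | 30.8 |
| 0.6 | 41 | 32 | 11 | 16 | 73 | 71.9 | 74.4 | 21.2 | 33.3 |
| 0.7 | 32 | 36 | 7 | 25 | 68 | 56.1 | 83.7 | 17.9 | 41 |
| 0.8 | 24 | 39 | 4 | 33 | 63 | 42.1 | 90.7 | 14.3 | 45.8 |
| 0.9 | 6 | 42 | 1 | 51 | 48 | 10.5 | 97.7 | 14.3 | 54.8 |
| 1 | 0 | 43 | 0 | 57 | 43 | 0 | 100 | . | 57 |

How each column is computed, with a = Correct Event, b = Correct Non-Event, c = Incorrect Event, d = Incorrect Non-Event:

| Column | Meaning | Formula |
|---|---|---|
| Correct | Tot Correct / Total | (a+b) / (a+b+c+d) |
| Sensitivity | Correct Event / Tot Event | a / (a+d) |
| Specificity | Correct N.Event / Tot N.Event | b / (b+c) |
| False POS | F.Pos / (F.Pos+Pos) | c / (a+c) |
| False NEG | F.Neg / (F.Neg+Neg) | d / (b+d) |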
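
The percentage columns follow directly from the four counts a, b, c and d. The sketch below is in Python (not SAS) and recomputes the row for probability level 0.5 from the table above. The function name `classification_summary` is made up for this illustration, and a zero denominator is returned as `None`, mirroring the "." entries in the table.

```python
def pct(numerator, denominator):
    """Percentage, or None when the denominator is zero (shown as '.' in the table)."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator

def classification_summary(a, b, c, d):
    """a = correct events, b = correct non-events,
    c = incorrect events (false positives), d = incorrect non-events (false negatives)."""
    return {
        "correct": pct(a + b, a + b + c + d),
        "sensitivity": pct(a, a + d),
        "specificity": pct(b, b + c),
        "false_pos": pct(c, a + c),
        "false_neg": pct(d, b + d),
    }

# Counts from the 0.5 row of the classification table.
row = classification_summary(45, 27, 16, 12)
print({name: round(value, 1) for name, value in row.items()})
```

Sweeping the cutpoint from 0 to 1 trades sensitivity against specificity, which is the same information the ROC curve displays.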

Page 9:

Logistic regression for polychotomous response variables

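• Example: Three outcomes

$$p_1 = \mathrm{pr}(Y = 1 \mid x)$$

$$p_2 = \mathrm{pr}(Y = 2 \mid x)$$

$$p_3 = \mathrm{pr}(Y = 3 \mid x) = 1 - p_1 - p_2$$

• The cumulative probability model

$$\log\left(\frac{p_1}{1 - p_1}\right) = \alpha_1 + \beta x$$

$$\log\left(\frac{p_1 + p_2}{1 - p_1 - p_2}\right) = \alpha_2 + \beta x$$

• The assumption:

– A common slope parameter associated with the predictor.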
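
To see what the common-slope assumption implies, the sketch below (Python, not SAS) converts the two cumulative logits, log(p1/(1-p1)) = alpha1 + beta*x and log((p1+p2)/(1-p1-p2)) = alpha2 + beta*x, into the three category probabilities. The values of `alpha1`, `alpha2` and `beta` are hypothetical; `alpha1 < alpha2` is required so that all three probabilities are positive.

```python
import math

def inv_logit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def cumulative_logit_probs(alpha1, alpha2, beta, x):
    """Return (p1, p2, p3) under the cumulative logit model with a common slope."""
    cum1 = inv_logit(alpha1 + beta * x)  # pr(Y <= 1) = p1
    cum2 = inv_logit(alpha2 + beta * x)  # pr(Y <= 2) = p1 + p2
    return cum1, cum2 - cum1, 1.0 - cum2

# Hypothetical parameters with alpha1 < alpha2.
alpha1, alpha2, beta = -1.0, 1.0, 0.5

for x in (-2.0, 0.0, 2.0):
    p1, p2, p3 = cumulative_logit_probs(alpha1, alpha2, beta, x)
    gap = logit(p1 + p2) - logit(p1)  # equals alpha2 - alpha1 for every x
    print(f"x={x:+.1f}  p1={p1:.3f}  p2={p2:.3f}  p3={p3:.3f}  logit gap={gap:.3f}")
```

The gap between the two cumulative logits is the same at every x, which is exactly what a single shared slope means.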

Page 10:

Logistic regression for polychotomous response variables

Examples:

proc logistic data=diabetes descending;
    model group = glutest;
    output out=probs predicted=prob xbeta=logit;
    format group gp.;
run;

Page 11:

Other SAS Procedures for Logistic Regression Models

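Each entry lists the procedure, its model options, and notes.

• LOGISTIC

– Notes: The events/trials format only works for a binary response.

• GENMOD

– Model options: Dist=Binomial, Link=logit

– Notes: One of the generalized linear models.

• PROBIT

– Model options: Dist=Logistic

• CATMOD

– Model options:

proc catmod data=diabetes;
    direct glutest;
    response logits / out = cat_prob;
    model group = glutest;
run;

– Notes: Allows for individual parameters:

$$\log\left(\frac{p_1}{p_2}\right) = \alpha_1 + \beta_1 x$$

$$\log\left(\frac{p_2}{p_3}\right) = \alpha_2 + \beta_2 x$$

• PHREG

– Model options:

proc phreg data=diabetes;
    model t * group = glutest;
run;

– Notes: Trick: events occur at time one, non-events occur at a later time (censored). Here t is a dummy time variable and group is the censoring variable.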

Page 12:

References

• Hosmer, D.W., Jr. and Lemeshow, S. (1989), Applied Logistic Regression, New York: John Wiley & Sons, Inc.

• SAS Institute Inc. (1995), Logistic Regression Examples Using the SAS System, Cary, NC: SAS Institute Inc.

• Allison, P.D. (1999), Logistic Regression Using the SAS System: Theory and Application, BBU Press and John Wiley & Sons, Inc.