April 6: Logistic Regression. Estimating probability based on logistic model; testing differences among multiple groups; assumptions for model

Upload: amato

Post on 05-Jan-2016


Page 1: April 6

April 6

• Logistic Regression

– Estimating probability based on logistic model

– Testing differences among multiple groups

– Assumptions for model

Page 2: April 6

Logistic regression equation

Model log odds of outcome as a linear function of one or more variables

Xi = predictors, independent variables

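βi is the increase in log odds for a 1-unit increase in Xi

e^βi is the relative odds for a 1-unit increase in Xi

The model is:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

where π is the probability of the outcome.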

Page 3: April 6

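Logistic Regression Prediction: Estimating Probability of Y=1

Goal: Estimate π for a set of X values

The model is:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

Solve for π:

$$\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)} = \frac{\text{ODDS}}{1 + \text{ODDS}}$$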
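The algebra that turns the log odds model into a probability is short. The following is a sketch of the standard derivation; η is shorthand introduced here (it is not on the slides) for the linear predictor β0 + β1x1 + β2x2 + ..., and π is the probability that Y=1.

$$
\begin{aligned}
\log\left(\frac{\pi}{1-\pi}\right) &= \eta \\
\frac{\pi}{1-\pi} &= e^{\eta} = \text{ODDS} \\
\pi &= (1-\pi)\,e^{\eta} \\
\pi\,(1 + e^{\eta}) &= e^{\eta} \\
\pi &= \frac{e^{\eta}}{1+e^{\eta}} = \frac{\text{ODDS}}{1+\text{ODDS}}
\end{aligned}
$$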

Page 4: April 6

Steps in Estimating π

• Pick values for x1, x2, …, xp

• Compute log odds for your values of Xs using results

– LO = b0 + b1x1 + b2x2 + … + bpxp

• EXP LO to get odds

– Odds = EXP (LO)

• Compute estimate of π

– π = ODDS/(ODDS + 1)
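The steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `estimate_probability` and its argument layout are assumptions, and the coefficients plugged in are the ones reported for the CVD model on the next slide (age and women).

```python
import math

def estimate_probability(intercept, coefs, values):
    """Hypothetical helper: log odds -> odds -> probability."""
    # Step 2: log odds, LO = b0 + b1*x1 + ... + bp*xp
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    # Step 3: odds = EXP(LO)
    odds = math.exp(log_odds)
    # Step 4: probability = ODDS / (ODDS + 1)
    return odds / (odds + 1)

# Step 1: pick values, here age = 60 and women = 0 (a man aged 60)
prob = estimate_probability(-6.0621, [0.0605, -0.3967], [60, 0])
print(round(prob, 4))
```

Passing the coefficients and the chosen X values as parallel lists keeps the sketch usable for any number of predictors.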

Page 5: April 6

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq

Intercept    1   -6.0621    1.2884           22.1395           <.0001
AGE          1    0.0605    0.0223            7.3310           0.0068
women        1   -0.3967    0.3166            1.5701           0.2102

log(odds) = - 6.0621 + 0.0605*age –0.3967*women

What is estimated probability of CVD for a man 60 years old?

Log(odds) = -6.0621 + 0.0605(60) –0.3967(0) = -2.4321

Odds = exp(-2.4321) = 0.0878

Prob = 0.0878 / (1 + 0.0878) = 0.0808

How old does a woman have to be to have the same risk?

A 1-year increase in age increases log(odds) by 0.0605

Being female decreases log(odds) by 0.3967

Compute 0.3967/0.0605 = 6.6, so a woman would have to be 66.6 years old to have P = 0.0808
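The age-equivalence argument can be checked numerically. The sketch below reuses the coefficients from the output on this slide; the names `B0`, `B_AGE`, `B_WOMEN` and `prob` are illustrative choices, not anything from the course materials.

```python
import math

# Estimates from the "Analysis of Maximum Likelihood Estimates" table
B0, B_AGE, B_WOMEN = -6.0621, 0.0605, -0.3967

def prob(age, women):
    """Estimated probability of CVD from the fitted log odds."""
    log_odds = B0 + B_AGE * age + B_WOMEN * women
    return 1 / (1 + math.exp(-log_odds))

# Years of age needed to offset the lower log odds for women
extra_years = -B_WOMEN / B_AGE
age_woman = 60 + extra_years

print(round(extra_years, 1), round(age_woman, 1))
print(round(prob(60, 0), 4), round(prob(age_woman, 1), 4))
```

Because the model is linear on the log odds scale, the offset in years is the same whatever starting age is chosen for the man.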

Page 6: April 6

Getting Odds Ratio for Differences Other Than 1

PROC LOGISTIC DATA=temp DESCENDING;
  MODEL clinical = age women / CLODDS=WALD;
  UNITS age = 5 women = 1;
RUN;

SAS OUTPUT

Wald Confidence Interval for Adjusted Odds Ratios

Effect   Unit     Estimate   95% Confidence Limits

AGE      5.0000   1.353      1.087   1.685
women    1.0000   0.673      0.362   1.251

Odds ratio for AGE: EXP (5*0.0605) = 1.353
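The 5-year odds ratio and its Wald limits can be reproduced by hand from the AGE estimate (0.0605) and standard error (0.0223) reported on Page 5. The sketch below assumes the usual Wald interval with z = 1.96; the function name `odds_ratio_for_units` is hypothetical, and small differences from the SAS output are expected because the inputs are rounded.

```python
import math

def odds_ratio_for_units(beta, se, units, z=1.96):
    """Odds ratio and Wald limits for a change of `units` in X."""
    point = math.exp(units * beta)
    lower = math.exp(units * (beta - z * se))
    upper = math.exp(units * (beta + z * se))
    return point, lower, upper

# AGE estimate and standard error from the Page 5 output, 5-year difference
point, lower, upper = odds_ratio_for_units(0.0605, 0.0223, 5)
print(round(point, 3), round(lower, 3), round(upper, 3))
```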

Page 7: April 6

Testing Differences Among Multiple Groups Using Logistic Regression

• Ho: π1 = π2 = … = πk

• Ha: πi not all equal

• Can test using logistic regression, since if the π’s are equal then the log odds are equal

• Can code in SAS two ways

– Create dummy (design) variables to represent the groups

– Use a CLASS statement under PROC LOGISTIC

Page 8: April 6

TOMHS Example: Is CVD Rate Equal In Four Clinical Centers?

• Ho: πA = πB = πC = πD

• SAS CODE in datastep (create own design variables):

DATA temp;
  SET tomhs.bpstudy;
  clinicA = 0; clinicB = 0; clinicC = 0; clinicD = 0;
  if clinic = 'A' then clinicA = 1;
  else if clinic = 'B' then clinicB = 1;
  else if clinic = 'C' then clinicC = 1;
  else if clinic = 'D' then clinicD = 1;
RUN;

Page 9: April 6

Do Simple Analyses First

PROC MEANS N MEAN SUM MIN MAX DATA=temp;
  CLASS clinic;
  VAR clinical;
RUN;

Analysis Variable : CLINICAL Indicator - Clinical Endpoint

CLINIC   N Obs   N     Mean        Sum          Minimum   Maximum
-----------------------------------------------------------------
A        195     195   0.0974359   19.0000000   0         1.0000000
B        251     251   0.0517928   13.0000000   0         1.0000000
C        296     296   0.0472973   14.0000000   0         1.0000000
D        160     160   0.0312500    5.0000000   0         1.0000000

The relative odds (A/D) should be about 3. All betas should be > 0
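The expectation that the A/D relative odds is about 3 can be checked directly from the event counts in the PROC MEANS table. This is a minimal sketch; the dictionary layout and the helper `odds` are illustrative names only.

```python
import math

# (events, N) per clinic, taken from the Sum and N columns above
counts = {"A": (19, 195), "B": (13, 251), "C": (14, 296), "D": (5, 160)}

def odds(events, n):
    """Crude odds of the endpoint: events / non-events."""
    return events / (n - events)

ref_odds = odds(*counts["D"])  # clinic D as the reference group
ratios = {c: odds(*counts[c]) / ref_odds for c in "ABC"}

for clinic, ratio in ratios.items():
    # log of the relative odds is the beta expected from the logistic model
    print(clinic, round(ratio, 3), round(math.log(ratio), 4))
```

With clinic as the only predictor, these crude relative odds are what the logistic model should return, and every log relative odds is positive because D has the lowest rate.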

Page 10: April 6

PROC LOGISTIC CODE

* Using class statement;
PROC LOGISTIC DATA=TEMP DESCENDING SIMPLE;
  CLASS clinic / PARAM=REF;
  MODEL clinical = clinic;
RUN;

* Using user defined design variables;
PROC LOGISTIC DATA=TEMP DESCENDING SIMPLE;
  MODEL clinical = clinica clinicb clinicc;
RUN;

Notes:

PARAM=REF uses 0/1 coding, with the last group as reference

SIMPLE gives summary statistics

Page 11: April 6

SAS OUTPUT USING CLASS STATEMENT

Response Profile

Ordered Value   CLINICAL   Total Frequency

1               1           51
2               0          851

Probability modeled is CLINICAL=1.

Class Level Information

Class    Value   Design Variables
                 1   2   3

CLINIC   A       1   0   0
         B       0   1   0
         C       0   0   1
         D       0   0   0

Same coding as in datastep

Clinic D reference

Page 12: April 6

SAS OUTPUT USING CLASS STATEMENT

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq

Likelihood Ratio   7.9632       3    0.0468
Score              8.6122       3    0.0349
Wald               8.1300       3    0.0434

Type III Analysis of Effects

Effect   DF   Wald Chi-Square   Pr > ChiSq

CLINIC   3    8.1300            0.0434

The global Wald test and the Type III Wald test for CLINIC are equal because no other variables are in the model
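The likelihood ratio chi-square can be reproduced from the clinic counts alone. The sketch below relies on one fact about this particular model: with clinic as the only predictor, the fitted probability in each clinic equals its observed proportion. The function names are illustrative, and the closed-form upper tail used here is specific to a chi-square with 3 degrees of freedom.

```python
import math

# (events, N) per clinic, from the earlier PROC MEANS output
counts = {"A": (19, 195), "B": (13, 251), "C": (14, 296), "D": (5, 160)}

def binom_loglik(events, n, p):
    """Binomial log likelihood, dropping the constant term."""
    return events * math.log(p) + (n - events) * math.log(1 - p)

total_events = sum(e for e, _ in counts.values())
total_n = sum(n for _, n in counts.values())

# Null model: one common probability for all four clinics
loglik_null = binom_loglik(total_events, total_n, total_events / total_n)
# Clinic model: each clinic gets its own observed proportion
loglik_full = sum(binom_loglik(e, n, e / n) for e, n in counts.values())

lr_chisq = 2 * (loglik_full - loglik_null)

def chisq_sf_df3(x):
    """Upper-tail probability of a chi-square with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chisq_sf_df3(lr_chisq)
print(round(lr_chisq, 4), round(p_value, 4))
```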

Page 13: April 6

SAS OUTPUT USING CLASS STATEMENT

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq

Intercept   1    -3.4339    0.4544           57.1196           <.0001
CLINIC A    1     1.2080    0.5145            5.5114           0.0189
CLINIC B    1     0.5266    0.5363            0.9644           0.3261
CLINIC C    1     0.4311    0.5305            0.6604           0.4164

Odds Ratio Estimates

Effect          Point Estimate   95% Wald Confidence Limits

CLINIC A vs D   3.347            1.221   9.175
CLINIC B vs D   1.693            0.592   4.844
CLINIC C vs D   1.539            0.544   4.353
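Each row of the estimates table can be rebuilt from its Estimate and Standard Error columns. The following is a sketch under the usual Wald assumptions (chi-square with 1 degree of freedom); `wald_test` is a hypothetical helper name, and results differ slightly from SAS because the inputs are rounded to four decimals.

```python
import math

# (estimate, standard error) from the table above
estimates = {
    "CLINIC A": (1.2080, 0.5145),
    "CLINIC B": (0.5266, 0.5363),
    "CLINIC C": (0.4311, 0.5305),
}

def wald_test(beta, se):
    """Wald chi-square, its 1-df p-value, and the odds ratio exp(beta)."""
    chisq = (beta / se) ** 2
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value, math.exp(beta)

results = {name: wald_test(b, se) for name, (b, se) in estimates.items()}
for name, (chisq, p_value, odds_ratio) in results.items():
    print(name, round(chisq, 3), round(p_value, 4), round(odds_ratio, 3))
```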

Page 14: April 6

SAS OUTPUT USING MODEL clinical = clinicA clinicB clinicC

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq

Intercept   1    -3.4339    0.4544           57.1196           <.0001
clinicA     1     1.2080    0.5145            5.5114           0.0189
clinicB     1     0.5266    0.5363            0.9644           0.3261
clinicC     1     0.4311    0.5305            0.6604           0.4164

Odds Ratio Estimates

Effect    Point Estimate   95% Wald Confidence Limits

clinicA   3.347            1.221   9.175
clinicB   1.693            0.592   4.844
clinicC   1.539            0.544   4.353

Page 15: April 6

Maybe clinic rates of CVD differ because age varies among centers

SAS OUTPUT USING MODEL clinical = clinic age

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq

Likelihood Ratio   16.5582      4    0.0024
Score              17.2001      4    0.0018
Wald               16.2760      4    0.0027

Type III Analysis of Effects

Effect   DF   Wald Chi-Square   Pr > ChiSq

CLINIC   3    8.9604            0.0298
AGE      1    8.4904            0.0036

Test if age and clinic are related to CVD

Page 16: April 6

SAS OUTPUT USING MODEL clinical = clinic age

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq

Intercept   1    -7.2250    1.4096           26.2725           <.0001
CLINIC A    1     1.3211    0.5189            6.4816           0.0109
CLINIC B    1     0.6448    0.5400            1.4256           0.2325
CLINIC C    1     0.5163    0.5335            0.9366           0.3332
AGE         1     0.0662    0.0227            8.4904           0.0036

Odds Ratio Estimates

Effect          Point Estimate   95% Wald Confidence Limits

CLINIC A vs D   3.747            1.355   10.361
CLINIC B vs D   1.906            0.661    5.492
CLINIC C vs D   1.676            0.589    4.768
AGE             1.068            1.022    1.117

Page 17: April 6

Assumptions: Linear Versus Logistic Regression

Linear regression:

• Y normally distributed

• Y linearly related to X

• Variance of Y (σ²) constant over X

• Each observation independent of other observations

• Large N not needed for tests if Y is normally distributed

Logistic regression:

• Y binary

• Log odds linearly related to X

• Constant variance: N/A

• Each observation independent of other observations

• Large enough N to justify using χ² tests

Page 18: April 6

Illustration of Linearity in Log Odds Assumption

Log odds = -6.2428 + 0.0613* Age

AGE ODDS

50 0.039

60 0.072

70 0.134

RO = 1.85 = .072/.039

RO = 1.85 = .134/.072

The relative odds for going from 50 to 60 years is the same as for going from 60 to 70 years

Note: Absolute risk is not linear with age
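Both points on this slide can be checked from the tabulated odds: the relative odds per 10 years is constant, while the absolute risk does not rise by a constant amount. The sketch below uses the odds exactly as listed in the table and the slope 0.0613 from the fitted equation; the variable names are illustrative.

```python
import math

# Odds by age, as tabulated on the slide
odds_by_age = {50: 0.039, 60: 0.072, 70: 0.134}
ages = sorted(odds_by_age)

# Relative odds for each 10-year step
relative_odds = [odds_by_age[b] / odds_by_age[a] for a, b in zip(ages, ages[1:])]

# The same quantity implied by the slope alone: exp(10 * b1)
ro_from_slope = math.exp(10 * 0.0613)

# Absolute risk (probability) at each age, and its change per 10-year step
probs = {age: o / (1 + o) for age, o in odds_by_age.items()}
risk_steps = [probs[b] - probs[a] for a, b in zip(ages, ages[1:])]

print([round(r, 2) for r in relative_odds], round(ro_from_slope, 2))
print([round(s, 3) for s in risk_steps])
```

The two relative odds agree up to rounding of the tabulated odds, whereas the second risk step is visibly larger than the first.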

Page 19: April 6

Fitted regression line

Curve based on:

$$\log\left(\frac{p}{1-p}\right) = \beta_o + \beta_1 x$$

βo affects location

β1 affects curvature
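The roles of the two coefficients can be made concrete with a small sketch. It uses the age-only fit from Page 18 (intercept -6.2428, slope 0.0613) as the baseline; the shifted intercept and doubled slope are hypothetical values chosen only to show the effect, and the helper names are not from the slides.

```python
import math

def fitted_probability(x, b0, b1):
    """Probability on the fitted curve: p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def midpoint(b0, b1):
    """Value of x where the curve crosses p = 0.5 (log odds = 0)."""
    return -b0 / b1

def slope_at_midpoint(b1):
    """Slope dp/dx at p = 0.5, which is b1 * 0.5 * (1 - 0.5)."""
    return b1 / 4

b0, b1 = -6.2428, 0.0613          # fit from Page 18
b0_shifted = b0 + 1.0             # hypothetical: larger intercept
b1_steeper = 2 * b1               # hypothetical: doubled slope

# Changing b0 moves the curve along the x axis without changing its steepness
print(round(midpoint(b0, b1), 1), round(midpoint(b0_shifted, b1), 1))
# Changing b1 changes how sharply the curve bends
print(slope_at_midpoint(b1), slope_at_midpoint(b1_steeper))
```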