14 Logistic regression


Université d'Ottawa - Bio 4518 - Biostatistiques appliquées

Antoine Morin and Scott Findlay

11-02-12 01:20

Logistic regression



Logistic regression

Member of the generalized linear model (GLM) family.

Unlike standard linear regression, the dependent variable is binary (0, 1), so each case's value is either 0 or 1.

Normally, 0 is taken to mean the absence of some attribute and 1 its presence.

Logistic regression can be extended to the case where the dependent variable has more than two possible values (e.g. low, medium, high: multinomial regression).



Example: incidence of heart attacks in relation to age

[Figure: cardiaque (y-axis, -0.2 to 1.0) plotted against age (x-axis, 10 to 90)]

Linear regression is inappropriate because:

Residuals are not normal

Residuals are heteroscedastic

Predicted values are hard to interpret for a 0/1 response (e.g. what does a predicted value of 0.3 mean?) and can fall below 0 or above 1
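
A minimal sketch of the last point, using a small made-up data set (the ages and 0/1 outcomes below are hypothetical, not the course data). It fits a straight line by ordinary least squares to a binary response and shows that the fitted line can predict values below 0 and above 1.

```python
# Hypothetical data: age and a 0/1 heart-attack indicator (not the course data).
ages = [20, 30, 40, 50, 60, 70, 80]
cardiaque = [0, 0, 0, 0, 1, 1, 1]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(cardiaque) / n

# Closed-form simple linear regression: slope = Sxy / Sxx.
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, cardiaque))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(age):
    """Predicted value from the fitted straight line."""
    return intercept + slope * age

# The line leaves the 0-1 range at both ends of the age axis.
for age in (20, 50, 80):
    print(age, round(predict(age), 3))
```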



Logistic regression: dependent variable

The variable of interest is the probability p of obtaining a 1, as a function of the predictor variables.

The magnitude of the regression coefficients in the model depends on the distribution of the predictor variables in the two groups, Y = 0 and Y = 1.

[Figure: two panels plotting Y (0 or 1) against X]



Dependent variable: logit(p)

$$y = \operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right), \qquad p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}}$$

[Figure: p (y-axis, 0 to 100) plotted against logit (x-axis, -4 to 4)]
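
The logit transformation and its inverse can be sketched in a few lines. The function names `logit` and `inv_logit` are chosen here for illustration. The loop prints p as a percentage over the logit range -4 to 4.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Probability corresponding to a logit value y."""
    return math.exp(y) / (1.0 + math.exp(y))

# p (in percent) for a few logit values.
for y in (-4, -2, 0, 2, 4):
    print(y, round(100 * inv_logit(y), 1))
```

Whatever the value of the logit, the back-transformed p stays between 0 and 1, which is what makes the logit a convenient scale for modelling a probability.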



Logistic regression: model coefficients

A negative regression coefficient means the probability of success decreases with increasing value of the predictor.

A positive regression coefficient means the probability of success increases with increasing value of the predictor.

[Figure: two panels plotting Y (0 or 1) against X, one labelled coefficient > 0 and one labelled coefficient < 0]



Logistic regression: model coefficients

The magnitude of the regression coefficient depends on how abruptly p changes with X, with large values indicating abrupt change.

[Figure: two panels plotting Y (0 or 1) against X, one labelled coefficient > 0, small and one labelled coefficient > 0, large]
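
A rough sketch of how the sign and magnitude of the coefficient shape the curve, assuming a single predictor and the model logit(p) = b0 + b1 X. The names `b0`, `b1`, `p_of_x` and `transition_width` are illustrative. Since logit(0.9) - logit(0.1) = 2 ln 9, the distance in X over which p moves between 0.1 and 0.9 is 2 ln 9 / |b1|.

```python
import math

def p_of_x(x, b0, b1):
    """Probability of a 1 under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def transition_width(b1):
    """Distance in X over which p moves between 0.1 and 0.9."""
    return 2.0 * math.log(9.0) / abs(b1)

# Small vs large positive coefficient, and a negative one.
for b1 in (0.5, 5.0, -0.5):
    direction = "increases" if p_of_x(1, 0, b1) > p_of_x(0, 0, b1) else "decreases"
    print(b1, direction, round(transition_width(b1), 2))
```

A coefficient ten times larger compresses the transition from mostly 0 to mostly 1 into a range of X ten times narrower.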



Least squares estimation (LSE)

An ordinary least squares (OLS) estimate of a model parameter is the value that minimizes the sum of squared differences between observed and predicted values:

$$SS_R = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

Predicted values are derived from some model whose parameters we wish to estimate:

$$\hat{y} = f(x, \theta)$$

[Figure: SS_R plotted against the parameter value, with the OLS estimate marked]
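
The idea of picking the parameter value with the smallest residual sum of squares can be sketched with a brute-force search. The data and the one-parameter model y_hat = theta * x are hypothetical, chosen only to keep the example short. The grid search stands in for the calculus, and the closed-form answer is shown for comparison.

```python
# Hypothetical data for a one-parameter model y_hat = theta * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.3]

def ss_r(theta):
    """Residual sum of squares for a candidate parameter value."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

# Grid search: evaluate SS_R over candidate values and keep the smallest.
candidates = [i / 1000 for i in range(0, 4001)]
theta_grid = min(candidates, key=ss_r)

# Closed-form least squares solution for this model, for comparison.
theta_exact = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(theta_grid, round(theta_exact, 4))
```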



Maximum likelihood estimation (MLE)

A maximum likelihood estimate (MLE) of a model parameter for a given distribution is the value that maximizes the probability of generating the observed sample data.

MLEs are obtained by maximizing the likelihood function

$$L = \prod_{i=1}^{n} f(x_i; \theta)$$

or, equivalently, by minimizing the negative log-likelihood function, where

$$\log L = \sum_{i=1}^{n} \ln\left(f(x_i; \theta)\right)$$

[Figure: L and -log L plotted against the parameter value, with the MLE marked]
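
As an illustration, here is a minimal sketch of maximum likelihood for the simplest binary case: estimating a single probability p from a hypothetical sample of 0/1 outcomes. It minimizes -log L over a grid of candidate values; for this model the minimum should coincide with the sample proportion of ones.

```python
import math

# Hypothetical sample of 0/1 outcomes.
sample = [1, 0, 0, 1, 1, 0, 1, 1]

def neg_log_likelihood(p):
    """-log L for independent Bernoulli observations with P(1) = p."""
    return -sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in sample)

# Evaluate -log L over candidate values of p and keep the minimizer.
candidates = [i / 1000 for i in range(1, 1000)]
p_mle = min(candidates, key=neg_log_likelihood)

print(p_mle, sum(sample) / len(sample))
```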



How are the model parameters estimated?

Estimated not by least squares, but rather by maximum likelihood.

Based on the likelihood of obtaining the observed results under different values of the model parameters.

In principle, parameter estimates should converge to those maximizing the log-likelihood, or equivalently minimizing -log L.
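
One common way to carry out this iterative search is Newton-Raphson on the log-likelihood. The sketch below assumes a single predictor and uses hypothetical data; it is an illustration of the principle, not a description of what any particular statistics package does internally. At convergence the score (the gradient of the log-likelihood) should be essentially zero.

```python
import math

# Hypothetical data: age and a 0/1 outcome, with overlap between the groups.
xs = [20, 30, 40, 45, 50, 55, 60, 65, 70, 80]
ys = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

def inv_logit(y):
    return 1.0 / (1.0 + math.exp(-y))

def log_likelihood(b0, b1):
    """Log-likelihood of the binary data under logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = inv_logit(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(max_iter=50, tol=1e-10):
    """Newton-Raphson ascent of the log-likelihood, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step if it would lower the log-likelihood.
        step, current = 1.0, log_likelihood(b0, b1)
        while step > 1e-8 and log_likelihood(b0 + step * d0, b1 + step * d1) < current - 1e-12:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) < tol and abs(step * d1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit()
print(round(b0_hat, 3), round(b1_hat, 4), round(log_likelihood(b0_hat, b1_hat), 4))
```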



Hypothesis testing

Likelihood

Deviance = -2 log L

Is approximately distributed as chi-square

Measures the variation unexplained by the fitted model, analogous to the residual sum of squares.

Model comparison

The change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, so hypotheses relating to individual model terms can be tested.
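
The model comparison can be written out as follows. The subscripts are notation introduced here for illustration: 0 for the simpler model, 1 for the model with k additional terms, and D for the deviance.

```latex
\Delta D = D_0 - D_1 = -2\left(\log L_0 - \log L_1\right) \;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
```

A large value of $\Delta D$ relative to the chi-square distribution with k degrees of freedom suggests that the added terms improve the fit.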



Model assumptions

Observations are independent

Dependent variable has a binomial distribution

Little error in measurement of dependent variables.



Logistic regression in S-Plus

*** Generalized Linear Model ***

Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))

Deviance Residuals:
       Min         1Q    Median         3Q      Max
 -1.545637 -0.5732664 -0.272312 -0.1404323 2.679875

Coefficients:
                  Value  Std. Error   t value
(Intercept) -7.76838060 0.376403465 -20.63844
        age  0.09557905 0.005097055  18.75182

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 2050.515 on 1999 degrees of freedom

Residual Deviance: 1490.001 on 1998 degrees of freedom

Number of Fisher Scoring Iterations: 4
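
The two deviances in this output can be used to test the age term. The sketch below takes the null and residual deviances from the output above and compares their difference to a chi-square distribution with 1 degree of freedom. The helper `chi2_sf_1df` is written here for illustration and relies on the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)).

```python
import math

# Values copied from the S-Plus output.
null_deviance = 2050.515      # intercept-only model, 1999 df
residual_deviance = 1490.001  # model with age, 1998 df

change = null_deviance - residual_deviance
df = 1999 - 1998

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_sf_1df(change)
print(round(change, 3), df, p_value)
```

The drop in deviance of about 560.5 on 1 degree of freedom is far beyond the 5% critical value of about 3.84.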



Incidence of heart attack in relation to age

[Figure: cardiaque (y-axis, -0.1 to 0.9) plotted against age (x-axis, 30 to 90)]

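$$y = \operatorname{logit}(p) = -7.77 + 0.096\,\mathrm{Age}, \qquad p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}} = \frac{e^{-7.77+0.096\,\mathrm{Age}}}{1+e^{-7.77+0.096\,\mathrm{Age}}}$$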
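
A short sketch of how the fitted model turns an age into a predicted probability, using the coefficients reported in the S-Plus output for cardiaque ~ age. The function and variable names are illustrative. It also derives two quantities that follow directly from the coefficients: the age at which the predicted probability reaches 0.5, and the multiplicative change in the odds per additional year of age.

```python
import math

# Coefficients from the S-Plus output.
b0 = -7.76838060   # (Intercept)
b1 = 0.09557905    # age

def predicted_probability(age):
    """Predicted probability of a heart attack at a given age."""
    y = b0 + b1 * age                  # logit(p)
    return math.exp(y) / (1.0 + math.exp(y))

age_at_half = -b0 / b1                 # logit(p) = 0, so p = 0.5
odds_ratio_per_year = math.exp(b1)     # change in odds for one extra year

for age in (30, 50, 70, 90):
    print(age, round(predicted_probability(age), 3))
print(round(age_at_half, 1), round(odds_ratio_per_year, 3))
```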



Presence of post-operative kyphosis using logistic regression

Kyphosis: a binary variable indicating the presence/absence of a postoperative spinal deformity called kyphosis.

Age: the age of the child in months.

Number: the number of vertebrae involved in the spinal operation.

Start: the beginning of the range of the vertebrae involved in the operation.
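
With several predictors, the logit becomes a linear combination of them. As an illustration only, a main-effects model for these variables could be written as below. This particular form is an assumption made here; the model actually fitted in the course may include other terms.

```latex
\operatorname{logit}(p) = \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Number} + \beta_3\,\mathrm{Start},
\qquad p = P(\text{Kyphosis present})
```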



Evidence that the distribution of predictor variables differs among levels of response variable



The model



Testing hypotheses
