14 Logistic regression

Upload: sleman-terkawi

Post on 09-Apr-2018


  • 8/7/2019 14 Logistic regression

    1/18

    Université d'Ottawa - Bio 4518 - Biostatistiques appliquées (Applied Biostatistics)

    Antoine Morin and Scott Findlay

    11-02-12 01:20

    Logistic regression


  • Logistic regression

    Logistic regression is a member of the GLM (generalized linear model) family.

    Unlike standard linear regression, the dependent variable is binary (0, 1), so each case's value is either 0 or 1.

    Normally, 0 is taken to mean the absence of some attribute and 1 its presence.

    Logistic regression can be extended to the case where the dependent variable has more than two possible values (e.g. low, medium, high): multinomial regression.


  • Example: incidence of heart attacks in relation to age

    [Figure: cardiaque (heart attack, 0/1) plotted against age, 10 to 90]

    Linear regression is inappropriate because:

    Residuals are not normal.

    Residuals are heteroscedastic.

    Predicted values make little sense: it is unclear what a predicted value of 0.3 means, and a straight line can predict values below 0 or above 1.
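To make the last point concrete, here is a minimal sketch that fits an ordinary least-squares line to a small set of hypothetical 0/1 outcomes (invented for illustration, not the course data). The function name `ols_line` is likewise only illustrative. At ages outside the middle of the data the straight line predicts values below 0 and above 1, which cannot be probabilities.

```python
def ols_line(x, y):
    """Closed-form ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

# Hypothetical 0/1 outcomes (1 = heart attack); illustrative values only
ages = [20, 30, 40, 50, 60, 70, 80, 90]
cases = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = ols_line(ages, cases)
for age in (10, 55, 100):
    # Linear predictions are unbounded, unlike probabilities
    print(age, round(a + b * age, 3))
```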


  • Logistic regression: dependent variable

    The variable of interest is the probability p of obtaining a one (Y = 1) as a function of the predictor variables.

    The magnitude of the regression coefficients in the model depends on how the predictor variables are distributed in the two groups, Y = 0 and Y = 1.

    [Figure: two panels plotting Y (0 or 1) against X]


  • Dependent variable: logit(p)

    $$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = y, \qquad p = \frac{e^{y}}{1+e^{y}} = \frac{1}{1+e^{-y}}$$

    [Figure: p (%, 0 to 100) plotted against logit(p), from -4 to 4]
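The logit transform and its inverse can be written directly from the two equations above. This is a minimal sketch; the function names `logit` and `inv_logit` are illustrative choices, not from the source. The inverse uses whichever of the two equivalent forms, e^y / (1 + e^y) or 1 / (1 + e^-y), avoids overflow for the sign of y.

```python
import math

def logit(p):
    """logit(p) = ln(p / (1 - p)); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Back-transform: p = e^y / (1 + e^y) = 1 / (1 + e^-y).

    Picks the algebraically equivalent form that keeps math.exp from overflowing.
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

# p (in percent) over the logit range shown on the slide's x-axis
for y in (-4, -2, 0, 2, 4):
    print(y, round(100 * inv_logit(y), 1))
```

Because the logit maps (0, 1) onto the whole real line, a linear predictor on the logit scale can never produce a fitted p outside (0, 1).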


  • Logistic regression: model coefficients

    A negative regression coefficient means that the probability of success decreases with increasing values of the predictor.

    A positive regression coefficient means that the probability of success increases with increasing values of the predictor.

    [Figure: Y (0 or 1) plotted against X, for a positive (> 0) and a negative (< 0) coefficient]
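A minimal sketch of the sign rule, assuming a single predictor, an intercept of 0, and hypothetical coefficients of +1.5 and -1.5 chosen only for illustration. With a positive coefficient the predicted p rises steadily with X; with a negative one it falls.

```python
import math

def prob(x, b0, b1):
    """Inverse logit of the linear predictor: p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

xs = [-3, -2, -1, 0, 1, 2, 3]
p_pos = [prob(x, 0.0, 1.5) for x in xs]   # hypothetical coefficient b1 = 1.5 (> 0)
p_neg = [prob(x, 0.0, -1.5) for x in xs]  # hypothetical coefficient b1 = -1.5 (< 0)
print([round(p, 3) for p in p_pos])
print([round(p, 3) for p in p_neg])
```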


  • Logistic regression: model coefficients

    The magnitude of the regression coefficient depends on how abruptly p changes with X: large values indicate an abrupt change.

    [Figure: Y (0 or 1) plotted against X, for a small positive coefficient and for a large positive coefficient]
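One way to quantify "abruptly": for p = 1 / (1 + e^-(b0 + b1 X)), the slope is dp/dX = b1 p (1 - p), which is largest, b1/4, where p = 0.5. The sketch below assumes an intercept of 0 and two hypothetical coefficients, 0.5 and 5.0. It compares how much p changes between X = -1 and X = 1, and checks the b1/4 slope numerically.

```python
import math

def prob(x, b1):
    """p(x) = 1 / (1 + exp(-b1*x)); intercept fixed at 0, so p = 0.5 at x = 0."""
    return 1.0 / (1.0 + math.exp(-b1 * x))

def slope_at_zero(b1, h=1e-5):
    """Central-difference estimate of dp/dx at x = 0 (theory: b1 / 4)."""
    return (prob(h, b1) - prob(-h, b1)) / (2 * h)

for b1 in (0.5, 5.0):  # hypothetical small and large positive coefficients
    change = prob(1.0, b1) - prob(-1.0, b1)  # change in p from x = -1 to x = 1
    print(b1, round(change, 3), round(slope_at_zero(b1), 4))
```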


  • Least squares estimation (LSE)

    An ordinary least squares (OLS) estimate of a model parameter is the value that minimizes the sum of squared differences between observed and predicted values, where the predicted values are derived from some model whose parameters we wish to estimate.

    $$SS_R = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = f(x_i, \theta), \qquad \hat{\theta}_{OLS} = \arg\min_{\theta} SS_R$$
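A minimal sketch of the OLS criterion, using the simplest possible model, a constant ŷ_i = f(x_i, θ) = θ, and a few hypothetical observations. A brute-force search over candidate θ values finds the minimum of SS_R at the sample mean, which is the known closed-form OLS estimate for this model. The grid search is used only to make the "minimize SS_R" idea visible, not as a practical fitting method.

```python
def ss_r(y, theta):
    """SS_R = sum (y_i - y_hat_i)^2 for the constant model y_hat_i = theta."""
    return sum((yi - theta) ** 2 for yi in y)

y = [2.1, 3.4, 2.9, 4.0, 3.1]                 # hypothetical observations
grid = [i / 1000 for i in range(0, 6001)]      # candidate theta values 0.000 .. 6.000
theta_ols = min(grid, key=lambda t: ss_r(y, t))
print(theta_ols, sum(y) / len(y))              # grid minimum vs. sample mean
```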


  • Maximum likelihood estimation (MLE)

    A maximum likelihood estimate (MLE) of a model parameter, for a given distribution, is the value that maximizes the probability of generating the observed sample data.

    MLEs are obtained by maximizing the likelihood function L or, equivalently, by minimizing the negative log-likelihood function, -log L.

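    $$L(\theta) = \prod_{i=1}^{n} L(x_i;\theta), \qquad -\log L(\theta) = -\sum_{i=1}^{n} \log L(x_i;\theta)$$

    $$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta) = \arg\min_{\theta}\bigl(-\log L(\theta)\bigr)$$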
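A minimal sketch of MLE for the case that matters in logistic regression: independent 0/1 observations with a common success probability p (a Bernoulli model), using a hypothetical sample. Each observation contributes log p if it is a 1 and log(1 - p) if it is a 0. A grid search over p finds the minimum of -log L at the sample proportion of ones, which is the known analytic MLE for this model.

```python
import math

def neg_log_lik(y, p):
    """-log L = -sum log L(y_i; p) for independent Bernoulli(p) observations in {0, 1}."""
    return -sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)

y = [0, 1, 1, 0, 1, 1, 0, 1]                # hypothetical binary sample
grid = [i / 1000 for i in range(1, 1000)]   # p must lie strictly between 0 and 1
p_mle = min(grid, key=lambda p: neg_log_lik(y, p))
print(p_mle, sum(y) / len(y))               # grid minimum vs. sample proportion
```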


  • How are the model parameters estimated?

    The parameters are estimated not by least squares but by maximum likelihood.

    Estimation is based on the likelihood of obtaining the observed results under different values of the model parameters.

    In principle, the parameter estimates converge to the values that maximize the log-likelihood or, equivalently, minimize -log L.
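The convergence described above is usually reached iteratively. Below is a minimal sketch of Newton-Raphson for a single-predictor model logit(p) = b0 + b1 x. For the logit link this coincides with Fisher scoring, the method named in S-Plus output. The function `fit_logistic`, its tolerance, and the data are all hypothetical; this is not presented as the S-Plus implementation. At convergence the score equations, sum(y - p) = 0 and sum(x (y - p)) = 0, which define the maximum of log L, hold.

```python
import math

def fit_logistic(x, y, tol=1e-8, max_iter=50):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson (= Fisher scoring for the logit link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector: gradient of log L
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information) * step = score
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data with overlapping groups, so a finite MLE exists
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(b0, 4), round(b1, 4))
```

If the two groups are perfectly separated by x, the likelihood has no finite maximum and the coefficients grow without bound, so in that situation the loop above stops at `max_iter` rather than converging.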


  • Hypothesis testing

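    Deviance = -2 log L, where L is the likelihood of the fitted model.

    The deviance is approximately distributed as chi-square. It measures the variation left unexplained by the fitted model, analogous to the residual sum of squares in linear regression.

    Model comparison

    The change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, with degrees of freedom equal to the number of parameters added (or deleted). It can therefore be used to test hypotheses about individual model terms.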
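As a worked example of the change-in-deviance test, this sketch uses the null and residual deviances reported in the S-Plus output for the heart-attack model (cardiaque ~ age). Adding `age` uses one degree of freedom. For 1 df, the upper-tail chi-square probability can be computed exactly with the standard library as erfc(sqrt(d / 2)), because a chi-square with 1 df is the square of a standard normal. Other degrees of freedom would need a general chi-square routine, which is not shown here.

```python
import math

null_dev, null_df = 2050.515, 1999     # from the S-Plus output (heart-attack example)
resid_dev, resid_df = 1490.001, 1998

delta_dev = null_dev - resid_dev       # change in deviance when 'age' is added
delta_df = null_df - resid_df          # number of parameters added
# Valid for 1 df only: P(chi2_1 > d) = P(|Z| > sqrt(d)) = erfc(sqrt(d / 2))
p_value = math.erfc(math.sqrt(delta_dev / 2.0))
print(round(delta_dev, 3), delta_df, p_value)
```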


  • Model assumptions

    Observations are independent.

    The dependent variable has a binomial distribution.

    There is little error in the measurement of the dependent variables.


  • Logistic regression in S-Plus

    *** Generalized Linear Model ***

    Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12,
          na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))

    Deviance Residuals:
           Min         1Q    Median         3Q      Max
     -1.545637 -0.5732664 -0.272312 -0.1404323 2.679875

    Coefficients:
                      Value  Std. Error   t value
    (Intercept) -7.76838060 0.376403465 -20.63844
    age          0.09557905 0.005097055  18.75182

    (Dispersion Parameter for Binomial family taken to be 1 )

        Null Deviance: 2050.515 on 1999 degrees of freedom
    Residual Deviance: 1490.001 on 1998 degrees of freedom

    Number of Fisher Scoring Iterations: 4
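The "t value" column is each coefficient divided by its standard error. S-Plus labels it "t value". With the binomial dispersion fixed at 1, it is usually read as an approximately standard-normal Wald z statistic. This minimal sketch, assuming that normal reference distribution, recomputes the statistics from the printed values and attaches two-sided p-values. The dictionary layout is just an illustrative choice.

```python
import math

# (estimate, standard error) as printed in the S-Plus output
coefs = {
    "(Intercept)": (-7.76838060, 0.376403465),
    "age": (0.09557905, 0.005097055),
}

wald = {}
for name, (value, se) in coefs.items():
    z = value / se                                # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided p-value, standard normal
    wald[name] = (z, p)
    print(name, round(z, 5), p)
```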


  • Incidence of heart attack in relation to age

    [Figure: cardiaque (heart attack, 0/1) plotted against age, 30 to 90]

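    Fitted model (coefficients from the S-Plus output, rounded):

    $$y = \operatorname{logit}(p) = -7.77 + 0.0956\,\mathrm{Age}, \qquad p = \frac{e^{-7.77 + 0.0956\,\mathrm{Age}}}{1 + e^{-7.77 + 0.0956\,\mathrm{Age}}} = \frac{1}{1 + e^{7.77 - 0.0956\,\mathrm{Age}}}$$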
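The fitted equation can be used to turn ages into predicted probabilities. This is a minimal sketch using the unrounded coefficients from the S-Plus output; the function name `p_heart_attack` is hypothetical. It also computes two derived quantities that follow directly from the logit form. The first is exp(coefficient), the multiplicative change in the odds of a heart attack per additional year of age. The second is -intercept / slope, the age at which the predicted probability reaches 0.5 (logit = 0).

```python
import math

B0, B1 = -7.76838060, 0.09557905  # fitted coefficients from the S-Plus output

def p_heart_attack(age):
    """Predicted probability: p = 1 / (1 + exp(-(B0 + B1*age)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))

for age in (30, 50, 70, 90):
    print(age, round(p_heart_attack(age), 3))

odds_ratio = math.exp(B1)   # multiplicative change in the odds per extra year of age
age_half = -B0 / B1         # age at which the predicted p equals 0.5
print(round(odds_ratio, 4), round(age_half, 1))
```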


  • Presence of post-operative kyphosis using logistic regression

    Kyphosis: a binary variable indicating the presence or absence of kyphosis, a postoperative spinal deformity.

    Age: the age of the child, in months.

    Number: the number of vertebrae involved in the spinal operation.

    Start: the beginning of the range of vertebrae involved in the operation.


  • Evidence that the distribution of predictor variables differs among levels of the response variable


  • The model


  • Testing hypotheses