7 Logistic Regression - Petra Christian University
faculty.petra.ac.id/halim/index_files/Stat2/7_Logistic...

References :
Alan Agresti, Categorical Data Analysis, Wiley Interscience, New Jersey, 2002
Subhash Sharma, Applied Multivariate Techniques, John Wiley & Sons, 1996

Introduction
Interpreting Parameters in Logistic Regression
Inferences for Logistic Regression
Logit Models with Categorical Predictors
Multiple Logistic Regression

Logistic Regression

Siana Halim Indriati N Bisono


Recently, logistic regression has become a popular tool in business applications. Some credit-scoring applications use logistic regression to model the probability that a subject is creditworthy.

A company that relies on catalog sales may determine whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior.

Logistic Regression : Model

The logistic regression model is a generalized linear model with the following components.

Random Component : The response variable is binary, Yi = 1 or 0 (an event occurs or it doesn’t). We are interested in the probability that Yi = 1, i.e., π(xi). The distribution of Yi is binomial.

Systematic Component : A linear predictor such as

$$\alpha + \beta_1 x_{1i} + \dots + \beta_j x_{ji}$$

The explanatory or predictor variables may be quantitative (continuous), qualitative (discrete), or both (mixed).


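Link Function : The log of the odds that an event occurs, known as the “logit” :

$$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$$

Putting it all together, the logistic regression model is

$$\operatorname{logit}\big(\pi(x_i)\big) = \log\left(\frac{\pi(x_i)}{1-\pi(x_i)}\right) = \alpha + \beta_1 x_{1i} + \dots + \beta_j x_{ji}$$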
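The three components can be sketched in a few lines of Python. This is only an illustration: the function names and the coefficient values are hypothetical and do not come from the slides.

```python
import math

def logit(p: float) -> float:
    """Link function: log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(alpha, betas, xs):
    """Systematic component: alpha + beta_1 * x_1 + ... + beta_j * x_j."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical coefficients and predictor values, for illustration only.
alpha, betas = -1.0, [0.5, 2.0]
x_i = [2.0, 0.25]

eta = linear_predictor(alpha, betas, x_i)  # -1.0 + 1.0 + 0.5
pi_i = inv_logit(eta)                      # probability that Y_i = 1
print(eta, pi_i, logit(pi_i))
```

Applying `logit` to the resulting probability recovers the linear predictor, which is what "link function" means here.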

Illustration : Data for Most and Least Successful Financial Institutions

Most Successful            Least Successful
Success  Size  FP          Success  Size  FP
1        1     0.58        2        1     2.28
1        1     2.80        2        0     1.06
1        1     2.77        2        0     1.08
1        1     3.50        2        0     0.07
1        1     2.67        2        0     0.16
1        1     2.97        2        0     0.70
1        1     2.18        2        0     0.75
1        1     3.24        2        0     1.61
1        1     1.49        2        0     0.34
1        1     2.19        2        0     1.15
1        0     2.70        2        0     0.44
1        0     2.57        2        0     0.86

Illustration : Contingency table for Type and Size of Financial Institution

Type of Financial Institution (FI)    Size
                                      Large   Small   Total
Most Successful (MS)                  10      2       12
Least Successful (LS)                 1       11      12
Total                                 11      13      24

Probability that any FI will be MS:
P(MS) = 12/24 = 0.5

Probability that an FI is MS given it is large (L):
P(MS | L) = 10/11 = 0.909

Probability that an FI is MS given it is small (S):
P(MS | S) = 2/13 = 0.154

Odds of an FI being MS:
Odds (MS) = 12/12 = 1

Odds of an FI being MS given it is large:
Odds (MS | L) = 10/1 = 10 (1)

Odds of an FI being MS given it is small:
Odds (MS | S) = 2/11 = 0.182 (2)
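The probabilities and odds above can be recomputed directly from the cell counts of the contingency table. The following is a minimal sketch; the variable and function names are chosen here for illustration.

```python
# Cell counts from the contingency table: (type, size) -> count.
counts = {
    ("MS", "L"): 10, ("MS", "S"): 2,
    ("LS", "L"): 1,  ("LS", "S"): 11,
}

def prob_ms(size=None):
    """P(MS), or P(MS | size) when size is 'L' or 'S'."""
    sizes = ["L", "S"] if size is None else [size]
    ms = sum(counts[("MS", s)] for s in sizes)
    total = sum(counts[(t, s)] for t in ("MS", "LS") for s in sizes)
    return ms / total

def odds_from_prob(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)

p_ms = prob_ms()            # 12/24
p_ms_large = prob_ms("L")   # 10/11
p_ms_small = prob_ms("S")   # 2/13

odds_large = odds_from_prob(p_ms_large)  # 10/1
odds_small = odds_from_prob(p_ms_small)  # 2/11

print(round(p_ms_large, 3), round(p_ms_small, 3))
print(round(odds_large, 3), round(odds_small, 3))
```

Note that the odds for small institutions are 2/11, which rounds to 0.182, while the conditional probability is 2/13, which rounds to 0.154.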

Odds and Probability

Odds and probabilities provide the same information, but in different forms. It is easy to convert odds into probabilities and vice versa. For example

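$$\text{Odds}(MS \mid L) = \frac{P(MS \mid L)}{1 - P(MS \mid L)} = \frac{0.909}{1 - 0.909} = 10$$

$$P(MS \mid L) = \frac{\text{odds}(MS \mid L)}{1 + \text{odds}(MS \mid L)} = \frac{10}{1 + 10} = 0.909$$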

Illustration : The Logistic Regression Model

Taking the natural log of the odds given by eqns. (1) and (2), we get

ln [odds (MS | L)] = ln (10) = 2.303
ln [odds (MS | S)] = ln (0.182) = -1.704

These two equations can be combined into the following equation, which gives the log of the odds as a function of the size of the FI :

ln [odds (MS | SIZE)] = -1.704 + 4.007 × SIZE (3)

where SIZE = 1 if the FI is large and SIZE = 0 if the FI is small.
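As a quick check of eqn. (3), the intercept is the log odds for small institutions (SIZE = 0) and the slope is the difference between the two log odds, i.e. the log odds ratio. The sketch below uses the unrounded odds 2/11, so its intercept is about -1.705 rather than the -1.704 obtained from the rounded value 0.182; the names are illustrative.

```python
import math

odds_large = 10 / 1   # odds (MS | L), eqn. (1)
odds_small = 2 / 11   # odds (MS | S), eqn. (2)

# Intercept: log odds when SIZE = 0 (small institutions).
intercept = math.log(odds_small)

# Slope: change in log odds from SIZE = 0 to SIZE = 1 (log odds ratio).
slope = math.log(odds_large) - math.log(odds_small)

def log_odds(size: int) -> float:
    """Eqn. (3): log odds of MS as a function of SIZE (1 = large, 0 = small)."""
    return intercept + slope * size

print(round(intercept, 3), round(slope, 3))
print(round(log_odds(1), 3), round(log_odds(0), 3))
```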


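In general, for k independent variables, (3) can be written as

$$\ln\big[\text{odds}(MS \mid X_1, \dots, X_k)\big] = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

or

$$\ln\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

where

$$\text{odds}(MS \mid X_1, \dots, X_k) = \frac{p}{1-p}$$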

Interpreting β: Odds, Probabilities, and Linear Approximations

For a binary response variable Y and an explanatory variable X, let π(x) = P(Y=1|X=x) = 1 – P(Y=0|X=x). The logistic regression model is

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} \qquad (4)$$

Equivalently, the log odds, called the logit, has the linear relationship

$$\operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x \qquad (5)$$


How can we interpret β in (5)?

Its sign determines whether π(x) is increasing or decreasing as x increases.
The rate of climb or descent increases as |β| increases; as β → 0 the curve flattens to a horizontal straight line. When β = 0, Y is independent of X.
Since the logistic density is symmetric, π(x) approaches 1 at the same rate that it approaches 0.

The intercept parameter α is not usually of particular interest. However, by centering the predictor about 0 (i.e., replacing x by x − x̄), α becomes the logit at the mean, and thus

$$\frac{e^{\alpha}}{1 + e^{\alpha}} = \pi(\bar{x})$$
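The role of the sign of β and the effect of centering can be illustrated numerically. The sketch below uses hypothetical values of α, β, and x that are not taken from the slides.

```python
import math

def pi(x, alpha, beta):
    """Model (4): P(Y = 1 | X = x) under the logistic regression model."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]     # hypothetical predictor values
x_bar = sum(xs) / len(xs)          # mean of the predictor
alpha, beta = -3.0, 1.5            # hypothetical parameter values

increasing = [pi(x, alpha, beta) for x in xs]    # beta > 0: curve climbs
decreasing = [pi(x, alpha, -beta) for x in xs]   # beta < 0: curve descends
flat = [pi(x, alpha, 0.0) for x in xs]           # beta = 0: no dependence on x

# Centering: logit = alpha_c + beta * (x - x_bar), so alpha_c = alpha + beta * x_bar.
alpha_c = alpha + beta * x_bar
pi_at_mean = math.exp(alpha_c) / (1.0 + math.exp(alpha_c))

print(increasing)
print(decreasing)
print(flat)
print(pi_at_mean, pi(x_bar, alpha, beta))
```

With the centered predictor, the transformed intercept gives the probability at the mean of x directly.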

Looking at the Data

Before fitting the model and making such interpretations, look at the data to check that the logistic regression model is appropriate.

Since Y takes only values 0 and 1, it is difficult to check this by plotting Y against x.

It can be helpful to plot sample proportions or logits against x.

Let ni denote the number of observations at setting i of x. Of them, let yi denote the number of “1” outcomes, with pi = yi/ni.

Sample logit i is

$$\log\left[\frac{p_i}{1 - p_i}\right] = \log\left[\frac{y_i}{n_i - y_i}\right]$$

This is not finite when yi = 0 or yi = ni. The adjustment

$$\log\left[\frac{y_i + \tfrac{1}{2}}{n_i - y_i + \tfrac{1}{2}}\right]$$

remains finite in both cases.
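A small sketch of the sample logit and its adjusted version follows. The grouped data are hypothetical and chosen so that the first and last settings have yi = 0 and yi = ni; the function names are illustrative.

```python
import math

def sample_logit(y, n):
    """log[y / (n - y)]; returns None when it is not finite (y = 0 or y = n)."""
    if y == 0 or y == n:
        return None
    return math.log(y / (n - y))

def adjusted_logit(y, n):
    """log[(y + 1/2) / (n - y + 1/2)]; finite for every 0 <= y <= n."""
    return math.log((y + 0.5) / (n - y + 0.5))

# Hypothetical grouped data: (setting of x, n_i, y_i).
groups = [(1, 10, 0), (2, 10, 3), (3, 10, 7), (4, 10, 10)]

for x, n, y in groups:
    print(x, y / n, sample_logit(y, n), round(adjusted_logit(y, n), 3))
```

Plotting the adjusted logits against x should look roughly linear if the logistic regression model is appropriate.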

Types of Inference : Hypothesis Test (Optional)

For the model with a single predictor,

$$\operatorname{logit}[\pi(x)] = \alpha + \beta x,$$

significance tests focus on H0 : β = 0, the hypothesis of independence.

Wald test

The Wald test uses the log likelihood at $\hat{\beta}$, with test statistic

$$z = \frac{\hat{\beta}}{SE}$$

or its square; under H0, $z^2$ is asymptotically $\chi^2_1$.

The Likelihood-ratio test

The likelihood-ratio test uses twice the difference between the maximized log likelihood at $\hat{\beta}$ and at β = 0, and also has an asymptotic $\chi^2_1$ null distribution.

The Score test

The score test uses the log likelihood at β = 0 through the derivative of the log likelihood (i.e. the score function) at that point.
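The three tests can be sketched on the financial-institution data with SIZE as the only predictor. This sketch relies on two facts that hold for a single binary predictor and are not stated in the slides: the maximum likelihood estimates are available in closed form from the 2 × 2 table, and the score statistic coincides with the Pearson chi-squared statistic of that table. The names are illustrative, and the log likelihoods omit the binomial coefficients, which cancel in the likelihood-ratio statistic.

```python
import math

# Grouped data by SIZE: y = number of MS institutions out of n.
y1, n1 = 10, 11   # SIZE = 1 (large)
y0, n0 = 2, 13    # SIZE = 0 (small)

def logit(p):
    return math.log(p / (1.0 - p))

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

# Wald test: z = beta_hat / SE, with beta_hat the sample log odds ratio.
beta_hat = logit(y1 / n1) - logit(y0 / n0)
se = math.sqrt(1 / y1 + 1 / (n1 - y1) + 1 / y0 + 1 / (n0 - y0))
z = beta_hat / se
wald = z ** 2

# Likelihood-ratio test: twice the gap between the maximized log likelihoods.
def binom_loglik(y, n, p):
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

p_null = (y1 + y0) / (n1 + n0)   # fitted probability when beta = 0
ll_full = binom_loglik(y1, n1, y1 / n1) + binom_loglik(y0, n0, y0 / n0)
ll_null = binom_loglik(y1, n1, p_null) + binom_loglik(y0, n0, p_null)
lr = 2.0 * (ll_full - ll_null)

# Score test: Pearson chi-squared statistic, using fits under beta = 0 only.
observed = [y1, n1 - y1, y0, n0 - y0]
expected = [n1 * p_null, n1 * (1 - p_null), n0 * p_null, n0 * (1 - p_null)]
score = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for name, stat in [("Wald", wald), ("LR", lr), ("Score", score)]:
    print(name, round(stat, 3), chi2_1_pvalue(stat))
```

The three statistics differ in value on a sample this small, although all are referred to the same chi-squared distribution with one degree of freedom.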

Types of Inference : Confidence Interval (Optional)

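An interval for β results from inverting a test of H0 : β = β0. The interval is the set of β0 for which the chi-squared test statistic is no greater than $\chi^2_1(\alpha) = z^2_{\alpha/2}$. The Wald confidence interval is the set of β0 for which

$$\left[\frac{\hat{\beta} - \beta_0}{SE}\right]^2 \le z^2_{\alpha/2}, \quad \text{or} \quad \hat{\beta} \pm z_{\alpha/2}(SE)$$

For the estimated logit $\hat{\alpha} + \hat{\beta} x_0$ at a fixed setting x = x0, SE is given by the estimated square root of

$$\operatorname{var}(\hat{\alpha} + \hat{\beta} x_0) = \operatorname{var}(\hat{\alpha}) + x_0^2 \operatorname{var}(\hat{\beta}) + 2 x_0 \operatorname{cov}(\hat{\alpha}, \hat{\beta})$$

A 95% confidence interval for logit [π(x0)] is

$$(\hat{\alpha} + \hat{\beta} x_0) \pm 1.96\, SE$$

Substituting each endpoint into the inverse transformation

$$\pi(x_0) = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})}$$

gives a corresponding interval for π(x0).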
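The interval calculations can be sketched on the financial-institution data with SIZE as the only predictor. The variances below are the usual large-sample estimates for a 2 × 2 table (sums of reciprocal cell counts), which is an assumption of this sketch rather than something given in the slides; the names are illustrative.

```python
import math

# Grouped data by SIZE: y = number of MS institutions out of n.
y1, n1 = 10, 11   # SIZE = 1 (large)
y0, n0 = 2, 13    # SIZE = 0 (small)

alpha_hat = math.log(y0 / (n0 - y0))              # logit for small FIs
beta_hat = math.log(y1 / (n1 - y1)) - alpha_hat   # log odds ratio

# Estimated variances and covariance (assumed large-sample formulas).
var_alpha = 1 / y0 + 1 / (n0 - y0)
var_beta = var_alpha + 1 / y1 + 1 / (n1 - y1)
cov_ab = -var_alpha

Z = 1.96  # z_{alpha/2} for a 95% interval

# Wald interval for beta.
se_beta = math.sqrt(var_beta)
beta_ci = (beta_hat - Z * se_beta, beta_hat + Z * se_beta)

def logit_ci(x0):
    """95% interval for logit[pi(x0)] using var(alpha_hat + beta_hat * x0)."""
    est = alpha_hat + beta_hat * x0
    se = math.sqrt(var_alpha + x0 ** 2 * var_beta + 2 * x0 * cov_ab)
    return est - Z * se, est + Z * se

def inv_logit(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

lo, hi = logit_ci(1)                    # interval on the logit scale, SIZE = 1
pi_ci = (inv_logit(lo), inv_logit(hi))  # corresponding interval for pi(1)

print(beta_ci)
print((lo, hi), pi_ci)
```

For SIZE = 1 the variance terms collapse to 1/10 + 1/1, the variance of the sample logit among large institutions, which is a useful sanity check on the covariance term.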

ANOVA-Type Representation of Factors

Like ordinary regression, logistic regression extends to include qualitative explanatory variables, often called factors. We use dummy variables to do this.

For simplicity, we first consider a single factor X, with I categories. In row i of the I × 2 table, yi is the number of outcomes in the first column (successes) out of ni trials.

We treat yi as binomial with parameter πi. The logit model with a factor is

$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i$$

The right-hand side resembles the model formula for cell means in one-way ANOVA.


With I categories, X has I – 1 nonredundant parameters. One parameter can be set to 0, say βI = 0. If the values do not satisfy this, we can recode so that it is true. For instance, set

$$\tilde{\beta}_i = \beta_i - \beta_I \quad \text{and} \quad \tilde{\alpha} = \alpha + \beta_I,$$

which satisfy $\tilde{\beta}_I = 0$. Then

$$\operatorname{logit}(\pi_i) = \alpha + \beta_i = (\alpha + \beta_I) + (\beta_i - \beta_I) = \tilde{\alpha} + \tilde{\beta}_i,$$

where the newly defined parameters satisfy the constraint. When βI = 0, α equals the logit in row I, and βi is the difference between the logits in rows i and I. Thus, βi equals the log odds ratio for that pair of rows.
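The recoding can be checked numerically: shifting β_I into the intercept and subtracting it from every β_i leaves each row's logit unchanged. The parameter values below are hypothetical, with the last list entry playing the role of category I.

```python
# Hypothetical parameters that do not satisfy beta_I = 0.
alpha = 0.4
betas = [1.2, -0.3, 0.7]   # last entry is beta_I

beta_I = betas[-1]
alpha_tilde = alpha + beta_I                 # recoded intercept
betas_tilde = [b - beta_I for b in betas]    # recoded effects, last one is 0

logits_before = [alpha + b for b in betas]
logits_after = [alpha_tilde + b for b in betas_tilde]

print(betas_tilde)
print(logits_before)
print(logits_after)
```

After recoding, the intercept equals the logit in row I and each remaining effect is a difference of logits relative to that row.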

Multiple Logistic Regression

The model for π(x) = P(Y=1) at values x = (x1, …, xp) of p predictors is

$$\operatorname{logit}[\pi(x)] = \alpha + \beta_1 x_1 + \dots + \beta_p x_p$$

The alternative formula, directly specifying π(x), is

$$\pi(x) = \frac{\exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}$$

The parameter βi refers to the effect of xi on the log odds that Y = 1, controlling for the other xj.
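The interpretation of βi can be illustrated by raising one predictor by one unit while holding the others fixed: the odds are multiplied by exp(βi). The coefficients and predictor values in this sketch are hypothetical and are not estimates from the slides.

```python
import math

def predict_prob(alpha, betas, x):
    """pi(x) = exp(eta) / (1 + exp(eta)), eta = alpha + sum_i beta_i * x_i."""
    eta = alpha + sum(b * xi for b, xi in zip(betas, x))
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for p = 2 predictors.
alpha, betas = -2.0, [1.0, 0.5]

x = [1.0, 2.0]        # baseline predictor values
x_plus = [2.0, 2.0]   # x_1 raised by one unit, x_2 held fixed

p_base = predict_prob(alpha, betas, x)
p_plus = predict_prob(alpha, betas, x_plus)

# Ratio of the odds after and before the one-unit change in x_1.
odds_ratio = odds(p_plus) / odds(p_base)

print(p_base, p_plus)
print(odds_ratio, math.exp(betas[0]))
```

The odds ratio does not depend on the value at which the other predictor is held, which is what "controlling for the other xj" means on the log odds scale.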