Chapter 3: Generalized Linear Models (3.1 The Generalization; 3.2 Logistic Regression Revisited; 3.3 Poisson Regression)

Upload: joshua-sharp

Post on 12-Jan-2016


Chapter 3: Generalized Linear Models

3.1 The Generalization

3.2 Logistic Regression Revisited

3.3 Poisson Regression

3.1 The Generalization

Objectives
– Review the linear model.
– Generalize the linear model.
– Describe several common generalized linear models.

Review: Linear Models

A model is linear in the parameters when each term contains only one parameter and that parameter is a multiplicative constant.
– Linearity refers to the parameters, not to a straight-line relationship between the response and the predictors.

The response is modeled as a linear combination of terms.

This model is linear (the interaction and squared terms are allowed because each term still has a single multiplicative parameter):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2$$

This model is not linear ($\beta_1$ appears inside the exponential rather than as a multiplier):

$$y = \beta_0 e^{\beta_1 x}$$

Linear Model Error

The linear model describes the expected value, or mean, of the response.

The linear model treats the response errors as normally distributed deviations with a mean of 0 and a constant variance.
– The variance does not depend on any explanatory variable.

The errors are added to the expectation.
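For reference, the normal linear model described above can be written as follows. The symbols $\mu_i$ and $x_{ij}$ are chosen here to match the GLM notation used later, and $x_{i0} = 1$ is assumed so that $\beta_0$ is the intercept.

$$y_i = \mu_i + \varepsilon_i,\qquad \mu_i = E(Y_i) = \sum_j \beta_j x_{ij},\qquad \varepsilon_i \sim N(0,\ \sigma^2)$$

Because $\sigma^2$ carries no subscript $i$, the variance is the same for every observation, which is what "constant variance" means.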


Generalized Linear Model

The linear model can be generalized to responses that are not normally distributed by modeling a function of the mean.
– A random component uses any distribution in the natural exponential family.
– A systematic component relates the predictors to the response.
– A link function relates the mean response to the systematic component.

Random Component

The random component uses any distribution in the natural exponential family. The PMF or PDF has this form:

$$f(y_i;\theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$

– $a(\theta_i)$ is a function of the distribution parameter.
– $b(y_i)$ is a function of the response.
– $Q(\theta_i)$ is the natural parameter.

Random Component in JMP

The following distributions are available to serve as the random component of a GLM in JMP:
– Normal
– Binomial
– Poisson
– Exponential

Systematic Component

The systematic component uses a linear model, the linear predictor:

$$\eta_i = \sum_j \beta_j x_{ij}$$

Systematic Component in JMP

Use Fit Model to specify the systematic component, as you would for ordinary least squares regression. Create linear combinations of effects by adding terms made from data columns.

Link Component

The link function g relates the random component and the systematic component:

$$\mu_i = E(Y_i),\qquad g(\mu_i) = \eta_i = \sum_j \beta_j x_{ij}$$

The link is a monotonic, differentiable function. It is the canonical link function if it transforms the mean to the natural parameter, that is, if $g(\mu_i) = Q(\theta_i)$.

Link Component in JMP

The following functions are available to serve as the link component in JMP:
– Identity
– Log
– Logit
– Reciprocal
– Probit
– Power: $g(\mu) = \mu^{\lambda}$ if $\lambda \neq 0$; $g(\mu) = \log(\mu)$ if $\lambda = 0$
– Complementary log-log: $g(\mu) = \log(-\log(1-\mu))$
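The link functions listed above can be sketched as pairs of a link g (mean to linear predictor) and its inverse (linear predictor back to mean). This is a minimal illustration under stated assumptions, not JMP's implementation: the names `LINKS` and `power_link` are hypothetical, and the probit uses the standard normal distribution from Python's `statistics` module.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link


def power_link(lam):
    """Hypothetical helper: power link mu**lam for lam != 0, log(mu) for lam == 0."""
    if lam == 0:
        return (math.log, math.exp)
    return (lambda mu: mu ** lam, lambda eta: eta ** (1.0 / lam))


# Each entry is (g, g_inverse): g maps the mean mu to the linear predictor eta,
# and g_inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}
```

For any mean inside the link's domain, applying g and then its inverse should return the original mean; that round trip is a quick way to check each pair.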


3.01 Quiz

Match the component of a GLM on the top with its representation or an example on the bottom.

A. Random component
B. Systematic component
C. Link component

1. $\log\left(\frac{\pi_i}{1-\pi_i}\right)$
2. $\eta_i = \sum_j \beta_j x_{ij}$
3. $f(y_i;\theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$

3.01 Quiz – Correct Answer

The correct answer is A-3, B-2, and C-1: the exponential-family density is the random component, the linear predictor is the systematic component, and the logit is the link component.

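Binary Logistic Regression

A binary response can also be modeled with a GLM. The canonical link function is the logit.

$$f(y_i;\pi_i) = a(\pi_i)\, b(y_i)\, e^{y_i Q(\pi_i)}$$

$$f(y_i;\pi_i) = \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i-y_i} = (1-\pi_i)^{n_i} \binom{n_i}{y_i} e^{y_i \log\left(\frac{\pi_i}{1-\pi_i}\right)}$$

$$a(\pi_i) = (1-\pi_i)^{n_i},\qquad b(y_i) = \binom{n_i}{y_i},\qquad Q(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$$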
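A quick numerical check of the binomial factorization: the sketch below computes each probability directly and through $a(\pi)\,b(y)\,e^{yQ(\pi)}$, assuming grouped data with n trials and success probability p. The function names are illustrative only.

```python
import math


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p), written in the usual way."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def binomial_pmf_exp_family(y, n, p):
    """The same probability written as a(pi) * b(y) * exp(y * Q(pi))."""
    a = (1.0 - p) ** n               # a(pi): depends only on the parameter
    b = math.comb(n, y)              # b(y): depends only on the response
    q = math.log(p / (1.0 - p))      # Q(pi): natural parameter, the logit of pi
    return a * b * math.exp(y * q)
```

Because $Q(\pi)$ is the logit, using the logit as the link sets the natural parameter equal to the linear predictor, which is what makes it the canonical link.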


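Poisson Regression

A simple model of counts is the Poisson distribution. The canonical link function is the log.

$$f(y_i;\mu_i) = a(\mu_i)\, b(y_i)\, e^{y_i Q(\mu_i)}$$

$$f(y_i;\mu_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} = e^{-\mu_i}\, \frac{1}{y_i!}\, e^{y_i \log(\mu_i)}$$

$$a(\mu_i) = e^{-\mu_i},\qquad b(y_i) = \frac{1}{y_i!},\qquad Q(\mu_i) = \log(\mu_i)$$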
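The Poisson factorization can be checked the same way. This sketch assumes a single mean mu per observation; the function names are illustrative, not part of JMP.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), written in the usual way."""
    return mu**y * math.exp(-mu) / math.factorial(y)


def poisson_pmf_exp_family(y, mu):
    """The same probability written as a(mu) * b(y) * exp(y * Q(mu))."""
    a = math.exp(-mu)               # a(mu): depends only on the parameter
    b = 1.0 / math.factorial(y)     # b(y): depends only on the response
    q = math.log(mu)                # Q(mu) = log(mu), so the log link is canonical
    return a * b * math.exp(y * q)
```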


Poisson Loglinear Model with Offset

The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.

$\log(N_i)$ is the offset.

$$\log\left(\frac{\mu_i}{N_i}\right) = \sum_j \beta_j x_{ij}$$

$$\log(\mu_i) = \log(N_i) + \sum_j \beta_j x_{ij}$$

$$\mu_i = N_i\, e^{\sum_j \beta_j x_{ij}}$$
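The last form of the offset model can be sketched directly: $\log(N_i)$ is added to the linear predictor with its coefficient fixed at 1, so the implied rate $\mu_i / N_i$ depends only on the predictors. The function name and the coefficients below are hypothetical.

```python
import math


def poisson_mean_with_offset(exposure, betas, xs):
    """Expected count mu_i = N_i * exp(sum_j beta_j * x_ij).

    log(exposure) is the offset: it enters the linear predictor
    with its coefficient fixed at 1 instead of being estimated.
    """
    eta = math.log(exposure) + sum(b * x for b, x in zip(betas, xs))
    return math.exp(eta)


# Hypothetical coefficients: intercept -2.0 (with x_i0 = 1) and slope 0.5 on one predictor.
betas = [-2.0, 0.5]
xs = [1.0, 1.0]
small = poisson_mean_with_offset(100, betas, xs)
large = poisson_mean_with_offset(2500, betas, xs)
# small / 100 and large / 2500 are the same rate; only the exposure differs.
```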


Common Generalized Linear Models

Distribution   Link              Predictors    Method
Normal         Identity          Continuous    OLS regression
Normal         Identity          Categorical   ANOVA
Binomial       Logit             Both          Logistic regression
Binomial       Compl. log-log    Continuous    Asymmetric
Poisson        Log               Both          Poisson regression
Binomial       Inverse normal    Both          Probit analysis
Exponential    Reciprocal        Both          Nonlinear regression

Deviance

The deviance is a measure of goodness of fit.

The deviance assesses the difference between the observed and the predicted response.
– For a well-fitting model, the differences should be random, and the deviance is approximately chi-square distributed.

The deviance assesses the value of the explanatory variables in the model.
– Comparing deviances aids model selection.

The deviance is twice the difference in log-likelihood between the saturated model and the full model.
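To make the definition concrete, the sketch below computes a Poisson deviance as twice the log-likelihood gap between the saturated model (fitted means equal to the observed counts) and a fitted model, then checks it against the familiar closed form $2\sum_i [y_i \log(y_i/\mu_i) - (y_i - \mu_i)]$. The fitted means are made-up numbers, not JMP output.

```python
import math


def poisson_loglik(ys, mus):
    """Poisson log-likelihood; y * log(mu) is taken as 0 when y == 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        total += (y * math.log(mu) if y > 0 else 0.0) - mu - math.lgamma(y + 1)
    return total


def poisson_deviance(ys, mus):
    """2 * (log-likelihood of the saturated model - log-likelihood of the fitted model).

    The saturated model has one mean per observation, so its fitted means are y_i.
    """
    return 2.0 * (poisson_loglik(ys, ys) - poisson_loglik(ys, mus))


ys = [2, 0, 5, 3]
mus = [2.5, 0.7, 4.1, 3.0]  # hypothetical fitted means
closed_form = 2.0 * sum(
    (y * math.log(y / m) if y > 0 else 0.0) - (y - m) for y, m in zip(ys, mus)
)
```

The log(y!) terms appear in both log-likelihoods and cancel, which is why the closed form does not contain them.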


Over-Dispersion

Lack of fit can result from more variance than the model distribution implies.

An over-dispersion parameter can be used to account for the excess variance when the distribution is binomial or Poisson.
– The parameter equals 1 when there is no over-dispersion.
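One common way to estimate an over-dispersion parameter is the Pearson chi-square statistic divided by its degrees of freedom. The sketch assumes a Poisson fit with `n_params` estimated parameters; JMP's exact estimator and tests may differ, and the counts are made up.

```python
def pearson_dispersion(ys, mus, n_params):
    """Pearson chi-square divided by its degrees of freedom (n - p) for a Poisson fit.

    A value near 1 is consistent with no over-dispersion; a value well above 1
    suggests more variance than the Poisson distribution allows.
    """
    chi_square = sum((y - m) ** 2 / m for y, m in zip(ys, mus))
    return chi_square / (len(ys) - n_params)


# Hypothetical counts fit by an intercept-only model, so every fitted mean is 5.
ys = [0, 10, 1, 9]
mus = [5.0] * len(ys)
phi = pearson_dispersion(ys, mus, n_params=1)
```

Here the counts vary far more than a Poisson mean of 5 would allow, so the estimate is well above 1.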


3.02 Multiple Answer Poll

Which of the following statements are true of the deviance associated with a GLM?

a. The deviance is the difference between the predicted response and the observed response.

b. The deviance is twice the difference in log-likelihood between the saturated model and the full model.

c. The deviance is a measure of goodness of fit.

d. The deviance measures the variance of the response.

3.02 Multiple Answer Poll – Correct Answer

The true statements are b and c. The deviance is a goodness-of-fit measure equal to twice the difference in log-likelihood between the saturated model and the full model; it is neither the raw difference between the predicted and observed responses nor the variance of the response.

3.2 Logistic Regression Revisited

Objectives
– Review binary logistic regression models.
– Model binary responses with a GLM.

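Binary Logistic Regression Model

$$\operatorname{logit}[\pi(x)] = \log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_1 x$$

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$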
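The two forms of the binary logistic model are inverses of each other. The sketch below, with hypothetical coefficients, computes $\pi(x)$ from the linear predictor, checks that the logit recovers $\beta_0 + \beta_1 x$, and shows why $e^{\beta_1}$ is reported as an odds ratio.

```python
import math


def logit(p):
    """log(p / (1 - p)), the log-odds."""
    return math.log(p / (1.0 - p))


def prob_of_x(b0, b1, x):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))


def odds(p):
    """Odds p / (1 - p) of the target level."""
    return p / (1.0 - p)


# Hypothetical coefficients.
b0, b1 = -1.0, 0.5
```

Increasing x by 1 multiplies the odds by $e^{\beta_1}$, regardless of the starting value of x.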


Advantage of Using Logistic Regression

JMP provides the following when you use logistic regression:
– Likelihood ratio test for lack of fit
– Many measures of goodness of fit
– Profiles of probability for all levels of the predictor
– Odds ratios
– ROC curve
– Lift curve
– Confusion matrix

Advantage of Using Binary GLM

JMP provides the following when you use a GLM for a binary response:
– Deviance for lack of fit
– Over-dispersion model parameter
– Likelihood ratio test for over-dispersion
– Four residual plots
– Prediction profiler for probability of target level

GLM for a Binary Response

A binary response can also be modeled with a GLM. The canonical link function is the logit.

$$f(y_i;\pi_i) = a(\pi_i)\, b(y_i)\, e^{y_i Q(\pi_i)}$$

$$f(y_i;\pi_i) = \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i-y_i} = (1-\pi_i)^{n_i} \binom{n_i}{y_i} e^{y_i \log\left(\frac{\pi_i}{1-\pi_i}\right)}$$

$$a(\pi_i) = (1-\pi_i)^{n_i},\qquad b(y_i) = \binom{n_i}{y_i},\qquad Q(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$$

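Separation Problem

In a given sample, the binary outcomes might be completely separated by the explanatory variable: for example, every observation below some value of x is a failure and every observation above it is a success.

This separation causes a problem with estimating the logistic regression or GLM parameters: the likelihood keeps increasing as a slope grows without bound, so finite maximum likelihood estimates do not exist.

Firth’s penalized maximum likelihood estimation method can avoid this problem and reduce bias in the parameter estimates in the case of rare outcomes.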
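The separation problem can be seen numerically. The sketch below uses a made-up, completely separated sample and, for brevity, a logistic model with a slope but no intercept (a simplifying assumption). The log-likelihood keeps rising toward 0 as the slope grows, so no finite maximum exists. It does not implement Firth's method.

```python
import math


def logistic_loglik(beta, xs, ys):
    """Log-likelihood of a no-intercept logistic model, pi_i = 1 / (1 + exp(-beta * x_i))."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        if y == 1:
            total -= math.log1p(math.exp(-eta))  # log(pi_i)
        else:
            total -= math.log1p(math.exp(eta))   # log(1 - pi_i)
    return total


# Completely separated sample: every negative x has y = 0, every positive x has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
values = [logistic_loglik(beta, xs, ys) for beta in (1.0, 5.0, 10.0, 20.0)]
```

An iterative fitting routine applied to such data would keep increasing the slope until it hit an iteration limit, which is why penalized methods such as Firth's are used instead.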


Pearson Residuals

Pearson chi-square for goodness of fit is the sum of the squared Pearson residuals.

$$e_i = \frac{y_i - n_i \hat{\pi}_i}{\sqrt{n_i \hat{\pi}_i (1 - \hat{\pi}_i)}}$$
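The Pearson residual formula translates directly into code. This sketch assumes grouped binomial data (y successes out of n trials, with fitted probability pi for each group); the function names and data are hypothetical.

```python
import math


def pearson_residuals(ys, ns, pis):
    """e_i = (y_i - n_i*pi_i) / sqrt(n_i*pi_i*(1 - pi_i)) for grouped binomial data."""
    return [(y - n * p) / math.sqrt(n * p * (1.0 - p)) for y, n, p in zip(ys, ns, pis)]


def pearson_chi_square(ys, ns, pis):
    """Pearson chi-square: the sum of the squared Pearson residuals."""
    return sum(e * e for e in pearson_residuals(ys, ns, pis))


# Hypothetical grouped data and fitted probabilities.
ys, ns, pis = [3, 8], [10, 10], [0.5, 0.6]
```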


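Deviance Residuals

The deviance chi-square for goodness of fit is the sum of the squared deviance residuals.

$$d_i = s_i \left[ 2 y_i \log\left(\frac{y_i}{n_i \hat{\pi}_i}\right) + 2 (n_i - y_i) \log\left(\frac{n_i - y_i}{n_i (1 - \hat{\pi}_i)}\right) \right]^{1/2},\qquad s_i = \operatorname{sign}(y_i - n_i \hat{\pi}_i)$$

Studentized residuals provide a common scale for inspection.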
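The deviance residual can be sketched the same way for grouped binomial data. The sketch assumes the usual convention that a term $0 \cdot \log(0)$ counts as 0, which matters when a group has no successes or no failures; the function names and data are hypothetical.

```python
import math


def _y_log_ratio(a, b):
    """a * log(a / b), with the convention that the term is 0 when a == 0."""
    return a * math.log(a / b) if a > 0 else 0.0


def deviance_residuals(ys, ns, pis):
    """Binomial deviance residuals d_i = s_i * sqrt(2 y log(y/(n pi)) + 2 (n-y) log((n-y)/(n(1-pi))))."""
    residuals = []
    for y, n, p in zip(ys, ns, pis):
        fitted = n * p
        d2 = 2.0 * _y_log_ratio(y, fitted) + 2.0 * _y_log_ratio(n - y, n - fitted)
        d2 = max(d2, 0.0)  # guard against tiny negative values from rounding
        sign = 1.0 if y > fitted else (-1.0 if y < fitted else 0.0)
        residuals.append(sign * math.sqrt(d2))
    return residuals


def deviance_chi_square(ys, ns, pis):
    """Sum of the squared deviance residuals."""
    return sum(d * d for d in deviance_residuals(ys, ns, pis))
```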


3.03 Quiz

What are the three GLM components for a binary response?

3.03 Quiz – Correct Answer

What are the three GLM components for a binary response?

Random component is the binomial distribution.

Systematic component is a linear predictor: a linear combination of the explanatory terms, which can include polynomial terms.

Link component is the logit function.

GLM for Binary Response Example

Use GLM with the Titanic Passengers data set to relate Survived to Siblings and Spouses, Parents and Children, and Fare.

GLM for a Binary Response

This demonstration illustrates the concepts discussed previously.

Exercise

This exercise reinforces the concepts discussed previously.


3.3 Poisson Regression

Objectives
– Identify responses that are counts.
– Use a GLM that is also known as Poisson loglinear regression.

Response Is Counts

In many cases, the response is simply the count of a particular event:
– Occurrence of a disease
– Road accidents
– Mold colonies
– Number of non-conforming items

Response Is Counts, Constant Opportunity

The response can be the count of a particular event in the same span of time, linear dimension, area, or volume:
– Occurrence of a disease per annum
– Road accidents each month on the same highway
– Mold colonies in a standard Petri dish
– Number of non-conforming items in a standard lot size

Poisson Regression

A simple model of counts is the Poisson distribution. The canonical link function is the log.

$$f(y_i;\mu_i) = a(\mu_i)\, b(y_i)\, e^{y_i Q(\mu_i)}$$

$$f(y_i;\mu_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} = e^{-\mu_i}\, \frac{1}{y_i!}\, e^{y_i \log(\mu_i)}$$

$$a(\mu_i) = e^{-\mu_i},\qquad b(y_i) = \frac{1}{y_i!},\qquad Q(\mu_i) = \log(\mu_i)$$

Response Is Counts, Opportunity Varies

The response can be the count of a particular event when the span of time, linear dimension, area, or volume differs among observations:
– Occurrence of a disease in different hospitals
– Road accidents on different highways
– Mold colonies in nonstandard field cases
– Number of non-conforming items in lots of different sizes

This situation requires an offset term in the model.
– The offset acts like an intercept that varies by observation, except that its coefficient is fixed at 1 rather than estimated.

Poisson Loglinear Model with Offset

The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.

$\log(N_i)$ is the offset.

$$\log\left(\frac{\mu_i}{N_i}\right) = \sum_j \beta_j x_{ij}$$

$$\log(\mu_i) = \log(N_i) + \sum_j \beta_j x_{ij}$$

$$\mu_i = N_i\, e^{\sum_j \beta_j x_{ij}}$$

3.04 Multiple Answer Poll

How is the logarithm always used in Poisson regression with GLM?

a. Transform the response variable

b. Transform the explanatory variable

c. Transform the offset variable

d. Link the systematic and random components

e. Increase the over-dispersion

3.04 Multiple Answer Poll – Correct Answer

How is the logarithm always used in Poisson regression with GLM?

a. Transform the response variable

b. Transform the explanatory variable

c. Transform the offset variable

d. Link the systematic and random components

e. Increase the over-dispersion

Contrasts

For a categorical explanatory variable with just two levels, the effect test in the GLM analysis is a sufficient test of the difference between the expected responses at the two levels.

When a categorical predictor has more than two levels, another level of tests is possible: a contrast.
– A contrast can test one level against another.
– A contrast can test a combination of levels against another level or another combination of levels.

Contrasts are based on a likelihood ratio test.
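A likelihood ratio contrast of one level against another can be sketched for Poisson counts. The sketch assumes a one-way layout with a separate mean per level and no other predictors; under that assumption, levels other than the two being compared have the same fitted mean with or without the constraint, so their contributions cancel and are omitted. The function names and counts are hypothetical, and JMP's contrast reports may be computed differently.

```python
import math


def _poisson_kernel(counts, mu):
    """Poisson log-likelihood of counts sharing mean mu, without the log(y!) terms."""
    return sum(y * math.log(mu) - mu for y in counts)


def lr_contrast_two_levels(level_a, level_b):
    """Likelihood ratio test of H0: mean(level A) == mean(level B) for Poisson counts.

    Assumes every group mean is positive.
    """
    mu_a = sum(level_a) / len(level_a)          # MLE of each level's mean
    mu_b = sum(level_b) / len(level_b)
    pooled = (sum(level_a) + sum(level_b)) / (len(level_a) + len(level_b))  # MLE under H0
    full = _poisson_kernel(level_a, mu_a) + _poisson_kernel(level_b, mu_b)
    reduced = _poisson_kernel(level_a, pooled) + _poisson_kernel(level_b, pooled)
    stat = max(2.0 * (full - reduced), 0.0)     # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value
```

The p-value uses the identity $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$, which holds because a chi-square variable with 1 degree of freedom is the square of a standard normal.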


Poisson Regression Example

The number of new melanoma cases among white males was reported for 1969-1971 in two areas.
– Region is Northern or Southern.
– Age Group is <35, 35-44, 45-54, 55-64, 65-74, and >75.
– Cases is the number of patients with a new melanoma.
– Total is the number of patients in each region and age group. The offset is log(Total).

The total number of patients varies, so the model needs an offset.

GLM for Counts

This demonstration illustrates the concepts discussed previously.

Exercise

This exercise reinforces the concepts discussed previously.
