Page 1: Diagnostics

Diagnostics

Checking Assumptions and Bad Data

Page 2: Diagnostics

Questions

• What is the linearity assumption? How can you tell if it seems met?

• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?

• What is an outlier?

• What is leverage?

• What is a residual?

• How can you use residuals in assuring that the regression model is a good representation of the data?

• Why consider a standardized residual?

• What is a studentized residual?

Page 3: Diagnostics

Linear Model

• Linear relation between X and Y

• Normal distribution of error of prediction

• Homoscedasticity (homogeneity of error in Y across levels of X)
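
The three assumptions above can be written as a single model statement. This is a sketch in the notation of the later example (a for the intercept, b for the slope); the symbol \sigma_e^2 for the error variance is introduced here and does not appear on the slides.

```latex
Y_i = a + b X_i + e_i, \qquad e_i \sim N(0, \sigma_e^2) \ \text{at every value of } X
```

Linearity is the a + bX part, normality is the distribution of e, and homoscedasticity is the single variance \sigma_e^2 that does not change with X.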

Page 4: Diagnostics

Good-Looking Graph

[Figure: scatterplot of Y against X]

No apparent departures from line.

Page 5: Diagnostics

Same Data, Different Graph

[Figure: residuals plotted against X]

No systematic relations between X and residuals.

Page 6: Diagnostics

Problem with Linearity

[Figure: scatterplot of Miles per Gallon against Horsepower]

R Sq Linear = 0.595
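
One way to see a linearity problem without a plot is to fit a straight line and look at the average residual in the low, middle, and high ranges of X. The sketch below uses hypothetical, deterministic data (Y = 2000 / X on a horsepower-like scale); it is not the car data from the slide. With a curved relation like this one, the residuals are positive at both ends and negative in the middle.

```python
# Hypothetical curved data: NOT the slide's data, just a convex relation.
X = list(range(50, 251, 10))           # 21 evenly spaced predictor values
Y = [2000 / x for x in X]              # outcome falls off as 1 / X

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Ordinary least-squares line.
b = sp_xy / ss_x
a = mean_y - b * mean_x
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# Average residual in the lowest, middle, and highest third of X.
third = n // 3
mean_low = sum(resid[:third]) / third
mean_mid = sum(resid[third:2 * third]) / third
mean_high = sum(resid[2 * third:]) / (n - 2 * third)

print(f"low: {mean_low:+.2f}  middle: {mean_mid:+.2f}  high: {mean_high:+.2f}")
```

A sign pattern of plus, minus, plus (or the reverse) across the range of X is the numeric counterpart of a curved band in the residual plot.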

Page 7: Diagnostics

Problem with Heteroscedasticity

[Figure: scatterplot of Y against X]

A common problem when Y is measured in dollars.
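
A rough numeric check for heteroscedasticity is to compare the typical size of the residuals in the lower and upper halves of X. The sketch below uses hypothetical, deterministic data in which the scatter around the line grows in proportion to X; the data and the half-split rule are illustrative assumptions, not a method taken from the slides.

```python
# Hypothetical data: deviations from the line alternate in sign and grow with X.
X = list(range(1, 21))
Y = [2 + 3 * x + (0.5 * x if x % 2 else -0.5 * x) for x in X]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Ordinary least-squares line and its residuals.
b = sp_xy / ss_x
a = mean_y - b * mean_x
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# Mean absolute residual below and above the middle of the X range.
pairs = sorted(zip(X, resid))
half = n // 2
spread_low = sum(abs(e) for _, e in pairs[:half]) / half
spread_high = sum(abs(e) for _, e in pairs[half:]) / (n - half)

print(f"mean |residual|, low X:  {spread_low:.2f}")
print(f"mean |residual|, high X: {spread_high:.2f}")
```

A much larger spread at one end of X corresponds to the fan shape seen in the plot.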

Page 8: Diagnostics

Outliers

[Figure: scatterplot of Y against X with one point labeled "Outlier"]

Outlier = pathological point

Page 9: Diagnostics

Review

• What is the linearity assumption? How can you tell if it seems met?

• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?

• What is an outlier?

Page 10: Diagnostics

Residuals

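• Zresid

\[ z_{resid} = \frac{Y - Y'}{S_{Y.X}} = \frac{e}{SD_{resid}} \]

• Look for large values (some say |z|>2)

• Studentized residual (Student Residual):

\[ \text{Student Residual} = \frac{e_i}{S_{Y.X}\sqrt{1 - \frac{1}{N} - \frac{(X_i - \bar{X})^2}{\sum x^2}}} \]

Here Y' is the predicted value, e = Y - Y' is the residual, and x = X - \bar{X}.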

The studentized residual takes into account the distance of the point's X value from the mean of X. The farther X is from the mean, the smaller the standard error of the residual, and so the larger the studentized residual. Look for large values. See also the studentized deleted residual (RStudent).
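
The residual definitions above can be sketched in a few lines of Python using the seven-point example data from later in this deck. Two assumptions are made here: SDresid is taken to be the standard error of estimate with N - 2 degrees of freedom, and RStudent is obtained from the studentized residual through the usual identity for a two-parameter model rather than by refitting.

```python
import math

# Seven (X, Y) pairs from the worked example in this deck.
X = [2, 3, 3, 4, 4, 5, 8]
Y = [2, 3, 1, 1, 3, 2, 8]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

b = sp_xy / ss_x
a = mean_y - b * mean_x
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# Assumption: SDresid is the standard error of estimate (N - 2 df).
s_yx = math.sqrt(sum(e * e for e in resid) / (n - 2))

# Standardized residual: every residual divided by the same SD.
zresid = [e / s_yx for e in resid]

# Studentized residual: the SD shrinks as X moves away from its mean.
student = [
    e / (s_yx * math.sqrt(1 - 1 / n - (x - mean_x) ** 2 / ss_x))
    for e, x in zip(resid, X)
]

# Studentized deleted residual, via the identity for two fitted parameters.
rstudent = [r * math.sqrt((n - 3) / (n - 2 - r * r)) for r in student]

for x, y, z, r, t in zip(X, Y, zresid, student, rstudent):
    print(f"X={x} Y={y}  Zresid={z:7.3f}  Student={r:7.3f}  RStudent={t:7.3f}")
```

For the point (8, 8) the plain standardized residual is well under 1, while the studentized and deleted versions are much larger, because that point sits far from the mean of X.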

Page 11: Diagnostics

Influence Analysis

• Leverage:

\[ h_i = \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2} \]

• Leverage is an index of the importance of an observation to a regression analysis.
– Function of X only
– Large deviations from mean are influential
– Maximum is 1; min is 1/N
– Average value is (k+1)/N, where k is the number of IVs
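
As a small check on these properties, the sketch below computes leverage for the seven X values of the example later in this deck, using the one-predictor formula (1/N plus the squared deviation from the mean of X over the sum of squared deviations). Variable names are illustrative.

```python
# X values from the worked example in this deck (one predictor, so k = 1).
X = [2, 3, 3, 4, 4, 5, 8]
n = len(X)
k = 1

mean_x = sum(X) / n
ss_x = sum((x - mean_x) ** 2 for x in X)

# Leverage depends on X only; Y never enters the calculation.
h = [1 / n + (x - mean_x) ** 2 / ss_x for x in X]

for x, h_i in zip(X, h):
    print(f"X={x}  h={h_i:.4f}")

print(f"average leverage = {sum(h) / n:.4f}  (k + 1) / N = {(k + 1) / n:.4f}")
```

The case with X = 8 has by far the largest leverage, and the leverages average to (k + 1) / N.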

Page 12: Diagnostics

Influence Analysis (2)

• DFBETA and standardized DFBETA

• Change in slope or intercept resulting when you delete the ith person.

• Allow for influence of both X and Y
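
DFBETA can be sketched directly from its definition: refit the line with each case left out and record how the intercept and slope change. The code below does this for the example data in this deck. The standardized version divides each change by the deleted-case standard error of estimate times the square root of the matching diagonal element of the inverse cross-products matrix; that scaling is a common convention and is an assumption here, since the slides do not spell it out. The helper name fit_line is hypothetical.

```python
import math

# Seven (X, Y) pairs from the worked example in this deck.
X = [2, 3, 3, 4, 4, 5, 8]
Y = [2, 3, 1, 1, 3, 2, 8]


def fit_line(xs, ys):
    """Return intercept, slope, and standard error of estimate (N - 2 df)."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    ssx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ssx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / (m - 2))


n = len(X)
mean_x = sum(X) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
a_full, b_full, _ = fit_line(X, Y)

# Square roots of the diagonal of the inverse cross-products matrix (full data).
c_a = math.sqrt(1 / n + mean_x ** 2 / ss_x)
c_b = math.sqrt(1 / ss_x)

dfbeta = []    # raw change: full-sample estimate minus leave-one-out estimate
dfbetas = []   # standardized change (assumed scaling, see lead-in)
for i in range(n):
    a_i, b_i, s_i = fit_line(X[:i] + X[i + 1:], Y[:i] + Y[i + 1:])
    d_a, d_b = a_full - a_i, b_full - b_i
    dfbeta.append((d_a, d_b))
    dfbetas.append((d_a / (s_i * c_a), d_b / (s_i * c_b)))

for (x, y), raw, std in zip(zip(X, Y), dfbeta, dfbetas):
    print(f"drop ({x}, {y}): raw a={raw[0]:+.4f} b={raw[1]:+.4f}  "
          f"std a={std[0]:+.4f} b={std[1]:+.4f}")
```

Dropping the case (8, 8) changes the slope from about 1.01 to 0, which is why that case dominates both columns. Under the assumed scaling, the standardized intercept change for that case is about -3.53.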

Page 13: Diagnostics

Example

      X      Y
      2      2
      3      3
      3      1
      4      1
      4      3
      5      2
      8      8
M =   4.14   2.86

r = .82; r² = .67; p < .05.

SX = 1.95, SY = 2.41

b = 1.01, a = -1.34

[Figure: scatterplot of Y against X]
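
The summary statistics for this example can be reproduced with a short script. This is a minimal sketch in plain Python; it assumes the reported standard deviations use N - 1 in the denominator, which matches the values on the slide.

```python
import math

# The seven (X, Y) pairs from the example.
X = [2, 3, 3, 4, 4, 5, 8]
Y = [2, 3, 1, 1, 3, 2, 8]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squares and cross-products about the means.
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Sample standard deviations (assumed N - 1 denominator).
sd_x = math.sqrt(ss_x / (n - 1))
sd_y = math.sqrt(ss_y / (n - 1))

r = sp_xy / math.sqrt(ss_x * ss_y)   # correlation
b = sp_xy / ss_x                     # slope
a = mean_y - b * mean_x              # intercept

print(f"means: X = {mean_x:.2f}, Y = {mean_y:.2f}")
print(f"SX = {sd_x:.2f}, SY = {sd_y:.2f}")
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
print(f"b = {b:.4f}, a = {a:.4f}")
```

The unrounded slope and intercept are 1.0125 and -1.3375, which is where the predicted values in the next table come from.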

Page 14: Diagnostics

Example (2)

Y    Pred      Resid      Student Residual    Rstudent    DFBETA a    DFBETA b
2     .6875     1.3125     1.072               1.0923       .7577      -.6044
3    1.7        1.3         .962                .9526       .3943      -.2546
1    1.7        -.7        -.518               -.476       -.1970       .1272
1    2.7125    -1.7125    -1.224              -1.3086      -.2524       .0423
3    2.7125      .2875      .206                .1846       .0356      -.006
2    3.725     -1.725     -1.256              -1.3584       .0198      -.2681
8    6.7625     1.2375     1.803               2.7249     -3.5303      4.8407

Page 15: Diagnostics

Remedies

• Fit Curves if needed.

• Note heteroscedasticity for applied problems.

• Investigate all outliers. May delete them or not, depending. Report your actions.
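
As one illustration of the first remedy, the sketch below fits a straight line and a simple curve to the same hypothetical data and compares R squared. The data are made up (roughly Y = 5 + 1800 / X with a small alternating wiggle), and the reciprocal transformation is just one possible curve, chosen here as an assumption; the slides do not prescribe a particular form. The helper name r_squared is hypothetical.

```python
def r_squared(xs, ys):
    """R squared for the least-squares line predicting ys from xs."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    ssx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ssx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical curved data with a small deterministic wiggle.
X = list(range(50, 251, 10))
Y = [5 + 1800 / x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(X)]

r2_line = r_squared(X, Y)                      # Y on X
r2_curve = r_squared([1 / x for x in X], Y)    # Y on 1 / X

print(f"R squared, straight line: {r2_line:.3f}")
print(f"R squared, Y on 1/X:      {r2_curve:.3f}")
```

A clear gain in fit, together with a residual plot that no longer shows a bend, suggests the curve is the better description.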

Page 16: Diagnostics

Review

• What is leverage?

• What is a residual?

• How can you use residuals in assuring that the regression model is a good representation of the data?

• Why consider a standardized residual?

• What is a studentized residual?
