Diagnostics Checking Assumptions and Bad Data

Upload: miles

Post on 05-Jan-2016

Page 1: Diagnostics

Diagnostics

Checking Assumptions and Bad Data

Page 2: Diagnostics

Questions

• What is the linearity assumption? How can you tell if it seems met?

• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?

• What is an outlier?

• What is leverage?

• What is a residual?

• How can you use residuals in assuring that the regression model is a good representation of the data?

• Why consider a standardized residual?

• What is a studentized residual?

Page 3: Diagnostics

Linear Model

• Linear relation between X and Y

• Normal distribution of error of prediction

• Homoscedasticity (homogeneity of error in Y across levels of X)

Page 4: Diagnostics

Good-Looking Graph

[Figure: scatterplot of Y against X]

No apparent departures from line.

Page 5: Diagnostics

Same Data, Different Graph

[Figure: residuals plotted against X]

No systematic relations between X and residuals.

Page 6: Diagnostics

Problem with Linearity

[Figure: scatterplot of Miles per Gallon against Horsepower with a linear fit; R Sq Linear = 0.595]

Page 7: Diagnostics

Problem with Heteroscedasticity

[Figure: scatterplot of Y against X]

Common problem when Y is a dollar amount.

Page 8: Diagnostics

Outliers

[Figure: scatterplot of Y against X with one point labeled "Outlier"]

Outlier = pathological point

Page 9: Diagnostics

Review

• What is the linearity assumption? How can you tell if it seems met?

• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?

• What is an outlier?

Page 10: Diagnostics

Residuals

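• Zresid

$$Z_{resid} = \frac{Y - \hat{Y}}{S_{Y.X}} = \frac{e}{SD_{resid}}$$

• Look for large values (some say |z|>2)

• Studentized residual (Student Residual):

$$\text{Student Residual}_i = \frac{e_i}{S_{e_i}}, \qquad S_{e_i} = S_{Y.X}\sqrt{1 - \frac{1}{N} - \frac{(X_i - \bar{X})^2}{\sum x^2}}$$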

The studentized residual takes into account how far the point's X value lies from the mean of X. The farther X is from the mean, the smaller the standard error of the residual, and so the larger the studentized residual for a given raw residual. Look for large values. See also the studentized deleted residual (RStudent).
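The residual measures on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the software output the slides were built from: it borrows the seven (X, Y) pairs from the Example slide later in this deck, reads Σx² as the sum of squared deviations of X from its mean, and uses a standard textbook shortcut for RStudent (an assumption, since the slides give no formula for it). All variable names are made up for the sketch.

```python
import numpy as np

# Seven (X, Y) pairs taken from the Example slide of this deck.
x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)
n = len(x)

# Least-squares slope and intercept for simple regression.
ssx = np.sum((x - x.mean()) ** 2)        # sum of squared deviations of X
b = np.sum((x - x.mean()) * (y - y.mean())) / ssx
a = y.mean() - b * x.mean()

e = y - (a + b * x)                      # raw residuals
s_yx = np.sqrt(np.sum(e ** 2) / (n - 2)) # standard error of estimate

# Standardized residual: every residual divided by the same SD.
z_resid = e / s_yx

# Studentized residual: each residual divided by its own standard
# error, which shrinks as X moves away from the mean of X.
shrink = 1 - 1 / n - (x - x.mean()) ** 2 / ssx
student = e / (s_yx * np.sqrt(shrink))

# Studentized deleted residual (RStudent), via the usual shortcut that
# re-estimates the error SD with case i left out (2 fitted parameters).
rstudent = student * np.sqrt((n - 3) / (n - 2 - student ** 2))

print(np.round(student, 3))
print(np.round(rstudent, 4))
```

With these data the two printed rows should agree, to rounding, with the Student Residual and Rstudent columns of the Example (2) slide. The last case (X = 8) has an unremarkable raw residual but the largest studentized values, because its X is far from the mean.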

Page 11: Diagnostics

Influence Analysis

• Leverage:

$$h_i = \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}$$

• Leverage is an index of the importance of an observation to a regression analysis.

– Function of X only

– Large deviations from mean are influential

– Maximum is 1; min is 1/N

– Average value is (k+1)/N, where k is the number of IVs
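A short sketch of the leverage calculation for a single predictor, again borrowing the X values from the Example slide later in this deck and reading Σx² as the sum of squared deviations from the mean. The cutoff of twice the average leverage is a common rule of thumb and is an assumption here, not something the slides state.

```python
import numpy as np

# X values from the Example slide; leverage does not depend on Y.
x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
n = len(x)
k = 1                                   # number of IVs

ssx = np.sum((x - x.mean()) ** 2)       # sum of squared deviations of X
h = 1 / n + (x - x.mean()) ** 2 / ssx   # leverage of each case

print(np.round(h, 4))
print(h.mean(), (k + 1) / n)            # average leverage equals (k+1)/N

# Rule-of-thumb flag (an assumption): more than twice the average.
flagged = x[h > 2 * (k + 1) / n]
print(flagged)
```

Every value lies between 1/N and 1, and the case at X = 8 stands well apart from the rest.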

Page 12: Diagnostics

Influence Analysis (2)

• DFBETA and standardized DFBETA

• Change in slope or intercept resulting when you delete the ith person.

• Allow for influence of both X and Y
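One way to see what DFBETA measures is to compute it by brute force: drop each case in turn, refit, and record how much the intercept and slope move. The sketch below does that for the seven data points from the Example slide. The standardization shown (dividing by the deleted-case error SD times the square root of the matching diagonal element of (X'X)⁻¹) is one widely used convention and is an assumption here, since the slides do not spell out a formula. The helper `fit` and all variable names are hypothetical.

```python
import numpy as np

# Seven (X, Y) pairs from the Example slide of this deck.
x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)
n = len(x)

def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)   # highest power first
    return intercept, slope

a_full, b_full = fit(x, y)

# Square roots of the diagonal of (X'X)^-1, used for standardizing.
X = np.column_stack([np.ones(n), x])
c = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

dfbeta = np.empty((n, 2))   # columns: change in intercept, change in slope
s_del = np.empty(n)         # error SD estimated without case i
for i in range(n):
    keep = np.arange(n) != i
    a_i, b_i = fit(x[keep], y[keep])
    dfbeta[i] = (a_full - a_i, b_full - b_i)
    resid_i = y[keep] - (a_i + b_i * x[keep])
    s_del[i] = np.sqrt(np.sum(resid_i ** 2) / (n - 3))

dfbetas = dfbeta / (s_del[:, None] * c)     # standardized DFBETA

print(np.round(dfbeta, 4))
print(np.round(dfbetas, 4))
```

Deleting the last case (X = 8, Y = 8) flattens the fitted line completely: the slope drops by about 1.01 and the intercept rises by about 3.34. Under this standardization the results line up with the DFBETA columns on the Example (2) slide, except that the slope value for the last case comes out near 4.84 where the slide prints 4.4807, which may be a transposed digit in the original.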

Page 13: Diagnostics

Example

     X      Y
     2      2
     3      3
     3      1
     4      1
     4      3
     5      2
     8      8
M =  4.14   2.86

r = .82; r² = .67; p < .05.

SX = 1.95, SY = 2.41

b = 1.01, a = -1.34

[Figure: scatterplot of Y against X]

Page 14: Diagnostics

Example (2)

Y   Pred     Resid     Student Residual   Rstudent   DFBETA a   DFBETA b
2   .6875    1.3125    1.072              1.0923     .7577      -.6044
3   1.7      1.3       .962               .9526      .3943      -.2546
1   1.7      -.7       -.518              -.476      -.1970     .1272
1   2.7125   -1.7125   -1.224             -1.3086    -.2524     .0423
3   2.7125   .2875     .206               .1846      .0356      -.006
2   3.725    -1.725    -1.256             -1.3584    .0198      -.2681
8   6.7625   1.2375    1.803              2.7249     -3.5303    4.4807

Page 15: Diagnostics

Remedies

• Fit curves if needed.

• Note heteroscedasticity for applied problems.

• Investigate all outliers. Whether to delete them depends on what you find. Report your actions.
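As a rough illustration of the first remedy, the sketch below fits a straight line and a quadratic curve to the same data and compares R². The data are made up for the example (a curved trend plus a deterministic wiggle standing in for noise) and are not the horsepower data shown earlier in the deck; `r_squared` is a hypothetical helper.

```python
import numpy as np

# Hypothetical data with a curved trend; not taken from the slides.
x = np.linspace(50, 250, 21)
y = 55 - 0.33 * x + 0.0007 * x ** 2 + 1.5 * np.sin(x)

def r_squared(y, fitted):
    """Proportion of variance in y accounted for by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Straight line versus a curve with an added squared term.
linear = np.polyval(np.polyfit(x, y, 1), x)
quadratic = np.polyval(np.polyfit(x, y, 2), x)

r2_linear = r_squared(y, linear)
r2_quadratic = r_squared(y, quadratic)
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

In-sample R² can never go down when a term is added, so the comparison only shows how much the curve helps; a plot of residuals against X is still the more direct check for a remaining pattern.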

Page 16: Diagnostics

Review

• What is leverage?

• What is a residual?

• How can you use residuals in assuring that the regression model is a good representation of the data?

• Why consider a standardized residual?

• What is a studentized residual?