![Page 1: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/1.jpg)
Diagnostics
Checking Assumptions and Bad Data
![Page 2: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/2.jpg)
Questions
• What is the linearity assumption? How can you tell if it seems met?
• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?
• What is an outlier?
• What is leverage?
• What is a residual?
• How can you use residuals in assuring that the regression model is a good representation of the data?
• Why consider a standardized residual?
• What is a studentized residual?
![Page 3: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/3.jpg)
Linear Model
• Linear relation between X and Y
• Normal distribution of error of prediction
• Homoscedasticity (homogeneity of error in Y across levels of X)
![Page 4: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/4.jpg)
Good-Looking Graph
[Figure: scatterplot of Y against X]
No apparent departures from line.
![Page 5: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/5.jpg)
Same Data, Different Graph
[Figure: residuals plotted against X]
No systematic relations between X and residuals.
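A minimal sketch of how such a residual listing could be produced. The data behind the plotted graph are not recoverable, so this borrows the seven (X, Y) pairs from the Example slide in this deck; the helper name `fit_line` is hypothetical.

```python
# Fit a least-squares line and list the residuals against X.
# Data: the seven (X, Y) pairs from the Example slide (an assumption;
# the points behind the plotted graph are not available).
x = [2, 3, 3, 4, 4, 5, 8]
y = [2, 3, 1, 1, 3, 2, 8]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ssx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / ssx
    return my - b * mx, b


a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Least-squares residuals sum to zero and are uncorrelated with X by
# construction, so look for curvature or fanning, not a linear trend.
for xi, ei in sorted(zip(x, residuals)):
    print(f"X={xi}  residual={ei:+.4f}")
```

Plotting `residuals` against `x` with any plotting tool gives the second kind of graph shown here.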
![Page 6: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/6.jpg)
Problem with Linearity
[Figure: scatterplot of Miles per Gallon against Horsepower; R Sq Linear = 0.595]
![Page 7: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/7.jpg)
Problem with Heteroscedasticity
[Figure: scatterplot of Y against X]
Common problem when Y is measured in dollars.
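One rough way to check for this numerically is to compare the spread of the residuals at low and high values of X. The sketch below uses simulated, hypothetical data in which the error SD grows with X. The median split and the ratio of SDs are illustrative choices, not a formal test.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: the error SD is proportional to X (heteroscedastic).
x = [rng.uniform(1, 10) for _ in range(2000)]
y = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

# Ordinary least-squares line.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
ssx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ssx
a = my - b * mx
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


# Split the cases at the median of X and compare residual spread.
cut = sorted(x)[n // 2]
low = [e for xi, e in zip(x, resid) if xi < cut]
high = [e for xi, e in zip(x, resid) if xi >= cut]
ratio = sd(high) / sd(low)
print(f"SD of residuals, high X / low X: {ratio:.2f}")
```

A ratio near 1 is what homoscedasticity would suggest; a ratio well above 1 matches the fan shape in the graph.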
![Page 8: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/8.jpg)
Outliers
[Figure: scatterplot of Y against X with one point labeled "Outlier"]
Outlier = pathological point
![Page 9: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/9.jpg)
Review
• What is the linearity assumption? How can you tell if it seems met?
• What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem?
• What is an outlier?
![Page 10: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/10.jpg)
Residuals
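• Zresid (standardized residual):

$$Z_{resid} = \frac{Y - \hat{Y}}{SD_{resid}} = \frac{e}{SD_{resid}} = \frac{e}{S_{Y.X}}$$

• Look for large values (some say |z|>2)
• Studentized residual (Student Residual): the residual divided by its own standard error,

$$S_{e_i} = S_{Y.X}\sqrt{1 - \frac{1}{N} - \frac{(X_i - \bar{X})^2}{\sum x^2}}$$

where $\sum x^2$ is the sum of squared deviations of X from its mean.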
The studentized residual takes into account the distance of the point's X value from the mean of X. The farther X is from the mean, the smaller the standard error of the residual, and so the larger the studentized residual for a given raw residual. Look for large values. See also the studentized deleted residual (RStudent).
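A minimal sketch of both quantities, using the seven (X, Y) pairs from the Example slide in this deck. It assumes SDresid is the standard error of estimate with N - 2 degrees of freedom; variable names are illustrative.

```python
import math

# Data from the Example slide.
x = [2, 3, 3, 4, 4, 5, 8]
y = [2, 3, 1, 1, 3, 2, 8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
ssx = sum((xi - mx) ** 2 for xi in x)  # sum of squared deviations of X
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ssx
a = my - b * mx
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # raw residuals

# Standard error of estimate (assumed here to be SDresid), df = N - 2.
s_yx = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))

# Standardized residual: every case is divided by the same SD.
zresid = [ei / s_yx for ei in e]

# Studentized residual: each case is divided by its own standard error,
# which shrinks as X moves away from the mean of X.
s_e = [s_yx * math.sqrt(1 - 1 / n - (xi - mx) ** 2 / ssx) for xi in x]
student = [ei / si for ei, si in zip(e, s_e)]

for xi, zi, ti in zip(x, zresid, student):
    print(f"X={xi}  Zresid={zi:+.3f}  Student={ti:+.3f}")
```

For the case at X = 8, the standardized residual is modest while the studentized residual is noticeably larger, because that case sits far from the mean of X.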
![Page 11: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/11.jpg)
Influence Analysis
• Leverage:

$$h_i = \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}$$

• Leverage is an index of the importance of an observation to a regression analysis.
  – Function of X only
  – Large deviations from mean are influential
  – Maximum is 1; min is 1/N
  – Average value is (k+1)/N, where k is the number of IVs
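As a quick check of these properties, the sketch below computes leverage for the seven X values from the Example slide in this deck, using the single-predictor formula (so k = 1). Variable names are illustrative.

```python
# Leverage for simple regression: h_i = 1/N + (X_i - mean)^2 / sum of squares.
x = [2, 3, 3, 4, 4, 5, 8]  # X values from the Example slide

n = len(x)
mx = sum(x) / n
ssx = sum((xi - mx) ** 2 for xi in x)
h = [1 / n + (xi - mx) ** 2 / ssx for xi in x]

k = 1  # number of independent variables
mean_h = sum(h) / n

for xi, hi in zip(x, h):
    print(f"X={xi}  h={hi:.4f}")
print(f"mean leverage = {mean_h:.4f}, (k+1)/N = {(k + 1) / n:.4f}")
```

The case at X = 8 has by far the largest leverage, and the leverages average to (k+1)/N.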
![Page 12: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/12.jpg)
Influence Analysis (2)
• DFBETA and standardized DFBETA
• Change in slope or intercept resulting when you delete the ith person.
• Allow for influence of both X and Y
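The deletion idea can be sketched directly: refit the line with each case left out and record how the intercept and slope change. This uses the data from the Example slide in this deck and gives the raw (unstandardized) DFBETA. The sign convention (full-sample estimate minus deleted-case estimate) is an assumption, and `fit_line` is a hypothetical helper.

```python
# Raw DFBETA by brute force: refit without case i and compare coefficients.
x = [2, 3, 3, 4, 4, 5, 8]
y = [2, 3, 1, 1, 3, 2, 8]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ssx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ssx
    return my - b * mx, b


a_full, b_full = fit_line(x, y)

dfbeta_a, dfbeta_b = [], []
for i in range(len(x)):
    # Drop case i and refit.
    a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    # Assumed convention: full-sample estimate minus deleted-case estimate.
    dfbeta_a.append(a_full - a_i)
    dfbeta_b.append(b_full - b_i)

for i, (da, db) in enumerate(zip(dfbeta_a, dfbeta_b), start=1):
    print(f"case {i}: change in a = {da:+.4f}, change in b = {db:+.4f}")
```

Deleting the last case (8, 8) removes essentially the whole slope, which is why that case dominates the influence statistics.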
![Page 13: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/13.jpg)
Example
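| | X | Y |
|---|---|---|
| | 2 | 2 |
| | 3 | 3 |
| | 3 | 1 |
| | 4 | 1 |
| | 4 | 3 |
| | 5 | 2 |
| | 8 | 8 |
| M = | 4.14 | 2.86 |

r = .82; r² = .67; p < .05.
S_X = 1.95, S_Y = 2.41
b = 1.01, a = -1.34

[Figure: scatterplot of Y against X]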
![Page 14: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/14.jpg)
Example (2)
| Y | Pred | Resid | Student Residual | Rstudent | DFBETA a | DFBETA b |
|---|---|---|---|---|---|---|
| 2 | .6875 | 1.3125 | 1.072 | 1.0923 | .7577 | -.6044 |
| 3 | 1.7 | 1.3 | .962 | .9526 | .3943 | -.2546 |
| 1 | 1.7 | -.7 | -.518 | -.476 | -.1970 | .1272 |
| 1 | 2.7125 | -1.7125 | -1.224 | -1.3086 | -.2524 | .0423 |
| 3 | 2.7125 | .2875 | .206 | .1846 | .0356 | -.006 |
| 2 | 3.725 | -1.725 | -1.256 | -1.3584 | .0198 | -.2681 |
| 8 | 6.7625 | 1.2375 | 1.803 | 2.7249 | -3.5303 | 4.8407 |
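The deleted-case columns can be recomputed from the Example data with the usual closed-form deletion formulas. This sketch assumes the DFBETA columns are the standardized version (each change divided by the deleted-case standard error of that coefficient); variable names are illustrative.

```python
import math

x = [2, 3, 3, 4, 4, 5, 8]
y = [2, 3, 1, 1, 3, 2, 8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
ssx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ssx
a = my - b * mx
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(ei ** 2 for ei in e)
h = [1 / n + (xi - mx) ** 2 / ssx for xi in x]  # leverage

# Diagonal of (X'X)^-1 for the intercept and the slope.
c_aa = 1 / n + mx ** 2 / ssx
c_bb = 1 / ssx

rstudent, dfbetas_a, dfbetas_b = [], [], []
for xi, ei, hi in zip(x, e, h):
    # Residual SD from the fit with this case deleted (df = N - 3).
    s_del = math.sqrt((sse - ei ** 2 / (1 - hi)) / (n - 3))
    rstudent.append(ei / (s_del * math.sqrt(1 - hi)))
    # Raw change in each coefficient when the case is deleted.
    d_a = (1 / n - mx * (xi - mx) / ssx) * ei / (1 - hi)
    d_b = ((xi - mx) / ssx) * ei / (1 - hi)
    # Standardize by the deleted-case standard error of the coefficient.
    dfbetas_a.append(d_a / (s_del * math.sqrt(c_aa)))
    dfbetas_b.append(d_b / (s_del * math.sqrt(c_bb)))

for yi, ri, da, db in zip(y, rstudent, dfbetas_a, dfbetas_b):
    print(f"Y={yi}  Rstudent={ri:+.4f}  DFBETA a={da:+.4f}  DFBETA b={db:+.4f}")
```

The last case stands out on every deleted-case statistic, even though its raw residual is not the largest in absolute value.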
![Page 15: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/15.jpg)
Remedies
• Fit curves if needed.
• Note heteroscedasticity for applied problems.
• Investigate all outliers. You may or may not delete them, depending on what you find. Report your actions.
![Page 16: Diagnostics](https://reader036.vdocuments.net/reader036/viewer/2022082420/568139c9550346895da178f4/html5/thumbnails/16.jpg)
Review
• What is leverage?
• What is a residual?
• How can you use residuals in assuring that the regression model is a good representation of the data?
• Why consider a standardized residual?
• What is a studentized residual?