EXAMINING RELATIONSHIPS
Residuals and Residual Plots
FACTS ABOUT LEAST SQUARES LINE
Be clear on which variable is explanatory and which is the response. Switching the variables changes your equation.
The line always passes through the point (x-bar, y-bar). This always gives us a point to start with or use during graphing.
Correlation is closely related to slope: slope = r(sy/sx).
Smaller r = smaller effect of x on predictions.
r and r^2 help define the strength of a straight-line relationship between the variables. Values closer to 1 (or -1 for r) = stronger relationship.
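These facts can be checked on a small data set. The sketch below is only an illustration and not part of the original slides: it borrows the femur/humerus bone data used later in this lesson, and the helper names (mean, sample_sd, correlation, least_squares) are made up for this example. It builds the slope from r, confirms the line passes through (x-bar, y-bar), and shows that switching the variables gives a different equation.

```python
from math import sqrt

# Bone data from this lesson: femur is explanatory (x), humerus is response (y).
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    # r = average product of the standardized x and y values (divide by n - 1).
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


def least_squares(xs, ys):
    # slope b = r * (sy / sx); intercept a = y-bar - b * x-bar
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


r = correlation(femur, humerus)
a, b = least_squares(femur, humerus)
x_bar, y_bar = mean(femur), mean(humerus)

print("r =", round(r, 4))
print("y-hat =", round(a, 4), "+", round(b, 4), "x")
print("line at x-bar:", round(a + b * x_bar, 4), " y-bar:", y_bar)

# Switching explanatory and response gives a different line,
# not just the first equation solved for x.
a_swapped, b_swapped = least_squares(humerus, femur)
print("swapped slope:", round(b_swapped, 4), " 1/b:", round(1 / b, 4))
```

The swapped slope and 1/b only match when r is exactly 1 or -1, which is why the choice of explanatory variable matters.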
RESIDUALS
"Press the Button and I will use Linear Regression to tell your Future!! (Or at least something close to it!!)"
Just like our friend ZOLTAR, we can make predictions using our Line of Best Fit.
However, do we know just how good our predictions are? Would we be willing to put a lot of CASH MONEY down to back them up?
Luckily, we have an indicator in statistics that can help us decide the strength of our predictions AND tell us if a line is the “Best Fit”.
RESIDUALS
Unless your r value is perfect, your predictions won’t be either.
A residual is the difference between the actual value and your predicted value.
Each observed value has a residual.
The sum of the residuals is always 0 (or really, really close).
It should be… If not, that equation might not be the best fit!
- Roundoff error: when earlier values are rounded, the sum may not equal exactly 0.
Residual = y – y-hat (actual y minus predicted y)
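Why do the residuals add up to 0? A short derivation, written as a sketch using the calculator's form of the line, y-hat = a + bx (the symbols n, a, and b are notation chosen for this note; n is the number of data points):

```latex
\sum_{i=1}^{n} (y_i - \hat{y}_i)
  = \sum_{i=1}^{n} y_i - na - b\sum_{i=1}^{n} x_i
  = n\bar{y} - na - nb\bar{x}
  = n\,(\bar{y} - a - b\bar{x})
  = 0
```

The last step uses the fact that the least squares line passes through (x-bar, y-bar), so a = y-bar – b(x-bar).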
GRAPHING THE LSL ON YOUR SCATTER PLOT
Using the bone data, let’s look at how we get the residuals (and how your calculator does it).
Femur Humerus
38 41
56 63
59 70
64 72
74 84
y-hat = -3.659486682 + 1.196900115x
Plug all your x values into the equation to get a predicted value (y-hat) for each one.
Femur Predicted Humerus (y-hat)
38 -3.659486682 + 1.196900115(38)
56 -3.659486682 + 1.196900115(56)
59
64
74
Femur Residual (y – y-hat)
38 41 – 41.82271769
56 63 – 63.36691976
59
64
74
Now, subtract the PREDICTED value from the ACTUAL value.
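The same two steps (plug in x, then subtract) can be sketched in a few lines of Python. This is only an illustration of the by-hand process, not what the calculator runs internally; the coefficients are copied from the equation above, and the variable names are made up for this example.

```python
# Bone data from the table above.
femur = [38, 56, 59, 64, 74]     # x, explanatory
humerus = [41, 63, 70, 72, 84]   # y, response

# Coefficients copied from the least squares equation in this lesson.
a = -3.659486682
b = 1.196900115

# Step 1: plug every x into the equation to get a predicted y (y-hat).
predicted = [a + b * x for x in femur]

# Step 2: residual = actual y minus predicted y.
residuals = [y - y_hat for y, y_hat in zip(humerus, predicted)]

print("Femur  Humerus  Predicted   Residual")
for x, y, y_hat, resid in zip(femur, humerus, predicted, residuals):
    print(f"{x:5d}  {y:7d}  {y_hat:9.4f}  {resid:9.4f}")

# Should be very close to 0; any small leftover is roundoff error.
print("Sum of residuals:", round(sum(residuals), 4))
```

Try the last three rows by hand first, then use the printed table to check your work.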
RESIDUAL PLOT
A residual plot is a scatterplot of the residuals against the explanatory variable (x). Use it to assess the fit of the regression line.
Does your plot show that the line fits?
Residuals: Fit
No pattern: Good fit
Curve: Nonlinear
Increasing spread: Worse predictions for larger x
Decreasing spread: Worse predictions for smaller x
o Individual Points w/ Large Residuals = Outliers in y
o Individual Points extreme in x = often Influential Points
Why use Residuals?
The residual plot describes how well a LINEAR model fits our data.
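To see what a "curve" pattern looks like in the residuals, here is a small sketch using made-up data (y = x^2, which is clearly not linear). The data and the helper name fit_line are assumptions for this illustration only. A straight line is fit anyway, and the residuals are listed in order of x.

```python
# Hypothetical curved data: y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    # Least squares slope and intercept for y-hat = a + bx.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, resid in zip(xs, residuals):
    if resid > 0:
        side = "above the line"
    elif resid < 0:
        side = "below the line"
    else:
        side = "on the line"
    print(f"x = {x}: residual = {resid:+.1f} ({side})")
```

The residuals start positive, dip negative in the middle, and come back positive at the end. That U shape is the "curve" pattern: the residuals still sum to 0, but a straight line is the wrong model.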
RESIDUAL PLOT ON CALCULATOR
Plot the scatterplot of the data.
Find the least squares equation (LinReg y=a+bx).
Put the equation into Y1 and graph it.
In L3, You need to get the residuals (quickly)
Go to the top of L3 – 2nd Stat - RESID
Press enter (*Your calculator finds them for you!! YIPPEEE!!)
You have to run STAT: CALC: 8 (LinReg y=a+bx) first, before you ask for the residuals… Your calculator has to have an equation to plug into to find the residuals.
Now do a scatterplot with Xlist = L1 and Ylist = L3 (residuals)
The horizontal line in the middle (residual = 0) represents the least squares line.
You can run 1-Var Stats on your RESID list to check that the residual sum is 0.
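The same sum check can be sketched off the calculator. The Python below is an illustration only (the helper names fit_line and residual_sum are made up for this example). It adds up the bone-data residuals twice: once with the full-precision equation, and once with the coefficients rounded to two decimal places, to show how roundoff pulls the sum away from 0.

```python
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def fit_line(xs, ys):
    # Least squares slope and intercept for y-hat = a + bx.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residual_sum(a, b):
    # Add up (actual - predicted) over every data point.
    return sum(y - (a + b * x) for x, y in zip(femur, humerus))


a, b = fit_line(femur, humerus)

full_sum = residual_sum(a, b)
rounded_sum = residual_sum(round(a, 2), round(b, 2))

print("Sum with full-precision equation:", full_sum)
print("Sum with rounded equation:       ", rounded_sum)
```

The full-precision sum is 0 up to tiny floating-point error, while the rounded equation leaves a sum that is clearly not 0.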
RESIDUALS ON CALCULATOR (SCREENSHOTS) – BY HAND PRACTICE?
Run the GESSEL program
Screenshots: Scatterplot, Calc Function, Regression, Stats Plot w/ EQ, Residual List Function, Residual Plot
INFLUENTIAL POINT VS. OUTLIER
Outlier – observation that is outside overall pattern (out of whack in the Y direction)
Influential Point – observation that IF removed would dramatically change the result of least squares line and/or predictions (way out in the X direction)
INFLUENTIAL POINT VS. OUTLIER
Let’s Change Child 19’s test score from 121 to 85 and see what happens to the EQ and Graph
ORIGINAL NEW
Notice the minimal change in the equation and graph… This is an example of why Child 19 is considered an outlier. An “outlier” in y has a minimal effect on the equation and subsequent predicted values. The change here is in the r values.
INFLUENTIAL POINT VS. OUTLIER
Let’s Change Child 18’s test score from 57 to 85 and see what happens to the EQ and Graph
ORIGINAL NEW
Notice the dramatic change in the equation and graph… This is an example of why Child 18 is considered an influential point. A point that is extreme in x can dramatically affect the position of the least squares line.
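The children's data is not reproduced in this transcript, so the sketch below uses a small made-up data set to show the same contrast. Everything in it is hypothetical: seven points on a line, one point in the middle of the x values with an unusually high y (an outlier in y), and one point far out in x. Each special point has its y lowered by the same amount, and the change in slope is compared.

```python
def fit_line(points):
    # Least squares slope and intercept for y-hat = a + bx.
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def with_new_y(points, index, new_y):
    # Copy the data, replacing the y value of a single point.
    changed = list(points)
    changed[index] = (points[index][0], new_y)
    return changed


# Hypothetical data, not the children's data from the slides.
points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 13), (7, 15),
          (4, 20),    # index 7: outlier in y, middle of the x values
          (20, 41)]   # index 8: extreme in x
OUTLIER_INDEX, EXTREME_INDEX, SHIFT = 7, 8, 11

_, slope_original = fit_line(points)
_, slope_outlier_moved = fit_line(with_new_y(points, OUTLIER_INDEX, 20 - SHIFT))
_, slope_extreme_moved = fit_line(with_new_y(points, EXTREME_INDEX, 41 - SHIFT))

change_outlier = slope_outlier_moved - slope_original
change_extreme = slope_extreme_moved - slope_original

print("Original slope:", round(slope_original, 3))
print("Slope change after moving the y-outlier:      ", round(change_outlier, 3))
print("Slope change after moving the extreme-x point:", round(change_extreme, 3))
```

The same shift in y moves the slope several times more when it is applied to the point that is far out in x, which is what makes that point influential.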
HOMEWORK
Anscombe Discovery #46