Describing Bivariate Relationships Chapter 3 Summary YMS AP Stats

Upload: walter-hall

Post on 14-Jan-2016


Page 1: Describing Bivariate Relationships Chapter 3 Summary YMS AP Stats Chapter 3 Summary YMS AP Stats

Describing Bivariate Relationships

Chapter 3 Summary

YMS AP Stats

Page 2

3.1 Response vs. Explanatory Variables

• The response variable measures an outcome of a study; the explanatory variable helps explain or influences changes in the response variable (like dependent vs. independent).

• Calling one variable explanatory and the other response doesn’t necessarily mean that changes in one CAUSE changes in the other.

• Ex: Alcohol and body temp: One effect of alcohol is a drop in body temp. To test this, researchers give several amounts of alcohol to mice and measure each mouse’s body temp change. What are the explanatory and response variables?

Page 3

Scatterplots

Page 4

Examining Scatterplots

Overall pattern

• Direction

• Form

• Strength

• Outliers or deviations

Page 5

Interpreting Scatterplots

• Direction: In the previous example, the overall pattern moves from upper left to lower right. We call this a negative association.

• Form: The form is slightly curved and there are two distinct clusters. What explains the clusters? (ACT States)

• Strength: The strength is determined by how closely the points follow a clear form. The example is only moderately strong.

• Outliers: Do we see any deviations from the pattern? (Yes, West Virginia, where 20% of HS seniors take the SAT but the mean math score is only 511).

Page 6

Association

Page 7

Introducing Categorical Variables

Page 8

Calculator Scatterplot

• Enter the Degree-Days in L1 and Gas in L2

• Next, specify a scatterplot in the Stat Plot menu (first graph type): Xlist L1 (explanatory) and Ylist L2 (response).

• Use ZoomStat.

• Notice that there are no scales on the axes and they aren’t labeled. If you are copying your graph to your paper, make sure you scale and label the axes (use Trace).

Month                1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
Degree-days         24    51    43    33    26    13     4     0     0     1     6    12    30    32    52    30
Gas (100 cu ft)    6.3  10.9   8.9   7.5   5.3   4.0   1.7   1.2   1.2   1.2   2.1   3.1   6.4   7.2  11.0   6.9

Page 9

Correlation r

• The Correlation measures the direction and strength of the linear relationship between 2 variables.

• Formula (don’t need to memorize or use): $r = \frac{1}{n-1} \sum z_x z_y$, where $z_x$ and $z_y$ are the standardized values of x and y for each individual.

• In Calc: Go to Catalog (2nd, zero button), go to DiagnosticOn, enter, enter. You only have to do this ONCE! Once this is done:

• Enter data in L1 and L2 (you can do Calc, 2-Var Stats if you want the mean and sd of each)

• Calc, LinReg(a+bx), enter
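For anyone checking the calculator by hand, here is a minimal Python sketch of the z-score formula for r (average product of standardized values, with n - 1 in the denominator). The function name `correlation` is made up for illustration, and the data are the degree-day and gas readings from the calculator scatterplot page.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Return r as the sum of z_x * z_y products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Degree-day and gas readings from the calculator scatterplot example.
degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

r = correlation(degree_days, gas)
print(f"r = {r:.3f}")
```

The result should be very close to +1, which matches the strong positive linear pattern in the scatterplot.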

Page 10

Interpreting Correlation

• Caution- our eyes can be fooled! Our eyes are not good judges of how strong a linear relationship is. The 2 scatterplots depict the same data but drawn with a different scale. Because of this we need a numerical measure to supplement the graph.

Page 11

Interpreting r

• The absolute value of r tells you the strength of the linear association (0 means no linear association, 1 is a perfect linear association)

• The sign tells you whether it’s a positive or a negative association. So r ranges from -1 to +1

• Note- it makes no difference which variable you call x and which you call y when calculating correlation, but stay consistent!

• Because r uses standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. (Ex: Measuring height in inches vs. ft. won’t change correlation with weight)

• Values of -1 and +1 occur ONLY in the case of a perfect linear relationship, when the points lie exactly along a straight line.
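A quick way to convince yourself of two of these facts (r ignores units, and r does not care which variable is called x) is to compute it both ways. The sketch below uses made-up heights and weights, not data from the text, and a hypothetical helper named `correlation`.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r from standardized values: sum of z_x * z_y divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds), invented for illustration.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 125, 140, 155, 160, 185]
height_ft = [h / 12 for h in height_in]  # same heights, different units

r_inches = correlation(height_in, weight_lb)
r_feet = correlation(height_ft, weight_lb)      # change of units
r_swapped = correlation(weight_lb, height_in)   # x and y roles exchanged

print(r_inches, r_feet, r_swapped)
```

All three printed values should agree up to rounding error, because standardizing removes the units and the formula is symmetric in x and y.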

Page 12

Examples

1. Correlation requires that both variables be quantitative.

2. Correlation measures the strength of only LINEAR relationships, not curved...no matter how strong they are!

3. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot

4. Correlation is not a complete summary of two-variable data, even when the relationship is linear- always give the means and standard deviations of both x and y along with the correlation.
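Points 2 and 3 are easy to see numerically. In this sketch (all data invented for illustration, helper name `correlation` hypothetical), a perfect curved relationship gives r of essentially zero, and a single outlier drags a strong r down sharply.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r from standardized values: sum of z_x * z_y divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Point 2: y = x^2 is a perfect relationship, but it is curved, not linear.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# Point 3: a nearly linear made-up data set, then the same data plus one outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 8, 10, 12, 13, 16]
r_clean = correlation(xs, ys)
r_outlier = correlation(xs + [9], ys + [1])  # outlier at (9, 1)

print(r_curve, r_clean, r_outlier)
```

This is why the scatterplot always comes first: r alone would miss the curve entirely and would hide the fact that one point is responsible for the weaker value.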

Page 13

3.3 Least-Squares Regression

The slope here, b = -.00344, tells us that fat gained goes down by .00344 kg for each added calorie of NEA according to this linear model. The slope of a regression line is the predicted RATE OF CHANGE in the response y as the explanatory variable x changes.

The Y intercept a = 3.505kg is the fat gain estimated by this model if NEA does not change when a person overeats.

Page 14

Prediction

• We can use a regression line to predict the response y for a specific value of the explanatory variable x.
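As a small illustration of prediction, the sketch below plugs an x value into the fitted line for the NEA data (intercept 3.505 kg, slope -0.00344 kg per calorie). The function name and the 400-calorie input are choices made here for illustration.

```python
def predicted_fat_gain(nea_change_cal):
    """Predicted fat gain in kg from the line y-hat = 3.505 - 0.00344 * x."""
    intercept_kg = 3.505          # predicted gain when NEA does not change
    slope_kg_per_cal = -0.00344   # predicted change in gain per added calorie of NEA
    return intercept_kg + slope_kg_per_cal * nea_change_cal

# Hypothetical input: a person whose NEA rises by 400 calories.
prediction = predicted_fat_gain(400)
print(f"{prediction:.2f} kg")
```

Predictions like this are only trustworthy for x values inside the range of the data used to fit the line.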

Page 15

LSRL

• In most cases, no line will pass exactly through all the points in a scatterplot, and different people will draw different regression lines by eye.

• Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatter plot

• A good regression line makes the vertical distances of the points from the line as small as possible

• Error: Observed response - predicted response
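To make "as small as possible" concrete, the following sketch adds up the squared vertical errors for two lines through the degree-day and gas data: the least-squares line and a hypothetical line drawn by eye. The helper names and the by-eye coefficients (1.0 and 0.2) are assumptions for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x that minimizes the sum of squared errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances (observed y minus predicted y)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

a, b = least_squares_line(degree_days, gas)
sse_lsrl = sum_squared_errors(degree_days, gas, a, b)
sse_by_eye = sum_squared_errors(degree_days, gas, 1.0, 0.2)  # hypothetical hand-drawn line

print(sse_lsrl, sse_by_eye)
```

Any other line you try in place of the by-eye one should also give a larger total than the least-squares line.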

Page 16

LSRL Cont.

Page 17

Equation of LSRL

• Example: The Sanchez household is about to install solar panels to reduce the cost of heating their house. In order to know how much the panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temp and gas consumption is important.
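Here is a sketch of how the equation could be computed, assuming the degree-day and gas readings on the calculator scatterplot page are the Sanchez household's records. It uses slope b = r times Sy/Sx and intercept a = y bar minus b times x bar; variable names are chosen for illustration.

```python
from statistics import mean, stdev

# Assumed to be the Sanchez data: degree-days (x) and gas use in 100 cu ft (y).
degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

n = len(degree_days)
x_bar, y_bar = mean(degree_days), mean(gas)
s_x, s_y = stdev(degree_days), stdev(gas)

# Correlation from deviations, then slope and intercept of the LSRL.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(degree_days, gas)) / ((n - 1) * s_x * s_y)
b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

print(f"gas-hat = {a:.3f} + {b:.4f} * degree_days")
```

These values should match what LinReg(a+bx) reports on the calculator with L1 and L2 filled in the same way.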

Page 18

Facts about Least-Squares Regression

• The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line.

• There is a close connection between correlation and the slope of the LSRL. Slope is r times Sy/Sx. This says that a change of one standard deviation in x corresponds to a change of r standard deviations in y. When the variables are perfectly correlated (r = +/- 1), the change in the predicted response y hat is the same (in standard deviation units) as the change in x.

• The LSRL will always pass through the point (X bar, Y Bar)

• r squared is the fraction of variation in values of y explained by the x variable
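The first and third facts can be checked with a few lines of code. This sketch fits y on x and then x on y for a small made-up data set (not from the text), and confirms the line passes through (x bar, y bar). The helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

a_yx, b_yx = least_squares_line(xs, ys)  # predict y from x
a_xy, b_xy = least_squares_line(ys, xs)  # roles reversed: predict x from y

# Rewriting x-hat = a_xy + b_xy * y as a line in the (x, y) plane gives slope 1 / b_xy.
slope_of_reversed_line = 1 / b_xy

print(b_yx, slope_of_reversed_line)
print(a_yx + b_yx * x_bar, y_bar)  # the line passes through (x bar, y bar)
```

The two slopes only coincide when r is exactly +1 or -1; in general their product b_yx * b_xy equals r squared.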

Page 19
Page 20

R squared- Coefficient of determination

If all the points fall directly on the least-squares line, r squared = 1. Then all the variation in y is explained by the linear relationship with x.

So, if r squared = .606, that means that about 61% of the variation in y among individual subjects is explained by the straight-line relationship with x. The other 39% is “not explained”.

r squared is a measure of how successful the regression was in explaining the response
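One way to see "fraction of variation explained" is to compute r squared as 1 minus (variation left over around the line) divided by (total variation in y). The sketch below does this for the degree-day and gas data from the calculator scatterplot page; the variable names are chosen for illustration.

```python
degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

n = len(degree_days)
x_bar, y_bar = sum(degree_days) / n, sum(gas) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(degree_days, gas)) / sum(
    (x - x_bar) ** 2 for x in degree_days
)
a = y_bar - b * x_bar

# Variation left over around the line vs. total variation around y bar.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(degree_days, gas))
sst = sum((y - y_bar) ** 2 for y in gas)

r_squared = 1 - sse / sst
print(f"r squared = {r_squared:.3f}")
```

A value this close to 1 says almost all of the month-to-month variation in gas use is accounted for by the linear relationship with degree-days.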

Page 21

3.3 Influences

• Correlation r is not resistant: one unusual point in the scatterplot can greatly affect the value of r. The LSRL is also not resistant. Extrapolation is not very reliable.

• A point extreme in the x direction with no other points near it pulls the line toward itself. This point is influential.
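The pull of an influential point shows up clearly in the slope. This sketch fits a made-up, nearly linear data set with and without one point that is far out in the x direction; the data and the helper name `least_squares_line` are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Made-up data with a clear upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 8, 10, 12, 13, 16]
_, slope_without = least_squares_line(xs, ys)

# Add one point far to the right with a low y value: extreme in x, no neighbors.
_, slope_with = least_squares_line(xs + [20], ys + [2])

print(slope_without, slope_with)
```

A good habit is to fit the line both with and without a suspicious point and compare, as done here.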

Page 22

Lurking Variables- Beware!

• Example: A College Board study of HS grads found a strong correlation between the amount of math minority students took in high school and their later success in college. News articles quoted the College Board saying that “math is the gatekeeper for success in college”.

• But minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family and parents who emphasize education and can pay for college, etc. These students would likely succeed in college even if they took fewer math courses. The family background of students is a lurking variable that probably explains much of the relationship between math courses and college success.

Page 23

Residuals

• The errors of our predictions, the vertical distances from predicted y to observed y, are called residuals because they are “left-over” variation in the response.

One subject’s NEA rose by 135 calories. That subject gained 2.7 kg of fat. The predicted gain for 135 calories is

Y hat = 3.505- .00344(135) = 3.04 kg

The residual for this subject is

y - yhat= 2.7 - 3.04 = -.34 kg
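The same arithmetic can be checked in a few lines. This is only a sketch of the calculation above; the variable names are chosen for illustration.

```python
nea_change = 135       # calories
observed_gain = 2.7    # kg, the subject's actual fat gain

# Predicted gain from the regression line y-hat = 3.505 - 0.00344 * x.
predicted_gain = 3.505 - 0.00344 * nea_change

# Residual = observed response - predicted response.
residual = observed_gain - predicted_gain

print(f"predicted = {predicted_gain:.2f} kg, residual = {residual:.2f} kg")
```

The negative residual means this subject gained less fat than the line predicts, so the point sits below the regression line.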

Page 24

Residual Plot

• The sum of the least-squares residuals is always zero.

• The mean of the residuals is always zero. The horizontal line at zero in the figure helps orient us. This “residual = 0” line corresponds to the regression line.
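Both facts can be verified on any data set. The sketch below fits a least-squares line to a small made-up data set (not from the text) and adds up the residuals; because of rounding in floating-point arithmetic the total comes out as a tiny number rather than exactly 0.

```python
# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual for each point: observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

residual_sum = sum(residuals)
residual_mean = residual_sum / n
print(residual_sum, residual_mean)
```

If the residuals from a line you fitted by hand do not sum to roughly zero, that is a sign the slope or intercept was miscalculated.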

Page 25

Examining Residual Plot

• The residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and a straight line may not be the best model.

• Residuals should be relatively small in size. A regression line in a model that fits the data well should come “close” to most of the points.

• A commonly used measure of this is the standard deviation of the residuals, given by: $s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}}$

For the NEA and fat gain data, $s = \sqrt{\frac{7.663}{14}} = 0.740$
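A short sketch of the standard deviation of the residuals follows. It assumes the figures for the NEA data read as a sum of squared residuals of 7.663 divided by n - 2 = 14; the function name `residual_std` is hypothetical.

```python
from math import sqrt

def residual_std(residuals):
    """s = square root of (sum of squared residuals divided by n - 2)."""
    n = len(residuals)
    return sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# NEA and fat gain data, assuming sum of squared residuals = 7.663 and n - 2 = 14.
s_nea = sqrt(7.663 / 14)
print(f"s = {s_nea:.3f} kg")

# The general function applied to a tiny made-up list of residuals.
s_example = residual_std([1, -1, 1, -1])
print(s_example)
```

Roughly speaking, s is the typical size of a prediction error, in the units of the response variable.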

Page 26

Residuals List on Calc

• If you want to get all your residuals listed in L3, highlight L3 (the name of the list, at the top), go to 2nd, Stat, RESID, then hit enter twice. The list that appears holds the residual for each individual in the corresponding rows of L1 and L2. (A normal scatterplot with X list L1 and Y list L3 gives exactly the same graph as a residual plot with X list L1 and Y list RESID, as we had been doing.)

This is a helpful list to have to check your work when asked to calculate an individual’s residual.

Page 27

Residual Plot on Calc

• Produce a scatterplot and regression line from the data (let’s use BAC if it is still in there)

• Turn all plots off

• Create new scatterplot with X list as your explanatory variable and Y list as residuals (2nd stat, resid)

• Zoom Stat

Page 28

Bivariate Relationships

What is bivariate data?

When exploring/describing a bivariate (x,y) relationship:

• Determine the Explanatory and Response variables

• Plot the data in a scatterplot

• Note the Strength, Direction, and Form

• Note the mean and standard deviation of x and the mean and standard deviation of y

• Calculate and Interpret the Correlation, r

• Calculate and Interpret the Least Squares Regression Line in context.

• Assess the appropriateness of the LSRL by constructing a Residual Plot.