Post on 20-Oct-2019


Page 1: Chapter 3: Examining Relationships - goblues.org/faculty/hillm/files/2009/08/Chapter-3.pdf

Chapter 3: Examining Relationships

3.1 Scatterplots

3.2 Correlation

3.3 Least-Squares Regression

[Figure: scatterplot of Fabric Tenacity, lb/oz/yd^2 (vertical axis, 18 to 26) against Fiber Tenacity, g/den (horizontal axis, 3.5 to 5.0), with fitted line y = 3.9951x + 4.5711, R^2 = 0.9454]

Page 2

Relationship Between Fiber Tenacity and Fabric Tenacity

Fiber Tenacity, g/den    Fabric Tenacity, lb/oz/yd^2
3.6                      19.0
3.9                      20.5
4.1                      20.8
4.3                      21.0
4.8                      23.0
5.0                      24.9

Page 3

Variable Designations

•  Which variable is the dependent variable?

–  Our text uses the term response variable.

•  Which variable is the independent variable?

–  Explanatory variable

•  Note: Sometimes we do not have a clear explanatory-response variable situation … we may just want to look at the relationship between two variables.

•  Problems 3.1 and 3.4, p. 123

Page 4

Scatterplot 1: Relationship Between Fiber Tenacity and Fabric Tenacity

[Figure: scatterplot of Fabric Tenacity, lb/oz/yd^2 (vertical axis, 18 to 26) against Fiber Tenacity, g/den (horizontal axis, 3.5 to 5.0)]

Note placement of response and explanatory variables. Also note axes labels and plot title.

Page 5

Problem 3.6, p. 125

•  Type data into your calculator.

•  Examining a scatterplot:

–  Look for the overall pattern and striking deviations from that pattern.

•  Pay particular attention to outliers.

–  Look at form, direction, and strength of the relationship.

Page 6

Examining a Scatterplot, cont.

•  Form

–  Does the relationship appear to be linear?

•  Direction

–  Positively or negatively associated?

•  Strength of Relationship

–  How closely do the points follow a clear form?

–  In the next section, we will discuss the correlation coefficient as a numerical measure of strength of relationship.

Page 7

Scatterplot for 3.6

Page 8

Problem 3.9, p. 129

Page 9

Tips for Drawing Scatterplots

•  p. 128

Page 10

Adding a Categorical Variable to a Scatterplot

[Figure: scatterplot of Income (Thousands of Year 2000 Dollars) (vertical axis, 0 to 60) against Year (67=year 1967) (horizontal axis, 60 to 110), with separate series for Black, Hispanic, White, and Asian]

Page 11

Homework

•  Reading: pp. 121-135

Page 12

Practice

•  Problems:

–  3.11 (p. 129)

–  3.12 (p. 132)

–  3.16 (p. 136)

Page 13

Figure 3.6, p. 136

Page 14

Which shows the strongest relationship?

[Figure: two scatterplots. First plot: vertical axis 800 to 1600, horizontal axis 30 to 60. Second plot: vertical axis 200 to 2200, horizontal axis 0 to 120.]

Page 15

The two plots represent the same data!

•  Our eyes are not good at judging the strength of a relationship.

–  We need a method for quantifying the relationship between two variables.

•  The most common measure of relationship is the Pearson product-moment correlation coefficient.

–  We generally just say “correlation coefficient.”

Page 16

Correlation Coefficient, r

•  The correlation, r, is an average of the products of the standardized x-values and the standardized y-values for each pair.

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]
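As a minimal sketch of the formula for r, the following Python (standard library only) standardizes each x and y value, multiplies the pairs, and divides the sum by n - 1. It uses the fiber and fabric tenacity values from the table on page 2; the function name `correlation` and the variable names are our own choices, not part of the slides.

```python
from statistics import mean, stdev

# Fiber tenacity (g/den) and fabric tenacity (lb/oz/yd^2) from the page 2 table.
fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

def correlation(x, y):
    """Pearson r: sum of products of standardized values, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

r = correlation(fiber, fabric)
print(f"r = {r:.3f}")
```

On a calculator, the same quantity is reported as r by the linear regression routine, so the two results can be compared directly.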

Page 17

Correlation Coefficient, r

•  A correlation coefficient measures these characteristics of the linear relationship between two variables, x and y.

–  Direction of the relationship

•  Positive or negative

–  Degree of the relationship: How well do the data fit the linear form being considered?

•  A correlation of 1 or -1 represents a perfect linear fit.

•  A correlation of 0 indicates no linear relationship.
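The extreme values of r can be illustrated with a small sketch. The data below are hypothetical and chosen only for illustration: two exact straight lines and one exact parabola. Because r measures linear association only, the parabola should give r = 0 even though y is completely determined by x.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r as the sum of standardized products divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)

# Hypothetical x-values, symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]

r_up = correlation(x, [2 * xi + 1 for xi in x])        # exact increasing line
r_down = correlation(x, [-0.5 * xi + 4 for xi in x])   # exact decreasing line
r_curve = correlation(x, [xi ** 2 for xi in x])        # exact parabola

print(f"increasing line: r = {r_up:.3f}")
print(f"decreasing line: r = {r_down:.3f}")
print(f"parabola:        r = {r_curve:.3f}")
```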

Page 18

Interpreting Correlation Coefficient, r

•  Correlation Applet: http://www.duxbury.com/authors/mcclellandg/tiein/johnson/correlation.htm

•  Facts about correlation

–  pp. 143-144

•  Correlation is not a complete description of two-variable data. We also need to report a complete numerical summary (means and standard deviations, 5-number summary) of both x and y.

Page 19

Exercise 3.25, p. 146

Page 20

Outlier, or influential point?

•  Let’s enter the data into our calculators and calculate the correlation coefficient. The data are in the middle two columns of Table 1.10, p. 59.

–  r = ?

•  Now, remove the possible influential point. What happens to r?
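The effect of removing a point can be sketched in Python. Table 1.10 is not reproduced in these slides, so the data below are hypothetical and serve only to show the mechanics: five loosely scattered points plus one point far from the rest. The function name `correlation` is our own.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r as the sum of standardized products divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)

# Hypothetical data (not Table 1.10): the last point sits far from the others.
x = [1, 2, 3, 4, 5, 20]
y = [3, 1, 4, 2, 3, 15]

r_all = correlation(x, y)
r_without = correlation(x[:-1], y[:-1])  # drop the last (suspect) point

print(f"r with all points:      {r_all:.3f}")
print(f"r without the far point: {r_without:.3f}")
```

In this made-up example a single distant point is responsible for most of the apparent correlation, which is the kind of change the slide asks you to look for.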

Page 21

Page 22

Exercises: Understanding Correlation

•  Review “Facts about correlation,” pp. 143-144

•  3.34, 3.35, and 3.37, p. 149

•  Reading: pp. 149-157

Page 23

Relationship Between Winding Tension and Yarn Elongation

[Figure: scatterplot of Elongation, % (vertical axis, 6.0 to 9.0) against Winding Tension, g (horizontal axis, 10 to 35), with fitted line y = -0.0759x + 9.4455, R^2 = 0.732]

Page 24

Least Squares Regression

•  Ultimately, we would like to predict elongation by using a more practical measurement, winding tension.

–  A regression line, also called a line of best fit, was found.

•  How was the line of best fit determined?

–  Determine mathematically the distance between the line and each data point for all values of x.

–  The distance between the predicted value and the actual (y) value is called a residual (or error).

\[ \text{residual} = \text{error } (e_i) = y_i - \hat{y}_i \]

Page 25

Least Squares Regression: Line of Best Fit

•  This could be done for each data point. If we square each residual and sum all of the squared residuals, we have:

\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

•  The best-fitting line is the line that has the smallest sum of $e^2$ ... the least squares regression line! That is, the line of best fit occurs when:

\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \text{minimum} \]
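The least-squares criterion can be checked numerically. The sketch below, which assumes the fiber and fabric tenacity values from the page 2 table, computes the sum of squared residuals for the fitted line and for a few nudged lines; each nudged line should give a larger sum. The helper names `sse` and `least_squares` and the particular nudges are our own choices.

```python
from statistics import mean

# Fiber tenacity (g/den) and fabric tenacity (lb/oz/yd^2) from the page 2 table.
fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

def sse(a, b, x, y):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Intercept a and slope b that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

a, b = least_squares(fiber, fabric)
best = sse(a, b, fiber, fabric)
print(f"least-squares line: sum of e^2 = {best:.4f}")

# Arbitrary nudges to intercept and slope; each should fit worse.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2), (0.3, -0.1)]
for da, db in nudges:
    print(f"a{da:+.1f}, b{db:+.1f}: sum of e^2 = "
          f"{sse(a + da, b + db, fiber, fabric):.4f}")
```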

Page 26

A Residual (Figure 3.11, p. 151)

Page 27

Least-Squares Regression Line

\[ \hat{y} = a + bx \]

•  With the help of algebra and a little calculus, it can be shown that this occurs when:

\[ b = r \frac{s_y}{s_x} \]

\[ a = \bar{y} - b\bar{x} \]
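A short sketch of the slope and intercept formulas, again assuming the fiber and fabric tenacity values from the page 2 table. It computes r, s_x, and s_y first and then applies the two formulas in order. With the table values as transcribed, the coefficients may not reproduce the equation printed on the title-slide figure exactly, so treat this as an illustration of the formulas and not as a check of that figure.

```python
from statistics import mean, stdev

# Fiber tenacity (g/den) and fabric tenacity (lb/oz/yd^2) from the page 2 table.
fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

def correlation(x, y):
    """Pearson r as the sum of standardized products divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)

r = correlation(fiber, fabric)
b = r * stdev(fabric) / stdev(fiber)   # slope: b = r * s_y / s_x
a = mean(fabric) - b * mean(fiber)     # intercept: a = y-bar - b * x-bar

print(f"y-hat = {a:.4f} + {b:.4f} x")
```

One consequence of the intercept formula is that the fitted line always passes through the point (x-bar, y-bar).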

Page 28

Exercise 3.12, p. 132

•  Is there a relationship between lean body mass and resting metabolic rate for females?

–  Quantify this relationship.

•  Find the line of best fit (the least-squares regression, LSR).

•  Use the LSR to predict the resting metabolic rate for a woman with mass of 45 kg and for a woman with mass of 59.5 kg.

Page 29

Interpreting the Regression Model

•  The slope of the regression line is important for the interpretation of the data:

–  The slope is the rate of change of the response variable for a one-unit change in the explanatory variable.

•  The intercept is the predicted value of y when x = 0. It is statistically meaningful only when x can actually take values close to zero.
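To make the interpretation concrete, here is a small sketch using the winding tension line from page 23, y = -0.0759x + 9.4455. The function name `predict_elongation` and the chosen tension values are our own. The plotted tensions on that slide run from about 10 g to 35 g, so the intercept (the prediction at 0 g) lies outside the observed range and is a good example of an intercept that should be read with caution.

```python
SLOPE = -0.0759      # change in predicted elongation (%) per 1 g of winding tension
INTERCEPT = 9.4455   # predicted elongation (%) at a tension of 0 g

def predict_elongation(tension_g):
    """Predicted elongation (%) from the page 23 regression line."""
    return INTERCEPT + SLOPE * tension_g

at_20 = predict_elongation(20)
at_21 = predict_elongation(21)

print(f"predicted elongation at 20 g: {at_20:.4f} %")
print(f"predicted elongation at 21 g: {at_21:.4f} %")
print(f"change for one extra gram:    {at_21 - at_20:+.4f}")
```

The one-gram difference equals the slope, which is exactly what "rate of change for a one-unit change" means.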

Page 30

R^2: Coefficient of Determination

•  Proportion of variability in one variable that can be associated with (or predicted by) the variability of the other variable.

[Figure annotations: r = 0.85, r^2 = 0.72, 1 - r^2 = 0.28]
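The "proportion of variability" reading can be sketched directly: for a least-squares line, 1 minus the ratio of the residual sum of squares to the total sum of squares in y should equal r squared. The example assumes the fiber and fabric tenacity values from the page 2 table; all variable names are our own.

```python
from statistics import mean

# Fiber tenacity (g/den) and fabric tenacity (lb/oz/yd^2) from the page 2 table.
fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

x_bar, y_bar = mean(fiber), mean(fabric)
s_xx = sum((xi - x_bar) ** 2 for xi in fiber)
s_yy = sum((yi - y_bar) ** 2 for yi in fabric)   # total variation in y
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(fiber, fabric))

b = s_xy / s_xx            # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(fiber, fabric))

r = s_xy / (s_xx * s_yy) ** 0.5
r_squared_from_fit = 1 - sse / s_yy

print(f"r^2 from r:            {r ** 2:.4f}")
print(f"1 - SSE/SST:           {r_squared_from_fit:.4f}")
print(f"unexplained (1 - r^2): {1 - r ** 2:.4f}")
```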

Page 31

Exercise 3.45, p. 166

Page 32

Exercise 3.45, p. 166

Page 33

Residuals

•  In regression, we see deviations by looking at the scatter of points about the regression line. The vertical distances from the points to the least-squares regression line are as small as possible, in the sense that they have the smallest possible sum of squares.

•  Because they represent “left-over” variation in the response after fitting the regression line, these distances are called residuals.

Page 34

Examining the Residuals

•  The residuals show how far the data fall from our regression line, so examining the residuals helps us to assess how well the line describes the data.

–  Residuals Plot

Page 35

Residuals Plot

•  Let’s construct a residuals plot, that is, a plot of the residuals (vertical axis) against the explanatory variable (horizontal axis).

–  pp. 174-175

•  The residuals plot helps us to assess the fit of the least squares regression line.

–  We are looking for similar spread about the line y=0 (why?) for all levels of the explanatory variable.

Page 36

Residuals Plot Interpretation, cont.

•  A curved or other definitive pattern shows an underlying relationship that is not linear.

–  Figure 3.19(b), p. 170

•  Increasing or decreasing spread about the line as x increases indicates that prediction of y will be less accurate for smaller or larger x.

–  Figure 3.19(c), p. 171

•  Look for outliers!

Page 37

Figures 3.19 (a-c), pp. 170-171

Page 38

How to create a residuals plot

•  Create a regression model using your calculator.

•  Create a column in your STAT menu for residuals.

Remember that a residual is the actual value minus the predicted value:

\[ \text{residual} = y - \hat{y} \]
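The calculator steps can be mirrored in a few lines of Python. This is only a sketch, assuming the fiber and fabric tenacity values from the page 2 table (the slides build the plot for Exercise 3.45, whose data are not reproduced here). It fits the line, builds the residual column as actual minus predicted, and prints the pairs you would plot. For a least-squares line with an intercept the residuals sum to zero, which is why the residuals plot is read relative to the horizontal line at 0.

```python
from statistics import mean

# Fiber tenacity (g/den) and fabric tenacity (lb/oz/yd^2) from the page 2 table.
fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

a, b = least_squares(fiber, fabric)
predicted = [a + b * xi for xi in fiber]

# Residual column: actual value minus predicted value.
residuals = [yi - y_hat for yi, y_hat in zip(fabric, predicted)]

# Points of the residuals plot: explanatory variable (x) and residual.
for xi, ei in zip(fiber, residuals):
    print(f"x = {xi:.1f}   residual = {ei:+.3f}")

print(f"sum of residuals = {sum(residuals):.6f}")
```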

Page 39

Residuals Plot for 3.45

Page 40

HW

•  Read through end of chapter

•  Problems:

–  3.42 and 3.43 (parts a and b only), p. 165

–  3.46, p. 173

•  Chapter 3 Test on Monday

Page 41

Regression Outliers and Influential Observations

•  A regression outlier is an observation that lies outside the overall pattern of the other observations.

•  An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.

–  Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

•  Sometimes, however, the point is not influential when it falls in line with the remaining data points.

–  Note: An influential point may be an outlier in terms of x, but we label it as “influential” if removing it significantly influences the regression.
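The distinction between an x-outlier that is influential and one that is not can be sketched with hypothetical data. Below, five base points lie close to a line with slope near 2. We then add a point far out in x in two ways: once in line with the pattern, and once well off it. All values and the function name `slope` are made up for illustration.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope b for the line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical base data lying close to a line with slope about 2.
base_x = [1, 2, 3, 4, 5]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1]

b_base = slope(base_x, base_y)
b_in_line = slope(base_x + [15], base_y + [30.0])   # x-outlier that follows the pattern
b_off_line = slope(base_x + [15], base_y + [10.0])  # x-outlier far from the pattern

print(f"slope, base data:            {b_base:.3f}")
print(f"slope, with in-line point:   {b_in_line:.3f}")
print(f"slope, with off-line point:  {b_off_line:.3f}")
```

Both added points are outliers in the x direction, but only the off-line point changes the slope markedly, so only that one would be labeled influential.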

Page 42

Practice Problems

•  Problems:

–  3.56, p. 179

–  3.74, p. 188

–  3.76, p. 189

Page 43

Preparing for the Test

•  Re-read chapter.

–  Know the terms, big concepts.

•  Chapter Review, pp. 181-182

•  Go back over example and HW problems.

•  Study slides!