CHAPTER 5 Summarizing Bivariate Data What conclusions can be made when considering the effect of one treatment on another?

Upload: anna-osborne

Post on 05-Jan-2016



CHAPTER 5

Summarizing Bivariate Data

What conclusions can be made when considering the effect of one treatment on another?


5-1 SCATTERPLOTS

What is a scatterplot, and what can be determined from one?


TYPES OF DATA

Univariate—one list

Bivariate—two lists

Multivariate—multiple lists


SCATTERPLOT

The most important graphical representation of bivariate data

Plotted on a Cartesian coordinate system

graphs


5.1 HOMEWORK

Page 150-151 2, 4, 6, 8


5-2 CORRELATION

WHAT IS MEANT BY CORRELATION?


Strong Negative Correlation

As x increases, y decreases

Strong Positive Correlation

As x increases, y increases

No Correlation

x and y do not appear to be related


Correlation coefficient—

Indicates the strength of the relationship of bivariate data.

Pearson’s correlation coefficient is the most commonly used and often called simply THE correlation coefficient.


To calculate Pearson's correlation coefficient by hand

Find $\bar{x}$, $s_x$ (ave. x, sd of x)

Find $\bar{y}$, $s_y$ (ave. y, sd of y)

Find $z_x$ (calc. the z-score for each $x_i$)

Find $z_y$ (calc. the z-score for each $y_i$)

Multiply $z_x z_y$ (multiply the $z_x$ and the $z_y$)

Calc. r

$$r = \frac{\sum z_x z_y}{n - 1}$$

Remember $-1 \le r \le 1$

Use the chart to help

X | Y | $z_x$ | $z_y$ | $z_x z_y$


To calculate Pearson's correlation coefficient by calculator

Enter the data in L1, L2

Turn on the diagnostics

Find the linear regression for the data


Strong Negative Correlation

As x increases, y decreases

Strong Positive Correlation

As x increases, y increases

No Correlation

x and y do not appear to be related

Correlation values:

-1 to -.8 and .8 to 1: strong

-.8 to -.5 and .5 to .8: moderate

-.5 to .5: weak

Same slide as before with an addition
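The cutoffs on this slide can be sketched as a small lookup. The function name `correlation_strength` is hypothetical, and the slide does not say which label applies exactly at .5 or .8, so treating those boundary values as the stronger category is an assumption.

```python
def correlation_strength(r: float) -> str:
    """Label r with the slide's cutoffs (boundary handling is an assumption)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign gives the direction
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"


print(correlation_strength(-0.92))  # strong
print(correlation_strength(0.6))    # moderate
print(correlation_strength(0.1))    # weak
```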


EXAMPLE 1

observation               1   2   3   4   5   6   7   8   9   10
crisis management score   20  13  27  18  19  21  0   21  21  11
family strength score     50  60  67  57  49  72  50  68  60  58

Find the correlation coefficient for crisis management vs. family strength, using both the calculator and Excel.

Repeat, switching L1 and L2 on the calculator. What does this indicate?

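Alternate method:

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}}$$

Listed on formula sheet

Will only be used if they give you summary statistics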
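As a rough check of Example 1 outside the calculator, the sketch below computes r two ways: from z-scores (the by-hand method) and from the summary sums (the alternate method). The function names are hypothetical, and sample standard deviations (dividing by n - 1) are assumed.

```python
from math import sqrt
from statistics import mean, stdev

crisis = [20, 13, 27, 18, 19, 21, 0, 21, 21, 11]
family = [50, 60, 67, 57, 49, 72, 50, 68, 60, 58]


def pearson_r_zscores(x, y):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)


def pearson_r_sums(x, y):
    """Same r computed only from summary sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


r_xy = pearson_r_zscores(crisis, family)
r_yx = pearson_r_zscores(family, crisis)  # lists switched
r_alt = pearson_r_sums(crisis, family)
print(round(r_xy, 3), round(r_yx, 3), round(r_alt, 3))
```

All three printed values agree, which illustrates that r does not depend on which variable is labeled x.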


Properties of r

Does not depend on the unit of measurement

Does not depend on which variable is labeled x

Is always between -1 and 1

1 indicates a perfect positive correlation

0 indicates no correlation

-1 indicates a perfect negative correlation

Measures the extent to which x and y have a linear relationship

r is the correlation coefficient for the sample


Correlation DOES NOT imply causation

Often two items have a high correlation not because they impact each other but because they are strongly related to a third item.

EX. Among elementary students, there is a strong positive correlation between vocabulary size and the number of cavities. WHY?

They are both related to age.


Spearman's Rank Correlation Coefficient

Not as affected by "outliers"

Rank the x's from low to high

Rank the y's from low to high

Keep the original x and y together

EX.

rank of x   2   1   3   4
X           3   -2  5   7
Y           6   9   4   12
rank of y   2   3   1   4

Use the calculator as before OR

$$r_s = \frac{\sum (\text{rank of } x)(\text{rank of } y) - \frac{n(n+1)^2}{4}}{\frac{n(n-1)(n+1)}{12}}$$

$-1 \le r_s \le 1$
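The rank formula can be checked on the four-point example with a short sketch. The helper names `ranks` and `spearman_rs` are hypothetical, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank values from low (1) to high (n); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's r_s from the sum of rank products."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_products = sum(rx * ry for rx, ry in zip(rank_x, rank_y))
    return (sum_products - n * (n + 1) ** 2 / 4) / (n * (n - 1) * (n + 1) / 12)


x = [3, -2, 5, 7]
y = [6, 9, 4, 12]
print(ranks(x))           # [2, 1, 3, 4]
print(ranks(y))           # [2, 3, 1, 4]
print(spearman_rs(x, y))  # 0.2
```

Here the rank products sum to 26, so $r_s = (26 - 25)/5 = 0.2$.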


5.2 HOMEWORK

P 163 5.9, 5.10, 5.12, 5.13,

5.14, 5.16, 5.18, 5.22


5.3 FITTING A LINE TO BIVARIATE DATA How do you fit a line to linear data?


5.3 FITTING A LINE TO BIVARIATE DATA Activation:

Given the following points, find the equation of the line

X    Y
-2   2
0    -2
2    -6


VARIABLES DEFINED

X = the independent or explanatory variable

Y = the dependent or response variable

Stat version of the linear regression (#8): y = a + bx

Algebra and calculus version (#4): y = ax + b

The slope and y-intercept are the same, but stat prefers the a + bx setup


REGRESSION LINE FORMED BY THE PRINCIPLE OF LEAST SQUARES

Determine the vertical distance from each point to the line that is supposed to represent the overall pattern of the data

if y = a + bx then

the predicted points are (x1, y1), (x2, y2), (x3, y3), etc.

the vertical distance is

yi – (a + bxi)

if this is positive yi is above the prediction line

if this is negative yi is below the prediction line


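LEAST SQUARES REGRESSION LINE

The least squares regression line is the one that minimizes

$$\sum \left(y_i - (a + b x_i)\right)^2$$

The formula for the least squares line is

$$\hat{y} = a + bx$$

a and b can be calculated by

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \qquad a = \bar{y} - b\bar{x}$$

(on the AP STAT formula sheet)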


CALCULATING BY HAND

$$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \qquad a = \bar{y} - b\bar{x}$$

These values can be calculated straight from the data. This formula is not on the formula sheet and is only used when the summary values are given.
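A minimal sketch of the summary-sum formula, applied to the three points from the 5.3 activation slide. The function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for y-hat = a + b*x using the summary-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # a = y-bar - b * x-bar
    return a, b


a, b = least_squares([-2, 0, 2], [2, -2, -6])
print(a, b)  # -2.0 -2.0
```

Because those three points lie exactly on a line, the least squares line passes through all of them: $\hat{y} = -2 - 2x$.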


LEAST SQUARES REGRESSION LINE

USE for INTERPOLATION not EXTRAPOLATION

Interpolation: data values between the given values

Extrapolation: data values beyond the given values

If you are asked to extrapolate, always state that the values may not be accurate due to extrapolation


EXAMPLE

Age in months   Height in inches

19 22

21 23

23

24 25

27 28

29 31

31 28

34 32

38 34

43 39

50 45

72 48

84 54

58

120 62

128

Find the linear regression line for the given data: then find the values for the missing data


MINITAB INFO

$\hat{y} = 61.54 + 3.407x$

The regression equation is
Chollevl = 61.5 + 3.41 perchgwt

Predictor   Coef     Stdev   t-ratio   p
Constant    61.537   2.268   27.13     0.000
Perchgwt    3.407    1.028   3.31      0.007

The Constant coefficient is the value of a; the Perchgwt coefficient is the value of b (slope)

[Scatterplot: cholesterol level vs. % weight change]

Should only be used to predict cholesterol from % weight change, and only % weight changes from -5 to 3 should be used with any certainty.


USING PEARSON’S CORRELATION COEFFICIENT AND ALGEBRAIC MANIPULATION:

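Given $b = r\frac{s_y}{s_x}$ and $\hat{y} = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x})$

1) If $x = \bar{x}$

Then $\hat{y} = \bar{y}$

2) If r = 1, then $\hat{y} = \bar{y} + \frac{s_y}{s_x}(x - \bar{x})$

if $x = \bar{x} + 1s_x$, then $\hat{y} = \bar{y} + s_y$

if $x = \bar{x} + 2s_x$, then $\hat{y} = \bar{y} + 2s_y$

3) If it is not a perfect correlation, let r = .5, so $\hat{y} = \bar{y} + .5\frac{s_y}{s_x}(x - \bar{x})$

Then substituting $x = \bar{x} + 1s_x$ gives $\hat{y} = \bar{y} + .5s_y$

This means that $\hat{y}$ will be r standard deviations from $\bar{y}$ for each standard deviation that x is from $\bar{x}$

Hence it pulls (regresses) y back into the line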
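The relationship $b = r\,s_y/s_x$ can be checked numerically. The sketch below reuses the Example 1 scores purely as illustration data; the variable and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

x = [20, 13, 27, 18, 19, 21, 0, 21, 21, 11]   # crisis management scores
y = [50, 60, 67, 57, 49, 72, 50, 68, 60, 58]  # family strength scores

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)
b_least_squares = s_xy / s_xx  # slope from the least squares formula
b_from_r = r * s_y / s_x       # slope from r and the standard deviations


def predict(x_value):
    """y-hat = y-bar + r * (s_y / s_x) * (x - x-bar)."""
    return y_bar + r * (s_y / s_x) * (x_value - x_bar)


# One standard deviation above x-bar predicts r standard deviations above y-bar.
shift_in_sd = (predict(x_bar + s_x) - y_bar) / s_y
print(round(b_least_squares, 4), round(b_from_r, 4))
print(round(shift_in_sd, 4), round(r, 4))
```

Each printed pair matches: the two slopes are equal, and the shift in predicted y (measured in standard deviations) equals r.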


5.3 HOMEWORK

Page 174-176 26, 27, 28, 31, 32, 34


5.4 ASSESSING THE FIT OF A LINE

How do you assess how well a line fits the data?


3 CHECKS FOR FIT

1) Is a line an appropriate way to summarize the data (does the shape appear to be linear)?

2) Are there any unusual aspects of the data that need to be considered before making predictions?

3) How accurate can we expect these predictions to be?


FINDING RESIDUALS

The distance from the actual (observed) value to the predicted value (HINT: this is an AP class; a residual is Actual - Predicted)

$$\text{residual} = y_i - \hat{y}_i$$

Using the calculator to find residuals

L1 = x

L2 = y

L3 = predicted: in L3, enter VARS, 5: Statistics, EQ, RegEQ, then replace the X in RegEQ with L1

L4 = residuals: in L4, type L2 - L3
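The same list arithmetic can be sketched off the calculator. The three data points below are made up for illustration (not from the slides); for these points the least squares line works out to $\hat{y} = 0.5 + 2.5x$.

```python
def residuals(x, y, a, b):
    """Actual minus predicted for the line y-hat = a + b*x."""
    predicted = [a + b * xi for xi in x]              # plays the role of L3
    return [yi - pi for yi, pi in zip(y, predicted)]  # plays the role of L4


x = [0, 1, 2]  # hypothetical data
y = [1, 2, 6]
res = residuals(x, y, a=0.5, b=2.5)
print(res)       # [0.5, -1.0, 0.5]
print(sum(res))  # 0.0
```

A positive residual means the point lies above the line, a negative one below; for a least squares line the residuals sum to zero.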


PLOTTING RESIDUALS

$(x,\ y - \hat{y})$ OR $(\hat{y},\ y - \hat{y})$

There are two types of residual plots that can be made. Each gives us a picture that can be examined.

Residuals for a good fit should have no particular pattern and should lie in a band not too far from zero


WHAT TO LOOK FOR

Removal of the data causing a single large residual has a minimal impact on the regression line

Removal of a single influential point has a large impact on the regression line.

An influential point is one where the x is not in the same group as the rest of the values.


THE COEFFICIENT OF DETERMINATION

Gives the proportion of variation in y that is attributed to the approximate linear relationship between x and y.

$$r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$$

$r^2$: amount actually attributed to the linear relationship

SSTo: possible amount explained by a linear relationship

SSResid: amount not attributed to a linear relationship


SSTo AND SSRESID CALCULATIONS

SSTo

Total sum of squares: squared variation from the mean of y

$$\text{SSTo} = \sum (y_i - \bar{y})^2$$

SSResid

The amount of variation not attributed to a linear relationship

Referred to as the error sum of squares

$$\text{SSResid} = \sum (y_i - \hat{y}_i)^2$$

SSResid ≤ SSTo

Easy Computational Formulas

$$\text{SSTo} = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$\text{SSResid} = \sum y^2 - a\sum y - b\sum xy$$

$$r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$$

All items can be obtained from the regression line and the 2-variable stats function, including the coefficient of determination
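To see that the definitions and the computational formulas agree, here is a small sketch on made-up data (three hypothetical points whose least squares line is $\hat{y} = 0.5 + 2.5x$).

```python
x = [0, 1, 2]    # hypothetical data
y = [1, 2, 6]
a, b = 0.5, 2.5  # least squares intercept and slope for these points

n = len(y)
y_bar = sum(y) / n

# Definitions
ss_to = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Computational formulas, using only summary sums
sum_y = sum(y)
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
ss_to_short = sum_y2 - sum_y ** 2 / n
ss_resid_short = sum_y2 - a * sum_y - b * sum_xy

r_squared = 1 - ss_resid / ss_to
print(ss_to, ss_to_short)        # 14.0 14.0
print(ss_resid, ss_resid_short)  # 1.5 1.5
print(round(r_squared, 4))       # 0.8929
```

Note that the shortcut for SSResid is only valid when a and b are the least squares values.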


STANDARD DEVIATION ABOUT THE LEAST SQUARES LINE

Denoted $s_e$, which means the standard deviation of error

$$s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$$

n - 2 relates to degrees of freedom, to be discussed later

For a truly good fit, $r^2$ must be larger than .5 and $s_e$ should be low
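A short sketch of the formula follows; the function name and the input values (SSResid = 8.0 with n = 10) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def std_dev_about_line(ss_resid: float, n: int) -> float:
    """s_e = sqrt(SSResid / (n - 2)); needs more than two data points."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return sqrt(ss_resid / (n - 2))


print(std_dev_about_line(8.0, 10))  # 1.0
```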


MINITAB AND CORRELATION

Page 179


EXAMPLE

Use data from 5.44

1) Use the calculator to:

a) draw a scatterplot

b) find the regression line

c) find the correlation coefficient

d) calculate the predicted values

e) calculate the residuals

f) graph the residuals

X Y

92 1.7

92 2.3

96 1.9

100 2.0

102 1.5

102 1.7

106 1.6

106 1.8

121 1.0

143 0.3


5.4 HOMEWORK

Page 188-191 37, 38, 39, 41, 42, 43, 48, 51 c&d


5.5 NONLINEAR RELATIONSHIPS AND TRANSFORMATION

How are nonlinear relationships explained?


TRANSFORMATIONS

DO NOT mean moved from the parent function

DO mean adjusting x and/or y values so that the new points appear linear

Common transformations are sq. roots, logs, and reciprocals

[Figure panels: original; algebraic transformation]


QUADRATIC AND CUBIC FUNCTIONS

Use a graphing calculator or a stat package such as Minitab or Fathom

Quadratic equations can be done by hand although it is not recommended

$$\text{SSResid} = \sum (y - \hat{y})^2$$

$$R^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}} = 1 - \frac{\sum (y - \hat{y})^2}{\text{SSTo}}$$


UNDOING A TRANSFORMATION

$y' = 1.14 - 1.92x$ where $y' = \log(y)$

$\log y = 1.14 - 1.92x$

$10^{\log y} = 10^{1.14 - 1.92x}$

$y = 10^{1.14 - 1.92x}$

$y = (10^{1.14})(10^{-1.92x})$

$y = 13.8038\,(10^{-1.92x})$

Undoing a transformation yields a curve that fits the data, but is not a least squares line.
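The back-transformation can be sketched in code using the coefficients from this slide; the function name `predict_y` is hypothetical.

```python
def predict_y(x: float) -> float:
    """Back-transform the fitted line log10(y) = 1.14 - 1.92x."""
    return 10 ** (1.14 - 1.92 * x)


print(round(10 ** 1.14, 4))  # 13.8038, the constant in the last step
print(predict_y(0))          # equals 10 ** 1.14
```

Predictions from `predict_y` are on the original y scale, even though the line itself was fit to log(y).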


DETERMINING WHICH TRANSFORMATION TO USE

[Figure: four curve shapes numbered 1 to 4, drawn around axes labeled +x, -x, +y, -y]

If the curve resembles one of the numbered curves, then to achieve a linear transformation move up (+) or down (-) the power chart as indicated by the closest part of the x or y axis.

Power   Function        Name
3       $x^3$           Cube
2       $x^2$           Square
1       $x$             No transformation
1/2     $\sqrt{x}$      Sq. Root
1/3     $\sqrt[3]{x}$   Cube Root
0       $\log x$        Log
-1      $1/x$           Reciprocal


EXAMPLE

frying time (x)   moisture (y)
5                 16.3
10                9.7
15                8.1
20                4.2
25                3.4
30                2.9
45                1.9
60                1.3

#3 curve, therefore x and/or y down

frying time (x)   moisture (y)   transformation log(y)
5                 16.3           1.212187604
10                9.7            0.986771734
15                8.1            0.908485019
20                4.2            0.62324929
25                3.4            0.531478917
30                2.9            0.462397998
45                1.9            0.278753601
60                1.3            0.113943352

Is the transformed data linear?

Find the linear regression on the transformation

Check the residual pattern. Try a different transformation. Plot this residual pattern. Which one looks better? Which has a better r value?
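One way to work through these questions is sketched below: fit a line to the raw data and to the log-transformed data, then compare r. The helper name `fit_line` is hypothetical, and base-10 logs are assumed because they reproduce the log(y) column in the table.

```python
from math import log10, sqrt

frying_time = [5, 10, 15, 20, 25, 30, 45, 60]
moisture = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3]
log_moisture = [log10(y) for y in moisture]


def fit_line(x, y):
    """Return (a, b, r): least squares intercept, slope, and correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, s_xy / sqrt(s_xx * s_yy)


a_raw, b_raw, r_raw = fit_line(frying_time, moisture)
a_log, b_log, r_log = fit_line(frying_time, log_moisture)
print(f"raw:  y-hat      = {a_raw:.3f} + ({b_raw:.4f})x, r = {r_raw:.3f}")
print(f"log:  log(y)-hat = {a_log:.3f} + ({b_log:.4f})x, r = {r_log:.3f}")

# Residuals on the transformed scale, ready to be plotted against x
log_residuals = [ly - (a_log + b_log * x)
                 for x, ly in zip(frying_time, log_moisture)]
```

For this data the transformed fit has the larger |r|, though the residual plot should still be checked before settling on a transformation.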


5.5 HOMEWORK

Page 206-207 52, 53, 59


5.6 INTERPRETING THE RESULTS OF STATISTICAL ANALYSIS

Read pages 208-209


REVIEW

Page 210-213 61, 63, 64, 66, 68, 69