4.1 scatter diagrams and correlation. 2 variables ● in many studies, we measure more than one...

4.1 Scatter Diagrams and Correlation

Upload: alexia-moody

Post on 31-Dec-2015


Category:

Documents



TRANSCRIPT

Page 1:

4.1 Scatter Diagrams and Correlation

Page 2:

2 Variables

● In many studies, we measure more than one variable for each individual
● Some examples are
    • Rainfall amounts and plant growth
    • Exercise and cholesterol levels for a group of people
    • Height and weight for a group of people
● In these cases, we are interested in whether the two variables have some kind of relationship

Page 3:

2 Variables

● When we have two variables, they could be related in one of several different ways
    • They could be unrelated
    • One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable)
    • One variable could be thought of as causing the other variable to change
● In this chapter, we examine the second case … explanatory and response variables

Page 4:

Lurking Variable

• Sometimes it is not clear which variable is the explanatory variable and which is the response variable
• Sometimes the two variables are related without either one being an explanatory variable
• Sometimes the two variables are both affected by a third variable, a lurking variable, that was not included in the study

Page 5:

Example of a Lurking Variable

● A researcher studies a group of elementary school children
    • Y = the student’s height
    • X = the student’s shoe size
● It is not reasonable to claim that shoe size causes height to change
● The lurking variable of age affects both of these variables

Page 6:

More Examples

● Rainfall amounts and plant growth
    • Explanatory variable – rainfall
    • Response variable – plant growth
    • Possible lurking variable – amount of sunlight
● Exercise and cholesterol levels
    • Explanatory variable – amount of exercise
    • Response variable – cholesterol level
    • Possible lurking variable – diet

Page 7:

Scatter Diagram

• The most useful graph to show the relationship between two quantitative variables is the scatter diagram
• Each individual is represented by a point in the diagram
• The explanatory (X) variable is plotted on the horizontal scale
• The response (Y) variable is plotted on the vertical scale
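As a rough illustration of the plotting rules above, the following Python sketch draws a text-only scatter diagram with the explanatory variable on the horizontal scale and the response variable on the vertical scale. The function name `text_scatter`, the grid size, and the rainfall and growth numbers are all made up for this example, and the sketch assumes the x-values are not all equal and the y-values are not all equal.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a rough text scatter diagram: x is horizontal, y is vertical."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is printed first, so flip the vertical index.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: one point per individual (here, per plant).
rainfall = [10, 20, 30, 40, 50]   # explanatory (X) variable
growth = [3, 5, 4, 8, 9]          # response (Y) variable

diagram = text_scatter(rainfall, growth)
print(diagram)
```

Each `*` is one individual; a graphing calculator or plotting library would normally be used instead of a text grid.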

Page 8:

Scatter Diagram

• An example of a scatter diagram
• Note the truncated vertical scale!

Page 9:

Relations

● There are several different types of relations between two variables
    • A relationship is linear when, plotted on a scatter diagram, the points follow the general pattern of a line
    • A relationship is nonlinear when, plotted on a scatter diagram, the points follow a general pattern, but it is not a line
    • A relationship has no correlation when, plotted on a scatter diagram, the points do not show any pattern

Page 10:

Positive vs. Negative

• Linear relations have points that cluster around a line
• Linear relations can be either positive (the points slant upward to the right) or negative (the points slant downward to the right)

Page 11:

Nonlinear

• Nonlinear relations have points that have a trend, but not around a line
• The trend has some bend in it

Page 12:

Not Related

• When two variables are not related
    • There is no linear trend
    • There is no nonlinear trend
• Changes in values for one variable do not seem to have any relation with changes in the other

Page 13:

Examples

● Examples of nonlinear relations
    • “Age” and “Height” for people (including both children and adults)
    • “Temperature” and “Comfort level” for people
● Examples of no relations
    • “Temperature” and “Closing price of the Dow Jones Industrials Index” (probably)
    • “Age” and “Last digit of telephone number” for adults

Page 14:

Linear Correlation Coefficient

• The linear correlation coefficient is a measure of the strength of linear relation between two quantitative variables
• The sample correlation coefficient “r” is

    $$ r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1} $$

  where $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and $n$ is the number of individuals
• This should be computed with software (and not by hand) whenever possible
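Since the slide recommends software, here is a minimal Python sketch of the computation. It assumes the usual z-score form of the sample correlation coefficient (the sum of the products of the standardized x- and y-values, divided by n − 1), which may be written differently in the original slide. The function name `sample_correlation` and the hours and scores data are invented for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Sum of the products of the z-scores, divided by n - 1.
    z_products = sum(
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    )
    return z_products / (n - 1)

# Made-up illustrative data, not taken from the slides.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = sample_correlation(hours, scores)
print(f"r = {r:.4f}")
```

The sketch does not guard against a variable with zero spread (all values equal), in which case r is undefined.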

Page 15:

Linear Correlation Coefficient

● Some properties of the linear correlation coefficient
    • r is a unitless measure (so that r would be the same for a data set whether x and y are measured in feet, inches, meters, or fathoms)
    • r is always between –1 and +1
    • Positive values of r correspond to positive relations
    • Negative values of r correspond to negative relations
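The unitless property can be checked numerically. The sketch below uses hypothetical measurements in feet, converts x to inches and y to meters, and compares the two values of r. The helper `correlation` and the data are assumptions made for this example only.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r (equivalent sums-of-squares form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements, both in feet.
x_feet = [4.0, 4.5, 5.0, 5.5, 6.0]
y_feet = [10.0, 12.0, 11.0, 15.0, 14.0]

# The same data in different units: x in inches, y in meters.
x_inches = [12 * x for x in x_feet]
y_meters = [0.3048 * y for y in y_feet]

r_feet = correlation(x_feet, y_feet)
r_converted = correlation(x_inches, y_meters)
print(f"feet/feet: {r_feet:.6f}   inches/meters: {r_converted:.6f}")
```

The two printed values agree up to floating-point rounding, because rescaling a variable by a positive constant rescales its deviations and its standard deviation by the same factor.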

Page 16:

Linear Correlation Coefficient

● Some more properties of the linear correlation coefficient
    • The closer r is to +1, the stronger the positive relation … when r = +1, there is a perfect positive relation
    • The closer r is to –1, the stronger the negative relation … when r = –1, there is a perfect negative relation
    • The closer r is to 0, the weaker the linear relation (either positive or negative)

Page 17:

Examples

● Examples of positive correlation
● In general, if the correlation is visible to the eye, then it is likely to be strong

Page 18:

• Examples of positive correlation
    • Strong Positive, r = 0.8
    • Moderate Positive, r = 0.5
    • Very Weak, r = 0.1

Page 19:

Negative

● Examples of negative correlation
● In general, if the correlation is visible to the eye, then it is likely to be strong

Page 20:

● Examples of negative correlation
    • Strong Negative, r = –0.8
    • Moderate Negative, r = –0.5
    • Very Weak, r = –0.1

Page 21:

Nonlinear

● Nonlinear correlation
● Has r = 0.1, the same as the very weak linear example; the difference is that the nonlinear relation shows a clear pattern, while the very weak linear one shows none
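A small Python sketch shows why a value of r near 0 does not rule out a clear nonlinear pattern. The data here are invented (a perfect parabola, not the data set pictured on the slide), and the helper `correlation` is an assumption of this example.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is determined exactly by x, but the pattern is a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = correlation(xs, ys)
print(f"r for the parabola: {r_parabola:.3f}")
```

Because the left half of the curve slants downward and the right half slants upward by the same amount, the positive and negative contributions cancel and r comes out at essentially zero, even though y depends entirely on x. This is why r should be read together with the scatter diagram.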

Page 22:

Correlation…

● Correlation is not causation!
● Just because two variables are correlated does not mean that one causes the other to change
● There is a strong correlation between shoe sizes and vocabulary sizes for grade school children
    • Clearly larger shoe sizes do not cause larger vocabularies
    • Clearly larger vocabularies do not cause larger shoe sizes
● Often lurking variables result in confounding
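The shoe-size and vocabulary example can be imitated with synthetic data. In the sketch below, every number is invented: age (the lurking variable) drives both shoe size and vocabulary size, and within a single age the two are constructed to be unrelated. The helper `correlation` and the formulas used to generate the data are assumptions made purely for illustration.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic children: three per age, ages 6 through 11.
shoe_offsets = [-0.25, 0.0, 0.25]   # variation in shoe size within an age
vocab_offsets = [100, -200, 100]    # variation in vocabulary within an age
children = []
for age in range(6, 12):
    for d_shoe, d_vocab in zip(shoe_offsets, vocab_offsets):
        shoe_size = 0.5 * age + 8 + d_shoe   # grows with age (made-up rule)
        vocabulary = 1500 * age + d_vocab    # grows with age (made-up rule)
        children.append((age, shoe_size, vocabulary))

# Correlation across all children, ignoring age.
r_all = correlation([c[1] for c in children], [c[2] for c in children])

# Correlation among children of the same age (age held fixed).
eight_year_olds = [c for c in children if c[0] == 8]
r_age_8 = correlation(
    [c[1] for c in eight_year_olds], [c[2] for c in eight_year_olds]
)

print(f"all children: r = {r_all:.3f}")
print(f"8-year-olds only: r = {r_age_8:.3f}")
```

Across all ages the correlation is strong, yet among children of the same age it vanishes, which is the pattern expected when a lurking variable is responsible for the association.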

Page 23:

4.3 Coefficient of Determination

• R², the coefficient of determination, measures the proportion of total variation in the response variable that is explained by the least-squares regression line.

Page 24:

Example

• Weight of Car vs. Miles Per Gallon
• y = -0.007036x + 44.8793
• r = -0.964086
• R² = 0.929461

93% of the variability in miles per gallon can be explained by its linear relationship with the weight.

The remaining 7% of the variability in miles per gallon would be explained by other factors.
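The arithmetic behind the 93% figure can be checked in a few lines of Python. The first part reads the slide's correlation as r = -0.964086 (the decimal point is an assumption, consistent with the stated 93%). The second part sketches the definition of the coefficient of determination as 1 − SSE/SST on a small hypothetical data set; the function name `r_squared_from_fit`, the car weights, and the mileage values are invented and are not the data behind the slide.

```python
# Part 1: squaring the slide's r (decimal point assumed) gives R^2.
r = -0.964086
r_squared = r ** 2
percent_explained = round(100 * r_squared)
print(f"R^2 = {r_squared:.6f}, about {percent_explained}% explained")

# Part 2: R^2 as the proportion of variation explained by the fitted line.
def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: variation left over around the line; SST: total variation in y.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical cars: weight in pounds and miles per gallon.
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [31, 27, 24, 20, 17]
fit_r2 = r_squared_from_fit(weights, mpg)
print(f"R^2 for the hypothetical cars: {fit_r2:.4f}")
```

For a least-squares line with one explanatory variable, R² equals the square of r, which is why squaring the slide's correlation reproduces its coefficient of determination.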

Page 25:

Calculators

• Draw a scatter diagram

AGE VS. HDL CHOLESTEROL

A doctor wanted to determine whether a relation exists between a male’s age and his HDL (so-called good) cholesterol. He randomly selected 17 of his patients and determined their HDL cholesterol levels. He obtained the following data.

Age    HDL Cholesterol
38     57
42     54
46     34
32     56
55     35
52     40
61     42
61     38
26     47
38     44
66     62
30     53
51     36
27     45
52     38
49     55
39     28

Page 26:

• New Document
• Insert Lists & Spreadsheet
• Column A (age), Column B (HDL)
• Type in Data
• Insert Data & Statistics (Ctrl I)
• Put “age” on x-axis (explanatory)
• Put “HDL” on y-axis (response)
• Observe Data (does there appear to be a relationship?)
• Menu
    • 6: Regression
    • Linear Regression

Insert Calculator Page (Ctrl I)
Run Linear Regression
    • Menu
    • 6: Statistics
    • 1: Stat Calculations
    • 3: Linear Regression
    • X List “age”
    • Y List “HDL”
    • ENTER

Record equation, r-value, and r² value
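For readers without the calculator, the same linear regression can be sketched in Python. The age and HDL lists below are read off the slide's data table, assuming the run-together digits split into 17 (age, HDL) pairs; the variable names are this example's own, and the least-squares formulas are the standard ones rather than anything specific to the calculator.

```python
import math

# Data from the age vs. HDL cholesterol slide (17 patients, pairing assumed).
ages = [38, 42, 46, 32, 55, 52, 61, 61, 26, 38, 66, 30, 51, 27, 52, 49, 39]
hdl = [57, 54, 34, 56, 35, 40, 42, 38, 47, 44, 62, 53, 36, 45, 38, 55, 28]

n = len(ages)
mean_age = sum(ages) / n
mean_hdl = sum(hdl) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_age) * (y - mean_hdl) for x, y in zip(ages, hdl))
sxx = sum((x - mean_age) ** 2 for x in ages)
syy = sum((y - mean_hdl) ** 2 for y in hdl)

# Least-squares line: HDL = slope * age + intercept.
slope = sxy / sxx
intercept = mean_hdl - slope * mean_age

# Linear correlation coefficient and coefficient of determination.
r = sxy / math.sqrt(sxx * syy)

print(f"equation: HDL = {slope:.4f} * age + {intercept:.4f}")
print(f"r = {r:.4f}")
print(f"r^2 = {r * r:.4f}")
```

The three printed lines correspond to the equation, r-value, and r² value that the calculator steps ask you to record.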