Chapter 3 Association: Contingency, Correlation, and Regression


Upload: neron

Post on 06-Feb-2016


Page 1: Chapter 3 Association: Contingency,  Correlation, and Regression

Agresti/Franklin Statistics, 1 of 52

Chapter 3 Association: Contingency, Correlation, and Regression

Learn ….

How to examine links between two variables


Section 3.2

How Can We Explore the Association Between Two Quantitative Variables?


Scatterplot

Graphical display of two quantitative variables:

• Horizontal Axis: Explanatory variable, x

• Vertical Axis: Response variable, y


Example: Internet Usage and Gross Domestic Product (GDP)


Positive Association

Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y


Negative Association

Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y


Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?


Linear Correlation: r

Measures the strength of the linear association between x and y

• A positive r-value indicates a positive association

• A negative r-value indicates a negative association

• An r-value close to +1 or -1 indicates a strong linear association

• An r-value close to 0 indicates a weak association


Calculating the correlation, r

$$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$
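A minimal sketch of the correlation calculation: each observation's z-score on x is multiplied by its z-score on y, and the products are summed and divided by n - 1. The function name `correlation` and the data values are assumptions made up for illustration; they do not come from the textbook.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Return r = (1/(n-1)) * sum(z_x * z_y), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, chosen only to illustrate the calculation
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 3))  # 0.775, a fairly strong positive linear association
```

Negating every y-value flips the sign of r without changing its size, which matches the positive and negative cases described for r.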


Example: 100 cars on the lot of a used-car dealership

Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer?

• Positive association

• Negative association

• No association


Section 3.3

How Can We Predict the Outcome of a Variable?


Regression Line

Predicts the value for the response variable, y, as a straight-line function of the value of the explanatory variable, x

$$\hat{y} = a + bx$$


Example: How Can Anthropologists Predict Height Using Human Remains?

Regression Equation:

$$\hat{y} = 61.4 + 2.4x$$

$\hat{y}$ is the predicted height and $x$ is the length of a femur (thighbone), measured in centimeters


Example: How Can Anthropologists Predict Height Using Human Remains?

Use the regression equation to predict the height of a person whose femur length was 50 centimeters

$$\hat{y} = 61.4 + 2.4(50)$$
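The prediction can be checked with a few lines of Python. This is only a sketch: the function name `predict_height` is an assumption, while the intercept 61.4 and slope 2.4 are the values from the regression equation for the femur example.

```python
def predict_height(femur_cm):
    """Predicted height (cm) from femur length (cm): y-hat = 61.4 + 2.4x."""
    return 61.4 + 2.4 * femur_cm

height = predict_height(50)
print(round(height, 1))  # 181.4
```

For a 50 cm femur the predicted height is 181.4 cm.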


Interpreting the y-Intercept

y-Intercept:

• The predicted value for y when x = 0

• Helps in plotting the line

• May not have any interpretative value if no observations had x values near 0


Interpreting the Slope

Slope: measures the change in the predicted value of the response variable for every one-unit change in the explanatory variable

Example: A 1 cm increase in femur length results in a 2.4 cm increase in predicted height


Slope Values: Positive, Negative, Equal to 0


Residuals

Measure the size of the prediction errors

Each observation has a residual

Calculation for each residual: $y - \hat{y}$
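As an illustration, residuals can be computed by subtracting each predicted value from the observed value. The sketch below reuses the femur regression equation; the three (femur length, height) observations are hypothetical and are not data from the textbook.

```python
def predict_height(femur_cm):
    """Predicted height (cm) from the femur regression equation."""
    return 61.4 + 2.4 * femur_cm

# Hypothetical (femur length, observed height) pairs, in centimeters
observations = [(45, 172.0), (50, 180.0), (52, 190.0)]

# residual = observed y minus predicted y
residuals = [height - predict_height(femur) for femur, height in observations]
print([round(e, 1) for e in residuals])  # [2.6, -1.4, 3.8]
```

A positive residual means the observed height is above the line; a negative residual means it is below.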


Residuals

A large residual indicates an unusual observation

Large residuals can easily be found by constructing a histogram of the residuals


“Least Squares Method” Yields the Regression Line

Residual sum of squares:

$$\sum (\text{residuals})^2 = \sum (y - \hat{y})^2$$

The optimal line through the data is the line that minimizes the residual sum of squares
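To make the minimization concrete, the sketch below compares the residual sum of squares of the least squares line with that of a few nearby lines. The data are made up; for this particular data set the least squares line works out to intercept 2.2 and slope 0.6. The helper name `rss` is an assumption.

```python
def rss(a, b, xs, ys):
    """Residual sum of squares for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only to illustrate the calculation
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = rss(2.2, 0.6, x, y)  # least squares line for this data
others = [rss(2.2, 0.7, x, y), rss(2.0, 0.6, x, y), rss(2.5, 0.5, x, y)]
print(round(best, 2), [round(v, 2) for v in others])  # 2.4 [2.95, 2.6, 2.5]
```

Every alternative line has a larger residual sum of squares than the least squares line.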


Regression Formulas for y-Intercept and Slope

Slope:

$$b = r\left(\frac{s_y}{s_x}\right)$$

Y-Intercept:

$$a = \bar{y} - b(\bar{x})$$
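A short sketch of these two formulas in Python, using made-up data. The function names `correlation` and `least_squares_line` are assumptions for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

def least_squares_line(xs, ys):
    """Return (a, b) with b = r * (s_y / s_x) and a = y_bar - b * x_bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical data, chosen only to illustrate the calculation
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

The fitted line always passes through the point of means, since a is defined so that a + b times the mean of x equals the mean of y.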


The Slope and the Correlation

Correlation:

• Describes the strength of the association between 2 variables

• Does not change when the units of measurement change

• It is not necessary to identify which variable is the response and which is the explanatory


The Slope and the Correlation

Slope:

• Numerical value depends on the units used to measure the variables

• Does not tell us whether the association is strong or weak

• The two variables must be identified as response and explanatory variables

• The regression equation can be used to predict the response variable
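The contrast between correlation and slope can be seen by changing the units of x. In the sketch below, hypothetical femur lengths are converted from centimeters to inches: r stays the same, while the slope is multiplied by 2.54. The data and function names are assumptions for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

def slope(xs, ys):
    """Slope b = r * (s_y / s_x)."""
    return correlation(xs, ys) * stdev(ys) / stdev(xs)

# Hypothetical femur lengths (cm) and heights (cm)
femur_cm = [40.0, 44.0, 48.0, 52.0, 56.0]
height_cm = [158.0, 166.0, 178.0, 186.0, 196.0]
femur_in = [v / 2.54 for v in femur_cm]  # the same lengths in inches

r_cm, r_in = correlation(femur_cm, height_cm), correlation(femur_in, height_cm)
b_cm, b_in = slope(femur_cm, height_cm), slope(femur_in, height_cm)
print(round(r_cm, 4), round(r_in, 4))  # identical correlations
print(round(b_cm, 2), round(b_in, 2))  # slopes differ by a factor of 2.54
```

Swapping the roles of x and y also leaves r unchanged, whereas the slope depends on which variable is treated as the response.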


Section 3.4

What Are Some Cautions in Analyzing Associations?


Extrapolation

Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data

• Riskier the farther we move from the range of the given x-values

• There is no guarantee that the relationship will have the same trend outside the range of x-values
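One way to guard against accidental extrapolation is to compare a new x-value with the observed range before trusting the prediction. The sketch below is an assumption about how such a check might look; the function name `predict_with_range_check` and the observed femur lengths are hypothetical.

```python
def predict_with_range_check(x_new, a, b, x_observed):
    """Return (prediction, is_extrapolation) for the line y-hat = a + b*x."""
    lo, hi = min(x_observed), max(x_observed)
    is_extrapolation = not (lo <= x_new <= hi)  # True when x_new is outside the data
    return a + b * x_new, is_extrapolation

# Hypothetical femur lengths (cm) used to fit the line y-hat = 61.4 + 2.4x
observed_femurs = [40, 44, 48, 52, 56]

inside = predict_with_range_check(50, 61.4, 2.4, observed_femurs)
outside = predict_with_range_check(80, 61.4, 2.4, observed_femurs)
print(inside[1], outside[1])  # False True
```

The line still returns a number for an 80 cm femur, but the flag signals that no observed data support a prediction that far from the range.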


Regression Outliers

Construct a scatterplot

Search for data points that are well removed from the trend that the rest of the data points follow


Influential Observation

An observation that has a large effect on the regression analysis

Two conditions must hold for an observation to be influential:

• Its x-value is relatively low or high compared to the rest of the data

• It is a regression outlier, falling quite far from the trend that the rest of the data follow
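The two conditions can be illustrated with made-up data. In the sketch below, five points lie exactly on a line with slope 1. Adding an outlier at an extreme x-value changes the slope drastically, while adding an outlier at the middle x-value leaves the slope unchanged. The helper `slope` uses the sum-of-products form, which is algebraically equal to r(s_y / s_x); all names and values are assumptions for illustration.

```python
from statistics import mean

def slope(xs, ys):
    """Least squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data lying exactly on the line y = x
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

base = slope(x, y)
extreme_x = slope(x + [15], y + [0])  # outlier with an unusually high x-value
middle_x = slope(x + [3], y + [10])   # outlier at the middle of the x-values
print(round(base, 2), round(extreme_x, 2), round(middle_x, 2))  # 1.0 -0.15 1.0
```

Only the point that is both far from the trend and extreme in x is influential here; the outlier at the middle of the x-values does not change the slope, although it does shift the intercept.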


Which Regression Outlier is Influential?


Example: Does More Education Cause More Crime?


Correlation Does Not Imply Causation

A correlation between x and y means that a linear trend exists between the two variables

A correlation between x and y does not mean that x causes y


Lurking Variable

A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest


Simpson’s Paradox

The direction of an association between two variables can change after we include a third variable and analyze the data at separate levels of that variable
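A small numerical sketch of such a reversal follows. The counts are invented for illustration and are NOT the textbook's data: within each age group nonsmokers have the higher survival rate, yet in the combined data smokers appear to do better, because most smokers in this made-up sample are in the younger group.

```python
# Hypothetical (survived, total) counts; not the textbook's data
counts = {
    "younger": {"smoker": (90, 100), "nonsmoker": (48, 50)},
    "older":   {"smoker": (10, 50),  "nonsmoker": (60, 200)},
}

def group_rate(group, status):
    """Survival rate for one smoking status within one age group."""
    survived, total = counts[group][status]
    return survived / total

def overall_rate(status):
    """Survival rate for one smoking status, ignoring age group."""
    survived = sum(counts[g][status][0] for g in counts)
    total = sum(counts[g][status][1] for g in counts)
    return survived / total

for group in counts:
    print(group, round(group_rate(group, "smoker"), 2),
          round(group_rate(group, "nonsmoker"), 2))
print("overall", round(overall_rate("smoker"), 2),
      round(overall_rate("nonsmoker"), 2))
```

Here age group plays the role of the third variable: analyzing the data separately at each of its levels reverses the direction seen in the combined table.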


Example: Is Smoking Actually Beneficial to Your Health?


An association can look quite different after adjusting for the effect of a third variable by grouping the data according to the values of the third variable


Data are available for all fires in Chicago last year on x = number of firefighters at the fires and y = cost of damages due to fire

Would you expect the correlation to be negative, zero, or positive?

a. Negative

b. Zero

c. Positive


If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse?

a. Yes

b. No


Identify a third variable that could be considered a common cause of x and y:

a. Distance from the fire station

b. Intensity of the fire

c. Time of day that the fire was discovered
