A regression of this data gives: R-squared = 92.6% s = 0.2417 Variable Coefficient

This is a scatterplot of men’s age at first marriage against year at every census from 1890 to 1940. Comment on this scatterplot.

Upload: lavina

Post on 18-Jan-2016



TRANSCRIPT

Page 1: A regression of this data gives: R-squared = 92.6% s =0.2417 Variable   Coefficient

This is a scatterplot of men’s age at first marriage against year at every census from 1890 to 1940.

Comment on this scatterplot.

Page 2:

$\widehat{\text{age}} = 25.7 - 0.04\,\text{year}$

A regression of this data gives:

• R-squared = 92.6%
• s = 0.2417

Variable    Coefficient
Intercept   25.7
Year        -0.04

• What does all of this information tell us?
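One way to start answering is to turn the printed output into plain statements. The sketch below uses only the three numbers on the slide; the variable names are ours, and reading the age as measured in years is an assumption.

```python
import math

r_squared = 0.926   # R-squared from the regression output
slope = -0.04       # coefficient of Year
s = 0.2417          # standard deviation of the residuals

# The correlation r has the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"Correlation r = {r:.3f}")
print(f"Predicted change in age per decade = {10 * slope:.1f}")
print(f"Variation in age not explained by the line = {1 - r_squared:.1%}")
print(f"Typical size of a residual (s) = {s}")
```

The negative slope says the predicted age at first marriage fell over this period, and the high R-squared says a line describes these censuses well.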

Page 3:

Use the model to predict the median age at first marriage in the year 2000.

What is the term for using a model to “predict” values that are “far off”, well outside the range of the data?
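A rough sketch of the prediction follows. It rests on an assumption the slides do not state: that Year is counted in years since 1890, the first census in the data. (Plugging in the calendar year 2000 directly would give a negative age.) The names `BASE_YEAR`, `predicted_age`, and `is_extrapolation` are ours.

```python
INTERCEPT = 25.7
SLOPE = -0.04
BASE_YEAR = 1890           # ASSUMPTION: Year = calendar year - 1890
DATA_RANGE = (1890, 1940)  # censuses used to fit the model


def predicted_age(calendar_year, base_year=BASE_YEAR):
    """Age at first marriage predicted by the fitted line."""
    return INTERCEPT + SLOPE * (calendar_year - base_year)


def is_extrapolation(calendar_year):
    """True when the year lies outside the range of the data."""
    low, high = DATA_RANGE
    return not (low <= calendar_year <= high)


for year in (1890, 1940, 2000):
    flag = " (extrapolation!)" if is_extrapolation(year) else ""
    print(f"{year}: predicted age {predicted_age(year):.1f}{flag}")
```

Under that assumption the line gives about 21.3 for the year 2000, but the year is 60 years beyond the last census in the data, so nothing guarantees the trend continued.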

Page 4:

Extrapolation is always dangerous!

Page 5:

FOXTROT Cartoon

Page 6:

When describing unusual points:

High-Leverage Points:
• A data point can be unusual if its x-value is far from the mean of the x-values.
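To put a number on "far from the mean", the sketch below uses the standard leverage formula for a one-predictor regression, h = 1/n + (x - mean)² / Σ(x - mean)². The formula is not on the slides, and the x-values are made up for illustration.

```python
def leverages(xs):
    """Leverage of each point in a simple (one-predictor) regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Hypothetical x-values: five clustered points and one far to the right.
xs = [1, 2, 3, 4, 5, 15]
hs = leverages(xs)

for x, h in zip(xs, hs):
    print(f"x = {x:2d}  leverage = {h:.3f}")
```

Leverage depends only on the x-values, so a point can have high leverage whatever its y-value is.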

Page 7:

Influential Points:
• A point is influential if omitting it from the analysis gives a very different model.
• Influential points are often, but not always, high-leverage points.
• Influence depends on both the leverage of a point and its residual.
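The "omit it and see" idea can be tried directly. This is a minimal sketch with made-up data: fit the least-squares line, then refit with each point left out and see which omission changes the slope most. The helper names `fit_line` and `slope_without` are ours.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_without(xs, ys, i):
    """Slope of the line refitted with point i left out."""
    return fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])[0]


# Hypothetical data: six points near y = 2x, plus one far-right point
# that sits well below that trend.
xs = [1, 2, 3, 4, 5, 6, 12]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 8.0]

full_slope, _ = fit_line(xs, ys)
changes = [abs(slope_without(xs, ys, i) - full_slope) for i in range(len(xs))]
most_influential = changes.index(max(changes))

print(f"Slope with all points: {full_slope:.2f}")
for i, change in enumerate(changes):
    print(f"omit ({xs[i]}, {ys[i]}): slope changes by {change:.2f}")
print(f"Most influential point: ({xs[most_influential]}, {ys[most_influential]})")
```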

Page 8:

Example #1

Page 9:

Example #1

• There is the case of a point with moderate leverage but a very large residual (it can be influential).

Page 10:

Example #2

Page 11:

Example #2

• There is the case of a point with high leverage whose y-value sits right on the line of fit. This point is not influential: it does not change the slope, but it does change R².
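A quick check of this case, using made-up data: fit a line to five points, then add a far-away point placed exactly on that line and refit. The helper names are ours.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Fraction of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data with some scatter around a line.
xs = [1, 2, 3, 4, 5]
ys = [2.3, 3.6, 6.4, 7.5, 10.2]
slope_before, intercept_before = fit_line(xs, ys)
r2_before = r_squared(xs, ys)

# Add a high-leverage point that sits exactly on the fitted line.
x_far = 20
xs_new = xs + [x_far]
ys_new = ys + [intercept_before + slope_before * x_far]
slope_after, _ = fit_line(xs_new, ys_new)
r2_after = r_squared(xs_new, ys_new)

print(f"slope: {slope_before:.3f} -> {slope_after:.3f}")
print(f"R-squared: {r2_before:.3f} -> {r2_after:.3f}")
```

The slope stays put because the new point has a residual of zero, while R² rises because the point adds a lot of variation in y that the line explains perfectly.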

Page 12:
Page 13:

Example #3

• There is also the case of extreme leverage, where a point pulls the line right to it. This point is highly influential, but its residual is small.
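The sketch below illustrates this case with made-up data: five points with no real trend, plus one point far off in both x and y. It compares the slope with and without the far point and looks at that point's residual. The helper name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical cluster with essentially no trend.
xs = [1, 2, 3, 4, 5]
ys = [5.2, 4.8, 5.1, 4.9, 5.0]
slope_without, _ = fit_line(xs, ys)

# Add one point with extreme leverage.
xs_all = xs + [20]
ys_all = ys + [15.0]
slope_with, intercept_with = fit_line(xs_all, ys_all)

residuals = [y - (intercept_with + slope_with * x)
             for x, y in zip(xs_all, ys_all)]
far_residual = residuals[-1]
largest_other = max(abs(e) for e in residuals[:-1])

print(f"slope without the far point: {slope_without:.2f}")
print(f"slope with the far point:    {slope_with:.2f}")
print(f"residual of the far point:   {far_residual:.2f}")
print(f"largest other |residual|:    {largest_other:.2f}")
```

A residual plot alone would not flag the far point here, which is why it helps to check leverage and refit without the point.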

Page 14:

Example #3

Page 15:

Example #4

Page 16:

Regression Applet

• http://www.stat.sc.edu/~west/javahtml/Regression.html

• Experiment with adding additional points to a scatterplot and seeing how the regression line changes.

Page 17:

COMPUTER LAB EXPLORATION

1) REGRESSION BY EYE

Go to http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html and read the instructions on that page. Really read them before you begin. Click begin on the left side of the page and start guessing!

2) EXPLORING INFLUENTIAL & HIGH-LEVERAGE POINTS

Go to http://illuminations.nctm.org/LessonDetail.aspx?ID=L456 and read the instructions. Explore the effect of outliers (influential and high-leverage points) on the correlation coefficient and the least-squares line.

3) RESIDUAL PLOTS

Go to http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html and read the directions. Then start plotting points and you will see the least-squares line forming, as well as the residual plots.

Page 18:

Regression towards the Mean

Page 19:

This is a closed wallet multiple choice test. If you are not sure of an answer, guess. Write your answers somewhere in your notebook or on a piece of scrap paper.

MONEY TEST

I. On the back of a nickel is:
(a) Monticello
(b) The Jefferson Memorial

II. On the back of a $2 bill is:
(c) Signers of the Declaration
(d) Independence Hall

III. On the front of a $500 bill is:
(e) Madison
(f) McKinley

Page 20:

Grade yourself (out of 3)

MONEY TEST

I. On the back of a nickel is:
(a) Monticello
(b) The Jefferson Memorial

II. On the back of a $2 bill is:
(c) Signers of the Declaration
(d) Independence Hall

III. On the front of a $500 bill is:
(e) Madison
(f) McKinley

Page 21:

Let’s record our results

• Give us the number correct (out of 3)
• My score was 1 / 3

Page 22:

MONEY MAKE-UP TEST

• This is a closed wallet, multiple choice test. If you’re not sure of an answer, take your best guess.

I. On the front of a $20 bill is:
(a) Jefferson
(b) Jackson

II. On a dollar bill, Washington is looking to his:
(a) left
(b) right

III. On the front of a $1000 bill is:
(a) Cleveland
(b) Wilson

Page 23:

Give yourself a grade on the make-up test

I. On the front of a $20 bill is:
(a) Jefferson
(b) Jackson

II. On a dollar bill, Washington is looking to his:
(a) left
(b) right

III. On the front of a $1000 bill is:
(a) Cleveland
(b) Wilson

Page 24:

Let’s record our make-up exam grades next to our old exams.

FIRST EXAM SCORE    SECOND EXAM SCORE

Page 25:

CALCULATE

• What was the average score on the first test for the “remedial” students (those who missed two or three questions)?

• What was the average score on the second test for the remedial students?

• What was the average score on the make-up test for the “star” students (those who got all three right on the first test)?

Page 26:

The Regression Effect / Regression towards the Mean

• Explain why the scores of the “remedial” students tended to go up and the scores of the “star” students tended to go down.
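A small simulation can make the effect visible. This sketch assumes every student guesses each two-choice question at random on both tests, which is an idealization of the money quiz, not a record of any real class. The function names and the class size are ours.

```python
import random


def simulate_class(n_students=10_000, n_questions=3, seed=0):
    """Return (first_score, second_score) pairs for students who guess."""
    rng = random.Random(seed)

    def guess_score():
        # Each two-choice question is answered correctly half the time.
        return sum(rng.random() < 0.5 for _ in range(n_questions))

    return [(guess_score(), guess_score()) for _ in range(n_students)]


def mean(values):
    return sum(values) / len(values)


scores = simulate_class()

# "Remedial": missed two or three questions on the first test.
remedial = [(first, second) for first, second in scores if first <= 1]
# "Star": got all three right on the first test.
stars = [(first, second) for first, second in scores if first == 3]

remedial_first = mean([first for first, _ in remedial])
remedial_second = mean([second for _, second in remedial])
stars_first = mean([first for first, _ in stars])
stars_second = mean([second for _, second in stars])

print(f"remedial: first {remedial_first:.2f}, make-up {remedial_second:.2f}")
print(f"stars:    first {stars_first:.2f}, make-up {stars_second:.2f}")
```

Nobody learned or forgot anything between the two tests in this simulation. The groups were chosen for extreme first scores, and a second score that includes luck tends to land closer to the overall average.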