lecture13 - php.scripts.psu.edu



9/30/13

Sept. 30

Statistic for the Day: (David) Justice vs. (Derek) Jeter, 1995 and 1996

Overall batting average: Jeter’s is higher (.310 vs. .270)

Each year: Justice’s is higher (.253 vs. .250 in 1995, .321 vs. .314 in 1996)
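The reversal in the statistic above can be checked with a little arithmetic. The sketch below is only an illustration: the slide gives the averages but not the underlying counts, so the hits and at-bats used here are the commonly cited season totals for the two players and should be treated as an assumption. The names `seasons`, `average`, and `overall` are made up for this example.

```python
# Hits and at-bats per season. ASSUMPTION: these are the commonly cited
# totals; the slide itself reports only the resulting averages.
seasons = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}


def average(hits, at_bats):
    """Batting average = hits divided by at-bats."""
    return hits / at_bats


def overall(player):
    """Pool both seasons: total hits divided by total at-bats."""
    hits = sum(h for h, _ in seasons[player].values())
    at_bats = sum(ab for _, ab in seasons[player].values())
    return average(hits, at_bats)


for player, by_year in seasons.items():
    yearly = {year: round(average(*record), 3) for year, record in by_year.items()}
    print(player, yearly, "overall:", round(overall(player), 3))
```

Under these counts, most of Jeter's at-bats come from his better year and most of Justice's come from his weaker year, which is how the overall comparison can point the opposite way from both yearly comparisons.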

Assignment: Read Chapter 10

Arby’s sandwiches (from a while ago)

 #   Sandwich                           Weight (g)   Calories
 1   Big Montana                        309          590
 2   Giant Roast Beef                   224          450
 3   Regular Roast Beef                 154          320
 4   Beef 'n Cheddar                    195          440
 5   Super Roast Beef                   230          440
 6   Junior Roast Beef                  125          270
 7   Chicken Breast Fillet              233          500
 8   Chicken Bacon 'n Swiss             209          550
 9   Roast Chicken Club                 228          470
10   Market Fresh Turkey Ranch Bacon    379          830
11   Market Fresh Ultimate BLT          293          780
12   Market Fresh Roast Beef Swiss      357          780
13   Market Fresh Roast Ham Swiss       357          700
14   Market Fresh Roast Turkey Swiss    357          720
15   Market Fresh Chicken Salad         322          770

Arby’s sandwiches (2012 update)

 #   Sandwich                           2012 label     Weight (g)    Calories
 1   Big Montana                                       309           590
 2   Giant Roast Beef                   Max            224 -> 281    450 -> 580
 3   Regular Roast Beef                 Classic        154           320 -> 350
 4   Beef 'n Cheddar                    Classic        195           440
 5   Super Roast Beef                   Mid            230 -> 210    440
 6   Junior Roast Beef                                 125 -> 87     270 -> 210
 7   Chicken Breast Fillet              Crispy         233 -> 221    500 -> 510
 8   Chicken Bacon 'n Swiss             Crispy         209 -> 205    550 -> 610
 9   Roast Chicken Club                 Grand Turkey   228 -> 233    470 -> 490
10   Market Fresh Turkey Ranch Bacon                   379 -> 344    830 -> 800
11   Market Fresh Ultimate BLT                         293           780
12   Market Fresh Roast Beef Swiss                     357           780
13   Market Fresh Roast Ham Swiss                      357           700
14   Market Fresh Roast Turkey Swiss                   357 -> 326    720 -> 700
15   Market Fresh Chicken Salad                        322           770

(a -> b: original value -> 2012 value)

Research Question: At Arby’s, are calories related to the weight of the sandwich?

Let’s try using tools from previous chapters first:

Observational study
• Response = calories
• Explanatory variable = small or large sandwich

Small sandwich means less than 225 grams (n = 6)
Large sandwich means more than 225 grams (n = 4)

[Figure: Arby’s Sandwiches. Calories (axis from 200 to 800) for Large and Small sandwiches.]

There seems to be a difference. (Is it statistically significant? That question comes later in the course!)
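A rough sketch of this small-versus-large comparison follows. The data are an assumption: they are the ten sandwiches that carry a 2012 annotation (a new label or a revised number) in the 2012 update, using the revised values where given. That reading does reproduce the group sizes n = 6 and n = 4 stated above, but it is not spelled out in the lecture.

```python
from statistics import mean

# (weight in grams, calories). ASSUMPTION: the ten sandwiches with a 2012
# annotation in the updated table, with revised values where the table gives them.
sandwiches = [
    (281, 580), (154, 350), (195, 440), (210, 440), (87, 210),
    (221, 510), (205, 610), (233, 490), (344, 800), (326, 700),
]

# Split on the 225-gram cutoff used in the lecture.
small = [cal for wt, cal in sandwiches if wt < 225]
large = [cal for wt, cal in sandwiches if wt > 225]

print(len(small), "small sandwiches, mean calories:", round(mean(small), 1))
print(len(large), "large sandwiches, mean calories:", round(mean(large), 1))
```

Comparing group means throws away how much each sandwich weighs within its group, which is the information the rest of the lecture recovers.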

This is where we consider the new topic of Chapter 10: We can refine the explanatory variable and get more information about the relationship between calories and weight. Rather than split it into small and large, keep the numerical values of the explanatory variable.

(Note: when we do this, we can no longer think of the explanatory variable as identifying which subpopulation the observation belongs to.)


This type of plot, with two measurements per subject, is called a scatterplot (see p. 166).

[Figure: scatterplot, Arby's Sandwiches. x-axis: Weight (100 to 350); y-axis: Calories (200 to 800).]

The correlation measures the strength of the linear relationship between weight and calories.

[Figure: scatterplot, Arby's Sandwiches, Calories vs. Weight, annotated with Correlation = 0.95.]

Facts about Correlation:

• We use the letter “r” to denote the correlation coefficient.
• The correlation coefficient is a measure of the strength of the linear relationship between the two variables in a scatterplot.
• The value of r must always be between −1 and 1:
   a. r = 0 means no linear relationship.
   b. Positive r means the two variables tend to increase together (with r = 1 meaning a perfect linear relationship).
   c. Negative r means that one variable tends to decrease as the other increases (with r = −1 meaning a perfect linear relationship).
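For the curious, here is one way the correlation could be computed by hand. This is a sketch, not part of the course requirements. The weights and calories are an assumption: the ten sandwiches carrying a 2012 annotation in the updated table, with revised values where given. The function name `correlation` is made up for this example.

```python
from math import sqrt

# ASSUMPTION: the ten sandwiches with a 2012 annotation, revised values where given.
weights = [281, 154, 195, 210, 87, 221, 205, 233, 344, 326]
calories = [580, 350, 440, 440, 210, 510, 610, 490, 800, 700]


def correlation(xs, ys):
    """Pearson correlation: co-variation of x and y, scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(weights, calories)
print(round(r, 2))
```

With this reading of the data, the result rounds to the 0.95 shown on the scatterplot.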

The best-fitting line through the data is called the regression line.

How should we describe this line?

[Figure: scatterplot, Arby's Sandwiches, Calories vs. Weight, with the regression line.]

Formula for a regression line

Remember your algebra! The equation for a line is

y = (intercept) + (slope)(x)

or, in this case,

calories = (intercept) + (slope)(weight)

So all we need to describe the line is the intercept and the slope.

The intercept is 41 in this case and the slope is 2.1.

In this class, you don’t need to know how to calculate the slope and intercept (but see p. 195 if you like formulas).

[Figure: scatterplot, Arby's Sandwiches, Calories vs. Weight, with the line cal = 41 + (2.1)(wt).]
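If you do like formulas, the following sketch shows where an intercept and slope like these can come from. It assumes "best-fitting" means least squares, which is the usual convention, and it assumes the data are the ten sandwiches carrying a 2012 annotation in the updated table (revised values where given). The function name `fit_line` is made up for this example.

```python
# ASSUMPTION: the ten sandwiches with a 2012 annotation, revised values where given.
weights = [281, 154, 195, 210, 87, 221, 205, 233, 344, 326]
calories = [580, 350, 440, 440, 210, 510, 610, 490, 800, 700]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(weights, calories)
print(round(intercept), round(slope, 1))
```

Under these assumptions the rounded results agree with the intercept of 41 and slope of 2.1 quoted in the lecture.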


For example, if you have a 200 g sandwich, on average you expect to get about: 41 + (2.1)(200) = 41 + 420 = 461 calories

For a 350g sandwich: 41 + (2.1)(350) = 41 + 735 = 776 calories

calories = 41 + (2.1)(weight in grams)

Here 41 is the intercept and 2.1 is the slope.

Interpretation of slope: Expected increase in the response for every unit increase (increase of one) in the explanatory variable.

For every extra gram of weight, you expect an increase of 2.1 calories in your Arby’s sandwich.
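The two predictions and the slope interpretation can be reproduced in a few lines. This is a minimal sketch using the intercept and slope from the lecture; the function name `expected_calories` is made up for this example.

```python
INTERCEPT = 41   # calories, from the lecture's regression line
SLOPE = 2.1      # calories per gram, from the lecture's regression line


def expected_calories(weight_g):
    """Predicted calories for a sandwich of the given weight in grams."""
    return INTERCEPT + SLOPE * weight_g


for grams in (200, 350):
    print(grams, "g ->", round(expected_calories(grams)), "calories")

# Slope interpretation: one extra gram changes the prediction by SLOPE calories.
extra = expected_calories(201) - expected_calories(200)
print(round(extra, 1))
```

The difference between the predictions for 201 g and 200 g is the slope itself, whatever starting weight is chosen.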

More scatterplots: Exercise hours vs. GPA

“For how many hours do you typically exercise in a typical week during the semester?”

[Figure: scatterplot. x-axis: Exercise Hours (0 to 30); y-axis: Grade point average (2.0 to 4.0).]

Linear relationship with GPA?


More scatterplots: TV hours vs. GPA

“For how many hours do you typically watch television during an average weekday (Monday through Friday) during the semester?”

[Figure: scatterplot. x-axis: TV Hours (0 to 60); y-axis: Grade point average (axis ticks from 5 to 35).]

We may have a slight problem with outliers!

More scatterplots: TV hours vs. GPA

[Figure: scatterplot. x-axis: TV Hours (0 to 10); y-axis: Grade point average (2.0 to 4.0).]


More scatterplots: Weight vs. Ideal Weight

Question: What is the relationship between weight and ideal weight? We’ll use SP2004 data.

[Figure: scatterplot, Men and Women Combined. x-axis: Weight (100 to 250); y-axis: Ideal Weight (100 to 250).]

Compare with case study 10.2, page 193

Dotted red line:

Weight = Ideal Weight

(not a regression line; rather, it’s a line for comparison purposes)


The green line is the regression line:

Ideal weight = 25.6 + 0.78 Weight

Correlation = .867

R-squared = .752

S = 15.17
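The correlation and R-squared reported above are linked: in a regression with a single explanatory variable, R-squared is the square of the correlation. A quick check using the combined-group correlation from the lecture:

```python
r = 0.867            # correlation for men and women combined, from the lecture
r_squared = r ** 2   # with one explanatory variable, R-squared = r * r
print(round(r_squared, 3))
```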

Dotted red line: Weight = Ideal Weight

[Figure: scatterplot, Men Only. x-axis: Weight (150 to 250); y-axis: Ideal Weight (140 to 240).]


Green regression line:

Ideal weight = 66.2 + 0.61 Weight

Correlation = .850

R-squared = .723

S=12.36

What does it mean when the lines cross at 169 pounds?


Dotted red line: Weight = Ideal Weight

[Figure: scatterplot, Women Only. x-axis: Weight (100 to 240); y-axis: Ideal Weight (100 to 160).]

Green regression line:

Ideal weight = 56.1 + 0.50 Weight

Correlation = .831

R-squared = .691

S=8.20


The lines cross at 112 pounds.
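The crossing points can be recovered from the regression equations: the regression line meets the comparison line Weight = Ideal Weight where intercept + slope × weight = weight, that is, at weight = intercept / (1 − slope). The sketch below uses the rounded coefficients from the lecture, so the men's value comes out near 169.7 rather than exactly 169; the small gap is presumably due to rounding of the coefficients. The function name `crossing_weight` is made up for this example.

```python
def crossing_weight(intercept, slope):
    """Weight at which ideal = intercept + slope * weight meets ideal = weight."""
    return intercept / (1 - slope)


# (intercept, slope) of each regression line, as rounded in the lecture.
lines = {"men": (66.2, 0.61), "women": (56.1, 0.50)}

for group, (intercept, slope) in lines.items():
    print(group, round(crossing_weight(intercept, slope), 1))
```

At the crossing weight, the predicted ideal weight equals the actual weight. Below it the regression line sits above the comparison line (predicted ideal is higher than actual), and above it the regression line sits below (predicted ideal is lower than actual).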

           Spring 2001 Mean             Fall 2008 Mean
           Wt.   Ideal Wt.   Diff.      Wt.   Ideal Wt.   Diff.
Comb.      146   138         8          154   146         8
Men        175   171         4          174   172         2
Women      132   122         10         138   126         12

This pattern remained fairly steady over many years of STAT 100: Men on average are about 0-5 pounds heavier than their ideal, whereas women on average are about 10-12 pounds heavier than their ideal.

Note, however, that the regression lines tell a more complete story!

A weighty puzzle: SP 2001 vs. FA 2008 in STAT 100

              SP 2001 Mean Weight   FA 2008 Mean Weight
Combined      146                   154
Men           175                   174
Women         132                   138

Notice: Combined mean weight is 8 pounds heavier in 2008. But women are only 6 pounds heavier on average, and men are actually lighter. How is this possible?

The answer is related to Simpson’s paradox.

Percent men: 32% (SP 2001), 43% (FA 2008)
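The puzzle can be worked through as a weighted average. This sketch assumes each class consists only of the men and women groups, and that the two percentages refer to SP 2001 and FA 2008 in that order. The function name `combined_mean` is made up for this example. Because the inputs are rounded, the results land close to, not exactly on, the combined means of 146 and 154.

```python
def combined_mean(men_mean, women_mean, share_men):
    """Class mean as a weighted average of the two group means."""
    return share_men * men_mean + (1 - share_men) * women_mean


# Group means and percent men from the lecture (rounded values).
sp2001 = combined_mean(175, 132, 0.32)
fa2008 = combined_mean(174, 138, 0.43)

print(round(sp2001, 1), round(fa2008, 1))
print("change in combined mean:", round(fa2008 - sp2001, 1))
```

The combined mean rises by more than either group's mean because the heavier group (men) makes up a larger share of the class in 2008.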