last time: one-way analysis of variance. example list of 50 spoken words 3 x 10 subjects (split...

Last time: One-way Analysis of Variance

Upload: emma-merritt

Post on 20-Jan-2016


Page 1: Last time: One-way Analysis of Variance. Example List of 50 spoken words 3 x 10 Subjects (split among I=3 groups) Group 1: (Fast sound) Person in movie

Last time:

One-way Analysis of Variance

Example

• List of 50 spoken words
• 3 x 10 Subjects (split among I=3 groups)
• Group 1: (Fast sound) Person in movie reads list, but sounds precede lip movement slightly
• Group 2: (Slow sound) Person in movie reads list, but sounds lag behind lip movement slightly
• Group 3: (Synchrony) Person in movie reads list with auditory and visual stimuli in synchrony
• Memory Task: Subjects are asked to recall as many items as possible.

One-way Analysis of Variance Model Assumptions:

I many Independent Groups

Population: \(N(\mu_1, \sigma),\ N(\mu_2, \sigma),\ \ldots,\ N(\mu_i, \sigma),\ \ldots,\ N(\mu_I, \sigma)\)

Data: group \(i\) contributes \(X_{i1}, X_{i2}, \ldots, X_{i n_i}\), for \(i = 1, \ldots, I\)

Sample Size: \(n_1,\ n_2,\ \ldots,\ n_i,\ \ldots,\ n_I\)

One-way Analysis of Variance

\[X_{ij} \sim N(\mu_i, \sigma)\]

\[X_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma)\]

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_I\]

\[H_a: \text{at least one } \mu_i \text{ differs}\]

Similar recipe as in Linear Regression!

\[\sum_{i,j} (X_{ij} - \bar{X})^2 = \sum_{i} n_i (\bar{X}_i - \bar{X})^2 + \sum_{i,j} (X_{ij} - \bar{X}_i)^2\]

Sum Squares Total (SST) = Sum Squares Groups (SSG) + Sum Squares Error (SSE)

Here \(\bar{X}_i\) is the mean of group \(i\) and \(\bar{X}\) is the overall mean.

Degrees of Freedom: DFT = DFG + DFE, that is, N-1 = (I-1) + (N-I)

Mean Squares:

\[MSG = \frac{SSG}{I-1}, \qquad MSE = \frac{SSE}{N-I}, \qquad MST = \frac{SST}{N-1}\]
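
The decomposition into SST, SSG and SSE and the three mean squares can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own computation: the function name `anova_decomposition` is hypothetical and the recall scores are made up.

```python
def anova_decomposition(groups):
    """Return sums of squares and mean squares for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    N = len(all_values)          # total number of observations
    I = len(groups)              # number of groups
    grand_mean = sum(all_values) / N
    group_means = [sum(g) / len(g) for g in groups]

    # Total, within-group (error) and between-group sums of squares.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    return {
        "SST": sst, "SSE": sse, "SSG": ssg,
        "MSG": ssg / (I - 1),
        "MSE": sse / (N - I),
        "MST": sst / (N - 1),
    }


# Hypothetical recall scores for I = 3 groups (illustration only).
example_groups = [
    [20, 23, 19, 25],
    [17, 16, 21, 18],
    [27, 24, 26, 29],
]
result = anova_decomposition(example_groups)
print(result["SST"], result["SSG"] + result["SSE"])  # agree up to rounding
```

The printed pair shows SST and SSG + SSE side by side, which is the identity the slide states.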

\[MSE = \frac{\sum_{i,j} (X_{ij} - \bar{X}_i)^2}{N - I} = S_p^2\]

MSE is an unbiased estimator of \(\sigma^2\).

Under \(H_0: \mu_1 = \mu_2 = \cdots = \mu_I\), MSG is an unbiased estimator of \(\sigma^2\) as well.

Under \(H_0\), \(MSG/MSE\) tends to be close to 1.

\[\text{Under } H_0: \quad \frac{MSG}{MSE} \sim F(I-1,\ N-I)\]

ANOVA

Source of Variation   SS         df   MS         F          P-value
Between Groups        233.8667   2    116.9333   5.894698   0.007513
Within Groups         535.6      27   19.83704
Total                 769.4667   29

Let’s grind it out for our example…

\[\frac{MSG}{MSE} = \frac{116.9333}{19.83704} = 5.89\]

Large MSG leads to significant F statistic.

Reject Null Hypothesis! Conclusion: The population means are not identical across groups.
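
The table entries can be re-derived from the SS and df columns alone. The sketch below assumes only the numbers in the ANOVA table. For the P-value it uses a closed form that holds when the numerator degrees of freedom equal 2, as they do here; it is not a general F-distribution routine.

```python
# Values copied from the ANOVA table.
ssg, df_groups = 233.8667, 2     # Between Groups
sse, df_error = 535.6, 27        # Within Groups

msg = ssg / df_groups
mse = sse / df_error
f_stat = msg / mse

# Upper tail of F(2, d): P(F > f) = (1 + 2f/d) ** (-d/2).
# Valid only because the numerator df is exactly 2 in this example.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(round(msg, 4), round(mse, 5), round(f_stat, 4), round(p_value, 6))
```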

What if I=2?

\[\text{Then: } \frac{MSG}{MSE} \sim F(1,\ N-2)\]

Remember: The square of a t random variable with n-2 degrees of freedom is an F random variable with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator.

Thus, the one-way analysis of variance is a natural extension of the comparison of two means from independent samples (with equal population variances).
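
The link between the pooled two-sample t test and the one-way ANOVA with I=2 can be checked numerically. The following is a minimal sketch on made-up data (the two groups are hypothetical, not from the lecture): it computes the pooled t statistic and the ANOVA F statistic and prints the square of the first next to the second.

```python
import math

# Hypothetical data for I = 2 groups (illustration only).
group_a = [20, 23, 19, 25, 22]
group_b = [17, 16, 21, 18]

n1, n2 = len(group_a), len(group_b)
mean_a = sum(group_a) / n1
mean_b = sum(group_b) / n2
grand_mean = (sum(group_a) + sum(group_b)) / (n1 + n2)

# Within-group (error) sum of squares and pooled variance, df = N - 2.
sse = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
pooled_var = sse / (n1 + n2 - 2)

# Pooled two-sample t statistic (equal population variances assumed).
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic with I = 2, so DFG = 1 and MSE = pooled variance.
ssg = n1 * (mean_a - grand_mean) ** 2 + n2 * (mean_b - grand_mean) ** 2
f_stat = (ssg / 1) / pooled_var

print(t_stat ** 2, f_stat)  # equal up to floating-point rounding
```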


Robustness

• If the sample sizes are equal, then the assumption of equal variance (equal standard deviation) is not crucial.

• CLT helps with violations of normality, i.e. as long as sample sizes are large, we do not need normality of the X variables.

Today:

Wrap up “Loose Ends”

An Illustrating Example on Simple Regression

Typo Correction

One last quiz…

San Fernando Valley Real Estate Data

#    RENT   FEET    YRS   DIST   OFFICE   POWER   CLEAR   LOAD   PARK   LOT    SPRINK
1    0.65   11000   0     12     1000     200     20      0      15     1.52   1
2    0.47   16544   0     12     2156     400     20      1      33     1.5    1
3    0.48   15004   0     12     2178     400     20      1      30     1.5    1
4    0.45   27960   1     16     5824     800     18      0      56     1.43   1
5    0.55   10665   1     13     1000     400     18      0      27     1.5    1
6    0.51   13700   1     7      1370     600     24      0      28     1.63   0
...
47   0.47   13440   23    20     2885     1600    14.5    0      17     1.79   0
48   0.56   14703   24    3      5500     1800    16      1      46     2.31   1
49   0.53   10000   30    9      800      200     14      1      30     3.19   0
50   0.5    10320   31    4      1000     400     14      0      22     1.84   1
51   0.58   27600   33    3      7600     2000    16      0      52     4.5    1
52   0.36   10360   33    20     730      200     12      0      0      0.8    0

etc.

RENT: rent per square foot

FEET: square footage

[Figure: scatter plot with Sq. Feet on the horizontal axis (0 to 70,000); vertical axis from 0 to 35,000]

Is there significant evidence for a linear relationship?

• Test using the correlation
• Test using the slope
• Test using the ANOVA table

Dependent Variable: TotalRent (Realest.xls)
Independent Variables: SquareFeet

Descriptive Statistics
Variable     Mean       Std.Dev.   Std.Err.   Maximum   Minimum   Count
SquareFeet   1.98E+04   1.27E+04   1757.569   64570     5200      52
TotalRent    9885.505   6059.211   840.261    32285     3016      52

Correlation Matrix
Variable     SquareFeet   TotalRent
SquareFeet   1.000
TotalRent    0.944        1.000

Regression Statistics
R       R Square   Adj.RSqr   Std.Err.   # Cases   #Missing   Deg.Free   t(2.5%,50)
0.944   0.891      0.889      2017.147   52        0          50         2.009

Summary Table
Variable     Coeff.    Std.Err.   t Stat.   P-value   Lower95%   Upper95%
Intercept    960.779   521.951    1.841     0.072     -87.591    2009.149
SquareFeet   0.451     0.022      20.253    0.000     0.407      0.496

Analysis of Variance
Source       df   Sum Sqrs   Mean Sqr   F         P-value
Regression   1    1.67E+09   1.67E+09   410.179   0.000
Residual     50   2.03E+08   4.07E+06
Total        51   1.87E+09

Callouts on the slide: Y marks the dependent variable (TotalRent); R is the sample correlation; # Cases is n; Deg.Free is n-2; t Stat. is the t-stat.

\[H_0: \rho_{XY} = 0 \qquad H_A: \rho_{XY} \neq 0\]

Test: under \(H_0\),

\[t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}\]

Reject \(H_0\) if:

\[\frac{|r|\sqrt{n-2}}{\sqrt{1-r^2}} \geq t_{n-2,\ \alpha/2}\]

In the regression output, R = 0.944 is the sample correlation, # Cases gives n = 52, Deg.Free gives n-2 = 50, and t(2.5%,50) = 2.009.

We have:

\[t = \frac{.944\sqrt{52-2}}{\sqrt{1-(.944)^2}} = 20.253 \geq 2.009 = t_{n-2,\ .025}\]

(The value 20.253 is the t Stat. reported in the output; plugging in the rounded r = .944 gives about 20.23.)

The correlation is significant at 5% significance level. Yes, significant evidence for a linear relationship.
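
The correlation test can be reproduced from just r and n. This is a small sketch under the assumption that only the rounded values from the output are available; the helper name `correlation_t_statistic` is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.944, 52          # R and # Cases from the regression output
t_critical = 2.009        # t(2.5%, 50), also from the output
t_stat = correlation_t_statistic(r, n)

# With r rounded to three decimals this comes out near 20.23; the output's
# 20.253 is based on the unrounded correlation.
print(round(t_stat, 2), abs(t_stat) >= t_critical)
```

Either way the statistic is far beyond the critical value, so the conclusion does not depend on the rounding.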

Dependent Variable: TotalRent. Independent Variables: SquareFeet.

How to read the Summary Table:

• Coeff. column: \(\hat\beta_0\) (Intercept) and \(\hat\beta_1\) (SquareFeet)
• Std.Err. column: \(SE_{\hat\beta_0}\) and \(SE_{\hat\beta_1}\)
• t Stat. column: observed t-statistics for \(H_0: \beta_i = 0\) versus \(H_A: \beta_i \neq 0\), that is, \(t^* = \dfrac{\hat\beta_i - 0}{SE_{\hat\beta_i}}\)
• P-value column: p-value \(= P(|t_{n-2}| \geq |t_{observed}|)\)
• Lower95% and Upper95% columns: corresponding 95% CIs

For the slope:

\[H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0\]

\[\hat\beta_1 = 0.451, \qquad SE_{\hat\beta_1} = 0.022, \qquad t^* = \frac{\hat\beta_1 - 0}{SE_{\hat\beta_1}} = 20.253\]

p-value <.001. Yes, significant evidence for linear relationship.

95% CI for the slope: [0.407, 0.496]

From the Analysis of Variance table: Regression Mean Sqr (MSM) = 1.67E+09 and Residual Mean Sqr (MSE) = 4.07E+06.

\[F = \frac{MSM}{MSE} = \frac{1.67\mathrm{E}{+}09}{4.07\mathrm{E}{+}06} = 410.179\]

p-value <.001. Yes, significant evidence for linear relationship.
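
A quick consistency check, using the rounded t Stat. of 20.253 for SquareFeet from the Summary Table: in simple regression the F statistic should match the square of the slope's t statistic, in line with the earlier remark that a squared t variable is an F variable with 1 numerator degree of freedom.

```latex
t^2 = (20.253)^2 \approx 410.18 \approx F = 410.179
```

The small gap comes from rounding in the printed output.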


What is the best fitting regression equation?

From the Summary Table (Intercept 960.779, SquareFeet 0.451):

\[\text{Rent} = 960.779 + .451 \times \text{Square Feet}\]
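
The fitted equation can be wrapped in a small function to read off predictions. This is only a sketch: the name `predicted_rent` and the 10,000 sq. ft. input are illustrative choices, and the coefficients are the rounded values from the Summary Table.

```python
INTERCEPT = 960.779   # Coeff. for Intercept in the Summary Table
SLOPE = 0.451         # Coeff. for SquareFeet in the Summary Table


def predicted_rent(square_feet):
    """Fitted total rent for a given square footage."""
    return INTERCEPT + SLOPE * square_feet


# 10,000 sq. ft. is an arbitrary input inside the observed range (5200 to 64570).
print(predicted_rent(10_000))
```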

“I bet the population intercept is more than 900”

This would mean that you pay a fixed minimum flat amount of $900, plus whatever rent you need to pay based on square footage.

\[H_0: \text{Intercept} = 900, \qquad H_A: \text{Intercept} > 900\]

Using the Intercept row of the Summary Table (Coeff. 960.779, Std.Err. 521.951):

\[\frac{960.779 - 900}{521.951} = .116 < 2.009 = t_{50,\ .025}\]

No significant evidence for the claim.

I bet, for every additional 10 Square Feet, you have to pay more than an extra $4 Rent!

That would mean more than $.40 extra rent per extra square foot.

That would mean the slope is > .4.

\[H_0: \beta_1 = .4 \qquad H_A: \beta_1 > .4\]

Observed standardized statistic, using the SquareFeet row of the Summary Table (Coeff. 0.451, Std.Err. 0.022):

\[\frac{b_1 - .4}{SE_{b_1}} = \frac{.451 - .4}{.022} = 2.318 > 2.109 = t_{50,\ .02}\]

Significant at 2% significance level. Yes, significant evidence that we pay over $4 extra per 10 sq. ft. extra.
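
Both "I bet" claims follow the same recipe: subtract the hypothesized value from the estimate and divide by the standard error. A minimal sketch, assuming the rounded Summary Table values and the critical values quoted on the slides (2.009 and 2.109); the helper name `t_against_value` is hypothetical.

```python
def t_against_value(estimate, std_err, hypothesized):
    """Standardized distance between an estimate and a hypothesized value."""
    return (estimate - hypothesized) / std_err


# Intercept claim: H_A: intercept > 900, compared with t(2.5%, 50) = 2.009.
t_intercept = t_against_value(960.779, 521.951, 900)

# Slope claim: H_A: slope > .4, compared with the slide's critical value 2.109.
t_slope = t_against_value(0.451, 0.022, 0.4)

print(round(t_intercept, 3), t_intercept > 2.009)
print(round(t_slope, 3), t_slope > 2.109)
```

The first comparison fails and the second succeeds, matching the two conclusions on the slides.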


For every additional 1,000 Square Feet, how much extra Rent do you have to pay?

Give a 95% Confidence Interval

95% Confidence Interval for the slope:

\[b_1 \pm t^*_{n-2,\ \alpha/2} \cdot \frac{s}{\sqrt{\sum_i (X_i - \bar{X})^2}}\]

In our case, we obtain:

\[.451 \pm 2.009 \times .022 = [.407,\ .496]\]

This is our 95% CI for the extra Rent per extra Square Foot. Thus: 95% CI for extra Rent per 1,000 Square Feet: [$407, $496]
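
The interval and its rescaling to 1,000 square feet can be recomputed directly. The sketch assumes the rounded slope, standard error and t(2.5%, 50) from the output, so the upper bound differs from the printed .496 in the last digit.

```python
slope, std_err = 0.451, 0.022   # SquareFeet row of the Summary Table
t_star = 2.009                  # t(2.5%, 50) from the Regression Statistics

margin = t_star * std_err
ci_per_sqft = (slope - margin, slope + margin)

# Rescaling the predictor by 1,000 rescales both bounds by 1,000.
ci_per_1000_sqft = tuple(1000 * bound for bound in ci_per_sqft)

print(ci_per_sqft, ci_per_1000_sqft)
```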

What is our best guess at the standard deviation of the Error Term?

What percentage of the variance are we able to explain with this model?

\[s^2 = \frac{SSE}{n-2}, \qquad s = \sqrt{\frac{SSE}{n-2}}\]

\[R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SSR = SST - SSE\]

\[SST = SS_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2\]

\[SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2\]

In the output, s is the Std.Err. under Regression Statistics (2017.147) and \(R^2\) is R Square (0.891), so the model explains about 89% of the variance.
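
Both answers can be cross-checked against the Analysis of Variance table. A minimal sketch, assuming only the rounded sums of squares printed in the output:

```python
import math

sse = 2.03e8    # Residual Sum Sqrs (rounded in the output)
sst = 1.87e9    # Total Sum Sqrs (rounded in the output)
n = 52          # # Cases

s = math.sqrt(sse / (n - 2))     # estimated std. dev. of the error term
r_squared = 1 - sse / sst        # share of variance explained

# Because the sums of squares are rounded, these land close to, but not
# exactly on, the output's Std.Err. = 2017.147 and R Square = 0.891.
print(round(s, 1), round(r_squared, 3))
```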

[Figure: Histogram of Residuals; horizontal axis: Residual Range; series: Residual, Theoretical]

Do the residuals look like \(N(0, \sigma^2)\)?

[Figure: Line Fit Plot; horizontal axis: SquareFeet; series: Actual, Predicted, Upper 95%, Lower 95% (Prediction Region)]

Slide Typo Correction: 2x2 Contingency Tables

Special Case: 2x2 Tables

\[H_0: p_{1(1)} = p_{1(2)}\]

\[X^2 = \sum_{i,j} \frac{(N_{ij} - E_{ij})^2}{E_{ij}} = \frac{n\,(N_{11}N_{22} - N_{12}N_{21})^2}{R_1 R_2 C_1 C_2}\]

Compare to Chi-Square with 1 df.

This typo occurred in several slides due to cut and pasting.
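
The agreement between the general chi-square sum and the 2x2 shortcut can be checked numerically. The sketch below assumes R and C denote row and column totals and n the grand total; the function names and the counts are hypothetical.

```python
def chi_square_general(table):
    """Sum over cells of (N_ij - E_ij)^2 / E_ij for a 2x2 table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (table[i][j] - expected) ** 2 / expected
    return total


def chi_square_shortcut(table):
    """n (N11 N22 - N12 N21)^2 / (R1 R2 C1 C2) for the same table."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


# Hypothetical counts (illustration only).
counts = [[30, 20], [15, 35]]
print(chi_square_general(counts), chi_square_shortcut(counts))
```

Either value is then compared to a Chi-Square distribution with 1 df.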

Last (and special) Quiz

Counts as 5 Bonus Points in Grand Total

Regression