stats chapter 15

Chapter 15 Inference for Regression

Upload: richard-ferreria

Post on 03-Dec-2014


Page 1: Stats chapter 15

Chapter 15

Inference for Regression

Page 2: Stats chapter 15

The Regression Model

• We can consider each coordinate pair as a part of a random sample

• For each x-coordinate, there are a number of possible y-coordinates that could have been recorded.

• Depending on the outcomes of the random nature of the experiment, we could have different regression models for every experiment conducted

Page 3: Stats chapter 15

The Regression Model

• There must be some “true regression model” that we are approximating with our experiments: μ_y = α + βx

μ_y = “the average response variable”

α = “the true value of the intercept”
β = “the true value of the slope”

• The std dev of y is the same for all values of x

• For a fixed value of x, the responses (y) vary according to a Normal distribution
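As a rough illustration of the model on this slide, the Python sketch below draws repeated samples around a made-up true line and refits each one, so every "experiment" yields a different fitted slope. The values of ALPHA, BETA, SIGMA and the x-values are hypothetical and chosen only for the demonstration; they do not come from the chapter.

```python
import random

def fit_line(xs, ys):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def simulate_slopes(alpha, beta, sigma, xs, experiments, seed=0):
    """Refit the line on repeated samples drawn from the true model."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(experiments):
        # For each fixed x, y is Normal with mean alpha + beta*x and the same sigma.
        ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
        _, b = fit_line(xs, ys)
        slopes.append(b)
    return slopes

# Hypothetical "true" parameters (assumptions for this sketch only).
ALPHA, BETA, SIGMA = 1.0, 2.0, 1.5
XS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

slopes = simulate_slopes(ALPHA, BETA, SIGMA, XS, experiments=1000)
mean_slope = sum(slopes) / len(slopes)
print(f"smallest fitted slope: {min(slopes):.3f}")
print(f"largest fitted slope:  {max(slopes):.3f}")
print(f"average fitted slope:  {mean_slope:.3f} (true slope {BETA})")
```

The fitted slopes scatter around the true slope, which is the reason the later slides attach a standard error to b.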

Page 4: Stats chapter 15
Page 5: Stats chapter 15

Confidence Interval for β

This is a PANIC procedure. As always, some of the steps will be the same as for a significance test.

Parameter
We are constructing a confidence interval for the value of β.

“β is the value of the slope of the regression line of (response var) on (explanatory var)”

Page 6: Stats chapter 15

Confidence Interval for β

Assumptions
(1) Observations are independent
(2) The true relationship is linear
- check scatterplot: “scatterplot appears linear”
- check residuals: “residuals do not show any pattern”

Page 7: Stats chapter 15

Confidence Interval for β

Assumptions (cont.)
(3) The std dev is the same everywhere
- check residuals: “residuals do not show an increasing/decreasing fan pattern”
(4) The response varies Normally about the true regression line
- check the histogram of the residuals: “Histogram of residuals shows a single-peaked, symmetric distribution”
- this distribution may be slightly skewed. NO OUTLIERS

Page 8: Stats chapter 15

Confidence Interval for β

Name of the Interval
“We are constructing a (CL)% confidence interval for the value of β.”

“We are constructing a 90% confidence interval for the value of β.”

Page 9: Stats chapter 15

Confidence Interval for β

Interval calculations
The calculations here get messy: most of the time, we read standard deviations from printouts or calculator work.
There are actually 3 standard deviations on a typical printout:

std error of ‘a’ = “SEa” (we will never use this)

std error of ‘b’ = “SEb”

std error of residuals = “s”

Page 10: Stats chapter 15

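Confidence Interval for β

Interval calculations (cont.)
Calculation of standard errors:

laborious calculations, indeed!

$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} $$

$$ SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} $$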
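The standard-error calculation can be sketched in Python as follows. The five data points are hypothetical, picked so the arithmetic is easy to follow by hand; the function name regression_standard_errors is likewise just an illustrative choice.

```python
import math

def regression_standard_errors(xs, ys):
    """Return (a, b, s, se_b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual = observed y minus predicted y.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # s: standard error of the residuals, with n - 2 degrees of freedom.
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # SEb: standard error of the slope.
    se_b = s / math.sqrt(sxx)
    return a, b, s, se_b

# Hypothetical data, not taken from the chapter.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, s, se_b = regression_standard_errors(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.4f}, SEb = {se_b:.4f}")
```

For this toy data the sum of squared residuals is 2.4, so s = sqrt(2.4 / 3) and SEb = s / sqrt(10).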

Page 11: Stats chapter 15

PHANTOMS again

Interval Calculations (cont.)
From a printout (figure not reproduced; successive slides label these parts of the regression printout):

• a
• b
• r²
• SEb
• s (SE of residuals)
• IGNORED (the part of the printout these procedures do not use)

Page 18: Stats chapter 15

Confidence Interval for β

Interval calculations
Using your calculator to find SEb will be shown later.
CI = b ± t*(SEb), where t* is the critical value of the t distribution with df = n – 2

Conclusion
We are (CL)% confident that the value of the slope of the regression line of (response) on (explanatory) is in the interval (CI).

Page 19: Stats chapter 15

Confidence Interval for β

Example from print-out
b = 0.018, SEb = 0.0024, n = 16, df = 14
For a 95% CI, t* = 2.145 (chart or “-invT(0.05/2, 14)”)
CI = 0.018 ± 2.145(0.0024) = (0.013, 0.023)

“We are 95% confident that the value of the slope of the regression line of BAC level on number of beers drunk is in the interval (0.013, 0.023).”
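The interval arithmetic from this example can be checked with a few lines of Python. The numbers b, SEb, n and t* are the ones on the slide; the function name slope_confidence_interval is only an illustrative choice, and t* is typed in from the chart rather than computed.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* × SEb."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from the print-out example on the slide.
b = 0.018
se_b = 0.0024
n = 16
t_star = 2.145  # critical value for 95% confidence, read from the chart

df = n - 2
lower, upper = slope_confidence_interval(b, se_b, t_star)
print(f"df = {df}, CI = ({lower:.3f}, {upper:.3f})")
# prints: df = 14, CI = (0.013, 0.023)
```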

Page 21: Stats chapter 15

PHANTOMS again

Parameter
For these procedures, we are conducting a significance test on the value of β.

“β is the value of the slope of the regression line of (response var) on (explanatory var)”

Page 22: Stats chapter 15

PHANTOMS again

Hypothesis
When β = 0 there is no linear relationship between the two variables.

H0: β = 0 (there is no linear relationship)

Ha: β ≠ 0 (there is a linear relationship) or,
Ha: β < 0 (there is a negative linear relationship) or,
Ha: β > 0 (there is a positive linear relationship)

Page 23: Stats chapter 15

PHANTOMS again

Assumptions
(1) Observations are independent
(2) The true relationship is linear
- check scatterplot: “scatterplot appears linear”
- check residuals: “residuals do not show any pattern”

Page 24: Stats chapter 15

PHANTOMS again

Assumptions (cont.)
(3) The std dev is the same everywhere
- check residuals: “residuals do not show an increasing/decreasing fan pattern”
(4) The response varies Normally about the true regression line
- check the histogram of the residuals: “Histogram of residuals shows a single-peaked, symmetric distribution”
- this distribution may be slightly skewed. NO OUTLIERS

Page 25: Stats chapter 15

PHANTOMS again

Name of Test
“t-test for the slope of a linear regression”

Page 26: Stats chapter 15

PHANTOMS again

Test Statistic
The calculations here get messy: most of the time, we read standard deviations from printouts or calculator work.
There are actually 3 standard deviations on a typical printout:

std error of ‘a’ = “SEa” (we will never use this)

std error of ‘b’ = “SEb”

std error of residuals = “s”

Page 27: Stats chapter 15

PHANTOMS again

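Test Statistic (cont.)
Calculation of standard errors:

laborious calculations, indeed!

$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} $$

$$ SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} $$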

Page 28: Stats chapter 15

PHANTOMS again

Test Statistic (cont.)

In actuality, a printout will have the test statistic almost completely calculated for you!

$$ t = \frac{b}{SE_b}, \qquad df = n - 2 $$
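A minimal sketch of the test-statistic calculation, assuming we reuse the print-out values from the confidence interval example (b = 0.018, SEb = 0.0024, n = 16). The slides do not report a t value for that print-out, so the result here is only the formula applied to those numbers; the function name slope_t_statistic is hypothetical.

```python
def slope_t_statistic(b, se_b, n):
    """Return (t, df) for testing H0: beta = 0, using t = b / SEb and df = n - 2."""
    t = b / se_b
    df = n - 2
    return t, df

# Print-out values borrowed from the confidence interval example.
t, df = slope_t_statistic(b=0.018, se_b=0.0024, n=16)
print(f"t = {t:.2f} with df = {df}")
```

Because the hypothesized slope is 0, the statistic is simply the estimated slope measured in units of its standard error.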


Page 36: Stats chapter 15

PHANTOMS again

P Value
Ha: β < 0; p val = P(t < T)
Ha: β > 0; p val = P(t > T)
Ha: β ≠ 0; p val = 2 x P(t > |T|)
(T is the observed test statistic; t follows a t distribution with df = n – 2.)

Page 42: Stats chapter 15

PHANTOMS again

Decision
Similarly to the other tests, reject the null hypothesis when the p-value is below the chosen significance level (alpha).

Summary
Use the same 3-part summary:
1) Interpret the p-value w.r.t. the sampling distribution
2) Make a decision with reference to an alpha level
3) Summarize the results in the context of the problem

Page 43: Stats chapter 15

Calculator Methods

• The TI83/84/89 must have the data in list1/list2

• TI83/84 [STAT] -> “TESTS” -> “LinRegTTest”

• TI89 [APPS] -> “Stat/List Editor” -> [TESTS] -> “LinRegTTest”

• Select Xlist, Ylist, and Ha

• “Calculate”

• There are 2 screens of data

Page 44: Stats chapter 15

Problem 15.1

• We will try to determine whether a linear regression will fit the data

Page 45: Stats chapter 15

Problem 15.1

Begin by inputting the data in our lists

Parameter
“β is the slope of the true regression line of IQ on peaks of infant crying in their most active 20 second interval”

Hypotheses
H0: β = 0

Ha: β ≠ 0

Page 48: Stats chapter 15

Problem 15.1

Assumptions
(1) Independence
“The peaks of infant crying are independent from infant to infant”
(2) Linearity (note: you will need to run a regression before you analyze residuals)
“The scatterplot appears moderately linear”
“The residual plot shows no obvious pattern”

Page 54: Stats chapter 15

Problem 15.1

Assumptions (cont.)
(3) Standard deviations
“The residual plot does not show a fan pattern; the standard deviation is most likely the same along the line”
(4) Normal responses
“The histogram of the residuals is right skewed, but our procedure is robust enough to handle the skewness (n = 38)”

Page 60: Stats chapter 15

Problem 15.1

Name of the Test
We will perform a “t-test for linear regressions”

Test Statistic
t = b / SEb

Use your calculator to find t and p
t = 3.065 and p = 0.004
If needed, use the equation for t to solve for SEb

Page 66: Stats chapter 15

Problem 15.1

Obtain p-value
p-value = 2 x P(t > 3.065)
“2*tcdf(3.065,1E99,36)”
p-value = 0.004

Make decision
Reject the null hypothesis
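For readers without a TI calculator, the p-value step can be approximated in plain Python. This is a sketch, not the calculator's own algorithm: it assumes a hand-rolled t density integrated with Simpson's rule, and the helper names t_pdf, t_upper_tail and p_value are made up for the illustration. The inputs (t = 3.065, df = 36, alpha = 0.01) are the ones used in this problem.

```python
import math

def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """Approximate P(t > t_obs) using Simpson's rule on [0, |t_obs|]."""
    x = abs(t_obs)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < t < |t_obs|)
    upper = 0.5 - area            # P(t > |t_obs|), by symmetry
    return upper if t_obs >= 0 else 1 - upper

def p_value(t_obs, df, alternative):
    """p-value for Ha: beta < 0 ("less"), beta > 0 ("greater"), or beta != 0 ("two-sided")."""
    if alternative == "less":
        return 1 - t_upper_tail(t_obs, df)
    if alternative == "greater":
        return t_upper_tail(t_obs, df)
    if alternative == "two-sided":
        return 2 * t_upper_tail(abs(t_obs), df)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Values from Problem 15.1: t = 3.065, df = n - 2 = 36, two-sided Ha.
p = p_value(3.065, 36, "two-sided")
alpha = 0.01
reject_null = p < alpha
print(f"p-value = {p:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```

The result should land close to the 0.004 shown on the calculator screen, which leads to the same decision to reject the null hypothesis.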

Page 67: Stats chapter 15

Problem 15.1

Summary
“If there were truly no linear relationship, a random sample of 38 would produce a test statistic at least as extreme as 3.065 approximately 0.4% of the time”
“Because this p-value is less than an alpha of 0.01, we will reject the null hypothesis”
“We conclude that a linear relationship between the peaks of a baby's crying in a 20 second interval and the baby's IQ does exist.”

Page 68: Stats chapter 15

OMG WE FINISHED THE BOOK