Topic 12: Multiple Linear Regression

Upload: heriberto-warters

Post on 14-Dec-2015



Outline

• Multiple Regression

– Data and notation

– Model

– Inference

• Recall notes from Topic 3 for simple linear regression

Data for Multiple Regression

• Yi is the response variable

• Xi1, Xi2, … , Xi,p-1 are p-1 explanatory (or predictor) variables

• Cases denoted by i = 1 to n

Multiple Regression Model

• Yi is the value of the response variable for the ith case

• β0 is the intercept

• β1, β2, … , βp-1 are the regression coefficients for the explanatory variables

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$

Multiple Regression Model

• Xi,k is the value of the kth explanatory variable for the ith case

• εi are independent Normally distributed random errors with mean 0 and variance σ2

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$

Multiple Regression Parameters

• β0 is the intercept

• β1, β2, … , βp-1 are the regression coefficients for the explanatory variables

• σ2 is the variance of the error term

Interesting special cases

• $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_{p-1} X_i^{p-1} + \varepsilon_i$ (polynomial of order p-1)

• X’s can be indicator or dummy variables taking the values 0 and 1 (or any other two distinct numbers)

• Interactions between explanatory variables (represented as the product of explanatory variables)
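Each of these special cases is still linear in the β's; only the columns of the design matrix change. A minimal sketch of how one row of X might be built for the polynomial and interaction cases, where the helper names `polynomial_row` and `interaction_row` are hypothetical and not from the slides:

```python
def polynomial_row(x, p):
    """Row (1, x, x^2, ..., x^(p-1)) for a polynomial of order p-1 in one predictor."""
    return [x ** k for k in range(p)]


def interaction_row(x1, x2):
    """Row (1, x1, x2, x1*x2) for a model with two predictors and their interaction."""
    return [1, x1, x2, x1 * x2]


# A cubic (p = 4) evaluated at x = 2, and an interaction row with a 0/1 dummy x2 = 1.
print(polynomial_row(2, 4))
print(interaction_row(3, 1))
```

Once the rows are stacked into X, the model is fitted exactly like any other multiple regression.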

Interesting special cases

• Consider the model

Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + εi

• If X2 is a dummy variable

– Yi = β0 + β1Xi1 + εi (when X2=0)

– Yi = β0 + β1Xi1 + β2 + β3Xi1 + εi (when X2=1)

= (β0+β2) + (β1+β3)Xi1 + εi

– This models two different regression lines at the same time
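The two-lines reading of the interaction model can be checked numerically. The sketch below uses made-up coefficient values and hypothetical function names (`mean_response`, `line_for_group`); it returns the intercept and slope in X1 for each value of the dummy variable.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """Mean of Y under the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def line_for_group(x2, b0, b1, b2, b3):
    """Intercept and slope in X1 of the mean response when the dummy X2 equals x2."""
    return b0 + b2 * x2, b1 + b3 * x2


# Hypothetical coefficient values, chosen only for illustration.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -1.0
print(line_for_group(0, b0, b1, b2, b3))  # line when X2 = 0
print(line_for_group(1, b0, b1, b2, b3))  # line when X2 = 1
```

With these values, β2 shifts the intercept and β3 shifts the slope for the X2 = 1 group.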

Model in Matrix Form

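$$\underset{n \times 1}{Y} = \underset{n \times p}{X}\,\underset{p \times 1}{\beta} + \underset{n \times 1}{\varepsilon}$$

$$\varepsilon \sim N(0, \sigma^2 I_{n \times n})$$

$$Y \sim N(X\beta, \sigma^2 I_{n \times n})$$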

Least Squares

Find b to minimize

$$SSE = (Y - Xb)'(Y - Xb)$$

Obtain the normal equations

$$X'Xb = X'Y$$
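The step from the SSE criterion to the normal equations is not spelled out on the slide. A sketch of the usual matrix-calculus argument, assuming the standard derivative rules for linear and quadratic forms:

```latex
\begin{align*}
SSE(b) &= (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb \\
\frac{\partial\, SSE}{\partial b} &= -2X'Y + 2X'Xb = 0 \\
\Rightarrow\quad X'Xb &= X'Y
\end{align*}
```

When X'X is invertible, these p equations in the p unknowns have a unique solution.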

Least Squares Solution

$$b = (X'X)^{-1}X'Y$$

Fitted (predicted) values

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$$

where H = X(X'X)^{-1}X' is the hat matrix
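As an illustration of the least squares solution, the sketch below solves the normal equations for a small made-up data set. It uses only the Python standard library with exact fractions; the data values and the function names `solve` and `least_squares` are assumptions for the example, not part of the course material.

```python
from fractions import Fraction


def solve(a, rhs):
    """Solve the square system a z = rhs by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Swap in a row with a nonzero pivot (raises StopIteration if a is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def least_squares(X, Y):
    """Return b solving the normal equations X'X b = X'Y."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    XtY = [sum(row[j] * y for row, y in zip(X, Y)) for j in range(p)]
    return solve(XtX, XtY)


# Hypothetical data: n = 5 cases, p = 3 columns (intercept, X1, X2).
X = [[Fraction(v) for v in row] for row in
     [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]]
Y = [Fraction(v) for v in [3, 5, 4, 8, 7]]

b = least_squares(X, Y)
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
residuals = [y - f for y, f in zip(Y, fitted)]
print("b =", [str(v) for v in b])
```

Solving X'Xb = X'Y directly avoids forming the inverse explicitly. In practice a statistical package would be used instead; the point here is only that b satisfies the normal equations, so the residuals are orthogonal to every column of X.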

Residuals

$$e = Y - \hat{Y} = Y - HY = (I - H)Y$$

I - H is symmetric and idempotent

$$(I - H)(I - H) = (I - H)$$
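A short justification of both properties, assuming H = X(X'X)^{-1}X' and that X'X is invertible (so that (X'X)^{-1} is symmetric):

```latex
\begin{align*}
H' &= \left(X(X'X)^{-1}X'\right)' = X(X'X)^{-1}X' = H \\
HH &= X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H \\
(I - H)' &= I - H' = I - H \\
(I - H)(I - H) &= I - 2H + HH = I - H
\end{align*}
```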

Covariance Matrix of residuals

• Cov(e)=σ2(I-H)(I-H)΄= σ2(I-H)

• Var(ei)= σ2(1-hii)

• hii= X΄i(X΄X)-1Xi

• X΄i =(1,Xi1,…,Xi,p-1)

• Residuals are usually correlated

• Cov(ei,ej)= -σ2hij

Estimation of σ2

$$s^2 = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{SSE}{n - p} = \frac{SSE}{df_E} = MSE$$

$$s = \sqrt{s^2} = \text{Root MSE}$$
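A minimal sketch of this computation, assuming the residuals from a fitted model are already available; the function name `estimate_sigma2` and the residual values are hypothetical.

```python
import math


def estimate_sigma2(residuals, p):
    """Return (SSE, MSE, Root MSE) for residuals from a fit with p parameters."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - p)  # s^2, using df_E = n - p
    return sse, mse, math.sqrt(mse)


# Hypothetical residuals from a fit with n = 5 cases and p = 2 parameters.
sse, mse, root_mse = estimate_sigma2([-0.8, 0.6, 1.0, -0.6, -0.2], p=2)
print(sse, mse, root_mse)
```

The divisor is n - p rather than n because p parameters were estimated from the data.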

Distribution of b

• b = (X΄X)-1X΄Y

• Since Y~N(Xβ, σ2I) and b is a linear function of Y, b is also Normally distributed

• E(b)=((X΄X)-1X΄)Xβ=β

• Cov(b)=σ2 ((X΄X)-1X΄)((X΄X)-1X΄)΄ = σ2(X΄X)-1

• σ2 (X΄X)-1 is estimated by s2 (X΄X)-1
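The covariance result follows from the rule Cov(AY) = A Cov(Y) A' for a constant matrix A. A sketch of the steps, where the shorthand A is introduced here only for the derivation:

```latex
\begin{align*}
A &= (X'X)^{-1}X', \qquad b = AY \\
\operatorname{Cov}(b) &= A\,\operatorname{Cov}(Y)\,A' = A(\sigma^2 I)A' = \sigma^2 AA' \\
AA' &= (X'X)^{-1}X'X(X'X)^{-1} = (X'X)^{-1}
\end{align*}
```

The last line uses the symmetry of (X'X)^{-1}, so that A' = X(X'X)^{-1}.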

ANOVA Table

• Sources of variation are

–Model (SAS) or Regression (KNNL)

–Error (SAS, KNNL) or Residual

–Total

• SS and df add as before

–SSM + SSE =SSTO

–dfM + dfE = dfTotal

Sums of Squares

$$SSM = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Degrees of Freedom

$$df_M = p - 1$$

$$df_E = n - p$$

$$df_{Total} = n - 1$$

Mean Squares

$$MSM = SSM / df_M$$

$$MSE = SSE / df_E$$

$$MST = SSTO / df_{Total}$$

Mean Squares

$$MSM = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 / (p - 1)$$

$$MSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 / (n - p)$$

$$MST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / (n - 1)$$

ANOVA Table

Source   SS     df        MS    F
Model    SSM    dfM       MSM   MSM/MSE
Error    SSE    dfE       MSE
Total    SSTO   dfTotal   MST
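The entries of the table can be computed directly from the observed and fitted values. The sketch below assumes the fitted values come from a least squares fit that includes an intercept; the function name `anova_table` and the data are hypothetical.

```python
def anova_table(y, y_hat, p):
    """ANOVA quantities for a least squares fit with p parameters (intercept included)."""
    n = len(y)
    y_bar = sum(y) / n
    ssm = sum((f - y_bar) ** 2 for f in y_hat)           # model sum of squares
    sse = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))  # error sum of squares
    ssto = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares
    df_m, df_e, df_total = p - 1, n - p, n - 1
    msm, mse = ssm / df_m, sse / df_e
    return {
        "SSM": ssm, "SSE": sse, "SSTO": ssto,
        "dfM": df_m, "dfE": df_e, "dfTotal": df_total,
        "MSM": msm, "MSE": mse, "MST": ssto / df_total,
        "F": msm / mse, "R2": ssm / ssto,
    }


# Hypothetical example: y_hat are least squares fitted values from a model
# with an intercept and one predictor (p = 2), so SSM + SSE = SSTO holds.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
table = anova_table(y, y_hat, p=2)
for name, value in table.items():
    print(f"{name:8s}{value:.4g}")
```

The decomposition SSM + SSE = SSTO is a property of the least squares fit with an intercept; it would not hold for arbitrary predicted values.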

ANOVA F test

• H0: β1 = β2 = … = βp-1 = 0

• Ha: βk ≠ 0, for at least one k = 1, …, p-1

• Under H0, F ~ F(p-1,n-p)

• Reject H0 if F is large; use the P-value

P-value of F test

• The P-value for the F significance test tells us one of the following:

– there is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model (P > .05)

– one or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model (P ≤ .05)

R2

• The squared multiple regression correlation (R2) gives the proportion of variation in the response variable explained by all the explanatory variables

• It is usually expressed as a percent

• It is sometimes called the coefficient of multiple determination (KNNL p 226)

R2

• R2 = SSM/SSTO

– the proportion of variation explained

• R2 = 1 – (SSE/SSTO)

– 1 minus the proportion not explained

• Can express the F test in terms of R2

F = [ (R2)/(p-1) ] / [ (1- R2)/(n-p) ]
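The R2 form of the F statistic follows from dividing the numerator and denominator of MSM/MSE by SSTO. A sketch of the algebra, using the sums of squares and degrees of freedom defined earlier:

```latex
\begin{align*}
F = \frac{MSM}{MSE}
  = \frac{SSM/(p-1)}{SSE/(n-p)}
  = \frac{(SSM/SSTO)/(p-1)}{(SSE/SSTO)/(n-p)}
  = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}
\end{align*}
```

So for fixed n and p, a larger R2 always corresponds to a larger F.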

Background Reading

• We went over KNNL 6.1 - 6.5