Linear regression models


Page 1: Linear regression models

Linear regression models

Page 2: Linear regression models

Simple Linear Regression

Page 3: Linear regression models

History

• Developed by Sir Francis Galton (1822-1911) in his article “Regression towards mediocrity in hereditary stature”

Page 4: Linear regression models

Purposes:

• To describe the linear relationship between two continuous variables, the response variable (y-axis) and a single predictor variable (x-axis)

• To determine how much of the variation in Y can be explained by the linear relationship with X and how much of this relationship remains unexplained

• To predict new values of Y from new values of X

Page 5: Linear regression models

The linear regression model is:

Yi = β0 + β1 Xi + εi

• Xi and Yi are paired observations (i = 1 to n)

• β0 = population intercept (the value of Yi when Xi = 0)

• β1 = population slope (measures the change in Yi per unit change in Xi)

• εi = the random or unexplained error associated with the i-th observation. The εi are assumed to be independent and distributed as N(0, σ²).

Page 6: Linear regression models

Linear relationship

[Figure: Y plotted against X with a straight line; the intercept β0 is where the line meets the Y axis, and the slope β1 is the change in Y per 1.0 unit increase in X.]

Page 7: Linear regression models

Linear models approximate non-linear functions over a limited domain.

[Figure: a curved function with a fitted straight line; predictions within the range of the observed X values are interpolation, predictions beyond that range are extrapolation.]

Page 8: Linear regression models

• For a given value of X, the sampled Y values are independent with normally distributed errors:

Yi = β0 + β1 Xi + εi

εi ~ N(0, σ²), so E(εi) = 0 and E(Yi) = β0 + β1 Xi

[Figure: at X1 and X2, normal distributions of Y values centered on the regression line at E(Y1) and E(Y2).]
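To make the model concrete, here is a minimal sketch (Python with NumPy) that simulates paired observations from Yi = β0 + β1 Xi + εi with εi ~ N(0, σ²). The parameter values and variable names are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative population parameters (assumed, not from the slides)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 30

x = rng.uniform(0, 10, size=n)        # predictor values, measured without error
eps = rng.normal(0.0, sigma, size=n)  # independent N(0, sigma^2) errors
y = beta0 + beta1 * x + eps           # Yi = beta0 + beta1 * Xi + eps_i

# For any fixed X, the expected response lies on the line beta0 + beta1 * X
print("E(Y | X = 5) =", beta0 + beta1 * 5)
```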

Page 9: Linear regression models

Fitting data to a linear model:

Ŷi = β̂0 + β̂1 Xi

[Figure: an observed value Yi at Xi and the corresponding fitted value Ŷi on the regression line; the vertical difference Yi − Ŷi = εi is the residual.]

Page 10: Linear regression models

The residual: di = Yi − Ŷi, so di² = (Yi − Ŷi)²

The residual sum of squares: RSS = Σi (Yi − Ŷi)²   (sum over i = 1 to n)

Page 11: Linear regression models

Estimating Regression Parameters

• The “best fit” estimates for the regression population parameters (β0 and β1) are the values that minimize the residual sum of squares (SSresidual) between each observed value and the predicted value of the model:

Choose β̂0 and β̂1 to minimize Σi (Yi − Ŷi)² = Σi (Yi − (β̂0 + β̂1 Xi))²

Page 12: Linear regression models

Sum of squares: SSY = Σi (Yi − Ȳ)(Yi − Ȳ) = Σi (Yi − Ȳ)²

Sum of cross products: SSXY = Σi (Xi − X̄)(Yi − Ȳ)

Page 13: Linear regression models

Least-squares parameter estimates

β̂1 = SSXY / SSX = sXY / sX²

where SSX = Σi (Xi − X̄)²

Page 14: Linear regression models

Sample variance of X: sX² = [1/(n − 1)] Σi (Xi − X̄)(Xi − X̄)

Sample covariance of X and Y: sXY = [1/(n − 1)] Σi (Xi − X̄)(Yi − Ȳ)

β̂1 = sXY / sX² = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)(Xi − X̄) = SSXY / SSX

Page 15: Linear regression models

Solving for the intercept:

β̂0 = Ȳ − β̂1 X̄

Thus, our estimated regression equation is:

Ŷi = β̂0 + β̂1 Xi
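As a check on these formulas, here is a short sketch that computes β̂1 = SSXY / SSX and β̂0 = Ȳ − β̂1 X̄ directly from paired data (Python/NumPy; the arrays x and y are assumed, e.g. the simulated data from the earlier sketch).

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) from the least-squares formulas on the slides."""
    x_bar, y_bar = x.mean(), y.mean()
    ss_x = np.sum((x - x_bar) ** 2)            # SSX
    ss_xy = np.sum((x - x_bar) * (y - y_bar))  # SSXY (sum of cross products)
    b1_hat = ss_xy / ss_x                      # slope estimate
    b0_hat = y_bar - b1_hat * x_bar            # intercept estimate
    return b0_hat, b1_hat

# Example usage (hypothetical data from the earlier simulation sketch):
# b0_hat, b1_hat = least_squares_fit(x, y)
# y_hat = b0_hat + b1_hat * x                  # fitted values
```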

Page 16: Linear regression models

Hypothesis Tests with Regression

• Null hypothesis is that there is no linear relationship between X and Y:

H0: β1 = 0 Yi = β0 + εi

HA: β1 ≠ 0 Yi = β0 + β1 Xi + εi

• We can use an F-ratio (i.e., the ratio of variances) to test these hypotheses

Page 17: Linear regression models

Variance of the error of regression:

σ̂² = SSresidual / (n − 2) = Σi (Yi − Ŷi)² / (n − 2)

NOTE: this is also referred to as residual variance, mean squared error (MSE) or residual mean square (MSresidual)

Page 18: Linear regression models

Mean square of regression:

MSregression = SSregression / 1 = Σi (Ŷi − Ȳ)² / 1

The F-ratio is: (MSRegression)/(MSResidual)

This ratio follows the F-distribution with (1, n-2) degrees of freedom
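A hedged sketch of this F-test (Python, assuming SciPy is available): it forms SSreg and RSS from the observed and fitted values, takes the two mean squares, and reads the p-value off the F(1, n − 2) distribution. The array names follow the earlier sketches and are illustrative.

```python
import numpy as np
from scipy import stats

def regression_f_test(y, y_hat):
    """F-ratio = MS_regression / MS_residual, with (1, n-2) degrees of freedom."""
    n = len(y)
    ss_reg = np.sum((y_hat - y.mean()) ** 2)   # SS_regression
    rss = np.sum((y - y_hat) ** 2)             # residual sum of squares
    ms_reg = ss_reg / 1                        # 1 df for a single predictor
    ms_res = rss / (n - 2)                     # residual mean square (MSE)
    f_ratio = ms_reg / ms_res
    p_value = stats.f.sf(f_ratio, 1, n - 2)    # upper-tail probability
    return f_ratio, p_value
```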

Page 19: Linear regression models

Variance components and Coefficient of determination

SSY = SSreg + RSS

SSreg = SSY − RSS

Page 20: Linear regression models

Coefficient of determination

r² = SSreg / SSY = SSreg / (SSreg + RSS)

Page 21: Linear regression models

ANOVA table for regression

Source     | Degrees of freedom | Sum of squares        | Mean square   | Expected mean square   | F ratio
Regression | 1                  | SSreg = Σi (Ŷi − Ȳ)² | SSreg / 1     | σ² + β1² Σi (Xi − X̄)² | (SSreg / 1) / (RSS / (n − 2))
Residual   | n − 2              | RSS = Σi (Yi − Ŷi)²  | RSS / (n − 2) | σ²                     |
Total      | n − 1              | SSY = Σi (Yi − Ȳ)²   | SSY / (n − 1) |                        |

Page 22: Linear regression models

Product-moment correlation coefficient

r = SSXY / √(SSX · SSY) = sXY / (sX · sY)

Page 23: Linear regression models

Parametric Confidence Intervals

• If we assume our parameter of interest has a particular sampling distribution and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.

• Example: if we assume Y is a normal random variable with unknown mean μ and variance σ², then (Ȳ − μ) / σȲ is distributed as a standard normal variable. But, since we don’t know σ, we must divide by the standard error instead: (Ȳ − μ) / sȲ, giving us a t-distribution with (n − 1) degrees of freedom.

• The 100(1 − α)% confidence interval for μ is then given by:

Ȳ − t(α/2; n − 1) sȲ ≤ μ ≤ Ȳ + t(α/2; n − 1) sȲ

• IMPORTANT: this does not mean “There is a 100(1 − α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1 − α)% of the confidence intervals would contain the true population mean μ.
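The interval above can be computed directly; a minimal sketch (Python, assuming SciPy) that builds Ȳ ± t(α/2; n − 1) · sȲ for a sample of Y values:

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(y, alpha=0.05):
    """t-based confidence interval for the population mean, n-1 degrees of freedom."""
    n = len(y)
    y_bar = np.mean(y)
    se_mean = np.std(y, ddof=1) / np.sqrt(n)       # standard error of the mean
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical value
    return y_bar - t_crit * se_mean, y_bar + t_crit * se_mean
```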

Page 24: Linear regression models

Publication form of ANOVA table for regression

Source     | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 11.479         | 1  | 11.479      | 21.044 | 0.00035
Residual   | 8.182          | 15 | 0.545       |        |
Total      | 19.661         | 16 |             |        |

Page 25: Linear regression models

Variance of estimated intercept

σ̂²(β̂0) = σ̂² [1/n + X̄² / SSX]

β̂0 − t(α/2, n − 2) σ̂(β̂0) ≤ β0 ≤ β̂0 + t(α/2, n − 2) σ̂(β̂0)

Page 26: Linear regression models

Variance of the slope estimator

σ̂²(β̂1) = σ̂² / SSX

β̂1 − t(α/2, n − 2) σ̂(β̂1) ≤ β1 ≤ β̂1 + t(α/2, n − 2) σ̂(β̂1)
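A sketch of the slope’s standard error and confidence interval, using σ̂²(β̂1) = σ̂² / SSX (Python/NumPy/SciPy; the inputs follow the earlier sketches and are illustrative).

```python
import numpy as np
from scipy import stats

def slope_confidence_interval(x, y, b0_hat, b1_hat, alpha=0.05):
    """CI for beta1 using var(b1_hat) = sigma2_hat / SSX and a t with n-2 df."""
    n = len(y)
    y_hat = b0_hat + b1_hat * x
    sigma2_hat = np.sum((y - y_hat) ** 2) / (n - 2)  # residual variance (MSE)
    ss_x = np.sum((x - x.mean()) ** 2)               # SSX
    se_b1 = np.sqrt(sigma2_hat / ss_x)               # standard error of the slope
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1
```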

Page 27: Linear regression models

Variance of the fitted value

σ̂²(Ŷ | Xi) = σ̂² [1/n + (Xi − X̄)² / SSX]

Ŷ − t(α/2, n − 2) σ̂(Ŷ | X) ≤ E(Y | X) ≤ Ŷ + t(α/2, n − 2) σ̂(Ŷ | X)

Page 28: Linear regression models

Variance of the predicted value (Ỹ):

σ̂²(Ỹ | X̃) = σ̂² [1 + 1/n + (X̃ − X̄)² / SSX]

Ỹ − t(α/2, n − 2) σ̂(Ỹ | X̃) ≤ Y ≤ Ỹ + t(α/2, n − 2) σ̂(Ỹ | X̃)
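The extra “1 +” term is what widens the prediction interval for a single new observation relative to the interval for the fitted mean. A sketch under the same assumptions as the earlier code (x_new is a hypothetical new predictor value):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, b0_hat, b1_hat, x_new, alpha=0.05):
    """Prediction interval for a new Y at x_new (note the extra 1 + ... term)."""
    n = len(y)
    y_hat = b0_hat + b1_hat * x
    sigma2_hat = np.sum((y - y_hat) ** 2) / (n - 2)   # residual variance
    ss_x = np.sum((x - x.mean()) ** 2)                # SSX
    y_tilde = b0_hat + b1_hat * x_new                 # predicted value
    se_pred = np.sqrt(sigma2_hat * (1 + 1 / n + (x_new - x.mean()) ** 2 / ss_x))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return y_tilde - t_crit * se_pred, y_tilde + t_crit * se_pred
```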

Page 29: Linear regression models

Regression

[Figure: Ln(number of species), ranging from 1 to 8, plotted against Ln(Island Area), ranging from −2 to 10.]

Page 30: Linear regression models

Assumptions of regression

• The linear model correctly describes the functional relationship between X and Y

• The X variable is measured without error

• For a given value of X, the sampled Y values are independent with normally distributed errors

• Variances are constant along the regression line

Page 31: Linear regression models

Residual plot for species-area relationship

[Figure: Unstandardized Residual (−1.5 to 1.5) plotted against Unstandardized Predicted Value (2.5 to 6.0).]
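A residual plot like this one can be produced from any fit; a minimal sketch assuming matplotlib is available and reusing the observed and fitted values from the earlier sketches.

```python
import matplotlib.pyplot as plt

def residual_plot(y, y_hat):
    """Plot residuals against predicted values to check the regression assumptions."""
    residuals = y - y_hat
    plt.scatter(y_hat, residuals)
    plt.axhline(0, linestyle="--")   # residuals should scatter evenly around zero
    plt.xlabel("Predicted value")
    plt.ylabel("Residual")
    plt.show()
```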