
Time Series Analysis – Chapter 2
Simple Regression

Essentially, all models are wrong, but some are useful. - George Box

Empirical Model-Building and Response Surfaces (1987), co-authored with Norman R. Draper, p. 424, ISBN 0471810339. George Box is the son-in-law of Sir Ronald Fisher.

Time Series Analysis – Chapter 2
Simple Regression

Equation of a Line – Algebra
vs.
Simple Regression – Statistics

Equation of a Line Example

y = mx + b

wage = 3.55 educ – 33.8

y = wage in dollars per hour
x = education in years completed

Note: if I know how many years of education someone has completed, I can predict their wage perfectly. Nothing else matters.

Simple Regression Example

wage = β₀ + β₁ educ + u

y = wage per hour ($) – dependent variable
x = education completed (years) – independent variable
β₀ = unknown intercept
β₁ = unknown slope
u = error term – factors other than x that affect y

Simple Regression Example

Need to estimate β₀ and β₁:
• Collect data
• Conduct a "regression analysis" (see the sketch below)
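To make "conduct a regression analysis" concrete, here is a minimal Python sketch using statsmodels. The wage and education numbers are made up for illustration only; they are not from the text.

# Estimating beta0 and beta1 in wage = beta0 + beta1*educ + u
# (hypothetical data, for illustration only)
import numpy as np
import statsmodels.api as sm

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18])               # years of education
wage = np.array([5.2, 6.0, 8.5, 7.9, 11.0, 14.5, 13.2, 17.8])  # dollars per hour

X = sm.add_constant(educ)        # adds the intercept column (beta0)
results = sm.OLS(wage, X).fit()  # ordinary least squares fit
print(results.params)            # estimated intercept and slope
print(results.summary())         # full output, including ANOVA-type quantities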

Algebra vs. Statistics - Summary

Algebra: wage = 3.55 educ – 33.8 (Deterministic Model)

Statistics: wage = β₀ + β₁ educ + u (Stochastic Model)

Algebra vs. Statistics - Summary

All factors affecting y (wage) other than x (education) are considered unobservable. The error term u represents the effect of these other factors.

Upshot: u is independent of x.

E(y|x) = β₀ + β₁x if E(u|x) = 0

• The equation tells us how the "average" value of y changes with, or is related to, a particular x value.

• GPA-hat = 0.568 + 0.102 ACT

Student  GPA  ACT
1        2.8  21
2        3.4  24
3        3.0  26
4        3.5  27
5        3.6  29
6        3.0  25
7        2.7  25
8        3.7  30

GPA-hat = 0.568 + 0.102 ACT
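The fitted equation can be checked directly from the eight (GPA, ACT) pairs above using the usual least-squares formulas; a small numpy sketch (variable names are mine):

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

# OLS slope and intercept: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
Sxy = np.sum((ACT - ACT.mean()) * (GPA - GPA.mean()))
Sxx = np.sum((ACT - ACT.mean()) ** 2)
b1 = Sxy / Sxx
b0 = GPA.mean() - b1 * ACT.mean()
print(b0, b1)   # approximately 0.568 and 0.102, matching the slide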

The Analysis of Variance Table

Analysis of Variance

Source          DF  SS       MS       F     P
Regression       1  0.59402  0.59402  8.20  0.029
Residual Error   6  0.43473  0.07245
Total            7  1.02875

ANOVA

Models can be evaluated by examining variability.

There are three types of variability that are quantified:
• Overall or total variability present in the data (SST)
• Variability explained by the regression model (SSR)
• Error variability that is unexplained (SSE)

SST = SSR + SSE

ANOVA

The larger the regression variability (SSR) is compared to the error variability (SSE), the more evidence there is that the model is explanatory.

Analysis of Variance

Source          DF  SS       MS       F     P
Regression       1  0.59402  0.59402  8.20  0.029
Residual Error   6  0.43473  0.07245
Total            7  1.02875
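Every entry in this table can be reproduced from the data; a self-contained numpy sketch (np.polyfit with degree 1 is just a shortcut for the OLS fit computed earlier):

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])
b1, b0 = np.polyfit(ACT, GPA, 1)           # slope ~0.102, intercept ~0.568

GPA_hat = b0 + b1 * ACT                    # fitted values
SST = np.sum((GPA - GPA.mean()) ** 2)      # total variability,       ~1.02875
SSR = np.sum((GPA_hat - GPA.mean()) ** 2)  # explained variability,   ~0.59402
SSE = np.sum((GPA - GPA_hat) ** 2)         # unexplained variability, ~0.43473

F = (SSR / 1) / (SSE / (len(GPA) - 2))     # MS Regression / MS Error, ~8.20
print(SST, SSR, SSE, F)
print(np.isclose(SST, SSR + SSE))          # True: SST = SSR + SSE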

ANOVA – R²

R² is the Coefficient of Determination

R² = SSR/SST = 1 – SSE/SST (TYPO on pg. 40!!)

R² is the percent of the variation in y (response variable) explained by x (explanatory variable).

R-Sq = SSR/SST = 0.59402/1.02875 = 57.7%

ANOVA – r

r is the correlation coefficient and r = ±√R²
• Positive if a positive relationship is present
• Negative if a negative relationship is present

Here the slope is positive, so r = +√0.577 = 0.7596.
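In simple regression R² is just the square of the sample correlation, so both values can be checked in a couple of lines of numpy:

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

r = np.corrcoef(ACT, GPA)[0, 1]   # ~ +0.76; positive because the slope is positive
R2 = r ** 2                       # ~ 0.577, i.e. R-Sq = 57.7%
print(r, R2)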

ANOVA – R² vs. r

R² always exists for simple regression and multiple regression and always has the same definition.

r only exists and makes sense for simple regression.

Nobel Prize vs. # of McDonalds

• Explanatory variable is number of McDonalds a country has

• Response variable is number of Nobel Prizes that have been awarded to that country.

Logs

Level – Level Model
• Dependent variable: y
• Independent variable: x

• GPA-hat = 0.568 + 0.102 ACT (verify)

• (interpret)

Student  GPA  ACT
1        2.8  21
2        3.4  24
3        3.0  26
4        3.5  27
5        3.6  29
6        3.0  25
7        2.7  25
8        3.7  30

Level – Log Model
• Dependent variable: y
• Independent variable: log(x)

• Not used in this chapter, discussed in future chapters.

Log – Level Model
• Dependent variable: log(y)
• Independent variable: x

• log(GPA)-hat = 0.341 + 0.0317 ACT (verify)

Student  GPA  ACT  log(GPA)
1        2.8  21   1.02962
2        3.4  24   1.22378
3        3.0  26   1.09861
4        3.5  27   1.25276
5        3.6  29   1.28093
6        3.0  25   1.09861
7        2.7  25   0.99325
8        3.7  30   1.30833

Log – Level Model
• Dependent variable: log(y)
• Independent variable: x
• log(GPA)-hat = 0.341 + 0.0317 ACT

• (see Appendix A)

• So, for every 1-point increase in ACT score, GPA should increase by about 3.17%.
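A quick numpy check of the log–level fit and its percentage interpretation, using the same eight observations (np.polyfit returns the slope first):

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

b1, b0 = np.polyfit(ACT, np.log(GPA), 1)   # regress log(GPA) on ACT
print(b0, b1)      # approximately 0.341 and 0.0317
print(100 * b1)    # ~ 3.17: approximate % increase in GPA per 1-point ACT increase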

Level – Level Model
• GPA-hat = 0.568 + 0.102 ACT

Log – Level Model
• log(GPA)-hat = 0.341 + 0.0317 ACT
• GPA-hat = exp(0.341 + 0.0317 ACT)

Is this still linear regression?
• log(GPA)-hat = β₀ + β₁ ACT, and this equation is linear in the parameters β₀ and β₁!

Log – Log Model
• Dependent variable: log(y)
• Independent variable: log(x)

• log(GPA)-hat = –1.41 + 0.791 log(ACT) (verify)

Student  GPA  ACT  log(GPA)  log(ACT)
1        2.8  21   1.02962   3.04452
2        3.4  24   1.22378   3.17805
3        3.0  26   1.09861   3.25810
4        3.5  27   1.25276   3.29584
5        3.6  29   1.28093   3.36730
6        3.0  25   1.09861   3.21888
7        2.7  25   0.99325   3.21888
8        3.7  30   1.30833   3.40120

Log – Log or Constant Elasticity Model
• Dependent variable: log(y)
• Independent variable: log(x)
• log(GPA)-hat = –1.41 + 0.791 log(ACT)

• (see Appendix A)

Log – Log or Constant Elasticity Model
• log(GPA)-hat = –1.41 + 0.791 log(ACT)

• (see Appendix A)

• The slope, 0.791, is the estimated elasticity of GPA with respect to ACT.

• A 1% increase in ACT implies a 0.791% increase in GPA.
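The constant elasticity fit can be checked the same way, regressing log(GPA) on log(ACT) with numpy:

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

b1, b0 = np.polyfit(np.log(ACT), np.log(GPA), 1)
print(b0, b1)   # approximately -1.41 and 0.791, the estimated elasticity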

Simple Linear Regression Assumptions

• SLR.1: The model to be estimated must be linear in the parameters β₀ and β₁.

Simple Linear Regression Assumptions

• SLR.2: The sample of size n used to estimate the model parameters is a random sample (sometimes called a simple random sample).

What is the definition of a random sample?

Simple Linear Regression Assumptions

• SLR.3: The sample x values are not all the same value.

OK:            NOT OK:
x     y        x    y
3.4   24       3    24
3.0   26       3    26
3.5   27       3    27
3.6   29       3    29
3.0   25       3    25
2.7   25       3    25
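The reason SLR.3 is needed shows up in the slope formula: it divides by Sxx, which is zero whenever every x value is the same, as in the NOT OK sample. A minimal sketch:

import numpy as np

x = np.array([3, 3, 3, 3, 3, 3])        # the NOT OK sample: x never varies
y = np.array([24, 26, 27, 29, 25, 25])

Sxx = np.sum((x - x.mean()) ** 2)
print(Sxx)   # 0.0, so the OLS slope Sxy / Sxx is undefined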

Simple Linear Regression Assumptions

• SLR.4: The error variable u has an expected value of zero given any value of the explanatory variable x. That is, E(u|x) = 0.

Simple Linear Regression Assumptions

• SLR.5: The error term u has the same variance (variability) associated with it given any value of the explanatory variable. In other words,

Var(u|x) = σ²

This is called homoskedasticity.

Ordinary Least Squares Estimators

• How do we estimate the parameters β₀ and β₁ in the model y = β₀ + β₁x + u?

• Ordinary Least Squares gives unique estimates of β₀ and β₁.

• Recall that the mean of u is zero, so we don't need to estimate it.

Ordinary Least Squares

• Minimize the sum of the squared residuals.

Ordinary Least Squares

Definition of residual: ûᵢ = yᵢ – ŷᵢ

Some residuals are positive, some negative, and they sum to zero.

Student  GPA  ACT  RESI1
1        2.8  21    0.085714
2        3.4  24    0.379121
3        3.0  26   -0.225275
4        3.5  27    0.172527
5        3.6  29    0.068132
6        3.0  25   -0.123077
7        2.7  25   -0.423077
8        3.7  30    0.065934

Ordinary Least Squares

Minimize the sum of squared residuals

Σ ûᵢ² = Σ (yᵢ – b₀ – b₁xᵢ)²

over all possible choices of intercept b₀ and slope b₁.

(see notes in class)
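A short numpy sketch tying the pieces together: the OLS residuals reproduce the RESI1 column above, they sum to (essentially) zero, and their sum of squares is the minimized value 0.43473; any other intercept/slope pair does worse.

import numpy as np

ACT = np.array([21, 24, 26, 27, 29, 25, 25, 30])
GPA = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])
b1, b0 = np.polyfit(ACT, GPA, 1)            # ~0.102 and ~0.568

resid = GPA - (b0 + b1 * ACT)               # matches the RESI1 column
print(resid.round(6))
print(resid.sum())                          # essentially 0
print(np.sum(resid ** 2))                   # ~ 0.43473, the SSE from the ANOVA table

# Any other choice, e.g. intercept 0.5 and slope 0.11, gives a larger sum of squares:
print(np.sum((GPA - (0.5 + 0.11 * ACT)) ** 2))   # larger than 0.43473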
