llio9921


Upload: mauricio-ortiz-osorio

Post on 08-Dec-2015


Lesson 2: MULTIPLE LINEAR REGRESSION MODEL (I)

2.1. FORMULATION AND BASIC ASSUMPTIONS OF THE MODEL
2.2. ORDINARY LEAST SQUARES ESTIMATION (OLS). STATISTICAL PROPERTIES
2.3. RESIDUAL ANALYSIS
2.4. ESTIMATION OF THE VARIANCE OF THE DISTURBANCE TERM
2.5. MAXIMUM LIKELIHOOD ESTIMATION (MLE). STATISTICAL PROPERTIES
2.6. MODEL VALIDATION AND MEASURES OF GOODNESS OF FIT
2.7. THE MULTIPLE LINEAR REGRESSION MODEL IN DEVIATIONS
2.8. CHANGE OF ORIGIN AND SCALE OF THE VARIABLES. UNITS OF MEASUREMENT
SELF-EVALUATION EXERCISES


OBJECTIVES:

In this lesson we study the specification of the multiple linear regression model (MLRM), which establishes a linear relation between the endogenous variable and the exogenous variables. We then state the basic assumptions underlying the model and estimate its parameters. Finally, we present the steps to follow in order to validate the estimated model: the coefficient of determination and its correction, and the measures that allow us to evaluate the goodness of fit. We also analyze the economic and statistical significance of the estimated parameters.

KEYWORDS: Endogenous and exogenous variables, parameters, error term, basic assumptions, OLS estimation, ML estimation, statistical properties of the estimators, residuals, estimated variance of the error term, goodness of fit, R² coefficient, economic significance, individual or joint statistical significance.

REFERENCES:

• Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 5th Edition. Michigan State University. 2012. ISBN-10: 1111531048, ISBN-13: 9781111531041. Chapters 2, 3 (no multicollinearity) and 4 up to approximately 4.4.
• Greene, William H. Econometric Analysis, 7th Edition. Stern School of Business, New York University. Prentice Hall. 2012. Chapters 2, 3 (3.1, 3.2, 3.5, 3.6), 4 (4.1-4.8), 5 (5.1-5.2), 17 (17.1-17.4).

2.1. FORMULATION AND BASIC ASSUMPTIONS OF THE REGRESSION MODEL

A) FORMULATION OF THE REGRESSION MODEL
B) DEFINITION OF THE BASIC ASSUMPTIONS

A) Formulation of the regression model

A multiple linear regression model (MLRM) is a model in which an endogenous variable (y) is explained by k exogenous variables (x):

$$y = f(x_1, x_2, x_3, \ldots, x_k)$$

The causal relationship is linear and unidirectional.

Multiple Linear Regression Model (MLRM):

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k$$

With $x_1 = 1$:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k$$

$\beta_1$ is the independent term or constant. The $\beta_j$ are the parameters or coefficients:

$$\beta_j = \frac{\partial y}{\partial x_j}, \quad j = 2, 3, \ldots, k$$

$\beta_j$ measures the amount by which y changes when $x_j$ increases by one unit.

However, the model as specified is missing an additional element.
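The reading of βj as the change in y when xj increases by one unit can be checked numerically on the deterministic specification. The following is a minimal sketch using NumPy; the coefficient values, the observation and the helper name `mlrm` are hypothetical and chosen only for illustration.

```python
import numpy as np

def mlrm(x, beta):
    """Deterministic MLRM: y = beta_1*x_1 + ... + beta_k*x_k, with x_1 = 1."""
    return float(np.dot(beta, x))

# Hypothetical coefficients (beta_1 is the constant) and one observation.
beta = np.array([2.0, 0.5, -1.5])
x = np.array([1.0, 10.0, 4.0])   # x_1 = 1 carries the constant

y0 = mlrm(x, beta)

# Increase x_2 by one unit, holding x_3 fixed.
x_plus = x.copy()
x_plus[1] += 1.0
y1 = mlrm(x_plus, beta)

# Because the model is linear, the change in y equals beta_2.
change_in_y = y1 - y0
```

Since the relationship is linear, the change does not depend on the starting values of the regressors, only on the coefficient of the variable that moves.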


Deterministic relationship → (introducing the error term) → Stochastic relationship

For instance, the Keynesian consumption function:

$$C_i = \beta_1 + \beta_2 I_i$$

This implies a deterministic relationship. It supposes that once the value of income is known, the amount of consumption can be known exactly. Moreover, it supposes that all families with the same income have the same consumption level.

However, reality is different: factors other than income also influence consumption decisions. Hence, we have to introduce a term that includes:

• The effect of other variables that are also important in explaining consumption behaviour but that cannot be included among the explanatory variables (and whose joint effect on consumption is null on average).

• The random behaviour of consumption (and, in general, of economic relationships).

• Measurement errors in the variables included in the model, or possible specification errors.

[Figure: scatter plot with axes labelled Y and C, showing the observations scattered around the line $C_i = \beta_1 + \beta_2 I_i$.]

Therefore, every econometric model has to reflect a stochastic, not a deterministic, relationship between the variables:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u$$

Here $\beta_1 + \beta_2 x_2 + \ldots + \beta_k x_k$ is the deterministic part and $u$ is the stochastic (random) part, which is not observable.

y : Endogenous, dependent or explained variable

xj : Exogenous, independent or explanatory variable (j=2,3,...,k)

u : Error term or disturbance

β1 : Coefficient of the "constant" (independent term)

βj : Other coefficients or parameters (j=2,3,...,k)
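The difference between the deterministic and the stochastic version of the consumption function can be illustrated with simulated data. This is only a sketch: the parameter values, the income levels and the normal distribution used for the disturbance are assumptions made for the example, not values taken from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intercept and marginal propensity to consume.
beta1, beta2 = 100.0, 0.8

# 1000 families at each of three income levels.
income = np.repeat([1000.0, 2000.0, 3000.0], 1000)

# Deterministic relationship: same income -> exactly the same consumption.
c_det = beta1 + beta2 * income

# Stochastic relationship: add an unobservable zero-mean disturbance u.
u = rng.normal(loc=0.0, scale=50.0, size=income.size)
c_sto = c_det + u

same_income = income == 2000.0
spread_det = c_det[same_income].std()   # no variation among equal-income families
spread_sto = c_sto[same_income].std()   # equal-income families now consume differently
mean_u = u.mean()                       # omitted effects roughly cancel out on average
```

In the deterministic version the spread of consumption among families with the same income is zero; once the disturbance is added, the spread is positive while the average disturbance stays close to zero.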


Then we have to select a spatial or time dimension, and we need a sample of observations to estimate the model.

Two types of data:

• Cross-sectional: $y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \ldots + \beta_k x_{ki} + u_i$, i=1,2,...,N
• Time series: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \ldots + \beta_k x_{kt} + u_t$, t=1,2,...,T

We continue with an example using cross-sectional data, where an equation can be specified for each individual:

$$y_1 = \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \ldots + \beta_k x_{k1} + u_1$$

$$y_2 = \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \ldots + \beta_k x_{k2} + u_2$$

$$y_3 = \beta_1 + \beta_2 x_{23} + \beta_3 x_{33} + \ldots + \beta_k x_{k3} + u_3$$

$$\vdots$$

$$y_N = \beta_1 + \beta_2 x_{2N} + \beta_3 x_{3N} + \ldots + \beta_k x_{kN} + u_N$$

yi : endogenous variable for observation i.


xji : exogenous variable xj for observation i.

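This system can be written in matrix form:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
=
\begin{pmatrix}
1 & x_{21} & x_{31} & \ldots & x_{k1} \\
1 & x_{22} & x_{32} & \ldots & x_{k2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{2N} & x_{3N} & \ldots & x_{kN}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}
$$

$$Y_{N \times 1} = X_{N \times k}\, \beta_{k \times 1} + U_{N \times 1}$$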
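The equivalence between the equation-by-equation system and the matrix form Y = Xβ + U can be sketched with NumPy. The sample size, the parameter values and the simulated regressors below are hypothetical; the point is only to show the dimensions (N×1, N×k, k×1) and the column of ones that carries the constant.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 5, 3                          # hypothetical sample size and number of parameters

beta = np.array([1.0, 2.0, -0.5])    # (k,) vector; beta[0] is the constant beta_1
x2 = rng.uniform(0, 10, size=N)
x3 = rng.uniform(0, 10, size=N)
U = rng.normal(0, 1, size=N)         # (N,) vector of disturbances

# X is N x k; its first column is x_1 = 1 for every observation.
X = np.column_stack([np.ones(N), x2, x3])

# Matrix form: Y = X beta + U
Y = X @ beta + U

# Equation-by-equation form: y_i = beta_1 + beta_2*x_2i + beta_3*x_3i + u_i
Y_by_row = np.array(
    [beta[0] + beta[1] * x2[i] + beta[2] * x3[i] + U[i] for i in range(N)]
)
```

Both constructions give the same vector Y, which is why the rest of the lesson can work with the compact matrix notation.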

B) Definition of the Basic Assumptions

In order to determine the properties of the estimators and the tests that can be performed, it is necessary to formulate the following hypotheses:

a) Basic assumptions on the model
b) Basic assumptions on the error term
c) Basic assumptions on the exogenous variables
d) Basic assumptions on the parameters

a) Basic assumptions on the model


We have three basic assumptions:

1) The model is stochastic.

2) The model is linear (or can be made linear). Two types of relationship can be established: linear and nonlinear. A nonlinear relationship can give rise to:

• Intrinsically linear models: nonlinear relationships that can be made linear by some transformation of the model. After the transformation they are linear with respect to the parameters, though not with respect to the original variables.

Example: Cobb-Douglas production function

$$Y_i = A L_i^{\beta_1} K_i^{\beta_2}$$

$$\ln Y_i = \ln A + \beta_1 \ln L_i + \beta_2 \ln K_i$$

• Inherently nonlinear models: they cannot be made linear. They are nonlinear with respect to the parameters.

3) The size of the data sample. The minimum requirement is that the number of observations (N) is greater than or equal to the number of parameters to be estimated (k). It is advisable, however, to have a large sample in order to obtain reliable estimates.

$$N \geq k \quad \Longleftrightarrow \quad N - k \geq 0$$

where N − k is the number of degrees of freedom.
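The log transformation of the Cobb-Douglas function in assumption 2 can be sketched as follows. The values of A, β1 and β2 and the simulated inputs are hypothetical, and no error term is included, so the parameters are recovered exactly (up to rounding). The call to `np.linalg.lstsq` is used here only as a generic linear solver for the transformed model.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
A, beta1, beta2 = 3.0, 0.6, 0.3          # hypothetical technology level and exponents

L = rng.uniform(1.0, 100.0, size=N)      # labour
K = rng.uniform(1.0, 100.0, size=N)      # capital
Y = A * L**beta1 * K**beta2              # nonlinear in the variables and parameters

# After taking logs the model is linear in ln A, beta1 and beta2.
lhs = np.log(Y)
rhs = np.log(A) + beta1 * np.log(L) + beta2 * np.log(K)

# The transformed model is a linear system [1, ln L, ln K] b = ln Y,
# so the parameters can be recovered with a linear least-squares solver.
X = np.column_stack([np.ones(N), np.log(L), np.log(K)])
b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
A_hat = np.exp(b[0])                     # undo the log to recover A
```

Note that the sample used here satisfies the size requirement of assumption 3, since N = 50 is larger than the k = 3 parameters of the transformed model.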

b) Basic assumptions on the error term

There are 4 basic assumptions:

1) The expected value of the error term is equal to 0:

$$E(u_i) = 0 \quad \forall i$$

$$
E(U) = E\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}
= \begin{pmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_N) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= 0_{N \times 1}
$$

2) The variance of the error term is constant (homoskedasticity assumption):

$$VAR(u_i) = \sigma_u^2 \quad \forall i$$

When this assumption is not satisfied, there is heteroskedasticity:

$$VAR(u_i) = \sigma_{u_i}^2 \neq VAR(u_j) = \sigma_{u_j}^2$$

We assume that the error term is homoskedastic.

3) There is no autocorrelation in the error terms, i.e. the errors of different observations are uncorrelated:

$$COV(u_i, u_j) = E\big[(u_i - E(u_i))(u_j - E(u_j))\big] = E(u_i u_j) = 0 \quad \forall i, j = 1, 2, \ldots, N, \; i \neq j$$

If the error term is homoskedastic and there is no autocorrelation, we say that the error term is spherical; in this case the variance-covariance matrix of the error term (Ω) is a scalar matrix. What is the dimension of Ω?

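$$
\Omega = VAR(U) = E(UU')
= E\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & \ldots & u_N \end{pmatrix}\right]
= E\begin{pmatrix}
u_1^2 & u_1 u_2 & \ldots & u_1 u_N \\
u_2 u_1 & u_2^2 & \ldots & u_2 u_N \\
\vdots & \vdots & \ddots & \vdots \\
u_N u_1 & u_N u_2 & \ldots & u_N^2
\end{pmatrix}
$$

Since $E(u_i) = 0$, we have $E(u_i^2) = VAR(u_i)$ and $E(u_i u_j) = COV(u_i, u_j)$ for $i \neq j$. By the basic hypotheses, $VAR(u_i) = \sigma_u^2$ and $COV(u_i, u_j) = 0$, so:

$$
\Omega =
\begin{pmatrix}
\sigma_u^2 & 0 & \ldots & 0 \\
0 & \sigma_u^2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & \sigma_u^2
\end{pmatrix}
= \sigma_u^2
\begin{pmatrix}
1 & 0 & \ldots & 0 \\
0 & 1 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & 1
\end{pmatrix}
= \sigma_u^2 I_N \quad (N \times N)
$$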

Therefore, with no heteroskedasticity and no autocorrelation we will have that:

$$\Omega = VAR(U) = \sigma_u^2 I_N$$

Ω is a square matrix, symmetric and positive definite.


A way to summarize the basic assumptions of homoskedasticity and absence of autocorrelation is:

$$
E(u_i u_j) =
\begin{cases}
\sigma_u^2 & i = j \\
0 & i \neq j
\end{cases}
$$

4) The error term follows a normal distribution.

In this way, the basic hypotheses on the error term can be summarized as follows:

$$u_i \sim N(0, \sigma_u^2) \qquad U \sim N(0, \sigma_u^2 I_N)$$
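The assumptions on the error term can be illustrated by simulation. The sketch below builds the theoretical matrix Ω = σu² I_N and compares it with an estimate of E(UU') obtained from many simulated draws of U. The values of N, σu and the number of replications are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4              # observations per sample (small, so the matrix is easy to read)
sigma_u = 2.0      # hypothetical standard deviation of the disturbance
R = 200_000        # number of simulated samples

# Theoretical variance-covariance matrix of a spherical error term.
Omega = sigma_u**2 * np.eye(N)

# Draw R independent vectors U ~ N(0, sigma_u^2 I_N); each row is one sample.
U = rng.normal(loc=0.0, scale=sigma_u, size=(R, N))

mean_U = U.mean(axis=0)      # estimate of E(U): close to the zero vector
Omega_hat = U.T @ U / R      # estimate of E(UU'): close to Omega

# Properties stated for Omega: square, symmetric and positive definite.
is_symmetric = np.allclose(Omega, Omega.T)
is_positive_definite = bool(np.all(np.linalg.eigvalsh(Omega) > 0))
```

The diagonal of `Omega_hat` approximates the common variance σu² (homoskedasticity) and its off-diagonal entries are close to zero (no autocorrelation).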

c) Basic assumptions on the explanatory variables

There are 5 basic assumptions:

1) The explanatory variables are fixed or deterministic (they are not random).

2) The explanatory variables are uncorrelated with the error term. Therefore:

$$E(x_{ji} u_i) = 0 \quad \forall i = 1, 2, \ldots, N, \quad \forall j = 1, 2, \ldots, k$$

3) There is no exact linear relationship between the explanatory variables, i.e. there is no perfect multicollinearity. The columns of the matrix X are linearly independent:

$$\rho(X) = k \quad \text{(the matrix X has full column rank)}$$

4) The explanatory variables are measured without error.

5) The model neither omits relevant variables nor includes irrelevant ones.

At this point we can ask: what are the expected value and the variance of Y?

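$$E(Y) = E(X\beta + U) = X\beta + E(U) = X\beta$$

$$VAR(Y) = E\big[(Y - E(Y))(Y - E(Y))'\big] = E\big[(X\beta + U - X\beta)(X\beta + U - X\beta)'\big] = E(UU') = \sigma_u^2 I_N$$

Therefore, since X is fixed and U is normal, Y follows a normal distribution with expected value $X\beta$ and variance $\sigma_u^2$ for each observation: $Y \sim N(X\beta, \sigma_u^2 I_N)$.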
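The expected value and variance of Y derived above can be checked by simulation, keeping X fixed across samples as the first assumption on the explanatory variables requires. The parameter values, the regressor values and the number of replications are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N, R = 3, 200_000
sigma_u = 1.5                                   # hypothetical disturbance s.d.
beta = np.array([1.0, 0.5])                     # hypothetical parameters

# Fixed regressors: the same X is used in every simulated sample.
X = np.column_stack([np.ones(N), np.array([2.0, 4.0, 6.0])])

# Each row of Y is one sample: Y = X beta + U with U ~ N(0, sigma_u^2 I_N).
U = rng.normal(0.0, sigma_u, size=(R, N))
Y = X @ beta + U                                # broadcasting adds X beta to every row

mean_Y = Y.mean(axis=0)                         # close to X beta
dev = Y - X @ beta                              # deviations from the expected value
var_Y = dev.T @ dev / R                         # close to sigma_u^2 I_N
```

Only the disturbance varies from sample to sample, so the whole variability of Y comes from U, which is why VAR(Y) coincides with VAR(U).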

d) Basic assumptions on the parameters

The βj coefficients are constant over the entire data sample (assumption of structural stability).

Having analyzed the basic assumptions, we note that one or more of them may not be met in practice:

• Heteroskedasticity
• Autocorrelation
• Non-normality of the error term U
• Endogeneity of X
• Perfect multicollinearity
• Measurement errors
• Specification error in X
• Structural change

These situations are studied in Econometrics I (Lessons 5 and 6), Econometrics II and Econometrics III.
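Of the situations listed above, perfect multicollinearity is the one that can be detected mechanically, because it violates the rank condition ρ(X) = k. The sketch below uses `np.linalg.matrix_rank` on two hypothetical design matrices: one with linearly independent columns and one in which a regressor is an exact linear combination of the others.

```python
import numpy as np

x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
ones = np.ones_like(x2)

# Linearly independent regressors: X has full column rank (rank = k = 3).
X_ok = np.column_stack([ones, x2, x3])
rank_ok = np.linalg.matrix_rank(X_ok)

# Hypothetical violation: x4 is an exact linear combination of the constant and x2.
x4 = 3.0 * ones + 2.0 * x2
X_bad = np.column_stack([ones, x2, x4])
rank_bad = np.linalg.matrix_rank(X_bad)   # rank falls below k: perfect multicollinearity
```

When the rank is smaller than the number of columns, the individual effects of the collinear regressors cannot be separated from one another.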