Charles University
Faculty of Social Sciences, Institute of Economic Studies (FSV UK)

Econometrics (STAKAN III)
Jan Ámos Víšek

Sixth Lecture
Tuesday, 14.00 – 15.20
Schedule of today's talk

- First pattern of output from a statistical package.
- Is the estimated model acceptable or not?
- Misinterpretations of results and how to avoid them.
- (Verifying the assumptions for OLS to be BLUE.)
- (Consistency and normality will come next time.)

Types of statistical libraries (packages)

- Menu-oriented: all required evaluations are made "by clicking the mouse" (STATISTICA, E-views).
- Key-oriented: required evaluations are performed as a sequence of commands written by means of key-words (TSP, SAS, S-PLUS, R).
- Combined.
A first pattern of statistical package output

Regression Summary for Dependent Variable: CCAS
R = .92364067   R² = .85311209   Adjusted R² = .82961003
F(4,25) = 36.299   p < .00000   Std. Error of estimate: 28.671

            BETA      St.Err.    B         St.Err.    t(25)     p-level
                      of BETA              of B
Intercept                        -3.6186   56.1027    -0.0645   0.9491
VAHA         0.6466   0.1463      1.2676    0.2869     4.4188   0.0002
PULS        -0.0604   0.0992     -0.5252    0.8628    -0.6087   0.5482
SILA        -0.2547   0.1240     -0.5050    0.2459    -2.0536   0.0506
ZCAS         0.5667   0.1086      3.9030    0.7477     5.2199   0.0000

BETA: estimates of coefficients in the model for transformed data.
B: estimates of coefficients in the model for the original data.
(For further discussion of the columns of the table see the next slide.)
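The columns of such a table can be reproduced from first principles. Below is a minimal numpy sketch on synthetic data (the data and variable names are invented stand-ins, not the CCAS data set above): B and its standard errors come from the usual OLS formulas, t(25) = B / St.Err., and BETA rescales B by the sample standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4                       # 30 observations, 4 regressors -> t(25), as in the table
X = rng.normal(size=(n, p))        # synthetic stand-ins for VAHA, PULS, SILA, ZCAS
y = 1.5 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])         # design matrix with intercept
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # column "B": OLS estimates for original data
r = y - Xd @ b                                # residuals
s2 = r @ r / (n - p - 1)                      # residual variance, df = n - p - 1 = 25
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))   # column "St. Err. of B"
t = b / se                                    # column "t(25)"; |t| > 2.06 is significant at 5 %
beta = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)   # column "BETA" (standardized coefficients)
```

The p-level column is then the two-sided tail probability of t(25) at the computed t-statistics.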
Remember the Fifth Lecture

            BETA      St.Err.    B         St.Err.    t(25)     p-level
                      of BETA              of B
Intercept                        -3.6186   56.1027    -0.0645   0.9491    evidently insignificant
VAHA         0.6466   0.1463      1.2676    0.2869     4.4188   0.0002    surely significant
PULS        -0.0604   0.0992     -0.5252    0.8628    -0.6087   0.5482    clearly insignificant, but .....
SILA        -0.2547   0.1240     -0.5050    0.2459    -2.0536   0.0506    slightly insignificant
ZCAS         0.5667   0.1086      3.9030    0.7477     5.2199   0.0000    surely significant

Regression Summary for Dependent Variable: CCAS
R = .92364067   R² = .85311209   Adjusted R² = .82961003
F(4,25) = 36.299   p < .00000   Std. Error of estimate: 28.671

A first pattern of statistical package output
One frequently appearing misinterpretation of the model:

Assume that the result of the regression analysis was

  Time Total = -3.62 + 1.27 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile.

Then we frequently meet conclusions of the type

  "As the estimate of the regression coefficient for Weight is positive, Weight has a positive impact on Time Total."

or (even)

  "Although the coefficient of determination is small, the polarity of the estimated coefficients corresponds to our ideas."

The first assertion can be true, under some circumstances, but generally we cannot claim anything like that. The second assertion can make sense, under some circumstances, but generally it is false. Why?
Let us consider the following, a bit academic, example. The regression model has random explanatory variables and the shape

  Y_i = 1.32 + 2.85 X_{i1} + 1.58 X_{i2} + ε_i ,   i = 1, 2, ..., n.        (1)

Moreover,

  X_{i1} = 1.2 − 1.58 X_{i2}² ,   i = 1, 2, ..., n,                          (2)

but we are not aware of this relation between X_{i1} and X_{i2}, and we take into account only (1). If we conclude that X_{i2} has a positive impact on Y_i, it is surely a false conclusion.

Remember "Ceteris paribus" -- unfortunately, it is an academic illusion !!

What to do? The relation (2) cannot be "discovered" by correlation analysis: correlation indicates only linear relations among r.v.'s. Sorry!
But if we take into account the correlation matrix of the n × kp matrix whose i-th row is

  ( X_{i1}, X_{i1}², ..., X_{i1}^k, X_{i2}, X_{i2}², ..., X_{i2}^k, ..., X_{ip}, X_{ip}², ..., X_{ip}^k ),   i = 1, 2, ..., n

( with k·p < n, of course ), we have a chance, again due to the Weierstrass theorem, to find some latent relations, like (2) or a similar one, between ( keep in mind that not among ) the explanatory variables. Denote this matrix by X~ .
Of course, the best is to try to regress one explanatory variable on various combinations of the others, their functions, powers, etc. Clearly, it is a tiresome, time-consuming job, full of routine, etc.

In STATISTICA, running the regression Y = Xβ⁰ + ε, we obtain as an indication of collinearity the table "Redundancy", containing all coefficients of determination of the models which regress one column of X on all the other columns. We shall speak about collinearity later on.
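Such a "Redundancy" table can be emulated in a few lines: for each column of X, regress it on the remaining columns and record the R². In the numpy sketch below (invented data), x2 is an almost exact function of x1, so its redundancy R² comes out near one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 1.2 - 0.8 * x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def redundancy_r2(X):
    """R^2 of regressing each column of X on all the remaining columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ b
        out.append(1 - r @ r / ((y - y.mean()) @ (y - y.mean())))
    return np.array(out)

r2 = redundancy_r2(X)   # values near 1 flag collinearity
```

Here r2 for x1 and x2 is close to one while the value for the independent x3 stays small.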
Let us recall that we have already shown another possible misinterpretation, namely how misleading it can be to infer the impact of a given explanatory variable on the response from the magnitude of the estimated coefficient. See the Fifth Lecture, the fourth slide.

After all, analysis of data is, at least partially, an art. (Or the art ?!) Sometimes, a grain of intuition is better than hours of routine.

  "I would like to have a drop of good luck, or a barrel of intelligence !!"
                                              Menandros, 342 – 293 B.C.
We already know that we should: estimate a model which includes only significant explanatory variables; check that the coefficient of determination is acceptably large; find out the mutual relations of the explanatory variables.

Can we then accept the model and interpret it?

I am sorry, not yet !!

What more should we do? A hint: the answer can be deduced from what was already given !!!

The answer is as simple as follows: we should check the assumptions under which OLS is the optimal estimator !!

Prior to looking for a way to verify the assumptions, let us also recall the picture graphically showing ......
Let us recall - Theorem

Assumptions: Let {ε_i}_{i=1}^∞ be a sequence of r.v.'s with E ε_i = 0, var(ε_i) = σ² and cov(ε_i, ε_j) = 0 for i ≠ j.
Assertions: Then β̂^(OLS,n) is the best linear unbiased estimator.

Assumptions: If moreover X^T X = O(n), (X^T X)^{-1} = O(n^{-1}) and the ε_i's are independent,
Assertions: β̂^(OLS,n) is consistent.

Assumptions: If further lim_{n→∞} (1/n) X^T X = Q, Q a regular matrix,
Assertions: then

  L( √n ( β̂^(OLS,n) − β⁰ ) ) → N( 0, σ² Q^{-1} ),

where cov( √n ( β̂^(OLS,n) − β⁰ ) ) → σ² Q^{-1}.
Recall that

  Ŷ = X β̂^(OLS,n) ∈ M(X),   where M(X) ⊂ R^n is the linear envelope of the columns of X,
  Ȳ = (1/n) · 1^T Y,   with 1 = (1, 1, ..., 1)^T ∈ M(X),
  r(β̂^(OLS,n)) = Y − X β̂^(OLS,n),   r(β̂^(OLS,n)) ⊥ M(X),

and that the coefficient of determination R² is built from these projections. Since 1 ∈ M(X),

  1^T r(β̂^(OLS,n)) = Σ_{i=1}^n r_i(β̂^(OLS,n)) = 0.
Sometimes we meet with the idea that by showing that (1/n) Σ_{i=1}^n r_i(β̂^(OLS,n)) is small, the assumption E ε_i = 0 will be verified. However, as we have seen on the previous slide, this sum is always equal to zero.
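This identity is easy to check numerically. A small numpy sketch on synthetic data: whenever the design matrix contains the intercept column, the OLS residuals sum to (effectively) zero and are orthogonal to every column of X, no matter what the errors look like.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept column included
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ b

print(r.sum())    # effectively zero: residuals sum to zero when an intercept is present
print(X.T @ r)    # effectively zero: residuals are orthogonal to the column space of X
```

So a small residual sum carries no information about E ε_i at all.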
Testing validity of the assumptions

Moreover, the assumption E ε_i = 0 is in fact accommodated by the (linear) regression model. Notice that in the case when E ε_i = μ ≠ 0, we can consider the model

  Y_i = (β_1 + μ) + β_2 X_{i2} + ... + β_p X_{ip} + (ε_i − μ)
      = β̃_1 + β_2 X_{i2} + ... + β_p X_{ip} + ε̃_i

with E ε̃_i = 0. In other words, the assumption E ε_i = 0 cannot be and even need not be verified.
Testing validity of the assumptions:  var(ε_i) = σ², i = 1, 2, ..., n  (homoscedasticity)

Most tests of homoscedasticity are based on testing an idea about a model of the heteroscedasticity. We are going to show one of them, but there are plenty of such tests - rarely implemented. Another test, treated afterwards, is not from this class; it is based on a surprisingly simple idea and is frequently used.

Breusch-Pagan test (1979)

Assume that σ_i² = h(Z_i^T λ), where h is a (smooth) function, Z_i = (1, Z_{i2}, Z_{i3}, ..., Z_{ik})^T and λ = (λ_1, λ_2, λ_3, ..., λ_k)^T. We observe that if the null hypothesis

  H_0 : σ_i² = σ²,   i = 1, 2, ..., n,

is valid, then λ_2, λ_3, ..., λ_k are not significant (we will learn later how to test simultaneously whether several coefficients are not significant).

The locally most powerful test can be based on the statistic

  T = (1/2) · q^T Z (Z^T Z)^{-1} Z^T q,   L(T) → χ²(k − 1),

where q = (q_1, q_2, ..., q_n)^T, q_i = r_i²(β̂^(OLS,n)) / s² − 1, s² = (1/n) Σ_{i=1}^n r_i²(β̂^(OLS,n)) and Z = (Z_1, Z_2, ..., Z_n)^T. The test is, however, not very powerful against a skewed d.f. of the ε_i's.
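This statistic fits in a few numpy lines. In the sketch below the data, the choice Z_i = (1, x_i) and the variance pattern are all invented for illustration: q_i = r_i²/s² − 1 is projected onto the span of Z and half the squared length is taken.

```python
import numpy as np

def breusch_pagan(r, Z):
    """(1/2) q' Z (Z'Z)^{-1} Z' q with q_i = r_i^2/s^2 - 1, s^2 = (1/n) sum r_i^2;
    under H0 (and normal errors) asymptotically chi^2(k - 1)."""
    n = len(r)
    q = r**2 / ((r @ r) / n) - 1.0
    Pq = Z @ np.linalg.solve(Z.T @ Z, Z.T @ q)   # projection of q onto span(Z)
    return 0.5 * q @ Pq

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
Z = X                                            # here Z_i = (1, x_i)

def ols_resid(y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

stat_hom = breusch_pagan(ols_resid(1 + x + rng.normal(size=n)), Z)
stat_het = breusch_pagan(ols_resid(1 + x + rng.normal(size=n) * (0.2 + 2 * x)), Z)
```

With homoscedastic errors the statistic behaves like a χ²(1) draw; with a variance growing in x it becomes much larger, so comparison with the χ²(k−1) critical value rejects H₀.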
White’s test (1980)
Technicalities

The idea of the test is given by the following. Prior to the explanation notice that

  (X^T X)_{jk} = Σ_{i=1}^n X_{ij} X_{ik},   i.e.   X^T X = Σ_{i=1}^n X_i X_i^T,

and recall that the empirical mean is E_emp Z = (1/n) Σ_{i=1}^n Z_i. Moreover, if the random variables {X_i X_i^T} and {r_i²} are independent,

  E_emp{ X X^T r² } = (1/n) Σ_{i=1}^n X_i X_i^T r_i²
                    ≈ E_emp{ X X^T } · E_emp{ r² } = ( (1/n) Σ_{i=1}^n X_i X_i^T ) · ( (1/n) Σ_{i=1}^n r_i² ) = (1/n) X^T X · s².

So, the idea of the test is to compare the two matrices

  (1/n) Σ_{i=1}^n r_i² X_i X_i^T    and    (1/n) X^T X · s².

The test is in fact carried out as follows (Halbert White, 1980).
Put

  τ_{is} = X_{ik} X_{il},   s = 1, 2, ..., p(p+1)/2,

and regress r_i²(β̂^(LS,n)) on 1, τ_{i1}, ..., τ_{i,p(p+1)/2}, i.e. consider the regression model

  r_i²(β̂^(LS,n)) = θ_0 + θ_1 τ_{i1} + ... + θ_{p(p+1)/2} τ_{i,p(p+1)/2} + v_i .

Then for its coefficient of determination R² we have

  L( n R² ) → χ²( p(p+1)/2 )   as n → ∞.
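This auxiliary regression is a short numpy exercise (invented data; X here excludes the intercept column, so the number of products X_ik X_il with k ≤ l is exactly p(p+1)/2):

```python
import numpy as np

def white_stat(r, X):
    """n * R^2 from regressing r_i^2 on a constant and all products X_ik * X_il, k <= l;
    under H0 asymptotically chi^2 with p(p+1)/2 degrees of freedom."""
    n, p = X.shape
    cols = [np.ones(n)]
    cols += [X[:, k] * X[:, l] for k in range(p) for l in range(k, p)]
    W = np.column_stack(cols)
    y = r**2
    b, *_ = np.linalg.lstsq(W, y, rcond=None)
    res = y - W @ b
    return n * (1 - res @ res / ((y - y.mean()) @ (y - y.mean())))

rng = np.random.default_rng(4)
n = 400
X = rng.uniform(0.5, 2.0, size=(n, 2))           # two regressors, p = 2 -> df = 3
Xd = np.column_stack([np.ones(n), X])
beta = np.array([1.0, 2.0, -1.0])

def resid(y):
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ b

stat_hom = white_stat(resid(Xd @ beta + rng.normal(size=n)), X)
stat_het = white_stat(resid(Xd @ beta + rng.normal(size=n) * X[:, 0]**2), X)
```

Under homoscedasticity nR² stays in the range of a χ²(3) draw; with the variance tied to the regressors it grows far beyond the critical value.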
Continued

Of course, libraries (e.g. TSP, E-Views) give the corresponding p-value, i.e. the probability that a χ²( p(p+1)/2 ) r.v. is larger than the value of nR² resulting from the regression given on the previous slide.

Again technicalities

Let us recall that β̂^(OLS,n) = β⁰ + (X^T X)^{-1} X^T ε and that, under homoscedasticity, E[ε ε^T] = σ² I. Hence

  cov(β̂^(OLS,n)) = (X^T X)^{-1} X^T E[ε ε^T] X (X^T X)^{-1} = σ² (X^T X)^{-1},

and hence cov(β̂^(OLS,n)) can be estimated by s² (X^T X)^{-1}.
White's test (1980) - Continued

However, if E[ε ε^T] = diag{ σ_1², σ_2², ..., σ_n² }, we obtain

  cov(β̂^(OLS,n)) = (X^T X)^{-1} ( Σ_{i=1}^n σ_i² X_i X_i^T ) (X^T X)^{-1}.

Remember this matrix!
White's test (1980) - consequence for looking for a model

Halbert White showed that

  cov(β̂^(OLS,n)) = (X^T X)^{-1} ( Σ_{i=1}^n σ_i² X_i X_i^T ) (X^T X)^{-1}

can be consistently estimated by

  (X^T X)^{-1} ( Σ_{i=1}^n r_i²(β̂^(OLS,n)) X_i X_i^T ) (X^T X)^{-1}.

So if the hypothesis of homoscedasticity is rejected, we should (or better, have to) studentize the coordinates of the estimate β̂^(OLS,n) by the square roots of the diagonal elements of this matrix. We shall recall what studentization is on the next slide!
Continued

Put

  t_i = ( β̂_i^(OLS,n) − β_i⁰ ) / ( s · c_ii^{1/2} ),   i = 1, 2, ..., n,

where c_ii = ( (X^T X)^{-1} )_{ii}. This transformation is called studentization.
Recalling studentization - Lemma

Assumptions: Let {ε_i}_{i=1}^∞ be i.i.d. r.v.'s with E ε_i = 0 and var(ε_i) = σ². Moreover, let L(ε_i) = N(0, σ²) and let X^T X be regular.

Assertions: Then L( β̂^(OLS,n) − β⁰ ) = N( 0, σ² (X^T X)^{-1} ).

Put

  t_i = ( β̂_i^(OLS,n) − β_i⁰ ) / ( s · c_ii^{1/2} ),   where c_ii = ( (X^T X)^{-1} )_{ii}.

Assertions: Then L(t_i) = t_{n−p}, i.e. t_i is distributed as Student's t with n − p degrees of freedom. This transformation is called studentization; by it we get rid of σ², which is unknown.
White's test (1980) - consequence for looking for a model, continued

So, similarly as previously, we put

  t_i = ( β̂_i^(OLS,n) − β_i⁰ ) / d_ii^{1/2},

where however now

  d_ii = [ (X^T X)^{-1} ( Σ_{j=1}^n r_j²(β̂^(OLS,n)) X_j X_j^T ) (X^T X)^{-1} ]_{ii}.

Lemma (Assertions): Then again L(t_i) ≈ t_{n−p}.

What can happen if heteroscedasticity is recognized but ignored is demonstrated by the example on the next slide.
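The White covariance estimate and the corresponding studentization fit in a few numpy lines (a sketch on invented heteroscedastic data; the estimator coded here is the sandwich form shown on the slide):

```python
import numpy as np

def white_cov(X, r):
    """(X'X)^{-1} ( sum_j r_j^2 x_j x_j' ) (X'X)^{-1} -- the White sandwich estimate."""
    A = np.linalg.inv(X.T @ X)
    meat = (X * (r**2)[:, None]).T @ X         # sum_j r_j^2 x_j x_j'
    return A @ meat @ A

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(size=n) * x         # error s.d. grows with x
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ b

se_classic = np.sqrt(r @ r / (n - 2) * np.diag(np.linalg.inv(X.T @ X)))
se_white = np.sqrt(np.diag(white_cov(X, r)))
t_white = b / se_white                          # coordinates studentized by the White estimate
```

Under heteroscedasticity the two standard errors differ, and significance should be judged on t_white rather than on the classical t-statistics.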
Model of export from the Czech Republic into the EU

29 industries:
Agriculture and hunting, Forestry and logging, Fishing, Mining of coal and lignite, Other mining, Manufacture of food products and beverages, Man. of tobacco products, Man. of textiles, Man. of textile products for the household, Man. of footwear, Man. of wood and of products of wood, Man. of pulp, paper and paper products, Publishing, printing and reproduction, Man. of coke, refined petroleum products and nuclear fuel, Man. of chemicals and pharmaceuticals, Man. of rubber and plastic products, Man. of bricks, tiles, constr. products and glass, Man. of basic metals, Man. of structural metal products, Man. of machinery, Man. of office machinery, Manufacture of electrical machinery and apparatus, Radio, television and communication equipment, Medical, precision and optical instruments, Motor vehicles, Transport equipment, Furniture, Recycling, Electricity, gas, steam and hot water supply

Period: 1993 - 2001

Response variable: Export from the Czech Republic into the EU

Explanatory variables: PE/kg, PI/kg, Tariffs from EU on X, Tariffs CZ on M, Price deflator (base 94), FDI stock, K/L, EMPloyment, Value Added, GDPeu (total), GDPcz (total), REER, Wages, Annual cost, Total expenditure, Debts

Some preliminary analysis (we shall speak about it later - maybe in the summer term) indicated that past values of the explanatory variables should also be used.
Example of a model ignoring heteroscedasticity

Variable        Estim. Coeff.   Standard Error   t-stat.   P-value
C                   9.643          5.921          1.629     [.104]
log(BX)             0.827          0.033         25.18      [.000]
log(PE)            -0.164          0.06          -2.724     [.007]
log(BPE)            0.2            0.062          3.24      [.001]
log(VA)             0.337          0.077          4.365     [.000]
log(BVA)           -0.228          0.079         -2.899     [.004]
log(K/L)           -0.625          0.159         -3.937     [.000]
log(BK/BL)          0.518          0.157          3.29      [.001]
log(DE/VA)          0.296          0.122          2.419     [.016]
log(BDE/BVA)       -0.292          0.119         -2.456     [.015]
log(FDI)            0.147          0.056          2.629     [.009]
log(BFDI)          -0.151          0.056         -2.717     [.007]
log(GDPeu)          1.126          0.629          1.789     [.045]
log(BGDPeu)        -1.966          0.623         -3.155     [.002]

Notice that all explanatory variables are significant!
Other characteristics of the model

Mean of dep. var. = 11.115           Durbin-Watson = 1.98 [<.779]
Std. dev. of dep. var. = 1.697       White het. test = 244.066 [.000]
Sum of squared residuals = 150.997   Jarque-Bera test = 372.887 [.000]
Variance of residuals = 0.519        Ramsey's RESET2 = 8.614 [.004]
Std. error of regression = 0.72      F (zero slopes) = 107.422 [.000]
R-squared = 0.828                    Schwarz B.I.C. = 365.603
Adjusted R-squared = 0.82            Log likelihood = -325.56
LM het. test = 19.964
The same model with the White estimate of the covariance matrix employed

Variable        Estim. Coeff.   Standard Error   t-stat.   P-value
C                   9.643          4.128          2.336     [.020]
log(BX)             0.827          0.046         18.141     [.000]
log(PE)            -0.164          0.107         -1.53      [.127]
log(BPE)            0.2            0.107          1.876     [.062]
log(VA)             0.337          0.203          1.661     [.098]
log(BVA)           -0.228          0.192         -1.191     [.235]
log(K/L)           -0.625          0.257         -2.435     [.016]
log(BK/BL)          0.518          0.301          1.717     [.087]
log(DE/VA)          0.296          0.292          1.014     [.312]
log(BDE/BVA)       -0.292          0.282         -1.034     [.302]
log(FDI)            0.147          0.141          1.039     [.300]
log(BFDI)          -0.151          0.123         -1.223     [.222]
log(GDPeu)          1.126          1.097          1.027     [.305]
log(BGDPeu)        -1.966          0.995         -1.976     [.049]

This is the significance of the coefficients when the White estimate of the covariance matrix is employed. Notice that nearly all explanatory variables are non-significant!
Employing the White estimate of the covariance matrix of the estimates of the regression coefficients, the resulting model is considerably simpler !!!

Variable        Estim. Coeff.   Standard Error   t-stat.   P-value
log(BX)             0.804          0.05          16.125     [.000]
log(VA)             0.149          0.039          3.784     [.000]
log(K/L)           -0.214          0.063         -3.38      [.001]
log(GDPeu)          1.896          0.782          2.425     [.016]
log(BGDPeu)        -2.538          0.778         -3.261     [.001]

  log(X_t) = β_1 log(X_{t−1}) + β_2 log(VA_t) + β_3 log(K_t/L_t) + β_4 log(GDPeu_t) + β_5 log(GDPeu_{t−1}) + u_t

Notice that the heteroscedasticity is not removed; only (?) the significance was judged on the modified values of the studentized estimates of the regression coefficients !!
Other characteristics of the model

Mean of dep. var. = 11.115           Durbin-Watson = 1.914 [<.344]
Std. dev. of dep. var. = 1.697       White het. test = 116.659 [.000]
Sum of squared residuals = 171.003   Jarque-Bera test = 449.795 [.000]
Variance of residuals = 0.572        Ramsey's RESET2 = 3.568 [.060]
Std. error of regression = 0.756     F (zero slopes) = 246.404 [.000]
R-squared = 0.805                    Schwarz B.I.C. = 361.696
Adjusted R-squared = 0.801           Log likelihood = -344.53
LM het. test = 17.876
Warning !!!

Attempts at removing heteroscedasticity by a transformation of the data are typically a reliable way to hell !!! The only exception may be when the shape of the heteroscedasticity is known with a high degree of reliability.

An example: the data are aggregated values of some economic, demographic, sociologic, educational, etc. characteristics over the districts of a country. Then the variances of these data are inversely proportional to the numbers of inhabitants, economic subjects, etc. Then there is a grain of hope that .........
Analyzing homoscedasticity by graphical tools

The idea: If ε_i ~ (0, σ²), i = 1, 2, ..., n, then r_i(β̂^(OLS,n)) should not depend on i. So, plotting r_i(β̂^(OLS,n)) against i, we should not obtain any regular or periodical shape of the graph. Such a graph is called an index plot.

A "handicap" of the idea is that the shape of the graph depends on the order of the observations in the analyzed data. Hence one can easily reorder the data so that we obtain a regular shape of the graph. A remedy is simple!

The refined idea: If ε_i ~ (0, σ²), i = 1, 2, ..., n, then r_i(β̂^(OLS,n)) should not depend on X_i and/or Y_i. So, plotting r_i(β̂^(OLS,n)) e.g. against Ŷ_i, we should not obtain any regular shape ...... (An example is on the next slide.)
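Without drawing the plot, one can mimic it numerically: regress, then check whether the squared residuals trend with the fitted values. In the numpy sketch below (invented data) the error spread grows with the mean, so the correlation between r_i² and Ŷ_i comes out clearly positive; under homoscedasticity it should hover near zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(1, 10, size=n)
y = 2 + 3 * x + rng.normal(scale=0.5 * x, size=n)   # spread grows with the mean

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
r = y - fitted

# numeric analogue of the plot: do the squared residuals trend with the fitted values?
corr = np.corrcoef(fitted, r**2)[0, 1]
```

A scatter of r_i² (or r_i) against fitted, as on the next slide, shows the same pattern visually.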
Looking for heteroscedasticity by circumstances

Consider the consumption of households. Those with a large income do not consume everything but save some money and buy, from time to time, a TV set, a fridge, a car, etc. It means that their consumption is sometimes smaller, sometimes larger, while the consumption of poorer households is nearly the same all the time. Hence consumption will (usually) not be homoscedastic.

[Figure: squared residuals plotted against the predicted values of the response. The result indicates that a suspicion of a slight heteroscedasticity can arise.]
Let us recall once again - Theorem

Assumptions: Let {ε_i}_{i=1}^∞ be a sequence of r.v.'s with E ε_i = 0, var(ε_i) = σ² and cov(ε_i, ε_j) = 0 for i ≠ j.
Assertions: Then β̂^(OLS,n) is the best linear unbiased estimator.

Assumptions: If moreover X^T X = O(n), (X^T X)^{-1} = O(n^{-1}) and the ε_i's are independent,
Assertions: β̂^(OLS,n) is consistent.

Assumptions: If further lim_{n→∞} (1/n) X^T X = Q, Q a regular matrix,
Assertions: then

  L( √n ( β̂^(OLS,n) − β⁰ ) ) → N( 0, σ² Q^{-1} ),

where cov( √n ( β̂^(OLS,n) − β⁰ ) ) → σ² Q^{-1}.

There are still some assumptions to be verified. We'll discuss them in the next lecture.
What is to be learnt from this lecture for the exam?

- Linearity of the estimator and of the model - what advantages and restrictions do they represent?
- How to test the basic assumptions:
  - E ε_i = 0,
  - homoscedasticity: the White test versus tests based on a model of the heteroscedasticity, i.e. two approaches based on different ideas,
  - graphical tests.
- What does it mean: "The estimator is the best in the class of ...."?
- OLS is the best unbiased estimator - the condition(s) for it.

All that you need is on http://samba.fsv.cuni.cz/~visek/