Study Guide for Econometrics
(second semester)
Programa Universitat-Empresa
Universitat Autònoma de Barcelona
February 2008
Michael Creel and Montserrat Farell
Contents
Introduction
Econometrics at the Facultat
About this study guide
Bibliography
Chapter 1. GRETL
1.1. Introduction
1.2. Getting Started
1.3. Chapter Exercises
Chapter 2. Dummy Variables
2.1. Introduction
2.2. Motivation
2.3. Definition, Basic Use, and Interpretation
2.4. Additional Details
2.5. Primer Projecte de Docència Tutoritzada
2.6. Chapter Exercises
Chapter 3. Collinearity
3.1. Introduction
3.2. Motivation: Data on Mortality and Related Factors
3.3. Definition and Basic Concepts
3.4. When does it occur?
3.5. Consequences of Collinearity
3.6. Detection of Collinearity
3.7. Dealing with collinearity
3.8. Segon Projecte de Docència Tutoritzada
3.9. Chapter Exercises
Chapter 4. Heteroscedasticity
4.1. Introduction
4.2. Motivation
4.3. Basic Concepts and Definitions
4.4. Effects of Het. and Aut. on the OLS estimator
4.5. The Generalized Least Squares (GLS) estimator
4.6. Feasible GLS
4.7. Heteroscedasticity
4.8. Example
4.9. Tercer Projecte de Docència Tutoritzada
4.10. Chapter Exercises
Chapter 5. Autocorrelation
5.1. Introduction
5.2. Motivation
5.3. Causes
5.4. Effects on the OLS estimator
5.5. Corrections
5.6. Valid inferences with autocorrelation of unknown form
5.7. Testing for autocorrelation
5.8. Lagged dependent variables and autocorrelation: A Caution
5.9. Quart Projecte de Docència Tutoritzada
5.10. Chapter Exercises
Chapter 6. Data sets
Introduction
Econometrics at the Facultat
Econometrics (Econometria) is an annual (two semester) course in the Facultat
de Ciències Econòmiques i Empresarials at the UAB. It is a required course for
the degree of Llicenciat in both Administració i Direcció d'Empreses (ADE) and
Economia (ECO). In both ADE and ECO, Econometrics is normally taken in the
third year of study.
Econometrics is an area of Economics that uses statistical and mathematical
tools to analyze data on economic phenomena. Econometrics can be used to find
a mathematical model that gives a good representation of an actual economy, to
test theories about how an economy behaves, or to make predictions about how
an economy will evolve. Estimation of models, testing of hypotheses, and making
predictions are all things that can be done using econometric methods.
Courses that are fundamental for successfully studying Econometrics are Matemàtiques
per a Economistes I and Matemàtiques per a Economistes II (first year of
study) and Estadística I and Estadística II (second year of study). Ideally, students
should have passed these courses before beginning Econometrics. If this is
not possible, any student of Econometrics should immediately begin a serious review
of the material covered in these courses. Basic matrix algebra, constrained and
unconstrained minimization of functions, conditional and unconditional expectations
of random variables, and hypothesis testing are the areas that should be reviewed.
Microeconomia I and Microeconomia II are courses that provide a theoretical
background which is important to understand why and how we use econometric
tools. Macroeconomia I also provides a theoretical background for some of the
examples of the second half of Econometrics.
About this study guide
This study guide covers the material taught in the second semester, in groups 13
and 14 (the groups of the PUE). The guide contains brief notes on all of the material,
as well as examples that use GRETL. This guide does not substitute for reading a
textbook; it accompanies a textbook. Nor does it substitute for attending class. The
guide highlights essential concepts, provides examples, and gives exercises. However,
class lectures contain details that are not reproduced in the guide. To learn these
details, attending class is fundamental, as is careful reading of a textbook. The
guide provides references to the book Econometría (cuarta edición) by D. Gujarati,
mentioned below. In the second semester of Econometrics, we will cover material in
Chapters 9, 10, 11 and 12 of Gujarati's book.
This guide has been checked to work properly using the Firefox web browser
and Adobe Acrobat Reader. Both of these packages are freely available for the
commonly used operating systems. You should configure Acrobat Reader to use
Firefox to open links. This study guide and related materials (data sets, copies of
software and manuals, etc.) are available at the Econometrics Study Guide web
page.
Bibliography
There are many excellent textbooks for econometrics. Any of the following are
appropriate. This study guide refers to Gujarati's book. You should definitely read
the appropriate sections of at least one of these books.
(1) Novales, A., Econometría, McGraw-Hill
(2) Gujarati, D., Econometría, McGraw-Hill
(3) Johnston, J. and J. Dinardo, Métodos de Econometría, Vicens Vives
(4) Kmenta, J., Elementos de Econometría, Vicens Vives
(5) Maddala, G.S. (1996), Introducción a la econometría, 2nd edition, Prentice Hall
(6) Pindyck, R.S. and Rubinfeld, D.L. (2001), Econometría: modelos y pronósticos, 4th edition, McGraw-Hill
CHAPTER 1
GRETL
1.1. Introduction
GRETL (http://gretl.sourceforge.net/) is a free computer package
for doing econometrics. It is installed on the computers in Aules 21-22-23 as
well as in the Social Sciences computer rooms. You can download a copy and install
it on your own computer. It works with Windows, Macs, and Linux. It is available
in a number of languages, including Spanish. The version for Windows, along
with the manual and the data sets that accompany D. Gujarati's Econometría, are
distributed with this study guide, and are also available:
• Gretl v. 1.7.1 for Windows
• Data to accompany Gujarati's book
The examples in this study guide use GRETL, and to do the class assignments you
will need to use GRETL. This chapter explains the basic steps of using GRETL.
• Basic concepts and goals for learning:
(1) become familiar with the basic use of GRETL
(2) learn how to load ASCII and spreadsheet data
(3) learn how to select certain observations in a data set
• Readings: the GRETL manual, in Spanish or in English. You don't have to
read the whole manual, but looking through it would be a good idea.
Figure 1.2.1. GRETL's startup window
1.2. Getting Started
Once you start GRETL, you see the window in Figure 1.2.1. You need to load
some data to use GRETL. Data comes in many forms: plain text files, spreadsheet
files, binary files that use special formats, etc. GRETL can use most of these forms.
We'll look at how to deal with two cases: plain ASCII text data, and Microsoft
Excel spreadsheet data.
1.2.1. Loading ASCII text data. The Wisconsin longitudinal survey is a long-term
study of people who graduated from high school in the state of Wisconsin (US)
during the year 1957. The data have been collected repeatedly in subsequent years.
This data can be obtained over the Internet from the address given previously. In
Figure 1.2.2 you can see that several variables have been selected for download.
Figure 1.2.2. Downloading data
In Figure 1.2.3 you see that one of the available formats is comma separated
values (csv), which provides records (lines) containing variables, which may be text
or numbers, each separated by commas. Downloading that gives us the file wls.csv,
the first few lines of which are
iduser,ix010rec,sexrsp,gg021jjd,gwiiq_bm
1001,60,2,18000,109
1002,,1,,79
1003,,2,,111
1004,,1,,96
1005,,2,,83
1006,65,2,-2,99
Figure 1.2.3. Comma separated values format
1007,70,1,-2,86
1008,71,1,-2,86
1009,67,2,16827,106
1010,72,1,17094,88
1011,67,2,7698,124
1012,,2,-2,124
The first line of the file gives the variable names, and the other lines are the individual
records, one for each person. There are a total of 10317 records, one for each
person. Some variables are missing for some people. In the data set, this is indicated
by two commas in a row with no number in between.
We need to know how to load this data into GRETL. This can be done as shown
in Figure 1.2.4.
Figure 1.2.4. Loading a csv file
Having done that, we now have the data in GRETL, as we see in Figure
1.2.5.
This data set has some problems that make it difficult to use. First, the variable
names are strange and not intuitive. Second, many observations have missing values.
You can change the name of a variable by right-clicking on it and selecting
Edit attributes. Then change the name to whatever you like. See Figure 1.2.6. To
see that many observations have missing values, right-click on a variable and choose
Display values or Descriptive statistics. For example, the variable income (I
renamed gg021jjd to income) shows what we see in Figure 1.2.7.
Figure 1.2.5. CSV data loaded
Figure 1.2.6. Changing a variable's name
Figure 1.2.7. Missing observations
To eliminate missing observations, we can select from the menu Sample -> Restrict,
based on criterion, as in Figure 1.2.8. We need to enter a selection criterion.
This data set is missing many observations on income and age. We can require that
these variables be positive. This is illustrated in Figure 1.2.9. Once we do this,
the new sample has 4934 observations, as we can see in Figure 1.2.10. Whenever
you are using this data, you should make sure that you have removed the observations
with missing data.
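The restriction can also be sketched outside GRETL. Below is a minimal pandas equivalent; the small table is a made-up stand-in for wls.csv, using the renamed columns described above:

```python
import pandas as pd

# Made-up stand-in for wls.csv: empty fields parse as NaN, and -2 codes
# a non-response, as in the real file.
data = pd.DataFrame({
    "age":    [60, None, 65, 70, None],
    "income": [18000, None, -2, -2, 7698],
    "IQ":     [109, 79, 99, 86, 124],
})

# Keep only records where both age and income are strictly positive,
# mirroring GRETL's Sample -> Restrict, based on criterion.
restricted = data[(data["age"] > 0) & (data["income"] > 0)]
print(len(restricted))  # 1: only the first row survives
```

On the real file, the same filter should leave the 4934 observations reported above.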
1.2.2. Loading spreadsheet data. Data is often distributed as spreadsheet
files. These are easy to load into GRETL using the File -> Open data -> Import
option. Figure 1.2.11 shows how to do it. We need some spreadsheet data to try
this.
Figure 1.2.8. Select sample, 1
Get the nerlove.xls data, and then import it as I have just explained. Once
you do this you will see the dialog in Figure 1.2.12. Select no.
Figure 1.2.9. Selection criterion
Figure 1.2.12. Data dialog
Figure 1.2.10. Restricted sample
1.3. Chapter Exercises
(1) For the Wisconsin data set:
(a) change the variable name of the variable ix010rec to age
(b) change the name of gg021jjd to income
(c) change the name of gwiiq_bm to IQ.
(d) select observations such that age and income are positive. You
should have 4934 observations after doing so.
(e) save the restricted data, with the new variable names, as the data set
wisconsin.gdt. Confirm that you can load this data into a new GRETL
session.
(2) With your wisconsin.gdt data set:
(a) explore the GRETL menu options, the help features, and the manual,
and print histograms (frequency plots) for the variables age, income
and IQ.
Figure 1.2.11. Loading spreadsheet data
(b) print descriptive statistics for all variables.
CHAPTER 2
Dummy Variables
2.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should be able to answer the following questions:
(1) What is a dummy variable?
(2) How can dummy variables be used in regression models?
(3) What is the correct interpretation of a regression model that contains
dummy variables?
(4) How can dummy variables be used in the cases of multiple categories,
interaction terms, and seasonality?
(5) What is the equivalence between the different parameterizations that
can be used when incorporating dummy variables?
• Readings:
(1) Gujarati, Econometría, (cuarta edición), Chapter 9: Modelos de
regresión con variables dicótomas, pp. 285-320.
2.2. Motivation
Often, qualitative factors can have an important effect on the dependent variable
we may be interested in. Consider the Wisconsin data set wisconsin.gdt. If we
regress income on height, having selected the sample to include men only, we obtain
the fitted line in Figure 2.2.1. Doing the same for the sample of women, we get
Figure 2.2.2. Comparing the two plots, we can see that:
Figure 2.2.1. Income regressed on height, men
Figure 2.2.2. Income regressed on height, women
• the y-intercept is higher for men than for women
• the slope of the line is steeper for men than for women
• men are taller on average - for men, mean height is around 70 inches, while
for women it's about 65 inches
There are a few questions we might ask:
• why does income appear to depend upon height? What economic explana-
tions are possible?
• why do women appear to be earning less than men, other things equal?
Apart from these questions, it is clear that a qualitative feature - the sex of the
individual - has an impact upon the individual's expected income.
• How can we incorporate such a qualitative characteristic into an economet-
ric model?
The need to use qualitative information in our models motivates the study of dummy
variables.
2.3. Definition, Basic Use, and Interpretation
Dummy variable (definition): A dummy variable is a binary-valued variable
that indicates whether or not some condition is true. It is customary to assign the
value 1 if the condition is true, and 0 if the condition is false.
Dummy variable (example): for the Wisconsin data, the variable sexrsp takes
the value 1 for men, and 2 for women. As such, sexrsp is not a dummy variable,
since the values are not 0 or 1. We can define the condition "Is the person a woman?"
This is equivalent to the condition "Is the value of sexrsp equal to 2?" This condition
will be true for some observations, and false for others. With GRETL, we can define
such a dummy variable, using the Variable -> Define new variable menu item, as in
Figure 2.3.1. Defining a dummy variable
Figure 2.3.2. Display values
Figure 2.3.1. To check that this worked properly, highlight both variables, right-click,
and select Display values. This shows us what we see in Figure 2.3.2. Note that
woman is now a variable like any other, that takes on the values 0 or 1.
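The same definition can be written in one line of code. A sketch, with a made-up stand-in for the sexrsp series (1 = man, 2 = woman, as described above):

```python
import pandas as pd

sexrsp = pd.Series([1, 2, 2, 1, 2])  # 1 = man, 2 = woman

# The condition "is sexrsp equal to 2?" is True or False for each
# observation; casting to int gives the customary 0/1 coding.
woman = (sexrsp == 2).astype(int)
print(woman.tolist())  # [0, 1, 1, 0, 1]
```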
2.3.1. Basic use and interpretation. Dummy variables are used essentially
like any other regressor. In class we will discuss the following models. Variables
like dt and dt2 are understood to be dummy variables. Variables like xt and xt3 are
ordinary continuous regressors. You should understand the interpretation of all of
them.
yt = β1 + β2dt + εt
yt = β1dt + β2(1− dt) + εt
yt = β1 + β2dt + β3xt + εt
Interaction terms: an interaction term is the product of two variables, so that
the effect of one variable on the dependent variable depends on the value of the
other. The following model has an interaction term. Note that ∂E(y|x)/∂x = β3 + β4dt:
the slope depends on the value of dt.
yt = β1 + β2dt + β3xt + β4dtxt + εt
Multiple dummy variables: we can use more than one dummy variable in a
model. We will study models of the form
yt = β1 + β2dt1 + β3dt2 + β4xt + εt
yt = β1 + β2dt1 + β3dt2 + β4dt1dt2 + β5xt + εt
Incorrect usage: You should understand why the following models are not
correct usages of dummy variables:
(1) overparameterization:
yt = β1 + β2dt + β3(1− dt) + εt
(2) multiple values assigned to multiple categories. Suppose that we have a condition
that defines 4 possible categories, and we create a variable d = 1 if the
observation is in the first category, d = 2 if in the second, etc. (This is not,
strictly speaking, a dummy variable, according to our definition.) Why is
the following model not a good one?
yt = β1 + β2d + εt
What is the correct way to deal with this situation?
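One common remedy, sketched here with made-up data: replace the single multi-valued variable with one 0/1 dummy per category (and drop one of them, or the intercept, to avoid the overparameterization problem above):

```python
import pandas as pd

d = pd.Series([1, 3, 2, 4, 1, 2])  # category codes 1..4 - NOT a dummy

# One 0/1 column per category. If the model keeps an intercept, one of
# these columns must be dropped to avoid exact collinearity.
dummies = pd.get_dummies(d, prefix="cat").astype(int)
print(list(dummies.columns))  # ['cat_1', 'cat_2', 'cat_3', 'cat_4']
```

Each observation gets exactly one 1 across the four columns, so the columns sum to a column of ones.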
2.4. Additional Details
Seasonality and dummy variables. Dummy variables can be used to treat
seasonal variation in data. We will use the Keeling-Whorf.gdt data to illustrate
this. You should be able to use GRETL to reproduce the following results:
Model 1: OLS estimates using the 468 observations 1965:01-2003:12
Dependent variable: CO2
Variable Coefficient Std. Error t-statistic p-value
djan 316.864 0.210610 1504.5009 0.0000
dfeb 317.533 0.210789 1506.4046 0.0000
dmar 318.271 0.210967 1508.6276 0.0000
dapr 319.418 0.211147 1512.7780 0.0000
dmay 319.848 0.211327 1513.5233 0.0000
djun 319.187 0.211507 1509.1057 0.0000
djul 317.653 0.211688 1500.5705 0.0000
daug 315.539 0.211870 1489.3056 0.0000
dsep 313.690 0.212052 1479.3061 0.0000
doct 313.548 0.212235 1477.3572 0.0000
dnov 314.792 0.212419 1481.9367 0.0000
ddec 315.961 0.212603 1486.1530 0.0000
time 0.121327 0.000404332 300.0664 0.0000
Mean of dependent variable 345.310
S.D. of dependent variable 16.5472
Sum of squared residuals 634.978
Standard error of residuals (σ) 1.18134
Unadjusted R2 0.995034
Adjusted R2 0.994903
F (12, 455) 7597.57
Durbin-Watson statistic 0.0634062
and the plot in Figure 2.4.1.
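The same kind of specification (twelve monthly dummies plus a time trend, no separate constant) can be sketched numerically. The sketch below uses simulated monthly data, not the Keeling-Whorf series:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120                                  # 10 years of monthly data
time = np.arange(1, T + 1)
month = (time - 1) % 12                  # 0 = Jan, ..., 11 = Dec

# Simulated series: trend plus a deterministic seasonal pattern.
y = 300 + 0.12 * time + np.sin(2 * np.pi * month / 12) \
    + 0.1 * rng.standard_normal(T)

# Regressors: one dummy per month, plus the trend (13 columns in all).
D = np.zeros((T, 12))
D[np.arange(T), month] = 1.0
X = np.column_stack([D, time])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[:12] are the monthly intercepts; beta[12] estimates the trend
# slope, close to the true value 0.12.
print(round(beta[12], 3))
```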
Multiple parameterizations. To formulate a model that conditions on a given
set of categorical information, there are multiple ways to use dummy variables. For
Figure 2.4.1. Keeling-Whorf CO2 data, fit using monthly dummies
example, the two models
yt = β1dt + β2(1− dt) + β3xt + β4dtxt + εt
and
yt = α1 + α2dt + α3xtdt + α4xt(1− dt) + εt
are equivalent. You should know the 4 equations that relate the βj parameters
to the αj parameters, j = 1, 2, 3, 4. You should know how to interpret the
parameters of both models.
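The equivalence is easy to verify numerically. A sketch with simulated data (parameter values and sample size are arbitrary): both parameterizations span the same column space, so OLS gives identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, n).astype(float)      # a dummy variable
x = rng.standard_normal(n)                   # a continuous regressor
y = 1.0 + 0.5 * d + 0.8 * x - 0.3 * d * x \
    + 0.2 * rng.standard_normal(n)

# Parameterization 1: y = b1*d + b2*(1-d) + b3*x + b4*d*x + e
X1 = np.column_stack([d, 1 - d, x, d * x])
# Parameterization 2: y = a1 + a2*d + a3*x*d + a4*x*(1-d) + e
X2 = np.column_stack([np.ones(n), d, x * d, x * (1 - d)])

b, *_ = np.linalg.lstsq(X1, y, rcond=None)
a, *_ = np.linalg.lstsq(X2, y, rcond=None)

# The fitted values coincide exactly (up to rounding error).
print(np.allclose(X1 @ b, X2 @ a))  # True
```

Comparing the estimated b and a then gives the 4 relating equations asked for above.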
2.5. Primer Projecte de Docència Tutoritzada
You may work in groups of up to 5 students. The evaluation will form part of the
grade for the exercises. I recommend installing Gretl on a laptop computer with WiFi,
so that you can work comfortably. Before June 1, you must hand in a brief
report (10 pages maximum) on the following:
2.5.1. Theoretical background. For a firm that takes input prices w and the
output level q as given, the cost minimization problem is to choose the quantities of
inputs x to solve the problem
min_x w′x
subject to the restriction
f(x) = q.
The solution is the vector of factor demands x(w, q). The cost function is obtained
by substituting the factor demands into the criterion function:
C(w, q) = w′x(w, q).
• Monotonicity: Increasing factor prices cannot decrease cost, so
∂C(w, q)/∂w ≥ 0
Remember that these derivatives give the conditional factor demands (Shephard's
Lemma).
• Homogeneity: The cost function is homogeneous of degree 1 in input prices:
C(tw, q) = tC(w, q), where t is a scalar constant. This is because the factor
demands are homogeneous of degree zero in factor prices - they only depend
upon relative prices.
• Returns to scale: The returns to scale parameter γ is defined as the inverse
of the elasticity of cost with respect to output:
γ = ( (∂C(w, q)/∂q) (q/C(w, q)) )^(−1)
Constant returns to scale is the case where increasing production q implies
that cost increases in the proportion 1:1. If this is the case, then γ = 1.
2.5.2. Cobb-Douglas functional form. The Cobb-Douglas functional form
is linear in the logarithms of the regressors and the dependent variable. For a cost
function, if there are g factors, the Cobb-Douglas cost function has the form
C = A q^βq w1^β1 · · · wg^βg e^ε
What is the elasticity of C with respect to wj?
eC,wj = (∂C/∂wj)(wj/C)
      = βj A q^βq w1^β1 · · · wj^(βj−1) · · · wg^βg e^ε · wj / (A q^βq w1^β1 · · · wg^βg e^ε)
      = βj
This is one of the reasons the Cobb-Douglas form is popular - the coefficients are easy
to interpret, since they are the elasticities of the dependent variable with respect to
the explanatory variables. Note that in this case,
eC,wj = (∂C/∂wj)(wj/C) = xj(w, q) wj/C ≡ sj(w, q),
the cost share of the jth input. So with a Cobb-Douglas cost function, βj = sj(w, q).
The cost shares are constants.
Note that after a logarithmic transformation we obtain
ln C = α + βq ln q + β1 ln w1 + · · · + βg ln wg + ε
where α = ln A. So we see that the transformed model is linear in the logs of the
data.
One can verify that the property of HOD1 implies that
β1 + · · · + βg = 1
In other words, the cost shares add up to 1.
The hypothesis that the technology exhibits CRTS implies that
γ = 1/βq = 1
so βq = 1. Likewise, monotonicity implies that the coefficients satisfy βi ≥ 0, i = 1, ..., g.
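The three properties (elasticity = exponent, HOD1, and the CRTS condition) can be checked numerically. A sketch with made-up parameter values; the elasticity is approximated by a finite difference:

```python
import numpy as np

A, beta_q = 1.0, 1.2
beta = np.array([0.3, 0.5, 0.2])    # exponents on w1, w2, w3; sum to 1

def cost(w, q):
    """Cobb-Douglas cost function C = A * q^beta_q * prod(w_j^beta_j)."""
    return A * q**beta_q * np.prod(w**beta)

w = np.array([2.0, 1.5, 3.0])
q = 10.0

# Elasticity of C with respect to w1, via a small relative price change.
h = 1e-6
w_up = w.copy()
w_up[0] *= 1 + h
elasticity = (cost(w_up, q) - cost(w, q)) / (h * cost(w, q))
print(round(elasticity, 4))         # approximately beta[0] = 0.3

# Homogeneity of degree 1 in prices: doubling all prices doubles cost,
# because the exponents on the prices sum to 1.
print(np.isclose(cost(2 * w, q), 2 * cost(w, q)))  # True
```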
2.5.3. The Nerlove data and OLS. The file nerlove.xls contains data on 145
electric utility companies' cost of production, output and input prices. The data are
for the U.S., and were collected by M. Nerlove. The observations are by row, and the
columns are COMPANY, COST (C), OUTPUT (Q), PRICE OF LABOR
(PL), PRICE OF FUEL (PF) and PRICE OF CAPITAL (PK). Note that the
data are sorted by output level (the third column).
(1) Download the data nerlove.xls (it is an Excel file).
(2) Import the data into Gretl.
(3) Create logarithms of cost, output, labor, fuel, capital.
(4) Estimate by OLS the model
(2.5.1) ln(cost) = β1 + β2 ln(output) + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(5) Comment on the results, in general, and specifically with respect to homogeneity
of degree 1 and returns to scale.
(6) Create dummy variables
(a) d1 = 1 if 101 <= firm <= 129, d1 = 0 otherwise
(b) d2 = 1 if 201 <= firm <= 229, d2 = 0 otherwise
(c) d3 = 1 if 301 <= firm <= 329, d3 = 0 otherwise
(d) d4 = 1 if 401 <= firm <= 429, d4 = 0 otherwise
(e) d5 = 1 if 501 <= firm <= 529, d5 = 0 otherwise
(7) Estimate the model
(2.5.2)
ln(cost) = Σ(j=1 to 5) αjdj + Σ(j=1 to 5) γj[dj ln(output)] + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(8) Comment on the results, emphasizing returns to scale. Present a graph
showing returns to scale as a function of firm size. Interpret the graph.
(9) Test the restrictions α1 = α2 = α3 = α4 = α5 jointly with γ1 =
γ2 = γ3 = γ4 = γ5, and interpret the result.
2.6. Chapter Exercises
The professor of the practical session will give you a problem list. Problems
9.1, 9.2, 9.3, 9.5, 9.6, 9.13 and 9.15 on pages 311-320 of Gujarati's book are recommended
for study.
CHAPTER 3
Collinearity
3.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is collinearity?
(2) What are the effects of collinearity on the OLS estimator: how does it
affect estimation, hypothesis testing and prediction?
(3) How can the presence of collinearity be detected?
(4) What can be done to improve the situation if collinearity is a problem?
• Readings:
Gujarati, Econometría, (cuarta edición), Chapter 10: Multicolinealidad:
¿Qué pasa si las regresoras están correlacionadas?, pp. 327-371.
3.2. Motivation: Data on Mortality and Related Factors
The data set mortalitat.gdt contains annual data from 1947 to 1980 on death rates
in the U.S., along with data on factors like smoking and consumption of alcohol.
The data description is:
DATA4-7: Death rates in the U.S. due to coronary heart disease and their
determinants. Data compiled by Jennifer Whisenand
• chd = death rate per 100,000 population (Range 321.2 - 375.4)
• cal = Per capita consumption of calcium per day in grams (Range 0.9 -
1.06)
• unemp = Percent of civilian labor force unemployed, in 1,000s of persons 16
years and older (Range 2.9 - 8.5)
• cig = Per capita consumption of cigarettes in pounds of tobacco by persons
18 years and older, approx. 339 cigarettes per pound of tobacco (Range
6.75 - 10.46)
• edfat = Per capita intake of edible fats and oil in pounds, includes lard,
margarine and butter (Range 42 - 56.5)
• meat = Per capita intake of meat in pounds, includes beef, veal, pork, lamb
and mutton (Range 138 - 194.8)
• spirits = Per capita consumption of distilled spirits in taxed gallons for
individuals 18 and older (Range 1 - 2.9)
• beer = Per capita consumption of malted liquor in taxed gallons for individuals
18 and older (Range 15.04 - 34.9)
• wine = Per capita consumption of wine measured in taxed gallons for individuals
18 and older (Range 0.77 - 2.65)
Consider the following models, with their estimation results (standard errors in
parentheses):

chd = β1 + β2cig + β3spirits + β4beer + β5wine + ε

chd = 334.914 + 5.41216 cig + 36.8783 spirits − 5.10365 beer + 13.9764 wine
      (58.939)  (5.156)       (7.373)          (1.2513)       (12.735)
T = 34  R2 = 0.5528  F(4, 29) = 11.2  σ = 9.9945

chd = β1 + β2cig + β3spirits + β4beer + ε

chd = 353.581 + 3.17560 cig + 38.3481 spirits − 4.28816 beer
      (56.624)  (4.7523)      (7.275)          (1.0102)
T = 34  R2 = 0.5498  F(3, 30) = 14.433  σ = 10.028

chd = β1 + β2cig + β3spirits + β5wine + ε

chd = 243.310 + 10.7535 cig + 22.8012 spirits − 16.8689 wine
      (67.21)   (6.1508)      (8.0359)         (12.638)
T = 34  R2 = 0.3198  F(3, 30) = 6.1709  σ = 12.327

chd = β1 + β2cig + β3spirits + ε

chd = 181.219 + 16.5146 cig + 15.8672 spirits
      (49.119)  (4.4371)      (6.2079)
T = 34  R2 = 0.3026  F(2, 31) = 8.1598  σ = 12.481
Note how the signs of the coefficients change depending on the model, and that
the magnitude of the parameter estimates varies a lot too. The parameter estimates
are highly sensitive to the particular model we estimate. Why? We'll see that the
problem is that the data exhibit collinearity.
3.3. Definition and Basic Concepts
Collinearity (definition): Collinearity is the existence of linear relationships
amongst the regressors. We can always write
λ1x1 + λ2x2 + · · ·+ λKxK + v = 0
where xi is the ith column of the regressor matrix X, and v is an n × 1 vector. In
the case that collinearity exists, the variation in v is relatively small, so that
there is an approximately exact linear relation between the regressors.
• "relative" and "approximate" are imprecise terms, so the existence of collinearity
is also an imprecise, relative concept.
• many authors, including Gujarati, use the term multicollinearity. Some,
including myself, prefer to call the phenomenon collinearity. Collinearity
as used here means exactly what Gujarati and others refer to as multicollinearity.
Exact (or Perfect) Collinearity (definition):
In the extreme, if there are exact linear relationships, we can write
λ1x1 + λ2x2 + · · ·+ λKxK = 0
In this case, ρ(X) < K, so ρ(X′X) < K, so X′X is not invertible and the OLS estimator
is not uniquely defined. The existence of exact linear relationships amongst
the regressors is known as perfect collinearity or exact collinearity.
For example, if the model is
yt = β1 + β2x2t + β3x3t + εt
x2t = α1 + α2x3t
then we can write
yt = β1 + β2 (α1 + α2x3t) + β3x3t + εt
   = β1 + β2α1 + β2α2x3t + β3x3t + εt
   = (β1 + β2α1) + (β2α2 + β3)x3t + εt
   = γ1 + γ2x3t + εt
• The γ's can be consistently estimated, but since the γ's define two equations
in three β's, the β's can't be consistently estimated (there are multiple values
of β that solve the first order conditions that define the OLS estimator).
The β's are unidentified in the case of perfect collinearity.
3.4. When does it occur?
Perfect collinearity:
• Perfect collinearity is unusual, except in the case of an error in construction
of the regressor matrix, such as including the same regressor twice.
• Another case where perfect collinearity may be encountered is in models
with dummy variables, if one is not careful. Consider a model of the rental
price (yi) of an apartment. This could depend on factors such as size, quality
etc., collected in xi, as well as on the location of the apartment. Let Bi = 1
if the ith apartment is in Barcelona, Bi = 0 otherwise. Similarly, define Gi,
Ti and Li for Girona, Tarragona and Lleida. One could use a model such
as
yi = β1 + β2Bi + β3Gi + β4Ti + β5Li + x′iγ + εi
In this model, Bi + Gi + Ti + Li = 1, ∀i, so there is an exact relationship between
these variables and the column of ones corresponding to the constant.
One must either drop the constant, or one of the qualitative variables.
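The exact dependence is easy to see numerically. A small sketch with a made-up assignment of twelve apartments to the four cities:

```python
import numpy as np

n = 12
city = np.array([0, 1, 2, 3] * 3)    # 0=Barcelona, 1=Girona,
                                     # 2=Tarragona, 3=Lleida (made up)

# Bi, Gi, Ti, Li as columns of a dummy matrix.
D = np.zeros((n, 4))
D[np.arange(n), city] = 1.0

# Constant plus all four dummies: the dummies sum to the column of
# ones, an exact linear dependence, so X'X is singular.
X = np.column_stack([np.ones(n), D])
print(np.linalg.matrix_rank(X))      # 4, not 5

# Dropping the constant (or one of the dummies) restores full rank.
print(np.linalg.matrix_rank(D))      # 4
```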
Collinearity (inexact):
The more common case, if one doesn't make mistakes such as these, is the
existence of inexact linear relationships, i.e., correlations between the regressors
that are less than one in absolute value, but not zero. This is (unfortunately) quite
common with economic data.
• economic data is non-experimental, so a researcher cannot control the values
of the variables.
• common factors affect different variables at the same time, which tends to
induce correlations. Variables tend to move together over time (for example,
prices of apartments in Barcelona and in Valencia).
3.5. Consequences of Collinearity
The basic problem is that when two (or more) variables move together, it is difficult
to determine their separate influences. This is reflected in imprecise estimates,
i.e., estimates with high variances. With economic data, collinearity is commonly
encountered, and is often a severe problem.
Figure 3.5.1. s(β) when there is no collinearity
When there is collinearity, the minimizing point of the objective function that
defines the OLS estimator (s(β), the sum of squared errors) is relatively poorly
defined. This is seen in Figures 3.5.1 and 3.5.2.
To see the effect of collinearity on variances, partition the regressor matrix as
X = [x W]
where x is the first column of X (note: we can interchange the columns of X if
we like, so there's no loss of generality in considering the first column). Now, the
variance of β, under the classical assumptions, is
V(β) = (X′X)^(−1) σ2
Figure 3.5.2. s(β) when there is collinearity
Using the partition,
X′X = [ x′x   x′W
        W′x   W′W ]
and following a rule for partitioned inversion,
(X′X)^(−1)_(1,1) = (x′x − x′W(W′W)^(−1)W′x)^(−1)
                 = (x′(In − W(W′W)^(−1)W′)x)^(−1)
                 = (ESSx|W)^(−1)
where by ESSx|W we mean the error sum of squares obtained from the regression
x = Wλ + v.
Since
R2 = 1 − ESS/TSS,
we have
ESS = TSS(1 − R2),
so the variance of the coefficient corresponding to x is
V(βx) = σ2 / ( TSSx (1 − R2x|W) )
We see that three factors influence the variance of this coefficient. It will be high if
(1) σ2 is large
(2) there is little variation in x. Draw a picture here.
(3) there is a strong linear relationship between x and the other regressors, so
that W can explain the movement in x well. In this case, R2x|W will be close
to 1. As R2x|W → 1, V(βx) → ∞.
The last of these cases is collinearity.
Intuitively, when there are strong linear relations between the regressors, it is
difficult to determine the separate influence of the regressors on the dependent variable.
This can be seen by comparing the OLS objective function in the case of no
correlation between regressors with the objective function when there is correlation
between the regressors. See Figures 3.5.1 and 3.5.2.
Consequences - summary:
• the parameters associated with variables affected by collinearity have high
variances.
• high variances lead to low power when testing hypotheses.
• high variances lead to low t-statistics, broad confidence intervals, etc.
• the results are sensitive to small changes in the sample.
3.6. Detection of Collinearity
• The best way is simply to regress each explanatory variable in turn on the
remaining regressors. If any of these auxiliary regressions has a high R2,
there is a problem of collinearity. Furthermore, this procedure identifies
which parameters are affected.
Sometimes, we're only interested in certain parameters. Collinearity
isn't a problem if it doesn't affect what we're interested in estimating.
• An alternative is to examine the matrix of correlations between the regressors.
High correlations are sufficient but not necessary for severe collinearity.
There may be a near exact linear relationship between 3 variables without
the existence of any near exact linear relationship between pairs of variables.
• Also indicative of collinearity is that the model fits well (high R2), but
none of the variables is significantly different from zero (i.e., their separate
influences aren't well determined).
• In summary, the artificial regressions are the best approach if one wants to
be careful.
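The artificial-regression check is easy to automate. A sketch with simulated regressors, where x1 and x2 are nearly collinear by construction (all names here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
z = rng.standard_normal(n)
x1 = z + 0.05 * rng.standard_normal(n)   # x1 and x2 share the factor z
x2 = z + 0.05 * rng.standard_normal(n)
x3 = rng.standard_normal(n)              # unrelated to the others

def aux_r2(X, j):
    """R^2 from regressing column j of X on the others plus a constant."""
    y = X[:, j]
    W = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

X = np.column_stack([x1, x2, x3])
print(round(aux_r2(X, 0), 3))  # close to 1: x1 is well explained
print(round(aux_r2(X, 2), 3))  # small: x3 is not involved
```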
Example: using the mortalitat.gdt data discussed above (Section 3.2), we can use
the artificial regression approach, regressing spirits on the other regressors (cig, wine,
beer). The results are

spirits = −1.01350 + 0.0670534 cig + 0.0794414 beer + 0.313745 wine
          (1.4477)   (0.12709)       (0.02738)       (0.3101)
T = 34  R2 = 0.8907  F(3, 30) = 90.669  σ = 0.24749
(standard errors in parentheses)

Note that R2 is very high: we have a serious problem of collinearity. This explains
the instability of the parameters we found earlier when we tried several models in
Section 3.2.
3.7. Dealing with collinearity
Collinearity is a problem of an uninformative sample. The first question is: is all
the available information being used? Is more data available? Are there coefficient
restrictions that have been neglected? Picture illustrating how a restriction can solve
a problem of perfect collinearity.
There do exist specialized methods, such as ridge regression, principal components
analysis, etc., that can be used when there is a severe problem of collinearity, but
these topics are advanced and outside the scope of this course. These methods
present problems of their own; they are not clearly and obviously good solutions to
the problem.
In sum, collinearity is a fact of life in econometrics, and there is no clear solution
to the problem. It is important to be aware of its effects and to know when it is
present.
3.8. Segon Projecte de Docencia Tutoritzada
(1) For the Nerlove model of the cost of electricity production
ln(cost) = β1 + β2 ln(output) + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
which was explained in Section 2.5.3, use artificial regressions to check
for the existence of collinearity.
(2) What is the reason for the lack of significance of the coefficient β5 in the
Nerlove model? Give an economic interpretation.
(3) Verify the existence of collinearity in the mortality models presented
in Section 3.2. Download the data and run the relevant artificial
regressions. Also present the correlation matrix of the regressors cig,
spirits, wine, beer. Give an interpretation.
3.9. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition,
you might also consider exercises 10.5, 10.7, 10.9, 10.19, 10.30a, 10.30b from
Gujarati, pp. 361-371.
CHAPTER 4
Heteroscedasticity
4.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is heteroscedasticity?
(2) What are the properties of the OLS estimator when there is heteroscedasticity?
(3) What is the GLS estimator?
(4) What is the feasible GLS estimator?
(5) What are the properties of the (F)GLS estimator?
(6) How can the presence of heteroscedasticity be detected?
(7) How can we deal with heteroscedasticity if it is present?
• Readings:
Gujarati, Econometria, (cuarta edicion), Chapter 11: Heteroscedasticidad:
¿Qué pasa cuando la varianza del error no es constante?, pp. 372 - 424.
4.2. Motivation
One of the assumptions we've made up to now is that
εt ∼ IID(0, σ2),
or occasionally
εt ∼ IIN(0, σ2).
This model is quite unreasonable in many cases. Often, the variance of εt will
change depending on the values of the regressors, or there may be correlations be-
tween different εt, εs, s ≠ t. For example, consider the Nerlove model of section 2.5.3.
If we estimate the model in equation 5.9.1, a plot of the residuals versus log(output)
is in Figure 4.2.1. Note that the variance of the error appears to be larger for small
firms, and smaller for large firms. This seems to violate the classical assumption that
E(ε2t) = σ2, ∀t. If the variance is not constant, we have a problem of heteroscedas-
ticity. Note also in Figure 4.2.1 that there seems to be correlation in the residuals:
when a residual is positive, the next one is too in most cases. When a residual is
negative, the next one is more likely to be negative than positive. If this is the case,
it's a violation of the classical assumption that E(εtεs) = 0, t ≠ s, and we have a
problem of autocorrelation.
In this chapter and the next, we'll investigate the importance of these two
problems, and how to deal with them.
4.3. Basic Concepts and Definitions
Now we'll investigate the consequences of nonidentically and/or dependently
distributed errors. We'll assume fixed regressors for now, relaxing this admittedly
unrealistic assumption later. The model is
y = Xβ + ε
E(ε) = 0
V (ε) = Σ
Figure 4.2.1. Residuals of Nerlove model
where Σ is a general symmetric positive definite matrix.
• The case where Σ is a diagonal matrix gives uncorrelated, nonidentically
distributed errors. This is known as heteroscedasticity (HET).
• The case where Σ has the same number on the main diagonal but nonzero
elements off the main diagonal gives identically (assuming higher moments
are also the same) dependently distributed errors. This is known as auto-
correlation (AUT).
Heteroscedasticity (definition): Heteroscedasticity is the existence of errors that
have different variances. More precisely, there exist εi and εj such that V(εi) ≠ V(εj).
Autocorrelation (definition): Autocorrelation is the existence of errors that
are correlated with one another. More precisely, there exist distinct εi and εj such
that E(εiεj) ≠ 0.
• Note that the presence of HET implies that Σ will have different elements on
its main diagonal.
• If there is AUT, then at least some elements of Σ off the main diagonal will
be different from zero.
• When there is HET but not AUT, Σ will be a diagonal matrix.
• It is possible to have both HET and AUT at the same time. In this case,
Σ can be a general symmetric positive definite matrix.
4.4. Effects of Het. and Aut. on the OLS estimator
The least squares estimator is
β̂ = (X′X)−1X′y = β + (X′X)−1X′ε
• We have unbiasedness, as before.
• The variance of β̂ is
(4.4.1)   E[(β̂ − β)(β̂ − β)′] = E[(X′X)−1X′εε′X(X′X)−1] = (X′X)−1X′ΣX(X′X)−1
Due to this, any test statistic that is based upon an estimator of σ2 is
invalid, since there isn't any σ2: it doesn't exist as a feature of the true
process that generates the data. In particular, the formulas for the t, F, and χ2
based tests given above do not lead to statistics with these distributions.
• β̂ is still consistent, following exactly the same argument given before.
• If ε is normally distributed, then
β̂ ∼ N(β, (X′X)−1X′ΣX(X′X)−1)
The problem is that Σ is unknown in general, so this distribution won't be
useful for testing hypotheses.
• Without normality, we still have
√n(β̂ − β) = √n (X′X)−1X′ε = (X′X/n)−1 n−1/2 X′ε
Define the limiting variance of n−1/2X′ε (supposing a CLT applies) as
limn→∞ E(X′εε′X/n) = Ω
so we obtain
√n(β̂ − β) →d N(0, QX−1 Ω QX−1)
where QX = limn→∞ X′X/n.
Summary: with heteroscedasticity and/or autocorrelation, the OLS estimator
• is unbiased in the same circumstances in which it is unbiased with
i.i.d. errors
• has a different variance than before, so the previous test statistics aren't
valid
• is consistent
• is asymptotically normally distributed, but with a different limiting covari-
ance matrix. The previous test statistics aren't valid in this case for this reason.
• is inefficient, as is shown below.
4.5. The Generalized Least Squares (GLS) estimator
Suppose Σ were known. Then one could form the Cholesky decomposition
P′P = Σ−1
Here, P is an upper triangular matrix. We have
P′PΣ = In
so
P′PΣP′ = P′,
which implies that
PΣP′ = In
Consider the model
Py = PXβ + Pε,
or, making the obvious definitions,
y∗ = X∗β + ε∗.
The variance of ε∗ = Pε is
E(Pεε′P′) = PΣP′ = In
Therefore, the model
y∗ = X∗β + ε∗
E(ε∗) = 0
V(ε∗) = In
satisfies the classical assumptions. The GLS estimator is simply OLS applied to the
transformed model:
β̂GLS = (X∗′X∗)−1X∗′y∗ = (X′P′PX)−1X′P′Py = (X′Σ−1X)−1X′Σ−1y
The GLS estimator is unbiased in the same circumstances under which the OLS
estimator is unbiased. For example,
E(β̂GLS) = E[(X′Σ−1X)−1X′Σ−1y] = E[(X′Σ−1X)−1X′Σ−1(Xβ + ε)] = β.
The variance of the estimator can be calculated using
β̂GLS = (X∗′X∗)−1X∗′y∗ = (X∗′X∗)−1X∗′(X∗β + ε∗) = β + (X∗′X∗)−1X∗′ε∗
so
E[(β̂GLS − β)(β̂GLS − β)′] = E[(X∗′X∗)−1X∗′ε∗ε∗′X∗(X∗′X∗)−1]
= (X∗′X∗)−1X∗′X∗(X∗′X∗)−1 = (X∗′X∗)−1 = (X′Σ−1X)−1
Either of these last formulas can be used.
• All the previous results regarding the desirable properties of the least squares
estimator hold, when dealing with the transformed model, since the trans-
formed model satisfies the classical assumptions.
• Tests are valid, using the previous formulas, as long as we substitute X∗ in
place of X. Furthermore, any test that involves σ2 can set it to 1. This is
preferable to re-deriving the appropriate formulas.
• The GLS estimator is more efficient than the OLS estimator. This is a
consequence of the Gauss-Markov theorem, since the GLS estimator is based
on a model that satisfies the classical assumptions but the OLS estimator
is not. To see this directly, note that
Var(β̂) − Var(β̂GLS) = (X′X)−1X′ΣX(X′X)−1 − (X′Σ−1X)−1 = AΣA′
where A = (X′X)−1X′ − (X′Σ−1X)−1X′Σ−1. This may not seem obvi-
ous, but it is true, as you can verify by expanding AΣA′: the two cross
terms are each −(X′Σ−1X)−1 and the last term is +(X′Σ−1X)−1. Then, noting
that AΣA′ is a quadratic form in a positive definite matrix, we conclude
that AΣA′ is positive semi-definite, and that GLS is efficient relative to OLS.
• As one can verify by calculating the first order necessary conditions, the GLS
estimator is the solution to the minimization problem
β̂GLS = arg minβ (y − Xβ)′Σ−1(y − Xβ)
so the metric Σ−1 is used to weight the residuals.
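The GLS recipe (transform by P with P′P = Σ−1, then apply OLS) can be sketched in a few lines. This is an illustration in Python/NumPy with a simulated, known Σ, which of course isn't available in practice:

```python
import numpy as np

def gls(y, X, Sigma):
    """GLS as OLS on the transformed model: form P with P'P = Sigma^{-1}
    and regress P y on P X."""
    Sinv = np.linalg.inv(Sigma)
    # numpy's cholesky returns lower-triangular L with L @ L.T = Sinv,
    # so P = L.T is upper triangular and satisfies P.T @ P = Sinv
    P = np.linalg.cholesky(Sinv).T
    b, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)
    return b

# illustration with a known (simulated) Sigma: heteroscedastic errors
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.diag(np.linspace(0.5, 4.0, n))
y = X @ np.array([1.0, 2.0]) + np.sqrt(np.diag(Sigma)) * rng.normal(size=n)
b_gls = gls(y, X, Sigma)
# direct evaluation of (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, for comparison
Sinv = np.linalg.inv(Sigma)
b_direct = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
```

The two computations agree, which confirms that OLS on the transformed data reproduces (X′Σ−1X)−1X′Σ−1y.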
4.6. Feasible GLS
The problem is that Σ isn't usually known, so this estimator isn't available.
• Consider the dimension of Σ: it's an n × n matrix with (n2 − n)/2 + n =
(n2 + n)/2 unique elements.
• The number of parameters to estimate is larger than n and increases faster
than n. There's no way to devise an estimator that satisfies a law of large
numbers without adding restrictions.
• The feasible GLS estimator is based upon making sufficient assumptions
regarding the form of Σ so that a consistent estimator can be devised.
Suppose that we parameterize Σ as a function of X and θ, where θ may include β
as well as other parameters, so that
Σ = Σ(X, θ)
where θ is of fixed dimension. Assuming that the parametrization is correct, so that in
fact Σ = Σ(X, θ), and if we can consistently estimate θ, then we can consistently
estimate Σ (as long as Σ(X, θ) is a continuous function of θ). In this case,
Σ̂ = Σ(X, θ̂) →p Σ(X, θ)
If we replace Σ in the formulas for the GLS estimator with Σ̂, we obtain the FGLS
estimator. The FGLS estimator shares the same asymptotic properties as
GLS. These are
(1) Consistency
(2) Asymptotic normality
(3) Asymptotic efficiency if the errors are normally distributed (Cramer-Rao).
(4) Test procedures are asymptotically valid.
In practice, the usual way to proceed is
(1) Define a consistent estimator of θ. This is a case-by-case proposition, de-
pending on the parametrization Σ(θ). We'll see examples below.
(2) Form Σ̂ = Σ(X, θ̂)
(3) Calculate the Cholesky factorization P̂ = Chol(Σ̂−1).
(4) Transform the model using
P̂y = P̂Xβ + P̂ε
(5) Estimate using OLS on the transformed model.
4.7. Heteroscedasticity
Heteroscedasticity is the case where
E(εε′) = Σ
is a diagonal matrix, so that the errors are uncorrelated, but have different vari-
ances. Heteroscedasticity is usually thought of as associated with cross sectional
data, though there is absolutely no reason why time series data cannot also be
heteroscedastic. Actually, the popular ARCH (autoregressive conditionally het-
eroscedastic) models that you may hear about in your finance classes explicitly
assume that a time series is heteroscedastic.
Consider a supply function
qi = β1 + βpPi + βsSi + εi
where Pi is price and Si is some measure of the size of the ith firm. One might suppose
that unobservable factors (e.g., talent of managers, degree of coordination between
production units, etc.) account for the error term εi. If there is more variability in
these factors for large firms than for small firms, then εi may have a higher variance
when Si is high than when it is low.
Another example is individual demand:
qi = β1 + βpPi + βmMi + εi
where P is price and M is income. In this case, εi can reflect variations in preferences.
There are more possibilities for the expression of preferences when one is rich, so it is
possible that the variance of εi could be higher when M is high.
Add example of group means.
4.7.1. Detection. There exist many tests for the presence of heteroscedasticity.
We'll discuss three methods.
4.7.1.1. Goldfeld-Quandt. The sample is divided into three parts, with n1, n2
and n3 observations, where n1 + n2 + n3 = n. The model is estimated using the first
and third parts of the sample, separately, so that β̂1 and β̂3 will be independent.
Then we have
ε̂1′ε̂1/σ2 = ε1′M1ε1/σ2 →d χ2(n1 − K)
and
ε̂3′ε̂3/σ2 = ε3′M3ε3/σ2 →d χ2(n3 − K)
so
[ε̂1′ε̂1/(n1 − K)] / [ε̂3′ε̂3/(n3 − K)] →d F(n1 − K, n3 − K).
The distributional result is exact if the errors are normally distributed. This test is
a two-tailed test. Alternatively, and probably more conventionally, if one has prior
ideas about the possible magnitudes of the variances of the observations, one could
order the observations accordingly, from largest to smallest. In this case, one would
use a conventional one-tailed F-test. Draw picture.
• Ordering the observations is an important step if the test is to have any
power.
• The motive for dropping the middle observations is to increase the differ-
ence between the average variance in the subsamples, supposing that there
exists heteroscedasticity. This can increase the power of the test. On the
other hand, dropping too many observations will substantially increase the
variance of the statistics ε̂1′ε̂1 and ε̂3′ε̂3. A rule of thumb, based on Monte
Carlo experiments, is to drop around 25% of the observations.
• If one doesn't have any ideas about the form of the heteroscedasticity, the test will
probably have low power since a sensible data ordering isn't available.
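The computation can be sketched as follows (Python/NumPy, simulated data; this variant uses two equal-sized outer parts after dropping the middle 25%, which is one common way of implementing the split described above):

```python
import numpy as np

def goldfeld_quandt(y, X, drop_frac=0.25):
    """Goldfeld-Quandt statistic.  The data are assumed to be ordered so
    that observations suspected of having the larger variance come first.
    The middle drop_frac of the sample is dropped, the model is fit
    separately on the two remaining parts, and the ratio of the two
    SSR/df terms is returned."""
    n, k = X.shape
    n1 = (n - int(drop_frac * n)) // 2
    def ssr(yp, Xp):
        b, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
        e = yp - Xp @ b
        return e @ e
    return (ssr(y[:n1], X[:n1]) / (n1 - k)) / (ssr(y[n - n1:], X[n - n1:]) / (n1 - k))

# hypothetical data: error variance rises with x; observations ordered
# from largest suspected variance to smallest, as in the one-tailed test
rng = np.random.default_rng(1)
n = 300
x = np.sort(rng.uniform(0.0, 10.0, n))[::-1]
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
gq = goldfeld_quandt(y, X)
```

With this ordering a value of the statistic well above 1 points to heteroscedasticity, to be compared with the appropriate F critical value.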
4.7.1.2. White's test. When one has little idea whether there exists heteroscedasticity,
and no idea of its potential form, the White test is a possibility. The idea is that if
there is homoscedasticity, then
E(ε2t | xt) = σ2, ∀t
so that xt or functions of xt shouldn't help to explain E(ε2t). The test works as
follows:
(1) Since εt isn't available, use the consistent estimator ε̂t instead.
(2) Regress
ε̂2t = σ2 + z′tγ + vt
where zt is a P-vector. zt may include some or all of the variables in xt, as
well as other variables. White's original suggestion was to use xt, plus the
set of all unique squares and cross products of variables in xt.
(3) Test the hypothesis that γ = 0. The qF statistic in this case is
qF = [(ESSR − ESSU)/P] / [ESSU/(n − P − 1)]
Note that ESSR = TSSU, so dividing both numerator and denominator by
this we get
qF = (n − P − 1) R2/(1 − R2)
Note that this is the R2 of the artificial regression used to test for het-
eroscedasticity, not the R2 of the original model.
An asymptotically equivalent statistic, under the null of no heteroscedasticity (so
that R2 should tend to zero), is
nR2 ∼a χ2(P).
This doesn't require normality of the errors, though it does assume that the fourth
moment of εt is constant, under the null. Question: why is this necessary?
• The White test has the disadvantage that it may not be very powerful unless
the zt vector is chosen well, and this is hard to do without knowledge of the
form of heteroscedasticity.
• It also has the problem that specification errors other than heteroscedastic-
ity may lead to rejection.
• Note: the null hypothesis of this test may be interpreted as θ = 0 for the
variance model V(εt) = h(α + z′tθ), where h(·) is an arbitrary function
of unknown form. The test is more general than it may appear from the
regression that is used.
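A sketch of the nR2 version of the test, using White's original choice of zt (the regressors plus their unique squares and cross products); the data are simulated for illustration, and gretl offers the same test from a model's Tests menu:

```python
import numpy as np

def white_test(y, X):
    """White's nR2 statistic.  X must contain the constant in its first
    column; z_t is x_t plus all unique squares and cross products."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    prods = [X[:, i] * X[:, j] for i in range(1, k) for j in range(i, k)]
    Z = np.column_stack([X] + prods)
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    v = e2 - Z @ g
    tss = (e2 - e2.mean()) @ (e2 - e2.mean())
    r2 = 1.0 - (v @ v) / tss
    return n * r2, Z.shape[1] - 1   # statistic and its chi^2 df, P

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y_het = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
y_hom = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
stat_het, P = white_test(y_het, X)
stat_hom, _ = white_test(y_hom, X)
```

With heteroscedastic errors the statistic is far beyond the χ2(P) critical value, while with homoscedastic errors it stays small.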
4.7.1.3. Plotting the residuals. A very simple method is to simply plot the resid-
uals (or their squares). Draw pictures here. Like the Goldfeld-Quandt test, this will
be more informative if the observations are ordered according to the suspected form
of the heteroscedasticity.
4.7.2. Dealing with heteroscedasticity if it is present. Correcting for het-
eroscedasticity requires that a parametric form for Σ(θ) be supplied, and that a
means for estimating θ consistently be determined. The estimation method will
be specific to the form supplied for Σ(θ). We'll consider two examples, multiplicative
HET and HET by groups. Before this, let's consider using OLS, even if we have
HET. The advantage of this is that we don't need to specify the form of Σ(θ).
4.7.2.1. OLS with heteroscedasticity-consistent covariance matrix estimation. Eicker
(1967) and White (1980) showed how to modify test statistics to account for het-
eroscedasticity of unknown form. The OLS estimator has asymptotic distribution
√n(β̂ − β) →d N(0, QX−1 Ω QX−1)
as we've already seen. Recall that we defined
limn→∞ E(X′εε′X/n) = Ω
This matrix has dimension K × K and can be consistently estimated, even if we
can't estimate Σ consistently. The consistent estimator, under heteroscedasticity
but no autocorrelation, is
Ω̂ = (1/n) ∑t=1..n xt x′t ε̂2t
One can then modify the previous test statistics to obtain tests that are valid when
there is heteroscedasticity of unknown form. For example, the Wald test for H0 :
Rβ − r = 0 would be
n(Rβ̂ − r)′ [R (X′X/n)−1 Ω̂ (X′X/n)−1 R′]−1 (Rβ̂ − r) ∼a χ2(q)
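In finite samples the same idea gives the familiar "sandwich" covariance estimate V̂(β̂) = (X′X)−1 [∑t xtx′t ε̂2t] (X′X)−1, where the 1/n factors cancel. A sketch with simulated data (gretl produces these robust standard errors as an estimation option):

```python
import numpy as np

def ols_hc0(y, X):
    """OLS with the Eicker-White heteroscedasticity-consistent covariance
    estimate (X'X)^{-1} X' diag(e_t^2) X (X'X)^{-1}."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    bread = np.linalg.inv(X.T @ X)
    meat = (X * (e ** 2)[:, None]).T @ X
    return b, bread @ meat @ bread

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
b, V = ols_hc0(y, X)
se_robust = np.sqrt(np.diag(V))
```

The robust standard errors can then be used for t and Wald tests that remain valid under heteroscedasticity of unknown form.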
4.7.2.2. Multiplicative heteroscedasticity. Suppose the model is
yt = x′tβ + εt
σ2t = E(ε2t) = (z′tγ)^δ
but the other classical assumptions hold. In this case
ε2t = (z′tγ)^δ + vt
and vt has mean zero. Nonlinear least squares could be used to estimate γ and
δ consistently, were εt observable. The solution is to substitute the squared OLS
residuals ε̂2t in place of ε2t, since ε̂2t is consistent by the Slutsky theorem. Once we
have γ̂ and δ̂, we can estimate σ2t consistently using
σ̂2t = (z′tγ̂)^δ̂ →p σ2t.
In the second step, we transform the model by dividing by the estimated standard
deviation:
yt/σ̂t = (x′t/σ̂t)β + εt/σ̂t
or
y∗t = x∗′t β + ε∗t.
Asymptotically, this model satisfies the classical assumptions.
• This model is a bit complex in that NLS is required to estimate the model
of the variance. A simpler version would be
yt = x′tβ + εt
σ2t = E(ε2t) = σ2 zt^δ
where zt is a single variable. There are still two parameters to be estimated,
and the model of the variance is still nonlinear in the parameters. However,
the search method can be used in this case to reduce the estimation problem
to repeated applications of OLS.
• First, we define an interval of reasonable values for δ, e.g., δ ∈ [0, 3].
• Partition this interval into M equally spaced values, e.g., 0, .1, .2, ..., 2.9, 3.
• For each of these values δm, calculate the variable zt^δm.
• The regression
ε̂2t = σ2 zt^δm + vt
is linear in the parameters, conditional on δm, so one can estimate σ2 by
OLS.
• Save the pairs (σ̂2m, δm), and the corresponding ESSm. Choose the pair with
the minimum ESSm as the estimate.
• Next, divide the model by the estimated standard deviations.
• One can refine the grid around the chosen value. Draw picture.
• This works well when the parameter to be searched over is low dimensional, as
in this case.
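The search steps can be sketched as below for the single-variable case σ2t = σ2 zt^δ (Python/NumPy, simulated data; the grid, names, and data are our own):

```python
import numpy as np

def multiplicative_fgls(y, X, z, deltas=np.arange(0.0, 3.01, 0.1)):
    """Grid-search FGLS for the variance model sigma2_t = sigma2 * z_t**delta.
    For each delta on the grid, e_hat^2 = sigma2 * z**delta is linear in
    sigma2 (one regressor, no constant), so sigma2 comes from OLS; the
    (sigma2, delta) pair with smallest ESS is kept, and the model is then
    re-estimated after dividing by the fitted standard deviations."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    best_ess, best_s2, best_d = np.inf, None, None
    for d in deltas:
        w = z ** d
        s2 = (w @ e2) / (w @ w)          # OLS slope without constant
        ess = np.sum((e2 - s2 * w) ** 2)
        if ess < best_ess:
            best_ess, best_s2, best_d = ess, s2, d
    sd = np.sqrt(best_s2 * z ** best_d)
    bw, *_ = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)
    return bw, best_s2, best_d

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.0, 10.0, n)
z = rng.uniform(1.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.sqrt(0.5 * z ** 2) * rng.normal(size=n)
bw, s2, d = multiplicative_fgls(y, X, z)
```

Each grid point costs only one OLS regression, which is the point of the search method.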
4.7.2.3. Groupwise heteroscedasticity. A common case is where we have repeated
observations on each of a number of economic agents: e.g., 10 years of macroeco-
nomic data on each of a set of countries or regions, or daily observations of trans-
actions of 200 banks. This sort of data is a pooled cross-section time-series model.
It may be reasonable to presume that the variance is constant over time within
the cross-sectional units, but that it differs across them (e.g., firms or countries of
different sizes...). The model is
yit = x′itβ + εit
E(ε2it) = σ2i, ∀t
where i = 1, 2, ..., G are the agents, and t = 1, 2, ..., n are the observations on each
agent.
• The other classical assumptions are presumed to hold.
• In this case, the variance σ2i is specific to each agent, but constant over the
n observations for that agent.
• In this model, we assume that E(εitεis) = 0, t ≠ s. This is a strong assumption
that we'll relax later.
To correct for heteroscedasticity, just estimate each σ2i using the natural estimator:
σ̂2i = (1/n) ∑t=1..n ε̂2it
• Note that we use 1/n here since it's possible that there are more than n
regressors, so n − K could be negative. Asymptotically the difference is
unimportant.
• With each of these, transform the model as usual:
yit/σ̂i = (x′it/σ̂i)β + εit/σ̂i
Do this for each cross-sectional group. The transformed model satisfies the
classical assumptions, asymptotically.
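A sketch of groupwise FGLS with two simulated groups (Python/NumPy; σ̂2i uses the 1/n divisor discussed above, and all names and data are our own):

```python
import numpy as np

def groupwise_fgls(y, X, groups):
    """FGLS under groupwise heteroscedasticity: sigma_i^2 is estimated as
    the mean squared OLS residual within group i, and the model is then
    divided through by sigma_i."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    sd = np.empty(len(y))
    for g in np.unique(groups):
        m = groups == g
        sd[m] = np.sqrt(np.mean(e[m] ** 2))
    bw, *_ = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)
    return bw, sd

# hypothetical two-group example: group 1 errors are 3x as volatile
rng = np.random.default_rng(5)
n = 400
groups = np.repeat([0, 1], n // 2)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
sigma = np.where(groups == 0, 1.0, 3.0)
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)
bw, sd = groupwise_fgls(y, X, groups)
```

The estimated group standard deviations recover the true values 1 and 3 quite closely, and the weighted regression reuses plain OLS on the transformed data.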
4.8. Example
4.8.1. Example: the Nerlove model. Let's check the Nerlove data for evi-
dence of heteroscedasticity. In what follows, we're going to use the model with the
constant and output coefficient varying across 5 groups, but with the input price
coefficients fixed (see Equation 2.5.2). If you plot the residuals of this model, you
obtain Figure 4.8.1. We can see pretty clearly that the error variance is larger for
small firms than for larger firms.
As part of your next Docencia Tutoritzada project, you will use the White and
Goldfeld-Quandt tests to confirm that homoscedasticity is strongly rejected.
Figure 4.8.1. Residuals, Nerlove model, sorted by firm size
4.9. Tercer Projecte de Docència Tutoritzada
(1) Wisconsin data
(a) Download the Wisconsin data on height and income
(b) Select the observations with complete information on height and
income.
(c) Create a dummy variable indicating whether the person is a woman or a man
(d) Create new variables "AD" and "IQD" that express height and IQ in devia-
tions from their sample means.
(e) Estimate the model renda = b1 + b2*Dona + b3*AD + b4*(Dona*AD)
+ b5*IQD + e by OLS.
(f) Comment on the results
(g) Check whether there is heteroscedasticity
(i) by plotting the residuals
(ii) with the Goldfeld-Quandt test
(iii) with the White test
(h) Estimate again by OLS, but with robust standard errors.
Compare the results with the previous ones.
(i) Do a Generalized LS estimation, supposing that there is groupwise heteroscedas-
ticity. There are two groups - men and women. Comment on the
results.
(j) Do a Generalized LS estimation, using GRETL's heteroskedasticity
correction option. Comment on the results.
(2) Nerlove data
(a) Re-estimate the model with dummy variables and interaction terms
from the Primer Projecte de Docència Tutoritzada
ln(cost) = ∑j=1..5 αjdj + ∑j=1..5 γj[dj ln(output)] + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(b) Test the null hypothesis "the errors are homoscedastic" with the
White test.
(c) Make plots of the residuals, and comment on whether heteroscedasticity is detected.
You should obtain a plot similar to Figure 4.8.1.
(d) Do a Generalized LS estimation, using GRETL's heteroskedasticity
correction option. Comment on the results.
4.10. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition,
you might also consider exercises 11.1, 11.2, 11.6, 11.15, 11.16 from Gujarati, pp.
413-421.
CHAPTER 5
Autocorrelation
5.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is autocorrelation (AUT)?
(2) What are the properties of the OLS estimator when there is autocorrelation?
(3) How can the presence of autocorrelation be detected?
(4) How can we deal with autocorrelation if it is present?
• Readings:
Gujarati, Econometria, (cuarta edicion), Chapter 12: Autocorrelación:
¿qué sucede si los términos error están correlacionados?, pp. 425 - 486.
5.2. Motivation
Autocorrelation, which is the serial correlation of the error term, so that E(εtεs) ≠
0 for t ≠ s, is a problem that is usually associated with time series data, but it can
also affect cross-sectional data. For example, a shock to oil prices will simultaneously
affect all countries, so one could expect contemporaneous correlation of macroeco-
nomic variables across countries. Seasonality is another common problem.
Consider the Keeling-Whorf.gdt data. If we regress CO2 concentration on a time
trend, we obtain the fitted line in Figure 5.2.1. The residuals from the same model are in
Figure 5.2.2. In addition to a high frequency monthly pattern in the residuals, there
is a long term low frequency wave. It is clear that the errors of this model are not
independent over time. This is an example of autocorrelation.
Figure 5.2.1. Keeling-Whorf CO2 data, fit using time trend
Figure 5.2.2. Keeling-Whorf CO2 data, residuals using time trend
If you examine the residuals of the simple Nerlove model (equation 5.9.1) in
Figure 4.8.1, you can also detect that there appears to be autocorrelation.
In this chapter, we will explore the causes, effects and treatments for AUT.
5.3. Causes
Autocorrelation is the existence of correlation across the error terms:
E(εtεs) ≠ 0, t ≠ s.
Why might this occur? Plausible explanations include:
(1) Lags in adjustment to shocks. In a model such as
yt = x′tβ + εt,
one could interpret x′tβ as the equilibrium value. Suppose xt is constant over
a number of observations. One can interpret εt as a shock that moves the
system away from equilibrium. If the time needed to return to equilibrium
is long with respect to the observation frequency, one could expect εt+1 to
be positive, conditional on εt positive, which induces a correlation.
(2) Unobserved factors that are correlated over time. The error term is often
assumed to correspond to unobservable factors. If these factors are corre-
lated, there will be autocorrelation.
(3) Misspecification of the model. Suppose that the data generating process
(DGP) is
yt = β0 + β1xt + β2x2t + εt
but we estimate
yt = β0 + β1xt + εt
Figure 5.3.1. Autocorrelation induced by misspecication
The effects are illustrated in Figure 5.3.1. A similar problem might explain
the residuals of the simple Nerlove model, in Figure 4.2.1.
5.4. Effects on the OLS estimator
The variance of the OLS estimator is the same as in the case of heteroscedasticity
- the standard formula does not apply. The correct formula is given in equation 4.4.1.
Next we discuss two GLS corrections for OLS.
5.5. Corrections
There are many types of autocorrelation. The way to correct for the problem
depends on the exact type of autocorrelation that exists. We'll consider two ex-
amples. The first is the most commonly encountered case: autoregressive order 1
(AR(1)) errors.
5.5.1. AR(1). The model is
yt = x′tβ + εt
εt = ρεt−1 + ut
ut ∼ iid(0, σ2u)
E(εtus) = 0, t < s
We assume that the model satisfies the other classical assumptions.
• We need a stationarity assumption: |ρ| < 1. Otherwise the variance of εt
explodes as t increases, so standard asymptotics will not apply.
• By recursive substitution we obtain
εt = ρεt−1 + ut = ρ(ρεt−2 + ut−1) + ut = ρ^2 εt−2 + ρut−1 + ut
= ρ^2 (ρεt−3 + ut−2) + ρut−1 + ut = · · ·
In the limit the lagged ε drops out, since ρ^m → 0 as m → ∞, so we obtain
εt = ∑m=0..∞ ρ^m ut−m
With this, the variance of εt is found as
E(ε2t) = σ2u ∑m=0..∞ ρ^2m = σ2u/(1 − ρ2)
• If we had directly assumed that εt were covariance stationary, we could
obtain this using
V(εt) = ρ2 E(ε2t−1) + 2ρ E(εt−1ut) + E(u2t) = ρ2 V(εt) + σ2u,
so
V(εt) = σ2u/(1 − ρ2)
• The variance is the 0th order autocovariance: γ0 = V(εt)
• Note that the variance does not depend on t
Likewise, the first order autocovariance γ1 is
γ1 = Cov(εt, εt−1) = E[(ρεt−1 + ut) εt−1] = ρV(εt) = ρσ2u/(1 − ρ2)
• Using the same method, we find that for s < t
Cov(εt, εt−s) = γs = ρ^s σ2u/(1 − ρ2)
• The autocovariances don't depend on t: the process εt is covariance sta-
tionary
The correlation (in general, for r.v.'s x and y) is defined as
corr(x, y) = cov(x, y)/[sd(x) sd(y)]
but in this case the two standard deviations are the same, so the s-order autocorrelation
ρs is
ρs = ρ^s
• All this means that the overall matrix Σ has the form

Σ = [σ2u/(1 − ρ2)] ×
[ 1          ρ          ρ^2    · · ·  ρ^(n−1)
  ρ          1          ρ      · · ·  ρ^(n−2)
  ...                   ...           ...
  ρ^(n−1)    ρ^(n−2)    · · ·  ρ      1       ]

where the leading factor is the common variance and the matrix is the
correlation matrix. So we have homoscedasticity, but the elements off the
main diagonal are not zero. All of this depends on only two parameters,
ρ and σ2u. If we can estimate these consistently, we can apply FGLS.
It turns out that it's easy to estimate these consistently. The steps are
(1) Estimate the model yt = x′tβ + εt by OLS.
(2) Take the residuals, and estimate the model
ε̂t = ρε̂t−1 + u∗t
Since ε̂t →p εt, this regression is asymptotically equivalent to the regression
εt = ρεt−1 + ut
which satisfies the classical assumptions. Therefore, the ρ̂ obtained by applying
OLS to ε̂t = ρε̂t−1 + u∗t is consistent. Also, since û∗t →p ut, the estimator
σ̂2u = (1/n) ∑t=2..n (û∗t)^2 →p σ2u
(3) With the consistent estimators σ̂2u and ρ̂, form Σ̂ = Σ(σ̂2u, ρ̂) using the
previous structure of Σ, and estimate by FGLS. Actually, one can omit the
factor σ2u/(1 − ρ2), since it cancels out in the formula
β̂FGLS = (X′Σ̂−1X)−1(X′Σ̂−1y).
• An asymptotically equivalent approach is to simply estimate the trans-
formed model
yt − ρ̂yt−1 = (xt − ρ̂xt−1)′β + u∗t
using n − 1 observations (since y0 and x0 aren't available). This is the
method of Cochrane and Orcutt. Dropping the first observation is asymp-
totically irrelevant, but it can be very important in small samples. One can
recover the first observation by putting
y∗1 = y1 √(1 − ρ̂2)
x∗1 = x1 √(1 − ρ̂2)
Note that the variance of y∗1 is σ2u, asymptotically, so we see that the trans-
formed model will be homoscedastic (and nonautocorrelated, since the u's
are uncorrelated with the y's in different time periods).
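One round of the Cochrane-Orcutt procedure, with the first observation recovered via the √(1 − ρ̂2) scaling, can be sketched as follows (Python/NumPy, simulated data; gretl implements AR(1) corrections of this kind as estimation commands):

```python
import numpy as np

def cochrane_orcutt(y, X):
    """One Cochrane-Orcutt round: estimate rho by regressing the OLS
    residual on its own lag, then apply OLS to the quasi-differenced
    model, keeping the first observation with the sqrt(1-rho^2) scaling."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
    s = np.sqrt(1.0 - rho ** 2)
    ys = np.concatenate([[y[0] * s], y[1:] - rho * y[:-1]])
    Xs = np.vstack([X[0] * s, X[1:] - rho * X[:-1]])
    bw, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return bw, rho

# simulated model with AR(1) errors, rho = 0.8
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + eps
bw, rho_hat = cochrane_orcutt(y, X)
```

The estimated ρ̂ lands close to the true 0.8, and the quasi-differenced regression recovers β with valid classical inference, asymptotically.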
5.5.2. MA(1). The linear regression model with moving average order 1 errors
is
yt = x′tβ + εt
εt = ut + φut−1
ut ∼ iid(0, σ2u)
E(εtus) = 0, t < s
In this case,
V(εt) = γ0 = E[(ut + φut−1)^2] = σ2u + φ2σ2u = σ2u(1 + φ2)
Similarly
γ1 = E[(ut + φut−1)(ut−1 + φut−2)] = φσ2u
and
γ2 = E[(ut + φut−1)(ut−2 + φut−3)] = 0
so in this case

Σ = σ2u ×
[ 1 + φ2   φ        0       · · ·  0
  φ        1 + φ2   φ              ...
  0        φ        . . .          0
  ...               . . .   φ
  0        · · ·    φ       1 + φ2 ]

Note that the first order autocorrelation is
ρ1 = γ1/γ0 = φσ2u/[σ2u(1 + φ2)] = φ/(1 + φ2)
• This achieves a maximum at φ = 1 and a minimum at φ = −1, and the
maximal and minimal first-order autocorrelations are 1/2 and −1/2. Therefore,
series that are more strongly autocorrelated can't be MA(1) processes.
Again the covariance matrix has a simple structure that depends on only two pa-
rameters. The problem in this case is that one can't estimate φ using OLS on
εt = ut + φut−1
because the ut are unobservable and they can't be estimated consistently. However,
there is a simple way to estimate the parameters.
• Since the model is homoscedastic, we can estimate
V(εt) = σ2ε = σ2u(1 + φ2)
using the typical estimator:
σ̂2ε = (1/n) ∑t=1..n ε̂2t
• By the Slutsky theorem, we can interpret this as defining an (unidentified)
estimator of both σ2u and φ, e.g., use this as
σ̂2u(1 + φ̂2) = (1/n) ∑t=1..n ε̂2t
However, this isn't sufficient to define consistent estimators of the parame-
ters, since it's unidentified.
• To solve this problem, estimate the covariance of εt and εt−1 using
Ĉov(εt, εt−1) = (1/n) ∑t=2..n ε̂t ε̂t−1
This is a consistent estimator, following a LLN (and given that the ε̂'s
are consistent for the ε's). As above, this can be interpreted as
defining an unidentified estimator:
φ̂σ̂2u = (1/n) ∑t=2..n ε̂t ε̂t−1
• Now solve these two equations to obtain identified (and therefore consistent)
estimators of both φ and σ2u. Define the consistent estimator
Σ̂ = Σ(φ̂, σ̂2u)
following the form we've seen above, and transform the model using the
Cholesky decomposition. The transformed model satisfies the classical as-
sumptions asymptotically.
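The two moment equations can in fact be solved explicitly: ρ1 = φ/(1 + φ2) gives the quadratic ρ1φ2 − φ + ρ1 = 0, whose invertible root (|φ| < 1) is φ = [1 − √(1 − 4ρ1²)]/(2ρ1), which requires |ρ1| < 1/2 as noted above. A sketch with simulated errors (Python/NumPy; in practice ε̂ would be OLS residuals):

```python
import numpy as np

def ma1_moments(e):
    """Recover (phi, sigma2_u) from gamma0_hat = mean(e^2) and
    gamma1_hat = mean(e_t e_{t-1}), using gamma0 = sigma2_u (1 + phi^2)
    and gamma1 = phi sigma2_u.  Takes the invertible root |phi| < 1;
    requires the sample first-order autocorrelation |rho1| < 1/2."""
    g0 = np.mean(e ** 2)
    g1 = np.mean(e[1:] * e[:-1])
    rho1 = g1 / g0
    # rho1 = phi/(1+phi^2)  =>  rho1*phi^2 - phi + rho1 = 0
    phi = (1.0 - np.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)
    sigma2_u = g0 / (1.0 + phi ** 2)
    return phi, sigma2_u

# simulated MA(1) errors with phi = 0.5, sigma2_u = 1
rng = np.random.default_rng(7)
n = 2000
u = rng.normal(size=n + 1)
e = u[1:] + 0.5 * u[:-1]
phi_hat, s2u_hat = ma1_moments(e)
```

With the estimates in hand, Σ̂ = Σ(φ̂, σ̂2u) can be built following the tridiagonal form above and used for FGLS.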
5.6. valid inferences with autocorrelation of unknown form
In Section 4.7.2.1 we saw that it is possible to consistently estimate the correct covariance matrix of the OLS estimator when there is HET. It is also possible to do this when there is AUT, or both HET and AUT. The details are beyond the scope of this course.
It is important to remember that a correction for autocorrelation will only give an efficient estimator and valid test statistics if the model of autocorrelation is correct. It may be hard to determine which is the correct model for the autocorrelation of the errors, so one may prefer to forgo the GLS correction and simply use OLS. If this is done, one needs to account for the existence of AUT when estimating the covariance of the parameters, to obtain correct test statistics. We will see examples in the Projecte de Docència Tutoritzada.
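One such autocorrelation-robust covariance estimator is the Newey-West (HAC) sandwich estimator. Since the details are beyond the course's scope, the following numpy sketch is only an illustration of the idea, not gretl's implementation; all parameter choices (sample size, AR coefficients, L = 10 lags with Bartlett weights) are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a regression with autocorrelated regressor and autocorrelated errors
n, rho_x, rho_e = 2000, 0.7, 0.7
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = rho_x * x[t - 1] + rng.normal()
    e[t] = rho_e * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

# OLS fit and residuals
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
ehat = y - X @ beta

# Classical OLS covariance (invalid under autocorrelation)
s2 = ehat @ ehat / (n - 2)
V_ols = s2 * XtX_inv

# Newey-West "meat": sum e_t^2 x_t x_t' plus Bartlett-weighted cross terms
L = 10
Xe = X * ehat[:, None]
meat = Xe.T @ Xe
for j in range(1, L + 1):
    w = 1.0 - j / (L + 1.0)
    G = Xe[j:].T @ Xe[:-j]
    meat += w * (G + G.T)
V_nw = XtX_inv @ meat @ XtX_inv

# With positive autocorrelation in both x and the errors, the robust
# standard error of the slope exceeds the (misleadingly small) classical one.
print(np.sqrt(V_ols[1, 1]), np.sqrt(V_nw[1, 1]))
```

In gretl the analogous option is to request robust (HAC) standard errors when estimating by OLS, which is the route suggested in the paragraph above.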
5.7. Testing for autocorrelation
Breusch-Godfrey test
This test uses an auxiliary regression, as does the White test for heteroscedasticity. The regression is
$$\hat\varepsilon_t = x_t'\delta + \gamma_1\hat\varepsilon_{t-1} + \gamma_2\hat\varepsilon_{t-2} + \cdots + \gamma_P\hat\varepsilon_{t-P} + v_t$$
and the test statistic is the $nR^2$ statistic, just as in the White test. There are P restrictions, so the test statistic is asymptotically distributed as a $\chi^2(P)$.
• The intuition is that the lagged errors shouldn't contribute to explaining
the current error if there is no autocorrelation.
• $x_t$ is included as a regressor to account for the fact that the $\hat\varepsilon_t$ are not independent even if the $\varepsilon_t$ are. This is a technicality that we won't go into here.
• This test is valid even if the regressors are stochastic and contain lagged
dependent variables.
• The alternative is not that the model is an AR(P), following the argument above. The alternative is simply that some or all of the first P autocorrelations are different from zero. This is compatible with many specific forms of autocorrelation.
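The mechanics of the test can be sketched with a small simulation. This is an illustration of the $nR^2$ statistic, not gretl's implementation (in gretl one would use the built-in autocorrelation test after OLS); the data-generating values and variable names are assumptions for the example, with P = 1 lag and the 5% $\chi^2(1)$ critical value 3.84:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = 1 + 2x + eps with AR(1) errors, rho = 0.6
n, rho = 500, 0.6
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps

# Step 1: OLS of y on [1, x]; keep the residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ehat = y - X @ beta

# Step 2: auxiliary regression of ehat_t on x_t and ehat_{t-1} (P = 1)
Z = np.column_stack([X[1:], ehat[:-1]])
d = ehat[1:]
delta = np.linalg.lstsq(Z, d, rcond=None)[0]
resid = d - Z @ delta

# Step 3: test statistic n * R^2, asymptotically chi-squared(P) under H0
R2 = 1.0 - resid @ resid / ((d - d.mean()) @ (d - d.mean()))
bg_stat = len(d) * R2
print(bg_stat > 3.84)  # compare to the 5% critical value of chi2(1)
```

With strongly autocorrelated errors the lagged residual explains the current one, $R^2$ in the auxiliary regression is far from zero, and the null of no autocorrelation is rejected.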
5.8. Lagged dependent variables and autocorrelation: A Caution
We've seen that the OLS estimator is consistent under autocorrelation, as long as
$$\text{plim}\,\frac{X'\varepsilon}{n} = 0.$$
This will be the case when $E(X'\varepsilon) = 0$, following a LLN. An important exception is the case where X contains lagged y's and the errors are autocorrelated. A simple example is the case of a single lag of the dependent variable with AR(1) errors. The model is
$$y_t = x_t'\beta + y_{t-1}\gamma + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$
Now we can write
$$E(y_{t-1}\varepsilon_t) = E\left[(x_{t-1}'\beta + y_{t-2}\gamma + \varepsilon_{t-1})(\rho\varepsilon_{t-1} + u_t)\right] \neq 0$$
since one of the terms is $E(\rho\varepsilon_{t-1}^2)$, which is clearly nonzero. In this case $E(X'\varepsilon) \neq 0$, and therefore $\text{plim}\,X'\varepsilon/n \neq 0$. Since
$$\text{plim}\,\hat\beta = \beta + \text{plim}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}$$
the OLS estimator is inconsistent in this case. One needs to estimate by instrumental
variables (IV). This is a topic that is beyond the scope of this course. It is important
to be aware of the possibility that the OLS estimator can be inconsistent, though.
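The inconsistency is easy to see in a simulation. The following Python sketch uses the simplest case, $y_t = \gamma y_{t-1} + \varepsilon_t$ with no $x_t$, and illustrative parameter values $\gamma = \rho = 0.5$; for this case the known probability limit of the OLS estimator is $(\gamma + \rho)/(1 + \gamma\rho) = 0.8$, not the true 0.5 (this closed-form limit is a standard result, stated here for the example only):

```python
import random

random.seed(1)

# Model: y_t = gamma * y_{t-1} + eps_t, with AR(1) errors eps_t = rho * eps_{t-1} + u_t
gamma, rho, n = 0.5, 0.5, 50000

y, eps = [0.0], 0.0
for _ in range(n):
    u = random.gauss(0.0, 1.0)
    eps = rho * eps + u
    y.append(gamma * y[-1] + eps)

# OLS slope of y_t on y_{t-1}
num = sum(y[t - 1] * y[t] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
gamma_hat = num / den

# Despite the large sample, gamma_hat settles near 0.8,
# not near the true gamma = 0.5: OLS is inconsistent here.
print(gamma_hat)
```

No amount of extra data fixes this, which is exactly the meaning of inconsistency; an IV estimator would be needed instead.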
5.9. Quart Projecte de Docència Tutoritzada
Using the Nerlove data (you have already used these data, but the Excel file is here if needed):
(1) For the simple model
(5.9.1) $$\ln(cost) = \beta_1 + \beta_2\ln(output) + \beta_3\ln(labor) + \beta_4\ln(fuel) + \beta_5\ln(capital) + \varepsilon$$
(a) estimate the model by OLS
(b) use the Breusch-Godfrey test to check for autocorrelation. Important: to do this you will have to give the data a time-series structure.
(c) plot the residuals, and interpret whether or not an autocorrelation problem is visible.
(2) Repeat exercise 1, but using the model
$$\ln(cost) = \sum_{j=1}^{5}\alpha_j d_j + \sum_{j=1}^{5}\gamma_j\left[d_j\ln(output)\right] + \beta_3\ln(labor) + \beta_4\ln(fuel) + \beta_5\ln(capital) + \varepsilon$$
which was presented in Section 2.5.3.
(3) With the Keeling-Whorf.gdt data
(a) estimate the model
$$CO2_t = \beta_1 + \beta_2 t + \varepsilon_t$$
(b) check for autocorrelation using the Breusch-Godfrey test.
(c) plot the residuals
(d) re-estimate the model using the Cochrane-Orcutt and Prais-Winsten methods, and plot the residuals.
(e) comment on all the results
5.10. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 12.1, 12.8, 12.9, 12.11, 12.14, 12.17, 12.22, 12.26, 12.28 from Gujarati, pp. 472-486.
CHAPTER 6
Data sets
This chapter gives links to the data sets referred to in the Study Guide.
Wisconsin height-income data (comma separated values)
Wisconsin height-income data (Gretl data file)
Nerlove data (Excel spreadsheet file)
Nerlove data (Gretl data file)
Keeling-Whorf CO2 data (Gretl data file)
Cigarette-Alcohol Mortality data (Gretl data file)