Study Guide for Econometrics
(second semester)
Programa Universitat-Empresa
Universitat Autònoma de Barcelona
February 2008
Michael Creel and Montserrat Farell
Contents
Introduction
Econometrics at the Facultat
About this study guide
Bibliography
Chapter 1. GRETL
1.1. Introduction
1.2. Getting Started
1.3. Chapter Exercises
Chapter 2. Dummy Variables
2.1. Introduction
2.2. Motivation
2.3. Definition, Basic Use, and Interpretation
2.4. Additional Details
2.5. Primer Projecte de Docència Tutoritzada
2.6. Chapter Exercises
Chapter 3. Collinearity
3.1. Introduction
3.2. Motivation: Data on Mortality and Related Factors
3.3. Definition and Basic Concepts
3.4. When does it occur?
3.5. Consequences of Collinearity
3.6. Detection of Collinearity
3.7. Dealing with collinearity
3.8. Segon Projecte de Docència Tutoritzada
3.9. Chapter Exercises
Chapter 4. Heteroscedasticity
4.1. Introduction
4.2. Motivation
4.3. Basic Concepts and Definitions
4.4. Effects of Het. and Aut. on the OLS estimator
4.5. The Generalized Least Squares (GLS) estimator
4.6. Feasible GLS
4.7. Heteroscedasticity
4.8. Example
4.9. Tercer Projecte de Docència Tutoritzada
4.10. Chapter Exercises
Chapter 5. Autocorrelation
5.1. Introduction
5.2. Motivation
5.3. Causes
5.4. Effects on the OLS estimator
5.5. Corrections
5.6. Valid inferences with autocorrelation of unknown form
5.7. Testing for autocorrelation
5.8. Lagged dependent variables and autocorrelation: A Caution
5.9. Quart Projecte de Docència Tutoritzada
5.10. Chapter Exercises
Chapter 6. Data sets
Introduction
Econometrics at the Facultat
Econometrics (Econometria) is an annual (two semester) course in the Facultat
de Ciències Econòmiques i Empresarials at the UAB. It is a required course for
the degree of Llicenciat in both Administració i Direcció d'Empreses (ADE) and
Economia (ECO). In both ADE and ECO, Econometrics is normally taken in the
third year of study.
Econometrics is an area of Economics that uses statistical and mathematical
tools to analyze data on economic phenomena. Econometrics can be used to find
a mathematical model that gives a good representation of an actual economy, to
test theories about how an economy behaves, or to make predictions about how
an economy will evolve. Estimation of models, testing of hypotheses, and making
predictions are all things that can be done using econometric methods.
Courses that are fundamental for successfully studying Econometrics are Matemàtiques
per a Economistes I and Matemàtiques per a Economistes II (first year of
study) and Estadística I and Estadística II (second year of study). Ideally, students
should have passed these courses before beginning Econometrics. If this is
not possible, any student of Econometrics should immediately begin a serious review
of the material covered in these courses. Basic matrix algebra, constrained and
unconstrained minimization of functions, conditional and unconditional expectations
of random variables, and hypothesis testing are the areas that should be reviewed.
Microeconomia I and Microeconomia II are courses that provide a theoretical
background which is important to understand why and how we use econometric
tools. Macroeconomia I also provides a theoretical background for some of the
examples of the second half of Econometrics.
About this study guide
This study guide covers the material taught in the second semester, in groups 13
and 14 (the groups of the PUE). The guide contains brief notes on all of the material,
as well as examples that use GRETL. This guide does not substitute for reading a
textbook; it accompanies a textbook. Nor does it substitute for attending class. The
guide highlights essential concepts, provides examples, and gives exercises. However,
class lectures contain details that are not reproduced in the guide. To learn these
details, attending class is fundamental, as is careful reading of a textbook. The
guide provides references to the book Econometría (cuarta edición) by D. Gujarati,
mentioned below. In the second semester of Econometrics, we will cover material in
Chapters 9, 10, 11 and 12 of Gujarati's book.
This guide has been checked to work properly using the Firefox web browser
and Adobe Acrobat Reader. Both of these packages are freely available for the
commonly used operating systems. You should configure Acrobat Reader to use
Firefox to open links. This study guide and related materials (data sets, copies of
software and manuals, etc.) are available at the Econometrics Study Guide web
page.
Bibliography
There are many excellent textbooks for econometrics. Any of the following are
appropriate. This study guide refers to Gujarati's book. You should definitely read
the appropriate sections of at least one of these books.
(1) Novales, A., Econometría, McGraw-Hill
(2) Gujarati, D., Econometría, McGraw-Hill
(3) Johnston, J. and J. Dinardo, Métodos de Econometría, Vicens Vives
(4) Kmenta, J., Elementos de Econometría, Vicens Vives
(5) Maddala, G.S. (1996), Introducción a la econometría, 2nd edition, Prentice Hall
(6) Pindyck, R.S. and Rubinfeld, D.L. (2001), Econometría: modelos y pronósticos, 4th edition, McGraw-Hill
CHAPTER 1
GRETL
1.1. Introduction
GRETL (http://gretl.sourceforge.net/) is a free computer package
for doing econometrics. It is installed on the computers in Aules 21-22-23 as
well as in the Social Sciences computer rooms. You can download a copy and install
it on your own computer. It works with Windows, Macs, and Linux. It is available
in a number of languages, including Spanish. The version for Windows, along
with the manual and the data sets that accompany D. Gujarati's Econometría, are
distributed with this study guide, and are also available:
• Gretl v. 1.7.1 for Windows
• Data to accompany Gujarati's book
The examples in this study guide use GRETL, and to do the class assignments you
will need to use GRETL. This chapter explains the basic steps of using GRETL.
• Basic concepts and goals for learning:
(1) become familiar with the basic use of GRETL
(2) learn how to load ASCII and spreadsheet data
(3) learn how to select certain observations in a data set
• Readings: the GRETL manual, in Spanish or in English. You don't have to
read the whole manual, but looking through it would be a good idea.
Figure 1.2.1. GRETL's startup window
1.2. Getting Started
Once you start GRETL, you see the window in Figure 1.2.1. You need to load
some data to use GRETL. Data comes in many forms: plain text files, spreadsheet
files, binary files that use special formats, etc. GRETL can use most of these forms.
We'll look at how to deal with two cases: plain ASCII text data, and Microsoft
Excel spreadsheet data.
1.2.1. Loading ASCII text data. The Wisconsin longitudinal survey is a long-term
study of people who graduated from high school in the state of Wisconsin (US)
during the year 1957. The data have been collected repeatedly in subsequent years.
This data can be obtained over the Internet from the address given previously. In
Figure 1.2.2 you can see that several variables have been selected for download.
Figure 1.2.2. Downloading data
In Figure 1.2.3 you see that one of the available formats is comma separated
values (csv), which provides records (lines) containing variables, which may be text
or numbers, each separated by commas. Downloading that gives us the file wls.csv,
the first few lines of which are
iduser,ix010rec,sexrsp,gg021jjd,gwiiq_bm
1001,60,2,18000,109
1002,,1,,79
1003,,2,,111
1004,,1,,96
1005,,2,,83
1006,65,2,-2,99
Figure 1.2.3. Comma separated values format
1007,70,1,-2,86
1008,71,1,-2,86
1009,67,2,16827,106
1010,72,1,17094,88
1011,67,2,7698,124
1012,,2,-2,124
The first line of the file gives the variable names, and the other lines are the individual
records, one for each person. There are a total of 10317 records, one for each
person. Some variables are missing for some people. In the data set, this is indicated
by two commas in a row with no number in between.
We need to know how to load this data into GRETL. This can be done as shown
in Figure 1.2.4.
Figure 1.2.4. Loading a csv file
Having done that, we now have the data in GRETL, as we see in Figure
1.2.5.
This data set has some problems that make it difficult to use. First, the variable
names are strange and not intuitive. Second, many observations have missing values.
You can change the name of a variable by right-clicking on it and selecting
Edit attributes. Then change the name to whatever you like. See Figure 1.2.6. To
see that many observations have missing values, right-click on a variable and choose
Display values or Descriptive statistics. For example, the variable income (I
renamed gg021jjd to income) shows what we see in Figure 1.2.7.
Figure 1.2.5. CSV data loaded
Figure 1.2.6. Changing a variable's name
Figure 1.2.7. Missing observations
To eliminate missing observations, we can select from the menu Sample -> Restrict,
based on criterion, as in Figure 1.2.8. We need to enter a selection criterion.
This data set is missing many observations on income and age. We can require that
these variables be positive. This is illustrated in Figure 1.2.9. Once we do this,
the new sample has 4934 observations, as we can see in Figure 1.2.10. Whenever
you are using this data, you should make sure that you have removed the observations
with missing data.
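The restriction can also be sketched outside GRETL. Below is a minimal pandas equivalent; the small table is a made-up stand-in for wls.csv, using the renamed columns described above:

```python
import pandas as pd

# Made-up stand-in for wls.csv: empty fields parse as NaN, and -2 codes
# a non-response, as in the real file.
data = pd.DataFrame({
    "age":    [60, None, 65, 70, None],
    "income": [18000, None, -2, -2, 7698],
    "IQ":     [109, 79, 99, 86, 124],
})

# Keep only records where both age and income are strictly positive,
# mirroring GRETL's Sample -> Restrict, based on criterion.
restricted = data[(data["age"] > 0) & (data["income"] > 0)]
print(len(restricted))  # 1: only the first row survives
```

On the real file, the same filter should leave the 4934 observations reported above.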
1.2.2. Loading spreadsheet data. Data is often distributed as spreadsheet
files. These are easy to load into GRETL using the File -> Open data -> Import
option. Figure 1.2.11 shows how to do it. We need some spreadsheet data to try
this.
Figure 1.2.8. Select sample, 1
Get the nerlove.xls data, and then import it as I have just explained. Once
you do this you will see the dialog in Figure 1.2.12. Select no.
Figure 1.2.9. Selection criterion
Figure 1.2.12. Data dialog
Figure 1.2.10. Restricted sample
1.3. Chapter Exercises
(1) For the Wisconsin data set:
(a) change the variable name of the variable ix010rec to age
(b) change the name of gg021jjd to income
(c) change the name of gwiiq_bm to IQ.
(d) select observations such that age and income are positive. You
should have 4934 observations after doing so.
(e) save the restricted data, with the new variable names, as the data set
wisconsin.gdt. Confirm that you can load this data into a new GRETL
session.
(2) With your wisconsin.gdt data set:
(a) explore the GRETL menu options, the help features, and the manual,
and print histograms (frequency plots) for the variables age, income
and IQ.
Figure 1.2.11. Loading spreadsheet data
(b) print descriptive statistics for all variables.
CHAPTER 2
Dummy Variables
2.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should be able to answer the following questions:
(1) What is a dummy variable?
(2) How can dummy variables be used in regression models?
(3) What is the correct interpretation of a regression model that contains
dummy variables?
(4) How can dummy variables be used in the cases of multiple categories,
interaction terms, and seasonality?
(5) What is the equivalence between the different parameterizations that
can be used when incorporating dummy variables?
• Readings:
(1) Gujarati, Econometría, (cuarta edición), Chapter 9: Modelos de
regresión con variables dicótomas, pp. 285-320.
2.2. Motivation
Often, qualitative factors can have an important effect on the dependent variable
we may be interested in. Consider the Wisconsin data set wisconsin.gdt. If we
regress income on height, having selected the sample to include men only, we obtain
the fitted line in Figure 2.2.1. Doing the same for the sample of women, we get
Figure 2.2.2. Comparing the two plots, we can see that:
Figure 2.2.1. Income regressed on height, men
Figure 2.2.2. Income regressed on height, women
• the y-intercept is higher for men than for women
• the slope of the line is steeper for men than for women
• men are taller on average - for men, mean height is around 70 inches, while
for women it's about 65 inches
There are a few questions we might ask:
• why does income appear to depend upon height? What economic explana-
tions are possible?
• why do women appear to be earning less than men, other things equal?
Apart from these questions, it is clear that a qualitative feature - the sex of the
individual - has an impact upon the individual's expected income.
• How can we incorporate such a qualitative characteristic into an economet-
ric model?
The need to use qualitative information in our models motivates the study of dummy
variables.
2.3. Definition, Basic Use, and Interpretation
Dummy variable (definition): A dummy variable is a binary-valued variable
that indicates whether or not some condition is true. It is customary to assign the
value 1 if the condition is true, and 0 if the condition is false.
Dummy variable (example): for the Wisconsin data, the variable sexrsp takes
the value 1 for men, and 2 for women. As such, sexrsp is not a dummy variable,
since the values are not 0 or 1. We can define the condition "Is the person a woman?"
This is equivalent to the condition "Is the value of sexrsp equal to 2?" This condition
will be true for some observations, and false for others. With GRETL, we can define
such a dummy variable, using the Variable -> Define new variable menu item, as in
Figure 2.3.1. Defining a dummy variable
Figure 2.3.2. Display values
Figure 2.3.1. To check that this worked properly, highlight both variables, right-click,
and select Display values. This shows us what we see in Figure 2.3.2. Note that
woman is now a variable like any other, that takes on the values 0 or 1.
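The same definition can be written in one line of code. A sketch, with a made-up stand-in for the sexrsp series (1 = man, 2 = woman, as described above):

```python
import pandas as pd

sexrsp = pd.Series([1, 2, 2, 1, 2])  # 1 = man, 2 = woman

# The condition "is sexrsp equal to 2?" is True or False for each
# observation; casting to int gives the customary 0/1 coding.
woman = (sexrsp == 2).astype(int)
print(woman.tolist())  # [0, 1, 1, 0, 1]
```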
2.3.1. Basic use and interpretation. Dummy variables are used essentially
like any other regressor. In class we will discuss the following models. Variables
like dt and dt2 are understood to be dummy variables. Variables like xt and xt3 are
ordinary continuous regressors. You should understand the interpretation of all of
them.
yt = β1 + β2dt + εt
yt = β1dt + β2(1− dt) + εt
yt = β1 + β2dt + β3xt + εt
Interaction terms: an interaction term is the product of two variables, so that
the effect of one variable on the dependent variable depends on the value of the
other. The following model has an interaction term. Note that ∂E(y|x)/∂x = β3 + β4dt:
the slope depends on the value of dt.
yt = β1 + β2dt + β3xt + β4dtxt + εt
Multiple dummy variables: we can use more than one dummy variable in a
model. We will study models of the form
yt = β1 + β2dt1 + β3dt2 + β4xt + εt
yt = β1 + β2dt1 + β3dt2 + β4dt1dt2 + β5xt + εt
Incorrect usage: You should understand why the following models are not
correct usages of dummy variables:
(1) overparameterization:
yt = β1 + β2dt + β3(1− dt) + εt
(2) multiple values assigned to multiple categories. Suppose that we have a condition
that defines 4 possible categories, and we create a variable d = 1 if the
observation is in the first category, d = 2 if in the second, etc. (This is not,
strictly speaking, a dummy variable, according to our definition.) Why is
the following model not a good one?
yt = β1 + β2d + εt
What is the correct way to deal with this situation?
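One common remedy, sketched here with made-up data: replace the single multi-valued variable with one 0/1 dummy per category (and drop one of them, or the intercept, to avoid the overparameterization problem above):

```python
import pandas as pd

d = pd.Series([1, 3, 2, 4, 1, 2])  # category codes 1..4 - NOT a dummy

# One 0/1 column per category. If the model keeps an intercept, one of
# these columns must be dropped to avoid exact collinearity.
dummies = pd.get_dummies(d, prefix="cat").astype(int)
print(list(dummies.columns))  # ['cat_1', 'cat_2', 'cat_3', 'cat_4']
```

Each observation gets exactly one 1 across the four columns, so the columns sum to a column of ones.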
2.4. Additional Details
Seasonality and dummy variables. Dummy variables can be used to treat
seasonal variation in data. We will use the Keeling-Whorf.gdt data to illustrate
this. You should be able to use GRETL to reproduce the following results:
Model 1: OLS estimates using the 468 observations 1965:01-2003:12
Dependent variable: CO2
Variable Coefficient Std. Error t-statistic p-value
djan 316.864 0.210610 1504.5009 0.0000
dfeb 317.533 0.210789 1506.4046 0.0000
dmar 318.271 0.210967 1508.6276 0.0000
dapr 319.418 0.211147 1512.7780 0.0000
dmay 319.848 0.211327 1513.5233 0.0000
djun 319.187 0.211507 1509.1057 0.0000
djul 317.653 0.211688 1500.5705 0.0000
daug 315.539 0.211870 1489.3056 0.0000
dsep 313.690 0.212052 1479.3061 0.0000
doct 313.548 0.212235 1477.3572 0.0000
dnov 314.792 0.212419 1481.9367 0.0000
ddec 315.961 0.212603 1486.1530 0.0000
time 0.121327 0.000404332 300.0664 0.0000
Mean of dependent variable 345.310
S.D. of dependent variable 16.5472
Sum of squared residuals 634.978
Standard error of residuals (σ) 1.18134
Unadjusted R2 0.995034
Adjusted R2 0.994903
F (12, 455) 7597.57
Durbin-Watson statistic 0.0634062
and the plot in Figure 2.4.1.
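The same kind of specification (twelve monthly dummies plus a time trend, no separate constant) can be sketched numerically. The sketch below uses simulated monthly data, not the Keeling-Whorf series:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120                                  # 10 years of monthly data
time = np.arange(1, T + 1)
month = (time - 1) % 12                  # 0 = Jan, ..., 11 = Dec

# Simulated series: trend plus a deterministic seasonal pattern.
y = 300 + 0.12 * time + np.sin(2 * np.pi * month / 12) \
    + 0.1 * rng.standard_normal(T)

# Regressors: one dummy per month, plus the trend (13 columns in all).
D = np.zeros((T, 12))
D[np.arange(T), month] = 1.0
X = np.column_stack([D, time])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[:12] are the monthly intercepts; beta[12] estimates the trend
# slope, close to the true value 0.12.
print(round(beta[12], 3))
```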
Multiple parameterizations. To formulate a model that conditions on a given
set of categorical information, there are multiple ways to use dummy variables. For
Figure 2.4.1. Keeling-Whorf CO2 data, fit using monthly dummies
example, the two models
yt = β1dt + β2(1− dt) + β3xt + β4dtxt + εt
and
yt = α1 + α2dt + α3xtdt + α4xt(1− dt) + εt
are equivalent. You should know the 4 equations that relate the βj parameters
to the αj parameters, j = 1, 2, 3, 4. You should know how to interpret the
parameters of both models.
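The equivalence is easy to verify numerically. A sketch with simulated data (parameter values and sample size are arbitrary): both parameterizations span the same column space, so OLS gives identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, n).astype(float)      # a dummy variable
x = rng.standard_normal(n)                   # a continuous regressor
y = 1.0 + 0.5 * d + 0.8 * x - 0.3 * d * x \
    + 0.2 * rng.standard_normal(n)

# Parameterization 1: y = b1*d + b2*(1-d) + b3*x + b4*d*x + e
X1 = np.column_stack([d, 1 - d, x, d * x])
# Parameterization 2: y = a1 + a2*d + a3*x*d + a4*x*(1-d) + e
X2 = np.column_stack([np.ones(n), d, x * d, x * (1 - d)])

b, *_ = np.linalg.lstsq(X1, y, rcond=None)
a, *_ = np.linalg.lstsq(X2, y, rcond=None)

# The fitted values coincide exactly (up to rounding error).
print(np.allclose(X1 @ b, X2 @ a))  # True
```

Comparing the estimated b and a then gives the 4 relating equations asked for above.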
2.5. Primer Projecte de Docència Tutoritzada
You may work in groups of up to 5 students. The evaluation will form part of the
grade for the exercises. I recommend installing Gretl on a laptop computer with WiFi,
so that you can work comfortably. Before June 1, you must hand in a brief
report (10 pages maximum) on the following:
2.5.1. Theoretical background. For a firm that takes input prices w and the
output level q as given, the cost minimization problem is to choose the quantities of
inputs x to solve the problem
min_x w′x
subject to the restriction
f(x) = q.
The solution is the vector of factor demands x(w, q). The cost function is obtained
by substituting the factor demands into the criterion function:
C(w, q) = w′x(w, q).
• Monotonicity: Increasing factor prices cannot decrease cost, so
∂C(w, q)/∂w ≥ 0
Remember that these derivatives give the conditional factor demands (Shephard's
Lemma).
• Homogeneity: The cost function is homogeneous of degree 1 in input prices:
C(tw, q) = tC(w, q), where t is a scalar constant. This is because the factor
demands are homogeneous of degree zero in factor prices - they only depend
upon relative prices.
• Returns to scale: The returns to scale parameter γ is defined as the inverse
of the elasticity of cost with respect to output:
γ = ( (∂C(w, q)/∂q) (q/C(w, q)) )^(−1)
Constant returns to scale is the case where increasing production q implies
that cost increases in the proportion 1:1. If this is the case, then γ = 1.
2.5.2. Cobb-Douglas functional form. The Cobb-Douglas functional form
is linear in the logarithms of the regressors and the dependent variable. For a cost
function, if there are g factors, the Cobb-Douglas cost function has the form
C = A q^βq w1^β1 · · · wg^βg e^ε
What is the elasticity of C with respect to wj?
eC,wj = (∂C/∂wj)(wj/C)
      = βj A q^βq w1^β1 · · · wj^(βj−1) · · · wg^βg e^ε · wj / (A q^βq w1^β1 · · · wg^βg e^ε)
      = βj
This is one of the reasons the Cobb-Douglas form is popular - the coefficients are easy
to interpret, since they are the elasticities of the dependent variable with respect to
the explanatory variables. Note that in this case,
eC,wj = (∂C/∂wj)(wj/C) = xj(w, q) wj/C ≡ sj(w, q),
the cost share of the jth input. So with a Cobb-Douglas cost function, βj = sj(w, q).
The cost shares are constants.
Note that after a logarithmic transformation we obtain
ln C = α + βq ln q + β1 ln w1 + · · · + βg ln wg + ε
where α = ln A. So we see that the transformed model is linear in the logs of the
data.
One can verify that the property of HOD1 implies that
β1 + · · · + βg = 1
In other words, the cost shares add up to 1.
The hypothesis that the technology exhibits CRTS implies that
γ = 1/βq = 1
so βq = 1. Likewise, monotonicity implies that the coefficients satisfy βi ≥ 0, i = 1, ..., g.
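The three properties (elasticity = exponent, HOD1, and the CRTS condition) can be checked numerically. A sketch with made-up parameter values; the elasticity is approximated by a finite difference:

```python
import numpy as np

A, beta_q = 1.0, 1.2
beta = np.array([0.3, 0.5, 0.2])    # exponents on w1, w2, w3; sum to 1

def cost(w, q):
    """Cobb-Douglas cost function C = A * q^beta_q * prod(w_j^beta_j)."""
    return A * q**beta_q * np.prod(w**beta)

w = np.array([2.0, 1.5, 3.0])
q = 10.0

# Elasticity of C with respect to w1, via a small relative price change.
h = 1e-6
w_up = w.copy()
w_up[0] *= 1 + h
elasticity = (cost(w_up, q) - cost(w, q)) / (h * cost(w, q))
print(round(elasticity, 4))         # approximately beta[0] = 0.3

# Homogeneity of degree 1 in prices: doubling all prices doubles cost,
# because the exponents on the prices sum to 1.
print(np.isclose(cost(2 * w, q), 2 * cost(w, q)))  # True
```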
2.5.3. The Nerlove data and OLS. The file nerlove.xls contains data on 145
electric utility companies' cost of production, output and input prices. The data are
for the U.S., and were collected by M. Nerlove. The observations are by row, and the
columns are COMPANY, COST (C), OUTPUT (Q), PRICE OF LABOR
(PL), PRICE OF FUEL (PF) and PRICE OF CAPITAL (PK). Note that the
data are sorted by output level (the third column).
(1) Download the data nerlove.xls (it is an Excel file).
(2) Import the data into Gretl.
(3) Create logarithms of cost, output, labor, fuel, capital.
(4) Estimate by OLS the model
(2.5.1) ln(cost) = β1 + β2 ln(output) + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(5) Comment on the results, in general, and specifically with respect to homogeneity
of degree 1 and returns to scale.
(6) Create dummy variables
(a) d1 = 1 if 101 <= firm <= 129, d1 = 0 otherwise
(b) d2 = 1 if 201 <= firm <= 229, d2 = 0 otherwise
(c) d3 = 1 if 301 <= firm <= 329, d3 = 0 otherwise
(d) d4 = 1 if 401 <= firm <= 429, d4 = 0 otherwise
(e) d5 = 1 if 501 <= firm <= 529, d5 = 0 otherwise
(7) Estimate the model
(2.5.2)
ln(cost) = Σ(j=1 to 5) αjdj + Σ(j=1 to 5) γj[dj ln(output)] + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(8) Comment on the results, emphasizing returns to scale. Present a graph
showing returns to scale as a function of firm size. Interpret the graph.
(9) Test the restrictions α1 = α2 = α3 = α4 = α5 jointly with γ1 =
γ2 = γ3 = γ4 = γ5, and interpret the result.
2.6. Chapter Exercises
The professor of the practical session will give you a problem list. Problems
9.1, 9.2, 9.3, 9.5, 9.6, 9.13 and 9.15 on pages 311-320 of Gujarati's book are recommended
for study.
CHAPTER 3
Collinearity
3.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is collinearity?
(2) What are the effects of collinearity on the OLS estimator: how does it
affect estimation, hypothesis testing and prediction?
(3) How can the presence of collinearity be detected?
(4) What can be done to improve the situation if collinearity is a problem?
• Readings:
Gujarati, Econometría, (cuarta edición), Chapter 10: Multicolinealidad:
¿Qué pasa si las regresoras están correlacionadas?, pp. 327-371.
3.2. Motivation: Data on Mortality and Related Factors
The data set mortalitat.gdt contains annual data from 1947 to 1980 on death rates
in the U.S., along with data on factors like smoking and consumption of alcohol.
The data description is:
DATA4-7: Death rates in the U.S. due to coronary heart disease and their
determinants. Data compiled by Jennifer Whisenand
• chd = death rate per 100,000 population (Range 321.2 - 375.4)
• cal = Per capita consumption of calcium per day in grams (Range 0.9 -
1.06)
• unemp = Percent of civilian labor force unemployed, in 1,000s of persons 16
years and older (Range 2.9 - 8.5)
• cig = Per capita consumption of cigarettes in pounds of tobacco by persons
18 years and older, approx. 339 cigarettes per pound of tobacco (Range
6.75 - 10.46)
• edfat = Per capita intake of edible fats and oil in pounds, includes lard,
margarine and butter (Range 42 - 56.5)
• meat = Per capita intake of meat in pounds, includes beef, veal, pork, lamb
and mutton (Range 138 - 194.8)
• spirits = Per capita consumption of distilled spirits in taxed gallons for
individuals 18 and older (Range 1 - 2.9)
• beer = Per capita consumption of malted liquor in taxed gallons for individuals
18 and older (Range 15.04 - 34.9)
• wine = Per capita consumption of wine measured in taxed gallons for individuals
18 and older (Range 0.77 - 2.65)
Consider the following models, with their estimation results (standard errors in
parentheses):

chd = β1 + β2cig + β3spirits + β4beer + β5wine + ε

chd = 334.914 + 5.41216 cig + 36.8783 spirits − 5.10365 beer + 13.9764 wine
      (58.939)  (5.156)       (7.373)          (1.2513)       (12.735)
T = 34  R2 = 0.5528  F(4, 29) = 11.2  σ = 9.9945

chd = β1 + β2cig + β3spirits + β4beer + ε

chd = 353.581 + 3.17560 cig + 38.3481 spirits − 4.28816 beer
      (56.624)  (4.7523)      (7.275)          (1.0102)
T = 34  R2 = 0.5498  F(3, 30) = 14.433  σ = 10.028

chd = β1 + β2cig + β3spirits + β5wine + ε

chd = 243.310 + 10.7535 cig + 22.8012 spirits − 16.8689 wine
      (67.21)   (6.1508)      (8.0359)         (12.638)
T = 34  R2 = 0.3198  F(3, 30) = 6.1709  σ = 12.327

chd = β1 + β2cig + β3spirits + ε

chd = 181.219 + 16.5146 cig + 15.8672 spirits
      (49.119)  (4.4371)      (6.2079)
T = 34  R2 = 0.3026  F(2, 31) = 8.1598  σ = 12.481
Note how the signs of the coefficients change depending on the model, and that
the magnitude of the parameter estimates varies a lot too. The parameter estimates
are highly sensitive to the particular model we estimate. Why? We'll see that the
problem is that the data exhibit collinearity.
3.3. Definition and Basic Concepts
Collinearity (definition): Collinearity is the existence of linear relationships
amongst the regressors. We can always write
λ1x1 + λ2x2 + · · ·+ λKxK + v = 0
where xi is the ith column of the regressor matrix X, and v is an n × 1 vector. In
the case that collinearity exists, the variation in v is relatively small, so that
there is an approximately exact linear relation between the regressors.
• "relative" and "approximate" are imprecise terms, so the existence of collinearity
is also an imprecise, relative concept.
• many authors, including Gujarati, use the term multicollinearity. Some,
including myself, prefer to call the phenomenon collinearity. Collinearity
as used here means exactly what Gujarati and others refer to as multicollinearity.
Exact (or Perfect) Collinearity (definition):
In the extreme, if there are exact linear relationships, we can write
λ1x1 + λ2x2 + · · ·+ λKxK = 0
In this case, ρ(X) < K, so ρ(X′X) < K, so X′X is not invertible and the OLS estimator
is not uniquely defined. The existence of exact linear relationships amongst
the regressors is known as perfect collinearity or exact collinearity.
For example, if the model is
yt = β1 + β2x2t + β3x3t + εt
x2t = α1 + α2x3t
then we can write
yt = β1 + β2 (α1 + α2x3t) + β3x3t + εt
   = β1 + β2α1 + β2α2x3t + β3x3t + εt
   = (β1 + β2α1) + (β2α2 + β3)x3t + εt
   = γ1 + γ2x3t + εt
• The γ's can be consistently estimated, but since the γ's define two equations
in three β's, the β's can't be consistently estimated (there are multiple values
of β that solve the first order conditions that define the OLS estimator).
The β's are unidentified in the case of perfect collinearity.
3.4. When does it occur?
Perfect collinearity:
• Perfect collinearity is unusual, except in the case of an error in construction
of the regressor matrix, such as including the same regressor twice.
• Another case where perfect collinearity may be encountered is in models
with dummy variables, if one is not careful. Consider a model of the rental
price (yi) of an apartment. This could depend on factors such as size, quality
etc., collected in xi, as well as on the location of the apartment. Let Bi = 1
if the ith apartment is in Barcelona, Bi = 0 otherwise. Similarly, define Gi,
Ti and Li for Girona, Tarragona and Lleida. One could use a model such
as
yi = β1 + β2Bi + β3Gi + β4Ti + β5Li + x′iγ + εi
In this model, Bi + Gi + Ti + Li = 1, ∀i, so there is an exact relationship between
these variables and the column of ones corresponding to the constant.
One must either drop the constant, or one of the qualitative variables.
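The exact dependence is easy to see numerically. A small sketch with a made-up assignment of twelve apartments to the four cities:

```python
import numpy as np

n = 12
city = np.array([0, 1, 2, 3] * 3)    # 0=Barcelona, 1=Girona,
                                     # 2=Tarragona, 3=Lleida (made up)

# Bi, Gi, Ti, Li as columns of a dummy matrix.
D = np.zeros((n, 4))
D[np.arange(n), city] = 1.0

# Constant plus all four dummies: the dummies sum to the column of
# ones, an exact linear dependence, so X'X is singular.
X = np.column_stack([np.ones(n), D])
print(np.linalg.matrix_rank(X))      # 4, not 5

# Dropping the constant (or one of the dummies) restores full rank.
print(np.linalg.matrix_rank(D))      # 4
```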
Collinearity (inexact):
The more common case, if one doesn't make mistakes such as these, is the
existence of inexact linear relationships, i.e., correlations between the regressors
that are less than one in absolute value, but not zero. This is (unfortunately) quite
common with economic data.
• economic data is non-experimental, so a researcher cannot control the values
of the variables.
• common factors affect different variables at the same time, which tends to
induce correlations. Variables tend to move together over time (for example,
prices of apartments in Barcelona and in Valencia).
3.5. Consequences of Collinearity
The basic problem is that when two (or more) variables move together, it is difficult
to determine their separate influences. This is reflected in imprecise estimates,
i.e., estimates with high variances. With economic data, collinearity is commonly
encountered, and is often a severe problem.
Figure 3.5.1. s(β) when there is no collinearity
When there is collinearity, the minimizing point of the objective function that
defines the OLS estimator (s(β), the sum of squared errors) is relatively poorly
defined. This is seen in Figures 3.5.1 and 3.5.2.
To see the effect of collinearity on variances, partition the regressor matrix as
X = [x W]
where x is the first column of X (note: we can interchange the columns of X if
we like, so there's no loss of generality in considering the first column). Now, the
variance of β, under the classical assumptions, is
V(β) = (X′X)^(−1) σ2
Figure 3.5.2. s(β) when there is collinearity
Using the partition,
X′X = [ x′x   x′W
        W′x   W′W ]
and following a rule for partitioned inversion,
(X′X)^(−1)_(1,1) = (x′x − x′W(W′W)^(−1)W′x)^(−1)
                 = (x′(In − W(W′W)^(−1)W′)x)^(−1)
                 = (ESSx|W)^(−1)
where by ESSx|W we mean the error sum of squares obtained from the regression
x = Wλ + v.
Since
R2 = 1 − ESS/TSS,
we have
ESS = TSS(1 − R2),
so the variance of the coefficient corresponding to x is
V(βx) = σ2 / ( TSSx (1 − R2x|W) )
We see that three factors influence the variance of this coefficient. It will be high if
(1) σ2 is large
(2) there is little variation in x. Draw a picture here.
(3) there is a strong linear relationship between x and the other regressors, so
that W can explain the movement in x well. In this case, R2x|W will be close
to 1. As R2x|W → 1, V(βx) → ∞.
The last of these cases is collinearity.
Intuitively, when there are strong linear relations between the regressors, it is
difficult to determine the separate influence of the regressors on the dependent variable.
This can be seen by comparing the OLS objective function in the case of no
correlation between regressors with the objective function when there is correlation
between the regressors. See Figures 3.5.1 and 3.5.2.
Consequences - summary:
• the parameters associated with variables affected by collinearity have high
variances.
• high variances lead to low power when testing hypotheses.
• high variances lead to low t-statistics, broad confidence intervals, etc.
• the results are sensitive to small changes in the sample.
3.6. Detection of Collinearity
• The best way is simply to regress each explanatory variable in turn on the
remaining regressors. If any of these auxiliary regressions has a high R2,
there is a problem of collinearity. Furthermore, this procedure identifies
which parameters are affected.
Sometimes, we're only interested in certain parameters. Collinearity
isn't a problem if it doesn't affect what we're interested in estimating.
• An alternative is to examine the matrix of correlations between the regressors.
High correlations are sufficient but not necessary for severe collinearity.
There may be a near exact linear relationship between 3 variables without
the existence of any near exact linear relationship between pairs of variables.
• Also indicative of collinearity is that the model fits well (high R2), but
none of the variables is significantly different from zero (i.e., their separate
influences aren't well determined).
• In summary, the artificial regressions are the best approach if one wants to
be careful.
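The artificial-regression check is easy to automate. A sketch with simulated regressors, where x1 and x2 are nearly collinear by construction (all names here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
z = rng.standard_normal(n)
x1 = z + 0.05 * rng.standard_normal(n)   # x1 and x2 share the factor z
x2 = z + 0.05 * rng.standard_normal(n)
x3 = rng.standard_normal(n)              # unrelated to the others

def aux_r2(X, j):
    """R^2 from regressing column j of X on the others plus a constant."""
    y = X[:, j]
    W = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

X = np.column_stack([x1, x2, x3])
print(round(aux_r2(X, 0), 3))  # close to 1: x1 is well explained
print(round(aux_r2(X, 2), 3))  # small: x3 is not involved
```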
Example: using the mortalitat.gdt data discussed above (Section 3.2), we can use
the artificial regression approach, regressing spirits on the other regressors (cig, wine,
beer). The results are

spirits = −1.01350 + 0.0670534 cig + 0.0794414 beer + 0.313745 wine
          (1.4477)   (0.12709)       (0.02738)       (0.3101)
T = 34  R2 = 0.8907  F(3, 30) = 90.669  σ = 0.24749
(standard errors in parentheses)

Note that R2 is very high: we have a serious problem of collinearity. This explains
the instability of the parameters we found earlier when we tried several models in
Section 3.2.
3.7. Dealing with collinearity
Collinearity is a problem of an uninformative sample. The first question is: is all
the available information being used? Is more data available? Are there coefficient
restrictions that have been neglected? Picture illustrating how a restriction can solve
a problem of perfect collinearity.
There do exist specialized methods, such as ridge regression, principal components
analysis, etc., that can be used when there is a severe problem of collinearity, but
these topics are advanced and outside the scope of this course. These methods
present problems of their own; they are not clearly and obviously good solutions to
the problem.
In sum, collinearity is a fact of life in econometrics, and there is no clear solution
to the problem. It is important to be aware of its effects and to know when it is
present.
3.8. Segon Projecte de Docencia Tutoritzada
(1) For the Nerlove model of the cost of electricity production
ln(cost) = β1 + β2 ln(output) + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
which was explained in Section 2.5.3, use artificial regressions to check
for the existence of collinearity.
(2) What is the reason for the lack of significance of the coefficient β5 in the
Nerlove model? Give an economic interpretation.
(3) Verify the existence of collinearity in the mortality models presented
in Section 3.2. Download the data and run the relevant artificial
regressions. Also present the correlation matrix of the regressors cig,
spirits, wine, beer. Give an interpretation.
3.9. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition,
you might also consider exercises 10.5, 10.7, 10.9, 10.19, 10.30a, 10.30b from
Gujarati, pp. 361-371.
CHAPTER 4
Heteroscedasticity
4.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is heteroscedasticity?
(2) What are the properties of the OLS estimator when there is heteroscedasticity?
(3) What is the GLS estimator?
(4) What is the feasible GLS estimator?
(5) What are the properties of the (F)GLS estimator?
(6) How can the presence of heteroscedasticity be detected?
(7) How can we deal with heteroscedasticity if it is present?
• Readings:
Gujarati, Econometria, (cuarta edicion), Chapter 11: Heteroscedasticidad:
¿Qué pasa cuando la varianza del error no es constante?, pp. 372 - 424.
4.2. Motivation
One of the assumptions we've made up to now is that
εt ∼ IID(0, σ2),
or occasionally
εt ∼ IIN(0, σ2).
This model is quite unreasonable in many cases. Often, the variance of εt will
change depending on the values of the regressors, or there may be correlations be-
tween different εt, εs, s ≠ t. For example, consider the Nerlove model of section 2.5.3.
If we estimate the model in equation 5.9.1, a plot of the residuals versus log(output)
is in Figure 4.2.1. Note that the variance of the error appears to be larger for small
firms, and smaller for large firms. This seems to violate the classical assumption that
E(ε2t) = σ2, ∀t. If the variance is not constant, we have a problem of heteroscedas-
ticity. Note also in Figure 4.2.1 that there seems to be correlation in the residuals:
when a residual is positive, the next one is too in most cases. When a residual is
negative, the next one is more likely to be negative than positive. If this is the case,
it's a violation of the classical assumption that E(εtεs) = 0, t ≠ s, and we have a
problem of autocorrelation.
In this chapter and the next, we'll investigate the importance of these two
problems, and how to deal with them.
4.3. Basic Concepts and Definitions
Now we'll investigate the consequences of nonidentically and/or dependently
distributed errors. We'll assume fixed regressors for now, relaxing this admittedly
unrealistic assumption later. The model is
y = Xβ + ε
E(ε) = 0
V (ε) = Σ
Figure 4.2.1. Residuals of Nerlove model
where Σ is a general symmetric positive definite matrix.
• The case where Σ is a diagonal matrix gives uncorrelated, nonidentically
distributed errors. This is known as heteroscedasticity (HET).
• The case where Σ has the same number on the main diagonal but nonzero
elements off the main diagonal gives identically (assuming higher moments
are also the same) dependently distributed errors. This is known as auto-
correlation (AUT).
Heteroscedasticity (definition): Heteroscedasticity is the existence of errors that
have different variances. More precisely, there exist εi and εj such that V(εi) ≠ V(εj).
Autocorrelation (definition): Autocorrelation is the existence of errors that
are correlated with one another. More precisely, there exist distinct εi and εj such
that E(εiεj) ≠ 0.
• Note that the presence of HET implies that Σ will have different elements on
its main diagonal.
• If there is AUT, then at least some elements of Σ off the main diagonal will
be different from zero.
• When there is HET but not AUT, Σ will be a diagonal matrix.
• It is possible to have both HET and AUT at the same time. In this case,
Σ can be a general symmetric positive definite matrix.
4.4. Effects of Het. and Aut. on the OLS estimator
The least squares estimator is
β̂ = (X′X)−1X′y = β + (X′X)−1X′ε
• We have unbiasedness, as before.
• The variance of β̂ is
(4.4.1)   E[(β̂ − β)(β̂ − β)′] = E[(X′X)−1X′εε′X(X′X)−1] = (X′X)−1X′ΣX(X′X)−1
Due to this, any test statistic that is based upon an estimator of σ2 is
invalid, since there isn't any σ2: it doesn't exist as a feature of the true
process that generates the data. In particular, the formulas for the t, F, and χ2
based tests given above do not lead to statistics with these distributions.
• β̂ is still consistent, following exactly the same argument given before.
• If ε is normally distributed, then
β̂ ∼ N(β, (X′X)−1X′ΣX(X′X)−1)
The problem is that Σ is unknown in general, so this distribution won't be
useful for testing hypotheses.
• Without normality, we still have
√n(β̂ − β) = √n (X′X)−1X′ε = (X′X/n)−1 n−1/2 X′ε
Define the limiting variance of n−1/2X′ε (supposing a CLT applies) as
limn→∞ E(X′εε′X/n) = Ω
so we obtain
√n(β̂ − β) →d N(0, QX−1 Ω QX−1)
where QX = limn→∞ X′X/n.
Summary: with heteroscedasticity and/or autocorrelation, the OLS estimator
• is unbiased in the same circumstances in which it is unbiased with
i.i.d. errors
• has a different variance than before, so the previous test statistics aren't
valid
• is consistent
• is asymptotically normally distributed, but with a different limiting covari-
ance matrix. The previous test statistics aren't valid in this case for this reason.
• is inefficient, as is shown below.
4.5. The Generalized Least Squares (GLS) estimator
Suppose Σ were known. Then one could form the Cholesky decomposition
P′P = Σ−1
Here, P is an upper triangular matrix. We have
P′PΣ = In
so
P′PΣP′ = P′,
which implies that
PΣP′ = In
Consider the model
Py = PXβ + Pε,
or, making the obvious definitions,
y∗ = X∗β + ε∗.
The variance of ε∗ = Pε is
E(Pεε′P′) = PΣP′ = In
Therefore, the model
y∗ = X∗β + ε∗
E(ε∗) = 0
V(ε∗) = In
satisfies the classical assumptions. The GLS estimator is simply OLS applied to the
transformed model:
β̂GLS = (X∗′X∗)−1X∗′y∗ = (X′P′PX)−1X′P′Py = (X′Σ−1X)−1X′Σ−1y
The GLS estimator is unbiased in the same circumstances under which the OLS
estimator is unbiased. For example,
E(β̂GLS) = E[(X′Σ−1X)−1X′Σ−1y] = E[(X′Σ−1X)−1X′Σ−1(Xβ + ε)] = β.
The variance of the estimator can be calculated using
β̂GLS = (X∗′X∗)−1X∗′y∗ = (X∗′X∗)−1X∗′(X∗β + ε∗) = β + (X∗′X∗)−1X∗′ε∗
so
E[(β̂GLS − β)(β̂GLS − β)′] = E[(X∗′X∗)−1X∗′ε∗ε∗′X∗(X∗′X∗)−1]
= (X∗′X∗)−1X∗′X∗(X∗′X∗)−1 = (X∗′X∗)−1 = (X′Σ−1X)−1
Either of these last formulas can be used.
• All the previous results regarding the desirable properties of the least squares
estimator hold, when dealing with the transformed model, since the trans-
formed model satisfies the classical assumptions.
• Tests are valid, using the previous formulas, as long as we substitute X∗ in
place of X. Furthermore, any test that involves σ2 can set it to 1. This is
preferable to re-deriving the appropriate formulas.
• The GLS estimator is more efficient than the OLS estimator. This is a
consequence of the Gauss-Markov theorem, since the GLS estimator is based
on a model that satisfies the classical assumptions but the OLS estimator
is not. To see this directly, note that
Var(β̂) − Var(β̂GLS) = (X′X)−1X′ΣX(X′X)−1 − (X′Σ−1X)−1 = AΣA′
where A = (X′X)−1X′ − (X′Σ−1X)−1X′Σ−1. This may not seem obvi-
ous, but it is true, as you can verify by expanding AΣA′: the two cross
terms are each −(X′Σ−1X)−1 and the last term is +(X′Σ−1X)−1. Then, noting
that AΣA′ is a quadratic form in a positive definite matrix, we conclude
that AΣA′ is positive semi-definite, and that GLS is efficient relative to OLS.
• As one can verify by calculating the first order necessary conditions, the GLS
estimator is the solution to the minimization problem
β̂GLS = arg minβ (y − Xβ)′Σ−1(y − Xβ)
so the metric Σ−1 is used to weight the residuals.
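The GLS recipe (transform by P with P′P = Σ−1, then apply OLS) can be sketched in a few lines. This is an illustration in Python/NumPy with a simulated, known Σ, which of course isn't available in practice:

```python
import numpy as np

def gls(y, X, Sigma):
    """GLS as OLS on the transformed model: form P with P'P = Sigma^{-1}
    and regress P y on P X."""
    Sinv = np.linalg.inv(Sigma)
    # numpy's cholesky returns lower-triangular L with L @ L.T = Sinv,
    # so P = L.T is upper triangular and satisfies P.T @ P = Sinv
    P = np.linalg.cholesky(Sinv).T
    b, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)
    return b

# illustration with a known (simulated) Sigma: heteroscedastic errors
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.diag(np.linspace(0.5, 4.0, n))
y = X @ np.array([1.0, 2.0]) + np.sqrt(np.diag(Sigma)) * rng.normal(size=n)
b_gls = gls(y, X, Sigma)
# direct evaluation of (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, for comparison
Sinv = np.linalg.inv(Sigma)
b_direct = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
```

The two computations agree, which confirms that OLS on the transformed data reproduces (X′Σ−1X)−1X′Σ−1y.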
4.6. Feasible GLS
The problem is that Σ isn't usually known, so this estimator isn't available.
• Consider the dimension of Σ: it's an n × n matrix with (n2 − n)/2 + n =
(n2 + n)/2 unique elements.
• The number of parameters to estimate is larger than n and increases faster
than n. There's no way to devise an estimator that satisfies a law of large
numbers without adding restrictions.
• The feasible GLS estimator is based upon making sufficient assumptions
regarding the form of Σ so that a consistent estimator can be devised.
Suppose that we parameterize Σ as a function of X and θ, where θ may include β
as well as other parameters, so that
Σ = Σ(X, θ)
where θ is of fixed dimension. Assuming that the parametrization is correct, so that in
fact Σ = Σ(X, θ), and if we can consistently estimate θ, then we can consistently
estimate Σ (as long as Σ(X, θ) is a continuous function of θ). In this case,
Σ̂ = Σ(X, θ̂) →p Σ(X, θ)
If we replace Σ in the formulas for the GLS estimator with Σ̂, we obtain the FGLS
estimator. The FGLS estimator shares the same asymptotic properties as
GLS. These are
(1) Consistency
(2) Asymptotic normality
(3) Asymptotic efficiency if the errors are normally distributed (Cramer-Rao).
(4) Test procedures are asymptotically valid.
In practice, the usual way to proceed is
(1) Define a consistent estimator of θ. This is a case-by-case proposition, de-
pending on the parametrization Σ(θ). We'll see examples below.
(2) Form Σ̂ = Σ(X, θ̂)
(3) Calculate the Cholesky factorization P̂ = Chol(Σ̂−1).
(4) Transform the model using
P̂y = P̂Xβ + P̂ε
(5) Estimate using OLS on the transformed model.
4.7. Heteroscedasticity
Heteroscedasticity is the case where
E(εε′) = Σ
is a diagonal matrix, so that the errors are uncorrelated, but have different vari-
ances. Heteroscedasticity is usually thought of as associated with cross sectional
data, though there is absolutely no reason why time series data cannot also be
heteroscedastic. Actually, the popular ARCH (autoregressive conditionally het-
eroscedastic) models that you may hear about in your finance classes explicitly
assume that a time series is heteroscedastic.
Consider a supply function
qi = β1 + βpPi + βsSi + εi
where Pi is price and Si is some measure of the size of the ith firm. One might suppose
that unobservable factors (e.g., talent of managers, degree of coordination between
production units, etc.) account for the error term εi. If there is more variability in
these factors for large firms than for small firms, then εi may have a higher variance
when Si is high than when it is low.
Another example is individual demand:
qi = β1 + βpPi + βmMi + εi
where P is price and M is income. In this case, εi can reflect variations in preferences.
There are more possibilities for the expression of preferences when one is rich, so it is
possible that the variance of εi could be higher when M is high.
Add example of group means.
4.7.1. Detection. There exist many tests for the presence of heteroscedasticity.
We'll discuss three methods.
4.7.1.1. Goldfeld-Quandt. The sample is divided into three parts, with n1, n2
and n3 observations, where n1 + n2 + n3 = n. The model is estimated using the first
and third parts of the sample, separately, so that β̂1 and β̂3 will be independent.
Then we have
ε̂1′ε̂1/σ2 = ε1′M1ε1/σ2 →d χ2(n1 − K)
and
ε̂3′ε̂3/σ2 = ε3′M3ε3/σ2 →d χ2(n3 − K)
so
[ε̂1′ε̂1/(n1 − K)] / [ε̂3′ε̂3/(n3 − K)] →d F(n1 − K, n3 − K).
The distributional result is exact if the errors are normally distributed. This test is
a two-tailed test. Alternatively, and probably more conventionally, if one has prior
ideas about the possible magnitudes of the variances of the observations, one could
order the observations accordingly, from largest to smallest. In this case, one would
use a conventional one-tailed F-test. Draw picture.
• Ordering the observations is an important step if the test is to have any
power.
• The motive for dropping the middle observations is to increase the differ-
ence between the average variance in the subsamples, supposing that there
exists heteroscedasticity. This can increase the power of the test. On the
other hand, dropping too many observations will substantially increase the
variance of the statistics ε̂1′ε̂1 and ε̂3′ε̂3. A rule of thumb, based on Monte
Carlo experiments, is to drop around 25% of the observations.
• If one doesn't have any ideas about the form of the heteroscedasticity, the test will
probably have low power since a sensible data ordering isn't available.
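The computation can be sketched as follows (Python/NumPy, simulated data; this variant uses two equal-sized outer parts after dropping the middle 25%, which is one common way of implementing the split described above):

```python
import numpy as np

def goldfeld_quandt(y, X, drop_frac=0.25):
    """Goldfeld-Quandt statistic.  The data are assumed to be ordered so
    that observations suspected of having the larger variance come first.
    The middle drop_frac of the sample is dropped, the model is fit
    separately on the two remaining parts, and the ratio of the two
    SSR/df terms is returned."""
    n, k = X.shape
    n1 = (n - int(drop_frac * n)) // 2
    def ssr(yp, Xp):
        b, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
        e = yp - Xp @ b
        return e @ e
    return (ssr(y[:n1], X[:n1]) / (n1 - k)) / (ssr(y[n - n1:], X[n - n1:]) / (n1 - k))

# hypothetical data: error variance rises with x; observations ordered
# from largest suspected variance to smallest, as in the one-tailed test
rng = np.random.default_rng(1)
n = 300
x = np.sort(rng.uniform(0.0, 10.0, n))[::-1]
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
gq = goldfeld_quandt(y, X)
```

With this ordering a value of the statistic well above 1 points to heteroscedasticity, to be compared with the appropriate F critical value.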
4.7.1.2. White's test. When one has little idea whether there exists heteroscedasticity,
and no idea of its potential form, the White test is a possibility. The idea is that if
there is homoscedasticity, then
E(ε2t | xt) = σ2, ∀t
so that xt or functions of xt shouldn't help to explain E(ε2t). The test works as
follows:
(1) Since εt isn't available, use the consistent estimator ε̂t instead.
(2) Regress
ε̂2t = σ2 + z′tγ + vt
where zt is a P-vector. zt may include some or all of the variables in xt, as
well as other variables. White's original suggestion was to use xt, plus the
set of all unique squares and cross products of variables in xt.
(3) Test the hypothesis that γ = 0. The qF statistic in this case is
qF = [(ESSR − ESSU)/P] / [ESSU/(n − P − 1)]
Note that ESSR = TSSU, so dividing both numerator and denominator by
this we get
qF = (n − P − 1) R2/(1 − R2)
Note that this is the R2 of the artificial regression used to test for het-
eroscedasticity, not the R2 of the original model.
An asymptotically equivalent statistic, under the null of no heteroscedasticity (so
that R2 should tend to zero), is
nR2 ∼a χ2(P).
This doesn't require normality of the errors, though it does assume that the fourth
moment of εt is constant, under the null. Question: why is this necessary?
• The White test has the disadvantage that it may not be very powerful unless
the zt vector is chosen well, and this is hard to do without knowledge of the
form of heteroscedasticity.
• It also has the problem that specification errors other than heteroscedastic-
ity may lead to rejection.
• Note: the null hypothesis of this test may be interpreted as θ = 0 for the
variance model V(εt) = h(α + z′tθ), where h(·) is an arbitrary function
of unknown form. The test is more general than it may appear from the
regression that is used.
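A sketch of the nR2 version of the test, using White's original choice of zt (the regressors plus their unique squares and cross products); the data are simulated for illustration, and gretl offers the same test from a model's Tests menu:

```python
import numpy as np

def white_test(y, X):
    """White's nR2 statistic.  X must contain the constant in its first
    column; z_t is x_t plus all unique squares and cross products."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    prods = [X[:, i] * X[:, j] for i in range(1, k) for j in range(i, k)]
    Z = np.column_stack([X] + prods)
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    v = e2 - Z @ g
    tss = (e2 - e2.mean()) @ (e2 - e2.mean())
    r2 = 1.0 - (v @ v) / tss
    return n * r2, Z.shape[1] - 1   # statistic and its chi^2 df, P

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y_het = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
y_hom = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
stat_het, P = white_test(y_het, X)
stat_hom, _ = white_test(y_hom, X)
```

With heteroscedastic errors the statistic is far beyond the χ2(P) critical value, while with homoscedastic errors it stays small.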
4.7.1.3. Plotting the residuals. A very simple method is to simply plot the resid-
uals (or their squares). Draw pictures here. Like the Goldfeld-Quandt test, this will
be more informative if the observations are ordered according to the suspected form
of the heteroscedasticity.
4.7.2. Dealing with heteroscedasticity if it is present. Correcting for het-
eroscedasticity requires that a parametric form for Σ(θ) be supplied, and that a
means for estimating θ consistently be determined. The estimation method will
be specific to the form supplied for Σ(θ). We'll consider two examples, multiplicative
HET and HET by groups. Before this, let's consider using OLS, even if we have
HET. The advantage of this is that we don't need to specify the form of Σ(θ).
4.7.2.1. OLS with heteroscedasticity-consistent covariance matrix estimation. Eicker
(1967) and White (1980) showed how to modify test statistics to account for het-
eroscedasticity of unknown form. The OLS estimator has asymptotic distribution
√n(β̂ − β) →d N(0, QX−1 Ω QX−1)
as we've already seen. Recall that we defined
limn→∞ E(X′εε′X/n) = Ω
This matrix has dimension K × K and can be consistently estimated, even if we
can't estimate Σ consistently. The consistent estimator, under heteroscedasticity
but no autocorrelation, is
Ω̂ = (1/n) ∑t=1..n xt x′t ε̂2t
One can then modify the previous test statistics to obtain tests that are valid when
there is heteroscedasticity of unknown form. For example, the Wald test for H0 :
Rβ − r = 0 would be
n(Rβ̂ − r)′ [R (X′X/n)−1 Ω̂ (X′X/n)−1 R′]−1 (Rβ̂ − r) ∼a χ2(q)
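In finite samples the same idea gives the familiar "sandwich" covariance estimate V̂(β̂) = (X′X)−1 [∑t xtx′t ε̂2t] (X′X)−1, where the 1/n factors cancel. A sketch with simulated data (gretl produces these robust standard errors as an estimation option):

```python
import numpy as np

def ols_hc0(y, X):
    """OLS with the Eicker-White heteroscedasticity-consistent covariance
    estimate (X'X)^{-1} X' diag(e_t^2) X (X'X)^{-1}."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    bread = np.linalg.inv(X.T @ X)
    meat = (X * (e ** 2)[:, None]).T @ X
    return b, bread @ meat @ bread

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + (0.5 + 0.3 * x) * rng.normal(size=n)
b, V = ols_hc0(y, X)
se_robust = np.sqrt(np.diag(V))
```

The robust standard errors can then be used for t and Wald tests that remain valid under heteroscedasticity of unknown form.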
4.7.2.2. Multiplicative heteroscedasticity. Suppose the model is
yt = x′tβ + εt
σ2t = E(ε2t) = (z′tγ)^δ
but the other classical assumptions hold. In this case
ε2t = (z′tγ)^δ + vt
and vt has mean zero. Nonlinear least squares could be used to estimate γ and
δ consistently, were εt observable. The solution is to substitute the squared OLS
residuals ε̂2t in place of ε2t, since ε̂2t is consistent by the Slutsky theorem. Once we
have γ̂ and δ̂, we can estimate σ2t consistently using
σ̂2t = (z′tγ̂)^δ̂ →p σ2t.
In the second step, we transform the model by dividing by the estimated standard
deviation:
yt/σ̂t = (x′t/σ̂t)β + εt/σ̂t
or
y∗t = x∗′t β + ε∗t.
Asymptotically, this model satisfies the classical assumptions.
• This model is a bit complex in that NLS is required to estimate the model
of the variance. A simpler version would be
yt = x′tβ + εt
σ2t = E(ε2t) = σ2 zt^δ
where zt is a single variable. There are still two parameters to be estimated,
and the model of the variance is still nonlinear in the parameters. However,
the search method can be used in this case to reduce the estimation problem
to repeated applications of OLS.
• First, we define an interval of reasonable values for δ, e.g., δ ∈ [0, 3].
• Partition this interval into M equally spaced values, e.g., 0, .1, .2, ..., 2.9, 3.
• For each of these values δm, calculate the variable zt^δm.
• The regression
ε̂2t = σ2 zt^δm + vt
is linear in the parameters, conditional on δm, so one can estimate σ2 by
OLS.
• Save the pairs (σ̂2m, δm), and the corresponding ESSm. Choose the pair with
the minimum ESSm as the estimate.
• Next, divide the model by the estimated standard deviations.
• One can refine the grid around the chosen value. Draw picture.
• This works well when the parameter to be searched over is low dimensional, as
in this case.
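The search steps can be sketched as below for the single-variable case σ2t = σ2 zt^δ (Python/NumPy, simulated data; the grid, names, and data are our own):

```python
import numpy as np

def multiplicative_fgls(y, X, z, deltas=np.arange(0.0, 3.01, 0.1)):
    """Grid-search FGLS for the variance model sigma2_t = sigma2 * z_t**delta.
    For each delta on the grid, e_hat^2 = sigma2 * z**delta is linear in
    sigma2 (one regressor, no constant), so sigma2 comes from OLS; the
    (sigma2, delta) pair with smallest ESS is kept, and the model is then
    re-estimated after dividing by the fitted standard deviations."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    best_ess, best_s2, best_d = np.inf, None, None
    for d in deltas:
        w = z ** d
        s2 = (w @ e2) / (w @ w)          # OLS slope without constant
        ess = np.sum((e2 - s2 * w) ** 2)
        if ess < best_ess:
            best_ess, best_s2, best_d = ess, s2, d
    sd = np.sqrt(best_s2 * z ** best_d)
    bw, *_ = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)
    return bw, best_s2, best_d

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.0, 10.0, n)
z = rng.uniform(1.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.sqrt(0.5 * z ** 2) * rng.normal(size=n)
bw, s2, d = multiplicative_fgls(y, X, z)
```

Each grid point costs only one OLS regression, which is the point of the search method.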
4.7.2.3. Groupwise heteroscedasticity. A common case is where we have repeated
observations on each of a number of economic agents: e.g., 10 years of macroeco-
nomic data on each of a set of countries or regions, or daily observations of trans-
actions of 200 banks. This sort of data is a pooled cross-section time-series model.
It may be reasonable to presume that the variance is constant over time within
the cross-sectional units, but that it differs across them (e.g., firms or countries of
different sizes...). The model is
yit = x′itβ + εit
E(ε2it) = σ2i, ∀t
where i = 1, 2, ..., G are the agents, and t = 1, 2, ..., n are the observations on each
agent.
• The other classical assumptions are presumed to hold.
• In this case, the variance σ2i is specific to each agent, but constant over the
n observations for that agent.
• In this model, we assume that E(εitεis) = 0, t ≠ s. This is a strong assumption
that we'll relax later.
To correct for heteroscedasticity, just estimate each σ2i using the natural estimator:
σ̂2i = (1/n) ∑t=1..n ε̂2it
• Note that we use 1/n here since it's possible that there are more than n
regressors, so n − K could be negative. Asymptotically the difference is
unimportant.
• With each of these, transform the model as usual:
yit/σ̂i = (x′it/σ̂i)β + εit/σ̂i
Do this for each cross-sectional group. The transformed model satisfies the
classical assumptions, asymptotically.
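A sketch of groupwise FGLS with two simulated groups (Python/NumPy; σ̂2i uses the 1/n divisor discussed above, and all names and data are our own):

```python
import numpy as np

def groupwise_fgls(y, X, groups):
    """FGLS under groupwise heteroscedasticity: sigma_i^2 is estimated as
    the mean squared OLS residual within group i, and the model is then
    divided through by sigma_i."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    sd = np.empty(len(y))
    for g in np.unique(groups):
        m = groups == g
        sd[m] = np.sqrt(np.mean(e[m] ** 2))
    bw, *_ = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)
    return bw, sd

# hypothetical two-group example: group 1 errors are 3x as volatile
rng = np.random.default_rng(5)
n = 400
groups = np.repeat([0, 1], n // 2)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
sigma = np.where(groups == 0, 1.0, 3.0)
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)
bw, sd = groupwise_fgls(y, X, groups)
```

The estimated group standard deviations recover the true values 1 and 3 quite closely, and the weighted regression reuses plain OLS on the transformed data.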
4.8. Example
4.8.1. Example: the Nerlove model. Let's check the Nerlove data for evi-
dence of heteroscedasticity. In what follows, we're going to use the model with the
constant and output coefficient varying across 5 groups, but with the input price
coefficients fixed (see Equation 2.5.2). If you plot the residuals of this model, you
obtain Figure 4.8.1. We can see pretty clearly that the error variance is larger for
small firms than for larger firms.
As part of your next Docencia Tutoritzada project, you will use the White and
Goldfeld-Quandt tests to confirm that homoscedasticity is strongly rejected.
Figure 4.8.1. Residuals, Nerlove model, sorted by firm size
4.9. Tercer Projecte de Docència Tutoritzada
(1) Wisconsin data
(a) Download the Wisconsin data on height and income
(b) Select the observations with complete information on height and
income.
(c) Create a dummy variable indicating whether the person is a woman or a man
(d) Create new variables "AD" and "IQD" that express height and IQ in devia-
tions from their sample means.
(e) Estimate the model renda = b1 + b2*Dona + b3*AD + b4*(Dona*AD)
+ b5*IQD + e by OLS.
(f) Comment on the results
(g) Check whether there is heteroscedasticity
(i) by plotting the residuals
(ii) with the Goldfeld-Quandt test
(iii) with the White test
(h) Estimate again by OLS, but with robust standard errors.
Compare the results with the previous ones.
(i) Do a Generalized LS estimation, supposing that there is groupwise heteroscedas-
ticity. There are two groups - men and women. Comment on the
results.
(j) Do a Generalized LS estimation, using GRETL's heteroskedasticity
correction option. Comment on the results.
(2) Nerlove data
(a) Re-estimate the model with dummy variables and interaction terms
from the Primer Projecte de Docència Tutoritzada
ln(cost) = ∑j=1..5 αjdj + ∑j=1..5 γj[dj ln(output)] + β3 ln(labor) + β4 ln(fuel) + β5 ln(capital) + ε
(b) Test the null hypothesis "the errors are homoscedastic" with the
White test.
(c) Make plots of the residuals, and comment on whether heteroscedasticity is detected.
You should obtain a plot similar to Figure 4.8.1.
(d) Do a Generalized LS estimation, using GRETL's heteroskedasticity
correction option. Comment on the results.
4.10. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition,
you might also consider exercises 11.1, 11.2, 11.6, 11.15, 11.16 from Gujarati, pp.
413-421.
CHAPTER 5
Autocorrelation
5.1. Introduction
• Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is autocorrelation (AUT)?
(2) What are the properties of the OLS estimator when there is autocorrelation?
(3) How can the presence of autocorrelation be detected?
(4) How can we deal with autocorrelation if it is present?
• Readings:
Gujarati, Econometria, (cuarta edicion), Chapter 12: Autocorrelación:
¿qué sucede si los términos error están correlacionados?, pp. 425 - 486.
5.2. Motivation
Autocorrelation, which is the serial correlation of the error term, so that E(εtεs) ≠
0 for t ≠ s, is a problem that is usually associated with time series data, but it can
also affect cross-sectional data. For example, a shock to oil prices will simultaneously
affect all countries, so one could expect contemporaneous correlation of macroeco-
nomic variables across countries. Seasonality is another common problem.
Consider the Keeling-Whorf.gdt data. If we regress CO2 concentration on a time
trend, we obtain the fitted line in Figure 5.2.1. The residuals from the same model are in
Figure 5.2.2. In addition to a high frequency monthly pattern in the residuals, there
is a long term low frequency wave. It is clear that the errors of this model are not
independent over time. This is an example of autocorrelation.
Figure 5.2.1. Keeling-Whorf CO2 data, fit using time trend
Figure 5.2.2. Keeling-Whorf CO2 data, residuals using time trend
If you examine the residuals of the simple Nerlove model (equation 5.9.1) in
Figure 4.8.1, you can also detect that there appears to be autocorrelation.
In this chapter, we will explore the causes, effects and treatments for AUT.
5.3. Causes
Autocorrelation is the existence of correlation across the error terms:
E(εtεs) ≠ 0, t ≠ s.
Why might this occur? Plausible explanations include:
(1) Lags in adjustment to shocks. In a model such as
yt = x′tβ + εt,
one could interpret x′tβ as the equilibrium value. Suppose xt is constant over
a number of observations. One can interpret εt as a shock that moves the
system away from equilibrium. If the time needed to return to equilibrium
is long with respect to the observation frequency, one could expect εt+1 to
be positive, conditional on εt positive, which induces a correlation.
(2) Unobserved factors that are correlated over time. The error term is often
assumed to correspond to unobservable factors. If these factors are corre-
lated, there will be autocorrelation.
(3) Misspecification of the model. Suppose that the data generating process
(DGP) is
yt = β0 + β1xt + β2x2t + εt
but we estimate
yt = β0 + β1xt + εt
Figure 5.3.1. Autocorrelation induced by misspecication
The effects are illustrated in Figure 5.3.1. A similar problem might explain
the residuals of the simple Nerlove model, in Figure 4.2.1.
5.4. Effects on the OLS estimator
The variance of the OLS estimator is the same as in the case of heteroscedasticity
- the standard formula does not apply. The correct formula is given in equation 4.4.1.
Next we discuss two GLS corrections for OLS.
5.5. Corrections
There are many types of autocorrelation. The way to correct for the problem
depends on the exact type of autocorrelation that exists. We'll consider two ex-
amples. The first is the most commonly encountered case: autoregressive order 1
(AR(1)) errors.
5.5.1. AR(1). The model is
yt = x′tβ + εt
εt = ρεt−1 + ut
ut ∼ iid(0, σ2u)
E(εtus) = 0, t < s
We assume that the model satisfies the other classical assumptions.
• We need a stationarity assumption: |ρ| < 1. Otherwise the variance of εt
explodes as t increases, so standard asymptotics will not apply.
• By recursive substitution we obtain
εt = ρεt−1 + ut = ρ(ρεt−2 + ut−1) + ut = ρ^2 εt−2 + ρut−1 + ut
= ρ^2 (ρεt−3 + ut−2) + ρut−1 + ut = · · ·
In the limit the lagged ε drops out, since ρ^m → 0 as m → ∞, so we obtain
εt = ∑m=0..∞ ρ^m ut−m
With this, the variance of εt is found as
E(ε2t) = σ2u ∑m=0..∞ ρ^2m = σ2u/(1 − ρ2)
• If we had directly assumed that εt were covariance stationary, we could
obtain this using
V(εt) = ρ2 E(ε2t−1) + 2ρ E(εt−1ut) + E(u2t) = ρ2 V(εt) + σ2u,
so
V(εt) = σ2u/(1 − ρ2)
• The variance is the 0th order autocovariance: γ0 = V(εt)
• Note that the variance does not depend on t
Likewise, the first order autocovariance γ1 is
γ1 = Cov(εt, εt−1) = E[(ρεt−1 + ut) εt−1] = ρV(εt) = ρσ2u/(1 − ρ2)
• Using the same method, we find that for s < t
Cov(εt, εt−s) = γs = ρ^s σ2u/(1 − ρ2)
• The autocovariances don't depend on t: the process εt is covariance sta-
tionary
The correlation (in general, for r.v.'s x and y) is defined as
corr(x, y) = cov(x, y)/[sd(x) sd(y)]
but in this case the two standard deviations are the same, so the s-order autocorrelation
ρs is
ρs = ρ^s
• All this means that the overall matrix Σ has the form

Σ = [σ2u/(1 − ρ2)] ×
[ 1          ρ          ρ^2    · · ·  ρ^(n−1)
  ρ          1          ρ      · · ·  ρ^(n−2)
  ...                   ...           ...
  ρ^(n−1)    ρ^(n−2)    · · ·  ρ      1       ]

where the leading factor is the common variance and the matrix is the
correlation matrix. So we have homoscedasticity, but the elements off the
main diagonal are not zero. All of this depends on only two parameters,
ρ and σ2u. If we can estimate these consistently, we can apply FGLS.
It turns out that it's easy to estimate these consistently. The steps are
(1) Estimate the model yt = x′tβ + εt by OLS.
(2) Take the residuals, and estimate the model
ε̂t = ρε̂t−1 + u∗t
Since ε̂t →p εt, this regression is asymptotically equivalent to the regression
εt = ρεt−1 + ut
which satisfies the classical assumptions. Therefore, the ρ̂ obtained by applying
OLS to ε̂t = ρε̂t−1 + u∗t is consistent. Also, since û∗t →p ut, the estimator
σ̂2u = (1/n) ∑t=2..n (û∗t)^2 →p σ2u
(3) With the consistent estimators σ̂2u and ρ̂, form Σ̂ = Σ(σ̂2u, ρ̂) using the
previous structure of Σ, and estimate by FGLS. Actually, one can omit the
factor σ2u/(1 − ρ2), since it cancels out in the formula
β̂FGLS = (X′Σ̂−1X)−1(X′Σ̂−1y).
• An asymptotically equivalent approach is to simply estimate the trans-
formed model
yt − ρ̂yt−1 = (xt − ρ̂xt−1)′β + u∗t
using n − 1 observations (since y0 and x0 aren't available). This is the
method of Cochrane and Orcutt. Dropping the first observation is asymp-
totically irrelevant, but it can be very important in small samples. One can
recover the first observation by putting
y∗1 = y1 √(1 − ρ̂2)
x∗1 = x1 √(1 − ρ̂2)
Note that the variance of y∗1 is σ2u, asymptotically, so we see that the trans-
formed model will be homoscedastic (and nonautocorrelated, since the u's
are uncorrelated with the y's in different time periods).
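One round of the Cochrane-Orcutt procedure, with the first observation recovered via the √(1 − ρ̂2) scaling, can be sketched as follows (Python/NumPy, simulated data; gretl implements AR(1) corrections of this kind as estimation commands):

```python
import numpy as np

def cochrane_orcutt(y, X):
    """One Cochrane-Orcutt round: estimate rho by regressing the OLS
    residual on its own lag, then apply OLS to the quasi-differenced
    model, keeping the first observation with the sqrt(1-rho^2) scaling."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
    s = np.sqrt(1.0 - rho ** 2)
    ys = np.concatenate([[y[0] * s], y[1:] - rho * y[:-1]])
    Xs = np.vstack([X[0] * s, X[1:] - rho * X[:-1]])
    bw, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return bw, rho

# simulated model with AR(1) errors, rho = 0.8
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + eps
bw, rho_hat = cochrane_orcutt(y, X)
```

The estimated ρ̂ lands close to the true 0.8, and the quasi-differenced regression recovers β with valid classical inference, asymptotically.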
5.5.2. MA(1). The linear regression model with moving average order 1 errors
is
yt = x′tβ + εt
εt = ut + φut−1
ut ∼ iid(0, σ2u)
E(εtus) = 0, t < s
In this case,
V(εt) = γ0 = E[(ut + φut−1)^2] = σ2u + φ2σ2u = σ2u(1 + φ2)
Similarly
γ1 = E[(ut + φut−1)(ut−1 + φut−2)] = φσ2u
and
γ2 = E[(ut + φut−1)(ut−2 + φut−3)] = 0
so in this case

Σ = σ2u ×
[ 1 + φ2   φ        0       · · ·  0
  φ        1 + φ2   φ              ...
  0        φ        . . .          0
  ...               . . .   φ
  0        · · ·    φ       1 + φ2 ]

Note that the first order autocorrelation is
ρ1 = γ1/γ0 = φσ2u/[σ2u(1 + φ2)] = φ/(1 + φ2)
• This achieves a maximum at φ = 1 and a minimum at φ = −1, and the
maximal and minimal first-order autocorrelations are 1/2 and −1/2. Therefore,
series that are more strongly autocorrelated can't be MA(1) processes.
Again the covariance matrix has a simple structure that depends on only two pa-
rameters. The problem in this case is that one can't estimate φ using OLS on
εt = ut + φut−1
because the ut are unobservable and they can't be estimated consistently. However,
there is a simple way to estimate the parameters.
• Since the model is homoscedastic, we can estimate
V(εt) = σ2ε = σ2u(1 + φ2)
using the typical estimator:
σ̂2ε = (1/n) ∑t=1..n ε̂2t
• By the Slutsky theorem, we can interpret this as defining an (unidentified)
estimator of both σ2u and φ, e.g., use this as
σ̂2u(1 + φ̂2) = (1/n) ∑t=1..n ε̂2t
However, this isn't sufficient to define consistent estimators of the parame-
ters, since it's unidentified.
• To solve this problem, estimate the covariance of εt and εt−1 using
Ĉov(εt, εt−1) = (1/n) ∑t=2..n ε̂t ε̂t−1
This is a consistent estimator, following a LLN (and given that the ε̂'s
are consistent for the ε's). As above, this can be interpreted as
defining an unidentified estimator:
φ̂σ̂2u = (1/n) ∑t=2..n ε̂t ε̂t−1
• Now solve these two equations to obtain identified (and therefore consistent)
estimators of both φ and σ2u. Define the consistent estimator
Σ̂ = Σ(φ̂, σ̂2u)
following the form we've seen above, and transform the model using the
Cholesky decomposition. The transformed model satisfies the classical as-
sumptions asymptotically.
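The two moment equations can in fact be solved explicitly: ρ1 = φ/(1 + φ2) gives the quadratic ρ1φ2 − φ + ρ1 = 0, whose invertible root (|φ| < 1) is φ = [1 − √(1 − 4ρ1²)]/(2ρ1), which requires |ρ1| < 1/2 as noted above. A sketch with simulated errors (Python/NumPy; in practice ε̂ would be OLS residuals):

```python
import numpy as np

def ma1_moments(e):
    """Recover (phi, sigma2_u) from gamma0_hat = mean(e^2) and
    gamma1_hat = mean(e_t e_{t-1}), using gamma0 = sigma2_u (1 + phi^2)
    and gamma1 = phi sigma2_u.  Takes the invertible root |phi| < 1;
    requires the sample first-order autocorrelation |rho1| < 1/2."""
    g0 = np.mean(e ** 2)
    g1 = np.mean(e[1:] * e[:-1])
    rho1 = g1 / g0
    # rho1 = phi/(1+phi^2)  =>  rho1*phi^2 - phi + rho1 = 0
    phi = (1.0 - np.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)
    sigma2_u = g0 / (1.0 + phi ** 2)
    return phi, sigma2_u

# simulated MA(1) errors with phi = 0.5, sigma2_u = 1
rng = np.random.default_rng(7)
n = 2000
u = rng.normal(size=n + 1)
e = u[1:] + 0.5 * u[:-1]
phi_hat, s2u_hat = ma1_moments(e)
```

With the estimates in hand, Σ̂ = Σ(φ̂, σ̂2u) can be built following the tridiagonal form above and used for FGLS.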
5.6. valid inferences with autocorrelation of unknown form
In Section 4.7.2.1 we saw that it is possible to consistently estimate the correct covariance matrix of the OLS estimator when there is HET. It is also possible to do this when there is AUT, or both HET and AUT. The details are beyond the scope of this course.
It is important to remember that a correction for autocorrelation will only give an efficient estimator and valid test statistics if the model of autocorrelation is correct. It may be hard to determine which is the correct model for the autocorrelation of the errors, so one may prefer to forgo the GLS correction and simply use OLS. If this is done, one needs to account for the existence of AUT when estimating the covariance of the parameters, to obtain correct test statistics. We will see examples in the Projecte de Docència Tutoritzada.
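One such autocorrelation-robust covariance estimator is the Newey-West (HAC) sandwich estimator. Since the details are beyond the course's scope, the following numpy sketch is only an illustration of the idea, not gretl's implementation; all parameter choices (sample size, AR coefficients, L = 10 lags with Bartlett weights) are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a regression with autocorrelated regressor and autocorrelated errors
n, rho_x, rho_e = 2000, 0.7, 0.7
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = rho_x * x[t - 1] + rng.normal()
    e[t] = rho_e * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

# OLS fit and residuals
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
ehat = y - X @ beta

# Classical OLS covariance (invalid under autocorrelation)
s2 = ehat @ ehat / (n - 2)
V_ols = s2 * XtX_inv

# Newey-West "meat": sum e_t^2 x_t x_t' plus Bartlett-weighted cross terms
L = 10
Xe = X * ehat[:, None]
meat = Xe.T @ Xe
for j in range(1, L + 1):
    w = 1.0 - j / (L + 1.0)
    G = Xe[j:].T @ Xe[:-j]
    meat += w * (G + G.T)
V_nw = XtX_inv @ meat @ XtX_inv

# With positive autocorrelation in both x and the errors, the robust
# standard error of the slope exceeds the (misleadingly small) classical one.
print(np.sqrt(V_ols[1, 1]), np.sqrt(V_nw[1, 1]))
```

In gretl the analogous option is to request robust (HAC) standard errors when estimating by OLS, which is the route suggested in the paragraph above.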
5.7. Testing for autocorrelation
Breusch-Godfrey test
This test uses an auxiliary regression, as does the White test for heteroscedasticity. The regression is
$$\hat\varepsilon_t = x_t'\delta + \gamma_1\hat\varepsilon_{t-1} + \gamma_2\hat\varepsilon_{t-2} + \cdots + \gamma_P\hat\varepsilon_{t-P} + v_t$$
and the test statistic is the $nR^2$ statistic, just as in the White test. There are P restrictions, so the test statistic is asymptotically distributed as a $\chi^2(P)$.
• The intuition is that the lagged errors shouldn't contribute to explaining
the current error if there is no autocorrelation.
• $x_t$ is included as a regressor to account for the fact that the $\hat\varepsilon_t$ are not independent even if the $\varepsilon_t$ are. This is a technicality that we won't go into here.
• This test is valid even if the regressors are stochastic and contain lagged
dependent variables.
• The alternative is not that the model is an AR(P), following the argument above. The alternative is simply that some or all of the first P autocorrelations are different from zero. This is compatible with many specific forms of autocorrelation.
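The mechanics of the test can be sketched with a small simulation. This is an illustration of the $nR^2$ statistic, not gretl's implementation (in gretl one would use the built-in autocorrelation test after OLS); the data-generating values and variable names are assumptions for the example, with P = 1 lag and the 5% $\chi^2(1)$ critical value 3.84:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = 1 + 2x + eps with AR(1) errors, rho = 0.6
n, rho = 500, 0.6
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps

# Step 1: OLS of y on [1, x]; keep the residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ehat = y - X @ beta

# Step 2: auxiliary regression of ehat_t on x_t and ehat_{t-1} (P = 1)
Z = np.column_stack([X[1:], ehat[:-1]])
d = ehat[1:]
delta = np.linalg.lstsq(Z, d, rcond=None)[0]
resid = d - Z @ delta

# Step 3: test statistic n * R^2, asymptotically chi-squared(P) under H0
R2 = 1.0 - resid @ resid / ((d - d.mean()) @ (d - d.mean()))
bg_stat = len(d) * R2
print(bg_stat > 3.84)  # compare to the 5% critical value of chi2(1)
```

With strongly autocorrelated errors the lagged residual explains the current one, $R^2$ in the auxiliary regression is far from zero, and the null of no autocorrelation is rejected.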
5.8. Lagged dependent variables and autocorrelation: A Caution
We've seen that the OLS estimator is consistent under autocorrelation, as long as
$$\text{plim}\,\frac{X'\varepsilon}{n} = 0.$$
This will be the case when $E(X'\varepsilon) = 0$, following a LLN. An important exception is the case where X contains lagged y's and the errors are autocorrelated. A simple example is the case of a single lag of the dependent variable with AR(1) errors. The model is
$$y_t = x_t'\beta + y_{t-1}\gamma + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$
Now we can write
$$E(y_{t-1}\varepsilon_t) = E\left[(x_{t-1}'\beta + y_{t-2}\gamma + \varepsilon_{t-1})(\rho\varepsilon_{t-1} + u_t)\right] \neq 0$$
since one of the terms is $E(\rho\varepsilon_{t-1}^2)$, which is clearly nonzero. In this case $E(X'\varepsilon) \neq 0$, and therefore $\text{plim}\,X'\varepsilon/n \neq 0$. Since
$$\text{plim}\,\hat\beta = \beta + \text{plim}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}$$
the OLS estimator is inconsistent in this case. One needs to estimate by instrumental
variables (IV). This is a topic that is beyond the scope of this course. It is important
to be aware of the possibility that the OLS estimator can be inconsistent, though.
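The inconsistency is easy to see in a simulation. The following Python sketch uses the simplest case, $y_t = \gamma y_{t-1} + \varepsilon_t$ with no $x_t$, and illustrative parameter values $\gamma = \rho = 0.5$; for this case the known probability limit of the OLS estimator is $(\gamma + \rho)/(1 + \gamma\rho) = 0.8$, not the true 0.5 (this closed-form limit is a standard result, stated here for the example only):

```python
import random

random.seed(1)

# Model: y_t = gamma * y_{t-1} + eps_t, with AR(1) errors eps_t = rho * eps_{t-1} + u_t
gamma, rho, n = 0.5, 0.5, 50000

y, eps = [0.0], 0.0
for _ in range(n):
    u = random.gauss(0.0, 1.0)
    eps = rho * eps + u
    y.append(gamma * y[-1] + eps)

# OLS slope of y_t on y_{t-1}
num = sum(y[t - 1] * y[t] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
gamma_hat = num / den

# Despite the large sample, gamma_hat settles near 0.8,
# not near the true gamma = 0.5: OLS is inconsistent here.
print(gamma_hat)
```

No amount of extra data fixes this, which is exactly the meaning of inconsistency; an IV estimator would be needed instead.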
5.9. Quart Projecte de Docència Tutoritzada
Using the Nerlove data (you have already used these data, but the Excel file is here if needed):
(1) For the simple model
(5.9.1) $$\ln(cost) = \beta_1 + \beta_2\ln(output) + \beta_3\ln(labor) + \beta_4\ln(fuel) + \beta_5\ln(capital) + \varepsilon$$
(a) estimate the model by OLS
(b) use the Breusch-Godfrey test to check for autocorrelation. Important: to do this you will have to give the data a time-series structure.
(c) plot the residuals, and interpret whether or not an autocorrelation problem is visible.
(2) Repeat exercise 1, but using the model
$$\ln(cost) = \sum_{j=1}^{5}\alpha_j d_j + \sum_{j=1}^{5}\gamma_j\left[d_j\ln(output)\right] + \beta_3\ln(labor) + \beta_4\ln(fuel) + \beta_5\ln(capital) + \varepsilon$$
which was presented in Section 2.5.3.
(3) With the Keeling-Whorf.gdt data
(a) estimate the model
$$CO2_t = \beta_1 + \beta_2 t + \varepsilon_t$$
(b) check for autocorrelation using the Breusch-Godfrey test.
(c) plot the residuals
(d) re-estimate the model using the Cochrane-Orcutt and Prais-Winsten methods, and plot the residuals.
(e) comment on all the results
5.10. Chapter Exercises
The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 12.1, 12.8, 12.9, 12.11, 12.14, 12.17, 12.22, 12.26, 12.28 from Gujarati, pp. 472-486.
CHAPTER 6
Data sets
This chapter gives links to the data sets referred to in the Study Guide.
Wisconsin height-income data (comma separated values)
Wisconsin height-income data (Gretl data file)
Nerlove data (Excel spreadsheet file)
Nerlove data (Gretl data file)
Keeling-Whorf CO2 data (Gretl data file)
Cigarette-Alcohol Mortality data (Gretl data file)