8/9/2019 Panel Data I
PANEL DATA WORKSHOP
BRUNEL UNIVERSITY
February 29, 2008.
PART I:
THE ABC OF INSTRUMENTAL VARIABLES
AND GMM ESTIMATION
-
The ABC of instrumental variables and
GMM estimation
Presentation outline
1. Introduction
2. Instrumental variables estimation
3. Empirical example
4. Three important tests
5. Empirical example continued
6. Summary
-
1. Introduction
Econometrics is concerned with the analysis of financial,
business and economic data: time series, cross-sectional
or panel data.
The analysis can be at individual, firm, industry or country
level.
Often the aim is to establish a causal relationship between
various variables.
1. Do bank loans help firms export more?
2. Are women discriminated against in the labour market?
3. Does financial development foster aggregate growth?
-
What about the endogeneity
problem?
-
1. Introduction
Suppose OLS regression of exports on bank loans shows a
positive correlation between the two variables.
Does this correlation imply that bank loans are the cause of
increased exports? Not necessarily!
Maybe exporters are more successful in obtaining bank
loans, or banks favour exporters over non-exporters.
Thus the causality might run from exports (dependent
variable) to bank loans (independent variable).
This is known as the problem of reverse causality or more
generally endogeneity.
Bank loans are potentially endogenous in the model.
-
1. Introduction
As another example, suppose that OLS regression of per
capita GDP growth on financial development shows a
positive relationship between the two variables.
Since it is possible that financial markets develop in
anticipation of future GDP growth, financial development
could be a leading indicator of growth rather than an
exogenous cause of growth.
This creates the problem of endogeneity because finance
and growth are simultaneously determined.
It is important to remember that OLS would be biased when
one or more regressors are endogenous.
-
1. Introduction
The examples illustrate that (i) correlation does not
necessarily imply causality, and (ii) OLS may not always be
an adequate empirical tool in finance.
The objective of this lecture is to first introduce an estimation
technique which is effective at dealing with the problem of
endogeneity.
This technique is known as instrumental variables (IV)
estimation, or more generally as generalised method of
moments (GMM).
Unlike OLS, IV/GMM offer the chance of testing for causal
relationships between economic variables.
-
2. IV estimation
Consider the following regression model:
$y = X\beta + \varepsilon$
For OLS to be unbiased, the matrix of regressors X and the
error term ε should be uncorrelated. That is,
$E(X'\varepsilon) = 0$
In this case we say that the regressors are exogenous.
When at least one of the regressors is correlated with ε,
$E(X'\varepsilon) \neq 0$
Regressors that are correlated with the error term are called
endogenous regressors.
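A one-line derivation shows why this condition matters. It is a standard textbook step rather than something taken from the slides, and it assumes the linear model y = Xβ + ε with X'X invertible:

```latex
\hat{\beta}_{OLS} = (X'X)^{-1}X'y
                  = (X'X)^{-1}X'(X\beta + \varepsilon)
                  = \beta + (X'X)^{-1}X'\varepsilon
```

If X and ε are correlated, the second term does not vanish, even in large samples, so the OLS estimator does not settle on β.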
-
2. IV estimation
Endogeneity could result from a variety of reasons including:
1. Reverse causality.
2. Simultaneity bias.
3. Omitted variables bias.
4. Measurement errors.
Whatever the reason behind endogeneity, the OLS estimator
of β will be biased and inconsistent.
Even if there are several regressors and just one of them is
endogenous, the OLS estimator would still be biased.
-
2. IV estimation
In order to obtain a valid estimator of β and make correct
inference about the relationship between y and X, we need
some additional variables.
The variables which help obtain a consistent estimator of β
are known as instrumental variables (say Z).
Instrumental variables should satisfy two properties:
1. Instrument relevance: they have to be correlated with the
endogenous regressors X.
2. Instrument exogeneity: they have to be uncorrelated with
the error term ε.
-
2. IV estimation
The instrumental variables should only affect the dependent
variable (y) indirectly through their relationship with the
endogenous regressors (X).
In other words, Z should not itself be a regressor in the model.
It is not always easy to come up with valid instruments that
are exogenous to the model AND correlated with X.
Suppose y = per capita GDP growth and X includes an
indicator of financial development.
The concern of endogeneity arises because faster per capita
GDP growth is conducive to financial development.
-
2. IV estimation
If valid instruments that satisfy the properties of relevance
and exogeneity are available, a consistent estimator of β
can be obtained.
This consistent estimator is called the instrumental variables
(IV) estimator, and is denoted as $\hat{\beta}_{IV}$.
Consistency means that as the sample size gets large, the
estimator converges to the true value β.
The formula for the basic IV estimator is
$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$
The IV estimator has an approximately normal distribution in
large samples, so statistical inference such as t-tests can be
conducted in the standard fashion.
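To make the contrast with OLS concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a single regressor x and a single instrument z, in which case the basic IV estimator reduces to the ratio cov(z, y) / cov(z, x). The data-generating process, the function names and the parameter values are illustrative assumptions, not part of the workshop material.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=5000, beta=1.0, seed=0):
    """Return (OLS slope, IV slope) for simulated data with an endogenous x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)   # instrument: drives x, unrelated to the error
        ei = rng.gauss(0.0, 1.0)   # error term of the y equation
        xi = zi + ei               # x is correlated with the error, so it is endogenous
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ei)
    ols_slope = cov(x, y) / cov(x, x)   # biased: picks up cov(x, error)
    iv_slope = cov(z, y) / cov(z, x)    # consistent: uses only the variation due to z
    return ols_slope, iv_slope


if __name__ == "__main__":
    ols_slope, iv_slope = simulate()
    print(f"true beta = 1.0, OLS = {ols_slope:.3f}, IV = {iv_slope:.3f}")
```

In this assumed design the OLS slope converges to 1.5 rather than the true value of 1, while the IV slope should settle near 1 as n grows.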
-
2. IV estimation
The basic IV estimator can be obtained as a two-stage least
squares estimation process:
1. Regress each endogenous regressor on all instruments
and exogenous regressors, and generate predicted values.
2. Estimate the model by OLS, replacing the endogenous
regressors with their predicted values.
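The two steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the routine used in the workshop: it handles a single endogenous regressor, solves the normal equations directly, and the names `solve`, `ols` and `two_stage_least_squares` as well as the toy numbers are made up for this example.

```python
def solve(a, b):
    """Solve the square linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def two_stage_least_squares(y, endog, exog, instruments):
    """2SLS with one endogenous regressor.

    y, endog: lists of floats; exog, instruments: lists of rows (lists).
    Returns [constant, coefficient on endog, coefficients on exog...].
    """
    n = len(y)
    # Stage 1: endogenous regressor on constant, exogenous regressors, instruments.
    first = [[1.0] + list(exog[i]) + list(instruments[i]) for i in range(n)]
    gamma = ols(first, endog)
    endog_hat = [sum(g * v for g, v in zip(gamma, row)) for row in first]
    # Stage 2: y on constant, predicted endogenous regressor, exogenous regressors.
    second = [[1.0, endog_hat[i]] + list(exog[i]) for i in range(n)]
    return ols(second, y)


# Made-up toy numbers, used only to show the call.
w = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]   # exogenous regressor
z = [[0.0], [0.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]]   # instrument
x = [1.0, 1.5, 2.5, 3.5, 0.5, 2.0, 3.0, 2.5]                   # endogenous regressor
y = [3.2, 6.9, 6.1, 10.8, 1.7, 8.3, 6.8, 9.4]                  # dependent variable
print(two_stage_least_squares(y, x, w, z))
```

The coefficients from this manual procedure match those of a dedicated IV routine, but the standard errors from a hand-run second-stage OLS are not valid, which is one reason to rely on dedicated IV commands in practice.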
If the error term is heteroskedastic or serially correlated, there are two options:
a. The IV estimator can be used with robust standard errors.
This option corresponds to the use of robust standard errors
in OLS regressions.
It is the “safest” option, though not the most efficient one.
-
2. IV estimation
b. An efficient version of the IV estimator called the
generalised method of moments (GMM) can also be used.
This option corresponds to the use of Generalised Least
Squares (GLS) in the standard regression analysis.
In small samples, the GMM estimator tends to be inaccurate.
IV-GMM estimation requires at least as many instrumental
variables as endogenous regressors.
When there are more instruments than endogenous
regressors we say that the model is overidentified.
-
IV, GMM, relevance,
exogeneity, overidentification.
-
3. Empirical example
The aim is to test whether access to finance (bank loans)
causes an increase in private firms' exports in China.
The following model is specified (i indexes firms):
$EXPORT_i = \beta_0 + \beta_1 BANK_i + \beta_2 DIST_i + \beta_3 LAB_i + \varepsilon_i$
EXPORT is the log of exports, BANK is the log of bank loans, DIST is
the log of the distance from the city the firm is located in to the
nearest port, and LAB is a dummy variable showing whether the
firm is in a labour-intensive industry or not.
The model has three regressors, two of which, DIST and
LAB, are arguably exogenous (why?).
BANK is potentially endogenous, however.
On the one hand, bank loans might help firms export by
providing them with the necessary financial resources.
-
3. Empirical example
On the other hand, banks might prefer to lend to exporting
firms. So exporting could help secure more bank loans.
Because of this potential problem of simultaneity bias, we
employ IV/GMM.
To start with, explore the following three variables as
potential instruments:
1. POL: A dummy variable indicating whether the firm has
political connections or not.
2. STATE: The share of state-owned enterprises (SOEs) in
the region the firm is located in.
3. EQUITY: The amount of equity/collateral the firm has.
-
3. Empirical example
Arguably, these instrumental variable candidates are
correlated with the endogenous regressor (BANK). Thus
they are likely to be relevant instruments.
Political connections and a high level of collateral help firms obtain
more bank loans, while a strong presence of SOEs is likely to
reduce private firms’ access to finance.
On the other hand, the property of exogeneity requires that
the instruments affect exporting through bank loans alone,
rather than being fundamental drivers of export.
First estimate the model by OLS with robust standard errors,
and then by IV/GMM.
-
3. Empirical example
A peek at the cross-sectional data (N = 5167)
-
3. Empirical example
OLS with robust standard errors:
Bank loans are positively correlated with export, but with
marginal statistical significance.
Should we trust these results? Probably not, because of
simultaneity bias.
-
3. Empirical example
Two-stage least squares (IV) with robust standard errors: first-stage regression
1. POL, STATE and EQUITY are the instrumental variables.
2. First-stage regressions are usually not reported in applied work,
but it is important to inspect them routinely.
-
3. Empirical example
Two-stage least squares (IV) with robust standard errors: second-stage regression
1. All interpretation of the model should be based on the second-stage
regression.
2. Compared to OLS, the 2SLS results appear counterintuitive: bank
loans hurt exports, and distance to port does not seem to matter.
3. We should test the validity of the instruments before taking these
results seriously!
-
4. Three important tests
When working with instrumental variables, three important
tests should be performed as a matter of routine. These are:
1. Testing or checking for the relevance of the instrumental
variable candidates: If the instruments have little or no
correlation with the endogenous regressors, they are called
weak instruments and would bias the IV estimator.
2. Testing for the exogeneity of the instruments: If the
instruments are correlated with the error term, IV would be
invalid.
3. Testing whether the endogenous regressors are really
endogenous: If the regressors are not endogenous after all,
OLS would be the most efficient estimation method.
-
4. Three important tests
1. TESTING FOR INSTRUMENT RELEVANCE:
The idea is to check whether the instruments are sufficiently
correlated with the endogenous regressors.
The simplest way is to test for the joint significance of the
instruments in the first stage regression.
As a rule of thumb, if the calculated F statistic is greater than
10 and the p-value is close to 0, the instruments are likely to be
relevant.
If the instruments are weak, it is advisable to look for
other/additional instruments.
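As a rough illustration of the joint-significance check, the sketch below computes the classical (non-robust) F statistic from the residual sums of squares of a restricted first stage (instruments excluded) and an unrestricted first stage (instruments included). The function names and the four-observation toy data are assumptions made for this example. With a single dummy instrument and no other regressors, the fitted values are simply the overall mean and the group means, which keeps the arithmetic visible.

```python
def rss(values, fitted):
    """Residual sum of squares."""
    return sum((v - f) ** 2 for v, f in zip(values, fitted))


def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """Classical F statistic for q exclusion restrictions.

    n is the number of observations and k the number of parameters
    in the unrestricted regression (including the constant).
    """
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))


# Hypothetical toy data: endogenous regressor x and one dummy instrument z.
z = [0, 0, 1, 1]
x = [0.0, 2.0, 4.0, 6.0]

# Restricted first stage (constant only): fitted value is the overall mean.
mean_x = sum(x) / len(x)
rss_r = rss(x, [mean_x] * len(x))

# Unrestricted first stage (constant + dummy): fitted values are group means.
group_mean = {}
for g in set(z):
    members = [xi for xi, zi in zip(x, z) if zi == g]
    group_mean[g] = sum(members) / len(members)
rss_u = rss(x, [group_mean[zi] for zi in z])

f_value = f_statistic(rss_r, rss_u, q=1, n=len(x), k=2)
print(f_value)   # 8.0
```

Here the F statistic is 8, below the rule-of-thumb value of 10, so this toy instrument would be flagged as potentially weak.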
-
4. Three important tests
2. TESTING FOR INSTRUMENT EXOGENEITY
The instruments should have no correlation with the error.
In order to test for instrument exogeneity, we need to have
more instruments than endogenous regressors. The number
of excess instruments is called the number of overidentifying
restrictions (in our example this number equals 2).
The Sargan/Hansen test can be used to test for IV exogeneity.
The null hypothesis of the test is “All instruments are
valid”.
If the null hypothesis is rejected, it means that at least one of
the instruments is not valid.
The test does not pinpoint which instruments are invalid.
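For reference, one common textbook form of the Sargan statistic is built from the IV residuals. It is given here as background under the assumption of homoskedastic errors, not as the formula shown in the workshop; the symbols n (sample size), m (number of instruments excluded from the model) and k (number of endogenous regressors) are notation assumed for this sketch:

```latex
\hat{\varepsilon} = y - X\hat{\beta}_{IV}, \qquad
S = n \cdot R^2_{\hat{\varepsilon} \mid Z} \;\sim\; \chi^2_{m-k}
\quad \text{under the null}
```

where the R-squared comes from regressing the IV residuals on the full set of instruments and exogenous regressors. A large S means the residuals are still explained by the instruments, which is evidence against instrument exogeneity.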
-
4. Three important tests
3. TESTING FOR ENDOGENEITY OF REGRESSORS.
Even if the instruments are found to be valid, it is a good
idea to test whether it is really necessary to use IV/GMM.
This can be achieved through the Hausman test for the
endogeneity of regressors:
$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' (V_{IV} - V_{OLS})^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS})$
where $V_{IV}$ and $V_{OLS}$ are the variances of the IV and OLS
estimators respectively.
The null hypothesis is: “All regressors are exogenous”.
Under the null, H is distributed as a chi-squared random variable
with degrees of freedom equal to the number of regressors.
If the null hypothesis is not rejected, stick with OLS!
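The arithmetic of the Hausman statistic is easiest to see when only one coefficient is compared. The sketch below is a simplification under that assumption: H reduces to the squared difference between the IV and OLS estimates divided by the difference of their variances, with 1 degree of freedom. The function name and the input numbers are hypothetical.

```python
import math


def hausman_scalar(b_iv, b_ols, var_iv, var_ols):
    """Hausman statistic and p-value for a single coefficient (1 degree of freedom)."""
    diff_var = var_iv - var_ols
    if diff_var <= 0:
        raise ValueError("var_iv must exceed var_ols for the statistic to be defined")
    h = (b_iv - b_ols) ** 2 / diff_var
    # With 1 degree of freedom, P(chi2 > h) = P(|N(0,1)| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates, chosen only to illustrate the arithmetic.
h, p = hausman_scalar(b_iv=1.0, b_ols=0.5, var_iv=0.05, var_ols=0.01)
print(f"H = {h:.2f}, p-value = {p:.4f}")
```

With these made-up inputs H equals 6.25 and the p-value is roughly 0.012, so the null of exogeneity would be rejected at the 5% level and IV would be preferred over OLS.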
-
Relevance
Exogeneity
Endogeneity
TEST TEST TEST!
-
5. Empirical example cont.
We can test for the relevance and exogeneity of the
instruments in our export model as follows:
The command “estat first” tests for instrument relevance.
Since the F statistic is greater than 10 and the p-value = 0,
the problem of weak instruments is probably not too serious.
-
5. Empirical example cont.
The command “estat overid” tests for the exogeneity of
instruments.
The null hypothesis of the test is “POL, EQUITY and STATE
are all exogenous instruments”.
Under the null, the test statistic is distributed as a
chi-squared random variable with 2 degrees of freedom (3
instrumental variables minus 1 endogenous regressor).
The p-value of the test = 0, so we reject the null hypothesis
and conclude that at least one of the instruments is not valid.
Thus the IV results reported earlier should be discarded.
Let’s try re-estimating the model by dropping one of the
instruments (EQUITY).
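The degrees-of-freedom count and the resulting p-value can be checked by hand. The sketch below uses the fact that, for exactly 2 degrees of freedom, the upper-tail probability of a chi-squared variable has the closed form exp(-s/2). The function names are made up for illustration, and the statistic passed in is the textbook 5% critical value, not the value from the workshop output.

```python
import math


def overid_degrees_of_freedom(n_instruments, n_endogenous):
    """Number of overidentifying restrictions."""
    df = n_instruments - n_endogenous
    if df <= 0:
        raise ValueError("the test needs more instruments than endogenous regressors")
    return df


def chi2_pvalue_df2(statistic):
    """Upper-tail p-value of a chi-squared variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


# Three instruments (POL, STATE, EQUITY) and one endogenous regressor (BANK).
df = overid_degrees_of_freedom(n_instruments=3, n_endogenous=1)
print(df)                        # 2
print(chi2_pvalue_df2(5.991))    # close to 0.05, since 5.991 is the 5% critical value
```

Any test statistic well above 5.991 therefore gives a p-value below 5%, and a p-value reported as 0 corresponds to a very large statistic.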
-
5. Empirical example cont.
POL and STATE
are valid instruments
-
5. Empirical example cont.
With valid instruments, the results suggest that bank loans
play a positive and highly significant role in boosting exports.
To check the robustness of this finding, re-estimate the last
model by GMM.
The two sets of results are practically the same, which is
reassuring.
-
5. Empirical example cont.
Finally, test for the endogeneity of BANK.
The Hausman test suggests that BANK is indeed
endogenous. So using OLS would have been problematic.
REJECT THE NULL
-
6. Summary
1. The problem of endogeneity is common in applied
econometrics.
2. IV/GMM offer a way of tackling the problem of
endogeneity.
3. It is important to test for the validity of the
instruments before taking the results from IV/GMM
estimation seriously.
THANK YOU!