
Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13)

Tsun-Feng Chiang*

*School of Economics, Henan University, Kaifeng, China

March 3, 2014

Pooling Cross Sections across Time

Pooled Cross-Section Data
For each time period, a random sample is drawn within a region. Observations are independent but not identically distributed.

Sample (Individual)   Year     y   x1   x2   x3   x4
1                     1990   132   52   33   25    1
2                     1990   340   43   81   22    1
3                     1990   154   22   54   75    0
4                     1990   211   26   97   86    1
5                     1990    95   22   81   32    0
6                     1991   315   54   59   29    1
7                     1991   203   22   65   37    0
8                     1991   184   45   75   26    1
...                   ...    ...  ...  ...  ...  ...


Panel Data (Longitudinal Data)
For each sample unit, observations are collected across time. Therefore, the observations are not independent: some time-invariant characteristics of a unit can affect its observations in every period.
Balanced panel: every unit (individual) is observed in the same time periods.

Sample (Individual)   Year     y   x1   x2   x3   x4
1                     1990   132   52   33   25    1
1                     1991   340   43   81   22    1
1                     1992   154   22   54   75    1
1                     1993   211   26   97   86    1
1                     1994    95   22   81   32    1
2                     1990   315   54   59   29    0
2                     1991   203   22   65   37    0
2                     1992   184   45   75   26    0
2                     1993   319   56   60   34    0
2                     1994   189   59   43   22    0
3                     1990   200   39   76   25    1
...                   ...    ...  ...  ...  ...  ...


Pooled Cross Sections

Since each observation is independent, we treat the pooled sample like ordinary cross-sectional data, but we need to control for the effect of time on the dependent variable of interest. The regression model for a pooled cross section is:

$$y_i = \beta_0 + \delta_2 d2 + \delta_3 d3 + \cdots + \delta_T dT + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i \quad \text{(eq. 1)}$$

where $d2, d3, \dots, dT$ are time dummy variables indicating the period in which observation $i$ was drawn.

Given Assumptions 1 to 6, the model with pooled data can be estimated by OLS.
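As a rough illustration of (eq. 1), the sketch below estimates a pooled cross-section regression with year dummies in R. The data are simulated, and the sample size, years, and coefficient values are assumptions made up for this example, not taken from the lecture.

```r
# Simulated pooled cross section: a fresh random sample in each of three years.
set.seed(123)
n_per_year <- 200
years <- c(1990, 1991, 1992)
year <- rep(years, each = n_per_year)
x1 <- rnorm(length(year))
x2 <- rnorm(length(year))

# Assumed (hypothetical) true model: year effects 0, 0.5, 1.0; slopes 2 and -1.
year_effect <- c(0, 0.5, 1.0)[match(year, years)]
y <- 1 + year_effect + 2 * x1 - 1 * x2 + rnorm(length(year))
pooled <- data.frame(y, year, x1, x2)

# factor(year) generates the time dummies; the first year is the base period.
pooled_ols <- lm(y ~ factor(year) + x1 + x2, data = pooled)
print(coef(pooled_ols))
```

The coefficients on `factor(year)1991` and `factor(year)1992` play the role of $\delta_2$ and $\delta_3$ in (eq. 1).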


Example 13.1
Figure: Determinants of Women’s Fertility


Structural Change across Time

The previous estimates assume that the effects of the independent variables are constant over time. What if they are not? One way to check is the Chow test (Ch. 7).

Method 1
Run one restricted model (eq. 1) and T unrestricted models, one for each time period:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i \quad \text{for } t = 1, 2, \dots, T$$

Obtain the SSRs from the restricted and unrestricted models, then apply the F test (Chow statistic):

$$F = \left[\frac{SSR_r - SSR_{ur}}{SSR_{ur}}\right]\left[\frac{n - T - Tk}{(T - 1)k}\right]$$

where $SSR_{ur} = SSR_1 + \cdots + SSR_T$ and $k$ is the number of explanatory variables.


Example 13.2
Figure: Change in the Return to Education and Gender Gap


Chow Test (Example 13.2)
Figure: SSR for the first period unrestricted model

Figure: SSR for the second period unrestricted model


Chow Test (Example 13.2, continued)
Figure: SSR for the restricted model

$$F(5, 1072) = \left[\frac{184.3 - (81.3 + 101.3)}{81.3 + 101.3}\right]\left[\frac{1084 - 2 - 2 \times 5}{(2 - 1) \times 5}\right] = 1.99$$

At the 5% significance level, do not reject the null hypothesis that no structural change exists.
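The arithmetic of the Chow statistic can be checked with a few lines of R. This is a minimal sketch: `chow_f` is a hypothetical helper written for this example, and the inputs are the rounded SSR values printed on the slide.

```r
# Chow statistic from the restricted SSR and the per-period unrestricted SSRs.
# chow_f is a hypothetical helper, not a function from any package.
chow_f <- function(ssr_r, ssr_by_period, n, k) {
  n_periods <- length(ssr_by_period)
  ssr_ur <- sum(ssr_by_period)
  df_num <- (n_periods - 1) * k            # number of restrictions
  df_den <- n - n_periods - n_periods * k  # residual df of the unrestricted models
  f_stat <- ((ssr_r - ssr_ur) / ssr_ur) * (df_den / df_num)
  list(f = f_stat, df_num = df_num, df_den = df_den,
       p_value = pf(f_stat, df_num, df_den, lower.tail = FALSE))
}

# Rounded values from Example 13.2 as shown on the slide.
res <- chow_f(ssr_r = 184.3, ssr_by_period = c(81.3, 101.3), n = 1084, k = 5)
print(res)

# 5% critical value of F(5, 1072) for comparison.
print(qf(0.95, res$df_num, res$df_den))
```

With the rounded inputs the statistic comes out at roughly 2.0, in line with the 1.99 reported above (the small gap is presumably due to rounding of the SSRs), and it lies below the 5% critical value of about 2.2.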


Chow Test (Example 13.2, continued)
My R code:

    # Read the data set (a CSV file with a header row) chosen interactively
    gender_return <- read.csv(file.choose(), header = TRUE)

Separate the data collected in the two different years:

    subset_1978 <- subset(gender_return, y85 < 1)   # y85 = 0: 1978 sample
    subset_1985 <- subset(gender_return, y85 > 0)   # y85 = 1: 1985 sample

    # Restricted model: pooled data with a year dummy
    gender_return_all <- lm(lwage ~ y85 + educ + exper + I(exper^2) + union + female,
                            data = gender_return)
    anova(gender_return_all)    # the Residuals row reports the SSR

    # Unrestricted models: one regression per year
    gender_return_1978 <- lm(lwage ~ educ + exper + I(exper^2) + union + female,
                             data = subset_1978)
    anova(gender_return_1978)

    gender_return_1985 <- lm(lwage ~ educ + exper + I(exper^2) + union + female,
                             data = subset_1985)
    anova(gender_return_1985)


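Method 2 (shown here for two-period models)
Add a time dummy to the model and interact each explanatory variable with the time dummy:

$$y_i = \beta_0 + \delta_0 d2 + \delta_1 d2 \cdot x_{i1} + \beta_1 x_{i1} + \delta_2 d2 \cdot x_{i2} + \beta_2 x_{i2} + \cdots + \delta_k d2 \cdot x_{ik} + \beta_k x_{ik} + u_i$$

Jointly test the linear hypothesis $H_0: \delta_0 = \delta_1 = \cdots = \delta_k = 0$. If it is rejected, there is structural change. (To test the slopes only, as in Method 1 where the restricted model keeps the time dummy, leave $\delta_0$ out of the null hypothesis.)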
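The interaction approach can be sketched in R as follows. The data are simulated and all coefficient values are assumptions for illustration. The example tests the slope interactions only (the time dummy stays in the restricted model), so that the resulting F statistic can be compared with the Method 1 formula.

```r
set.seed(1)
n <- 400
d2 <- rep(c(0, 1), each = n / 2)          # time dummy: second period = 1
x1 <- rnorm(n)
x2 <- rnorm(n)
# Hypothetical data-generating process: the slope on x1 shifts in period 2.
y <- 1 + 0.3 * d2 + 0.5 * x1 - 0.2 * x2 + 0.4 * d2 * x1 + rnorm(n)
dat <- data.frame(y, d2, x1, x2)

# Method 2: interact every regressor with the time dummy and test jointly.
restricted   <- lm(y ~ d2 + x1 + x2, data = dat)
unrestricted <- lm(y ~ d2 * (x1 + x2), data = dat)
slope_test <- anova(restricted, unrestricted)
F_interact <- slope_test$F[2]

# Method 1: Chow formula built from separate regressions per period.
ssr <- function(model) sum(resid(model)^2)
ssr_r  <- ssr(restricted)
ssr_ur <- ssr(lm(y ~ x1 + x2, data = dat, subset = d2 == 0)) +
          ssr(lm(y ~ x1 + x2, data = dat, subset = d2 == 1))
k <- 2
n_periods <- 2
F_chow <- ((ssr_r - ssr_ur) / ssr_ur) *
          ((n - n_periods - n_periods * k) / ((n_periods - 1) * k))

print(c(interaction_F = F_interact, chow_F = F_chow))
```

The two statistics coincide because the fully interacted model fits each period separately, so its SSR equals the sum of the per-period SSRs.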


Policy Analysis

Using pooled cross-section data, it is possible to do policy analysis. Suppose a policy (usually represented by a dummy variable) was introduced at some point in time; we can then examine whether it affected the dependent variable afterwards. A change in the environment in which the units operate that is caused by some exogenous event is called a natural experiment (quasi-experiment).

Let $z$ indicate whether a unit falls within the scope of the policy. With two-period pooled cross-section data, the regression model is:

$$y = \beta_0 + \delta_0 d2 + \beta_1 z + \delta_1 d2 \cdot z + \text{other control variables} \quad \text{(eq. 2)}$$

When $z = 1$, the unit is within the scope of the policy (the treatment group in the experiment); otherwise $z = 0$ (the control group in the experiment). $d2 = 1$ in the period after the policy was introduced; otherwise $d2 = 0$.


Suppose there is a representative unit that is:
(i) not covered by the policy ($z = 0$) and observed in the first period ($d2 = 0$), before the policy is introduced. Its average outcome is $\bar{y}_{z=0,d2=0} = \beta_0$.

(ii) not covered by the policy ($z = 0$) and observed in the second period ($d2 = 1$), after the policy is introduced. Its average outcome is $\bar{y}_{z=0,d2=1} = \beta_0 + \delta_0$.

(iii) covered by the policy ($z = 1$) and observed in the first period ($d2 = 0$), before the policy is introduced. Its average outcome is $\bar{y}_{z=1,d2=0} = \beta_0 + \beta_1$.

(iv) covered by the policy ($z = 1$) and observed in the second period ($d2 = 1$), after the policy is introduced. Its average outcome is $\bar{y}_{z=1,d2=1} = \beta_0 + \delta_0 + \beta_1 + \delta_1$.

The average treatment effect is the difference-in-differences:

$$\delta_1 = (\bar{y}_{z=1,d2=1} - \bar{y}_{z=1,d2=0}) - (\bar{y}_{z=0,d2=1} - \bar{y}_{z=0,d2=0}) = \Delta\bar{y}_{treatment} - \Delta\bar{y}_{control}$$
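The identity between the interaction coefficient in (eq. 2) and the difference of the four cell means can be verified numerically. The sketch below uses simulated data with no other control variables; the sample size and coefficient values are assumptions chosen for this example.

```r
set.seed(7)
n <- 1000
z  <- rbinom(n, 1, 0.5)   # 1 = treatment group, 0 = control group
d2 <- rbinom(n, 1, 0.5)   # 1 = second period (after the policy)
# Hypothetical true values: beta0 = 10, delta0 = 1, beta1 = 2, delta1 = -3.
y <- 10 + 1 * d2 + 2 * z - 3 * d2 * z + rnorm(n)

# (eq. 2) without other controls: d2 * z expands to d2 + z + d2:z.
did_fit <- lm(y ~ d2 * z)
delta1_hat <- unname(coef(did_fit)["d2:z"])

# Difference-in-differences computed directly from the four cell means.
cell_mean <- function(z_value, d2_value) mean(y[z == z_value & d2 == d2_value])
did_means <- (cell_mean(1, 1) - cell_mean(1, 0)) -
             (cell_mean(0, 1) - cell_mean(0, 0))

print(c(regression = delta1_hat, cell_means = did_means))
```

Without additional controls the model is saturated in `z` and `d2`, so the two numbers agree exactly. Once other control variables are added, the regression estimate generally differs from the raw difference of means.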


Example 13.3
Figure: Effect of a Garbage Incinerator’s Location on Housing Prices (Column 3, Table 13.2)


Two-Period Panel Data Analysis

Omitted-variables motivation: some important control variables are unobservable or unavailable in the data set. The fixed effect captures the effect of the units' unobservable characteristics.

Let $i$ denote a sample unit, such as a firm, an individual, or a county, and let $t$ denote time. A fixed effects model can be written as:

$$y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it1} + \beta_2 x_{it2} + \cdots + \beta_k x_{itk} + a_i + u_{it}, \quad t = 1, 2 \quad \text{(eq. 3)}$$

where $a_i$ is called the fixed effect: unit $i$'s unobservable, time-invariant characteristics (so it is also called unobserved heterogeneity). $u_{it}$ is called the idiosyncratic error or time-varying error.


If both $u_{it}$ and $a_i$ are uncorrelated with all of the $x_{itj}$, let $v_{it} = u_{it} + a_i$, which is also uncorrelated with the $x_{itj}$. Then (eq. 3) can be written as:

$$y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it1} + \beta_2 x_{it2} + \cdots + \beta_k x_{itk} + v_{it}$$

$v_{it}$ is called the composite error. If Assumptions 1 to 4 hold, the $\beta_j$ estimated by pooled OLS are unbiased. (Assumption 5 is not guaranteed, so a test for heteroskedasticity may be needed; in addition, $v_{i1}$ and $v_{i2}$ share $a_i$, so they are correlated within a unit.)

However, it is rare for $a_i$ to be uncorrelated with all of the $x_{itj}$. When $a_i$ is correlated with any $x_{itj}$, the pooled OLS estimates are biased (this is called heterogeneity bias).

Remedy: remove $a_i$ from the equation.


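In the second period:

$$y_{i2} = (\beta_0 + \delta_0) + \beta_1 x_{i21} + \beta_2 x_{i22} + \cdots + \beta_k x_{i2k} + a_i + u_{i2}, \quad (d2_t = 1)$$

In the first period:

$$y_{i1} = \beta_0 + \beta_1 x_{i11} + \beta_2 x_{i12} + \cdots + \beta_k x_{i1k} + a_i + u_{i1}, \quad (d2_t = 0)$$

Difference the two equations to obtain the following first-differenced equation:

$$(y_{i2} - y_{i1}) = \delta_0 + \beta_1 (x_{i21} - x_{i11}) + \beta_2 (x_{i22} - x_{i12}) + \cdots + \beta_k (x_{i2k} - x_{i1k}) + (u_{i2} - u_{i1})$$

$$\Rightarrow \Delta y_i = \delta_0 + \beta_1 \Delta x_{i,1} + \beta_2 \Delta x_{i,2} + \cdots + \beta_k \Delta x_{i,k} + \Delta u_i \quad \text{(eq. 4)}$$

where $(x_{i21} - x_{i11}) = \Delta x_{i,1}$, $(x_{i22} - x_{i12}) = \Delta x_{i,2}$, and so on.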
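A small simulation can make the heterogeneity bias and its removal by first differencing concrete. This is only a sketch under assumed values: one regressor, a fixed effect $a_i$ that is built into $x$, and hypothetical parameters $\beta_0 = 1$, $\delta_0 = 0.5$, $\beta_1 = 1$.

```r
set.seed(2014)
n <- 2000
a  <- rnorm(n)            # unobserved fixed effect a_i
x1 <- a + rnorm(n)        # regressor in period 1, correlated with a_i
x2 <- a + rnorm(n)        # regressor in period 2, correlated with a_i

# Hypothetical true model: beta0 = 1, delta0 = 0.5, beta1 = 1.
y1 <- 1 + 1 * x1 + a + rnorm(n)
y2 <- 1 + 0.5 + 1 * x2 + a + rnorm(n)

# Pooled OLS ignores a_i, which ends up in the composite error.
long <- data.frame(y = c(y1, y2), x = c(x1, x2), d2 = rep(c(0, 1), each = n))
pooled_fit <- lm(y ~ d2 + x, data = long)

# First-differenced equation (eq. 4): a_i drops out.
dy <- y2 - y1
dx <- x2 - x1
fd_fit <- lm(dy ~ dx)

print(c(pooled = coef(pooled_fit)[["x"]], first_difference = coef(fd_fit)[["dx"]]))
```

In this setup the pooled slope is pushed well above 1 because $x$ and $a_i$ move together, while the first-difference slope stays close to 1 and the intercept of the differenced regression estimates $\delta_0$.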


For (eq. 4), if strict exogeneity holds, that is, if $\Delta u_i$ is uncorrelated with all of $\Delta x_{i,1}, \Delta x_{i,2}, \dots, \Delta x_{i,k}$, OLS can be used to estimate $\delta_0$ and the $\beta_j$, and the estimators are unbiased. However, (eq. 4) does not always work well:

Strict exogeneity fails
OLS then gives biased estimators. Adding more time-varying variables can mitigate this problem to some extent.

Variation in $x_{itj}$ is small across time
Little variation in $\Delta x_{ij}$ can lead to a large standard error for $\hat{\beta}_j$. Using longer differences over time is one way to increase the variation. If $\Delta x_{ij} = 0$ for every $i$, (eq. 4) cannot be estimated by OLS because the assumption of no perfect collinearity is violated. Such time-invariant variables must be dropped from the model; their effect is part of $a_i$.
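The zero-variation case can be seen directly in R: a variable that does not change between the two periods differences to a column of zeros, and `lm` reports its coefficient as `NA`. The variable names and values below are hypothetical.

```r
set.seed(5)
n <- 100
female <- rbinom(n, 1, 0.5)           # time-invariant: identical in both periods
educ1  <- rnorm(n, mean = 12, sd = 2)
educ2  <- educ1 + rbinom(n, 1, 0.3)   # changes for only some units

d_educ   <- educ2 - educ1
d_female <- female - female           # all zeros after differencing
# Hypothetical differenced outcome with delta0 = 0.2 and beta = 0.1.
dy <- 0.2 + 0.1 * d_educ + rnorm(n, sd = 0.1)

fd_fit <- lm(dy ~ d_educ + d_female)
print(coef(fd_fit))                   # d_female cannot be estimated
```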


Example 13.5
Figure: Two-Period Panel Data


Example 13.5 (continued)
Pooled OLS cannot be used here because individuals' fixed characteristics are correlated with the other independent variables.
Figure: Sleeping versus Working


Policy Analysis

As with pooled cross-section data, two-period panel data are convenient for analyzing the effects of policies. Let $z_{it}$ be a dummy variable for policy coverage ($z_{it} = 1$ if unit $i$ is covered at time $t$; $z_{it} = 0$ otherwise) and let $d2$ be the time dummy ($d2 = 1$ in the second period; $d2 = 0$ otherwise). Then the fixed effects model is:

$$y_{it} = \beta_0 + \delta_0 d2 + \beta_1 z_{it} + a_i + u_{it} \quad \text{for } t = 1, 2$$

Taking the difference of the two equations, the model becomes

$$\Delta y_i = \delta_0 + \beta_1 \Delta z_i + \Delta u_i \quad \text{(eq. 5)}$$

Estimate the model by OLS if Assumptions 1 to 4 hold. $\beta_1$ is the average treatment effect.


Why no interaction term?
Suppose we have estimated (eq. 5) and obtained the OLS coefficients $\hat{\delta}_0$ and $\hat{\beta}_1$. The predicted average change in $y$ is:

$$\Delta\bar{y} = \hat{\delta}_0 + \hat{\beta}_1 \Delta z$$

For a unit in the treatment group ($\Delta z = 1$), the average change is $\Delta\bar{y}_{treatment} = \hat{\delta}_0 + \hat{\beta}_1$.

For a unit in the control group ($\Delta z = 0$), the average change is $\Delta\bar{y}_{control} = \hat{\delta}_0$.
Therefore,

$$\hat{\beta}_1 = \Delta\bar{y}_{treatment} - \Delta\bar{y}_{control}$$

which is equivalent to the definition of $\delta_1$ estimated in the pooled cross-section model (eq. 2).
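The equivalence can be checked on simulated two-period panel data. The sketch assumes that no unit is covered in the first period (so $\Delta z_i$ is either 0 or 1), and the parameter values $\delta_0 = 0.5$ and $\beta_1 = -2$ are hypothetical.

```r
set.seed(99)
n <- 500
a <- rnorm(n)                      # unit fixed effect
treated <- rbinom(n, 1, 0.4)       # units covered by the policy in period 2
z1 <- rep(0, n)                    # assumption: nobody is covered in period 1
z2 <- treated

# Hypothetical true values: beta0 = 3, delta0 = 0.5, beta1 = -2.
y1 <- 3 + (-2) * z1 + a + rnorm(n)
y2 <- 3 + 0.5 + (-2) * z2 + a + rnorm(n)

# First-differenced model (eq. 5).
dy <- y2 - y1
dz <- z2 - z1
fd_fit <- lm(dy ~ dz)
beta1_hat <- unname(coef(fd_fit)["dz"])

# Difference between the average changes of the two groups.
diff_of_changes <- mean(dy[dz == 1]) - mean(dy[dz == 0])
print(c(regression = beta1_hat, difference_of_changes = diff_of_changes))
```

Because `dz` is a single dummy, its OLS coefficient is exactly the difference between the two group means of `dy`, and the intercept is the average change in the control group.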


Example 13.7
Figure: Effect of Drunk Driving Laws on Traffic Fatalities


Panel Data with More than Two Time Periods

When there are $T > 2$ periods, more time dummies should be added to (eq. 3):

$$y_{it} = \delta_1 + \delta_2 d2_t + \delta_3 d3_t + \cdots + \delta_T dT_t + \beta_1 x_{it1} + \beta_2 x_{it2} + \cdots + \beta_k x_{itk} + a_i + u_{it}, \quad t = 1, 2, \dots, T$$

Assume strict exogeneity holds, that is, $\mathrm{cov}(u_{is}, x_{itj}) = 0$ for all $t$, $s$, $j$. This rules out feedback from the current error to future values of the explanatory variables (see Chapter 15 if it is violated).

To remove the fixed effect $a_i$ from all $T$ equations, subtract the equation for period $t - 1$ from the equation for period $t$. This gives $T - 1$ differenced equations:

$$\Delta y_{it} = \delta_2 \Delta d2_t + \delta_3 \Delta d3_t + \cdots + \delta_T \Delta dT_t + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \quad t = 2, 3, \dots, T \quad \text{(eq. 6)}$$


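(eq. 6) has no intercept, which makes it inconvenient to compute $R^2$. Looking more closely, the differenced equations take the following form in each period:

$$\begin{aligned}
\Delta y_{it} &= \delta_2 + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, && t = 2 \\
\Delta y_{it} &= -\delta_2 + \delta_3 + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, && t = 3 \\
\Delta y_{it} &= -\delta_3 + \delta_4 + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, && t = 4 \\
&\;\;\vdots \\
\Delta y_{it} &= -\delta_{T-1} + \delta_T + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, && t = T
\end{aligned}$$

Each period's equation therefore has its own constant term: $\delta_2$ for $t = 2$ and $\delta_t - \delta_{t-1}$ for $t \ge 3$. Treating the constant of the $t = 2$ equation as a common intercept and adding a time dummy for every later period, (eq. 6) can be rewritten as:

$$\Delta y_{it} = \alpha_0 + \alpha_3 d3_t + \cdots + \alpha_T dT_t + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \quad t = 2, 3, \dots, T \quad \text{(eq. 7)}$$

where $\alpha_0 = \delta_2$ and $\alpha_t = (\delta_t - \delta_{t-1}) - \delta_2$ for $t \ge 3$. Estimate (eq. 7) by pooled OLS; the estimates of $\beta_j$ are identical to those from (eq. 6).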
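The claim that (eq. 6) and (eq. 7) give the same slope estimates can be checked on a simulated balanced panel. The sketch uses base R only; the panel dimensions, the single regressor, and all parameter values are assumptions for illustration.

```r
set.seed(42)
n_units <- 200
n_periods <- 4
# expand.grid varies t fastest, so rows are ordered by unit and then by period.
panel <- expand.grid(t = 1:n_periods, id = 1:n_units)
a <- rnorm(n_units)                                   # fixed effects a_i
panel$x <- a[panel$id] + rnorm(nrow(panel))           # regressor correlated with a_i
delta <- c(0, 0.5, 1.0, 1.5)                          # hypothetical period effects
panel$y <- delta[panel$t] + 2 * panel$x + a[panel$id] + rnorm(nrow(panel))
for (s in 2:n_periods) panel[[paste0("d", s)]] <- as.numeric(panel$t == s)

# First difference within each unit (NA for the first period).
fd <- function(v) ave(v, panel$id, FUN = function(z) c(NA, diff(z)))
diffed <- data.frame(t = panel$t, dy = fd(panel$y), dx = fd(panel$x),
                     dd2 = fd(panel$d2), dd3 = fd(panel$d3), dd4 = fd(panel$d4))
diffed <- diffed[!is.na(diffed$dy), ]

# (eq. 6): differenced time dummies, no intercept.
eq6 <- lm(dy ~ 0 + dd2 + dd3 + dd4 + dx, data = diffed)
# (eq. 7): intercept plus time dummies for t = 3, ..., T.
eq7 <- lm(dy ~ factor(t) + dx, data = diffed)

print(c(eq6 = coef(eq6)[["dx"]], eq7 = coef(eq7)[["dx"]]))
```

Both specifications span the same set of period-specific constants, so the coefficient on `dx` is the same in each, and the intercept of the second regression matches the `dd2` coefficient of the first.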


A New Assumption: No Serial Correlation
When there are more than two periods of data, a new assumption is needed: $\Delta u_{it}$ is uncorrelated over time. That is,

$$\mathrm{cov}(\Delta u_{it}, \Delta u_{is}) = 0 \quad \text{for all } t \neq s$$

(This assumption is not guaranteed even if $u_{it}$ in the undifferenced equation is uncorrelated over time: adjacent differences then share a term, so $\mathrm{cov}(\Delta u_{it}, \Delta u_{i,t-1}) = -\mathrm{var}(u_{i,t-1}) \neq 0$.) The assumption is satisfied when $u_{it}$ follows a random walk.

Definition: Autoregressive Model (AR Model)

A regression model is called AR(p) if

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_p y_{t-p} + e_t$$

where $e_t$ is a white-noise disturbance that satisfies (1) $E(e_t) = 0$ for all $t$, (2) $E(e_t^2) = \sigma^2$ for all $t$, and (3) $\mathrm{cov}(e_t, e_s) = 0$ for all $t \neq s$.
A random walk is an AR(1) with $a_0 = 0$ and $a_1 = 1$.
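The contrast between a serially uncorrelated error and a random-walk error can be illustrated with a short simulation. For simplicity the sketch uses one long simulated series rather than a panel, and `lag1_cor` is a hypothetical helper defined for this example.

```r
set.seed(11)
n <- 20000
e <- rnorm(n)            # white noise

u_white <- e             # u_t serially uncorrelated
u_walk  <- cumsum(e)     # u_t = u_{t-1} + e_t, a random walk

# Correlation between adjacent first differences of a series.
lag1_cor <- function(u) {
  du <- diff(u)
  cor(du[-1], du[-length(du)])
}

print(c(white_noise = lag1_cor(u_white), random_walk = lag1_cor(u_walk)))
```

When the level error is white noise, adjacent differences are negatively correlated (about -0.5 in theory); when it is a random walk, the differences are just the white-noise increments, so the correlation is close to zero.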


Example 13.8: Panel data with more than two periods
Figure: Data


Example 13.8 (Continued)
Figure: Missing Values in the Data Set

I use the command na.omit() to drop the rows containing NA from the data.

    ezone_nomissing <- na.omit(ezone)


Example 13.8 (Continued)
Figure: Effect of Enterprise Zones on Unemployment Claims


Announcement

Homework 1 due on March 11th, 2014.

Homework 2 due on March 25th, 2014.

The first midterm is scheduled for April 2nd, 2014.

No class on March 18th, 2014.
