the big problems file


Page 1: The Big Problems File

Duke University, practice problems for Introduction to Econometrics

October 26, 2009

1 Properties of Expectations, Variances and Covariances, Distributions

1. (5 points) You know that income per head in Italy (denoted by $y$), expressed in Euros, is normally distributed, that is,

$$y \sim N(\mu_y, \sigma_y^2).$$

You also know that the mean $\mu_y$ is Euros 16,000, and the standard deviation of $y$ is 2,000. What is the distribution of income per head in Italy, in US dollars, if the exchange rate is 1.10 US dollars per one Euro?

Solution: Denote income per head in US dollars as $\tilde{y}$. Then $\tilde{y} = 1.10\,y$. Given that $y$ is normally distributed and $\tilde{y}$ is just a linear transformation of $y$, $\tilde{y}$ also follows a normal distribution, with mean $\tilde{\mu}_y$:

$$\tilde{\mu}_y = 1.10\,\mu_y = 1.10 \times 16{,}000 = 17{,}600$$

and variance $\tilde{\sigma}_y^2$:

$$\tilde{\sigma}_y^2 = \mathrm{Var}(1.10\,y) = 1.10^2\,\mathrm{Var}(y) = (1.10 \times SD(y))^2 = (1.10 \times 2{,}000)^2 = 4{,}840{,}000$$

Thus:

$$\tilde{y} \sim N(17{,}600,\ 4{,}840{,}000)$$

Here, notice that, by convention, the second argument of $N$ is the variance, and not the standard deviation!
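The rescaling in problem 1 can be checked numerically. The sketch below only applies the rules $E(ay) = aE(y)$ and $\mathrm{Var}(ay) = a^2\mathrm{Var}(y)$ to the numbers given in the problem; the variable names are illustrative and not part of the original problem set.

```python
# Check the linear-transformation rule for a normal variable: y_usd = rate * y_eur.
mean_eur = 16_000.0      # mean of y in Euros (from the problem)
sd_eur = 2_000.0         # standard deviation of y in Euros (from the problem)
rate = 1.10              # US dollars per Euro (from the problem)

mean_usd = rate * mean_eur            # E(a*y) = a * E(y)
var_usd = (rate * sd_eur) ** 2        # Var(a*y) = a^2 * Var(y)
sd_usd = var_usd ** 0.5               # standard deviation in US dollars

print(mean_usd, var_usd, sd_usd)
```

Up to floating-point rounding, the printed mean and variance match the two arguments of the normal distribution in the solution, and the standard deviation is the square root of the second argument.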

2. (6 points) Let $Y$ have a uniform distribution over the interval $(0, \theta)$. Show that $2\bar{Y}$ is an unbiased estimator for $\theta$. (Recall that the pdf of a uniform distribution over the interval $(a, b)$ is given by $f(x) = \frac{1}{b-a}$.)

Answer: Here $Y_i \sim U[0, \theta]$. The pdf of this uniform distribution is $f(y) = \frac{1}{\theta}$. Therefore,

$$E(Y_i) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^{\theta} y\,\frac{1}{\theta}\,dy = \left.\frac{y^2}{2\theta}\right|_0^{\theta} = \frac{\theta^2}{2\theta} = \frac{\theta}{2}$$

Now,

$$E(2\bar{Y}) = 2\,E(\bar{Y}) = 2\,E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = 2\,\frac{1}{n}\sum_{i=1}^n E(Y_i) = 2\,\frac{1}{n}\sum_{i=1}^n \frac{\theta}{2} = 2\cdot\frac{\theta}{2} = \theta$$

Since $E(2\bar{Y}) = \theta$, $2\bar{Y}$ is an unbiased estimator of $\theta$. Note: Several people showed that $E(2Y) = \theta$, which is also true, but not what the question was asking, so you only got partial credit for that.
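A small simulation can illustrate the unbiasedness result of problem 2. This is only a sketch: the value of `theta`, the sample size and the number of replications are assumptions chosen for illustration, and a simulation supports the algebra but does not replace it.

```python
import random
import statistics

random.seed(0)             # fixed seed so the run is reproducible
theta = 5.0                # hypothetical true upper bound of the uniform distribution
n = 10                     # hypothetical sample size
replications = 20_000      # number of simulated samples

estimates = []
for _ in range(replications):
    sample = [random.uniform(0.0, theta) for _ in range(n)]
    estimates.append(2.0 * statistics.fmean(sample))   # the estimator: twice the sample mean

average_estimate = statistics.fmean(estimates)
print(average_estimate)
```

Averaged over many samples, twice the sample mean should land close to `theta`, which is what unbiasedness predicts.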

3. (7 points total) Suppose that a mutual fund is investing in three different asset categories. Each asset category includes many different stocks or bonds. Let the variable $X$ represent the asset category, and let $R$ indicate the one-year expected (predicted) percentage return for a particular asset (one particular bond, or one particular stock). The following table shows the asset allocation of the fund, together with the one-year percentage expected return for each asset category. The expected returns have been calculated using some forecasting model, but how the forecast has been done has no relevance in this problem.

Asset Category (X)             Proportion of assets invested    One-year expected return for
                               in asset type X                  assets in asset category X
X = 1, Domestic Stock          0.30                             0.10
X = 2, International Stock     0.20                             0.15
X = 3, Bonds                   0.50                             0.00

(a) (5 points) Calculate the expected return of a dollar invested in the mutual fund. Which property of expectations is useful to solve this problem? Explain.

Solution: $0.10(0.30) + 0.15(0.20) + 0.00(0.50) = 0.06$. The useful property is the law of iterated expectations (L.I.E.):

$$E(\text{Return}) = P(DS)\,E(\text{Return} \mid DS) + P(IS)\,E(\text{Return} \mid IS) + P(B)\,E(\text{Return} \mid B)$$

(b) (2 points) Calculate the predicted one-year percentage return of the fraction of your investment invested in Stocks.

Solution: Using the L.I.E.:

$$E(\text{Return} \mid S) = P(DS \mid S)\,E(\text{Return} \mid DS) + P(IS \mid S)\,E(\text{Return} \mid IS) = \frac{0.3}{0.5} \times 0.10 + \frac{0.2}{0.5} \times 0.15 = 0.12$$
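The two calculations in problem 3 are weighted averages, which a few lines of code make explicit. The dictionary keys below are illustrative labels, not names from the problem set.

```python
# Asset shares P(X = x) and conditional expected returns E(R | X = x) from the table.
shares = {"domestic_stock": 0.30, "international_stock": 0.20, "bonds": 0.50}
returns = {"domestic_stock": 0.10, "international_stock": 0.15, "bonds": 0.00}

# (a) Unconditional expected return: sum over categories of P(X = x) * E(R | X = x).
fund_return = sum(shares[k] * returns[k] for k in shares)

# (b) Expected return conditional on being in stocks: renormalize the two stock shares.
stocks = ["domestic_stock", "international_stock"]
p_stock = sum(shares[k] for k in stocks)
stock_return = sum(shares[k] / p_stock * returns[k] for k in stocks)

print(round(fund_return, 4), round(stock_return, 4))
```

Part (b) uses the same formula as part (a), with the shares divided by the probability of holding stocks so that they sum to one.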

4. You decide to analyze whether or not the presidential candidate for a certain party did better if his party controlled the House. You have data for 34 presidential elections. Think of these data as the population which you want to describe, rather than a sample from which you want to infer behavior of a larger population. You generate the following table:

Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996

                                Dem. control House (Y = 0)    Rep. control House (Y = 1)
Democratic President (X = 0)    .412                          .03
Republican President (X = 1)    .176                          .382

(a) Compute $E[X]$.

$$E[X] = P(X = 1) = .176 + .382 = 0.558$$

(b) Compute $E[X \mid Y = 1]$.

$$E[X \mid Y = 1] = 1 \cdot \frac{P(X=1, Y=1)}{P(Y=1)} + 0 \cdot \frac{P(X=0, Y=1)}{P(Y=1)} = \frac{.382}{.382 + .03} = 0.92718$$

(c) If you picked one of the Republican presidents at random, what is the probability that during his term the Democrats had control of the House?

$$P(Y = 0 \mid X = 1) = \frac{.176}{.558} = .315$$

(d) Are $X$ and $Y$ independent? Justify your answer.

Certainly not independent. Clearly the two variables are interrelated. Formally, it is sufficient to notice that $E[X] \neq E[X \mid Y = 1]$.
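The answers to problem 4 all follow from the four joint probabilities, as the sketch below shows. The dictionary layout, with keys `(x, y)`, is an illustrative choice.

```python
# Joint probabilities P(X = x, Y = y) from the table; keys are (x, y).
joint = {(0, 0): 0.412, (0, 1): 0.03, (1, 0): 0.176, (1, 1): 0.382}

p_x1 = joint[(1, 0)] + joint[(1, 1)]      # P(X = 1), which equals E[X] for a 0/1 variable
p_y1 = joint[(0, 1)] + joint[(1, 1)]      # P(Y = 1)
e_x_given_y1 = joint[(1, 1)] / p_y1       # E[X | Y = 1] = P(X = 1 | Y = 1)
p_y0_given_x1 = joint[(1, 0)] / p_x1      # P(Y = 0 | X = 1)

# Independence would require E[X | Y = 1] to equal E[X].
independent = abs(e_x_given_y1 - p_x1) < 1e-9

print(round(p_x1, 3), round(e_x_given_y1, 5), round(p_y0_given_x1, 3), independent)
```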


5. Let $Y$ be a binary random variable, that is, a random variable that only takes two values, 0 and 1. $Y$ represents unemployment status: $Y = 1$ if you are unemployed at age 30, and $Y = 0$ otherwise. Let $X$ represent years of schooling. You know that the conditional probability of being unemployed at age 30 given years of schooling is described by the following relation:

$$P(Y = 1 \mid X) = \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}.$$

(a) Prove that $E(Y \mid X) = P(Y = 1 \mid X)$.

Remember that $Y$ is either equal to 1 or to zero. So:

$$E(Y \mid X) = 1 \cdot P(Y = 1 \mid X) + 0 \cdot P(Y = 0 \mid X) = P(Y = 1 \mid X)$$

This is obvious enough once you realize that, conditional on $X$, $Y$ has a Bernoulli distribution, with a probability of a one that depends on $X$.

(b) Calculate $\mathrm{Var}(Y \mid X)$.

$Y$ given $X$ has a Bernoulli distribution with probability of a one equal to $\frac{e^{-1-(0.1)X}}{1+e^{-1-(0.1)X}}$, so

$$\mathrm{Var}(Y \mid X) = \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}\left(1 - \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}\right) = \frac{e^{-1-(0.1)X}}{\left(1 + e^{-1-(0.1)X}\right)^2}$$

(c) Calculate $E(Y \mid X = 0)$.

$$E(Y \mid X = 0) = P(Y = 1 \mid X = 0) = \frac{e^{-1}}{1 + e^{-1}} = 0.26894$$

(d) Calculate $P(Y = 1 \mid X = 20)$.

$$P(Y = 1 \mid X = 20) = \frac{e^{-1-(0.1)20}}{1 + e^{-1-(0.1)20}} = 0.047426$$
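The conditional probability in problem 5 is a logistic function of schooling, so parts (b) to (d) can be evaluated directly. The function names below are illustrative and not part of the problem set.

```python
import math

def p_unemployed(years_of_schooling: float) -> float:
    """P(Y = 1 | X) = exp(-1 - 0.1 X) / (1 + exp(-1 - 0.1 X)), as given in the problem."""
    z = math.exp(-1.0 - 0.1 * years_of_schooling)
    return z / (1.0 + z)

def conditional_variance(years_of_schooling: float) -> float:
    """Var(Y | X) = p (1 - p) for a Bernoulli variable with success probability p."""
    p = p_unemployed(years_of_schooling)
    return p * (1.0 - p)

print(round(p_unemployed(0), 5), round(p_unemployed(20), 6), round(conditional_variance(0), 5))
```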

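6. Remember that

$$\sum_{i=1}^n X_i = X_1 + X_2 + \dots + X_{n-1} + X_n$$

(a) Prove that if $X_i$ is constant, so that $X_i = a$ for all $i$, then $\sum_{i=1}^3 X_i = 3a$.

$$\sum_{i=1}^3 X_i = X_1 + X_2 + X_3 = a + a + a = 3a$$

(b) Prove that $\sum_{i=1}^3 \left(\frac{X_i}{\sum_{j=1}^2 Y_j}\right) = \frac{\sum_{i=1}^3 X_i}{\sum_{j=1}^2 Y_j}$.

$$\sum_{i=1}^3 \left(\frac{X_i}{\sum_{j=1}^2 Y_j}\right) = \frac{X_1}{\sum_{j=1}^2 Y_j} + \frac{X_2}{\sum_{j=1}^2 Y_j} + \frac{X_3}{\sum_{j=1}^2 Y_j} = \frac{X_1 + X_2 + X_3}{\sum_{j=1}^2 Y_j} = \frac{\sum_{i=1}^3 X_i}{\sum_{j=1}^2 Y_j}$$

(c) Verify that $\sum_{i=1}^3 \left(\sum_{j=1}^2 X_i Z_j\right) = \left(\sum_{j=1}^2 Z_j\right)\left(\sum_{i=1}^3 X_i\right)$.

$$\sum_{i=1}^3 \left(\sum_{j=1}^2 X_i Z_j\right) = \sum_{i=1}^3 X_i \left(\sum_{j=1}^2 Z_j\right) = \left(\sum_{j=1}^2 Z_j\right) \sum_{i=1}^3 X_i$$

The last step follows because the summation over $j$ does NOT depend on the index $i$.

(d) Verify that $\sum_{i=1}^3 (a + bX_i) = 3a + b\sum_{i=1}^3 X_i$.

$$\sum_{i=1}^3 (a + bX_i) = \sum_{i=1}^3 a + \sum_{i=1}^3 bX_i = 3a + b\sum_{i=1}^3 X_i$$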

7. You have $n$ observations, and you calculate the sample mean, which, by definition, is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

Calculate $\sum_{i=1}^n (X_i - \bar{X})$ (show your steps!!).

$$\sum_{i=1}^n (X_i - \bar{X}) = \sum_{i=1}^n X_i - \sum_{i=1}^n \bar{X} = n\bar{X} - n\bar{X} = 0$$

Obviously, the mean deviation from the mean... is zero!

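8. Prove that $\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2$. And once again don't forget to show your steps!!!

$$\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X})^2 &= \sum_{i=1}^n \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) = \sum_{i=1}^n X_i^2 - \sum_{i=1}^n 2X_i\bar{X} + \sum_{i=1}^n \bar{X}^2 \\
&= \sum_{i=1}^n X_i^2 - 2\bar{X}\sum_{i=1}^n X_i + n\bar{X}^2 = \sum_{i=1}^n X_i^2 - 2\bar{X}\,n\bar{X} + n\bar{X}^2 \\
&= \sum_{i=1}^n X_i^2 - n\bar{X}^2
\end{aligned}$$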
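The two identities in problems 7 and 8 can be spot-checked on any data set. The sketch below uses a small made-up sample and exact fractions, so the two sides agree exactly instead of only up to rounding; the data values are hypothetical.

```python
from fractions import Fraction

# Hypothetical data, kept as exact fractions so the identities hold without rounding error.
x = [Fraction(v) for v in (2, 5, 7, 10)]
n = len(x)
x_bar = sum(x) / n

sum_of_deviations = sum(xi - x_bar for xi in x)        # problem 7: should be exactly 0
lhs = sum((xi - x_bar) ** 2 for xi in x)               # sum of squared deviations
rhs = sum(xi ** 2 for xi in x) - n * x_bar ** 2        # shortcut formula from problem 8

print(sum_of_deviations, lhs, rhs)
```

A numerical check on one sample is not a proof, but it is a quick way to catch a sign error in the algebra.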

9. (16 points) A supermarket has two express lines. Let $X$ and $Y$ denote the number of customers in the first and in the second, respectively, at any given time. During nonrush hours, the joint pdf of $X$ and $Y$ is summarized by the following table of joint probabilities:

         X = 0    X = 1    X = 2    X = 3
Y = 0    0.1      0.2      0        0
Y = 1    0.2      0.25     0.05     0
Y = 2    0        0.05     0.05     0.025
Y = 3    0        0        0.025    0.05

(a) (4 points) What are the expectations of $X$ and $Y$?

Solution: To answer this you need the marginal probabilities $P(Y = y)$ and $P(X = x)$, which can be calculated by summing up the rows and columns in the joint distribution matrix:

            X = 0    X = 1    X = 2    X = 3    P(Y = y)
Y = 0       0.1      0.2      0        0        0.3
Y = 1       0.2      0.25     0.05     0        0.5
Y = 2       0        0.05     0.05     0.025    0.125
Y = 3       0        0        0.025    0.05     0.075
P(X = x)    0.3      0.5      0.125    0.075

These can then be used to calculate $E(X) = 0.3 \cdot 0 + 0.5 \cdot 1 + 0.125 \cdot 2 + 0.075 \cdot 3 = 0.975$. Since the matrix is symmetric, the expectation of $Y$ is also $0.975$.

(b) (4 points) What is $E(Y \mid X = 3)$?

Solution: First you need to calculate $P(Y = y \mid X = 3)$ for all values of $y$:

$$P(Y = 0 \mid X = 3) = \frac{0}{.075} = 0$$

$$P(Y = 1 \mid X = 3) = \frac{0}{.075} = 0$$

$$P(Y = 2 \mid X = 3) = \frac{0.025}{.075} = \frac{1}{3}$$

$$P(Y = 3 \mid X = 3) = \frac{0.05}{.075} = \frac{2}{3}$$

so that you can then calculate $E(Y \mid X = 3) = 0 \cdot 0 + 0 \cdot 1 + \frac{1}{3} \cdot 2 + \frac{2}{3} \cdot 3 = 2.667$.

(c) (4 points) Are $X$ and $Y$ independent? Explain.

Solution: If $X$ and $Y$ are independent, then

$$P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$$

This should hold for all values of $X$ and $Y$, but we can see, for example, that

$$P(X = 0, Y = 0) = .1 \neq P(X = 0) \cdot P(Y = 0) = .09$$

so they are not independent. You could have picked many other cells. For reference, this is what the joint pdf would be if they were independent (you did not need to reproduce this entire matrix!):

            X = 0     X = 1     X = 2       X = 3       P(Y = y)
Y = 0       0.09      0.15      0.0375      0.0225      0.3
Y = 1       0.15      0.25      0.0625      0.0375      0.5
Y = 2       0.0375    0.0625    0.015625    0.009375    0.125
Y = 3       0.0225    0.0375    0.009375    0.005625    0.075
P(X = x)    0.3       0.5       0.125       0.075

(d) (4 points) Find $P(|X - Y| = 1)$, the probability that $X$ and $Y$ differ by exactly 1.

Solution: By definition,

$$\begin{aligned}
P(|X - Y| = 1) &= \sum\sum_{|x - y| = 1} f_{X,Y}(x, y) \\
&= f_{X,Y}(0,1) + f_{X,Y}(1,0) + f_{X,Y}(1,2) + f_{X,Y}(2,1) + f_{X,Y}(2,3) + f_{X,Y}(3,2) \\
&= 0.2 + 0.2 + 0.05 + 0.05 + 0.025 + 0.025 = 0.55
\end{aligned}$$

(You could also have calculated it as one minus the sum of all the joint probabilities for which $|X - Y| \neq 1$.)
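All four parts of the supermarket problem are sums over cells of the joint table, so they can be reproduced with a nested list. The layout `joint[y][x]` is an illustrative choice that mirrors the rows and columns of the table.

```python
# Joint probabilities P(X = x, Y = y); joint[y][x] follows the table layout (rows are Y).
joint = [
    [0.1, 0.2, 0.0, 0.0],
    [0.2, 0.25, 0.05, 0.0],
    [0.0, 0.05, 0.05, 0.025],
    [0.0, 0.0, 0.025, 0.05],
]
values = range(4)

p_x = [sum(joint[y][x] for y in values) for x in values]   # column sums: P(X = x)
p_y = [sum(joint[y][x] for x in values) for y in values]   # row sums: P(Y = y)

e_x = sum(x * p_x[x] for x in values)
e_y = sum(y * p_y[y] for y in values)
e_y_given_x3 = sum(y * joint[y][3] / p_x[3] for y in values)

# Independence requires every cell to equal the product of its marginals.
independent = all(
    abs(joint[y][x] - p_x[x] * p_y[y]) < 1e-12 for x in values for y in values
)

# Probability that the two line lengths differ by exactly one customer.
p_diff_one = sum(joint[y][x] for x in values for y in values if abs(x - y) == 1)

print(round(e_x, 3), round(e_y, 3), round(e_y_given_x3, 3), independent, round(p_diff_one, 2))
```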

10. (6 points overall) Suppose you are estimating a parameter $\theta$, and that your estimator is consistent and normally distributed (everything would work also for asymptotic normality, but let us keep things simple). We know that if we test the null hypothesis that $\theta = \theta_0$, the significance level $\alpha$ is the probability of rejecting the null hypothesis when the null is actually correct. So, if $\alpha = 0.05$,

$$0.05 = \Pr(|t| > 1.96 \mid \theta = \theta_0)$$

where

$$t = \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})}$$

and where the probability is conditional on the null being true. Another important concept in econometrics is that of power. The power of a test is the probability of (correctly) rejecting the null when the null is actually false! The power of a test depends upon which value of $\theta$ is actually the true one. In fact, by definition,

$$\text{power}(\theta_A) = \Pr(|t| > 1.96 \mid \theta = \theta_A)$$

(a) (3 points) Now suppose that the true value of the parameter of interest $\theta = \theta_A$ is very far away from the null $\theta_0$. Suppose also that you have a very large sample. Do you think that the power of the test will be big or small? Justify your argument.

Solution: Given a large sample, the estimated value of the parameter will be close to its true value, $\theta_A$. Under some regularity conditions, the variance of $\hat{\theta}$ should be relatively low (since we have a lot of observations to estimate $\theta$ accurately). In this case the t-statistic, which is $\frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})}$, will be large in absolute value. Therefore, we expect the power of the test to be high as well.

(b) (3 points) You are interested in testing a certain null hypothesis. If you could choose between two tests 1 and 2, and if you knew that for every possible true value of the parameter $\text{power}(\theta_A)|_{\text{test 1}} > \text{power}(\theta_A)|_{\text{test 2}}$, which test would you choose, 1 or 2? Justify your answer.

Solution: High power means that we reject a false null hypothesis more often, whatever the "truth" is. Thus, we would choose the more powerful test, i.e. test 1.
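To make the power argument in part (a) concrete, the sketch below computes $\Pr(|t| > 1.96)$ under the simplifying assumption that the estimator is exactly normal with a known standard error, so that $t$ is normal with mean $(\theta_A - \theta_0)/SE$ and unit variance. The function name and the numerical values are assumptions chosen for illustration.

```python
from statistics import NormalDist

def power(theta_a: float, theta_0: float, se: float, critical: float = 1.96) -> float:
    """P(|t| > critical) when the estimator is N(theta_a, se^2) and t = (estimate - theta_0) / se."""
    shift = (theta_a - theta_0) / se      # mean of t when the truth is theta_a
    z = NormalDist()                      # standard normal distribution
    return (1.0 - z.cdf(critical - shift)) + z.cdf(-critical - shift)

# When the null is true, the rejection probability is the size of the test (about 0.05).
print(round(power(0.0, 0.0, 1.0), 3))

# Power rises as the standard error shrinks, which is what a larger sample delivers.
for se in (1.0, 0.5, 0.1):
    print(se, round(power(1.0, 0.0, se), 3))
```

A larger distance between the true value and the null, or a smaller standard error, both increase the mean of $t$ in absolute value and therefore the power.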


2 Estimation and OLS Theory

1. (6 points) Assume that the OLS linearity assumption holds. So $Y_i = \beta_0 + \beta_1 X_i + u_i$, and $E[u_i \mid X_i] = 0$. Here you have to prove your answer. Be as specific as you can. The variance of $Y_i$ is given by

(a) $\beta_0^2 + \beta_1^2\,\mathrm{Var}(X_i) + \mathrm{Var}(u_i)$.

(b) the variance of $u_i$.

(c) $\beta_1^2\,\mathrm{Var}(X_i) + \mathrm{Var}(u_i)$

(d) the variance of the residuals.

Answer: C (use the fact that $E[u_i \mid X_i] = 0$ implies that $u_i$ and $X_i$ are uncorrelated):

$$\begin{aligned}
\mathrm{Var}(Y_i) &= \mathrm{Var}(\beta_0 + \beta_1 X_i + u_i) \\
&= \mathrm{Var}(\beta_0) + \mathrm{Var}(\beta_1 X_i) + \mathrm{Var}(u_i) \\
&= 0 + \beta_1^2\,\mathrm{Var}(X_i) + \mathrm{Var}(u_i)
\end{aligned}$$

2. (6 points) We are using an estimator $\hat{\theta}$ to estimate a certain parameter $\theta_0$ and your sample contains 20000 observations. It turns out that the true value of the parameter is $\theta_0 = 2$, but our point estimate is $\hat{\theta} = 2.5$. What does this tell us about our estimator $\hat{\theta}$? Justify your answer.

(a) $\hat{\theta}$ has a large variance.

(b) $\hat{\theta}$ is biased.

(c) $\hat{\theta}$ is not consistent.

(d) none of the above.

Answer: D. That the point estimate is different from the true parameter value may indicate a large variance, but it does not necessarily do so. An estimator is unbiased when $E(\hat{\theta}) = \theta_0$. This is a statement about the expectation of the estimator, not about a single realization, and it is not what is given in the question. Therefore, the result may indicate bias, but not necessarily. Similarly, the information given in the question does not necessarily mean that $\hat{\theta}$ is inconsistent.

3. (21 points) In this problem, we will consider what happens when we run a "reverse" regression. Suppose that you have two variables $Y$ and $X$ for which the following relationship holds

$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (1)$$

Furthermore, suppose that the first three assumptions of OLS hold for this relationship, so that we can obtain unbiased and consistent estimates of $\beta_0$ and $\beta_1$ from an OLS regression of $Y$ on $X$. However, by solving (1) for $X_i$ we can also obtain

$$X_i = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} Y_i - \frac{u_i}{\beta_1} \qquad (2)$$

which we can simply rewrite as

$$X_i = \gamma_0 + \gamma_1 Y_i + v_i \qquad (3)$$

where $\gamma_0 \equiv -\frac{\beta_0}{\beta_1}$, $\gamma_1 \equiv \frac{1}{\beta_1}$, and $v_i \equiv -\frac{u_i}{\beta_1}$. Since (1) represents a true population relationship, we know that (3) does too (by construction). We are interested in whether we can use a regression based on (3) (i.e. an OLS regression of $X$ on $Y$) to recover a consistent estimate of $\gamma_1$.

(a) (5 points) First, let's establish a couple of preliminary results. Given that we know (from OLS Assumption 1) that $E(u \mid X) = 0$, prove that $\mathrm{Cov}(X, u) \equiv \sigma_{Xu} = 0$.

Solution: We already know (from class or using the LIE) that $E(u \mid X) = 0 \Rightarrow \mu_u = E(u) = 0$. Now let $E(X) = \mu_X$. Starting with the definition of covariance and expanding the product we find that:

$$\mathrm{Cov}(X, u) = E[(X - \mu_X)(u - \mu_u)] = E[(X - \mu_X)\,u] = E(Xu) - E(\mu_X u) = E(Xu) - \mu_X E(u) = E(Xu)$$

Now applying the Law of Iterated Expectations we have:

$$E(Xu) = E(E(Xu \mid X)) = E(X\,E(u \mid X)) = 0$$

So we conclude that $\mathrm{Cov}(X, u) = 0$.

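(b) (3 points) Now let $\mathrm{Var}(u) = \sigma_u^2$. Using the result from part (a), calculate the covariance between $Y$ and $v$ in terms of $\sigma_u^2$.

Solution: Since we know in general that $\mathrm{Cov}(a + bX + cY, dY) = bd\,\mathrm{Cov}(X, Y) + cd\,\mathrm{Cov}(Y, Y)$, we can see that

$$\mathrm{Cov}(Y, v) = \mathrm{Cov}\left(\beta_0 + \beta_1 X + u,\ -\frac{u}{\beta_1}\right) = -\mathrm{Cov}(X, u) - \frac{1}{\beta_1}\mathrm{Cov}(u, u) = -0 - \frac{\sigma_u^2}{\beta_1} = -\frac{\sigma_u^2}{\beta_1}$$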

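(c) (2 points) Now let's consider the proposed "reverse" regression. Show that the OLS estimator of $\gamma_1$ (from a regression of $X$ on $Y$) can be written as

$$\hat{\gamma}_1 = \gamma_1 + \frac{\sum_{i=1}^n (Y_i - \bar{Y})\,v_i}{\sum_{i=1}^n (Y_i - \bar{Y})^2}$$

Solution: We know that in the usual regression of $Y$ on $X$ the estimator of the slope $\beta_1$ can be written as

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (X_i - \bar{X})\,u_i}{\sum_{i=1}^n (X_i - \bar{X})^2}$$

Simply switching the roles of $Y$ and $X$ and using (3) we see that $\hat{\gamma}_1$ can be written as

$$\hat{\gamma}_1 = \gamma_1 + \frac{\sum_{i=1}^n (Y_i - \bar{Y})\,v_i}{\sum_{i=1}^n (Y_i - \bar{Y})^2}$$

(d) (6 points) Is $\hat{\gamma}_1$ a consistent estimator for $\gamma_1$? Prove your answer.

Solution: We know that the OLS estimator $\hat{\gamma}_1$ can be written as

$$\hat{\gamma}_1 = \gamma_1 + \frac{\sum_{i=1}^n (Y_i - \bar{Y})\,v_i}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = \gamma_1 + \frac{\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})(v_i - \bar{v})}{\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2} = \gamma_1 + \frac{s_{Yv}}{s_Y^2}$$

We know that $s_{Yv} \xrightarrow{p} \sigma_{Yv}$ and $s_Y^2 \xrightarrow{p} \sigma_Y^2$, so that $\hat{\gamma}_1 \xrightarrow{p} \gamma_1 + \frac{\sigma_{Yv}}{\sigma_Y^2}$. But since $\sigma_{Yv} = -\frac{\sigma_u^2}{\beta_1}$, the second term does not converge to zero, so $\hat{\gamma}_1$ is not a consistent estimator for $\gamma_1$.

(e) (5 points) Suppose you were able to find a consistent estimator of $\gamma_1$. Let's call this estimator $\tilde{\gamma}_1$. Can you use $\tilde{\gamma}_1$ to construct a consistent estimator of $\beta_1$? If yes, do so. If not, explain why you can't.

Solution: Since $\tilde{\gamma}_1$ is a consistent estimator of $\gamma_1$, we know that $\tilde{\gamma}_1 \xrightarrow{p} \gamma_1 \equiv \frac{1}{\beta_1}$. Therefore, by the continuous mapping theorem, we know that $\frac{1}{\tilde{\gamma}_1} \xrightarrow{p} \frac{1}{\gamma_1} \equiv \beta_1$ (assuming that $\gamma_1 \neq 0$). So $\frac{1}{\tilde{\gamma}_1}$ is a consistent estimator of $\beta_1$.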
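The inconsistency of the reverse regression can be seen in a simulation. The sketch below assumes hypothetical values for the true coefficients (intercept 1, slope 2) and standard normal $X$ and $u$; under those assumptions the slope from regressing $X$ on $Y$ should settle near $\frac{1}{2} + \frac{-1/2}{5} = 0.4$ and not near the reciprocal of the true slope, $0.5$.

```python
import random

random.seed(1)                     # fixed seed for reproducibility
beta0, beta1 = 1.0, 2.0            # hypothetical true coefficients in Y = beta0 + beta1*X + u
n = 50_000                         # hypothetical (large) sample size

x = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]     # drawn independently of X, so E(u | X) = 0
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope from the reverse regression of X on Y.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)
reverse_slope = s_xy / s_yy

target = 1.0 / beta1                                  # the reciprocal of the true slope: 0.5
sigma_u2, sigma_x2 = 1.0, 1.0                         # variances used to simulate u and X
sigma_y2 = beta1 ** 2 * sigma_x2 + sigma_u2           # Var(Y) = 5
plim = target + (-sigma_u2 / beta1) / sigma_y2        # target + Cov(Y, v) / Var(Y) = 0.4

print(round(reverse_slope, 3), target, round(plim, 3))
```

The gap between the simulated slope and $0.5$ does not shrink as `n` grows, which is what inconsistency means in practice.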

4. You want to calculate the mean proportion of the total budget that individuals devote to health expenditure in the US. You have an iid sample of $n$ individuals from a large household survey, where total expenditure ($Y_i$) and health expenditure ($H_i$) are recorded for each individual. Let $W_i$ be the budget share for health, that is,

$$W_i = \frac{H_i}{Y_i}.$$

So, you would like to estimate $E(W_i) = \mu_W$.

(a) (8 points) Suggest a consistent estimator for $E(W_i)$. Write down the proposed estimator, and explain why it is consistent (this is very simple, so please do not look for difficult or long answers).

Solution: We estimate expectations with their respective sample equivalent, that is, the sample mean. If I want to estimate the expectation of $W$, my estimator will be

$$\bar{W} = \frac{1}{n}\sum_{i=1}^n W_i$$

This is consistent by the usual LLN.

(b) (8 points) A researcher suggests that one may estimate $\mu_W$ by calculating the ratio of total expenditure in health for all individuals in the sample over total expenditure for all individuals in the sample. That is, the suggested estimator is

$$\hat{\mu}_W = \frac{\sum_{i=1}^n H_i}{\sum_{i=1}^n Y_i}$$

Is $\hat{\mu}_W$ a consistent estimator for $\mu_W$? Prove your answer.

Solution: No, it is not consistent, because it will converge to the ratio of the two expectations, and this is in general different from the expectation of the ratio!

$$\frac{\sum_{i=1}^n H_i}{\sum_{i=1}^n Y_i} = \frac{\frac{1}{n}\sum_{i=1}^n H_i}{\frac{1}{n}\sum_{i=1}^n Y_i} \xrightarrow{p} \frac{E[H]}{E[Y]} \neq E\left[\frac{H}{Y}\right]$$

In fact, you can show that this estimator is a weighted average of budget shares, with more weight given to richer households!

$$\frac{\sum_{i=1}^n H_i}{\sum_{i=1}^n Y_i} = \frac{\sum_{i=1}^n \frac{H_i}{Y_i} Y_i}{\sum_{i=1}^n Y_i} = \frac{\sum_{i=1}^n Y_i \left(\frac{H_i}{Y_i}\right)}{\sum_{i=1}^n Y_i} = \sum_{i=1}^n \left(\frac{Y_i}{\sum_{k=1}^n Y_k}\right)\left(\frac{H_i}{Y_i}\right)$$

5. (12 points total) You want to calculate the average share of income that people donate to charity in the U.S. You have an iid sample of $n$ individuals from a large government survey, where total charitable donations ($D_i$) and total income ($Y_i$) are recorded for each individual. Let $C_i$ be the share of income donated to charity:

$$C_i = \frac{D_i}{Y_i}.$$

You would like to estimate $E(C_i) = \mu_C$.

(a) (5 points) Suggest a consistent estimator for $E(C_i)$. Write down the proposed estimator, and explain why it is consistent.

Solution: The sample mean $\bar{C} = \frac{1}{n}\sum_{i=1}^n C_i = \frac{1}{n}\sum_{i=1}^n \frac{D_i}{Y_i}$ is a consistent estimator. We know it's consistent by the law of large numbers, which states that the sample mean is a consistent estimator of the true mean (for an iid sample with finite variance).

(b) (7 points) A colleague suggests that you could estimate $\mu_C$ by calculating the ratio of total charitable donations for all individuals in the sample over total income for all individuals in the sample:

$$\hat{\mu}_C = \frac{\sum_{i=1}^n D_i}{\sum_{i=1}^n Y_i}$$

Is $\hat{\mu}_C$ a consistent estimator for $\mu_C$? Prove your answer.

Solution: This is not a consistent estimator for $\mu_C \equiv E\left(\frac{D}{Y}\right)$. The probability limit of $\hat{\mu}_C$ is $\frac{E(D)}{E(Y)} = \frac{\mu_D}{\mu_Y}$, because

$$\frac{\frac{1}{n}\sum_{i=1}^n D_i}{\frac{1}{n}\sum_{i=1}^n Y_i} \xrightarrow{p} \frac{\mu_D}{\mu_Y} = \frac{E(D)}{E(Y)}$$

but we know that in general $\frac{E(D)}{E(Y)} \neq E\left(\frac{D}{Y}\right) = E(C) = \mu_C$, which is what we want to estimate here.


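6. You have $n$ observations, and you calculate the sample mean, which, by definition, is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

(a) (4 points) Calculate $\sum_{i=1}^n (X_i - \bar{X})$ (show your steps!!).

Solution:

$$\sum_{i=1}^n (X_i - \bar{X}) = \sum_{i=1}^n X_i - \sum_{i=1}^n \bar{X} = n\bar{X} - n\bar{X} = 0$$

Obviously, the mean deviation from the mean... is zero!

(b) (5 points) Prove that $\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2$. And once again don't forget to show your steps!!!

Solution:

$$\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X})^2 &= \sum_{i=1}^n \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) = \sum_{i=1}^n X_i^2 - \sum_{i=1}^n 2X_i\bar{X} + \sum_{i=1}^n \bar{X}^2 \\
&= \sum_{i=1}^n X_i^2 - 2\bar{X}\sum_{i=1}^n X_i + n\bar{X}^2 = \sum_{i=1}^n X_i^2 - 2\bar{X}\,n\bar{X} + n\bar{X}^2 \\
&= \sum_{i=1}^n X_i^2 - n\bar{X}^2
\end{aligned}$$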

7. (10 points) This is a simple exercise with an important "hidden" lesson. Suppose that grade $G$ in your econometrics class depends on the difficulty $D$ of an exam. Your crazy instructor does not like students arriving late in class, because he finds it distracting. So, chances are that if students keep arriving late, he will be subconsciously upset while writing exams, with the consequence that the likelihood of having hard exam questions will increase. Let $L$ denote a binary variable equal to one if some students keep arriving late, and equal to zero otherwise. Let $D$ be a binary variable equal to 1 if the exam is "difficult", and equal to zero otherwise. You know that $\Pr(D = 1 \mid L = 1) = .5$, and $\Pr(D = 1 \mid L = 0) = .2$. You also know that conditional on exam difficulty, $G$ and $L$ are independent, that is, once you know if $D$ is equal to one or zero, knowing whether students used to arrive late carries no information whatsoever about $G$. Exams are graded on a 0 to 100 scale, and based on data from previous years you know that $E[G \mid D = 1] = 60$ and $E[G \mid D = 0] = 75$. Calculate $E[G \mid L = 0]$ and $E[G \mid L = 1]$. Based on your results, would you say that arriving late in class is a good or a bad strategy if one wants to maximize the grade? You do not need to turn in answers to the last question, which is just meant to be rhetorical.

SOLUTION: We know that exams can be either difficult or not difficult. So $E[G \mid L = 0]$ can be written as

$$E[G \mid L = 0] = E[G \mid L = 0, D = 1]\,P(D = 1 \mid L = 0) + E[G \mid L = 0, D = 0]\,P(D = 0 \mid L = 0).$$

We know that once you know if $D$ is equal to one or zero, knowing whether students used to arrive late carries no information whatsoever about $G$. So we can be sure that

$$E[G \mid L = 0, D] = E[G \mid D].$$

But then we have all the elements to calculate the conditional expectations we are after, because

$$E[G \mid L = 0] = E[G \mid D = 1]\,P(D = 1 \mid L = 0) + E[G \mid D = 0]\,P(D = 0 \mid L = 0) = 60\,(.2) + 75\,(.8) = 72$$

Similarly, we have

$$E[G \mid L = 1] = E[G \mid D = 1]\,P(D = 1 \mid L = 1) + E[G \mid D = 0]\,P(D = 0 \mid L = 1) = 60\,(.5) + 75\,(.5) = 67.5 < 72$$

8. You have an i.i.d. sample of $n$ observations drawn from a random variable (RV) $X$. You can assume that all i.i.d. RVs in this problem have finite variance. By definition, the variance of $X$ is $\sigma_X^2 = E\left[(X - \mu_X)^2\right]$. The variance is just an expected value, and we know that expected values can usually be estimated using their "sample equivalent", that is, a sample mean. So, if $\mu_X$ is known, one can estimate the variance with

$$\hat{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)^2 \qquad (1)$$

(a) Show that $\hat{\sigma}_X^2$ is an unbiased estimator of $\sigma_X^2$, that is, show that $E\left(\hat{\sigma}_X^2\right) = \sigma_X^2$.

$$E\left(\hat{\sigma}_X^2\right) = E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)^2\right] = \frac{1}{n}\sum_{i=1}^n E(X_i - \mu_X)^2 = \frac{1}{n}\sum_{i=1}^n \sigma_X^2 = \sigma_X^2$$

(b) Show that $\hat{\sigma}_X^2 \xrightarrow{p} \sigma_X^2$ (that is, $\hat{\sigma}_X^2$ converges in probability to the true variance, so that $\hat{\sigma}_X^2$ consistently estimates the variance).

We know that $(X_i - \mu_X)^2$ is iid, because it is a deterministic function of an iid RV. The problem tells us that all iid variables in this problem have finite variance, so we can use the LLN and conclude that

$$\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)^2 \xrightarrow{p} E(X_i - \mu_X)^2 = \sigma_X^2$$

(c) To use (1) above, you need to know $\mu_X$, which in general is not known. So now suppose that you have to estimate $\mu_X$ as well. Of course we will use a sample mean to estimate this parameter too. Then the estimator for the variance becomes a "two-step estimator". Let us call it $\tilde{\sigma}_X^2$. In a first step you estimate $\mu_X$ by using $\bar{X}$; then you plug this estimate into (1) to obtain $\tilde{\sigma}_X^2$. So, if the mean is unknown, we have

$$\tilde{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$

We want to see if this estimator is unbiased and consistent for $\sigma_X^2$. We can do this through a sequence of small steps. First, if you add and subtract $\mu_X$ in the expression in parentheses what you get is

$$\tilde{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^n \left[(X_i - \mu_X) - (\bar{X} - \mu_X)\right]^2 \qquad (2)$$

(d) Let $\sigma_{\bar{X}}^2$ represent the variance of the sample mean. Prove that $\sigma_{\bar{X}}^2 = E\left(\bar{X}^2\right) - \mu_X^2$.

We know that $\bar{X}$ is a random variable with mean $\mu_X$ and variance $\frac{\sigma_X^2}{n}$. We also know that in general the variance of a RV can be written as the difference between the expected value of its square and the square of its expected value. Hence the result follows.

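(e) Now, prove that $\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)(\bar{X} - \mu_X) = \bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2$.

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)(\bar{X} - \mu_X) &= \frac{1}{n}\sum_{i=1}^n \left(X_i\bar{X} - X_i\mu_X - \mu_X\bar{X} + \mu_X^2\right) \\
&= \bar{X}\,\frac{1}{n}\sum_{i=1}^n X_i - \mu_X\,\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\mu_X\bar{X}\sum_{i=1}^n 1 + \frac{1}{n}\sum_{i=1}^n \mu_X^2 \\
&= \bar{X}^2 - \mu_X\bar{X} - \mu_X\bar{X} + \mu_X^2 = \bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2
\end{aligned}$$

(f) Show that $E\left[\bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2\right] = \frac{\sigma_X^2}{n}$.

$$E\left[\bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2\right] = E\left[\bar{X}^2\right] - 2\mu_X^2 + \mu_X^2 = E\left[\bar{X}^2\right] - \mu_X^2 = \frac{\sigma_X^2}{n}$$

where $E\left[\bar{X}^2\right] - \mu_X^2$ is the variance of the sample mean (part (d)), which is equal to $\frac{\sigma_X^2}{n}$.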

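(g) Show that $E\left[\frac{1}{n}\sum_{i=1}^n (\bar{X} - \mu_X)^2\right] = \frac{\sigma_X^2}{n}$.

$$E\left[\frac{1}{n}\sum_{i=1}^n (\bar{X} - \mu_X)^2\right] = \frac{1}{n}\sum_{i=1}^n E\left[(\bar{X} - \mu_X)^2\right] = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$$

(h) Expanding the square in equation (2), and using the results in parts (a), (f), and (g), show that

$$E\left(\tilde{\sigma}_X^2\right) = \frac{n-1}{n}\,\sigma_X^2$$

$$\begin{aligned}
E\left(\tilde{\sigma}_X^2\right) &= E\left[\frac{1}{n}\sum_{i=1}^n \left[(X_i - \mu_X) - (\bar{X} - \mu_X)\right]^2\right] \\
&= \underbrace{E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)^2\right]}_{\text{from (a)}} + \underbrace{E\left[\frac{1}{n}\sum_{i=1}^n (\bar{X} - \mu_X)^2\right]}_{\text{from (g)}} - 2\underbrace{E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu_X)(\bar{X} - \mu_X)\right]}_{\text{from (f) (and (e))}} \\
&= \sigma_X^2 + \frac{\sigma_X^2}{n} - 2\,\frac{\sigma_X^2}{n} = \sigma_X^2 - \frac{\sigma_X^2}{n} = \left(\frac{n-1}{n}\right)\sigma_X^2
\end{aligned}$$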

(i) So, $\tilde{\sigma}_X^2$ is a biased estimator of the variance! What happens to the bias when $n \to \infty$?

Our estimator is biased, but the bias goes to zero when $n$ grows large. So, we should expect this estimator to be biased but consistent.

(j) Given the results above, find an unbiased estimator for the variance $\sigma_X^2$ for the case where $\mu_X$ (as in this case) is not known.

We know that

$$E\left(\tilde{\sigma}_X^2\right) = \left(\frac{n-1}{n}\right)\sigma_X^2$$

so that an unbiased estimator can be obtained simply by multiplying $\tilde{\sigma}_X^2$ by the inverse of $\frac{n-1}{n}$. Let

$$S^2 \equiv \frac{n}{n-1}\,\tilde{\sigma}_X^2$$

then

$$E\left(S^2\right) = \frac{n}{n-1}\,E\left(\tilde{\sigma}_X^2\right) = \frac{n}{n-1}\left(\frac{n-1}{n}\right)\sigma_X^2 = \sigma_X^2$$

So, an unbiased estimator of the variance is (see equation 3.7 in Stock & Watson)

$$S^2 = \frac{n}{n-1}\,\tilde{\sigma}_X^2 = \frac{n}{n-1}\,\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
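The bias factor $\frac{n-1}{n}$ derived in problem 8 can be verified exactly on a tiny example by enumerating every possible sample. The sketch assumes a hypothetical population in which $X$ is 0 or 1 with equal probability (so the variance is $\frac{1}{4}$) and a sample size of 3; neither choice comes from the problem set.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X is 0 or 1 with probability 1/2 each, so Var(X) = 1/4.
support = [Fraction(0), Fraction(1)]
n = 3
true_variance = Fraction(1, 4)

samples = list(product(support, repeat=n))     # all 2^n equally likely samples
expected_tilde = Fraction(0)                   # expectation of the divide-by-n estimator
expected_s2 = Fraction(0)                      # expectation of the divide-by-(n-1) estimator
for sample in samples:
    x_bar = sum(sample) / n
    ss = sum((xi - x_bar) ** 2 for xi in sample)     # sum of squared deviations
    expected_tilde += ss / n / len(samples)
    expected_s2 += ss / (n - 1) / len(samples)

print(expected_tilde, expected_s2)
```

Because every sample is enumerated, the two printed expectations are exact: the first equals $\frac{n-1}{n}$ times the true variance and the second equals the true variance.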

9. (6 points) Suppose you are given the following model:

$$y_i = \mu + u_i$$

where $\mu$ is an unknown constant.

(a) (2 points) What is the OLS estimator for $\mu$? What other estimator is computed in the same way?

Answer: The OLS estimator solves the minimization problem

$$\min_{\mu} \sum \hat{u}_i^2 = \min_{\mu} \sum (y_i - \hat{y}_i)^2 = \min_{\mu} \sum (y_i - \mu)^2$$

$$\frac{\partial \sum (y_i - \mu)^2}{\partial \mu} = 0 \implies \sum (y_i - \hat{\mu}) = 0 \implies \hat{\mu} = \frac{1}{n}\sum y_i = \bar{y}$$

which is just the sample mean of $y$. This corresponds to the univariate regression where all the $X_i$'s (and therefore $\bar{X}$) equal zero. So $\beta_1$ (and $\hat{\beta}_1$) equal zero and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} = \bar{Y}$.

(b) (2 points) Compute $\frac{1}{n-1}\sum \hat{u}_i^2$. What other estimator is computed in the same way?

Answer: $\frac{1}{n-1}\sum \hat{u}_i^2 = \frac{1}{n-1}\sum (y_i - \hat{y}_i)^2 = \frac{1}{n-1}\sum (y_i - \hat{\mu})^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2$, which is just the sample variance of $y$.

(c) (2 points) Interpret your findings from a) and b).

Answer: As we showed in class, $\bar{Y}$ is the least squares estimator of $\mu_Y$ ($\min_{\mu} \sum (y_i - \mu)^2$), so part a) should not be surprising. In part b), you are calculating a measure of the variance that you are not explaining, which in this case is all of the variation around the mean of $Y$ (i.e. the variance of $Y$).

10. You have a sample of $n$ i.i.d. observations from two independent random variables $e_i$ and $u_i$, and you know that these two random variables both have mean zero. You also know that $\mathrm{Var}(e_i) = \sigma_e^2 < \infty$ and $\mathrm{Var}(u_i) = \sigma_u^2 < \infty$. Let $\bar{e}$ and $\bar{u}$ denote the sample means of the two random variables.

(a) Using the Central Limit Theorem (CLT), prove that $\sqrt{n}\,\bar{e} \xrightarrow{d} N(0, \sigma_e^2)$, that is, prove that the asymptotic distribution of $\sqrt{n}\,\bar{e}$ is a normal with mean zero and variance $\sigma_e^2$.

Solution: From the CLT (and using the fact that we have iid observations with finite variance) we know that

$$\frac{\bar{e} - \mu_e}{\sigma_e / \sqrt{n}} = \frac{\sqrt{n}\,(\bar{e} - \mu_e)}{\sigma_e} \xrightarrow{d} N(0, 1) \;\Rightarrow\; \sqrt{n}\,(\bar{e} - \mu_e) \xrightarrow{d} N\left(0, \sigma_e^2\right).$$

Then the conclusion simply follows noting that $\mu_e = 0$ by assumption.

(b) Using again the CLT, determine the asymptotic distribution of $\sqrt{n}\,\bar{u}$.

Solution: This is just the same as in the previous point.

$$\frac{\bar{u} - \mu_u}{\sigma_u / \sqrt{n}} = \frac{\sqrt{n}\,(\bar{u} - 0)}{\sigma_u} \xrightarrow{d} N(0, 1) \;\Rightarrow\; \sqrt{n}\,\bar{u} \xrightarrow{d} N\left(0, \sigma_u^2\right)$$

(c) Using the results from the previous two parts, and assuming that the sample size $n$ is very large (but not $\infty$), what is the distribution of $(\bar{e} + \bar{u})$ approximately equal to?

Solution: We know that the sum of two independent normal random variables is normal, and we know that $u$ and $e$ are independent, hence uncorrelated with zero covariance. Then in large samples

$$\sqrt{n}\,(\bar{e} + \bar{u}) \approx N\left(0, \sigma_e^2 + \sigma_u^2\right).$$

But then it will also be the case that

$$(\bar{e} + \bar{u}) \approx N\left(0, \frac{\sigma_e^2 + \sigma_u^2}{n}\right)$$

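11. Let the following definitions hold:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

$$\hat{\sigma}^2_{XY} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$$

(a) Prove that

$$\hat{\sigma}^2_{XY} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\,Y_i = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})\,X_i$$

Proof:

$$\begin{aligned}
\hat{\sigma}^2_{XY} &= \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n}\sum_{i=1}^n \left[(X_i - \bar{X})\,Y_i - (X_i - \bar{X})\,\bar{Y}\right] \\
&= \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\,Y_i - \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\,\bar{Y} \\
&= \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\,Y_i - \frac{1}{n}\,\bar{Y}\underbrace{\sum_{i=1}^n (X_i - \bar{X})}_{=0} \\
&= \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\,Y_i
\end{aligned}$$

Similarly:

$$\begin{aligned}
\hat{\sigma}^2_{XY} &= \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})\,X_i - \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})\,\bar{X} \\
&= \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})\,X_i - \frac{1}{n}\,\bar{X}\sum_{i=1}^n (Y_i - \bar{Y}) = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})\,X_i
\end{aligned}$$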

12. You know that companies in a certain economic sector produce an item $y$ using labor ($L$) and capital ($K$). However, different companies use slightly different technologies. The technology for the $i$th firm can be described by the following relation

$$y_i = \alpha L_i^{\beta} K_i^{1-\beta} u_i \qquad (1)$$

where $\alpha$ and $\beta$ are just two constants, and $u_i$ is an error or residual term which represents the fact that different firms use partly different technologies (note that the functional form and the parameters are common to all firms). (Incidentally, you might remember from other ECON courses that (1) is a Cobb-Douglas production function, which here has a random component.) You also know that the following assumption holds:

$$E(u_i \mid L_i, K_i) = 1$$

(a) Prove that $E(y_i \mid L_i, K_i) = \alpha L_i^{\beta} K_i^{1-\beta}$.

$$E(y_i \mid L_i, K_i) = E\left(\alpha L_i^{\beta} K_i^{1-\beta} u_i \mid L_i, K_i\right) = \alpha L_i^{\beta} K_i^{1-\beta}\,E(u_i \mid L_i, K_i) = \alpha L_i^{\beta} K_i^{1-\beta}$$

(b) Using the information provided, can you conclude that the following equality holds?

$$E(\ln y_i \mid L_i, K_i) = \ln\alpha + \beta \ln L_i + (1 - \beta)\ln K_i$$

No! In fact:

$$\begin{aligned}
E(\ln y_i \mid L_i, K_i) &= E\left[\ln\alpha + \beta \ln L_i + (1 - \beta)\ln K_i + \ln u_i \mid L_i, K_i\right] \\
&= \ln\alpha + \beta \ln L_i + (1 - \beta)\ln K_i + E\left[\ln u_i \mid L_i, K_i\right]
\end{aligned}$$

but even if we know that $E[u_i \mid L_i, K_i] = 1$, we CANNOT conclude that $E[\ln u_i \mid L_i, K_i] = \ln(1) = 0$. The logarithm is not a linear operation, and in general

$$E(g(x)) \neq g(E(x))$$
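A two-point example makes the last point of problem 12 concrete. The distribution of the error term below is hypothetical, chosen only so that its mean is exactly 1; the logarithm of the mean is then zero, while the mean of the logarithm is not.

```python
import math

# Hypothetical error term: u is 0.5 or 1.5 with probability 1/2 each, so E(u) = 1.
outcomes = [(0.5, 0.5), (1.5, 0.5)]      # (value, probability) pairs

e_u = sum(p * v for v, p in outcomes)                 # E(u)
e_ln_u = sum(p * math.log(v) for v, p in outcomes)    # E(ln u)

print(e_u, round(e_ln_u, 4), math.log(e_u))
```

Here the mean of the logarithm is strictly negative even though the mean of the error term is one, so a log-linear regression on these data would not recover the log of the scale constant in its intercept.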

13. You want to estimate an expected value $\mu_X$, and you want to compare the performance of two estimators $\hat\mu_1$ and $\hat\mu_2$. You have an iid sample of 3 observations $X_1, X_2, X_3$. The two estimators are defined as follows

$$\hat\mu_1 = \frac{1}{3}\sum_{i=1}^{3} X_i$$

$$\hat\mu_2 = \frac{1}{6}X_1 + \frac{2}{3}X_2 + \frac{1}{6}X_3$$

(a) Are $\hat\mu_1$ and $\hat\mu_2$ unbiased? Yes, both estimators are unbiased, as

$$E[\hat\mu_1] = E\left[\frac{1}{3}\sum_{i=1}^{3} X_i\right] = \frac{1}{3}\cdot 3\,E[X] = E[X]$$

$$E[\hat\mu_2] = E\left[\frac{1}{6}X_1 + \frac{2}{3}X_2 + \frac{1}{6}X_3\right] = \left(\frac{1}{6} + \frac{2}{3} + \frac{1}{6}\right)E[X] = E[X].$$

Here we used the fact that the sample draws are identically distributed.

(b) Which estimator, between $\hat\mu_1$ and $\hat\mu_2$, is more efficient? Estimator 1, $\hat\mu_1$, is more efficient, as $Var[\hat\mu_1] < Var[\hat\mu_2]$.

To show this, look at the variances, using the i.i.d. property of the draws in our sample:

$$Var[\hat\mu_1] = \frac{1}{3}Var[X], \qquad Var[\hat\mu_2] = \left(\frac{1}{36} + \frac{4}{9} + \frac{1}{36}\right)Var[X] = \frac{1}{2}Var[X].$$

(c) Which estimator has the smallest MSE? Estimator 1, $\hat\mu_1$, has the smallest MSE, as it is more efficient and both estimators are unbiased.
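For a linear estimator $\sum_i w_i X_i$ built from iid draws, unbiasedness requires $\sum_i w_i = 1$ and the variance equals $\left(\sum_i w_i^2\right) Var[X]$. The short sketch below checks both quantities for the two sets of weights with exact fractions; the helper name `weight_summary` is an illustrative choice, not something from the problem.

```python
from fractions import Fraction


def weight_summary(weights):
    """Return (sum of weights, sum of squared weights) for the estimator sum_i w_i * X_i."""
    total = sum(weights)
    var_factor = sum(w * w for w in weights)
    return total, var_factor


w1 = [Fraction(1, 3)] * 3
w2 = [Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)]

sum1, var1 = weight_summary(w1)
sum2, var2 = weight_summary(w2)

print(sum1, var1)  # 1 1/3
print(sum2, var2)  # 1 1/2
```

Both weight vectors sum to one (unbiased), and the equal-weight estimator has the smaller variance factor, 1/3 against 1/2.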

14. Here we see what can happen when we use assumptions that are not correct. Suppose that you have an iid sample, and you correctly assume that the conditional expectation of $Y$ given $X$ is linear, so that $E(u_i \mid X_i) = 0$. However, suppose also that you incorrectly assume that the intercept is zero. So the "truth" is that the regression is the "usual"

$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (2)$$

with $\beta_0 \neq 0$, but instead you assume that

$$Y_i = \beta_1 X_i + u_i. \qquad (3)$$

If you think that (3) is correct (which it isn't), the OLS estimator for the slope is (make sure you understand why) the solution $\hat\beta_1^{wrong}$ of the following problem (you minimize the sum of squared errors)

$$\min_{b_1} \sum_{i=1}^{n} (Y_i - b_1 X_i)^2$$

(a) Using steps analogous to those we saw in class for the "standard" OLS estimators, show that now the OLS estimator for the slope is

$$\hat\beta_1^{wrong} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$

Solution: Expand the objective, differentiate with respect to $b_1$, and set the derivative equal to zero:

$$\min_{b_1} \sum (Y_i - b_1 X_i)^2 = \min_{b_1} \sum \left(Y_i^2 - 2 b_1 Y_i X_i + b_1^2 X_i^2\right)$$

$$\sum \left(-2 Y_i X_i + 2 b_1 X_i^2\right) = 0$$

$$-2\sum Y_i X_i = -2 b_1 \sum X_i^2$$

$$\hat\beta_1^{wrong} = \frac{\sum Y_i X_i}{\sum X_i^2}$$

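(b) We want to see what happens when we use this estimator when the assumption that the intercept is zero is incorrect. Using what you know to be the truth about $Y_i$, show that

$$\hat\beta_1^{wrong} = \beta_1 + \beta_0\frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} X_i^2} + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$

Solution: Substitute the true model for $Y_i$:

$$\hat\beta_1^{wrong} = \frac{\sum (\beta_0 + \beta_1 X_i + u_i) X_i}{\sum X_i^2} = \beta_0\frac{\sum X_i}{\sum X_i^2} + \beta_1\frac{\sum X_i^2}{\sum X_i^2} + \frac{\sum X_i u_i}{\sum X_i^2}$$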

(c) What does $\frac{1}{n}\sum_{i=1}^{n} X_i^2$ converge in probability to? That is, calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i^2$. (This is really simple, and does not require any long calculation! The two following points should be even simpler.) Solution: Just use the LLN and the fact that the observations are iid. Then

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{p} E\left[X_i^2\right]$$

(d) Calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i u_i$. Solution: like before, just use iid-ness and the LLN. Then use the LIE and the assumption $E(u_i \mid X_i) = 0$:

$$\frac{1}{n}\sum_{i=1}^{n} X_i u_i \xrightarrow{p} E[X_i u_i] = E\left[E(X_i u_i \mid X_i)\right] = E\left[X_i E(u_i \mid X_i)\right] = E[X_i \cdot 0] = 0$$

(e) Calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i$. Solution: like before, just use iid-ness and the LLN, so that $\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} E[X_i]$.

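(f) Using the previous results, show whether (or not) $\hat\beta_1^{wrong}$ is a consistent estimator for the slope $\beta_1$. Solution: Putting all pieces together:

$$\hat\beta_1^{wrong} = \beta_0\frac{\frac{1}{n}\sum X_i}{\frac{1}{n}\sum X_i^2} + \beta_1 + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} \xrightarrow{p} \beta_1 + \beta_0\frac{E[X_i]}{E\left[X_i^2\right]} \neq \beta_1$$

unless $\beta_0 = 0$ (which of course means that the "wrong" model is right after all) and/or $E[X_i] = 0$, in which case the omitted constant is uncorrelated with the regressor and leaving it out does not change the probability limit of the slope.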

(g) Give an intuition for your results in part (f). (Hint: draw a scatterplot of points, with $X_i$ on the x-axis and $Y_i$ on the y-axis, more or less around a line, and do it in such a way that the line should NOT have an intercept equal to zero. Then think about the consequences of minimizing the sum of squared residuals from that scatterplot, but using a line that MUST pass through the origin of your graph. How would you end up drawing it? What would be the relation between the slope estimated without the assumption of $\beta_0 = 0$, and the one estimated with it?) Solution: Using $\hat\beta_1^{wrong}$ leads to inconsistency because it forces the regression line through the origin. Assuming that the $X_i$'s are non-negative, if $\beta_0$ is negative, this will make $\hat\beta_1^{wrong}$ smaller than it should be in reality. This means we would under-estimate the impact of $X$ on $Y$. Conversely, if $\beta_0$ is positive, this will make $\hat\beta_1^{wrong}$ larger than it should be. In other words, we would over-estimate the impact of $X$ on $Y$.
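A small simulation can make the inconsistency result concrete. The sketch below is an illustration under assumed values that do not come from the problem: $X_i$ uniform on $[0, 2]$ (so $E[X_i] = 1$ and $E[X_i^2] = 4/3$), $\beta_0 = 1$, $\beta_1 = 2$, and standard normal errors. Under these assumptions the through-the-origin slope should settle near $\beta_1 + \beta_0 E[X_i]/E[X_i^2] = 2.75$ rather than 2.

```python
import random


def slope_through_origin(xs, ys):
    """OLS slope when the fitted line is forced through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def simulate(beta0=1.0, beta1=2.0, n=200_000, seed=1):
    """Draw data from Y = beta0 + beta1 * X + u and fit the no-intercept model."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]  # E[X] = 1, E[X^2] = 4/3
    ys = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
    return slope_through_origin(xs, ys)


estimate = simulate()
plim = 2.0 + 1.0 * 1.0 / (4.0 / 3.0)  # beta1 + beta0 * E[X] / E[X^2]
print("estimated slope:", round(estimate, 3))
print("probability limit from part (f):", plim)
```

Because $\beta_0 > 0$ and the regressor is non-negative in this setup, the estimate overshoots the true slope, as described in part (g).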


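15. You are interested in the relation between wages and education. For the purpose of this problem, you may disregard the issues of omitted variable bias that we have frequently mentioned. Suppose that you know that the correct model is the following:

$$\ln wage_i = \beta_0 + \beta_{1i}\, educ_i + u_i$$

where $Cov(educ_i, u_i) = 0$, and where you should notice that the slope changes for different individuals. Then, we can treat the individual-specific slopes as random variables themselves. You have a sample of iid observations, and you have a single observation for each individual $i$. Clearly, you cannot estimate the individual-specific slope, as you have only one observation per individual. However, you may want to estimate the mean slope in the population, that is, the mean percentage increase in wages associated with one more year of education. Let this parameter be

$$\beta_1 = E(\beta_{1i})$$

You also know that the value of the slope is statistically independent from education. You want to understand if an OLS regression of $\ln wage_i$ on $educ_i$ allows you to estimate $\beta_1$ consistently. As usual, then, the starting point is the OLS estimate for the slope, which in this case becomes

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)\ln wage_i}{\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)^2}$$

(a) (4 points) Prove that

$$\hat\beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)\beta_{1i}\, educ_i}{\frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)^2} + \frac{\frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)u_i}{\frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)^2}$$

Substituting the expression for $\ln wage_i$ in the formula for $\hat\beta_1$:

$$
\begin{aligned}
\hat\beta_1 &= \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})(\beta_0 + \beta_{1i}\, educ_i + u_i)}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2} \\
&= \beta_0 \cdot \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})\beta_{1i}\, educ_i}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})u_i}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2} \\
&= 0 + \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})\beta_{1i}\, educ_i}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})u_i}{\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})^2}
\end{aligned}
$$

The first term vanishes because deviations from the sample mean sum to zero.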

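(b) (5 points) We know that under the usual "regularity" conditions

$$\text{plim}\, \frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)u_i = Cov(educ_i, u_i)$$

$$\text{plim}\, \frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)^2 = \sigma^2_{educ}$$

What is the probability limit of $\frac{1}{n}\sum_{i=1}^{n}\left(educ_i - \overline{educ}\right)\beta_{1i}\, educ_i$?

Since deviations from the sample mean sum to zero, we can subtract the sample mean of $\beta_{1i}\, educ_i$, denoted $\overline{\beta_1 educ}$, without changing the sum:

$$\frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})\beta_{1i}\, educ_i = \frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})\left(\beta_{1i}\, educ_i - \overline{\beta_1 educ}\right)$$

When taking the probability limit, we know that we can switch averages with expected values:

$$
\begin{aligned}
\text{plim}\, \frac{1}{n}\sum_{i=1}^{n}(educ_i - \overline{educ})\beta_{1i}\, educ_i
&= E\left[(educ_i - E(educ_i))(\beta_{1i}\, educ_i - E(\beta_{1i}\, educ_i))\right] \\
&= Cov(educ_i, \beta_{1i}\, educ_i) \\
&= E[educ_i \cdot \beta_{1i}\, educ_i] - E[educ_i] \cdot E[\beta_{1i}\, educ_i] \\
&= E[\beta_{1i}\, educ_i^2] - E[educ_i] \cdot E[\beta_{1i}\, educ_i] \\
&= E[\beta_{1i}] \cdot E[educ_i^2] - E[educ_i]^2 \cdot E[\beta_{1i}] \quad \text{(independence of } \beta_{1i} \text{ and } educ_i\text{)} \\
&= E[\beta_{1i}] \cdot \left(E[educ_i^2] - E[educ_i]^2\right) \\
&= E[\beta_{1i}] \cdot \sigma^2_{educ}
\end{aligned}
$$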

(c) (2 points) Using the results from the previous steps, do you conclude that $\hat\beta_1$ is a consistent estimate of the mean slope $\beta_1$? Explain.

$$\text{plim}\, \hat\beta_1 = \frac{E[\beta_{1i}] \cdot \sigma^2_{educ}}{\sigma^2_{educ}} + \frac{Cov(educ_i, u_i)}{\sigma^2_{educ}} = E[\beta_{1i}] + 0$$

Therefore, the answer is yes: the estimator $\hat\beta_1$ is a consistent estimator of the mean slope $\beta_1$.

(d) (2 points) Do you think your result would change if you knew that the value of the slope is correlated with the level of education? Provide a brief intuition. No formal proofs are necessary here.

Yes, results would change. If $\beta_{1i}$ and $educ_i$ were not independent, we would not be able to write:

$$Cov(educ_i, \beta_{1i}\, educ_i) = E[\beta_{1i}] \cdot \sigma^2_{educ}$$

and therefore we would have, in general, that $\text{plim}\, \hat\beta_1 \neq E[\beta_{1i}]$.

16. (15 points overall). Suppose that you have a dataset of iid observations $(Y_i^*, X_i)$ where $Y_i^*$ is the dependent variable measured with error. In particular, you know that the error has additive form, and that it is uncorrelated with the error in the true regression, which is linear in $X_i$ (without an intercept, to keep things simpler). $X_i$ is measured without error. So, while the true model is the following

$$Y_i = \beta X_i + u_i \qquad (1)$$

$$E(u_i \mid X_i) = 0$$

you cannot estimate it, since you only observe $Y_i^* = Y_i + \varepsilon_i$, where $E(\varepsilon_i) = 0$ and $cov(\varepsilon_i, X_i) = 0$. Since you observe $Y_i^*$ and not $Y_i$, your estimator for the slope will be

$$\hat\beta = \frac{\sum_{i=1}^{n} X_i Y_i^*}{\sum_{i=1}^{n} X_i^2}$$

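(a) (2 points) Prove that $E(X_i u_i) = 0$.

Solution: $E(X_i u_i) = E(X_i E(u_i \mid X_i)) = 0$ by the L.I.E.

(b) (2 points) Prove that $E(X_i \varepsilon_i) = 0$.

Solution: $cov(\varepsilon_i, X_i) = 0 = E(X_i \varepsilon_i) - E(X_i)E(\varepsilon_i) = E(X_i \varepsilon_i)$, where the last equality uses $E(\varepsilon_i) = 0$. Therefore, $E(X_i \varepsilon_i) = 0$.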


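(c) (2 points) Prove that

$$\hat\beta = \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2}$$

Solution:

$$
\begin{aligned}
\hat\beta &= \frac{\sum X_i(\beta X_i + u_i + \varepsilon_i)}{\sum X_i^2} \\
&= \beta\frac{\sum X_i^2}{\sum X_i^2} + \frac{\sum X_i u_i}{\sum X_i^2} + \frac{\sum X_i \varepsilon_i}{\sum X_i^2} \\
&= \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2}
\end{aligned}
$$

(d) (2 points) Assume that you can use the Law of Large Numbers for all the above averages, so that each average will converge in probability to the corresponding expectation when $n$ grows large. What does $\hat\beta$ converge in probability to?

Solution: We know that $\hat\beta \xrightarrow{p} \beta$ because, by the L.L.N., $\frac{1}{n}\sum X_i^2 \xrightarrow{p} E\left[X_i^2\right]$, $\frac{1}{n}\sum X_i u_i \xrightarrow{p} 0$ and $\frac{1}{n}\sum X_i \varepsilon_i \xrightarrow{p} 0$.

(e) (2 points) Is $\hat\beta$ a consistent estimator for $\beta$? How does this result compare with the case where you have measurement error in the regressor?

Solution: As shown in the last question, $\hat\beta$ is a consistent estimator of $\beta$. That is, as $n$ becomes large enough, the estimator approaches its true value, even with the measurement error. This is in contrast with the case where the measurement error is in the regressor, in which the estimator is inconsistent. While measurement error in $X$ induces correlation between the regressor and the error, this is not the case when $Y$ is measured with error and the error has the structure defined in the problem.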

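(f) (2 points) What is the intuition behind this result? (Hint: it might be useful to prove first that the regression we are estimating is not (1) above, but rather $Y_i^* = \beta X_i + v_i$, where $v_i = (u_i + \varepsilon_i)$ is the true error of the regression we are actually estimating.)

Solution:

$$Y_i^* = \beta X_i + v_i, \qquad v_i = (u_i + \varepsilon_i)$$

$$\hat\beta = \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2} = \beta + \frac{\frac{1}{n}\sum X_i v_i}{\frac{1}{n}\sum X_i^2}$$

Since $E(X_i v_i) = E(X_i u_i) + E(X_i \varepsilon_i) = 0$ by parts (a) and (b), the regressor is uncorrelated with the composite error $v_i$. The measurement error simply becomes part of the regression error, so the condition OLS needs for consistency still holds.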

17. You want to estimate the parameters of a production function, and you have a sample of $n$ factories. Let $Y_i$ be output for the $i$th factory, let $L_i$ be labor, and let $K_i$ be capital. You know that

$$Y_i = \alpha L_i^{\beta} K_i^{\gamma} + u_i \qquad (1)$$

and you also know that $E[u_i \mid L_i, K_i] = 0$.

(a) (3 points) Prove that $E[Y_i \mid L_i, K_i] = \alpha L_i^{\beta} K_i^{\gamma}$.

Solution:

$$
\begin{aligned}
E[Y_i \mid L_i, K_i] &= E[\alpha L_i^{\beta} K_i^{\gamma} \mid L_i, K_i] + E[u_i \mid L_i, K_i] \\
&= E[\alpha L_i^{\beta} K_i^{\gamma} \mid L_i, K_i] + 0 \\
&= \alpha L_i^{\beta} K_i^{\gamma}
\end{aligned}
$$

(b) (4 points) Suppose that you have estimated the model in (1), and you found $\hat\alpha = \hat\beta = 1$ and $\hat\gamma = 0.5$. What is the predicted change in output associated with an increase in capital from 10 to 11, if labor is equal to 10?

Solution:

$$\Delta \hat Y_i = 1 \cdot 10 \cdot 11^{0.5} - 1 \cdot 10 \cdot 10^{0.5} \approx 1.54$$

(c) (5 points) Can you estimate the parameters $\alpha$, $\beta$, and $\gamma$ in model (1) using OLS? If your answer is yes, explain why. If your answer is no, explain why, and propose an alternative estimator that you may use.

Solution: No, the model is nonlinear in the parameters and cannot be estimated with OLS. It can be estimated using NLLS, since $E(u_i \mid L_i, K_i) = 0$:

$$\min_{\alpha, \beta, \gamma} \sum_{i=1}^{n} \left(Y_i - \alpha L_i^{\beta} K_i^{\gamma}\right)^2$$

Take the first order conditions (FOC) and solve numerically for the parameters $\alpha$, $\beta$, $\gamma$.
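The arithmetic in part (b) and the objective in part (c) can be written out in a few lines. The sketch below is only illustrative: the function names `predicted_output` and `nlls_objective` are hypothetical, and the objective is shown without an optimizer, since the choice of numerical solver is left open in the solution.

```python
def predicted_output(labor, capital, alpha=1.0, beta=1.0, gamma=0.5):
    """Predicted output alpha * L^beta * K^gamma at the estimates from part (b)."""
    return alpha * labor ** beta * capital ** gamma


# Part (b): change in predicted output when capital goes from 10 to 11 with labor = 10.
delta = predicted_output(10, 11) - predicted_output(10, 10)
print(round(delta, 2))  # 1.54


def nlls_objective(params, data):
    """Sum of squared residuals that NLLS minimizes over (alpha, beta, gamma).

    data is a list of (Y, L, K) tuples.
    """
    alpha, beta, gamma = params
    return sum(
        (y - predicted_output(l, k, alpha, beta, gamma)) ** 2 for y, l, k in data
    )
```

A numerical minimizer would search over `params` to make `nlls_objective` as small as possible; at the true parameters and with no error term the objective is exactly zero.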

18. You want to study the expected effect of years of schooling (denoted by $S$) on income (denoted by $Y$), but income is measured with error. You have $n$ i.i.d. observations $(Y_i^*, S_i)$, where $Y_i^*$ is observed income, measured with error. You also know that the relation between observed income $Y_i^*$ and true income $Y_i$ is described by the following relation

$$Y_i^* = Y_i + \varepsilon_i$$

where $\varepsilon_i$ is an i.i.d. zero-mean reporting error. The relation between true income and schooling is described by the following equation:

$$Y_i = \beta_0 + \beta_1 S_i + u_i$$

$$E[u_i \mid S_i] = 0 \Rightarrow Cov(S_i, u_i) = 0$$

However, you also know that reporting errors are not the same for individuals with very different schooling, so that the error is correlated with the level of schooling. That is,

$$Cov(S_i, \varepsilon_i) = \sigma^2_{S\varepsilon} \neq 0$$

You want to estimate $\beta_1$ using OLS, but since you do not observe true income $Y_i$, your OLS estimator $\tilde\beta_1$ will be

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n}\left(S_i - \bar S\right)Y_i^*}{\sum_{i=1}^{n}\left(S_i - \bar S\right)^2}$$


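(a) (4 points) Show that the OLS estimator for the slope can be rewritten as

$$\tilde\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)u_i}{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)^2} + \frac{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)\varepsilon_i}{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)^2} \qquad (1)$$

Answer:

$$
\begin{aligned}
\tilde\beta_1 &= \frac{\sum (S_i - \bar S)(Y_i + \varepsilon_i)}{\sum (S_i - \bar S)^2} \\
&= \frac{\sum (S_i - \bar S)(\beta_0 + \beta_1 S_i + u_i + \varepsilon_i)}{\sum (S_i - \bar S)^2} \\
&= \frac{\beta_1 \sum (S_i - \bar S)S_i + \sum (S_i - \bar S)u_i + \sum (S_i - \bar S)\varepsilon_i}{\sum (S_i - \bar S)^2} \\
&= \frac{\beta_1 \sum (S_i - \bar S)^2 + \sum (S_i - \bar S)u_i + \sum (S_i - \bar S)\varepsilon_i}{\sum (S_i - \bar S)^2} \\
&= \beta_1 + \frac{\frac{1}{n}\sum (S_i - \bar S)u_i + \frac{1}{n}\sum (S_i - \bar S)\varepsilon_i}{\frac{1}{n}\sum (S_i - \bar S)^2}.
\end{aligned}
$$

The $\beta_0$ term drops out because $\sum (S_i - \bar S) = 0$, and $\sum (S_i - \bar S)S_i = \sum (S_i - \bar S)^2$ for the same reason.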

(b) (2 points) Now assume that all necessary "regularity conditions" hold, so that each sample mean in equation (1) converges in probability to its corresponding expectation. So, for example,

$$\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)^2 \xrightarrow{p} E\left[(S_i - E[S_i])^2\right] = \sigma^2_S$$

What does $\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)u_i$ converge in probability to?

Answer:

$$\frac{1}{n}\sum (S_i - \bar S)u_i \xrightarrow{p} Cov[S, u] = 0;$$

the last equality is by assumption of the model.

(c) (2 points) What does $\frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar S\right)\varepsilon_i$ converge in probability to?

Answer:

$$\frac{1}{n}\sum (S_i - \bar S)\varepsilon_i \xrightarrow{p} Cov[S, \varepsilon] \neq 0;$$

where, again, the fact that the covariance is not zero is an assumption of the model.

(d) (3 points) Using the results in the previous parts, what does $\tilde\beta_1$ converge in probability to?

Answer:

$$\tilde\beta_1 \xrightarrow{p} \beta_1 + \frac{Cov[S, \varepsilon]}{Var[S]},$$

where the last term is not zero.

(e) (4 points) Is $\tilde\beta_1$ a consistent estimator for the true slope $\beta_1$? Is your conclusion the same as the one you would get if the measurement error in the dependent variable were uncorrelated with the regressors? Why?

Answer: As can be clearly seen in part (d), the OLS estimator is inconsistent. The reason is the correlation of the regressor with the composite error term, which includes the measurement error. The conclusion is not the same as in the uncorrelated case: if $Cov[S, \varepsilon] = 0$, the extra term in part (d) vanishes and OLS remains consistent.
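The probability limit in part (d) can be illustrated with simulated data. Everything numeric in the sketch below is an assumption made for illustration and does not come from the problem: schooling is uniform on $[8, 20]$, and the reporting error is built as $\varepsilon_i = \delta (S_i - 14) + \text{noise}$ with a hypothetical $\delta = 0.3$, so that $E[\varepsilon_i] = 0$ and $Cov[S, \varepsilon]/Var[S] = \delta$. The OLS slope should then settle near $\beta_1 + \delta$.

```python
import random


def ols_slope(xs, ys):
    """OLS slope from a regression of ys on xs with an intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((x - xbar) * y for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den


def simulate(beta0=1.0, beta1=0.5, delta=0.3, n=200_000, seed=2):
    """Regress mismeasured income on schooling when the error depends on schooling."""
    rng = random.Random(seed)
    s = [rng.uniform(8.0, 20.0) for _ in range(n)]  # E[S] = 14
    y_star = []
    for si in s:
        u = rng.gauss(0.0, 1.0)                          # regression error
        eps = delta * (si - 14.0) + rng.gauss(0.0, 1.0)  # reporting error, mean zero
        y_star.append(beta0 + beta1 * si + u + eps)
    return ols_slope(s, y_star)


slope = simulate()
print("estimated slope:", round(slope, 3))  # close to beta1 + delta = 0.8, not 0.5
```

Setting `delta=0.0` in the same sketch removes the correlation between the reporting error and schooling, and the estimate moves back toward the true slope.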

19. (10 points) Consider the multiple regression model with three regressors, where OLS Assumptions 1-3 are satisfied:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$

You would like to test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$.


(a) (3 points) Let $\hat\beta_1$ and $\hat\beta_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$. Find $Var(\hat\beta_1 - 3\hat\beta_2)$ in terms of the variances of $\hat\beta_1$ and $\hat\beta_2$ and the covariance between them. What is the standard error of $\hat\beta_1 - 3\hat\beta_2$?

Solution: From formula 2.31 in Stock and Watson, we know that

$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\, Cov(X, Y)$$

Applied here with $a = 1$ and $b = -3$, $Var(\hat\beta_1 - 3\hat\beta_2) = Var(\hat\beta_1) + 9 Var(\hat\beta_2) - 6 Cov(\hat\beta_1, \hat\beta_2)$. The standard error is simply

$$SE(\hat\beta_1 - 3\hat\beta_2) = \sqrt{Var(\hat\beta_1) + 9 Var(\hat\beta_2) - 6 Cov(\hat\beta_1, \hat\beta_2)}$$

with the variances and the covariance replaced by their estimates.

(b) (2 points) What is the t-statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$? (Just write the formula, since you are not given any values to plug into it.)

Solution:

$$t = \frac{\hat\beta_1 - 3\hat\beta_2 - 1}{SE(\hat\beta_1 - 3\hat\beta_2)}$$

(c) (5 points) Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat\theta_1 = \hat\beta_1 - 3\hat\beta_2$. Write a regression equation involving $\beta_0$, $\beta_2$, $\beta_3$, and $\theta_1$ that allows you to directly obtain $\hat\theta_1$ and its standard error, and describe how you would use it to test the null hypothesis in part (b).

Solution: Because $\theta_1 = \beta_1 - 3\beta_2$, we can write $\beta_1 = \theta_1 + 3\beta_2$. Plugging this into the population model gives

$$
\begin{aligned}
y &= \beta_0 + (\theta_1 + 3\beta_2)x_1 + \beta_2 x_2 + \beta_3 x_3 + u \\
&= \beta_0 + \theta_1 x_1 + \beta_2(3x_1 + x_2) + \beta_3 x_3 + u
\end{aligned}
$$

This last equation is what we would estimate by regressing $y$ on $x_1$, $(3x_1 + x_2)$, and $x_3$. The coefficient and standard error on $x_1$ are what we want. Specifically, we can test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$ with the following t-statistic:

$$t = \frac{\hat\theta_1 - 1}{SE(\hat\theta_1)}$$
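The formulas in parts (a) and (b) translate directly into a small helper. The sketch below is a minimal illustration: the function name `t_stat_linear_combo` is hypothetical, and the numbers passed to it are made up purely to exercise the formula, since the problem provides no estimates.

```python
import math


def t_stat_linear_combo(b1, b2, var_b1, var_b2, cov_b12, null_value=1.0):
    """t-statistic and standard error for H0: beta1 - 3 * beta2 = null_value."""
    # Var(b1 - 3*b2) = Var(b1) + 9*Var(b2) - 6*Cov(b1, b2)
    variance = var_b1 + 9.0 * var_b2 - 6.0 * cov_b12
    se = math.sqrt(variance)
    t = (b1 - 3.0 * b2 - null_value) / se
    return t, se


# Hypothetical inputs, chosen only for illustration.
t, se = t_stat_linear_combo(b1=2.5, b2=0.4, var_b1=0.04, var_b2=0.01, cov_b12=0.005)
print("SE of b1 - 3*b2:", round(se, 4))
print("t-statistic:", round(t, 4))
```

Running the reparametrized regression from part (c) should return the same standard error as this formula, which is why the two approaches lead to the same test.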


3 Empirical (or Empirical + Theory) Univariate

1. (25 points overall) You have obtained a sample of 1744 individuals and are interested in the relationship between weekly earnings and age. We assume that the usual OLS assumptions hold. Linearity then implies that

$$earnings_i = \beta_0 + \beta_1 age_i + u_i$$

The OLS regression, using heteroskedasticity-robust standard errors, yielded the following result (standard errors in parentheses):

$$\widehat{earnings} = \underset{(20.24)}{239.16} + \underset{(0.57)}{5.20} \times Age \qquad (4)$$

$$R^2 = 0.05$$

where Earnings and Age are measured in dollars and years respectively.

(a) (5 points) Is the relationship between Age and Earnings statistically significant at the 1% significance level?

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

$$t = \frac{5.20 - 0}{0.57} = 9.123 > 2.58$$

Therefore, reject the null hypothesis.

(b) (5 points) Suppose that you want to test the null hypothesis that becoming one year older increases your expected weekly earnings by 4 dollars, versus a two-sided alternative. State clearly $H_0$ and $H_1$, and compute the p-value. Would you reject the null at the 5% significance level? And at the 1% significance level?

$$H_0: \beta_1 = 4 \qquad H_1: \beta_1 \neq 4$$

$$t = \frac{5.20 - 4}{0.57} = 2.105$$

$$\text{p-value} = 2\Phi(-|t|) = 2 \times 0.0179 = 0.0358$$

Therefore, reject the null hypothesis at 5%, but fail to reject at 1%.

(c) (5 points) Compute a 95% confidence interval for the effect on expected earnings of increasing a worker's age from 35 to 40 years. Do you think the expected change looks important, in economic terms? Solution:

$$5 \times \left(\hat\beta_1 \pm 1.96 \cdot SE(\hat\beta_1)\right) = 5 \times (5.20 \pm 1.96 \cdot 0.57) = (26 \pm 5.586) = (20.414,\ 31.586)$$

(d) (5 points) Give an economic interpretation of the estimates in regression (4) above: why should age matter in the determination of earnings? Do you think that growing old, by itself, causes an increase in your expected earnings? Solution: Age is a proxy for experience. Aging by itself does not increase income.

(e) (5 points) Do you think that assuming that the relationship between age and earnings is linear was a good idea? Discuss briefly. Solution: No. The slope probably begins to decrease and turns negative as a person approaches retirement.
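The calculations in parts (a) to (c) can be reproduced with the standard library. In the sketch below, the helper `normal_cdf` is an illustrative implementation of the standard normal CDF based on `math.erfc`, and the variable names are hypothetical. The p-value computed this way is about 0.035, slightly below the table-based 0.0358 in part (b), because the table value corresponds to $t$ rounded to 2.10.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


beta1_hat, se = 5.20, 0.57

# Part (a): test of beta1 = 0.
t_zero = (beta1_hat - 0.0) / se

# Part (b): test of beta1 = 4 with a two-sided p-value.
t_four = (beta1_hat - 4.0) / se
p_four = 2.0 * normal_cdf(-abs(t_four))

# Part (c): 95% confidence interval for the effect of a 5-year increase in age.
half_width = 5 * 1.96 * se
ci = (5 * beta1_hat - half_width, 5 * beta1_hat + half_width)

print(round(t_zero, 3), round(t_four, 3), round(p_four, 4))
print(tuple(round(b, 3) for b in ci))
```

The p-value falls between 0.01 and 0.05, which matches the conclusion of rejecting at the 5% level but not at the 1% level.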


2. (7 points) You are interested in comparing the effect of increasing total monthly expenditure per person (call it $MEP$) on the fraction of the budget spent on food (the "food budget share", call it $FBS$) between two Indian States, Andhra Pradesh (AP) and Uttar Pradesh (UP). You decide to draw two independent samples, one from AP and one from UP, and to run two separate regressions using OLS, using heteroskedasticity-robust standard errors. Assume that all the necessary OLS assumptions hold, so that the usual asymptotic results are correct. Your OLS regression for AP turns out to be

$$\widehat{FBS}_{AP} = \underset{(0.024)}{1.587} - \underset{(0.0038)}{0.153} \times MEP_{AP}$$

while your estimated OLS regression for UP is

$$\widehat{FBS}_{UP} = \underset{(0.018)}{1.218} - \underset{(0.0029)}{0.0990} \times MEP_{UP}$$

Using these estimates, test the hypothesis that the effect on $FBS$ of a unit increase in $MEP$ is the same across the two different Indian states, using a 1% significance level. Solution: Since the two samples are independent, the two slope estimators are independent, so the variance of their difference is the sum of their variances and we can use a t-test on the two slope parameters we are interested in. Define $\delta = \beta_{AP} - \beta_{UP}$, the difference between the slopes on $MEP$ in the two states.

$$H_0: \delta = 0 \qquad H_1: \delta \neq 0$$

Now, run the usual t-test.

$$t = \frac{\hat\delta - 0}{SE(\hat\delta)} = \frac{-0.153 + 0.0990}{\sqrt{0.0038^2 + 0.0029^2}} = \frac{-0.054}{0.0048} = -11.3$$

Since $|t| = 11.3 > 2.58$, reject at the 1% level the null hypothesis that the effect of a unit increase in $MEP$ is equal in the two states.
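The test statistic for the difference between two independent slope estimates can be computed as follows. This is a small sketch with a hypothetical helper name, `diff_t_stat`; it relies on the independence of the two samples, which is what allows the two variances to be added.

```python
import math


def diff_t_stat(b_a, se_a, b_b, se_b):
    """t-statistic for H0: slope_a - slope_b = 0 with independent estimates."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (b_a - b_b) / se_diff, se_diff


t, se_diff = diff_t_stat(-0.153, 0.0038, -0.0990, 0.0029)
print("SE of the difference:", round(se_diff, 4))  # 0.0048
print("t-statistic:", round(t, 1))                 # -11.3
```

The sign of the statistic only reflects which state is subtracted from which; the two-sided test compares its absolute value with 2.58.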

3. (24 points) You have collected data on births from a random sample of women in the United States. Two variables of interest are the dependent variable, infant birth weight in ounces ($bwght$), and an explanatory variable ($cigdum$), which is a dummy (0,1) variable equal to 1 if the mother smoked during pregnancy and equal to 0 if she did not. The following simple regression was estimated using data on $n = 1388$ births:

$$bwght = \beta_0 + \beta_1 cigdum + u$$

(a) (3 points) What is the interpretation of $\beta_0$ and $\beta_1$ in this example? What do the parameter estimates imply about the relationship between birth weight and smoking? (You do not have to discuss statistical significance.) Answer: $\beta_0$ is the population mean birth weight for babies with mothers who did not smoke during pregnancy. Similarly, $\beta_0 + \beta_1$ is the population mean birth weight for babies with mothers who did smoke during pregnancy (you did not need to note this part on the exam). $\beta_1$ is then the difference between the two population means: the mean birth weight for babies with mothers who smoked during pregnancy minus the mean birth weight for babies with mothers who did not. The negative coefficient on CIGDUM implies that expected birth weights are lower among mothers who smoked during pregnancy.

(b) (4 points) Formulate and conduct a test at the 5% level of the null hypothesis that birth weights are not affected by cigarette smoking (use a two-sided alternative hypothesis). What do you conclude? Do you think a two-sided alternative hypothesis is sensible in this context? Answer: The hypothesis test should be set up as follows: $H_0: \beta_1 = 0$, $H_A: \beta_1 \neq 0$. The t-stat, which is also reported in the Eviews output, is $t = \frac{-8.91 - 0}{1.44} = -6.18$. The corresponding p-value $= 2\Phi(-6.18) \approx 0$. Given that the p-value $< .05$ (our significance level), we can reject the null that the mean birth weights are the same. Note: You could have also used a confidence interval or acceptance region to answer this. A two-sided alternative seems appropriate here since we don't necessarily have a reason to believe that babies born to mothers who smoked will have a lower birth weight than babies born to mothers who did not smoke. They could also be heavier.

(c) (4 points) How much of the variation in birth weight is explained by whether or not a mother smokes during pregnancy? Does this mean that cigarette smoking does not have a significant impact on birth weight? Explain. Answer: According to the $R^2$ reported in the regression output, about 2.5% of the variation in birth weight is explained by whether or not a mother smokes during pregnancy. This $R^2$ is clearly quite small, suggesting that there are undoubtedly many other factors that impact birth weight (such as mother's nutrition, race, and the length of pregnancy). However, the low value of the $R^2$ does not mean that smoking during pregnancy has an insignificant impact on birth weight. Answering that question requires a test of the statistical significance of $\beta_1$, as was conducted in part (b) (where we found that the impact is significant).

(d) (4 points) How many of the women in the sample smoked while pregnant? Justify your results. Answer: We can see in the regression output that "Mean dependent variable" $= \overline{BWGHT} = 118.6996$. Since the OLS formula for $\hat\beta_0$ tells us that $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X$, we can solve for

$$\overline{CIGDUM} = \frac{\overline{BWGHT} - \hat\beta_0}{\hat\beta_1} = \frac{118.6996 - 120.0612}{-8.914998} = .153$$

The number of smokers is then $\overline{CIGDUM} \cdot n = .153 \cdot 1388 = 212$.

(e) (3 points) Suppose that instead of using the dummy variable $cigdum$ as the explanatory variable, you use the average number of cigarettes the mother smoked per day during pregnancy ($cigs$).

$$bwght = \beta_0 + \beta_1 cigs + u$$

Why might you prefer this regression to the one reported above? Answer: Employing a continuous measure of cigarette consumption instead of a binary one allows us to quantify the impact per cigarette on birth weight. In particular, we can now calculate the predicted effect on birth weight of smoking only one cigarette per day during pregnancy (or the effect of smoking 20).


(f) (3 points) Here are the results of the regression described in part (e).

What is the predicted birth weight when $cigs = 0$? How about when $cigs = 20$? Answer: We know that the prediction equation for the regression specified here is given by $\widehat{BWGHT} = \hat\beta_0 + \hat\beta_1 CIGS$. At $CIGS = 0$: $\widehat{BWGHT} = 119.77 - .514 \cdot 0 = 119.77$. So the predicted birth weight when $CIGS = 0$ is 119.8 ounces. At $CIGS = 20$: $\widehat{BWGHT} = 119.77 - .514 \cdot 20 = 109.49$. So the predicted birth weight when $CIGS = 20$ is 109.5 ounces.

(g) (3 points) Does this regression necessarily capture a causal relationship between the child's birth weight and the mother's smoking habits? Explain. Answer: Not necessarily. If there are other unobservable factors such as parents' income or nutritional habits that impact birth weight, and if cigarette smoking is related to those factors, then we may only be uncovering a spurious correlation between smoking and birth weight. If the assumptions of OLS hold for certain, then we are on somewhat safer ground (of course, there is no way to know for certain that they do). Causation in econometrics is a subtle issue that we will return to later in the course.
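The back-of-the-envelope steps in parts (d) and (f) are easy to check. The sketch below uses only the numbers quoted in the answers; the variable names and the helper `predict_bwght` are illustrative choices.

```python
n = 1388
mean_bwght = 118.6996            # "Mean dependent variable" quoted in part (d)
b0_dummy, b1_dummy = 120.0612, -8.914998

# Part (d): the fitted line passes through the sample means, so
# mean(bwght) = b0 + b1 * mean(cigdum), and mean(cigdum) is the share of smokers.
share_smokers = (mean_bwght - b0_dummy) / b1_dummy
n_smokers = round(share_smokers * n)
print(round(share_smokers, 3), n_smokers)  # 0.153 212


def predict_bwght(cigs, intercept=119.77, slope=-0.514):
    """Predicted birth weight in ounces from the regression on cigs in part (f)."""
    return intercept + slope * cigs


print(round(predict_bwght(0), 2), round(predict_bwght(20), 2))  # 119.77 109.49
```

Because the regressor in part (d) is a dummy, its sample mean is simply the fraction of mothers who smoked, which is what turns the regression output into a head count.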


4. (22 points) You have collected data on the prices of diamonds at an online retailer (you can assume it's a random sample). In particular, you have the price ($price$) in U.S. dollars and the weight in carats ($carats$) for 380 diamonds which are of similar clarity and cut (2 measures of diamond quality). Here is a scatterplot of the data

(a) (4 points) Would you be comfortable using homoskedasticity-only standard errors to construct a confidence interval for $\beta_1$ in the following regression

$$price = \beta_0 + \beta_1 carats + u$$

Why or why not? Explain. Answer: No. Looking at the scatterplot, there is reason to believe that the dispersion in price increases with carats: there appears to be more volatility in price when carats is larger. This suggests that the errors are likely to be heteroskedastic. Given this evidence, it would not be wise to assume homoskedasticity since, if the errors are heteroskedastic, the homoskedasticity-only standard errors are incorrect and could well lead to incorrect conclusions.

(b) (3 points) The following simple regression was estimated

$$price = \beta_0 + \beta_1 carats + u$$

What is the interpretation of $\beta_0$ here? Does the parameter estimate make sense to you? Explain. Answer: $\beta_0$ is the predicted value of the price of a diamond when $carats = 0$ (i.e. the value of the population regression line when $x = 0$). Intuitively, we expect that this should be equal to 0, since a diamond with no weight (i.e. no diamond) should not have any value. The negative parameter estimate is clearly strange, given the interpretation of $\beta_0$. However, a look at the scatter plot reveals that we do not in fact observe diamonds with $carats = 0$ (we can't really) or even any diamonds with weights close to 0. Since we are then extrapolating (i.e. predicting out of sample), we should not try to interpret this coefficient estimate.

(c) (3 points) Using the results reported above, construct a 95% confidence interval for the predicted effect on price of a .2 carat increase in weight. Answer: The general format for a confidence interval for a change in $X$ of $\Delta X$ is $\hat\beta_1 \cdot \Delta X \pm z_{\alpha/2} \cdot SE(\hat\beta_1) \cdot \Delta X$. Here we have $\hat\beta_1 \cdot \Delta X \pm 1.96 \cdot SE(\hat\beta_1) \cdot \Delta X = 5573.34 \cdot .2 \pm 1.96 \cdot 187.14 \cdot .2 = 1114.67 \pm 73.36 = (1041.31,\ 1188.03)$.

(d) (4 points) Calculate the p-value for the null hypothesis that $\beta_1 = 5200$ against a two-sided alternative. Can you reject the null when $\alpha = .05$? Answer: The t-stat, which is not reported in the Eviews output, is $t = \frac{5573.35 - 5200}{187.14} = 1.995$. The corresponding p-value $= 2\Phi(-1.995) = .046$. Since the p-value $= .046 < \alpha = .05$, we can reject the null in this case.

(e) (4 points) What is the interpretation of $R^2$ in this example? Why do you think it is so different from what we found in the birth weight regressions in the previous question? Answer: The $R^2$ is the percentage of the total variation in price explained by the regression. In this case it is about 65%, which is quite high. This should not be surprising, given the high degree of correlation apparent in the scatter plot. Also, intuitively, it seems likely that weight would be a prominent factor in explaining the price of diamonds, especially given that these are diamonds of similar cut and clarity, as noted in the set-up of the problem (so I have effectively held at least two measures of quality constant, something I was not able to do in the smoking example). In other words, there are arguably fewer outside factors impacting the dependent variable (price) in this case than in the smoking example.

(f) (4 points) Suppose you also have data on the color of the stone (note that people strongly prefer diamonds with as little color as possible). Let the variable ($color$) be a dummy variable equal to 1 if the diamond is near colorless and equal to 0 if it is faintly yellow (none of the diamonds in this dataset are truly colorless). The following simple regression was estimated

$$price = \beta_0 + \beta_1 color + u$$

Is there a significant price premium for near colorless stones? (To answer this question, set up a hypothesis test and calculate the p-value for the null hypothesis against a one-sided alternative, using a 1% significance level.) Is a one-sided alternative sensible in this context?

Answer: The hypothesis test should be set up as follows: $H_0: \beta_1 = 0$, $H_A: \beta_1 > 0$. The t-stat, which is also reported in the Eviews output, is $t = \frac{649.95 - 0}{100.08} = 6.49$. The corresponding p-value $= \Phi(-6.49) \approx 0$. Given that the p-value $< .01$ (our significance level), we can reject the null that there is no price premium. Here, a one-sided test does seem appropriate, since the problem states that people have strong preferences for diamonds with less color.

5. You are interested in studying the factors that influence a person's decision of whether to go to college. Therefore, you have collected data from 3796 high-school graduates, 6 years after they graduated from high school. You can assume you have an iid sample. In particular, you observe their total years of education ($yrsed$), which ranges from 12 to 18, and whether or not at least one of their parents graduated from college.

(a) (4 points) Out of the 3796 people in your dataset, 954 have at least one parent who graduated from college. The average years of education ($\overline{yrsed}_c$) for this group is 14.8 years with a sample standard deviation ($s_c$) of 1.74. The remaining 2842 people in the sample have parents who did not graduate from college. In this group, the average years of education ($\overline{yrsed}_{nc}$) is 13.5 years with a sample standard deviation ($s_{nc}$) of 1.72. Using this information, construct 95% confidence intervals for the population means of years of education for each group. Solution: For the people with at least one parent who went to college, a confidence interval for the population mean $\mu_c$ is given by $\overline{yrsed}_c \pm 1.96 \cdot SE(\overline{yrsed}_c)$. We know that $\overline{yrsed}_c = 14.8$ and $SE(\overline{yrsed}_c) = \frac{s_c}{\sqrt{n_c}} = \frac{1.74}{\sqrt{954}} = .056$. Therefore the confidence interval is $\overline{yrsed}_c \pm 1.96 \cdot SE(\overline{yrsed}_c) = 14.8 \pm 1.96 \cdot .056 = (14.69,\ 14.91)$. For the people for which neither parent went to college, a confidence interval for the population mean $\mu_{nc}$ is given by $\overline{yrsed}_{nc} \pm 1.96 \cdot SE(\overline{yrsed}_{nc})$. In this case, $\overline{yrsed}_{nc} = 13.5$ and $SE(\overline{yrsed}_{nc}) = \frac{s_{nc}}{\sqrt{n_{nc}}} = \frac{1.72}{\sqrt{2842}} = .032$. Therefore the confidence interval is $\overline{yrsed}_{nc} \pm 1.96 \cdot SE(\overline{yrsed}_{nc}) = 13.5 \pm 1.96 \cdot .032 = (13.44,\ 13.56)$.

(b) (5 points) Using a 1% significance level and the information in the setup of part (a), formulate and conduct a test of the null hypothesis that there is no difference in the mean of $yrsed$ between the two groups of people. (You may assume that the two population variances are equal.) Solution: The null and alternative hypotheses are written: $H_0: \mu_c - \mu_{nc} = 0$, $H_A: \mu_c - \mu_{nc} \neq 0$. To test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 2.58$) or simply calculate the p-value. For either one, we need to calculate

$$SE(\overline{yrsed}_c - \overline{yrsed}_{nc}) = \sqrt{\frac{s_c^2}{n_c} + \frac{s_{nc}^2}{n_{nc}}} = \sqrt{\frac{(1.74)^2}{954} + \frac{(1.72)^2}{2842}} = .065$$

Using the information in part (a), a 99% CI is simply:

$$\overline{yrsed}_c - \overline{yrsed}_{nc} \pm 2.58 \cdot SE(\overline{yrsed}_c - \overline{yrsed}_{nc}) = 14.8 - 13.5 \pm 2.58 \cdot .065 = (1.13,\ 1.47)$$

Since this CI does not contain zero, we can reject the null hypothesis at the 1% level. To calculate the p-value, we first need the t-stat, which is

$$t = \frac{\overline{yrsed}_c - \overline{yrsed}_{nc} - 0}{SE(\overline{yrsed}_c - \overline{yrsed}_{nc})} = \frac{1.3 - 0}{.065} = 20$$

The p-value is then p-value $= 2\Phi(-20) \approx 0$. This small p-value leads us to reject the null at the 1% level, just as we concluded with the confidence interval.

(c) (3 points) Now, again using a 1% level of significance, repeat the exercise in part (b) using a one-sided test (where the alternative is that the mean of $yrsed$ is greater for people for whom a parent graduated from college). What do you conclude now? Solution: The null and alternative hypotheses should now be written as

$$H_0: \mu_c - \mu_{nc} = 0; \qquad H_A: \mu_c - \mu_{nc} > 0$$

The t-stat is now $t = \frac{\overline{yrsed}_c - \overline{yrsed}_{nc} - 0}{SE(\overline{yrsed}_c - \overline{yrsed}_{nc})} = \frac{1.3 - 0}{.065} = 20$ and the p-value is now p-value $= 1 - \Phi(20) \approx 0$. So, we definitely reject the null hypothesis at the 1% level.


(d) (3 points) Now, let's analyze the data using a univariate regression. Using the same dataset, you construct a dummy variable (parcol), which is equal to 1 if at least one of the person's parents graduated from college and 0 if not. The following simple regression was estimated using data on all n = 3796 people:

$$yrsed = \beta_0 + \beta_1 \times parcol + u$$

$$\widehat{yrsed} = \underset{(.032)}{13.5} + \underset{(.065)}{1.30} \times parcol, \quad R^2 = .095$$

What is the interpretation of the constant in this regression? Does it have a meaningful interpretation in this model?

Solution: $\beta_0$ is the average years of education for people with no parents who went to college. This does have a meaningful interpretation in this setting: it's simply a group mean. Furthermore, as it should, our estimate, $\hat{\beta}_0$, matches what we were told about the sample estimate of this group mean ($\overline{yrsed}_{nc}$) in the setup of part a).

(e) (3 points) What is the interpretation of the slope in this regression? Using the regression results, what is the predicted mean of yrsed for people with at least one parent who graduated from college?

Solution: $\beta_1$ is the difference in mean years of education between people with no parents who went to college and people with at least one parent who went to college, or $\mu_c - \mu_{nc}$ above. Our estimate, $\hat{\beta}_1$, matches the estimate ($\overline{yrsed}_c - \overline{yrsed}_{nc}$) that we calculated in part b). The predicted mean is $\hat{\beta}_0 + \hat{\beta}_1 = 13.5 + 1.3 = 14.8$, which matches what we were told in the setup of part a).

(f) (4 points) Repeat the hypothesis test in part b) using the regression results. Do your conclusions change? Should they?

Solution: The null and alternative hypotheses are now:

$$H_0: \beta_1 = 0$$

$$H_A: \beta_1 \neq 0$$

As always, to test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 2.58$) or simply calculate the p-value. Using the regression results, the 99% confidence interval is $\hat{\beta}_1 \pm 2.58 \times SE(\hat{\beta}_1) = 1.3 \pm 2.58 \times .065 = (1.13, 1.47)$. To calculate the p-value, we first need the t-stat, which is

$$t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{1.3 - 0}{.065} = 20$$

The p-value is then $2\Phi(-20) \approx 0$. These are the same results we found in part b), which is not surprising since this procedure is equivalent to a difference in means analysis.

(g) (6 points) In addition to the information on parents' college status, you have also collected information on the distance to the nearest college (dist) in tens of miles (dist has a range of 0 to 16). You decide to run another regression, this time using dist as the regressor. Here are the results:

$$\widehat{yrsed} = \underset{(.038)}{13.9} - \underset{(.014)}{.073} \times dist, \quad R^2 = .01$$

What is the interpretation of the constant in this regression? What is the interpretation of the slope? Is the slope statistically significant at the 5% level?

Solution: The intercept here is the expected years of education for a person who lives 0 miles from a college (probably a current student). The slope is the expected change in years of education associated with a 10 mile increase in the distance to the nearest college. To address the issue of significance, we need to test the following null hypothesis:

$$H_0: \beta_1 = 0$$

$$H_A: \beta_1 \neq 0$$

Again, to test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 1.96$) or simply calculate the p-value. The confidence interval is:

$$\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1) = -.073 \pm 1.96 \times .014 = (-.10, -.05)$$

To calculate the p-value, we first need the t-stat, which is

$$t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{-.073 - 0}{.014} = -5.21$$

The p-value is then $2\Phi(-5.21) \approx 0$. Either way, we can reject the null, so dist does have a significant (and negative) impact on yrsed.


(h) (3 points) What is the interpretation of $R^2$ in this regression? Does the low value of $R^2$ imply that the coefficient on dist is not statistically significant at the 1% level?

Solution: The $R^2$ represents the percent of variation in the dependent variable (yrsed) that is explained by the regressor (dist). Here we see that we are explaining only about 1% of the variation in yrsed with dist. While this low $R^2$ does tell us that only a small part of the variation in yrsed is explained by the distance to the nearest college, the question of statistical significance can only be answered with a formal test of significance, i.e. by calculating a p-value or a t-stat.

(i) (3 points) Using the regression results, what is the predicted mean years of education for a person who lives 13 miles from the nearest college? How about a person who lives 100 miles from the nearest college?

Solution: The best predicted value for a person who lives 13 miles away (dist = 1.3) is simply

$$\widehat{yrsed} = 13.9 - .073 \times dist = 13.9 - .073 \times 1.3 = 13.81$$

For a person who lives 100 miles away we have

$$\widehat{yrsed} = 13.9 - .073 \times dist = 13.9 - .073 \times 10 = 13.17$$

(j) (3 points) Construct a 95% confidence interval for the expected decrease in mean years of education associated with moving 20 miles farther away from the nearest college.

Solution: This is asking for a confidence interval for $2\beta_1$, which is constructed as:

$$CI = 2\hat{\beta}_1 \pm 1.96 \times 2 \times SE(\hat{\beta}_1) = 2 \times (-.073) \pm 1.96 \times 2 \times .014 = (-.20, -.09)$$


6. (12 points) The director of marketing for the Durham Bulls baseball team is interested in the number of games that Duke undergraduates attend. Each undergraduate at Duke was classified according to their year in school (freshman, sophomore, junior, or senior) and according to the number of times they attended a Durham Bulls baseball game that year (never, once, or more than once). The proportions of students in the various classifications are given in the following table:

              Never    Once     > 1
Freshmen      0.08     0.10     0.04
Sophomores    0.04     0.10     0.04
Juniors       0.04   (missing)  0.09
Seniors       0.02     0.15     0.10

(a) (2 points) What is the value that belongs in the missing space? Explain.

Solution: The missing value is 0.20, since the probabilities (here proportions) must all sum to 1.

(b) (2 points) If there are 6,200 undergraduates, how many of them are freshmen?

Solution: To answer this, we need to know the proportion (probability) of students in the freshman class. In particular, we need to calculate $P(Year = Freshman)$ using the definition of marginal probability:

$$P(Y = y) = \sum_{i=1}^{n} P(X = x_i, Y = y)$$

We see that $P(Year = Freshman) = .08 + .10 + .04 = .22$. So the number of freshmen is $.22 \times 6200 = 1364$.

(c) (4 points) If a student selected at random from the Duke undergraduate population is a junior, what is the probability that the student has never attended a Bulls game?

Solution: We are interested in finding $P(Games = Never \mid Year = Junior)$. Since

$$P(Games = Never \mid Year = Junior) = \frac{P(Year = Junior, Games = Never)}{P(Year = Junior)}$$

we need to calculate $P(Year = Junior)$ using the definition of marginal probability given in part (b). We see that $P(Year = Junior) = .04 + .20 + .09 = .33$. Therefore, $P(Games = Never \mid Year = Junior) = \frac{.04}{.33} = .12$.

(d) (4 points) If a student selected at random from the undergraduate population has attended one game, what is the probability that the student is a senior?

Solution: Now we are interested in finding $P(Year = Senior \mid Games = One)$. Since

$$P(Year = Senior \mid Games = One) = \frac{P(Games = One, Year = Senior)}{P(Games = One)}$$

we need to calculate $P(Games = One)$, again using the definition of marginal probability. We see that $P(Games = One) = .1 + .1 + .2 + .15 = .55$. Therefore, $P(Year = Senior \mid Games = One) = \frac{.15}{.55} = .27$.
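As a rough check on parts (a) through (d), the joint table can be stored as a nested dictionary and the marginal and conditional probabilities computed directly. The dictionary layout and variable names are assumptions made for this sketch; the numbers come from the table above.

```python
# Joint proportions by year and attendance; None marks the missing cell
joint = {
    "Freshman":  {"Never": 0.08, "Once": 0.10, "More": 0.04},
    "Sophomore": {"Never": 0.04, "Once": 0.10, "More": 0.04},
    "Junior":    {"Never": 0.04, "Once": None, "More": 0.09},
    "Senior":    {"Never": 0.02, "Once": 0.15, "More": 0.10},
}

# (a) All cells must sum to 1, which pins down the missing cell
known = sum(p for row in joint.values() for p in row.values() if p is not None)
joint["Junior"]["Once"] = round(1.0 - known, 2)

# (b) Marginal probability of being a freshman, then a head count
p_freshman = sum(joint["Freshman"].values())
n_freshmen = round(p_freshman * 6200)

# (c) P(Never | Junior) = P(Junior, Never) / P(Junior)
p_junior = sum(joint["Junior"].values())
p_never_given_junior = joint["Junior"]["Never"] / p_junior

# (d) P(Senior | Once) = P(Senior, Once) / P(Once)
p_once = sum(row["Once"] for row in joint.values())
p_senior_given_once = joint["Senior"]["Once"] / p_once

print(joint["Junior"]["Once"], n_freshmen,
      round(p_never_given_junior, 2), round(p_senior_given_once, 2))
```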

7. You have a random sample of 5911 individuals from the US Current Population Survey (CPS) and are interested in the relationship between hourly earnings and education. You estimate the following model:

$$Wage_i = \beta_0 + \beta_1 C_i + \varepsilon_i \qquad (1)$$

where $Wage_i$ is average hourly earnings for the ith individual and $C_i$ is a dummy equal to one if the ith individual graduated from college and zero if not. The results from a univariate regression of Wage on this college dummy variable are as follows (homoskedasticity-only standard errors in parentheses):

$$\widehat{Wage}_i = \underset{(.11)}{11.7} + \underset{(.17)}{5.09} \times C_i, \quad R^2 = .13 \qquad (2)$$

where $\widehat{Wage}_i$ is the OLS predicted ("fitted") value.


(a) (2 points) What is the interpretation of the intercept in this regression?

Solution: The intercept in this regression represents average hourly earnings for an individual who has not graduated from college. There is no problem with interpretation since we are not extrapolating in this case. Here, we find that non-college graduates earn $11.70 per hour on average.

(b) (2 points) Calculate the predicted average hourly earnings for an individual who has graduated from college.

Solution: This is just 11.7 + 5.09 × (1) = 16.79, or $16.79 per hour.

(c) (3 points) Calculate a 90% confidence interval for $\beta_0$:

$$11.7 \pm 1.645\,(.11) = (11.52, 11.88)$$

(d) (5 points) Calculate a 95% confidence interval for the average hourly earnings of a college graduate, given that $\widehat{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -0.01$.

Solution: To answer this question, you have to use the covariance that is provided above. The parameter you are estimating is $\beta_0 + \beta_1$, so that the confidence interval is $\hat{\beta}_0 + \hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_0 + \hat{\beta}_1)$, where

$$SE(\hat{\beta}_0 + \hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_0 + \hat{\beta}_1)} = \sqrt{\widehat{Var}(\hat{\beta}_0) + \widehat{Var}(\hat{\beta}_1) + 2\,\widehat{Cov}(\hat{\beta}_0, \hat{\beta}_1)} = \sqrt{.11^2 + .17^2 + 2(-.01)} = .145$$

so the CI is

$$11.7 + 5.09 \pm 1.96 \times .145 = 16.79 \pm .284 = (16.5, 17.1)$$
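The role of the covariance term in part (d) is easy to see numerically. The following sketch uses only the estimates, standard errors, and covariance quoted above; the variable names are illustrative.

```python
import math

b0, se_b0 = 11.7, 0.11      # intercept and its standard error
b1, se_b1 = 5.09, 0.17      # slope on the college dummy and its standard error
cov_b0_b1 = -0.01           # estimated covariance between the two estimators

# Var(b0 + b1) = Var(b0) + Var(b1) + 2 Cov(b0, b1)
var_sum = se_b0**2 + se_b1**2 + 2.0 * cov_b0_b1
se_sum = math.sqrt(var_sum)

point = b0 + b1
ci = (point - 1.96 * se_sum, point + 1.96 * se_sum)

# For comparison: the (incorrect) standard error that ignores the covariance
se_ignoring_cov = math.sqrt(se_b0**2 + se_b1**2)

print(f"SE = {se_sum:.3f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"SE ignoring covariance = {se_ignoring_cov:.3f}")
```

Because the covariance is negative, ignoring it would overstate the standard error of the sum and produce an interval that is too wide.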

(e) (4 points) Is the difference in mean Wage between college graduates and non-college graduates statistically significant at the 1% level?

Solution: We can test this with the following t-statistic:

$$t = \frac{5.09 - 0}{.17} = 29.94 > 2.58$$

So the answer is yes.

(f) (3 points) Is the difference in mean Wage between college graduates and non-college graduates large, in economic terms?

Solution: It is very large. In fact, according to this regression, college graduates earn about 44% more per hour than non-college graduates (5.09/11.7 ≈ .44). This is a substantial difference, especially once you take into account how many working hours there are in a year!

(g) (2 points) How much of the variation in average hourly earnings is our regression explaining? Is this surprising?

Solution: According to the $R^2$, about 13%. This doesn't seem like a lot, but shouldn't be too surprising since this is cross-sectional data and there are undoubtedly lots of other factors that determine wages, such as ability, race, gender, profession, experience, etc.

(h) (4 points) Consider a two-sided test where the null hypothesis is that $\beta_1 = 5$. Calculate the p-value for this test. Can you reject the null at the 10% level?

Solution: First, let's calculate the t-statistic:

$$\frac{5.09 - 5}{.17} = .53$$

Then the p-value is simply

$$2\Phi(-|.53|) = 2 \times .298 = .596$$

With this large p-value (far above .10) we clearly can't reject the null.

(i) (4 points) The standard errors in the estimated regression (2) were calculated assuming homoskedasticity. Now you re-estimate the model using the same data, but allowing for heteroskedasticity, and your results are as follows:

$$\widehat{Wage}_i = \underset{(.115)}{11.7} + \underset{(.175)}{5.09} \times C_i \qquad (3)$$

Do the results suggest that the standard errors from the first model you have estimated (i.e. (2)) are reliable?

Solution: Since the difference is quite small, both in absolute and in relative terms, heteroskedasticity does not seem to be a problem in this setting.


(j) (4 points) The point estimates for $\beta_0$ and $\beta_1$ in (2) and (3) are exactly the same. Does this make sense? Why or why not?

Solution: Absolutely. The adjustment in the standard errors that allows for heteroskedasticity has nothing whatsoever to do with the point estimates, which remain numerically identical. As an aside: there are other estimators (apart from OLS) that do change the point estimates because they use information about the variance of the error when they calculate the point estimates (these are called GLS, or Generalized Least Squares; WLS is a special case), but this does not happen with OLS, so the answer is yes. The point estimates wouldn't change even if the data were very heteroskedastic.

(k) (4 points) A classmate sees your results and concludes that there is clear evidence that graduating from college increases an individual's expected earnings. Do you agree with this conclusion (based on the results above)? Why, or why not?

Solution: You shouldn't agree with this conclusion. While there is ample reason to believe that there's a strong and positive return to education, this simple comparison is not enough to establish this empirically. As I mentioned in part g (and also discussed in class), there are lots of other things that determine a person's wage (like ability, for example) that may also be correlated with the included regressor. To establish a causal effect, we would need to control for these other factors.

(l) (5 points) Suppose you want to estimate the Wage ratio between college graduates and non-college graduates. Let $\theta$ denote the true value for this parameter. Another classmate suggests using the following estimator:

$$\hat{\theta} = \frac{\hat{\beta}_0}{\hat{\beta}_1}$$

Would you accept your classmate's suggestion? In particular, is $\hat{\theta}$ a consistent estimator for $\theta$? If so, prove that it is. If not, prove that it isn't and propose an alternative but consistent estimator.

Solution: You shouldn't. The ratio between Wage for college and non-college graduates is $\frac{\beta_0 + \beta_1}{\beta_0}$, not $\frac{\beta_0}{\beta_1}$, so that the proposed estimator is definitely not a consistent estimator of $\theta = \frac{\beta_0 + \beta_1}{\beta_0}$. In fact, from OLS Assumption 1, we know that $\hat{\beta}_0 \xrightarrow{p} \beta_0$ and $\hat{\beta}_1 \xrightarrow{p} \beta_1$, so that

$$\hat{\theta} = \frac{\hat{\beta}_0}{\hat{\beta}_1} \xrightarrow{p} \frac{\beta_0}{\beta_1} \neq \theta$$

Instead, a consistent estimator of the ratio we are actually interested in is

$$\hat{\theta}^{*} = \frac{\hat{\beta}_0 + \hat{\beta}_1}{\hat{\beta}_0} \xrightarrow{p} \frac{\beta_0 + \beta_1}{\beta_0} = \theta$$

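(m) (4 points) Suppose that you happen to know that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators for $\beta_0$ and $\beta_1$ respectively. Does this imply that $\frac{\hat{\beta}_0}{\hat{\beta}_1}$ will be an unbiased estimator for $\frac{\beta_0}{\beta_1}$? Why or why not?

Solution: Although expectation is a linear operator, a ratio is NOT a linear function, and thus it is not generally true that

$$E\left(\frac{X}{Y}\right) = \frac{E(X)}{E(Y)}$$

so while it is true that

$$\frac{E(\hat{\beta}_0)}{E(\hat{\beta}_1)} = \frac{\beta_0}{\beta_1}$$

it is not true that

$$E\left(\frac{\hat{\beta}_0}{\hat{\beta}_1}\right) = \frac{\beta_0}{\beta_1}$$

In other words, unbiasedness of the $\hat{\beta}$'s does not imply unbiasedness of the ratio. While it is certainly true that

$$\text{plim}\,\frac{\hat{\beta}_0}{\hat{\beta}_1} = \frac{\text{plim}\,\hat{\beta}_0}{\text{plim}\,\hat{\beta}_1} = \frac{\beta_0}{\beta_1}$$

this has nothing to do with unbiasedness.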
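A tiny numerical example makes the point that the expectation of a ratio is generally not the ratio of expectations. The two-point distribution below is purely hypothetical and chosen only because the arithmetic can be done exactly.

```python
from fractions import Fraction

# Hypothetical toy distribution: X is always 1, and Y is 1 or 2 with equal probability.
# Each tuple is (x, y, probability).
outcomes = [
    (Fraction(1), Fraction(1), Fraction(1, 2)),
    (Fraction(1), Fraction(2), Fraction(1, 2)),
]

e_x = sum(p * x for x, y, p in outcomes)
e_y = sum(p * y for x, y, p in outcomes)

e_ratio = sum(p * x / y for x, y, p in outcomes)   # E(X / Y)
ratio_of_means = e_x / e_y                         # E(X) / E(Y)

print("E(X/Y)    =", e_ratio)
print("E(X)/E(Y) =", ratio_of_means)
```

The two quantities differ even in this simplest of cases, so unbiasedness of the numerator and denominator separately says nothing about the bias of their ratio.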


8. (4 points) Suppose now that you know that the true value of the parameter $\beta_1$ is 5. Does this imply that $\hat{\beta}_1$ (the OLS estimator for the parameter $\beta_1$) is a biased estimator for $\beta_1$? Justify your answer.

Solution: Definitely not. An estimator $\hat{\beta}_1$ is biased for $\beta_1$ if $E(\hat{\beta}_1) \neq \beta_1$, and we already know from OLS Assumption 1 that $E(\hat{\beta}_1) = \beta_1$. We are told that $\beta_1 = 5$, while our regression provides an estimate $\hat{\beta}_1 = 5.09$. Knowing the true value of $\beta_1$ doesn't change the fact that $E(\hat{\beta}_1) = \beta_1$ if OLS Assumption 1 holds. Given that samples vary, it will virtually never be the case that our estimate ($\hat{\beta}_1$) hits the true value of $\beta_1$ exactly. It would make no sense at all to evaluate how "good" an estimator is based on whether we get the right value of the parameter every time.


4 Empirical Multivariate

1. Using a sample of 534 i.i.d. observations from the 1985 CPS (Current Population Survey), you have estimated that the relation between wages $W_i$ and years of schooling $S_i$ is described by the following relation:

$$\widehat{\ln W_i} = \hat{\beta}_0 + \hat{\beta}_1 S_i = \underset{(.10743)}{1.06} + \underset{(.00809)}{0.077}\, S_i$$

where heteroskedasticity-robust standard errors are reported in parentheses, and $\ln W_i$ is the natural logarithm of the hourly wage. You can assume that all the usual OLS assumptions hold.

(a) (4 points) What is the economic interpretation of the estimated slope in the above regression?

Solution: This is a log-linear regression, so one more year of education is expected to increase wages by about 8%.

(b) (3 points) Does the relation between schooling and wage appear important, in economic terms?

Solution: Yes, it does: an 8% increase in wages is associated with only one additional year of schooling. No argument on the statistical significance was required here!

(c) (3 points) What is the expected level of $\ln W_i$ for an individual with 15 years of schooling?

Solution: $\widehat{\ln W_i} = 1.06 + 0.077 \times 15 = 2.215$

(d) (4 points) In the above regression, is the slope significantly different from zero, using a 1% significance level?

Solution: Form hypotheses: $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$. The test statistic is constructed as follows:

$$t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.077}{0.00809} = 9.52 > t_{1\%\ \text{critical value}} = 2.58$$

so we reject the null that the slope coefficient is zero.

(e) (5 points) You want to test the null hypothesis that $\beta_1 = 0.1$, against a two-sided alternative. Calculate the p-value for this test.

Solution: Form the test statistic:

$$t = \frac{\hat{\beta}_1 - 0.1}{SE(\hat{\beta}_1)} = \frac{0.077 - 0.1}{0.00809} = -2.84$$

The p-value can then be computed as follows:

$$\text{p-value} = 2\Phi(-|t|) = 2 \times 0.0023 = 0.0046 \approx 0.5\%$$
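The same calculation can be sketched in code for any hypothesized value of the slope. The function names below are assumptions made for illustration, and the normal approximation is the one used throughout these solutions.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(estimate, std_error, null_value):
    """t-statistic and two-sided p-value for H0: coefficient = null_value."""
    t = (estimate - null_value) / std_error
    return t, 2.0 * normal_cdf(-abs(t))

# Part (e): H0: beta_1 = 0.1, using the robust standard error from the regression
t_e, p_e = two_sided_p_value(0.077, 0.00809, 0.1)

# Part (d): H0: beta_1 = 0
t_d, p_d = two_sided_p_value(0.077, 0.00809, 0.0)

print(f"(e) t = {t_e:.2f}, p-value = {p_e:.4f}")
print(f"(d) t = {t_d:.2f}")
```

Small differences from the hand calculation can arise because the solution rounds the t-statistic to two decimals before looking up the normal table.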

(f) (5 points) Calculate a 95% confidence interval for the predicted effect on $\ln W_i$ of increasing schooling by 3 years.

Solution:

$$CI_{95\%} = 3 \times [\hat{\beta}_1 \pm t_{2.5\%\ \text{crit. value}} \times SE(\hat{\beta}_1)] = 3 \times [0.077 \pm 1.96 \times 0.00809] = [0.183, 0.279]$$

(g) (4 points) Would you expect the point estimates of the coefficients $\beta_0$ and $\beta_1$ to change if you did not assume homoskedasticity? Why, or why not?

Solution: No, OLS point estimates are not affected by assumptions on $Var(u_i \mid X_i)$.

(h) (4 points) You calculate the standard errors assuming homoskedasticity, and the estimated standard errors for $\beta_0$ and $\beta_1$ are respectively 0.05986 and 0.00807. Do you think heteroskedasticity is a serious concern in this regression?

Solution: Probably yes. The SEs look almost identical for the slope, but the SE for the intercept is now about half as large as it was before.

(i) Now you complicate the model, since you want to study whether the relation between wage and schooling changes if the individual is a Trade Union member. You estimate the following regression

$$\widehat{\ln W_i} = \underset{(0.13)}{0.91} + \underset{(0.25)}{0.75}\, U_i + \underset{(0.01)}{0.08}\, S_i - \underset{(0.019)}{0.035}\, U_i S_i$$

where $U_i$ is a dummy variable equal to 1 if the worker is a Trade Union member, and zero otherwise.

(5 points) What is the predicted value of $\ln W_i$ for a unionized individual with 12 years of schooling?

Solution: $\widehat{\ln W_i} = 0.91 + 0.75 + 0.08 \times 12 - 0.035 \times 12 = 2.2$


(j) (5 points) Does the effect of schooling on (logarithm of) wages differ importantly, in economic terms, between unionized and non-unionized individuals?

Solution: Yes, it does. The effect of schooling is measured by the slope. For unionized workers the estimated slope coefficient is 0.045 (0.08 − 0.035), while for non-unionized workers it is equal to 0.08. Thus, the increase in log wages from additional schooling for unionized workers is on average roughly 44% (nearly half!) less than that for non-unionized individuals.

(k) (5 points) Test the null hypothesis that the predicted effect of increasing the level of schooling on $\ln W_i$ is the same for unionized and non-unionized individuals, against a two-sided hypothesis, with a 10% significance level.

Solution: Here we have to test that the slope coefficients are the same for the two categories of workers. The difference in the slopes is captured by the coefficient on the cross-product variable, $U_i S_i$. Thus, our hypotheses are formulated as follows: $H_0$: coef. on $U_i S_i = 0$; $H_1$: coef. on $U_i S_i \neq 0$. The t-statistic is calculated in the usual way:

$$t\text{-stat} = \frac{-0.035}{0.019} = -1.84$$

Since the absolute value of the t-statistic is greater than 1.64 (the critical value for the given level of significance), we reject the null.

2. (25 points overall) Earnings functions attempt to find the determinants of earnings, using both continuous and binary variables. One of the central questions analyzed in this relationship is the returns to education. Your estimated regression looks like the following:

$$\ln(Earn) = \underset{(0.16)}{-0.01} + \underset{(0.012)}{0.101}\, Educ + \underset{(0.006)}{0.033}\, Exper - \underset{(0.0001)}{0.0005}\, Exper^2 + e$$

where Earn is average hourly earnings, Educ is years of education, Exper is years of experience, and e is the (estimated) error.

(a) (4 points) What is the effect of an additional year of experience for a person who has worked for 20 years? What is the effect for a person who has worked for 30 years?

Solution: For a person going from 20 to 21 years of experience,

$$\Delta \ln(Earn) = 0.033 - (0.0005 \times 21^2 - 0.0005 \times 20^2) = 0.0125$$

or a 1.25% increase in earnings. For a person going from 30 to 31 years of experience,

$$\Delta \ln(Earn) = 0.033 - (0.0005 \times 31^2 - 0.0005 \times 30^2) = 0.0025$$

or a 0.25% increase in earnings.
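Because the experience effect depends on the starting level, it can help to wrap the calculation in a small function. This is a minimal sketch using the coefficients from the first regression; the function name is hypothetical.

```python
def log_earn_change(exper_from, b_exper=0.033, b_exper2=-0.0005):
    """Predicted change in ln(Earn) when experience rises by one year from exper_from."""
    exper_to = exper_from + 1
    return b_exper * (exper_to - exper_from) + b_exper2 * (exper_to**2 - exper_from**2)

change_20 = log_earn_change(20)   # going from 20 to 21 years
change_30 = log_earn_change(30)   # going from 30 to 31 years

print(f"20 -> 21 years: {100 * change_20:.2f}% change in earnings (approx.)")
print(f"30 -> 31 years: {100 * change_30:.2f}% change in earnings (approx.)")
```

The intercept and the education term drop out of the difference, which is why only the two experience coefficients are needed.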

You want to find the effect of introducing two variables, gender and marital status. Accordingly you specify a binary variable that takes on the value of one for females and is zero otherwise (Female), and another binary variable that is one if the worker is married but is zero otherwise (Married). Adding these variables to the regressors results in:

$$\ln(Earn) = \underset{(0.16)}{-0.21} + \underset{(0.012)}{0.093}\, Educ + \underset{(0.006)}{0.032}\, Exper - \underset{(0.0001)}{0.0005}\, Exper^2 - \underset{(0.049)}{0.289}\, Female + \underset{(0.056)}{0.062}\, Married + e$$

(b) (4 points) Are the coefficients of the two added binary variables individually statistically significant?


Solution: "Female" is significant at the 1% level, but "Married" is not significant.

$$H_0: \beta = 0$$

$$H_1: \beta \neq 0$$

$$t_{Female} = \frac{-0.289 - 0}{0.049} = -5.9$$

$$t_{Married} = \frac{0.062 - 0}{0.056} = 1.1$$

(c) (4 points) In percentage terms, how much less do females earn per hour, controlling for education and experience? Is the difference economically important?

Solution: Females earn approximately 29% less than their male counterparts. This is very economically significant.

(d) (4 points) In percentage terms, how much more do married people make? Is the difference economically important?

Solution: Married people make 6.2% more than singles. This is also economically significant.

(e) (4 points) In your final specification, you allow for the binary variables to interact. The results are as follows:

$$\ln(Earn) = \underset{(0.16)}{0.14} + \underset{(0.011)}{0.093}\, Educ + \underset{(0.006)}{0.032}\, Exper - \underset{(0.001)}{0.0005}\, Exper^2 - \underset{(0.075)}{0.158}\, Female + \underset{(0.080)}{0.173}\, Married - \underset{(0.097)}{0.218}\, Female \times Married + e$$

In percentage terms, how much less do single females earn per hour, when compared with single males, keeping education and experience constant?

Solution: Single females earn about 15.8% less than single males with comparable education and experience.

(f) (5 points) In percentage terms, how much less do married females earn per hour, when compared with married males, keeping education and experience constant?

Solution: Married women earn $-0.158 - 0.218 = -0.376$, i.e. about 37.6% less, compared to the baseline value of married males.
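The gender gaps in parts (e) and (f) follow mechanically from the dummy and interaction coefficients. The sketch below reproduces them and, as an addition not in the original solution, also shows the exact percentage implied by a log-point difference, $100 \times (e^{b} - 1)$, which differs noticeably from the approximation when the coefficient is large. Function and variable names are illustrative.

```python
import math

b_female = -0.158
b_married = 0.173
b_female_married = -0.218

def log_earn_shift(female, married):
    """Contribution of the two dummies and their interaction to predicted ln(Earn)."""
    return b_female * female + b_married * married + b_female_married * female * married

# Gap between women and men within each marital status, in log points
gap_single = log_earn_shift(1, 0) - log_earn_shift(0, 0)
gap_married = log_earn_shift(1, 1) - log_earn_shift(0, 1)

# Exact percentage differences implied by the log-point gaps
exact_single = math.exp(gap_single) - 1.0
exact_married = math.exp(gap_married) - 1.0

print(f"single:  {gap_single:.3f} log points, exact {100 * exact_single:.1f}%")
print(f"married: {gap_married:.3f} log points, exact {100 * exact_married:.1f}%")
```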

3. (12 points overall) Suppose that, in the population, the relation between the budget share spent on education related goods in family i is accurately described by the following linear model

$$share_i = -0.7 + 0.1 \ln PCE_i + 0.03\, schooling_i + u_i$$

where $share_i$ is the budget share for education related goods (that is, the proportion of the total budget spent on those goods), $PCE_i$ is total expenditure per person, and $schooling_i$ is average years of schooling in the family.

(a) What is the predicted budget share spent on education related goods for a family with total expenditures per person equal to 600, and with average schooling equal to 12?

Solution:

$$-0.7 + 0.1 \times \ln(600) + 0.03 \times 12 = 0.299$$

or about 30% of the predicted budget.


(b) (2 points) Suppose that you estimate the above model omitting the variable $schooling_i$. Describe the condition necessary for the OLS estimator of the effect of $\ln PCE_i$ on $share_i$ to be still unbiased. Do you think these conditions are satisfied here? If they are not, do you think that the coefficient of the included regressor will be biased upwards or downwards?

Solution: It would have to be that $\ln(PCE_i)$ and $schooling_i$ are uncorrelated. This is unlikely to hold true. We would expect total expenditure per head to be positively correlated with schooling, and schooling has a positive coefficient, so if the schooling variable is omitted, we'd expect the coefficient on $\ln(PCE_i)$ to have upward bias.

(c) (2 points) You are still omitting the variable $schooling_i$ from the estimated regression. Suppose that $cov(\ln PCE_i, schooling_i) = 0.7$ and $var(\ln PCE_i) = 0.36$. Compute the value of the asymptotic bias of the OLS estimate for the effect of $\ln PCE_i$ on $share_i$.

Solution: Using the definition of omitted variable bias, we know that:

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\, \frac{\sigma_{\ln(PCE_i),\, schooling_i}}{\sigma^2_{\ln(PCE_i)}}$$

Therefore, with $\beta_2 = 0.03$, the bias is $\beta_2 \times \frac{0.7}{0.36} = 0.0583$.
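The omitted variable bias formula is simple enough to evaluate directly. In this sketch the variable names are illustrative; the inputs are the population coefficient on schooling and the moments given in part (c).

```python
beta_schooling = 0.03        # true coefficient on the omitted variable
cov_lnpce_schooling = 0.7    # cov(ln PCE, schooling)
var_lnpce = 0.36             # var(ln PCE)

# Asymptotic bias of the short-regression slope on ln PCE
bias = beta_schooling * cov_lnpce_schooling / var_lnpce

# Probability limit of the estimated slope when schooling is omitted
plim_slope = 0.1 + bias

print(f"bias = {bias:.4f}, plim of slope = {plim_slope:.4f}")
```

The sign of the bias is the sign of the product of the omitted coefficient and the covariance, which is positive here, consistent with the upward bias argued in part (b).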

4. (15 points) The following data, taken from Forbes Magazine's 1996 survey of CEO (chief executive officer) compensation, contains information on CEO compensation at 770 publicly traded firms. Each firm in the dataset had only one CEO in 1996. For each of the 770 firms in the dataset, we observe:

Salbon - The CEO's salary plus bonus (in 1000s of dollars)
Logsalbon - The natural log of the CEO's salary plus bonus (in 1000s of dollars)
Logsales - The natural log of the firm's sales in 1996 (in millions of dollars)
Fiveret - The firm's five year average total return (in percentage)
Age - The CEO's age in 1996 (in years)
Grad - A dummy variable equal to 1 if the CEO attended a post-graduate program (e.g. MBA), 0 if not
Computer - A dummy variable equal to 1 if the firm is in the computer industry, 0 if not
Financial - A dummy variable equal to 1 if the firm is in the financial industry, 0 if not

Here are the results of a regression of Logsalbon on the covariates:

$$LogSalbon = \underset{(.25)}{3.65} + \underset{(.019)}{.31}\, LogSales + \underset{(.0012)}{.0016}\, Fiveret + \underset{(.003)}{.014}\, Age - \underset{(.039)}{.037}\, Grad - \underset{(.064)}{.0037}\, Computer + \underset{(.049)}{.156}\, Financial$$

$$R^2 = .32$$

Here are some tests of a few joint hypotheses:

$H_0: \beta_2 = \beta_4 = 0$; F-statistic = 1.423
$H_0: \beta_2 = \beta_5 = 0$; F-statistic = 1.026
$H_0: \beta_4 = \beta_5 = 0$; F-statistic = 0.473
$H_0: \beta_2 = \beta_4 = \beta_5 = 0$; F-statistic = 0.959

(a) (4 points) People frequently complain about the high salaries and bonuses earned by CEOs. Some suggest that their compensation is almost totally disconnected from the performance of their firms. Using the company's five year return (Fiveret) as a measure of a firm's performance, do you find evidence that CEOs are rewarded for good performance? What effect do you find? Justify your answer.

Solution: Here the RHS variable is measured in levels, and the LHS in logs, so the relationship is log-linear for Fiveret. Thus, the coefficient of .0016 on Fiveret implies that a one percentage point increase in Fiveret is expected to increase Salbon by .16% (a pretty small effect). Moreover, the t-statistic here is $t = \frac{.0016}{.0012} = 1.33$, which is less than 1.96, so the effect is not statistically significantly different from zero.


(b) (4 points) Do CEOs at larger firms earn higher salaries? If so, how much? Justify your answer using LogSales as your measure of firm size.

Solution: Here the RHS variable is measured in logs as well, so the relationship is log-log for LogSales. Thus, the coefficient of .31 on LogSales implies that a 1% change in sales is expected to increase Salbon by .31%. Moreover, the t-statistic here is $t = \frac{.31}{.019} = 16.3$, which is a lot bigger than 1.96, so the effect is statistically significantly different from zero.

(c) (4 points) Do the data provide any evidence that CEOs receive a premium (i.e. higher earnings) for having attended a post-graduate program? What effect do you find? Explain.

Solution: Here the RHS variable is a dummy, so we will have to talk about the effect on LogSalbon directly. Thus, the coefficient of −.037 means that having a post-graduate degree actually lowers expected LogSalbon by .037. However, since the t-statistic here is $t = \frac{-.037}{.039} = -.948$, which is less than 1.96 in absolute value, the effect is not statistically significantly different from zero.

(d) (3 points) Suppose I re-run the regression using Salbon instead of LogSalbon as the dependent variable and find that $R^2 = .34$. On the basis of this evidence, should I conclude that it's better to run this new regression instead of the earlier one? Explain.

Solution: No, you can't compare $R^2$'s when the LHS variables are different, so you don't have the information to make a reasonable comparison. (If you wanted to make a meaningful comparison, you would need information I didn't give you, like the output of the new regression, and scatter plots of the two dependent variables against some of the independent variables. Then you could use the method outlined on page 205 for identifying nonlinearities. You would also need to think carefully about whether the relationship should involve logs, i.e. have a theory in mind.)

5. (24 points) An avid basketball fan, you have collected data on the average points-per-game (Points), years in the league (Exper), age (Age), and the number of years played in college (College) for 269 basketball players in the NBA. Using this dataset, you estimate the following regression:

$$\widehat{Points} = \underset{(7.44)}{35.2} + \underset{(.399)}{2.36}\, Exper - \underset{(.026)}{.077}\, (Exper)^2 - \underset{(.314)}{1.07}\, Age - \underset{(.437)}{1.29}\, College, \quad R^2 = .129$$

(a) (4 points) Is it reasonable to include $(Exper)^2$ in the above regression? Why or why not? On the basis of the estimation results, would you recommend dropping it? Why or why not?

Solution: Including $(Exper)^2$ in the regression allows for a non-linear relationship between points scored and experience. We might reasonably think that a player will benefit from experience playing in the pros, but that the returns to that experience (in terms of points per game) might fall over time. I would not recommend dropping it from the regression because it is significantly different from zero at the 1% level:

$$\text{P-value} = 2\Phi(-|t^{act}|) = 2\Phi\left(-\left|\frac{-.077}{.026}\right|\right) = 2\Phi(-2.96) \approx 0.003 < .01$$

(b) (5 points) Holding college years and age fixed, what is the expected increase in points-per-game associated with increasing years of experience from 7 to 8? Again, holding college years and age fixed, at what value of experience does the next year of experience actually reduce points-per-game? Does this make sense? Explain.

Solution: For a quadratic regression (like $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u$), the predicted effect on Y of a one-unit change in X is given by

$$\Delta\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 (X + 1) + \hat{\beta}_2 (X + 1)^2 - \left(\hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2\right) = \hat{\beta}_1 + 2\hat{\beta}_2 X + \hat{\beta}_2$$

(Hint: You could have just used this formula without deriving it.) In our case,

$$\Delta\widehat{Points} = \hat{\beta}_1 + 2\hat{\beta}_2\, Exper + \hat{\beta}_2 = 2.36 - 2 \times .077 \times 7 - .077 \approx 1.21$$

so we expect points to increase by 1.21 on average. To answer the second question, you need to find where the quadratic function for experience hits its peak and starts to decline. To do this, set the derivative of $\beta_1 X + \beta_2 X^2$ equal to zero and solve for X. We then have $X = \frac{-\beta_1}{2\beta_2}$. In our case, $Exper_0 = \frac{2.36}{2 \times .077} \approx 15.32$, so the increase from 15 to 16 years of experience would actually reduce points per game. This is a very high level of experience, and few players last this long in the NBA, so we can essentially ignore this prediction. (It may be picking up some of the effects of age that are not captured by including it only linearly.)
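Both halves of part (b) can be checked with a few lines of code. This is a sketch under the estimated coefficients from the first basketball regression; the function and variable names are hypothetical.

```python
B_EXPER = 2.36       # coefficient on Exper
B_EXPER2 = -0.077    # coefficient on Exper squared

def next_year_effect(exper):
    """Predicted change in points per game from one more year: b1 + 2*b2*exper + b2."""
    return B_EXPER + 2.0 * B_EXPER2 * exper + B_EXPER2

effect_7_to_8 = next_year_effect(7)

# Peak of the quadratic b1*X + b2*X^2, from setting the derivative to zero
peak_exper = -B_EXPER / (2.0 * B_EXPER2)

# First whole number of years at which one more year lowers predicted points
first_decline = next(x for x in range(0, 40) if next_year_effect(x) < 0)

print(f"7 -> 8 years: {effect_7_to_8:.2f} points")
print(f"peak at {peak_exper:.2f} years; first decline going from "
      f"{first_decline} to {first_decline + 1} years")
```

Note that the discrete one-year effect turns negative slightly before the continuous peak because it includes the extra $\hat{\beta}_2$ term.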

(c) (4 points) What is the interpretation of the coefficient on college? Is it significantly different from zero at the 5% level? Does this make sense? Explain. (Hint: NBA players can enter the NBA before finishing college and some, like Kevin Garnett, start playing in the NBA right after high school.)

Solution: The coefficient on college implies that an additional year of college is expected to decrease points per game by 1.29. Since the

$$\text{P-value} = 2\Phi(-|t^{act}|) = 2\Phi\left(-\left|\frac{-1.29}{.437}\right|\right) = 2\Phi(-2.95) \approx 0.003 < .05$$

it is significantly different from zero at the 5% (or even 1%) level. Many of the most promising players leave college early, or, in some cases, forego college altogether, to play in the NBA. These top players command the highest salaries. It is not that more college hurts scoring; rather, less college is indicative of superstar potential.

(d) (3 points) You also have data on the position played by each player in the sample (either Center, Guard, or Forward). Suppose you now decide to include dummy variables for the position played by each player in the regression. Here are the regression results:

$$\widehat{Points} = \underset{(7.73)}{33.2} + \underset{(.410)}{2.28}\, Exper - \underset{(.026)}{.072}\, (Exper)^2 - \underset{(.323)}{1.04}\, Age - \underset{(.418)}{1.34}\, College + \underset{(1.21)}{2.30}\, Guard + \underset{(1.23)}{1.47}\, Forward$$

$$R^2 = .140$$

Why have I not included a dummy variable for Center? Will this bias the coefficients on the remaining dummy variables (i.e. Guard and Forward)? Why or why not?

Solution: Including all three position dummy variables would be redundant, and result in perfect multicollinearity (as you showed in the last problem set). Each player falls into one of the three categories, and the overall intercept in this regression is the intercept for centers. The choice of which dummy variable (or $\beta_0$) to drop has no impact on bias; it merely changes the interpretation of the coefficients.

(e) (4 points) Holding everything else fixed, does a guard score more points than a center? How much more? Is the difference statistically significant?

Solution: Yes, a guard is estimated to score about 2.3 points more per game on average than a center, holding all other regressors fixed. The P-value

$$ 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\frac{2.30}{1.21}\right) = 2\Phi(-1.90) \approx .057 > .05 $$

so the difference is not statistically different from zero at the 5% level (although it is close).

(f) (4 points) Holding everything else fixed, does a guard score more points than a forward? How much more? Is the difference statistically significant? Can you answer this question with the regression output provided? Why or why not?

Solution: Yes, a guard is estimated to score about .83 (= 2.30 − 1.47) points more per game on average than a forward, holding all other regressors fixed. We cannot test the statistical significance of this difference without more information, such as the covariance between $\hat\beta_5$ and $\hat\beta_6$. (The easiest way to test this would be to make Forward the omitted category instead of Center.)

6. (25 points) Your best friend is applying to medical school this year. To assess her chances of being accepted to the school of her choice, you have collected data on applicants to a major west coast medical school. For 120 applicants you observe whether they were accepted (accept), the applicant's age (age), whether the applicant is male (male), the applicant's grade point average (gpa), and the applicant's MCAT score (mcat). An LPM regression of accept on the above explanatory variables yields:

$$ \widehat{accept} = \underset{(.321)}{-.786} - \underset{(.012)}{.002}\,age + \underset{(.076)}{.113}\,male + \underset{(.086)}{.016}\,gpa + \underset{(.006)}{.048}\,mcat, \quad R^2 = .35 $$

A Logit of Accept on the same explanatory variables yields:

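$$ \widehat{\Pr}(Accept = 1 \mid X\text{'s}) = F\left(\underset{(2.32)}{-9.90} + \underset{(.071)}{.002}\,age + \underset{(.491)}{.805}\,male - \underset{(.648)}{.111}\,gpa + \underset{(.076)}{.379}\,mcat\right) $$

(a) (4 points) Is the effect of mcat significant at the 5% level in each model?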

Solution: For the LPM, P-value $= 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\frac{.048}{.006}\right) = 2\Phi(-8) \approx 0$, which is significant at the 5% (or any) level. For the Logit, P-value $= 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\frac{.379}{.076}\right) = 2\Phi(-4.99) \approx 0$, which is also significant at the 5% (or any) level. Aside: This is not surprising since the MCAT would seem to be a major factor in determining admissions.

(b) (5 points) Focusing on the LPM output, holding everything else constant, what is the expected increase in the probability of acceptance associated with a one point increase in the applicant's MCAT score? Can you perform the same calculation for the Logit model? If yes, calculate the expected increase. If not, explain why you cannot.

Solution: For the LPM, the coefficient on MCAT implies that a one unit increase in an applicant's MCAT score is expected to increase the probability of admission by .048, that is, by 4.8 percentage points. Because the Logit model is nonlinear, the effect of a one point increase depends on where it starts, so we cannot perform the same calculation from the Logit output alone; we would need to know the initial MCAT score as well as the applicant's age, gender and GPA.

(c) (4 points) Focusing again on the LPM, what is the predicted probability of acceptance for a 22 year old, female applicant, with a GPA of 1.2 and an MCAT of 16? Does this make sense? If not, what do you think is going wrong?

Solution: According to the LPM results, the predicted probability of acceptance for a 22 year old, female applicant, with a GPA of 1.2 and an MCAT of 16 is

$$ \widehat{\Pr}(Accept = 1 \mid X\text{'s}) = -.786 - .002 \times 22 + .113 \times 0 + .016 \times 1.2 + .048 \times 16 \approx -.04 $$

This, of course, does not make sense because probabilities must be between 0 and 1. This is the well known problem that the LPM fits a straight line (or more generally, a linear function) through the data, so it can make predictions outside the (0, 1) interval.

(d) (4 points) Focusing now on the Logit output, what is the predicted probability of acceptance for the same applicant considered in part c? Does this make sense? Do the predictions from part c and d differ? If so, why?

Solution: According to the Logit results, the predicted probability of acceptance for a 22 year old, female applicant, with a GPA of 1.2 and an MCAT of 16 is

$$ \widehat{\Pr}(Accept = 1 \mid X\text{'s}) = F(-9.90 + .002 \times 22 + .805 \times 0 - .111 \times 1.2 + .379 \times 16) = F(-3.93) = \frac{e^{-3.93}}{1 + e^{-3.93}} \approx .02 $$

Thus, the applicant has about a 2% chance of getting accepted, which makes a lot more sense. The predictions are different because the logit is fitting a non-linear function (which is forced to lie in the (0, 1) interval) through the data, while the LPM is simply linear.
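As a rough check on parts c and d, the two predictions can be reproduced with the short Python sketch below. It only plugs the reported coefficients into each model; the dictionary layout and the helper names (`linear_index`, `logistic_cdf`) are assumptions made for illustration, not part of the original problem.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logistic CDF F(z) = 1 / (1 + exp(-z)), equivalent to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_index(coefs: dict, x: dict) -> float:
    """Intercept plus the sum of coefficient * regressor value."""
    return coefs["const"] + sum(coefs[name] * value for name, value in x.items())


# Coefficients as reported in the LPM and Logit output above.
lpm_coefs = {"const": -0.786, "age": -0.002, "male": 0.113, "gpa": 0.016, "mcat": 0.048}
logit_coefs = {"const": -9.90, "age": 0.002, "male": 0.805, "gpa": -0.111, "mcat": 0.379}

# The applicant from parts c and d: 22 years old, female, GPA 1.2, MCAT 16.
applicant = {"age": 22, "male": 0, "gpa": 1.2, "mcat": 16}

lpm_prob = linear_index(lpm_coefs, applicant)       # can fall outside [0, 1]
logit_index = linear_index(logit_coefs, applicant)  # argument of F(.)
logit_prob = logistic_cdf(logit_index)              # always inside (0, 1)

print(f"LPM prediction:   {lpm_prob:.4f}")
print(f"Logit index:      {logit_index:.4f}")
print(f"Logit prediction: {logit_prob:.4f}")
```

Changing the values in `applicant` shows how the logit prediction stays inside the unit interval while the LPM prediction need not.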


(e) (4 points) Focusing again on the Logit output, is the effect of GPA significantly different from zero at the 5% level? What do the sign and significance of this coefficient imply about the impact of GPA on admissions? Does this make sense to you? Explain.

Solution: For the coefficient on GPA in the Logit, P-value $= 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\left|\frac{-.111}{.648}\right|\right) = 2\Phi(-.171) \approx .86$, which is insignificant at the 5% level (or any other reasonable level as well). The coefficient is also negative, which would mean that having a higher GPA lowers the applicant's probability of admission. However, it is important to emphasize that we cannot reject that the coefficient is equal to zero at any reasonable level of significance, so we should really conclude that GPA has no effect on the probability of admission. Although this might at first be surprising, it is possible that admissions committees ignore GPA because the MCAT score already tells them what they need to know about the applicant's ability (it's a pretty substantial test). Alternatively, GPA might vary so much across schools and majors (due to grade inflation, for example) that we cannot pick up its effect without controlling for these other factors.

(f) (4 points) Focusing again on the Logit output, you want to test the null hypothesis that all of the slope coefficients (excluding mcat) are equal to zero. The value of the F-stat is 1.01. Can you reject the null using a 10% significance level? Is this surprising? Explain.

Solution: The value of the F-test, 1.01, is smaller than the critical value $F_{3,\infty} = 2.08$ (there are three restrictions here), so we cannot reject the null that all those slope coefficients are equal to zero. Only GPA is at all surprising, although the arguments in part e explain why it might be reasonable. We would hope that schools are not discriminating on the basis of age or gender, so the fact that the other coefficients are not significant is not surprising (and somewhat comforting).

7. (29 points) You are interested in understanding some of the determinants of the variation in grade point averages (GPAs) across college students. As such, you have collected data on 4137 students at a large mid-western research university that supports a Division 1 athletics program. Specifically, your dataset includes the following variables:

• colgpa - the student's cumulative GPA, measured on a four point scale (mean: 2.65)
• hsize - the size of the student's graduating high school class, in 100s (mean: 2.80)
• hsperc - the student's academic percentile (see footnote 2) in their high school graduating class (mean: 19.2)
• sat - the student's combined SAT score (mean: 1030)
• female - a dummy variable equal to one if the student is female (mean: .45)
• athlete - a dummy variable equal to one if the student is a student-athlete (mean: .05)

You estimate the following regression using OLS (HR Standard Errors in parentheses):

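$$ \widehat{colgpa} = \underset{(.080)}{1.24} - \underset{(.017)}{.057}\,hsize + \underset{(.002)}{.0047}\,hsize^2 - \underset{(.0006)}{.013}\,hsperc + \underset{(.00007)}{.0016}\,sat + \underset{(.018)}{.155}\,female + \underset{(.037)}{.169}\,athlete, \quad R^2 = .29 $$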

(a) (4 points) Consider the coefficient on the variable hsperc. Does it make sense that this coefficient is negative? Explain why or why not. Is this coefficient statistically significant at the 5% level?

Solution: The variable hsperc is defined so that the larger it is, the lower the student's standing in high school. The negative coefficient means that the worse the student did in high school, the lower their college GPA will be. This seems pretty sensible. Finally, since the t-stat $= \frac{-.013}{.0006} = -21.7$ is larger in absolute value than the critical value 1.96, it is indeed significant at the 5% level.

Footnote 2: Percentile is defined so that, for example, hsperc = 5 means the top five percent of the class.


(b) (5 points) What is the expected change in cumulative GPA associated with increasing the size of a student's graduating high school class from 100 to 110 students? How about increasing the size from 200 to 250 students?

Solution: To answer this, we need to recall the formula for calculating the effect of a change in X on the predicted value of Y in a quadratic regression. That formula is simply:

$$ \Delta\hat{Y} = \hat\beta_1 \Delta X_i + 2\hat\beta_2 X_i (\Delta X_i) + \hat\beta_2 (\Delta X_i)^2 $$

or, using the variable names we have here,

$$ \Delta\widehat{colgpa} = \hat\beta_1 \Delta hsize_i + 2\hat\beta_2\, hsize_i (\Delta hsize_i) + \hat\beta_2 (\Delta hsize_i)^2 $$

So, for the change from 100 to 110 we have $hsize_i = 1$ and $\Delta hsize_i = .1$, so

$$ \Delta\widehat{colgpa} = \hat\beta_1 (.1) + 2\hat\beta_2 (1)(.1) + \hat\beta_2 (.1)^2 = -.057(.1) + 2(.0047)(1)(.1) + .0047(.1)^2 \approx -.005 $$

For the change from 200 to 250 we have $hsize_i = 2$ and $\Delta hsize_i = .5$, so

$$ \Delta\widehat{colgpa} = \hat\beta_1 (.5) + 2\hat\beta_2 (2)(.5) + \hat\beta_2 (.5)^2 = -.057(.5) + 2(.0047)(2)(.5) + .0047(.5)^2 \approx -.02 $$

(c) (6 points) Given that hsize ranges from .1 to 6 in the data, what do the signs on hsize and $hsize^2$ imply about the relationship between high school size and cumulative GPA? Suppose you were to replace both hsize and $hsize^2$ with the single variable ln(size). What sign would you expect the coefficient on this new variable to have? Why?

Solution: The negative sign on hsize and positive sign on $hsize^2$ mean that the relationship between colgpa and hsize is convex (or bowl shaped, as the figure below illustrates). Moreover, over the range of hsize observed in the data, colgpa is never increasing in hsize (which it could have been, since this quadratic does turn upward at some point). This means that colgpa is decreasing in hsize but at a decreasing rate. We can verify this explicitly by finding the minimum value of colgpa with respect to hsize, which occurs at the value of hsize at which the derivative of $-.057\,hsize + .0047\,hsize^2$ with respect to hsize is equal to zero. This occurs when $-.057 + 2(.0047)\,hsize = 0$, that is, when hsize = 6.06, which is outside the observed range of hsize. Given that colgpa is decreasing in hsize at a decreasing rate, the negative of the log function should be able to fit this relationship quite well (since it should look something like the quadratic before it bottoms out), so you should expect the coefficient on ln(size) to have a negative sign.
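The quadratic calculations in parts b and c are easy to check numerically. The sketch below is an illustration only: it hard-codes the two reported hsize coefficients, and the function name `delta_colgpa` is a made-up label for the change formula from part b.

```python
# Reported coefficients on hsize and hsize^2 (hsize is measured in 100s of students).
B1 = -0.057
B2 = 0.0047


def delta_colgpa(hsize: float, d_hsize: float) -> float:
    """Predicted change in colgpa when hsize moves from hsize to hsize + d_hsize."""
    return B1 * d_hsize + 2.0 * B2 * hsize * d_hsize + B2 * d_hsize ** 2


change_100_to_110 = delta_colgpa(hsize=1.0, d_hsize=0.1)
change_200_to_250 = delta_colgpa(hsize=2.0, d_hsize=0.5)

# The quadratic bottoms out where its derivative B1 + 2*B2*hsize equals zero.
turning_point = -B1 / (2.0 * B2)

print(f"100 -> 110 students: {change_100_to_110:.4f}")
print(f"200 -> 250 students: {change_200_to_250:.4f}")
print(f"turning point (hsize, in 100s): {turning_point:.2f}")
```

Because the turning point lies just above the largest hsize in the data (6), the fitted relationship is decreasing over the whole observed range.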

[Figure: predicted colgpa (vertical axis, from about 1.07 to 1.24) plotted against hsize (horizontal axis, from 0 to 12).]

(d) (3 points) What is the estimated GPA differential between females and males? Is it statistically significant at the 1% level?

Solution: The coefficient on female implies that, holding all other variables constant, we expect the GPAs of female students to be about .155 higher than those of males. Since the t-stat $= \frac{.155}{.018} = 8.61$ is greater than 2.58, the estimated differential is statistically significant at the 1% level.


(e) (3 points) What is the estimated GPA differential between athletes and non-athletes? Is it statistically significant at the 5% level?

Solution: The coefficient on athlete implies that, holding all other variables constant, we expect the GPAs of student-athletes to be about .169 higher than those of non-athletes. Since the t-stat $= \frac{.169}{.037} = 4.57$ is greater than 1.96, the estimated differential is statistically significant at the 5% level (or even the 1% level).

(f) (5 points) If we drop the variable sat from the model and re-estimate it, we get the following result

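$$ \widehat{colgpa} = \underset{(.034)}{3.05} - \underset{(.018)}{.053}\,hsize + \underset{(.002)}{.0053}\,hsize^2 - \underset{(.0006)}{.017}\,hsperc + \underset{(.019)}{.058}\,female + \underset{(.039)}{.005}\,athlete, \quad R^2 = .19 $$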

What is the estimated GPA differential between athletes and non-athletes now? Is it statistically significant at the 5% level? Discuss why the estimate of the coefficient on athlete might be different from what you found in part e).

Solution: Now the coefficient on athlete implies that, holding all other variables constant, we expect the GPAs of student-athletes to be about .005 higher than those of non-athletes. This is a pretty small effect. Moreover, since the t-stat $= \frac{.005}{.039} = .128$ is much smaller than 1.96, the estimated differential is not statistically significantly different from zero at the 5% level. We are now omitting SAT from the regression, and this is likely to cause omitted variable bias. In particular, if athletes have lower SAT scores on average, omitting SAT from the analysis will lead to a downward bias when we estimate the effect of athlete.

(g) (3 points) Explain how you might go about testing whether the effect of sat on colgpa differs by gender.

Solution: You would simply add an interaction term like $satfem = sat \times female$ to the regression estimated above and test whether its coefficient is significantly different from zero.

8. (16 points) You are asked to analyze student housing demand at a mid-sized southeastern university. You have gathered data from 32 students (don't worry about the somewhat small sample) on rent per person (RentPer - which is the total apartment rent divided by the number of roommates), whether the student is male or female (Female), the number of rooms per person in the apartment (RoomPer - which is the number of rooms in the apartment divided by the number of roommates), and the distance from campus (Dist). You then run the following regressions:

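$$ RentPer = \underset{(4.37)}{8.98} + \underset{(5.20)}{5.01}\,Female + \underset{(7.93)}{29.5}\,RoomPer - \underset{(0.15)}{0.20}\,Dist, \quad R^2 = .36, \ \bar{R}^2 = .29 $$

$$ RentPer = \underset{(4.39)}{8.17} + \underset{(6.21)}{33.0}\,RoomPer - \underset{(0.11)}{0.26}\,Dist, \quad R^2 = .33, \ \bar{R}^2 = .28 $$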

Denote the coefficients of Female, RoomPer, and Dist by $\beta_1$, $\beta_2$, and $\beta_3$ respectively.

(a) (5 points) What is the interpretation of $\beta_2$? Using the first regression, test the hypothesis that $\beta_2 = 0$ (as opposed to $\beta_2 > 0$). Is this what you would expect?

Solution: $\beta_2$ represents how much more students are willing to pay (per person) for additional rooms per person (i.e. to not have to share). This is a one-sided hypothesis test of the form

$$ H_0: \beta_2 = 0 \qquad H_A: \beta_2 > 0 $$

To calculate the relevant P-value we evaluate P-value $= \Phi\left(-t^{act}\right) = \Phi\left(-\frac{29.5}{7.93}\right) = \Phi(-3.72) \approx 0$, so we reject the null at any significance level. It makes sense that students would be willing to pay more to have their own room.


(b) (5 points) What is the interpretation of $\beta_3$? Using the first regression, test the hypothesis that $\beta_3 = 0$ (as opposed to $\beta_3 < 0$). Is this what you would expect?

Solution: $\beta_3$ represents how much less students are willing to pay for apartments that are farther from campus. Again, this is a one-sided hypothesis test of the form

$$ H_0: \beta_3 = 0 \qquad H_A: \beta_3 < 0 $$

To calculate the relevant P-value we evaluate P-value $= \Phi\left(t^{act}\right) = \Phi\left(\frac{-0.20}{0.15}\right) = \Phi(-1.33) \approx 0.092$, so we fail to reject the null at the 5% significance level. This is a bit surprising since we would think that students would be willing to pay significantly more to have an apartment that's close to campus (maybe the campus isn't in so great an area!).

(c) (6 points) Using the first regression, test the hypothesis that $\beta_1 = 0$ (as opposed to $\beta_1 \neq 0$). Using the information provided above, there is another way to test this hypothesis. What do you need to assume for this method to be valid? What do you conclude?

Solution: Now we have a two-sided hypothesis of the form

$$ H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0 $$

To calculate the relevant P-value we evaluate P-value $= 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\frac{5.01}{5.20}\right) = 2\Phi(-0.96) \approx 0.34$, so we fail to reject the null at the 5% significance level. Since we have regression output both with and without the variable Female, we can use the rule of thumb F-statistic discussed in the appendix of chapter 5. To use this test, we have to assume that the errors are homoskedastic, which is often unrealistic. The formula for the rule of thumb F-statistic is

$$ F = \frac{\left(R^2_U - R^2_R\right)/q}{\left(1 - R^2_U\right)/(n - k_U - 1)} $$

which is distributed as $F_{q,\,n - k_U - 1}$. Here we have

$$ F = \frac{(.36 - .33)/1}{(1 - .36)/(32 - 3 - 1)} = 1.31 $$

which is less than the 5% critical value $F_{1,28} = 4.2$, so again we fail to reject the null. Note that if you had been given homoskedasticity-only standard errors in the first regression, this statistic would be identical to the one you'd get by looking at $t^2$.
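The rule of thumb F-statistic is simple enough to compute by hand, but a small Python sketch makes the roles of the inputs explicit. The function name and argument names below are chosen for illustration, and the statistic is only valid under the homoskedasticity assumption noted in the solution.

```python
def rule_of_thumb_f(r2_unrestricted: float, r2_restricted: float,
                    q: int, n: int, k_unrestricted: int) -> float:
    """Homoskedasticity-only F-statistic built from the two regression R^2 values.

    q is the number of restrictions, n the sample size, and k_unrestricted
    the number of regressors (excluding the intercept) in the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# Values from the two rent regressions: dropping Female is one restriction.
f_stat = rule_of_thumb_f(r2_unrestricted=0.36, r2_restricted=0.33,
                         q=1, n=32, k_unrestricted=3)

CRITICAL_VALUE_5PCT = 4.2  # F(1, 28) critical value quoted in the solution
print(f"F = {f_stat:.2f}, reject at 5%: {f_stat > CRITICAL_VALUE_5PCT}")
```

Note that the formula uses the ordinary R-squared values (.36 and .33), not the adjusted ones.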

9. (16 points) Two authors published a study in 1992 of the effect of minimum wages on teenage employment using a U.S. state panel. The paper used annual observations for the years 1977-1989 and included all 50 states plus the District of Columbia. The estimated equation is of the following type

$$ E_{it} = \beta_0 + \beta_1 \frac{M_{it}}{W_{it}} + \beta_2\, Teen_{it} + \beta_3\, Uram_{it} + \text{State FEs} + \text{Time FEs} + u_{it} $$

where E is the employment to population ratio of teenagers, M is the nominal minimum wage and W is the average wage in the state (so $M/W$ measures the relative minimum wage), Uram is the prime-age male unemployment rate, and Teen is the teenage population share.

(a) (4 points) Briefly discuss the advantage of using panel data in this situation rather than pure cross-sections or time series.


Solution: There are likely to be omitted variables in the above regression. One way to deal with some of these is to introduce state and time effects. State effects will capture the influence of omitted variables that are state specific and do not vary over time, while time effects capture those of countrywide variables that are common to all states at a point in time. Furthermore, there are more observations when using panel data, resulting in more variation.

(b) (4 points) Estimating the model by OLS but including only time fixed effects results in the following output.

$$ \widehat{E}_{it} = \hat\beta_0 - \underset{(0.08)}{0.33}\,\frac{M_{it}}{W_{it}} + \underset{(0.28)}{0.35}\,Teen_{it} - \underset{(0.13)}{1.53}\,Uram_{it}, \quad R^2 = .20 $$

Coefficients for the time fixed effects and the constant term are not reported. Comment on the above results (i.e. interpret the results). Are the coefficients statistically significant?

Solution: There is a negative relationship between minimum wages and the employment to population ratio. Increases in the share of teenagers in the population result in a higher employment to population ratio, and increases in the prime-age male unemployment rate lower the employment to population ratio. The regression explains 20 percent of the variation in the teenage employment to population ratio. The relative minimum wage and the prime-age male unemployment rate are significant at the 1% significance level, while the proportion of teenagers in the population is not.

(c) (4 points) Adding state fixed effects changes the above equation as follows:

$$ \widehat{E}_{it} = \hat\beta_0 + \underset{(0.10)}{0.07}\,\frac{M_{it}}{W_{it}} - \underset{(0.22)}{0.19}\,Teen_{it} - \underset{(0.11)}{0.54}\,Uram_{it}, \quad R^2 = .69 $$

Compare the two results. Why would the inclusion of state fixed effects change the coefficients in this way?

Solution: The parameter of interest here is the coefficient on the relative minimum wage. While it was highly significant in the previous regression, it has now changed sign and is statistically insignificant. The explanatory power of the equation has increased substantially. The size of the other two coefficients has also decreased. The results suggest that omitted variables, which are now captured by state fixed effects, were correlated with the regressors and caused omitted variable bias.

(d) (4 points) The significance of each coefficient decreased, yet $R^2$ increased. How is that possible? What does this result tell you about testing the hypothesis that all of the state fixed effects can be restricted to have the same coefficient? How would you test such a hypothesis (just describe what you would do, no calculations!)?

Solution: The influence of the state effects is large, which is reflected in the dramatic increase in $R^2$. Omitted variable bias is almost certainly causing the changes in the coefficients and their degree of significance. The state fixed effects are bound to be jointly statistically significant, so the hypothesis that restricts their coefficients to zero is bound to be rejected. Since these are linear hypotheses that are supposed to hold simultaneously, an F-test is appropriate here.

10. (20 points) You want to study the trade-off between time spent sleeping and time spent working, and also look at other factors affecting sleep. You decide to estimate the following relationship

$$ sleep = \beta_0 + \beta_1\, totwrk + \beta_2\, educ + \beta_3\, age + u $$

where sleep and totwrk (total work) are measured in minutes per week and educ and age are measured in years.


(a) (3 points) If adults trade off sleeping for work, what is the sign of $\beta_1$? What signs do you think $\beta_2$ and $\beta_3$ will have? Why?

Solution: If adults trade off sleep for work, more work implies less sleep (other things equal), so $\beta_1 < 0$. The signs of $\beta_2$ and $\beta_3$ are not so obvious. One could argue that more educated people like to get more out of life, and so, other things equal, they sleep less ($\beta_2 < 0$). The relationship between sleeping and age is probably more complicated than this model suggests, but elderly people do seem to sleep less.

(b) (5 points) After collecting data from 706 adults in the United States, you estimate the following regression

$$ \widehat{sleep} = \underset{(115.1)}{3638.2} - \underset{(.019)}{.148}\,totwrk - \underset{(5.78)}{11.13}\,educ + \underset{(1.44)}{2.20}\,age, \quad R^2 = .11 $$

If someone works five more hours per week, by how many minutes is sleep expected to fall? Construct a 95% confidence interval for this prediction.

Solution: Since totwrk is in minutes, we must convert five hours into minutes: $\Delta totwrk = 5(60) = 300$. Then sleep is predicted to fall by $.148(300) = 44.4$ minutes. For a week, 45 minutes less sleep is not an overwhelming change. A 95% CI for $300\beta_1$ is $300\hat\beta_1 \pm 1.96 \cdot 300 \cdot SE(\hat\beta_1) = 300(.148) \pm 1.96 \cdot 300 \cdot (.019) = (33.2, 55.6)$. (Of course, you could also have written it as $(-55.6, -33.2)$.)
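The interval above is just the usual confidence interval for a coefficient, scaled by the size of the change in the regressor. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the 1.96 critical value assumes the large-sample normal approximation.

```python
# Reported coefficient and standard error on totwrk (minutes of work per week).
BETA_TOTWRK = -0.148
SE_TOTWRK = 0.019
Z_95 = 1.96  # two-sided 95% critical value from the standard normal

delta_totwrk = 5 * 60  # five extra hours of work, converted to minutes

# Scaling a coefficient by a constant scales its standard error by the same constant.
predicted_change = delta_totwrk * BETA_TOTWRK
half_width = Z_95 * delta_totwrk * SE_TOTWRK
ci_change = (predicted_change - half_width, predicted_change + half_width)

print(f"Predicted change in sleep: {predicted_change:.1f} minutes per week")
print(f"95% CI: ({ci_change[0]:.1f}, {ci_change[1]:.1f})")
```

The sketch keeps the negative sign of the coefficient, so the interval is reported as a change in sleep rather than as a fall in sleep.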

(c) (4 points) Do totwrk, educ, and age explain much of the variation in sleep? What other factors might affect the time spent sleeping? Are these likely to be correlated with totwrk?

Solution: Not surprisingly, the three explanatory variables explain only about 11.3% of the variation in sleep. One important factor in the error term is general health. Another is marital status, and whether the person has children. Health (however we measure that), marital status, and number and ages of children would generally be correlated with totwrk. (For example, less healthy people would tend to work less.)

(d) (4 points) To investigate whether gender has an impact on sleeping habits you estimate the following regression

$$ \widehat{sleep} = \underset{(250.4)}{3840.8} - \underset{(.021)}{.163}\,totwrk - \underset{(5.75)}{11.71}\,educ - \underset{(11.49)}{8.70}\,age + \underset{(.134)}{.129}\,age^2 + \underset{(35.46)}{87.75}\,male, \quad R^2 = .12 $$

All other factors being equal, is there evidence that men sleep more than women? How strong is this evidence?

Solution: The coefficient on male is 87.75, so a man is estimated to sleep almost one and one-half hours more per week than a comparable woman. Furthermore, the P-value $= 2\Phi\left(-\left|t^{act}\right|\right) = 2\Phi\left(-\frac{87.75}{35.46}\right) = 2\Phi(-2.47) \approx .014$ is close to being significant at the 1% level. Thus, the evidence for a gender differential is fairly strong.

(e) (4 points) You want to test the null hypothesis that the effects of both age and $age^2$ are equal to zero. The value of the F-test is 1.75. Can you reject the null using a 5% significance level?

Solution: The value of the F-test, 1.75, is smaller than the critical value $F_{2,\infty} = 3.00$, so we cannot reject the null that both age coefficients are equal to zero.


5 Limited Dependent Variable Models & Maximum Likelihood

1. (20 points overall) A researcher wants to study whether there is discrimination against female applicants for mortgages. She uses the data collected in the Boston area in 1990 that Stock & Watson mention in the textbook, but she uses only observations related to "white" applicants with no missed mortgage payments in their credit history. This leads to a sample of 1916 observations. The researcher wants to use different models to see if the choice of model makes a difference. The dependent variable is a dummy equal to one if the mortgage is approved. The results are the following (the first column reports the average in the sample of the corresponding variable). Heteroskedasticity-robust standard errors are in parentheses.

Regressor    Sample Mean    LPM               Logit             Probit
constant                    1.16 (0.073)      5.32 (1.02)       2.81 (0.496)
Female       0.18           0.021 (0.017)     0.26 (0.23)       0.12 (0.11)
NoHistory    0.65           −0.040 (0.013)    −0.59 (0.19)      −0.29 (0.092)
P/I ratio    32.56          −0.006 (0.001)    −0.053 (0.012)    −0.026 (0.005)
Log(loan)    4.82           −0.01 (0.014)     −0.18 (0.19)      −0.086 (0.095)

where "Female" is a dummy equal to one when the applicant is a female, "NoHistory" is a dummy equal to one if the applicant has no credit history, "P/I ratio" is total monthly payments for the mortgage as a proportion of monthly income, and "Log(loan)" is the logarithm of the loan amount.

(a) (5 points) LPM, Logit, and Probit provide very different point estimates for all coefficients. Does this mean that their predictions will be very different? Why?

Solution: No, the predictions will be similar. While the LPM's coefficients translate directly into changes in the predicted probability, the probit and logit coefficients enter through a nonlinear function, so their magnitudes alone do not say much about the size of the increase or decrease in probability.

(b) (5 points) How do you interpret the coefficient for "Female" in the Linear Probability Model? Is the interpretation the same in the logit and the probit models? Is there evidence of discrimination against women applicants in any of these models?

Solution: The "Female" coefficient for the LPM is interpreted as a 2.1 percentage point increase in the probability of being approved if the applicant is a woman, all else equal. The interpretation is not the same in the other two models. All that we can say about the probit and logit "Female" coefficients is that they imply an increase in the probability of being approved if the applicant is a woman. In all three cases the coefficient is not statistically significant. There is no evidence of discrimination against women. If anything, the point estimates suggest a small (statistically insignificant) advantage for women.

(c) (5 points) Give an economic interpretation to the signs of the coefficients in the three models. Do you think the signs make sense? Do they differ across different models?

Solution: The economic interpretation is that a person is more likely to get a loan if: i. the applicant is female; ii. the applicant has a credit history; iii. the applicant has a low P/I ratio. The signs are consistent across all three models, and they make sense, except for the "Female" parameter. There's no clear reason why being female should confer an advantage. We note that the coefficient on log(loan) is insignificant, which seems to indicate that the size of the loan is not a major factor in approval or denial of the loan.


(d) (5 points) For the three different models, compute the predicted effect on the probability of being granted the mortgage of having no credit history, for a male, evaluating all other regressors at their sample averages. Is the estimated effect very different for the three models?

Solution: For a male applicant with no credit history, the predicted probabilities of approval are

$$ \text{LPM}: \ 1.16 - 0.04 - 0.006 \times 32.56 - 0.01 \times 4.82 = 0.87644 \approx 87.6\% $$

$$ \text{Logit}: \ F(5.32 - 0.59 - 0.053 \times 32.56 - 0.18 \times 4.82) = \frac{1}{1 + e^{-2.13672}} = 0.894421 \approx 89.4\% $$

$$ \text{Probit}: \ \Phi(2.81 - 0.29 - 0.026 \times 32.56 - 0.086 \times 4.82) = \Phi(1.25892) \approx 0.895 = 89.5\% $$

The estimated probabilities for all three models are fairly close.
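One way to go from these predicted probabilities to the effect the question asks about is to evaluate each model twice, once with NoHistory = 1 and once with NoHistory = 0, and take the difference. The Python sketch below does this with the rounded coefficients from the table; the comparison with NoHistory = 0 is an extension of the printed solution, and all function and variable names are illustrative assumptions.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z: float) -> float:
    """Logistic CDF, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# (constant, Female, NoHistory, P/I ratio, Log(loan)) coefficients from the table.
COEFS = {
    "LPM": (1.16, 0.021, -0.040, -0.006, -0.01),
    "Logit": (5.32, 0.26, -0.59, -0.053, -0.18),
    "Probit": (2.81, 0.12, -0.29, -0.026, -0.086),
}
# Link functions: the LPM uses the index itself as the probability.
LINKS = {"LPM": lambda z: z, "Logit": logistic_cdf, "Probit": normal_cdf}

PI_MEAN, LOGLOAN_MEAN = 32.56, 4.82  # sample means from the table


def approval_prob(model: str, female: int, no_history: int) -> float:
    """Predicted approval probability at the sample means of P/I ratio and Log(loan)."""
    const, b_fem, b_nohist, b_pi, b_loan = COEFS[model]
    index = (const + b_fem * female + b_nohist * no_history
             + b_pi * PI_MEAN + b_loan * LOGLOAN_MEAN)
    return LINKS[model](index)


probs_no_history = {m: approval_prob(m, female=0, no_history=1) for m in COEFS}
effects = {m: probs_no_history[m] - approval_prob(m, female=0, no_history=0) for m in COEFS}

for m in COEFS:
    print(f"{m:6s} P(approve | no history) = {probs_no_history[m]:.4f}, "
          f"effect of no history = {effects[m]:+.4f}")
```

Under these rounded coefficients the three estimated effects are of similar size, which is the point of the question.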

2. (21 points total) The Poisson distribution, whose name comes from the French mathematician Siméon Poisson, is often used to model count data (that is, non-negative integer valued random variables that represent the number of times something happens). The probability function (pf) of the Poisson distribution is given by

$$ p(y \mid \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!} \quad \text{for } y = 0, 1, \ldots \qquad (1) $$

where the parameter $\lambda$ also happens to be the mean of the distribution (see footnote 3). In one of its earliest empirical applications, the Poisson distribution was used to model the number of Prussian cavalry (i.e. horse-riding) soldiers who were killed by kicks from their horses. We are going to repeat that analysis here. In particular, we have data on 10 cavalry units over a period of 20 years, yielding a total of 200 unit/year level observations. Letting d = the number of deaths, we observe the following data

Observed Horse-Kick Fatalities

d = number of deaths    Number of unit/years in which d deaths occurred
0                       109
1                       65
2                       22
3                       3
4                       1

so that, for example, no deaths occurred in 109 of the 200 unit/years and 4 deaths occurred in 1 unit/year. We are going to estimate the mean ($\lambda$) using Maximum Likelihood and then see how well our estimator fits the data. We will assume that our sample $(y_1, \ldots, y_{200})$ is iid.

(a) (5 points) Treating the observations $(y_1, \ldots, y_n)$ as known, show that the likelihood function $L_n(\lambda)$ can be written as

$$ L_n(\lambda) = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!} \qquad (2) $$

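Solution: From the formula for the Poisson pf, we know that the pf of a single observation is just

$$ f_i(y) = \frac{e^{-\lambda} \lambda^{y_i}}{y_i!} $$

The likelihood is simply the product of these pfs, or

$$ \prod_{i=1}^{n} f_i(y) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{y_i}}{y_i!} = \frac{\prod_{i=1}^{n} e^{-\lambda} \lambda^{y_i}}{\prod_{i=1}^{n} y_i!} = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!} $$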

Footnote 3: Recall that the factorial n! is defined for a positive integer n as $n! = n \cdot (n-1) \cdots 2 \cdot 1$, and the special case 0! is defined to have value 0! = 1.


(b) (4 points) Show that the value of $\lambda$ that maximizes (2) is in fact the sample mean $\frac{1}{n}\sum_{i=1}^{n} y_i$. Hint: it will be much easier to work with the log likelihood here.

Solution: Taking logs of both sides of (2) yields

$$ \log L_n(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right) \log \lambda - \ln\left(\prod_{i=1}^{n} y_i!\right) $$

We can find the value of $\lambda$ that maximizes the log likelihood by taking the derivative with respect to $\lambda$ and setting it equal to zero. Because the third term does not depend on $\lambda$, the derivative is simply

$$ \frac{\partial \log L_n(\lambda)}{\partial \lambda} = -n + \frac{1}{\lambda} \sum_{i=1}^{n} y_i $$

Setting this equal to zero and solving for $\lambda$ yields

$$ \hat\lambda = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y} $$

(c) (3 points) How do we know that we have maximized and not minimized the likelihood in part b?

Solution: To assure that we are finding the maximum, it is sufficient to establish that the function we are maximizing is globally concave (i.e. shaped like a hill, rather than a bowl). One way to do this is to graph the function

$$ \log L_n(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right) \log \lambda - \ln\left(\prod_{i=1}^{n} y_i!\right) $$

for some particular values of n and $\sum_{i=1}^{n} y_i$ (the last term is a constant in this function, so it doesn't really matter here). Here's what it looks like for our sample values (n = 200 and $\sum_{i=1}^{n} y_i = 122$):

[Figure: lnL (vertical axis, from −1685.49 up to a peak of −182.304) plotted against lambda (horizontal axis, from 0 to 3), with the peak at lambda = .61.]

which appears to hit its peak around $\lambda = .6$. While you could have traced out a figure like this on your exam, a much easier way to proceed is to calculate the second derivative of lnL, which is $-\frac{1}{\lambda^2} \sum_{i=1}^{n} y_i$. Since it is negative for all $\lambda > 0$, we know that the likelihood function is globally concave, so the spot where the first derivative is 0 will be a maximum.

(d) (3 points) Now, using the data in Table 1, find the maximum likelihood estimate of $\lambda$.

Solution: All we need to do here is calculate the sample mean, which is simply

$$ \bar{y} = \frac{1}{200}(109 \cdot 0 + 65 \cdot 1 + 22 \cdot 2 + 3 \cdot 3 + 1 \cdot 4) = \frac{1}{200}(65 + 44 + 9 + 4) = \frac{122}{200} = .61, \quad \text{so } \hat\lambda_{MLE} = .61 $$
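The estimate can be double-checked numerically. The Python sketch below recomputes the sample mean from the frequency table and confirms that the log likelihood (dropping the constant term that does not involve lambda) is higher at the sample mean than at nearby values. The dictionary `counts` and the function name `loglik_kernel` are names made up for this illustration.

```python
import math

# Observed data: number of deaths d -> number of unit/years with d deaths.
counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(counts.values())                                # number of unit/years
total_deaths = sum(d * freq for d, freq in counts.items())

# The Poisson MLE of lambda is the sample mean.
lambda_hat = total_deaths / n


def loglik_kernel(lam: float) -> float:
    """Log likelihood without the -ln(prod y_i!) term, which does not depend on lambda."""
    return -n * lam + total_deaths * math.log(lam)


print(f"n = {n}, total deaths = {total_deaths}, lambda_hat = {lambda_hat:.2f}")
for lam in (0.5, lambda_hat, 0.7):
    print(f"lambda = {lam:.2f}: log-likelihood kernel = {loglik_kernel(lam):.3f}")
```

Evaluating a few values on either side of the estimate is a crude stand-in for the concavity argument in part c, not a replacement for it.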

(e) (6 points) Using your maximum likelihood estimator (i.e. (1) with $\lambda = \hat\lambda$), fill in the expected frequencies in the third column of the following table:

Horse-Kick Fatalities

d = number of deaths    Observed Frequency    Expected Frequency
0                       109
1                       65
2                       22
3                       3
4                       1

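How well did your estimator do?

Solution: To fill in the table, we need to calculate the expected frequencies using our ML estimator

$$ p(y) = \frac{e^{-0.61} (0.61)^{y}}{y!} \quad \text{for } y = 0, 1, \ldots $$

and then multiply each one by 200. So we have,

$$ p(0) = \frac{e^{-0.61} (0.61)^0}{0!} = \frac{.543 \cdot 1}{1} = .543 \ \rightarrow \ .543 \cdot 200 = 108.7 $$

$$ p(1) = \frac{e^{-0.61} (0.61)^1}{1!} = \frac{.543 \cdot .61}{1} = .331 \ \rightarrow \ .331 \cdot 200 = 66.2 $$

$$ p(2) = \frac{e^{-0.61} (0.61)^2}{2!} = \frac{.543 \cdot .372}{2} = .101 \ \rightarrow \ .101 \cdot 200 = 20.2 $$

$$ p(3) = \frac{e^{-0.61} (0.61)^3}{3!} = \frac{.543 \cdot .227}{6} = .020 \ \rightarrow \ .020 \cdot 200 = 4 $$

$$ p(4) = \frac{e^{-0.61} (0.61)^4}{4!} = \frac{.543 \cdot .138}{24} = .003 \ \rightarrow \ .003 \cdot 200 = 0.6 $$

The completed table is:

d = number of deaths    Observed Frequency    Expected Frequency
0                       109                   108.7
1                       65                    66.2
2                       22                    20.2
3                       3                     4
4                       1                     0.6

The expected frequencies are very close to the observed ones, so the estimated Poisson model fits these data well.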
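The expected frequencies can also be generated in a few lines of Python, which avoids the small rounding differences that creep in when the probabilities are rounded to three decimals before multiplying by 200. This is a sketch for checking the arithmetic only; `poisson_pmf` is a hypothetical helper written out from formula (1).

```python
import math

LAMBDA_HAT = 0.61   # maximum likelihood estimate from part d
N_UNIT_YEARS = 200

observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability function p(y | lambda) = exp(-lambda) * lambda^y / y!."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


# Expected frequency = number of unit/years times the fitted probability.
expected = {y: N_UNIT_YEARS * poisson_pmf(y, LAMBDA_HAT) for y in observed}

print("deaths  observed  expected")
for y in observed:
    print(f"{y:6d}  {observed[y]:8d}  {expected[y]:8.1f}")
```

The fitted probabilities for 0 through 4 deaths do not sum exactly to one, because the Poisson distribution also puts a small amount of probability on 5 or more deaths.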

3. (15 points) Your company bottles and distributes soft drinks. One product comes in both "regular" and "diet." The company would like to know if customers' diet/regular choice can be predicted from the kind of data they can reasonably expect to obtain (e.g., customer purchases and demographic information), or whether this choice is driven by more difficult-to-measure "preference" factors. This issue is interesting because they are considering an expansion that involves increased distribution expenses, and only makes sense if the mix of sales can be shifted towards the diet product, which has a higher margin. To explore this question, you have been given data on 465 customers who purchased either the diet or regular version of your product last weekend. The data include the following variables:

• DIET - a dummy variable equal to one if the customer purchased the diet product;
• AGE - the customer's age, in years;
• FEMALE - a dummy variable equal to one if the customer is female; and
• INCOME - customer's family income in $1,000s.

To begin with, you specify a probit model in which the dependent variable is DIET, and the independent variables are AGE, FEMALE and INCOME. The probit output is as follows.

cPr (diet = 1 j X�s) = ��3:68(:389)

+ :476(:129)

Female� :111(:012)

Age+ :0047(:0025)

Income

�(a) (3 points) Give an economic interpretation to the signs of the coe¢ cients in this model. Are the e¤ects

of female and age signi�cantly di¤erent from zero? Justify your answer.Solution: The probability that a consumer purchases a diet soft drink is higher for females, increaseswith income, and decreases with age. In other words, females are more likely to choose diet soft drinksand so are wealthier people. Older people are less likely to choose them. To discuss signi�cance you needto calculate t-statistics: For Female, t = :476

:129 = 3:39 and for Age, t =:111:012 = 9:25: Both are greater than

1.96 so they are statistically signi�cantly di¤erent from zero.

(b) (5 points) What is the change in the predicted probability of purchasing a diet soft drink when ageincreases from 33 to 34 for a female with a family income of $33,000? How about when age increases from

53

Page 54: The Big Problems File

20 to 21 for this "average" consumer? Do the answers di¤er? If so, why?Solution: The e¤ect of increasing Age from 33 to 34 (on the predicted probability of purchasing diet)for a female with a family income of $33,000 is

� (3:68 + :476� :111� 34 + :0047� 33)� � (3:68 + :476� :111� 33 + :0047� 33)= � (:537)� � (:648) = �:037 =) 100� (�:037) = �3:7%

(i.e. it decreases the predicted probability by -3.7%). Similarly, the e¤ect of increasing Age from 20 to21 (on the predicted probability of purchasing diet) for a female with a family income of $33,000 is

� (3:68 + :476� :111� 21 + :0047� 33)� � (3:68 + :476� :111� 20 + :0047� 33)= � (1:98)� � (2:09) = �:006 =) 100� (�:006) = �:6%

(i.e. it decreases the predicted probability by -.6%) The marginal e¤ect depends on the level of age sinceprobit implies a nonlinear relationship.

(c) (3 points) What is the e¤ect of gender (i.e. female) on the probability of purchasing a diet soft drink?(You should evaluate the e¤ect for an "average" individual who is 34 years old with a family income of$33,000)Solution: The e¤ect of gender (on the predicted probability of purchasing diet) for 34 year old with afemale with a family income of $33,000 is

� (3:68 + :476� :111� 34 + :0047� 33)� � (3:68� :111� 34 + :0047� 33)= � (:537)� � (:061) = :18 =) 100� :18 = 18%

(i.e. it increases the predicted probability by 18%)

(d) (4 points) After some experimentation, you decide to run a probit model that also includes an Age �Female interaction variable (FemAge). The probit output is as follows.

cPr (diet = 1 j X�s) = ��3:26(:500)

+ 1:29(:758)

Female� :099(:015)

Age+ :0046(:0025)

Income� :023(:021)

FemAge

�In addition, a test of the null hypothesis H0 : �1 = �4 = 0 yields an F-statistic = 7:38What does this new output tell you about the e¤ect of gender on the probability of purchasing a diet softdrink? Be complete.Solution: Here we �nd that Female and FemAge are individually insigni�cant (they have t-statisticsof 1.70 and 1.09 respectively). However, the F-statistic for their joint signi�cance is 7.38 which is greaterthan the the critical values for the F2;1 distribution at even the 1% level, so they are jointly signi�cant.This tells us that females are more likely to choose diet soft drinks, but that this positive e¤ect decreaseswith age. This should not be surprising given the results in b.which matches the observed frequenciesquite closely!

4. (15 points) Your company bottles and distributes soft drinks. One product comes in both "regular" and "diet." The company would like to know if customers' diet/regular choice can be predicted from the kind of data they can reasonably expect to obtain (e.g., customer purchases and demographic information), or whether this choice is driven by more difficult-to-measure "preference" factors. This issue is interesting because they are considering an expansion that involves increased distribution expenses, and only makes sense if the mix of sales can be shifted towards the diet product, which has a higher margin. To explore this question, you have been given data on 465 customers who purchased either the diet or regular version of your product last weekend. The data include the following variables:

DIET - a dummy variable equal to one if the customer purchased the diet product;
AGE - the customer's age, in years;
FEMALE - a dummy variable equal to one if the customer is female; and
INCOME - customer's family income in $1,000s.

To begin with, you start by specifying a probit model in which the dependent variable is DIET, and the independent variables are AGE, FEMALE and INCOME. The probit output is as follows.

$$\widehat{\Pr}(diet = 1 \mid X\text{'s}) = \Phi\Big(\underset{(.389)}{3.68} + \underset{(.129)}{.476}\,Female - \underset{(.012)}{.111}\,Age + \underset{(.0025)}{.0047}\,Income\Big)$$

(a) (3 points) Give an economic interpretation to the signs of the coefficients in this model. Are the effects of female and age significantly different from zero? Justify your answer.

Solution: The probability that a consumer purchases a diet soft drink is higher for females, increases with income, and decreases with age. In other words, females are more likely to choose diet soft drinks and so are wealthier people. Older people are less likely to choose them. To discuss significance you need to calculate t-statistics: for Female, $t = .476/.129 = 3.69$, and for Age, $|t| = .111/.012 = 9.25$. Both are greater than 1.96 in absolute value, so they are statistically significantly different from zero.

(b) (5 points) What is the change in the predicted probability of purchasing a diet soft drink when age increases from 33 to 34 for a female with a family income of $33,000? How about when age increases from 20 to 21 for this "average" consumer? Do the answers differ? If so, why?

Solution: The effect of increasing Age from 33 to 34 (on the predicted probability of purchasing diet) for a female with a family income of $33,000 is

$$\Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 33 + .0047 \times 33) = \Phi(.537) - \Phi(.648) = -.037 \;\Rightarrow\; 100 \times (-.037) = -3.7\%$$

(i.e. it decreases the predicted probability by 3.7 percentage points). Similarly, the effect of increasing Age from 20 to 21 (on the predicted probability of purchasing diet) for a female with a family income of $33,000 is

$$\Phi(3.68 + .476 - .111 \times 21 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 20 + .0047 \times 33) = \Phi(1.98) - \Phi(2.09) = -.006 \;\Rightarrow\; 100 \times (-.006) = -.6\%$$

(i.e. it decreases the predicted probability by 0.6 percentage points). The marginal effect depends on the level of age since probit implies a nonlinear relationship.

(c) (3 points) What is the effect of gender (i.e. female) on the probability of purchasing a diet soft drink? (You should evaluate the effect for an "average" individual who is 34 years old with a family income of $33,000.)

Solution: The effect of gender (on the predicted probability of purchasing diet) for a 34-year-old with a family income of $33,000 is

$$\Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 - .111 \times 34 + .0047 \times 33) = \Phi(.537) - \Phi(.061) = .18 \;\Rightarrow\; 100 \times .18 = 18\%$$

(i.e. being female increases the predicted probability by 18 percentage points).
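The probit differences in parts (b) and (c) can be reproduced with a short script. This is a minimal sketch using only the coefficients reported in the first probit output; the helper name `prob_diet` is an assumption made for illustration, and the standard normal CDF comes from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Coefficients from the first probit output
CONST, B_FEMALE, B_AGE, B_INCOME = 3.68, 0.476, -0.111, 0.0047

def prob_diet(female, age, income):
    """Predicted probability of buying diet; income is in $1,000s."""
    index = CONST + B_FEMALE * female + B_AGE * age + B_INCOME * income
    return Phi(index)

# (b) effect of one more year of age for a female with income 33
effect_33_34 = prob_diet(1, 34, 33) - prob_diet(1, 33, 33)
effect_20_21 = prob_diet(1, 21, 33) - prob_diet(1, 20, 33)

# (c) female versus male at age 34 and income 33
effect_female = prob_diet(1, 34, 33) - prob_diet(0, 34, 33)

print(round(effect_33_34, 3), round(effect_20_21, 3), round(effect_female, 3))
```

The same one-year change in age moves the probability much more near the middle of the distribution (index around 0.5) than in the upper tail (index around 2), which is the nonlinearity the solution refers to.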

(d) (4 points) After some experimentation, you decide to run a probit model that also includes an Age × Female interaction variable (FemAge). The probit output is as follows.

$$\widehat{\Pr}(diet = 1 \mid X\text{'s}) = \Phi\Big(\underset{(.500)}{3.26} + \underset{(.758)}{1.29}\,Female - \underset{(.015)}{.099}\,Age + \underset{(.0025)}{.0046}\,Income - \underset{(.021)}{.023}\,FemAge\Big)$$

In addition, a test of the null hypothesis $H_0: \beta_1 = \beta_4 = 0$ yields an F-statistic = 7.38. What does this new output tell you about the effect of gender on the probability of purchasing a diet soft drink? Be complete.

Solution: Here we find that Female and FemAge are individually insignificant at the 5% level (their t-statistics are 1.70 and 1.09 in absolute value, respectively). However, the F-statistic for their joint significance is 7.38, which is greater than the critical value for the $F_{2,\infty}$ distribution at even the 1% level (4.61), so they are jointly significant. This tells us that females are more likely to choose diet soft drinks, but that this positive effect decreases with age. This should not be surprising given the results in (b).


5. (15 points) On April 15th, 1912, the ocean liner Titanic collided with an iceberg and sank in the Atlantic Ocean. The following dataset contains information on the passengers who were on board the Titanic and whether or not they survived. In particular, for each of the 2201 people who boarded the ship, we observe the following:

Survived - A dummy variable equal to 1 if the passenger survived, 0 if they perished
Female - A dummy variable equal to 1 if the passenger was a female, 0 if male
Child - A dummy variable equal to 1 if the passenger was a child, 0 if the passenger was an adult
First - A dummy variable equal to 1 if the passenger was traveling in first class, 0 if not
Second - A dummy variable equal to 1 if the passenger was traveling in second class, 0 if not
Third - A dummy variable equal to 1 if the passenger was traveling in third class, 0 if not
Crew - A dummy variable equal to 1 if the passenger was a member of the Titanic's crew, 0 if not

Note: The variables First, Second, Third and Crew are mutually exclusive. All passengers belonged to one and only one of these categories.

An LPM regression of Survived on various covariates yields

$$\widehat{Survived} = \underset{(.016)}{.09} + \underset{(.028)}{.31}\,First + \underset{(.026)}{.12}\,Second + \underset{(.021)}{.13}\,Crew + \underset{(.024)}{.49}\,Female + \underset{(.048)}{.18}\,Child, \qquad R^2 = .25$$

A Probit of Survived on the same covariates yields

$$\widehat{\Pr}(survived = 1 \mid X\text{'s}) = \Phi\Big(\underset{(.07)}{-1.24} + \underset{(.095)}{1.03}\,First + \underset{(.095)}{.398}\,Second + \underset{(.085)}{.487}\,Crew + \underset{(.078)}{1.45}\,Female + \underset{(.151)}{.580}\,Child\Big)$$

(a) (4 points) Using each model, calculate the predicted probability of survival for a female child in first class accommodations. Do these predictions make intuitive sense?

Solution: For the LPM, the predicted probability of survival for a female child in first class accommodations is

$$\widehat{\Pr}(survived = 1 \mid X\text{'s}) = .09 + .31 + .49 + .18 = 1.07$$

For the Probit, the predicted probability of survival for a female child in first class accommodations is

$$\widehat{\Pr}(survived = 1 \mid X\text{'s}) = \Phi(-1.24 + 1.03 + 1.45 + .58) = \Phi(1.82) = .966$$

We certainly might expect that the probability of survival would be high for a female child (women and children first!), but the predicted probability from the LPM is greater than one, which is of course nonsense. This is a result of the assumed linearity of the LPM, which is why we prefer the probit (or logit) specification.
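The contrast between the two predictions can be sketched in a few lines of Python. The coefficient dictionaries below copy the reported LPM and probit estimates; the helper `linear_index` and the variable names are assumptions made for this illustration.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Reported estimates (Third class is the omitted category)
lpm = {"const": 0.09, "First": 0.31, "Second": 0.12,
       "Crew": 0.13, "Female": 0.49, "Child": 0.18}
probit = {"const": -1.24, "First": 1.03, "Second": 0.398,
          "Crew": 0.487, "Female": 1.45, "Child": 0.580}

def linear_index(coefs, **x):
    """Constant plus the sum of coefficient times regressor value."""
    return coefs["const"] + sum(coefs[name] * value for name, value in x.items())

# A female child travelling in first class
person = {"First": 1, "Female": 1, "Child": 1}

p_lpm = linear_index(lpm, **person)             # can fall outside [0, 1]
p_probit = Phi(linear_index(probit, **person))  # always inside (0, 1)

print(round(p_lpm, 2), round(p_probit, 3))
```

Because the probit passes the index through a CDF, its prediction stays below one however large the index becomes, whereas the LPM simply adds the coefficients.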

(b) (4 points) Maritime code (the code of the sea) dictates that women and children be saved before adult males (hence the saying "women and children first"). Using each output, discuss whether (and by how much) being a child increased the probability of survival (for the probit, you should perform a comparison for a male child in third class accommodations). Again using each output, discuss whether (and by how much) being female increased the probability of survival (for the probit, you should perform a comparison for a female adult in third class accommodations).

Solution: For the LPM, the effect of child (on the predicted probability of survival) is $\hat{\beta}_5 = .18 \Rightarrow 100 \times .18 = 18\%$ (i.e. it increases the predicted probability by 18 percentage points). For the probit, the effect of child (on the predicted probability of survival) for a male in third class accommodations is:

$$\Phi(-1.24 + .58) - \Phi(-1.24) = \Phi(-.66) - \Phi(-1.24) = .147 \;\Rightarrow\; 100 \times .147 = 14.7\%$$

(i.e. it increases the predicted probability by 14.7 percentage points). For the LPM, the effect of female (on the predicted probability of survival) is $\hat{\beta}_4 = .49 \Rightarrow 100 \times .49 = 49\%$ (i.e. it increases the predicted probability by 49 percentage points). For the probit, the effect of female (on the predicted probability of survival) for an adult in third class accommodations is:

$$\Phi(-1.24 + 1.45) - \Phi(-1.24) = \Phi(.21) - \Phi(-1.24) = .476 \;\Rightarrow\; 100 \times .476 = 47.6\%$$

(i.e. it increases the predicted probability by 47.6 percentage points).
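As a quick check of the two probit comparisons, the sketch below evaluates the fitted probabilities for an adult male in third class (the baseline, where every dummy is zero) and then switches on Child or Female one at a time. The variable names are illustrative assumptions; the coefficients are those reported above.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Probit estimates needed for the comparison
CONST, B_FEMALE, B_CHILD = -1.24, 1.45, 0.580

# Baseline: adult male in third class (all dummies equal to zero)
baseline = Phi(CONST)

child_effect = Phi(CONST + B_CHILD) - baseline    # male child vs. adult male
female_effect = Phi(CONST + B_FEMALE) - baseline  # adult female vs. adult male

# LPM effects are just the coefficients, whatever the baseline
lpm_child_effect, lpm_female_effect = 0.18, 0.49

print(round(child_effect, 3), round(female_effect, 3))
```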

(c) (4 points) Maritime code also dictates that the captain go down (perish) along with the ship. Indeed, the Titanic's captain did not survive the voyage. Using the LPM output only, discuss whether the data provides any evidence that the practice of "going down with the ship" was followed by the rest of the crew. How would your answer change if you had used the probit output instead? Justify your answers.

Solution: For the LPM, the effect of crew (on the predicted probability of survival) is $\hat{\beta}_3 = .13 \Rightarrow 100 \times .13 = 13\%$ (i.e. it increases the predicted probability by 13 percentage points relative to third class, the omitted category). For the probit, the effect of crew depends on who you are comparing them to (crew is like class here since they are mutually exclusive categories). Compared to an adult male in third class, for example, the effect (on the predicted probability of survival) of being a male crewman is

$$\Phi(-1.24 + .487) - \Phi(-1.24) = \Phi(-.753) - \Phi(-1.24) = .118 \;\Rightarrow\; 100 \times .118 = 11.8\%$$

But compared to an adult male in first class, the effect of being a male crewman is

$$\Phi(-1.24 + .487) - \Phi(-1.24 + 1.03) = \Phi(-.753) - \Phi(-.21) = -.191 \;\Rightarrow\; 100 \times (-.191) = -19.1\%$$

So the answer depends on who you are comparing the crew member to (and whether each person is male or female as well). To completely answer this question (which I did not expect) you would also perhaps want to know something about where the crew was quartered and how many were males and females, so you would have an idea of who they should be compared to. Of course, interpreted literally, since some of the crew survived, they didn't all go down with the ship.

(d) (3 points) Several historians have argued that the practice of saving women and children first, even if followed by the passengers of the Titanic, did not extend to children in third (the lowest) class accommodations. What variable could you add to the probit model above to test this hypothesis? Assuming that the historians' hypothesis is correct, what would you expect to find?

Solution: You could add a Child × Third interaction variable and test whether the coefficient is negative (using a one-sided test).

6. You are studying child nutrition in India. You use data for a sample of about 27,000 children under 3 years old, collected in 1998-99 from the Indian National Family and Health Survey. The dependent variable you are interested in is a binary variable, $m_i$, which is equal to one when the child is malnourished, and zero otherwise. You want to study the determinants of child malnutrition, and you obtain the following estimates. Robust standard errors are given in parentheses.

Dependent variable: m_i, dummy = 1 if child is "malnourished"

                               Sample mean   LPM       LPM       LPM       Logit      Probit
                                             (1)       (2)       (3)       (4)        (5)
Constant                                     0.130     0.120     0.132     -1.893     -1.121
                                             (0.007)   (0.005)   (0.005)   (0.044)    (0.024)
Female (dummy = 1 if female)                 -0.020
                                             (0.010)
Mother Illiterate              0.53          0.017     0.021     0.004     0.026      0.015
                                             (0.008)   (0.006)   (0.006)   (0.050)    (0.027)
(Mother Illiterate)*Female                   0.009
                                             (0.011)
Father Illiterate              0.28          0.023     0.027     0.019     0.133      0.074
                                             (0.010)   (0.007)   (0.007)   (0.050)    (0.028)
(Father Illiterate)*Female                   0.009
                                             (0.014)
Birth Order                    2.78          0.001     0.002     0.001     0.006      0.003
                                             (0.002)   (0.001)   (0.001)   (0.011)    (0.006)
(Birth Order)*Female                         0.002
                                             (0.003)
Wealth                         -0.05                             -0.013    -0.126     -0.066
                                                                 (0.001)   (0.015)    (0.008)

Log-Likelihood                                                             -11081.7   -11082.4

"Wealth" is a measure of the family wealth (this measure of wealth can be negative, given the way it is constructed, but the exact construction of this variable is not relevant here), "Mother (Father) Illiterate" are (self-explanatory) dummies, and "Birth Order" represents the birth order of the child (so, older children have a lower birth order). PLEASE NOTE: this table is also reported in the very last page of the exam. You can take that page off the exam book, and use it for your convenience.

(a) (2 points) You first estimate the model in Column (1), using a Linear Probability Model (LPM). Calculate the predicted probability that a boy is malnourished, when both parents are literate, and birth order is 1.

Answer: The predicted probability in this case is 0.130 + 0.001 × 1 = 0.131.

(b) (4 points) Calculate a 95% confidence interval for the predicted probability of being malnourished for a child with the characteristics described in part (a). The covariance between the estimated constant and the slope for birth order is -.00001.

Answer: The 95% confidence interval for the predicted probability is

$$0.131 \pm 1.96 \times SE(\hat{\beta}_0 + 1 \times \hat{\beta}_7) = 0.131 \pm 1.96 \times \sqrt{SE(\hat{\beta}_0)^2 + SE(\hat{\beta}_7)^2 + 2\,Cov(\hat{\beta}_0, \hat{\beta}_7)} = [0.1197,\ 0.1423].$$

Because the covariance is negative, the covariance term reduces the variance of the prediction.
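The interval can be verified numerically. The sketch below is a minimal check, assuming the Column (1) estimates and the covariance given in the question; it uses the standard result that the variance of a sum is the sum of the variances plus twice the covariance. Variable names are illustrative.

```python
import math

# Column (1) estimates and standard errors
b0, se_b0 = 0.130, 0.007          # constant
b_order, se_order = 0.001, 0.002  # slope on birth order
cov = -0.00001                    # Cov(constant, birth order slope), as given

# Predicted probability for a boy, literate parents, birth order 1
prediction = b0 + 1 * b_order

# Var(b0 + b_order) = Var(b0) + Var(b_order) + 2 Cov(b0, b_order)
variance = se_b0 ** 2 + se_order ** 2 + 2 * cov
se_pred = math.sqrt(variance)

lower = prediction - 1.96 * se_pred
upper = prediction + 1.96 * se_pred
print(round(lower, 4), round(upper, 4))
```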

(c) (4 points) You want to see if results are different for boys and girls, so you test the null hypothesis that all coefficients related to the dummy "Female" are equal to zero. The value of the F-statistic is 1.35. Can you reject the null using a 10% significance level?

Answer: We cannot reject the null that all coefficients related to the variable "Female" are zero, as the 10% level critical value for the $F_{4,\infty}$ distribution is 1.94.


(d) (5 points) You re-estimate the model without the variables related to the dummy "Female". The results are in Column (2). Consider two children, both third born (that is, Birth Order = 3), but one whose parents are both illiterate, and one with both parents literate. Calculate the difference in the predicted probability of being malnourished between these two children.

Answer: In this version of the LPM the difference in the predicted probability of being malnourished when parents are illiterate vs. not is 0.021 + 0.027 = 0.048.

(e) (6 points) You are interested in how parental illiteracy affects child malnutrition. However, you suspect that the results in Column (2) are affected by omitted variable bias, as you are not including a measure of household wealth. Therefore, you estimate a new LPM model, reported in Column (3). Compare the estimated slope for "Mother Illiterate" and "Father Illiterate" in Columns (2) and (3). Do the slopes change in the expected direction?

Answer: The slopes become much smaller. They have changed in the expected direction, as we expect wealth to negatively affect the probability of malnutrition and to be negatively correlated with the illiteracy characteristic. Thus, the coefficients in (2) are upward biased, and go down as we include wealth.

(f) (6 points) Now you re-estimate the last model using logit and probit. The results are reported in Columns (4) and (5) respectively. Consider two children, both fourth born (that is, Birth Order = 4), but one whose parents are both illiterate, and one with both parents literate. Calculate the difference in the predicted probability of being malnourished between these two children, for both logit and probit, assuming that wealth is equal to its mean value in the sample. Do probit and logit produce very different answers?

Answer: The changes in predicted probability in Logit and Probit are close, both around 2 percentage points (i.e. the predicted probability of being malnourished when both parents are illiterate is about 2 percentage points higher than when they are both literate for a fourth-born child in a family of average wealth).
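The answer to (f) does not show the arithmetic, so here is a minimal sketch of how it could be carried out, assuming the Column (4) and Column (5) estimates and the sample mean of wealth (-0.05) from the table. The helper names `logistic` and `index` are illustrative assumptions.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def logistic(z):
    """Logistic CDF, used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# (constant, mother illiterate, father illiterate, birth order, wealth)
logit_b = (-1.893, 0.026, 0.133, 0.006, -0.126)   # Column (4)
probit_b = (-1.121, 0.015, 0.074, 0.003, -0.066)  # Column (5)

WEALTH_MEAN = -0.05
BIRTH_ORDER = 4

def index(b, illiterate):
    """Linear index for a fourth-born child; illiterate = 1 if both parents are."""
    const, mother, father, order, wealth = b
    return (const + (mother + father) * illiterate
            + order * BIRTH_ORDER + wealth * WEALTH_MEAN)

logit_diff = logistic(index(logit_b, 1)) - logistic(index(logit_b, 0))
probit_diff = Phi(index(probit_b, 1)) - Phi(index(probit_b, 0))

print(round(logit_diff, 3), round(probit_diff, 3))
```

Both differences come out close to 0.02, even though the logit and probit coefficients themselves differ by a factor of roughly 1.8.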

(g) (4 points) We know that when we estimate the same regression using probit and logit, the estimated coefficients are always very different, because logit and probit make a different assumption as to which is the correct likelihood. However, the results in the table show that the two models produce values for the log-likelihood which are almost identical. Give an intuition as to why this is the case.

Answer: The log-likelihood values are close, as both Logit and Probit predict similar probabilities, in spite of different functional forms and coefficients.

7. You have a sample of 63,168 individuals from rural India, and you want to see if illiteracy is associated with the probability that an individual works as an agricultural wage laborer. Let $L_i = 1$ if the individual is working as an agricultural wage laborer at the time of the interview, and 0 otherwise. Let $ILLIT_i$ be another dummy = 1 if the individual is illiterate, and 0 if instead he/she has some education. The following table contains results from two different regressions. All results have been estimated using a linear probability model (that is, using OLS), and heteroskedasticity-robust standard errors are listed to the right of each coefficient.

Table 1 - LPM - Dependent variable is L_i

                 Model (1)                Model (2)
                 Coefficient   (s.e.)     Coefficient   (s.e.)
Age              -0.0052       0.0001     -0.0038       0.0001
log(income)                               -0.2305       0.0033
ILLIT            0.2252        0.0037     0.1542        0.0038
ILLIT×Female     0.0974        0.0148     0.0617        0.0144
Female           -0.0374       0.0114     0.0142        0.0110
Owns_land        -0.0416       0.0073     -0.0819       0.0070
Fall             0.0054        0.0049     0.0096        0.0047
Winter           0.0014        0.0048     0.0024        0.0047
Spring           0.0046        0.0048     0.0096        0.0047
Intercept        0.4416        0.0092     1.8771        0.0232

R^2              0.0822                   0.1412

where Female is a dummy equal to one if the individual is a woman, Owns_land is a dummy = 1 if the individual owns agricultural land, and Fall, Winter, and Spring are dummies = 1 if the data have been collected during the Fall, Winter, or Spring respectively.

(a) (6 points) Using the results for model (2), interpret the coefficients corresponding to the variables log(income), age, and ILLIT. Are they statistically significant, using a 1% significance level?

Solution: For log(income) the t-statistic is $-0.2305/0.0033 \approx -70$, and the coefficient is therefore significant at the 1% level. A 1% increase in income is associated with a 0.0023 (0.01 × 0.2305) decrease in the probability of working as a wage laborer. This result makes sense since richer individuals are unlikely to be agricultural wage laborers.

The coefficient of Age is also significant, as its t-statistic is equal to $-0.0038/0.0001 \approx -38$. Keeping everything else constant, one more year of age reduces the probability by 0.0038: older individuals may have more experience.

The coefficient of ILLIT indicates that an illiterate man is about 15 percentage points more likely to work as an agricultural wage laborer than a literate man. Its coefficient is highly significant ($t = 0.1542/0.0038 \approx 41$): this makes sense, as education leads to better opportunities.
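The significance checks and the interpretation of the log(income) coefficient can be sketched as follows. This is an illustrative calculation using the model (2) estimates; 2.58 is the usual two-sided 1% critical value of the standard normal, and the variable names are assumptions.

```python
import math

# Model (2) coefficients and robust standard errors
estimates = {
    "log(income)": (-0.2305, 0.0033),
    "Age": (-0.0038, 0.0001),
    "ILLIT": (0.1542, 0.0038),
}
CRITICAL_1PCT = 2.58  # two-sided 1% critical value, standard normal

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_1PCT for name, t in t_stats.items()}

# Effect of a 1% increase in income on the probability:
# the usual approximation and the exact change in log(income)
b_log_income = estimates["log(income)"][0]
approx_effect = b_log_income * 0.01
exact_effect = b_log_income * math.log(1.01)

print(t_stats, significant)
print(round(approx_effect, 4), round(exact_effect, 4))
```

Both versions of the income effect round to -0.0023, which is why the approximation "coefficient divided by 100" is normally used for a regressor in logs.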

(b) (5 points) Using again the results from model (2), calculate the difference between men and women in the predicted impact of being illiterate on the probability of working as an agricultural wage laborer. Is the difference large in economic terms? Is it statistically significant, using a 5% level?

Solution: The difference is measured by the coefficient on ILLIT×Female, 0.0617. The t-statistic is $0.0617/0.0144 \approx 4.28$, which is significant at 5%. The difference is fairly large: being illiterate increases the probability of working as an agricultural wage laborer by 6 percentage points more if you are a woman than if you are a man.

(c) (4 points) Compare the coefficients for ILLIT in model (1) and (2). How would you explain the difference, in terms of omitted variable bias?

Solution: The coefficient for ILLIT (and also the interaction with Female) goes down when log(income) is included. This makes sense, as the coefficient of the omitted variable is negative and one would expect that $Cov(ILLIT, \log(income)) < 0$. Therefore the coefficient in model (1) has positive omitted variable bias [(−) × (−) = (+)].


(d) (4 points) Compare the coefficients for Age in model (1) and (2). How would you explain the difference, in terms of omitted variable bias, and taking into account that older individuals, in this sample, usually have higher income?

Solution: In model (2) the point estimate is larger (less negative: -0.0038 versus -0.0052). The explanation follows the one above: the coefficient of the omitted variable is negative, $Cov(Age, \log(income)) > 0$, and therefore the sign of the omitted variable bias in model (1) will be negative [(−) × (+) = (−)].

(e) (4 points) Use the results from model (2). Is there evidence of important seasonality in the probability of working as an agricultural wage laborer? That is, does it look like the probability is very different across seasons, keeping everything else constant?

Solution: There is no evidence of an important seasonality effect, as all the coefficients for the season dummies are smaller than 0.01, that is, very small. Keeping everything else constant, $\Pr(L_i = 1)$ remains more or less the same.

(f) (5 points) You test formally the hypothesis that the probability of working as an agricultural wage laborer does not depend on the season (keeping everything else constant). State carefully the null and the alternative hypothesis for this test. The value of the F test is 2.25. What do you conclude?

Solution:

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$
$$H_1: \text{at least one of } \beta_1, \beta_2, \beta_3 \neq 0$$

where $\beta_1, \beta_2, \beta_3$ are the coefficients for Fall, Winter, and Spring.

The critical values of an $F_{3,\infty}$ for a significance level of 10% and 5% are respectively 2.08 and 2.60. Hence, one would reject the null if testing at 10% and not reject if testing at 5%. The evidence is somewhat mixed.

8. (21 points) You are interested in the factors that influence whether a person chooses to smoke cigarettes. To analyze this, you have collected a random sample from 807 adults living throughout the United States. Your sample includes the following variables:

• smoke - a dummy variable equal to one if the person is a smoker
• cigprice - the per pack price of cigarettes (in cents)
• income - the person's annual income (in dollars)
• educ - the person's total years of schooling (in years)
• age - the person's age (in years)
• restaurant - a dummy variable equal to one if person lives in a state where smoking is banned in restaurants
• white - a dummy equal to one if the person's race is white

You estimate the following models using LPM, Logit, and Probit (HR Standard Errors in parentheses):

Dependent variable: smoke, a dummy = 1 if the person smokes

               Sample mean   LPM        LPM        Logit      Probit     Probit
                             (1)        (2)        (3)        (4)        (5)
Constant                     .656       .449       -.360      -.199      -.295
                             (.864)     (.119)     (.575)     (.350)     (.045)
ln(cigprice)   4.09          -.068
                             (.208)
ln(income)     9.69          .012
                             (.026)
education      12.5          -.029      -.028      -.132      -.082
                             (.005)     (.005)     (.027)     (.016)
age            41.2          .020       .020       .106       .064
                             (.005)     (.005)     (.027)     (.016)
age^2          1990          -.0003     -.0003     -.001      -.0008
                             (.00006)   (.00005)   (.0003)    (.0002)
restaurant                   -.101      -.099      -.452      -.282
                             (.038)     (.037)     (.176)     (.107)
white                        -.026
                             (.051)

Log-Likelihood                                     -510.5     -510.2     -537.5

(a) (3 points) Focusing on LPM (1), a test of the joint significance of the coefficients on ln(cigprice), ln(income), and white yields an F-statistic of .19. What do you conclude about the joint significance of these three variables?

Solution: The joint test has three restrictions and even the 10% critical value of the $F_{3,\infty}$ distribution (2.08) is larger than .19, so we can't reject the null hypothesis that these three coefficients are jointly zero, even at the 10% level.

(b) (4 points) Focusing on LPM (2), what is the expected effect of a restaurant smoking ban on the predicted probability that someone smokes? Is it significant at the 1% level? Construct a 95% confidence interval for this expected effect.

Solution: For LPM (2), the effect is given by the coefficient on restaurant. The estimated coefficient implies that the smoking ban leads to about a 10 percentage point reduction in the probability that someone smokes. Since the t-stat $= -.099/.037 = -2.67$ is bigger in absolute value than 2.58, this differential is significantly different from 0 at the 1% level. A 95% confidence interval for $\beta_{rest}$ is given by $\hat{\beta}_{rest} \pm 1.96 \times SE(\hat{\beta}_{rest}) = -.099 \pm 1.96 \times .037 = (-.17, -.026)$.
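A small sketch of the test and interval in (b), assuming the LPM (2) coefficient and standard error for restaurant; the variable names are illustrative.

```python
coef, se = -0.099, 0.037  # restaurant coefficient and HR standard error, LPM (2)

t_stat = coef / se
significant_1pct = abs(t_stat) > 2.58  # two-sided 1% critical value

# 95% confidence interval: estimate plus or minus 1.96 standard errors
ci_95 = (coef - 1.96 * se, coef + 1.96 * se)

print(round(t_stat, 2), significant_1pct, tuple(round(b, 3) for b in ci_95))
```

The interval excludes zero, which agrees with the rejection at the 5% level; the comparison with 2.58 is what establishes significance at 1%.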

(c) (6 points) Evaluated at the mean values of the other regressors, what is the expected effect of a restaurant smoking ban on the predicted probability that someone smokes using Logit (3) and Probit (4) respectively?

Solution: For Logit (3), we need to calculate the difference in predicted probabilities, where $F$ denotes the logistic cdf:

$$F(-.360 - (.132 \times 12.5) + (.106 \times 41.2) - (.001 \times 1990) - (1 \times .452)) - F(-.360 - (.132 \times 12.5) + (.106 \times 41.2) - (.001 \times 1990))$$

$$= F(-.085) - F(.367) = \frac{e^{-.085}}{1 + e^{-.085}} - \frac{e^{.367}}{1 + e^{.367}} = -.11$$

For Probit (4), we need to calculate the difference in predicted probabilities:

$$\Phi(-.199 - (.082 \times 12.5) + (.064 \times 41.2) - (.0008 \times 1990) - (1 \times .282)) - \Phi(-.199 - (.082 \times 12.5) + (.064 \times 41.2) - (.0008 \times 1990))$$

$$= \Phi(-.463) - \Phi(-.181) = -.106$$
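The two differences can be recomputed from the table. The sketch below assumes the rounded coefficients of columns (3) and (4) and the reported sample means; the helper names are illustrative. With these rounded inputs the probit index without the ban comes out near -.179 rather than the -.181 shown above, presumably a rounding difference, and the resulting effect is nearly the same.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def logistic(z):
    """Logistic CDF, used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# Sample means of the other regressors
means = {"education": 12.5, "age": 41.2, "age2": 1990}

logit_b = {"const": -0.360, "education": -0.132, "age": 0.106,
           "age2": -0.001, "restaurant": -0.452}
probit_b = {"const": -0.199, "education": -0.082, "age": 0.064,
            "age2": -0.0008, "restaurant": -0.282}

def index_at_means(b, restaurant):
    """Linear index at the sample means, with the ban dummy set to 0 or 1."""
    z = b["const"] + sum(b[k] * means[k] for k in means)
    return z + b["restaurant"] * restaurant

logit_effect = (logistic(index_at_means(logit_b, 1))
                - logistic(index_at_means(logit_b, 0)))
probit_effect = (Phi(index_at_means(probit_b, 1))
                 - Phi(index_at_means(probit_b, 0)))

print(round(logit_effect, 3), round(probit_effect, 3))
```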


(d) (4 points) How many of the people in the sample are smokers? Explain.

Solution: The specification in column (5) is a probit with no right hand side variables, apart from a constant. As we saw in class, estimating a Probit (or Logit) with no regressors is equivalent to estimating a Bernoulli model. In particular, $\hat{p} = \hat{P}(smoke = 1) = \Phi(\hat{\beta}_0)$. Once we have estimated this probability, we can recover the number of smokers in the sample simply by multiplying the probability by the size of the sample. In this case, $\hat{p} = \hat{P}(smoke = 1) = \Phi(\hat{\beta}_0) = \Phi(-.295) = .384$ and $.384 \times 807 \approx 310$, so there are 310 smokers in the sample.
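As a quick numerical illustration of this step (a sketch, with illustrative variable names), the constant-only probit estimate can be converted into a sample share and then into a head count:

```python
from statistics import NormalDist

N = 807            # sample size
beta0_hat = -0.295 # constant from the constant-only probit, column (5)

# With no regressors, Phi(constant) is the fitted share of smokers
p_hat = NormalDist().cdf(beta0_hat)
n_smokers = round(p_hat * N)

print(round(p_hat, 3), n_smokers)
```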

(e) (4 points) Using the output provided, calculate the Pseudo-$R^2$ for both logit (3) and probit (4). Based on these calculations, is there a strong reason to prefer one model over the other in this case?

Solution: The formula for calculating the Pseudo-$R^2$ for a probit is $1 - \frac{\ln(L^{max}_{probit})}{\ln(L^{max}_{bernoulli})}$, while for a logit it's $1 - \frac{\ln(L^{max}_{logit})}{\ln(L^{max}_{bernoulli})}$. As explained in part (d), the Probit in column (5) is equivalent to estimating the Bernoulli model using MLE, so its maximized log likelihood is equal to $\ln(L^{max}_{bernoulli})$. For the logit we have

$$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{logit})}{\ln(L^{max}_{bernoulli})} = 1 - \frac{-510.5}{-537.5} = 1 - .950 = .050$$

For the probit we have

$$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{probit})}{\ln(L^{max}_{bernoulli})} = 1 - \frac{-510.2}{-537.5} = 1 - .949 = .051$$

The two models fit almost equally well, so there is no strong reason to prefer one over the other.
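The Pseudo-$R^2$ calculation is a one-line formula, sketched here with the three log-likelihood values from the table; the function name `pseudo_r2` is an assumption for illustration.

```python
def pseudo_r2(loglik_model, loglik_null):
    """One minus the ratio of the model log likelihood to the constant-only one."""
    return 1.0 - loglik_model / loglik_null

ll_logit, ll_probit = -510.5, -510.2  # columns (3) and (4)
ll_null = -537.5                      # constant-only probit, column (5)

r2_logit = pseudo_r2(ll_logit, ll_null)
r2_probit = pseudo_r2(ll_probit, ll_null)

print(round(r2_logit, 3), round(r2_probit, 3))
```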

9. (20 points) We are going to show how to estimate the mean and variance of a (Normally distributed) population using Maximum Likelihood. Recall that the pdf of a Normal distribution is given by

$$\phi(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty$$

Assume you observe a sample $(x_1, \ldots, x_n)$ of size $n$ from a normal distribution, but you do not know $\mu$ or $\sigma^2$ and would like to estimate them with ML using this sample.

(a) (4 points) Treating the observations $(x_1, \ldots, x_n)$ as known, show that the likelihood function $L_n(\mu, \sigma^2)$ can be written as

$$L_n(\mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right] \qquad (1)$$

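Solution: From the formula for the Normal pdf, we know that the pdf of a single observation is just

$$f_i(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right]$$

The likelihood is simply the product of these pdfs, or

$$\prod_{i=1}^{n} f_i(x) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right]$$

Since the product of exponentials is the exponential of the sum of the exponents (and $\sigma$ is not indexed by $i$), we have

$$\prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right] = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right] = L_n(\mu, \sigma^2)$$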

(b) (6 points) Treating the value of $\sigma^2$ as known, show that the value of $\mu$ that maximizes (1) is in fact the sample mean $\frac{1}{n}\sum_{i=1}^{n} x_i$. Hint: it may be easier to work with the log likelihood here.

Solution: We can work with either the likelihood function or log likelihood, but it's a bit easier to work with the log likelihood. Taking logs of both sides of (1) yields

$$\log L_n(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

We can find the value of $\mu$ that maximizes the log likelihood by taking the derivative with respect to $\mu$ and setting it equal to zero. The derivative is simply

$$\frac{\partial \log L_n(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$$

Setting this equal to zero yields

$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \;\Rightarrow\; \sum_{i=1}^{n}(x_i - \mu) = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i - n\mu = 0 \;\Rightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

(c) (4 points) Remember that we are interested in estimating both $\mu$ and $\sigma^2$. In part (b), we showed that the MLE of $\mu$ is $\frac{1}{n}\sum_{i=1}^{n} x_i$; now let's find the MLE of $\sigma^2$. Treating the value of $\mu$ as both given and equal to $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, re-write the likelihood function (1) as a function of $\sigma^2$ (and $\bar{x}$). Hint: this is very easy!

Solution: We simply replace $\mu$ in (1) with $\bar{x}$, yielding

$$L_n(\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] \qquad (2)$$

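(d) (6 points) Find the value of $\sigma^2$ that maximizes the likelihood function you derived in part c). Hint: it will definitely be easier to work with the log likelihood here.

Solution: In this case it is much easier to work with the log likelihood. Taking logs of both sides of (2) yields

$$\log L_n(\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Again we can find the value of $\sigma^2$ that maximizes the log likelihood by taking the derivative with respect to $\sigma^2$ and setting it equal to zero. The derivative is simply

$$\frac{\partial \log L_n(\sigma^2)}{\partial\sigma^2} = -\frac{n}{2}\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Setting this equal to zero yields

$$-\frac{n}{2}\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 = 0 \;\Rightarrow\; \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{n}{2}\frac{1}{\sigma^2} \;\Rightarrow\; \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$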
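The two closed-form results above (sample mean for $\mu$, average squared deviation for $\sigma^2$) can be checked numerically. The following is a minimal sketch on simulated data; the sample size, seed, true parameter values, and the helper name `normal_loglik` are all illustrative assumptions, not part of the original problem.

```python
import math
import random

def normal_loglik(xs, mu, sigma2):
    """Log likelihood of an iid N(mu, sigma2) sample, i.e. the log of (1)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - ss / (2 * sigma2)

rng = random.Random(0)                          # fixed seed, illustrative data
xs = [rng.gauss(3.0, 2.0) for _ in range(500)]  # assumed true mu = 3, sigma = 2

n = len(xs)
mu_hat = sum(xs) / n                                   # closed-form MLE of mu
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # closed-form MLE of sigma^2

best = normal_loglik(xs, mu_hat, sigma2_hat)
# Nearby parameter values should never give a higher log likelihood.
perturbed = [
    normal_loglik(xs, mu_hat + dm, sigma2_hat * ds)
    for dm in (-0.1, 0.0, 0.1)
    for ds in (0.9, 1.0, 1.1)
]
print(mu_hat, sigma2_hat, best >= max(perturbed))
```

Note that the MLE of $\sigma^2$ divides by $n$, not $n-1$, so it differs slightly from the usual unbiased sample variance.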

10. (28 points total) The website for the Dave Matthews Band (DMB) is not only a great source of information about the band, but also sells CDs, posters, DVDs, clothing, concert tickets, and so forth. Having witnessed the great success that companies like Amazon.com have had in increasing sales by making "personal" recommendations to their customers, the person in charge of marketing for DMB has asked you for help. In order to purchase from the site, one must become a member; this allows demographic and purchase history information to be gathered. You have data on recent orders that were followed by some sort of recommendation from the site. DMB has tried several approaches for making recommendations of other products:

• Recommending another product frequently ordered by purchasers of the product just purchased (e.g. "People who bought Dave's latest album often purchase this new poster too").

• Recommending another product often purchased by members demographically similar to the purchaser (e.g. to younger members: "Do you have a copy of Dave's first album?").

• Recommending another product based on the member's history (e.g. to purchasers of DMB's first live CD: "Check out Dave's second live CD").

• Recommending a randomly selected product.

For each of 1500 orders, you have information on (see footnote 4):

• Age - the member's age (in years)

and five dummy variables:

• Purchase - equal to one if the recommendation resulted in an additional purchase

• Male - equal to one for men

• Others - equal to one if strategy 1 was used

• LikeYou - equal to one if strategy 2 was used

• PastPurch - equal to one if strategy 3 was used

Note that one and only one recommendation was used for each order, so Others = LikeYou = PastPurch = 0 corresponds to the use of strategy 4.

You estimate the following models using LPM, Probit, and Logit models:

Dependent variable: Purchase, dummy = 1 if person makes an additional purchase (standard errors in parentheses)

                  Sample    LPM       LPM       Probit    Logit     Logit
                  mean      (1)       (2)       (3)       (4)       (5)

Constant                    .086      .287      -.153     -.343     -1.62
                            (.013)    (.035)    (.230)    (.430)    (.069)

Others            .222      .091      .091      .440      .821
                            (.024)    (.024)    (.118)    (.220)

LikeYou           .209      .192      .189      .763      1.39
                            (.028)    (.027)    (.114)    (.209)

PastPurch         .235      .073      .073      .369      .690
                            (.023)    (.023)    (.118)    (.221)

Age               22                  -.012     -.077     -.135
                                      (.001)    (.009)    (.018)

Male              .80                 .094      .476      .935
                                      (.020)    (.118)    (.229)

Log-Likelihood                                  -594.5    -595.3    -670.9

1. (a) (2 points) How many of the 1500 recommendations made in this dataset were for a randomly selected product?

Solution: The sample means in the table give us the proportions of people who received recommendations of the first three types. Summing them up we get $.222 + .209 + .235 = .666$, so the remaining fraction (.334) must have received the random recommendation, yielding a total of $.334 \times 1500 = 501$ people.

Footnote 4: Although the data refer to orders, not members, no member appears more than once in these data.


(b) (4 points) How many of the 1500 recommendations resulted in an additional purchase?

Solution: To answer this question, you need to recover the sample proportion of people for whom Purchase = 1, which you can get from the logit with no regressors in column 5. Since

$$E(Purchase) = P(Purchase = 1) = F(\beta_0) = \frac{1}{1+e^{1.62}} = .165$$

we know that $1500 \times .165 \approx 247$ people made an additional purchase. Note: it is also possible to use either LPM 1 or 2 to do this calculation, since we know that the OLS regression line goes through the means, and you have all their values. For example, using LPM 1,

$$E(Purchase) = P(Purchase = 1) = .086 + 0.091 \times 0.222 + 0.192 \times 0.209 + 0.073 \times 0.235 = .163$$

and $1500 \times .163 \approx 245$. It was fine if you did it that way instead (it just would have taken longer).

(c) (4 points) Focusing first on the LPM in column 1, what is the interpretation of the coefficient on LikeYou? Is it significantly different from 0 at the 1% level?

Solution: The coefficient of .192 on LikeYou implies that receiving a recommendation based on strategy 2 increases the probability of making a purchase by 19.2 percentage points, relative to receiving a recommendation for a randomly selected product. Since the t-statistic ($t = \frac{.192}{.028} = 6.86$) is larger in absolute value than 2.58, it is significant at the 1% level.

(d) (4 points) Because you were worried that ignoring the demographic variables (age, gender) might lead to omitted variable bias, you added the Male and Age variables to the LPM model in column 2. Based on the results in column 2, do your concerns appear to have been justified?

Solution: Although the coefficients on Male and Age (in column 2) are statistically significant at any reasonable level, the coefficients on the three strategy variables do not change much at all. Thus, it seems that the two demographic variables satisfy only one of the conditions for omitted variable bias (they help determine Purchase, but they are not correlated with the strategy dummies), and your concerns appear to have been misplaced (in this case). Nonetheless, it is probably a good idea to include them in the analysis anyway, since they are highly significant.

(4 points) For the LPM, Probit and Logit models in columns 2-4, compute the predicted probability of making an additional purchase for a 22 year old male who receives a recommendation based on strategy 1. Are the predicted probabilities very different across the three models?

Solution: For the LPM model, the calculation is simply $.287 + .091 - .012 \times 22 + .094 = .208$ or 20.8%. For the probit model, you must calculate $\Phi(-.153 + .44 - .077 \times 22 + .476) = .176$ or 17.6%. For the logit model, you must calculate $F(-.343 + .821 - .135 \times 22 + .935) = \frac{1}{1+e^{1.557}} = .174$ or 17.4%. The results are pretty similar across the models, which is not surprising given what we learned in class.

(e) (4 points) For the LPM, Probit and Logit models in columns 2-4, compute the change in the predicted probability of making an additional purchase when age increases from 22 to 24, for a male who receives a random recommendation.

Solution: For the LPM model, the calculation is simply $2 \times \beta_4 = 2 \times (-.012) = -.024$, or -2.4 percentage points. For the probit model, you must calculate $\Phi(-.153 - .077 \times 24 + .476) - \Phi(-.153 - .077 \times 22 + .476) = \Phi(-1.525) - \Phi(-1.371) = -.022$, or -2.2 percentage points. For the logit model, you must calculate $F(-.343 - .135 \times 24 + .935) - F(-.343 - .135 \times 22 + .935) = F(-2.648) - F(-2.378) = -.019$, or -1.9 percentage points.
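The predicted probabilities in the last two solutions can be reproduced with a few lines of code. This is a small sketch that copies the coefficients from columns (2)-(4) of the table; the dictionary layout and the function names (`probit_cdf`, `logit_cdf`, `predict`) are illustrative choices, and the normal CDF is written with `math.erf` rather than read from a table.

```python
import math

def probit_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic CDF F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients copied from columns (2)-(4) of the table.
coef = {
    "lpm":    {"const": 0.287,  "others": 0.091, "age": -0.012, "male": 0.094},
    "probit": {"const": -0.153, "others": 0.440, "age": -0.077, "male": 0.476},
    "logit":  {"const": -0.343, "others": 0.821, "age": -0.135, "male": 0.935},
}
link = {"lpm": lambda z: z, "probit": probit_cdf, "logit": logit_cdf}

def predict(model, age, male, others):
    """Predicted P(Purchase = 1); LikeYou and PastPurch are set to zero."""
    b = coef[model]
    index = b["const"] + b["others"] * others + b["age"] * age + b["male"] * male
    return link[model](index)

for m in ("lpm", "probit", "logit"):
    p_strategy1 = predict(m, age=22, male=1, others=1)         # 22-year-old male, strategy 1
    change = predict(m, 24, 1, 0) - predict(m, 22, 1, 0)       # age 22 -> 24, random recommendation
    print(m, round(p_strategy1, 3), round(change, 3))
```

In the LPM the change is the same at every starting age, while in the probit and logit it depends on where the index starts, which is why those two calculations need the full difference of CDFs.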

(f) (3 points) Focusing on the Logit model in column 4, you want to test whether the coefficients on the three included strategy variables are equal. Your joint test yields an F-statistic of 7.36. What do you conclude?

Solution: Since the null hypothesis we are testing is $H_0: \beta_1 = \beta_2 = \beta_3$, there are 2 restrictions, so we need to compare our F-statistic to the critical values of the $F_{2,\infty}$ distribution. Since the 1% critical value for $F_{2,\infty}$ is 4.61, we can reject the null hypothesis that the three coefficients are equal at the 1% level.


(g) (3 points) Calculate the Pseudo-$R^2$'s for the models in columns 3 and 4.

Solution: Using the formula from the book, the Pseudo-$R^2$ for the probit is simply

$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(L^{max}_{probit}\right)}{\ln\left(L^{max}_{bernoulli}\right)} = 1 - \frac{-594.5}{-670.9} = 1 - .886 = .114$$

and the Pseudo-$R^2$ for the logit is simply

$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(L^{max}_{logit}\right)}{\ln\left(L^{max}_{bernoulli}\right)} = 1 - \frac{-595.3}{-670.9} = 1 - .887 = .113$$
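As a quick arithmetic check, the two Pseudo-$R^2$ values can be recomputed from the log likelihoods reported in the table. The helper name `pseudo_r2` is an illustrative choice; the "Bernoulli" log likelihood is taken to be the one from the constant-only logit in column (5), as in the solution.

```python
def pseudo_r2(loglik_model, loglik_null):
    """Pseudo-R^2 = 1 - ln(L_model) / ln(L_null), with L_null from the constant-only model."""
    return 1.0 - loglik_model / loglik_null

loglik_null = -670.9                      # column (5): logit with only a constant
r2_probit = pseudo_r2(-594.5, loglik_null)
r2_logit = pseudo_r2(-595.3, loglik_null)
print(round(r2_probit, 3), round(r2_logit, 3))
```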

2. 6. (5 points overall) Some researchers want to estimate a binary dependent variable model, and they want to decide if logit or probit is more appropriate.

(a) (3 points) Now suppose they have no idea about the true functional form of the conditional probability $P(Y_i = 1 \mid X_{1i}, \ldots, X_{ki})$. One of them suggests using a Hausman test (based, as usual, on the "normalized" distance between the two sets of coefficients estimated with the two different estimators) to decide between logit and probit. Do you think this test would make sense? Why?

Solution: No, the Hausman test does not allow us to test for functional misspecification. Since both models can be misspecified, the distance between the two sets of estimated coefficients will be uninformative. Besides, recall that Logit and Probit always lead to very different point estimates. This just reinforces the uselessness of the Hausman test.

(b) (2 points) Now suppose that the researchers know for sure that either logit or probit is the correct model. Would a Hausman test make sense as a tool to decide which model is correct? Why?

Solution: No, the reasoning is the same as in part (a).


6 Instrumental Variables & Simultaneous Equation Models (SEM)

1. You want to estimate $\beta_1$ in the following regression model:

$$y_i = \beta_0 + \beta_1 x_i + u_i, \qquad cov(x,u) \equiv \sigma_{xu} \neq 0.$$

We know that in this case OLS will be inconsistent because the error is correlated with the regressor. However, suppose that there is another variable $z$ such that $E(u_i \mid z_i) = 0$ and $cov(x,z) \equiv \sigma_{xz} \neq 0$. In midterm 1, we proved that a consistent estimator for $\beta_1$ can be obtained as:

$$\hat{\beta}_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(z_i-\bar{z})y_i}{\frac{1}{n}\sum_{i=1}^{n}(z_i-\bar{z})x_i} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(z_i-\bar{z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(z_i-\bar{z})x_i}. \qquad (5)$$

In this problem, you can assume that all "regularity conditions" needed for the validity of the Law of Large Numbers and for the Central Limit Theorem hold. We assume that we have a large sample, so that $\bar{z}$ can be approximated with its true value $\mu_z$. Using this approximation, one can show that (5) can be re-written as:

$$\sqrt{n}\left(\hat{\beta}_1 - \beta_1\right) = \frac{\left(\sqrt{n}\,\frac{\bar{v}}{\sigma_v}\right)\sigma_v}{\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i}$$

where $v_i \equiv (z_i-\mu_z)u_i$ and $\sigma_v^2 = Var(v) = Var[(z_i-\mu_z)u_i]$.

(a) (4 points) What does $\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i$ converge in probability to? Justify your answer.

All regularity conditions for the LLN hold (so observations are also iid). Then, we know that

$$\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i \xrightarrow{p} E[(z_i-\mu_z)x_i] = E[(z_i-\mu_z)(x_i-\mu_x)] = \sigma_{xz}$$

(b) (4 points) Prove that $E(v_i) = 0$.

Just use the LIE (Law of Iterated Expectations):

$$E(v_i) = E[(z_i-\mu_z)u_i] = E\big[E[(z_i-\mu_z)u_i \mid z_i]\big] = E\Big[(z_i-\mu_z)\underbrace{E(u_i \mid z_i)}_{=0}\Big] = 0$$

(c) (3 points) Suppose that you know that $X_n \xrightarrow{d} N(0,1)$; that is, you know that a certain random variable $X_n$ converges in distribution to a standard normal when the sample size $n$ goes to infinity. Also, let $a$ be a constant. What does $aX_n$ converge in distribution to?

$$aX_n \xrightarrow{d} N\left(a \cdot 0,\; a^2\right) = N\left(0, a^2\right)$$


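(d) (3 points) Using the Central Limit Theorem (CLT), prove that $\sqrt{n}\,\frac{\bar{v}}{\sigma_v} \xrightarrow{d} N(0,1)$ (that is, $\sqrt{n}\,\frac{\bar{v}}{\sigma_v}$ converges in distribution to a standard normal).

We just have to apply the CLT. According to the CLT, if we have iid variables $v_i$ with finite variance:

$$\frac{\bar{v}-\mu_v}{\sigma_v/\sqrt{n}} \xrightarrow{d} N(0,1),$$

but this is exactly what we have here, as we proved in part b) that $\mu_v = 0$, and so

$$\sqrt{n}\,\frac{\bar{v}}{\sigma_v} = \sqrt{n}\,\frac{\bar{v}-\mu_v}{\sigma_v} = \frac{\bar{v}-\mu_v}{\sigma_v/\sqrt{n}} \xrightarrow{d} N(0,1)$$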

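(e) (4 points) Using the results so far, prove that

$$\sqrt{n}\left(\hat{\beta}_1 - \beta_1\right) \xrightarrow{d} N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right).$$

At this point we know that $\sqrt{n}\,\frac{\bar{v}}{\sigma_v} \xrightarrow{d} N(0,1)$, that $\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i \xrightarrow{p} \sigma_{xz}$, and that

$$\sqrt{n}\left(\hat{\beta}_1 - \beta_1\right) = \frac{\left(\sqrt{n}\,\frac{\bar{v}}{\sigma_v}\right)\sigma_v}{\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i}.$$

Putting together all the pieces, we get (rigorously speaking, because of Slutsky's Theorem, but it was not necessary to mention this)

$$\sqrt{n}\left(\hat{\beta}_1 - \beta_1\right) = \frac{\overbrace{\left(\sqrt{n}\,\frac{\bar{v}}{\sigma_v}\right)}^{\xrightarrow{d}\, N(0,1)}\sigma_v}{\underbrace{\frac{1}{n}\sum_{i=1}^{n}(z_i-\mu_z)x_i}_{\xrightarrow{p}\, \sigma_{xz}}} \xrightarrow{d} \frac{\sigma_v}{\sigma_{xz}}\,N(0,1) = N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right)$$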

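(f) (3 points) Based on the results above, how would you construct a 95% confidence interval for $\beta_1$?

We have $\sqrt{n}\left(\hat{\beta}_1 - \beta_1\right) \xrightarrow{d} N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right)$, so that in large samples $\hat{\beta}_1$ is approximately distributed as

$$N\left(\beta_1,\; \frac{1}{n}\frac{\sigma_v^2}{\sigma_{xz}^2}\right).$$

Hence, a confidence interval can be constructed as

$$\hat{\beta}_1 \pm 1.96\sqrt{\frac{1}{n}\frac{\sigma_v^2}{\sigma_{xz}^2}}$$

where, in practice, $\sigma_v^2$ and $\sigma_{xz}$ are replaced by consistent estimates.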
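The construction in parts (a)-(f) can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the data-generating process, the seed, the sample size, and the plug-in estimator of $\sigma_v^2$ (built from IV residuals in place of the unobserved $u_i$) are all assumptions made here, not part of the original problem.

```python
import math
import random

rng = random.Random(1)
n = 20000
beta0, beta1 = 1.0, 2.0            # assumed "true" parameters for the simulation

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0.0, 1.0)       # instrument, independent of u
    ui = rng.gauss(0.0, 1.0)       # structural error
    xi = 0.8 * zi + 0.5 * ui + rng.gauss(0.0, 1.0)   # x is correlated with u
    z.append(zi)
    x.append(xi)
    y.append(beta0 + beta1 * xi + ui)

def mean(a):
    return sum(a) / len(a)

zbar, xbar, ybar = mean(z), mean(x), mean(y)
s_zy = mean([(zi - zbar) * yi for zi, yi in zip(z, y)])
s_zx = mean([(zi - zbar) * xi for zi, xi in zip(z, x)])   # estimates sigma_xz
s_xy = mean([(xi - xbar) * yi for xi, yi in zip(x, y)])
s_xx = mean([(xi - xbar) ** 2 for xi in x])

beta1_iv = s_zy / s_zx             # estimator (5)
beta1_ols = s_xy / s_xx            # inconsistent because cov(x, u) != 0

# Plug-in estimate of sigma_v^2 = Var[(z - mu_z) u], using IV residuals for u.
beta0_iv = ybar - beta1_iv * xbar
v = [(zi - zbar) * (yi - beta0_iv - beta1_iv * xi) for zi, xi, yi in zip(z, x, y)]
vbar = mean(v)
sigma2_v = mean([(vi - vbar) ** 2 for vi in v])

se = math.sqrt(sigma2_v / (n * s_zx ** 2))
ci = (beta1_iv - 1.96 * se, beta1_iv + 1.96 * se)
print(beta1_ols, beta1_iv, ci)
```

With this design the OLS slope settles above the assumed true value, while the IV estimate stays close to it and the interval width shrinks at rate $1/\sqrt{n}$.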


2. You have data on the performance of $n$ mutual fund managers. The performance is measured by a variable $Y$, whose mean, in the sample, is equal to 10. You also have data on whether the manager reads a certain financial newsletter. You estimate the following regression using OLS (robust standard errors in parentheses).

$$\hat{Y}_i = \underset{(0.1)}{5} + \underset{(0.15)}{1.5}\,R_i + \underset{(0.03)}{0.1} \times e_i + \underset{(0.001)}{0.005} \times e_i^2 \qquad (6)$$

where $R_i$ is a dummy = 1 if the $i$-th manager reads the newsletter, and $e_i$ is years of experience.

(a) (5 points) Suppose that the model as specified in (6) is correct. Does it appear that reading the newsletter improves the manager's performance? Evaluate both the statistical and the economic significance of the results.

Solution: Yes, reading the newsletter improves the performance by 1.5, which is 15% of the mean, and seems to be economically significant. This improvement is also statistically significant: the t-statistic $= \frac{1.5}{0.15} = 10$ exceeds the 1% critical value.

(b) You suspect that the above results are unreliable, as the decision to read the newsletter might be correlated with many unobserved characteristics of the manager. For example, suppose that better managers typically read the newsletter. Then you re-estimate the above model using IV, using as instrument for $R_i$ a dummy equal to one if the manager received an offer of free subscription. You know that the offer was sent to a randomly selected group of managers. Do you think the dummy is a valid instrument? Why, or why not?

Solution:

Relevant? Yes, the free offer probably does affect readership: people are more likely to read a newsletter when they are offered a free subscription.

Exogenous? Yes, since the offer has been made randomly. There is no reason to believe that the free offer would have an additional direct effect on $Y_i$ beyond its influence through $R_i$.

Since this variable is both relevant and exogenous, it looks like a valid instrument.

(c) Suppose that the instrument described in the previous part is valid. Would you expect the IV estimate of the newsletter benefit to be larger or smaller than the OLS estimate? Justify your answer!

Solution: Results in (6) are likely to be biased upward, as "readership" will absorb the effect of the manager's "quality" on $Y$. Hence, we expect the IV estimate to be smaller. The variable that controls for the "quality" of managers (call it $Q$, and let $\beta_Q$ be the corresponding slope in the regression) is omitted from the regression. This leads to an upward bias in the OLS estimate, since $Cov(Q_i, R_i) > 0$ and $\beta_Q > 0$.

(d) Suppose again that the instrument described previously is valid. You find another two variables that might be valid instruments, but you are not sure. Since you have more instruments than endogenous variables, you can calculate the J-statistic, which turns out to be equal to 2.5. What do you conclude?

Solution: There are three instruments and one endogenous variable, hence there are 2 overidentifying restrictions and the J-statistic follows a $\chi^2$ distribution with 2 degrees of freedom. Critical values for $\chi^2_2$ at different levels of significance are:

• 10%: 4.61
• 5%: 5.99
• 1%: 9.21

Since the J-statistic is smaller than any of the critical values, we fail to reject the null hypothesis that all the instruments are exogenous.

(e) Now suppose that you also suspect that the population of managers is heterogeneous. Specifically, suppose that now the model is

$$Y_i = \beta_0 + \beta_{Ri} R_i + \beta_e \times e_i + \beta_{e2} \times e_i^2 + u_i,$$

and the first stage will be

$$R_i = \pi_0 + \pi_{iD} D_i + \pi_e \times e_i + \pi_{e2} \times e_i^2 + v_i,$$

where $D_i$ is a dummy equal to one if manager $i$ receives the offer of free subscription. Notice that now there is heterogeneity in $\beta_{Ri}$ (the slope for readership) as well as in $\pi_{iD}$ (the impact that the offer has on readership). You would like to estimate the ATE (Average Treatment Effect) of readership, that is, $E[\beta_{Ri}]$. Do you think that the IV estimator will be consistent for the ATE? Why? If your answer is no, do you think IV will generally lead to a result that is lower or higher than the ATE? Why? (You do not need to use formal proofs here, just use a reasoning along the lines of what we discussed in class for cases where you use IV with heterogeneity.)

Solution: Using the results we saw in class, with heterogeneity in this population we would expect $\hat{\beta}_1^{2SLS}$ to converge in probability to a weighted average of the individual-specific $\beta_{Ri}$ (the slopes for readership), with weights that depend on how much individual $i$'s decision to read the newsletter is affected by the offer of a free subscription (that is, on $\pi_{iD}$). Here we suspect that better managers have higher $\beta_{Ri}$ (they benefit more) but they also always read the newsletter, so the subscription offer does not change their decision (hence for these managers $\pi_{iD}$ is close to zero). This means that the probability limit will attach relatively larger weights to the slopes of the "worst" managers, and hence we should expect an estimate LOWER than the true ATE.

3. Problem (12), 15 points overall. Here are some examples of the instrumental variables regression model. In each case you are given the number of instruments and the J-statistic. We follow Stock & Watson's notation, so the $X$ variables are endogenous variables, and the $W$ variables are exogenous variables. For parts b and c find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5% significance level, and make a decision whether or not to reject the null hypothesis.

(a) (3 points) What is the null hypothesis of the J-statistic?

Solution: If we let $\hat{u}_i^{TSLS}$ be the residuals from TSLS estimation, and set up an OLS estimation:

$$\hat{u}_i^{TSLS} = \delta_0 + \delta_1 Z_{1i} + \ldots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \ldots + \delta_{m+r} W_{ri} + e_i$$

where the $Z$'s are instruments, the $W$'s are exogenous variables, and $e_i$ are regression error terms. The null hypothesis is that $\delta_1 = \ldots = \delta_m = 0$.

(b) (6 points) $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}$ are valid instruments, $J = 2.58$.

Solution: Note that there is one degree of freedom. The $\chi^2_1$ value is 6.63 at the 1% level and 3.84 at the 5% level. Therefore, we cannot reject the null hypothesis that all instruments are exogenous.

(c) (6 points) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, $J = 9.63$.

Solution: Note that there are two degrees of freedom. The $\chi^2_2$ value is 9.21 at the 1% level and 5.99 at the 5% level. Therefore, we can reject the null hypothesis that all instruments are exogenous.
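The critical values used in parts (b) and (c) can be recovered without a table, because the $\chi^2$ distribution has simple closed forms for 1 and 2 degrees of freedom. The following is a minimal sketch limited to those two cases; the function names `chi2_critical` and `j_test` are hypothetical helpers introduced here for illustration.

```python
import math
from statistics import NormalDist

def chi2_critical(df, alpha):
    """Upper-alpha critical value of a chi-square with 1 or 2 degrees of freedom."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2: P(X > c) = exp(-c / 2).
        return -2.0 * math.log(alpha)
    raise ValueError("this sketch only covers df = 1 or 2")

def j_test(j_stat, n_instruments, n_endogenous, alpha):
    df = n_instruments - n_endogenous          # m - k overidentifying restrictions
    crit = chi2_critical(df, alpha)
    return df, crit, j_stat > crit             # True means "reject instrument exogeneity"

for label, j, m, k in [("(b)", 2.58, 2, 1), ("(c)", 9.63, 4, 2)]:
    for alpha in (0.05, 0.01):
        df, crit, reject = j_test(j, m, k, alpha)
        print(label, alpha, df, round(crit, 2), reject)
```

For more degrees of freedom a statistical library or a printed table would be needed; the closed forms above do not generalize.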

4. 14. (16 points) Earnings functions, whereby the log of earnings is regressed on years of education, years of on-the-job training, and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the returns to education, others on discrimination, union and non-union differentials, etc. For all these studies, a major concern has been the fact that ability should enter as a determinant of earnings, but that it is close to impossible to measure and therefore represents an omitted variable. Assume that the coefficient on years of education is the parameter of interest. Given that education is positively correlated with ability, since, for example, more able students attract scholarships and hence receive more years of education, the OLS estimator for the returns to education could be upward-biased. To overcome this problem, various authors have used instrumental variables estimation techniques. For each of the potential instruments listed below, briefly discuss instrument validity.


(a) (4 points) The individual's postal zip code.

Solution: Instrument validity has two components, instrument relevance

$$Cov(Z_i, X_i) \neq 0$$

and instrument exogeneity

$$Cov(Z_i, u_i) = 0$$

The individual's postal zip code is somewhat likely to be relevant, because the ZIP code indicates, after all, where the individual lives, and different people sort across ZIP codes based on their characteristics. But for the same reason the ZIP code is also likely to be ENDOGENOUS, because where the individual lives is likely correlated with lots of unobservable variables (such as school quality, job opportunities, crime levels etc.) which are also going to have a direct impact on earnings.

Alternative answer to (a): One could instead argue that the ZIP code will almost certainly be uncorrelated with the omitted variable, ability, even though some zip codes may attract more able (or more likely richer) individuals, so there is no problem with exogeneity. However, on this view it is an example of a weak (not relevant) instrument, since it is also almost certainly uncorrelated with years of education.

(c) (4 points) The individual's IQ or test-score on a work-related exam.

Solution: There is instrument relevance in this case, since, on average, individuals who do well in intelligence scores or other work-related test scores will have more years of education. Unfortunately there is bound to be a high correlation with the omitted variable ability, since this is what these tests are supposed to measure, so there is a problem with exogeneity.

(d) (4 points) Years of education for the individual's mother or father.

Solution: A non-zero correlation between the mother's or father's years of education and the individual's years of education can be expected. Hence this is a relevant instrument. However, it is not clear that the parent's years of education are uncorrelated with the parent's ability, which, in turn, can be a major determinant of the individual's ability. If this is the case, then years of education of the mother or father is not a valid instrument, due to a failure of exogeneity.

(e) (4 points) Number of siblings the individual has.

Solution: There is some evidence that the larger the number of siblings of an individual, the fewer years of education the individual receives. This makes sense as education is expensive (at least college is). Hence the number of siblings is a relevant instrument. It could also be argued that the number of siblings is uncorrelated with an individual's ability. In that case it also represents an exogenous instrument. However, there is the possibility that ability depends on the attention an individual receives from parents, and this attention is shared with other siblings, so it's a little ambiguous.

5. SEM (10 points) Consider the following model of demand and supply of coffee:

$$\text{Demand: } Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$$

$$\text{Supply: } Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather + v_i$$

(Variables are measured in deviations from means, so that the constant is omitted. Don't worry about this.) What are the expected signs of the various coefficients in this model? Explain. Assume that the price of tea ($P^{Tea}$) and $Weather$ are exogenous variables. Are the coefficients in the supply equation identified? Are the coefficients in the demand equation identified? Are they overidentified? Is this result surprising given that there are more exogenous regressors in the second equation?

Solution: Our intuition about supply and demand curves applies here. We expect:

• $\beta_1 < 0$: increasing price drives demand down.

• $\beta_2 > 0$: coffee and tea are substitutes, so increases in the price of tea should increase demand for coffee (as consumers substitute toward it).

• $\beta_3 > 0$: increasing price causes producers to produce more.

• $\beta_4 > 0$: again, coffee and tea are substitutes, so increases in the price of tea should increase the supply of coffee (there's more potential for profit). You could also have argued that $\beta_4 = 0$, since the price of other goods shouldn't be in the supply equation in a static setting.

• $\beta_5 < 0$: assuming $Weather$ means bad weather. If you assumed it meant good weather (which is fine), the expected sign would be the opposite.

• Changes in $Weather$ will shift the supply equation and thereby trace out the demand equation. Hence the coefficients of the demand equation are exactly identified, since the number of instruments equals the number of endogenous regressors. However, the coefficients of the supply equation are underidentified, since there is no instrumental variable available for estimation. The result is not surprising, since it is not the number of exogenous regressors in the equation that matters when determining whether or not the coefficients are identified. Instead, what matters is the number of instruments available relative to the number of endogenous regressors. It is possible for the regression coefficients to be (over)identified even if there are no exogenous regressors present in the equation.

• If you argued that $\beta_4 = 0$, then both equations are exactly identified.

6. (60 points overall) You have a sample of $n$ schools from a very poor country, and you carry out an experiment to evaluate the effect of a program of free meals on average test scores for young kids. Let $T_i$ be the average test score in school $i$ (after the program) and let $X_i = 1$ for the treatment group (schools where the program is implemented), and $X_i = 0$ for the control group (where the program is not implemented). All schools comply with the rules of the experiment, and randomization is done carefully, so there is no systematic difference between the treatment and the control group before the experiment. Therefore one can recover the treatment effect by estimating the following regression:

$$T_i = \beta_0 + \beta_1 X_i + u_i \qquad (1)$$

You took care of collecting data on test scores, so that $T_i$ is measured without error. Unfortunately, your two research assistants (RA hereafter) have messed up their data, and they made mistakes in reporting which schools belong to the treatment and the control group. Both RAs recorded information on all schools. That is, you do not observe $X_i$, but you observe $X_i^A$ and $X_i^B$, where

$$X_i^A = X_i + e_i^A$$

$$X_i^B = X_i + e_i^B$$

where $e_i^A$ and $e_i^B$ are the errors made by RA A and B respectively. Assume that these errors are totally random, so that they have zero expectation, and that they are uncorrelated with all other random variables.

(a) (6 points) Suppose that you regress $T_i$ on $X_i^A$. Will you get a consistent estimate of the treatment effect $\beta_1$? Justify your answer.

Solution: No, the estimator will be inconsistent: the measurement error introduces bias in the OLS estimator of $\beta_1$:

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_{e^A}^2}$$

Since the ratio $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_{e^A}^2}$ is less than one, $\hat{\beta}_1$ will be biased towards zero. This result is proved in Stock & Watson's chapter on "assessing studies based on multiple regression". You can also prove this result in the usual way. You start from the OLS formula, you plug in the "truth" and you solve. Because you do not observe $X_i$, all you can do is to regress $T_i$ on the mismeasured $X_i^A$. Then your OLS estimator is

$$\hat{\beta}_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i^A - \bar{X}^A\right)T_i}{\frac{1}{n}\sum_{i=1}^{n}\left(X_i^A - \bar{X}^A\right)^2}.$$

But then using the usual LLNs we have

$$\hat{\beta}_1 \xrightarrow{p} \frac{cov\left(X_i^A, T_i\right)}{var\left(X_i^A\right)}$$

where (using the fact that $e_i^A$ is uncorrelated with everything else and $X_i$ is exogenous)

$$var\left(X_i^A\right) = \sigma_X^2 + \sigma_{e^A}^2$$

$$cov\left(X_i^A, T_i\right) = cov\left(X_i + e_i^A,\; \beta_0 + \beta_1 X_i + u_i\right) = cov\left(X_i, \beta_1 X_i\right) = \beta_1\sigma_X^2$$

and the result follows.

(b) (5 points) Let $X_i^M$ be the mean of the two RAs' reports, so $X_i^M = \frac{X_i^A + X_i^B}{2}$. Are $X_i^A$, $X_i^B$, and $X_i^M$ unbiased estimators for the true treatment status $X_i$?

Solution: Yes, all three are unbiased estimators of the true treatment status: $E[X_i^A] = E[X_i + e_i^A] = X_i + E[e_i^A] = X_i$. The same is true for $X_i^B$. Since both $X_i^A$ and $X_i^B$ are unbiased estimators, their linear combination (with weights that sum to 1!) is an unbiased estimator as well. Note: $X_i$ is NOT a random variable here; it is the "parameter" you want to estimate!

(c) (6 points) You still want to estimate $X_i$ using $X_i^A$, $X_i^B$, or $X_i^M$. Let $\sigma_A^2 = Var\left(e_i^A\right)$ and $\sigma_B^2 = Var\left(e_i^B\right)$. Calculate the variance of the three estimators of the true treatment status $X_i$.

Solution:

$$Var(X_i^A) = Var(X_i + e_i^A) = Var(e_i^A) = \sigma_A^2$$

$$Var(X_i^B) = Var(X_i + e_i^B) = Var(e_i^B) = \sigma_B^2$$

$$Var(X_i^M) = Var\left(\frac{X_i^A + X_i^B}{2}\right) = \frac{\sigma_A^2 + \sigma_B^2 + 2\,Cov(e_i^A, e_i^B)}{4} = \frac{\sigma_A^2 + \sigma_B^2}{4}$$

where the last equality holds since $Cov(e_i^A, e_i^B) = 0$ by assumption.

(d) (6 points) Suppose that you have reasons to believe that RA A is more accurate than B. In particular, you believe that $\sigma_B^2 = 2\sigma_A^2$. If you wanted to predict the value of $X_i$, which estimator would you choose, $X_i^A$, $X_i^B$, or $X_i^M$? Justify your answer.

Solution: Since all estimators are unbiased, we want to choose the most efficient one, i.e. the one with the smallest variance. Compare the variances of the estimators in hand:

$$Var(X_i^A) = \sigma_A^2$$

$$Var(X_i^B) = \sigma_B^2 = 2\sigma_A^2$$

$$Var(X_i^M) = \frac{\sigma_A^2 + \sigma_B^2}{4} = \frac{\sigma_A^2 + 2\sigma_A^2}{4} = \frac{3}{4}\sigma_A^2$$

Since the last estimator has the smallest variance among the three, we would choose $X_i^M$ to estimate $X_i$.


(e) (6 points) You want to estimate equation (1) using OLS, and (as in part d) you think that $\sigma_B^2 = 2\sigma_A^2$. Which variable would you choose as regressor, $X_i^A$, $X_i^B$, or $X_i^M$? Justify your answer.

Solution: Since the magnitude of the bias depends on the variance of the measurement error:

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_{meas.error}^2}$$

we would choose $X_i^M$, since its measurement error has the smallest variance among the proposed regressors. This will minimize the bias induced by the measurement error.
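To see how much the choice of regressor matters, the attenuation factor $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_{meas.error}^2}$ can be tabulated for the three candidates. The numbers below are purely illustrative assumptions (a treatment dummy with $P(X_i = 1) = 0.5$, so $\sigma_X^2 = 0.25$, and $\sigma_A^2 = 0.10$); only the ranking, not the values, carries over to the problem.

```python
def attenuation(var_x, var_error):
    """Factor lambda in plim(beta1_hat) = lambda * beta1 under classical measurement error."""
    return var_x / (var_x + var_error)

var_x = 0.25                  # assumed: X is a 0/1 dummy with P(X = 1) = 0.5
var_a = 0.10                  # assumed variance of RA A's error
var_b = 2 * var_a             # the problem's assumption sigma_B^2 = 2 * sigma_A^2
var_m = (var_a + var_b) / 4   # error variance of the average report X^M

factors = {
    "A": attenuation(var_x, var_a),
    "B": attenuation(var_x, var_b),
    "M": attenuation(var_x, var_m),
}
print(factors)
```

The factor closest to one belongs to $X_i^M$, which is the regressor the solution recommends.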

(f) (5 points) Now you entertain the idea of estimating $\beta_1$ using instrumental variables. You decide to use $X_i^A$ as the "endogenous" variable, and $X_i^B$ as the "instrument". Briefly describe the two steps of the estimation procedure (just mention what you regress on what in each stage).

Solution: First stage: regress $X_i^A$ on $X_i^B$ and calculate the fitted values of $X_i^A$, denoted $\hat{X}_i^A$. Second stage: regress $T_i$ on $\hat{X}_i^A$.

(g) (6 points) Now you have to understand whether $X_i^B$ is a valid instrument. Is it relevant as a predictor of $X_i^A$?

Solution: Obviously $X_i^B$ is a relevant instrument: since both $X_i^A$ and $X_i^B$ measure the same thing ($X_i$), $X_i^B$ is a good predictor of $X_i^A$. You can show that $Cov(X_i^A, X_i^B) \neq 0$.

(h) (6 points) Prove that $Cov\left(X_i^B,\; u_i - \beta_1 e_i^A\right) = 0$.

Solution:

$$Cov\left(X_i^B,\; u_i - \beta_1 e_i^A\right) = Cov(X_i + e_i^B,\; u_i - \beta_1 e_i^A)$$

$$= Cov(X_i, u_i) - \beta_1 Cov(X_i, e_i^A) + Cov(e_i^B, u_i) - \beta_1 Cov(e_i^B, e_i^A) = 0$$

All the covariance terms are equal to zero by assumption.

(i) (4 points) Prove that (1) can be rewritten as

$$T_i = \beta_0 + \beta_1 X_i^A + \left(u_i - \beta_1 e_i^A\right)$$

Solution:

$$T_i = \beta_0 + \beta_1 X_i + u_i$$

$$= \beta_0 + \beta_1 X_i + \beta_1 e_i^A - \beta_1 e_i^A + u_i$$

$$= \beta_0 + \beta_1 (X_i + e_i^A) + u_i - \beta_1 e_i^A$$

$$= \beta_0 + \beta_1 X_i^A + \left(u_i - \beta_1 e_i^A\right)$$

(j) (5 points) Does the instrument $X_i^B$ satisfy the exogeneity condition? That is, is $X_i^B$ uncorrelated with the error of the regression you are estimating?

Solution: Yes, given the results in (h) and (i): $Cov(X_i^B,\; u_i - \beta_1 e_i^A) = 0$.


(k) (5 points) To wrap up, is the proposed 2SLS estimator a consistent estimator of the true treatment effect $\beta_1$? Justify your answer.

Solution: Since the instrument is both relevant and exogenous, 2SLS will give a consistent estimator of the true treatment effect.

$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \frac{Cov(X_i^B, T_i)}{Cov(X_i^B, X_i^A)} = \frac{Cov(X_i + e_i^B,\; \beta_0 + \beta_1 X_i + u_i)}{Cov(X_i + e_i^B,\; X_i + e_i^A)} = \frac{\beta_1\sigma_X^2}{\sigma_X^2} = \beta_1$$
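The contrast between parts (a) and (k) can be seen in a short simulation: OLS on one noisy report is pulled toward zero, while using the second report as an instrument recovers the effect. Everything in the sketch below is an assumed setup for illustration (normal measurement errors, the parameter values, the seed, and the helper `cov`); the problem itself only assumes zero-mean, uncorrelated errors.

```python
import random

rng = random.Random(2)
n = 20000
beta0, beta1 = 50.0, 5.0       # assumed "true" values for the simulation
sd_a = 0.3                     # assumed sd of RA A's error
sd_b = sd_a * 2 ** 0.5         # so that sigma_B^2 = 2 * sigma_A^2

t, xa, xb = [], [], []
for _ in range(n):
    x = 1.0 if rng.random() < 0.5 else 0.0    # true treatment status
    t.append(beta0 + beta1 * x + rng.gauss(0.0, 2.0))
    xa.append(x + rng.gauss(0.0, sd_a))       # report by RA A
    xb.append(x + rng.gauss(0.0, sd_b))       # report by RA B

def cov(a, b):
    """Sample covariance (dividing by n)."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / len(a)

beta1_ols = cov(xa, t) / cov(xa, xa)   # attenuated toward zero
beta1_iv = cov(xb, t) / cov(xb, xa)    # X^B used as instrument for X^A
print(beta1_ols, beta1_iv)
```

With one endogenous regressor and one instrument, the ratio of covariances computed here is numerically the same as running the two stages described in part (f).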

7. SEM (24 points overall) You are studying demand and supply of labor in a certain manufacturing sector. You think demand and supply are well represented by the following system of two structural equations

$$H_t = \beta_1 - \beta_2 W_t + \beta_3 P_t + u_{1t} \qquad (1)$$

$$H_t = \beta_4 + \beta_5 W_t + u_{2t} \qquad (2)$$

where $H_t$ is hours worked, $W_t$ is the wage, and $P_t$ is the price of raw material. Here, all $\beta$'s are positive. So, equation (1) represents demand for labor, while equation (2) represents supply. Assume that $P_t$ is exogenous.

(a) (5 points) Briefly explain why one cannot estimate equations (1) and (2) using OLS.

Solution: In equilibrium $H$ and $W$ are simultaneously determined; they are both endogenous. This introduces a simultaneous causality bias (since the error term is correlated with the regressor) and makes OLS estimators inconsistent.

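(b) (6 points) From (1) and (2), and with some algebra, one can derive the reduced form system, which is:

$$H_t = \pi_1 + \pi_2 P_t + v_t^H \qquad (3)$$

$$W_t = \pi_3 + \pi_4 P_t + v_t^W \qquad (4)$$

where

$$\pi_1 = \frac{\beta_4\beta_2 + \beta_5\beta_1}{\beta_2 + \beta_5}, \qquad \pi_2 = \frac{\beta_5\beta_3}{\beta_2 + \beta_5}, \qquad v_t^H = \frac{u_{2t}\beta_2 + \beta_5 u_{1t}}{\beta_2 + \beta_5}$$

You do not have to prove the above result, just take it as given. Prove instead that

$$\pi_3 = \frac{\beta_1 - \beta_4}{\beta_2 + \beta_5}, \qquad \pi_4 = \frac{\beta_3}{\beta_2 + \beta_5}, \qquad v_t^W = \frac{u_{1t} - u_{2t}}{\beta_2 + \beta_5}$$

Solution: Subtract (2) from (1): $0 = \beta_1 - \beta_4 - (\beta_2 + \beta_5)W_t + \beta_3 P_t + u_{1t} - u_{2t}$. Rearranging, obtain:

$$W_t = \frac{\beta_1 - \beta_4}{\beta_2 + \beta_5} + \frac{\beta_3}{\beta_2 + \beta_5}P_t + \frac{u_{1t} - u_{2t}}{\beta_2 + \beta_5}$$

It follows then that:

$$\pi_3 = \frac{\beta_1 - \beta_4}{\beta_2 + \beta_5}, \qquad \pi_4 = \frac{\beta_3}{\beta_2 + \beta_5}, \qquad v_t^W = \frac{u_{1t} - u_{2t}}{\beta_2 + \beta_5}$$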

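(c) (5 points) You estimate the reduced form system of equations (3) and (4), so that now you have estimates $\hat\pi_1, \hat\pi_2, \hat\pi_3, \hat\pi_4$. From these point estimates, can you recover an estimate for the slope of the supply function, $\beta_5$?

Solution: Yes. Since $\pi_2 / \pi_4 = \beta_5$, the estimate is

$$\hat\beta_5 = \frac{\hat\pi_2}{\hat\pi_4}$$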
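The mapping between the structural and reduced form parameters can be verified with a short round trip. This is a sketch under assumed parameter values (the numbers passed in below are hypothetical, not estimates from any data). The expression for $\beta_4$ is not stated in the solution; it follows from the reduced form definitions, since $\pi_1 - \beta_5 \pi_3 = \beta_4$.

```python
def reduced_form(b1, b2, b3, b4, b5):
    """Reduced form coefficients (pi1..pi4) implied by the structural betas."""
    d = b2 + b5
    pi1 = (b4 * b2 + b5 * b1) / d
    pi2 = b5 * b3 / d
    pi3 = (b1 - b4) / d
    pi4 = b3 / d
    return pi1, pi2, pi3, pi4


def supply_from_reduced_form(pi1, pi2, pi3, pi4):
    """Recover the supply intercept and slope (beta4, beta5)."""
    b5 = pi2 / pi4            # ratio of the two coefficients on P_t
    b4 = pi1 - b5 * pi3       # derived here: pi1 - beta5 * pi3 = beta4
    return b4, b5


# Hypothetical structural values: beta1..beta5 = 10, 2, 1.5, 1, 3
pis = reduced_form(10.0, 2.0, 1.5, 1.0, 3.0)
beta4_hat, beta5_hat = supply_from_reduced_form(*pis)
print(beta4_hat, beta5_hat)
```

The round trip returns the supply parameters it started from, which is what exact identification of the supply equation means in practice.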


(d) (4 points) Are the coefficients $\beta_4$ and $\beta_5$ of the supply function identified? Why? (NO algebra is necessary here!)

Solution: Yes, they are identified since the necessary condition for identification is satisfied: there is only one included endogenous variable and one excluded exogenous variable.

(e) (4 points) Are the coefficients $\beta_1$, $\beta_2$ and $\beta_3$ of the demand function identified? Explain. (NO algebra is necessary here!)

Solution: No, the parameters in the demand equation are not identified: there is one included endogenous variable but no excluded exogenous variables.

8. SEM (7 points) Consider the following three-equation system:

$$Y_1 = \alpha_1 + \alpha_2 Y_2 + \alpha_3 X_1 + \alpha_4 X_2 + u_1$$

$$Y_2 = \beta_1 + \beta_2 Y_3 + \beta_3 X_2 + u_2$$

$$Y_3 = \gamma_1 + \gamma_2 Y_2 + u_3$$

(a) (3 points) Which of the above equations (if any) can be (consistently) estimated using OLS? Explain.

Solution: Estimating equations 2 and 3 by OLS will lead to simultaneity bias, since there are endogenous variables on the right hand side of these equations. Equation 1 is more complicated, because $Y_1$ does not appear on the right hand side of equations 2 or 3. Therefore, if $u_1$ is uncorrelated with $u_2$, then equation 1 can be consistently estimated using OLS.

(b) (4 points) Which of the above equations (if any) are under-identified? Exactly identified? Over-identified? Explain your answer and be precise.

Solution: If $u_1$ is uncorrelated with $u_2$, then equation 1 can be estimated by OLS, so identification is not an issue (if $u_1$ is correlated with $u_2$, then equation 1 is under-identified, since there are no available exogenous instruments). The second equation is unidentified because there are no available instruments in the third equation which are excluded from the second. The third is exactly identified (provided that $\beta_3 \neq 0$) since there is a single instrument ($X_2$) available in the second equation for the endogenous variable $Y_2$.

9. SEM (30 points) You are interested in the relationship between cigarette smoking and income.

(a) (3 points) A model to estimate the effects of smoking on annual income (perhaps through lost work days due to illness, or productivity effects) is

$$\ln(income) = \beta_0 + \beta_1 cigs + \beta_2 educ + \beta_3 age + \beta_4 age^2 + u$$

where cigs is the number of cigarettes smoked per day, on average. How do you interpret $\beta_1$?

Solution: Assuming the structural equation represents a causal relationship, $100 \cdot \beta_1$ is the approximate percentage change in income if a person smokes one more cigarette per day. Given effects on productivity, we expect it to be negative (or maybe zero).

(b) (4 points) To reflect the fact that cigarette consumption might be jointly determined with income, a demand for cigarettes equation is

$$cigs = \gamma_0 + \gamma_1 \ln(income) + \gamma_2 educ + \gamma_3 age + \gamma_4 age^2 + \gamma_5 \ln(cigpric) + \gamma_6 restaur + v$$

where cigpric is the price of a pack of cigarettes (in cents), and restaur is a binary variable equal to one if the person lives in a state with restaurant smoking restrictions. Assuming these are exogenous to the individual, what signs would you expect for $\gamma_5$ and $\gamma_6$? Explain your answers.

Solution: Since consumption and price are negatively related, we expect $\gamma_5 < 0$. Similarly, everything else equal, restaurant smoking restrictions should reduce cigarette smoking (since the benefits and opportunities are decreased), so $\gamma_6 < 0$.

(c) (4 points) Under what circumstances is the income equation from part a) identified? How about the demand equation in part b)?

Solution: We need either $\gamma_5$ or $\gamma_6$ (or both) to be different from zero. That is, we need at least one exogenous variable in the cigs equation that is not also in the ln(income) equation. The demand equation is not identified because there are no exogenous variables in the ln(income) equation that are not also in the cigs equation.

(d) (5 points) Estimating the income equation by OLS yields the following output

$$\widehat{\ln(income)} = \underset{(0.17)}{7.80} + \underset{(.0017)}{.0017}\, cigs + \underset{(.008)}{.060}\, educ + \underset{(.008)}{.058}\, age - \underset{(.0008)}{.00063}\, age^2, \quad \hat{R}^2 = .165$$

Discuss the estimate of $\beta_1$ (remember to discuss the size and significance). Does this make sense? If not, what do you think is going wrong?

Solution: The coefficient on cigs implies that cigarette smoking causes income to increase. In particular, smoking one additional cigarette per day is expected to increase income by $100 \cdot .0017 = .17$ percent. However, the coefficient is not statistically different from zero (t-stat = 1) at even the 10% level. This result does not make sense, but since OLS ignores potential simultaneity between income and cigarette smoking, we probably do not have an unbiased or consistent estimate of the true effect.

(e) (3 points) The first stage (or reduced form) estimate for the income equation is

$$\widehat{cigs} = \underset{(23.70)}{1.58} - \underset{(.162)}{.450}\, educ + \underset{(.154)}{.823}\, age - \underset{(.0017)}{.0096}\, age^2 - \underset{(5.766)}{.351}\, \ln(cigpric) - \underset{(1.11)}{2.74}\, restaurn, \quad \hat{R}^2 = .051$$

and the F-stat for the joint significance of ln(cigpric) and restaurn is 3.13. What do these results imply about the instruments available to identify the income equation? Explain.

Solution: While ln(cigpric) is very insignificant, restaurn has the expected negative sign and a t-statistic of about −2.47. (People living in states with restaurant smoking restrictions smoke almost three fewer cigarettes, on average, given education and age.) Moreover, the F-test of joint significance (3.13) is greater than the 5% critical value $F_{2,\infty} = 3$, so we can reject the null that both coefficients are equal to zero. However, the F-stat is nowhere near the cutoff of 10 suggested in S&W, implying that we should be worried about weak instruments.

(f) (4 points) Estimating the income equation by 2SLS yields

$$\widehat{\ln(income)} = \underset{(0.23)}{7.78} - \underset{(.026)}{.042}\, cigs + \underset{(.016)}{.040}\, educ + \underset{(.023)}{.094}\, age - \underset{(.00027)}{.00105}\, age^2$$

How does the 2SLS estimate of $\beta_1$ compare with the OLS estimate (be sure to discuss the size and significance of the coefficient)? Construct a 95% confidence interval for $\beta_1$ and discuss its implications.

Solution: Now the coefficient on cigs is negative, but not quite significant at the 10% level ($|t| = 1.615$). However, the estimated effect is very large: each additional cigarette someone smokes lowers predicted income by about 4.2%. Nonetheless, the 95% CI for $\beta_1$ is very wide:

$$\hat\beta_1 \pm 1.96 \cdot SE(\hat\beta_1) = -.042 \pm 1.96 \cdot .026 = (-.093,\ .009)$$

as is the CI for the estimated percentage impact of smoking one additional cigarette per day:

$$100 \cdot \hat\beta_1 \pm 1.96 \cdot 100 \cdot SE(\hat\beta_1) = -4.2 \pm 1.96 \cdot 2.6 = (-9.3,\ .9)$$
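The interval arithmetic for the 2SLS coefficient on cigs can be reproduced with a small helper. This is a sketch only; the function name `confidence_interval` is an illustrative choice, and 1.96 is the usual large-sample 95% critical value.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Large-sample confidence interval: estimate +/- z * SE."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# 2SLS coefficient on cigs and its standard error from part (f)
lower, upper = confidence_interval(-0.042, 0.026)
t_stat = -0.042 / 0.026

print(f"95% CI: ({lower:.3f}, {upper:.3f}), t = {t_stat:.3f}")
print("interval contains zero:", lower < 0 < upper)
```

Because the interval contains zero, the estimate is not significant at the 5% level, which agrees with the t-statistic reported in the solution.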

(g) (6 points) Do you think that cigarette prices and restaurant smoking restrictions are likely to be exogenous in the income equation? Explain why or why not. The J-stat from the 2SLS estimation is 6.26. What do you conclude?

Solution: Assuming that state level cigarette prices and restaurant smoking restrictions are exogenous in the income equation seems problematic. Incomes are known to vary by region, as do restaurant smoking restrictions. It could be that in states where income is lower (after controlling for education and age), restaurant smoking restrictions are less likely to be in place. Also, cigarette prices (or taxes) may well be lower in low income states. Since there are two instruments and one endogenous regressor, the J-stat is distributed as $\chi^2_1$. The $\chi^2_1$ critical value is 6.63 at the 1% level and 3.84 at the 5% level. Therefore, we can reject the null hypothesis that both instruments are exogenous at the 5% but not the 1% level.

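10. (18 points) Consider the univariate regression model

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

and let the correlation between $X_i$ and $u_i$ be $corr(X_i, u_i) = \rho_{Xu}$. Suppose that the second and third OLS assumptions hold, but the first does not, because $\rho_{Xu}$ is nonzero. Also, let $\sigma_u$ be the standard deviation of $u$ and $\sigma_X$ be the standard deviation of $X$.

(a) (5 points) Recalling that the OLS estimator of $\beta_1$ can be written as

$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum (X_i - \bar{X}) u_i}{\frac{1}{n} \sum (X_i - \bar{X})^2}$$

prove that

$$\hat\beta_1 \xrightarrow{p} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}$$

Solution: Given that the OLS estimator of $\beta_1$ can be written as

$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum (X_i - \bar{X}) u_i}{\frac{1}{n} \sum (X_i - \bar{X})^2}$$

we can use our summation trick (since $\sum (X_i - \bar{X}) \bar{u} = 0$) to re-write this as

$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum (X_i - \bar{X}) (u_i - \bar{u})}{\frac{1}{n} \sum (X_i - \bar{X})^2}$$

We know that

$$\frac{1}{n} \sum (X_i - \bar{X})^2 \xrightarrow{p} \sigma_X^2 \quad \text{and} \quad \frac{1}{n} \sum (X_i - \bar{X}) (u_i - \bar{u}) \xrightarrow{p} cov(u_i, X_i) = \rho_{Xu} \sigma_u \sigma_X$$

Substituting these into the above equation, and noting that $\rho_{Xu} \sigma_u \sigma_X / \sigma_X^2 = \rho_{Xu} \sigma_u / \sigma_X$, yields the desired result

$$\hat\beta_1 \xrightarrow{p} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}$$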
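The probability limit in part (a) can be illustrated with a simulation. The sketch below assumes a specific data-generating process that is not in the original problem: $X_i \sim N(0,1)$ and $u_i = \rho X_i + \sqrt{1-\rho^2}\,\varepsilon_i$ with $\varepsilon_i \sim N(0,1)$, so that $\sigma_u = \sigma_X = 1$ and $corr(X_i, u_i) = \rho$. The values $\beta_0 = 1$ and $\beta_1 = 2$ are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_ols_slope(rho_xu, beta0=1.0, beta1=2.0, n=50_000, seed=0):
    """Draw a sample in which corr(X, u) = rho_xu and return the OLS slope."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        # u has unit variance and correlation rho_xu with x (assumed DGP)
        ui = rho_xu * xi + math.sqrt(1 - rho_xu ** 2) * rng.gauss(0, 1)
        x.append(xi)
        y.append(beta0 + beta1 * xi + ui)
    return ols_slope(x, y)


biased = simulate_ols_slope(rho_xu=0.5)
unbiased = simulate_ols_slope(rho_xu=0.0)
print(f"rho = 0.5: {biased:.3f}   rho = 0.0: {unbiased:.3f}")
```

With $\sigma_u = \sigma_X$, the formula predicts a limit of $\beta_1 + \rho_{Xu}$, so the first estimate should be close to 2.5 and the second close to 2.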


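(b) (5 points) Suppose you now have a potential instrument $Z$ for $X$. Let the correlation between $Z_i$ and $u_i$ be $corr(Z_i, u_i) = \rho_{Zu}$ and the correlation between $Z_i$ and $X_i$ be $corr(Z_i, X_i) = \rho_{ZX}$. Recalling that the 2SLS estimator for $\beta_1$ can be written as

$$\hat\beta_1^{2SLS} = \beta_1 + \frac{\frac{1}{n} \sum (Z_i - \bar{Z}) u_i}{\hat\pi_1 \frac{1}{n} \sum (Z_i - \bar{Z})^2}$$

where $\hat\pi_1$ is the OLS estimator for the slope in the regression of $X$ on $Z$, prove that

$$\hat\beta_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}} \frac{\sigma_u}{\sigma_X}$$

Solution: Given that the 2SLS estimator for $\beta_1$ can be written as

$$\hat\beta_1^{2SLS} = \beta_1 + \frac{\frac{1}{n} \sum (Z_i - \bar{Z}) u_i}{\hat\pi_1 \frac{1}{n} \sum (Z_i - \bar{Z})^2}$$

we again use our summation trick to re-write this as

$$\hat\beta_1^{2SLS} = \beta_1 + \frac{\frac{1}{n} \sum (Z_i - \bar{Z}) (u_i - \bar{u})}{\hat\pi_1 \frac{1}{n} \sum (Z_i - \bar{Z})^2}$$

Now, since $\hat\pi_1$ is the OLS estimator for the regression of $X$ on $Z$, we know that

$$\hat\pi_1 = \frac{\sum (Z_i - \bar{Z}) X_i}{\sum (Z_i - \bar{Z})^2}$$

so we can re-write the 2SLS estimator as

$$\hat\beta_1^{2SLS} = \beta_1 + \frac{\frac{1}{n} \sum (Z_i - \bar{Z}) u_i}{\frac{1}{n} \sum (Z_i - \bar{Z}) X_i}$$

Now, since

$$\frac{1}{n} \sum (Z_i - \bar{Z}) u_i \xrightarrow{p} cov(u_i, Z_i) = \rho_{Zu} \sigma_u \sigma_Z \quad \text{and} \quad \frac{1}{n} \sum (Z_i - \bar{Z}) X_i \xrightarrow{p} cov(X_i, Z_i) = \rho_{ZX} \sigma_X \sigma_Z$$

substitution yields the desired result (the $\sigma_Z$ terms cancel):

$$\hat\beta_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}} \frac{\sigma_u}{\sigma_X}$$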

(c) (4 points) Assume that $\sigma_u = \sigma_X$, so that the population variation in the error term is the same as it is in $X$. Suppose that the instrumental variable $Z$ is slightly correlated with $u$: $corr(Z, u) = .1$. Suppose also that $Z$ and $X$ have a somewhat stronger correlation: $corr(Z, X) = .2$. What is the asymptotic bias in the IV estimator?

Solution: Using $\hat\beta_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}} \frac{\sigma_u}{\sigma_X}$ with $\sigma_u = \sigma_X$, we get $\hat\beta_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{.1}{.2} = \beta_1 + .5$. So the asymptotic bias is .5.

(d) (4 points) How much correlation would have to exist between $X$ and $u$ before OLS has more asymptotic bias than 2SLS?

Solution: Using $\hat\beta_1 \xrightarrow{p} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}$ with $\sigma_u = \sigma_X$, we get $\hat\beta_1 \xrightarrow{p} \beta_1 + \rho_{Xu}$. So we would need to have $\rho_{Xu} > .5$ before the asymptotic bias in OLS exceeds that of IV.
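The comparison in parts (c) and (d) is a direct evaluation of the two bias formulas. A minimal sketch follows; the function names are illustrative, and the common value of 1.0 for $\sigma_u$ and $\sigma_X$ is arbitrary because only their ratio matters.

```python
def ols_asymptotic_bias(rho_xu, sigma_u, sigma_x):
    """plim(beta1_hat) - beta1 for OLS."""
    return rho_xu * sigma_u / sigma_x


def iv_asymptotic_bias(rho_zu, rho_zx, sigma_u, sigma_x):
    """plim(beta1_hat_2SLS) - beta1 for IV with a single instrument."""
    return (rho_zu / rho_zx) * sigma_u / sigma_x


# Part (c): sigma_u = sigma_X, corr(Z, u) = .1, corr(Z, X) = .2
iv_bias = iv_asymptotic_bias(0.1, 0.2, 1.0, 1.0)

# Part (d): OLS is worse than IV only when rho_Xu exceeds the IV bias
ols_worse_at_06 = ols_asymptotic_bias(0.6, 1.0, 1.0) > iv_bias
ols_worse_at_04 = ols_asymptotic_bias(0.4, 1.0, 1.0) > iv_bias

print(iv_bias, ols_worse_at_06, ols_worse_at_04)
```

The example makes the practical point of the problem concrete: a weakly relevant instrument with even a small correlation with the error can produce more bias than OLS.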


11. (30 points) You are interested in estimating a demand equation for the fish sold at Manhattan's Fulton Fish Market, so you go to the market and collect daily price and quantity observations for 97 consecutive days (since the market is closed on the weekends, you collect data for Monday through Friday). Specifically, you have data on the following variables:

• totqty - the total quantity of fish sold that day

• avgprc - the average price of fish sold that day

• mon - a dummy for whether the day is a Monday

• tues - a dummy for whether the day is a Tuesday

• wed - a dummy for whether the day is a Wednesday

• thurs - a dummy for whether the day is a Thursday

• wave2 - the average maximum wave height over the two days prior to the price and quantity data

• wave3 - the average maximum wave height three and four days prior to the price and quantity data

Note: even though we use time subscripts throughout this question, we will not be using time series methods here. We will also maintain the assumption that all errors are homoskedastic.

(a) (3 points) Assume the demand equation can be written for each time period as

$$\ln(totqty_t) = \beta_0 + \beta_1 \ln(avgprc_t) + \beta_2 mon_t + \beta_3 tues_t + \beta_4 wed_t + \beta_5 thurs_t + u_t$$

so that demand is allowed to differ across days of the week. Why is it inappropriate to use OLS to estimate this demand equation? What additional information do we need to consistently estimate the demand equation parameters?

Solution: Since price and quantity are determined in equilibrium (as the intersection of the supply and demand curves), $\ln(avgprc_t)$ will be endogenous in the demand equation above (i.e. correlated with the error $u_t$). To estimate the demand equation, we will need at least one exogenous variable that appears in the supply equation.

(b) (4 points) The variables $wave2_t$ and $wave3_t$ are measures of ocean wave heights over the past several days. What two assumptions do we need to make in order to use $wave2_t$ and $wave3_t$ as instruments for $\ln(avgprc_t)$ in estimating the demand equation? Be sure to discuss how these assumptions relate to the demand and supply equations.

Solution: For $wave2_t$ and $wave3_t$ to be valid IVs for $\ln(avgprc_t)$, we need two assumptions. The first is that these can be properly excluded from the demand equation (i.e. that they are exogenous). This may not be entirely reasonable, as wave heights are determined partly by weather, and demand at a local fish market could depend on weather, but it is probably reasonable to treat them as exogenous. The second assumption is that at least one of $wave2_t$ and $wave3_t$ appears in the supply equation (i.e. that they are relevant). It seems reasonable that supply would depend on the conditions at sea.

(c) (5 points) The first stage of a 2SLS regression yields

$$\widehat{\ln(avgprc_t)} = \underset{(0.14)}{-1.02} - \underset{(.114)}{.012}\, mon_t - \underset{(.1119)}{.0090}\, tues_t + \underset{(.112)}{.051}\, wed_t + \underset{(.111)}{.124}\, thurs_t + \underset{(.021)}{.094}\, wave2_t + \underset{(.020)}{.053}\, wave3_t, \quad \hat{R}^2 = .165$$

and a test of the joint significance of $wave2_t$ and $wave3_t$ yields an F-stat = 19.1, while a test of the joint significance of the day dummies yields an F-stat = .53. Are $wave2_t$ and $wave3_t$ individually significant at the 1% level? What do the results of this first stage regression reveal about our instruments?

Solution: In the first stage, we are primarily concerned with whether or not our instruments are weak. In this case, our instruments $wave2_t$ and $wave3_t$ are both individually and jointly significant. Their t-stats are $\frac{.094}{.021} = 4.48$ and $\frac{.053}{.020} = 2.65$, which are both greater than 2.58, so they are each individually significant at the 1% level. Since the F-stat = 19.1 > 10, we conclude that our instruments are not weak (using our Rule of Thumb).

(d) (4 points) Estimating the demand equation by 2SLS yields

$$\widehat{\ln(totqty_t)} = \underset{(.18)}{8.16} - \underset{(.327)}{.816}\, \ln(avgprc_t) - \underset{(.229)}{.307}\, mon_t - \underset{(.226)}{.685}\, tues_t - \underset{(.224)}{.521}\, wed_t + \underset{(.225)}{.095}\, thurs_t$$

where the standard errors in parentheses are the correct ones (i.e. take the two-stage procedure into account). What is the interpretation of the coefficient on $\ln(avgprc_t)$? Does its magnitude seem reasonable? Construct a 95% confidence interval for this coefficient.

Solution: Since this is a ln-ln specification, the coefficient on $\ln(avgprc_t)$ represents the elasticity of demand. The estimated coefficient implies that, holding all other variables constant, we expect a 1% increase in price to reduce quantity demanded by about .82% (or equivalently, a 10% increase in price to reduce quantity demanded by about 8.2%). This seems like a reasonable magnitude. A 95% confidence interval for this coefficient is given by $-.816 \pm 1.96 \cdot .327 = [-1.46,\ -.175]$.

(e) (3 points) Since we have two instruments and one endogenous variable, the demand equation is over-identified. The test of over-identifying restrictions yields an F-statistic of .013. What do you conclude?

Solution: We need to construct the J-statistic, which we know is given by $J = mF \xrightarrow{d} \chi^2_{m-k}$, where $m$ is the number of instruments and $k$ is the number of endogenous variables. In this case $m = 2$ and $k = 1$, so $J = 2 \cdot .013 = .026$. The 10% critical value of the $\chi^2_1$ distribution is 2.71, so we cannot reject the null hypothesis that our instruments are exogenous (which is, of course, a good thing).
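The J-test calculation can be sketched in a few lines. The p-value helper below covers only the one-degree-of-freedom case ($m - k = 1$), where $P(\chi^2_1 > x) = P(|Z| > \sqrt{x})$ can be written with the complementary error function; that restriction and the function names are choices made for this illustration.

```python
import math


def j_statistic(m, f_stat):
    """J = m * F, with m the number of instruments."""
    return m * f_stat


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared(1) variable (one df only)."""
    return math.erfc(math.sqrt(stat / 2.0))


j_fish = j_statistic(2, 0.013)          # over-identification test in this part
p_fish = chi2_1_pvalue(j_fish)

p_at_critical = chi2_1_pvalue(2.71)     # should be close to .10
p_cigs = chi2_1_pvalue(6.26)            # J-stat from the cigarette problem

print(f"J = {j_fish:.3f}, p = {p_fish:.3f}")
print(f"p at 2.71 = {p_at_critical:.3f}, p at 6.26 = {p_cigs:.4f}")
```

A p-value far above .10 for the fish market data matches the conclusion that exogeneity is not rejected, while the value for a J-stat of 6.26 falls between .01 and .05, matching the earlier conclusion for the cigarette instruments.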

(f) (3 points) Given that the (unspecified) supply equation evidently depends on the wave variables, what two assumptions would we need to make in order to estimate the price elasticity of supply?

Solution: To estimate the supply elasticity, we would have to assume that the day-of-the-week dummies do not appear in the supply equation, but they do appear in the demand equation. Part d) provides evidence that there are day-of-the-week effects in the demand function. But we cannot know about the supply function.

(g) (4 points) Here are the results from the 2SLS estimation of a possible supply equation for this industry

$$\widehat{\ln(totqty_t)} = \underset{(2.23)}{10.82} + \underset{(2.24)}{2.13}\, \ln(avgprc_t) - \underset{(.212)}{.267}\, wave2_t - \underset{(.139)}{.169}\, wave3_t$$

What is the interpretation of the coefficient on $\ln(avgprc_t)$? Is it statistically significant at the 10% level?

Solution: Since this is also a ln-ln specification, the coefficient on $\ln(avgprc_t)$ in this estimation represents the elasticity of supply. The estimated coefficient implies that, holding all other variables constant, we expect a 1% increase in price to increase quantity supplied by about 2.13% (or equivalently, a 10% increase in price to increase quantity supplied by about 21.3%). This seems like a pretty big effect, maybe too big, but I didn't expect you to say this. Also, the t-stat on $\ln(avgprc_t)$ is $\frac{2.13}{2.24} = .95$, meaning that we cannot reject that there is no effect of price on supply, even at the 10% level. These results do not instill confidence in our supply equation estimates.


(h) (4 points) Consider again the results in part c. Do they provide any explanation for what you found in part g? Why or why not?

Solution: The regression in part c is also the first stage of the 2SLS estimation of the supply curve estimated in part g. None of the day of week dummies show up as individually significant in that regression. Moreover, these four variables are jointly insignificant ($F = .53 < 1.94 = F^{10\%}_{4,\infty}$) and the F-stat is nowhere near our Rule of Thumb cutoff of 10. This means that we have very weak instruments in our supply regression, which can cause both our coefficient estimates and standard errors to explode. This looks a lot like what we found in part g.

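2. SEM. 4. (26 points total) Consider the two equation structural model given by

$$y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1 \quad (1)$$

$$y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2 \quad (2)$$

where $\alpha_1 < 0$ and $\alpha_2 > 0$; $z_1$ and $z_2$ are each uncorrelated with both $u_1$ and $u_2$; and $u_1$ is uncorrelated with $u_2$ by assumption.

(a) (5 points) The reduced form equation for $y_2$ takes the form

$$y_2 = \pi_{21} z_1 + \pi_{22} z_2 + v_2 \quad (3)$$

where $\pi_{21}$, $\pi_{22}$ and $v_2$ are functions of the structural parameters ($\alpha$'s, $\beta$'s, and $u$'s) in equations (1) and (2). Using equations (1) and (2), solve for this reduced form equation (in terms of the structural parameters).

Solution: If we plug the right hand side of (1) into (2) we get

$$y_2 = \alpha_2 (\alpha_1 y_2 + \beta_1 z_1 + u_1) + \beta_2 z_2 + u_2$$

or

$$(1 - \alpha_2 \alpha_1) y_2 = \alpha_2 \beta_1 z_1 + \beta_2 z_2 + \alpha_2 u_1 + u_2$$

Since $\alpha_2 \alpha_1 \neq 1$ (due to the sign restrictions imposed in the set-up), we can divide through by $(1 - \alpha_2 \alpha_1)$, yielding

$$y_2 = \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2 \alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1}$$

so

$$\pi_{21} = \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1}, \quad \pi_{22} = \frac{\beta_2}{1 - \alpha_2 \alpha_1}, \quad \text{and} \quad v_2 = \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1}$$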

(b) (6 points) Using the reduced form equation you solved for in part a, explain why using OLS to estimate equation (1) will yield an inconsistent estimate of $\alpha_1$. What is the direction of the inconsistency (i.e. will OLS over-estimate or under-estimate the true effect)?

Solution: Our reduced form equation

$$y_2 = \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2 \alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1}$$

clearly illustrates the problem of simultaneity bias: $y_2$ is a function of the structural error ($u_1$) from equation (1), which violates OLS assumption 1. We can find the sign of the bias by calculating the covariance between $y_2$ and $u_1$ (since we know that the sign of the bias will coincide with the sign of their correlation).

$$cov(y_2, u_1) = cov\left(\frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2 \alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1},\ u_1\right)$$

$$= cov\left(\frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1},\ u_1\right) = \frac{\alpha_2}{1 - \alpha_2 \alpha_1} cov(u_1, u_1) = \frac{\alpha_2}{1 - \alpha_2 \alpha_1} \sigma^2_{u_1}$$

Since $\alpha_1 \alpha_2 < 0$ and $\alpha_2 > 0$, the (asymptotic) bias is positive and OLS will over-estimate the true effect.

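(c) (5 points) The reduced form equation for $y_1$ takes the form

$$y_1 = \pi_{11} z_1 + \pi_{12} z_2 + v_1 \quad (4)$$

where $\pi_{11}$, $\pi_{12}$ and $v_1$ are functions of the structural parameters ($\alpha$'s, $\beta$'s, and $u$'s) in equations (1) and (2). Using equations (1) and (2), solve for this reduced form equation.

Solution: Following similar steps to part (a) yields

$$y_1 = \frac{\beta_1}{1 - \alpha_2 \alpha_1} z_1 + \frac{\alpha_1 \beta_2}{1 - \alpha_2 \alpha_1} z_2 + \frac{\alpha_1 u_2 + u_1}{1 - \alpha_2 \alpha_1}$$

so

$$\pi_{11} = \frac{\beta_1}{1 - \alpha_2 \alpha_1}, \quad \pi_{12} = \frac{\alpha_1 \beta_2}{1 - \alpha_2 \alpha_1}, \quad \text{and} \quad v_1 = \frac{\alpha_1 u_2 + u_1}{1 - \alpha_2 \alpha_1}$$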

(d) (4 points) Can we obtain consistent estimates of the parameters ($\pi$'s) in equations (3) and (4) using OLS? Why or why not?

Solution: Yes. Both $z_1$ and $z_2$ are uncorrelated with both $u_1$ and $u_2$ by assumption. Since the errors (the $v$'s) in the reduced form equations (3) and (4) are functions of the $u$'s, the $z$'s are also uncorrelated with these reduced form errors, implying that OLS Assumption 1 is satisfied for equations (3) and (4). This means that estimating (3) and (4) by OLS will yield consistent estimates of the $\pi$'s.

(e) (6 points) Estimation of the reduced form equations (3) and (4) by OLS yields

$$y_2 = 1.42\, z_1 + 1.72\, z_2$$

$$y_1 = 1.29\, z_1 - 2.07\, z_2$$

where we will ignore the issue of calculating standard errors and focus only on parameter estimates. Can you use these estimates to construct consistent estimates of $\alpha_1$ and $\beta_1$? If yes, do so. If not, explain why you can't. Hint: think about using ratios of the estimated parameters...

Solution: Using the reduced form equations, we see that we can construct a consistent estimate of $\alpha_1$ using the ratio

$$\frac{\pi_{12}}{\pi_{22}} = \frac{\alpha_1 \beta_2}{1 - \alpha_2 \alpha_1} \cdot \frac{1 - \alpha_2 \alpha_1}{\beta_2} = \alpha_1$$

$$\frac{\hat\pi_{12}}{\hat\pi_{22}} = \frac{-2.07}{1.72} = -1.20$$

Similarly, we can construct a consistent estimate of $\alpha_2$ using

$$\frac{\hat\pi_{21}}{\hat\pi_{11}} = \frac{1.42}{1.29} = 1.10$$

Finally, since a consistent estimate of $1 - \alpha_2 \alpha_1$ is $1 - (1.10 \cdot (-1.20)) = 2.32$, a consistent estimate of $\beta_1$ is then

$$(1 - \alpha_2 \alpha_1)\, \pi_{11} = 2.32 \cdot 1.29 = 3$$
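The ratio calculations in part (e) amount to indirect least squares. The sketch below reproduces them from the four reported reduced form estimates; the function name is an illustrative choice, and the code carries full precision rather than the rounded intermediate values used in the solution.

```python
def indirect_least_squares(pi11, pi12, pi21, pi22):
    """Recover (alpha1, alpha2, beta1) from reduced form coefficients.

    pi11, pi12: coefficients on z1, z2 in the y1 reduced form
    pi21, pi22: coefficients on z1, z2 in the y2 reduced form
    """
    alpha1 = pi12 / pi22                     # alpha1 * beta2 / beta2
    alpha2 = pi21 / pi11                     # alpha2 * beta1 / beta1
    beta1 = (1 - alpha2 * alpha1) * pi11     # undo the common denominator
    return alpha1, alpha2, beta1


alpha1_hat, alpha2_hat, beta1_hat = indirect_least_squares(
    pi11=1.29, pi12=-2.07, pi21=1.42, pi22=1.72
)
print(f"alpha1 = {alpha1_hat:.2f}, alpha2 = {alpha2_hat:.2f}, beta1 = {beta1_hat:.2f}")
```

The recovered signs agree with the restrictions $\alpha_1 < 0$ and $\alpha_2 > 0$ imposed in the set-up.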

3. (31 points total) You are interested in the impact children have on a woman's choice of whether to work (i.e. how fertility impacts labor supply). Specifically, you want to know whether (and by how much) a woman's labor supply falls when she has an additional child. You have collected data from the 1980 U.S. Census on 250,000 married women aged 21-35 with two or more children. Your dataset contains the following variables:

• weeks - total weeks the woman worked in 1979

• morekids - dummy equal to 1 if the woman has three or more kids

• age - age of the woman

• samesex - dummy equal to 1 if the two oldest children are of the same sex (i.e. boy-boy or girl-girl)

(a) (4 points) A simple OLS regression of weeks on morekids yields the following output

$$weeks = \underset{(.056)}{21.07} - \underset{(.087)}{5.39}\, morekids$$

On average, do women with more than two children work less than women with two children? How much less? Is the difference statistically significant at the 1% level?

Solution: The OLS regression shows that women with three or more kids work 5.39 fewer weeks (per year) than women with two kids. The difference is statistically significant at the 1% level since $|t| = \frac{5.39}{.087} = 61.95 > 2.58$.

(b) (4 points) Is there a good reason to think that OLS is an inappropriate technique for estimating the causal effect of fertility (morekids) on labor supply (weeks)? Explain why or why not.

Solution: There is a good reason to believe OLS is inappropriate. Both the LHS and RHS variables here are choice variables of the woman and are probably influenced by the same underlying unobservables. In particular, the type of unobserved variable that would lead a woman to work a greater than average number of weeks would likely also lead her to have fewer children (meaning that morekids is likely negatively correlated with the regression error, causing $\hat\beta_1$ to be negatively biased).

(c) (4 points) You decide to examine whether the decision to have more than two children is influenced by the gender of the first two children. To examine this issue, you regress the variable morekids on samesex (using a LPM), yielding the following results

$$morekids = \underset{(.001)}{.346} + \underset{(.002)}{.068}\, samesex$$

Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant at the 1% level?

Solution: The regression reveals that couples whose first two children are the same sex are indeed more likely to have a third child. Specifically, they are 6.8 percentage points more likely (relative to a baseline probability of .346), which is a pretty big effect. It is statistically significant at the 1% level since $t = \frac{.068}{.002} = 34 > 2.58$.

(d) (4 points) Do you think samesex is a valid instrument for an IV regression of weeks on morekids? Why or why not? Is samesex a weak instrument?

Solution: Since people don't/can't really choose the gender of their children (yet!), it would seem to be a nice source of random variation that pushes around how many kids people choose to have (i.e. it's exogenous). We have already established relevance in part c), since the coefficient on samesex was positive and highly significant. Moreover, using the F-statistic based rule of thumb, it is also quite strong, since $F = t^2 = 1156 \gg 10$.
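With a single instrument, the first-stage F-statistic is the square of the instrument's t-statistic, so the rule-of-thumb check reduces to one division. A minimal sketch follows; the function names and the default cutoff argument are illustrative, with the cutoff of 10 taken from the rule of thumb used in the solution.

```python
def first_stage_f(coef, std_error):
    """First-stage F for a single instrument: the squared t-statistic."""
    t_stat = coef / std_error
    return t_stat ** 2


def is_weak(f_stat, cutoff=10.0):
    """Rule of thumb: treat the instrument as weak when F < cutoff."""
    return f_stat < cutoff


# Coefficient and standard error on samesex from part (c)
f_samesex = first_stage_f(0.068, 0.002)
print(f"F = {f_samesex:.0f}, weak: {is_weak(f_samesex)}")
```

The same check applies to any first stage with one instrument; with several instruments, the joint F-statistic on the instruments is needed instead of a squared t-statistic.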


(e) (4 points) You now estimate the 2SLS regression of weeks on morekids using samesex as the instrument. Here are the results:

$$weeks = \underset{(.487)}{21.42} - \underset{(1.27)}{6.31}\, morekids$$

How large is the fertility effect on labor supply now? Is it statistically significant?

Solution: The 2SLS regression shows that women with three or more kids work 6.31 fewer weeks (per year) than women with two kids. The difference is statistically significant at the 1% level since $|t| = \frac{6.31}{1.27} = 4.97 > 2.58$.

(f) (4 points) Would the woman's level of educational attainment be a valid instrument? Why or why not? How about the woman's age?

Solution: Neither of these candidates seems like a good instrument. While both are likely to be relevant (more educated women might have fewer kids, and older women likely have more kids, just because they've had more time to have them!), neither is likely to be exogenous. Education and age probably have strong direct effects on weeks worked!

(g) (4 points) We don't observe the level of education in this dataset, but we do have age. Here's what we get when we use age (as well as samesex) as an instrument in the 2SLS regression of weeks on morekids

$$weeks = \underset{(.352)}{6.97} + \underset{(.921)}{31.67}\, morekids$$

What is the fertility effect on labor supply now? Is it statistically significant? Do you think you have correctly recovered the causal effect now? Why or why not?

Solution: Now we find the opposite effect: women with three or more kids work 31.67 more weeks per year than women with two kids. The result is also highly significant, since $t = \frac{31.67}{.921} = 34.4$. However, you should not be comfortable concluding that you have recovered a causal effect; this is an implausible outcome, likely driven by the fact that we have included an invalid instrument in our 2SLS regression! (We will confirm this below.)

(h) (3 points) The J-statistic from the regression in part g) is 791.8. What do you conclude?

Solution: Since we have two instruments and one endogenous variable, the J-statistic will have a $\chi^2_1$ distribution. The 1% critical value for the $\chi^2_1$ distribution is 6.63, so with a J-statistic of 791.8 we reject the null at any level of significance, meaning that at least one of our instruments is likely endogenous (not surprising, given the results in part g)).

2. (24 points total) Never tiring of the same old subject, you decide to once again look at the returns to education. To do so, you collected data from the NLSY on 1230 white males who are currently in the workforce. You have data on the following variables:

• ln(wage) - log of the worker's hourly wage (in dollars)

• educ - highest grade of school completed

• exper - years of experience in the workforce

• ctuit - average level of college tuition (in $1000s) in the state where the person resides

(a) (4 points) To start with, you simply regress ln(wage) on educ using OLS. Here's what you find

$$\ln(wage) = \underset{(.099)}{1.09} + \underset{(.007)}{.101}\, educ$$

What is the estimated effect of an additional year of schooling? Construct a 95% confidence interval for this effect.

Solution: Since this is a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 10.1% increase in hourly wages. The 95% confidence interval is simply $.101 \pm 1.96 \cdot .007 = .101 \pm .01372 = (.087,\ .115)$.

(b) (4 points) In search of a valid instrument, you try regressing educ on ctuit. Here's what you find

$$educ = \underset{(.067)}{13.04} - \underset{(.079)}{.049}\, ctuit$$

What do you conclude about the potential usefulness of ctuit as an instrument for educ?

Solution: We find (perhaps not all that surprisingly) that the average college tuition in the state is negatively correlated with a person's highest grade of school completed. However, it is not particularly significant: $|t| = \frac{.049}{.079} = .62$. So even though college tuition would seem to be exogenous (probably not related to a given person's wage) and potentially relevant (people consume less of things that are more expensive, including education), we don't really have the best tuition measure here (the average tuition in the state where the person resides is probably not the relevant price). So it's unclear exactly what we are capturing here. More importantly, $F = t^2 = .384 \ll 10$, so it's very weak. Overall, it doesn't look like a good instrument.

(c) (4 points) You decide to include some additional regressors in your original OLS regression. In particular, you add $exper$, $exper^2$, and some dummy variables for various geographic regions. You find the following results

$$\ln(wage) = \underset{(.241)}{-.507} + \underset{(.009)}{.137}\, educ + \underset{(.027)}{.112}\, exper - \underset{(.001)}{.003}\, exper^2$$

where I have omitted the coefficients on the regional dummies for brevity. What is the estimated effect of an additional year of schooling now? Construct a 95% confidence interval for this effect.

Solution: Since this is still a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 13.7% increase in hourly wages. This is a pretty big effect, likely biased upward by omitted variable bias. The 95% confidence interval is simply $.137 \pm 1.96 \cdot .009 = .137 \pm .01764 = (.119,\ .155)$.

(d) (3 points) Based on the regression results in part c), how does experience impact earnings? (Just describe in words, no calculations are necessary here).

Solution: Experience is modelled here as a quadratic, but wages are in logs. Given the signs (one positive, the other negative), we are finding that the log of wages increases with experience, but at a decreasing rate (i.e. it's concave). Having the quadratic term is quite important here since, without it, wages (the level, not the log) would have a convex relationship to experience, which doesn't make much sense (unlike education, whose impact on earnings is likely to be convex, we probably expect the returns to experience to hit diminishing returns at some point).

(e) (4 points) Using your new covariates (including the regional dummies), you re-estimate the effect of ctuit on educ using OLS. Now you find a coefficient of −.165 with a standard error of .075. How do you feel about ctuit as an instrument now?

Solution: The t-statistic on ctuit is now $t = \frac{-.165}{.075} = -2.2$. While it is statistically significant (at the 5% level, but not the 1% level), it is still weak, since $F = t^2 = 4.84 < 10$. It still looks like a poor choice of instrument!


(f) (5 points) You decide to re-estimate the regression in part c) using 2SLS, with ctuit as an instrument. Here's what you find

$$\ln(wage) = \underset{(2.57)}{-2.89} + \underset{(.121)}{.250}\, educ + \underset{(.108)}{.209}\, exper - \underset{(.002)}{.005}\, exper^2$$

What does this regression tell you about the returns to education? Do you have faith in these results? Why or why not?

Solution: Contrary to what we should expect (i.e. that OLS should be biased upwards), we are now finding an even larger impact of education. In particular, since this is still a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 25% increase in hourly wages. This implausible result is almost certainly driven by our weak instrument, which we expect to bias the coefficient upward.

2. (24 points total) You are interested in estimating treatment effects for a heterogeneous population. We know that, since the population is heterogeneous, the population regression equation can be written as

$$Y_i = \beta_{0i} + \beta_{1i} X_i + u_i$$

In the setting you are interested in, treatment is only partially randomly determined, but you do have a valid instrument $Z_i$ with which to perform IV. However, there is also heterogeneity in the effect of $Z_i$ on $X_i$. In particular, $X_i$ is related to $Z_i$ by the linear model

$$X_i = \pi_{0i} + \pi_{1i} Z_i + v_i$$

Suppose you know that $\beta_{0i}$, $\beta_{1i}$, $\pi_{0i}$, and $\pi_{1i}$ are distributed independently of $u_i$, $v_i$, and $Z_i$; that $E(u_i \mid Z_i) = E(v_i \mid Z_i) = 0$; and that $E(\pi_{1i}) \neq 0$. We are going to show that

$$\hat\beta_1^{2SLS} = \frac{\sum_{i=1}^n (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i=1}^n (Z_i - \bar Z)(X_i - \bar X)} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$$

(Note: we showed this equation in class, but did not prove it. Now, we will.)

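(a) (3 points) Let's start with the first step. Assuming you have an iid sample, show that

$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}}$$

Solution:

$$\hat\beta_1^{2SLS} = \frac{\frac{1}{n-1}\sum_{i=1}^n (Z_i - \bar Z)(Y_i - \bar Y)}{\frac{1}{n-1}\sum_{i=1}^n (Z_i - \bar Z)(X_i - \bar X)} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}}$$

by the Law of Large Numbers.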

(b) (4 points) Let's focus first on the denominator of $\sigma_{ZY}/\sigma_{ZX}$. Using the definition of covariance, show that

$$\sigma_{ZX} = E[(Z_i - \mu_Z)\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z) v_i]$$

Solution: By the definition of covariance,

$$\sigma_{ZX} = E[(Z_i - \mu_Z)(X_i - \mu_X)] = E[(Z_i - \mu_Z) X_i] = E[(Z_i - \mu_Z)(\pi_{0i} + \pi_{1i} Z_i + v_i)]$$

Breaking up this last term yields

$$E[(Z_i - \mu_Z)(\pi_{0i} + \pi_{1i} Z_i + v_i)] = E[(Z_i - \mu_Z)\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z) v_i]$$


(c) (4 points) Focusing on the first term of the equation from part b), show that $E[(Z_i - \mu_Z)\pi_{0i}] = 0$.

Solution:

$$E[(Z_i - \mu_Z)\pi_{0i}] = E(Z_i - \mu_Z)\,E(\pi_{0i}) \quad \text{(since } Z_i \text{ is distributed independently of } \pi_{0i}\text{)}$$

$$= (E(Z_i) - \mu_Z)\,E(\pi_{0i}) = (\mu_Z - \mu_Z)\,E(\pi_{0i}) = 0$$

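(d) (4 points) Looking now at the second term, show that $E[\pi_{1i} Z_i (Z_i - \mu_Z)] = \sigma_Z^2 E(\pi_{1i})$.

Solution:

$$E[\pi_{1i} Z_i (Z_i - \mu_Z)] = E(\pi_{1i})\,E[Z_i (Z_i - \mu_Z)] \quad \text{(since } Z_i \text{ is distributed independently of } \pi_{1i}\text{)}$$

$$= E(\pi_{1i})\,E[(Z_i - \mu_Z)(Z_i - \mu_Z)] = \sigma_Z^2\,E(\pi_{1i})$$

The second line uses $E[\mu_Z (Z_i - \mu_Z)] = \mu_Z\,E(Z_i - \mu_Z) = 0$, so subtracting $\mu_Z$ from the first $Z_i$ leaves the expectation unchanged.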

(e) (3 points) Finally, looking now at the third term, show that $E[(Z_i - \mu_Z) v_i] = 0$.

Solution:

$$E[(Z_i - \mu_Z) v_i] = E[(Z_i - \mu_Z)(v_i - \mu_v)] = \operatorname{cov}(Z_i, v_i) = 0 \quad \text{by assumption}$$

(f) (3 points) Now, putting it all together, show that $\sigma_{ZX} = \sigma_Z^2 E(\pi_{1i})$.

Solution: Very simple! You just combine the results of the previous steps.

$$\sigma_{ZX} = E[(Z_i - \mu_Z)\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z) v_i] = 0 + \sigma_Z^2 E(\pi_{1i}) + 0 = \sigma_Z^2 E(\pi_{1i})$$

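(g) (3 points) In the interest of time, I won't have you prove that $\sigma_{ZY} = \sigma_Z^2 E(\beta_{1i}\pi_{1i})$. Instead, taking this result as given, use the preceding results to show that

$$\hat\beta_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}} = \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$$

Solution: From part a) we have $\hat\beta_1^{2SLS} = s_{ZY}/s_{ZX} \xrightarrow{p} \sigma_{ZY}/\sigma_{ZX}$; from part f) we have $\sigma_{ZX} = \sigma_Z^2 E(\pi_{1i})$; and finally, we are given that $\sigma_{ZY} = \sigma_Z^2 E(\beta_{1i}\pi_{1i})$. Therefore

$$\hat\beta_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}} = \frac{\sigma_Z^2 E(\beta_{1i}\pi_{1i})}{\sigma_Z^2 E(\pi_{1i})} = \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}.$$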
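The limit derived in this problem can be checked numerically. The following is a rough simulation sketch, not part of the original solution: the distributions, the parameter values (mean effect 2, mean first-stage coefficient 1, covariance 0.5 between them), and the function name are all assumptions chosen for illustration. Under these choices $E(\beta_{1i}\pi_{1i})/E(\pi_{1i}) = 2.5$, while the average treatment effect $E(\beta_{1i})$ is 2.

```python
import random


def simulate_heterogeneous_iv(n=200_000, seed=0):
    """Simulate Y = b0 + b1*X + u, X = p0 + p1*Z + v with random coefficients.

    Returns (iv_estimate, probability_limit, average_effect).
    All distributions here are illustrative assumptions.
    """
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # Coefficients are drawn independently of (u, v, Z); a shared shock
        # makes beta1 and pi1 positively correlated.
        common = rng.gauss(0.0, 1.0)
        pi1 = 1.0 + 0.5 * common      # E(pi1) = 1
        beta1 = 2.0 + 1.0 * common    # E(beta1) = 2, cov(beta1, pi1) = 0.5
        pi0 = rng.gauss(0.0, 1.0)
        beta0 = rng.gauss(0.0, 1.0)
        # u is correlated with v, so X is endogenous and OLS would be biased.
        v = rng.gauss(0.0, 1.0)
        u = 0.8 * v + rng.gauss(0.0, 1.0)
        x = pi0 + pi1 * z + v
        y = beta0 + beta1 * x + u
        zs.append(z)
        xs.append(x)
        ys.append(y)

    z_bar, x_bar, y_bar = sum(zs) / n, sum(xs) / n, sum(ys) / n
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    iv_estimate = s_zy / s_zx           # the 1/(n-1) factors cancel

    # E(beta1*pi1) = E(beta1)E(pi1) + cov(beta1, pi1) = 2*1 + 0.5
    probability_limit = (2.0 * 1.0 + 0.5) / 1.0
    return iv_estimate, probability_limit, 2.0


iv_est, plim, avg_effect = simulate_heterogeneous_iv()
print(f"IV estimate: {iv_est:.3f}, derived limit: {plim}, E(beta1): {avg_effect}")
```

In this design the IV estimate should settle near the weighted average 2.5 rather than the simple average effect 2, because individuals whose treatment responds more strongly to the instrument receive more weight.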


7 Estimation with Panel Data

1. 25 Points overall. A researcher investigating the determinants of crime in the United Kingdom has data for 42 police regions over 22 years. She estimates by OLS the following regression

$$\ln(\text{cmrt})_{it} = \alpha_i + \lambda_t + \beta_1 \text{unrtm}_{it} + \beta_2 \text{proyth}_{it} + \beta_3 \ln(\text{pp})_{it} + u_{it}; \quad i = 1, \ldots, 42; \; t = 1, \ldots, 22$$

where cmrt is the crime rate per head in the population, unrtm is the unemployment rate of males, proyth is the proportion of youths, and pp is the probability of punishment measured as (number of convictions)/(number of crimes reported). $\alpha$ and $\lambda$ are area and year fixed effects, where $\alpha_i$ equals one for area $i$ and is zero otherwise for all $i$, and $\lambda_t$ is one in year $t$ and zero for all other years for $t = 2, \ldots, 22$. $\lambda_1$ is not included.

(a) (4 points) What is the purpose of excluding $\lambda_1$?

Solution: We leave out $\lambda_1$ to avoid perfect multicollinearity. If you included $\lambda_1$, the sum of the $\lambda$'s would always equal one, which is also the sum of the $\alpha$'s.

(b) (4 points) Briefly discuss the advantages of using panel data for this type of investigation.

Solution: Using panel data, we can control for group-specific, time-invariant effects, such as attitudes towards crime in urban vs. rural areas, and we can also control for time-specific, group-invariant effects, such as macroeconomic shocks.

(c) (4 points) Estimation by OLS using heteroskedasticity-robust standard errors results in the following output, where the coefficients of the fixed effects are not reported for brevity:

$$\widehat{\ln(\text{cmrt})}_{it} = \underset{(0.109)}{0.063}\,\text{unrtm}_{it} + \underset{(0.179)}{3.739}\,\text{proyth}_{it} - \underset{(0.024)}{0.588}\,\ln(\text{pp})_{it}; \quad R^2 = 0.904$$

Briefly interpret the signs of the coefficients. Do the coefficients have the expected signs? Justify your answer.

Solution: A higher male unemployment rate and a higher proportion of youths increase the crime rate, while a higher probability of punishment decreases the crime rate. The coefficients on the probability of punishment and the proportion of youths are statistically significant, while the coefficient on the male unemployment rate is not. The regression explains roughly 90 percent of the variation in crime rates in the sample.

(d) (4 points) Using the results above, what is the effect of a ten percent increase in the probability of punishment?

Solution: A ten percent increase in the number of convictions over the number of crimes reported decreases the crime rate by about 5.88 percent.
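The 5.88 percent figure treats the log-log coefficient as an elasticity and multiplies it by the percentage change. As a small sketch (the function name is hypothetical), the snippet below compares that approximation with the exact change implied by the log-log form for a ten percent increase:

```python
import math


def pct_change_log_log(elasticity, pct_increase):
    """Return (approximate, exact) percent change in Y for a log-log model.

    approximate: elasticity * pct_increase (the usual elasticity reading)
    exact:       100 * ((1 + pct_increase/100) ** elasticity - 1)
    """
    approx = elasticity * pct_increase
    exact = (math.exp(elasticity * math.log(1 + pct_increase / 100)) - 1) * 100
    return approx, exact


approx, exact = pct_change_log_log(-0.588, 10)
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

For a change as large as ten percent the two numbers differ slightly, which is why the solution says "about".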

(e) (4 points) You want to test for the relevance of the year fixed effects, and the relevant F-statistic is 1.7. Using a 1% significance level, what do you conclude?

Solution: Since there are 22 years in the sample, there are 21 restrictions imposed by eliminating the year fixed effects and adding a constant. The critical value is about 1.85 at the 1% level. Therefore, we cannot reject the restriction.


(f) (5 points) You would like to use a Random Effects estimator (RE), but you are afraid that there might be correlation between the regressors and the "police region" fixed effect. You therefore run a Hausman test, and the result is 21.5. What do you conclude?

Solution: The value calculated from the Hausman test is 21.5 and there are 24 degrees of freedom (3 variables plus 21 time dummies). The critical value of $\chi^2_{24}$ is 42.98 at the 1% level and 36.41 at the 5% level. Therefore, we cannot reject the hypothesis that there is no correlation between regressors and regions.

2. (16 points) Recall the example from class, where we had data on traffic fatalities[5] and the real tax[6] on a case of beer in 48 U.S. states

$$\text{FatalityRate}_i = \beta_0 + \beta_1 \text{BeerTax}_i + \beta_2 \text{CultFactors}_i + u_i$$

but we don't observe (among other things) cultural attitudes toward drinking and driving. We addressed this problem by specifying the model

$$\text{FatalityRate}_{it} = \beta_1 \text{BeerTax}_{1,it} + \alpha_i + \lambda_t + u_{it}$$

which we estimated using state and year fixed effects as

$$\widehat{\text{FatalityRate}} = \underset{(0.25)}{-0.64}\,\text{BeerTax} + \text{StateFEs} + \text{YearFEs}$$

(a) (3 points) Construct a 99% confidence interval for the effect of a 50¢ increase in the BeerTax on the fatality rate.

Solution: A 99% confidence interval for the effect of a 50¢ increase in the BeerTax on the fatality rate is the same as a 99% CI for $\tfrac{1}{2}\beta_1$ (since BeerTax is measured in dollars). The 99% CI for $\tfrac{1}{2}\beta_1$ is

$$\tfrac{1}{2}\hat\beta_1 \pm 2.58 \times \tfrac{1}{2} \times SE(\hat\beta_1) = \tfrac{1}{2} \times (-.64) \pm 2.58 \times \tfrac{1}{2} \times .25 = (-.643,\ .003)$$

(b) (4 points) Using the confidence interval constructed above, can you reject the null hypothesis that a 50¢ increase in the BeerTax will decrease the fatality rate by 6 deaths per 100,000?

Solution: This is the same as asking if you can reject the null hypothesis that a 50¢ increase in the BeerTax will decrease the fatality rate by .6 deaths per 10,000. You cannot reject this null hypothesis because -.6 lies inside the confidence interval constructed in part a).
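The interval in part a) and the test in part b) can be reproduced in a few lines. This is a minimal sketch using the numbers from the problem; the helper name is hypothetical, and 2.58 is the large-sample 1% two-sided critical value used in the solution:

```python
def scaled_ci(beta_hat, se, scale, crit=2.58):
    """Confidence interval for scale * beta using a normal critical value."""
    center = scale * beta_hat
    half_width = crit * abs(scale) * se
    return center - half_width, center + half_width


# Effect of a 50-cent (0.5 dollar) increase in the beer tax.
lo, hi = scaled_ci(beta_hat=-0.64, se=0.25, scale=0.5)
null_value = -0.6  # 6 deaths per 100,000 = 0.6 deaths per 10,000
reject = not (lo <= null_value <= hi)
print(f"99% CI: ({lo:.3f}, {hi:.3f}); reject H0: {reject}")
```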

(c) (6 points) This estimated relationship between beer tax and the fatality rate is immune to omitted variable bias from variables that are constant either over time or across states. But many important determinants of traffic deaths do not fall into this category. Identify two such factors and describe why omitting them from the analysis could lead to omitted variable bias.

Solution: Alcohol taxes are only one way to discourage drinking and driving. States also differ in their punishments for drunk driving, and a state that cracks down on drunk driving could do so across the board by toughening laws as well as raising taxes. If so, omitting these laws could produce omitted variable bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even after including fixed effects. Examples of such laws include the legal drinking age, mandatory jail sentences, and mandatory community service. In addition, because vehicle use depends in part on whether drivers have jobs and because tax changes can reflect economic conditions, omitting state economic conditions also could result in omitted variable bias. Examples include the unemployment rate and measures of average income.

[5] The number of traffic fatalities per 10,000 people in the state. The average value of this variable in this dataset is about 2 (so 2 fatalities per 10,000).

[6] In 1988 dollars. The average tax is about 50¢ a case.

(d) (3 points) Your friend Daniel, who has not taken this class, suggests handling the issues raised in part c) by replacing the year and state fixed effects with a year/state fixed effect, that is, a fixed effect for every year-state pair. What, if anything, is wrong with his suggestion? Explain.

Solution: This is like having a dummy variable for every observation, which, in addition to the BeerTax regressor, would give you more variables than observations, making the regression infeasible.

3. (23 points total) Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32. Through various spending policies, the federal government has encouraged states to institute mandatory seat belt laws to reduce the number of fatalities and serious injuries. You are interested in examining how effective these laws are in reducing fatalities. You have collected a panel dataset from 50 U.S. states (plus the District of Columbia) for the years 1983-1997. Your dataset includes the following variables:

• fatalityrate is the number of fatalities per thousand of traffic miles

• sb_usage is the seat belt usage rate

• speed65 is a dummy =1 if 65 mile per hour speed limit, =0 otherwise

• speed70 is a dummy =1 if 70 or higher mile per hour speed limit, =0 otherwise

• ba08 is a dummy =1 if blood alcohol limit ≤ .08%, =0 otherwise

• drinkage21 is a dummy =1 if age 21 drinking age, =0 otherwise

• income is per capita income

• age is mean age

• state is a set of state dummies

• year is a set of year dummies

The table below contains the results of several regressions (pooled OLS, OLS with state fixed effects, GLS with state random effects, and OLS with state and year fixed effects). The following questions are based on these results.

                      1          2          3          4
sb_usage            4.07      -5.77      -4.50      -3.72
                   (1.22)     (1.15)     (1.12)     (1.13)
speed65             0.148     -0.425     -0.341     -0.783
                   (0.403)    (0.334)    (0.337)    (0.424)
speed70             2.40       1.23       1.34       0.804
                   (0.511)    (0.329)    (0.328)    (0.340)
ba08               -1.92      -1.38      -1.36      -0.822
                   (0.445)    (0.373)    (0.367)    (0.352)
drinkage21          0.079      0.745      0.767     -1.13
                   (0.876)    (0.507)    (0.510)    (0.535)
ln(inc)           -18.1      -13.5      -12.6        6.26
                   (0.931)    (1.42)     (1.14)     (3.86)
age                -0.007      0.979      0.232      1.32
                   (0.109)    (0.382)    (0.239)    (0.383)
constant          196.5                 137.9
                   (8.22)                (8.92)
State Effects       None       FE         RE         FE
Year Effects        None       None       None       FE
R2                  0.544      0.874      0.683      0.897

(Standard errors in parentheses. The source lists only two constants; they are placed here in columns 1 and 3, the two specifications without state fixed effects.)


(a) (4 points) Focusing on the results from the pooled OLS regression in column 1, does the estimated regression suggest that increased seat belt use significantly reduces fatalities? Be complete. Does this result make sense? If so, explain why. If not, explain what you think is going on here.

Solution: According to the pooled OLS regression 1, higher seat belt usage rates are actually associated with a higher fatality rate. Moreover, this positive result is in fact significant at the 1% level since $t = 4.07/1.22 = 3.34 > 2.58$. This is a very suspicious result, suggesting that we may be suffering from omitted variable bias. In particular, it seems likely that in places with the most dangerous driving conditions, people might be more likely to wear seat belts.

(b) (4 points) How do the results regarding the impact of the seat belt usage rate on traffic fatalities change when we add state fixed effects (column 2)? Provide an intuitive explanation for why the results changed.

Solution: Once we control for state fixed effects, the coefficient on sb_usage switches signs. We now find a negative and significant impact on traffic fatalities of increased seat belt usage ($t = -5.77/1.15 = -5.02 < -2.58$). This seems much more reasonable. If the omitted variables associated with dangerous road conditions are constant over time, but vary across states, we have now controlled for them, allowing us to isolate the true impact of sb_usage.

(c) (4 points) The specification in column 3 replaces the state fixed effects with state random effects. Does this also seem like a reasonable solution to the problem you identified in part a)?

Solution: No. The RE specification simply handles the fact that repeated observations from the same entity are not iid. It is not a remedy for the omitted variables problem, since RE assumes that the state effect $\alpha_i$ is uncorrelated with the included regressors. It is interesting, however, that the coefficients are all pretty similar to those in the FE regression.

(d) (4 points) Here are the results of a Hausman test between the regressions in columns 2 and 3

What do you conclude from these results?

Solution: The null hypothesis of the Hausman test is that both the RE and FE models are consistent. The Hausman test statistic is distributed $\chi^2_7$ here since there are 7 time-varying regressors. The 1% critical value for the $\chi^2_7$ distribution is 18.48, so we can reject the null at the 1% level, implying that the RE model is inconsistent (despite the apparent similarity in the coefficients). Note: you could have also used the reported p-value of .0004, which is < .01 here.

(e) (4 points) The model in column 4 adds year fixed effects to the model in column 2. An F-test of the joint significance of these 14 year dummies yields an F-statistic of 8.85. What do you conclude about the joint significance of the year dummies? Do the results regarding the impact of seat belt usage change now that we have added year fixed effects?

Solution: The 1% critical value for the $F_{14,\infty}$ distribution is 2.08, so the time dummies are significant at the 1% level. The results do change, as the coefficient on sb_usage is now -3.72 (compared with -5.77 in column 2), although it remains negative and significant.

(f) (3 points) Which regression specification (1, 2, 3, or 4) is most reliable? Why?

Solution: The pooled OLS regression clearly suffered from omitted variable bias, so we need to do something about that. RE does not solve this problem, so that specification is inappropriate as well (you also rejected it with the Hausman test). Between the two FE specifications, the one with both time and state fixed effects certainly controls for the most omitted stuff, and since the time fixed effects are significant (and the coefficients change a bit too), the specification in column 4 is the best choice.

2. You are using data collected from 545 men who worked every year between 1980 and 1987. The included variables are as follows:

variable    description
id          person identifier
year        1980 to 1987
lwage       log(wage)
le          log(labor mkt experience)
black       Dummy, =1 if black
hisp        Dummy, =1 if Hispanic
married     Dummy, =1 if married
educ        years of schooling
union       Dummy, =1 if worker belongs to a union
d81         Dummy, =1 if year = 1981
d82         etc.
...
d87

In what follows we will assume that the error terms in our regressions are homoskedastic. Indeed, it turns out that the Hausman test that we saw in class is valid only when there is homoskedasticity (there are ways to modify it to make it robust to heteroskedasticity, but we won't see them). Here we want to study the relation between log(wage), education, and experience, controlling for some other variables. We assume that the true regression is

$$\ln(\text{wage}_{it}) = \beta_0 + \beta_1 \text{educ}_{it} + \beta_2 \text{le}_{it} + \sum_{Y=81}^{87} \delta_Y d_Y + \text{other controls} + \alpha_i + u_{it} \qquad (1)$$

where $\alpha_i$ is the unobserved fixed effect. Note that the model includes year-specific dummies.

(a) The sample includes data from 1980 to 1987, but equation (1) includes only dummies for years 1981 to 1987. Why?

Solution: If we included the dummy for 1980 we would have a multicollinearity problem.

(b) Before dealing with panel data, we have typically assumed that observations are i.i.d. You know that independent observations are uncorrelated, so, if you can show that two observations are correlated, you have shown that they cannot be independent, and then they cannot be i.i.d. Here the error term for worker $i$ in year $t$ is $v_{it} = \alpha_i + u_{it}$, while in year $s$ the error is $v_{is} = \alpha_i + u_{is}$. Assume that both the fixed effect $\alpha_i$ and the other error component $u_{is}$ have zero expected value, and you can assume that $\operatorname{cov}(u_{it}, u_{js}) = 0$ for every $i, j, t, s$ (unless, of course, $i = j$ and $t = s$!). Assume also that the fixed effects are uncorrelated with all the $u$'s. Calculate $\operatorname{cov}(v_{it}, v_{is})$. Is it equal to zero? Do you think that in a panel data set the assumption that observations are i.i.d. is a good one?

Solution:

$$\operatorname{Cov}(v_{it}, v_{is}) = \operatorname{Cov}(\alpha_i + u_{it}, \alpha_i + u_{is})$$

$$= \operatorname{Cov}(\alpha_i, \alpha_i) + \operatorname{Cov}(\alpha_i, u_{it}) + \operatorname{Cov}(\alpha_i, u_{is}) + \operatorname{Cov}(u_{it}, u_{is})$$

$$= \operatorname{Cov}(\alpha_i, \alpha_i)$$

Since $\operatorname{Cov}(v_{it}, v_{is}) = \operatorname{Cov}(\alpha_i, \alpha_i) = \operatorname{Var}(\alpha_i) \neq 0$, it is not valid to assume that observations in panel data sets are iid.
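A quick simulation can illustrate why the composite errors of one worker are correlated across years. This is only a sketch under assumed distributions (normal fixed effects with variance 1 and independent standard normal idiosyncratic errors); the function name is hypothetical. The sample covariance between the year-$t$ and year-$s$ composite errors should be close to $\operatorname{Var}(\alpha_i) = 1$, not zero.

```python
import random


def composite_error_cov(n=100_000, sd_alpha=1.0, seed=1):
    """Sample covariance of v_it = alpha_i + u_it and v_is = alpha_i + u_is."""
    rng = random.Random(seed)
    v_t, v_s = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, sd_alpha)         # worker fixed effect, shared across years
        v_t.append(alpha + rng.gauss(0.0, 1.0))  # composite error in year t
        v_s.append(alpha + rng.gauss(0.0, 1.0))  # composite error in year s
    mean_t, mean_s = sum(v_t) / n, sum(v_s) / n
    return sum((a - mean_t) * (b - mean_s) for a, b in zip(v_t, v_s)) / (n - 1)


cov_hat = composite_error_cov()
print(f"sample cov(v_it, v_is) = {cov_hat:.3f} (Var(alpha) = 1.0)")
```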

You estimate equation (1) using RE (Random Effects) and FE (Fixed Effects), and you get the following results (standard errors are shown to the right of the corresponding coefficient):

             (1) Random Effect       (2) Fixed Effect
Variable     Coeff.     s.e.         Coeff.     s.e.
educ          0.104     0.024
le            0.301     0.092         0.176     0.116
black        -0.125     0.139
hisp          0.104     0.162
married       0.067     0.038         0.003     0.042
union         0.050     0.041         0.015     0.044
d81           0.039     0.058         0.088     0.063
d82          -0.020     0.074         0.068     0.086
d83          -0.031     0.089         0.084     0.106
d84          -0.055     0.102         0.083     0.124
d85          -0.019     0.113         0.140     0.139
d86          -0.010     0.124         0.168     0.153
d87           0.081     0.133         0.276     0.165
constant     -0.123     0.316

(c) Using the RE results in column (1), interpret the coefficients on the variables educ and le.

Solution: The coefficient on educ means that increasing education by one year will increase the wage by about 10%. The coefficient on le means that increasing experience by 1% will increase wages by about 0.3%.

(d) Using again RE, you run a test for the joint significance of all the year-specific dummies, and the F-statistic is equal to 1.576. What do you conclude?

Solution: The critical value for an F-test with 7 degrees of freedom is 1.72 at the 10% level. Therefore, we cannot reject the null hypothesis that all of the time dummies are jointly equal to zero.

(e) Now you turn to the estimates obtained by estimating equation (1) using FE (Fixed Effects). How do you justify the fact that the FE estimator did not estimate the coefficients for educ, black, and hisp?

Solution: The race and education level of the individuals in this sample do not vary across time. (Note that everyone was working during these years and not in school.) Therefore, the effect of these variables is captured by the fixed effect.

(f) The estimated coefficients using FE and RE look quite different, so you suspect that the fixed effect might be correlated with one or more included regressors. Therefore, you decide to run a Hausman test and the result is 17.2. Do you reject the null using a 5% significance level? And what about using a 10% significance level? What does the result of the test suggest about the presence of correlation between the fixed effect $\alpha_i$ and the included regressors? Based on the result of the Hausman test, is there evidence that the RE estimators are not consistent? (Note for the calculation of the number of degrees of freedom: this test is based on the comparison of coefficients estimated by using FE and RE, so you can only compare coefficients that are estimated in both models!)

Solution: The null hypothesis under the Hausman test is that the individual effects are uncorrelated with the other regressors. The critical values for a $\chi^2$ distribution with 10 degrees of freedom are 16.0 for 10% and 18.3 for 5%. (Ten degrees of freedom because there are 10 coefficients that are estimated in both models.) Therefore, we can reject the null hypothesis at the 10% but not the 5% level. This means that $\alpha_i$ may be correlated with the other regressors, which would cause a random effects model to be inconsistent.


8 Empirical Exercises with a bit of everything

1. SEM 30 Points overall. Consider the following system of simultaneous equations

$$P^B_i = \beta_0 + \beta_1 Q^B_i + \beta_2 P^W_i + u_{1i} \qquad (1)$$

$$Q^B_i = \beta_3 + \beta_4 P^B_i + u_{2i} \qquad (2)$$

where $P^B_i$ is the average price of a generic bottle of beer in market $i$ and $Q^B_i$ is the quantity of this beer sold in market $i$. $P^W_i$ is the price of a generic bottle of wine in market $i$.

(a) (5 points) Which is the supply equation and which is the demand equation? Justify your answer.

Solution: Equation (1) is demand and equation (2) is supply. We know (1) must be demand because the price of generic wine would not affect supply, but it must surely affect demand, since generic wine is a substitute for generic beer.

(b) (5 points) What are the expected signs of $\beta_1$, $\beta_2$ and $\beta_4$?

Solution:

$$\beta_1 < 0, \qquad \beta_2 > 0, \qquad \beta_4 > 0$$

(c) (5 points) Very briefly explain why estimating the first equation by OLS would result in biased estimates of $(\beta_0, \beta_1, \beta_2)$.

Solution: Price and quantity of beer are jointly determined by both demand and supply. This is the classic identification problem.

(d) (5 points) Is equation (1) identified? Justify your answer. If the answer is yes, briefly mention how you would estimate the corresponding parameters.

Solution: Equation (1) is not identified.

(e) (5 points) Is equation (2) identified? Justify your answer. If the answer is yes, briefly mention how you would estimate the corresponding parameters.

Solution: Equation (2) is identified. Using the price of wine as an instrument, run TSLS. There is 1 excluded exogenous variable. To solve, we'd use the market-clearing mechanism from introductory economics and note that quantity demanded = quantity supplied. We can re-write equation (1) as:

$$Q^B_i = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} P^B_i - \frac{\beta_2}{\beta_1} P^W_i - \frac{u_{1i}}{\beta_1}$$

And now set equation (1) equal to equation (2), and solve for price. Substituting price into the demand or supply function, we can write quantity in the same manner. Then, we can run OLS to find the "transformed" parameters, and use algebra to isolate the true parameter values from supply.
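The algebra described above can be written out explicitly. The following is a sketch of the reduced form implied by equations (1) and (2), obtained by substituting (2) into (1); the symbols $\gamma_{P0}, \gamma_{P1}, \gamma_{Q0}, \gamma_{Q1}$ for the reduced-form coefficients are labels introduced here for illustration and do not appear in the original problem. It assumes $1 - \beta_1\beta_4 \neq 0$.

```latex
% Substitute (2) into (1) and solve for price:
P^B_i = \underbrace{\frac{\beta_0 + \beta_1\beta_3}{1 - \beta_1\beta_4}}_{\gamma_{P0}}
      + \underbrace{\frac{\beta_2}{1 - \beta_1\beta_4}}_{\gamma_{P1}} P^W_i
      + \frac{u_{1i} + \beta_1 u_{2i}}{1 - \beta_1\beta_4}

% Substitute this price back into (2) to get quantity:
Q^B_i = \underbrace{\frac{\beta_3 + \beta_0\beta_4}{1 - \beta_1\beta_4}}_{\gamma_{Q0}}
      + \underbrace{\frac{\beta_2\beta_4}{1 - \beta_1\beta_4}}_{\gamma_{Q1}} P^W_i
      + \frac{\beta_4 u_{1i} + u_{2i}}{1 - \beta_1\beta_4}

% The supply parameters are recovered from the reduced-form coefficients:
\beta_4 = \frac{\gamma_{Q1}}{\gamma_{P1}}, \qquad
\beta_3 = \gamma_{Q0} - \beta_4\,\gamma_{P0}
```

Both reduced-form equations have only the exogenous wine price on the right-hand side, so each can be estimated by OLS. The demand parameters cannot be recovered the same way, which matches the conclusion in part (d) that equation (1) is not identified.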


(f) (5 points) What should change in the above system of equations in order for the first equation to be overidentified with 2 overidentification restrictions?

Solution: We would need to add three additional excluded exogenous variables to the supply function.

2. This year, you are teaching ECON 139. Suppose you have a very large class so that, in all that follows, you can use asymptotic results. You decide to have two midterms and one final. Both midterms are worth 75 points. Midterm 1 has just been graded, and you are not very happy with the performance of the class. You want to analyze if one extra section of econometrics per week leads to higher grades. To do this, you create a new section, held on Saturday morning, so that there will be no schedule conflicts. Participation in this extra section is on a voluntary basis. When midterm 2 comes, and grades have been assigned, you estimate the following regression

$$\widehat{\text{gradeM2}}_i = \underset{(2.9)}{52.3} + \underset{(0.05)}{0.20}\,\text{gradeM1}_i + \underset{(1.08)}{8.37}\,\text{Extra}_i \qquad (7)$$

where gradeMj denotes the grade in midterm $j$, $j = 1, 2$, and $\text{Extra}_i$ is a dummy equal to one if the student attended the extra sections.

(a) Using the results in equation 1, does it look like the students who attended the extra sections performed relatively better than the others? Evaluate both the statistical significance and the magnitude of the results.

Solution: Yes, the students who attended the extra sections did better. The average gain was 8.37, over 10% of the total points possible on the test. And with a t-statistic of 7.75, this difference is statistically significant as well.

(b) Using again the data that you have collected after Midterm 2, now you estimate the following logit model:

$$P(\text{Extra}_i = 1 \mid \text{gradeM1}_i) = F\left(\underset{(2.5)}{-6.27} + \underset{(0.043)}{0.11}\,\text{gradeM1}_i\right) \qquad (8)$$

where $F$ is the logit CDF. Does it look like students who performed better in Midterm 1 are statistically significantly more likely to participate in the extra sections?

Solution:

$$t = \frac{0.11}{0.043} = 2.56 > 1.96$$

The coefficient on midterm 1 is statistically significant at the 5% level. Therefore, we do conclude that students who do better on midterm 1 are more likely to attend the extra sections.

(c) Calculate the estimated difference in the probability of attending the extra sections for a student who got 50 in Midterm 1, and another student who instead got 70.

Solution:

$$\frac{1}{1 + e^{(6.27 - 0.11 \times 70)}} - \frac{1}{1 + e^{(6.27 - 0.11 \times 50)}} = 0.81 - 0.32 = 0.49$$

A student with a midterm grade of 70 is 49 percentage points more likely to attend the extra sections than a student who scored 50.
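The predicted probabilities in part (c) come from plugging each grade into the logistic CDF. A short sketch using the estimated coefficients from the problem (the function names are hypothetical):

```python
import math


def logistic_cdf(index):
    """Logit CDF: F(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-index))


def prob_extra(grade_m1, intercept=-6.27, slope=0.11):
    """Predicted probability of attending the extra sections."""
    return logistic_cdf(intercept + slope * grade_m1)


p70 = prob_extra(70)
p50 = prob_extra(50)
diff = p70 - p50
print(f"P(70) = {p70:.2f}, P(50) = {p50:.2f}, difference = {diff:.2f}")
```

Because the logit is nonlinear, the same 20-point gap in the midterm grade would imply a different change in probability at other starting grades.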

(d) Ideally, you would like to estimate the causal impact of the extra sections on the grade in midterm 2. Do the results you obtained from the logit model in (2) suggest that the coefficient for Extra in equation (1) can or cannot be interpreted in a causal way? Explain.

Solution: The logit results show that we cannot interpret the coefficient from equation (1) in a causal way. The logit model shows that there is significant self-selection. Good students are more likely to go to section and to do well on the second midterm.


(e) What does your answer to the previous question suggest in terms of the "reliability" of the estimated coefficient for $\text{Extra}_i$ in equation (1)? Is 8.37 likely to be upward biased? Downward biased? Or is it likely to be close to the true causal impact? Explain.

Solution: It is likely that the coefficient on Extra will be upwardly biased. We have seen from equation (2) that there is significant self-selection among those who attend the extra section. If this selection is driven only by the grade on midterm 1, we can control for this. However, it is likely that there are unobserved variables, such as motivation, that also matter. Motivation would have a positive correlation with midterm scores and with attendance. From the omitted variable bias formula, we know this will bias the Extra coefficient upwards.

After Midterm 2, you carry out the following experiment. You generate a dummy variable $D_i$, and you set $D_i = 1$ for student $i$ if a coin toss comes up heads, and $D_i = 0$ if the result is tails. Then you tell students who got $D_i = 1$ that if they attend the extra sections (attendance will be checked every day by the TA), they will get a $100 gift card to be used at a local grocery store. However, attendance of the extra sections is still voluntary. Before the final, you collect from your TA all information on attendance, and you come up with the following 2 × 2 table, which represents the joint probability of $D_i$ (=1 if you got the offer) and Extra (=1 if you attended the post-midterm 2 sections). Please note that from now on the data on section attendance before midterm 2 are no longer used.

                Extra_i = 1    Extra_i = 0
D_i = 0            0.3            0.2
D_i = 1            0.4            0.1

(f) Calculate $P(\text{Extra}_i = 1 \mid D_i = 1)$, $P(\text{Extra}_i = 1 \mid D_i = 0)$, and $P(\text{Extra}_i = 1)$.

Solution:

$$P(\text{Extra}_i = 1 \mid D_i = 1) = \frac{0.4}{0.5} = 0.8$$

$$P(\text{Extra}_i = 1 \mid D_i = 0) = \frac{0.3}{0.5} = 0.6$$

$$P(\text{Extra}_i = 1) = 0.7$$
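These probabilities follow mechanically from the joint table: each conditional probability is a joint cell divided by the corresponding marginal of $D_i$. A minimal sketch, with the table stored in a dictionary keyed by `(d, extra)` (a representation chosen here for illustration):

```python
# Joint probabilities P(D = d, Extra = e), keyed by (d, e), from the 2 x 2 table.
joint = {
    (0, 1): 0.3, (0, 0): 0.2,
    (1, 1): 0.4, (1, 0): 0.1,
}


def p_extra_given_d(d):
    """P(Extra = 1 | D = d) = P(D = d, Extra = 1) / P(D = d)."""
    p_d = joint[(d, 1)] + joint[(d, 0)]
    return joint[(d, 1)] / p_d


p_given_1 = p_extra_given_d(1)
p_given_0 = p_extra_given_d(0)
p_extra = joint[(0, 1)] + joint[(1, 1)]   # marginal P(Extra = 1)

# Independence would require P(Extra = 1 | D = d) = P(Extra = 1) for both d.
independent = abs(p_given_1 - p_extra) < 1e-12 and abs(p_given_0 - p_extra) < 1e-12
print(p_given_1, p_given_0, p_extra, independent)
```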

(g) Are $\text{Extra}_i$ and $D_i$ independent? Explain.

Solution: No, $\text{Extra}_i$ and $D_i$ are not independent. They are positively correlated. A higher $D_i$ increases the probability of attending section.

(h) You think of using $D_i$ as an instrument for $\text{Extra}_i$. Do you think this would be a valid instrument? Would it be exogenous? Would it be relevant? Explain.

Solution: Yes, it should be a valid instrument. As shown above, it is relevant because it increases the probability of attending the extra section. Furthermore, it is exogenous to the unobserved portion of the selection equation because it was assigned randomly.

Now you want to estimate the following model using 2SLS, using $D_i$ as an instrument.

$$\text{gradeF}_i = \beta_0 + \beta_1 \text{gradeM1}_i + \beta_2 \text{gradeM2}_i + \beta_3 \text{Extra}_i + u_i$$

where $\text{gradeF}_i$ is the grade in the Final. The grade in the final is out of a total of 180 points.

(i) After estimating the first stage, you want to test for the relevance of the instrument. The F-statistic is equal to 13. Describe the regression estimated in the first stage, then carefully describe the null hypothesis that is being tested by the F-test. Finally, explain whether the result of the test suggests that your instrument is weak.

Solution: The first stage is a regression of $\text{Extra}_i$ on the instrument and the exogenous variables. The null hypothesis that is being tested by the F-test is the hypothesis that the coefficient on the instrument, $D_i$, is zero. An instrument is considered weak if the F-statistic is under 10. Therefore, we would conclude that $D_i$ is not a weak instrument.


(j) You estimate the second stage, and the results are as follows:

$$\widehat{\text{gradeF}}_i = \underset{(46.9)}{74} + \underset{(0.27)}{0.48}\,\text{gradeM1}_i + \underset{(0.69)}{0.68}\,\text{gradeM2}_i + \underset{(10.5)}{3.23}\,\text{Extra}_i$$

When you estimate the same model using OLS, the results are instead the following:

$$\widehat{\text{gradeF}}_i = \underset{(26.1)}{105.9} + \underset{(0.22)}{0.36}\,\text{gradeM1}_i + \underset{(0.42)}{0.24}\,\text{gradeM2}_i + \underset{(5.8)}{13.1}\,\text{Extra}_i$$

Comment on the change in the coefficient for Extra when you go from OLS to 2SLS. Is the change consistent with what you expected? Why? What do you conclude about the utility of the extra sections?

Solution: When we move from OLS to 2SLS, the coefficient on Extra goes down and is no longer statistically significant. This is not unexpected because we saw that there was self-selection of better students into the extra section and that this would bias the coefficient on Extra upwards. From the 2SLS, we conclude that the extra section does not help very much and the evidence that it helps at all is somewhat weak.

(k) One of your students suggests that you should try to add other instruments to your 2SLS estimator, so you will be able to test for instrument exogeneity. In particular, he suggests that you should use GPA in other ECON courses, and GPA in non-ECON courses as two added instruments. Should you accept the student's suggestion? Explain.

Solution: Adding these instruments is probably not a good idea. Since $D_i$ was drawn randomly, we are confident that it is exogenous to the error term in equation (1). However, the GPA variables are not likely to be exogenous. For example, motivation might influence the student's test scores in Econ 139 and her GPA in other classes.

(l) You show your student what happens if you estimate the model using 2SLS, using, as instruments, the two GPAs described before, as well as the dummy $D_i$. The J-statistic turns out to be equal to 25. What should your student conclude?

Solution: The J-statistic will be distributed $\chi^2$ with $m - k$ degrees of freedom, where $m$ is the number of instruments and $k$ is the number of endogenous regressors. Here, the critical value of $\chi^2$ with 2 degrees of freedom is 9.21 at the 1% level. Therefore, with a value of 25, we can reject the null hypothesis that the instruments are exogenous. Most likely it is the GPA variables that are the problem.

3. 40 Points overall. You teach a course in econometrics, and you want to know if 2 extra hours of econometricclasses per week help your students or not. After the midterm has been graded, you assign randomly thetreatment X to your students. Students assigned to the treatment group (for which X = 1) are asked toattend one extra 2-hour lecture per week, while students in the control group (for which X = 0) are asked notto attend the extra lectures. Let G be the grade in the �nal exam (assume that the maximum grade is 100).Your estimates are as follows (heteroskedasticity-robust standard errors in parenthesis)

$$\hat{G}_i = \underset{(3.23)}{80.6} + \underset{(3.53)}{10.03}\, X_i, \qquad R^2 = 0.18 \qquad (1)$$

(a) (4 points) Interpret the results. Judging solely on the above regression, is there evidence that more econometrics classes help improve your grade? Is the effect statistically significant? Does it look important?

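Solution: There seems to be evidence that more econometrics classes help to improve grades: students in the treatment group score about 10 points higher on average. The effect is statistically significant ($t = 10.03/3.53 \approx 2.84$), and it looks important, since 10 points is large relative to the control-group average of 80.6.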

(b) (4 points) Briefly explain why it might be a good idea to re-estimate the above regression including, as a regressor, the grade obtained in the midterm.

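Solution: If treatment is randomly assigned, the OLS estimator of the treatment effect is consistent with or without additional controls, but it is more efficient when relevant regressors such as the midterm grade are included than with the single regressor alone, because they reduce the variance of the error term.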

You re-estimate the above model including the student's grade in the midterm ($M_i$) as a regressor, and your estimates are as follows (heteroskedasticity-robust standard errors in parentheses):

$$\hat{G}_i = \underset{(14.44)}{39.1} + \underset{(0.18)}{0.54}\, M_i + \underset{(3.14)}{10.5}\, X_i, \qquad R^2 = 0.336 \qquad (2)$$

Compare the estimates in (2) to those in the previous regression (1) and briefly comment on the following:

(c) (4 points) Note that in (2) the constant is much smaller than in (1). How do you explain this?

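Solution: In equation (1) the constant measures the average grade for people in the control group. In equation (2) it no longer does, since the midterm grade is also included: the constant is now the predicted final grade for a control-group student with a midterm grade of zero.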

(d) (4 points) The standard error for the coefficient of X in model (2) is smaller than the corresponding standard error in model (1). Was this to be expected? Why?

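Solution: Yes. The midterm grade explains part of the variation in the final grade, so adding it reduces the variance of the regression error. Because X was randomly assigned, it is (approximately) uncorrelated with M, so the smaller error variance translates into a smaller standard error for the coefficient of X.

(e) (4 points) How do you explain the large increase in the $R^2$ when you move from model (1) to model (2)?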

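Solution: As in part (d), the addition of the midterm grade as a regressor helps to explain variation in the final grade. Adding a regressor can never lower the $R^2$, and here the increase is large because the midterm grade is a strong predictor of the final grade.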

(f) (4 points) Is the coefficient for M significant? Does your answer suggest that our estimator for the effect of the "program" in model (1) is not consistent? Justify your answer.

Solution: $t = \frac{0.54 - 0}{0.18} = 3$. Since the critical values are about 1.99 and 2.63 at 5% and 1%, respectively, we can reject the null that the midterm grade does not help to predict final grades. This does not indicate an inconsistent estimator in equation (1), since truly random assignment of the treatment makes the OLS estimator consistent.

(g) (4 points) Comparing your estimates from models (1) and (2), is there evidence that the randomization across students was done incorrectly, or that partial compliance is a serious issue here? Justify your answer.

Solution: Because the OLS estimates of the effect of the additional econometrics classes are fairly close in the two equations (10.03 versus 10.5), we can be reasonably sure that randomization was done correctly.

Now you want to estimate the effect of attending extra lectures on the probability of improving on the midterm grade in the final exam. $Improved_i$ is a binary variable equal to one if the i-th student's grade in the final is above his/her grade in the midterm. You estimate the following probit model:

$$P(Improved_i = 1 \mid X_i, M_i) = \Phi\Big( \underset{(1.72)}{4.1} - \underset{(0.03)}{0.05}\, M_i + \underset{(0.46)}{1.05}\, X_i \Big) \qquad (3)$$

(h) (4 points) What is the predicted effect of the program on the probability of improving the grade? Does it look important? In your answer, use the fact that the average midterm grade was equal to 76.

Solution: Evaluating at the average midterm grade $M = 76$,

$$\Phi(4.1 - 0.05 \times 76 + 1.05 \times 1) - \Phi(4.1 - 0.05 \times 76 + 1.05 \times 0) = \Phi(1.35) - \Phi(0.30) = 0.9115 - 0.6179 = 0.2936.$$

Since probit coefficients estimated by MLE are consistent and normally distributed in large samples, we can use the usual t-statistic: $t = \frac{1.05 - 0}{0.46} = 2.28$. Therefore, the effect is statistically significant at 5% but not at 1%. It does seem to be important, since attending these sessions increases the predicted probability of getting a better grade by about 30 percentage points.
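The probit calculation in part (h) can be reproduced with the standard normal CDF written in terms of the error function. This is a minimal sketch using only the coefficients reported in model (3); the helper names are hypothetical.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit coefficients from model (3): constant, midterm grade M, treatment X
b_const, b_midterm, b_treat = 4.1, -0.05, 1.05
se_treat = 0.46
mean_midterm = 76.0

def prob_improved(x: int, m: float) -> float:
    """Predicted probability that the final grade exceeds the midterm grade."""
    return norm_cdf(b_const + b_midterm * m + b_treat * x)

effect = prob_improved(1, mean_midterm) - prob_improved(0, mean_midterm)
t_stat = b_treat / se_treat

print(f"effect at M = 76: {effect:.4f}, t-statistic: {t_stat:.2f}")
```

Because the probit index enters through a nonlinear function, the effect depends on the value of M at which it is evaluated; the average midterm grade is the choice made in the question.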

(i) (4 points) You re-estimate the above model using a Linear Probability Model, and the results are as follows:

$$P(Improved_i = 1 \mid X_i, M_i) = \underset{(0.4)}{1.47} - \underset{(0.005)}{0.01}\, M_i + \underset{(0.111)}{0.273}\, X_i \qquad (4)$$

What is the predicted effect of the program on the probability of improving the grade? Is it very different from the one estimated using a probit model? Did you expect it to be very different or not, and why?

Solution: The predicted effect is a 27.3 percentage point increase in the probability of receiving an improved grade. This is in line with the result from the probit model (about 29 percentage points). We expect the results to be similar: the LPM often provides answers close to probit or logit if there are not too many extreme values of the regressors.

(j) (4 points) In both models (3) and (4) the coefficient related to the midterm grade is negative (forget about the statistical significance here). Can you think of a reason why this might be the case?

Solution: If a student received a very high grade on his/her midterm, it is more difficult to improve upon that in the final. Therefore, it makes sense that a high midterm score would negatively affect the probability of the final exam grade improving over the midterm grade.

4. Problem (16), 25 points overall. Consider the following population regression model relating the dependent variable $Y_i$ and regressor $X_i$,

$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n \qquad (1)$$

$$X_i = Y_i + Z_i \qquad (2)$$

where Z is a valid instrument for X.

(a) (4 points) Explain why you should not use OLS to estimate $\beta_1$.

Solution: Substituting the first equation into the identity shows that X depends on u, so X is correlated with the error term. Hence estimation with OLS results in an inconsistent estimator.

(b) (4 points) To generate a consistent estimator for $\beta_1$, what should you do? (Just briefly describe the estimation procedure you would follow. No computation or proof is necessary here!)

Solution: The instrumental variable estimator is consistent. Using Z as the instrument for X, in this case it is

$$\hat{\beta}_1^{2SLS} = \frac{\sum (Z_i - \bar{Z}) Y_i}{\sum (Z_i - \bar{Z}) X_i}.$$

(c) (5 points) The two structural equations (1) and (2) above make up a system of equations in two unknowns. Specify the two reduced form equations in terms of the original coefficients. (Hint: Substitute the identity into the first equation and solve for Y. Similarly, substitute Y into the identity and solve for X.)

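Solution:

$$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$$

$$X_i = (\beta_0 + \beta_1 X_i + u_i) + Z_i$$

or

$$(1 - \beta_1) Y_i = \beta_0 + \beta_1 Z_i + u_i$$

$$(1 - \beta_1) X_i = \beta_0 + Z_i + u_i$$

Hence

$$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$$

$$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$$

where $\pi_0 = \pi_3 = \frac{\beta_0}{1 - \beta_1}$, $\pi_2 = \frac{\beta_1}{1 - \beta_1}$, $\pi_4 = \frac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \frac{1}{1 - \beta_1} u_i$.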

(d) (4 points) Can you estimate consistently the two reduced form equations using OLS? Why?

Solution: Since Z is a valid instrument by assumption, it must be uncorrelated with the error term u, and therefore with the reduced-form errors, which are proportional to u. Hence using OLS results in a consistent estimator.

(e) (4 points) What is the ratio of the two estimated slopes? (This estimator is called "indirect least squares.")

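Solution: Both reduced-form regressions have Z as the regressor, so

$$\frac{\hat{\pi}_2}{\hat{\pi}_4} = \frac{\sum (Z_i - \bar{Z}) Y_i \big/ \sum (Z_i - \bar{Z})^2}{\sum (Z_i - \bar{Z}) X_i \big/ \sum (Z_i - \bar{Z})^2} = \frac{\sum (Z_i - \bar{Z}) Y_i}{\sum (Z_i - \bar{Z}) X_i}$$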

(f) (4 points) How does it compare to the TSLS in this example?

Solution: This indirect least squares estimator is identical to the TSLS estimator.
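The equivalence between indirect least squares and the IV estimator in this exactly identified model can be illustrated with a small simulation. The sketch below is only an illustration under assumed parameter values ($\beta_0 = 1$, $\beta_1 = 0.5$, standard normal Z and u); none of these numbers come from the problem.

```python
import random

random.seed(0)
beta0, beta1 = 1.0, 0.5   # hypothetical structural parameters (beta1 must differ from 1)
n = 20000

z = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
# Solve Y = beta0 + beta1*X + u together with X = Y + Z for X, then back out Y
x = [(beta0 + zi + ui) / (1.0 - beta1) for zi, ui in zip(z, u)]
y = [xi - zi for xi, zi in zip(x, z)]

def cross(a, b):
    """Sum of (a_i - mean(a)) * b_i."""
    a_bar = sum(a) / len(a)
    return sum((ai - a_bar) * bi for ai, bi in zip(a, b))

pi2_hat = cross(z, y) / cross(z, z)   # reduced-form slope of Y on Z
pi4_hat = cross(z, x) / cross(z, z)   # reduced-form slope of X on Z
ils = pi2_hat / pi4_hat               # indirect least squares
iv = cross(z, y) / cross(z, x)        # IV (TSLS) with Z as the instrument
ols = cross(x, y) / cross(x, x)       # OLS of Y on X, inconsistent here

print(f"ILS = {ils:.4f}, IV = {iv:.4f}, OLS = {ols:.4f}, true beta1 = {beta1}")
```

In this design ILS and IV coincide up to floating-point error and are close to the true slope, while OLS settles near a different value because X is correlated with u.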

5. (45 points overall) This is (loosely) based on Goldin & Rouse (2000), "Orchestrating impartiality: the impact of 'blind' auditions on female musicians". Sex-biased hiring has been alleged for many occupations but is extremely difficult to prove. A change in the audition procedures of symphony orchestras (the adoption of "blind" auditions with a "screen" to conceal the candidate's identity from the jury) provides a test for sex-biased hiring.

(a) (6 points) Suppose that the identity of a candidate is known, and you regress a dummy equal to one if a job candidate is given a position (Job) on a dummy equal to one if the candidate is female (Female). Your estimated regression is

$$\widehat{Job}_i = \underset{(0.01)}{0.4} - \underset{(0.04)}{0.1}\, Female_i$$

Explain why you cannot interpret the results as necessarily indicating discrimination.

Solution: Because there are a lot of relevant variables that are omitted from the regression. For example, we do not control for the quality of candidates. It may well be the case that in this particular sample female candidates are less qualified. Then the coefficient on Female is biased downward, since $Cov(Job, Quality) > 0$ and $Cov(Female, Quality) < 0$.

The authors want to study whether the existing composition of an orchestra appears to affect the probability that blind auditions will be chosen. They estimate the following regression using probit (actually, the constant is not reported in the paper, so I had to make it up...). The dependent variable is a dummy equal to 1 if in a given year a screen was adopted for the auditions.

PROBIT estimates (robust standard errors in parentheses)

| Regressor | Coefficient | Probit estimate |
|---|---|---|
| Constant | $\beta_0$ | 2.5 (0.5) |
| Proportion females in previous year | $\beta_1$ | 0.490 (1.163) |
| Prop. orchestra with tenure < 6 years | $\beta_2$ | -9.467 (2.787) |
| Pseudo-$R^2$ | | 0.05 |

(b) (4 points) Using a 10% significance level, can you reject the hypothesis that $\beta_1$ is equal to zero?

Solution: No, we fail to reject the hypothesis: $t = \frac{0.49}{1.163} = 0.42 < 1.645$.

(c) (4 points) Using a 1% significance level, can you reject the hypothesis that $\beta_2$ is equal to zero?

Solution: Yes, we do reject the hypothesis: $|t| = \frac{9.467}{2.787} = 3.4 > 2.58$.

(d) (5 points) Suppose that the proportion of women in an orchestra in the previous year increases from 0 to 0.4, and that the proportion of the orchestra with tenure below six years remains equal to 0.2. What is the predicted effect on the probability that a screen will be adopted this year?

Solution:

$$\Delta \widehat{Prob} = \Phi(2.5 + 0.49 \times 0.4 - 9.467 \times 0.2) - \Phi(2.5 + 0.49 \times 0 - 9.467 \times 0.2) = \Phi(0.803) - \Phi(0.607) \approx 0.788 - 0.729 = 0.059$$
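As a cross-check on part (d), the change in the predicted probability can be computed directly from the reported probit coefficients. This is a minimal sketch with hypothetical helper names; because it evaluates the normal CDF without rounding the index to table values, its result (roughly 0.06) can differ slightly in the third decimal from a table lookup.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Reported probit coefficients: constant, proportion female, proportion with tenure < 6 years
b_const, b_female, b_tenure = 2.5, 0.490, -9.467

def prob_screen(prop_female: float, prop_low_tenure: float) -> float:
    """Predicted probability that a screen is adopted in a given year."""
    return norm_cdf(b_const + b_female * prop_female + b_tenure * prop_low_tenure)

effect = prob_screen(0.4, 0.2) - prob_screen(0.0, 0.2)
print(f"predicted change in adoption probability: {effect:.3f}")
```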

(e) (6 points) Suppose that you add another regressor to the above regression, a dummy equal to one if the orchestra is one of the so-called "big five" (Boston, Chicago, Philadelphia, NY, Cleveland). The new variable is not statistically significant at standard significance levels. Is it possible for the Pseudo-$R^2$ for this new regression to be below 0.05? Briefly justify your answer.

Solution: No. Because the MLE maximizes the likelihood function, adding another regressor to a model cannot decrease the value of the maximized likelihood, just as adding a regressor cannot increase the sum of squared OLS residuals in linear regression. In other words, without the additional regressor, the objective function is maximized under the restriction that the coefficient on this omitted variable is zero. Compare the two Pseudo-$R^2$s. For the unrestricted model,

$$\text{Pseudo-}R^2_u = 1 - \frac{LogL_u}{LogL_0}$$

and for the restricted model,

$$\text{Pseudo-}R^2_r = 1 - \frac{LogL_r}{LogL_0}$$

where $LogL_u$ and $LogL_r$ are the values of the maximized probit log-likelihoods for the unrestricted and restricted models respectively, and $LogL_0$ is the value of the maximized log-likelihood excluding all the regressors. The value of the constrained maximized likelihood can never exceed the value of its unconstrained counterpart, i.e. $LogL_r \le LogL_u$. It is important here to recall that the log-likelihood is always negative (i.e. $LogL_u$, $LogL_r$ and $LogL_0$ are all less than zero). Thus, the ratio in the formula for the Pseudo-$R^2$ cannot go up as we move from the restricted to the unrestricted model (i.e. a model with an additional regressor), and therefore $\text{Pseudo-}R^2_r \le \text{Pseudo-}R^2_u$. In this example the Pseudo-$R^2$ for the restricted model is equal to 0.05, so as we add another regressor (and therefore relax the constraint), the new Pseudo-$R^2$ will be at least 0.05.

You have observations for a large number of individuals who participated in at least one blind and at least one non-blind audition. You use a Linear Probability Model in a regression where the dependent variable is a dummy equal to one if the individual advanced to the next stage of the audition. Blind is a dummy variable equal to one if a "screen" was used during the audition (that is, if the audition was blind). The results are the following (robust standard errors in parentheses).

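| Regressor | Coefficient | (1) Fixed Effects | (2) No Fixed Effects |
|---|---|---|---|
| Blind | $\beta_0$ | 0.399 (0.027) | 0.103 (0.018) |
| Blind × Woman | $\beta_1$ | 0.041 (0.039) | -0.069 (0.022) |
| Woman | $\beta_2$ | | -0.005 (0.019) |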

In column (1), Fixed Effects estimates are obtained including individual fixed effects. The coefficients corresponding to the individual-specific dummies (the fixed effects) are not reported for brevity.

(f) (5 points) The results in column (1) do not include an estimate of $\beta_2$. Why?

Solution: To avoid perfect multicollinearity: the fixed effect absorbs any individual-specific time-invariant characteristics, and gender is one of them.

(g) (5 points) Consider a female musician. Using the Fixed Effects results (column 1), calculate the difference in the predicted probability of advancing to the next stage between a blind and a non-blind audition.

Solution: $\Delta P = P_{blind,f} - P_{non\text{-}blind,f} = 0.399 + 0.041 - 0 = 0.44$

(h) (5 points) Consider a female musician. Using the OLS results (column 2), calculate the difference in the predicted probability of advancing to the next stage between a blind and a non-blind audition.

Solution: $\Delta P = P_{blind,f} - P_{non\text{-}blind,f} = 0.103 - 0.069 - 0 = 0.034$
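The arithmetic in parts (g) and (h) follows from the interaction structure of the linear probability model: for a woman, the effect of a blind audition is the Blind coefficient plus the Blind × Woman coefficient, and the Woman main effect (or the individual fixed effect) cancels in the difference. A minimal sketch, with a hypothetical function name:

```python
def blind_effect(b_blind: float, b_blind_woman: float, woman: int) -> float:
    """Change in the predicted probability of advancing when Blind goes from 0 to 1."""
    return b_blind + b_blind_woman * woman

# Column (1): fixed effects; column (2): no fixed effects
fe_female = blind_effect(0.399, 0.041, woman=1)
ols_female = blind_effect(0.103, -0.069, woman=1)
fe_male = blind_effect(0.399, 0.041, woman=0)

print(f"FE, female: {fe_female:.3f}; OLS, female: {ols_female:.3f}; FE, male: {fe_male:.3f}")
```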

(i) (5 points) You test the null hypothesis that both $\beta_0$ and $\beta_1$ are equal to zero, using the Fixed Effects results. The F-statistic is 8. Can you reject the null using a 1% significance level?

Solution: Since the F-statistic of 8 is greater than 4.61 (the 1% critical value for the $F_{2,\infty}$ distribution), we reject the null hypothesis.

6. (36 points) You are interested in understanding whether student athletes' grade point averages (GPAs) suffer during the semester their sport is in season. As such, you have collected data on 366 student-athletes from a large midwestern research university that supports a Division 1 athletics program. You have observations for the same students in both the fall and spring semesters (i.e. a two period panel). Your dataset includes the following variables:

• termgpa - the student's GPA for that term, measured on a four point scale (mean: 2.33)

• spring - a dummy for whether the semester is the spring semester

• sat - the student's combined SAT score

• hsperc - the student's academic percentile in their high school graduating class

• female - a dummy variable equal to one if the student is female

• crsgpa - a weighted average of the overall GPA (among all students) in courses taken by this student

• season - a dummy variable equal to one if the student's sport is in season

We will maintain the assumption that all errors are homoskedastic throughout this question. You estimate the following regression using OLS (pooling observations from both semesters):

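$$\widehat{termgpa} = \underset{(.311)}{-2.15} - \underset{(.045)}{.057}\, spring + \underset{(.0001)}{.002}\, sat - \underset{(.001)}{.008}\, hsperc + \underset{(.051)}{.366}\, female + \underset{(.096)}{1.06}\, crsgpa - \underset{(.049)}{.035}\, season$$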

(a) (3 points) Based on this regression, do student-athletes' GPAs suffer when their sport is in season? Justify your answer.

Solution: The coefficient on season implies that, holding all other variables constant, we expect the GPAs of student-athletes to be .035 points lower when their sport is in season. Although negative, this is a very small effect (only 1.5% of the mean). Moreover, since the t-statistic $t = \frac{-.035}{.049} = -.71$ is smaller in magnitude than even the 10% critical value (1.64), we conclude that the differential is not statistically significant.

(b) (3 points) What is the estimated GPA differential between females and males? Is it statistically significant at the 1% level?

Solution: The coefficient on female implies that, holding all other variables constant, we expect the GPAs of female student-athletes to be .366 points higher than those of males. Since the t-statistic $t = \frac{.366}{.051} = 7.18$ is greater than 2.58, the estimated differential is significant at the 1% level.

(c) (3 points) Since you have panel data (two semesters for each student athlete), you decide to estimate the model using Random Effects as well. Here is the output you obtain:

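$$\widehat{termgpa} = \underset{(.313)}{-2.22} - \underset{(.034)}{.063}\, spring + \underset{(.0001)}{.002}\, sat - \underset{(.001)}{.008}\, hsperc + \underset{(.061)}{.369}\, female + \underset{(.092)}{1.09}\, crsgpa - \underset{(.039)}{.047}\, season$$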

Based on this RE regression, do student-athletes' GPAs suffer when their sport is in season? Justify your answer.

Solution: The coefficient on season implies that, holding all other variables constant, we expect the GPAs of student-athletes to be .047 points lower when their sport is in season. Again, while negative in sign, this is a very small effect. Moreover, since the t-statistic $t = \frac{-.047}{.039} = -1.21$ is still smaller in magnitude than the 10% critical value (1.64), we conclude that the differential is not statistically significant.

(d) (4 points) Write out the population regression model (including the errors) for the RE estimator used above. What must be true about the error terms for the RE estimator to yield consistent estimates of the coefficients?

Solution: The population regression model is given by

$$termgpa_{it} = \beta_0 + \beta_1 spring_t + \beta_2 sat_i + \beta_3 hsperc_i + \beta_4 female_i + \beta_5 crsgpa_{it} + \beta_6 season_{it} + \alpha_i + u_{it}$$

In order for the RE estimator to yield consistent estimates, both the individual effect $\alpha_i$ and the idiosyncratic error $u_{it}$ must be uncorrelated with the regressors.

(e) (5 points) Most of the athletes who play their sport only in the fall are football players. Suppose the ability levels of football players differ systematically from those of other athletes (i.e. they have lower ability). If ability is not fully captured by SAT score and high school percentile, do you think the OLS estimate of the coefficient on season will be biased? What do you think the sign of that bias will be? Why?

Solution: If omitted ability is correlated with season then, as we know from Chapter 5, OLS is biased and inconsistent, so the coefficient on season will be biased. However, the sign of the bias is difficult to determine since we are pooling across semesters. First, suppose we used only the fall term, when football is in season. Then the error term and season would be negatively correlated, which produces a downward bias in the OLS estimator of $\beta_6$. Because $\beta_6$ is hypothesized to be negative, an OLS regression using only the fall data would overstate the size of the negative effect. However, if we use just the spring semester, the bias is in the opposite direction because ability and season would be positively correlated (more academically able athletes are in season in the spring). When we pool the two semesters we cannot, without a much more detailed analysis, determine which bias will dominate.

(f) (3 points) Since you are concerned with unobserved ability, you decide to use a fixed effects model to estimate the impact of season on termgpa. How will the fixed effects estimator allow you to control for unobserved ability?

Solution: Using the fixed effects estimator will allow us to estimate a separate $\alpha_i$ for each student athlete. This allows us to control for the unobserved aspects of each individual that do not change over time. If we assume that innate ability is constant over time, then we will be controlling for ability by including this fixed effect.

(g) (5 points) When you use the FE estimator, you obtain the following results:

Dependent variable: termgpa

| Regressor | Coefficient (standard error) |
|---|---|
| spring | -.069 (.034) |
| crsgpa | 1.14 (.118) |
| season | -.057 (.041) |

Fixed effects included. Standard errors in parentheses.

Notice that the variables sat, hsperc, and female have been dropped from the estimation. Why? Why weren't these variables dropped from the RE regression?

Solution: The variables sat, hsperc, and female must be dropped from the FE regression since they do not vary over time for a particular student. Therefore, the same differencing procedure that purges $\alpha_i$ will get rid of them as well. Alternatively, if you are estimating the FE regression with student dummies, including these three variables would lead to perfect multicollinearity. Since the RE estimator does not explicitly include student-specific intercepts (i.e. dummies), there is no collinearity problem. Additional note: Unlike the FE estimator, the goal of the RE estimator isn't to eliminate $\alpha_i$, but to use its presence to improve efficiency. In particular, the RE estimator simply requires that $\alpha_i$ be uncorrelated with the included regressors, but does not require that those regressors vary over time.
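Why time-invariant regressors drop out of the FE regression can be seen by applying the within (entity-demeaning) transformation to a tiny panel. The two students and their values below are made up purely for illustration; only the variable names come from the problem.

```python
# Hypothetical two-period panel (fall, spring) for two students
panel = {
    "student_1": {"termgpa": [2.9, 2.5], "season": [0, 1], "female": [1, 1]},
    "student_2": {"termgpa": [3.1, 3.3], "season": [1, 0], "female": [0, 0]},
}

def demean(values):
    """Subtract the student's own time average from each observation."""
    avg = sum(values) / len(values)
    return [v - avg for v in values]

# Within transformation: demean every variable student by student
within = {
    student: {name: demean(values) for name, values in record.items()}
    for student, record in panel.items()
}

female_within = [d for record in within.values() for d in record["female"]]
season_within = [d for record in within.values() for d in record["season"]]

print("demeaned female:", female_within)   # all zeros: no variation left to estimate from
print("demeaned season:", season_within)   # nonzero: the coefficient is identified
```

After demeaning, female is identically zero, so its coefficient cannot be estimated, while season still varies within each student.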

(h) (3 points) Based on the FE regression in part (g), do student-athletes' GPAs suffer when their sport is in season? Justify your answer.

Solution: The coefficient on season implies that, holding all other variables constant, we expect the GPAs of student-athletes to be .057 points lower when their sport is in season. Once again, although negative, this is a very small effect. Also, since the t-statistic $t = \frac{-.057}{.041} = -1.39$ is still smaller in magnitude than the 10% critical value, we conclude that the differential is not statistically significant.

(i) (3 points) A Hausman test comparing the FE and RE estimators above produces a test statistic equal to .88. What do you conclude about the validity of the RE assumptions in this setting?

Solution: Under the null hypothesis that the RE assumptions hold, the Hausman test statistic is distributed $\chi^2_M$, where M is the number of coefficients being compared (those on the time-varying regressors, which are the only ones FE can estimate). M here is equal to 3 and the 10% critical value of the $\chi^2_3$ is 6.25, so we cannot reject the null hypothesis that the coefficient estimates are the same across specifications. Therefore, we conclude that it is appropriate to use the RE estimator in this case.

(j) (4 points) Suppose you decide to drop the three variables sat, hsperc, and female and re-estimate the RE model using the remaining variables. Here is the output you obtain.

$$\widehat{termgpa} = \underset{(.286)}{-.520} - \underset{(.034)}{.046}\, spring + \underset{(.102)}{1.04}\, crsgpa - \underset{(.040)}{.009}\, season$$

The Hausman test comparing this RE estimator to the FE estimator from part (g) yields a test statistic equal to 32.13. What do you conclude about the validity of the RE assumptions in this new setting? Do you reach the same conclusion as in part (i)? Explain why you do or do not.

Solution: The Hausman test statistic is distributed $\chi^2_M$, where M is the number of coefficients being compared (those on the time-varying regressors). M here is again equal to 3 and the 1% critical value of the $\chi^2_3$ is 11.34, so we can easily reject the null hypothesis that the coefficient estimates are the same across specifications. This is strong evidence that the RE assumptions are invalid. This is the opposite of what we concluded in part (i). Note that we are no longer controlling for variation in sat, hsperc, and female. This variation is now part of the error and may well be correlated with the remaining regressors. The coefficient estimates on the remaining regressors seem quite different from those of the FE estimator (for season, -.009 versus -.057), so we should not be surprised by the results of the Hausman test.
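The two Hausman decisions in parts (i) and (j) can be checked with the upper-tail probability of a chi-squared variable with 3 degrees of freedom. The sketch below uses a closed form that is valid for exactly 3 degrees of freedom (an assumption specific to this example); the function name is hypothetical.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(chi2 with 3 df > x), closed form for 3 df only."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Sanity check against the critical values quoted in the solutions
tail_at_10pct_cv = chi2_sf_df3(6.25)    # should be close to 0.10
tail_at_1pct_cv = chi2_sf_df3(11.34)    # should be close to 0.01

p_part_i = chi2_sf_df3(0.88)    # Hausman statistic with sat, hsperc, female in the RE model
p_part_j = chi2_sf_df3(32.13)   # Hausman statistic after dropping them

print(f"part (i): p = {p_part_i:.3f} (do not reject RE)")
print(f"part (j): p = {p_part_j:.2e} (reject RE)")
```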

2. (27 points) Consider the following two-equation system:

$$Y_1 = \alpha_0 + \alpha_1 X_1 + u_1 \qquad (9)$$

$$Y_2 = \beta_0 + \beta_1 Y_1 + u_2 \qquad (10)$$

(a) (3 points) Under what condition(s) will simple OLS yield consistent estimates for the parameters $\alpha_0$ and $\alpha_1$ in equation (9)?

Solution: For equation (9), we simply need OLS Assumption 1 to hold: $E(u_{1i} \mid X_{1i}) = 0$. In fact, even the weaker condition $Cov(X_{1i}, u_{1i}) = 0$ is sufficient.

(b) (3 points) Under what condition(s) will simple OLS yield consistent estimates for the parameters $\beta_0$ and $\beta_1$ in equation (10)? Does this place any restrictions on the relationship between $u_2$, $u_1$, and $X_1$?

Solution: For equation (10), we again need OLS Assumption 1 to hold: $E(u_{2i} \mid Y_{1i}) = 0$. This alone is a sufficient condition for OLS to yield consistent estimates. However, because $Y_1$ is determined by the other equation, it does place some restrictions on the variables in that equation, namely that $Cov(u_{2i}, u_{1i}) = 0$ and $Cov(u_{2i}, X_{1i}) = 0$.

(c) (10 points) Suppose that, under the conditions you described above for equation (10), a researcher decides instead to use 2SLS to estimate (10). That is, you first regress $Y_1$ on $X_1$ using simple OLS and then regress $Y_2$ on the fitted values $\hat{Y}_1$ from the first step. Will this procedure yield a consistent estimate of $\beta_1$? Justify your answer. (Hint: you should follow the same steps we used to establish consistency of the 2SLS estimator in class.)

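Solution: Recall the trick we used in class. Since $\hat{Y}_{1i} = \hat{\alpha}_0 + \hat{\alpha}_1 X_{1i}$ and $\bar{\hat{Y}}_1 = \hat{\alpha}_0 + \hat{\alpha}_1 \bar{X}_1$, we know that $\hat{Y}_{1i} - \bar{\hat{Y}}_1 = \hat{\alpha}_1 (X_{1i} - \bar{X}_1)$. Now, using OLS to estimate equation (9) yields

$$\hat{\alpha}_1 = \frac{\sum (X_{1i} - \bar{X}_1)(Y_{1i} - \bar{Y}_1)}{\sum (X_{1i} - \bar{X}_1)^2}$$

The second stage estimator is given by

$$\begin{aligned}
\hat{\beta}_1 &= \frac{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1)(Y_{2i} - \bar{Y}_2)}{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1)^2} = \frac{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1) Y_{2i}}{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1)^2} = \frac{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1)(\beta_0 + \beta_1 Y_{1i} + u_{2i})}{\sum (\hat{Y}_{1i} - \bar{\hat{Y}}_1)^2} \\
&= \frac{\hat{\alpha}_1 \sum (X_{1i} - \bar{X}_1)(\beta_0 + \beta_1 Y_{1i} + u_{2i})}{\hat{\alpha}_1^2 \sum (X_{1i} - \bar{X}_1)^2} \\
&= \frac{\beta_0 \sum (X_{1i} - \bar{X}_1)}{\hat{\alpha}_1 \sum (X_{1i} - \bar{X}_1)^2} + \frac{\beta_1 \sum (X_{1i} - \bar{X}_1) Y_{1i}}{\hat{\alpha}_1 \sum (X_{1i} - \bar{X}_1)^2} + \frac{\sum (X_{1i} - \bar{X}_1) u_{2i}}{\hat{\alpha}_1 \sum (X_{1i} - \bar{X}_1)^2} \\
&= \beta_1 + \frac{\sum (X_{1i} - \bar{X}_1) u_{2i}}{\hat{\alpha}_1 \sum (X_{1i} - \bar{X}_1)^2} \xrightarrow{p} \beta_1 + \frac{Cov(X_1, u_2)}{\alpha_1 Var(X_1)} = \beta_1
\end{aligned}$$

(as long as $\alpha_1 \neq 0$, since $Cov(X_1, u_2) = 0$ by assumption). In the next-to-last step, the first term vanishes because $\sum (X_{1i} - \bar{X}_1) = 0$, and the second term equals $\beta_1$ because $\sum (X_{1i} - \bar{X}_1) Y_{1i} \big/ \sum (X_{1i} - \bar{X}_1)^2 = \hat{\alpha}_1$.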

(d) (8 points) Someone who has taken Econ 139 in the past suggests that yet another way to obtain a consistent estimator of $\beta_1$ is to use indirect least squares (ILS). This person suggests first regressing $Y_1$ on $X_1$, then regressing $Y_2$ on $X_1$, and then using the results from both regressions to construct an estimate of $\beta_1$. Show how you can use these two regressions to obtain a consistent estimate of $\beta_1$. Justify your answer.

Solution: Plugging equation (9) into equation (10) yields

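$$Y_2 = \beta_0 + \beta_1 Y_1 + u_2 \;\Rightarrow\; Y_2 = \beta_0 + \beta_1 (\alpha_0 + \alpha_1 X_1 + u_1) + u_2 \;\Rightarrow\; Y_2 = \beta_0 + \beta_1 \alpha_0 + \beta_1 \alpha_1 X_1 + \beta_1 u_1 + u_2$$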

So regressing $Y_2$ on $X_1$ yields a consistent estimate of $\beta_1 \alpha_1$ (provided $E(\beta_1 u_1 + u_2 \mid X_1) = 0$). While you can show that the estimator is consistent by constructing the ratio of the two OLS formulas themselves and proceeding as in the previous question, it is far easier to use what we already know about each OLS estimator and the known properties of convergence in probability. Since a regression of $Y_1$ on $X_1$ yields a consistent estimate of $\alpha_1$ (provided $E(u_1 \mid X_1) = 0$), we can recover an estimate of $\beta_1$ by constructing the ratio of the two slope coefficients. Since $\hat{\alpha}_1 \xrightarrow{p} \alpha_1$ and $\widehat{\beta_1 \alpha_1} \xrightarrow{p} \beta_1 \alpha_1$,

$$\frac{\widehat{\beta_1 \alpha_1}}{\hat{\alpha}_1} \xrightarrow{p} \frac{\beta_1 \alpha_1}{\alpha_1} = \beta_1.$$
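The results of parts (c) and (d) can be illustrated with a small simulation in which OLS, 2SLS and ILS are all applied to equation (10). The sketch below writes the coefficients of equation (9) as alpha and those of equation (10) as beta (a notational assumption), and the parameter values and error distributions are made up for illustration.

```python
import random

random.seed(1)
alpha0, alpha1 = 1.0, 2.0   # hypothetical coefficients of equation (9)
beta0, beta1 = 0.5, 1.5     # hypothetical coefficients of equation (10)
n = 20000

x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
u1 = [random.gauss(0.0, 1.0) for _ in range(n)]
u2 = [random.gauss(0.0, 1.0) for _ in range(n)]   # independent of x1 and u1
y1 = [alpha0 + alpha1 * x + e for x, e in zip(x1, u1)]
y2 = [beta0 + beta1 * y + e for y, e in zip(y1, u2)]

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    x_bar = mean(x)
    num = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

a1_hat = slope(x1, y1)                       # first stage slope
a0_hat = mean(y1) - a1_hat * mean(x1)        # first stage intercept
y1_fit = [a0_hat + a1_hat * x for x in x1]   # fitted values used in the second stage

tsls = slope(y1_fit, y2)          # 2SLS: regress Y2 on the fitted values
ils = slope(x1, y2) / a1_hat      # ILS: ratio of the two slopes on X1
ols = slope(y1, y2)               # OLS: consistent here because Cov(Y1, u2) = 0

print(f"OLS = {ols:.4f}, 2SLS = {tsls:.4f}, ILS = {ils:.4f}, true beta1 = {beta1}")
```

In this exactly identified setup 2SLS and ILS are numerically the same estimator, and all three estimates land close to the true slope; OLS typically has the smallest sampling variability.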

(e) (3 points) Is there any reason to prefer simple OLS to these two alternative procedures?

Solution: If the OLS assumptions are satisfied, OLS will be more efficient than 2SLS (since, in this case, we are adding unnecessary noise by using $\hat{Y}_1$ instead of $Y_1$). Also, 2SLS is generally biased in small samples, while OLS is not.

3. (22 points total) Workers in the U.S. have several options for saving money for their retirement. Two primary options are contributing to an individual retirement account (IRA) and participating in a 401(k) plan. While everyone is eligible to contribute savings to an IRA, you can only participate in a 401(k) plan if your employer offers one (that is, only if you work for a firm that chooses to offer one as part of the overall compensation package). The goal of this exercise is to test whether there is a trade-off between participating in a 401(k) plan and having an IRA (some economists have claimed that 401(k) plans crowd out IRAs). In particular, we are interested in whether participating in a 401(k) plan makes a person less likely to participate in an IRA.

To explore this issue, you propose estimating the following linear probability model (LPM)

$$pira = \beta_0 + \beta_1 p401k + \beta_2 inc + \beta_3 inc^2 + \beta_4 age + \beta_5 age^2 + u \qquad (1)$$

where pira is a dummy variable indicating that a worker contributes to an IRA, p401k is a dummy variable indicating whether a worker participates in a 401(k) plan, inc is the worker's annual income, and age is the worker's age. We are primarily interested in the coefficient $\beta_1$.

(a) (3 points) OLS estimation of this LPM yields

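$$\widehat{pira} = \underset{(.069)}{-.198} + \underset{(.010)}{.054}\, p401k + \underset{(.0005)}{.0087}\, inc - \underset{(.000004)}{.000023}\, inc^2 - \underset{(.0033)}{.0016}\, age + \underset{(.00004)}{.00012}\, age^2$$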

What is the estimated effect of p401k? Is it statistically significant at the 1% level?

Solution: The coefficient on p401k implies that participation in a 401(k) plan is associated with a .054 higher probability of having an IRA, holding income and age fixed. We can conclude that it is statistically significant at the 1% level since $t = \frac{.054}{.010} = 5.4 > 2.58$.

(b) (4 points) What, if anything, is wrong with using OLS to estimate (1)? Hint: think about the exogeneity of p401k.

Solution: While regression (1) controls for income and age, it does not account for the fact that different people have different tastes for saving, even within given income and age categories. People who tend to be savers will tend to have both a 401(k) plan and an IRA. (This means that the error term, u, is positively correlated with p401k.) What we would like to know is whether, for a given person, participating in a 401(k) makes it less likely or more likely that the person also has an IRA. This question is difficult to answer using OLS without having many more controls for the taste for saving.

(c) (4 points) The variable e401k is a binary variable equal to one if a worker is eligible to participate in a 401(k) plan. Explain what is required for e401k to be a valid instrumental variable (IV) for p401k. Do these assumptions seem reasonable?

Solution: To be a valid IV for p401k, e401k must be both relevant and exogenous. For relevance, we need e401k to be correlated with p401k; not surprisingly, this is not an issue, as being eligible for a 401(k) plan is, by definition, necessary for participation. (The regression in part (d) verifies that they are strongly positively correlated.) The more difficult issue is whether e401k can be taken as exogenous in the structural model. In other words, is being eligible for a 401(k) correlated with unobserved taste for saving? If we think workers who like to save for retirement will match up with employers that provide vehicles for retirement saving, then u and e401k would be positively correlated. Certainly we think that e401k is less correlated with u than is p401k.

(d) (7 points) Estimating the reduced form (first stage) equation for p401k by OLS yields

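$$\widehat{p401k} = \underset{(.049)}{.059} + \underset{(.008)}{.689}\, e401k + \underset{(.0003)}{.0011}\, inc - \underset{(.0000027)}{.0000018}\, inc^2 - \underset{(.0022)}{.0047}\, age + \underset{(.000026)}{.000052}\, age^2$$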

What is the estimated impact of e401k on p401k? Is it statistically significant at the 1% level? What do you conclude about the relevance of e401k as an instrument for p401k? Is it exogenous?

Solution: The coefficient estimate for e401k implies that, holding income and age fixed, eligibility for a 401(k) plan increases the probability of participating in a 401(k) by .69. The t-statistic $t = \frac{.689}{.008} = 86.1$ establishes that it is significant at the 1% level (it is well above 2.58). Moreover, since the F-statistic $F = t^2 \approx 7418$ (using the unrounded t) is well above 10 (our rule of thumb cut-off), our instrument is very strong. Clearly, e401k passes one of the two requirements as an IV for p401k. Unfortunately, we cannot test the exogeneity of e401k because the model is exactly identified.
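With a single instrument, the first-stage F-statistic is just the square of the t-statistic on that instrument, so the relevance check reduces to a couple of lines. A minimal sketch using the reported first-stage estimates; the threshold of 10 is the rule of thumb cited in the solution, and the function name is hypothetical.

```python
def t_statistic(coef: float, se: float, null_value: float = 0.0) -> float:
    """t-statistic for H0: coefficient equals null_value."""
    return (coef - null_value) / se

# First-stage coefficient on e401k and its standard error
t_first_stage = t_statistic(0.689, 0.008)
f_first_stage = t_first_stage ** 2        # valid because there is exactly one instrument
strong_instrument = f_first_stage > 10    # rule-of-thumb cut-off

print(f"t = {t_first_stage:.1f}, F = {f_first_stage:.0f}, strong instrument: {strong_instrument}")
```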

(e) (4 points) We now estimate equation (1) by 2SLS and obtain the following results

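$$\widehat{pira} = \underset{(.065)}{-.207} + \underset{(.013)}{.021}\, p401k + \underset{(.0005)}{.0090}\, inc - \underset{(.000004)}{.000024}\, inc^2 - \underset{(.0032)}{.0011}\, age + \underset{(.00004)}{.00011}\, age^2$$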

What do you conclude about the trade-off between participating in a 401(k) plan and participating in an IRA? Did your conclusions change from part (a)?

Solution: The IV estimate of the coefficient on p401k is less than half as large as the OLS estimate, and the IV estimate has a t-statistic $t = \frac{.021}{.013} = 1.62$, so it isn't significant at the standard levels. A reduction in the coefficient on p401k is what we expect given the unobserved taste for saving argument made in part (b). But we still do not estimate a trade-off between participating in a 401(k) plan and participating in an IRA, since the point estimate remains positive. This conclusion has prompted some in the economics literature to claim that 401(k) saving is additional saving; it does not simply crowd out saving in other plans.

4. (24 points total) Suppose the (inverse) supply function for the monthly growth in cement price (gprc) as a function of growth in quantity (gcem) is given by

gprct = �0 + �1gcemt + �2gprcpett + �3febt + :::+ �13dect + ut (1)

where gprcpet; the growth in the price of petroleum (a key input to the production of cement), is assumed tobe exogenous and feb; :::; dec are dummy variables for the months of the year.

(a) (2 points) Why is there no January dummy in equation (1)?

Solution: It would lead to perfect multicollinearity.

(b) (2 points) What signs do you expect for $\beta_1$ and $\beta_2$?

Solution: $\beta_1 > 0$ (supply curves should be upward sloping) and $\beta_2 > 0$ (more expensive inputs should increase the price of outputs).

(c) (6 points) Estimation of equation (1) by OLS yields

$\widehat{gprc} = \underset{(.0058)}{.0144} - \underset{(.0127)}{.0443}\, gcem + \underset{(.0256)}{.0628}\, gprcpet$

where the month dummies have been suppressed for brevity. What does the estimate of $\beta_1$ imply about the supply curve? Is this surprising? What, if anything, is wrong with using OLS to estimate equation (1)?

Solution: The estimated supply curve slopes down, not up, and the coefficient on $gcem_t$ is statistically significant at the 1% level ($t = -.0443/.0127 = -3.49$). This contradicts standard economic theory. However, since this is a supply curve and prices and quantities are determined in equilibrium, it is very likely that our results are tainted by simultaneity bias.


(d) (3 points) The variable grdefs is the monthly growth in real defense spending in the United States. Estimation of the reduced form (first stage) equation yields

$\widehat{gcem} = \underset{(.0296)}{-.2482} - \underset{(3.255)}{1.054}\, grdefs + \underset{(.0909)}{.0670}\, gprcpet$

where the month dummies have been suppressed for brevity. What does this regression tell you about the value of grdefs as an instrument for gcem?

Solution: The coefficient on grdefs is $-1.054$ with t-statistic $t = -1.054/3.255 = -0.32$. We cannot reject $H_0: \beta_{grdefs} = 0$ at any reasonable significance level, and $F = t^2 = (-0.32)^2 = .102$ is well under our rule of thumb of 10. We conclude that grdefs is not a useful IV for gcem (even if grdefs is exogenous in the demand equation).


e. (3 points) Two additional instruments available to us are the growth in output of residential (grres) and nonresidential (grnon) construction. These are demand shifters that should be roughly uncorrelated with the supply error. Estimation of the reduced form (first stage) equation yields

$\widehat{gcem} = \underset{(.0267)}{-.2437} + \underset{(.1280)}{.1361}\, grres + \underset{(.2887)}{1.145}\, grnon + \underset{(.0938)}{.0369}\, gprcpet$

where the month dummies have been suppressed for brevity and an F-test of the joint significance of the two IVs yields an F-statistic of 16.9. What do you conclude about the relevance of these instruments?

Solution: Our F-statistic satisfies $F > 10$, so it passes our rule-of-thumb cut-off. Our instruments appear to be both relevant and reasonably strong.

f. (3 points) You are concerned about the exogeneity of the two instruments proposed in part e, so you perform a standard test of the over-identifying restrictions, which yields a J-statistic $J = 0.178$. What do you conclude?

Solution: Here we have two instruments (grres, grnon) and one endogenous variable (gcem), so (in the notation of Stock and Watson) $m = 2$ and $k = 1$, meaning that the J-statistic has a $\chi^2_{m-k} = \chi^2_1$ distribution. Since the 10% critical value for the $\chi^2_1$ distribution is 2.71, we are unable to reject the null hypothesis that the instruments are exogenous. This is, of course, a good thing.
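The over-identification test above can be reproduced with the standard library alone. The sketch below relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)); the function name is hypothetical.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square(1) variable."""
    # X = Z^2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))

j_stat = 0.178      # J-statistic from the problem
crit_10pct = 2.71   # 10% critical value of chi-square(1), as quoted above

p_value = chi2_sf_df1(j_stat)
reject_exogeneity = j_stat > crit_10pct

print(f"p-value = {p_value:.3f}, reject H0 at 10%: {reject_exogeneity}")
```

A large p-value here means the data give no evidence against exogeneity; it does not prove that both instruments are valid.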

g. (5 points) Using the two instruments above (grres, grnon) to estimate equation (1) by 2SLS yields

$\widehat{gprc} = \underset{(.0073)}{.0228} - \underset{(.0277)}{.0106}\, gcem + \underset{(.0157)}{.0605}\, gprcpet$

where the month dummies have been suppressed for brevity. Construct a 95% confidence interval for $\beta_1$. What do you conclude about the supply curve now?

Solution: A 95% confidence interval for $\beta_1$ is given by $\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1) = -.0106 \pm 1.96 \times .0277 = [-.065, .044]$. While the coefficient on $gcem_t$ is still negative, it is only about one-fourth the size of the OLS coefficient, and it is now very insignificant ($t = -.0106/.0277 = -.38$). At this point we would conclude that the static supply function is horizontal (with gprc on the vertical axis, the standard convention).
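The interval and t-ratio above follow from two numbers. A small sketch, with a hypothetical helper name and the normal critical value 1.96 as the default:

```python
def conf_interval(estimate: float, std_err: float, crit: float = 1.96):
    """Symmetric large-sample confidence interval: estimate +/- crit * SE."""
    half_width = crit * std_err
    return estimate - half_width, estimate + half_width

beta1_2sls = -0.0106   # 2SLS coefficient on gcem
se_2sls = 0.0277       # its standard error

lo, hi = conf_interval(beta1_2sls, se_2sls)
t_stat = beta1_2sls / se_2sls

print(f"95% CI: [{lo:.3f}, {hi:.3f}], t = {t_stat:.2f}")
```

Because the interval contains zero, the 2SLS estimate is consistent with a flat supply curve, which is the conclusion drawn in the solution.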


9 Program Evaluation - Differences-in-Differences

1. 15 points overall. In 1992, there was an increase in the (state) minimum wage in New Jersey, but not in a neighboring location (eastern Pennsylvania). To calculate $\hat{\beta}_1^{\text{diffs-in-diffs}}$ you need the change in the treatment group and the change in the control group. To do this, the study provides you with the following information:

                     Pennsylvania   New Jersey
Employment before    23.33          20.44
Employment after     21.17          21.03

The numbers are average employment per restaurant.

(a) (3 points) Calculate the change in the treatment group.

Solution: $21.03 - 20.44 = +0.59$

(b) (3 points) Calculate the change in the control group.

Solution: $21.17 - 23.33 = -2.16$

(c) (3 points) Calculate the difference-in-differences estimator $\hat{\beta}_1^{\text{diffs-in-diffs}}$.

Solution: $\hat{\beta}_1^{\text{diffs-in-diffs}} = 0.59 - (-2.16) = 2.75$

(d) (3 points) Since minimum wages represent a price floor, did you expect $\hat{\beta}_1^{\text{diffs-in-diffs}}$ to be positive or negative? How do your expectations compare with the above results?

Solution: According to standard economic theory, we expect $\hat{\beta}_1^{\text{diffs-in-diffs}}$ to be negative. Because the minimum wage acts as a price floor, we expect employment in the treatment group to decrease after a minimum wage hike relative to the control group. The estimate above is positive, so it goes against this expectation.

(e) (3 points) The standard error for $\hat{\beta}_1^{\text{diffs-in-diffs}}$ is 1.36. Test whether or not the coefficient is statistically significant, given that there are 410 observations.

Solution: $t = (2.75 - 0)/1.36 = 2.02$. Since $2.02 > 1.96$, $\hat{\beta}_1^{\text{diffs-in-diffs}}$ is significant at the 5% level.
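The whole calculation in parts (a) to (e) fits in a few lines. The sketch below uses a hypothetical helper, `diff_in_diff`, and the figures from the table; the 1.96 cut-off assumes the usual large-sample normal approximation.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Return (change in treatment, change in control, diffs-in-diffs estimate)."""
    d_treat = treat_after - treat_before
    d_control = control_after - control_before
    return d_treat, d_control, d_treat - d_control

# New Jersey is the treatment group, Pennsylvania the control group.
d_nj, d_pa, did = diff_in_diff(20.44, 21.03, 23.33, 21.17)

std_err = 1.36              # standard error given in part (e)
t_stat = did / std_err      # test of H0: diffs-in-diffs effect = 0

print(f"change NJ = {d_nj:.2f}, change PA = {d_pa:.2f}, DiD = {did:.2f}, t = {t_stat:.2f}")
```

The same helper applies to any two-group, two-period table of averages.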

2. (13 points total) On the last problem set, you analyzed the effect of a minimum wage increase using a quasi-experiment for two adjacent states: New Jersey and Pennsylvania. In particular, you calculated a Diffs-in-Diffs estimate by comparing average employment changes per restaurant between a treatment group (New Jersey) and a control group (Pennsylvania). However, the authors of the original study also provided data on the employment changes between "low wage" restaurants and "high wage" restaurants in New Jersey only. A restaurant was classified as "low wage" if the starting wage in the first wave of surveys was at the then prevailing minimum wage of $4.25. A "high wage" restaurant was a restaurant with a starting wage close to or above the $5.25 minimum wage after the increase.

(a) (4 points) Explain why employment changes of the "high wage" and "low wage" restaurants might constitute a quasi-experiment. Which is the treatment group and which the control group?

Solution: In the above example, the increase in wages ("treatment") occurs not because of changes in the demand or supply of labor, but because of an external event, namely the raising of the minimum wage in New Jersey. This is therefore a good example of a "natural experiment." The treatment group is the "low wage" restaurants, since the wages there are actually changed. The "high wage" restaurants are the control group.

(b) (6 points) The following information is provided

                     Low Wage   High Wage
Employment before    19.56      22.25
Employment after     20.88      20.21

where the numbers are average employment per restaurant. Calculate the change in the treatment group, the change in the control group, and finally $\hat{\beta}_1^{\text{Diffs-in-Diffs}}$. Since minimum wages represent a price floor, did you expect $\hat{\beta}_1^{\text{Diffs-in-Diffs}}$ to be positive or negative?

Solution: The change in the treatment group is $20.88 - 19.56 = +1.32$, the change in the control group is $20.21 - 22.25 = -2.04$, so

$\hat{\beta}_1^{\text{Diffs-in-Diffs}} = 1.32 - (-2.04) = 3.36$

According to standard economic theory, we expect $\hat{\beta}_1^{\text{Diffs-in-Diffs}}$ to be negative (higher minimum wages should reduce employment since firms now have higher costs).

(c) (3 points) The standard error for $\hat{\beta}_1^{\text{Diffs-in-Diffs}}$ is 1.48. Test whether or not this is statistically significant at the 5% level, given that there are 174 observations.

Solution: The t-statistic is $3.36/1.48 = 2.27$, so the coefficient is statistically significant at the 5% level.


10 Time Series

1. Consider a time series $Y_t$. You think that this time series may be described either by Model 1 or by Model 2. The two models are as follows:

Model 1: $Y_t = \beta Y_{t-1} + u_t$; $u_t \sim$ iid with $E(u_t) = 0$; $Var(Y_t) = \sigma_Y^2$ for each $t$; $Cov(u_t, u_{t-j}) = 0$ unless $j = 0$; $0 < \beta < 1$

Model 2: $Y_t = \varepsilon_t + \theta \varepsilon_{t-1}$; $\varepsilon_t \sim$ iid with $E(\varepsilon_t) = 0$; $Var(\varepsilon_t) = \sigma_\varepsilon^2$ for each $t$; $Cov(\varepsilon_t, \varepsilon_{t-j}) = 0$ unless $j = 0$

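(a) (6 points) Prove that, for Model 1, $Y_t = \beta^2 Y_{t-2} + \beta u_{t-1} + u_t = \beta^3 Y_{t-3} + \beta^2 u_{t-2} + \beta u_{t-1} + u_t$.

Solution: Note that the model implies $Y_{t-1} = \beta Y_{t-2} + u_{t-1}$ and $Y_{t-2} = \beta Y_{t-3} + u_{t-2}$. Then

$Y_t = \beta Y_{t-1} + u_t = \beta(\beta Y_{t-2} + u_{t-1}) + u_t$

$= \beta(\beta(\beta Y_{t-3} + u_{t-2}) + u_{t-1}) + u_t$

$= \beta^3 Y_{t-3} + \beta^2 u_{t-2} + \beta u_{t-1} + u_t$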

(b) (6 points) The population autocorrelation of order $j$ of a time series $Y_t$ can be calculated as

$\rho_j = \frac{Cov(Y_t, Y_{t-j})}{Var(Y_t)}.$

Calculate the autocorrelation of order 1, 2, and 3 for the time series $Y_t$ if Model 1 is correct. (Hint: use the result from part (a), and remember that the problem tells you that for each $t$, $Var(Y_t) = \sigma_Y^2$.)

Solution: In Model 1, $Cov(Y_t, Y_{t-1}) = Cov(\beta Y_{t-1} + u_t, Y_{t-1}) = \beta Var(Y_{t-1})$, because $u_t$ is uncorrelated with past values of $Y$. From the previous part, $Cov(Y_t, Y_{t-2}) = Cov(\beta^2 Y_{t-2} + \beta u_{t-1} + u_t, Y_{t-2}) = \beta^2 Var(Y_{t-2})$ and, analogously, $Cov(Y_t, Y_{t-3}) = \beta^3 Var(Y_{t-3})$. Since $Y_t$ is a covariance stationary process, $Var(Y_{t-i}) = Var(Y_{t-j})$ for all $i, j$. So the autocorrelations are

$\rho_1 = \beta, \quad \rho_2 = \beta^2, \quad \rho_3 = \beta^3$

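(c) (6 points) Calculate the autocorrelation of order 1, 2, and 3 for the time series $Y_t$ if Model 2 is correct. (Hint: write down each $Y$ in terms of the corresponding iid components $\varepsilon$.)

Solution: In Model 2, $Cov(Y_t, Y_{t-1}) = Cov(\varepsilon_t + \theta \varepsilon_{t-1}, \varepsilon_{t-1} + \theta \varepsilon_{t-2}) = \theta \sigma_\varepsilon^2$, as $\varepsilon_t$ is iid. Higher order covariances are zero, since for example $Cov(Y_t, Y_{t-2}) = Cov(\varepsilon_t + \theta \varepsilon_{t-1}, \varepsilon_{t-2} + \theta \varepsilon_{t-3}) = 0$ by independence of $\varepsilon_t$. Also, $Var(Y_t) = Var(\varepsilon_t + \theta \varepsilon_{t-1}) = (1 + \theta^2)\sigma_\varepsilon^2$. So the autocorrelations are

$\rho_1 = \theta/(1 + \theta^2), \quad \rho_2 = 0, \quad \rho_3 = 0$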

(d) (5 points) For both Model 1 and Model 2, draw two separate graphs with $j$ on the horizontal axis, with $j = 1, 2, 3$, and $\rho_j$ on the vertical axis. Comment on the difference between the two graphs.

Solution: The graph for Model 1 looks like a gradually decreasing positive function of $j$, since $\beta \in (0, 1)$. The graph for Model 2 has a spike at $j = 1$ and is zero at all higher lags.
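The two patterns described above can be tabulated directly from the formulas derived in parts (b) and (c). The sketch below is illustrative only: the function names and the coefficient value 0.5 are assumptions, not part of the problem.

```python
def acf_ar1(ar_coef: float, max_lag: int = 3) -> list:
    """Theoretical autocorrelations of Model 1: the AR coefficient raised to the lag."""
    return [ar_coef ** j for j in range(1, max_lag + 1)]

def acf_ma1(ma_coef: float, max_lag: int = 3) -> list:
    """Theoretical autocorrelations of Model 2: nonzero at lag 1 only."""
    first = ma_coef / (1.0 + ma_coef ** 2)
    return [first if j == 1 else 0.0 for j in range(1, max_lag + 1)]

# Hypothetical coefficient value, chosen only to make the contrast visible.
for lag, (r_ar, r_ma) in enumerate(zip(acf_ar1(0.5), acf_ma1(0.5)), start=1):
    print(f"lag {lag}: Model 1 = {r_ar:.3f}, Model 2 = {r_ma:.3f}")
```

Model 1 decays geometrically and never reaches zero, while Model 2 cuts off after the first lag.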


(e) (5 points) Explain in words how you can use your results to choose between Model 1 and Model 2.

Solution: The models imply different distribution properties of the process $\{Y_t\}$. Model 1 implies some linear dependence of the process at any lag, while Model 2 implies independence of $\{Y_t\}$ beyond the first lag. When we take real data, we calculate autocorrelations, and if there are significant non-zero correlation coefficients beyond lag 1, we choose not to use Model 2.

2. You have monthly data on the New York Stock Exchange index (NYSE hereafter), from January 1991 to December 1998, for a total of 96 observations. Let $Y_t$ denote the NYSE index, and let $\Delta Y_t$ denote, as usual, the first difference, that is $\Delta Y_t = Y_t - Y_{t-1}$. You estimate the following AR(1) model

$\widehat{\Delta Y_t} = \underset{(1.12)}{2.86} + \underset{(0.09)}{0.24}\, \Delta Y_{t-1}, \quad \bar{R}^2 = 0.057$

(a) (5 points) Does it look like an AR(1) model does a good job at predicting monthly changes in the index? Explain.

Solution: The results suggest that the change in the index is only weakly predictable from its past value. As can be seen from the $\bar{R}^2$, only about 6% of its variation is explained by the variation of the index change in the previous month.

(b) (6 points) You know that the index was equal to 576 in December 1998, and equal to 564 in November 1998. Calculate a prediction for the value of the NYSE index in January 1999.

Solution: In this case $\Delta Y_{t-1} = 576 - 564 = 12$, which implies $\widehat{\Delta Y_t} = 2.86 + 0.24 \times 12 = 5.74$, so the predicted value in January is $Y_{t-1} + \widehat{\Delta Y_t} = 576 + 5.74 = 581.74$.
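The one-step-ahead forecast above can be traced in code. This is a sketch with illustrative variable names; the coefficients and index values are the ones given in the problem.

```python
# Estimated AR(1) in first differences: dY_t = intercept + slope * dY_{t-1}
intercept, slope = 2.86, 0.24

y_dec = 576.0   # index level in the last observed month
y_nov = 564.0   # index level one month earlier

delta_last = y_dec - y_nov                        # most recent observed change
delta_forecast = intercept + slope * delta_last   # predicted change next month
y_forecast = y_dec + delta_forecast               # predicted level next month

print(f"last change = {delta_last:.0f}, forecast change = {delta_forecast:.2f}, "
      f"forecast level = {y_forecast:.2f}")
```

The forecast of the level is built by adding the predicted change to the last observed level, since the model is estimated in first differences.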

(c) (5 points) You want to select the optimal number of lags, and you use a BIC criterion for this purpose. The following table reports the results of your calculations, for $p = 1, 2, \dots, 12$:

 p    BIC(p)   adj. R2
 1    4.93     0.04
 2    4.92     0.08
 3    4.96     0.07
 4    5.00     0.07
 5    5.01     0.09
 6    5.02     0.12
 7    4.98     0.18
 8    4.99     0.20
 9    4.79     0.37
10    4.84     0.36
11    4.88     0.35
12    4.89     0.37

Based on the results of this table, which model should you estimate? Explain.

Solution: Based on these results, we would estimate an AR(9) model, as the BIC is minimized when the number of lags in the model is 9.
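Lag selection by BIC amounts to picking the lag length with the smallest criterion value. A minimal sketch using the table's numbers (the dictionary name is illustrative):

```python
# BIC(p) for p = 1..12, copied from the table in the problem.
bic_by_lag = {
    1: 4.93, 2: 4.92, 3: 4.96, 4: 5.00, 5: 5.01, 6: 5.02,
    7: 4.98, 8: 4.99, 9: 4.79, 10: 4.84, 11: 4.88, 12: 4.89,
}

# The preferred model is the one with the smallest BIC.
best_p = min(bic_by_lag, key=bic_by_lag.get)

print(f"BIC is minimized at p = {best_p} (BIC = {bic_by_lag[best_p]})")
```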


11 Complete Past Exams

11.1 Midterm 1 in Fall 2007

This problem is based on actual data for a sample of 55,360 women of age 15 to 49 from rural India. For simplicity, let us assume these data are iid. The dataset includes the following variables recorded for all women in the sample:

age       Age in years
s         Completed years of schooling
malaria   Binary variable = 1 if the woman said she had malaria in the 3 months before the interview
hg        Hemoglobin level (grams per deciliter of blood)
lowhg     Binary variable = 1 if hemoglobin level is < 10

Hemoglobin is the molecule, in our red blood cells, which among other things has the role of transporting oxygen from the lungs to the rest of the body. Low levels of hg are usually associated with poor health, and can be caused by different factors, such as malnutrition, disease, etc. The following table shows the joint distribution of the two binary variables malaria and lowhg in the sample:

            Malaria=0   Malaria=1
lowhg=0     0.796       0.038
lowhg=1     0.155       0.011

1. (3 points): What is the number of women with low hemoglobin level in this sample?

Solution: The number of women is the total multiplied by the fraction for whom lowhg = 1; that is, $55360 \times (0.155 + 0.011) = 9190$ (rounded to the nearest unit).

2. (3 points): What is the mean value of lowhg in this sample?

Solution: $E(lowhg_i) = 1 \times P(lowhg_i = 1) + 0 \times P(lowhg_i = 0) = P(lowhg_i = 1)$

From the previous question, $0.155 + 0.011 = 0.166$.

3. (3 points): In this sample, what is the fraction of women with low hemoglobin level (that is, with lowhg = 1) conditional on having had malaria recently (that is, among women with malaria = 1)?

Solution:

$P(lowhg = 1 \mid malaria = 1) = \frac{P(lowhg = 1, malaria = 1)}{P(malaria = 1)} = \frac{0.011}{0.038 + 0.011} = 0.224$

4. (3 points): What is the fraction of women with low hemoglobin level among those who did not have malaria recently?

Solution:

$P(lowhg = 1 \mid malaria = 0) = \frac{P(lowhg = 1, malaria = 0)}{P(malaria = 0)} = \frac{0.155}{0.155 + 0.796} = 0.163$
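Questions 1 to 4 all read off the same joint distribution. The sketch below stores the table in a dictionary keyed by (lowhg, malaria); the names are illustrative, and the conditional probabilities are joint over marginal.

```python
n_women = 55360

# Joint distribution from the table, keyed by (lowhg, malaria).
joint = {(0, 0): 0.796, (0, 1): 0.038,
         (1, 0): 0.155, (1, 1): 0.011}

def p_lowhg_given_malaria(malaria: int) -> float:
    """P(lowhg = 1 | malaria) = joint probability / marginal probability."""
    p_malaria = joint[(0, malaria)] + joint[(1, malaria)]
    return joint[(1, malaria)] / p_malaria

p_lowhg = joint[(1, 0)] + joint[(1, 1)]   # marginal P(lowhg = 1) = mean of lowhg
n_lowhg = round(n_women * p_lowhg)        # number of women with low hemoglobin

print(f"P(lowhg=1) = {p_lowhg:.3f}, count = {n_lowhg}")
print(f"P(lowhg=1 | malaria=1) = {p_lowhg_given_malaria(1):.3f}")
print(f"P(lowhg=1 | malaria=0) = {p_lowhg_given_malaria(0):.3f}")
```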


5. (3 points): Malaria is often associated with low hemoglobin levels (because the malaria parasite destroys red blood cells). Are your results from the previous two questions overall consistent with these findings from the medical literature? Explain.

Solution: Yes, the fraction of women with low hg levels is higher among women who report recent malarial episodes. Note that this does not indicate causation (it is possible that factors other than malaria explain this correlation), but at least this correlation is consistent with the findings in the medical literature.

Now you want to study the relationship between hemoglobin level and years of schooling. In this sample the mean and standard deviation of hg are 11.6 and 1.9 respectively. The results of the OLS regression of hg on s are the following, where the index $i$ refers to the woman (heteroskedasticity-robust standard errors in parentheses):

$\widehat{hg}_i = \underset{(0.01)}{11.5} + \underset{(0.0019)}{0.048} \times s_i.$ (11)

6. (3 points): What is the interpretation of the intercept? Is it a meaningful parameter in this context?

Solution: It indicates the predicted value of hg for women with no schooling, and the parameter is meaningful because there certainly are women with no schooling.

7. (3 points): What is the interpretation of the slope?

Solution: The slope indicates that one more year of schooling is associated with a 0.048 increase in the predicted level of hemoglobin.

8. (4 points): Construct a 99% confidence interval for the slope.

Solution: $0.048 \pm 2.57 \times 0.0019 = [.043, .053]$

9. (3 points): What is the predicted hemoglobin level for a woman with no schooling?

Solution: $11.5 + .048 \times 0 = 11.5$

10. (3 points): What is the predicted hemoglobin level for a woman with 10 years of schooling?

Solution: $11.5 + .048 \times 10 = 11.98$

11. (4 points): Does the difference between the prediction in part 9 and that in part 10 look large?

Solution: The difference is about .5. It does not look very large. Recall that the mean value of hg in the sample is 11.6, and the standard deviation is 1.9. So, 10 more years of schooling only increase hg by about 1/4 of a standard deviation, which is not very large in relative terms.

12. (4 points): Can you reject the null hypothesis that the slope is equal to 0.05, using a 10 percent significance level?

Solution: The test is two-sided (because the question does not indicate otherwise), so

$t = \frac{0.048 - 0.05}{0.0019} = -1.0526, \quad |t| = 1.0526 < 1.645$

so we cannot reject the null.


13. (6 points): Now you re-estimate the model separately for two different Indian states. In the state of Punjab, the slope is now 0.018, with a standard error equal to 0.01. In the state of Rajasthan, the slope is 0.003, and the standard error is again 0.01. Note that the two estimates use independent samples. Can you reject the null hypothesis that the slope is the same in the two states, using a 1% significance level?

Solution: Here the null and alternative hypotheses are

$H_0: \beta_1^{Punjab} - \beta_1^{Rajasthan} = 0$

$H_A: \beta_1^{Punjab} - \beta_1^{Rajasthan} \neq 0$

so the test statistic is

$t = \frac{(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan}) - 0}{\widehat{S.E.}(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan})} = \frac{(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan}) - 0}{\sqrt{\widehat{Var}(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan})}} = \frac{(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan}) - 0}{\sqrt{\widehat{Var}(\hat{\beta}_1^{Punjab}) + \widehat{Var}(\hat{\beta}_1^{Rajasthan})}}$

where the absence of a covariance term follows from the independence of the two samples. So

$t = \frac{(\hat{\beta}_1^{Punjab} - \hat{\beta}_1^{Rajasthan}) - 0}{\sqrt{\widehat{S.E.}(\hat{\beta}_1^{Punjab})^2 + \widehat{S.E.}(\hat{\beta}_1^{Rajasthan})^2}} = \frac{.018 - .003}{\sqrt{.01^2 + .01^2}} = 1.0607 < 2.57$

so that you cannot reject the null that $\beta_1^{Punjab} - \beta_1^{Rajasthan} = 0$.
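The test above compares two independent estimates, so the variances simply add. A short sketch with a hypothetical helper name; the 2.57 critical value is the one used in the solution.

```python
import math

def diff_t_stat(b1: float, se1: float, b2: float, se2: float) -> float:
    """t-statistic for H0: b1 - b2 = 0 when the two estimates are independent."""
    # Independence means Var(b1 - b2) = Var(b1) + Var(b2), with no covariance term.
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

t_stat = diff_t_stat(0.018, 0.01, 0.003, 0.01)   # Punjab vs. Rajasthan slopes
reject_1pct = abs(t_stat) > 2.57                 # two-sided 1% critical value

print(f"t = {t_stat:.4f}, reject at 1%: {reject_1pct}")
```

If the two estimates came from the same sample, the covariance between them would have to be subtracted inside the square root, and this helper would not apply.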

14. (5 points): All the results point to a positive correlation between years of schooling and hemoglobin level. Do you think these results should be interpreted to mean that more education causes improvements in hemoglobin levels? Explain.

Solution: Not at all. There are many other factors which could be driving this correlation. On the one hand, better education could lead to better ability to take care of one's health, so the result could be causal in part. But better education is also likely to be associated with higher income, better nutrition, a better epidemiological environment, etc. Without keeping all these (and probably many other) factors constant, it is impossible to argue that the estimated correlation actually indicates causality.

Suppose now that the standard OLS assumptions hold for the following model:

$hg_i = \beta_0 + \beta_1 s_i + u_i.$ (12)

Recall that the OLS assumptions will also imply that $\sigma_{u,s} = Cov(u_i, s_i) = 0$. You want to estimate $\beta_1$.

Unfortunately, there were problems with the blood tests necessary to measure hemoglobin, so that you do not observe the true hemoglobin level $hg_i$, but you only observe a value $hg_i^* = hg_i + e_i$, where $e_i$ is measurement error. In this problem we want to study the consequences of such measurement error for the estimation of the slope $\beta_1$.

Let $\sigma_{e,s}$ denote $Cov(e_i, s_i)$, that is, the covariance between the education level and measurement error, and let $\sigma_s^2$ denote the variance of years of completed schooling.

First, note that because your data set only includes $hg_i^*$, and not $hg_i$, your OLS estimator for the slope $\beta_1$ will be

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (s_i - \bar{s})\, hg_i^*}{\sum_{i=1}^{n} (s_i - \bar{s})^2}$


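15. (6 points): Prove that the estimator can be rewritten as

$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2}$

Solution:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (s_i - \bar{s})\, hg_i^*}{\sum_{i=1}^{n} (s_i - \bar{s})^2} = \frac{\sum_{i=1}^{n} (s_i - \bar{s})(hg_i + e_i)}{\sum_{i=1}^{n} (s_i - \bar{s})^2} = \frac{\sum_{i=1}^{n} (s_i - \bar{s})(\beta_0 + \beta_1 s_i + u_i + e_i)}{\sum_{i=1}^{n} (s_i - \bar{s})^2}$

$= \beta_0 \frac{\overbrace{\sum_{i=1}^{n} (s_i - \bar{s})}^{=0}}{\sum_{i=1}^{n} (s_i - \bar{s})^2} + \beta_1 \underbrace{\left[\frac{\sum_{i=1}^{n} (s_i - \bar{s})\, s_i}{\sum_{i=1}^{n} (s_i - \bar{s})^2}\right]}_{=1} + \frac{\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)}{\sum_{i=1}^{n} (s_i - \bar{s})^2}$

$= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2}$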

16. (3 points): What is the probability limit of $\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)$?

Solution: First, note that in large samples $\bar{s} \approx \mu_s$, so when $n \to \infty$

$\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i) \approx \frac{1}{n}\sum_{i=1}^{n} (s_i - \mu_s)(u_i + e_i)$

but then this is just a mean of $n$ iid random variables, which will converge to $E[(s_i - \mu_s)(u_i + e_i)] = E[(s_i - \mu_s)(u_i + e_i - \mu_u - \mu_e)] = cov(s_i, u_i + e_i)$. Using the properties of covariances, this in turn can be written as

$cov(s_i, u_i + e_i) = cov(s_i, u_i) + cov(s_i, e_i).$

We know that $cov(s_i, u_i) = 0$ by assumption, so finally

$plim\, \frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i) = cov(s_i, e_i) = \sigma_{e,s}$

17. (4 points): Prove that

$plim\, \hat{\beta}_1 = \beta_1 + \frac{\sigma_{e,s}}{\sigma_s^2}$

Solution: We saw many times in class that under the usual assumptions

$plim\, \frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2 = \sigma_s^2,$

and in the previous point we proved that

$plim\, \frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i) = \sigma_{e,s},$

so, putting things together and using the properties of plim, we have the result

$plim\, \hat{\beta}_1 = plim \left[ \beta_1 + \frac{\overbrace{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)}^{\overset{p}{\to}\, \sigma_{e,s}}}{\underbrace{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2}_{\overset{p}{\to}\, \sigma_s^2}} \right] = \beta_1 + \frac{\sigma_{e,s}}{\sigma_s^2}$

18. (5 points): Suppose that better educated women have less time to be tested, so that for these women, on average, $hg_i$ has to be measured in a rush, and that this usually leads to measurements which are below the true value. In this case, is $\hat{\beta}_1$ a consistent estimator of $\beta_1$? Explain.

Solution: If women with better education, on average, get readings which are lower than the true value, then we should expect that when $s$ is above average, the measurement error $e$ will be more likely to be negative and below average. Hence the covariance between the two will be negative, and then

$plim\, \hat{\beta}_1 = \beta_1 + \frac{\sigma_{e,s}}{\sigma_s^2} < \beta_1$

so that $\hat{\beta}_1$ will NOT be consistent. Intuitively, the estimates will be systematically lower than the true value, because for women with better schooling we observe values of the dependent variable which are lower than the true values.
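The inconsistency result can be illustrated with a small simulation. Everything numerical below is a hypothetical assumption made only for illustration (the true parameters, the schooling distribution, and the rule that the measurement error falls by 0.1 for each extra year of schooling); none of it comes from the exam data.

```python
import random

random.seed(0)
n = 20000
beta0, beta1 = 11.5, 0.05   # hypothetical true intercept and slope
gamma = -0.1                # hypothetical: error shifts down by 0.1 per year of schooling

# Simulated schooling, regression error, and measurement error correlated with schooling.
s = [random.gauss(8.0, 3.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
e = [gamma * (si - 8.0) + random.gauss(0.0, 0.5) for si in s]

# Observed (mismeasured) dependent variable: hg* = hg + e.
hg_star = [beta0 + beta1 * si + ui + ei for si, ui, ei in zip(s, u, e)]

# OLS slope of hg* on s.
s_bar = sum(s) / n
num = sum((si - s_bar) * yi for si, yi in zip(s, hg_star))
den = sum((si - s_bar) ** 2 for si in s)
beta1_hat = num / den

# Here Cov(e, s) = gamma * Var(s), so the probability limit is beta1 + gamma.
plim_beta1_hat = beta1 + gamma

print(f"true slope = {beta1}, OLS estimate = {beta1_hat:.4f}, plim = {plim_beta1_hat:.4f}")
```

Under these assumptions the estimate settles near the probability limit rather than near the true slope, and it stays below the true slope no matter how large the sample is.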

Let $\sigma_u^2$ and $\sigma_e^2$ denote the variance of the regression error and measurement error respectively, and let $\sigma_{e,u}$ denote the covariance between the two errors.

19. (3 points): Calculate $Var(u_i + e_i)$.

Solution: $Var(u_i + e_i) = \sigma_u^2 + \sigma_e^2 + 2\sigma_{e,u}$

20. (4 points): Assume now that $\sigma_{e,s} = \sigma_{e,u} = 0$ and assume also for simplicity that the regression error $u_i$ is homoskedastic with variance $\sigma_u^2$, and the variance of the measurement error is constant and equal to $\sigma_e^2$. What is the asymptotic variance of $\sqrt{n}(\hat{\beta}_1 - \beta_1)$? Interpret your results.

Solution: This is a VERY simple question, which does not require any lengthy calculation. First, note that $Var(u_i + e_i) = \sigma_u^2 + \sigma_e^2$ if the two errors are uncorrelated. Then note that with no measurement error and homoskedasticity, the asymptotic distribution would be

$\sqrt{n}(\hat{\beta}_1 - \beta_1) \overset{d}{\to} N\left(0, \frac{\sigma_u^2}{\sigma_s^2}\right)$

This is proved starting from (again, when there is NO measurement error)

$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})\, u_i}{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2}$

But in this case we have

$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^{n} (s_i - \bar{s})^2}.$

But then notice that the only difference between the "usual" case and this special case with measurement error is that now the "new error" has become $u_i + e_i$. So, IF measurement error in the dependent variable is uncorrelated with the regressor, $\hat{\beta}_1$ IS consistent, and the only consequence for our estimator is that it will end up having more "noise", because the variance increases. Formally, we will have

$\sqrt{n}(\hat{\beta}_1 - \beta_1) \overset{d}{\to} N\left(0, \frac{\sigma_u^2 + \sigma_e^2}{\sigma_s^2}\right)$


11.2 Midterm 2, Fall 2007

You have data from a random sample of 19,451 children aged zero to 3 years from rural India, and you have estimated the following regressions, where the dependent variable is $\log weight_i$ (heteroskedasticity-robust standard errors in parentheses):

                                              Model (1)        Model (2)
beta_1 - log height_i                         1.93 (.0108)     1.924 (.0110)
beta_2 - FS_i (Father's years of schooling)   .002 (.0003)     .00076 (.0003)
beta_3 - SLI_i (Standard of Living Index)                      .0017 (.0002)
beta_4 - constant                             -6.18 (.0466)    -6.18 (.0466)

R2                                            0.7396           0.7407

The Standard of Living Index ($SLI_i$) is a measure of asset ownership, and is constructed in a way that households with more assets have larger values of $SLI_i$.

1. (3 points): Interpret the estimated slope $\hat{\beta}_1$ (the coefficient corresponding to $\log height_i$) in Model (1).

This is a log-log model, so $\hat{\beta}_1$ is the elasticity of weight with respect to height (that is, a 1% increase in height is associated with a 1.93% increase in weight).

2. (3 points): Using again Model (1), can you reject the null hypothesis $\beta_1 = 2$ using a 1% significance level?

The alternative is (by default) two-sided, so

$t = \frac{1.93 - 2}{0.0108} = -6.48$

so we definitely reject the null hypothesis using a 1% level.

3. (5 points): In Model (2), the asset index $SLI_i$ is added to the regression estimated in Model (1). Interpret the change in $\hat{\beta}_2$ in terms of omitted variable bias.

We should certainly expect SLI (wealth) to be positively correlated with father's schooling. We also know that $\beta_3 > 0$, so we should expect

$sign(OVB) = \underbrace{sign(\beta_3)}_{>0} \times \underbrace{sign(\sigma_{SLI,FS})}_{>0} > 0.$

Then, the omission in Model (1) of SLI should lead to an upward bias in $\hat{\beta}_2$, which is indeed what we see, because when we do include SLI in the regression $\hat{\beta}_2$ decreases from 0.002 to 0.00076 (and the t-ratio decreases too).


Now let $STU_i$ be a binary variable equal to 1 if child $i$ is "stunted" (that is, his/her weight is low relative to his/her height) and equal to zero otherwise. You use $STU_i$ as dependent variable and you estimate the following Linear Probability Model (LPM), where heteroskedasticity-robust standard errors are in parentheses.

                           Robust
         STU |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   Mtr_Illit |   .0344392   .0073581     4.68   0.000     .0200168    .0488617
   Ftr_Illit |   .0145057   .0079548     1.82   0.068    -.0010864    .0300978
         age |   .0097866   .0012474     7.85   0.000     .0073416    .0122317
        age2 |    -.00027   .0000348    -7.76   0.000    -.0003382   -.0002019
      Female |   .0117461   .0136431     0.86   0.389    -.0149955    .0384876
         SLI |  -.0045558    .000602    -7.57   0.000    -.0057357   -.0033759
       FxSLI |  -.0004805   .0008231    -0.58   0.559    -.0020939    .0011329
       _cons |   .2583318   .0143801    17.96   0.000     .2301454    .2865181

where Mtr_Illit and Ftr_Illit are binary variables indicating respectively whether the child's mother or father is illiterate, age is the child's age in months, age2 is its square, Female is a dummy variable equal to one if child $i$ is a girl and FxSLI is Female x SLI. In what follows, let also $\hat{\beta}_x$ denote the slope corresponding to a given regressor $x$ (so that, for instance, $\hat{\beta}_{age} = 0.0097866$).

4. (4 points) Interpret the estimated coefficients $\hat{\beta}_{Mtr\_Illit}$ and $\hat{\beta}_{Ftr\_Illit}$.

$\hat{\beta}_{Mtr\_Illit}$ indicates that having an illiterate mother increases the probability of stunting by 3.4 percentage points, keeping everything else constant. Similarly, $\hat{\beta}_{Ftr\_Illit}$ indicates that having an illiterate father increases the probability of stunting by 1.5 percentage points, keeping everything else constant.

5. (4 points) Do the results above suggest that father's and mother's illiteracy have a very different importance for child nutritional status, as measured by stunting?

Based on the results from question 4, there does seem to be a large difference. Keeping everything else constant, the "impact" of mother's illiteracy on the predicted probability is more than twice as large as the "impact" of father's illiteracy. Given that we are talking about the probability of stunting for young children, a difference in predicted probabilities of about 1.9 percentage points is quite large!


6. (5 points) Suppose that you want to test the null hypothesis that father's and mother's illiteracy affect equally child nutritional status as measured by stunting. State clearly the null and the alternative hypothesis, and state whether you can reject the null hypothesis, knowing that the value of the F test statistic is 3.23.

$H_0: \beta_{Mtr\_Illit} = \beta_{Ftr\_Illit}$

$H_A: \beta_{Mtr\_Illit} \neq \beta_{Ftr\_Illit}$

This is a test with a single restriction (note that you cannot use a t-ratio test, because the problem does not tell you what the estimated covariance between the two coefficients is). So we have to use an $F_{1,\infty}$ test. Looking at the appropriate row in the tables, we can see that we cannot reject the null using a 5% or 1% significance level, while we can reject the null using a 10% level.

7. (3 points) Using a 1% significance level, can you reject the null hypothesis that the conditional expectation of $STU_i$ is a linear function of $age_i$, keeping all other regressors constant?

The conditional expectation is linear in age if age2 does not enter into the regression. So we just have to look at the p-value of $\hat{\beta}_{age2}$, which is approximately zero. Hence, we can definitely reject the null of linearity.

8. (5 points) Now you want to test the null hypothesis that the regression is the same for boys and girls. State clearly the null and the alternative hypothesis, and state if you can reject the null, knowing that the value of the F test statistic is 0.44.

$H_0: \beta_{Female} = \beta_{FxSLI} = 0$

$H_A: \beta_{Female} \neq 0$ and/or $\beta_{FxSLI} \neq 0$

This is a joint hypothesis with two restrictions, hence we have to use an $F_{2,\infty}$ test. Looking at the tables, we see that we cannot reject the null hypothesis that the regression is the same for boys and girls.


9. (4 points) Calculate the probability of being stunted for a newborn boy (age = 0), born to parents who have no assets ($SLI_i = 0$) and who are both illiterate.

$.2583318 + .0344392 + .0145057 = .3072767$

Now you re-estimate the same model as before, but using probit. The results are the following:

Probit regression                                 Number of obs   =      19451
                                                  LR chi2(7)      =     295.13
                                                  Prob > chi2     =     0.0000
Log likelihood = -11442.85                        Pseudo R2       =     0.0127

         STU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
   Mtr_Illit |    .104285   .0226281     4.61   0.000     .0599346    .1486353
   Ftr_Illit |   .0389355   .0229746     1.69   0.090    -.0060938    .0839648
         age |   .0291546   .0037847     7.70   0.000     .0217368    .0365725
        age2 |  -.0008046    .000105    -7.66   0.000    -.0010104   -.0005988
      Female |   .0347082   .0401499     0.86   0.387    -.0439842    .1134007
         SLI |  -.0140907   .0019095    -7.38   0.000    -.0178333   -.0103481
       FxSLI |  -.0014958   .0026327    -0.57   0.570    -.0066557    .0036642
       _cons |  -.6473693   .0447236   -14.47   0.000    -.7350259   -.5597127

10. (4 points) Re-calculate the probability of being stunted for a newborn boy (age = 0), born to parents who have no assets ($SLI_i = 0$) and who are both illiterate. Is the result very different from the one you estimated using the LPM?

$\Phi(-.6473693 + .104285 + .0389355) = \Phi(-.5041488) = .30707841,$

which is almost identical to the one we obtained using the LPM! As usual, the model we use does not seem to matter much.
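The two predicted probabilities can be compared side by side. In the sketch below the standard normal cdf is written with `math.erfc`, and only the coefficients that are switched on for this child (constant and the two illiteracy dummies) are included; the dictionary and function names are illustrative.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal cdf, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Coefficients that matter for a newborn boy with SLI = 0 and two illiterate parents:
# age, age2, Female, SLI and FxSLI are all zero for this child.
lpm = {"_cons": 0.2583318, "Mtr_Illit": 0.0344392, "Ftr_Illit": 0.0145057}
probit = {"_cons": -0.6473693, "Mtr_Illit": 0.104285, "Ftr_Illit": 0.0389355}

p_lpm = sum(lpm.values())           # LPM: probability is the linear index itself
probit_index = sum(probit.values())
p_probit = norm_cdf(probit_index)   # probit: probability is Phi(index)

print(f"LPM: {p_lpm:.4f}, probit index: {probit_index:.7f}, probit: {p_probit:.4f}")
```

The two models need not agree this closely for children whose regressors are far from typical values, so the near-equality here should not be read as a general property.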


11. (4 points) Calculate the marginal effect of SLI (that is, of an increase in the asset index) for a newborn boy whose parents are both literate and whose $SLI_i$ is equal to 20. (Note: writing down the correct expression for the marginal effect is sufficient to get full credit in this question; you do not have to do the actual calculations.)

Remember that the marginal effect in a non-linear regression model is the partial derivative of the regression with respect to a given regressor, for given initial values of all regressors. Also, we saw in class that the derivative of the cdf is the density, so

$$ME = \frac{\partial \Phi(\cdot)}{\partial SLI_i} = \phi(-.6473693 - .0140907 \times 20)\,(-.0140907) = (-.0140907)\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(-.6473693 - .0140907 \times 20)^2},$$

where the last step follows from using the formula for the density of a standard normal distribution. So

$$ME = -.00365,$$

which is almost zero.
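The arithmetic behind this marginal effect can be checked with a minimal Python sketch. It assumes only the two probit coefficients reported in the output above; the names `normal_pdf`, `index` and `marginal_effect` are illustrative and do not come from the original problem.

```python
import math

def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Coefficients copied from the probit output.
b_cons = -0.6473693
b_sli = -0.0140907

# Newborn boy (age = 0, Female = 0), both parents literate, SLI = 20,
# so only the constant and the SLI term enter the probit index.
index = b_cons + b_sli * 20

# Marginal effect of SLI: density at the index times the SLI coefficient.
marginal_effect = normal_pdf(index) * b_sli
print(round(marginal_effect, 5))
```

The density is evaluated at the index of this specific child, so the marginal effect would differ for a child with other characteristics.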

12. (5 points) You re-estimate the model above omitting Ftr_Illit, Female and $F \times SLI$. The log-likelihood of the new model is $-11444.731$. Can you reject the null hypothesis that $\beta_{Ftr\_Illit} = \beta_{Female} = \beta_{F \times SLI} = 0$?

$$LR = 2\,[\ln L_U - \ln L_R] = 2\,[-11442.85 - (-11444.731)] = 3.762$$

There are three restrictions, so we know that under the null hypothesis

$$LR = 2\,[\ln L_U - \ln L_R] \xrightarrow{d} \chi^2_3.$$

Looking at the table, it is clear that we cannot reject the null hypothesis (the critical value at the 10% level is 6.25).
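As a small check of this likelihood ratio test, the sketch below recomputes the statistic from the two log-likelihoods and compares it with the 10% critical value quoted in the answer. The variable names are illustrative assumptions; the critical value is hard-coded from the text rather than computed.

```python
# Log-likelihoods of the unrestricted and restricted probit models.
lnL_unrestricted = -11442.85
lnL_restricted = -11444.731

# LR statistic: twice the difference in log-likelihoods.
lr_stat = 2 * (lnL_unrestricted - lnL_restricted)

# 10% critical value of a chi-square with 3 degrees of freedom (from the text).
critical_10pct = 6.25

reject_null = lr_stat > critical_10pct
print(round(lr_stat, 3), reject_null)
```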

Now you want to evaluate the impact of a child nutrition supplement program on stunting. Let $P_i$ be a binary variable equal to 1 if child $i$ participates in the program, and zero otherwise. Participation in the program is voluntary, and decided by the children's parents. Suppose that you estimate the following model using OLS (remember that $STU_i$ is a dummy equal to one if child $i$ is stunted):

$$STU_i = \beta_0 + \beta_1 P_i + u_i, \qquad (13)$$

where $E(u_i) = 0$.

13. (4 points) Do you think your estimate $\hat\beta_1$ could be interpreted in a causal sense, that is, will $\hat\beta_1$ measure the causal impact of program participation on the probability of stunting? Explain.

Certainly not. Participation is voluntary, hence $P_i$ will most likely be correlated with other observable or unobservable variables which are also likely to be important determinants of $STU_i$ (for instance, how much parents care about children's health, how far they live from the place where the program is offered, how well they understand the importance of the program for improving child nutrition, etc.). Hence, omitted variable bias is likely to be a problem.

Suppose now that program participation is still voluntary, but that you also make sure that a random subsample of families is supplied with ample information about the existence and the utility of the program. As a consequence, you anticipate that this random subsample of families will be relatively more likely to participate in the program. Let $A_i$ be a binary variable equal to one if child $i$ lives in a family that has been exposed to this "advertisement campaign" about the program, and zero otherwise.

14. (4 points) Prove that $E(STU_i \mid A_i) = \beta_0 + \beta_1 E(P_i \mid A_i)$.

$$E(STU_i \mid A_i) = E(\beta_0 + \beta_1 P_i + u_i \mid A_i) = \beta_0 + \beta_1 E(P_i \mid A_i) + E(u_i \mid A_i)$$

The conclusion follows noting that the last term is zero, because $A_i$ has been determined completely at random, and hence will be uncorrelated with the error. Formally, $A_i$ and $u_i$ are statistically independent, so that $E(u_i \mid A_i) = E(u_i) = 0$.

15. (3 points) Prove that $E(P_iA_i) = P(P_i = 1, A_i = 1)$. Justify your argument.

Both $P_i$ and $A_i$ are binary variables, so their product will always be zero, unless both variables are equal to one. Hence

$$E(P_iA_i) = 0 + 1 \times P(P_i = 1, A_i = 1) = P(P_i = 1, A_i = 1)$$

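16. (5 points) Recalling that both $P_i$ and $A_i$ are binary variables, prove that

$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n P_iA_i}{\sum_{i=1}^n A_i}\right) = E(P_i \mid A_i = 1)$$

$$\begin{aligned}
\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n P_iA_i}{\sum_{i=1}^n A_i}\right)
&= \operatorname*{plim}_{n\to\infty}\left(\frac{\frac{1}{n}\sum_{i=1}^n P_iA_i}{\frac{1}{n}\sum_{i=1}^n A_i}\right) \\
&= \frac{\operatorname{plim}\frac{1}{n}\sum_{i=1}^n P_iA_i}{\operatorname{plim}\frac{1}{n}\sum_{i=1}^n A_i} && \text{(by Slutsky)} \\
&= \frac{E(P_iA_i)}{E(A_i)} && \text{(by LLN)} \\
&= \frac{P(P_i = 1, A_i = 1)}{P(A_i = 1)} && \text{(because $P_i$ and $A_i$ are binary variables)} \\
&= P(P_i = 1 \mid A_i = 1) = E(P_i \mid A_i = 1) && \text{(because $P_i$ is binary)}
\end{aligned}$$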

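17. (6 points) Recalling that both $STU_i$ and $A_i$ are binary variables, prove that

$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n STU_i(1 - A_i)}{\sum_{i=1}^n (1 - A_i)}\right) = E(STU_i \mid A_i = 0)$$

$$\begin{aligned}
\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n STU_i(1 - A_i)}{\sum_{i=1}^n (1 - A_i)}\right)
&= \frac{\operatorname{plim}\frac{1}{n}\sum_{i=1}^n STU_i(1 - A_i)}{\operatorname{plim}\frac{1}{n}\sum_{i=1}^n (1 - A_i)} && \text{(by Slutsky)} \\
&= \frac{E[STU_i(1 - A_i)]}{E(1 - A_i)} && \text{(by LLN)}
\end{aligned}$$

But $STU_i$ and $A_i$ are both binary, so $STU_i(1 - A_i)$ is binary too, and is equal to one only if both $STU_i = 1$ and $A_i = 0$. Hence:

$$\frac{E[STU_i(1 - A_i)]}{E(1 - A_i)} = \frac{P(STU_i = 1, 1 - A_i = 1)}{P(1 - A_i = 1)} = \frac{P(STU_i = 1, A_i = 0)}{P(A_i = 0)} = P(STU_i = 1 \mid A_i = 0) = E(STU_i \mid A_i = 0)$$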

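18. (4 points) Based on the results from questions 14, 16 and 17, describe a consistent estimator for $\beta_1$, that is, write down the estimator and prove that it is consistent. Justify your steps. (Hint: start by writing down the conditional expectation in 14 for $A_i = 0$ and for $A_i = 1$.) This question is worth few points given its difficulty, so plan accordingly.

From 14 we know that

$$E(STU_i \mid A_i = 1) = \beta_0 + \beta_1 E(P_i \mid A_i = 1)$$

$$E(STU_i \mid A_i = 0) = \beta_0 + \beta_1 E(P_i \mid A_i = 0)$$

So $\beta_1$ can be obtained through some simple manipulations. Specifically, if both the above expressions are true, then the difference of the left-hand sides is equal to the difference of the right-hand sides. Hence

$$E(STU_i \mid A_i = 1) - E(STU_i \mid A_i = 0) = [\beta_0 + \beta_1 E(P_i \mid A_i = 1)] - [\beta_0 + \beta_1 E(P_i \mid A_i = 0)] = \beta_1\,[E(P_i \mid A_i = 1) - E(P_i \mid A_i = 0)].$$

Solving for $\beta_1$ (the denominator is non-zero because the campaign makes participation more likely):

$$\beta_1 = \frac{E(STU_i \mid A_i = 1) - E(STU_i \mid A_i = 0)}{E(P_i \mid A_i = 1) - E(P_i \mid A_i = 0)}$$

Then a consistent estimator can be obtained using the sample analogue:

$$\hat\beta_1 = \frac{\hat E(STU_i \mid A_i = 1) - \hat E(STU_i \mid A_i = 0)}{\hat E(P_i \mid A_i = 1) - \hat E(P_i \mid A_i = 0)}$$

From the responses to 16 and 17 we already know how to consistently estimate each term:

$$\hat\beta_1 = \frac{\overbrace{\dfrac{\sum_{i=1}^n STU_iA_i}{\sum_{i=1}^n A_i}}^{\xrightarrow{p}\, E(STU_i \mid A_i = 1)} - \overbrace{\dfrac{\sum_{i=1}^n STU_i(1 - A_i)}{\sum_{i=1}^n (1 - A_i)}}^{\xrightarrow{p}\, E(STU_i \mid A_i = 0)}}{\underbrace{\dfrac{\sum_{i=1}^n P_iA_i}{\sum_{i=1}^n A_i}}_{\xrightarrow{p}\, E(P_i \mid A_i = 1)} - \underbrace{\dfrac{\sum_{i=1}^n P_i(1 - A_i)}{\sum_{i=1}^n (1 - A_i)}}_{\xrightarrow{p}\, E(P_i \mid A_i = 0)}} \xrightarrow{p} \frac{E(STU_i \mid A_i = 1) - E(STU_i \mid A_i = 0)}{E(P_i \mid A_i = 1) - E(P_i \mid A_i = 0)} = \beta_1$$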
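To see this estimator at work, here is a minimal simulation sketch in Python. Everything about the data-generating process is a hypothetical assumption made for illustration: an unobserved binary "parental care" variable `c` raises participation and lowers stunting, the campaign indicator `a` is assigned at random, and the true $\beta_1$ is set to $-0.2$. None of these numbers come from the problem.

```python
import random

def simulate(n: int, seed: int = 0):
    """Draw (stu, p, a) triples from a hypothetical process with true beta_1 = -0.2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0          # random exposure to the campaign
        c = 1 if rng.random() < 0.5 else 0          # unobserved parental care
        p = 1 if rng.random() < 0.2 + 0.3 * a + 0.4 * c else 0
        stu = 1 if rng.random() < 0.6 - 0.2 * p - 0.2 * c else 0
        rows.append((stu, p, a))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def wald_estimator(rows):
    """Difference in mean STU by A divided by difference in mean P by A."""
    stu_gap = mean(s for s, _, a in rows if a == 1) - mean(s for s, _, a in rows if a == 0)
    p_gap = mean(p for _, p, a in rows if a == 1) - mean(p for _, p, a in rows if a == 0)
    return stu_gap / p_gap

def naive_estimator(rows):
    """Difference in mean STU between participants and non-participants."""
    return mean(s for s, p, _ in rows if p == 1) - mean(s for s, p, _ in rows if p == 0)

rows = simulate(200_000)
wald = wald_estimator(rows)
naive = naive_estimator(rows)
print(round(wald, 3), round(naive, 3))
```

In this sketch the naive comparison of participants and non-participants is contaminated by `c`, while the ratio of differences by campaign exposure should land close to the assumed $-0.2$ in a large sample.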

11.3 Final, Fall 2006

1. You have collected information on seat belt use and other traffic-related variables from the 50 U.S. States plus the District of Columbia, for the year 1993. Let $f_i$ be the number of fatalities per million traffic miles in state $i$, and let $SENF_i$ be a binary variable equal to one if state $i$ enforces seat belt laws. You estimate the following regression (heteroskedasticity-robust standard errors in parentheses):

$$\hat f_i = \underset{(.00091)}{.0169} + \underset{(.0012)}{.0018} \times SENF_i \qquad (14)$$

(a) (4 points) Determine if $SENF_i$ is statistically significant, using a 10% level.

$$t = \frac{.0018}{.0012} = 1.5 < 1.645$$

so this is not significant at the 10% level.

(b) (4 points) Now you re-estimate the model adding, as a regressor, the logarithm of per capita income ($\log inc_i$). The result is the following:

$$\hat f_i = \underset{(.0337)}{.0181} + \underset{(.0009)}{.0016} \times SENF_i - \underset{(.0034)}{.0017}\, \log inc_i. \qquad (15)$$

Interpret the coefficient for $\log inc_i$.

A one percent increase in income is associated with a decrease in the fatality rate of $(0.01)(0.0017) = 0.000017$ deaths per million traffic miles.

(c) (5 points) What does the comparison of the results in (14) and (15) suggest about the correlation between $\log inc_i$ and $SENF_i$? Should it be positive or negative? Explain.

Here we observe that adding log(income) to the regression leads to a (small) decrease in the coefficient for SENF. Hence, the coefficient in (14) appears to be upward biased. Because the sign of the coefficient for log(income) is negative, an upward bias results if the correlation (or covariance) between $\log inc_i$ and $SENF_i$ is negative as well. The negative correlation is also confirmed by the following logit and probit estimates.

Now you estimate a regression of $SENF_i$ on $\log inc_i$ using logit and probit, and you obtain the following results (standard errors in parentheses):

                     logit       probit

log inc             -.3955      -.2345
                    (2.00)      (1.20)

constant             4.8         2.9
                    (19.8)      (11.9)

log-likelihood     -30.8761    -30.8765

(d) (5 points) Using logit, estimate the difference in the estimated probability of having seat belt enforcement between a state with income per head equal to $18,000 and one with income per head equal to $25,000.

$$\frac{1}{1 + e^{-(4.8 - .3955 \ln(25000))}} - \frac{1}{1 + e^{-(4.8 - .3955 \ln(18000))}} = -0.027141$$

(e) (5 points) Estimate the same difference using probit. Is the result very different from the one obtained with logit? Is this what you expected? Explain.

$$\Phi(2.9 - .2345 \ln(25000)) - \Phi(2.9 - .2345 \ln(18000)) = -.02621.$$

As expected, logit and probit give very similar results for the estimated predicted change.
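The two differences can be reproduced with a short Python sketch that uses only the coefficients reported in the logit and probit columns. The helper names (`logistic_cdf`, `normal_cdf`, `predicted_gap`) are illustrative assumptions.

```python
import math

def logistic_cdf(x: float) -> float:
    """Logistic cdf used by the logit model."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x: float) -> float:
    """Standard normal cdf used by the probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def predicted_gap(cdf, const: float, slope: float) -> float:
    """Predicted P(SENF = 1) at income 25,000 minus the same at income 18,000."""
    high = cdf(const + slope * math.log(25000))
    low = cdf(const + slope * math.log(18000))
    return high - low

logit_gap = predicted_gap(logistic_cdf, 4.8, -0.3955)
probit_gap = predicted_gap(normal_cdf, 2.9, -0.2345)
print(round(logit_gap, 4), round(probit_gap, 4))
```

Although the raw coefficients differ by a scale factor across the two models, the predicted probabilities, and hence their differences, are close.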

(f) (5 points) The coefficients estimated using logit and probit are very different. However, the two log-likelihoods are almost identical. Is this surprising, or was it to be expected? Explain.

As pointed out in the previous point, logit and probit usually produce very similar results when it comes to the predicted probabilities. Because the log-likelihood is a function of these predicted probabilities, the two log-likelihoods will usually be very similar.

Now you want to explore the topic further, and you collect data for all 50 states + DC for all years between 1983 and 1997. First, you estimate equation (15) with OLS again, but adding time dummies. The results are as follows (the time dummies are omitted for brevity).

$$\hat f_{it} = \underset{(.011)}{.195} + \underset{(.0003)}{.00009} \times SENF_{it} - \underset{(.0011)}{.0018} \times \log inc_{it} + TimeFixedEffects. \qquad (16)$$

(g) (4 points) How many time fixed effects should be included in regression (16)? Explain.

You should include $97 - 83 = 14$ time fixed effects. There are 15 years in our dataset, but because we are also estimating a constant, we need to omit one year dummy.

(h) (4 points) You test the null hypothesis that the time fixed effects are not statistically significant. The value of the $F$-test is 1.91. What do you conclude?

There are fourteen time fixed effects, so we have to compare 1.91 with the critical value of an $F_{14,\infty}$ distribution. The critical values are 1.50 (10%), 1.69 (5%) and 2.08 (1%), so we cannot reject the null at the 1% significance level, but we reject at the 10% and 5% levels.

(i) (5 points) Using the results in (16), calculate the p-value of the test of statistical significance of $\beta_{SENF}$.

This is a two-sided test, so that the p-value is

$$2\,\Phi\left(-\frac{.00009}{.0003}\right) = 2\,\Phi(-0.3) = .7642$$
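A two-sided p-value of this kind can be computed with the standard library alone. The sketch below is a minimal illustration; the function name `two_sided_p_value` is an assumption, and the normal approximation is the one used in the answer.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """p-value of H0: coefficient = null_value against a two-sided alternative."""
    t_ratio = (estimate - null_value) / std_error
    return 2.0 * normal_cdf(-abs(t_ratio))

# Coefficient and standard error of SENF from equation (16).
p_value = two_sided_p_value(0.00009, 0.0003)
print(round(p_value, 4))
```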

(j) (6 points) Because you suspect that $SENF_{it}$ is endogenous, you would like to use IV to estimate consistently its impact on fatality rates. A fellow researcher suggests using, as an instrument, another binary variable ($PENF_{it}$) equal to one if state $i$ has in place "primary enforcement" at time $t$ (primary enforcement gives police officers more power in enforcing seat belt laws). Do you think this instrument is likely to be valid? Explain.

This is probably a bad idea. Surely, PENF is likely to be relevant. Factors that are associated with the probability of having "secondary enforcement" are also likely to be associated with having "primary enforcement", so the two variables are likely to be strongly related (positively or negatively, depending on whether SENF and PENF are "complements" or "substitutes" from a policy perspective). However, if we suspect that there are unobserved factors (such as road conditions, political views in the state, etc.) that are related to SENF and to $f$ through channels different from SENF (so that SENF is correlated with the error term and hence endogenous), it is not clear why one should think that the same factors will not also be correlated with PENF. So, PENF is likely to be relevant but not exogenous, hence not a valid instrument.

(k) (6 points) You want to go ahead with the idea of trying to use $PENF_{it}$ as an instrument, and you decide to use also $age_{it}$ (mean age of drivers in state $i$ in year $t$) as a further instrumental variable in estimating equation (16). Explain how you should carry out the first stage of the 2SLS procedure.

You should regress the endogenous variable (SENF) on all instruments (PENF and age) and all the exogenous variables included in (16), that is, the constant, $\log inc_{it}$, as well as the time fixed effects.

(l) (6 points) You estimate the first stage of 2SLS, and you want to check if the instruments are likely to be weak. The F-test is 382.11. Explain what the null hypothesis of this F-test is, and what you conclude in terms of instrument weakness.

The null hypothesis is that all coefficients that multiply the instruments are equal to zero. We conclude that the instruments are weak if the F statistic for this joint test of statistical significance is below 10. Here, the statistic is clearly above the threshold, and so we conclude (as expected) that the instruments are relevant, and not weak.

You estimate the second stage of 2SLS, and this is what you get:

$$\hat f_{it} = \underset{(.010)}{.195} - \underset{(.00047)}{.0007} \times SENF_{it} - \underset{(.0011)}{.0018}\, \log inc_{it} + TimeFixedEffects. \qquad (17)$$

(m) (5 points) Now the coefficient for $SENF_{it}$ has turned negative. Calculate the power of a 5% test of statistical significance of this coefficient, when the true value of the coefficient is zero.

The power is the probability of rejecting $H_0: \beta_{SENF} = 0$ when the truth is that $\beta_{SENF} = \beta^A_{SENF}$. But if the coefficient is zero, then $\beta^A_{SENF} = 0$, so that this is just the probability of rejecting the null when the null is true! If we use a 5% significance level, this probability is .05 by construction.

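(n) (6 points) Calculate the power of a 5% test of statistical significance of this coefficient, when the true value of the coefficient is $-0.0005$.

$$\begin{aligned}
\text{power}_{\alpha = 0.05}(-0.0005)
&= 1 - \Pr\left(\text{do not reject null at 5\%} \mid \beta^A_{SENF} = -0.0005\right) \\
&= 1 - \Pr\left(-1.96 \le \frac{\hat\beta_{SENF} - 0}{.00047} \le 1.96 \,\middle|\, \beta^A_{SENF} = -0.0005\right) \\
&= 1 - \Pr\left(-1.96 + \frac{0.0005}{.00047} \le \frac{\hat\beta_{SENF} - (-0.0005)}{.00047} \le 1.96 + \frac{0.0005}{.00047} \,\middle|\, \beta^A_{SENF} = -0.0005\right) \\
&= 1 - \Pr\Bigg(-.89617021 \le \underbrace{\frac{\hat\beta_{SENF} - (-0.0005)}{.00047}}_{\sim N(0,1)} \le 3.0238298 \,\Bigg|\, \beta^A_{SENF} = -0.0005\Bigg) \\
&= 1 - \Phi(3.0238298) + \Phi(-.89617021) = .18632892
\end{aligned}$$

By the symmetry of the normal distribution this equals $1 - \Phi(.89617021) + \Phi(-3.0238298)$.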
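The power calculation generalizes to any alternative value of the coefficient. The following Python sketch is a minimal illustration under the same normal approximation; the function name `power_two_sided` and its parameter names are assumptions.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sided(beta_alt: float, std_error: float, critical: float = 1.96) -> float:
    """Probability of rejecting H0: beta = 0 when the true coefficient is beta_alt."""
    shift = beta_alt / std_error
    # Under the alternative, the t-ratio is N(shift, 1); we fail to reject
    # when it falls inside [-critical, critical].
    prob_not_reject = normal_cdf(critical - shift) - normal_cdf(-critical - shift)
    return 1.0 - prob_not_reject

power_at_alternative = power_two_sided(-0.0005, 0.00047)
power_at_null = power_two_sided(0.0, 0.00047)
print(round(power_at_alternative, 4), round(power_at_null, 4))
```

Evaluating the same function at a true value of zero returns the size of the test, which is the point made in part (m).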

(o) (6 points) You still suspect that your instruments may not be exogenous, and because you have more instruments than necessary, you can perform a test of overidentifying restrictions. The value of the test is 7.07. Explain how this test should be carried out, and what you conclude about the validity of your instruments.

First, after estimating the model with 2SLS, you should calculate the residuals as

$$\hat u_{it} = f_{it} - .195 + .0007 \times SENF_{it} + .0018 \times \log inc_{it} - TimeFixedEffects.$$

Then you should regress $\hat u_{it}$ on all instruments, $\log inc_{it}$ and the TimeFixedEffects, and finally you should test the joint null that the coefficients on both instruments (PENF and age) are equal to zero. Here the value of the test is 7.07, which has to be compared with the critical value of a $\chi^2_1$ distribution (there is one overidentifying restriction). Clearly, we reject at any standard significance level. This suggests that at least one of the two instruments is not valid, and hence our results are probably not that meaningful, as expected from the poor choice of instruments.

Now you finally turn to exploiting the panel structure of your data, and you estimate the usual equation using Fixed Effects (FE) and Random Effects (RE). The results are as follows (year dummies are included but omitted from the table for brevity):

                         FE          RE

SENF                  -.00005     -.00019
                      (.0003)     (.0003)

log inc                .019        .0037
                      (.0026)     (.0021)

Time Fixed Effects     Yes         Yes
Constant               No          Yes

(p) (4 points) Using the FE results, what is the predicted decrease in the number of fatalities associated with a state having seat belt enforcement?

Seat belt enforcement is associated with a decline in fatalities of .00005 deaths per million traffic miles.

(q) (4 points) The results using the FE and RE models are quite different. You perform a Hausman test, and the result of the test is 104.18. What do you conclude?

The value of the test is very large. There are 16 degrees of freedom (14 time dummies + the regressors in the table), and the test is asymptotically distributed as a $\chi^2_{16}$. The critical value at the 1% significance level is 32, so we reject the null that the state fixed effects are uncorrelated with the included regressors at any standard significance level. We conclude that RE is inconsistent, and so we should use FE (with TWO grains of salt, and it's not clear that using FE will solve all endogeneity problems...).

2. You wish to estimate the impact of mothers' smoking on child birth weight. Low birth weight is often associated with poorer outcomes for the child, both in infancy and later in life. You know that the true model is the following:

$$\log(birthw)_i = \beta_{0i} + \beta_{1i}\, packsday_i + u_i, \qquad (18)$$

where notice that there is heterogeneity in the coefficients, which can therefore be treated as random variables themselves. You can also assume that the coefficients are independent of all the other variables. You would like to estimate the Average Treatment Effect (ATE), that is, $E[\beta_{1i}]$.

(a) (6 points) Suppose for the moment that $E[u_i \mid packsday_i] = 0$, and assume also that there are two "types" of mothers: half of the population is composed of Type I mothers, for whom $\beta_{1i} = -0.12$, while the other half of the population is composed of Type II mothers, for whom $\beta_{1i} = -0.02$. If you estimate equation (18) with OLS using i.i.d. data, what is the probability limit of $\hat\beta_1$? Justify your answer.

We proved in class that if the regressor is exogenous, then

$$\hat\beta_1^{OLS} \xrightarrow{p} E[\beta_{1i}]$$

so that OLS estimates the ATE consistently. In our case, because we have two types which occur with the same probability, we have

$$\hat\beta_1^{OLS} \xrightarrow{p} E[\beta_{1i}] = .5\,(-.12) + .5\,(-.02) = -0.07$$

In reality, $E[u_i \mid packsday_i] \neq 0$, as there are several omitted factors that are likely to lead to a negative correlation between $birthw_i$ and $packsday_i$. Hence, you resort to using IV, using cigarette prices ($p_i$) as an instrument. You can assume that $p_i$ is a valid instrument, in the sense that it is both relevant and exogenous. However, there is also heterogeneity in the first stage of IV, which is then as follows:

$$packsday_i = \pi_{0i} + \pi_{1i}\, p_i + v_i, \qquad (19)$$

where you know that $\pi_{1i} = -0.05$ for Type I women, but $\pi_{1i} = 0$ for Type II women.

(b) (6 points) If you now estimate equation (18) with IV using i.i.d. data and $p_i$ as an instrument, what is the probability limit of $\hat\beta_1^{IV}$? Justify your answer.

The probability limit of $\hat\beta_1^{IV}$ is not the ATE. We know that in fact

$$\hat\beta_1^{IV} \xrightarrow{p} E\left[\beta_{1i}\, \frac{\pi_{1i}}{E(\pi_{1i})}\right]$$

In this case

$$E\left[\beta_{1i}\, \frac{\pi_{1i}}{E(\pi_{1i})}\right] = .5\,(-.12)\left(\frac{-0.05}{E(\pi_{1i})}\right) + .5\,(-.02)\left(\frac{0}{E(\pi_{1i})}\right) = .5\,(-0.05)(-.12)\, \frac{1}{(.5)(-0.05) + (.5)\,0} = -.12$$

(c) (5 points) Is $\hat\beta_1^{IV}$ a consistent estimator for the ATE? Briefly interpret your results.

No. Because Type II women's smoking habits are not at all affected by cigarette prices, their "beta" will receive zero weight. Hence, IV will converge in probability to the beta of women of Type I only ($-0.12$ rather than the ATE of $-0.07$), hence severely overestimating the potential harm of smoking on birth weight.
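The weighting argument can be made concrete with a few lines of Python. This is only a sketch of the two probability limits for the two-type population described in the problem; the tuple layout and variable names are illustrative assumptions.

```python
# Each type: (population share, effect of smoking beta_1i, first-stage slope).
types = [
    (0.5, -0.12, -0.05),  # Type I: smoking responds to cigarette prices
    (0.5, -0.02, 0.0),    # Type II: smoking does not respond to prices
]

# ATE: population average of beta_1i (the OLS limit under exogeneity).
ate = sum(share * beta for share, beta, _ in types)

# IV limit: average of beta_1i weighted by each type's first-stage slope.
weighted_effect = sum(share * beta * slope for share, beta, slope in types)
mean_slope = sum(share * slope for share, _, slope in types)
iv_limit = weighted_effect / mean_slope

print(round(ate, 4), round(iv_limit, 4))
```

Changing the first-stage slope of Type II to a non-zero value moves the IV limit back toward the ATE, which illustrates why the zero response to prices drives the result.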

3. You have an i.i.d. sample of 1000 observations on yearly income $y_1, \ldots, y_{1000}$. Let $z$ be a "poverty line", so that an individual is categorized as "poor" if her/his yearly income is below $z$. Let $p_i$ denote a binary variable equal to one if an individual is poor, that is, if $y_i < z$. Let $H$ denote the "poverty head count ratio", that is, the proportion of individuals who are poor in the population. You want to estimate $H = P(y_i < z)$. You can assume that all the usual "regularity conditions" hold, so that you can apply the LLN and CLT when appropriate.

(a) (4 points) Prove that $H = E(p_i)$.

$$E(p_i) = 1 \cdot \Pr(y_i < z) + 0 \cdot \Pr(y_i \ge z) = \Pr(y_i < z) = H$$

Note that

$$E(p_i) \neq \frac{1}{n}\sum_{i=1}^n p_i$$

Sample mean and expectation are not the same thing!

(b) (6 points) Write down the log-likelihood of $(p_1, p_2, \ldots, p_{1000})$.

This is really easy once you realize that this is the usual, standard log-likelihood of an i.i.d. sample from a Bernoulli distribution with parameter $H$. Then:

$$\ln L(H) = \sum_{i=1}^{1000} p_i \ln H + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - H)$$

(c) (6 points) What is the maximum likelihood estimator of the poverty headcount ratio $H$? To get full credit, you can either prove your answer by maximizing the likelihood, or refer clearly to results we have seen in class.

As we saw in class, the MLE of the single parameter that identifies a Bernoulli distribution, with an i.i.d. sample, is just the sample mean, so that:

$$\hat H = \frac{1}{1000}\sum_{i=1}^{1000} p_i$$

This, of course, could also be derived by solving the maximization problem

$$\max_H \ln L(H) = \max_H \left[\sum_{i=1}^{1000} p_i \ln H + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - H)\right]$$

$$\Rightarrow\; 0 = \left(\sum_{i=1}^{1000} p_i\right)\frac{1}{\hat H} + \left(1000 - \sum_{i=1}^{1000} p_i\right)\frac{1}{1 - \hat H}\,(-1)$$

$$\Rightarrow\; \hat H = \frac{1}{1000}\sum_{i=1}^{1000} p_i$$

(d) (6 points) Suppose that 10% of individuals in your sample are poor. Using a 10% significance level, can you reject the null hypothesis that $H = .15$ using a likelihood ratio test?

$$\ln L_{Unrestricted} = \sum_{i=1}^{1000} p_i \ln(.1) + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - .1) = 100 \ln(.1) + 900 \ln(.9) = -325.08$$

$$\ln L_{Restricted} = \sum_{i=1}^{1000} p_i \ln(.15) + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - .15) = 100 \ln(.15) + 900 \ln(.85) = -335.98$$

so that

$$LR = 2\,[\ln L_{Unrestricted} - \ln L_{Restricted}] = 2\,[-325.08 + 335.98] = 21.8 > 2.71$$

This LR test has an asymptotic $\chi^2_1$ distribution, as here we have only one restriction. So we definitely reject the null at any standard significance level, including 10% (the critical value for 10% is 2.71). The most common mistake here has been to confuse the value of the parameter with the value of the likelihood.
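The two log-likelihood values and the LR statistic are easy to reproduce. The Python sketch below assumes only the sample described in the question (100 poor individuals out of 1000); the function name `bernoulli_loglik` is illustrative, and the critical value 2.71 is taken from the text.

```python
import math

def bernoulli_loglik(h: float, successes: int, n: int) -> float:
    """Log-likelihood of an i.i.d. Bernoulli sample evaluated at parameter h."""
    return successes * math.log(h) + (n - successes) * math.log(1.0 - h)

n_obs = 1000
n_poor = 100  # 10% of the sample is poor

# Unrestricted: evaluated at the MLE (the sample mean). Restricted: at H = .15.
loglik_unrestricted = bernoulli_loglik(n_poor / n_obs, n_poor, n_obs)
loglik_restricted = bernoulli_loglik(0.15, n_poor, n_obs)

lr_stat = 2.0 * (loglik_unrestricted - loglik_restricted)
reject_at_10pct = lr_stat > 2.71  # chi-square(1) 10% critical value from the text
print(round(loglik_unrestricted, 2), round(loglik_restricted, 2), round(lr_stat, 2))
```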

(e) (6 points) Suppose now that you are estimating the proportion of individuals who are poor using the sample mean, that is

$$\hat H = \bar p = \frac{1}{n}\sum_{i=1}^n p_i,$$

where in this case $n = 1000$, and $\bar p = .10$. Calculate a 95% confidence interval for $\bar p$.

Because this is a sample from a Bernoulli distribution, $\widehat{var}(\hat H) = \frac{(1 - \hat H)\hat H}{1000} = \frac{(.1)(.9)}{1000}$, so the confidence interval is

$$\left[\,.10 - 1.96\sqrt{\frac{(.1)(.9)}{1000}}\,,\;\; .10 + 1.96\sqrt{\frac{(.1)(.9)}{1000}}\,\right] = [\,0.081406\,,\; 0.11859\,]$$
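The interval can be verified with a minimal Python sketch. The function name `proportion_ci` is an assumption; the formula is the normal-approximation interval used in the answer.

```python
import math

def proportion_ci(p_hat: float, n: int, critical: float = 1.96):
    """Normal-approximation confidence interval for a Bernoulli proportion."""
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - critical * std_error, p_hat + critical * std_error

lower, upper = proportion_ci(0.10, 1000)
print(round(lower, 6), round(upper, 6))
```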

(f) (5 points) Is $\hat H$ a consistent estimator for the true headcount poverty ratio $H$? Prove your answer.

It is certainly consistent, as it is just a sample mean of i.i.d. random variables. The mean, under the usual regularity conditions, will converge in probability to the expectation of an arbitrary term, so that:

$$\hat H = \bar p = \frac{1}{n}\sum_{i=1}^n p_i \xrightarrow{p} E(p_i) = H$$

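(g) (4 points) Now you want to check if $\hat H$ is asymptotically normally distributed. Let $p_i = H + u_i$, where $H$ is the true value of the headcount ratio, and $u_i$ is a residual. First prove that $E(u_i) = 0$.

Just using the definition of $u_i$:

$$E(u_i) = E(p_i - H) = E(p_i) - H = H - H = 0$$

(h) (4 points) Prove that $Var(u_i) = H(1 - H)$.

It is sufficient to notice that $Var(u_i) = Var(p_i - H) = Var(p_i)$, which is just the variance of a Bernoulli random variable with parameter $H$.

(i) (4 points) Let $\bar u$ be the sample mean of $u_i$. Prove that

$$\sqrt{n}\left(\hat H - H\right) = \frac{\bar u}{\sqrt{\frac{H(1 - H)}{n}}}\, \sqrt{H(1 - H)}$$

This follows from a few manipulations:

$$\hat H = \frac{1}{n}\sum_{i=1}^n p_i = \frac{1}{n}\sum_{i=1}^n (H + u_i) = H + \frac{1}{n}\sum_{i=1}^n u_i$$

$$\Rightarrow\; \hat H - H = \frac{1}{n}\sum_{i=1}^n u_i \;\Rightarrow\; \left(\hat H - H\right) = \bar u$$

$$\Rightarrow\; \sqrt{n}\left(\hat H - H\right) = \sqrt{n}\,\bar u = \frac{\sqrt{n}\,\bar u\, \sqrt{H(1 - H)}}{\sqrt{H(1 - H)}} = \frac{\bar u}{\sqrt{\frac{H(1 - H)}{n}}}\, \sqrt{H(1 - H)}$$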

(j) (7 points) Using the Central Limit Theorem (CLT), prove that

$$\sqrt{n}\left(\hat H - H\right) \xrightarrow{d} N(0, H(1 - H))$$

This is just as in Midterm 2. First, because we have i.i.d. random variables and $E(u_i) = 0$, and $\sqrt{\frac{H(1 - H)}{n}}$ is the standard deviation of $\bar u$, we have

$$\frac{\bar u}{\sqrt{\frac{H(1 - H)}{n}}} = \frac{\bar u - 0}{\sqrt{\frac{H(1 - H)}{n}}} \xrightarrow{d} N(0, 1).$$

But $\sqrt{n}\left(\hat H - H\right)$ is just $\frac{\bar u}{\sqrt{H(1 - H)/n}}$ multiplied by a constant equal to $\sqrt{H(1 - H)}$, so its asymptotic distribution will be $N(0, H(1 - H))$.

(k) (4 points) Using the previous results, how would you construct a 95% confidence interval for the headcount ratio $H$?

Analogously to what we saw in Midterm 2:

$$\hat H \pm 1.96\sqrt{\frac{\hat H\left(1 - \hat H\right)}{n}}$$

Now you would like to study the relation between poverty and voting behavior. You are using data from a country with two political parties, A and B. You would like to have an estimate of the poverty head count ratio among voters of party A, that is, you would like to know $H_A \equiv P(y_i \le z \mid A_i = 1) = E(p_i \mid A_i = 1)$, where $y$ is income, $z$ is the poverty line, and $A_i$ is a binary variable equal to one if $i$ votes for party A, and equal to zero if individual $i$ votes for party B. Let $p_A$, $p_B$ be the proportion of individuals who vote for party A and B respectively. Similarly, let $H_B = P(y_i \le z \mid A_i = 0)$ denote the poverty ratio among voters of party B. Suppose that you have a dataset which includes $n$ i.i.d. observations $(y_i, A_i)$.

(l) (6 points) Prove that $E[p_iA_i] = H_Ap_A$.

Both $p$ and $A$ are binary variables, so the only instance where their product is non-zero is if both are equal to 1. Then

$$E[p_iA_i] = \Pr(p_i = 1, A_i = 1) = \underbrace{\Pr(p_i = 1 \mid A_i = 1)}_{H_A}\; \underbrace{\Pr(A_i = 1)}_{p_A} = H_Ap_A$$

(m) (5 points) Is the following estimator unbiased for $H_A$? Justify your answer.

$$\hat H_A = \frac{\sum_{i=1}^n p_iA_i}{\sum_{i=1}^n A_i}. \qquad (20)$$

This estimator is NOT unbiased. To be unbiased, we would need $E(\hat H_A) = H_A$; however, the expectation of a ratio of two random variables is in general NOT equal to the ratio of the expectations of the two random variables, so that

$$E\left(\hat H_A\right) = E\left(\frac{\sum_{i=1}^n p_iA_i}{\sum_{i=1}^n A_i}\right) \neq \frac{E\left(\sum_{i=1}^n p_iA_i\right)}{E\left(\sum_{i=1}^n A_i\right)} = H_A$$

(n) (6 points) Is $\hat H_A$ a consistent estimator of $H_A$? Justify your answer.

It is consistent. Observations are i.i.d. and we assume that the usual regularity conditions hold, so

$$\hat H_A = \frac{\overbrace{\dfrac{1}{n}\sum_{i=1}^n p_iA_i}^{\xrightarrow{p}\, E(p_iA_i) = H_Ap_A}}{\underbrace{\dfrac{1}{n}\sum_{i=1}^n A_i}_{\xrightarrow{p}\, E(A_i) = p_A}} \xrightarrow{p} \frac{H_Ap_A}{p_A} = H_A$$

(o) (2 points) Now suppose that, unfortunately, you do not have a dataset that includes information on both voting behavior and income. However, suppose that you know for certain that 10% of the population of voters is poor, and you also know that 40% of the population voted for party A. We want to see if this information can be used to infer something about $H_A$. Prove that

$$H_A = \frac{.1 - H_B\,(.6)}{.4}. \qquad (21)$$

$$\begin{aligned}
.10 = \Pr(p_i = 1) &= \Pr(p_i = 1, A_i = 1) + \Pr(p_i = 1, A_i = 0) \\
&= \Pr(p_i = 1 \mid A_i = 1)\Pr(A_i = 1) + \Pr(p_i = 1 \mid A_i = 0)\Pr(A_i = 0) \\
&= H_Ap_A + H_Bp_B = H_A\,(.40) + H_B\,(.60)
\end{aligned}$$

$$\Rightarrow\; H_A = \frac{.1 - H_B\,(.6)}{.4}$$

(p) (2 points) Given the information at hand, calculate the "lower bound" for $H_A$, that is, calculate the smallest possible value of $H_A$ that is consistent with equation (21). Does your conclusion provide useful information about $H_A$? Hint: Remember that $H_B$ is the fraction of voters who vote for B who are poor, so that this fraction is certainly a number $\ge 0$ and $\le 1$.

We know that $H_B$, being a probability, must be between zero and one. It enters the expression for $H_A$ with a negative sign, so that the minimum value of $H_A$ is achieved when $H_B$ is equal to one, its maximum value, so

$$H_A \ge \frac{.1 - .6}{.4} = -1.25$$

which is not very useful information. We already knew that $H_A$ must be between zero and one, because it is a probability/fraction!

(q) (2 points) Given the information at hand, calculate the "upper bound" for $H_A$, that is, calculate the largest possible value of $H_A$ that is consistent with equation (21). Does your conclusion provide useful information about $H_A$?

In this case the information IS useful. The maximum value of $H_A$ is achieved (by an argument similar to the one in the previous point) when $H_B$ achieves its minimum value, that is, zero. In this case then we do learn that

$$H_A \le \frac{.1}{.4} = .25$$

So even if the information we have is not sufficient to calculate $H_A$ exactly, we can at least say that no more than one quarter of A-voters are poor.
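The bounding argument in (p) and (q) can be sketched in a few lines of Python. The function and variable names are illustrative assumptions; the last two lines add the trivial restriction that a fraction lies in $[0, 1]$, which is what makes the lower bound uninformative.

```python
def implied_h_a(h_b: float, poor_share: float = 0.10, share_a: float = 0.40) -> float:
    """H_A implied by equation (21) for a given value of H_B."""
    return (poor_share - h_b * (1.0 - share_a)) / share_a

# H_A is decreasing in H_B, so the extremes of H_B give the bounds.
raw_lower = implied_h_a(1.0)  # H_B at its maximum
raw_upper = implied_h_a(0.0)  # H_B at its minimum

# Combine with the fact that H_A is itself a fraction between 0 and 1.
informative_lower = max(0.0, raw_lower)
informative_upper = min(1.0, raw_upper)
print(round(raw_lower, 4), round(raw_upper, 4), informative_lower, round(informative_upper, 4))
```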

11.4 Final - Fall 2007

1. Malaria affects millions of people worldwide and kills many thousands every year. The public health literature has shown that one of the most effective preventive measures against malaria is the regular use of insecticide treated nets (ITNs) while sleeping at night. You want to study this topic by using data from an i.i.d. random sample of 2569 individuals in rural Orissa (India). Let $ITN_i$ denote a binary variable equal to one if individual $i$ sleeps regularly under an ITN, and let $M_i$ denote a binary variable equal to one if individual $i$ has malaria at the time of the survey.

(a) (5 points) Suppose that you know that 58 percent of individuals regularly use a net, and you also know that the fraction of individuals in your sample with malaria is .14 among those with $ITN = 0$, and .10 among those with $ITN = 1$. Calculate $\bar M$ (that is, the fraction of individuals with malaria in the sample).

This just requires the calculation of a weighted mean (the sample equivalent of the LIE). So

$$\bar M = .58\,(.10) + (1 - .58)\,(.14) = 0.1168$$

You would like to estimate the impact of using ITNs on malaria. You estimate a simple linear regression of $M_i$ on $ITN_i$, and the results are the following (standard errors are heteroskedasticity-robust):

                          Robust
           M       Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

         ITN  -.0326259    .013054    -2.50   0.013    -.0582234   -.0070285
       _cons   .1353105   .0104173    12.99   0.000     .1148834    .1557376

Throughout the problem, let $\beta_X$ denote the coefficient corresponding to a given regressor $X$; so for instance, in this case $\hat\beta_{ITN} = -0.0326259$.

(b) (6 points) Interpret the estimated coefficient $\hat\beta_{ITN}$ and indicate whether it is statistically significant at the standard significance levels.

The fraction of individuals with malaria is 0.033 lower among individuals who sleep regularly under a net than among those who don't. The t-ratio is

$$t = \frac{-.033}{.013054} = -2.5280$$

which is significant at the 5% and 10% levels, but not at the 1% level (even if almost so).

(c) (5 points) Calculate the p-value of a two-sided test where $H_0: \beta_{ITN} = -0.05$.

$$\text{p-value} = 2\,\Phi\left(-\left|\frac{-.0326259 - (-0.05)}{.013054}\right|\right) = 2\,\Phi(-1.33) = .1835$$

(d) (6 points) Do you think the results in this regression can be interpreted in a causal sense? Why, or why not?

Definitely not. Individuals choose whether to sleep under a net. This regression most likely suffers from omitted variable bias. For instance, richer persons, or persons that care more about health, are more likely to use an ITN (which induces a correlation between omitted variable and regressor), but these same characteristics are also likely to affect malaria prevalence directly, through pathways different from the use of ITNs. Hence, both conditions for the existence of OVB hold and OLS will not be consistent for the causal effect.

(e) (4 points) In this model, the dependent variable is binary. Do you think a logit or a probit model would have been more appropriate? Explain.

Not really. First, we know that usually results with logit and probit are very similar to those obtained using the LPM. But (and this is the most important reason) in this case the single included regressor is binary, so logit and probit will actually lead to predicted probabilities identical to those of the LPM, by construction! Intuitively, we know that with no regressors the three models will just estimate the probability of a "success" (that is, a "1"). If the only regressor is a binary variable, all models will estimate the probability of a success for each of the two possible values of the regressor. This will lead to identical (not just similar) results across models.

Now you estimate the following model, again using OLS:

                          Robust
           M       Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

         ITN  -.0285609   .0130424    -2.19   0.029    -.0541356   -.0029863
        lpce  -.0348944   .0104168    -3.35   0.001    -.0553207   -.0144681
       _cons   .3268167   .0593594     5.51   0.000     .2104195    .4432139

where lpce denotes the logarithm of per capita (per head) expenditure (in Indian Rupees) in individual $i$'s family.

(f) (5 points) Calculate the fitted value $\hat M_i$ for an individual who sleeps regularly under a net, and who lives in a household where expenditure per head is 300 Rupees.

$$\hat M_i = .3268167 - .0285609 - .0348944\,(\ln 300) = 0.099$$
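The fitted value can be reproduced with a small Python sketch of the linear probability model. The function name `fitted_malaria_probability` is an assumption; the coefficients are those in the OLS output.

```python
import math

# OLS coefficients of the linear probability model for malaria.
b_cons = 0.3268167
b_itn = -0.0285609
b_lpce = -0.0348944

def fitted_malaria_probability(uses_itn: int, expenditure_per_head: float) -> float:
    """Fitted value of M for a given ITN status and per capita expenditure (Rupees)."""
    return b_cons + b_itn * uses_itn + b_lpce * math.log(expenditure_per_head)

fitted = fitted_malaria_probability(1, 300.0)
print(round(fitted, 3))
```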

(g) (5 points) Interpret the estimated coefficient $\hat\beta_{lpce}$ (you do not need to comment on the significance).

A 1% increase in pce decreases the predicted probability of having malaria by $0.01 \times 0.035 = 0.00035$.

(h) (6 points) Interpret the change in $\hat\beta_{ITN}$ with respect to the value estimated using the previous model (on page 2) in terms of omitted variable bias.

When $\ln pce$ is included in the regression, $\hat\beta_{ITN}$ decreases by about 10% in absolute value. So, it looks like the first model was suffering from downward bias (the coefficients are negative). We know that the sign of the asymptotic bias is determined by the product of the sign of the coefficient of the omitted variable (in this case $\beta_{\ln pce}$, which is $< 0$) and the sign of the correlation between the included and the excluded regressors (which in this case should be expected to be positive, because richer households are more likely to use nets). So, the sign of the product is negative, and indeed the bias is downwards. In other words, if we do not control for income/wealth the coefficient for ITN will incorporate not only the reduced malaria risk due to nets, but also the fact that when we look at folks who use nets we are looking at individuals who are richer, and hence are likely to have lower malaria burdens for reasons other than ITN use (health care, quality of housing, etc.).


You are still worried that the results may be inconsistent because of omitted variable bias. It so happens that you have collected data for several individuals in different households, so you can use your dataset as if it were a panel where the "unit" is the household and instead of "time" you have "individuals". So, the model you estimate is the following

$$M_{hi} = \beta_{ITN} \times ITN_{hi} + \beta_{lpce} \times lpce_h + \alpha_h + u_{hi} \qquad (22)$$

where the index h denotes the household, and the index i denotes the individual. You estimate equation (22) using fixed effects (FE), and the results are the following:

             |               Robust
           M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ITN |   .0361111   .0750633     0.48   0.631    -.1112186    .1834408
        lpce |  (dropped)
       _cons |   .0954435   .0440625     2.17   0.031     .0089603    .1819267

(i) (4 points) Why did Stata drop $\beta_{lpce}$ from the estimation?

Because ln(pce) is constant for all individuals in a household, hence it is dropped when you use FE, together with anything that does not vary within the "group" (in this case, the household).

(j) (5 points) Suppose that equation (22) suffers from omitted variable bias where the variables omitted are individual-specific (for instance, you could think that individuals who care more about health are more likely to use ITNs and that these individuals are also less likely to have malaria for reasons other than ITN use). In such a case, do you think that the FE estimator will be consistent? Explain.

The FE estimator would not be consistent, because if this were the case we would still have correlation between ITN and the individual-specific error $u_{hi}$. Indeed this is likely to be the case, because even within a family people decide whether they want to use a net or not. Someone who does not use one may care less about illness (or maybe have some sort of immunity and not be very worried about malaria).


(k) (6 points) A colleague suggests that instead of estimating the impact of regular ITN use on malaria using FE, it may be better to use 2SLS. He proposes to use, as an instrument, a binary variable $LN_i$ equal to one if individual i slept under a net the night before the interview. Would you expect LN to be a valid instrument for ITN (which denotes regular use of a net while sleeping)? Why, or why not?

This is a really silly idea. If ITN is endogenous, LN will be endogenous as well for the same reasons! This instrument will most likely be very relevant (if you sleep regularly under a net you are also going to be more likely to have slept under a net the night before the interview), but exogeneity will most certainly not hold.

You go ahead with your colleague's plan, but instead of using only LN as an instrument for ITN, you also use, as instruments, hhsize (number of family members) and age. The result of the first-stage regression is the following:

             |               Robust
         ITN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LN |   .4673849   .0134301    34.80   0.000     .4410499    .4937198
         age |  -.0019503   .0005715    -3.41   0.001    -.0030709   -.0008297
      hhsize |   .0086553   .0036473     2.37   0.018     .0015032    .0158073
        lpce |   .0940303   .0152509     6.17   0.000     .0641249    .1239356
       _cons |  -.0154055     .09324    -0.17   0.869    -.1982389    .1674278

(l) (4 points) You want to check if the instruments you have chosen are weak. Describe the null hypothesis of the test that you should carry out to establish if the instruments are weak.

The null hypothesis is that the first-stage coefficients on the three instruments are jointly zero: $\pi_{LN} = \pi_{hhsize} = \pi_{age} = 0$.


(m) (4 points) The value of the test statistic calculated to test for instrument weakness is 418.65. What do you conclude?

As expected, the instruments (which make no sense...) are strongly correlated with the endogenous variable. The value of the F test is way above 10, hence our instruments are really bad, but very strong!

You carry out the estimation with 2SLS and the results are the following:

             |               Robust
           M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ITN |  -.0423915   .0349156    -1.21   0.225    -.1108571     .026074
        lpce |  -.0337712   .0110664    -3.05   0.002    -.0554713   -.0120712
       _cons |   .3285983   .0590766     5.56   0.000     .2127557     .444441

(n) (4 points) Can you reject the null that $\beta_{ITN} = 0$ at the 10% significance level?

$$t = \frac{-0.0424}{0.035} = -1.21$$

Since $|t| < 1.645$, you cannot reject the null at the 10 percent level.

(o) (5 points) Now you carry out an overidentification test to test for exogeneity, and the value of the F-test for the joint statistical significance of the instruments is 2.11. What do you conclude?

There are three instruments and one endogenous regressor, so there are two overidentification restrictions. Hence

$$J = mF = 3 \times 2.11 = 6.33.$$

The J-statistic has a chi-square distribution with (in this case) two degrees of freedom, so you can reject the null at the 10% or 5% level (critical values 4.61 and 5.99), but not at the 1% level (critical value 9.21). Given that we are certain that this was a stupid instrument, this result confirms that one has to be careful when arguing in favor of exogeneity simply based on a test for exogeneity. You have to use common sense as well!
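The comparison with the chi-square thresholds can be reproduced without statistical tables. The sketch below relies on the fact that a chi-square variable with two degrees of freedom has upper-tail probability exp(-x/2); the function name `chi2_2df_pvalue` is an illustrative choice.

```python
import math

def chi2_2df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square(2) variable, which equals exp(-stat / 2)."""
    return math.exp(-stat / 2.0)

m, f_stat = 3, 2.11            # number of instruments and the reported F statistic
j_stat = m * f_stat            # J = mF
p_value = chi2_2df_pvalue(j_stat)

# Critical values follow from inverting exp(-x / 2) = alpha.
critical = {alpha: -2.0 * math.log(alpha) for alpha in (0.10, 0.05, 0.01)}

for alpha, threshold in critical.items():
    print(f"alpha={alpha:.2f}  critical={threshold:.2f}  reject={j_stat > threshold}")
print(f"J={j_stat:.2f}  p-value={p_value:.4f}")
```

The closed form only holds for two degrees of freedom; with a different number of overidentifying restrictions a general chi-square routine would be needed.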


2. You want to try a completely different approach, and you carry out the following randomized experiment in an area where no one uses ITNs. You give a free ITN to a random sample of individuals, and after a few months you return to the same area and you collect information on ITN use and malaria prevalence. Let $F_i$ be a binary variable equal to one if individual i received a free ITN, and zero otherwise. Like in the previous problem, let $M_i$ denote a binary variable equal to one if individual i has malaria at the time of the survey. Finally, let $S_i$ denote the fraction of nights individual i has slept under a net since you completed your distribution of free nets.

(a) (5 points) Suppose that you estimate the following regression: $M_i = \beta_0 + \beta_1 S_i + u_i$. Do you think that $\beta_1$ could be interpreted as the causal impact of ITN use on malaria prevalence? Explain.

Not really. Even if nets have been distributed at random in the population, all you have done is distribute the nets for free. You are not randomly assigning $S_i$, which is still CHOSEN by individuals. Hence, a regression of M on S will not estimate a causal impact.

(b) (5 points) Suppose that you estimate instead the following regression: $M_i = \gamma_0 + \gamma_1 F_i + v_i$. What is the interpretation of $\gamma_1$ in this regression?

This measures the causal impact on malaria of OFFERING A NET to an individual. It does NOT measure the impact of actually using it, because you do not know whether these folks use them or not. While actual use is not randomly determined, the offer is, and hence you can interpret the results in a causal sense.

(c) (6 points) Suppose that you estimate instead the following regression, where $pce_i$ indicates expenditure per head in individual i's household: $M_i = \gamma_0' + \gamma_1' F_i + \gamma_2 \times pce_i + v_i'$. Would you expect $\gamma_1'$ to be very different from the coefficient on $F_i$ estimated using the model in part (b)? Explain.

No, you should not expect a significant change. That's because $F_i$ has been assigned at random, hence it will be uncorrelated with anything. So, one of the two conditions for OVB (correlation between included and excluded regressor) does not hold, so even if we should expect pce to matter in the regression, the coefficient for $F_i$ should not be expected to change (as long as, of course, you have done the randomization well!).


(d) (4 points) From now on, consider the model in part (a), that is, $M_i = \beta_0 + \beta_1 S_i + u_i$, and assume that $E(u_i) = 0$. Now you consider the idea of estimating this model using instrumental variables. Specifically, you consider the idea of using $F_i$ as an instrument for $S_i$. Do you think this is going to be a valid instrument? Why?

Yes, you should expect this one to be a valid instrument. It is exogenous, because it's been randomly determined, and it should be expected to be relevant, because individuals should be more likely to sleep often under a net if they got one for free.

Assume now that $F_i$ is a valid instrument for $S_i$. In Midterm 2 we proved that $\beta_1$ can be calculated as

$$\beta_1 = \frac{E(M_i \mid F_i = 1) - E(M_i \mid F_i = 0)}{E(S_i \mid F_i = 1) - E(S_i \mid F_i = 0)}$$

Now we want to see how this estimator compares with a more standard 2SLS procedure. As usual, we proceed in steps.

(e) (5 points) Prove that $E(M_i F_i) = E(M_i \mid F_i = 1) E(F_i)$.

$$E(M_i F_i) = P(M_i = 1, F_i = 1) = P(M_i = 1 \mid F_i = 1) P(F_i = 1) = E(M_i \mid F_i = 1) E(F_i)$$

where all the changes from E to P and vice versa follow easily because both random variables are binary (which is also the reason why $E(M_i F_i)$ is equal to the probability that both are equal to one).


(f) (6 points) Suppose that you have two random variables, $X_i$ and $Y_i$. You know that $X_i$ is a binary random variable, and let $P(X_i = 1) = E(X_i) \equiv p_X$. Prove that

$$Cov(Y_i, X_i) = p_X (1 - p_X) \left[ E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) \right]$$

Start from the definition of covariance (I omit the index i for simplicity):

$$Cov(X, Y) = E(XY) - E(X) E(Y)$$

Now we know that X is binary, so

$$E(XY) = E(Y \mid X = 1) p_X + 0$$

$$E(Y) = E(Y \mid X = 1) p_X + E(Y \mid X = 0) (1 - p_X)$$

Hence

$$\begin{aligned}
Cov(X, Y) &= E(Y \mid X = 1) p_X - E(Y \mid X = 1) p_X^2 - E(Y \mid X = 0) p_X (1 - p_X) \\
&= E(Y \mid X = 1) \left( p_X - p_X^2 \right) - E(Y \mid X = 0) p_X (1 - p_X) \\
&= E(Y \mid X = 1) p_X (1 - p_X) - E(Y \mid X = 0) p_X (1 - p_X) \\
&= p_X (1 - p_X) \left[ E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) \right]
\end{aligned}$$
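The covariance identity in part (f) can be verified exactly on a small example. The joint distribution below is hypothetical and chosen only for illustration; exact fractions are used so that both sides can be compared without rounding error.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution of (X, Y): X is binary, Y takes the values 1, 2, 5.
joint = {
    (0, 1): Fr(1, 10), (0, 2): Fr(3, 10), (0, 5): Fr(1, 10),
    (1, 1): Fr(2, 10), (1, 2): Fr(1, 10), (1, 5): Fr(2, 10),
}

def expect(g):
    """Expectation of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

p_x = expect(lambda x, y: x)                                  # P(X = 1)
cov = expect(lambda x, y: x * y) - p_x * expect(lambda x, y: y)
e_y_given_1 = expect(lambda x, y: x * y) / p_x                # E(Y | X = 1)
e_y_given_0 = expect(lambda x, y: (1 - x) * y) / (1 - p_x)    # E(Y | X = 0)
rhs = p_x * (1 - p_x) * (e_y_given_1 - e_y_given_0)

print(cov, rhs)
```

Both printed values coincide, as the proof requires; changing the probabilities (keeping them summing to one) leaves the equality intact.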


(g) (5 points) Putting together the results in the previous steps, prove that

$$\beta_1 = \frac{E(M_i \mid F_i = 1) - E(M_i \mid F_i = 0)}{E(S_i \mid F_i = 1) - E(S_i \mid F_i = 0)} = \frac{Cov(M_i, F_i)}{Cov(S_i, F_i)} \qquad (23)$$

Based on the results in the previous step, we have

$$\frac{Cov(M_i, F_i)}{Cov(S_i, F_i)} = \frac{p_F (1 - p_F) \left[ E(M_i \mid F_i = 1) - E(M_i \mid F_i = 0) \right]}{p_F (1 - p_F) \left[ E(S_i \mid F_i = 1) - E(S_i \mid F_i = 0) \right]} = \frac{E(M_i \mid F_i = 1) - E(M_i \mid F_i = 0)}{E(S_i \mid F_i = 1) - E(S_i \mid F_i = 0)}$$

(h) (5 points) From Midterm 2, it follows that $\beta_1$ can be estimated consistently by using

$$\hat{\beta}_1^{Wald} = \frac{\hat{E}(M_i \mid F_i = 1) - \hat{E}(M_i \mid F_i = 0)}{\hat{E}(S_i \mid F_i = 1) - \hat{E}(S_i \mid F_i = 0)}$$

that is, by calculating the sample analogue of the middle expression in (23). This estimator is sometimes called the "Wald Estimator". Based on the results in part (g), what is the relationship between the Wald Estimator and the 2SLS estimator where the binary variable $F_i$ is used as instrument? Explain.

They are identical! The Wald estimator is the sample analogue of the middle expression in (23), while 2SLS is the sample analogue of the last term in (23), but the two expressions are identical, so the estimators are effectively the same.
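The equality of the two estimators also holds exactly in any sample, which a short simulation can illustrate. The data-generating process below is entirely hypothetical (the take-up rule and the malaria probabilities are made up for the example); only the two formulas come from the text.

```python
import random

random.seed(0)
n = 2000

# Hypothetical data: F is the randomized offer, S the fraction of nights under a net.
F = [random.randint(0, 1) for _ in range(n)]
S = [0.6 * f + 0.4 * random.random() for f in F]
M = [1 if random.random() < 0.10 - 0.05 * s else 0 for s in S]

def mean(xs):
    return sum(xs) / len(xs)

def cond_mean(ys, flag):
    """Sample mean of ys among observations with F equal to flag."""
    return mean([y for y, f in zip(ys, F) if f == flag])

def cov(xs, ys):
    """Sample covariance (the normalization cancels in the ratio below)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Wald estimator: ratio of differences in conditional means.
wald = (cond_mean(M, 1) - cond_mean(M, 0)) / (cond_mean(S, 1) - cond_mean(S, 0))
# IV / 2SLS estimator with a single instrument: ratio of sample covariances.
iv = cov(M, F) / cov(S, F)

print(wald, iv)
```

The two numbers agree up to floating-point rounding, whatever the seed, because the sample covariance with a binary variable factors in the same way as the population covariance in part (f).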


(i) (6 points) You proceed with the 2SLS estimation and the results are the following (standard errors in parentheses):

$$\hat{M}_i = \underset{(0.01)}{0.10} - \underset{(0.015)}{0.05} \times S_i, \quad R^2 = 0.01, \quad \widehat{cov}(\hat{\beta}_0, \hat{\beta}_S) = -0.0001 \qquad (24)$$

Do these results suggest that it is important to sleep regularly under an ITN to reduce the risk of contracting malaria? Comment on both the statistical and substantive significance of the results.

Yes they do. The probability of having malaria without ever sleeping under a net ($S_i = 0$) is 10 percent, while the probability is only half as large if you always sleep under a net ($S_i = 1$). So the difference is certainly substantively important (also consider that malaria can kill...). The difference is also statistically significant at all standard significance levels, because

$$t = \frac{-0.05}{0.015} = -3.33$$

(j) (4 points) Suppose that the true value of the slope $\beta_1$ is equal to $-0.075$. Based on the estimates in (24), would you argue that the 2SLS estimator is inconsistent? Justify your answer.

Certainly not! An estimator is inconsistent when its probability limit is different from the true value, not when the point estimate is different from the true value! Here, indeed, we should expect the estimator to be consistent, because we are using 2SLS and a valid instrument. The fact that the point estimate is not identical to the true value is just a result of the fact that we have a finite sample. The point estimate is essentially always different from the true value!

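(k) (6 points) Show how to construct a 95% confidence interval for the predicted probability of having malaria for an individual who always sleeps under a net (note: you do not need to complete the calculation. Setting it up correctly will give you full credit).

The predicted probability for $S_i = 1$ is $\hat{\beta}_0 + \hat{\beta}_S = 0.10 - 0.05 = 0.05$, and its variance is the sum of the two variances plus twice the covariance, so the interval is

$$0.05 \pm 1.96 \sqrt{0.01^2 + 0.015^2 + 2 (1) (-0.0001)}$$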
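Completing the calculation is straightforward. The sketch below uses the standard result Var(b0 + x b1) = Var(b0) + x^2 Var(b1) + 2x Cov(b0, b1); the function name `ci_linear_combo` and its signature are illustrative, while the numbers come from equation (24).

```python
import math

def ci_linear_combo(b0, b1, se0, se1, cov01, x=1.0, z=1.96):
    """Confidence interval for b0 + x * b1.

    The variance is se0^2 + x^2 * se1^2 + 2 * x * cov01.
    """
    point = b0 + x * b1
    se = math.sqrt(se0 ** 2 + x ** 2 * se1 ** 2 + 2.0 * x * cov01)
    return point - z * se, point + z * se

# Estimates, standard errors and covariance from equation (24), evaluated at S = 1.
low, high = ci_linear_combo(b0=0.10, b1=-0.05, se0=0.01, se1=0.015, cov01=-0.0001)
print(round(low, 4), round(high, 4))
```

Because the covariance between intercept and slope is negative, the interval is narrower than it would be if the two estimates were treated as uncorrelated.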


Suppose again that the instrument is valid, but that the population you are studying is "heterogeneous", that is

$$M_i = \beta_{i0} + \beta_{i1} S_i + u_i$$

$$S_i = \pi_{i0} + \pi_{i1} F_i + v_i$$

so that now the regression coefficients are individual-specific. You can also assume that $\sigma_{Fv} = 0$, and that the individual-specific coefficients are independent of $S_i$ and $F_i$. You would like to estimate the Average Treatment Effect, that is, $E(\beta_{i1})$.

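(l) (5 points) Let $\hat{\beta}_1^{2SLS}$ be the 2SLS estimator that you obtain if you use $F_i$ as an instrument, and if you ignore the existence of heterogeneity. Prove that

$$plim \, \hat{\beta}_1^{2SLS} = E(\beta_{i1}) + \frac{cov(\beta_{i1}, \pi_{i1})}{E(\pi_{i1})}.$$

From the chichi, we know that

$$plim \, \hat{\beta}_1^{2SLS} = \frac{E(\beta_{i1} \pi_{i1})}{E(\pi_{i1})}$$

so that using the definition of covariance, $E(\beta_{i1} \pi_{i1}) = cov(\beta_{i1}, \pi_{i1}) + E(\beta_{i1}) E(\pi_{i1})$, we get

$$plim \, \hat{\beta}_1^{2SLS} = \frac{cov(\beta_{i1}, \pi_{i1}) + E(\beta_{i1}) E(\pi_{i1})}{E(\pi_{i1})} = E(\beta_{i1}) + \frac{cov(\beta_{i1}, \pi_{i1})}{E(\pi_{i1})}$$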

(m) (7 points) Suppose that the poorest and least literate individuals, on average, do not sleep regularly under the nets, even when they receive them for free (for instance, because they do not believe that malaria is transmitted by mosquitoes). However, suppose also that these same individuals are those that would benefit more from sleeping regularly under a net (for instance, because they live in poorly constructed huts that mosquitoes can easily enter at night). If so, will $\hat{\beta}_1^{2SLS}$ consistently estimate the ATE? Justify your answer, and if you find that the answer is no, explain whether 2SLS will over- or underestimate the true benefits from regularly using ITNs.

The question clearly indicates that more negative (that is, smaller) values of $\beta_{i1}$ correspond, on average, to smaller values of $\pi_{i1}$ (which should be expected to be positive, because offering a net for free should, if anything, increase the number of nights you sleep under one of them). In other words, the covariance will be positive. Hence, we should expect

$$plim \, \hat{\beta}_1^{2SLS} = E(\beta_{i1}) + \frac{\overbrace{cov(\beta_{i1}, \pi_{i1})}^{> 0}}{\underbrace{E(\pi_{i1})}_{> 0}} > E(\beta_{i1})$$

Hence, the 2SLS estimator will UNDERestimate the benefit of ITNs, because the result will be a negative number which is not negative enough!
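A small numerical example may make the direction of the bias concrete. The two-type population below is purely hypothetical: one type benefits a lot from nets but rarely uses a free one, the other benefits less but uses it often. The formula applied is the probability limit from part (l).

```python
# Hypothetical two-type population, each type with probability 1/2.
# beta: effect of net use on malaria risk; pi: effect of a free net on net use.
types = [
    {"prob": 0.5, "beta": -0.10, "pi": 0.2},   # large benefit, low take-up
    {"prob": 0.5, "beta": -0.02, "pi": 0.8},   # small benefit, high take-up
]

def expect(f):
    """Population expectation of f over the two types."""
    return sum(t["prob"] * f(t) for t in types)

ate = expect(lambda t: t["beta"])                                  # E(beta_i1)
e_pi = expect(lambda t: t["pi"])                                   # E(pi_i1)
cov_beta_pi = expect(lambda t: t["beta"] * t["pi"]) - ate * e_pi   # cov(beta_i1, pi_i1)
plim_2sls = ate + cov_beta_pi / e_pi

print(f"ATE = {ate:.3f}, plim 2SLS = {plim_2sls:.3f}")
```

In this example the ATE is -0.06 while 2SLS converges to -0.036: the estimate is negative but not negative enough, because 2SLS gives more weight to the individuals whose behavior responds most to the free net.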


Now you estimate the following model, where $C_i$ is a binary variable equal to one if individual i is a child:

$$M_i = \beta_0 + \beta_1 S_i + \beta_2 C_i + \beta_3 (S_i \times C_i) + w_i \qquad (25)$$

(n) (5 points) Suppose you want to test the null hypothesis that sleeping under an ITN affects the risk of contracting malaria in the same way for children as for adults. Describe the null and the alternative hypothesis in terms of the coefficients.

$$H_0: \beta_3 = 0$$

$$H_A: \beta_3 \neq 0$$

(o) (4 points) You carry out an F-test for the null described in the previous step, and the value of the test is 6.5. What do you conclude?

The threshold for a 1% test and one degree of freedom is 6.63, and for a 5% level it is 3.84, so you reject at the 5% level but not at the 1% level.

(p) (5 points) Based again on model (25), how would you test the null hypothesis that ITNs are not useful in protecting adults from malaria?

For adults $C_i = 0$, so the effect of $S_i$ is $\beta_1$, and you would use a t-test of

$$H_0: \beta_1 = 0$$

$$H_A: \beta_1 \neq 0$$


3. Suppose now that you have the two binary variables $I_i$ and $N_i$, where $I_i = 1$ if individual i sleeps regularly under an ITN, while $N_i = 1$ if the individual sleeps regularly under an untreated net (that is, a net not treated with insecticide). Let $p_I = \Pr(I_i = 1)$ and $p_N = \Pr(N_i = 1)$. Suppose that you want to estimate $p_I$ and $p_N$ using Maximum Likelihood, and suppose that you have an iid sample of n observations from a malaria-prone area.

(a) (6 points) Write down the log-likelihood of your sample.

There are three possible outcomes (ITN, untreated net, or no net at all). So, the likelihood of one observation is

$$P(I_i, N_i) = p_I^{I_i} \, p_N^{N_i} \, (1 - p_N - p_I)^{1 - I_i - N_i}$$

so the log-likelihood for the whole sample will be (using, as with a Bernoulli, the properties of exponentials and then taking logs)

$$\ln L(p_I, p_N) = \ln p_I \sum_{i=1}^n I_i + \ln p_N \sum_{i=1}^n N_i + \ln (1 - p_I - p_N) \left( n - \sum_{i=1}^n I_i - \sum_{i=1}^n N_i \right)$$
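The log-likelihood above depends on the data only through the two counts and the sample size, which makes it easy to evaluate. The sketch below uses hypothetical counts (they are not from the survey) and relies on the standard result that the maximum likelihood estimates of multinomial probabilities are the sample proportions.

```python
import math

def log_likelihood(p_i, p_n, sum_i, sum_n, n):
    """Log-likelihood of the three-outcome model (ITN, untreated net, no net)."""
    return (sum_i * math.log(p_i)
            + sum_n * math.log(p_n)
            + (n - sum_i - sum_n) * math.log(1.0 - p_i - p_n))

# Hypothetical sample counts, for illustration only.
n, sum_i, sum_n = 1000, 30, 140

# Maximum likelihood estimates: the sample proportions.
p_i_hat, p_n_hat = sum_i / n, sum_n / n
ll_max = log_likelihood(p_i_hat, p_n_hat, sum_i, sum_n, n)

# Any other parameter pair gives a lower (or equal) log-likelihood.
ll_other = log_likelihood(0.02, 0.15, sum_i, sum_n, n)
print(ll_max, ll_other)
```

Evaluating the function at a restricted parameter pair, as in the last lines, is exactly what is needed for a likelihood ratio comparison between an unrestricted and a restricted model.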

(b) (5 points) You estimate the two parameters above using MLE, and the resulting log-likelihood is equal to $-12450$. The log-likelihood evaluated at $p_I = 0.02$ and $p_N = 0.15$ is equal to $-12455$. Can you reject the null hypothesis that $p_I = 0.02$ and $p_N = 0.15$?

$$2 \left[ \ln L_U - \ln L_R \right] = 2 \left[ -12450 - (-12455) \right] = 2 [5] = 10$$

so you certainly reject at any standard significance level (compare with a chi-square distribution with two degrees of freedom; the threshold for a 1% level is 9.21).


Suppose now that you would like to estimate the fraction of individuals who do not sleep regularly under any net. Let $p_{NO}$ denote the true value of such fraction. Let

$$\hat{p}_{NO} = \frac{1}{n} \sum_{i=1}^n (1 - I_i - N_i).$$

(c) (6 points) Prove that $plim_{n \to \infty} (\hat{p}_{NO}) = p_{NO}$.

This is just the usual LLN with iid observations. The mean converges in probability to the expected value, so

$$plim_{n \to \infty} (\hat{p}_{NO}) = E(1 - I_i - N_i) = P(I_i = 0, N_i = 0) = p_{NO}$$

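(d) (6 points) Prove that

$$\sqrt{n} (\hat{p}_{NO} - p_{NO}) \xrightarrow{d} N(0, p_{NO} (1 - p_{NO}))$$

This is very simple once you realize that, by the CLT,

$$\frac{\bar{X} - E(X_i)}{\sqrt{var(X_i) / n}} \xrightarrow{d} N(0, 1)$$

or

$$\sqrt{n} \left( \bar{X} - E(X_i) \right) \xrightarrow{d} N(0, var(X_i))$$

In this case, $\hat{p}_{NO}$ is a sample mean of iid variables equal to $(1 - I_i - N_i)$, and $p_{NO}$ is the corresponding expected value, so

$$\sqrt{n} (\hat{p}_{NO} - p_{NO}) \xrightarrow{d} N(0, var(1 - I_i - N_i))$$

where, because $(1 - I_i - N_i)$ is a binary variable,

$$var(1 - I_i - N_i) = E(1 - I_i - N_i) \left[ 1 - E(1 - I_i - N_i) \right] = p_{NO} (1 - p_{NO})$$

by the definition of $p_{NO}$.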


11.5 Midterm 1, Fall 2008

This problem is based on actual data collected from a group of villages in rural areas of the Indian state of Orissa. A group of researchers is studying new mechanisms to increase the proportion of households who regularly use insecticide-treated bednets (ITNs). The public health literature has amply demonstrated that the use of ITNs sharply reduces the prevalence of malaria, a debilitating and potentially fatal disease spread by mosquitoes. Many poor households in rural Orissa do not own ITNs. The researchers have teamed up with a micro-finance institution to evaluate if poor households can be induced to purchase ITNs on credit rather than with cash. A household survey has been completed on a random sample of respondents within the villages where the ITNs have been sold on credit.

Let $N_i$ denote a dummy (binary) variable equal to one when the household decides to purchase at least one net. Let $S_i$ denote a dummy variable equal to one if at least one person in the household was sick with malaria before the survey. The following table displays the joint distribution of $N_i$ and $S_i$ in the sample.

            N_i = 0    N_i = 1
S_i = 0      0.29       0.19
S_i = 1      0.25       0.27

1. (3 points) Calculate $E(S_i)$ using the figures shown in the distribution.

$$E(S_i) = P(S_i = 1) = 0.25 + 0.27 = 0.52$$

2. (4 points) Calculate $E(N_i \mid S_i = 0)$ and $E(N_i \mid S_i = 1)$.

$$E(N_i \mid S_i) = P(N_i = 1 \mid S_i) = \frac{P(N_i = 1, S_i)}{P(S_i)},$$

so

$$E(N_i \mid S_i = 1) = \frac{0.27}{0.52} = 0.51923$$

$$E(N_i \mid S_i = 0) = \frac{0.19}{0.29 + 0.19} = 0.39583$$

3. (3 points) Are $S_i$ and $N_i$ statistically independent? Justify your answer.

The conditional expectation of $N_i$ depends on $S_i$, so the two random variables cannot be independent. This makes sense: we do expect purchase to be more likely if someone in the household had recent episodes of malaria.

4. (5 points) Calculate $Cov(S_i, N_i)$.

We know that $Cov(S_i, N_i) = E(S_i N_i) - E(S_i) E(N_i)$. We also know that both $S_i$ and $N_i$ are binary, so $E(S_i) = P(S_i = 1)$, $E(N_i) = P(N_i = 1)$ and $E(S_i N_i) = P(S_i = 1, N_i = 1)$. So

$$Cov(S_i, N_i) = 0.27 - 0.52 \times (0.19 + 0.27) = 0.0308.$$

As expected, the two variables are positively correlated.

5. (4 points) Suppose now that you estimate an OLS regression of $N_i$ on $S_i$ using the same data used to produce the joint distribution on the previous page. What would the slope of the regression be equal to? Justify your answer.

We know that when we regress a variable on a binary (dummy) variable, the "slope" $\hat{\beta}_1$ measures the "impact" on the dependent variable of changing the binary variable from zero to one. So the estimated slope will be $E(N_i \mid S_i = 1) - E(N_i \mid S_i = 0) = 0.51923 - 0.39583 = 0.1234$. The same could be calculated using the OLS formula

$$\hat{\beta}_1 = \frac{\widehat{cov}(S_i, N_i)}{\widehat{var}(S_i)}.$$

Because $S_i$ is binary we know that $\widehat{var}(S_i) = \hat{P}(S_i = 1) \left[ 1 - \hat{P}(S_i = 1) \right]$, so that

$$\hat{\beta}_1 = \frac{0.0308}{0.52 (1 - 0.52)} = 0.1234$$
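The calculations in questions 1 to 5 all follow from the four cells of the joint distribution, so they can be reproduced in a few lines. The sketch below only restates the formulas used in the answers; the variable names are illustrative.

```python
# Joint distribution from the table above: keys are (S_i, N_i).
joint = {(0, 0): 0.29, (0, 1): 0.19, (1, 0): 0.25, (1, 1): 0.27}

p_s = joint[(1, 0)] + joint[(1, 1)]        # E(S) = P(S = 1)
p_n = joint[(0, 1)] + joint[(1, 1)]        # E(N) = P(N = 1)
e_n_s1 = joint[(1, 1)] / p_s               # E(N | S = 1)
e_n_s0 = joint[(0, 1)] / (1.0 - p_s)       # E(N | S = 0)
cov_sn = joint[(1, 1)] - p_s * p_n         # Cov(S, N) = E(SN) - E(S)E(N)

slope_from_means = e_n_s1 - e_n_s0         # difference in conditional means
slope_from_ols = cov_sn / (p_s * (1.0 - p_s))   # Cov(S, N) / Var(S)

print(round(p_s, 2), round(e_n_s1, 5), round(e_n_s0, 5))
print(round(cov_sn, 4), round(slope_from_means, 4), round(slope_from_ols, 4))
```

The two slope calculations agree, which is the binary-regressor result used in question 5.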

Now you estimate the regression model mentioned in the previous question using OLS, and the result is the following (heteroskedasticity-robust standard errors in parentheses):

$$\hat{N}_i = \underset{(0.028)}{0.39} + \underset{(0.040)}{0.12} \, S_i. \qquad (26)$$

You have also estimated that the covariance between the intercept and the slope is equal to $-0.0008$.

6. (3 points) Can you reject the null hypothesis that recent malaria episodes have no impact on the probability of purchasing ITNs, using a 1% level?

The test statistic is

$$\frac{0.12}{0.040} = 3 > 2.58$$

so you should reject the null at the 1% level.

7. (4 points) From a substantive point of view, do you think the results in model (26) show that past malaria episodes are an important predictor of ITN purchase? Is this what you expected, and why?

Yes, they do. Past malaria episodes increase the probability of purchase by 12 percentage points in our sample. This is a large increase. Note also that 0.12 represents a 30 percent increase (0.12/0.39) in the probability of purchasing relative to the group where no one had malaria before the survey.

8. (4 points) Do you think the results in model (26) can be interpreted in a causal way? Why?

While it makes sense to expect that past malaria episodes will cause individuals to be more willing to purchase ITNs, which protect from malaria, it is hard to interpret the result in a causal sense. There are many other factors which could bias this relationship. For instance, poor households may be more at risk for malaria but they may also be less likely to purchase the nets. If so, a high $S_i$ could also proxy for poverty, which would be associated with lower purchases. This would end up biasing the OLS estimates downwards. Or $S_i$ could proxy for the frequency of malaria in a given area (regardless of actual incidence within a household). This would bias the OLS estimates upwards.

9. (5 points) Calculate a 95% confidence interval for the probability that a household purchases ITNs when there was at least one malaria episode in the six months before the purchase.

What you want is a confidence interval for $\beta_0 + \beta_1$, which measures the probability of a purchase among those with at least one malaria episode. So

$$\begin{aligned}
CI &= \left( \hat{\beta}_0 + \hat{\beta}_1 \right) \pm 1.96 \sqrt{\hat{\sigma}^2_{\hat{\beta}_0} + \hat{\sigma}^2_{\hat{\beta}_1} + 2 \hat{\sigma}_{\hat{\beta}_0, \hat{\beta}_1}} \\
&= 0.39 + 0.12 \pm 1.96 \sqrt{0.028^2 + 0.04^2 + 2 (-0.0008)} \\
&= 0.51 \pm 0.05488 = [0.45512, 0.56488]
\end{aligned}$$

Your data have been collected from a list of villages in different parts of the state of Orissa. You know that in Kandhamal district malaria prevalence is very high, while prevalence is relatively low in Sambalpur district. You then re-estimate the model in (26) using district-specific samples. Keep in mind that the two samples are statistically independent. The results for the two regressions are the following:

$$\text{Kandhamal:} \quad \hat{N}_i = \underset{(0.137)}{0.86} + \underset{(0.137)}{0.14} \, S_i$$

$$\text{Sambalpur:} \quad \hat{N}_i = \underset{(0.065)}{0.33} + \underset{(0.096)}{0.087} \, S_i$$

10. (6 points) Test the null hypothesis that the probability of purchasing at least one net among households where no one has been sick with malaria in the six months before the survey in Kandhamal is twice as large as in Sambalpur.

In this case the hypotheses are

$$H_0: \beta_0^{Kandhamal} - 2 \times \beta_0^{Sambalpur} = 0$$

$$H_1: \beta_0^{Kandhamal} - 2 \times \beta_0^{Sambalpur} \neq 0.$$

So the first step to proceed with the test is the calculation of the correct standard error to use in the denominator. Using the fact that the two samples are independent we have

$$se \left( \hat{\beta}_0^{Kandhamal} - 2 \hat{\beta}_0^{Sambalpur} \right) = \sqrt{0.137^2 + 4 \times 0.065^2} = 0.18886.$$

So the test statistic is

$$\frac{0.86 - 2 (0.33)}{0.18886} = 1.059$$

which is below 1.96, so you cannot reject the null at the 5% level.
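The test in question 10 combines estimates from two independent samples, so the variance of the linear combination has no covariance term. A minimal sketch of the calculation, with illustrative variable names and the numbers reported in the two district regressions:

```python
import math

b_k, se_k = 0.86, 0.137    # Kandhamal intercept and its standard error
b_s, se_s = 0.33, 0.065    # Sambalpur intercept and its standard error

# Var(b_k - 2 * b_s) = Var(b_k) + 4 * Var(b_s), since the samples are independent.
se_combo = math.sqrt(se_k ** 2 + 4.0 * se_s ** 2)
t_stat = (b_k - 2.0 * b_s) / se_combo
reject_5pct = abs(t_stat) > 1.96

print(round(se_combo, 5), round(t_stat, 3), reject_5pct)
```

If the two regressions had been estimated on overlapping samples, a covariance term would have to be added inside the square root.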

11. (3 points) Calculate the p-value for a test of statistical significance of $S_i$ in the regression estimated using data from Sambalpur.

The p-value is

$$p = 2 \times \Phi \left( - \left| \frac{0.087}{0.096} \right| \right) = 2 \times \Phi \left( - \frac{0.087}{0.096} \right) = 0.3648$$
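The p-value can be computed without a normal table by using the identity 2Φ(-|z|) = erfc(|z|/√2), where erfc is the complementary error function available in Python's standard library. The function name `two_sided_p` is an illustrative choice.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, 2 * Phi(-|z|), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# t-ratio for the slope in the Sambalpur regression.
p_value = two_sided_p(0.087 / 0.096)
print(round(p_value, 4))
```

The same function applied to a t-ratio of 1.96 returns a value very close to 0.05, which is a quick sanity check on the identity.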

Now you would like to find out if in your study area it is true that the regular use of bednets reduces the probability of falling sick with malaria. You have found some useful data collected before your sale-on-credit program. Your data include $P_i$, the number of nights individual i slept protected by an ITN last year. You also observe $S_i$, a dummy variable equal to one if the individual had malaria last year. Suppose now that the following model holds:

$$S_i = \beta_0 + \beta_1 P_i + \beta_2 M_i + u_i, \qquad (27)$$

where $M_i$ is an indicator of the density of malaria-transmitting mosquitoes in the area where individual i lives (higher values of $M_i$ indicate that there are more mosquitoes in the area). You also know that the error is uncorrelated with both $P_i$ and $M_i$.


12. (4 points) What signs do you expect $\beta_1$ and $\beta_2$ to have? Justify your answer.

You should expect $\beta_1 < 0$ (lower probability of falling sick with malaria if you sleep regularly under a net, keeping everything else constant) and $\beta_2 > 0$ (higher probability of falling sick with malaria if you live in a place with a lot of malaria-transmitting mosquitoes).

Suppose that your dataset only includes $P_i$ and $S_i$. You do not observe $M_i$, so instead of estimating model (27) you estimate a regression of $S_i$ on $P_i$. The usual OLS estimator is therefore:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (P_i - \bar{P}) S_i}{\sum_{i=1}^n (P_i - \bar{P})^2}. \qquad (28)$$

Now you want to study if this estimator is consistent for the true value $\beta_1$ despite the fact that you are ignoring that $M_i$ belongs in the regression too. In what follows, also let $\sigma_P^2 = var(P_i)$ and $\sigma_{PM} = cov(P_i, M_i)$.

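13. (6 points) Prove that

$$\hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sum_{i=1}^n (P_i - \bar{P}) M_i}{\sum_{i=1}^n (P_i - \bar{P})^2} + \frac{\sum_{i=1}^n (P_i - \bar{P}) u_i}{\sum_{i=1}^n (P_i - \bar{P})^2} \qquad (29)$$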

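Substituting the right-hand side of (27) into (28) we have

$$\begin{aligned}
\hat{\beta}_1 &= \frac{\sum_{i=1}^n (P_i - \bar{P}) (\beta_0 + \beta_1 P_i + \beta_2 M_i + u_i)}{\sum_{i=1}^n (P_i - \bar{P})^2} \\
&= \beta_0 \frac{\overbrace{\sum_{i=1}^n (P_i - \bar{P})}^{= 0}}{\sum_{i=1}^n (P_i - \bar{P})^2} + \beta_1 \underbrace{\left[ \frac{\sum_{i=1}^n (P_i - \bar{P}) P_i}{\sum_{i=1}^n (P_i - \bar{P})^2} \right]}_{= 1} + \beta_2 \frac{\sum_{i=1}^n (P_i - \bar{P}) M_i}{\sum_{i=1}^n (P_i - \bar{P})^2} + \frac{\sum_{i=1}^n (P_i - \bar{P}) u_i}{\sum_{i=1}^n (P_i - \bar{P})^2} \\
&= \beta_1 + \beta_2 \frac{\sum_{i=1}^n (P_i - \bar{P}) M_i}{\sum_{i=1}^n (P_i - \bar{P})^2} + \frac{\sum_{i=1}^n (P_i - \bar{P}) u_i}{\sum_{i=1}^n (P_i - \bar{P})^2}
\end{aligned}$$

14. (3 points) What is the probability limit of $\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) M_i$? Justify your steps.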

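We have iid observations, so we can use the usual LLN and conclude that

$$\begin{aligned}
plim \, \frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) M_i &= plim \, \frac{1}{n} \sum_{i=1}^n (P_i - E(P_i)) M_i \\
&= E \left[ (P_i - E(P_i)) M_i \right] = E \left[ (P_i - E(P_i)) (M_i - E(M_i)) \right] \\
&= cov(P_i, M_i) = \sigma_{PM}
\end{aligned}$$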

15. (4 points) What is the probability limit of $\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) u_i$? Justify your steps.

Like in the previous step, we can use the usual LLN to argue that

$$\begin{aligned}
plim \, \frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) u_i &= plim \, \frac{1}{n} \sum_{i=1}^n (P_i - E(P_i)) u_i \\
&= E \left[ (P_i - E(P_i)) u_i \right] = E \left[ (P_i - E(P_i)) (u_i - E(u_i)) \right] \\
&= cov(P_i, u_i) = 0
\end{aligned}$$

because the error term is uncorrelated with $P_i$ by assumption.

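16. (5 points) Prove that

$$plim \, \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2} \qquad (30)$$

We have to use again the usual LLN for the denominators, then use the results from the last two points, and finally use the properties of probability limits (so that the plim of a ratio is the ratio of the plims). Then:

$$\hat{\beta}_1 = \beta_1 + \beta_2 \frac{\overbrace{\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) M_i}^{\xrightarrow{p} \sigma_{PM}}}{\underbrace{\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P})^2}_{\xrightarrow{p} \sigma_P^2}} + \frac{\overbrace{\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P}) u_i}^{\xrightarrow{p} 0}}{\underbrace{\frac{1}{n} \sum_{i=1}^n (P_i - \bar{P})^2}_{\xrightarrow{p} \sigma_P^2}}$$

so

$$plim \, \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2}$$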

17. (4 points) Based on the result in (30), and given that $\beta_2 \neq 0$, we have shown that $\hat{\beta}_1$ is a consistent estimator for $\beta_1$ only if $\sigma_{PM} = 0$. Would you expect $\sigma_{PM}$ to be equal to, less than or greater than zero, and why?

Certainly not equal to zero. $\sigma_{PM}$ is the covariance between a variable which indicates how regularly an individual sleeps under a net and the density of malaria-transmitting mosquitoes in the place where s/he lives. We expect this covariance to be positive. Keeping everything else constant, we would expect individuals to be more likely to regularly use a bednet if they know there is a lot of malaria around (note that this model is clearly too simple. In practice it is likely that areas with more malaria are also poorer, for instance, but in our model we are not taking income into account. However, here we are dealing with a simplified case).

18. (5 points) Given your response to the previous question, if the true model is (27) but you estimate $\beta_1$ using the "wrong" estimator in equation (28) (that is, if you use $\hat{\beta}_1 = \sum_{i=1}^n (P_i - \bar{P}) S_i / \sum_{i=1}^n (P_i - \bar{P})^2$), will you end up overestimating or underestimating the actual benefit of sleeping under a bednet? Justify your argument both algebraically and intuitively.

We have argued that $\sigma_{PM}$ is likely positive, and $\beta_2$ is likely positive as well. So

$$plim \, \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2} > \beta_1$$

and hence if we estimate the "wrong" model we end up overestimating $\beta_1$. Because $\beta_1 < 0$ (the coefficient measures how sleeping under a net reduces the probability of getting malaria), this means that we are underestimating the benefits from sleeping under a bednet. Intuitively, if we do not "control" for how much malaria there is in a given area, we may end up finding that ITNs do not protect from malaria because when in our regression we look at people who use ITNs we are looking at people who live in areas with more malaria. If we want to isolate the causal impact of using nets on the probability of falling sick with malaria we need to keep other confounding factors constant.
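The decomposition in (29) and the probability limit in (30) can be illustrated with a small simulation. Everything about the data-generating process below is hypothetical (the coefficient values, the normal distributions, and treating S as a continuous index rather than a dummy, purely to keep the sketch short); only the estimator (28) and the decomposition (29) come from the text.

```python
import random

random.seed(1)
n = 5000
beta0, beta1, beta2 = 0.30, -0.002, 0.10     # hypothetical true coefficients

M = [random.gauss(0.0, 1.0) for _ in range(n)]                # mosquito density
P = [100.0 + 20.0 * m + random.gauss(0.0, 30.0) for m in M]   # nights under a net
U = [random.gauss(0.0, 0.1) for _ in range(n)]                # error term
S = [beta0 + beta1 * p + beta2 * m + u for p, m, u in zip(P, M, U)]

p_bar = sum(P) / n
sxx = sum((p - p_bar) ** 2 for p in P)

def ratio(zs):
    """sum (P_i - P_bar) z_i divided by sum (P_i - P_bar)^2."""
    return sum((p - p_bar) * z for p, z in zip(P, zs)) / sxx

beta1_short = ratio(S)                                    # estimator (28)
decomposition = beta1 + beta2 * ratio(M) + ratio(U)       # right-hand side of (29)

# Population bias term beta2 * sigma_PM / sigma_P^2 implied by the design above:
# cov(P, M) = 20 and var(P) = 20^2 + 30^2.
plim_bias = beta2 * 20.0 / (20.0 ** 2 + 30.0 ** 2)

print(beta1_short, decomposition, beta1 + plim_bias)
```

The first two numbers coincide up to rounding, because (29) is an algebraic identity in any sample, and both are close to the third, which is the probability limit in (30). Since the bias term is positive, the short regression understates how much nets reduce malaria.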


11.6 Midterm 2, Fall 2008

You want to study the relationship between hemoglobin level (Hb) and malaria in a sample of 2,651 individuals from rural Orissa (India). Hemoglobin is a protein that has the essential task of transporting oxygen in red blood cells. Low Hb values are associated with anemia, which if severe can have very serious health consequences. Given that the malaria parasite destroys red blood cells, malaria is usually associated with anemia. Table 1 reports the results of several OLS regressions. Each column represents a different model. The second row (denoted A) indicates the dependent variable. Malaria is a binary variable equal to 1 if the individual tests positive for malaria. Male is a dummy equal to one if the individual is a male, while Age indicates the individual's age in years. The second to last row (denoted B) indicates if the standard errors are heteroskedasticity-robust. All standard errors are indicated in brackets below the corresponding coefficient. In what follows let $\beta_X$ denote the regression coefficient for variable X, and let "hats" denote as usual estimates. So, for instance, in Column (1), $\hat{\beta}_{Malaria} = -0.216$.

1. (3 points) Using the results in Column (2), what is the interpretation of $\hat{\beta}_{malaria}$?

Individuals with malaria are predicted to have a hemoglobin level 0.216 units lower than individuals who are not sick with malaria.

2. (3 points) Now compare the results in Columns (1) and (2). Does the comparison suggest that the regression residual is heteroskedastic? Explain.

Not really. The standard errors are almost identical, so in this regression heteroskedasticity is unlikely to be an important issue.

3. (3 points) Do you think the results in model (2) can be interpreted in a causal way? Justify your answer.

Certainly not. We are not controlling for a lot of factors which are likely to affect hemoglobin levels and which are also likely to be correlated with the included regressor, that is, with Malaria. The answer to the following question represents one example of such a factor.


4. (4 points) We know that anemia is a common health problem among the poor, both because of poor nutrition and because of infections. Suppose that poor individuals are also more likely to have malaria (for instance, because they are more likely to live in malarious areas, or they lack the means to protect themselves from mosquitoes). Under these conditions, how would the inclusion of income as a regressor in model (2) affect $\hat{\beta}_{malaria}$? Explain.

According to this argument, if we added income to the regression its coefficient would be positive, while $cov(income, malaria) < 0$. Hence, the exclusion of income from the regression would lead to downward bias, because the sign of the omitted variable bias (OVB) would be negative. Hence, the inclusion of income in the regression should be expected to lead to a higher value of $\hat{\beta}_{malaria}$, that is, to a value closer to zero (but most likely still negative).

5. (5 points) Now compare the results in Columns (2) and (3). Would you conclude that in this sample malesare more or less likely to have malaria than females? Explain.

Here we have once again to use the formula for the OVB. Here we see that the exclusion of Male from theregression leads to downward bias in �malaria (because the coe¢ cient for Malaria increases when Male isincluded). Because �male > 0; the only way the exclusion of Male can lead to a downward bias in �malaria isif ccov (malaria;male) < 0: Both Malaria and Male are binary variables, so this means that Malaria is morelikely to be equal to one when Male is equal to zero. In other words, females are more likely to have malariathan males in this sample.

6. (3 points) What is the interpretation of �malaria in the model estimated in Column (4)?

In this model the dependent variable is in logarithms. So the result indicates that, conditional on gender, beingsick with malaria predicts a 1.3% decline in hemoglobin levels.


7. (3 points) In the model estimated in Column (5), can you reject the null hypothesis that the regression is linear in age, using a 1% significance level?

The t-ratio is

$$t = \frac{-0.002}{0.0001} = -20,$$

so we certainly reject, even at the 1% level.

8. (4 points) What is the interpretation of $\hat\beta_{Male \times \log(age)}$ in the regression estimated in Column (6)?

The coefficient indicates that a 1% increase in age, keeping all other regressors constant, increases the predicted value of Hb among males by $0.01 \times 0.635 = 0.00635$ more than among females. In other words, older individuals appear to have higher hemoglobin levels (keeping everything else constant) than younger ones, but this is more so among males than among females.

9. (5 points) You want to test the null hypothesis that the regression in Column (6) is the same for males and females. The result of the test is F = 161.8. Write down the null and the alternative hypothesis and determine whether you can reject the null hypothesis.

$$H_0: \beta_{Malaria \times Male} = \beta_{Male} = \beta_{Male \times \log(age)} = 0$$

$$H_A: \text{at least one of the above coefficients} \neq 0$$

This is a joint hypothesis with 3 degrees of freedom. The critical value of an $F_{3,\infty}$ test using a 1% level is 3.78, so we certainly reject the null at any standard level of significance.

10. (5 points) Using again the results in model (6), calculate a 95% confidence interval for the predicted change in Hb associated with having malaria, for a male. The estimated covariance between $\hat\beta_{Malaria \times Male}$ and $\hat\beta_{Malaria}$ is equal to -0.012. You do not need to complete the final calculation to get full credit.

Here the confidence interval should be calculated as

$$\hat\beta_{Malaria \times Male} + \hat\beta_{Malaria} \pm 1.96 \sqrt{\widehat{var}\left(\hat\beta_{Malaria \times Male} + \hat\beta_{Malaria}\right)},$$

that is,

$$CI = (-0.186 + 0.038) \pm 1.96 \sqrt{0.108^2 + 0.235^2 + 2(-0.012)}.$$
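The final calculation, which was not required for full credit, can be sketched numerically. The snippet below is only an illustration: the coefficient values are those read from Column (6) of Table 1 and from the problem statement, while the function and variable names are hypothetical.

```python
import math

# Values read from Column (6) of Table 1 and from the problem statement.
b_malaria, se_malaria = -0.186, 0.108
b_interact, se_interact = 0.038, 0.235
cov_terms = -0.012  # estimated covariance between the two coefficients


def ci_for_sum(b1, se1, b2, se2, cov12, z=1.96):
    """Return (estimate, lower, upper) for b1 + b2.

    Uses var(a + b) = var(a) + var(b) + 2 cov(a, b).
    """
    estimate = b1 + b2
    se = math.sqrt(se1 ** 2 + se2 ** 2 + 2 * cov12)
    return estimate, estimate - z * se, estimate + z * se


estimate, lower, upper = ci_for_sum(
    b_malaria, se_malaria, b_interact, se_interact, cov_terms
)
print(f"point estimate {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Under these inputs the interval includes zero, so the predicted change for males is not statistically different from zero at the 5% level.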


In all the models above, the standard errors have been estimated assuming that observations are iid. However, suppose that in your sample there are several cases where blood tests have been completed for more than one person within the same family. Suppose that the model is the following (for this question it does not matter what the regressor is):

$$y_{fi} = \beta_0 + \beta_1 X_{fi} + u_{fi}, \qquad (31)$$

where $u_{fi} = \eta_f + \varepsilon_{fi}$.

So, in this model the error term $u_{fi}$ is the sum of two components: $\eta_f$, which is an error term common to everyone in the same family, but uncorrelated across different families; and $\varepsilon_{fi}$, which is the "usual" iid error term component. You can assume that the iid errors $\varepsilon_{fi}$ are also uncorrelated with all the $\eta_f$.

11. (3 points) Calculate $Cov(u_{fi}, u_{gj})$, where $fi$ and $gj$ denote two individuals that belong to two different families.

$$Cov(u_{fi}, u_{gj}) = cov(\eta_f + \varepsilon_{fi}, \eta_g + \varepsilon_{gj}) = 0$$

because all elements are uncorrelated with each other by assumption.

12. (4 points) Calculate $Cov(u_{fi}, u_{fj})$, where $fi$ and $fj$ denote two individuals that belong to the same family.

$$Cov(u_{fi}, u_{fj}) = cov(\eta_f + \varepsilon_{fi}, \eta_f + \varepsilon_{fj}) = cov(\eta_f, \eta_f) = var(\eta_f) > 0.$$

All other elements are uncorrelated by assumption.

13. (3 points) In model (31), do you think the assumption that observations in your sample are iid is correct? Explain.

Definitely not. We have just shown that there is correlation between the error terms of observations from the same family, so the sample is not an iid sample. Intuitively, information about someone from a given family $f$ should be informative about someone else in our sample who belongs to the same family.


Now let $A_i$ denote a binary variable equal to one if individual $i$ is anemic, that is, if the individual's hemoglobin level is low. Table 2 reports the results of different regressions, estimated using OLS (the "Linear Probability Model", LPM, in Column 1), logit (in Column 2) or probit (Columns 3 and 4).

14. (6 points) You need an estimate of the predicted increase in the probability of being anemic associated with having malaria, for an individual for whom log(income per head) is equal to the sample mean (the mean is equal to 6.35). Calculate the predicted change using the three models in columns (1), (2) and (3) of Table 2.

For the LPM, the predicted increase is simply 0.03. For the logit model, we need to calculate the change as

$$\frac{1}{1 + e^{-1.807 - 0.137 + 0.422(6.35)}} - \frac{1}{1 + e^{-1.807 + 0.422(6.35)}} = 0.029,$$

while for probit

$$\Phi(1.055 + 0.083 - 0.251(6.35)) - \Phi(1.055 - 0.251(6.35)) = 0.029.$$

Not surprisingly, the predicted impacts are almost identical across the different models, as is usually the case when the values of the regressors are chosen to be close to their means.
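The three predicted changes can be checked with a short script. This is a minimal sketch using only the coefficients reported in columns (1) to (3) of Table 2; the helper names are hypothetical, and the standard normal CDF is built from the error function rather than taken from a statistics library.

```python
import math


def logistic(z):
    """Logistic CDF used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


log_income = 6.35  # sample mean of log(income per head)

# LPM: the change in probability is the malaria coefficient itself.
lpm_change = 0.03

# Logit and probit: difference in predicted probability with and without malaria.
logit_change = (logistic(1.807 + 0.137 - 0.422 * log_income)
                - logistic(1.807 - 0.422 * log_income))
probit_change = (std_normal_cdf(1.055 + 0.083 - 0.251 * log_income)
                 - std_normal_cdf(1.055 - 0.251 * log_income))

print(f"LPM {lpm_change:.3f}, logit {logit_change:.3f}, probit {probit_change:.3f}")
```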

15. (4 points) Using the results in columns (3) and (4), test the null hypothesis that the coefficients for log(income per head)$^2$ and log(income per head)$^3$ are equal to zero, using a 5% level.

With the information at hand, the test can be performed using a likelihood ratio test. The "unrestricted model" is the one with the square and the cube, while the "restricted model" is the one without. So

$$LR = 2[\log L_U - \log L_R] = 2(-1601.31 + 1602.35) = 2.08.$$

The null hypothesis imposes two restrictions, so we have to compare this value with the critical value of a $\chi^2(2)$ distribution, which is 5.99. Because $2.08 < 5.99$ we do not reject the null.


Now you have collected data on the prevalence of anemia from a large number $V$ of villages. Let $x$ denote the number of individuals in the village who are not anemic (let us call them "healthy"). Based on preliminary analysis, you know that the true density function of $x$ is well approximated by the following distribution:

$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty, \qquad (32)$$

where, however, you do not know the value of the parameter $\lambda$.

16. (4 points) Let $x_v$ denote the number of healthy individuals in village $v$. Write down the log-likelihood function of your sample of villages, keeping in mind that you have an iid sample of villages. Explain your steps.

The density for one observation is $\lambda e^{-\lambda x_v}$, and given that observations are iid, the likelihood of the sample can be written as the product of the village-specific likelihoods. Hence

$$L(\lambda) = \prod_{v=1}^{V} \lambda e^{-\lambda x_v}$$

and the log-likelihood is

$$\ln L(\lambda) = \sum_{v=1}^{V} \ln\left(\lambda e^{-\lambda x_v}\right) = V \ln\lambda - \lambda \sum_{v=1}^{V} x_v = V(\ln\lambda - \lambda \bar{x}).$$

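17. (3 points) Prove that $\hat\lambda_{MLE}$, that is, the MLE of $\lambda$, is equal to $1/\bar{x}$, where $\bar{x}$ is the sample mean of $x_v$ across all villages in your sample.

To find the MLE we maximize the log-likelihood with respect to the parameter $\lambda$. So, we calculate the first-order condition (FOC) and solve:

$$\frac{\partial \ln L(\lambda)}{\partial \lambda} = V\left(\frac{1}{\lambda} - \bar{x}\right) = 0 \;\Rightarrow\; \hat\lambda_{MLE} = \frac{1}{\bar{x}}.$$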


18. (4 points) Calculate $\text{plim}\, \hat\lambda_{MLE}$. Justify your steps.

This is just a straightforward application of the LLN. We have iid observations so we know that

$$\text{plim}\, \bar{x} = E(x_v).$$

Then we also know that (by the properties of probability limits)

$$\text{plim}\, \hat\lambda_{MLE} = \text{plim}\, \frac{1}{\bar{x}} = \frac{1}{\text{plim}\, \bar{x}} = \frac{1}{E(x_v)}.$$

19. (4 points) What is the relationship between $\lambda$ (the true value) and the expected value of $x_v$? Justify your answer.

We have just shown that $\text{plim}\, \hat\lambda_{MLE} = \frac{1}{E(x_v)}$. We also know that $\hat\lambda_{MLE}$ is consistent (because it is a maximum likelihood estimator and we know that we are using the right likelihood). Hence, it must be that

$$\text{plim}\, \hat\lambda_{MLE} = \frac{1}{E(x_v)} = \lambda \;\Rightarrow\; E(x_v) = \frac{1}{\lambda}.$$

You could have also calculated the expected value directly from (32) above, by using the definition of expectation, that is, by solving the integral

$$E(x_v) = \int_0^\infty x \lambda e^{-\lambda x}\, dx.$$
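As a numerical sanity check of the closed-form MLE, the log-likelihood $V(\ln\lambda - \lambda\bar{x})$ can be maximized over a grid and compared with $1/\bar{x}$. The sample below is made up purely for illustration (it is not from the exam), and all names are hypothetical.

```python
import math

# Hypothetical sample of village counts, used only to illustrate the formula.
x = [2.0, 0.5, 1.5, 4.0, 2.0]
V = len(x)
x_bar = sum(x) / V


def log_likelihood(lam):
    """Exponential log-likelihood V * (ln(lam) - lam * x_bar)."""
    return V * (math.log(lam) - lam * x_bar)


# Closed-form MLE from the first-order condition.
lambda_mle = 1.0 / x_bar

# Brute-force maximization over a grid of candidate values 0.01, 0.02, ..., 3.00.
grid = [k / 100 for k in range(1, 301)]
lambda_grid = max(grid, key=log_likelihood)

print(lambda_mle, lambda_grid)
```

Because the log-likelihood is strictly concave in $\lambda$, the grid maximizer coincides with the closed-form solution whenever $1/\bar{x}$ lies on the grid, as it does here.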


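Table 1

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| (A) Dependent variable | Hb | Hb | Hb | log(Hb) | Hb | Hb |
| Regressors | | | | | | |
| Malaria | -0.216 | -0.216 | -0.145 | -0.013 | -0.189 | -0.186 |
| | [0.114] | [0.111] | [0.105] | [0.010] | [0.113] | [0.108] |
| Male | | | 1.225 | 0.106 | 1.561 | -0.23 |
| | | | [0.080] | [0.007] | [0.076] | [0.141] |
| Malaria × Male | | | | | 0.001 | 0.038 |
| | | | | | [0.236] | [0.235] |
| Age | | | | | 0.128 | |
| | | | | | [0.007] | |
| Age² | | | | | -0.002 | |
| | | | | | [0.0001] | |
| log(age) | | | | | | 0.215 |
| | | | | | | [0.033] |
| Male × log(age) | | | | | | 0.635 |
| | | | | | | [0.052] |
| Constant | 10.916 | 10.916 | 10.468 | 2.335 | 8.937 | 9.857 |
| | [0.039] | [0.040] | [0.041] | [0.004] | [0.079] | [0.097] |
| (B) Heteroskedasticity-robust st. errors | No | Yes | Yes | Yes | Yes | Yes |
| R-squared | 0.00 | 0.00 | 0.10 | 0.08 | 0.23 | 0.25 |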

Table 2

| | (1) LPM | (2) Logit | (3) Probit | (4) Probit |
|---|---|---|---|---|
| malaria | 0.03 | 0.137 | 0.083 | 0.079 |
| | [0.027] | [0.129] | [0.079] | [0.079] |
| log(income per head) | -0.086 | -0.422 | -0.251 | -1.504 |
| | [0.016] | [0.079] | [0.047] | [3.022] |
| log(income per head)² | | | | 0.13 |
| | | | | [0.449] |
| log(income per head)³ | | | | -0.003 |
| | | | | [0.022] |
| Constant | 0.841 | 1.807 | 1.055 | -0.003 |
| | [0.102] | [0.504] | [0.300] | [0.022] |
| Log-likelihood | | -1602.18 | -1602.35 | -1601.31 |


11.7 Final, Fall 2008

This problem uses data from a sample of 22,445 zero- to 3-year-old Indian children. For simplicity, you can assume that the data are iid throughout the problem, unless specified otherwise. Among other things, the dataset includes information on the weight and height of each child, as well as measures of nutritional status which evaluate the growth performance of the child relative to a reference population of healthy, well-fed children. Let $UW_i$ denote a binary variable equal to one if child $i$ is "underweight", that is, if the child's weight is very low relative to normal standards for children of the same age and gender. Let also $ST_i$ denote a binary variable equal to one if a child is "stunted", that is, if the child's height is very low relative to normal standards for children of the same age and gender. The following table shows the joint distribution of $UW_i$ and $ST_i$ in your sample.

| | $ST_i = 0$ | $ST_i = 1$ |
|---|---|---|
| $UW_i = 0$ | 0.46 | 0.11 |
| $UW_i = 1$ | 0.11 | 0.32 |

1. (4 points) Estimate the fraction of stunted children in the sample.

Solution: $E(ST_i) = P(ST_i = 1) = 0.11 + 0.32 = 0.43$

2. (4 points) Calculate an estimate of $P(UW_i = 1 \mid ST_i = 1)$.

Solution:

$$P(UW_i = 1 \mid ST_i = 1) = \frac{P(UW_i = 1, ST_i = 1)}{P(ST_i = 1)} = \frac{0.32}{0.43} = 0.74419$$

3. (5 points) Estimate the variance of $UW_i$.

Solution: This is a binary variable, so we can use the formula for the variance of a Bernoulli, with $P(UW_i = 1) = 0.11 + 0.32 = 0.43$:

$$\widehat{var}(UW_i) = P(UW_i = 1)\left[1 - P(UW_i = 1)\right] = 0.43(1 - 0.43) = 0.2451$$


4. (6 points) Estimate the covariance between $UW_i$ and $ST_i$.

Solution: we know that $cov(UW_i, ST_i) = E(UW_i \times ST_i) - E(UW_i)E(ST_i)$. Also, note that $UW_i \times ST_i$ is different from zero only if both variables are different from zero. Hence

$$\widehat{cov}(UW_i, ST_i) = \hat{E}(UW_i \times ST_i) - \hat{E}(UW_i)\hat{E}(ST_i) = P(ST_i = 1, UW_i = 1) - P(ST_i = 1)P(UW_i = 1) = 0.32 - 0.43(0.43) = 0.1351.$$

5. (6 points) Calculate a 95% confidence interval for $E(UW_i)$.

Solution: First, recall that we estimate expectations using sample means, so that

$$\hat{E}(UW_i) = \overline{UW} = \frac{1}{n}\sum_{i=1}^{n} UW_i.$$

We also know that the variance of $\overline{UW}$ is $\frac{Var(UW_i)}{n}$. Hence the confidence interval is constructed as

$$\overline{UW} \pm 1.96\sqrt{\frac{\widehat{Var}(UW_i)}{n}} = 0.43 \pm 1.96\sqrt{\frac{0.2451}{22445}} = [0.42352, 0.43648].$$
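The answers to questions 1 to 5 all follow mechanically from the joint distribution table, which a short script can make explicit. This is a sketch only: the dictionary layout and variable names are assumptions made for illustration, and the numbers are those from the table above.

```python
import math

# Joint sample distribution from the table above, keyed as joint[(uw, st)].
joint = {(0, 0): 0.46, (0, 1): 0.11, (1, 0): 0.11, (1, 1): 0.32}
n = 22445  # sample size

p_st = joint[(0, 1)] + joint[(1, 1)]       # P(ST = 1)
p_uw = joint[(1, 0)] + joint[(1, 1)]       # P(UW = 1)
p_uw_given_st = joint[(1, 1)] / p_st       # P(UW = 1 | ST = 1)
var_uw = p_uw * (1 - p_uw)                 # Bernoulli variance
cov_uw_st = joint[(1, 1)] - p_uw * p_st    # E(UW * ST) - E(UW) E(ST)

# 95% confidence interval for E(UW), using the variance of the sample mean.
half_width = 1.96 * math.sqrt(var_uw / n)
ci = (p_uw - half_width, p_uw + half_width)

print(p_st, p_uw_given_st, var_uw, cov_uw_st, ci)
```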


Now we want to see if the prevalence of undernutrition changes with age, and we estimate different regression models where the dependent variable is $UW_i$. The results of the different regressions are listed in Table 1. As usual, let $\beta_X$ denote the slope of a regression with respect to regressor $X$.

6. (5 points) Using the OLS model (1), calculate the predicted probability of being underweight for a girl, when age is 12, 18 or 36 months. How does the prevalence of underweight change with age?

$$\widehat{UW}_i(age = 12) = -0.011 + 0.02401(1) + 0.04724(12) - 0.00097(12^2) = 0.44021$$

$$\widehat{UW}_i(age = 18) = -0.011 + 0.02401(1) + 0.04724(18) - 0.00097(18^2) = 0.54905$$

$$\widehat{UW}_i(age = 36) = -0.011 + 0.02401(1) + 0.04724(36) - 0.00097(36^2) = 0.45653$$

According to these estimates, the shape of the relationship between UW and age is parabolic and concave. The prevalence of undernutrition first increases with age and then decreases.
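The three predictions, and the age at which the fitted parabola peaks, can be reproduced as follows. This is an illustrative sketch using the coefficients of model (1) in Table 1; the function name is hypothetical, and the turning point is not part of the original answer but follows from setting the derivative with respect to age to zero.

```python
# Coefficients of OLS model (1), Table 1.
CONSTANT = -0.011
B_FEMALE = 0.02401
B_AGE = 0.04724
B_AGE_SQ = -0.00097


def predicted_uw(age_months, female=1):
    """Predicted probability of being underweight under model (1)."""
    return (CONSTANT + B_FEMALE * female
            + B_AGE * age_months + B_AGE_SQ * age_months ** 2)


predictions = {age: predicted_uw(age) for age in (12, 18, 36)}

# The quadratic in age peaks where B_AGE + 2 * B_AGE_SQ * age = 0.
turning_point = -B_AGE / (2 * B_AGE_SQ)

print(predictions, turning_point)
```

The turning point falls at roughly 24 months, between the 18-month and 36-month predictions, which is consistent with the rise and subsequent fall described above.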

7. (5 points) Using again the OLS model (1), test the null hypothesis that the regression is linear in age, using a 1% significance level.

The null hypothesis is that $\beta_{Age^2} = 0$. So the t-ratio is

$$t = \frac{\hat\beta_{Age^2}}{\widehat{se}\left(\hat\beta_{Age^2}\right)} = \frac{-0.00097}{0.00003} = -32.333.$$

The null is certainly rejected.

8. (4 points) What is the interpretation of $\hat\beta_{\log(Age)}$ in model (2)?

A 1% increase in age in months increases the predicted probability of being underweight by $0.01(0.1775) = 0.001775$, that is, by about 0.18 percentage points.


9. (5 points) Using model (3), test the null hypothesis that age affects boys' and girls' predicted probability of being underweight in the same way.

The null hypothesis is that $\beta_{Female \times \log(Age)} = 0$. So the t-ratio is

$$t = \frac{\hat\beta_{Female \times \log(Age)}}{\widehat{se}\left(\hat\beta_{Female \times \log(Age)}\right)} = \frac{0.0085}{0.0059} = 1.4407.$$

So the null cannot be rejected at any standard significance level.

10. (6 points) Using both regression models (3) and (4), calculate the difference in the predicted probability of being underweight between a 12-month-old and an 18-month-old boy.

Using the linear model (3), we have

$$-0.0292 + 0.1735\ln(18) - \left[-0.0292 + 0.1735\ln(12)\right] = 0.1735\ln\left(\frac{18}{12}\right) = 0.070348.$$

Using the probit model (4) we have

$$\Phi(-1.6206 + 0.5332\ln 18) - \Phi(-1.6206 + 0.5332\ln 12) = \Phi(-0.079454) - \Phi(-0.29565) = 0.46833576 - 0.38374869 = 0.08458707.$$

11. (4 points) Are the results similar between the two models? Is this what you expected?

The results are quite close, which is what we usually expect. OLS and probit usually give very similar results, as long as the predictions are being calculated for values of the regressors that are not too extreme.


12. (6 points) Using both regression models (3) and (4), calculate the predicted probability of being underweight for a 1-month-old boy. Are the results very different from each other? Is this what you expected?

Using the linear model (3), we have

$$-0.0292 + 0.1735\ln(1) = -0.0292.$$

Using the probit model (4) we have

$$\Phi(-1.6206 + 0.5332\ln 1) = \Phi(-1.6206) = 0.05255173.$$

The results are quite different! The OLS prediction is even negative. This is perhaps not expected but we should not be too surprised. Here the prediction is being made for a fairly extreme value of the regressor. Recall that children in the sample are 1 to 35 months old, so choosing children of age 1 means that we are choosing values of the regressors which are close to the boundaries. For such values we know that the choice of model may matter, and here it does!
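The comparisons in questions 10 and 12 can be reproduced with a few lines of code. The sketch below uses the coefficients of models (3) and (4) in Table 1 for a boy (Female = 0); the function names are hypothetical and the normal CDF is computed from the error function.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ols_prob(age_months):
    """Model (3), linear probability model, for a boy (Female = 0)."""
    return -0.0292 + 0.1735 * math.log(age_months)


def probit_prob(age_months):
    """Model (4), probit, for a boy (Female = 0)."""
    return std_normal_cdf(-1.6206 + 0.5332 * math.log(age_months))


# Question 10: difference between an 18-month-old and a 12-month-old boy.
ols_diff = ols_prob(18) - ols_prob(12)
probit_diff = probit_prob(18) - probit_prob(12)

# Question 12: predictions at the boundary of the age range.
ols_at_1 = ols_prob(1)
probit_at_1 = probit_prob(1)

print(ols_diff, probit_diff, ols_at_1, probit_at_1)
```

Only the probit prediction is guaranteed to lie between 0 and 1, which is why the two models diverge at age 1 while agreeing closely in the middle of the age range.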

13. (5 points) Let $R_i$ denote a dummy variable equal to one if child $i$ lives in a rural area. How should you modify model (2) if you wanted to test the null hypothesis that gender differences in underweight are different in rural vs. urban areas, keeping age constant?

You should estimate a regression of $UW_i$ on $Female_i$, $\ln(Age_i)$ and the interaction between $R_i$ and $Female_i$. The null hypothesis would be that $\beta_{Female \times R} = 0$.


Now you want to study how the prevalence of underweight is related to episodes of intestinal disease. Let $D_i$ be a binary variable equal to one if the child had an intestinal disease (such as diarrhea) in the 3 months before the survey. Table 2 reports the results of different regressions, all estimated using OLS. The "Asset index" is an indicator of wealth (a larger number indicates more wealth).

14. (6 points) Looking at the results in model (5), does it look like $D_i$ significantly changes the probability that a child is underweight? Evaluate both the statistical significance and the magnitude of the coefficient.

$D_i$ is certainly statistically significant at any standard significance level (the t-ratio is 7.28). The results indicate that the probability of being underweight is about 3 percentage points higher for children who recently had intestinal diseases. We know that overall 43 percent of children were underweight, so a difference of 3 percentage points is not huge (it is about 7 percent of the mean) but it is not negligible either.

15. (5 points) Do you think that the results in model (5) can be interpreted as suggesting that intestinal disease is a cause of underweight? Explain.

Not at all. There are many other factors which are likely to affect $UW_i$ and which are also correlated with $D_i$. For instance, we would expect children who live in poorer areas to suffer more from malnutrition and also to suffer more from intestinal diseases.

16. (5 points) Compare the results in models (5) and (6). Does $\hat\beta_D$ change in the expected direction when you include the asset index in the regression?

Yes it does, although the change is very small, and perhaps smaller in magnitude than we could have expected. First, note that in model (6) $\hat\beta_{AssetIndex} < 0$ (wealth reduces underweight, conditional on $D_i$). Also, we would expect children from richer families to suffer less from intestinal diseases, so we would expect $cov(D_i, AssetIndex) < 0$. Hence, the OVB that results from excluding AssetIndex from the regression should be expected to be positive, and indeed with the inclusion of AssetIndex, $\hat\beta_D$ becomes smaller, as expected.


17. (5 points) Compare the results in models (6) and (7). Does $\hat\beta_{AssetIndex}$ change in the expected direction when you include in the regression a binary variable equal to one when the child's mother is illiterate?

Yes it does. First, note that in model (7) $\hat\beta_{MotherIlliterate} > 0$. Also, we would expect households where the mother is illiterate to be poorer on average, so we would expect $cov(MotherIlliterate_i, AssetIndex_i) < 0$. Hence, the OVB that results from excluding MotherIlliterate from the regression should be expected to be negative, and indeed with the inclusion of MotherIlliterate, $\hat\beta_{AssetIndex}$ becomes larger (closer to zero), as expected.

18. (4 points) Using the results in model (8), construct a test for the null hypothesis that mother's and father's illiteracy predict equal increases in the probability of child underweight. You have estimated that

$$\widehat{Cov}\left(\hat\beta_{FatherIlliterate}, \hat\beta_{MotherIlliterate}\right) = -0.00002189.$$

You do not have to complete the calculation to get full credit.

Solution:

$$t = \frac{\hat\beta_{FatherIlliterate} - \hat\beta_{MotherIlliterate}}{\sqrt{\widehat{se}^2\left(\hat\beta_{FatherIlliterate}\right) + \widehat{se}^2\left(\hat\beta_{MotherIlliterate}\right) - 2(-0.00002189)}} = \frac{0.04368 - 0.09974}{\sqrt{0.00846^2 + 0.00780^2 - 2(-0.00002189)}} = -4.2234$$

so the null is certainly rejected at any standard significance level (the final calculation was not necessary to get full credit).
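The t-ratio can be verified numerically. The sketch below plugs in the point estimates, standard errors and covariance quoted in the solution; the variable names are hypothetical.

```python
import math

# Estimates and standard errors quoted in the solution for model (8).
b_father, se_father = 0.04368, 0.00846
b_mother, se_mother = 0.09974, 0.00780
cov_fm = -0.00002189  # estimated covariance between the two coefficients

# var(a - b) = var(a) + var(b) - 2 cov(a, b)
se_diff = math.sqrt(se_father ** 2 + se_mother ** 2 - 2 * cov_fm)
t_stat = (b_father - b_mother) / se_diff

print(f"t = {t_stat:.4f}")
```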


You are worried that your estimates of the effect of $D_i$ on $UW_i$ may be biased by unobserved omitted factors at the household level. For this reason, you re-estimate the model using Fixed Effects and Random Effects, including only households where there are at least two children below 3 years of age. In this estimation, the "group" is then represented by a household, while the within-group observations are the different children within the same household. Keep in mind that a "household" may include more than one family, and that the Asset index is household-specific. The following table shows the results.

| | (FE) | (RE) |
|---|---|---|
| Had diarrhea recently | 0.02934 | 0.02941 |
| | [0.01128] | [0.00792] |
| Mother is Illiterate | 0.0702 | 0.09801 |
| | [0.04061] | [0.01447] |
| Asset index | | -0.03515 |
| | | [0.00382] |
| Constant | | 0.35586 |
| | | [0.01036] |
| Observations | 5722 | 5722 |
| Number of households | 2731 | 2731 |

Standard errors in brackets

19. (4 points) The FE model has not estimated the slope $\beta_{AssetIndex}$. Why?

Solution: The problem states that the asset index has been calculated at the household level, so when we estimate the FE model, everything that is invariant within the household will be "absorbed" by the fixed effects and the corresponding slopes cannot be estimated.

20. (6 points) You perform a Hausman test, and the value of the test is 0.54 (note: this is the value of the test, not the p-value). Based on this result, which model should you use, and why?

We know that the null of the Hausman test is that the fixed effects are not correlated with the regressors. The test is distributed as a $\chi^2_m$, where $m$ is the number of slopes estimated in both FE and RE models. Here, $m = 2$. The critical value for a $\chi^2_2$ test if we use a 10 percent significance level is 4.61, and our test is 0.54. Therefore, we do not reject the null hypothesis at standard significance levels (the critical values would be even larger with smaller significance levels). This suggests that in this case we should use RE, which not only is consistent but will also be more efficient than FE.


21. (6 points) As a further robustness check, you would like to re-estimate the effect of $D_i$ on underweight using instrumental variable estimation. A colleague suggests using, as instruments, the two binary variables $C_i$ and $F_i$, where $C_i = 1$ if the child had recent episodes of cough or other respiratory ailments, while $F_i = 1$ if the child recently had fever. Do you think these two variables satisfy the requirements for valid instruments? Explain.

Both variables are likely to be relevant but not exogenous. Both are likely to be relevant because, like $D_i$, they are measures of child health, and so we expect them to be strongly correlated with $D_i$. But both are unlikely to be exogenous. If we think that the prevalence of intestinal disease is likely correlated with omitted variables (such as income, availability of health care and sanitation, quality of housing etc.) we should expect these omitted factors to be correlated with $C_i$ and $F_i$ as well. So, these instruments are not likely to be valid.

22. (6 points) Consider now the simple OLS estimates in column (5), Table 2. If you were worried that this model suffers from omitted variable bias, and you assume that $C_i$ and $F_i$ are valid instruments, would you expect $\hat\beta_D$ to increase or decrease, if you estimate model (5) using 2SLS? Explain your argument carefully.

Based on the answers in the previous points, we mostly expect $\hat\beta_D$ to be biased upwards, because $D_i$ is likely absorbing the impact on $UW_i$ of factors associated with poverty. Such factors will generally increase UW and be positively correlated with $D_i$. So, we would expect OLS estimates to be biased upwards. Hence, if the instruments were valid, we would expect $\hat\beta_D^{IV}$ to be smaller than $\hat\beta_D^{OLS}$.


23. (5 points) You go ahead with your colleague's idea and you re-estimate model (5), Table 2, using $C_i$ and $F_i$ as instruments. You test the hypothesis that instruments are weak, and the result of the test is F = 600.4. Explain how the test is performed, and whether you conclude that the instruments are weak.

Solution: As expected, the instruments are very strong. 600.4 is far larger than the "rule-of-thumb" threshold of 10. The test for instrument weakness is performed after the first stage, by calculating, in our case, the value of the F test of the null hypothesis that all instruments are not significant in a regression of $D_i$ on $C_i$ and $F_i$.

24. (6 points) Another colleague is not persuaded that the instruments are exogenous, so you also perform a test of exogeneity. The result of the F-test is 5.21. Explain how the test is performed, and whether you conclude that the instruments are exogenous, using a 5 percent level.

Solution: The test is performed after the second stage of 2SLS. First, the residuals are calculated as

$$\hat{u}_i = UW_i - \hat\beta_0^{2SLS} - \hat\beta_D^{2SLS} D_i.$$

Then $\hat{u}_i$ is regressed on $C_i$ and $F_i$, and the F test for the null hypothesis that both variables are not significant is calculated. Here the result of the F-test is 5.21. Then we have to compare $mF$ with the critical value for a $\chi^2_{m-k}$, where $m = 2$ is the number of instruments, $k = 1$ is the number of endogenous regressors, and $m - k$ is the number of overidentifying restrictions. Because $2(5.21) = 10.42$, we reject the null hypothesis that the instruments are exogenous (as expected!) at any standard significance level (the critical value for a $\chi^2_1$ for the most conservative 1% test is 6.63).

25. (5 points) Suppose that you know that the true value $\beta_D = 0.01$. In your sample, the 2SLS estimate $\hat\beta_D = 0.02$. Does this imply that your estimator for $\beta_D$ is biased? Does it imply that your estimator is not consistent? Explain.

Solution: The point estimate has nothing to do with either bias or consistency. Both are properties of the estimator which do not depend on the actual point estimate we get. Indeed, our point estimate is pretty much never equal to the true value, but still there are plenty of unbiased and consistent estimators out there (just think of the sample mean).


Now you want to evaluate the height performance of children in your study population. Let $H_i$ denote a measure of the height performance of a child relative to a reference of healthy and well-fed children. You have reasons to believe that $H_i$ is normally distributed. Recall that if a random variable $H$ is distributed normally with expected value $\mu_H$ and variance $\sigma^2$, its density is

$$f(H) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H - \mu_H}{\sigma}\right)^2}.$$

You already know that the (true) expected value of $H_i$ in the population is $\mu_H$, but you want to estimate the variance using Maximum Likelihood. Recall that you can assume that the observations are iid.

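26. (4 points) Prove that the log-likelihood of your sample can be written as

$$\ln L\left(H_1 \dots H_n \mid \sigma^2\right) = -n \ln\sqrt{2\pi} - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$

Solution: the density for a single observation is the one indicated above, so, given that we can assume iid observations:

$$L\left(H_1 \dots H_n \mid \sigma^2\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H_i - \mu_H}{\sigma}\right)^2}$$

and taking logs

$$\ln L\left(H_1 \dots H_n \mid \sigma^2\right) = \sum_{i=1}^{n} \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H_i - \mu_H}{\sigma}\right)^2}\right)$$

$$= -\sum_{i=1}^{n} \ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{H_i - \mu_H}{\sigma}\right)^2$$

$$= -n\ln\sqrt{2\pi} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$

$$= -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$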

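27. (6 points) Prove that the MLE of the variance is $\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$.

Solution: we have to take the first order condition with respect to $\sigma^2$ and solve

$$\frac{\partial \ln L\left(H_1 \dots H_n \mid \sigma^2\right)}{\partial \sigma^2} = 0 = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$

$$\Rightarrow 0 = -n + \frac{1}{\hat\sigma^2_{MLE}}\sum_{i=1}^{n}(H_i - \mu_H)^2$$

$$\Rightarrow \hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$$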


28. (6 points) Is $\hat\sigma^2_{MLE}$ a consistent estimator of the true variance $\sigma^2$? Justify your answer.

Solution: $\hat\sigma^2_{MLE}$ is consistent. We have iid observations, and by the LLN we know that (if the usual conditions hold) the mean of iid observations converges in probability to the expectation of any one observation. Hence

$$\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2 \xrightarrow{p} E\left[(H_i - \mu_H)^2\right] = \sigma^2.$$

29. (5 points) Suppose that $\hat\sigma^2_{MLE} = 2.75$. Construct a test for the null hypothesis that $\sigma^2 = 3$. (This question is relatively hard. You do not need to complete the final calculations to get full credit.)

Solution: with the information at hand, we can use a likelihood ratio test. Recall that

$$LR = 2\left[\ln L_U - \ln L_R\right],$$

and that

$$\ln L\left(H_1 \dots H_n \mid \sigma^2\right) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2.$$

Also, we have proved that the unrestricted MLE of the variance is $\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$, so we can write

$$\ln L\left(H_1 \dots H_n \mid \sigma^2\right) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{n}{2\sigma^2}\left[\frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2\right] = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{n}{2\sigma^2}\hat\sigma^2_{MLE}.$$

So

$$LR = 2\left[\left(-n\ln\sqrt{2\pi} - \frac{n}{2}\ln\hat\sigma^2_{MLE} - \frac{n}{2\hat\sigma^2_{MLE}}\hat\sigma^2_{MLE}\right) - \left(-n\ln\sqrt{2\pi} - \frac{n}{2}\ln 3 - \frac{n}{2(3)}\hat\sigma^2_{MLE}\right)\right]$$

$$= 2\left[\left(-\frac{22445}{2}\ln 2.75 - \frac{22445}{2}\right) - \left(-\frac{22445}{2}\ln 3 - \frac{22445}{2(3)}\,2.75\right)\right]$$

Note that in principle one could use the formula from the very last question (n. 33) to construct a test. However, such an expression (which would still yield partial credit) was not acceptable as an answer to this question, because the expression in (33) requires knowledge of $\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \hat\sigma^2_{MLE}\right]^2$, which you do not observe here. As you can see above, the LR test can instead be calculated with the data at hand.
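Although the final calculation was not required, the LR statistic can be evaluated directly. The sketch below uses $n = 22445$, $\hat\sigma^2_{MLE} = 2.75$ and the null value 3 from the problem; the function name is hypothetical, and the comparison with the 5% critical value of a $\chi^2(1)$ distribution (3.84, one restriction) is an assumption about how the test would be completed, since the solution stops before that step.

```python
import math

n = 22445
sigma2_mle = 2.75   # unrestricted MLE of the variance
sigma2_null = 3.0   # value imposed by the null hypothesis


def log_likelihood(sigma2, sigma2_mle, n):
    """Normal log-likelihood with known mean.

    Uses the identity sum((H_i - mu_H)^2) = n * sigma2_mle.
    """
    return (-n * math.log(math.sqrt(2 * math.pi))
            - n / 2 * math.log(sigma2)
            - n * sigma2_mle / (2 * sigma2))


lr_stat = 2 * (log_likelihood(sigma2_mle, sigma2_mle, n)
               - log_likelihood(sigma2_null, sigma2_mle, n))

CHI2_1_CRIT_5PCT = 3.84  # 5% critical value, chi-squared with 1 degree of freedom
reject_null = lr_stat > CHI2_1_CRIT_5PCT

print(f"LR = {lr_stat:.2f}, reject H0: {reject_null}")
```

With a sample this large, even the modest gap between 2.75 and 3 produces a statistic far above the critical value.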


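30. (5 points) Now we need to derive the asymptotic distribution of $\hat\sigma^2_{MLE}$. First prove that

$$\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\bar{v},$$

where $\bar{v}$ is the sample mean of $v_i \equiv (H_i - \mu_H)^2 - \sigma^2$.

Solution:

$$\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2 \;\Longrightarrow\; \hat\sigma^2_{MLE} - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2\right] - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2\right] - \frac{n}{n}\sigma^2$$

$$\Longrightarrow\; \hat\sigma^2_{MLE} - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \sigma^2\right]$$

$$\Longrightarrow\; \sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \sigma^2\right] = \sqrt{n}\,\bar{v}$$

31. (5 points) Prove that $E(v_i) = 0$.

Solution:

$$E(v_i) = E\left[(H_i - \mu_H)^2 - \sigma^2\right] = \underbrace{E\left[(H_i - \mu_H)^2\right]}_{=\sigma^2} - \sigma^2 = 0$$

32. (5 points) Prove that $\sigma^2_v \equiv Var(v_i) = E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}$.

Solution: We have just proved that $E(v_i) = 0$, so it follows that

$$Var(v_i) = E\left(v_i^2\right) - \Big[\underbrace{E(v_i)}_{=0}\Big]^2 = E\left(v_i^2\right) = E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}$$

by the definition of $v_i$.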


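33. (5 points) Using the results in the previous steps, prove that

$$\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}\,\sigma_v$$

Solution: We have already proved that $\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\bar{v}$ and that $E(v_i) = 0$. But then

$$\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\bar{v}\,\frac{\sigma_v}{\sigma_v} = \sqrt{n}\,(\bar{v} - 0)\,\frac{\sigma_v}{\sigma_v} = \frac{\sqrt{n}\,(\bar{v} - E(v_i))}{\sigma_v}\,\sigma_v = \sqrt{n}\,\frac{\bar{v} - E(v_i)}{\sigma_v}\,\sigma_v = \frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}\,\sigma_v$$

34. (6 points) Prove that $\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^2_v\right)$. Justify your steps!

Solution: By the Central Limit Theorem, we know that if we have a sample of iid random variables with finite mean and variance it follows that

$$\frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}} \xrightarrow{d} N(0, 1).$$

But then, using the properties of variances, we have that

$$\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) = \underbrace{\frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}}_{\xrightarrow{d}\, N(0,1)}\,\sigma_v \xrightarrow{d} \sigma_v N(0, 1) = N\left(0, \sigma^2_v\right)$$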


35. (5 points) Based on the result in the previous step, and using the definition of $\sigma^2_v$, how would you calculate a 95% confidence interval for the estimated variance $\sigma^2$? (Note: this question is worth fewer points than it should be based on its difficulty, so plan accordingly. Hint: think about what the approximate value of the variance of $\hat\sigma^2_{MLE}$ is in large samples, and think about how you would estimate all the elements that are unknown, that is, those elements that need to be estimated.)

Solution: We know that $\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^2_v\right)$. Therefore, in large but finite samples we also have that the following is approximately true:

$$\sqrt{n}\left(\hat\sigma^2_{MLE} - \sigma^2\right) \sim N\left(0, \sigma^2_v\right).$$

Using the properties of the normal distribution we therefore have

$$\hat\sigma^2_{MLE} - \sigma^2 \sim \frac{1}{\sqrt{n}} N\left(0, \sigma^2_v\right) = N\left(0, \frac{1}{n}\sigma^2_v\right)$$

and

$$\hat\sigma^2_{MLE} \sim N\left(0, \frac{1}{n}\sigma^2_v\right) + \sigma^2 = N\left(\sigma^2, \frac{1}{n}\sigma^2_v\right).$$

So, using the definition of $v_i$ and of $\sigma^2_v$, a 95% confidence interval will be

$$\hat\sigma^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\sigma^2_v} = \hat\sigma^2_{MLE} \pm 1.96\sqrt{\frac{1}{n} E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}}.$$

However, in this expression there are several elements which are unknown. We have been assuming that we know $\mu_H$, so this is not a problem. We do not know $\sigma^2$, but we have an estimate $\hat\sigma^2_{MLE}$. Finally, we can always estimate an expectation with a sample mean. So

$$\hat\sigma^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\hat\sigma^2_v} = \hat\sigma^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \hat\sigma^2_{MLE}\right]^2\right]}$$
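The feasible interval can be written as a small function. This is a sketch under stated assumptions: the function name and the tiny data set are hypothetical and serve only to show the mechanics, since with so few observations the large-sample approximation behind the interval would not be reliable.

```python
import math


def variance_ci_known_mean(h, mu_h, z=1.96):
    """Large-sample CI for the variance when the mean mu_h is known.

    Returns (sigma2_mle, lower, upper).
    """
    n = len(h)
    sq_dev = [(x - mu_h) ** 2 for x in h]
    sigma2_mle = sum(sq_dev) / n
    # Sample analogue of sigma_v^2 = E{[(H - mu_H)^2 - sigma^2]^2}.
    sigma2_v_hat = sum((d - sigma2_mle) ** 2 for d in sq_dev) / n
    half_width = z * math.sqrt(sigma2_v_hat / n)
    return sigma2_mle, sigma2_mle - half_width, sigma2_mle + half_width


# Hypothetical height scores, with an assumed known mean of zero.
heights = [-1.0, 1.0, -2.0, 2.0, 0.0, 1.0, -1.0, 0.0]
sigma2_hat, lower, upper = variance_ci_known_mean(heights, mu_h=0.0)

print(sigma2_hat, lower, upper)
```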


Table 1: Dependent variable: $UW_i$ (=1 if child $i$ is underweight)

| | (1) OLS | (2) OLS | (3) OLS | (4) Probit |
|---|---|---|---|---|
| Female | 0.02401 | 0.0244 | 0.0024 | 0.052 |
| | [0.00628] | [0.0063] | [0.0148] | [0.0627] |
| Age | 0.04724 | | | |
| | [0.00114] | | | |
| Age² | -0.00097 | | | |
| | [0.00003] | | | |
| log(Age) | | 0.1775 | 0.1735 | 0.5332 |
| | | [0.0029] | [0.0039] | [0.0153] |
| Female × log(Age) | | | 0.0085 | 0.0066 |
| | | | [0.0059] | [0.0226] |
| Constant | -0.011 | -0.0396 | -0.0292 | -1.6206 |
| | [0.00776] | [0.0078] | [0.0097] | [0.0425] |
| R-squared | 0.10187 | 0.0958 | 0.0958 | |
| Log-likelihood | | | | -14140.528 |

Robust standard errors in brackets


11.8 Midterm 1, Fall 2009

You are using a sample of 985 households who live in Delhi, India. The data set includes information on their total monthly expenditure per head (pce) and their expenditure on food. Let $P_i$ be a binary variable equal to one if the household is "poor", where a household is considered poor if its pce is below the sample mean. Let $F_i$ denote a binary variable equal to one if the household spends more than 50% of its total budget on food. The following table shows the joint distribution of $P_i$ and $F_i$ in your sample:

             F_i = 0     F_i = 1
P_i = 0      0.1878      0.1350
P_i = 1      0.1066      0.5706

1. (3 points) Estimate the fraction of households in your sample who spend more than 50% of their total budget on food, that is, calculate $\widehat{\Pr}(F_i = 1)$.

Solution: This is just $\widehat{\Pr}(F_i = 1, P_i = 0) + \widehat{\Pr}(F_i = 1, P_i = 1) = 0.135 + 0.5706 = 0.7056$.

2. (3 points) Estimate $E(P_i)$.

Solution: $P_i$ is a binary variable, so $E(P_i) = \widehat{\Pr}(P_i = 1) = 0.1066 + 0.5706 = 0.6772$.

3. (4 points) Estimate $E(F_i \mid P_i = 0)$ and $E(F_i \mid P_i = 1)$.

Solution: Again it helps to note that $F_i$ is a binary variable. So

$$E(F_i \mid P_i = 0) = \widehat{\Pr}(F_i = 1 \mid P_i = 0) = \frac{\widehat{\Pr}(F_i = 1, P_i = 0)}{\widehat{\Pr}(P_i = 0)} = \frac{0.135}{1 - 0.6772} = 0.41822.$$

Similarly,

$$E(F_i \mid P_i = 1) = \widehat{\Pr}(F_i = 1 \mid P_i = 1) = \frac{\widehat{\Pr}(F_i = 1, P_i = 1)}{\widehat{\Pr}(P_i = 1)} = \frac{0.5706}{0.6772} = 0.84259.$$
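The marginal and conditional calculations in questions 1 to 3 can be reproduced directly from the joint table. The sketch below is only an illustration; the dictionary layout (keys ordered as `(P, F)`) and the variable names are choices made for this example.

```python
# Joint sample distribution of (P_i, F_i), copied from the table above.
# Keys are (P, F) pairs; this layout is an assumption of the sketch.
joint = {
    (0, 0): 0.1878, (0, 1): 0.1350,
    (1, 0): 0.1066, (1, 1): 0.5706,
}

# Marginal probabilities: sum the joint probabilities over the other variable.
pr_f1 = joint[(0, 1)] + joint[(1, 1)]
pr_p1 = joint[(1, 0)] + joint[(1, 1)]

# For a binary variable the conditional expectation is a conditional probability.
e_f_given_p0 = joint[(0, 1)] / (1 - pr_p1)
e_f_given_p1 = joint[(1, 1)] / pr_p1

print(round(pr_f1, 4), round(pr_p1, 4))
print(round(e_f_given_p0, 5), round(e_f_given_p1, 5))
```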

4. (3 points) Interpret the result in the previous point. For instance, what does it mean from an economic point of view? Is it what you expected?

Solution: The relative magnitude of the two conditional expectations indicates that more than 80% of the poor in this sample spend more than half of their total budget on food, while only about 40 percent of the non-poor do. This was to be expected. This being a sample from a poor country, we would expect the poor (because of their low income) to spend a very large fraction of their total outlay on food, which is a necessity.

5. (5 points) Estimate $\rho_{FP}$, that is, the correlation coefficient between $F_i$ and $P_i$.

Solution: Once again it helps to use the binary nature of the two random variables, which implies that the expectation of their product is the probability that both are equal to one. Hence

$$\sigma_{FP} = E(F_i P_i) - E(F_i)E(P_i) = \widehat{\Pr}(F_i = 1, P_i = 1) - \widehat{\Pr}(F_i = 1)\,\widehat{\Pr}(P_i = 1) = 0.5706 - 0.6772\,(0.7056) = 0.092768.$$


Also, we know that the variance of a binary variable with probability $p$ is $p(1 - p)$, so

$$\rho_{FP} = \frac{\sigma_{FP}}{\sigma_F \sigma_P} = \frac{0.092768}{\sqrt{0.6772\,(1 - 0.6772)\;0.7056\,(1 - 0.7056)}} = 0.43534.$$
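As a quick numerical check of the covariance and correlation, the following sketch plugs in the probabilities estimated in questions 1 and 2 and the joint probability from the table. The variable names are chosen for this example only.

```python
import math

# Estimated probabilities taken from the solutions above.
pr_f1 = 0.7056    # Pr(F = 1)
pr_p1 = 0.6772    # Pr(P = 1)
pr_both = 0.5706  # Pr(F = 1, P = 1), which equals E(F * P) for binary variables

# Covariance: E(FP) - E(F)E(P).
cov_fp = pr_both - pr_f1 * pr_p1

# A binary variable with success probability p has variance p(1 - p).
var_f = pr_f1 * (1 - pr_f1)
var_p = pr_p1 * (1 - pr_p1)

corr_fp = cov_fp / math.sqrt(var_f * var_p)
print(round(cov_fp, 6), round(corr_fp, 5))
```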

6. (5 points) Construct a 95% confidence interval for the fraction of households who are poor in the population.

Solution: Recall that this is just a confidence interval based on a sample mean. Hence the confidence interval is

$$\widehat{\Pr}(P_i = 1) \pm 1.96\,\widehat{s.e.}\left[\widehat{\Pr}(P_i = 1)\right],$$

where

$$\widehat{s.e.}\left[\widehat{\Pr}(P_i = 1)\right] = \sqrt{\frac{\widehat{\Pr}(P_i = 1)\left[1 - \widehat{\Pr}(P_i = 1)\right]}{n}} = \sqrt{\frac{0.6772\,(1 - 0.6772)}{985}} = 0.014897,$$

so that the CI is

$$[0.6772 - 1.96\,(0.014897),\; 0.6772 + 1.96\,(0.014897)] \approx [0.6480,\; 0.7064].$$
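The interval for a proportion is easy to evaluate numerically. The sketch below uses the point estimate 0.6772 from question 2 and the sample size 985; the variable names are illustrative.

```python
import math

p_hat = 0.6772  # estimated fraction of poor households (question 2)
n = 985         # sample size

# Standard error of a sample proportion: sqrt(p(1 - p) / n).
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 95% interval based on the normal approximation.
ci_lower = p_hat - 1.96 * se
ci_upper = p_hat + 1.96 * se
print(round(se, 6), round(ci_lower, 4), round(ci_upper, 4))
```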

7. (4 points) Suppose that you know that mean pce among the poor is 579 Rupees per person per month (this is about 35 USD taking into account the difference in purchasing power between the US and India), while you know that among the non-poor the mean is 1832. Estimate $E(pce)$ in the population.

Solution: This is an application of the law of iterated expectations (LIE), because we know that

$$E(pce) = E_P\left[E(pce \mid P)\right] = 0.6772\,(579) + (1 - 0.6772)\,1832 = 983.47.$$
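The law of iterated expectations here amounts to a probability-weighted average of the two group means, which the following short sketch evaluates. Variable names are chosen for this example.

```python
pr_poor = 0.6772         # estimated Pr(P = 1)
mean_pce_poor = 579.0    # mean pce among the poor, in Rupees
mean_pce_nonpoor = 1832.0  # mean pce among the non-poor, in Rupees

# E(pce) = Pr(P=1) * E(pce | P=1) + Pr(P=0) * E(pce | P=0)
mean_pce = pr_poor * mean_pce_poor + (1 - pr_poor) * mean_pce_nonpoor
print(round(mean_pce, 2))
```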


Let $FS_i$ denote the budget share spent on food by household $i$. That is,

$$FS_i = 100 \times \frac{\text{Total monthly expenditure per head in food}_i}{pce_i}.$$

You estimate a linear regression of $FS_i$ on $pce_i$ (in 100 Rupees) and this is the result (standard errors in parentheses):

$$\widehat{FS}_i = \underset{(0.986)}{59.7} - \underset{(0.097)}{0.194}\, pce_i.$$

8. (3 points) What is the interpretation of the slope in this regression?

Solution: The result indicates that an increase in pce of 100 Rs predicts a decrease of slightly less than 0.2 percentage points in the share of the budget spent on food.

9. (3 points) Does the intercept have a meaningful interpretation in this regression?

Solution: No. In this regression, the intercept indicates the food budget share for a household with zero expenditure, which is not a very meaningful quantity to estimate.

10. (4 points) Is $pce_i$ significant at the 1, 5 and 10 percent level?

Solution: The t-ratio is

$$t = \frac{-0.194}{0.097} = -2,$$

so we reject the null that the slope is zero using a 10 or 5 percent level (barely, in this case, since $|t| = 2$ only just exceeds the 5% critical value of 1.96) but we cannot reject the null if we use a 1% level (critical value 2.576).

11. (4 points) Calculate the p-value for the two-sided test in the previous point.

Solution: $p = 2\,\Phi(-2) \approx 0.0455$, where $\Phi$ is the standard normal CDF.
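The significance test and the two-sided p-value can be checked with the standard library alone. This is a sketch under the large-sample normal approximation; the helper `normal_cdf` and the dictionary of critical values are written for this example, and the standard normal CDF is obtained from `math.erfc`.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

slope, se_slope = -0.194, 0.097
t_ratio = slope / se_slope

# Two-sided p-value under the normal approximation: 2 * Phi(-|t|).
p_value = 2 * normal_cdf(-abs(t_ratio))

# Usual two-sided normal critical values for the 10%, 5% and 1% levels.
critical = {0.10: 1.645, 0.05: 1.96, 0.01: 2.576}
rejected = {level: abs(t_ratio) > c for level, c in critical.items()}

print(t_ratio, p_value, rejected)
```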

12. (4 points) Estimate the budget share spent on food for a household whose per capita expenditure is Rs 5000 (recall that in the regression estimated above $pce_i$ denotes expenditure divided by 100).

Solution: $\widehat{E}(FS_i \mid pce_i = 5000/100) = 59.7 - 0.194\,(50) = 50.0$.

13. (5 points) Estimate a 95% confidence interval for the prediction estimated in the previous question, taking into account that the covariance between the estimated slope and intercept is $-0.085$. You do not need to complete the calculation to get full credit. Just write down the correct formula and plug in the correct estimates.

Solution: $50.0 \pm 1.96\sqrt{0.986^2 + (50^2)\,0.097^2 + 2\,(50)\,(-0.085)} = 50 \pm 7.8387$.
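The interval in question 13 rests on the variance of a linear combination of the two estimated coefficients, $Var(\hat{\beta}_0 + x\hat{\beta}_1) = Var(\hat{\beta}_0) + x^2 Var(\hat{\beta}_1) + 2x\,Cov(\hat{\beta}_0, \hat{\beta}_1)$. A minimal sketch of that arithmetic, with variable names chosen for this example:

```python
import math

b0, b1 = 59.7, -0.194         # estimated intercept and slope
se_b0, se_b1 = 0.986, 0.097   # their standard errors
cov_b0_b1 = -0.085            # covariance given in the question

x = 5000 / 100                # pce is measured in hundreds of Rupees

prediction = b0 + b1 * x

# Variance of b0 + x * b1 for a fixed value x.
var_prediction = se_b0 ** 2 + x ** 2 * se_b1 ** 2 + 2 * x * cov_b0_b1
half_width = 1.96 * math.sqrt(var_prediction)

print(round(prediction, 1), round(half_width, 4))
```

Note how much the negative covariance matters: ignoring it would add 8.5 back to the variance and widen the interval noticeably.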


Now you estimate the following model,

$$pce_i = \beta_0 + \beta_s s_i + u_i, \qquad (33)$$

where the dependent variable is $pce_i$ (in '00 Rupees) and $s_i$ is household size, that is, the number of members in household $i$. You estimate model (33) with OLS, and the results are the following (standard errors in parentheses):

$$\widehat{pce}_i = \underset{(0.64)}{12.3} - \underset{(0.29)}{0.60}\, s_i. \qquad (34)$$

14. (5 points) Calculate a 95% confidence interval for the difference in predicted $pce_i$ (in '00 Rupees) between a household with 3 members and a household with 6 members. You do not need to complete the calculation to get full credit. Setting up the problem correctly is sufficient.

Solution: First, note that the point estimate of the quantity for which we need a confidence interval is

$$-0.60\,(6 - 3) = -1.8.$$

The confidence interval is then

$$-1.8 \pm 1.96\,(3)\,(0.29) = -1.8 \pm 1.7052 = [-3.5052,\; -0.0948].$$
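Because the intercept cancels in the difference of two predictions, only the slope and its standard error are needed: the standard error of $3\hat{\beta}_s$ is 3 times the standard error of $\hat{\beta}_s$. A short sketch of the calculation, with illustrative variable names:

```python
b_s, se_b_s = -0.60, 0.29   # estimated slope on household size and its s.e.
size_gap = 6 - 3            # difference in household size

# Difference in predicted pce (6 members minus 3 members).
difference = b_s * size_gap

# s.e. of a constant times an estimator is |constant| times the s.e.
half_width = 1.96 * abs(size_gap) * se_b_s
ci = (difference - half_width, difference + half_width)
print(round(difference, 4), round(ci[0], 4), round(ci[1], 4))
```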

15. (4 points) A colleague sees the results of your regression and concludes that increased use of contraception, by leading to lower family sizes, is very likely to be an effective development policy, because based on your results reduced family size will certainly lead to increased household expenditure. Do you think this is a sound argument? Justify your answer.

Solution: The argument is nonsense. The OLS results only document the existence of a negative correlation between pce and household size, but tell us nothing about the causal relationship between the two. Indeed, many economists argue that the causal pathway goes from poverty to large family size, and not vice versa.

16. (5 points) The OLS standard errors in equation (34) have been estimated using the formula we saw in class. Specifically, the variance of the slope has been estimated using the following large sample approximation:

$$Var\left(\hat{\beta}_s\right) \approx \frac{1}{n}\,\frac{var\left[(s_i - \mu_s)u_i\right]}{\left(\sigma_s^2\right)^2},$$

where $\mu_s \equiv E(s_i)$ and $\sigma_s^2 \equiv Var(s_i)$. Assume that the error term $u_i$ satisfies the usual zero conditional mean assumption, that is, assume that $E(u_i \mid s_i) = 0$. Prove that the following is true:

$$var\left[(s_i - \mu_s)u_i\right] = E\left[(s_i - \mu_s)^2 u_i^2\right].$$

Solution: First, recall that the variance of a random variable $Y$ can be written as $E\left(Y^2\right) - \left[E(Y)\right]^2$. Then

$$var\left[(s_i - \mu_s)u_i\right] = E\left[(s_i - \mu_s)^2 u_i^2\right] - \left\{E\left[(s_i - \mu_s)u_i\right]\right\}^2.$$

Now consider the term in curly brackets and use LIE:

$$E\left[(s_i - \mu_s)u_i\right] = E\Big\{E\Big[\underbrace{(s_i - \mu_s)}_{\text{constant, given } s_i} u_i \,\Big|\, s_i\Big]\Big\} = E\Big\{(s_i - \mu_s)\underbrace{E\left[u_i \mid s_i\right]}_{=0}\Big\} = 0.$$

Then

$$var\left[(s_i - \mu_s)u_i\right] = E\left[(s_i - \mu_s)^2 u_i^2\right] - \{0\}^2 = E\left[(s_i - \mu_s)^2 u_i^2\right]. \quad \text{QED}$$


17. (3 points) Prove that $Var(u_i \mid s_i) = E\left(u_i^2 \mid s_i\right)$.

Solution: Let us rewrite the variance using a formula analogous to the one we just used, $Var(u_i \mid s_i) = E\left(u_i^2 \mid s_i\right) - \left[E(u_i \mid s_i)\right]^2$ (recall that here all expectations need to be conditional). But then

$$Var(u_i \mid s_i) = E\left(u_i^2 \mid s_i\right)$$

because we know that $E(u_i \mid s_i) = 0$ by assumption.

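18. (4 points) Suppose now that the variance of the residual $u_i$ does not depend on the regressor $s_i$, that is,

$$Var(u_i \mid s_i) = E\left(u_i^2 \mid s_i\right) = E\left(u_i^2\right) = \sigma_u^2.$$

Prove that in this case we have

$$var\left[(s_i - \mu_s)u_i\right] = \sigma_u^2 \sigma_s^2.$$

Solution: We already know that $var\left[(s_i - \mu_s)u_i\right] = E\left[(s_i - \mu_s)^2 u_i^2\right]$. Then, by using LIE again we have

$$E\left[(s_i - \mu_s)^2 u_i^2\right] = E\left\{E\left[(s_i - \mu_s)^2 u_i^2 \mid s_i\right]\right\} = E\Big\{(s_i - \mu_s)^2 \underbrace{E\left(u_i^2 \mid s_i\right)}_{=\sigma_u^2 \text{ by assumption}}\Big\} = E\left[\sigma_u^2 (s_i - \mu_s)^2\right] = \sigma_u^2 \underbrace{E\left[(s_i - \mu_s)^2\right]}_{=\sigma_s^2} = \sigma_u^2 \sigma_s^2. \quad \text{QED}$$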
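A small Monte Carlo experiment can illustrate (not prove) the results in questions 16 and 18. The sketch below is purely hypothetical: it draws $s_i$ and $u_i$ as independent normal variables, which guarantees $E(u_i \mid s_i) = 0$ and $Var(u_i \mid s_i) = \sigma_u^2$, and the parameter values, seed, and sample size are arbitrary choices for this example.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility of the sketch

n = 200_000
mu_s, sigma_s = 5.0, 2.0   # hypothetical mean and s.d. of household size
sigma_u = 3.0              # hypothetical s.d. of the error term

# z_i = (s_i - mu_s) * u_i, with u_i drawn independently of s_i.
z = []
for _ in range(n):
    s = random.gauss(mu_s, sigma_s)
    u = random.gauss(0.0, sigma_u)
    z.append((s - mu_s) * u)

mean_z = sum(z) / n
var_z = sum((zi - mean_z) ** 2 for zi in z) / n   # sample var[(s - mu_s) u]
second_moment = sum(zi ** 2 for zi in z) / n      # sample E[(s - mu_s)^2 u^2]
target = sigma_u ** 2 * sigma_s ** 2              # sigma_u^2 * sigma_s^2

print(var_z, second_moment, target)
```

In a run of this size the sample variance and the sample second moment should be nearly identical (question 16), and both should be close to the target value of 36 (question 18), up to simulation noise.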

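19. (3 points) Using the results in the previous points, prove that, under the assumption that the variance of the error does not depend on the regressor (that is, under the assumption that $Var(u_i \mid s_i) = \sigma_u^2$), the variance of the slope in your OLS regression can be obtained using the following large sample approximation:

$$Var\left(\hat{\beta}_s\right) \approx \frac{1}{n}\,\frac{\sigma_u^2}{\sigma_s^2}.$$

Solution:

$$Var\left(\hat{\beta}_s\right) \approx \frac{1}{n}\,\frac{Var\left[(s_i - \mu_s)u_i\right]}{\left(\sigma_s^2\right)^2}.$$

But in question 18 we have proved that if $Var(u_i \mid s_i) = \sigma_u^2$ then $Var\left[(s_i - \mu_s)u_i\right] = \sigma_u^2 \sigma_s^2$, so, substituting in, we have

$$Var\left(\hat{\beta}_s\right) \approx \frac{1}{n}\,\frac{\sigma_u^2 \sigma_s^2}{\left(\sigma_s^2\right)^2} = \frac{1}{n}\,\frac{\sigma_u^2}{\sigma_s^2}. \quad \text{QED}$$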


Table 2: Dependent variable: UW_i (=1 if child i is underweight)

                                       (5)          (6)          (7)          (8)

D (had intestinal disease recently)    0.02992      0.02644      0.02582      0.02572
                                       [0.00411]    [0.00401]    [0.00400]    [0.00399]

Asset index                                         -0.05615     -0.04288     -0.04114
                                                    [0.00149]    [0.00171]    [0.00173]

Mother is illiterate                                             0.1127       0.09974
                                                                 [0.00736]    [0.00780]

Father is illiterate                                                          0.04368
                                                                              [0.00846]

Constant                               0.41864      0.42263      0.36539      0.36065
                                       [0.00370]    [0.00362]    [0.00513]    [0.00520]

Observations                           22445        22445        22445        22445
R-squared                              0.00239      0.05087      0.0611       0.06229

Robust standard errors in brackets
