
Practice Problems for Midterm 1

Multiple Choice Questions

Chapter 2

1) The probability of an outcome

a. is the number of times that the outcome occurs in the long run.
b. equals M×N, where M is the number of occurrences and N is the population size.
c. is the proportion of times that the outcome occurs in the long run.
d. equals the sample mean divided by the sample standard deviation.

Answer: c

2) The probability that event A or B occurs, Pr(A or B), equals

a. Pr(A) × Pr(B).
b. Pr(A) + Pr(B) if A and B are mutually exclusive.
c. $\frac{\Pr(A)}{\Pr(B)}$.
d. Pr(A) + Pr(B) even if A and B are not mutually exclusive.

Answer: b

3) The cumulative probability distribution shows the probability

a. that a random variable is less than or equal to a particular value.
b. of two or more events occurring at once.
c. of all possible events occurring.
d. that a random variable takes on a particular value given that another event has happened.

Answer: a

4) The expected value of a discrete random variable

a. is the outcome that is most likely to occur.
b. can be found by determining the 50% value in the c.d.f.
c. equals the population median.
d. is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of those outcomes.

Answer: d

9) Let Y be a random variable. Then var(Y) equals

a. $\sqrt{E[(Y - \mu_Y)^2]}$.
b. $E[|Y - \mu_Y|]$.
c. $E[(Y - \mu_Y)^2]$.
d. $E[(Y - \mu_Y)]$.

Answer: c

10) The conditional distribution of Y given X = x, $\Pr(Y = y \mid X = x)$, is

a. $\frac{\Pr(Y = y)}{\Pr(X = x)}$.
b. $\sum_{i=1}^{l} \Pr(X = x_i, Y = y)$.
c. $\frac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$.
d. $\frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$.

Answer: d

11) The conditional expectation of Y given X, $E(Y \mid X = x)$, is calculated as follows:

a. $\sum_{i=1}^{k} y_i \Pr(X = x_i \mid Y = y)$.
b. $E[E(Y \mid X)]$.
c. $\sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$.
d. $\sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)$.

Answer: c

9) Two random variables X and Y are independently distributed if all of the following conditions hold, with the exception of

a. $\Pr(Y = y \mid X = x) = \Pr(Y = y)$.
b. knowing the value of one of the variables provides no information about the other.
c. the conditional distribution of Y given X equals the marginal distribution of Y.
d. $E(Y) = E[E(Y \mid X)]$.

Answer: d

9) The correlation between X and Y

a. cannot be negative since variances are always positive.
b. is the covariance squared.
c. can be calculated by dividing the covariance between X and Y by the product of the two standard deviations.
d. is given by $\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)\,\mathrm{var}(Y)}$.

Answer: c

10) Two variables are uncorrelated in all of the cases below, with the exception of

a. being independent.
b. having a zero covariance.
c. $|\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2}$.
d. $E(Y \mid X) = 0$.

Answer: c

11) $\mathrm{var}(aX + bY) =$

a. $a^2 \sigma_X^2 + b^2 \sigma_Y^2$.
b. $a^2 \sigma_X^2 + 2ab\,\sigma_{XY} + b^2 \sigma_Y^2$.
c. $\sigma_{XY} + \mu_X \mu_Y$.
d. $a \sigma_X^2 + b \sigma_Y^2$.

Answer: b

12) To standardize a variable you

a. subtract its mean and divide by its standard deviation.
b. integrate the area below two points under the normal distribution.
c. add and subtract 1.96 times the standard deviation to the variable.
d. divide it by its standard deviation, as long as its mean is 1.

Answer: a

13) Assume that Y is normally distributed $N(\mu, \sigma^2)$. To find $\Pr(c_1 \le Y \le c_2)$, where $c_1 < c_2$ and $d_i = \frac{c_i - \mu}{\sigma}$, you need to calculate $\Pr(d_1 \le Z \le d_2) =$

a. $\Phi(d_2) - \Phi(d_1)$.
b. $\Phi(1.96) - \Phi(-1.96)$.
c. $\Phi(d_2) - (1 - \Phi(d_1))$.
d. $1 - (\Phi(d_2) - \Phi(d_1))$.

Answer: a

The Student t distribution is

a. the distribution of the sum of m squared independent standard normal random variables.
b. the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m.
c. always well approximated by the standard normal distribution.
d. the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m.

Answer: d

17) When there are ∞ degrees of freedom, the $t_\infty$ distribution

a. can no longer be calculated.
b. equals the standard normal distribution.
c. has a bell shape similar to that of the normal distribution, but with “fatter” tails.
d. equals the $\chi^2_\infty$ distribution.

Answer: b

18) The sample average is a random variable and

a. is a single number and as a result cannot have a distribution.
b. has a probability distribution called its sampling distribution.
c. has a probability distribution called the standard normal distribution.
d. has a probability distribution that is the same as for the $Y_1, \ldots, Y_n$ i.i.d. variables.

Answer: b

19) To infer the political tendencies of the students at your college/university, you sample 150 of them. Only one of the following is a simple random sample: You

a. make sure that the proportion of minorities is the same in your sample as in the entire student body.
b. call every fiftieth person in the student directory at 9 a.m. If the person does not answer the phone, you pick the next name listed, and so on.
c. go to the main dining hall on campus and interview students randomly there.
d. have your statistical package generate 150 random numbers in the range from 1 to the total number of students in your academic institution, and then choose the corresponding names in the student telephone directory.

Answer: d

20) The variance of $\bar{Y}$, $\sigma_{\bar{Y}}^2$, is given by the following formula:

a. $\sigma_Y^2$.
b. $\frac{\sigma_Y}{\sqrt{n}}$.
c. $\frac{\sigma_Y^2}{n}$.
d. $\frac{\sigma_Y^2}{\sqrt{n}}$.

Answer: c

21) The mean of the sample average $\bar{Y}$, $E(\bar{Y})$, is

a. $\frac{1}{n} \mu_Y$.
b. $\mu_Y$.
c. $\frac{\mu_Y}{\sqrt{n}}$.
d. $\frac{\sigma_Y}{\mu_Y}$ for n > 30.

Answer: b

22) In econometrics, we typically do not rely on exact or finite sample distributions because

a. we have approximately an infinite number of observations (think of re-sampling).
b. variables typically are normally distributed.
c. the covariances of $Y_i, Y_j$ are typically not zero.
d. asymptotic distributions can be counted on to provide good approximations to the exact sampling distribution.

Answer: d

23) The central limit theorem states that

a. the distribution for $\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$ becomes arbitrarily well approximated by the standard normal distribution.
b. $\bar{Y} \xrightarrow{p} \mu_Y$.
c. the probability that $\bar{Y}$ is in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n increases for any constant $c > 0$.
d. the t distribution converges to the F distribution for approximately n > 30.

Answer: a

24) The covariance inequality states that

a. $0 \le \sigma_{XY}^2 \le 1$.
b. $\sigma_{XY}^2 \le \sigma_X^2 \sigma_Y^2$.
c. $\sigma_{XY}^2 - \sigma_X^2 \le \sigma_Y^2$.
d. $\sigma_{XY}^2 \le \frac{\sigma_X^2}{\sigma_Y^2}$.

Answer: b

Chapter 3

1) An estimator is

a. an estimate.
b. a formula that gives an efficient guess of the true population value.
c. a random variable.
d. a nonrandom number.

Answer: c

2) An estimate is

a. efficient if it has the smallest variance possible.
b. a nonrandom number.
c. unbiased if its expected value equals the population value.
d. another word for estimator.

Answer: b

3) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is consistent if

a. $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.
b. its mean square error is the smallest possible.
c. Y is normally distributed.
d. $\bar{Y} \xrightarrow{p} 0$.

Answer: a

4) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is more efficient when compared to another estimator $\tilde{\mu}_Y$, if

a. $E(\hat{\mu}_Y) > E(\tilde{\mu}_Y)$.
b. it has a smaller variance.
c. its c.d.f. is flatter than that of the other estimator.
d. both estimators are unbiased, and $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.

Answer: d

5) The standard error of $\bar{Y}$, $SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}}$, is given by the following formula:

a. $\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
b. $\frac{s_Y^2}{n}$.
c. $s_Y$.
d. $\frac{s_Y}{\sqrt{n}}$.

Answer: d

7) When you are testing a hypothesis against a two-sided alternative, then the alternative is written as

a. $E(Y) > \mu_{Y,0}$.
b. $E(Y) = \mu_{Y,0}$.
c. $\bar{Y} \ne \mu_{Y,0}$.
d. $E(Y) \ne \mu_{Y,0}$.

Answer: d

8) A scatterplot

a. shows how Y and X are related when their relationship is scattered all over the place.
b. relates the covariance of X and Y to the correlation coefficient.
c. is a plot of n observations on $X_i$ and $Y_i$, where each observation is represented by the point $(X_i, Y_i)$.
d. shows n observations of Y over time.

Answer: c

9) The following types of statistical inference are used throughout econometrics, with the exception of

a. confidence intervals.
b. hypothesis testing.
c. calibration.
d. estimation.

Answer: c

10) Among all unbiased estimators that are weighted averages of $Y_1, \ldots, Y_n$, $\bar{Y}$ is

a. the only consistent estimator of $\mu_Y$.
b. the most efficient estimator of $\mu_Y$.
c. a number which, by definition, cannot have a variance.
d. the most unbiased estimator of $\mu_Y$.

Answer: b

11) To derive the least squares estimator of $\mu_Y$, you find the estimator m which minimizes

a. $\sum_{i=1}^{n} (Y_i - m)^2$.
b. $\sum_{i=1}^{n} |Y_i - m|$.
c. $\sum_{i=1}^{n} m Y_i^2$.
d. $\sum_{i=1}^{n} (Y_i - m)$.

Answer: a

12) If the null hypothesis states $H_0: E(Y) = \mu_{Y,0}$, then a two-sided alternative hypothesis is

a. $H_1: E(Y) \ne \mu_{Y,0}$.
b. $H_1: E(Y) \approx \mu_{Y,0}$.
c. $H_1: \mu_Y < \mu_{Y,0}$.
d. $H_1: E(Y) > \mu_{Y,0}$.

Answer: a

14) A large p-value implies

a. rejection of the null hypothesis.
b. a large t-statistic.
c. a large $\bar{Y}^{act}$.
d. that the observed value $\bar{Y}^{act}$ is consistent with the null hypothesis.

Answer: d

15) The formula for the sample variance is

a. $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})$.
b. $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
c. $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \mu_Y)^2$.
d. $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (Y_i - \bar{Y})^2$.

Answer: b

16) Degrees of freedom

a. in the context of the sample variance formula means that estimating the mean uses up some of the information in the data.
b. is something that certain undergraduate majors at your university/college other than economics seem to have an ∞ amount of.
c. are (n-2) when replacing the population mean by the sample mean.
d. ensure that $s_Y^2 = \sigma_Y^2$.

Answer: a

17) The t-statistic is defined as follows:

a. $t = \frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y^2 / n}$.
b. $t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$.
c. $t = \frac{(\bar{Y} - \mu_{Y,0})^2}{SE(\bar{Y})}$.
d. 1.96.

Answer: b

18) The power of the test

a. is the probability that the test actually incorrectly rejects the null hypothesis when the null is true.
b. depends on whether you use $\bar{Y}$ or $\bar{Y}^2$ for the t-statistic.
c. is one minus the size of the test.
d. is the probability that the test correctly rejects the null when the alternative is true.

Answer: d

19) The sample covariance can be calculated in any of the following ways, with the exception of:

a. $\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
b. $\frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}$.
c. $\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)$.
d. $r_{XY} s_X s_Y$, where $r_{XY}$ is the correlation coefficient.

Answer: c

20) When the sample size n is large, the 90% confidence interval for $\mu_Y$ is

a. $\bar{Y} \pm 1.96\,SE(\bar{Y})$.
b. $\bar{Y} \pm 1.64\,SE(\bar{Y})$.
c. $\bar{Y} \pm 1.64\,\sigma_Y$.
d. $\bar{Y} \pm 1.96$.

Answer: b

21) The standard error for the difference in means of two random variables M and W, when the two population variances are different, is

a. $\frac{s_M^2 + s_W^2}{n_M + n_W}$.
b. $\frac{s_M}{n_M} + \frac{s_W}{n_W}$.
c. $\frac{1}{2}\left(\frac{s_M^2}{n_M} + \frac{s_W^2}{n_W}\right)$.
d. $\sqrt{\frac{s_M^2}{n_M} + \frac{s_W^2}{n_W}}$.

Answer: d

22) The following statement about the sample correlation coefficient is true.

a. $-1 \le r_{XY} \le 1$.
b. $r_{XY}^2 \xrightarrow{p} \mathrm{corr}(X_i, Y_i)$.
c. $|r_{XY}| < 1$.
d. $r_{XY} = \frac{s_{XY}^2}{s_X^2 s_Y^2}$.

Answer: a

23) The correlation coefficient

a. lies between zero and one.
b. is a measure of linear association.
c. is close to one if X causes Y.
d. takes on a high value if you have a strong nonlinear relationship.

Answer: b

Chapter 4

1) When the estimated slope coefficient in the simple regression model, $\hat{\beta}_1$, is zero, then

a. $R^2 = \bar{Y}$.
b. $0 < R^2 < 1$.
c. $R^2 = 0$.
d. $R^2 > (SSR/TSS)$.

Answer: c

2) Heteroskedasticity means that

a. homogeneity cannot be assumed automatically for the model. b. the variance of the error term is not constant. c. the observed units have different preferences. d. agents are not all rational. Answer: b

3) With heteroskedastic errors, the weighted least squares estimator is BLUE. You should use OLS with heteroskedasticity-robust standard errors because

a. this method is simpler.
b. the exact form of the conditional variance is rarely known.
c. the Gauss-Markov theorem holds.
d. your spreadsheet program does not have a command for weighted least squares.

Answer: b

4) Which of the following statements is correct?

a. TSS = ESS + SSR b. ESS = SSR + TSS c. ESS > TSS d. R2 = 1 – (ESS/TSS) Answer: a

5) Binary variables

a. are generally used to control for outliers in your sample. b. can take on more than two values. c. exclude certain individuals from your sample. d. can take on only two values.

Answer: d

6) When estimating a demand function for a good where quantity demanded is a linear function of the price, you should

a. not include an intercept because the price of the good is never zero.
b. use a one-sided alternative hypothesis to check the influence of price on quantity.
c. use a two-sided alternative hypothesis to check the influence of price on quantity.
d. reject the idea that price determines demand unless the coefficient is at least 1.96.

Answer: b

7) The reason why estimators have a sampling distribution is that

a. economics is not a precise science. b. individuals respond differently to incentives. c. in real life you typically get to sample many times. d. the values of the explanatory variable and the error term differ across samples. Answer: d

8) The OLS estimator is derived by

a. connecting the Yi corresponding to the lowest Xi observation with the Yi corresponding to the highest Xi observation.

b. making sure that the standard error of the regression equals the standard error of the slope estimator.

c. minimizing the sum of absolute residuals. d. minimizing the sum of squared residuals.

Answer: d

9) Interpreting the intercept in a sample regression function is

a. not reasonable because you never observe values of the explanatory variables around the origin.
b. reasonable because under certain conditions the estimator is BLUE.
c. reasonable if your sample contains values of Xi around the origin.
d. not reasonable because economists are interested in the effect of a change in X on the change in Y.

Answer: c

10) The sample average of the OLS residuals is

a. some positive number since OLS uses squares.
b. zero.
c. unobservable since the population regression function is unknown.
d. dependent on whether the explanatory variable is mostly positive or negative.

Answer: b

11) The t-statistic is calculated by dividing

a. the OLS estimator by its standard error. b. the slope by the standard deviation of the explanatory variable. c. the estimator minus its hypothesized value by the standard error of the estimator. d. the slope by 1.96. Answer: c

12) The slope estimator, β1, has a smaller standard error, other things equal, if

a. there is more variation in the explanatory variable, X. b. there is a large variance of the error term, u. c. the sample size is smaller. d. the intercept, β0, is small. Answer: a

13) The regression R2 is a measure of

a. whether or not X causes Y. b. the goodness of fit of your regression line. c. whether or not ESS > TSS. d. the square of the determinant of R. Answer: b

14) (Requires Appendix) The sample regression line estimated by OLS

a. will always have a slope smaller than the intercept.
b. is exactly the same as the population regression line.
c. cannot have a slope of zero.
d. will always run through the point $(\bar{X}, \bar{Y})$.

Answer: d

15) The confidence interval for the sample regression function slope

a. can be used to conduct a test about a hypothesized population regression function slope.

b. can be used to compare the value of the slope relative to that of the intercept. c. adds and subtracts 1.96 from the slope. d. allows you to make statements about the economic importance of your estimate. Answer: a

16) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution, you can

a. reject the null hypothesis.
b. safely assume that your regression results are significant.
c. reject the assumption that the error terms are homoskedastic.
d. conclude that most of the actual values are very close to the regression line.

Answer: a

17) Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi

being i.i.d., and Xi and ui having finite fourth moments), the OLS estimator for the slope and intercept

a. has an exact normal distribution for n > 15. b. is BLUE. c. has a normal distribution even in small samples. d. is unbiased. Answer: d

18) To obtain the slope estimator using the least squares principle, you divide the

a. sample variance of X by the sample variance of Y. b. sample covariance of X and Y by the sample variance of Y. c. sample covariance of X and Y by the sample variance of X. d. sample variance of X by the sample covariance of X and Y. Answer: c

19) To decide whether or not the slope coefficient is large or small,

a. you should analyze the economic importance of a given increase in X. b. the slope coefficient must be larger than one. c. the slope coefficient must be statistically significant. d. you should change the scale of the X variable if the coefficient appears to be too

small. Answer: a

20) E(ui | Xi) = 0 says that

a. dividing the error by the explanatory variable results in a zero (on average).
b. the sample regression function residuals are unrelated to the explanatory variable.
c. the sample mean of the Xs is much larger than the sample mean of the errors.
d. the conditional distribution of the error given the explanatory variable has a zero mean.

Answer: d

21) In the linear regression model, $Y_i = \beta_0 + \beta_1 X_i + u_i$, $\beta_0 + \beta_1 X_i$ is referred to as

a. the population regression function.
b. the sample regression function.
c. exogenous variation.
d. the right-hand variable or regressor.

Answer: a

22) Multiplying the dependent variable by 100 and the explanatory variable by 100,000 leaves the

a. OLS estimate of the slope the same. b. OLS estimate of the intercept the same. c. regression R2 the same. d. heteroskedasticity-robust standard errors of the OLS estimators the same.

Answer: c

Analytical Questions

Chapter 2

1) Think of the situation of rolling two dice and let M denote the sum of the number of dots on the two dice. (So M is a number between 2 and 12.)

(a) In a table, list all of the possible outcomes for the random variable M together with its probability distribution and cumulative probability distribution. Sketch both distributions.

Answer:

Outcome (sum of dots)                  2      3      4      5      6      7      8      9      10     11     12
Probability distribution               0.028  0.056  0.083  0.111  0.139  0.167  0.139  0.111  0.083  0.056  0.028
Cumulative probability distribution    0.028  0.083  0.167  0.278  0.417  0.583  0.722  0.833  0.917  0.972  1.000

[Figure: Probability and Cumulative Probability Distribution of Number of Dots. Horizontal axis: Number of Dots (2 to 12); vertical axis: Probability; series: Probability, Cumulative Probability.]

(b) Calculate the expected value and the standard deviation for M.

Answer: 7.0; 2.42.

(c) Looking at the sketch of the probability distribution, you notice that it resembles a normal distribution. Should you be able to use the standard normal distribution to calculate probabilities of events? Why or why not?

Answer: You cannot use the normal distribution (without continuity correction) to calculate probabilities of events. M is discrete, while the normal distribution is continuous and therefore assigns a probability of zero to any single outcome such as M = 7.

(d) What is the probability of the following outcomes?

(i) Pr(M = 7)
(ii) Pr(M = 2 or M = 10)
(iii) Pr(M = 4 or M ≠ 4)
(iv) Pr(M = 6 and M = 9)
(v) Pr(M < 8)
(vi) Pr(M = 6 or M > 10)

Answer: (i) 0.167 or $\frac{6}{36} = \frac{1}{6}$; (ii) 0.111 or $\frac{4}{36} = \frac{1}{9}$; (iii) 1; (iv) 0; (v) 0.583; (vi) 0.222 or $\frac{8}{36} = \frac{2}{9}$.
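The answers to this problem can be checked by enumerating the 36 equally likely outcomes of two fair dice. The following is a minimal sketch; the names `pmf`, `cdf`, `prob`, and `answers` are illustrative and do not come from the problem set.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Exact probability distribution of M, the sum of two fair six-sided dice.
pmf = {}
for first, second in product(range(1, 7), repeat=2):
    total = first + second
    pmf[total] = pmf.get(total, Fraction(0)) + Fraction(1, 36)

# Cumulative distribution: Pr(M <= m) for each possible sum m.
cdf = {}
running = Fraction(0)
for m in sorted(pmf):
    running += pmf[m]
    cdf[m] = running

# Expected value and standard deviation of M.
mean = sum(m * p for m, p in pmf.items())
variance = sum((m - mean) ** 2 * p for m, p in pmf.items())
std_dev = sqrt(variance)


def prob(event):
    """Probability that M satisfies the predicate `event`."""
    return sum(p for m, p in pmf.items() if event(m))


# The six events from part (d).
answers = {
    "i": prob(lambda m: m == 7),
    "ii": prob(lambda m: m == 2 or m == 10),
    "iii": prob(lambda m: m == 4 or m != 4),
    "iv": prob(lambda m: m == 6 and m == 9),
    "v": prob(lambda m: m < 8),
    "vi": prob(lambda m: m == 6 or m > 10),
}

if __name__ == "__main__":
    for m in sorted(pmf):
        print(m, round(float(pmf[m]), 3), round(float(cdf[m]), 3))
    print("mean:", float(mean), "std dev:", round(std_dev, 2))
    for label, value in answers.items():
        print(label, value, round(float(value), 3))
```

Using `Fraction` keeps every probability exact, so the rounded decimals in the table can be compared against the exact fractions.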

2) Probabilities and relative frequencies are related in that the probability of an outcome is the proportion of the time that the outcome occurs in the long run. Hence concepts of joint, marginal, and conditional probability distributions stem from related concepts of frequency distributions.

You are interested in investigating the relationship between the age of heads of households and weekly earnings of households. The accompanying data gives the number of occurrences grouped by age and income. You collect data from 1,744 individuals and think of these individuals as a population that you want to describe, rather than a sample from which you want to infer behavior of a larger population. After sorting the data, you generate the accompanying table:

Joint Absolute Frequencies of Age and Income, 1,744 Households

                            Age of head of household
                            X1            X2            X3            X4            X5
Household Income            16-under 20   20-under 25   25-under 45   45-under 65   65 and >
Y1  $0-under $200           80            76            130           86            24
Y2  $200-under $400         13            90            346           140           8
Y3  $400-under $600         0             19            251           101           6
Y4  $600-under $800         1             11            110           55            1
Y5  $800 and >              1             1             108           84            2

The median of the income group of $800 and above is $1,050.

(a) Calculate the joint relative frequencies and the marginal relative frequencies. Interpret one of each of these. Sketch the cumulative income distribution.

Answer: The joint relative frequencies and marginal relative frequencies are given in the accompanying table. 5.2 percent of the individuals are between the age of 20 and 24, and make between $200 and under $400. 21.6 percent of the individuals earn between $400 and under $600.

Joint Relative and Marginal Frequencies of Age and Income, 1,744 Households

                            Age of head of household
                            X1            X2            X3            X4            X5
Household Income            16-under 20   20-under 25   25-under 45   45-under 65   65 and >    Total
Y1  $0-under $200           0.046         0.044         0.075         0.049         0.014       0.227
Y2  $200-under $400         0.007         0.052         0.198         0.080         0.005       0.342
Y3  $400-under $600         0.000         0.011         0.144         0.058         0.003       0.216
Y4  $600-under $800         0.001         0.006         0.063         0.032         0.001       0.102
Y5  $800 and >              0.001         0.001         0.062         0.048         0.001       0.112

[Figure: Cumulative Income Distribution. Horizontal axis: Income Class ($0-<$200, $200-<$400, $400-<$600, $600-<$800, $800 and >); vertical axis: Percent (0 to 1).]

(b) Calculate the conditional relative income frequencies for the two age categories 16-under 20, and 45-under 65. Calculate the mean household income for both age categories.

Answer: The mean household income for the 16-under 20 age category is roughly $144. It is approximately $489 for the 45-under 65 age category.

Conditional Relative Frequencies of Income and Age 16-under 20, and 45-under 65, 1,744 Households

                            Age of head of household
                            X1            X4
Household Income            16-under 20   45-under 65
Y1  $0-under $200           0.842         0.185
Y2  $200-under $400         0.137         0.300
Y3  $400-under $600         0.000         0.217
Y4  $600-under $800         0.011         0.118
Y5  $800 and >              0.011         0.180
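The conditional relative frequencies and conditional means in part (b) can be reproduced from the table of absolute frequencies. This is a sketch under one labeled assumption: each closed income class is represented by its midpoint ($100, $300, $500, $700), and the open top class by the stated median of $1,050. The function names are illustrative.

```python
# Joint absolute frequencies: rows are income classes Y1..Y5,
# columns are age groups X1..X5 (taken from the frequency table).
counts = [
    [80, 76, 130, 86, 24],
    [13, 90, 346, 140, 8],
    [0, 19, 251, 101, 6],
    [1, 11, 110, 55, 1],
    [1, 1, 108, 84, 2],
]

# ASSUMPTION: representative income per class is the class midpoint,
# with the stated median of $1,050 used for the open "$800 and >" class.
income = [100, 300, 500, 700, 1050]


def conditional_income_distribution(age_col):
    """Relative income frequencies within one age column (0-based index)."""
    column = [row[age_col] for row in counts]
    total = sum(column)
    return [c / total for c in column]


def conditional_mean_income(age_col):
    """Mean household income conditional on one age group."""
    dist = conditional_income_distribution(age_col)
    return sum(p * y for p, y in zip(dist, income))


if __name__ == "__main__":
    for label, col in [("16-under 20", 0), ("45-under 65", 3)]:
        dist = [round(p, 3) for p in conditional_income_distribution(col)]
        print(label, dist, round(conditional_mean_income(col)))
```

Each conditional distribution divides a column of counts by that column's total, so it sums to one by construction.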

(c) If household income and age of head of household were independently distributed, what would you expect these two conditional relative income distributions to look like? Are they similar here?

Answer: They would have to be identical, which they clearly are not.

(d) Your textbook has given you a primary definition of independence that does not involve conditional relative frequency distributions. What is that definition? Do you think that age and income are independent here, using this definition?

Answer: $\Pr(Y = y, X = x) = \Pr(Y = y) \times \Pr(X = x)$. We can check this by multiplying two marginal probabilities to see if this results in the joint probability. For example, $\Pr(Y = Y_3) = 0.216$ and $\Pr(X = X_3) = 0.542$, resulting in a product of 0.117, which does not equal the joint probability of 0.144. Given that we are looking at the data as a population, not a sample, we do not have to test how “close” 0.117 is to 0.144.

3) Math and verbal SAT scores are each distributed normally with $N(500, 10000)$.

(a) What fraction of students scores above 750? Above 600? Between 420 and 530? Below 480? Above 530?

Answer: Pr(Y>750) = 0.0062; Pr(Y>600) = 0.1587; Pr(420<Y<530) = 0.4061; Pr(Y<480) = 0.4207; Pr(Y>530) = 0.3821.
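The probabilities in part (a) follow from standardizing each cutoff with mean 500 and standard deviation 100 (the square root of 10000) and evaluating the standard normal c.d.f. The sketch below uses only the standard library; `normal_cdf` is an illustrative helper built on `math.erf`, not something taken from the textbook.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Pr(Y <= x) for Y ~ N(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


MEAN = 500.0
SD = 100.0  # N(500, 10000) has variance 10000, so the standard deviation is 100

p_above_750 = 1.0 - normal_cdf(750, MEAN, SD)
p_above_600 = 1.0 - normal_cdf(600, MEAN, SD)
p_between_420_530 = normal_cdf(530, MEAN, SD) - normal_cdf(420, MEAN, SD)
p_below_480 = normal_cdf(480, MEAN, SD)
p_above_530 = 1.0 - normal_cdf(530, MEAN, SD)

if __name__ == "__main__":
    for name, value in [
        ("Pr(Y > 750)", p_above_750),
        ("Pr(Y > 600)", p_above_600),
        ("Pr(420 < Y < 530)", p_between_420_530),
        ("Pr(Y < 480)", p_below_480),
        ("Pr(Y > 530)", p_above_530),
    ]:
        print(name, round(value, 4))
```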

(b) If the math and verbal scores were independently distributed, which is not the case, then what would be the distribution of the overall SAT score? Find its mean and variance.

Answer: The distribution would be $N(1000, 20000)$, using equations (2.29) and (2.31) in the textbook. Note that the standard deviation is now roughly 141 rather than 200.

(c) Next, assume that the correlation coefficient between the math and verbal scores is 0.75. Find the mean and variance of the resulting distribution.

Answer: Given the correlation coefficient, the distribution is now $N(1000, 35000)$, which has a standard deviation of approximately 187.

(d) Finally, assume that you had chosen 25 students at random who had taken the SAT exam. Derive the distribution for their average math SAT score. What is the probability that this average is above 530? Why is this so much smaller than your answer in (a)?

Answer: The distribution for the average math SAT score is $N(500, 400)$. $\Pr(\bar{Y} > 530) = 0.0668$. This probability is smaller because the sample mean has a smaller standard deviation (20 rather than 100).
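Parts (b) through (d) rest on two formulas: var(X + Y) = var(X) + var(Y) + 2 corr(X, Y) sd(X) sd(Y) for the total score, and var of the sample mean = var(Y) / n for the average of n independent draws. A short sketch with illustrative names:

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Pr(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


SD_MATH = 100.0
SD_VERBAL = 100.0


def total_score_variance(correlation):
    """Variance of math + verbal for a given correlation coefficient."""
    return SD_MATH**2 + SD_VERBAL**2 + 2.0 * correlation * SD_MATH * SD_VERBAL


var_independent = total_score_variance(0.0)   # part (b)
var_correlated = total_score_variance(0.75)   # part (c)

# Part (d): average math score of n = 25 randomly chosen students.
n = 25
var_sample_mean = SD_MATH**2 / n
sd_sample_mean = sqrt(var_sample_mean)
p_mean_above_530 = 1.0 - standard_normal_cdf((530.0 - 500.0) / sd_sample_mean)

if __name__ == "__main__":
    print(var_independent, round(sqrt(var_independent)))
    print(var_correlated, round(sqrt(var_correlated)))
    print(var_sample_mean, sd_sample_mean, round(p_mean_above_530, 4))
```

The cutoff of 530 is only 0.3 standard deviations above the mean for a single score, but 1.5 standard deviations above it for the average of 25 scores, which is why the tail probability shrinks.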

You have read about the so-called catch-up theory by economic historians, whereby nations that are further behind in per capita income grow faster subsequently. If this is true systematically, then eventually laggards will reach the leader. To put the theory to the test, you collect data on relative (to the United States) per capita income for two years, 1960 and 1990, for 24 OECD countries. You think of these countries as a population you want to describe, rather than a sample from which you want to infer behavior of a larger population. The relevant data for this question is as follows:

$Y$      $X_1$    $X_2$    $Y \times X_1$   $Y^2$     $X_1^2$   $X_2^2$
0.023    0.770    1.030    0.018            0.00053   0.593     1.0609
0.014    1.000    1.000    0.014            0.00020   1.000     1.0000
….       ….       ….       ….               ….        ….        ….
0.041    0.200    0.450    0.008            0.00168   0.040     0.2025
0.033    0.130    0.230    0.004            0.00109   0.017     0.0529
0.625    13.220   17.800   0.294            0.01877   8.529     13.9164

where $X_1$ and $X_2$ are per capita income relative to the United States in 1960 and 1990 respectively, and Y is the average annual growth rate in X over the 1960-1990 period. Numbers in the last row represent sums of the columns above.

(a) Calculate the variance and standard deviation of $X_1$ and $X_2$. For a catch-up effect to be present, what relationship must the two standard deviations show? Is this the case here?

Answer: The variances of $X_1$ and $X_2$ are 0.0520 and 0.0298 respectively, with standard deviations of 0.2279 and 0.1726. For the catch-up effect to be present, the standard deviation would have to shrink over time. This is the case here.
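Because the 24 countries are treated as a population, the variances in part (a) can be computed directly from the column sums with var(X) = (sum of X squared)/n minus the squared mean. A minimal sketch using the sums from the last row of the table; the function name is illustrative.

```python
from math import sqrt

N_COUNTRIES = 24  # the 24 OECD countries, treated as a population


def population_variance(sum_x, sum_x_squared, n=N_COUNTRIES):
    """Population variance from column sums: E(X^2) - [E(X)]^2."""
    mean = sum_x / n
    return sum_x_squared / n - mean**2


# Column sums taken from the last row of the data table.
var_x1 = population_variance(13.220, 8.529)
var_x2 = population_variance(17.800, 13.9164)
sd_x1 = sqrt(var_x1)
sd_x2 = sqrt(var_x2)

if __name__ == "__main__":
    print(round(var_x1, 4), round(sd_x1, 4))
    print(round(var_x2, 4), round(sd_x2, 4))
```

Dividing by n rather than n - 1 matches the problem's instruction to describe a population rather than estimate from a sample.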

(b) Calculate the correlation between Y and $X_1$. What sign must the correlation coefficient have for there to be evidence of a catch-up effect? Explain.

Answer: The correlation coefficient is –0.88. It has to be negative for there to be evidence of a catch-up effect. If countries that were relatively ahead in terms of per capita income in the initial period grow by relatively less over time, then eventually the laggards will catch up.

4) Following Alfred Nobel’s will, there are five Nobel Prizes awarded each year. These are for outstanding achievements in Chemistry, Physics, Physiology or Medicine, Literature, and Peace. In 1968, the Bank of Sweden added a prize in Economic Sciences in memory of Alfred Nobel. You think of the data as describing a population, rather than a sample from which you want to infer behavior of a larger population. The accompanying table lists the joint probability distribution between recipients in economics and the other five prizes, and the citizenship of the recipients, based on the 1969-2001 period.

Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines, and Citizenship, 1969-2001

                                                       U.S. Citizen   Non-U.S. Citizen   Total
                                                       (Y = 0)        (Y = 1)
Economics Nobel Prize (X = 0)                          0.118          0.049              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                          0.345          0.488              0.833
Total                                                  0.463          0.537              1.00

(a) Compute $E(Y)$ and interpret the resulting number.

Answer: $E(Y) = 0.537$. 53.7 percent of Nobel Prize winners were non-U.S. citizens.

(b) Calculate and interpret $E(Y \mid X = 1)$ and $E(Y \mid X = 0)$.

Answer: $E(Y \mid X = 1) = 0.586$. 58.6 percent of Nobel Prize winners in non-economics disciplines were non-U.S. citizens. $E(Y \mid X = 0) = 0.293$. 29.3 percent of the Economics Nobel Prize winners were non-U.S. citizens.

(c) A randomly selected Nobel Prize winner reports that he is a non-U.S. citizen. What is the probability that this genius has won the Economics Nobel Prize? A Nobel Prize in the other five disciplines?

Answer: There is a 9.1 percent chance that he has won the Economics Nobel Prize, and a 90.9 percent chance that he has won a Nobel Prize in one of the other five disciplines.
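The answers to (a) through (c) all come from dividing a joint probability by the appropriate marginal. Because Y is binary, E(Y | X = x) equals Pr(Y = 1 | X = x). The sketch below stores the joint table in a dictionary keyed by (x, y); the helper names are illustrative.

```python
# Joint probabilities Pr(X = x, Y = y) from the table, keyed by (x, y).
# X = 0: Economics prize, X = 1: one of the other five prizes.
# Y = 0: U.S. citizen,    Y = 1: non-U.S. citizen.
joint = {
    (0, 0): 0.118,
    (0, 1): 0.049,
    (1, 0): 0.345,
    (1, 1): 0.488,
}


def marginal_x(x):
    """Pr(X = x), summing the joint distribution over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """Pr(Y = y), summing the joint distribution over x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def prob_y_given_x(y, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)."""
    return joint[(x, y)] / marginal_x(x)


def prob_x_given_y(x, y):
    """Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y)."""
    return joint[(x, y)] / marginal_y(y)


e_y = marginal_y(1)                         # part (a)
e_y_given_x1 = prob_y_given_x(1, 1)         # part (b)
e_y_given_x0 = prob_y_given_x(1, 0)         # part (b)
p_econ_given_non_us = prob_x_given_y(0, 1)  # part (c)
p_other_given_non_us = prob_x_given_y(1, 1) # part (c)

if __name__ == "__main__":
    print(round(e_y, 3), round(e_y_given_x1, 3), round(e_y_given_x0, 3))
    print(round(p_econ_given_non_us, 3), round(p_other_given_non_us, 3))
```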

(d) Show what the joint distribution would look like if the two categories were independent.

Answer: Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines, and Citizenship, 1969-2001, under assumption of independence

                                                       U.S. Citizen   Non-U.S. Citizen   Total
                                                       (Y = 0)        (Y = 1)
Economics Nobel Prize (X = 0)                          0.077          0.090              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                          0.386          0.447              0.833
Total                                                  0.463          0.537              1.00

7) A few years ago the news magazine The Economist listed some of the stranger

explanations used in the past to predict presidential election outcomes. These included whether or not the hemlines of women’s skirts went up or down, stock market performances, baseball World Series wins by an American League team, etc. Thinking about this problem more seriously, you decide to analyze whether or not the presidential candidate for a certain party did better if his party controlled the house. Accordingly you collect data for the last 34 presidential elections. You think of this data as comprising a population which you want to describe, rather than a sample from which you want to infer behavior of a larger population. You generate the accompanying table:

Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996

                                   Democratic Control    Republican Control    Total
                                   of House (Y = 0)      of House (Y = 1)
Democratic President (X = 0)       0.412                 0.030                 0.441
Republican President (X = 1)       0.176                 0.382                 0.559
Total                              0.588                 0.412                 1.00

(a) Interpret one of the joint probabilities and one of the marginal probabilities.

Answer: 38.2 percent of the presidents were Republicans and were in the White House while Republicans controlled the House of Representatives. 44.1 percent of all presidents were Democrats.

(b) Compute $E(X)$. How does this differ from $E(X \mid Y = 0)$? Explain.

Answer: $E(X) = 0.559$. 55.9 percent of the presidents were Republicans. $E(X \mid Y = 0) = 0.299$. 29.9 percent of those presidents who were in office while Democrats had control of the House of Representatives were Republicans. $E(X)$ gives you the unconditional expected value, while $E(X \mid Y = 0)$ is the conditional expected value: it conditions on those periods during which Democrats had control of the House of Representatives, and ignores the other periods.

(c) If you picked one of the Republican presidents at random, what is the probability that during his term the Democrats had control of the House?

Answer: $\Pr(Y = 0 \mid X = 1) = \frac{0.176}{0.559} = 0.315$.

(d) What would the joint distribution look like under independence? Check your results by calculating the two conditional distributions and compare these to the marginal distribution.

Answer:

Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996, under the Assumption of Independence

                                Democratic Control    Republican Control    Total
                                of House (Y = 0)      of House (Y = 1)
Democratic President (X = 0)    0.259                 0.182                 0.441
Republican President (X = 1)    0.329                 0.230                 0.559
Total                           0.588                 0.412                 1.00

Pr(X = 0 | Y = 0) = 0.259/0.588 = 0.440 (there is a small rounding error).

Pr(Y = 1 | X = 1) = 0.230/0.559 = 0.411 (there is a small rounding error).
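The independence table can be reproduced with a few lines of Python. This is only a sketch: it takes the marginal probabilities as printed in the Total row and column, and the variable names are illustrative rather than anything from the textbook.

```python
# Marginal probabilities as printed in the Total row and column of the table.
p_x = [0.441, 0.559]   # Pr(X = 0), Pr(X = 1): Democratic, Republican president
p_y = [0.588, 0.412]   # Pr(Y = 0), Pr(Y = 1): Democratic, Republican House

# Under independence, each joint probability is the product of the marginals.
independent_joint = [[round(px * py, 3) for py in p_y] for px in p_x]

# Conditional distribution of X given Y = 0; up to rounding it should
# reproduce the marginal distribution of X.
cond_x_given_y0 = [independent_joint[x][0] / p_y[0] for x in range(2)]

for row in independent_joint:
    print(row)
print([round(p, 3) for p in cond_x_given_y0])
```

Because the cell entries are rounded to three decimals, the conditional probabilities match the marginals only approximately, which is the small rounding error noted in the answer.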

8) The expectations-augmented Phillips curve postulates

Δp = π − f(u − ū),

where Δp is the actual inflation rate, π is the expected inflation rate, and u is the unemployment rate, with the bar indicating equilibrium (the NAIRU, the Non-Accelerating Inflation Rate of Unemployment). Under the assumption of static expectations (π = Δp_{−1}), i.e., that you expect this period’s inflation rate to hold for the next period (“the sun shines today, it will shine tomorrow”), the prediction is that inflation will accelerate if the unemployment rate is below its equilibrium level. The accompanying table displays information on accelerating annual inflation and unemployment rate differences from the equilibrium rate (cyclical unemployment), where the latter is approximated by a five-year moving average. You think of this data as a population which you want to describe, rather than a sample from which you want to infer behavior of a larger population. The data is collected from United States quarterly data for the period 1964:1 to 1995:4.

Joint Distribution of Accelerating Inflation and Cyclical Unemployment, 1964:1-1995:4

                              (u − ū) > 0    (u − ū) ≤ 0    Total
                              (Y = 0)        (Y = 1)
Δp − Δp_{−1} > 0 (X = 0)      0.156          0.383          0.539
Δp − Δp_{−1} ≤ 0 (X = 1)      0.297          0.164          0.461
Total                         0.453          0.547          1.00

(a) Compute E(Y) and E(X), and interpret both numbers.

Answer: E(Y) = 0.547. In 54.7 percent of the quarters, unemployment was at or below its equilibrium level (no positive cyclical unemployment). E(X) = 0.461. In 46.1 percent of the quarters, inflation did not accelerate.

(b) Calculate E(Y | X = 1) and E(Y | X = 0). If there was independence between cyclical unemployment and acceleration in the inflation rate, what would you expect the relationship between the two expected values to be? Given that the two means are different, is this sufficient to assume that the two variables are independent?

Answer: E(Y | X = 1) = 0.356; E(Y | X = 0) = 0.711. You would expect the two conditional expectations to be the same. In general, independence in means does not imply statistical independence, although the reverse is true.

(c) What is the probability that inflation increases if there is positive cyclical unemployment? Negative cyclical unemployment?

Answer: There is a 34.4 percent probability that inflation increases if there is positive cyclical unemployment (0.156/0.453). There is a 70 percent probability that inflation increases if there is negative cyclical unemployment (0.383/0.547).

(d) You randomly select one of the 59 quarters when there was positive cyclical unemployment ((u − ū) > 0). What is the probability there was decelerating inflation during that quarter?

Answer: There is a 65.6 percent probability that inflation decelerates when there is positive cyclical unemployment (0.297/0.453).
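The conditional expectations and conditional probabilities in this problem all come from dividing a joint probability by a marginal probability. A minimal Python sketch, using the table entries above and hypothetical helper names, is:

```python
# Joint distribution keyed by (x, y):
#   x = 0 if inflation accelerates, 1 otherwise
#   y = 0 if cyclical unemployment is positive, 1 otherwise
joint = {(0, 0): 0.156, (0, 1): 0.383,
         (1, 0): 0.297, (1, 1): 0.164}

def marginal_x(x):
    """Pr(X = x), summing the joint distribution over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """Pr(Y = y), summing the joint distribution over x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_exp_y_given_x(x):
    """E(Y | X = x) = sum over y of y * Pr(X = x, Y = y) / Pr(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)

def cond_prob_x_given_y(x, y):
    """Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y)."""
    return joint[(x, y)] / marginal_y(y)

print(round(cond_exp_y_given_x(1), 3), round(cond_exp_y_given_x(0), 3))
print(round(cond_prob_x_given_y(0, 0), 3), round(cond_prob_x_given_y(0, 1), 3))
print(round(cond_prob_x_given_y(1, 0), 3))
```

Since Y is binary, E(Y | X = x) is simply Pr(Y = 1 | X = x).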

9) The accompanying table shows the joint distribution between the change of the unemployment rate in an election year and the share of the candidate of the incumbent party since 1928. You think of this data as a population which you want to describe, rather than a sample from which you want to infer behavior of a larger population.

Joint Distribution of Unemployment Rate Change and Incumbent Party’s Vote Share in Total Vote Cast for the Two Major-Party Candidates, 1928-2000

                    (Incumbent − 50%) > 0    (Incumbent − 50%) ≤ 0    Total
                    (Y = 0)                  (Y = 1)
Δu > 0 (X = 0)      0.053                    0.211                    0.264
Δu ≤ 0 (X = 1)      0.579                    0.157                    0.736
Total               0.632                    0.368                    1.00

(a) Compute and interpret E(Y) and E(X).

Answer: E(Y) = 0.368; E(X) = 0.736. The probability of an incumbent to have less than 50% of the share of votes cast for the two major-party candidates is 0.368. The probability of observing falling unemployment rates during the election year is 73.6 percent.

(b) Calculate E(Y | X = 1) and E(Y | X = 0). Did you expect these to be very different?

Answer: E(Y | X = 1) = 0.213; E(Y | X = 0) = 0.799. A student who believes that incumbents will attempt to manipulate the economy to win elections will answer affirmatively here.

(c) What is the probability that the unemployment rate decreases in an election year?

Answer: Pr(X = 1) = 0.736.

(d) Conditional on the unemployment rate decreasing, what is the probability that an incumbent will lose the election?

Answer: Pr(Y = 1 | X = 1) = 0.213.

(e) What would the joint distribution look like under independence?

Answer:

Joint Distribution of Unemployment Rate Change and Incumbent Party’s Vote Share in Total Vote Cast for the Two Major-Party Candidates, 1928-2000, under Assumption of Statistical Independence

                    (Incumbent − 50%) > 0    (Incumbent − 50%) ≤ 0    Total
                    (Y = 0)                  (Y = 1)
Δu > 0 (X = 0)      0.167                    0.097                    0.264
Δu ≤ 0 (X = 1)      0.465                    0.271                    0.736
Total               0.632                    0.368                    1.00

10) The accompanying table lists the joint distribution of unemployment in the United States in 2001 by demographic characteristics (race and gender).

Joint Distribution of Unemployment by Demographic Characteristics, United States, 2001

                            White      Black and Other    Total
                            (Y = 0)    (Y = 1)
Age 16-19 (X = 0)           0.13       0.05               0.18
Age 20 and above (X = 1)    0.60       0.22               0.82
Total                       0.73       0.27               1.00

(a) What is the percentage of unemployed white teenagers?

Answer: Pr(Y = 0, X = 0) = 0.13.

(b) Calculate the conditional distribution for the categories “white” and “black and other.”

Answer:

Conditional Distribution of Unemployment by Demographic Characteristics, United States, 2001

                            White      Black and Other
                            (Y = 0)    (Y = 1)
Age 16-19 (X = 0)           0.18       0.19
Age 20 and above (X = 1)    0.82       0.81
Total                       1.00       1.00

(c) Given your answer to the previous question, how do you reconcile it with the fact that the probability of finding an unemployed adult white person is 60%, while it is only 22% for the category “black and other”?

Answer: The original table showed the joint probability distribution, while the table in (b) presented the conditional probability distribution.

Mathematical and Graphical Problems

1) Think of an example involving five possible quantitative outcomes of a discrete random variable and attach a probability to each one of these outcomes. Display the outcomes, probability distribution, and cumulative probability distribution in a table. Sketch both the probability distribution and the cumulative probability distribution.

Answer: Answers will vary by student. The generated table should be similar to Table 2.1 in the text, and figures should resemble Figures 2.1 and 2.2 in the text.

2) The height of male students at your college/university is normally distributed with a mean of 70 inches and a standard deviation of 3.5 inches. If you had a list of telephone numbers for male students for the purpose of conducting a survey, what would be the probability of randomly calling one of these students whose height is

(a) taller than 6'0"?
(b) between 5'3" and 6'5"?
(c) shorter than 5'7", the mean height of female students?
(d) shorter than 5'0"?
(e) taller than Shaq O’Neal, the center of the L.A. Lakers, who is 7'1" tall? Compare this to the probability of a woman being pregnant for 10 months (300 days), where days of pregnancy is normally distributed with a mean of 266 days and a standard deviation of 16 days.

Answer: (a) Pr(Z > 0.5714) = 0.2839; (b) Pr(–2 < Z < 2) = 0.9545 or approximately 0.95; (c) Pr(Z < –0.8571) = 0.1957; (d) Pr(Z < –2.8571) = 0.0021; (e) Pr(Z > 4.2857) = 0.000009 (the text does not show values above 2.99 standard deviations; Pr(Z > 2.99) = 0.0014) and Pr(Z > 2.1250) = 0.0168.
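These probabilities can be checked without a printed table. The sketch below uses Python's standard-library `statistics.NormalDist`; the heights are converted to inches by hand, and the variable names are illustrative.

```python
from statistics import NormalDist

height = NormalDist(mu=70, sigma=3.5)      # male height in inches
pregnancy = NormalDist(mu=266, sigma=16)   # length of pregnancy in days

p_a = 1 - height.cdf(72)                   # taller than 6'0" (72 in.)
p_b = height.cdf(77) - height.cdf(63)      # between 5'3" and 6'5"
p_c = height.cdf(67)                       # shorter than 5'7"
p_d = height.cdf(60)                       # shorter than 5'0"
p_e = 1 - height.cdf(85)                   # taller than 7'1" (85 in.)
p_preg = 1 - pregnancy.cdf(300)            # pregnancy longer than 300 days

for label, p in [("a", p_a), ("b", p_b), ("c", p_c),
                 ("d", p_d), ("e", p_e), ("pregnancy", p_preg)]:
    print(f"{label}: {p:.6f}")
```

Unlike the textbook table, the cumulative distribution function here is not limited to 2.99 standard deviations, so the tail probability in (e) can be computed directly.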

3) Calculate the following probabilities using the standard normal distribution. Sketch the probability distribution in each case, shading in the area of the calculated probability.

(a) Pr(Z < 0.0)
(b) Pr(Z ≤ 1.0)
(c) Pr(Z > 1.96)
(d) Pr(Z < –2.0)
(e) Pr(Z > 1.645)
(f) Pr(Z > –1.645)
(g) Pr(–1.96 < Z < 1.96)
(h) Pr(Z < –2.576 or Z > 2.576)
(i) Pr(Z > z) = 0.10; find z.
(j) Pr(Z < –z or Z > z) = 0.05; find z.

Answer: (a) 0.5000; (b) 0.8413; (c) 0.0250; (d) 0.0228; (e) 0.0500; (f) 0.9500; (g) 0.9500; (h) 0.0100; (i) 1.2816; (j) 1.96.

4) Using the fact that the standardized variable Z is a linear transformation of the normally distributed random variable Y, derive the expected value and variance of Z.

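Answer: $Z = \frac{Y - \mu_Y}{\sigma_Y} = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y} Y = a + bY$, with $a = -\frac{\mu_Y}{\sigma_Y}$ and $b = \frac{1}{\sigma_Y}$. Given (2.29) and (2.30) in the text, $E(Z) = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y}\mu_Y = 0$, and $\sigma_Z^2 = \frac{1}{\sigma_Y^2}\sigma_Y^2 = 1$.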

5) Show in a scatterplot what the relationship between two variables X and Y would look like if there was

(a) a strong negative correlation.
(b) a strong positive correlation.
(c) no correlation.

Answer: [Scatterplots for (a), (b), and (c) not reproduced.]

(d) What would the correlation coefficient be if all observations for the two variables were on a curve described by Y = X²?

Answer: The correlation coefficient would be zero in this case, since the relationship is non-linear (this holds when the values of X are spread symmetrically around zero).

6) Find the following probabilities:

(a) Y is distributed χ² with 4 degrees of freedom. Find Pr(Y > 9.49).

Answer: 0.05.

(b) Y is distributed t with infinite degrees of freedom (t_∞). Find Pr(Y > –0.5).

Answer: 0.6915.

(c) Y is distributed F with (4, ∞) degrees of freedom. Find Pr(Y < 3.32).

Answer: 0.99.

(d) Y is distributed N(500, 10000). Find Pr(Y > 696 or Y < 304).

Answer: 0.05.

7) In considering the purchase of a certain stock, you attach the following probabilities to possible changes in the stock price over the next year.

Stock Price Change During Next Twelve Months (%)    Probability
+15                                                 0.2
+5                                                  0.3
0                                                   0.4
-5                                                  0.05
-15                                                 0.05

What is the expected value, the variance, and the standard deviation? Which is the most likely outcome? Sketch the cumulative distribution function.

Answer: E(Y) = 3.5; σ_Y² = 52.75; σ_Y = 7.26; most likely: 0.

[Figure: Stock Price Change During Next Twelve Months (cumulative distribution function); horizontal axis: Percentage Change (-15, -5, 0, +5, +15); vertical axis: Probability (0 to 1).]
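The moments of the stock price distribution follow directly from the definitions E(Y) = Σ y_i p_i and σ_Y² = Σ (y_i − μ_Y)² p_i. A short Python sketch (variable names are illustrative) that also builds the cumulative distribution to be sketched:

```python
import math
from itertools import accumulate

outcomes = [15, 5, 0, -5, -15]          # percentage change in the stock price
probs = [0.2, 0.3, 0.4, 0.05, 0.05]     # probabilities from the table

mean = sum(y * p for y, p in zip(outcomes, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(outcomes, probs))
std_dev = math.sqrt(variance)

# Most likely outcome: the one carrying the largest probability.
most_likely = outcomes[probs.index(max(probs))]

# Cumulative distribution, with outcomes sorted from smallest to largest.
ordered = sorted(zip(outcomes, probs))
cdf = [round(c, 2) for c in accumulate(p for _, p in ordered)]

print(mean, round(variance, 2), round(std_dev, 2), most_likely)
print([y for y, _ in ordered], cdf)
```

The cumulative probabilities printed in the last line are the step heights of the c.d.f. at each outcome.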

8) You consider visiting Montreal during the break between terms in January. You go to the relevant Web site of the official tourist office to figure out the type of clothes you should take on the trip. The site lists that the average high during January is –7°C, with a standard deviation of 4°C. Unfortunately you are more familiar with Fahrenheit than with Celsius, but find that the two are related by the following linear function:

$C = \frac{5}{9}(F - 32)$, or equivalently $F = 32 + \frac{9}{5}C$.

Find the mean and standard deviation for the January temperature in Montreal in Fahrenheit.

Answer: Using equations (2.29) and (2.30) from the textbook, the mean is 32 + 1.8 × (–7) = 19.4°F and the standard deviation is 1.8 × 4 = 7.2°F.

9) Two random variables are independently distributed if their joint distribution is the

product of their marginal distributions. It is intuitively easier to understand that two random variables are independently distributed if all conditional distributions of Y given X are equal. Derive one of the two conditions from the other.

Answer: If all conditional distributions of Y given X are equal, then

Pr(Y = y | X = 1) = Pr(Y = y | X = 2) = ... = Pr(Y = y | X = l).

But if all conditional distributions are equal, then they must also equal the marginal distribution, i.e., Pr(Y = y | X = x) = Pr(Y = y). Given the definition of the conditional distribution of Y given X = x, you then get

$\Pr(Y = y \mid X = x) = \frac{\Pr(Y = y, X = x)}{\Pr(X = x)} = \Pr(Y = y)$,

which gives you the condition Pr(Y = y, X = x) = Pr(Y = y) Pr(X = x).

10) There are frequently situations where you have information on the conditional distribution of Y given X, but are interested in the conditional distribution of X given Y. Recalling

$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$,

derive a relationship between Pr(X = x | Y = y) and Pr(Y = y | X = x). This is called Bayes’ theorem.

Answer: Given $\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$,

Pr(Y = y | X = x) × Pr(X = x) = Pr(X = x, Y = y);

similarly, $\Pr(X = x \mid Y = y) = \frac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$ and

Pr(X = x | Y = y) × Pr(Y = y) = Pr(X = x, Y = y).

Equating the two and solving for Pr(X = x | Y = y) then results in

$\Pr(X = x \mid Y = y) = \frac{\Pr(Y = y \mid X = x) \times \Pr(X = x)}{\Pr(Y = y)}$.
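As a numerical illustration of Bayes’ theorem, the sketch below reuses the joint distribution of presidential party (X) and House control (Y) from the earlier exercise in this problem set and checks that the direct conditional probability and the Bayes route agree. The function names are hypothetical.

```python
# Joint distribution keyed by (x, y), taken from the presidential party /
# House control table: x = 1 Republican president, y = 0 Democratic House.
joint = {(0, 0): 0.412, (0, 1): 0.030,
         (1, 0): 0.176, (1, 1): 0.382}

def pr_x(x):
    """Marginal probability Pr(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def pr_y(y):
    """Marginal probability Pr(Y = y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def pr_y_given_x(y, x):
    """Pr(Y = y | X = x) from the definition of a conditional distribution."""
    return joint[(x, y)] / pr_x(x)

def bayes_x_given_y(x, y):
    """Pr(X = x | Y = y) = Pr(Y = y | X = x) * Pr(X = x) / Pr(Y = y)."""
    return pr_y_given_x(y, x) * pr_x(x) / pr_y(y)

direct = joint[(1, 0)] / pr_y(0)      # Pr(X = 1 | Y = 0) computed directly
via_bayes = bayes_x_given_y(1, 0)     # the same quantity via Bayes' theorem
print(round(direct, 3), round(via_bayes, 3))
```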

11) You are at a college of roughly 1,000 students and obtain data from the entire freshman class (250 students) on height and weight during orientation. You consider this to be a population that you want to describe, rather than a sample from which you want to infer general relationships in a larger population. Weight (Y) is measured in pounds and height (X) is measured in inches. You calculate the following sums:

$\sum_{i=1}^{n} y_i^2 = 94{,}228.8$, $\sum_{i=1}^{n} x_i^2 = 1{,}248.9$, $\sum_{i=1}^{n} x_i y_i = 7{,}625.9$

(small letters refer to deviations from means, as in $z_i = Z_i - \bar{Z}$).

(a) Given your general knowledge about human height and weight of a given age, what can you say about the shape of the two distributions?

Answer: Both distributions are bound to be normal.

(b) What is the correlation coefficient between height and weight here?

Answer: 0.703.
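Because the sums are already in deviations from means, the correlation coefficient is Σ x_i y_i divided by the square root of Σ x_i² × Σ y_i² (the common factor 1/n cancels). A one-step check in Python, with illustrative variable names:

```python
import math

sum_y2 = 94228.8   # sum of squared weight deviations
sum_x2 = 1248.9    # sum of squared height deviations
sum_xy = 7625.9    # sum of cross products of deviations

# corr(X, Y) = cov(X, Y) / (sd(X) * sd(Y)); the 1/n factors cancel.
corr = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(corr, 3))
```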

12) Use the definition for the conditional distribution of Y given X = x and the marginal distribution of X to derive the formula for Pr(X = x, Y = y). This is called the multiplication rule. Use it to derive the probability for drawing two aces randomly from a deck of cards (no joker), where you do not replace the card after the first draw. Next, generalizing the multiplication rule and assuming independence, find the probability of having four girls in a family with four children.

Answer: $\frac{4}{52} \times \frac{3}{51} = 0.0045$; $\left(\frac{1}{2}\right)^4 = \frac{1}{16}$, or 0.0625.

13) The systolic blood pressure of females in their 20s is normally distributed with a mean of

120 with a standard deviation of 9. What is the probability of finding a female with a blood pressure of less than 100? More than 135? Between 105 and 123? You visit the women’s soccer team on campus, and find that the average blood pressure of the 25 members is 114. Is it likely that this group of women came from the same population?

Answer: Pr(Y < 100) = 0.0131; Pr(Y > 135) = 0.0478; Pr(105 < Y < 123) = 0.5828;

Pr(Ȳ < 114) = Pr(Z < –3.33) = 0.0004. (The smallest z-value listed in the table in the textbook is –2.99, which generates a probability value of 0.0014.) It is unlikely that this group of women came from the same population.

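14) Show that the correlation coefficient between Y and X is unaffected if you use a linear transformation in both variables. That is, show that corr(X*, Y*) = corr(X, Y), where X* = a + bX and Y* = c + dY, and where a, b, c, and d are arbitrary non-zero constants.

Answer: $\mathrm{corr}(X^*, Y^*) = \frac{\mathrm{cov}(X^*, Y^*)}{\sqrt{\mathrm{var}(X^*)}\sqrt{\mathrm{var}(Y^*)}} = \frac{bd\,\mathrm{cov}(X, Y)}{\sqrt{b^2\,\mathrm{var}(X)}\sqrt{d^2\,\mathrm{var}(Y)}} = \mathrm{corr}(X, Y)$.

(The last equality uses $\sqrt{b^2 d^2} = bd$, which requires b and d to have the same sign; if they have opposite signs, the correlation keeps its magnitude but changes sign.)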

15) The textbook formula for the variance of the discrete random variable Y is given as

$\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$.

Another commonly used formulation is

$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i - \mu_Y^2$.

Prove that the two formulas are the same.

Answer: $\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i = \sum_{i=1}^{k} (y_i^2 + \mu_Y^2 - 2\mu_Y y_i)\, p_i = \sum_{i=1}^{k} (y_i^2 p_i + \mu_Y^2 p_i - 2\mu_Y y_i p_i)$.

Moving the summation sign through results in

$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i + \mu_Y^2 \sum_{i=1}^{k} p_i - 2\mu_Y \sum_{i=1}^{k} y_i p_i$.

But $\sum_{i=1}^{k} p_i = 1$ and $\mu_Y = \sum_{i=1}^{k} y_i p_i$, giving you the second expression after simplification.

16) The Economic Report of the President gives the following age distribution of the United

States population for the year 2000:

United States Population By Age Group, 2000

Outcome (age category)    Under 5    5-15    16-19    20-24    25-44    45-64    65 and over
Percentage                0.06       0.16    0.06     0.07     0.30     0.22     0.13

Imagine that every person was assigned a unique number between 1 and 275,372,000 (the total population in 2000). If you generated a random number, what would be the probability that you had drawn someone older than 65 or under 16? Treating the percentages as probabilities, write down the cumulative probability distribution. What is the probability of drawing someone who is 24 years or younger?

Answer: Pr(Y < 16 or Y > 65) = 0.35;

Outcome (age category)    Under 5    5-15    16-19    20-24    25-44    45-64    65 and over
Cumulative probability    0.06       0.22    0.28     0.35     0.65     0.87     1.00

Pr(Y ≤ 24) = 0.35.

17) The accompanying table gives the outcomes and probability distribution of the number of

times a student checks her e-mail daily:

Probability of Checking E-Mail

Outcome (number of e-mail checks)    0       1       2       3       4       5       6
Probability                          0.05    0.15    0.30    0.25    0.15    0.08    0.02

Sketch the probability distribution. Next, calculate the c.d.f. for the above table. What is the probability of her checking her e-mail between 1 and 3 times a day? Of checking it more than 3 times a day?

Answer:

Outcome (number of e-mail checks)    0       1       2       3       4       5       6
Cumulative probability               0.05    0.20    0.50    0.75    0.90    0.98    1.00

Pr(1 ≤ Y ≤ 3) = 0.70; Pr(Y > 3) = 0.25.

[Figure: Cumulative Distribution Function; horizontal axis: Number of E-mail Checks (0 to 6); vertical axis: Probability (0 to 1).]
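The c.d.f. is the running sum of the probability distribution, and interval probabilities are differences of c.d.f. values. A brief Python sketch for the e-mail example (names are illustrative):

```python
from itertools import accumulate

outcomes = list(range(7))                              # 0, 1, ..., 6 checks
pmf = [0.05, 0.15, 0.30, 0.25, 0.15, 0.08, 0.02]       # Pr(Y = y)

# Running sum gives Pr(Y <= y); rounding removes floating-point noise.
cdf = [round(c, 2) for c in accumulate(pmf)]

p_1_to_3 = cdf[3] - cdf[0]       # Pr(1 <= Y <= 3) = Pr(Y <= 3) - Pr(Y <= 0)
p_more_than_3 = 1 - cdf[3]       # Pr(Y > 3) = 1 - Pr(Y <= 3)

print(dict(zip(outcomes, cdf)))
print(round(p_1_to_3, 2), round(p_more_than_3, 2))
```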

18) The accompanying table lists the outcomes and the probability distribution for a student renting videos during the week while on campus.

Video Rentals per Week during Semester

Outcome (number of weekly video rentals)    0       1       2       3       4       5       6
Probability                                 0.05    0.55    0.25    0.05    0.07    0.02    0.01

Sketch the probability distribution. Next, calculate the cumulative probability distribution for the above table. What is the probability of the student renting between 2 and 4 a week? Of less than 3 a week?

Answer: The cumulative probability distribution is given below. The probability of renting between two and four videos a week is 0.37. The probability of renting less than three a week is 0.85.

Outcome (number of weekly video rentals)    0       1       2       3       4       5       6
Cumulative probability                      0.05    0.60    0.85    0.90    0.97    0.99    1.00

19) The textbook mentioned that the mean of Y, E(Y), is called the first moment of Y, and that the expected value of the square of Y, E(Y²), is called the second moment of Y, and so on. These are also referred to as moments about the origin. A related concept is moments about the mean, which are defined as E[(Y − μ_Y)^r]. What do you call the second moment about the mean? What do you think the third moment, referred to as “skewness,” measures? Do you believe that it would be positive or negative for an earnings distribution? What measure of the third moment around the mean do you get for a normal distribution?

Answer: The second moment about the mean is the variance. Skewness measures the departure from symmetry. For the typical earnings distribution, it will be positive. For the normal distribution, it will be zero.

[Figure: Number of Weekly Video Rentals (probability distribution for problem 18); horizontal axis: Number of Rentals; vertical axis: Probability (0 to 0.6).]

20) Explain why the two probabilities are identical for the standard normal distribution:

Pr(–1.96 ≤ X ≤ 1.96) and Pr(–1.96 < X < 1.96).

Answer: For a continuous distribution, the probability of a point is zero.

Chapter 3

Think of at least nine examples, three of each, that display a positive, negative, or no correlation between two economic variables. In each of the positive and negative examples, indicate whether or not you expect the correlation to be strong or weak.

Answer: Answers will vary by student. Students frequently bring up the following

correlations. Positive correlations: earnings and education (hopefully strong), consumption and personal disposable income (strong), per capita income and investment-output ratio or saving rate (strong); negative correlation: Okun’s Law (strong), income velocity and interest rates (strong), the Phillips curve (strong); no correlation: productivity growth and initial level of per capita income for all countries of the world (beta-convergence regressions), consumption and the (real) interest rate, employment and real wages.

3) Adult males are taller, on average, than adult females. Visiting two recent American

Youth Soccer Organization (AYSO) under-12-year-old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender of children in 4th to 6th grade as part of her science project. The accompanying table shows her findings.

Height of Young Boys and Girls, Grades 4-6, in inches

          Mean (Ȳ)    Standard deviation (s)    n
Boys      57.8        3.9                       55
Girls     58.4        4.2                       57

(e) Let your null hypothesis be that there is no difference in the height of females and males at this age level. Specify the alternative hypothesis.

Answer: H0: μ_Boys − μ_Girls = 0 vs. H1: μ_Boys − μ_Girls ≠ 0.

(f) Find the difference in height and the standard error of the difference.

Answer: Ȳ_Boys − Ȳ_Girls = –0.6, $SE(\bar{Y}_{Boys} - \bar{Y}_{Girls}) = \sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77$.

(g) Generate a 95% confidence interval for the difference in height.

Answer: –0.6 ± 1.96 × 0.77 = (–2.11, 0.91).

(h) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1% level? Which critical value did you use? Why would this number be smaller if you had assumed a one-sided alternative hypothesis? What is the intuition behind this?

Answer: t = –0.78, so | t | < 2.58, which is the critical value at the 1% level. Hence you cannot reject the null hypothesis. The critical value for the one-sided hypothesis would have been 2.33. Assuming a one-sided hypothesis implies that you have some information about the problem at hand, and, as a result, can be more easily convinced than if you had no prior expectation.
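The standard error, confidence interval, and t-statistic for the difference in means can be computed in a few lines. The sketch below assumes unequal population variances, as in the answer, and uses illustrative variable names; because it does not round the standard error to 0.77 first, its interval endpoints differ from the ones above in the second decimal.

```python
import math

mean_boys, s_boys, n_boys = 57.8, 3.9, 55
mean_girls, s_girls, n_girls = 58.4, 4.2, 57

diff = mean_boys - mean_girls

# Standard error of the difference, allowing for unequal variances.
se = math.sqrt(s_boys ** 2 / n_boys + s_girls ** 2 / n_girls)

t_stat = diff / se
ci_95 = (diff - 1.96 * se, diff + 1.96 * se)

print(round(diff, 1), round(se, 2), round(t_stat, 2))
print(tuple(round(b, 2) for b in ci_95))
```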

4) Math SAT scores (Y) are normally distributed with a mean of 500 and a standard deviation of 100. An evening school advertises that it can improve students’ scores by roughly a third of a standard deviation, or 30 points, if they attend a course which runs over several weeks. (A similar claim is made for attending a verbal SAT course.) The statistician for a consumer protection agency suspects that the courses are not effective. She views the situation as follows: H0: μ_Y = 500 vs. H1: μ_Y = 530.

(e) Sketch the two distributions under the null hypothesis and the alternative hypothesis.

Answer: [Figure not reproduced.]

(f) The consumer protection agency wants to evaluate this claim by sending 50 students to attend classes. One of the students becomes sick during the course and drops out. What is the distribution of the average score of the remaining 49 students under the null, and under the alternative hypothesis?

Answer: Ȳ of the 49 participants is normally distributed, with a mean of 500 and a standard deviation of 14.286 under the null hypothesis. Under the alternative hypothesis, it is normally distributed with a mean of 530 and a standard deviation of 14.286.

(g) Assume that after graduating from the course, the 49 participants take the SAT test and score an average of 520. Is this convincing evidence that the school has fallen short of its claim? What is the p-value for such a score under the null hypothesis?

Answer: It is possible that the consumer protection agency had chosen a group of 49

students whose average score would have been 490 without attending the course. The crucial question is how likely it is that 49 students, chosen randomly from a population with a mean of 500 and a standard deviation of 100, will score an average of 520. The p-value for this score is 0.081, meaning that if the agency rejected the null hypothesis based on this evidence, it would make a mistake, on average, roughly 1 out of 12 times. Hence the average score of 520 would allow rejection of the null hypothesis that the school has had no effect on the SAT score of students at the 10% level.

(h) What would be the critical value under the null hypothesis if the size of your test were 5%?

Answer: The critical value would be 523.

(i) Given this critical value, what is the power of the test? What options does the statistician have for increasing the power in this situation?

Answer: Pr(Ȳ < 523 | H1 is true) = 0.312. Hence the power of the test is 0.688. She could increase the power by increasing the size of the test (accepting a higher probability of a type I error). Alternatively, she could try to convince the agency to hire more test subjects, i.e., she could increase the sample size.
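The p-value, critical value, and power in this problem all come from the two sampling distributions of the average score. A sketch using Python's standard-library `statistics.NormalDist` (variable names are illustrative) is given below. Note that the exact 5% critical value is about 523.5; the power of 0.688 quoted above corresponds to the rounded critical value of 523.

```python
import math
from statistics import NormalDist

sigma, n = 100, 49
se = sigma / math.sqrt(n)                  # standard deviation of the sample mean

null_dist = NormalDist(mu=500, sigma=se)   # distribution of the mean under H0
alt_dist = NormalDist(mu=530, sigma=se)    # distribution of the mean under H1

# One-sided p-value for an observed average score of 520.
p_value = 1 - null_dist.cdf(520)

# Critical value for a one-sided test of size 5%.
critical_value = null_dist.inv_cdf(0.95)

# Power: probability of exceeding the critical value when H1 is true.
power_exact = 1 - alt_dist.cdf(critical_value)
power_rounded = 1 - alt_dist.cdf(523)      # using the rounded critical value

print(round(se, 3), round(p_value, 3), round(critical_value, 1))
print(round(power_exact, 3), round(power_rounded, 3))
```

Raising the sample size shrinks `se`, which moves the critical value closer to 500 and raises the power for the same test size.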

5) Your packaging company fills various types of flour into bags. Recently there have been

complaints from one chain of stores: a customer returned one opened 5 pound bag which weighed significantly less than the label indicated. You view the weight of the bag as a random variable which is normally distributed with a mean of 5 pounds, and, after studying the machine specifications, a standard deviation of 0.05 pounds.

a. You take a sample of 20 bags and weigh them. Sketch below what the average pattern of individual weights might look like. Let the horizontal axis indicate the sampled bag number (1, 2, …, 20). On the vertical axis, mark the expected value of the weight under the null hypothesis, and two (≈ 1.96) standard deviations above and below the expected value. Draw a line through the graph for E(Y) + 2σ_Y, E(Y), and E(Y) – 2σ_Y. How many of the bags in a sample of 20 will you expect to weigh either less than 4.9 pounds or more than 5.1 pounds?

Answer: On average, there should be one bag in every sample of 20 which weighs less than 4.9 pounds or more than 5.1 pounds.

b. You sample 25 bags of flour and calculate the average weight. What is the distribution of the average weight of these 25 bags? Repeating the same exercise 20 times, sketch what the distribution of the average weights would look like in a graph similar to the one you drew in (a), where you have adjusted the standard error of Ȳ accordingly.

Answer: The average weight of 25 bags will be normally distributed, with a mean of 5 pounds and a standard deviation of 0.01 pounds.

c. For each of the twenty observations in (b), a 95% confidence interval is constructed. Draw these confidence intervals, using the same graph as in (b). How many of these 20 confidence intervals would you expect to contain 5 pounds under the null hypothesis?

Answer: You would expect 19 of the 20 confidence intervals to contain 5 pounds.

5) Assume that two presidential candidates, call them Bush and Gore, receive 50% of the votes in the population. You can model this situation as a Bernoulli trial, where Y is a random variable with success probability Pr(Y = 1) = p, and where Y = 1 if a person votes for Bush and Y = 0 otherwise. Furthermore, let p̂ be the fraction of successes (1s) in a sample, which is distributed N(p, p(1 − p)/n) in reasonably large samples, say for n ≥ 40.

(a) Given your knowledge about the population, find the probability that in a random sample of 40, Bush would receive a share of 40% or less.

Answer: $\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26) \approx 0.104$. In roughly every 10th sample of this size, Bush would receive a vote of less than 40%, although in truth, his share is 50%.

(b) How would this situation change with a random sample of 100?

Answer: $\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) \approx 0.023$. With this sample size, you would expect this to happen only every 50th sample.

(c) Given your answers in (a) and (b), would you be comfortable to predict what the voting intentions for the entire population are if you did not know p but had polled 10,000 individuals at random and calculated p̂? Explain.

Answer: The answers in (a) and (b) suggest that for even moderate increases in the sample size, the estimator does not vary too much from the population mean. Polling 10,000 individuals, the probability of finding a p̂ of 0.48 or less, for example, would be 0.00003. Unless the election was extremely close, which the 2000 election was, polls are quite accurate even for sample sizes of 2,500.

(d) This result seems to hold whether you poll 10,000 people at random in the Netherlands or the United States, where the former has a population of less than 20 million people, while the United States is 15 times as populous. Why does the population size not come into play?

Answer: The distribution of sample means shrinks very quickly depending on the sample size, not the population size. Although at first this does not seem intuitive, the standard error of an estimator is a value which indicates by how much the estimator varies around the population value. For large sample sizes, the sample mean typically is very close to the population mean.

6) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals using the Current Population Survey in a given year.

(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99% confidence interval for average earnings in the entire population. State the meaning of this interval in words, rather than just in numbers. If you constructed a 90% confidence interval instead, would it be smaller or larger? What is the intuition?

Answer: The confidence interval for mean weekly earnings is $434.49 \pm 2.57 \times \frac{294.67}{\sqrt{1744}}$ = 434.49 ± 18.13 = (416.36, 452.62). Based on the sample at hand, the best guess for the population mean is $434.49. However, because of random sampling error, this guess is likely to be wrong. Instead, the best guess is for the average earnings to lie between $416.36 and $452.62. Committing to such an interval repeatedly implies that the resulting statement is incorrect 1 out of 100 times. For a 90% confidence interval, the only change in the calculation of the confidence interval is to replace 2.57 by 1.64. Hence the confidence interval is smaller. A smaller interval implies, given the same average earnings and the standard deviation, that the statement will be false more often. The larger the confidence interval, the more likely it is to contain the population value.
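The comparison between the 99% and 90% intervals can be made concrete with a small helper. This is a sketch only: `mean_ci` is a hypothetical function name, and it uses the exact normal critical values (about 2.576 and 1.645) rather than the rounded 2.57 and 1.64, so the endpoints differ slightly from those above.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level):
    """Large-sample confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

ci_99 = mean_ci(434.49, 294.67, 1744, 0.99)
ci_90 = mean_ci(434.49, 294.67, 1744, 0.90)

print(tuple(round(b, 2) for b in ci_99))
print(tuple(round(b, 2) for b in ci_90))
```

The 90% interval is narrower than the 99% interval because it uses a smaller critical value with the same standard error.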

(b) When dividing your sample into people 45 years and older, and younger than 45, the information shown in the table is found.

Age Category    Average Earnings (Ȳ)    Standard Deviation (s_Y)    N
Age ≥ 45        $488.87                 $328.64                     507
Age < 45        $412.20                 $276.63                     1237

Test whether or not the difference in average earnings is statistically significant. Given your knowledge of age-earning profiles, does this result make sense?

Answer: Assuming unequal population variances,

$t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62$,

which is statistically significant at conventional levels whether you use a two-sided or one-sided alternative. Hence the null hypothesis of equal average earnings in the two groups is rejected. Age-earning profiles typically take on an inverted U-shape. Maximum earnings occur in the 40s, depending on some other factors such as years of education, which are not considered here. Hence it is not clear if the alternative hypothesis should be one-sided or two-sided. In such a situation, it is best to assume a two-sided alternative hypothesis.

7) A manufacturer claims that a certain brand of VCR player has an average life expectancy of 5 years and 6 months with a standard deviation of 1 year and 6 months. Assume that the life expectancy is normally distributed.

(a) Selecting one VCR player from this brand at random, calculate the probability of its life expectancy exceeding 7 years.

Answer: Pr(Y > 7) = Pr(Z > 1) = 0.1587.

(b) The Critical Consumer magazine decides to test fifty VCRs of this brand. The average life in this sample is 6 years and the sample standard deviation is 2 years. Calculate a 99% confidence interval for the average life.

Answer: $6 \pm 2.57 \times \frac{2}{\sqrt{50}}$ = 6 ± 0.73 = (5.27, 6.73).

(c) How many more VCRs would the magazine have to test in order to halve the width of the confidence interval?

Answer: $\frac{1}{2}\left(2.57 \times \frac{2}{\sqrt{50}}\right) = 2.57 \times \frac{1}{2} \times \frac{2}{\sqrt{50}} = 2.57 \times \frac{2}{\sqrt{4 \times 50}}$, or n = 200, i.e., 150 more VCRs than the 50 already tested.

8) U.S. News and World Report ranks colleges and universities annually. You randomly sample 100 of the national universities and liberal arts colleges from the year 2000 issue. The average cost, which includes tuition, fees, and room and board, is $23,571.49 with a standard deviation of $7,015.52.

(a) Based on this sample, construct a 95% confidence interval of the average cost of attending a university/college in the United States.

Answer: \(23,571.49 \pm 1.96 \times \frac{7,015.52}{\sqrt{100}} = 23,571.49 \pm 1,375.04 = (22,196.45,\ 24,946.53)\).

(b) Cost varies by quite a bit. One of the reasons may be that some universities/colleges have a better reputation than others. U.S. News and World Report tries to measure this factor by asking university presidents and chief academic officers about the reputation of institutions. The ranking is from 1 (“marginal”) to 5 (“distinguished”). You decide to split the sample according to whether the academic institution has a reputation of greater than 3.5 or not. For comparison, in 2000, Caltech had a reputation ranking of 4.7, Smith College had 4.5, and Auburn University had 3.1. This gives you the statistics shown in the accompanying table.

Reputation Category    Average Cost (Ȳ)    Standard Deviation of Cost (s_Y)    N
Ranking > 3.5          $29,311.31          $5,649.21                           29
Ranking ≤ 3.5          $21,227.06          $6,133.38                           71

Test the hypothesis that the average cost for all universities/colleges is the same independent of the reputation. What alternative hypothesis did you use?


Answer:

\[
t = \frac{29,311.31 - 21,227.06}{\sqrt{\dfrac{5,649.21^2}{29} + \dfrac{6,133.38^2}{71}}} = 6.33,
\]

which is statistically significant whether or not you use a one-sided or two-sided hypothesis test. Your prior expectation is that academic institutions with a

higher reputation will charge more for attending, and hence a one-sided alternative would have been appropriate here.

(c) What other factors should you consider before making a decision based on the data in (b)?

Answer: There may be other variables which potentially have an effect on the cost of

attending the academic institution. Some of these factors might be whether or not the college/university is private or public, its size, whether or not it has a religious affiliation, etc. It is only after controlling for these factors that the “pure” relationship between reputation and cost can be identified.

12) The development office and the registrar have provided you with anonymous matches of

starting salaries and GPAs for 108 graduating economics majors. Your sample contains a variety of jobs, from church pastor to stockbroker.

(a) The average starting salary for the 108 students was $38,644.86 with a standard deviation of $7,541.40. Construct a 95% confidence interval for the starting salary of all economics majors at your university/college.

Answer: \(38,644.86 \pm 1.96 \times \frac{7,541.40}{\sqrt{108}} = 38,644.86 \pm 1,422.32 = (37,222.54,\ 40,067.18)\).

(b) A similar sample for psychology majors indicates a significantly lower starting salary. Given that these students had the same number of years of education, does this indicate discrimination in the job market against psychology majors?

Answer: It suggests that the market values certain qualifications more highly than others. Comparing means and identifying that one is significantly lower than others does not indicate discrimination.

(c) You wonder if it pays (no pun intended) to get good grades by calculating the average salary for economics majors who graduated with a cumulative GPA of B+ or better, and those who had a B or worse. The data is as shown in the accompanying table:

Cumulative GPA    Average Earnings (Ȳ)    Standard Deviation (s_Y)    n
B+ or better      $39,915.25              $8,330.21                   59
B or worse        $37,083.33              $6,174.86                   49

Conduct a t-test for the hypothesis that the two starting salaries are the same in the population.

Given that this data was collected in 1999, do you think that your results will hold for other years, such as 2002?


Answer:

\[
t = \frac{39,915.25 - 37,083.33}{\sqrt{\dfrac{8,330.21^2}{59} + \dfrac{6,174.86^2}{49}}} = 2.03.
\]

The critical value for a one-sided test is 1.64, for a two-sided test 1.96, both at the 5% level. Hence you can reject the null hypothesis that the two starting salaries are equal. Presumably you would have chosen as an alternative that better students receive better starting salaries, so that this becomes your new working hypothesis. 1999 was a boom year. If better students receive better starting offers during a boom year, when the labor market for graduates is tight, then it is very likely that they receive a better offer during a recession year, assuming that they receive an offer at all.

13) During the last few days before a presidential election, there is a frenzy of voting

intention surveys. On a given day, quite often there are conflicting results from three major polls.

(a) Think of each of these polls as reporting the fraction of successes (1s) of a Bernoulli random variable Y, where the probability of success is \(\Pr(Y = 1) = p\). Let p̂ be the fraction of successes in the sample and assume that this estimator is normally distributed with a mean of p and a variance of \(\frac{p(1-p)}{n}\). Why are the results for all polls different, even though they are taken on the same day?

Answer: Since all polls are only samples, there is random sampling error. As a result, p̂

will differ from sample to sample, and most likely also from p.

(b) Given the estimator of the variance of p̂, \(\frac{\hat{p}(1-\hat{p})}{n}\), construct a 95% confidence interval for p. For which value of p̂ is the standard deviation the largest? What value does the standard deviation take at that maximum?

Answer: \(\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). A bit of thought or calculus will show that the standard deviation will be largest for p̂ = 0.5, in which case it becomes \(\frac{0.5}{\sqrt{n}}\).

(e) When the results from the polls are reported, you are told, typically in the small print, that the “margin of error” is plus or minus two percentage points. Using the approximation of 1.96 ≈ 2, and assuming, “conservatively,” the maximum standard deviation derived in (b), what sample size is required to add and subtract (“margin of error”) two percentage points from the point estimate?

Answer: n = 2,500.

(f) What sample size would you need to halve the margin of error?

Answer: n = 10,000.

Mathematical and Graphical Problems

a. Your textbook defined the covariance between X and Y as follows:

\[
\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
\]

Prove that this is identical to the following alternative specification:

\[
\frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}
\]

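Answer:

\[
\begin{aligned}
\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i Y_i - X_i \bar{Y} - Y_i \bar{X} + \bar{Y}\bar{X} \right) \\
&= \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \bar{Y} - \sum_{i=1}^{n} Y_i \bar{X} + n\bar{Y}\bar{X} \right) \\
&= \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} - n\bar{Y}\bar{X} + n\bar{Y}\bar{X} \right) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}.
\end{aligned}
\]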

b. For each of the accompanying scatterplots for several pairs of variables, indicate whether you expect a positive or negative correlation coefficient between the two variables, and the likely magnitude of it (you can use a small range).

(a) [Scatterplot of Y (vertical axis, 2 to 10) against X (horizontal axis, 2 to 14); figure not reproduced.]

Answer: Positive correlation. The actual correlation coefficient is 0.46.

(b) [Scatterplot of Y (vertical axis, -0.04 to 0.08) against X (horizontal axis, 0.0 to 1.0); figure not reproduced.]

Answer: No relationship. The actual correlation coefficient is 0.00007.

(c) [Scatterplot of Y (vertical axis, 0.35 to 0.65) against X (horizontal axis, 3.5 to 6.5); figure not reproduced.]

Answer: Negative relationship. The actual correlation coefficient is –0.70.

(d) [Scatterplot of Y (vertical axis, 0 to 2000) against X (horizontal axis, 0 to 100); figure not reproduced.]

Answer: Nonlinear (inverted U) relationship. The actual correlation coefficient is 0.23.

c. Your textbook defines the correlation coefficient as follows:

\[
r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}\ \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}}
\]

Another textbook gives an alternative formula:

\[
r = \frac{n \sum_{i=1}^{n} Y_i X_i - \left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}\ \sqrt{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}}
\]

Prove that the two are the same.

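Answer:

\[
\begin{aligned}
r &= \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}\ \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}}
= \frac{\frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i X_i - Y_i \bar{X} - X_i \bar{Y} + \bar{Y}\bar{X} \right)}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i^2 - 2\bar{Y} Y_i + \bar{Y}^2 \right)}\ \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( X_i^2 - 2\bar{X} X_i + \bar{X}^2 \right)}} \\
&= \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sqrt{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}\ \sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}
= \frac{n \sum_{i=1}^{n} Y_i X_i - (n\bar{Y})(n\bar{X})}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - (n\bar{Y})^2}\ \sqrt{n \sum_{i=1}^{n} X_i^2 - (n\bar{X})^2}} \\
&= \frac{n \sum_{i=1}^{n} Y_i X_i - \left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}\ \sqrt{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}}.
\end{aligned}
\]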

4) IQs of individuals are normally distributed with a mean of 100 and a standard deviation of 16. If you sampled students at your college and assumed, as the null hypothesis, that they had the same IQ as the population, then in a random sample of size

(a) n = 25, find \(\Pr(\bar{Y} < 105)\).
(b) n = 100, find \(\Pr(\bar{Y} > 97)\).
(c) n = 144, find \(\Pr(101 < \bar{Y} < 103)\).

Answer: a. 0.94; b. 0.97; c. 0.21.

5) Consider the following alternative estimator for the population mean:

\[
\tilde{Y} = \frac{1}{n}\left(\frac{1}{4}Y_1 + \frac{7}{4}Y_2 + \frac{1}{4}Y_3 + \frac{7}{4}Y_4 + \dots + \frac{1}{4}Y_{n-1} + \frac{7}{4}Y_n\right)
\]

Prove that \(\tilde{Y}\) is unbiased and consistent, but not efficient when compared to \(\bar{Y}\).

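Answer:

\[
E(\tilde{Y}) = \frac{1}{n}\left(\frac{1}{4}E(Y_1) + \frac{7}{4}E(Y_2) + \frac{1}{4}E(Y_3) + \frac{7}{4}E(Y_4) + \dots + \frac{1}{4}E(Y_{n-1}) + \frac{7}{4}E(Y_n)\right)
= \frac{1}{n} \cdot \frac{n}{2}\left(\frac{1}{4} + \frac{7}{4}\right)\mu_Y = \mu_Y.
\]

Hence \(\tilde{Y}\) is unbiased.

\[
\begin{aligned}
\operatorname{var}(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2
&= E\left[\frac{1}{n}\left(\frac{1}{4}Y_1 + \frac{7}{4}Y_2 + \dots + \frac{1}{4}Y_{n-1} + \frac{7}{4}Y_n\right) - \mu_Y\right]^2 \\
&= \frac{1}{n^2} E\left[\frac{1}{4}(Y_1 - \mu_Y) + \frac{7}{4}(Y_2 - \mu_Y) + \dots + \frac{1}{4}(Y_{n-1} - \mu_Y) + \frac{7}{4}(Y_n - \mu_Y)\right]^2 \\
&= \frac{1}{n^2}\left[\frac{1}{16}E(Y_1 - \mu_Y)^2 + \frac{49}{16}E(Y_2 - \mu_Y)^2 + \dots + \frac{1}{16}E(Y_{n-1} - \mu_Y)^2 + \frac{49}{16}E(Y_n - \mu_Y)^2\right] \\
&= \frac{1}{n^2}\left[\frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2 + \dots + \frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2\right]
= \frac{\sigma_Y^2}{n^2}\left[\frac{n}{2}\left(\frac{1}{16} + \frac{49}{16}\right)\right]
= 1.5625\,\frac{\sigma_Y^2}{n}.
\end{aligned}
\]

Since \(\operatorname{var}(\tilde{Y}) \to 0\) as \(n \to \infty\), \(\tilde{Y}\) is consistent. \(\tilde{Y}\) has a larger variance than \(\bar{Y}\) and is therefore not as efficient.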
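The two key facts in this answer (weights that sum to one, and a variance factor of 1.5625/n) can be checked exactly with rational arithmetic. The sketch below assumes an even sample size, as the alternating 1/4, 7/4 pattern implies; the helper name `alternating_weights` is hypothetical.

```python
from fractions import Fraction

def alternating_weights(n):
    """Weights (1/n)*(1/4), (1/n)*(7/4), ... applied to Y_1, ..., Y_n (n even)."""
    if n % 2 != 0:
        raise ValueError("the alternating pattern needs an even n")
    quarter, seven_quarters = Fraction(1, 4), Fraction(7, 4)
    return [(quarter if i % 2 == 0 else seven_quarters) / n for i in range(n)]

for n in (10, 100, 1000):
    w = alternating_weights(n)
    weight_sum = sum(w)                      # equals 1, so the estimator is unbiased
    variance_factor = sum(x * x for x in w)  # var(estimator) divided by sigma_Y^2
    print(n, weight_sum, float(variance_factor * n))
```

For independent draws the variance of a weighted sum is the sum of squared weights times the common variance, which is why `variance_factor * n` stays at 1.5625 while the variance itself shrinks with n.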

5) Imagine that you had sampled 1,000,000 females and 1,000,000 males to test whether or

not females have a higher IQ than males. IQs are normally distributed with a mean of 100 and a standard deviation of 16. You are excited to find that females have an average IQ of 101 in your sample, while males have an IQ of 99. Does this difference seem important? Do you really need to carry out a t-test for differences in means to determine whether or not this difference is statistically significant? What does this result tell you about testing hypotheses when sample sizes are very large?

Answer: The difference seems very small, both in terms of absolute values and, more

importantly, in terms of standard deviations. With a sample size as large as n=1,000,000, the standard error becomes extremely small. This implies that the distribution of means, or differences in means, has almost turned into a spike. In essence, you are (very close to) observing the population. It is therefore unnecessary to test whether or not the difference is statistically significant. After all, if in the population, the male IQ were 99.99 and the female IQ were 100.01, they would be different. In general, when sample sizes become very large, it is very easy to reject null hypotheses about population means, which involve sample means as an estimator, even if hypothesized differences are very small. This is the result of the distribution of sample means collapsing fairly rapidly as sample sizes increase.
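To see how small the standard error becomes, the numbers in the question can be plugged into the usual formula for the standard error of a difference in means. This is an illustrative sketch that treats the stated population standard deviation of 16 as known for both groups.

```python
import math

sd = 16.0        # standard deviation of IQ, as stated in the question
n = 1_000_000    # observations per group

# Standard error of the difference between two independent sample means.
se_diff = math.sqrt(sd ** 2 / n + sd ** 2 / n)
t_stat = (101 - 99) / se_diff
print(round(se_diff, 4), round(t_stat, 1))
```

A two-point difference, one eighth of a standard deviation, is dozens of standard errors away from zero at this sample size, which is the sense in which statistical significance and practical importance come apart.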

7) Let Y be a Bernoulli random variable with success probability \(\Pr(Y = 1) = p\), and let \(Y_1, \dots, Y_n\) be i.i.d. draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample. In large samples, the distribution of p̂ will be approximately normal, i.e., p̂ is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\). Now let X be the number of successes and n the sample size. In a sample of 10 voters (n = 10), if there are six who vote for candidate A, then X = 6. Relate X, the number of successes, to p̂, the success proportion, or fraction of successes. Next, using your knowledge of linear transformations, derive the distribution of X.

Answer: \(X = n \times \hat{p}\). Hence if p̂ is distributed \(N\left(p, \frac{p(1-p)}{n}\right)\), then, given that X is a linear transformation of p̂, X is distributed \(N(np, np(1-p))\).

8) When you perform hypothesis tests, you are faced with four possible outcomes described in the accompanying table.

                              Truth (Population)
Decision based on sample      H0 is true      H1 is true
Reject H0                     I               ☺
Do not reject H0              ☺               II

“☺” indicates a correct decision, and I and II indicate that an error has been made. In probability terms, state the mistakes that have been made in situation I and II, and relate these to the Size of the test and the Power of the test (or transformations of these).

Answer: I: Pr(reject H0 | H0 is correct) = Size of the test.
II: Pr(do not reject H0 | H1 is correct) = (1 – Power of the test).

9) Assume that under the null hypothesis, Y has an expected value of 500 and a standard deviation of 20. Under the alternative hypothesis, the expected value is 550. Sketch the probability density function for the null and the alternative hypothesis in the same figure. Pick a critical value such that the p-value is approximately 5%. Mark the areas that show the size and the power of the test. What happens to the power of the test if the alternative hypothesis moves closer to the null hypothesis, i.e., \(\mu_Y\) = 540, 530, 520, etc.?

Answer: For a given size of the test, the power of the test is lower.
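The answer to problem 9 can be illustrated numerically. The sketch below makes two assumptions that the problem does not spell out: the test is one-sided (rejecting for large values), and the distribution is normal with the same standard deviation of 20 under both hypotheses. The function name `power` is chosen for illustration only.

```python
from statistics import NormalDist

null = NormalDist(mu=500, sigma=20)
critical_value = null.inv_cdf(0.95)   # one-sided test with size 5%

def power(alternative_mean, sigma=20):
    """Probability of exceeding the critical value when the alternative is true."""
    return 1 - NormalDist(mu=alternative_mean, sigma=sigma).cdf(critical_value)

for mu in (550, 540, 530, 520):
    print(mu, round(power(mu), 3))
```

As the alternative mean moves toward 500, the printed power falls toward the size of the test, which is the pattern the answer describes.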

140 – Undergraduate Econometrics Professor Òscar Jordà Spring 2003

10) The net weight of a bag of flour is guaranteed to be 5 pounds with a standard deviation of 0.05 pounds. You are concerned that the actual weight is less. To test for this, you sample 25 bags. Carefully state the null and alternative hypothesis in this situation. Determine a critical value such that the size of the test does not exceed 5%. Finding the average weight of the 25 bags to be 4.7 pounds, can you reject the null hypothesis? What is the power of the test here? Why is it so low?

Answer: Let Y be the net weight of the bag of flour. Then \(H_0: E(Y) = 5\) and \(H_1: E(Y) < 5\). Under the null hypothesis, the sample average \(\bar{Y}\) is distributed normally, with a mean of 5 pounds and a standard deviation of \(0.05/\sqrt{25} = 0.01\) pounds. The critical value is approximately 4.98 pounds. Since 4.7 pounds falls in the rejection region, the null hypothesis is rejected. The power of the test is low here, since there is no simple alternative. In the extreme case, where the alternative hypothesis would place the net weight marginally below five pounds, the power of the test would approximately equal its size, or 5% in this case.

11) Some policy advisors have argued that education should be subsidized in

developing countries to reduce fertility rates. To investigate whether or not education and fertility are correlated, you collect data on population growth rates (Y) and education (X) for 86 countries. Given the sums below, compute the sample correlation:

\(\sum_{i=1}^{n} Y_i = 1.594\); \(\sum_{i=1}^{n} X_i = 449.6\); \(\sum_{i=1}^{n} Y_i X_i = 6.4697\); \(\sum_{i=1}^{n} Y_i^2 = 0.03982\); \(\sum_{i=1}^{n} X_i^2 = 3,022.76\)

Answer: r = –0.716.

12) (Advanced) Unbiasedness and small variance are desirable properties of estimators. However, you can imagine situations where a trade-off exists between the two: one estimator may have a small bias but a much smaller variance than another, unbiased estimator. The concept of the “mean square error” of an estimator combines the two concepts. Let \(\hat{\mu}\) be an estimator of \(\mu\). Then the mean square error (MSE) is defined as follows: \(MSE(\hat{\mu}) = E(\hat{\mu} - \mu)^2\). Prove that \(MSE(\hat{\mu}) = \text{bias}^2 + \operatorname{var}(\hat{\mu})\). (Hint: subtract and add \(E(\hat{\mu})\) in \(E(\hat{\mu} - \mu)^2\).)

Answer:

\[
\begin{aligned}
MSE(\hat{\mu}) &= E(\hat{\mu} - \mu)^2 = E(\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu)^2 = E[(\hat{\mu} - E(\hat{\mu})) + (E(\hat{\mu}) - \mu)]^2 \\
&= E[(\hat{\mu} - E(\hat{\mu}))^2 + (E(\hat{\mu}) - \mu)^2 + 2(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)]
\end{aligned}
\]

Next, moving through the expectation operator results in

\[
E[\hat{\mu} - E(\hat{\mu})]^2 + E[E(\hat{\mu}) - \mu]^2 + 2E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)].
\]

The first term is the variance, and the second term is the squared bias, since \(E[E(\hat{\mu}) - \mu]^2 = [E(\hat{\mu}) - \mu]^2\). This proves \(MSE(\hat{\mu}) = \text{bias}^2 + \operatorname{var}(\hat{\mu})\) if the last term equals zero. But

\[
E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)] = E[\hat{\mu}E(\hat{\mu}) - (E(\hat{\mu}))^2 - \hat{\mu}\mu + \mu E(\hat{\mu})]
= E(\hat{\mu})E(\hat{\mu}) - (E(\hat{\mu}))^2 - E(\hat{\mu})\mu + \mu E(\hat{\mu}) = 0.
\]

13) Your textbook states that when you test for differences in means and you assume

that the two population variances are equal, then an estimator of the population variance is the following “pooled” estimator:

\[
s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m} (Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w} (Y_i - \bar{Y}_w)^2\right]
\]

Explain why this pooled estimator can be looked at as the weighted average of the two variances.

Answer:

\[
\begin{aligned}
s^2_{pooled} &= \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m} (Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w} (Y_i - \bar{Y}_w)^2\right]
= \frac{1}{n_m + n_w - 2}\left[(n_m - 1)s_m^2 + (n_w - 1)s_w^2\right] \\
&= \frac{n_m - 1}{n_m + n_w - 2}\,s_m^2 + \frac{n_w - 1}{n_m + n_w - 2}\,s_w^2.
\end{aligned}
\]
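The weighted-average reading of the pooled estimator can be confirmed on a small example. The two samples below are hypothetical numbers made up purely for illustration, and both function names are assumptions of this sketch.

```python
from statistics import mean, variance

def pooled_variance_direct(sample_m, sample_w):
    """Pooled variance from the summed squared deviations of both groups."""
    mean_m, mean_w = mean(sample_m), mean(sample_w)
    ss_m = sum((y - mean_m) ** 2 for y in sample_m)
    ss_w = sum((y - mean_w) ** 2 for y in sample_w)
    return (ss_m + ss_w) / (len(sample_m) + len(sample_w) - 2)

def pooled_variance_weighted(sample_m, sample_w):
    """The same quantity written as a weighted average of the two sample variances."""
    n_m, n_w = len(sample_m), len(sample_w)
    weight_m = (n_m - 1) / (n_m + n_w - 2)
    weight_w = (n_w - 1) / (n_m + n_w - 2)   # weight_m + weight_w equals 1
    return weight_m * variance(sample_m) + weight_w * variance(sample_w)

# Hypothetical samples, not taken from the problem set.
men = [410.0, 520.0, 480.0, 390.0, 450.0]
women = [380.0, 440.0, 400.0]
direct = pooled_variance_direct(men, women)
weighted = pooled_variance_weighted(men, women)
print(direct, weighted)
```

Both functions return the same number because `statistics.variance` already divides by n – 1, so multiplying it by (n – 1) recovers each group's sum of squared deviations.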

i. Your textbook suggests using the first observation from a sample of n as an estimator of the population mean. It is shown that this estimator is unbiased but has a variance of \(\sigma_Y^2\), which makes it less efficient than the sample mean. Explain why this estimator is not consistent. You develop another estimator, which is the simple average of the first and last observation in your sample. Show that this estimator is also unbiased and show that it is more efficient than the estimator which only uses the first observation. Is this estimator consistent?

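Answer: The estimator is not consistent because its variance does not vanish as n goes to infinity, i.e., \(\operatorname{var}(Y_1) \to 0\) as \(n \to \infty\) does not hold.

Let \(\tilde{Y} = \frac{1}{2}(Y_1 + Y_n)\). Then

\[
E(\tilde{Y}) = \frac{1}{2}\left(E(Y_1) + E(Y_n)\right) = \frac{1}{2}(\mu_Y + \mu_Y) = \mu_Y.
\]

Hence \(\tilde{Y}\) is unbiased.

\[
\begin{aligned}
\operatorname{var}(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2
&= E\left[\left(\frac{1}{2}Y_1 + \frac{1}{2}Y_n\right) - \mu_Y\right]^2
= E\left[\frac{1}{2}(Y_1 - \mu_Y) + \frac{1}{2}(Y_n - \mu_Y)\right]^2 \\
&= \frac{1}{4}\left[E(Y_1 - \mu_Y)^2 + E(Y_n - \mu_Y)^2\right]
= \frac{1}{4}\left[\sigma_Y^2 + \sigma_Y^2\right] = \frac{\sigma_Y^2}{2}.
\end{aligned}
\]

Since \(\operatorname{var}(\tilde{Y}) \to 0\) as \(n \to \infty\) does not hold, \(\tilde{Y}\) is not consistent.

\(\operatorname{var}(\tilde{Y}) < \operatorname{var}(Y_1)\), and \(\tilde{Y}\) is therefore more efficient than the estimator which only uses the first observation.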

15) Let p be the success probability of a Bernoulli random variable Y, i.e., \(p = \Pr(Y = 1)\). It can be shown that \(\hat{p}\), the fraction of successes in a sample, is asymptotically distributed \(N\left(p, \frac{p(1-p)}{n}\right)\). Using the estimator of the variance of \(\hat{p}\), \(\frac{\hat{p}(1-\hat{p})}{n}\), construct a 95% confidence interval for p. Show that the margin for sampling error simplifies to \(1/\sqrt{n}\) if you used 2 instead of 1.96 assuming, conservatively, that the standard error is at its maximum. Construct a table indicating the sample size needed to generate a margin of sampling error of 1%, 2%, 5% and 10%. What do you notice about the increase in sample size needed to halve the margin of error? (The margin of sampling error is \(1.96 \times SE(\hat{p})\).)

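Answer: The 95% confidence interval for p is \(\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). \(\frac{\hat{p}(1-\hat{p})}{n}\) is at a maximum for \(\hat{p} = 0.5\), in which case the confidence interval reduces to \(\hat{p} \pm 1.96 \times \sqrt{\frac{0.25}{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}}\), and the margin of sampling error is \(\frac{1}{\sqrt{n}}\).

1/√n      n
0.01      10,000
0.02      2,500
0.05      400
0.10      100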

To halve the margin of error, the sample size has to increase fourfold.
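The table follows directly from solving \(1/\sqrt{n} = \text{margin}\) for n. A short sketch, using the conservative approximation from the answer (2 instead of 1.96, and p̂ = 0.5); the function name `sample_size_for_margin` is hypothetical.

```python
def sample_size_for_margin(margin):
    """Sample size n for which 1/sqrt(n) equals the requested margin of error."""
    return round(1 / margin ** 2)

for margin in (0.01, 0.02, 0.05, 0.10):
    print(margin, sample_size_for_margin(margin))
```

Because n depends on the inverse square of the margin, halving the margin multiplies the required sample size by four, for example from 2,500 at 2% to 10,000 at 1%.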

16) Let Y be a Bernoulli random variable with success probability \(\Pr(Y = 1) = p\), and let \(Y_1, \dots, Y_n\) be i.i.d. draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample. Given the following statement

\[
\Pr(-1.96 < z < 1.96) = 0.95
\]

and assuming that p̂ is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\), derive the 95% confidence interval for p by solving the above inequalities.


62

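Answer:

\[
\Pr\left(-1.96 < \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} < 1.96\right) = 0.95.
\]

Multiplying through by the standard deviation results in

\[
\Pr\left(-1.96 \times \sqrt{\frac{p(1-p)}{n}} < \hat{p} - p < 1.96 \times \sqrt{\frac{p(1-p)}{n}}\right) = 0.95.
\]

Subtraction of \(\hat{p}\) then yields, after multiplying both sides by (-1),

\[
\Pr\left(\hat{p} - 1.96 \times \sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + 1.96 \times \sqrt{\frac{p(1-p)}{n}}\right) = 0.95.
\]

Replacing the unknown p in the standard deviation by its estimator \(\hat{p}\), the 95% confidence interval for p then is \(\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).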

17) Your textbook mentions that dividing the sample variance by n –1 instead of n is

called a degrees of freedom correction. The meaning of the term stems from the fact that one degree of freedom is used up when the mean is estimated. Hence degrees of freedom can be viewed as the number of independent observations remaining after estimating the sample mean.

Consider an example where initially you have 20 independent observations on the height of students. After calculating the average height, your instructor claims that you can figure out the height of the 20th student if she provides you with the height of the other 19 students and the sample mean. Hence you have lost one degree of freedom, or there are only 19 independent bits of information. Explain how you can find the height of the 20th student.

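Answer: Since \(\bar{Y} = \frac{1}{20}\sum_{i=1}^{20} Y_i\), it follows that \(20 \times \bar{Y} = \sum_{i=1}^{20} Y_i = \sum_{i=1}^{19} Y_i + Y_{20}\), so that \(Y_{20} = 20 \times \bar{Y} - \sum_{i=1}^{19} Y_i\). Hence knowledge of the sample mean and the height of the other 19 students is sufficient for finding the height of the 20th student.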

18) The accompanying table lists the height (STUDHGHT) in inches and weight (WEIGHT) in pounds of five college students. Calculate the correlation coefficient.

STUDHGHT    WEIGHT
74          165
73          165
72          145
68          155
66          140

Answer: r = 0.72.
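The result can be checked by applying the textbook definition of the correlation coefficient (sample covariance divided by the product of the sample standard deviations) to the five observations. A minimal sketch, with the function name `correlation` chosen for illustration:

```python
import math

def correlation(x, y):
    """Sample correlation: covariance over the product of standard deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

height = [74, 73, 72, 68, 66]       # STUDHGHT, inches
weight = [165, 165, 145, 155, 140]  # WEIGHT, pounds
print(round(correlation(height, weight), 2))
```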


19) (Requires calculus.) The variance of \(\hat{p}\), the estimator of the success probability p of a Bernoulli random variable, is \(\frac{p(1-p)}{n}\). Use calculus to show that this variance is maximized for p = 0.5.

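Answer:

\[
\frac{\partial}{\partial p}\left[\frac{p(1-p)}{n}\right] = \frac{1}{n} - \frac{2p}{n} = 0.
\]

Hence \(1 - 2p = 0\) or \(p = \frac{1}{2}\). The second derivative, \(-\frac{2}{n}\), is negative, so this is a maximum.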

20) Consider two estimators: one which is biased and has a smaller variance, the other which is unbiased and has a larger variance. Sketch the sampling distributions and the location of the population parameter for this situation. Discuss conditions under which you may prefer to use the first estimator over the second one.

Answer: The bias indicates “how far away,” on average, the estimator is from the

population value. Although this average is zero for an unbiased estimator, there may be quite some variation around the population mean. In a single draw, there is therefore a high probability of being some distance away from the population mean. On the other hand, if the variance is very small and the estimator is biased by a small amount, then the probability of being closer to the population value may be higher. (The biased estimator may have a smaller mean square error than the unbiased estimator.)


Chapter 4

1) You have obtained measurements of height in inches of 29 female and 81 male students (Studenth) at your university. A regression of the height on a constant and a binary variable (BFemme), which takes a value of one for females and is zero otherwise, yields the following result:

Studenth = 71.0 – 4.84×BFemme, R² = 0.40, SER = 2.0
           (0.3)  (0.57)

(a) Interpret the results.

Answer: The average height of male students is 71 inches, and that of females is approximately 66 inches.

(b) Test the hypothesis that females, on average, are shorter than males, at the 1%

level.

Answer: The t-statistic for the difference in means is -8.49. For a one-sided test, the critical value is –2.33. Hence the difference is statistically significant.

(c) Is it likely that the error term is homoskedastic here?


Answer: It is safer to assume that the variances for males and females are different. In the underlying sample the standard deviation for females was smaller.

2) You have obtained a sub-sample of 1744 individuals from the Current Population

Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression, using heteroskedasticity-robust standard errors, yielded the following result:

Earn = 239.16 + 5.20×Age, R² = 0.05, SER = 287.21,
       (20.24)  (0.57)

where Earn and Age are measured in dollars and years respectively.

(a) Interpret the results.

Answer: A person who is one year older increases her weekly earnings by $5.20. There is no meaning attached to the intercept. The regression explains 5 percent of the variation in earnings.

(b) Is the relationship between Age and Earn statistically significant? Is the effect of

age on earnings large?

Answer: The t-statistic on the slope is 9.12, which is above the critical value from the standard normal distribution for any reasonable level of significance. Assuming that people worked 52 weeks a year, the effect of being one year older translates into an additional $270.40 a year. This does not seem particularly large in 2002 dollars, but may have been earlier.

(c) Why should age matter in the determination of earnings? Do the results suggest

that there is a guarantee for earnings to rise for everyone as they become older? Do you think that the relationship between age and earnings is linear?

Answer: In general, age-earnings profiles take on an inverted U-shape. Hence it

is not linear and the linear approximation may not be good at all. Age may be a proxy for “experience,” which in itself can approximate “on the job training.” Hence the positive effect between age and earnings. The results do not suggest that there is a guarantee for earnings to rise for everyone as they become older since the regression R2 does not equal 1. Instead the result holds “on average.”

(d) The variance of the error term and the variance of the dependent variable are

related. Given the distribution of earnings, do you think it is plausible that the distribution of errors is normal?


Answer: Since the earnings distribution is highly skewed, it is not reasonable to assume that the error distribution is normal.

(e) (Requires Appendix Material) The average age in this sample is 37.5 years. What

is annual income in the sample?

Answer: Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \Rightarrow \bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}\), substituting the estimates for the slope and the intercept results in average weekly earnings of $434.16 or annual average earnings of $22,576.32.
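The arithmetic behind this answer is a direct substitution into the estimated regression line. A short sketch using the coefficients reported above and the 52-week year assumed earlier in the problem:

```python
intercept, slope = 239.16, 5.20   # estimated coefficients from the regression
mean_age = 37.5                   # average age in the sample

# The OLS line passes through the point of sample means.
mean_weekly_earnings = intercept + slope * mean_age
mean_annual_earnings = 52 * mean_weekly_earnings
print(round(mean_weekly_earnings, 2), round(mean_annual_earnings, 2))
```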

3) The baseball team nearest to your home town is, once again, not doing well.

Given that your knowledge of what it takes to win in baseball is vastly superior to that of management, you want to find out what it takes to win in Major League Baseball (MLB). You therefore collect the winning percentage of all 30 baseball teams in MLB for 1999 and regress the winning percentage on what you consider the primary determinant for wins, which is quality pitching (team earned run average). You find the following information on team performance:

Summary of the Distribution of Winning Percentage and Team Earned Run Average for MLB in 1999

                                 Standard     Percentile
                      Average    deviation    10%     25%     40%     50% (median)    60%     75%     90%
Team ERA              4.71       0.53         3.84    4.35    4.72    4.78            4.91    5.06    5.25
Winning Percentage    0.50       0.08         0.40    0.43    0.46    0.48            0.49    0.59    0.60

(a) What is your expected sign for the regression slope? Will it make sense to

interpret the intercept? If not, should you omit it from your regression and force the regression line through the origin?

Answer: You expect a negative relationship, since a higher team ERA implies a

lower quality of the input. No team comes close to a zero team ERA, and therefore it does not make sense to interpret the intercept. Forcing the regression through the origin is a false implication from this insight. Instead the intercept fixes the level of the regression.

(b) The authors of your textbook have informed you that unless you have more than

100 observations, it may not be plausible to assume that the distribution of your OLS estimators is normal. What are the implications here for testing the significance of your theory?


Answer: Since there are only 30 observations, the distribution of the t-statistic is unknown. You should therefore not conduct statistical inference.

(c) OLS estimation of the relationship between the winning percentage and the team ERA yields the following:

Winpct = 0.94 – 0.10×teamera, R² = 0.49, SER = 0.06,
        (0.08)  (0.02)

where winpct is measured as wins divided by games played, so for example a team that won half of its games would have Winpct = 0.50. Interpret your regression results.

Answer: For every one point increase in Team ERA, the winning percentage

decreases by 10 percentage points, or 0.10. Roughly half of the variation in winning percentage is explained by the quality of team pitching.

(d) It is typically sufficient to win 90 games to be in the playoffs and/or to win a

division. Winning over 100 games a season is exceptional: the Atlanta Braves had the most wins in 1999 with 103. Teams play a total of 162 games a year. Given this information, do you consider the slope coefficient to be large or small?

Answer: The coefficient is large, since increasing the winning percentage by 0.10 is the equivalent of winning 16 more games per year. Since it is typically sufficient to win 56 percent of the games to qualify for the playoffs, this difference of 0.10 in winning percentage can easily turn a losing team into a winning team.

(e) What would be the effect on the slope, the intercept, and the regression R2 if you

measured Winpct in percentage points, i.e. as (Wins/Games)×100?

Answer: Clearly the regression R² will not be affected by a change in scale, since a descriptive measure of the quality of the regression would depend on whim otherwise. The slope of the regression will compensate in such a way that the interpretation of the result is unaffected, i.e., it will become –10 in the above example. The intercept will also change to reflect the fact that if X were 0, then the dependent variable would now be measured in percentage, i.e., it will become 94.0 in the above example.
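The scaling argument can be verified with a small OLS calculation. The five data points below are hypothetical values made up for illustration (they are not the 1999 MLB data), and `ols` is a hand-written helper for a single-regressor regression.

```python
def ols(x, y):
    """Intercept, slope and R-squared of a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical team ERA and winning fractions, for illustration only.
era = [3.8, 4.3, 4.7, 4.9, 5.2]
winpct = [0.60, 0.55, 0.48, 0.47, 0.41]

b0, b1, r2 = ols(era, winpct)
b0_pct, b1_pct, r2_pct = ols(era, [100 * w for w in winpct])
print(b0, b1, r2)
print(b0_pct, b1_pct, r2_pct)
```

Multiplying the dependent variable by 100 multiplies both the intercept and the slope by 100 and leaves R² unchanged, which matches the reasoning in the answer.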

(f) Are you impressed with the size of the regression R2? Given that there is 51% of

unexplained variation in the winning percentage, what might some of these factors be?


Answer: It is impressive that a single variable can explain roughly half of the variation in winning percentage. Answers to the second question will vary by student, but will typically include the quality of hitting, fielding, and management. Salaries could be included, but should be reflected in the inputs.

4) You have learned in one of your economics courses that one of the determinants

of per capita income (the “Wealth of Nations”) is the population growth rate. Furthermore you also found out that the Penn World Tables contain income and population data for 104 countries of the world. To test this theory, you regress the GDP per worker (relative to the United States) in 1990 (RelPersInc) on the difference between the average population growth rate of that country (n) to the U.S. average population growth rate (nus ) for the years 1980 to 1990. This results in the following regression output:

RelPersInc = 0.518 – 18.831×(n – nus), R² = 0.522, SER = 0.197
            (0.056)  (3.177)

(a) Interpret the results carefully. Is this relationship statistically significant? Is it economically important?

Answer: A relative increase in the population rate of one percentage point, from 0.01 to 0.02, say, lowers relative per-capita income by almost 20 percentage points (0.188). This is a quantitatively important and large effect. Nations which have the same population growth rate as the United States have, on average, roughly half as much per capita income. The t-statistic is 5.93, making the relationship statistically significant.

(b) What would happen to the slope, intercept, and regression R² if you ran another regression where the above explanatory variable was replaced by n only, i.e., the average population growth rate of the country? (The population growth rate of the United States from 1980 to 1990 was 0.009.) Should this have any effect on the t-statistic of the slope?

Answer: The interpretation of the partial derivative is unaffected, in that the slope still indicates the effect of a one percentage point increase in the population growth rate. The regression R² and t-statistic will remain the same since only a constant was removed from the explanatory variable. The intercept will change as a result of the change in \(\bar{X}\).

(c) Is there any reason to believe that the variance of the error terms is

homoskedastic?

Answer: There are vast differences in the size of these countries, both in terms of the population and GDP. Furthermore, the countries are at different stages of economic and institutional development. Other factors vary as well. It would therefore be odd to assume that the errors would be homoskedastic.

(d) 31 of the 104 countries have a dependent variable of less than 0.10. Does it therefore make sense to interpret the intercept?

Answer: To interpret the intercept, you must observe values of X close to zero, not Y.

5) The neoclassical growth model predicts that for identical savings rates and population growth rates, countries should converge to the same per capita income level. This is referred to as the convergence hypothesis. One way to test for the presence of convergence is to compare the growth rates over time to the initial starting level.

(a) If you regressed the average growth rate over a time period (1960-1990) on the initial level of per capita income, what would the sign of the slope have to be to indicate this type of convergence? Explain. Would this result confirm or reject the prediction of the neoclassical growth model?

Answer: You would require a negative sign. Countries that are far ahead of others at the beginning of the period would have to grow relatively slower for the others to catch up. This represents unconditional convergence, whereas the neoclassical growth model predicts conditional convergence, i.e., there will only be convergence if countries have identical savings, population growth rates, and production technology.

(b) The results of the regression for 104 countries were as follows:

$\widehat{g6090} = 0.019 - 0.0006 \times RelProd60$, $R^2 = 0.00007$, $SER = 0.016$,
(0.004) (0.0073)

where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and RelProd60 is GDP per worker relative to the United States in 1960. Interpret the results. Is there any evidence of unconditional convergence between the countries of the world? Is this result surprising? What other concept could you think about to test for convergence between countries?

Answer: An increase in 10 percentage points in RelProd60 results in a decrease of 0.00006 in the growth rate from 1960 to 1990, i.e., countries that were further ahead in 1960 do grow by less. There are some countries in the sample that have a value of RelProd60 close to zero (China, Uganda, Togo, Guinea) and you would expect these countries to grow roughly by 2 percent per year over the sample period. The regression $R^2$ indicates that the regression has virtually no explanatory power. This is confirmed by the very low t-statistic, indicating that the slope is not statistically significant. The result is not surprising given that there are not many theories that predict unconditional convergence between the countries of the world.

(c) Using the OLS estimator with homoskedasticity-only standard errors, the results changed as follows:

$\widehat{g6090} = 0.019 - 0.0006 \times RelProd60$, $R^2 = 0.00007$, $SER = 0.016$
(0.002) (0.0068)

Why didn’t the estimated coefficients change? Given that the standard error of the slope is now smaller, can you reject the null hypothesis of no beta convergence? Are the results in (c) more reliable than the results in (b)? Explain.

Answer: Using homoskedasticity-only standard errors has no effect on the OLS estimator. The t-statistic remains small and is certainly below the critical value. The results are less reliable since there is no reason to believe that the error variance is homoskedastic.

(d) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression output as follows:

$\widehat{g6090} = 0.048 - 0.0404 \times RelProd60$, $R^2 = 0.82$, $SER = 0.0046$
(0.004) (0.0063)

How does this result affect your conclusions from above? When you test for convergence, should you worry about the relatively small sample size?

Answer: Judging by the size of the slope coefficient, there is strong evidence of unconditional convergence for the OECD countries. The regression $R^2$ is quite high, given that there is only a single explanatory variable in the regression. However, since we do not know the sampling distribution of the estimator in this case, we cannot conduct inference.

6) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the Baseball World Series in 7 games. Some players, such as Bautista and Finley for the Diamondbacks, had a substantially higher batting average during the World Series than during the regular season. Others, such as Brosius and Jeter for the Yankees, did substantially worse. You set out to investigate whether or not the regular season batting average is a good indicator for the World Series batting average. The results for 11 players who had the most at bats for the two teams are:


$\widehat{AZWsavg} = -0.347 + 2.290 \times AZSeasavg$, $R^2 = 0.11$, $SER = 0.145$,
(0.604) (2.156)

$\widehat{NYWsavg} = 0.134 + 0.136 \times NYSeasavg$, $R^2 = 0.001$, $SER = 0.092$,
(0.371) (1.347)

where Wsavg and Seasavg indicate the batting average during the World Series and the regular season respectively.

(a) Focusing on the coefficients first, what is your interpretation?

Answer: The two regressions are quite different. For the Diamondbacks, players who had a 10 point higher batting average during the regular season had roughly a 23 point higher batting average during the World Series. Hence top performers did relatively better. The opposite holds for the Yankees.

(b) What can you say about the explanatory power of your equation? What do you conclude from this?

Answer: Both regressions have little explanatory power as seen from the regression R2. Hence performance during the season is a poor forecast of World Series performance.

(c) Calculate the t-statistics for the various regression coefficients. Are any of these significant at the 5% level? When using statistical inference in this case, should you be concerned about the number of observations?

Answer: The respective t-statistics are –0.575, 1.062, 0.361, and 0.101. None of these are statistically significant at the 5% level. However, given that there are only 11 observations, you should not conduct inference, since the sampling distribution is unknown.
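The four t-statistics are simply each coefficient divided by its standard error. A short sketch of the arithmetic, using the coefficients and standard errors reported above; the 1.96 cutoff is the large-sample normal critical value, which, as the answer notes, is of doubtful relevance with only 11 observations.

```python
# Coefficients and standard errors as reported in the two regressions above.
estimates = {
    "AZ intercept": (-0.347, 0.604),
    "AZ slope": (2.290, 2.156),
    "NY intercept": (0.134, 0.371),
    "NY slope": (0.136, 1.347),
}

# t-statistic for the null hypothesis that the coefficient equals zero.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    # 1.96 is the two-sided 5% critical value of the standard normal distribution.
    print(f"{name}: t = {t:.3f}, |t| > 1.96: {abs(t) > 1.96}")
```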

Mathematical Questions

1) Prove that the regression $R^2$ is identical to the square of the correlation coefficient between two variables Y and X. Regression functions are written in a form that suggests causation running from X to Y. Given your proof, does a high regression $R^2$ present supportive evidence of a causal relationship? Can you think of some regression examples where the direction of causality is not clear? Where it is without a doubt?



Answer: The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$, and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$. Using small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, we get that the regression

$$R^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$

The square of the correlation coefficient is

$$r^2 = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2 \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2 \sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$

Hence the two are the same. Correlation does not imply causation. Income is a regressor in the consumption function, yet consumption enters on the right-hand side of the GDP identity. Regressing the weight of individuals on the height is a situation where causality is without doubt, since the author of this test bank should be seven feet tall otherwise. The authors of the textbook use weather data to forecast orange juice prices later in the text.
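The identity between the regression $R^2$ and the squared correlation coefficient can also be verified numerically. The sketch below uses made-up data; the variable names are illustrative only.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical regressor
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]   # hypothetical dependent variable

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # this is the TSS

# OLS slope and intercept, then the explained sum of squares from fitted values.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]
ess = sum((fi - y_bar) ** 2 for fi in fitted)

r_squared = ess / syy                # regression R^2 = ESS / TSS
corr = sxy / math.sqrt(sxx * syy)    # sample correlation coefficient
print(r_squared, corr ** 2)          # equal up to floating-point rounding
```

Swapping the roles of `x` and `y` leaves `corr` unchanged, which is one way to see why a high $R^2$ says nothing about the direction of causality.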

2) In order to formulate whether or not the alternative hypothesis is one-sided or two-sided, you need some guidance from economic theory. Choose at least three examples from economics or other fields where you have a clear idea what the null hypothesis and the alternative hypothesis for the slope coefficient should be. Write a brief justification for your answer.

Answer: Answers will vary by student. The problem is to find examples where there is only a single explanatory variable. A student may argue that the price coefficient in a demand function is negative, but unless you control for other variables, this may not be so. The demand for L.A. Laker tickets and their price comes to mind. CAPM is a nice example. Perhaps the marginal propensity to consume in a consumption function is another. Testing for speculative efficiency in exchange rate markets may also work.

3) For the following estimated slope coefficients and their standard errors, find the t-statistics for the null hypothesis $H_0: \beta_1 = 0$. Indicate whether or not you are able to reject the null hypothesis at the 10%, 5%, and 1% level of a one-sided and two-sided hypothesis.

(a) $\hat{\beta}_1 = 4.2$, $SE(\hat{\beta}_1) = 2.4$

(b) $\hat{\beta}_1 = 0.5$, $SE(\hat{\beta}_1) = 0.37$

(c) $\hat{\beta}_1 = 0.003$, $SE(\hat{\beta}_1) = 0.002$

(d) $\hat{\beta}_1 = 360$, $SE(\hat{\beta}_1) = 300$

Answer:
a) t = 1.75; reject the null at the 10% level of a two-sided test, and at the 5% level of a one-sided test.
b) t = 1.35; cannot reject the null at the 10% level of a two-sided test, reject the null at the 10% level of a one-sided test.
c) t = 1.50; cannot reject the null at the 10% level of a two-sided test, reject the null at the 10% level of a one-sided test.
d) t = 1.20; cannot reject the null at the 10% level of either a two-sided or a one-sided test.
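These decisions can be reproduced mechanically. The sketch below assumes large-sample (standard normal) critical values and, for the one-sided test, the alternative $\beta_1 > 0$; both are assumptions made for illustration rather than something stated in the question.

```python
from statistics import NormalDist

# (estimate, standard error) for cases (a) to (d) in the question.
cases = {"a": (4.2, 2.4), "b": (0.5, 0.37), "c": (0.003, 0.002), "d": (360, 300)}
levels = (0.10, 0.05, 0.01)
std_normal = NormalDist()

results = {}
for label, (beta_hat, se) in cases.items():
    t = beta_hat / se
    # Two-sided test: reject if |t| exceeds the (1 - alpha/2) quantile.
    two_sided = {a: abs(t) > std_normal.inv_cdf(1 - a / 2) for a in levels}
    # One-sided test against beta_1 > 0: reject if t exceeds the (1 - alpha) quantile.
    one_sided = {a: t > std_normal.inv_cdf(1 - a) for a in levels}
    results[label] = (t, two_sided, one_sided)
    print(label, round(t, 2), two_sided, one_sided)
```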

4) Explain carefully the relationship between a confidence interval, a one-sided hypothesis test, and a two-sided hypothesis test. What is the unit of measurement of the t-statistic?

Answer: In the case of a two-sided hypothesis test, the relationship between the t-statistic and the confidence interval is straightforward. The t-statistic calculates the distance between the estimate and the hypothesized value in standard deviations. If the distance is larger than 1.96 (size of the test: 5%), then the distance is large enough to reject the null hypothesis. The confidence interval adds and subtracts 1.96 standard deviations in this case, and asks whether or not the hypothesized value is contained within the confidence interval. Hence the two concepts resemble the two sides of a coin. They are simply different ways to look at the same problem. In the case of the one-sided test, the relationship is more complex. Since you are looking at a one-sided alternative, it does not really make sense to construct a confidence interval. However, the confidence interval results in the same conclusion as the t-test if the critical value from the standard normal distribution is appropriately adjusted, e.g. to 10% rather than 5%. The unit of measurement of the t-statistic is standard deviations.

5) You have analyzed the relationship between the weight and height of individuals. Although you are quite confident about the accuracy of your measurements, you feel that some of the observations are extreme, say, two standard deviations above and below the mean. You therefore decide to disregard these individuals. What consequence will this have on the standard deviation of the OLS estimator of the slope?

Answer: Other things being equal, the standard error of the slope coefficient will decrease the larger the variation in X. Hence you prefer more variation rather than less. Disregarding the extreme observations reduces the variation in X and therefore increases the standard deviation of the OLS estimator of the slope. This is easier to see in the case of homoskedasticity-only standard errors, but carries over to the heteroskedasticity-robust standard errors. Intuitively it is easier for OLS to detect a response to a unit change in X if the data varies more.

6) In order to calculate the regression $R^2$ you need the TSS and either the SSR or the ESS. The TSS is fairly straightforward to calculate, being just the variation of Y. However, if you had to calculate the SSR or ESS by hand (or in a spreadsheet), you would need all fitted values from the regression function and their deviations from the sample mean, or the residuals. Can you think of a quicker way to calculate the ESS simply using terms you have already used to calculate the slope coefficient?

Answer: The ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$, and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$. The right-hand side contains the estimated slope squared and the denominator of the slope, i.e., all values that have already been calculated.

7) (Requires Appendix Material) In deriving the OLS estimator, you minimize the sum of squared residuals with respect to the two parameters $\hat{\beta}_0$ and $\hat{\beta}_1$. The resulting two equations imply two restrictions that OLS places on the data, namely that $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. Show that you get the same formula for the regression slope and the intercept if you impose these two conditions on the sample regression function.

Answer: The sample regression function is $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$. Summing both sides results in

$$\sum_{i=1}^{n} Y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} \hat{u}_i.$$

Imposing the first restriction, namely that the sum of the residuals is zero, dividing both sides of the equation by n, and solving for $\hat{\beta}_0$ gives the OLS formula for the intercept.

For the second restriction, multiply both sides of the sample regression function by $X_i$ and then sum both sides to get

$$\sum_{i=1}^{n} Y_i X_i = \hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \hat{u}_i X_i.$$

After imposing the restriction $\sum_{i=1}^{n} \hat{u}_i X_i = 0$ and substituting the formula for the intercept, you get

$$\sum_{i=1}^{n} Y_i X_i = (\bar{Y} - \hat{\beta}_1 \bar{X}) n \bar{X} + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 \quad \text{or} \quad \sum_{i=1}^{n} Y_i X_i - n \bar{Y} \bar{X} = \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 - \hat{\beta}_1 n \bar{X}^2,$$

which, after isolating $\hat{\beta}_1$ and dividing by the variation in $X$, results in the OLS estimator for the slope.
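As a numerical illustration of this derivation, the sketch below computes the slope and intercept from the formulas obtained by imposing the two restrictions, and then confirms that the resulting residuals do sum to zero and are orthogonal to X. The data are hypothetical.

```python
x = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]     # hypothetical regressor values
y = [11.0, 15.5, 14.0, 21.0, 24.5, 23.0]    # hypothetical dependent variable

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Slope from the second restriction, intercept from the first.
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar

residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
sum_u = sum(residuals)                                   # first restriction
sum_ux = sum(ui * xi for ui, xi in zip(residuals, x))    # second restriction
print(sum_u, sum_ux)   # both are zero up to floating-point rounding
```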

8) (Requires Appendix Material) Show that the two alternative formulae for the slope given in your textbook are identical.

$$\frac{\frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar{X} \bar{Y}}{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

In addition, the help function for a commonly used spreadsheet program gives the following definition for the regression slope it estimates:

$$\frac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$$

Prove that this formula is also the same as those given above.

Answer: Let’s start with the first equality. The numerator of the right-hand side expression can be written as follows:

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} (X_i Y_i - \bar{X} Y_i - \bar{Y} X_i + \bar{X} \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \bar{X} \sum_{i=1}^{n} Y_i - \bar{Y} \sum_{i=1}^{n} X_i + n \bar{Y} \bar{X}$$

$$= \sum_{i=1}^{n} Y_i X_i - n \bar{X} \bar{Y} - n \bar{X} \bar{Y} + n \bar{X} \bar{Y} = \sum_{i=1}^{n} Y_i X_i - n \bar{X} \bar{Y}.$$

(Note that $\sum_{i=1}^{n} X_i = n \bar{X}$.)

Multiplying out the terms in the denominator and moving the summation sign into the expression in parentheses similarly yields $\sum_{i=1}^{n} X_i^2 - n \bar{X}^2$.

Dividing both of these expressions by n then results in the left-hand side fraction.

Finally,

$$\frac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2} = \frac{n \sum_{i=1}^{n} X_i Y_i - n \bar{X} n \bar{Y}}{n \sum_{i=1}^{n} X_i^2 - (n \bar{X})^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}.$$

Dividing both numerator and denominator by n then gives you the desired result.
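The equivalence of the three slope formulas can be confirmed on any data set. A minimal sketch with made-up numbers; the variable names `slope_moments`, `slope_deviations`, and `slope_spreadsheet` are labels chosen here for the three expressions.

```python
x = [2.0, 4.0, 5.0, 7.0, 9.0]    # hypothetical data
y = [1.5, 3.0, 4.5, 5.0, 8.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)

# Formula in terms of sample moments (left-hand side of the first equality).
slope_moments = (sum_xy / n - x_bar * y_bar) / (sum_xx / n - x_bar ** 2)

# Formula in terms of deviations from means (right-hand side).
slope_deviations = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Formula quoted from the spreadsheet help function.
slope_spreadsheet = (n * sum_xy - sum(x) * sum(y)) / (n * sum_xx - sum(x) ** 2)

print(slope_moments, slope_deviations, slope_spreadsheet)
```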

9) (Requires Calculus) Consider the following model:

$Y_i = \beta_0 + u_i$. Derive the OLS estimator for $\beta_0$.

Answer: To derive the OLS estimator, minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_0)^2$. Taking the derivative with respect to $b_0$ results in

$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial b_0} (Y_i - b_0)^2 = \sum_{i=1}^{n} 2 (Y_i - b_0)(-1) = (-2) \sum_{i=1}^{n} (Y_i - b_0) = (-2) \left( \sum_{i=1}^{n} Y_i - n b_0 \right).$$

Setting the derivative to zero then results in the OLS estimator:

$$(-2) \left( \sum_{i=1}^{n} Y_i - n \hat{\beta}_0 \right) = 0 \Rightarrow \hat{\beta}_0 = \bar{Y}.$$

10) (Requires Calculus) Consider the following model:

$Y_i = \beta_1 X_i + u_i$.

Derive the OLS estimator for $\beta_1$.

Answer: To derive the OLS estimator, minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_1 X_i)^2$. Taking the derivative with respect to $b_1$ results in

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial b_1} (Y_i - b_1 X_i)^2 = \sum_{i=1}^{n} 2 (Y_i - b_1 X_i)(-X_i)$$

$$= (-2) \sum_{i=1}^{n} (Y_i - b_1 X_i)(X_i) = (-2) \left( \sum_{i=1}^{n} Y_i X_i - b_1 \sum_{i=1}^{n} X_i^2 \right).$$

Setting the derivative to zero then results in the OLS estimator:

$$(-2) \left( \sum_{i=1}^{n} Y_i X_i - \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 \right) = 0 \Rightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}.$$
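A quick numerical check of the estimator for the model without an intercept: at $\hat{\beta}_1 = \sum Y_i X_i / \sum X_i^2$ the first-order condition holds and nearby slope values give a larger sum of squared prediction mistakes. The data and the helper name `ssr` are made up for illustration.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
y = [2.2, 3.9, 6.1, 8.3, 9.8]

def ssr(b1):
    """Sum of squared prediction mistakes for a line through the origin."""
    return sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Closed-form estimator for regression through the origin.
b1_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# First-order condition: -2 * sum((Y_i - b1 * X_i) * X_i) vanishes at b1_hat.
derivative = -2 * sum((yi - b1_hat * xi) * xi for xi, yi in zip(x, y))

print(b1_hat, derivative)
print(ssr(b1_hat) <= ssr(b1_hat - 0.01), ssr(b1_hat) <= ssr(b1_hat + 0.01))
```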

11) Show first that the regression $R^2$ is the square of the sample correlation coefficient. Next, show that the slope of a simple regression of Y on X is only identical to the inverse of the regression slope of X on Y if the regression $R^2$ equals one.

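Answer: The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$, and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$. Using small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, we get that the regression

$$R^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$

The square of the correlation coefficient is

$$r^2 = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2 \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2 \sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$

Hence the two are the same.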

Now

$$r^2 = \hat{\beta}_1^2 \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} = 1 \Rightarrow \hat{\beta}_1^2 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2}.$$

But $\hat{\beta}_1^2 = \hat{\beta}_1 \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$ and therefore

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i y_i},$$

which is the inverse of the regression slope of X on Y.

12) Consider the sample regression function

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$.

First, take averages on both sides of the equation. Second, subtract the resulting equation from the above equation to write the sample regression function in deviations from means. (For simplicity, you may want to use small letters to indicate deviations from the mean, i.e., $z_i = Z_i - \bar{Z}$.) Finally, illustrate in a two-dimensional diagram with SSR on the vertical axis and the regression slope on the horizontal axis how you could find the least squares estimator for the slope by varying its values through trial and error.

Answer: Taking averages results in $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$, and subtracting this equation from the above one, we get $y_i = \hat{\beta}_1 x_i + \hat{u}_i$.

$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_1 x_i)^2$ is a quadratic which takes on different values for different choices of $\hat{\beta}_1$ (the y and x are given in this case, i.e., different from the usual calculus problems, they cannot vary here). You could choose a starting value of the slope and calculate SSR. Next you could choose a different value for the slope and calculate the new SSR. There are two choices for you to make about the new slope value: first, in which direction you want to move, and second, how far the new slope value should be from the old one. (In essence, this is what sophisticated search algorithms do.) You continue with this procedure until you find the smallest SSR. The slope coefficient which has generated this SSR is the OLS estimator.
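The trial-and-error procedure described in this answer can be sketched as a simple step search. This is only one possible search rule, chosen here for illustration (the starting value, initial step size, and stopping tolerance are arbitrary assumptions), and the data are made up.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # hypothetical data
y = [1.8, 4.1, 5.9, 8.2, 9.9, 12.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
xd = [xi - x_bar for xi in x]           # deviations from means
yd = [yi - y_bar for yi in y]

def ssr(b1):
    """Sum of squared residuals for a given slope, in deviations from means."""
    return sum((yi - b1 * xi) ** 2 for xi, yi in zip(xd, yd))

slope, step = 0.0, 1.0                  # arbitrary starting value and step size
while step > 1e-6:
    if ssr(slope + step) < ssr(slope):
        slope += step                   # moving right lowers SSR
    elif ssr(slope - step) < ssr(slope):
        slope -= step                   # moving left lowers SSR
    else:
        step /= 2                       # neither direction helps: shrink the step

# Closed-form OLS slope for comparison.
ols_slope = sum(xi * yi for xi, yi in zip(xd, yd)) / sum(xi ** 2 for xi in xd)
print(slope, ols_slope)
```

Because SSR is a quadratic in the slope, it has a single minimum, so this kind of search cannot get stuck at a wrong value.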

13) Carefully discuss the advantages of using heteroskedasticity-robust standard errors over standard errors calculated under the assumption of homoskedasticity. Give at least five examples where it is very plausible to assume that the errors display heteroskedasticity.

Answer: There are virtually no examples where economic theory suggests that the errors are homoskedastic. Hence the maintained hypothesis should be that they are heteroskedastic. Using homoskedasticity-only standard errors when in truth heteroskedasticity-robust standard errors should be used results in false inference. What makes this worse is that homoskedasticity-only standard errors are typically smaller than heteroskedasticity-robust standard errors, resulting in t-statistics that are too large, and hence rejection of the null hypothesis too often. There is an alternative GLS estimator, weighted least squares, which is BLUE, but requires knowledge of how the error variance depends on X, e.g., whether it is proportional to $X$ or $X^2$. Answers will vary by student regarding the examples, but earnings functions, cross country beta-convergence regressions, consumption functions, sports regressions involving teams from markets with varying population size, weight-height relationships for children, etc., are all good candidates.

14) The effect of decreasing the student-teacher ratio by one is estimated to result in an improvement of the districtwide score by 2.28 with a standard error of 0.52. Construct a 90% and 99% confidence interval for the size of the slope coefficient and the corresponding predicted effect of changing the student-teacher ratio by one. What is the intuition on why the 99% confidence interval is wider than the 90% confidence interval?

Answer: The 90% confidence interval for the slope is calculated as follows:

(2.28 – 1.645×0.52, 2.28 + 1.645×0.52) = (1.42, 3.14).

The corresponding predicted effect of a unit change in the student-teacher ratio is the same, since the change in X is 1. The 99% confidence interval for the slope coefficient and the unit change in the student-teacher ratio is:

(2.28 – 2.58×0.52, 2.28 + 2.58×0.52) = (0.94, 3.62).

The 99% confidence interval corresponds to a smaller size of the test. This means that you want to be “more certain” that the population parameter is contained in the interval, and that requires a larger interval.
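The two intervals above can be reproduced with a few lines of code. The sketch assumes the large-sample normal approximation, as the answer does; the function name `confidence_interval` is illustrative.

```python
from statistics import NormalDist

beta_hat, se = 2.28, 0.52   # estimate and standard error from the question

def confidence_interval(level):
    """Large-sample (normal) confidence interval for the slope."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.645 for 90%
    return beta_hat - z * se, beta_hat + z * se

ci90 = confidence_interval(0.90)
ci99 = confidence_interval(0.99)
print([round(v, 2) for v in ci90])
print([round(v, 2) for v in ci99])
```

The critical value rises from about 1.645 to about 2.58 as the confidence level rises from 90% to 99%, which is what widens the interval.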

15) Given the amount of money and effort that you have spent on your education, you wonder if it was (is) all worth it. You therefore collect data from the Current Population Survey (CPS) and estimate a linear relationship between earnings and the years of education of individuals. What would be the effect on your regression slope and intercept if you measured earnings in thousands of dollars rather than in dollars? Would the regression $R^2$ be affected? Should statistical inference be dependent on the scale of variables? Discuss.

Answer: It should be clear that interpretation of estimated relationships and statistical inference should not depend on the units of measurement. Otherwise whim could dictate conclusions. Hence the regression $R^2$ and statistical inference cannot be affected. It is easy but tedious to show this mathematically. Next, the intercept indicates the value of Y when X is zero, and the slope indicates the change in Y for a one-year increase in education. Since it is earnings, the dependent variable, whose units change here, both the intercept and the slope are divided by 1,000, i.e., the decimal point moves 3 digits to the left. (Had the units of X been changed instead, the intercept would be unaffected, since the change in X is cancelled by the compensating change in $\hat{\beta}_1$, and only the slope would change.)
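A small numerical illustration of the rescaling in the question, where earnings (the dependent variable) is converted from dollars to thousands of dollars. The data are invented and are not CPS data; `ols` is an illustrative helper.

```python
def ols(x, y):
    """Return (intercept, slope, R^2) for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

educ = [10, 12, 12, 14, 16, 16, 18, 20]                           # hypothetical years of education
earn = [21000, 25500, 30000, 33000, 41000, 38500, 52000, 58000]   # hypothetical earnings in dollars
earn_k = [e / 1000 for e in earn]                                 # same earnings in thousands

a_d, b_d, r2_d = ols(educ, earn)
a_k, b_k, r2_k = ols(educ, earn_k)

print(a_d, a_k)      # intercept is divided by 1000
print(b_d, b_k)      # slope is divided by 1000
print(r2_d, r2_k)    # R^2 is unchanged
```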

16) (Requires Appendix Material) Consider the sample regression function

$Y_i^* = \hat{\gamma}_0 + \hat{\gamma}_1 X_i^* + \hat{u}_i$,

where “*” indicates that the variable has been standardized. What are the units of measurement for the dependent and explanatory variable? Why would you want to transform both variables in this way? Show that the OLS estimator for the intercept equals zero. Next prove that the OLS estimator for the slope in this case is identical to the formula for the least squares estimator where the variables have not been standardized, times the ratio of the sample standard deviation of X and Y, i.e., $\hat{\gamma}_1 = \hat{\beta}_1 \times \frac{s_X}{s_Y}$.

Answer: The units of measurement are in standard deviations. Standardizing the variables allows conversion into common units and allows comparison of the size of coefficients. The mean of standardized variables is zero, and hence the OLS intercept must also be zero. The slope coefficient is given by the formula

$$\hat{\gamma}_1 = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}},$$

where small letters indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$.

Note that means of standardized variables are zero, and hence we get

$$\hat{\gamma}_1 = \frac{\sum_{i=1}^{n} X_i^* Y_i^*}{\sum_{i=1}^{n} X_i^{*2}}.$$

Writing this expression in terms of originally observed variables results in

$$\hat{\gamma}_1 = \frac{\frac{1}{s_X s_Y} \sum_{i=1}^{n} x_i y_i}{\frac{1}{s_X^2} \sum_{i=1}^{n} x_i^2},$$

which is the same as the sought after expression after simplification.
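The result can be checked numerically: regress the standardized variables on each other and compare the slope with $\hat{\beta}_1 s_X / s_Y$. The sketch uses made-up data and the sample standard deviation from Python's `statistics` module; the helper names are illustrative.

```python
from statistics import mean, stdev

x = [3.0, 5.0, 6.0, 8.0, 11.0, 12.0]      # hypothetical data
y = [10.0, 14.0, 13.0, 19.0, 24.0, 23.0]

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ols(x, y):
    """Return (intercept, slope) for a simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

g0, g1 = ols(standardize(x), standardize(y))   # regression in standardized units
_, b1 = ols(x, y)                              # regression in original units

print(g0)                                 # zero up to rounding
print(g1, b1 * stdev(x) / stdev(y))       # the two slope expressions agree
```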

17) The OLS slope estimator is not defined if there is no variation in the data for the explanatory variable. You are interested in estimating a regression relating earnings to years of schooling. Imagine that you had collected data on earnings for different individuals, but that all these individuals had completed a college education (16 years of education). Sketch what the data would look like and explain intuitively why the OLS coefficient does not exist in this situation.

Answer: There is no variation in X in this case, and it is therefore unreasonable to ask by how much Y would change if X changed by one unit. Regression analysis cannot figure out the answer to this question, because a change in X never happens in the sample.

[Figure: scatterplot with Earnings on the vertical axis and Years of Education on the horizontal axis; all observations (marked X) lie on a vertical line at 16 years of education.]



18) Indicate in a scatterplot what the data for your dependent variable and your explanatory variable would look like in a regression with an R2 equal to zero. How would this change if the regression R2 was equal to one?

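Answer: [Figures not reproduced. With an $R^2$ of zero the scatterplot shows no linear relationship between the two variables, so the fitted regression line is flat. With an $R^2$ of one all observations lie exactly on a sloped straight line.]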

19) Imagine that you had discovered a relationship that would generate a scatterplot very similar to the relationship $Y_i = X_i^2$, and that you would try to fit a linear regression through your data points. What do you expect the slope coefficient to be? What do you think the value of your regression $R^2$ is in this situation? What are the implications from your answers in terms of fitting a linear regression through a non-linear relationship?

Answer: Provided the X values are spread symmetrically around zero, you would expect the fitted regression line to be flat (slope = 0) and the regression $R^2$ to be zero in this situation. The implication is that although there may be a relationship between two variables, you may not detect it if you use the wrong functional form.
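A minimal sketch of this point, assuming X values that are symmetric around zero (an assumption made here so that the result is exact): Y is an exact function of X, yet the linear fit finds no relationship.

```python
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # symmetric around zero (assumption)
y = [xi ** 2 for xi in x]                     # exact quadratic relationship

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx
r2 = sxy ** 2 / (sxx * syy)
print(slope, r2)   # both are zero although Y is fully determined by X
```

If the X values were all positive instead, the linear fit would pick up an upward slope and a positive $R^2$, so the zero result depends on the symmetry assumed here.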

20) (Requires Appendix Material) A necessary and sufficient condition to derive the OLS estimator is that the following two conditions hold: $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. Show that these conditions imply that $\sum_{i=1}^{n} \hat{u}_i \hat{Y}_i = 0$.

Answer: $\sum_{i=1}^{n} \hat{u}_i \hat{Y}_i = \sum_{i=1}^{n} \hat{u}_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) = \hat{\beta}_0 \sum_{i=1}^{n} \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^{n} \hat{u}_i X_i = 0$
