part v further topics in linear regressionlipas.uwasa.fi/~sjp/teaching/ecm/lectures/ecmc5.pdf ·...

Further Topics in Linear Regression

Part V


As of Oct 8, 2020Seppo Pynnonen Econometrics I


Large Sample Properties of OLS Estimators1 Further Topics in Linear Regression

Large Sample Properties of OLS Estimators

Consistency

Asymptotic Efficiency

Asymptotic Normality and Large Sample Inference

Other Large Sample Tests: The Lagrange Multiplier Test

Effects of Data scaling on OLS Statistics

Beta Coefficients

Prediction

Confidence Intervals for Predictions

Confidence Interval for the Fitted Regression Equation

Predicting y from log(y) specification

Functional Form of Regression

Using logarithmic transformations

Quadratic Terms

Interaction terms

Seppo Pynnonen Econometrics I



The Law of Large Numbers (LLN)

Let x1, . . . , xn be independent random variables with E[xi ] = µ andvar[xi ] = σ2 <∞, then

xn =1

n

n∑i=1

xiP→ µ, (1)

where ”P→” means that

limn→∞

P(|xn − µ| > ε)→ 0 (2)

for any ε > 0.

xnP→ µ is denoted usually as

plim xn = µ (3)

”plim” standing for ”probability limit”.




Central Limit Theorem (CLT)

Let x1, . . . , xn be independent and identically distributed (iid) randomvariables withE[xi ] = µ and var[xi ] = σ2 <∞. Then

zn =(xn − µ)

σ/√n

a∼ N(0, 1) (4)

as n→∞, where ”a∼” means that the distribution of zn approaches to

the standard normal distribution.

The notationa∼ stands for the phrase ”the asymptotic distribution

of zn is N(0, 1)”.




1 Further Topics in Linear Regression


Consistency





Beta Coefficients

Prediction






Quadratic Terms

Interaction termsSeppo Pynnonen Econometrics I



Theorem 1

Under the classical assumptions 1–5 the OLS estimators are consistent.That is,

plim βj = βj . (5)

Consistency of an estimator is an important and desirable property.




Example 1

The following (Monte Carlo) simulation example demonstratesconvergence of the slope coefficient estimate β1 of the simple regression

y = β0 + β1x + u

as sample size n grows from 10 to 1000 in steps of 10 observations.

The population values are: β0 = 1 and β1 = 0.5; x is generated from

N(5, 3) and the error term u is N(0, 2).




Example 1 continues

+

+

+

++

+

+

+

++++

++++++

+

+

++++++

+++

+

+

+

+

+

+

+

++++

++

+

+

+++

+++

+

+

++++

++++

+++

++++

++++

+

+++

++++

+

+++++

++++++

+++++++++

0 200 400 600 800 1000

0.00.2

0.40.6

0.81.0

Convegence of β1 OLS estimate of y = β0 + β1x + u

Sample size

Estim

ate of

β 1

0.5

95% probability limits

β0 = 1, β1 = 0.5x ~ N(µx, σx

2)u ~ N(0, σu

2)µx = 5, σx

2 = 3µu = 0, σu

2 = 2




Remark 1

Correlation of the error term u with any of the explanatory variablesx1, . . . , xk in regression

y = β0 + β1x1 + · · ·+ βkxk + u

implies that OLS estimators βi are both biased and inconsistent!






Consistency





Beta Coefficients

Prediction






Quadratic Terms




Recall that the variance of OLS estimator βj is of the form

var[β] =σ2u

(1− R2j )SSTj

,

where R2j is the R-square from regression of xj on all other

x-variables and

SSTj =n∑

i=1

(xij − xj)2.

It can be shown that as n→∞, R2j → ρ2

j and SSTj/n→ σ2j ,

where ρ2j and σ2

j are population R-square from regressing xj on the

other explanatory variables and σ2j is the population variance of xj .




Then under the classical assumptions

s.e(βj) ≈ cj/√n, (6)

wherecj =

σu

σj√

1− ρ2j

,

which is constant (does not depend on n).

Thus, equation (6) shows that the standard error of the OLSestimator βj converges to zero at rate inverse to the square root ofsample size.

Furthermore it can be shown that (under the classicalassumptions) the OLS estimator βj is asymptotically efficientamong fairly general class of consistent estimators.

That is, among this class the OLS estimator has the smallestvariance.






Consistency





Beta Coefficients

Prediction






Quadratic Terms




If the error terms u1, . . . , un in the regression are not normal, i.e.,Assumption 6 does not hold, then the distribution of βj is not anymore (exactly) normal. This implies that the normal theoryinference is not any more exactly valid.

However, because the OLS estimators are weighted sums ofrandom variables, the CLT applies and we have the importantresult that the OLS estimators are in any caseasymptotically normal (if assumptions 1–5 are in effect).




As a resultβj − βjσβj

a∼ N(0, 1) (7)

andβj − βjsβj

a∼ N(0, 1). (8)

In the same manner the F -test discussed in Chapter 4 are alsoasymptotic.

Test statistics relying on asymptotic distribution results arecommonly called asymptotic or large sample test statistics.




In summary, for the regression model

yi = β0 + β1xi1 + · · ·+ βkxik + ui , i = 1, . . . n,

Theorem 2

Under the Gauss-Markov assumptions 1–5 (Classical assumptions),

(i) plim βj = βj , i.e., the OLS regression coefficients are consistent estimators of the

population coefficients, βj , j = 0, 1, . . . k.

(ii) plim σ2u = σ2

u, i.e., the OLS residual variance σ2u is a consistent estimator of the

error variance σ2u = var[u].

(ii) For each j = 0, 1, . . . , k

βj − βjσβj

a∼ N(0, 1),

where σβj= se(βj) is the OLS standard error of βj .






Consistency





Beta Coefficients

Prediction






Quadratic Terms




In statistics there are three kinds of all purpose large samplestatistics: The Likelihood Ratio (LR), the Wald (W), and theLagrange Multiplier (LM) test statistics. These can be used intesting complicated restrictions on parameters.

Under the null hypothesis the asymptotic distribution for eachstatistic is χ2

q with q degrees of freedom, where q equals thenumber of imposed restrictions.

Asymptotically all these test lead to the same result, but in finitesamples they may differ.

Modern econometric packages have these statistics available.




The LR test statistic is based on comparison of the estimationresults of the restricted model (restrictions imposed by the nullhypothesis, H0) and the results of the unrestricted model (H1)

The Wald test statistic is based on computing theundresticted model estimates and comparing them directlywith the restriction imposed by the null hypothesis H0.

The LM test statistic is based on estimating the model underthe constraint of the null hypothesis H0 and evaluating (interms of derivatives) how close the criterion function with therestrictions is to the global maximum (i.e., when norestrictions are imposed).




In particular the LM statistic has achieved some popularity inmodern econometrics, often referred to as n-R-squared statisticdue to the reason that for example testing for the null hypothesiswhether, say, the last q population parameters in regression

y = β0 + β1x1 + · · ·+ βkxk + u (9)

are zero, i.e., the null hypothesis

H0 : βk−q+1 = 0, βk−q+2 = 0, . . . , βk = 0 (10)

against

H1 : at least one βj 6= 0, j = k − q + 1, . . . , k. (11)




For the LM statistic

(i) Regress y on x1, . . . , xk−q (restricted model), save theresiduals.

(ii) Regress the residuals from the restricted model on all thex-variables and obtain the R-square, denote it R2

u .

(iii) Compute the LM statistic

LM = nR2u (n of observations times the R-square) (12)

(iv) Reject the null hypothesis in (10) if LM = nR2u > cα, where

cα is the α level threshold from the χ2 distribution with qdegrees of freedom (alternatively compute the p-value of nR2

and reject the null hypothesis if the p-value is smaller then α).




Example 2 (Exercise C3, Ch 5, W 5ed)

Consider data HTV.RAW and the following wage equation

log(wage) = β0 + β1educ + β2exper + β3motheduc + β4fatheduc + u(13)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.173620 0.182574 0.951 0.341816

educ 0.115893 0.009513 12.183 < 2e-16 ***

exper 0.034192 0.006746 5.069 4.63e-07 ***

motheduc 0.008330 0.008670 0.961 0.336841

fatheduc 0.020931 0.006024 3.475 0.000529 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.535 on 1225 degrees of freedom

Multiple R-squared: 0.1908,Adjusted R-squared: 0.1881

F-statistic: 72.2 on 4 and 1225 DF, p-value: < 2.2e-16




(i) Test the hypothesisH0 : β3 = 0, β4 = 0 (14)

obtain the LM test.

Estimating first the restricted model, i.e., log(wage) on educ andexper, we getCoefficients:


(Intercept) 0.372975 0.175655 2.123 0.0339 *

educ 0.130245 0.008964 14.530 < 2e-16 ***

exper 0.031911 0.006781 4.706 2.81e-06 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1







To obtain the LM test, regress the residuals of the above regression on allexplanatory variables yields

Coefficients:


(Intercept) -0.199356 0.182574 -1.092 0.275085

educ -0.014352 0.009513 -1.509 0.131622

exper 0.002281 0.006746 0.338 0.735334

motheduc 0.008330 0.008670 0.961 0.336841

fatheduc 0.020931 0.006024 3.475 0.000529

---



F-statistic: 5.432 on 4 and 1225 DF, p-value: 0.0002457

from which LM = nR2 = 1230× 0.01743 ≈ 21.4 with 2 degrees of freedom.

(ii) Obtain the (asymptotic) p-value.

The asymptotic p-value for χ2 = 21.4 with 2 degrees of freedom is

0.00002. Thus, the null hypothesis in (14) is reject at every

conventional significance level, and we can conclude that parents

income affects adolescent’s income (father’s in particular according to

the above regression) .






Consistency





Beta Coefficients

Prediction






Quadratic Terms




We have earlier discussed scaling and adding a constant to y and xvariable in the simple regression.

The same rules apply here too. Here we discuss onlystandardization of variables.






Consistency





Beta Coefficients

Prediction






Quadratic Terms




Sometimes it is desirable to standardize all the variables, such that

y∗i =yi − y

sy(15)

and

x∗ij =xij − xj

sj, (16)

where sy and sj are the standard deviations of y and xj ,respectively, j = 1, . . . , k .

Theny∗i = β∗1x

∗i1 + · · ·+ β∗kx

∗ik , (17)

whereβ∗j =

sjsyβj . (18)




β∗j s are called standardized coefficientsor beta coefficients.

Remark 2

t-statistics R-squares etc do not change in standardization.

The interpretation of the standardized coefficients is that β∗jindicates the change in standardized value of y as xj changes byone standard deviation, ceteris paribus.



Prediction



Consistency





Beta Coefficients

Prediction






Quadratic Terms



Prediction

Estimated model

y = β0 + β1x1 + β2x2 + · · ·+ βkxk . (19)

Let a1, a2, . . . , ak be given values forx1, x2, . . . , xk , then the predicted value of y is

ypred = β0 + β1a1 + · · ·+ βkak . (20)

Let ytrue denote the true value of y , the prediction error then isupred = ytrue − ypred. Because

ytrue = β0 + β1a1 + · · ·+ βkak + u, (21)

the prediction error can be written as

ytrue − ypred = (β0 − β0) + (β1 − β1)a1 + · · ·+ (βk − βk)ak + u (22)



Prediction

As a consequence the prediction comes in one hand from theestimation error of the parameter and on the other hand form therandom tern u.

Because βjs are unbiased and E[u] = 0,

E[ypred] = E[ytrue]. (23)

However, the estimation error affect the variance, which for theprediction is

var[upred] = var[ypred] + var[u] (24)



Prediction

Matrix notations simplify deriving the variances. The predictionequation is

ypred = x′b, (25)

where x = (1, a1, . . . , ak)′, andb = (β0, β1, . . . , βk)′. The

var[ypred] = σ2ux′(X′X)−1x, (26)

where σ2u = var[u].

Thus (24) becomes

σ2upred

= var[upred] = σ2u(1 + x′(X′X)−1x). (27)

with estimator

s2upred

= σ2u(1 + x′(X′X)−1x). (28)



Prediction

The standard error of the prediction is

supred =√

s2upred

. (29)

Finally a 100(1− α)% confidence interval for the prediction is

ypred ± cα/2supred , (30)

where cα/2 is the t-distribution table with n − k − 1 degrees offreedom such thatP(T > cα/2) = α/2.



Prediction



Consistency





Beta Coefficients

Prediction






Quadratic Terms



Prediction

Variance of the fitted regression line (equation) is

σ2y = var[y ] = σ2

ux′(X′X)−1x, (31)

with estimator

s2y = var[y ] = σ2

ux′(X′X)−1x. (32)

Thus a 100(1− α)% confidence interval for the fittedregression equation

y ± cα/2sy . (33)



Prediction



Consistency





Beta Coefficients

Prediction






Quadratic Terms



Prediction

Consider the model

log(y) = β0 + β1x1 + · · ·+ βkxk + u. (34)

Estimated model

log(y) = β0 + β1x1 + · · ·+ βkxk . (35)

A natural prediction for y would be

y = e log(y).

This, however, systematically underestimates the expected value ofy .



Prediction

This is because, if u ∼ N(0, σ2u), E[eu] = e

12σ2u .

Then given x-values this implies

E[y |x] = E[e log(y)|x

]= ex

′bE[eu|x] = e12σ2uex

′b, (36)

where b = (β0, β1, . . . , βk)′ andx = (1, x1, . . . , xk)′.

Thus an appropriate predictor for y is

y = e σ2u/2e log(y). (37)



Prediction

If the normality of u does not hold then let E[eu] = α0 and (36)becomes

E[y |x] = α0elog(y) = α0e

x′b, (38)

where α0 is an unknown parameter.

It turns out that a consistent estimator of α0 is found as follows:

(1) Obtain the fitted value of log(y)i

(2) For each observation i , create mi = e log(y)i

(3) Regress y on mi without an intercept, and use the estimatedregression coefficient as an estimate of α0.



Prediction

Example 3

Consider the wage example. The model to be estimated is

log(wage) = β0 + β1educ + β2exper + β3tenure + u.

Estimating the parameters, generating the mi series, and estimatingregression

wage = α0m + v ,

where v is an error term, produces α0 = 1.1228. Note that

eσ2u/2 = e(0.440862)2/2 ≈ 1.1021, which is slightly smaller than α0.






Consistency





Beta Coefficients

Prediction






Quadratic Terms




In economic applications most nonlinear relationships betweenexplained explanatory variables are are worked out by takinglogarithms or containing quadratics of exlanatory variables in themodel.

Consider the model of an earlier example

log(price) = β0 + β1 log(nox) + β2rooms + u. (39)

Coefficient β1 is the elasticity of price with respect to pollution(nox), while 100× β2 is approximately the percentage change inprice when the rooms increases by one.




Example 4

Using Wooldridge’s data set hprice2.xls we get the following estimates for(39)

Dependent Variable: LOG(PRICE)Method: Least SquaresDate: 10/10/06 Time: 00:15Sample: 1 506Included observations: 506

Variable Coefficient Std. Error t-Statistic Prob.

C 9.233738 0.187741 49.18350 0.0000LOG(NOX) -0.717673 0.066340 -10.81816 0.0000

ROOMS 0.305918 0.019017 16.08626 0.0000

R-squared 0.513717 Mean dependent var 9.941057Adjusted R-squared 0.511784 S.D. dependent var 0.409255S.E. of regression 0.285956 Akaike info criterion 0.339957Sum squared resid 41.13085 Schwarz criterion 0.365015Log likelihood -83.00902 F-statistic 265.6890Durbin-Watson stat 0.603290 Prob(F-statistic) 0.000000




Example 4 continues

Thus the estimate model is

log(price) = 9.234 − 0.718 log(nox) + 0.306 rooms(0.188) (0.066) (0.019)

R2 = 0.514n = 506,

where standard errors are in parentheses

When nox increases by 1%, price falls by 0.718% (holding rooms fixed).

When number of rooms increases by one, price increases by

approximately 100× 0.306 = 30.6%.




Remark 3

Approximation %∆y ≈ 100×∆ log y becomes inaccurate when thechange in log y becomes large. Generally if we have an estimated model

log y = β0 + β1 log(x1) + β2x2, (40)

fixing x1, we have ∆ log y = β2∆x2, from which we get the exactpercentage change as

%∆y = 100×(

exp(β2∆x2)− 1). (41)




Example 5 (Example 4 continued)

In Example 4 we have ∆x2 = 1 and β2 = 0.305918. Thus we get

%∆y = 100× (exp(0.305918)− 1) ≈ 35.8%,

which is notably larger than the approximate change 30.6%.




Advantages of using log transformations:

interpretation (elasticity, perentage change)

changing scale does not change slope coefficients

if y > 0 log-transformation usually make variables closer tonormality

Usually log-transformations are quite routinely taken for series thatare positive monetary values (wages, salaries, firm sales, firmmarket values, stock indices, etc.) also logs are often taken fromvariables measuring population, total number of employees, etcthat are usually large integers.

Variables measured in years (education, experience, tenure, age,etc) are usually used in their original form.

Remark 4

Log-transformations cannot be used if a variable takes zero or negative values!




Exploring further (W 5ed Exploring Further 6.2, p. 185)

Suppose that the annual number of drunk driving arrests is determined by

log(arrests) = β0 + β1 log(pop) + β2 age16 25 + other factors,

where age16 25 is the proportion of population between 16 and 25 years

of age. Show that β2 has the following (ceteris paribus) interpretation: it

is the percentage of the people aged 16 to 25 increases by one

percentage point.






Consistency





Beta Coefficients

Prediction






Quadratic Terms




Consider modely = β0 + β1x + β2x

2 + u, (42)

where x2 is the quadratic term.

The interpretation of the model changes. β1 does not any moremeasure the change in y when x changes by one unit (i.e., β1 isnot any more the slope coefficient).

To see this, write the estimated model

y = β0 + β1x + β2x2, (43)

then approximately

∆y ≈ (β1 + 2β2x)∆x , (44)

so that the slope is∆y

∆x≈ β1 + 2β2x . (45)




Example 6

Consider the wage example and estimate the model (data: wage1.xls)

log(wage) = β0 + β1 exper + β2 exper2 + u.

Estimating with EViews, we obtain:

Dependent Variable: LOG(WAGE)

Method: Least Squares

Sample: 1 526

Included observations: 526

===============================================================

Variable Coefficient Std. Error t-Stat Prob.

---------------------------------------------------------------

C 1.295291 0.049481 26.17746 0.0000

EXPER 0.045534 0.005859 7.771027 0.0000

EXPER^2 -0.000944 0.000129 -7.312008 0.0000

===============================================================

R-squared 0.104001 Mean dependent var 1.623268

Adjusted R-squared 0.100574 S.D. dependent var 0.531538

S.E. of regression 0.504101 Akaike info criterion 1.473605

Sum squared resid 132.9034 Schwarz criterion 1.497932

Log likelihood -384.5581 F-statistic 30.35285

Durbin-Watson stat 1.795529 Prob(F-statistic) 0.000000

===============================================================




Example 6 continues

Thus, the estimated equation is

log(wage) = 1.30 + 0.046 exper − 0.000944 exper2

(0.049) (0.0059) (0.000129)

R2 = 0.104n = 526

Experience (exper) has a diminishing effect on wage. The first year increases wageabout by 4.6%, the sencond year [using (45)] by.045534− 2× (.000944)× 1 = 0.043646 ≈ 4.4%. Going from 10 to 11 years ofexperience, wage is predicted to change by .045534− 2× (.000944)× 10 ≈ 2.7%.

The predicted maximum wage is achieved at experience of

exper = −β1

2β2

= −.045534

2(−0.000944)≈ 24 years.

Note that we have omitted other important factors (education, etc.) from this

example.






Consistency





Beta Coefficients

Prediction






Quadratic Terms




Consider a model with two explanatory variables such that

y = β0 + β1x1 + β2x2 + β12x1x2 + u. (46)

The cross-product term x1x2 is called the interaction effect of x1

and x2.

Often, however, it turns out that the cross-product term is highlycorrelated with x1 and/or x2 resulting to a potentialmulticollinearity problem.

The collinearity problem becomes often resolved by centerering thevariables in the interaction term, such that if E[xj ] = µj j = 1, 2,instead of model (46) consider

y = β0 + δ1x1 + δ2x2 + β12(x1 − µ1)(x2 − µ2) + u. (47)




Now, for example the slope coefficients δ1 indicates the change iny in response to unit change in x1 at the mean of x2, i.e., whenx2 = µ2, so that the interaction term of the demeaned variablesdisappears. The same applies to δ2.

We find also that the relation β1 in (46) and δ1 in equation (47) isβ1 = δ1 − β12µ2, similarly β2 = δ2 − β12µ1.




Example 7

A model to explain standardized outcome on a final exam stndfnl(microeconomic principles) in terms of percentage classes attendedatndrte, prior college grade point average prigpa, and ACT score actwas specified as

stndfnl = β0 + β1atndrte + β2prigpa + β3act (48)

+β22prigpa2 + β33act

2 + β12(atndrte× prigpa) + u

The quadratics are included to capture potential non-linearities and theinteraction term atndrte× prigpa is included to account for the ideathat class attendance might have a different effect for students who haveperformed differently in the past (in terms of prigpa).The main interest is in the marginal effect of attendance (atndrte), i.e.,

∂stndfnl

∂atndrte= β1 + β12prigpa.




Exploring further 6.3 W 5ed, p. 192

If we add the term β13atndrte× act to equation (48), what is the

partial effect of atnedrte on stndfnl?




Example 7 continues

Using data Wooldridge’s data set attend.txt (available also onwww.econometrics.com) we get the following results.

Dependent variable: stndfnl

Coefficients:


(Intercept) 2.050293 1.360319 1.507 0.132225

atndrte -0.006713 0.010232 -0.656 0.512005

prigpa -1.628540 0.481002 -3.386 0.000751 ***

act -0.128039 0.098492 -1.300 0.194047

I(prigpa^2) 0.295905 0.101049 2.928 0.003523 **

I(act^2) 0.004533 0.002176 2.083 0.037634 *

atndrte x prigpa 0.005586 0.004317 1.294 0.196173

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1







Example 7 continues

Interpreting the coefficient estimates is not straightforward.

First, atndrte is not statistically significant (estimate is even negative),similarly the cross-product term atntdre× prigpa is insignificant.

Does this mean that class attendance does not have any impact?

To test this statistically we should factually test the joint hypothesis

H0 : β1 = β12 = 0.

Using the F -test we obtain F = 4.319 with 2 and 673 degrees of freedom

and p-value = 0.014, i.e. significant at the 5% level. This is an example

to show that using individual t-values may be badly misleading.




Example 7 continues

Often in these kinds of situations a potential reason is highmulticollinearity.

Here individual correlations are not that severe as the correlation betweenatndrte and atndrte× prigpa is 0.81, but R2

atndrte = .963 from theregression of atndrte on other explanatory variables. ThereforeVIF(atndrte) = 27.1 is fairly, indicating a potential collinearity problem.

Replacing the cross-product by the cross-product of demeaned atndrteand prigpa such that

stndfnl = β0 + δ1atndrte + δ2prigpa + β3act (49)

+β22prigpa2 + β33act

2 + β12(dmatndrte× dmprigpa) + u,

where dmattndrte and dmprigpa are deviations from their respective

means, reduces materially the collinearity as the R-square of atendrte

on the other regressors of equation (49) is only 0.443 and VIF 1.80.




Example 7 continues

Estimating the model yields

Dependent variable: stndfnl

Coefficients:


(Intercept) 0.869632 1.272709 0.683 0.49466

atndrte 0.007737 0.002633 2.938 0.00341 **

prigpa -1.172118 0.529975 -2.212 0.02733 *

act -0.128039 0.098492 -1.300 0.19405

I(prigpa^2) 0.295905 0.101049 2.928 0.00352 **

I(act^2) 0.004533 0.002176 2.083 0.03763 *

dmatndrte:dmprigpa 0.005586 0.004317 1.294 0.19617

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Thus, eventually the interaction term proves not to be statistically

significant and attendrte has a positive effect.Seppo Pynnonen Econometrics I



Sometimes an interaction terms comes in a natural way into themodel.

For example, consider the simple consumption function

C = β0 + β1Y + u, (50)

where C denotes consumption and Y income.




Suppose, the marginal propensity to consume (mpc), β1 dependson the level of wealth A such that

β1 = βy + βayA. (51)

This implies the interaction term A · Y to the model (50), as

C = β0 + β1Y + u

= β0 + (βy + βayA)Y + u

= β0 + βyY + βayAY + u.

(52)


part v further topics in linear regressionlipas.uwasa.fi/~sjp/teaching/ecm/lectures/ecmc5.pdf ·...

Documents