interpreting regression resultsdidattica.unibocconi.it/mypage/dwload.php?nomefile=... ·...


Interpreting Regression Results

Carlo Favero

Favero () Interpreting Regression Results 1 / 22


Interpreting Regression Results

Interpreting regression results is not a simple exercise. We propose to split this procedure into three steps.

First, introduce a measure of sampling variability and re-evaluate what you know. Take into account that the parameters are estimated and that there is uncertainty surrounding your point estimates.

Second, understand the relevance of your regression independently of inference on the parameters. There is an easy way to do this: suppose all parameters in the model are known and equal to the estimated values, and learn how to read them.

Third, remember that each regression is run after a reduction process has been implemented, explicitly or implicitly. The relevant question is: what happens if something went wrong in the reduction process? What are the consequences of omitting relevant information, or of including irrelevant information, in your specification?


Statistical Significance and Relevance

The relevance of a regression is different from the statistical significance of the estimated parameters.

In fact, a rather common mistake in the use of the linear model is to confuse two things: the statistical significance of the estimated parameter that describes the effect of a regressor on the dependent variable, and the practical relevance of that effect.

Statistical inference is a tool for estimating parameters in a probability model and for assessing the amount of sampling variability. Statistics tells us what we can say about the values of the parameters in the model on the basis of our sample.

The relevance of a regression is determined by the share of the unconditional variance of $y$ that is explained by the variance of $E(y \mid X)$. The fundamental role of $R^2$ is to measure how large this share is.
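A minimal simulation sketch of the distinction, assuming NumPy and synthetic data (the helper `ols_slope_t_and_r2` is a hypothetical name). With a very large sample and a tiny true slope, the slope's t-ratio is large, yet the regression explains almost none of the variance of $y$.

```python
import numpy as np

def ols_slope_t_and_r2(y, x):
    """OLS of y on a constant and x; returns the slope, its t-ratio and R^2."""
    X = np.column_stack([np.ones_like(x), x])
    T, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (T - k)                  # unbiased error-variance estimate
    cov_beta = s2 * np.linalg.inv(X.T @ X)        # estimated var-cov matrix of beta_hat
    t_slope = beta_hat[1] / np.sqrt(cov_beta[1, 1])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta_hat[1], t_slope, r2

rng = np.random.default_rng(0)
T = 100_000
x = rng.standard_normal(T)
y = 0.05 * x + rng.standard_normal(T)   # synthetic: tiny true effect, large sample
slope, t_slope, r2 = ols_slope_t_and_r2(y, x)
print(f"slope={slope:.3f}  t={t_slope:.1f}  R2={r2:.4f}")
```

Here the slope is "significant" by any conventional threshold, while $R^2$ stays well below one percent.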


Statistical Significance of CAPM coefficients

Estimate the coefficients in a CAPM regression.

Suppose the null hypothesis is true and compute p as the probability (under that hypothesis) of getting results as extreme as those observed.

p is called the p-value. If it is very small, the results are statistically significant: the null is rejected because the probability of observing what you have observed, were the null true, is small.

Note that the p-value can be computed in two ways: i) by deriving the relevant distribution under the null; ii) by simulating the relevant distribution under the null via Monte Carlo or bootstrap.
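The two routes to a p-value can be sketched as follows. The sketch tests the null that the CAPM intercept is zero. It assumes NumPy/SciPy, synthetic excess returns (not real data) and the hypothetical helper `alpha_t_ratio`. The simulation draws normal errors with the slope and error standard deviation fixed at their estimates; a bootstrap would instead resample the residuals.

```python
import numpy as np
from scipy import stats

def alpha_t_ratio(y, X):
    """t-ratio of the intercept (alpha) in an OLS regression of y on X."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (T - k)
    return beta_hat[0] / np.sqrt(s2 * XtX_inv[0, 0]), beta_hat, np.sqrt(s2)

rng = np.random.default_rng(1)
T = 120
# Hypothetical monthly excess returns (synthetic data, not real CAPM estimates).
mkt = 0.005 + 0.04 * rng.standard_normal(T)
asset = 0.002 + 1.1 * mkt + 0.03 * rng.standard_normal(T)
X = np.column_stack([np.ones(T), mkt])
t_obs, beta_hat, s = alpha_t_ratio(asset, X)

# i) Analytic p-value: under H0 (alpha = 0) the t-ratio is Student-t with T - k dof.
p_analytic = 2 * stats.t.sf(abs(t_obs), df=T - 2)

# ii) Monte Carlo p-value: simulate data under H0 (alpha = 0; slope and error std
#     fixed at their estimates; normal errors) and count how often the simulated
#     t-ratio is at least as extreme as the observed one.
n_sim = 5_000
t_sim = np.empty(n_sim)
for j in range(n_sim):
    y_null = beta_hat[1] * mkt + s * rng.standard_normal(T)
    t_sim[j] = alpha_t_ratio(y_null, X)[0]
p_mc = np.mean(np.abs(t_sim) >= abs(t_obs))
print(f"t={t_obs:.2f}  analytic p={p_analytic:.3f}  Monte Carlo p={p_mc:.3f}")
```

With normal errors and $X$ held fixed, the two numbers should differ only by simulation noise.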


Relevance of CAPM coefficients

Estimate the coefficients in a CAPM regression and keep them fixed at their point estimates.

Run an experiment by changing the conditional mean via a shock to the regressor.

Assess how relevant the shock to excess market returns is in determining the excess returns on asset i.
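The experiment above can be sketched as follows, assuming NumPy and synthetic excess returns. The choice of a one-standard-deviation shock is an illustrative assumption, not the author's. The squared ratio of the shock's effect to the standard deviation of the asset's excess return equals $R^2$ in this single-regressor case. That ties the experiment back to the definition of relevance given earlier.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 240
# Synthetic excess returns (hypothetical numbers, for illustration only).
mkt = 0.005 + 0.045 * rng.standard_normal(T)
asset = 0.001 + 0.8 * mkt + 0.05 * rng.standard_normal(T)

# Step 1: estimate the CAPM regression and keep the coefficients fixed.
X = np.column_stack([np.ones(T), mkt])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, asset, rcond=None)

# Step 2: shock the regressor by one (sample) standard deviation.
shock = mkt.std()
delta_cond_mean = beta_hat * shock     # implied change in E(asset excess return | mkt)

# Step 3: compare the effect with the unconditional variability of the asset.
relevance = delta_cond_mean / asset.std()
fitted = alpha_hat + beta_hat * mkt
r2 = np.var(fitted) / np.var(asset)    # share of variance explained by E(y | X)
print(f"beta={beta_hat:.2f}  shock effect={delta_cond_mean:.4f}  "
      f"effect/sd(asset)={relevance:.2f}  R2={r2:.2f}")
```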


Inference in the Linear Regression Model

Users of econometric models in finance attribute high priority to the concept of "statistical significance" of their estimates. In standard statistical jargon, an estimate of a parameter is "statistically significant" if its estimated value, compared with its sampling standard deviation, makes it unlikely that the estimate would change sign in other samples.

In the linear regression model, the most widely used statistical index is the t-ratio. The significance of an estimated parameter is usually measured by its p-value: the probability, computed under the null hypothesis that the coefficient equals zero, of obtaining a t-ratio at least as extreme as the one observed.

In the previous section we discussed the common confusion between "statistical significance" and "relevance".

In this section we illustrate the basic principles that allow us to evaluate statistical significance and to perform tests of relevant hypotheses on the estimated coefficients of a linear model.
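A sketch of the usual regression output (estimates, standard errors, t-ratios and p-values), assuming NumPy/SciPy and synthetic data. `ols_table` is a hypothetical helper, not a library function. The standard errors come from $s^2 (X'X)^{-1}$ and the p-values from the Student-t distribution with $T-k$ degrees of freedom.

```python
import numpy as np
from scipy import stats

def ols_table(y, X):
    """OLS estimates, standard errors, t-ratios and two-sided p-values."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (T - k)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    t_ratio = beta_hat / se
    # p-value: probability, computed under H0: beta_j = 0, of a t-ratio at
    # least as large in absolute value as the one observed.
    p_value = 2 * stats.t.sf(np.abs(t_ratio), df=T - k)
    return beta_hat, se, t_ratio, p_value

rng = np.random.default_rng(3)
T = 60
X = np.column_stack([np.ones(T), rng.standard_normal(T), rng.standard_normal(T)])
y = X @ np.array([0.0, 0.5, 0.0]) + rng.standard_normal(T)   # synthetic data
for name, b, se, t, p in zip(["const", "x1", "x2"], *ols_table(y, X)):
    print(f"{name:>5}: b={b: .3f}  se={se:.3f}  t={t: .2f}  p={p:.3f}")
```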


Elements of Distribution theory

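We consider the distribution of a generic n-dimensional vector $z$, together with the derived distribution of the vector $x = g(z)$, which admits the inverse $z = h(x)$, with $h = g^{-1}$. If

$$ \text{prob}(z_1 < z < z_2) = \int_{z_1}^{z_2} f(z)\,dz \quad \text{and} \quad \text{prob}(x_1 < x < x_2) = \int_{x_1}^{x_2} f^*(x)\,dx, $$

then

$$ f^*(x) = f\big(h(x)\big)\,|J|, $$

where $|J|$ is the absolute value of the Jacobian determinant

$$ J = \begin{vmatrix} \frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_n}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_1}{\partial x_n} & \cdots & \frac{\partial h_n}{\partial x_n} \end{vmatrix} = \left| \frac{\partial h}{\partial x'} \right|. $$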
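A one-dimensional numerical check of the transformation formula, assuming NumPy/SciPy. The lognormal example is chosen here for illustration and is not in the slides. Take $z \sim N(0,1)$ and $x = g(z) = e^z$, so $h(x) = \log x$ and $J = 1/x$. The formula should then reproduce SciPy's lognormal density with shape `s=1`.

```python
import numpy as np
from scipy import stats

# z ~ N(0, 1) and x = g(z) = exp(z), so h(x) = log(x) and J = dh/dx = 1/x.
def f_z(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def f_x(x):
    """Density of x obtained from f*(x) = f(h(x)) |J|."""
    return f_z(np.log(x)) * np.abs(1.0 / x)

x = np.linspace(0.05, 5.0, 200)
# SciPy's lognormal with shape s=1 (default scale=1) is the law of exp(z).
print(np.max(np.abs(f_x(x) - stats.lognorm.pdf(x, s=1))))
```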


The normal distribution

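The standardized univariate normal has the following distribution:

$$ f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2\right), \qquad E(z) = 0, \quad \text{var}(z) = 1. $$

By considering the transformation $x = \sigma z + \mu$, we derive the distribution of the univariate normal as:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad E(x) = \mu, \quad \text{var}(x) = \sigma^2. $$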
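The step from $f(z)$ to $f(x)$ is a direct application of the transformation formula. The derivation below is written out for illustration and assumes $\sigma > 0$. Here $g(z) = \sigma z + \mu$, so $h(x) = (x - \mu)/\sigma$ and $J = dh/dx = 1/\sigma$:

```latex
f(x) = f\big(h(x)\big)\,|J|
     = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)\frac{1}{\sigma}
     = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
E(x) = \sigma E(z) + \mu = \mu, \quad \text{var}(x) = \sigma^{2}\,\text{var}(z) = \sigma^{2}.
```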


The normal multivariate distribution

Consider now the vector $z = (z_1, z_2, \ldots, z_n)'$, such that

$$ f(z) = \prod_{i=1}^{n} f(z_i) = (2\pi)^{-n/2} \exp\left(-\frac{1}{2} z'z\right). $$

$z$ is, by construction, a vector of independent normal variables with zero mean and identity variance-covariance matrix. The conventional notation is $z \sim N(0, I_n)$.


The normal multivariate distribution

Consider now the linear transformation

$$ x = Az + \mu, $$

where $A$ is an $(n \times n)$ invertible matrix. We consider the inverse transformation $z = A^{-1}(x - \mu)$, with Jacobian $J = |A^{-1}| = \frac{1}{|A|}$. By applying the formula for the transformation of variables, we have:

$$ f(x) = (2\pi)^{-n/2} \left\| A^{-1} \right\| \exp\left(-\frac{1}{2}(x-\mu)'(A^{-1})'A^{-1}(x-\mu)\right), $$

which, by defining the positive definite matrix $\Sigma = AA'$, equals

$$ f(x) = (2\pi)^{-n/2} \left|\Sigma\right|^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right). $$

The conventional notation for the multivariate normal is $x \sim N(\mu, \Sigma)$.
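A numerical sketch, assuming NumPy/SciPy and an arbitrary invertible $A$ chosen for illustration. It checks that the density written with $\|A^{-1}\|$ and the one written with $|\Sigma|^{-1/2}$ coincide, and that both agree with `scipy.stats.multivariate_normal`. It also checks that simulated draws of $x = Az + \mu$ have a covariance close to $AA'$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n)) + 2 * np.eye(n)   # an (n x n) invertible matrix
mu = np.array([0.5, -1.0, 2.0])
Sigma = A @ A.T                                    # positive definite by construction

def density_via_A(x):
    """(2 pi)^(-n/2) ||A^{-1}|| exp(-0.5 (x-mu)' A^{-1}' A^{-1} (x-mu))."""
    A_inv = np.linalg.inv(A)
    u = A_inv @ (x - mu)                           # z = A^{-1}(x - mu)
    return (2 * np.pi) ** (-n / 2) * abs(np.linalg.det(A_inv)) * np.exp(-0.5 * u @ u)

def density_via_Sigma(x):
    """(2 pi)^(-n/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)' Sigma^{-1} (x-mu))."""
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return (2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** -0.5 * np.exp(-0.5 * quad)

x0 = np.array([0.0, 0.0, 1.0])
print(density_via_A(x0), density_via_Sigma(x0),
      stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x0))

# Simulating x = A z + mu (one draw per row) should give a sample covariance close to A A'.
z = rng.standard_normal((200_000, n))
x_draws = z @ A.T + mu
```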


The transformation of normal multivariate

Theorem: For any $x \sim N(\mu, \Sigma)$, given any $(m \times n)$ matrix $B$ and any $(m \times 1)$ vector $d$, if $y = Bx + d$, then $y \sim N(B\mu + d, B\Sigma B')$.

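Consider a partitioning of an n-variate normal vector into two sub-vectors of dimensions $n_1$ and $n - n_1$:

$$ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right). $$

1. $x_1 \sim N(\mu_1, \Sigma_{11})$, which follows from applying the general formula in the case $d = 0$, $B = (I_{n_1} \;\; 0)$;

2. $(x_1 \mid x_2) \sim N\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$, which is obtained by applying the general formula to the case $d = \Sigma_{12}\Sigma_{22}^{-1}x_2$, $B = (I_{n_1} \;\; -\Sigma_{12}\Sigma_{22}^{-1})$. This works for two reasons. First, $Bx = x_1 - \Sigma_{12}\Sigma_{22}^{-1}x_2$ has covariance $\Sigma_{12} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22} = 0$ with $x_2$, so, being jointly normal with $x_2$, it is independent of $x_2$. Second, conditioning on $x_2$ treats $d$ as fixed, so $x_1 = Bx + d$ has mean $\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2 + \Sigma_{12}\Sigma_{22}^{-1}x_2$ and variance $B\Sigma B' = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.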
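A bivariate check of result 2, assuming NumPy and a hypothetical $\Sigma$ chosen for illustration. It verifies that $B = (I, -\Sigma_{12}\Sigma_{22}^{-1})$ produces a combination uncorrelated with $x_2$. It then checks by simulation that regressing $x_1$ on $x_2$ recovers $\Sigma_{12}\Sigma_{22}^{-1}$ as the slope and the Schur complement as the residual variance.

```python
import numpy as np

# Hypothetical bivariate example: n1 = n2 = 1.
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
S11, S12, S21, S22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

coef = S12 / S22                 # Sigma_12 Sigma_22^{-1}
cond_var = S11 - coef * S21      # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21

# Stack B = (I, -Sigma_12 Sigma_22^{-1}) over (0, I): B_full Sigma B_full' is
# block-diagonal, i.e. B x is uncorrelated with x2.
B_full = np.array([[1.0, -coef],
                   [0.0, 1.0]])
print(B_full @ Sigma @ B_full.T)   # off-diagonal entries are zero (up to rounding)

# Monte Carlo: regressing x1 on x2 recovers the conditional-mean slope and the
# conditional variance.
rng = np.random.default_rng(5)
x = rng.multivariate_normal(mu, Sigma, size=200_000)
x1, x2 = x[:, 0], x[:, 1]
slope = np.cov(x1, x2)[0, 1] / np.var(x2, ddof=1)
resid = (x1 - x1.mean()) - slope * (x2 - x2.mean())
print(coef, slope, cond_var, resid.var())
```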


Distributions derived from the normal

Consider $z \sim N(0, I_n)$, an n-variate standard normal. The distribution of $\omega = z'z$ is defined as a $\chi^2(n)$ distribution with $n$ degrees of freedom. Consider two vectors $z_1$ and $z_2$ of dimensions $n_1$ and $n_2$ respectively, with the following distribution:

$$ \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I_{n_1} & 0 \\ 0 & I_{n_2} \end{pmatrix} \right). $$

We have $\omega_1 = z_1'z_1 \sim \chi^2(n_1)$, $\omega_2 = z_2'z_2 \sim \chi^2(n_2)$, and $\omega_1 + \omega_2 = z_1'z_1 + z_2'z_2 \sim \chi^2(n_1 + n_2)$. In general, the sum of two independent $\chi^2$ variables is itself distributed as $\chi^2$. Its degrees of freedom equal the sum of the degrees of freedom of the two.
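A Monte Carlo sketch of the additivity result, assuming NumPy/SciPy; $n_1 = 3$ and $n_2 = 5$ are arbitrary illustrative choices. A $\chi^2(n)$ variable has mean $n$ and variance $2n$, so the simulated sum should match $\chi^2(8)$ in its first two moments and in its 95% quantile.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n1, n2, draws = 3, 5, 200_000
z1 = rng.standard_normal((draws, n1))
z2 = rng.standard_normal((draws, n2))       # independent of z1
omega1 = np.sum(z1**2, axis=1)              # z1'z1 ~ chi2(n1)
omega2 = np.sum(z2**2, axis=1)              # z2'z2 ~ chi2(n2)
total = omega1 + omega2                     # should be chi2(n1 + n2)

# Compare simulated moments and 95% quantile with those of chi2(n1 + n2).
print(total.mean(), total.var(), np.quantile(total, 0.95),
      stats.chi2.ppf(0.95, df=n1 + n2))
```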


Distributions derived from the normal

Our discussion of the multivariate normal concludes that if $x \sim N(\mu, \Sigma)$, then $(x - \mu)'\Sigma^{-1}(x - \mu) \sim \chi^2(n)$. A related result establishes that if $z \sim N(0, I_n)$ and $M$ is a symmetric idempotent $(n \times n)$ matrix of rank $r$, then $z'Mz \sim \chi^2(r)$.

Another distribution related to the normal is the F-distribution. It is obtained as the ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom. Given independent $\omega_1 \sim \chi^2(n_1)$ and $\omega_2 \sim \chi^2(n_2)$, we have:

$$ \frac{\omega_1/n_1}{\omega_2/n_2} \sim F(n_1, n_2). $$
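A sketch of the quadratic-form result, assuming NumPy. The symmetric idempotent matrix used here, $M = I - X(X'X)^{-1}X'$ built from an arbitrary full-rank $X$, is an illustrative choice; the same matrix reappears later in the regression context. For an idempotent matrix the rank equals the trace, and simulated values of $z'Mz$ should have mean $r$ and variance $2r$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 3
X = rng.standard_normal((n, k))                     # any full-column-rank matrix
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T    # symmetric idempotent, rank n - k
rank = int(round(np.trace(M)))                      # for idempotent M, rank = trace

z = rng.standard_normal((100_000, n))               # each row is a draw of z ~ N(0, I_n)
quad = np.sum((z @ M) * z, axis=1)                  # z'Mz for each draw
print(rank, quad.mean(), quad.var())                # chi2(r): mean r, variance 2r
```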


Distributions derived from the normal

The Student's t-distribution is then defined as:

$$ t_n = \sqrt{F(1, n)}, $$

up to sign: $t_n$ is symmetric around zero and $t_n^2 \sim F(1, n)$.

Another useful result establishes that two quadratic forms in the standard multivariate normal, $z'Mz$ and $z'Qz$ with $M$ and $Q$ symmetric, are independent if $MQ = 0$. We can finally state the following theorem, which is fundamental to statistical inference in the linear model:

Theorem: If $z \sim N(0, I_n)$, $M$ and $Q$ are symmetric and idempotent matrices of ranks $r$ and $s$ respectively, and $MQ = 0$, then

$$ \frac{z'Qz}{z'Mz}\,\frac{r}{s} \sim F(s, r). $$
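The link between the t- and F-distributions can be checked directly with SciPy. The degrees of freedom and the level are illustrative choices. The squared two-sided t critical value equals the one-sided $F(1, n)$ critical value, and more generally $P(|t_n| \le c) = P(F(1, n) \le c^2)$ for every $c$.

```python
import numpy as np
from scipy import stats

n = 25
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n)       # two-sided critical value of t_n
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n)   # upper critical value of F(1, n)
print(t_crit**2, f_crit)

# The same statement for the whole distribution: P(|t_n| <= c) = P(F(1, n) <= c^2).
c = np.linspace(0.1, 4.0, 40)
lhs = stats.t.cdf(c, df=n) - stats.t.cdf(-c, df=n)
rhs = stats.f.cdf(c**2, dfn=1, dfd=n)
```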


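The conditional distribution y | X

To perform inference in the linear regression model, we need a further hypothesis to specify the distribution of $y$ conditional upon $X$:

$$ y \mid X \sim N\left(X\beta, \sigma^2 I\right), \tag{1} $$

or, equivalently,

$$ \varepsilon \mid X \sim N\left(0, \sigma^2 I\right). \tag{2} $$

Given (1), we can immediately derive the distribution of $(\hat\beta \mid X)$. Since $\hat\beta = (X'X)^{-1}X'y$ is a linear combination of normally distributed variables, it is also normal. Applying the theorem on linear transformations with $B = (X'X)^{-1}X'$ and $d = 0$:

$$ (\hat\beta \mid X) \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right). \tag{3} $$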
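A simulation sketch of (3), assuming NumPy. The values of $\beta$, $\sigma$ and the design are hypothetical. $X$ is drawn once and held fixed, mirroring the conditioning on $X$, while new errors are drawn in each replication. The sample mean and covariance of the resulting $\hat\beta$ draws should approach $\beta$ and $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(8)
T, sigma = 50, 0.7
beta = np.array([1.0, -0.5])
X = np.column_stack([np.ones(T), rng.standard_normal(T)])   # held fixed: we condition on X
XtX_inv = np.linalg.inv(X.T @ X)

n_rep = 20_000
# Each row of Y is one sample from y | X ~ N(X beta, sigma^2 I).
Y = X @ beta + sigma * rng.standard_normal((n_rep, T))
# Row j of draws is beta_hat_j' = y_j' X (X'X)^{-1}, i.e. ((X'X)^{-1} X' y_j)'.
draws = Y @ X @ XtX_inv

print(draws.mean(axis=0), beta)
print(np.cov(draws, rowvar=False))
print(sigma**2 * XtX_inv)                                   # theoretical var-cov from (3)
```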


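The conditional distribution y | X

Equation (3) constitutes the basis for constructing confidence intervals and performing hypothesis tests in the linear regression model. Using $\hat\beta - \beta = (X'X)^{-1}X'\varepsilon$, consider the following expression:

$$ \frac{(\hat\beta-\beta)'X'X(\hat\beta-\beta)}{\sigma^2} = \frac{\varepsilon'X(X'X)^{-1}X'X(X'X)^{-1}X'\varepsilon}{\sigma^2} = \frac{\varepsilon'Q\varepsilon}{\sigma^2}, \qquad Q = X(X'X)^{-1}X'. $$

Since $\varepsilon/\sigma \mid X \sim N(0, I)$ and $Q$ is symmetric and idempotent with rank $k$, the results derived in the previous section give

$$ \frac{\varepsilon'Q\varepsilon}{\sigma^2} \,\Big|\, X \sim \chi^2(k). \tag{4} $$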
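A numerical check of the algebra above, assuming NumPy, one synthetic draw, and $\sigma = 1$ so that the $\sigma^2$ factors drop out. The quadratic form in $\hat\beta - \beta$ should equal $\varepsilon'Q\varepsilon$. $Q$ should be symmetric and idempotent, with trace (hence rank) $k$.

```python
import numpy as np

rng = np.random.default_rng(9)
T, k = 40, 3
X = rng.standard_normal((T, k))
beta = np.array([0.2, 1.0, -0.3])
eps = rng.standard_normal(T)                 # sigma = 1 in this sketch
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
Q = X @ XtX_inv @ X.T                        # Q = X (X'X)^{-1} X'

lhs = (beta_hat - beta) @ (X.T @ X) @ (beta_hat - beta)
rhs = eps @ Q @ eps
print(lhs, rhs, np.trace(Q))                 # lhs equals rhs; trace(Q) = rank(Q) = k
```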


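The conditional distribution y | X

Equation (4) is not useful in practice, as we do not know $\sigma^2$. However, denoting by $S(\hat\beta)$ the residual sum of squares, we know that

$$ \frac{S(\hat\beta)}{\sigma^2} \,\Big|\, X = \frac{\varepsilon'M\varepsilon}{\sigma^2} \,\Big|\, X \sim \chi^2(T-k), \tag{5} $$

$$ M = I - Q. \tag{6} $$

Since $MQ = 0$, we know the distribution of the ratio of (4) and (5). Moreover, taking the ratio gets rid of the unknown term $\sigma^2$. With $s^2 = S(\hat\beta)/(T-k)$:

$$ \frac{(\hat\beta-\beta)'X'X(\hat\beta-\beta)/\sigma^2}{s^2/\sigma^2} = \frac{\varepsilon'Q\varepsilon}{\varepsilon'M\varepsilon}(T-k) \sim kF(k, T-k). \tag{7} $$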
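A Monte Carlo sketch of (7), assuming NumPy/SciPy, normal errors, and hypothetical values of $\beta$, $\sigma$ and the design. The statistic is evaluated at the true $\beta$ and divided by $k$, and then compared with the $F(k, T-k)$ critical value. If (7) holds, the rejection frequency should be close to the nominal level $\alpha$, whatever the (unknown) $\sigma$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
T, k, sigma, alpha = 30, 3, 2.0, 0.05
beta = np.array([0.5, -1.0, 0.25])
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
f_crit = stats.f.ppf(1 - alpha, dfn=k, dfd=T - k)

def stat7(y):
    """(b - beta)'X'X(b - beta) / (k s^2): F(k, T-k) under the model; sigma cancels."""
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (T - k)            # s^2 = S(b) / (T - k)
    d = b - beta
    return d @ XtX @ d / (k * s2)

n_rep = 10_000
f_stats = np.array([stat7(X @ beta + sigma * rng.standard_normal(T)) for _ in range(n_rep)])
rejection_rate = np.mean(f_stats > f_crit)  # should be close to alpha
print(rejection_rate)
```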


Confidence Intervals for beta

We use result (7) to obtain, from the tables of the F-distribution, the critical value $F^*_\alpha(k, T-k)$ such that

$$ \text{prob}\left[F(k, T-k) > F^*_\alpha(k, T-k)\right] = \alpha, \qquad 0 < \alpha < 1. $$

For different values of $\alpha$, we are then in a position to evaluate exactly an inequality of the following form:

$$ \text{prob}\left[(\hat\beta-\beta)'X'X(\hat\beta-\beta) \le k s^2 F^*_\alpha(k, T-k)\right] = 1 - \alpha, $$

which defines a $(1-\alpha)$ confidence region for $\beta$: an ellipsoid centred upon $\hat\beta$.
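A sketch of how the inequality above is used, assuming NumPy/SciPy and synthetic data. It checks whether a candidate vector $\beta_0$ lies inside the $(1-\alpha)$ region. `in_confidence_region` is a hypothetical helper, and the SciPy quantile replaces the printed F tables.

```python
import numpy as np
from scipy import stats

def in_confidence_region(y, X, beta0, alpha=0.05):
    """Returns (inside, b); inside is True if (b - beta0)'X'X(b - beta0) <= k s^2 F*_alpha(k, T-k)."""
    T, k = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)                  # OLS estimate
    resid = y - X @ b
    s2 = resid @ resid / (T - k)
    f_crit = stats.f.ppf(1 - alpha, dfn=k, dfd=T - k)  # F*_alpha(k, T-k)
    d = b - beta0
    return bool(d @ XtX @ d <= k * s2 * f_crit), b

rng = np.random.default_rng(11)
T = 80
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([0.3, 0.9]) + rng.standard_normal(T)  # synthetic data
inside, b = in_confidence_region(y, X, beta0=np.array([0.3, 0.9]))
print(b, inside)
```

The point estimate itself always lies at the centre of the region. A candidate far from $\hat\beta$ falls outside it.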


Hypothesis Testing

Hypothesis testing is strictly linked to the derivation of confidence intervals.

When testing a hypothesis, we aim to reject, on the basis of the sample evidence, the validity of restrictions imposed on the model. Within this framework, (??)-(3) are the maintained hypotheses, and the restricted version of the model is identified with the null hypothesis $H_0$. Following the Neyman-Pearson approach to hypothesis testing, one derives a statistic with a known distribution under the null. The probability of a type I error (rejecting $H_0$ when it is true) is then fixed at $\alpha$.

For example, take a test at level $\alpha$ of the null hypothesis $\beta = \beta_0$, based on the F-statistic. We do not reject $H_0$ if $\beta_0$ lies within the confidence region associated with probability $1 - \alpha$.

However, in practice this is not a useful way of proceeding, as the economic hypotheses of interest rarely involve a number of restrictions equal to the number of estimated parameters.


Hypothesis Testing

The general case of interest is therefore the one in which we have $r$ restrictions on the vector of parameters, with $r < k$. If we limit our interest to the class of linear restrictions, we can express them as

$$ H_0: R\beta = r, $$

where $R$ is an $(r \times k)$ matrix of parameters with rank $r$ and $r$ is an $(r \times 1)$ vector of parameters. To illustrate how $R$ and $r$ are constructed, we consider the baseline case of the CAPM. We want to impose the restriction $\beta_{0,i} = 0$ on the following specification:

$$ \left(r^{i}_{t} - r^{rf}_{t}\right) = \beta_{0,i} + \beta_{1,i}\left(r^{m}_{t} - r^{rf}_{t}\right) + u_{i,t}, \tag{8} $$

$$ R\beta = r: \qquad \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \beta_{0,i} \\ \beta_{1,i} \end{pmatrix} = (0). $$

The distribution of a suitable test statistic under the null is then derived by applying the known results above.
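A minimal sketch of how $R$ and $r$ can be coded for the restriction $\beta_{0,i} = 0$ in (8), assuming NumPy. The point estimates below are hypothetical numbers used only to show the discrepancy $R\hat\beta - r$ that the test is built on.

```python
import numpy as np

# Coefficients ordered as in (8): beta = (beta_0i, beta_1i)'.
# H0: beta_0i = 0 is one linear restriction, R beta = r with R = (1 0), r = (0).
R = np.array([[1.0, 0.0]])
r = np.array([0.0])

n_restrictions, k = R.shape
assert np.linalg.matrix_rank(R) == n_restrictions   # R must have full row rank

# Hypothetical point estimates, for illustration only.
b = np.array([0.002, 1.1])
discrepancy = R @ b - r        # R beta_hat - r: zero in expectation under H0
print(n_restrictions, k, discrepancy)
```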


Hypothesis Testing

If�bβ j X

�� N

�β, σ2 (X0X)�1

�, then:�

Rbβ� r j X�� N

�Rβ� r, σ2R

�X0X

��1 R0�

. (9)

The test is constructed by deriving the distribution of (9) under thenull Rβ� r = 0.Given that �

Rbβ� r j X�= Rβ� r+R

�X0X

��1 X0ε,

under H0, we have:�Rbβ� r

�0 �R�X0X

��1 R0��1 �

Rbβ� r�

= ε0X�X0X

��1 R0 �

R�X0X

��1 R0��1

R�X0X

��1 X0ε

= ε0Pε.

where P is a symmetric idempotent matrix of rank r, orthogonal to M.Favero () Interpreting Regression Results 21 / 22
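A numerical check of the properties claimed for $P$, assuming NumPy, an arbitrary design, and two hypothetical linear restrictions. $P$ should be symmetric and idempotent, its trace (hence rank) should equal the number of restrictions, and $MP$ should vanish.

```python
import numpy as np

rng = np.random.default_rng(12)
T, k = 30, 4
X = rng.standard_normal((T, k))
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])        # two hypothetical linear restrictions
XtX_inv = np.linalg.inv(X.T @ X)
middle = np.linalg.inv(R @ XtX_inv @ R.T)    # [R (X'X)^{-1} R']^{-1}
P = X @ XtX_inv @ R.T @ middle @ R @ XtX_inv @ X.T
M = np.eye(T) - X @ XtX_inv @ X.T

print(np.allclose(P, P.T), np.allclose(P @ P, P), np.trace(P), np.abs(M @ P).max())
```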


Hypothesis Testing

Then

$$ \frac{\left(R\hat\beta - r\right)'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat\beta - r\right)}{s^2} \sim rF(r, T-k), \quad \text{under } H_0, $$

which can be used to test the relevant hypothesis.
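A sketch of the full test applied to the CAPM restriction $\beta_{0,i} = 0$, assuming NumPy/SciPy and synthetic excess returns. `linear_restriction_f_test` is a hypothetical helper, and `q` stands for the number of restrictions (the slides reuse $r$ for both the count and the vector). With a single restriction, the F-statistic equals the squared t-ratio of the intercept, and the two p-values coincide.

```python
import numpy as np
from scipy import stats

def linear_restriction_f_test(y, X, R, r):
    """F-test of H0: R beta = r; returns the F(q, T-k) statistic and its p-value."""
    T, k = X.shape
    q = R.shape[0]                                   # number of restrictions
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (T - k)
    d = R @ b - r
    quad = d @ np.linalg.solve(R @ XtX_inv @ R.T, d)
    F = quad / (q * s2)                              # the statistic above divided by q
    return F, stats.f.sf(F, dfn=q, dfd=T - k)

rng = np.random.default_rng(13)
T = 120
mkt = 0.005 + 0.04 * rng.standard_normal(T)          # synthetic excess returns
asset = 0.9 * mkt + 0.03 * rng.standard_normal(T)
X = np.column_stack([np.ones(T), mkt])
F, p = linear_restriction_f_test(asset, X, R=np.array([[1.0, 0.0]]), r=np.array([0.0]))

# With a single restriction the F statistic is the squared t-ratio of alpha.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ asset)
s2 = np.sum((asset - X @ b) ** 2) / (T - 2)
t_alpha = b[0] / np.sqrt(s2 * XtX_inv[0, 0])
print(F, t_alpha**2, p, 2 * stats.t.sf(abs(t_alpha), df=T - 2))
```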
