Working Paper Series
Faculty of Finance
No. 11
Fixed-effects in Empirical Accounting Research
Eli Amir, Jose M. Carabias, Jonathan Jona, Gilad Livne
Eli Amir
Tel Aviv University and City University of London
Jose M. Carabias
London School of Economics and Political Science
Jonathan Jona
University of Melbourne
Gilad Livne
University of Exeter Business School
21 July 2015
Comments welcome
Abstract
The fixed-effects specification is often used in panel datasets as a way of dealing with correlated omitted variables. A review of recent accounting publications reveals that while researchers are generally aware of the need to include fixed-effects in empirical models when using panel datasets (firm-time observations), many choose to replace firm fixed-effects with other forms of fixed-effects, mainly industry fixed-effects. We examine the consequences of using different specifications of fixed-effects and show, analytically and using simulations, that this can lead to biased estimates and wrong inferences. To illustrate the importance of properly including firm fixed-effects, we reexamine commonly used regression models in the accounting literature. We show how inferences change when fixed-effects are properly included. We call for more careful consideration of the fixed-effects specification used.
Address correspondence to Eli Amir at [email protected]. We would like to thank Yakov Amihud, Eti Einhorn, Joanne Horton, Fani Kalogirou and seminar participants at the University of Exeter Business School and Tel Aviv University for helpful comments.
1. Introduction
Many empirical studies analyze panel datasets, where both cross-section and time-series
observations are pooled together to obtain a larger and more powerful sample. The advantage of
panel datasets is they allow the investigation of relations of interest both in the cross section and
over time. However, analyses of panel data in accounting must carefully account for the
underlying statistical properties of the cross-sectional data, as well as of the time series. The
challenge is that such properties are not easily observed.
Petersen (2009) argues that while researchers often use panel datasets, the coefficients’
standard errors are often wrongly estimated. Also, the methods used by researchers to correct
for possible biases in standard errors vary widely, but are often wrong. To address this concern,
he proposes clustering standard errors at the firm level, time dimension or both, when
appropriate. However, he finds that clustering standard errors by both firm and time appears
unnecessary in the finance applications he considers. Gow et al. (2010) review and evaluate the
procedures commonly used in estimating standard errors in accounting research and show that
in the presence of both serial and cross-sectional dependence, existing methods often produce
misspecified test statistics.
Both Petersen (2009) and Gow et al. (2010) provide important evidence on significant
biases in standard errors and their implications for test statistics. However, both studies assume
that the models estimated by researchers are well-specified and the coefficients themselves are
not subject to the bias caused by correlated omitted variables. Both studies focus on the problem
of correlations among the error terms. We address a different problem than that addressed by
Petersen (2009) and Gow et al. (2010); namely the potential bias in the estimated coefficients
due to misspecified regression models. In particular, our study complements Petersen (2009)
and Gow et al. (2010) by investigating instead the effect of model misspecification (biased and
inconsistent parameters) and its implications for statistical inference. Similar to Petersen (2009)
and Gow et al. (2010) we highlight the concern of incorrect statistical inferences. However,
while Petersen (2009) and Gow et al. (2010) consider bias in the denominator of the test
statistic (standard errors), we consider the possible bias both in the numerator of the test
statistic (the coefficient estimate) and its denominator (the standard error).
To control for unobserved firm and time effects, researchers choose between two models.
The first one, which has been widely used by accounting researchers, is the fixed-effects model.
The maintained assumption under this model is that the unobserved firm and time effects are
correlated with the main explanatory variables. This is a reasonable assumption because
financial information reflects firm characteristics, which are often time-invariant.
An alternative model that is rarely used in accounting research, but is popular in other
disciplines that use panel datasets, is the random-effects model.1 In this specification the
unobserved firm and time effects are assumed to be uncorrelated with the explanatory variables. In
datasets with firms being the units of analysis, the random-effects method estimates an
unobserved effect that is drawn from an i.i.d. normal distribution and is independent of the
error term (Greene, 2003; Wooldridge, 2007). Consequently, the random-effects model
consumes fewer degrees of freedom relative to the fixed-effects model. When the researcher
believes that the model specification does not suffer from a correlated omitted variable problem,
then the random-effects model is the preferred specification, because it will produce unbiased
slope estimates and more efficient standard errors (Greene, 2003; Wooldridge, 2007). However,
1 Mundlak (1978) shows that the random-effects model is, in fact, a special case of the fixed-effects model.
in the presence of correlated omitted variables, which are time-invariant, the fixed-effects model
is preferred. Our study deals only with the empirical consequences of using the fixed-effects
model.
The maintained assumption under the fixed-effects model is that both dimensions (firm
and time) of the panel are correlated with the main regressors. Hence the researcher should
include both time and firm controls. Nevertheless, in many accounting studies researchers
replace firm fixed-effects with industry fixed-effects. This replacement could lead to biased and
inconsistent estimates, and hence incorrect inferences, when firm fixed-effects are correlated
with the main variables of interest.
To assess the significance of this problem in accounting research, we reviewed articles
published in six accounting journals over the period 2006-2013: Journal of Accounting and
Economics, Journal of Accounting Research, The Accounting Review, Review of Accounting
Studies, Contemporary Accounting Research, and European Accounting Review. Out of 1,842
articles, 1,152 (62.5%) can be classified as empirical studies (see Table 1, Panel A). While many
of these empirical articles use more than one empirical methodology, 933 articles (50.6%) use
some form of pooled regressions (see Table 1, Panel B). Many of these "pooled" studies use
some form of fixed-effects. The most common fixed-effects specifications used in these
regressions are time and industry. Surprisingly, only 114 articles out of the 927 (12.2%) use firm
fixed-effects, 75 of which were published in the most recent three years. It seems that many researchers prefer using
industry instead of firm fixed-effects.
One plausible reason for researchers’ pervasive use of industry fixed-effects instead of
firm fixed-effects is the belief that industry fixed-effects are sufficient and would not lead to
incorrect inferences. That is, the researcher believes that within-industry variation of the
variable of interest is negligible. Another reason may be that one or more of the explanatory
variables are time-invariant which obviates the need to control for firm fixed-effects. A third
possible explanation is that when empirical findings are not robust to the inclusion of firm fixed-
effects the researcher may not know which set of results is more credible and opts to report
significant results rather than null results.
The purpose of this study is to examine the implications of model misspecification due to
omitted unobserved fixed-effects in accounting settings. We start by analytically identifying the
reasons why failing to control for firm fixed-effects could lead to wrong inferences. These
include the correlation between the omitted fixed-effects and the included explanatory variables,
the use of incorrect degrees of freedom, and a biased estimate of the included coefficients’ standard
errors. Next, we use simulations of a simple model in which the main variable of interest is
moderately correlated with the unobserved firm fixed-effect. Even with a relatively moderate
correlation, we find that replacing firm fixed-effects with industry fixed-effects could lead to
substantially wrong inferences. This is because the inclusion of industry fixed-effects does not
fully remove cross-sectional correlations between the omitted firm fixed-effects and the
independent variables. These simulations also reveal that the distribution of the
coefficient of interest estimated from OLS regressions with industry and time fixed-effects does
not overlap with the distribution of this coefficient when estimated from OLS regressions
featuring both firm and time fixed-effects. This “disconnect” in the distributions indicates the
severity of the inference problem that could result from failing to properly control for these
fixed-effects. Specifically, omitting fixed-effects results in a much higher rate of rejection of the
null (a Type I error). However, the severity of the problem diminishes as the number of
industries increases, or the number of firms per industry decreases. In general, however,
ignoring the within-industry variation in the main explanatory variables results in biased
estimated coefficients and wrong inferences. Furthermore, we show that the bias caused by
omitting firm fixed-effects decreases as the variance of the main regressor increases. The
intuition is that firm fixed-effects become more crucial as the main regressor is more time-
invariant.
We then examine the sensitivity of commonly used capital markets-based accounting
regressions to the omission of firm fixed-effects. First, we select Basu's (1997) regression of the
asymmetric response of earnings to good and bad news. We select this regression because the
relation between earnings and stock returns may vary across firms and over time. Ball et al.
(2013) also argue that this is indeed the case because the relation between expected earnings and
expected stock returns varies cross-sectionally. Another, not mutually exclusive, explanation is
that the process of impounding economic news into earnings is firm-specific due to corporate
governance mechanisms, internal controls, or relationships with the auditor, all of which are
quite stable over time. The crucial factor, however, is that the researcher either cannot observe
the underlying mechanisms, or cannot collect the full set of data, and hence needs to consider
this in designing the research strategy. With similar motivation in mind, we also select the
predictive regressions of accruals and cash flow components of earnings as explanatory
variables for future earnings (Sloan, 1996).
In addition to the fixed-effects specifications, we also examine a number of alternative
specifications, which have been used in the literature. These include (i) first differencing of both
the dependent and independent variables (full differencing), (ii) using first differences of only
the dependent variable but not the independent variable, (iii) replacing the vectors of the
dependent and independent variables by their means and estimating a single cross-sectional
regression, (iv) demeaning only the dependent variable, and (v) using the Fama and MacBeth
(1973) estimation approach, namely estimating the coefficient in question by averaging the
coefficients and standard errors obtained from periodical cross-sectional regressions.
Of these alternative specifications, only the full differencing specification yields an
overlapping distribution with that of the correct specification (that includes both firm and time
fixed-effects). However, this specification is somewhat less efficient, as the distribution of this
alternative method is more dispersed than the distribution obtained from the correct regression
specification. This highlights that incorrect inferences under the full differencing method are
more likely than under the correct model.
Our replication of the two studies, using firm fixed-effects, reveals that the magnitude
and significance of the main coefficients of interest are quite different from what is obtained
under the original specifications. Although these replications still yield coefficients that are
reliably different from zero, the differences in magnitude suggest that the strength of the
underlying economic phenomenon could be substantially weaker. For example, our fixed-effects
estimate of the Basu coefficient of timely loss recognition is 40-50% lower than the estimate
obtained under the Basu (1997) estimation procedure, depending on the estimation period. We
also obtain qualitatively similar results when we replicate Sloan’s (1996) study.
With respect to Basu's (1997) regression, Ball et al. (2013) also acknowledge the need to
control for fixed-effects. However, they use an alternative approach; they suggest that
demeaning the dependent variable is equivalent to using firm fixed-effects. We show
analytically that their procedure is not equivalent to using firm fixed-effects and that their
approach is not free from bias. In fact, their approach tends to understate the magnitude of the
true coefficient. Furthermore, we show empirically that the magnitude and the standard error of
the slope coefficient on negative stock returns in the Basu (1997) model are smaller using the
Ball et al. (2013) approach than when firm and time fixed-effects are included. Nevertheless, the
Basu (1997) results hold, albeit with lower magnitude and significance, which is broadly
consistent with Ball et al. (2013).2
It is important to note that including firm fixed-effects does not solve all correlated
omitted variable problems. In particular, it does not solve the problem of omitting a correlated
variable that varies across time. However, in such cases including firm and time fixed-effects
would not exacerbate the underlying problem. We therefore recommend the use of firm and
time fixed-effects because there is no harm in doing so. Including firm and time fixed-effects
when this is not necessary would nevertheless yield correct inferences, but excluding firm or
time fixed-effects would lead to incorrect inferences when they are correlated with the
explanatory variables. If including firm fixed-effects is not feasible, then the “second-best”
approach is first differencing of both the dependent variable and independent variables. While
first differencing yields unbiased coefficients, it is less efficient owing to loss of data (e.g., the
first year in the panel is lost). Either way, replacing firm with industry fixed-effects is likely to
yield biased coefficients.
Section 2 summarizes the main insights from our analytical framework. Section 3 presents
results of simulations aimed at quantifying the potential bias caused by correlated omitted
variables in panel datasets. Section 4 uses the empirical models employed by Basu (1997) and
Sloan (1996) to demonstrate the effect of correlated omitted variables on estimated coefficients.
Section 5 provides concluding remarks.
2 Patatoukas and Thomas (2015) argue that the conservatism coefficient in Ball et al. (2013) is still upward biased. Our study focuses on the role of fixed effects in regression specifications and does not address the issues raised by Patatoukas and Thomas (2015).
2. Analytical Derivations
An appendix to this study provides detailed calculations, which are the basis for the
following text. Here we only present the essential elements.3 Let $D = [D_F, D_T]$ be a matrix of
indicator (dummy) variables for firm fixed-effects ($D_F$) and time fixed-effects ($D_T$), and let the
unobserved fixed-effects be denoted by $\alpha = [\alpha_F, \alpha_T]$, where the subscripts F and T denote firm
and time fixed-effects, respectively. Then for the panel data the model becomes
$$y = X\beta + D\alpha + \varepsilon \tag{1}$$
The variance of $\varepsilon$ is denoted $\sigma^2$. Estimating this model will yield
$$y = Xb + Da + e \tag{2}$$
where b and a are the coefficient estimates of β and α, respectively, and e is the estimated
regression residual. The vector of estimated slope coefficients b can be expressed (using
partitioned matrix conventions) as:
$$b = [X' M_D X]^{-1}[X' M_D y] = [X^{*\prime} X^{*}]^{-1}[X^{*\prime} y^{*}] \tag{3}$$
where
$$M_D = I - D[D'D]^{-1}D'; \quad X^{*} = M_D X; \quad y^{*} = M_D y; \quad \varepsilon^{*} = M_D \varepsilon. \tag{4}$$
$M_D$ can be thought of as a particular process of demeaning the independent and dependent
variables. It is straightforward to show that b is unbiased (that is, $E[b \mid X] = \beta$). Furthermore, in
the case where the fixed effects are uncorrelated with X, employing (3) would generate an
unbiased estimate of β, which will be the same as the estimate one would obtain from regressing
y only on X.
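The partitioned-regression result in (3) and (4) can be checked numerically. The following is a minimal sketch on a small made-up balanced panel (the dimensions, the seed, and all variable names are assumptions for illustration, not taken from the study): it builds the dummy matrix, forms the demeaning matrix, and compares the slope from the demeaned data with the slope from a regression that includes the dummies explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 30, 6                          # hypothetical small balanced panel
firm = np.repeat(np.arange(F), T)     # firm index of each observation
year = np.tile(np.arange(T), F)       # year index of each observation
N = F * T

# Illustrative data: the regressor is correlated with the firm effect.
a_firm = rng.normal(size=F)
a_year = rng.normal(size=T)
x = 0.5 * a_firm[firm] + rng.normal(size=N)
y = 1.0 * x + a_firm[firm] + a_year[year] + rng.normal(size=N)
X = x[:, None]

# D = [D_F, D_T]; one time dummy is dropped so that D has full column rank.
D_F = (firm[:, None] == np.arange(F)).astype(float)
D_T = (year[:, None] == np.arange(1, T)).astype(float)
D = np.hstack([D_F, D_T])

# Equation (4): M_D = I - D (D'D)^{-1} D'
M_D = np.eye(N) - D @ np.linalg.solve(D.T @ D, D.T)
X_star, y_star = M_D @ X, M_D @ y

# Equation (3): b = (X*'X*)^{-1} X*'y*
b_partitioned = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# The same slope from regressing y on [X, D] directly.
coef, *_ = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)
b_dummies = coef[:1]

print(b_partitioned, b_dummies)
```

The two printed slopes should agree up to floating-point error, which is the sense in which the demeaning matrix stands in for the full set of dummies.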
Crucially, the demeaning process captured in the matrix M is dependent on the
researcher’s choice of which fixed-effects to include. Specifically, assume that the researcher
3 The full analytical appendix is available from the authors upon request.
uses instead industry fixed-effects and time fixed-effects. Let $D^{*} = [D_H, D_T]$ and $\alpha^{*} = [\alpha_H, \alpha_T]$,
where $D_H$ stands for the matrix of industry dummies and $\alpha_H$ is the matrix of unobserved
industry fixed-effects. Then for the panel data the estimated model is
$$y = X\beta + D^{*}\alpha^{*} + \mu \tag{5}$$
Importantly, the disturbance term μ involves firm fixed-effects that have not been removed
by the inclusion of industry fixed-effects. Now, the estimated coefficient b1, from this model can
be expressed similarly to (3)
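$$
\begin{aligned}
b_1 &= [X' M_{D^{*}} X]^{-1}[X' M_{D^{*}} y] = [X^{**\prime} X^{**}]^{-1}[X^{**\prime} y^{**}] \\
&= [X^{**\prime} X^{**}]^{-1} X^{**\prime} M_{D^{*}}(X\beta + \mu) = \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} \mu^{**}
\end{aligned} \tag{6}
$$
where
$$M_{D^{*}} = I - D^{*}[D^{*\prime} D^{*}]^{-1} D^{*\prime}; \quad X^{**} = M_{D^{*}} X; \quad y^{**} = M_{D^{*}} y; \quad \mu^{**} = M_{D^{*}} \mu \tag{7}$$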
Because the disturbance term still includes firm fixed-effects (since using industry fixed-
effects has not fully controlled for cross-sectional variations), it follows that
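$$
\begin{aligned}
E[b_1 \mid X] &= \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} E[\mu^{**} \mid X] \\
&= \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} [D_F^{*} \alpha_F^{*}]
\end{aligned} \tag{8}
$$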
It follows from (8) that $b_1$ is biased. The magnitude of the bias is related to the covariance
element, $X^{**\prime}[D_F^{*}\alpha_F^{*}]$, and the scaling factor, $[X^{**\prime}X^{**}]$, which can be thought of as a measure of
the variability in the undemeaned regressor X (see Figure 2). Hence, including industry fixed-
effects instead of firm fixed-effects will affect inferences. Furthermore, t-statistics are also a
function of the estimated coefficients’ standard error. Note also that
$$\operatorname{Var}[b \mid X] = [X^{*\prime}X^{*}]^{-1}\sigma^2 \tag{9}$$
and
$$\operatorname{Var}[b_1 \mid X] = [X^{**\prime}X^{**}]^{-1}\sigma^2 \tag{10}$$
Since $\sigma^2$ is unknown to the researcher, it has to be estimated from the data. Let T denote
the number of years, F the number of firms and H the number of industries. The unbiased
estimator of $\sigma^2$ is $s^2$, whereby:
$$s^2 = \frac{e'e}{FT-(F-1)-(T-1)-K-1} = \frac{(y-Xb-Da)'(y-Xb-Da)}{FT-(F-1)-(T-1)-K-1} \tag{11}$$
Hence, the conditional variance of the k-th coefficient $b_k$ is based on the diagonal element
kk in the matrix $[X^{*\prime}X^{*}]^{-1}$ as follows:
$$s^2_{b_k} = \text{Est.Var}[b_k \mid X] = s^2\,[X^{*\prime}X^{*}]^{-1}_{kk}. \tag{12}$$
With industry and time fixed-effects, the equivalent expressions are
$$s_1^2 = \frac{e_1'e_1}{F \times T-(H-1)-(T-1)-K-1} = \frac{(y-Xb_1-D^{*}a^{*})'(y-Xb_1-D^{*}a^{*})}{F \times T-(H-1)-(T-1)-K-1} \tag{13}$$
and
$$s^2_{b_{1k}} = \text{Est.Var}[b_{1k} \mid X] = s_1^2\,[X^{**\prime}X^{**}]^{-1}_{kk} \tag{14}$$
Observation 3 in the Appendix states that a t-test based on the (misspecified) regression
coefficients $b_1$, $t_{b_{1k}} = b_{1k}/\sqrt{s^2_{b_{1k}}}$, and a t-test based on the correct regression coefficients b,
$t_{b_k} = b_k/\sqrt{s^2_{b_k}}$, are identical if and only if
$$
t_{b_{1k}} = \frac{b_{1k}}{\sqrt{s^2_{b_{1k}}}}
= \frac{b_{1k}\,\{[X^{**\prime}X^{**}]^{-1}_{kk}\}^{-1/2}}{\{e_1'e_1/(F \times T-(H-1)-(T-1)-K-1)\}^{1/2}}
= \frac{b_{k}\,\{[X^{*\prime}X^{*}]^{-1}_{kk}\}^{-1/2}}{\{e'e/(F \times T-(F-1)-(T-1)-K-1)\}^{1/2}}
= \frac{b_k}{\sqrt{s^2_{b_k}}} = t_{b_k} \tag{15}
$$
Otherwise, the sign and significance of the t-test of b1 would differ from those of b.
Note that with non-zero correlation between the firm fixed-effects and the independent variables
the two expressions for the t-statistics differ along four dimensions. The first is the difference between the
point estimates b and b1. The second difference relates to the specific kk-th element in the square
bracket in the numerator. The third is the two sums of the regression squared residuals. The
fourth and last is the difference in the degrees of freedom. With respect to the latter, note that as
H approaches F the difference in the degrees of freedom becomes smaller in magnitude. In the
extreme, when H = F the two t-statistics will be identical, as firm and industry fixed-effects
coincide.
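The degrees-of-freedom component of this comparison is simple arithmetic. The sketch below uses a hypothetical helper, `residual_df`, and illustrative panel dimensions (800 firms, 10 periods, 20 industries, one regressor, all assumed here for the example) to show how far apart the two denominators can be, and that they coincide when H = F.

```python
def residual_df(n_firms, n_periods, n_groups, n_regressors):
    """Residual degrees of freedom in a balanced panel with group and time dummies:
    F*T - (G - 1) - (T - 1) - K - 1, where G is the number of group dummies."""
    return (n_firms * n_periods
            - (n_groups - 1)
            - (n_periods - 1)
            - n_regressors
            - 1)

F, T, H, K = 800, 10, 20, 1                 # illustrative dimensions (assumed)

df_firm = residual_df(F, T, F, K)           # firm and time fixed-effects
df_industry = residual_df(F, T, H, K)       # industry and time fixed-effects
df_one_firm_each = residual_df(F, T, F, K)  # limiting case H = F

print(df_firm, df_industry, df_industry - df_firm)
```

With these numbers the industry specification treats F - H = 780 more observations as free than the firm specification does, so the same residual sum of squares would be divided by a larger number.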
The Ball et al. (2013) specification
Ball et al. (2013) argue that the Basu (1997) model suffers from a correlated omitted
variable problem. They suggest the problem could be solved by using a fixed-effects
specification. However, due to computational infeasibility, they suggest an alternative approach to
the standard fixed-effects specification in which the dependent variable, earnings, is adjusted for
average earnings (where averages are taken at a firm level over time). Demeaning only the
dependent variable is not identical to including firm and time fixed-effects.4 It is important to
note that Ball et al. (2013) employ several measures of returns. The important specification for
us appears in Table 5, and for this table they use size and book-to-market adjusted returns.
Importantly these returns are not zero mean. This would lead to biased coefficients.
4 Ball et al. (2013) compute unexpected returns by subtracting from firm-specific returns the market return (or the return on the corresponding size and book-to-market portfolios). If the average unexpected return is zero, the full fixed-effects model and the Ball et al. (2013) approach yield unbiased estimators in the Basu (1997) framework.
To see this, let $\bar{y}_i$ denote the firm-level average of $y_{it}$ (that is, $\bar{y}_i = \sum_{t=1}^{T} y_{it}/T$). If only the
dependent variable is demeaned, the model can be written as (assuming it includes time fixed-
effects):
$$y^{*}_{it} = y_{it} - \bar{y}_i = \alpha_t + X_{it}\beta + u_{it} \tag{16}$$
where $u_{it} = \varepsilon_{it} + C_{f=i} - \bar{y}_i$.
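$C_{f=i}$ denotes the time-invariant firm fixed-effects that are not explicitly modelled and hence are
absorbed in the disturbance term. Using matrix algebra, we get5
$$
\begin{aligned}
y^{*} &= D_T\alpha_T + X\beta + u; \\
M_T &= I - D_T[D_T'D_T]^{-1}D_T' \\
y^{+} &= M_T y^{*} = M_T(X\beta + u) = X^{+}\beta + u^{+}
\end{aligned} \tag{17}
$$
The estimated vector of coefficients is therefore
$$b = [X^{+\prime}X^{+}]^{-1}X^{+\prime}y^{+}$$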
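Taking expectations, and bearing in mind that the correct model is given by equation (1)
above and that $E[C_F \mid X] = D_F\alpha_F$:
$$
\begin{aligned}
E[b \mid X] &= E\big[[X^{+\prime}X^{+}]^{-1}X^{+\prime}y^{+} \mid X\big] \\
&= [X^{+\prime}X^{+}]^{-1}X^{+\prime}E[M_T(y-\bar{y}) \mid X] \\
&= [X^{+\prime}X^{+}]^{-1}X^{+\prime}(X^{+}\beta + M_T D_F\alpha_F - M_A X^{+}\beta - M_T D_F\alpha_F) \\
&= \beta - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}\beta = [I - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}]\beta
\end{aligned} \tag{18}
$$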
where $M_A$ is the firm-level average-creating matrix. That is, demeaning the dependent variable
but without demeaning the right-hand side variables leads to a biased coefficient. Only in the
case where $X^{+}$ is zero-mean, we obtain unbiased coefficients since $M_A X^{+} = 0$. Importantly, the
bias in (18) is unrelated to the correlation between the firm fixed effects and the other
5 With matrix algebra $u = \varepsilon + C_F - \bar{y}$.
explanatory variables. Equation (18) suggests that the Ball et al. (2013) coefficients are the true
coefficients scaled down by the expression $[I - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}]$.
The Ball et al. (2013) estimates are therefore biased and they would differ from those of a full
fixed-effects model. To stress, this is because including time and firm fixed-effects works to
demean the panel variables in a different way. Under this specification all variables (dependent
and independent) are transformed as follows:
$$v^{*}_{it} = v_{it} - \bar{v}_i - \bar{v}_t + \bar{v} \tag{19}$$
where $v^{*}_{it}$ is the demeaned variable for firm i and year t, $v_{it}$ is the original observation, $\bar{v}_i$ is the
average across all annual observations for firm i, $\bar{v}_t$ is the average of the underlying variable v in
year t across all firms, and $\bar{v}$ is the grand average of v. For the dependent variable this is
different from the Ball et al. (2013) transformation $v^{*}_{it} = v_{it} - \bar{v}_i$. This also affects inferences,
because the standard error used in the t-test is derived from the residual sum of squares, which also
differs between the two approaches. Additionally, a research design using the Ball et al.
(2013) approach is likely to incorrectly calculate the degrees of freedom. The typical statistical
software used in regression analysis will identify the transformed variable $v^{*}_{it} = v_{it} - \bar{v}_i$ as a
“single” variable and hence instead of using $FT-(F-1)-(T-1)-K-1$ degrees of freedom
would use $FT-(T-1)-K-1$ degrees of freedom.6
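The contrast between the two-way transformation and demeaning only the dependent variable can be illustrated on simulated data. The sketch below is an assumption-laden toy example, not the authors' code: it supposes a single regressor made of a firm-level component plus an idiosyncratic shock, a firm effect correlated with that component, and a true slope of 1. It compares the slope after the two-way transformation with the slope obtained when only y has its firm mean removed (time effects still controlled), and with the scaling factor implied by the derivation above.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, beta = 2000, 10, 1.0                       # hypothetical balanced panel

X_i = rng.normal(size=(F, 1))                    # firm-level part of the regressor
x = X_i + rng.normal(size=(F, T))                # rows are firms, columns are years
a_i = 0.5 * X_i + np.sqrt(0.75) * rng.normal(size=(F, 1))  # firm effect, corr 0.5 with X_i
a_t = rng.normal(size=(1, T))                    # time effect
y = beta * x + a_i + a_t + rng.normal(size=(F, T))

def two_way(v):
    """Subtract firm means and year means, add back the grand mean."""
    return v - v.mean(axis=1, keepdims=True) - v.mean(axis=0, keepdims=True) + v.mean()

def time_demean(v):
    """Remove year means only (equivalent to including time dummies)."""
    return v - v.mean(axis=0, keepdims=True)

def slope(xv, yv):
    """Single-regressor OLS slope for already-centred data."""
    return float((xv * yv).sum() / (xv * xv).sum())

# Firm and time fixed-effects: both y and x are transformed.
b_fe = slope(two_way(x), two_way(y))

# Only y has its firm mean removed; x is left in levels (time effects controlled).
y_star = y - y.mean(axis=1, keepdims=True)
x_plus = time_demean(x)
b_my = slope(x_plus, time_demean(y_star))

# Implied scaling factor: 1 - (x+' M_A x+) / (x+' x+), with M_A the firm-averaging operator.
x_plus_avg = np.broadcast_to(x_plus.mean(axis=1, keepdims=True), x_plus.shape)
scale = 1.0 - float((x_plus * x_plus_avg).sum() / (x_plus ** 2).sum())

print(b_fe, b_my, scale * beta)
```

Under these assumptions the demeaned-y slope should track the scaled-down value rather than the true slope, even though the firm effect is correlated with the regressor, which is the sense in which the bias does not depend on that correlation.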
The Fama-MacBeth (1973) approach
According to Fama-MacBeth (1973) regression coefficients are not calculated from a
panel, but rather from periodical cross-sectional regressions. Specifically, the overall coefficient
6 Ball et al. (2013) also estimate a model in which the dependent variable (earnings) is adjusted by subtracting the lagged dependent variable (but not the independent variable). It can be shown analytically that this approach also leads to biased estimates. For brevity we do not include the proof here, but simulate this specification later.
is the average coefficient over the T annual regressions and the standard error is derived from
the distribution of the individual (periodical) coefficients. Because periodical regressions are not
tooled to accommodate fixed firm or annual effects, this method is still prone to the same
problem. Each annual underlying model can be written as:
$$y_t = X_t\beta + \eta_t \tag{20}$$
where $X_t$ is the matrix of annual explanatory variables. However, the individual disturbance term
incorporates the fixed-effects, implying $\eta_{it} = C_{f=i} + C_t + \varepsilon_{it}$. The estimated coefficient for a
single year t therefore can be expressed as in a simple OLS setup:
$$b_t = [X_t'X_t]^{-1}[X_t'y_t] = \beta + [X_t'X_t]^{-1}[X_t'\eta_t] \tag{21}$$
Averaging over T annual regressions yields the Fama-MacBeth coefficient:
$$b_{FM} = \frac{1}{T}\sum_{t=1}^{T} b_t = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t'X_t]^{-1}X_t'\eta_t \tag{22}$$
The assumption of correlated fixed-effects implies that $E[C_f + C_t \mid X_t] \neq 0$. Hence we
obtain that in expectation
$$E[b_{FM} \mid X] = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t'X_t]^{-1}E[X_t'\eta_t \mid X_t]. \tag{23}$$
In other words, the Fama-MacBeth (1973) procedure is prone to the omitted fixed-effects
problem under the assumption that the true model is as expressed in equation (1).
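A small numerical sketch of the Fama-MacBeth procedure may make the point concrete. The data-generating process here is an assumption chosen for illustration (one regressor with a firm-level component and an idiosyncratic shock, a firm effect with correlation 0.5 to that component, true slope 1); the helper name `cross_section_slope` is likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
F, T, beta, rho = 800, 10, 1.0, 0.5              # illustrative values (assumed)

X_i = rng.normal(size=F)                         # firm-level part of the regressor
a_i = rho * X_i + np.sqrt(1 - rho**2) * rng.normal(size=F)  # corr(X_i, a_i) = rho
a_t = rng.normal(size=T)
x = X_i[:, None] + rng.normal(size=(F, T))
y = beta * x + a_i[:, None] + a_t[None, :] + rng.normal(size=(F, T))

def cross_section_slope(x_t, y_t):
    """OLS slope of y_t on x_t with an intercept, for one period."""
    Z = np.column_stack([np.ones_like(x_t), x_t])
    coef, *_ = np.linalg.lstsq(Z, y_t, rcond=None)
    return coef[1]

# One cross-sectional regression per period, then average the slopes.
b_t = np.array([cross_section_slope(x[:, t], y[:, t]) for t in range(T)])
b_fm = b_t.mean()
se_fm = b_t.std(ddof=1) / np.sqrt(T)             # Fama-MacBeth standard error

print(b_fm, se_fm)
```

The period intercepts absorb the time effect, but each cross-section still carries the firm effect in its disturbance, so under these assumptions the averaged slope stays away from the true value of 1 however many periods are averaged.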
3. Simulations
To illustrate the potential bias in estimated slope coefficients that is caused by omitting
fixed-effects, we simulate a panel dataset according to the following specification:
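$$y_{it} = a_i + a_t + a_I + \beta x_{it} + e_{it}$$
$$x_{it} = X_i + \varepsilon_{it}$$
$$
(X_i, a_t, a_i, a_I)' \sim N(\mathrm{M}, \Sigma), \quad
\mathrm{M} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix}
1 & \rho_{X_i a_t} & \rho_{X_i a_i} & \rho_{X_i a_I} \\
\rho_{X_i a_t} & 1 & 0 & 0 \\
\rho_{X_i a_i} & 0 & 1 & 0 \\
\rho_{X_i a_I} & 0 & 0 & 1
\end{pmatrix}
$$
and $\varepsilon_{it}, e_{it} \sim N(0,1)$. The variable $y_{it}$ is the dependent variable, $a_i$, $a_t$, $a_I$ are firm, time and
industry fixed-effects, respectively, and $x_{it}$ is the independent variable. The values of the true
parameters are $\rho_{X_i a_t}, \rho_{X_i a_i}, \rho_{X_i a_I} \in \{0.5, 0.25, 0, -0.25, -0.50\}$ and $\beta = 1$. The null hypothesis is
$\beta = 1$. Since the bias in the estimate of $\beta$ depends on the correlation between the omitted fixed-effects and the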
regressor X, we impose five different levels of correlations between the fixed-effects and X:
0.50, 0.25, 0, -0.25 and -0.50. We expect a positive, zero and negative bias for the positive, zero
and negative correlations, respectively, when the model omits the fixed-effects.7
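A back-of-the-envelope version of this expectation can be written down for the single-regressor case. The sketch below assumes, purely for illustration, that the firm effect $a_i$ is the only omitted component correlated with the regressor, that the firm-level component of the regressor and $a_i$ both have unit variance with correlation $\rho$, and that the regressor also contains an independent unit-variance shock. Under those assumptions the textbook omitted-variable formula gives:

```latex
\operatorname{plim}\, b_{\text{omitted}}
  = \beta + \frac{\operatorname{Cov}(x_{it}, a_i)}{\operatorname{Var}(x_{it})}
  = \beta + \frac{\rho}{1 + 1}
  = \beta + \frac{\rho}{2}
```

With $\beta = 1$, a correlation of 0.5 would then imply a probability limit of about 1.25, a correlation of -0.5 about 0.75, and a zero correlation no bias. The denominator also shows why a more variable regressor shrinks the bias for a given covariance.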
Using the above specification, we simulate a panel of 8,000 observations made of 10
periods, 20 industries and 40 firms per industry. We repeat this process 8,000 times, applying
the following eight specifications:
(1) Firm and time fixed-effects (FE) – This model includes both firm and time fixed-effects.
We expect this specification to yield an unbiased slope estimate (b = 1). We label this
model as FE.
(2) Industry and time fixed-effects (IE) – Here, we include time and industry fixed-effects,
by replacing $a_i$ with $a_I$. This specification ignores within-industry variations at the firm
level and hence we expect the slope estimate to be biased. Notice that this specification
approaches the full fixed-effects model as the number of firms per industry decreases. At
7 In a single variable setting, it is easy to sign the bias; In a multivariate setting, however, the sign of the bias depends on the correlation matrix of the regressors (see the Appendix).
the extreme case where there is one firm per industry, this specification is identical to the
full fixed-effects model. To show this, we conduct sensitivity analysis where we
sequentially increase the number of industries (and reduce the number of firms per industry)
keeping the number of observations constant (see Table 3). We label this model as IE.
(3) Misspecified model (MS) – Here we omit all fixed-effects; hence, we expect the estimated
slope coefficient (β) to be biased to a greater degree than the previous model. However,
given the small number of time periods (10) relative to the number of firms (800), the bias
from omitting time effects is expected to be small. This situation is similar to that in many
studies that use archival data, as the number of firms is much larger than the number of
periods. We label this model MS.
(4) First differences model for both the dependent and independent variables (FD) – In
this model, we use first differences instead of current values (that is, current value minus
the lagged one). This leads to an unbiased estimated slope coefficient, and one could omit
the fixed-effects from the model. However, the differencing process involves loss of
information, as the first period in the panel is lost. We label this method FD.
(5) Using first differences for the dependent variable only (LY) – Here, the researcher first-
differences the dependent variable but not the independent variables. We expect this
specification to yield biased results. However, in this case the bias is induced not only by
the covariance between the independent variable and the fixed-effects, but also by the
exclusion of the variable $\beta X_{it-1}$ from the model. Hence, the bias in this case is also a function
of the true β; when β is positive the bias is negative and when β is negative the bias is
positive. We label this model LY (Lag Y).
(6) Using the time-series means of the independent and the dependent variables (MYX) –
Here we convert the panel dataset into a single cross-sectional regression by using the
means of the independent and dependent variables as the main variables in the regression
(see for instance, Aghion et al., 2010). Using time-series means implies that the error term
still includes firm fixed effects and hence the coefficient estimates are biased.
(7) Demeaning the dependent variable Y (MY) - Similar to the LY case and motivated by
Ball et al. (2013), this specification only adjusts the Ys by subtracting the firm level
averages. However, this specification is expected to yield biased estimates, as argued above.
The bias is induced not by the covariance between the independent variable and the fixed-
effects, but by the failure to demean the independent variable at the firm level. Equation (18)
suggests that the coefficient estimate under this specification is a scaled-down estimate of the
true parameter. We therefore expect it to be smaller than 1, regardless of the sign of the
correlation between the fixed-effects and independent variable. We label this model as MY.
(8) Fama-MacBeth (FM) – We also estimate the model (without fixed-effects) using the
Fama-MacBeth (1973) procedure; that is, estimating 10 periodical regressions and reporting
the average slope coefficient. Equation (23) suggests the FM specification would yield
a biased estimate under the true model that includes firm and time effects. We label this
model as FM.
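Four of the specifications listed above (FE, IE, MS and FD) can be sketched in a compact Monte Carlo. This is a simplified illustration under stated assumptions, not a replication of the reported simulations: only the firm effect is drawn jointly with the firm-level component of the regressor (correlation 0.5), the time and industry effects are drawn independently, the regressor has an idiosyncratic unit-variance shock, and only 200 repetitions are run. All function names are hypothetical.

```python
import numpy as np

def simulate_once(rng, n_ind=20, firms_per_ind=40, T=10, beta=1.0, rho=0.5):
    """Draw one balanced panel; firms are ordered by industry."""
    F = n_ind * firms_per_ind
    X_i = rng.normal(size=F)
    a_i = rho * X_i + np.sqrt(1 - rho**2) * rng.normal(size=F)  # corr(X_i, a_i) = rho
    a_I = np.repeat(rng.normal(size=n_ind), firms_per_ind)      # industry effect
    a_t = rng.normal(size=T)                                    # time effect
    x = X_i[:, None] + rng.normal(size=(F, T))
    y = beta * x + (a_i + a_I)[:, None] + a_t[None, :] + rng.normal(size=(F, T))
    return x, y

def slope(xv, yv):
    """Single-regressor OLS slope for already-centred data."""
    return float((xv * yv).sum() / (xv * xv).sum())

def demean_firm_time(v):
    return v - v.mean(axis=1, keepdims=True) - v.mean(axis=0, keepdims=True) + v.mean()

def demean_industry_time(v, n_ind):
    F, T = v.shape
    cube = v.reshape(n_ind, F // n_ind, T)                # industry x firm x year
    ind_mean = cube.mean(axis=(1, 2), keepdims=True)
    time_mean = cube.mean(axis=(0, 1), keepdims=True)
    return (cube - ind_mean - time_mean + cube.mean()).reshape(F, T)

def estimates(x, y, n_ind):
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)       # first differences
    return {
        "FE": slope(demean_firm_time(x), demean_firm_time(y)),
        "IE": slope(demean_industry_time(x, n_ind), demean_industry_time(y, n_ind)),
        "MS": slope(x - x.mean(), y - y.mean()),          # intercept only
        "FD": slope(dx - dx.mean(), dy - dy.mean()),      # intercept only
    }

rng = np.random.default_rng(3)
reps = 200                                                # far fewer than 8,000, for speed
draws = {name: [] for name in ("FE", "IE", "MS", "FD")}
for _ in range(reps):
    x, y = simulate_once(rng)
    for name, b in estimates(x, y, n_ind=20).items():
        draws[name].append(b)

means = {name: float(np.mean(v)) for name, v in draws.items()}
sds = {name: float(np.std(v, ddof=1)) for name, v in draws.items()}
print(means)
print(sds)
```

Because the panel is balanced, subtracting group and year means and adding back the grand mean is equivalent to including the corresponding dummies, which keeps the sketch fast. Printing the standard deviations alongside the means allows the dispersion of the FD and FE estimates to be compared directly.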
For each of the eight specifications, we obtain 8,000 slope estimates. We also vary the
magnitude of the correlation between the fixed-effects and the regressor X. Table 2 reports the
means of the estimated slope coefficients, standard errors, t-statistics of the distance from the
true coefficient (β = 1), and R2s for five levels of correlations: 0.5, 0.25, 0, -0.25, and -0.5. We
also present the distribution of the estimated slope coefficients in Figure 1, using three different
correlations: Figures 1a, 1b, and 1c present the distribution for a correlation of 0.5, 0, and -0.5,
respectively.
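The comparison between the FE, MS, and FD specifications can be illustrated with a minimal sketch. The data generating process below is an assumption made for illustration only (standard-normal firm effects, time effects, and errors, and a unit-variance regressor whose correlation with the firm effect is rho); it is not the exact design behind Table 2, so the magnitudes will differ even though the direction of the bias should be the same. The function names are hypothetical.

```python
import numpy as np


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta*x_it + a_i + d_t + e_it with corr(x_it, a_i) = rho."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)            # firm fixed-effects
    d = rng.standard_normal(n_periods)          # time fixed-effects
    noise = rng.standard_normal((n_firms, n_periods))
    x = rho * a[:, None] + np.sqrt(1.0 - rho**2) * noise
    e = rng.standard_normal((n_firms, n_periods))
    y = beta * x + a[:, None] + d[None, :] + e
    return x, y


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc = np.ravel(x) - np.mean(x)
    yc = np.ravel(y) - np.mean(y)
    return float(xc @ yc / (xc @ xc))


def demean_two_way(z):
    """Subtract firm (row) and period (column) means from a balanced panel."""
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()


def compare_estimators(x, y):
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
    return {
        "FE": ols_slope(demean_two_way(x), demean_two_way(y)),
        "MS": ols_slope(x, y),
        # period means of the differences absorb the change in the time effect
        "FD": ols_slope(dx - dx.mean(axis=0), dy - dy.mean(axis=0)),
    }


results = {rho: compare_estimators(*simulate_panel(rho=rho)) for rho in (0.5, 0.0, -0.5)}
for rho, est in results.items():
    print(rho, {name: round(b, 3) for name, b in est.items()})
```

Under these assumptions Cov(x, a)/Var(x) = rho, so the pooled (MS) slope drifts toward 1 + rho, while the FE and FD slopes stay close to the true value of 1.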
By construction, the full fixed-effects model (FE) yields an unbiased estimate (b = 1) and
high R2s for all levels of correlation. Also, the distribution of the bs is the tightest among all
alternative distributions, as can be seen from the figures. The second model (industry and time
effects, denoted IE) yields a positive bias (b = 1.25) when the correlation between the fixed-
effects and the regressor is 0.5; zero bias when the correlation is zero and negative bias when the
correlation is -0.5 (b = 0.75). We would incorrectly reject the null hypothesis that the slope is
equal to 1 in all cases, except for the case of zero correlation between the fixed-effects and the
regressor. Also, the regression R2s are lower relative to the FE model.
When both firm and time effects are omitted (MS), the pattern of bias is similar to that
observed for model IE. That is, with 20 industries and 800 different firms, controlling for
industry fixed-effects performs as poorly as the fully misspecified model. Moreover, from
Figures 1a-1c we note that the distributions of the slope coefficient under both the MS and IE
specifications are completely disjoint from the distribution of b under the FE specification. This suggests that
it is very unlikely that a slope estimate from these two specifications would fall within a
conventional confidence interval obtained under the full fixed-effect model.
Using first differences for both the dependent and independent variables (FD) yields an
unbiased slope estimate (b = 1.00) for all five correlations, but the estimate is less efficient, as
reflected by the larger standard errors, lower t-statistic, lower adjusted R2, and larger tails
seen in Figures 1a-1c. Using a first difference only for the dependent variable (LY) yields a
biased and less efficient estimate (b = 0.50; adjusted R2 = 0.09), regardless of the correlation
between the fixed-effects and regressor X. This is because differencing the dependent variable
removes the fixed-effects, so the estimate is not sensitive to the correlation between the
fixed-effects and the independent variable; the bias stems instead from the omitted lagged regressor.
In model MYX, we use means of X and Y and estimate a cross-sectional regression. This
model yields a large positive (negative) bias when the correlation between the fixed-effects and
the regressor is positive (negative). However, in the case of demeaning the dependent variable
only (MY), the bias is negative regardless of the correlation between the fixed-effects and the
regressor. The fact that the LY and MY distributions lie to the left of the FE distribution is
consistent with the theoretical prediction derived in the previous section, which states that with
one explanatory variable the estimated coefficient will be smaller in magnitude. Since β is
positive here, both models yield values smaller than one.8
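The behaviour of the MYX, MY, and LY shortcuts can be sketched on simulated data. The data generating process is an illustrative assumption (the same unit-variance design with corr(x, a) = rho used for illustration, not the paper's exact simulation), and the function names are hypothetical. In this sketch the MY and LY slopes are scaled down by the share of the regressor's variance that varies within firms, so they do not depend on the sign of rho, although their exact values differ from those in Table 2.

```python
import numpy as np


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta*x_it + a_i + d_t + e_it with corr(x_it, a_i) = rho."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)            # firm fixed-effects
    d = rng.standard_normal(n_periods)          # time fixed-effects
    noise = rng.standard_normal((n_firms, n_periods))
    x = rho * a[:, None] + np.sqrt(1.0 - rho**2) * noise
    e = rng.standard_normal((n_firms, n_periods))
    y = beta * x + a[:, None] + d[None, :] + e
    return x, y


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc = np.ravel(x) - np.mean(x)
    yc = np.ravel(y) - np.mean(y)
    return float(xc @ yc / (xc @ xc))


def partial_adjustments(x, y):
    """Slopes from three shortcuts that are not equivalent to firm fixed-effects."""
    return {
        # MYX: one cross-sectional regression on firm-level time-series means
        "MYX": ols_slope(x.mean(axis=1), y.mean(axis=1)),
        # MY: y is demeaned at the firm level, x is left unadjusted
        "MY": ols_slope(x, y - y.mean(axis=1, keepdims=True)),
        # LY: y is first-differenced, x enters in levels
        "LY": ols_slope(x[:, 1:], np.diff(y, axis=1)),
    }


slopes = {rho: partial_adjustments(*simulate_panel(rho=rho)) for rho in (0.5, -0.5)}
for rho, est in slopes.items():
    print(rho, {name: round(b, 3) for name, b in est.items()})
```

With a true slope of 1, the MYX estimate moves sharply in the direction of rho, whereas the MY and LY estimates fall below 1 for both signs of rho.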
Using the Fama-MacBeth (1973) specification (FM) yields results that are qualitatively similar to
those of the misspecified model (MS), with even larger tails of the distribution. This is seen in
Figures 1a-1c, where the FM parameter distribution obscures the parameter distribution of the MS model.
Finally, when the correlation between the fixed-effects and the independent variable is zero,
omitting the fixed-effects is not expected to cause any bias. Indeed, the results show that the
slope coefficients in models IE, MS, and FM are unbiased in this case.9
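A minimal sketch of the Fama-MacBeth (1973) procedure on a simulated panel shows why it inherits the bias of the pooled model: each period-by-period regression leaves the firm effect in the error term. The data generating process is an illustrative assumption rather than the paper's exact design, and the function names are hypothetical.

```python
import numpy as np


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta*x_it + a_i + d_t + e_it with corr(x_it, a_i) = rho."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)            # firm fixed-effects
    d = rng.standard_normal(n_periods)          # time fixed-effects
    noise = rng.standard_normal((n_firms, n_periods))
    x = rho * a[:, None] + np.sqrt(1.0 - rho**2) * noise
    e = rng.standard_normal((n_firms, n_periods))
    y = beta * x + a[:, None] + d[None, :] + e
    return x, y


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc = np.ravel(x) - np.mean(x)
    yc = np.ravel(y) - np.mean(y)
    return float(xc @ yc / (xc @ xc))


def fama_macbeth(x, y):
    """Mean of the period-by-period slopes and the usual FM standard error."""
    period_slopes = np.array([ols_slope(x[:, t], y[:, t]) for t in range(x.shape[1])])
    se = period_slopes.std(ddof=1) / np.sqrt(len(period_slopes))
    return float(period_slopes.mean()), float(se)


summary = {}
for rho in (0.5, 0.0, -0.5):
    x, y = simulate_panel(rho=rho)
    fm_slope, fm_se = fama_macbeth(x, y)
    summary[rho] = {"FM": fm_slope, "FM_se": fm_se, "MS": ols_slope(x, y)}
    print(rho, {k: round(v, 3) for k, v in summary[rho].items()})
```

Because the FM standard error is computed from the dispersion of the period slopes, it can be small even when every period slope is biased in the same direction.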
8 We also find from additional simulations (not tabulated) that when β = -1, the LY and MY distributions lie within the negative range and to the right of the FE and FD distributions. This, again, is consistent with a lower magnitude relative to the true value.
9 Note that the distributions of the slope coefficient depicted in the three charts of Figure 1 do not correspond to the average standard errors reported in Table 2. For example, the distribution of β under the FM specification features larger tails than that of the FE model, although the standard error for the FM model (0.008) is smaller than that of the FE model (0.012). To illustrate, assume that we run 5 simulations of 8,000 observations each and obtain the following coefficient estimates for the FM model: -2, -1, 0, 1, and 2. Also, suppose that the regression output is such that each coefficient is estimated with a standard error of 1. In contrast, for the FE model assume we obtain coefficients of -1, -0.5, 0, 0.5 and 1, but each coefficient is estimated with a standard error of 2. Then, if we were to chart these outcomes, the distribution of the FM (FE) estimates would be wider (narrower), but the average standard error tabulated would be smaller (larger) for the FM (FE) specification.
The conclusion from this analysis is that omitting firm fixed-effects will result in biased
slope coefficients unless the fixed-effects are uncorrelated with the regressor. Using
firm fixed-effects is a safe approach in that it will generate unbiased coefficients even when the
data generating process does not contain unobserved correlated fixed-effects. An alternative
approach would be to conduct the Hausman (1978) test procedure to identify whether fixed-
effects should be employed. However, since the Hausman test practically runs a fixed-effect
model against a model with no fixed-effects, we see no clear advantage over routinely including
firm fixed-effects.10
(Table 2 and Figure 1 about here)
Since industry fixed-effects are often used instead of firm fixed-effects, we examine the
effect of firm distribution across industries on the results by changing the number of firms per
industry while keeping the total number of observations constant at 8,000. We consider the
following cases: (i) 10 industries with 80 companies in each industry; (ii) 20 industries with 40
companies in each industry (the baseline used above); (iii) 40 industries with 20 companies in
each industry; (iv) 160 industries with 5 companies in each industry; and (v) 400 industries with
2 companies in each industry. We expect the bias to decline as the number of industries
increases. In the extreme case of one firm per industry, there will not be any bias, as this case
coincides with the full fixed-effects specification.
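The design just described can be sketched as follows. The sketch assumes, for illustration only, that firm effects are drawn independently of industry membership and that firms are stored sorted by industry; the data generating process and function names are hypothetical and are not the paper's exact simulation. Industry and time means are removed from a balanced panel, and the number of firms per industry is varied while the panel stays at 800 firms and 10 periods.

```python
import numpy as np


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta*x_it + a_i + d_t + e_it with corr(x_it, a_i) = rho."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)            # firm effects, unrelated to industry
    d = rng.standard_normal(n_periods)          # time fixed-effects
    noise = rng.standard_normal((n_firms, n_periods))
    x = rho * a[:, None] + np.sqrt(1.0 - rho**2) * noise
    e = rng.standard_normal((n_firms, n_periods))
    y = beta * x + a[:, None] + d[None, :] + e
    return x, y


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc = np.ravel(x) - np.mean(x)
    yc = np.ravel(y) - np.mean(y)
    return float(xc @ yc / (xc @ xc))


def demean_industry_time(z, firms_per_industry):
    """Remove industry and period means; consecutive rows form one industry."""
    n_firms, n_periods = z.shape
    grouped = z.reshape(n_firms // firms_per_industry, firms_per_industry, n_periods)
    industry_mean = grouped.mean(axis=(1, 2), keepdims=True)
    period_mean = z.mean(axis=0)
    out = grouped - industry_mean - period_mean[None, None, :] + z.mean()
    return out.reshape(n_firms, n_periods)


x, y = simulate_panel(rho=0.5)
ie_slopes = {
    m: ols_slope(demean_industry_time(x, m), demean_industry_time(y, m))
    for m in (80, 40, 20, 5, 2, 1)   # firms per industry; 1 reproduces firm effects
}
for m, b in ie_slopes.items():
    print(f"{800 // m:4d} industries x {m:2d} firms: slope = {b:.3f}")
```

With one firm per industry the industry means coincide with the firm means, so the last line reproduces the full fixed-effects estimate.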
Table 3 contains the results of this analysis. As the number of industries increases and the
number of companies per industry decreases, the bias declines. However, the decline in the bias
is rather small. For example, the bias is 24% when we use 40 industries and 20 companies per
industry; it declines to 17% when using 400 industries and two companies per industry. Hence,
replacing firm fixed-effects with industry fixed-effects and increasing the number of industries
will not eliminate the bias in the coefficients, although using a finer industry classification might
reduce the bias. For example, using the Fama-French 48-industry classification in estimating
panel datasets is expected to yield less biased results than using the 12-industry classification.
10 Another advantage of using fixed-effects specifications is that typical software output can report the fixed-effects coefficients, if the researcher is interested in exploring or reporting these coefficients.
(Table 3 about here)
The bias caused by omitting firm and time fixed-effects depends also on the time-variance
of the main regressor X. As the time-variance of the regressor X increases, the bias caused by
omitting the firm fixed-effects is expected to decline. To see this, we let the variance of X (i.e.,
of the parameter - t ) decrease from 2.0 to 0.25 in steps of 0.25. As Figure 2 shows, the
bias in the slope coefficient increases as the variance of X decreases. In other words, omitting
firm fixed-effects results in a larger bias as the main regressors become more time-invariant. In
contrast, when the regressor X varies substantially over time, omitting firm fixed-effects is likely
to result in little bias, if any.
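The role of the regressor's time-variance can be sketched as follows. The sketch assumes, for illustration, that x_it = gamma * a_i + w_it, where w_it is a time-varying component with variance within_var; gamma, within_var, and the function names are hypothetical and not taken from the paper's design. Under this assumption the pooled bias is Cov(x, a)/Var(x) = gamma / (gamma^2 + within_var), which grows as within_var shrinks.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc = np.ravel(x) - np.mean(x)
    yc = np.ravel(y) - np.mean(y)
    return float(xc @ yc / (xc @ xc))


def demean_two_way(z):
    """Subtract firm (row) and period (column) means from a balanced panel."""
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()


def pooled_and_fe_slopes(within_var, gamma=0.5, n_firms=800, n_periods=10,
                         beta=1.0, seed=0):
    """Return (pooled slope, FE slope) for a given variance of the time-varying part of x."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)            # firm fixed-effects
    d = rng.standard_normal(n_periods)          # time fixed-effects
    w = np.sqrt(within_var) * rng.standard_normal((n_firms, n_periods))
    x = gamma * a[:, None] + w
    y = beta * x + a[:, None] + d[None, :] + rng.standard_normal((n_firms, n_periods))
    return ols_slope(x, y), ols_slope(demean_two_way(x), demean_two_way(y))


variances = [2.0 - 0.25 * k for k in range(8)]   # 2.0, 1.75, ..., 0.25
table = {v: pooled_and_fe_slopes(v) for v in variances}
for v, (pooled, fe) in table.items():
    print(f"within_var = {v:4.2f}: pooled = {pooled:.3f}, FE = {fe:.3f}")
```

The FE slope stays near 1 throughout, but it is estimated less precisely as the within-firm variation in x shrinks.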
(Figure 2 about here)
Overall we draw several conclusions from the simulation analysis:
(i) Omitting firm fixed-effects may generate biased estimates and overstated t-statistics, hence,
wrong inferences. Replacing firm with industry fixed-effects is not a valid approach as it
does not eliminate the coefficient bias, if the purpose is to control for unobserved correlated
omitted variables that are time-invariant. While increasing the number of industries is likely
to reduce the bias, this approach is unlikely to eliminate the bias.
(ii) Using means of the dependent and independent variables, or the approach taken by Ball et
al. (2013), is not equivalent to using firm fixed-effects. These methods yield biased
estimates. The same holds for differencing only the dependent variable (LY).
(iii) Using first differences (for both the dependent variable and independent variables) is a
valid, but less efficient, estimation strategy.
(iv) The coefficient distributions of several specifications may be so disjoint from the
coefficient distribution of a fixed-effect model that respective confidence intervals may be
entirely non-overlapping. That is, the chance of correct inference under the wrong
specification may be quite slim.
4. Implications for Empirical Accounting Research
We now examine the effects of using different model specifications on the results of
commonly used regression models in accounting research. We chose two regressions that have
gained wide recognition: Basu's (1997) model of asymmetric timeliness of earnings and Sloan’s
(1996) differential persistence of accruals and cash flow components of earnings.11
4.1 The Asymmetric Timeliness of Earnings – Basu (1997)
The Basu (1997) model highlights the differential reaction of earnings to good and bad
news, where stock returns serve as a proxy for news. The regression is:
$$\frac{X_{it}}{P_{it-1}} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) \times R_{it} + \varepsilon_{it}$$
where $R_{it}$ denotes firm i's annual stock returns for the 12 months starting nine months prior to
fiscal year-end until three months after the fiscal year-end, a period that roughly corresponds to
the period between earnings announcements. $X_{it}$ denotes firm i's earnings per share for year t,
$P_{it-1}$ denotes firm i's share price at the beginning of fiscal year t, and $D(R_{it} < 0)$ is an indicator
variable obtaining the value "1" if stock returns are negative, and "0" otherwise.
11 The problem of excluding fixed-effects applies to any dynamic panel data models where the dependent variable is a function of lagged values of the independent variables. Suppose that the data generating process is $y_{it} = a_i + a_t + \beta y_{it-1} + e_{it}$. Then it follows that the lagged dependent variable is $y_{it-1} = a_i + a_{t-1} + \beta y_{it-2} + e_{it-1}$. If the researcher instead estimates the regression $y_{it} = \beta y_{it-1} + u_{it}$, where $u_{it} = e_{it} + a_i + a_t$, it follows that $E[u_{it} y_{it-1} \mid y_{it-1}] \neq 0$ and therefore the estimate b will be biased.
Like Basu (1997), we use all firm-year observations from 1963 to 1990 for which stock
returns are available on the CRSP monthly files, and the necessary accounting data available on
Compustat. Similarly, we deflate earnings by the beginning-of-year share price and eliminate
observations falling in the top or bottom 0.5% of opening price-deflated earnings in each
calendar year to reduce the effects of outliers on the results.
Like Ball et al. (2013) we consider two additional definitions of stock returns: market
adjusted returns, and size and book-to-market adjusted returns. The size and book-to-market
adjusted returns are computed by forming 5x5 portfolios based on annual sorts on market
capitalization and on the book-to-market ratio (at the end of year t-1). We then calculate
monthly value-weighted mean returns for each size and book-to-market portfolio and subtract
the return of the matching size and book-to-market portfolio from each firm's raw return.
Market-adjusted returns are raw returns minus the value-weighted market returns. To save space, we
report results only for raw returns; results for market-adjusted returns and size and book to
market-adjusted returns are similar.
We collect data for all US firms that trade on the NYSE, AMEX and NASDAQ. Our
sample contains 114,175 firm-year observations for the period 1963-2013, and 42,546 firm-year
observations for the period 1963-1990. For comparison, Basu (1997) reports results for a sample
of 43,321 firm-year observations over the same period. Panel A of Table 4 reports summary
statistics for the regression variables for the period 1963-2013. These statistics are consistent
with those reported in Patatoukas and Thomas (2011) and Ball et al. (2013). For instance,
$X_{it}/P_{it-1}$ is left-skewed and $R_{it}$ is right-skewed. The median value of $X_{it}/P_{it-1}$ in our study is
0.063 whereas Patatoukas and Thomas (2011) report median value of 0.063 and Ball et al.
(2013) report median value of 0.062. The median value of annual stock returns $R_{it}$ in our study
is 0.094, whereas Patatoukas and Thomas (2011) report 0.089 and Ball et al. (2013) report a
value of -0.047.
(Table 4 about here)
Table 5 presents the sensitivity of the asymmetric timeliness of earnings regression to the
inclusion of fixed-effects. Looking at the 1963-1990 sample, our results show that when pooled
OLS is used, the coefficient $\beta_1$ is positive (0.198) and significant at the 0.01 level (t-statistic of
24.12). Basu (1997) reports a somewhat larger coefficient of 0.256 (t-statistic of 27.14). The
common interpretation of this result is that contemporaneous earnings reflect negative news in a
timelier manner than positive news (accounting earnings are conditionally conservative).
When firm and time fixed-effects are included in the model the slope coefficient is
significantly lower than that reported by Basu (1997). Specifically, the coefficient $\beta_1$ is 0.091
(t-statistic = 10.88) for the 1963-1990 period and 0.145 (t-statistic = 28.45) for the 1963-2013
period. When industry effects are included in the model instead of firm effects, $\beta_1$ is 0.163
(t-statistic = 21.72) for the 1963-1990 period and 0.223 (t-statistic = 44.90) for the 1963-2013
period, very similar to those reported by Basu (1997). Furthermore, using the Fama-MacBeth
(1973) methodology yields $\beta_1$ equal to 0.211 (t-statistic = 9.26) for the 1963-1990 period and
0.26 (t-statistic = 14.51) for the 1963-2013 period, again very close to the results reported by
Basu (1997).
In sum, adding firm fixed-effects reduces the coefficient on conservatism ($\beta_1$)
substantially, while using industry fixed-effects or the Fama-MacBeth (1973) methodology
yields a much higher conservatism coefficient. These results are in line with our theoretical
predictions and highlight the fact that when the data generating process contains firm fixed-
effects, the inclusion of industry fixed-effects does not help in dealing with unobserved firm
heterogeneity. Similarly, as predicted, the Fama-MacBeth (1973) approach is also subject to
unobserved heterogeneity biases.
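The contrast between pooled and fixed-effects estimates of the Basu (1997) regression can be sketched on synthetic data. Everything about the data generating process below is an assumption made for illustration: firms differ in return volatility, more volatile firms have a lower firm-specific level of price-deflated earnings, and the true coefficients are b0 = 0.02 and b1 = 0.10. None of this is taken from the CRSP/Compustat sample, and the function names are hypothetical. The fixed-effects estimates are obtained by demeaning the dependent variable and every regressor by firm and year, which is valid for a balanced panel.

```python
import numpy as np


def simulate_basu_panel(n_firms=500, n_periods=20, b0=0.02, b1=0.10, seed=0):
    """Synthetic X/P, R and D(R<0) with a firm effect tied to return volatility."""
    rng = np.random.default_rng(seed)
    sigma = rng.uniform(0.1, 0.6, size=n_firms)           # firm-specific volatility
    a = -0.4 * (sigma - sigma.mean())                     # firm effect (assumed form)
    r = 0.10 + sigma[:, None] * rng.standard_normal((n_firms, n_periods))
    d = (r < 0).astype(float)                             # bad-news indicator
    year = 0.01 * rng.standard_normal(n_periods)          # year effects
    e = 0.05 * rng.standard_normal((n_firms, n_periods))
    xp = 0.05 + b0 * r + b1 * d * r + a[:, None] + year[None, :] + e
    return xp, r, d


def demean_two_way(z):
    """Subtract firm (row) and year (column) means from a balanced panel."""
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()


def basu_coefficients(xp, r, d, fixed_effects):
    """Return estimates of (alpha1, beta0, beta1) with or without firm and year effects."""
    regressors = [d, r, d * r]
    if fixed_effects:
        y = demean_two_way(xp).ravel()
        X = np.column_stack([demean_two_way(v).ravel() for v in regressors])
    else:
        y = xp.ravel()
        X = np.column_stack([np.ones(xp.size)] + [v.ravel() for v in regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-3:]


xp, r, d = simulate_basu_panel()
pooled = basu_coefficients(xp, r, d, fixed_effects=False)
fe = basu_coefficients(xp, r, d, fixed_effects=True)
print("pooled beta1:", round(float(pooled[2]), 3))
print("FE beta1:    ", round(float(fe[2]), 3))
```

In this sketch the pooled asymmetric-timeliness coefficient is overstated because the omitted firm effect is related to the likelihood of large negative returns, while the fixed-effects estimate stays close to the assumed true value.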
Aghion et al. (2010) use a different approach to estimating a panel dataset. Instead of
adding firm and time fixed-effects, they convert the panel dataset into a cross-sectional
regression in which the vectors of dependent and independent variables are replaced by their
time-series means (denoted here MYX). Using this method yields conservatism coefficients ($\beta_1$)
equal to 0.097 (t-statistic = 9.03) for the 1963-1990 period and 0.096 (t-statistic = 10.43) for the
1963-2013 period.
Ball et al. (2013) acknowledge that in order to deal with this problem, the researcher could
use a fixed-effects specification. However, due to computational constraints, their empirical
approach is different: demeaning the dependent variable (MY). To assess
the effect of their approach, the last specification in Table 5 employs the Ball et al. (2013)
specification (denoted MY). As can be seen, $\beta_1$ is 0.048 (t-statistic = 6.65) for the 1963-1990
period and 0.082 (t-statistic = 18.96) for the 1963-2013 period. These values are similar to the
estimates reported in Table 5 (row 5) in Ball et al. (2013). We therefore conclude that $\beta_1$ under
the Ball et al. (2013) specification is significantly lower than the coefficient obtained from the
full fixed-effects specification. This result suggests that the MY specification provides a lower
bound for the conditional conservatism coefficient. This is likely to be the case as the mean
value of the dependent variable is positive, as suggested by our simulations.
Overall, our results suggest that there is substantial unobserved heterogeneity at the firm
level that seems to be an important determinant of price-deflated earnings. These
findings are consistent with the Ball et al. (2013) finding that the Basu (1997) model is affected by
correlated omitted variables due to the expected components of earnings being correlated with
the expected components of stock returns. However, we argue that the empirical specification of
Ball et al. (2013) is not an appropriate substitute for firm and time fixed-effects (FE). This
notwithstanding, from a qualitative standpoint, our results confirm the presence of conditional
conservatism in earnings.
(Table 5 about here)
4.2 The Differential Persistence of Accruals and Cash Flow Components of Earnings
Sloan (1996) explores the association between future income and previous year’s accruals
and cash flows. He finds that the persistence of cash flows exceeds that of accruals, which is
consistent with the reversal property of accruals. Sloan (1996) estimates the following
regression, allowing the persistence coefficient on the accruals and cash flow components of
earnings to be different. The model is:
$$\frac{OI_{it}}{TA_{it}} = \alpha_0 + \beta_0 \frac{ACC_{it-1}}{TA_{it-1}} + \beta_1 \frac{CF_{it-1}}{TA_{it-1}} + e_{it}$$
where $OI_{it}/TA_{it}$ is operating income divided by average total assets, $ACC_{it-1}/TA_{it-1}$ denotes
operating accruals divided by average total assets, and $CF_{it-1}/TA_{it-1}$ denotes operating cash
flows divided by average total assets.
The accrual component of operating income is measured as ACCit = (∆CAit - ∆Cashit) -
(∆CLit - ∆STDit - ∆TPit) – Depit, where ∆CAit is the change in current assets; ∆Cashit is the
change in cash and cash equivalents; ∆CLit is the change in current liabilities; ∆STDit is the
change in debt included in current liabilities; ∆TPit is the change in income taxes payable; and
Depit is the depreciation and amortization expense. The cash flow component of earnings (CFit)
is measured as the difference between operating income and the accrual component of earnings.
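The accrual and cash flow definitions above translate directly into code. The sketch below uses made-up numbers for a single firm-year purely to show the arithmetic; the function and argument names are hypothetical.

```python
def accrual_component(d_ca, d_cash, d_cl, d_std, d_tp, dep):
    """ACC = (dCA - dCash) - (dCL - dSTD - dTP) - Dep."""
    return (d_ca - d_cash) - (d_cl - d_std - d_tp) - dep


def cash_flow_component(operating_income, accruals):
    """CF = operating income minus the accrual component."""
    return operating_income - accruals


# Illustrative inputs (not real data), all in the same currency units
acc = accrual_component(d_ca=120.0, d_cash=20.0, d_cl=60.0, d_std=10.0, d_tp=5.0, dep=30.0)
cf = cash_flow_component(operating_income=150.0, accruals=acc)

# Both components are scaled by average total assets before entering the regression
average_total_assets = 1000.0
acc_scaled = acc / average_total_assets
cf_scaled = cf / average_total_assets
print(acc, cf, acc_scaled, cf_scaled)
```

By construction the two scaled components sum to scaled operating income, so the regression allows one earnings figure to carry two different persistence coefficients.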
We collect a sample that includes all firm-year observations with the necessary
accounting and stock return data available on Compustat and the CRSP monthly file between 1962
and 1991. This is the sample analyzed by Sloan (1996). We also collect data for an extended
sample for 1963-2013. We only sample US firms that trade on NYSE, AMEX or NASDAQ. As
before, we eliminate the top and bottom 0.5% of observations. Table 4, Panel B, contains
descriptive statistics for the main variables for the extended period 1963-2013. Median
operating income over average total assets is 0.13. This figure is made up of a median accrual
component of 0.01 and a median cash flow component of 0.12.
Table 6 presents results of estimating the differential persistence regressions for the full
period 1963-2013 and for the sub-period 1962-1991. For the sub-period, using a pooled
regression without fixed-effects, the coefficient on the accrual component ($\beta_0$) is 0.621
(t-statistic = 141.29), lower, as expected, than the coefficient on the cash flow component ($\beta_1$),
which is 0.732 (t-statistic = 207.87). These coefficients are somewhat lower than those reported
by Sloan (1996) in Table 3 ($\beta_0$ = 0.765 and $\beta_1$ = 0.855). However, similar to Sloan (1996), our
results also show that the accrual component of earnings is less persistent than the cash flow
component (0.62 vs. 0.73) and that the difference between the two coefficients is significant at
the 0.01 level.
Adding firm and year fixed-effects reduces the persistence coefficients considerably.
The coefficient on the accrual component ($\beta_0$) is 0.468 (t-statistic = 128.33) and the coefficient
on the cash flow component ($\beta_1$) is 0.538 (t-statistic = 197.53) for the 1963-2013 period. The
corresponding coefficients for the sub-period are even lower (0.377 and 0.453, respectively). When
firm fixed-effects are replaced with industry fixed-effects, the persistence coefficients increase
to 0.680 and 0.789, respectively, for the entire sample period, and in the sub-period these
coefficients are 0.606 and 0.713, respectively. That is, the coefficients without fixed-effects and
with industry and time fixed-effects are virtually identical. Using the Fama-MacBeth (1973)
methodology has a minor effect on the persistence coefficients and these coefficients are of a
similar magnitude as without any fixed-effects.
Using means of the dependent and independent variables (MYX) yields very high
persistence measures. For the entire sample period the persistence measures of accruals and cash
flows are 0.816 and 0.988, respectively. Interestingly, for the sub-period, these measures
(0.715 and 0.888, respectively) are very similar to those reported by Sloan (1996). Finally, demeaning
the dependent variable (MY) reduces both persistence coefficients to the lowest magnitude
reported in this table. Moreover, for the sub-period 1962-1991 accruals are not less persistent
than cash flows; clearly, this last specification yields downward-biased coefficients, as shown
in our simulations.12
To summarize, with firm fixed-effects the magnitude of the coefficients in the Sloan
(1996) model is smaller than originally reported. This suggests that both accruals and cash flows
are only moderately persistent, although accruals are still found to be less persistent than cash
flows.
(Table 6 about here)
12 In all other specifications we find that accruals are less persistent than cash flows and that the difference between the two coefficients is statistically significant at the 0.01 level.
5. Conclusion
Accounting researchers often use panel datasets that contain firm/time observations.
However, instead of controlling for firm and time fixed-effects, researchers often use industry
and time fixed-effects or none at all. When asked about the reason for avoiding firm fixed-
effects, some researchers have argued that by including firm fixed-effects they "throw the baby
with the bathtub water." Our study highlights the consequences of adopting this view on
estimation results. We show analytically and empirically that omitting firm fixed-effects yields
biased and inconsistent slope estimates and hence erroneous test statistics, which in turn could
result in incorrect inferences.
We complement recent studies by Petersen (2009) and Gow et al. (2010) that address
potential problems in panel datasets due to correlation in residuals over time and thus biased
standard errors. Unlike Petersen (2009) and Gow et al. (2010), who assume that the regression
model is well-specified, we focus on cases where models are misspecified, and hence the
coefficient estimates are biased and the related standard errors are incorrect. Specifically, we
show how incorrect inferences stem not only from the test statistic's denominator, i.e., the
standard error (e.g., Petersen, 2009; Gow et al., 2010), but also from the test statistic's numerator,
i.e., the coefficient estimates.
Our study is the first that focuses on the potential bias in estimated coefficients due to
omitting firm fixed-effects when panel datasets are used. Our survey of the common panel
dataset regression specifications used in the accounting literature illustrates a clear preference for
the use of industry rather than firm fixed-effects. We find that the inclusion of industry fixed-
effects does not eliminate the bias and could lead to markedly incorrect inferences. This is due
to potential within-industry variations that are ignored or wrongly assumed to be immaterial by
researchers. Our results show that the bias in coefficients is negatively related to the number of
industries controlled for. This provides further support for within-industry variations and for the
need to use firm fixed-effects. We further test and show that other commonly used methods, such
as differencing the dependent variable, or demeaning it, and using the Fama-MacBeth (1973)
procedure, yield biased slope coefficients, unless the fixed-effects are uncorrelated with the
independent variable. In addition to the firm fixed-effects model, using first differences for both
the dependent and independent variables yields unbiased estimates, but these are less efficient.
With the aim of providing guidance for empirical accounting researchers, we conclude
that the commonly used methods addressing the potential limitations in inferences of panel
datasets do not eliminate the correlated omitted variable problem, with the exception of firm and
time fixed-effects. This is due to the fact that with archival data, the exact form of the data
generation process is unknown to researchers. Our replications of two widely recognized
regression models in the accounting literature show that regression results are sensitive to the
method used. Without knowing all the underlying mechanisms, researchers should check for
substantial differences between a full fixed-effects specification and a simple pooled regression.
A substantial difference may highlight the need to use a full fixed-effects specification.
References
Aghion, P., Y. Algan, P. Cahuc, and A. Shleifer (2010). Regulation and distrust. The Quarterly Journal of Economics, vol. 125, no. 3, pp. 1015-1049.
Ball, R., S. P. Kothari, and V. Nikolaev (2013). On estimating conditional conservatism. The Accounting Review, vol. 88, no. 3, pp. 755-787.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th edition). Chichester: Wiley.
Basu, S. (1997). The conservatism principle and the asymmetric timeliness of earnings. Journal of Accounting and Economics, vol. 24, no. 1, pp. 3-37.
Easton, P. D., and T. S. Harris (1991). Earnings as an explanatory variable for returns. Journal of Accounting Research, vol. 29, no. 1, pp. 19-36.
Fama, E., and J. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, vol. 81, pp. 607-636.
Gow, I. D., G. Ormazabal, and D. J. Taylor (2010). Correcting for cross-sectional and time-series dependence in accounting research. The Accounting Review, vol. 85, no. 2, pp. 483-512.
Greene, W. H. (2003). Econometric Analysis (5th edition). Upper Saddle River, NJ: Pearson Education.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, vol. 46, no. 6, pp. 1251-1271.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, vol. 46, no. 1, pp. 69-85.
Patatoukas, P., and J. Thomas (2011). More evidence of bias in differential timeliness estimates of conditional conservatism. The Accounting Review, vol. 86, no. 5, pp. 1765-1793.
Patatoukas, P. N., and J. K. Thomas (2015). Placebo tests of conditional conservatism. The Accounting Review, forthcoming.
Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, vol. 22, no. 1, pp. 435-480.
Sloan, R. G. (1996). Do stock prices fully reflect information in accruals and cash flows about future earnings? The Accounting Review, vol. 71, no. 3, pp. 289-315.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Table 1 A Survey of Research Methodologies in Accounting Journals (2006-2013)

Panel A. Accounting Journals’ Review Statistics
Journal   Empirical   Experiment   Theory   Essay   Survey   Case Study   Other   Total
CAR             179           42       25      47       10            3      18     324
EAR              91            5       20      68       18            6      16     224
JAE             212            1       23      44        2            0       1     283
JAR             188           22       39      45        4            0       1     299
RAST            153            6       24      46        0            0       0     229
TAR             328           75       37      17       21            2       3     483
Total          1151          151      168     267       55           11      39    1842

Panel B. Accounting Journals’ Regressions Specification and Treatments
Journal   Pooled   Annual   Portfolio   Time   Industry   Firm   Country
CAR          150        3           3     83         71     13        11
EAR           62       15           2     26         27      9         6
JAE          179        9          20     86         76     27        11
JAR          144       10          19     79         66     23        11
RAST         116       10          15     61         46      8         2
TAR          282       28          17    128        120     34        17
Total        933       75          76    463        406    114        58

Panel C. Firm Fixed-Effect per year
Year    2006   2007   2008   2009   2010   2011   2012   2013
Total      8      5      7     12      7     22     26     27

Notes:
1. Journal abbreviations are: Contemporary Accounting Research (CAR), European Accounting Review (EAR), Journal of Accounting and Economics (JAE), Journal of Accounting Research (JAR), Review of Accounting Studies (RAST), and The Accounting Review (TAR).
2. Column definitions in Panel A are:
- Empirical – Studies that use archival data to support a theory or derive a conclusion.
- Experiment – Studies that carry out experiments with the goal of verifying, falsifying, or validating a hypothesis.
- Theory – Studies that operate within a theoretically defined framework and use mathematical derivations to illustrate and verify hypotheses.
- Essay – Studies that do not test any hypothesis but merely discuss concepts within the accounting discipline. These studies are often discussions of other papers or editors' comments on specific topics.
- Survey – Studies that gather and collect data by sending surveys to subjects.
- Case Study – Qualitative studies that study specific subjects in depth.
- Other – Interviews, descriptive studies, and studies on methodology.
3. Column definitions in Panel B are:
- Pooled – Studies that use pooled cross-section and time-series regressions. Many empirical studies use pooled regressions as a first step before improving the specification.
- Annual – Studies that use indicator variables for specific years or periods (for instance, pre-SOX).
- Portfolio – Studies that use portfolio analysis.
- Time – Studies that include time fixed-effects in pooled regressions.
- Industry – Studies that use industry fixed-effects in pooled regressions.
- Firm – Studies that include firm fixed-effects in the pooled regressions.
- Country – Studies that include country dummies. This could be done for specific countries but also for all countries in the sample.
Table 2 Simulation Results
1 2 3 4 5 6 7 8
FE IE MS FD LY MYX MY FM β = 1 ρ = 0.5 Slope (b) 1.00 1.247 1.250 1.00 0.50 1.45 0.45 1.25 t-stat (b = 1) 0.01 22.8 16.5 0.01 -26.0 10.1 -46.5 35.8 Standard error 0.012 0.011 0.015 0.016 0.019 0.045 0.012 0.008 Adjusted R2 0.85 0.73 0.46 0.35 0.09 0.56 0.15 0.53 β = 1 ρ = 0.25 Slope (b) 1.00 1.12 1.12 1.00 0.50 1.23 0.45 1.12 t-stat (b = 1) 0.01 11.12 8.12 0.01 -26.00 4.82 -46.40 17.8 Standard error 0.012 0.011 0.015 0.016 0.019 0.047 0.012 0.008 Adjusted R2 0.84 0.69 0.40 0.35 0.09 0.46 0.15 0.47 β = 1 ρ = 0 Slope (b) 1.00 1.00 1.00 1.00 0.50 1.00 0.45 1.00 t-stat (b = 1) -0.02 -0.01 -0.01 -0.03 -26.03 0.01 -46.49 -0.02 Standard error 0.012 0.011 0.015 0.016 0.019 0.048 0.012 0.008 Adjusted R2 0.83 0.66 0.34 0.35 0.09 0.35 0.15 0.40 β = 1 ρ = -0.25 Slope (b) 1.00 0.87 0.87 1.00 0.50 0.77 0.45 0.87 t-stat (b = 1) -0.01 -11.13 -8.14 -0.03 -25.93 -4.82 -46.36 -17.66 Standard error 0.012 0.011 0.015 0.016 0.019 0.047 0.012 0.008 Adjusted R2 0.81 0.63 0.29 0.34 0.09 0.25 0.15 0.35 β = 1 ρ = -0.50 Slope (b) 1.00 0.75 0.75 1.00 0.50 0.55 0.45 0.75 t-stat (b = 1) 0.013 -22.78 -16.48 0.01 -25.95 -10.1 -46.37 -35.90 Standard error 0.012 0.011 0.015 0.016 0.019 0.045 0.012 0.008 Adjusted R2 0.79 0.61 0.23 0.34 0.09 0.15 0.15 0.29
Notes: We simulate a panel of 8,000 observations made up of 10 periods, 20 industries and 40 firms per industry. The correlations between the fixed-effects and the independent variable are 0.5, 0.25, 0, -0.25, and -0.50, respectively. The estimation methods are:
(1) FE – Firm and time fixed-effects.
(2) IE – Industry and time fixed-effects.
(3) MS – Misspecified model (no fixed-effects).
(4) FD – First differences of both the dependent and independent variables.
(5) LY – First differences of the dependent variable only.
(6) MYX – Using means of both the dependent and independent variables (subtracting the firm mean from both the dependent and independent variable).
(7) MY – Demeaning the dependent variable (subtracting the firm mean from the dependent variable).
(8) FM – Fama-MacBeth (1973) applied to annual OLS regressions.
The table reports the means of the estimated slope coefficient (b), the t-statistics, standard errors and R2s across the 8,000 simulations of 8,000 observations in each round.
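The comparison in Table 2 can be illustrated with a small simulation. The sketch below is not the authors' code: the data-generating process (a firm effect `f` that enters the regressor with a hypothetical loading `load`), the function names, and the sample of 800 firms over 10 periods are assumptions made for illustration. It omits time effects and covers only the MS, FE, FD, LY and MY estimators, so its numbers are not expected to match Table 2. It only shows the qualitative pattern: FE and FD recover the true slope, while MS, LY and MY do not.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, load=0.5, seed=0):
    """Assumed DGP: y_it = beta*x_it + f_i + e_it, with x_it = load*f_i + u_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        f = rng.gauss(0, 1)  # firm fixed effect, correlated with x via `load`
        xs = [load * f + rng.gauss(0, 1) for _ in range(n_periods)]
        ys = [beta * x + f + rng.gauss(0, 1) for x in xs]
        panel.append((xs, ys))
    return panel


def demean(series):
    """Subtract the firm's own time-series mean."""
    m = fmean(series)
    return [v - m for v in series]


def diff(series):
    """First differences within a firm."""
    return [b - a for a, b in zip(series, series[1:])]


def estimate(panel):
    """Transform each firm's series, pool the firms, and return the slopes."""
    def pooled(fx, fy):
        x = [v for xs, _ in panel for v in fx(xs)]
        y = [v for _, ys in panel for v in fy(ys)]
        return ols_slope(x, y)

    keep = lambda s: list(s)
    drop_first = lambda s: s[1:]  # align levels of x with differenced y
    return {
        "MS": pooled(keep, keep),        # no fixed-effects
        "FE": pooled(demean, demean),    # firm-demean y and x
        "FD": pooled(diff, diff),        # difference y and x
        "LY": pooled(drop_first, diff),  # difference y only
        "MY": pooled(keep, demean),      # demean y only
    }


if __name__ == "__main__":
    for name, slope in estimate(simulate_panel()).items():
        print(f"{name}: {slope:.3f}")
```

Under this assumed process the misspecified slope is biased upward because the omitted firm effect is positively correlated with the regressor, and transforming only the dependent variable (LY, MY) attenuates the slope.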
Figure 1 Plots of the Distributions of Estimated Coefficients under Different Model Specifications and Correlation Parameters
Figure 1a: β = 1; correlation = 0.5. [Plot of the distributions of β; horizontal axis from 0.38 to 1.61; vertical axis: number of observations, 0 to 3,500; series: FE, IE, MS, FD, LY, MY, FM, MYX.]
Figure 1b: β = 1; correlation = 0. [Plot of the distributions of β; horizontal axis from 0.39 to 1.17; vertical axis: number of observations, 0 to 2,500; series: FE, IE, MS, FD, LY, MY, FM, MYX.]
Figure 1c: β = 1; correlation = -0.5. [Plot of the distributions of β; horizontal axis from 0.40 to 1.09; vertical axis: number of observations, 0 to 2,000; series: FE, IE, MS, FD, LY, MY, FM, MYX.]
Table 3 Varying the Number of Industries and Number of Member Firms
                            |------------- Industry Fixed-effects -------------|
β = 1, ρ = 0.5    Full FE   IE 10/80  IE 20/40  IE 40/20  IE 160/5  IE 400/2  MS
Slope (b)         1.00      1.249     1.248     1.24      1.22      1.17      1.25
t-stat (b = 1)    0.02      16.7      16.6      16.2      14.6      10.4      16.6
Standard error    0.012     0.015     0.015     0.015     0.016     0.017     0.015
Adjusted R2       0.85      0.46      0.47      0.47      0.50      0.52      0.46
Note: We simulate a panel of 8,000 observations made up of 10 periods, a varying number of industries (10, 20, up to 400) and a varying number of companies per industry (80, 40, down to 2). The correlation between the fixed-effects and the independent variable is positive (ρ = 0.5). The models are: FE – Full fixed time and firm effects; IE – Industry and time fixed-effects; and MS – Misspecified model (no fixed-effects). The table reports the means of the estimated slope coefficient (b), the t-statistics, standard errors and R2.
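The pattern in Table 3, where industry fixed-effects remove more of the bias as industries shrink toward two member firms, can be sketched as follows. This is an illustrative simulation under assumed conditions rather than the authors' design: the firm effect `f`, its hypothetical loading `load` on the regressor, the absence of time effects, and the function name `industry_fe_slope` are all assumptions.

```python
import random
from statistics import fmean


def industry_fe_slope(n_industries, firms_per_industry, n_periods=10,
                      beta=1.0, load=0.5, seed=0):
    """Slope of y on x after demeaning both by industry (industry dummies).

    Assumed DGP: y_it = beta*x_it + f_i + e_it, with x_it = load*f_i + u_it,
    where f_i is a firm effect that industry dummies absorb only in part.
    """
    rng = random.Random(seed)
    x_dm, y_dm = [], []
    for _ in range(n_industries):
        xs, ys = [], []
        for _ in range(firms_per_industry):
            f = rng.gauss(0, 1)  # firm effect
            for _ in range(n_periods):
                x = load * f + rng.gauss(0, 1)
                xs.append(x)
                ys.append(beta * x + f + rng.gauss(0, 1))
        # Industry dummies are equivalent to subtracting industry means.
        mx, my = fmean(xs), fmean(ys)
        x_dm.extend(v - mx for v in xs)
        y_dm.extend(v - my for v in ys)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


if __name__ == "__main__":
    for g, m in [(10, 80), (20, 40), (40, 20), (160, 5), (400, 2)]:
        print(f"IE {g}/{m}: slope = {industry_fe_slope(g, m):.3f}")
```

With many firms per industry the industry mean absorbs almost none of the firm effect, so the slope stays close to the misspecified estimate. With two firms per industry the industry mean absorbs half of each firm's effect, and the bias falls but does not vanish.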
Figure 2 Sensitivity to Increasing the Cross-time Variance of the Regressors
Note: We simulate a panel of 8,000 observations made up of 10 periods, a varying number of industries and a varying number of companies per industry. The true slope coefficient is equal to one and the correlation between the fixed-effects and the regressors is 0.5. We vary the variance of the time component in X from 2 (then 1.75, 1.50, and so on) down to 0.25.
[Plot; title: β = 1; correlation = 0.5. Horizontal axis: time variance of Xs (2, 1.75, 1.5, 1.25, 1, 0.75, 0.5, 0.25). Vertical axis: MS-FE (-0.10 to 0.50). Series: bias in MS.]
Table 4 Variable Definitions and Summary Statistics
Panel A: The Asymmetric Timeliness of Earnings (Basu, 1997)
The model is: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$, where $X_{it}/P_{it-1}$ denotes earnings per share divided by lagged share price; $R_{it}$ denotes raw returns; $R_{it} - R_{mt}$ denotes market adjusted returns; $R^{a}_{it}$ denotes size and book-to-market adjusted return; and $D(R_{it}<0)$ is an indicator variable that takes the value of “1” if $R_{it}$ is negative, and “0” otherwise.
Summary statistics 1963-2013; 105,179 obs.
Variable              P5       P25      P50      P75      P95      MEAN     STD
$X_{it}/P_{it-1}$     -0.17    0.02     0.06     0.10     0.21     0.05     0.19
$R_{it}$              -0.50    -0.14    0.09     0.38     1.11     0.17     0.53
$D(R_{it}<0)$         -0.56    -0.23    -0.02    0.23     0.91     0.05     0.49
Panel B: Accruals and Cash flows as predictors of earnings (Sloan, 1996)
The model is: $OI_{it+1}/TA_{it+1} = \alpha_0 + \beta_0\, ACC_{it}/TA_{it} + \beta_1\, CF_{it}/TA_{it} + \varepsilon_{it+1}$, where $OI_{it}/TA_{it}$ denotes operating income divided by average total assets; $ACC_{it}/TA_{it}$ denotes the accrual component of earnings divided by average total assets; and $CF_{it}/TA_{it}$ denotes operating cash flows divided by average total assets.
Summary statistics 1963-2013; 104,898 obs.
Variable              P5       P25      P50      P75      P95      MEAN     STD
$OI_{it}/TA_{it}$     -0.14    0.08     0.13     0.20     0.31     0.12     0.15
$ACC_{it}/TA_{it}$    -0.11    -0.02    0.01     0.05     0.15     0.01     0.09
$CF_{it}/TA_{it}$     -0.17    0.05     0.12     0.19     0.31     0.11     0.15
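The dummy-interaction structure of the Basu (1997) model in Panel A can be made concrete with a short sketch. It relies on a standard property of fully interacted regressions: the OLS coefficients equal those obtained by fitting separate lines to the negative-return and non-negative-return subsamples, so $\alpha_1$ and $\beta_1$ are the differences in intercepts and slopes. The function names and the toy data in the usage example are hypothetical, and the sketch ignores fixed-effects and standard errors.

```python
from statistics import fmean


def ols_line(x, y):
    """Return (intercept, slope) from a univariate OLS fit of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def basu_coefficients(returns, earnings_yield):
    """Point estimates of alpha0, alpha1, beta0, beta1 via split samples.

    returns        : R_it
    earnings_yield : X_it / P_it-1
    """
    pairs = list(zip(returns, earnings_yield))
    good = [(r, e) for r, e in pairs if r >= 0]  # D(R < 0) = 0
    bad = [(r, e) for r, e in pairs if r < 0]    # D(R < 0) = 1
    a_good, b_good = ols_line(*zip(*good))
    a_bad, b_bad = ols_line(*zip(*bad))
    return {
        "alpha0": a_good,
        "alpha1": a_bad - a_good,  # intercept shift for negative returns
        "beta0": b_good,
        "beta1": b_bad - b_good,   # incremental slope for negative returns
    }


if __name__ == "__main__":
    # Toy data built from made-up coefficients, for illustration only.
    r = [-0.4, -0.2, -0.1, 0.1, 0.3, 0.5]
    e = [0.12 + 0.25 * v if v < 0 else 0.10 + 0.05 * v for v in r]
    print(basu_coefficients(r, e))
```

A positive `beta1` corresponds to earnings responding more strongly to negative returns than to positive returns, which is the asymmetric timeliness that Table 5 re-estimates under alternative fixed-effects specifications.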
Table 5 Alternative Estimation Methods - The Asymmetric Timeliness of Earnings (Basu, 1997)
                       α0        α1        β0        β1        Adj-R2    Observ.
Basu (1997) as reported
  Table 1, Panel A
    coefficient        0.090     0.002     0.059     0.216     0.10      43,321
    t-statistic        (68.03)   (0.86)    (18.34)   (20.66)
  Table 1, Panel B
    coefficient        0.030     0.014     0.047     0.256     0.12      43,321
    t-statistic        (22.62)   (6.07)    (11.03)   (27.14)
  Table 1, Panel C
    coefficient        0.086     -0.005    0.075     0.166     0.12      43,118
    t-statistic        (64.1)    (1.96)    (21.3)    (16.5)
Replication with Raw Returns
Pooled, no fixed-effects (MS)
  1963-2013
    coefficient        0.074     -0.009    -0.006    0.256     0.05      114,175
    t-statistic        (76.00)   (-5.22)   (-3.93)   (52.02)
  1963-1990
    coefficient        0.094     0.000     0.064     0.198     0.09      42,546
    t-statistic        (63.38)   (0.14)    (24.81)   (24.12)
Fixed Firm and Year Effects (FE)
  1963-2013
    coefficient        0.060     -0.012    0.016     0.145     0.19      114,175
    t-statistic        (12.27)   (-7.10)   (10.63)   (28.45)
  1963-1990
    coefficient        0.085     -0.004    0.074     0.091     0.24      42,546
    t-statistic        (10.37)   (-1.73)   (28.04)   (10.88)
Fixed Industry and Year Effects (IE)
  1963-2013
    coefficient        0.069     -0.013    0.011     0.223     0.12      114,175
    t-statistic        (14.04)   (-7.29)   (7.28)    (44.90)
  1963-1990
    coefficient        0.093     -0.006    0.071     0.163     0.16      42,546
    t-statistic        (12.30)   (-2.54)   (26.40)   (21.72)
Fama-MacBeth (FM)
  1963-2013
    coefficient        0.069     -0.005    0.023     0.26      0.10      114,175
    t-statistic        (14.46)   (-1.49)   (3.63)    (14.51)
  1963-1990
    coefficient        0.089     -0.004    0.049     0.211     0.13      42,546
    t-statistic        (12.65)   (-0.95)   (5.34)    (9.26)
Means of Dependent and Independent Variables (MYX)
  1963-2013
    coefficient        0.017     -0.020    0.066     0.096     0.04      16,459
    t-statistic        (8.63)    (-7.62)   (10.84)   (10.43)
  1963-1990
    coefficient        0.053     -0.014    0.148     0.097     0.15      7,820
    t-statistic        (24.54)   (-4.82)   (20.76)   (9.03)
Demeaning the Dependent Variable (MY)
  1963-2013
    coefficient        0.008     -0.008    0.020     0.082     0.02      114,175
    t-statistic        (9.10)    (-5.39)   (15.61)   (18.96)
  1963-1990
    coefficient        -0.006    -0.002    0.069     0.048     0.06      42,546
    t-statistic        (-4.61)   (-0.75)   (30.11)   (6.65)
Notes: The table reports results for estimating Basu’s (1997) asymmetric timeliness of earnings model: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$. We report results for two periods: 1963-2013, and 1963-1990 (the sample period used in Basu (1997)). We present results for the following specifications: (i) as reported in Basu (1997); (ii) replication without any fixed-effects (MS); (iii) replication with firm and year fixed-effects (FE); (iv) replication with industry and year fixed-effects (IE); (v) replication using the Fama and MacBeth (1973) methodology (average coefficients and corresponding standard errors obtained from annual cross-sectional regressions, FM); and (vi) a replication where the dependent variable is mean-adjusted (the mean is calculated for each cross-sectional unit by averaging observations over time, MY). See Table 4 for variable definitions.
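The Fama and MacBeth (1973) procedure described in the notes (annual cross-sectional regressions, then averages of the annual coefficients with standard errors taken from their time-series variation) can be sketched as follows. For brevity the sketch uses a single regressor rather than the four-coefficient Basu model, and the function names and the shape of `annual_samples` are assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean, stdev


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def fama_macbeth(annual_samples):
    """Average annual cross-sectional slopes.

    annual_samples maps each year to a pair (x_values, y_values).
    Returns (mean slope, standard error, t-statistic), where the standard
    error is the sample standard deviation of the annual slopes divided by
    the square root of the number of years.
    """
    slopes = [ols_slope(x, y) for _, (x, y) in sorted(annual_samples.items())]
    mean_slope = fmean(slopes)
    std_err = stdev(slopes) / sqrt(len(slopes))
    return mean_slope, std_err, mean_slope / std_err


if __name__ == "__main__":
    # Three toy cross-sections whose slopes are 1, 2 and 3.
    toy = {
        2001: ([0, 1, 2], [0, 1, 2]),
        2002: ([0, 1, 2], [0, 2, 4]),
        2003: ([0, 1, 2], [0, 3, 6]),
    }
    print(fama_macbeth(toy))
```

Because each annual regression pools firms without firm dummies, this procedure does not by itself remove firm fixed-effects.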
Table 6 Alternative Estimation Methods - Accruals and Cash flows as Predictors of Earnings (Sloan, 1996)
                       α0        β0        β1        Adj-R2    Observ.
Sloan (1996) as reported
  Table 3, Panel A, pooled
    coefficient        0.011     0.765     0.855     ?         40,679
    t-statistic        (24.05)   (186.53)  (304.56)
  Table 3, Panel B, pooled, decile ranking
    coefficient        -2.216    0.565     0.838     ?         40,679
    t-statistic        (-55.86)  (141.02)  (209.43)
Replication
Pooled, no fixed-effects (MS)
  1963-2013
    coefficient        0.023     0.697     0.811     0.64      104,898
    t-statistic        (64.52)   (206.02)  (426.34)
  1962-1991
    coefficient        0.042     0.621     0.732     0.50      43,978
    t-statistic        (66.18)   (141.29)  (207.87)
Fixed Firm and Year Effects (FE)
  1963-2013
    coefficient        0.055     0.468     0.538     0.69      104,898
    t-statistic        (28.97)   (128.33)  (197.53)
  1962-1991
    coefficient        0.084     0.377     0.453     0.57      43,978
    t-statistic        (23.83)   (75.15)   (97.52)
Fixed Industry and Year Effects (IE)
  1963-2013
    coefficient        0.026     0.680     0.789     0.65      104,898
    t-statistic        (13.41)   (197.43)  (397.55)
  1962-1991
    coefficient        0.044     0.606     0.713     0.51      43,978
    t-statistic        (12.63)   (135.63)  (197.83)
Fama-MacBeth (FM)
  1963-2013
    coefficient        0.032     0.643     0.753     0.55      104,898
    t-statistic        (12.89)   (28.20)   (29.86)
  1962-1991
    coefficient        0.043     0.588     0.700     0.47      43,978
    t-statistic        (22.40)   (22.67)   (18.71)
Means of Dependent and Independent Variables (MYX)
  1963-2013
    coefficient        -0.001    0.816     0.988     0.11      8,109
    t-statistic        (-1.34)   (64.96)   (218.92)
  1962-1991
    coefficient        0.016     0.715     0.888     0.11      4,243
    t-statistic        (10.22)   (44.08)   (89.08)
Demeaning the Dependent Variable (MY)
  1963-2013
    coefficient        -0.024    0.215     0.202     0.109     104,898
    t-statistic        (-71.06)  (65.46)   (109.21)
  1962-1991
    coefficient        -0.036    0.211     0.234     0.107     43,978
    t-statistic        (-60.59)  (51.47)   (71.27)
Notes: The table reports results for the following specifications: (i) as reported in Sloan (1996, Table 3); (ii) replication without any fixed-effects; (iii) replication with firm and year fixed-effects; (iv) replication with industry and year fixed-effects; and (v) replication using the Fama and MacBeth (1973) methodology (average coefficients and corresponding standard errors obtained from annual cross-sectional regressions). We also report results for two sample periods: 1963-2013, and 1962-1991. The model is:
$OI_{it+1}/TA_{it+1} = \alpha_0 + \beta_0\, ACC_{it}/TA_{it} + \beta_1\, CF_{it}/TA_{it} + \varepsilon_{it+1}$.
See Table 4 for variable definitions.