
Working Paper Series

Faculty of Finance

No. 11

Fixed-effects in Empirical Accounting Research

Eli Amir, Jose M. Carabias, Jonathan Jona, Gilad Livne


Eli Amir

Tel Aviv University and City University of London

Jose M. Carabias

London School of Economics and Political Science

Jonathan Jona

University of Melbourne

Gilad Livne

University of Exeter Business School

21 July 2015

Comments welcome

Abstract

The fixed-effects specification is often used in panel datasets as a way of dealing with correlated omitted variables. A review of recent accounting publications reveals that while researchers are generally aware of the need to include fixed-effects in empirical models when using panel datasets (firm-time observations), many choose to replace firm fixed-effects with other forms of fixed-effects, mainly industry fixed-effects. We examine the consequences of using different specifications of fixed-effects and show, analytically and using simulations, that this can lead to biased estimates and wrong inferences. To illustrate the importance of properly including firm fixed-effects, we reexamine commonly used regression models in the accounting literature. We show how inferences change when fixed-effects are properly included. We call for more careful consideration of the use of the fixed-effects specification.

Address correspondence to Eli Amir at [email protected]. We would like to thank Yakov Amihud, Eti Einhorn, Joanne Horton, Fani Kalogirou and seminar participants at the University of Exeter Business School and Tel Aviv University for helpful comments.



1. Introduction

Many empirical studies analyze panel datasets, where both cross-section and time-series

observations are pooled together to obtain a larger and more powerful sample. The advantage of

panel datasets is they allow the investigation of relations of interest both in the cross section and

over time. However, analyses of panel data in accounting should carefully account for

underlying statistical properties of the cross-section data, as well as of the time series. The

challenge, however, is that such properties are not easily observed.

Petersen (2009) argues that while researchers often use panel datasets, the coefficients’

standard errors are often wrongly estimated. Also, the methods used by researchers to correct

for possible biases in standard errors vary widely, but are often wrong. To address this concern,

he proposes clustering standard errors at the firm level, time dimension or both, when

appropriate. However, he finds that clustering standard errors by both firm and time appears

unnecessary in the finance applications he considers. Gow et al. (2010) review and evaluate the

procedures commonly used in estimating standard errors in accounting research and show that

in the presence of both serial and cross-sectional dependence, existing methods often produce

misspecified test statistics.

Both Petersen (2009) and Gow et al. (2010) provide important evidence on significant

biases in standard errors and their implications for test statistics. However, both studies assume

that the models estimated by researchers are well-specified and the coefficients themselves are

not subject to the bias caused by correlated omitted variables. Both studies focus on the problem

of correlations among the error terms. We address a different problem than that addressed by

Petersen (2009) and Gow et al. (2010); namely the potential bias in the estimated coefficients


due to misspecified regression models. In particular, our study complements Petersen (2009)

and Gow et al. (2010) by investigating instead the effect of model misspecification (biased and

inconsistent parameters) and its implications for statistical inference. Similar to Petersen (2009)

and Gow et al. (2010) we highlight the concern of incorrect statistical inferences. However,

while Petersen (2009) and Gow et al. (2010) consider bias in the denominator of the test

statistic (standard errors), we consider the possible bias both in the numerator of the test

statistic (the coefficient estimate) and its denominator (the standard error).

To control for unobserved firm and time effects, researchers choose between two models.

The first one, which has been widely used by accounting researchers, is the fixed-effects model.

The maintained assumption under this model is that the unobserved firm and time effects are

correlated with the main explanatory variables. This is a reasonable assumption because

financial information reflects firm characteristics, which are often time-invariant.

An alternative model that is rarely used in accounting research, but is popular in other

disciplines, when panel datasets are used, is the random-effects model.1 In this specification the

unobserved firm and time effects are assumed uncorrelated with the explanatory variables. In

datasets with firms being the units of analysis, the random-effects method estimates an

unobserved effect that is drawn from an i.i.d. normal distribution and is independent of the

error term (Greene, 2003; Wooldridge, 2007). Consequently, the random-effects model

consumes fewer degrees of freedom relative to the fixed-effects model. When the researcher

believes that the model specification does not suffer from a correlated omitted variable problem,

then the random-effects model is the preferred specification, because it will produce unbiased

slope estimates and more efficient standard errors (Greene, 2003; Wooldridge, 2007). However,

1 Mundlak (1978) shows that the random-effects model is, in fact, a special case of the fixed-effects model.


in the presence of correlated omitted variables, which are time-invariant, the fixed-effects model

is preferred. Our study deals only with the empirical consequences of using the fixed-effects

model.

The maintained assumption under the fixed-effects model is that both dimensions (firm

and time) of the panel are correlated with the main regressors. Hence the researcher should

include both time and firm controls. Nevertheless, in many accounting studies researchers

replace firm fixed-effects with industry fixed-effects. This replacement could lead to biased and

inconsistent estimates, and hence incorrect inferences, when firm fixed-effects are correlated

with the main variables of interest.

To assess the significance of this problem in accounting research, we reviewed articles

published in six accounting journals over the period 2006-2013: Journal of Accounting and

Economics, Journal of Accounting Research, The Accounting Review, Review of Accounting

Studies, Contemporary Accounting Research, and European Accounting Review. Out of 1,842

articles, 1,152 (62.5%) can be classified as empirical studies (see Table 1, Panel A). While many

of these empirical articles use more than one empirical methodology, 933 articles (50.6%) use

some form of pooled regressions (see Table 1, Panel B). Many of these "pooled" studies use

some form of fixed-effects. The most common fixed-effects specifications used in these

regressions are time and industry. Surprisingly, only 114 articles out of the 933 (12.2%) use firm

fixed-effects, 75 of which appeared in the most recent three years. It seems that many researchers prefer using

industry instead of firm fixed-effects.

One plausible reason for researchers’ pervasive use of industry fixed-effects instead of

firm fixed-effects is the belief that industry fixed-effects are sufficient and would not lead to

incorrect inferences. That is, the researcher believes that within-industry variation of the


variable of interest is negligible. Another reason may be that one or more of the explanatory

variables are time-invariant which obviates the need to control for firm fixed-effects. A third

possible explanation is that when empirical findings are not robust to the inclusion of firm fixed-

effects the researcher may not know which set of results is more credible and opts to report

results rather than no-results.

The purpose of this study is to examine the implications of model misspecification due to

omitted unobserved fixed-effects in accounting settings. We start by analytically identifying the

reasons why failing to control for firm fixed-effects could lead to wrong inferences. These

include the correlation between the omitted fixed-effects and the included explanatory variables,

the use of incorrect degrees of freedom, and biased estimates of the included coefficients’ standard

errors. Next, we use simulations of a simple model in which the main variable of interest is

moderately correlated with the unobserved firm fixed-effect. Even with a relatively moderate

correlation, we find that replacing firm fixed-effects with industry fixed-effects could lead to

substantially wrong inferences. This is because the inclusion of industry fixed-effects does not

fully remove cross-sectional correlations between the omitted firm fixed-effects and the

independent variables. These simulations also reveal that the distribution of the underlying

coefficient of interest measured from OLS regressions with industry and time fixed-effects does

not overlap with the distribution of this coefficient when measured from OLS regressions

featuring both firm and time fixed-effects. This “disconnect” in the distributions indicates the

severity of the inference problem that could result from failing to properly control for these

fixed-effects. Specifically, omitting fixed-effects results in a much higher rate of rejection of the

null (a Type I error). However, the severity of the problem diminishes as the number of

industries increases, or the number of firms per industry decreases. In general, however,


ignoring the within-industry variation in the main explanatory variables results in biased

estimated coefficients and wrong inferences. Furthermore, we show that the bias caused by

omitting firm fixed-effects decreases as the variance of the main regressor increases. The

intuition is that firm fixed-effects become more crucial as the main regressor is more time-

invariant.

We then examine the sensitivity of commonly used capital markets-based accounting

regressions to the omission of firm fixed-effects. First, we select Basu's (1997) regression of the

asymmetric response of earnings to good and bad news. We select this regression because the

relation between earnings and stock returns may vary across firms and over time. Ball et al.

(2013) also argue that this is indeed the case because the relation between expected earnings and

expected stock returns varies cross-sectionally. Another, not mutually exclusive, explanation is

that the process of impounding economic news into earnings is firm-specific due to corporate

governance mechanisms, internal controls, or relationships with the auditor, all of which are

quite stable over time. The crucial factor, however, is that the researcher either cannot observe

the underlying mechanisms, or cannot collect the full set of data, and hence needs to consider

this in designing the research strategy. With similar motivation in mind, we also select the

predictive regressions of accruals and cash flow components of earnings as explanatory

variables for future earnings (Sloan, 1996).

In addition to the fixed-effects specifications, we also examine a number of alternative

specifications, which have been used in the literature. These include (i) first differencing of both

the dependent and independent variables (full differencing), (ii) using first differences of only

the dependent variable but not the independent variable, (iii) replacing the vectors of the

dependent and independent variables by their means and estimating a single cross-sectional


regression, (iv) demeaning only the dependent variable, and (v) using the Fama and MacBeth

(1973) estimation approach, namely estimating the coefficient in question by averaging the

coefficients and standard errors obtained from periodical cross-sectional regressions.

Of these alternative specifications, only the full differencing specification yields an

overlapping distribution with that of the correct specification (that includes both firm and time

fixed-effects). However, this specification is somewhat less efficient, as the distribution of this

alternative method is more dispersed than the distribution obtained from the correct regression

specification. This highlights that incorrect inferences under the full differencing method are

more likely than under the correct model.

Our replication of the two studies, using firm fixed-effects, reveals that the magnitude

and significance of the main coefficients of interest are quite different from what is obtained

under the original specifications. Although these replications still yield coefficients that are

reliably different from zero, the differences in magnitude suggest that the strength of the

underlying economic phenomenon could be substantially weaker. For example, our fixed-effects

estimate of Basu’s coefficient of timely loss recognition is 40-50% lower than the estimate

obtained under the Basu (1997) estimation procedure, depending on the estimation period. We

also obtain qualitatively similar results when we replicate Sloan’s (1996) study.

With respect to Basu's (1997) regression, Ball et al. (2013) also acknowledge the need to

control for fixed-effects. However, they use an alternative approach; they suggest that

demeaning the dependent variable is equivalent to using firm fixed-effects. We show

analytically that their procedure is not equivalent to using firm fixed-effects and that their

approach is not free from bias. In fact, their approach tends to understate the magnitude of the

true coefficient. Furthermore, we show empirically that the magnitude and the standard error of


the slope coefficient on negative stock returns in the Basu (1997) model are smaller using the

Ball et al. (2013) approach than when firm and time fixed-effects are included. Nevertheless, the

Basu (1997) results hold, albeit with lower magnitude and significance, which is broadly

consistent with Ball et al. (2013).2

It is important to note that including firm fixed-effects does not solve all correlated

omitted variable problems. In particular, it does not solve the problem of omitting a correlated

variable that varies across time. However, in such cases including firm and time fixed-effects

would not exacerbate the underlying problem. We therefore recommend the use of firm and

time fixed-effects because there is no harm in doing so. Including firm and time fixed-effects

when this is not necessary would nevertheless yield correct inferences, but excluding firm or

time fixed-effects would lead to incorrect inferences when they are correlated with the

explanatory variables. If including firm fixed-effects is not feasible, then the “second-best”

approach is first differencing of both the dependent variable and independent variables. While

first differencing yields unbiased coefficients, it is less efficient owing to loss of data (e.g., the

first year in the panel is lost). Either way, replacing firm with industry fixed-effects is likely to

yield biased coefficients.

Section 2 summarizes the main insights from our analytical framework. Section 3 presents

results of simulations aimed at quantifying the potential bias caused by correlated omitted

variables in panel datasets. Section 4 uses the empirical models employed by Basu (1997) and

Sloan (1996) to demonstrate the effect of correlated omitted variables on estimated coefficients.

Section 5 provides concluding remarks.

2 Patatoukas and Thomas (2015) argue that the conservatism coefficient in Ball et al. (2013) is still upward biased. Our study focuses on the role of fixed effects in regression specifications and does not address the issues raised by Patatoukas and Thomas (2015).


2. Analytical Derivations

An appendix to this study provides detailed calculations, which are the basis for the

following text. Here we only present the essential elements.3 Let D = [DF, DT] be a matrix of

indicator (dummy) variables for firm fixed-effects (DF) and time fixed-effects (DT), and the

unobserved fixed-effects be denoted by $\alpha = [\alpha_F, \alpha_T]$, where the subscripts F and T denote firm

and time fixed-effects, respectively. Then for the panel data the model becomes

$$y = X\beta + D\alpha + \varepsilon \qquad (1)$$

The variance of ε is denoted $\sigma^2$. Estimating this model will yield

$$y = Xb + Da + e \qquad (2)$$

where b and a are the coefficient estimates of β and α, respectively and e is the estimated

regression residual. The vector of estimated slope coefficients b can be expressed (using

partitioned matrix conventions) as:

$$b = [X' M_D X]^{-1}[X' M_D y] = [X^{*\prime}X^{*}]^{-1}[X^{*\prime}y^{*}] \qquad (3)$$

where $M_D = I - D[D'D]^{-1}D'$; $X^{*} = M_D X$; $y^{*} = M_D y$; $\varepsilon^{*} = M_D \varepsilon$. (4)

MD can be thought of as a particular process of demeaning the independent and dependent

variables. It is straightforward to show that b is unbiased (that is, $E[b \mid X] = \beta$). Furthermore, in

the case where the fixed effects are uncorrelated with X, employing (3) would generate an

unbiased estimate of β, which will be the same as the estimate one would obtain from regressing

y only on X.
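The role of $M_D$ as a demeaning operator can be checked numerically. The following is a minimal sketch, not part of the original analysis: the panel dimensions, the random seed, and the data-generating values are hypothetical choices made only for illustration. It compares the slope from a regression of y on X and the dummies D with the slope from equation (3), where both variables are first premultiplied by $M_D$.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 30, 6  # hypothetical panel dimensions (firms, years)
firm = np.repeat(np.arange(F), T)
year = np.tile(np.arange(T), F)

# Hypothetical data: X is correlated with the firm effect alpha_F.
alpha_F = rng.normal(size=F)
alpha_T = rng.normal(size=T)
x = 0.5 * alpha_F[firm] + rng.normal(size=F * T)
y = 1.0 * x + alpha_F[firm] + alpha_T[year] + rng.normal(size=F * T)

# D = [D_F, D_T]; one time dummy is dropped so that D has full column rank.
D_F = (firm[:, None] == np.arange(F)).astype(float)
D_T = (year[:, None] == np.arange(1, T)).astype(float)
D = np.hstack([D_F, D_T])

# Route 1: regress y on [X, D] and read off the coefficient on X.
Z = np.column_stack([x, D])
b_dummy = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# Route 2: equation (3), using M_D = I - D (D'D)^{-1} D'.
M_D = np.eye(F * T) - D @ np.linalg.solve(D.T @ D, D.T)
x_star, y_star = M_D @ x, M_D @ y
b_within = (x_star @ y_star) / (x_star @ x_star)

print(b_dummy, b_within)
```

Under these assumptions the two routes give the same slope up to floating-point error, which is the partitioned-regression result that equation (3) relies on.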

Crucially, the demeaning process captured in the matrix M is dependent on the

researcher’s choice of which fixed-effects to include. Specifically, assume that the researcher

3 The full analytical appendix is available from the authors upon request.


uses instead industry fixed-effects and time fixed-effects. Let $D^{*} = [D_H, D_T]$ and $\alpha^{*} = [\alpha_H, \alpha_T]$,

where $D_H$ stands for the matrix of industry dummies and $\alpha_H$ is the matrix of unobserved

industry fixed-effects. Then for the panel data the estimated model is

$$y = X\beta + D^{*}\alpha^{*} + \mu \qquad (5)$$

Importantly, the disturbance term μ involves firm fixed-effects that have not been removed

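by the inclusion of industry fixed-effects. Now, the estimated coefficient $b_1$ from this model can

be expressed similarly to (3):

$$\begin{aligned} b_1 &= [X' M_{D^{*}} X]^{-1}[X' M_{D^{*}} y] = [X^{**\prime}X^{**}]^{-1}[X^{**\prime}y^{**}] \\ &= \beta + [X^{**\prime}X^{**}]^{-1}X^{**\prime}M_{D^{*}}\mu = \beta + [X^{**\prime}X^{**}]^{-1}X^{**\prime}\mu^{**} \end{aligned} \qquad (6)$$

where $M_{D^{*}} = I - D^{*}[D^{*\prime}D^{*}]^{-1}D^{*\prime}$; $X^{**} = M_{D^{*}}X$; $y^{**} = M_{D^{*}}y$; $\mu^{**} = M_{D^{*}}\mu$. (7)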

Because the disturbance term still includes firm fixed-effects (since using industry fixed-

effects has not fully controlled for cross-sectional variations), it follows that

$$\begin{aligned} E[b_1 \mid X] &= \beta + [X^{**\prime}X^{**}]^{-1}X^{**\prime}E[\mu^{**} \mid X] \\ &= \beta + [X^{**\prime}X^{**}]^{-1}[X^{**\prime}D_F^{*}\alpha_F^{*}] \end{aligned} \qquad (8)$$

It follows from (8) that $b_1$ is biased. The magnitude of the bias is related to the covariance

element, $[X^{**\prime}D_F^{*}\alpha_F^{*}]$, and the scaling factor, $[X^{**\prime}X^{**}]$, which can be thought of as a measure of

the variability in the undemeaned regressor X (see Figure 2). Hence, including industry

fixed-effects instead of firm fixed-effects will affect inferences. Furthermore, t-statistics are also a

function of the estimated coefficients’ standard error. Note also that

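$$\mathrm{Var}[b \mid X] = \sigma^2 [X^{*\prime}X^{*}]^{-1} \qquad (9)$$

and

$$\mathrm{Var}[b_1 \mid X] = \sigma^2 [X^{**\prime}X^{**}]^{-1} \qquad (10)$$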


Since $\sigma^2$ is unknown to the researcher, it has to be estimated from the data. Let T denote

the number of years, F the number of firms and H the number of industries. The unbiased

estimator of $\sigma^2$ is $s^2$, whereby:

$$s^2 = \frac{e'e}{FT-(F-1)-(T-1)-K-1} = \frac{(y-Xb-Da)'(y-Xb-Da)}{FT-(F-1)-(T-1)-K-1} \qquad (11)$$

Hence, the conditional variance of the k-th coefficient $b_k$ is based on the diagonal element

kk in the matrix $[X^{*\prime}X^{*}]^{-1}$ as follows:

$$s^2_{b_k} = \mathrm{Est.Var}[b_k \mid X] = s^2\,[X^{*\prime}X^{*}]^{-1}_{kk}. \qquad (12)$$

With industry and time fixed-effects, the equivalent expressions are

$$s_1^2 = \frac{e_1'e_1}{F \times T-(H-1)-(T-1)-K-1} = \frac{(y-Xb_1-D^{*}a^{*})'(y-Xb_1-D^{*}a^{*})}{F \times T-(H-1)-(T-1)-K-1} \qquad (13)$$

and

$$s^2_{b_{1,k}} = \mathrm{Est.Var}[b_{1,k} \mid X] = s_1^2\,[X^{**\prime}X^{**}]^{-1}_{kk} \qquad (14)$$

Observation 3 in the Appendix states that a t-test based on the (misspecified) regression

coefficients $b_1$, $t_{b_{1,k}} = b_{1,k} / \sqrt{s^2_{b_{1,k}}}$, and a t-test based on the correct regression coefficients b,

$t_{b_k} = b_k / \sqrt{s^2_{b_k}}$, are identical if and only if

$$t_{b_{1,k}} = \frac{b_{1,k}\,\big([X^{**\prime}X^{**}]^{-1}_{kk}\big)^{-1/2}}{\sqrt{e_1'e_1 / (F \times T-(H-1)-(T-1)-K-1)}} = \frac{b_{k}\,\big([X^{*\prime}X^{*}]^{-1}_{kk}\big)^{-1/2}}{\sqrt{e'e / (F \times T-(F-1)-(T-1)-K-1)}} = t_{b_k} \qquad (15)$$


Otherwise, the sign and significance of the t-test of b1 would be different than that of b.

Note that with non-zero correlation between the firm fixed-effects and the independent variables

the two expressions for the t-statistics differ along four dimensions. The first is the difference between the

point estimates b and b1. The second difference relates to the specific kk-th element in the square

bracket in the numerator. The third is the two sums of the regression squared residuals. The

fourth and last is the difference in the degrees of freedom. With respect to the latter, note that as

H approaches F the difference in the degrees of freedom becomes smaller in magnitude. In the

extreme, when H = F the two t-statistics will be identical, as firm and industry fixed-effects

coincide.
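The degrees-of-freedom difference can be made concrete with a small worked computation. The sketch below is illustrative only: the function name `residual_df` is hypothetical, the panel dimensions are those used in the simulations of Section 3 (800 firms, 10 periods, 20 industries), and a single regressor (K = 1) is assumed.

```python
def residual_df(n_firms, n_periods, n_cross_groups, n_regressors):
    """Residual degrees of freedom, FT - (G - 1) - (T - 1) - K - 1,
    where G is the number of cross-sectional dummies (firms or industries)."""
    n_obs = n_firms * n_periods
    return n_obs - (n_cross_groups - 1) - (n_periods - 1) - n_regressors - 1


F, T, H, K = 800, 10, 20, 1  # K = 1 is an assumption of this sketch

df_firm_fe = residual_df(F, T, F, K)      # firm and time fixed-effects
df_industry_fe = residual_df(F, T, H, K)  # industry and time fixed-effects
print(df_firm_fe, df_industry_fe)  # 7190 7970
```

With these dimensions the industry specification uses 780 more degrees of freedom than the firm specification, and the gap closes as H approaches F.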

The Ball et al. (2013) specification

Ball et al. (2013) argue that the Basu (1997) model suffers from a correlated omitted

variable problem. They suggest the problem could be solved by using a fixed-effects

specification. However, due to computational infeasibility, they suggest an alternative approach to

the standard fixed-effects specification in which the dependent variable, earnings, is adjusted for

average earnings (where averages are taken at a firm level over time). Demeaning only the

dependent variable is not identical to including firm and time fixed-effects.4 It is important to

note that Ball et al. (2013) employ several measures of returns. The important specification for

us appears in Table 5, and for this table they use size and book-to-market adjusted returns.

Importantly these returns are not zero mean. This would lead to biased coefficients.

4 Ball et al. (2013) compute unexpected returns by subtracting from firm-specific returns the market return (or the return on the corresponding size and book-to-market portfolios). If the average unexpected return is zero, the full fixed-effects model and the Ball et al. (2013) approach yield unbiased estimators in the Basu (1997) framework.


To see this, let $\bar{y}_i$ denote the firm level average of $y_{it}$ (that is, $\bar{y}_i = \sum_{t=1}^{T} y_{it}/T$). If only the

dependent variable is demeaned, the model can be written as (assuming it includes time fixed-effects):

$$y^{*}_{it} = y_{it} - \bar{y}_i = \alpha_t + \beta X_{it} + u_{it} \qquad (16)$$

where $u_{it} = \varepsilon_{it} + C_{f=i} - \bar{y}_i$.

$C_{f=i}$ denotes the time-invariant firm fixed-effects that are not explicitly modelled and hence are

absorbed in the disturbance term. Using matrix algebra, we get5

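$$\begin{aligned} y^{*} &= D_T\alpha_T + X\beta + u; \\ M_T &= I - D_T[D_T'D_T]^{-1}D_T'; \\ y^{+} &= M_T y^{*} = M_T(X\beta + u) = X^{+}\beta + u^{+} \end{aligned} \qquad (17)$$

The estimated vector of coefficients is therefore

$$b^{+} = [X^{+\prime}X^{+}]^{-1}X^{+\prime}y^{+}$$

Taking expectations, and bearing in mind that the correct model is given by equation (1)

above and that $E[C_F \mid X] = D_F\alpha_F$:

$$\begin{aligned} E[b^{+} \mid X] &= E\big[[X^{+\prime}X^{+}]^{-1}X^{+\prime}y^{+} \mid X\big] \\ &= [X^{+\prime}X^{+}]^{-1}X^{+\prime}E[M_T(y - \bar{y}_i) \mid X] \\ &= [X^{+\prime}X^{+}]^{-1}X^{+\prime}\big(X^{+}\beta + M_T D_F\alpha_F - M_A X^{+}\beta - M_T D_F\alpha_F\big) \\ &= \beta - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}\beta = \big[I - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}\big]\beta \end{aligned} \qquad (18)$$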

where $M_A$ is the firm-level average-creating matrix. That is, demeaning the dependent variable

but without demeaning the right-hand side variables leads to a biased coefficient. Only in the

case where $X^{+}$ is zero-mean do we obtain unbiased coefficients, since $M_A X^{+} = 0$. Importantly, the

bias in (18) is unrelated to the correlation between the firm fixed effects and the other

5 With matrix algebra, $u = \varepsilon + C_F - \bar{y}$.


explanatory variables. Equation (18) suggests that the Ball et al. (2013) coefficients are the true

coefficients scaled down by the expression $[I - [X^{+\prime}X^{+}]^{-1}X^{+\prime}M_A X^{+}]$.

The Ball et al. (2013) estimates are therefore biased and they would differ from a full

fixed-effect model. To stress, this is because including time and firm fixed-effects works to

demean the panel variables in a different way. Under this specification all variables (dependent

and independent) are transformed as follows:

$$v^{*}_{it} = v_{it} - \bar{v}_i - \bar{v}_t + \bar{v} \qquad (19)$$

where $v^{*}_{it}$ is the demeaned variable for firm i and year t, $v_{it}$ is the original observation, $\bar{v}_i$ is the

average across all annual observations for firm i, $\bar{v}_t$ is the average of the underlying variable v in

year t across all firms, and $\bar{v}$ is the grand average of v. For the dependent variable this is

different from the Ball et al. (2013) transformation $v^{*}_{it} = v_{it} - \bar{v}_i$. This also affects inferences,

because the standard error of the t-test is derived from the residuals sum of squares, which also

differs between the two approaches. Additionally, a research design using the Ball et al.

(2013) approach is likely to incorrectly calculate the degrees of freedom. The typical statistical

software used in regression analysis will identify the transformed variable $v^{*}_{it} = v_{it} - \bar{v}_i$ as a

“single” variable and hence instead of using $FT-(F-1)-(T-1)-K-1$ degrees of freedom

would use $FT-(T-1)-K-1$ degrees of freedom.6
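The contrast between the two transformations can be illustrated with a short simulation. This is a sketch under stated assumptions rather than a replication of any result in the paper: the panel is balanced, the sample sizes and seed are hypothetical, the regressor has a persistent firm-level component, and the firm effects are deliberately generated to be uncorrelated with X so that any shortfall in the slope cannot come from correlated omitted effects. The helper names (`group_mean`, `two_way`, `slope`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, beta = 500, 10, 1.0  # hypothetical balanced panel, true slope of 1
firm = np.repeat(np.arange(F), T)
year = np.tile(np.arange(T), F)

x_firm = rng.normal(size=F)          # persistent firm-level part of X
x = x_firm[firm] + rng.normal(size=F * T)
c_firm = rng.normal(size=F)          # firm effects, uncorrelated with X here
a_year = rng.normal(size=T)
y = beta * x + c_firm[firm] + a_year[year] + rng.normal(size=F * T)


def group_mean(v, g):
    """Mean of v within each group g, broadcast back to observations."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]


def two_way(v):
    """Two-way demeaning of equation (19); exact for a balanced panel."""
    return v - group_mean(v, firm) - group_mean(v, year) + v.mean()


def slope(xv, yv):
    """OLS slope of yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc @ yc) / (xc @ xc)


# Firm and time fixed-effects: both y and X are transformed as in (19).
b_fe = slope(two_way(x), two_way(y))

# Only y is demeaned at the firm level; time effects are then removed
# from both sides by subtracting year means.
y_dm = y - group_mean(y, firm)
x_plus = x - group_mean(x, year)
b_my = slope(x_plus, y_dm - group_mean(y_dm, year))

# Scaling factor implied by equation (18) for a single regressor.
scale = 1.0 - (x_plus @ group_mean(x_plus, firm)) / (x_plus @ x_plus)
print(b_fe, b_my, scale)
```

In this setup the two-way transformation recovers a slope close to 1, while demeaning only the dependent variable gives a slope close to the scaling factor from equation (18), well below 1 even though the firm effects are uncorrelated with X.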

The Fama-MacBeth (1973) approach

According to Fama-MacBeth (1973) regression coefficients are not calculated from a

panel, but rather from periodical cross-sectional regressions. Specifically, the overall coefficient

6 Ball et al. (2013) also estimate a model in which the dependent variable (earnings) is adjusted by subtracting the lagged dependent variable (but not the independent variable). It can be shown analytically that this approach also leads to biased estimates. For brevity we do not include the proof here, but simulate this specification later.


is the average coefficient over the T annual regressions and the standard error is derived from

the distribution of the individual (periodical) coefficients. Because periodical regressions are not

tooled to accommodate fixed firm or annual effects, this method is still prone to the same

problem. Each annual underlying model can be written as:

$$y_t = X_t\beta + \eta_t \qquad (20)$$

where $X_t$ is the matrix of annual explanatory variables. However, the individual disturbance term

incorporates the fixed-effects, implying $\eta_{it} = C_{f=i} + C_t + \varepsilon_{it}$. The estimated coefficient for a

single year t therefore can be expressed as in a simple OLS setup:

$$b_t = [X_t'X_t]^{-1}[X_t'y_t] = \beta + [X_t'X_t]^{-1}[X_t'\eta_t] \qquad (21)$$

Averaging over T annual regressions yields the Fama-MacBeth coefficient:

$$b_{FM} = \frac{1}{T}\sum_{t=1}^{T} b_t = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t'X_t]^{-1}X_t'\eta_t \qquad (22)$$

The assumption of correlated fixed-effects implies that $E[C_f + C_t \mid X_t] \neq 0$. Hence we

obtain that in expectation

$$E[b_{FM} \mid X] = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t'X_t]^{-1}X_t'E[\eta_t \mid X_t]. \qquad (23)$$

In other words, the Fama-MacBeth (1973) procedure is prone to the omitted fixed-effects

problem under the assumption that the true model is as expressed in equation (1).
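A minimal sketch of the Fama-MacBeth procedure on simulated data illustrates the point. All numerical choices here are assumptions made for illustration: 800 firms, 10 years, a true slope of 1, and a correlation of 0.5 between the firm effect and a persistent firm-level component of X. The coefficient is the average of the annual cross-sectional slopes, and the standard error comes from their time-series dispersion.

```python
import numpy as np

rng = np.random.default_rng(2)
F, T, beta, rho = 800, 10, 1.0, 0.5  # hypothetical design
firm = np.repeat(np.arange(F), T)
year = np.tile(np.arange(T), F)

# Firm-level part of X and the firm effect are drawn with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x_firm, c_firm = rng.multivariate_normal([0.0, 0.0], cov, size=F).T
c_year = rng.normal(size=T)
x = x_firm[firm] + rng.normal(size=F * T)
y = beta * x + c_firm[firm] + c_year[year] + rng.normal(size=F * T)

# One cross-sectional OLS regression (with intercept) per year.
slopes = np.empty(T)
for t in range(T):
    in_year = year == t
    X_t = np.column_stack([np.ones(in_year.sum()), x[in_year]])
    slopes[t] = np.linalg.lstsq(X_t, y[in_year], rcond=None)[0][1]

# Fama-MacBeth coefficient and its standard error.
b_fm = slopes.mean()
se_fm = slopes.std(ddof=1) / np.sqrt(T)
print(b_fm, se_fm)
```

The annual intercepts absorb the time effects, but the firm effects remain in each year's disturbance, so under these assumptions the averaged slope settles well above the true value of 1.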

3. Simulations

To illustrate the potential bias in estimated slope coefficients that is caused by omitting

fixed-effects, we simulate a panel dataset according to the following specification:

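$$y_{it} = a_i + a_t + a_I + \beta X_{it} + e_{it}$$

$$X_{it} = X_i + \xi_t$$

$$(X_i, a_t, a_i, a_I) \sim N(\mathrm{M}, \Sigma), \quad \mathrm{M} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & \rho_{X_i a_t} & \rho_{X_i a_i} & \rho_{X_i a_I} \\ \rho_{X_i a_t} & 1 & 0 & 0 \\ \rho_{X_i a_i} & 0 & 1 & 0 \\ \rho_{X_i a_I} & 0 & 0 & 1 \end{pmatrix}$$

and $\xi_t, e_{it} \sim N(0,1)$. The variable $y_{it}$ is the dependent variable, $a_i$, $a_t$, $a_I$ are firm, time and

industry fixed-effects, respectively, and $X_{it}$ is the independent variable. The values of the true

parameters are $\rho_{X_i a_t}, \rho_{X_i a_i}, \rho_{X_i a_I} \in \{0.50, 0.25, 0, -0.25, -0.50\}$ and $\beta = 1$. The null hypothesis is

$\beta = 1$. Since the bias in b depends on the correlation between the omitted fixed-effects and the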

regressor X, we impose five different levels of correlations between the fixed-effects and X:

0.50, 0.25, 0, -0.25 and -0.50. We expect a positive, zero and negative bias for the positive, zero

and negative correlations, respectively, when the model omits the fixed-effects.7

Using the above specification, we simulate a panel of 8,000 observations made of 10

periods, 20 industries and 40 firms per industry. We repeat this process 8,000 times, applying

the following eight specifications:

(1) Firm and time fixed-effects (FE) – This model includes both firm and time fixed-effects.

We expect this specification to yield an unbiased slope estimate (b = 1). We label this

model as FE.

(2) Industry and time fixed-effects (IE) – Here, we include time and industry fixed-effects,

by replacing $a_i$ with $a_I$. This specification ignores within-industry variations at the firm

level and hence we expect the slope estimate to be biased. Notice that this specification

approaches the full fixed-effects model as the number of firms per industry decreases. At

7 In a single variable setting, it is easy to sign the bias; In a multivariate setting, however, the sign of the bias depends on the correlation matrix of the regressors (see the Appendix).


the extreme case where there is one firm per industry, this specification is identical to the

full fixed-effects model. To show this, we conduct sensitivity analysis where we

sequentially increase the number of industries (and reduce the number of firms per industry)

keeping the number of observations constant (see Table 3). We label this model as IE.

(3) Misspecified model (MS) – Here we omit all fixed-effects; hence, we expect the estimated

slope coefficient (β) to be biased to a greater degree than the previous model. However,

given the small number of time periods (10) relative to the number of firms (800), the bias

from omitting time effects is expected to be small. This situation is similar to that in many

studies that use archival data, as the number of firms is much larger than the number of

periods. We label this model MS.

(4) First differences model for both the dependent and independent variables (FD) – In

this model, we use first differences instead of current values (that is, current value minus

the lagged one). This leads to an unbiased estimated slope coefficient, and one could omit

the fixed-effects from the model. However, the differencing process involves loss of

information, as the first period in the panel is lost. We label this method FD.

(5) Using first differences for the dependent variable only (LY) – Here, the researcher first-

differences the dependent variable but not the independent variables. We expect this

specification to yield biased results. However, in this case the bias is induced not only by

the covariance between the independent variable and the fixed-effects, but also by the

exclusion of the variable $\beta X_{it-1}$ from the model. Hence, the bias in this case is also a function

of the true β; when β is positive the bias is negative and when β is negative the bias is

positive. We label this model LY (Lag Y).


(6) Using the time-series means of the independent and the dependent variables (MYX) –

Here we convert the panel dataset into a single cross-sectional regression by using the

means of the independent and dependent variables as the main variables in the regression

(see for instance, Aghion et al., 2010). Using time-level means implies that the error term

still includes firm fixed effects and hence the coefficient estimates are biased.

(7) Demeaning the dependent variable Y (MY) - Similar to the LY case and motivated by

Ball et al. (2013), this specification only adjusts the Ys by subtracting the firm level

averages. However, this specification is expected to yield biased estimates, as argued above.

The bias is induced not by the covariance between the independent variable and the fixed-effects,

but by the failure to demean the independent variable at the firm level. Equation (18)

suggests that the coefficient estimate under this specification is a scaled-down estimate of the

true parameter. We therefore expect it to be smaller than 1, regardless of the sign of the

correlation between the fixed-effects and independent variable. We label this model as MY.

(8) Fama-MacBeth (FM) – We also estimate the model (without fixed-effects) using the Fama-MacBeth (1973) procedure; that is, we estimate 10 period-by-period cross-sectional regressions and report the average slope coefficient. Equation (23) suggests that the FM specification yields a biased estimate when the true model includes firm and time effects. We label this model FM.
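As an illustration of specification (8), the following is a minimal sketch of the Fama-MacBeth procedure on a simulated panel. The data-generating process is an assumption made for this example only (unit-variance normal draws, a firm effect correlated with a firm-level component of X, and no time effects); it is not necessarily the process used in the simulations reported here. The names `simulate_panel`, `ols_slope` and `fama_macbeth` are hypothetical helpers.

```python
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta * x_it + a_i + e_it, with x_it = f_i + v_it.

    rho is the correlation between the firm effect a_i and the firm-level
    component f_i of the regressor. Returns (period, x, y) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_firms):
        f = rng.gauss(0, 1)
        a = rho * f + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        for t in range(n_periods):
            x = f + rng.gauss(0, 1)
            y = beta * x + a + rng.gauss(0, 1)
            rows.append((t, x, y))
    return rows


def fama_macbeth(rows):
    """Average the period-by-period cross-sectional slopes.

    The standard error is the time-series standard deviation of the
    slopes divided by the square root of the number of periods.
    """
    periods = sorted({t for t, _, _ in rows})
    slopes = []
    for t in periods:
        xs = [x for p, x, _ in rows if p == t]
        ys = [y for p, _, y in rows if p == t]
        slopes.append(ols_slope(xs, ys))
    return mean(slopes), stdev(slopes) / len(slopes) ** 0.5


fm_slope, fm_se = fama_macbeth(simulate_panel())
print(f"FM slope: {fm_slope:.3f} (s.e. {fm_se:.3f})")
```

Because each cross-sectional regression omits the firm effect, every period's slope absorbs the covariance between X and the firm effect, and averaging across periods does not remove it. Under the assumed process the average slope should therefore lie well above the true β = 1.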

For each of the eight specifications, we obtain 8,000 slope estimates. We also vary the

magnitude of the correlation between the fixed-effects and the regressor X. Table 2 reports the

means of the estimated slope coefficients, standard errors, t-statistics of the distance from the

true coefficient (β = 1), and R2s for five levels of correlations: 0.5, 0.25, 0, -0.25, and -0.5. We


also present the distribution of the estimated slope coefficients in Figure 1, using three different

correlations: Figures 1a, 1b, and 1c present the distribution for a correlation of 0.5, 0, and -0.5,

respectively.

By construction, the full fixed-effects model (FE) yields an unbiased estimate (b = 1) and

high R2s for all levels of correlation. Also, the distribution of the bs is the tightest among all

alternative distributions, as can be seen from the figures. The second model (industry and time

effects, denoted IE) yields a positive bias (b = 1.25) when the correlation between the fixed-

effects and the regressor is 0.5; zero bias when the correlation is zero and negative bias when the

correlation is -0.5 (b = 0.75). We would incorrectly reject the null hypothesis that the slope is

equal to 1 in all cases, except for the case of zero correlation between the fixed-effects and the

regressor. Also, the regression R2s are lower relative to those of the FE model.

When both firm and time effects are omitted (MS), the pattern of bias is similar to that

observed for model IE. That is, with 20 industries and 800 different firms, controlling for industry fixed-effects performs as poorly as the fully misspecified model. Moreover, from

Figures 1a-1c we note that the distributions of the slope coefficient under both the MS and IE

are completely disjoint from the distribution of b under the FE specification. This suggests that

it is very unlikely that a slope estimate from these two specifications would fall within a

conventional confidence interval obtained under the full fixed-effect model.

Using first differences for both the dependent and independent variables (FD) yields an

unbiased slope estimate (b = 1.00) for all five correlations, but the estimate is less efficient as

reflected by the larger standard errors, lower t-statistic, and lower adjusted R2 and the larger tails

seen in Figures 1a-1c. Using a first difference only for the dependent variable (LY) yields a

biased and less efficient estimate (b = 0.50; Adjusted R2 = 0.09), regardless of the correlation


between the fixed-effects and regressor X. This is because the model is not sensitive to the

correlations between the fixed-effects and the independent variable.

In model MYX, we use means of X and Y and estimate a cross-sectional regression. This

model yields a large positive (negative) bias when the correlation between the fixed effects and

the regressor is positive (negative). However, in the case of demeaning the dependent variable

only (MY), the bias is negative regardless of the correlation between the fixed effects and the

regressor. Both the LY and MY distributions lie to the left of the FE distribution, consistent with the theoretical prediction derived in the previous section that, with one explanatory variable, the estimated coefficient is smaller in magnitude than the true coefficient. Since β is positive here, both specifications yield values below one.8
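The pattern described for the FE, MS, FD, LY, MYX and MY specifications can be sketched on a single simulated panel. The data-generating process below is an assumption made for illustration (unit-variance normal components, a correlation of 0.5 between the firm effect and the firm-level component of X, no time or industry effects), and `simulate_panel` and `estimate_slopes` are hypothetical helper names.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=1):
    """Assumed DGP: y_it = beta * x_it + a_i + e_it, with x_it = f_i + v_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        f = rng.gauss(0, 1)                                    # firm part of x
        a = rho * f + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)  # firm effect
        xs = [f + rng.gauss(0, 1) for _ in range(n_periods)]
        ys = [beta * x + a + rng.gauss(0, 1) for x in xs]
        panel.append((xs, ys))
    return panel


def estimate_slopes(panel):
    """Slope estimates under six of the specifications discussed in the text."""
    x_all, y_all = [], []    # levels (MS)
    x_dm, y_dm = [], []      # firm-demeaned (FE; y_dm also used for MY)
    dx, dy, x_lev = [], [], []   # differences (FD) and x in levels (LY)
    x_bar, y_bar = [], []    # firm means (MYX)
    for xs, ys in panel:
        mx, my = mean(xs), mean(ys)
        x_all += xs
        y_all += ys
        x_dm += [x - mx for x in xs]
        y_dm += [y - my for y in ys]
        x_bar.append(mx)
        y_bar.append(my)
        for t in range(1, len(xs)):
            dx.append(xs[t] - xs[t - 1])
            dy.append(ys[t] - ys[t - 1])
            x_lev.append(xs[t])
    return {
        "FE": ols_slope(x_dm, y_dm),     # both variables demeaned by firm
        "MS": ols_slope(x_all, y_all),   # pooled OLS, no fixed-effects
        "FD": ols_slope(dx, dy),         # both variables differenced
        "LY": ols_slope(x_lev, dy),      # only y differenced
        "MYX": ols_slope(x_bar, y_bar),  # cross-section of firm means
        "MY": ols_slope(x_all, y_dm),    # only y demeaned
    }


slopes = estimate_slopes(simulate_panel())
for name, b in slopes.items():
    print(f"{name:>3}: {b:.3f}")
```

Under these assumptions the large-sample slopes are roughly 1 for FE and FD, 1.25 for MS, 0.5 for LY, 0.45 for MY and 1.45 for MYX; a single draw of 8,000 observations should land close to these values, though not exactly on them.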

Using the Fama-MacBeth (1973) specification (FM) yields results qualitatively similar to those of the misspecified model, with even larger tails of the distribution. This is seen in Figures 1a-1c

where the FM parameter distribution obscures the parameter distribution of the MS model.

Finally, when the correlation between the fixed-effects and the independent variable is zero,

omitting the fixed-effects is not expected to cause any bias. Indeed, the results show that the

slope coefficients in models IE, MS, and FM are unbiased.9

8 We also find from additional simulations (not tabulated) that when β = -1, the LY and MY distributions are within the negative value range and lie to the right of the FE and FD distributions. This, again, is consistent with a lower magnitude relative to the true value.

9 Note that the distributions of the slope coefficient depicted in the three charts of Figure 1 do not correspond to the average standard errors reported in Table 2. For example, the distribution of β under the FM specification features larger tails than that of the FE model, although the standard error for the FM model (0.008) is smaller than that of the FE model (0.012). To demonstrate this issue, assume that we run 5 simulations of 8,000 observations each and obtain the following coefficient estimates for the FM model: -2, -1, 0, 1, and 2. Also, suppose the output for the OLS regression is such that each coefficient is estimated with a standard error of 1. In contrast, for the FE model assume we obtain coefficients of -1, -0.5, 0, 0.5 and 1, but each coefficient is estimated with a standard error of 2. Then, if we were to chart these outcomes, the distribution of the FM (FE) will be wider (narrower), but the average standard error tabulated would be smaller (larger) for the FM (FE) specification.
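The hypothetical numbers in footnote 9 can be checked directly. The short sketch below contrasts the spread of the coefficient estimates across simulation rounds with the average standard error reported within each round; the variable names are illustrative only.

```python
from statistics import mean, pstdev

# Hypothetical outcomes of five simulation rounds, as in footnote 9.
fm_coefs, fm_ses = [-2, -1, 0, 1, 2], [1, 1, 1, 1, 1]
fe_coefs, fe_ses = [-1, -0.5, 0, 0.5, 1], [2, 2, 2, 2, 2]

# Spread of the estimates across rounds (what the figures display).
fm_spread, fe_spread = pstdev(fm_coefs), pstdev(fe_coefs)

# Average of the standard errors reported in each round (what the table displays).
fm_avg_se, fe_avg_se = mean(fm_ses), mean(fe_ses)

print(f"FM: spread {fm_spread:.2f}, average reported s.e. {fm_avg_se:.2f}")
print(f"FE: spread {fe_spread:.2f}, average reported s.e. {fe_avg_se:.2f}")
```

The FM estimates are more dispersed across rounds even though their average reported standard error is smaller, which is the point the footnote makes about reading Figure 1 against Table 2.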


The conclusion from this analysis is that omitting firm fixed-effects will result in biased

slope coefficients unless the fixed-effects are uncorrelated with the independent regressor. Using

firm fixed-effects is a safe approach in that it will generate unbiased coefficients even when the

data generating process does not contain unobserved correlated fixed-effects. An alternative

approach would be to conduct the Hausman (1978) test procedure to identify whether fixed-

effects should be employed. However, since the Hausman test practically runs a fixed-effect

model against a model with no fixed-effects, we see no clear advantage over routinely including

firm fixed-effects.10

(Table 2 and Figure 1 about here)

Since industry fixed-effects are often used instead of firm fixed-effects, we examine the

effect of firm distribution across industries on the results by changing the number of firms per

industry while keeping the total number of observations constant at 8,000. We consider the

following cases: (i) 10 industries with 80 companies in each industry; (ii) 20 industries with 40

companies in each industry (the baseline used above); (iii) 40 industries with 20 companies in

each industry; (iv) 160 industries with 5 companies in each industry; and (v) 400 industries with

2 companies in each industry. We expect the bias to decline as the number of industries

increases. In the extreme case of one firm per industry, there will not be any bias, as this case

coincides with the full fixed-effects specification.

10 Another advantage of using fixed effects specifications is that typical software output can report the fixed effects coefficients, if the researcher is interested in exploring or reporting these coefficients.

Table 3 contains the results of this analysis. As the number of industries increases and the number of companies per industry decreases, the bias declines. However, the decline in the bias is rather small. For example, the bias is 24% when we use 40 industries and 20 companies per industry; it declines to 17% when using 400 industries and two companies per industry. Hence, replacing firm fixed-effects with industry fixed-effects and increasing the number of industries will not eliminate the bias in the coefficients, although using a finer industry classification might reduce the bias. For example, using the Fama-French 48-industry classification in estimating panel datasets is expected to yield less biased results than using the 12-industry classification.
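The effect of the industry partition can be sketched by demeaning X and Y within groups of firms of different sizes. The simulation below is illustrative only: the data-generating process (unit-variance normal components, correlation of 0.5 between the firm effect and the firm-level component of X, no industry or time effects) and the helper names are assumptions of this example. A group size of one firm corresponds to the firm fixed-effects specification.

```python
import random
from collections import defaultdict
from statistics import mean


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=2):
    """Assumed DGP: y_it = beta * x_it + a_i + e_it, with x_it = f_i + v_it."""
    rng = random.Random(seed)
    rows = []
    for firm in range(n_firms):
        f = rng.gauss(0, 1)
        a = rho * f + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        for _ in range(n_periods):
            x = f + rng.gauss(0, 1)
            y = beta * x + a + rng.gauss(0, 1)
            rows.append((firm, x, y))
    return rows


def group_demeaned_slope(rows, firms_per_industry):
    """Slope after demeaning x and y within industries.

    Industries are formed from consecutive firm identifiers, so
    firms_per_industry = 1 reproduces the firm fixed-effects estimator.
    """
    groups = defaultdict(list)
    for firm, x, y in rows:
        groups[firm // firms_per_industry].append((x, y))
    x_dm, y_dm = [], []
    for obs in groups.values():
        mx = mean(x for x, _ in obs)
        my = mean(y for _, y in obs)
        x_dm.extend(x - mx for x, _ in obs)
        y_dm.extend(y - my for _, y in obs)
    return ols_slope(x_dm, y_dm)


rows = simulate_panel()
slopes = {n: group_demeaned_slope(rows, n) for n in (80, 40, 20, 5, 2, 1)}
for n, b in slopes.items():
    print(f"{800 // n:>3} industries x {n:>2} firms: slope = {b:.3f}")
```

In this sketch the estimated slope tends to stay well above the true value of 1 even with only two firms per industry, and the bias disappears only when each firm forms its own group.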

(Table 3 about here)

The bias caused by omitting firm and time fixed-effects depends also on the time-variance

of the main regressor X. As the time-variance of the regressor X increases, the bias caused by

omitting the firm fixed-effects is expected to decline. To see this, we let the variance of X (i.e., of its time-varying component) decrease from 2.0 to 0.25 in steps of 0.25. As Figure 2 shows, the

bias in the slope coefficient increases as the variance of X decreases. In other words, omitting

firm fixed-effects results in a larger bias as the main regressors become more time-invariant. In

contrast, when the regressor X varies substantially over time, omitting firm fixed-effects is likely to result in little bias, if any.
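The mechanism can be illustrated with the standard omitted-variable bias expression, under which the large-sample bias of the pooled slope equals the covariance between X and the omitted firm effect divided by the variance of X. The sketch below assumes, for illustration only, that X is the sum of a firm-level component with variance 1 and an independent time-varying component, and that the covariance between X and the firm effect is 0.5; the function name and default values are hypothetical.

```python
def pooled_bias(var_time_varying, var_firm_component=1.0, cov_x_firm_effect=0.5):
    """Large-sample bias of the pooled OLS slope when firm effects are omitted.

    bias = Cov(x, firm effect) / Var(x), where Var(x) is the sum of the
    firm-level and time-varying variance components (assumed independent).
    """
    return cov_x_firm_effect / (var_firm_component + var_time_varying)


# Variance of the time-varying component falling from 2.0 to 0.25 in steps of 0.25.
variances = [2.0 - 0.25 * k for k in range(8)]
biases = [pooled_bias(v) for v in variances]

for v, b in zip(variances, biases):
    print(f"variance of time-varying component = {v:.2f}: bias = {b:.3f}")
```

As the time-varying component shrinks, the firm-level component accounts for a larger share of the variation in X and the bias from omitting firm fixed-effects grows.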

(Figure 2 about here)

Overall we draw several conclusions from the simulation analysis:

(i) Omitting firm fixed-effects may generate biased estimates and overstated t-statistics, hence,

wrong inferences. Replacing firm with industry fixed-effects is not a valid approach as it

does not eliminate the coefficient bias, if the purpose is to control for unobserved correlated

omitted variables that are time-invariant. While increasing the number of industries is likely

to reduce the bias, this approach is unlikely to eliminate the bias.

(ii) Using means of the dependent and independent variables, or the approach taken by Ball et al. (2013), is not equivalent to using firm fixed-effects. These methods yield biased estimates. The same holds for first-differencing only the dependent variable.


(iii) Using first differences (for both the dependent variable and independent variables) is a

valid, but less efficient, estimation strategy.

(iv) The coefficient distributions of several specifications may be so disjoint from the

coefficient distribution of a fixed-effect model that respective confidence intervals may be

entirely non-overlapping. That is, the chance of correct inference under the wrong

specification may be quite slim.

4. Implications for Empirical Accounting Research

We now examine the effects of using different model specifications on the results of

commonly used regression models in accounting research. We chose two regressions that have

gained wide recognition: Basu's (1997) model of asymmetric timeliness of earnings and Sloan’s

(1996) differential persistence of accruals and cash flow components of earnings.11

4.1 The Asymmetric Timeliness of Earnings – Basu (1997)

The Basu (1997) model highlights the differential reaction of earnings to good and bad

news, where stock returns serve as a proxy for news. The regression is:

$$X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) \cdot R_{it} + \varepsilon_{it}$$

where $R_{it}$ denotes firm i's annual stock returns for the 12 months starting nine months prior to fiscal year-end until three months after the fiscal year-end, a period that roughly corresponds to the period between earnings announcements. $X_{it}$ denotes firm i's earnings per share for year t, $P_{it-1}$ denotes firm i’s share price at the beginning of fiscal year t, and $D(R_{it} < 0)$ is an indicator variable obtaining the value "1" if stock returns are negative, and "0" otherwise.

11 The problem of excluding fixed-effects applies to any dynamic panel data model where the dependent variable is a function of lagged values of the independent variables. Suppose that the data generating process is $y_{it} = a_i + a_t + \beta y_{it-1} + e_{it}$. Then it follows that the lagged dependent variable is $y_{it-1} = a_i + a_{t-1} + \beta y_{it-2} + e_{it-1}$. If the researcher estimates instead the regression $y_{it} = \beta y_{it-1} + u_{it}$, where $u_{it} = e_{it} + a_i + a_t$, it follows that $E[u_{it} y_{it-1} \mid y_{it-1}] \neq 0$ and therefore the estimate b will be biased.
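To make the role of the interaction term concrete, the sketch below uses the fact that a regression with an intercept, the indicator, returns, and their interaction is equivalent to fitting separate lines to the negative-return and non-negative-return subsamples. The coefficient on the interaction term is then the difference between the two subsample slopes. The data are made up for illustration and `basu_coefficients` is a hypothetical helper; no fixed-effects are included.

```python
from statistics import mean


def ols_line(x, y):
    """Intercept and slope of a univariate OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def basu_coefficients(returns, earnings_yield):
    """Coefficients of the fully interacted regression via subsample fits."""
    pos = [(r, e) for r, e in zip(returns, earnings_yield) if r >= 0]
    neg = [(r, e) for r, e in zip(returns, earnings_yield) if r < 0]
    a_pos, b_pos = ols_line([r for r, _ in pos], [e for _, e in pos])
    a_neg, b_neg = ols_line([r for r, _ in neg], [e for _, e in neg])
    return {
        "alpha0": a_pos,          # intercept for good news
        "alpha1": a_neg - a_pos,  # intercept shift for bad news
        "beta0": b_pos,           # earnings response to good news
        "beta1": b_neg - b_pos,   # incremental response to bad news
    }


# Made-up data: earnings respond more strongly to negative returns.
returns = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
earnings_yield = [0.02 + 0.4 * r if r < 0 else 0.05 + 0.1 * r for r in returns]

coefs = basu_coefficients(returns, earnings_yield)
print({name: round(value, 3) for name, value in coefs.items()})
```

A positive incremental slope in this sketch corresponds to earnings reflecting bad news in a timelier manner than good news.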

Like Basu (1997), we use all firm-year observations from 1963 to 1990 for which stock

returns are available on the CRSP monthly files, and the necessary accounting data available on

Compustat. Similarly, we deflate earnings by the beginning-of-year share price and eliminate

observations falling in the top or bottom 0.5% of opening price-deflated earnings in each

calendar year to reduce the effects of outliers on the results.

Like Ball et al. (2013) we consider two additional definitions of stock returns: market

adjusted returns, and size and book-to-market adjusted returns. The size and book-to-market

adjusted returns are computed by forming 5x5 portfolios based on annual sorts on market

capitalization and on the book-to-market ratio (at the end of year t-1). We then calculate

monthly value-weighted mean returns for each size and book-to-market portfolio and subtract the return of the matching portfolio from each firm's raw return. Market-adjusted returns are raw returns minus the value-weighted market return. To save space, we report results only for raw returns; results for market-adjusted returns and size and book-to-market adjusted returns are similar.

We collect data for all US firms that trade on the NYSE, AMEX and NASDAQ. Our

sample contains 114,175 firm-year observations for the period 1963-2013, and 42,546 firm-year

observations for the period 1963-1990. For comparison, Basu (1997) reports results for a sample

of 43,321 firm-year observations over the same period. Panel A of Table 4 reports summary

statistics for the regression variables for the period 1963-2013. These statistics are consistent

with those reported in Patatoukas and Thomas (2011) and Ball et al. (2013). For instance,

$X_{it}/P_{it-1}$ is left-skewed and $R_{it}$ is right-skewed. The median value of $X_{it}/P_{it-1}$ in our study is 0.063, whereas Patatoukas and Thomas (2011) report a median value of 0.063 and Ball et al. (2013) report a median value of 0.062. The median value of annual stock returns $R_{it}$ in our study

is 0.094, whereas Patatoukas and Thomas (2011) report 0.089 and Ball et al. (2013) report a

value of -0.047.


Table 5 presents the sensitivity of the asymmetric timeliness of earnings regression to the

inclusion of fixed-effects. Looking at the 1963-1990 sample, our results show that when pooled

OLS is used, the coefficient $\beta_1$ is positive (0.198) and significant at the 0.01 level (t-statistic of

24.12). Basu (1997) reports a somewhat larger coefficient of 0.256 (t-statistic of 27.14). The

common interpretation of this result is that contemporaneous earnings reflect negative news in a

timelier manner than positive news (accounting earnings are conditionally conservative).

When firm and time fixed-effects are included in the model, the slope coefficient is significantly lower than that reported by Basu (1997). Specifically, the coefficient $\beta_1$ is 0.091 (t-statistic = 10.88) for the 1963-1990 period and 0.145 (t-statistic = 28.45) for the 1963-2013 period. When industry effects are included in the model instead of firm effects, $\beta_1$ is 0.163 (t-statistic = 21.72) for the 1963-1990 period and 0.223 (t-statistic = 44.90) for the 1963-2013 period, very similar to those reported by Basu (1997). Furthermore, using the Fama-MacBeth (1973) methodology yields $\beta_1$ equal to 0.211 (t-statistic = 9.26) for the 1963-1990 period and 0.26 (t-statistic = 14.51) for the 1963-2013 period, again very close to the results reported by Basu (1997).

In sum, adding firm fixed-effects reduces the coefficient on conservatism ($\beta_1$) substantially, while using industry fixed-effects or the Fama-MacBeth (1973) methodology yields a much higher conservatism coefficient. These results are in line with our theoretical

predictions and highlight the fact that when the data generating process contains firm fixed-

effects, the inclusion of industry fixed-effects does not help in dealing with unobserved firm

heterogeneity. Similarly, as predicted, the Fama-MacBeth (1973) approach is also subject to

unobserved heterogeneity biases.

Aghion et al. (2010) use a different approach to estimating a panel dataset. Instead of adding firm and time fixed-effects, they convert the panel dataset into a cross-sectional regression in which the vectors of dependent and independent variables are replaced by their time-series means (denoted here MYX). Using this method yields conservatism coefficients ($\beta_1$) equal to 0.097 (t-statistic = 9.03) for the 1963-1990 period and 0.096 (t-statistic = 10.43) for the 1963-2013 period.

Ball et al. (2013) acknowledge that in order to deal with this problem, the researcher could

use a fixed-effects specification. However, in their empirical approach and due to computational

constraints, they use a different approach: demeaning the dependent variable (MY). To assess

the effect of their approach, the last specification in Table 5 employs the Ball et al. (2013) specification (denoted MY). As can be seen, $\beta_1$ is 0.048 (t-statistic = 6.65) for the 1963-1990 period and 0.082 (t-statistic = 18.96) for the 1963-2013 period. These values are similar to the estimates reported in Table 5 (row 5) in Ball et al. (2013). We therefore conclude that $\beta_1$ under

the Ball et al. (2013) specification is significantly lower than the coefficient obtained from the

full fixed-effects specification. This result suggests that the MY specification provides a lower

bound for the conditional conservatism coefficient. This is likely to be the case as the mean

value of the dependent variable is positive, as suggested by our simulations.


Overall, our results suggest that there is substantial unobserved heterogeneity at the firm

level that seems to be an important determinant for explaining price-deflated earnings. These

findings are consistent with the Ball et al. (2013) finding that the Basu (1997) model is affected by

correlated omitted variables due to the expected components of earnings being correlated with

the expected components of stock returns. However, we argue that the empirical specification of

Ball et al. (2013) is not an appropriate substitute for firm and time fixed-effects (FE). This

notwithstanding, from a qualitative standpoint, our results confirm the presence of conditional

conservatism in earnings.


4.2 The Differential Persistence of Accruals and Cash Flow Components of Earnings – Sloan (1996)

Sloan (1996) explores the association between future income and previous year’s accruals

and cash flows. He finds that the persistence of cash flows exceeds that of accruals, which is

consistent with the reversal property of accruals. Sloan (1996) estimates the following

regression, allowing the persistence coefficient on the accruals and cash flow components of

earnings to be different. The model is:

$$OI_{it}/TA_{it} = \alpha_0 + \beta_0 \, ACC_{it-1}/TA_{it-1} + \beta_1 \, CF_{it-1}/TA_{it-1} + e_{it}$$

where $OI_{it}/TA_{it}$ is operating income divided by average total assets, $ACC_{it-1}/TA_{it-1}$ denotes operating accruals divided by average total assets, and $CF_{it-1}/TA_{it-1}$ denotes operating cash flows divided by average total assets.

The accrual component of operating income is measured as ACCit = (∆CAit - ∆Cashit) -

(∆CLit - ∆STDit - ∆TPit) – Depit, where ∆CAit is the change in current assets; ∆Cashit is the

change in cash and cash equivalents; ∆CLit is the change in current liabilities; ∆STDit is the


change in debt included in current liabilities; ∆TPit is the change in income taxes payable; and

Depit is the depreciation and amortization expense. The cash flow component of earnings (CFit)

is measured as the difference between operating income and the accrual component of earnings.
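The variable construction described above can be written out as a short computation. The sketch below is illustrative: the `FirmYear` container and its field names are hypothetical and do not correspond to Compustat item names, and the input numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Inputs for one firm-year (hypothetical field names)."""
    d_current_assets: float       # change in current assets
    d_cash: float                 # change in cash and cash equivalents
    d_current_liabilities: float  # change in current liabilities
    d_short_term_debt: float      # change in debt included in current liabilities
    d_taxes_payable: float        # change in income taxes payable
    depreciation: float           # depreciation and amortization expense
    operating_income: float
    avg_total_assets: float


def accruals(fy):
    """ACC = (dCA - dCash) - (dCL - dSTD - dTP) - Dep."""
    return (
        (fy.d_current_assets - fy.d_cash)
        - (fy.d_current_liabilities - fy.d_short_term_debt - fy.d_taxes_payable)
        - fy.depreciation
    )


def scaled_components(fy):
    """Accrual and cash flow components, each divided by average total assets."""
    acc = accruals(fy)
    cf = fy.operating_income - acc  # cash flow component as the residual
    return acc / fy.avg_total_assets, cf / fy.avg_total_assets


example = FirmYear(30.0, 5.0, 12.0, 2.0, 1.0, 10.0, 40.0, 200.0)
acc_scaled, cf_scaled = scaled_components(example)
print(f"ACC/TA = {acc_scaled:.3f}, CF/TA = {cf_scaled:.3f}")
```

By construction the two scaled components sum to operating income over average total assets, which is why the persistence regression can assign them separate coefficients.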

Our sample includes all firm-year observations with the necessary accounting and stock return data available on Compustat and the CRSP monthly file between 1962

and 1991. This is the sample analyzed by Sloan (1996). We also collect data for an extended

sample for 1963-2013. We only sample US firms that trade on NYSE, AMEX or NASDAQ. As

before, we eliminate the top and bottom 0.5% of observations. Table 4, Panel B, contains

descriptive statistics for the main variables for the extended period 1963-2013. Median

operating income over average total assets is 0.13. This figure comprises a median accrual

component of 0.01 and a median cash flow component of 0.12.

Table 6 presents results of estimating the differential persistence regressions for the full

period 1963-2013 and for the sub-period 1962-1991. For the sub-period, using a pooled

regression without fixed-effects, the coefficient on the accrual component ($\beta_0$) is 0.621 (t-statistic = 141.29), lower, as expected, than the coefficient on the cash flow component ($\beta_1$), which is 0.732 (t-statistic = 207.87). These coefficients are somewhat lower than those reported by Sloan (1996) in Table 3 ($\beta_0$ = 0.765 and $\beta_1$ = 0.855). However, similar to Sloan (1996), our

results also show that the accrual component of earnings is less persistent than the cash flow

component (0.62 vs. 0.73) and that the difference between the two coefficients is significant at

the 0.01 level.

Adding firm and year fixed-effects reduces the persistence coefficients quite significantly.

The coefficient on the accrual component ($\beta_0$) is 0.468 (t-statistic = 128.33) and the coefficient on the cash flow component ($\beta_1$) is 0.538 (t-statistic = 197.53) for the 1962-2013 period. The corresponding coefficients for the sub-period are even lower (0.377 and 0.453, respectively). When

firm fixed-effects are replaced with industry fixed-effects, the persistence coefficients increase

to 0.680 and 0.789, respectively for the entire sample period, and in the sub-period these

coefficients are 0.606 and 0.713, respectively. That is, the coefficients without fixed-effects and

with industry and time fixed-effects are virtually identical. Using the Fama-MacBeth (1973)

methodology has a minor effect on the persistence coefficients and these coefficients are of a

similar magnitude as without any fixed-effects.

Using means of the dependent and independent variables (MYX) yields very high

persistence measures. For the entire sample period the persistence measures of accruals and cash

flows are 0.816 and 0.988, respectively. Interestingly, for the sub-period, these measures are

very similar to those reported by Sloan (1996): 0.715 and 0.888, respectively. Finally, demeaning the dependent variable (MY) reduces both persistence coefficients to the lowest magnitude reported in this table. Moreover, for the sub-period 1962-1991 accruals are not less persistent than cash flows; clearly, this last specification yields downward-biased coefficients, as shown

in our simulations.12

To summarize, with firm fixed-effects the magnitude of the coefficients in the Sloan

(1996) model is smaller than originally reported. This suggests that both accruals and cash flows

are only moderately persistent, although accruals are still found to be less persistent than cash

flows.


12 In all other specifications we find that accruals are less persistent than cash flows and that the difference between the two coefficients is statistically significant at the 0.01 level.


5. Conclusion

Accounting researchers often use panel datasets that contain firm/time observations.

However, instead of controlling for firm and time fixed-effects, researchers often use industry

and time fixed-effects or none at all. When asked about the reason for avoiding firm fixed-effects, some researchers have argued that by including firm fixed-effects they "throw the baby out with the bathwater." Our study highlights the consequences of adopting this view on

estimation results. We show analytically and empirically that omitting firm fixed-effects yields

biased and inconsistent slope estimates and hence erroneous test statistics, which in turn could

result in incorrect inferences.

We complement recent studies by Petersen (2009) and Gow et al. (2010) that address potential problems in panel datasets due to correlation in residuals over time and thus biased standard errors. Unlike Petersen (2009) and Gow et al. (2010), who assume that the regression model is well-specified, we focus on cases where models are misspecified, and hence the coefficient estimates are biased and the related standard errors are incorrect. Specifically, we show how incorrect inferences stem not only from the test statistic's denominator, i.e., the standard error (e.g., Petersen, 2009; Gow et al., 2010), but also from the test statistic's numerator, i.e., the coefficient estimate.

Our study is the first that focuses on the potential bias in estimated coefficients due to

omitting firm fixed-effects when panel datasets are used. Our survey of the common panel

dataset regression specifications used in accounting literature illustrates a clear preference for

the use of industry rather than firm fixed-effects. We find that the inclusion of industry fixed-

effects does not eliminate the bias and could lead to markedly incorrect inferences. This is due

to potential within-industry variations that are ignored or wrongly assumed to be immaterial by


researchers. Our results show that the bias in coefficients is negatively related to the number of

industries controlled. This provides further support for within-industry variations and for the

need to use firm fixed-effects. We further test and show that other commonly used methods such

as differencing the dependent variable, or demeaning it, and using the Fama-MacBeth (1973)

procedure yield biased slope coefficients, unless the fixed-effects are uncorrelated with the

independent variable. In addition to the firm fixed-effects model, using first differences for both the dependent and independent variables yields unbiased estimates, but these are less efficient.

With the aim of providing guidance for empirical accounting researchers, we conclude

that the commonly used methods addressing the potential limitations in inferences of panel

datasets do not eliminate the correlated omitted variable problem, with the exception of firm and

time fixed-effects. This is due to the fact that with archival data, the exact form of the data

generation process is unknown to researchers. Our replications of two widely recognized

regression models in the accounting literature show that regression results are sensitive to the

method used. Without knowing all the underlying mechanisms, researchers should check for

substantial differences between a full fixed-effects specification and a simple pooled regression.

A substantial difference may highlight the need to use a full fixed-effects specification.


References

Aghion, P., Y. Algan, P. Cahuc, and A. Shleifer (2010). Regulation and distrust. The Quarterly Journal of Economics, vol. 125, no. 3, pp. 1015-1049.

Ball, R., S. P. Kothari, and V. Nikolaev (2013). On estimating conditional conservatism. The Accounting Review, vol. 88, no. 3, pp. 755-787.

Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th edition). Chichester: Wiley.

Basu, S. (1997). The conservatism principle and the asymmetric timeliness of earnings. Journal of Accounting and Economics, vol. 24, no. 1, pp. 3-37.

Easton, P. D., and T. S. Harris (1991). Earnings as an explanatory variable for returns. Journal of Accounting Research, vol. 29, no. 1, pp. 19-36.

Fama, E., and J. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, vol. 81, pp. 607-636.

Gow, I. D., G. Ormazabal, and D. J. Taylor (2010). Correcting for cross-sectional and time-series dependence in accounting research. The Accounting Review, vol. 85, no. 2, pp. 483-512.

Greene, W. H. (2003). Econometric Analysis (5th edition). Upper Saddle River, NJ: Pearson Education.

Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, vol. 46, no. 6, pp. 1251-1271.

Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, vol. 46, pp. 69-85.

Patatoukas, P., and J. Thomas (2011). More evidence of bias in differential timeliness estimates of conditional conservatism. The Accounting Review, vol. 86, no. 5, pp. 1765-1793.

Patatoukas, P. N., and J. K. Thomas (2015). Placebo tests of conditional conservatism. The Accounting Review, forthcoming.

Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, vol. 22, no. 1, pp. 435-480.

Sloan, R. G. (1996). Do stock prices fully reflect information in accruals and cash flows about future earnings? The Accounting Review, vol. 71, no. 3, pp. 289-315.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.


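Table 1 A Survey of Research Methodologies in Accounting Journals (2006-2013)

Panel A. Accounting Journals’ Review Statistics

| Journal | Empirical | Experiment | Theory | Essay | Survey | Case Study | Other | Total |
|---|---|---|---|---|---|---|---|---|
| CAR | 179 | 42 | 25 | 47 | 10 | 3 | 18 | 324 |
| EAR | 91 | 5 | 20 | 68 | 18 | 6 | 16 | 224 |
| JAE | 212 | 1 | 23 | 44 | 2 | 0 | 1 | 283 |
| JAR | 188 | 22 | 39 | 45 | 4 | 0 | 1 | 299 |
| RAST | 153 | 6 | 24 | 46 | 0 | 0 | 0 | 229 |
| TAR | 328 | 75 | 37 | 17 | 21 | 2 | 3 | 483 |
| Total | 1152 | 151 | 168 | 267 | 55 | 11 | 39 | 1842 |

Panel B. Accounting Journals’ Regressions Specification and Treatments

| Journal | Pooled | Annual | Portfolio | Time | Industry | Firm | Country |
|---|---|---|---|---|---|---|---|
| CAR | 150 | 3 | 3 | 83 | 71 | 13 | 11 |
| EAR | 62 | 15 | 2 | 26 | 27 | 9 | 6 |
| JAE | 179 | 9 | 20 | 86 | 76 | 27 | 11 |
| JAR | 144 | 10 | 19 | 79 | 66 | 23 | 11 |
| RAST | 116 | 10 | 15 | 61 | 46 | 8 | 2 |
| TAR | 282 | 28 | 17 | 128 | 120 | 34 | 17 |
| Total | 933 | 75 | 76 | 463 | 406 | 114 | 58 |

Panel C. Firm Fixed-Effects per Year

| Year | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|
| Total | 8 | 5 | 7 | 12 | 7 | 22 | 26 | 27 |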

Notes:

1. Journal abbreviations are: Contemporary Accounting Research (CAR), European Accounting Review (EAR), Journal of Accounting and Economics (JAE), Journal of Accounting Research (JAR), Review of Accounting Studies (RAST), and The Accounting Review (TAR).

2. Column definitions in Panel A are:
- Empirical – Studies that use archival data to support a theory or derive a conclusion.
- Experiment – Studies that carry out experiments with the goal of verifying, falsifying, or validating a hypothesis.
- Theory – Studies that operate within a theoretically defined framework and use mathematical derivations to illustrate and verify hypotheses.
- Essay – Studies that do not test any hypothesis but merely discuss concepts within the accounting discipline. These studies are often discussions of other papers or editors' comments on specific topics.
- Survey – Studies that gather and collect data by sending surveys to subjects.
- Case Study – Qualitative studies that study specific subjects in depth.
- Other – Interviews, descriptive studies, and studies on methodology.

3. Column definitions in Panel B are:
- Pooled – Studies that use pooled cross-section and time-series regressions. Many empirical studies use pooled regressions as a first step before improving the specification.
- Annual – Studies that use indicator variables for specific years or periods (for instance, pre-SOX).
- Portfolio – Studies that use portfolio analysis.
- Time – Studies that include time fixed-effects in pooled regressions.
- Industry – Studies that use industry fixed-effects in pooled regressions.
- Firm – Studies that include firm fixed-effects in the pooled regressions.
- Country – Studies that include country dummies. This could be done for specific countries but also for all countries in the sample.


Table 2 Simulation Results

| β = 1 | Statistic | (1) FE | (2) IE | (3) MS | (4) FD | (5) LY | (6) MYX | (7) MY | (8) FM |
|---|---|---|---|---|---|---|---|---|---|
| ρ = 0.5 | Slope (b) | 1.00 | 1.247 | 1.250 | 1.00 | 0.50 | 1.45 | 0.45 | 1.25 |
| | t-stat (b = 1) | 0.01 | 22.8 | 16.5 | 0.01 | -26.0 | 10.1 | -46.5 | 35.8 |
| | Standard error | 0.012 | 0.011 | 0.015 | 0.016 | 0.019 | 0.045 | 0.012 | 0.008 |
| | Adjusted R2 | 0.85 | 0.73 | 0.46 | 0.35 | 0.09 | 0.56 | 0.15 | 0.53 |
| ρ = 0.25 | Slope (b) | 1.00 | 1.12 | 1.12 | 1.00 | 0.50 | 1.23 | 0.45 | 1.12 |
| | t-stat (b = 1) | 0.01 | 11.12 | 8.12 | 0.01 | -26.00 | 4.82 | -46.40 | 17.8 |
| | Standard error | 0.012 | 0.011 | 0.015 | 0.016 | 0.019 | 0.047 | 0.012 | 0.008 |
| | Adjusted R2 | 0.84 | 0.69 | 0.40 | 0.35 | 0.09 | 0.46 | 0.15 | 0.47 |
| ρ = 0 | Slope (b) | 1.00 | 1.00 | 1.00 | 1.00 | 0.50 | 1.00 | 0.45 | 1.00 |
| | t-stat (b = 1) | -0.02 | -0.01 | -0.01 | -0.03 | -26.03 | 0.01 | -46.49 | -0.02 |
| | Standard error | 0.012 | 0.011 | 0.015 | 0.016 | 0.019 | 0.048 | 0.012 | 0.008 |
| | Adjusted R2 | 0.83 | 0.66 | 0.34 | 0.35 | 0.09 | 0.35 | 0.15 | 0.40 |
| ρ = -0.25 | Slope (b) | 1.00 | 0.87 | 0.87 | 1.00 | 0.50 | 0.77 | 0.45 | 0.87 |
| | t-stat (b = 1) | -0.01 | -11.13 | -8.14 | -0.03 | -25.93 | -4.82 | -46.36 | -17.66 |
| | Standard error | 0.012 | 0.011 | 0.015 | 0.016 | 0.019 | 0.047 | 0.012 | 0.008 |
| | Adjusted R2 | 0.81 | 0.63 | 0.29 | 0.34 | 0.09 | 0.25 | 0.15 | 0.35 |
| ρ = -0.50 | Slope (b) | 1.00 | 0.75 | 0.75 | 1.00 | 0.50 | 0.55 | 0.45 | 0.75 |
| | t-stat (b = 1) | 0.013 | -22.78 | -16.48 | 0.01 | -25.95 | -10.1 | -46.37 | -35.90 |
| | Standard error | 0.012 | 0.011 | 0.015 | 0.016 | 0.019 | 0.045 | 0.012 | 0.008 |
| | Adjusted R2 | 0.79 | 0.61 | 0.23 | 0.34 | 0.09 | 0.15 | 0.15 | 0.29 |

Notes: We simulate a panel of 8,000 observations made of 10 periods, 20 industries and 40 firms per industry. The correlations between the fixed-effects and the independent variable are 0.5, 0.25, 0, -0.25, and -0.50, respectively. The estimation methods are:
(1) FE – Firm and time fixed-effects.
(2) IE – Industry and time fixed-effects.
(3) MS – Misspecified model (no fixed-effects).
(4) FD – First differences of both the dependent and independent variables.
(5) LY – First differences of the dependent variable only.
(6) MYX – Using means of both the dependent and independent variables (subtracting the firm mean from both the dependent and independent variable).
(7) MY – Demeaning the dependent variable (subtracting the firm mean from the dependent variable).
(8) FM – Fama-MacBeth (1973) applied to annual OLS regressions.
The table reports the means of the estimated slope coefficient (b), the t-statistics, standard errors and R2s across 8,000 simulation rounds of 8,000 observations each.
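The contrast between the FE, FD and MS columns can be illustrated with a minimal Python sketch. The data-generating process below is an assumption made for illustration only (unit-variance firm effects, unit-variance regressor, no time or industry effects); it is not the simulation design behind Table 2, so the size of the bias will differ from the reported values. All function names are hypothetical.

```python
import numpy as np


def simulate_panel(n_firms=800, n_periods=10, beta=1.0, rho=0.5, seed=0):
    """Assumed DGP: y_it = beta * x_it + f_i + e_it, with corr(x_it, f_i) = rho."""
    rng = np.random.default_rng(seed)
    firm = np.repeat(np.arange(n_firms), n_periods)  # sorted by firm, then period
    f = rng.standard_normal(n_firms)[firm]           # firm fixed-effect
    u = rng.standard_normal(firm.size)
    x = rho * f + np.sqrt(1.0 - rho**2) * u          # regressor correlated with f
    y = beta * x + f + rng.standard_normal(firm.size)
    return firm, x, y


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def demean_by(group, v):
    """Subtract the group mean from each observation (within transformation)."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]


def first_difference(firm, v):
    """Period-to-period changes, dropping differences that span two firms."""
    return np.diff(v)[firm[1:] == firm[:-1]]


firm, x, y = simulate_panel()
b_ms = slope(x, y)                                          # no fixed-effects
b_fe = slope(demean_by(firm, x), demean_by(firm, y))        # firm fixed-effects
b_fd = slope(first_difference(firm, x), first_difference(firm, y))
print(f"MS: {b_ms:.3f}  FE: {b_fe:.3f}  FD: {b_fd:.3f}")
```

Under these assumptions the pooled slope is biased upward when rho is positive, while the within and first-difference slopes stay close to the true value of one, which is the qualitative pattern in columns 1, 3 and 4.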


Figure 1 Plots of the Distributions of Estimated Coefficients under Different Model Specifications and Correlation Parameters

Figure 1a: β = 1; correlation = 0.5. Histogram of the distributions of estimated β (vertical axis: No. of observations) for FE, IE, MS, FD, LY, MY, FM and MYX.

Figure 1b: β = 1; correlation = 0. Histogram of the distributions of estimated β (vertical axis: No. of observations) for FE, IE, MS, FD, LY, MY, FM and MYX.


Figure 1c

Table 3 Varying the Number of Industries and Number of Member Firms

β = 1, ρ = 0.5

                                      Industry Fixed-effects
                   Full FE  IE 10/80  IE 20/40  IE 40/20  IE 160/5  IE 400/2        MS
Slope (b)             1.00     1.249     1.248      1.24      1.22      1.17      1.25
t-stat (b = 1)        0.02      16.7      16.6      16.2      14.6      10.4      16.6
Standard error       0.012     0.015     0.015     0.015     0.016     0.017     0.015
Adjusted R2           0.85      0.46      0.47      0.47      0.50      0.52      0.46

Note: We simulate a panel of 8,000 observations made of 10 periods, a varying number of industries (10, 20, up to 400) and a varying number of companies per industry (80, 40, down to 2). The correlation between the fixed-effects and the independent variable is positive (ρ = 0.5). The models are: FE – Full time and firm fixed-effects; IE – Industry and time fixed-effects; and MS – Misspecified model (no fixed-effects). The table reports the means of the estimated slope coefficient (b), the t-statistics, standard errors and R2.
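The effect of industry granularity can be sketched in Python as follows. This is an illustrative sketch under an assumed data-generating process (independent unit-variance firm effects, no time effects, firms assigned to industries in consecutive blocks), not the design used for Table 3; the function name and its parameters are hypothetical.

```python
import numpy as np


def industry_fe_slope(n_industries, firms_per_industry, n_periods=10,
                      beta=1.0, rho=0.5, seed=0):
    """Slope of y on x after removing industry means (industry fixed-effects).

    Assumed DGP: y_it = beta * x_it + f_i + e_it, with corr(x_it, f_i) = rho.
    """
    rng = np.random.default_rng(seed)
    n_firms = n_industries * firms_per_industry
    firm = np.repeat(np.arange(n_firms), n_periods)
    industry = firm // firms_per_industry            # consecutive blocks of firms
    f = rng.standard_normal(n_firms)[firm]           # firm fixed-effect
    x = rho * f + np.sqrt(1.0 - rho**2) * rng.standard_normal(firm.size)
    y = beta * x + f + rng.standard_normal(firm.size)
    counts = np.bincount(industry)
    x_dm = x - (np.bincount(industry, weights=x) / counts)[industry]
    y_dm = y - (np.bincount(industry, weights=y) / counts)[industry]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Same total number of firms (800), increasingly fine industry partitions.
for n_ind, n_per in [(10, 80), (20, 40), (40, 20), (160, 5), (400, 2)]:
    print(f"IE {n_ind}/{n_per}: b = {industry_fe_slope(n_ind, n_per):.3f}")
```

Industry dummies absorb only the industry-average part of the firm effect, so under these assumptions the bias shrinks as industries become smaller but does not vanish even with two firms per industry.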

Figure 1c: β = 1; correlation = -0.5. Histogram of the distributions of estimated β (vertical axis: No. of observations) for FE, IE, MS, FD, LY, MY, FM and MYX.


Figure 2 Sensitivity to Increasing the Cross-time Variance of the Regressors

Note: We simulate a panel of 8,000 observations made of 10 periods, a varying number of industries and a varying number of companies per industry. The true slope coefficient is equal to one and the correlation between the fixed-effects and the independent variable is 0.5. We vary the variance of the time component in X from 2 down to 0.25 in steps of 0.25.

Line plot of the bias in MS (vertical axis: MS-FE) against the time variance of Xs (horizontal axis: 2, 1.75, 1.5, 1.25, 1, 0.75, 0.5, 0.25); β = 1; correlation = 0.5.


Table 4 Variable Definitions and Summary Statistics

Panel A: The Asymmetric Timeliness of Earnings (Basu, 1997)

The model is: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$, where $X_{it}/P_{it-1}$ denotes earnings per share divided by lagged share price; $R_{it}$ denotes raw returns; $R_{it} - R_{mt}$ denotes market adjusted returns; $Ra_{it}$ denotes size and book-to-market adjusted return; and $D(R_{it}<0)$ is an indicator variable that takes the value of "1" if $R_{it}$ is negative, and "0" otherwise.

Summary statistics 1963-2013; 105,179 obs.

                          P5     P25     P50     P75     P95    MEAN     STD
$X_{it}/P_{it-1}$      -0.17    0.02    0.06    0.10    0.21    0.05    0.19
$R_{it}$               -0.50   -0.14    0.09    0.38    1.11    0.17    0.53
$D(R_{it}<0)$          -0.56   -0.23   -0.02    0.23    0.91    0.05    0.49

Panel B: Accruals and Cash flows as predictors of earnings (Sloan, 1996)

The model is: $OI_{it+1}/TA_{it+1} = \alpha_0 + \beta_0\, ACC_{it}/TA_{it} + \beta_1\, CF_{it}/TA_{it} + \varepsilon_{it}$, where $OI_{it}/TA_{it}$ denotes operating income divided by average total assets; $ACC_{it}/TA_{it}$ denotes the accrual component of earnings divided by average total assets; and $CF_{it}/TA_{it}$ denotes operating cash flows divided by average total assets.

Summary statistics 1963-2013; 104,898 obs.

                          P5     P25     P50     P75     P95    MEAN     STD
$OI_{it}/TA_{it}$      -0.14    0.08    0.13    0.20    0.31    0.12    0.15
$ACC_{it}/TA_{it}$     -0.11   -0.02    0.01    0.05    0.15    0.01    0.09
$CF_{it}/TA_{it}$      -0.17    0.05    0.12    0.19    0.31    0.11    0.15


Table 5 Alternative Estimation Methods - The Asymmetric Timeliness of Earnings (Basu, 1997)

                                        α0        α1        β0        β1    Adj-R2   Observ.

Basu (1997) as reported
Table 1, Panel A – coefficient       0.090     0.002     0.059     0.216      0.10
  t-statistic                      (68.03)    (0.86)   (18.34)   (20.66)                43,321
Table 1, Panel B – coefficient       0.030     0.014     0.047     0.256      0.12
  t-statistic                      (22.62)    (6.07)   (11.03)   (27.14)                43,321
Table 1, Panel C – coefficient       0.086    -0.005     0.075     0.166      0.12
  t-statistic                       (64.1)    (1.96)    (21.3)    (16.5)                43,118

Replication with Raw Returns

Pooled, no fixed-effects (MS)
1963-2013 – coefficient              0.074    -0.009    -0.006     0.256      0.05
  t-statistic                      (76.00)   (-5.22)   (-3.93)   (52.02)               114,175
1963-1990 – coefficient              0.094     0.000     0.064     0.198      0.09
  t-statistic                      (63.38)    (0.14)   (24.81)   (24.12)                42,546

Fixed Firm and Year Effects (FE)
1963-2013 – coefficient              0.060    -0.012     0.016     0.145      0.19
  t-statistic                      (12.27)   (-7.10)   (10.63)   (28.45)               114,175
1963-1990 – coefficient              0.085    -0.004     0.074     0.091      0.24
  t-statistic                      (10.37)   (-1.73)   (28.04)   (10.88)                42,546

Fixed Industry and Year Effects (IE)
1963-2013 – coefficient              0.069    -0.013     0.011     0.223      0.12
  t-statistic                      (14.04)   (-7.29)    (7.28)   (44.90)               114,175
1963-1990 – coefficient              0.093    -0.006     0.071     0.163      0.16
  t-statistic                      (12.30)   (-2.54)   (26.40)   (21.72)                42,546

Fama-MacBeth (FM)
1963-2013 – coefficient              0.069    -0.005     0.023      0.26      0.10
  t-statistic                      (14.46)   (-1.49)    (3.63)   (14.51)               114,175
1963-1990 – coefficient              0.089    -0.004     0.049     0.211      0.13
  t-statistic                      (12.65)   (-0.95)    (5.34)    (9.26)                42,546

Means of Dependent and Independent Variables (MYX)
1963-2013 – coefficient              0.017    -0.020     0.066     0.096      0.04
  t-statistic                       (8.63)   (-7.62)   (10.84)   (10.43)                16,459
1963-1990 – coefficient              0.053    -0.014     0.148     0.097      0.15
  t-statistic                      (24.54)   (-4.82)   (20.76)    (9.03)                 7,820

Demeaning the Dependent Variable (MY)
1963-2013 – coefficient              0.008    -0.008     0.020     0.082      0.02
  t-statistic                       (9.10)   (-5.39)   (15.61)   (18.96)               114,175
1963-1990 – coefficient             -0.006    -0.002     0.069     0.048      0.06
  t-statistic                      (-4.61)   (-0.75)   (30.11)    (6.65)                42,546

Notes: The table reports results for estimating Basu's (1997) asymmetric timeliness of earnings model: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$. We report results for two periods: 1963-2013 and 1963-1990 (the sample period used in Basu (1997)). We present results for the following specifications: (i) as reported in Basu (1997); (ii) replication without any fixed-effects (MS); (iii) replication with firm and year fixed-effects (FE); (iv) replication with industry and year fixed-effects (IE); (v) replication using the Fama and MacBeth (1973) methodology (average coefficients and corresponding standard errors obtained from annual cross-sectional regressions, FM); and (vi) a replication where the dependent variable is mean-adjusted (the mean is calculated for each cross-sectional unit by averaging observations over time, MY). See Table 4 for variable definitions.
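The Fama-MacBeth (FM) procedure described in the notes (annual cross-sectional regressions, then averaging the coefficients and taking standard errors from their time series) can be sketched in Python as below. The sketch is run on synthetic data; the parameter values, sample sizes and function names are hypothetical and are not taken from the paper or from Compustat/CRSP data.

```python
import numpy as np


def basu_design(ret):
    """Regressors of the Basu model: [1, D(R<0), R, D(R<0) * R]."""
    d = (ret < 0).astype(float)
    return np.column_stack([np.ones_like(ret), d, ret, d * ret])


def fama_macbeth(year, X, y):
    """Average of annual OLS coefficients, with time-series standard errors."""
    years = np.unique(year)
    coefs = np.array([
        np.linalg.lstsq(X[year == t], y[year == t], rcond=None)[0]
        for t in years
    ])
    mean = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1) / np.sqrt(len(years))
    return mean, se, mean / se


# Synthetic example: 20 years x 500 firms, hypothetical true coefficients.
rng = np.random.default_rng(0)
year = np.repeat(np.arange(20), 500)
ret = rng.normal(0.1, 0.5, year.size)
X = basu_design(ret)
true_coefs = np.array([0.08, 0.0, 0.05, 0.20])   # alpha0, alpha1, beta0, beta1
y = X @ true_coefs + rng.normal(0.0, 0.1, year.size)

mean, se, t_stat = fama_macbeth(year, X, y)
for name, m, t in zip(["alpha0", "alpha1", "beta0", "beta1"], mean, t_stat):
    print(f"{name}: {m:.3f} (t = {t:.2f})")
```

Because each annual regression is purely cross-sectional, this procedure does not remove firm fixed-effects; it only changes how the standard errors are computed.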


Table 6 Alternative Estimation Methods - Accruals and Cash flows as Predictors of Earnings (Sloan, 1996)

Variables                                         α0        β0        β1    Adj-R2   Observ.

Sloan (1996) as reported
Table 3, Panel A, pooled – coefficient         0.011     0.765     0.855         ?
  t-statistic                                (24.05)  (186.53)  (304.56)                40,679
Table 3, Panel B, pooled, decile ranking      -2.216     0.565     0.838         ?
  t-statistic                               (-55.86)  (141.02)  (209.43)                40,679

Replication

Pooled, no fixed-effects (MS)
1963-2013 – coefficient                        0.023     0.697     0.811      0.64
  t-statistic                                (64.52)  (206.02)  (426.34)               104,898
1962-1991 – coefficient                        0.042     0.621     0.732      0.50
  t-statistic                                (66.18)  (141.29)  (207.87)                43,978

Fixed Firm and Year Effects (FE)
1963-2013 – coefficient                        0.055     0.468     0.538      0.69
  t-statistic                                (28.97)  (128.33)  (197.53)               104,898
1962-1991 – coefficient                        0.084     0.377     0.453      0.57
  t-statistic                                (23.83)   (75.15)   (97.52)                43,978

Fixed Industry and Year Effects (IE)
1963-2013 – coefficient                        0.026     0.680     0.789      0.65
  t-statistic                                (13.41)  (197.43)  (397.55)               104,898
1962-1991 – coefficient                        0.044     0.606     0.713      0.51
  t-statistic                                (12.63)  (135.63)  (197.83)                43,978

Fama-MacBeth (FM)
1963-2013 – coefficient                        0.032     0.643     0.753      0.55
  t-statistic                                (12.89)   (28.20)   (29.86)               104,898
1962-1991 – coefficient                        0.043     0.588     0.700      0.47
  t-statistic                                (22.40)   (22.67)   (18.71)                43,978

Means of Dependent and Independent Variables (MYX)
1963-2013 – coefficient                       -0.001     0.816     0.988      0.11
  t-statistic                                (-1.34)   (64.96)  (218.92)                 8,109
1962-1991 – coefficient                        0.016     0.715     0.888      0.11
  t-statistic                                (10.22)   (44.08)   (89.08)                 4,243

Demeaning the Dependent Variable (MY)
1963-2013 – coefficient                       -0.024     0.215     0.202     0.109
  t-statistic                               (-71.06)   (65.46)  (109.21)               104,898
1962-1991 – coefficient                       -0.036     0.211     0.234     0.107
  t-statistic                               (-60.59)   (51.47)   (71.27)                43,978

Notes: The table reports results for the following specifications: (i) as reported in Sloan (1996, Table 3); (ii) replication without any fixed-effects; (iii) replication with firm and year fixed-effects; (iv) replication with industry and year fixed-effects; and (v) replication using the Fama and MacBeth (1973) methodology (average coefficients and corresponding standard errors obtained from annual cross-sectional regressions). We also report results for two sample periods: 1963-2013 and 1962-1991. The model is:

$OI_{it+1}/TA_{it+1} = \alpha_0 + \beta_0\, ACC_{it}/TA_{it} + \beta_1\, CF_{it}/TA_{it} + \varepsilon_{it}$.

See Table 4 for variable definitions.
