bayesian solutions for the factor zoo: we just ran two

79
Bayesian Solutions for the Factor Zoo: We Just Ran Two Quadrillion Models Svetlana Bryzgalova Jiantao Huang Christian Julliard § November 18, 2019 Abstract We propose a novel, and simple, Bayesian estimation and model selection procedure for cross- sectional asset pricing. Our approach, that allows for both tradable and non-tradable factors, and is applicable to high dimensional cases, has several desirable properties. First, weak and spurious factors lead to diuse, and centered at zero, posteriors for their market price of risk, making such factors easily detectable. Second, posterior inference is robust to the presence of such factors. Third, we show that flat priors for risk premia lead to improper marginal like- lihoods, rendering model selection invalid. Therefore, we provide a novel prior, that is diuse for strong factors but shrinks away useless ones, under which posterior probabilities are well behaved, and can be used for factor and (non necessarily nested) model selection, as well as model averaging, in large scale problems. We apply our method to a very large set of factors proposed in the literature, and analyse 2.25 quadrillion possible models, gaining novel insights on the empirical drivers of asset returns. Keywords: Cross-sectional asset pricing, factor models, model evaluation, multiple testing, data mining, p-hacking, Bayesian methods. JEL codes: G12, C11, C12, C52, C58. Any errors or omissions are the responsibility of the authors. Christian Julliard thanks the Economic and Social Research Council (UK) [grant number: ES/K002309/1] for financial support. For helpful comments, discussions and suggestions, we thank Mikhail Chernov, Pierre Collin-Dufresne, Aureo de Paula, Marcelo Fernandes, Rodrigo Guimaraes, Alexander Michaelides, Olivier Scaillet, and George Tauchen. London Business School, [email protected] Department of Finance, London School of Economics, [email protected] § Department of Finance, FMG, and SRC, London School of Economics, and CEPR; [email protected].

Upload: others

Post on 17-Apr-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bayesian Solutions for the Factor Zoo: We Just Ran Two

Bayesian Solutions for the Factor Zoo

We Just Ran Two Quadrillion Modelslowast

Svetlana Bryzgalovadagger Jiantao HuangDagger Christian Julliardsect

November 18 2019

Abstract

We propose a novel and simple Bayesian estimation and model selection procedure for cross-sectional asset pricing Our approach that allows for both tradable and non-tradable factorsand is applicable to high dimensional cases has several desirable properties First weak andspurious factors lead to diffuse and centered at zero posteriors for their market price of riskmaking such factors easily detectable Second posterior inference is robust to the presence ofsuch factors Third we show that flat priors for risk premia lead to improper marginal like-lihoods rendering model selection invalid Therefore we provide a novel prior that is diffusefor strong factors but shrinks away useless ones under which posterior probabilities are wellbehaved and can be used for factor and (non necessarily nested) model selection as well asmodel averaging in large scale problems We apply our method to a very large set of factorsproposed in the literature and analyse 225 quadrillion possible models gaining novel insightson the empirical drivers of asset returns

Keywords Cross-sectional asset pricing factor models model evaluation multiple testing datamining p-hacking Bayesian methodsJEL codes G12 C11 C12 C52 C58

lowastAny errors or omissions are the responsibility of the authors Christian Julliard thanks the Economic and Social Research

Council (UK) [grant number ESK0023091] for financial support For helpful comments discussions and suggestions we thank

Mikhail Chernov Pierre Collin-Dufresne Aureo de Paula Marcelo Fernandes Rodrigo Guimaraes Alexander Michaelides

Olivier Scaillet and George TauchendaggerLondon Business School sbryzgalovalondoneduDaggerDepartment of Finance London School of Economics JHuang27lseacuksectDepartment of Finance FMG and SRC London School of Economics and CEPR cjulliardlseacuk

I Introduction

In the last decade or so two observations have come to the forefront of the empirical asset pricing

literature First at current production rates in the near future we will have more sources of em-

pirically ldquoidentifiedrdquo risk than stock returns to price with these factors ndash the so called factors zoo

phenomenon (see eg Harvey Liu and Zhu (2016)) Second given the commonly used estimation

methods in empirical asset pricing useless factors (ie factors whose true covariance with asset

returns is asymptotically zero) are not only likely to appear empirically relevant but also invali-

date inference regarding the true sources of risk (see eg Gospodinov Kan and Robotti (2019))

Nevertheless to the best of our knowledge no general method has been suggested to date that

i) is applicable to both tradeable and non tradeable factors can ii) handle the very large factor

zoo and iii) remains valid under model misspecification while iv) being robust to the spurious

inference problem And that is what we provide

As stressed by Harvey (2017) in his AFA presidential address the first observation naturally

calls for a Bayesian solution ndash and we develop one Furthermore we show that the two fundamental

problems above are tightly connected and a naive Bayesian approach to model selection may fail in

the presence of spurious factors Hence we correct it and apply our method to the zoo of traded and

non-traded factors proposed in the literature jointly evaluating 225 quadrillion models and gaining

novel insights on the empirical drivers of asset returns In particular we find that only a handful

of factors proposed in the previous literature are robust explanators of the cross-section of asset

returns and a four robust factor model easily outperforms canonical factor models Nevertheless

we also show that the lsquotruersquo latent stochastic discount factor (SDF) is dense is the space of empirical

asset pricing factors ie a large set of factors is needed to fully capture its pricing implications

Nonetheless the SDF-implied maximum Sharpe ratio in the economy is not unrealistically high

First we develop a very simple Bayesian version of the canonical Fama and MacBeth (1973)

regression method that is applicable to both traded and non-traded factors This approach makes

useless factors easily detectable in finite sample while delivering sharp posteriors for the strong

factorsrsquo risk premia (ie leaving inference about them unaffected) The result is quite intuitive

Useless factors make frequentist inference unreliable since when factor exposures go to zero risk

premia are no more identified We show that exactly the same phenomenon causes the posterior

credible intervals of risk premia to become diffuse and centered at zero which makes them easily

detectable in empirical applications This contribution is meant to add to the empirical researcher

toolset an approach for robust inference that is as easy to implement as eg the canonical Shanken

(1992) correction of the standard errors

Second the main intent of this paper is to provide a method for handling inference on the

entirety of the factor zoo at once This naturally calls for the use of model posterior probabilities

But we show that under flat priors for risk premia model and factors selection based on marginal

likelihoods (ie on posterior model probabilities or Bayes factors) is unreliable asymptotically

useless factors get selected with probability one This is due to the fact that lack of identification

generates an unbounded manifold for the risk premia parameters over which the likelihood surface

1

is totally flat1 Hence integration applied directly to the likelihood as if it were an unnormalized

probability distribution function (pdf) produces improper marginal ldquoposteriorsrdquo As a result in the

presence of identification failure naive Bayesian inference has the same weakness as the frequentist

one This observation however not only illustrates the nature of the problem but also suggests

how to restore inference use suitable non-informative but yet non-flat priors

Third building upon the literature on predictor selection (see eg Ishwaran Rao et al (2005)

and Giannone Lenza and Primiceri (2018)) we provide a novel (continuous) ldquospike-and-slabrdquo prior

that restores the validity of model selection based on posterior model probabilities and Bayes factors

The prior is uninformative (the ldquoslabrdquo) for strong factors but shrinks away (the ldquospikerdquo) useless

factors This approach is similar in spirit to a ridge regression and acts as a (Tikhonov-Phillips)

regularization of the likelihood function of the cross-sectional regression needed to estimate risk

premia A distinguishing feature of our prior is that the prior variance of a factorrsquos risk premium is

proportional to its correlation with the test asset returns Hence when a useless factor is present

the prior variance of its risk premium converges to zero so the shrinkage dominates and forces its

posterior distribution to concentrated around zero Not only this prior restores integrability but

also i) makes it computationally feasible to analyse quadrillions of alternative factor models ii)

allows the researcher to encode prior beliefs about the sparsity of the true SDF without imposing

hard thresholds and iii) shrinks the estimate of useless factorsrsquo risk premia toward zero We regard

this novel spike-and-slab prior approach as a solution for the high-dimensional inference problem

generated by the factor zoo

Our method is easy to implement and in all of our simulations has good finite sample properties

even when the cross-section of test assets is large We investigate its performance for risk premia

estimation model evaluation and factor selection in a range of simulation designs that mimic the

stylized features of returns Our simulations account for potential model misspecification and the

presence of either strong or useless factors in the model The use of posterior sampling naturally

allows to build credible confidence intervals not only for risk premia but also other statistics of

interest such as the cross-sectional R2 that is notoriously hard to estimate precisely (Lewellen

Nagel and Shanken (2010))

We show that whenever risk premia are well identified both our method and the frequentist

approach provide valid confidence intervals for model parameters with empirical coverage being

close to its nominal size The posterior distribution for useless factors however is reliably centered

around zero and quickly revels them even in a relatively short sample We find that the posterior

of strong factors is largely unaffected by the identification failure with the posterior coverage

corresponding to its nominal size as well In other words the Bayesian approach restores reliable

statistical inference in the model

We also demonstrate the pitfalls of flat priors for risk premia with the same simulation design

their use leads to selecting useless factor with probability approaching 1 for all the sample sizes

However our spike-and-slab prior seems to successfully eliminate them from the model while

1This is similar to the effect of ldquoweak instrumentsrdquo in IV estimations as discussed in Sims (2007)

2

retaining the true sources of risk

Our results have important empirical implications for the estimation of popular linear factor

models and their comparison We jointly evaluate 51 factors proposed in the previous literature

yielding a total of 225 quadrillion possible models to analyze and find that only a handful of

factors are robust explanators of the cross-section of asset returns (the Fama and French (1992)

ldquohigh-minus-lowrdquo proxy for the value premium the market index as well the adjusted versions

of the ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos

(2018))

Jointly the four robust factors provide a model that is compared to the previous empirical

literature one order of magnitude more likely to have generated the observed asset returns (itrsquos

posterior probability is about 90) However we show that with very high probability the ldquotruerdquo

latent SDF is dense in the space of factors ie capturing its characteristics requires the use of 24-25

factors Nevertheless the SDF-implied maximum Sharpe ratio is not excessive suggesting a high

degree of commonality in terms of captured risks among the factors in the zoo

Furthermore we apply our useless factors detection method to a selection of popular linear

SDFs We find that many non-traded factors such as consumption proxies labour factors or the

consumption-to-wealth ratio cay are only weakly identified at best and are characterised by a

substantial degree of model misspecification and uncertainty

I1 Closely related literature

There are numerous contributions to the literature that rely on the use of Bayesian tools in finance

especially in the areas of asset allocation (for an excellent overview see Fabozzi Huang and Zhou

(2010) and Avramov and Zhou (2010)) model selection (eg Barillas and Shanken (2018)) and

performance evaluation (Baks Metrick and Wachter (2001) Pastor and Stambaugh (2002) Jones

and Shanken (2005) Harvey and Liu (2019)) In this paper therefore we aim to provide only an

overview of the literature that is most closely related to our paper

While there are multiple papers in the literature that adopt a Bayesian approach to analyse

linear factor models and portfolio choice most of them focus on the time-series regressions where

the intercepts thanks to factors being directly traded (or using their mimicking portfolios) can be

interpreted as the vector of pricing errors ndash the αrsquos In this case the use of test assets actually

becomes irrelevant since the problem of comparing model performance is reduced to the spanning

tests of one set of factors by another (eg Barillas and Shanken (2018))

Our paper instead develops a method that can be applied to both tradable and non-tradable

factors As a result we focus on the general pricing performance in the cross-section of asset returns

(which is no longer irrelevant) and show that there is a tight link between the use of the most

popular diffuse priors for the risk premia and the failure of the standard estimation techniques

in the presence of useless factors (eg Kan and Zhang (1999a))

Shanken (1987) and Harvey and Zhou (1990) were probably the first contributions to the lit-

erature that adapted the Bayesian framework to the analysis of portfolio choice and developed

3

GRS-type tests (cf Gibbons Ross and Shanken (1989)) for mean-variance efficiency While

Shanken (1987) was the first to examine the posterior odds ratio for portfolio alphas in the linear

factor model Harvey and Zhou (1990) set the benchmark by imposing the priors both diffuse and

informative directly on the deep model parameters Using the data on 12 industry portfolios they

reject the mean-variance efficiency of the market return with a high degree of certainty

Pastor and Stambaugh (2000) and Pastor (2000) directly assign a prior distribution to the

pricing errors α α sim N (0κΣ) where Σ is the variance-covariance matrix of returns and κ isin R+

and apply it to the Bayesian portfolio choice problem The intuition behind their prior is that it

imposes a degree of shrinkage on the alphas so that whenever factor models are misspecified the

pricing errors cannot be too large a priori hence placing a bound on the Sharpe ratio achievable

in this economy Therefore a diffuse prior for the pricing errors α in general should be avoided

Barillas and Shanken (2018) extend the aforementioned prior to derive a closed-form solution

for the Bayesrsquo factor and use it to compare different linear factor models exploiting the time series

dimension of the data2 In a recent paper Goyal He and Huh (2018) also extend the notion of

distance between alternative model specifications and highlight the tension between the power of

the GRS-type tests and the absolute return-based measures of mispricing

Last but not the least there is a general close connection between the Bayesian approach

to model selection or parameter estimation and the shrinkage-based one Garlappi Uppal and

Wang (2007) impose a set of different priors on expected returns and the variance-covarince matrix

and find that the shrinkage-based analogue leads to superior empirical performance Anderson

and Cheng (2016) develop a Bayesian model-averaging approach to portfolio choice with model

uncertainty being one of the key ingredients that yield robust asset allocation and superior out-of-

sample performance of the strategies Finally the shrinkage-based approach to recovering the SDF

of Kozak Nagel and Santosh (2019) can also be interpreted from a Bayesian perspective Within

a universe of characteristic-managed portfolios the authors assign prior distributions to expected

returns3 and their posterior maximum likelihood estimators resemble a ridge regression Instead

we work directly with tradable and nontradable factors and consider (endogenously) heterogenous

priors for factor risk premia λ The dispersion of our prior for each λ directly depends on the

correlation between test assets and the factor so that it mimics the strength of the identification

of the factor risk premium

Naturally our paper also contributes to the very active (and growing) body of work that criti-

cally evaluates existing findings in the empirical asset pricing literature and tries to develop a robust

methodology There is ample empirical evidence that most linear asset pricing models are misspec-

ified (eg Chernov Lochstoer and Lundeby (2019) He Huang and Zhou (2018) Giglio and Xiu

(2018)) Gospodinov Kan and Robotti (2014) develop a general approach for misspecification-

robust inference that provides valid confidence interval for the pseudo-true values of the risk premia

2Chib Zeng and Zhao (forthcoming) show that the improper prior specification of Barillas and Shanken (2018)is problematic and propose a new class of priors that leads to valid model comparison

3Or equivalently the coefficients b when the linear stochastic discount factor is represented as mt = 1 minus (ft minusE[ft])

⊤b where ft and E denote respectively a vector of factors and the unconditional expectation operator

4

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 2: Bayesian Solutions for the Factor Zoo: We Just Ran Two

I Introduction

In the last decade or so two observations have come to the forefront of the empirical asset pricing

literature First at current production rates in the near future we will have more sources of em-

pirically ldquoidentifiedrdquo risk than stock returns to price with these factors ndash the so called factors zoo

phenomenon (see eg Harvey Liu and Zhu (2016)) Second given the commonly used estimation

methods in empirical asset pricing useless factors (ie factors whose true covariance with asset

returns is asymptotically zero) are not only likely to appear empirically relevant but also invali-

date inference regarding the true sources of risk (see eg Gospodinov Kan and Robotti (2019))

Nevertheless to the best of our knowledge no general method has been suggested to date that

i) is applicable to both tradeable and non tradeable factors can ii) handle the very large factor

zoo and iii) remains valid under model misspecification while iv) being robust to the spurious

inference problem And that is what we provide

As stressed by Harvey (2017) in his AFA presidential address the first observation naturally

calls for a Bayesian solution ndash and we develop one Furthermore we show that the two fundamental

problems above are tightly connected and a naive Bayesian approach to model selection may fail in

the presence of spurious factors Hence we correct it and apply our method to the zoo of traded and

non-traded factors proposed in the literature jointly evaluating 225 quadrillion models and gaining

novel insights on the empirical drivers of asset returns In particular we find that only a handful

of factors proposed in the previous literature are robust explanators of the cross-section of asset

returns and a four robust factor model easily outperforms canonical factor models Nevertheless

we also show that the lsquotruersquo latent stochastic discount factor (SDF) is dense is the space of empirical

asset pricing factors ie a large set of factors is needed to fully capture its pricing implications

Nonetheless the SDF-implied maximum Sharpe ratio in the economy is not unrealistically high

First we develop a very simple Bayesian version of the canonical Fama and MacBeth (1973)

regression method that is applicable to both traded and non-traded factors This approach makes

useless factors easily detectable in finite sample while delivering sharp posteriors for the strong

factorsrsquo risk premia (ie leaving inference about them unaffected) The result is quite intuitive

Useless factors make frequentist inference unreliable since when factor exposures go to zero risk

premia are no more identified We show that exactly the same phenomenon causes the posterior

credible intervals of risk premia to become diffuse and centered at zero which makes them easily

detectable in empirical applications This contribution is meant to add to the empirical researcher

toolset an approach for robust inference that is as easy to implement as eg the canonical Shanken

(1992) correction of the standard errors

Second the main intent of this paper is to provide a method for handling inference on the

entirety of the factor zoo at once This naturally calls for the use of model posterior probabilities

But we show that under flat priors for risk premia model and factors selection based on marginal

likelihoods (ie on posterior model probabilities or Bayes factors) is unreliable asymptotically

useless factors get selected with probability one This is due to the fact that lack of identification

generates an unbounded manifold for the risk premia parameters over which the likelihood surface

1

is totally flat1 Hence integration applied directly to the likelihood as if it were an unnormalized

probability distribution function (pdf) produces improper marginal ldquoposteriorsrdquo As a result in the

presence of identification failure naive Bayesian inference has the same weakness as the frequentist

one This observation however not only illustrates the nature of the problem but also suggests

how to restore inference use suitable non-informative but yet non-flat priors

Third building upon the literature on predictor selection (see eg Ishwaran Rao et al (2005)

and Giannone Lenza and Primiceri (2018)) we provide a novel (continuous) ldquospike-and-slabrdquo prior

that restores the validity of model selection based on posterior model probabilities and Bayes factors

The prior is uninformative (the ldquoslabrdquo) for strong factors but shrinks away (the ldquospikerdquo) useless

factors This approach is similar in spirit to a ridge regression and acts as a (Tikhonov-Phillips)

regularization of the likelihood function of the cross-sectional regression needed to estimate risk

premia A distinguishing feature of our prior is that the prior variance of a factorrsquos risk premium is

proportional to its correlation with the test asset returns Hence when a useless factor is present

the prior variance of its risk premium converges to zero so the shrinkage dominates and forces its

posterior distribution to concentrated around zero Not only this prior restores integrability but

also i) makes it computationally feasible to analyse quadrillions of alternative factor models ii)

allows the researcher to encode prior beliefs about the sparsity of the true SDF without imposing

hard thresholds and iii) shrinks the estimate of useless factorsrsquo risk premia toward zero We regard

this novel spike-and-slab prior approach as a solution for the high-dimensional inference problem

generated by the factor zoo

Our method is easy to implement and in all of our simulations has good finite sample properties

even when the cross-section of test assets is large We investigate its performance for risk premia

estimation model evaluation and factor selection in a range of simulation designs that mimic the

stylized features of returns Our simulations account for potential model misspecification and the

presence of either strong or useless factors in the model The use of posterior sampling naturally

allows to build credible confidence intervals not only for risk premia but also other statistics of

interest such as the cross-sectional R2 that is notoriously hard to estimate precisely (Lewellen

Nagel and Shanken (2010))

We show that whenever risk premia are well identified both our method and the frequentist

approach provide valid confidence intervals for model parameters with empirical coverage being

close to its nominal size The posterior distribution for useless factors however is reliably centered

around zero and quickly revels them even in a relatively short sample We find that the posterior

of strong factors is largely unaffected by the identification failure with the posterior coverage

corresponding to its nominal size as well In other words the Bayesian approach restores reliable

statistical inference in the model

We also demonstrate the pitfalls of flat priors for risk premia with the same simulation design

their use leads to selecting useless factor with probability approaching 1 for all the sample sizes

However our spike-and-slab prior seems to successfully eliminate them from the model while

1This is similar to the effect of ldquoweak instrumentsrdquo in IV estimations as discussed in Sims (2007)

2

retaining the true sources of risk

Our results have important empirical implications for the estimation of popular linear factor

models and their comparison We jointly evaluate 51 factors proposed in the previous literature

yielding a total of 225 quadrillion possible models to analyze and find that only a handful of

factors are robust explanators of the cross-section of asset returns (the Fama and French (1992)

ldquohigh-minus-lowrdquo proxy for the value premium the market index as well the adjusted versions

of the ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos

(2018))

Jointly the four robust factors provide a model that is compared to the previous empirical

literature one order of magnitude more likely to have generated the observed asset returns (itrsquos

posterior probability is about 90) However we show that with very high probability the ldquotruerdquo

latent SDF is dense in the space of factors ie capturing its characteristics requires the use of 24-25

factors Nevertheless the SDF-implied maximum Sharpe ratio is not excessive suggesting a high

degree of commonality in terms of captured risks among the factors in the zoo

Furthermore we apply our useless factors detection method to a selection of popular linear

SDFs We find that many non-traded factors such as consumption proxies labour factors or the

consumption-to-wealth ratio cay are only weakly identified at best and are characterised by a

substantial degree of model misspecification and uncertainty

I1 Closely related literature

There are numerous contributions to the literature that rely on the use of Bayesian tools in finance

especially in the areas of asset allocation (for an excellent overview see Fabozzi Huang and Zhou

(2010) and Avramov and Zhou (2010)) model selection (eg Barillas and Shanken (2018)) and

performance evaluation (Baks Metrick and Wachter (2001) Pastor and Stambaugh (2002) Jones

and Shanken (2005) Harvey and Liu (2019)) In this paper therefore we aim to provide only an

overview of the literature that is most closely related to our paper

While there are multiple papers in the literature that adopt a Bayesian approach to analyse

linear factor models and portfolio choice most of them focus on the time-series regressions where

the intercepts thanks to factors being directly traded (or using their mimicking portfolios) can be

interpreted as the vector of pricing errors ndash the αrsquos In this case the use of test assets actually

becomes irrelevant since the problem of comparing model performance is reduced to the spanning

tests of one set of factors by another (eg Barillas and Shanken (2018))

Our paper instead develops a method that can be applied to both tradable and non-tradable

factors As a result we focus on the general pricing performance in the cross-section of asset returns

(which is no longer irrelevant) and show that there is a tight link between the use of the most

popular diffuse priors for the risk premia and the failure of the standard estimation techniques

in the presence of useless factors (eg Kan and Zhang (1999a))

Shanken (1987) and Harvey and Zhou (1990) were probably the first contributions to the lit-

erature that adapted the Bayesian framework to the analysis of portfolio choice and developed

3

GRS-type tests (cf Gibbons Ross and Shanken (1989)) for mean-variance efficiency While

Shanken (1987) was the first to examine the posterior odds ratio for portfolio alphas in the linear

factor model Harvey and Zhou (1990) set the benchmark by imposing the priors both diffuse and

informative directly on the deep model parameters Using the data on 12 industry portfolios they

reject the mean-variance efficiency of the market return with a high degree of certainty

Pastor and Stambaugh (2000) and Pastor (2000) directly assign a prior distribution to the

pricing errors α α sim N (0κΣ) where Σ is the variance-covariance matrix of returns and κ isin R+

and apply it to the Bayesian portfolio choice problem The intuition behind their prior is that it

imposes a degree of shrinkage on the alphas so that whenever factor models are misspecified the

pricing errors cannot be too large a priori hence placing a bound on the Sharpe ratio achievable

in this economy Therefore a diffuse prior for the pricing errors α in general should be avoided

Barillas and Shanken (2018) extend the aforementioned prior to derive a closed-form solution

for the Bayesrsquo factor and use it to compare different linear factor models exploiting the time series

dimension of the data2 In a recent paper Goyal He and Huh (2018) also extend the notion of

distance between alternative model specifications and highlight the tension between the power of

the GRS-type tests and the absolute return-based measures of mispricing

Last but not the least there is a general close connection between the Bayesian approach

to model selection or parameter estimation and the shrinkage-based one Garlappi Uppal and

Wang (2007) impose a set of different priors on expected returns and the variance-covarince matrix

and find that the shrinkage-based analogue leads to superior empirical performance Anderson

and Cheng (2016) develop a Bayesian model-averaging approach to portfolio choice with model

uncertainty being one of the key ingredients that yield robust asset allocation and superior out-of-

sample performance of the strategies Finally the shrinkage-based approach to recovering the SDF

of Kozak Nagel and Santosh (2019) can also be interpreted from a Bayesian perspective Within

a universe of characteristic-managed portfolios the authors assign prior distributions to expected

returns3 and their posterior maximum likelihood estimators resemble a ridge regression Instead

we work directly with tradable and nontradable factors and consider (endogenously) heterogenous

priors for factor risk premia λ The dispersion of our prior for each λ directly depends on the

correlation between test assets and the factor so that it mimics the strength of the identification

of the factor risk premium

Naturally our paper also contributes to the very active (and growing) body of work that criti-

cally evaluates existing findings in the empirical asset pricing literature and tries to develop a robust

methodology There is ample empirical evidence that most linear asset pricing models are misspec-

ified (eg Chernov Lochstoer and Lundeby (2019) He Huang and Zhou (2018) Giglio and Xiu

(2018)) Gospodinov Kan and Robotti (2014) develop a general approach for misspecification-

robust inference that provides valid confidence interval for the pseudo-true values of the risk premia

2Chib Zeng and Zhao (forthcoming) show that the improper prior specification of Barillas and Shanken (2018)is problematic and propose a new class of priors that leads to valid model comparison

3Or equivalently the coefficients b when the linear stochastic discount factor is represented as mt = 1 minus (ft minusE[ft])

⊤b where ft and E denote respectively a vector of factors and the unconditional expectation operator

4

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 3: Bayesian Solutions for the Factor Zoo: We Just Ran Two

is totally flat1 Hence integration applied directly to the likelihood as if it were an unnormalized

probability distribution function (pdf) produces improper marginal ldquoposteriorsrdquo As a result in the

presence of identification failure naive Bayesian inference has the same weakness as the frequentist

one This observation however not only illustrates the nature of the problem but also suggests

how to restore inference use suitable non-informative but yet non-flat priors

Third building upon the literature on predictor selection (see eg Ishwaran Rao et al (2005)

and Giannone Lenza and Primiceri (2018)) we provide a novel (continuous) ldquospike-and-slabrdquo prior

that restores the validity of model selection based on posterior model probabilities and Bayes factors

The prior is uninformative (the ldquoslabrdquo) for strong factors but shrinks away (the ldquospikerdquo) useless

factors This approach is similar in spirit to a ridge regression and acts as a (Tikhonov-Phillips)

regularization of the likelihood function of the cross-sectional regression needed to estimate risk

premia A distinguishing feature of our prior is that the prior variance of a factorrsquos risk premium is

proportional to its correlation with the test asset returns Hence when a useless factor is present

the prior variance of its risk premium converges to zero so the shrinkage dominates and forces its

posterior distribution to concentrated around zero Not only this prior restores integrability but

also i) makes it computationally feasible to analyse quadrillions of alternative factor models ii)

allows the researcher to encode prior beliefs about the sparsity of the true SDF without imposing

hard thresholds and iii) shrinks the estimate of useless factorsrsquo risk premia toward zero We regard

this novel spike-and-slab prior approach as a solution for the high-dimensional inference problem

generated by the factor zoo

Our method is easy to implement and in all of our simulations has good finite sample properties

even when the cross-section of test assets is large We investigate its performance for risk premia

estimation model evaluation and factor selection in a range of simulation designs that mimic the

stylized features of returns Our simulations account for potential model misspecification and the

presence of either strong or useless factors in the model The use of posterior sampling naturally

allows to build credible confidence intervals not only for risk premia but also other statistics of

interest such as the cross-sectional R2 that is notoriously hard to estimate precisely (Lewellen

Nagel and Shanken (2010))

We show that whenever risk premia are well identified both our method and the frequentist

approach provide valid confidence intervals for model parameters with empirical coverage being

close to its nominal size The posterior distribution for useless factors however is reliably centered

around zero and quickly revels them even in a relatively short sample We find that the posterior

of strong factors is largely unaffected by the identification failure with the posterior coverage

corresponding to its nominal size as well In other words the Bayesian approach restores reliable

statistical inference in the model

We also demonstrate the pitfalls of flat priors for risk premia with the same simulation design

their use leads to selecting useless factor with probability approaching 1 for all the sample sizes

However our spike-and-slab prior seems to successfully eliminate them from the model while

1This is similar to the effect of ldquoweak instrumentsrdquo in IV estimations as discussed in Sims (2007)

2

retaining the true sources of risk

Our results have important empirical implications for the estimation of popular linear factor

models and their comparison We jointly evaluate 51 factors proposed in the previous literature

yielding a total of 225 quadrillion possible models to analyze and find that only a handful of

factors are robust explanators of the cross-section of asset returns (the Fama and French (1992)

ldquohigh-minus-lowrdquo proxy for the value premium the market index as well the adjusted versions

of the ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos

(2018))

Jointly the four robust factors provide a model that is compared to the previous empirical

literature one order of magnitude more likely to have generated the observed asset returns (itrsquos

posterior probability is about 90) However we show that with very high probability the ldquotruerdquo

latent SDF is dense in the space of factors ie capturing its characteristics requires the use of 24-25

factors Nevertheless the SDF-implied maximum Sharpe ratio is not excessive suggesting a high

degree of commonality in terms of captured risks among the factors in the zoo

Furthermore we apply our useless factors detection method to a selection of popular linear

SDFs We find that many non-traded factors such as consumption proxies labour factors or the

consumption-to-wealth ratio cay are only weakly identified at best and are characterised by a

substantial degree of model misspecification and uncertainty

I1 Closely related literature

There are numerous contributions to the literature that rely on the use of Bayesian tools in finance

especially in the areas of asset allocation (for an excellent overview see Fabozzi Huang and Zhou

(2010) and Avramov and Zhou (2010)) model selection (eg Barillas and Shanken (2018)) and

performance evaluation (Baks Metrick and Wachter (2001) Pastor and Stambaugh (2002) Jones

and Shanken (2005) Harvey and Liu (2019)) In this paper therefore we aim to provide only an

overview of the literature that is most closely related to our paper

While there are multiple papers in the literature that adopt a Bayesian approach to analyse

linear factor models and portfolio choice most of them focus on the time-series regressions where

the intercepts thanks to factors being directly traded (or using their mimicking portfolios) can be

interpreted as the vector of pricing errors ndash the αrsquos In this case the use of test assets actually

becomes irrelevant since the problem of comparing model performance is reduced to the spanning

tests of one set of factors by another (eg Barillas and Shanken (2018))

Our paper instead develops a method that can be applied to both tradable and non-tradable

factors As a result we focus on the general pricing performance in the cross-section of asset returns

(which is no longer irrelevant) and show that there is a tight link between the use of the most

popular diffuse priors for the risk premia and the failure of the standard estimation techniques

in the presence of useless factors (eg Kan and Zhang (1999a))

Shanken (1987) and Harvey and Zhou (1990) were probably the first contributions to the lit-

erature that adapted the Bayesian framework to the analysis of portfolio choice and developed

3

GRS-type tests (cf Gibbons Ross and Shanken (1989)) for mean-variance efficiency While

Shanken (1987) was the first to examine the posterior odds ratio for portfolio alphas in the linear

factor model Harvey and Zhou (1990) set the benchmark by imposing the priors both diffuse and

informative directly on the deep model parameters Using the data on 12 industry portfolios they

reject the mean-variance efficiency of the market return with a high degree of certainty

Pastor and Stambaugh (2000) and Pastor (2000) directly assign a prior distribution to the

pricing errors α α sim N (0κΣ) where Σ is the variance-covariance matrix of returns and κ isin R+

and apply it to the Bayesian portfolio choice problem The intuition behind their prior is that it

imposes a degree of shrinkage on the alphas so that whenever factor models are misspecified the

pricing errors cannot be too large a priori hence placing a bound on the Sharpe ratio achievable

in this economy Therefore a diffuse prior for the pricing errors α in general should be avoided

Barillas and Shanken (2018) extend the aforementioned prior to derive a closed-form solution

for the Bayesrsquo factor and use it to compare different linear factor models exploiting the time series

dimension of the data2 In a recent paper Goyal He and Huh (2018) also extend the notion of

distance between alternative model specifications and highlight the tension between the power of

the GRS-type tests and the absolute return-based measures of mispricing

Last but not the least there is a general close connection between the Bayesian approach

to model selection or parameter estimation and the shrinkage-based one Garlappi Uppal and

Wang (2007) impose a set of different priors on expected returns and the variance-covarince matrix

and find that the shrinkage-based analogue leads to superior empirical performance Anderson

and Cheng (2016) develop a Bayesian model-averaging approach to portfolio choice with model

uncertainty being one of the key ingredients that yield robust asset allocation and superior out-of-

sample performance of the strategies Finally the shrinkage-based approach to recovering the SDF

of Kozak Nagel and Santosh (2019) can also be interpreted from a Bayesian perspective Within

a universe of characteristic-managed portfolios the authors assign prior distributions to expected

returns3 and their posterior maximum likelihood estimators resemble a ridge regression Instead

we work directly with tradable and nontradable factors and consider (endogenously) heterogenous

priors for factor risk premia λ The dispersion of our prior for each λ directly depends on the

correlation between test assets and the factor so that it mimics the strength of the identification

of the factor risk premium

Naturally our paper also contributes to the very active (and growing) body of work that criti-

cally evaluates existing findings in the empirical asset pricing literature and tries to develop a robust

methodology There is ample empirical evidence that most linear asset pricing models are misspec-

ified (eg Chernov Lochstoer and Lundeby (2019) He Huang and Zhou (2018) Giglio and Xiu

(2018)) Gospodinov Kan and Robotti (2014) develop a general approach for misspecification-

robust inference that provides valid confidence interval for the pseudo-true values of the risk premia

2Chib Zeng and Zhao (forthcoming) show that the improper prior specification of Barillas and Shanken (2018)is problematic and propose a new class of priors that leads to valid model comparison

3Or equivalently the coefficients b when the linear stochastic discount factor is represented as mt = 1 minus (ft minusE[ft])

⊤b where ft and E denote respectively a vector of factors and the unconditional expectation operator

4

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 4: Bayesian Solutions for the Factor Zoo: We Just Ran Two

retaining the true sources of risk

Our results have important empirical implications for the estimation of popular linear factor

models and their comparison We jointly evaluate 51 factors proposed in the previous literature

yielding a total of 225 quadrillion possible models to analyze and find that only a handful of

factors are robust explanators of the cross-section of asset returns (the Fama and French (1992)

ldquohigh-minus-lowrdquo proxy for the value premium the market index as well the adjusted versions

of the ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos

(2018))

Jointly the four robust factors provide a model that is compared to the previous empirical

literature one order of magnitude more likely to have generated the observed asset returns (itrsquos

posterior probability is about 90) However we show that with very high probability the ldquotruerdquo

latent SDF is dense in the space of factors ie capturing its characteristics requires the use of 24-25

factors Nevertheless the SDF-implied maximum Sharpe ratio is not excessive suggesting a high

degree of commonality in terms of captured risks among the factors in the zoo

Furthermore we apply our useless factors detection method to a selection of popular linear

SDFs We find that many non-traded factors such as consumption proxies labour factors or the

consumption-to-wealth ratio cay are only weakly identified at best and are characterised by a

substantial degree of model misspecification and uncertainty

I1 Closely related literature

There are numerous contributions to the literature that rely on the use of Bayesian tools in finance

especially in the areas of asset allocation (for an excellent overview see Fabozzi Huang and Zhou

(2010) and Avramov and Zhou (2010)) model selection (eg Barillas and Shanken (2018)) and

performance evaluation (Baks Metrick and Wachter (2001) Pastor and Stambaugh (2002) Jones

and Shanken (2005) Harvey and Liu (2019)) In this paper therefore we aim to provide only an

overview of the literature that is most closely related to our paper

While there are multiple papers in the literature that adopt a Bayesian approach to analyse

linear factor models and portfolio choice most of them focus on the time-series regressions where

the intercepts thanks to factors being directly traded (or using their mimicking portfolios) can be

interpreted as the vector of pricing errors ndash the αrsquos In this case the use of test assets actually

becomes irrelevant since the problem of comparing model performance is reduced to the spanning

tests of one set of factors by another (eg Barillas and Shanken (2018))

Our paper instead develops a method that can be applied to both tradable and non-tradable

factors As a result we focus on the general pricing performance in the cross-section of asset returns

(which is no longer irrelevant) and show that there is a tight link between the use of the most

popular diffuse priors for the risk premia and the failure of the standard estimation techniques

in the presence of useless factors (eg Kan and Zhang (1999a))

Shanken (1987) and Harvey and Zhou (1990) were probably the first contributions to the lit-

erature that adapted the Bayesian framework to the analysis of portfolio choice and developed

3

GRS-type tests (cf Gibbons Ross and Shanken (1989)) for mean-variance efficiency While

Shanken (1987) was the first to examine the posterior odds ratio for portfolio alphas in the linear

factor model Harvey and Zhou (1990) set the benchmark by imposing the priors both diffuse and

informative directly on the deep model parameters Using the data on 12 industry portfolios they

reject the mean-variance efficiency of the market return with a high degree of certainty

Pastor and Stambaugh (2000) and Pastor (2000) directly assign a prior distribution to the

pricing errors α α sim N (0κΣ) where Σ is the variance-covariance matrix of returns and κ isin R+

and apply it to the Bayesian portfolio choice problem The intuition behind their prior is that it

imposes a degree of shrinkage on the alphas so that whenever factor models are misspecified the

pricing errors cannot be too large a priori hence placing a bound on the Sharpe ratio achievable

in this economy Therefore a diffuse prior for the pricing errors α in general should be avoided

Barillas and Shanken (2018) extend the aforementioned prior to derive a closed-form solution

for the Bayesrsquo factor and use it to compare different linear factor models exploiting the time series

dimension of the data2 In a recent paper Goyal He and Huh (2018) also extend the notion of

distance between alternative model specifications and highlight the tension between the power of

the GRS-type tests and the absolute return-based measures of mispricing

Last but not the least there is a general close connection between the Bayesian approach

to model selection or parameter estimation and the shrinkage-based one Garlappi Uppal and

Wang (2007) impose a set of different priors on expected returns and the variance-covarince matrix

and find that the shrinkage-based analogue leads to superior empirical performance Anderson

and Cheng (2016) develop a Bayesian model-averaging approach to portfolio choice with model

uncertainty being one of the key ingredients that yield robust asset allocation and superior out-of-

sample performance of the strategies Finally the shrinkage-based approach to recovering the SDF

of Kozak Nagel and Santosh (2019) can also be interpreted from a Bayesian perspective Within

a universe of characteristic-managed portfolios the authors assign prior distributions to expected

returns3 and their posterior maximum likelihood estimators resemble a ridge regression Instead

we work directly with tradable and nontradable factors and consider (endogenously) heterogenous

priors for factor risk premia λ The dispersion of our prior for each λ directly depends on the

correlation between test assets and the factor so that it mimics the strength of the identification

of the factor risk premium

Naturally our paper also contributes to the very active (and growing) body of work that criti-

cally evaluates existing findings in the empirical asset pricing literature and tries to develop a robust

methodology There is ample empirical evidence that most linear asset pricing models are misspec-

ified (eg Chernov Lochstoer and Lundeby (2019) He Huang and Zhou (2018) Giglio and Xiu

(2018)) Gospodinov Kan and Robotti (2014) develop a general approach for misspecification-

robust inference that provides valid confidence interval for the pseudo-true values of the risk premia

2Chib Zeng and Zhao (forthcoming) show that the improper prior specification of Barillas and Shanken (2018)is problematic and propose a new class of priors that leads to valid model comparison

3Or equivalently the coefficients b when the linear stochastic discount factor is represented as mt = 1 minus (ft minusE[ft])

⊤b where ft and E denote respectively a vector of factors and the unconditional expectation operator

4

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 5: Bayesian Solutions for the Factor Zoo: We Just Ran Two

GRS-type tests (cf Gibbons Ross and Shanken (1989)) for mean-variance efficiency While

Shanken (1987) was the first to examine the posterior odds ratio for portfolio alphas in the linear

factor model Harvey and Zhou (1990) set the benchmark by imposing the priors both diffuse and

informative directly on the deep model parameters Using the data on 12 industry portfolios they

reject the mean-variance efficiency of the market return with a high degree of certainty

Pastor and Stambaugh (2000) and Pastor (2000) directly assign a prior distribution to the

pricing errors α α sim N (0κΣ) where Σ is the variance-covariance matrix of returns and κ isin R+

and apply it to the Bayesian portfolio choice problem The intuition behind their prior is that it

imposes a degree of shrinkage on the alphas so that whenever factor models are misspecified the

pricing errors cannot be too large a priori hence placing a bound on the Sharpe ratio achievable

in this economy Therefore a diffuse prior for the pricing errors α in general should be avoided

Barillas and Shanken (2018) extend the aforementioned prior to derive a closed-form solution

for the Bayesrsquo factor and use it to compare different linear factor models exploiting the time series

dimension of the data2 In a recent paper Goyal He and Huh (2018) also extend the notion of

distance between alternative model specifications and highlight the tension between the power of

the GRS-type tests and the absolute return-based measures of mispricing

Last but not the least there is a general close connection between the Bayesian approach

to model selection or parameter estimation and the shrinkage-based one Garlappi Uppal and

Wang (2007) impose a set of different priors on expected returns and the variance-covarince matrix

and find that the shrinkage-based analogue leads to superior empirical performance Anderson

and Cheng (2016) develop a Bayesian model-averaging approach to portfolio choice with model

uncertainty being one of the key ingredients that yield robust asset allocation and superior out-of-

sample performance of the strategies Finally the shrinkage-based approach to recovering the SDF

of Kozak Nagel and Santosh (2019) can also be interpreted from a Bayesian perspective Within

a universe of characteristic-managed portfolios the authors assign prior distributions to expected

returns3 and their posterior maximum likelihood estimators resemble a ridge regression Instead

we work directly with tradable and nontradable factors and consider (endogenously) heterogenous

priors for factor risk premia λ The dispersion of our prior for each λ directly depends on the

correlation between test assets and the factor so that it mimics the strength of the identification

of the factor risk premium

Naturally our paper also contributes to the very active (and growing) body of work that criti-

cally evaluates existing findings in the empirical asset pricing literature and tries to develop a robust

methodology There is ample empirical evidence that most linear asset pricing models are misspec-

ified (eg Chernov Lochstoer and Lundeby (2019) He Huang and Zhou (2018) Giglio and Xiu

(2018)) Gospodinov Kan and Robotti (2014) develop a general approach for misspecification-

robust inference that provides valid confidence interval for the pseudo-true values of the risk premia

2Chib Zeng and Zhao (forthcoming) show that the improper prior specification of Barillas and Shanken (2018)is problematic and propose a new class of priors that leads to valid model comparison

3Or equivalently the coefficients b when the linear stochastic discount factor is represented as mt = 1 minus (ft minusE[ft])

⊤b where ft and E denote respectively a vector of factors and the unconditional expectation operator

4

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 6: Bayesian Solutions for the Factor Zoo: We Just Ran Two

Giglio and Xiu (2018) exploit the invariance principle of the PCA and recover the risk premia asso-

ciated with the given factor from the projection on the span of latent factors driving a cross-section

of asset returns Uppal Zaffaroni and Zviadadze (2018) adopt a similar approach by recovering

latent factors from the residuals of the asset pricing model effectively completing the span of the

SDF Daniel Mota Rottke and Santos (2018) instead focus on the construction of the cross-

sectional factors and note that many well-established tradable portfolios such as HML and SMB

can be substantially improved in asset pricing tests by hedging their unpriced component (that does

not carry a risk premium) We do not take a stand on the origin of the factors or the completion of

the model space Instead we consider the whole universe of potential models that can be created

from the set of observable candidates factors proposed in the empirical literature As such our

analysis explicitly takes into account both standard specifications that have been successfully used

in numerous papers (eg Fama-French 3-factor models or nondurable consumption growth) as

well as the combinations of the factors that have never been explicitly tested

Following Harvey Liu and Zhu (2016) a large literature has tried to understand which of

the existing factors (or their combinations) drive the cross-section of asset returns Giglio Feng

and Xiu (2019) combine cross-sectional asset pricing regressions with the double-selection LASSO

of Belloni Chernozhukov and Hansen (2014) to provide valid uniform inference on the selected

sources of risk Huang Li and Zhou (2018) use a reduced rank approach to select among not

only the observable factors but their total span effectively allowing for sparsity not necessarily in

the observable set of factors but their combinations as well Kelly Pruitt and Su (2019) build a

latent factor model for stock returns with factor loadings being a linear function of the company

characteristics and find that only a small subset of the latter provide substantial independent

information relevant for asset pricing Our approach does not take a stand on whether there exists

a single combination of factors that substantially outperforms other model specifications Instead

we let the data speak and find out that the cross-sectional likelihood across the many models is

rather flat meaning the data is not informative enough to reliably indicate that there is a single

dominant specification

Finally our paper naturally contributes to the literature on weak identification in asset pricing

Starting from the seminal papers of Kan and Zhang (1999a 1999b) identification of risk premia has

been shown to be challenging for traditional estimation procedures Kleibergen (2009) demonstrates

that the two-pass regression of Fama-MacBeth lead to biased estimates of the risk premia and

spuriously high significance levels Moreover useless factors often crowd out the impact of the true

sources of risk in the model and lead to seemingly high levels of cross-sectional fit (Kleibergen and

Zhan (2015)) Gospodinov Kan and Robotti (2014 2019) demonstrate that most of the estimation

techniques used to recover risk premia in the cross-section are rendered invalid in the presence of

useless factors and propose alternative procedures that effectively eliminate the impact of these

factors from the model We build upon the intuition developed in these papers and formulate

the Bayesian solution to the problem by providing a prior that directly reflects the strength of

the factor Whenever the vector of correlation coefficients between asset returns and a factor is

close to zero the prior variance of λ for this specific factor also goes to zero and the penalty for

5

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 7: Bayesian Solutions for the Factor Zoo: We Just Ran Two

the risk premium converges to infinity effectively shrinking the posterior of the useless factorsrsquo

risk premia toward zero Therefore our priors are particularly robust to the presence of spurious

factors Conversely they are very diffuse for the strong factors with the posterior reflecting the

full impact of the likelihood

II Inference in Linear Factor Models

This section introduces the notation and reviews the main results of the Fama-MacBeth (FM)

regression method (see Fama and MacBeth (1973)) We focus on classic linear factor models for

cross-sectional asset returns Suppose that there are K factors ft = (f1t fKt)⊤ t = 1 T

which could be either tradable or non-tradable To simplify exposition we consider mean zero

factors that have also been demeaned in sample so that we have both E[ft] = 0K and f = 0K where

E[] denotes the unconditional expectation and the upper bar denotes the sample mean operator

The returns of N test assets in excess of the risk free rate are denoted by Rt = (R1t RNt)⊤

In the FM procedure the factor exposures of asset returns βf isin RNtimesK are recovered from

the linear regression

Rt = a+ βfft + 983171t (1)

where 9831711 983171Tiidsim N (0N Σ) and a isin RN Given the mean normalization of ft we have E[Rt] = a

The risk premia associated with the factors λf isin RK are then estimated from the cross-

sectional regression

R = λc1N + 983141βfλf +α (2)

where 983141βf denotes the time series estimates λc is a scalar average mispricing that should be equal to

zero under the null of the model being correctly specified 1N denotes an N -dimensional vector of

ones and α isin RN is the vector of pricing errors in excess of λc If the model is correctly specified

it implies the parameter restriction a = E[Rt] = λc1N + βfλf Therefore we can rewrite the

two-step Fama-MacBeth regression into one equation as

Rt = λc1N + βfλf + βfft + 983171t (3)

Equation (3) is particularly useful in our simulation study Note that the intercept λc is included

in (2) and (3) in order to separately evaluate the ability of the model to explain the average level

of the equity premium and the cross-sectional variation of asset returns

Let B⊤ = (aβf ) and F⊤t = (1f⊤

t ) and consider the matrices of stacked time series observa-

tions

R =

983091

983109983109983107

R⊤1

R⊤T

983092

983110983110983108 F =

983091

983109983109983107

F⊤1

F⊤T

983092

983110983110983108 983171 =

983091

983109983109983107

983171⊤1

983171⊤T

983092

983110983110983108

The regression in (1) can then be rewritten as R = FB + 983171 yielding the time-series estimates of

6

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 8: Bayesian Solutions for the Factor Zoo: We Just Ran Two

(aβf ) and Σ

983141B =

983075983141a⊤

983141β⊤f

983076= (F⊤F )minus1F⊤R 983141Σ =

1

T(Rminus F 983141B)⊤(Rminus F 983141B)

In the second step the OLS estimates of the factor risk premia are

983141λ = (983141β⊤ 983141β)minus1 983141β⊤R (4)

where 983141β = (1N 983141βf ) and λ⊤ = (λc λf ⊤) The canonical Shanken (1992) corrected covariance matrix

of the estimated risk premia is4

983141σ2(983141λ) = 1

T

983147(983141β⊤ 983141β)minus1 983141β⊤ 983141Σ983141β(983141β⊤ 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf )983148 (5)

where 983141Σf is the sample estimate of the variance-covariance matrix of the factors ft There are

two sources of estimation uncertainty in the OLS estimates of λ First we do not know the test

assetsrsquo expected returns but instead estimate them as sample means R According to the time-

series regression R sim N (a 1T Σ) asymptotically Second if β is known the asymptotic covariance

matrix of 983141λ is simply 1T (β

⊤β)minus1β⊤ 983141Σβ(β⊤β)minus1 The extra term (1 + λ⊤fΣ

minus1f λf ) is included to

account for the fact that βf is estimated

Alternatively we can run a (feasible) GLS regression in the second stage obtaining the estimates

983141λ = (983141β⊤ 983141Σminus1 983141β)minus1 983141β⊤ 983141Σminus1R (6)

where 983141Σ = 1T 983141983171

⊤983141983171 and 983141983171 denotes the OLS residuals and with the associated covariance matrix of

the estimates

983141σ2(983141λ) = 1

T(983141β⊤ 983141Σminus1 983141β)minus1(1 + 983141λ⊤

f983141Σminus1f

983141λf ) (7)

Equations (4) and (6) make it clear that in the presence of a spurious (or useless) factor ie

such that βj = CradicT C isin RN risk premia are no longer identified Furthermore their estimates

diverge leading to inference problems for both the useless and the strong (ie βj ∕rarr 0 as T rarr infin)

factors (see eg Kan and Zhang (1999b)) In the presence of such an identification failure the

cross-sectional R2 also becomes untrustworthy If a useless factor is included into the two-pass

regression the OLS R2 tend to be highly inflated (although the GLS R2 is less affected)5

4An alternative way (see eg Cochrane (2005) Page 242) to account for the uncertainty from ldquogenerated regres-sorsrdquo such as βf is to estimate the whole system in GMM The moments are

gT (aβf λ) =

983061IN otimes IK+1

A⊤

983062983091

983107E[Rt minus aminus βfft]

E[(Rt minus aminus βfft)otimes f⊤t ]

E[Rt minus λc1N minus βfλf ]

983092

983108 = 0

where β = (1N βf ) A⊤ = β⊤ for OLS and A⊤ = β⊤Σminus1 for GLS Also note that GLS estimation is not the sameas efficient GMM estimation

5For example Kleibergen and Zhan (2015) derive the asymptotic distribution of the R2 under the assumptionthat a few unknown factors are able to explain expected asset returns and show that in the presence of a useless

7

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 9: Bayesian Solutions for the Factor Zoo: We Just Ran Two

This problem arises not only when using the Fama-MacBeth two-step procedure Kan and

Zhang (1999a) point out that the identification condition in the GMM test of linear stochastic

discount factor models fails when a useless factor is included Moreover this leads to overrejection

of the hypothesis of a zero risk premium for the useless factor under the Wald test and the power

of the over-identifying restriction test decreases Gospodinov Kan and Robotti (2019) document

similar problems within the maximum likelihood estimation and testing framework

Consequently several papers have attempted to develop alternative statistical procedures that

are robust to the presence of useless factors Kleibergen (2009) proposes several novel statis-

tics whose large sample distributions are unaffected by the failure of the identification condition

Gospodinov Kan and Robotti (2014) derive robust standard errors for the GMM estimates of

factor risk premia in the linear stochastic factor framework and prove that t-statistics calculated

using their standard errors are robust even when the model is misspecified and a useless factor is

included Bryzgalova (2015) introduces a LASSO-like penalty term in the cross-sectional regression

to shrink the risk premium of the useless factor towards zero

In this paper we provide a Bayesian inference and model selection framework that i) can be

easily used for robust inference in the presence and detection of useless factors (section II1) and

ii) can be used for both model selection and model averaging even in the presence of a very large

number of candidate (traded or non traded and possibly useless) risk factors ndash ie the entire factor

zoo

II1 Bayesian Fama-MacBeth

This section introduces our hierarchical Bayesian Fama-MacBeth (BFM) estimation method A

formal derivation is presented in Appendix A1 To start with letrsquos consider the time-series regres-

sion We assume that the time series error terms follow an iid multivariate Gaussian distribution

(the approach at the cost of analytical solutions could be generalized to accommodate different

distributional assumptions) ie 983171 sim MVN (0TtimesN Σ otimes IT ) The likelihood of the data (RF ) is

then

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr

983147Σminus1(Rminus FB)⊤(Rminus FB)

983148983070

The time-series regression is always valid even in the presence of a spurious factor For simplicity

we choose the non-informative Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 Note that this prior

is flat in the B dimension The posterior distribution of (BΣ) is therefore

B|Σ data sim MVN983059983141BolsΣotimes (F⊤F )minus1

983060 (8)

Σ|data sim W -1983059T minusK minus 1 T Σ

983060 (9)

where 983141Bols and Σ denote the canonical OLS based estimates and W -1 is the inverse-Wishart

distribution (a multivariate generalization of the inverse-gamma distribution) From the above

factor the OLS R2 is more likely to be inflated than its GLS counterpart

8

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 10: Bayesian Solutions for the Factor Zoo: We Just Ran Two

we can sample the posterior distribution of the parameters (BΣ) by first drawing the covariance

matrix Σ from the inverse-Wishart distribution conditional on the data and then drawing B from

a multivariate normal distribution conditional on the data and the draw of Σ

If the model is correctly specified in the sense that all true factors are included expected

returns of the assets should be fully explained by their risk exposure β and the prices of risk λ

ie E[Rt] = βλ But since given our mean normalisation of the factors E[Rt] = a we have the

least square estimate (β⊤β)minus1β⊤a Therefore we can define our first estimator

Definition 1 (Bayesian Fama-MacBeth (BFM)) The posterior distribution of λ conditional

on B Σ and the data is a Dirac distribution at (β⊤β)minus1β⊤a A draw (λ(j)) from the posterior

distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j) from the Normal-

inverse-Wishart in (8)-(9) and computing (β⊤(j)β(j))

minus1β⊤(j)a(j)

The posterior distribution of λ defined above accounts both for the uncertainty about the

expected returns (via the sampling of a) and the uncertainty about the factor loadings (via the

sampling of β) Note that differently from the frequentist case in equation (5) there is no ldquoextra

termrdquo (1 + λ⊤fΣ

minus1f λf ) to account for the fact that βf is estimated The reason being that it

is unnecessary to explicitly adjust standard errors of λ in the Bayesian approach since we keep

updating βf in each simulation step automatically incorporating the uncertainty about βf into the

posterior distribution of λ Furthermore it is quite intuitive from the above definition of the BFM

estimator why we expect posterior inference to detect weak and spurious factors in finite sample

For such factors the near singularity of (β⊤(j)β(j))

minus1 will cause the draws for λ(j) to diverge as in

the frequentist case Nevertheless the posterior uncertainty about factor loadings and risk premia

will cause β⊤(j)a(j) to flip sign across draws causing the posterior distribution of λ to put substantial

probability mass on both values above and below zero Hence centered posterior credible intervals

will tend to include zero with high probability

In addition to the price of risk λ we are also interested in estimating the cross-sectional fit of

the model ie the cross-sectional R2 Once we obtain the posterior draws of the parameters we

can easily obtain the posterior distribution of the cross-sectional R2 defined as

R2ols = 1minus (aminus βλ)⊤(aminus βλ)

(aminus a1N )⊤(aminus a1N ) (10)

where a = 1N

983123Ni ai That is for each posterior draw of (a β λ) we can construct the corre-

sponding draw for the R2 from equation (10) hence tracing out its posterior distribution We can

think of equation (10) as the population R2 where a β and λ are unknown After observing

the data we infer the posterior distribution of a β and λ and from these we can recover the

distribution of the R2

However realistically the models are rarely true Therefore one might want to allow for the

presence of pricing errors α in the cross-sectional regression6 This can be easily accommodated

6As we will show in the next section this is essential for model selection

9

within our Bayesian framework since in this case the data generating process in the second stage

becomes a = βλ + α If we further assume that pricing error αi follows an independent and

identical normal distribution N (0σ2) the likelihood function in the second step becomes7

p(data|λσ2β) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070 (11)

In the cross-sectional regression the ldquodatardquo are the expected risk premia a and the factor loadings

β albeit these quantities are not directly observable to the researcher Hence in the above we

are conditioning on the knowledge of these quantities which can be sampled from the first step

Normal-inverse-Wishart posterior distribution (8)-(9) Conceptually this is not very different from

the Bayesian modeling of latent variables In the benchmark case we assume a Jeffreysrsquo diffuse

prior8 for (λσ2) π(λσ2) prop σminus2 In Appendix A1 we show that the posterior distribution of

(λσ2) is then

λ|σ2BΣ data sim N

983091

983109983107(β⊤β)minus1β⊤a983167 983166983165 983168983141λ

σ2(β⊤β)minus1

983167 983166983165 983168Σλ

983092

983110983108 (12)

σ2|BΣ data sim Γminus1

983075N minusK minus 1

2(aminus βλ)⊤(aminus βλ)

2

983076 (13)

where Γminus1 denotes the inverse-gamma distribution The conditional distribution in equation (12)

makes it clear that the posterior takes into account both the uncertainty about the market price

of risk stemming from the first stage uncertainty about the β and a (that are drawn from the

Normal-inverse-Wishart posterior in equations (8)-(9)) and the random pricing errors α that have

the conditional posterior variance in equation (13) If test assetsrsquo expected excess returns are fully

explained by β there are no pricing errors and σ2(β⊤β)minus1 converges to zero otherwise this layer

of uncertainty always exists

Note also that we can think of the posterior distribution of (β⊤β)minus1β⊤a as a Bayesian decision

makerrsquos belief about the dispersion of the Fama-MacBeth OLS estimates after observing the data

RtftTt=1 Alternatively when pricing errors α are assumed to be zero under the null hypothesis

the posterior distribution of λ in equation (12) collapses to a degenerate distribution where λ

equals (β⊤β)minus1β⊤a with probability one

Often the cross sectional step of the Fama-MacBeth estimation is performed via GLS rather than

least squares In our setting under the null of the model this leads to λ = (β⊤Σminus1β)minus1β⊤Σminus1a

Therefore we define the following GLS estimator

Definition 2 (Bayesian Fama-MacBeth GLS (BFM-GLS)) The posterior distribution of λ

conditional on B Σ and the data is a Dirac distribution at (β⊤Σminus1β)minus1β⊤Σminus1a A draw (λ(j))

7We derive a formulation with non-spherical cross-sectional pricing errors that leads to a GLS type estimator inAppendix A2

8As shown in the next subsection in the presence of useless factors such prior is not appropriate for modelselection based on Bayes factors and posterior probabilities since it does not lead to proper marginal likelihoodsTherefore we introduce therein a novel prior for model selection

10

from the posterior distribution of λ conditional on the data only is obtained by drawing B(j) and Σ(j)

from the Normal-inverse-Wishart in equations (8)-(9) and computing (β⊤(j)Σ

minus1(j)β(j))

minus1β⊤(j)Σ

minus1(j)a(j)

From the posterior sampling of the parameters in the above definition we can also obtain the

posterior distribution of the cross-sectional GLS R2 defined as

R2gls = 1minus

(aminus βλgls)⊤Σminus1(aminus βλgls)

(aminus a1N )⊤Σminus1(aminus a1N ) (14)

Once again we can think of equation (14) as the population GLS R2 that is a function of the

unknown quantities a β and λ But after observing the data we infer the posterior distribution

of the parameters and from these we recover the posterior distribution of the R2gls

Remark 1 (Generated factors) Often factors are estimated as eg in the case of principal

components (PCs) and factor mimicking portfolios (albeit the latter is not needed in our setting)

This generates an additional layer of uncertainty normally ignored in empirical analysis due to

the associated asymptotic complexities Nevertheless it is relatively easy to adjust the Bayesian

estimators of risk premia to account for this uncertainty In the case of a mimicking portfolio

under a diffuse prior and Normal errors the posterior distribution of the portfolio weights follow the

standard Normal-inverse-Gamma of Gaussian linear regression models (see eg Lancaster (2004))

Similarly in the case of principal components as factors under a diffuse prior the covariance

matrix from which the PCs are constructed follow an inverse-Wishart distribution9 Hence the

posterior distributions in Definitions 1 and 2 can account for the generated factors uncertainty by

first drawing from an inverse-Wishart the covariance matrix from which PCs are constructed or

the Normal-inverse-Gamma posterior of the mimicking portfolios coefficients and then sampling

the remaining parameters as explained in the definitions

II2 Model selection

In the previous subsection we have derived simple Bayesian estimators that deliver in finite sam-

ple credible intervals robust to the presence of spurious factors and avoid over-rejecting the null

hypothesis of zero risk premia for such factors

However given the plethora of risk factors that have been proposed in the literature a robust

approach for models selection across non-necessarily nested models and that can handle poten-

tially a very large number of possible models as well as both traded and non-traded factors is

of paramount importance for empirical asset pricing The canonical way of selecting models and

testing hypothesis within the Bayesian framework is through Bayesrsquo factors and posterior prob-

abilities and that is the approach we present in this section This is for instance the approach

suggested by Barillas and Shanken (2018) The key elements of novelty of the proposed method

are that i) our procedure is robust to the presence of spurious and weak factors ii) it is directly

9Based on these two observations Allena (2019) proposes a generalisation of Barillas and Shanken (2018) modelcomparison approach for these type of factors

11

applicable to both traded and non-traded factors and iii) it selects models based on their cross-

sectional performance (rather than the time series one) ie on the basis of the risk premia that the

factors command

In this subsection we show first that flat priors for risk premia (as the Jeffreysrsquo priors used

in section II1 for illustrative purposes) are not suitable for model selection in the presence of

spurious factors Given the close analogy between frequentist testing and Bayesian inference with

flat priors this is not too surprising But the novel insight is that the problem arises exactly

because of the use of flat priors and can therefore be fixed by using non-flat yet non-informative

priors Second we introduce ldquospike-and-slabrdquo priors that are robust to the presence of spurious

factors and particularly powerful in high-dimensional model selection ie when one wants as in

our empirical application to test all factors in the zoo

II21 Pitfalls of Flat Priors for Risk Premia

We start this section by discussing why flat priors for risk premia as the Jeffreysrsquo prior are not

desirable in model selection Since we want to focus and select models based on the cross-sectional

asset pricing properties of the factors for simplicity we retain Jeffreysrsquo priors for the time series

parameter (aβf Σ) of the first-step regression

In order to perform model selection we relax the (null) hypothesis that models are correctly

specified and allow instead for the presence of cross-sectional pricing errors That is we consider

the cross-sectional regression a = βλ + α For illustrative purposes we focus on spherical errors

but all the results in this and the following subsections can be generalized to the non-spherical error

setting in Appendix A2

Similar to many Bayesian variable selection problems we introduce a vector of binary latent

variables γ⊤ = (γ0 γ1 γK) where γj isin 0 1 When γj = 1 it indicates that the factor j

(with associated loadings βj) should be included into the model and vice versa The number of

included factors is simply given by pγ =983123K

j=0 γj Note that we do not shrink the intercept so

γ0 is always equal to 1 (as the common intercept plays the role of the first ldquofactorrdquo) The notation

βγ = [βj ]γj=1 represents a pγ-columns sub-matrix of β

When testing whether the risk premium of factor j is zero the null hypothesis is H0 λj = 0

In our notation this null hypothesis can be expressed as H0 γj = 0 while the alternative is

H1 γj = 1 This is a small but important difference relative to the canonical frequentist testing

approach for useless factors the risk premium is not identified hence testing whether it is equal to

any given value is per se problematic Nevertheless as we show in the next section with appropriate

priors whether a factor should be included or not is a well-defined question even in the presence

of useless factors

In the Bayesian framework the prior distribution of parameters under the alternative hypothesis

should be carefully specified Generally speaking the priors for nuisance parameters such as β σ2

and Σ do not greatly influence the cross-sectional inference But as we are about to show this is

not the case for the priors about risk premia

12

Recall that when considering multiple models say wlog model γ and model γ prime by Bayesrsquo

theorem we have that the posterior probability of model γ is

Pr(γ|data) = p(data|γ)p(data|γ) + p(data|γ prime)

where we have given equal prior probability to each model and p(data|γ) denotes the marginal

likelihood of the model indexed by γ In Appendix A3 we show that when using a Jeffreysrsquo prior

(that is flat for λ) the marginal likelihood is

p(data|γ) prop (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ983059Nminuspγ+1

2

983060

983059N σ2

γ

2

983060Nminuspγ+1

2

(15)

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

Therefore if model γ includes a useless factor (whose β asymptotically converges to zero) the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero sending the marginal likelihood

in (15) to infinity As a result the posterior probability of the model containing the spurious factor

go to one Consequently under a flat prior for risk premia the model containing a useless factor

will always be selected asymptotically However the posterior distribution of λ for the spurious

factor is robust and particularly disperse in any finite sample

Moreover it is highly likely that conclusions based on the posterior coverage of λ contradict

those arising from Bayesrsquo factors When the prior distribution of λj is too diffuse under the

alternative hypothesis H1 the Bayesrsquo factor tends to favor H0 over H1 even though the estimate

of λj is far from 0 The reason is that even though H0 seems quite unlikely based on posterior

coverages the data is even more unlikely under H1 than under H0 Therefore a disperse prior for

λj may push the posterior probabilities to favor H0 and make it fail to identity true factors This

phenomenon is the so called ldquoBartlett Paradoxrdquo (see Bartlett (1957))

Note also that flat hence improper priors for the risk premia are not legitimate since they render

the posterior model probabilities arbitrary Suppose that we are testing the null H0 λj = 0 Under

the null hypothesis the prior for (λσ2) is λj = 0 and π(λminusj σ2) prop 1

σ2 However the prior under the

alternative hypothesis is π(λj λminusj σ2) prop 1

σ2 Since the marginal likelihoods of data p(data|H0)

and p(data|H1) are both undetermined we cannot define the Bayesrsquo factor p(data|H1)p(data|H0)

(see eg

Cremers (2002) Chib Zeng and Zhao (forthcoming)) In contrast for nuisance parameters such

as σ2 we can continue to assign improper priors Since both hypotheses H0 and H1 include σ2

the prior for it will be offset in the Bayesrsquo factor and in the posterior probabilities Therefore we

can only assign improper priors for common parameters10 Similarly we can still assign improper

priors for β and Σ in the first time series step

The final reason why it might be undesirable to use Jeffreysrsquo prior in the second step is that it

does not impose any shrinkage on the parameters This is problematic given the large number of

10See Kass and Raftery (1995) (and also Cremers (2002)) for more detailed discussion

13

members of the factor zoo while we have only limited time-series observations of both factors and

test asset returns

In the next subsection we propose an appropriate prior for risk premia that is both robust to

spurious factors and can be used for model selection even when dealing with a very large number

of potential models

II22 Spike and Slab Prior for Risk Premia

In order to make sure that the integration of the marginal likelihood is well-behaved we propose

a novel prior specification for the factorsrsquo risk premia λ⊤f = (λ1 λK)11 Since the inference in

time-series regression is always valid we only modify the priors of the cross-sectional regression

parameters

The prior that we propose belongs to the so-called ldquospike-and-slabrdquo family For exemplifying

purposes in this section we introduce a Dirac spike so that we can easily illustrate its implications

for model selection In the next subsection we generalize the approach to a ldquocontinuous spikerdquo

prior and study its finite sample performance in our simulation setup

In particular we model the uncertainty underlying the model selection problem with a mixture

prior π(λσ2γ) prop π(λ|σ2γ)π(σ2)π(γ) for the risk premium of the j-th factor When γj = 1

and hence the factor should be included in the model the prior follows a normal distribution given

by λj |σ2 γj = 1 sim N (0σ2ψj) where ψj is a quantity that we will be defining below When instead

γj = 0 and the corresponding risk factor should not be included in the model the prior is a Dirac

distribution at zero For the cross-sectional variance of the pricing errors we keep the same prior

that would arise with Jeffreysrsquo approach π(σ2) prop σminus2

Let D denote a diagonal matrix with elements cψminus11 middot middot middot ψminus1

K and Dγ the sub-matrix of D

corresponding to model γ We can then express the prior for the risk factors λγ of model γ as

λγ |σ2γ sim N (0σ2Dminus1γ )

Note that c is a small positive number since we do not shrink the common intercept λc of the

cross-sectional regression

Given the above prior specification we sample the posterior distribution by sequentially drawing

from the conditional distributions of the parameters (ie we use a Gibbs sampling algorithm) The

crucial steps in addition to the sampling of the times series parameters from the posteriors in

equations (8)-(9) are as follows

Sampling λγ

Note that

11We do not shrink the intercept λc

14

p(λ|dataσ2γ) prop p(data|λσ2γ)π(λ|σ2γ)

prop (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 exp

983069minus 1

2σ2[(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ ]

983070

= (2π)minuspγ2 |Dγ |

12 (σ2)minus

N+pγ2 e

983069minus

(λγminusλγ )⊤(β⊤γ βγ+Dγ )(λγminusλγ )

2σ2

983070

e

983153minusSSRγ

2σ2

983154

where SSRγ = a⊤aminus a⊤βγ(β⊤γ βγ +Dγ)

minus1β⊤γ a = minλγ(aminus βγλγ)

⊤(aminus βγλγ) + λ⊤γDγλγ

Note that SSRγ is the minimized sum of squared errors with generalised ridge regression penalty

term λ⊤γDγλγ That is our prior modelling is analogous to introducing a Tikhonov-Phillips

regularisation (see Tikhonov Goncharsky Stepanov and Yagola (1995) Phillips (1962)) in the

cross-sectional regression step and has the same rationale delivering a well defined marginal

likelihood in the presence of rank deficiency (that in our settings arises in the presence of useless

factors) However in our setting the shrinkage applied to the factors is heterogeneous since we

rely on the partial correlation between factors and test assets to set ψj as

ψj = ψ times ρ⊤j ρj (16)

where ρj is an N times 1 vector of correlation coefficients between factor j and the test assets and

ψ isin R+ is a tuning parameter which controls the shrinkage over all the factors12 When the

correlation between fjt and Rt is very low as in the case of a useless factor the penalty for λj

which is the reciprocal of ψρ⊤j ρj is very large and dominates the sum of squared errors

Let λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a and σ2(λγ) = σ2(β⊤

γ βγ +Dγ)minus1 the posterior distribution of

λγ is

λγ |dataσ2γ sim N (λγ σ2(λγ))

The above equation makes it clear why this Bayesian formulation is robust to spurious factors

When β converges to zero (β⊤γ βγ +Dγ) is dominated by Dγ so the identification condition for

the risk premia no longer fails When a factor is spurious its correlation with test assets converges

to zero hence the penalty for this factor ψminus1j goes to infinity As a result the posterior mean

of λγ λγ = (β⊤γ βγ +Dγ)

minus1β⊤γ a is shrunk towards zero and the posterior variance term σ2(λ)

approaches σ2Dminus1γ Consequently the posterior distribution of λ for a spurious factor is nearly the

same as its prior In contrast for a normal factor that has non-zero covariance with test assets the

information contained in β dominates the prior information since in this case the absolute size of

Dγ is small relative to β⊤γ βγ

Using our priors and integrating out λ yields

p(data|σ2γ) =

983133p(data|λσ2γ)π(λ|σ2γ)dλ prop (σ2)minus

N2

|Dγ |12

|β⊤γ βγ +Dγ |

12

exp

983069minusSSRγ

2σ2

983070

12Alternatively we could have set ψj = ψ times β⊤j βj where βj is an N times 1 vector However ρj has the advantage

of being invariant to the units in which the factors are measured

15

Sampling σ2

From the Bayesrsquo theorem we have that the posterior of σ2 given by

p(σ2|dataγ) prop p(data|σ2γ)π(σ2) prop (σ2)minusN2minus1 exp

983069minusSSRγ

2σ2

983070

hence the posterior distribution of σ2 is an inverse-Gamma σ2|dataγ sim Γminus1983059N2

SSRγ

2

983060

Finally we obtain the marginal likelihood of the data in model γ by integrating out σ2

p(data|γ) =983133

p(data|σ2γ)π(σ2)dσ2 prop |Dγ |12

|β⊤γ βγ +Dγ |

12

1

(SSRγ2)N2

When comparing two models using posterior model probabilities is equivalent to simply using the

ratio of the marginal likelihoods ie the Bayesrsquo factor defined as

BFγγprime = p(data|γ)p(data|γ prime)

where we have given equal prior probability to model γ and model γprime

Remark 2 (Bayes Factor) Consider two nested linear factor models γ and γ prime The only dif-

ference between γ and γ prime is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a

(K minus 1) times 1 vector of model index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) where without

loss of generality we have assumed that the factor p is ordered last The Bayesrsquo factor is then

BFγγprime =

983061SSRγprime

SSRγ

983062N2 983059

1 + ψpβ⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp

983060minus 12 (17)

The above result is proved in Appendix A4

Since β⊤p [IN minus βγprime(β⊤

γprimeβγprime + Dγprime)minus1β⊤γprime ]βp is always positive ψp plays an important role in

variable selection For a strong and useful factor that can substantially reduce pricing errors the

first term in equation (17) dominates and the Bayesrsquo factor will be much greater than 1 hence

providing evidence in favour of model γ

Remember that SSRγ = minλγ(a minus βγλγ)⊤(a minus βγλγ) + λ⊤

γDγλγ hence we always have

SSRγ le SSRγprime in sample There are two effects of increasing ψp i) when ψp is large the penalty

for λp is small hence it is easier to minimise SSRγ and SSRγprimeSSRγ becomes much larger than

1 ii) large ψp decreases the second term in equation (17) lowering the Bayesrsquo factor and acting as

a penalty for dimensionality

A particularly interesting case is when the factor is useless βp converges to zero but the

penalty term 1ψp prop 1ρ⊤pρp goes to infinity On the one hand the first term in equation (17) will

converge to 1 on the other hand since ψp asymp 0 in large sample the second term in equation (17)

will also be around 1 Therefore the Bayesrsquo factor for a useless factor will go to 1 asymptotically13

13But in finite sample it may deviate from its asymptotic value so we should not use 1 as a threshold when testingthe null hypothesis H0 γp = 0

16

In contrast a useful factor should be able to greatly reduce the sum of squared errors SSRγ so

the Bayesrsquo factor will be dominated by SSRγ yielding a value substantially above 1

Remark 3 (Level Factors) Identification failure of factorsrsquo risk premia can arise in the presence

of lsquolevel factorsrsquo exposure to which is non-zero but lacks cross-sectional spread ie βj rarr c1N with

c ∕= 0 These factors help explain the average level of returns but not the their cross-sectional

dispersion and hence are collinear with the common cross-sectional intercept Our approach can

handle this case by using variance standardised variables in the estimation and replacing the penalty

in (16) with ψj = ψtimes983145ρj⊤983145ρj where 983145ρj is the cross-sectionally demeaned vector of correlations with

asset returns ie 983145ρj = ρj minus983059

1N

983123Ni=1 ρji

983060times 1N

II23 Continuous Spike

We extend the Dirac spike-and-slab prior by encoding a continuous spike for λj when γj equals

0 Following the literature on Bayesian variable selection (see eg George and McCulloch (1993)

George and McCulloch (1997) and Ishwaran Rao et al (2005)) we model the uncertainty under-

lying model selection with a mixture prior π(λσ2γω) = π(λ|σ2γ)π(σ2)π(γ|ω)π(ω) which is

specified as following

λj |γj σ2 sim N (0 r(γj)ψjσ2)

Note the introduction of a new element r(γj) in the prior and where r(1) = 1 and r(0) = r As

we explain below the additional parameter vector ω encodes our prior beliefs about the sparsity

of the true model

Redefine D as a diagonal matrix with elements c (r(γ1)ψ1)minus1 (r(γK)ψK)minus1 where ψj is

given as before by equation (16) In matrix notation the prior for λ is λ|σ2γ sim N (0σ2Dminus1)

The term r(γj)ψj in Dminus1 is set to be small or large depending on whether γj = 0 or γj = 1 In the

empirical implementation we force r to be much less than 1 since we intend to shrink λj towards

zero when γj is 014 Hence the spike component concentrates the mass of λ towards zero whereas

the slab component allows λ to take values over a much wider range Therefore the posterior

distribution of λ is very similar to the case of a Dirac spike in section II22

We use Gibbs sampling (ie sequential sampling from conditional distributions) to draw the

posterior distribution of the parameters (λγωσ2) where as explained below ω encodes our

prior beliefs about the sparsity of the true model

Sampling λγ

Combining the likelihood and the prior for λ we have

p(λ|dataσ2γ) prop p(data|λσ2γ)p(λ|σ2γ) prop exp

983069minus 1

2σ2

983147λ⊤(β⊤β +D)λminus 2λ⊤β⊤a

983148983070

Therefore defining λ = (β⊤β+D)minus1β⊤a and σ2(λ) = σ2(β⊤β+D)minus1 the posterior distribution

14We can set r = 00001 In our framework r is essentially a tune parameter hence we need to choose a reasonablevalue such that we can identify useful factor but exclude spurious ones

17

of λ can be expressed as λ|dataσ2γω sim N (λ σ2(λ))

Sampling γjKj=1

Even though the prior on model index γ could be simply set to be π(γ) = 12K we interpret

π(γj = 1|ωj) = ωj as our prior belief about the sparsity of the true model As in the literature on

predictors selection we assign the following prior distribution to (γω)

π(γj = 1|ωj) = ωj ωj sim Beta(aω bω)

Different hyper-parameters aω and bω determine whether we favor more parsimonious models or

not15

Given a ωj the Bayes factor for the j-th risk then factor is

p(γj = 1|dataλωσ2γminusj)

p(γj = 0|dataλωσ2γminusj)=

ωj

1minus ωj

p(λj |γj = 1σ2)

p(λj |γj = 0σ2)

If we had instead imposed ωj = 05 as in section II22 the Bayesrsquo factor would be fully determined

byp(λj |γj=1σ2)p(λj |γj=0σ2)

Sampling ω

From Bayesrsquo theorem we have

p(ωj |dataλγσ2) prop π(ωj)π(γj |ωj) prop ωγjj (1minus ωj)

1minusγjωaωminus1j (1minus ωj)

bωminus1

prop ωγj+aωminus1j (1minus ωj)

1minusγj+bωminus1

Therefore the posterior distribution of ωj is ωj |dataλγσ2 sim Beta(γj + aω 1minus γj + bω)

Sampling σ2

Finally

p(σ2|dataωλγ) prop (σ2)minusN+K+1

2minus1 exp

983069minus 1

2σ2[(aminus βλ)⊤(aminus βλ) + λ⊤Dλ]

983070

Hence the posterior distribution of σ2 is

σ2|dataωλγ sim Γminus1

983061N +K + 1

2(aminus βλ)⊤(aminus βλ) + λ⊤Dλ

2

983062

Note that when ωj is a constant 05 and r converges to 0 the continuous slab-and-spike prior

is equivalent to the one with a Dirac spike in section II22 However the MCMC algorithm of

the continuous spike setting is particularly useful in the high dimensional case Imagine that there

are 30 candidate factors in the factor zoo In the Dirac spike-and-slab prior case we have to

15We set aω = bω = 2 in the benchmark case However we could assign aω = 1 and bω = 2 in order to select asparser model

18

calculate the posterior model probabilities for 230 different models Given that we update (aβ)

in each sampling round posterior probabilities for all models are necessarily re-computed for every

new draw of these quantities rendering the computational cost very large In contrast using our

continuous spike approach we can simply use the posterior mean of γj to approximate the posterior

marginal probability of the j-th factor

III Simulation

We build a simple setting for a linear factor model that includes both strong and irrelevant factors

and allows for potential model misspecification

The cross-section of asset returns mimics empirical properties of 25 Fama-French portfolios

sorted by size and value We generate both factors and test asset returns from normal distributions

assuming that HML is the only useful factor A misspecified model also includes pricing errors from

the two-step Fama-MacBeth procedure which makes the vector of simulated expected returns equal

to their sample mean estimates of 25 Fama-French portfolios Finally a spurious factor is simulated

from an independent normal distribution with mean zero and standard deviation 1

ftuselessiidsim N (0 (1)2)

ftHMLiidsim N (rHML σ

2HML)

ftHML = ftHML minus macrftHML

Rt|ftHMLiidsim

983099983103

983101N (λc1N + β

983059λHML + ftHML

983060 Σ) if the model is correct

N (R+ βftHML Σ) if the model is misspecified

where factor loadings risk premia and variance-covariance matrix of returns are equal to their

OLS sample estimates from time series and two-pass Fama-MacBeth regressions of 25 size and

value portfolios on HML All the model parameters are estimated on monthly data from July 1963

to December 2017

To better illustrate the properties of the frequentist and bayesian approaches we consider 3

estimation setups

(a) the model includes only a strong factor (HML)

(b) the model includes only a useless factor

(c) the model includes both strong and useless factors

Each setting can be correctly or incorrectly specified with the following sample sizes T = 100

200 600 1000 and 20000 We compare the performance of the OLSGLS standard frequentist

and Bayesian Fama-MacBeth estimators (FM and BFM correspondingly) with the focus on risk

premia recovery testing and identification of strong and useless factors for model comparison

19

III1 Estimating risk premia via Bayesian Fama-MacBeth

Since it is unlikely that in most empirical settings a linear factor model is correctly specified we

focus our discussion on the case that allows for model misspecification We report similar simulation

results for the case of correct model specification in Appendix C

Table 1 Tests of risk premia in a misspecified model with a strong factor

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0107 0052 0007 0115 0076 0013 -422 4530200 0094 0067 0006 0118 0072 0015 -303 5621

FM 600 0097 0053 0006 0098 0047 0011 389 55751000 0088 0048 0012 0103 0060 0012 865 532820000 0109 0056 0010 0110 0060 0008 2486 3655

100 0063 0026 0006 0032 0013 0001 -356 3609200 0086 0041 0012 0079 0039 0008 -367 4689

BFM 600 0087 0043 0013 0087 0047 0009 -067 52611000 0095 0046 0009 0093 0046 0009 440 534620000 0096 0052 0012 0100 0052 0012 2390 3601

Panel B GLS

100 0246 0173 0067 0251 0154 0070 1720 7464200 0165 0107 0031 0171 0105 0035 5337 8045

FM 600 0137 0076 0020 0137 0073 0016 6942 83871000 0131 0072 0019 0132 0072 0015 7341 841620000 0122 0074 0014 0113 0061 0015 8034 8283

100 0139 0083 0022 0143 0083 0024 3127 6815200 0131 0072 0014 0121 0069 0019 4858 7299

BFM 600 0114 0061 0013 0126 0067 0018 6594 80091000 0100 0048 0009 0106 0058 0008 7063 814920000 0092 0050 0012 0100 0048 0012 8022 8267

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor The true value of the cross-sectional R2

adj is 3055(8175) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervals and their size for BFMestimates are constructed using posterior coverage of Fama-MacBeth estimates of λ The last two columns report the5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimatesfor FM and its posterior mode for BFM

Table 1 presents the size of the tests for risk premia and confidence intervals for cross-sectional

R2 in probably the most relevant (and best-case) scenario for empirical applications It compares

the performance of frequentist and Bayesian Fama-MacBeth estimators for the case when the model

is misspecified and includes a single cross-sectional factor that is priced and strongly identified

proxied by HML Since the model is misspecified cross-sectional R2 never reaches 100 even for

T = 20 000 with the population value of 31 (82) for OLS (GLS) As expected both FM and

BFM estimators are very similar to each other and provide confidence intervals of correct size In

case of the standard FM approach they are constructed using standard t-statistics adjusted for

Shanken correction and in case of the BFM we rely on the quantiles of the posterior distribution

to form the credible confidence intervals for parameters The last two columns also report the

quantiles of the mode of the posterior distribution of R2 across the simulations

20

Table 2 summarizes risk premia estimation for the same cross-section of 25 expected returns

but using a useless (spurious) factor as a candidate cross-sectional factor As expected the standard

Fama-MacBeth estimator fails to recognize the rank failure in the second stage and conventional

risk premia estimates and t-statistics are inflated Indeed it is widely known since Kan and Zhang

(1999) that if the model is misspecified then t-statistics of the spurious factors tend to infinity

asymptotically This is confirmed in Panels A and B for T = 20 000 the probability of finding a

useless factor to have a t-stat above 196 is almost 50 for FM-OLS and over 80 for FM-GLS

Furthermore cross-sectional measures of fit such as R2 and related quantities are substantially

inflated even though itrsquos true value is 0 for both OLS and GLS settings in-sample estimates

produced by the frequentist approach not only have a much wider empirical support (for example

from -4 to over 40 for FM-OLS) but also uncertainty that does not decrease with the sample

size

Table 2 Tests of risk premia in a misspecified model with a useless factor

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0072 0037 0010 0055 0011 0001 -429 4184200 0078 0043 0005 0084 0021 0001 -418 4524

FM 600 0089 0043 0014 0223 0116 0013 -428 43701000 0093 0052 0014 0333 0187 0027 -429 450120000 0213 0135 0067 0698 0488 0172 -427 4363

100 0035 0011 0001 0001 0000 0000 -247 -036200 0041 0013 0001 0008 0002 0000 -261 -016

BFM 600 0071 003 0003 0031 0006 0002 -287 0191000 0047 002 0001 0039 0017 0002 -298 08420000 0034 0013 0000 0091 0043 0012 -336 1140

Panel B GLS

100 0238 0144 0066 0305 0199 0062 -347 3857200 0152 0091 0028 0292 0189 0067 -375 1953

FM 600 0126 0066 0017 0407 0314 0148 -385 16431000 0117 0063 0013 0510 0401 0239 -405 156920000 0104 0039 0005 0864 0847 0768 -311 1333

100 0128 0070 0019 0047 0018 0002 -218 1154200 0107 0060 0014 0034 0011 0000 -275 891

BFM 600 0093 0046 0008 0042 0012 0001 -325 6621000 0083 0031 0004 0061 0028 0004 -339 51920000 0023 0006 0000 0099 0049 0011 -304 138

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor The true value of the cross-sectional R2 is zero

The Bayesian approach to Fama-MacBeth regressions successfully overcomes the hurdle of use-

less factors As Table 2 demonstrates both BFM-OLS and BFM-GLS are able to identify the spu-

rious factor with the posterior distribution providing credible confidence bounds with the proper

control for the size of the test (eg for T = 20 000 the probability to reject the null of no risk

premia attached to a useless factor when using a test with the nominal size of 10 is 91) Rec-

ognizing a useless factor cross-sectional measures of fit are more conservative and overall tighter

Why does the bayesian approach work when the frequentist fails The argument is probably

best summarized by Figure 1 that plots a posterior distribution of λuseless for BFM from one of

21

minus4 minus2 0 2 4

00

02

04

06

08

λuseless

Density

BFMminusOLSFMminusOLS

Figure 1 Risk premia estimates of a useless factor

The graph presents the posterior distribution (blue line) of λuseless from the BFM-OLS estimation in a misspecifiedmodel with a useless factor based on a single simulation with T = 1 000 The red line depicts the asymptoticdistribution of Fama-MacBeth estimate of risk premium under the normal approximation

the simulations along with the pseudo-true value of risk premium defined as 0 in this case In this

particular simulation Fama-MacBeth OLS estimate of λuseless is -119 with Shanken-corrected

t-statistics equal to -255 so according to traditional hypothesis testing we would reject the null

of λuseless = 0 even at 1 The posterior distribution of BFM estimates of risk premium (the

blue line in Figure 1) behaves rather differently it is centered around 0 and overall more spread

out with a confidence interval (minus1603 1201) Intuitively the main driving force behind it

is the fact that in BFM β is updated continuously when β is close to zero the posterior draws

of β will be positive or negative randomly which implies that the conditional expectation of λ in

Equation 12 will also flip sign depending on the draw As a result the posterior distribution of

λuseless is centered around 0 and so is the confidence interval The same logic applies to the case

of BFM-GLS

Finally Table 3 combines the insights for true and irrelevant factors and presents the simulation

results for the most realistic model setup that includes both useless and strong factors As expected

in the conventional case of frequentist Fama-MacBeth estimation the useless factor is often found

to be a significant predictor of the asset returns its OLS (GLS) t-statistic would be above a

5-critical value in over 60 (80) of the simulations On the contrary the bayesian confidence

intervals have the right coverage and reject the null of no risk premia attached to the spurious

factor with frequency asymptotically approaching the size of the tests

The crowding out of the true factors by the useless ones could also be an important empirical

concern When the model is misspecified the presence of spurious factors can also bias the risk

premia estimates for the strong ones and often leads to their crowding out of the model Panel

A in Table 3 serves as a good illustration of this possibility with risk premia estimates for the

22

Table 3 Tests of risk premia in a misspecified model with useless and strong factors

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0082 0039 0008 0121 0067 0016 0099 0023 0001 -513 5663200 0096 0044 0005 0157 0100 0034 0129 0039 0005 127 6190

FM 600 0093 0034 0014 0212 0147 0071 0264 0129 0022 840 61781000 0102 0046 0010 0261 0194 0098 0380 0199 0056 1184 624820000 0114 0054 0009 0289 0229 0152 0848 0633 0240 2507 6076

100 0035 0012 0001 0028 0007 0001 0004 0001 0000 -211 4033200 0049 0017 0001 0067 0031 0004 0011 0003 0000 -175 4828

BFM 600 005 0018 0004 0099 0047 0005 0047 0014 0002 1020 55721000 0041 0021 0003 0102 0048 0011 0071 0035 0004 1487 569520000 0017 0007 0000 0087 0033 0007 0099 0055 0012 2480 5466

Panel B GLS

100 0219 0155 0057 0224 0135 0066 0303 0198 0064 1911 7775200 0155 0092 0028 0149 0090 0024 0263 0183 0061 5537 8171

FM 600 0121 0068 0015 0116 0064 0016 0391 0293 0134 6948 84331000 0115 0061 0013 0115 0057 0012 0487 0387 0216 7305 847420000 0084 0050 0009 0100 0041 0005 0864 0836 0757 7979 8424

100 0122 0069 0016 0129 0070 0017 0046 0017 0002 3243 6869200 0112 0056 0012 0099 0048 0012 0031 0012 0000 4844 7355

BFM 600 0096 0049 0011 0086 0045 0009 0049 0016 0002 6576 80301000 0081 0036 0007 0073 0032 0006 0058 0030 0003 7064 815420000 0027 0005 0000 0022 0007 0000 0098 0047 0013 7974 8259

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value of the

cross-sectional R2adj is 3055 (8175) for the OLS (GLS) estimation

strong factor are clearly biased in the frequentist estimation by the identification failure in case of

the frequentist approach Again in this case BFM provides reliable albeit conservative confidence

bounds for model parameters

III11 Evaluating cross-sectional fit

In addition to risk premia estimates it is often useful to understand the quality of cross-sectional

fit of the model Indeed the increase in cross-sectional R2 is often interpreted as measuring the

economic importance of the predictor contrary to the statistical one implied by the risk premia

significance It is well-known however that the average values of R2 are not always informative

about the true model performance its sample distribution often suffers from a large estimation un-

certainty (see eg Stock (1991) and Lewellen Nagel and Shanken (2010)) ad has a non-standard

distribution when the matrix of β has reduced rank (Kleibergen and Zhan (2015) Gospodinov

Kan and Robotti (2019)) In this section we further investigate the properties of cross-sectional

R2 in the frequentist and Bayesian Fama-MacBeth regressions

Figure 2 shows the distribution of cross-sectional OLS R2 across a large number of simulations

for the asymptotic case of T = 20 000 and a misspecified process for returns As Panel (a)

illustrates if the model is strongly identified the distribution of posterior mode of R2 for BFM

tends to coincide with that of conventional Fama-MacBeth procedure as expected The major

23

difference emerges whenever a useless factor is included into the candidate set of variables Indeed

it is well-known that in this case the distribution of conventional measures of fit is nonstandard

and often inflated (Kleibergen and Zhan (2015)) This is further confirmed in Figures 2 b) and

c) that show that under the presence of spurious factors conventional Fama-MacBeth R2 has an

extremely spreadout right tail of the distribution which makes it easy to find a substantial increase

of fit whenever the model is simply not identified This unfortunate property of the frequentist

approach is not shared by the inference with BFM Indeed the mode of the posterior distribution

of R2 is generally tightly centered around the true values The slight bump to the right tail of the

distribution comes from the fact that whenever a spurious factor is included into the model with a

small probability (based on t-statistic cut-off this is equal to the size of the test see eg Table 3

Panel A) its fit will be similar to that of the frequentist estimation

00 02 04 06 08 10

02

46

810

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(a) Model with a strong factor

00 02 04 06 08

010

2030

40

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(b) Model with a useless factor

00 02 04 06 08 10

02

46

8

Radj2

Dens

ity

Posterior ModeFMminusOLSTrue Radj

2

(c) Model with strong and useless factors

Figure 2 Cross-sectional distribution of R2adj in models with strong and useless factors

Figures (a)-(c) show the asymptotic distribution of cross-sectional R2 under different model specifications across 1000simulations of sample size T = 20 000 Blue lines correspond to the distribution of the posterior mode for R2

adj whilegreen lines depict the pointwise sample distribution of cross-sectional fit evaluated at Fama-MacBeth risk premiaestimates The dark yellow dash line stands for the true value of R2

adj in the model

24

However the pointwise distribution of cross-sectional R2 across the simulations is only part of

the story as it does not reveal the in-sample estimation uncertainty and whether the confidence

intervals are credible in reflecting it While BFM incorporates this uncertainty directly into the

shape of its posterior distribution one needs to rely on bootstrap-like algorithms to build a similar

analogue in the frequentist case As frequentist benchmark we use the approach Lewellen Nagel

and Shanken (2010) to construct the confidence interval for R2 Details on this procedure can be

found in Appendix A5

minus05 00 05 10

00

05

10

15

20

25

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(a) Model with a useless factor

minus04 minus02 00 02 04 06 08 10

00

05

10

15

20

Radj2

Den

sity

BFM PosteriorFM Radj

2

5th95th CI of BFM5th95th CI of LNSTrue Radj

2

(b) Model with strong and useless factors

Figure 3 The estimation uncertainty of cross-sectional R2Figures (a) and (b) show the posterior density of the cross-sectional R2 (blue) in one representative simulationalong with its 90 confidence interval (red) The dark yellow line denotes the true value of R2 while the green linedepicts its in-sample Fama-MacBeth estimate with the confidence interval constructed following Lewellen Nagel andShanken (2010) (black)

Figure 3 presents the posterior distribution of cross-sectional R2 for a model that contains a

useless factor (and potentially a strong one too) and contrasts it with a frequentist value and

the confidence interval around it Consider for example Panel a) The fact that the in-sample

Fama-MacBeth estimate of cross-sectional fit (51) is substantially higher than the mode of the

posterior distribution (-2 which is close to the true value of R2adj about minus4) is not surprising

given the previous results on the pointwise distribution of the estimates What is quite interesting

however is the coverage of the confidence interval constructed via the simulation-based approach of

Lewellen Nagel and Shanken (2010) Not only does it not include true value of the cross-sectional

fit but in fact in this particular simulation it suggests that R2adj should be between 42 and

100 A similar mismatch between the seemingly high levels of cross-sectional fit produced by a

frequentist approach and their true values can also be observed in Panel b) of Figure 3 for the

case of including both strong and a useless factors

In section A6 of the Appendix we show that the performance of the Bayesian Fama-MacBeth

method is robust to the use of a larger cross-sectional dimension ndash ie the above discussed properties

25

hold in a larger N setting

III2 Bayes factors

How well do flat and spike-and-slab priors work empirically in selecting relevant and detecting

spurious factors in the cross-section of asset returns We revisit the theoretical results from Section

II1 using the same simulation design used to evaluate the estimation of risk premia

Table 4 The probability of retaining risk factors using Bayes factors

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0813 0784 0758 0722 0693 0662600 0929 0915 0896 0876 0851 08341000 0972 0963 0957 0951 0937 0924

Spike-and-Slab Prior fstrong 200 0605 0570 0538 0505 0476 0447600 0853 0830 0807 0784 0762 07351000 0954 0940 0928 0912 0896 0875

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0996 0988 0967 0919 0822600 0998 0998 0995 0988 0977 09431000 1000 1000 1000 0994 0983 0965

Spike-and-Slab Prior fuseless 200 0325 0188 0106 0057 0024 0013600 0072 0025 0006 0002 0001 00001000 0051 0015 0001 0000 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0924 0897 0874 0848 0821 0799600 0988 0985 0976 0974 0965 09581000 0998 0996 0996 0995 0992 0987

fuseless 200 0984 0960 0910 0811 0702 0584600 0999 0993 0985 0954 0913 08541000 1000 1000 0995 0986 0966 0945

Spike-and-Slab Prior fstrong 200 0627 0591 0552 0509 0474 0452600 0860 0830 0802 0783 0758 07271000 0956 0942 0927 0911 0895 0875

fuseless 200 0260 0128 0071 0031 0019 0010600 0080 0028 0010 0004 0001 00011000 0058 0013 0005 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

Consider a cross-section of 25 portfolios that is actually loading on 2 systematic sources of risk

with the econometrician potentially observing at most only one of them a strong (and priced) ft

However there is also a second candidate factor available which is orthogonal to asset returns

and essentially useless We compute Bayes factors corresponding to each of the potential sources

of risk and document the empirical probability of retaining the variable in the model across 1000

26

simulations Again we consider models that contain either strong or useless factors or a com-

bination of both and different sample sizes (T = 200 600 and 1000) In each case we run the

Gibbs sampling algorithm derived using continuous spike-and-slab prior and then approximate the

marginal probability of each factor by the posterior mean of γj The decision rule is based on a

range of critical values 55-65 such that whenever the posterior mean of γj is above a particular

threshold we retain the factor Finally we also compute the probability of retaining a factor under

Jeffreys prior which would be the standard in the literature

Table 4 summarizes our findings When only a true risk factor is included in the candidate set

(Panel A) both Jeffreyrsquos and spike-and-slab priors successfully identify it with a high probability

especially in large sample For small sample sizes however Jeffreys prior clearly has a somewhat

higher power of detecting a true risk factor This outcome is not surprising as the estimator does

not impose additional shrinkage on the risk premium The difference however vanishes for larger

sample sizes

The difference between the two priors becomes drastic whenever useless factors are included

in the model (Panels B Panels B and C in Table 4) As discussed in Section II21 since the

matrix β⊤γ βγ is nearly singular and its determinant goes to zero under a flat prior the posterior

probability of including a spurious factor in the model converges to 1 asymptotically For example

the probability of misidentifying a spurious factor as being the true source of risk is almost 1

under Jeffreys prior even for a very short sample This in turn makes the overall process of model

selection invalid

Overall we find the asymptotic behavior of the spike-and-slab prior encouraging for variable

and model selection While often somewhat conservative in very short samples it successfully

eliminates the impact of the spurious factors from the model and identifies the true sources of

risk

IV Empirical Applications

In this section we apply our Bayesian approach to a large set of factors proposed in the previous

literature First we use the Bayesian Fama-MacBeth method to analyse several notable factor

models (subsection IV1) Second we consider 51 tradable and non-tradable factors yielding more

than two quadrillion possible models and employ our spike and slab priors to compute factorsrsquo

posterior probabilities and implied risk premia (subsections IV2 and IV3) Third we compare

the performance of a (low dimensional) robust model constructed with only the factors that have

high posterior probability to the one of several notable factor models (subsection IV4) Fourth

we estimate the degree of sparsity (in terms of linear factors) of the true latent stochastic discount

factor as well as the SDF-implied maximum Sharpe ratio (subsection IV5)

27

IV1 Some notable factor models

In this section we illustrate the differences between the frequentist and Bayesian FM estimation

(both OLS and GLS) for several candidate models In particular we estimate a set of linear

factor models on the returns of the standard 25 Fama-French portfolios sorted by size and value

using frequentist and Bayesian Fama-MacBeth estimators We use monthly data over the 197001-

201712 sample for tradable factors and whenever possible nontradables For factors available

only at quarterly frequency the sample is 1952Q1-2017Q3 (whenever possible) A full description

of the data and models used can be found in Appendix B Additional empirical results for other

candidate factors and cross-sections (eg 25 Fama-French + 17 Industry portfolios) can be found

in Appendix C

Tables 5 and 6 summarize the performance of several leading factor models For the classical FM

approach we report point estimates of risk premia with their Shanken-corrected t-statistics and

the cross-sectional R2 along with its 90 confidence interval (constructed following the method-

ology of Lewellen Nagel and Shanken (2010)) For BFM we report the posterior mean of risk

premia estimates and the posterior median and mode of R2 along with the centered 90 posterior

coverage We chose to report both the median and the mode for cross-sectional fit because the of

the shape of its distribution which is often heavily skewed

Carhart (1997) 4-factor model OLS and GLS Fama-MacBeth estimates of risk premia indicate

that size value and momentum (SMB HML and UMD correspondingly) are significant drivers of

the cross-section of test assets The market factor does not command a significant risk premium

which is a typical finding for this model Cross-sectional fit seems to be high with R2 over 70 even

though it comes with rather wide confidence bounds according to Lewellen Nagel and Shanken

(2010) approach The Bayesian estimation indicates that part of the model success is due to the fact

that this cross section of test assets does not have much exposure to momentum especially after

one controls for the conventional Fama-French factors While still marginally significant its risk

premium is substantially lower under both BFM and BFM-GLS estimation with tighter bounds

for R2 too On the contrary both HML and SMB have virtually identical risk prices under both

FM and Bayesian estimations

Hou Xue and Zhang (2014) q-factor model emphasized the role of investment and profitability

in matching the cross-section of equity returns and true to the data we find these factors strongly

priced Both estimation strategies give identical parameter estimates and largely consistent mea-

sures of cross-sectional fit This in turn implies that all the risk premia are strongly identified

for this model when estimated on a cross-section of 25 Fama-French portfolios When industry

portfolios are among the test assets (see Appendix C) risk premia and R2 decline and some of the

parameters lose significance but broadly speaking the model performs in a qualitatively similar

way

Liquidity-adjusted CAPM of Pastor and Stambaugh (2003) seems to suffer from identification

failure as the risk premium on the liquidity factor ceases to remain significant when BFM is used

in estimation Wide confidence bounds and uncertain cross-sectional fit provide a stark difference to

28

Table 5 Tradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0489 7063 0703 6432 6329[-0244 1222] [3160 9400] [-0061 1426] [4826 7646]

MKT 0120 -0101[-0631 0870] [-0822 0683]

SMB 0171 0164[0100 0241] [0089 0232]

HML 0404 0396[0331 0477] [0330 0466]

UMD 2445 1806[0955 3936] [0259 3328]

q-factor model Intercept 0912 6567 0922 6062 6123Hou Xue and Zhang (2014) [0286 1539] [3040 8680] [0276 1560] [4131 7640]

ROE 0394 0377[0016 0771] [-0020 0789]

IA 0387 0385[0203 0571] [0208 0580]

ME 0274 0268[0169 0379] [0158 0376]

MKT -0371 -0378[-0995 0252] [-1005 0272]

Liquidity-CAPM Intercept 0973 3624 1162 3409 3027Pastor and Stambaugh (2000) [-0084 2030] [-909 10000] [0175 2120] [-239 6146]

LIQ 3057 1785[0727 5388] [-1237 4150]

MKT -0281 -0449[-1350 0788] [-1371 0509]

Panel A GLS

Carhart (1997) Intercept 1017 8964 1083 8587 863[0389 1645] [8200 9760] [0458 1717] [8085 9105]

MKT -0434 -0504[-1065 0196] [-1150 0122]

SMB 0191 0189[0150 0233] [0150 0230]

HML 0356 0356[0313 0400] [0316 0395]

UMD 1626 1264[0479 2772] [0077 2401]

q-factor model Intercept 1305 5503 1277 4728 4854Hou Xue and Zhang (2014) [0779 1831] [2440 9640] [0702 1879] [3245 6419]

ROE 0295 0266[-0026 0615] [-0087 0640]

IA 0270 0265[0104 0437] [0093 0450]

ME 0251 0246[0161 0341] [0144 0345]

MKT -0749 -0720[-1268 -0229] [-1292 -0156]

Liquidity-CAPM Intercept 1244 4938 1256 5298 4317Pastor and Stambaugh (2000) [0664 1824] [2691 9891] [0738 1749] [1250 6653]

LIQ 1141 0775[-0232 2514] [-0450 2116]

MKT -0664 -0678[-1242 -0086] [-1176 -0162]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of 25 Fama-French monthly excess returns Each model is estimated via OLS and GLS We reportpoint estimates and 5 confidence intervals for risk premia which are constructed based on the asymptotic normaldistribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nagel and Shanken(2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λ denoted byλj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (595) credible intervals and denote significance at the 90 95 and 99 level respectively

29

Table 6 Nontradable factors and 25 Fama-French portfolios sorted by size and value

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled CCAPM Intercept 1046 2567 1791 3436 2919[-0848 2940] [-1429 10000] [0001 3723] [-476 6207]

cay 1817 0791[-0653 4288] [-1347 2686]

∆Cnd 0713 0303[-0030 1456] [-0462 0951]

∆Cnd times cay 0804 0301[-1645 3253] [-1911 2270]

HC-CAPM Intercept 3243 -122 3090 354 957[1228 5257] [-909 3345] [0790 5259] [-748 4431]

∆Y 0464 0085[-0213 1140] [-1119 1058]

MKT -0719 -0656[-2680 1242] [-2859 1558]

Durable CCAPM Intercept 2214 5238 2780 471 4078[-1037 5465] [2800 10000] [-0184 5751] [120 6991]

∆Cnd 0743 0357[-0025 1511] [-0207 0832]

∆Cd -0057 0014[-0719 0605] [-0668 0693]

MKT 0083 -0495[-3322 3489] [-3395 2555]

Panel B GLS

Scaled CCAPM Intercept 2180 -1024 2257 -658 -313[0825 3536] [-1429 6457] [1221 3258] [-1187 1517]

cay 0435 0256[-0774 1643] [-0688 1217]

∆Cnd 0118 0089[-0266 0502] [-0214 0407]

∆Cnd times cay 0141 0063[-1005 1286] [-0845 0938]

HC-CAPM Intercept 2730 5636 2759 5824 4926[1458 4002] [3018 8364] [1379 4095] [967 7507]

∆Y -0421 -0241[-0742 -0099] [-0598 0114]

MKT -0717 -0740[-1979 0545] [-2073 0622]

Durable CCAPM Intercept 2960 4454 2841 5474 4099[0547 5374] [286 7829] [1102 4558] [-241 7215]

∆Cnd 0105 0052[-0265 0475] [-0201 0311]

∆Cd 0055 0025[-0390 0501] [-0286 0327]

MKT -0941 -0822[-3314 1432] [-2528 0895]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of 25 Fama-French quarterly excess returns Each model is estimated via OLS and GLS Wereport point estimates and 5 confidence intervals for risk premia which are constructed based on the asymptoticnormal distribution and cross-sectional R2 and its (5 95) confidence level constructed as in Lewellen Nageland Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide the posterior mean of λdenoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as wellas its (5 95) credible intervals and denote significance at the 90 95 and 99 level respectively

the pointwise estimates and their seemingly high significance levels under the standard frequentist

approach

Conditional CCAPM of Lettau and Ludvigson (2001) is weakly identified at best Unlike the

basic FM estimation that indicates a relative empirical success of the model the Bayesian approach

reveals most risk premia to be substantially lower losing all the accompanying statistical and

30

economic significance This is particularly pronounced in the BFM-GLS that delivers both risk

premia and cross-sectional R2 close to zero

Labour-adjusted CAPM of Jagannathan and Wang (1996) extends the classic CAPM framework

by introducing a proxy for human capital and finds it strongly priced in the cross-sections of stocks

returns The BFM estimates of risk premia are substantially lower and no longer significant with

the same patterns observed under both OLS and GLS procedures

Durable CCAPM of Yogo (2006) in the linearized version included the durable consumption

factor and found that its impact is priced in a number of cross-sections sorted by size and value past

betas and other characteristics Even though the Lewellen Nagel and Shanken (2010) approach

indicates a really wide support for the cross-sectional R2 the model found empirical support in the

data We find that both durable and nondurable consumption are weak predictors of the cross-

section of returns as the magnitude of their risk premia substantially declines and is no longer

significant The model is still characterized by a wide confidence interval for R2 but overall its

pricing ability is questionable at best

Appendix C provides additional empirical results on the performance of both frequentist and

bayesian Fama-MacBeth estimators In many cases when the models are well specified and strongly

identified in the data there is almost no distinction between the two approaches One notable

difference however are the confidence intervals of the R2 that are often notoriously wide in

the frequentist case There are also cases however when the difference in model performance

becomes large affecting both risk premia estimates and measures of cross-sectional fit Similar

to Gospodinov Kan and Robotti (2019) we caution the reader against blindly relying on the

estimates produced by conventional Fama-MacBeth procedure and advocate a robust approach to

inference

IV2 Sampling two quadrillion models

We now turn our attention to a large cross-section of candidate asset pricing factors In particular

we focus on 51 (both traded and non) monthly factors available from October 1973 to December

2016 (ie T = 600) Factors are described in details in Table B1 of Appendix B In choosing

the cross-section of assets to price we follow Lewellen Nagel and Shanken (2010) and employ 25

Fama-French size and book-to-market portfolios plus 30 Industry portfolio (ie N = 55) Since

we do not restrict the maximum number of factors to be included all the possible combinations

of factors give us a total of 251 possible specifications ie 225 quadrillion models Note that each

model involves 55 time series regressions and one cross-section regression ie we jointly evaluate

the equivalent of 126 quadrillion regressions

We employ the continuous spike-and-slab approach of section II23 since it is the most suited

for handling a very large number of possible models and report both the posterior (given the data)

of each factor (ie E [γj |data] forallj) as well as the posterior means of the factorsrsquo risk premia (ie

E [λj |data] forallj) computed as the Bayesian Model Average16 across all the models considered

16If we are interested in some quantity ∆ that is well-defined for every model j = 1 M (eg risk premia

31

The posterior evaluation is performed and reported over a wide range for the parameter (ψ in

equation (16)) that controls the degree of shrinkage of potentially useless factorsrsquo risk premia from

ψ = 1 (ie very strong shrinkage) to ψ = 100 (making the shrinkage virtually irrelevant) The

prior for each factor inclusion is a Beta(1 1) (ie a uniform on [0 1]) yielding a prior expectation

for γj equal to 5017

000

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1 5 10 20 50 100

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB STRev

IPGrowth BEH_PEAD NONDUR SERV LIQ_TR

PE DeltaSLOPE BW_ISENT DEFAULT Oil

DIV REAL_UNC UNRATE TERM HJTZ_ISENT

UMD CMA LIQ_NT FIN_UNC STOCK_ISS

NetOA MACRO_UNC INV_IN_ASS COMP_ISSUE ACCR

HML ROA RMW CMA LTRev

INTERM_CAP_RATIO ASS_Growth DISSTR PERF BAB

IA MGMT ROE O_SCORE BEH_FIN

GR_PROF QMJ RMW SKEW SMB

HML_DEVIL

Figure 4 Posterior factor probabilities

Posterior probabilities of factors E [γj |data] estimated over the 197310-201612 sample using a cross-section of 25Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the continuous spike andslab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models The 51 factors considered aredescribed in Table B1 of Appendix B The prior distribution for the j-th factor inclusion is a Beta(1 1) yielding aprior expectation of γj = 50 Posterior probabilities are plotted for ψ isin [1 100]

Figure 4 plots the posterior probabilities of the 51 factors as a function of the parameter ψ

The corresponding values are reported in Table 7 Overall the inclusion of only four factors finds

maximum Sharpe ratio etc) from Bayesrsquo theorem we have

E [∆|data] =M983131

j=0

E [∆|datamodel = j] Pr (model = j|data)

where E [∆|datamodel = j] = limLrarrinfin1L

983123Li=1 ∆(θ

(j)i ) and

983153θ(j)i

983154L

i=1denotes draws from the posterior distribution

of the parameters of model j That is the (Bayesian model average) expectation of ∆ conditional on only thedata is simply the weighted average of the expectation in every model with weights equal to the modelsrsquo posteriorprobabilities See eg Raftery Madigan and Hoeting (1997) Hoeting Madigan Raftery and Volinsky (1999)

17Using a Beta(2 2) that still implies a prior probability of factor inclusion of 50 but lower probabilities forvery dense and very sparse models we obtain virtually identical results Furthermore using a prior in favour of moresparse factor models (ie a Beta(2 8)) the empirical findings are very similar to the ones reported

32

Table 7 Posterior factor probabilities E [γj |data] and risk premia in 225 quadrillion models

E [γj |data] E [λj |data]ψ ψ

Factors 1 5 10 20 50 100 1 5 10 20 50 100 F

HML 0860 0909 0916 0903 0882 0877 0192 0288 0301 0300 0292 0293 0377MKT 0730 0807 0831 0851 0837 0778 0108 0271 0370 0476 0565 0572 0514MKT 0501 0630 0705 0748 0749 0707 0047 0184 0285 0385 0467 0472 0563SMB 0654 0662 0580 0500 0429 0370 0076 0138 0133 0117 0103 0102 0215STRev 0506 0526 0511 0530 0568 0550 0003 0013 0026 0049 0106 0158 0438IPGrowth 0508 0508 0501 0502 0508 0502 0000 -0001 -0001 -0002 -0004 -0007 0121BEH PEAD 0486 0512 0478 0492 0511 0517 0004 0012 0018 0027 0049 0072 0619NONDUR 0475 0499 0490 0504 0504 0509 0000 0002 0003 0005 0011 0018 0170SERV 0504 0481 0497 0502 0491 0497 0000 0000 0000 0000 -0001 -0001 0052LIQ TR 0494 0493 0485 0485 0506 0507 0000 0005 0010 0020 0048 0083 0438PE 0495 0494 0502 0490 0494 0492 -0002 -0004 -0005 -0006 -0008 -0009 6401DeltaSLOPE 0478 0490 0506 0498 0482 0511 0000 0000 -0001 -0001 -0002 -0005 0096BW ISENT 0489 0491 0496 0498 0504 0477 0000 0002 0003 0005 0010 0014 0094DEFAULT 0497 0498 0490 0497 0484 0490 0000 0000 0000 0001 0001 0002 0313Oil 0495 0504 0489 0490 0494 0483 0002 0011 0019 0034 0068 0108 0852DIV 0488 0491 0486 0494 0491 0505 0000 0000 0000 -0001 -0002 -0003 0876REAL UNC 0479 0490 0503 0490 0498 0484 0000 0000 0000 0000 0000 0000 0043UNRATE 0483 0499 0484 0490 0490 0486 0000 -0001 -0002 -0003 -0006 -0008 1083TERM 0470 0476 0480 0497 0503 0501 0000 0003 0005 0009 0018 0030 0942HJTZ ISENT 0483 0480 0491 0487 0489 0477 0000 0000 0000 -0001 0000 0000 0210UMD 0531 0537 0536 0495 0422 0384 0028 0069 0089 0100 0102 0108 0646CMA 0499 0504 0491 0495 0466 0442 0001 0000 -0002 -0005 -0008 -0012 0242LIQ NT 0477 0478 0494 0493 0481 0471 -0003 -0005 -0004 -0002 0009 0019 0501FIN UNC 0481 0477 0471 0477 0489 0491 0000 0000 0000 0000 -0001 -0001 0096STOCK ISS 0515 0537 0509 0473 0433 0374 -0032 -0081 -0096 -0103 -0110 -0105 0515NetOA 0504 0504 0481 0489 0423 0402 0005 0015 0022 0032 0040 0042 0544MACRO UNC 0476 0483 0479 0471 0463 0430 0000 0000 0000 0000 0000 0000 0073INV IN ASS 0479 0476 0479 0461 0454 0440 0001 0002 0005 0010 0023 0029 0549COMP ISSUE 0514 0518 0503 0481 0393 0363 0037 0083 0101 0111 0105 0108 0497ACCR 0483 0477 0486 0472 0436 0415 -0009 -0012 -0009 0001 0015 0015 0343HML 0491 0481 0479 0469 0431 0376 -0002 -0001 0001 0004 0008 0010 0251ROA 0517 0514 0499 0473 0399 0319 0041 0105 0141 0162 0154 0123 0551RMW 0508 0481 0483 0465 0397 0347 0002 0008 0013 0017 0017 0016 0219CMA 0560 0530 0499 0432 0351 0292 0031 0056 0062 0058 0051 0044 0351LTRev 0485 0491 0474 0442 0402 0350 -0007 -0025 -0032 -0035 -0036 -0029 0252INTERM CAP RATIO 0499 0477 0452 0451 0380 0324 0027 0037 0054 0082 0103 0104 0790ASS Growth 0473 0470 0458 0439 0380 0327 0001 0005 0005 0000 -0005 -0003 0525DISSTR 0505 0487 0456 0410 0340 0269 0043 0094 0105 0104 0085 0070 0475PERF 0541 0497 0448 0390 0319 0259 -0054 -0079 -0072 -0064 -0054 -0049 0651BAB 0460 0448 0433 0405 0372 0325 -0010 -0012 -0008 0006 0027 0037 0921IA 0497 0465 0425 0385 0325 0257 -0010 -0011 -0009 -0008 -0008 -0007 0409MGMT 0516 0469 0424 0379 0302 0244 0028 0055 0058 0056 0050 0044 0631ROE 0482 0446 0414 0364 0299 0233 0009 0024 0027 0028 0024 0018 0555O SCORE 0475 0426 0403 0368 0281 0229 -0009 -0014 -0015 -0018 -0016 -0012 002BEH FIN 0483 0410 0360 0321 0238 0196 -0016 -0008 -0003 0000 0003 0003 076GR PROF 0459 0405 0366 0314 0249 0206 0011 0005 0008 0011 0012 0014 0199QMJ 0502 0408 0369 0300 0232 0178 0030 0030 0026 0018 0014 0011 0405RMW 0464 0395 0356 0302 0245 0193 0015 0025 0028 0027 0025 0020 0292SKEW 0481 0388 0345 0287 0219 0179 -0001 0000 0000 0000 0000 0000 0438SMB 0336 0305 0319 0312 0311 0277 0019 0032 0041 0047 0054 0051 0257HML DEVIL 0434 0326 0289 0242 0183 0152 0014 0013 0009 0005 0003 0002 0356

Posterior probabilities of factors E [γj |data] and posterior mean of factor risk premia E [λj |data] computed usingthe continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225 quadrillion models Theprior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50 The last column reportssample average returns for the tradable factors The data is monthly 197310 to 201612 Test assets cross-sectionof 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered are described inTable B1 of Appendix B Numbers denoted with the asterisk in the last column correspond to the return on thefactor-mimicking portfolio of the nontradable factor constructed by a linear projection of its values on the set of 51test assets and scaled to have the same volatility as the original nontradable factor

33

substantial support in our empirical analysis First the celebrated Fama-French HML (high-minus-

low) designed to capture the so-called lsquovalue premiumrsquo is a strong determinant of the cross-section

of asset returns For ψ = 10 (a reasonable benchmark) its posterior probability is about 895 and

only for very strong shrinkage (ψ = 1) the posterior probability gets reduced to 639 Second

the market factor in the version of Daniel Mota Rottke and Santos (2018) (MKTlowast that is

meant to have hedged out the unpriced risk contained in the market index) has also high posterior

probability (856 for ψ = 10) Third the simple market factor (MKT) seems also to be a robust

source of priced risk albeit with an empirical performance weaker than MKTlowast Fourth albeit to

a lesser extent SMBlowast the Daniel Mota Rottke and Santos (2018) version of the small-minus-big

Fama-French factor (meant to capture the so called lsquosizersquo premium) seems also to contain relevant

information for pricing the cross-section of asset returns with a posterior probability in the 60-

70 range for most values of ψ Beside the ones above mentioned all other factors have posterior

probabilities of about 50 or less for all values of ψ Interestingly the results are not very sensitive

to the choice of ψ

In addition to the posterior probabilities of the factors Table 7 reports the posterior means

of the factor risk premia computed as Bayesian Model Average ie the weighted average of the

posterior means in each possible factor model specification with weights equal to the posterior

probability of each specification being the true data generating process (see eg Roberts (1965)

Geweke (1999) Madigan and Raftery (1994)) The results are not very sensitive to the choice of

ψ except when considering very small values for of the shrinkage parameter ψ since in this case

posterior means are shrunk toward zero Interestingly the estimated price of risk for the market

factor is positive despite it being very often estimated as a negative quantity when considering

multifactor models and not dissimilar from the market excess return over the same period More

generally there is a clear pattern in cross-sectionally estimated (ie ex post) factor risk premia and

their simple time series average estimates (reported in the last column of Table 7) for the robust

four factors (HML MKTlowast MKT and SMBlowast) ex post risk premia are very similar to the time series

estimnates while the opposite holds true for the other factors In other words robust factors seems

to price themselves well (since theoretically their own beta is one) while other factors donrsquot

IV3 Estimating 26 million sparse factor models

Instead of drawing unrestricted factor models specifications we now constraint models to have a

maximum of 5 factors ndash ie we are imposing sparsity on the implied stochastic discount factor as

in most of the previous empirical asset pricing literature that has tried to identify low dimensional

factor models to explain the cross-section of asset returns Given our set of 51 factors this approach

yields about 26 millions of possible models (ie the equivalent of about 147 million time series and

cross-sectional regressions) Since we now do not sample the possible models posterior probabilities

are computed using the marginal likelihoods of all these models ie the posterior probability of

model γj is computed as

Pr(γj |data) =p(data|γj)983123i p(data|γi)

34

000

1000

2000

3000

4000

5000

6000

7000

8000

1 5 10 20 50 60

Post

erio

r pro

babi

lity

ψ

HML MKT MKT SMB PERF

STOCK_ISS COMP_ISSUE CMA ROA UMD

BEH_PEAD STRev NONDUR DISSTR MGMT

BW_ISENT TERM FIN_UNC LIQ_TR DeltaSLOPE

IPGrowth Oil SERV DEFAULT NetOA DIV PE REAL_UNC HJTZ_ISENT UNRATE

INTERM_CAP_RATIO LIQ_NT QMJ MACRO_UNC INV_IN_ASS

ACCR LTRev CMA ASS_Growth HML

RMW BAB IA O_SCORE ROE

SMB SKEW BEH_FIN GR_PROF RMW

HML_DEVIL

Figure 5 Posterior factor probabilities

Posterior probabilities of factors Pr [γj = 1|data] estimated over the 197310-201612 sample using a cross-section of25 Fama-French size and book-to-market and 30 Industry test asset portfolios computed using the Dirac spike andslab approach of section II22 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models The prior probability of a factor being included is about 1038 The 51 factors considered aredescribed in Table B1 of Appendix B Posterior probabilities are plotted for ψ isin [1 100]

where we have assigned equal prior probability to all possible specifications and p(data|γj) denotes

the marginal likelihood of the j-th model To both simplify the numerical computation and to

illustrate the qualities of the approach we use the Dirac spike-and-slab prior of section II22 since

we can leverage its closed form solution for the marginal likelihoods18

Posterior factor probabilities are reported in Figure 5 and jointly with the Bayesian model

averaging of risk premia across the sparse models in Table C15 of the Appendix Note that in

this case all factors have an ex ante probability of being included equal to 1038 The results

are strikingly similar to the ones in Table 7 and Figure 4 as before only for four factors (HML

MKTlowast MKT and SMBlowast) we observe a marked increase in the posterior probability of inclusion

after observing the data Furthermore these factors seems to price themselves well ndash ie the BMA

of their risk premia are very similar to the sample average of their excess returns ndash while other

factors do not

35

Table 8 Factor Models with highest posterior probability (Dirac spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEADDeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232 983232 983232IABABDISSTRROEMGMT 983232 983232O SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMB

Probability () 00666 00599 00574 00522 00503 00460 00437 00428 00423 00393

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 26 millioncandidate models and a model prior probability of the order of 10minus7 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

36

IV4 A robust factors model

The previous subsections suggest that only a small number of factors (HML MKTlowast MKT and to a

lesser extent SMBlowast) are robust explanators of the cross-section of asset returns Furthermore Table

8 that reports the ten factor model specifications with the highest posterior probabilities19 shows

that these robust factors tend to be almost always included in the most likely models both HML

and MKTlowast are featured in all ten specifications while MKT is excluded only once and SMBlowast is

included in the five most likely specification plus the tenth one Note that the posterior probabilities

in Table 8 might appear small in absolute terms but are actually four orders of magnitude larger

than the prior model probabilities (equal to one over the number of models considered)

Therefore a natural question is whether the four factors identified as robust in the above

analysis do indeed deliver a significantly better cross-sectional asset pricing model We answer this

questions by comparing the performance of a four factor model with HML MKTlowast MKT and SMBlowast

as factors to the one of several notable factor models20 In particular Table 9 reports the model

posterior probabilities for the specifications considered ie the probability of any of these models

being the true data generating process Strikingly for almost any value of ψ the model posterior

probabilities are in the single digits range for all models but the robust factors one the probability

of this specification is always higher than 85 except when using a very strong shrinkage (in which

case it is reduced to 65) Furthermore for ψ in the most salient range (10-20) the posterior

probability of the robust factors model is about 90

Table 9 Posterior probabilities of notable models vs robust factors

ψmodel 1 5 10 20 50 100

CAPM 002 001 001 002 004 008Fama and French (1992) 005 001 001 001 000 000Fama and French (2016) 009 006 005 004 003 002Carhart (1997) 005 001 001 001 001 001Hou Xue and Zhang (2015) 002 000 000 000 000 000Pastor and Stambaugh (2000) 001 000 000 001 001 002Asness Frazzini and Pedersen (2014) 010 003 002 002 002 001Robust Factors Model 065 088 090 090 089 085

Posterior model probabilities for the specifications in the first column for different values of ψ computed using theDirac spike-and-slab prior The models and their factors are described in Appendix B The model in the last rowuses the HML MKT MKTlowast and SBMlowast factors described in Table B1 The data is monthly 197310 to 201612 Testassets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios

18Alternatively one could use the continuous spike-and-slab approach employed in the previous subsection anddrop the draws of specifications with more than 5 factors but this significantly reduces computational efficiency

19Table 8 focuses on the Dirac spike-and-slab specification with ψ = 10 Very similar results are reported in tablesC16-C18 of Appendix C for other values of ψ and for the continuous prior case

20Note that the correlation between MKT and MKTlowast is not too large being about 064

37

IV5 The sparsity of the stochastic discount factor

We have shown that a robust model with only four ndash robust ndash factors is much more likely to

capture the true stochastic discount factor than all the notable models considered (see Table 9)

Nevertheless are only four factors likely to deliver a sufficiently accurate representation of the true

latent stochastic discount factor

Thanks to our Bayesian method this question can be easily answered In particular by using

our estimations of about 225 quadrillion models and their posterior probabilities we can compute

the posterior distribution of the dimensionality of the lsquotruersquo model That is for any integer number

between one and fifty-one we can compute the posterior probability of the SDF being a function

of that number of factors

Figure 6 reports the posterior distributions of the dimensionality of the SDF for various values

of ψ These distributions are also summarized in Table 10 For the most salient values of ψ (10 and

20) the posterior mean of the number of factors in the true SDF is in the 24-25 range and the 95

posterior credible intervals are contained in the 17 to 32 factors range That is there is substantial

evidence that the SDF is dense in the space of the linear factors considered given the factors at

hand a relative large number of them is needed to provide an accurate representation of the true

latent SDF Since most of the literature has focused on very low dimension linear factor models

this finding suggests that most empirical results therein have been affected by a large degree of

misspecification

It is worth noticing that as Figure 6 and Table 10 show for very large ψ ie with basically

a flat prior for factor risk premia the posterior dimensionality is reduced This is due to two

phenomena we have already outlined First if some of the factors are useless (and our analysis

points in this direction) under a flat prior they do tend to have a higher posterior probability and

drive out the true sources of priced risk Second a flat prior for the risk premia can generate a

ldquoBartlett Paradoxrdquo (see the discussion in section II21 and Bartlett (1957))

Table 10 The posterior dimensionality of the stochastic discount factor

ψ1 5 10 20 50 100

mean 2570 2525 2460 2370 2203 2046median 26 25 25 24 22 2025 19 18 18 17 15 145 20 20 19 18 16 1595th 31 31 30 30 28 26975th 33 32 32 31 29 27

Summary statistics of the posterior density of the true model number of factors for various values of ψ Estimated

over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry

test asset portfolios using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1)

38

0 10 20 30 40 50

000

004

008

012

Model Dimensionality

Freq

uenc

y

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 6 Posterior density of the dimensionality of the stochastic discount factor

Posterior density of the true model having the number of factors listed on the horizontal axis Estimated over the

197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30 Industry test asset

portfolios computed using the continuous spike and slab approach of section II23 and 51 factors yielding 251 asymp 225

quadrillion models The prior for each factor inclusion is a Beta(1 1) yielding a prior expectation for γj equal to 50

The 51 factors considered are described in Table B1 of Appendix B Posterior densities are plotted for ψ isin [1 100]

Note that if the factors proposed in the literature were to capture different and uncorrelated

sources of risk one might worry that a SDF that is dense in the space of factors might imply

unrealistically high Sharpe ratios (see eg the discussion in Kozak Nagel and Santosh (2019))

Since given a factor model the SDF-implied maximum Sharpe ratio is just a function of the

factorsrsquo risk premia and covariance matrix our Bayesian method allows to construct the posterior

distribution of the maximum Sharpe ratio for each of the 2 quadrilion models considered Therefore

using the posterior probabilities of each possible model specification we can actually construct

the (Bayesian Model Averaging) posterior distribution of the SDF-implied maximum Sharpe ratio

(conditional on the data only)

Figure 7 and table 11 report respectively the posterior distribution of the (annualized) SDF-

implied maximum Sharpe ratio and its summary statistics for several values of the parameter ψ

Except when a very strong shrinkage (small ψ) is imposed (and hence risk premia and consequently

Sharpe ratios are shrunk toward zero) the posterior distribution of the Sharpe ratio are quite

similar for all values of ψ Furthermore despite the SDF being dense in the space of factors the

posterior maximum Sharpe ratio does not appear to be unrealistically high eg for ψ isin [10 20]

its posterior mean is about 074ndash080 and the 95 posterior credible intervals are in the 050ndash114

range Interestingly Ghosh Julliard and Taylor (2016 2018) provides a non-parametric estimate

of the pricing kernel extracted using an information-theoretic approach and wide cross-sections of

equity portfolios and find SDF-implied maximum Sharpe ratios of very similar magnitude

39

00 05 10 15

01

23

4

Sharpe Ratio

Dens

ity

ψ=1ψ=5ψ=10ψ=20ψ=50ψ=100

Figure 7 Posterior density of the Sharpe ratio of the model-implied stochastic discount factor

Posterior density of the Sharpe ratio of the model-implied stochastic discount factor for various values of ψ isin [1 100]

Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and book-to-market and 30

Industry test asset portfolios computed using the continuous spike and slab approach of section II23 51 factors

described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225 quadrillion possible models

The prior for each factor inclusion is a Beta(1 1)

Table 11 Posterior distribution of the Sharpe ratio of the model-implied stochastic discount factor

ψ1 5 10 20 50 100

mean 044 067 074 080 087 092median 043 066 073 079 086 09125 026 044 050 054 059 0625 029 047 053 058 063 06695th 060 089 099 108 117 124975th 064 094 105 114 124 133

Summary statistics of the posterior distribution of the maximal Sharpe ratio of the model-implied SDF for various

values of ψ isin [1 100] Estimated over the 197310-201612 sample using a cross-section of 25 Fama-French size and

book-to-market and 30 Industry test asset portfolios computed using the continuous spike and slab approach of

section II23 51 factors described in Table B1 of Appendix B and Bayesian Model Averaging of the 251 asymp 225

quadrillion possible models The prior for each factor inclusion is a Beta(1 1)

V Extensions

In additions to the extensions formalized in remarks 1 (on how to handle generated factors such as

principal components and factor mimicking portfolios) and 3 (on how to handle the identification

failure generated by lsquolevel factorsrsquo) our method can be feasibly extended to encompass several

salient generalizations

40

First based on economic considerations one might possibly want to bound the maximum risk

premia (or the maximum Sharpe ratios) associated with the factors This can be achieved by re-

placing the Gaussian distributions in our spike-and-slab priors with (rescaled and centered) Beta

distributions since the latter have bounded support Furthermore for the sake of expositional

simplicity and closed form solutions we have focused on regularising spike and slab priors with

exponential tails Nevertheless our approach that shrinks useless factors based on their correla-

tion with asset returns could be as well implemented using polynomial tailed (ie heavy-tailed)

mixing priors (see Polson and Scott (2011) for a general discussion of priors for regularisation and

shrinkage)21 The rational for using heavy tailed priors is that when the likelihood has thick tails

while the prior has thin tail if the likelihood peak moves too far from the prior mean the posterior

eventually eventually reverts toward the prior This is illustrated in Figure D2 of the Appendix

Nevertheless note that this mechanism (first pointed out in Jeffreys (1961)) is actually desirable

in our settings in order to shrink the risk premia of useless factors toward zero22

Second Lewellen Nagel and Shanken (2010) points out that the first pass time-series regression

is often affected by having a strong factor structure in the residuals Given the hierarchical structure

of our Bayesian approach one can add latent linear components in the time series regression of asset

returns on factors reformulate the time series estimation step as a state-space problem and filter

the latent components (eg via Kalman filter) The posterior sampling of the time series parameters

would then be enriched by the drawing of the added terms as in Bryzgalova and Julliard (2018)

Furthermore one could allow the latent time series factors to be potential priced in the cross-

section (again as in Bryzgalova and Julliard (2018)) This extension would increase the numerical

complexity of the procedure in the time-series step but would nonetheless leave unchanged the

method proposed in this paper at the cross-sectional step (with the only difference that the time

series loadings of the latent factors could be included in the cross-sectional step as if these latent

factors were observable) This extended approach would lead to valid posterior inference and model

selection

Third again thanks to the hierarchical structure of our method time varying time series betas

could be accommodated by adopting the time varying parameters approach of Primiceri (2005)

in the time series step And since in our approach the asset specific expected risk premia are

a parameter estimated in the time series step this extension would also allow for time variation

in asset risk premia Furthermore albeit this would increase the numerical complexity of the

cross-sectional inference step the time varying parameters formulation could also be used for the

modelling of the factor risk premia

21For example albeit alternatives with more desirable properties exist our spike and slab could be implementedusing a Cauchy prior with location parameter set to zero and scale parameter proportional to ψj as defined in equation(16)

22Risk premia have a natural support hence the prior can be chosen (adjusting the paramater ψ) to attach highprobability to it And since useless factors will tend to generate heavy tailed likelihoods with posterior peaks for riskpremia that deviate toward infinity the risk premia of such factors will be shrunk toward the prior mean if the priorhas thin-tails

41

VI Conclusions

We have developed a novel (Bayesian) method for the analysis of linear factor models in asset

pricing The approach can handle the quadrillions of models generated by the zoo of traded and

non traded factors and delivers inference that is robust to the common identification failures and

spurious inference problems caused by useless factors

We have applied our approach to the study of more than two quadrillion factor model spec-

ification and have found that 1) only a handful of factors (the Fama and French (1992) ldquohigh-

minus-lowrdquo proxy for the value premium the market index as well the adjusted versions of the

ldquosmall-minus-bigrdquo size factor and the market factor of Daniel Mota Rottke and Santos (2018))

seem to be robust explanators of the cross-sections of asset returns 2) jointly the four robust fac-

tors provide a model that is compared to the previous empirical literature one order of magnitude

more likely to have generated the observed asset returns (itrsquos posterior probability is about 90)

3) with very high probability the ldquotruerdquo latent stochastic discount factor is dense in the space of

factors proposed in the previous literature ie capturing its characteristics requires the use of 24-25

factors (at the posterior mean of the SDF sparsity) 4) despite being dense in the space of factors

the SDF-implied maximum Sharpe ratio is not excessive suggesting a high degree of commonality

in terms of captured risks among the factors in the zoo

As a byproduct of our novel framework for empirical asset pricing we provide a very simple

Bayesian version of the Fama and MacBeth (1973) regression method (BFM) We show that this

simple procedure (that does neither require optimisation nor tuning parameters and is not harder

to implement than eg the Shanken (1992) correction for standard errors) makes useless factors

easily detectable in finite sample In extensive simulations the BFM and its GLS analogue (BFM-

GLS) perform well even with relatively small time and large cross-sectional dimensions We apply

BFM and BFM-GLS to several notable factor models and document that a range of non-traded

factors such as consumption proxies labour factors or the consumption-to-wealth ratio are only

weakly identified at best and are characterised by a substantial degree of model misspecification

and uncertainty

Finally thanks to its hierarchical structure our framework is extremely flexible and can accom-

modate and deliver robust inference in the presence of 1) pre-estimated factors (eg mimicking

portfolios and principal components) 2) latent priced and unpriced factors 3) time varying betas

as well as asset and factor risk premia

42

References

Allena R (2019) ldquoComparing Asset Pricing Models with Traded and Non-Traded Factorsrdquo working paper

Asness C and A Frazzini (2013) ldquoThe Devil in HMLs Detailsrdquo Journal of Portfolio Management 39 49ndash68

Asness C S A Frazzini and L H Pedersen (2019) ldquoQuality minus Junkrdquo Review of Accounting Studies24(1) 34ndash112

Avramov D and G Zhou (2010) ldquoBayesian Portfolio Analysisrdquo Annual Review of Financial Economics 225ndash47

Baker M and J Wurgler (2006) ldquoInvestor Sentiment and the Cross-Section of Stock Returnsrdquo Journal ofFinance 61 1645ndash80

Baks K P A Metrick and J Wachter (2001) ldquoShould Investors Avoid All Actively Managed Mutual FundsA Study in Bayesian Performance Evaluationrdquo Journal of Finance 56 45ndash85

Ball R (1985) ldquoAnomalies in Relationships between Securitiesrsquo Yields and Yield-Surrogatesrdquo Journal of FinancialEconomics 6 103ndash126

Barillas F and J Shanken (2018) ldquoComparing Asset Pricing Modelsrdquo The Journal of Finance 73(2) 715ndash754

Bartlett M S (1957) ldquoComment on a Statistical Paradox by DV Lindleyrdquo Biometrika 44(1-2) 533ndash534

Basu S (1977) ldquoInvestment Performance of Common Stocks in Relation to their Price-Earnings Ratio A Test ofthe Efficient Market Hypothesisrdquo Journal of Finance 32 663ndash82

Belloni A V Chernozhukov and C Hansen (2014) ldquoInference on treatment effects after selection amonghigh-dimensional controlsrdquo The Review of Economic Studies 81 608ndash650

Breeden D T M R Gibbons and R H Litzenberger (1989) ldquoEmpirical Test of the Consumption-OrientedCAPMrdquo The Journal of Finance 44(2) 231ndash262

Bryzgalova S (2015) ldquoSpurious Factors in Linear Asset Pricing Modelsrdquo working paper

Bryzgalova S and C Julliard (2018) ldquoConsumption in Asset Returnsrdquo London School of EconomicsManuscript

Campbell J Y (1996) ldquoUnderstanding Risk and Returnrdquo Journal of Political Economy 104(2) 298ndash345

Campbell J Y J Hilscher and J Szilagyi (2008) ldquoIn Search of Distress Riskrdquo Journal of Finance 632899ndash2939

Carhart M M (1997) ldquoOn Persistence in Mutual Fund Performancerdquo The Journal of Finance 52(1) 57ndash82

Chan K N-F Chen and D A Hsieh (1985) ldquoAn Exploratory Investigation of the Firm Size Effectrdquo Journalof Financial Economics 14 451ndash71

Chen L R Novy-Marx and L Zhang (2010) ldquoA New Three-Factor Modelrdquo working paper

Chen N-F S A Ross and R Roll (1986) ldquoEconomic Forces and the Stock Marketrdquo Journal of Business 59383ndash403

Chernov M L Lochstoer and S Lundeby (2019) ldquoConditional Dynamics and the Multi-Horizon Risk-ReturnTrade-Offrdquo working paper

Chib S X Zeng and L Zhao (forthcoming) ldquoOn Comparing Asset Pricing Modelsrdquo The Journal of Finance

Cochrane J H (2005) Asset Pricing Revised Edition Princeton University Press

Cooper M J H Gulen and M J Schill (2008) ldquoAsset Growth and the Cross-Section of Stock ReturnsrdquoJournal of Finance 63 1609ndash52

Cremers K M (2002) ldquoStock Return Predictability A Bayesian Model Selection Perspectiverdquo The Review ofFinancial Studies 15(4) 1223ndash1249

Daniel K D Hirshleifer and L Sun (2019) ldquoShort- and Long-Horizon Behavioral Factorsrdquo Review ofFinancial Studies

Daniel K L Mota S Rottke and T Santos (2018) ldquoThe Cross-Section of Risk and Returnrdquo workingpaper

Daniel K and S Titman (2006) ldquoMarket reactions to tangible and intangible informationrdquo Journal of Finance61 1605ndash43

Fabozzi F D Huang and G Zhou (2010) ldquoRobust Portfolios Contributions from Operations Research andFinancerdquo Annals of Operations Research 172 191ndash220

43

Fama E F and K French (2008) ldquoDissecting Anomaliesrdquo Journal of Finance 63 1653ndash78

Fama E F and K R French (1992) ldquoThe Cross-Section of Expected Stock Returnsrdquo The Journal of Finance47 427ndash465

(1993) ldquoCommon Risk Factors in the Returns on Stocks and Bondsrdquo The Journal of Financial Economics33 3ndash56

Fama E F and K R French (2015) ldquoA Five-Factor Asset Pricing Modelrdquo Journal of Financial Economics116 1ndash22

(2016) ldquoDissecting Anomalies with a Five-Factor Modelrdquo Review of Financial Studies 29 69ndash103

Fama E F and J D MacBeth (1973) ldquoRisk Return and Equilibrium Empirical Testsrdquo Journal of PoliticalEconomy 81(3) 607ndash636

Ferson W and C Harvey (1991) ldquoThe Variation of Economic Risk Premiumsrdquo Journal of Political Economy99 385ndash415

Frazzini A and L H Pedersen (2014) ldquoBetting against Betardquo Journal of Financial Economics 111(1) 1ndash25

Garlappi L R Uppal and T Wang (2007) ldquoPortfolio Selection with Parameter and Model Uncertainty AMulti-Prior Approachrdquo Review of Financial Studies 20 41ndash81

George E I and R E McCulloch (1993) ldquoVariable Selection via Gibbs Samplingrdquo Journal of the AmericanStatistical Association 88 881ndash889

(1997) ldquoApproaches for Bayesian Variable Selectionrdquo Statistica Sinica 7 339ndash373

Gertler M and E Grinols (1982) ldquoUnemployment Inflation and Common Stock Returnsrdquo Journal of MoneyCredit and Banking 14 216ndash33

Geweke J (1999) ldquoUsing Simulation Methods for Bayesian Econometric Models Inference Development andCommunicationrdquo Econometric Reviews 18(1) 1ndash73

Ghosh A C Julliard and A Taylor (2016) ldquoWhat is the Consumption-CAPM missing An Information-Theoretic Framework for the Analysis of Asset Pricing Modelsrdquo Review of Financial Studies 30 442ndash504

(2018) ldquoAn Information-Theoretic Asset Pricing Modelrdquo London School of Economics Manuscript

Giannone D M Lenza and G E Primiceri (2018) ldquoEconomic Predictions with Big Data The Illusion ofSparsityrdquo FRB of New York Staff Report No 847

Gibbons M R S A Ross and J Shanken (1989) ldquoA Test of the Efficiency of a Given Portfoliordquo Econometrica57(5) 1121ndash1152

Giglio S G Feng and D Xiu (2019) ldquoTaming the factor Zoordquo Journal of Finance forthcoming

Giglio S and D Xiu (2018) ldquoAsset Pricing with Omitted Factorsrdquo working paper

Gospodinov N R Kan and C Robotti (2014) ldquoMisspecification-Robust Inference in Linear Asset-PricingModels with Irrelevant Risk Factorsrdquo The Review of Financial Studies 27(7) 2139ndash2170

(2019) ldquoToo Good to be True Fallacies in Evaluating Risk Factor Modelsrdquo Journal of Financial Economics132(2) 451ndash471

Goyal A Z L He and S-W Huh (2018) ldquoDistance-Based Metrics A Bayesian Solution to the Power andExtreme-Error Problems in Asset-Pricing Testsrdquo Swiss Finance Institute Research Paper No 18-78 available atSSRN httpsssrncomabstract=3286327

Hall R E (1978) ldquoThe Stochastic Implications of the Life Cycle Permanent Income Hypothesisrdquo Journal ofPolitical Economy 86(6) 971ndash87

Harvey C and Y Liu (2019) ldquoCross-sectional Alpha Dispersion and Performance Evaluationrdquo Journal ofFinancial Economics forthcoming

Harvey C and G Zhou (1990) ldquoBayesian Inference in Asset Pricing Testsrdquo Journal of Financial Economics26 221ndash254

Harvey C R (2017) ldquoPresidential Address The Scientific Outlook in Financial Economicsrdquo The Journal ofFinance 72(4) 1399ndash1440

Harvey C R Y Liu and H Zhu (2016) ldquo and the Cross-Section of Expected Returnsrdquo The Review ofFinancial Studies 29(1) 5ndash68

He A D Huang and G Zhou (2018) ldquoNew Factors Wanted Evidence from a Simple Specification Testrdquoworking paper available at SSRN httpsssrncomabstract=3143752

44

He Z B Kelly and A Manela (2017) ldquoIntermediary Asset Pricing New Evidence from Many Asset ClassesrdquoJournal of Financial Economics 126 1ndash35

Hirshleifer D H Kewei S Teoh and Y Zhang (2004) ldquoDo Investors Overvalue Firms with Bloated BalanceSheetsrdquo Journal of Accounting and Economics 38 297ndash331

Hoeting J A D Madigan A E Raftery and C T Volinsky (1999) ldquoBayesian Model Averaging ATutorialrdquo Statistical Science 14(4) 382ndash401

Hou K C Xue and L Zhang (2015) ldquoDigesting Anomalies An Investment Approachrdquo The Review of FinancialStudies 28(3) 650ndash705

Huang D F Jiang J Tu and G Zhou (2015) ldquoInvestor Sentiment Aligned A Powerful Predictor of StockReturnsrdquo Review of Financial Studies 28 791ndash837

Huang D J Li and G Zhou (2018) ldquoShrinking Factor Dimension A Reduced-Rank Approachrdquo workingpaper available at SSRN httpsssrncomabstract=3205697

Ishwaran H J S Rao et al (2005) ldquoSpike and Slab Variable Selection Frequentist and Bayesian StrategiesrdquoThe Annals of Statistics 33(2) 730ndash773

Jagannathan R and Z Wang (1996) ldquoThe Conditional CAPM and the Cross-Section of Expected ReturnsrdquoThe Journal of Finance 51(1) 3ndash53

Jeffreys H (1961) Theory of Probability Oxford Calendron Press

Jegadeesh N and S Titman (1993) ldquoReturns to Buying Winners and Selling Losers Implications for StockMarket Efficiencyrdquo Journal of Finance 48 65ndash91

(2001) ldquoProfitability of Momentum Strategies An Evaluation of Alternative Explanationsrdquo Journal ofFinance 56 699ndash720

Jones C and J Shanken (2005) ldquoMutual Fund Performance with Learning across Fundsrdquo Journal of FinancialEconomics 78 507ndash552

Jurado K S Ludvigson and S Ng (2015) ldquoMeasuring Uncertaintyrdquo American Economic Review 105 1177ndash1215

Kan R and C Zhang (1999a) ldquoGMM Tests of Stochastic Discount Factor Models with Useless Factorsrdquo Journalof Financial Economics 54(1) 103ndash127

(1999b) ldquoTwo-Pass Tests of Asset Pricing Models with Useless Factorsrdquo the Journal of Finance 54(1)203ndash235

Kass R E and A E Raftery (1995) ldquoBayes Factorsrdquo Journal of the American Statistical Association 90(430)773ndash795

Kelly B S Pruitt and Y Su (2019) ldquoCharacteristics are covariances A unified model of risk and returnrdquoJournal of Financial Economics 134 501ndash524

Kleibergen F (2009) ldquoTests of Risk Premia in Linear Factor Modelsrdquo Journal of Econometrics 149(2) 149ndash173

Kleibergen F and Z Zhan (2015) ldquoUnexplained Factors and their Effects on Second Pass R-squaredsrdquo Journalof Econometrics 189(1) 101ndash116

Kozak S S Nagel and S Santosh (2019) ldquoShrinking the Cross-Sectionrdquo Journal of Financial Economicsforthcoming

Lancaster T (2004) Introduction to Modern Bayesian Econometrics Wiley

Langlois H (2019) ldquoMeasuring Skewness Premiardquo Journal of Financial Economics

Lettau M and S Ludvigson (2001) ldquoResurrecting the (C) CAPM A Cross-Sectional Test when Risk Premiaare Time-Varyingrdquo Journal of political economy 109(6) 1238ndash1287

Lewellen J S Nagel and J Shanken (2010) ldquoA Skeptical Appraisal of Asset Pricing Testsrdquo Journal ofFinancial Economics 96(2) 175ndash194

Lintner J (1965) ldquoSecurity Prices Risk and Maximal Gains from Diversificationrdquo Journal of Finance 20587ndash615

Ludvigson S S Ma and S Ng (2019) ldquoUncertainty and Business Cycles Exogenous Impulse or EndogenousResponserdquo American Economic Journal Macroeconomics

Madigan D and A E Raftery (1994) ldquoModel Selection and Accounting for Model Uncertainty in GraphicalModels Using Occamrsquos Windowrdquo Journal of the American Statistical Association 89(428) 1535ndash1546

45

Novy-Marx R (2013) ldquoThe Other Side of Value The Gross Profitability Premiumrdquo Journal of Financial Eco-nomics 108 1ndash28

Ohlson J A (1980) ldquoFinancial Ratios and the Probabilistic Prediction of Bankruptcyrdquo Journal of AccountingResearch 18 109ndash131

Pastor L (2000) ldquoPortfolio Selection and Asset Pricing Modelsrdquo The Journal of Finance 55(1) 179ndash223

Pastor L and R F Stambaugh (2000) ldquoComparing Asset Pricing Models an Investment Perspectiverdquo Journalof Financial Economics 56(3) 335ndash381

Pastor L and R F Stambaugh (2002) ldquoInvesting in Equity Mutual Fundsrdquo Journal of Financial Economics63 351ndash380

Pastor L and R F Stambaugh (2003) ldquoLiquidity Risk and Expected Stock Returnsrdquo Journal of Politicaleconomy 111(3) 642ndash685

Phillips D L (1962) ldquoA Technique for the Numerical Solution of Certain Integral Equations of the First KindrdquoJ ACM 9(1) 84ndash97

Polson N G and J G Scott (2011) ldquoShrink Globally Act Locally Sparse Bayesian Regularization and Pre-dictionrdquo in Bayesian Statistics 9 ed by J M Bernardo M J Bayarri J O Berger A P Dawid D HeckermanA F M Smith and M West Oxford University Press

Primiceri G E (2005) ldquoTime Varying Structural Vector Autoregressions and Monetary Policyrdquo Review of Eco-nomic Studies 72(3) 821ndash852

Raftery A E D Madigan and J A Hoeting (1997) ldquoBayesian Model Averaging for Linear RegressionModelsrdquo Journal of the American Statistical Association 92(437) 179ndash191

Ritter J R (1991) ldquoThe Long-Run Performance of Initial Public Offeringsrdquo Journal of Finance 46 3ndash27

Roberts H V (1965) ldquoProbabilistic Predictionrdquo Journal of the American Statistical Association 60(309) 50ndash62

Shanken J (1987) ldquoA Bayesian Approach to Testing Portfolio Efficiencyrdquo Journal of Financial Economics 19195ndash215

(1992) ldquoOn the Estimation of Beta-Pricing Modelsrdquo The Review of Financial Studies 5 1ndash33

Sharpe W F (1964) ldquoCapital Asset Prices A Theory of Market Equilibrium under Conditions of Riskrdquo Journalof Finance 19(3) 425ndash42

Sims C A (2007) ldquoThinking about Instrumental Variablesrdquo mimeo

Sloan R (1996) ldquoDo Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future EarningsrdquoThe Accounting Review 71 289ndash315

Stambaugh R F and Y Yuan (2016) ldquoMispricing Factorsrdquo Review of Financial Studies 30 1270ndash1315

Stock J H (1991) ldquoConfidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time SeriesrdquoJournal of Monetary Economics 28(3) 435ndash459

Tikhonov A A Goncharsky V Stepanov and A G Yagola (1995) Numerical Methods for the Solutionof Ill-Posed Problems Mathematics and Its Applications Springer Netherlands

Titman S K J Wei and F Xie (2004) ldquoCapital Investments and Stock Returnsrdquo Journal of Financial andQuantitative Analysis 39 677ndash700

Uppal R P Zaffaroni and I Zviadadze (2018) ldquoCorrecting Misspecified Stochastic Discount Factorsrdquoworking paper

Yogo M (2006) ldquoA Consumption-Based Explanation of Expected Stock Returnsrdquo Journal of Finance 61(2)539ndash580

46

A Additional Derivations

A1 Derivation of the posterior distribution in section II1

Letrsquos consider first the time-series regression We assume that 983171tiidsim N (0Σ) or 983171 sim MVN (0TtimesN Σotimes

IT ) The likelihood of data (RF ) is therefore

p(data|BΣ) = (2π)minusNT2 |Σ|minus

T2 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

After assigning the Jeffreysrsquo prior for (BΣ) π(BΣ) prop |Σ|minusN+1

2 we simplify the likelihood

function by exploiting the fact that OLS estimated residuals are orthogonal to the regressors

(Rminus FB)⊤(Rminus FB) = [Rminus FBols minus F (B minus Bols)]⊤[Rminus FBols minus F (B minus Bols)]

= (Rminus FBols)⊤(Rminus FBols) + (B minus Bols)

⊤F⊤F (B minus Bols)

= T Σ+ (B minus Bols)⊤F⊤F (B minus Bols)

where

Bols =

983075a⊤

β⊤

983076= (F⊤F )minus1F⊤R Σols =

1

T(Rminus FBols)

⊤(Rminus FBols)

Therefore the posterior distribution in the first step is

p(BΣ|data) prop (2π)minusNT2 |Σ|minus

T+N+12 exp

983069minus1

2tr[Σminus1(Rminus FB)⊤(Rminus FB)]

983070

prop |Σ|minusT+N+1

2 eminus12tr[Σminus1(T Σ)]eminus

12tr[Σminus1(BminusBols)

⊤F⊤F (BminusBols)]

Hence the posterior distribution of B conditional on data and Σ is

p(B|Σ data) prop exp

983069minus1

2tr[Σminus1(B minus Bols)

⊤F⊤F (B minus Bols)]

983070

and the above is the kernel of the multivariate normal in equation (8)

If we further integrate out B it is easy to show that

p(Σ|data) prop |Σ|minusT+NminusK

2 exp

983069minus1

2tr[Σminus1(T Σ)]

983070

Therefore the posterior distribution of Σ is the inverse-Wishart in equation (9)

Recall that β = (1N βf ) λ⊤ = (λc λ⊤

f ) If we assume that the pricing error αi follows an

independent and idencitical normal distribution N (0σ2) the likelihood function in the second step

is

p(data|λσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

where data in the second step include (aβf ) drawn from the first step Assuming the diffuse

47

Jeffreysrsquo prior π(λσ2) prop 1σ2 the posterior distribution of (λσ2) is

p(λσ2|dataBΣ) prop (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ)⊤(aminus βλ)

983070

= (σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βλ+ β(λminus λ))⊤(aminus βλ+ β(λminus λ))

983070

= (σ2)minusN+2

2 exp

983069minusN σ2

2σ2

983070exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

there4 p(λ|σ2 dataBΣ) prop exp

983083minus(λminus λ)⊤β⊤β(λminus λ)

2σ2

983084

where λ = (β⊤β)minus1β⊤a and σ2 = (aminusβλ)⊤(aminusβλ)N Therefore the posterior conditional distribution

of λ is the one in equation (12) Finally we can derive the posterior distribution of σ2 by integrating

out λ

p(σ2|dataBΣ) =

983133p(λσ2|dataBΣ)dσ2 prop (σ2)minus

NminusK+12 exp

983069minusN σ2

2σ2

983070

hence obtaining the posterior distribution in equation (13)

A2 Non-spherical pricing errors

Our framework can also easily accommodate non-spherical cross-sectional pricing errors To see

this note that under the null of the model we can rewrite equation (1) as Rt = βλ + βfft + 983171t

Consider the cross-sectional regression and let ET define the sample mean operator Since ET [Rt] =

βλ+ET [βfft]+ET [983171t] = βλ+ET [983171t] the pricing error α should be equal to ET [983171t] Hence under

the hypothesis that the model is correctly specified and in the spirit of the central limit theorem

a suitable distributional assumption for the pricing errors α in the second step is

α|Σ sim N

9830610N

1

983062

Implying the distribution of R equiv ET [Rt] sim N (βλ 1T Σ) The likelihood function in the second step

is then

p(data|λ) = (2π)minusN2

9830559830559830559830551

983055983055983055983055minus 1

2

exp

983069minusT

2(Rminus βλ)⊤Σminus1(Rminus βλ)

983070

prop exp

983069minusT

2(λ⊤β⊤Σminus1βλminus 2λ⊤β⊤Σminus1R)

983070

Hence we can now define the following estimator

Definition 3 (Bayesian Fama-MacBeth GLS2 (BFM-GLS2)) The BFM-GLS2 posterior dis-

tribution of λ is

λ|dataBΣ sim N (λΣλ) (18)

48

where λ = (β⊤Σminus1β)minus1β⊤Σminus1R Σλ = 1T (β

⊤Σminus1β)minus1 and where β and Σ are drawn from the

Normal-inverse-Wishart in (8)-(9)

In the above the conditional expectation λ = (β⊤Σminus1β)minus1β⊤Σminus1R is essentially the Fama-

MacBeth GLS estimate of λ and Σλ is just the standard covariance matrix of the GLS estimates

Different from our OLS version we incorporate the uncertainty of R by drawing λ from the normal

distribution with variance matrix 1T (β

⊤Σminus1β)minus1

In this case two forces are at play to make useless factors detectable in finite sample First as

for the BFM and BFM-GLS cases useless factors will generate posterior draws with diverging 983141λand flipping sing Second differently from our OLS version we incorporate the uncertainty of R

by drawing λ from the normal distribution with variance matrix 1T (β

⊤Σminus1β)minus1

A3 Formal derivation of the flat prior pitfall for risk premia

Following the derivation in section A1 the likelihood function in the second step is

p(data|γλσ2) = (2πσ2)minusN2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070(19)

Assigning a Jeffreysrsquo prior to the parameters23 (λσ2) the marginal likelihood function conditional

on model index γ is

p(data|γ) =983133 983133

p(data|γλσ2)π(λσ2|γ)dλdσ2

prop983133 983133

(σ2)minusN+2

2 exp

983069minus 1

2σ2(aminus βγλγ)

⊤(aminus βγλγ)

983070dλdσ2

=

983133 983133(σ2)minus

N+22 exp

983083minusN σ2

γ

2σ2

983084exp

983083minus(λγ minus λγ)

⊤β⊤γ βγ(λγ minus λγ)

2σ2

983084dλdσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

983133(σ2)minus

Nminuspγ+1

2 exp

983083minusN σ2

γ

2σ2

983084dσ2

= (2π)pγ+1

2 |β⊤γ βγ |minus

12

Γ(Nminuspγ+1

2 )

(N σ2

γ

2 )Nminuspγ+1

2

where λγ = (β⊤γ βγ)

minus1β⊤γ a σ

2γ =

(aminusβγ λγ)⊤(aminusβγ λγ)N and Γ denotes the Gamma function

A4 Proof of Remark 2

Proof Consider two nested linear factor models γ and γ prime The only difference between γ and γ prime

is γp γp equals 1 in model γ but 0 in model γ prime Let γminusp denote a (K minus 1) times 1 vector of model

index excluding γp γ⊤ = (γ⊤minusp 1) and γ prime⊤ = (γ⊤

minusp 0) Suppose that we rearrange the ordering of

23More precisely the priors for (λσ2) are π(λγ σ2) prop 1

σ2 and λminusγ = 0

49

factors such that factor p is the last one To begin with we introduce some matrix notations

βγ = (βγprime βp) Dγ =

983075Dγprime

1ψp

983076 β⊤

γ βγ +Dγ =

983075β⊤γprimeβγprime +Dγprime β⊤

γprimeβp

β⊤p βγprime β⊤

p βp + 1ψp

983076

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|β⊤γ βγ +Dγ | = |β⊤

γprimeβγprime +Dγprime |times |β⊤p βp +

1

ψpminus β⊤

p βγprime(β⊤γprimeβγprime +Dγprime)minus1β⊤

γprimeβp|

|Dγ | = |Dγprime |times 1

ψp

Equipped with the above we have by direct calculation

p(data|γj = 1γminusj)

p(data|γj = 0γminusj)=

|Dγ |12

|β⊤γ βγ+Dγ |

12

1983059

SSRγ2

983060N2

|Dγprime |

12

|β⊤γprimeβγprime+D

γprime |12

1983061

SSRγprime2

983062N2

=

983061SSRγprime

SSRγ

983062N2983061|Dγ ||Dγprime |

983062 12

983075|β⊤

γprimeβγprime +Dγprime ||β⊤

γ βγ +Dγ |

983076 12

=

983061SSRγprime

SSRγ

983062N2

ψminus 1

2p

983055983055983055983055β⊤p βp +

1

ψpminus β⊤

p βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprimeβp

983055983055983055983055minus 1

2

=

983061SSRγprime

SSRγ

983062N29830611 + ψpβ

⊤p

983063IN minus βγprime

983059β⊤γprimeβγprime +Dγprime

983060minus1β⊤γprime

983064βp

983062minus 12

where β⊤p

983147IN minus βγprime(β⊤

γprimeβγprime +Dγprime)minus1β⊤γprime

983148βp = minb(βpminusβγprimeb)⊤(βpminusβγprimeb)+ b⊤Dγprimeb which

is the minimal value of the penalised sum of squared errors when we use βγprime to predict βp

A5 Confidence Intervals for R2 in Fama-MacBeth Estimation

In the spirit of Stock (1991) and Lewellen Nagel and Shanken (2010) for each of the simulations

we use the following approach to construct the confidence interval for cross-sectional R2 produced

by the standard Fama-MacBeth estimation

First we choose a hypothetical true cross-sectional R2 denoted by R2h The expected test asset

returns E[Rt] are assumed to follow E[Rt] = hβλ+α where β and λ are sample estimates of β

and λ from historical data and αiiidsim N (0σ2

e) The constants h and σ2e are chosen to match the

hypothetical cross-sectional R2

Let R denote the vector of historical average asset returns and V arN (x) denote the sample

cross-sectional variance of vector x The population cross-sectional R2 is therefore

R2h =

h2V arN (βλ)

V arN (E[Rt])=

h2V arN (βλ)

h2V arN (βλ) + σ2e

At the same time wersquod like to maintain the historical cross-sectional dispersion of test asset returns

50

so we further have the following equation

V arN (E[Rt]) = h2V arN (βλ) + σ2e = V arN (ET [Rt])

h2 =R2

hV arN (ET [Rt])

V arN (βλ)

σ2e = (1minusR2

h)V arN (ET [Rt])

Solving the system for h and σ2e we simulate a vector of pricing errors from a normal distribution

αiiidsim N (0σ2

e) Since the sample variance of the draws αi is generally different from σ2e with a non-

zero cross-sectional covariance with β we adjust the vector α by (1) subtracting CovN (βλα)

V arN (βλ)βλ

from α such as the sample covariance between α and βλ is zero (2) multiplying α by σeσN (α) in

order that sample cross-sectional variance of α equals σ2e

Second we simulate a random sample of both factors and test asset returns Factors are drawn

with replacement from their empirical sample while test assets returns Rt are assumed to follow a

parametric normal distribution that is

Rt|ftiidsim N (hβλ+α+ βft Σ)24

We then use Fama-MacBeth two-step approach to estimateR2 for every simulated sample ftRtTt=1

The second step is repeated for 1000 times For each hypothetical R2 between 0 and 1 we find the

(5 95) confidence interval in the simulation Finally build a 90 confidence interval for R2 by

including those values of hypothetically true R2 whose 90 confidence interval contains R2

The confidence intervals for GLS R2 can be found in a similar way with the only difference

of focusing on cross-sectional R2 for a linear combination of Rt ie Σminus 12Rt Let Rt = Σminus 1

2Rt

β = Σminus 12 β and 983171t = Σminus 1

2 983171tiidsim N (0 IN ) The simulation then relies on Rt rather than Rt

Rt|ftiidsim N (hβλ+α+ βft IN )

A6 Large N behavior

In this section we investigate the properties of the BFM procedure in estimating the risk premia

and R-squared as well successfully identifying irrelevant factors in the model when applied to a

large cross-section For the sake of brevity here we present the overview of the simulation results

with Tables C4 - C9 summarizing the output

We consider the same simulation design as described at the beginning of Section III except

for the choice of the cross-section of test assets which time series and cross-sectional features we

mimic In our baseline case in the previous subsections we built a cross-section to emulate the 25

Fama-French portfolios sorted by size and value Now instead we consider the properties of the

following composite cross-sections to simulate returns

24Σ is the sample estimate of covariance matrix of random errors in time-series regression in equation (1)

51

(a) N = 55 25 Fama-French portfolios sorted by size and value and 30 industry portfolios

(b) N = 100 25 Fama-French portfolios sorted by size and value 30 industry portfolios 25

profitability and investment portfolios 10 momentum portfolios and 10 long-term reversal

portfolios

The rest of the simulation design stays unchanged ie the strong factor mimics the behavior of

HML with its betas and risk premia corresponding to their in-sample values cross-sectional R2adj

as well as portfolio average returns and variance of the residuals

Tables C4 and C7 focus on the case of including only a strong factor into the model that is

inherently misspecified and estimated on a large cross-section (N = 55 and N = 100 respectively)

Risk premia estimates recovered by BFM are centered around the pseudo-true values and confi-

dence intervals produced by the posterior coverage have the correct size Confidence intervals for

R2adj are equally centered around the true values and overall become quite tight for a large sample

size (eg T ge 600) The GLS version of the cross-sectional fit is slightly lower than than the true

value and this is largely due to the estimation errors in the weight matrix that are particularly

important for a large cross-section but overall are very close to the true values

Tables C5 and C8 focus on the model with just a useless factor As expected the standard

Fama-MacBeth estimation yields confidence intervals for the risk premia with wrong size rejecting

the zero risk premia for a useless factor with increasing frequency as T becomes large In contrast

the BFM inference remains valid with its empirical rejection rates being close to the true size of

the tests

Finally Tables C6 and C9 consider the mixed case of estimating a model that includes both

strong and useless factors Again BFM correctly identifies the presence of a spurious factor in

the specification and if anything becomes somewhat more conservative underrejecting the zero

risk premia associated with it This conservative inference is also shared by the estimates of the

strong factor risk premia with the latter being particularly evident in a large sample estimation

(T = 20 000 N = 100) Again the reason for this additional estimation uncertainty seems to lie

in the large N behavior of the cross-sectional regression (betas and the weight matrix in case of

the GLS) The posterior of a cross-sectional measure of fit remains relatively tight around the true

value

B Data

CAPM Following Sharpe (1964) and Lintner (1965) the only risk factor is the excess return on

market portfolio which is proxied by a value-weighted portfolio of all CRSP firms incorporated in

US and listed on the NYSE AMEX or NASDAQ We use 1-month Treasury rate as a proxy for

risk-free rate The data comes from Ken Frenchrsquos website

Fama-French 3 factor model Fama and French (1992) extend CAPM by introducing two

additional factors SMB and HML where SMB is the return difference between portfolios of stocks

52

with small and large market capitalizations and HML is the return difference between portfolios of

stocks with high and low book-to-market ratio Again the data comes from Ken Frenchrsquos website

Carhart (1997) This paper extends Fama-French 3 factor model by including a momentum

factor UMD (also available from Ken French website)

q-factor model Hou Xue and Zhang (2015) introduce a four-factor model that includes market

excess return a new size factor (ME) proxied by the return difference between large and small

stocks an investment factor (IA) proxied by the return difference of stocks with high and low

investment-to-asset ratio and finally the profitability factor (ROE) created by sorting stocks based

on their return-on-equity ratio We receive the data from the authors An alternative is Fama-

French five factor model but we only show the result of the first one since factors in these two

papers are extremely similar

Liquidity Pastor and Stambaugh (2003) created a liquidity factor based on the fact that order

flows result in larger return reversals when liquidity is lower We download the monthly tradable

liquidity factor from Stambaughrsquos website

Quality-minus-junk Asness Frazzini and Pedersen (2019) introduced the QMJ factor and

demonstrated that profitable growing well-managed companies referred to as lsquoqualityrsquo firms

command a higher rate of return The factor is available from AQR data library

CCAPM Real growth rate in nondurable consumption per capita (quarterly data) is computed

from the data on consumption levels available at St Louis FRED

Scaled CAPM Lettau and Ludvigson (2001) considered a conditional SDFmt+1 = atminusbtMKTt+1

where at and bt are linear function of conditional information cayt We download cayt from the

authorsrsquo website

Scaled CCAPM Similar to conditional CAPM but the SDF includes nondurable consumption

growth instead of the market factor

HC-CCAPM Jagannathan and Wang (1996) added a labor factor to CAPM Following their

paper we compute the returns on human capital as

Rlabort =

Ltminus1 + Ltminus2

Ltminus2 + Ltminus3minus 1 (20)

where Lt is the disposable labor income per capita

Scaled HC-CCAPM Lettau and Ludvigson (2001) extend Jagannathan and Wang (1996) by

considering a conditional SDF in which cay is the only conditional information

Durable consumption model Yogo (2006) emphasized the role of durable consumption goods

in explaining high returns of small stocks and value stocks We consider a three-factor model

market excess return non-durable consumption growth and durable (real) consumption growth Cd

(seasonally adjusted at annual rates) The data is from Yogorsquos website 1952Q1 to 2001Q4

53

B1 Factor list

Table B1 List of the factors for cross-sectional asset pricing models

Factor ID Factor name and description Reference Sourceconstruction

MKT Market excess return Sharpe (1964) Lintner(1965)

Ken French website

SMB Size factor constructed as a long-shortportfolio of stocks sorted by their mar-ket cap (small-minus-big)

Fama and French (1992) Ken French website

HML Value factor constructed as a long-short portfolio of stocks sorted by theirbook-to-market ratio (high-minus-low)

Fama and French (1992) Ken French website

RMW Profitability factor constructed as along-short portfolio of stocks sorted bytheir profitability (robust-minus-weak)

Fama and French (2015) Ken French website

CMA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment activity (conservative-minus-aggressive)

Fama and French (2015) Ken French website

UMD Momentum factor constructed as along-short portfolio of stocks sorted bytheir 12-2 cumulative previous return(up-minus-down)

Carhart (1997) Je-gadeesh and Titman(1993)

Ken French website

STREV Short-term reversal factor constructedas a long-short portfolio of stockssorted by their previous month return

Jegadeesh and Titman(1993)

Ken French website

LTREV Long-term reversal factor constructedas a long-short portfolio of stockssorted by their cumulative return ac-crued in the previous 60-13 months

Jegadeesh and Titman(2001)

Ken French website

q IA Investment factor constructed as along-short portfolio of stocks sorted bytheir investment-to-capital

Hou Xue and Zhang(2015)

Lu Zhang

q ROE Profitability factor constructed as along-short portfolio of stocks sorted bytheir return on equity

Hou Xue and Zhang(2015)

Lu Zhang

LIQ NT Liquidity factor computed as the av-erage of individual-stock measures es-timated with daily data (residual pre-dictability controlling for the marketfactor)

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

LIQ TR Liquidity factor constructed as a long-short portfolio of stocks sorted by theirexposure to LIQ NT

Pastor and Stambaugh(2003)

Robert Stambaughwebsite

MGMT Mispricing factor constructed as acombination of anomalies related tofirmrsquos management practices (stock is-sue accruals asset growth etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

PERF Mispricing factor constructed as acombination of anomalies related tofirmrsquos performance (profitability dis-tress return on assets etc)

Stambaugh and Yuan(2016)

Robert Stambaughwebsite

ACCR Accruals factor constructed as a long-short portfolio of stocks sorted bychanges in operating working capitalper split-adjusted share from the fiscalyear end t-2 to t-1 divided by book eq-uity per share in t-1

Sloan (1996) Robert Stambaughwebsite

54

DISSTR Distress factor constructed as a long-short portfolio of stocks sorted by thepredicted failure probability

Campbell Hilscher andSzilagyi (2008)

Robert Stambaughwebsite

ASS Growth Asset growth factor constructed as along-short portfolio of stocks sorted bygrowth rate of total assets in the previ-ous fiscal year

Cooper Gulen andSchill (2008)

Robert Stambaughwebsite

COMP ISSUE Composite issue factor constructed asa long-short portfolio of stocks sortedby the growth in the firmrsquos total marketvalue of equity above that of the stockrsquosrate of return

Daniel and Titman(2006)

Robert Stambaughwebsite

GR PROF Gross profitability factor constructedas a long-short portfolio of stockssorted by the ratio of gross profit toassets creates abnormal benchmark-adjusted returns

Novy-Marx (2013) Robert Stambaughwebsite

INV IN ASS Investment-in-assets factor con-structed as a long-short portfolio ofstocks sorted by the annual change ingross property plant and equipmentplus the annual change in inventoriesscaled by lagged book value of theassets

Titman Wei and Xie(2004)

Robert Stambaughwebsite

NetOA Net operating assets factor con-structed as a long-short portfolio ofstocks sorted by net operating assets

Hirshleifer Kewei Teohand Zhang (2004)

Robert Stambaughwebsite

OSCORE Ohlson O-score factor constructed as along-short portfolio of stocks sorted bythe predicted value of a distress mea-sure

Ohlson (1980) Robert Stambaughwebsite

ROA Return-on-assets factor constructed asa long-short portfolio of stocks sortedby their return on assets

Chen Novy-Marx andZhang (2010)

Robert Stambaughwebsite

STOCK ISS Stock issuance factor constructed asa long-short portfolio of stocks sortedby their annual log change in split-adjusted shares outstanding

Ritter (1991) Fama andFrench (2008)

Robert Stambaughwebsite

INTERM CR Innovations to the intermediariesrsquo cap-ital (equity) ratio

He Kelly and Manela(2017)

Asaf Manela website

BAB Betting-against-beta factor con-structed as a portfolio that holdslow-beta assets leveraged to a betaof 1 and that shorts high-beta assetsde-leveraged to a beta of 1

Frazzini and Pedersen(2014)

AQR data library

HML DEVIL A version of the HML factor that relieson the current price level to sort thestocks into long and short legs

Asness and Frazzini(2013)

AQR data library

QMJ Auality-minus-junk factor constructedas a long-short portfolio of stockssorted by the combination of theirsafety profitability growth and thequality of management practices

Asness Frazzini andPedersen (2019)

AQR data library

FIN UNC A measure of financial uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

REAL UNC A measure of real economic uncertainty Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

55

MACRO UNC A measure of macroeconomic uncer-tainty

Jurado Ludvigson andNg (2015) LudvigsonMa and Ng (2019)

Sydney Ludvigsonwebsite

TERM Term spread measured as the differ-ence in 10 year Treasury bonds and fedfunds rate

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DELTA SLOPE Change in the difference between a10-year Treasury bond yield and a 3-month Treasury bill yield

Ferson and Harvey(1991)

FRED-MD database

CREDIT Credit spread measured as the dif-ference between Moodyrsquos BAA corpo-rate bond yields and 10-year Treasurybonds

Chen Ross and Roll(1986) Fama and French(1993)

FRED-MD database

DIV Dividend yield Campbell (1996) FRED-MD databasePE Price-earnings ratio Basu (1977) Ball (1985) FRED-MD database

BW INV SENT Investor sentiment measure aggre-gated from a set of indices orthogonalto macroeconomic fundamentals

Baker and Wurgler(2006)

Dashan Huang web-site

HJTZ INV SENT Investor sentiment measure extractedwith PLS from Baker and Wurgler(2006) proxies

Huang Jiang Tu andZhou (2015)

Dashan Huang web-site

BEH PEAD Short-term behavioral factor reflectingpost-earnings announcement drift

Daniel Hirshleifer andSun (2019)

Kent Daniel website

BEH FIN Long-term behavioral factor predom-inantly capturing the impact of shareissuance and correction

Daniel Hirshleifer andSun (2019)

Kent Daniel website

MKTlowast Market factor with a hedged unpricedcomponent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SMBlowast SMB with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

HMLlowast HML with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

RMWlowast RMW with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

CMAlowast CMA with a hedged unpriced compo-nent

Daniel Mota Rottkeand Santos (2018)

Kent Daniel website

SKEW Systematic skewness factor con-structed as a long-short portfolio ofstocks sorted on the their predictedsystematic skewness rank

Langlois (2019) Hugues Langloiswebsite

NONDUR Nondurable consumption growth (realchain-weighted per capita)

Chen Ross and Roll(1986) Breeden Gib-bons and Litzenberger(1989)

Monthly consump-tion expenditurethe chain-weightedprice index andpopulation data arefrom BEA

SERV Growth rate (real chain-weighted percapita) service expenditure

Breeden Gibbons andLitzenberger (1989)Hall (1978)

Monthly expenditurefor services thechain-weighted priceindex and popula-tion data are fromBEA

UNRATE Unemployment rate Gertler and Grinols(1982)

FRED-MD database

IND PROD Growth rate of industrial production Chan Chen and Hsieh(1985) Chen Ross andRoll (1986)

Industrial Produc-tion Index is fromthe Board of Gover-nors of the FederalReserve System

56

OIL Monthly growth rate of the ProducerPrice index for Crude Petroleum (do-mestic production)

Chen Ross and Roll(1986)

PPI is from US Bu-reau of Labor Statis-tics

The table presents the list of factors used in Section IV2 For each of the variables we present their identificationindex the nature of the factor and the source of data for downloading andor constructing the time series

57

C Additional Tables

Table C1 Tests of risk premia in a correctly specified model with a strong factor

λintercept λHML R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0115 0049 0017 0114 0050 0014 -417 4429200 0101 0055 0012 0096 0047 0010 096 6773

FM 600 0106 0053 0011 0100 0048 0011 3052 84561000 0096 0050 0011 0102 0044 0010 4786 901820000 0102 0042 0012 0100 0049 0007 9620 9943

100 0063 0027 0006 0034 0010 0002 -337 3516200 0107 0055 0012 0080 0037 0005 -299 4906

BFM 600 0095 0050 0007 0080 0035 0007 431 69131000 0092 0054 0008 0092 0047 0012 3291 781820000 0108 0054 0012 0110 0052 0008 9654 9849

Panel B GLS

100 0221 0156 0061 0212 0137 0064 1062 7380200 0153 0088 0024 0144 0088 0024 5923 8571

FM 600 0117 0062 0017 0115 0060 0014 8338 94461000 0106 0053 0011 0112 0052 0011 9003 964920000 0100 0058 0010 0103 0047 0011 9947 9981

100 0151 0088 0026 0149 0086 0026 2642 6569200 0123 0066 0016 0124 0066 0015 4907 7352

BFM 600 0106 0055 0012 0105 0056 0011 7640 87301000 0107 0054 0011 0103 0051 0011 8512 915420000 0098 0057 0009 0100 0051 0013 9915 9950

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in a correctlyspecified model with an intercept and a strong factor Hypothetical true value of R2

adj is 100 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

58

Table C2 Tests of risk premia in a correctly specified model with a useless factor

λintercept λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0087 0033 0008 0019 0003 0000 -419 4844200 0071 0034 0006 0052 0011 0000 -420 5684

FM 600 0077 0031 0005 0131 0037 0000 -422 58501000 0071 0035 0006 0205 0066 0002 -416 612820000 0133 0077 0027 0709 0479 0140 -411 7007

100 0039 0011 0002 0001 0001 0000 -229 -012200 0038 0013 0001 0002 0000 0000 -232 030

BFM 600 0048 0012 0002 0017 0006 0001 -221 1261000 0048 0011 0000 0031 0010 0000 -214 16020000 0007 0000 0000 0103 0051 0014 -176 5793

Panel B GLS

100 0215 0130 0052 0211 0117 0033 -310 3727200 0141 0080 0023 0139 0072 0013 -373 2282

FM 600 0108 0053 0014 0142 0065 0010 -377 21161000 0097 0053 0009 0171 0082 0012 -371 209220000 0108 0047 0010 0621 0559 0372 -259 1697

100 0136 0074 0022 0029 0011 0001 -194 1060200 0112 0060 0014 0012 0004 0000 -239 837

BFM 600 0097 0047 0010 0010 0002 0000 -265 7491000 0096 0047 0009 0010 0002 0000 -276 83220000 0071 0034 0004 0077 0033 0004 -326 766

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a correctly specified model with an intercept and a useless factor The true value of R2 is 0 Fama-MacBeth esti-mates are constructed using OLS (GLS) two-step cross-sectional regressions with standard errors including Shankencorrection Confidence intervals for BFM estimates are constructed using a posterior distribution of Fama-MacBethestimates of λ The last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simula-tions evaluated at the simulation point estimates for FM and its posterior mode for BFM

59

Table C3 Tests of risk premia in a correctly specified model with useless and strong factors

λintercept λHML λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

100 0073 0038 0006 0071 0033 0006 0043 0012 0000 -445 5930200 0073 0035 0006 0090 0045 0008 0028 0006 0000 1083 7484

FM 600 0076 0033 0005 0100 0046 0009 0027 0005 0000 3873 85721000 0073 0034 0006 0090 0046 0009 0026 0005 0000 5415 911320000 0087 0032 0006 0061 0026 0005 0025 0008 0000 9687 9947

100 0039 0013 0001 0023 0007 0001 0002 0001 0000 -197 4174200 0048 0023 0000 0046 0015 0000 0002 0001 0000 003 5372

BFM 600 0054 0025 0003 0062 0026 0006 0002 0000 0000 4056 72821000 0084 0034 0006 0064 0021 0002 0002 0000 0000 5763 808920000 0071 0033 0005 0069 0032 0004 0004 0000 0000 9704 9860

Panel B GLS

100 0193 0129 0043 0205 0133 0043 0180 0121 0031 1405 7480200 0135 0080 0021 0141 0075 0024 0129 0062 0010 6015 8683

FM 600 0101 0054 0010 0109 0052 0009 0091 0037 0004 8331 94231000 0104 0055 0010 0105 0048 0011 0085 0039 0003 8995 963220000 0095 0047 0006 0101 0052 0005 0088 0035 0004 9944 9981

100 0133 0074 0023 0132 0074 0023 0026 0009 0001 2796 6680200 0106 0054 0012 0106 0053 0012 0013 0004 0000 4899 7362

BFM 600 0094 0047 0009 0093 0046 0009 0007 0001 0000 7654 87351000 0090 0045 0010 0091 0044 0007 0005 0001 0000 8530 914620000 0092 0043 0008 0092 0046 0008 0004 0001 0000 9914 9951

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and

λstrong λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor The true value

of the cross-sectional R2adj is 100 Fama-MacBeth estimates are constructed using OLS (GLS) two-step cross-

sectional regressions with standard errors including Shanken correction Confidence intervals for BFM estimates areconstructed using a posterior distribution of Fama-MacBeth estimates of λ The last two columns report the 5th and95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at the simulation point estimates for FMand its posterior mode for BFM

60

Table C4 Tests of risk premia in a misspecified model with a strong factor (N = 55)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0107 0058 0014 0115 0059 0012 -186 1305200 0091 0046 0009 0102 0052 0011 -184 1485600 0104 0052 0013 0109 0060 0014 -166 14951000 0100 0056 0009 0123 0061 0013 -123 135120000 0104 0049 0009 0109 0048 0009 353 803

BFM 100 0017 0004 0000 0002 0000 0000 -155 -067200 0046 0017 0004 0023 0006 0000 -168 273600 0089 0042 0012 0071 0036 0005 -174 8441000 0093 0047 0007 0089 0040 0007 -171 95920000 0101 0048 0011 0099 0042 0008 329 777

Panel B GLS

FM 100 0509 0428 0305 0513 0431 0305 2093 6471200 0295 0206 0102 0274 0187 0091 3999 6657600 0170 0109 0042 0176 0113 0034 5585 71321000 0170 0106 0034 0176 0105 0031 6010 723220000 0145 0082 0020 0141 0073 0022 6917 7182

BFM 100 0265 0181 0086 0262 0186 0079 1872 5919200 0163 0100 0032 0147 0086 0024 3401 5842600 0116 0060 0017 0118 0058 0014 5125 65951000 0120 0064 0016 0121 0063 0014 5689 686520000 0106 0053 0009 0095 0048 0013 6897 7163

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimates of a cross-section of 55 test assets The true valueof the cross-sectional R2

adj is 572 (7071) in OLS (GLS) estimation Fama-MacBeth estimates are constructedusing OLS (GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidenceintervals and their size for BFM estimates are constructed using posterior coverage of Fama-MacBeth estimates of λThe last two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluatedat the simulation point estimates for FM and its posterior mode for BFM The test assets mimic the time series andcross-sectional properties of 25 Fama-French size-value portfolios and 30 industry portfolios while the strong factorproxies the HML factor (all the data available from Ken French website)

61

Table C5 Tests of risk premia in a misspecified model with a useless factor (N = 55)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0095 0050 0013 0084 0031 0003 -186 2080200 0084 0042 0007 0093 0038 0004 -186 1801600 0103 0051 0011 0155 0077 0011 -187 13621000 0101 0051 0009 0226 0131 0033 -187 141720000 0191 0126 0050 0716 0650 0460 -187 988

BFM 100 0013 0002 0000 0000 0000 0000 -105 -047200 0039 0015 0001 0002 0000 0000 -128 -043600 0076 0040 0006 0004 0000 0000 -144 -0491000 0082 0035 0004 0022 0009 0000 -150 -02620000 0129 0059 0005 0078 0037 0008 -168 -020

Panel B GLS

FM 100 0492 0411 0287 0540 0468 0302 -134 2318200 0267 0196 0088 0364 0274 0153 -159 1359600 0166 0101 0035 0408 0323 0198 -162 10091000 0154 0089 0028 0475 0401 0266 -155 86820000 0155 0100 0044 0806 0772 0705 -068 668

BFM 100 0259 0180 0085 01695 0105 0041 -014 1285200 0162 0094 0030 0065 0027 0005 -089 615600 0108 0058 0013 0056 0022 0003 -127 5311000 0112 0057 0014 0064 0021 0004 -138 47920000 0051 0020 0001 0093 0049 0011 -072 172

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 55 portfolios Thetrue value of the cross-sectional R2 is zero The test assets mimic the time series and cross-sectional properties of 25Fama-French size-value portfolios and 30 industry portfolios

62

Table C6 Tests of risk premia in a misspecified model with useless and strong factors (N = 55)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 100 0097 0052 0014 0099 0045 0009 0108 0041 0006 -318 2477200 0085 0041 0006 0090 0049 0011 0117 0051 0008 -298 2350600 0095 0045 0013 0108 0060 0012 0191 0100 0015 -212 21211000 0083 0043 0006 0125 0072 0016 0258 0171 0047 -155 213720000 0109 0055 0012 0146 0086 0038 0766 0704 0535 267 1778

BFM 100 0014 0002 0000 0000 0000 0000 0000 0000 0000 -141 161200 0037 0014 0001 0017 0005 0000 0002 0000 0000 -203 726600 0069 0031 0005 0058 0027 0002 0007 0000 0000 -227 10961000 0064 0024 0003 0069 0027 0005 0026 0008 0001 -209 117320000 0028 0006 0000 0068 0033 0004 0080 0042 0009 262 873

Panel B GLS

FM 100 0488 0406 0282 0495 0411 0281 0537 0465 0306 2201 6613200 0263 0194 0088 0252 0167 0077 0359 0279 0151 4047 6707600 0157 0094 0032 0161 0093 0027 0398 0313 0199 5581 71591000 0146 0087 0029 0144 0090 0026 0475 0393 0251 5994 725020000 0140 0094 0035 0134 0089 0041 0789 0747 0667 6888 7237

BFM 100 0253 0182 0081 0260 0176 0077 0168 0104 0039 2036 6007200 0156 0092 0028 0140 0082 0021 0063 0029 0004 3451 5844600 0105 0058 0013 0108 0049 0011 0054 0024 0002 5125 66141000 0105 0054 0015 0111 0056 0009 0057 0024 0003 5678 687320000 0051 0019 0001 0045 0018 0001 0093 0045 0013 6873 7157

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

and λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor estimated on a cross-section

of 55 portfolios The true value of the cross-sectional R2adj is 572 (7071) in OLS (GLS) estimation The test

assets mimic the time series and cross-sectional properties of 25 Fama-French size-value portfolios and 30 industryportfolios while the strong factor proxies the HML factor

63

Table C7 Tests of risk premia in a misspecified model with a strong factor (N = 100)

λc λstrong R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0044 0010 0109 0055 0012 -098 1310600 0102 0052 0013 0116 0057 0019 -042 12311000 0099 0056 0011 0117 0060 0014 031 114520000 0099 0044 0010 0116 0068 0017 431 748

BFM 200 0016 0004 0000 0006 0001 0000 -082 074600 0076 0034 0006 0060 0028 0005 -086 7901000 0081 0046 0007 0076 0037 0007 -072 85920000 0097 0044 0011 0106 0057 0014 418 742

Panel B GLS

FM 200 0495 0411 0287 0481 0403 0268 2938 5885600 0255 0169 0077 0257 0178 0073 4736 62091000 0234 0149 0059 0205 0129 0049 5198 635220000 0166 0100 0035 0170 0097 0036 6142 6398

BFM 200 0249 0163 0070 0233 0158 0073 2567 5296600 0132 0082 0023 0137 0077 0020 4287 56771000 0125 0073 0021 0112 0060 0014 4849 597020000 0101 0056 0012 0098 0053 0013 6114 6375

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for the pseudo-true values of λlowast

i in amisspecified model with an intercept and a strong factor estimated on a cross-section of 100 portfolios The truevalue of R2

adj is 585 (6297) for the OLS (GLS) estimation Fama-MacBeth estimates are constructed using OLS(GLS) two-step cross-sectional regressions with standard errors including Shanken correction Confidence intervalsand their size for BFM estimates are constructed using a posterior distribution of Fama-MacBeth estimates of λ Thelast two columns report the 5th and 95th percentiles of cross-sectional R2

adj across 1000 simulations evaluated at thesimulation point estimates for FM and its posterior mode for BFM The simulations design follows the methodologydescribed in Section III with the test assets mimicking the composite cross-section of 25 Fama-French size-BMportfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentum portfolios as well as 10long-term reversal portfolios (all available from Ken French website) A strong factor mimics the behavior of HML

64

Table C8 Tests of risk premia in a misspecified model with a useless factor (N = 100)

λc λuseless R2adj

T 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0091 0041 0009 0147 0079 0015 -101 1481600 0110 0062 0010 0280 0180 0064 -100 13551000 0096 0053 0007 0341 0248 0101 -100 122220000 0221 0151 0061 0805 0759 0643 -100 1220

BFM 200 0014 0003 0000 0000 0000 0000 -050 002600 0070 0030 0003 0014 0002 0000 -066 0261000 0073 0034 0005 0026 0010 0001 -071 02720000 0097 0035 0003 0091 0045 0008 -081 109

Panel B GLS

FM 200 0472 0397 0272 0530 0454 0320 -072 1495600 0230 0149 0064 0458 0363 0235 -052 9691000 0196 0122 0048 0487 0407 0275 -037 86120000 0106 0063 0021 0760 0717 0640 171 615

BFM 200 0244 0161 0066 0150 0092 0031 -016 769600 0129 0073 0021 0078 0035 0008 -055 6761000 0118 0066 0019 0070 0033 0006 -060 62420000 0076 0034 0005 0093 0049 0009 172 399

The table shows the frequency of rejecting the null hypothesisH0 λi = λlowasti for pseudo-true value of λc and λlowast

useless = 0in a misspecified model with an intercept and a useless factor estimated on a cross-section of 100 portfolios The truevalue of R2 is zero The test assets mimic the time series and cross-sectional properties of composite cross-section of25 Fama-French size-BM portfolios 30 industry portfolios 25 profitability and investment portfolios 10 momentumportfolios as well as 10 long-term reversal portfolios while the strong factor mimics the behavior of HML (all thedata is available from Ken French website)

Table C9 Tests of risk premia in a misspecified model with useless and strong factors (N = 100)

λc λstrong λuseless R2adj

T 10 5 1 10 5 1 10 5 1 5th 95th

Panel A OLS

FM 200 0088 0043 0009 0098 0045 0008 0187 0104 0026 -124 2118600 0103 0056 0011 0104 0064 0015 0319 0217 0081 -020 19571000 0105 0049 0009 0115 0058 0013 0389 0284 0134 070 182620000 0102 0057 0014 0140 0075 0028 0823 0782 0684 400 1766

BFM 200 0012 0003 0000 0005 0000 0000 0000 0000 0000 -051 541600 0062 0025 0003 0046 0021 0002 0016 0004 0000 -063 10771000 0062 0029 0005 0057 0025 0003 0029 0008 0001 -005 107820000 0026 0008 0001 0042 0015 0000 0089 0039 0011 399 818

Panel B GLS

FM 200 0467 0392 0268 0471 0383 0254 0523 0454 0315 2971 5943600 0227 0149 0059 0228 0154 0060 0455 0363 0227 4745 62221000 0191 0118 0046 0178 0114 0036 0480 0401 0262 5207 636820000 0098 0054 0020 0095 0062 0024 0749 0707 0621 6121 6416

BFM 200 0243 0160 0066 0230 0155 0072 0152 0087 0031 2613 5350600 0126 0072 0022 0132 0071 0017 0074 0033 0007 4268 56921000 0117 0067 0017 0109 0057 0013 0068 0034 0006 4864 597920000 0068 0032 0002 0066 0034 0006 0087 0047 0009 6102 6371

The table shows the frequency of rejecting the null hypothesis H0 λi = λlowasti for pseudo-true values of λc and λstrong

λlowastuseless equiv 0 in a misspecified model with an intercept a strong and a useless factor on the cross-section of 100

portfolios The true value of R2adj is 585 (6297) in OLS (GLS) estimation The test assets mimic the time series

and cross-sectional properties of composite cross-section of 25 Fama-French size-BM portfolios 30 industry portfolios25 profitability and investment portfolios 10 momentum portfolios as well as 10 long-term reversal portfolios whilethe strong factor mimics the behavior of HML (all the data is available from Ken French website)

65

Table C10 The probability of retaining risk factors using BF

T 55 57 59 61 63 65

Panel A strong factors

Jeffreys Prior fstrong 200 0860 0845 0830 0812 0792 0771600 0987 0985 0985 0983 0981 09791000 0998 0998 0997 0996 0996 0995

Spike-and-Slab Prior fstrong 200 0749 0718 0685 0654 0618 0586600 0982 0979 0978 0972 0970 09641000 0996 0996 0996 0995 0994 0992

Panel B useless factors

Jeffreys Prior fuseless 200 1000 0995 0982 0940 0862 0726600 1000 0999 0998 0988 0971 09201000 1000 1000 1000 0999 0990 0971

Spike-and-Slab Prior fuseless 200 0419 0248 0149 0083 0040 0022600 0119 0062 0028 0012 0007 00041000 0083 0028 0011 0001 0000 0000

Panel C strong and useless factors

Jeffreys Prior fstrong 200 0928 0912 0891 0878 0860 0838600 0994 0994 0992 0991 0991 09891000 0999 0999 0999 0999 0999 0999

fuseless 200 0955 0894 0788 0642 0489 0360600 0957 0895 0764 0618 0461 03541000 0957 0893 0787 0645 0483 0357

Spike-and-Slab Prior fstrong 200 0753 0725 0697 0666 0629 0592600 0982 0980 0979 0972 0970 09611000 0996 0996 0996 0995 0994 0994

fuseless 200 0283 0154 0085 0044 0023 0011600 0077 0031 0006 0002 0000 00001000 0053 0008 0000 0000 0000 0000

The table shows the frequency of retaining risk factors for different choice sets across 1000 simulations of differentsize (T=200 600 and 1000) In Panel A the candidate risk factor is truly cross-sectionally priced and stronglyidentified while in Panel B they are not Panel C summarizes the case of using both strong and useless candidatefactors in the model A candidate factor is retained in the model if its marginal posterior probability p(γi = 1|data)is greater than a certain threshold ie 55 57 59 61 63 and 65

66

Table C11 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 1421 1769 1404 -091 1461[0627 2215] [-435 6661] [0595 2256] [-403 5211]

MKT -0647 -0631[-1429 0135] [-1458 0163]

Fama and French (1992) Intercept 1273 6071 1249 5604 5566[0730 1816] [2000 8514] [0682 1834] [4275 6946]

MKT -0686 -0664[-1227 -0145] [-1242 -0102]

SMB 0140 0140[0075 0205] [0073 0205]

HML 0380 0379[0322 0438] [0319 0439]

Quality-minus-junk Intercept 0626 7442 0648 6947 6879Asness Frazzini and Pedersen (2014) [-0048 1300] [4480 9520] [-0055 1308] [5548 7946]

QMJ 0369 0358[0148 0589] [0130 0591]

MKT -0097 -0116[-0763 0568] [-0772 0580]

SMB 0209 0206[0157 0260] [0151 0262]

HML 0338 0340[0288 0388] [0289 0392]

Panel B GLS

CAPM Intercept 1362 4805 1323 4706 4265[0941 1783] [2696 10000] [0846 1802] [1411 6530]

MKT -0788 -0749[-1206 -0369] [-1227 -0287]

Fama and French (1992) Intercept 1397 8849 1353 8543 8543[0924 1870] [8286 9771] [0846 1878] [8003 8989]

MKT -0827 -0783[-1298 -0356] [-1311 -0283]

SMB 0179 0178[0145 0213] [0142 0215]

HML 0348 0350[0312 0385] [0310 0391]

Quality-minus-junk Intercept 0966 8948 0982 8736 8642Asness Frazzini and Pedersen (2014) [0395 1537] [8320 9640] [0383 1616] [8120 9090]

QMJ 0378 0359[0204 0552] [0172 0539]

MKT -0425 -0438[-0984 0133] [-1052 0157]

SMB 0197 0196[0160 0234] [0156 0235]

HML 0359 0359[0321 0398] [0320 0399]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French sizevalue portfolios Each model is estimated viaOLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

67

Table C12 Two-pass regressions with tradable factors 25 Fama-French portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1444 662 1814 -202 545[0155 2732] [-435 9374] [0106 3664] [-423 4175]

∆Cnd 0383 0254[-0221 0988] [-0462 0956]

Scaled CAPM Intercept 4489 1617 3731 1739 2565[1479 7499] [-1429 6114] [-0268 7514] [-391 5783]

cay 1141 0563[-0198 2480] [-2262 3085]

MKT -2119 -1444[-4890 0653] [-5241 2643]

MKT times cay -8554 -5188[-19227 2119] [-17911 9415]

Scaled HC-CAPM Intercept 4405 1386 3343 3895 3659[1182 7629] [-2632 5453] [-0500 7182] [-006 6630]

cay 1038 0552[-0444 2520] [-1957 2848]

∆Y 0437 0034[-0421 1295] [-0995 0928]

MKT -2049 -1148[-5031 0933] [-4946 2772]

∆Y times cay 1428 0724[-1047 3903] [-3458 4645]

MKT times cay -7782 -4127[-18579 3015] [-17926 10532]

Panel B GLS

CCAPM Intercept 2274 -251 2336 -303 -056[1386 3162] [-435 9896] [1360 3266] [-403 1109]

∆Cnd 0139 0088[-0114 0391] [-0222 0383]

Scaled CAPM Intercept 2623 4949 2668 5626 4567[0794 4451] [1657 7829] [0859 4376] [439 7146]

cay 0362 0205[-0759 1483] [-0853 1277]

MKT -0596 -0642[-2412 1220] [-2325 1149]

MKT times cay 0644 0310[-5695 6984] [-5988 6529]

Scaled HC-CAPM Intercept 2486 5014 2604 5832 4698[0510 4463] [1158 7726] [0801 4372] [558 7371]

cay 0361 0199[-0902 1623] [-0879 1269]

∆Y -0415 -0242[-0878 0048] [-0672 0157]

MKT -0479 -0588[-2441 1482] [-2357 1241]

∆Y times cay 0395 0150[-1761 2552] [-1718 2036]

MKT times cay 1158 0477[-5888 8205] [-5890 6968]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French sizevalue portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructed basedon the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructed as inLewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we provide theposterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and 99level respectively

68

Table C13 Two-pass regressions with tradable factors 25 Fama-French + 17 industry portfolios

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Carhart (1997) Intercept 0755 4720 0780 3835 3769

[0167 1342] [803 8559] [0158 1395] [2222 5427]

MKT -0162 -0188

[-0759 0435] [-0808 0432]

SMB 0144 0142

[0082 0206] [0078 0207]

HML 0320 0316

[0251 0388] [0245 0388]

UMD 0759 0673

[-0360 1878] [-0476 1802]

q-factor model Intercept 0744 4505 0761 3647 3600

Hou Xue and Zhang (2015) [0244 1243] [138 8338] [0239 1287] [1711 5508]

ROE 0173 0156

[-0139 0485] [-0154 0481]

IA 0287 0279

[0117 0456] [0117 0455]

ME 0230 0221

[0135 0325] [0121 0320]

MKT -0199 -0211

[-0703 0304] [-0741 0311]

Liquidity Factor Intercept 0958 277 0963 -124 620

Pastor and Stambaugh (2003) [0417 1499] [-513 4113] [0377 1527] [-435 3340]

LIQ 0210 0153

[-1001 1421] [-1077 1445]

MKT -0274 -0281

[-0822 0274] [-0851 0340]

Panel B GLS

Carhart (1997) Intercept 0839 8826 0909 8486 8397

[0467 1210] [7562 9557] [0487 1339] [7895 8812]

MKT -0235 -0309

[-0610 0140] [-0737 0120]

SMB 0162 0161

[0134 0190] [0130 0193]

HML 0351 0350

[0316 0386] [0312 0390]

UMD 1426 1184

[0832 2020] [0541 1861]

q-factor model Intercept 1107 4289 1103 3742 3631

Hou Xue and Zhang (2015) [0761 1454] [027 7341] [0714 1486] [2022 5216]

ROE 0270 0247

[0057 0482] [0008 0502]

IA 0277 0263

[0134 0420] [0107 0428]

ME 0221 0216

[0156 0287] [0144 0288]

MKT -0524 -0520

[-0869 -0179] [-0900 -0138]

Liquidity Factor Intercept 1186 2897 1159 1811 2373

Pastor and Stambaugh (2003) [0874 1497] [-197 7687] [0803 1519] [438 5253]

LIQ 0089 0065

[-0567 0744] [-0696 0871]

MKT -0598 -0571

[-0909 -0288] [-0936 -0212]

69

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CAPM Intercept 0966 467 0957 -113 306

[0427 1506] [-250 5490] [0382 1522] [-246 2863]

MKT -0277 -0269

[-0823 0269] [-0838 0311]

Fama-French (1992) Intercept 1016 4330 1002 3425 3370

[0540 1491] [182 8166] [0517 1500] [2194 4614]

MKT -0445 -0430

[-0916 0026] [-0912 0068]

SMB 0147 0144

[0086 0208] [0077 0208]

HML 0316 0313

[0248 0383] [0242 0383]

Quality-minus-junk Intercept 0836 4931 0836 3861 3934

Asness et all (2019) [0323 1350] [914 8449] [0314 1365] [2459 5577]

QMJ 0153 0147

[-0065 0372] [-0066 0373]

MKT -0293 -0289

[-0796 0210] [-0814 0233]

SMB 0179 0175

[0114 0243] [0105 0240]

HML 0302 0300

[0234 0369] [0230 0373]

Panel B GLS

CAPM Intercept 1192 3060 1170 2602 2573

[0884 1500] [058 7848] [0815 1517] [600 5118]

MKT -0604 -0583

[-0911 -0298] [-0923 -0227]

Fama-French (1992) Intercept 1182 8616 1150 8272 8234

[0853 1510] [7303 9353] [0788 1519] [7736 8662]

MKT -0596 -0564

[-0923 -0268] [-0937 -0198]

SMB 0160 0160

[0132 0187] [0129 0191]

HML 0349 0348

[0314 0384] [0310 0386]

Quality-minus-junk Intercept 0970 8674 0977 8227 8276

Asness et all (2019) [0606 1334] [7451 9335] [0561 1369] [7759 8693]

QMJ 0293 0278

[0159 0427] [0125 0439]

MKT -0393 -0399

[-0754 -0033] [-0791 0012]

SMB 0166 0165

[0138 0194] [0134 0196]

HML 0353 0352

[0318 0388] [0315 0392]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with tradable risk factorson a cross-section of monthly excess returns for 25 Fama-French and 17 industry portfolios Each model is estimatedvia OLS and GLS We report point estimates and 5 confidence intervals for risk premia which are constructedbased on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence level constructedas in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimation we providethe posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode and median of thecross-sectional R2 as well as its (5 95) credible intervals and denote significance at the 90 95 and99 level respectively

70

Table C14 Two-pass regressions with nontradable factors 25 Fama-French + 17 industry port-folios

Fama-MacBeth Bayesian Estimation

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

CCAPM Intercept 1817 356 1975 -124 209

[0905 2729] [-250 6208] [0795 3135] [-245 2680]

∆Cnd 0189 0123

[-0216 0594] [-0285 0502]

Scaled CAPM Intercept 2141 1528 2344 298 1306

[0529 3754] [-789 9568] [0692 3944] [-451 4069]

cay 1408 0574

[-0182 2999] [-0935 1939]

MKT 0108 -0142

[-1471 1686] [-1747 1499]

cay timesMKT -2554 -2159

[-8811 3702] [-8813 4620]

Scaled CCAPM Intercept 1607 1709 1983 95 1468

[0394 2820] [-789 10000] [0761 3180] [-372 4189]

cay 1137 0511

[-0394 2669] [-0871 1865]

∆Cnd 0452 0175

[-0103 1007] [-0261 0598]

cay times∆Cnd -0102 0030

[-1752 1547] [-1175 1292]

HC-CAPM Intercept 2185 611 2221 -147 644

[0720 3650] [-513 6005] [0667 3735] [-429 3093]

∆Y 0398 0201

[-0011 0808] [-0290 0667]

MKT 0123 0050

[-1392 1638] [-1513 1650]

Panel B GLS

CCAPM Intercept 2351 264 2442 -057 218

[1700 3003] [-250 9795] [1612 3258] [-204 1296]

∆Cnd 0211 0132

[0034 0387] [-0081 0351]

Scaled CAPM Intercept 2516 3951 2653 4332 3470

[1467 3566] [1045 7411] [1616 3705] [360 6539]

cay 0783 0424

[0056 1511] [-0311 1119]

MKT -0504 -0639

[-1548 0541] [-1674 0406]

cay timesMKT 3785 2327

[-0412 7981] [-2053 6791]

Scaled CCAPM Intercept 2244 331 2378 296 356

[1378 3109] [-789 8166] [1539 3237] [-476 1690]

cay 0742 0425

[-0006 1489] [-0316 1104]

∆Cnd 0160 0119

[-0078 0398] [-0116 0342]

cay times∆Cnd 0355 0190

[-0343 1053] [-0455 0840]

HC-CAPM Intercept 2908 3810 2808 4454 3356

[2053 3762] [854 7372] [1809 3826] [194 6591]

∆Y -0155 -0089

[-0381 0072] [-0357 0174]

MKT -0892 -0793

[-1742 -0043] [-1803 0208]

71

FM BFM

Model Factors λj R2adj λj R2

adjmode R2adjmedian

Panel A OLS

Scaled HC-CAPM Intercept 2099 1372 2243 1599 2186

[0549 3649] [-1389 4875] [0491 3955] [-121 4771]

cay 1124 0529

[-0326 2574] [-1066 1981]

∆Y 0088 0106

[-0314 0489] [-0399 0599]

MKT 0169 -0068

[-1340 1679] [-1805 1744]

cay times∆Y 1277 0579

[-1361 3914] [-1930 2919]

cay timesMKT -2125 -1605

[-8058 3808] [-8349 5354]

Durable CCAPM Intercept 2281 2316 2220 1423 1459

[0025 4537] [-789 10000] [0403 4028] [-359 4178]

∆Cnd 0544 0194

[0054 1034] [-0163 0525]

∆Cd 0481 0104

[-0101 1062] [-0353 0531]

MKT -0102 -0063

[-2466 2262] [-1948 1863]

Panel B GLS

Scaled HC-CAPM Intercept 2662 3848 2698 4312 3502

[1626 3698] [433 7267] [1555 3844] [231 6466]

cay 0726 0416

[-0029 1481] [-0324 1146]

∆Y -0165 -0088

[-0440 0110] [-0372 0198]

MKT -0652 -0685

[-1684 0379] [-1816 0438]

cay times∆Y 0681 0333

[-0648 2009] [-1017 1616]

cay timesMKT 3266 2123

[-0950 7482] [-2173 6502]

Durable CCAPM Intercept 1958 2926 1994 412 2558

[0634 3281] [-789 6763] [0831 3125] [-269 6336]

∆Cnd 0164 0065

[-0065 0394] [-0126 0260]

∆Cd 0289 0123

[-0011 0589] [-0117 0377]

MKT 0002 -0026

[-1322 1326] [-1165 1125]

The table summarises risk premia estimates and cross-sectional fit for a selection of models with nontradable riskfactors on a cross-section of quarterly excess returns for 25 Fama-French and 17 industry portfolios Each modelis estimated via OLS and GLS We report point estimates and 5 confidence intervals for risk premia which areconstructed based on the asymptotic normal distribution and cross-sectional R2 and its (5 95) confidence levelconstructed as in Lewellen Nagel and Shanken (2010) for FM estimation In Bayesian Fama-MacBeth estimationwe provide the posterior mean of λ denoted by λj its (25 975) credible intervals the posterior mode andmedian of the cross-sectional R2 as well as its (5 95) credible intervals and denote significance at the90 95 and 99 level respectively

72

Table C15 Posterior factor probabilities and risk premia of 26 million sparse models

Pr [γj = 1|data] E [λj |data]ψ ψ

factors 1 5 10 20 50 100 1 5 10 20 50 100

HML 0455 0702 0720 0695 0646 0614 0106 0231 0249 0245 0229 0219MKTlowast 0274 0513 0594 0658 0700 0702 0050 0211 0321 0440 0553 0591MKT 0086 0178 0311 0446 0550 0576 0009 0067 0163 0286 0408 0454SMBlowast 0246 0350 0317 0264 0210 0188 0039 0113 0120 0110 0095 0089PERF 0136 0160 0147 0125 0098 0083 -0020 -0050 -0053 -0049 -0041 -0036STOCK ISS 0099 0117 0125 0123 0110 0099 -0008 -0029 -0042 -0050 -0053 -0051COMP ISSUE 0097 0101 0104 0103 0096 0088 0010 0029 0041 0051 0059 0059CMA 0111 0114 0104 0091 0075 0066 0008 0018 0019 0019 0017 0016ROA 0089 0079 0083 0094 0102 0097 0010 0023 0034 0051 0068 0070UMD 0091 0097 0097 0090 0078 0071 0006 0020 0026 0029 0030 0030BEH PEAD 0081 0069 0069 0074 0089 0108 0001 0002 0004 0007 0016 0030STRev 0078 0064 0063 0067 0086 0112 0000 0002 0003 0007 0022 0048NONDUR 0079 0065 0064 0067 0079 0097 0000 0000 0001 0001 0003 0006DISSTR 0085 0079 0076 0070 0060 0053 0010 0030 0039 0043 0042 0040MGMT 0090 0083 0077 0068 0055 0047 0007 0017 0019 0019 0017 0015BW ISENT 0079 0064 0062 0063 0069 0077 0000 0000 0000 0001 0002 0004TERM 0078 0063 0061 0062 0069 0078 0000 0000 0001 0001 0004 0007FIN UNC 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0000LIQ TR 0078 0063 0060 0061 0066 0075 -0000 0000 0001 0002 0007 0015DeltaSLOPE 0078 0063 0061 0061 0066 0073 -0000 -0000 -0000 -0000 -0000 -0001IPGrowth 0078 0063 0061 0061 0066 0072 -0000 -0000 -0000 -0000 -0001 -0002Oil 0078 0063 0061 0061 0066 0072 -0000 0001 0001 0002 0004 0009SERV 0078 0063 0061 0061 0065 0072 -0000 -0000 -0000 -0000 -0000 -0000DEFAULT 0078 0063 0060 0060 0064 0069 -0000 -0000 0000 0000 0000 0000NetOA 0079 0063 0061 0062 0065 0065 0001 0003 0004 0007 0014 0018DIV 0078 0063 0060 0060 0064 0069 0000 -0000 -0000 -0000 -0000 -0000PE 0078 0063 0060 0060 0064 0068 -0000 -0001 -0001 -0002 -0004 -0008REAL UNC 0078 0063 0060 0060 0064 0067 0000 0000 0000 0000 0000 0000HJTZ ISENT 0078 0063 0060 0060 0064 0068 -0000 -0000 -0000 -0000 -0000 -0001UNRATE 0078 0063 0060 0060 0064 0068 0000 -0000 -0000 -0000 -0001 -0002INTERM CAP RATIO 0084 0065 0061 0062 0062 0059 0009 0013 0018 0027 0038 0041LIQ NT 0079 0063 0060 0060 0063 0066 -0001 -0001 -0002 -0002 -0003 -0004QMJ 0113 0090 0069 0051 0035 0028 0012 0016 0014 0010 0007 0005MACRO UNC 0078 0063 0060 0059 0060 0061 0000 0000 0000 0000 0000 0000INV IN ASS 0078 0062 0059 0059 0060 0061 0000 0000 0001 0001 0003 0006ACCR 0081 0067 0062 0059 0056 0055 -0002 -0005 -0006 -0006 -0005 -0004LTRev 0078 0064 0063 0062 0059 0053 -0001 -0005 -0008 -0012 -0016 -0017CMAlowast 0079 0062 0059 0058 0059 0058 0000 0000 -0000 -0001 -0002 -0002ASS Growth 0078 0060 0056 0055 0055 0052 -0001 -0001 -0001 -0004 -0008 -0011HMLlowast 0079 0062 0058 0055 0053 0049 -0001 -0001 -0001 -0001 -0001 -0001RMWlowast 0077 0058 0052 0047 0040 0035 0000 0001 0001 0002 0002 0002BAB 0079 0059 0051 0045 0039 0034 -0003 -0003 -0003 -0001 0001 0003IA 0083 0060 0051 0043 0035 0030 -0003 -0004 -0004 -0003 -0003 -0003O SCORE 0077 0056 0047 0040 0032 0027 -0003 -0004 -0005 -0005 -0005 -0005ROE 0076 0055 0047 0040 0032 0027 0001 0004 0005 0005 0004 0003SMB 0039 0024 0027 0042 0067 0077 0002 0002 0004 0007 0014 0017SKEW 0090 0062 0047 0034 0024 0019 -0000 -0000 -0000 -0000 -0000 -0000BEH FIN 0079 0051 0040 0031 0023 0019 -0006 -0005 -0003 -0001 -0000 -0000GR PROF 0077 0047 0038 0031 0025 0020 0004 0003 0003 0003 0003 0003RMW 0073 0045 0036 0030 0023 0018 0003 0004 0004 0003 0003 0003HML DEVIL 0063 0036 0027 0020 0015 0012 0002 0003 0002 0001 0000 0000

Posterior probabilities of factors Pr [γj = 1|data] and posterior mean of factor risk premia E [λj |data] computedusing the Dirac spike and slab approach of section II22 51 factors and all possible models with up to 5 factorsyielding about 26 million candidate models The prior probability of a factor being included is about 1038 Thedata is monthly 197310 to 201612 Test assets cross-section of 25 Fama-French size and book-to-market and 30Industry portfolios The 51 factors considered are described in Table B1 of Appendix B

73

Table C16 Factor models with highest posterior probability (Dirac spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232SMB 983232 983232 983232STRevBW ISENTLIQ TRUNRATENONDURTERMCOMP ISSUE 983232 983232OilBEH PEAD 983232DeltaSLOPEINV IN ASSIPGrowthDEFAULTUMD 983232ROA 983232 983232 983232 983232 983232 983232 983232REAL UNCPECMA 983232ACCRSERVSTOCK ISS 983232 983232 983232DIVMACRO UNCFIN UNCLIQ NTCMANetOAHJTZ ISENTLTRevRMWHMLINTERM CAP RATIOASS GrowthPERF 983232IABABDISSTR 983232ROEMGMTO SCOREQMJBEH FINGR PROFSKEWRMWHML DEVILSMBProbability () 01080 00964 00771 00710 00709 00668 00661 00624 00600 00560

Factors and posterior model probabilities of ten most likely specifications computed using the Dirac spike and slabapproach of section II22 ψ = 20 51 factors and all possible models with up to 5 factors yielding about 26 millionmodels and a model prior probability of the order of 10minus7 Specifications organised by columns with the symbol 983232indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612 Test assetscross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factors considered aredescribed in Table B1 of Appendix B

74

Table C17 Factor Models with highest posterior probability (continuous spike-and-slab ψ = 10)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232 983232 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232NONDUR 983232 983232 983232 983232SERV 983232 983232 983232LIQ TR 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232BW ISENT 983232 983232DEFAULT 983232 983232 983232 983232 983232 983232Oil 983232 983232 983232DIV 983232 983232 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232 983232UNRATE 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232 983232CMAlowast 983232 983232 983232 983232 983232 983232LIQ NT 983232 983232 983232 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232NetOA 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232 983232 983232 983232INV IN ASS 983232 983232 983232COMP ISSUE 983232 983232 983232 983232 983232ACCR 983232 983232 983232 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232 983232 983232 983232 983232 983232INTERM CAP RATIO 983232 983232 983232 983232 983232 983232ASS Growth 983232 983232 983232 983232 983232 983232DISSTR 983232 983232 983232 983232PERF 983232 983232 983232BAB 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232 983232ROE 983232 983232 983232O SCORE 983232BEH FIN 983232 983232 983232 983232 983232GR PROF 983232 983232 983232 983232 983232QMJ 983232 983232RMW 983232SKEW 983232 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00111 00111 00111 00100 00100 00100 00100 00100 00100 00100

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

75

Table C18 Factor models with highest posterior probability (Continuous spike-and-slab ψ = 20)

modelfactor 1 2 3 4 5 6 7 8 9 10

HML 983232 983232 983232 983232 983232 983232 983232 983232 983232MKTlowast 983232 983232 983232 983232 983232 983232 983232MKT 983232 983232 983232 983232 983232 983232 983232 983232SMBlowast 983232 983232 983232STRev 983232 983232 983232 983232 983232 983232 983232 983232IPGrowth 983232 983232 983232BEH PEAD 983232 983232 983232 983232 983232 983232NONDUR 983232 983232 983232SERV 983232 983232 983232 983232 983232 983232LIQ TR 983232 983232 983232 983232 983232PE 983232 983232 983232 983232 983232 983232 983232DeltaSLOPE 983232 983232 983232 983232 983232 983232BW ISENT 983232 983232 983232 983232 983232DEFAULT 983232 983232 983232 983232 983232Oil 983232 983232DIV 983232 983232 983232 983232REAL UNC 983232 983232 983232 983232 983232UNRATE 983232 983232 983232 983232 983232TERM 983232 983232 983232 983232HJTZ ISENT 983232 983232 983232 983232 983232 983232 983232 983232UMD 983232 983232 983232 983232CMAlowast 983232 983232LIQ NT 983232 983232 983232 983232 983232FIN UNC 983232 983232 983232 983232 983232 983232STOCK ISS 983232 983232 983232 983232 983232NetOA 983232 983232 983232 983232 983232 983232 983232MACRO UNC 983232 983232INV IN ASS 983232 983232 983232 983232COMP ISSUE 983232 983232 983232 983232ACCR 983232 983232 983232HMLlowast 983232 983232 983232 983232 983232 983232 983232ROA 983232 983232 983232 983232 983232 983232RMWlowast 983232 983232 983232 983232 983232CMA 983232 983232 983232 983232 983232 983232LTRev 983232 983232 983232INTERM CAP RATIO 983232 983232 983232ASS Growth 983232 983232 983232 983232DISSTR 983232 983232 983232 983232 983232PERF 983232BAB 983232 983232 983232 983232IA 983232 983232 983232 983232MGMT 983232 983232 983232ROE 983232 983232 983232 983232 983232O SCORE 983232 983232 983232 983232 983232BEH FIN 983232GR PROF 983232 983232 983232QMJ 983232 983232RMW 983232 983232SKEW 983232 983232SMB 983232HML DEVIL 983232 983232Probability () 00133 00122 00111 00111 00100 00100 00100 00100 00100 00089

Factors and posterior model probabilities of ten most likely specifications computed using the continuous spike andslab approach of section II23 ψ = 10 51 factors and all possible models with up to 5 factors yielding about 225quadrillion models and a model prior probability of the order of 10minus16 Specifications organised by columns with thesymbol 983232 indicating that the factor in the corresponding row is included The data is monthly 197310 to 201612Test assets cross-section of 25 Fama-French size and book-to-market and 30 Industry portfolios The 51 factorsconsidered are described in Table B1 of Appendix B

76

D Additional Figures

Panel A OLS

00 01 02 03 04 05

02

46

810

λstrong

Density

BFMFM

(a) Strong factor

minus2 minus1 0 1 2

00

02

04

06

08

10

λuseless

Density

BFMFM

(b) Useless factor

Panel B GLS

025 030 035

05

1015

2025

30

λstrong

Density

BFMFM

(c) Strong factor

minus20 minus15 minus10 minus05 00 05 10

00

04

08

12

λuseless

Density

BFMFM

(d) Useless factor

Figure D1 Posterior distribution of the risk premia estimates in a misspecified model that includesboth strong and irrelevant factors

The graph presents posterior distribution of risk premia estimates for a misspecified model with both strong anduseless factors in one representative simulation Panels (a) and (b) display posterior distribution of the BFM-OLSestimates of risk premia along with the frequentist distribution implied by the point estimates and standard errorsPanels (c) and (d) report the same objects for GLS T =1000

77

0 5 10

00

04

08

12

Parameter

Den

sity

PriorLikelihood ILikelihood IILikelihood IIIPosterior IPosterior IIPosterior III

Figure D2 Posterior distributions with heavy-tailed likelihood and thin-tailed prior

Posterior shapes generated by a thin-tails (Gaussian) prior and a heavy-tails (Cauchy) likelihood as the likelihood

peak moves away from the prior mean Posteriors I II and III are generated respectively by the likelihood I II and

III

78

Page 11: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 12: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 13: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 14: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 15: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 16: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 17: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 18: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 19: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 20: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 21: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 22: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 23: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 24: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 25: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 26: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 27: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 28: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 29: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 30: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 31: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 32: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 33: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 34: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 35: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 36: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 37: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 38: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 39: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 40: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 41: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 42: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 43: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 44: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 45: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 46: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 47: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 48: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 49: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 50: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 51: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 52: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 53: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 54: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 55: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 56: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 57: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 58: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 59: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 60: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 61: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 62: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 63: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 64: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 65: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 66: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 67: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 68: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 69: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 70: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 71: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 72: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 73: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 74: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 75: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 76: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 77: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 78: Bayesian Solutions for the Factor Zoo: We Just Ran Two
Page 79: Bayesian Solutions for the Factor Zoo: We Just Ran Two