
Stock Return Predictability, Conditional Asset Pricing Models and Portfolio Selection

Ane Tamayo1

London Business School, Regent’s Park

London NW1 4SA2

First Draft: September, 2000

This draft: May, 2002

1This paper is part of my dissertation at the University of Rochester. I thank the members of my dissertation committee Gregory Bauer, Christopher Jones, John Long, and especially the committee chair Jay Shanken, for helpful discussions and comments. The comments of Andreas Gintschel, Ludger Hentschel, Stanimir Markov, Micah Officer, William Schwert, Raman Uppal, Michela Verardo, and seminar participants at the European Finance Association Meetings at Barcelona, Centro de Estudios Monetarios y Financieros (CEMFI), Frank Russell Company, HEC School of Management, London Business School, Ohio State University, Universidad Carlos III, Universitat Pompeu Fabra, and University of Rochester are also appreciated. I am responsible for any remaining errors.

2Telephone: +44 (0)20 7262 5050, Ext.: 3410. Fax: +44 (0)20 7724 6573. E-mail: [email protected].

Abstract

I examine an investor’s portfolio allocation problem across multiple risky assets in the presence of return predictability when, in addition to the predictability evidence, the investor uses conditional asset pricing models to guide him in the portfolio selection decision. I also explore how the uncertainty associated with the model dynamics affects the investor’s optimal portfolio. To analyze this, I introduce Bayesian techniques that have not been used before in the asset pricing literature.

Using a market index and a small capitalization or a value portfolio, I find that the sample evidence on predictability plays a major role in the investor’s portfolio allocation decision. The optimal portfolio also depends on his beliefs about the extent to which this predictability can be attributed to time variation in risk premia and betas. Finally, I show that the portfolio allocation decision is also affected by the investor’s uncertainty about the beta risk dynamics.

1 Introduction

Consider an investor who observes a number of variables that may predict stock returns and has

some knowledge of asset pricing theory. How can he use this information to allocate funds between a

riskless asset and a portfolio of risky assets? In this paper, I address this question by examining the

portfolio allocation problem of a Bayesian investor when returns may be predictable. In addition

to the predictability evidence, the investor uses asset pricing models to guide him in the portfolio

selection decision. In particular, since he conditions on ex-ante information, a natural benchmark is

provided by conditional asset pricing models, in which the stock return predictability is attributed

to time variation in risk premia and risk exposures or betas. The use of conditional models, however,

introduces further uncertainty into the problem related to the unobservable dynamics of expected

returns, risk premia and betas. As a first attempt to deal with this type of model uncertainty,

I suggest a simple econometric framework in which betas and risk premia are latent, stochastic

functions of the predictive variables.

Although the predictability evidence goes back at least to the late 70s,1 its effect on portfolio

allocation decisions (i.e., its economic significance) has not been explored until recently. In

particular, Kandel and Stambaugh (1996) have shown that the asset allocation decision of a Bayesian

investor between a market index and a riskless asset is affected by the predictability evidence even

though this evidence could be regarded as insignificant using standard statistical measures.2 With

the exception of a few studies (e.g., Avramov, 2000; Bauer, 2000; Cremers, 2001), however, previous

literature has reduced the asset choice to a market index and a riskless asset. Allowing for multiple

risky assets is interesting because there are considerable cross-sectional differences in the time series

predictability of returns. For example, expected returns on small capitalization stocks are more

sensitive to changes in several predictive variables, such as dividend yields and default spreads,

than expected returns on large capitalization stocks (e.g., Fama and French, 1989; Harvey, 1989).

Thus, one contribution of this paper is to provide further evidence on the economic significance of

predictability when the investor allocates his funds across multiple risky assets.

Once several risky assets are introduced into the problem, the potential usefulness of asset

pricing models becomes an important consideration. For example, Pastor (2000) has shown that

an investor’s beliefs about the validity of an (unconditional) asset pricing model can largely affect

his optimal portfolio (see also Jorion, 1991; Grauer and Hakansson, 1995). These studies, however,

assume that returns are identically, independently distributed and focus on unconditional models.

In the presence of return predictability, the relevant benchmark is provided by conditional asset

1For example, see Fama and Schwert (1977) for an early study. More recent studies include Keim and Stambaugh (1986), Fama and French (1989), Goetzmann and Jorion (1993), and Kothari and Shanken (1997).

2Other studies examining the economic significance of return predictability include Pesaran and Timmermann (1995), Kim and Omberg (1996), Brennan et al (1997), Campbell and Viceira (1999), Barberis (2000), Avramov (2000), and Shanken and Tamayo (2001).


pricing models, in which the return predictability is captured by time variation in risk premia and

betas. Thus, in this paper, I extend the previous literature by formally considering the role of

conditional asset pricing models in portfolio selection problems. Although conditional asset pricing

models have been widely examined, there is no evidence about the extent to which departures from

the models and prior beliefs affect an investor’s optimal portfolio.3 For example, if an investor

dogmatically believes in a conditional model, his optimal portfolio should be a combination of only

benchmark portfolios that expose investors to priced sources of risk. At the other extreme, if he

dogmatically believes that the predictability cannot be explained by an asset pricing model, he

should ignore any evidence supporting the model. Finally, if the investor does not hold dogmatic

beliefs, his optimal portfolio should be affected by the sample evidence on time variation in alphas,

betas, and risk premia, and the model’s ability to explain average returns.

The use of conditional models, however, complicates the analysis because it introduces further

uncertainty into the problem. Prior literature has shown (e.g., Zellner and Chetty, 1965; Bawa,

Brown and Klein, 1979; and, more recently, Kandel and Stambaugh, 1996; Anderson et al, 1999;

Barberis, 2000; Maenhout, 2000; Pastor, 2000; Avramov, 2000) that the optimal portfolio of a

Bayesian investor is affected by parameter or, more broadly, model uncertainty. In a conditional

setting the existence of model uncertainty is an important concern, because finance theory provides

only a vague indication of how expected returns, risk premia and betas vary over time. Hence,

investors face additional uncertainty associated with the unobservable dynamics of these inputs to

the portfolio selection problem. Although this type of uncertainty has been addressed in several

theoretical papers, there is little empirical evidence about its effect on asset pricing. In this paper, I

empirically explore how uncertainty about the beta dynamics affects an investor’s optimal portfolio

and, more generally, suggest an econometric approach to incorporate uncertainty about the model

dynamics into the problem.

The model that I suggest treats betas, and potentially risk premia, as latent (or unobservable)

variables and stochastic functions of some observable instruments. Previous literature has modeled

betas as deterministic functions of ex-ante observable variables even though deterministic models

are unlikely to be perfect descriptions of the true dynamics.4 The possibility of model

misspecification has therefore been ignored in these studies. The framework that I suggest allows for model

misspecification and incorporates the investor’s uncertainty about the beta dynamics into the

problem. In particular, by using stochastic betas, the investor can specify a prior distribution for the

error terms in the beta equation to reflect his uncertainty about the beta model. In addition, he

3Conditional models have been studied by Bollerslev et al (1988), Harvey (1989), Shanken (1990), Bodurtha and Mark (1991), Ng (1991), Ferson and Harvey (1991, 1993, 1999), Evans (1994), Ferson and Korajczyk (1995), Braun et al (1995), He et al (1996), Ghysels (1998), and Lewellen (1999) among others.

4Stochastic models for betas have been previously suggested by Rosenberg (1973) and Ohlson and Rosenberg (1983), among others. However, these studies model betas as functions of lagged betas only and, more importantly, do not present a framework to address directly the investor’s uncertainty about the beta dynamics.


can separate his uncertainty about the beta dynamics from his beliefs about the asset pricing model

(alphas). For example, if an investor believes in the conditional CAPM but is uncertain about the

true beta process, he will probably set very tight priors around zero for alphas and allow for fairly

diffuse priors for the regression parameters and the residual variance in the beta equation. Given

that the time variation in alphas and betas plays very different roles in portfolio allocation problems,

it is important to consider these two sources of model uncertainty separately.

The framework that I adopt is Bayesian. Bayesian methods provide a convenient way to explore

portfolio selection problems because they account for parameter uncertainty by considering the

predictive distribution of returns. Furthermore, an investor’s priors about the model can be formally

incorporated into the portfolio selection problem. To estimate the posterior distribution of the

model parameters, I provide a new Bayesian estimation method that could be applied to other

problems. In particular, I use a data augmentation algorithm via Markov Chain Monte Carlo

methods, which consists of augmenting the data (i.e., the returns and predictive variables) with

a latent variable, the betas, and using the Gibbs sampler or the Metropolis-Hastings algorithm

to estimate the posterior distributions of the parameters. Also, I provide some insight into the

stochastic nature of betas and examine to what extent the simplification of deterministic,

time-varying betas is warranted.

To illustrate the framework suggested here, I examine the optimal portfolio of an investor

who allocates funds across a riskfree asset, a value-weighted market index (benchmark asset) and

a portfolio of small capitalization or value stocks (non-benchmark assets). The investor considers

the conditional CAPM as a reference point and uses the dividend yield, term and default spreads

as predictive variables. I examine the optimal portfolio under two scenarios for the price of risk

(the ratio of the expected return on the market to the variance), constant and time-varying price

of risk; and under two scenarios for the dynamics of beta, deterministic and stochastic betas.

In my empirical analysis, I find that the sample evidence on return predictability affects an

investor’s portfolio allocation decision, which is consistent with previous studies (e.g. Kandel and

Stambaugh, 1996; Barberis, 2000). The optimal portfolio mix, however, depends largely on the

investor’s beliefs about the source of predictability. As expected, if an investor dogmatically

believes that the predictability can be captured by a conditional model, the optimal allocation to

non-benchmark assets does not depend on the predictability evidence. However, if he allows for

conditional mispricing, the predictability evidence plays a major role in his allocation decision. I

also find that incorporating the investor’s uncertainty about the beta dynamics into the problem is

economically important and that allowing for time variation in the risk premia makes the optimal

portfolio even more sensitive to the predictability evidence.

This paper proceeds as follows. Section two presents a general framework to investigate how

predictability in asset returns affects an investor’s portfolio choice. First it briefly reviews the


literature on portfolio selection in a Bayesian setting and then introduces the framework suggested

in this paper to incorporate both return predictability and the investor’s beliefs about the source

of predictability in the portfolio selection problem. Section three presents the modeling framework

and Bayesian methodology used in this paper. It also discusses ways to examine the economic

significance of the evidence on predictability and the source of this predictability. Section four

describes the data and the priors. The empirical results are presented in sections five and six.

Section seven concludes the paper.

2 Portfolio Selection Under Time Series Return Predictability

2.1 General Framework and Previous Work5

Consider a risk averse investor with a one-period investment horizon who invests in a riskless asset and a portfolio of $(K + N)$ risky assets. $K$ of the risky assets are benchmark portfolios that expose the investor to priced sources of risk and $N$ of them are “non-benchmark” portfolios. Let $\omega$ denote the fraction of the investor’s portfolio allocated to the riskless asset and $w$ denote the $(K + N) \times 1$ vector of weights in the risky assets, such that $w'\iota_{K+N} = 1$, where $\iota_{K+N}$ is a conformable vector of ones. The investor’s wealth at the end of month $T + 1$ is

$$W_{T+1} = W_T \left(1 + r_f + (1 - \omega)\, w' r_{T+1}\right), \qquad (1)$$

where $r_f$ is the rate of return on the riskless asset, observed at the end of month $T$, and $r_{T+1}$ is the $(K + N) \times 1$ vector of returns on the risky assets in month $T + 1$ in excess of $r_f$.

Let $\Phi_T$ be the information that the investor observes at the end of month $T$. The investor chooses $\omega$ and $w$ so as to maximize the expected utility of his wealth at the end of month $T + 1$,

$$\max_{\omega,\, w} \int U(W_{T+1})\, p(r_{T+1} \mid \Phi_T)\, dr_{T+1}, \qquad (2)$$

where $U(\cdot)$ is the investor’s utility function and $p(r_{T+1} \mid \Phi_T)$ is the density of $r_{T+1}$ conditional on $\Phi_T$, also referred to as the predictive density.

To derive the predictive density $p(r_{T+1} \mid \Phi_T)$ and, hence, obtain the optimal portfolio, a Bayesian investor updates his beliefs about the model’s parameters, $\Theta$, and specifies the likelihood function for the future observation, $p(r_{T+1} \mid \Theta, \Phi_T)$. After observing the data, the investor’s beliefs about the parameters are summarized by the posterior distribution of $\Theta$, which is proportional to the product of the prior distribution and the likelihood function

$$p(\Theta \mid \Phi_T) \propto p(\Theta)\, L(\Theta \mid \Phi_T). \qquad (3)$$

5For more details about the general framework to portfolio selection in a Bayesian setting, see, for example, Bawa and Brown (1976) and, more recently, Kandel and Stambaugh (1996), Barberis (2000), and Pastor (2000).


The predictive density is then obtained by integrating the product of the likelihood function for the future observation and the posterior distribution of $\Theta$ with respect to the parameters

$$p(r_{T+1} \mid \Phi_T) = \int p(r_{T+1} \mid \Theta, \Phi_T)\, p(\Theta \mid \Phi_T)\, d\Theta. \qquad (4)$$

By integrating over the parameter space, the Bayesian investor explicitly takes into account parameter uncertainty or estimation risk in the portfolio allocation problem.
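The steps in (2) to (4) can be sketched numerically. The example below is only an illustration under strong simplifying assumptions that are not the model of this paper: a single risky asset, normally distributed excess returns with known volatility, a conjugate normal prior on the mean, power utility, and initial wealth normalized to one. All function names and the sample data are hypothetical.

```python
import math
import random


def posterior_for_mean(returns, sigma, prior_mean, prior_sd):
    """Conjugate normal update for an unknown mean when sigma is known."""
    n = len(returns)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(returns) / n
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, math.sqrt(post_var)


def predictive_draws(post_mean, post_sd, sigma, n_draws, rng):
    """Sample the predictive density in (4) by composition."""
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, post_sd)  # parameter drawn from the posterior
        draws.append(rng.gauss(mu, sigma))  # future return drawn given the parameter
    return draws


def power_utility(wealth, gamma):
    """CRRA utility of terminal wealth; gamma != 1 is assumed."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def optimal_weight(draws, rf, gamma, grid):
    """Grid search for the risky-asset weight that maximizes average utility, as in (2)."""
    def expected_utility(weight):
        return sum(power_utility(1.0 + rf + weight * r, gamma) for r in draws) / len(draws)
    return max(grid, key=expected_utility)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [0.012, -0.030, 0.045, 0.008, 0.021, -0.015]  # made-up excess returns
    mean, sd = posterior_for_mean(sample, sigma=0.05, prior_mean=0.0, prior_sd=0.10)
    draws = predictive_draws(mean, sd, 0.05, 20000, rng)
    grid = [i / 20 for i in range(21)]
    print("weight in the risky asset:", optimal_weight(draws, rf=0.004, gamma=5.0, grid=grid))
```

Because each predictive draw first samples the parameter and then the return, the simulated returns are more dispersed than returns drawn at a fixed point estimate, which is how estimation risk enters the allocation in this sketch.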

From the discussion above, it becomes clear that the optimal portfolio depends on the model

(likelihood function) that the investor specifies (in addition to his prior beliefs and utility function).

Traditionally, most studies on portfolio choice in a Bayesian setting have assumed that returns

are independently and identically distributed (i.i.d.) over time (e.g., Klein and

Bawa, 1976; Bawa, et al, 1979; Jobson and Korkie, 1980; Jorion, 1985, 1991; Frost and Savarino,

1986). Starting with the pioneering work of Kandel and Stambaugh (1996), some recent work has

analyzed the portfolio allocation problem when returns may be predictable.

Kandel and Stambaugh (1996) examine the portfolio choice between a market index and a

riskless asset of an investor who conditions on dividend yield. They find that the optimal portfolio

may depend on the current level of dividend yield even though its predictive ability could be

regarded as insignificant using standard statistical measures. Barberis (2000) extends the analysis

to long-horizon investors and finds that investors allocate more to stocks the longer their horizon as

a result of the predictability evidence. Finally, Shanken and Tamayo (2001) explore the portfolio

allocation problem when both expected returns and risk are time-varying. They show that an

investor’s optimal portfolio is affected by his prior beliefs about whether the predictive ability of

the dividend yield is due to changes in risk or mispricing. Studies extending the analysis to multiple

risky assets include Avramov (2001), Bauer (2001) and Cremers (2001).6

Most of this work, however, has not considered the potential usefulness of asset pricing models

in portfolio selection problems, mainly because it tends to focus on a single risky asset. As Pastor

(2001) puts it, these studies represent “data-based” approaches to portfolio selection; they specify

a functional form for the distribution of returns and estimate the parameters from the data. When

investors can choose from a wider set of assets however, they are likely to also use theoretical models

to help them in their portfolio allocation decision. In contrast to the “data-based” approach, this

“model-based” approach specifies an asset pricing model and the optimal portfolio of every investor

is a combination of benchmark portfolios that expose investors to priced sources of risk.7 Pastor

(2001) examines an approach to portfolio allocation that lies in between the “data-” and “model-based”

approaches in an i.i.d. context. He finds that an investor’s beliefs about a model’s pricing

ability can affect his asset allocation decision.

6Other studies analyzing the asset choice problem between a market index and a riskless asset in the presence of time-varying expected returns include Kim and Omberg (1996), Brennan et al (1997), and Campbell and Viceira (1999). These studies, however, ignore parameter uncertainty.

7The notion of “data” versus “model-based” approaches to portfolio selection was first introduced in Pastor (2000).


The portfolio selection problem presented in this paper also incorporates a “data-and-model-

based” approach. However, I relax the i.i.d. setting by allowing for time series predictability in

returns. Accordingly, the investor analyzed here focuses on conditional rather than unconditional

asset pricing models. These models are discussed in the next subsection.

2.2 Portfolio Selection Using Conditional Models

Consider an investor who observes $M$ predictive variables (including a constant), $Z_T = [z_1', \ldots, z_T']'$, at the beginning of month $T$, where $z_t = [1, z_{1,t}, \ldots, z_{M-1,t}]$ is a $1 \times M$ vector for $t = 1, \ldots, T$ and $z_t$ is a subset of all the information that investors use to set prices, $I_t$. The investor decides upon the optimal weight on asset $i$, which can be a benchmark or non-benchmark portfolio, based partly on the evidence from the time-series regression

$$r_{i,t+1} = z_t b_i + \varepsilon_{i,t+1}, \qquad (5)$$

for $i = 1, \ldots, K + N$ and $t = 1, \ldots, T$, and $\varepsilon_{i,t+1} \sim N(0, \sigma^2_{i,t+1})$.

In addition to the evidence from this predictive regression, the investor uses, at least indirectly,

asset pricing models to guide him in the portfolio selection problem. In particular, he may believe

that the expected returns on some assets (non-benchmark assets) are partly explained by the

assets’ exposures to priced sources of risk, whose realizations are replicated by the returns on the

benchmark assets. Hence, the investor follows a “data-and-model” based approach to portfolio

selection and, since expected returns may be time-varying, the natural reference point is provided

by conditional asset pricing models.

According to a conditional model, the cross-sectional and time series variation in expected returns can be explained by a model with time-varying betas and risk premia. If there are $K$ priced sources of risk, the expected return on non-benchmark asset $i$, $i = 1, \ldots, N$, conditional on all available information is given by

$$E[r_{i,t+1} \mid I_t] = E[\lambda_{t+1} \mid I_t]\, \beta_{it}, \qquad (6)$$

where $\lambda_{t+1}$ is a $1 \times K$ vector of risk premia, $\beta_{it}$ is a $K \times 1$ vector of conditional factor loadings and $E(\cdot \mid I_t)$ denotes expectation conditional on $I_t$. If the conditional asset pricing model holds, the optimal portfolio of every investor should be a combination of the $K$ benchmark portfolios.

The investor, however, may not believe that a conditional model can capture all the time-series

return predictability or price assets.8 Furthermore, even if he is fairly confident that a model could

hold conditional on all available information, he may want to account for model departures in his

8As in previous studies (e.g. Kandel and Stambaugh, 1996; Pastor, 2000), the investor in this paper should not be viewed as a representative investor, since equilibrium cannot be obtained if all investors have the same beliefs and use the same model (likelihood function) as this investor.


analysis, given that he uses a subset of the information $I_t$. This investor can use both a conditional

model and the predictive regression in (5) to help him in the portfolio selection problem.

Assume, for ease of exposition, that the investor chooses between a market index (the benchmark asset) and a portfolio of non-benchmark assets. Assume also that he uses the conditional Sharpe-Lintner CAPM as a reference point in the asset allocation decision, where the market index is a proxy for the market portfolio (hence, $K = N = 1$). The investor could combine the “data-based” and “model-based” approaches to portfolio selection by examining the following system for the non-benchmark asset

$$r_{t+1} = \alpha_t + \beta_t r_{m,t+1} + \varepsilon_{t+1} \qquad (7)$$

$$\alpha_t = z_t \delta_\alpha$$

$$\beta_t = z_t \delta_\beta + \eta^*_{\beta t},$$

where $\delta_\alpha$ and $\delta_\beta$ are $M \times 1$ vectors of parameters, $\varepsilon_{t+1} \sim N(0, \sigma^2_{\varepsilon,t+1})$ and $\eta^*_{\beta t} \sim N(0, \sigma^{*2}_\beta)$. For the benchmark asset he still uses the predictive regression in (5).9
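A small simulation may help fix ideas about system (7). The sketch below is illustrative only: the function and argument names are hypothetical, the parameter vectors of the alpha and beta equations are called `delta_alpha` and `delta_beta`, and the residual standard deviation is held constant for simplicity, whereas (7) allows it to vary over time.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def simulate_system7(z, market, delta_alpha, delta_beta, sd_beta, sd_eps, rng):
    """Simulate the non-benchmark excess return from system (7).

    z[t] holds the predictive variables known at t (first entry is the constant);
    market[t] is the benchmark excess return realized over the following month.
    """
    alphas, betas, returns = [], [], []
    for z_t, r_m in zip(z, market):
        alpha_t = dot(z_t, delta_alpha)                          # alpha equation
        beta_t = dot(z_t, delta_beta) + rng.gauss(0.0, sd_beta)  # stochastic beta
        r_next = alpha_t + beta_t * r_m + rng.gauss(0.0, sd_eps)
        alphas.append(alpha_t)
        betas.append(beta_t)
        returns.append(r_next)
    return alphas, betas, returns


if __name__ == "__main__":
    rng = random.Random(0)
    z = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(5)]
    market = [rng.gauss(0.006, 0.045) for _ in range(5)]
    _, betas, returns = simulate_system7(z, market, [0.0, 0.0], [1.1, 0.2], 0.1, 0.03, rng)
    for beta_t, r in zip(betas, returns):
        print(round(beta_t, 3), round(r, 4))
```

Setting `delta_alpha` to zeros corresponds to the case in which the conditional CAPM prices the asset, and setting `sd_beta` to zero gives deterministic betas.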

The models presented in (5) for the benchmark asset and in (7) for the non-benchmark asset

provide a general framework to analyze the role that beliefs play in an investor’s asset allocation.

In particular, it is possible to study how the optimal portfolio depends on the investor’s beliefs

about predictability, and the ability of a conditional model to capture the predictability and price

assets.

To make the specification of prior beliefs easier, express the predictive variables, $z_t$, as deviations from their means. In this case, assuming stationarity, the first elements of the parameter vectors $b_m$ in (5) for the benchmark asset, and $\delta_\alpha$ and $\delta_\beta$ in (7), denoted by $\bar{\lambda}$, $\bar{\alpha}$, and $\bar{\beta}$ respectively, are the long run means of the market risk premia, alphas and betas. Hence, the simple case in which the investor has dogmatic beliefs that expected returns are constant can be represented by setting very tight priors around $b_m = [\bar{\lambda}_0 \;\, 0'_{M-1}]'$ for the benchmark asset in (5), and $\delta_\alpha = [\bar{\alpha}_0 \;\, 0'_{M-1}]'$, $\delta_\beta = [\bar{\beta}_0 \;\, 0'_{M-1}]'$ and $\sigma^{*2}_\beta = 0$ for the non-benchmark asset in (7), where $0_{M-1}$ is an $(M - 1) \times 1$ vector of zeros and $\bar{\lambda}_0$, $\bar{\alpha}_0$, and $\bar{\beta}_0$ are the (long run) prior means of the risk premium, alpha, and beta. Cases in which the investor allows for predictability can also be easily explored. For example, if an investor dogmatically believes in the conditional model, he will have very tight priors that the parameters for alpha equal zero, i.e., $\delta_\alpha = 0_M$. At the other extreme, if he dogmatically believes that the predictability cannot be explained by an asset pricing model, he will center $\delta_\alpha$ away from zero or set diffuse priors around $\delta_\alpha = 0_M$, and maybe $\sigma^{*2}_\beta = 0$, $\delta_\beta = [\bar{\beta} \;\, 0'_{M-1}]'$, and $b_m = [\bar{\lambda} \;\, 0'_{M-1}]'$. Finally, if the investor does not hold dogmatic beliefs, he will not specify tight priors and his optimal portfolio will be affected by the sample evidence on time variation in alphas, betas, and risk premia, and the model’s ability to explain average returns.
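The role of prior tightness described above can be illustrated with a deliberately simplified calculation. The sketch assumes a single constant alpha, a normal prior centered at zero, and a sample estimate with a known standard error; these assumptions, the function name, and the numbers are hypothetical and are not taken from the paper.

```python
import math


def posterior_alpha(alpha_hat, std_error, prior_sd, prior_mean=0.0):
    """Posterior mean and standard deviation of alpha under a normal prior.

    prior_sd == 0 encodes a dogmatic belief: the data cannot move the posterior.
    """
    if prior_sd == 0.0:
        return prior_mean, 0.0
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * alpha_hat)
    return post_mean, math.sqrt(post_var)


if __name__ == "__main__":
    # A monthly alpha estimate of 0.4% with a standard error of 0.2% (made-up numbers).
    for prior_sd in (0.0, 0.001, 0.002, 0.01, 1.0):
        mean, sd = posterior_alpha(0.004, 0.002, prior_sd)
        print(prior_sd, round(mean, 5), round(sd, 5))
```

As the prior standard deviation grows from zero, the posterior mean moves from the model-implied value of zero toward the sample estimate, which mirrors the range from dogmatic belief in the model to reliance on the data alone.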

9Note that, for the non-benchmark asset, (5) is nested in (7).


2.3 Uncertainty about the Model Dynamics

One problem with conditional models is that finance theory provides very little indication of

how expected returns, risk premia and betas vary over time. Furthermore, empirical analyses of

conditional models necessarily use only part of all the available information. These factors introduce

additional uncertainty into the problem related to the model dynamics. A simplified approach to

deal with this type of uncertainty is to treat the risk premia and betas as latent (or unobservable)

variables and model them as stochastic functions of some observable instruments.

For the non-benchmark assets, I therefore assume that the investor considers the system in (7). A similar framework was first suggested by Shanken (1990) to test conditional asset pricing models but, like most of the previous literature, he assumes that betas are deterministic functions of the predictive variables.10 In contrast, I model them as stochastic functions (i.e., I allow for error terms in the beta equations), which allows me to incorporate into the problem the investor’s uncertainty about the beta dynamics by specifying prior distributions for the regression parameters, $\delta_\beta$, and the variance of the error term in the beta equation, $\sigma^{*2}_\beta$.11 Furthermore, the investor can separate his uncertainty about the beta dynamics (i.e., $\delta_\beta$ and $\sigma^{*2}_\beta$) from his beliefs about the asset pricing model (i.e., $\delta_\alpha$). For example, if he thinks that the conditional CAPM is likely to hold but is uncertain about the true beta process, he will set very tight priors around zero for the alpha parameters, $\delta_\alpha$, and allow for fairly diffuse priors for the regression parameters, $\delta_\beta$, and the residual variance $\sigma^{*2}_\beta$ in the beta equation. Given that time variation in alphas and betas plays very different roles in portfolio allocation problems (see section 3.3), it is important to consider these two sources of uncertainty separately.

The investor may also want to account for uncertainty about the risk premia dynamics. Again, he could model the expected excess returns on the benchmark assets as latent variables, $\mu_t$, using the following model

$$r_{m,t+1} = \mu_t + \xi_{t+1} \qquad (8)$$

$$\mu_t = z_t \delta_\mu + \eta^*_{\mu t},$$

where the investor’s uncertainty is reflected by his prior beliefs about $\delta_\mu$ and $\sigma^{*2}_\mu$. The practical implementation of this system, however, presents econometric problems because empirically the variance of $\eta^*_{\mu t}$ is very small compared to the variance of $\xi_{t+1}$. Pitt and Shephard (1999) show that in this case, if the error terms $\eta^*_{\mu t}$ are highly autocorrelated, algorithms to estimate models like (8) converge very slowly or even fail to converge. So, unless we have strong priors about the parameters

10Betas have been modeled as deterministic functions of macroeconomic and financial variables (e.g., Harvey, 1989; Shanken, 1990; Ferson and Harvey, 1991, 1993, 1999; Ferson and Korajczyk, 1995; He et al, 1996), firm specific variables (e.g., Ferson and Korajczyk, 1995 and Lewellen, 1999), and lagged values of betas, variances or covariances (e.g., Bollerslev et al, 1988; Bodurtha and Mark, 1991; Ng, 1991; Evans, 1994). See Tamayo (2000) for further discussion on tests of conditional asset pricing models when betas are stochastic versus deterministic.

11In addition, stochastic models allow for potential misspecification in the model.


governing the model, its estimation may present difficulties (the same applies to modeling alphas

as latent variables). Given this problem, in my empirical analysis I do not model the risk premia or

alphas as stochastic processes, although in principle one could do so by specifying strong priors.12

As I show in the empirical section, the estimation of stochastic betas does not pose the problem

described above because the variance of $\eta^*_{\beta t}$ is large compared to the variance of $\varepsilon_{t+1}$.

3 Modeling Framework and Bayesian Methodology

In the empirical analysis, I restrict the study to $K = N = 1$; that is, the investor chooses

between a market index (benchmark portfolio), and one non-benchmark portfolio. The methodology

however, can be applied to a larger number of benchmark and non-benchmark assets. The general

case is derived in the appendices.

I model the error terms in the beta equations as first order autoregressive, AR(1), processes. These errors are likely to be serially correlated since variables other than $z_t$ (i.e., the “omitted” variables) could capture persistent components in beta. More generally, model misspecification can also lead to autocorrelated errors in the beta equation. Therefore, for the non-benchmark asset, the investor estimates the following model13

$$r_{t+1} = \alpha_t + \beta_t r_{m,t+1} + \varepsilon_{t+1} \qquad (9)$$

$$\alpha_t = z_t \delta_\alpha$$

$$\beta_t = (z_t - \rho_\beta z_{t-1})\, \delta_\beta + \rho_\beta \beta_{t-1} + \eta_{\beta t},$$

where $\rho_\beta$ is the autoregressive parameter, and $\varepsilon_{t+1} \sim N(0, \sigma^2_\varepsilon)$ and $\eta_{\beta t} \sim N(0, \sigma^2_\beta)$ are homoskedastic error terms.14 For the benchmark asset, he estimates the regression

$$r_{m,t+1} = z_t b_m + \varepsilon_{m,t+1}, \qquad (10)$$

where $\varepsilon_{m,t+1} \sim N(0, \sigma^2_m)$. The latter regression is also estimated with random coefficients $b_{mt}$ to allow for conditional heteroskedasticity, as explained in the next subsection.
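The beta equation in (9) is a quasi-differenced version of a beta equation with an AR(1) error, and the two forms can be checked against each other numerically. The sketch below does this for simulated inputs. The function names are hypothetical, and the initial condition (a pre-sample error of zero) is an assumption made only for this illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_levels(z, delta, rho, shocks):
    """Beta as z_t * delta plus an AR(1) error built up from the innovations."""
    betas = []
    error = 0.0  # assumed pre-sample value of the AR(1) error
    for z_t, shock in zip(z, shocks):
        error = rho * error + shock
        betas.append(dot(z_t, delta) + error)
    return betas


def beta_quasi_differenced(z, delta, rho, shocks):
    """Beta from the recursion in (9), started from the same first value."""
    betas = [dot(z[0], delta) + shocks[0]]
    for t in range(1, len(z)):
        z_diff = [a - rho * b for a, b in zip(z[t], z[t - 1])]
        betas.append(dot(z_diff, delta) + rho * betas[-1] + shocks[t])
    return betas


if __name__ == "__main__":
    rng = random.Random(0)
    z = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(200)]
    shocks = [rng.gauss(0.0, 0.1) for _ in range(200)]
    levels = beta_levels(z, [1.0, 0.3], 0.8, shocks)
    recursion = beta_quasi_differenced(z, [1.0, 0.3], 0.8, shocks)
    print(max(abs(a - b) for a, b in zip(levels, recursion)))
```

With the autoregressive parameter and the innovations set to zero, both functions return the deterministic beta that is linear in the predictive variables.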

One appealing feature of the model for beta is that it nests many specifications previously

suggested in the literature. For example, assuming that $\rho_\beta = 0$ and ignoring the error terms in

the $\beta_t$ equation, this is equivalent to the specifications in, for example, Shanken (1990), Ferson and

Harvey (1991, 1993, 1999), and Lewellen (1999) where $\beta_t$ is linearly related to ex-ante observable

12By specifying strong priors about the model parameters, one “guides” the simulation algorithm and, hence, it is easier to achieve convergence.

13This result is derived by modeling the error term as an AR(1) process, $\eta^*_{\beta t} = \rho_\beta \eta^*_{\beta,t-1} + \eta_{\beta t}$. After substituting in $\beta_t = z_t \delta_\beta + \eta^*_{\beta t}$ this yields $\beta_t = z_t \delta_\beta + \rho_\beta (\beta_{t-1} - z_{t-1} \delta_\beta) + \eta_{\beta t} = (z_t - \rho_\beta z_{t-1})\, \delta_\beta + \rho_\beta \beta_{t-1} + \eta_{\beta t}$.

14Note that due to leverage effects, the residual variance of the non-benchmark asset could vary with the betas. This is left to future research.


variables. Another case, related to the GARCH specifications in Evans (1994), and Braun et al

(1995), amounts to letting all the elements of $z_t$ but the constant equal 0 at any $t$.

3.1 Assumptions About the Price of Risk

One important input to the portfolio selection problem, and central to the variation in betas,

is the conditional variance of the risk premia. Modeling general forms of heteroskedasticity in a

Bayesian framework adds an additional complication to the problem. This is examined in Shanken

and Tamayo (2001) and is not pursued here.15 Instead, I make some simplifying assumptions and

investigate the investor’s portfolio allocation problem under two different scenarios for the price of

risk (the ratio of the expected return on the benchmark asset to its conditional variance).

In the first scenario, I assume that the investor has dogmatic priors that the price of risk is

constant over time. I model the expected return on the benchmark asset using (5) and vary the

conditional variance accordingly so that the price of risk remains constant. Hence, in this case, I

do not need to model the conditional variance. The assumption of constant price of risk has been

adopted in several tests of conditional asset pricing models although it has been rejected empirically

in several studies (e.g., Harvey, 1989). Nonetheless, it provides a starting point in the analysis of

the economic significance of time series return predictability in the presence of multiple assets.
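Under the first scenario, the conditional variance follows directly from the expected return once the price of risk is fixed. The short sketch below illustrates that arithmetic. The function name and the numbers are hypothetical, and raising an error when the fitted expected return is not positive is a choice made for this illustration, not something stated in the paper.

```python
def implied_variance(z_t, b_m, price_of_risk):
    """Conditional variance implied by a constant price of risk.

    The price of risk is the ratio of the expected excess return on the
    benchmark asset to its conditional variance, so the variance equals
    the fitted expected return divided by the price of risk.
    """
    expected_return = sum(z * b for z, b in zip(z_t, b_m))
    if price_of_risk <= 0.0 or expected_return <= 0.0:
        # Illustrative guard: a non-positive value would imply a non-positive variance.
        raise ValueError("expected return and price of risk must be positive")
    return expected_return / price_of_risk


if __name__ == "__main__":
    # Constant plus one demeaned predictor, with made-up coefficients.
    print(implied_variance([1.0, 0.5], [0.006, 0.004], 2.0))
```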

In the second scenario, I relax the constant price of risk assumption and model the conditional variance of the benchmark asset as a function of the predictive variables using a regression model with random coefficients. For $k = 1, \ldots, K$ benchmark assets, the model in (10) is extended to a model with random coefficients

$$r_{k,t+1} = z_t b_{kt} + \varepsilon_{k,t+1} \qquad (11)$$

$$b_{kt} = b_k + v_{kt},$$

where $\varepsilon_{k,t+1} \sim N(0, \sigma^2_k)$, $v_{kt} \sim N(0, \Sigma_{v_k})$ and $\Sigma_{v_k}$ is an $M \times M$ matrix.16 This regression model with random coefficients and constant variance can be transformed into a regression model with deterministic coefficients and a heteroskedastic error term

$$r_{k,t+1} = z_t b_k + \varepsilon^*_{k,t+1}, \qquad (12)$$

where $\varepsilon^*_{k,t+1} = z_t v_{kt} + \varepsilon_{k,t+1}$. The variance of $\varepsilon^*_{k,t+1}$ conditional on $z_t$ is

$$\mathrm{Var}[\varepsilon^*_{k,t+1} \mid z_t] = z_t \Sigma_{v_k} z_t' + \sigma^2_k. \qquad (13)$$

If the predictive variables are expressed as deviations from their means, then the model in

(13) relates the conditional return variance to the volatility of the predictive variables, typically15Shanken and Tamayo (2001) model the conditional variance of the market as a function of the dividend yield

and use importance sampling to obtain the parameter estimates.16A similar framework could be used to allow for heteroskedasticity in the residual variances of the non-benchmark

assets.

10

macroeconomic and financial variables. This specification is in the spirit of Schwert (1989), who

relates the stock market volatility to the time-varying volatility of a variety of economic variables.

Although this random coefficient model is by no means the only way to model the variances, it

does provide a convenient approach to model them using the methodology developed in this paper.
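The conditional variance implied by the random coefficient model in (13) is a quadratic form in the predictive variables plus a constant. The following is a minimal sketch of that calculation, not the paper's code: the function name, the ordering of the regressors (a constant followed by two demeaned predictors), and all numerical values are hypothetical and chosen only for illustration.

```python
def conditional_variance(z, sigma_eta, sigma2):
    """Return z * Sigma_eta * z' + sigma^2 for one row of regressors z."""
    k = len(z)
    quad = sum(z[i] * sigma_eta[i][j] * z[j] for i in range(k) for j in range(k))
    return quad + sigma2


# Hypothetical covariance matrix of the random coefficients
# (constant, predictor 1, predictor 2) and hypothetical residual variance.
SIGMA_ETA = [
    [0.0, 0.0, 0.0],
    [0.0, 4.0, 1.0],
    [0.0, 1.0, 2.0],
]
SIGMA2 = 16.0

# Predictors at their means (deviations equal to zero).
var_at_mean = conditional_variance([1.0, 0.0, 0.0], SIGMA_ETA, SIGMA2)
# First predictor one unit above its mean, second one unit below.
var_away = conditional_variance([1.0, 1.0, -1.0], SIGMA_ETA, SIGMA2)
```

With these illustrative inputs the variance rises as the predictors move away from their means, which is the channel through which the predictive variables drive the conditional variance in this specification.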

3.2 Bayesian Methodology

Priors

For the non-benchmark portfolio, I assume independent conjugate priors for the parameters in (9), which simplifies the computation of the posteriors considerably:17

$$p(\theta_\alpha, \theta_\beta, \phi, \sigma^{*2}_\beta, \sigma^2_\varepsilon) = p(\theta_\alpha)\,p(\theta_\beta)\,p(\phi)\,p(\sigma^{*2}_\beta)\,p(\sigma^2_\varepsilon). \qquad (14)$$

In particular, I specify normal distributions as the priors of $\theta_\alpha$ and $\theta_\beta$, a normal distribution truncated to the stationary region (-1,1) as the prior of $\phi$, and an inverse gamma distribution as the prior of $\sigma^{*2}_\beta$, the long run residual variance in the beta equation. The prior of $\sigma^2_\beta$ is implied by the priors of $\sigma^{*2}_\beta$ and $\phi$ (for further details, see Tamayo, 2000). The prior of $\sigma^2_\varepsilon$ is assumed to be noninformative, $p(\sigma_\varepsilon) \propto \sigma_\varepsilon^{-1}$.

Finally, for the benchmark portfolio, I assume that the priors of the parameters in model (10) are also independent conjugate priors:

$$p(\theta, \sigma^2) = p(\theta)\,p(\sigma^2). \qquad (15)$$

I specify a normal distribution as the prior distribution of $\theta$. The prior of $\sigma^2$ is assumed to be noninformative. The actual values for all the prior distributions are discussed in section 4.3 and Appendix V.

Estimation of Posteriors: Data Augmentation using MCMC Methods

The estimation of the models when beta is a deterministic function of the explanatory variables follows standard Bayesian results (see Appendix I). When beta is a stochastic function, as in (9), the system is a state-space model and its estimation is more complex.18 In a classical framework, it is estimated using maximum likelihood and the Kalman filter. In a Bayesian framework, it can be estimated using the two alternative methods that I briefly describe in this subsection.19 Further details are in Appendices I and III, and Tamayo (2000). Notice that the models with random coefficients are also state-space models and can therefore be estimated using the approach that I use to estimate model (9).

17 I use $p$ and $\pi$ to denote prior and posterior distributions of the parameters respectively.

18 State-space models provide a useful framework to express dynamic systems that involve unobserved state variables or stochastic coefficients. A state-space model consists of two equations: a measurement equation and a transition equation (also called state equation). The measurement equation describes the relation between the data and the unobservable state variables (or stochastic coefficients). The transition equation describes the dynamics of the state variables (or stochastic coefficients) and has the form of a first-order difference equation in the state vector. In the context of this paper, the measurement equation is the asset pricing model and the transition equation is the equation for beta. See Harvey (1989) for further details.

The problem with the estimation of the state-space model proposed in (9) is that the functional

form of the joint posterior density of the parameters, ³�� " � �

2� � �

2�|� ��

´� is unknown.

However, it is possible to obtain a sample from it using a data augmentation algorithm via Markov

Chain Monte Carlo (MCMC) methods.20 The basic idea of data augmentation is to augment

the observed data (� �� ) with a latent variable � in order to obtain the augmented posterior

³�� " � �

2� � �

2�|��

´� Given the assumption of independent priors, this posterior can

be decomposed into:

¡�� " � �

2� � �

2�|��

¢=

¡� � " � �

2� |�� 2��

¢(16)

× ¡��|�� 2�� ¢× ¡�2�|��

¢�

These posteriors are either analytically tractable or can be simulated. Therefore, it is possible to sample from them by using a data augmentation algorithm via the Gibbs sampler, when the posteriors have familiar distributions, or via the Metropolis-Hastings algorithm, when the posterior distributions are unfamiliar (see Appendix III). Finally, there are two alternative methods to sample from the distribution of the latent betas. The first one obtains the posterior distribution of the latent betas based on an approach suggested by Jacquier, Polson and Rossi (1994), and Kim, Shephard and Chib (1998) for stochastic volatility models, and is derived in Tamayo (2000). The second method uses the simulation smoother for state space models of De Jong and Shephard (1995), and is described in Appendix II.
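To make the Gibbs step concrete, the following is a minimal sketch of a Gibbs sampler for a much simpler problem than the state-space model in (9): the mean and variance of normally distributed data, with a normal prior on the mean and the noninformative prior $p(\sigma) \propto \sigma^{-1}$ on the standard deviation. It is not the algorithm of Appendix III, and the function name, prior values, and simulated data are all assumptions made for illustration. It only shows the alternation between conditional posteriors that have familiar distributions.

```python
import random
import statistics


def gibbs_normal(y, prior_mean=0.0, prior_var=1e4,
                 n_draws=2000, burn_in=500, seed=0):
    """Draw (mu, sigma2) from the posterior of y_i ~ N(mu, sigma2)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    sigma2 = statistics.variance(y)  # starting value
    draws = []
    for it in range(burn_in + n_draws):
        # mu | sigma2, y is normal: precision-weighted mix of prior and data.
        precision = 1.0 / prior_var + n / sigma2
        mean = (prior_mean / prior_var + n * ybar / sigma2) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # sigma2 | mu, y is inverse gamma(n/2, S/2) under p(sigma) ∝ 1/sigma.
        s = sum((v - mu) ** 2 for v in y)
        sigma2 = (s / 2.0) / rng.gammavariate(n / 2.0, 1.0)
        if it >= burn_in:
            draws.append((mu, sigma2))
    return draws


# Simulated (hypothetical) data: 200 observations with mean 0.5 and s.d. 2.
_data_rng = random.Random(1)
y = [_data_rng.gauss(0.5, 2.0) for _ in range(200)]

draws = gibbs_normal(y)
post_mean_mu = statistics.fmean([d[0] for d in draws])
post_mean_sigma2 = statistics.fmean([d[1] for d in draws])
```

With a nearly flat prior on the mean, the posterior mean of `mu` should sit close to the sample mean, which is a quick sanity check on the sampler.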

3.3 Predictive Return Density and Portfolio Allocation

The predictive density of the excess returns can be easily computed from the output obtained from the data augmentation via the MCMC algorithm. Formally, the predictive density of $r_{T+1}$ is obtained by integrating the joint predictive density of $r_{T+1}$ and $r_{m,T+1}$ with respect to $r_{m,T+1}$:

$$p(r_{T+1} \mid D_T) = \int p(r_{T+1}, r_{m,T+1} \mid D_T)\,dr_{m,T+1} = \int p(r_{T+1} \mid r_{m,T+1}, D_T)\,p(r_{m,T+1} \mid D_T)\,dr_{m,T+1}. \qquad (17)$$

19 The Bayesian approach has several advantages over the classical one. First, the estimation of beta is based on the joint posterior distribution of the parameters rather than on distributions conditional on the maximum likelihood estimates of the rest of the parameters. Second, the Bayesian approach does not rely on asymptotics. The exact small-sample distribution of the parameters can be computed using a data augmentation algorithm via Markov Chain Monte Carlo methods.

20 MCMC is a simulation technique that generates a sample from a target distribution. It specifies the transition probability of a Markov process with the property that its limiting distribution is the desired distribution. The Markov chain is iterated a large number of times and, under a set of criteria, the resulting sample is a sample from the desired distribution. For further discussion, see Chib and Greenberg (1994), Gilks et al. (1997), and Tanner (1996).

This predictive density and the predictive density of ��+1 can be obtained by integrating the

product of the likelihood functions for the future observations and the posterior distributions of

the parameters with respect to the parameters:

(��+1|� ) =

ZZ (��+1|�� )�� (18)

and

(�+1|� ) =

ZZZZ (�+1|�� +1� � ) (��+1|� ) (19)

(�� |�� ) (�� |� )��+1�� (20)
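In practice, integrals of this kind are approximated by averaging over posterior draws. The following is a minimal sketch, assuming a single-predictor regression for one return and a small set of hypothetical posterior draws of the intercept, slope, and residual variance. It is not the paper's procedure. It computes the first two predictive moments with the law of total variance, so parameter uncertainty enters as the dispersion of the conditional mean across draws.

```python
import statistics


def predictive_moments(draws, z):
    """Predictive mean and variance of next-period return.

    draws: list of (intercept, slope, sigma2) posterior draws (hypothetical).
    z:     most recent value of the (standardized) predictive variable.
    """
    cond_means = [a + b * z for a, b, _ in draws]
    mean = statistics.fmean(cond_means)
    # Law of total variance: E[sigma2] + Var(conditional mean across draws).
    var = statistics.fmean([s2 for _, _, s2 in draws]) + statistics.pvariance(cond_means)
    return mean, var


# Hypothetical posterior draws, in percent per month.
DRAWS = [(0.4, 0.2, 16.0), (0.6, 0.4, 20.0), (0.5, 0.3, 18.0), (0.5, 0.3, 18.0)]

mean_at_mean, var_at_mean = predictive_moments(DRAWS, 0.0)
mean_high, var_high = predictive_moments(DRAWS, 1.0)
```

In this illustration the predictive variance exceeds the average residual variance because the draws disagree about the conditional mean, and the gap widens when the predictor is away from its mean.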

The computation of the optimal portfolio weights is greatly simplified in a mean-variance framework because, in this case, the investor only cares about the first two moments of the predictive distribution, $E$ and $V$.21 Therefore, following Pastor (2000) and Pastor and Stambaugh (2000), I consider a risk-averse mean-variance investor who maximizes the mean-variance objective

$$C_p = E_p - \tfrac{1}{2} A \sigma^2_p, \qquad (21)$$

where $E_p$ is the expected rate of return on the portfolio, $\sigma^2_p$ is the variance, and $A$ is the risk aversion parameter.

The solution to the expected-utility maximization problem in (2), given the mean-variance objective (21), is

$$w^* = A^{-1} V^{-1} E, \qquad (22)$$

where $w^*$ is the $(N + K) \times 1$ vector of portfolio weights. These weights do not necessarily sum to one because they are also affected by the investment in the riskfree asset, which is given by $1 - \iota'_{N+K} w^*$. Hence, they represent the proportion of an investor’s wealth allocated to each asset.

When $N = K = 1$ the optimal weights in (22) are given by (see Appendix IV)

$$w^*_{nb} = A^{-1}\left(\frac{\bar{\alpha}}{\mathrm{Var}(e_{T+1} \mid D_T)}\right) = A^{-1}\left(\frac{\bar{\alpha}}{[1 \;\; E_m]\,\mathrm{Cov}(\alpha, \beta \mid D_T)\,[1 \;\; E_m]' + \mathrm{Var}(\beta \mid D_T)\,V_m + E(\sigma^2_\varepsilon \mid D_T)}\right) \qquad (23)$$

and

$$w^*_m = A^{-1}\frac{E_m}{V_m} - \bar{\beta}\, w^*_{nb}, \qquad (24)$$

where $w^*_{nb}$ and $w^*_m$ are the optimal weights on the non-benchmark asset and the market index respectively, $\bar{\alpha}$ and $\bar{\beta}$ are the predictive means of alpha and beta, $E_m$ and $V_m$ are the predictive mean and variance of the market return, and $e_{T+1}$ is the component of the non-benchmark return that is not spanned by the market. The conditional moments are calculated using the predictive densities of the returns.
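The mean-variance solution for one non-benchmark asset and one benchmark asset can be sketched as follows. This is an illustration only: the function name and the predictive moments are hypothetical, and the code takes the predictive mean vector and covariance matrix as given rather than deriving them from posterior draws. It also checks numerically that the two-asset solution agrees with the decomposition into an alpha-driven weight on the non-benchmark asset and a beta-adjusted weight on the market.

```python
def mean_variance_weights(mu, cov, risk_aversion):
    """Weights (1/A) * inverse(V) * E for two risky assets, plus the riskfree share."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    w = [(inv[i][0] * mu[0] + inv[i][1] * mu[1]) / risk_aversion for i in range(2)]
    return w, 1.0 - sum(w)


# Hypothetical monthly predictive moments, ordered [non-benchmark, market].
E = [0.010, 0.005]
V = [[0.0040, 0.0018], [0.0018, 0.0020]]
A = 2.8

weights, riskfree = mean_variance_weights(E, V, A)

# Equivalent decomposition: beta, alpha, and residual variance
# implied by the same predictive moments.
beta = V[0][1] / V[1][1]
alpha = E[0] - beta * E[1]
resid_var = V[0][0] - beta ** 2 * V[1][1]
w_nb = alpha / (A * resid_var)
w_m = E[1] / (A * V[1][1]) - beta * w_nb
```

The weights need not sum to one; the remainder, `riskfree`, is the position in the riskfree asset (negative values indicate borrowing).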

21 In more general situations, it is possible to draw a sample from the predictive densities by sequentially sampling from the posterior distributions and computing the future observations $r_{T+1}$ and $r_{m,T+1}$ using the models in (9) and (10).

The expressions in (23) and (24) provide some insight into what affects the optimal weights. First, parameter uncertainty reduces the total allocation to risky assets. For example, the denominator in (23) takes parameter uncertainty into account through the covariance matrix of alpha and beta. In a classical approach to portfolio selection, the parameters are fixed and, hence, $\mathrm{Cov}(\alpha, \beta \mid D_T) = 0_{2\times 2}$. In a Bayesian setting, the parameters are random variables and, hence, $\mathrm{Cov}(\alpha, \beta \mid D_T)$ is a positive definite matrix.22 Second, uncertainty about the beta dynamics increases the variance of the predictive distribution of the returns on the non-benchmark assets (see Appendix IV) and, as a result, reduces the allocation to the non-benchmark asset. Third, ceteris paribus, time variation in betas does not affect an investor’s optimal allocation to the non-benchmark asset but his allocation to the benchmark asset could be affected. Finally, conditional mispricing (time-varying alphas) can affect the optimal weights on both the benchmark and non-benchmark assets.

3.4 Economic Significance of Return Predictability and Source of Predictability

As in Kandel and Stambaugh (1996), I study the economic significance of return predictability by examining the sensitivity of the optimal allocation to the most recent observation of the predictive variables, $Z_T$. I compare the optimal allocation $w^*$, the solution to the investor’s maximization problem in (2), to a suboptimal allocation $w^s$, which is derived by solving (2) when the most recent observation $Z_T$ is replaced by a different observation $Z^s_T$. Although the posterior distributions of the parameters are the same under both samples, the densities of the future observations $r_{T+1}$ and $r_{m,T+1}$ differ because the most recent values of the predictive variables are different.23 Hence, differences between $w^*$ and $w^s$ document the effect that the sample evidence on predictability has on the optimal allocation.

A second way to assess the economic significance of the sample is to examine whether the optimal portfolio is sensitive to the source of predictability, namely model mispricing, changes in betas or risk premia. In this case, I compare the optimal allocation $w^{*M}$ to a suboptimal allocation $w^{M}$ that is obtained by solving (2) under an alternative model (or likelihood). For example, one model may be the CAPM with constant alphas and time-varying betas and the alternative model the CAPM with time-varying alphas and betas. The information set, $D_T$, is assumed to be the same in both cases. Therefore, differences between $w^{*M}$ and $w^{M}$ reflect the role that the investor’s asset pricing model plays in his portfolio allocation decision. Unlike the case discussed above, now the posterior distributions of the parameters are different under the two models because the likelihood functions and priors differ.24

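22 For the market weights, parameter uncertainty is also reflected through $V_m$, which increases.

23 Hence, $\pi(\Theta \mid D_T) = \pi(\Theta \mid D^s_T)$, $p(r_{T+1} \mid \Theta, r_{m,T+1}, D_T) \neq p(r_{T+1} \mid \Theta, r_{m,T+1}, D^s_T)$ and $p(r_{m,T+1} \mid \Theta, D_T) \neq p(r_{m,T+1} \mid \Theta, D^s_T)$.

24 Hence, $\pi^{*M}(\Theta \mid D_T) \neq \pi^{M}(\Theta \mid D_T)$, $p^{*M}(r_{T+1} \mid \Theta, r_{m,T+1}, D_T) \neq p^{M}(r_{T+1} \mid \Theta, r_{m,T+1}, D_T)$ and $p^{*M}(r_{m,T+1} \mid \Theta, D_T) \neq$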

Finally, I also analyze the economic significance of the sample by computing the investor’s expected utility loss, measured as the difference in certainty equivalent returns (CER), if he were to hold a suboptimal portfolio instead of the one he perceives to be optimal. The CER comparison (the CER for the optimal portfolio minus the CER for the suboptimal portfolio, $C^* - C^s$ or $C^{*M} - C^{M}$) is done using one common predictive distribution: the distribution associated with the optimal portfolio (i.e., the predictive distribution obtained under the sample $D_T$ or under the investor’s model). As Kandel and Stambaugh (1996) emphasize, it is important to compute CERs using one common probability distribution because otherwise it is difficult to interpret differences in CERs.
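The CER comparison can be sketched as follows, assuming the mean-variance objective and evaluating both portfolios under the same predictive moments. The function names and numbers are hypothetical and the example uses a single risky asset to keep the arithmetic transparent. It is an illustration of the comparison, not the paper's implementation.

```python
def cer(w, mu, cov, risk_aversion):
    """Certainty equivalent return of weights w under moments (mu, cov)."""
    n = len(w)
    mean = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return mean - 0.5 * risk_aversion * var


def cer_loss(w_opt, w_sub, mu, cov, risk_aversion):
    """CER of the optimal portfolio minus CER of the suboptimal one,
    both evaluated under the SAME predictive distribution."""
    return cer(w_opt, mu, cov, risk_aversion) - cer(w_sub, mu, cov, risk_aversion)


# Hypothetical monthly predictive moments of a single risky asset.
MU = [0.008]
COV = [[0.002]]
A = 2.8

w_opt = [MU[0] / (A * COV[0][0])]  # optimal under these moments
w_sub = [1.0]                      # hypothetical allocation chosen under other beliefs
loss = cer_loss(w_opt, w_sub, MU, COV, A)
```

Because `w_opt` maximizes the objective under the common distribution, the loss is non-negative by construction; in the one-asset case it equals half the risk aversion times the variance times the squared difference in weights.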

4 Specification of Priors and Data

4.1 Specification of the Priors in the Empirical Test

In the empirical analysis, I assume that the investor is very uncertain about the degree of

predictability (i.e., the parameters in the model) but has dogmatic beliefs about the source of

predictability (i.e., the model). The former assumption (i.e., fairly diffuse priors given a model)

allows me to investigate the role that the sample evidence on predictability plays in portfolio

allocation decisions. The latter assumption (i.e., dogmatic priors for the models) allows me to

investigate the role of the source of predictability.25

For example, consider an investor who believes that the return predictability may not be cap-

tured by a conditional asset pricing model. By setting diffuse priors for the parameters in (10),

it is possible to examine how the predictability evidence affects his optimal portfolio. Conversely,

consider now an investor who dogmatically believes that the predictability can be captured by a

conditional asset pricing model but that the model may misprice assets on average.26 By setting

diffuse priors for the parameters in a constant alpha, time-varying beta model, it is possible to analyze

how the time variation in risk premia and betas affects the investor’s optimal portfolio. Finally,

the role that model mispricing or, more generally, the source of predictability (i.e., the model) plays

in the portfolio allocation decision can be analyzed by comparing the portfolio allocation of these

two investors.

The assumption of diffuse priors given a model is also convenient econometrically because it simplifies the computation of the posterior distributions of the parameters when some of the predictive variables, such as the dividend yield, are endogenous regressors. If the regressors are endogenous, the predictive regression should be estimated simultaneously with a model for the predictive variables, unless one has very strong priors that the errors are independent across the models or specifies diffuse priors for the model parameters (see Stambaugh, 1999). In this paper I assume that the priors are fairly diffuse and I do not estimate the models for the predictive variables.27

$p^{M}(r_{m,T+1} \mid \Theta, D_T)$. The superscripts $*M$ and $M$ denote the investor’s optimal model and an alternative model respectively.

25 To save space, the specific priors used in this paper are discussed in Appendix V.

26 If the investor dogmatically believes that the predictability can be captured by a conditional model and that the model can price assets on average, then he will only invest in the benchmark asset and the risk free asset.

4.2 Predictive Variables

I use the dividend yield, term spread, and default spread as the investor’s conditioning information. The predictive ability of these variables has also been documented by, among others, Keim and Stambaugh (1986) and Fama and French (1988, 1989), and they have been used in conditional asset pricing models by Harvey (1989), Ferson and Harvey (1991, 1993, 1999), Ferson and Korajczyk (1995), and He et al. (1996). Fama and French (1989) suggest that the dividend yield and default spread

capture a component in expected returns that is related to long term economic conditions. They

also argue that the term spread captures a component in expected returns that is related to the

business cycle and is less persistent.

The descriptive statistics for these variables are presented in Table I, Panel A. The dividend

yield is calculated as a function of the value-weighted market returns with and without distributions

and is computed as in Fama and French (1988, 1989). The mean of the dividend yield is 3.55%

per annum and the standard deviation is 0.94%. The default spread is the difference between the

yields on BAA grade bonds and AAA bonds and is obtained from the Federal Reserve database.

The mean of the default spread is 0.085% per month and its standard deviation is 0.038%. Finally, the term spread is the difference between the yields on ten-year Government bonds and the one-month Treasury bill. Its mean is 0.12% per month and its standard deviation is 0.12%.

In the empirical analysis, I standardize the predictive variables in order to make the interpre-

tation of the slope coefficients easier. Following Fama and French, I do not use the dividend yield

and default spread together in the same regression because they are highly correlated (0.68).

4.3 Benchmark and Non-Benchmark Portfolios

Starting with Banz (1981), a considerable number of studies have shown that small capitaliza-

tion stocks earn higher returns on average than predicted by the CAPM (e.g., Fama and French,

1992, 1993; Chan et al., 1995), although in recent years they have underperformed the market

index.28 Also, there is evidence that value (high book-to-market) stocks earn higher returns on average than growth (low book-to-market) stocks after controlling for beta risk, size and other firm characteristics (e.g., Fama and French, 1992, 1993; Chan et al., 1995).

27 Although a system could be estimated using a vector autoregression as the one in Appendix III for �� and ��.

28 Among the variables that explain the cross-section of average returns, size (Banz (1981)) and book-to-market ratio (Stattman (1980), Chan et al. (1991)) have emerged as the most relevant ones (e.g., Fama and French (1992)).

From an asset allocation perspective, Pastor (2000) finds that an investor who is not very

confident about the unconditional CAPM should short the size premium (the return on SMB,

small minus large capitalization stocks) over several periods, including the 90’s. On the other

hand, he shows that even an investor with strong beliefs in the unconditional CAPM should invest

a considerable amount in the value premium (the return on HML, high minus low book-to-market

stocks).

Like previous studies, I focus on the size and value anomalies and use the smallest capitalization

and the highest book-to-market (BM) quintiles as the non-benchmark assets. As the benchmark

asset, I use the value-weighted CRSP market index. In addition to the riskfree asset, the investor

is assumed to invest in the market index and a portfolio of small capitalization stocks or a portfolio

of value stocks. I consider these non-benchmark portfolios for two reasons. First, the time series

predictability of the returns on the size and value portfolios presents cross-sectional differences

that could have important implications for asset allocation. And second, there is considerable

evidence suggesting that conditional asset pricing models cannot explain the average returns on

these portfolios, or fully capture the time variation in their expected returns.

To calculate the returns on the size (BM) quintiles, I sort on the basis of size (BM) all NYSE,

AMEX and NASDAQ stocks with market value data on CRSP for the current month (book data

on Compustat for the previous fiscal year).29 Stocks are divided into size (BM) quintile portfolios

using only NYSE stocks to calculate the breakpoints to avoid a disproportionate number of stocks

in some portfolios (e.g., smallest capitalization). The sample is from January 1963 to December

1998 because book data is not available prior to 1963. Descriptive statistics for the excess returns

on the market index, and the highest BM and smallest capitalization quintiles are provided in Table

I, Panel B.

In order to provide some insight into the cross-sectional differences that arise in the time series

predictability of returns, Table I, Panel C presents the maximum likelihood estimates from regressing

the excess returns on the predictive variables. These estimates should be interpreted with caution

given the small-sample problems documented by Stambaugh (1999). The dividend yield predicts

excess returns on the value and size portfolios. A one-standard deviation increase in the dividend

yield predicts, ceteris paribus, a 0.50% and 0.46% increase per month in the excess returns on

the value and size portfolios respectively. The evidence on predictability using the dividend yield

for the market index is insignificant using standard statistical measures. However, as Kandel and

Stambaugh (1996) observe, the economic significance of this predictability is not readily conveyed by standard statistical measures (see next sections). The predictive ability of the term spread is also statistically insignificant using standard statistical measures. Finally, the default spread predicts excess returns on all three portfolios. A one-standard deviation increase in the default spread results in a 0.79% increase in the excess return on the value portfolio, and 0.65% and 0.37% increases in the excess returns on the size portfolio and market index respectively.

29 To ensure that book data is known to investors when computing BM, I do not use book data until six months after the fiscal year end.

5 Posteriors of the Model Parameters

Tables II and III present the posterior means and standard deviations of the model parameters

for the value and size portfolios respectively. The predictive variables in Panels A and C (Panels B

and D) are the dividend yield and term spread (default and term spreads). In Panels A and B,

I assume that betas are either constant or deterministic functions of the predictive variables. In

Panels C and D, I assume that betas are stochastic functions of the predictive variables and, hence,

implicitly allow for uncertainty about the beta dynamics.

As previously documented in the literature, value stocks earn higher returns on average than predicted by the CAPM (i.e., the long run mean of alpha, the constant term in the alpha equation, is positive; see the second column in Table II).30 Roughly half of the average monthly return (around 0.48% per month or 5.9% per annum) cannot be explained by the CAPM. Further, the posterior standard deviations of this parameter are small relative to the means and the prior standard deviations, indicating that the data strongly support the evidence on mispricing. The long run betas (the constant term in the beta equation) are smaller than one, suggesting that value stocks are less risky than the market portfolio proxy, which is also consistent with previous studies (e.g., Fama and French, 1992).

To examine to what extent the predictability evidence is captured by a conditional CAPM, I compare the results in Tables I and II. This comparison is meaningful because I specify diffuse priors given a model and the predictive variables are standardized.31 Starting with Panel A of Table II, I find that the predictive ability of the dividend yield cannot be fully attributed to time variation in risk premia and betas (i.e., the coefficient of alpha on the dividend yield is not equal to zero). Of the 0.50% increase per month in the value portfolio return associated with a one-standard deviation increase in yield (Table I), nearly 40% can be attributed to time variation in the market risk premia (the parameter associated with the dividend yield decreases from 0.499 in Table I to 0.310 in the constant beta, time-varying risk premia model in Table II). A further 10% can be attributed to time variation in betas (the parameter decreases to 0.261 when betas are also allowed to be time-varying). Hence, at most, I can explain roughly 50% of the dividend yield predictive ability using the conditional CAPMs suggested in this paper. The weak predictive ability of the term spread, however, can be explained by a conditional CAPM.

30 Note that since I have standardized the predictive variables, the constant terms in the alpha and beta equations represent the unconditional or long run means of alpha and betas respectively.

31 Under diffuse priors, the means of the posterior distributions of the parameters are the same as the maximum likelihood estimates of the parameters.

Similar findings are reported in Panel B using the default spread as the predictive variable. Of

the 0.79% increase per month in the value portfolio return associated with a one-standard deviation

increase in the default spread (Table I), roughly 45% can be attributed to time variation in the

market risk premia and another 7% to time variation in betas (the parameters decrease from 0.787

in Table I to 0.436 in the constant beta, time varying risk premia model and to 0.378 when betas

are also allowed to vary in Table II).
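The attribution percentages quoted above follow directly from the reported coefficients. The short calculation below reproduces them; the helper name is hypothetical, and the inputs are the figures quoted in the text (Tables I and II). The last line checks the annualization of the 0.48% monthly abnormal return by compounding.

```python
def share_explained(before, after, total):
    """Fraction of the total predictive slope removed when moving
    from one model to the next."""
    return (before - after) / total


# Dividend yield, value portfolio: 0.499 -> 0.310 -> 0.261.
dy_risk_premia = share_explained(0.499, 0.310, 0.499)
dy_betas = share_explained(0.310, 0.261, 0.499)

# Default spread, value portfolio: 0.787 -> 0.436 -> 0.378.
def_risk_premia = share_explained(0.787, 0.436, 0.787)
def_betas = share_explained(0.436, 0.378, 0.787)

# Monthly abnormal return of 0.48%, compounded over twelve months.
annual_alpha = (1 + 0.0048) ** 12 - 1
```

These come to roughly 38% and 10% for the dividend yield, 45% and 7% for the default spread, and about 5.9% per annum for the annualized abnormal return, in line with the rounded figures in the text.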

The inability of the conditional CAPM to capture all the predictive ability of the dividend

yield and default spread persists when betas are modeled as stochastic functions of the variables,

as shown in Panels C and D. The means of the conditional alphas are smaller now (the standard

deviation of the parameters is similar), which suggests lower conditional mispricing. However, much

of the predictability evidence, around 40-45%, remains unexplained by the conditional CAPM. The

long-run alpha is also a bit smaller for the value portfolio when betas are stochastic functions.

Regarding the dynamics of betas, Table II shows that betas vary with both dividend yields

and default spreads. For example, an investor with diffuse priors about whether betas vary with

the default spread will conclude that a one-standard deviation increase in default predicts an

increase in betas of 0.083, if he dogmatically believes that the predictability can be explained

within a conditional CAPM, or of 0.061, if he has diffuse beliefs about the source of predictability.

Furthermore, Panels C and D suggest that betas are stochastic, rather than deterministic, functions

of the predictive variables. The posterior means of the long run standard deviation of the errors in the beta equation, $\sigma^*_\beta$, are around 0.33 (recall that the prior is diffuse), and the means of the autoregressive parameters are around 0.6 (0.2 standard deviation). Although the mean of $\sigma^*_\beta$ can be slightly biased upwards in small samples when the true value of $\sigma^*_\beta$ is really small (around 0.01; see Tamayo, 2001), it is surprisingly large. Thus, the evidence in Panels C and D suggests that much of the time variation in betas cannot be captured by the dividend yield, and default and term spreads.

Turning to the evidence for the size portfolio in Table III, I find that small stocks earn on average higher returns (positive long run mean of alpha) than predicted by the CAPM, which is consistent with prior literature. However, these abnormal returns are smaller than those previously documented (e.g., Fama and French, 1993) as a result of including the 90’s in the sample. Also, the posterior standard deviations of the long run mean of alpha are large compared to their means, suggesting that the sample evidence on average mispricing is weak. The annualized average abnormal returns vary from 0.28% for the static CAPM in Panels A and B, to 1.34% for the conditional CAPM with stochastic betas in Panels C and D. It is interesting to note that models with stochastic betas yield larger average abnormal returns for the size portfolio. This suggests that allowing for heteroskedasticity

in the variance of the size portfolio results in larger mispricing, which is consistent with previous

findings (e.g., Seguin and Schwert, 1990). Also, consistent with previous evidence, the long run

betas are larger than one.

Comparing the results in Tables I and III, I find that a substantial proportion of the size return

predictability seems to be captured by a conditional CAPM. For example, of the 0.65% increase per

month in returns associated with a one-standard deviation increase in default (Table I), roughly

65-70% can be attributed to time variation in the market risk premia (the parameters decrease

from 0.65 in Table I to 0.216 in the constant beta, time varying risk premia model in Table III).

Furthermore, the predictability of the alphas is insignificant using traditional frequentist methods

although, as I show in the next section, their economic significance is non-trivial.

Finally, I find that the betas of the size portfolio are negatively related to the dividend yield and

term spread, which is not what one would expect a priori based on economic intuition. However,

the standard deviations of these parameters are large compared to the means, suggesting that this evidence is weak. The sample evidence also suggests that betas are stochastic functions of the explanatory variables (Panels C and D). The posterior means of the long run residual standard deviation in the beta equation, $\sigma^*_\beta$, are around 0.37, and the means of the autoregressive parameters are around 0.45 (0.18 standard deviation). Thus, much of the time variation in betas is not captured

by the predictive variables used in this paper.

6 Economic Significance of Predictability

To examine the economic significance of return predictability, I compute the (non-normalized)

portfolio weights, Sharpe ratio and differences in certainty equivalent returns (CERs) under different

values of the predictive variables.32 The evidence is presented in Tables IV-XI. The columns labeled “Mean” compute the optimal portfolios when the predictive variables are at their long run means. In the other columns, the predictive variable of interest is assumed to be one-standard deviation above or below its long run mean. To provide a realistic economic scenario, I account for the correlation across the predictive variables by considering how a change in one of the variables affects the other one.33,34 The role that the sample evidence on predictability plays in asset allocation can be analyzed by comparing the optimal portfolio weights and Sharpe ratios along a given row.

32 The tangency portfolio weights are not reported but they can be easily obtained by normalizing the weights of the risky assets. In the tables, I present the non-normalized weights because they provide a clearer picture about the effect of predictability on the amount invested in each asset since they take into account leverage effects.

33 For example, in the column labeled “∆d/p”, I assume that the dividend yield is one-standard deviation above its mean and set the term spread at its expected value conditional on the value of the dividend yield.

34 Mathematically, I compute the Cholesky decomposition of the covariance matrix of the predictive variables, which yields a 2 × 2 upper triangular matrix (because there are two predictive variables in each model specification). Since the variables are standardized, the upper off-diagonal element is equivalent to the slope coefficient from regressing the other predictive variable on the variable of interest.
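The scenario construction described in the footnotes on the Cholesky decomposition can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the correlation of 0.3 between the two standardized predictive variables is a made-up value, not a figure from the paper.

```python
import math


def cholesky_upper_2x2(cov):
    """Upper-triangular U with U'U = cov for a 2x2 positive definite matrix."""
    u11 = math.sqrt(cov[0][0])
    u12 = cov[0][1] / u11
    u22 = math.sqrt(cov[1][1] - u12 ** 2)
    return [[u11, u12], [0.0, u22]]


def scenario(cov, shock=1.0):
    """Predictor values when the first variable is `shock` standard deviations
    from its mean and the second is set to its conditional expectation."""
    u = cholesky_upper_2x2(cov)
    return [shock * u[0][0], shock * u[0][1]]


RHO = 0.3  # hypothetical correlation between the standardized predictors
COV = [[1.0, RHO], [RHO, 1.0]]

U = cholesky_upper_2x2(COV)
up = scenario(COV, shock=1.0)     # variable of interest one s.d. above its mean
down = scenario(COV, shock=-1.0)  # one s.d. below its mean
```

For standardized variables the upper off-diagonal element of `U` equals the correlation, so a one-standard-deviation move in the variable of interest shifts the other variable by `RHO` standard deviations.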

Additional insight into the economic significance of the sample evidence can be obtained by

computing the investor’s expected utility loss, measured in terms of certainty equivalent returns

(CER), if he were to hold a suboptimal portfolio. As discussed in section 3.4, I present two analyses

of CERs. In Tables IV-IX, I examine how much an investor should be compensated, in terms of

CER, if he were to ignore the predictability evidence. I compute the optimal weights assuming

that the value of the predictive variable of interest is one standard deviation above/below its mean.

The suboptimal portfolio weights are calculated at the mean values of the predictive variables.

For example, the column ∆d/p compares the CER for the optimal portfolio, which is computed

assuming that the dividend yield is one standard deviation above its mean, to the CER for a

suboptimal portfolio, which is computed assuming that the dividend yield is at its mean. In Tables

X-XI, I extend the CER analysis by computing the expected utility loss that an investor would

suffer if he were to hold the optimal portfolio of an investor who believes in an alternative model.
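The CER comparison can be illustrated with a small sketch. It assumes a standard mean-variance certainty equivalent, CER(w) = w'μ − (γ/2) w'Σw, with known moments; the definition in section 3.4 is based on the predictive distribution and may differ. All numerical inputs below are hypothetical, not estimates from the paper.

```python
def solve_2x2(cov, mu):
    """Solve cov @ x = mu for a 2x2 system."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return [(d * mu[0] - b * mu[1]) / det, (a * mu[1] - c * mu[0]) / det]

def optimal_weights(mu, cov, gamma):
    """Mean-variance weights w = (1/gamma) * inv(cov) @ mu."""
    return [x / gamma for x in solve_2x2(cov, mu)]

def cer(w, mu, cov, gamma):
    """Certainty equivalent excess return of portfolio w under moments (mu, cov)."""
    mean = w[0] * mu[0] + w[1] * mu[1]
    var = sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))
    return mean - 0.5 * gamma * var

gamma = 2.8
cov = [[0.0020, 0.0017], [0.0017, 0.0023]]  # hypothetical monthly covariance matrix
mu_mean = [0.0055, 0.0100]  # hypothetical excess means, predictor at its mean
mu_high = [0.0075, 0.0150]  # hypothetical excess means, predictor one SD above

w_opt = optimal_weights(mu_high, cov, gamma)  # uses the predictability evidence
w_sub = optimal_weights(mu_mean, cov, gamma)  # ignores it

# Both portfolios are evaluated under the moments the investor believes in.
cer_loss = cer(w_opt, mu_high, cov, gamma) - cer(w_sub, mu_high, cov, gamma)
annualized_loss_pct = 12 * 100 * cer_loss
print(w_opt, w_sub, annualized_loss_pct)
```

Under these assumptions the loss equals (γ/2)(w_opt − w_sub)'Σ(w_opt − w_sub), so it is never negative and grows with the distance between the two portfolios.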

Tables IV-IX are derived under different assumptions about the time variation in expected

returns and covariance matrix. I examine three cases: (i) time variation in expected returns only;

(ii) proportional time variation in market expected return and risk, or constant price of risk case;

(iii) (non-proportional) time variation in market expected return and risk, or time-varying price of

risk case.35 In all the tables the risk aversion parameter is assumed to be 2.8, which is the value

at which the investor would allocate 100% of his funds to the value-weighted index if he were only

to invest in the riskless asset and the market index (i.e., before introducing any additional risky

assets).
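The calibration of the risk aversion parameter can be sketched as follows, assuming a one-period mean-variance investor who ignores parameter uncertainty, so that the market weight is μ/(γσ²). The 0.55% monthly market excess return is the figure quoted later in the text; the monthly volatility is a hypothetical value chosen so that the implied γ is close to 2.8.

```python
def market_weight(mu_m, var_m, gamma):
    """Weight on the market when only the market and the riskless asset are available."""
    return mu_m / (gamma * var_m)

def calibrate_gamma(mu_m, var_m, target_weight=1.0):
    """Risk aversion at which the market weight equals target_weight."""
    return mu_m / (target_weight * var_m)

mu_m = 0.0055      # monthly market excess return (0.55%)
sigma_m = 0.0443   # hypothetical monthly market volatility
gamma = calibrate_gamma(mu_m, sigma_m ** 2)
print(gamma, market_weight(mu_m, sigma_m ** 2, gamma))
```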

6.1 Time Variation in Expected Returns Only

Following previous studies, I first examine the case in which expected returns may be time

varying but the return covariance matrix is constant. The regression model underlying this case is

equivalent to a time-varying alpha, constant beta regression model under the diffuse prior assump-

tion. The results are reported in Tables IV and V. The predictive variables in Panel A (Panel B)

are the dividend yield and term spread (default and term spreads).

Consider first the evidence for the value portfolio in Table IV. The investor’s optimal allocation

is sensitive to the level of the dividend yield and default spread, and, to a lesser extent, term spread.

For example, a one standard deviation increase in the dividend yield above its mean increases the

total allocation to the risky assets from 111% (216.30-104.77) to 156% (355.58-199.42) and the

riskless borrowing from 11% to 56%. Conversely, a one standard deviation decrease in the dividend

yield reduces the total allocation to the risky assets to approximately 66% (75.71-9.18), with the

remaining 33% being invested in the riskless asset. The Sharpe ratio increases (decreases) from

0.21 to 0.33 (0.10) with a one standard deviation increase (decrease) in the dividend yield. Finally, the investor

would have to be compensated by a 3.1% riskless return per annum to ignore the predictive ability

of the dividend yield.

35 Note that the time variation in expected returns only case also corresponds to a time-varying price of risk case.
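The totals quoted for the dividend yield scenarios are simple sums of the two risky asset weights, with the riskless position as the residual. The sketch below reproduces that arithmetic using the weights given in parentheses in the text (value portfolio first, market index second); the function and dictionary names are illustrative.

```python
def total_and_riskless(w_value, w_market):
    """Total risky allocation and riskless position, both in percent of wealth.
    A negative riskless position means borrowing."""
    total = w_value + w_market
    return total, 100.0 - total

scenarios = {
    "mean": (216.30, -104.77),
    "d/p +1 SD": (355.58, -199.42),
    "d/p -1 SD": (75.71, -9.18),
}

results = {name: total_and_riskless(*w) for name, w in scenarios.items()}
for name, (total, riskless) in results.items():
    print(f"{name}: risky {total:.2f}%, riskless {riskless:.2f}%")
```

The totals are roughly 111.5%, 156.2% and 66.5%, which correspond to the rounded figures of 111%, 156% and 66% discussed above.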

The economic significance of the predictability evidence using the default spread is even stronger.

The total allocation to the risky assets increases (decreases) from 113% to 197% (28%) when the

default spread moves by one standard deviation above (below) its mean. Furthermore, the Sharpe

ratio nearly doubles if the default spread increases by one standard deviation (from 0.21 to 0.39),

and it is dramatically reduced (to 0.04) if the default spread decreases by one standard deviation.

The investor would also demand a higher compensation in terms of certainty equivalent returns to

ignore the predictive ability of the default spread; he would require a certainty equivalent return

of approximately 6.70% per annum.

Finally, the economic significance of return predictability using the term spread is considerably

lower. Although the total allocation to the risky assets and the Sharpe ratio are sensitive to the

value of the term spread, the investor would not require a large compensation, in terms of CERs,

to ignore its predictive ability.

So far I have discussed the effect of predictability on the total allocation to risky assets. One

interesting finding in Table IV is how the allocation to risky assets is actually split between the

market index and the value portfolio. When the dividend yield is at its long run mean, the weight

on the value portfolio is 216%, which is partly financed by shorting the market index by 104%.

Recall that the risk aversion parameter is 2.8, which is the value at which the investor would invest

100% of his wealth in the market index if he were to invest only in the index and the riskfree asset.

The introduction of the value portfolio in the investor’s opportunity set results in short positions in

the market index. One would naturally expect a reduction in the market index weights given the

positive alphas of the value portfolio. In this particular case, the CAPM cannot explain nearly 50%

of the average monthly return on the value portfolio (the long run alpha is about 0.48%, nearly

50% of the 1% monthly return - see Table II). The value portfolio mispricing is nearly as large as

the excess return on the market index, 0.55%. Furthermore, the value portfolio is not much riskier

(measured by the standard deviation) than the market index and actually its beta is smaller than

one. As a result, the investor shifts his allocation to the value portfolio when his opportunity set

is expanded.
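The short position in the market index can be understood through the textbook mean-variance solution for a benchmark plus one non-benchmark asset with known parameters: the non-benchmark weight is α/(γσ²_ε) and the market weight is μ_m/(γσ²_m) − β times that weight. The sketch below is an illustration under that assumption and ignores parameter uncertainty. The alpha and market excess return are the figures quoted in the text; `beta` and `resid_var` are hypothetical values, not estimates from the paper.

```python
def two_asset_weights(alpha, beta, resid_var, mu_m, var_m, gamma):
    """Mean-variance weights (market, non-benchmark asset) with known parameters."""
    w_asset = alpha / (gamma * resid_var)
    w_market = mu_m / (gamma * var_m) - beta * w_asset
    return w_market, w_asset

gamma = 2.8
mu_m = 0.0055            # monthly market excess return (0.55%)
var_m = mu_m / gamma     # chosen so the market-only weight is 100%
alpha = 0.0048           # long run monthly alpha of the value portfolio (0.48%)
beta = 0.95              # hypothetical beta, slightly below one
resid_var = 0.0008       # hypothetical monthly residual variance

w_market, w_value = two_asset_weights(alpha, beta, resid_var, mu_m, var_m, gamma)
print(w_market, w_value)
```

With a large alpha relative to residual risk, the weight on the value portfolio exceeds 200%, and the market index is shorted to offset the market exposure that the value position already carries through its beta.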

Increases in the predictive variables lead to even more extreme portfolio allocations. For ex-

ample, if the dividend yield (default spread) increases by one standard deviation above its long

run mean, the value portfolio weight increases from approximately 216% to 355% (407%), and the

market index is shorted an additional 95% (103%). These large changes in the optimal portfolio

composition are a consequence of the large conditional alphas. In particular, a one standard devi-

ation increase in dividend yield (default spread) increases the model mispricing by an additional


0.31% (0.44%) per month (see Table II).

Decreases in the predictive variables affect the optimal portfolio in the opposite direction. The

investor allocates less to the value portfolio and more to the market index. For example, a one

standard deviation decrease in the default spread reduces the value portfolio weight to 30%, the market is only

marginally shorted, and the remaining 72% is invested in the riskless asset.

Turning to the evidence for the size portfolio in Table V, I find that although the evidence on

predictability can be regarded as statistically insignificant using classical statistical methods, it still

plays a role in the portfolio allocation decision. For example, the optimal allocation to risky assets

increases (decreases) from 99% to 164% (34%) when the default spread moves by one standard

deviation above (below) its mean; the Sharpe ratio changes from 0.12 to 0.22 (0.05). Measured in

terms of utility loss, however, the evidence on return predictability is not as strong as for the value

portfolio. In order to ignore the predictability evidence, the investor would have to be compensated

by a riskless return of 1.60% or 2.36% per annum depending on whether the predictive variable is

the dividend yield or default spread. I also find that the investor does not short the market index

any longer. When the predictive variables are at their means, the investor’s allocation to the size

portfolio is very small, around 7%. This is not surprising given the small unconditional alphas and

the large uncertainty surrounding them (see Table III). Conditioning on information, however, the

alphas become larger and, hence, the allocation to the size portfolio increases. For example, a one

standard deviation increase in the dividend yield (default spread) increases the model mispricing by

an additional 0.26% (0.22%) per month and the optimal weight on the size portfolio to 80% (58%).

The investor’s allocation to the market index also increases with the predictive variables since (i) a

large proportion of the size return predictability can be attributed to time variation in risk premia;

and (ii) there is large uncertainty surrounding the conditional alphas. Finally, decreases in the

predictive variables do actually yield short positions in the size portfolio.

In sum, the evidence in Tables IV and V suggests that, assuming that the return covariance

matrix is constant, the predictability evidence plays an important role in the investor’s asset allo-

cation decision. An investor who originally invests 100% of his wealth in the market index reduces

his allocation to the index once his opportunity set is expanded to include a value or size portfolio.

In particular, if he were to choose between a value portfolio and the market index, he would con-

siderably short the index (assuming diffuse priors). In the next sections, I explore what happens

when the portfolio risk also varies with the predictive variables.

6.2 Constant Price of Risk

Tables VI and VII present the portfolio weights, Sharpe ratio and differences in certainty

equivalent returns (CERs) for the constant price of risk case. Table VI reports the evidence for

the value portfolio and Table VII for the size portfolio. As before, the predictive variables in Panels

A and C (Panels B and D) are the dividend yield and term spread (default and term spreads). In

Panels A and B, I assume that betas are either constant or deterministic functions of the predictive

variables; each panel presents four sets of results derived under different assumptions about the

source of predictability (i.e., time variation in risk premia, beta and/or conditional mispricing). In

Panels C and D, I assume that betas are stochastic functions of the predictive variables; each panel

presents two sets of results derived under different assumptions about the conditional alphas.

Overall, I find that the investor’s optimal allocation is very sensitive to the levels of the dividend

yield and default spread, but not the term spread. The economic significance of return predictability,

however, depends on the investor’s beliefs about the source of predictability, namely, time variation

in risk premia and risk and/or model mispricing. The results for the size portfolio are a bit less

extreme than the results for the value portfolio although they convey a similar message. Thus, in

the discussion that follows, I focus mainly on the findings for the value portfolio in Table VI.

The first set of results in Panels A and B assume that the investor allows for predictability and

does not use asset pricing models to guide him in the portfolio allocation decision. As in Tables

IV and V, the regression model underlying this case is equivalent to a time-varying alpha, constant

beta regression model under the diffuse prior assumption. However, the price of risk is assumed

to be constant now. Compared to the results in Table IV, I find that the total allocation to the

risky assets becomes a bit less sensitive to the predictability evidence. This is to be expected given

that the constant price of risk assumption implies that both the expected return on the market

index and the variance move proportionally (hence, a positive expected return effect in the market

weights is offset by a negative variance effect). The more surprising finding is the magnitude of

the results. For example, if the default spread increases by one standard deviation above its long

run mean, the total allocation to risky assets increases from 113% to 124% (instead of 197% as

in Table IV). As before, the value portfolio weight increases from 219% to 407% and the investor

decreases his position in the market from -106% to -283%.36 The short position becomes larger

than in Table IV because increases in the predictive variables increase not only expected returns

but also risk. Finally, despite the constant price of risk assumption, the predictability evidence is

still economically significant. For example, the Sharpe ratio increases from 0.21 to 0.36 with a one

standard deviation increase in the default spread above its mean. The investor would have to be compensated

by a riskless return of 4.80% to ignore this evidence.

36 The allocation to the non-benchmark asset is the same as in Table IV because it depends on the alphas but not on the assumption about the price of risk.

The second and third sets of results in Panels A and B explore the opposite cases: the investor

uses asset pricing models and believes that the return predictability is due to time variation in risk

premia and/or betas (i.e., he does not allow for conditional mispricing). More specifically, in the

second set of results, the investor believes that all the predictability is due to time variation in

risk premia.37 Given the assumption of constant price of risk, in this case, the optimal allocation

does not depend on the value of the predictive variables and the investor is happy to ignore the

predictability evidence. In the third set of results, the investor also allows for time variation in

betas. In this case, the value portfolio weights do not change with the predictive variables but

the market index weights change due to the time variation in betas.38 Thus, the magnitude of

these changes reflects the extent to which the predictability in betas is economically significant. As

shown in Table VI, the predictability in betas is not economically important (although its statistical

significance is large, see Table II). In particular, allowing for time variation in risk premia and betas

yields very similar allocations as allowing for time variation in risk premia only. Furthermore, the

Sharpe ratios are nearly identical and, in both cases, the investor would require a very small

compensation to ignore the predictability evidence.
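The invariance results for the constant price of risk case can be illustrated with the same textbook two-asset solution, assuming known parameters and a constant residual variance. All numbers are hypothetical. The market variance is scaled so that the ratio of the market expected excess return to its variance stays fixed across states.

```python
def two_asset_weights(alpha, beta, resid_var, mu_m, var_m, gamma):
    """Mean-variance weights (market, non-benchmark asset) with known parameters."""
    w_asset = alpha / (gamma * resid_var)
    w_market = mu_m / (gamma * var_m) - beta * w_asset
    return w_market, w_asset

gamma = 2.8
price_of_risk = 2.8   # hypothetical constant ratio mu_m / var_m
alpha = 0.0020        # hypothetical constant alpha (no conditional mispricing)
resid_var = 0.0008    # hypothetical residual variance

# state -> (market expected excess return, beta in the time-varying beta case)
states = {"mean": (0.0055, 0.95), "+1 SD": (0.0075, 1.00)}

constant_beta = {
    s: two_asset_weights(alpha, 0.95, resid_var, mu, mu / price_of_risk, gamma)
    for s, (mu, _) in states.items()
}
varying_beta = {
    s: two_asset_weights(alpha, b, resid_var, mu, mu / price_of_risk, gamma)
    for s, (mu, b) in states.items()
}
print(constant_beta)
print(varying_beta)
```

With constant alpha and beta, both weights are the same in the two states. When beta rises with the predictive variable, the non-benchmark weight is unchanged and only the market weight falls, by the change in beta times the non-benchmark weight.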

The fourth set of results in Panels A and B are derived under the assumption that the investor

believes that the return predictability may not be completely captured by a conditional CAPM; he

holds fairly diffuse priors about the source of predictability and, hence, allows for time variation

in risk premia, betas and alphas. The results in Table VI indicate that the investor increases

his investment in the value portfolio when the dividend yield or default spread increase, which is

(partly) financed by shorting the market index further. Since part of the predictability is captured

by the time-varying betas, the optimal portfolio is a bit less sensitive to the level of the predictive

variables than in the constant beta case (first set of results). For example, the investor increases

his allocation to the value portfolio from 221% to 387% (339%) when the default spread (dividend yield) increases

by one standard deviation above its mean; the Sharpe ratio increases from 0.21 to 0.35 (0.31);

and the investor would have to be compensated, in terms of CER, by 3.74% (1.88%) per annum in

order to ignore the predictability evidence, around 1% less than an investor who does not allow for

time-varying betas.

One interesting finding is that the total allocation to the risky assets can decrease when the

predictive variables increase. For example, if the investor has diffuse priors about the source of

predictability (last set of results), the total allocation to risky assets is reduced from 117% to 111%

(115% to 109%) when the default spread (dividend yield) increases by one standard deviation. This is

driven by the time variation in betas (apart from the assumption of constant price of risk): when the

betas increase with the predictive variables, the covariance between the value portfolio and market

index becomes larger, increasing thereby the portfolio risk. Also, allowing for time-varying betas

introduces further parameter uncertainty into the problem, which makes the predictive residual

variance larger and the investor more reluctant to invest in the asset.

37 This is equivalent to a regression model with constant alpha and beta.

38 Since the price of risk is assumed to be constant, changes in the market index weights are driven by time-varying alphas and betas while changes in the non-benchmark portfolio weights are mainly driven by time variation in alphas (see 23 and 24).

The last two panels of Table VI (Panels C and D) are derived under the assumption that

the investor models the betas as stochastic functions of the predictive variables. As discussed in

section 3.3, modeling betas as stochastic functions introduces further uncertainty into the portfolio

selection problem. Thus, holding the rest of the parameters and the sample constant, the allocation

to the non-benchmark asset should be lower than in the deterministic beta case. In Table III,

however, I find that the distribution of the rest of the parameters depends on the nature of beta.

In particular, for the value portfolio, there is a bit less evidence of CAPM mispricing when beta is

a stochastic function of the predictive variables. This finding reinforces the negative impact on the

value portfolio weights of the increased beta uncertainty. For example, when the default spread is

at its mean, an investor with diffuse priors about the source of return predictability allocates 200%

of his wealth to the value portfolio if betas are stochastic functions (as opposed to 221% if betas

are deterministic functions).

Panels C and D also show that when betas are stochastic functions of the predictive variables, the

economic significance of return predictability is reduced. For example, the maximum compensation,

in terms of CERs, the investor would demand to ignore the predictability evidence is 1.99% (versus

3.78% in Panels A and B). Nonetheless, the predictability evidence impacts the optimal portfolio

composition and the Sharpe ratio. For example, if the investor has diffuse priors about the source

of predictability and the default spread increases by one standard deviation above its mean, the

allocation to the value portfolio increases by an additional 85% (versus 165% in Panel B); the

Sharpe ratio increases from 0.20 to 0.30. Note that although the predictive ability of the dividend

yield would be regarded as insignificant using standard statistical measures, changes in dividend

yield still have a considerable impact on the optimal portfolio weights and Sharpe ratio.

One interesting finding in Panels C and D is that when the investor holds dogmatic priors that

the predictability can be captured by a conditional CAPM (i.e., constant alpha, stochastic beta

model), changes in the predictive variables affect the optimal allocation to the value portfolio.39

For example, when the default spread increases by one standard deviation over its long run mean,

the optimal allocation to the value portfolio (market index) changes from 191% to 154% (-80% to

-53%). Note that, unlike in Panels A and B, increases in the predictive variables actually reduce

the optimal allocation to the value portfolio and increase the allocation to the market index. This

is due to the increase in uncertainty associated with the stochastic nature of beta. Increases in the

predictive variables increase both the market expected return and risk, which, together with the

larger uncertainty in beta, results in larger covariance risk (see A4.15).

Finally, the results in Panels C and D also indicate that the time variation in betas not captured

by the predictive variables is not only economically significant but also larger than the economic

significance of the time variation in betas captured by the variables. For example, a one standard

deviation increase in betas (unrelated to the predictive variables) reduces the allocation to the

market index by an additional 60-65%; the investor would require a compensation of 1.39-1.54%

riskless return per annum to ignore the stochastic nature of betas.

39 Recall from the discussion in Panels A and B that if the betas are deterministic functions of the predictive variables and the alphas are constant, then the optimal allocation to the non-benchmark asset is not sensitive to changes in the predictive variables (given the constant price of risk assumption).

Turning briefly to the evidence for the size portfolio in Table VII, I find that although the

predictability is weaker than for the value portfolio, it still affects the investor’s optimal portfolio. As

before, if the investor does not allow for conditional mispricing (i.e., assumes that alphas are always

constant), the optimal portfolio is insensitive to the predictability evidence, given the constant price

of risk assumption. In contrast, if he allows for conditional mispricing, the optimal portfolio depends

on the levels of the dividend yield, default spread and, to a lesser extent, term spread. For example,

if the investor has fairly diffuse views about the source of return predictability (last set of results

in Panels A and B), his allocation to risky assets decreases from 98% to 87% when the dividend

yield increases by one standard deviation above its mean; the Sharpe ratio increases from 0.12 to

0.17 and the investor would have to be compensated by a riskless return of 1.53% per annum to

ignore the predictability evidence. Note that the investor revises his allocation to the size portfolio

substantially as a result of the changes in conditional alphas, even though these alphas would be

regarded as insignificant using standard statistical measures. Finally, note that increases in the

dividend yield and default spread actually reduce the total allocation to the risky assets (unlike in

Table V, which assumes that the return covariance matrix is constant).

Again, Panels C and D of Table VII report the results when the investor explicitly accounts for

uncertainty about the beta model dynamics by modelling betas as stochastic functions. As shown

in Table IV, when betas are stochastic functions, there is stronger evidence of non-zero average

alphas. As a result, even though the stochastic nature of betas introduces further uncertainty into

the problem, the optimal allocation to the size portfolio is larger than in Panels A and B when

the predictive variables are at their mean values. On the other hand, the economic significance

of the predictability evidence becomes weaker when betas are modeled as stochastic functions. In

particular, the investor would not require a large compensation, in terms of CERs, to ignore the

predictability evidence. Although changes in the predictive variables do lead to changes in the

portfolio composition or the Sharpe ratio, the magnitudes of these changes are not as large as

in Panels A and B. Finally, the time variation in betas not captured by the predictive variables

does not seem to be economically significant either.

Summarizing, the evidence in Tables VI and VII suggests that, assuming that the price of risk

is constant, the predictability evidence plays an important role in the investor’s asset allocation

decision. The economic significance of return predictability, however, depends on the source of

predictability, namely, time variation in risk premia and risk and/or model mispricing. The eco-

nomic significance of the time variation in betas captured by the predictive variables is not large.


The economic significance of the CAPM departures is considerable, especially for the value port-

folio. Finally, allowing betas to be stochastic rather than deterministic functions of the predictive

variables seems economically significant for the value portfolio but not for the size portfolio.

6.3 Time-Varying Price of Risk with Time Variation in Expected Returns and Variances

Tables VIII and IX present the portfolio weights, Sharpe ratio and differences in certainty

equivalent returns (CERs) assuming that the price of risk is time-varying. Unlike in Tables IV and

V, in which the covariance matrix was assumed to be constant (hence, the price of risk was also

time-varying), now I allow for time variation in both the market expected return and risk. Since

the market expected return and risk do not necessarily move proportionally, the price of risk can

be time-varying. Table VIII reports the evidence for the value portfolio and Table IX for the size

portfolio. As before, the predictive variables in Panels A and C (Panels B and D) are the dividend

yield and term spread (default and term spreads). In Panels A and B, betas are either constant or

deterministic functions of the predictive variables; in Panels C and D, they are stochastic functions

of the predictive variables.

I estimate the conditional market variance using the random coefficient model described in

section 3.1, which relates the return volatility to the volatility of the dividend yield, default and

term spreads in the spirit of Schwert (1989). When the predictive variables are the dividend yield

and term spread, the posterior means of the parameters (in %) are [0.490, 0.116, 0.272]’

and the posterior standard deviations are [0.205, 0.230, 0.219]’, for the intercept, dividend

yield and term spread respectively. The means of the residual standard deviations of the random

coefficients are 1.883 and 1.104 for the coefficients associated with the dividend yield and term

spread respectively. When the predictive variables are the default and term spreads, the posterior

means of the parameters are [0.559, 0.351, 0.113]’ and the posterior standard deviations

are [0.207, 0.193, 0.226]’, for the intercept, default and term spreads respectively. The means

of the residual standard deviations of the random coefficients are 0.110 and 1.210 for the coefficients

associated with the default and term spreads.

Overall, the results in Tables VIII and IX reinforce the findings discussed so far: the investor’s

optimal portfolio is sensitive to the levels of the dividend yield, default spread, and, to a lesser

extent, term spread; the evidence is stronger for the value portfolio than for the size portfolio; and

the economic significance depends on the source of predictability. I also find that the economic

significance of the evidence on predictability is slightly larger than in the constant price of risk

case and, when the predictive variables decrease, it is also larger than for the constant covariance

matrix case. There are two main differences between the results in Tables VIII and IX and the

results discussed so far.


First, when the predictive variables decrease, the optimal portfolios consist of smaller/shorter

positions in the market index and the investor requires a larger compensation to ignore the pre-

dictability evidence. In the proposed model for the variance, the conditional variance increases with

changes in the predictive variables regardless of the sign of these changes. Hence, decreases in the

predictive variables predict higher market risk and lower expected returns, which ultimately results

in shorter positions in the market. For example, as shown in Table VIII, Panel B, if the investor

has diffuse priors about the source of predictability (e.g., deterministic alpha and beta case) and

the default spread decreases by one standard deviation, he holds a position of -10% in the market (in Table VI,

Panel B, he invests 53% in the market); the investor has to be compensated by a riskless return of

5.22% per annum to ignore the predictability evidence (versus 3.78% in Table VI).
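The asymmetry for decreases in the predictive variables can be illustrated with a stylized single-predictor random coefficient specification, r = a + (b + u)z + e, in which the conditional mean is a + bz and the conditional variance is σ²_e + σ²_u z². This form is an assumption made for the illustration and is only one specification consistent with the description above; the model in section 3.1 may differ. All parameter values are hypothetical, and the weight shown is for an investor holding only the market and the riskless asset.

```python
def conditional_moments(z, a, b, sigma_e, sigma_u):
    """Conditional mean and variance of the market return given the standardized
    predictor z under r = a + (b + u) * z + e, with u and e independent."""
    mean = a + b * z
    var = sigma_e ** 2 + (sigma_u * z) ** 2
    return mean, var

def market_only_weight(mean, var, gamma):
    """Mean-variance weight on the market with known moments."""
    return mean / (gamma * var)

gamma = 2.8
a, b = 0.0055, 0.0020             # hypothetical intercept and slope
sigma_e, sigma_u = 0.043, 0.015   # hypothetical residual and coefficient volatilities

weights_rc = {}     # variance rises with |z|
weights_const = {}  # constant variance benchmark (sigma_u = 0)
for z in (-1.0, 0.0, 1.0):
    mean, var = conditional_moments(z, a, b, sigma_e, sigma_u)
    weights_rc[z] = market_only_weight(mean, var, gamma)
    mean_c, var_c = conditional_moments(z, a, b, sigma_e, 0.0)
    weights_const[z] = market_only_weight(mean_c, var_c, gamma)

print(weights_rc)
print(weights_const)
```

Because the variance depends on z squared, a one standard deviation decrease lowers the expected return and raises the variance at the same time, so the market weight falls by more than in the constant variance benchmark.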

Second, the optimal portfolio is more sensitive to the level of the term spread now because

changes in the term spread affect the market variance. For example, when the predictive variables

are the dividend yield and term spread (Table VIII, Panel A), an investor who has diffuse beliefs

about the predictability evidence would require a CER of 0.99%-1.72% to ignore the predictive

ability of the term spread.

6.4 Comparison of CERs Across Models

In the previous tables, I have analyzed how much an investor should be compensated, in terms

of CER, if he were to ignore the predictability evidence. Another way to examine the economic

significance of predictability and the source of predictability is to compute how much an investor

should be compensated if he were to hold a portfolio derived under an alternative model. In this

case, the sample is held constant across models and the differences in CERs are driven by differences

across models only.

The analysis is presented in Tables X-XI for the value and size portfolios respectively. Since I

find that the results do not depend much on whether the price of risk is constant or time-varying,

I present the results for the constant price of risk only. I assume that the predictive variables are at

one standard deviation above their long run mean. Again, the results are similar if the predictive

variables are at one standard deviation below their mean.

The results in Tables X and XI reinforce the findings discussed so far. The investor requires a

considerable compensation: (i) if he believes that there may be conditional mispricing (i.e., alphas

may be time-varying) and is forced to hold the portfolio of an investor who dogmatically believes

that a conditional CAPM can explain the predictability in returns (i.e., alphas are always constant),

or vice versa; and (ii) if he is to ignore the stochastic versus deterministic/constant nature of beta.

The investor, however, does not require a large compensation in order to ignore the time variation

in betas captured by the predictive variables. Again, the results for the value portfolio are stronger


than for the size portfolio.

Some of the differences in CER are large, especially for the value portfolio. For example, if an investor does

not allow for conditional mispricing but allows for uncertainty about the beta dynamics, he would

have to be compensated by a riskless return of 11.14% per annum to hold the portfolio of an investor

with opposite beliefs (i.e., predictability cannot be captured by a conditional CAPM and betas are

constant) when the default spread is one standard deviation above its mean. As shown in Tables X

and XI, other differences in CER range from 0% to 9.63%. These large differences in CER suggest

that it is important to incorporate into the portfolio allocation problem the investor’s beliefs about

the source of return predictability.

7 Conclusions

This paper examines how an investor’s optimal allocation across multiple risky assets is affected

by the sample evidence on predictability, and the investor’s beliefs about both predictability and

the ability of a model to capture the predictability and price assets. I present a “data-and-model”

based approach to portfolio selection in which returns may be predictable and this predictability

may be captured by a conditional asset pricing model. I also introduce a general econometric

framework to incorporate uncertainty about the model dynamics into the problem.

Using the dividend yield, default and term spreads as predictive variables, I find that the sample

evidence on predictability plays an important role in an investor’s portfolio allocation decision

across multiple risky assets. The sensitivity of the optimal portfolio to the predictability evidence

depends on the investor’s priors about the source of predictability, namely, model mispricing or

time variation in risk premia and betas. In particular, the optimal portfolio of an investor who

attributes the predictability evidence to time variation in risk premia and betas is rather insensitive

to the level of the predictive variables. Conversely, if the investor allows for time variation in

alphas, the predictability evidence plays a major role in his portfolio allocation decision regardless

of whether he allows betas to change over time or not. The effect of predictability is also important

in terms of the expected utility. For example, an investor who believes that betas are time-varying

but alphas are constant would suffer a considerable loss in expected utility if he were forced to hold

the optimal portfolio of an investor who allows for time variation in alphas. Likewise, an investor

who allows for time variation in alphas would suffer a large utility loss if he were to hold the portfolio of

an investor who believes that alphas are constant.

Finally, I find that it is important to incorporate an investor’s uncertainty about the beta

dynamics into the portfolio selection problem. An investor who allows for model mispricing in betas

would suffer a considerable loss in expected utility if he were to ignore the model misspecification

possibility. Interestingly, the sample evidence on the stochastic nature of beta is economically

more significant than the sample evidence on time variation in betas associated with the predictive

variables used in this paper.

In sum, when examining the portfolio allocation problem in the presence of predictability and

multiple risky assets, it is important to incorporate into the problem: (i) an investor’s belief about

predictability and its source; (ii) his uncertainty about the model dynamics; and

(iii) the model’s pricing abilities. The optimal portfolio differs substantially depending on these

factors.

The framework introduced in this paper could be extended to examine other questions. It could

be applied to multifactor models and other assets. The latter may be particularly interesting given

the cross-sectional differences that arise in the time series predictability of returns. The framework

discussed here could also be extended to allow investors to have multiple period investment horizons.

More informative priors and other forms of conditional volatility could also be explored. Finally,

the out-of-sample performance of portfolios resulting from different beliefs in the models could be

examined.

APPENDIX I:

Bayesian Estimation of the Conditional CAPM with Deterministic $\alpha$ and $\beta$

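The conditional asset pricing model in (7) can be rewritten in vector notation as

$$r = X\theta + \varepsilon, \tag{A1.1}$$

where

• $r = (r_1', \ldots, r_N')'$ is a $TN \times 1$ vector and $r_i$ is a $T \times 1$ vector for $i = 1, \ldots, N$,

• $\theta$ is a $2LN \times 1$ vector, $\theta = [\theta_{\alpha 1}', \theta_{\beta 1}', \ldots, \theta_{\alpha N}', \theta_{\beta N}']'$, and $\theta_{\alpha i}$ and $\theta_{\beta i}$ are $(L \times 1)$ vectors,

• $X$ is a $TN \times 2LN$ matrix,

$$X = \begin{pmatrix} Z_1 & R_m Z_1 & & & \\ & & \ddots & & \\ & & & Z_N & R_m Z_N \end{pmatrix},$$

$Z_i$ is a $T \times L$ matrix of explanatory variables for asset $i = 1, \ldots, N$, and $R_m$ is a $T \times T$ diagonal matrix with diagonal elements $\{r_{m,t+1}\}$ for $t = 1, \ldots, T$,

• and $\varepsilon \sim N(0, \Sigma \otimes I_T)$.

Assuming independent normal and inverted Wishart priors for $\theta$ and $\Sigma$ respectively,

$$\theta \sim N(\bar{\theta}, \bar{V}), \tag{A1.2}$$

$$\Sigma^{-1} \sim W(S^{-1}, \nu), \tag{A1.3}$$

the conditional posterior distributions of $\theta$ and $\Sigma$ are given by:

$$\theta \mid (\Sigma^{-1}, r, X) \sim N(\tilde{\theta}, \tilde{V}), \tag{A1.4}$$

$$\Sigma^{-1} \mid (\theta, r, X) \sim W\big((S + \hat{S})^{-1}, \nu + T\big), \tag{A1.5}$$

where

$$\tilde{V}^{-1} = \bar{V}^{-1} + X'(\Sigma^{-1} \otimes I_T)X, \tag{A1.6}$$

$$\tilde{\theta} = \tilde{V}\left[\bar{V}^{-1}\bar{\theta} + X'(\Sigma^{-1} \otimes I_T)\,r\right], \tag{A1.7}$$

and $\hat{S} = [\hat{s}_{ij}]$ is an $N \times N$ matrix with elements $\hat{s}_{ij} = (r_i - X_i\theta_i)'(r_j - X_j\theta_j)$, where $X_i = [Z_i,\ R_m Z_i]$ and $\theta_i = (\theta_{\alpha i}', \theta_{\beta i}')'$ are the block of $X$ and the subvector of $\theta$ for asset $i$.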
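The draw of $\theta$ in (A1.4), with moments (A1.6) and (A1.7), can be sketched numerically as follows. This is only an illustrative sketch and not the code used in the paper: the function name `draw_theta`, the simulated data and the diffuse prior are assumptions, and the same regressor matrix is used for both hypothetical assets.

```python
import numpy as np

def draw_theta(r, X, Sigma, theta_bar, V_bar, rng):
    """One draw of theta | Sigma for r = X theta + eps, eps ~ N(0, Sigma kron I_T),
    under the prior theta ~ N(theta_bar, V_bar)."""
    N = Sigma.shape[0]
    T = r.shape[0] // N
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))            # Sigma^{-1} kron I_T
    V_bar_inv = np.linalg.inv(V_bar)
    V_post = np.linalg.inv(V_bar_inv + X.T @ W @ X)         # (A1.6)
    V_post = (V_post + V_post.T) / 2                        # remove rounding asymmetry
    theta_post = V_post @ (V_bar_inv @ theta_bar + X.T @ W @ r)   # (A1.7)
    return rng.multivariate_normal(theta_post, V_post), theta_post, V_post

# Hypothetical simulated data: N = 2 assets, T = 300 periods, 2 regressors per asset.
rng = np.random.default_rng(0)
N, T = 2, 300
Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
X = np.kron(np.eye(N), Z)                                   # block-diagonal regressors
theta_true = np.array([0.5, 1.0, -0.3, 0.8])
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
# Stack residuals asset by asset so that their covariance is Sigma kron I_T.
eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T).T.reshape(-1)
r = X @ theta_true + eps

draw, theta_post, V_post = draw_theta(r, X, Sigma, np.zeros(4), 100.0 * np.eye(4), rng)
```

In a full Gibbs sampler this step would alternate with a Wishart draw of $\Sigma^{-1}$ from (A1.5), which is omitted here.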

APPENDIX II:

Application of The Simulation Smoother for Time Series Models of De Jong and

Shephard (1995) to Sample the Latent Betas

In this appendix, I show how to sample the latent betas using the simulation smoother for time

series models of De Jong and Shephard (1995). Following the time series literature, the latent betas

are defined as a stack of state vectors with respect to the state space form, and $p(\beta \mid y)$ is assumed to be Gaussian. This sampling method presents computational advantages, especially for elaborate

models.

Formulation of the State Space Model

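Let $\beta_t = (\beta_{1t}, \ldots, \beta_{Nt})'$, for $t = 1, \ldots, T$, be the vector of the latent betas for the $N$ assets. Given the vector of alphas $\alpha_t$ for $t = 1, \ldots, T$, the conditional CAPM can be rewritten in state space form as:

$$y_t = H_t\beta_{t-1} + \varepsilon_t, \tag{A2.1}$$

$$\beta_t = W_t\theta_\beta + \Phi\beta_{t-1} + u_t, \tag{A2.2}$$

where

• $y_t$ is an $N \times 1$ vector with elements $\{y_{it} = r_{it} - \alpha_{it}\}$, $i = 1, \ldots, N$,

• $H_t = \mathrm{diag}(r_{mt})$ is an $N \times N$ diagonal matrix,

• $W_t = \begin{pmatrix} Z_{1t} - \phi_1 Z_{1,t-1} & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & Z_{Nt} - \phi_N Z_{N,t-1} \end{pmatrix}$ is an $N \times LN$ matrix of explanatory variables for the latent data and $Z_{it}$ is a $1 \times L$ vector of explanatory variables for asset $i = 1, \ldots, N$,

• $\theta_\beta = (\theta_{\beta 1}', \ldots, \theta_{\beta N}')'$ is an $LN \times 1$ vector of parameters where $\theta_{\beta i} = (\theta_{\beta i1}, \ldots, \theta_{\beta iL})'$ are $L \times 1$ vectors for $i = 1, \ldots, N$,

• $\Phi = \mathrm{diag}(\phi_1, \ldots, \phi_N)$ is an $N \times N$ diagonal matrix of autoregressive parameters,

• $\varepsilon_t$ and $u_t$ are independently, identically distributed as $N(0, \Sigma)$ and $N(0, \Sigma_u)$ respectively,

• and the initial conditional beta, $\beta_0$, is given by $\beta_0 = W_0\theta_\beta + u_0$.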

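Equation (A2.1) is usually referred to as the measurement equation and equation (A2.2) as the state or transition equation.

To use the simulation smoother, rewrite the error terms $\varepsilon_t$ and $u_t$ as

$$\varepsilon_t = G_t v_t, \tag{A2.3}$$

$$u_t = J_t v_t, \tag{A2.4}$$

where

• $v_t \sim N(0, I)$ is an $(N + N) \times 1$ vector of innovations,

• $G_t = G = (\Sigma^{1/2}\ \ 0)$ is an $N \times (N + N)$ matrix, where $0$ is an $N \times N$ matrix of zeros,

• and $J_t = J = (0\ \ \Sigma_u^{1/2})$ is an $N \times (N + N)$ matrix, where $0$ is an $N \times N$ matrix of zeros.

Here $\Sigma^{1/2}$ and $\Sigma_u^{1/2}$ denote matrix square roots, so that $GG' = \Sigma$ and $JJ' = \Sigma_u$.

The state space model can be rewritten as

$$y_t = H_t\beta_{t-1} + Gv_t, \tag{A2.5}$$

$$\beta_t = W_t\theta_\beta + \Phi\beta_{t-1} + Jv_t. \tag{A2.6}$$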

The Simulation Smoother40

The simulation smoother draws the state disturbances in two steps.

Step I: Kalman filter: for $t = 1, \ldots, T$ run:

$$e_t = y_t - H_t a_t, \tag{A2.7}$$

$$D_t = H_t P_t H_t' + GG', \tag{A2.8}$$

$$K_t = (\Phi P_t H_t') D_t^{-1}, \tag{A2.9}$$

$$L_t = \Phi - K_t H_t, \tag{A2.10}$$

$$a_{t+1} = W_t\theta_\beta + \Phi a_t + K_t e_t, \tag{A2.11}$$

$$P_{t+1} = \Phi P_t L_t' + JJ', \tag{A2.12}$$

where $a_1 = W_0\theta_\beta$ and $P_1 = J_0 J_0'$ are the initial values for the filter. On this Kalman filter pass, the quantities $e_t$, $D_t$ and $K_t$ are stored. The equations A2.7-A2.12 can be interpreted as follows: A2.7 is the innovation equation; A2.8 is the innovation variance; A2.9 is the Kalman gain; A2.11 is the updating equation; A2.12 is the mean squared error of the prediction.

40 Note that the notation used for the simulation smoother applies to this section only.

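Step II: Backward recursion: starting from $r_T = 0$ and $U_T = 0$, for $t = T, \ldots, 1$ run:

$$C_t = JJ' - JJ'U_tJJ', \tag{A2.13}$$

$$\kappa_t \sim N(0, C_t), \tag{A2.14}$$

$$V_t = JJ'U_tL_t, \tag{A2.15}$$

$$\eta_t = JJ'r_t + \kappa_t, \tag{A2.16}$$

$$r_{t-1} = H_t'D_t^{-1}e_t + L_t'r_t - V_t'C_t^{-1}\kappa_t, \tag{A2.17}$$

$$U_{t-1} = H_t'D_t^{-1}H_t + L_t'U_tL_t + V_t'C_t^{-1}V_t, \tag{A2.18}$$

and store $\eta_t$. The $N \times 1$ vector $\eta_t$ is a draw from $p(Jv_t \mid y)$. The simulated $\beta_t$ is obtained recursively from

$$\beta_t = W_t\theta_\beta + \Phi\beta_{t-1} + \eta_t, \tag{A2.19}$$

where $\beta_0 = W_0\theta_\beta + \eta_0$.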
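The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: the function name and arguments are hypothetical, `c[t]` stands for the deterministic term $W_t\theta_\beta$, `GG` and `JJ` stand for $GG'$ and $JJ'$, the time index is shifted so that `y[t]` loads on `beta[t]`, and the initial disturbance $\eta_0$ is drawn from the quantities left over at the end of the backward pass, as in De Jong and Shephard (1995).

```python
import numpy as np

def simulation_smoother(y, H, c, c0, Phi, GG, JJ, P1, rng):
    """Draw beta[0..T] given y for the (index-shifted) model
         y[t]      = H[t] beta[t] + eps[t],        eps ~ N(0, GG)
         beta[t+1] = c[t] + Phi beta[t] + eta[t],  eta ~ N(0, JJ)
         beta[0]   = c0 + eta0,                    eta0 ~ N(0, P1)."""
    T, N = y.shape
    a, P = c0.copy(), P1.copy()
    e, Dinv, L = [], [], []
    for t in range(T):                                   # Step I: Kalman filter
        e_t = y[t] - H[t] @ a                            # (A2.7)
        D_t = H[t] @ P @ H[t].T + GG                     # (A2.8)
        Dinv_t = np.linalg.inv(D_t)
        K_t = Phi @ P @ H[t].T @ Dinv_t                  # (A2.9)
        L_t = Phi - K_t @ H[t]                           # (A2.10)
        a = c[t] + Phi @ a + K_t @ e_t                   # (A2.11)
        P = Phi @ P @ L_t.T + JJ                         # (A2.12)
        e.append(e_t); Dinv.append(Dinv_t); L.append(L_t)
    r, U = np.zeros(N), np.zeros((N, N))
    eta = np.zeros((T, N))
    for t in reversed(range(T)):                         # Step II: backward recursion
        C = JJ - JJ @ U @ JJ                             # (A2.13)
        kappa = rng.multivariate_normal(np.zeros(N), (C + C.T) / 2)   # (A2.14)
        V = JJ @ U @ L[t]                                # (A2.15)
        eta[t] = JJ @ r + kappa                          # (A2.16)
        r_prev = (H[t].T @ Dinv[t] @ e[t] + L[t].T @ r
                  - V.T @ np.linalg.solve(C, kappa))     # (A2.17)
        U = (H[t].T @ Dinv[t] @ H[t] + L[t].T @ U @ L[t]
             + V.T @ np.linalg.solve(C, V))              # (A2.18)
        r = r_prev
    C0 = P1 - P1 @ U @ P1                                # initial disturbance
    eta0 = P1 @ r + rng.multivariate_normal(np.zeros(N), (C0 + C0.T) / 2)
    beta = np.zeros((T + 1, N))
    beta[0] = c0 + eta0
    for t in range(T):                                   # (A2.19)
        beta[t + 1] = c[t] + Phi @ beta[t] + eta[t]
    return beta

# Hypothetical check with one asset and almost noiseless observations (H = 1),
# so that a correct draw must track the simulated latent path closely.
rng = np.random.default_rng(1)
T, N = 50, 1
Phi, JJ, GG = np.array([[0.9]]), np.array([[0.04]]), np.array([[1e-8]])
P1 = JJ / (1 - 0.9**2)
c0, c, H = np.array([1.0]), np.full((T, N), 0.1), np.ones((T, N, N))
beta_true = np.zeros((T + 1, N))
beta_true[0] = c0 + rng.normal(0, np.sqrt(P1[0, 0]), N)
for t in range(T):
    beta_true[t + 1] = c[t] + Phi @ beta_true[t] + rng.normal(0, 0.2, N)
y = beta_true[:T] + rng.normal(0, 1e-4, (T, N))
beta_draw = simulation_smoother(y, H, c, c0, Phi, GG, JJ, P1, rng)
```

The last element `beta_draw[T]` has no observation attached to it, so its disturbance is drawn from the unconditional $N(0, JJ')$.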

APPENDIX III:

Derivation of the Conditional Posterior Distributions of the Parameters in the Beta

Equation

This appendix derives the conditional posterior distributions of the parameters in the beta

equation. At each step, the posterior distribution of the parameter of interest is conditioned on

fixed values of all the other parameters (previously simulated).

Prior Distributions

In the derivation of the posterior distributions, I assume independent priors for $\theta_\beta$, $\phi$ and $\Sigma_u^*$, the long run covariance matrix of the residuals, which is given by

$$\Sigma_u^* = \Phi\Sigma_u^*\Phi + \Sigma_u. \tag{A3.1}$$

In particular, I specify a normal distribution as the prior distribution of $\theta_\beta$, a normal distribution truncated to the stationary region as the prior for $\phi$ and an inverse Wishart distribution as the prior for $\Sigma_u^*$:

$$\theta_\beta \sim N(\bar{\theta}_\beta, \bar{V}_\theta), \tag{A3.2}$$

$$\phi \sim N(\bar{\phi}, \bar{V}_\phi)\,\mathbf{1}\{\phi \in (-1, 1)^N\}, \tag{A3.3}$$

$$\Sigma_u^{*-1} \sim W(S^{-1}, \nu). \tag{A3.4}$$

The prior of $\Sigma_u$ is implied by the prior distribution of $\Sigma_u^*$. However, given the prior of $\Sigma_u^*$, the prior distribution of $\Sigma_u$ turns out not to be a familiar distribution. As I discuss later in this appendix, this poses no problem because we can use the Metropolis-Hastings algorithm to draw from the posterior of $\Sigma_u$.
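For given $\Phi$ and $\Sigma_u$, the long run covariance in (A3.1) is the solution of a linear system. The sketch below solves it through the vec operator; the function name and the numerical inputs are hypothetical and only meant to illustrate the mapping between $\Sigma_u$ and $\Sigma_u^*$.

```python
import numpy as np

def long_run_cov(Phi, Sigma_u):
    """Solve Sigma_star = Phi Sigma_star Phi' + Sigma_u.

    Uses vec(Phi S Phi') = (Phi kron Phi) vec(S), so that
    vec(Sigma_star) = (I - Phi kron Phi)^{-1} vec(Sigma_u)."""
    N = Phi.shape[0]
    A = np.eye(N * N) - np.kron(Phi, Phi)
    return np.linalg.solve(A, Sigma_u.reshape(-1)).reshape(N, N)

# Hypothetical two-asset example with diagonal Phi.
Phi = np.diag([0.6, 0.9])
Sigma_u = np.array([[0.04, 0.01], [0.01, 0.09]])
Sigma_star = long_run_cov(Phi, Sigma_u)

# With one asset the formula collapses to sigma_u^2 / (1 - phi^2).
scalar_star = long_run_cov(np.array([[0.5]]), np.array([[0.03]]))
```

With a diagonal $\Phi$ each element is simply $\Sigma_{u,ij}/(1 - \phi_i\phi_j)$, which is why the stationarity restriction $|\phi_i| < 1$ is needed for $\Sigma_u^*$ to exist.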

Conditional Posteriors of the $\theta_\beta$ Parameter Vector and the Innovations Covariance Matrix $\Sigma_u$

In order to derive the conditional posterior densities of the parameters $\theta_\beta$ and $\Sigma_u$, write the equations for beta as a system of equations

$$\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_N \end{pmatrix} = \begin{pmatrix} Z_1 - \phi_1 Z_{1,-1} & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & Z_N - \phi_N Z_{N,-1} \end{pmatrix} \begin{pmatrix} \theta_{\beta 1} \\ \vdots \\ \theta_{\beta N} \end{pmatrix} + \begin{pmatrix} \phi_1\beta_{1,-1} \\ \vdots \\ \phi_N\beta_{N,-1} \end{pmatrix} + \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}, \tag{A3.5}$$

where, for assets $i = 1, \ldots, N$, $\beta_i$ is a $T \times 1$ vector and $\beta_{i,-1}$ is a $T \times 1$ vector of betas lagged one period; $Z_i$ is a $T \times L$ matrix of explanatory variables and $Z_{i,-1}$ is a $T \times L$ matrix of explanatory variables lagged one period; $\theta_{\beta i}$ is an $L \times 1$ vector of parameters; and the $\phi_i$'s are (scalar) autoregressive coefficients.

Conditional on the values of $\phi$, the system in (A3.5) can be expressed as

$$Y = B - (\Phi \otimes I_T)B_{-1} = Z^*\theta_\beta + u, \tag{A3.6}$$

where $B = (\beta_1', \ldots, \beta_N')'$ is a $TN \times 1$ vector of the latent betas; $Z^*$ is a $TN \times LN$ sparse matrix of “new” explanatory variables with elements $\{Z_i - \phi_i Z_{i,-1}\}$; $\theta_\beta = (\theta_{\beta 1}', \ldots, \theta_{\beta N}')'$ is an $LN \times 1$ vector of parameters; $\Phi = \mathrm{diag}(\phi)$ is an $N \times N$ diagonal matrix of the autoregressive parameters and $\phi = (\phi_1, \ldots, \phi_N)'$ is an $N \times 1$ vector of autoregressive coefficients; and $u = (u_1', \ldots, u_N')'$ is a $TN \times 1$ vector of residuals, which are independently, identically distributed as $u \sim N(0, \Sigma_u \otimes I_T)$.

To derive the conditional posteriors of interest, divide the $T$ observations into the initial observation, $t = 0$, and the rest of the observations, $t = 1, \ldots, T - 1$. Then the model in (A3.6) can be rewritten as follows.

For $t = 0$, let $Y_0$ be an $N \times 1$ vector and $Z_0^*$ the corresponding $N \times LN$ matrix of explanatory variables. Since $B_{-1} = 0$, $Y_0$ equals the conditional betas at time $t = 0$, $B_0$. Hence at $t = 0$, the model is given by

$$Y_0 = B_0 = Z_0^*\theta_\beta + u_0, \tag{A3.7}$$

where $u_0 \sim N(0, \Sigma_u^*)$ and $\Sigma_u^* = \Phi\Sigma_u^*\Phi + \Sigma_u$.

For the rest of the observations, $t > 0$, let $Y_r = (B_r - (\Phi \otimes I_{T-1})B_{-1,r})$ be a $(T-1)N \times 1$ vector and $Z_r^*$ the corresponding $(T-1)N \times LN$ matrix of explanatory variables.41 For $t > 0$, the system is

$$Y_r = Z_r^*\theta_\beta + u_r, \tag{A3.8}$$

where $u_r \sim N(0, \Sigma_u \otimes I_{T-1})$.

41 The subscript $r$ denotes the rest of the observations. Notice that the matrix $Z$ is lagged one period.

Likelihood

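The likelihood function is proportional to:

$$L(Y \mid \theta_\beta, \phi, \Sigma_u) \propto |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}(Y_0 - Z_0^*\theta_\beta)'\Sigma_u^{*-1}(Y_0 - Z_0^*\theta_\beta)\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}(Y_r - Z_r^*\theta_\beta)'(\Sigma_u^{-1} \otimes I_{T-1})(Y_r - Z_r^*\theta_\beta)\right\}. \tag{A3.9}$$

Conditional Posterior Distribution of $\theta_\beta$

Define:

$$\hat{\theta}_0 = \left(Z_0^{*\prime}\Sigma_u^{*-1}Z_0^*\right)^{-1}Z_0^{*\prime}\Sigma_u^{*-1}Y_0,$$

$$\hat{\theta}_r = \left(Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Z_r^*\right)^{-1}Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Y_r,$$

where $\hat{\theta}_0$ and $\hat{\theta}_r$ are the estimators of $\theta_0$ and $\theta_r$. As a function of $\hat{\theta}_0$ and $\hat{\theta}_r$ the likelihood function is proportional to

$$L(Y \mid \theta_\beta, \phi, \Sigma_u) \propto |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}(Y_0 - Z_0^*\hat{\theta}_0)'\Sigma_u^{*-1}(Y_0 - Z_0^*\hat{\theta}_0) - \tfrac{1}{2}(\theta_0 - \hat{\theta}_0)'Z_0^{*\prime}\Sigma_u^{*-1}Z_0^*(\theta_0 - \hat{\theta}_0)\right\}$$

$$\cdot\ |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}(Y_r - Z_r^*\hat{\theta}_r)'(\Sigma_u^{-1} \otimes I_{T-1})(Y_r - Z_r^*\hat{\theta}_r) - \tfrac{1}{2}(\theta_r - \hat{\theta}_r)'Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Z_r^*(\theta_r - \hat{\theta}_r)\right\}. \tag{A3.10}$$

Combining the likelihood in (A3.10) and the prior in (A3.2), the density kernel of the posterior distribution of $\theta_\beta$ is given by

$$\exp\left\{-\tfrac{1}{2}(\theta_0 - \hat{\theta}_0)'Z_0^{*\prime}\Sigma_u^{*-1}Z_0^*(\theta_0 - \hat{\theta}_0) - \tfrac{1}{2}(\theta_r - \hat{\theta}_r)'Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Z_r^*(\theta_r - \hat{\theta}_r) - \tfrac{1}{2}(\theta_\beta - \bar{\theta}_\beta)'\bar{V}_\theta^{-1}(\theta_\beta - \bar{\theta}_\beta)\right\}. \tag{A3.11}$$

Therefore, the posterior distribution of $\theta_\beta$ is given by $\theta_\beta \sim N(\tilde{\theta}_\beta, \tilde{V}_\theta)$, where

$$\tilde{V}_\theta^{-1} = \bar{V}_\theta^{-1} + Z_0^{*\prime}\Sigma_u^{*-1}Z_0^* + Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Z_r^*, \tag{A3.12}$$

$$\tilde{\theta}_\beta = \tilde{V}_\theta\left(\bar{V}_\theta^{-1}\bar{\theta}_\beta + Z_0^{*\prime}\Sigma_u^{*-1}Y_0 + Z_r^{*\prime}(\Sigma_u^{-1} \otimes I_{T-1})Y_r\right). \tag{A3.13}$$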

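Conditional Posterior Distribution of $\Sigma_u$

Let $S_0 = [s_{0,ij}]$ and $S_r = [s_{r,ij}]$ with elements $s_{ij} = (Y_i - Z_i^*\theta_{\beta i})'(Y_j - Z_j^*\theta_{\beta j})$ for $t = 0$ and $t = 1, \ldots, T - 1$ respectively. As a function of $S_0$ and $S_r$ the likelihood function is proportional to

$$L(Y \mid \theta_\beta, \phi, \Sigma_u) \propto |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{*-1}S_0)\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{-1}S_r)\right\}. \tag{A3.14}$$

Combining the likelihood in (A3.14) with the prior in (A3.4), the density kernel of the posterior distribution of $\Sigma_u$ is given by

$$|\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{*-1}S_0)\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{-1}S_r)\right\} \cdot |\Sigma_u^*|^{-(\nu+N+1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{*-1}S)\right\}$$

$$= g(\Sigma_u) \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{-1}S_r)\right\}, \tag{A3.15}$$

where

$$g(\Sigma_u) = |\Sigma_u^*|^{-(\nu+N+2)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\big(\Sigma_u^{*-1}(S_0 + S)\big)\right\}.$$

Since the exact form of the distribution of $\Sigma_u$ is unknown, I use the Metropolis-Hastings algorithm to sample from a density whose kernel is given by (A3.15). In particular, I use the inverse Wishart distribution with parameters $S_r$ and $T - 1$ as a proposal distribution in the Metropolis-Hastings algorithm. Therefore, given a “current” value of $\Sigma_u$ and corresponding $\Sigma_u^*$, I sample a candidate value $\Sigma_u^c$ from $W^{-1}(S_r, T - 1)$ and accept it with probability

$$\min\left\{1, \frac{g(\Sigma_u^c)}{g(\Sigma_u)}\right\},$$

where $\Sigma_u^{*c} = \Phi\Sigma_u^{*c}\Phi + \Sigma_u^c$.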
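For a single asset ($N = 1$) this accept/reject step can be sketched as an independence Metropolis-Hastings chain. The code below is an illustration under stated assumptions, not the sampler used in the paper: all names and numerical inputs are hypothetical, and the proposal is parameterized as an inverse gamma whose kernel equals the second factor of (A3.15) exactly, so that the acceptance ratio reduces to $g(\sigma_u^{2,c})/g(\sigma_u^2)$.

```python
import numpy as np

def log_g(sig2, phi, S0, S_prior, nu):
    """log g(.) from (A3.15) with N = 1; sig2_star is the long run variance."""
    sig2_star = sig2 / (1.0 - phi**2)
    return -0.5 * (nu + 3) * np.log(sig2_star) - 0.5 * (S0 + S_prior) / sig2_star

def mh_sigma2(n_draws, Sr, T, phi, S0, S_prior, nu, rng):
    """Independence M-H for sigma_u^2 with an inverse gamma proposal whose kernel is
    (sig2)^{-(T-1)/2} exp(-Sr / (2 sig2))."""
    shape, scale = (T - 3) / 2.0, Sr / 2.0
    cur = scale / (shape - 1.0)                      # start at the proposal mean
    draws, accepted = np.empty(n_draws), 0
    for k in range(n_draws):
        cand = 1.0 / rng.gamma(shape, 1.0 / scale)   # inverse gamma draw
        log_ratio = log_g(cand, phi, S0, S_prior, nu) - log_g(cur, phi, S0, S_prior, nu)
        if np.log(rng.uniform()) < log_ratio:        # accept with prob min{1, g(c)/g(cur)}
            cur, accepted = cand, accepted + 1
        draws[k] = cur
    return draws, accepted / n_draws

# Hypothetical inputs: T = 500 periods, residual sum of squares Sr = 20.
rng = np.random.default_rng(2)
sig2_draws, acc_rate = mh_sigma2(5000, Sr=20.0, T=500, phi=0.6,
                                 S0=0.05, S_prior=0.05, nu=1, rng=rng)
```

Because the proposal already carries almost all of the sample information, $g(\cdot)$ varies little across candidates and the acceptance rate in this example is high.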

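Conditional Posteriors of the $\phi$ Vector of Autoregressive Parameters

Conditional on $\theta_\beta$ and $\Sigma_u$, rewrite (A3.5) as

$$\begin{pmatrix} \beta_1 - Z_1\theta_{\beta 1} \\ \vdots \\ \beta_N - Z_N\theta_{\beta N} \end{pmatrix} = H\begin{pmatrix} \phi_1 \\ \vdots \\ \phi_N \end{pmatrix} + \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}, \tag{A3.16}$$

where

• $$H = \begin{pmatrix} \beta_{1,-1} - Z_{1,-1}\theta_{\beta 1} & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \beta_{N,-1} - Z_{N,-1}\theta_{\beta N} \end{pmatrix}, \tag{A3.17}$$

• and $u \sim N(0, \Sigma_u \otimes I_T)$ and $\Sigma_u$ is the covariance matrix of the residuals.

For simplicity express (A3.16) as

$$Y = H\phi + u,$$

and $Y = [(\beta_1 - Z_1\theta_{\beta 1})', \ldots, (\beta_N - Z_N\theta_{\beta N})']'$.

As before, divide the $T$ observations into the initial observation, $t = 0$, and the rest of the observations, $t = 1, \ldots, T - 1$, and define $Y_0$, $Y_r$, $H_0$ and $H_r$ accordingly.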

Likelihood

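The likelihood function is proportional to

$$L(Y \mid H) \propto |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}Y_0'\Sigma_u^{*-1}Y_0\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}(Y_r - H_r\phi)'(\Sigma_u^{-1} \otimes I_{T-1})(Y_r - H_r\phi)\right\}. \tag{A3.18}$$

Define

$$\hat{\phi}_r = \left(H_r'(\Sigma_u^{-1} \otimes I_{T-1})H_r\right)^{-1}H_r'(\Sigma_u^{-1} \otimes I_{T-1})Y_r, \tag{A3.19}$$

and rewrite the likelihood as proportional to

$$L(Y \mid H) \propto |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}Y_0'\Sigma_u^{*-1}Y_0\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}(Y_r - H_r\hat{\phi}_r)'(\Sigma_u^{-1} \otimes I_{T-1})(Y_r - H_r\hat{\phi}_r) - \tfrac{1}{2}(\phi_r - \hat{\phi}_r)'H_r'(\Sigma_u^{-1} \otimes I_{T-1})H_r(\phi_r - \hat{\phi}_r)\right\}. \tag{A3.20}$$

Combining the likelihood in (A3.20) with the prior in (A3.3), the density kernel of the posterior distribution of $\phi$ is given by42

$$|\bar{V}_\phi|^{-1/2}\exp\left\{-\tfrac{1}{2}(\phi - \bar{\phi})'\bar{V}_\phi^{-1}(\phi - \bar{\phi})\right\} \cdot |\Sigma_u^*|^{-(\nu+N+1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_u^{*-1}S)\right\}$$

$$\cdot\ |\Sigma_u^*|^{-1/2}\exp\left\{-\tfrac{1}{2}Y_0'\Sigma_u^{*-1}Y_0\right\} \cdot |\Sigma_u|^{-(T-1)/2}\exp\left\{-\tfrac{1}{2}(\phi_r - \hat{\phi}_r)'H_r'(\Sigma_u^{-1} \otimes I_{T-1})H_r(\phi_r - \hat{\phi}_r)\right\}, \tag{A3.21}$$

which is proportional to

$$c(\phi) \cdot \exp\left\{-\tfrac{1}{2}\left[(\phi_r - \hat{\phi}_r)'H_r'(\Sigma_u^{-1} \otimes I_{T-1})H_r(\phi_r - \hat{\phi}_r) + (\phi - \bar{\phi})'\bar{V}_\phi^{-1}(\phi - \bar{\phi})\right]\right\},$$

where

$$c(\phi) = |\Sigma_u^*|^{-(\nu+N+2)/2} \cdot \exp\left\{-\tfrac{1}{2}\left[\mathrm{tr}(\Sigma_u^{*-1}S) + Y_0'\Sigma_u^{*-1}Y_0\right]\right\}.$$

42 Note that the posterior distribution of $\phi$ is also influenced by the prior of $\Sigma_u^*$.

Let

$$\tilde{V}_\phi^{-1} = \bar{V}_\phi^{-1} + H_r'(\Sigma_u^{-1} \otimes I_{T-1})H_r, \tag{A3.22}$$

$$\tilde{\phi} = \tilde{V}_\phi\left[\bar{V}_\phi^{-1}\bar{\phi} + H_r'(\Sigma_u^{-1} \otimes I_{T-1})Y_r\right]. \tag{A3.23}$$

The full conditional posterior distribution of $\phi$ is the multivariate normal $N(\phi \mid \tilde{\phi}, \tilde{V}_\phi)$ truncated to the $(-1, 1)$ region and multiplied by the factor $c(\phi)$. I sample from this distribution using a Metropolis-Hastings algorithm that takes the truncated multivariate normal component as a proposal distribution. Therefore, given a “current” value of $\phi$ and corresponding $\Sigma_u^*$, I sample a candidate value $\phi^c$ from the truncated normal and accept it with probability

$$\min\left\{1, \frac{c(\phi^c)}{c(\phi)}\right\},$$

where $\Sigma_u^{*c} = \Phi^c\Sigma_u^{*c}\Phi^c + \Sigma_u$.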
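A scalar ($N = 1$) sketch of this step is given below. It is illustrative only: the names, the rejection sampler for the truncated normal and the numerical inputs are assumptions, with `phi_tilde` and `sd_tilde` standing in for the moments in (A3.22) and (A3.23).

```python
import numpy as np

def draw_truncated_normal(mean, sd, rng, lo=-1.0, hi=1.0):
    """Rejection sampling from N(mean, sd^2) restricted to (lo, hi)."""
    while True:
        x = rng.normal(mean, sd)
        if lo < x < hi:
            return x

def log_c(phi, sig2_u, S_prior, y0, nu):
    """log c(phi) with N = 1; phi enters through the long run variance."""
    sig2_star = sig2_u / (1.0 - phi**2)
    return -0.5 * (nu + 3) * np.log(sig2_star) - 0.5 * (S_prior + y0**2) / sig2_star

def mh_phi(n_draws, phi_tilde, sd_tilde, sig2_u, S_prior, y0, nu, rng):
    """Independence M-H: truncated normal proposal, acceptance ratio c(cand) / c(cur)."""
    cur = draw_truncated_normal(phi_tilde, sd_tilde, rng)
    draws, accepted = np.empty(n_draws), 0
    for k in range(n_draws):
        cand = draw_truncated_normal(phi_tilde, sd_tilde, rng)
        log_ratio = (log_c(cand, sig2_u, S_prior, y0, nu)
                     - log_c(cur, sig2_u, S_prior, y0, nu))
        if np.log(rng.uniform()) < log_ratio:
            cur, accepted = cand, accepted + 1
        draws[k] = cur
    return draws, accepted / n_draws

# Hypothetical inputs for illustration.
rng = np.random.default_rng(3)
phi_draws, phi_acc = mh_phi(2000, phi_tilde=0.9, sd_tilde=0.1, sig2_u=0.04,
                            S_prior=0.05, y0=0.1, nu=1, rng=rng)
```

Rejection sampling is adequate here because most of the proposal mass lies inside $(-1, 1)$; a proposal centred far outside the stationary region would call for a dedicated truncated normal sampler.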

APPENDIX IV:

Moments of the Predictive Density when N=1, K=1

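The conditional moments

$$\tilde{E} = [\tilde{E}_1\ \ \tilde{E}_2]' = [E(r_{t+1} \mid \cdot)\ \ E(r_{m,t+1} \mid \cdot)]', \tag{A4.1}$$

$$\tilde{V} = \begin{bmatrix} \tilde{V}_1 & \tilde{V}_{12} \\ \tilde{V}_{12} & \tilde{V}_2 \end{bmatrix} = \begin{bmatrix} \mathrm{var}(r_{t+1} \mid \cdot) & \mathrm{cov}(r_{t+1}, r_{m,t+1} \mid \cdot) \\ \mathrm{cov}(r_{t+1}, r_{m,t+1} \mid \cdot) & \mathrm{var}(r_{m,t+1} \mid \cdot) \end{bmatrix} \tag{A4.2}$$

can be obtained from the posterior distribution of the parameters. Let the label $\tilde{(\cdot)}$ denote the posterior mean of the parameter of interest. The conditional moments are given by

$$\tilde{E}_1 = \tilde{\alpha} + \tilde{E}_2\tilde{\beta}, \tag{A4.3}$$

$$\tilde{E}_2 = Z\tilde{\theta}_m, \tag{A4.4}$$

$$\tilde{V}_1 = [1, \tilde{E}_2]\,\mathrm{cov}(\alpha, \beta \mid \cdot)\,[1, \tilde{E}_2]' + \tilde{\beta}^2\tilde{V}_2 + \mathrm{var}(\beta \mid \cdot)\tilde{V}_2 + E(\sigma_\varepsilon^2 \mid \cdot), \tag{A4.5}$$

$$\tilde{V}_2 = Z\,\mathrm{cov}(\theta_m \mid \cdot)\,Z' + E(\sigma_m^2 \mid \cdot), \tag{A4.6}$$

$$\tilde{V}_{12} = \tilde{\beta}\tilde{V}_2, \tag{A4.7}$$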

where

=� = �

=�� (A4.8)

=�� = (� −

=" �−1)

=� +

="

=��−1 (A4.9)

#�(��|�) = �()#(�� |�)0� (A4.10)

#�(��|�) = �()#(� � � |�)0� +�(�∗2� |�) (A4.11)

()#(�� |�) = �()#(�� |�)0�� (A4.12)

Optimal Weights in the Assets when N=1, K=1

I derive an equivalent representation for the optimal weights in the benchmark and non-

benchmark assets. This representation provides some insight into the components of the weights

and happens to be very helpful to interpret the results in this paper.

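The optimal weights $w$ for an investor who maximizes the mean-variance objective in (21) are given by

$$w^* = \gamma^{-1}\tilde{V}^{-1}\tilde{E}. \tag{A4.13}$$

Note that these weights are proportional to the tangency portfolio weights.

Substituting the moments for their expressions in (A4.3)-(A4.7), we obtain that

$$w_1^* = \gamma^{-1}\frac{\tilde{\alpha}}{\widetilde{\mathrm{var}}(\varepsilon \mid \cdot)} \tag{A4.14}$$

$$= \gamma^{-1}\left(\frac{\tilde{\alpha}}{[1, \tilde{E}_2]\,\mathrm{cov}(\alpha, \beta \mid \cdot)\,[1, \tilde{E}_2]' + \mathrm{var}(\beta \mid \cdot)\tilde{V}_2 + E(\sigma_\varepsilon^2 \mid \cdot)}\right), \tag{A4.15}$$

and

$$w_2^* = \gamma^{-1}\frac{\tilde{E}_2}{\tilde{V}_2} - \tilde{\beta}w_1^*, \tag{A4.16}$$

where $\widetilde{\mathrm{var}}(\varepsilon \mid \cdot)$ is the predictive residual variance of the non-benchmark asset, and $w_1^*$ and $w_2^*$ are the optimal portfolio weights on the non-benchmark and benchmark assets respectively.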

The denominator of $w_1^*$ is the residual variance from the predictive distribution of returns and

incorporates the effect of parameter uncertainty through the covariance matrix of alpha and betas.

If one were to ignore parameter uncertainty, it is clear from (A4.14) that the optimal portfolio

weight on the non-benchmark asset would be larger and, vice versa, the optimal portfolio weight

on the benchmark asset would be smaller. Also, from (A4.11) it becomes clear that the stochastic

nature of beta increases the denominator of $w_1^*$ and, hence, the allocation to the non-benchmark

asset is reduced.

It is easy to show that the squared Sharpe ratio of the portfolio is given by

$$SR^2 = \frac{\tilde{\alpha}^2}{[1, \tilde{E}_2]\,\mathrm{cov}(\alpha, \beta \mid \cdot)\,[1, \tilde{E}_2]' + \mathrm{var}(\beta \mid \cdot)\tilde{V}_2 + E(\sigma_\varepsilon^2 \mid \cdot)} + \frac{\tilde{E}_2^2}{\tilde{V}_2}, \tag{A4.17}$$

which is the sum of the squared Sharpe ratio of the non-benchmark, $\tilde{\alpha}^2/\widetilde{\mathrm{var}}(\varepsilon \mid \cdot)$, and the squared Sharpe ratio of the benchmark. The moments are derived under the predictive distributions. Note also that the squared Sharpe ratio of the tangency portfolio is also given by (A4.17). Hence, the Sharpe ratio of the portfolio in (A4.13) and the tangency portfolio are the same, which is not surprising given that the weights in (A4.13) are proportional to the tangency weights.
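The algebra behind (A4.13)-(A4.17) can be checked numerically. The sketch below uses made-up monthly moments and a hypothetical risk aversion coefficient; `resid_var` stands for the predictive residual variance in the denominator of (A4.14), so that $\tilde{V}_1 = \tilde{\beta}^2\tilde{V}_2 + $ `resid_var` as in (A4.5).

```python
import numpy as np

gamma = 5.0                          # hypothetical risk aversion
alpha, beta = 0.002, 1.1             # hypothetical predictive alpha and beta
E2, V2 = 0.005, 0.0019               # benchmark predictive mean and variance
resid_var = 0.0008                   # predictive residual variance, non-benchmark asset

E = np.array([alpha + beta * E2, E2])                    # (A4.3)
V = np.array([[beta**2 * V2 + resid_var, beta * V2],     # (A4.5) and (A4.7)
              [beta * V2, V2]])

w = np.linalg.solve(V, E) / gamma                        # (A4.13)
w1 = alpha / resid_var / gamma                           # (A4.14)
w2 = E2 / V2 / gamma - beta * w1                         # (A4.16)

tangency = np.linalg.solve(V, E)
tangency = tangency / tangency.sum()                     # weights summing to one

sr2_portfolio = (w @ E) ** 2 / (w @ V @ w)               # squared Sharpe ratio of w
sr2_sum = alpha**2 / resid_var + E2**2 / V2              # (A4.17)
```

Raising `resid_var`, for instance to reflect parameter uncertainty or the stochastic component of beta, lowers `w1` and raises `w2` when `alpha` and `beta` are positive, in line with the discussion above.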

APPENDIX V:

Priors Used in the Paper

The specific priors used in this paper are:

• Prior Distribution of $\theta_\beta$

I shrink the long run mean of beta towards the market beta and specify a prior suggesting no predictability. However, since I let the prior standard deviations be quite large, the prior is nearly noninformative. The prior distribution of $\theta_\beta$ is

$$\theta_\beta \sim N\big((1\ \ 0_{L-1}')', \bar{V}_\theta\big),$$

where $\bar{V}_\theta = \mathrm{diag}(1_L)$, and $0_{L-1}$ and $1_L$ are $(L - 1) \times 1$ and $L \times 1$ vectors of zeros and ones.

• Prior Distribution of $\phi$

I consider a fairly noninformative prior for $\phi$. In particular, I assume that it is normally distributed as $\phi \sim N(0.9, 0.16)$ truncated to the stationary region $(-1, 1)$.

• Prior Distribution of $\sigma_{u\beta}^{*2}$

$\sigma_{u\beta}^{*2}$ reflects the departures from the model specified for beta. The strength of the prior depends on $\bar{\nu}$, which corresponds to the number of data points we have observed in order to specify the prior. Tamayo (2000) points out that the latent nature of beta makes even very small values of $\bar{\nu}$ assign a lot of weight to the prior of $\sigma_{u\beta}^{*2}$. This is due to the fact that beta is latent rather than observable and, hence, the investor needs to observe a larger amount of data to estimate the beta parameter precisely. Thus, $\bar{\nu} = 1$ for the latent betas is roughly equivalent to observing 7 or 8 data points of actually observable data. In this paper I assume that $\bar{\nu} = 1$, a very noninformative prior.

• Prior Distribution of $\theta_\alpha$

I assume that the prior mean of $\theta_\alpha$ equals $0_L$, the true value under the null hypothesis that the conditional CAPM is an adequate characterization of expected returns. This implies that the conditional CAPM can capture the time series predictability of returns and price average returns. However, again I let the prior standard deviations be quite large, 1, so these priors are nearly noninformative. The prior distribution of $\theta_\alpha$ is $\theta_\alpha \sim N(0_L, \bar{V}_\alpha)$, where $\bar{V}_\alpha = \mathrm{diag}(1_L)$ and $0_L$ and $1_L$ are $L \times 1$ vectors of zeros and ones respectively.

• Prior Distribution of $\theta_m$

I set the first element in $\theta_m$, the long run mean of the excess return on the market, equal to 0.005 (or 6% premium over the riskfree asset per annum) and specify a prior suggesting no predictability. Again, I let the prior standard deviations be quite large so that these priors are nearly uninformative. Therefore, the prior distribution of $\theta_m$ is:

$$\theta_m \sim N\big((0.005\ \ 0_L')', \bar{V}_m\big),$$

where $\bar{V}_m = \mathrm{diag}(0.5, 1_L)$ and $0_L$ and $1_L$ are $L \times 1$ vectors of zeros and ones respectively.

• Prior Distributions of $\sigma_\varepsilon^2$ and $\sigma_m^2$

The priors for the parameters $\sigma_\varepsilon^2$ and $\sigma_m^2$ are assumed to be diffuse.

References

[1] Anderson, E, Lars Hansen and Thomas Sargent, 1999, “Risk and robustness in equilibrium”,

Working Paper, Stanford University.

[2] Avramov, Doron, 2000, “Stock return predictability and model uncertainty”, Working Paper,

University of Maryland.

[3] Avramov, Doron, 2000, “Stock return predictability and asset pricing models”, Working Paper,

University of Maryland.

[4] Banz, Rolf, 1981, “The relationship between returns and market value of common stocks”,

Journal of Financial Economics, 9, 13-18.

[5] Barberis, Nicholas, 2000, “Investing for the long run when returns are predictable”, Journal

of Finance, 55, 225-264.

[6] Bauer, Gregory, 2001, Working Paper, University of Rochester.

[7] Bawa, Vijay, Stephen Brown, and Roger Klein, 1979, Estimation Risk and Optimal Portfolio

Choice, North Holland, Amsterdam.

[8] Berger, James O., 1985, Statistical Decision Theory and Bayesian Analysis, 2nd ed., Springer

Verlag, New York.

[9] Brennan, Michael, Eduardo Schwartz, and Ronald Lagnado, 1997, “Strategic asset allocation”,

Journal of Economic Dynamics and Control, 21, 1377-1403.

[10] Bodurtha, James N. and Nelson C. Mark, 1991, “Testing the CAPM with time-varying risk”,

Journal of Finance, 46, 1485-1505.

[11] Bollerslev, Tim, Robert F. Engle, and Jeffrey M. Wooldridge, 1988, “A capital asset pricing

model with time varying covariances”, Journal of Political Economy, 96, 116-131.

[12] Bossaerts, P. and P. Hillion, 1999, “Implementing statistical criteria to select return forecasting

models: what do we learn?”, Review of Financial Studies, 12, 405-428.

[13] Campbell, John and Luis Viceira, 1999, “Consumption and portfolio decisions when expected

returns are time-varying”, Quarterly Journal of Economics, 114, 433-495.

[14] Casella, George and Edward I. George, 1992, “Explaining the Gibbs sampler”, The American

Statistician 46, 167-174.

[15] Chan, Louis K.C., Narasimhan Jegadeesh, and Josef Lakonishok, 1995, “Evaluating the performance

of value versus glamour: the impact of selection bias”, Journal of Financial Economics,

38, 269-296.

[16] Chib, Siddhartha and Edward Greenberg, 1995, “Understanding the Metropolis-Hastings

algorithm”, The American Statistician, 49, 327-335.

[17] Chib, Siddhartha and Edward Greenberg, 1996, “Markov Chain Monte Carlo simulation

methods in econometrics”, Econometric Theory, 12, 409-431.

[18] De Jong, Piet and Neil Shephard, 1995, “Efficient sampling from the smoothing density in

time series models”, Biometrika 82, 339-350.

[19] Evans, Martin D., 1994, “Expected returns, time-varying risk and risk premia”, Journal of

Finance 49, 655-679.

[20] Fama, Eugene and William Schwert, 1977, “Asset returns and inflation”, Journal of Financial

Economics, 5, 115-146.

[21] Fama, Eugene and Kenneth French, 1988, “Dividend yields and expected stock returns”,

Journal of Financial Economics, 22, 3-27.

[22] Fama, Eugene and Kenneth French, 1989, “Business conditions and expected returns on stocks

and bonds”, Journal of Financial Economics, 25, 23-49.

[23] Fama, Eugene and Kenneth French, 1992, “The cross-section of expected stock returns:”,


[24] Fama, Eugene and Kenneth French, 1993, “Common risk factors in the returns on stocks and

bonds”, Journal of Financial Economics, 33, 3-56.

[25] Ferson, Wayne E. and Campbell R. Harvey, 1991, “The variation of economic risk premiums”,

Journal of Political Economy, 99, 385-415.

[26] Ferson, Wayne E. and Campbell R. Harvey, 1993, “The risk and predictability of international

equity returns”, Review of Financial Studies, 6, 527-566.

[27] Ferson, Wayne E. and Campbell R. Harvey, 1999, “Conditioning variables and the cross-section

of stock returns”, Journal of Finance, 54, 1325-1360.

[28] Ferson, Wayne E and Robert A. Korajczyk, 1995, “Do arbitrage pricing models explain the

predictability of stock returns?”, Journal of Business 68, 309-349.

[29] Foster, F. D., T. Smith, and R. E. Whaley, 1997, “Assessing goodness-of-fit of asset pricing

models: the distribution of maximal R2”, Journal of Finance, 52, 591-607.

[30] Frost, Peter and James Savarino, 1986, “An empirical Bayes approach to efficient portfolio

selection”, Journal of Financial and Quantitative Analysis, 21, 293-305.

[31] Gilks, W.R., S. Richardson and D.J. Spiegelhalter, 1996, Markov Chain Monte Carlo in

Practice, Chapman and Hall, London.

[32] Ghysels, Eric, 1998, “On stable factor structures in the pricing of risk: do time varying betas

help or hurt?”, Journal of Finance, 53, 549-573.

[33] Goetzmann, W.N. and P. Jorion, 1993, “Testing the predictive power of dividend yields”,


[34] Grauer, Robert and Hakansson, 1995, “Stein and CAPM estimators of the means in asset

allocation”, International Review of Financial Analysis, 4, 35-66.

[35] Hansen, Lars P. and Scott F. Richard, 1987, “The role of conditioning information in deducing

testable restrictions implied by dynamic asset pricing models”, Econometrica, 50, 1029-1054.

[36] Harvey, Andrew, 1989, Forecasting, Structural Time Series Models and the Kalman Filter,

Cambridge University Press, Cambridge.

[37] Harvey, Campbell R., 1989, “Time-varying conditional covariances in tests of asset pricing

models”, Journal of Financial Economics, 24, 289-318.

[38] Harvey, Campbell R., 1991, “The world price of covariance risk”, Journal of Finance 46,

111-157.

[39] Harvey, Campbell R. and Guofu Zhou, 1990, “Bayesian inference in asset pricing test”, Journal

of Financial Economics, 26, 221-254.

[40] He, Jia, Raymond Kan, Lilian Ng, and Chu Zhang, 1996, “Tests of the relations among

marketwide factors, firm-specific variables, and stock returns using a conditional asset pricing

model”, Journal of Finance, 51, 1891-1908.

[41] Jacquier, Eric, Nicholas G. Polson and Peter E. Rossi, 1994, “Bayesian analysis of stochastic

volatility models”, Journal of Business and Economic Statistics, 12, 371-417.

[42] Jagannathan, Ravi and Zhenyu Wang, 1996, “The conditional CAPM and the cross-section of

expected returns”, Journal of Finance, 51, 3-53.

[43] Jobson, J. D. and Robert Korkie, 1980, “Estimation for Markowitz efficient portfolios”, Journal

of the American Statistical Association, 75, 544-554.

[44] Jorion, Philippe, 1985, “International portfolio diversification with estimation risk”, Journal

of Business, 58, 259-278.

[45] Jorion, Philippe, 1991, “Bayesian and CAPM estimators of the means: implications for

portfolio selection”, Journal of Banking and Finance, 15, 717-727.

[46] Kandel, Shmuel, Robert McCulloch and Robert F. Stambaugh, 1995, “Bayesian inference and

portfolio efficiency”, Review of Financial Studies, 8, 1-53.

[47] Kandel, Shmuel and Robert F. Stambaugh, 1996, “On the predictability of asset returns: an

asset-allocation perspective”, Journal of Finance, 51, 385-424.

[48] Keim, Donald and Robert Stambaugh, 1986, “Predicting returns in the stock and bond

markets”, Journal of Financial Economics, 17, 357-390.

[49] Kim, Sangjoon, Neil Shephard and Siddhartha Chib, 1998, “Stochastic volatility: likelihood

inference and comparison of ARCH models”, Review of Economic Studies, 65, 361-393.

[50] Kim, Tong Suk and Edward Omberg, 1996, “Dynamic nonmyopic portfolio behavior”, Review

of Financial Studies, 9, 141-161.

[51] Klein, Roger W. and Vijay S. Bawa, 1976, “The effect of estimation risk on optimal portfolio

choice”, Journal of Financial Economics, 3, 215-231.

[52] Kothari, S.P. and Jay Shanken, 1997, “Book-to-market, dividend yield and expected market

returns: a time series analysis”, Journal of Financial Economics, 44, 169-203.

[53] Lewellen, Jonathan, 1999, “The time series relation among expected returns, risk and book-

to-market”, Journal of Financial Economics, 54, 5-44.

[54] Maenhout, Pascal, 2000, “Portfolio rules and asset pricing”, Working Paper, Insead.

[55] McCulloch, Robert and Peter E. Rossi, 1990, “Posterior predictive and utility-based

approaches to testing the arbitrage pricing theory”, Journal of Financial Economics, 28, 7-38.

[56] McCulloch, Robert and Peter E. Rossi, 1991, “A Bayesian approach to testing the arbitrage

pricing theory”, Journal of Econometrics, 49, 141-168.

[57] Ohlson, James and Barr Rosenberg, 1982, “Systematic risk of the CRSP equal-weighted

common stock index: a history estimated by stochastic-parameter regression”, Journal of Business,

55, 121-145.

[58] Pastor, Lubos, 2000, “Portfolio selection and asset pricing models”, Journal of Finance, 55,

179-224.

[59] Pastor, Lubos and Robert Stambaugh, 1999, “Cost of equity capital and model mispricing”,


[60] Pastor, Lubos and Robert Stambaugh, 2000, “Evaluating and investing in equity mutual

funds”, Wharton School Working Paper.

[61] Pesaran, M. H. and A. Timmermann, 1995, “Predictability of stock returns: robustness and

economic significance”, Journal of Finance, 50, 1201-1228.

[62] Rosenberg, Barr, 1973, “Random coefficient models: the analysis of a cross section of time

series by stochastically convergent parameter regression”, Annals of Economic and Social

Measurement 2, 399-428.

[63] Schwert, William, 1989, “Why does stock market volatility change over time?”, Journal of

Finance, 44, 1115-1153.

[64] Shanken, Jay, 1987, “A Bayesian approach to testing portfolio efficiency”, Journal of Financial

Economics, 19, 195-216.

[65] Shanken, Jay, 1990, “Intertemporal asset pricing: an empirical investigation”, Journal of

Econometrics, 45, 99-120.

[66] Shanken, Jay and Ane Tamayo, 2001, “Dividend yield and stock return predictability:

mispricing or risk?”, Working Paper, University of Rochester.

[67] Tamayo, Ane, 2000, “An Examination of conditional asset pricing models when betas are

stochastic”, Working Paper, University of Rochester.

[68] Tanner, Martin A., 1996, Tools for Statistical Inference, Springer-Verlag, New York.

[69] Zellner, Arnold, 1971, An Introduction to Bayesian Inference in Econometrics, Wiley and

Sons, New York.

[70] Zellner, Arnold and Karuppan Chetty, 1965, “Prediction and decision problems in regression

models from a Bayesian point of view”, Journal of the American Statistical Association, 60,

608-616.

Table I: Descriptive Statistics

Panel A: Predictive Variables (%)

Dividend yield is the annual dividend yield on the value-weighted CRSP market index. Default spread is the average monthly yield to maturity of corporate bonds rated BAA minus the AAA corporate bond yield. Term spread is the difference between the average monthly yield of a 10-year government bond and a 1-month Treasury bill. Sample period: 1:63–12:98

Variable          Mean     Median   Maximum   Minimum   Std. Dev.

Dividend Yield    3.5567   3.4119   6.2675    1.5592    0.9406
Default Spread    0.0849   0.0746   0.2242    0.0267    0.0385
Term Spread       0.1285   0.1254   0.5165   -0.3933    0.1208

Panel B: Monthly Excess Returns on the Portfolios (%)

Stocks from the population of NYSE, AMEX and NASDAQ are divided into size and Book-to-market quintiles using NYSE as breakpoints. Sample period: 1:63–12:98

Portfolio   Mean     Median   Maximum    Minimum    Std. Dev.

High BM     1.0010   0.9311   35.2383   -16.8129    5.0033
Small       0.6672   0.9761   30.6653   -30.0860    6.1952
Market      0.5496   0.7706   16.0360   -23.0920    4.3776

Panel C: Maximum Likelihood Estimates of the Predictive Regression

This table presents the OLS regression estimates from regressing the excess returns on the market index, and the value weighted portfolios on the predictive variables. Standard errors in parentheses. Sample period: 1:63–12:98

Variable          Constant (%)   D/P (%)   Spread (%)   Default (%)

Market Index      0.549          0.210     0.231
                  (0.210)        (0.210)   (0.210)
                  0.549                    0.153        0.371
                  (0.210)                  (0.215)      (0.215)
Value Portfolio   1.001          0.499     0.224
                  (0.239)        (0.240)   (0.240)
                  1.001                    0.060        0.787
                  (0.238)                  (0.244)      (0.244)
Size Portfolio    0.667          0.462     0.145
                  (0.297)        (0.298)   (0.298)
                  0.667                    0.010        0.650
                  (0.297)                  (0.306)      (0.304)

Table II: Posterior Means and Standard Deviations of the Model Parameters - Value Portfolio

This table presents the posterior means and standard deviations of the model parameters. The rows that are shaded present the results for models with time-varying alphas, which are assumed to be deterministic functions of the predictive variables. The other rows present the results for models with constant alphas. In the first column, cons refers to a constant beta model, dete to a time-varying, deterministic beta model and stoc to a time-varying, stochastic beta model. The general model for Panels A and B is:

$$r_{t+1} = Z_t\theta_\alpha + Z_t\theta_\beta\, r_{m,t+1} + \varepsilon_{t+1}$$

where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. The general model for Panels C and D is:

$$r_{t+1} = Z_t\theta_\alpha + \beta_t\, r_{m,t+1} + \varepsilon_{t+1}$$

$$\beta_t = (Z_t - \phi_\beta Z_{t-1})\theta_\beta + \phi_\beta \beta_{t-1} + u_{\beta t}$$

where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$, $u_{\beta t} \sim N(0, \sigma_{u\beta}^2)$ and $\sigma_{u\beta}^{*2} = \sigma_{u\beta}^2/(1-\phi_\beta^2)$ is the long run variance. Panels A and C use the dividend yield and term spread as predictive variables (hence, $Z_t$ is a 1×3 vector where its first element is an intercept, the second one the dividend yield and the third one the term spread). Panels B and D use the default and term spreads as predictive variables (hence, $Z_t$ is a 1×3 vector where its first element is an intercept, the second one the default spread and the third one the term spread). Sample period is 1:63-12:98.

Panel A: Dividend Yield and Term Spread

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       σε (%)
             c         d/p       term      c         d/p       term

Cons beta    0.478                         0.950                         2.796
            (0.135)                       (0.031)                       (0.005)
Cons beta    0.479     0.310     0.006     0.947                         2.786
            (0.135)   (0.135)   (0.134)   (0.031)                       (0.005)
Dete beta    0.486                         0.932     0.053    -0.033     2.773
            (0.135)                       (0.031)   (0.027)   (0.027)   (0.005)
Dete beta    0.487     0.261     0.031     0.931     0.042    -0.035     2.778
            (0.135)   (0.104)   (0.135)   (0.031)   (0.027)   (0.027)   (0.005)

Panel B: Default Spread and Term Spread

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       σε (%)
             c         default   term      c         default   term

Cons beta    0.478                         0.950                         2.796
            (0.135)                       (0.031)                       (0.005)
Cons beta    0.483     0.436    -0.084     0.942                         2.771
            (0.134)   (0.137)   (0.137)   (0.031)                       (0.005)
Dete beta    0.473                         0.925     0.083    -0.061     2.773
            (0.135)                       (0.032)   (0.031)   (0.028)   (0.005)
Dete beta    0.484     0.378    -0.054     0.922     0.061    -0.059     2.757
            (0.134)   (0.141)   (0.137)   (0.032)   (0.032)   (0.028)   (0.005)

Panel C: Dividend Yield and Term Spread – Incorporating Uncertainty About the Beta Model Dynamics

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       φβ        σ*uβ      σε (%)
             c         d/p       term      c         d/p       term

Stoc beta    0.410                         0.946     0.046    -0.025     0.614     0.337     2.289
            (0.124)                       (0.051)   (0.047)   (0.038)   (0.194)   (0.027)   (0.005)
Stoc beta    0.415     0.207     0.041     0.949     0.030    -0.022     0.572     0.334     2.297
            (0.124)   (0.131)   (0.126)   (0.049)   (0.046)   (0.038)   (0.209)   (0.027)   (0.004)

Panel D: Default and Term Spread – Incorporating Uncertainty About the Beta Model Dynamics

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       φβ        σ*uβ      σε (%)
             c         def       term      c         def       term

Stoc beta    0.416                         0.945     0.056    -0.036     0.661     0.338     2.309
            (0.124)                       (0.058)   (0.064)   (0.040)   (0.205)   (0.031)   (0.005)
Stoc beta    0.429     0.332    -0.023     0.948     0.024    -0.034     0.603     0.332     2.296
            (0.124)   (0.132)   (0.128)   (0.052)   (0.051)   (0.040)   (0.209)   (0.028)   (0.005)
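To make the stochastic-beta specification of Panels C and D more concrete, the following sketch simulates a beta path of the form beta_t = (Z_t - phi Z_{t-1}) theta_beta + phi beta_{t-1} + u_t and recovers the innovation standard deviation from the long run one through sigma*^2 = sigma_u^2 / (1 - phi^2). The posterior means of the constant-alpha row of Panel C are plugged in purely for illustration; plugging posterior means into a nonlinear function is not the same as computing a posterior mean, and treating the predictors as standardized and fixed at their means is an assumption. The function names are hypothetical.

```python
import math
import random


def innovation_sd(phi, long_run_sd):
    """Invert sigma*^2 = sigma_u^2 / (1 - phi^2) to get the innovation s.d."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for a finite long run variance")
    return long_run_sd * math.sqrt(1.0 - phi ** 2)


def dot(z, theta):
    return sum(zi * ti for zi, ti in zip(z, theta))


def simulate_beta(Z, theta_beta, phi, sigma_u, seed=0):
    """Simulate beta_t = (Z_t - phi Z_{t-1}) theta_beta + phi beta_{t-1} + u_t."""
    rng = random.Random(seed)
    betas = [dot(Z[0], theta_beta)]  # start at the conditional mean (an assumption)
    for t in range(1, len(Z)):
        mean_part = dot(Z[t], theta_beta) - phi * dot(Z[t - 1], theta_beta)
        betas.append(mean_part + phi * betas[-1] + rng.gauss(0.0, sigma_u))
    return betas


# Plug-in values taken from the constant-alpha row of Panel C.
phi, long_run_sd = 0.614, 0.337
theta_beta = [0.946, 0.046, -0.025]
sigma_u = innovation_sd(phi, long_run_sd)

# Predictors held at their means; standardization is assumed, not stated here.
Z = [[1.0, 0.0, 0.0]] * 20000
betas = simulate_beta(Z, theta_beta, phi, sigma_u)
mean_beta = sum(betas) / len(betas)
sd_beta = math.sqrt(sum((b - mean_beta) ** 2 for b in betas) / len(betas))
```

In a long simulated path the sample standard deviation of beta should settle near the long run value rather than the smaller innovation value, which is the distinction the σ*uβ column is meant to capture.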

Table III: Posterior Means and Standard Deviations of the Model Parameters - Size Portfolio

This table presents the posterior means and standard deviations of the model parameters. The rows that are shaded present the results for models with time-varying alphas, which are assumed to be deterministic functions of the predictive variables. The other rows present the results for models with constant alphas. In the first column, cons refers to a constant beta model, dete to a time-varying, deterministic beta model and stoc to a time-varying, stochastic beta model. The general model for Panels A and B is:

$$r_{t+1} = Z_t\theta_\alpha + Z_t\theta_\beta\, r_{m,t+1} + \varepsilon_{t+1}$$

where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. The general model for Panels C and D is:

$$r_{t+1} = Z_t\theta_\alpha + \beta_t\, r_{m,t+1} + \varepsilon_{t+1}$$

$$\beta_t = (Z_t - \phi_\beta Z_{t-1})\theta_\beta + \phi_\beta \beta_{t-1} + u_{\beta t}$$

where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$, $u_{\beta t} \sim N(0, \sigma_{u\beta}^2)$ and $\sigma_{u\beta}^{*2} = \sigma_{u\beta}^2/(1-\phi_\beta^2)$ is the long run variance. Panels A and C use the dividend yield and term spread as predictive variables (hence, $Z_t$ is a 1×3 vector where its first element is an intercept, the second one the dividend yield and the third one the term spread). Panels B and D use the default and term spreads as predictive variables (hence, $Z_t$ is a 1×3 vector where its first element is an intercept, the second one the default spread and the third one the term spread). Sample period is 1:63-12:98.

Panel A: Dividend Yield and Term Spread

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       σε (%)
             c         d/p       term      c         d/p       term

Cons beta    0.023                         1.169                         3.506
            (0.170)                       (0.038)                       (0.008)
Cons beta    0.025     0.259    -0.125     1.168                         3.502
            (0.170)   (0.170)   (0.170)   (0.039)                       (0.008)
Dete beta    0.040                         1.168    -0.022    -0.043     3.506
            (0.170)                       (0.040)   (0.033)   (0.034)   (0.008)
Dete beta    0.041     0.300     0.104     1.170    -0.035    -0.042     3.500
            (0.170)   (0.175)   (0.170)   (0.040)   (0.034)   (0.034)   (0.008)

Panel B: Default Spread and Term Spread

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       σε (%)
             c         default   term      c         default   term

Cons beta    0.023                         1.169                         3.506
            (0.170)                       (0.039)                       (0.008)
Cons beta    0.026     0.216    -0.168     1.166                         3.506
            (0.170)   (0.173)   (0.173)   (0.039)                       (0.008)
Dete beta    0.035                         1.160     0.011    -0.044     3.508
            (0.170)                       (0.040)   (0.039)   (0.035)   (0.008)
Dete beta    0.039     0.228    -0.150     1.160    -0.001    -0.041     3.508
            (0.171)   (0.180)   (0.174)   (0.040)   (0.041)   (0.035)   (0.008)


Panel C: Dividend Yield and Term Spread – Incorporating Uncertainty About the Beta Model Dynamics

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       φβ        σ*uβ      σε (%)
             c         d/p       term      c         d/p       term

Stoc beta    0.099                         1.152    -0.008    -0.039     0.490     0.376     3.083
            (0.164)                       (0.055)   (0.051)   (0.047)   (0.185)   (0.044)   (0.008)
Stoc beta    0.111     0.260    -0.151     1.157    -0.021    -0.035     0.451     0.377     3.081
            (0.165)   (0.172)   (0.166)   (0.054)   (0.051)   (0.047)   (0.191)   (0.044)   (0.008)

Panel D: Default and Term Spread – Incorporating Uncertainty About the Beta Model Dynamics

             θα1 (%)   θα2 (%)   θα3 (%)   θβ1       θβ2       θβ3       φβ        σ*uβ      σε (%)
             c         def       term      c         def       term

Stoc beta    0.099                         1.150     0.007    -0.041     0.482     0.373     3.085
            (0.164)                       (0.054)   (0.052)   (0.049)   (0.183)   (0.043)   (0.008)
Stoc beta    0.110     0.159    -0.176     1.152    -0.001    -0.039     0.453     0.374     3.090
            (0.165)   (0.174)   (0.169)   (0.054)   (0.052)   (0.048)   (0.183)   (0.043)   (0.008)

Table IV: Weights, Sharpe Ratio and Differences in CERs - Value Portfolio Time Variation in Expected Returns Only

This table presents the optimal (non-normalized) weights on the risky assets, the Sharpe ratio and differences in CER when the investor invests in a value portfolio and a market index. The weights are given by $A^{-1}V^{-1}E$, where $A$ is the risk aversion parameter, and $E$ and $V$ are the first two moments of the predictive distribution of the excess returns. The maximum Sharpe ratio is the ex-ante Sharpe ratio perceived by an investor who invests in the two assets and is given by $\sqrt{E'V^{-1}E}$. CER is the difference in certainty equivalent returns, $C^* - C^a$, where $C^*$ ($C^a$) is the CER for the optimal (suboptimal) portfolio. The optimal (suboptimal) portfolio is computed assuming that the predictive variable of interest is one standard deviation above or below its mean (at their mean values). The difference in CER is annualized. In the column labeled by “Mean”, the predictive variables are at their mean values. In the columns labeled by “∆” (“∇”) the predictive variables are one standard deviation above (below) their means. In the Table, cons refers to constant and dete to a time-varying, deterministic alpha and/or beta model. Sample period: 1:63-12:98.

Panel A: Dividend Yield and Term Spread

                         Mean     ∆d/p     ∇d/p    ∆term    ∇term

Predictability in expected returns only (dete alpha, cons beta)
Weight Value           216.30   355.58    75.71   220.60   211.01
Weight Market         -104.77  -199.42    -9.18   -66.25  -142.35
Sharpe Ratio             0.21     0.33     0.10     0.25     0.18
CER (% per annum)        0.00     3.10     3.14     0.60     0.06

Panel B: Default and Term Spreads

                         Mean     ∆def     ∇def    ∆term    ∇term

Predictability in expected returns only (dete alpha, cons beta)
Weight Value           219.07   407.42    29.71   222.81   214.32
Weight Market         -106.40  -210.28    -1.58   -67.23  -144.63
Sharpe Ratio             0.21     0.39     0.04     0.25     0.18
CER (% per annum)        0.00     6.71     6.76     0.60     0.06
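The quantities reported in Table IV can be illustrated with a small two-asset sketch: the non-normalized weights A^{-1}V^{-1}E, the maximum Sharpe ratio sqrt(E'V^{-1}E), and the CER difference between a portfolio that conditions on the current predictor value and one that ignores it. The moments below are hypothetical numbers, not the paper's predictive moments. The mean-variance form of the certainty equivalent, w'E - (A/2) w'Vw, and annualization by a factor of 12 are assumptions made for this sketch, since the caption does not spell them out.

```python
import math


def inv2(V):
    """Inverse of a 2x2 covariance matrix."""
    (a, b), (c, d) = V
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def optimal_weights(E, V, A):
    """Non-normalized mean-variance weights A^{-1} V^{-1} E."""
    return [w / A for w in matvec(inv2(V), E)]


def max_sharpe(E, V):
    """Ex-ante maximum Sharpe ratio sqrt(E' V^{-1} E)."""
    return math.sqrt(sum(e * x for e, x in zip(E, matvec(inv2(V), E))))


def cer(w, E, V, A):
    """Mean-variance certainty equivalent w'E - (A/2) w'Vw (assumed form)."""
    mean = sum(wi * ei for wi, ei in zip(w, E))
    var = sum(wi * vi for wi, vi in zip(w, matvec(V, w)))
    return mean - 0.5 * A * var


# Hypothetical monthly moments in decimals: [value portfolio, market index].
V = [[0.0025, 0.0018], [0.0018, 0.0019]]
E_mean = [0.010, 0.0055]  # predictors at their mean values
E_high = [0.015, 0.0055]  # one predictor one standard deviation above its mean
A = 3.0

w_star = optimal_weights(E_high, V, A)  # conditions on the current predictor value
w_a = optimal_weights(E_mean, V, A)     # ignores it
cer_loss_annual = 12 * (cer(w_star, E_high, V, A) - cer(w_a, E_high, V, A))
```

Under these assumptions the CER of the optimal portfolio equals the squared maximum Sharpe ratio divided by 2A, so a higher perceived Sharpe ratio translates directly into a larger cost of ignoring the predictor.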

Table V: Weights, Sharpe Ratio and Differences in CERs - Size Portfolio Time Variation in Expected Returns Only

This table presents the optimal (non-normalized) weights on the risky assets when the investor invests in a size portfolio and a market index, which are given by $A^{-1}V^{-1}E$, where $A$ is the risk aversion parameter, and $E$ and $V$ are the first two moments of the predictive distribution of the excess returns. The normalized weights, not reported, are given by $V^{-1}E/(i_2'V^{-1}E)$. The maximum Sharpe ratio is the ex-ante Sharpe ratio perceived by an investor who invests in the two assets and is given by $\sqrt{E'V^{-1}E}$. CER is the difference in certainty equivalent returns, $C^* - C^a$, where $C^*$ ($C^a$) is the CER for the optimal (suboptimal) portfolio. The optimal (suboptimal) portfolio is computed assuming that the predictive variable of interest is one standard deviation above or below its mean (at their mean values). The difference in CER is annualized. In the column labeled by “Mean”, the predictive variables are at their mean values. In the columns labeled by “∆” (“∇”) the predictive variables are one standard deviation above (below) their means. The rows that are shaded present the results for models with time-varying alphas. The other rows present the results for models with constant alphas. In the second column, cons refers to a constant beta model and dete to a time-varying, deterministic beta model. Sample period: 1:63-12:98.



Predictability in expected returns only (dete alpha, cons beta)
Weight Value             7.02    80.28   -66.28   -27.61    41.61
Weight Market           91.81    43.74   139.91   174.83     8.82
Sharpe Ratio             0.12     0.19     0.10     0.18     0.08
CER (% per annum)        0.00     1.60     1.60     0.85     0.85



Predictability in expected returns only (dete alpha, cons beta)
Weight Value             7.40    58.32   -43.55   -27.05    41.82
Weight Market           91.36   105.55    77.22   174.26     8.51
Sharpe Ratio             0.12     0.22     0.05     0.18     0.08
CER (% per annum)        0.00     2.36     2.36     0.86     0.86
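The normalized weights mentioned in the caption of Table V are not reported, but because the risk aversion parameter cancels in V^{-1}E/(i_2'V^{-1}E), they can be recovered by rescaling the reported non-normalized weights so that they sum to one. The short sketch below does this for the first pair of weights in Table V with the predictors at their means (7.02 and 91.81); the function name is hypothetical.

```python
def normalize(weights):
    """Rescale weights so they sum to one; the risk aversion parameter cancels."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; normalization is undefined")
    return [w / total for w in weights]


# Non-normalized weights on the first risky asset and on the market index,
# read from the first block of Table V with predictors at their means.
first_asset_w, market_w = normalize([7.02, 91.81])
```

When the non-normalized weights sum to a negative number the rescaling flips their signs, so normalized weights are easiest to interpret when the investor's total position in risky assets is positive.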

Table VI: Weights, Sharpe Ratio and Differences in CERs Value Portfolio – Constant Price of Risk


assets and is given by $\sqrt{E'V^{-1}E}$. CER is the difference in certainty equivalent returns, $C^* - C^a$, where $C^*$ ($C^a$) is the CER for the optimal (suboptimal) portfolio. The optimal (suboptimal) portfolio is computed assuming that the predictive variable of interest is one standard deviation above or below its mean (at their mean values). The difference in CER is annualized. In the column labeled by “Mean”, the predictive variables are at their mean values. In the columns labeled by “∆” (“∇”) the predictive variables are one standard deviation above (below) their means. Sample period: 1:63-12:98.



Predictability but no asset pricing model (deterministic alpha, constant beta)
Weight Value           216.30   355.58    75.78   220.38   211.22
Weight Market         -104.77  -236.62    28.26  -108.63   -99.96
Sharpe Ratio             0.21     0.32     0.11     0.23     0.19
CER (% per annum)        0.00     2.62     2.66     0.00     0.00

Predictability only in risk premia (constant alpha and beta)
Weight Value           214.08   213.90   214.27   213.87   214.30
Weight Market         -103.42  -103.24  -103.60  -103.22  -103.62
Sharpe Ratio             0.21     0.22     0.20     0.23     0.19
CER (% per annum)        0.00     0.00     0.00     0.00     0.00

Predictability in risk premia and betas (constant alpha, deterministic beta)
Weight Value           219.99   219.52   219.84   219.07   220.04
Weight Market         -105.02  -116.17   -93.28   -97.01  -112.24
Sharpe Ratio             0.21     0.23     0.20     0.23     0.20
CER (% per annum)        0.00     0.00     0.00     0.00     0.00

Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value           221.34   338.65   102.15   235.76   204.95
Weight Market         -106.14  -229.45     9.10  -111.49   -97.91
Sharpe Ratio             0.21     0.31     0.13     0.24     0.19
CER (% per annum)        0.00     1.88     1.92     0.00     0.00






Predictability in risk premia and betas (constant alpha, deterministic beta)
Weight Value           214.17   213.32   214.38   213.22   214.24
Weight Market          -98.07  -112.20   -83.27   -88.04  -107.33
Sharpe Ratio             0.21     0.24     0.18     0.23     0.19
CER (% per annum)        0.00     0.12     0.02     0.04     0.02

Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value           221.46   386.69    53.59   232.37   208.55
Weight Market         -104.22  -275.28    53.17  -103.59  -101.94
Sharpe Ratio             0.21     0.35     0.08     0.24     0.19
CER (% per annum)        0.00     3.74     3.78     0.00     0.00


                         Mean     ∆d/p     ∇d/p    ∆term    ∇term    ∆beta    ∇beta

Predictability in risk premia and betas (constant alpha, stochastic beta)
Weight Value           191.84   170.96   215.63   168.57   220.04   191.84   191.84
Weight Market          -81.47   -60.48   -94.19   -55.43  -113.42  -146.17   -16.77
Sharpe Ratio             0.19     0.20     0.19     0.20     0.19     0.19     0.19
CER (% per annum)        0.00     0.10     0.08     0.09     0.09     1.39     1.39

Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value           194.06   259.83   108.22   188.76   197.87   194.06   194.06
Weight Market          -84.12  -154.33     0.57   -74.94   -92.10  -148.90   -19.34
Sharpe Ratio             0.20     0.26     0.13     0.22     0.17     0.20     0.20
CER (% per annum)        0.00     0.63     0.85     0.01     0.00     1.40     1.40

Panel D: Default and Term Spread – Incorporating Uncertainty About the Beta Model Dynamics

                         Mean     ∆def     ∇def    ∆term    ∇term    ∆beta    ∇beta


Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value           200.50   285.75    60.93   196.18   202.96   200.50   200.50
Weight Market          -90.01  -175.47    43.25   -80.25   -98.19  -156.55   -23.46
Sharpe Ratio             0.20     0.30     0.08     0.22     0.18     0.20     0.20
CER (% per annum)        0.00     1.11     1.99     0.00     0.00     1.54     1.47

Table VII: Weights, Sharpe Ratio and Differences in CERs Size Portfolio – Constant Price of Risk





Predictability but no asset pricing model (deterministic alpha, constant beta)
Weight Value             7.02    80.21   -66.34   -27.58    41.65
Weight Market           91.81     6.33   177.47   132.21    51.36
Sharpe Ratio             0.12     0.17     0.12     0.15     0.10
CER (% per annum)        0.00     1.13     1.14     0.25     0.25


Weight Value             6.54     6.53     6.54     6.53     6.54
Weight Market           92.36    92.36    92.35    92.37    92.35
Sharpe Ratio             0.12     0.15     0.10     0.15     0.09
CER (% per annum)        0.00     0.00     0.00     0.00     0.00

Predictability in risk premia and betas (constant alpha, deterministic beta)
Weight Value            11.28    11.26    11.27    11.23    11.28
Weight Market           86.82    87.11    86.57    87.37    86.32
Sharpe Ratio             0.12     0.15     0.10     0.15     0.09
CER (% per annum)        0.00     0.00     0.00     0.00     0.00

Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value            11.79    96.61   -73.26   -16.64    40.25
Weight Market           86.20    -9.56   188.37   118.76    51.19
Sharpe Ratio             0.12     0.17     0.12     0.15     0.10
CER (% per annum)        0.00     1.53     1.53     0.17     0.17







Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value            10.99    66.24   -44.65   -16.55    38.57
Weight Market           87.25    23.76   152.24   118.52    53.65
Sharpe Ratio             0.13     0.18     0.08     0.15     0.10
CER (% per annum)        0.00     0.65     0.66     0.16     0.16



Predictability in risk premia and betas (constant alpha, stochastic beta)
Weight Value            28.22    25.85    30.79    25.52    31.24    28.22    28.22
Weight Market           67.47    70.42    64.26    71.57    62.79    56.86    78.08
Sharpe Ratio             0.13     0.15     0.10     0.15     0.10     0.13     0.13
CER (% per annum)        0.00     0.00     0.00     0.00     0.00     0.04     0.04

Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value            31.66    96.05   -45.48    -9.30    81.34    31.66    31.66
Weight Market           63.38    -9.01   153.61   110.43     3.06    51.45    75.30
Sharpe Ratio             0.13     0.18     0.11     0.15     0.12     0.13     0.13
CER (% per annum)        0.00     0.96     1.16     0.39     0.47     0.05     0.05




Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value            31.27    55.79    -3.79    -7.96    78.69    31.27    31.27
Weight Market           63.98    36.26   104.39   108.86     6.32    52.29    75.68
Sharpe Ratio             0.13     0.18     0.06     0.15     0.12     0.13     0.13
CER (% per annum)        0.00     0.15     0.22     0.36     0.43     0.05     0.05

Table VIII: Weights, Sharpe Ratio and Differences in CERs Value Portfolio – Time-Varying Price of Risk





Predictability but no asset pricing model (deterministic alpha, constant beta)
Weight Value           219.50   360.99    76.80   223.83   214.10
Weight Market          -87.84  -221.92    -0.15   -39.58  -154.23
Sharpe Ratio             0.21     0.32     0.10     0.26     0.18
CER (% per annum)        0.00     2.66     3.42     0.73     1.37

Predictability only in market expected return and variance (constant alpha and beta)


Predictability in risk premia and betas (constant alpha, deterministic beta)
Weight Value           223.25   222.96   222.62   222.80   222.96
Weight Market          -88.09   -99.72  -123.16   -28.06  -166.60
Sharpe Ratio             0.22     0.22     0.19     0.26     0.18
CER (% per annum)        0.00     0.04     0.39     0.95     1.67

Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value           224.62   343.96   103.44   239.80   207.67
Weight Market          -89.23  -214.79   -19.49   -42.79  -152.07
Sharpe Ratio             0.22     0.30     0.12     0.27     0.17
CER (% per annum)        0.00     1.90     2.39     0.99     1.72






Predictability in risk premia and betas (constant alpha, deterministic beta)
Weight Value           223.17   222.93   222.63   222.64   222.75
Weight Market          -94.01   -28.85  -159.97   -62.21  -143.04
Sharpe Ratio             0.22     0.28     0.18     0.24     0.19
CER (% per annum)        0.00     1.26     1.32     0.31     0.78

Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value           226.16   396.09    54.51   237.77   212.61
Weight Market          -94.61  -195.22   -10.39   -67.93  -136.43
Sharpe Ratio             0.22     0.38     0.06     0.25     0.18
CER (% per annum)        0.00     5.07     5.22     0.45     0.10




Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value           212.18   298.43    99.24   227.40   185.44   212.18   212.18
Weight Market          -81.36  -172.28   -18.63   -38.43  -131.59  -152.18   -10.53
Sharpe Ratio             0.20     0.27     0.11     0.26     0.15     0.20     0.20
CER (% per annum)        0.00     0.95     2.13     0.90     1.64     1.24     1.24




Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value           209.98   362.91    49.82   225.90   181.72   209.98   209.98
Weight Market          -85.03  -160.67    -9.15   -67.17  -108.02  -154.72   -15.34
Sharpe Ratio             0.21     0.35     0.06     0.24     0.16     0.21     0.21
CER (% per annum)        0.00     4.55     4.88     0.37     0.92     1.44     1.44

Table IX: Weights, Sharpe Ratio and Differences in CERs Size Portfolio – Time-Varying Price of Risk





Predictability but no asset pricing model (deterministic alpha, constant beta)
Weight Value             7.12    81.43   -67.23   -28.01    42.18
Weight Market          111.65    24.73   151.07   205.03    -0.85
Sharpe Ratio             0.13     0.16     0.11     0.19     0.07
CER (% per annum)        0.00     1.15     1.85     0.99     1.63




Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value            11.97    98.13   -74.18   -16.92    40.78
Weight Market          105.96     8.55   162.03   191.40    -1.00
Sharpe Ratio             0.13     0.17     0.11     0.19     0.07
CER (% per annum)        0.00     1.55     2.27     0.92     1.57







Predictability in risk premia, beta and model mispricing (deterministic alpha and beta)
Weight Value            11.22    67.85   -45.42   -16.93    39.32
Weight Market          100.94   111.09    90.39   159.33    22.18
Sharpe Ratio             0.13     0.23     0.06     0.17     0.09
CER (% per annum)        0.00     2.37     2.45     0.39     0.82




Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value            34.02   107.37   -42.71   -10.80    77.70    34.02    34.02
Weight Market           80.61    -2.03   122.89   184.42   -44.14    67.80    93.43
Sharpe Ratio             0.13     0.18     0.10     0.19     0.09     0.13     0.13
CER (% per annum)        0.00     1.11     1.94     1.17     1.80     0.04     0.04




Predictability in risk premia and betas (deterministic alpha, stochastic beta)
Weight Value            32.56    67.52    -3.28    -8.93    72.76    32.56    32.56
Weight Market           76.47   112.04    41.06   150.32   -17.19    64.29    88.64
Sharpe Ratio             0.14     0.23     0.04     0.17     0.11     0.14     0.14
CER (% per annum)        0.00     1.96     2.04     0.61     1.01     0.04     0.04

Table X: Economic Significance of Evidence on the Source of Predictability: Difference in CER Across Models. Value Portfolio – Constant Price of Risk

This table presents the difference in certainty equivalent returns (CER), $C^*_m - C^a_m$, across models. $C^*_m$ is the CER for the optimal portfolio and $C^a_m$ is the CER for the suboptimal portfolio. The optimal portfolio is computed using the model that the investor perceives as the optimal one. In the table, the optimal model is the one in the first column. The suboptimal portfolio is computed using an alternative model, which is the one given in the second column. The difference in CERs is annualized and expressed in percent. In the column labeled “∆dyld” the dividend yield is one standard deviation above its mean. The same convention applies to the rest of the columns. In the first two columns, cons refers to a constant model; dete to a time-varying, deterministic model; and stoc to a time-varying, stochastic model. Sample period: 1:63-12:98.

Optimal                 Non-optimal                ∆dyld   ∆term   ∆beta    ∆def   ∆term   ∆beta

Cons alpha, cons beta   Dete alpha, cons beta       2.70    0.28            5.25    0.01
                        Cons alpha, dete beta       0.02    0.07            0.03    0.07
                        Dete alpha, dete beta       2.12    0.15            4.49    0.15
                        Cons alpha, stoc beta       0.26    0.26    1.46    0.48    0.15    1.40
                        Dete alpha, stoc beta       0.32    0.04    1.49    0.71    0.04    1.40

Dete alpha, cons beta   Cons alpha, cons beta       2.70    0.01            5.12    0.01
                        Cons alpha, dete beta       2.52    0.08            5.02    0.80
                        Dete alpha, dete beta       0.07    0.12            0.11    0.12
                        Cons alpha, stoc beta       5.54    0.37    1.48    8.85    0.37    1.56
                        Dete alpha, stoc beta       1.20    0.09    1.51    2.00    0.09    1.59

Cons alpha, dete beta   Cons alpha, cons beta       0.03    0.10            0.05    0.10
                        Dete alpha, cons beta       2.60    0.09            5.17    0.09
                        Dete alpha, dete beta       1.94    0.04            4.20    0.04
                        Cons alpha, stoc beta       0.30    0.29    1.64    0.50    0.29    1.67
                        Dete alpha, stoc beta       0.24    0.06    1.66    0.67    0.06    1.72

Dete alpha, dete beta   Cons alpha, cons beta       2.09    0.17            4.27    0.17
                        Dete alpha, cons beta       0.07    0.11            0.11    0.11
                        Cons alpha, dete beta       1.92    0.04            4.11    0.04
                        Cons alpha, stoc beta       3.77    0.55    1.65    7.61    0.55    1.75
                        Dete alpha, stoc beta       0.79    0.20    1.67    1.50    0.20    1.79

Cons alpha, stoc beta   Cons alpha, cons beta       0.30    0.29            0.60    0.29
                        Dete alpha, cons beta       5.06    0.41           11.14    0.41
                        Cons alpha, dete beta       0.32    0.33            0.62    0.33
                        Dete alpha, dete beta       4.10    0.64    0.00    9.63    0.64    0.00
                        Dete alpha, stoc beta       1.17    0.11    0.00    2.85    0.11    0.00

Dete alpha, stoc beta   Cons alpha, cons beta       0.34    0.05            0.84    0.05
                        Dete alpha, cons beta       1.31    0.19            2.43    0.10
                        Cons alpha, dete beta       0.26    0.07            0.79    0.07
                        Dete alpha, dete beta       0.84    0.22    0.00    1.85    0.22    0.00
                        Cons alpha, stoc beta       1.16    0.11    0.00    2.77    0.11    0.00
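One way to read the entries of Table X is as the CER an investor gives up by holding the portfolio implied by an alternative model while believing in another. The sketch below follows that reading: both portfolios are evaluated under the moments of the model the investor perceives as optimal. This evaluation convention, the mean-variance form of the CER, the factor of 12 used to annualize and the numerical moments are all assumptions made for illustration; the function names are hypothetical.

```python
def inv2(V):
    """Inverse of a 2x2 covariance matrix."""
    (a, b), (c, d) = V
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def weights(E, V, A):
    """Mean-variance weights A^{-1} V^{-1} E."""
    return [w / A for w in matvec(inv2(V), E)]


def cer(w, E, V, A):
    """Mean-variance certainty equivalent w'E - (A/2) w'Vw (assumed form)."""
    mean = sum(wi * ei for wi, ei in zip(w, E))
    var = sum(wi * vi for wi, vi in zip(w, matvec(V, w)))
    return mean - 0.5 * A * var


def cer_difference(believed, alternative, A):
    """C*_m - C^a_m, with both portfolios evaluated under the believed model."""
    E, V = believed
    w_star = weights(E, V, A)
    w_alt = weights(alternative[0], alternative[1], A)
    return cer(w_star, E, V, A) - cer(w_alt, E, V, A)


# Hypothetical monthly predictive moments (E, V) implied by two models.
model_m = ([0.015, 0.0055], [[0.0025, 0.0018], [0.0018, 0.0019]])
model_a = ([0.010, 0.0055], [[0.0027, 0.0018], [0.0018, 0.0019]])
A = 3.0

loss_pct_per_year = 100 * 12 * cer_difference(model_m, model_a, A)
```

Under this reading the difference is zero when the two models imply the same moments and is generally not symmetric in the two models, which is in line with the table reporting separate entries for each ordered pair.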

Table XI: Economic Significance of Evidence on the Source of Predictability: Difference in CER Across Models. Size Portfolio – Constant Price of Risk

This table presents the difference in certainty equivalent returns (CER), $C^*_m - C^a_m$, across models. $C^*_m$ is the CER for the optimal portfolio and $C^a_m$ is the CER for the suboptimal portfolio. The optimal portfolio is computed using the model that the investor perceives as the optimal one. In the table, the optimal model is the one in the first column. The suboptimal portfolio is computed using an alternative model, which is the one given in the second column. The difference in CERs is annualized and expressed in percent. In the column labeled “∆dyld” the dividend yield is one standard deviation above its mean. The same convention applies to the rest of the columns. In the first two columns, cons refers to a constant model; dete to a time-varying, deterministic model; and stoc to a time-varying, stochastic model. Sample period: 1:63-12:98.

Optimal                 Non-optimal                ∆dyld   ∆term   ∆beta    ∆def   ∆term   ∆beta

Cons alpha, cons beta   Dete alpha, cons beta       1.14    0.24            0.58    0.24
                        Cons alpha, dete beta       0.00    0.00            0.00    0.00
                        Dete alpha, dete beta       1.71    0.10            0.77    0.11
                        Cons alpha, stoc beta       0.08    0.08    0.14    0.07    0.08    0.15
                        Dete alpha, stoc beta       1.68    0.05    0.18    0.52    0.04    0.19

Dete alpha, cons beta   Cons alpha, cons beta       1.15    0.25            0.58    0.07
                        Cons alpha, dete beta       1.00    0.32            0.51    0.29
                        Dete alpha, dete beta       0.06    0.03            0.01    0.02
                        Cons alpha, stoc beta       0.62    0.60    0.13    0.25    0.57    0.13
                        Dete alpha, stoc beta       0.06    0.07    0.17    0.00    0.08    0.17

Cons alpha, dete beta   Cons alpha, cons beta       0.00    0.00            0.00    0.00
                        Dete alpha, cons beta       1.00    0.33            0.50    0.29
                        Dete alpha, dete beta       1.54    0.16            0.68    0.15
                        Cons alpha, stoc beta       0.05    0.04    0.10    0.04    0.05    0.12
                        Dete alpha, stoc beta       1.51    0.09    0.13    0.45    0.07    0.15

Dete alpha, dete beta   Cons alpha, cons beta       1.72    0.11            0.77    0.12
                        Dete alpha, cons beta       0.06    0.03            0.01    0.02
                        Cons alpha, dete beta       1.54    0.16            0.69    0.15
                        Cons alpha, stoc beta       1.06    0.37    0.09    0.38    0.38    0.11
                        Dete alpha, stoc beta       0.00    0.01    0.13    0.02    0.02    0.14

Cons alpha, stoc beta   Cons alpha, cons beta       0.09    0.08            0.08    0.08
                        Dete alpha, cons beta       0.68    0.67            0.30    0.64
                        Cons alpha, dete beta       0.05    0.05            0.05    0.06
                        Dete alpha, dete beta       1.15    0.40    0.00    0.45    0.41    0.00
                        Dete alpha, stoc beta       1.12    0.28    0.00    0.25    0.26    0.00

Dete alpha, stoc beta   Cons alpha, cons beta       1.86    0.06            0.61    0.05
                        Dete alpha, cons beta       0.06    0.08            0.00    0.09
                        Cons alpha, dete beta       1.66    0.10            0.53    0.07
                        Dete alpha, dete beta       0.00    0.01    0.00    0.03    0.02    0.00
                        Cons alpha, stoc beta       1.13    0.28    0.00    0.25    0.26    0.00
