
Model Uncertainties

Lars Peter Hansen
University of Chicago and NBER

Thomas J. Sargent
New York University and Hoover Institution

August 4, 2013


Contents

List of Figures

1 Introduction
  1.1 Overview of Model Uncertainty
  1.2 Nine papers about model uncertainty

2 Discounted Linear Exponential Quadratic Gaussian Control
  2.1 Cost Formulation
  2.2 Cost Recursions and Aggregator Functions
  2.3 Infinite Horizon Costs
  2.4 Arbitrary Time-invariant Linear Control Laws
  2.5 Solution to the Infinite Horizon Discounted Problem
  2.6 Summary

3 Robust Permanent Income and Pricing (with Thomas D. Tallarini)
  3.1 Introduction
  3.2 Recursive Risk Sensitive Control
  3.3 Robust Permanent Income Theory
  3.4 Estimation
  3.5 Asset Pricing
  3.6 Quantifying Robustness from the Market Price of Risk
  3.7 Intertemporal Mean-Risk Tradeoffs
  3.8 Conclusions
  3.A Subgradient Inequality
  3.B Computing Prices for State-Contingent Utility
  3.C Computing Conditional Variance of SDF

4 A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk, and Model Detection
  4.1 Introduction
  4.2 Overview
  4.3 Mathematical preliminaries
  4.4 A tour of four semigroups
  4.5 Model misspecification and robust control
  4.6 Portfolio allocation
  4.7 Pricing risky claims
  4.8 Statistical discrimination
  4.9 Entropy and the market price of uncertainty
  4.10 Concluding remarks
  4.A Proof of Theorem 4.5.1

5 Robust Control and Model Uncertainty
  5.1 Introduction
  5.2 A Benchmark Resource Allocation Problem
  5.3 Model Misspecification
  5.4 Two Robust Control Problems
  5.5 Recursivity of the Multiplier Formulation
  5.6 Two Preference Orderings
  5.7 Recursivity of the Preference Orderings
  5.8 Concluding Remarks

6 Robust Control and Model Misspecification
  6.1 Introduction
  6.2 Overview
  6.3 Three ordinary control problems
  6.4 Fear of model misspecification
  6.5 Two robust control problems defined on sets of probability measures
  6.6 Games on fixed probability spaces
  6.7 Sequential timing protocol for a penalty formulation
  6.8 Sequential timing protocol for a constraint formulation
  6.9 A recursive multiple priors formulation
  6.10 Concluding remarks
  6.A Cast of characters
  6.B Discounted entropy
  6.C Absolute continuity of solutions
  6.D Three ways to verify Bellman-Isaacs condition
  6.E Recursive Stackelberg game and Bayesian problem

7 Doubts or Variability?
  7.1 Introduction
  7.2 The equity premium and risk-free rate puzzles
  7.3 The choice setting
  7.4 A type I agent: Kreps-Porteus-Epstein-Zin-Tallarini
  7.5 A type I agent economy with high risk aversion attains HJ bound
  7.6 Reinterpretations
  7.7 Reinterpreting Tallarini
  7.8 Welfare gains from eliminating model uncertainty
  7.9 Dogmatic Bayesians and learning
  7.10 Concluding remarks
  7.A Formulas for trend stationary model

8 Robust Estimation and Control Without Commitment
  8.1 Introduction
  8.2 A control problem without model uncertainty
  8.3 Using martingales to represent model misspecifications
  8.4 Two pairs of operators
  8.5 Control problems with model uncertainty
  8.6 The θ1 = θ2 case
  8.7 Implied worst case model of signal distortion
  8.8 A recursive multiple priors model
  8.9 Risk sensitivity and compound lotteries
  8.10 Another example
  8.11 Concluding remarks

9 Fragile Beliefs and the Price of Uncertainty
  9.1 Introduction
  9.2 Stochastic discounting and risks
  9.3 Three information structures
  9.4 Risk prices
  9.5 A full-information perspective on agents' learning
  9.6 Price effects of consumers' concerns about robustness
  9.7 Illustrating the mechanism
  9.8 Concluding remarks
  9.A Detection error probabilities
  9.B Sensitivity analysis

10 Three types of ambiguity
  10.1 Introduction
  10.2 Illustrative model
  10.3 No concern about robustness
  10.4 Representing probability distortions
  10.5 The first type of ambiguity
  10.6 Heterogeneous beliefs without robustness
  10.7 The second type of ambiguity
  10.8 The third type of ambiguity
  10.9 Comparisons
  10.10 Numerical example
  10.11 Concluding remarks
  10.A Some basic proofs
  10.B Example without robustness
  10.C Example with first type of ambiguity
  10.D Sensitivity to robustness

Bibliography

Author Index

Subject Index


List of Figures

3.1 A (σ, β) locus
3.2 Consumption and investment
3.3 Two impulse response functions
3.4 Two more impulse responses
3.5 Likelihood function
3.6 Transitory and permanent endowment parts
3.7 Estimated innovations
4.1 Dominating function
4.2 An impulse response
4.3 Spectral density of consumption growth
4.4 Drift distortion
4.5 Impulse response for two incomes
4.6 Impulse response for persistent income
7.1 Hansen-Jagannathan bound
7.2 Detection error probabilities
7.3 Risk-free rate and market price of risk
7.4 Elimination of risk and uncertainty
7.5 Cost of model uncertainty
7.6 Worst-case consumption growth
9.1 Bayesian and Worst-Case Model Probabilities
9.2 Decomposition of Uncertainty Prices
9.3 Decomposition of Uncertainty Prices
9.4 (ι)Σ(ι)λ(ι)
9.5 Means
9.6 Means and Model Probabilities
9.7 Contributions to Uncertainty Prices
9.8 Learning and Risk Price
9.9 Means and Model Probabilities
9.10 Unknown Dynamics and Uncertainty Prices
9.11 Seven Models
9.12 Probabilities of Seven Models
10.1 Four types of ambiguity
10.2 Approximating and Worst-Case Models
10.3 More Approximating and Worst-Case Models


Chapter 1

Introduction

1.1 Overview of Model Uncertainty

These basic questions motivated us to write the papers in this book.

What is model uncertainty?

Model uncertainty is fear of model misspecification. For us, a model is a stochastic process, that is, a probability distribution over a sequence of random variables, perhaps indexed by a vector of parameters. Model uncertainty means that a decision maker suspects that his model is incorrect.

Why do we care about it?

Because

• It is difficult statistically to distinguish alternative models from samples of the sizes of typical macroeconomic data sets.

• Experiments by Ellsberg (1961) make the no-model-doubts aspect of the Savage (1954) axioms dubious.

As macroeconometricians, we will emphasize the first reason. Applied econometricians often emerge from model-fitting efforts acknowledging substantial doubts about the validity of their model vis-à-vis nearby, nearly equally well-fitting models. The second reason has led to work in decision theory that provides axiomatic foundations for some of the applied models that we put forward in this book.1

1See Gilboa and Schmeidler (1989) and Maccheroni et al. (2006a,b).


How do we represent it?

As a decision maker who has a set of models. The decision maker is unable or unwilling to reduce the set of models to a single model by putting a prior over that set of models and creating a compound lottery.

How do we manage it?

We construct bounds on value functions intended to apply for all members of the decision maker's set of models. Min-max expected utility is our tool for constructing bounds on value functions. We formulate a two-player zero-sum game in which a minimizing player chooses a probability distribution from a set of models and thereby helps a maximizing player to compute bounds on value functions. This procedure can be viewed as a way of evaluating the fragility of decision rules with respect to perturbations of a benchmark probability model.

Who confronts model uncertainty?

We as model builders do. So do agents inside our models, namely, private citizens and government policy makers.

How do we measure it?

By the size of the decision maker's set of statistical models, as measured by relative entropy. Relative entropy is an expected log likelihood ratio. Let f(x) be a probability density for a random variable x. A likelihood ratio is a nonnegative random variable m(x) with expected value unity: ∫m(x)f(x)dx = 1. Multiplication of f(x) by m(x) thus generates a perturbed probability density f̃(x) = m(x)f(x), since by construction f̃(x) integrates to 1. Relative entropy is defined as Em(x) log m(x) = ∫m(x) log m(x)f(x)dx. Relative entropy is a statistical measure of model discrepancy that tells how difficult it is to distinguish f(x) and f̃(x) statistically. In particular, it governs the rate of statistical learning as a sample size grows.
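A small numerical check may help fix these objects. The sketch below (Python, with an invented perturbation: a standard normal baseline f and a mean-shifted unit-variance alternative) verifies by Monte Carlo that the likelihood ratio m(x) has expectation one under f and that Em(x) log m(x) matches the closed-form relative entropy μ²/2 for that pair of densities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Baseline model f: standard normal.  Perturbed model: normal with mean mu
# and unit variance (an illustrative perturbation, not one from the book).
mu = 0.5

def log_m(x):
    # log likelihood ratio of the perturbed density to the baseline density
    return mu * x - 0.5 * mu**2

x = rng.standard_normal(1_000_000)            # draws from the baseline model f
m = np.exp(log_m(x))

print("E_f[m]            ~", m.mean())                 # should be close to 1
print("E_f[m log m]      ~", (m * log_m(x)).mean())    # Monte Carlo relative entropy
print("closed form mu^2/2 =", 0.5 * mu**2)
```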

An urban legend claims that when Shannon constructed his measure, he asked von Neumann what to call it. Supposedly von Neumann said: “Call it entropy. It is already in use under that name and besides, it will give you a great edge in debates because nobody knows what entropy is anyway.” While there are serious doubts about whether von Neumann actually said that to Shannon, perhaps the doubtful veracity of the story only enhances its appropriateness for us to quote in a book about model uncertainty.


How do we form a set of models?

We assume that a decision maker has a unique benchmark statistical model f(x) that we often call an ‘approximating model.’ We call it an approximating model to indicate that although it is the only explicitly formulated model possessed by the decision maker, he distrusts it. To express the decision maker's doubts about his approximating model, we surround that model with all models that lie within an entropy ball of size η. The decision maker is concerned that some model within this set might actually govern the data.

Notice that we don't specify particular alternative models with any detailed functional forms. We just characterize them vaguely with likelihood ratios whose relative entropies are less than η. In the applications to dynamic models described in this book, this vagueness leaves open possible nonlinearities and complicated history dependencies that are typically excluded in the decision maker's approximating model. Nonlinearities, history dependencies, and false reductions in dimensions of parameter spaces are among the misspecifications that concern the decision maker.

How big is the set of models?

It is uncountable. In applications, an approximating model f(x) usually incorporates concrete low-dimensional functional forms. The perturbed models f̃(x) = m(x)f(x) do not. An immense set of likelihood ratios m(x), most of which can be described only by uncountable numbers of parameters, lie within an entropy ball Em(x) log m(x) ≤ η. The sheer size of the set of models and the huge dimensionality of individual models within the set make daunting the prospect of using available data to narrow the set of models.

Why not learn your way out of model uncertainty?

Our answer to this question comes from the nature of the set of models that a decision maker thinks might govern the data. All papers in this book generate this set in the special way described above. First, we impute to the decision maker a single model that we refer to at different times either as his approximating model or his benchmark model. To capture the idea that the decision maker doubts that model, we surround it with a continuum of probability models that lie within a ball determined by a measure of entropy relative to the approximating model. That procedure gives the decision maker a vast set of models that are more or less close to the approximating model, where proximity is judged by relative entropy.

We intentionally generate the decision maker's set of models in this way because we want to express the idea that the decision maker is worried about models that are both vaguely specified and potentially difficult to distinguish from the approximating model using statistical discrimination tests. They are vaguely specified in the sense that they are described only as the outcome of multiplying the approximating model's probability density by a likelihood ratio whose relative entropy is sufficiently small. That puts on the table a huge number of models having potentially very high dimensional parameter spaces. Learning which of these models actually governs the data is a daunting, if not impossible, task.2 For example, as we shall see in applications in several chapters, specifications differentiated by their low frequency attributes are statistically especially difficult to distinguish (to learn those features, laws of large numbers and central limit theorems ask for almost infinite patience).

More broadly, the fact that a decision maker entertains multiple models puts us on unfamiliar ground in terms of an appropriate theory of learning. Bayesian learning theory depends on a decision maker's having a single model in our sense of model. A Bayesian knows the correct model from the beginning. A Bayesian learns by conditioning in light of a single model (i.e., a probability distribution over a sequence). How do you learn when you don't have a single model?

The applications in this book take two positions about learning. Chapters 2 through 7 exclude learning by appealing to the immense difficulty of learning. Chapters 8 and 9 describe and apply an approach to learning that imposes more structure on the decision maker's model uncertainty.

How does model uncertainty affect equilibrium concepts?

To appreciate how model uncertainty affects standard equilibrium concepts, first think about the now dominant rational expectations equilibrium concept. A rational expectations model is shared by every agent inside the model, by nature, and by the econometrician (please remember that by a model, we mean a probability distribution over a sequence). All econometric applications of rational expectations models heavily exploit this ‘communism of models.’ The ‘sharing with nature’ part precludes concerns about model misspecification from being analyzed in a coherent way within a rational expectations model.

2Sims (1971b) and Diaconis and Freedman (1986) describe the difficulty of using statistical methods to learn when parameter spaces are uncountable.

Our personal research histories and predilections as rational expectations econometricians make us want an equilibrium concept as close as possible to rational expectations. In our applications, we have accomplished this by attributing a common approximating model to all agents living in a model. While all agents share this approximating model, some of them fear that it is misspecified, prompting them to use a min-max expected utility theory. When agents' interests differ, that generates ex post belief heterogeneity even though agents share a common approximating model. This approach leads to an equilibrium concept that is an extension of either a recursive competitive equilibrium or a Nash or subgame perfect equilibrium.3

What does model uncertainty do to equilibrium quantities?

Chapter 3 describes how an increase in a representative consumer's model uncertainty has effects on quantities that operate much like an increase in his discount factor. This feature manifests itself in an observational equivalence result that defines a ridge in a likelihood function in a plane of a discount factor and a single parameter that we use to measure robustness. This happens because the consumer's fear of misspecification of the stochastic process governing his nonfinancial income induces a form of precautionary saving.

What does it do to equilibrium prices?

Despite the observational equivalence result for quantities, fear of model uncertainty multiplies the ordinary stochastic discount factor by what looks like a potentially volatile ‘preference shock’ from the point of view of the approximating model, but what is actually a likelihood ratio of a worst-case model to an approximating model.4 It appears because min-max portfolio holders' worst-case beliefs affect state-contingent prices. That gives rise to a ‘market price of model uncertainty.’ Several papers in this book document how including a market price of model uncertainty helps a model attain the asset pricing bounds of Hansen and Jagannathan (1991). It does that by increasing the volatility of the stochastic discount factor under an approximating model.

3There is also a connection to a self-confirming equilibrium.

4Notice that this statement is from the viewpoint of the approximating model, an important qualification that the reader of chapters 3, 4, 7, and 9 should keep in mind. In a setting with multiple models, the outside analyst has to adopt some unique model to proceed with econometric or quantitative work. Lars XXXXX: please read and sign off and possibly edit this footnote.

Does aversion to model uncertainty resemble risk aversion?

Yes, in some ways, but it activates attitudes about the intertemporal distribution of uncertainty that distinguish it from risk aversion. And risk aversion and model uncertainty aversion would be calibrated using very different mental experiments, as emphasized in chapters 4 and 7.

Can small amounts of uncertainty aversion substitute for large amounts of risk aversion?

The answer to this question plays a big role in sorting through alternative explanations of the equity premium puzzle of Hansen and Singleton (1983) and Mehra and Prescott (1985a). What ‘small’ and ‘large’ mean depends on calibration strategies. Risk means attitudes toward gambles described by completely trusted probability distributions. Macro and finance economists typically calibrate risk aversion using a mental experiment proposed by Pratt (1964). Chapters 4 and 7 use a very different mental experiment to calibrate reasonable amounts of aversion to model uncertainty, an experiment based on measures of statistical discrepancies between alternative statistical models.

How does model uncertainty affect government policy design problems?

Min-max portfolio holders' worst-case beliefs affect state-contingent prices. This can make a disciplined form of purposeful belief manipulation a concern for a Ramsey planner. Chapter 10 discusses some alternative ways to configure model uncertainties when a Ramsey planner faces a representative competitive agent.

1.2 Nine papers about model uncertainty

An LQG robust control problem with discounting

Chapter 2 reproduces “Discounted Linear Exponential Quadratic Gaussian Control.” It extends linear-quadratic-Gaussian dynamic programming by adding the risk adjustment of Jacobson (1973, 1977) and Whittle (1981, 1989a, 1990). Our contribution is to incorporate discounting in a way that is convenient for applications in macroeconomics and finance.

The risk-sensitivity operator
\[
\mathsf{T}U(x) = -\theta \log E\left[\exp\left(\frac{-U(x')}{\theta}\right)\,\middle|\,x\right], \qquad \underline{\theta} < \theta < +\infty,
\]
that makes its first appearance here will be featured throughout this book. The conditional expectation is with respect to a transition density F(x′|x) for a Markov state x. The transition density serves as a decision maker's baseline model. The idea of chapter 2 and much of the book is to replace conditional expectations of continuation values E[U(x′)|x] with TU(x) and then to proceed with business as usual for dynamic programming, game theory, and Ramsey planning.

What does replacing E with T mean? The operator T has two interpretations. The first regards T as making an additional adjustment for risk beyond what is expressed in the curvature of the utility function U. Applying T instead of E to a utility function U amounts to saying that the decision maker cares not just about expected utility but also about the variance of utility. In particular, the decision maker dislikes variance of utility. Except when U is quadratic, the decision maker cares about higher moments of utility too.

A second interpretation, and one that we shall stress in this book, comes from regarding T as the indirect utility function for a problem in which a minimizing agent chooses a distorted probability F̂(x′|x) that minimizes expected utility plus θ times the conditional entropy of F̂ relative to F:
\[
\mathsf{T}U(x) = \min_{m \ge 0,\; E(m|x)=1} \int m(x'|x)\left(U(x') + \theta \log m(x'|x)\right) dF(x'|x).
\]

Entropy equals Em log m and is a measure of the discrepancy between two densities that will feature throughout the book. The ratio of the minimizing probability density F̂ to F is given by the exponential twisting formula
\[
m(x'|x) = \frac{\hat{F}(x'|x)}{F(x'|x)} \propto \exp\left(\frac{-U(x')}{\theta}\right),
\]
an expression that Bucklew (2004, p. 27) characterizes as a stochastic version of Murphy's law: events occur with probabilities inversely related to their desirability. The worst-case probability distribution depends on the utility function and the actions that shape the transition probability F(x′|x) for the baseline model.
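The sketch below illustrates the two interpretations of T on a three-point distribution (all numbers are invented): it computes TU from the log E exp formula, forms the exponentially twisted worst-case probabilities, and confirms numerically that the minimized objective in the display above equals TU.

```python
import numpy as np

theta = 2.0                                    # penalty parameter (an assumed value)
f = np.array([0.2, 0.5, 0.3])                  # baseline transition probabilities F(x'|x)
U = np.array([1.0, 0.0, -2.0])                 # continuation utilities U(x')

# Risk-sensitivity operator: T U = -theta * log E[exp(-U/theta)]
TU = -theta * np.log(f @ np.exp(-U / theta))

# Exponentially twisted (worst-case) probabilities, proportional to f * exp(-U/theta)
f_hat = f * np.exp(-U / theta)
f_hat /= f_hat.sum()
m = f_hat / f                                   # likelihood ratio of worst case to baseline

# Minimized objective E[m (U + theta log m)] should equal T U
minimized = f @ (m * (U + theta * np.log(m)))
print(TU, minimized)                            # the two numbers agree
```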

Page 16: Maskin ambiguity book mock 2 - New York University

8 Chapter 1. Introduction

Why distort probabilities associated with a baseline model in this pessimistic way? The answer is that it can be a good thing to do if the decision maker doesn't completely trust F(x′|x) and wants to select a decision rule that will work well enough if the data are not generated by F. By designing a decision rule that is optimal against a worst-case density F̂(x′|x), the decision maker can assure himself acceptable performance if the data are generated by one from among a set of probability models surrounding the baseline model F(x′|x).

Two interrelated reasons motivated us to study discounted linear-quadratic-Gaussian problems. The log E exp adjustment associated with the T operator can be computed almost analytically when U is quadratic and F is conditionally Gaussian. Here, ‘almost analytically’ means ‘up to solving a matrix Riccati equation.’ The worst-case density can be computed by solving another Riccati equation. All of this means that for the linear-quadratic-Gaussian case, replacing E with T in the Bellman equation associated with the dynamic programming problem typically used in finance or macroeconomics creates no additional analytical challenges.
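As one concrete illustration of the Riccati-based computations referred to here, the following sketch iterates a robust Riccati map for a discounted linear regulator with cost x′Rx + u′Qu and law of motion x′ = Ax + Bu + Cw. It is written in the θ-penalty notation used elsewhere in this book rather than the σ notation of chapter 2, the matrices are arbitrary illustrative choices, and D is the standard worst-case adjustment of a quadratic value function; the exact mapping between this θ and chapter 2's σ is left to that chapter.

```python
import numpy as np

# Illustrative matrices for a small discounted robust linear regulator.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [0.5]])
R = np.eye(2)
Q = np.array([[0.1]])
beta, theta = 0.95, 10.0   # theta must keep theta*I - C'PC positive definite

def D(P):
    # Worst-case (log E exp) adjustment of a quadratic value function:
    # D(P) = P + P C (theta I - C'PC)^{-1} C' P
    k = np.linalg.solve(theta * np.eye(C.shape[1]) - C.T @ P @ C, C.T @ P)
    return P + P @ C @ k

P = np.zeros_like(R)
for _ in range(2000):      # iterate the composed Riccati map to a fixed point
    Pd = D(P)
    F = np.linalg.solve(Q + beta * B.T @ Pd @ B, beta * B.T @ Pd @ A)
    P = R + beta * A.T @ Pd @ (A - B @ F)

print("robust feedback rule u = -F x, with F =", F)
```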

With our eyes on applications in macroeconomics and finance, we incorporate discounting differently than Whittle (1989a, 1990). We discount future time t contributions to utility and to entropy in a way that is designed to deliver a time-invariant optimal decision rule for an infinite-horizon problem, a huge computational and econometric convenience. Whittle effectively discounts future time t contributions to utilities, but does not discount future contributions to entropy. A consequence of that is to render decision rules time-dependent in a way that makes effects from risk-sensitivity and concerns about model specification wear off with the passage of time, a feature that we do not like for many applications.

An application to a real business cycle model

Chapter 3 reproduces “Robust Permanent Income and Pricing,” written jointly with Thomas Tallarini. This paper formulates a planning problem associated with a linear-quadratic-Gaussian real business cycle model as a risk-sensitive LQG control problem of the type discussed in chapter 2. After constructing a competitive equilibrium whose allocation solves the planning problem, the paper uses competitive equilibrium prices to price risky assets. By reinterpreting a risk-sensitive control problem in terms of a fear of model misspecification, the paper constructs risk-return tradeoffs that mainly reflect market prices of model uncertainty. Quantitatively, the paper introduces ways of thinking about a question that recurs throughout the book: can a moderate fear of model misspecification substitute for a large amount of risk aversion in its impact on prices of risky securities?

Relative to a standard linear-quadratic-Gaussian real business cycle model, we bring one new free parameter – the θ that appears in the risk-sensitivity operator T. The paper studies the effects of hypothetical variations in θ on competitive equilibrium quantities and prices. The paper discovers that for quantities, variations in θ can be completely offset by appropriate variations in the discount factor β. There is a (β, θ) locus, movements along which preserve all competitive equilibrium quantities. An increase in fear of model misspecification, captured by a reduction in θ, operates like an increase in the discount factor. The paper interprets this effect in terms of a precautionary saving motive that differs from the precautionary saving motive coming from risk aversion. The precautionary saving associated with fears of model misspecification comes from the distortions those fears put into the conditional means of the income processes under which a household plans its saving, an effect that prevails even though the continuation value function is quadratic. By way of contrast, precautionary saving associated with risk aversion emerges because the third derivative of a consumer's continuation value function is positive. Tom and Lars XXXXX: double check the sign!

While movements along that (β, θ) locus leave equilibrium quantities unaltered, they do affect equilibrium prices. Along this locus, decreases in θ – meaning increases in fears of model misspecification – cause increases in what are usually interpreted as market prices of risk. Because they reflect fears of model misspecification, we refer to the components of those risk prices coming from θ as ‘market prices of model uncertainty.’

The existence of a (β, θ) locus implying identical equilibrium quantities but differing prices is an exact result in our linear-quadratic-Gaussian real business cycle model. It is also a very good approximation in the nonlinear real business cycle model of Tallarini (2000a). Exact or approximate, this striking result motivates the two-part quantitative strategy adopted both in our paper and in Tallarini (2000a). That quantitative strategy is first to use the method of maximum likelihood to estimate a real business cycle model by using data only on quantities and setting θ = +∞, thereby shutting down the planner's concerns about model misspecification.5 Our second step is then to study the consequences of movements along the (β, θ) locus on market prices of uncertainty, freezing all other parameters.

5A confession: originally, before knowing our observational equivalence results, we included both β and θ among the free parameters that we sought to estimate by maximum likelihood. Maximum likelihood estimation recovered a (β, θ) ridge in the likelihood function. That discovery sent us to work to prove the existence of such a locus.

Other parts of this paper prefigure ideas and procedures to be developed in subsequent chapters of this book, namely, judging quantitatively reasonable fears of model misspecification by the gap between a baseline model F and the worst-case model F̂ generated by exponential twisting according to
\[
\frac{\hat{F}}{F} \propto \exp\left(\frac{-U(x')}{\theta}\right).
\]
Subsequent chapters tighten the link between the market price of uncertainty, conditional mean distortions, and likelihood ratio tests for discriminating between models.

Four semigroups

Chapter 4, which reproduces “A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk, and Model Detection,” written jointly with Evan W. Anderson, describes a common mathematical structure – a semigroup – that underlies four concepts important in our analysis of the equilibrium consequences of a representative consumer's concerns about model misspecification. A semigroup is a collection of objects that satisfies conditions like those that underlie the law of iterated expectations, a workhorse familiar both to econometricians and to rational expectations theorists. These same conditions underlie a law of iterated values widely used in pricing assets of different maturities. The four semigroups featured in this paper pertain to (1) a Markov process that we interpret as a decision maker's baseline probability model; (2) a perturbation to that baseline model that we use to express an alternative specification that a decision maker or representative consumer fears might actually govern the data; (3) a stochastic discount factor that assigns prices to risks that a Markov baseline statistical model presents to a representative consumer; and (4) a stochastic process that generates bounds on errors in a good statistical test for discriminating between a baseline model and a perturbation to it.

This paper thus extends and formalizes ideas and applications first introduced in chapter 3. In particular, via our four semigroups, we describe intricate connections among bounds on detection error probabilities, the magnitude of a perturbed deviation from a baseline model, and market prices of uncertainty. We use these connections to guide a quantitative application to a continuous-time model of long-run consumption risk in the tradition of Bansal and Yaron (2004).

Two papers on robust control

“Robust Control and Model Uncertainty” and “Robust Control and Model Misspecification,” chapters 5 and 6, link versions of robust control theory to pertinent decision theories. These papers also set the stage for thinking about how to calibrate the penalty parameter θ in quantitative applications.

Chapter 5 introduces two preference orderings over consumption plans, one called ‘multiplier preferences’ and another called ‘constraint preferences.’ Both preference orders represent a decision maker's fear of model misspecification. The preference orderings over consumption or outcome paths differ, but their indifference curves are tangent along a given path. This tangency property is useful in all of our applications of multiplier and constraint preferences to asset pricing. The parameter θ appears in both preference orderings, but its meaning differs. For multiplier preferences, θ is a primitive parameter that scales the penalty that a minimizing agent pays in terms of discounted relative entropy. For constraint preferences, θ is an outcome, namely, a Lagrange multiplier on the discounted relative entropy that constrains a malevolent agent bent on minimizing the expected utility of a maximizing agent. For constraint preferences, the primitive parameter is not θ but the discounted relative entropy available to the minimizing agent, while for multiplier preferences, the discounted relative entropy associated with a given θ is an outcome.

Multiplier and constraint preferences are both instances of min-max expected utility theory. From a robust control theory point of view, a minimizing agent chooses probabilities to benefit an expected utility maximizing agent by teaching him about the fragility of his decision rule with respect to perturbations around his baseline probability model. Constraint preferences are a particular instance of the min-max expected utility preferences of Gilboa and Schmeidler (1989), while multiplier preferences have been axiomatized and generalized as variational preferences by Maccheroni et al. (2006a,b).

Chapter 6 describes five sequential problems ((1) a benchmark control problem, (2) a risk-sensitive problem, (3) a penalty or multiplier robust control problem, (4) a constraint robust control problem, and (5) an ex post Bayesian problem) and two non-sequential control problems, and explores relationships among them. Five Hamilton-Jacobi-Bellman (HJB) equations concisely summarize the sequential problems. The first two sequential problems assume complete trust in a baseline model, though they incorporate different attitudes towards the intertemporal distribution of risk. The final three sequential problems describe a decision maker who distrusts the baseline statistical model.

For the applied work to be described in subsequent chapters, especially chapter 7, a significant finding is that the risk-sensitive and the penalty or multiplier control problems are observationally equivalent. This opens the way to the reinterpretation of Tallarini (2000a) presented in chapter 7.

The fact that a Bellman-Isaacs technical condition allows the order of maximization and minimization to be exchanged makes possible an ex post Bayesian interpretation of a robust decision rule under either multiplier or constraint preferences. Here ex post means after minimization and after exchanging orders of maximization and minimization. The Bayesian interpretation of a robust decision rule is that it is a best response to the worst-case model, a feature of a robust decision rule discussed in chapter ?? that assures us that it is an admissible decision rule in the sense of Bayesian statistical decision theory. The ex post Bayesian interpretation of a robust decision rule plays an important role in the asset pricing theory developed and applied in chapters 3, 4, and 7. A robust representative consumer puts worst-case probabilities into state-contingent prices, making the ratio of worst-case to baseline probabilities a key determinant of risk prices when viewed from the perspective of the baseline probability model.6

An application to the Lucas-Tallarini debate

Chapter 7, ‘Doubts or Variability?,’ reinterprets a quantitative debate between Tallarini (2000a) and Lucas (2003) from the viewpoint of constraint and multiplier preferences. For a representative consumer with preferences
\[
E_0 \sum_{t=0}^{\infty} \beta^t \frac{c_t^{1-\gamma}}{1-\gamma},
\]
a value of γ set to 1 or 2, and a time series model of aggregate consumption c_t calibrated to fit post-WWII U.S. data, Lucas (1987a) estimated that there would be only small benefits to further reductions in the volatility of aggregate consumption around trend. From that calculation, he inferred that welfare gains that flow from boosting growth are much greater than those from further moderating business cycles.

6From the viewpoint of the baseline model, that likelihood ratio serves as an endogenous shock to instantaneous utility.


But Tallarini noted that the preference ordering Lucas had attributed to the representative consumer implied a stochastic discount factor that utterly failed to explain asset prices. Tallarini asserted that since asset prices reflect consumers' attitudes toward the business cycle risk reductions that Lucas assessed, it seems important to evaluate those risk reductions with a preference specification and parameter values that do a better job of approximating the asset pricing facts than does Lucas's specification. Lucas's preference specification left open both an equity premium puzzle (the market price of risk is far too low with Lucas's value of γ) and a risk-free rate puzzle (increasing γ drives the market price of risk up only by making the risk-free rate of interest far too high). To improve Lucas's estimate of the costs of business cycles, Tallarini wanted a preference specification capable of explaining these two features of asset prices. The heart of Lucas's model's problem is that a single parameter γ combines attitudes toward risk with attitudes toward intertemporal substitution. Tallarini recognized that the preference specification of Kreps and Porteus (1978a) and Epstein and Zin (1989a) that separates risk aversion from intertemporal substitution is the right tool for the job. Tallarini used the finding from chapter ?? that multiplier preferences and risk-sensitive preferences are observationally equivalent and the fact that risk-sensitive preferences are Kreps-Porteus-Epstein-Zin preferences for the special case in which the elasticity of substitution equals unity. For Tallarini, locking the intertemporal elasticity of substitution at unity was useful because it arrested the risk-free rate puzzle. He adopted risk-sensitive preferences and interpreted θ as a (transformation of a) parameter measuring aversion to atemporal gambles, namely,

\[
\theta = \frac{-1}{(1-\beta)(1-\gamma)},
\]

where γ is a coefficient of relative risk aversion and β is a discount factor. He showed how to select a value of γ and therefore θ that generates a stochastic discount factor capable of matching both the market price of risk and the risk-free rate in US data. That value of θ implies a very high risk aversion coefficient γ. When Tallarini used that value of γ to compute the welfare benefits of further reductions in business cycle volatility, he obtained an estimate substantially larger than Lucas's.
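As a back-of-the-envelope check on this mapping, with purely illustrative values of β and γ (not Tallarini's estimates):

```python
beta, gamma = 0.995, 50.0   # illustrative values only, not Tallarini's estimates

theta = -1.0 / ((1.0 - beta) * (1.0 - gamma))
print(theta)                # -1 / (0.005 * (-49)) = 4.08...: a large gamma maps into a moderate theta
```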

Tallarini's findings failed to convince Lucas (2003), who asserted that the asset pricing data, when interpreted with the Lucas (1978) asset pricing model, are just not a reliable source of evidence about the risk aversion parameter pertinent for measuring the costs of business cycles. Lucas suggested that those asset price puzzles would ultimately be explained by appealing to economic forces other than risk aversion. Chapter 7 takes that hint seriously and runs with it.

To meet Lucas's challenge, chapter 7 exploits three findings described in earlier chapters: (1) the observational equivalence of risk-sensitive and multiplier preferences; (2) the fact that multiplier and constraint preferences have indifference curves that are tangent at a given allocation; and (3) the connection between the relative entropy limiting constraint preferences and the detection error probabilities used to calibrate θ in chapter 4. Thus, the chapter interprets θ as a Lagrange multiplier on the relative entropy parameter in constraint preferences and uses detection error probabilities to show that a moderate amount of concern about model misspecification under constraint preferences can substitute for the substantial risk aversion that provoked Lucas's skepticism and dismissal of Tallarini's redone computation of the welfare costs of business cycles. The chapter goes on to argue that most of the big welfare costs found by Tallarini pertain not to the mental experiment cast by Lucas – which is about a reduction in pure, well-understood risk – but instead to a quite different mental experiment about a response to a reduction in model uncertainty as measured by the size of the set of probability models surrounding the baseline model.

The chapter 7 reinterpretation of Tallarini rests on features of the worst-case probability model associated with a Lagrange multiplier θ capable of approximating the asset pricing data. It is as if the minimizing agent who chooses that worst-case model had read Lucas's paper with its stress on the importance of growth instead of volatility as a source of welfare gains. When the baseline model for the log of aggregate consumption is a random walk with drift, the worst-case model has a lower drift but an unaltered volatility. This outcome reflects that it is much cheaper for the minimizing agent to harm the maximizing agent by spending his relative entropy budget in reducing the drift than in increasing the volatility. The chapter links this outcome to analogous findings in chapter 6 about absolute continuity over finite intervals and the detection error probability calculations in chapter 4.
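A minimal sketch of the detection error probability calculation invoked here, under assumed parameter values: two i.i.d. Gaussian models for log consumption growth that share a volatility but differ in drift, a fixed sample length, and the convention that the detection error probability averages the two ways of misreading the likelihood ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model A (baseline) and model B (lower-drift worst case); all values are hypothetical.
mu_a, mu_b, sigma, T, n_sim = 0.005, 0.003, 0.01, 235, 10_000

def mistake_freq(mu_true, mu_other):
    """Frequency with which the log likelihood ratio favors the wrong model."""
    g = rng.normal(mu_true, sigma, size=(n_sim, T))
    # log L(other) - log L(true), summed over each simulated sample
    llr = ((g - mu_other) ** 2 - (g - mu_true) ** 2).sum(axis=1) / (-2.0 * sigma**2)
    return (llr > 0).mean()

p_a = mistake_freq(mu_a, mu_b)      # data generated by A, but B fits better
p_b = mistake_freq(mu_b, mu_a)      # data generated by B, but A fits better
print("detection error probability ~", 0.5 * (p_a + p_b))
```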

Two papers on Hidden Markov Models

Couldn't a long-lived decision maker use observations that accumulate with the passage of time to diminish model uncertainty? Chapters 2 through 7 put agents in settings in which they either can't or aren't allowed to learn. To rationalize that modeling choice, chapters 4 and 7 use detection error probabilities to set discounted entropy to levels that make it difficult to discriminate between models with the amount of time series data available in our applications. Chapters 2 through 7 thus took learning off the table and studied the consequences of decision making in settings where the alternative specifications that concern the decision maker are vast in number and statistically close to the decision maker's baseline model.

The next two chapters turn to settings where there is something to learn, in particular, either a finite set of parameters, or which among a finite set of models governs the data. In these chapters, the decision maker's baseline model becomes a Hidden Markov Model (HMM), a natural setting for formulating parameter estimation and model selection problems. Chapter 8 reproduces “Robust Estimation and Control Without Commitment,” which, together with two papers not included here (Hansen and Sargent (2005b, 2011)), studies alternative approaches to robust learning. Chapter 9, “Fragile Beliefs and the Price of Uncertainty,” applies the theory developed in chapter 8 to a quantitative model of risk prices in an economy in which aggregate consumption might or might not contain the long-run risk posited by Bansal and Yaron (2004).

The beauty of Bayes' Law is that learning just means applying the mathematical properties of conditional expectations to a single joint probability distribution. When a decision maker fully trusts a baseline HMM, Bayes' Law becomes a complete theory of sequential learning suitable for joint estimation and decision making. But our robust decision maker doesn't trust the baseline HMM and surrounds it with a large set of other statistical models. How should we think about learning in the presence of a vast set of probability models? An approach that we don't take would be to apply Bayes' law to all of them, then somehow apply a version of min-max decision theory at each date. Instead, chapter 8 presents an approach that applies Bayes' Law only once at each date, to the baseline model, but then expresses and copes with doubts about the posterior probability distributions over hidden Markov states that emerge from Bayes' law at each date. Our tool for expressing those doubts is again a T operator that exponentially twists a pertinent value function. Thus, our approach combines ‘business as usual’ Bayesian learning with an application of min-max decision theory to manage specification doubts.

This strategy leads us to use two T operators, each with its own θ.


A first T1 operator conditions on knowledge of the hidden Markov state and adjusts for doubts about conditional probabilities associated with the baseline model. A second T2 operator adjusts for doubts about the posterior over hidden states coming from Bayes' law under the baseline model. We describe a value function recursion that uses T1 to replace an expectation conditional on hidden Markov states and T2 to replace an expectation over the distribution over hidden states that emerges from Bayes' law applied to the decision maker's HMM.7
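A schematic of the T2 step alone may make the recursion concrete. In the sketch below, θ2, the posterior probabilities, and the submodel continuation values are all placeholders (in the book those values would already embed a T1 adjustment): the operator exponentially tilts Bayes' law's posterior toward the submodel with the lower continuation value and returns the corresponding adjusted value.

```python
import numpy as np

theta2 = 1.0                  # penalty on distortions of the posterior (an assumed value)
post = np.array([0.5, 0.5])   # Bayes' law posterior over two submodels
V = np.array([2.0, -1.0])     # continuation values conditional on each submodel

# T2 exponentially twists the posterior toward low-value (pessimistic) submodels
tilted = post * np.exp(-V / theta2)
worst_case_post = tilted / tilted.sum()
T2V = -theta2 * np.log(tilted.sum())

print("worst-case posterior:", worst_case_post)
print("T2-adjusted value   :", T2V)
```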

Chapter 9 uses these two operators to create a long-run consumption risk model in which a representative consumer distrusts an HMM designed to build on and modify some ideas from the long-run risk paper of Bansal and Yaron (2004). Bansal and Yaron motivated their original long-run risk model by noting that while the log of per capita consumption in the post-WWII US is well modeled as a random walk with positive and constant drift, it is statistically difficult to distinguish that model from another that makes the log of aggregate per capita consumption a random walk with a drift that is itself a highly persistent process with small conditional volatility and substantially larger unconditional volatility, a model that exposes a representative consumer to ‘long-run risk.’ Bansal and Yaron used the difficulty of distinguishing those two models to justify positing that a representative consumer puts probability one on a long-run risk model and ignores the equally good-fitting i.i.d. log consumption growth model.

Chapter 9 departs from Bansal and Yaron by positing a representative consumer who responds to the existence of those two good-fitting models by keeping both of them on the table and attaching equal initial prior probabilities to them. A dummy variable ι ∈ {0, 1} that indexes the two submodels becomes a hidden Markov state in the representative consumer's baseline probability model. Our representative agent distrusts each of the two submodels of its baseline model, prompting it to apply T1 to each of them; and it distrusts the posterior probability that ι = 1 that emerges from Bayes' law, prompting it to apply T2 to it. This behavior induces countercyclical fluctuations in market prices of uncertainty that come from the way the T2 operator endogenously induces pessimism in the sense that the representative consumer views bad news as permanent and good news as temporary. This effect works through the impact of consumption growth rate surprises on the worst-case posterior probability that ι = 1.

7By committing the decision maker to distortions chosen in the past, Hansen and Sargent (2005b) takes a closely related but different approach than chapter 8. Lars XXXXXX: you might want to expand this footnote.

Three uncertain Ramsey planners

Chapters 2 through 9 formulate and apply single-agent decision theories, while sometimes appealing to tricks that allow us to compute competitive equilibria by resorting to appropriate versions of the two fundamental theorems of welfare economics. Chapter 10, “Three types of ambiguity,” is devoted to studying a multi-agent problem known as a Ramsey problem. Here a benevolent ‘leader’ or Ramsey planner at time 0 once and for all designs a history-dependent strategy for choosing policy instruments, taking into account how purposeful competitive agents who choose sequentially respond to that strategy. The Ramsey planner takes account of private agents' responses to its strategy by including a set of private sector Euler equations as constraints (called ‘implementability constraints’) on the Ramsey planner's decision problem.

Our departure point for chapter 10 is a rational expectations equilibrium. All agents inside a rational expectations model share a common probability model, an assumed communism of beliefs that immensely economizes on the parameters needed to characterize different agents' beliefs. When some agents inside an equilibrium model themselves use multiple models as a way to express their uncertainty, we need some device to replace that form of rational expectations communism. In chapter 10, we assume that all agents share a common baseline or approximating model, but allow them to have differing degrees of doubt about the baseline model. Chapter 10 discusses the consequences of three types of doubt that a Ramsey planner might have about a baseline model. By way of contrast, other researchers have focused on formulations of Ramsey problems in which a Ramsey planner trusts a baseline model that private agents doubt.8

8Ramsey problems with a different type of ambiguity are analyzed by Karantounias (2012) and Orlik and Presno (2012). In their work, a Ramsey planner completely trusts a baseline model but thinks that private agents have a set of models contained in an entropy ball surrounding the planner's model. The Ramsey planner takes into account how its actions influence private agents' choice of a worst-case model along the boundary of that surrounding set of models. Part of the challenge for the Ramsey planner is to evaluate the private agent's Euler equation using the private agent's worst-case model. Through its choice of actions that affect the equilibrium allocation, the planner manipulates private agents' worst-case model.


Under type I ambiguity, the Ramsey planner has a set of models centered on a baseline model that reflects the planner's uncertainty about both the evolution of the exogenous processes and how the private sector views these processes. The planner believes that private agents know a probability specification that actually governs the data and that resides within a set of models surrounding the planner's baseline model. To cope with its model uncertainty, the Ramsey planner's alter ego chooses a worst-case model from that set, while evaluating private sector Euler equations using that worst-case model.

In the spirit of Hansen and Sargent (2008b, ch. 16), a Ramsey planner with type II ambiguity has a set of models surrounding a baseline model that private agents share with the planner but completely trust. The Ramsey planner's probability-minimizing alter ego chooses a worst-case model from within a set surrounding the baseline model, while evaluating the private agent's Euler equations using the planner's baseline model.

Following Woodford (2010), a Ramsey planner with type III ambiguity has a single model of exogenous processes and thus no ambiguity along this dimension, but it faces ambiguity because it knows only that the private sector's model lies within an entropy ball surrounding its own model. The Ramsey planner evaluates the private sector's Euler equations using a worst-case model chosen by the Ramsey planner's alter ego.

In all of this work, we extensively use the chapter 6 characterization of probability perturbations as martingales with respect to a baseline probability model. Depending on the type of ambiguity under study, these martingales do or don't appear in the private sector's Euler equations that form implementability constraints for the Ramsey planner. A minimizing agent's decision comes down to choosing the drift in these martingales. As we sort through the three types of ambiguity, we have to be careful to put martingales in the correct places.


Chapter 2

Discounted Linear Exponential Quadratic Gaussian Control

Abstract

We describe a recursive formulation of discounted costs for a linear quadratic exponential Gaussian linear regulator problem which implies time-invariant linear decision rules in the infinite horizon case. Time invariance in the discounted case is attained by surrendering state-separability of the risk-adjusted costs.

This paper formulates a version of a discounted Gaussian optimal linear regulator in which the return function is modified as suggested in Jacobson (1973), Jacobson (1977), Whittle (1981), Whittle (1989a), and Whittle (1990) to incorporate a risk adjustment. In Jacobson (1973), the problem is formulated for the undiscounted case. Contributions ? and Whittle (1990) described how recursions on a Riccati difference equation apply to a discounted version of the problem. In their formulation with discounting, the optimal decision rules fail to be time-invariant: over time the effects of the risk parameter ‘wear off,’ and the decision rules eventually converge to what would prevail in the usual linear-quadratic case.

We propose an alternative discounted version of the problem that preserves time-invariance of the decision rules in the infinite-horizon problem. We attain this desirable outcome by specifying the cost function recursively, and by surrendering the assumption embedded in previous formulations that the risk-adjusted measure of cost is separable across states of the world.


2.1 Cost Formulation

Let {Jt : t = 0, 1, . . .} denote an increasing sequence of information sets (sigma algebras); and {wt : t = 1, 2, . . .} an m-dimensional sequence of independently and identically normally distributed random vectors with mean zero and covariance matrix I, where wt+1 is independent of Jt and wt is measurable with respect to Jt. Let {xt : t = 0, 1, . . .} denote an n-dimensional sequence of state vectors that evolve according to:

xt+1 = Axt +But + Cwt+1, t = 0, 1, . . . , (2.1)

where x0 is a given initial vector that can be random but is restricted to be measurable with respect to J0. In (2.1), {ut : t = 0, 1, . . .} is a k-dimensional sequence of control vectors where ut is restricted to be measurable with respect to Jt. Let β ∈ (0, 1) be a discount factor.

We use the following risk-adjusted measure of cost for each period t = 0, . . . , T − 1:

Ct,T = u′tQut + x′tRxt − (2β/σ) log E[exp(−σCt+1,T /2) | Jt],   CT,T = 0.   (2.2)

When β = 1, the time zero cost C0,T can be computed using only the time zero conditional expectation operator E(·|J0).1 But when β is strictly less than one, computation of C0,T with recursion (2.2) uses the conditional expectation operators E(·|J1), . . . , E(·|JT−1) as well. Therefore, when β < 1, specification (2.2) relaxes the assumption of state-separability axiomatized by von Neumann and Morgenstern (1944a).2 We shall show that the functional form used in (2.2) has the features that: (1) the value functions are quadratic functions of the state vector, as in the familiar optimal linear regulator problem and in its risk-adjusted version suggested by Jacobson (1973) and Whittle (1981); (2) the statistics of the noise process influence the optimal decision rules in a way that depends on the value of σ; and (3) the infinite time horizon problem is well posed and yields a time-invariant optimal linear control law.

1Alternatively, the discount factor β could be placed inside the exponential function in (2.2). This would result in an equivalent risk-adjusted measure of costs when the risk-adjustment parameter σ is replaced by σ/β.

2In abandoning state-separability to introduce independent adjustments for risk into objective functions, we are following Kreps and Porteus (1978a), Epstein and Zin (1989b), Weil (1990), and Weil (1993).


2.2 Cost Recursions and Aggregator Functions

To characterize some properties of our discounted, risk-adjusted costs we follow Koopmans (1960), Kreps and Porteus (1978a), Lucas and Stokey (1984a), Epstein and Zin (1989b) and use an aggregator function α to represent costs recursively. An aggregator function maps hypothetical controls, states, and next period costs into current period costs. Let U be a space of k-dimensional random control vectors, X an n-dimensional space of random state vectors, and L+ the set of nonnegative scalar random variables. For convenience, we let the random variables in L+ attain the value +∞ for some states of the world. Our aggregator function maps U × X × L+ → L+ and is indexed by a sigma algebra J :

α(u, x, Γ | J ) = u′Qu + x′Rx + βρ(Γ | J ),
ρ(Γ | J ) ≡ −(2/σ) log E[exp(−σΓ/2) | J ].   (2.3)

The transformation ρ adjusts next period's costs for risk prior to discounting. If x and u are measurable with respect to J , then α(u, x, Γ | J ) is also. For a hypothetical control u, state x, and next period cost Γ, α(u, x, Γ | J ) is the discounted risk-adjusted current period cost. Notice that (2.2) can be expressed in terms of α:

Ct,T = α(ut, xt, Ct+1,T | Jt) . (2.4)

We establish some useful properties of the aggregator function. Our first result will be used when we extend the time horizon to infinity.

Lemma 2.2.1. The function ρ(· | J ) is monotone increasing in Γ.

Proof. Suppose that Γ2 ≥ Γ1 ≥ 0, then

ρ(Γ2 | J ) − ρ(Γ1 | J ) = −(2/σ){log E[exp(−σΓ2/2) | J ] − log E[exp(−σΓ1/2) | J ]}
                        = −(2/σ) log{E[exp(−σΓ2/2) | J ] / E[exp(−σΓ1/2) | J ]}
                        ≥ 0.

Our next result can be applied to establish that the cost criterion is globally convex in the controls and states when σ is negative.


Lemma 2.2.2. ρ(· | J ) is convex in Γ when σ ≤ 0.

Proof. Consider any nonnegative random variables Γ1 and Γ2 and their convex combination ωΓ1 + (1 − ω)Γ2 for some 0 < ω < 1. Then

log E{exp[−(σ/2)ωΓ1 − (σ/2)(1 − ω)Γ2] | J } = log E{[exp(−σΓ1/2)]^ω [exp(−σΓ2/2)]^(1−ω) | J }
 ≤ log({E[exp(−σΓ1/2) | J ]}^ω {E[exp(−σΓ2/2) | J ]}^(1−ω))
 = ω log E[exp(−σΓ1/2) | J ] + (1 − ω) log E[exp(−σΓ2/2) | J ],

where the inequality follows from a conditional version of the Hölder inequality.

The aggregator α is convex in u and x because Q is positive definite and R is positive semidefinite. Because α is additively separable in its three arguments (u, x, Γ) and ρ(· | J ) is convex in Γ when σ is negative, α is then convex in all three arguments. Because the composition of a convex function with an increasing convex function is convex (Lemma 2.2.1 supplies the required monotonicity), the cost measure Ct,T is convex in the control/state sequence between time t and time T. Increasing −σ makes the cost criterion more convex and hence risk adjustments more pronounced. For this reason, we are particularly interested in the σ < 0 case.

2.3 Infinite Horizon Costs

To formulate the infinite-horizon optimization problem, we first must verify that the associated costs are well defined. This leads us to study the time t cost criterion when the time horizon T increases. We use the result from Lemma 2.2.1 that the function ρ and hence the aggregator α is monotone increasing in Γ. It follows from this monotonicity that for any control-state sequence {(ut, xt) : t = 0, 1, . . .}, {Ct,T : T = t, t + 1, . . .} converges almost surely to a limit cost Ct, although this random variable might be infinite (+∞).

The infinite horizon control problem can be formalized as follows. A feasible control process U = {ut : t = 0, 1, ...} is a stochastic process of controls adapted to the sequence of sigma algebras {Jt : t = 0, 1, ...}. For any such control process we define recursively via (2.1) a corresponding state vector process X = {xt : t = 0, 1, ...} adapted to the same sequence


of information sets, where we take the initial x0 as prespecified. Given U and hence X, compute the time zero cost C0 by evaluating the almost sure limit of {C0,T : T = 1, 2, ...}. Define the time zero infinite horizon cost associated with a feasible control process U to be K(U |x0). The infinite horizon problem is to minimize K(U |x0) by choice of a feasible control process U for each initialization x0.

For some of our analysis we impose:

Assumption 2.3.1. There exists a matrix F for which the absolute values of all eigenvalues of A − BF are less than β^(−1/2).

In light of Lemmas 2.2.1 and 2.2.2 and the previous discussion, we have the following:

Theorem 2.3.2. When σ is negative, K(·|x0) is convex. When σ is positive and Assumption 2.3.1 is satisfied, there exists a feasible control process U such that K(U |x0) is finite.

Proof. We have already argued that the cost criterion is convex in the control process for the finite horizon problem when σ ≤ 0 (see Lemma 2.2.2 and the ensuing discussion). The convexity of the infinite horizon problem follows by taking almost sure limits when the horizon is extended.

To show that the limiting cost Ct can be made finite for some control process when σ is positive, we use the σ = 0 case as a benchmark. The aggregator when σ = 0 is given by α∗(u, x, C|J ) = u′Qu + x′Rx + βE(C|J ). By the convexity of the exponential function, E[exp(−σC/2)|J ] ≥ exp[−(σ/2)E(C|J )]. Taking logarithms and multiplying by (−2β/σ) it follows that

α(u, x, C|J ) ≤ α∗(u, x, C|J ) for σ ≥ 0. (2.5)

Given inequalities (2.5) and the monotonicity of α and α∗ in C, it can be established that

Ct ≤ C∗t for σ ≥ 0, (2.6)

where C∗t is the time t, σ = 0 cost. It follows from (2.6) that for σ > 0, the 'adjusted' cost will be finite whenever the 'unadjusted' (or ∗) cost is finite. In light of Assumption 2.3.1, there exists a time-invariant control law that makes the 'unadjusted' cost and hence the adjusted cost finite.


Unfortunately, the inequalities in the proof of Theorem 2.3.2 work in the wrong direction for the case in which σ < 0. In that case, the 'risk adjusted' cost can be infinite when the 'unadjusted' cost is finite. The finding that the cost criterion can be infinite when σ is strictly negative closely parallels results in Jacobson (1973).

2.4 Arbitrary Time-invariant Linear Control Laws

As a prelude to showing that the solution to the infinite horizon optimal control problem is linear and time invariant, we characterize the cost associated with any such law. Suppose that ut = −Fxt. We can compute our cost criterion under this control law by doing recursions on the aggregator function α evaluated at the linear control and a quadratic representation of next period's costs. Let V be a positive semidefinite matrix and c be a nonnegative real number. Note that

α(−Fx, x, y′V y + c | J ) = x′R∗x+ βρ(y′V y + c | J ) (2.7)

where

y = A∗x + Cw,   (2.8)

A∗ ≡ A − BF, R∗ ≡ F ′QF + R, and w is normally distributed conditioned on J with mean zero and covariance matrix I. We use the following formula from Jacobson (1973) to compute the right side of (2.7):3

ρ(y′V y + c | J ) = x′A∗′[V − σV C(I + σC ′V C)−1C ′V ]A∗x + c + (1/σ) log det(I + σC ′V C).   (2.9)

This formula only works when (I + σC ′V C) is positive definite. If (I + σC ′V C) has a nonpositive eigenvalue, then the left side of (2.7) is infinite (where we interpret log(+∞) = +∞). Substituting (2.9) into (2.7), we obtain

α(−Fx, x, y′V y + c | J ) = x′S(V )x+ U(V, c) (2.10)

where

S(V ) ≡ R∗ + βA∗′[V − σV C(I + σC ′V C)−1C ′V ]A∗ (2.11)

3This is a special case of formulas (33) and (38) in Jacobson (1973), where Pk in Jacobson's formulas is taken to be the identity matrix.


and

U(V, c) ≡ βc+ (β/σ)[log det(I + σC ′V C)]. (2.12)
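As a concrete, purely illustrative check on formula (2.9), the following Python sketch compares the closed form with a Monte Carlo evaluation of ρ for made-up matrices A∗, C, V and an arbitrary state x; none of the numerical values come from the text, and the example simply assumes the Gaussian setup described above.

```python
import numpy as np

# Illustrative check of formula (2.9): the closed form for
# rho(y'Vy + c | J) = -(2/sigma) log E[exp(-sigma (y'Vy + c)/2) | J]
# when y = A*x + Cw with w ~ N(0, I), against a Monte Carlo estimate.
# All matrices and parameters below are invented placeholders.

rng = np.random.default_rng(0)
sigma = -0.2                        # risk-adjustment parameter (sigma < 0 case)
Astar = np.array([[0.9, 0.1],
                  [0.0, 0.8]])      # closed-loop transition matrix A - BF
C = np.array([[0.3, 0.0],
              [0.1, 0.2]])          # shock loading
V = np.array([[1.0, 0.2],
              [0.2, 0.5]])          # positive semidefinite "next period" matrix
c = 0.4
x = np.array([1.0, -0.5])           # current state

M = np.eye(2) + sigma * C.T @ V @ C
assert np.all(np.linalg.eigvalsh(M) > 0), "(I + sigma C'VC) must be positive definite"

# Closed form (2.9)
Vadj = V - sigma * V @ C @ np.linalg.solve(M, C.T @ V)
rho_closed = x @ Astar.T @ Vadj @ Astar @ x + c \
    + (1.0 / sigma) * np.log(np.linalg.det(M))

# Monte Carlo evaluation of rho
w = rng.standard_normal((500_000, 2))
y = (Astar @ x)[None, :] + w @ C.T
quad = np.einsum('ij,jk,ik->i', y, V, y) + c
rho_mc = -(2.0 / sigma) * np.log(np.mean(np.exp(-sigma * quad / 2.0)))

print(f"closed form: {rho_closed:.4f}   Monte Carlo: {rho_mc:.4f}")
```

The positive definiteness condition on (I + σC′V C) noted in the text shows up as the assertion in the sketch; when it fails, the expectation inside ρ diverges.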

Consequently, with the Gaussian information structure, the aggregator α maps a translated quadratic cost measure for next period's costs into a translated quadratic cost measure today. Since the infinite-horizon cost is computed by iterating on α as in (2.4), the resulting initial period cost measure, when finite, is quadratic in the initial state vector plus a constant, say x′0V∗x0 + c∗.

Theorem 2.4.1. Suppose that ut = −Fxt for t = 0, 1, ... and that the corresponding time zero cost C0 is finite. Then C0 = x′0V∗x0 + c∗ for some positive semidefinite matrix V∗ and some nonnegative scalar c∗ that are the smallest solutions to:

V∗ = S(V∗),   c∗ = U(V∗, c∗).   (2.13)

Proof. Recall that C0 is defined to be the almost sure limit point of the monotone sequence {C0,j : j = 1, 2, ...}. Let Sj(V ) denote the jth iterate of the transformation S evaluated at a matrix V. Similarly, define

Uj(V, c) ≡ U[Sj−1(V ), Uj−1(V, c)].   (2.14)

Then by applying formula (2.10) repeatedly, it follows that

C0,j = x′0Sj(0)x0 + Uj(0, 0),

where we have imposed the terminal cost restriction Cj,j = 0. Since the cost sequence is finite by assumption and converges almost surely for any initialization x0, it must be that the sequence of positive semidefinite matrices {Sj(0) : j = 1, 2, ...} converges (entry by entry), as does the scalar sequence {Uj(0, 0) : j = 1, 2, ...} of nonnegative numbers. Let V∗ and c∗ denote the respective limit points. From (2.12) and (2.14), it follows that Uj(0, 0) ≥ (β/σ)[log det(I + σC ′Sj−1(0)C)]. Taking limits it follows that c∗ ≥ (β/σ)[log det(I + σC ′V∗C)], and hence (I + σC ′V∗C) is nonsingular. It follows that the operator S is continuous, which in turn implies that V∗ is a fixed point of S. The convergence of {Uj(0, 0)} to c∗ together with the continuity of U at (V∗, c∗) ensures that c∗ = U(V∗, c∗). Therefore, (V∗, c∗) solves (2.13).


Let (V̄, c̄) be any other solution of (2.13) where V̄ is a positive semidefinite matrix and c̄ is a nonnegative real number. By the monotonicity of α in its third argument, it follows that x′0V̄x0 + c̄ ≥ x′0Sj(0)x0 + Uj(0, 0). Taking limits of the right side as j → ∞ it follows that x′0V̄x0 + c̄ ≥ x′0V∗x0 + c∗. Since this inequality holds for any initialization of x0, V̄ ≥ V∗ and c̄ ≥ c∗. The inequality involving the V's can be established by making x0 large in several different directions, while the inequality entailing the c's can be verified by setting x0 to zero.

Remark 2.4.2. Since the S transformation has the form of the operator associated with a matrix Riccati equation, we can construct an optimal control problem which has the smallest fixed point of S for its solution. Associated with this problem are discounted, infinite-horizon versions of notions of 'pessimism' and 'optimism' in Jacobson (1973) and Whittle (1981). See Hansen and Sargent (2013a) for details.
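To illustrate Theorem 2.4.1 numerically, here is a small Python sketch that evaluates an arbitrarily chosen stabilizing rule ut = −Fxt by iterating the mappings S and U of (2.11)–(2.12) from V = 0, c = 0. Every matrix and parameter value in it (A, B, C, Q, R, F, β, σ) is an invented placeholder rather than anything taken from the chapter.

```python
import numpy as np

# Evaluate a fixed linear rule u_t = -F x_t by iterating the maps
# S(V) and U(V, c) of (2.11)-(2.12) from (0, 0), as in Theorem 2.4.1.
# All matrices and parameters below are illustrative placeholders.

beta, sigma = 0.95, -0.1
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
C = 0.1 * np.eye(2)
Q = np.array([[1.0]])
R = np.diag([1.0, 0.5])
F = np.array([[0.1, 0.4]])           # an arbitrary stabilizing rule

Astar = A - B @ F                     # closed-loop transition matrix
Rstar = F.T @ Q @ F + R

def S_map(V):
    M = np.eye(2) + sigma * C.T @ V @ C
    if np.min(np.linalg.eigvalsh(M)) <= 0:
        raise ValueError("(I + sigma C'VC) lost positive definiteness: cost is infinite")
    Vadj = V - sigma * V @ C @ np.linalg.solve(M, C.T @ V)
    return Rstar + beta * Astar.T @ Vadj @ Astar

def U_map(V, c):
    M = np.eye(2) + sigma * C.T @ V @ C
    return beta * c + (beta / sigma) * np.log(np.linalg.det(M))

V, c = np.zeros((2, 2)), 0.0
for _ in range(2000):
    V_new, c_new = S_map(V), U_map(V, c)
    if np.max(np.abs(V_new - V)) < 1e-12 and abs(c_new - c) < 1e-12:
        V, c = V_new, c_new
        break
    V, c = V_new, c_new

print("V* =\n", V)
print("c* =", c)
x0 = np.array([1.0, 1.0])
print("cost of the rule at x0:", x0 @ V @ x0 + c)
```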

2.5 Solution to the Infinite Horizon Discounted Problem

We solve the infinite-horizon problem by first deriving discounted versions of the recursions obtained by Jacobson (1973). Consider the two-period problem:

min_{u,y}  α(u, x, y′Wy + d | J )   subject to   y = Ax + Bu + Cw   (2.15)

where w is an m-dimensional normally distributed random vector with mean zero and covariance matrix I that is independent of the state vector x, and where (I + σC ′WC) is positive definite. Using the previously mentioned formula from Jacobson (1973), we are led to solve a corresponding unconstrained quadratic optimization problem:

min_u  {u′Qu + x′Rx + β(Ax + Bu)′D(W )(Ax + Bu) + U(W, d)}   (2.16)

where

D(W ) ≡ W − σWC(I + σC ′WC)−1C ′W.   (2.17)

The optimal control law is

u = −F ◦ D(W )x, F(V ) = β[Q+ βB′V B]−1B′V A, (2.18)


and the minimized value of the criterion is x′ T ◦ D(W ) x + U(W, d), where

T (V ) ≡ R + F(V )′QF(V ) + β[A − BF(V )]′V [A − BF(V )]
       = R + A′(βV − β²V B(Q + βB′V B)−1B′V )A.   (2.19)

Consequently, if next period's value function is quadratic in the state plus a constant term, the current period value function will have the same functional form. When σ = 0, D is the identity operator so that T ◦ D is just T . Note that T is the operator associated with the matrix Riccati equation for the ordinary discounted version of the optimal linear regulator problem.

Following the usual backward induction argument, iterating on the composite transformation T ◦ D corresponds to increasing the time horizon. The minimized value M0,j of the time 0, j-period optimization problem is

M0,j = x′0(T ◦ D)j(0)x0 + Vj(0, 0) (2.20)

where

Vj(W, d) = U[(T ◦ D)j−1(W ), Vj−1(W, d)].   (2.21)

The infinite-horizon problem involves the limiting behavior of the sequence {M0,j : j = 1, 2, ...}. In light of Lemma 2.2.1, this sequence of minimized values is increasing and hence converges (almost surely) to a limit value M0, although the limit might be infinite.

Theorem 2.5.1. Suppose that M0 is finite. Then

M0 = x′0W∗x0 + d∗

for some positive semidefinite matrix W∗ and some nonnegative scalar d∗ that are the smallest solutions to: W∗ = T ◦ D(W∗), d∗ = V(d∗).

Proof. This theorem can be proved by imitating the proof of Theorem 2.4.1, where T ◦ D takes the place of S and V takes the place of U.

Theorem 2.5.2. Suppose that M0 is finite. Then the process U associated with the linear control law ut = −F ◦ D(W∗)xt minimizes K(U |x0) by choice of a feasible control process U for every initialization x0.


Proof. First note that

x′0W∗x0 + d∗ ≤ K(U |x0). (2.22)

Since W∗ is a fixed point of T ◦ D, it is also a fixed point for the transformation S constructed using the candidate time-invariant linear control law [F = F ◦ D(W∗)]. Moreover, d∗ = U(W∗, d∗). Let (V∗, c∗) be the positive semidefinite matrix and nonnegative scalar given by Theorem 2.4.1 for the candidate control law. Then V∗ ≤ W∗ and c∗ ≤ d∗. In light of relation (2.22), these inequalities can be replaced by equalities, and the conclusion follows.

Both Theorems 2.5.1 and 2.5.2 presume that M0 is finite. From Theorem 2.3.2 we know that M0 is finite whenever Assumption 2.3.1 is satisfied and σ is positive. When σ is strictly negative, Assumption 2.3.1 is no longer sufficient. However, M0 can be computed as the limit of {M0,j : j = 1, 2, ...} as in formulas (2.20) and (2.21). As long as the matrices {(I + σC ′(T ◦ D)j(0)C) : j = 1, 2, ...} are positive definite and have a positive definite limit, M0 will be finite.
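The following sketch iterates the composite map T ◦ D of (2.17)–(2.19) from W = 0, checking at each step that (I + σC′WC) remains positive definite as the preceding paragraph requires, and then reports the time-invariant rule F ◦ D(W∗). As with the earlier sketches, all numerical values are illustrative placeholders, not calibrated quantities.

```python
import numpy as np

# Iterate the composite map T(D(W)) of (2.17)-(2.19) from W = 0 and
# recover the time-invariant rule u = -F(D(W*)) x. Parameters are
# illustrative placeholders, not values used in the chapter.

beta, sigma = 0.95, -0.1
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
C = 0.1 * np.eye(2)
Q = np.array([[1.0]])
R = np.diag([1.0, 0.5])

def D_map(W):
    M = np.eye(2) + sigma * C.T @ W @ C
    if np.min(np.linalg.eigvalsh(M)) <= 0:
        raise ValueError("(I + sigma C'WC) is not positive definite")
    return W - sigma * W @ C @ np.linalg.solve(M, C.T @ W)

def F_map(V):
    return beta * np.linalg.solve(Q + beta * B.T @ V @ B, B.T @ V @ A)

def T_map(V):
    F = F_map(V)
    AV = A - B @ F
    return R + F.T @ Q @ F + beta * AV.T @ V @ AV

W = np.zeros((2, 2))
for _ in range(5000):
    W_new = T_map(D_map(W))
    if np.max(np.abs(W_new - W)) < 1e-12:
        W = W_new
        break
    W = W_new

F_opt = F_map(D_map(W))
print("W* =\n", W)
print("robust feedback F o D(W*) =", F_opt)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ F_opt))
```

Plain fixed-point iteration is used here only for transparency; as Remark 2.5.4 below notes, faster methods are available.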

It is also of interest to know when the optimal control law stabilizes the state vector process (relative to β). As in the usual linear-quadratic (σ = 0) problem, factor the positive semidefinite matrix R = L′L.

Theorem 2.5.3. Suppose that σ is negative, M0 is finite and (β^(1/2)A, L) is detectable. Then the absolute values of the eigenvalues of A − BF(W∗) are less than β^(−1/2).

Proof. As in the proof of Theorem 2.3.2, compare the risk adjusted costs with σ < 0 to the σ = 0 costs. The counterparts to inequalities (2.5) and (2.6) for σ ≤ 0 are

α(u, x, C|J ) ≥ α∗(u, x, C|J ) for σ ≤ 0

and

C0 ≥ C∗0 for σ ≤ 0.   (2.23)

Apply these inequalities for the control law ut = −F(W∗)xt. Since (β^(1/2)A, L) is detectable, the unadjusted cost C∗0 will be finite only if A − BF(W∗) satisfies the eigenvalue restriction. Given inequality (2.23), A − BF(W∗) must satisfy the eigenvalue restriction because C0 and hence C∗0 is finite.


Remark 2.5.4. Modifications of standard computational methods can be used to accelerate computational speed relative to iterating on the transformation T ◦ D. See Hansen and Sargent (2013a) for details.

2.6 Summary

In economic applications, it is very useful to have formulations of dynamic programming problems that incorporate discounting, imply time-invariant decision rules for infinite-horizon problems, and are readily calculable.4 For that reason, the discounted optimal linear regulator has been a workhorse in economic dynamics generally, and in particular in the development of real business cycle theory and the formulation of linear rational expectations models as applied in real business cycle theory and finance. The modified linear regulator described in this paper preserves the desirable features of the linear regulator, and adds aspects of risk-sensitivity.

4Our recursive cost specification preserves the 'time consistency' property that is satisfied in the time and state-separable σ = 0 specification of the problem.


Chapter 3

Robust Permanent Income and Pricing

with Thomas D. Tallarini

“. . . I suppose there exists an extremely powerful, and, if I may so speak, malignant being, whose whole endeavors are directed toward deceiving me.”
— René Descartes, Meditations, II.1

3.1 Introduction

This paper studies consumption and savings profiles and security market prices in a permanent income model when consumers are robust decision makers.2 Robust decision makers and expected utility maximizers share a common probabilistic specification of the income shocks. But robust decision makers suspect specification errors and want decisions to be insensitive to them. We show how a preference for robustness lies concealed within the quantity implications of the permanent income model and how it can be

1Descartes (1901, p. 227).
2This research was funded by grants from the National Science Foundation. We thank Andrew Abel, Evan Anderson, John Cochrane, Cristina de Nardi, Martin Eichenbaum, John Heaton, Narayana Kocherlakota, Wen-Fang Liu, Jesus Santos, Kenneth Singleton, Nancy Stokey and Noah Williams for useful criticisms of earlier drafts. We are grateful to Wen-Fang Liu for excellent research assistance. We thank two referees of an earlier draft for comments that prompted an extensive reorientation of our research.


revealed by market-based measures of 'risk-aversion'. We aim to show that large market-based measures of risk aversion can emerge from concern about small specification errors.

We reinterpret the decision rules for saving and consumption from a rational expectations version of Hall's (1978) permanent income model with habit persistence. We show how a robust decision maker with a lower discount factor would use those same decision rules for saving and consumption.3 Increasing the preference for robustness stimulates a precautionary motive for savings,4 an effect that an appropriate decrease of the discount factor cancels.5

Our empirical strategy comes from the preceding observational equivalence result. To determine all but two parameters of the model, we estimate the rational expectations version of a habit-persistent version of Hall's model from aggregate U.S. time series on consumption and investment. By construction, our model with a preference for robustness must fit these quantity data as well as Hall's. But it has different implications about prices of risky assets. In particular, at the consumption/savings plan associated with Hall's model, the shadow prices of a robust decision maker put the market price of risk much closer to empirical estimates. After estimating Hall's model from the quantity data, we use some asset prices to calibrate the discount factor and a robustness parameter, while preserving the implications for saving and consumption.

In contrast to models in the spirit of Bewley (1977), market incompleteness plays no role in our decentralization of the permanent income model. Instead, following Hansen (1987), we interpret the permanent income

3Our setting relates to the max-min utility theory of Gilboa and Schmeidler (1989) and Epstein and Wang (1994). A robust decision maker uses rules that work well for a specific stochastic environment, but that are also insensitive to small perturbations of the probabilistic specification (see Zames (1981), Francis (1987), and Zhou et al. (1996)). Similarly, by ascribing a family of possible probability laws to a decision maker, the literature draws a sharp distinction between Knightian uncertainty and risk. Knightian uncertainty corresponds to the perturbations in the probabilistic specification envisioned by the robust control theorists.

4Under a rational expectations interpretation, Hall's model excludes precautionary savings, as emphasized by Zeldes (1989).

5In effect, we are solving a particular 'robust control' version of an 'inverse optimal decision' problem. Versions of such problems have played an important role in the development of rational expectations theory. See Muth (1960a). See Hansen and Sargent (1983) and Christiano (1987) for developments building on Muth's work.


decision rule in terms of a planning problem whose consumption and investment processes are equilibrium allocations for a competitive equilibrium. We then deduce asset prices as did Lucas (1978) and Epstein (1988) by finding shadow prices that clear security markets. These asset prices encode information about the slopes of intertemporal indifference curves passing through the equilibrium consumption process, and therefore measure the risk aversion of the consumer. To accommodate robustness, our decentralization copies Epstein and Wang (1994).6

To model robust decision making requires formulating a class of misspecifications that worry the decision maker. We obtain a workable class of misspecifications by using the literature on risk-sensitive control started by Jacobson (1973, 1977) and extended by Whittle (1982, 1983, 1989b, 1990) and ourselves (1995). Originally this literature did not seek to model robustness but rather sought to magnify responses to risk under rational expectations. The idea was to induce bigger effects of risk on decision rules (i.e., greater departures from certainty equivalence) by altering a single risk-sensitivity parameter that influences the intertemporal objective function. But risk-sensitive preferences can be reinterpreted as embedding a wish for robustness against a class of perturbations of the transition dynamics. For undiscounted linear-quadratic control problems, Glover and Doyle (1988) showed how a particular kind of concern for robustness connects to the risk-sensitive formulation of preferences. They showed how the risk sensitivity parameter measures the size of the class of misspecifications against which robustness is sought. We use a discounted version of James's (1995) notion of robustness. In this paper, we prefer to interpret our results in terms of a decision maker's concern for robustness. However, because we use a formulation of robust decision theory induced by the risk-sensitivity parameterization, an interpretation in terms of risk-sensitive preferences is also available.7

6See Melino and Epstein (1995) for an alternative attack on this same question. They use a recursive formulation of an ε–contamination specification adapted from the theory of robust statistics.

7To avail ourselves of this interpretation requires that we model risk sensitivity with discounting in a recursive manner, as in Epstein (1988), Weil (1989), Epstein and Zin (1989a) and Hansen and Sargent (1995a). Epstein and Zin (1989a) developed a version of recursive utility theory that raises the market price of risk without altering the intertemporal substitution elasticity. Van Der Ploeg (1993) introduced risk sensitivity into a permanent income model, but not in a recursive manner.


The remainder of this paper is organized as follows. Section 3.2 summarizes the necessary decision theory. We link risk-sensitive and robust decision theories by displaying two closely connected value functions associated with superficially different problems. The problems lead to identical decision rules. The second problem embodies a preference for robustness, provides links to Gilboa-Schmeidler's version of Knightian uncertainty, and explains the quote from Descartes. In sections 3.3 and 3.4, we describe and estimate our permanent income model. The observational equivalence proposition of section 3.3 motivates a two part strategy for using the quantity and asset price data. Section 3.5 exploits the links between robustness and risk-sensitivity in developing asset pricing formulas in terms of probability measures induced by 'pessimistic' views of laws of motion that emerge as by-products of robust decision making. These formulas prepare the way for our interpretations of the market price of risk in terms of robustness. Section 3.6 quantifies the amount of preference for robustness required to push up the market price of risk. Section 3.7 measures intertemporal mean-risk trade-offs associated with different amounts of concern with robustness. Section 3.8 concludes.

3.2 Recursive Risk Sensitive Control

The theory rests on two closely related recursive linear quadratic optimization problems. We describe a distortion of beliefs away from rational expectations that induces the same behavior as a particular modification of preferences toward risk. The equivalence of these two problems lets us interpret a 'risk sensitivity' parameter as measuring a preference for robustness.

The recursive risk sensitive control problem

The state transition equation is

xt+1 = Axt +Bit + Cwt+1, (3.1)

where it is a control vector, xt is the state vector, and wt+1 is an i.i.d. Gaussian random vector with Ewt+1 = 0 and Ewt+1w′t+1 = I. Let Jt be the sigma algebra induced by {x0, ws, 0 ≤ s ≤ t}. The one-period return function is

u(i, x) = −i′Qi− x′Rx,


where Q is positive definite and R is positive semidefinite. Following Epstein and Zin (1989a), Weil (1993), and Hansen and Sargent (1995a), we use the following recursion to induce intertemporal preferences:

Ut = u(it, xt) + βRt(Ut+1), (3.2)

where

Rt(Ut+1) ≡ (2/σ) log E[exp(σUt+1/2) | Jt].   (3.3)

When σ = 0 we take Rt ≡ E(Ut+1|Jt), and we have the usual von Neumann-Morgenstern form of state additivity. When σ ≠ 0, the operator Rt makes an additional risk adjustment over and above that induced by the shape of u(·, ·). Values of σ less than zero correspond to more aversion to risk vis-à-vis the von Neumann-Morgenstern specification.8 As emphasized by Hansen and Sargent (1995a), the (log, exp) specification links the general recursive utility specification of Epstein and Zin (1989a) to risk-sensitive control theory. Weil's (1993) permanent income model used the same (log, exp) specification but did not exploit connections to the risk-sensitive control literature.
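For a conditionally normal continuation value, the operator Rt in (3.3) has a simple closed form: the conditional mean plus (σ/4) times the conditional variance, so that σ < 0 penalizes variance. The short Python check below illustrates this; the numbers in it are arbitrary, and conditional normality is imposed only for the example.

```python
import numpy as np

# The operator R_t of (3.3) applied to a conditionally normal continuation
# value U_{t+1} ~ N(m, s^2) reduces to m + (sigma/4) s^2: the conditional
# mean plus a variance penalty when sigma < 0. The values of m, s, sigma
# below are made-up numbers used only for this illustration.

rng = np.random.default_rng(0)
sigma = -0.5
m, s = -10.0, 2.0

U = m + s * rng.standard_normal(1_000_000)
R_mc = (2.0 / sigma) * np.log(np.mean(np.exp(sigma * U / 2.0)))
R_closed = m + (sigma / 4.0) * s**2

print(f"Monte Carlo: {R_mc:.4f}   closed form: {R_closed:.4f}   E[U] = {m:.4f}")
```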

The risk sensitive control problem is to maximize the time zero utility index U0 by choosing a control process it adapted to Jt. Let W (x) denote the optimum value function for this problem, so that Ue0 = W (x0), where the e superscript is used to distinguish the efficient or optimal utility index. Hansen and Sargent (1995a) extended the Jacobson–Whittle risk-sensitive control theory to provide formulas for Ω and ρ in the following representation of the value function:

Uet = W (xt) = x′tΩxt + ρ.   (3.4)

Let i = −Fx denote the optimal decision rule. Let A∗ = A − BF be the closed loop transition matrix (i.e., with it = −Fxt substituted into the original transition law). We display explicit formulas for the distorted expectation operator below.

We shall have cause to evaluate Rt(Ut+1) for the quadratic value function (3.4), where Ω is a negative semidefinite matrix of real numbers and ρ is a nonpositive real number.

8As in Kreps and Porteus (1978b), this recursive utility formulation overturns the indifference to the timing of the resolution of uncertainty inherent in state-separable preferences. The additional risk adjustment for σ < 0 implies a preference for early resolution of uncertainty.


It follows from Jacobson (1973) that

Rt(Uet+1) = x′tΩ̄xt + ρ̄,   (3.5)

where

Ω̄ = A∗′[Ω + σΩC(I − σC ′ΩC)−1C ′Ω]A∗,   (3.6a)

and

ρ̄ = ρ − (1/σ) log[det(I − σC ′ΩC)],   (3.6b)

so long as the matrix (I − σC ′ΩC) is positive definite, which we assume.

Robustness reinterpretation

We can reinterpret risk-sensitive preferences in terms of a decision maker with ordinary preferences who fears specification errors. The robustness interpretation is based on a recursive formulation of a zero-sum two-player Lagrange multiplier game whose value function W̄ (x) relates to W (x). Parameterizing the game in terms of a fixed Lagrange multiplier makes a sequential version of the game, under the Markov perfect equilibrium concept, have the same outcome as a version where players can precommit at time zero.9

In this game, one player chooses decision rules for the control vector {it}, with two differences vis-à-vis the single agent risk-sensitive control problem. First, a maximizing player makes no risk adjustment in the utility function. Second, another minimizing player injects a distortion each time period into the conditional mean of the shock process. Thus, the first player maximizes a utility index U0 = E0 ∑∞t=0 βt u(it, xt) by choice of state-feedback rules for {it} and subject to the distorted law of motion

xt+1 = Axt +Bit + C(wt+1 + vt), (3.7)

9Anderson, Hansen, and Sargent (1999b) (AHS) describe a different class of specification errors that leads to the same risk adjustment (3.3). AHS permit specification errors in the form of perturbations to a controlled Markov process. AHS use a constraint on the size of relative entropy to parameterize the admissible class of misspecifications. Their formulation applies to nonquadratic objective functions and nonlinear laws of motion. They also formulate the connection between risk-sensitivity and robustness in continuous time.


where vt distorts the mean of the innovation. The second player chooses a feedback rule for vt to minimize U0 subject to

Et ∑∞j=0 βj vt+j · vt+j ≤ ηt,   (3.8a)

ηt+1 = β−1(ηt − vt · vt),   (3.8b)

where η0 is given and ηt serves as a continuation pessimism bound at date t. In (3.8a), Et(·) denotes the conditional expectation taken with respect to the law of motion (3.7), which, relative to (3.1), is distorted by the presence of vt.

The second player is introduced as a device to determine the conditional mean distortions {vt} in a way that delivers a particular form of robustness. Letting vt feed back on xt, including its endogenous components, allows for a wide class of misspecifications. We want the feedback rule for it to be insensitive to mistakes vt in the conditional mean of wt+1. To promote insensitivity, we make the second player malevolent and instruct him to minimize U0 over state feedback rules for vt.

We impose restriction (3.8b) by formulating a multiplier game. In particular, we let −1/σ ≥ 0 be a Lagrange multiplier on the time t constraint (3.8a) and require that the continuation pessimism level ηt be such that the multiplier is constant over time.10 Condition (3.8b) accomplishes this. This leads to a recursive formulation of the game. The Markov perfect equilibrium has a value function that satisfies:

W̄ (x) = inf_v sup_i {−i′Qi − x′Rx + β[−(1/σ)v′v + EW̄ (Ax + Bi + C(w + v))]} = x′Ωx + ρ̄   (3.9)

where the E operator integrates w with respect to a normal distribution with mean zero and covariance matrix I. Hansen and Sargent (1998) show that the value functions W and W̄ share the same matrix Ω in their quadratic forms, but have different constants ρ and ρ̄. Let i = −Fx, v = Gx denote the policy rules that solve (3.9); the rules are linear, and the rule for i also solves the risk-sensitive control problem.11

10See Hansen and Sargent (1998) for more details.
11Hansen and Sargent (1998) discuss how the particular parameterization of 'uncertainty aversion' embedded in (3.9) – in which the 'Lagrange multiplier' −σ−1 is time invariant – requires choosing the continuation pessimism bounds ηt in a way to make the opponent's decision problem recursive.


The relationship between the two value functions and the decision rules for i establishes how the risk-sensitive preference specification induces the same behavior that would occur without the risk-sensitivity adjustment to preferences, but with the pessimistic view of the conditional mean of innovations (the vt's) reflected in (3.9). The risk-sensitivity parameter σ sets the constant Lagrange multiplier −σ−1 on restriction (3.8). Notice how η0 indexes the degree of pessimism, i.e., the size of the domain of sequences from which the malevolent opponent selects adverse vt's. Hansen and Sargent (1998) describe in detail why it is convenient computationally to parameterize pessimism in this way.

“Uncertainty aversion” or robustness

The Markov perfect equilibrium summarized by (3.9) is the value function for a single decision maker whose decisions are governed by a 'worst case' analysis. By using a feedback rule for it that solves (3.9), the robust controller does better for some appropriately constrained mistake sequences {vt} while sacrificing utility when these mistakes are absent. Our treatment of this robustness and its connection to risk sensitivity follows James's (1995) recent survey of robust control, except that we have incorporated discounting into the risk sensitive formulation of the problem and into the corresponding constraints on the model misspecification.

There is a closely related literature in economics originating with the work of Gilboa and Schmeidler (1989) and Epstein and Wang (1994). The decision theory axiomatized by Gilboa and Schmeidler generalizes expected utility theory by studying a setting where decisions are based on a 'maxmin' criterion because beliefs are described by a family of probability measures rather than a single probability measure. In our setup, there is a 'nominal model' corresponding to setting vt = 0 for all t. Alternative specification error sequences {vt} constrained by (3.8) deliver the resulting family of stochastic processes used in the state evolution equation. Hence our decision maker can be viewed as having preferences represented by the maxmin utility theory of Gilboa and Schmeidler. Following Epstein and Wang (1994), we can interpret the nonuniqueness of the stochastic constraints as depicting a form of Knightian uncertainty: an ambiguity of beliefs not fully specified



in probabilistic terms but described by the set of specification errors {vt} defined by restriction (3.8).

In intertemporal contexts, Epstein and Wang (1994) use a Markov formulation of the two-player game to avoid inducing a form of time inconsistency. We follow the literature on robust control by holding fixed the Lagrange multiplier −σ−1 on the specification error constraint over time. Below, we shall compute the vt's and use them to measure the amount of uncertainty aversion associated with alternative values of σ. We avail ourselves of a formula for the matrix G in v = Gx.

Solution for v

The solution for v within the Markov perfect equilibrium satisfies:

vt = σ(I − σC ′ΩC)−1C ′ΩA∗xt, (3.10)

where xt+1 = A∗xt + Cwt+1 under the optimal control law for the risk-sensitive problem (A∗ = A − BF). (Here we are assuming that the parameter σ is sufficiently small that the matrix (I − σC ′ΩC) is positive definite.)12

Below we shall compute vt and study how it alters measures of risk aversion extracted from asset prices.

Modified certainty equivalence

Whittle (1982) pointed out how the solution for v supports a modified version of certainty equivalence. This version asserts the equivalence of two ways of evaluating time-invariant decision rules it = −Fxt: one under rational expectations and risk-sensitive preferences; the other under distorted expectations and ordinary (σ = 0) quadratic preferences. Recall that A∗ = A − BF, and let R∗ = R + F ′QF. The two valuation algorithms are:

12Although the matrix Ω depends implicitly on σ, it can be shown that the requisite positive definiteness will be satisfied for small values of σ. The risk-sensitive control theory literature draws attention to the breakdown point under which this positive definiteness property ceases to hold (e.g., see Glover and Doyle (1988)). At such points, the risk-adjusted recursive utility is −∞ regardless of the controller's action. The general equilibrium aspects of our analysis lead us to look at much smaller risk corrections than are tolerated by the breakdown analysis.


(1) Uet = −x′tR∗xt + βRt(Uet+1), where Rt is defined in (3.3), and where the conditional expectation operator in (3.3) is computed with respect to the (true) law of motion xt+1 = A∗xt + Cwt+1. The criterion can be represented as the translated quadratic form Uet = x′tΩxt + ρ, where the matrix Ω and the scalar ρ are fixed points of operators defined by Hansen and Sargent (1995a).

(2) W̄ (xt) = Ūt = −x′tR∗xt + βÊtŪt+1 − (β/σ)v′tvt, where Êt is an expectation operator computed with respect to the distorted law of motion

xt+1 = Âxt + Cwt+1,   (3.11)

where

Â = [I + σC(I − σC ′ΩC)−1C ′Ω]A∗,   (3.12)

and vt is given by (3.10). The formula for Â is derived by adding Cvt to A∗xt, where vt satisfies (3.10). The criterion Ūt has the representation Ūt = x′tΩxt + ρ̄, where Ω is the same matrix occurring in the first representation.

Evidently, these two evaluations yield the same ordering over time-invariant decision rules it = −Fxt. This is the modified certainty equivalence principle. Notice the appearance of Ω, computed from the first formulation, in the construction of the distorted law of motion (3.12). We shall use Â from (3.12) again in computing asset prices.
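As an illustration of formulas (3.10) and (3.12), the sketch below works in a small, made-up LQ setting of the same kind used in the Chapter 2 sketches (all numbers are invented placeholders, not the estimated model). It computes the value-function matrix Ω for an arbitrary fixed rule i = −Fx rather than the optimal one, and then forms the worst-case drift matrix G in vt = Gxt and the distorted transition matrix Â.

```python
import numpy as np

# Given a fixed rule i = -F x, iterate on (3.6a) to find the matrix Omega in
# U_t = x' Omega x + rho, then form the worst-case mean distortion v = G x of
# (3.10) and the distorted transition matrix A_hat of (3.12).
# All parameter values are illustrative placeholders, not the estimated model.

beta, sigma = 0.95, -0.1
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
C = 0.1 * np.eye(2)
Q = np.array([[1.0]])
R = np.diag([1.0, 0.5])
F = np.array([[0.1, 0.4]])           # an arbitrary stabilizing rule

Astar = A - B @ F
Rstar = R + F.T @ Q @ F

Omega = np.zeros((2, 2))
for _ in range(5000):
    M = np.eye(2) - sigma * C.T @ Omega @ C
    Omega_bar = Astar.T @ (Omega + sigma * Omega @ C @ np.linalg.solve(M, C.T @ Omega)) @ Astar
    Omega_new = -Rstar + beta * Omega_bar      # recursion U_t = -x'R*x + beta R_t(U_{t+1})
    if np.max(np.abs(Omega_new - Omega)) < 1e-12:
        Omega = Omega_new
        break
    Omega = Omega_new

M = np.eye(2) - sigma * C.T @ Omega @ C
G = sigma * np.linalg.solve(M, C.T @ Omega @ Astar)   # worst-case drift: v_t = G x_t, eq. (3.10)
A_hat = Astar + C @ G                                  # distorted transition, eq. (3.12)

print("Omega =\n", Omega)
print("G =\n", G)
print("A_hat =\n", A_hat)
print("eigenvalues of A_hat:", np.linalg.eigvals(A_hat))
```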

3.3 Robust Permanent Income Theory

Hall (1978), Campbell (1987), Heaton (1993), and Hansen et al. (1991) studied how closely a permanent income model approximates aggregate data on consumption and investment. We formulate a risk-sensitive version of the permanent income model with habit persistence, estimate it from data on consumption and investment, then use it to compare the implications of risk-sensitivity for consumption, investment, and asset prices. We demonstrate an observational equivalence proposition asserting that the consumption and investment data alone are insufficient simultaneously to identify the risk-sensitivity parameter σ and the subjective discount factor β. This observational equivalence substantiates our claim to be reinterpreting decision rules from a habit-persistence version of Hall's model in terms of robust decision making.


Adding knowledge of the risk-free rate, which is constant in this model, does not achieve identification. But later we will show that the risk-sensitivity parameter has strong effects on other asset prices, including the market price of risk.

The lack of identification from consumption and investment data emerges as follows. For a given specification of shocks, introducing risk sensitivity provides an additional precautionary motive for saving. In terms of implications for savings, this motive can be offset by diminishing the subjective discount factor to make saving less attractive. In terms of effects on the valuation of risky assets, these changes are not offsetting.

The model

We formulate the model in terms of a planner with preferences over consumption streams {ct}∞t=0, intermediated through the service stream {st}. Preferences are ordered by the utility index U0, defined through the recursion

Ut = −(st − bt)2 + βRt(Ut+1) (3.13)

where Rt(Ut+1) is defined by (3.3). In (3.13), st is a scalar household service produced by the scalar consumption ct via the household technology

st = (1 + λ)ct − λht−1, (3.14a)

ht = δhht−1 + (1− δh)ct, (3.14b)

where λ > 0 and δh ∈ (0, 1). In (3.13), {bt} is an exogenous preference shock process. System (3.14) accommodates habit persistence or rational addiction as in Ryder and Heal (1973), Becker and Murphy (1988), Sundaresan (1989), Constantinides (1990) and Heaton (1993). By construction, ht is a geometric weighted average of current and past consumption. Setting λ > 0 induces intertemporal complementarities. Consumption services depend positively on current consumption, but negatively on a weighted average of past consumptions, an embodiment of 'habit persistence'.
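To make the mechanics of (3.14) concrete, here is a short simulation of the service and habit processes for an arbitrary consumption path; the parameter values and the path itself are illustrative placeholders, not estimates.

```python
import numpy as np

# Simulate the household technology (3.14):
#   s_t = (1 + lam) c_t - lam h_{t-1},   h_t = delta_h h_{t-1} + (1 - delta_h) c_t
# Parameter values and the consumption path are illustrative placeholders.

lam, delta_h = 0.5, 0.7
T = 20
rng = np.random.default_rng(1)
c = 1.0 + 0.1 * rng.standard_normal(T)      # an arbitrary consumption path

h_prev = 1.0                                 # initial habit stock
s = np.empty(T)
h = np.empty(T)
for t in range(T):
    s[t] = (1.0 + lam) * c[t] - lam * h_prev
    h[t] = delta_h * h_prev + (1.0 - delta_h) * c[t]
    h_prev = h[t]

# h_t is a geometric weighted average of current and past consumption,
# so a one-time increase in c raises services today but lowers them later.
print(np.round(s, 3))
print(np.round(h, 3))
```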

There is a linear production technology

ct + it = γkt−1 + dt,

where the capital stock kt at the end of period t evolves according to

kt = δkkt−1 + it,


it is time t gross investment, and {dt} is an exogenously specified endowment process. The parameter γ is the (constant) marginal product of capital, and δk is the depreciation factor for capital. Solving the capital evolution equation for investment and substituting into the linear production technology gives:

ct + kt = (δk + γ)kt−1 + dt. (3.15)

We define:

R ≡ δk + γ,

which is the physical (gross) return on capital taking account of the fact that capital depreciates over time. When the economy is decentralized, R will also coincide with the gross return on a risk-free asset. We impose that the components of the solution for {ct, ht, kt} belong to L²₀, the space of stochastic processes {yt} defined as:

L²₀ = {y : yt is in Jt for t = 0, 1, · · · and E[∑∞t=0 R−t(yt)² | J0] < +∞}.

We suppose that the endowment and preference shocks (dt, bt) are governed by bt = Ubzt, dt = Udzt, where

zt+1 = A22zt + C2wt+1.

Here wt+1 is independent of Jt = {wt, wt−1, . . . , w1, z0}, the eigenvalues of A22 are bounded in modulus by unity, and wt+1 is normally distributed with mean zero and covariance matrix I.

Given k0, the planner chooses a process {ct, kt} with components in L²₀ to maximize U0 subject to (3.14) and (3.15).13

Solution of model and identification of σ

To establish observational equivalence for the quantity observations, we proceed constructively. First, we compute a solution for σ = 0 and βR = 1, i.e., a permanent income economy without risk sensitivity. Then we use

13We can convert this problem into a special case of the control problem posed in section 3.2 as follows. Form a composite state vector xt by stacking ht−1, kt−1 and zt, and let the control it be given by st − bt. Solve (3.14a) for ct as a function of st − bt, bt and ht−1 and substitute into equations (3.14b) and (3.15). Stack the resulting two equations along with the state evolution equation for zt to form the evolution equation for xt+1.


the allocation for this σ = 0 economy to construct an equivalence class of alternative (σ, β)'s that generate the same allocation, for fixed values of all the other parameters. This demonstrates that the pair (σ, β) is not identified from quantity observations alone.

The σ = 0, βR = 1 benchmark case

To produce a permanent income model in the σ = 0 special case, we follow Hall (1978) and impose that βR = 1. When σ = 0, (3.13) and (3.3) reduce to

U0 = E0 ∑∞t=0 βt{−(st − bt)²}.   (3.16)

Formulate the planning problem as a Lagrangian by putting random Lagrange multiplier processes of 2βtμst on (3.14a), 2βtμht on (3.14b), and 2βtμct on (3.15). First-order necessary conditions are

μst = bt − st, (3.17a)

μct = (1 + λ)μst + (1− δh)μht, (3.17b)

μht = βEt[δhμht+1 − λμst+1], (3.17c)

μct = βREtμct+1, (3.17d)

and also (3.14) and (3.15). When βR = 1, equation (3.17d) implies that μct is a martingale; then (3.17b) and (3.17c) solved forward imply that μst, μht are also martingales. This implies that μst has the representation

μst = μst−1 + ν ′wt, (3.18)

for some vector ν. Use (3.17a) to write st = bt − μst, substitute this into the household technology (3.14), and rearrange to get the system

ct = [1/(1 + λ)](bt − μst) + [λ/(1 + λ)]ht−1,   (3.19a)

ht = δ̄h ht−1 + (1 − δ̄h)(bt − μst),   (3.19b)

where δ̄h = (δh + λ)/(1 + λ). Equation (3.19b) can be used to compute

Et ∑∞j=0 βj ht+j−1 = (1 − βδ̄h)−1 ht−1 + [β(1 − δ̄h)/(1 − βδ̄h)] Et ∑∞j=0 βj (bt+j − μst+j).   (3.20)


For the purpose of solving the first-order conditions (3.17), (3.14) and (3.15) subject to the side condition that {ct, kt} ∈ L²₀, treat the technology (3.15) as a difference equation in {kt}, solve forward, and take conditional expectations on both sides to get

kt−1 = ∑∞j=0 R−(j+1) Et(ct+j − dt+j).   (3.21)

Use (3.19a) to eliminate {ct+j} from (3.21), then use (3.18) and (3.20). Solve the resulting system for μst, to get

μst = (1 − R−1) ∑∞j=0 R−j Et bt+j + ψ0 ∑∞j=0 R−j Et dt+j + ψ1 ht−1 + ψ2 kt−1,   (3.22)

where ψ0, ψ1, ψ2 are constants. Equations (3.22), (3.19), and (3.15) represent the solution of the planning problem.

Notice that (3.22) makes μst depend on a geometric average of current and future values of bt. Therefore, both the optimal consumption service process and optimal consumption depend on the difference between bt and a geometric average of current and expected future values of b. So there is no 'level effect' of the preference shock on the optimal decision rules for consumption and investment. However, the level of bt will affect equilibrium asset prices.

Observational equivalence (for quantities) of σ = 0 and σ ≠ 0

At this point, we state the following

Observational Equivalence Proposition. Fix all parameters except β and σ. Suppose βR = 1. There exists a σ̲ < 0 such that the optimal consumption-investment plan with σ = 0 is also the optimal consumption-investment plan for any σ satisfying σ̲ < σ < 0 and a smaller discount factor β̂(σ) that varies directly with σ.

This proposition means that, so far as the quantities {ct, kt} are concerned, the risk-sensitive (σ < 0) version of the permanent income model is observationally equivalent to the benchmark (σ = 0) version. This insight


will guide our estimation strategy, because it sharply partitions the impact of risk-sensitivity into real and pricing parts.

The proof of the proposition is by construction.

Proof. This is the plan of the proof. Begin with a solution {st, ct, kt, ht} for a benchmark σ = 0 economy. Form a comparison economy with a σ ∈ [σ̲, 0], where σ̲ is the boundary of an admissible set of σ's to be described below. Fix all parameters except (σ, β) the same as in the benchmark economy. Conjecture that {st, ct, kt, ht} is also the optimal allocation for the σ < 0 economy. Finally, construct a β = β̂ that verifies this conjecture.

Here are the details of the construction. The optimality of the allocation implies that Etμct+1 = μct, and that (3.18) and (3.22) are satisfied for the benchmark allocation, where Et is the expectation operator under the correct probability measure. The key idea is to form the distorted expectation operator Êt, then choose β = β̂ to make the distorted version of the Euler equation for μct hold at the benchmark (σ = 0) allocation.

To compute the distorted expectation operator, we follow the recipe given in formulas (3.9) and (3.12). First, we have to evaluate the utility index U0 by using (3.9). We want to evaluate (3.13) with st − bt ≡ −μst and μst given by the law of motion (3.18), which we take as exogenous because the allocation is frozen. We take μst as the state. Since there is no control, (3.9) collapses to

Ωx² = −x² + β min_v (−(1/σ)v² + Ω(x + θv)²),   (3.23)

and we write μst = μst−1 + θ(v + w), where θ² = ν′ν and v is the specification error chosen by the 'opponent' in the fictitious game. The scalar Ω that solves (3.23) is

Ω(β) = [β − 1 + σθ² + √((β − 1 + σθ²)² + 4σθ²)] / (−2σθ²).   (3.24)

It follows from (3.12) that the distorted law of motion for μst is

Êtμst+1 = ζμst,   (3.25)

where

ζ = ζ(β) = 1 + σθ²Ω(β) / [1 − σθ²Ω(β)].   (3.26)


Since μct is proportional to μst, it follows that

Êtμct+1 = ζμct,   (3.27)

with the same ζ given by (3.26). In terms of the distorted expectation operator, the Euler equation for capital is

β̂RÊtμct+1 = μct,

or

β̂Rζ(β̂) = 1.   (3.28)

Let σ̲ be the lowest value of σ for which the solution of (3.24) is real. Then given σ ∈ (σ̲, 0], there exists a β̂ satisfying (3.28) such that for (σ, β̂) the benchmark allocation solves the risk-adjusted problem. Therefore equations (3.24), (3.26), and (3.28) define a locus of (σ, β)'s, each point of which is observationally equivalent to (0, β) for (ct, kt) observations, because each supports the benchmark allocation.
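The locus can be traced out numerically. The sketch below solves equation (3.28) for the implied discount factor at several values of σ, using the closed forms (3.24) and (3.26); the gross return R and the innovation loading θ used here are illustrative placeholders, not the estimated values behind Figure 3.1.

```python
import numpy as np
from scipy.optimize import brentq

# Trace out the observationally equivalent (sigma, beta) locus of the proof:
# for a given sigma < 0 and innovation loading theta, find beta_hat solving
# beta * R * zeta(beta) = 1, with Omega(beta) and zeta(beta) from (3.24), (3.26).
# R and theta below are illustrative placeholders, not the estimated values.

R = 1.01          # gross risk-free return (beta = 1/R in the sigma = 0 benchmark)
theta = 0.05      # theta^2 = nu'nu, the innovation variance of mu_st

def Omega(beta, sigma):
    a = beta - 1.0 + sigma * theta**2
    disc = a**2 + 4.0 * sigma * theta**2
    if disc < 0.0:
        return np.nan                 # (3.24) has no real solution: sigma is below sigma-underbar
    return (a + np.sqrt(disc)) / (-2.0 * sigma * theta**2)

def zeta(beta, sigma):
    Om = Omega(beta, sigma)
    return 1.0 + sigma * theta**2 * Om / (1.0 - sigma * theta**2 * Om)

def beta_hat(sigma):
    g = lambda b: b * R * zeta(b, sigma) - 1.0     # equation (3.28)
    return brentq(g, 1e-6, 1.0 / R - 1e-12)

for sigma in [-0.002, -0.005, -0.008]:
    print(f"sigma = {sigma:7.3f}   beta_hat = {beta_hat(sigma):.6f}   (benchmark beta = {1.0/R:.6f})")
```

As the proposition states, the implied discount factor falls below the benchmark value 1/R as σ becomes more negative.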

Furthermore, according to the asset pricing theory to be developed shortly and (3.28), the price of a sure claim on consumption one period ahead is R−1 for all t and all (σ, β) in the locus. Therefore, these different parameter pairs are also observationally equivalent with respect to the risk-free rate.14

In Figure 3.1, we report the (σ, β) pairs that are observationally equivalent for our maximum likelihood estimates of the remaining parameters, which we are about to describe.

The observational equivalence depicted in Figure 3.1 shows that by lowering the discount factor, we can make investment less attractive and thereby offset the precautionary savings motive. As an indication of the important precautionary role for savings in this model, suppose that future endowments and preference shifters could be forecast perfectly. Then consumers would choose to draw down their capital stock. Investment would be sufficiently unattractive that the optimal linear rule would eventually have both

14In this model, the technology (3.15) ties down the risk-free rate. For a version of the model with quadratic costs of adjusting capital, the risk-free rate comes to depend on σ, even though the observations on quantities are nearly independent of σ. See Hansen and Sargent (1996).


Figure 3.1: Observationally equivalent (σ, β) pairs for maximum likelihood values of identified parameters; σ is the ordinate, β the coordinate. [Figure omitted: the plotted locus has σ ranging from about −1.5 × 10⁻⁴ to 0 and β ranging from about 0.9945 to 0.998.]

consumption and capital cross zero.15,16 Thus our robust control interpretation of the permanent income decision rule delivers a form of precautionary savings absent under the usual interpretation.

For any given pair (σ, β) depicted in Figure 3.1, the permanent income decision rule reflects either risk sensitivity or a concern for robustness. The familiar version of the precautionary savings motive focuses on the role of variation in the shocks. This version is delivered in our setup by the risk sensitive decision theoretic formulation. In contrast, the precautionary notion delivered by robust control theory emerges because consumers guard against mistakes in conditional means of shocks. Thus concern for robustness shifts emphasis from second to first moment properties of shocks.

15Introducing nonnegativity constraints in capital and/or consumption would induce nonlinearities into the consumption and savings rules, especially near zero capital. But investment would remain unattractive in the presence of those constraints for experiments like the one we are describing here. See Deaton (1991) for a survey and quantitative assessment of consumption models with binding borrowing constraints.

16As emphasized by Carroll (1992), even when the discount factor is small relative to the interest rate, precautionary savings can emerge when there is a severe utility cost for zero consumption. Such a utility cost is absent in our formulation.


Figure 3.2: Detrended consumption and investment (dotted line) data. [Figure omitted: quarterly time series plotted over 1970–1995 on a vertical scale from 0 to 16.]

3.4 Estimation

Different observationally equivalent (σ, β) pairs identified by our Proposition bear different implications about (i) the pricing of risky assets; (ii) the amounts required to compensate the planner for confronting different amounts of risk; (iii) the amount of model misspecification used to justify the planner's decisions if risk sensitivity is reinterpreted as aversion to Knightian uncertainty. To evaluate these implications, we first choose parameters, including noise variances, by estimating a σ = 0 version of our permanent income model, conditioning the likelihood function only on U.S. post-war quarterly consumption and investment data. We estimated the permanent-income model with habit persistence using U.S. quarterly data on consumption and investment for the period 1970I–1996III.17

17Our choice of starting the sample in 1970 corresponds to the second subsample analyzed by Burnside et al. (1993). Thus we have omitted the earlier period of 'higher productivity'. We initially estimated a version of the model with a stochastic preference shock over the entire post war time period, but we found that the 'productivity slowdown' was captured in our likelihood estimation by an initial slow decline in the preference shock process followed by a slow increase. Our illustrative permanent income model is apparently not well suited to capture productivity slowdowns. Given the empirical results reported in Burnside et al. (1993), the same could be said of the commonly used stochastic specification of Solow's growth model.


Consumption is measured by nondurables plus services, while invest-ment is measured by the sum of durable consumption and gross privateinvestment.18 We applied the model to data that have been scaled throughmultiplication by 1.0033−t. The scaled time series are plotted in Figure 2.We estimated the model from data on (ct, it), setting σ = 0, then deducedpairs (σ, β) that are observationally equivalent. We estimated parametersby climbing a Gaussian likelihood function. We formed the likelihood func-tion recursively, and estimated the unobserved part of the initial state vectorusing procedures described by Hansen and Sargent (1996).

Under our robustness interpretation, this approach to estimation maybe justified in one of two ways. First, economic agents may allow for modelmisspecification when making their decisions, even though in fact the modelis specified correctly during the sample period. Alternatively, economicagents use the (misspecified) maximum likelihood criterion for selectinga baseline model around which they entertain small specification errors.Under this second interpretation, the formal statistical inference formulasfor maximum likelihood estimation require modification (see White, 1982).

We specified a constant preference shifter bt = μb and a bivariate stochas-tic endowment process: dt = μd + d∗t + dt.

19 Because we are modeling twoobserved time series as functions of two shock processes, the model wouldlose its content were we to permit arbitrary cross correlation between thetwo endowment processes. Therefore, we assumed that these processes areorthogonal. We found that one of the shock processes, d∗t was particularlypersistent, with an autoregressive root of .998. While we doubt that thisvalue is distinguishable from unity, we retained the unconstrained estimateof .998 in our subsequent calculations. The two shocks are parameterizedas second order autoregressions. We write them as:

(1− φ1L)(1− φ2L)d∗t = cd∗w

d∗t ,

(1− α1L)(1− α2L)dt = cdwdt .

For the transitory process d we experimented with autoregressive processesof order 1, 2, and 3, which revealed the log likelihood values depicted in

18We used ‘old data’, not chain-weighted indexes.19A previous draft specified two stochastic shock processes: an endowment shock, dt,

and a preference shock, bt. We have chosen to report results for the bivariate endowmentprocess with a constant preference shifter b in response to a comment from one of theanonymous referees. The results from the preference shock version of our model areavailable in an earlier version of this paper available at http://riffle.stanford.edu.

Page 58: Maskin ambiguity book mock 2 - New York University

50 Chapter 3. Robust Permanent Income and Pricing

Table 1. In the table, ‘AR1’ denotes the first-order autoregression, and soon. The likelihood values show a substantial gain in increasing the orderfrom 1 to 2, but negligible gain in going from 2 to 3. These results led us tospecify a second order autoregression for the transitory endowment process.

Page 59: Maskin ambiguity book mock 2 - New York University

3.4. Estimation 51

Table 1

Likelihood Values

Transitory endowment 2× LogLikelihoodspecification

AR1 776.78AR2 779.05AR3 779.05

Note: The values reported differ from twice the log likelihood by acommon constant.

Thus the forcing processes are governed by seven free parameters:(α1, α2, cd, φ1, φ2, cd∗ , μd). We use the parameter μb to set the bliss point.While μb alters the marginal utilities, as we noted previously, it does notinfluence the decision rules for consumption and investment. Consequently,we fixed μb at an arbitrary number, namely 32, in our estimation.

The four parameters governing the endogenous dynamics are: (γ, δh, β, λ).We set δk = .975. We initially did not impose the permanent income re-striction, βR = 1, but the restriction was satisfied by our estimates, so weproceeded to impose it. That is, our estimates confirmed the random walkprediction for both the marginal utility process for consumption goods andthe marginal utility process for consumption services. The restrictions thatβR = 1, δk = .975 pin down γ once β is estimated. We chose to imposeβ = .9971, which after adjustment for the effects of the geometric growthfactor of 1.0033 implies an annual real interest rate of 2.5%.20

Maintaining the βR = 1 restriction, we estimated the model for differentvalues of γ (and therefore of β). The likelihood increases moderately asγ rises (and β decreases) over a large range of γ’s. However, over thisrange other parameters of the model do not change much. Allowing β todecrease below the value .9971 would have the primary effect on our resultsof increasing the risk-free rate above the already excessive value of 2.5 %

20When σ = 0 (the expected utility, rational expectations case) we can scale the statevariables to account for geometric growth without affecting the subsequent analysis.However, when σ < 0, the same transformation has the effect of imposing a time-varyingrisk adjustment. This problem does not arise when the single period utility function hasa different form, say logarithmic. In order to preserve the tractability of the quadraticspecification, we have decided to proceed despite this problem.

Page 60: Maskin ambiguity book mock 2 - New York University

52 Chapter 3. Robust Permanent Income and Pricing

Table 2

Parameter Estimates

Habit persistence No habit persistencerisk free rate .025 .025

β .997 .997δh .682λ 2.443 0α1 .813 .900α2 .189 .241φ1 .998 .995φ2 .704 .450μd 13.710 13.594cd .155 .173cd∗ .108 .098

2× LogLikelihood 779.05 762.55

per year. Therefore, we chose to fix β at .9971.

In Table 2 we report our estimates for the parameters governing the en-dogenous and exogenous dynamics. In Figure 3 we report impulse responsefunctions for consumption and investment to innovations in both compo-nents of the endowment process. For sake of comparison, we also reportestimates from a no habit persistence (λ = 0) model in Table 2, and theresulting impulse response functions in Figure 4.

Notice that the persistent endowment shock process contributes muchmore to consumption and investment fluctuations than does the transitoryendowment shock process.

To assess the statistical evidence for habit persistence, in Figure 5a wegraph twice the concentrated log likelihood as a function of the habit per-sistence parameter. Notice the asymmetry of this function, which has amuch steeper descent towards zero. A likelihood-based confidence intervalcan be deduced by comparing the likelihood deterioration to critical val-ues obtained from the chi-square one distribution. Thus, while values ofλ near zero are implausible, values considerably larger than the maximumlikelihood values are harder to dismiss.21 Figure 5b shows the values of

21The parameter δh is not identified when λ = 0.

Page 61: Maskin ambiguity book mock 2 - New York University

3.4. Estimation 53

5 10 15 20 25 30 35 40 45 500

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

5 10 15 20 25 30 35 40 45 500

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

Figure 3.3: Panel A: Impulse response of investment (circles) and consump-tion (line) to innovation in transitory endowment process (d), at maximumlikelihood estimate of habit persistence. Panel B: Impulse response of in-vestment (circles) and consumption (line) to innovation in persistent shock(d∗), at maximum likelihood estimate of habit persistence.

5 10 15 20 25 30 35 40 45 500

0.05

0.1

0.15

0.2

0.25

5 10 15 20 25 30 35 40 45 500

0.05

0.1

0.15

0.2

0.25

Figure 3.4: Panel A: Impulse response of investment (circles) and consump-tion (line) to innovation in transitory endowment (d), no habit persistence.Panel B: Impulse response of investment (circles) and consumption (line)to innovation in persistent shock (d∗), no habit persistence.

Page 62: Maskin ambiguity book mock 2 - New York University

54 Chapter 3. Robust Permanent Income and Pricing

0 1 2 3 4 5 6762

764

766

768

770

772

774

776

778

780

0 1 2 3 4 5 6 7 80.6

0.65

0.7

0.75

0.8

Figure 3.5: Panel A: Twice log likelihood, the coordinate, as a functionof λ, the ordinate (other parameters being concentrated out). Panel B:Maximum likelihood δh, the coordinate, as a function of λ, the ordinate.

the depreciation parameter δh as a function of the λ obtained after con-centrating the likelihood function. Estimates of the depreciation parameterdecrease as λ approaches zero, but remain around .65, within the moreplausible range of λ’s.

We put our estimates of the habit persistence parameters, λ and δh, intoperspective by comparing them with ones emerging from other empiricalstudies of aggregate U.S. data. Heaton (1993) finds a comparable valueof λ, but a higher depreciation factor δh using a permanent income modelwithout preference shocks fit to consumption. Heaton also notes that his δhis estimated very imprecisely.22 As an extension to this work, Heaton (1995)estimates a power utility, habit persistence model using consumption andasset market data. In this alternative formulation, he provides evidence forlarger values of λ and a larger depreciation factor δh. Again the estimateof δh has a large standard error. From Heaton’s work, we see that morepronounced habit persistence is estimated only when it is offset in the shortrun by local durability, a source of dynamics that we ignore. Recently,Boldrin et al. (1995) find smaller values of λ and δh than ours, althoughthey model production in a different and maybe more interesting way thanwe. In contrast to Heaton (1995) and Boldrin et al. (1995)), our estimates

22Like Christiano et al. (1991), Heaton (1993) also studies the implications of timeaggregation, which we abstract from, and at the same time he allows for local durabilityin a continuous-time formulation of the model.

Page 63: Maskin ambiguity book mock 2 - New York University

3.5. Asset Pricing 55

of habit persistence embody no asset market implications beyond one forthe risk free interest rate.

3.5 Asset Pricing

For the purposes of decentralization, we regard the robust (or risk-sensitive)solution to the permanent income model as the solution to an optimalresource allocation problem. This view point permits us to compute theequilibrium stochastic process of quantities before deducing the prices thatclear a competitive security markets. We follow Lucas (1978) in assuming alarge number of identical agents who trade in security markets. We can priceassets by treating the consumption process that solves the robust permanentincome model as though it were an endowment process. Because agents areidentical, equilibrium prices become shadow prices that leave consumerscontent with that ‘endowment process.’ The pricing implications underrobustness are slightly different than those under risk-sensitivity. We willproceed in this section by assuming risk-sensitivity and pointing out wherethe analysis would differ under robustness.

The state for the model is xt = [ht−1 kt−1 z′t]′. The equilibrium con-

sumption and service processes can be represented as cet = Scxt, set = Ssxt.

Represent the endowment and preference shock processes as dt = Sdxt, bt =Sbxt. The equilibrium law of motion for the state has representation

xt+1 = Aoxt + Cwt+1. (3.29)

The value function at the optimal allocation can be represented as Uet =

x′tΩxt + ρ where

Ω = −(Ss − Sb)′(Ss − Sb)/2 + βΩ, (3.30a)

ρ = βρ, (3.30b)

and Ω satisfies (3.6a), with A evaluated at Ao.

Key subgradient inequality

We begin our analysis of asset pricing by computing the current time tprice of a state-contingent claim to utility Ut+1 tomorrow. This componentof pricing is trivial when preferences are represented as the usual recursive

Page 64: Maskin ambiguity book mock 2 - New York University

56 Chapter 3. Robust Permanent Income and Pricing

version of the von Neumann-Morgenstern specification, but is nontrivialin the case of risk sensitivity. The pricing of state-contingent utility willbe a key ingredient for pricing state-continent consumption services tomor-row and ultimately for the pricing of multi-period securities that are directclaims on consumption goods. Let st be any service process measurablewith respect to Jt, and Ut be the associated utility index. For purposes ofvaluation, Appendix A establishes the following subgradient inequality:

Rt(Ut+1)−Rt(Uet+1) ≤ TtUt+1 − TtUe

t+1, (3.31)

whereTtUt+1 ≡ E(Vt+1Ut+1|Jt)/E(Vt+1|Jt), (3.32)

andVt+1 ≡ exp(σUe

t+1/2) . (3.33)

As elaborated further below, the operator Tt acts much like a conditionalexpectation.23 Combining (3.31) with the familiar gradient inequality forquadratic functions, it follows that

Ut − Uet ≤ (st − set )Ms

t + βTt(Ut+1 − Uet+1), (3.34)

whereMs

t ≡ (bt − set ). (3.35)

If we regard the marginal utility of services Mst as the price for time t ser-

vices, then (34) states that any pair (st, Ut+1) that is preferred to (set , Uet+1)

costs more at time t. This justifies treating Mst as the equilibrium time t

price of services, and using βTt to value time t+ 1 state-contingent utility.The Tt operator can be computed as the conditional expectation of the

state in the transformed transition equation:

xt+1 = Axt + Cwt+1 , (3.36)

where C satisfiesCC ′ = C(I − σC ′ΩC)−1C ′ (3.37)

and A is given by (12). Given the matrices A and C, asset prices can becomputed using the algorithms described in Hansen and Sargent (1996).

23Depicting prices of derivative claims using distorted expectations is a common tech-nique in asset pricing (e.g., see Harrison and Kreps (1979a)). In our investigation and inEpstein and Wang (1994), the distortion is also needed to price state-contingent utility.

Page 65: Maskin ambiguity book mock 2 - New York University

3.5. Asset Pricing 57

Formula (37) shows that when σ < 0 and Ω is negative semidefinite, theconditional variance associated with the operator Tt is always greater thanor equal to CC ′, because an identity matrix is replaced by a larger matrix(I−σC ′ΩC)−1. Thus, to interpret Tt as a conditional expectation operatorrequires both a pessimistic assignment of the conditional mean for the futurestate vector and an increase in its conditional variance.24

We can interchange the risk sensitivity and the uncertainty aversioninterpretations of the optimal resource allocation problem. As shown byEpstein and Wang (1994), equilibrium asset prices can be deduced by re-ferring to the ‘pessimistic beliefs’ that implement optimal decisions. Forthe uncertainty aversion interpretation, the counterpart to the Tt operatoris the distorted conditional expectation operator, call it Et, induced by thestate transition equation of formula (11). This transition law distorts theconditional mean, but not the conditional variance.25

Pricing multi-period streams

The valuation of the state-contingent utility can be used to evaluate fu-ture consumption services. Construct a family of operators by sequentialapplication of Tt:

St,τ = TtTt+1 . . .Tt+τ−1 (3.38)

where St,0 is the identity map. Like Tt,St,τ can be interpreted as a con-ditional expectation under a transformed conditional probability measureexcept that St,τ is a time t conditional expectation applied to random vari-ables that are measurable with respect to Jt+τ .

In the permanent income model below, the consumption good is a bundleof claims to future consumption services. We can use the equilibrium prices

24 It follows from James (1992) that this covariance correction vanishes in the con-tinuous time formulation of the problem. Instead the original covariance structure isused.

25Epstein and Wang (1994) consider different ways of introducing Knightian uncer-tainty, including ones in which there is an important difference between the game withtime zero commitment and the game with sequential choice. Their specification of Knigh-tian uncertainty can result in two-person games in which the ‘beliefs’ are not unique.This leads them to a form of price indeterminacy, which they link to empirical findings ofexcess volatility. In our setup, the ‘beliefs’ turn out to be unique and price indeterminacyis absent.

Page 66: Maskin ambiguity book mock 2 - New York University

58 Chapter 3. Robust Permanent Income and Pricing

of services to deduce corresponding prices of consumption goods. Thus,consider any process {st} with components in L2

0, and let {Ut} denote theassociated utility process. Let {Ue

t } denote the utility process associatedwith the equilibrium service process {set}. Then by iterating on (34), wefind

Ut − Uet ≤

∞∑τ=0

βτSt,τ (Mst+τst+τ )−

∞∑τ=0

βτSt,τ (Mst+τs

et+τ ) . (3.39)

Inequality (39) says that whenever {st} is strictly preferred to {set} as re-flected by the associated time zero utility index, (Ut > Ue

t ), it also costsmore. Hence {set} is a solution to the consumer’s intertemporal optimiza-tion problem when the time t value of {st} is computed according to theformula

∑∞τ=0 β

τSt,τ (Mst+τst+τ ). This justifies regarding this sum as the

price of an asset offering a claim to the stream of services {st}.If services are not traded ‘unbundled’, but only as bundles of state and

date contingent claims, via the consumption goods, then what we reallywant is a consumption goods counterpart to (39), namely:

Ut − Uet ≤

∞∑τ=0

βτSt,τ (Mct+τct+τ )−

∞∑τ=0

βτSt,τ (Mct+τc

et+τ ) . (3.40)

A formula for the indirect marginal utility of consumption is deduced byascertaining the implicit service flow associated with that a unit of consump-tion and then pricing that intertemporal bundle. Using this argument, itfollows that Mc

t =Mcxt where:

Mc ≡ [(1 + λ) + (1− δh)

∞∑τ=1

βτ (δh)τ (−λ)(A)τ ](Sb − Ss) . (3.41)

Single-period security pricing

A large body of empirical research has focused on pricing one-period secu-rities. Imagine purchasing a security at time t at a price qt, holding it forone time period, then collecting the dividend and selling it at time t+1 fora total payoff pt+1 of the consumption good. The payoff and price shouldsatisfy:

qt = Tt{[βMct+1/Mc

t ]pt+1} (3.42)

Page 67: Maskin ambiguity book mock 2 - New York University

3.5. Asset Pricing 59

where Mct = Mcxt is the marginal utility of consumption and the formula

for Mc is given in (41). Under robustness, the price-payoff relationshipwould be given by:

qt = Et{[βMct+1/Mc

t ]pt+1} (3.43)

where Et is the distorted conditional expectations operator described above.A formula for qt in terms of the original conditional expectation operatoris:

qt = E(mt+1,tpt+1 | Jt) (3.44)

where the exact specification ofmt+1,t will depend upon whether the robust-ness or the risk-sensitivity interpretation is adopted. The two alternativeswill be explored in the next section. The random variable mt+1,t has aninterpretation as a one-period stochastic discount factor, or alternatively asan equilibrium intertemporal marginal rate of substitution for the consump-tion good. The next section will show how risk-sensitivity and uncertaintyaversion are reflected in the usual measure of the intertemporal marginalrate of substitution being scaled by a random variable (that depends on theinterpretation – robustness or risk-sensitivity) with conditional expectationone. We use this multiplicative adjustment to the stochastic discount factorto increase its variability and to enhance risk premia.

From the one-period stochastic discount factor, we can easily deducethe ‘market price of risk.’ For simplicity, think of a one period payoff on anasset as a bundle of two attributes: its conditional mean and its conditionalstandard deviation. In our environment, these two attributes only partiallydescribe asset payoffs. Furthermore, we cannot extract unique prices of theattributes, in part because one of the attributes, the standard deviation, isa nonlinear function of the asset payoff. Nevertheless, like any stochasticdiscount factor model, ours conveys information about how these attributesare valued (see Hansen and Jagannathan, 1991). To see this, consider thecovariance decomposition of the right-hand side of (42):

qt = Et(pt+1)Et(mt+1) + covt(mt+1, pt+1),

where covt denotes the covariance conditioned on time t information. Ap-plying the Cauchy-Schwarz Inequality, we obtain the price bound:

qt ≥ Et(pt+1)Et(mt+1)− stdt(mt+1)stdt(pt+1).

Page 68: Maskin ambiguity book mock 2 - New York University

60 Chapter 3. Robust Permanent Income and Pricing

1975 1980 1985 1990 1995−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

1975 1980 1985 1990 1995−1.5

−1

−0.5

0

0.5

1

1.5

Figure 3.6: Panel A: Estimated process for transitory endowment, dt. PanelB: Estimated process for the permanent endowment process, d∗.

where stdt denotes the standard deviation conditioned at time t. Along theso-called ‘efficient frontier,’ the ‘price of risk’ relative to expected return isgiven by the ratio: stdt(mt+1,t)/Et(mt+1,t) which is commonly referred to asthe market price of risk. This ratio is one way to encode information abouthow risk averse consumers are at the equilibrium consumption process.26

Appendix C describes how to compute the stochastic process for the marketprice of risk when σ is negative under risk-sensitivity.

3.6 Quantifying Robustness from the

Market Price of Risk

Because it is not identified from data on consumption and investment, otherinformation must be used to restrict the risk sensitivity parameter. In thissection, we study how risk sensitivity alters the predicted market price ofrisk. We then exploit the connection between risk sensitivity and Knightianuncertainty by computing the magnitude of the specification errors neededto generate implications comparable to various settings of the parameter

26Gallant, Hansen, and Gallant et al. (1990), Hansen and Jagannathan (1991) andCochrane and Hansen (1992) interpret the equity premium puzzle as the large marketprice of risk implied by asset market data. The market price of risk can be expressedas the least upper bound on Sharpe ratios |Etrt+1 − rft | stdt(rt+1) where rt+1 is a one-

period return and rft is the one-period riskless return. Thus the Sharpe ratio for theone-period return on equity gives a lower bound on the market price of risk.

Page 69: Maskin ambiguity book mock 2 - New York University

3.6. Quantifying Robustness from the Market Price of Risk 61

σ. In particular, we show how allowing for mistakes transmits to the equi-librium market price of risk. We are attracted to the interpretation interms of robustness as a way of confronting an observation of Weil (1989),who noted how market prices of risk can be enhanced by risk sensitivity,but at the cost of making the implied risk aversion ‘extreme.’ Risk aver-sion has typically been measured by studying choice problems with uniquespecifications of the probability laws. That our risk sensitivity parameterhas a nearly equivalent interpretation as reflecting aversion to uncertaintyraises hopes for reinterpreting implausibly large estimates of risk aversionas coming partly from a ‘preference for robustness.’

Market price of risk

While the risk-sensitivity parameter σ and the preference curvature param-eter μb are not identifiable from quantity data, we now show that theyaffect the market price of risk. In Tables 3(a) and 3(b), we report me-dian market prices of risk as functions of the risk sensitivity parameter forthree choices of μb. The tables are constructed using the implied statevectors obtained by applying the Kalman filter. Where yt = [ct it]

′, and

xt =[ht−1 kt−1 1 dt dt−1 d∗t d∗t−1

]′, we used the Kalman filter to com-

pute E(xt|yt, yt−1, ..., y1) for each time t in our sample. It can be shownthat the conditional covariance of the time t state vector given time t in-formation converges to zero, implying that the ‘hidden’ states should beapproximately revealed by the observations. Deviations around the meansof the implied endowment processes under habit persistence are graphedin Figure 6. We used these fitted states to calculate the median marketprice of risk over the sample. In Tables 3(a) and 3(b), we report resultsfor the model estimated with and without habit persistence, respectively.The tables show how we can achieve a ‘target’ market price of risk withalternative (σ, μb) pairs.

Given our high value of the risk free rate (2.5% per annum) and samplingerror in estimates of the market price of risk, model predictions in the rangeof .075 − .150 seem a reasonable “target.”27 Thus in the absence of risk

27It is known from the work of Hansen and Jagannathan (1991) that achieving amarket price of risk target is weaker than satisfying the consumption Euler equation. Forexample, we have not enabled the model to explain one of the glaring empirical failuresof consumption-based asset pricing models: the observed lack of correlation between the

Page 70: Maskin ambiguity book mock 2 - New York University

62 Chapter 3. Robust Permanent Income and Pricing

Table 3(a)

Median Market Price of Risk (with habit persistence)

μb σ: 0 -.00005 -.0001 -.0001518 0.0610 0.0739 0.0869 0.100024 0.0226 0.0575 0.0927 0.128130 0.0139 0.0708 0.1283 0.186536 0.1000 0.0890 0.1691 0.2509

Table 3(b)

Median Market Price of Risk (no habit persistence, λ = 0)

μb σ: 0 -.00005 -.0001 -.0001518 0.0182 0.0221 0.0261 0.030024 0.0068 0.0173 0.0279 0.038530 0.0042 0.0213 0.0385 0.055736 0.0030 0.0268 0.0506 0.0745

sensitivity, for the μb specifications we consider, the market prices of riskare very small. The market price of risk can be raised by reducing furtherthe parameter μb, but at the cost of enhancing the probability of satiationin the quadratic preference ordering. But increasing |σ| pushes the modelpredictions towards more empirically plausible market prices of risk withoutaltering the satiation probabilities.28 Roughly speaking, introducing habitpersistence triples (or multiplies by (1+λ)) the market price of risk across allof the (μb, σ) specifications that we study. This conclusion from Table 3(b)emerges from the estimates from the second (No Habit Persistence) column

implied intertemporal marginal rates of substitution and stock market returns. For adescription of how to build statistical tests based on market price of risk targets, seeBurnside (1994), Cecchetti et al. (1994), and Hansen et al. (1995).

28It can be argued that risk sensitivity is simply repairing a defect in quadratic prefer-ences, a criticism to which we are certainly vulnerable in this paper. The usual measure

of relative risk aversion in the absence of habit persistence is −cU ′′(c)U ′(c) . In the case of

our quadratic preferences, this is given by c(b−c) , which requires that the bliss point pro-

cess be twice the consumption level to attain a risk aversion coefficient of one. For aninvestigation of risk sensitive preferences and logarithmic utility, see Tallarini (2000b).

Page 71: Maskin ambiguity book mock 2 - New York University

3.6. Quantifying Robustness from the Market Price of Risk 63

of Table 2. There the parameters governing the exogenous dynamics areadjusted to match the temporal covariations of consumption and investmentas closely as possible.

Holding fixed σ and increasing the preference translation parameter μbalso enhances the market price of risk except when σ is close to zero. Tounderstand this finding, note that under risk sensitivity, the stochastic dis-count factor can be represented as the product

mt+1,t = mft+1,tm

rt+1,t (3.45)

where

mft+1,t ≡ β

Mct+1

Mct

is the ‘familiar’ intertemporal marginal rate of substitution in the absenceof risk sensitivity and

mrt+1,t ≡

exp(σUet+1/2)

E[exp(σUet+1/2)|Jt]

.

(See Appendix C for an explicit formula for mt+1,t in terms of the equilib-rium laws of motion.) When σ = 0 this second term is one, and it alwayshas conditional expectation equal to one. The latter property is what per-mits us to interpret this second factor as a pessimistic ‘distortion’ of theconditional expectation operator. Finally, recall that the market price ofrisk is simply the (conditional) standard deviation of mt+1,t divided by its(conditional) mean.

When μb is increased and σ = 0, the single-period utility function iscloser to being linear (risk neutral) over the empirically relevant portion ofits domain. As a consequence, the market price of risk decreases as μb isincreased (see the first columns of Tables 3(a) and 3(b)).

Consider next cases in which {mft+1,t} is much smoother than {mr

t+1,t},so that the market price of risk is approximately std(mr

t+1,t|Jt). The (con-ditional) standard deviation of {mr

t+1,t} will be large when the distortion inthe conditional expectation operator is large. As μb increases, the represen-tative consumer’s consumption is moved further away from his ideal pointand hence the scope for pessimism is more pronounced. Thus increasing μbenhances the market price of risk.

More generally, the overall impact of increasing μb for a fixed σ is am-biguous except when σ = 0 and depends on the particular features of the

Page 72: Maskin ambiguity book mock 2 - New York University

64 Chapter 3. Robust Permanent Income and Pricing

calibrated economy. For the calculations reported in Tables 3(a) and 3(b),the median market price of risk increases with μb except when σ is nearzero.

Market price of risk and robustness

As we have just seen, risk sensitivity introduces an additional (multiplica-tive) factor mr

t+1,t into the stochastic discount factor. This factor changesonly slightly when risk sensitivity is reinterpreted as a preference for ro-bustness. When interpreted as a preference for robustness, we can abstractfrom the covariance enhancement of the shocks. However, relative to thosereported in Tables 3(a) and 3(b), the numbers for the market price of riskbarely change when computed assuming Knightian uncertainty rather thanrisk-sensitive preferences.

Let mut+1,t denote the resulting multiplicative factor, so that the com-

posite stochastic discount factor is:

mt+1,t = mut+1,tm

ft+1,t.

To aid our understanding, suppose initially that mft+1,t is constant, so the

market price of risk is given by:

mprt = std(mut+1,t|Jt).

The first columns of Tables 3(a) and 3(b) suggest that the conditional stan-dard deviation of mu

t+1,t is indeed close to zero for the preference specifica-tion used in our calculations.

Under our particular specification of uncertainty aversion, recall thatasset prices are computed using the ‘pessimistic’ view of tomorrow’s shockvector: wt+1 is normally distributed with conditional mean vt and covari-ance matrix I where vt is computed from the solution to the two-persongame. It follows that

mut+1,t =

exp[−(wt+1 − vt)′(wt+1 − vt)/2]

exp(−wt+1′wt+1/2)

,

which is the density ratio of the ‘distorted’ relative to the ‘true’ probabilitydistribution. By a straightforward calculation, it follows that

Et[(mut+1,t)

2] = exp(v′tvt),

Page 73: Maskin ambiguity book mock 2 - New York University

3.6. Quantifying Robustness from the Market Price of Risk 65

and by construction

Et(mut+1,t) = 1.

Therefore,

std(mut+1,t|Jt) = [exp(v′tvt)− 1]1/2 ≈ |vt|

for small distortions. In other words, the market price of risk is approxi-mately equal to the magnitude of the time t specification error. Our marketprices of risk calculated under uncertainty aversion are only slightly smallerthan those computed under risk sensitivity due to the small variance ad-justment associated with the operator Tt.

To understand better this approximate mapping from the permissiblespecification errors to the market price of risk, consider the following. Underthe correct model specification, the shock vector is normally distributedand is normalized to have the identity as its covariance matrix. Suppose amisspecification takes the form of a conditional mean distortion of say, 10%times a direction vector with Euclidean norm one. This direction vectorhas the same dimension as the shock vector and picks the direction of theconditional mean distortion. This 10% distortion would alter a Gaussianlog-likelihood function by:

.005 =vt · vt2

times the number of time periods in the sample. Thus a distortion of thismagnitude would be hard to detect using a sample like ours, which consistsof a little more than one hundred time periods. Having economic agentsallow for distortions of this magnitude gives a market price of risk of approx-imately .10, assuming that there is no variation in the usually constructedstochastic discount factor. The fact that a mistake in forecasting wt+1 couldlead to a direct enhancement of the market price of risk by the magnitudeof the mistake is perhaps not surprising. What is conveyed here is thatconcern for robustness approximately directs the associated pessimism toreturns that are conditionally mean–standard deviation efficient.

More generally, we expect that |vt| is an upper bound on the approx-imate enhancement to the market price of risk caused by the concern forrobustness. Given the ‘pessimistic’ construction of vt, we expect the twocomponents mu

t+1,t and mft+1,t of the stochastic discount factor to be pos-

itively correlated. This upper bound is closer to being attained when thetwo terms are highly positively correlated.

Page 74: Maskin ambiguity book mock 2 - New York University

66 Chapter 3. Robust Permanent Income and Pricing

Table 4(a)

Median vd∗,t *(with habit persistence)

μb σ: 0 -.00005 -.0001 -.0001518 0 -.0129 -.0259 -.0388

(0,0) (-.0166,-.0096) (-.0331,-.0191) (-.0498,-.0287)24 0 -.0349 -.0698 -.1048

(0,0) (-.0385,-.0315) (-.0771,-.0631) (-.1158,-.0947)30 0 -.0569 -.1138 -.1708

(0,0) (-.0605,-.0535) (-.1211,-.1071) (-.1818,-.1607)36 0 -.0788 -.1578 -.2368

(0,0) (-.0825,-.0754) (-.1650,-.1510) (-.2478,-.2267)

Table 4(b)

Median vd,t *(with habit persistence)

μb σ: 0 -.00005 -.0001 -.0001518 0 -0.0002 -0.0004 -0.0005

(0,0) (-.0002,-.0001) (-.0005,-.0003) (-.0007,-.0004)24 0 -0.0005 -0.0010 -0.0015

(0,0) (-.0005,-.0004) (-.0011,-.0009) (-.0016,-.0013)30 0 -0.0008 -0.0016 -0.0024

(0,0) (-.0009,-.0008) (-.0017,-.0015) (-.0026,-.0023)36 0 -0.0011 -0.0022 -0.0033

(0,0) (-.0012,-.0011) (-.0023,-.0021) (-.0035,-.0032)

*Note: minimum and maximum values are in parenthesis below eachmedian.

Page 75: Maskin ambiguity book mock 2 - New York University

3.6. Quantifying Robustness from the Market Price of Risk 67

1975 1980 1985 1990 1995−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

Figure 3.7: Estimated innovation to d∗t (solid line), distorted mean vd∗,t(dashed line), and sum of innovation and distorted mean (dotted line) forμb = 30, σ = −.0001.

Measuring Knightian uncertainty

Let vd,t and vd∗,t be the two components of vt associated with the innovationto the two endowment shocks. Equation (10) makes these ‘worst case’specification errors linear functions of the current Markov state. We reportmeasures of the sizes of the vd∗,t and vd,t processes in Tables 4(a) and 4(b).The tables report the medians in vt as well as minima and maxima over thesample. Like the market prices of risk, these measures are evaluated at theestimated values of the shock processes (dt, d

∗t ) over the estimation period.

Recall from our previous discussion that the enhancement of the marketprice of risk caused by Knightian uncertainty is approximately |vt|. Thetables show how |v| is mostly composed of specification errors in the shockfor the persistent component of income d∗t . Figure 7 displays time seriesestimates of d∗t and vd∗,t for μb = 30, σ = −.0001. Relative to the transitiondensity induced by the undistorted model, the distorted mean is a randomwalk, as shown in the proof of the observational equivalence proposition.

Page 76: Maskin ambiguity book mock 2 - New York University

68 Chapter 3. Robust Permanent Income and Pricing

3.7 Intertemporal Mean-Risk tradeoffs

The market price of risk reported above conveys information about the one-period tradeoff between the mean and standard deviation of asset returnsas encoded in shadow prices. We now investigate the implied intertem-poral tradeoff between means and standard deviations associated with ouralternative configurations of μb and σ. Specifically, given a proportionateincrease in the innovation standard deviation of an endowment shock, weaim to compute what proportionate increase in the conditional mean ofthat component of the endowment is required to keep the social planneron the same indifference curve. Initially we answer this question ‘locally’by considering small interventions. This imitates in part local measures ofrisk aversion. However, local measures of risk aversion are often computedaround certainty lines. In our case, we localize around the solution to thepermanent income optimal resource allocation problem. Our localizationpermits us to depict risk-aversion as the ratio of two appropriately chosenintertemporal prices. Thus, like the market price of risk, our intertemporalmeasure of risk aversion also can be interpreted as a price ratio.

We supplement this local experiment with a global one in which thestandard deviation of the shock is set to zero. The intertemporal vantagepoint adopted in this section affects the character of the implied measures ofrisk aversion. The calculations will be conducted using the ‘risk-sensitive’decentralization. A corresponding ‘robust’ decentralization gives rise toessentially the same numbers.

Local measure of risk aversion

We form a local intertemporal tradeoff between the standard deviation andthe mean of the endowment about the equilibrium process for consumptionand investment. Specifically, given a proportional enhancement of standarddeviation of the endowment innovation in all future time periods, we aimto compute what proportional mean increase in the endowment is requiredto keep the social planner on the same indifference curve, at least locally.To perform this computation we attain two ‘value expansions,’ both ofwhich we describe below. The first-order terms or ‘derivatives’ in theseexpansions can be interpreted as prices of appropriately chosen infinitelylived securities.

Page 77: Maskin ambiguity book mock 2 - New York University

3.7. Intertemporal Mean-Risk tradeoffs 69

We implement a ‘local’ modification in the state evolution equation byadopting the parameterization of the law of motion starting for j ≥ 0 as

xεt+j+1 = A0xεt+j + (C + εG)wt+1+j ,

where ε is a small positive scalar. A positive ε initiates a change in theinnovation standard deviation starting with date t + 1. Here the matrixG is designed to select one of the endowment innovations. For example ,it can be identical to C except with zeroes for entries associated with theother endowment shock. Let Ut = W ε(xt) denote the value function forresulting control problem indexed by ε; we take W 0 as the value functionfor a baseline control problem (say the risk sensitive permanent incomemodel). Let

xt+1 = A0xt + Cwt+1

be the corresponding ε = 0 state evolution equation when the optimalcontrol law is imposed. We aim to compute an expansion of the form:

W ε(x) =W 0(x) + εWd(x) + o(ε)

where o(ε) converges to zero as ε tends to zero uniformly on compact subsetsof the state space. We will derive an asset pricing formulation of Wd that,among other uses, facilitates calculations.

A corresponding experiment delivers a ‘robust control’ expansion. Alterthe intervention that takes place at time t by introducing ‘mistakes’ in theconditional mean. Now suppose instead that starting for j ≥ 0 we have:

xεt+j+1 = A0xt+j + (C + εG)(wt+1+j + vt+j),

As before, the parameter σ is used to restrain mistakes, rather than to makea risk adjustment in the utility recursion. This perturbed system givesrise to an expansion that, from a quantitative vantage point, is virtuallyidentical to that we report. The subsequent asset pricing interpretation alsoapplies, provided that we use the prices for the ‘robust’ decentralization inplace of the prices of the ‘risk sensitive’ decentralization.

Of course, W ε is a translated quadratic function of the state vector. Wewrite this function as:

W ε(x) = x′Ωεx+ ρε.

The function Wd is quadratic:

Wd(x) = x′Ωdx+ ρd.

Page 78: Maskin ambiguity book mock 2 - New York University

70 Chapter 3. Robust Permanent Income and Pricing

In effect, Ωd is the derivative with respect to ε of the matrix function Ωε,evaluated at ε = 0. Similarly, ρd is the derivative with respect to ε ofthe scalar function ρε. Computations of these derivatives are simplified bythe fact that we can abstract from the role of optimization of the controlvector for small changes in ε. This familiar property follows from the first-order conditions satisfied by the optimal control law, which imply that thecontribution to the value function expansion is second order in ε. Hencewe can compute the derivatives as if we are holding fixed the control lawand hence the state evolution matrix A0. The matrix Ωd can be computedeasily as the solution of a Sylvester equation.

Measuring risk aversion by asset pricing

Holding fixed the equilibrium law of motion for consumption, c0t = Scxt,we can use our asset pricing formula to evaluate how utility responds tochanges in ε. To compute the desired ‘derivative’ of Ut with respect to ε,we begin by forming a new state vector process:

xεt+j − x0t+j = εyt+j (3.46)

where {yt} evolves according to

yt+j+1 = A0yt+j +Gwt+1+j

with yt = 0. Notice the linear scaling in ε. A consumption process associ-ated with ε > 0 is:

cεt+j = c0t+j + εScyt+j.

It follows from our subgradient inequality (40) that

W ε(xt)−W 0(xt)

ε≤

∞∑j=1

βjSt,j(Mct+jScyt+j).

It can be verified that as ε declines to zero, this becomes an equality. There-fore, we can evaluate the desired ‘derivative’ by using the following assetpricing formula:

Wd(x) =∞∑j=1

βjSt,j(Mct+jScyt+j)

Page 79: Maskin ambiguity book mock 2 - New York University

3.7. Intertemporal Mean-Risk tradeoffs 71

This is the time t price, scaled in units of marginal utility, of an infinitely-lived security with dividend {Sc yt+j}. 29

To compute the local mean-risk tradeoff, we also estimate the utilitychange associated with a small change in the conditional mean of the en-dowment. We capture this small change as follows:

xδt+1 = A0xt + δDxt + Cwt+1

xδt+j+1 = A0xδt+j + Cwt+1+j ,

for j = 1, 2, . . .. This envisions the change in the conditional expectation asoccurring at date t + 1 continuing into the future and leads us to the timet value-function expansion:

W δ(x) = W 0(x) + δWd(x) + o(δ).

Here Wd is a quadratic function of the state vector, which we represent asx′Ωdx.

Imitating our earlier derivation, we form:

xδt+j − xt+j = δAj−10 Dxt

Notice the linear scaling in δ. The new consumption process can be ex-pressed as:

cδt+j = c0t+j + δScAj−10 Dxt.

From our subgradient inequality (39),

W δ(xt)−W 0(xt)

δ≤

∞∑j=1

βjSt,j(Mct+jScA

j−10 Dxt).

Again we can show that this subgradient is actually a gradient by drivingδ to zero. Therefore, our target derivative is given by:

Wd(xt) =∞∑j=1

βjSt,j(Mct+jScA

j−10 Dxt)

29To perform the computation,first form the state transition equation for the com-posite state (x0t

′ y′t)′. The transition equation has a block diagonal state matrix withdiagonal blocks A0. The counterpart to C is constructed by stacking C on top of G.Consumption will be formed by using a matrix (Sc 0) and the dividend will be formedby (0 Sc). Prices can now be computed recursively using a doubling algorithm.

Page 80: Maskin ambiguity book mock 2 - New York University

72 Chapter 3. Robust Permanent Income and Pricing

Table 5

Local Mean-Risk Trade-off

μb σ: 0 -.00005 -.0001 -.0001518 0.3182 0.4347 0.7358 1.923024 0.1179 0.3754 0.9482 3.101730 0.0723 0.4828 1.3775 4.717836 0.0522 0.6175 1.8423 6.4053

Table 6

Global Mean-Risk Trade-off

μb σ: 0 -.00005 -.0001 -.0001518 0.1267 0.1635 0.2460 0.486324 0.0564 0.1666 0.3709 0.920630 0.0355 0.2189 0.5495 1.428136 0.0258 0.2818 0.7393 1.9503

which is the (time t util) price of an infinitely-lived security with dividend{ScAj−1

0 Dxt}. Thus, {Ωd} solves a Sylvester equation.Using our two expansions, the compensation measure is:

δt = −x′tΩdxt + ρd

x′tΩdxt= −Wd(xt)

Wd(xt),

which we index by t to accommodate the change in vantage point as timepasses.

In Table 5, we report our (local) intertemporal measures of risk aversion.The effect of increasing (in absolute value) σ has a stronger effect on themean-risk trade-off than on the market price of risk (compare Table 5 toTable 3(a)). Increases in μb also have a slightly greater impact for thetrade-off calculation.30

30Increasing the market price of risk by enlarging μb has the virtue of further reducingthe probability of satiation. This would appear to increase the intertemporal substi-tutability of consumption. However, recall that μb does not appear in the permanentincome decision rule. Thus, by design we have not changed the consumption–savings be-

Page 81: Maskin ambiguity book mock 2 - New York University

3.8. Conclusions 73

We next verify the local nature of these computations by considering thefollowing experiment. Let ε = −1, which sets to zero the shock variancefor the endowment process. By extrapolating the local measures reportedin Table 5, the entries in this table should convey what fraction of the en-dowment the consumer would be willing to forego to achieve this reductionin volatility. Such an inquiry relates to Lucas’s 1987b quantification of thewelfare costs to fluctuations, except that we are using a permanent incomemodel that permits investment (see also Obstfeld, 1994 and Tallarini, 1998).From this vantage point, the numbers in Table 5 look to be enormous, par-ticularly for the larger (in absolute value) specifications of σ. However, thatextrapolation of our local measure turns out to be misleading. To see this,in Table 6 we report global numbers for the ε = −1 experiment that holdsfixed the permanent income decision rule for the two competing specifica-tions of the endowment process. The global mean-risk tradeoffs are muchsmaller by a factor ranging from two to four. Nevertheless, the tradeoffsremain quite large, except when σ is close to zero.31

3.8 Conclusions

Lucas (1975) warned us about theorists bearing free parameters. Havingheard Lucas, we devoted this paper to scrutinizing some of the implica-tions for prices and quantities of a single additional parameter designed toquantify a preference for robustness to specification errors. By exploitingthe connection between robustness and the risk-sensitivity specification ofJacobson (1973) and Whittle (1990), we have shown how to decentralizedynamic, stochastic general equilibrium models with a consumer who fearsmodel misspecification. Formulas for consumption, investment, and therisk-free interest rate are identical to ones coming from the usual perma-nent income models. We presented formulas for the market price of risk,

havior of the consumer as we change μb. On the other hand, some perverse implications‘off the equilibrium path’ can occur for large values of μb.

31The global numbers would be enhanced a little if we reoptimize when setting theendowment shock to zero. The solution to linear-quadratic problem is unappealing in thiscontext because with less uncertainty, capital ceases to be an attractive way to transformgoods from one period to the next. In light of this, it seems crucial to reoptimize subjectto a nonnegativity constraint on capital. Our imposition of the suboptimal ‘permanentincome’ consumption rule diminishes the impact of this nonnegativity constraint whilepossibly misstating the global tradeoff.

Page 82: Maskin ambiguity book mock 2 - New York University

74 Chapter 3. Robust Permanent Income and Pricing

then applied them to account for the market price of risk observed in U.S.data.

Like Brock and LeBaron (1996), Brock and Hommes (1994), Cochrane(1989), Marcet and Sargent (1989), and Krusell and Smith (1996), we canregard the consumer–investors in our economy as making ‘mistakes’, but asmanaging them differently than do those in these authors’ models.32 Ouragents are very sophisticated in how they accommodate possible mistakes:they base decisions on worse-case scenarios, following Gilboa and Schmei-dler (1989) and Epstein and Wang (1994).

In contrast to Cochrane (1989) and Krusell and Smith (1996), for ourpermanent income economy, the quantity allocations are observationallyequivalent to those in an economy in which no ‘mistakes’ are contemplated.This situation stems partly from the econometrician’s ignorance of the sub-jective discount factor. Like Epstein and Wang (1994) and Melino andEpstein (1995), we focus on how aversion to mistakes transmits into secu-rity market prices. We find that a conditional mean ‘mistake’ of x% of aunit norm vector for a multivariate standard normal shock process increasesthe market price of risk by approximately x/100.

We have concentrated on a robust interpretation of the permanent in-come model of consumption. The permanent income model seemed a nat-ural starting point for exploring the consequences of robust decision the-ory, partly because of its simplicity. Recent work by Carroll (1992) hasemphasized a departure from the permanent income model induced by pre-cautionary savings, low discount factors, and big utility costs to zero con-sumption.33 As we have emphasized, our reinterpretation of the permanentincome model also relies on smaller discount factors and precautionary sav-

32Cochrane (1989) and Krusell and Smith (1996) agents use decision rules that areperturbed by small amounts in arbitrary directions from optimal ones. Marcet andSargent (1989) agents correctly solve dynamic programming problems, but subject tosubtly misspecified constraints: they use estimated transition laws (usually of the correctfunctional forms) which they mistakenly take as non-random and time-invariant. SeeBrock and LeBaron (1996), especially their footnote 2, for a lucid explanation of a classof models that mix ‘adaptation’ – to induce local instability near rational expectationsequilibria – with enough ‘rationality’ to promote global attraction toward the vicinityof rational expectations. Brock and LeBaron (1996) and Brock and Hommes (1994)balance the tension between adaptation and rationality to mimic some interesting returnand volume dynamics.

33See Leland (1968) and Miller (1974) for important early contributions to the liter-ature on precautionary saving.

Page 83: Maskin ambiguity book mock 2 - New York University

3.8. Conclusions 75

ings. It does not, however, permit us to explore the ramifications of bigutility costs to zero consumption, which is central to the work of Carroll(1992) and others, and which requires nonquadratic objective functions.However, Anderson et al. (1999b) have shown how the connection betweenrisk sensitivity and robustness extends to discounted control problems withnonquadratic criteria and nonlinear, stochastic evolution equations. Theyformulate a recursive nonlinear robust control problem that applies readilyto consumption and savings decisions.

Maybe we take the representative agent paradigm too seriously. We usethe representative agent as a convenient starting point to understand theworkings of risk sensitivity and robustness in decentralized economies. Inother settings, we know how heterogeneity of preferences and incompleterisk sharing affect investment behavior and the market price of risk. In ourmodel (and Epstein and Wang’s, 1994), agents agree on the amount and lo-cation of the Knightian uncertainty. Thus, models like ours can contributean additional dimension upon which heterogeneity alters equilibrium quan-tities and prices.

Page 84: Maskin ambiguity book mock 2 - New York University

76 Chapter 3. Robust Permanent Income and Pricing

Appendix 3.A Subgradient Inequality

This appendix derives the subgradient inequality used for equilibrium pric-ing. Let Ue denote the original nonpositive random utility index, U anyother nonpositive random utility index and J a sigma algebra of events. Wewill show that

R(U)−R(Ue) ≤ E[V e(U − Ue) | J ]/E(V e | J) (3.47)

where

R(U) ≡ (2/σ) log{E[exp(σU/2) | J ]V e ≡ exp(σUe/2) . (3.48)

We assume that E[exp(σUe/2) | J ] and hence R(Ue) is finite with prob-ability one. Define h ≡ U − Ue, and let δ be any real number in (0,1).Interpret δ as determining the magnitude of a perturbation in direction h.In other words, the perturbation away from Ue under consideration is δh.

By the convexity of the exponential function:

exp[σ(Ue + hδ)/2]− exp(σUe/2) ≥ δh(σ/2)V e. (3.49)

This inequality remains true when computing expectations conditioned onJ , although either side may be infinite:

E{exp[σ(Ue + hδ)/2] | J} − E{exp(σUe/2) | J} ≥ δ(σ/2)E(V eh | J).(3.50)

Divide each side of (50) by E(V e | J):

E{exp[σ(Ue+hδ)/2] | J}/E{exp(σUe/2) | J}−1 ≥ δ(σ/2)E(V eh | J)/E(V e | J).(3.51)

Since 0 < δ < 1, (Ue + hδ) is a convex combination of Ue and U withweights (1− δ) and δ respectively. By the conditional version of the HolderInequality,

E{exp[σ(Ue + hδ)/2] | J} = E{[exp(σUe/2)]1−δ[exp(σU/2)]δ | J}≤ {E[exp(σUe/2) | J ]}1−δ{E[exp(σU/2) | J ]}δ.(3.52)

Page 85: Maskin ambiguity book mock 2 - New York University

3.B. Computing Prices for State-Contingent Utility 77

Combining (51) and (52) and dividing by δ, we have that

(1/δ){E[exp(σU/2) | J ]/E(V e | J)}δ − 1 ≥ (σ/2)E(V eh | J)/E(V e | J)(3.53)

To complete the derivation, we use the familiar approximation result forlogarithms:

limδ→0

(λδ − 1)/δ = log(λ) (3.54)

where the limit is from above. (This limit can be verified by applyingL’Hospital’s Rule or by using the series expansion for exp[δ log(λ)]). Takinglimits of the left side of (53) as δ declines to zero yields

log{E[exp(σU/2) | J ]} − log[E(V e | J)] ≥ (σ/2)E(V eh | J)/E(V e | J)(3.55)

The desired inequality (47) is obtained by multiplying both sides of (55) bythe negative number (2/σ) and reversing the inequality.

Appendix 3.B Computing Prices for

State-Contingent Utility

In this appendix, we provide a characterization of the operator Tt used inpricing state-contingent utility. The characterization relies on a restrictionthat the utility index Ue

t+1 be quadratic in a normally distributed statevector xt+1. For notational convenience, we will suppress superscripts andsubscripts.

Suppose that a utility index is quadratic in a normally distributed ran-dom vector x ∈ Rn:

U = x′Ωx+ ρ (3.56)

where Ω is a negative semidefinite matrix and ρ ≤ 0. In addition, supposethat

x = μ+ Cw (3.57)

where w is normally distributed random vector with mean zero and covari-ance matrix I. Recall that Tt can be interpreted as a conditional expecta-tion with a change of probability measure. In terms of the notation just

Page 86: Maskin ambiguity book mock 2 - New York University

78 Chapter 3. Robust Permanent Income and Pricing

developed, the new probability measure is constructed using V/EV as aRadon-Nikodym derivative where

V = exp(σU/2) ∝ exp(σw′C ′ΩCw/2 + σw′C ′Ωμ). (3.58)

We can compute expectations with respect to the transformed measureas follows. Let φ be any bounded, Borel measurable function mappingRm → R. Then

E[V φ]/EV ∝∫φ(w) exp(σw′C ′ΩCw/2 + σw′C ′Ωμ) exp(−w′w/2)dw.

(3.59)Note that

σw′C ′ΩCw/2 + σw′C ′Ωμ − w′w/2 = −w′(I − σC ′ΩC)w/2+ w′(I − σC ′ΩC)(I − σC ′ΩC)−1σC ′Ωμ. (3.60)

Consequently, the operator on the left side of (59) can be evaluated byintegrating φ with respect to a normal density with mean vector:

μ ≡ (I − σC ′ΩC)−1σC ′Ωμ (3.61)

and covariance matrix

Σ ≡ (I − σC ′ΩC)−1. (3.62)

The corresponding mean vector and covariance matrix for x are μ+Cμ andCΣC ′, respectively. The Tt operator will only be well defined so long asσC ′ΩC < I.

Appendix 3.C Computing the Conditional

Variance of the Stochastic

Discount Factor

From Eq. (45), we know that mt+1,t, the intertemporal marginal rate ofsubstitution between time t and time t + 1 can be written as:

mt+1,t =β[exp(σUe

t+1/2)ν′Mc

t+1]

E{exp(σUet+1/2) | Jt}ν ′Mc

t

(3.63)

Page 87: Maskin ambiguity book mock 2 - New York University

3.C. Computing Conditional Variance of SDF 79

or as:

mt+1,t =β{exp[σ(x′t+1Ωxt+1 + ρ)/2]ν ′Mcxt+1}

exp[σ(x′tΩxt + ρ)/2]ν ′Mcxt(3.64)

where Ω and ρ are given by (6). By applying the results of Appendix Bwe can compute the mean of mt+1,t, conditional on information available attime t. The result can be written as

E(mt+1,1|Jt) = β(ν ′McAxt)/(ν′Mcxt) (3.65)

Our present goal is to compute the conditional second moment of mt+1,t asa means for computing its conditional variance. We will accomplish this bymanipulating m2

t+1,t so that we can transform the probability measure as inAppendix B but with a different function V . We have

m2t+1,t =

β2

(ν ′Mcxt)2exp[σ(x′t+1Ωxt+1 + ρ)]

exp[σ(x′tΩxt + ρ)](ν ′Mcxt+1)

2. (3.66)

multiply the numerator and denominator by the time t conditional meanof the exponential term in the numerator, E{exp[σ(x′t+1Ωxt+1 + ρ)] | Jt}.This gives us

m2t+1,t =

β2E{exp[σ(x′t+1Ωxt+1 + ρ)] | Jt)}(ν ′Mcxt)2 exp[σ(x′tΩxt + ρ)]

exp[σ(x′t+1Ωxt+1 + ρ)] (ν ′Mcxt+1)2

E{exp[σ(x′t+1Ωxt+1 + ρ)] | Jt}.

(3.67)This conditional expectation can be computed by using a formula found inJacobson (1973), only substituting 2σ for σ:

E{exp[σ(x′t+1Ωxt+1 + ρ)] | Jt} = [det(I − 2σC ′ΩC)]−1/2 exp[σ(x′tΩxt + ρ)]

= exp[σ(x′tΩxt + ρ)], (3.68)

where Ω ≡ A′[Ω + 2σΩC(I − 2σC ′ΩC)−1C ′Ω]A and ρ ≡ − 12σ

log det(I −2σC ′ΩC) + ρ. So we get that

E{exp[σ(x′t+1Ωxt+1 + ρ)] | Jt}exp[σ(x′tΩxt + ρ)]

= exp{σ[x′t(Ω− Ω)xt + ρ− ρ]}. (3.69)

This gives us

m2t+1,t =

β2

(ν ′Mcxt)2exp{σ[x′t(Ω− Ω)xt + ρ− ρ]} Vt+1(ν

′Mcxt+1)2

E{Vt+1|Jt}, (3.70)

Page 88: Maskin ambiguity book mock 2 - New York University

80 Chapter 3. Robust Permanent Income and Pricing

where Vt+1 = exp[(σ(x′t+1Ωxt+1 + ρ)]. So

E(m2t+1,t | Jt) =

β2

(ν ′Mcxt)2exp{σ[x′t(Ω− Ω)xt + ρ− ρ]}Tt[(ν ′Mcxt+1)

2].

(3.71)where Tt is the transformed conditional expectation operator for a 2σ econ-omy. We can evaluate the Tt term in the above expression using resultsfrom Appendix B:

Tt[(ν ′Mcxt+1)2] = x′tA

′M ′cνν

′McAxt + trace(ν ′McCC′M ′

cν), (3.72)

whereA ≡ [I + 2σC(I − 2σC ′ΩC)−1C ′Ω]A (3.73)

andCC ′ ≡ C(I − 2σC ′ΩC)−1C ′. (3.74)

Finally, we know that the conditional variance of mt+1,t is given by itsconditional second moment minus the square of its conditional mean.

Page 89: Maskin ambiguity book mock 2 - New York University

Chapter 4

A Quartet of Semigroups forModel Specification,Robustness, Prices of Risk,and Model Detection

1

Abstract

A representative agent fears that his model, a continuous timeMarkov process with jump and diffusion components, is misspecifiedand therefore uses robust control theory to make decisions. Underthe decision maker’s approximating model, that cautious behaviorputs adjustments for model misspecification into market prices forrisk factors. We use a statistical theory of detection to quantify howmuch model misspecification the decision maker should fear, givenhis historical data record. A semigroup is a collection of objectsconnected by something like the law of iterated expectations. The

1Coauthored with Evan W. Anderson. We thank Fernando Alvarez, Pierre-AndreChiappori, Jose Mazoy, Eric Renault, Jose Scheinkman, Grace Tsiang, and Neng Wangfor comments on earlier drafts and Nan Li for valuable research assistance. This papersupersedes our earlier manuscript Risk and Robustness in Equilibrium (1998). This re-search provided the impetus for subsequent work including Hansen, Sargent, Turmuham-betova and Williams (2002). Hansen and Sargent gratefully acknowledge support fromthe National Science Foundation.

81

Page 90: Maskin ambiguity book mock 2 - New York University

82 Chapter 4. Quartet of Semigroups

law of iterated expectations defines the semigroup for a Markov pro-cess, while similar laws define other semigroups.Related semigroupsdescribe (1) an approximating model; (2) a model misspecificationadjustment to the continuation value in the decision maker’s Bell-man equation; (3) asset prices; and (4) the behavior of the modeldetection statistics that we use to calibrate how much robustness thedecision maker prefers. Semigroups 2, 3, and 4 establish a tight linkbetween the market price of uncertainty and a bound on the error instatistically discriminating between an approximating and a worstcase model.

Keywords: Approximation, misspecification, robustness, risk, uncertainty,statistical detection, pricing.

4.1 Introduction

Rational expectations and model misspecification

A rational expectations econometrician or calibrator typically attributesno concern about specification error to agents even as he shuttles amongalternative specifications.2 Decision makers inside a rational expectationsmodel know the model.3 Their confidence contrasts with the attitudesof both econometricians and calibrators. Econometricians routinely uselikelihood-based specification tests (information criteria or IC) to organizecomparisons between models and empirical distributions. Less formally,calibrators sometimes justify their estimation procedures by saying thatthey regard their models as incorrect and unreliable guides to parameterselection if taken literally as likelihood functions. But the agents inside a

2For example, see the two papers about specification error in rational expectationsmodels by Sims (1993) and Hansen and Sargent (1993).

3This assumption is so widely used that it rarely excites comment within macroe-conomics. Kurz (1997) is an exception. The rational expectations critique of earlierdynamic models with adaptive expectations was that they implicitly contained two mod-els, one for the econometrician and a worse one for the agents who are forecasting insidethe model. See Jorgenson (1967) and Lucas (1976a). Rational expectations modellingresponded to this critique by attributing a common model to the econometrician andthe agents within his model. Econometricians and agents can have different informationsets, but they agree about the model (stochastic process).

Page 91: Maskin ambiguity book mock 2 - New York University

4.1. Introduction 83

calibrator’s model do not share the model-builder’s doubts about specifica-tion.

By equating agents’ subjective probability distributions to the objectiveone implied by the model, the assumption of rational expectations precludesany concerns that agents should have about the model’s specification. Theempirical power of the rational expectations hypothesis comes from hav-ing decision makers’ beliefs be outcomes, not inputs, of the model-buildingenterprise. A standard argument that justifies equating objective and sub-jective probability distributions is that agents would eventually detect anydifference between them, and would adjust their subjective distributionsaccordingly. This argument implicitly gives agents an infinite history ofobservations, a point that is formalized by the literature on convergence ofmyopic learning algorithms to rational expectations equilibria of games anddynamic economies.4

Specification tests leave applied econometricians in doubt because theyhave too few observations to discriminate among alternative models. Econo-metricians with finite data sets thus face a model detection problem thatbuilders of rational expectations models let agents sidestep by endowingthem with infinite histories of observations ”before time zero.”

This paper is about models with agents whose data bases are finite, likeeconometricians and calibrators. Their limited data leave agents with modelspecification doubts that are quantitatively similar to those of econometri-cians and that make them value decision rules that perform well across aset of models. In particular, agents fear misspecifications of the state tran-sition law that are sufficiently small that they are difficult to detect becausethey are obscured by random shocks that impinge on the dynamical sys-tem. Agents adjust decision rules to protect themselves against modellingerrors, a precaution that puts model uncertainty premia into equilibriumsecurity market prices. Because we work with Markov models, we can availourselves of a powerful tool called a semigroup.

Iterated laws and semigroups

The law of iterated expectations imposes consistency requirements that causea collection of conditional expectations operators associated with a Markovprocess to form a mathematical object called a semigroup. The operators

4See Evans and Honkapohja (2003) and Fudenberg and Levine (1998).

Page 92: Maskin ambiguity book mock 2 - New York University

84 Chapter 4. Quartet of Semigroups

are indexed by the time that elapses between when the forecast is madeand when the random variable being forecast is realized. This semigroupand its associated generator characterize the Markov process. Because weconsider forecasting random variables that are functions of a Markov state,the current forecast depends only on the current value of the Markov state.5

The law of iterated values embodies analogous consistency requirementsfor a collection of economic values assigned to claims to payoffs that arefunctions of future values of a Markov state. The family of valuation op-erators indexed by the time that elapses between when the claims are val-ued and when their payoffs are realized forms another semigroup. Just asa Markov process is characterized by its semigroup, so prices of payoffsthat are functions of a Markov state can be characterized by a semigroup.Hansen and Scheinkman (2002) exploited this insight. Here we extend theirinsight to other semigroups. In particular, we describe four semigroups: (1)one that describes a Markov process; (2) another that adjusts continuationvalues in a way that rewards decision rules that are robust to misspecifica-tion of the approximating model; (3) another that models the equilibriumpricing of securities with payoff dates in the future; and (4) another thatgoverns statistics for discriminating between alternative Markov processesusing a finite time series data record.6 We show the close connections thatbind these four semigroups.

Model detection errors and market prices of risk

In earlier work (Hansen, Sargent, and Tallarini (1999), henceforth denotedHST, and Hansen, Sargent, and Wang (2002), henceforth denoted HSW),we studied various discrete time asset pricing models in which decisionmakers’ fear of model misspecification put model uncertainty premia intomarket prices of risk, thereby potentially helping to account for the equitypremium. Transcending the detailed dynamics of our examples was a tightrelationship between the market price of risk and the probability of dis-

5The semigroup formulation of Markov processes is common in the literature on ap-plied probability. See Ethier and Kurz (1986) for a general treatment of semigroups andHansen and Scheinkman (1995) for their use in studying the identification of continuous-time Markov models.

6Here the operator is indexed by the time horizon of the available data. In effectthere is a ‘statistical detection operator’ that measures the statistical value of informationavailable to discriminate between two Markov processes.

Page 93: Maskin ambiguity book mock 2 - New York University

4.1. Introduction 85

tinguishing the representative decision maker’s approximating model froma worst-case model that emerges as a byproduct of his cautious decisionmaking procedure. Although we had offered only a heuristic explanationfor that relationship, we nevertheless exploited it to help us calibrate the setof alternative models that the decision maker should plausibly seek robust-ness against. In the context of continuous time Markov models, this paperanalytically establishes a precise link between the uncertainty componentof risk prices and a bound on the probability of distinguishing the decisionmaker’s approximating and worst case models. We also develop new waysof representing decision makers’ concerns about model misspecification andtheir equilibrium consequences.

Related literature

In the context of a discrete-time, linear-quadratic permanent income model,HST considered model misspecifications measured by a single robustness pa-rameter. HST showed how robust decision-making promotes behavior likethat induced by risk aversion. They interpreted a preference for robustnessas a decision maker’s response to Knightian uncertainty and calculated howmuch concern about robustness would be required to put market prices ofrisk into empirically realistic regions. Our fourth semigroup, which de-scribes model detection errors, provides a statistical method for judgingwhether the required concern about robustness is plausible.

HST and HSW allowed the robust decision maker to consider only alimited array of specification errors, namely, shifts in the conditional meanof shocks that are i.i.d. and normally distributed under an approximatingmodel. In this paper, we consider more general approximating models andmotivate the form of potential specification errors by using specification teststatistics. We show that HST’s perturbations to the approximating modelemerge in linear-quadratic, Gaussian control problems as well as in a moregeneral class of control problems in which the stochastic evolution of thestate is a Markov diffusion process. However, we also show that misspecifi-cations different from HST’s must be entertained when the approximatingmodel includes Markov jump components. As in HST, our formulation ofrobustness allows us to reinterpret one of Epstein and Zin’s 1989a recursionsas reflecting a preference for robustness rather than aversion to risk.

As we explain in Hansen, Sargent, Turmuhambetova, and Williams(henceforth HSTW) 2006b, the robust control theory described in section

Page 94: Maskin ambiguity book mock 2 - New York University

86 Chapter 4. Quartet of Semigroups

4.5 is closely connected to the min-max expected utility or multiple priorsmodel of Gilboa and Schmeidler (1989). A main theme of the present pa-per is to advocate a workable strategy for actually specifying those multiplepriors in applied work. Our strategy is to use detection error probabilitiesto surround the single model that is typically specified in applied work witha set of empirically plausible but vaguely specified alternatives.

Robustness versus learning

A convenient feature of rational expectations models is that the modelbuilder imputes a unique and explicit model to the decision maker. Ouranalysis shares this analytical convenience. While an agent distrusts hismodel, he still uses it to guide his decisions.7 But the agent uses his modelin a way that recognizes that it is an approximation. To quantify approx-imation, we measure discrepancy between the approximating model andother models with relative entropy, an expected log likelihood ratio, wherethe expectation is taken with respect to the distribution from the alterna-tive model. Relative entropy is used in the theory of large deviations, apowerful mathematical theory about the rate at which uncertainty aboutunknown distributions is resolved as the number of observations grows.8 Anadvantage of using entropy to restrain model perturbations is that we canappeal to the theory of statistical detection to provide information abouthow much concern about robustness is quantitatively reasonable.

Our decision maker confronts alternative models that can be discrim-inated among only with substantial amounts of data, so much data that,

7The assumption of rational expectations equates a decision maker’s approximatingmodel to the objective distribution. Empirical applications of models with robust deci-sion makers like HST and HSW have equated those distributions too. The statementthat the agent regards his model as an approximation, and therefore makes cautious de-cisions, leaves open the possibility that the agent’s concern about model misspecificationis ”just in his head”, meaning that the data are actually generated by the approximatingmodel. The ”just in his head” assumption justifies equating the agent’s approximatingmodel with the econometrician’s model, a step that allows us to bring to bear much ofthe powerful empirical apparatus of rational expectations econometrics. In particular,it provides the same economical way of imputing an approximating model to the agentsas rational expectations does. The difference is that we allow the agent’s doubts aboutthat model to affect his decisions.

8See Cho, Williams, and Sargent (2002) for a recent application of large deviationtheory to a model of learning dynamics in macroeconomics.

Page 95: Maskin ambiguity book mock 2 - New York University

4.2. Overview 87

because he discounts the future, the robust decision maker simply acceptsmodel misspecification as a permanent situation. He designs robust con-trols, and does not use data to improve his model specification over time.He adopts this stance because relative to his discount factor, it would taketoo much time for enough data to accrue for him to dispose of the alterna-tive models that concern him. In contrast, many formulations of learninghave decision makers fully embrace an approximating model when makingtheir choices.9 Despite their different orientations, learners and robust de-cision makers both need a convenient way to measure the proximity of twoprobability distributions. This fact builds technical bridges between robustdecision theory and learning theory. The same expressions from large devia-tion theory that govern bounds on rates of learning also provide bounds onvalue functions across alternative possible models in robust decision the-ory.10 More importantly here, we shall show that the tight relationshipbetween detection error probabilities and the market price of risk that wasencountered by HST and HSW can be explained by formally studying therate at which detection errors decrease as sample size grows.

Reader’s guide

A reader interested only in our main results can read section 4.2, then jumpto the empirical applications in section 4.9.

4.2 Overview

This section briefly tells how our main results apply in the special case inwhich the approximating model is a diffusion. Later sections provide tech-nical details and show how things change when we allow jump components.

A representative agent’s model asserts that the state of an economy xtin a state space D follows a diffusion11

dxt = μ(xt)dt+ Λ(xt)dBt (4.1)

where Bt is a Brownian vector. The agent wants decision rules that workwell not just when (4.1) is true but also when the data conform to models

9See Bray (1982) and Kreps (1998).10See Hansen and Sargent (2008b) for discussions of these bounds.11Diffusion (4.1) describes the ‘physical probability measure’.

Page 96: Maskin ambiguity book mock 2 - New York University

88 Chapter 4. Quartet of Semigroups

that are statistically difficult to distinguish from (4.1). A robust controlproblem to be studied in section 4.5 leads to such a robust decision ruletogether with a value function V (xt) and a process γ(xt) for the marginalutility of consumption of a representative agent. As a byproduct of therobust control problem, the decision maker computes a worst-case diffusionthat takes the form

dxt = [μ(xt) + Λ(xt)g(xt)] dt+ Λ(xt)dBt, (4.2)

where g = −(1/θ)Λ′∂V/∂x and θ > 0 is a parameter measuring the sizeof potential model misspecifications. Notice that (4.2) modifies the driftbut not the volatility relative to (4.1). The formula for g tells us thatlarge values of θ are associated with gt’s that are small in absolute value,making model (4.2) difficult to distinguish statistically from model (4.1).The diffusion (4.6) below lets us quantify just how difficult this statisticaldetection problem is.

Without a preference for robustness to model misspecification, the usualapproach to asset pricing is to compute the expected discounted value ofpayoffs with respect to the ‘risk-neutral’ probability measure that is asso-ciated with the following twisted version of the physical measure (diffusion(4.1)):

dxt = [μ(xt) + Λ(xt)g(xt)] dt+ Λ(xt)dBt. (4.3)

In using the risk-neutral measure to price assets, future expected returnsare discounted at the risk-free rate ρ(xt), obtained as follows. The marginalutility of the representative household γ(xt) conforms to dγt = μγ(xt)dt +σγ(xt)dBt. Then the risk-free rate is ρ(xt) = δ − [μγ(xt)/γ(xt)], where δis the instantaneous rate at which the household discounts future utilities;the risk-free rate thus equals the negative of the expected growth rate ofthe representative household’s marginal utility. The price of a payoff φ(xN )contingent on a Markov state in period N is then

E

(exp

[−∫ N

0

ρ(xu)du

]φ(xN)

∣∣∣x0 = x

)(4.4)

where E is the expectation evaluated with respect to the distribution gen-erated by (4.3). This formula gives rise to a pricing operator for everyhorizon N . Relative to the approximating model, the diffusion (4.3) for therisk-neutral measure distorts the drift in the Brownian motion by adding

Page 97: Maskin ambiguity book mock 2 - New York University

4.2. Overview 89

the term Λ(x)g(xt), where g = Λ′ [∂ log γ(x)/∂x]. Here g is a vector of‘factor risk prices’ or ‘market prices of risk’. The equity premium puzzle isthe finding that with plausible quantitative specifications for the marginalutility γ(x), factor risk prices g are too small relative to their empiricallyestimated counterparts.

In section 4.7, we show that when the planner and a representativeconsumer want robustness, the diffusion associated with the risk-neutralmeasure appropriate for pricing becomes

dxt = (μ(xt) + Λ(xt)[g(xt) + g(xt)]) dt+ Λ(xt)dBt, (4.5)

where g is the same process that appears in (4.2). With robustness soughtover a set of alternative models that is indexed by θ, factor risk pricesbecome augmented to g + g. The representative agent’s concerns aboutmodel misspecification contribute the g component of the factor risk prices.To evaluate the quantitative potential for attributing parts of the marketprices of risk to agents’ concerns about model misspecification, we need tocalibrate θ and therefore |g|.

To calibrate θ and g, we turn to a closely related fourth diffusion thatgoverns the probability distribution of errors from using likelihood ratiotests to detect which of two models generated a continuous record of lengthN of observations on xt. Here the key idea is that we can represent theaverage error in using a likelihood ratio test to detect the difference betweenthe two models (4.1) and (4.2) from a continuous record of data of lengthN as .5E

(min{exp(�N), 1}|x0 = x

)where E is evaluated with respect to

model (4.1) and �N is a likelihood ratio of the data record of model (4.2)with respect to model (4.1). For each α ∈ (0, 1), we can use the inequalityE(min{exp(�N), 1}|x0 = x

)≤ E

({exp(α�N)}|x0 = x

)to attain a bound

on the detection error probability. For each α, we show that the boundcan be calculated by forming a new diffusion that uses (4.1) and (4.2) asingredients, and in which the drift distortion g from (4.2) plays a key role.In particular, for α ∈ (0, 1), define

dxαt = [μ(xt) + αΛ(xt)g(xt)] d t+ Λ(xt)dBt, (4.6)

and define the local rate function ρα(x) = (1− α)α/2g(x)′g(x). Then thebound on the average error in using a likelihood ratio test to discriminatebetween the approximating model (4.1) and the worst case model (4.2) from

Page 98: Maskin ambiguity book mock 2 - New York University

90 Chapter 4. Quartet of Semigroups

a continuous data record of length N is

av error ≤ .5Eα

[exp

(−∫ N

0

ρα(xt)

)dt∣∣∣x0 = x

], (4.7)

where Eα is the mathematical expectation evaluated with respect to thediffusion (4.6). The error rate ρα(x) is maximized by setting α = .5. Noticethat the right side of (4.7) is one half the price of pure discount bond thatpays off one unit of consumption for sure N periods in the future, treatingρα as the risk-free rate and the measure induced by (4.6) as the risk-neutralprobability measure.

It is remarkable that the three diffusions (4.2), (4.5), and (4.6) that de-scribe the worst case model, asset pricing under a preference for robustness,and the local behavior of a bound on model detection errors, respectively,are all obtained by perturbing the drift in the approximating model (4.1)with functions of the same drift distortion g(x) that emerges from the robustcontrol problem. To the extent that the bound on detection probabilitiesis informative about the detection probabilities themselves, our theoreticalresults thus neatly explain the pattern that was observed in the empiricalapplications of HST and HSW, namely, that there is a tight link betweencalculated detection error probabilities and the market price of risk. Thatlink transcends all details of the model specification.12 In section 4.9, weshall encounter this tight link again when we calibrate the contribution tomarket prices of risk that can plausibly be attributed to a preference forrobustness in the context of three continuous time asset pricing models.

Subsequent sections of this paper substantiate these and other resultsin a more general Markov setting that permits x to have jump components,so that jump distortions also appear in the Markov processes for the worstcase model, asset pricing, and model detection error. We shall exploitand extend the asset-pricing structure of formulas like (4.4) and (4.7) byrecognizing that they reflect that collections of expectations, values, andbounds on detection error rates can all be described with semigroups.

4.3 Mathematical preliminaries

The remainder of this paper studies continuous-time Markov formulations ofmodel specification, robust decision-making, pricing, and statistical model

12See figure 8 of HSW.

Page 99: Maskin ambiguity book mock 2 - New York University

4.3. Mathematical preliminaries 91

detection. We use Feller semigroups indexed by time for all four purposes.This section develops the semigroup theory needed for our paper.

Semigroups and their generators

Let D be a Markov state space that is a locally compact and separablesubset of Rm. We distinguish two cases. First, when D is compact, we letC denote the space of continuous functions mapping D into R. Second,when we want to study cases in which the state space is unbounded so thatD is not compact, we shall use a one-point compactification that enlargesthe state space by adding a point at ∞. In this case we let C be the spaceof continuous functions that vanish at ∞. We can think of such functions ashaving domain D or domain D ∪∞. The compactification is used to limitthe behavior of functions in the tails when the state space is unbounded.We use the sup-norm to measure the magnitude of functions on C and todefine a notion of convergence.

We are interested in a strongly continuous semigroup of operators {St :t ≥ 0} with an infinitesimal generator G. For {St : t ≥ 0} to be a semigroupwe require that S0 = I and St+τ = StSτ for all τ, t ≥ 0. A semigroup isstrongly continuous if

limτ↓0

Sτφ = φ

where the convergence is uniform for each φ in C. Continuity allows us tocompute a time derivative and to define a generator

Gφ = limτ↓0

Sτφ− φ

τ. (4.8)

This is again a uniform limit and it is well defined on a dense subset ofC. A generator describes the instantaneous evolution of a semigroup. Asemigroup can be constructed from a generator by solving a differentialequation. Thus applying the semigroup property gives

limτ↓0

St+τφ− Stφτ

= GStφ, (4.9)

a differential equation for a semigroup that is subject to the initial conditionthat S0 is the identity operator. The solution to differential equation (4.9)is depicted heuristically as:

St = exp(tG)

Page 100: Maskin ambiguity book mock 2 - New York University

92 Chapter 4. Quartet of Semigroups

and thus satisfies the semigroup requirements. The exponential formulacan be justified rigorously using a Yosida approximation, which formallyconstructs a semigroup from its generator.

In what follows, we will use semigroups to model Markov processes,intertemporal prices, and statistical discrimination. Using a formulationof Hansen and Scheinkman (2002), we first examine semigroups that aredesigned to model Markov processes.

Representation of a generator

We describe a convenient representation result for a strongly continuous,positive, contraction semigroup. Positivity requires that St maps nonneg-ative functions φ into nonnegative functions φ for each t. When the semi-group is a contraction, it is referred to as a Feller semigroup. The con-traction property restricts the norm of St to be less than or equal to onefor each t and is satisfied for semigroups associated with Markov processes.Generators of Feller semigroups have a convenient characterization:

Gφ = μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+Nφ− ρφ (4.10)

where N has the product form

Nφ(x) =

∫[φ(y)− φ(x)]η(dy|x) (4.11)

where ρ is a nonnegative continuous function, μ is an m-dimensional vectorof continuous functions, Σ is a matrix of continuous functions that is positivesemidefinite on the state space, and η(·|x) is a finite measure for each xand continuous in x for Borel subset of D. We require that N map C2

K

into C where C2K is the subspace of functions that are twice continuously

differentiable functions with compact support in D. Formula (4.11) is validat least on C2

K .13

13See Theorem 1.13 in Chapter VII of Revuz and Yor (1994). Revuz and Yor give amore general representation that is valid provided that the functions in C∞

K are in thedomain of the generator. Their representation does not require that η(·|x) be a finitemeasure for each x but imposes a weaker restriction on this measure. As we will see,when η(·|x) is finite, we can define a jump intensity. Weaker restrictions permit there tobe an infinite number of expected jumps in finite intervals that are arbitrarily small inmagnitude. As a consequence, this extra generality involves more cumbersome notationand contributes nothing essential to our analysis.

Page 101: Maskin ambiguity book mock 2 - New York University

4.3. Mathematical preliminaries 93

To depict equilibrium prices we will sometimes go beyond Feller semi-groups. Pricing semigroups are not necessarily contraction semigroups un-less the instantaneous yield on a real discount bond is nonnegative. Whenwe use this approach for pricing, we will allow ρ to be negative. Whilethis puts us out of the realm of Feller semigroups, as argued by Hansenand Scheinkman (2002), known results for Feller semigroups can often beextended to pricing semigroups.

We can think of the generator (4.10) as being composed of three parts.The first two components are associated with well known continuous-timeMarkov process models, namely, diffusion and jump processes. The thirdpart discounts. The next three subsections will interpret these componentsof equation (4.10).

Diffusion processes

The generator of a Markov diffusion process is a second-order differentialoperator:

Gdφ = μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)where the coefficient vector μ is the drift or local mean of the process andthe coefficient matrix Σ is the diffusion or local covariance matrix. Thecorresponding stochastic differential equation is:

dxt = μ(xt)dt+ Λ(xt)dBt

where {Bt} is a multivariate standard Brownian motion and ΛΛ′ = Σ.Sometimes the resulting process will have attainable boundaries, in whichcase we either stop the process at the boundary or impose other boundaryprotocols.

Jump processes

The generator for a Markov jump process is:

Gnφ = Nφ = λ[Qφ − φ] (4.12)

where the coefficient λ.=∫η(dy|x) is a possibly state-dependent Poisson

intensity parameter that sets the jump probabilities and Q is a conditionalexpectation operator that encodes the transition probabilities conditioned

Page 102: Maskin ambiguity book mock 2 - New York University

94 Chapter 4. Quartet of Semigroups

on a jump taking place. Without loss of generality, we can assume that thetransition distribution associated with the operator Q assigns probabilityzero to the event y = x provided that x = ∞, where x is the current Markovstate and y the state after a jump takes place. That is, conditioned on ajump taking place, the process cannot stay put with positive probabilityunless it reaches a boundary.

The jump and diffusion components can be combined in a model of aMarkov process. That is,

Gdφ+ Gnφ = μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+Nφ (4.13)

is the generator of a family (semigroup) of conditional expectation operatorsof a Markov process {xt}, say St(φ)(x) = E[φ(xt)|x0 = x].

Discounting

The third part of (4.10) accounts for discounting. Thus, consider a Markovprocess {xt} with generator Gd + Gn. Construct the semigroup:

Stφ = E

(exp

[−∫ t

0

ρ(xτ )dτ

]φ(xt)|x0 = x

)on C. We can think of this semigroup as discounting the future state at thestochastic rate ρ(x). Discount rates will play essential roles in representingshadow prices from a robust resource allocation problem and in measuringstatistical discrimination between competing models.14

Extending the domain to bounded functions

While it is mathematically convenient to construct the semigroup on C,sometimes it is necessary for us to extend the domain to a larger class of

14When ρ ≥ 0, the semigroup is a contraction. In this case, we can use G as a generatorof a Markov process in which the process is curtailed at rate ρ. Formally, we can let ∞be a terminal state at which the process stays put. Starting the process at state x = ∞,

E(exp

[−∫ t

0 ρ(xτ )dτ]|x0 = x

)is the probability that the process is not curtailed after

t units of time. See Revuz and Yor (1994, p. 280) for a discussion. As in Hansen andScheinkman (2002), we will use the discounting interpretation of the semigroup and notuse ρ as a curtailment rate. Discounting will play an important role in our discussion ofdetection and pricing. In pricing problems, ρ can be negative in some states as mightoccur in a real economy, an economy with a consumption numeraire.

Page 103: Maskin ambiguity book mock 2 - New York University

4.3. Mathematical preliminaries 95

functions. For instance, indicator functions 1D of nondegenerate subsetsD are omitted from C. Moreover, 1D is not in C when D is not compact;nor can this function be approximated uniformly. Thus to extend the semi-group to bounded, Borel measurable functions, we need a weaker notion ofconvergence. Let {φj : j = 1, 2, ...} be a sequence of uniformly boundedfunctions that converges pointwise to a bounded function φo. We can thenextend the Sτ semigroup to φo using the formula:

Sτφo = limj→∞

Sτφj

where the limit notion is now pointwise. The choice of approximating se-quence does not matter and the extension is unique.15

With this construction, we define the instantaneous discount or interestrate as the pointwise derivative

− limτ↓0

1

τlog Sτ1D = ρ

when the derivative exists.

Extending the generator to unbounded functions

Value functions for control problems on noncompact state spaces are oftennot bounded. Thus for our study of robust counterparts to optimization,we must extend the semigroup and its generator to unbounded functions.We adopt an approach that is specific to a Markov process and hence westudy this extension only for a semigroup generated by G = Gd + Gn.

We extend the generator using martingales. To understand this ap-proach, we first remark that for a given φ in the domain of the generator,

Mt = φ(xt)− φ(x0)−∫ t

0

Gφ(xτ )dτ

is a martingale. In effect, we produce a martingale by subtracting theintegral of the local means from the process {φ(xt)}. This martingale con-struction suggests a way to build the extended generator. Given φ we find

15This extension was demonstrated by Dynkin (1956). Specifically, Dynkin defines aweak (in the sense of functionals) counterpart to this semigroup and shows that there isa weak extension of this semigroup to bounded, Borel measurable functions.

Page 104: Maskin ambiguity book mock 2 - New York University

96 Chapter 4. Quartet of Semigroups

a function ψ such that

Mt = φ(xt)− φ(x0)−∫ t

0

ψ(xτ )dτ (4.14)

is a local martingale (a martingale under all members of a sequence of stop-ping times that increases to ∞). We then define Gφ = ψ. This constructionextends the operator G to a larger class of functions than those for whichthe operator differentiation (4.8) is well defined. For every φ in the domainof the generator, ψ = Gφ in (4.14) produces a martingale. However, thereare φ’s not in the domain of the generator for which (4.14) also producesa martingale.16 In the case of a Feller process defined on a state-spaceD that is an open subset of Rm, this extended domain contains at leastfunctions in C2, functions that are twice continuously differentiable on D.Such functions can be unbounded when the original state space D is notcompact.

4.4 A tour of four semigroups

In the remainder of the paper we will study four semigroups. Before de-scribing each in detail, it is useful to tabulate the four semigroups and theiruses. We have already introduced the first semigroup, which describes theevolution of a state vector process {xt}. This semigroup portrays a decisionmaker’s approximating model. It has the generator displayed in (4.10) withρ = 0, which we repeat here for convenience:

Gφ = μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+Nφ. (4.15)

While up to now we used G to denote a generic semigroup, from this pointforward we will reserve it for the approximating model. We can think of thedecision maker as using the semigroup generated by G to forecast functionsφ(xt). This semigroup for the approximating model can have both jump andBrownian components, but the discount rate ρ is zero. In some settings, thesemigroup associated with the approximating model includes a description

16There are other closely related notions of an extended generator in the probabilityliterature. Sometimes calendar time dependence is introduced into the function φ, ormartingales are used in place of local martingales.

Page 105: Maskin ambiguity book mock 2 - New York University

4.4. A tour of four semigroups 97

of endogenous state variables and therefore embeds robust decision rulesof one or more decision makers, as for example when the approximatingmodel emerges from a robust resource allocation problem of the kind to bedescribed in section 4.5.

With our first semigroup as a point of reference, we will consider threeadditional semigroups. The second semigroup represents an endogenousworst-case model that a decision maker uses to promote robustness to pos-sible misspecification of his approximating model (4.15). For reasons thatwe discuss in section 4.8, we shall focus the decision maker’s attention onworst-case models that are absolutely continuous with respect to his approx-imating model. Following Kunita (1969), we shall assume that the decisionmaker believes that the data are actually generated by a member of a classof models that are obtained as Markov perturbations of the approximatingmodel (4.15). We parameterize this class of models by a pair of functions(g, h), where g is a continuous function of the Markov state x that has thesame number of coordinates as the underlying Brownian motion, and h isa nonnegative function of (y, x) that distorts the jump intensities. For theworst-case model, we have the particular settings g = g and h = h. Thenwe can represent the worst-case generator G as

Gφ = μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+ Nφ, (4.16)

where

μ = μ+ Λg

Σ = Σ

η(dy|x) = h(y, x)η(dy|x).

The distortion g to the diffusion and the distortion h to the jump componentin the worst case model will also play essential roles both in asset pricing andin the detection probabilities formulas. From (4.12), it follows that the jumpintensity under this parameterization is given by λ(x) =

∫h(y, x)η(dy|x)

and the jump distribution conditioned on x is h(y, x)/λ(x)η(dy|x). A gener-ator of the form (4.16) emerges from a robust decision problem, the pertur-bation pair (g, h) being chosen by a malevolent player, as we discuss below.Our third semigroup modifies one that Hansen and Scheinkman (2002) de-veloped for computing the time zero price of a state contingent claim that

Page 106: Maskin ambiguity book mock 2 - New York University

98 Chapter 4. Quartet of Semigroups

pays off φ(xt) at time t. Hansen and Scheinkman showed that the time zeroprice can be computed with a risk-free rate ρ and a risk-neutral probabilitymeasure embedded in a semigroup with generator:

Gφ = −ρφ+ μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+ Nφ. (4.17a)

Here

μ =μ+ Λπ

Σ =Σ (4.17b)

η(dy|x) =Π(y, x)η(dy|x).

In the absence of a concern about robustness, π = g is a vector of pricesfor the Brownian motion factors and Π = h encodes the jump risk prices.In Markov settings without a concern for robustness, (4.17b) represents theconnection between the physical probability and the so-called risk-neutralprobability that is widely used for asset pricing along with the interest rateadjustment.

We alter generator (4.17) to incorporate a representative consumer’sconcern about robustness to model misspecification. Specifically a prefer-ence for robustness changes the ordinary formulas for π and Π that are basedsolely on pricing risks under the assumption that the approximating modelis true. A concern about robustness alters the relationship between thesemigroups for representing the underlying Markov processes and pricing.With a concern for robustness, we represent factor risk prices by relating μto the worst-case drift μ: μ = μ+Λg and risk-based jump prices by relatingη to the worst-case jump measure η: η(dy|x) = h(y, x)η(dy|x). Combin-ing this decomposition with the relation between the worst-case and theapproximating models gives the new vectors of pricing functions

π = g + g

Π = hh

where the pair (g, h) is used to portray the (constrained) worst-case modelin (4.16). Later we will supply formulas for (ρ, g, h).

A fourth semigroup statistically quantifies the discrepancy between twocompeting models as a function of the time interval of available data. We are

Page 107: Maskin ambiguity book mock 2 - New York University

4.4. A tour of four semigroups 99

particularly interested in measuring the discrepancy between the approxi-mating and worst case models. For each α ∈ (0, 1), we develop a bound ona detection error probability in terms of a semigroup and what looks likean associated ‘risk-free interest rate’. The counterpart to the risk-free rateserves as an instantaneous discrimination rate. For each α, the generatorfor the bound on the detection error probability can be represented as:

Gαφ = −ραφ+ μα ·(∂φ

∂x

)+

1

2trace

(Σα

∂2φ

∂x∂x′

)+N αφ,

where

μα = μ+ Λgα

Σα = Σ

ηα(dy|x) = hα(y, x)η(dy|x).

The semigroup generated by Gα governs the behavior as sample size growsof a bound on the fraction of errors made when distinguishing two Markovmodels using likelihood ratios or posterior odds ratios. The α associatedwith the best bound is determined on a case by case basis and is especiallyeasy to find in the special case that the Markov process is a pure diffusion.

Semigroup Generator Rate Drift distortion Jump distortion densityapproximating model G 0 0 1

worst-case model G 0 g(x) h(y, x)

pricing G ρ(x) π(x) = g(x) + g(x) Π(x) = h(y, x)h(y, x)detection Gα ρα(x) gα(x) hα(y, x)

Table 4.1: Parameterizations of the generators of four semigroups. The ratemodifies the generator associated with the approximating model by adding−ρφ to the generator for a test function φ. The drift distortion adds a termΛg · ∂φ/∂x to the generator associated with the approximating model. Thejump distortion density is h(y, x)η(dy|x) instead of the jump distributionη(dy|x) in the generator for the approximating model.

Table 4.1 summarizes our parameterization of these four semigroups.Subsequent sections supply formulas for the entries in this table.

Page 108: Maskin ambiguity book mock 2 - New York University

100 Chapter 4. Quartet of Semigroups

4.5 Model Misspecification and robust

Control

We now study the continuous-time robust resource allocation problem. Inaddition to an approximating model, this analysis will produce a con-strained worst case model that by helping the decision maker to assessthe fragility of any given decision rule can be used as a device to choose arobust decision rule.

Lyapunov equation under Markov approximatingmodel and a fixed decision rule

Under a Markov approximating model with generator G and a fixed policyfunction i(x), the decision maker’s value function is

V (x) =

∫ ∞

0

exp(−δt)E [U [xt, i(xt)]|x0 = x] dt.

The value function V satisfies the continuous-time Lyapunov equation:

δV (x) = U [x, i(x)] + GV (x). (4.18)

Since V may not be bounded, we interpret G as the weak extension ofthe generator (4.13) defined using local martingales. The local martingaleassociated with this equation is:

Mt = V (xt)− V (x0)−∫ t

0

(δV (xs)− U [xs, i(xs)]) ds.

As in (4.13), this generator can include diffusion and jump contributions.We will eventually be interested in optimizing over a control i, in which

case the generator G will depend explicitly on the control. For now wesuppress that dependence. We refer to G as the approximating model; Gcan be modelled using the triple (μ,Σ, η) as in (4.13). The pair (μ,Σ)consists of the drift and diffusion coefficients while the conditional measureη encodes both the jump intensity and the jump distribution.

We want to modify the Lyapunov equation (4.18) to incorporate a con-cern about model misspecification. We shall accomplish this by replacingG with another generator that expresses the decision maker’s precautionabout the specification of G.

Page 109: Maskin ambiguity book mock 2 - New York University

4.5. Model Misspecification and robust Control 101

Entropy penalties

We now introduce perturbations to the decision maker’s approximatingmodel that are designed to make finite horizon transition densities of theperturbed model be absolutely continuous with respect to those of the ap-proximating model. We use a notion of absolute continuity that pertainsonly to finite intervals of time. In particular, imagine a Markov processevolving for a finite length of time. Our notion of absolute continuity re-stricts probabilities induced by the path {xτ : 0 ≤ τ ≤ t} for all finite t.See HSTW (2002), who discuss this notion as well as an infinite historyversion of absolute continuity. Kunita (1969) shows how to preserve boththe Markov structure and absolute continuity.

Following Kunita (1969), we shall consider a Markov perturbation thatcan be parameterized by a pair (g, h), where g is a continuous functionof the Markov state x and has the same number of coordinates as theunderlying Brownian motion, and h is a nonnegative function of (y, x) usedto model the jump intensities. In section 4.8, we will have more to sayabout these perturbations including a discussion of why we do not perturbΣ. For the pair (g, h), the perturbed generator is portrayed using a driftμ+Λg, a diffusion matrix Σ, and a jump measure h(y, x)η(dy|x). Thus theperturbed generator is

G(g, h)φ(x) =Gφ(x) + [Λ(x)g(x)] · ∂φ(x)∂x

+

∫[h(y, x)− 1][φ(y)− φ(x)]η(dy|x).

For this perturbed generator to be a Feller process would require that weimpose additional restrictions on h. For analytical tractability we will onlylimit the perturbations to have finite entropy. We will be compelled toshow, however, that the perturbation used to implement robustness doesindeed generate a Markov process. This perturbation will be constructedformally as the solution to a constrained minimization problem. In whatfollows, we continue to use the notation G to be the approximating modelin place of the more tedious G(0, 1).

Page 110: Maskin ambiguity book mock 2 - New York University

102 Chapter 4. Quartet of Semigroups

Conditional relative entropy

At this point, it is useful to have a local measure of conditional relativeentropy.17 Conditional relative entropy plays a prominent role in large devi-ation theory and in classical statistical discrimination where it is sometimesused to study the decay in the so called type II error probabilities, holdingfixed type I errors (Stein’s Lemma). For the purposes of this section, wewill use relative entropy as a discrepancy measure. In section 4.8 we willelaborate on its connection to the theory of statistical discrimination. As ameasure of discrepancy, it has been axiomatized by Csiszar (1991) althoughhis defense shall not concern us here.

By �t we denote the log of the ratio of the likelihood of model one to thelikelihood of model zero, given a data record of length t. For now, let thedata be either a continuous or a discrete time sample. The relative entropyconditioned on x0 is defined to be:

E(�t

∣∣∣x0, model 1)= E

[�t exp (�t)

∣∣∣x0, model 0]

=d

dαE[exp (α�t)

∣∣∣x0, model 0] ∣∣∣

α=1, (4.19)

where we have assumed that the model zero probability distribution isabsolutely continuous with respect to the model one probability distribu-tion. To evaluate entropy, the second relation differentiates the moment-generating function for the log-likelihood ratio. The same information in-equality that justifies maximum likelihood estimation implies that relativeentropy is nonnegative.

When the model zero transition distribution is absolutely continuouswith respect to the model one transition distribution, entropy collapses tozero as the length of the data record t → 0. Therefore, with a continuousdata record, we shall use a concept of conditional relative entropy as a rate,specifically the time derivative of (4.19). Thus, as a local counterpart to(4.19), we have the following measure:

ε(g, h)(x) =g(x)′g(x)

2+

∫[1− h(y, x) + h(y, x) logh(y, x)]η(dy|x) (4.20)

17This will turn out to be a limiting version of a local Chernoff measure ρα to bedefined in section 4.8.

Page 111: Maskin ambiguity book mock 2 - New York University

4.5. Model Misspecification and robust Control 103

where model zero is parameterized by (0, 1) and model one is parameterizedby (g, h). The quadratic form g′g/2 comes from the diffusion contribution,and the term ∫

[1− h(y, x) + h(y, x) log h(y, x)]η(dy|x)

measures the discrepancy in the jump intensities and distributions. It ispositive by the convexity of h log h in h.

Let Δ denote the space of all such perturbation pairs (g, h). Conditionalrelative entropy ε is convex in (g, h). It will be finite only when

0 <

∫h(y, x)η(dy|x) <∞.

When we introduce adjustments for model misspecification, we modifyLyapunov equation (4.18) in the following way to penalize entropy

δV (x) = min(g,h)∈Δ

U [x, i(x)] + θε(g, h) + G(g, h)V (x),

where θ > 0 is a penalty parameter. We are led to the following entropypenalty problem.

Problem AJ(V ) = inf

(g,h)∈Δθε(g, h) + G(g, h)V. (4.21)

Theorem 4.5.1. Suppose that (i) V is in C2 and (ii)∫exp[−V (y)/θ]η(dy|x) <

∞ for all x. The minimizer of Problem A is

g(x) = −1

θΛ(x)′

∂V (x)

∂x

h(y, x) = exp

[V (x)− V (y)

θ

]. (4.22a)

The optimized value of the criterion is:

J(V ) = −θG[exp

(−V

θ

)]exp

(−V

θ

) . (4.22b)

Finally, the implied measure of conditional relative entropy is:

ε∗ =V G[exp(−V/θ)]− G[V exp(−V/θ)]− θG[exp(−V/θ)]

θ exp(−V/θ) . (4.22c)

Page 112: Maskin ambiguity book mock 2 - New York University

104 Chapter 4. Quartet of Semigroups

Proof. The proof is in Appendix A.

The formulas (4.22a) for the distortions will play a key role in our ap-plications to asset pricing and statistical detection.

Risk-Sensitivity as an alternative interpretation

In light of Theorem 4.5.1, our modified version of Lyapunov equation (4.18)is

δV (x) = min(g,h)∈Δ

U [x, i(x)] + θε(g, h) + G(g, h)V (x)

= U [x, i(x)] − θG[exp

(−V

θ

)](x)

exp[−V (x)

θ

] . (4.23)

If we ignore the minimization prompted by fear of model misspecificationand instead simply start with that modified Lyapunov equation as a descrip-tion of preferences, then replacing GV in the Lyapunov equation (4.18) by−θ{G[exp(−V/θ)]/exp(−V/θ)} can be interpreted as adjusting the contin-uation value for risk. For undiscounted problems, the connection betweenrisk-sensitivity and robustness is developed in a literature on risk-sensitivecontrol (e.g., see James (1992) and Runolfsson (1994)). Hansen and Sar-gent’s 1995b recursive formulation of risk sensitivity accommodates dis-counting.

The connection between the robustness and the risk-sensitivity interpre-tations is most evident when G = Gd so that x is a diffusion. Then

−θGd[exp

(−V

θ

)]exp

(−V

θ

) = Gd(V )− 1

(∂V

∂x

)′Σ

(∂V

∂x

).

In this case, (4.23) is a partial differential equation. Notice that −1/2θscales (∂V /∂x)′Σ(∂V /∂x), the local variance of the value function process{V (xt)}. The interpretation of (4.23) under risk sensitive preferences wouldbe that the decision maker is concerned not about robustness but aboutboth the local mean and the local variance of the continuation value process.The parameter θ is inversely related to the size of the risk adjustment.Larger values of θ assign a smaller concern about risk. The term 1/θ is theso-called risk sensitivity parameter.

Page 113: Maskin ambiguity book mock 2 - New York University

4.5. Model Misspecification and robust Control 105

Runolfsson (1994) deduced the δ = 0 (ergodic control) counterpart to(4.23) to obtain a robust interpretation of risk sensitivity. Partial differ-ential equation (4.23) is also a special case of the equation system thatDuffie and Epstein (1992), Duffie and Lions (1992), and Schroder and Ski-adas (1999) have analyzed for stochastic differential utility. They showedthat for diffusion models, the recursive utility generalization introduces avariance multiplier that can be state dependent. The counterpart to thismultiplier in our setup is state independent and equal to the risk sensitiv-ity parameter 1/θ. For a robust decision maker, this variance multiplierrestrains entropy between the approximating and alternative models. Themathematical connections between robustness, on the one hand, and risksensitivity and recursive utility, on the other, let us draw on a set of ana-lytical results from those literatures.18

The θ-constrained worst case model

Given a value function, Theorem 4.5.1 reports the formulas for the distor-tions (g, h) for a worst-case model used to enforce robustness. This worstcase model is Markov and depicted in terms of the value function. Thistheorem thus gives us a generator G and shows us how to fill out the secondrow in Table 4.1. In fact, a separate argument is needed to show formallythat G does in fact generate a Feller process or more generally a Markovprocess. There is a host of alternative sufficient conditions in the probabilitytheory literature. Kunita (1969) gives one of the more general treatments ofthis problem and goes outside the realm of Feller semigroups. Also, (Ethierand Kurz 1986, Chapter 8) give some sufficient conditions for operators togenerate Feller semigroups, including restrictions on the jump componentGn of the operator.

Using the Theorem 4.5.1 characterization of G, we can apply Theorem4.8.1 to obtain the generator of a detection semigroup that measures thestatistical discrepancy between the approximating model and the worst-casemodel.

18See section 4.9 for alternative interpretations of a particular empirical application interms of risk-sensitivity and robustness. For that example, we show how the robustnessinterpretation helps us to restrict θ.

Page 114: Maskin ambiguity book mock 2 - New York University

106 Chapter 4. Quartet of Semigroups

An alternative entropy constraint

We briefly consider an alternative but closely related way to compute worst-case models and to enforce robustness. In particular, we consider:

Problem BJ∗(V ) = inf

(g,h)∈Δ,ε(g,h)≤εG(g, h)V. (4.24)

This problem has the same solution as that given by Problem A exceptthat θ must now be chosen so that the relative entropy constraint is satisfied.That is, θ should be chosen so that ε(g, h) satisfies the constraint. Theresulting θ will typically depend on x. The optimized objective must nowbe adjusted to remove the penalty:

J∗(V ) = J(V )− θε∗ =V G[exp(−V/θ)]− G[V exp(−V/θ)]

exp(−V/θ) ,

which follows from (4.22c).These formulas simplify greatly when the approximating model is a dif-

fusion. Then θ satisfies

θ2 =1

(∂V (x)

∂x

)′Σ

(∂V (x)

∂x

).

This formulation embeds a version of the continuous-time preference orderthat Chen and Epstein (2002) proposed to capture uncertainty aversion.We had also suggested the diffusion version of this robust adjustment inour earlier paper (Anderson, Hansen, and Sargent 1999a).

Enlarging the class of perturbations

In this paper we focus on misspecifications or perturbations to an approxi-mating Markov model that themselves are Markov models. But in HSTW,we took a more general approach and began with a family of absolutelycontinuous perturbations to an approximating model that is a Markov dif-fusion. Absolute continuity over finite intervals puts a precise structure onthe perturbations, even when the Markov specification is not imposed onthese perturbations. As a consequence, HSTW follow James (1992) by con-sidering path dependent specifications of the drift of the Brownian motion∫ t0gsds, where gs is constructed as a general function of past x’s. Given the

Page 115: Maskin ambiguity book mock 2 - New York University

4.6. Portfolio allocation 107

Markov structure of this control problem, its solution can be represented asa time-invariant function of the state vector xt that we denote gt = g(xt).

Adding controls to the original state equation

We now allow the generator to depend on a control vector. Consider anapproximating Markov control law of the form i(x) and let the generatorassociated with an approximating model be G(i). For this generator, weintroduce perturbation (g, h) as before. We write the corresponding gen-erator as G(g, h, i). To attain a robust decision rule, we use the Bellmanequation for a two-player zero-sum Markov multiplier game:

δV = maxi

min(g,h)∈Δ

U(x, i) + θε(g, h) + G(g, h, i)V. (4.25)

The Bellman equation for a corresponding constraint game is:

δV = maxi

min(g,h)∈Δ(i),ε(g,h)≤ε

U(x, i) + G(g, h, i)V.

Sometimes infinite-horizon counterparts to terminal conditions must beimposed on the solutions to these Bellman equations. Moreover, applicationof a Verification Theorem will be needed to guarantee that the impliedcontrol laws actually solve the game. Finally, these Bellman equationspresume that the value function is twice continuously differentiable. It iswell known that this differentiability is not always present in problems inwhich the diffusion matrix can be singular. In these circumstances there istypically a viscosity generalization to each of these Bellman equations withvery similar structures. (See Fleming and Soner (1993) for a developmentof the viscosity approach to controlled Markov processes.)

4.6 Portfolio allocation

To put some of the results of section 4.5 to work, we now consider a robustportfolio problem. In section 4.7 we will use this problem to exhibit howasset prices can be deduced from the shadow prices of a robust resourceallocation problem. We depart somewhat from our previous notation andlet {xt : t ≥ 0} denote a state vector that is exogenous to the individualinvestor. The investor influences the evolution of his wealth, which we

Page 116: Maskin ambiguity book mock 2 - New York University

108 Chapter 4. Quartet of Semigroups

denote by wt. Thus the investor’s composite state at date t is (wt, xt).We first consider the case in which the exogenous component of the statevector evolves as a diffusion process. Later we let it be a jump process.Combining the diffusion and jump pieces is straightforward. We focus onthe formulation with the entropy penalty used in Problem (4.21), but theconstraint counterpart is similar.

Diffusion

An investor confronts asset markets that are driven by a Brownian motion.Under an approximating model, the Brownian increment factors have datet prices given by π(xt) and xt evolves according to a diffusion:

dxt = μ(xt)dt+ Λ(xt)dBt. (4.26)

Equivalently, the x process has a generator Gd that is a second-order differ-ential operator with drift μ and diffusion matrix Σ = ΛΛ′. A control vectorbt entitles the investor to an instantaneous payoff bt·dBt with a price π(xt)·btin terms of the consumption numeraire. This cost can be positive or nega-tive. Adjusting for cost, the investment has payoff −π(xt) · btdt + bt · dBt.There is also a market in a riskless security with an instantaneous risk freerate ρ(x). The wealth dynamics are therefore

dwt = [wtρ(xt)− π(xt) · bt − ct] dt+ bt · dBt, (4.27)

where ct is date t consumption. The control vector is i′ = (b′, c). Onlyconsumption enters the instantaneous utility function. By combining (4.26)and (4.27), we form the evolution for a composite Markov process.

But the investor has doubts about this approximating model and wantsa robust decision rule. Therefore he solves a version of game (4.25) with(4.26), (4.27) governing the dynamics of his composite state vector w, x.With only the diffusion component, the investor’s Bellman equation is

δV (w, x) = max(c,b)

mingU(c) + θε(g) + G(g, b, c)V

where G(g, b, c) is constructed using drift vector[μ(x) + Λ(x)g

wρ(x)− π(x) · b− c + b · g

]

Page 117: Maskin ambiguity book mock 2 - New York University

4.6. Portfolio allocation 109

and diffusion matrix [Λb′

] (Λ′ b

)The choice of the worst case shock g satisfies the first-order condition:

θg + Vwb+ Λ′Vx = 0 (4.28)

where Vw.= ∂V /∂w and similarly for Vx. Solving (4.28) for g gives a special

case of the formula in (4.22a). The resulting worst-case shock would dependon the control vector b. In what follows we seek a solution that does notdepend on b.

The first-order condition for consumption is

Vw(w, x) = Uc(c),

and the first-order condition for the risk allocation vector b is

−Vwπ + Vwwb+ Λ′Vxw + Vwg = 0. (4.29)

In the limiting case in which the robustness penalty parameter is set to ∞,we obtain the familiar result that

b =πVw − Λ′Vxw

Vww,

in which the portfolio allocation rule has a contribution from risk aver-sion measured by −Vw/wVww and a hedging demand contributed by thedynamics of the exogenous forcing process x.

Take the Markov perfect equilibrium of the relevant version of game(4.25). Provided that Vww is negative, the same equilibrium decision rulesprevail no matter whether one player or the other chooses first, or whetherthey choose simultaneously. The first-order conditions (4.28) and (4.29) arelinear in b and g. Solving these two linear equations gives the control lawsfor b and g as a function of the composite state (w, x):

b =θπVw − θΛ′Vxw + VwΛ

′VxθVww − (Vw)2

g =VwΛ

′Vxw − (Vw)2π − VwwΛ

′VxθVww − (Vw)2

. (4.30)

Page 118: Maskin ambiguity book mock 2 - New York University

110 Chapter 4. Quartet of Semigroups

Notice how the robustness penalty adds terms to the numerator and de-nominator of the portfolio allocation rule. Of course, the value function Valso changes when we introduce θ. Notice also that (4.30) gives decisionrules of the form

b = b(w, x)

g = g(w, x), (4.31)

and in particular how the worst case shock g feeds back on the consumer’sendogenous state variable w. Permitting g to depend on w expands thekinds of misspecifications that the consumer considers.

Related formulations

So far we have studied portfolio choice in the case of a constant robustnessparameter θ. Maenhout (2001) considers portfolio problems in which therobustness penalty depends on the continuation value. In his case, the pref-erence for robustness is designed so that asset demands are not sensitiveto wealth levels as is typical in constant θ formulations. Lei (2001) usesthe instantaneous constraint formulation of robustness described in section4.5 to investigate portfolio choice. His formulation also makes θ state de-pendent, since θ now formally plays the role of a Lagrange multiplier thatrestricts conditional entropy at every instant. Lei specifically considers thecase of incomplete asset markets in which the counterpart to b has a lowerdimension than the Brownian motion.

Ex-Post bayesian interpretation

While the dependence of g on the endogenous state w seems reasonable asa way to enforce robustness, it can be unattractive if we wish to interpretthe implied worst case model as one with misspecified exogenous dynamics.It is sometimes asked whether a prescribed decision rule can be rationalizedas being optimal for some set of beliefs, and then to find what those beliefsmust be. The dependence of the shock distributions on an endogenous statevariable such as wealth w might be regarded as a peculiar set of beliefsbecause it is egotistical to let an adverse nature feedback on personal statevariables.

But there is a way to make this feature more acceptable. It requiresusing a dynamic counterpart to an argument of Blackwell and Girshick

Page 119: Maskin ambiguity book mock 2 - New York University

4.6. Portfolio allocation 111

(1954). We can produce a different representation of the solution to thedecision problem by forming an exogenous state vector W that conformsto the Markov perfect equilibrium of the game. We can confront a decisionmaker with this law of motion for the exogenous state vector, have him notbe concerned with robustness against misspecification of this law by settingθ = ∞, and pose an ordinary decision problem in which the decision makerhas a unique model. We initialize the exogenous state at W0 = w0. Theoptimal decision processes for {(bt, ct)} (but not the control laws) will beidentical for this decision problem and for game (4.25) (see HSWT). It canbe said that this alternative problem gives a Bayesian rationale for therobust decision procedure.

Jumps

Suppose now that the exogenous state vector {xt} evolves according toa Markov jump process with jump measure η. To accommodate port-folio allocation, introduce the choice of a function a that specifies howwealth changes when a jump takes place. Consider an investor who facesasset markets with date-state Arrow security prices given by Π(y, xt) where{xt} is an exogenous state vector with jump dynamics. In particular, achoice a with instantaneous payoff a(y) if the state jumps to y has a price∫Π(y, xt)a(y)η(dy|x) in terms of the consumption numeraire. This cost can

be positive or negative. When a jump does not take place, wealth evolvesaccording to

dwt =

[ρ(xt−)wt− −

∫Π(y, xt−)a(y)η(dy|xt−)− ct−

]dt

where ρ(x) is the riskfree rate given state x and for any variable z, zt− =limτ↑t zτ . If the state x jumps to y at date t, the new wealth is a(y). TheBellman equation for this problem is

δV (w, x) = maxc,a

minh∈Δ

U(c) + Vw(w, x)

[ρ(x)wt −

∫Π(y, x)a(y)η(dy|x)− c

]+θ

∫[1− h(y, x) + h(y, x) log h(y, x)]η(dy|x)

+

∫h(y, x)(V [a(y), y]− V (w, x))η(dy|x)

Page 120: Maskin ambiguity book mock 2 - New York University

112 Chapter 4. Quartet of Semigroups

The first-order condition for c is the same as for the diffusion case andequates Vw to the marginal utility of consumption. The first-order conditionfor a requires

h(y, x)Vw[a(y), y] = Vw(w, x)Π(y, x),

and the first-order condition for h requires

−θ log h(y, x) = V [a(y), y]− V (w, x).

Solving this second condition for h gives the jump counterpart to the solu-tion asserted in Theorem 4.5.1. Thus the robust a satisfies:

Vw[a(y), y]

Vw(w, x)=

Π(y, x)

exp(

−V [a(y),y]+V (x)θ

) .In the limiting no-concern-about-robustness case θ = ∞, h is set to one.

Since Vw is equated to the marginal utility for consumption, the first-ordercondition for a equates the marginal rate of substitution of consumptionbefore and after the jump to the price Π(y, x). Introducing robustnessscales the price by the jump distribution distortion.

In this portrayal, the worst case h depends on the endogenous statew, but it is again possible to obtain an alternative representation of theprobability distortion that would give an ex post Bayesian justification forthe decision process of a.

4.7 Pricing risky claims

By building on findings of Hansen and Scheinkman (2002), we now considera third semigroup that is to be used to price risky claims. We denote thissemigroup by {Pt : t ≥ 0} where Ptφ assigns a price at date zero to a date tpayoff φ(xt). That pricing can be described by a semigroup follows from theLaw of Iterated Values: a date 0 state-date claim φ(xt) can be replicatedby first buying a claim Pτφ(xt−τ ) and then at time t − τ buying a claimφ(xt). Like our other semigroups, this one has a generator, say G, that wewrite as in (4.10):

Gφ = −ρφ+ μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+ Nφ

Page 121: Maskin ambiguity book mock 2 - New York University

4.7. Pricing risky claims 113

where

Nφ =

∫[φ(y)− φ(x)]η(dy|x).

The coefficient on the level term ρ is the instantaneous riskless yield tobe given in formula (4.34). It is used to price locally riskless claims. Takentogether, the remaining terms

μ ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+ Nφ

comprise the generator of the so called risk neutral probabilities. The riskneutral evolution is Markov.

As discussed by Hansen and Scheinkman (2002), we should expect thereto be a connection between the semigroup underlying the Markov processand the semigroup that underlies pricing. Like the semigroup for Markovprocesses, a pricing semigroup is positive: it assigns nonnegative prices tononnegative functions of the Markov state. We can thus relate the semi-groups by importing the measure-theoretic notion of equivalence. Pricesof contingent claims that pay off only in probability measure zero eventsshould be zero. Conversely, when the price of a contingent claim is zero,the event associated with that claim should occur only with measure zero,which states the principle of no-arbitrage. We can capture these propertiesby specifying that the generator G of the pricing semigroup satisfies:

μ(x) = μ(x) + Λ(x)π(x)

Σ(x) = Σ(x)

η(x) = Π(y, x)η(dy|x) (4.32)

where Π is strictly positive. Thus we construct equilibrium prices by pro-ducing a triple (ρ, π, Π). We now show how to construct this triple bothwith and without a preference for robustness.

Marginal rate of substitution pricing

To compute prices, we follow Lucas (1978) and focus on the consump-tion side of the market. While Lucas used an endowment economy, Brock(1982) showed that the essential thing in Lucas’s analysis was not the pureendowment feature. Instead it was the idea of pricing assets from marginalutilities that are evaluated at a candidate equilibrium consumption process

Page 122: Maskin ambiguity book mock 2 - New York University

114 Chapter 4. Quartet of Semigroups

that can be computed prior to computing prices. In contrast to Brock, weuse a robust planning problem to generate a candidate equilibrium allo-cation. As in Breeden (1979), we use a continuous-time formulation thatprovides simplicity along some dimensions.19

Pricing without a concern for robustness

First consider the case in which the consumer has no concern about modelmisspecification. Proceeding in the spirit of Lucas (1978)and Brock (1982),we can construct market prices of risk from the shadow prices of a planningproblem. Following Lucas and Prescott (1971) and Mehra and Prescott(1985a), we solve a representative agent planning problem to get a stateprocess {xt}, an associated control process {it}, and a marginal utility ofconsumption process {γt}, respectively. We let G∗ denote the generator forthe state vector process that emerges when the optimal controls from theresource allocation problem with no concern for robustness are imposed. Ineffect, G∗ is the generator for the θ = ∞ robust control problem.

We construct a stochastic discount factor process by evaluating themarginal rate of substitution at the proposed equilibrium consumption pro-cess:

mrst = exp(−δt)γ(xt)γ(x0)

where γ(x) denotes the marginal utility process for consumption as a func-tion of the state x. Without a preference for robustness, the pricing semi-group satisfies

Ptφ(x) = E∗ [mrstφ(xt)|x0 = x] (4.33)

where the expectation operator E∗ is the one implied by G∗.

Individuals solve a version of the portfolio problem described in section4.6 without a concern for robustness. This supports the following represen-

19This analysis differs from that of Breeden (1979) by its inclusion of jumps.

Page 123: Maskin ambiguity book mock 2 - New York University

4.7. Pricing risky claims 115

tation of the generator for the equilibrium pricing semigroup Pt:

ρ = −G∗γγ

+ δ

μ = μ∗ + Λ∗π = μ∗ + Λ∗Λ∗′∂ log γ∂x

η(dy|x) = Π(y, x)η∗(dy|x) =[γ(y)

γ(x)

]η∗(dy|x). (4.34)

These are the usual rational expectations risk prices. The risk-free rate isthe subjective rate of discount reduced by the local mean of the equilibriummarginal utility process scaled by the marginal utility. The vector π ofBrownian motion risk prices are weights on the Brownian increment inthe evolution of the marginal utility of consumption, again scaled by themarginal utility. Finally the jump risk prices Π are given by the equilibriummarginal rate of substitution between consumption before and after a jump.

Pricing with a concern for robustness under the worst

case model

As in our previous analysis, let G denote the approximating model. Thisis the model that emerges after imposing the robust control law i whileassuming that there is no model misspecification (g = 0 and h = 1). Itdiffers from G∗, which also assumes no model misspecification but insteadimposes a rule derived w ithout any preference for robustness. But simplyattributing the beliefs G to private agents in (4.34) will not give us thecorrect equilibrium prices when there is a preference for robustness. Let Gdenote the worst case model that emerges as part of the Markov perfectequilibrium of the two-player, zero-sum game. However, formula (4.34) w illyield the correct equilibrium prices if we in effect impute to the individualagents the worst-case generator G instead of G∗ as their model of state evo-lution when making their decisions w ithout any concerns about its possiblemisspecification.

To substantiate this claim, we consider individual decision-makers who,when choosing their portfolios, use the worst-case model G as if it werecorrect (i.e., they have no concern about the misspecification of that model,so that rather than entertaining a family of models, the individuals committo the worst-case G as a model of the state vector {xt : t ≥ 0}). The pricing

Page 124: Maskin ambiguity book mock 2 - New York University

116 Chapter 4. Quartet of Semigroups

semigroup then becomes

Ptφ(x) = E[mrstφ(xt)|x0 = x] (4.35)

where E denotes the mathematical expectation with respect to the d istortedmeasure described by the generator G. The generator for this pricing semi-group is parameterized by

ρ = −Gγγ

+ δ

μ = μ+ Λg = μ+ ΛΛ′∂ log γ∂x

(4.36)

η(dy|x) = h(y, x)η(dy|x) =[γ(y)

γ(x)

]η(dy|x).

As in subsection 4.7, γ(x) is the log of the marginal utility of consump-tion except it is evaluated at the solution of the robust planning problem.Individuals solve the portfolio problem described in section 4.6 using theworst-case model of the state {xt} with pricing functions π = g and Π = hspecified relative to the worst-case model. We refer to g and h as r isk pricesbecause they are equilibrium prices that emerge from an economy in whichindividual agents use the worst-case model as if it were the correct model toassess risk. The vector g contains the so-called factor risk prices associatedwith the vector of Brownian motion increments. Similarly, h prices jumprisk.

Comparison of (4.34) and (4.36) shows that the formulas for factor riskprices and the risk free rate are identical except that we have used thedistorted generator G in place of G∗. This comparison shows that we canuse standard characterizations of asset pricing formulas if we simply replacethe generator for the approximating model G with the distorted generatorG.20

Pricing under the approximating model

There is another portrayal of prices that uses the approximating model G asa reference point and that provides a vehicle for defining model uncertainty

20In the applications in HST, HSW, and section 4.9, we often take the actual datagenerating model to be the approximating model to study implications. In that sense,the approximating model supplies the same kinds of empirical restrictions that a rationalexpectations econometric model does.

Page 125: Maskin ambiguity book mock 2 - New York University

4.7. Pricing risky claims 117

prices and for distinguishing between the contributions of risk and modeluncertainty. The g and h from subsection 4.7 give the risk components. Wenow use the discrepancy between G and G to produce the model uncertaintyprices.

To formulate model uncertainty prices, we consider how prices can berepresented under the approximating model when the consumer has a pref-erence for robustness. We want to represent the pricing semigroup as

Ptφ(x) = E[(mrst)(mput)φ(xt)|x0 = x] (4.37)

where mpu is a multiplicative adjustment to the marginal rate of substi-tution that allows us to evaluate the conditional expectation with respectto the approximating model rather than the distorted model. Instead of(4.34), to attain (4.37), we portray the drift and jump distortion in thegenerator for the pricing semigroup as

μ = μ+ Λg = μ+ Λ(g + g)

η(dy|x) = h(y, x)η(dy|x) = h(y, x)h(y, x)η(dy|x).

Changing expectation operators in depicting the pricing semigroup will notchange the instantaneous risk-free yield. Thus from Theorem 4.5.1 we have:

Theorem 4.7.1. Let V p be the value function for the robust resource alloca-tion problem. Suppose that (i) V p is in C2 and (ii)

∫exp[−V p(y)/θ]η(dy|x) <

∞ for all x. Moreover, γ is assumed to be in the domain of the extendedgenerator G. Then the equilibrium prices can be represented by:

ρ = −Gγγ

+ δ

π(x) = −1

θΛ(x)′V p

x (x) + Λ(x)′[γx(x)

γ(x)] = g(x) + g(x)

log Π(y, x) = −1

θ[V p(y)− V p(x)] + log γ(y)− log γ(x) = log h(y, x) + log h(y, x).

This theorem follows directly from the relation between G and G given inTheorem 4.5.1 and from the risk prices of subsection 4.7. It supplies thethird row of Table 4.1.

Page 126: Maskin ambiguity book mock 2 - New York University

118 Chapter 4. Quartet of Semigroups

Model uncertainty prices: diffusion and jump

components

We have already interpreted g and h as risk prices. Thus we view g =−1θΛ′V p

x as the contribution to the Brownian exposure prices that comes

from model uncertainty. Similarly, we think of h(y, x) = −1θexp[V p(y) −

V p(x)] as the model uncertainty contribution to the jump exposure prices.HST obtained the additive decomposition for the Brownian motion expo-sure asserted in Theorem 4.7.1 as an approximation for linear-quadratic,Gaussian resource allocation problems. By studying continuous time dif-fusion models we have been able to sharpen their results and relax thelinear-quadratic specification of constraints and preferences.

Subtleties about decentralization

In Hansen and Sargent (2003a), we confirm that the solution of a robustplanning problem can be decentralized with households who also solve ro-bust decision problems while facing the state-date prices that we derivedabove. We confront the household with a recursive representation of state-date prices, give the household the same robustness parameter θ as theplanner, and allow the household to choose a new worst-case model. Therecursive representation of the state-date prices is portrayed in terms ofthe state vector X for the planning problem. As in the portfolio problemsof section 4.6, among the households’ state variables is their endogenouslydetermined financial wealth, w. In equilibrium, the household’s wealth canbe expressed as a function of the state vector X of the planner. However,in posing the household’s problem, it is necessary to include both wealthw and the state vector X that propels the state-date prices as distinctstate components of the household’s state. More generally, it is necessaryto include both economy-wide and individual versions of household capitalstocks and physical capital stocks in the household’s state vector, where theeconomy-wide components are used to provide a recursive representation ofthe date-state prices.

Thus the controls and the worst case shocks chosen by both the planner,on the one hand, and the households in the decentralized economy, on theother hand, will depend on different state vectors. However, in a competi-tive equilibrium, the decisions that emerge from these distinct rules will beperfectly aligned. That is, if we take the decision rules of the household

Page 127: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 119

in the decentralized economy and impose the equilibrium conditions re-quiring that ‘the representative agent be representative’, then the decisionsand the motion of the state will match. The worst-case models will alsomatch. In addition, although the worst-case models depend on differentstate variables, they coincide along an equilibrium path.

Ex post Bayesian equilibrium interpretation of

robustness

In a decentralized economy, Hansen and Sargent (2003a) also confirm thatit is possible to compute robust decision rules for both the planner and thehousehold by a) endowing each such decision maker with his own worst-casemodel, and b) having each solve his decision problem w ithout a preferencefor robustness, while treating those worst-case models as if they were true.Ex post it is possible to interpret the decisions made by a robust decisionmaker who has a concern about the misspecification of his model as alsobeing made by an equivalent decision maker who has no concern about themisspecification of a d ifferent model that can be constructed from the worstcase model that is computed by the robust decision maker. Hansen andSargent’s 2003a results thus extend results of HSTW, discussed in section4.6, to a setting where both a planner and a representative household chooseworst case models, and where their worst case models turn out to be aligned.

4.8 Statistical discrimination

A weakness in what we have achieved up to now is that we have providedthe practitioner with no guidance on how to calibrate our model uncertaintypremia of Theorem 4.7.1, or what formulas (4.22a) tell us is virtually thesame thing, the decision maker’s robustness parameter θ. It is at this criticalpoint that our fourth semigroup enters the picture.21

Our fourth semigroup governs bounds on detection statistics that we canuse to guide our thinking about how to calibrate a concern about robustness.We shall synthesize this semigroup from the objects in two other semigroupsthat represent alternative models that we want to choose between given a

21As we shall see in section 4.9, our approach to disciplining the choice of θ dependscritically on our adopting a robustness and not a risk-sensitivity interpretation.

Page 128: Maskin ambiguity book mock 2 - New York University

120 Chapter 4. Quartet of Semigroups

finite data record. We apply the bounds associated with distinguishingbetween the decision maker’s approximating and worst-case models. In de-signing a robust decision rule, we assume that our decision maker worriesabout alternative models that available time series data cannot readily dis-pose of. Therefore, we study a stylized model selection problem. Supposethat a decision-maker chooses between two models that we will refer to aszero and one. Both are continuous-time Markov process models. We con-struct a measure of how much time series data are needed to distinguishthese models and then use it to calibrate our robustness parameter θ. Ourstatistical discrepancy measure is the same one that in section 4.5 we usedto adjust continuation values in a dynamic programming problem that isdesigned to acknowledge concern about model misspecification.

Measurement and prior probabilities

We assume that there are direct measurements of the state vector {xt : 0 ≤t ≤ N} and aim to discriminate between two Markov models: model zeroand model one. We assign prior probabilities of one-half to each model. Ifwe choose the model with the maximum posterior probability, two typesof errors are possible, choosing model zero when model one is correct andchoosing model one when model zero is correct. We weight these errorsby the prior probabilities and, following Chernoff (1952), study the errorprobabilities as the sample interval becomes large.

A semigroup formulation of bounds on errorprobabilities

We evade the difficult problem of precisely calculating error probabilities fornonlinear Markov processes and instead seek bounds on those error proba-bilities. To compute those bounds, we adapt Chernoff’s 1952 large deviationbounds to discriminate between Markov processes. Large deviation toolsapply here because the two types of error both get small as the sample sizeincreases. Let G0 denote the generator for Markov model zero and G1 thegenerator for Markov model one. Both can be represented as in (4.13).

Page 129: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 121

Discrimination in discrete time

Before developing results in continuous time, we discuss discrimination be-tween two Markov models in discrete time. Associated with each Markovprocess is a family of transition probabilities. For any interval τ , thesetransition probabilities are mutually absolutely continuous when restrictedto some event that has positive probability under both probability mea-sures. If no such event existed, then the probability distributions would beorthogonal, making statistical discrimination easy. Let pτ (y|x) denote theratio of the transition density over a time interval τ of model one relative tothat for model zero. We include the possibility that pτ (y|x) integrates to amagnitude less than one using the model zero transition probability distri-bution. This would occur if the model one transition distribution assignedpositive probability to an event that has measure zero under model zero.We also allow the density pτ to be zero with positive model zero transitionprobability.

If discrete time data were available, say x0, xτ , x2τ , ..., xTτ where N =Tτ , then we could form the log likelihood ratio:

�Nτ =

T∑j=1

log pτ (xjτ , x(j−1)τ ).

Model one is selected when�Nτ > 0, (4.38)

and model zero is selected otherwise. The probability of making a classifi-cation error at date zero conditioned on model zero is

Pr{�Nτ > 0|x0 = x, model 0} = E(1{ Nτ >0}

∣∣x0 = x, model 0).

It is convenient that the probability of making a classification error condi-tioned on model one can also be computed as an expectation of a trans-formed random variable conditioned on model zero. Thus,

Pr{�Nτ < 0|x0 = x, model 1} = E[1{ Nτ <0}|x0 = x, model 1

]= E

[exp

(�Nτ)1{ Nτ <0}|x0 = x, model 0

].

The second equality follows because multiplication of the indicator functionby the likelihood ratio exp

(�Nτ)converts the conditioning model from one

Page 130: Maskin ambiguity book mock 2 - New York University

122 Chapter 4. Quartet of Semigroups

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

exp(r)

min

(exp

(r),

1)

min(exp(r),1)exp(α r)

Figure 4.1: Graph of min {exp(r), 1} and the dominating function exp(rα)for α = .5.

to zero. Combining these two expressions, the average error is:

av error =1

2E(min{exp

(�Nτ), 1}

∣∣∣x0 = x, model 0). (4.39)

Because we compute expectations only under the model zero probabilitymeasure, from now on we leave implicit the conditioning on model zero.

Instead of using formula (4.39) to compute the probability of makingan error, we will use a convenient upper bound originally suggested byChernoff (1952) and refined by Hellman and Raviv (1970). To motivate thebound, note that for any 0 < α < 1 the piecewise linear function min{s, 1}is dominated by the concave function sα and that the two functions agree atthe kink point s = 1. The smooth dominating function gives rise to moretractable computations as we alter the amount of available data. Thus,setting log s = r = �Nτ and using (4.39) gives the bound:

av error ≤ 1

2E[exp

(α�Nτ

) ∣∣∣x0 = x]

(4.40)

where the right side is the moment-generating function for the log-likelihoodratio �Nτ (see Figure 4.1). (Later we shall discuss how to choose α ∈ (0, 1)

Page 131: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 123

in order to maximize error detection rates.) Define an operator:

Kατ φ(x) = E

[exp (α�ττ)φ(xτ )

∣∣∣x0 = x].

Then inequality (4.40) can be portrayed as:

av error ≤ 1

2(Kα

τ )T 1D(x) (4.41)

where 1D is again the indictor function of the state space D for the Markovprocess, and where superscript T on the right side denotes sequential ap-plication of an operator T times. This bound applies for any integer choiceof T and any choice of α between zero and one.22

When restricted to a function space C, we have the inequalities

|Kατ φ(x)| ≤ E

([exp (�ττ )]

α |φ|∣∣∣x0 = x

)≤ E

[exp (�ττ ) |φ|

∣∣∣x0 = x]α

≤ ‖φ‖

where the second inequality is an application of Jensen’s inequality. ThusKατ is a contraction on C.

Rates for measuring discrepancies between modelslocally

Classification errors become less frequent as more data become available.One common way to study the large sample behavior of classification errorprobabilities is to investigate the limiting behavior of the operator (Kα

τ )T as

T gets large. This amounts to studying how fast (Kατ )T contracts for large T

and results in a large deviation characterization. Chernoff (1952) proposedsuch a characterization for i.i.d. data that later researchers extended toMarkov processes. Formally, a large-deviation analysis can give rise to anasymptotic rate for discriminating between models.

Given the state dependence in the Markov formulation, there are twodifferent possible notions of discrimination rates that are based on Chernoff

22This bound covers the case in which the model one density omits probability, andso equivalence between the two measures is not needed for this bound to be informative.

Page 132: Maskin ambiguity book mock 2 - New York University

124 Chapter 4. Quartet of Semigroups

entropy. One notion that we shall dub ‘long run’ is state independent; toconstruct it requires additional assumptions. This long run rate is computedby studying the semigroup {Kt : t ≥ 0} as t gets large. This semigroup canhave a positive eigenfunction, that is, a function φ that solves

Ktφ = exp(−δt)φ (4.42)

for some positive δ. When it exists, this eigenfunction dominates the re-maining ones as the time horizon t gets large. As a consequence, for larget this semigroup decays approximately at an exponential rate δ. Therefore,δ is a long run measure of the rate at which information for discriminatingbetween two models accrues. By construction and as part of its ‘long run’nature, the rate δ is independent of the state.

In this paper, we use another approximation that results in a state-dependent or ‘short-run’ discrimination rate. It is this state-dependent ratethat is closely linked to our robust decision rule, in the sense that it isgoverned by the same objects that emerge from the worst-case analysis forour robust control problem. The semigroup {Kt : t ≥ 0} has the sameproperties as a pricing semigroup, and furthermore it contracts. We candefine a discrimination rate in the same way that we define an instantaneousinterest rate from a pricing semigroup. This leads us to use Chernoff entropyas ρα(x). It differs from the decay rate δ defined by (4.42). For a given statex, it measures the statistical ability to discriminate between models whena small interval of data becomes available. When the rate ρα(x) is large,the time series data contain more information for discriminating betweenmodels.

Before characterizing a local discrimination rate ρα that is applicable tocontinuous-time processes, we consider the following example.

Constant drift

Consider sampling a continuous multivariate Brownian motion with a con-stant drift. Let μ0,Σ0 and μ1,Σ1 be the drift vectors and constant diffusionmatrices for models zero and one, respectively. Thus under model zero,xjτ − x(j−1)τ is normally distributed with mean τμ0 and covariance matrixτΣ0. Under an alternative model one, xjτ − x(j−1)τ is normally distributedwith mean τμ1 and covariance matrix τΣ1.

Suppose that Σ0 = Σ1 and that the probability distributions impliedby the two models are equivalent (i.e., mutually absolutely continuous).

Page 133: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 125

Equivalence will always be satisfied when Σ0 and Σ1 are nonsingular butwill also be satisfied when the degeneracy implied by the covariance matricescoincides. It can be shown that

limτ↓0

Kατ 1D < 1

suggesting that a continuous-time limit will not result in a semigroup. Re-call that a semigroup of operators must collapse to the identity when theelapsed interval becomes arbitrarily small. When the covariance matricesΣ0 and Σ1 differ, the detection-error bound remains positive even whenthe data interval becomes small. This reflects the fact that while absolutecontinuity is preserved for each positive τ , it is known from Cameron andMartin (1947) that the probability distributions implied by the two limitingcontinuous-time Brownian motions will not be mutually absolutely contin-uous when the covariance matrices differ. Since diffusion matrices can beinferred from high frequency data, differences in these matrices are easy todetect.23

Suppose that Σ0 = Σ1 = Σ. If μ0−μ1 is not in the range of Σ, then thediscrete-time transition probabilities for the two models over an interval τare not equivalent, making the two models easy to distinguish using data.If, however, μ0 − μ1 is in the range of Σ, then the probability distributionsare equivalent for any transition interval τ . Using a complete-the-squareargument, it can be shown that

Kατ φ(x) = exp(−τρα)

∫φ(y)P α

τ (y − x)dy

where P τα is a normal distribution with mean τ(1−α)μ0+ ταμ1 and covari-

ance matrix τΣ,

ρα =α(1− α)

2(μ0 − μ1)′Σ−1(μ0 − μ1) (4.43)

and Σ−1 is a generalized inverse when Σ is singular. It is now the case that

limτ↓0

Kατ 1D = 1.

23The continuous time diffusion specification carries with it the implication that dif-fusion matrices can be inferred from high frequency data without knowledge of the drift.That data are discrete in practice tempers the notion that the diffusion matrix canbe inferred exactly. Nevertheless, estimating conditional means is much more difficultthan estimating conditional covariances. Our continuous time formulation simplifies ouranalysis by focusing on the more challenging drift inference problem.

Page 134: Maskin ambiguity book mock 2 - New York University

126 Chapter 4. Quartet of Semigroups

The parameter ρα acts as a discount rate, since

Kατ 1D = exp(−τρα).

The best probability bound (the largest ρα) is obtained by setting α = 1/2,and the resulting discount rate is referred to as Chernoff entropy. Thecontinuous-time limit of this example is known to produce probability distri-butions that are absolutely continuous over any finite horizon (see Cameronand Martin 1947).

For this example, the long-run discrimination rate δ and the short-runrate ρα coincide because ρα is state independent. This equivalence emergesbecause the underlying processes have independent increments. For moregeneral Markov processes, this will not be true and the counterpart to theshort-run rate will depend on the Markov state. The existence of a welldefined long-run rate requires special assumptions.

Continuous time

There is a semigroup probability bound analogous to (4.40) that helps tounderstand how data are informative in discriminating between Markovprocess models. Suppose that we model two Markov processes as Fellersemigroups. The generator of semigroup zero is

G0φ = μ0 ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+N 0φ

and the generator of semigroup one is

G1φ = μ1 ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+N 1φ

In specifying these two semigroups, we assume identical Σ’s. As in the ex-ample, this assumption is needed to preserve absolute continuity. Moreover,we require that μ1 can be represented as:

μ1 = Λg + μ0.

for some continuous function g of the Markov state, where we assume thatthe rank of Σ is constant on the state space and can be factored as Σ = ΛΛ′

Page 135: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 127

where Λ has full rank. This is equivalent to requiring that μ0 −μ1 is in therange of Σ.24

In contrast to the example, however, both of the μ’s and Σ can dependon the Markov state. Jump components are allowed for both processes.These two operators are restricted to imply jump probabilities that aremutually absolutely continuous for at least some nondegenerate event. Welet h(·, x) denote the density function of the jump distribution of N 1 withrespect to the distribution of N 0. We assume that h(y, x)dη0(y|x) is finitefor all x. Under absolute continuity we write:

N 1φ(x) =

∫h(y, x)[φ(y)− φ(x)]η0(dy|x).

Associated with these two Markov processes is a positive, contractionsemigroup {Kα

t : t ≥ 0} for each α ∈ (0, 1) that can be used to bound theprobability of classification errors:

av error ≤ 1

2(Kα

N) 1D(x).

This semigroup has a generator Gα with the Feller form:

Gαφ = −ραφ+ μα ·(∂φ

∂x

)+

1

2trace

∂2φ

∂x∂x′

)+N αφ. (4.44)

The drift μα is formed by taking convex combinations of the drifts for thetwo models

μα = (1− α)μ0 + αμ1 = μ0 + αΛg;

the diffusion matrix Σ is the common diffusion matrix for the two models,and the jump operator N α is given by:

N αφ(x) =

∫[h(y, x)]α[φ(y)− φ(x)]η0(dy|x)

Finally, the rate ρα is nonnegative and state dependent and is the sum ofcontributions from the diffusion and jump components:

ρα(x) = ραd (x) + ραn(x). (4.45)

24This can be seen by writing Λg = ΣΛ′(Λ′Λ)−1g.

Page 136: Maskin ambiguity book mock 2 - New York University

128 Chapter 4. Quartet of Semigroups

The diffusion contribution

ραd (x) =(1− α)α

2g(x)′g(x) (4.46)

is a positive semi-definite quadratic form and the jump contribution

ραn(x) =

∫((1− α) + αh(y, x)− [h(y, x)]α)η0(dy|x) (4.47)

is positive because the tangent line to the concave function (h)α at h = 1must lie above the function.

Theorem 4.8.1. (Newman 1973 and Newman and Stuck 1979). The gen-erator of the positive contraction semigroup {Kα

t : t ≥ 0} on C is given by(4.44).

Thus we can interpret the generator Gα that bounds the detection errorprobabilities as follows. Take model zero to be the approximating model,and model one to be some other competing model. We use 0 < α < 1 tobuild a mixed diffusion-jump process from the pair (gα, hα) where gα

.= αg

and hα.= (h)α.

Use the notation Eα to depict the associated expectation operator. Then

av error ≤ 1

2Eα

[exp

(−∫ N

0

ρα(xt)dt

)∣∣∣x0 = x

]. (4.48)

Of particular interest to us is formula (4.45) for ρα, which can be interpretedas a local statistical discrimination rate between models. In the case of twodiffusion processes, this measure is a state-dependent counterpart to formula(4.43) in the example presented in section 4.8. The diffusion component ofthe rate is maximized by setting α = 1/2. But when jump components arealso present, α = 1/2 will not necessarily give the maximal rate. Theorem4.8.1 completes the fourth row of Table 4.1.

Detection statistics and robustness

Formulas (4.46) and (4.47) show how the local Chernoff entropy rate isclosely related to the conditional relative entropy measure that we used to

Page 137: Maskin ambiguity book mock 2 - New York University

4.8. Statistical discrimination 129

formulate robust control problems. In particular, the conditional relativeentropy rate ε in continuous time satisfies

ε = −dρα

∣∣∣α=1

.

In the case of diffusion process, the Chernoff rate equals the conditionalrelative entropy rate without the proportionality factor α(1− α).25

Further discussion

In some ways, the statistical decision problem posed above is too simple.It entails a pairwise comparison of ex ante equally likely models and givesrise to a statistical measure of distance. That the contenders are bothMarkov models greatly simplifies the bounds on the probabilities of makinga mistake when choosing between models. The implicit loss function thatjustifies model choice based on the maximal posterior probabilities is sym-metric (e.g. see Chow 1957). Finally, the detection problem compels thedecision-maker to select a specific model after a fixed amount of data havebeen gathered.

Bounds like Chernoff’s can be obtained when there are more than twomodels and also when the decision problem is extended to allow waitingfor more data before making a decision (e.g. see Hellman and Raviv 1970and Moscarini and Smith 2002).26 Like our problem, these generalizationscan be posed as Bayesian problems with explicit loss functions and priorprobabilities.

While the statistical decision problem posed here is by design too simple,we nevertheless find it useful in bounding a reasonable taste for robustness.The hypothesis of rational expectations instructs agents and the modelbuilder to eliminate as misspecified those models that are detectable frominfinite histories of data. Chernoff entropy gives us one way to extendrational expectations by asking agents to exclude specifications rejected

25The distributions associated with these rates differ, however. Bound (4.48) alsouses a Markov evolution indexed by α, whereas we used the α = 1 model in evaluatingthe robust control objective.

26In particular, Moscarini and Smith (2002) consider Bayesian decision problems witha more general but finite set of models and actions. Although they restrict their analysisto i.i.d. data, they obtain a more refined characterization of the large sample consequencesof accumulating information. Chernoff entropy is a key ingredient in their analysis too.

Page 138: Maskin ambiguity book mock 2 - New York University

130 Chapter 4. Quartet of Semigroups

by f inite histories of data but to contemplate alternative models that aredifficult to detect from f inite histories of data. When Chernoff entropyis small, it is challenging to choose between competing models on purelystatistical grounds.

Detection and plausibility

In section 4.6, we reinterpreted the equilibrium allocation under a prefer-ence for robustness as one that instead would be chosen by a Bayesian socialplanner who holds fast to a particular model that differs from the approxi-mating model. If the approximating model is actually true, then this artifi-cial Bayesian planner has a false model of forcing processes, one that enoughdata should disabuse him of. However, our detection probability tools let uskeep the Bayesian planner’s model sufficiently close to the approximatingmodel that more data than are available would be needed to detect that theapproximating model is really better. That means that a rational expecta-tions econometrician would have a difficult time distinguishing the forcingprocess under the approximating model from the Bayesian planner’s model.Nevertheless, that the Bayesian planner uses such a nearby model can haveimportant quantitative implications for decisions and/or asset prices. Wedemonstrate such effects on asset prices in section 4.9.27

4.9 Entropy and the market price of

uncertainty

In comparing different discrete time representative agent economies with ro-bust decision rules, HSW computationally uncovered a connection betweenthe market price of uncertainty and the detection error probabilities fordistinguishing between a planner’s approximating and worst case models.The connection was so tight that the market price of uncertainty could beexpressed as nearly the same linear function of the detection error prob-ability, regardless of the details of the model specification.28 HSW used

27For heterogeneous agent economies, worst case models can differ across agents be-cause their preferences differ. But an argument like the one in the text could still be usedto keep each agent’s worst case model close to the approximating model as measured bydetection probabilities.

28See Figure 8 in HSW.

Page 139: Maskin ambiguity book mock 2 - New York University

4.9. Entropy and the market price of uncertainty 131

this fact to calibrate θ’s, which differed across different models because therelationship between detection error probabilities and θ d id depend on thedetailed model dynamics.

As emphasized in section 4.2, the tight link that we have formally es-tablished between the semigroup for pricing under robustness and the semi-group for detection error probability bounds provides the key to understand-ing HSW’s empirical finding, provided that the detection error probabilitybounds do a good enough job of describing the actual detection error prob-abilities. Subject to that proviso, our formal results thus provide a wayof moving directly from a specification of a sample size and a detectionerror probability to a prediction of the market price of uncertainty thattranscends details of the model.

Partly to explore the quality of the information in our detection errorprobability bounds, this section takes three distinct example economies andshows within each of them that the probability bound is quite informativeand that consequently the links between the detection error probabilityand the market price of uncertainty are very close across the three differentmodels. All three of our examples use diffusion models, so that the formulassummarized in section 4.2 apply. Recall from formula (4.46) that for thecase of a diffusion the local Chernoff rate for discriminating between theapproximating model and the worst-case model is

α(1− α)|g|22, (4.49)

which is maximized by setting α = .5. Small values of the rate suggestthat the competing models would be difficult to detect using time seriesstatistical methods.

For a diffusion, we have seen how the price vector for the Brownianincrements can be decomposed as g + g. In the standard model withoutrobustness, the conditional slope of a mean-standard deviation frontier isthe absolute value of the factor risk price vector, |g|, but with robustness itis |g+ g|, where g is the part attributable to aversion to model uncertainty.One possible statement of the equity-premium puzzle is that standard mod-els imply a much smaller slope than is found in the data because plausiblerisk aversion coefficients imply a small value for g. This conclusion extendsbeyond the comparison of stocks and bonds and is also apparent in equityreturn heterogeneity. See Hansen and Jagannathan (1991), Cochrane andHansen (1992), and Cochrane (1997) for discussions.

Page 140: Maskin ambiguity book mock 2 - New York University

132 Chapter 4. Quartet of Semigroups

In this section we explore the potential contribution from model uncer-tainty. In particular, for three models we compute |g|, the associated boundson detection error probabilities, and the detection error probabilities them-selves. The three models are: (1) a generic one where the worst-case driftdistortion g is independent of x; (2) our robust interpretation of a model ofBansal and Yaron (2004) in which g is again independent of x but whereits magnitude depends on whether a low frequency component of growth ispresent; and (3) a continuous time version of HST’s equilibrium permanentincome model, in which g depends on x. Very similar relations betweendetection error probabilities and market prices of risk emerge across thesethree models, though they are associated with different values of θ.

Links among sample size, detection probabilities, andmpu when g is independent of x

mpu Chernoff rate probability probability(= |g|) (α = .5) bound.02 .0001 .495 .444.04 .0002 .480 .389.06 .0004 .457 .336.08 .0008 .426 .286.10 .0013 .389 .240.12 .0018 .349 .198.14 .0015 .306 .161.16 .0032 .264 .129.18 .0040 .222 .102.20 .0050 .184 .079.30 .0113 .053 .017.40 .0200 .009 .002

Table 4.2: Prices of Model Uncertainty and Detection-Error Probabilitieswhen g is independent of x, N = 200; mpu denotes the market price ofmodel uncertainty, measured by |g|. The Chernoff rate is given by (4.49).

In this section we assume that the approximating model is a diffusionand that the worst case model is such that g is independent of the Markov

Page 141: Maskin ambiguity book mock 2 - New York University

4.9. Entropy and the market price of uncertainty 133

state x. Without knowing anything more about the model, we can quan-titatively explore the links among the market price of model uncertainty(mpu ≡ |g|) and the detection error probabilities.

For N = 200, Table 4.2 reports values of mpu = |g| together withChernoff entropy for α = .5, the associated probability-error bound (4.48),and the actual probability of detection on the left side of (4.48) (which wecan calculate analytically in this case). The probability bounds and theprobabilities are computed under the simplifying assumption that the driftand diffusion coefficients are constant, as in the example in section 4.8. Withconstant drift and diffusion coefficients, the log-likelihood ratio is normallydistributed, which allows us easily to compute the actual detection-errorprobabilities.29

The numbers in Table 4.2 indicate that market prices of uncertaintysomewhat less than .2 are associated with misspecified models that are dif-ficult to detect. However, market prices of uncertainty of .40 are associatedwith easily detectable alternative models. The table also reveals that al-though the probability bounds are weak, they display patterns similar tothose of the actual probabilities.

Empirical estimates of the slope of the mean-standard deviation fron-tier are about .25 for quarterly data. Given the absence of volatility inaggregate consumption, risk considerations only explain a small componentof the measured risk-return tradeoff using aggregate data (Hansen and Ja-gannathan 1991). In contrast, our calculations suggest that concerns aboutstatistically small amounts of model misspecification could account for asubstantial component of the empirical estimates. The following subsec-tions confirm that this quantitative conclusion transcends details of thespecification of the approximating model.

Low frequency growth

Using a recursive utility specification, Bansal and Yaron (2004) study howlow frequency components in consumption and/or dividends become en-

29Thinking of a quarter as the unit of time, we took the sample interval to be 200.Alternatively, we might have used a sample interval of 600 to link to monthly postwardata. The market prices of risk and model uncertainty are associated with specific timeunit normalizations. Since, at least locally, drift coefficients and diffusion matrices scalelinearly with the time unit, the market prices of risk and model uncertainty scale withthe square root of the time unit.

Page 142: Maskin ambiguity book mock 2 - New York University

134 Chapter 4. Quartet of Semigroups

coded in risk premia. Here we take up Bansal and Yaron’s theme that thefrequency decomposition of risks matters but reinterpret their risk premiaas reflecting model uncertainty.

Consider a pure endowment economy where the state x1t driving theconsumption endowment exp(x1t) is governed by the following process:

dx1t = (.0020 + .0177x2t)dt+ .0048dBt

dx2t = −.0263x2tdt+ .0312dBt. (4.50)

The logarithm of the consumption endowment has a constant component ofthe growth rate of .002 and a time varying component of the growth rate ofx2t; x2t has mean zero and is stationary but is highly temporally dependent.Relative to the i.i.d. specification that would be obtained by attaching acoefficient of zero on x2t in the first equation of (4.50), the inclusion of thex2t component alters the long run properties of consumption growth.

We calibrated the state evolution equation (4.50) by taking a discrete-time consumption process that was fit by Bansal and Yaron and embeddingit in a continuous state-space model.30 We accomplished this by sequentallyapplying the conversions tf, d2c, and ss in the MATLAB control toolbox.

The impulse response of log consumption to a Brownian motion shock,which is portrayed in Figure 4.2, and the spectral density function, whichis shown in Figure 4.3, both show the persistence in consumption growth.The low frequency component is a key feature in the analysis of Bansal andYaron (2004). The impulse response function converges to its supremumfrom below. It takes about ten years before the impulse response functionis close to its supremum. Corresponding to this behavior of the impulseresponse function, there is a narrow peak in the spectral density of con-sumption growth at frequency zero, with the spectral density being muchgreater than its infimum only for frequencies with periods of more than tenyears.

In what follows, we will also compute model uncertainty premia for analternative economy in which the coefficient on x2t in the first equation of(4.50) is set to zero. We calibrate this economy so that the resulting spec-tral density for consumption growth is flat at the same level as the infimum

30We base our calculations on an ARMA model for consumption growth, namely,log ct − log ct−1 = .002 + [(1 − .860L)/(1 − .974L)](.0048)νt reported by Bansal andYaron (2004) where {νt : t ≥ 0} is a serially uncorrelated shock with mean zero and unitvariance.

Page 143: Maskin ambiguity book mock 2 - New York University

4.9. Entropy and the market price of uncertainty 135

0 20 40 60 80 100 120 140 160 1800

0.005

0.01

0.015

0.02

0.025

0.03

Time(months)

Imp

uls

e R

esp

on

se

Figure 4.2: Impulse response function for the logarithm of the consumptionendowment x1t to a Brownian increment.

0 0.5 1 1.5 2 2.5 30

2

4

6

8x 10

−4

Frequency (rad/month)

Spectr

al D

ensity

Figure 4.3: Spectral density function for the consumption growth rate dx1t.

Page 144: Maskin ambiguity book mock 2 - New York University

136 Chapter 4. Quartet of Semigroups

depicted in Figure 4.3 and so that the corresponding impulse response func-tion is also flat at the same initial response reported in Figure 4.2. Becausewe have reduced the long run variation by eliminating x2t from the firstequation, we should expect to see smaller risk-premia in this economy.

Suppose that the instantaneous utility function is logarithmic. Thenthe value function implied by Theorem 4.5.1 is linear in x, so we can writeV as V (x) = v0 + v1x1 + v2x2. The distortion in the Brownian motion isgiven by the following special case of equation (4.28)

g = −1

θ(.0048v1 + .0312v2)

and is independent of the state x. The coefficients on the vi's are the volatility coefficients attached to dB in equation (4.50).31
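As a sketch of where the vi come from (a step the text leaves implicit): with log utility, U = x1, and matching linear coefficients in the Hamilton-Jacobi-Bellman equation of Theorem 4.5.1 while holding δ fixed gives
$$\delta v_1 = 1, \qquad \delta v_2 = .0177\, v_1 - .0263\, v_2, \qquad\text{so}\qquad v_1 = \frac{1}{\delta},\quad v_2 = \frac{.0177}{\delta(\delta + .0263)},$$
and hence $g = -\frac{1}{\theta}\left[\frac{.0048}{\delta} + \frac{(.0312)(.0177)}{\delta(\delta + .0263)}\right]$. In the calculations below, δ itself is adjusted as θ changes so as to hold the mean risk-free rate fixed, which is why the mapping from 1/θ to g reported in Figure 4.4 is computed numerically rather than from this formula alone.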

Since g is constant, the worst-case model implies the same impulse response functions but different mean growth rates in consumption. Larger values of 1/θ lead to more negative values of the drift g. We have to compute this mapping numerically because the value function itself depends on θ. Figure 4.4 reports g as a function of 1/θ (the g's are negative). Larger values of |g| imply larger values of the rate of Chernoff entropy. As in the previous example this rate is constant, and the probability bounds in Table 4.2 continue to apply to this economy.32

The instantaneous risk free rate for this economy is:

$$r^f_t = \delta + .0020 + .0177\, x_{2t} + \sigma_1 g - \frac{(\sigma_1)^2}{2}$$

where σ1 = .0048 is the coefficient on the Brownian motion in the evolution equation for x1t. Our calculations hold the risk-free rate fixed as we change θ. This requires that we adjust the subjective discount rate. The predictability of consumption growth emanating from the state variable x2t makes the risk-free rate vary over time.

31 The state independence implies that we can also interpret these calculations as coming from a decision problem in which |g| is constrained to be less than or equal to the same constant for each date and state. This specification satisfies the dynamic consistency postulate advocated by Epstein and Schneider (2003b).

32 However, the exact probability calculations reported in Table 4.2 will not apply to the present economy.
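A sketch of the discount-rate adjustment just described, treating the model as monthly and treating the 3.5% mean risk-free rate used in Figure 4.4 as an annualized target; both unit conventions are assumptions made for illustration rather than statements from the text.

    # With x2 averaging zero, the mean of r^f is delta + .0020 + sigma1*g - sigma1**2/2,
    # so delta must absorb changes in the uncertainty adjustment sigma1*g as theta varies.
    sigma1 = 0.0048
    target_monthly_rf = 0.035 / 12          # assumed annualization convention

    def delta_for_target(g):
        return target_monthly_rf - 0.0020 - sigma1 * g + sigma1**2 / 2

    print(delta_for_target(0.0), delta_for_target(-0.10))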


Figure 4.4: Drift distortion g for the Brownian motion. This distortion is plotted as a function of 1/θ. For comparison, the drift distortions are also given for the economy in which the first-difference in the logarithm of consumption is i.i.d. In generating this figure, we set the discount parameter δ so that the mean risk-free rate remained at 3.5% when we changed θ. In the model with temporally dependent growth the volatility of the risk-free rate is .38%. The risk-free rate is constant in the model with i.i.d. growth rates. (Horizontal axis: 1/θ; vertical axis: worst-case distortion.)

Risk-sensitivity and calibration of θ

This economy has a risk-sensitive interpretation along the lines asserted by Tallarini (2000a) in his analysis of business cycle models, one that makes 1/θ a risk-sensitivity parameter that imputes risk aversion to the continuation utility and makes g the incremental contribution to risk aversion contributed by this risk adjustment. Under Tallarini's risk-sensitivity interpretation, 1/θ is a risk aversion parameter, and as such is presumably fixed across environments.

Figure 4.4 shows that holding θ fixed but changing the consumption endowment process changes the detection error rates. For a given θ, the implied worst case model could be more easily detected when there is positive persistence in the endowment growth rate. On the robustness interpretation, such detection error calculations suggest that θ should not be taken to be invariant across environments. However, on the risk sensitivity interpretation, θ should presumably be held fixed across environments. Thus while concerns about risk and robustness have similar predictions within a given environment, calibrations of θ can depend on whether it is interpreted as a risk-sensitivity parameter that is fixed across environments, or a robustness parameter to be adjusted across environments depending on how detection error probabilities differ across environments.


Figure 4.5: This figure gives the impulse response functions for the two income processes to two independent Brownian motion shocks. (Horizontal axis: time in quarters; vertical axis: impulse response.)


Permanent income economy

Our previous two examples make Chernoff entropy be independent of the Markov state. In our third example, we computed detection-error bounds and detection-error probabilities for the version of HST's robust permanent income model that includes habits.33 The Chernoff entropies are state dependent for this model, but the probability bounds can still be computed numerically. We used the parameter values from HST's discrete-time model to form an approximate continuous-time robust permanent income model, again using conversions in the MATLAB control toolbox. HST allowed for two independent income components when estimating their model. The impulse responses for the continuous-time version of the model are depicted in Figure 4.5. The responses are to independent Brownian motion shocks. One of the processes is much more persistent than the other one. That persistent process is the one that challenges a permanent-income-style saver.

33 HST estimated versions of this model with and without habits.



When we change the robustness parameter θ, we alter the subjective discount rate in a way that completely offsets the precautionary motive for saving in HST's economy and its continuous-time counterpart, so that consumption and investment profiles and real interest rates remain fixed.34 It happens that the worst case g vector is proportional to the marginal utility of consumption and therefore is highly persistent. This outcome reflects that the decision rule for the permanent income model is well designed to protect the consumer against transient fluctuations, but that it is vulnerable to model misspecifications that are highly persistent under the approximating model. Under the approximating model, the marginal utility process is a martingale, but the (constrained) worst case model makes this process become an explosive scalar autoregression. The choice of θ determines the magnitude of the explosive autoregressive coefficient for the marginal utility process. The distortion is concentrated primarily in the persistent income component. Figure 4.6 compares the impulse response of the distorted income process to that of the income process under the approximating model. Under the distorted model there is considerably more long-run variation. Decreasing θ increases this variation.

Table 4.3 gives the detection-error probabilities corresponding to those reported in Table 4.2. In this case, the Chernoff entropy is state dependent, leading us to compute the right side of (4.48) numerically in Table 4.3.35

The values of θ are set so that the market prices of uncertainty match those in Table 4.2.

34 See HST and HSW for a proof in a discrete time setting of an observational equivalence proposition that identifies a locus of (δ, θ) pairs that are observationally equivalent for equilibrium quantities.

35 See Hansen and Sargent (2008b) for computational details. In computing the probability bounds, we chose the following initial conditions: we set the initial marginal utility of consumption to 15.75 and the mean zero components of the income processes to their unconditional means of zero.


Figure 4.6: This figure gives impulse response functions for the persistent income process under the approximating model and the model for which mpu = .16. The more enduring response is from the distorted model. The implied risk-free rate is constant and identical across the two models. (Horizontal axis: time in quarters; vertical axis: impulse response.)


Despite the different structures of the two models, for mpus up to about .12, the results in Table 4.3 are close to those of Table 4.2, though the detection-error probabilities are a little bit lower in Table 4.3. However, the probabilities do decay faster in the tails for the permanent income model. As noted, the model in Table 4.2 has no dependence of g on the state x. As shown by HST, the fluctuations in |g| are quite small relative to its level, which contributes to the similarity of the results for low mpus. It remains the case that statistical detection is difficult for market prices of uncertainty up to about half of the empirically estimated magnitude.

We conclude that a preference for robustness that is calibrated to plausible values of the detection error probabilities can account for a substantial fraction, but not all, of estimated equity premia, and that this fraction is impervious to details of the model specification. Introducing other features into a model can help to account more fully for the steep slope of the frontier, but they would have to work through the ordinary risk component and not the robustness component g. For example, market frictions or alterations to preferences affect that risk component.36


mpu (= |g|)    1/θ      probability bound    simulated probabilities
.02            1.76     .495                 .446
.04            3.51     .479                 .388
.06            5.27     .453                 .334
.08            7.02     .416                 .282
.10            8.78     .372                 .231
.12           10.53     .323                 .185
.14           12.29     .271                 .143
.16           14.04     .220                 .099
.18           15.80     .171                 .072
.20           17.56     .128                 .054
.30           26.33     .015                 .004
.40           35.11     .0004                .000

Table 4.3: Prices of Model Uncertainty and Detection-Error Probabilities for the Permanent Income Model, N = 200. The probability bounds were computed numerically and optimized over the choice of α. The simulated probabilities were computed by simulating discrete approximations with a time interval of length .01 of a quarter and with 1000 replications.
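The table's caption describes simulated detection-error probabilities. The sketch below shows, schematically, one way such a simulation can be organized for a worst-case drift distortion; the drift and distortion functions passed in the example are hypothetical placeholders rather than HST's actual model, and the step sizes mirror the caption only loosely.

    import numpy as np

    def detection_error_prob(mu, g, sigma, x0, N=200.0, dt=0.01, reps=1000, seed=0):
        # Average probability of misclassifying the approximating model (h = 0) and
        # the worst-case model (h = g(x)) from a record of length N, based on the
        # sign of the log likelihood ratio of the worst-case model over the
        # approximating model.
        rng = np.random.default_rng(seed)
        steps = int(N / dt)
        err_approx = err_worst = 0
        for _ in range(reps):
            for worst in (False, True):
                x, llr = x0, 0.0
                for _ in range(steps):
                    dW = np.sqrt(dt) * rng.standard_normal()
                    h = g(x) if worst else 0.0
                    dB = h * dt + dW          # increment of the approximating-model Brownian motion
                    llr += g(x) * dB - 0.5 * g(x) ** 2 * dt
                    x += mu(x) * dt + sigma * dB
                if not worst and llr > 0:
                    err_approx += 1
                if worst and llr < 0:
                    err_worst += 1
        return 0.5 * (err_approx + err_worst) / reps

    # Hypothetical scalar illustration (not HST's model): mean-reverting state,
    # constant worst-case drift distortion.
    print(detection_error_prob(mu=lambda x: -0.1 * x, g=lambda x: -0.16,
                               sigma=1.0, x0=0.0, reps=100))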


4.10 Concluding remarks

In this paper we have applied tools from the mathematical theory of continuous-time Markov processes to explore the connections between model misspecification, risk, robustness, and model detection. We used these tools in conjunction with a decision theory that captures the idea that the decision maker views his model as an approximation. An outcome of the decision making process is a constrained worst-case model that the decision maker uses to create a robust decision rule. In an equilibrium of a representative agent economy with such robust decision makers, the worst case model of a fictitious robust planner coincides with the worst case model of a representative private agent. These worst case models are endogenous outcomes of decision problems and give rise to model uncertainty premia in security markets. By adjusting constraints or penalties, the worst case model can be designed to be close to the approximating model in the sense that it is difficult statistically to discriminate it from the original approximating model. These same penalties or constraints limit model uncertainty premia. Since mean returns are hard to estimate, nontrivial departures from an approximating model are difficult to detect. As a consequence, within limits imposed by statistical detection, there can still be sizable model uncertainty premia in security prices.

36 We find it fruitful to explore concern about model uncertainty because these other model modifications are themselves only partially successful. To account fully for the market price of risk, Campbell and Cochrane (1999) adopt specifications with substantial risk aversion during recessions. Constantinides and Duffie (1996) accommodate fully the high market prices of risk by attributing empirically implausible consumption volatility to individual consumers (see Cochrane (1997)). Finally, Heaton and Lucas (1996) show that reasonable amounts of proportional transaction costs can explain only about half of the equity premium puzzle.



There is another, maybe less radical, interpretation of our calculations. A rational expectations econometrician is compelled to specify forcing processes, often without much guidance except from statistics, for example, taking the form of a least squares estimate of an historical vector autoregression. The exploration of worst case models can be thought of as suggesting an alternative way of specifying forcing process dynamics that is inherently more pessimistic, but that leads to statistically similar processes. Not surprisingly, changing forcing processes can change equilibrium outcomes. More interesting is the fact that sometimes subtle changes in perceived forcing processes can have quantitatively important consequences on equilibrium prices.

Our altered decision problem introduces a robustness parameter that we restrict by using detection probabilities. In assessing reasonable amounts of risk aversion in stochastic, general equilibrium models, it is common to explore preferences for hypothetical gambles in the manner of Pratt (1964). In assessing reasonable preferences for robustness, we propose using large sample detection probabilities for a hypothetical model selection problem. We envision a decision-maker as choosing to be robust to departures that are difficult to detect statistically. Of course, using detection probabilities in this way is only one way to discipline robustness. Exploration of other consequences of being robust, such as utility losses, would also be interesting.


We see three important extensions to our current investigation. Like builders of rational expectations models, we have side-stepped the issue of how decision-makers select an approximating model. Following the literature on robust control, we envision this approximating model to be analytically tractable, yet to be regarded by the decision maker as not providing a correct model of the evolution of the state vector. The misspecifications we have in mind are small in a statistical sense but can otherwise be quite diverse. Just as we have not formally modelled how agents learned the approximating model, neither have we formally justified why they do not bother to learn about potentially complicated misspecifications of that model. Incorporating forms of learning would be an important extension of our work.

The equilibrium calculations in our model currently exploit the representative agent paradigm in an essential way. Reinterpreting our calculations as applying to a multiple agent setting is straightforward in some circumstances (see Tallarini 2000a), but in general, even with complete security market structures, multiple agent versions of our model look fundamentally different from their single-agent counterparts (see Anderson 1998). Thus, the introduction of heterogeneous decision makers could lead to new insights about the impact of concerns about robustness for market outcomes.

Finally, while the examples of section 4.9 all were based on diffusions, the effects of concern about robustness are likely to be particularly important in environments with large shocks that occur infrequently, so that we anticipate that our modelling of robustness in the presence of jump components will be useful.


Appendix 4.A Proof of Theorem 4.5.1

Proof. To verify the conjectured solution, first note that the objective is additively separable in g and h. Moreover the objective for the quadratic portion in g is:
$$\frac{\theta\, g'g}{2} + g'\Lambda'\frac{\partial V}{\partial x}. \tag{4.51}$$
Minimizing this component of the objective by choice of g verifies the conjectured solution. The diffusion contribution to the optimized objective including (4.51) is:

$$G_d(V) - \frac{1}{2\theta}\left(\frac{\partial V}{\partial x}\right)'\Sigma\left(\frac{\partial V}{\partial x}\right) = \frac{-\theta\, G_d\!\left[\exp(-V/\theta)\right]}{\exp(-V/\theta)},$$
where we are using the additive decomposition G = G_d + G_n. Very similar reasoning justifies the diffusion contribution to entropy formula (4.22c).
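For readers who want to verify the displayed identity (a step the proof leaves implicit), write the diffusion contribution to the generator as $G_d\phi = \mu_x\cdot\frac{\partial\phi}{\partial x} + \frac{1}{2}\mathrm{trace}\big(\Sigma\,\frac{\partial^2\phi}{\partial x\,\partial x'}\big)$, where $\mu_x$ is shorthand for the state drift and $\Sigma = \Lambda\Lambda'$. Then
$$G_d\!\left[\exp(-V/\theta)\right] = \exp(-V/\theta)\left[-\frac{1}{\theta}\,\mu_x\cdot\frac{\partial V}{\partial x} - \frac{1}{2\theta}\,\mathrm{trace}\!\left(\Sigma\,\frac{\partial^2 V}{\partial x\,\partial x'}\right) + \frac{1}{2\theta^2}\left(\frac{\partial V}{\partial x}\right)'\Sigma\left(\frac{\partial V}{\partial x}\right)\right],$$
and multiplying by $-\theta/\exp(-V/\theta)$ reproduces $G_d(V) - \frac{1}{2\theta}(\partial V/\partial x)'\Sigma(\partial V/\partial x)$, as claimed.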

Consider next the component of the objective that depends on h:

$$\theta\int\left[1 - h(y, x) + h(y, x)\log h(y, x)\right]\eta(dy|x) + \int h(y, x)\left[V(y) - V(x)\right]\eta(dy|x). \tag{4.52}$$
To verify37 that this objective is minimized by $\hat h$, first use the fact that $1 - h + h\log h$ is convex and hence dominates the tangent line at $\hat h$. Thus
$$1 - h + h\log h \;\ge\; 1 - \hat h + \hat h\log\hat h + \log\hat h\,(h - \hat h).$$
This inequality continues to hold when we multiply by θ and integrate with respect to η(dy|x). Thus
$$\theta\,\epsilon(h)(x) - \theta\int h(y, x)\log\hat h(y, x)\,\eta(dy|x) \;\ge\; \theta\,\epsilon(\hat h)(x) - \theta\int \hat h(y, x)\log\hat h(y, x)\,\eta(dy|x).$$

37 In the special case in which the number of states is finite and the probability of jumping to any of these states is strictly positive, a direct proof that $\hat h$ is the minimizer is available. Abusing notation somewhat, let $\eta(y_i|x) > 0$ denote the probability that the state jumps to its ith possible value $y_i$ given that the current state is x. We can write the component of the objective that depends upon h (which is equivalent to equation (4.52) in this special case) as
$$\theta + \sum_{i=1}^{n}\eta(y_i|x)\left[-\theta h(y_i, x) + \theta h(y_i, x)\log h(y_i, x) + h(y_i, x)V(y_i) - h(y_i, x)V(x)\right].$$
Differentiating this expression with respect to $h(y_i, x)$ yields $\eta(y_i|x)\left[\theta\log h(y_i, x) + V(y_i) - V(x)\right]$. Setting this derivative to zero and solving for $h(y_i, x)$ yields $\hat h(y_i, x) = \exp\left[\frac{V(x) - V(y_i)}{\theta}\right]$, which is the formula for $\hat h$ given in the text.


Substituting for $\log[\hat h(y, x)]$ shows that $\hat h$ minimizes (4.52), and the resulting objective is:
$$\theta\int\left[1 - \hat h(y, x)\right]\eta(dy|x) = \frac{-\theta\, G_n \exp(-V/\theta)}{\exp(-V/\theta)},$$
which establishes the jump contribution to (4.22b). Very similar reasoning justifies the jump contribution to (4.22c).


Chapter 5

Robust Control and Model Uncertainty

5.1 Introduction

This paper describes links between the max-min expected utility theory of Gilboa and Schmeidler (1989) and the applications of robust control theory proposed by Anderson et al. (2000) and Dupuis et al. (1998). The max-min expected utility theory represents uncertainty aversion with preference orderings over decisions c and states x, for example, of the form
$$\inf_{Q\in\mathcal Q} E_Q\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right] \tag{5.1}$$
where $\mathcal Q$ is a set of measures over c, x, and δ a discount rate. Gilboa and Schmeidler's theory leaves open how to specify the set $\mathcal Q$ in particular applications.1

Criteria like (5.1) also appear as objective functions in robust control theory. Robust control theory specifies $\mathcal Q$ by taking a single ‘approximating model’ and statistically perturbing it; $\mathcal Q$ is typically parameterized only implicitly, through a positive penalty variable θ. This paper describes how to transform that ‘penalty problem’ into a closely related ‘constraint problem’ like (5.1). These two formulations differ in subtle ways but are connected via the Lagrange multiplier theorem. The implicit preference orderings differ but imply the same decisions. Both preferences are recursive, and therefore both are time consistent. However, time consistency for the constraint specification requires that we introduce a new endogenous state variable to restrict how probability distortions are reconsidered at future dates. To facilitate comparisons to Anderson et al. (2000) and Chen and Epstein (2002), we cast our discussion within continuous-time diffusion models.

1 This paper summarizes detailed arguments in Hansen et al. (2006b).



5.2 A Benchmark Resource Allocation Problem

We first pose a discounted, infinite time optimal resource allocation problem without regard to robustness. Let {Bt : t ≥ 0} denote a d-dimensional, standard Brownian motion on an underlying probability space (Ω, F, P). Let {Ft : t ≥ 0} denote the completion of the filtration generated by this Brownian motion. The actions of the decision-maker form a stochastic process {ct : t ≥ 0} that is progressively measurable. Let U denote an instantaneous utility function, and write the discounted objective as
$$\sup_{c\in C} E\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right]$$
subject to:

$$dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,dB_t \tag{5.2}$$

where x0 is a given initial condition and C is a set of admissible control processes. We use P to denote the stochastic process for xt generated by (5.2). Equation (5.2) will be the ‘approximating model’ of later sections, to which all other models in Q are perturbations.

We restrict μ and σ so that any progressively measurable control c in C implies a progressively measurable state vector process x. We assume throughout that the objective for the control problem without reference to robustness has a finite upper bound.

5.3 Model Misspecification

The decision maker treats (5.2) as an approximation by taking into account a class of alternative models that are statistically difficult to distinguish from (5.2). To construct a perturbed model we replace $B_t$ in (5.2) by $\hat B_t + \int_0^t h_s\,ds$, where h is progressively measurable and $\{\hat B_t\}$ is a Brownian motion. Then we can write the distorted stochastic evolution in continuous time as $dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)(h_t\,dt + d\hat B_t)$ under the Brownian motion probability specification.

Changes in Measure

The process h is used as a device to transform the probability distribution P on (Ω, F) into a new distribution Q that is absolutely continuous with respect to P. An absolutely continuous change in measure for a stochastic process can be represented in terms of a nonnegative martingale. Let Q denote a probability distribution that is absolutely continuous with respect to P. Associated with Q is a family of expectation operators applied to random variables that are Ft measurable for each t. Thus we can write $E_Q g_t = E_P g_t q_t$ for any bounded $g_t$ that is Ft measurable and some nonnegative random variable $q_t$ that is Ft measurable. The random variable $q_t$ is called a Radon-Nikodym derivative. In our setting, we use the Girsanov Theorem to depict $q_t$ as
$$q_t = \exp\left[\int_0^t h_\tau\cdot dB_\tau - \int_0^t \frac{|h_\tau|^2}{2}\,d\tau\right].$$
We use this representation to justify our use of h to parameterize absolutely continuous changes of measure.2 When h is zero we revert to the benchmark control problem.
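As an illustrative check of this representation (not part of the text's argument), the sketch below simulates $q_t$ for a constant drift distortion h under the approximating model and confirms that its sample average stays near one, as the martingale property requires; the parameter values are arbitrary.

    import numpy as np

    # Simulate q_t = exp( int h.dB - int |h|^2/2 dt ) for constant h under P
    # and check that E_P[q_t] is approximately 1.
    rng = np.random.default_rng(1)
    h, dt, T, reps = 0.2, 0.01, 5.0, 10000
    steps = int(T / dt)
    dB = np.sqrt(dt) * rng.standard_normal((reps, steps))
    log_q = (h * dB - 0.5 * h**2 * dt).sum(axis=1)
    print(np.exp(log_q).mean())   # close to 1.0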

Relative Entropy of a Stochastic Process

Consider a scalar stochastic process {gt} that is progressively measurable. This process is a random variable on a product space. Form Ω∗ = Ω × R+ where R+ is the nonnegative real line; form the corresponding sigma algebra F∗ as the smallest sigma algebra containing Ft ⊗ Bt for any t, where Bt is the collection of Borel sets in [0, t]; and form P∗ as the product measure P × M, where M is exponentially distributed with density δ exp(−δt). We let E∗ denote the expectation operator on the product space. The E∗ expectation of the stochastic process {gt} is by construction
$$E^*(g) = \delta\int_0^\infty \exp(-\delta t)\,E(g_t)\,dt.$$
We extend this construction by using the probability measure Q. Form Q∗ = Q × M. The process {qt} is a Radon-Nikodym derivative for Q∗ with respect to P∗:
$$E^*_Q(g) = \delta\int_0^\infty \exp(-\delta t)\,E(q_t g_t)\,dt.$$
The Q∗ can be used to evaluate discounted expected utility under an absolutely continuous change in measure.

2 Perturbations that are not absolutely continuous are easy to detect statistically, which is the reason that Anderson et al. (2000) impose absolute continuity on the perturbations.

Page 158: Maskin ambiguity book mock 2 - New York University

150 Chapter 5. Robust Control and Model Uncertainty


We measure the discrepancy between the distributions of P and Q as the relative entropy between Q∗ and P∗:
$$R(Q) = \delta\int_0^\infty \exp(-\delta t)\,E_Q\left(\log q_t\right)dt = \int_0^\infty \exp(-\delta\tau)\,E_Q\!\left(\frac{|h_\tau|^2}{2}\right)d\tau.$$
Relative entropy is convex in the measure Q∗ (e.g., see Dupuis and Ellis (1997)). Relative entropy is nonnegative and zero only when the probability distributions P∗ and Q∗ agree. This is true only when the process h is zero.
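A one-line example, added here to fix units: for a constant drift distortion h, $E_Q(\log q_t) = t|h|^2/2$, so
$$R(Q) = \delta\int_0^\infty \exp(-\delta t)\,\frac{t\,|h|^2}{2}\,dt = \frac{|h|^2}{2\delta},$$
which is also what the second expression for R(Q) delivers directly.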

5.4 Two Robust Control Problems

We study the relationship between two robust control problems. Let $E_Q$ denote the mathematical expectation taken with respect to the stochastic process $\{\hat B_t : t \ge 0\}$, where $dB_t = d\hat B_t + h_t\,dt$; $\{B_t : t \ge 0\}$ is a Brownian motion under P and $\{\hat B_t : t \ge 0\}$ is a Brownian motion under Q. Thus we parameterize Q by the choice of drift distortion $\{h_t\}$, and use the state evolution equation:
$$dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,d\hat B_t. \tag{5.3}$$

We define two control problems. A multiplier robust control problem is
$$\sup_{c\in C}\,\inf_{Q}\; E_Q\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right] + \theta R(Q)$$
subject to (5.3). A constraint robust control problem is
$$\sup_{c\in C}\,\inf_{Q}\; E_Q\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right]$$
subject to (5.3) and R(Q) ≤ η. Note that R(Q) ≤ η is a single intertemporal constraint on the entire path of distortions h.

These two problems are closely related. We can interpret the robustness parameter θ in the first problem as an implied Lagrange multiplier on the specification-error constraint R(Q) ≤ η.3 Use θ to index a family of multiplier robust control problems and η to index a family of constraint robust control problems. Because not all values of θ are admissible, we consider only nonnegative values of θ for which it is feasible to make the objective function greater than −∞. Call the closure of this set Θ. In Hansen et al. (2006b) we provide assumptions and a proof for:

3 This connection has been explored informally in Hansen et al. (1999) and formally in Hansen and Sargent (2008c) in the context of linear-quadratic control problems. We mimic arguments in Petersen et al. (2000a) and Luenberger (1969).

Page 159: Maskin ambiguity book mock 2 - New York University

5.5. Recursivity of the Multiplier Formulation 151

Claim 5.4.1. Suppose that for η = η∗, c∗ and Q∗ solve the constraint robust control problem. There exists a θ∗ ∈ Θ such that the multiplier and constraint robust control problems have the same solution.

To construct the multiplier, let J(c, η) satisfy
$$J(c, \eta) = \inf_{Q} E_Q\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right],$$
subject to R(Q) ≤ η, and let J∗(η) = sup_{c∈C} J(c, η). As argued by Luenberger (1969), J(c, η) is decreasing and convex in η. These same properties carry over to the optimized (over c) function J∗. Given η∗, we let θ∗ be the negative of the slope of the subgradient of J∗ at η∗, i.e., θ∗ is the absolute value of the slope of a line tangent to J∗ at η∗.

Hansen et al. (2006b) also establish:

Claim 5.4.2. Suppose J∗ is strictly decreasing, θ∗ is in the interior of Θ, and that there exists a solution c∗ and Q∗ to the multiplier robust control problem. Then that c∗ also solves the constraint robust control problem for η = η∗ = R(Q∗).

Claims 5.4.1 and 5.4.2 are observational equivalence results because they describe how the multiplier and constraint robust control problems give rise to the same decisions. By adapting arguments in Hansen and Sargent (1995b) and Anderson et al. (2000), it can be shown that the multiplier robust control problem has the same solution as a recursive risk-sensitive control problem, where −θ−1 is the risk-sensitivity parameter.4 Claims 5.4.1 and 5.4.2 thus link a risk-sensitive control problem to the constraint robust control problem.

5.5 Recursivity of the Multiplier Formulation

The multiplier robust control problem can be represented as
$$\sup_{c}\,\inf_{h}\; E\int_0^\infty \exp(-\delta t)\left[U(c_t, x_t) + \frac{\theta}{2}(h_t\cdot h_t)\right]dt$$
subject to $dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)(h_t\,dt + d\hat B_t)$.

We can view h as a second control process in a two-player zero-sum game.

4 Risk-sensitive control theory makes decision rules more responsive to risk by making an exponential adjustment to the objective of the decision-maker in the same way used by Epstein and Zin (1989a) and Duffie and Epstein (1992). Hansen and Sargent (1995b) and Anderson et al. (2000) show how risk-sensitive control theory can be motivated through recursive utility theory.

Page 160: Maskin ambiguity book mock 2 - New York University

152 Chapter 5. Robust Control and Model Uncertainty

Given h we can fix the distribution for $\hat B$ as a multivariate standard Brownian motion. Then there is a single probability distribution in play and we use the notation E to denote the associated expectation operator. Fleming and Souganidis (1989) tell how a Bellman-Isaacs condition justifies a recursive solution by relating a solution to a date zero commitment game to a Markov perfect game in which the decision rules of both agents are functions of the state vector xt. The Bellman-Isaacs condition is:

Assumption 5.5.1. There exists a value function V such that:

$$\begin{aligned}
\delta V &= \max_{c}\,\min_{h}\; U(c, x) + \frac{\theta}{2}\,h\cdot h + \left[\mu(c, x) + \sigma(c, x)h\right]\cdot\frac{\partial V(x)}{\partial x} + \frac{1}{2}\,\mathrm{trace}\!\left[\sigma(c, x)'\,\frac{\partial^2 V(x)}{\partial x\,\partial x'}\,\sigma(c, x)\right]\\
&= \min_{h}\,\max_{c}\; U(c, x) + \frac{\theta}{2}\,h\cdot h + \left[\mu(c, x) + \sigma(c, x)h\right]\cdot\frac{\partial V(x)}{\partial x} + \frac{1}{2}\,\mathrm{trace}\!\left[\sigma(c, x)'\,\frac{\partial^2 V(x)}{\partial x\,\partial x'}\,\sigma(c, x)\right].
\end{aligned}$$

The Bellman-Isaacs condition defines a Bellman equation for a two-player zero-sum game in which both players decide at time 0 or recursively. The associated decision rules for c and h also solve our two robust control problems.
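To connect this with the risk-sensitive interpretation in footnote 4 (a step left implicit above), note that the inner minimization over h is an unconstrained quadratic problem. Its solution and the concentrated Bellman equation are, under the assumptions of Assumption 5.5.1,
$$h^* = -\frac{1}{\theta}\,\sigma(c, x)'\,\frac{\partial V(x)}{\partial x},$$
$$\delta V = \max_c\; U(c, x) + \mu(c, x)\cdot\frac{\partial V(x)}{\partial x} + \frac{1}{2}\,\mathrm{trace}\!\left[\sigma(c, x)'\,\frac{\partial^2 V(x)}{\partial x\,\partial x'}\,\sigma(c, x)\right] - \frac{1}{2\theta}\left(\frac{\partial V(x)}{\partial x}\right)'\sigma(c, x)\sigma(c, x)'\left(\frac{\partial V(x)}{\partial x}\right),$$
which is the Hamilton-Jacobi-Bellman equation of a risk-sensitive control problem with risk-sensitivity parameter −θ−1.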

5.6 Two Preference Orderings

While the Lagrange multiplier theorem links the two robust control problems, the implied preference orders differ. But they are related at the common solution to both problems, where their indifference curves are tangent.

Preference Orderings

To construct two preference orderings, we assume an endogenous state vector st:

dst = μs(st, ct)dt. (5.4)

where this differential equation can be solved uniquely for st given s0 and process {cs : 0 ≤ s < t}. We assume that the solution is a progressively measurable process {st : t ≥ 0}. We think of st as an endogenous component of the state vector xt. We can use st to make preferences nonseparable over time as in models with habit persistence. We use the felicity function u(st, ct) to represent preferences that are additively separable in (st, ct).

We define preference orders for times τ ≥ 0 in terms of two functions, Dτ(c, sτ) and Rτ(Q). First, define
$$D_\tau(c, s_\tau) = \int_0^\infty \exp(-\delta t)\,u(s_{t+\tau}, c_{t+\tau})\,dt,$$
where sτ is the date τ initial condition for differential equation (5.4). The impact of consumption between dates 0 and τ is captured by the state variable sτ.

Next, define a time τ model discrepancy measure
$$R_\tau(Q) = \delta\int_0^\infty \exp(-\delta t)\,E_Q\left(\log q_{t+\tau} - \log q_\tau \,\middle|\, \mathcal F_\tau\right)dt.$$
The local evolution of Rt(Q) is
$$dR_t(Q) = \left[-\frac{|h_t|^2}{2} + \delta R_t(Q)\right]dt$$
with initial condition R0(Q) = R(Q). We use Dτ(c, sτ) to represent both preference specifications at τ, and use Rτ(Q) to help us represent preferences under the constraint specification.

For fixed θ, we represent the date τ multiplier preferences using the valuation function
$$W_\tau(c; \theta) = \inf_{Q}\, E_Q\left[D_\tau(c, s_\tau)\,\middle|\,\mathcal F_\tau\right] + \theta R_\tau(Q).$$
For a nonnegative rτ that is Fτ measurable, we represent the time τ constraint preferences in terms of the valuation function
$$W_\tau(c; r_\tau) = \inf_{R_\tau(Q)\le r_\tau} E_Q\left[D_\tau(c, s_\tau)\,\middle|\,\mathcal F_\tau\right].$$
For convenience, denote the time 0 version of the constraint specification by W0(c, r0) = W(c, η) and the time 0 version of the multiplier specification by W0(c, θ) = W(c, θ).

We define preference orderings as follows. For any two progressively measurable c and c∗, c∗ ⪰η c if W(c∗; η) ≥ W(c; η). For any two progressively measurable c and c∗, c∗ ⪰θ c if W(c∗; θ) ≥ W(c; θ). We would use analogous definitions for time τ versions of the preference orderings.

Provided that θ > 0, the multiplier preference ordering coincides with a recursive, risk-sensitive preference ordering.5

Relation between the Preference Orders

The two time 0 preference orderings differ. Furthermore, given η, there exists no θ that makes the two preference orderings agree. However, the Lagrange Multiplier Theorem delivers a weaker result that is very useful to us. While globally the preference orderings differ, indifference curves that pass through the solution c∗ to the optimal resource allocation problem are tangent.

5 Under the Brownian motion information structure, these multiplier preferences coincide with a special case of stochastic differential utility studied by Duffie and Epstein (1992).

Page 162: Maskin ambiguity book mock 2 - New York University

154 Chapter 5. Robust Control and Model Uncertainty


Use the Lagrange Multiplier Theorem to write
$$W(c^*; \eta^*) = \max_{\theta}\,\inf_{Q}\; E_Q D(c^*) + \theta\left[R(Q) - \eta^*\right],$$
and let θ∗ denote the maximizing value of θ, which we assume to be strictly positive. Suppose that c∗ ⪰η∗ c. Then W(c; θ∗) − θ∗η∗ ≤ W(c; η∗) ≤ W(c∗; η∗) = W(c∗; θ∗) − θ∗η∗. Thus c∗ ⪰θ∗ c.

The observational equivalence results from claims 5.4.1 and 5.4.2 apply to consumption profile c∗. At this point, the indifference curves are tangent, implying that they are supported by the same prices. Observational equivalence claims made by econometricians typically refer to equilibrium trajectories and not to off-equilibrium aspects of the preference orders.

5.7 Recursivity of the Preference Orderings

To study time consistency, we describe the relation between the time zero and time τ > 0 valuation functions that define preference orders. At date τ, some information has been realized and some consumption has taken place. Our preference orderings focus the attention of the decision-maker on subsequent consumption in states that can be realized given current information. These considerations underlie our use of Dτ and Rτ to depict Wτ(c, θ) and Wτ(c, rτ). The function Dτ reflects a change in vantage point as time passes. Except through sτ, the function Dτ depends only on the consumption process from date τ forward.

In addition, at date τ the decision maker focuses on states that can be realized from date τ forward. Expectations used to average over states are conditioned on date τ information. In this context, while conditioning on time τ information, it would be inappropriate to constrain probabilities using only date zero relative entropy. Imposing a date zero relative entropy constraint at date τ would introduce a temporal inconsistency by letting the minimizing agent put no probability distortions at dates that have already occurred and in states that at date τ cannot be realized. Instead, we make the date τ decision-maker explore only probability distortions that alter his preferences from date τ forward. This leads us to use Rτ as a conditional counterpart to our relative entropy measure.

Our entropy measure has a recursive structure. Date zero relative entropy is easily constructed from the conditional relative entropies in future time periods. We can write:
$$R(Q) = E_Q\left[\int_0^\tau \exp(-\delta t)\,\frac{|h_t|^2}{2}\,dt + \exp(-\delta\tau)\,R_\tau(Q)\right] \tag{5.5}$$

The recursive structure of the multiplier preferences follows from this representation. In effect the date zero valuation function W can be separated by disjoint date τ events and depicted as

$$W(c; \theta) = \inf_{\{h_t:\, 0\le t<\tau\}} E\left(\int_0^\tau \exp(-\delta t)\left[U(c_t, s_t) + \theta\,\frac{|h_t|^2}{2}\right]dt + \exp(-\delta\tau)\,W_\tau(c; \theta)\right)$$
subject to
$$dB_t = d\hat B_t + h_t\,dt \tag{5.6}$$
$$ds_t = \mu_s(s_t, c_t)\,dt. \tag{5.7}$$

The constraint preferences at time τ make the decision-maker explore changes in probability distributions from date τ forward. We also want to exclude the possibility of changing the probabilities of events known in previous dates and of events known not to occur. For the date zero constraint preferences, given c we can find an h process used to construct W(c, η). Associated with this h process, we can compute the time τ conditional relative entropy Rτ(Q). Thus, implicit in the construction of the valuation function W(c, η) is a partition of relative entropy over time and across states as in (5.5). At date τ we ask the decision-maker to explore only changes in beliefs that affect outcomes that can be realized in the future. That is, we impose the constraint Rτ(Q) ≤ rτ for rτ = Rτ(Q), along with fixing ht for 0 ≤ t < τ. Notice that with this constraint imposed, the date zero relative entropy of any newly explored measure cannot exceed R(Q), so that we continue to satisfy our date zero relative entropy constraint. We tie the hands of the date τ decision-maker to inherit how conditional relative entropy is to be allocated across states that are realized at date τ. (Chen and Epstein (2002) avoid this extra hand-tying by imposing separate constraints on h for every date and state.) We can write the valuation function for the constrained problem recursively as
$$W(c, \eta) = \inf_{\{h_t:\, 0\le t<\tau\}} E\int_0^\tau \exp(-\delta t)\,U(c_t, s_t)\,dt + E\left[\exp(-\delta\tau)\,W_\tau(c, r_\tau)\right]$$
subject to (5.6), (5.7) and rτ ≥ 0, where rτ solves
$$dr_t = \left(\delta r_t - \frac{|h_t|^2}{2}\right)dt \quad\text{for } 0\le t<\tau$$
with initial condition r0 = η.
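A small sketch of how the entropy budget rτ is produced by this ordinary differential equation for an assumed path of distortions h; the parameter values are illustrative assumptions, not taken from the text.

    # Allocate the date-zero entropy budget eta across time along a given h path,
    # following dr_t = (delta*r_t - |h_t|^2/2) dt from the text.
    delta, dt, tau, eta = 0.01, 0.01, 25.0, 0.05
    h = lambda t: 0.02            # hypothetical distortion path
    r, t = eta, 0.0
    while t < tau:
        r += (delta * r - 0.5 * h(t)**2) * dt
        t += dt
    print(r)   # continuation entropy r_tau handed to the date-tau decision maker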

Page 164: Maskin ambiguity book mock 2 - New York University

156 Chapter 5. Robust Control and Model Uncertainty

5.8 Concluding Remarks

Empirical work in macroeconomics and finance typically assumes a unique and explicitly specified dynamic statistical model. To use Gilboa and Schmeidler's multiple-model expected utility theory, we have turned to robust control theory for a parsimonious (one parameter) set of alternative models with rich alternative dynamics. Those alternative models come from perturbing the decision maker's approximating model to allow its shocks to feed back on state variables arbitrarily. This allows the approximating model to miss functional forms, the serial correlation of shocks and exogenous variables, and how those exogenous variables impinge on endogenous state variables. Anderson et al. (2000) show how the multiplier parameter in the robust control problems indexes a set of perturbed models that is difficult to distinguish statistically from the approximating model given a sample of T time-series observations.

Page 165: Maskin ambiguity book mock 2 - New York University

Chapter 6

Robust Control and Model Misspecification1


Abstract

A decision maker fears that data are generated by a statistical perturbation of an approximating model that is either a controlled diffusion or a controlled measure over continuous functions of time. A perturbation is constrained in terms of its relative entropy. Several different two-player zero-sum games that yield robust decision rules are related to one another, to the max-min expected utility theory of Gilboa and Schmeidler (1989), and to the recursive risk-sensitivity criterion described in discrete time by Hansen and Sargent (1995b). To represent perturbed models, we use martingales on the probability space associated with the approximating model. Alternative sequential and non-sequential versions of robust control theory imply identical robust decision rules that are dynamically consistent in a useful sense.

Key words: Model uncertainty, entropy, robustness, risk-sensitivity, commitment, time inconsistency, martingale.

1 Coauthored with Gauhar A. Turmuhambetova and Noah Williams. We thank Fernando Alvarez, David Backus, Gary Chamberlain, Ivar Ekeland, Peter Klibanoff, Tomasz Piskorski, Michael Allen Rierson, Aldo Rustichini, Jose Scheinkman, Christopher Sims, Nizar Touzi, and especially Costis Skiadas for valuable comments on an earlier draft. Sherwin Rosen encouraged us to write this paper.


6.1 Introduction

A decision maker consists of (i) a utility function that is maximized subject to (ii) a model. Classical decision and control theory assume that a decision maker has complete confidence in his model. Robust control theory presents alternative formulations of a decision maker who doubts his model. To capture the idea that the decision maker views his model as an approximation, these formulations alter items (i) and (ii) by (1) surrounding the decision maker's approximating model with a cloud of models that are difficult to distinguish with finite data, and (2) adding a malevolent second agent. The malevolent agent promotes robustness by causing the decision maker to explore the fragility of candidate decision rules to departures of the data from the approximating model. Finding a rule that is robust to model misspecification entails computing lower bounds on a rule's performance. The minimizing agent constructs those lower bounds.

Different parts of robust control theory use alternative mathematical formalisms. While all of them have versions of items (1) and (2), they differ in many important mathematical details including the probability spaces on which they are defined; their ways of representing alternative models; their restrictions on sets of alternative models; and their protocols about the timing of choices by the maximizing and minimizing decision makers. Nevertheless, common outcomes and representations emerge from all of these alternative formulations. Equivalent concerns about model misspecification can be represented by either (a) altering the decision maker's preferences to enhance risk-sensitivity, or (b) leaving his preferences alone but slanting his expectations relative to his approximating model in a particular context-specific way, or (c) adding a set of perturbed models and a malevolent agent. This paper exhibits these unifying connections and stresses how they can be exploited in applications.

Robust control theory shares with both the Bayesian paradigm and the rational expectations model the feature that the decision maker brings to the table one fully specified model. In robust control theory it is called either his reference model or his approximating model. Although the decision maker does not explicitly specify alternative models, he evaluates a decision rule under a set of incompletely articulated models that are formed by perturbing his approximating model. Robust control theory contributes thoughtful ways to surround a single approximating model with a cloud of other models. We give technical conditions that allow us to regard that set of models as the multiple priors that appear in the max-min expected utility theory of Gilboa and Schmeidler (1989). Some technical conditions allow us to represent the approximating model and perturbations to it. Other technical conditions reconcile the equilibrium outcomes of several two-player zero-sum games that have different timing protocols, providing a way of interpreting robust control in terms of a recursive version of max-min expected utility theory.

This paper starts with two alternative ways of representing an approximating model in continuous time – either (1) as a diffusion or (2) as a measure over continuous functions of time that are induced by the diffusion. We consider different ways of perturbing each such representation of the approximating model. These lead to alternative formulations of robust control problems. In all of our problems, we use a definition of relative entropy (an expected log likelihood ratio) to constrain the gap between the approximating model and a statistical perturbation to it. We take the maximum value of that gap as a parameter that measures the set of perturbations against which the decision maker seeks robustness. Requiring that entropy be finite restricts the form that model misspecification can take. In particular, finiteness of entropy implies that admissible perturbations of the approximating model must be absolutely continuous with respect to it over finite intervals. For a diffusion, absolute continuity over finite intervals implies that allowable perturbations can alter the drift but not the volatility. Restricting ourselves to perturbations that are absolutely continuous over finite intervals is therefore tantamount to considering perturbed models that are in principle statistically difficult to distinguish from the approximating model, an idea exploited by Anderson et al. (2003) to calibrate a plausible amount of fear of model misspecification in a study of market prices of risk.

The work of Araujo and Sandroni (1999) and Sandroni (2000) emphasizes that absolute continuity of models implies that decision makers' beliefs eventually merge with the model that generates the data. But in infinite horizon economies, absolute continuity over finite intervals does not imply absolute continuity. By allowing perturbations that are not absolutely continuous, we arrest the merging of models and thereby create a setting in which a decision maker's fear of model misspecification endures. Perturbations that are absolutely continuous over finite intervals but still not absolutely continuous can be difficult to detect from a continuous record of finite length, though they could be detected from a continuous data record of infinite length. We discuss how this modeling choice interacts with the way that the decision maker discounts the future.

We also consider a variety of technical issues about timing protocols that underlie interconnections among various expressions of robust control theory. A Bellman-Isaacs condition allows us to exchange orders of minimization and maximization and validates several useful results, including the existence of a Bayesian interpretation of a robust decision rule.

Counterparts to many of the issues treated in this paper occur in discrete time robust control theory. Many of these issues surface in nonstochastic versions of the theory, for example, in Basar and Bernhard (1995). The continuous time stochastic setting of this paper allows sharper analytical results in several cases.

Language

We call a problem nonsequential if, at an initial time 0, a decision maker chooses an entire history-contingent sequence. We call a problem sequential or recursive if, at each time t ≥ 0, a decision maker chooses the time t component of his action process as a function of his time t information.

Organization of paper

The technical nature of interrelated material inspires us to present it in two exposures consisting first of section 6.2, then of the remaining sections. Section 6.2 sets aside a variety of complications and compiles our main results by displaying Hamilton-Jacobi-Bellman (HJB) equations for various games and decision problems and asserting without proof the key relationships among them. The remaining sections lay things out in detail. Section 6.3 sets the stage by describing both sequential and nonsequential versions of an ordinary control problem under a known model. These problems form benchmarks against which to judge subsequent problems in which the decision maker distrusts his model. Section 6.3 also introduces a risk-sensitive control problem that alters the decision maker's objective function but leaves unchallenged his trust in his model. Section 6.4 discusses alternative ways of representing fear of model misspecification. Section 6.5 introduces entropy and its relationship to a concept of absolute continuity over finite intervals, then formulates two nonsequential zero-sum two-player games, called penalty and constraint games, that induce robust decision rules. The games in section 6.5 are both cast in terms of sets of probability measures. In section 6.6, we cast counterparts to these games on a fixed probability measure by representing perturbations to an approximating model in terms of martingales defined on a fixed probability space. Section 6.7 gives a sequential formulation of a penalty game. By taking continuation entropy as an endogenous state variable, section 6.8 gives a sequential formulation of a constraint game. This formulation sets the stage for our discussion in section 6.9 of the dynamic consistency issues raised by Epstein and Schneider (2003b). Section 6.10 concludes. Appendix 6.A presents the cast of characters that records the objects and concepts that occur throughout the paper. Four additional appendixes deliver proofs.

6.2 Overview

One Hamilton-Jacobi-Bellman (HJB) equation is worth a thousand words. This section concisely summarizes our main results by displaying HJB equations for various two-player zero-sum continuous time games that are defined in terms of a Markov diffusion with state x and Brownian motion B, together with the value functions for some related nonsequential games. Our story is encoded in state variables, drifts, and diffusion terms that occur in HJB equations for several optimum problems and dynamic games. This telegraphic section is intended for readers who glean everything from HJB equations and as a summary of key findings. Readers who prefer a more deliberate presentation from the beginning should skip to section 6.3.

Sequential control problems and games

Benchmark control problem:

We take as a benchmark an ordinary control problem with value function
$$J(x_0) = \max_{c\in C}\; E\left[\int_0^\infty \exp(-\delta t)\,U(c_t, x_t)\,dt\right]$$
where the maximization is subject to $dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,dB_t$ and where x0 is a given initial condition. The HJB equation for the benchmark problem is
$$\delta J(x) = \max_{c\in C}\; U(c, x) + \mu(c, x)\cdot J_x(x) + \frac{1}{2}\,\mathrm{trace}\left[\sigma(c, x)'J_{xx}(x)\sigma(c, x)\right]. \tag{6.1}$$

Page 170: Maskin ambiguity book mock 2 - New York University

162 Chapter 6. Robust Control and Model Misspecification

Here c and x without time subscripts denote potentially realized values of the control and the state, and C is the set of admissible values for the control. Subscripts on value functions denote the respective derivatives. We provide more detail about the benchmark problem in section 6.3.

In the benchmark problem, the decision maker trusts his model. We want to study comparable problems where the decision maker distrusts his model. Several superficially different devices can be used to promote robustness to misspecification of the diffusion associated with (6.1). These add either a free parameter θ > 0 or a state variable r ≥ 0 or a state vector X and produce recursive problems with one of the following HJB equations:

Risk sensitive control problem:

$$\delta S(x) = \max_{c\in C}\; U(c, x) + \mu(c, x)\cdot S_x(x) + \frac{1}{2}\,\mathrm{trace}\left[\sigma(c, x)'S_{xx}(x)\sigma(c, x)\right] - \frac{1}{2\theta}\,S_x(x)'\sigma(c, x)\sigma(c, x)'S_x(x) \tag{6.2}$$

HJB equation (6.2) alters the right side of the value function recursion (6.1) by deducting $\frac{1}{2\theta}$ times the local variation of the continuation value. The optimal decision rule for the risk-sensitive problem (6.2) is a policy function
$$c_t = \alpha_c(x_t)$$
where the dependence on θ is understood. In control theory, −1/θ is called the risk-sensitivity parameter; in the recursive utility literature, it is called the variance multiplier. Section 6.3 below provides more details about the risk-sensitive problem.

Penalty robust control problem:

A two-player zero-sum game has a value function M that satisfies

M(x, z) = zV (x)

where zt is another state variable that changes the probability distribution and V satisfies the HJB equation:

$$\delta V(x) = \max_{c\in C}\,\min_{h}\; U(c, x) + \frac{\theta}{2}\,h\cdot h + \left[\mu(c, x) + \sigma(c, x)h\right]\cdot V_x(x) + \frac{1}{2}\,\mathrm{trace}\left[\sigma(c, x)'V_{xx}(x)\sigma(c, x)\right]. \tag{6.3}$$

The process z = {zt : t ≥ 0} is a martingale with initial condition z0 = 1 and evolution dzt = zt ht · dBt. The minimizing agent in (6.3) chooses an h to alter the probability distribution; θ > 0 is a parameter that penalizes the minimizing agent for distorting the drift. Optimizing over h shows that V from (6.3) solves the same partial differential equation (6.2). The penalty robust control problem is discussed in more detail in sections 6.6 and 6.7.

Constraint robust control problem:

A two-player zero-sum game has a value function zK(x, r), where K satisfies the HJB equation
$$\begin{aligned}
\delta K(x, r) = \max_{c\in C}\,\min_{h, g}\;& U(c, x) + \left[\mu(c, x) + \sigma(c, x)h\right]\cdot K_x(x, r) + \left(\delta r - \frac{h\cdot h}{2}\right)K_r(x, r)\\
&+ \frac{1}{2}\,\mathrm{trace}\left(\begin{bmatrix}\sigma(c, x)' & g\end{bmatrix}\begin{bmatrix}K_{xx}(x, r) & K_{xr}(x, r)\\ K_{rx}(x, r) & K_{rr}(x, r)\end{bmatrix}\begin{bmatrix}\sigma(c, x)\\ g'\end{bmatrix}\right). \tag{6.4}
\end{aligned}$$

Equation (6.4) shares with (6.3) that the minimizing agent chooses an h that alters the probability distribution, but unlike (6.3), there is no penalty parameter θ. Instead, in (6.4), the minimizing agent's choice of ht affects a new state variable rt that we call continuation entropy. The minimizing player also controls another decision variable g that determines how increments in the continuation value are related to the underlying Brownian motion. The right side of the HJB equation for the constraint control problem (6.4) is attained by decision rules
$$c_t = \phi_c(x_t, r_t),\qquad h_t = \phi_h(x_t, r_t),\qquad g_t = \phi_g(x_t, r_t).$$
We can solve the equation $\frac{\partial}{\partial r}K(x_t, r_t) = -\theta$ to express rt as a time invariant function of xt:
$$r_t = \phi_r(x_t).$$
Therefore, along an equilibrium path of game (6.4), we have ct = φc[xt, φr(xt)], ht = φh[xt, φr(xt)], gt = φg[xt, φr(xt)]. More detail on the constraint problem is given in section 6.8.

A problem with a Bayesian interpretation:


A single agent optimization problem has a value function zW(x, X) where W satisfies the HJB equation:
$$\begin{aligned}
\delta W(x, X) = \max_{c\in C}\;& U(c, x) + \mu(c, x)\cdot W_x(x, X) + \mu^*(X)\cdot W_X(x, X)\\
&+ \frac{1}{2}\,\mathrm{trace}\left(\begin{bmatrix}\sigma(c, x)' & \sigma^*(X)'\end{bmatrix}\begin{bmatrix}W_{xx}(x, X) & W_{xX}(x, X)\\ W_{Xx}(x, X) & W_{XX}(x, X)\end{bmatrix}\begin{bmatrix}\sigma(c, x)\\ \sigma^*(X)\end{bmatrix}\right)\\
&+ \alpha_h(X)\cdot\sigma(c, x)'W_x(x, X) + \alpha_h(X)\cdot\sigma^*(X)'W_X(x, X) \tag{6.5}
\end{aligned}$$
where μ∗(X) = μ[αc(X), X] and σ∗(X) = σ[αc(X), X]. The function W(x, X) in (6.5) depends on an additional component of the state vector X that is comparable in dimension with x and that is to be initialized from the common value X0 = x0. We shall show in appendix 6.E that equation (6.5) is the HJB equation for an ordinary (i.e., single agent) control problem with discounted objective:

$$z_0 W(x, X) = E\int_0^\infty \exp(-\delta t)\,z_t\,U(c_t, x_t)\,dt$$
and state evolution:
$$\begin{aligned}
dx_t &= \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,dB_t\\
dz_t &= z_t\,\alpha_h(X_t)\cdot dB_t\\
dX_t &= \mu^*(X_t)\,dt + \sigma^*(X_t)\,dB_t
\end{aligned}$$
with z0 = 1, x0 = x, and X0 = X. This problem alters the benchmark control problem by changing the probabilities assigned to the shock process {Bt : t ≥ 0}. It differs from the penalty robust control problem (6.3) because the process z used to change probabilities does not depend on state variables that are endogenous to the control problem.

In appendix 6.E, we verify that under the optimal c and the prescribed choices of μ∗, σ∗, αh, the ‘big X’ component of the state vector equals the ‘little x’ component, provided that X0 = x0. Equation (6.5) is therefore the HJB equation for an ordinary control problem that justifies a robust decision rule under a fixed probability model that differs from the approximating model. As the presence of zt as a preference shock suggests, this problem reinterprets the equilibrium of the two-player zero-sum game portrayed in the penalty robust control problem (6.3). For a given θ that gets embedded in σ∗, μ∗, the right side of the HJB equation (6.5) is attained by c = γc(x, X).


Different ways to attain robustness

Relative to (6.1), HJB equations (6.2), (6.3), (6.4), and (6.5) can all be interpreted as devices that in different ways promote robustness to misspecification of the diffusion. HJB equations (6.2) and (6.5) are for ordinary control problems: only the maximization operator appears on the right side, so that there is no minimizing player to promote robustness. Problem (6.2) promotes robustness by enhancing the maximizing player's sensitivity to risk, while problem (6.5) promotes robustness by attributing to the maximizing player a belief about the state transition law that is distorted in a pessimistic way relative to his approximating model. The HJB equations in (6.3) and (6.4) describe two-player zero-sum dynamic games in which a minimizing player promotes robustness.

Nonsequential problems

We also study two nonsequential two-player zero-sum games that are defined in terms of perturbations q ∈ Q to the measure q0 over continuous functions of time that is induced by the Brownian motion B in the diffusion for x. Let qt be the restriction of q to events measurable with respect to time t histories of observations. We define discounted relative entropy as
$$R(q) \doteq \delta\int_0^\infty \exp(-\delta t)\left(\int \log\left(\frac{dq_t}{dq^0_t}\right)dq_t\right)dt$$
and use it to restrict the size of perturbations q to q0. Leaving the dependence on B implicit, we define a utility process υt(c) = U(ct, xt) and pose the following two problems:

Nonsequential penalty control problem:

$$\bar V(\theta) = \max_{c\in C}\,\min_{q\in Q}\;\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right)dt + \theta R(q). \tag{6.6}$$

Nonsequential constraint control problem:

$$\bar K(\eta) = \max_{c\in C}\,\min_{q\in Q(\eta)}\;\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right)dt \tag{6.7}$$


where $Q(\eta) = \{q \in Q : R(q) \le \eta\}$.

Problem (6.7) fits the max-min expected utility model of Gilboa and Schmeidler (1989), where $Q(\eta)$ is a set of multiple priors. The axiomatic treatment of Gilboa and Schmeidler views this set of priors as an expression of the decision maker's preferences and does not cast them as perturbations of an approximating model.2 We are free to think of problem (6.7) as providing a way to use a single approximating model $q^0$ to generate Gilboa-Schmeidler's set of priors as all those unspecified models that satisfy the restriction on relative entropy, $Q(\eta) = \{q \in Q : R(q) \le \eta\}$. In section 6.5 we provide more detail on the nonsequential problems.

The objective functions for these two nonsequential optimization problems (6.6) and (6.7) are related via the Legendre transform pair:

\[
V(\theta) = \min_{\eta \ge 0} K(\eta) + \theta\eta \tag{6.8}
\]
\[
K(\eta) = \max_{\theta \ge 0} V(\theta) - \eta\theta. \tag{6.9}
\]
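As a purely numerical illustration of the transform pair (6.8)-(6.9), the following sketch assumes a hypothetical convex, decreasing constraint value function $K(\eta) = \exp(-\eta)$ (chosen only for the example, not taken from the text), computes $V$ from $K$ on a grid, and then recovers $K$ from $V$.

import numpy as np

# Hypothetical convex, decreasing constraint value function K(eta).
K = lambda eta: np.exp(-eta)

eta_grid = np.linspace(0.0, 20.0, 4001)
theta_grid = np.linspace(0.05, 1.0, 200)

# (6.8): V(theta) = min over eta >= 0 of K(eta) + theta * eta
V = np.array([np.min(K(eta_grid) + th * eta_grid) for th in theta_grid])

# (6.9): K(eta) = max over theta >= 0 of V(theta) - eta * theta
eta_check = np.linspace(0.1, 3.0, 30)
K_recovered = np.array([np.max(V - e * theta_grid) for e in eta_check])

print(np.max(np.abs(K_recovered - K(eta_check))))  # small, up to grid error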

Connections

An association between robust control and the framework of Gilboa and Schmeidler (1989) extends beyond problem (6.7) because the equilibrium value functions and decision rules for all of our problems are intimately related. Where $V$ is the value function in (6.3) and $K$ is the value function in (6.4), the recursive counterpart to (6.8) is:

\[
V(x) = \min_{r \ge 0} K(x, r) + \theta r
\]

with the implied first-order condition

\[
\partial_r K(x, r) = -\theta.
\]

This first-order condition implicitly defines $r$ as a function of $x$ for a given $\theta$, which implies that $r$ is a redundant state variable. The penalty formulation avoids this redundancy.3

2 Similarly, Savage's framework does not purport to describe the process by which the Bayesian decision maker constructs his unique prior.

3 There is also a recursive analog to (6.9) that uses the fact that the function $V$ depends implicitly on $\theta$.


The nonsequential value function $V$ is related to the other value functions via:

\[
V(\theta) = M(x_0, 1) = 1 \cdot V(x_0) = W(x_0, x_0) = S(x_0)
\]

where $x_0$ is the common initial value and $\theta$ is held fixed across the different problems. Though these problems have different decision rules, we shall show that for a fixed $\theta$ and comparable initial conditions, they have identical equilibrium outcomes and identical recursive representations of those outcomes. In particular, the following relations prevail across the equilibrium decision rules for our different problems:

\[
\alpha_c(x) = \gamma_c(x, x) = \phi_c[x, \phi_r(x)].
\]

Who cares?

We care about the equivalence of these control problems and games because some of the problems are easier to solve and others are easier to interpret.

These problems came from literatures that approached the problem of decision making in the presence of model misspecification from different angles. The recursive version of the penalty problem (6.3) emerged from a literature on robust control that also considered the risk-sensitive problem (6.2). The nonsequential constraint problem (6.7) is an example of the min-max expected utility theory of Gilboa and Schmeidler (1989) with a particular set of priors. By modifying the set of priors over time, constraint problem (6.4) states a recursive version of that nonsequential constraint problem. The Lagrange multiplier theorem supplies an interpretation of the penalty parameter $\theta$.

A potentially troublesome feature of multiple priors models for applied work is that they impute a set of models to the decision maker.4 How should that set be specified? Robust control theory gives a convenient way to specify and measure a set of priors surrounding a single approximating model.

4 For applied work, an attractive feature of rational expectations is that by equating the equilibrium of the model itself to the decision maker's prior, decision makers' beliefs contribute no free parameters.


6.3 Three ordinary control problems

By describing three ordinary control problems, this section begins describing the technical conditions that underlie the broad claims made in section 6.2. In each problem, a single decision maker chooses a stochastic process to maximize an intertemporal return function. The first two are different representations of the same underlying problem. They are cast on different probability spaces and express different timing protocols. The third, called the risk-sensitive control problem, alters the objective function of the decision maker to induce more aversion to risk.

Benchmark problem

We start with two versions of a benchmark stochastic optimal control problem. The first formulation is defined in terms of a state vector $x$, an underlying probability space $(\Omega, \mathcal{F}, P)$, a $d$-dimensional, standard Brownian motion $\{B_t : t \ge 0\}$ defined on that space, and $\{\mathcal{F}_t : t \ge 0\}$, the completion of the filtration generated by the Brownian motion $B$. For any stochastic process $\{a_t : t \ge 0\}$, we use $a$ or $\{a_t\}$ to denote the process and $a_t$ to denote the time $t$ component of that process. The random vector $a_t$ maps $\Omega$ into a set $A$; $a$ denotes an element in $A$. Actions of the decision maker form a progressively measurable stochastic process $\{c_t : t \ge 0\}$, which means that the time $t$ component $c_t$ is $\mathcal{F}_t$ measurable.5 Let $U$ be an instantaneous utility function and $C$ be the set of admissible control processes.

Definition 6.3.1. The benchmark control problem is:

\[
J(x_0) = \sup_{c \in C} E\left[\int_0^\infty \exp(-\delta t)\, U(c_t, x_t)\,dt\right] \tag{6.10}
\]

where the maximization is subject to

\[
dx_t = \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,dB_t \tag{6.11}
\]

and where x0 is a given initial condition.

5 Progressive measurability requires that we view $c \doteq \{c_t : t \ge 0\}$ as a function of $(t, \omega)$. For any $t \ge 0$, $c : [0, t] \times \Omega$ must be measurable with respect to $\mathcal{B}_t \times \mathcal{F}_t$, where $\mathcal{B}_t$ is a collection of Borel subsets of $[0, t]$. See Karatzas and Shreve (1991), pages 4 and 5, for a discussion.


The parameter $\delta$ is a subjective discount rate, $\mu$ is the drift coefficient, and $\sigma\sigma'$ is the diffusion matrix. We restrict $\mu$ and $\sigma$ so that any progressively measurable control $c$ in $C$ implies a progressively measurable state vector process $x$ and maintain

Assumption 6.3.2. $J(x_0)$ is finite.

We shall refer to the law of motion (6.11) or the probability measure over sequences that it induces as the decision maker's approximating model. The benchmark control problem treats the approximating model as correct.
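As a concrete numerical sketch of the objects in the benchmark problem (6.10)-(6.11), the code below assumes a hypothetical scalar specification — drift $\mu(c,x) = ax + bc$, constant volatility $s$, quadratic utility $U(c,x) = -(x^2 + c^2)$, and a linear feedback rule $c = -kx$ — none of which come from the text; they simply make the discounted objective computable by Euler discretization under the approximating model.

import numpy as np

def benchmark_objective(k, a=-0.5, b=1.0, s=0.2, delta=0.05,
                        x0=1.0, T=200.0, dt=0.01, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[int_0^T e^{-delta t} U(c_t, x_t) dt]
    under the approximating model, for the feedback rule c = -k x."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.full(n_paths, x0)
    total = np.zeros(n_paths)
    for i in range(n_steps):
        t = i * dt
        c = -k * x
        u = -(x**2 + c**2)                    # instantaneous utility
        total += np.exp(-delta * t) * u * dt  # discounted utility flow
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x + (a * x + b * c) * dt + s * dB # Euler step of (6.11)
    return total.mean()

print(benchmark_objective(k=0.3))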

A nonsequential version of the benchmark problem

It is useful to restate the benchmark problem in terms of the probability space that the Brownian motion induces over continuous functions of time, thereby converting it into a nonsequential problem that pushes the state $x$ into the background. At the same time, it puts the induced probability distribution in the foreground and features the linearity of the objective in the induced probability distribution. For similar constructions and further discussions of induced distributions, see Elliott (1982) and Liptser and Shiryaev (2000), chapter 7.

The $d$-dimensional Brownian motion $B$ induces a multivariate Wiener measure $q^0$ on a canonical space $(\Omega^*, \mathcal{F}^*)$, where $\Omega^*$ is the space of continuous functions $f : [0, +\infty) \to \mathbb{R}^d$ and $\mathcal{F}_t^*$ is the Borel sigma algebra for the restriction of the continuous functions $f$ to $[0, t]$. Define open sets using the sup-norm over each interval. Notice that $\iota_s(f) \doteq f(s)$ is $\mathcal{F}_t^*$ measurable for each $0 \le s \le t$. Let $\mathcal{F}^*$ be the smallest sigma algebra containing $\mathcal{F}_t^*$ for $t \ge 0$. An event in $\mathcal{F}_t^*$ restricts continuous functions on the finite interval $[0, t]$. For any probability measure $q$ on $(\Omega^*, \mathcal{F}^*)$, let $q_t$ denote the restriction to $\mathcal{F}_t^*$. In particular, $q_t^0$ is the multivariate Wiener measure over the event collection $\mathcal{F}_t^*$.

Given a progressively measurable control $c$, solve the stochastic differential equation (6.11) to obtain a progressively measurable utility process

\[
U(c_t, x_t) = \upsilon_t(c, B)
\]

where $\upsilon(c, \cdot)$ is a progressively measurable family defined on $(\Omega^*, \mathcal{F}^*)$. This notation accounts for but conceals the evolution of the state vector $x_t$. A realization of the Brownian motion is a continuous function. Putting a probability measure $q^0$ on the space of continuous functions allows us to evaluate expectations. We leave implicit the dependence on $B$ and represent the decision maker's objective as

\[
\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t^0\right) dt.
\]

Definition 6.3.3. A nonsequential benchmark control problem is

\[
J(x_0) = \sup_{c \in C} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t^0\right) dt.
\]

Recursive version of the benchmark problem

The problem in definition 6.3.1 asks the decision maker once and for all at time 0 to choose an entire process $c \in C$. To transform the problem into one in which the decision maker chooses sequentially, we impose additional structure on the choice set $C$ by restricting $c_t$ to be in some set $C$ that is common for all dates. This is for notational simplicity, since we could easily incorporate control constraints of the form $C(t, x)$. With this specification of controls, we make the problem recursive by asking the decision maker to choose $c$ as a function of the state $x$ at each date.

Definition 6.3.4. The HJB equation for the benchmark problem is

\[
\delta J(x) = \sup_{c \in C} U(c, x) + \mu(c, x) \cdot J_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' J_{xx}(x)\sigma(c, x)\right]. \tag{6.12}
\]

The recursive version of the benchmark problem (6.12) puts the state $x_t$ front and center. A decision rule $c_t = \zeta_c(x_t)$ attains the right side of the HJB equation (6.12).

Although the nonsequential and recursive versions of the benchmark control problem yield identical formulas for $(c, x)$ as a function of the Brownian motion $B$, they differ in how they represent the same approximating model: as a probability distribution in the nonsequential problem and as a stochastic differential equation in the recursive problem. Both versions of the benchmark problem treat the decision maker's approximating model as true.6

6 As we discuss more in section 6.7, an additional argument is generally needed to show that an appropriate solution of (6.12) is equal to the value of the original problem (6.10).


Risk-sensitive control

Let $\rho$ be an intertemporal return or utility function. Instead of maximizing $E\rho$ (where $E$ continues to mean mathematical expectation), risk-sensitive control theory maximizes $-\theta \log E[\exp(-\rho/\theta)]$, where $1/\theta$ is a risk-sensitivity parameter. As the name suggests, the exponentiation inside the expectation makes this objective more sensitive to risky outcomes. Jacobson (1973) and Whittle (1981) initiated risk-sensitive optimal control in the context of discrete-time linear-quadratic decision problems. Jacobson and Whittle showed that the risk-sensitive control law can be computed by solving a robust penalty problem of the type we have studied here.
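To see how the exponentiation penalizes risk, consider the hypothetical special case of a Gaussian return $\rho \sim N(m, s^2)$ (an assumption made only for this illustration). The moment generating function of the normal distribution gives the closed form $-\theta\log E[\exp(-\rho/\theta)] = m - s^2/(2\theta)$: the mean minus a variance penalty that vanishes as $\theta \to \infty$. A short Monte Carlo check:

import numpy as np

rng = np.random.default_rng(0)
m, s, theta = 1.0, 2.0, 4.0
rho = rng.normal(m, s, size=1_000_000)

risk_neutral = rho.mean()                                   # approx m
risk_sensitive = -theta * np.log(np.mean(np.exp(-rho / theta)))

print(risk_neutral)
print(risk_sensitive)                  # approx m - s**2 / (2 * theta)
print(m - s**2 / (2 * theta))          # closed form for the Gaussian case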

A risk-sensitive control problem treats the decision maker's approximating model as true but alters preferences by appending an additional term to the right side of the HJB equation (6.12):

\[
\begin{aligned}
\delta S(x) = \sup_{c \in C}\; & U(c, x) + \mu(c, x) \cdot S_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' S_{xx}(x)\sigma(c, x)\right] \\
& - \frac{1}{2\theta} S_x(x)'\sigma(c, x)\sigma(c, x)' S_x(x),
\end{aligned} \tag{6.13}
\]

where $\theta > 0$. The term

\[
\mu(c, x) \cdot S_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' S_{xx}(x)\sigma(c, x)\right]
\]

in HJB equation (6.13) is the local mean or $dt$ contribution to the continuation value process $\{S(x_t) : t \ge 0\}$. Thus, (6.13) adds $-\frac{1}{2\theta}S_x(x)'\sigma(c, x)\sigma(c, x)'S_x(x)$ to the right side of the HJB equation for the benchmark control problem (6.10), (6.11). Notice that $S_x(x_t)'\sigma(c_t, x_t)\,dB_t$ gives the local Brownian contribution to the value function process $\{S(x_t) : t \ge 0\}$. The additional term in the HJB equation is the negative of the local variance of the continuation value weighted by $\frac{1}{2\theta}$. Relative to our discussion above, we can view this as the Ito's lemma correction term for the evolution of instantaneous expected utility that comes from the concavity of the exponentiation in the risk-sensitive objective. When $\theta = +\infty$, this collapses to the benchmark control problem. When $\theta < \infty$, we call it a risk-sensitive control problem with $-\frac{1}{\theta}$ being the risk-sensitivity parameter. A solution of the risk-sensitive control problem is attained by a policy function

\[
c_t = \alpha_c(x_t) \tag{6.14}
\]

whose dependence on $\theta$ is understood.
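To spell out the local-variance interpretation, apply Ito's lemma to the continuation value process (a short calculation, assuming $S$ is twice continuously differentiable):

\[
dS(x_t) = \left\{\mu(c_t, x_t)\cdot S_x(x_t) + \tfrac{1}{2}\operatorname{trace}\left[\sigma(c_t, x_t)' S_{xx}(x_t)\sigma(c_t, x_t)\right]\right\} dt + S_x(x_t)'\sigma(c_t, x_t)\,dB_t,
\]

so the local mean of $S(x_t)$ is the term displayed above, and the local variance is $S_x(x_t)'\sigma(c_t, x_t)\sigma(c_t, x_t)' S_x(x_t)\,dt$; the extra term in (6.13) is $-\frac{1}{2\theta}$ times this local variance.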

James (1992) studied a continuous-time, nonlinear diffusion formulation of a risk-sensitive control problem. Risk-sensitive control theory typically focuses on the case in which the discount rate $\delta$ is zero. Hansen and Sargent (1995b) showed how to introduce discounting and still preserve much of the mathematical structure for the linear-quadratic, Gaussian risk-sensitive control problem. They applied the recursive utility framework developed by Epstein and Zin (1989a) in which the risk-sensitive adjustment is applied recursively to the continuation values. Recursive formulation (6.13) gives the continuous-time counterpart for Markov diffusion processes. Duffie and Epstein (1992) characterized the preferences that underlie this specification.

6.4 Fear of model misspecification

For a given $\theta$, the optimal risk-sensitive decision rule emerges from other problems in which the decision maker's objective function remains that in the benchmark problem (6.10) and in which the adjustment to the continuation value in (6.13) reflects not altered preferences but distrust of the model (6.11). Moreover, just as we formulated the benchmark problem either as a nonsequential problem with induced distributions or as a recursive problem, there are also nonsequential and recursive representations of robust control problems.

Each of our decision problems for promoting robustness to model misspecification is a zero-sum, two-player game in which a maximizing player ('the decision maker') chooses a best response to a malevolent player ('nature') who can alter the stochastic process within prescribed limits. The minimizing player's malevolence is the maximizing player's tool for analyzing the fragility of alternative decision rules. Each game uses a Nash equilibrium concept. We portray games that differ from one another in three dimensions: (1) the protocols that govern the timing of players' decisions, (2) the constraints on the malevolent player's choice of models; and (3) the mathematical spaces in terms of which the games are posed. Because the state spaces and probability spaces on which they are defined differ, the recursive versions of these problems yield decision rules that differ from (6.14). Despite that, all of the formulations give rise to identical decision processes for $c$, all of which in turn are equal to those that apply the optimal risk-sensitive decision rule (6.14) to the transition equation (6.11).


The equivalence of their outcomes provides interesting alternative perspectives from which to understand the decision maker's response to possible model misspecification.7 That outcomes are identical for these different games means that when all is said and done, the timing protocols don't matter. Because some of the timing protocols correspond to nonsequential or 'static' games while others enable sequential choices, equivalence of equilibrium outcomes implies a form of dynamic consistency.

7 See section 9 of Anderson et al. (2003) for an application.

Jacobson (1973) and Whittle (1981) first showed that the risk-sensitive control law can be computed by solving a robust penalty problem of the type we have studied here, but without discounting. Subsequent research reconfirmed this link in nonsequential and undiscounted problems, typically posed in nonstochastic environments. Petersen et al. (2000a) explicitly considered an environment with randomness, but did not make the link to recursive risk-sensitivity.

6.5 Two robust control problems defined on sets of probability measures

We formalize the connection between two problems that are robust counterparts to the nonsequential version of the benchmark control problem (6.3.3). These problems do not fix an induced probability distribution $q^0$. Instead they express alternative models as alternative induced probability distributions and add a player who chooses a probability distribution to minimize the objective. This leads to a pair of two-player zero-sum games. One of the two games falls naturally into the framework of Gilboa and Schmeidler (1989) and the other is closely linked to risk-sensitive control. An advantage of working with the induced distributions is that a convexity property that helps to establish the connection between the two games is easy to demonstrate.

Entropy and absolute continuity over finite intervals

We use a notion of absolute continuity of one infinite-time stochastic process with respect to another that is weaker than what is implied by the standard definition of absolute continuity. The standard notion characterizes two stochastic processes as being absolutely continuous with respect to each other if they agree about "tail events". Roughly speaking, the weaker concept requires that the two measures being compared both put positive probability on all of the same events, except tail events. This weaker notion of absolute continuity is interesting for applied work because of what it implies about how quickly it is possible statistically to distinguish one model from another.

Recall that the Brownian motion $B$ induces a multivariate Wiener measure on $(\Omega^*, \mathcal{F}^*)$ that we have denoted $q^0$. For any probability measure $q$ on $(\Omega^*, \mathcal{F}^*)$, we have let $q_t$ denote the restriction to $\mathcal{F}_t^*$. In particular, $q_t^0$ is the multivariate Wiener measure over the events $\mathcal{F}_t^*$.

Definition 6.5.1. A distribution $q$ is said to be absolutely continuous over finite intervals with respect to $q^0$ if $q_t$ is absolutely continuous with respect to $q_t^0$ for all $t < \infty$.8

Let $Q$ be the set of all distributions that are absolutely continuous with respect to $q^0$ over finite intervals. The set $Q$ is convex. Absolute continuity over finite intervals captures the idea that two models are difficult to distinguish given samples of finite length. If $q$ is absolutely continuous with respect to $q^0$ over finite intervals, we can construct likelihood ratios for finite histories at any calendar date $t$. To measure the discrepancy between models over an infinite horizon, we use a discounted measure of relative entropy:

\[
R(q) \doteq \delta \int_0^\infty \exp(-\delta t)\left(\int \log\left(\frac{dq_t}{dq_t^0}\right) dq_t\right) dt, \tag{6.15}
\]

where $\frac{dq_t}{dq_t^0}$ is the Radon-Nikodym derivative of $q_t$ with respect to $q_t^0$. In appendix 6.B (claim 6.B.1), we show that this discrepancy measure is convex in $q$.

The distribution $q$ is absolutely continuous with respect to $q^0$ when

\[
\int \log\left(\frac{dq}{dq^0}\right) dq < +\infty.
\]

8 Kabanov et al. (1979) refer to this concept as local absolute continuity. Although Kabanov et al. (1979) define local absolute continuity through the use of stopping times, they argue that their definition is equivalent to this "simpler one".


In this case a law of large numbers that applies under $q^0$ must also apply under $q$, so that discrepancies between them are at most 'temporary'. We introduce discounting in part to provide an alternative interpretation of the recursive formulation of risk-sensitive control as expressing a fear of model misspecification rather than extra aversion to well understood risks. By restricting the discounted entropy (6.15) to be finite, we allow

\[
\int \log\left(\frac{dq}{dq^0}\right) dq = +\infty. \tag{6.16}
\]

Time series averages of functions that converge almost surely under $q^0$ can converge to a different limit under $q$, or they may not converge at all. That would allow a statistician to distinguish $q$ from $q^0$ with a continuous record of data on an infinite interval.9 But we want these alternative models to be close enough to the approximating model that they are statistically difficult to distinguish from it after having observed a continuous data record of only finite length $N$ on the state. We implement this requirement by requiring $R(q) < +\infty$, where $R(q)$ is defined in (6.15).

The presence of discounting in (6.15) and its absence from (6.16) are significant. With alternative models that satisfy (6.16), the decision maker seeks robustness against models that can be distinguished from the approximating model with an infinite data record; but because the models satisfy (6.15), it is difficult to distinguish them from a finite data record. Thus, we have in mind settings of $\delta$ for which impatience outweighs the decision maker's ability eventually to learn specifications that give superior fits, prompting him to focus on designing a robust decision rule.

We now have the vocabulary to state two nonsequential robust control problems that use $Q$ as a family of distortions to the probability distribution $q^0$ in the benchmark problem:

Definition 6.5.2. A nonsequential penalty robust control problem is

\[
V(\theta) = \sup_{c \in C} \inf_{q \in Q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta R(q).
\]

9 Our specification allows $Q$ measures to put different probabilities on tail events, which prevents the conditional measures from merging, as Blackwell and Dubins (1962) show will occur under absolute continuity. See Kalai and Lerner (1993) and Jackson et al. (1999) for implications of absolute continuity for learning.


Definition 6.5.3. A nonsequential constraint robust control problem is

\[
K(\eta) = \sup_{c \in C} \inf_{q \in Q(\eta)} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt
\]

where Q(η) = {q ∈ Q : R(q) ≤ η}.

The first problem is closely linked to the risk-sensitive control problem. The second problem fits into the max-min expected utility or multiple priors model advocated by Gilboa and Schmeidler (1989), the set of priors being $Q(\eta)$. We use $\theta$ to index a family of penalty robust control problems and $\eta$ to index a family of constraint robust control problems. The two types of problems are linked by the Lagrange multiplier theorem, as we show next.

Relation between the constraint and penalty problems

In this subsection we establish two important things about the two nonsequential multiple priors problems 6.5.2 and 6.5.3: (1) we show that we can interpret the robustness parameter $\theta$ in problem 6.5.2 as a Lagrange multiplier on the specification-error constraint $R(q) \le \eta$ in problem 6.5.3;10 (2) we display technical conditions that make the solutions of the two problems equivalent to one another. We shall exploit both of these results in later sections.

The simultaneous maximization and minimization means that the link between the penalty and constraint problem is not a direct implication of the Lagrange multiplier Theorem. The following treatment exploits convexity of $R$ in $Q$. The analysis follows Petersen et al. (2000a), although our measure of entropy differs.11 As in Petersen et al. (2000a), we use tools of convex analysis contained in Luenberger (1969) to establish the connection between the two problems.

Assumption 6.3.2 makes the optimized objectives for both the penalty and constraint robust control problems less than $+\infty$. They can be $-\infty$, depending on the magnitudes of $\theta$ and $\eta$.

10 This connection is regarded as self-evident throughout the literature on robust control. It has been explored in the context of a linear-quadratic control problem, informally by Hansen et al. (1999), and formally by Hansen and Sargent (2008c).

11 To accommodate discounting in the recursive, risk-sensitive control problem, we include discounting in our measure of entropy. See appendix 6.B.


Given an $\eta^* > 0$, add $-\theta\eta^*$ to the objective in problem 6.5.2. For given $\theta$, doing this has no impact on the control law.12 For a given $c$, the objective of the constraint robust control problem is linear in $q$ and the entropy measure $R$ in the constraint is convex in $q$. Moreover, the family of admissible probability distributions $Q$ is itself convex. Thus, we formulate the constraint version of the robust control problem (problem 6.5.3) as a Lagrangian:

\[
\sup_{c \in C} \inf_{q \in Q} \sup_{\theta \ge 0} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta\left[R(q) - \eta^*\right].
\]

For many choices of $q$, the optimizing multiplier $\theta$ is degenerate: it is infinite if $q$ violates the constraint and zero if the constraint is slack. Therefore, we include $\theta = +\infty$ in the choice set for $\theta$. Exchanging the order of $\max_\theta$ and $\min_q$ attains the same value of $q$. The Lagrange multiplier theorem allows us to study:

\[
\sup_{c \in C} \sup_{\theta \ge 0} \inf_{q \in Q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta\left[R(q) - \eta^*\right]. \tag{6.17}
\]

A complication arises at this point because the maximizing $\theta$ in (6.17) depends on the choice of $c$. In solving a robust control problem, we are most interested in the $c$ that solves the constraint robust control problem. We can find the appropriate choice of $\theta$ by changing the order of $\max_c$ and $\max_\theta$ to obtain:

\[
\sup_{\theta \ge 0} \sup_{c \in C} \inf_{q \in Q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta\left[R(q) - \eta^*\right] = \max_{\theta \ge 0} V(\theta) - \theta\eta^*,
\]

since for a given $\theta$ the term $-\theta\eta^*$ does not affect the extremizing choices of $(c, q)$.

Claim 6.5.4. For $\eta^* > 0$, suppose that $c^*$ and $q^*$ solve the constraint robust control problem for $K(\eta^*) > -\infty$. Then there exists a $\theta^* > 0$ such that the corresponding penalty robust control problem has the same solution. Moreover,

\[
K(\eta^*) = \max_{\theta \ge 0} V(\theta) - \theta\eta^*.
\]

12 However, it will alter which $\theta$ results in the highest objective.


Proof. This result is essentially the same as Theorem 2.1 of Petersen et al. (2000a) and follows directly from Luenberger (1969).

This claim gives $K$ as the Legendre transform of $V$. Moreover, by adapting an argument of Luenberger (1969), we can show that $K$ is decreasing and convex in $\eta$.13 We are interested in recovering $V$ from $K$ as the inverse Legendre transform via:

\[
V(\theta^*) = \min_{\eta \ge 0} K(\eta) + \theta^*\eta. \tag{6.18}
\]

It remains to justify this recovery formula.

We call admissible those nonnegative values of $\theta$ for which it is feasible to make the objective function greater than $-\infty$. If $\tilde\theta$ is admissible, values of $\theta$ larger than $\tilde\theta$ are also admissible, since these values only make the objective larger. Let $\underline\theta$ denote the greatest lower bound for admissible values of $\theta$. Consider a value $\theta^* > \underline\theta$. Our aim is to find a constraint associated with this choice of $\theta$.

It follows from claim 6.5.4 that

\[
V(\theta^*) \le K(\eta) + \theta^*\eta
\]

for any $\eta > 0$ and hence

\[
V(\theta^*) \le \min_{\eta \ge 0} K(\eta) + \theta^*\eta.
\]

Moreover,

\[
K(\eta) \le \inf_{q \in Q(\eta)} \sup_{c \in C} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt,
\]

since maximizing after minimizing (rather than vice versa) cannot decrease the resulting value of the objective. Thus,

\[
\begin{aligned}
V(\theta^*) &\le \min_{\eta \ge 0}\left[\inf_{q \in Q(\eta)}\sup_{c \in C}\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta^*\eta\right] \\
&= \min_{\eta \ge 0}\left[\inf_{q \in Q(\eta)}\sup_{c \in C}\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta^* R(q)\right] \\
&= \inf_{q \in Q}\sup_{c \in C}\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta^* R(q).
\end{aligned}
\]

13 This follows because we may view $K$ as the maximum over convex functions indexed by alternative consumption processes.

For the first equality, the minimization over $\eta$ is important. Given some $\eta$ we may lower the objective by substituting $R(q)$ for $\eta$ when the constraint $R(q) \le \eta$ is imposed in the inner minimization problem. Thus the minimized choice of $q$ for $\eta$ may have entropy $\hat\eta < \eta$. More generally, there may exist a sequence $\{q^j : j = 1, 2, \ldots\}$ that approximates the inf for which $\{R(q^j) : j = 1, 2, \ldots\}$ is bounded away from $\eta$. In this case we may extract a subsequence of $\{R(q^j) : j = 1, 2, \ldots\}$ that converges to $\hat\eta < \eta$. Therefore, we would obtain the same objective by imposing an entropy constraint $R(q) \le \hat\eta$ at the outset:

\[
\inf_{q \in Q(\hat\eta)}\left[\sup_{c \in C}\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta^*\hat\eta\right] = \inf_{q \in Q(\hat\eta)}\left[\sup_{c \in C}\int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta^* R(q)\right].
\]

Since the objective is minimized by the choice $\hat\eta$, there is no further reduction in the optimized objective by substituting $R(q)$ for $\hat\eta$.

Notice that the last equality gives a min-max analogue to the nonsequential penalty problem (6.5.2), but with the order of minimization and maximization reversed. If the resulting value continues to be $V(\theta^*)$, we have verified (6.18).

We shall invoke the following assumption:

Assumption 6.5.5. For $\theta > \underline\theta$,

\[
\begin{aligned}
V(\theta) &= \max_{c \in C} \min_{q \in Q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta R(q) \\
&= \min_{q \in Q} \max_{c \in C} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta R(q).
\end{aligned}
\]

Both equalities assume that the maximum and minimum are attained. Because minimization occurs first, without the assumption the second equality would have to be replaced by a less than or equal sign ($\le$). In much of what follows, we presume that inf's and sup's are attained in the control problems, and thus we will replace inf with min and sup with max.


Claim 6.5.6. Suppose that Assumption 6.5.5 is satisfied and that for $\theta^* > \underline\theta$, $c^*$ is the maximizing choice of $c$ for the penalty robust control problem 6.5.2. Then that $c^*$ also solves the constraint robust control problem 6.5.3 for $\eta^* = R(q^*)$ where $\eta^*$ solves

\[
V(\theta^*) = \min_{\eta \ge 0} K(\eta) + \theta^*\eta.
\]

Since $K$ is decreasing and convex, $V$ is increasing and concave in $\theta$. The Legendre and inverse Legendre transforms given in claims 6.5.4 and 6.5.6 fully describe the mapping between the constraint index $\eta^*$ and the penalty parameter $\theta^*$. However, given $\eta^*$, they do not imply that the associated $\theta^*$ is unique, nor for a given $\theta^* > \underline\theta$ do they imply that the associated $\eta^*$ is unique.

While claim 6.5.6 maintains assumption 6.5.5, claim 6.5.4 does not. Without assumption 6.5.5, we do not have a proof that $V$ is concave. Moreover, for some values of $\theta^*$ and a solution pair $(c^*, q^*)$ of the penalty problem, we may not be able to produce a corresponding constraint problem. Nevertheless, the family of penalty problems indexed by $\theta$ continues to embed the solutions to the constraint problems indexed by $\eta$, as justified by claim 6.5.4. We are primarily interested in problems for which assumption 6.5.5 is satisfied, and in section 6.7 and appendix 6.D we provide some sufficient conditions for this assumption. One reason for interest in this assumption is given in the next subsection.

Preference Orderings

We now define two preference orderings associated with the constraint and penalty control problems. One preference ordering uses the value function:

\[
K(c; \eta) = \inf_{R(q) \le \eta} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt.
\]

Definition 6.5.7. (Constraint preference ordering) For any two progressively measurable $c$ and $c^*$, $c^* \succeq_\eta c$ if

\[
K(c^*; \eta) \ge K(c; \eta).
\]

The other preference ordering uses the value function:

\[
V(c; \theta) = \inf_{q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t\right) dt + \theta R(q).
\]


Definition 6.5.8. (Penalty preference ordering) For any two progressively measurable $c$ and $c^*$, $c^* \succeq_\theta c$ if

\[
V(c^*; \theta) \ge V(c; \theta).
\]

The first preference order has the multiple-priors form justified by Gilboa and Schmeidler (1989). The second is commonly used to compute robust decision rules and is closest to recursive utility theory. The two preference orderings differ. Furthermore, given $\eta$, there exists no $\theta$ that makes the two preference orderings agree. However, the Lagrange Multiplier Theorem delivers a weaker result that is very useful to us. While they differ globally, indifference curves passing through a given point $c^*$ in the consumption set are tangent for the two preference orderings. For asset pricing, a particularly interesting point $c^*$ would be one that solves an optimal resource allocation problem.

Use the Lagrange Multiplier Theorem to write K as

\[
K(c^*; \eta^*) = \max_{\theta \ge 0} \inf_{q} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c^*)\,dq_t\right) dt + \theta\left[R(q) - \eta^*\right],
\]

and let $\theta^*$ denote the maximizing value of $\theta$, which we assume to be strictly positive. Suppose that $c^* \succeq_{\eta^*} c$. Then

\[
V(c; \theta^*) - \theta^*\eta^* \le K(c; \eta^*) \le K(c^*; \eta^*) = V(c^*; \theta^*) - \theta^*\eta^*.
\]

(The first inequality holds because restricting the infimum defining $V(c;\theta^*)$ to $q$ with $R(q) \le \eta^*$ and then replacing $\theta^* R(q)$ by the larger term $\theta^*\eta^*$ can only raise the value.) Thus, $c^* \succeq_{\theta^*} c$. The observational equivalence results from Claims 6.5.4 and 6.5.6 apply to decision profile $c^*$. The indifference curves touch but do not cross at this point.

Although the preferences differ, the penalty preferences are of interest in their own right. See Wang (2001) for an axiomatic development of entropy-based preference orders and Maccheroni et al. (2004) for an axiomatic treatment of preferences specified using convex penalization.

Bayesian interpretation of outcome of nonsequential game

A widespread device for interpreting a statistical decision rule is to find a probability distribution for which the decision rule is optimal. Here we seek an induced probability distribution for $B$ such that the solution for $c$ from either the constraint or penalty robust decision problem is optimal for a counterpart to the benchmark problem. When we can produce such a distribution, we say that we have a Bayesian interpretation for the robust decision rule. (See Blackwell and Girshick (1954) and Chamberlain (2000) for related discussions.)

The freedom to exchange orders of maximization and minimization in problem 6.5.2 (Assumption 6.5.5) justifies such a Bayesian interpretation of the decision process $c \in C$. Let $(c^*, q^*)$ be the equilibrium of game 6.5.2. Given the worst case model $q^*$, consider the control problem:

\[
\max_{c \in C} \int_0^\infty \exp(-\delta t)\left(\int \upsilon_t(c)\,dq_t^*\right) dt. \tag{6.19}
\]

Problem (6.19) is a version of our nonsequential benchmark problem 6.3.3 with a fixed model $q^*$ that is distorted relative to the approximating model $q^0$. The optimal choice of a progressively measurable $c$ takes $q^*$ as exogenous. The optimal decision $c^*$ is not altered by adding $\theta R(q^*)$ to the objective. Therefore, being able to exchange orders of extremization in 6.5.2 allows us to support a solution to the penalty problem by a particular distortion in the Wiener measure. The implied least favorable $q^*$ assigns a different (induced) probability measure for the exogenous stochastic process $\{B_t : t \ge 0\}$. Given that distribution, $c^*$ is the ordinary (non-robust) optimal control process.

Having connected the penalty and the constraint problem, in what follows we will focus primarily on the penalty problem. For notational simplicity, we will simply fix a value of $\theta$ and not formally index a family of problems by this parameter value.

6.6 Games on fixed probability spaces

This section describes important technical details that are involved in moving from the nonsequential to the recursive versions of the multiple probability games 6.5.2 and 6.5.3. It is convenient to represent alternative model specifications as martingale 'preference shocks' on a common probability space. This allows us to formulate two-player zero-sum differential games and to use existing results for such games. Thus, instead of working with multiple distributions on the measurable space $(\Omega^*, \mathcal{F}^*)$, we now use the original probability space $(\Omega, \mathcal{F}, P)$ in conjunction with nonnegative martingales.

We present a convenient way to parameterize the martingales and issue a caveat about this parameterization.

Martingales and finite interval absolute continuity

For any continuous function f in Ω∗, let

\[
\kappa_t(f) = \left(\frac{dq_t}{dq_t^0}\right)(f), \qquad z_t = \kappa_t(B), \tag{6.20}
\]

where $\kappa_t$ is the Radon-Nikodym derivative of $q_t$ with respect to $q_t^0$.

Claim 6.6.1. Suppose that for all $t \ge 0$, $q_t$ is absolutely continuous with respect to $q_t^0$. The process $\{z_t : t \ge 0\}$ defined via (6.20) on $(\Omega, \mathcal{F}, P)$ is a nonnegative martingale adapted to the filtration $\{\mathcal{F}_t : t \ge 0\}$ with $E z_t = 1$. Moreover,

\[
\int \phi_t\,dq_t = E\left[z_t \phi_t(B)\right] \tag{6.21}
\]

for any bounded and $\mathcal{F}_t^*$ measurable function $\phi_t$. Conversely, if $\{z_t : t \ge 0\}$ is a nonnegative progressively measurable martingale with $E z_t = 1$, then the probability measure $q$ defined via (6.21) is absolutely continuous with respect to $q^0$ over finite intervals.

Proof. The first part of this claim follows directly from the proof of theorem 7.5 in Liptser and Shiryaev (2000). Their proof is essentially a direct application of the Law of Iterated Expectations and the fact that probability distributions necessarily integrate to one. Conversely, suppose that $z$ is a nonnegative martingale on $(\Omega, \mathcal{F}, P)$ with unit expectation. Let $\phi_t$ be any nonnegative, bounded and $\mathcal{F}_t^*$ measurable function. Then (6.21) defines a measure because indicator functions are nonnegative, bounded functions. Clearly $\int \phi_t\,dq_t = 0$ whenever $E\phi_t(B) = 0$. Thus, $q_t$ is absolutely continuous with respect to $q_t^0$, the measure induced by Brownian motion restricted to $[0, t]$. Setting $\phi_t = 1$ shows that $q_t$ is in fact a probability measure for any $t$.


Claim 6.6.1 is important because it allows us to integrate over $(\Omega^*, \mathcal{F}^*, q)$ by instead integrating against a martingale $z$ on the original probability space $(\Omega, \mathcal{F}, P)$.

Representing martingales

By exploiting the Brownian motion information structure, we can attain a convenient representation of a martingale. Any martingale $z$ with a unit expectation can be portrayed as

\[
z_t = 1 + \int_0^t k_u\,dB_u
\]

where $k$ is a progressively measurable $d$-dimensional process that satisfies:

\[
P\left\{\int_0^t |k_u|^2\,du < \infty\right\} = 1
\]

for any finite $t$ (see Revuz and Yor (1994), Theorem V.3.4). Define:

\[
h_t = \begin{cases} k_t / z_t & \text{if } z_t > 0 \\ 0 & \text{if } z_t = 0. \end{cases} \tag{6.22}
\]

Then $z$ solves the integral equation

\[
z_t = 1 + \int_0^t z_u h_u\,dB_u \tag{6.23}
\]

and its differential counterpart

\[
dz_t = z_t h_t\,dB_t \tag{6.24}
\]

with initial condition $z_0 = 1$, where for $t > 0$

\[
P\left\{\int_0^t (z_u)^2 |h_u|^2\,du < \infty\right\} = 1. \tag{6.25}
\]

The scaling by $(z_u)^2$ permits $\int_0^t |h_u|^2\,du = \infty$ provided that $z_t = 0$ on the probability one event in (6.25).

In reformulating the nonsequential penalty problem 6.5.2, we parameterize nonnegative martingales by progressively measurable processes $h$. We introduce a new state $z_t$ initialized at one, and take $h$ to be under the control of the minimizing agent.
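The following numerical sketch simulates the martingale evolution (6.24) by Euler discretization for a scalar Brownian motion and a bounded drift distortion $h_t = 0.5\tanh(B_t)$ chosen purely for illustration, and checks two facts used above: $E z_T \approx 1$, and (as in (6.21)) $E[z_T \phi(B_T)]$ matches the expectation of $\phi$ computed under the perturbed model in which $B$ acquires the drift $h$.

import numpy as np

rng = np.random.default_rng(0)
T, dt, n_paths = 1.0, 0.01, 100_000
n_steps = int(T / dt)

B = np.zeros(n_paths)
z = np.ones(n_paths)
for _ in range(n_steps):
    h = 0.5 * np.tanh(B)                 # bounded drift distortion (illustrative)
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    z *= 1.0 + h * dB                    # Euler step of dz = z h dB
    B += dB

print(z.mean())                          # approx 1: z is a martingale
print((z * np.cos(B)).mean())            # distorted-measure expectation of cos(B_T)

# Same expectation computed under the perturbed model: B has drift h
Bq = np.zeros(n_paths)
for _ in range(n_steps):
    h = 0.5 * np.tanh(Bq)
    Bq += h * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
print(np.cos(Bq).mean())                 # close to the previous number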


Representing likelihood ratios

We are now equipped to fill in some important details associated with using martingales to represent likelihood ratios for dynamic models. Before addressing these issues, we use a simple static example to exhibit an important idea.

A static example

The static example is designed to illustrate two alternative ways to represent the expected value of a likelihood ratio by changing the measure with respect to which it is evaluated. Consider two models of a vector $y$. In the first, $y$ is normally distributed with mean $\nu$ and covariance matrix $I$. In the second, $y$ is normally distributed with mean zero and covariance matrix $I$. The logarithm of the ratio of the first density to the second is:

\[
\ell(y) = \nu \cdot y - \frac{1}{2}\nu \cdot \nu.
\]

Let $E_1$ denote the expectation under model one and $E_2$ under model two. Properties of the log-normal distribution imply that

\[
E_2 \exp\left[\ell(y)\right] = 1.
\]

Under the second model,

\[
E_2\,\ell(y)\exp\left[\ell(y)\right] = E_1\,\ell(y) = \frac{1}{2}\nu \cdot \nu,
\]

which is relative entropy.
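A quick Monte Carlo check of these two identities (illustrative only; the particular $\nu$ is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
nu = np.array([0.3, -0.4])
n = 2_000_000

y1 = rng.normal(size=(n, 2)) + nu        # draws from model one: N(nu, I)
y2 = rng.normal(size=(n, 2))             # draws from model two: N(0, I)

ell = lambda y: y @ nu - 0.5 * nu @ nu   # log likelihood ratio, model one to model two

print(np.mean(np.exp(ell(y2))))              # approx 1
print(np.mean(ell(y1)))                      # approx 0.5 * nu . nu
print(np.mean(ell(y2) * np.exp(ell(y2))))    # same limit: relative entropy
print(0.5 * nu @ nu)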

The dynamic counterpart

We now consider a dynamic counterpart to the static example by showing two ways to represent likelihood ratios, one under the original Brownian motion model and another under the model associated with a nonnegative martingale $z$. First we consider the likelihood ratio under the Brownian motion model for $B$. As noted above, the solution to (6.24) can be represented as an exponential:

\[
z_t = \exp\left(\int_0^t h_u \cdot dB_u - \frac{1}{2}\int_0^t |h_u|^2\,du\right). \tag{6.26}
\]


We allow $\int_0^t |h_u|^2\,du$ to be infinite with positive probability and adopt the convention that the exponential is zero when this event happens. In the event that $\int_0^t |h_u|^2\,du < \infty$, we can define the stochastic integral $\int_0^t h_u\,dB_u$ as an appropriate probability limit (see Lemma 6.2 of Liptser and Shiryaev (2000)).

When $z$ is a martingale, we can interpret the right side of (6.26) as a formula for the likelihood ratio of two models evaluated under the Brownian motion specification for $B$. Taking logarithms, we find that

\[
\ell_t = \int_0^t h_u \cdot dB_u - \frac{1}{2}\int_0^t |h_u|^2\,du.
\]

Since $h$ is progressively measurable, we can write:

\[
h_t = \psi_t(B).
\]

Changing the distribution of $B$ in accordance with $q$ gives another characterization of the likelihood ratio. The Girsanov Theorem implies

Claim 6.6.2. If for all $t \ge 0$, $q_t$ is absolutely continuous with respect to $q_t^0$, then $q$ is the induced distribution for a (possibly weak) solution $\bar B$ to a stochastic differential equation defined on a probability space $(\bar\Omega, \bar{\mathcal{F}}, \bar P)$:

\[
d\bar B_t = \psi_t(\bar B)\,dt + d\hat B_t
\]

for some progressively measurable $\psi$ defined on $(\Omega^*, \mathcal{F}^*)$ and some Brownian motion $\hat B$ that is adapted to $\{\bar{\mathcal{F}}_t : t \ge 0\}$. Moreover, for each $t$

\[
\bar P\left[\int_0^t |\psi_u(\bar B)|^2\,du < \infty\right] = 1.
\]

Proof. From Claim 6.6.1 there is a nonnegative martingale $z$ associated with the Radon-Nikodym derivative of $q_t$ with respect to $q_t^0$. This martingale has expectation unity for all $t$. The conclusion follows from a generalization of the Girsanov Theorem (e.g., see Liptser and Shiryaev (2000), Theorem 6.2).

The function $\psi_t$ in this claim is the same as the one used to represent $h_t = \psi_t(B)$ above. Under the distribution $\bar P$,

\[
\bar B_t = \int_0^t \psi_u(\bar B)\,du + \hat B_t
\]

where $\hat B_t$ is a Brownian motion with respect to the filtration $\{\bar{\mathcal{F}}_t : t \ge 0\}$. In other words, we obtain perturbed models by replacing the Brownian motion model for a shock process with a Brownian motion with a drift.

Using this representation, we can write the logarithm of the likelihood ratio as:

\[
\ell_t = \int_0^t \psi_u(\bar B) \cdot d\hat B_u + \frac{1}{2}\int_0^t |\psi_u(\bar B)|^2\,du.
\]

Claim 6.6.3. For $q \in Q$, let $z$ be the nonnegative martingale associated with $q$ and let $h$ be the progressively measurable process satisfying (6.23). Then

\[
R(q) = \frac{1}{2} E\left[\int_0^\infty \exp(-\delta t)\, z_t |h_t|^2\,dt\right].
\]

Proof. See appendix 6.B.

This claim leads us to define a discounted entropy measure for nonnegative martingales:

\[
R^*(z) \doteq \frac{1}{2} E\left[\int_0^\infty \exp(-\delta t)\, z_t |h_t|^2\,dt\right]. \tag{6.27}
\]
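As a check on these formulas, consider the hypothetical special case of a constant drift distortion $h_t \equiv h$ (a case introduced here only for illustration). Then $E z_t = 1$ and the expected log likelihood ratio under the distorted model is $\int \log(dq_t/dq_t^0)\,dq_t = \frac{1}{2}|h|^2 t$, so (6.15) and (6.27) deliver the same number:

\[
R(q) = \delta\int_0^\infty \exp(-\delta t)\,\frac{|h|^2 t}{2}\,dt = \frac{|h|^2}{2\delta},
\qquad
R^*(z) = \frac{1}{2}\int_0^\infty \exp(-\delta t)\,|h|^2\, E z_t\,dt = \frac{|h|^2}{2\delta}.
\]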

A martingale version of a robust control problem

Modeling alternative probability distributions as preference shocks that are martingales on a common probability space is mathematically convenient because it allows us to reformulate the penalty robust control problem (problem 6.5.2) as:

Definition 6.6.4. A nonsequential martingale robust control problem is

\[
\max_{c \in C} \min_{h \in H} E\left(\int_0^\infty \exp(-\delta t)\, z_t\left[U(c_t, x_t) + \frac{\theta}{2}|h_t|^2\right] dt\right) \tag{6.28}
\]

subject to:

\[
\begin{aligned}
dx_t &= \mu(c_t, x_t)\,dt + \sigma(c_t, x_t)\,dB_t \\
dz_t &= z_t h_t \cdot dB_t.
\end{aligned} \tag{6.29}
\]

But there is potentially a technical problem with this formulation. There may exist control processes $h$ and corresponding processes $z$ such that $z$ is a nonnegative local martingale for which $R^*(z) < \infty$, yet $z$ is not a martingale. We have not ruled out nonnegative supermartingales that happen to be local martingales. This means that even though $z$ is a local martingale, it might satisfy only the inequality

\[
E(z_t \mid \mathcal{F}_s) \le z_s
\]

for $0 < s \le t$. Even when we initialize $z_0$ to one, $z_t$ may have a mean less than one and the corresponding measure will not be a probability measure. Then we would have given the minimizing agent more options than we intend.

For this not to cause difficulty, at the very least we have to show that the minimizing player's choice of $h$ in problem 6.6.4 is associated with a $z$ that is a martingale and not just a supermartingale.14 More generally, we have to verify that enlarging the set of processes $z$ as we have done does not alter the equilibrium of the two-player zero-sum game. In particular, consider the second problem in assumption 6.5.5. It suffices to show that the minimizing $h$ implies a $z$ that is a martingale. If we assume that condition 6.5.5 is satisfied, then it suffices to check this for the following timing protocol:

\[
\min_{h \in H} \max_{c \in C} E\left(\int_0^\infty \exp(-\delta t)\, z_t\left[U(c_t, x_t) + \frac{\theta}{2}|h_t|^2\right] dt\right)
\]

subject to (6.29), $z_0 = 1$, and an initial condition $x_0$ for $x$.15 In appendix 6.C, we show how to establish that the solution is indeed a martingale.

14 Alternatively, we might interpret the supermartingale as allowing for an escape to a terminal absorbing state with a terminal value function equal to zero. The expectation of $z_t$ gives the probability that an escape has not happened as of date $t$. The existence of such a terminal state is not, however, entertained in our formulation of 6.5.2.

15 To see this let $H^* \subseteq H$ be the set of controls $h$ for which $z$ is a martingale and let $\mathrm{obj}(h, c)$ be the objective as a function of the controls. Then under Assumption 6.5.5 we have

\[
\min_{h \in H^*} \max_{c \in C} \mathrm{obj}(h, c) \ge \min_{h \in H} \max_{c \in C} \mathrm{obj}(h, c) = \max_{c \in C} \min_{h \in H} \mathrm{obj}(h, c) \le \max_{c \in C} \min_{h \in H^*} \mathrm{obj}(h, c). \tag{6.30}
\]

If we demonstrate that the first inequality $\ge$ in (6.30) is an equality, it follows that

\[
\min_{h \in H^*} \max_{c \in C} \mathrm{obj}(h, c) \le \max_{c \in C} \min_{h \in H^*} \mathrm{obj}(h, c).
\]

Since the reverse inequality is always satisfied provided that the extrema are attained, this inequality can be replaced by an equality. It follows that the second inequality $\le$ in (6.30) must in fact be an equality as well.


6.7 Sequential timing protocol for a penalty formulation

The martingale problem 6.6.4 assumes that at time zero both decision makers commit to decision processes whose time $t$ components are measurable functions of $\mathcal{F}_t$. The minimizing decision maker who chooses distorted beliefs $h$ takes $c$ as given; and the maximizing decision maker who chooses $c$ takes $h$ as given. Assumption 6.5.5 asserts that the order in which the two decision makers choose does not matter.

This section studies a two-player zero-sum game with a protocol that makes both players choose sequentially. We set forth conditions that imply that with sequential choices we obtain the same time zero value function and the same outcome path that would prevail were both players to choose once and for all at time 0. The sequential formulation is convenient computationally and also gives a way to justify the exchange of orders of extremization stipulated by assumption 6.5.5.

We have used $c$ to denote the control process and $c \in C$ to denote the value of a control at a particular date. We let $h \in H$ denote the realized martingale control at any particular date. We can think of $h$ as a vector in $\mathbb{R}^d$. Similarly, we think of $x$ and $z$ as being realized states.

To analyze outcomes under a sequential timing protocol, we think of varying the initial state and define a value function $M(x_0, z_0)$ as the optimized objective function (6.28) for the martingale problem. By appealing to results of Fleming and Souganidis (1989), we can verify that $V(\theta) = M(x, z) = zV(x)$, provided that $x = x_0$ and $z = 1$. Under a sequential timing protocol, this same value function gives the continuation value for evaluating states reached at subsequent time periods.

Fleming and Souganidis (1989) show that a Bellman-Isaacs condition renders equilibrium outcomes under two-sided commitment at date zero identical with outcomes of a Markov perfect equilibrium in which the decision rules of both agents are chosen sequentially, each as a function of the state vector $x_t$.16

16 Fleming and Souganidis (1989) impose as restrictions that $\mu$, $\sigma$ and $U$ are bounded, uniformly continuous and Lipschitz continuous with respect to $x$ uniformly in $c$. They also require that the controls $c$ and $h$ reside in compact sets. While these restrictions are imposed to obtain general existence results, they are not satisfied for some important examples. Presumably existence in these examples will require special arguments. These issues are beyond the scope of this paper.

The HJB equation for the infinite-horizon zero-sum two-player martingale game is:

\[
\begin{aligned}
\delta z V(x) = \max_{c \in C} \min_{h}\; & zU(c, x) + z\frac{\theta}{2} h \cdot h + \mu(c, x) \cdot V_x(x)\,z \\
& + z\frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right] + z h \cdot \sigma(c, x)' V_x(x)
\end{aligned} \tag{6.31}
\]

where $V_x$ is the vector of partial derivatives of $V$ with respect to $x$ and $V_{xx}$ is the matrix of second derivatives.17 The diffusion specification makes this HJB equation a partial differential equation that has multiple solutions that correspond to different boundary conditions. To find the true value function and to justify the associated control laws requires that we apply a Verification Theorem (e.g., see Theorem 5.1 of Fleming and Soner (1993)).

The scaling of partial differential equation (6.31) by $z$ verifies our guess that the value function is linear in $z$. This allows us to study the alternative HJB equation:

\[
\delta V(x) = \max_{c \in C} \min_{h}\; U(c, x) + \frac{\theta}{2} h \cdot h + \left[\mu(c, x) + \sigma(c, x)h\right] \cdot V_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right], \tag{6.32}
\]

which involves only the x component of the state vector and not z.18

17 In general the value functions associated with stochastic control problems will not be twice differentiable, as would be required for the HJB equation in (6.32) to possess classical solutions. However Fleming and Souganidis (1989) prove that the value function satisfies the HJB equation in a weaker viscosity sense. Viscosity solutions are often needed when it is feasible and sometimes desirable to set the control $c$ so that $\sigma(c, x)$ has lower rank than $d$, which is the dimension of the Brownian motion.

18 We can construct another differential game for which $V$ is the value function by replacing $dB_t$ by $h_t\,dt + dB_t$ in the evolution equation instead of introducing a martingale. In this way we would perturb the process rather than the probability distribution. While this approach can be motivated using Girsanov's Theorem, some subtle differences between the resulting perturbation game and the martingale game arise because the history of $\bar B_t = \int_0^t h_u\,du + B_t$ can generate either a smaller or a larger filtration than that of the Brownian motion $B$. When it generates a smaller sigma algebra, we would be compelled to solve a combined control and filtering problem if we think of $\bar B$ as generating the information available to the decision maker. If $\bar B$ generates a larger information set, then we are compelled to consider weak solutions to the stochastic differential equations that underlie the decision problem. Instead of extensively developing this alternative interpretation of $V$ (as we did in an earlier draft), we simply think of the partial differential equation (6.32) as a means of simplifying the solution to the martingale problem.


A Bellman-Isaacs condition renders inconsequential the order of action taken in the recursive game. The Bellman-Isaacs condition requires:

Assumption 6.7.1. The value function V satisfies

\[
\begin{aligned}
\delta V(x) &= \max_{c \in C} \min_{h}\; U(c, x) + \frac{\theta}{2} h \cdot h + \left[\mu(c, x) + \sigma(c, x)h\right] \cdot V_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right] \\
&= \min_{h} \max_{c \in C}\; U(c, x) + \frac{\theta}{2} h \cdot h + \left[\mu(c, x) + \sigma(c, x)h\right] \cdot V_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right].
\end{aligned}
\]

Appendix 6.D describes three ways to verify this Bellman-Isaacs condition. The infinite-horizon counterpart to the result of Fleming and Souganidis (1989) asserts that the Bellman-Isaacs condition implies assumption 6.5.5 and hence $V(\theta) = V(x_0)$ because $z$ is initialized at unity.

A representation of z∗

One way to represent the worst-case martingale $z^*$ in the recursive penalty game opens a natural transition to the risk-sensitive ordinary control problem whose HJB equation is (6.13). The minimizing player's decision rule is $h = \alpha_h(x)$, where

\[
\alpha_h(x) = -\frac{1}{\theta}\sigma^*(x)' V_x(x) \tag{6.33}
\]

and $\sigma^*(x) \equiv \sigma(\alpha_c(x), x)$. Suppose that $V(x)$ is twice continuously differentiable. Applying the formula on page 226 of Revuz and Yor (1994), form the martingale:

\[
z_t^* = \exp\left(-\frac{1}{\theta}\left[V(x_t) - V(x_0)\right] - \int_0^t w(x_u)\,du\right),
\]

where $w$ is constructed to ensure that $z^*$ has a zero drift. The worst case distribution assigns more weight to bad states as measured by an exponential adjustment to the value function. This representation leads directly to the risk-sensitive control problem that we take up in the next subsection.


Risk sensitivity revisited

The HJB equation for the recursive, risk-sensitive control problem is obtained by substituting the solution (6.33) for $h$ into the partial differential equation (6.32):

\[
\begin{aligned}
\delta V(x) &= \max_{c \in C} \min_{h}\; U(c, x) + \frac{\theta}{2} h \cdot h + \left[\mu(c, x) + \sigma(c, x)h\right] \cdot V_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right] \\
&= \max_{c \in C}\; U(c, x) + \mu(c, x) \cdot V_x(x) + \frac{1}{2}\operatorname{trace}\left[\sigma(c, x)' V_{xx}(x)\sigma(c, x)\right] - \frac{1}{2\theta} V_x(x)'\sigma(c, x)\sigma(c, x)' V_x(x).
\end{aligned} \tag{6.34}
\]

The value function $V$ for the robust penalty problem is also the value function for the risk-sensitive control problem of section 6.3. The risk-sensitive interpretation excludes worries about misspecified dynamics and instead enhances the control objective with aversion to risk in a way captured by the local variance of the continuation value. While mathematically related to the situation discussed in James (1992) (see pages 403 and 404), the presence of discounting in our setup compels us to use a recursive representation of the objective of the decision-maker.
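The substitution behind the second equality in (6.34) can be spelled out in one step. For $\theta > 0$, the inner minimization over $h$ in (6.32) is a strictly convex quadratic problem:

\[
\min_h\; \frac{\theta}{2}h\cdot h + h\cdot \sigma(c,x)'V_x(x)
\quad\Longrightarrow\quad
h = -\frac{1}{\theta}\sigma(c,x)'V_x(x),
\]

and the minimized value is $-\frac{1}{2\theta}V_x(x)'\sigma(c,x)\sigma(c,x)'V_x(x)$, which is exactly the extra term on the second line of (6.34); evaluated at the maximizing $c = \alpha_c(x)$, the minimizer coincides with $\alpha_h(x)$ in (6.33).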

In light of this connection between robust control and risk-sensitive control, it is not surprising that the penalty preference ordering that we developed in section 6.5 is equivalent to a risk-sensitive version of the stochastic differential utility studied by Duffie and Epstein (1992). Using results from Schroder and Skiadas (1999), Skiadas (2001) has shown this formally.

The equivalence of the robustness-penalty preference order with one coming from a risk-adjustment of the continuation value obviously provides no guidance about which interpretation we should prefer. That a given preference order can be motivated in two ways does not inform us about which of them is more attractive. But in an application to asset pricing, Anderson et al. (2003) have shown how the robustness motivation would lead a calibrator to think differently about the parameter $\theta$ than the risk motivation.19

19 The link between the preference orders would vanish if we limited the concerns about model misspecification to some components of the vector Brownian motion. In Wang's 2001 axiomatic treatment, the preferences are defined over both the approximating model and the family of perturbed models. Both can vary. By limiting the family of perturbed models, we can break the link with recursive utility theory.


6.8 Sequential timing protocol for a constraint formulation

Section 6.7 showed how to make penalty problem 6.5.2 recursive by adopting a sequential timing protocol. Now we show how to make the constraint problem 6.5.3 recursive. Because the value of the date zero constraint problem depends on the magnitude of the entropy constraint, we add the continuation value of entropy as a state variable. Instead of a value function $V$ that depends only on the state $x$, we use a value function $K$ that also depends on continuation entropy, denoted $r$.

An HJB equation for a constraint game

Our strategy is to use the link between the value functions for the penalty and constraint problems asserted in claims 6.5.4 and 6.5.6, then to deduce from the HJB equation (6.31) a partial differential equation that can be interpreted as the HJB equation for another zero-sum two-player game with additional states and controls. By construction, the new game has a sequential timing protocol and will have the same equilibrium outcome and representation as game (6.31). Until now, we have suppressed the dependence of $V$ on $\theta$ in our notation for the value function $V$. Because this dependence is central, we now denote it explicitly.

Another value function

Claim 6.5.4 showed how to construct the date zero value function for the constraint problem from the penalty problem via a Legendre transform. We use this same transform over time to construct a new value function $K$:

\[
K(x, r) = \max_{\theta \ge 0} V(x, \theta) - r\theta \tag{6.35}
\]

that is related to K by

\[
K(r) = K(x, r)
\]

provided that $x$ is equal to the date zero state $x_0$, $r$ is used for the initial entropy constraint, and $z = 1$. We also assume that the Bellman-Isaacs condition is satisfied, so that the inverse Legendre transform can be applied:

\[
V(x, \theta) = \min_{r \ge 0} K(x, r) + r\theta. \tag{6.36}
\]

When $K$ and $V$ are related by the Legendre transforms (6.35) and (6.36), their derivatives are closely related, if they exist. We presume the smoothness needed to compute derivatives.

The HJB equation (6.31) that we derived for $V$ held for each value of $\theta$. We consider the consequences of varying the pair $(x, \theta)$, as in the construction of $V$, or we consider varying the pair $(x, r)$, as in the construction of $K$. We have

\[
K_r = -\theta \quad \text{or} \quad V_\theta = r.
\]

For a fixed $x$, we can vary $r$ by changing $\theta$, or conversely we can vary $\theta$ by changing $r$. To construct a partial differential equation for $K$ from (6.31), we will compute derivatives with respect to $r$ that respect the constraint linking $r$ and $\theta$.

For the optimized value of r, we have

δV = δ(K + θr) = δK − δrKr, (6.37)

and

−θ(h · h2

)= Kr

(h · h2

). (6.38)

By the implicit function theorem, holding θ fixed:

∂r

∂x= −Kxr

Krr.

Next we compute the derivatives of V that enter the partial differentialequation (6.31) for V :

Vx = Kx

Vxx = Kxx +Krx∂r

∂x

= Kxx −KrxKxr

Krr. (6.39)

Page 203: Maskin ambiguity book mock 2 - New York University

6.8. Sequential timing protocol for a constraint formulation 195

Notice that

12trace [σ(c, x)′Vxx(x)σ(c, x)] =

ming

12trace

([σ(c, x)′ g

] [ Kxx(x, r) Kxr(x, r)Krx(x, r) Krr(x, r)

] [σ(c, x)g′

])(6.40)

where g is a column vector with the same dimension d as the Brownianmotion. Substituting equations (6.37), (6.38), (6.39), and (6.40) into thepartial differential equation (6.32) gives:

δK(x, r) = maxc∈C

minh,g

U(c, x) +[μ(c, x) + σ(c, x)h

]·Kx(x, r) +

(δr − h · h

2

)·Kr(x, r)

+1

2trace

([σ(c, x)′ g

] [ Kxx(x, r) Kxr(x, r)Krx(x, r) Krr(x, r)

] [σ(c, x)g′

]). (6.41)

The remainder of this section interprets zK(x, r) as a value functionfor a recursive game in which θ = θ∗ > θ is fixed over time. We havealready seen how to characterize the state evolution for the recursive penaltydifferential game associated with a fixed θ. The first-order condition for themaximization problem on the right side of (6.35) is

r = Vθ(x, θ∗). (6.42)

We view this first-order condition as determining r for a given θ∗ and x.Then formula (6.42) implies that the evolution of r is fully determined bythe equilibrium evolution of x. We refer to r as continuation entropy.

We denote the state evolution for the θ∗ differential game as:

dxt = μ∗(xt, θ∗)dt+ σ∗(xt, θ∗)dBt

Continuation entropy

We want to show that r evolves like continuation entropy. Recall formula(6.27) for the relative entropy of a nonnegative martingale:

R∗(z) .= E

∫ ∞

0

exp(−δt)zt|ht|22dt.

Define a date t conditional counterpart as follows:

R∗t (z) = E

[∫ ∞

0

exp(−δu)(zt+uzt

)|ht+u|2

2du∣∣∣Ft

],

Page 204: Maskin ambiguity book mock 2 - New York University

196 Chapter 6. Robust Control and Model Misspecification

provided that zt > 0 and define R∗t (z) to be zero otherwise. This family of

random variables induces the following recursion for ε > 0:

ztR∗t (z) = exp(−δε)E

[zt+εR∗

t+ε(z)∣∣∣Ft

]+E

[∫ ε

0

exp(−δu)zt+u|ht+u|2

2du∣∣∣Ft

].

Since ztR∗t (z) is in the form of a risk neutral value of an asset with future

dividend zt+uht+u·ht+u

2, its local mean or drift has the familiar formula:

δztR∗t (z)− zt

|ht|22.

To defend an interpretation of rt as continuation entropy, we need to verifythat this drift restriction is satisfied for rt = R∗

t (z). Write the evolution forrt as:

drt = μr(xt)dt+ σr(xt) · dBt,

and recall thatdzt = ztht · dBt.

Using Ito’s formula for the drift of ztrt, the restriction that we want toverify is:

zμr(x) + zσr(x) · h = δzr − z|h|22. (6.43)

Given formula (6.42) and Ito’s differential formula for a smooth functionof a diffusion process, we have

μr(x) = Vθx(x, θ∗) · μ∗(x, θ∗) +

1

2trace [σ(c, x)′Vθxx(x)σ(c, x)]

andσr(x) = Vθx(x, θ

∗)σ∗(x, θ∗).

Recall that the worst case ht is given by

ht = − 1

θ∗σ∗(xt, θ∗)′Vx(xt, θ∗)

and thus|ht|22

=

(1

2θ∗2

)Vx(x)

′σ(c, x)σ(c, x)′Vx(x).

Restriction (6.43) can be verified by substituting our formulas for rt, ht, μrand σr. The resulting equation is equivalent to that obtained by differentiat-ing the HJB equation (6.34) with respect to θ, justifying our interpretationof rt as a continuation entropy.

Page 205: Maskin ambiguity book mock 2 - New York University

6.8. Sequential timing protocol for a constraint formulation 197

Minimizing continuation entropy

Having defended a specific construction of continuation entropy that sup-ports a constant value of θ, we now describe a differential game that makesentropy an endogenous state variable. To formulate that game, we considerthe inverse Legendre transform (6.36) from which we construct V from Kby minimizing r. In the recursive version of the constraint game, the statevariable rt is the continuation entropy that at t remains available to allocateacross states at future dates. At date t, continuation entropy is allocatedvia the minimization suggested by the inverse Legendre transform. We re-strict the minimizing player to allocate future rt across states that can berealized with positive probability, conditional on date t information.

Two state example

Before presenting the continuous-time formulation, consider a two-periodexample. Suppose that two states can be realized at date t+ 1, namely ω1

and ω2. Each state has probability one-half under an approximating model.The minimizing agent distorts these probabilities by assigning probabilitypt to state ω1. The contribution to entropy coming from the distortion of

the probabilities is the discrete state analogue of∫log

(dqtdq0t

)dqt, namely,

I(pt) = pt log pt + (1− pt) log(1− pt) + log 2.

The minimizing player also chooses continuation entropies for each ofthe two states that can be realized next period. Continuation entropies arediscounted and averaged according to the distorted probabilities, so thatwe have:

rt = I(pt) + exp(−δ) [ptrt+1(ω1) + (1− pt)rt+1(ω2)] . (6.44)

Let Ut denote the current period utility for an exogenously given processfor ct, and let Vt+1(ω, θ) denote the next period value given state ω. Thisfunction is concave in θ. Construct Vt via backward induction:

Vt(θ) = min0≤pt+1≤1

Ut + θIt(pt)

+ exp(−δ) [ptVt+1(ω1, θ) + (1− pt)Vt+1(ω2, θ)] (6.45)

Compute the Legendre transforms:

Kt(r) = maxθ≥0

Vt(θ)− θr

Page 206: Maskin ambiguity book mock 2 - New York University

198 Chapter 6. Robust Control and Model Misspecification

Kt+1(r, ω) = maxθ≥0

Vt+1(θ, ω)− θr

for ω = ω1, ω2. Given θ∗, let rt be the solution to the inverse Legendretransform:

Vt(θ∗) = min

r≥0Kt(r) + θ∗r.

Similarly, let rt+1(ω) be the solution to

Vt+1(ω, θ∗) = min

r≥0Kt+1(ω, r) + θ∗r.

Substitute the inverse Legendre transforms into the simplified HJB equation(6.45):

Vt(θ∗) = min

0≤pt≤1Ut + θ∗It(pt)

+ exp(−δ)(pt

[minr1≥0

Kt+1(ω1, r1) + θ∗r1

]+ (1− pt)

[minr2≥0

Kt+1(ω2, r2) + θ∗r2

])= min

0≤pt≤1,r1≥0,r2≥0Ut + θ∗ (It(pt) + exp(−δ) [ptr1 + (1− pt)r2])

+ exp(−δ) [ptKt+1(ω1, r1) + (1− pt)Kt+1(ω2, r2)] .

Thus,

Kt(rt) = Vt(θ∗)− θ∗rt

= min0≤pt≤1,r1≥0,r2≥0

maxθ≥0

Ut + θ (It(pt) + exp(−δ) [ptr1 + (1− pt)r2]− rt)

+ exp(−δ) [ptKt+1(ω1, r1) + (1− pt)Kt+1(ω2, r2)] .

Since the solution is θ = θ∗ > 0, at this value of θ the entropy constraint(6.44) must be satisfied and

Kt(rt) = min0≤pt≤1,r1≥0,r2≥0

Ut+exp(−δ) [ptKt+1(ω1, r1) + (1− pt)Kt+1(ω2, r2)] .

By construction, the solution for rj is rt+1(ωj) defined earlier. The recur-sive implementation presumes that the continuation entropies rt+1(ωj) arechosen at date t prior to the realization of ω.

When we allow the decision maker to choose the control ct, this con-struction requires that we can freely change orders of maximization andminimization as in our previous analysis.

Page 207: Maskin ambiguity book mock 2 - New York University

6.9. A recursive multiple priors formulation 199

Continuous-time formulation

In a continuous-time formulation, we allocate the stochastic differential ofentropy subject to the constraint that the current entropy is rt. The incre-ment to r is determined via the stochastic differential equation:20

drt =

(δrt −

|ht|22

− gt · ht)dt+ gt · dBt.

This evolution for r implies that

d(ztrt) =

(δztrt − zt

|ht|22

)dt + zt(rtht + gt)dBt

which has the requisite drift to interpret rt as continuation entropy.The minimizing agent not only picks ht but also chooses gt to allocate

entropy over the next instant. The process g thus becomes a control vec-tor for allocating continuation entropy across the various future states. Informulating the continuous-time game, we thus add a state rt and a con-trol gt. With these added states, the differential game has a value functionzK(x, r), where K satisfies the HJB equation (6.41).

We have deduced this new partial differential equation partly to helpus understand senses in which the constrained problem is or is not timeconsistent. Since rt evolves as an exact function of xt, it is more efficient tocompute V and to use this value function to infer the optimal control lawand the implied state evolution. In the next section, however, we use therecursive constraint formulation to address some interesting issues raisedby Epstein and Schneider (2003b).

6.9 A recursive multiple priors formulation

Taking continuation entropy as a state variable is a convenient way to re-strict the models entertained at time t by the minimizing player in therecursive version of constraint game. Suppose instead that at date t thedecision maker retains the date zero family of probability models withoutimposing additional restrictions or freezing a state variable like continuation

20The process is stopped if rt hits the zero boundary. Once zero is hit, the continuationentropy remains at zero. In many circumstances, the zero boundary will never be hit.

Page 208: Maskin ambiguity book mock 2 - New York University

200 Chapter 6. Robust Control and Model Misspecification

entropy. That would allow the minimizing decision maker at date t to reas-sign probabilities of events that have already been realized and events thatcannot possibly be realized given current information. The minimizing deci-sion maker would take advantage of that opportunity to alter the worst-caseprobability distribution at date t in a way that makes the specification ofprior probability distributions of section 6.5 induce dynamic inconsistencyin a sense formalized by Epstein and Schneider (2003b). They characterizefamilies of prior distributions that satisfy a rectangularity criterion thatshields the decision maker from what they call “dynamic inconsistency”.In this section, we discuss how Epstein and Schneider’s notion of dynamicinconsistency would apply to our setting, show that their proposal for at-taining consistency by minimally enlarging an original set of priors to berectangular will not work for us, then propose our own way of making priorsrectangular in a way that leaves the rest of our analysis intact.

Consider the martingale formulation of the date zero entropy constraint:

E

∫ ∞

0

exp(−δu)zu|hu|22

du ≤ η (6.46)

wheredzt = ztht · dBt.

The component of entropy that constrains our date t decision-maker is:

rt =1

ztE

(∫ ∞

0

zt+u|ht+u|2

2du|Ft

)in states in which zt > 0. We rewrite (6.46) as:

E

∫ t

0

exp(−δu)zu|hu|22

du+ exp(−δt)Eztrt ≤ η.

To illuminate the nature of dynamic inconsistency, we begin by notingthat the time 0 constraint imposes essentially no restriction on rt. Considera date t event that has probability strictly less than one conditioned on datezero information. Let y be a random variable that is equal to zero on theevent and equal to the reciprocal of the probability on the complement of theevent. Thus, y is a nonnegative, bounded random variable with expectationequal to unity. Construct a zu = E(y|Fu). Then z is a bounded nonnegativemartingale with finite entropy and zu = y for u ≥ t. In particular zt is zero

Page 209: Maskin ambiguity book mock 2 - New York University

6.9. A recursive multiple priors formulation 201

on the date t event used to construct y. By shrinking the date t event tohave arbitrarily small probability, we can bring the bound arbitrarily closeto unity and entropy arbitrarily close to zero. Thus, for date t events withsufficiently small probability, the entropy constraint can be satisfied withoutrestricting the magnitude of rt on these events. This exercise isolates ajustification for using continuation entropy as a state variable inheritedat date t: fixing it eliminates any gains from readjusting distortions ofprobabilities assigned to uncertainties that were resolved in previous timeperiods

Epstein and Schneider’s proposal works poorly for us

If we insist on withdrawing an endogenous state variable like rt, dynamicconsistency can still be obtained by imposing restrictions on ht for alterna-tive dates and states. For instance, we could impose prior restrictions inthe separable form

|ht|22

≤ ft

for each event realization and date t. Such a restriction is rectangular inthe sense of Epstein and Schneider (2003b). To preserve a subjective notionof prior distributions, Epstein and Schneider (2003b) advocate making anoriginal set of priors rectangular by enlarging it to the least extent pos-sible. They suggest this approach in conjunction with entropy measuresof the type used here, as well as other possible specifications. However,an ft specified on any event that occurs with probability less than one isessentially unrestricted by the date zero entropy constraint. In continuoustime, this follows because zero measure is assigned to any calendar date, butit also carries over to discrete time because continuation entropy remainsunrestricted if we can adjust earlier distortions. Thus, for our applica-tion Epstein and Schneider’s way of achieving a rectangular specificationthrough the mechanism fails to restrict prior distributions in an interestingway.21

21While Epstein and Schneider (2003b) advocate rectangularization even for entropy-based constraints, they do not claim that it always gives rise to interesting restrictions.

Page 210: Maskin ambiguity book mock 2 - New York University

202 Chapter 6. Robust Control and Model Misspecification

A better way to impose rectangularity

There is an alternative way to make the priors rectangular that has trivialconsequences for our analysis. The basic idea is to separate the choice of ftfrom the choice of ht, while imposing |ht|2

2≤ ft. We then imagine that the

process {ft : t ≥ 0} is chosen ex ante and adhered to. Conditioned on thatcommitment, the resulting problem has the recursive structure advocatedby Epstein and Schneider (2003b). The ability to exchange maximizationand minimization is central to our construction.

From section 6.5, recall that

K(r) = maxθ≥0

V (θ)− θr.

We now rewrite the inner problem on the right side for a fixed θ. Take theBellman-Isaacs condition

zV (x) = minh∈H

maxc∈C

E

∫ ∞

0

exp(−δt)[ztU(ct, xt) + θzt

|ht|22

]dt

with the evolution equations

dxt = μ(ct, xt)dt+ σ(ct, xt)dBt

dzt = ztht · dBt. (6.47)

Decompose the entropy constraint as:

η = E

∫ ∞

0

exp(−δt)ztftdt

where

ft =|ht|22.

Rewrite the objective of the optimization problem as

minf∈F

minh∈H, |ht|2

2≤ft

maxc∈C

E

∫ ∞

0

exp(−δt) [ztU(ct, xt) + θztft] dt

subject to (6.47). In this formulation, F is the set of progressively measur-able scalar processes that are nonnegative. We entertain the inequality

|ht|22

≤ ft

Page 211: Maskin ambiguity book mock 2 - New York University

6.9. A recursive multiple priors formulation 203

but in fact this constraint will always bind for the a priori optimized choiceof f . The inner problem can now be written as:

minh∈H, |ht|2

2≤ft

maxc∈C

E

∫ ∞

0

exp(−δt)ztU(ct, xt)dt

subject to (6.47). Provided that we can change orders of the min andmax, this inner problem will have a rectangular specification of alternativemodels and be dynamically consistent in the sense of Epstein and Schneider(2003b).

Although this construction avoids introducing continuation entropy asan endogenous state variable, it assumes a commitment to a process f thatis computed ex ante by solving what is essentially a static optimizationproblem. That is, f is chosen by exploring its consequences for a dynamicimplementation of the form envisioned by Epstein and Schneider (2003b)and is not simply part of the exogenously ex ante given set of beliefs of thedecision maker.22 We can, however, imagine that at date zero, the decisionmaker accepts the sequence {ft : t ≥ 0} as part of a conditional preferenceformulation. This decision maker then has preferences of a type envisionedby Epstein and Schneider (2003b).

While their concern about dynamic consistency leads Epstein and Schnei-der to express doubts about commitments to a constraint based on contin-uation entropy, they do not examine what could lead a decision-maker tocommit to a particular rectangular set of beliefs embodied in a specificationof f .23 If multiple priors truly are a statement of a decision maker’s sub-jective beliefs, we think it is not appropriate to dismiss such beliefs on thegrounds of dynamic inconsistency. Repairing that inconsistency throughthe enlargements necessary to induce rectangularity reduces the content ofthe original set of prior beliefs. In our context, this enlargement is immense,too immense to be interesting to us.

The reservations that we have expressed about the substantive impor-tance of rectangularity notwithstanding, we agree that Epstein and Schnei-der’s discussion of dynamic consistency opens up a useful discussion of the

22Notice that the Bayesian interpretation is also a trivial special case of a recursivemultiple priors model.

23Furthermore, an analogous skeptical observation about commitment pertains toBayesian decision theory, where the decision maker commits to a specific prior distribu-tion.

Page 212: Maskin ambiguity book mock 2 - New York University

204 Chapter 6. Robust Control and Model Misspecification

alternative possible forms of commitment that allow us to create dynamicmodels with multiple priors.24

6.10 Concluding remarks

Empirical studies in macroeconomics and finance typically assume a uniqueand explicitly specified dynamic statistical model. Concerns about modelmisspecification recognize that an unknown member of a set of alternativemodels might govern the data. But how should one specify those alterna-tive models? With one parameter that measures the size of the set, robustcontrol theory parsimoniously stipulates a set of alternative models withrich dynamics.25 Robust control theory leaves those models only vaguelyspecified and obtains them by perturbing the decision maker’s approximat-ing model to let shocks feed back on state variables arbitrarily. Amongother possibilities, this allows the approximating model to miss the serialcorrelation of exogenous variables and the dynamics of how those exogenousvariables impinge on endogenous state variables.

We have delineated some formal connections that exist between vari-ous formulations of robust control theory and the max-min expected utilitytheory of Gilboa and Schmeidler (1989). Their theory deduces a set of mod-els from a decision maker’s underlying preferences over risky outcomes. Intheir theory, none of the decision maker’s models has the special status thatthe approximating model has in robust control theory. To put Gilboa andSchmeidler’s theory to work, an applied economist would have to impute aset of models to the decision makers in his model (unlike the situation inrational expectations models, where the decision maker’s model would bean equilibrium outcome). A practical attraction of robust control theoryis the way it allows an economist to take a single approximating modeland from it manufacture a set of models that express a decision maker’s

24In the second to last paragraph of their page 16, Epstein and Schneider (2003b)seem also to express reservations about their enlargement procedure.

25Other formulations of robust control put more structure on the class of alterna-tive models and this can have important consequences for decisions. See Onatski andWilliams (2003b) for one more structured formulation and Hansen and Sargent (2006b)for another. By including a hidden state vector and appropriately decomposing the den-sity of next period’s observables conditional on a history of signals, Hansen and Sargent(2006b) extend the approach of this paper to allow a decision maker to have multiplemodels and to seek robustness to the specification of a prior over them.

Page 213: Maskin ambiguity book mock 2 - New York University

6.10. Concluding remarks 205

ambiguity. Hansen and Sargent (2003c) exploit this feature of robust con-trol to construct a multiple agent model in which a common approximatingmodel plays the role that an equilibrium common model does in a rationalexpectations model.

We have used a particular notion of discounted entropy as a statisticalmeasure of the discrepancy between models. It directs our decision maker’sattention to models that are absolutely continuous with respect to his ap-proximating model over finite intervals, but not absolutely continuous withrespect to it over an infinite interval. This specification keeps the decisionmaker concerned about models that can be difficult to distinguish fromthe approximating model from a continuous record of observations on thestate vector of a finite length. Via statistical detection error probabilities,Anderson et al. (2003) show how the penalty parameter or the constraintparameter in the robust control problems can be used to identify a setof perturbed models that are difficult to distinguish statistically from theapproximating model in light of a continuous record of finite length T ofobservations on xt.

Finally, we have made extensive use of martingales to represent per-turbed models. Hansen and Sargent (2005b) and Hansen and Sargent(2006b) Tom XXXXXX: update and fix in reference list. use such mar-tingales to pose robust control and estimation problems in Markov decisionproblems where some of the state variables are hidden.

Page 214: Maskin ambiguity book mock 2 - New York University

206 Chapter 6. Robust Control and Model Misspecification

Appendix 6.A Cast of characters

This appendix sets out the following list of objects and conventions thatmake repeated appearances in our analysis.

1. Probability spaces

a) A probability space associated with a Brownian motion B thatis used to define an approximating model and a set of alternativemodels.

b) A probability space over continuous functions of time induced byhistory of the Brownian motion B in part 1a and used to definean approximating model.

c) A set of alternative probability distributions induced by B andused to define a set alternative models.

2. Ordinary (single-agent) control problems

a) A benchmark optimal control problem defined on space 1a.

b) A benchmark decision problem defined on the probability spaceinduced by B.

c) A risk-sensitive problem defined on space 1a.

d) Alternative Bayesian (benchmark problems) defined on the spacesin 1c.

3. Representations of alternative models

a) As nonnegative martingales with unit expectation the probabil-ity space 1a.

b) As alternative induced distributions as in 1c.

4. Restrictions on sets of alternative models

a) An implicit restriction embedded in a nonnegative penalty pa-rameter θ.

b) A constraint on relative entropy, a measure of model discrepancy.

5. Representations of relative entropy

Page 215: Maskin ambiguity book mock 2 - New York University

6.B. Discounted entropy 207

a) Time 0 (nonsequential): discounted expected log likelihood ratioof an approximating model q0 to an alternative model q drawnfrom the set 1c.

b) Time 0 (nonsequential): a function of a martingale defined onthe probability space 1a.

c) Recursive: as a solution of either of a differential equations de-fined in terms of B.

6. Timing protocols for zero-sum two-player games

a) Exchange of order of choice for maximizing and minimizing play-ers.

b) Under two-sided commitment at t = 0, both players choose pro-cesses for all time t ≥ 0.

c) With lack of commitment on two sides, both players choose se-quentially.

Appendix 6.B Discounted entropy

Let Q be the set of all distributions that are absolutely continuous withrespect to q0 over finite intervals. This set is convex. For q ∈ Q, let

R(q).= δ

∫ ∞

0

exp(−δt)[∫

log

(dqtdq0t

)dqt

]dt,

which may be infinite for some q ∈ Q.

Claim 6.B.1. R is convex on Q.

Proof. Since q ∈ Q is absolutely continuous with respect to q0 over finiteintervals, we can construct likelihood ratios for finite histories at any calen-dar date t. Form Ω = Ω∗×R

+ where R+ is the nonnegative real line. Formthe corresponding sigma algebra F as the smallest sigma algebra containingF∗t ⊗ Bt for any t where Bt is the collection of Borel sets in [0, t]; and form

q as the product measure q with an exponential distribution with densityδ exp(−δt) for any q ∈ Q. Notice that q is a probability distribution andR(q) is the relative entropy of q with respect to q0:

R(q) =

∫log

(dq

dq0

)dq.

Page 216: Maskin ambiguity book mock 2 - New York University

208 Chapter 6. Robust Control and Model Misspecification

Form two measures q1 and q2 as the product of q1 and q2 with an exponentialdistribution with parameter δ. Then a convex combination of q1 and q2 isgiven by the product of the corresponding convex combination of q1 and q2

with the same exponential distribution. Relative entropy is well known tobe convex in the probability measure q (e.g., see Dupuis and Ellis (1997)),and hence R is convex in q.

Recall that associated with any probability measure q that is absolutelycontinuous with respect to q0 over finite intervals is a nonnegative mar-tingale z defined on (Ω,F , P ) with a unit expectation. This martingalesatisfies the integral equation:

zt = 1 +

∫ t

0

zuhudBu. (6.48)

Claim 6.B.2. Suppose that qt is absolutely continuous with respect to q0tfor all 0 < t < ∞. Let z be the corresponding nonnegative martingale on(Ω,F , P ). Then

Ezt1{∫ t0|hs|2ds<∞} = 1.

Moreover, ∫log

dqtdq0t

dqt =1

2E

∫ t

0

zs|hs|2ds.

Proof. Consider first the claim that

Ezt1{∫ t0 |hs|2ds<∞} = 1,

The martingale z satisfies the stochastic differential equation:

dzt = zthtdBt

with initial condition z0 = 1. Construct an increasing sequence of stoppingtimes {τn : n ≥ 1} where τn

.= inf{t : zt = 1

n} and let τ = limn τn. The

limiting stopping time can be infinite. Then zt = 0 for t ≥ τ and

zt = zt∧τ

Form:znt = zt∧τn

Page 217: Maskin ambiguity book mock 2 - New York University

6.B. Discounted entropy 209

which is nonnegative martingale satisfying:

dznt = znt hnt dBt

where hnt = ht if 0 < t < τn and hnt = 0 if t ≥ τn. Then

P

{∫ t

0

|hns |2(zns )2ds <∞}

= 1

and hence

P

{∫ t

0

|hns |2ds <∞}

= P

{∫ t∧τn

0

|hs|2ds <∞}

= 1.

Taking limits as n gets large,

P

{∫ t∧τ

0

|hs|2ds <∞}

= 1.

While it is possible that τ < ∞ with positive P probability, as argued byKabanov et al. (1979)∫

zt1{τ<∞}dP =

∫{zt=0, t<∞}

ztdP = 0.

Therefore,

Ezt1{∫ t0|hs|2ds<∞} = Ezt1{∫ t∧τ

0|hs|2ds<∞,τ=∞} + Ezt1{∫ t

0|hs|2ds<∞,τ<∞} = 1.

Consider next the claim that∫log

dqtdq0t

dqt = E

∫ t

0

zs|hs|2ds.

We first suppose that

E

∫ t

0

zs|hs|2ds <∞. (6.49)

We will subsequently show that this condition is satisfied when R(q) <∞.Use the martingale z to construct a new probability measure P on (Ω,F).Then from the Girsanov Theorem [see Theorem 6.2 of Liptser and Shiryaev(2000)]

Bt = Bt −∫ t

0

hsds

Page 218: Maskin ambiguity book mock 2 - New York University

210 Chapter 6. Robust Control and Model Misspecification

is a Brownian motion with respect to the filtration {Ft : t ≥ 0}. Moreover,

E

∫ t

0

|hs|2ds = E

∫ t

0

zs|hs|2ds.

Write

log zt =

∫ t

0

hs · dBs −1

2

∫ t

0

|hs|2ds =∫ t

0

hs · dBs +1

2

∫ t

0

|hs|2ds.

which is well defined under the P probability. Moreover,

E

∫ t

0

hs · dBs = 0

and hence

E log zt =1

2E

∫ t

0

|hs|2ds =1

2E

∫ t

0

zs|hs|2ds,

which is the desired equality. In particular, we have proved that∫log dqt

dq0tdqt

is finite.Next we suppose that ∫

logdqtdq0t

dqt <∞,

which will hold when R(q) < ∞. Then Lemma 2.6 from Follmer (1985)insures that

1

2E

∫ t

0

|hs|2ds ≤∫

logdqtdq0t

dqt.

Follmer’s result is directly applicable because∫log dqt

dq0tdqt is the same as the

relative entropy of Pt with respect to Pt where Pt is the restriction of P toevents in Ft and Pt is defined similarly. As a consequence, (6.49) is satisfiedand the desired equality follows from our previous argument.

Finally, notice that 12E∫ t0|hs|2ds is infinite if, and only if

∫log dqt

dq0tdqt is

infinite.

Claim 6.B.3. For q ∈ Q, let z be the nonnegative martingale associatedwith q and let h be the progressively measurable process satisfying (6.48).Then

R(q) =1

2E

[∫ ∞

0

exp(−δt)zt|ht|2dt]

Page 219: Maskin ambiguity book mock 2 - New York University

6.C. Absolute continuity of solutions 211

Proof. The conclusion follows from:

R(q) = δ

∫ ∞

0

exp(−δt)∫

log

(dqtdq0t

)dqtdt

2E

[∫ ∞

0

exp(−δt)∫ t

0

zu|hu|2dudt]

=1

2E

[∫ ∞

0

exp(−δt)zt|ht|2dt]

where the second equality follows from 6.B.2 and the third from integratingby parts.

This justifies our definition of entropy for nonnegative martingales:

R(z) =1

2E

[∫ ∞

0

exp(−δt)zt|ht|2dt].

Appendix 6.C Absolute continuity of

solutions

In this appendix we show how to verify that the solution for z from themartingale robust control problem is in a fact a martingale and not just alocal martingale. Our approach to studying absolute continuity and verify-ing that the Markov perfect equilibrium z is a martingale differs from theperhaps more familiar use of a Novikov or Kazamaki condition.26

Consider two distinct stochastic differential equations. One is the Markovsolution to the penalty robust control problem.

dx∗t = μ∗(x∗t )dt+ σ∗(x∗t )dBt

dz∗t = z∗t αh(x∗t )dBt. (6.50)

where μ∗(x) = μ(αc(x), x), σ∗(x) = σ(αc(x), x) and where αc and αh are

the solutions from the penalty robust control problem. Notice that theequation for the evolution of x∗t is autonomous (it does not depend on z∗t ).Let a strong solution to this equation system be:

x∗t = Φ∗t (B).

26We construct two well defined Markov processes and verify absolute continuity.Application of the Novikov or Kazamaki conditions entails imposing extra moment con-ditions on the objects used to construct the local martingale z.

Page 220: Maskin ambiguity book mock 2 - New York University

212 Chapter 6. Robust Control and Model Misspecification

Consider a second stochastic differential equation:

dxt = μ∗(xt)dt+ σ∗(xt)[αh(xt)dt+ dBt

](6.51)

In verifying that this state equation has a solution, we are free to examineweak solutions provided that Ft is generated by current and past xt and Bdoes not generate a larger filtration than x.

The equilibrium outcomes x∗ and x for the two stochastic differentialequations thus induce two distributions for x. We next study how thesedistributions are related. We will discuss how to check for absolute conti-nuity along finite intervals for induced distributions associated with thesemodels. When the models satisfy absolute continuity over finite intervals,it will automatically follow that the equilibrium process z∗ is a martingale.

Comparing models of B

We propose the following method to transform a strong solution to (6.50)into a possibly weak solution to (6.51). Begin with a Brownian motion Bdefined on a probability space with probability measure P . Consider therecursive solution:

xt = Φ∗t (B)

Bt = Bt +

∫ t

0

αh(xu)du.

We look for solutions in which Ft is generated by current and past valuesof B (not B). We call this a recursion because B is itself constructed frompast values of B and B. The stochastic differential equation associated withthis recursion is (6.51).

To establish the absolute continuity of the distribution induced by Bwith respect to Weiner measure q0 it suffices to verify that for each t

E

∫ t

0

|αh(xu)|2du <∞

and hence

P

{∫ t

0

|αh(xu)|2du <∞}

= 1. (6.52)

Page 221: Maskin ambiguity book mock 2 - New York University

6.C. Absolute continuity of solutions 213

It follows from Theorem 7.5 of Liptser and Shiryaev (2000) that the proba-bility distribution induced by B under the solution to the perturbed prob-lem is absolutely continuous with respect to Wiener measure q0. To exploredirectly the weaker relation (6.52) further, recall that

αh(x) = −1

θσ∗(x)′Vx(x).

Provided that σ∗ and Vx are continuous in x and that x does not explodein finite time, this relation follows immediately.

Comparing generators

Another strategy for checking absolute continuity is to follow the approachof Kunita (1969), who provides characterizations of absolute continuity andequivalence of Markov models through restrictions on the generators of theprocesses. Since the models for x∗ and x are Markov diffusion processes,we can apply these characterizations provided that we include B as partof the state vector. Abstracting from boundary behavior, Kunita (1969)requires a common diffusion matrix, which can be singular. The differencesin the drift vector are restricted to be in the range of the common diffusionmatrix. These restrictions are satisfied in our application.

Verifying z∗ is a martingale

We apply our demonstration of absolute continuity to reconsider the supermartingale z∗. Let κt denote the Radon-Nikodym derivative for the twomodels of B. Conjecture that

z∗t = κt(B).

By construction, z∗ is a nonnegative martingale defined on (Ω,F , P ). More-over, it is the unique solution to the stochastic differential equation (6.50)subject to the initial condition z∗0 = 1. See Theorem 7.6 of Liptser andShiryaev (2000).

Page 222: Maskin ambiguity book mock 2 - New York University

214 Chapter 6. Robust Control and Model Misspecification

Appendix 6.D Three ways to verify

Bellman-Isaacs condition

This appendix describes three alternative conditions that are sufficient toverify the Bellman-Isaacs condition embraced in Assumption 6.7.1.27 Theability to exchange orders of extremization in the recursive game impliesthat the orders of extremization can also be exchanged in the nonsequentialgame, as required in Assumption 6.5.5. As we shall now see, the exchangeof order of extremization asserted in Assumption 6.7.1 can often be verifiedwithout knowing the value function S.

No binding inequality restrictions

Suppose that there are no binding inequality restrictions on c. Then ajustification for Assumption 6.7.1 can emerge from the first-order conditionsfor c and h. Define

χ(c, h, x).= U(c, x) +

θ

2h · h +

[μ(c, x) + σ(c, x)h

]· Sx(x)

+1

2trace [σ(c, x)′Sxx(x)σ(c, x)] ,(6.53)

and suppose that χ is continuously differentiable in c. First, find a Markovperfect equilibrium by solving:

∂χ

∂c(c∗, h∗, x) = 0

∂χ

∂h(c∗, h∗, x) = 0.

In particular, the first-order conditions for h are:

∂χ

∂h(c∗, h∗, x) = θh∗ + σ(c∗, x)′Sx(x) = 0.

If a unique solution exists and if it suffices for extremization, the Bellman-Isaacs condition is satisfied. This follows from the “chain rule.” Thus,

27Fleming and Souganidis (1989) show that the freedom to exchange orders of maxi-mization and minimization guarantees that equilibria of the nonsequential (i.e., choicesunder mutual commitment at date 0) and the recursive games (i.e., sequential choicesby both agents) coincide.

Page 223: Maskin ambiguity book mock 2 - New York University

6.D. Three ways to verify Bellman-Isaacs condition 215

suppose that the minimizing player goes first and computes h as a functionof x and c:

h∗ = −1

θσ(c, x)′Sx(x) (6.54)

Then the first-order conditions for the max player selecting c as a functionof x are:

∂χ

∂c+∂h

∂c

′∂χ∂h

= 0

where ∂h∂c

can be computed from the reaction function (6.54). Notice thatthe first-order conditions for the maximizing player are satisfied at theMarkov perfect equilibrium. A similar argument can be made if the maxi-mizing player chooses first.

Separability

Consider next the case in which σ does not depend on the control. In thiscase the decision problems for c and h separate. For instance, from (6.54),we see that h does not react to c in the minimization of h conditionedon c. Even with binding constraints on c, the Bellman-Isaacs condition(Assumption 6.7.1) is satisfied, provided that a solution exists for c.

Convexity

A third approach that uses results of Fan (1952) and Fan (1953) is based onthe global shape properties of the objective. When we can reduce the choiceset C to be a compact subset of a linear space, Fan (1952) can apply. Fan(1952) also requires that the set of conditional minimizers and maximizersbe convex. We know from formula (6.54) that the minimizers of χ(c, ·, x)form a singleton set, which is convex for each c and x.28 Suppose alsothat the set of maximizers of χ(·, h, x) is non-empty and convex for eachh and x.29 Then again the Bellman-Isaacs condition (Assumption 6.7.1) issatisfied. Finally Fan (1953) does not require that the set C be a subset of

28 Notice that provided C is compact, we can use (6.54) to specify a compact set thatcontains the entire family of minimizers for each c in C and a given x.

29See Ekeland and Turnbull (1983) for a discussion of continuous time, deterministiccontrol problems when the set of minimizers is not convex. They show that sometimesit is optimal to chatter between different controls as a way to imitate convexification incontinuous time.

Page 224: Maskin ambiguity book mock 2 - New York University

216 Chapter 6. Robust Control and Model Misspecification

a linear space, but instead requires that χ(·, h, x) be concave. By relaxingthe linear space structure we can achieve compactness by adding points (saythe point ∞) to the control set, provided that we can extend χ(·, h, x) tobe upper semi-continuous. The extended control space must be a compactHausdorff space. Provided that the additional points are not attained inoptimization, we can apply Fan (1953) to verify Assumption 6.7.1.30

Appendix 6.E Recursive Stackelberg game

and Bayesian problem

Recursive version of a Stackelberg game

We first change the timing protocol for decision-making, moving from theMarkov perfect equilibrium that gives rise to a value function V to a datezero Stackelberg equilibrium with value function N . In the matrix manipu-lations that follow, state vectors and gradient vectors are treated as columnvectors when they are pre-multiplied by matrices.

The value function V solves:

δV (x) = maxc∈C

minhU(c, x) +

θ

2h · h +

[μ(c, x) + σ(c, x)h

]· Vx(x)

+1

2trace [σ(c, x)′Vxx(x)σ(c, x)]

Associated with this value function are the first-order conditions for thecontrols:

θh + σ(c, x)′ · Vx(x) = 0∂

∂c

(U(c, x) +

[μ(c, x) + σ(c, x)h

]· Vx(x) +

1

2trace [σ(c, x)′Vxx(x)σ(c, x)]

)= 0.

Solving these first-order conditions gives the control laws ht = α(xt) andct = αc(xt). Define μ∗ and σ∗ such that the states evolve according to

dxt = μ∗(xt)dt+ σ∗(xt)dBt

30Apply Theorem 2 of Fan (1953) to −χ(·, ·, x). This theorem does not require com-pactness of the choice set for h, only of the choice set for c. The theorem also doesnot require attainment when optimization is over the noncompact choice set. In ourapplication, we can verify attainment directly.

Page 225: Maskin ambiguity book mock 2 - New York University

6.E. Recursive Stackelberg game and Bayesian problem 217

after the two optimal controls are imposed. Associated with this recursiverepresentation are processes h and c that can also be depicted as functionsof the history of the underlying Brownian motion B.

When the Bellman-Isaacs condition is satisfied, Fleming and Souganidis(1989) provide a formal justification for an equivalent date zero Stackelbergsolution in which the minimizing agent announces a decision process {ht :t ≥ 0} and the maximizing agent reacts by maximizing with respect to{ct : t ≥ 0}. We seek a recursive representation of this solution by using abig X, little x formulation. Posit a worst-case process for Xt of the form:

dXt = μ∗(Xt)dt+ σ∗(Xt) [αh(Xt)dt+ dBt] .

This big X process is designed so that it produces the same process forht = αh(Xt) that is implied by the Markov perfect equilibrium associatedwith the value function V when X0 = x0.

The big X process cannot be influenced by the maximizing agent, butlittle x can:

dxt = μ(ct, xt)dt+ σ(ct, xt) [αh(Xt)dt+ dBt] .

Combining the two state evolution equations, we have a Markov controlproblem faced by the maximizing agent. It gives rise to a value function Nsatisfying a HJB equation:

δN(x, X) = maxc∈C

U(c, x) + μ(c, x) ·Nx(x, X) + μ∗(x) ·NX(X, X)

+1

2trace

([σ(c, x)′ σ∗(X)′

] [Nxx(x, X) NxX(x, X)NXx(x, X) NXX(x, X)

] [σ(c, x)σ∗(X)

])(6.55)

+αh(X) · σ(c, x)′Nx(x, X) + αh(X) · σ∗(X)′NX(x, X)

2αh(X) · αh(X).

We want the outcome of this optimization problem to produce the samestochastic process for c (ct as a function of current and past values of theBrownian motion Bt) provided that X0 = x0. For this to happen, the valuefunctions V and N must be closely related. Specifically,

Nx(x, X)|X=x = Vx(x)NX(x, X)|X=x = 0. (6.56)

Page 226: Maskin ambiguity book mock 2 - New York University

218 Chapter 6. Robust Control and Model Misspecification

The first restriction equates the co-state on little x with the implied co-statefrom the Markov perfect equilibrium along the equilibrium trajectory. Thesecond restriction implies that the co-state vector for big X is zero alongthis same trajectory.

These restrictions on the first derivative, imply restrictions on the secondderivative. Consider a perturbation of the form:

x+ rν, X + rν

for some scalar r and some direction ν. The directions that interest us arethose in the range of σ∗(X), which are the directions that the Brownianmotion can move the state to. Since (6.56) holds,

Nxx(x, X)ν +NxX(x, X)ν|X=x = Vxx(x)νNXx(x, X)ν +NXX(x, X)ν|X=x = 0.

From HJB (6.55), we could find a control law that expresses c as afunction of x and X. We are only concerned, however, with c evaluatedin the restricted domain x = X. Given the presumed restrictions on thefirst derivative and the derived restrictions on the second derivative, we canshow that c = αc(x) satisfies the first-order conditions for c provided onthis restricted domain.

Changing the objective

The value function for a Bayesian problem does not include a penalty term.In the recursive representation of the date zero Stackelberg problem, thepenalty term is expressed completely in terms of big X. We now show howto adjust the value function L by solving a Lyapunov equation.

The function that we wish to compute solves:

L(X) =θ

2E

∫ ∞

0

exp(−δt)|αh(Xt)|2dt

subject to

dXt = μ∗(Xt)dt+ σ∗(Xt) [αh(Xt)dt+ dBt] .

where X0 = X.

Page 227: Maskin ambiguity book mock 2 - New York University

6.E. Recursive Stackelberg game and Bayesian problem 219

The value function L for this problem solves:

δL(X) =θ

2αh(X) · αh(X) + μ∗(X) · LX(X)

+1

2trace

[σ∗(X)′LXX(X)σ∗(X)

]+ αh(X) · σ∗(X)′Lx(X).(6.57)

Bayesian value function

To construct a Bayesian value function we form:

W (x, X) = N(x, X)− L(X).

Given equations (6.55) and (6.57), the separable structure of W impliesthat it satisfies the HJB equation:

δW (x, X) = maxc∈C

U(c, x) + μ(c, x) ·Wx(x, X) + μ∗(x) ·WX(X, X)

+1

2trace

([σ(c, x)′ σ∗(X)′

] [Wxx(x, X) WxX(x, X)WXx(x, X) WXX(x, X)

] [σ(c, x)σ∗(X)

])+αh(X) · σ(c, x)′Wx(x, X) + αh(X) · σ∗(X)′WX(x, X)

Then zW (x, X) the value function for the stochastic control problem:

zW (x, X) = E

∫ ∞

0

exp(−δt)ztU(ct, xt)dt

and evolution:

dxt = μ(ct, xt)dt+ σ(ct, xt)dBt

dzt = ztαh(Xt)dBt

dXt = μ∗(Xt)dt+ σ∗(Xt)dBt

where z0 = z, x0 = x and X0 = X . To interpret the nonnegative z asinducing a change in probability, we initialize z0 at unity.

Also, W (x, X, θ) is the value function for a control problem with dis-counted objective:

W (x, X) = maxc∈C

E

∫ ∞

0

exp(−δt)U(ct, xt)dt

Page 228: Maskin ambiguity book mock 2 - New York University

220 Chapter 6. Robust Control and Model Misspecification

and evolution:

dxt = μ(ct, xt)dt+ σ(ct, xt)[αh(Xt)dt+ dBt

]dXt = μ∗(Xt)dt+ σ∗(Xt)

[αh(Xt)dt+ dBt

].

This value function is constructed using a perturbed specification wherea Brownian increment dBt is replaced by an increment αh(Xt)dt + dBt

with a drift distortion that depends only on the uncontrollable state X.This perturbation is justified via the Girsanov Theorem, provided that weentertain a weak solution to the stochastic differential equation governingthe state evolution equation.

Page 229: Maskin ambiguity book mock 2 - New York University

Chapter 7

Doubts or Variability?

1

Abstract

Reinterpreting most of the market price of risk as a price ofmodel uncertainty eradicates a link between asset prices and mea-sures of the welfare costs of aggregate fluctuations that was proposedby Hansen et al. (1999), Tallarini (2000a), and Alvarez and Jermann(2004). Prices of model uncertainty contain information about thebenefits of removing model uncertainty, not the consumption fluc-tuations that Lucas (1987a, 2003) studied. A max-min expectedutility theory lets us reinterpret Tallarini’s risk-aversion parameteras measuring a representative consumer’s doubts about the modelspecification. We use model detection instead of risk-aversion ex-periments to calibrate that parameter. Plausible values of detectionerror probabilities give prices of model uncertainty that approachthe Hansen and Jagannathan (1991) bounds. Fixed detection errorprobabilities give rise to virtually identical asset prices as well as vir-tually identical costs of model uncertainty for Tallarini’s two modelsof consumption growth.

1Coauthored with Francisco Barillas. We thank David Backus, Marc Giannoni, Syd-ney Ludvigson, Fabio Maccheroni, Massimo Marinacci, Monika Piazzesi, Martin Schnei-der, Nancy Stokey, Tomasz Strzalecki, and two referees for very helpful comments onearlier drafts.

221

Page 230: Maskin ambiguity book mock 2 - New York University

222 Chapter 7. Doubts or Variability?

Key words: Risk aversion, model misspecification, robustness, marketprice of risk, equity premium puzzle, risk-free rate puzzle, detection errorprobability, costs of model uncertainty.

Page 231: Maskin ambiguity book mock 2 - New York University

7.1. Introduction 223

No one has found risk aversion parameters of 50 or 100 in the di-versification of individual portfolios, in the level of insurance de-ductibles, in the wage premiums associated with occupations withhigh earnings risk, or in the revenues raised by state-operatedlotteries. It would be good to have the equity premium resolved,but I think we need to look beyond high estimates of risk aversionto do it.

Robert Lucas, Jr., January 10, 2003

7.1 Introduction

In terms of their effects on asset prices and real quantities, can plausibleconcerns about robustness to model misspecification substitute for the im-plausibly large risk aversion parameters that bother Lucas in the aboveepigraph?2 To answer this question, we reinterpret an elegant graph of Tal-larini (2000a) by transforming Tallarini’s CRRA risk-aversion parameter γinto a parameter that measures a set of probability models for consump-tion growth that are difficult to distinguish and over which a representativeconsumer seeks a robust valuation. To restrict γ, we use detection errorprobabilities that measure the proximity of probability distributions, as ad-vocated by Anderson et al. (2003) and Hansen and Sargent (2008b, ch. 9),and we recast Tallarini’s key diagram in terms of model detection errorprobabilities. A connection between model detection probabilities and aprice of model uncertainty transcends specific approximating models. Thatprice compensates the representative consumer for bearing model uncer-tainty, not risk.3 We show that modest amounts of model uncertainty can

2Hansen et al. (1999) describe a locus of (β, γ) pairs that are observationally equiva-lent for consumption and investment in linear-quadratic production economies, but thatnevertheless imply different prices for risky assets. This finding is the basis of what Lu-cas (2003, p. 7) calls Tallarini’s (2000) finding of “an astonishing separation of quantityand asset price determination . . ..” Although this paper studies only pure endowmenteconomies, the analytical observational equivalence result of Hansen et al. (1999) andthe approximate version of that result in Tallarini (2000a) make us confident that thetheoretical values of quantities that will emerge from production economies will not beaffected by alterations in the risk-sensitivity parameter γ that we use to measure concernsabout model misspecification.

3See Anderson et al. (2003).

Page 232: Maskin ambiguity book mock 2 - New York University

224 Chapter 7. Doubts or Variability?

substitute for large amounts of risk aversion in terms of choices and effectson asset prices.

Reinterpreting risk prices as model uncertainty prices makes them un-informative about the benefits of reducing aggregate fluctuations as definedby Lucas (1987a, 2003) and implies that those costs were mismeasured byTallarini (2000a) and Alvarez and Jermann (2004), who used connectionsbetween risk prices and costs of fluctuations that had been set forth byHansen et al. (1999). To elaborate on this observation, we fashion a mentalexperiment about the welfare benefits from removing model uncertainty, anexperiment that differs conceptually from Lucas’s, but about which pricesof model uncertainty are informative.

Section 7.2 reviews Hansen and Jagannathan’s 1991 characterization ofthe equity premium and risk free rate puzzles that emerge with time sep-arable CRRA preferences. Section 7.3 describes the stochastic setting andpreferences that express aversion to model uncertainty. Sections 7.4 and7.5 describe how Tallarini (2000a) used a preference of Kreps and Porteus(1978c) to find values of a risk-aversion parameters γ, one for a random walkmodel of log consumption, another for a trend stationary model, that canexplain the risk-free rate puzzle of Weil (1990). But those values of γ areso high that they provoked Lucas’s skeptical remark. Section 7.6 defines aconcern about robustness to alternative models of log consumption growththat are constructed using martingale perturbations that Hansen and Sar-gent (2005b, 2007b) and Hansen et al. (2006b) used to represent alternativespecifications that are statistically near an approximating model. We thenreinterpret Tallarini’s utility recursion in terms of some max-min expectedutility formulations in which the minimization operator expresses an agent’sdoubts about his stochastic specification. We describe senses in which riskaversion and model uncertainty aversion are and are not observationallyequivalent. Section 7.7 reinterprets Tallarini’s findings in terms of modeluncertainty aversion. We use detection error probabilities to justify select-ing different context-specific values of γ for the two approximating modelsof log consumption growth used by Tallarini, then modify Tallarini’s keyfigure by recasting it in terms of detection probabilities. The figure revealsa link between the market price of model uncertainty and the detectionerror probability that transcends differences in the stochastic specificationof the representative consumer’s approximating model for log consumptiongrowth, an outcome that could be anticipated from the tight relationshipbetween the market price of model uncertainty and a large deviation bound

Page 233: Maskin ambiguity book mock 2 - New York University

7.2. The equity premium and risk-free rate puzzles 225

on detection error probabilities derived by Anderson et al. (2003). Section7.8 measures the benefits from a hypothetical experiment that removesmodel uncertainty, explains how this experiment differs from the mentalexperiment that underlies calculations of the benefits of reducing aggregatefluctuations by Lucas (1987a, 2003), Tallarini (2000a), and Alvarez and Jer-mann (2004), and tells how the benefits of eliminating model uncertaintyare reflected in the market price of model uncertainty. Section 7.9 discusseswhether and how someone can learn not to fear model misspecification.Section 7.10 concludes.

Our analysis highlights how random shocks confront consumers withmodel ambiguity by obscuring differences among statistical models. Hererandom shocks discomfort consumers and investors in ways that Lucas(1987a, 2003), Tallarini (2000a), and Alvarez and Jermann (2004) ignored.

7.2 The equity premium and risk-free rate

puzzles

Along with Tallarini (2000a), we begin with a characterization of the risk-free rate and equity premium puzzles by Hansen and Jagannathan (1991).The random variable mt+1,t is said to be a stochastic discount factor if itconfirms the following equation for the time t price pt of a one-period payoffyt+1:

pt = Et (mt+1,tyt+1) ,

where Et denotes the mathematical expectation conditioned on date t in-formation. For time-separable CRRA preferences with discount factor β,mt+1,t is simply the marginal rate of substitution:

mt+1,t = β

(Ct+1

Ct

)−γ(7.1)

where Ct is consumption and γ is the coefficient of relative risk aversion.The reciprocal of the gross one-period risk-free rate is

1

Rft

= Et [mt+1,t] = Et

(Ct+1

Ct

)−γ]. (7.2)

Let ξt+1 be the one-period excess return on a security or portfolio of securi-ties. Using the definition of a conditional covariance and a Cauchy-Schwarz

Page 234: Maskin ambiguity book mock 2 - New York University

226 Chapter 7. Doubts or Variability?

inequality, Hansen and Jagannathan (1991) deduce the following bound:

|Et [ξt+1]|σt (ξt+1)

≤ σt (mt,t+1)

Et [mt,t+1]. (7.3)

The left-hand side of (7.3) is the Sharpe ratio. The maximum Sharpe ratio iscommonly called the market price of risk. It is the slope of the (conditional)mean-standard deviation frontier and is the increase in the expected rate ofreturn needed to compensate an investor for bearing a unit increase in thestandard deviation of return along the efficient frontier. The Sharpe ratiois bounded by the right-hand side of relation (7.3). With complete marketsthe bound is attained.

A counterpart to this inequality uses unconditional expectations andresults in Hansen and Jagannathan’s 1991 statement of the equity premiumpuzzle.4 To reconcile formula (7.1) with measures of the market price ofrisk extracted from data on asset returns and prices only (like those in table7.1) requires a value of γ so high that it elicits doubts like those expressedby Lucas (2003) in the epigraph starting this paper.5 6

But another failure isolated by the X’s in figure 7.1 motivated Tallar-ini (2000a). The figure plots an unconditional version of the Hansen andJagannathan bound (the parabola) as well as the X’s, which are pairs ofunconditional mean E(m) and the unconditional standard deviation σ(m)implied by equations (7.1) and (7.2) for different values of γ.7 The fig-ure addresses whether values of γ can be found for which the associated

4Conditioning information is brought in through the back door by scaling payoffs byvariables in the conditioning information set and using an expanded collection of payoffswith prices that are one on average in place of gross returns.

5The “market price of risk” reported in 7.1 ignores conditioning information, but itremains a valid lower bound on the ratio of of the volatility of the intertemporal marginalrate of substitution relative to its mean.

6Precursors to Hansen and Jagannathan (1991) are contained in Shiller (1982) and acomment there by Hansen. Shiller deduced an inequality from the marginal distributionsof consumption and returns while Hansen and Jagannathan (1991) use the marginaldistribution for the stochastic discount factor and the joint distribution for returns.Hansen and Jagannathan (1991) thus featured maximal Sharpe ratios in their volatilitybounds.

7For CRRA time-separable preferences, formulas for E(m) and σ(m)/E(m)

are, first, for the random walk model, E [m] = β exp[γ(−μ+

σ2εγ2

)]and

σ(m)E[m] =

{exp

[σ2εγ

2]− 1

} 12 and, second, for the trend stationary model E [m] =

β exp[γ(−μ+

σ2εγ2

(1 + 1−ρ

1+ρ

))]and σ(m)

E[m] ={exp

[σ2εγ

2(1 + 1−ρ

1+ρ

)]− 1

} 12

.

Page 235: Maskin ambiguity book mock 2 - New York University

7.2. The equity premium and risk-free rate puzzles 227

Table 7.1: Sample moments from quarterly U.S. data 1948:1-2006:4, re is thereal quarterly return on the value-weighted NYSE portfolio and rf is the realquarterly return on the three month Treasury bill. Returns are measured inpercent per quarter.

Return Mean Std. dev.re 2.27 7.68rf 0.32 0.61

re − rf 1.95 7.67Market price of risk: 0.25

0.8 0.85 0.9 0.95 10

0.05

0.1

0.15

0.2

0.25

0.3

E(m)

σ(m

)

HJ boundsCRRARWTS

Figure 7.1: Solid line: Hansen-Jagannathan volatility bound for quarterly re-turns on the value-weighted NYSE and Treasury Bill, 1948-2006. Circles: Meanand standard deviation for intertemporal marginal rate of substitution gener-ated by Epstein-Zin preferences with random walk consumption. Pluses: Meanand standard deviation for stochastic discount factor generated by Epstein-Zinpreferences with trend stationary consumption. Crosses: Mean and standard de-viation for intertemporal marginal rate of substitution for CRRA time separablepreferences. The coefficient of relative risk aversion, γ takes on the values 1, 5,10, 15, 20, 25, 30, 35, 40, 45, 50 and the discount factor β=0.995.


(E(m), σ(m)) pairs are inside the unconditional version of the Hansen and Jagannathan bounds. The line of X's shows that high values of γ deliver high market prices of risk but also push the reciprocal of the risk-free rate down and therefore away from the Hansen and Jagannathan bounds. This is the risk-free rate puzzle of Weil (1990).

In section 7.5 we shall explain the loci of circles and crosses in figure 7.1. These loci depict how, by adopting a recursive preference specification, Tallarini (2000a) found values of γ that pushed (E(m), σ(m)) pairs inside the Hansen and Jagannathan bounds. That achievement registered as a mixed success because the values of γ that work are so high that, when interpreted as measures of risk aversion, they provoked Lucas's skeptical remark.

7.3 The choice setting

To prepare the way for Tallarini's findings and our reinterpretation of them, it is convenient to introduce the following objects in terms of which alternative decision theories are cast.

Shocks and consumption plans

We let c_t = log C_t, x_0 be an initial state vector, ε^t = [ε_t, ε_{t−1}, …, ε_1], and {ε_{t+1}, t ≥ 0} be a sequence of random shocks with conditional densities π_{t+1}(·|ε^t, x_0) and an implied joint distribution Π_∞(·|x_0) over the entire sequence. Let C be a set of consumption plans C^∞ whose time t elements C_t are measurable functions of (ε^t, x_0). Soon we shall consider a restricted class of consumption plans in C that have the following recursive representation:
\[ x_{t+1} = A x_t + B\varepsilon_{t+1} \]
\[ c_t = H x_t \tag{7.4} \]
where x_t is an n×1 state vector, ε_{t+1} is an m×1 shock, and the eigenvalues of A are bounded in modulus by 1/√β. Representation (7.4) implies that the time t element of a consumption plan can be expressed as the following function of x_0 and the history of shocks:
\[ c_t = H\bigl(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\bigr) + HA^t x_0. \tag{7.5} \]


We let C(A,B,H; x_0) denote the set of consumption plans with representation (7.4)-(7.5).

In this paper, we use one of the following two consumption plans that Tallarini finds fit post-WWII U.S. per capita consumption well:

1. geometric random walk:
\[ c_t = c_0 + t\mu + \sigma_\varepsilon(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1), \quad t \geq 1 \tag{7.6} \]
where
\[ \varepsilon_t \sim \pi_t(\cdot|\varepsilon^{t-1}, x_0) = \pi(\cdot) \sim \mathcal{N}(0,1); \]

2. geometric trend stationary:8
\[ c_t = \rho^t c_0 + \mu t + (1-\rho^t)\zeta + \sigma_\varepsilon(\varepsilon_t + \rho\varepsilon_{t-1} + \cdots + \rho^{t-1}\varepsilon_1), \quad t \geq 1 \tag{7.7} \]
where
\[ \varepsilon_t \sim \pi_t(\cdot|\varepsilon^{t-1}, x_0) = \pi(\cdot) \sim \mathcal{N}(0,1). \]

Parameter estimates

We estimated both consumption processes using U.S. quarterly real consumption growth per capita from 1948:2-2006:4.9 Maximum likelihood point estimates are summarized in table 7.2. We shall use these point estimates as inputs into the calculations below.
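A minimal simulation sketch, not in the original text, of the two consumption plans (7.6) and (7.7) at the table 7.2 point estimates; the seed, sample length, and initial condition below are illustrative choices:

```python
# Simulate the geometric random walk (7.6) and geometric trend stationary (7.7)
# log consumption plans at the Table 7.2 maximum likelihood point estimates.
import numpy as np

rng = np.random.default_rng(0)
T = 236                       # quarters, roughly the 1948:2-2006:4 sample length
mu_rw, sig = 0.00495, 0.0050  # random walk drift and innovation std. dev.
mu_ts, rho, zeta = 0.00418, 0.980, -4.48   # trend stationary parameters
c0 = zeta                     # illustrative initial log consumption

eps = rng.standard_normal(T)

# random walk, equation (7.6): c_t = c_0 + t*mu + sig*(eps_1 + ... + eps_t)
c_rw = c0 + mu_rw * np.arange(1, T + 1) + sig * np.cumsum(eps)

# trend stationary, footnote 8 recursion: c_t = zeta + mu*t + z_t,
# z_t = rho*z_{t-1} + sig*eps_t, with z_0 = c_0 - zeta
z = np.empty(T)
z_prev = c0 - zeta
for t in range(T):
    z_prev = rho * z_prev + sig * eps[t]
    z[t] = z_prev
c_ts = zeta + mu_ts * np.arange(1, T + 1) + z

print(c_rw[-1], c_ts[-1])     # simulated terminal log consumption levels
```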

Overview of Agents I, II, III, and IV

The preferences of our four types of agent over consumption plans C^∞ ∈ C are defined in terms of the following sets of objects:

Type I agent (Kreps-Porteus-Epstein-Zin-Tallarini):

(i) a discount factor β ∈ (0, 1); (ii) an intertemporal elasticity of substitution IES equal to unity; (iii) a risk aversion parameter γ ≥ 1; and (iv) a

8 The recursive version of our trend stationary model is c_t = ζ + μt + z_t, z_t = ρz_{t−1} + σ_ε ε_t.
9 Consumption is measured as real personal consumption expenditures on nondurable goods and services and is deflated by its implicit chain price deflator. We use the same deflator to deflate asset prices. We use the civilian noninstitutional population 16 years and older to construct per capita series.


Table 7.2: Estimates from quarterly U.S. data 1948:2-2006:4. Standard errors in parentheses.

Parameter   Random Walk        Trend Stationary
μ           0.00495 (0.0003)   0.00418 (0.0003)
σ_ε         0.0050 (0.0002)    0.0050 (0.0002)
ρ           -                  0.980 (0.010)
ζ           -                  -4.48 (0.08)

conditional density π_{t+1}(·|ε^t, x_0) = π(·) for ε_{t+1} and an implied joint distribution Π_∞(·|x_0).

Type II agent (ambiguity averse Hansen and Sargent (2001a) multiplier preferences):

(i) a discount factor β ∈ (0, 1); (ii) an intertemporal elasticity of substitution IES equal to unity; (iii) a risk aversion parameter equal to 1; (iv) a conditional density π_{t+1}(·|ε^t, x_0) for ε_{t+1} and an implied joint distribution Π_∞(·|x_0); and (v) a parameter θ that penalizes the entropy associated with a minimizing player's perturbation of Π_∞ relative to the iid, standard normal benchmark.

Type III agent (ambiguity averse Hansen and Sargent (2001a) constraint preferences):

(i) a discount factor β ∈ (0, 1); (ii) an intertemporal elasticity of substitution IES equal to unity; (iii) a risk aversion parameter equal to 1; (iv) a conditional density π_{t+1}(·|ε^t, x_0) for ε_{t+1} and an implied joint distribution Π_∞(·|x_0); and (v) a parameter η that measures the discounted relative entropy, relative to an iid, standard normal benchmark, of the perturbations to Π_∞(·|x_0) allowable to a minimizing player.

Type IV agent (pessimistic ex post Bayesian):


(i) a discount factor β ∈ (0, 1); (ii) an IES = 1; (iii) a risk-aversion parameter of 1; and (iv) a unique pessimistic joint probability distribution Π_∞(·|x_0, θ).

Our reinterpretation of Tallarini's quantitative findings as well as our mental experiment that measures the costs of model specification uncertainty both hinge on the following behavioral implications of these alternative preference specifications. Agents I and II are observationally equivalent in the strong sense that they have identical preferences over C. Agents III and IV are observationally equivalent with I and II in the more restricted, but for us still very useful, sense that their valuations of risky assets coincide at an exogenous endowment process that we take to be the approximating model for the type II and type III representative agents.

7.4 A type I agent: Kreps-Porteus-Epstein-Zin-Tallarini

Our type I agent has preferences over C that are defined via the value function recursion
\[ \log V_t = (1-\beta)c_t + \beta\log\bigl[E_t\,(V_{t+1})^{1-\gamma}\bigr]^{\frac{1}{1-\gamma}} \tag{7.8} \]

where γ ≥ 1. This is the risk-sensitive recursion of Hansen and Sargent (1995b, 2007b) that, for a logarithmic period utility function, Tallarini (2000a) interpreted to be a case of the recursive preference specification of Epstein and Zin (1989a, 1991) in which the intertemporal elasticity of substitution IES is fixed at unity and the atemporal coefficient of relative risk aversion is γ.

Formulas for continuation values

To represent asset prices, we first compute continuation values for the two alternative consumption processes. Define U_t ≡ log V_t/(1−β) and
\[ \theta = \frac{-1}{(1-\beta)(1-\gamma)}. \tag{7.9} \]
Then
\[ U_t = c_t - \beta\theta\log E_t\Bigl[\exp\Bigl(\frac{-U_{t+1}}{\theta}\Bigr)\Bigr]. \tag{7.10} \]


When γ = 1 (or θ = +∞), recursion (7.10) becomes the standard discounted expected utility recursion
\[ U_t = c_t + \beta E_t U_{t+1}. \]

For consumption processes C^∞ ∈ C(A,B,H; x_0) associated with different specifications of (A,B,H) in (7.4), recursion (7.10) implies the following Bellman equation:10
\[ U(x) = c - \beta\theta\log\int\exp\Bigl[\frac{-U(Ax+B\varepsilon)}{\theta}\Bigr]\pi(\varepsilon)\,d\varepsilon. \tag{7.11} \]

For the random walk specification, the solution of (7.10) is
\[ U_t = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{2\theta(1-\beta)}\Bigr] + \frac{1}{1-\beta}c_t. \tag{7.12} \]

For the trend stationary model the solution of the value function recursion (7.10) is:
\[ U_t = \frac{\beta\zeta(1-\rho)}{(1-\beta)(1-\beta\rho)} + \frac{\beta\mu}{(1-\beta)^2} - \frac{\sigma_\varepsilon^2\beta}{2\theta(1-\beta)(1-\beta\rho)^2} + \frac{\mu\beta(1-\rho)}{(1-\beta\rho)(1-\beta)}\,t + \frac{1}{1-\beta\rho}c_t. \tag{7.13} \]
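A sketch, not part of the original text, that checks the closed-form solution (7.12) of recursion (7.10) for the random walk model by Gauss-Hermite quadrature; the parameter values are the table 7.2 estimates together with β = 0.995 and γ = 50 used elsewhere in the chapter:

```python
# Verify the closed-form value function (7.12) against recursion (7.10)
# for the random walk model, using Gauss-Hermite quadrature for E_t[.].
import numpy as np

beta, mu, sig, gamma = 0.995, 0.00495, 0.0050, 50.0
theta = -1.0 / ((1 - beta) * (1 - gamma))        # equation (7.9)

def U_closed(c):
    # equation (7.12)
    return beta / (1 - beta)**2 * (mu - sig**2 / (2 * theta * (1 - beta))) + c / (1 - beta)

# probabilists' Gauss-Hermite nodes/weights for E[f(eps)], eps ~ N(0,1)
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / weights.sum()

c = 0.3                                          # arbitrary current log consumption
c_next = c + mu + sig * nodes                    # random walk law of motion
rhs = c - beta * theta * np.log(weights @ np.exp(-U_closed(c_next) / theta))
print(rhs, U_closed(c))                          # the two numbers agree
```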

Pricing implications

Arrow securities are defined relative to a measure used to integrate over states. We first use the Lebesgue measure. For a type I representative agent economy, the price of a one-period Arrow security is
\[ \Bigl(\frac{\beta C_t}{C_{t+1}(\varepsilon^*)}\Bigr)\Bigl(\frac{\exp\bigl[-U_{t+1}(\varepsilon^*)/\theta\bigr]}{\int\exp\bigl[-U_{t+1}(\varepsilon)/\theta\bigr]\,d\pi(\varepsilon)}\Bigr)\pi(\varepsilon^*). \]

We abuse notation to avoid proliferation. The notation C_{t+1} is the random variable that denotes consumption at date t+1. Recognizing that the new information available at date t+1 is captured by the random vector ε_{t+1},

10 The notation U_t denotes the continuation value realized at date t for a consumption plan. In Bellman equation (7.11), we use U(·) to denote the value function as a function of the Markov state.


C_{t+1}(·) explicitly represents the dependence of C_{t+1} on ε_{t+1}, and similarly for U_{t+1}(·).

Instead of using the Lebesgue measure to integrate over states, the stochastic discount factor uses the underlying conditional probability distribution. As a consequence, the stochastic discount factor is given by
\[ m_{t+1,t} = \Bigl(\frac{\beta C_t}{C_{t+1}}\Bigr)\Bigl(\frac{\exp(-U_{t+1}/\theta)}{E_t\bigl[\exp(-U_{t+1}/\theta)\bigr]}\Bigr). \tag{7.14} \]
The change in the reference measure for integration leads to π being omitted in (7.14).

In conjunction with a solution for the value function, for example, (7.12) or (7.13), equation (7.14) shows how the standard stochastic discount factor (βC_t/C_{t+1}) associated with time separable logarithmic utility is altered by a potentially volatile function of the continuation value U_{t+1} when the risk aversion parameter γ ≡ 1 + 1/((1−β)θ) > 1 (see equation (7.9)). For a type I agent, γ is a risk aversion parameter that differs from the reciprocal of the IES when γ > 1.

7.5 A type I agent economy with high risk aversion attains HJ bound

For the random walk and trend stationary consumption processes, Tallarini computed the following formulas for E(m) and σ(m) for what we call a type I agent. For the random walk model that follows, the mean and volatility of m_{t+1} conditioned on date t information are constant. For the trend stationary model they depend on conditioning information, and we will work with their unconditional counterparts.

• Random walk model:
\[ E[m] = \beta\exp\Bigl[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\Bigr] \tag{7.15} \]
\[ \frac{\sigma(m)}{E[m]} = \Bigl\{\exp\bigl[\sigma_\varepsilon^2\gamma^2\bigr] - 1\Bigr\}^{\frac{1}{2}} \tag{7.16} \]


• Trend stationary model:
\[ E[m] = \beta\exp\Bigl[-\mu + \frac{\sigma_\varepsilon^2}{2}\Bigl(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\Bigr)\Bigr] \tag{7.17} \]
\[ \frac{\sigma(m)}{E[m]} = \Bigl\{\exp\Bigl[\sigma_\varepsilon^2\Bigl(\Bigl\{\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\Bigr\}^2 + \frac{1-\rho}{1+\rho}\Bigr)\Bigr] - 1\Bigr\}^{\frac{1}{2}} \tag{7.18} \]

Figure 7.1 is our version of Tallarini's (2000a) key figure. It follows Tallarini in using the above formulas to plot loci of (E(m), σ(m)) pairs as the risk-aversion parameter γ varies.11 This figure chalks up a striking success for Tallarini compared to the corresponding risk-free-rate-puzzle laden X's in figure 7.1 for time separable CRRA preferences. Notice how for both specifications of the endowment process, increasing γ pushes the volatility of the stochastic discount factor upward toward the Hansen-Jagannathan bound while leaving E(m) essentially unaffected, thus avoiding the risk-free rate puzzle of Weil (1990).12

However, to approach the Hansen-Jagannathan bound Tallarini had to set the risk aversion parameter γ to very high values, 50 for the random walk model, about 250 for the trend stationary model. These high values provoked the skeptical remarks we have cited from Lucas (2003).

7.6 Reinterpretations

We respond to Lucas's reluctance to use Tallarini's findings as a source of evidence about a representative consumer's attitude about random consumption fluctuations by reinterpreting γ as a parameter that expresses model specification doubts rather than risk aversion.

11 As observed by Kocherlakota (1990a), for the random walk model, it is possible to generate the (E(m), σ(m)) pairs in figure 7.1 while sticking with the time separable CRRA model by changing β along with γ in the following way: (γ, β) = (1, 0.9950), (5, 1.0147), (10, 1.0393), (15, 1.0637), (20, 1.0881), (25, 1.1124), (30, 1.1364), (35, 1.1602), (40, 1.1838), (45, 1.2070), (50, 1.2300).

12 By comparing the formulas for E(m) in footnote 7 with formula (7.15) for E(m) for the random walk case, one sees how, by locking the IES equal to 1, formula (7.15) arrests the force in the footnote 7 equation that pushes E(m) downward as one increases γ via the term exp(−γμ). For power utility this is the dominant effect of γ on E(m) when μ is much larger than σ_ε, as it is in the data.


Language for robustness: an ‘approximating model’

To express doubts about model specification, we put multiple probability specifications on the table. To stay as close as possible to rational expectations, we work with a setting in which a representative agent has one fully specified model represented with particular A, B, H and Π_∞ in (7.4). In this paper, that model will be either the random walk model or the trend stationary model described above. We shall call this the 'approximating model' to acknowledge that the agent does not completely trust it. We express specification doubts in terms of alternative joint distributions Π that the agent contemplates assigning to the shocks ε^∞. We imagine that the agent surrounds his approximating model with a set of unspecified models that are statistically nearby (as measured by conditional relative entropy) and that he thinks might govern the data. Our type II and III agents want one value function that will somehow let them evaluate consumption plans under all of those nearby models. Before telling how they get those value functions, we first describe a mathematical formalism for representing the unspecified densities over ε^∞ that concern the agent as statistical perturbations of Π_∞(ε^∞).

Using martingales to represent probability distortions

Let the representative consumer's information set be X_t, which for us will be the history of log consumption growth rates up to date t. Random variables that are X_t measurable can be expressed as Borel measurable functions of x_0 and ε^t. Hansen and Sargent (2005b, 2007b) use a nonnegative X_t-measurable function G_t with E(G_t|x_0) = 1 to create a distorted probability measure that is absolutely continuous with respect to the probability measure over X_t generated by one of our two approximating models for log consumption growth.13 Under the original probability measure the random variable G_t is a martingale with mean 1. We can use G_t as a Radon-Nikodym derivative (i.e., a likelihood ratio) to generate a distorted measure under which the expectation of a bounded X_t-measurable random variable W_t is $\tilde{E}W_t \doteq E\,G_tW_t$. The entropy of the distortion at time t conditioned on date zero information is E(G_t log G_t|X_0).

13See Hansen et al. (2006b) for a continuous time formulation.


Recursive representations of distortions

We often factor a joint density F_{t+1} over an X_{t+1}-measurable random vector as F_{t+1} = f_{t+1}F_t, where f_{t+1} is a one-step ahead density conditioned on X_t. Following Hansen and Sargent (2005b), it is also useful to factor G_{t+1}. Form
\[ g_{t+1} = \begin{cases} \dfrac{G_{t+1}}{G_t} & \text{if } G_t > 0 \\ 1 & \text{if } G_t = 0. \end{cases} \]
Then G_{t+1} = g_{t+1}G_t and
\[ G_t = G_0\prod_{j=1}^{t} g_j. \tag{7.19} \]
The random variable G_0 is equal to unity. By construction, g_{t+1} has date t conditional expectation equal to unity. For a bounded random variable b_{t+1} that is X_{t+1}-measurable, the distorted conditional expectation implied by the martingale {G_t : t ≥ 0} is
\[ \frac{E(G_{t+1}b_{t+1}|X_t)}{E(G_{t+1}|X_t)} = \frac{E(G_{t+1}b_{t+1}|X_t)}{G_t} = E(g_{t+1}b_{t+1}|X_t) \]
provided that G_t > 0. We extend this distorted conditional expectation to a more general collection of random variables by approximating unbounded random variables with a sequence of bounded ones. For each t ≥ 0, construct the space 𝒢_{t+1} of all nonnegative, X_{t+1}-measurable random variables g_{t+1} for which E(g_{t+1}|X_t) = 1. We use g_{t+1} to represent distortions of the conditional probability distribution for X_{t+1} given X_t.
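A minimal numerical sketch, not from the original text, of the martingale construction (7.19): it builds G_t from hand-picked exponential-tilting increments g_{t+1} (the mean shift w below is an arbitrary illustrative value) and checks that G_t has mean one and that E(G_t log G_t) matches the implied relative entropy:

```python
# Build the martingale G_t from one-step increments g_{t+1} as in (7.19).
import numpy as np

rng = np.random.default_rng(1)
N, T, w = 200_000, 8, -0.3          # simulated paths, horizon, illustrative tilt

eps = rng.standard_normal((N, T))
# exponential tilting of a N(0,1) one-step density toward mean w:
# g_{t+1} = exp(w*eps_{t+1} - w**2/2), which has conditional expectation one
g = np.exp(w * eps - 0.5 * w**2)
G = np.cumprod(g, axis=1)           # G_t = prod_{j<=t} g_j, equation (7.19)

print(G[:, -1].mean())              # ~1: G_t is a mean-one martingale
print((G[:, -1] * np.log(G[:, -1])).mean(), T * 0.5 * w**2)
# Monte Carlo E[G_T log G_T] versus the implied relative entropy T*w^2/2
```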

A type II agent: ambiguity averse multiplier preferences

We represent ambiguity aversion with the multiplier preferences of Hansen and Sargent (2001a) and Hansen et al. (2006b).14 These are defined in terms of a parameter θ that penalizes the discrepancy between perturbed models

14 Maccheroni et al. (2006a,b) give an axiomatic foundation for variational preferences and describe how they express ambiguity aversion. Both multiplier and constraint preferences are special cases of variational preferences. Constraint preferences are particular instances of the multiple priors model of Gilboa and Schmeidler (1989). Strzalecki (2008b) and Cerreia-Vioglio et al. (2008) give axiomatic defenses for multiplier preferences. Hansen and Sargent (2007b) show the link between the smooth ambiguity formulation


and the approximating model and that is linked, via an application of the Lagrange multiplier theorem, to a parameter η that occurs in what we shall call the "constraint preferences" of our type III representative agent.

A type II agent's multiplier preference ordering over C^∞ ∈ C is described by
\[ \min_{\{g_{t+1}\}}\sum_{t=0}^{\infty} E\Bigl\{\beta^t G_t\bigl[c_t + \beta\theta E(g_{t+1}\log g_{t+1}\mid\varepsilon^t, x_0)\bigr]\Bigm| x_0\Bigr\} \tag{7.20} \]
where
\[ G_{t+1} = g_{t+1}G_t, \quad E[g_{t+1}\mid\varepsilon^t, x_0] = 1, \quad g_{t+1} \geq 0, \quad G_0 = 1. \tag{7.21} \]

In this paper, we restrict ourselves to studying subsets C(A,B,H; x_0) of C with typical element C^∞. For this set of consumption plans C^∞ ∈ C(A,B,H; x_0), a type II agent has a value function
\[ W(x_0) = \min_{\{g_{t+1}\}}\sum_{t=0}^{\infty} E\Bigl\{\beta^t G_t\bigl[c_t + \beta\theta E(g_{t+1}\log g_{t+1}\mid\varepsilon^t, x_0)\bigr]\Bigm| x_0\Bigr\} \tag{7.22} \]

where the minimization is subject to (7.4) and (7.21). The value function solves the following Bellman equation:
\[ GW(x) = \min_{g(\varepsilon)\geq 0} G\Bigl(c + \beta\int\bigl[g(\varepsilon)W(Ax+B\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\bigr]\pi(\varepsilon)\,d\varepsilon\Bigr). \tag{7.23} \]
Dividing by G gives
\[ W(x) = c + \beta\min_{g(\varepsilon)\geq 0}\int\bigl[g(\varepsilon)W(Ax+B\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\bigr]\pi(\varepsilon)\,d\varepsilon \]
where the minimization is subject to ∫ g(ε)π(ε) dε = 1. Solving the minimum problem and substituting the minimizer into the above equation gives the risk-sensitive recursion of Hansen and Sargent (1995b, 2007b):
\[ W(x) = c - \beta\theta\log\int\exp\Bigl[\frac{-W(Ax+B\varepsilon)}{\theta}\Bigr]\pi(\varepsilon)\,d\varepsilon. \tag{7.24} \]

of Klibanoff et al. (2005a) and multiplier preferences, while Cerreia-Vioglio et al. (2008) show that the multiplier preferences used here are the only preferences that are both variational and smooth in the sense of Klibanoff et al. (2005a). Specifically, smooth variational preferences necessarily use discrepancies between distributions measured with relative entropy.


The minimizing martingale increment is
\[ g_{t+1} = \frac{\exp\bigl(-W(Ax_t+B\varepsilon_{t+1})/\theta\bigr)}{E_t\bigl[\exp\bigl(-W(Ax_t+B\varepsilon_{t+1})/\theta\bigr)\bigr]}. \tag{7.25} \]
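A sketch, not part of the original text, that verifies the variational identity behind (7.23)-(7.25) on a small discrete state space: the entropy-penalized minimization over g delivers the exponentially tilted minimizer (7.25) and the risk-sensitive value −θ log ∫ exp(−W/θ)π. The grid size, θ, and the randomly drawn π and W are illustrative:

```python
# Check: min_g sum_i pi_i [g_i*W_i + theta*g_i*log g_i] s.t. sum_i pi_i*g_i = 1
# has minimizer g_i proportional to exp(-W_i/theta) and minimized value
# -theta * log sum_i pi_i * exp(-W_i/theta).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, theta = 6, 2.0
pi = rng.dirichlet(np.ones(n))          # approximating probabilities
W = rng.normal(size=n)                  # continuation values on the grid

def objective(g):
    return pi @ (g * W + theta * g * np.log(g))

cons = {"type": "eq", "fun": lambda g: pi @ g - 1.0}
res = minimize(objective, np.ones(n), method="SLSQP",
               bounds=[(1e-8, None)] * n, constraints=cons)

g_star = np.exp(-W / theta) / (pi @ np.exp(-W / theta))   # counterpart of (7.25)
print(np.max(np.abs(res.x - g_star)))                     # ~0
print(res.fun, -theta * np.log(pi @ np.exp(-W / theta)))  # the two values agree
```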

Types I and II are observationally equivalent

Notice that equations (7.11) and (7.24) imply that
\[ W(x) \equiv U(x). \tag{7.26} \]
Therefore, agents I and II have identical preferences over elements of C^∞ ∈ C(A,B,H; x_0).15 In this strong sense, they are observationally equivalent, but the interpretation of θ differs for type I and type II agents. For a type I agent, θ(γ) ≡ −1/((1−β)(1−γ)) is a measure of risk aversion. For a type II agent, θ indicates his fear of model misspecification, as measured by how much the minimizing agent gets penalized for raising entropy.

A type III agent: ambiguity averse constraint preferences

Hansen and Sargent (2001a, 2005b) and Hansen et al. (2006b) describe constraint preferences that are directly related to the multiple priors model of Gilboa and Schmeidler (1989). Here a primitive object is a set of probability densities that we attribute to the representative type III consumer. We follow our earlier work by using ideas from robust control theory to construct this set of densities. In particular, we follow Hansen and Sargent (2001a, 2005b) and restrain the discounted relative entropy of perturbations to the approximating model:

\[ \beta E\Bigl[\sum_{t=0}^{\infty}\beta^t G_t E(g_{t+1}\log g_{t+1}\mid\varepsilon^t, x_0)\Bigm| x_0\Bigr] \leq \eta \tag{7.27} \]
where η ≥ 0 measures the size of an entropy ball surrounding the distribution Π_∞(ε^∞|x_0). Given a set of models within an entropy ball η > 0, constraint preferences over C^∞ ∈ C are ordered by
\[ \min_{\{g_{t+1}\}}\sum_{t=0}^{\infty} E\bigl[\beta^t G_t c_t\bigm| x_0\bigr] \tag{7.28} \]

15 It can also be established that they have identical preferences for C^∞ ∈ C.


where the minimization is subject to G_{t+1} = g_{t+1}G_t and G_0 = 1. If we restrict C^∞ to be in C(A,B,H; x_0), a type III agent has a value function
\[ J(x_0) = \min_{\{g_{t+1}\}}\sum_{t=0}^{\infty} E\bigl[\beta^t G_t c_t\bigm| x_0\bigr] \tag{7.29} \]
where the minimization is subject to the discounted entropy constraint (7.27) and
\[ x_{t+1} = Ax_t + B\varepsilon_{t+1} \]
\[ c_t = Hx_t, \quad x_0 \text{ given} \]
\[ G_{t+1} = g_{t+1}G_t, \quad E[g_{t+1}\mid\varepsilon^t, x_0] = 1, \quad g_{t+1} \geq 0, \quad G_0 = 1. \tag{7.30} \]

Hansen and Sargent (2001a), Hansen et al. (2006b), and Hansen and Sargent (2008b, chap. 6) describe how constraint and multiplier preferences differ and also how θ and η can be chosen to align choices and valuations along equilibrium paths of the associated two-player zero-sum games. Briefly, they show how (1) ex post, θ in the multiplier preferences can be viewed as the Lagrange multiplier on the time 0 discounted entropy constraint, and (2) the multiplier θ and continuation entropy can be chosen to align equilibrium outcomes that emerge from multiplier and constraint preferences.

A type IV agent: ex post Bayesian

A type IV agent is an ordinary expected utility agent with log preferences and a particular (distorted) joint distribution Π_∞(·|x_0) over C^∞:
\[ E_0\sum_{t=0}^{\infty}\beta^t c_t. \tag{7.31} \]
The joint distribution Π_∞(·|x_0) is the one associated with the preferences of a type II agent and so depends on θ as well as on A, B, H when we restrict C^∞ to lie within C(A,B,H; x_0). The value function for a type IV agent equals the value function J(x) for a type III agent.


Types III and IV not observationally equivalent to I or II, but . . .

While agents I and II have identical preference orderings over C(A,B,H; x_0) (and more generally over C^∞ ∈ C), they have different preference orderings than agents III and IV. Still, there is a more limited but for us very important sense in which agents of all four types look alike. It is true that for a fixed π(ε^∞|A,B,H,θ), the type IV pessimistic agent makes different choices over plans C^∞ represented in terms of matrices other than (A, B, H) than does the type I or type II agent. However, for the particular A, B, H plan and θ used to derive the worst-case joint distribution Π(ε^∞), the shadow prices of uncertain claims for a type IV agent match those for a type II agent.16 This provides an interesting perspective on what are ordinarily interpreted as prices of "risk" in settings in which no one fears model uncertainty.

Interpretation of stochastic discount factor

The same representation of the stochastic discount factor
\[ m_{t+1,t} = \Bigl(\frac{\beta C_t}{C_{t+1}}\Bigr)g(\varepsilon_{t+1}) \tag{7.32} \]
prevails for all four types of representative consumer, but the interpretation of g varies across the four types. With a type I representative consumer in the style of Tallarini (2000a), the distortion g(ε_{t+1}) is a contribution from the Kreps and Porteus (1978c) recursive utility specification that gets intermediated through continuation values; g(ε_{t+1}) deviates from being identically unity when risk aversion γ > 1 exceeds the inverse of the IES. For the max-min expected utility type II and III representative agents, g(ε_{t+1}) is the likelihood ratio that transforms the one-step conditional density π(ε_{t+1}) under the approximating model to the worst-case density that these consumers use to evaluate risky streams. The fact that g(ε_{t+1})π(ε_{t+1}) is the (unique) subjective conditional density for ε_{t+1} for a type IV ex post Bayesian representative agent means that as outside analysts we must introduce the

16 This link extends to decision problems in which one can apply the Minimax Theorem. See Hansen et al. (2006b) for more discussion.


likelihood ratio g(ε_{t+1}) into the stochastic discount factor whenever we want to use the approximating model to price assets.

Choice of data generating model

The sets of probability models that surround the approximating model and that help define the preferences of the type II and III agents are in their heads. To make empirical statements, we have to posit a data generating model. The rational expectations hypothesis assumes that there is a unique distribution, i.e., that all subjective distributions equal a presumed objective one, so that after a rational expectations model is formulated, the data generating mechanism is not an extra object to specify. But since we have put multiple probability models into the heads of our type II and III agents, and a unique pessimistic one into the head of our type IV agent, we have to make an explicit assumption about a unique data generating mechanism. We assume that the approximating model is the data generating mechanism, so the fears of model misspecification of our type II and III agents are, after all, only in their heads. Having taken this stance, we shall use the Radon-Nikodym derivative g to price the model uncertainty feared by our type II and III representative consumers.

Value functions and discounted entropy

In terms of the minimizing martingale increment we can express the value function recursion for a type II agent as
\[ W(x) = c + \beta\int\bigl[g(\varepsilon)W(Ax+B\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\bigr]\pi(\varepsilon)\,d\varepsilon. \tag{7.33} \]

By solving (7.33), we can express W(x) as the sum of two components, the first of which is the expected discounted value of C^∞ under the worst-case model, while the second is θ times discounted entropy:
\[ W(x) = J(x) + \theta N(x) \tag{7.34} \]
where
\[ J(x) = c + \beta\int\bigl[g(\varepsilon)J(Ax+B\varepsilon)\bigr]\pi(\varepsilon)\,d\varepsilon \tag{7.35} \]


and
\[ N(x) = \beta\int\bigl[g(\varepsilon)\log g(\varepsilon) + g(\varepsilon)N(Ax+B\varepsilon)\bigr]\pi(\varepsilon)\,d\varepsilon. \tag{7.36} \]
Here
\[ J(x_t) = E_t\sum_{j=0}^{\infty}\beta^j c_{t+j} \]
is the expected discounted log consumption under the worst-case joint density for C^∞ and
\[ G_t N(x_t) = G_t\beta E\Bigl[\sum_{j=0}^{\infty}\beta^j\frac{G_{t+j}}{G_t}\,E\bigl(g_{t+j+1}\log g_{t+j+1}\bigm|\varepsilon^{t+j}, x_0\bigr)\Bigm|\varepsilon^t, x_0\Bigr] \]
is continuation entropy. While W(x) is the value function for a type II agent, evidently J(x) is the value function for a type III agent as well as for a type IV agent. To evaluate W(x) and J(x), it is useful first to find the minimizing martingale increment g(ε) and then discounted entropy. We do that in the next two subsections.

Minimizing martingale increments

Using formula (7.25), we find that the minimizing martingale increment for the geometric random walk model is
\[ g_{t+1} \propto \exp\Bigl(\frac{-\sigma_\varepsilon\varepsilon_{t+1}}{(1-\beta)\theta}\Bigr). \]
The implied distorted conditional density is
\[ \tilde{\pi}(\varepsilon_{t+1}) \propto \exp\Bigl(\frac{-\varepsilon_{t+1}^2}{2}\Bigr)\exp\Bigl(\frac{-\sigma_\varepsilon\varepsilon_{t+1}}{(1-\beta)\theta}\Bigr). \]
Completing the square gives
\[ \tilde{\pi}(\varepsilon) \sim \mathcal{N}\bigl(w(\theta), 1\bigr) \tag{7.37} \]
where
\[ w(\theta) = \frac{-\sigma_\varepsilon}{(1-\beta)\theta}. \tag{7.38} \]


Pursuing an analogous calculation for the trend stationary model, we find that the worst-case conditional density again has form (7.37), where now
\[ w(\theta) = \frac{-\sigma_\varepsilon}{(1-\rho\beta)\theta}. \tag{7.39} \]
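A numerical sketch (not in the original) of the completing-the-square result (7.37)-(7.38): tilting a standard normal density by exp(−W/θ), with W linear in ε with slope σ_ε/(1−β) as in the random walk value function, shifts the mean to w(θ) and leaves the variance at one:

```python
# Verify (7.37)-(7.38) by tilting a discretized standard normal density.
import numpy as np

beta, sig, gamma = 0.995, 0.0050, 50.0
theta = -1.0 / ((1 - beta) * (1 - gamma))                # equation (7.9)

eps = np.linspace(-10, 10, 20001)
d_eps = eps[1] - eps[0]
pi = np.exp(-0.5 * eps**2) / np.sqrt(2 * np.pi)          # approximating density
tilt = np.exp(-(sig / (1 - beta)) * eps / theta)         # exp(-W/theta), dropping
pi_tilde = pi * tilt                                     # terms that do not depend on eps
pi_tilde /= pi_tilde.sum() * d_eps                       # normalize the worst-case density

mean = (eps * pi_tilde).sum() * d_eps
var = ((eps - mean)**2 * pi_tilde).sum() * d_eps
print(mean, -sig / ((1 - beta) * theta))                 # both ~ w(theta)
print(var)                                               # ~ 1
```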

Discounted entropy

When the conditional densities for ε_{t+1} under the approximating and worst-case models are π ∼ N(0,1) and \tilde{\pi} ∼ N(w(θ),1), respectively, we can compute that conditional entropy is
\[ E_t\,g_{t+1}\log g_{t+1} = \int\bigl(\log\tilde{\pi}(\varepsilon) - \log\pi(\varepsilon)\bigr)\tilde{\pi}(\varepsilon)\,d\varepsilon = \frac{1}{2}w(\theta)'w(\theta). \]
It then follows that discounted entropy becomes
\[ \beta E\Bigl[\sum_{t=0}^{\infty}\beta^t G_t E(g_{t+1}\log g_{t+1}\mid\varepsilon^t, x_0)\Bigm| x_0\Bigr] = \eta = \frac{\beta}{2(1-\beta)}w(\theta)'w(\theta). \tag{7.40} \]

Formula (7.40) gives a mapping between θ and η that allows us to set these parameters to align multiplier and constraint preferences along an exogenous endowment process. We shall use this mapping to interpret θ below. In particular, after we introduce detection error probabilities in section 7.7, we shall argue that it is more natural to fix η rather than θ when we make comparisons by altering the consumer's baseline approximating model from the random walk to the trend stationary model. For this purpose, it is useful to note that by using formulas (7.38), (7.39), and (7.40), we find that the following choices of θ's for the random walk and trend stationary models imply identical discounted entropies:17
\[ \theta_{TS} = \Bigl(\frac{\sigma_\varepsilon^{TS}}{\sigma_\varepsilon^{RW}}\Bigr)\frac{1-\beta}{1-\rho\beta}\,\theta_{RW} \tag{7.41} \]
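A small sketch, not from the original, that evaluates the discounted entropy formula (7.40) and checks the cross-model mapping (7.41) at the table 7.2 estimates; the value of θ_RW below is an arbitrary illustrative choice:

```python
# Discounted entropy (7.40) and the theta mapping (7.41) across models.
beta, rho = 0.995, 0.980
sig_rw = sig_ts = 0.0050      # Table 7.2: both estimated volatilities are 0.0050

def w_rw(theta):              # equation (7.38)
    return -sig_rw / ((1 - beta) * theta)

def w_ts(theta):              # equation (7.39)
    return -sig_ts / ((1 - beta * rho) * theta)

def eta(w):                   # equation (7.40), scalar shock
    return beta / (2 * (1 - beta)) * w * w

theta_rw = 4.0                # illustrative multiplier value
theta_ts = (sig_ts / sig_rw) * (1 - beta) / (1 - rho * beta) * theta_rw   # (7.41)
print(eta(w_rw(theta_rw)), eta(w_ts(theta_ts)))   # equal discounted entropies
```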

Value functions for random walk log consumption

Using the formula (7.38) for w(θ) from the random walk model tells us that discounted entropy is
\[ N(x) = \frac{\beta}{2(1-\beta)}\,\frac{\sigma_\varepsilon^2}{(1-\beta)^2\theta^2}. \tag{7.42} \]

17 The ratio σ_ε^{TS}/σ_ε^{RW} = 1 at the parameter values in table 7.2.


For the random walk model, we can then compute the value function for a type II agent to be
\[ W(x_t) = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_t \tag{7.43} \]
and for a type III agent to be
\[ J(x_t) = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_t, \tag{7.44} \]
so that W(x_t) = J(x_t) + θN(x_t). To interpret J(x_t) as the value function for a type III agent, we use formula (7.40) to align θ and η. We shall use these value functions to construct compensating variations in the initial condition for log consumption c_0 in an elimination-of-model-uncertainty experiment to be described in section 7.8.

Market prices of risk and model uncertainty

Hansen et al. (1999) and Hansen et al. (2002) note that the conditional standard deviation of the Radon-Nikodym derivative g(ε) is
\[ MPU = \operatorname{std}_t(g) = \bigl[\exp\bigl(w(\theta)'w(\theta)\bigr) - 1\bigr]^{\frac{1}{2}} \approx |w(\theta)|. \tag{7.45} \]
By construction E_t g = 1. We call std_t(g) the market price of model uncertainty (MPU). It can be verified that for the random walk and trend stationary models, |w(θ)| given by the above formulas comprises the lion's share of what Tallarini (2000a) interpreted as the market price of risk given by formulas (7.16) and (7.18). This is because the first difference of the log of consumption has a small conditional coefficient of variation in our data (this observation is the heart of the equity premium puzzle). Thus, formula (7.45) is a good approximation to Tallarini's formulas (7.16) and (7.18). It follows from formula (7.40) that
\[ MPU = \sqrt{\frac{2\eta(1-\beta)}{\beta}}. \]
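A sketch (not in the original text) that traces how γ maps into the worst-case mean shift w(θ), the market price of model uncertainty (7.45), and discounted entropy for the random walk model at the table 7.2 estimates with β = 0.995:

```python
# Market price of model uncertainty (7.45) and its entropy representation.
import numpy as np

beta, sig = 0.995, 0.0050

for gamma in (10, 25, 50):
    theta = -1.0 / ((1 - beta) * (1 - gamma))       # equation (7.9)
    w = -sig / ((1 - beta) * theta)                 # worst-case mean shift (7.38)
    mpu = np.sqrt(np.exp(w * w) - 1.0)              # std_t(g), equation (7.45)
    eta = beta / (2 * (1 - beta)) * w * w           # discounted entropy (7.40)
    print(f"gamma={gamma:3d}  |w|={abs(w):.3f}  MPU={mpu:.3f}  "
          f"sqrt(2*eta*(1-beta)/beta)={np.sqrt(2 * eta * (1 - beta) / beta):.3f}")
```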


Interpretation of MPU

As the slope of the mean-standard deviation frontier, the market price of risk (MPR) tells the increase in the expected return needed to compensate an investor for accepting a unit increase in the standard deviation of the return along the efficient frontier. Our type II and III consumers' worst-case beliefs encode their concerns about model misspecification. We can measure the market price of model uncertainty (MPU) in terms of how a representative investor's worst-case model distorts mean returns. When measured using the approximating model, the worst-case model's distortion in mean rates of return amplifies objects that are usually interpreted as a market price of risk. In a continuous time limit, the MPU is the maximal expected rate of return distortion in the worst-case model relative to the approximating model, per unit of standard deviation of return. For example, see Anderson et al. (2003).

7.7 Reinterpreting Tallarini

Tallarini interprets γ as a parameter measuring aversion to atemporal gambles. The quote from Lucas (2003) and the reasoning of Cochrane (1997), who applied ideas of Pratt (1964), tell why economists think that only small positive values of γ are plausible when it is interpreted as a risk-aversion parameter. The mental experiment of Pratt confronts a decision maker with choices between gambles with known probability distributions (i.e., the type of risks that the type I agent thinks he faces).

The observational equivalence between our type I and II agents means that we can just as well interpret γ as measuring the consumer's concern about model misspecification. But how should we think about plausible values of γ (or θ) when it is to be interpreted as encoding responses to gambles that involve unknown probability distributions? We answer this question by using detection error probabilities that tell how difficult it is to distinguish probability distributions on the basis of a fixed finite number of observations. These measures inspire us to argue that it is not appropriate to regard γ or θ as a parameter that remains fixed when we vary the stochastic process for consumption under the consumer's approximating model, e.g., the random walk or trend stationary model for log consumption. Instead, we shall see that it is more plausible to fix the size of the discounted


entropy ball η as we think of moving across approximating models. This is because the detection error probabilities turn out to be functions of η that vary little across the trend stationary and random walk models. Thus, our mental experiment under model uncertainty leads us to use the same values of the discounted entropy constraint η, or the implied detection error probabilities, but different values of γ, for different approximating models.

Calibrating γ using detection error probabilities

This section describes how to use Bayesian detection error probabilities to calibrate a plausible value for γ or θ when it is interpreted as a parameter measuring a representative consumer's concern about model misspecification.18 The idea is that it is plausible for agents to be concerned about models that are difficult to distinguish from one another with data sets of moderate size. We implement this idea by focusing on statistically distinguishing the approximating model (call it model A) from a worst-case model associated with a particular θ (call it model B). Imagine that before seeing any data, the agent had assigned probability .5 to both the approximating model and the worst-case model associated with θ. After seeing T observations, the representative consumer performs a likelihood ratio test for distinguishing model A from model B. If model A were correct, the likelihood ratio could be expected falsely to say that model B generated the data pA percent of the time. Similarly, if model B were correct, the likelihood ratio could be expected falsely to say that model A generated the data pB percent of the time. We weight pA and pB by the prior probabilities .5 to obtain what we call the detection error probability:
\[ p\bigl(\theta^{-1}\bigr) = \frac{1}{2}\bigl(p_A + p_B\bigr). \tag{7.46} \]

The detection error probability p(θ^{-1}) is a function of θ^{-1} because the worst-case model depends on θ. When γ = 1 (or θ^{-1} = 0, see equation (7.9)), it is easy to see that p(θ^{-1}) = .5 because then the approximating and worst-case models are identical. As we raise θ^{-1} above zero, p(θ^{-1}) falls below .5.

We use introspection to instruct us about plausible values of p(θ^{-1}) as a measure of concern about model misspecification. Thus, we think it is sensible for a decision maker to want to guard against possible misspecifications whose detection error probabilities are .2 or even less.

18Also see Anderson et al. (2003) and Hansen and Sargent (2008b).
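A minimal simulation sketch, not part of the original text, of the detection error probability (7.46) for the random walk model. Model A is the approximating model (ε ∼ N(0,1)) and model B is a worst-case model with mean shift w; the value of w below is illustrative, while T = 235 matches the quarterly sample length used in the text:

```python
# Monte Carlo estimate of the detection error probability p(1/theta), eq. (7.46).
import numpy as np

rng = np.random.default_rng(3)
T, n_sim = 235, 100_000
w = -0.10                      # illustrative worst-case mean shift w(theta)

# log likelihood ratio log(B/A) for a sample eps_1..eps_T is
# sum_t [w*eps_t - w**2/2] when the data are generated by model A, and
# sum_t [w*(eps_t + w) - w**2/2] when they are generated by model B.
eps_A = rng.standard_normal((n_sim, T))
eps_B = eps_A + w              # reuse the same draws, shifted to model B

llr_A = (w * eps_A - 0.5 * w**2).sum(axis=1)
llr_B = (w * eps_B - 0.5 * w**2).sum(axis=1)

pA = (llr_A > 0).mean()        # A true, but the likelihood ratio picks B
pB = (llr_B < 0).mean()        # B true, but the likelihood ratio picks A
print(0.5 * (pA + pB))         # detection error probability
```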


As a function of θ^{-1}, p(θ^{-1}) differs for different specifications of the approximating model. In particular, it will change when we switch from a trend stationary to a random walk model of log consumption. When comparing outcomes across different approximating models, we advocate comparing outcomes for the same detection error probability p(θ^{-1}) and adjusting the θ^{-1}'s appropriately across models. We shall do that for our version of Tallarini's model and will recast his figure 7.1 in terms of loci that record (E(m), σ(m)) pairs as we vary the detection error probability.

Tallarini’s figure again

The left panel of figure 7.2 describes the detection error probability p(θ^{-1}) for the random walk (dashed line) and trend stationary (solid line) models. We simulated the approximating and worst-case models 100,000 times and followed the procedure described above to compute the detection error probabilities for a given θ^{-1}. The simulations were done for T = 235 periods, the sample size for quarterly consumption growth data over the period 1948:2-2006:4.

The left panel of figure 7.2 reveals that for the random walk and the trend stationary models, a given detection error probability p(θ^{-1}) is associated with different values of θ^{-1}. Therefore, if we want to compute (E(m), σ(m)) pairs for the same detection error probabilities, we have to use different values of θ^{-1} for our two models of log per capita consumption growth. We shall use figure 7.2 to find these different values of θ^{-1} associated with a given detection error probability, then redraw Tallarini's figure in terms of detection error probabilities.

The right panel of figure 7.2 plots the detection error probabilities against the values of discounted entropy η for the random walk and trend stationary models. As functions of η, the detection error probabilities for the two models are the same.

Thus, to prepare a counterpart to figure 7.1, our updated version of Tallarini's graph, we invert the detection error probability functions p(θ^{-1}) in the left panel of figure 7.2 to get θ^{-1} as a function of p(θ^{-1}) for each model, then use this θ^{-1} either in formulas (7.15), (7.16) or in formulas (7.17), (7.18) to compute the (E(m), σ(m)) pairs to plot a la Tallarini. We present the results in figure 7.3.

We invite the reader to compare our figure 7.3 with figure 7.1. The calculations summarized in figure 7.1 taught Tallarini that with the



Figure 7.2: Panel A: detection error probabilities versus θ^{-1} for the random walk and trend stationary models. Panel B: detection error probabilities versus discounted entropy η for the random walk and trend stationary models (the two curves coincide).


Figure 7.3: Reciprocal of risk-free rate, market price of risk pairs for the random walk (◦) and trend stationary (+) models for values of p(θ^{-1}) of 50, 45, 40, 35, 30, 25, 20, 15, 10, 5 and 1 percent.


random walk model for log consumption, the (E(m), σ(m)) pairs approach the Hansen and Jagannathan bound when γ is around 50, whereas under the trend stationary model we need γ to be about 250 in order to approach the bound when β = .995. Figure 7.3 simply repackages those results by using the detection error probabilities p(θ^{-1}) reported in the left panel of figure 7.2 to trace out loci of (E(m), σ(m)) pairs as we vary the detection error probability.

Figure 7.3 reveals the striking pattern that varying the detection error probability traces out nearly the same loci for the random walk and the trend stationary models of consumption. This outcome faithfully reflects a pattern that holds exactly for the large deviation bounds on detection error probabilities that were studied by Anderson et al. (2003). Their work established a tight link between those bounds and the market price of model uncertainty that transcends details of the stochastic specification for the representative consumer's approximating model.

In terms of the issue raised in the quote from Lucas (2003), figure 7.3 reveals that regardless of the stochastic specification for consumption, what we regard as conservative detection error probabilities of between .15 and .2 take us half of the way toward the Hansen and Jagannathan bound.

To appreciate the significance of this finding, recall that Tallarini (2000a) showed how to explain both the equity premium and the risk-free rate by using Epstein-Zin-Weil preferences to separate a CRRA parameter γ from an IES parameter that he fixed at 1. To make things work, Tallarini needed very different levels of risk aversion depending on whether he used a random walk with drift or a trend stationary model for log consumption. In figure 7.1, Tallarini needed to set γ = 50 for the random walk model and about γ = 250 for the trend stationary model. For that figure, we follow Tallarini in setting β = 0.995, which implies an E(m) whose inverse does not match the risk-free rate in the economy very well: notice that in our figure 7.3, the circles and pluses lie a bit to the left of the Hansen-Jagannathan bound.19

Figure 7.3 reveals that for the same detection error probability both models of consumption growth imply the same values of (what is ordinarily

19 We can get inside the Hansen-Jagannathan bound by increasing the discount factor β. But doing so requires even higher levels of the coefficient of risk aversion, especially for the trend stationary model. Adjusting the parameters in this way pushes the circles and pluses in our figure 7.3 to the right as we increase the discount factor. The level of detection error probability necessary to achieve a given market price of model uncertainty is almost unaltered when we alter the discount factor.


interpreted as) the market price of risk. We say "what is ordinarily interpreted as" in order to indicate that on our preferred interpretation, the contribution from g, which accounts for most of it, should be interpreted as a "market price of model uncertainty." Figure 7.3 alters our sense of how plausible a given setting of γ is when we see that one gets pretty close to the bound with a detection error probability of 5 percent. A representative consumer who sets a detection error probability that small does not seem to be as timid as one who sets a CRRA coefficient as high as 50 or 250.

7.8 Welfare gains from eliminating model uncertainty

Obstfeld (1994a), Dolmas (1998), and Tallarini (2000a) studied the welfare costs of business cycles with Epstein-Zin preferences, while Hansen et al. (1999) described links between asset prices and welfare costs of consumption fluctuations in settings that featured both risk-sensitivity and robustness. In this section, we revisit welfare calculations under our robustness interpretation instead of Tallarini's (2000a) risk-sensitivity interpretation.

We have argued that the lion's share of what Tallarini (2000a) and Alvarez and Jermann (2004) interpret as market prices of risk should instead be interpreted as market prices of uncertainty. This means that those uncertainty prices reveal the representative consumer's attitude about a very different mental experiment than the one that interested Lucas (1987a, 2003). The question posed by Lucas (1987a, 2003) was "how much consumption would the representative consumer be willing to sacrifice in order to avoid facing the risk associated with a known distribution of consumption fluctuations?" In the epigraph above, Lucas doubts that useful measures of the representative consumer's attitudes toward the type of macroeconomic risk that he had in mind can be recovered from asset market prices and returns by adopting the risk-sensitive interpretations of risk premia in Hansen et al. (1999), Tallarini (2000a), and Alvarez and Jermann (2004).

In this section, we describe how market prices of uncertainty extracted from asset market data contain information about how much the representative consumer would be willing to pay to eliminate model uncertainty.


Comparison with risk-free certainty equivalent path under logarithmic random walk

In the spirit of Lucas (1987a), we follow Tallarini (2000a) by using as our point of comparison the certainty equivalent plan
\[ c_{t+1} - c_t = \mu + \frac{1}{2}\sigma_\varepsilon^2. \tag{7.47} \]

We seek an adjustment to initial consumption, and therefore the scale of the entire process, that renders a representative consumer indifferent between the certainty equivalent plan and the original risky consumption plan. For the same initial conditions, the certainty equivalent path of consumption exp(c_{t+1}) has the same mean as for the original plan c_{t+1} − c_t = μ + σ_ε ε_{t+1}, but its conditional variance has been reduced to zero. We let c_0^J denote the level of initial log consumption in the certainty equivalent plan for a type J agent, where J is I, II, III or IV.

Type I agent

Recall formulas (7.12) and (7.43) for the value functions of type I and II representative agents facing a random walk process for log consumption, namely
\[ U(x_0) = W(x_0) = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_0. \]

We seek a proportional decrease in the certainty equivalent trajectory (7.47) that leaves U equal to its value under the risky process. Let c_0^I denote the initialization of the certainty equivalent trajectory for a type I agent. Evidently, it satisfies the equation:

\[ \frac{\beta}{(1-\beta)^2}\Bigl(\mu + \frac{\sigma_\varepsilon^2}{2}\Bigr) + \frac{1}{1-\beta}c_0^I = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_0. \]

The left side is the value under the certainty equivalent plan, while the right side is the value under the original risky plan starting from c_0. Solving for


c_0 − c_0^I gives
\[ c_0 - c_0^I = \frac{\beta}{1-\beta}\Bigl[\frac{\sigma_\varepsilon^2}{2} + \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\Bigr] \tag{7.48} \]
\[ = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}\Bigl[1 + \frac{1}{(1-\beta)\theta}\Bigr] \tag{7.49} \]
\[ = \frac{\beta\sigma_\varepsilon^2\gamma}{2(1-\beta)}. \tag{7.50} \]

Type II agent

Because the value functions U and W for type I and II agents are identical, the compensating variation (7.50) renders both types indifferent between the original risky process and the certainty equivalent path. Thus c_0^I = c_0^{II}. But the reasons for indifference differ for the two types of agent. For our Kreps-Porteus type I agent, expression (7.50) makes risk aversion, as measured by γ, the reason that the consumer is willing to accept a lower initialization of the consumption path in order to eliminate volatility in the growth of the logarithm of consumption. However, for our type II agent with multiplier preferences indexed by θ, the reduction in initial consumption contains contributions from both risk aversion and aversion to model uncertainty. The compensation c_0 − c_0^{II} emerges from comparing what a type II agent regards as a trajectory that is both risky and model-uncertain with a trajectory that is both risk-free and model-certain. By itself, this comparison does not allow us to distinguish responses to risk and model uncertainty. To separate the parts contributed by risk and uncertainty, we construct a certainty equivalent for another type II agent, but one who does not fear model misspecification. This certainty equivalent starts from c_0^{II}(r) instead of c_0^{II}.

Thus, consider a type II agent who does not fear model uncertainty, so that θ = +∞. We ask how much adjustment in the initial condition of a certainty equivalent path (7.47) a θ = +∞ type II consumer would require.20 For θ = +∞, (7.50) asserts a compensating variation for the elimination of risk alone of
\[ c_0 - c_0^{II}(r) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}. \tag{7.51} \]

20 Of course, this will be the same compensation that a type I agent with γ = 1 would require.


For the random walk model, (7.51) corresponds to the compensation formula that Lucas (1987a) computed for a consumer with time separable logarithmic preferences, i.e., the special case of the preferences used by Tallarini (2000a) for a consumer whose coefficient of relative risk aversion and intertemporal elasticity of substitution are both unity.

When γ in (7.50) is large, so that θ < +∞, it means that the type II agent fears model misspecification. Then notice that the risk-aversion term (7.51) contributes only a small fraction of the total compensation required to accept the certainty equivalent path. Evidently, for the type II agent, the part of the compensation in equation (7.50) that is accounted for by aversion to model uncertainty is21
\[ c_0^{II}(r) - c_0^{II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}\Bigl[\frac{1}{(1-\beta)\theta}\Bigr] = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}(\gamma - 1). \tag{7.52} \]
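A sketch, not in the original, that evaluates the decomposition (7.51)-(7.52) at the table 7.2 random walk estimates with β = 0.995 and γ = 50; the numbers are in log points, i.e., approximately proportional reductions in initial consumption:

```python
# Split the total compensation (7.50) into risk (7.51) and uncertainty (7.52).
beta, sig, gamma = 0.995, 0.0050, 50.0

risk_only = beta * sig**2 / (2 * (1 - beta))                  # equation (7.51)
uncertainty = beta * sig**2 * (gamma - 1) / (2 * (1 - beta))  # equation (7.52)
total = risk_only + uncertainty                               # equals (7.50)

print(f"risk only:   {100 * risk_only:.2f}% of initial consumption")
print(f"uncertainty: {100 * uncertainty:.2f}% of initial consumption")
print(f"total:       {100 * total:.2f}%")
```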

Type III agent

Consider next a type III agent with θ chosen to support constraint η on discounted entropy. For the certainty equivalent path (7.47), the indifference calculation made with value function J given by (7.44) is
\[ \frac{\beta}{(1-\beta)^2}\Bigl(\mu + \frac{\sigma_\varepsilon^2}{2}\Bigr) + \frac{1}{1-\beta}c_0^{III} = \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_0. \]
Therefore
\[ c_0 - c_0^{III} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}(2\gamma - 1) \tag{7.53} \]
and
\[ c_0^{III}(r) - c_0^{III} = \frac{\beta\sigma_\varepsilon^2}{1-\beta}(\gamma - 1). \tag{7.54} \]
This is twice the compensation (7.52) required by a type II agent with the same value of θ.

21 Here we use the decomposition in logarithms c_0 − c_0^{II} = (c_0 − c_0^{II}(r)) + (c_0^{II}(r) − c_0^{II}). Notice that the implied decomposition of the level of consumption exp(c_t) is multiplicative.


Type IV agent

Finally, consider a type IV agent. Though he ranks plans according to the value function J given by (7.44), his attitude is really that of a type I agent with γ = 1 (θ = +∞) but a pessimistic view of the mean of log consumption growth, with the mean being altered to
\[ \mu + \sigma_\varepsilon w(\theta) = \mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}, \]
where the θ in this formula is the robustness parameter of an associated type III agent whose worst-case model our type IV agent believes without doubt. We again obtain (7.53) when we ask our type IV agent to tell us how much we could lower the certainty equivalent path (7.47) to render him indifferent between it and the risky path governed by
\[ c_{t+1} - c_t = \mu + \sigma_\varepsilon\varepsilon_{t+1}. \]

Comparison with risky but free-of-model-uncertainty equivalent path

We now describe an alternative measure of the welfare benefits of removing fear of model misspecification. We no longer use the no-risk certainty equivalent path of section 7.8. We take a different approach. We isolate the compensation for model uncertainty by allowing only one change in the path for consumption, in particular, time 0 consumption. We compare two paths whose risky consumptions for all dates t ≥ 1 are identical, so all compensation for model uncertainty occurs by adjusting time 0 consumption. We adjust c_0 to equate the value functions for (i) a θ < +∞ type II agent who fears model misspecification with (ii) a θ = +∞ type II agent who does not fear model misspecification.

Thus, we consider two trajectories for consumption governed by the random walk for log consumption. For both trajectories, we use a common initial condition c_0 to construct identical continuation log consumptions c_t for t ≥ 1. But for the path that liberates the type II agent from fear of model misspecification, we reduce date zero consumption to c_0^{II}(u). For indifference between situations with fear of model misspecification (the left side of the following equation) and without fear of model misspecification


(the right side), we require that
\[ \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_0 = \frac{\beta}{(1-\beta)^2}\mu + \frac{1}{1-\beta}c_0 + \bigl(c_0^{II}(u) - c_0\bigr). \]
In constructing the right side, we have set θ = ∞ and replaced c_0 with c_0^{II}(u). Solving the above equation for c_0 − c_0^{II}(u) gives
\[ c_0 - c_0^{II}(u) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2}\Bigl[\frac{1}{(1-\beta)\theta}\Bigr] = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2}(\gamma - 1). \tag{7.55} \]
Note how this formula is 1/(1−β) times the expression on the right side of (7.52).

Consider next a type III agent facing the same choice, that is, the same compensation scheme for being able to avoid model uncertainty. Let c_0^{III}(u) denote the date zero consumption that leaves the agent indifferent between the risky but free-of-model-uncertainty process and the risky and uncertain process. Then

\[ \frac{\beta}{(1-\beta)^2}\Bigl[\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\Bigr] + \frac{1}{1-\beta}c_0 = \frac{\beta}{(1-\beta)^2}\mu + \frac{1}{1-\beta}c_0 + \bigl(c_0^{III}(u) - c_0\bigr). \]

Solving for c_0 − c_0^{III}(u) gives
\[ c_0 - c_0^{III}(u) = \frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2}\Bigl[\frac{1}{(1-\beta)\theta}\Bigr] = \frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2}(\gamma - 1), \]
which equals 1/(1−β) times the term on the right side of (7.54).

The compensations in subsection 7.8 took a deterministic trajectory as a point of comparison, while the ones here use a random path. Here we exploit the fact that current consumption is known and pile all of the compensation into the first period, leaving the remainder of the path unchanged under the approximating model. Because the only adjustment to consumption occurs at time 0, a multiplicative factor 1/(1−β) appears relative to the comparable formulas in subsection 7.8.


Table 7.3: Benefits of eliminating model risk and uncertainty

Type  Compensation           Random walk                                Trend stationary                                      Compensation for
I     c_0 − c_0^I            βσ_ε²γ/[2(1−β)]                            σ_ε²β/[2(1−βρ²)] + βσ_ε²(1−β)(γ−1)/[2(1−βρ)²]         risk
II    c_0 − c_0^II           σ_ε²β/[2(1−β)] + βσ_ε²/[2θ(1−β)²]          σ_ε²β/[2(1−βρ²)] + βσ_ε²/[2θ(1−βρ)²]                  risk and uncertainty
II    c_0 − c_0^II(r)        βσ_ε²/[2(1−β)]                             σ_ε²β/[2(1−βρ²)]                                      risk
II    c_0^II(r) − c_0^II     βσ_ε²/[2θ(1−β)²]                           βσ_ε²/[2θ(1−βρ)²]                                     uncertainty
III   c_0 − c_0^III          βσ_ε²/[θ(1−β)²] + βσ_ε²/[2(1−β)]           βσ_ε²/[θ(1−ρβ)²] + βσ_ε²/[2(1−βρ²)]                   risk and uncertainty
III   c_0 − c_0^III(r)       βσ_ε²/[2(1−β)]                             σ_ε²β/[2(1−βρ²)]                                      risk
III   c_0^III(r) − c_0^III   βσ_ε²/[θ(1−β)²]                            βσ_ε²/[θ(1−ρβ)²]                                      uncertainty

Formulas for trend stationary model

Table 7.3 summarizes the above formulas and comparable formulas for the trend stationary model worked out in appendix 7.A. In the next section, we apply these formulas.
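A sketch (not from the original text) that evaluates several of the Table 7.3 entries at the table 7.2 estimates; the value of θ below is an arbitrary illustrative choice rather than a calibrated one:

```python
# Evaluate a subset of the Table 7.3 compensation formulas (log points, shown
# here as approximate percentages of initial consumption).
beta, rho, sig = 0.995, 0.980, 0.0050
theta = 4.0                                     # illustrative multiplier value

rows = {
    "II  risk (RW)":        beta * sig**2 / (2 * (1 - beta)),
    "II  uncertainty (RW)": beta * sig**2 / (2 * theta * (1 - beta)**2),
    "III uncertainty (RW)": beta * sig**2 / (theta * (1 - beta)**2),
    "II  risk (TS)":        beta * sig**2 / (2 * (1 - beta * rho**2)),
    "II  uncertainty (TS)": beta * sig**2 / (2 * theta * (1 - beta * rho)**2),
    "III uncertainty (TS)": beta * sig**2 / (theta * (1 - rho * beta)**2),
}
for name, value in rows.items():
    print(f"{name}: {100 * value:.2f}% of initial consumption")
```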

Quantitative results

The two panels of figure 7.4 are designed to bring out the difference between an elimination-of-risk experiment of the type imagined by Lucas (1987a, 2003) and Tallarini (2000a) and our elimination-of-model-uncertainty experiment. Both panels set β = .995 while calibrating θ to set p(θ^{-1}) = 0.10.

The left panel illustrates our elimination of model uncertainty and risk experiment for a type II agent. The 'fan' in the left panel shows a one-standard-deviation band that describes j-step-ahead conditional distributions for c for our calibrated random walk model for log consumption. The straight dashed line below the fan shows the certainty equivalent path with date zero consumption reduced by (c_0 − c_0^{II}). This reduction makes our representative agent of type II indifferent between this deterministic trajectory and the one illustrated by the 'fan' and therefore compensates him for bearing both risk and model ambiguity. The solid line in the left panel illustrates another certainty equivalent path for a type II consumer who does not fear model uncertainty (θ = ∞) and therefore measures the contribution


from "risk" to (c_0 − c_0^{II}). Here the consumption trajectory is initialized at a value (c_0 − c_0^{II}(r)) lower than the initial value of the original process. This is the value computed in formula (7.51). At our calibrated values for the parameters, it is very small. As a result the solid line is only slightly below the center of the 'fan'. So along with Lucas (1987a, 2003), we also find that the welfare gains from eliminating well understood risks are very small. We reinterpret the large welfare gains found by Tallarini (2000a) as coming not from reducing risk, but from reducing model uncertainty.

The right panel illustrates the amount of model uncertainty that our representative agent of type II fears by displaying one-standard-deviation bands that describe j-step-ahead conditional distributions for several stochastic processes: (1) the same calibrated random walk with drift log consumption model depicted in the left panel and (2) two elements drawn from a cloud of models that the minimizing player inside our type II representative consumer's head is allowed to choose among, both of which start from the same initial condition as process (1).22 We also plot the deterministic path (7.47) that we have initialized using formula (7.50) to make the agent indifferent between facing this certainty equivalent path and confronting model uncertainty and risk.

As a function of discounted entropy η, figure 7.5 plots the benefits c_0^{III}(r) − c_0^{III} to a type III agent of eliminating model uncertainty as a proportional reduction in initial consumption. The quantities plotted are given by formula (7.54) for the random walk model and the corresponding entry in Table 7.3 for the trend stationary model. The benefits are half as much for a type II agent with a corresponding θ. But relative to the amounts estimated by Lucas (1987a, 2003), they are very big.23

22 The lower element is a worst-case distribution obtained by adjusting the mean to μ + σ_ε w(θ), where w(θ) is the worst-case shock for the θ that sets p(θ^{-1}) = .1, while the upper element adjusts the mean to μ − σ_ε w(θ).

23 Proposition 10 of Cerreia-Vioglio et al. (2008) characterizes more uncertainty averse preference relations in terms of pointwise smaller G functions, in their notation. (For our type II agent with multiplier preferences, their G function equals θ times the present value of discounted entropy.) That finding allows us to measure the model uncertainty aversion of our agents in terms of alternative θ's for a type II agent and η's for a type III agent. However, it provides no basis for comparing G's across type II and type III agents.


[Figure 7.4 plots omitted: panel (a) "Risk and uncertainty" and panel (b) "Set of models considered"; vertical axis log(consumption), horizontal axis 0–250; panel (a) shows a 1 s.d. band and the certainty equivalent paths c^r and c^{II}.]

Figure 7.4: Panel A: An elimination of risk and uncertainty experiment for the random walk model. Panel B: Set of models considered by the ambiguity averse agent and an elimination of model uncertainty and risk experiment for the random walk model.

[Figure 7.5 plot omitted: horizontal axis detection error probability p(η); vertical axis proportion of consumption (%); lines labeled RW type III and TS type III.]

Figure 7.5: Proportions c_0^{III}(r) − c_0^{III} of initial consumption that a representative type III consumer would surrender not to confront model uncertainty; top line is for the random-walk model of consumption growth, bottom line is for the trend-stationary model.


7.9 Dogmatic Bayesians and learning

Robust dogmatic Bayesians

Consider the geometric random walk model for consumption. Tallarini (2000a) follows many other rational expectations researchers in assigning to his representative consumer a dogmatic prior over the growth rate μ and the innovation volatility σε. One way to think about our type II or III representative consumer is that at the end of the day he examines how robust his evaluations are with respect to alternative dogmatic priors over μ. That leads him to price assets like a type IV consumer who has a prior for the mean of log consumption growth concentrated on μ + σεw(θ).

The left panel of figure 7.6 compares log consumption to the two lines c_0 + μt and c_0 + (μ + σεw(θ))t for a θ associated with a detection error probability p(θ^{-1}) = .2. These lines are close in the informal eyeball sense that we have shown them to empirically sophisticated friends and, upon asking them to tell us which one fits the data best, have received the modal answer 'difficult for me to tell'. The right panel of figure 7.6 conveys the same idea from a different perspective by plotting μ + σεw(θ) as a function of the associated detection error probability p(θ) compared to a two standard deviation band around the maximum likelihood estimate of μ.
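The short Python sketch below illustrates the kind of comparison drawn in the left panel of figure 7.6. It is our own illustration: the parameter values are assumed, not the calibration used in the text, and the worst-case mean shift w is computed from the ρ → 1 limit of formula (7.56) in appendix 7.A rather than from a detection error probability calculation.

```python
# A minimal numerical sketch of the two trend lines compared in figure 7.6 (left panel).
# All parameter values are illustrative assumptions; w uses the rho -> 1 limit of (7.56).
import numpy as np

beta, mu, sigma_eps = 0.995, 0.005, 0.005   # assumed quarterly values
theta = 50.0                                # assumed robustness penalty
w = -sigma_eps / ((1.0 - beta) * theta)     # worst-case mean shift of eps_{t+1}

T = 200                                     # roughly a post-WWII quarterly sample
t = np.arange(T)
c0 = 0.0
approx_line = c0 + mu * t                   # approximating-model trend
worst_line = c0 + (mu + sigma_eps * w) * t  # worst-case-model trend

gap_at_end = approx_line[-1] - worst_line[-1]
print(f"worst-case drift shift per quarter: {sigma_eps * w:.6f}")
print(f"trend gap after {T} quarters: {gap_at_end:.4f} (log points)")
```

Even a drift distortion of this size accumulates to only a few log points over a postwar-length sample, which is why the two lines are hard to tell apart by eye.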

The two panels of figure 7.6 and the logic underlying our detection error probabilities indicate that the differences in mean consumption growth that our analysis features are difficult to distinguish with samples of the size, for example, that Tallarini (2000a) used to estimate mean consumption growth. Our type II representative agent copes with this situation by doing a prior robustness analysis, but we have effectively constrained him to compare priors, all of which are dogmatic. A natural question to ask is: if it is so difficult to learn about μ, wouldn't it make more sense to endow our representative consumer with a non-dogmatic prior over μ?

Learning?

An affirmative answer to that question is the starting point for Hansen and Sargent (2006a), who use the analytical framework of Hansen and Sargent (2007b) to endow a representative consumer with a non-dogmatic prior over μ. They model μ as a hidden Markov state and allow the representative consumer to learn about mean consumption growth as data arrive. But because he does not completely trust the posterior probabilities that emerge from Bayes' law, the representative consumer engages in a worst-case analysis that leads him to slant posterior probabilities pessimistically. By including a hidden state variable that indexes alternative submodels for consumption growth, Hansen and Sargent (2006a) also study a difficult on-going model selection problem. They posit an associated set of specification doubts that lead the representative consumer to slant posterior probabilities over the submodels pessimistically. These learning problems are sufficiently difficult that the representative consumer is unable to resolve his specification doubts within a sample of the length that Tallarini (2000a) studied. Robust learning gives rise to countercyclical uncertainty premia because the representative consumer interprets good news about consumption growth as temporary and bad news about consumption growth as permanent.

[Figure 7.6 plots omitted: panel (a) log consumption over 1950–2000 with lines for the approximating model and the worst-case (wc) model with p(θ^{-1}) = 0.2; panel (b) worst case mean consumption growth (×10^{-3}) against the detection error probability, with a 2 standard deviation band.]

Figure 7.6: Left panel: log consumption and two lines; right panel: worst case mean consumption growth versus detection error probability.

7.10 Concluding remarks

It is easy to agree with Lucas that the coefficients γ that Tallarini calibrated to match asset market data are implausibly high when they are interpreted as measures of atemporal risk aversion. Those high γ's become more plausible when we interpret them as measures of the representative consumer's reluctance to face model uncertainty. How we interpret γ has important ramifications about whether risk premia measure (a) the benefits from reducing well understood stochastic aggregate fluctuations, or (b) the benefits of reducing uncertainty about the representative consumer's stochastic specification for consumption growth. We have argued that they measure (b), not (a).

The main point of Lucas (2003) was that after one takes into account what has been achieved by using systematic monetary and fiscal policies to smooth aggregate fluctuations in the post WWII U.S., only small additional welfare gains can be attained by smoothing transitory shocks further. Under our robustness interpretation, those transitory shocks play a role excluded by Lucas's analysis: by obscuring the consumer's ability to discriminate among alternative models, they put the consumer in a position in which his concerns about model misspecification make him want evaluations of future outcomes that are cautious with respect to a set of plausible statistically nearby models. The process of constructing worst-case scenarios to assist in making those cautious evaluations transforms transient risks into concerns about misspecifications of lower frequency aspects of the representative consumer's approximating model.24

Appendix 7.A Formulas for trend stationary model

The worst case mean of εt+1 for the trend stationary model is

\[
w(\theta) = \frac{-\sigma_\varepsilon}{(1-\rho\beta)\,\theta}. \tag{7.56}
\]

Using this, we find that discounted entropy is

\[
N(x_t) = \frac{\beta \sigma_\varepsilon^2}{2\theta^2 (1-\beta)(1-\beta\rho)^2}. \tag{7.57}
\]
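As a check on the algebra, the following Python snippet (our own, with assumed illustrative parameter values) verifies numerically that (7.57) coincides with the discounted sum of per-period entropies generated by the constant mean shift (7.56), using the standard fact that the relative entropy of N(w, 1) with respect to N(0, 1) is w²/2.

```python
# Numerical check (illustrative values) that (7.57) equals the discounted sum of
# per-period entropies of the mean shift w(theta) in (7.56).
beta, rho, theta, sigma_eps = 0.995, 0.97, 40.0, 0.005

w = -sigma_eps / ((1.0 - rho * beta) * theta)                                   # (7.56)
N_formula = beta * sigma_eps**2 / (2 * theta**2 * (1 - beta) * (1 - beta * rho)**2)  # (7.57)
N_direct = sum(beta**j * w**2 / 2 for j in range(1, 5000))                      # truncated sum

print(N_formula, N_direct)   # agree up to the truncation error
```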

24The formulas for the worst-case means wt+1 in subsection 7.6 reveal this transformation. In the simple models of this paper, concerns about misspecification translate into (permanent) distortions in the means of shocks. In more general dynamic models, they translate into more richly altered distortions in frequency responses. See Hansen and Sargent (2008b, chap. 7).


so the value function for a type III or IV agent is

\[
J(x_t) = \frac{\beta\zeta(1-\rho)}{(1-\beta)(1-\beta\rho)} + \frac{\beta\mu}{(1-\beta)^2} - \frac{\sigma_\varepsilon^2\,\beta}{\theta(1-\beta)(1-\beta\rho)^2} + \frac{\mu\beta(1-\rho)}{(1-\beta\rho)(1-\beta)}\, t + \frac{1}{1-\beta\rho}\, c_t \tag{7.58}
\]

and the value function for a type II agent is

\[
W(x_t) = \frac{\beta\zeta(1-\rho)}{(1-\beta)(1-\beta\rho)} + \frac{\beta\mu}{(1-\beta)^2} - \frac{\sigma_\varepsilon^2\,\beta}{2\theta(1-\beta)(1-\beta\rho)^2} + \frac{\mu\beta(1-\rho)}{(1-\beta\rho)(1-\beta)}\, t + \frac{1}{1-\beta\rho}\, c_t. \tag{7.59}
\]

The geometric trend stationary model obeys the following difference equation:

\[
c_{t+1} = \rho c_t + \zeta(1-\rho) + \rho\mu + \mu(1-\rho)(t+1) + \sigma_\varepsilon \varepsilon_{t+1}.
\]

We first construct the path of conditional expectations for our original process with fluctuations. Iterating the equation above forward, we find that

\[
c_{t+j} = \rho^j c_t + \mu j + (1-\rho)(\zeta + \mu t)(1 + \rho + \dots + \rho^{j-1}) + \sigma_\varepsilon(\varepsilon_{t+j} + \rho\varepsilon_{t+j-1} + \dots + \rho^{j-1}\varepsilon_{t+1})
\]

and therefore

\[
\log E_t[\exp(c_{t+j})] = \rho^j c_t + \mu j + (1-\rho)(\zeta + \mu t)(1 + \rho + \dots + \rho^{j-1}) + \frac{\sigma_\varepsilon^2}{2}\big(1 + \rho^2 + \dots + \rho^{2(j-1)}\big)
= \rho^j c_t + \mu j + (\zeta + \mu t)(1-\rho^j) + \frac{\sigma_\varepsilon^2}{2}\,\frac{1-\rho^{2j}}{1-\rho^2}.
\]

We then proceed to compute the value function under the certainty equivalent trajectory:

\[
U(x_t) = \sum_{j=0}^{\infty} \beta^j \log E_t[\exp(c_{t+j})]
= \sum_{j=0}^{\infty} \beta^j \left( \rho^j c_t + \mu j + (\zeta + \mu t)(1-\rho^j) + \frac{\sigma_\varepsilon^2}{2}\,\frac{1-\rho^{2j}}{1-\rho^2} \right)
= \frac{\zeta\beta(1-\rho)}{(1-\beta)(1-\beta\rho)} + \frac{\mu\beta}{(1-\beta)^2} + \frac{\sigma_\varepsilon^2\,\beta}{2(1-\beta)(1-\beta\rho^2)} + \frac{\mu\beta(1-\rho)}{(1-\beta)(1-\rho\beta)}\, t + \frac{c_t}{1-\rho\beta}.
\]
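The geometric-sum algebra above can be verified directly. The Python snippet below (our own check, with assumed illustrative parameter values) compares the closed form for U(x_t) with a truncated version of the discounted sum it is derived from.

```python
# Numerical check that the closed form for U(x_t) matches the discounted sum
# of log E_t[exp(c_{t+j})].  Parameter values are illustrative assumptions.
beta, rho, mu, zeta, sigma_eps = 0.995, 0.97, 0.005, -4.5, 0.005
c_t, t = -4.3, 80.0

def log_Et_exp_c(j):
    # log E_t[exp(c_{t+j})] from the display above
    return (rho**j * c_t + mu * j + (zeta + mu * t) * (1 - rho**j)
            + 0.5 * sigma_eps**2 * (1 - rho**(2 * j)) / (1 - rho**2))

U_direct = sum(beta**j * log_Et_exp_c(j) for j in range(4000))

U_closed = (zeta * beta * (1 - rho) / ((1 - beta) * (1 - beta * rho))
            + mu * beta / (1 - beta)**2
            + sigma_eps**2 * beta / (2 * (1 - beta) * (1 - beta * rho**2))
            + mu * beta * (1 - rho) / ((1 - beta) * (1 - rho * beta)) * t
            + c_t / (1 - rho * beta))

print(U_direct, U_closed)   # agree up to the truncation error
```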

We report in table 7.3 the elimination of risk and uncertainty compensations for the trend stationary model that we have computed using the same procedure as for the random walk model. Note that when ρ = 1 the compensating variations are identical to the ones for the random walk model.

The alternative subsection 7.8 welfare measure that compares the risky but free-of-model-uncertainty equivalent path with the original path for the trend stationary model is

\[
c_0 - c_0^{II}(u) = \frac{\beta\sigma_\varepsilon^2}{2\theta(1-\beta)(1-\beta\rho)^2} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2}\,(\gamma - 1)
\]

for a type II agent. The same measure for a type III agent is given by

\[
c_0 - c_0^{III}(u) = \frac{\beta\sigma_\varepsilon^2}{\theta(1-\beta)(1-\beta\rho)^2} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2}\,(\gamma - 1).
\]
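The following short evaluation (ours, with assumed parameter values) makes the two measures concrete. Equating the two expressions in each display implies the mapping θ = 1/((1 − β)(γ − 1)), which the snippet uses; the illustrative γ is an assumption, not the text's calibration.

```python
# Illustrative evaluation of the two welfare measures above.  The mapping
# theta = 1 / ((1 - beta) * (gamma - 1)) is implied by equating the two
# expressions in each display; the numbers themselves are assumptions.
beta, rho, sigma_eps, gamma = 0.995, 0.97, 0.005, 50.0
theta = 1.0 / ((1.0 - beta) * (gamma - 1.0))

type_II  = beta * sigma_eps**2 / (2 * theta * (1 - beta) * (1 - beta * rho)**2)
type_III = beta * sigma_eps**2 / (theta * (1 - beta) * (1 - beta * rho)**2)

via_gamma_II = beta * sigma_eps**2 / (2 * (1 - beta * rho)**2) * (gamma - 1)

print(type_II, via_gamma_II)   # identical by construction
print(type_III / type_II)      # the type III measure is twice the type II measure
```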


Chapter 8

Robust Estimation and Control Without Commitment1

Abstract

In a Markov decision problem with hidden state variables, a posterior distribution serves as a state variable and Bayes' law under an approximating model gives its law of motion. A decision maker expresses fear that his model is misspecified by surrounding it with a set of alternatives that are nearby when measured by their expected log likelihood ratios (entropies). Martingales represent alternative models. A decision maker constructs a sequence of robust decision rules by pretending that a sequence of minimizing players choose increments to martingales and distortions to the prior over the hidden state. A risk sensitivity operator induces robustness to perturbations of the approximating model conditioned on the hidden state. Another risk sensitivity operator induces robustness to the prior distribution over the hidden state. We use these operators to extend the approach of Hansen and Sargent (1995b) to problems that contain hidden states.

1We thank In-Koo Cho, Anastasios Karantounias, Kenneth Kasa, Ricardo Mayer, Grace Tsiang, Tomasz Piskorski, the editor, and especially a referee for helpful comments on earlier drafts of this paper. This material is based on work supported by the National Science Foundation under Award Numbers SES0519372 and SES0137035.



Key words: robustness, hidden state Markov chains, martingales, risk sensitivity, decision theory.

8.1 Introduction

This paper studies robust control problems in which the presence of hidden state variables motivates a decision maker to solve a filtering problem. From the standpoint of robustness, hidden states create problems because they give an informational advantage to the hypothetical 'evil agent' who helps the decision maker construct decision rules that are robust to model misspecification. To ensure that decision rules work well even when a decision maker's baseline probability model is flawed, the evil agent must be allowed to alter the dynamics of the hidden states as well as the probabilities assigned to those states. We allow the evil agent to see the hidden states when he explores those alterations.

It is not obvious how the evil agent should exploit his informational advantage in order to help the decision maker design robust decision rules. In Hansen and Sargent (2005b), we studied a version of this problem when both the decision maker and his evil twin can commit to their decision rules. In contrast, in the present paper, neither agent can commit. Although Hansen and Sargent (2005b) construct a recursive representation of the equilibrium of the time 0 zero-sum, two-player game that achieves robustness, that equilibrium assumes that the players do not discount increments to entropy. When players discount future utilities, a failure to discount increments to entropy creates time varying decision rules that make concerns about robustness dissipate over time. To allow concerns about robustness to endure and time-invariant decision rules to prevail, this paper studies Markov perfect equilibria of zero-sum games with symmetric discounting of increments both to entropy and to utility.2 Interestingly, it turns out that this symmetric approach to discounting entropy and utility implies that, in general, worst case beliefs about hidden states are dynamically inconsistent, although beliefs about observables are dynamically consistent.

In more detail, this paper constructs robust decision rules for discounted dynamic programming problems that confront a decision maker with an incentive to evaluate alternative models and to learn about unknown parameters and other hidden state variables.3 Extending robust control theory to include these features allows us to approach unsolved problems in positive and normative economics, for example, computing asset evaluations of robust investors and designing robust monetary and fiscal policies. A criticism of our earlier robust control formulations with persistent fears of model misspecification (for example, Hansen and Sargent (1995b), Hansen et al. (2006b), Hansen et al. (1999), Anderson et al. (2003)) is that they precluded learning by not allowing the decision maker to use new information to refine his approximating model and diminish the set of plausible alternatives against which he seeks robustness.4 This paper confronts that criticism by incorporating learning.

2Hansen and Sargent (2005b, p. 286) indicate how part of what we are doing in this paper is to replace β^{-t}θ in the recursive representations there with θ here.

The Bayesian literature on model selection and learning has acquired prominent critics. Turning their backs on Bayes' law and the logic of backward induction, distinguished macroeconomists have warned against using experimentation, learning, and model selection to guide decision making.5

These skeptics seem to believe that in any concrete application, the details of a model are bound to be subject to misspecifications that will render recommendations to experiment that come from pursuing the logic of Bayes and backward induction unacceptably fragile.6 Our methods show a decision maker how to experiment even though he distrusts both his models and the prior that he places over them.

We begin by assuming that, through some unspecified process, a decision maker has arrived at an approximating model that fits historical data well. Our notion of a model is broad enough to subsume the possibility that the model is actually a collection of alternative submodels, each of which can contain unknown parameters. More generally, we formulate the decision environment using a hidden state Markov model, and we index a family of probability models by conditioning on the hidden Markov state. Some of the components of the Markov states can be time invariant indicators of submodels and others can be parameters or hidden states that are associated with the submodels that can vary over time. A conditional model is a probability specification conditioned on a hidden Markov state. Because the decision-maker fears that each member of his family of approximating conditional models is misspecified, he surrounds each of them with a set of unspecified alternative models whose expected log likelihood ratios (i.e., relative entropies) are restricted or penalized. The decision maker believes that the data will be generated by an unknown member of one of these sets. When relative entropies are constrained to be small, the decision maker believes that his model is a good approximation. The decision maker wants robustness against these alternatives because, as Anderson et al. (2003) emphasize, perturbations with small relative entropies are statistically difficult to distinguish from the approximating model.

3For examples of such dynamic programming problems, see Jovanovic (1979), Jovanovic and Nyarko (1995, 1996), and Bergemann and Valimaki (1996).

4See Weiland (2005).

5Alan Blinder, Robert E. Lucas, and Martin Feldstein have all argued against experimenting to refine model selection. See Cogley et al. (2008) for quotations and further discussion.

6See Brock et al. (2003, 2004) for an approach to model selection and decision making that also emphasizes robustness. Cho and Kasa (2008, 2006) study ongoing model selection problems by using mathematical tools that are closely related to the ones used here, but in different and interesting ways. They use entropy based versions of recursive on-going model specification tests to study a model selection problem faced by a large player whose decisions influence aggregate dynamics. They show a tight relationship between the dynamics produced by their model and those that emerge from some models that have been used in the recent literature on adaptive learning, for example, Cho et al. (2002) and Kasa (2004).

Since the decision maker does not know the hidden Markov state, he is also compelled to weight the alternative conditional models. This paper assumes that at each date the appropriate summary of past signals about the hidden state is the decision maker's posterior under the approximating model, just as it is when the decision maker trusts his model. He makes a robust adjustment to these probabilities to accommodate the fact that they were constructed from the approximating model. As we shall see, the decision maker is not required to respect distortions to the distribution of today's hidden state that were implied by his decision making process at earlier dates. Hansen and Sargent (2005b) studied a closely related decision problem that requires today's decision maker to commit to those prior distortions of the distribution of today's hidden states.

Section 8.2 formulates a Markov control problem in which a decision maker with a trusted model receives signals about hidden state variables. Subsequent sections view the model of section 8.2 as an approximation, use relative entropy to define a cloud of models that are difficult to distinguish from it statistically, and construct a sequence of decision rules that can work well for all of those models. Section 8.3 uses results of Hansen and Sargent (2005b) to represent distortions of an approximating model in terms of martingales that are defined on the same probability space as the approximating model. Section 8.4 then defines two risk-sensitivity operators, T^1 and T^2, respectively, that are indexed by penalty parameters (θ1, θ2). These two operators are counterparts of related operators in Hansen and Sargent (2005b), but there they must appear with a common θ. By allowing the two θ's to differ here, we allow the decision maker to focus his concerns about robustness on different aspects of his specification. Thus, in section 8.5, we use T^1 to adjust continuation values for concerns about model misspecification, conditioned on knowledge of the hidden state. We use T^2 to adjust continuation values for concerns about misspecification of the distribution of the hidden state. We interpret θ1 and θ2 as penalties on pertinent entropy terms. Hansen and Sargent (2006d) specializes our section 8.5 recursions to compute robust decision rules for the linear quadratic Gaussian case.

Section 8.6 relates the special θ1 = θ2 case to a decision problem under commitment that we analyzed in Hansen and Sargent (2005b). When θ1 = θ2 and the discount factor is unity, the problem in this paper coincides with an undiscounted version of the problem in Hansen and Sargent (2005b). We discuss the dynamic consistency of worst case beliefs about the hidden state in subsections 8.6 and 8.6. To prepare the machinery needed to construct stochastic discount factors for use in finance and macroeconomics like those mentioned in section 8.10, section 8.7 describes the worst case distribution over signals. Section 8.8 interprets our formulation and suggests modifications of it in terms of the multiple priors models of Epstein and Schneider (2003a,b). Section 8.9 relates our formulation to papers about reducing compound lotteries. To illustrate the single-agent decision theory that we have developed in this paper, section 8.10 briefly describes a pure-endowment asset pricing model of Hansen and Sargent (2006a) that prices assets by reading marginal valuations off of the value function of a single representative agent. Hansen and Sargent (2006a) combine the logarithmic preference specification of Tallarini (2000a), an endowment specification closely related to one of Bansal and Yaron (2004), and special cases of the recursions in Hansen and Sargent (2006d), which in turn specialize the recursions in this paper. Hansen and Sargent (2006a) construct the value function for a representative consumer who lives in a pure endowment economy and is unsure about the specifications of two submodels that might govern consumption growth, as well as being unsure about the probability for mixing those submodels. They deduce a multiplicative adjustment to the market price of risk that is contributed by the representative consumer's concerns about robustness. This model illustrates how the representative consumer's concern about misspecification of the probabilities for mixing submodels and also of the conditional means and other hidden components of consumption growth within each submodel can increase the volatility of the stochastic discount factor and enhance what are typically interpreted as risk premia.7 Section 8.11 concludes. Hansen and Sargent (2005b) contains an extensive account of related literatures. An application to a decision problem with experimentation and learning about multiple submodels appears in Cogley et al. (2008).

8.2 A control problem without model uncertainty

For t ≥ 0, we partition a state vector as
\[
x_t = \begin{bmatrix} y_t \\ z_t \end{bmatrix},
\]
where yt is observed and zt is not. A vector st of observable signals is correlated with the hidden state zt and is used by the decision maker to form beliefs about the hidden state. Let Z denote a space of admissible unobserved states, Z a corresponding sigma algebra of subsets of states, and λ a measure on the measurable space of hidden states (Z, Z). Let S denote the space of signals, S a corresponding sigma algebra, and η a measure on the measurable space (S, S) of signals.

Let {St : t ≥ 0} denote a filtration, where St is generated by y0, s1, ..., st, where we shall assume that s1, ..., st is generated by system (8.1), (8.2), (8.3) below. We can apply Bayes' rule to τ to deduce a density qt, relative to the measure λ, for zt conditioned on information St. Let {Xt : t ≥ 0} be a larger filtration where Xt is generated by x0, w1, w2, ..., wt. The smallest sigma algebra generated by all states for t ≥ 0 is X∞ ≐ ∨_{t≥0} Xt; the smallest sigma algebra generated by all signals for t ≥ 0 is S∞ ≐ ∨_{t≥0} St. Let A denote a feasible set of actions, which we take to be a Borel set of some finite dimensional Euclidean space, and let At be the set of A-valued random vectors that are St measurable.8

7With enough data, the representative consumer might 'learn his way out of' some of the specification concerns, including the submodel mixture probabilities and the parameters within submodels. This happens if the parameter estimates converge and the mixture probabilities concentrate on one submodel. But in the model of Hansen and Sargent (2006a), the mixture probabilities continue fluctuating broadly within the sample for a quarterly record of post WWII U.S. consumption growth rates.

8We could easily allow A to depend on the observable component of the state.


Signals and states are determined by the transition functions

yt+1 = πy(st+1, yt, at) (8.1)

zt+1 = πz(xt, at, wt+1) (8.2)

st+1 = πs(xt, at, wt+1) (8.3)

where {wt+1 : t ≥ 0} is an i.i.d. sequence of random variables. Knowledge of y0 and πy allows us to construct yt recursively from signals and actions. The construction of xt in equations (8.1)-(8.2) and the informational constraint on action processes imply that xt is Xt measurable and yt is St measurable. Substituting (8.3) into (8.1) gives:
\[
y_{t+1} = \pi_y[\pi_s(x_t, a_t, w_{t+1}), y_t, a_t] \doteq \pi_y(x_t, a_t, w_{t+1}).
\]
Equations (8.2) and (8.3) determine a conditional density τ(zt+1, st+1|xt, at) relative to the product measure λ × η.

As a benchmark, consider the following decision problem under complete confidence in model (8.1), (8.2), (8.3) but incomplete information about the state:

Problem 8.2.1.
\[
\max_{\{a_t \in A_t : t \ge 0\}} \; E\left[\sum_{t=0}^{T} \beta^t U(x_t, a_t) \,\Big|\, S_0\right], \qquad \beta \in (0,1),
\]
subject to (8.1), (8.2), and (8.3).

To make problem 8.2.1 recursive, let ∗ denote a next period value and use τ to construct two densities for the signal:
\[
\begin{aligned}
\kappa(s^*|y_t, z_t, a_t) &\doteq \int \tau(z^*, s^*|y_t, z_t, a_t)\, d\lambda(z^*) \\
\varsigma(s^*|y_t, q_t, a_t) &\doteq \int \kappa(s^*|y_t, z, a_t)\, q_t(z)\, d\lambda(z).
\end{aligned} \tag{8.4}
\]
By Bayes' rule,
\[
q_{t+1}(z^*) = \frac{\int \tau(z^*, s_{t+1}|y_t, z, a_t)\, q_t(z)\, d\lambda(z)}{\varsigma(s_{t+1}|y_t, q_t, a_t)} \equiv \pi_q(s_{t+1}, y_t, q_t, a_t). \tag{8.5}
\]


In particular applications, πq can be computed with methods that specialize Bayes' rule (e.g., the Kalman filter or a discrete time version of the Wonham (1964) filter).
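When the hidden state z takes finitely many values, the update (8.5) reduces to a ratio of finite sums. The sketch below is our own illustration of that special case; the function name, array layout, and toy numbers are assumptions, not part of the text.

```python
# A minimal sketch of the Bayes update (8.5) for a finite hidden-state grid,
# so that integrals against d(lambda) become sums.
import numpy as np

def bayes_update(q, tau, s_index):
    """One application of pi_q in (8.5).

    q       : prior over hidden states, shape (nz,)
    tau     : tau[z_next, s_next, z] = joint density of (z*, s*) given z
              (conditioning on (y_t, a_t) is suppressed), shape (nz, ns, nz)
    s_index : index of the realized signal s_{t+1}
    returns : posterior q_{t+1} over z*, shape (nz,)
    """
    joint = tau[:, s_index, :] @ q      # integrate tau(z*, s_{t+1} | z) q(z) over z
    varsigma = joint.sum()              # signal density (8.4) evaluated at s_{t+1}
    return joint / varsigma

# toy example with two hidden states and two signals
tau = np.array([[[0.45, 0.10], [0.45, 0.10]],    # z* = 0
                [[0.05, 0.40], [0.05, 0.40]]])   # z* = 1
q0 = np.array([0.5, 0.5])
print(bayes_update(q0, tau, s_index=0))
```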

Take (yt, qt) as the state for a recursive formulation of problem 8.2.1.

The transition laws are (8.1) and (8.5). Let \(\pi = \begin{bmatrix} \pi_y \\ \pi_q \end{bmatrix}\). Then we can rewrite problem 8.2.1 in the alternative form:

Problem 8.2.2. Choose a sequence of decision rules for at as functions of (yt, qt) for each t ≥ 0 that maximize
\[
E\left[\sum_{t=0}^{T} \beta^t U(x_t, a_t) \,\Big|\, S_0\right]
\]
subject to (8.1), (8.5), a given density q0(z), and the density κ(st+1|yt, zt, at). The Bellman equation for this problem is
\[
W(y,q) = \max_{a \in A} \int \left\{ U(x,a) + \beta \int W^*\big[\pi(s^*, y, q, a)\big]\, \kappa(s^*|y,z,a)\, d\eta(s^*) \right\} q(z)\, d\lambda(z). \tag{8.6}
\]
In an infinite horizon version of problem 8.2.2, W^* = W.
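For a finite toy specification, the right side of (8.6) can be evaluated directly. The sketch below (our own scaffolding; the helper functions and toy numbers are assumptions) shows one such evaluation, with the posterior update supplied by Bayes' rule (8.5).

```python
# A schematic one-step evaluation of the right side of Bellman equation (8.6)
# on finite grids for (z, s) and a finite action set.
import numpy as np

def bellman_rhs(y, q, actions, U, kappa, pi_y, pi_q, W_next, beta=0.95):
    """Return max over a of
       sum_z q(z) [ U(x,a) + beta * sum_s W_next(pi(s,y,q,a)) * kappa(s|y,z,a) ]."""
    best = -np.inf
    for a in actions:
        value = 0.0
        for iz, qz in enumerate(q):
            continuation = sum(
                kappa(s, y, iz, a) * W_next(pi_y(s, y, a), pi_q(s, y, q, a))
                for s in range(n_signals))
            value += qz * (U(y, iz, a) + beta * continuation)
        best = max(best, value)
    return best

# toy ingredients: 2 hidden states, 2 signals, 2 actions
n_signals = 2
kappa = lambda s, y, z, a: [[0.9, 0.1], [0.2, 0.8]][z][s]      # signal density
pi_y  = lambda s, y, a: s                                      # observed state = last signal
def pi_q(s, y, q, a):                                          # Bayes' rule (8.5), z constant
    joint = np.array([kappa(s, y, z, a) * q[z] for z in range(2)])
    return joint / joint.sum()
U      = lambda y, z, a: -(a - z) ** 2                         # reward prefers matching z
W_next = lambda y, q: 0.0                                      # terminal (one-period) case
print(bellman_rhs(0, np.array([0.5, 0.5]), actions=[0, 1], U=U,
                  kappa=kappa, pi_y=pi_y, pi_q=pi_q, W_next=W_next))
```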

Examples

Examples of problem 8.2.2 in economics include Jovanovic (1979), Jovanovic (1982), Jovanovic and Nyarko (1995, 1996), and Bergemann and Valimaki (1996). Examples from outside economics appear in Elliott et al. (1995). Problems that we are especially interested in are illustrated in the following four examples. (More examples appear in section 8.10.)

Example 8.2.3. Model Uncertainty I: two submodels. Let the hidden state z ∈ {0, 1} index one of two submodels. Let
\[
\begin{aligned}
y_{t+1} &= s_{t+1} \\
z_{t+1} &= z_t \\
s_{t+1} &= \pi_s(y_t, z, a_t, w_{t+1}).
\end{aligned} \tag{8.7}
\]
The hidden state is time invariant. The decision maker has prior probability Prob(z = 0) = q. The third equation in (8.7) depicts two laws of motion. Cogley et al. (2008) and Cogley et al. (2007) study the value of monetary policy experimentation in a model in which a is an inflation target and πs(y, z, a, w) = πy(y, z, a, w) for z ∈ {0, 1} represent two submodels of inflation-unemployment dynamics.

Example 8.2.4. Model Uncertainty II: a continuum of submodels. The observable state y takes two possible values {yL, yH}. Transition dynamics are still described by (8.7), but now there is a continuum of models indexed by the hidden state z ∈ [0, 1] × [0, 1] that stands for unknown values of two transition probabilities for an observed state variable y. Given z, we can use the third equation of (8.7) to represent a two state Markov chain that governs the observable state y (see Elliott et al. (1995)),
\[
P = \begin{bmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{bmatrix},
\]
where (p11, p22) = z. The decision maker has prior distribution g0,1(p11) g0,2(p22) on z; g0,1 and g0,2 are beta distributions.

Example 8.2.5. Model Uncertainty III: a components model of income dynamics with an unknown fixed effect in labor income. The utility function U(at) is a concave function of consumption at; y2t is the level of financial assets, and y1t = st is observed labor income. The evolution equations are
\[
\begin{aligned}
y_{1,t+1} &= s_{t+1} \\
y_{2,t+1} &= R\,[y_{2,t} + y_{1,t} - a_t] \\
z_{1,t+1} &= z_{1,t} \\
z_{2,t+1} &= \rho z_{2,t} + \sigma_1 w_{1,t+1} \\
s_{t+1} &= z_{1,t} + z_{2,t} + \sigma_2 w_{2,t+1}
\end{aligned}
\]
where wt+1 ∼ N(0, I) is an i.i.d. bivariate Gaussian process, R ≤ β^{-1} is a gross return on financial assets y2,t, |ρ| < 1, z1,t is one unobserved constant component of labor income, and z2,t is another unobserved serially correlated component of labor income. A decision maker has a prior q0 over (z1,0, z2,0).

Example 8.2.6. Estimation of drifting coefficients regression model. The utility function U(xt, at) = −L(zt − at), where L is a loss function and at is a time-t estimate of the coefficient vector zt. The evolution equations are
\[
\begin{aligned}
y_{t+1} &= s_{t+1} \\
z_{t+1} &= \rho z_t + \sigma_1 w_{1,t+1} \\
s_{t+1} &= y_t \cdot z_t + \sigma_2 w_{2,t+1}
\end{aligned}
\]
where wt+1 ∼ N(0, I) and there is a prior q0(z) on an initial set of coefficients.

Modified problems that distrust κ(s∗|y, z, a) and q(z)

This paper studies modifications of problem 8.2.2 in which the decision maker wants a decision rule that is robust to possible misspecifications of equations (8.1)-(8.2). We express the Bellman equation as (8.6) and use the decomposition of ς in (8.4). Representation (8.6) focuses the decision maker's concerns on two aspects of the stochastic structure: the conditional distribution of next period's signals κ(s∗|y, z, a) and the distribution over this period's value of the hidden state q(z). We propose a recursive formulation of a robust control and estimation problem that allows a decision maker to doubt either or both of these aspects of his stochastic specification. To prepare the way, section 8.3 describes how misspecifications of the decision maker's approximating model can be represented in terms of sequences of nonnegative random variables that form martingales under that approximating model.

8.3 Using martingales to represent model misspecifications

Hansen and Sargent (2005b) use a nonnegative Xt-measurable function Mt with E Mt = 1 to create a distorted probability measure that is absolutely continuous with respect to the probability measure over Xt generated by the model (8.1)-(8.2). The random variable Mt is a martingale under this probability measure. Using Mt as a Radon-Nikodym derivative generates a distorted measure under which the expectation of a bounded Xt-measurable random variable Wt is Ẽ Wt ≐ E[Mt Wt]. The entropy of the distortion at time t conditioned on date zero information is E(Mt log Mt|X0) or E(Mt log Mt|S0).


Recursive representations of distortions

We often factor a density Ft+1 for an Xt+1-measurable random variable as Ft+1 = Ft ft+1, where ft+1 is a one-step ahead density conditioned on Xt. It is also useful to factor Mt. Thus, take a nonnegative martingale {Mt : t ≥ 0} and form
\[
m_{t+1} = \begin{cases} \dfrac{M_{t+1}}{M_t} & \text{if } M_t > 0 \\ 1 & \text{if } M_t = 0. \end{cases}
\]
Then Mt+1 = mt+1 Mt and
\[
M_t = M_0 \prod_{j=1}^{t} m_j. \tag{8.8}
\]
The random variable M0 has unconditional expectation equal to unity. By construction, mt+1 has date t conditional expectation equal to unity. For a bounded random variable Wt+1 that is Xt+1-measurable, the distorted conditional expectation implied by the martingale {Mt : t ≥ 0} is
\[
\frac{E(M_{t+1} W_{t+1}|X_t)}{E(M_{t+1}|X_t)} = \frac{E(M_{t+1} W_{t+1}|X_t)}{M_t} = E(m_{t+1} W_{t+1}|X_t)
\]
provided that Mt > 0. We use mt+1 to represent distortions of the conditional probability distribution for Xt+1 given Xt. For each t ≥ 0, construct the space Mt+1 of all nonnegative, Xt+1-measurable random variables mt+1 for which E(mt+1|Xt) = 1.

The conditional (on Xt) relative entropy of a nonnegative random variable mt+1 in Mt+1 is ε^1_t(mt+1) ≐ E(mt+1 log mt+1|Xt).
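A small simulation makes the factorization (8.8) concrete. The snippet below is our own illustration: it builds Mt from one-step Gaussian likelihood ratios (a mean shift of an i.i.d. standard normal shock) and checks the martingale property and the implied entropy by Monte Carlo.

```python
# Simulation of (8.8): products of conditional likelihood ratios m_{t+1} with
# E(m_{t+1}|X_t) = 1 form a martingale M_t with E(M_t) = 1.
import numpy as np

rng = np.random.default_rng(0)
T, n_paths, w = 50, 100_000, 0.1     # w: mean distortion of a N(0,1) shock

eps = rng.standard_normal((n_paths, T))
# m_{t+1} = exp(w*eps - w^2/2) is the one-step likelihood ratio of N(w,1) vs N(0,1)
m = np.exp(w * eps - 0.5 * w**2)
M_T = m.prod(axis=1)                 # M_T = prod_j m_j with M_0 = 1

print(M_T.mean())                    # approximately 1 (martingale property)
print((M_T * np.log(M_T)).mean())    # approximately T * w**2 / 2 (entropy of the distortion)
```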

Distorting likelihoods with hidden information

The random variable Mt is adapted to Xt and is a likelihood ratio for two probability distributions over Xt. The St-measurable random variable Gt = E(Mt|St) implies a likelihood ratio for the reduced information set St; Gt assigns distorted expectations to St-measurable random variables that agree with Mt, and {Gt : t ≥ 0} is a martingale adapted to {St : t ≥ 0}.

Define the Xt-measurable random variable ht by
\[
h_t \doteq \begin{cases} \dfrac{M_t}{E(M_t|S_t)} & \text{if } E(M_t|S_t) > 0 \\ 1 & \text{if } E(M_t|S_t) = 0 \end{cases}
\]
and decompose Mt as
\[
M_t = h_t G_t. \tag{8.9}
\]
Decompose entropy as
\[
E(M_t \log M_t|S_0) = E\big[G_t h_t(\log h_t + \log G_t)\,\big|\,S_0\big] = E(G_t h_t \log h_t|S_0) + E(G_t \log G_t|S_0),
\]
where we have dropped an ht from the last term because E(ht|St) = 1 and Gt is St measurable. Define ε^2_t(ht) ≐ E(ht log ht|St) as the conditional (on St) relative entropy.

We now have the tools to represent and measure misspecifications of the two components κ(s∗|y, z, a) and q(z) in (8.6). In (8.9), Mt distorts the probability distribution of Xt, ht distorts the probability of Xt conditioned on St, Gt distorts the probability of St, and mt+1 distorts the probability of Xt+1 given Xt. We use multiplication by mt+1 to distort κ and multiplication by ht to distort q. We use ε^1_t(mt+1) to measure mt+1 and ε^2_t(ht) to measure ht.

8.4 Two pairs of operators

This section introduces two pairs of risk-sensitivity operators, (R^1_t, T^1) and (R^2_t, T^2). In section 8.5, we use the T^1 and T^2 operators to define recursions that induce robust decision rules.

R^1_t and T^1

For θ > 0, let Vt+1 be an Xt+1-measurable random variable for which E[exp(−Vt+1/θ)|Xt] < ∞. Then define
\[
R^1_t(V_{t+1}|\theta) = \min_{m_{t+1} \in M_{t+1}} E(m_{t+1} V_{t+1}|X_t) + \theta\, \varepsilon^1_t(m_{t+1})
= -\theta \log E\left[\exp\left(\frac{-V_{t+1}}{\theta}\right)\Big| X_t\right]. \tag{8.10}
\]
The minimizing choice of mt+1 is
\[
m^*_{t+1} = \frac{\exp\left(\dfrac{-V_{t+1}}{\theta}\right)}{E\left[\exp\left(\dfrac{-V_{t+1}}{\theta}\right)\Big| X_t\right]} \tag{8.11}
\]


where the term in the denominator assures that E(m^*_{t+1}|Xt) = 1.

In the limiting θ = ∞ case, R^1_t(Vt+1|∞) = E(Vt+1|Xt). Notice that this expectation can depend on the hidden state. When θ < ∞, R^1_t adjusts E(Vt+1|Xt) by using a worst-case belief about the probability distribution of Xt+1 conditioned on Xt that is implied by the twisting factor (8.11), as well as adding an entropy penalty. When the conditional moment restriction E[exp(−Vt+1/θ)|Xt] < ∞ is not satisfied, we define R^1_t to be −∞ on the relevant conditioning events.

When the Xt+1-measurable random variable Vt+1 takes the special form W(yt+1, qt+1, zt+1), the R^1_t(·|θ) operator implies another operator:
\[
T^1(W|\theta)(y,q,z,a) = -\theta \log \int \exp\left(\frac{-W[\pi(s^*,y,q,a), z^*]}{\theta}\right) \tau(z^*, s^*|y,z,a)\, d\lambda(z^*)\, d\eta(s^*).
\]
The transformation T^1 maps a value function that depends on next period's state (y∗, q∗, z∗) into a risk-adjusted value function that depends on (y, q, z, a). Associated with this risk sensitivity adjustment T^1 is a worst-case distortion in the transition dynamics for the state and signal process. Let φ denote a nonnegative density function defined over (z∗, s∗) satisfying
\[
\int \phi(z^*, s^*)\, \tau(z^*, s^*|y,z,a)\, d\lambda(z^*)\, d\eta(s^*) = 1. \tag{8.12}
\]
The corresponding entropy measure is
\[
\int \log[\phi(z^*, s^*)]\, \phi(z^*, s^*)\, \tau(z^*, s^*|y,z,a)\, d\lambda(z^*)\, d\eta(s^*).
\]
In our recursive formulation, we think of φ as a possibly infinite dimensional control vector (a density function) and consider the minimization problem:
\[
\min_{\phi \ge 0} \int \Big( W[\pi(s^*,y,q,a), z^*] + \theta \log[\phi(z^*, s^*)] \Big)\, \phi(z^*, s^*)\, \tau(z^*, s^*|y,z,a)\, d\lambda(z^*)\, d\eta(s^*)
\]
subject to (8.12). The associated worst-case density conditioned on Xt is φt(z∗, s∗) τ(z∗, s∗|xt, at), where
\[
\phi_t(z^*, s^*) = \frac{\exp\left(\dfrac{-W[\pi(s^*,y_t,q_t,a_t), z^*]}{\theta}\right)}{E\left[\exp\left(\dfrac{-W[\pi(s_{t+1},y_t,q_t,a_t), z_{t+1}]}{\theta}\right)\Big| X_t\right]}. \tag{8.13}
\]
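On a finite grid the exponential-twisting formulas (8.10)-(8.11) are easy to verify directly. The sketch below is our own illustration: it checks that the closed form for R^1 coincides with the penalized objective evaluated at the minimizing distortion m*.

```python
# Discrete-state check of R^1 in (8.10)-(8.11): the exponential-twisting formula
# and the minimizing likelihood ratio m* attain min E[mV] + theta*E[m log m].
import numpy as np

def R1(V, p, theta):
    """-theta * log E[exp(-V/theta)] for a discrete conditional distribution p."""
    return -theta * np.log(p @ np.exp(-V / theta))

def worst_case_m(V, p, theta):
    """Minimizing likelihood ratio m* in (8.11)."""
    num = np.exp(-V / theta)
    return num / (p @ num)

V = np.array([1.0, 2.0, 4.0])        # continuation values on a 3-point grid
p = np.array([0.5, 0.3, 0.2])        # approximating conditional probabilities
theta = 1.5

m_star = worst_case_m(V, p, theta)
objective = p @ (m_star * V) + theta * p @ (m_star * np.log(m_star))
print(R1(V, p, theta), objective)     # the two agree
print(p @ m_star)                     # m* has conditional expectation 1
```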


R^2_t and T^2

For θ > 0, let V̄t be an Xt-measurable function for which E[exp(−V̄t/θ)|St] < ∞. Then define
\[
R^2_t(\bar V_t|\theta) = \min_{h_t \in H_t} E(h_t \bar V_t|S_t) + \theta\, \varepsilon^2_t(h_t)
= -\theta \log E\left[\exp\left(\frac{-\bar V_t}{\theta}\right)\Big| S_t\right] \tag{8.14}
\]
where Ht is the set of all nonnegative Xt-measurable random variables for which E(ht|St) = 1. The minimizing choice of ht is
\[
h^*_t = \frac{\exp\left(\dfrac{-\bar V_t}{\theta}\right)}{E\left[\exp\left(\dfrac{-\bar V_t}{\theta}\right)\Big| S_t\right]}
\]
where the term in the denominator assures that E(h^*_t|St) = 1.

When an Xt-measurable function has the special form V̄t = W(yt, qt, zt, at), (8.14) implies another operator
\[
T^2(\bar V|\theta)(y,q,a) = -\theta \log \int \exp\left[\frac{-W(y,q,z,a)}{\theta}\right] q(z)\, d\lambda(z).
\]
The associated minimization problem is:
\[
\min_{\psi \ge 0} \int \big[ W(y,q,z,a) + \theta \log \psi(z) \big]\, \psi(z)\, q(z)\, d\lambda(z)
\]
subject to (8.15), where ψ(z) is a relative density that satisfies
\[
\int \psi(z)\, q(z)\, d\lambda(z) = 1 \tag{8.15}
\]
and the entropy measure is
\[
\int [\log \psi(z)]\, \psi(z)\, q(z)\, d\lambda(z).
\]
The optimized density conditioned on St is ψt(z) qt(z), where
\[
\psi_t(z) = \frac{\exp\left(\dfrac{-W(y_t,q_t,z,a_t)}{\theta}\right)}{E\left[\exp\left(\dfrac{-W(y_t,q_t,z,a_t)}{\theta}\right)\Big| S_t\right]}. \tag{8.16}
\]
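A companion sketch (again our own illustration, with assumed toy numbers) shows how the T^2 operator and the tilted density (8.16) behave on a finite hidden-state grid: as θ falls, probability mass shifts toward hidden states with low continuation values and the risk-adjusted value falls toward the minimum.

```python
# T^2 and the worst-case hidden-state probabilities (8.16) on a finite grid.
import numpy as np

def T2(W_values, q, theta):
    """-theta * log( sum_z exp(-W(z)/theta) q(z) )."""
    return -theta * np.log(q @ np.exp(-W_values / theta))

def tilted_q(W_values, q, theta):
    """Worst-case hidden-state probabilities psi_t(z) * q_t(z) from (8.16)."""
    num = np.exp(-W_values / theta) * q
    return num / num.sum()

W_values = np.array([0.0, -1.0])          # value is lower if z = 1
q = np.array([0.8, 0.2])
for theta in (10.0, 1.0, 0.25):
    print(theta, T2(W_values, q, theta), tilted_q(W_values, q, theta))
```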


8.5 Control problems with model uncertainty

We propose robust control problems that take qt(z) as the component of the decision maker's state vector that summarizes the history of signals. The decision maker's model includes the law of motion (8.5) for q (Bayes' law) under the approximating model (8.1), (8.2), (8.3). Two recursions that generalize Bellman equation (8.6) express alternative views about the decision maker's fear of misspecification. A first recursion works with value functions that include the hidden state z as a state variable. Let
\[
W(y,q,z) = U(x,a) + E\big\{\beta W^*[\pi(s^*,y,q,a), z^*]\,\big|\,x, q\big\}, \tag{8.17}
\]
where the action a solves:
\[
W(y,q) = \max_a E\Big[ U(x,a) + E\big\{\beta W^*[\pi(s^*,y,q,a), z^*]\,\big|\,x,q,a\big\} \,\Big|\, y,q,a \Big]. \tag{8.18}
\]
The value function W(y, q, z) depends on the hidden state z, whereas the value function W in (8.6) does not. A second recursion modifies the ordinary Bellman equation (8.6), which we can express as:
\[
W(y,q) = \max_a E\Big[ U(x,a) + E\big\{\beta W^*[\pi(s^*,y,q,a)]\,\big|\,x,q,a\big\} \,\Big|\, y,q,a \Big]. \tag{8.19}
\]
Although they use different value functions, without concerns about model misspecification, formulations (8.17)-(8.18) and (8.19) imply identical control laws. Furthermore, a W(y, q) that satisfies (8.19) also obeys (8.18) by virtue of the law of iterated expectations. Because Bellman equation (8.19) is computationally more convenient, the pair (8.17)-(8.18) is not used in the standard problem without a concern for robustness. However, with a concern about robustness, a counterpart to (8.17)-(8.18) becomes useful when the decision maker wants to explore distortions of the joint conditional distribution τ(s∗, z∗|y, z, a).9 Distinct formulations emerge from (8.18) and (8.19) when we replace the conditional expectation E(·|y, q, a) with T^2(·|θ2) and the conditional expectation E(·|x, q, a) with T^1(·|θ1).

9Another way to express his concerns is that in this case the decision maker fears that (8.2) and (8.3) are both misspecified.


When θ1 = θ2 = +∞, (8.17)-(8.18) or (8.19) lead to value functions and decision rules equivalent to those from (8.6). When θ1 < +∞ and θ2 < +∞, recursions (8.17)-(8.18) and (8.19) lead to different decision rules because they take different views about the conditional distributions that the malevolent player wants to distort, or equivalently, about the aspects of the stochastic specification in the approximating model against which the decision maker seeks robustness.

Which conditional distributions to distort?

The approximating model (8.1), (8.2), (8.3) makes both tomorrow's signal s∗ and tomorrow's state z∗ functions of x. When tomorrow's value function depends on s∗ but not on z∗ as in (8.19), the minimizing player chooses to distort only κ(s∗|y, z, a), which amounts to being concerned about misspecifications of the evolution equation (8.3) for the signal and not (8.2) for the hidden state. Such a continuation value function imparts no additional incentive to distort the evolution equation (8.2) of z∗ conditioned on s∗ and x.10 Such a continuation value that depends on s∗ but not on z∗ thus imparts concerns about a limited array of distortions that ignore possible misspecification of the z∗ evolution (8.2). Therefore, when we want to direct the maximizing agent's concerns about misspecification onto the conditional distribution κ(s∗|y, z, a), we should form a current period value that depends only on the history of the signal and of the observed state. We do this in recursion (8.23) below.

However, in some situations, we might want to extend the maximizing player's concerns about misspecification to the joint distribution τ(z∗, s∗|y, z, a) of z∗ and s∗. We can do this by making tomorrow's value function for the minimizing player also depend on z∗. In recursions (8.20)-(8.21) below, we form a continuation value function that depends on z∗ and thereby extend recursions (8.17), (8.18) to incorporate concerns about misspecification of (8.2).

Thus, (8.20)-(8.21) below will induce the minimizing player to distort the distribution of z∗ conditional on (s∗, x, a), while the formulation in (8.23) will not.

10Dependence between (s∗, z∗) conditioned on x under the approximating model means that in the process of distorting s∗ conditioned on (x, a), the minimizing player may indirectly distort the distribution of z∗ conditioned on (x, a). But he does not distort the distribution of z∗ conditioned on (s∗, x, a).


Value function depends on (x, q)

By defining a value function that depends on the hidden state, we focus the decision maker's attention on misspecification of the joint conditional distribution τ(z∗, s∗|y, z, a) of (s∗, z∗). We modify recursions (8.17)-(8.18) by updating a value function according to
\[
W(y,q,z) = U(x,a) + T^1\big[\beta W^*(y^*,q^*,z^*)\,\big|\,\theta_1\big](x,q,a) \tag{8.20}
\]
after choosing an action according to
\[
\max_a \; T^2\Big( U(x,a) + T^1\big[\beta W^*(y^*,q^*,z^*)\,\big|\,\theta_1\big](x,q,a) \,\Big|\, \theta_2\Big)(y,q,a), \tag{8.21}
\]
for θ1 ≥ θ̲1, θ2 ≥ θ̲2(θ1), for θ̲1, θ̲2 that make the problems well posed.11

Updating the value function by recursion (8.20) makes it depend on (x, q), while using (8.21) to guide decisions makes actions depend only on the observable state (y, q). Thus, the continuation value W depends on unobserved states, but actions do not. To retain the dependence of the continuation value on z, (8.20) refrains from using the T^2 transformation when updating continuation values.

The fixed point of (8.20)-(8.21) is the value function for an infinite horizon problem. For the finite horizon counterpart, we begin with a terminal value function and view the right side of (8.20) as mapping next period's value function into the current period value function.

Time inconsistency of maximizing player’s preferences

In formulation (8.20)-(8.21), the current period decision maker acknowledges the dependence of discounted future returns on the current hidden state. For simplicity, suppose that we set θ1 = ∞. Then W(y, q, z) gives the discounted value of an objective conditioned on the hidden state z. That this hidden state helps predict future signals and future observable state vectors is reflected in the dependence of this value function on z. This dependence remains when we let θ1 < ∞, thus activating a concern about model misspecification conditioned on the current period value of the state z. Such dependence is also present in a commitment formulation of the problem discussed in Hansen and Sargent (2005b). In the present formulation without commitment, we use recursion (8.20) to portray a Markov perfect equilibrium of a game in which the date t maximizing decision maker (and his malevolent companions) take as given the decisions of future maximizing decision makers (and their malevolent companions).12

11Limits on θ1 and θ2 are typically needed to make the outcomes of the T^1 and T^2 operators be finite.

That the T^2 operator is applied only at the last stage of the backward induction in (8.20)-(8.21) renders the preferences of the time 0 agent dynamically inconsistent.13 The dynamic inconsistency reflects a conflict between the interests of decision makers at different times, one that vanishes when β → 1 and which we now describe.

To explore the preferences implicit in this formulation it is convenient to apply the operators R^1_t and R^2_t to continuation values. Let Vt+1 denote the continuation value of a stochastic process of actions from date t + 1 forward. This continuation value can depend on the future states. It is Xt+1 measurable but not necessarily St+1 measurable. Assess this action process at date t + 1 using R^2_{t+1}(Vt+1|θ2), which makes a robust adjustment and results in an St+1 measurable continuation value.

Consider two such continuation values, V^a_{t+1} and V^b_{t+1}, where
\[
R^2_{t+1}(V^a_{t+1}|\theta_2) \ge R^2_{t+1}(V^b_{t+1}|\theta_2). \tag{8.22}
\]
We are interested in a date t ranking of these two after we discount (8.22) and add a common current period contribution Ut to both before applying R^2_t. This results in two continuation values that are not necessarily comparable, namely, Ut + R^1_t(βV^a_{t+1}|θ1) and Ut + R^1_t(βV^b_{t+1}|θ1). For some realized signal histories, the ranking in inequality (8.22) can be reversed, even after applying R^2_t.

It is instructive to consider the special case in which Ut is St measurable. Then
\[
R^2_t\big[U_t + R^1_t(\beta V^j_{t+1}|\theta_1)\,\big|\,\theta_2\big] = U_t + R^2_t\big[R^1_t(\beta V^j_{t+1}|\theta_1)\,\big|\,\theta_2\big]
\]
for j = a, b. The source of possible intertemporal reversal of rankings is that inequality (8.22) does not imply
\[
R^2_t\big[R^1_t(\beta V^a_{t+1}|\theta_1)\,\big|\,\theta_2\big] \ge R^2_t\big[R^1_t(\beta V^b_{t+1}|\theta_1)\,\big|\,\theta_2\big].
\]

12Laibson (1997) uses a Markov perfect equilibrium of such a game to model the decisions made by someone with intertemporally inconsistent preferences coming from hyperbolic discounting.

13That dynamic inconsistency is what prompts us to model decisions as the Markov perfect equilibrium represented in recursion (8.20).


If, however, we strengthen inequality (8.22) to
\[
V^a_{t+1} \ge V^b_{t+1},
\]
then the rankings are preserved. Thus, when we limit comparisons to ones conditioned on hidden states, then intertemporal inconsistency vanishes.

In the next subsection, we propose an alternative approach that avoids the conflict that is the source of this intertemporal inconsistency at the cost of giving the hidden states a less direct role. In particular, as we shall see, this alternative approach considers value functions that depend only on (y, q) and not the hidden state z. This formulation removes an incentive to explore misspecification of the hidden state dynamics themselves and instead focuses only on how those misspecifications might affect the evolution of signals.

Value function depends on (y, q)

To focus on misspecifications of the conditional distribution κ(s∗|y, z, a), we want the minimizing player's value function to depend only on the reduced information encoded in (y, q). For this purpose, we use the following counterpart to recursion (8.19):
\[
W(y,q) = \max_a \; T^2\Big( U(x,a) + T^1\big[\beta W^*(y^*,q^*)\,\big|\,\theta_1\big](x,q,a) \,\Big|\, \theta_2\Big)(y,q,a) \tag{8.23}
\]
for θ1 ≥ θ̲1 and θ2 ≥ θ̲2(θ1). Although z∗ is excluded from the value function W∗, z may help predict the observable state y∗ or it may enter directly into the current period reward function, so application of the operator T^1 creates a value function that depends on (x, q, a), including the hidden state z. Since the malevolent agent observes z, he can distort the dynamics for the observable state conditioned on z via the T^1 operator. Subsequent application of T^2 gives a value function that depends on (y, q, a), but not z; T^2 distorts the hidden state distribution. The decision rule sets action a as a function of (y, q). The fixed point of Bellman equation (8.23) gives the value function for an infinite horizon problem. For finite horizon problems, we iterate on the mapping defined by the right side of (8.23), beginning with a known terminal value function. Recursion (8.23) extends the recursive formulation of risk-sensitivity with discounting advocated by Hansen and Sargent (1995b) to situations with a hidden state.
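The sketch below (our own toy construction, with assumed helper functions and numbers) evaluates one step of the right side of (8.23) on finite grids: T^1 is applied conditional on each hidden state, and T^2 then makes a robust adjustment over the hidden-state distribution q(z). For large θ1 and θ2 the computation approximates the non-robust Bellman step for (8.6).

```python
# A schematic one-step evaluation of the robust recursion (8.23) on finite grids,
# composing T^1 (conditioned on the hidden state) with T^2 (over q(z)).
import numpy as np

def robust_rhs(y, q, actions, U, kappa, pi_y, pi_q, W_next,
               beta=0.95, theta1=1.0, theta2=1.0):
    best = -np.inf
    for a in actions:
        # T^1: for each hidden state z, a risk-adjusted continuation value
        T1_vals = np.empty(len(q))
        for z in range(len(q)):
            probs = np.array([kappa(s, y, z, a) for s in range(n_signals)])
            cont = np.array([beta * W_next(pi_y(s, y, a), pi_q(s, y, q, a))
                             for s in range(n_signals)])
            T1_vals[z] = -theta1 * np.log(probs @ np.exp(-cont / theta1))
        # T^2: a robust adjustment over the hidden-state distribution q(z)
        inner = np.array([U(y, z, a) + T1_vals[z] for z in range(len(q))])
        value = -theta2 * np.log(q @ np.exp(-inner / theta2))
        best = max(best, value)
    return best

n_signals = 2
kappa = lambda s, y, z, a: [[0.9, 0.1], [0.2, 0.8]][z][s]
pi_y  = lambda s, y, a: s
def pi_q(s, y, q, a):
    joint = np.array([kappa(s, y, z, a) * q[z] for z in range(2)])
    return joint / joint.sum()
U      = lambda y, z, a: -(a - z) ** 2
W_next = lambda y, q: -q[1]        # an arbitrary candidate continuation value
print(robust_rhs(0, np.array([0.5, 0.5]), [0, 1], U, kappa, pi_y, pi_q, W_next))
```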


A third formulation that forgets that z is hidden

It is interesting to contrast the above approaches with an alternative one that is feasible for problems in which z does not appear directly in U but instead either y appears alone or y and q both appear. Then one could proceed by simply applying a single risk-sensitivity operator. For such problems, the Bellman equation without concerns about robustness (8.6) could also be expressed as
\[
W(y,q) = \max_{a \in A} \; U(y,q,a) + \beta \int W^*\big[\pi(s^*,y,q,a)\big]\, \varsigma(s^*|y,q,a)\, d\eta(s^*).
\]
The analysis of robust control problems without hidden states in Hansen and Sargent (1995b) and Hansen et al. (2006b) could be applied to obtain robust decision rules by taking (y, q) as the observed state. Decision rules that are robust to misspecification of ς(s∗|y, q, a) can be obtained by iterating on
\[
W(y,q) = \max_{a \in A} \; U(y,q,a) + T^1\big[\beta W^*(y^*,q^*)\,\big|\,\theta\big](y,q,a).
\]
This approach absorbs Bayes' law into the transition law for the state and seeks robustness to misspecification of ς(s∗|y, q, a). In contrast, the formulations in (8.20)-(8.21) and (8.23) distinguish distortions to κ(s∗|y, z, a) and to q(z) and seek robustness to misspecifications of each of them separately.

Advantages of our specification

We take the distribution qt(z) as a state variable and explore misspecifications of it. An alternative way to describe a decision maker's fears of misspecification would be to perturb the evolution equation for the hidden state (8.2) directly. Doing that would complicate the problem substantially by requiring us to solve a filtering problem for each perturbation of (8.2). Our formulation avoids multiple filtering problems by solving one and only one filtering problem under the approximating model. The transition law πq for q(z) in (8.5) becomes a component of the approximating model.

When θ1 = +∞ but θ2 < +∞, the decision maker trusts the signal dynamics κ(s∗|y, z, a) but distrusts q(z). When θ2 = +∞ but θ1 < +∞, the situation is reversed. The two-θ formulation thus allows the decision maker to disentangle his suspicions about these two aspects of the model.


Before saying more about the two-θ formulation, the next section explores some ramifications of the special case in which θ1 = θ2 and how it compares to the single θ specification that prevails in a related decision problem under commitment.

8.6 The θ1 = θ2 case

For the purpose of studying intertemporal consistency and other features of the associated worst case models, it is interesting to compare the outcomes of recursions (8.20)-(8.21) or (8.23) with the decision rule and worst case model described by Hansen and Sargent (2005b) in which at time 0 the maximizing and minimizing players in a zero-sum game commit to a sequence of decision rules and a single worst case model, respectively. Because there is a single robustness parameter θ in this "commitment model", it is natural to make this comparison for the special case in which θ1 = θ2.

A composite operator T^2 ◦ T^1 when θ1 = θ2

When a common value of θ appears in the two operators, the sequential application T^2 T^1 can be replaced by a single operator:
\[
T^2 \circ T^1\big[U(x,a) + \beta W(y^*,q^*)\big](y,q,a)
= -\theta \log \int \exp\left( -\frac{U(x,a) + \beta W[\pi(s^*,y,q,a)]}{\theta} \right) \kappa(s^*|y,z,a)\, q(z)\, d\eta(s^*)\, d\lambda(z).
\]
This operator is the outcome of a portmanteau minimization problem over a single relative density ϕ(s∗, z) ≥ 0 that satisfies14
\[
\int \varphi(s^*, z)\, \kappa(s^*|y,z,a)\, q(z)\, d\eta(s^*)\, d\lambda(z) = 1,
\]
where ϕ is related to φ and ψ defined in (8.12) and (8.15) by
\[
\varphi(s^*, z) = \int \phi(z^*, s^*|z)\, \psi(z)\, q^*(z^*)\, d\lambda(z^*),
\]

14Recall that applying T^1 and T^2 separately amounts to minimizing over separate relative densities φ and ψ.


where this notation emphasizes that the choice of φ can depend on z. The entropy measure for ϕ is
\[
\int [\log \varphi(s^*, z)]\, \varphi(s^*, z)\, \kappa(s^*|y,z,a)\, q(z)\, d\eta(s^*)\, d\lambda(z),
\]
and the minimizing composite distortion ϕ to the joint density of (s∗, z) given St is
\[
\varphi_t(s^*, z) = \frac{\exp\left( -\dfrac{U(y_t,z,a_t) + \beta W[\pi(s^*,y_t,q_t,a_t)]}{\theta} \right)}{E\left[\exp\left( -\dfrac{U(y_t,z,a_t) + \beta W[\pi(s_{t+1},y_t,q_t,a_t)]}{\theta} \right)\Big| S_t\right]}. \tag{8.24}
\]

Special case U(x, a) = U(y, a)

When U(x, a) = U(y, a), the current period utility drops out of formula (8.24) for the worst-case distortion to the distribution, and it suffices to integrate with respect to the distribution ς(s∗|y, q, a) that we constructed in (8.4) by averaging κ over the distribution of the hidden state. Probabilities of future signals compounded by the hidden state are simply averaged out using the state density under the benchmark model, a reduction of a compound lottery that would not be possible if different values of θ were to occur in the two operators.
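This reduction can be confirmed numerically. The snippet below is our own check on a finite grid: with a common θ and a continuation value that depends only on s∗, applying T^2 after T^1 gives exactly the single risk-sensitivity operator that integrates against the mixture density ς(s∗|y, q, a).

```python
# Check of the compound-lottery reduction: with a common theta, T^2(T^1(.))
# equals a single operator that integrates against varsigma = sum_z kappa(.|z) q(z).
import numpy as np

theta = 0.8
q = np.array([0.6, 0.4])                              # distribution over hidden z
kappa = np.array([[0.7, 0.2, 0.1],                    # kappa[z, s*] = prob of s* given z
                  [0.1, 0.3, 0.6]])
W_next = np.array([1.0, 0.4, -0.3])                   # beta * W*(pi(s*,y,q,a)) on the s* grid

# T^1 conditioned on each z, then T^2 over q(z)
T1 = -theta * np.log(kappa @ np.exp(-W_next / theta))
composite = -theta * np.log(q @ np.exp(-T1 / theta))

# single operator against the mixture density varsigma
varsigma = q @ kappa
single = -theta * np.log(varsigma @ np.exp(-W_next / theta))

print(composite, single)      # identical up to floating point error
```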

To understand these claims, we deduce a useful representation of εt(mt+1, ht) by solving
\[
\varepsilon_t(m^*_{t+1}, h^*_t) \equiv \min_{m_{t+1} \in M_{t+1},\; h_t \in H_t} E\big[h_t\, \varepsilon^1_t(m_{t+1})\,\big|\,S_t\big] + \varepsilon^2_t(h_t)
\]
subject to E(mt+1 ht|St+1) = gt+1, where E(gt+1|St) = 1, a constraint that we impose because our aim is to distort expectations of St+1-measurable random variables given current information St. The minimizers are
\[
m^*_{t+1} = \begin{cases} \dfrac{g_{t+1}}{E(g_{t+1}|X_t)} & \text{if } E(g_{t+1}|X_t) > 0 \\ 1 & \text{if } E(g_{t+1}|X_t) = 0 \end{cases}
\]
and h^*_t = E(gt+1|Xt). Therefore, m^*_{t+1} h^*_t = gt+1 and the minimized value of the objective is
\[
\varepsilon_t(m^*_{t+1}, h^*_t) = E\big[g_{t+1} \log(g_{t+1})\,\big|\,S_t\big] \equiv \varepsilon_t(g_{t+1}). \tag{8.25}
\]


Thus, in penalizing distortions to continuation values that are St-measurable, it suffices to use the entropy measure εt defined in (8.25) and to explore distortions to the conditional probability of St+1-measurable events given St. This is precisely what the gt+1 random variable accomplishes. The gt+1 associated with T^2 T^1 in the special case in which U(x, a) = U(y, a) implies a distortion φt in equation (8.13) that depends on s∗ alone. The iterated operator T^2 T^1 can be regarded as a single risk-sensitivity operator that functions like T^1:
\[
T^2 T^1\big[U(y,a) + \beta W^*(y^*,q^*)\big](y,q,a)
= U(y,a) - \theta \log \int \exp\left( \frac{-\beta W^*(\pi(s^*,y,q,a))}{\theta} \right) \varsigma(s^*|y,q,a)\, d\eta(s^*).
\]
In Hansen and Sargent (2006d), we describe how to compute this operator for linear quadratic problems.

Role of absolute continuity and relation to commitment solutions

Among the outcomes of iterations on the recursions (8.20)-(8.21) or (8.23) of section 8.5 are time-invariant functions that map (yt, qt) into a pair of nonnegative random variables (mt+1, ht). For the moment, ignore the distortion ht and focus exclusively on mt+1. Through (8.8), the time-invariant rule for mt+1 can be used to construct a martingale {Mt : t ≥ 0}. This martingale implies a limiting probability measure on X∞ = ∨t≥0 Xt via the Kolmogorov extension theorem. The implied probability measure on X∞ will typically not be absolutely continuous over the entire collection of limiting events in X∞. Although the martingale converges almost surely by virtue of Doob's martingale convergence theorem, without absolute continuity, the limiting random variable will not have unit expectation. This implies that concerns about robustness persist in a way that they don't in a class of robust control problems under commitment that are studied, for example, by Whittle (1990) and Hansen and Sargent (2005b).15

15The product decomposition (8.8) of Mt implies an additive decomposition of entropy:
\[
E(M_t \log M_t|S_0) - E(M_0 \log M_0|S_0) = \sum_{j=0}^{t-1} E\big[M_j\, E(m_{j+1} \log m_{j+1}|X_j)\,\big|\,S_0\big]. \tag{8.26}
\]


Problem formulation

Let M∞ be a nonnegative random variable that is measurable with respect to X∞, with E(M∞|S0) = 1. For a given action process {at : t ≥ 0} adapted to {Xt : t ≥ 0}, let \(V_\infty \doteq \sum_{t=0}^{\infty} \beta^t U(x_t, a_t)\) subject to (8.1)-(8.2). Suppose that θ > 0 is such that E[exp(−V∞/θ)|S0] < ∞. Then
\[
R^1_\infty(V_\infty) \doteq \min_{M_\infty \ge 0,\; E(M_\infty|S_0)=1} E(M_\infty V_\infty|S_0) + \theta\, E(M_\infty \log M_\infty|S_0) \tag{8.27}
\]
\[
= -\theta \log E\left[\exp\left(-\frac{1}{\theta} V_\infty\right)\Big| S_0\right]. \tag{8.28}
\]

This static problem has minimizer M∗∞ =

exp(− 1θV∞)

E[exp(− 1θV∞)|S0]

that implies a

martingale M∗t = E (M∗

∞|Xt) .16 Control theory interprets (8.28) as a risk-

sensitive adjustment of the criterion V∞ (e.g., see Whittle (1990)) and getsdecisions that are robust to misspecifications by solving

maxat∈At,t≥0

−θ logE[exp

(−1

θV∞

) ∣∣∣S0

].

In a closely related setting, Whittle (1990) obtained time-varying decisionrules for at that converge to ones that ignore concerns about robustness(i.e., those computed with θ = +∞).

The dissipation of concerns about robustness in this commitment prob-lem is attributable to setting β ∈ (0, 1) while using the undiscounted formof entropy in the criterion function (8.27). Those features lead to the exis-tence of a well defined limiting random variableM∞ with expectation unity(conditioned on S0), which means that tail events that are assigned prob-ability zero under the approximating model are also assigned probabilityzero under the distorted model.17

Setting E(M0|S0) = 1 means that we distort probabilities conditioned on S0.16See Dupuis and Ellis (1997). While robust control problems are often formulated as

deterministic problems, here we follow Petersen et al. (2000a) by studying a stochasticversion with a relative entropy penalty.

17Because all terms on the right side of (8.26) are nonnegative, the sequence

t−1∑j=0

Mj−1E (mj logmj |Xj−1)

is increasing. Therefore, it has a limit that might be +∞ with positive probability.

Page 297: Maskin ambiguity book mock 2 - New York University

8.6. The θ1 = θ2 case 289

Persistence of robustness concerns without commitment

In our recursive formulations (8.20)-(8.21) and (8.23) of section 8.5, thefailure of the worst-case nonnegative martingale {Mt : t ≥ 0} to convergeto a limit with expectation one (conditioned on S0) implies that the dis-torted probability distribution on X∞ is not absolutely continuous withrespect to the probability distribution associated with the approximatingmodel. This feature sustains enduring concerns about robustness and per-mits time-invariant robust decision rules, in contrast to the outcomes withdiscounting in Whittle (1990) and Hansen and Sargent (2005b), for exam-ple. For settings with a fully observed state vector, Hansen and Sargent(1995b) and Hansen et al. (2006b) discounted entropy in order to formulaterecursive problems that yield time-invariant decision rules and enduringconcerns about robustness. The present paper extends these recursive for-mulations to problems with unobserved states.

Dynamic inconsistency of worst-case probabilities

about hidden states

This section links robust control theory to recursive models of uncertaintyaversion by exploring aspects of the worst case probability models thatemerge from the recursions defined in section 8.5. Except in a special casethat we describe in subsection 8.6, those recursions achieve dynamic consis-tency of decisions by sacrificing dynamic consistency of beliefs about hiddenstate variables. We explore how this happens. Until we get to the specialcase analyzed in subsection 8.6, the arguments of this subsection will alsoapply to the general case in which θ1 = θ2.

Problems (8.10) and (8.14) that define R1t and R2

t , respectively, implyworst-case probability distributions that we express as a pair of Radon-

Thus, limt→∞E(Mt logMt|S0) converges. Hansen and Sargent (2005b) show that whenthis limit is finite almost surely, the martingale sequence {Mt : t ≥ 0} converges in thesense that limt→∞ E ( |Mt −M∞| |S0) = 0, where M∞ is measurable with respect toX∞

.=∨∞

t=0 Xt. The limiting random variableM∞ can be used to construct a probabilitymeasure on X∞ that is absolutely continuous with respect to the probability measureassociated with the approximating model. Moreover, Mt = E(M∞|Xt). When the im-plied M∞ is strictly positive with probability one, the distorted probability measure willbe equivalent with the original probability measure. In this case, tail events that areassigned probability measure zero under either measure are assigned zero under the otherone.

Page 298: Maskin ambiguity book mock 2 - New York University

290 Chapter 8. Robust Estimation and Control Without Commitment

Nikodym derivatives (m∗t+1, h

∗t ). Are these probability distortions consis-

tent with next period’s distortion h∗t+1? Not necessarily, because we havenot imposed the pertinent consistency condition on these beliefs. In partic-ular, our use of mt+1, ht to distort two conditional distributions each periodoverdetermines a distortion to the distribution of xt+1 conditional on St+1:because mt+1 distorts the probabilities of Xt+1 events conditional on Xt

and ht distorts the probabilities of Xt events conditioned on St, mt+1ht dis-torts the probabilities of Xt+1 events conditional on St. Given the distortedprobabilities of Xt+1 events conditioned on St, we can deduce the probabilitydistortion of Xt+1 events conditional on St+1 (because St ⊂ St+1 ⊂ Xt+1).If we had required the decision maker at time t + 1 to adhere to this dis-tortion, he would not be free to choose ht+1 anew at time t + 1. Thus,except when a special condition that we lay out in the next subsection ismet, the decision maker’s worst-case beliefs about the distribution of xt+1

conditional on St+1 will not be time-consistent. This is a price that we payto attain a recursive formulation in which qt(z) remains a state variable forour formulation of the robust estimation and control problem.

A belief consistency condition

To deduce a sufficient condition for time consistency, recall that the implied{M∗

t+1 : t ≥ 0} should be a martingale. Decompose M∗t+1 in two ways:

M∗t+1 = m∗

t+1h∗tG

∗t = h∗t+1G

∗t+1.

These equations involve G∗t+1 and G

∗t , both of which we have ignored in the

recursive formulation of section 8.5. Taking expectations conditioned onSt+1 on both sides of m∗

t+1h∗tG

∗t = ht+1G

∗t+1 yields

G∗tE

(m∗t+1h

∗t |St+1

)= G∗

t+1.

Thus,g∗t+1 = E

(m∗t+1h

∗t |St+1

)is the implied multiplicative increment for the candidate martingale {G∗

t :t ≥ 0} adapted to the signal filtration.

Claim 8.6.1. A sufficient condition for the distorted beliefs to be time con-sistent is that the process {h∗t : t ≥ 0} should satisfy:

h∗t+1 =

{m∗

t+1h∗t

E(m∗t+1h

∗t |St+1)

if E(m∗t+1h

∗t |St+1

)> 0

1 if E(m∗t+1h

∗t |St+1

)= 0.

(8.29)

Page 299: Maskin ambiguity book mock 2 - New York University

8.6. The θ1 = θ2 case 291

This condition is necessary if G∗t+1 > 0.18

The robust control problem under commitment analyzed by Hansen andSargent (2005b) satisfies condition (8.29) by construction: at time 0 a sin-gle minimizing player chooses a pair (m∗

t+1, h∗t ) that implies next period’s

h∗t+1. However, in the recursive games defined in the recursions (8.20)-(8.21)and (8.23) in section 8.5, the date t+ 1 minimizing agent can deviate fromthe h∗t+1 that is implied by the (m∗

t+1, h∗t ) pair chosen by the date t mini-

mizing agent. The pair (m∗t+1, h

∗t ) gives one distortion of the distribution

of the hidden state (conditioned on St+1) and h∗t+1 gives another. We donot require that these agree, and, in particular, do not require that theprobabilities of events in Xt+1 be distorted in the same ways by the date tdetermined worst-case distribution (conditioned on St+1) and the date t+1worst-case distribution (conditioned on St+1).

A conflict can arise between these worst-case distributions because choos-ing an action is forward-looking, while estimation of z is backward looking.Dynamic inconsistency of any kind is a symptom of conflicts among the in-terests of different decision makers, and that is the case here. The two-playergames that define the evaluation of future prospects (T1) and estimation ofthe current position of the system (T2) embody different orientations – T1

looking to the f uture, T2 focusing on an historical record of signals.

The inconsistency of the worst-case beliefs pertains only to the decisionmaker’s opinions about the hidden state. If we ignore hidden states andfocus on signals, we can assemble a consistent distorted signal distributionby constructing g∗t+1 = E

(m∗t+1h

∗t |St+1

)and noting that E

(g∗t+1|St

)= 1,

so that g∗t+1 is the implied one-period distortion in the signal distribution.We can construct a distorted probability distribution over events in St+1 by

18This consistency condition arguably could be relaxed for the two player game un-derlying (8.23). Although we allow mt+1 to depend on the signal st+1 and the hiddenstate zt+1, the minimizing solution associated with recursions (8.23) depends only on thesignal st+1. Thus we could instead constrain the minimizing agent in his or her choiceof mt+1 and introduce a random variable mt+1 that distorts the probability distributionof zt+1 conditioned on st+1 and Xt. A weaker consistency requirement is

h∗t+1 =mt+1m

∗t+1h

∗t

E(mt+1m∗

t+1h∗t |St+1

)for some mt+1 with expectation equal to one conditioned on st+1 and Xt.

Page 300: Maskin ambiguity book mock 2 - New York University

292 Chapter 8. Robust Estimation and Control Without Commitment

using

G∗t+1 =

t+1∏j=1

g∗j . (8.30)

Under this interpretation, the pair (m∗t+1, h

∗t ) is only a device to construct

g∗t+1. When the objective function U does not depend directly on the hiddenstate vector z, as is true in many economic problems, the consistent set ofdistorted probabilities defined by (8.30) describes the events that directlyinfluence the decision maker’s well being.

Discounting and payoffs influenced by hidden states

are the source of intertemporal inconsistency

If β = 1 and U(x, a) does not depend on the hidden state, we can show thatthe distortions (mt+1, ht) implied by our recursions satisfy the restrictionrequired for Claim 8.6.1 and so are temporally consistent. Therefore, in thisspecial case, the recursive games in section 8.5 imply the same decisions andworst case distortions as the game under commitment analyzed by Hansenand Sargent (2005b). For simplicity, suppose that we fix an action process{at : t ≥ 0} and focus exclusively on assigning distorted probabilities.Let {Vt : t ≥ 0} denote the process of continuation values determinedrecursively and supported by choices of worst-case models.

Consider two operators R1t and R2

t with a common θ. The operator R1t

implies a worst-case distribution for Xt+1 conditioned on Xt with densitydistortion:

m∗t+1 =

exp(−Vt+1

θ

)E[exp

(−Vt+1

θ

)|Xt

] .The operator R2

t implies a worst-case model for the probability of Xt con-ditioned on St with density distortion:

h∗t =E[exp

(−Vt+1

θ

)|Xt

]E[exp

(−Vt+1

θ

)|St] .

Page 301: Maskin ambiguity book mock 2 - New York University

8.7. Implied worst case model of signal distortion 293

Combining the distortions gives

m∗t+1h

∗t =

exp(−Vt+1

θ

)E[exp

(−Vt+1

θ

)|St] .

To establish temporal consistency, from Claim 8.6.1 we must show that

h∗t+1 =exp

(−Vt+1

θ

)E[exp

(−Vt+1

θ

)|St+1

]where

h∗t+1.=E[exp

(−Vt+2

θ

)|Xt+1

]E[exp

(−Vt+2

θ

)|St+1

] .This relation is true when β = 1 and U does not depend on the hidden statez. To accommodate β = 1, we shift from an infinite horizon problem to afinite horizon problem with a terminal value function. From value recursion(8.20) and the representation of R1

t+1 in (8.10),

exp

(−Vt+1

θ

)∝ E

[exp

(−Vt+2

θ

)|Xt+1

],

where the proportionality factor is St+1 measurable. The consistency re-quirement for h∗t+1 is therefore satisfied.

The preceding argument isolates the role that discounting plays in ren-dering the worst case beliefs over the hidden state time inconsistent. Heuris-tically, the games defined by the recursions (8.20)-(8.21) or (8.23) imply in-tertemporal inconsistency when β < 1 because the decision maker discountsboth current period returns and current period increments to entropy; whilein the commitment problem analyzed in Hansen and Sargent (2005b), thedecision maker discounts current period returns but not current period in-crements to entropy.

8.7 Implied worst case model of signal

distortion

The martingale (relative to St) increment gt+1 = E (mt+1ht|St) distortsthe distribution of the date t + 1 signal given information St generated by

Page 302: Maskin ambiguity book mock 2 - New York University

294 Chapter 8. Robust Estimation and Control Without Commitment

current and past signals. For the following three reasons, it is interestingto construct an implied g∗t+1 from the m∗

t+1 associated with R1t or T1 and

the h∗t associated with R2t or T

2.

First, actions depend only on signal histories. Hidden states are usedeither to depict the underlying uncertainty or to help represent preferences.However, agents cannot take actions contingent on these hidden states, onlyon the signal histories.

Second, in decentralized economies, asset prices can be characterized bystochastic discount factors that equal the intertemporal marginal rates ofsubstitution of investors who are off corners and that depend on the dis-torted probabilities these investors use to value contingent claims. Sincecontingent claims to consumption can depend only on signal histories (andnot on hidden states), the distortion to the signal distribution is the twistto asset pricing that is contributed by investors’ concerns about model mis-specification. In particular, under the approximating model, gt+1

E[gt+1|St]be-

comes a multiplicative adjustment to the ordinary stochastic discount factorfor a representative agent (e.g., see Hansen et al. (1999)). It follows that thetemporal inconsistency of worst case beliefs over hidden states discussed insection 8.6 does not prevent appealing to standard results on the recursivestructure of asset pricing in settings with complete markets.19

Third, Anderson et al. (2003) found it useful to characterize detectionprobabilities using relative entropy and an alternative measure of entropydue to Chernoff (1952). Chernoff (1952) showed how detection error prob-abilities for competing models give a way to measure model discrepancy.Models are close when they are hard to distinguish with historical data.Because signal histories contain all data that are available to a decisionmaker, the measured entropy from distorting the signal distribution ispertinent for statistical discrimination. These lead us to measure eitherE(g∗t+1 log g

∗t+1|St

)or Chernoff’s counterpart, as in Anderson et al. (2003).20

Our characterizations of worst case models have conditioned implicitlyon the current period action. The implied distortion in the signal densityis: ∫

φt(z∗, s∗)τ(z∗, s∗|yt, z, , at)ψt(z)qt(z)dλ(z∗)dλ(z)

19See Johnsen and Donaldson (1985c).20Anderson et al. (2003) show a close connection between the market price of risk and

a bound on the error probability for a statistical test for discriminating the approximatingmodel from the worst case model.

Page 303: Maskin ambiguity book mock 2 - New York University

8.8. A recursive multiple priors model 295

where φt is given by formula (8.13) and ψt is given by (8.16). When aBellman-Isaacs condition is satisfied,21 we can substitute for the controllaw and construct a conditional worst case conditional probability densityfor st+1 as a function of the Markov state (yt, qt). The process {(yt+1, qt+1) :t ≥ 0} is Markov under the worst case distribution for the signal evolution.The density qt remains a component of the state vector.

8.8 A recursive multiple priors model

To attain a notion of dynamic consistency when the decision maker hasmultiple models, Epstein and Schneider (2003a,b) advocate a formulationthat, when translated into our setting, implies time varying values for θ1and θ2. Epstein and Schneider advocate sequential constraints on sets oftransition probabilities for signal distributions. To implement their proposalin our context, we can replace our fixed penalty parameters θ1, θ2 with twosequences of constraints on relative entropy.

In particular, suppose that

ε1t (mt+1) ≤ κ1t (8.31)

where κ1t is a positive random variable in Xt, and

ε2t (ht) ≤ κ2t (8.32)

where κ2t is a positive random variable in St. If these constraints bind, theworst-case probability distributions are again exponentially tilted. We cantake θ1t to be the Xt-measurable Lagrange Multiplier on constraint (8.31),

where m∗t+1 ∝ exp

(−Wt+1

θ1t

)and θ1t solves ε1t (m

∗t+1) = κ1t . The counterpart

to R1t (Wt+1) is

C1t (Wt+1)

.=E[Wt+1 exp

(−Wt+1

θ1t

)|Xt

]E[exp

(−Wt+1

θ1t

)|Xt

] .

Similarly, let θ2t be the St-measurable Lagrange multiplier on constraint

(8.32), where h∗t ∝ exp(−Wt

θ2t

), and θ2t solves ε

2t (h

∗t ) = κ2t . The counterpart

21For example, see Hansen et al. (2006b) or Hansen and Sargent (2008d).

Page 304: Maskin ambiguity book mock 2 - New York University

296 Chapter 8. Robust Estimation and Control Without Commitment

to R2t (Wt) is

C2t (Wt)

.=E[Wt exp

(−Wt

θ2t

)|St]

E[exp

(−Wt

θ2t

)|St] .

These constraint problems lead to natural counterparts to the operators T1

and T2.Constraint formulations provide a justification for making θ1 and θ2

state- or time-dependent. Values of θ1 and θ2 would coincide if the twoconstraints were replaced by a single entropy constraint E [htε

1t (mt+1)|St]+

ε2t (ht) ≤ κt, where κt is St-measurable. Liu et al. (2005) and Maenhout(2004) give other reasons for making the robustness penalty parametersstate dependent.22 With such state dependence, it can still be useful todisentangle misspecifications of the state dynamics and the distribution ofthe hidden state given current information. Using separate values for θ1and θ2 achieves that.

8.9 Risk sensitivity and compound lotteries

Jacobson (1973) linked a concern about robustness, as represented in thefirst line of (8.10), to risk sensitivity, as conveyed in the second line of (8.10).That link has been exploited in the control theory literature, for example, byWhittle (1990). Our desire to separate the concern about misspecified statedynamics from concern about misspecifying the distribution of the stateinspires two risk-sensitivity operators. Although our primary interest is tolet a decision maker respond to model misspecification, our two operatorscan also be interpreted in terms of enhanced risk aversion.23

Risk-sensitive interpretation of R1t

The R1t operator has an alternative interpretation as a risk-sensitive ad-

justment to continuation values that expresses how a decision maker who

22These authors consider problems without hidden states, but their motivation forstate dependence would carry over to decision problems with hidden states.

23Using detection probabilities, Anderson et al. (2003) describe alternative senses inwhich the risk-sensitivity and robustness interpretations are and are not observationallyequivalent. We intend eventually to study the interesting issues that arise in extendingdetection error probabilities to discipline the choice of θ1, θ2 pairs.

Page 305: Maskin ambiguity book mock 2 - New York University

8.10. Another example 297

has no concern about robustness prefers to adjust continuation values fortheir risk. The literature on risk-sensitive control uses adjustments of thesame logE exp form that emerge from an entropy penalty and a concern forrobustness, as asserted in (8.10). There are risk adjustments that are moregeneral than those of the logE exp form associated with risk-sensitivity. Inparticular, we could follow Kreps and Porteus (1978c) and Epstein and Zin(1989a) in relaxing the assumption that a temporal compound lottery canbe reduced to a simple lottery without regard to how the uncertainty isresolved, which would lead us to adjust continuation values by

R1t (Vt+1) = φ−1 (E [φ(Vt+1)|Xt])

for some concave increasing function φ. The risk-sensitive case is the spe-cial one in which φ is an exponential function. We focus on the specialrisk-sensitivity logE exp adjustment because it allows us to use entropyto interpret the resulting adjustment as a way of inducing robust decisionrules.

R2t and the reduction of compound lotteries

While (8.16) shows that the operator R2t assigns a worst-case probabil-

ity distribution, another interpretation along the lines of Segal (1990),Klibanoff et al. (2005a), and Ergin and Gul (2009) is available. This oper-ator adjusts for state risk differently than does the usual Bayesian modelaveraging approach. Specifically, we can regard the transformation R2

t asa version of what Klibanoff et al. (2005a) call constant ambiguity aversion.More generally, we could use

R2t (Vt) = ψ−1E

[ψ(Vt)|St

]for some concave increasing function ψ. Again, we use the particular‘logE exp’ adjustment because of its explicit link to entropy-based robust-ness.

8.10 Another example

Hansen and Sargent (2006a) follow Tallarini (2000a) and start with a rep-resentative consumer who, if he did not want to make a risk-sensitivity or

Page 306: Maskin ambiguity book mock 2 - New York University

298 Chapter 8. Robust Estimation and Control Without Commitment

robustness adjustment, would value streams of log consumption ct accordingto

Vt = (1− β)ct + EtβVt+1.

But he wants to make multiple risk sensitivity adjustments to reflect multi-ple doubts about the stochastic specification of consumption growth. Therepresentative consumer has two submodels for consumption growth st+1 ≡ct+1 − ct, each of which has the state space form

ζt+1(ι) = A(ι)ζt(ι) + C(ι)wt+1

ct+1 − ct ≡ st+1 = D(ι)ζt(ι) +G(ι)wt+1

where {wt+1} is an iid Gaussian process with mean 0 and covariance I andζ0(ι) is normally distributed with mean ζ0(ι) and covariance matrix Σ0(ι).Denote the submodels ι ∈ {0, 1} and suppose that the representative con-sumer attaches probability pt = E(ι|St) to model 1 at time t These proba-bilities can be computed by using Bayes rule and data st = [st, st−1, . . . , s1].

Hansen and Sargent (2006a) specify submodel ι = 0 so that it makesconsumption growth be an i.i.d. Gaussian process with an unknown mean.Submodel ι = 1 is like, but not identical to, a model of Bansal and Yaron(2004) that makes consumption growth contain a difficult to detect persis-tent component. In addition to the uncertainty about shocks wt+1 assumedby Bansal and Yaron, one component of ζ(1) is a constant conditional meanof consumption that is unknown to the representative consumer. This fea-ture would increase the risk faced by our representative consumer relativeto Bansal and Yaron’s, even if he set p0 = 1. The representative learnsabout the mean consumption growth parameters as well as other parts ofthe hidden state zt = [ζt(0), ζt(1), ι].

The results of applying Bayes’ law to submodel ι can be represented interms of an innovations representation that takes the form

ζt+1(ι) = A(ι)ζt(ι) +K[Σt(ι), ι]wt+1(ι)

Σt+1(ι) = A(ι)Σt(ι)A(ι)′ + C(ι)C(ι)′ −K[Σt(ι), ι][A(ι)Σt(ι)A(ι)

′ + C(ι)G(ι)′]′

st+1 = D(ι)ζt + wt+1(ι)

where

K[Σt(ι), ι].= [A(ι)Σt(ι)D(ι)′ + C(ι)G(ι)′][D(ι)Σt(ι)D(ι)′ +G(ι)G(ι)′]−1,

Page 307: Maskin ambiguity book mock 2 - New York University

8.10. Another example 299

ζt+1(ι) = E[ζt+1|st, ι], wt+1(ι) is the forecast error for the signal (i.e., the‘innovation’), and Σt(ι) is the covariance matrix for ζt(ι) − ζt(ι) condi-tioned on ι and the signal history through date t. Evidently, in this model,ζt(ι),Σt(ι), ι = 0, 1, and pt are sufficient statistics for the joint distributionqt(z).

Hansen and Sargent (2006a) apply recursions (8.20), (8.21) to form thestochastic discount factor implied by a representative consumer who is con-cerned about misspecifications of the following distributions: (i) the distri-butions of (zt+1, st+1) conditioned on [ι, ζt(ι)]; (ii) the distributions of ζt(ι)conditioned on [ι,St]; and (iii) the distributions of ι, conditional on St. Therepresentative consumer of Hansen and Sargent (2006a) applies T1 to ad-just for his suspicion about (i) and iterates on (8.20) to find valuations asfunctions of ζ(ι), ι. The representative consumer makes adjustment (8.21)by applying T2 first to adjust the distribution mentioned in (ii). He thenapplies another T2 operator to adjust for suspicion of the distribution men-tioned in (iii).24 The implied Radon-Nikodym derivative that perturbs thedistribution of st+1 = ct+1 − ct conditional on St serves as a multiplicativeadjustment to the stochastic discount factor; in a T1-only model, Hansenet al. (1999) dubbed its conditional standard deviation the market priceof model uncertainty. Hansen and Sargent (2006a) study market prices ofmodel uncertainty that emerge from the setting described here and inves-tigate how it compares to ones that emerge from the T1 only models ofHansen et al. (1999) and Tallarini (2000a).

The distributions mentioned in (i) and (ii) of the previous paragraphare both Gaussian, while the one in (iii) is a scalar ∈ (0, 1). Because thelogarithmic preference specification, the value function for problem posedin section 8.5 is affine in ζ , c. As a result the calculations in this modelbecome very easy – the Kalman filter does the hard work in implementingBayes’ Law and the calculations of T1,T2 for the linear-quadratic Gaus-sian model in Hansen and Sargent (2006d) apply. The assumption thatA(ι), C(ι), D(ι), G(ι) are known accounts for this simplicity. Extendingthe model to let some elements in these matrices be unknown enriches thescope for modeling learning about unknown parameters at the cost of mak-ing the filtering problem nonlinear and so pushing it beyond the range of

24Notice that by using different θ2’s in these two applications of T2, we could focusthe decision maker’s concerns about robustness more on one of the two potential sourcesof misspecification.

Page 308: Maskin ambiguity book mock 2 - New York University

300 Chapter 8. Robust Estimation and Control Without Commitment

the Kalman filter. Hansen et al. (2006a) study such problems.

8.11 Concluding remarks

By incorporating learning, this paper responds to thoughtful criticisms ofour earlier work about recursive formulations of robust control withoutlearning. The framework here allows us to examine the consequences forvaluations and decision rules of learnable components of the state that cancapture both model selection and parameter estimation.

The model in section 8.10 is about a pure endowment economy, so thatthe representative consumer chooses no actions – his worst case model de-termines valuations but not actions. Of course, the framework in this paperallows us also to study settings in which a decision maker chooses an ac-tion that influences the motion of the state. We illustrate this aspect byperforming an analysis of robust experimentation in Cogley et al. (2008).For a given concern about misspecification of hidden state probabilities asmeasured by θ2, we can study the speed at which learning diminishes con-cerns about misspecification along particular dimensions of uncertainty asthe accretion of data together with Bayes law gradually reduces the set ofperturbed models by tightening posterior probabilities. The formulas inHansen and Sargent (2006d) and Hansen and Sargent (2006a) show pre-cisely how the volatilities of hidden state estimates that come from Bayes’law affect the gap between the worst case probabilities and those from theapproximating model.

Our procedures for solving robust discounted dynamic programmingproblems are as easy to use as corresponding problems without concernsabout robustness and come down to replacing each of two conditional expec-tations operators in the problem without robustness with a risk-sensitivityoperator. For a finite θ1, the operator T1 captures the decision maker’sfear that the state and signal dynamics conditioned on both the observedand hidden components of the state are misspecified. For a finite θ2, theoperator T2 captures the decision maker’s fear that the distribution of thehidden state conditioned on the history of signals is misspecified. Using dif-ferent values of θ1 and θ2 in the operators T1 and T2 gives us the freedomto focus distrust on different aspects of the decision maker’s model.25

25Specifications with θ1 = θ2 emerge when we follow Hansen and Sargent (2005b)by adopting a timing protocol that requires the malevolent agent to commit to a worst

Page 309: Maskin ambiguity book mock 2 - New York University

8.11. Concluding remarks 301

We do not address the interesting issues that would arise in an econ-omy with heterogeneous agents who have different specification concernsabout the same approximating model. Anderson (2005) studies how Paretooptimal allocations for such economies put history dependence into Paretoweights. Anderson does not ascribe learning problems to his agents, but itwould be interesting to study them in such heterogeneous agent contexts.

case model {Mt+1} once and for all at time 0. Hansen and Sargent (2005b) give arecursive representation for the solution of the commitment problem in terms of R1

t andR2

t operators with a common but time-varying multiplier equal to θβt . The presence of

βt causes the decision maker’s concerns about misspecification to vanish for tail events.Only for the undiscounted case does the zero-sum two player game with commitment inHansen and Sargent (2005b) give identical outcomes to the games without commitmentin this paper. As noted in section 8.6, when β < 1, the gap between the outcomes withand without commitment is the source of time-inconsistency of the worst case beliefsabout the hidden state. Much of the control theory literature (e.g., Whittle (1990)and Basar and Bernhard (1995)) uses the commitment timing protocol and sets β = 1.Hansen and Sargent (2005b) show how to represent parts of that literature in terms ofour formulation of model perturbations as martingales.

Page 310: Maskin ambiguity book mock 2 - New York University
Page 311: Maskin ambiguity book mock 2 - New York University

Chapter 9

Fragile Beliefs and the Price ofUncertainty

1

Abstract

A representative consumer uses Bayes’ law to learn about param-eters and to construct probabilities with which to perform ongoingmodel averaging. The arrival of signals induces the consumer to alterhis posterior distribution over parameters and models. The consumercopes with specification doubts by slanting probabilities pessimisti-cally. One of his models puts long-run risks in consumption growth.The pessimistic probabilities slant toward this model and contributea counter-cyclical and signal-history-dependent component to pricesof risk.

Key words: Learning, Bayes’ law, robustness, risk-sensitivity, pessimism,prices of risk.

1We thank Gadi Barlevy, Alberto Bisin, Riccardo Colacito, Mark Gertler, Anasta-sios Karantounias, Ricardo Mayer, Tomasz Piskorski, Grace Tsiang, Gianluca Violante,and Amir Yaron for helpful comments on earlier drafts of this paper. We thank Fran-cisco Barillas, Ricardo Mayer, and Leandro Nascimento for excellence in executing thecomputations. We thank the National Science Foundation for research support underseparate grants to Hansen and Sargent.

303

Page 312: Maskin ambiguity book mock 2 - New York University

304 Chapter 9. Fragile Beliefs and the Price of Uncertainty

Le doute n’est pas une condition agreable, mais la certitude estabsurde.2 Voltaire 1767.

9.1 Introduction

A pessimist thinks that good news is temporary and that bad news en-dures. This paper describes how a representative consumer’s model selec-tion problem and fear of model misspecification foster pessimism that putscountercyclical model uncertainty premia into risk prices.

Doubts promote fragile beliefs

Our representative consumer values consumption streams according to iter-ated versions of the multiplier preferences that Hansen and Sargent (2001a)use to represent aversion to model uncertainty.3 Following Hansen and Sar-gent (2007b), an iterated application of risk-sensitivity operators allows usto focus the representative consumer’s ambiguity on particular aspects, in-cluding model selection and parameter values.4 Ex post, the consumer acts‘as if’ he uses a probability measure that he twists pessimistically relativeto his approximating model. By ‘fragile beliefs’ we refer to the responsive-ness of pessimistic probabilities to the arrival of news, as determined bythe state dependent value functions that define what the consumer is pes-simistic about.5 Relative to the conventional rational expectations case inwhich the representative consumer has complete confidence in his statisti-cal model, our representative consumer’s reluctance fully to trust a singleapproximating model adds ‘model uncertainty premia’ to prices of risk.New uncertainty components of the ‘risk prices’ emerge from the hiddenMarkov model. They are time-dependent and state-dependent, in contrast

2Doubt is not a pleasant condition, but certainty is absurd.3The relationship of the multiplier preferences of Hansen and Sargent (2001a) to the

max-min expected utility preferences of Gilboa and Schmeidler (1989) are analyzed byHansen et al. (2006b), Maccheroni et al. (2006a,b), Cerreia-Vioglio et al. (2008), andStrzalecki (2008b).

4Sometimes the literature calls this ‘structured uncertainty’.5Harrison and Kreps (1978) and Scheinkman and Xiong (2003) explore another set-

ting in which difficult to detect departures from rational expectations lead to interestingasset price dynamics that cannot occur under rational expectations.

Page 313: Maskin ambiguity book mock 2 - New York University

9.1. Introduction 305

to the constant uncertainty premium analyzed by Hansen et al. (1999) andAnderson et al. (2003).

Fragile expectations as sources of time-varying risk

premia

A hidden Markov model for consumption growth confronts a representativeconsumer with ongoing model selection and parameter estimation problems.Our representative consumer wants to know components of a hidden statevector, some that stand for unknown parameters within a model and oth-ers that index models. A probability distribution over that hidden statebecomes part of the state vector in the representative consumer’s valuefunction. Bayes’ law describes its motion over time. The representativeconsumer slants probabilities towards the model that has the lowest utility.We show how variations over time in the probabilities attached to modelsand other state variables put volatility into the model uncertainty premia.

Key components

In addition to the risk sensitivity operator that Tallarini (2000a) applied, weintroduce an additional one, taken from Hansen and Sargent (2007b), thatadjusts the probability distribution of hidden Markov states for model un-certainty.6 We interpret both risk-sensitivity operators as capturing the rep-resentative consumer’s concerns about robustness instead of the enhancedrisk aversion interpretation of Tallarini.7

Our representative consumer assigns positive probabilities to two mod-els whose fits make them indistinguishable for our data on per capita U.S.consumption expenditures on nondurables and services from 1948II-2008III.In one model, consumption growth rates are nearly i.i.d. model, and in theother there is a highly persistent component to the consumption growth

6This second risk-sensitivity operator accounts for what Klibanoff et al. (2005b, 2009)call smooth ambiguity and what other researchers call ‘structured’ model uncertainty.As an example of a different approach to learning in the presence of model ambiguity,Epstein and Schneider (2008) apply their recursive multiple priors model to study theresponse of asset prices to signals when investors are uncertain about a noise variancethat influences Bayesian updating.

7Barillas et al. (2009) reinterpret some of Tallarini’s results in terms of concern aboutmodel misspecification instead of risk aversion.

Page 314: Maskin ambiguity book mock 2 - New York University

306 Chapter 9. Fragile Beliefs and the Price of Uncertainty

rate, as in the long-run risk model of Bansal and Yaron (2004) with a per-sistent component in consumption growth. But the consumer doubts themodel-mixing probabilities as well as the specification of each of the compo-nent models. In contrast, Bansal and Yaron assume that the representativeconsumer assigns probability one to the long-run risk model even thoughsample evidence is indecisive in selecting between them.8 Our frameworkexplains why a consumer might act as if he puts probability (close to)one on the long-run risk model even though he knows that it is difficult todiscriminate between these models statistically.

Organization

We proceed as follows. After section 9.2 sets out a framework for pricingrisks implicit in a vector Brownian motion wt, section 9.3 describes a hid-den Markov model and three successively less information structures (fullinformation, unknown states, and unknown states and unknown model)together with the three innovations (or news) processes given by the incre-ments to Wt(ι), Wt(ι) and Wt that are implied by these three informationstructures. Section 9.4 then uses these three information specifications andassociated choices dWt(ι), dWt(ι) and dWt as the risks dwt to be priced with-out model uncertainty. We construct these section 9.4 risk prices under theinformation assumptions ordinarily used in finance and macroeconomics.Section 9.5 offers a different perspective on Bayesian learning by pricingeach of the risks dWt(ι), dWt(ι) and dWt under the single full informationset. Section 9.6 describes contributions to risk prices coming from modeluncertainties about distributions conditioning on each of our three infor-mation sets. Uncertainty about shock distributions with known states con-tributes a constant uncertainty premium, while uncertainty about unknownstates contributes a time-dependent one and uncertainty about models con-tributes a state-dependent one. Section 9.7 presents an empirical exampledesigned to highlight the mechanism through which the state-dependentuncertainty premia give rise to countercyclical prices of risk. Appendix 9.Adescribes how we use detection error probabilities to calibrate the represen-tative consumer’s concerns about model misspecification, while appendix

8Bansal and Yaron (2004) incorporate other features in their specifications of con-sumption dynamics, including stochastic volatility. They also use a recursive utilityspecification with an intertemporal elasticity of substitution greater than 1.

Page 315: Maskin ambiguity book mock 2 - New York University

9.2. Stochastic discounting and risks 307

9.B proliferates models as part of a robustness exercise designed to refineour understanding of the forces that produce countercyclical risk prices.

9.2 Stochastic discounting and risks

Let {St} be a stochastic discount factor process that, in conjunction with anexpectation operator, assigns date 0 risk-adjusted prices to payoffs at datet. Trading at intermediate dates implies that St+τ

Stis the τ -period stochastic

discount factor for pricing at date t. Let {wt} be a vector Brownian motioninnovation process where the increment dwt represents new informationflowing to consumers at date t. We synthesize a cumulative time t payoffas

logQt(α) = α · (wt − w0)−t

2|α|2.

By subtracting t2|α|2, we make the payoff be a martingale with unit expecta-

tion. By changing the vector α, we change the risk exposure to componentsof wt. At date t, we price the payoff Qt+τ(α)

Qt(α)as

Pt,τ (α) = E

[St+τQt+τ (α)

StQt(α)

∣∣∣Yt] . (9.1)

The vector of (growth-rate) risk prices for horizon τ is given by the price“elasticity”:

πt,τ = − ∂

∂α

1

τlogPt,τ (α)|α=αo, (9.2)

where we have scaled by the payoff horizon τ for comparability. We takethe negative because exposure to risk is bad. Since we scaled the payoffs tohave unit price, − 1

τlog pt,τ is the logarithm of an expected return adjusted

for the payoff horizon. In log-normal models, this derivative is independentof αo. This is true more generally when the investment horizon shrinks tozero.9

The vector of local risk prices is given by the limit

πt = − limτ↓0

τ∂αlogPt,τ . (9.3)

9Here we are following Hansen and Scheinkman (2009a) and Hansen (2008b) inconstructing a term structure of prices of growth-rate risk.

Page 316: Maskin ambiguity book mock 2 - New York University

308 Chapter 9. Fragile Beliefs and the Price of Uncertainty

It gives the local compensation for exposure to shocks expressed as anincrease in the conditional mean return. Local risk prices in conjunctionwith an instantaneous risk-free rate are the building blocks of asset prices(e.g., Duffie (2001, pp. 111-114)). These local prices can be compounded toconstruct the asset prices for arbitrary payoff intervals τ using the dynamicsof the underlying state variables in an economy.

We exploit the local normality to obtain a simple characterization of theslope of the mean-standard deviation frontier and to reproduce a classicalresult from finance. The slope of the efficient segment of the mean-standarddeviation frontier is obtained by solving:

maxα,α·α=1

α · πt

where the constraint imposes a unit local variance. The solution is α∗t =

πt|πt|

with the optimized local mean given by

α∗t · πt =

πt · πt|πt|

= |πt|, (9.4)

In this local normal environment, the Hansen and Jagannathan (1991) anal-ysis simplifies to comparing the magnitude of the risk price vector impliedby alternative models to an observed mean-standard deviation frontier.

In the power utility model,

St+τSt

= exp(−δ) exp[−γ(logCt+τ − logCt)],

where the growth rate of log consumption logCt+τ − logCt. Here the vectorπt of local risk prices is the vector of “exposures” of −d log St = γd logCtto the Brownian increment vector dWt.

To study learning and robustness, we use models of Bayesian learningto create alternative specifications of dWt and information sets with respectto which the mathematical expectation in (9.1) are evaluated.

Learning and asset prices

We assume a hidden Markov model in which Xt(ι) is a hidden state space, ιindexes an unknown model, Y t+τ

t is a path of signals, and Yt is a condition-ing information set generated by the history of signals. We let lower caseletters denote alternative potential values that can be realized. That is,

Page 317: Maskin ambiguity book mock 2 - New York University

9.2. Stochastic discounting and risks 309

yt+τt is possible realized path for the signals and xt(ι) is a possible realiza-tion of the date t signal of model ι. The hidden Markov structure inducesprobability densities f [yt+τt |ι, xt(ι)], g[xt(ι)|ι,Yt], h(ι|Yt), and f(yt+τt |Yt).10Evidently,

f(yt+τt |Yt) =∫ (∫

f[yt+τt |ι, xt(ι)

]g[xt(ι)|ι,Yt]dxt(ι)

)h(ι|Yt)dι. (9.5)

For convenience, let

Zt+τ (α) =St+τQt+τ (α)

StQt(α).

In our construction under limited information in the absence of robustness,Zt+τ (α) can be expressed as a function of Y t+τ

t and hence we may expressthe asset price

Pt,τ (α) = E [Zt+τ (α)|Yt] (9.6)

as an integral against the density f .To express the price in an alternative way that will be useful to us, we

first use density f to construct

Qt,τ [α|xt(ι), ι] = E[Zt+τ (α)|xt(ι), ι]

and then write

Pt,τ (α) =∫ ∫

Pt,τ [α|xt(ι), ι] g[xt(ι)|ι,Yt]dxt(ι)︸ ︷︷ ︸ h(ι|Yt)dι︸ ︷︷ ︸ .↑ ↑

unknown unknownstate model

This decomposition helps us understand how our paper relates to ear-lier asset pricing papers including, for example, Detemple (1986), David(1997), Veronesi (2000), Brennan and Zia (2001), Ai (2006), and Croceet al. (2006),11 that use learning about a hidden state simply to generatean exogenous process for distributions of future signals conditional on past

10Densities are always expressed relative to a reference measure. In the case of Y t+τt ,

the reference measure is a measure over the space of continuous functions between overthe interval [t, t+ τ ].

11The learning problems in those papers share the feature that learning is passive,there being no role for experimentation so that prediction can be separated from control.Cogley et al. (2008) apply the framework of Hansen and Sargent (2007b) in a setting

Page 318: Maskin ambiguity book mock 2 - New York University

310 Chapter 9. Fragile Beliefs and the Price of Uncertainty

signals as an input into a consumption based asset pricing model. Havingconstructed the f(yt+τt )|Yt), decision making and asset pricing in these mod-els proceeds as in standard asset pricing models without learning. There-fore, the asset pricing implications of such learning models depend only onf and not on the underlying structure with hidden states that the modelbuilder used to deduce that conditional distribution. In such models, theonly thing that learning contributes is a justification for a particular spec-ification of f . We would get equivalent asset pricing implications by justassuming that distribution from the start.

Robust learning and asset pricing

As we shall see, application of distinct risk-sensitivity operators to twistthe component distributions f, g, h means that that equivalence is not truein our model because it makes asset prices depend on the evolution of thehidden states and not simply on the distribution of future signals condi-tioned on signal histories. This occurs because of how, following Hansenand Sargent (2007b), we make the representative consumer explore poten-tial misspecifications of the distributions of hidden Markov states and offuture signals conditioned on those hidden Markov states and on how hetherefore refuses to reduce compound lotteries.

Our representative consumer copes with model misspecification by re-placing the f, g, h conditional densities with worst-case densities f , g, h.With a robust representative consumer, we can use the implied (·) versionof density f distribution to represent the asset price as

Pt,τ (α) = E[Zt+τ (α)

∣∣∣Yt] . (9.7)

Using the density f to account for unknown dynamics, we now construct

Qt,τ [α|xt(ι), ι] = E[Zt+τ (α)|xt(ι), ι].

where decisions affect future probabilities of hidden states and experimentation is active.The papers just cited price risks under the same information structure that is used togenerate the risks being priced. In section 9.5, we offer an interpretation of some otherpapers (e.g., Bossaerts (2002, 2004) and Cogley and Sargent (2008a)) that study theeffects of agents Bayesian learning on pricing risks generated by limited information setsfrom the point of view of an outside econometrician who has a larger information set.

Page 319: Maskin ambiguity book mock 2 - New York University

9.2. Stochastic discounting and risks 311

Our information decomposition of the asset price with a robust representa-tive consumer becomes

Pt,τ (α) =∫ ∫

Qt,τ [α|xt(ι), ι] g[xt(ι)|ι,Yt]dxt(ι)︸ ︷︷ ︸ h(ι|Yt)dι.︸ ︷︷ ︸↑ ↑

unknown unknownstate model

We can also represent the price in terms of the original undistorted distri-bution

Pt,τ (α) = E

(Zt+τ (α)

f

f[Y t+τt |ι, Xt(ι)]

g

g[Xt(ι)|ι,Yt]

h

h[ι|Yt]

∣∣∣∣Yt)

(9.8)

where we have substituted in the random unobserved state vector the ran-dom future signals. Equivalently, the price with a robust representativeconsumer can be represented as

Pt,τ (α) = E

(M t+τ

t

Mt

Zt+τ (α)

∣∣∣∣Yt)where

M t+τt =

f

f[Y t+τt |ι, Xt(ι)]︸ ︷︷ ︸

g

g[Xt(ι)|ι,Yt]︸ ︷︷ ︸

h

h[ι|Yt]︸ ︷︷ ︸

↑ ↑ ↑distorted distorted distorteddynamics state estimation model probabilities

(9.9)

satisfies E(M t+τ

t |Yt)= 1.

In section 9.6, we show how to represent the three relative densitiesff, gg, hh, respectively, that emerge from applying risk-sensitivity operators to

conditional value functions. These operators adjust for alternative formsof model misspecification. Continuation utilities will be center stage inhow our representative consumer uses signal histories to learn about hid-den Markov states, an ingredient absent from those earlier applications ofBayesian learning that reduced the representative consumer’s informationprior to asset pricing. In the continuous-time setting to be laid out insection

Page 320: Maskin ambiguity book mock 2 - New York University

312 Chapter 9. Fragile Beliefs and the Price of Uncertainty

Changes in probability measure are conveniently depicted as martin-gales. As we will see, there is a martingale associated with each of thechannels highlighted by (9.9). For the “distorted” dynamics, we constructa martingale {Mf

t } in section 9.6 that alters the hidden state dynamics,including the link between future signals and the current state reflected in

the density ratio ff. The martingale is constructed relative to a sequence

of information sets that includes the hidden state histories and knowledgeof the model. We construct a second martingale {M i

t} in section 9.6 by in-cluding an additional distortion to state estimation conditioned on a modelas reflected in the density ratio g

g. This martingale is relative to a sequence

of information sets that conditions on the signal history and model, but noton the history of hidden states. Finally, we produce a martingale {Mu

t }in section 9.6 that alters the probabilities over models and is constructedrelative to a sequence of conditioning information sets that includes only

the signal history and is reflected in the density ratio hh.

9.3 Three information structures

We use a hidden Markov model and two filtering problems to constructthree information sets that we shall use to define risks to be priced withand without concerns about robustness to model misspecification.

State evolution

Two models ι = 0, 1 take the state-space forms

dXt(ι) = A(ι)Xt(ι)dt+B(ι)dWt

dYt = D(ι)Xt(ι)dt+G(ι)dWt (9.10)

whereXt(ι) is the state, Yt is the (cumulated) signal, andW is a multivariatestandard Brownian motion. For notational simplicity, we suppose that thesame Brownian motion drives both models. Under full information, ι isobserved and the vector dWt gives the new information available to theconsumer at date t.

Page 321: Maskin ambiguity book mock 2 - New York University

9.3. Three information structures 313

Filtering problems

To generate two alternative information structures, we solve two types offiltering problem. Let Yt be generated by the history of the signal dYτ upto t. In what follows we first condition on Yt and ι for each t. We thenomit ι from the consumer’s conditioning information.

Model known

First, suppose that ι is known. Application of the Kalman filter yields thefollowing innovations representation:

dXt(ι) = A(ι)Xt(ι)dt +Kt(ι)[dyt −D(ι)Xt(ι)]

where Xt(ι) = E[Xt(ι)|Yt, ι] and

Kt(ι) = [B(ι)G(ι)′ + Σt(ι)D(ι)′][G(ι)G(ι)′]−1

dΣt(ι)

dt= A(ι)Σt(ι) + ΣtA(ι)

′ +B(ι)B(ι)′

−Kt(ι)[G(ι)B(ι)′ +D(ι)Σt(ι)]. (9.11)

The innovation process is

dWt(ι) = [G(ι)]−1[dYt −D(ι)Xt(ι)dt

]where G(ι)G′(ι) = G(ι)G(ι)′ and G(ι) is nonsingular. The innovation pro-cess comprises the new information revealed to economic agents by thesignal history.

Model unknown

Assume that G(ι)G(ι)′ independent of ι. Without this assumption, ι isrevealed immediately. Let ιt = E(ι|Yt) and

dWt = G−1 (dYt − νtdt) = ιtdWt(1) + (1− ιt)dWt(0)

whereνt

.= [ιtD(1)Xt(1) + (1− ιt)D(0)Xt(0)]. (9.12)

Then

dιt = ιt(1− ιt)[Xt(1)′D(1)′ − Xt(0)

′D(0)′](G′)−1

dWt. (9.13)

The new information pertinent to consumers is now dWt.

Page 322: Maskin ambiguity book mock 2 - New York University

314 Chapter 9. Fragile Beliefs and the Price of Uncertainty

9.4 Risk prices

Section 9.3 described three information structures: i) full information, ii)hidden states but known model, iii) unknown states and unknown model.We use the associated Brownian motions W (ι), Wt(ι), and Wt as risks to bepriced under the information structure that generated them. (But in section9.5 we shall price all three risks under full information in order to look atBayesian learning from another angle.) The forms of the risk prices arethe same for all three information structures and are familiar from Breeden(1979). Given the local normality of the diffusion model, the risk prices aregiven by the exposures of the log marginal utility to the underlying risks.Let the increment logarithm of consumption be given by d logCt = H ′dYt,implying that consumption growth rates are revealed by the increment inthe signal vector. Each of the differing information sets implies a risk pricevector, as reported in Table 9.1.

Because different risks are being priced, the risk prices change across in-formation structures. However, the magnitude of the risk price is the sameacross information structures. As we saw in (9.4), the magnitude of therisk price vector is the slope of the instantaneous mean-standard deviationfrontier. In section 9.6, we shall show how a concern about model mis-specification alters risk prices by adding compensations for bearing modeluncertainty. But first we want to look at Bayesian learning and risk pricesfrom a different perspective.

information local risk risk price slope

full dWt γG(ι)′H γ√H ′G(ι)G(ι)′H

unknown state dWt(ι) γG(ι)′H γ√H ′G(ι)G(ι)′H

unknown model dWt γG′H γ√H ′G(ι)G(ι)′H

Table 9.1: When the model is unknown, G(ι)G(ι)′ is assumed to be inde-pendent of ι. The parameter γ is the coefficient of relative risk aversion ina power utility model. The entries in the “slope” column are the impliedslope of the mean-standard deviation frontier. The consumption growthrate is d logCt = H ′dYt.

Page 323: Maskin ambiguity book mock 2 - New York University

9.5. A full-information perspective on agents’ learning 315

9.5 A full-information perspective on

agents’ learning

In this section, we study what happens when an econometrician mistakenlypresumes that consumers have a larger information set than they actuallydo. It is known that an econometrician who conditions on less informationthan consumers nevertheless draws correct inferences about the magnitudeof risk prices. But we shall see that an econometrician who mistakenlyconditions on more information than consumers actually have makes falseinferences about that magnitude. We regard the consequences of an econo-metrician’s mistakenly conditioning on more information than consumersas contributing to the analysis of risk pricing under consumers’ Bayesianlearning.

Hansen and Richard (1987) systematically studied the consequences forrisk prices of an econometrician’s conditioning on less information thanconsumers. Given a correctly specified stochastic discount factor process,if economic agents use more information than an econometrician, the con-sequences for the econometrician’s inferences about risk prices can be in-nocuous. In constructing conditional moment restrictions for asset prices,all that is required is that the econometrician at least include prices inhis information set. By application of the law of iterated expectation, theproduct of a cumulative return and a stochastic discount factor remains amartingale when some of the information available to consumers is omit-ted from the econometrician’s information set. While the econometricianwho omits information fails correctly to infer the risk components actuallyconfronted by consumers, that mistake does not undermine his correct in-ference about the slope of the mean-standard deviation frontier, as we sawin the third column of table 9.1 section 9.3.

We now consider the reverse problem. What happens if economic agentsuse less information than an econometrician? We study this by using thefull-information structure but price risks generated by the smaller infor-mative information structures, in particular, dWt(ι) and dWt. In pricingdWt(ι) and dWt under full information, we use pricing formulas that take themistaken Olympian perspective (often used in macroeconomics) that con-sumers know the full-information probability distribution of signals. Thismistake made by the econometrician induces a pricing error relative to therisk prices that are actually confronted by the consumer. The full informa-

Page 324: Maskin ambiguity book mock 2 - New York University

316 Chapter 9. Fragile Beliefs and the Price of Uncertainty

tion prices misrepresent the “risks” consumers confront with their reducedinformation structures. The price discrepancies represent the effects of arepresentative agent’s learning that Bossaerts (2002, 2004) and Cogley andSargent (2008a) featured.

Hidden states but known model

Consider first the case in which the model is known. Represent the innova-tion process as

dWt(ι) = [G(ι)]−1

(D(ι)

[Xt(ι)− Xt(ι)

]dt+G(ι)dWt

).

This expression reveals that dWt(ι) bundles two risks: Xt − Xt and dWt.An innovation under the reduced information structure ceases to be aninnovation in the original full information structure. Also, the “risk” Xt(ι)−Xt(ι) under the limited information structure ceases to be risk under thefull information structure.

Consider the pricing of the small time interval limit of

Qt+τ(α)/Qt(α) = exp( α′[W̄t+τ(ι) − W̄t(ι)] − |α|²τ/2 ).

This has unit expectation under the partial information structure. The local (logarithmic) price computed under the full information structure is

−δ − γH′D(ι)Xt(ι) + α′[G(ι)]^{-1}D(ι)[Xt(ι) − X̄t(ι)] + (1/2)|−γH′G(ι) + α′[G(ι)]^{-1}G(ι)|² − |α|²/2,

where δ is the subjective rate of discount. Multiplying by minus one and differentiating with respect to α gives the local price

γG(ι)′H + [G(ι)]^{-1}D(ι)[X̄t(ι) − Xt(ι)].

The first term is the risk price under partial information (see Table 9.1), while the second term is the part of the forecast error in the signal under the reduced information set that can be forecast perfectly under the full information set.


States and model both unknown

Consider next what happens when the model is unknown. Suppose that ι = 1 and represent dW̄t as

dW̄t = G^{-1}[G(1)dWt + D(1)Xt(1)dt] − G^{-1}[ῑtD(1)X̄t(1)dt + (1 − ῑt)D(0)X̄t(0)dt].

There is an analogous calculation for ι = 0. When we compute local prices under full information, we obtain

γG′H + G^{-1}[νt − D(ι)Xt]   (9.14)

where νt is defined in (9.12). The term γG′H is the risk price under reduced information when the model is unknown (see Table 9.1). The term G^{-1}[νt − D(ι)Xt] is a contribution to the risk price measured by the econometrician coming from the effects of the consumer's learning on the basis of his more limited information set. With respect to the probability distribution used by the consumer, this term averages out to zero. Since ι is unknown, the average includes a contribution from the prior. For some sample paths, this term can have negative entries for a substantial amount of time, indicating that the prices under the reduced information exceed those computed under full information. Other trajectories could display just the opposite phenomenon. It is thus possible that the term G^{-1}[νt − D(ι)Xt] contributes apparent pessimism or optimism, depending on the prior over ι and the particular sample path. In what follows, we use concerns about robustness to motivate priors that are necessarily pessimistic and that always enhance the counterpart to risk prices.
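The following minimal sketch in Python illustrates the computation in (9.14): the risk price γG′H plus the learning contribution G^{-1}[νt − D(ι)Xt]. All matrices, probabilities, and state values are hypothetical placeholders chosen only to have compatible dimensions; they are not the chapter's estimates.

```python
import numpy as np

# Sketch of formula (9.14): the local shock price an econometrician computes
# under full information when the consumer conditions on less information.
gamma = 2.0                      # relative risk aversion in the power utility model
H = np.array([0.0, 1.0])         # consumption growth loading: d log C = H' dY
G = 0.005 * np.eye(2)            # common signal-noise loading (model independent)
D = {0: np.array([[1.0, 0.5], [0.0, 1.0]]),
     1: np.array([[1.0, 0.5], [0.0, 1.0]])}          # hypothetical signal loadings

# hypothetical true states and the consumer's filtered estimates under each model
X_true = {0: np.array([0.001, 0.004]), 1: np.array([0.002, 0.004])}
X_bar  = {0: np.array([0.000, 0.005]), 1: np.array([0.001, 0.005])}
iota_bar = 0.6                   # consumer's Bayesian probability on model 1
iota_true = 1                    # model assumed to generate the data

# nu_t: the consumer's forecast of the signal drift, averaging over models
nu = iota_bar * D[1] @ X_bar[1] + (1 - iota_bar) * D[0] @ X_bar[0]

# risk price plus the learning contribution, formula (9.14)
price = gamma * G.T @ H + np.linalg.solve(G, nu - D[iota_true] @ X_true[iota_true])
print(price)
```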

9.6 Price effects of consumers' concerns about robustness

When prices reflect a representative consumer's fears of model misspecification, (9.2) must be replaced by (9.7) or, equivalently, (9.8). To compute distorted densities under our alternative information structures, we must find value functions for a planner who fears model misspecification.12

12Hansen and Sargent (2008d, chs. 11-13) discuss the role of the planner's problem in computing and representing prices with which to confront a representative consumer.

While we have previously constructed "risk prices" that assign prices to shock exposures, we now construct analogous prices, but they are no longer purely risk prices. Instead, because they will include uncertainty components, we shall refer to such prices as "shock prices." We construct components of these prices for our three information structures and display them in the last column of Table 9.2. Specifically, this column gives the contribution to the shock prices from each type of model uncertainty.

information      local risk    risk price   uncertainty price
full             dWt           G(ι)′H       (1/θ1)[B(ι)′λ(ι) + G(ι)′H]
unknown state    dW̄t(ι)        G(ι)′H       (1/θ2)[G(ι)]^{-1}D(ι)Σ̄t(ι)λ(ι)
unknown model    dW̄t           G′H          (ῑt − ι̃t)G^{-1}[D(1)x̄t(1) − D(0)x̄t(0)]

Table 9.2: When the model is unknown, G(ι)G(ι)′ is assumed to be independent of ι. The consumption growth rate is d log Ct = H′dYt. Please cumulate contributions to uncertainty prices as you move down the last column.

Value function without robustness

We study a consumer with a unitary elasticity of intertemporal substitution. We start with the value function for discounted expected utility using a logarithmic period utility function:

V(x, c, ι) = δE[ ∫_0^∞ exp(−δτ) log Ct+τ dτ | Xt = x, log Ct = c, ι ]
           = δE[ ∫_0^∞ exp(−δτ)(log Ct+τ − log Ct) dτ | Xt = x, log Ct = c, ι ] + c
           = λ(ι)·x + c.

Given the recursive nature of this valuation, the vector λ(ι) satisfies the equation

0 = −δλ(ι) + D(ι)′H + A(ι)′λ(ι),   (9.15)

and thus

λ(ι) = [δI − A(ι)′]^{-1}D(ι)′H.   (9.16)

The value function under limited information simply replaces x with the best forecast x̄ of the state vector given past information on signals.
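A short sketch of formula (9.16) in Python. The matrices A(ι) and D(ι), the discount rate δ, and the loading H below are hypothetical stand-ins for the two models' dynamics; only the formula itself comes from the text.

```python
import numpy as np

# Sketch of formula (9.16): lambda(iota) = [delta*I - A(iota)']^{-1} D(iota)' H.
delta = 0.005                        # subjective discount rate (placeholder value)
H = np.array([1.0])                  # d log C_t = H' dY_t (scalar signal here)

def lam(A, D):
    """Continuation-value loading on the state for a given model."""
    n = A.shape[0]
    return np.linalg.solve(delta * np.eye(n) - A.T, D.T @ H)

# model iota = 1: persistent hidden growth state; model iota = 0: nearly iid
A1 = np.array([[-0.01, 0.0], [0.0, 0.0]])   # continuous-time drift loading, a(1) = rho(1) - 1
A0 = np.array([[-0.64, 0.0], [0.0, 0.0]])   # a(0) = rho(0) - 1
D1 = np.array([[1.0, 1.0]])                 # signal drift loading D(iota)
D0 = np.array([[1.0, 1.0]])

print(lam(A1, D1), lam(A0, D0))
```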

Full information

Consider first the full information environment in which states are observed and the model is known. A concern for robustness under full information gives us a way to construct f in (9.9) via a martingale {M^f_t}. While f distorts the future signals conditioned on the current state and model, we will distort both the state and signal dynamics. The form of the value function is the same as that of Tallarini (2000a) and Barillas et al. (2009).13

In a diffusion setting, a concern about robustness induces the consumer to consider distortions that append a drift μt dt to the Brownian increment and to impose a quadratic penalty on this distortion. This leads to a minimization problem whose indirect value function yields the T1 operator of Hansen and Sargent (2007b):

Problem 9.6.1.

0 = min_μ  −δ[λ(ι)·x(ι) + κ(ι)] + x(ι)′D(ι)′H + μ′G(ι)′H + x(ι)′A(ι)′λ(ι) + μ′B(ι)′λ(ι) + (θ1/2)μ′μ,

where we conjecture a value function of the form λ(ι)·x + κ(ι) + c.

Here θ1 is a positive penalty parameter that characterizes the decision maker's fear that model ι is misspecified. We impose the same θ1 for both models. See Hansen et al. (2006b) and Anderson et al. (2003) for more general treatments, and see appendix 9.A for how we propose to calibrate θ1. The minimizing drift distortion μ is

μ∗(ι) = −(1/θ1)[G(ι)′H + B(ι)′λ(ι)]   (9.17)

13While Tallarini adopts an interpretation in terms of enhanced risk aversion, we interpret a risk-sensitivity adjustment as expressing a consumer's concern about model misspecification. See Barillas et al. (2009) for the relationship between these interpretations.


which is independent of the state vector X(ι). As a result,

κ(ι) = −(1/(2θ1δ))|G(ι)′H + B(ι)′λ(ι)|²   (9.18)

Equating coefficients on x(ι) in Problem 9.6.1 implies that equation (9.15) continues to hold. Thus, λ(ι) remains the same as in the model without robustness and is given by (9.16).

Proposition 9.6.2. The value function shares the same λ(ι) with the expected utility model [formula (9.15)] and κ(ι) is given by (9.18). The associated worst-case distribution for the Brownian increment is normal with covariance matrix I dt and drift μ∗(ι)dt given by (9.17).
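The next sketch evaluates the worst-case drift distortion (9.17), the constant κ(ι) in (9.18), and the resulting uncertainty price in the first row of Table 9.2. The numerical inputs (θ1, δ, H, G, B, λ) are illustrative placeholders of the right dimensions, not the chapter's calibration.

```python
import numpy as np

# Sketch of (9.17)-(9.18) and the full-information uncertainty price (Table 9.2, row 1).
theta1 = 1.0 / 7.0                 # penalty parameter; the text calibrates 1/theta1 = 7
delta = 0.005                      # subjective discount rate (placeholder)
H = np.array([1.0])
G = np.array([[0.005, 0.0]])       # signal loading on the two-dimensional Brownian motion
B = np.array([[0.0004, 0.0], [0.0, 0.0]])   # state loading on the Brownian motion
lam = np.array([0.9, 180.0])       # lambda(iota) computed from (9.16), placeholder values

v = G.T @ H + B.T @ lam            # the recurring vector G(iota)'H + B(iota)'lambda(iota)
mu_star = -(1.0 / theta1) * v                      # formula (9.17)
kappa = -np.dot(v, v) / (2.0 * theta1 * delta)     # formula (9.18)
uncertainty_price = (1.0 / theta1) * v             # first row of Table 9.2
print(mu_star, kappa, uncertainty_price)
```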

Under full information, the likelihood of the worst-case model relative to that of the benchmark model is a martingale {M^f_t(ι)} with local evolution

d log M^f_t(ι) = μ∗(ι)′dWt − (1/2)|μ∗(ι)|²dt.

The stochastic discount factor (relative to the benchmark model) includes contributions both from the consumption dynamics and from the martingale, so that

d log S^f_t = d log M^f_t(ι) − δdt − dct.

The vector of local shock prices is once again the negative of the exposure of the stochastic discount factor to the respective shocks. With robustness, the shock price vector under full information is augmented by an uncertainty price:

G(ι)′H + (1/θ1)[G(ι)′H + B(ι)′λ(ι)],

where the first term is the risk contribution and the second is the uncertainty contribution. Neither the risk contribution nor the uncertainty contribution to the shock prices is state dependent or time dependent. We have completed the first row of Table 9.2.

Unknown states

Now suppose that the model (the value of ι) is known but the state Xt(ι) is not. Thus, we now construct g in formula (9.9).

We seek a martingale {M^i_t} to use under this information structure. Following Hansen and Sargent (2007b), we introduce a positive penalty parameter θ2 and construct a robust estimate of the hidden state Xt(ι) by solving:

Problem 9.6.3.

min over densities φ satisfying ∫ φ(x)ψ(x|x̄, Σ̄) dx = 1 of

∫ [λ(ι)·x + κ(ι) + θ2 log φ(x)] φ(x) ψ(x|x̄, Σ̄) dx
= min_{x̃}  λ(ι)·x̃ + κ(ι) + (θ2/2)[x̃ − x̄(ι)]′[Σ̄(ι)]^{-1}[x̃ − x̄(ι)],

where ψ(x|x̄, Σ̄) is the normal density with mean x̄ and covariance matrix Σ̄, and x̄(ι) and Σ̄(ι) are the estimate of the state and its covariance matrix under the benchmark ι model.

In the first line of Problem 9.6.3, φ is a density (relative to a normal) that distorts the density for the hidden state, and θ2 is a positive penalty parameter that penalizes φ's with large values of relative entropy (the expected value of φ log φ). The second line of Problem 9.6.3 exploits the fact that the worst-case density is necessarily normal with a distorted mean for the state. This structure makes it straightforward to compute the integral and as a result simplifies the minimization problem. In particular, the worst-case state estimate x̃(ι) solves

0 = λ(ι) + θ2[Σ̄(ι)]^{-1}[x̃(ι) − x̄(ι)].

Proposition 9.6.4. The robust value function is

U[ι, x̄(ι), Σ̄(ι)] = λ(ι)·x̄(ι) + κ(ι) − (1/(2θ2))λ(ι)′Σ̄(ι)λ(ι)   (9.19)

with the same λ(ι) as in the expected utility model [formula (9.15)] and the same κ(ι) as in the robust planner's problem with full information [formula (9.18)]. The worst-case state estimate is

x̃ = x̄ − (1/θ2)Σ̄(ι)λ(ι).


The indirect value function on the right side of (9.19) defines an instance of the T2 operator of Hansen and Sargent (2007b). Under the distorted evolution, dYt has drift

ξ̃t(ι)dt = D(ι)X̃t(ι)dt + G(ι)μ∗(ι)dt,

while under the benchmark evolution it has drift

ξ̄t(ι)dt = D(ι)X̄t(ι)dt.

The corresponding likelihood ratio for our limited information setup is a martingale M^i_t(ι) that evolves as

d log M^i_t(ι) = [ξ̃t(ι) − ξ̄t(ι)]′[G(ι)′]^{-1}dW̄t(ι) − (1/2)|G(ι)^{-1}[ξ̃t(ι) − ξ̄t(ι)]|²dt,

and therefore the stochastic discount factor evolves as

d log S^i_t = d log M^i_t(ι) − δdt − d log Ct.

There are now two contributions to the uncertainty price: the one in the last column of the first row of table 9.2, coming from the potential misspecification of the state dynamics as reflected in the drift distortion to the Brownian motion, and the other in the second row of table 9.2, coming from the filtering problem as reflected in a distortion in the estimated mean of the hidden state vector:

G(ι)′H + (1/θ1)[G(ι)]^{-1}G(ι)[G(ι)′H + B(ι)′λ(ι)] + (1/θ2)[G(ι)]^{-1}D(ι)Σ̄t(ι)λ(ι),

where the three terms are, respectively, the risk, model uncertainty, and estimation uncertainty contributions. The state estimation adds time dependence to the uncertainty prices through the evolution of the covariance matrix Σ̄t(ι) governed by (9.11), but the observed history of signals is inconsequential. We have completed the second row of Table 9.2.

Model unknown

Finally, we obtain a martingale {M^u_t} that reflects a robust adjustment for an unknown model. Thus we now construct h in formula (9.9). We do this by twisting the model probability ῑt by solving:


Problem 9.6.5.

min_{0 ≤ ι̃ ≤ 1}  ι̃ U[1, x̄(1), Σ̄(1)] + (1 − ι̃)U[0, x̄(0), Σ̄(0)] + θ2 ι̃[log ι̃ − log ῑ] + θ2(1 − ι̃)[log(1 − ι̃) − log(1 − ῑ)]

Proposition 9.6.6. The indirect value function for this problem becomes our robust value function14

−θ2 log[ ῑ exp(−(1/θ2)U[1, x̄(1), Σ̄(1)]) + (1 − ῑ) exp(−(1/θ2)U[0, x̄(0), Σ̄(0)]) ].

The worst-case model probabilities satisfy:

(1 − ι̃) ∝ (1 − ῑ) exp(−U[0, x̄(0), Σ̄(0)]/θ2)   (9.20)

ι̃ ∝ ῑ exp(−U[1, x̄(1), Σ̄(1)]/θ2).   (9.21)
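The exponential twisting in (9.20)-(9.21) is easy to compute once the robust value functions U[ι, ·, ·] are in hand. The following sketch uses hypothetical values of U[1, ·, ·], U[0, ·, ·], and ῑ merely to illustrate how probability is slanted toward the model with the lower continuation value.

```python
import numpy as np

# Sketch of (9.20)-(9.21): exponential twisting of the Bayesian model probability.
theta2 = 1.0
iota_bar = 0.5            # Bayesian probability on the long-run risk model (iota = 1)
U1, U0 = -3.0, -2.8       # hypothetical robust value functions U[1, .] and U[0, .]

w1 = iota_bar * np.exp(-U1 / theta2)
w0 = (1.0 - iota_bar) * np.exp(-U0 / theta2)
iota_tilde = w1 / (w1 + w0)      # worst-case probability on model iota = 1
print(iota_tilde)                # exceeds iota_bar because U1 < U0
```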

Under the distorted probabilities, the signal increment dYt has a drift

κ̃t dt = [ι̃t ξ̃t(1) + (1 − ι̃t)ξ̃t(0)]dt,

and under the benchmark probabilities this drift is

κ̄t dt = [ῑt ξ̄t(1) + (1 − ῑt)ξ̄t(0)]dt.

The associated martingale constructed from the relative likelihoods has evolution

d log M^u_t = (κ̃t − κ̄t)′(G′)^{-1}dW̄t − (1/2)|G^{-1}(κ̃t − κ̄t)|²dt

and the stochastic discount factor is

d log St = d log M^u_t − δdt − d log Ct.

The resulting shock price vector equals the negative of the exposure of d log St to dW̄t and is the ordinary risk price G′H plus the following contribution coming from concerns about model misspecification:

14This is evidently another application of the T2 operator of Hansen and Sargent (2007b).


ι̃ G^{-1}[(1/θ1)G(1)G(1)′H + (1/θ1)G(1)B(1)′λ(1)] + (1 − ι̃)G^{-1}[(1/θ1)G(0)G(0)′H + (1/θ1)G(0)B(0)′λ(0)]
+ ι̃ G^{-1}[(1/θ2)D(1)Σ̄(1)λ(1)] + (1 − ι̃)G^{-1}[(1/θ2)D(0)Σ̄(0)λ(0)]
+ (ῑ − ι̃)G^{-1}[D(1)x̄(1) − D(0)x̄(0)].   (9.22)

As summarized in Table 9.2, the first term reflects uncertainty in the state dynamics associated with each of the two models. Hansen et al. (1999) feature a similar term. It is forward looking by virtue of the appearance of λ(ι) determined in (9.16). The next term reflects uncertainty about hidden states for each of the respective models. When ι̃ < 1, it depends partly on the evolution of ι̃. In the limiting case in which ι̃ = 1, the first term is constant over time and the next one depends on time but not on the signal history. In our application, this limiting case approximately obtains when θ2 is sufficiently small. The third term reflects uncertainty about the models and depends on the signal history even when ι̃ = 1. The term that is scaled by ῑ − ι̃ is also central to the evolution of model probabilities given in (9.13) and dictates how new information contained in the signals induces changes in the model probabilities under the benchmark specification. In effect, G^{-1}[D(1)x̄(1) − D(0)x̄(0)], appropriately scaled, is the vector that governs how new information in the signals updates the probabilities assigned to the models. The signal realizations over the next instant push the decision-maker's posterior toward one of the two models, and this is reflected in the equilibrium uncertainty prices. In addition to responding to the signal history, this response vector will recurrently change sign within an observed sample whenever discriminating between models is challenging. In such cases, we do not expect new information always to move probabilities in the same direction. In the third term of (9.22), this response vector is scaled by the difference between the current model probabilities under the benchmark and worst-case models. Formulas (9.20) and (9.21) indicate how the consumer slants the model probabilities toward the model with the worse utility consequences. This probability slanting induces additional history dependence through ι̃t, which depends on the signal history.
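The sketch below assembles the three lines of (9.22) from their ingredients. Every numerical input is a placeholder of the appropriate dimension; the point is only to show how the worst-case probability ι̃ weights the dynamics and estimation terms while ῑ − ι̃ scales the model-uncertainty term.

```python
import numpy as np

# Sketch assembling the three lines of formula (9.22).
theta1, theta2 = 1.0 / 7.0, 1.0
H = np.array([1.0])
G = np.array([[0.005]])                      # common value of G when the model is unknown
Gm = {0: np.array([[0.005, 0.0]]), 1: np.array([[0.005, 0.0]])}   # G(iota)
B = {0: np.zeros((2, 2)), 1: np.array([[0.0004, 0.0], [0.0, 0.0]])}
D = {0: np.array([[1.0, 1.0]]), 1: np.array([[1.0, 1.0]])}
lam = {0: np.array([0.5, 150.0]), 1: np.array([0.9, 180.0])}
Sigma = {0: np.diag([1e-6, 5e-6]), 1: np.diag([2e-6, 1e-5])}
x_bar = {0: np.array([0.0, 0.0048]), 1: np.array([0.001, 0.0045])}
iota_bar, iota_tilde = 0.5, 0.7              # benchmark and worst-case model probabilities

def dyn(i):   # line 1 ingredient: misspecified state dynamics for model i
    return np.linalg.solve(G, (Gm[i] @ Gm[i].T @ H + Gm[i] @ B[i].T @ lam[i]) / theta1)

def est(i):   # line 2 ingredient: robust state estimation for model i
    return np.linalg.solve(G, D[i] @ Sigma[i] @ lam[i] / theta2)

model_term = (iota_bar - iota_tilde) * np.linalg.solve(G, D[1] @ x_bar[1] - D[0] @ x_bar[0])
price = (iota_tilde * dyn(1) + (1 - iota_tilde) * dyn(0)
         + iota_tilde * est(1) + (1 - iota_tilde) * est(0) + model_term)
print(price)
```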


9.7 Illustrating the mechanism

To highlight the forces that govern the component contributions of model uncertainty to shock prices in formula (9.22), we create a long-run risk model with a predictable growth rate along the lines of Bansal and Yaron (2004) and Hansen et al. (2008a). Our models share the form

dX1t = a(ι)X1t(ι)dt + σ1(ι)dW1t
dX2t = 0
dYt = X1t dt + X2t dt + σ2(ι)dW2t   (9.23)

where X1t(ι), X2t(ι) are scalars and W1t, W2t are scalar components of the vector Brownian motion Wt and where X20(ι) = μy(ι) is the unconditional mean of consumption growth for model ι. We use the following discrete-time approximation to the state space system (9.10):

Xt+τ(ι) − Xt(ι) = τA(ι)Xt(ι) + B(ι)(Wt+τ − Wt)
Yt+τ − Yt = τD(ι)Xt(ι) + G(ι)(Wt+τ − Wt).

We set τ = 1.

A small negative a(ι) coupled with a small σ1(ι) captures long-run risks in consumption growth. Bansal and Yaron (2004) justify such a specification with the argument that it fits consumption growth approximately as well as, and is therefore difficult to distinguish from, an iid consumption growth model, which we know to fit the aggregate per capita U.S. consumption data well. We respect this argument by forming two models with the same values of the signal noise σ2(ι) but that, with differing values of σ1(ι), ρ(ι) = a(ι) + 1, and μy(ι) = X20(ι), give identical values of the likelihood. We impose ρ(1) = .99 to capture a long-run risk model, while the equally good fitting ι = 0 model has ρ = .36.15 Thus, we have constructed our two models so that they are indistinguishable statistically over our sample.

15The sample for real consumption of nondurables and services runs over the period 1948II-2008III. To fit model ι = 1, we fixed ρ = .99 and estimated σ1 = .0004257, σ2 = .0048177, μy = .004545. Fixing σ2 equal to .0048177, we then found a value of ρ = .36 and associated values σ1 = .0020455, μy = .00478258 that give virtually the same value of the likelihood. In this way, we construct two good fitting models that are difficult to distinguish, with model ι = 1 being the long-run risk model and model ι = 0 much more closely approximating an iid growth model. Freezing the value of σ2 at the above value, the maximum likelihood estimates are ρ = .8179, σ1 = .00131659, μy = .00474011. The data for consumption come from the St. Louis Fed data set (FRED), taken from the latest vintage (11/25/2008) with the following identifiers: PCNDGC96 20081125 (real consumption of nondurable goods) and PCESVC96 20081125 (real consumption of services). The population series is from the BLS, Series ID LNS10000000, civilian noninstitutional population 16 years and over, in thousands. The raw data are monthly; we averaged them to compute a quarterly series.
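The following sketch pairs the discrete-time approximation (with τ = 1) with the Kalman filtering and Bayesian model-probability updating that the reduced-information consumer performs. The ρ, σ1, σ2, μy values follow footnote 15; the simulated sample, the prior covariance matrices, and the prior model probability are illustrative placeholders rather than the calibration described in footnote 16.

```python
import numpy as np

# Sketch: two-model state space, Kalman filtering, and Bayesian model probabilities.
rng = np.random.default_rng(0)
T = 242

def model(rho, sig1, sig2, mu_y):
    A = np.array([[rho - 1.0, 0.0], [0.0, 0.0]])   # a(iota) = rho(iota) - 1
    B = np.array([[sig1, 0.0], [0.0, 0.0]])
    D = np.array([[1.0, 1.0]])
    G = np.array([[0.0, sig2]])
    return A, B, D, G, mu_y

models = {1: model(0.99, 0.0004257, 0.0048177, 0.004545),
          0: model(0.36, 0.0020455, 0.0048177, 0.00478258)}

# simulate consumption growth under model iota = 1
A, B, D, G, mu_y = models[1]
x = np.array([0.0, mu_y])
dy = []
for _ in range(T):
    w = rng.standard_normal(2)
    dy.append(float(D @ x + G @ w))
    x = x + A @ x + B @ w

# Kalman filter for each model plus Bayes' rule across models
xbar = {i: np.array([0.0, models[i][4]]) for i in models}
Sig = {i: np.diag([1e-6, 1e-4]) for i in models}     # placeholder prior covariances
iota_bar = 0.5                                       # placeholder prior on model 1
for obs in dy:
    like = {}
    for i, (A, B, D, G, _) in models.items():
        Phi = np.eye(2) + A
        mean = float(D @ xbar[i])
        var = float(D @ Sig[i] @ D.T + G @ G.T)      # predictive variance of the signal
        like[i] = np.exp(-0.5 * (obs - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)
        K = (Phi @ Sig[i] @ D.T + B @ G.T) / var     # Kalman gain for the one-step-ahead estimate
        xbar[i] = Phi @ xbar[i] + (K * (obs - mean)).ravel()
        Sig[i] = Phi @ Sig[i] @ Phi.T + B @ B.T - var * K @ K.T
    iota_bar = iota_bar * like[1] / (iota_bar * like[1] + (1 - iota_bar) * like[0])
print(iota_bar)
```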

Figure 9.1: Bayesian model probability ῑt (solid line) and worst-case model probability ι̃t (dashed line).

This is our way of making precise the Bansal and Yaron (2004) observation that long-run risk and iid consumption growth models are difficult to distinguish empirically.

In appendix 9.A we describe how we first calibrated θ1 to drive the average detection error probability over the two ι models with observed states to be .4 and then, with θ1 thereby fixed, set θ2 to get a detection error probability of .2 for the signal distribution of the mixture model. We regard these values of detection error probabilities as being associated with moderate amounts of model uncertainty.16 For these values of θ1, θ2,17 figure 9.1 plots values of the Bayesian model mixing probability ῑt along with the worst-case probability ι̃t. As described in the previous paragraph, we have constructed our two models so that, with our setting of the initial model probability ῑ0 at .5, the terminal value of ῑt also approximates .5. An interesting thing about figure 9.1 is to watch how the worst-case ι̃t twists toward the long-run risk ι = 1 model. This probability twisting contributes to the countercyclical movements in the uncertainty contributions to the shock price (9.22) that we plot in figure 9.2.18

Figure 9.3 decomposes the uncertainty contribution to the shock prices into the components coming from the three lines of expression (9.22), namely, those associated with state dynamics with a known model, unknown states with a known model, and an unknown model, respectively. As anticipated, the first two contributions are positive, the first being constant while the second varies over time. The third contribution, due to uncertainty about the model, alternates in sign.

The first contribution is constant and relatively small in magnitude. We have specified our models so that G(ι)B(ι)′ = 0 and thus

[(1/θ1)G(ι)G(ι)′H + (1/θ1)G(ι)B(ι)′λ(ι)] = (1/θ1)GG′H,

which is the same for both models. While the forward-looking component to shock prices reflected in (1/θ1)B(ι)′λ(ι) is present in the model with full information, it is absent in our specification with limited information. Later we consider a specification in which this forward-looking component is activated.19 The second contribution features state estimation.

16We initiate the Bayesian probability ῑ0 = .5 and set the covariance matrices Σ̄0(ι) over hidden states at values that approximate what would prevail for a Bayesian who had previously observed a sample of the length 242 that we have in our actual sample period. In particular, we calibrated the initial state covariance matrices for both models as follows. First, we set preliminary 'uninformative' values that we took to be the variance of the unconditional stationary distribution of x1t(ι) and a value for the variance of x2(ι) of .012, which is orders of magnitude larger than the maximum likelihood estimates of μy for our entire sample. We set a preliminary state covariance between x1t(ι) and x2(ι) equal to zero. We put these preliminary values into the Kalman filter, ran it for a sample length of 242, and took the terminal covariance matrix as our starting value for the covariance matrix of the hidden state for model ι.

17The calibrated values are θ1^{-1} = 7, θ2^{-1} = 1.

18The figure plots all components of (9.22) except the ordinary risk price G′H.

19See section 9.7, where we specify an example in which G(ι)B(ι)′ is not zero.

Figure 9.2: Contributions to uncertainty prices from all sources of model uncertainty.

Figure 9.4 shows the components D(ι)Σ̄t(ι)λ(ι) that are important inputs into the state-uncertainty contribution. This figure reveals how hidden states are more difficult to learn about in model ι = 1 than in model ι = 0 because the presence of the very persistent hidden state slows the convergence of Σ̄t(1). In particular, for model ι = 1, the variance of the estimated unconditional mean of consumption growth, Σ̄t(1)22, converges more slowly to zero. The third contribution will generally fluctuate over time in ways that depend on the evolution of the discrepancy between the estimated means D(ι)x̄t(ι) under the two models, depicted in figure 9.5. Thus, while pessimism arising from a concern for robustness necessarily increases the uncertainty prices via the first two terms, it may either lower or raise them through the third term. Recall that the slope of the mean-standard deviation frontier, the maximum Sharpe ratio, is the absolute value of the shock price vector. Sizable negative or positive shock prices imply large maximum Sharpe ratios. Negative shock prices for some signal histories imply that positive consumption innovations are sometimes feared because of what they convey about the plausibility of alternative models. The magnitude of these prices determines the familiar risk-return tradeoffs from finance. How concerns about model uncertainty affect uncertainty premia on risky assets will ultimately depend on how returns are correlated with consumption shocks.

Figure 9.3: Contributions to uncertainty prices from state dynamics (top panel) and learning about the hidden state (middle panel), models known, and from an unknown model (bottom panel).

Figure 9.4: D(ι)Σ̄t(ι)λ(ι) for ι = 1 (top panel) and ι = 0 (bottom panel).

Figure 9.5: Difference in means and the means themselves from models ι = 1 and ι = 0.

Explanation for countercyclical uncertainty premia

Our representative consumer attaches positive probabilities to a model with statistically subtle persistence in consumption growth, namely, the long-run risk model of Bansal and Yaron (2004), and also to another model asserting close to iid consumption growth rates.20 The asymmetrical response of model uncertainty premia to consumption growth shocks comes from (i) how the representative consumer's concern about possible misspecification of the probabilities that he attaches to models causes him to calculate worst-case probabilities that depend on value functions, and (ii) how the value functions for the two models respond to shocks in ways that bring them closer together after positive consumption growth shocks and push them farther apart after negative shocks. The long-run risk model with very persistent consumption growth confronts the consumer with a long-lived shock to consumption growth. That affects the set of possible model misspecifications that he worries about. The representative consumer's concerns about these misspecifications are reflected in a more negative value of the term κ(ι) − (1/(2θ2))λ(ι)′Σ̄(ι)λ(ι) in formula (9.19) for the continuation value. Over our sample period, the difference across models varies monotonically from −2.55 to −2.42.

20Appendix 9.B reports a sensitivity analysis aimed at adding insight about the source of countercyclical shock prices.


The resulting difference in constant terms in the value functions for the models with and without long-run consumption risk sets the stage for an asymmetric response of uncertainty premia to consumption growth shocks. Consecutive periods of higher than average consumption growth raise the probability that the consumer attaches to the model with persistent consumption growth relative to that of the approximately iid consumption growth ι = 0 model. Although the long-run risk model has a more negative constant term, when a string of higher than average consumption growth rates occurs, persistence of consumption growth under this model means that consumption growth can be expected to remain higher than average for many future periods. This pushes the continuation values associated with the two models closer together than they are when consumption growth rates have recently been lower than average. Via exponential twisting formulas, continuation values determine the worst-case probabilities that the representative consumer attaches to the models. That the continuation values for the two models move farther apart after a string of negative consumption growth shocks implies that our cautious consumer slants probability more toward the pessimistic long-run risk model when recent observations of consumption growth have been lower than average than when these observed growth rates have been higher than average. The intertemporal behavior of robustness-induced probability slanting accounts for how learning in the presence of uncertainty about models induces time variation in uncertainty premia.

Roles of types of uncertainty

The decomposition of uncertainty contributions to shock prices depicted in figure 9.3 helps us to think about how these contributions would change if, by changing θ1 and θ2, we refocus the representative consumer's concern about misspecification on a different mixture of dynamics, hidden states, and unknown model. Figures 9.6 and 9.7 show the consequences of turning off fear of unknown dynamics by setting θ1 = +∞ while lowering θ2 to set the detection error probability again to .2 (here θ2^{-1} = 1.95). Notice that now the uncertainty contribution to shock prices remains positive over time. Apparently, the consumers in this economy no longer fear good news about consumption.

Figure 9.6: Difference in means (top panel) and Bayesian model probability ῑt (solid line) and worst-case model probability ι̃t (dashed line) (bottom panel). Here θ1 is set to +∞ and θ2 is set to give a detection error probability of .2.

Figure 9.7: Contributions to uncertainty prices from learning about the hidden state (top panel), model known; from an unknown model (middle panel); and from all sources (bottom panel). Here θ1 is set to +∞ and θ2 is set to give a detection error probability of .2. Because θ1 = +∞, the contribution from unknown dynamics is identically zero.

Figure 9.8: Contribution of learning to the risk price.

Effects of learning under rational expectations

It is interesting to contrast the kind of pessimism coming from robustness with the kind featured in Cogley and Sargent (2008a) that is induced by a pessimistic prior joined with ordinary Bayesian learning. Figure 9.8 shows the contributions to shock prices γG′H + G^{-1}[νt − D(ι)Xt] given in expression (9.14) when we assume that the true model used to price risks under full information is model ι = 0 with parameters set at values estimated at the end of our sample. Notice how the learning contribution to the shock price fluctuates between positive and negative values. These fluctuations can be interpreted in terms of alternating spells of Bayesian-learning-induced optimism and pessimism relative to what we have assumed are the true hidden state variables with the true model.21

21Suppose that the state vector processes {xt(ι)} are stationary and ergodic and the associated stationary distributions are used as the prior for the limited information structures. In this case, learning is about perpetually moving targets. In long samples, the entries of {xt(ι) − x̄t(ι)} will change signs so that on average they agree. In contrast, if an entry of xt(ι) is truly invariant but unknown a priori, then a systematic bias can emerge in a sample trajectory analogous to the one depicted in figure 9.8, even as the impact of the prior decays over time. For finite t's, the expectation of x̄t(ι) conditioned on the invariant parameter will be biased, as is standard in Bayesian analysis. This bias disappears only when we average across such trajectories in accordance with the prior over the invariant parameter.

A specification with state-dependent contributions from unknown dynamics

The fact that our specification (9.23) implies that G(ι)B(ι)′ = 0 for ι = 0, 1 disables a potentially interesting component of the uncertainty contributions in formula (9.22). To activate this effect, we briefly study a specification in which G(ι)B(ι)′ ≠ 0 and in which its difference across the two models contributes in interesting ways. In particular, we modify (9.23) to the single-shock specification

dX1t = a(ι)X1t(ι)dt + σ1(ι)dWt
dX2t = 0
dYt = X1t dt + X2t dt + σ2(ι)dWt   (9.24)

where X1t(ι), X2t(ι) are again scalars and Wt is now a scalar Brownian motion. We construct this system from the time-invariant innovations representation for system (9.23). This makes the second component of the state, the unconditional mean of consumption growth, known at time 0 because the (2,2) component of the steady-state covariance matrix of the hidden state is zero. The model is set up so that the signal reveals the first component of the state, so with ι known, the consumer faces no filtering problem. Therefore, the second source of uncertainty contribution vanishes and (9.22) simplifies to

ι̃ G^{-1}[(1/θ1)G(1)G(1)′H + (1/θ1)G(1)B(1)′λ(1)] + (1 − ι̃)G^{-1}[(1/θ1)G(0)G(0)′H + (1/θ1)G(0)B(0)′λ(0)]
+ (ῑ − ι̃)G^{-1}[D(1)x̄(1) − D(0)x̄(0)].   (9.25)

Figures 9.9 and 9.10 illustrate these outcomes when we set θ1^{-1} = 3.8, which delivers a detection error probability of .44, and θ2^{-1} = 2, which delivers an overall detection error probability of .137. We chose these values to illustrate the key forces at work.22 Notice how the contribution of unknown state dynamics in the top panel of figure 9.10 now varies over time. This reflects the difference in (1/θ1)G(ι)B(ι)′λ(ι) across the two models as well as the fluctuating value of ι̃. Notice that while the overall shock price varies, this variation is much smaller than in our previous calculations.

22The term μ∗(ι) = −θ1^{-1}[G(ι)′H + B(ι)′λ(ι)] is now -0.0231 for model ι = 0 and -0.146 for model ι = 1.

Figure 9.9: Difference in means (top panel) and Bayesian model probability ῑt (solid line) and worst-case model probability ι̃t (dashed line) (bottom panel). Here θ1^{-1} is set to give a detection error probability of .44 and θ2^{-1} is set to give a detection error probability of .137.

While the current example increases the contribution from a concern about misspecified dynamics, it is also true that by ignoring robust state estimation, we have excluded much of the interesting variation in shock prices.

If we were to lower θ2 enough to imply ι̃ = 1, then the representative consumer would act as if he puts probability one on the long-run risk model, as assumed by Bansal and Yaron (2004). Then (9.25) simplifies to

G^{-1}[(1/θ1)G(1)G(1)′H + (1/θ1)G(1)B(1)′λ(1)] + (ῑ − 1)G^{-1}[D(1)x̄(1) − D(0)x̄(0)].   (9.26)

The first term becomes constant, and the effect of not knowing the model contributes time variation to the second term. The first term is captured under the Bansal and Yaron (2004) approach that has the consumer assign probability 1 to the long-run risk model, but not the second term, which in our framework continues to play a role.

Figure 9.10: Contributions to uncertainty prices from unknown dynamics (top panel), from an unknown model (middle panel), and from both sources (bottom panel). Here θ1^{-1} is set to give a detection error probability of .44 and θ2^{-1} is set to give a detection error probability of .137.

9.8 Concluding remarks

The contributions of model uncertainty to shock prices combine (1) the same constant forward-looking contribution μ∗(ι) = −θ1^{-1}[G(ι)′H + B(ι)′λ(ι)] that was featured in earlier work without learning by Hansen et al. (1999) and Anderson et al. (2003), (2) additional, smoothly decreasing in time components −θ2^{-1}Σ̄t(ι)λ(ι) that come from learning about parameter values within models, and (3) the potentially volatile time-varying contribution highlighted in section 9.7 that is caused by the consumer's robust learning about the probability distribution over models.

Our mechanism for producing time-varying shock prices differs from other approaches. For instance, Campbell and Cochrane (1999) induce secular movements in risk premia that are backward looking because a social externality depends on current and past average consumption. To generate variation in risk premia, Bansal and Yaron (2004) assume stochastic volatility in consumption.23

23Our interest in learning and time series variation in the uncertainty premium differentiates us from Weitzman (2005) and Jobert et al. (2006), who focus on long-run averages.


Our analysis features the effects of robust learning on local prices of exposure to uncertainty. Studying the consequences of robust learning and model selection for multi-period uncertainty prices is a natural next step. Multi-period valuation requires the compounding of local prices, and when the prices are time-varying this compounding can have nontrivial consequences.

Our analysis also imposed a unitary elasticity of substitution in order to obtain convenient formulas for prices. While a unitary elasticity of substitution simplifies our calculations, it implies that the ratio of consumption to wealth is constant. Although consumption claims have no obvious counterpart in financial data, it remains interesting to relax the unitary elasticity of substitution because of its potential importance in the valuation of durable claims.

While our example economy is highly stylized, we can imagine a variety of environments in which learning about low frequency phenomena is especially challenging when consumers are not fully confident about their probability assessments. Hansen et al. (2008a) show that while long-run risk components have important quantitative impacts on low frequency implications of stochastic discount factors and cash flows, it is statistically challenging to measure those components. Belief fragility emanating from model uncertainty promises to be a potent source of fluctuations in the prices of long-lived assets.

Appendix 9.A Detection error probabilities

By adapting procedures developed by Hansen et al. (2002) and Anderson et al. (2003) in ways described by Hansen et al. (2008b), we can use simulations to approximate a detection error probability. Repeatedly simulate {y_{t+1} − y_t}, t = 1, …, T, under the approximating model. Evaluate the likelihood functions L^a_T and L^w_T of the approximating model and the worst-case model for a given (θ1, θ2). Compute the fraction of simulations for which L^w_T / L^a_T > 1 and call it r_a. This approximates the probability that the likelihood ratio says that the worst-case model generated the data when the approximating model actually generated the data. Do a symmetrical calculation to compute the fraction of simulations for which L^a_T / L^w_T > 1 (call it r_w), where the simulations are generated under the worst-case model. As in Hansen et al. (2002) and Anderson et al. (2003), define the overall detection error probability to be

p(θ1, θ2) = (1/2)(r_a + r_w).   (9.27)
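A schematic implementation of the detection-error calculation (9.27). To keep the example self-contained it compares two iid Gaussian signal models with hypothetical drifts; in the chapter the likelihoods L^a_T and L^w_T come from the filtered approximating and worst-case signal distributions, so the code below is only a sketch of the simulation logic.

```python
import numpy as np

# Sketch of formula (9.27): simulate under each model, compare likelihoods,
# and average the two error rates.
rng = np.random.default_rng(0)
T, n_sim = 242, 1000

def log_like(data, mean, sd):
    return np.sum(-0.5 * ((data - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi))

mean_a, mean_w, sd = 0.0045, 0.0040, 0.0048   # placeholder approximating / worst-case drifts

def error_rate(gen_mean, favored_mean, other_mean):
    """Fraction of simulations in which the wrong model attains the higher likelihood."""
    wrong = 0
    for _ in range(n_sim):
        data = gen_mean + sd * rng.standard_normal(T)
        if log_like(data, other_mean, sd) > log_like(data, favored_mean, sd):
            wrong += 1
    return wrong / n_sim

r_a = error_rate(mean_a, mean_a, mean_w)   # data from approximating model, worst case wins
r_w = error_rate(mean_w, mean_w, mean_a)   # data from worst-case model, approximating wins
p = 0.5 * (r_a + r_w)                      # formula (9.27)
print(r_a, r_w, p)
```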

Because in this paper we use what Hansen et al. (2008b) call Game I, we use the following sequential procedure to calibrate θ1 first, then θ2. First, we pretend that xt(ι) is observable for ι = 0, 1 and calibrate θ1 by calculating detection error probabilities for a system with an observed state vector using the approach of Hansen et al. (2002) and Hansen and Sargent (2008d, ch. 9). Then, having pinned down θ1, we use formula (9.27) to calibrate θ2. This procedure takes the point of view that θ1 measures how difficult it would be to distinguish one model of the partially hidden state from another if we were able to observe the hidden state, while θ2 measures how difficult it is to distinguish alternative models of the hidden state. The probability p(θ1, θ2) measures both sources of model uncertainty.

We proceeded as follows. (1) Conditional on model ι and the model ι state xt(ι) being observed, we computed the detection error probability as a function of θ1 for models ι = 0, 1. (2) Using a prior probability of π = .5, we averaged the two curves described in point (1) and plotted the average against θ1. We calibrated θ1 to yield an average detection error probability of .4 and used this value of θ1 in the next step. (3) With θ1 locked at the value just set, we then calculated and plotted the detection error probability for the mixture model against θ2. To generate data under the approximating mixture model, we sampled sequentially from the conditional density of signals under the mixture model, building up the Bayesian probabilities ῑt sequentially along a sample path. Similarly, to generate data under the worst-case mixture model, we sampled sequentially from the conditional density for the worst-case signal distribution, building up the worst-case model probabilities ι̃t sequentially. We set θ2 to fix the overall detection error probability equal to .2.

Appendix 9.B Sensitivity analysis

This appendix spotlights the force that produces countercyclical uncertainty contributions to shock prices by introducing a perturbation to our model that attenuates that force. The persistent countercyclical uncertainty contributions to shock prices in figure 9.8 come from a setting in which the representative consumer entertains two models that are difficult to distinguish. We study how uncertainty contributions to shock prices change when we expand the consumer's universe of models to include ones that fit the data even better than the two models in section 9.7. In particular, we now endow the consumer with seven models having the same value of σ2 but now with values of ρ = .36, .52, .67, .82, .89, .95, .99, with the values of σ1, μy being concentrated out via likelihood function maximization. In terms of the likelihood function for the whole sample, the end values .36 and .99 are the poorest fitting ones and the ρ = .82 model is the best fitting. We start the representative consumer with a uniform prior over the seven models and set θ1^{-1} = 6.85, θ2^{-1} = .8 (these are set to give the same detection error probabilities of .4 for the state dynamics and .2 overall that we used in the text) to obtain the uncertainty contribution to shock prices reported in figure 9.11. We report Bayesian model probabilities and their worst-case counterparts in figure 9.12.

Countercyclical shock prices still emerge, but they are moderated relative to those in figure 9.8 in the text. The reason is to be found in how the presence of models that fit better eventually pushes down the Bayesian model probability on the long-run risk model. Pushing that probability down far enough diminishes its influence on uncertainty contributions to shock prices even in the face of the tendency to twist model probabilities toward the long-run risk model. Even after twisting, the worst-case probabilities on that model are much smaller than they were in figure 9.1.

We find it interesting to compare competing models that have dispersed implications for shock prices, as featured in our paper. While adding models with ρ's between .36 and .99 that give higher values of the likelihood diminishes the variation in the contributions to uncertainty that we have computed, the impact on the rational expectations counterpart can be even more dramatic because those intermediate models fit the data in ways that imply substantially smaller shock prices. For the recursive utility model with full information, the magnitude of the shock price is γ|B(ι)′λ(ι) + G(ι)′H| where γ is a measure of risk aversion. For the model with the largest likelihood, |B(ι)′λ(ι) + G(ι)′H| = 0.008622, while this magnitude is 0.039627 for the ρ = .99 model. Thus, a value of γ more than four times larger is required at the maximum likelihood estimate for the magnitude of the shock price to remain the same as for the model with ρ = .99. To the extent that ρ = .99 is statistically implausible, a rational expectations econometrician either rejects the model or finds that consumers are highly risk averse.

Figure 9.11: Contribution of model uncertainty to the risk price with seven models.

Figure 9.12: Bayesian and worst-case model probabilities with seven models.


Chapter 10

Three types of ambiguity1

Abstract

For each of three types of ambiguity, we compute a robust Ramsey plan and an associated worst-case probability model. Ex post, ambiguity of type I implies endogenously distorted homogeneous beliefs, while ambiguities of types II and III imply distorted heterogeneous beliefs. Martingales characterize alternative probability specifications and clarify distinctions among the three types of ambiguity. We use recursive formulations of Ramsey problems to impose local predictability of commitment multipliers directly. To reduce the dimension of the state in a recursive formulation, we transform the commitment multiplier to accommodate the heterogeneous beliefs that arise with ambiguity of types II and III. Our formulations facilitate comparisons of the consequences of these alternative types of ambiguity.

Keywords: Robustness, ambiguity, martingales, Ramsey plan, commitment, local predictability, heterogeneous beliefs.

1We thank Marco Bassetto, Anmol Bhandari, Jaroslav Borovicka, Rui Cui, Christopher Sleet, and Sevin Yeltekin for helpful comments on earlier versions. We also thank Anmol Bhandari and Rui Cui for excellent computational assistance.

10.1 Introduction

Rational expectations models attribute a unique probability model to diverse agents. Gilboa and Schmeidler (1989) express a single person's ambiguity with a set of probability models. A coherent multi-agent setting with ambiguity must impute possibly distinct sets of models to different agents, and also specify each agent's understanding of the sets of models of other agents.2 This paper studies three ways of doing this for a Ramsey planner.

We analyze three types of ambiguity, called I, II, and III, that a Ramsey planner might have. In all three, the Ramsey planner believes that private agents experience no ambiguity. This distinguishes our models from others that attribute ambiguity to private agents. For example, in what we shall call the type 0 ambiguity analyzed by Karantounias (2012), the planner has no model ambiguity but believes that private agents do.

To illustrate these distinctions, figure 10.1 depicts four types of ambiguity within a class of models in which a Ramsey planner faces a private sector. The symbols x and o signify distinct probability models over exogenous processes. (The exogenous process is a cost-push shock in the example that we will carry along in this paper.) Circles with either x's or o's denote boundaries of sets of models. An x denotes a Ramsey planner's model while an o denotes a model of the private sector. In a rational expectations model, there is one model x for the Ramsey planner and the same model o = x for the private sector, so a graph like figure 10.1 for a rational expectations model would be a single x on top of a single o.

The top left panel of figure 10.1 depicts the type of ambiguity analyzed by Karantounias (2012).3 To distinguish it from three other types to be studied in this paper, we call this type 0 ambiguity. A type 0 Ramsey planner has a single model x but thinks that private agents have a set of models o contained in an entropy ball that surrounds the planner's model. Karantounias's 2012 Ramsey planner takes into account how its actions influence private agents' choice of a worst-case model along the boundary of the set of models depicted by the o's.

2Battigalli et al. (2011) analyze self-confirming equilibria in games where players are ambiguity averse.

3Orlik and Presno (2012) expand the space of strategies to study problems in which a Ramsey planner cannot commit and in which the private sector and the Ramsey planner both have sets of probability models. They represent history-dependent strategies in terms of pairs of continuation values and also promised marginal utilities of private consumption.

Figure 10.1: Type 0, top left: Ramsey planner trusts its approximating model (x), knowing private agents (o) don't trust it. Type I, top right: Ramsey planner has a set of models (x) centered on an approximating model, while the private sector knows a correct model (o) among the Ramsey planner's set of models x. Type II, bottom left: Ramsey planner has a set of models (x) surrounding its approximating model, which the private sector trusts (o). Type III, bottom right: Ramsey planner has a single model (x) but the private sector has another model in an entropy ball around (x).

Part of the challenge for the Ramsey planner is to evaluate the private agent's Euler equation using the private agent's worst-case model drawn from the boundary of the set.4

Models of types I, II, and III differ from the type 0 model because in these three models, the Ramsey planner believes that private agents experience no model ambiguity. But the planner experiences ambiguity. The three types differ in what the planner is ambiguous about. The private sector's response to the Ramsey planner's choices and the private sector's view of the exogenous forcing variables have common structures across all three types of ambiguity.

4Through its choice of actions that affect the equilibrium allocation, the planner manipulates private agents' worst-case model.

In all three, private agents view the Ramsey planner's history-dependent strategy as a sequence of functions of current and past values of exogenously specified processes. In addition, the private sector has a well specified view of the evolution of these exogenous processes. These two inputs determine the private sector's actions. Although the planner's strategy and the private sector's beliefs differ across our three types of ambiguity, the mapping (i.e., the reaction function) from these inputs into private sector responses is identical. We will represent this generalized notion of a reaction function as a sequence of private sector Euler equations. When constructing Ramsey plans under our three types of ambiguity, we will alter how the Ramsey planner views both the evolution of the exogenous processes and the beliefs of the private sector. We will study the consequences of three alternative configurations that reflect differences in what the Ramsey planner is ambiguous about.

The top right panel of figure 10.1 depicts type I ambiguity. Here the Ramsey planner has a set of models x centered on an approximating model. The Ramsey planner is uncertain about both the evolution of the exogenous processes and how the private sector views these processes. The planner presumes that the private sector uses a probability specification that actually governs the exogenous processes. To cope with its ambiguity, the Ramsey planner's alter ego chooses a model on the circle, while evaluating private sector Euler equations using that model.

The bottom left panel of figure 10.1 depicts type II ambiguity. In the spirit of Hansen and Sargent (2008b, ch. 16), the Ramsey planner has a set of models surrounding an approximating model x that the private sector o completely trusts; so the private sector's set of models is a singleton on top of the Ramsey planner's approximating model. The Ramsey planner's probability-minimizing alter ego chooses a model on the circle, while evaluating the private agent's Euler equations using the approximating model o.

The bottom right panel of figure 10.1 depicts type III ambiguity. Following Woodford (2010), the Ramsey planner has a single model x of the exogenous processes and thus no ambiguity along this dimension. Nevertheless, the planner faces ambiguity because it knows only that the private sector's model o is within a "ball" around its own model. The Ramsey planner evaluates the private sector's Euler equations using a worst-case model chosen by the Ramsey planner's alter ego.

This figure is just for motivation. Our formal analysis is more complex.


There are many (an infinite number of) dimensions associated with our "entropy balls" of probability specifications. Technically, we do not specify such balls but instead penalize relative entropy as a way to restrain how much concern the Ramsey planner has for model ambiguity. To do this, we extend and apply the multiplier preferences of Hansen and Sargent (2001b).

For each of our three types of ambiguity, we compute a robust Ramsey plan and an associated worst-case probability model. A worst-case distribution is sometimes called an ex post distribution, meaning after the robust decision maker's minimization over probabilities. Ex post, ambiguity of type I delivers a model of endogenously distorted homogeneous beliefs, while ambiguities of types II and III give distinct models of endogenously heterogeneous beliefs.

A Ramsey problem can be solved by having the planner choose a path for the private sector's decisions subject to restrictions on the private sector's co-state variable λt at dates t ≥ 0 that are implied by the private sector's optimization.5 The private sector's Euler equation for λt involves conditional expectations of future values of λt, which makes it differ from a standard 'backward-looking' state evolution equation in ways that we must take into account when we pose Ramsey problems that confront alternative types of ambiguity. A Ramsey plan can be represented recursively by using the "co-state on the private sector co-state λt" as a state variable ψt for the Ramsey planner. The planner chooses the initial value ψ0 to maximize its time 0 value function. The evolution of ψt encodes the planner's commitment to confirm the private sector's earlier expectations about the Ramsey planner's time t actions. It is particularly important for us to characterize the probability distribution with respect to which the private sector's expectations are formed and how ψt responds to shocks.

For linear-quadratic problems without robustness, a certainty equivalence principle implies that shock exposures have no impact on decision rules.6 But even in linear-quadratic problems, concerns about robustness make shock exposures affect decision rules by affecting the scope of concerns about statistical misspecification.

Along with others, in earlier work we have analyzed the effects of shock exposures on robust decisions too casually. In this paper, we proceed systematically by starting with fundamentals and distinguishing among conditional expectations associated with alternative probability models. We exploit the finding that, without concerns about robustness, the planner's commitment multiplier ψt is "locally predictable" and hence has zero exposure to shocks in the current period. We then describe ways that a Ramsey planner seeks to be robust for each of our three types of statistical ambiguity and produce a Hamilton-Jacobi-Bellman equation for each.

5Marcet and Marimon (2011) and the references cited there formulate a class of problems like ours under rational expectations. Marcet and Marimon (2011) discuss measurability restrictions on multipliers that are closely related to ones that we impose.

6Shock exposures do affect constant terms in value functions.

Technically, this paper (1) uses martingales to clarify distinctions among the three types of ambiguity; (2) finds, to our initial surprise, that even in continuous time limits and even in our very simple linear New Keynesian model, ambiguity of types II and III leads to zero-sum games that are not linear-quadratic; (3) uses recursive formulations of Ramsey problems to impose local predictability of commitment multipliers in a direct way; and (4) finds, as a consequence of (3), that to reduce the dimension of the state in the recursive formulation, it is convenient to transform the commitment multiplier in a way that accommodates heterogeneous beliefs with ambiguity of types II and III.7

The ex post belief distortion that emerges from ambiguity of type I is reminiscent of some outcomes for a robust social planning problem appearing in some of our earlier research, but there are important differences. Hansen and Sargent (2008b, chs. 12-13) used a robust social planning problem to compute allocations as well as worst-case beliefs that we imputed to a representative agent in a model of competitive equilibrium without economic distortions. In effect, we appealed to welfare theorems and restrictions on preferences to justify a robust planner. We priced risky assets by taking the representative agent's first-order conditions for making trades in a decentralized economy, then evaluating them at the allocation chosen by a robust social planner under the imputed worst-case beliefs (e.g., Hansen and Sargent (2008b, ch. 14)). In this paper, we can't appeal to the welfare theorems.8

7We do not analyze the type 0 ambiguity studied by Karantounias (2012) mainlyfor the technical reason that the trick we use to reduce the dimension of the state inthe planner’s Bellman equations for ambiguity of types II and III in sections 10.7 and10.8 does not apply. The Bellman equation analyzed by Karantounias (2012) containsan additional state variable relative to ours.

8Even in heterogeneous-agent economies without economic distortions, where thewelfare theorems do apply, formulating Pareto problems with agents who are concernedabout robustness requires an additional endogenous state variable to characterize efficient

Page 355: Maskin ambiguity book mock 2 - New York University

10.2. Illustrative model 347

Section 10.2 describes a simple New Keynesian model that we use as alaboratory in which to study our three types of ambiguity. Section 10.3 setsthe stage by solving a Ramsey problem without robustness in two ways, onein the space of sequences, another recursively. Section 10.4 describes how torepresent alternative probability models as distortions of a baseline approx-imating model. Section 10.5 solves a robust Ramsey problem under the firsttype of ambiguity. Section 10.6 studies a Ramsey problem with exogenousbelief heterogeneity between the private sector and the Ramsey planner.The model with arbitrary belief heterogeneity is of interest in its own rightand is also useful in preparing for the analysis of the robust Ramsey prob-lem under the second type of ambiguity to be presented in section 10.7.Section 10.8 then studies the robust Ramsey problem under the third typeof ambiguity. Section 10.9 proposes new local approximations to compareoutcomes under robust Ramsey plans constructed under the three types ofambiguity. We illustrate our analysis with a numerical example in section10.10. After section 10.11 offers concluding remarks, appendices 10.B and10.C describe calculations that illustrate how sequence formulations andrecursive formulations of Ramsey plans agree.

10.2 Illustrative model

For concreteness, we use a simple version of a New Keynesian model ofWoodford (2010). We begin by describing the model and Ramsey problemswithout ambiguity in discrete time and in continuous time.

Let time be discrete with t = εj for ε > 0 and integer j ≥ 0. A cost-pushshock ct is a function f(xt) of a Markov state vector xt described by

xt+ε = g(xt, wt+ε − wt, ε), (10.1)

where {wt} is a standard Brownian motion so that the increment wt+ε−wtis normally distributed with mean zero and variance ε and is independent ofws for 0 ≤ s ≤ t. The private sector treats c as exogenous to its decisions.

allocations recursively. See Anderson (2005), who studies risk-sensitive preferences thatalso have an interpretation as expressing aversion to model ambiguity with what havecome to be called multiplier preferences.

Page 356: Maskin ambiguity book mock 2 - New York University

348 Chapter 10. Three types of ambiguity

The private sector’s first-order necessary conditions are

pt − pt−ε = ελt (10.2)

λt = ε(κyt + ct + c∗) + exp(−δε)E [λt+ε|Ft] (10.3)

εiε,t − ελt = ρE [yt+ε|Ft]− ρyt + εd∗, (10.4)

where iε,t is the one-period (of length ε) nominal interest rate set at date t.Equation (10.3) is a New Keynesian Phillips curve and equation (10.4) is aconsumption Euler equation.

To obtain a continuous-time model that is mathematically easier toanalyze, we shrink the discrete-time increment ε. Index the time incrementby ε = 1

2jfor some positive integer j. Define the local mean μλt to be

μλ,t = limε↓0

1

εE [λt+ε − λt|Ft] ,

and drive ε to zero in (10.3) to get a continuous time version of a newKeynesian Phillips curve:

μλ,t = δλt − κyt − ct − c∗. (10.5)

Applying a similar limiting argument to (10.4) produces a continuous-timeconsumption Euler equation:

μy,t =1

ρ(it − λt − d∗) (10.6)

where here λt is the instantaneous inflation rate and it is the instantaneousnominal interest rate. We depict the continuous-time counterpart to theexogenous state evolution equation (10.1) as

dxt = μx(xt)dt+ σx(xt)dwt.

These equations, or modifications of them that appropriately allow for al-ternative specifications of private sector beliefs, constrain our Ramsey plan-ners.

10.3 No concern about robustness

In this section, we first pose a Ramsey problem as a Lagrangian and de-duce a set of first-order conditions that restrict the dynamic evolution of

Page 357: Maskin ambiguity book mock 2 - New York University

10.3. No concern about robustness 349

the state variables and associated Lagrange multipliers. We can computea Ramsey plan by solving these equations subject to the appropriate ini-tial and terminal conditions. When these equations are linear, we couldsolve them using invariant subspace methods. We take a different routeby developing and solving a recursive version of the Ramsey problem usingthe multiplier on the private sector Euler equation as a state variable. Theidea of constructing a recursive representation of a Ramsey plan in this wayhas a long history. See (Ljungqvist and Sargent 2004a, chs. 18,19) for anextensive discussion and references. In later sections, we will extend thatliterature by constructing robust counterparts to recursive formulation ofthe Ramsey problem in discrete and continuous time.

Planner’s objective function

In discrete time and without concerns about robustness the Ramsey plannermaximizes

−1

2E

∞∑j=0

exp(−εδj)[(λεj)

2 + ζ(yεj − y∗)2]|F0

). (10.7)

In a continuous-time limit, the planner’s objective becomes

−1

2E

(∫ ∞

0

exp(−δt)[(λt)

2 + ζ(yt − y∗)2]dt|F0

).

In posing our Ramsey problem, we follow Woodford (2010) in specifyingthe Ramsey planner’s objective function in a way that induces the Ramseyplanner to trade off output and inflation dynamics. The Ramsey plannertakes the firm’s Euler equation (10.5) as an implementability constraint andchooses welfare-maximizing processes for {λt} and {yt}. The consumer’sEuler equation (10.6) will then determine an implied interest rate rule it =λt − ρμy,t + d∗ that implements the Ramsey plan.

A discrete-time sequence formulation

A Ramsey planner chooses sequences {λεj, yεj}∞j=0 to maximize (10.7) sub-ject to (10.3) and ct = f(xt) with xt governed by (10.1). Form the La-

Page 358: Maskin ambiguity book mock 2 - New York University

350 Chapter 10. Three types of ambiguity

grangian

− 1

2E

∞∑j=0

exp(−εδj)[(λεj)

2 + ζ(yεj − y∗)2]|F0

]

+E

[ ∞∑j=0

exp(−εδj)ψε(j+1)

[λεj − ε (κyεj + cεj + c∗)− exp(−εδ)λ(j+1)ε

]|F0

].(10.8)

Remark 10.3.1. The private sector Euler equation (10.3) is cast in termsof mathematical expectations conditioned on time t information. This makesit appropriate to restrict the Lagrange multiplier ψt+ε to depend on date tinformation. We shall exploit this measurability condition extensively whenwe drive ε to zero to obtain continuous-time limits. This measurabilitycondition is the source of local predictability of ψt.

First-order conditions for maximizing (10.8) with respect to λt, yt, re-spectively, are

ψt+ε − ψt − ελt = 0 (10.9)

−ζ(yt − y∗)− κψt+ε = 0.

Combine (10.9) with the equation system (10.1) that describes the evolutionof {xt} and also the private-sector Euler equation (10.3). When the xdynamics (10.1) are linear, a Ramsey plan without robustness is a stabilizingsolution of the resulting system of equations, which can be computed usinga stabilizing subspace method described by Hansen and Sargent (2008b,chs. 4,16).

A recursive formulation

We now propose an alternative approach to the Ramsey problem withoutrobustness that builds on recursive formulations of Stackelberg or Ram-sey problems that were summarized by Ljungqvist and Sargent (2004a,chs. 18,19) and extended by Marcet and Marimon (2011). To encode his-tory, view ψ as an endogenous state variable that evolves as indicated by(10.9), namely,

ψt+ε = ελt + ψt.

Because the Brownian increment wt+ε −wt does not affect the evolution ofψt+ε, ψt+ε is said to be “locally predictable”.

Page 359: Maskin ambiguity book mock 2 - New York University

10.3. No concern about robustness 351

In the spirit of dynamic programming, we transform a multi-period prob-lem to a sequence of two-period problems. Recall that the cost-push shock cis a function f(x) of a Markov state vector x that obeys (10.1). Guess thatan appropriate state vector for next period is (x+, ψ+). Soon we will arguethat we can interpret ψ+ as a commitment multiplier. Let λ+ = F+(x+, ψ+)be a policy function for λ+. Let V +(x+, ψ+) denote a planner’s next-periodvalue function inclusive of a term that encodes commitment. To be moreprecise V (x, ψ) + ψF (x, ψ) will be the discounted expected value of thesingle period contributions given by

− ε

2

[(λt)

2 + ζ(yt − y∗)2]

to the Ramsey planner’s objective. In our first recursive formulation, wewill take to be the next period function V +(x+, ψ+) + ψ+F+(x+, ψ+) andthen compute the current-period functions F and V . To ensure that com-mitments are honored we will subtract a term ψλ from the current-periodobjective when we optimize with respect λ required for computing F . No-tice that V includes this term evaluated at λF (x, ψ).

It turns out that by virtue of optimization, we can restrict the twofunctions V + and F+ to satisfy

V +2 (x+, ψ+) = −F+(x+, ψ+) (10.10)

where V +2 is the derivative of V + with respect to its second argument ψ+.

We will show that property (10.10) is replicated under iteration on theBellman equation for the Ramsey planner. The relations between V + andF+ and between V and F will lead us to construct an alternative Bellmanequation mapping V + to V . Our specific tasks in this section are to i)provide an evolution equation for ψ+ and interpret ψ and ψ+ formally ascommitment multipliers; ii) show that the counterpart to restriction (10.10)applies to F ; and iii) construct a Bellman equation that applies to V andV + with no specific reference to F or F+.

Problem 10.3.2. Our first Bellman equation for the Ramsey planner is

V (x, ψ) =maxy,λ

−ψλ− ε

2

[λ2 + ζ(y − y∗)2

]+

+ exp(−δε)E[V +(x+, ψ+) + ψ+F+(x+, ψ+)|x, ψ

](10.11)

Page 360: Maskin ambiguity book mock 2 - New York University

352 Chapter 10. Three types of ambiguity

where the maximization is subject to

λ− exp(−δε)E[F+(x+, ψ+)|x, ψ

]− ε [κy + f(x) + c∗] = 0 (10.12)

ελ + ψ − ψ+ = 0 (10.13)

g(x, w+ − w)− x+ = 0.

Notice the term −ψλ on the right side of (10.11). This term remembersand confirms commitments and plays a vital role when it comes to optimiz-ing with respect to λ. In the special case in which ψ = 0, which happensto be the initial value set at by the Ramsey planner at date zero, the onlydate at which the planner is free to set ψ, this commitment term vanishes.Soon we will display an alternative Bellman equation (10.17) that involvesonly the function V but that nevertheless encodes the private sector Eulerequation.

To justify our interpretation of ψ+ and ψ as commitment multipliers,we solve the Bellman equation (10.11) by first introducing multipliers �1and �2 on the first two constraints (10.12) and (10.13) for Problem 10.3.2.First-order conditions for maximizing the resulting Lagrangian with respectto λ and y are

−ελ + �1 + ε�2 − ψ = 0,

−ζ(y − y∗)− κ�1 = 0. (10.14)

Combining the first equation of (10.14) with the second constraint (10.13)for Problem 10.3.2 gives

ψ+ = �1 + ε�2.

Our next result justifies our interpretation of ψ+ and the evolution that weposited for ψ+ in the constraint (10.13). We link the multiplier �1 to ψ+

and verify that this constraint is slack.

Lemma 10.3.3. In problem 10.3.2, the multiplier �1 on constraint (10.12)equals ψ+ and the multiplier �2 on constraint (10.13) equals zero. Further-more,

y = y∗ −(κ

ζ

)(ψ + ελ) , (10.15)

where λ = F (x, ψ) satisfies the private firm’s Euler equation (10.12). Fi-nally, V2(x, ψ) = −F (x, ψ).

Page 361: Maskin ambiguity book mock 2 - New York University

10.3. No concern about robustness 353

See Appendix 10.A for a proof.Finally, we construct a Bellman equation for the Ramsey planner that

incorporates the private sector Euler equation by using our characterizationof ψ+ as a Lagrange multiplier. Express the contribution of the privatesector Euler equation to a Lagrangian formed from the optimization on theright side of (10.11):

ψ+[λ− exp(−δε)E

[F+(x+, ξ+)|x, ψ

]− ε (κy + c+ c∗)

]= − exp(−δε)E

[ψ+F+(x+, ψ+)|x, ψ

]+ ψ+ [λ− ε (κy + c + c∗)] ,

where we have used the fact that ψ+ is locally predictable. Adding thisLagrangian term to the Ramsey planner’s objective results in:

− ψλ− ε

2

[λ2 + ζ(y − y∗)2

]+ exp(−δε)E

[V +(x+, ψ+)|x, ψ

]+ ψ+ [λ− ε (κy + c+ c∗)] . (10.16)

Not surprisingly, by differentiating with respect to y, λ and ψ+, we repro-duce consequence (10.15) of the first-order conditions reported in Lemma10.3.3. This optimization has us maximize with respect to λ and y. Bymaximizing with respect to λ we obtain state evolution (10.13), and byminimizing with respect to ψ+, we obtain the private sector Euler equation(10.12).

In what follows we consider ψ+ as an endogenous state variable and λas a control. After substituting for ψ+ into the Lagrangian (10.16), we areled to study the following recursive, zero-sum game.

Problem 10.3.4. An alternative Bellman equation for a discrete-time Ram-sey planner without robustness is

V (x, ψ) = minλ

maxy

ε

2

[λ2 − ζ(y − y∗)2

]+ exp(−δε)E

[V +(x+, ψ+)|x, ψ

]− ε(ψ + ελ) [κy + f(x) + c∗] ,

(10.17)

where the extremization is subject to

ψ + ελ− ψ+ = 0 (10.18)

g(x, w+ − w, ε)− x+ = 0.

Page 362: Maskin ambiguity book mock 2 - New York University

354 Chapter 10. Three types of ambiguity

Claim 10.3.5. Discrete-time problems 10.3.2 and 10.3.4 share a commonvalue function V and common solutions for y, λ as functions of the statevector (x, ψ).

Proof. The first-order condition for y implies the same formula given inLemma 10.3.3. To verify the private sector Euler equation, introduce amultiplier � on constraint (10.18). Differentiate with respect to λ and divideby ε:

λ+ �− ε [κy + f(x) + c∗] = 0. (10.19)

Differentiate with respect to ψ+ and substitute −F+ for V +2 to get

−�− exp(−δε)E[F+(x+, ψ+)|x, ψ

]= 0.

Solving this equation for � and substituting into (10.19) allows us to expressthe private sector Euler equation as constraint (10.12) in Problem 10.3.2.

Remark 10.3.6. In Problem 10.3.4, the Ramsey planner minimizes withrespect to λ, taking into account its contribution to the evolution of themultiplier ψ+. That we minimize with respect to λ is the outcome of ourhaving substituted for ψ+ into (10.16). In contrast to Problem 10.3.2, theconstraint (10.13) ceases to be slack. Instead of being included as a separateconstraint, Problem 10.3.4 embeds the private-sector Euler equation (i.e.,equation (10.12)), in the criterion to be optimized.

Remark 10.3.7. At time 0, ψ is a choice variable for the Ramsey planner.The optimal choice of ψ solves

minψV (x, ψ) + ψF (x, ψ).

First-order conditions are

V2(x, ψ) + F (x, ψ) + ψF2(x, ψ) = 0.

Since V2 = −F , a solution to the above equation is ψ = 0, which is consis-tent with our initial condition ψ0 = 0.

Page 363: Maskin ambiguity book mock 2 - New York University

10.3. No concern about robustness 355

Continuous-time recursive formulation

In a continuous-time formulation of the Ramsey problem without concernsabout robustness, the exogenous state vector evolves according to:

dxt = μx(xt)dt+ σx(xt)dwt

dψt = λtdt.

Using Ito calculus, we characterize the effects of the evolution of x, ψ onthe value function V by differentiating the value function. Subtract V fromboth sides of (10.17) and divide by ε to obtain

Problem 10.3.8.

0 =minλ

maxy

1

2λ2 − ζ

2(y − y∗)2 − κψy − ψf(x)− ψc∗

− δV + V1 · μx + V2λ

+1

2trace (σx

′V11σx) . (10.20)

From the first-order conditions,

y = y∗ − κ

ζψ

λ = −V2.

As in our discrete-time formulation, we used a Lagrangian to impose theprivate sector Euler equation under the approximating model. In Appendix10.A, we verify that satisfaction of the Hamilton-Jacobi-Bellman equation(10.20) implies that the Euler equation is also satisfied.

We end the section with a caveat. We have assumed attainment anddifferentiability without providing formal justification. We have not estab-lished the existence of smooth solutions to our Bellman equations. Whilewe could presumably appeal to more general viscosity solutions to the Bell-man equation, this would require a different approach to verifying that theprivate sector’s Euler equation is satisfied than what we have done in Ap-pendix 10.A. In the numerical example of section 10.10, there is a quadraticsolution to the Hamilton-Jacobi-Bellman (HJB) equation (10.20), so therethe required smoothness prevails.

Page 364: Maskin ambiguity book mock 2 - New York University

356 Chapter 10. Three types of ambiguity

10.4 Representing probability distortions

To represent an alternative probability model, we use a positive martingale zwith a mathematical expectation with respect to the approximating modelequal to unity. By setting z0 = 1, we indicate that we are conditioningon time 0 information. A martingale z is a likelihood ratio process for aprobability model perturbed vis a vis an approximating model. It followsfrom the martingale property that the perturbed probability measure obeysa Law of Iterated Expectations. Associated with a martingale z are theperturbed mathematical expectations

E (ρt+τ |Ft) = E

(zt+τzt

ρt+τ |Ft

),

where the random variable ρt+τ is in the date t+ τ information set. By themartingale property

E

(zt+τzt

|Ft

)= 1.

Measuring probability distortions

To measure probability distortions, we use relative entropy, an expectedlog-likelihood ratio, where the expectation is computed using a perturbedprobability distribution. Following Hansen and Sargent (2007c), the term

∞∑j=0

ε exp[−εδ(j + 1)]E(zε(j+1)

[log zε(j+1) − log zεj

]|F0

)= [1− exp(−εδ)]

∞∑j=0

ε exp[−εδ(j + 1)]E[zε(j+1) log zε(j+1)|F0

](10.21)

measures discounted relative entropy between a perturbed (by z) probabil-ity model and a baseline approximating model. The component

E[zε(j+1) log zε(j+1)|F0

]measures conditional relative entropy of perturbed probabilities of date ε(j+1) events conditioned on date zero information, while

E(zε(j+1)

[log zε(j+1) − log zεj

]|Fεj

)measures conditional relative entropy of perturbed probabilities of date ε(j+1) events conditioned on date εj information.

Page 365: Maskin ambiguity book mock 2 - New York University

10.5. The first type of ambiguity 357

Representing continuous-time martingales

We acquire simplifications by working with a continuous time model thatemerges from forming a sequence of discrete time models with time incre-ment ε and driving ε to zero. For continuous Brownian motion informationstructures, altering the probability model changes the drift of the Brow-nian motion in a way conveniently described in terms of a multiplicativerepresentation of the martingale {zt}:

dzt = ztht · dwt.

Under the perturbed model associated with the martingale z, the drift ofdwt is htdt. We use Ito’s lemma to characterize the evolution of log z andz log z:

d log zt = −1

2|ht|2dt+ ht · dwt,

dzt log zt =1

2zt(ht)

2dt+ zt(1 + log zt)ht · dwt.

The drift or local mean of(zt+ε

zt

)(log zt+ε − log zt) at t for small positive ε

is 12(ht)

2. Hansen et al. (2006b) used this local measure of relative entropy.Discounted relative entropy in continuous time is

1

2E

[∫ ∞

0

exp(−δt)zt(ht)2dt|F0

]= δE

[∫ ∞

0

exp(−δt)zt log ztdt|F0

].

In our continuous-time formulation, the robust Ramsey planner chooses h.

10.5 The first type of ambiguity

In the first type of ambiguity, the planner thinks that the private sectorknows a model that is distorted relative to the planner’s approximatingmodel.

Managing the planner’s ambiguity

To respond to its ambiguity about the private sector’s statistical model, theRamsey planner chooses z to minimize and y and λ to maximize a multiplier

Page 366: Maskin ambiguity book mock 2 - New York University

358 Chapter 10. Three types of ambiguity

criterion9

− 1

2E

∞∑j=0

exp(−εδj)zεj[(λεj)

2 + ζ(yεj − y∗)2]|F0

)

+ θE

( ∞∑j=0

ε exp[−εδ(j + 1)]zε(j+1)

[log zε(j+1) − log zεj

]|F0

)(10.22)

subject to the implementability constraint

λt = ε(κyt + ct + c∗) + exp(−δε)E(zt+εzt

λt+ε|Ft

)(10.23)

and the exogenously specified cost-push process. Here the parameter θ pe-nalizes martingales z with large relative entropies. Setting θ arbitrarilylarge makes this problem approximate a Ramsey problem without robust-ness. In (10.22), the Ramsey planner evaluates its objective under theperturbed probability model associated with the martingale z. Also, in theprivate sector’s Euler equation (10.23), the Ramsey planner evaluates theexpectation under the perturbed model. These choices capture the plan-ner’s belief that the private sector knows a correct probability specificationlinked to the planner’s approximating model by a probability distortion zthat is unknown to the Ramsey planner but known by the private sector.

Evidently

E

[zt+εzt

(ct+ε − ct)|Ft

]= ενcct + E

[zt+εzt

(wt+ε − wt)|Ft

]where E

[zt+ε

zt(wt+ε − wt)|Ft

]is typically not zero, so that the martingale

{zt} alters the conditional mean of the cost-push process.Form the Lagrangian

− 1

2E

∞∑j=0

exp(−εδj)zεj[(λεj)

2 + ζ(yεj − y∗)2]|F0

]

+θE

( ∞∑j=0

ε exp[−εδ(j + 1)]zε(j+1)

[log zε(j+1) − log zεj

]|F0

)9See Hansen and Sargent (2001b).

Page 367: Maskin ambiguity book mock 2 - New York University

10.5. The first type of ambiguity 359

+E

[ ∞∑j=0

exp(−εδj)zε(j+1)ψε(j+1)

[λεj − ε (κyεj + cεj + c∗)− exp(−εδ)λ(j+1)ε

]|F0

].

(10.24)

First-order conditions for maximizing (10.24) with respect to λt and yt,respectively, are

ztψt+ε − ztψt − εztλt = 0

−ζzt(yt − y∗)− κztψt+ε = 0,

where we have used the martingale property E(zt+ε|Ft) = zt. Because ztis a common factor in both first-order conditions, we can divide both by ztand thereby eliminate zt.

Recursive formulation with arbitrarily distorted

beliefs

For our recursive formulation in discrete time, initially we posit that thecost-push process c is a function f(x) of a Markov state vector x and thatthe martingale z itself has a recursive representation, so that

x+ = g(x, w+ − w, ε)

z+ = zk(x, w+ − w, ε), (10.25)

where we impose the restriction E [k(x, w+ − w, ε)|x] = 1 that lets us in-terpret z+

z= k(x, w+ − w, ε) as a likelihood ratio that alters the one-step

transition probability for x. For instance, since w+ − w is a normally dis-tributed random vector with mean zero and covariance εI, suppose that

k(x, w+) = exp[q(x)′(w+ − w)− ε

2q(x)′q(x)

].

Then the multiplicative martingale increment z+

z= k(x, w+ − w, ε) trans-

forms the distribution of the increment (w+−w) from a normal distributionwith conditional mean zero to a normal distribution with conditional meanq(x).

Using this recursive specification, we can adapt the analysis in section10.3 to justify solving

V (x, ψ) = minλ

maxy

ε

2

[λ2 − ζ(y − y∗)2

]+ exp(−δε)E

[k(x, w+ − w, ε)V +(x+, ψ+)|x, ψ

]− ε(ψ + ελ) [κy + f(x) + c∗] + θE

[k(x, w+ − w, ε) log k(x, w+ − w, ε)|x, ψ

],

Page 368: Maskin ambiguity book mock 2 - New York University

360 Chapter 10. Three types of ambiguity

where the extremization is again subject to (10.18). We minimize with re-spect to λ, taking into account the contribution of λ to the evolution of ψ.This takes the specification of the martingale as given. To manage ambigu-ity of the first type, we must contemplate the consequences of alternativez’s.

A Ramsey planner’s HJB equation for the first typeof ambiguity

In a continuous-time formulation of the Ramsey problem with concernsabout the first type of ambiguity, we confront the Ramsey planner with thestate vector evolution

dxt = μx(xt)dt+ σx(xt)dwt

dzt = ztht · dwtdψt = λtdt.

We characterize the impact of the state evolution on continuation values byapplying the rules of Ito calculus under the change of measure. We add apenalty term θ

2|h|2 to the continuous-time objective to limit the magnitude

of the drift distortions for the Brownian motion and then by imitating thederivation of HJB equation (10.20) deduce

0 =minλ,h

maxy

1

2λ2 − ζ

2(y − y∗)2 +

θ

2|h|2 − κψy − ψf(x)− ψc∗

− δV + V1 · (μx + σxh) + V2λ

+1

2trace (σx

′V11σx) . (10.26)

Notice how (10.26) minimizes over h.The separable form of the objective implies that the order of minimiza-

tion and maximization can be exchanged. First-order conditions imply

y = y∗ − κ

ζψ

h = −1

θ(σx)

′V1 (10.27)

λ = −V2.

Page 369: Maskin ambiguity book mock 2 - New York University

10.5. The first type of ambiguity 361

As in the Ramsey problem without robustness (see Appendix 10.A), to ver-ify that the private sector Euler condition is satisfied, differentiate the HJBequation (10.26) for V with respect to ψ and apply the envelope condition.

Interpretation of worst-case dynamics

The worst-case ht = −1θ(σx)

′V1(xt, ψt) from (10.27) feeds back on the en-dogenous state variable ψt. As a consequence, the implied worst-case modelmakes this endogenous state influence the dynamics of the exogenous statevector xt. The peculiar feature that {ψt} Granger-causes {xt} can make theworst-case model difficult to interpret. What does it mean for the Ramseyplanner to believe that its decisions influence the motion of exogenous statevariables? To approach this question, Hansen et al. (2006b) develop analternative representation. As shown by Fleming and Souganidis (1989), ina two-player zero-sum HJB equation, if a Bellman-Isaacs condition makesit legitimate to exchange orders of maximization and minimization for therecursive problem, then orders of maximization and minimization can alsobe exchanged for a corresponding zero-sum game that constitutes a datezero, formulation of a robust Ramsey problem in the space of sequences.That allows us to construct an alternative representation of the worst-casemodel without dependence of the dynamics of the exogenous state vectorxt on ψt. We accomplish this by augmenting the exogenous state vectoras described in detail by Hansen et al. (2006b) and Hansen and Sargent(2008b, ch. 7) in what amounts to an application of the “Big K, little k”trick common in macroeconomics. In particular, we construct a worst-caseexogenous state-vector process

d

[xtΨt

]=

[μx(xt)F (ct,Ψt)

]dt+

[σx(xt)

0

] [−1

θσx(xt)

′V1(xt,Ψt)dt+ dwt

](10.28)

for a multivariate standard Brownian increment dwt. We then construct aRamsey problem without robustness but with this expanded state vector.This yields an HJB equation for a value function V (x,Ψ, ψ) that dependson both big Ψ and little ψ. After solving it, we can construct F via

F = −V3.

Then

F (c, ψ) = F (c, ψ, ψ).

Page 370: Maskin ambiguity book mock 2 - New York University

362 Chapter 10. Three types of ambiguity

Provided that we set ψ0 = Ψ0 = 0, it will follow that ψt = Ψt and thatthe resulting {λt} and {yt} processes from our robust Ramsey plan withthe first type of ambiguity will coincide with the Ramsey processes underspecification (10.28) for the cost-push process.

Relation to previous literature

The form of HJB equation (10.26) occurs in the literature on continuoustime robust control. For instance, see James (1992) and Hansen et al.(2006b). It is also a continuous-time version of a discrete-time Ramsey prob-lem studied by researchers including Walsh (2004), Giordani and Soderlind(2004), Leitemo and Soderstrom (2008), Dennis (2008), and Olalla andGomez (2011). We have adapted and extended this literature by suggestingan alternative recursive formulation together with appropriate HJB equa-tions. In the next subsection, we correct misinterpretations in some of theearlier literature.

Not sharing worst-case beliefs

Walsh (2004) and Giordani and Soderlind (2004) argue that private agentsshare the government’s concern about robustness so that when the govern-ment chooses beliefs in a robust fashion, agents act on these same beliefs.We think that interpretation is incorrect and prefer the one we have de-scribed as the first type of ambiguity. In selecting a worst-case model, theprivate sector would look at its own objective functions and constraints, notthe government’s, so robust private agents’ worst-case models would differfrom the government’s. Even if the government and the private agentswere to share the same value of θ, they would compute different worst-casemodels.10 Dennis (2008) argues that “the Stackelberg leader believes thefollowers will use the approximating model for forming expectations andformulates policy accordingly.” Our Ramsey problem for the second type

10 Giordani and Soderlind (2004), in particular, argue that “we follow Hansen andSargent in taking the middle ground, and assume that the private sector and governmentshare the same loss function, reference model and degree of robustness.” But even if thegovernment and private sector share the same loss function, the same reference model,and the same robustness parameter, they still might very well be led to different worst-case models because they face different constraints. We do not intend to criticize Walsh(2004) and Giordani and Soderlind (2004) unfairly. To the contrary, it is a strength thaton this issue their work is more transparent and criticizable than many other papers.

Page 371: Maskin ambiguity book mock 2 - New York University

10.5. The first type of ambiguity 363

of ambiguity has this feature, but not our Ramsey problem for the firsttype, as was mistakenly claimed by Dennis.

As emphasized above, we favor an interpretation of the robust Ramseyplans of Walsh and others as one in which the Ramsey planner believes thatprivate agents know the correct probability model. Because the associatedinference problem is so immense, the Ramsey planner cannot infer privateagents’ model by observing their decisions (see section 10.5). The Ramseyplanner’s worst-case z is not intended to “solve” this impossible inferenceproblem. It is just a device to construct a robust Ramsey policy. It isa cautious inference about private agents’ beliefs that helps the Ramseyplanner design that robust policy. Since private firms know the correctmodel, they would actually make decisions by using a model that generallydiffers from the one associated with the Ramsey planner’s minimizing {zt}.Therefore, the Ramsey planner’s ex post subjective decision rule for thefirm as a function of the aggregate states, obtained by solving its Eulerequation with the minimizing {z}, will not usually produce the observedvalue of pt+ε − pt. This discrepancy will not surprise the Ramsey planner,who knows that discrepancy is insufficient to reveal the process {zt} actuallybelieved by the private sector.

An intractable model inference problem

The martingale {zt} defining the private sector’s model has insufficientstructure to allow the Ramsey planner to infer the private sector’s modelfrom observed outcomes {pt+ε−pt, xt, yt}. The Ramsey planner knows thatthe probability perturbation {zt} gives the private sector a model that hasconstrained discounted entropy relative to the approximating model. Thisleaves the immense set of unknown models so unstructured that it is im-possible to infer the private sector’s model from histories of outcomes foryt, xt, and λt. The Ramsey planner does not attempt to reverse engineer{zt} from observed outcomes because it cannot.

To indicate the magnitude of the inference problem, consider a discretetime specification and suppose that after observing inflation, the Ramseyplanner solves an Euler equation forward to infer a discounted expectedlinear combination of output and a cost-push shock. If the Ramsey plannerwere to compare this to the outcome of an analogous calculation basedon the approximating model, it would reveal a distorted expectation. Butthere are many consistent ways to distort dynamics that rationalize this

Page 372: Maskin ambiguity book mock 2 - New York University

364 Chapter 10. Three types of ambiguity

distorted forecast. One would be to distort only the next period transitiondensity and leave transitions for subsequent time periods undistorted. Manyother possibilities are also consistent with the same observed inflation. Thecomputed worst-case model is one among many perturbed models consistentwith observed data.

10.6 Heterogeneous beliefs without

robustness

In section 10.7, we shall study a robust Ramsey planner who faces oursecond type of ambiguity. The section 10.7 planner distrusts an approxi-mating model but believes that private agents trust it. Because ex post theRamsey planner and the private sector have disparate beliefs, many of thesame technical issues for coping with the second type of ambiguity arise ina class of Ramsey problems with exogenous heterogeneous beliefs. So webegin by studying situations in which both the Ramsey planner and theprivate agents completely trust different models.

To make a Ramsey problem with heterogeneous beliefs manageable, ithelps to use the perturbed probability model associated with {zt} whencomputing the mathematical expectations that appear in the system ofequations whose solution determines an equilibrium. To prepare a recursiveversion of the Ramsey problem, it also helps to transform the ψt variablethat measures the Ramsey planner’s commitments in a way that reducesthe number of state variables. We extend the analysis in section 10.3 tocharacterize the precise link between our proposed state variable and themultiplier on the private sector Euler equation.

With exogenous belief heterogeneity, it is analytically convenient to for-mulate the Lagrangian for a discrete time version of the Ramsey planner’sproblem as

− 1

2E

∞∑j=0

exp(−εδj)zεj[(λεj)

2 + ζ(yεj − y∗)2]|F0

]

+E

[ ∞∑j=0

exp(−εδj)zεjψε(j+1)

[λεj − ε (κyεj + cεj + c∗)− exp(−εδ)λ(j+1)ε

]|F0

].(10.29)

Page 373: Maskin ambiguity book mock 2 - New York University

10.6. Heterogeneous beliefs without robustness 365

Explanation for treatment of ψt+ε

Compare (10.29) with the corresponding Lagrangian (10.24) for the robustRamsey problem for the first type of ambiguity from section 10.5. There weused zt+εψt+ε as the Lagrange multiplier on the private firm’s Euler equationat the date t information set. What motivated that choice was that in thesection 10.5 model with the first type of ambiguity, private agents use thez-perturbed model, so their expectations can be represented as

E

(zt+εzt

λt+ε|Ft

),

where zt is in the date t information set. Evidently

zt+εzt

ztψt+ε = zt+εψt+ε,

which in section 10.5 allowed us to adjust for the probability perturbationby multiplying ψt+ε by zt+ε and then appropriately withholding zt+ε as afactor multiplying λt+ε in the Euler equation that ψt+εzt+ε multiplies. Incontrast to the situation in section 10.5, here the private sector embracesthe original benchmark model, so the private firm’s Euler equation nowinvolves the conditional expectation E (λt+ε|Ft) taken with respect to theapproximating model. The form of this conditional expectation leads us toattach Lagrange multiplier ztψt+ε to the private firm’s Euler equation atthe information set at date t, a choice that implies that the ratio zt+ε

ztdoes

not multiply λt+ε in the Lagrangian (10.29).

Analysis

First-order conditions associated with λt for t ≥ 0 are

ztψt+ε − εztλt − zt−εψt = 0, (10.30)

and first-order conditions for yt for t ≥ 0 are

−εζzt(yt − y∗)− εκψt+εzt = 0.

To facilitate a recursive formulation, define

ξt+ε =ztzt+ε

ψt+ε, (10.31)

Page 374: Maskin ambiguity book mock 2 - New York University

366 Chapter 10. Three types of ambiguity

which by virtue of (10.30) implies

ξt+ε = εztzt+ε

λt +ztzt+ε

ξt.

While the process {ξt} is not locally predictable, the exposure of ξt+ε toshocks comes entirely through zt+ε. The conditional mean of ξt+ε under theperturbed measure associated with {zt} satisfies

E

(zt+εzt

ξt+ε|Ft

)= ελt + ξt.

First-order conditions for yt imply

(yt − y∗) = −(κ

ζ

)zt+εzt

ξt+ε.

Evidently,

E

[(zt+εzt

)ξt+ελt+ε|Ft

]= ψt+εE (λt+ε|Ft) ,

a prediction formula that suggests a convenient way to pose the Ramseyplanner’s optimization under the z model.

Recursive formulation with exogenous heterogeneous

beliefs

We continue to view the cost-push shock c is a function f(x) of a Markovstate vector x and use evolution equation (10.25) for x+ and z+. As aprolegomenon to studying robustness, we extend the analysis of section10.3 to describe a recursive way to accommodate exogenous heterogeneityin beliefs described by the likelihood ratio k(x, w+ − w, ε). We again workbackwards from a continuation-policy function F+(x+, ξ+) for the private-sector co-state variable λ+ and a continuation-value function V +(x+, ξ+).To start our backwards recursions, we assume that

V +2 (x+, ξ+) = −F+(x+, ξ+). (10.32)

Problem 10.6.1. The Ramsey planner’s Bellman equation is

V (x, ξ) =maxy,λ

−ξλ− ε

2

[λ2 + ζ(y − y∗)2

]+ exp(−δε)E

[(z+

z

)[V +(x+, ξ+) + ξ+F+(x+, ξ+)

]|x, ξ

],

Page 375: Maskin ambiguity book mock 2 - New York University

10.6. Heterogeneous beliefs without robustness 367

where the maximization is subject to

λ− exp(−δε)E[F+(x+, ξ+)|x, ξ

]− ε

[κy + f(x) + c+

]= 0 (10.33)( z

z+

)(ελ + ξ)− ξ+ = 0 (10.34)

g(x, w+ − w, ε)− x+ = 0

zk(x, w+ − w, ε)− z+ = 0.

We now construct an alternative Bellman equation for the Ramsey plan-ner. It absorbs the forward-looking private sector Euler equation into theplanner’s objective function. We still carry along a state transition equationfor ξ+.

Introduce multipliers �1 and(z+

z

)�2 on the constraints (10.33) and

(10.34). Maximizing the resulting Lagrangian with respect to λ and y gives

−ελ + �1 + ε�2 − ξ = 0,

−ζ(y − y∗)− κ�1 = 0.

Thus, (z+

z

)ξ+ − �1 = ε�2.

Therefore, from what we have imposed so far, it seems that ψ+ can differfrom �1, so we cannot yet claim that ψ+ is “the multiplier on the multiplier”.Fortunately, there is more structure to exploit.

Lemma 10.6.2. The multiplier �1 on constraint (10.33) equals(z+

z

)ξ+

and the multiplier �2 on constraint (10.34) equals zero. Furthermore,

y = y∗ −(κ

ζ

)(ξ + ελ) ,

where λ = F (x, ξ) solves the private firm’s Euler equation (10.33). Finally,V2(x, ξ) = −F (x, ξ).

See Appendix 10.A for a proof. Lemma 10.6.2 extends Lemma 10.3.3 to anenvironment with heterogeneous beliefs.

Page 376: Maskin ambiguity book mock 2 - New York University

368 Chapter 10. Three types of ambiguity

Finally, we deduce an alternative Bellman equation that accommodatesheterogeneous beliefs. From Lemma 10.6.2, the Ramsey planner’s valuefunction V (x, ξ) satisfies

V (x, ξ) =maxy,λ

−ξλ− ε

2

[λ2 + ζ(y − y∗)2

]+

+ exp(−δε)E[(

z+

z

)[V +(x+, ξ+) + ξ+F+(x+, ξ+)

]|x, ξ

],

where the maximization is subject to constraints (10.33) and (10.34) andwhere λ = F (x, ξ). Express the contribution of the private sector Eulerequation to a Lagrangian as(z+

z

)ξ+[λ− exp(−δε)E

[F+(x+, ξ+)|x, ξ

]− ε

(κy + c+ c+

)]= − exp(−δε)E

[(z+

z

)[ξ+F+(x+, ξ+)

]|x, ξ

]+

(z+

z

)ξ+ [λ− ε (κy + c+ c∗)] ,

where we have used the fact that(z+

z

)ξ+ is locally predictable. Adding

this term to the Ramsey planner’s objective results in the Lagrangian

− ξλ− ε

2

[λ2 + ζ(y − y∗)2

]+ exp(−δε)E

[(z+

z

)[V +(x+, ξ+)

]|x, ξ

]+

(z+

z

)ξ+[λ− ε

(κy + c+ c+

)].

Next we substitute from (z+

z

)ξ+ = ξ + ελ

to arrive at

Problem 10.6.3. An alternative Bellman equation for a discrete-time Ram-sey planner with belief heterogeneity is

V (x, ψ) = minλ

maxy

ε

2

[λ2 − ζ(y − y∗)2

]+ exp(−δε)E

[k(x, w+ − w, ε)

[V +(x+, ξ+)

]|x, ξ

]− ε(ξ + ελ) [κy + f(x) + c∗] ,

(10.35)

Page 377: Maskin ambiguity book mock 2 - New York University

10.6. Heterogeneous beliefs without robustness 369

where the extremization is subject to( z

z+

)(ελ + ξ)− ξ+ = 0

g(x, w+ − w, ε)− x+ = 0,

where we have used z+ = zk(x, w+ − w, ε) to eliminate the ratio z+

z.

Claim 10.6.4. Discrete-time problems 10.6.1 and 10.6.3 share a commonvalue function V and common solutions for y, λ as functions of the statevector (x, ξ).

In problem 10.6.3, we minimize with respect to λ, taking into account itscontribution to the evolution of the transformed multiplier ξ+.

In the next subsection, we study the continuous-time counterpart toProblem 10.6.3. Taking a continuous-time limit adds structure and tractabil-ity to the probability distortions in ways that we can exploit in formulatinga robust Ramsey problem.

Heterogeneous beliefs in continuous time

Our first step in producing a continuous-time formulation is to characterizethe state evolution. For a Brownian motion information structure, a positivemartingale {zt} evolves as

dzt = ztht · dwtfor some process {ht}. In this section where we assume exogenous beliefheterogeneity, we suppose that h is a given function of the state, but insection 10.7 we will study how a robust planner chooses ht. When used toalter probabilities, the martingale zt changes the distribution of the Brow-nian motion w by appending a drift htdt to a Brownian increment. Recallfrom (10.31) that ξt+ε =

ztzt+ε

ψt+ε. The “exposure” of dzt to the Brownianincrement dwt determines the exposure of dξt to the Brownian incrementand induces a drift correction implied by Ito’s Lemma. By differentiatingthe function 1

zof the real variable z with respect to z and adjusting for the

scaling by zt = z, it follows that the exposure is −ξthtdwt. By computingthe second derivative of 1

zand applying Ito’s Lemma, we obtain the drift

correction ξt|ht|2. Thus,

dξt = λtdt+ ξt|ht|2dt− ξtht′dwt.

Page 378: Maskin ambiguity book mock 2 - New York University

370 Chapter 10. Three types of ambiguity

Also suppose that

dxt = μx(xt)dt+ σx(xt)dwt.

While we can avoid using zt as an additional state variable, the {ξt} processhas a local exposure to the Brownian motion described by −ht ·dwt. It alsohas a drift that depends on ht under the approximating model.

Write the law of motion in terms of dwt as

d

[xtξt

]=

[μx(xt)

λt + ξt|ht|2]dt+

[σx(xt)−ξtht′

]dwt,

where {wt} is standard Brownian motion under the approximating model.Under the distorted model,

d

[xtξt

]=

[μ(xt) + σx(xt)ht

λt

]dt+

[σx(xt)−ξht′

]dwt,

where {wt} is a Brownian motion.

In continuous time, we characterize the impact of the state evolutionusing Ito calculus to differentiate the value function. We subtract V fromboth sides of (10.35) and divide by ε to obtain

0 =minλ

maxy

1

2λ2 − ζ

2(y − y∗)2 − κξy − ξc− ξc∗

− δV + V1 · μx + V2λ

+ (V1)′σxh− ξV21σxh +

1

2ξ2V22|h|2

+1

2trace (σx

′V11σx) , (10.36)

where we use the distorted evolution equation. From the first-order condi-tions

y = y∗ − κ

ζξ

λ = −V2.

As hoped, the private sector Euler equation under the approximatingmodel imposed by the Lagrangian is satisfied as we verify in Appendix 10.A.

Page 379: Maskin ambiguity book mock 2 - New York University

10.7. The second type of ambiguity 371

Remark 10.6.5. To accommodate belief heterogeneity, we have transformedthe predetermined commitment multiplier. Via the martingale used to cap-ture belief heterogeneity, the transformed version of this state variable ac-quires a nondegenerate exposure to the Brownian increment. This structureis reminiscent of the impact of belief heterogeneity in continuous-time recur-sive utility specifications. Dumas et al. (2000) show that conveniently cho-sen Pareto weights are locally predictable when beliefs are homogeneous, butwith heterogeneous beliefs Borovicka (2011) shows that the Pareto weightsinherit an exposure to a Brownian increment from the martingale that altersbeliefs of some economic agents.

10.7 The second type of ambiguity

By exploiting the structure of the exogenous heterogeneous beliefs Ramseyproblem of section 10.6, we now analyze a concern about robustness fora Ramsey planner who faces our second type of ambiguity. In continuous

time, we add a penalty term θ |h|22

to the planner’s objective and minimizewith respect to h:

0 =minλ,h

maxy

1

2λ2 − ζ

2(y − y∗)2 +

θ

2|h|2 − κξy − ξc− ξc∗

− δV + V1 · μx + V2λ

+ (V1)′σxh− ξV12σxh− 1

2ξ2V22|h|2

+1

2trace (σx

′V11σx) .

Recursive formulas for y and λ remain

y = y∗ − κ

ζξ

λ = −V2,but now we add minimization over h to the section 10.6 statement of theRamsey problem. First-order conditions for h are

θh+ (σx)′V1 − ξ(σx)

′V12 + ξ2V22h = 0,

so the minimizing h is

h = −(

1

θ + ξ2V22

)[(V1)

′σx − ξV12σx]′. (10.37)

Page 380: Maskin ambiguity book mock 2 - New York University

372 Chapter 10. Three types of ambiguity

As was the case for the Ramsey plan under the first type of ambiguity,separability of the recursive problem implies that a Bellman-Isaacs condi-tion is satisfied. Again in the spirit of Hansen and Sargent (2008b, ch. 7),we can use a date zero sequence formulation of the worst-case model toavoid having the exogenous state vector feed back onto the endogenousstate variable ξt. For a Ramsey plan under the second type of ambiguity,we use this construction to describe the beliefs of a Ramsey planner whilethe private sector continues to embrace the approximating model. Thismakes heterogeneous beliefs endogenous.

10.8 The third type of ambiguity

We now turn to our third type of ambiguity. Here, following Woodford(2010), a Ramsey planner trusts an approximating model but does notknow the beliefs of private agents. We use {zt} to represent the privatesector’s unknown beliefs.

Discrete time

Here the Lagrangian associated with designing a robust Ramsey plan is

− 1

2E

∞∑j=0

exp(−εδj)[(λεj)

2 + ζ(yεj − y∗)2]|F0

]

[ ∞∑j=0

ε exp[−εδ(j + 1)]

(zε(j+1)

zεj

)[log zε(j+1) − log zεj

]|F0

]

+E

[ ∞∑j=0

exp(−εδj)ψε(j+1)

[λεj − ε (κyεj + cεj + c∗)− exp(−εδ)

(zε(j+1)

zεj

)λ(j+1)ε

]|F0

].

First-order conditions for λt are

ψt+ε − ελt −(ztzt−ε

)ψt = 0.

Let

ξt+ε =

(zt+εzt

)ψt+ε

Page 381: Maskin ambiguity book mock 2 - New York University

10.8. The third type of ambiguity 373

so that

ξt+ε = ε

(zt+εzt

)λt +

(zt+εzt

)ξt. (10.38)

We can imitate the argument underlying Claim 10.6.4 to construct aBellman equation

V (x, ξ) = minλ

maxy

ε

2

[λ2 − ζ(y − y∗)2

]+ exp(−δε)E

[V +(x+, ξ+)|x, ξ

]− ε(ξ + ελ) (κy + c+ c∗) ,

(10.39)

where the extremization is subject to

x+ = g(x, w+ − w, ε)

ξ+ = k(x, w+ − w, ε)ξ + εk(x, w+ − w, ε)λ,

where we have used z+ = zk(x, w+−w, ε) to rewrite the evolution equationfor ξ+.

Woodford’s way of restraining perturbations of beliefs

His assumption that the Ramsey planner embraces the approximating modelprompted Woodford (2010) to measure belief distortions in his own specialway. Thus, while we have measured model discrepancy by discounted rela-tive entropy (10.21), Woodford (2010) instead uses

∞∑j=0

ε exp[−εδ(j + 1)]E

([zε(j+1)

zεj

] [log zε(j+1) − log zεj

]|F0

). (10.40)

Whereas at date zero we weight (log zt+ε − log zt) by zt+ε, Woodford weightsit by zt+ε

zt.

Remark 10.8.1. In discrete time, Woodford’s measure (10.40) is not rela-tive entropy, but a continuous-time counterpart 1

2E[∫∞

0exp(−δt)(ht)2dt|F0

]is relative entropy with a reversal of models. To see this, consider the mar-tingale evolution

dzt = ztht · dwt (10.41)

Page 382: Maskin ambiguity book mock 2 - New York University

374 Chapter 10. Three types of ambiguity

for some process {ht}. By applying Ito’s Lemma,

limε↓0

E

[zt+εzt

(log zt+ε − log zt) |Ft

]=

1

2|ht|2.

Thus, the continuous-time counterpart to Woodford’s discrepancy measureis

1

2E

[∫ ∞

0

exp(−δt)(ht)2dt|F0

]= −δE

[∫ ∞

0

exp(−δt) log ztdt|F0

],

where the right side is a measure of relative entropy that switches roles ofthe {zt}-perturbed model and the approximating model.

Third type of ambiguity in continuous time

We use equation (10.41) for dzt to depict the small ε limit of (10.38) as

dξt = λtdt+ ξtht · dwt.

For a Ramsey planner confronting our third type of ambiguity, we computea robust Ramsey plan under the approximating model. Stack the evolutionequation for ξt together with the evolution equation for xt:

d

[xtξt

]=

[μ(xt)λt

]dt+

[σx(xt)ξtht

]dwt.

The continuous-time counterpart to the Hamilton-Jacobi-Bellman equation(10.39) adjusted for a robust choice of h is

0 = minλ,h

maxy

1

2

[λ2 − ζ(y − y∗)2

]− κξy − ξc− ξc∗

+ V1μx + V2λ− δV (x)

2|h|2 + 1

2trace [σx

′V11σx] + ξh′σx′V12 +1

2(ξ)2|h|2V22.

First-order conditions for extremization are

y = y∗ − κ

ζξ

λ = −V2

h = − 1

θ + ξ2V22ξσx

′V12. (10.42)

Page 383: Maskin ambiguity book mock 2 - New York University

10.9. Comparisons 375

We can verify the private sector Euler equation as we did earlier, exceptthat now we have to make sure that the private sector expectations arecomputed with respect to a distorted model that assumes that dwt hasdrift htdt, where ht is described by equation (10.42).

As with the robust Ramsey planner under the first and second typesof ambiguity, we can verify the corresponding Bellman-Isaacs condition.Under the third type of ambiguity, the worst-case model is attributed tothe private sector while the Ramsey planner embraces the approximatingmodel.

10.9 Comparisons

In this section, we use new types of local approximations to compare models.We modify earlier local approximations in light of the special structures ofour three types of robust Ramsey problems, especially the second and thirdtypes, which appear to be unprecedented in the robust control literature.It is convenient to refer to robust Ramsey plans under our three types ofambiguity as Types I, II, and III, respectively.

James (1992) constructs expansions that simultaneously explore two di-mensions unleashed by increased conditional volatility, namely, increasednoise and increased concern about robustness.11 In particular, within thecontext of our model, he would set σx =

√τςx, θ = 1

ϑτ, and then compute

first derivatives with respect to τ and ϑ. James’s 1992 approach is en-lightening for Type I, but not for Type II or Type III. To provide insightsabout Type II and Type III, we compute two first-order expansions, onethat holds θ < ∞ fixed when we differentiate with respect to τ ; and an-other that holds fixed τ when we differentiate with respect γ = 1

θ. For both

computations, our New Keynesian economic example is simple enough to al-low us to attain quasi-analytical solutions for the parameter configurationsaround which we approximate. We use these first-order approximations tofacilitate comparisons.12

11See Anderson et al. (2012) and Borovicka and Hansen (2013) for related approaches.12James (1992) provides formal justification for his bi-variate expansion. Our presen-

tation is informal in some respects. Modifications of our calculations will be requiredbefore they can be applied to a broader class of models.

Page 384: Maskin ambiguity book mock 2 - New York University

376 Chapter 10. Three types of ambiguity

Suppose that

dxt = A11xtdt+ σxdwt

ct = H · xt,

where σx is a vector of constants.Recall the adjustments (10.27), (10.37), and (10.42) in the drift of the

Brownian motion that emerge from our three types of robustness:

Type I: h∗ = −1

θ[σx

′V1(x, ξ)]

Type II: h∗ = − 1

θ + ξ2V22(x, ξ)[σx

′V1(x, ξ)− ξσx′V12(x, ξ)]

Type III: h∗ = − 1

θ + ξ2V22(x, ξ)[ξσx

′V12(x, ξ)] ,

where the value functions V (x, ξ) and the scaling of the commitment mul-tiplier ξt differs across our three types of ambiguity. In particular, for TypeI we used the commitment multiplier ψt and did not rescale it as we did forthe Type II and III models.. To facilitate comparisons, for the Type I Ram-sey problem we take ξt = ψt. For Type I, the drift adjustment includes onlya contribution from the first derivative of the value function as is typical forproblems studied in the robust control literature. For our Type II and IIIproblems, the second derivative also makes contributions. The associatedadjustments to the planner’s value function in our three types of Ramseyproblems are:

Type I: − 1

2θ|σx′V1(x, ξ)|2 +

1

2trace [σx

′V11(x, ξ)σx]

Type II: − 1

2[θ + ξ2V22(x, ξ)]|σx′V1(x, ξ)− ξσx

′V12(x, ξ)|2 +1

2trace [σx

′V11(x, ξ)σx]

Type III: − 1

2[θ + ξ2V22(x, ξ)]|ξσx′V12(x, ξ)|2 +

1

2trace [σx

′V11(x, ξ)σx] ,(10.43)

where we have included terms involving σx. For each Ramsey plan, letΦ(V, σx, θ) denote the adjustment described in (10.43).

These adjustment formulas are suggestive but also potentially mislead-ing as a basis for comparison because the Ramsey planner’s value functionsthemselves differ across our three types of ambiguity. In the following sec-tion, we propose more even-footed comparisons by taking small noise andsmall robustness approximations around otherwise linear-quadratic economies.

Page 385: Maskin ambiguity book mock 2 - New York University

10.9. Comparisons 377

Common baseline value function

The baseline value function is the same as that given in Appendix 10.Bexcept the constant term differs because we now set σx = 0 when computingW . The minimization problem

0 =minλ

1

2λ2 +

κ2

2ζ(ξ)2 − κξy∗ − ξc− ξc∗

− δW (x, ξ) + [W1(x, ξ)] ·A11x+W2(x, ξ)λ

yields a quadratic value function W (x, ξ) that we propose to use as a base-line with respect to which we compute adjustments for our three types ofrobust Ramsey problems. The Riccati equation is the same one given inAppendix 10.B for the stochastic problem without a concern for robustnessexcept that initially we ignore a constant term contributed by the shockexposure σx, allowing us to solve a deterministic problem.

A small-noise approximation

To facilitate comparisons, we study effects of variations in τ for small τunder the “small noise” parameterization σx =

√τςx, where ςx is a vector

with the same number of columns as x.We deduce the first-order value function expansion

V (x, ξ) ≈W (x, ξ) + τN(x, ξ).

We approximate the optimal λ by

λ ≈ −W2(x, ξ)− τN2(x, ξ),

where N2 differs across our three types of robust Ramsey problems.We study a parameterized HJB equation of the form

0 =− 1

2V2(x, ξ)

2 +κ2

2ζ(ξ)2 − κξy∗ − ξc− ξc∗

− δV (x, ξ) + [V1(x, ξ)] · A11x+ Φ(V, τςx, θ) (x, ξ). (10.44)

We can ignore the impact of minimization with respect to λ and h becauseof the usual “Envelope Theorem” that exploits first-order conditions toeliminate terms involving derivatives of λ and h.

Page 386: Maskin ambiguity book mock 2 - New York University

378 Chapter 10. Three types of ambiguity

We start by computing derivatives with respect to τ of the terms in-cluded in (10.43). Thus, we differentiate Φ(V, τςx, θ) with respect to τ forall three plans. These derivatives are

Type I: S(x, ξ) = − 1

2θ|ςx′W1(x, ξ)|2 +

1

2trace [ςx

′W11ςx]

Type II: S(x, ξ) = − 1

2[θ + ξ2W22]|ςx′W1(x, ξ)− ξς ′W12|2 +

1

2trace [ςx

′W11ς]

Type III: S(x, ξ) = − 1

2[θ + ξ2W22]|ξςx′W12|2 +

1

2trace [ςx

′W11ςx] .

The function S is then used to compute N . To obtain the equation ofinterest, differentiate the (parameterized by τ) HJB equation (10.44) withrespect to τ to obtain:

0 = −W2(x, ξ) ·N2(x, ξ)− δN(x, ξ) +N1(x, ξ)′A11x+ S(x, ξ), (10.45)

where we have used the first-order conditions for λ to inform us that

λ∂λ

∂τ+ V2

∂λ

∂τ= 0.

Then N solves the Lyapunov equation (10.45). Notice that S is a quadraticfunction of the states for Type I, but not for Types II and III. For TypeII and III, this equation must be solved numerically, but it has a quasi-analytic, quadratic solution for Type I.

A small robustness approximation

So far we have kept θ fixed. Instead, we now let θ = 1γand let γ become

small and hence θ large. The relevant parameterized HJB equation becomes

0 =− 1

2V2(x, ξ)

2 +κ2

2ζ(ξ)2 − κξy∗ − ξc− ξc∗

− δV (x, ξ) + [V1(x, ξ)] · A11x+ Φ

(V, σx,

1

γ

)(x, ξ), (10.46)

Page 387: Maskin ambiguity book mock 2 - New York University

10.9. Comparisons 379

where Φ(V, σx, θ) is given by (10.43). Write the three respective adjustmentterms Φ(V, τςx,

1γ) defined in (10.43) as

Type I: − γ

2|σx′V1(x, ξ)|2 +

1

2trace [σx

′V11(x, ξ)σx]

Type II: − γ

2[1 + γξ2V22(x, ξ)]|σx′V1(x, ξ)− ξσx

′V12(x, ξ)|2 +1

2trace [σx

′V11(x, ξ)σx]

Type III: − γ

2[1 + γξ2V22(x, ξ)]|ξσx′V12(x, ξ)|2 +

1

2trace [σx

′V11(x, ξ)σx] .(10.47)

Since σx is no longer made small in this calculation, we compute the limitingvalue function as γ becomes small to be

W (x, ξ) +1

2δtrace [σx

′W11σx] ,

where the additional term is constant and identical for all three robust Ram-sey plans. This outcome reflects a standard certainty equivalent propertyfor linear-quadratic optimization problems.

We now construct a first-order robustness adjustment

V ≈W +1

2δtrace [σx

′W11σx] + γG

λ ≈ −W2 − γG2.

As an intermediate step on the way to constructing G, first differentiate(10.47) with respect to γ:

Type I: H(x, ξ) = −1

2|σx′W1(x, ξ)|2

Type II: H(x, ξ) = −1

2|σx′W1(x, ξ)− ξσx

′W12|2

Type III: H(x, ξ) = −1

2|ξσx′W12|2.

To obtain the equation of interest, differentiate the parameterized HJBequation (10.46) with respect to γ to obtain

\[
0 = -W_2(x,\xi)\cdot G_2(x,\xi) - \delta G(x,\xi) + G_1(x,\xi)'A_{11}x + H(x,\xi). \qquad (10.48)
\]

Given H, we compute the function G by solving the Lyapunov equation (10.48). See Appendix 10.D for more detail.


Relation to previous work

To relate our expansions to an approach taken in Hansen and Sargent (2008b, ch. 16), we revisit Type II. Using the same section 10.9 parameterization that we used to explore small concerns about robustness, we express the HJB equation as

\begin{align*}
0 = \min_{\lambda,h}\max_{y}\ & \frac{1}{2}\lambda^2 - \frac{\zeta}{2}(y-y^*)^2 + \frac{1}{2\gamma}|h|^2 - \kappa\xi y - \xi c - \xi c^* \\
& - \delta V + V_1\cdot\mu_x + V_2\lambda + (V_1)'\sigma_x h - \xi V_{21}\sigma_x h - \frac{1}{2}\xi^2 V_{22}|h|^2 \\
& + \frac{1}{2}\operatorname{trace}\left(\sigma_x{}'V_{11}\sigma_x\right). \qquad (10.49)
\end{align*}

In Hansen and Sargent (2008b, ch. 16), we arbitrarily modified this HJB equation to become

\begin{align*}
0 = \min_{\lambda,h}\max_{y}\ & \frac{1}{2}\lambda^2 - \frac{\zeta}{2}(y-y^*)^2 + \frac{1}{2\gamma}|h|^2 - \kappa\xi y - \xi c - \xi c^* \\
& - \delta U + U_1\cdot\mu_x + U_2\lambda + (U_1)'\sigma_x h - \xi U_{21}\sigma_x h \\
& + \frac{1}{2}\operatorname{trace}\left(\sigma_x{}'U_{11}\sigma_x\right), \qquad (10.50)
\end{align*}

which omits the term $-\frac{1}{2}\xi^2 V_{22}|h|^2$ that is present in (10.49). A quadratic value function solves the modified HJB equation (10.50) provided that γ is not too large. Furthermore, it shares the same first-order robustness expansions that we derived for Type II. The worst-case h distortion associated with the modified HJB equation (10.50) is

\[
h = -\gamma\sigma_x{}'\left[U_1(x,\xi) - \xi U_{12}\right].
\]
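For completeness (a worked step we add), this follows from the first-order condition for h in (10.50). Collecting the terms in h, namely $\frac{1}{2\gamma}|h|^2 + (U_1)'\sigma_x h - \xi U_{21}\sigma_x h$, and differentiating gives
\[
\frac{1}{\gamma}h + \sigma_x{}'U_1 - \xi\sigma_x{}'U_{12} = 0
\quad\Longrightarrow\quad
h = -\gamma\sigma_x{}'\left[U_1(x,\xi) - \xi U_{12}\right].
\]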

Hansen and Sargent (2008b) solve a version of the modified HJB equation (10.50) iteratively. They guess σx′U12, solve the resulting Riccati equation, compute a new guess for σx′U12, and then iterate to a fixed point. Thus, the Hansen and Sargent (2008b, ch. 16) approach yields a correct first-order robustness expansion for a value function that itself is actually


incorrect because of the missing term that appears in the HJB equation (10.49) but not in (10.50).¹³
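The iteration just described can be sketched in code. The sketch below is only an illustration of the fixed-point loop, not the authors' implementation: the helpers solve_riccati_given_cross_term and extract_cross_term are hypothetical placeholders for the problem-specific steps (solving the Riccati equation implied by a guess for σx′U12, and reading the implied cross term back off the solution).

```python
import numpy as np

def iterate_on_cross_term(initial_guess, solve_riccati_given_cross_term,
                          extract_cross_term, tol=1e-10, max_iter=500):
    """Fixed-point iteration on the guessed cross term sigma_x' U_12 (illustrative only)."""
    guess = np.asarray(initial_guess, dtype=float)
    for _ in range(max_iter):
        U = solve_riccati_given_cross_term(guess)    # solve the Riccati equation given the guess
        new_guess = extract_cross_term(U)            # implied sigma_x' U_12 from that solution
        if np.max(np.abs(new_guess - guess)) < tol:  # stop when the guess reproduces itself
            return U
        guess = new_guess
    raise RuntimeError("fixed-point iteration did not converge")
```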

Consider the first-order robustness expansion for Type II. Since W is quadratic, W1(x, ξ) − ξW12 depends only on x and not on ξ. Also, H and G both depend only on x and not on ξ, so G2 is zero and there is no first-order adjustment for λ. The approximating continuation value function is altered, but only in those terms that involve x alone. Given the private sector's trust in the approximating model, even though the Ramsey planner thinks that the approximating model might misspecify the evolution of {xt}, there is no impact on the outcome for λ. That same statement applies to U(x, ξ) − ξU12, which illustrates an observation made by Dennis (2008) in the context of the approach suggested in Hansen and Sargent (2008b, ch. 16). When we use the original HJB equation (10.49) to compute the value function, this insensitivity of λ to γ may not hold.

10.10 Numerical example

Using parameter values given in Appendix 10.C and a robustness parameter θ = .014, we illustrate the impact of a concern for robustness. Most of these parameter values are borrowed from Woodford (2010). Woodford takes the cost-push shock to be independent and identically distributed. In our continuous-time specification, we assume an AR process with the same unconditional standard deviation .02 assumed by Woodford. Since θ acts as a penalty parameter, we find it revealing to think about the consequences of θ for the worst-case model when setting θ. Under the worst-case model, the average drift distortion for the standardized Brownian increment is about .34. We defer to later work a serious quantitative investigation, including the calibration of θ.¹⁴ What follows is for illustrative purposes only. Appendix 10.C contains more numerical details.

¹³ Hansen and Sargent (2008b) take the shock exposure of dξt to be zero, as is the case for dψt. The correct shock exposure for dξt scales with γ and is zero only in the limiting case. Hansen and Sargent (2008b) interpret σx′U12 as the shock exposure for λt, which is only an approximation.

¹⁴ See Anderson et al. (2003) for a discussion of an approach to calibration based on measures of statistical discrimination.


Type I

For Type I ambiguity, we have quasi-analytical solutions. Under the approximating model, the cost-push shock evolves as

\[
dc_t = -.15\,c_t\,dt + .011\,dw_t, \qquad (10.51)
\]

while under the worst-case model it evolves as

\[
d\begin{bmatrix} c_t \\ \Psi_t \end{bmatrix}
= \begin{bmatrix} -.0983 & .0107 \\ 1.2485 & -.6926 \end{bmatrix}
\begin{bmatrix} c_t \\ \Psi_t \end{bmatrix} dt
+ \begin{bmatrix} .0017 \\ .0173 \end{bmatrix} dt
+ \begin{bmatrix} .011 \\ 0 \end{bmatrix} dw_t, \qquad (10.52)
\]

a system in which {Ψt} Granger causes {ct}. In what follows we construct ordinary (non-robust) Ramsey plans for both cost-push shock specifications (10.51) and (10.52). If we set Ψ0 = 0 in (10.52), the time series trajectories under the ordinary Ramsey plan constructed assuming that the planner completely trusts the above worst-case cost-push shock model will coincide with time series trajectories chosen by the robust Ramsey planner who distrusts the approximating model (10.51).

To depict dynamic implications, we report impulse response functions for the output gap and inflation using the two specifications (10.51) and (10.52) for the cost-push process. Figure 10.2 reports impulse responses under the approximating model (10.51) and these same responses under the worst-case model (10.52). Outcomes for the different cost-push shock models are depicted in the two columns of this figure. We also compute optimal plans for both cost-push shock specifications and consider the impact of misspecification. Thus, we plot two impulse response functions depending on which cost-push shock model, (10.51) or (10.52), is imputed to the planner who computes an ordinary non-robust Ramsey plan. The impulse response functions plotted in each of the panels line up almost on top of each other even though the actual cost processes are quite different. The implication is that the important differences in outcomes do not come from misspecification in the mind of the Ramsey planner but from what we can regard as different models of the cost-push process imputed to an ordinary non-robust Ramsey planner.
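For readers who want to reproduce pictures like the cost-process panels of Figure 10.2, here is a minimal sketch (our illustration, not the authors' code) of how impulse responses can be computed for the continuous-time specifications (10.51) and (10.52): the response of the state at horizon t to a unit standardized Brownian impulse is exp(At)σ, evaluated below on a quarterly grid. Mapping the cost-push responses into output-gap and inflation responses would additionally require the decision rules, which we omit here.

```python
import numpy as np
from scipy.linalg import expm

# Approximating model (10.51): scalar cost-push process.
A_approx = np.array([[-0.15]])
sigma_approx = np.array([0.011])

# Worst-case model (10.52): (c_t, Psi_t) system; only c_t loads on the shock.
A_worst = np.array([[-0.0983, 0.0107],
                    [1.2485, -0.6926]])
sigma_worst = np.array([0.011, 0.0])

def impulse_response(A, sigma, horizons):
    """State response to a unit Brownian impulse at horizon t: exp(A t) @ sigma."""
    return np.array([expm(A * t) @ sigma for t in horizons])

quarters = np.arange(0, 41)
irf_c_approx = impulse_response(A_approx, sigma_approx, quarters)[:, 0]
irf_c_worst = impulse_response(A_worst, sigma_worst, quarters)[:, 0]
```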

The worst-case drift distortion includes a constant term that has no impact on the impulse response functions. To shed light on the implications of the constant term, we computed trajectories for the output gap and inflation under the approximating model, setting the initial value of


Figure 10.2: The left panels assume the approximating model for the cost process, and the right panels assume the worst-case models for the cost process. The top panels give the impulse response functions for the cost process, the middle panels for the logarithm of the output gap, the bottom panels for inflation. The dashed line uses the approximating model solution and the solid line uses the worst-case model solution. The time units on the horizontal axis are quarters.


the cost-push variable to zero. Again we compare outcomes under a robust Ramsey plan with those under a Ramsey planner who faces type I ambiguity. The left panel of Figure 10.3 reports differences in logarithms scaled by one hundred. By construction, the optimal Ramsey plan computed under the approximating model gives a higher value of the objective function when the computations are done under the approximating model. The optimal plan begins at y* and diminishes to zero. Under the robust Ramsey plan (equivalently, the plan that is optimal under the worst-case cost model), output starts higher than the target y* and then diminishes to zero. Inflation is also higher under the robust Ramsey plan. The right panel of Figure 10.3 reports these differences under the worst-case model for the cost process. We initialize the calculation at
\[
\begin{bmatrix} c_0 \\ \Psi_0 \\ \psi_0 \end{bmatrix} = \begin{bmatrix} .0249 \\ 0 \\ 0 \end{bmatrix},
\]
where .0249 is the mean of the cost-push shock under the worst-case model. Again the output gap and inflation are higher under this robust Ramsey plan. If the worst-case model for the cost-push shock were to be completely trusted by a Ramsey planner, he would choose the same plan as the robust Ramsey planner. As a consequence, the output gap starts at y* and diminishes to zero. The optimal plan under the approximating model starts lower and diminishes to zero. The percentage differences depicted in the right panel of Figure 10.3 are substantially larger than those depicted in the left panel.

To summarize our results for type I ambiguity, while the impulse response functions depend very little on whether or not the robustness adjustment is made, shifts in constant terms do have a nontrivial impact on the equilibrium dynamics that is reflected in transient responses from different initial conditions.

Comparing Types II and III to Type I

To compare Type I with Types II and III, we compute derivatives for the worst-case drift distortion and for the decision rule for λ. The worst-case drift coefficients are shown in Table 10.1. Notice the structure in these coefficients. Recall that the Type II problem has the private sector embracing the approximating model, and that this substantially limits the impact of


Figure 10.3: The left panels assume the approximating model for the cost process initialized at its unconditional mean, 0. The right panels assume the worst-case models for the cost process initialized at its unconditional mean, .0249. The top panels give trajectory differences without shocks for the logarithm of the output gap (times one hundred), and the bottom panels give trajectory differences (times one hundred) for inflation without shocks. The time units on the horizontal axis are quarters.


Ambiguity Type      c        ξ        1
I                  .4752    .1271    .0111
II                 .4752    0        .0111
III                0       −.1271    0

Table 10.1: Coefficients for the derivatives of the drift distortion with respect to γ, times 10.

robustness. The coefficient on the (transformed) commitment multiplier is zero, but the other two coefficients remain the same as in Type I. In contrast, for Type III only the coefficient on ξ is different from zero. The coefficient is the negative of that for Type I because the Ramsey planner now embraces the approximating model, in contrast to Type I. Since the constant term is zero for Type III, the impact of robustness for a given value of θ, say θ = .014 as in our previous calculations, will be small. A calibration of θ using statistical criteria in the style of Anderson et al. (2003) would push us to much lower values of θ.

Robustness also alters the decision rule for λ, as reflected in the derivatives with respect to γ shown in Table 10.2. The Type II adjustments are evidently zero because the private sector embraces the approximating model. Type III derivatives are relatively small for the coefficients on ct and ξt.

Ambiguity Type      c        ξ        1
I                  0.0854   0.0114   0.0022
II                 0.0000   0.0000   0.0000
III                0.0154   0.0114   0.0002

Table 10.2: Coefficients for the derivatives for inflation with respect to γ, times 100.

While we find these derivatives to be educational, the numerical calculations for Type I reported in section 10.10 are apparently outside the range over which a linear approximation in γ is accurate. This suggests that better numerical approximations to the HJB equations for Type II and Type III ambiguity will be enlightening. We defer such computations to future research.


10.11 Concluding remarks

This paper has made precise statements about the seemingly vague topic of model ambiguity within a setting with a timing protocol that allows a Ramsey planner who is concerned about model misspecification to commit to history-contingent plans to which a private sector adjusts. There are different things that decision makers can be ambiguous about, which means that there are different ways to formulate what it means for either the planner or the private agents to experience ambiguity. We have focused on three types of ambiguity. We chose these three partly because we think they are intrinsically interesting and have potential in macroeconomic applications, and partly because they are susceptible to closely related mathematical formulations. We have used a very simple New Keynesian model as a laboratory to sharpen ideas that we aspire to apply to more realistic models.

We are particularly interested in type II ambiguity because here there is endogenous belief heterogeneity. Since our example precluded endogenous state variables other than a commitment multiplier, robustness influenced the Ramsey planner's value function but not Ramsey policy rules. In future research, we hope to study settings with other endogenous state variables and with pecuniary externalities that concern a Ramsey planner and whose magnitudes depend partly on private-sector beliefs.

In this paper, we started with a model that might best be interpreted as the outcome of a log-linear approximation, but then ignored the associated approximation errors when we explored robustness. Interestingly, even this seemingly log-linear specification ceased to be log-linear in the presence of the Type II and Type III forms of ambiguity. In the future, we intend to analyze more fully the interaction between robustness and approximation. The small noise and small robustness expansions and related work in Adam and Woodford (2011) are steps in this direction, but we are skeptical about the sizes of the ranges of parameters to which these local approximations apply and intend to explore global numerical approaches. Our exercises in the laboratory provided by the New Keynesian model of this paper should pave the way for attacks on more quantitatively ambitious Ramsey problems.


Appendix 10.A Some basic proofs

Lemma 10.3.3 is a special case of Lemma 10.6.2 with z⁺ = z > 0, ψ⁺ = ξ⁺, and ψ = ξ. We restate and prove Lemma 10.6.2.

Lemma 10.A.1. The multiplier ℓ1 on constraint (10.33) equals (z⁺/z)ξ⁺ and the multiplier ℓ2 on constraint (10.34) equals zero. Furthermore,
\[
y = y^* - \left(\frac{\kappa}{\zeta}\right)(\xi + \epsilon\lambda),
\]
where λ = F(x, ξ) solves the private firm's Euler equation (10.33). Finally, V2(x, ξ) = −F(x, ξ).

Proof. From relation (10.32),
\[
\frac{\partial}{\partial\xi^+}\left[V^+(x^+,\xi^+) + \xi^+F^+(x^+,\xi^+)\right] = \xi^+F_2^+(x^+,\xi^+).
\]

Differentiate the Lagrangian with respect to ξ⁺ to obtain
\[
-\left(\frac{z^+}{z}\right)\ell_2 - \ell_1\exp(-\delta\epsilon)F_2^+(x^+,\xi^+) + \exp(-\delta\epsilon)\left(\frac{z^+}{z}\right)\xi^+F_2^+(x^+,\xi^+) = 0.
\]

Taking conditional expectations gives
\[
-\ell_2 + \left[\left(\frac{z^+}{z}\right)\xi^+ - \ell_1\right]\exp(-\delta\epsilon)\,E\left[F_2^+(x^+,\xi^+)\,\big|\,x,\xi\right] = 0
\]

so that
\[
\ell_2\left(1 - \epsilon\exp(-\delta\epsilon)\,E\left[F_2^+(x^+,\xi^+)\,\big|\,x,\xi\right]\right) = 0.
\]

We conclude that ℓ1 = (z⁺/z)ξ⁺. The relation V2(x, ψ) = −F(x, ψ) follows from an envelope condition.

Next we verify that HJB equation (10.20) or (10.36) assures that the firm's Euler equation is satisfied. We carry out this verification for HJB equation (10.36), but the same argument applies for HJB equation (10.20)


after we set h = 0 and ξ = ψ. Differentiating the objective of the planner with respect to ξ and using V2 = −F gives
\[
0 = -\kappa y - c - c^* + \delta F - F_1\cdot\mu_x - F_2\lambda - (F_1)'\sigma_x h + \xi F_{12}\sigma_x h - \frac{1}{2}\xi^2 F_{22}|h|^2 - \frac{1}{2}\operatorname{trace}\left(\sigma_x{}'F_{11}\sigma_x\right) + (F_1)'\sigma_x h - \xi F_2|h|^2,
\]

where we have used the envelope condition to adjust for optimization. Multiplying by minus one and simplifying gives
\[
0 = \kappa y + c + c^* - \delta F + F_1\cdot\mu_x + F_2\lambda + \xi F_2|h|^2 + \frac{1}{2}\operatorname{trace}\left(\sigma_x{}'F_{11}\sigma_x\right) - \xi F_{12}\sigma_x h + \frac{1}{2}\xi^2 F_{22}|h|^2.
\]

Observe that
\[
\mu_{\lambda,t} = F_1(x_t,\psi_t)\cdot\mu_x(x_t) + F_2(x_t,\psi_t)\lambda_t + \xi_t F_2(x_t,\psi_t)|h_t|^2 + \frac{1}{2}\operatorname{trace}\left[\sigma_x(x_t)'F_{11}(x_t,\psi_t)\sigma_x(x_t)\right] - \xi_t F_{12}(x_t,\xi_t)\sigma_x(x_t)h_t + \frac{1}{2}(\xi_t)^2 F_{22}(x_t,\xi_t)|h_t|^2.
\]

Thus, the Euler equation μλ,t = −κyt − ct − c∗ + δF (xt, ψt) is satisfied.

Appendix 10.B Example without robustness

If we suppose the exogenous linear dynamics
\begin{align*}
dx_t &= A_{11}x_t\,dt + \sigma_x\,dw_t \\
c_t &= H\cdot x_t,
\end{align*}

where σx is a vector of constants, it is natural to guess that the Ramsey planner's value function is quadratic:
\[
V(x,\psi) = \frac{1}{2}\begin{bmatrix} x & \psi & 1 \end{bmatrix}\Lambda\begin{bmatrix} x \\ \psi \\ 1 \end{bmatrix} + v.
\]


Then
\[
F(x,\psi) = -\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}\Lambda\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix}.
\]
Let

\[
B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad
A = \begin{bmatrix} A_{11} - \frac{\delta}{2} & 0 & 0 \\ 0 & -\frac{\delta}{2} & 0 \\ 0 & 0 & -\frac{\delta}{2} \end{bmatrix}, \qquad
Q = \begin{bmatrix} 0 & -H & 0 \\ -H' & \frac{\kappa^2}{\zeta} & -\kappa y^* - c^* \\ 0 & -\kappa y^* - c^* & 0 \end{bmatrix}.
\]
The matrix Λ solves what is not quite a standard Riccati equation because the matrix Q is indefinite:

\[
-\Lambda BB'\Lambda + A'\Lambda + \Lambda A + Q = 0. \qquad (10.53)
\]

The last thing to compute is the constant
\[
v = \frac{(\sigma_c)^2}{\delta}\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}\Lambda\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.
\]
We have confirmed numerically that we can compute the same Ramsey plan by using either the sequential formulation of section 10.3 that leads us to solve for the stabilizing solution of a linear equation system or the recursive method of section 10.3 that leads us to solve the Riccati equation (10.53). We assume the parameter values:

\[
\delta = .0101, \quad A_{11} = -.15, \quad \kappa = .05, \quad H = 1, \quad \zeta = .005, \quad \sigma_x = \sqrt{.3}\times .02, \quad y^* = .2, \quad c^* = 0.
\]

Most of these parameter values are borrowed from Woodford (2010). Woodford takes the cost shock to be independent and identically distributed. In


our continuous-time specification, we assume an AR process with the same unconditional standard deviation .02 assumed by Woodford.

The Matlab Riccati equation solver care.m applied to (10.53) gives¹⁵
\[
F(c,\psi) = \begin{bmatrix} 1.1599 & -0.7021 & 0.0140 \end{bmatrix}\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix}
\]
\[
d\begin{bmatrix} c_t \\ \psi_t \end{bmatrix} = \begin{bmatrix} -0.15 & 0 \\ 1.1599 & -0.7021 \end{bmatrix}\begin{bmatrix} c_t \\ \psi_t \end{bmatrix}dt + \begin{bmatrix} 0 \\ 0.014 \end{bmatrix}dt + \begin{bmatrix} .011 \\ 0 \end{bmatrix}dw_t
\]
\[
V = \begin{bmatrix} -4.3382 & -1.1599 & -0.1017 \\ -1.1599 & 0.7021 & -0.0140 \\ -0.1017 & -0.0140 & -0.0195 \end{bmatrix}.
\]

Appendix 10.C Example with first type of ambiguity

For our linear-quadratic problem, it is reasonable to guess that the value function is quadratic:
\[
V(c,\psi) = \frac{1}{2}\begin{bmatrix} c & \psi & 1 \end{bmatrix}\Lambda\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix} + v.
\]

Then
\[
F(x,\psi) = -\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}\Lambda\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix}.
\]
¹⁵ As expected, the invariant subspace method for solving (10.9), (10.1), and (10.3) gives identical answers.


Let
\[
B = \begin{bmatrix} 0 & \sigma_c \\ 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
A = \begin{bmatrix} A_{11} - \frac{\delta}{2} & 0 & 0 \\ 0 & -\frac{\delta}{2} & 0 \\ 0 & 0 & -\frac{\delta}{2} \end{bmatrix}, \qquad
Q = \begin{bmatrix} 0 & -H & 0 \\ -H' & \frac{\kappa^2}{\zeta} & -\kappa y^* - c^* \\ 0 & -\kappa y^* - c^* & 0 \end{bmatrix}, \qquad
R = \begin{bmatrix} 1 & 0 \\ 0 & \theta \end{bmatrix}.
\]

The matrix Λ solves
\[
-\Lambda BR^{-1}B'\Lambda + A'\Lambda + \Lambda A + Q = 0.
\]

Again, this Riccati equation is not quite standard because the matrix Q is indefinite. Finally,
\[
v = \frac{(\sigma_c)^2}{\delta}\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}\Lambda\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.
\]

Example

Parameter values are the same as those in Appendix 10.B except that now θ = .014.

Using the Matlab program care,

\[
\lambda = F(c,\psi) = \begin{bmatrix} 1.2485 & -0.6926 & 0.0173 \end{bmatrix}\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix} \qquad (10.54)
\]
\[
h = \begin{bmatrix} 4.7203 & 0.9769 & 0.1556 \end{bmatrix}\begin{bmatrix} c \\ \psi \\ 1 \end{bmatrix}
\]
\[
V = \begin{bmatrix} -6.0326 & -1.2485 & -0.1988 \\ -1.2485 & 0.6926 & -0.0173 \\ -0.1988 & -0.0173 & -0.0630 \end{bmatrix}.
\]


The function F that emerges by solving the Ramsey problem without robustness is
\[
F(c,\Psi,\psi) = \begin{bmatrix} 1.2485 & 0.0095 & -0.7021 & 0.0173 \end{bmatrix}\begin{bmatrix} c \\ \Psi \\ \psi \\ 1 \end{bmatrix}.
\]
Notice that the first and last coefficients equal the corresponding ones on the right side of (10.54) and that the sum of the second and third coefficients equals the second coefficient in (10.54).
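Relative to the Python sketch in Appendix 10.B, only B and R change for this problem: the second column of B carries the shock loading σc and θ penalizes the distortion. Again this is only an illustration, subject to the same indefinite-Q caveat.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

theta = 0.014
sigma_c = np.sqrt(0.3) * 0.02    # sigma_x from Appendix 10.B

# A and Q exactly as in the Appendix 10.B sketch.
A = np.diag([-0.15 - 0.0101 / 2, -0.0101 / 2, -0.0101 / 2])
Q = np.array([[0.0, -1.0, 0.0],
              [-1.0, 0.05**2 / 0.005, -0.05 * 0.2],
              [0.0, -0.05 * 0.2, 0.0]])

# Controls are (lambda, h); theta penalizes the h distortion.
B = np.array([[0.0, sigma_c],
              [1.0, 0.0],
              [0.0, 0.0]])
R = np.diag([1.0, theta])

Lam = solve_continuous_are(A, B, Q, R)
F = -Lam[1, :]   # compare with the coefficients in (10.54)
```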

Appendix 10.D Sensitivity to robustness

To compute the first-order adjustments for robustness, form
\[
-H(x,\psi) = \frac{1}{2}\begin{bmatrix} x' & \xi & 1 \end{bmatrix}\Upsilon\begin{bmatrix} x \\ \xi \\ 1 \end{bmatrix}.
\]
Guess a solution of the form

\[
-G(x,\psi) = \frac{1}{2}\begin{bmatrix} x' & \xi & 1 \end{bmatrix}\Gamma\begin{bmatrix} x \\ \xi \\ 1 \end{bmatrix}.
\]
The Lyapunov equation

\[
(A^*)'\Gamma + \Gamma A^* + \Upsilon = 0
\]
can be solved using the Matlab routine lyap. We used this approach to compute the derivatives reported in section 10.9.
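For readers without Matlab, the same step can be done with scipy, whose solver handles AX + XAᴴ = Q; passing A = (A*)′ and Q = −Υ yields Γ. The matrices A* and Υ are problem-specific and are replaced by small placeholders here purely for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def solve_for_Gamma(A_star, Upsilon):
    """Solve (A*)' Gamma + Gamma A* + Upsilon = 0 for Gamma."""
    return solve_continuous_lyapunov(A_star.T, -Upsilon)

# Placeholder matrices (not the chapter's actual A*, Upsilon), just to show the call.
A_star = np.array([[-0.2, 0.0], [0.1, -0.3]])
Upsilon = np.array([[1.0, 0.0], [0.0, 0.5]])
Gamma = solve_for_Gamma(A_star, Upsilon)
assert np.allclose(A_star.T @ Gamma + Gamma @ A_star + Upsilon, 0.0)
```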


Bibliography

Abreu, Dilip, David Pearce, and Ennio Stacchetti. 1986. Optimal Cartel Equilibria with Imperfect Monitoring. Journal of Economic Theory 39 (1):251–69.

———. 1990. Toward a Theory of Discounted Repeated Games with Imperfect Monitoring. Econometrica 58 (5):1041–63.

Adam, Klaus and Michael Woodford. 2011. Robustly Optimal Monetary Policy in a Microfounded New Keynesian Model. Columbia University.

Ai, Hengjie. 2006. Incomplete Information and Equity Premium in Produc-tion Economies. Unpublished.

Aiyagari, S. Rao, Albert Marcet, Thomas J. Sargent, and Juha Seppala.2002. Optimal Taxation without State-Contingent Debt. Journal of Po-litical Economy 110 (6):1220–54.

Alvarez, Fernando and Urban J. Jermann. 2004. Using Asset Prices toMeasure the Cost of Business Cycles. Journal of Political Economy112 (6):1223–56.

———. 2005. Using Asset Prices to Measure the Persistence in the Marginal Utility of Wealth. Econometrica 73 (6):1977–2016.

Anderson, BDO and JB Moore. 1979a. Optimal Filtering. Prentice-Hall,Englewood Cliffs, NJ.

———. 1979b. Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ.

Anderson, Evan, Lars P Hansen, and Thomas J Sargent. 1999a. Risk and Robustness in Equilibrium. Tech. rep., Working paper, University of Chicago.


Anderson, Evan W. 1998. Uncertainty and the Dynamics of Pareto OptimalAllocations. Dissertation, University of Chicago.

Anderson, Evan W. 2005. The Dynamics of Risk-Sensitive Allocations.Journal of Economic Theory 125 (2):93–150.

Anderson, Evan W. and Lars Peter Hansen. 1996. Perturbation Methodsfor Risk Sensitive Economies. Mimeo.

Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent. 1999b.Risk and Robustness in Equilibrium. Tech. rep., Working paper, Univer-sity of Chicago.

———. 2000. Robustness, Detection and the Price of Risk. Mimeo.

———. 2003. A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk, and Model Detection. Journal of the European Economic Association 1 (1):68–123.

———. 2012. Small Noise Methods for Risk-Sensitive/Robust Economies.Journal of Economics Dynamics and Control 36 (4):468–500.

Angeletos, George-Marios. 2002. Fiscal Policy with Non-Contingent Debtand the Optimal Maturity Structure. Quarterly Journal of Economics117:1105–31.

Anscombe, Francis J. and Robert J. Aumann. 1963. A Definition of Sub-jective Probability. Annals of Mathematical Statistics 34:199–205.

Araujo, Aloisio and Alvaro Sandroni. 1999. On the Convergence to Ho-mogeneous Expectations when Markets are Complete. Econometrica67 (3):663–72.

Arrow, Kenneth J. 1971. Essays in the Theory of Risk Bearing. Chicago,Illinois: Markham.

Backus, David, Mikhail Chernov, and Stanley E. Zin. 2011. Sources ofEntropy in Representative Agent Models. Nber working papers, NationalBureau of Economic Research, Inc.

Balaji, Srinivasan and Sean P. Meyn. 2000. Multiplicative Ergodicity and Large Deviations for an Irreducible Markov Chain. Stochastic Processes and their Applications 90:123–44.


Bansal, Ravi and Bruce N. Lehmann. 1997. Growth-optimal Portfolio Re-strictions on Asset Pricing Models. Macroeconomic Dynamics 1:333–54.

Bansal, Ravi and Amir Yaron. 2004. Risks for the Long Run: A PotentialResolution of Asset Pricing Puzzles. Journal of Finance 59 (4):1481–1509.

Barillas, Francisco, Lars Peter Hansen, and Thomas J. Sargent. 2009.Doubts or variability? Journal of Economic Theory 144 (6):2388–2418.

Barlevy, Gadi. 2009. Policymaking under uncertainty: Gradualism androbustness. Economic Perspectives 33 (2):38–55.

Barro, Robert J. 1979. On the Determination of the Public Debt. Journalof Political Economy 87 (5):940–71.

———. 2006. Rare Disasters and Asset Markets in the Twentieth Century.Quarterly Journal of Economics 121 (3):823–66.

———. 2007. On the Welfare Cost of Consumption Uncertainty. Unpub-lished.

Basar, T. and P. Bernhard. 1995. H∞-Optimal Control and Related Mini-max Design Problems. Birkhauser, second ed.

Bassetto, Marco. 1999. Optimal Fiscal Policy with Heterogeneous Agents.Unpublished paper, Federal Reserve Bank of Chicago.

Battaglini, Marco and Stephen Coate. 2007. A Dynamic Theory of Pub-lic Spending, Taxation and Debt. Discussion Papers 1441, NorthwesternUniversity, Center for Mathematical Studies in Economics and Manage-ment Science.

Battigalli, Pierpaolo, Simone Cerreia-Vioglio, Fabio Maccheroni, and Mas-simo Marinacci. 2011. Self confirming Equilibrium and Uncertainty.Working Paper 428, IGIER, Bocconi University.

Baum, Leonard E. and Ted Petrie. 1966. Statistical Inference for Prob-abilistic Functions of a Finite State. Annals of Mathematical Statistics37 (6):1554–63.


Becker, Gary S. and Kevin M. Murphy. 1988. A Theory of Rational Addic-tion. Journal of Political Economy 96:675–700.

Berestycki, Henri, Louis Nirenberg, and S. R. Srinivasa Varadhan. 1994.The Principal Eigenvalue and the Maximum Principal for Second-OrderElliptic Operators in General Domains. Communications in Pure andApplied Mathematics 47:47–92.

Bergemann, Dirk and J. Valimaki. 1996. Learning and Strategic Pricing.Econometrica 64:1125–49.

Bernanke, Ben S. 2007. Monetary Policy under Uncertainty. 32nd AnnualEconomic Policy Conference, Federal Reserve Bank of St. Louis.

Beveridge, Stephen and Charles R. Nelson. 1981. A New Approach to theDecomposition of Economic Time Series into Permanent and TransitoryComponents with Particular Attention to the Measurement of the ‘Busi-ness Cycle’. Journal of Monetary Economics 7:151–74.

Bewley, Truman. 1977. The Permanent Income Hypothesis: A TheoreticalFormulation. Journal of Economic Theory 16:252–59.

Bhattacharya, R. N. 1982. On the Functional Central Limit Theorem andthe Law of the Iterated Logarithm. Zeitschrift fur Wahrscheinlichkkeits-theorie und Verwandte Gebiete 60:185–201.

Billingsley, P. 1961. The Lindeberg-Levy Theorem for Martingales. Amer-ican Mathematical Monthly 12:788–92.

Blackwell, D. and L. Dubins. 1962. Merging of Opinions with IncreasingInformation. Annals of Mathematical Statistics 38:882–86.

Blackwell, D. and M. Girshick. 1954. Theory of Games and StatisticalDecisions. New York: Wiley.

Blanchard, Olivier J. and Danny Quah. 1989. The Dynamic Effects of Ag-gregate Demand and Supply Disturbances. American Economic Review79:655–73.

Boldrin, Michele, Lawrence J. Christiano, and Jonas D. M. Fisher. 1995.Asset pricing lessons for modeling business cycles. Tech. rep., NationalBureau of Economic Research.


Borovicka, Jaroslav. 2011. Survival and Long-Run Dynamics with Hetero-geneous Beliefs under Recursive Preferences. Tech. Rep. 2011-06, FederalReserve Bank of Chicago.

Borovicka, Jaroslav and Lars Peter Hansen. 2011. Examining Macroeco-nomic Models through the Lens of Asset Pricing. University of Chicago.

———. 2013. Examining Macroeconomic Models through the Lens of AssetPricing. Journal of Econometrics forthcoming.

Borovicka, Jaroslav, Lars Peter Hansen, Mark Hendricks, and Jose A.Scheinkman. 2011. Risk-Price Dynamics. Journal of Financial Econo-metrics 9 (1):3–65.

Bossaerts, Peter. 2002. The Paradox of Asset Pricing. Princeton, NJ:Princeton University Press.

———. 2004. Filtering Returns for Unspecified Biases in Priors when Test-ing Asset Pricing Theory. Review of Economic Studies 71:63–86.

Bouakiz, M. A. and M. J. Sobel. 1985. Nonstationary Policies are Opti-mal for Risk-sensitive Markov Decision Processes. Tech. rep., TechnicalReport, College of Management, Georgia Institute of Technology.

Box, G. E. P. and G. C. Tiao. 1992. Bayesian Inference in Statistical Analysis. New York: John Wiley and Sons, Inc.

Brainard, William. 1967. Uncertainty and the Effectiveness of Policy. Amer-ican Economic Review 57 (2):411–25.

Bray, Margaret. 1982. Learning, Estimation, and the Stability of RationalExpectations. Journal of Economic Theory 26 (2):318–339.

Breeden, Douglas T. 1979. An intertemporal asset pricing model withstochastic consumption and investment opportunities. Journal of Fi-nancial Economics 7 (3):265–96.

Breiman, Leo. 1968. Probability Theory. Reading, Massachusetts: Addison-Wesley Publishing Company.

Brennan, Michael J. and Yihong Zia. 2001. Stock Price Volatility andEquity Premium. Journal of Monetary Economics 47:249–83.


Brock, W. A. 1982. Asset Pricing in a Production Economy. In The Eco-nomics of Information and Uncertainty, edited by J. J. McCall. Univer-sity of Chicago Press, for the National Bureau of Economic Research.

Brock, William A. and Steven N. Durlauf. 2005. Local robustness analysis:Theory and application. Journal of Economic Dynamics and Control29 (11):2067–92.

Brock, William A. and Cars Hommes. 1994. Rational Routes to Random-ness: Rationality in an Unstable Market with Information Costs ImpliesChaos. Department of Economics, University of Wisconsin at Madison.

Brock, William A. and Blake D. LeBaron. 1996. A Dynamic StructuralModel for Stock Return Volatility and Trading Volume. Review of Eco-nomics and Statistics 78:94–110.

Brock, William A. and Leonard J. Mirman. 1972. Optimal EconomicGrowth and Uncertainty: The Discounted Case. Journal of EconomicTheory 4 (3):479–513.

Brock, William A., Steven N. Durlauf, and Kenneth D. West. 2003. PolicyEvaluation in Uncertain Economic Environments. Brookings Papers onEconomic Activity 2003 (1):235–322.

———. 2004. Model Uncertainty and Policy Evaluation: Some Theory andEmpirics. SSRI paper 2004-19, University of Wisconsin.

Brock, William A., Steven N. Durlauf, James M. Nason, and GiacomoRondina. 2007. Simple versus optimal rules as guides to policy. Journalof Monetary Economics 54 (5):1372–96.

Brock, William A., Steven N. Durlauf, and Giacomo Rondina. 2008.Frequency-Specific Effects of Stabilization Policies. American EconomicReview 98 (2):241–45.

Brunner, Karl and Allan H. Meltzer. 1986. Real Business Cycles, Real Ex-change Rates and Actual Policies. Carnegie-Rochester Conference Serieson Public Policy 25 (1):1–10.

Brunnermeier, Markus K., Christian Gollier, and Jonathan A. Parker. 2007.Optimal Beliefs, Asset Prices and the Preference for Skewed Returns.CEPR Discussion Papers 6181, C.E.P.R. Discussion Papers.


Bucklew, James A. 2004. An Introduction to Rare Event Simulation. NewYork: Springer Verlag.

Buera, Francisco and Juan-Pablo Nicolini. 2004. Optimal maturity of gov-ernment debt without state contingent bonds. Journal of Monetary Eco-nomics 51:531–54.

Burnside, Craig. 1994. Hansen-Jagannathan Bounds as Classical Testsof Asset Pricing Models. Journal of Business and Economic Statistics12 (1):57–79.

Burnside, Craig, Martin Eichenbaum, and Sergio Rebelo. 1993. Laborhoarding and the business cycle. Tech. rep., National Bureau of Eco-nomic Research.

Caballero, Ricardo J. and Arvind Krishnamurthy. 2008. Collective RiskManagement in a Flight to Quality Episode. Journal of FinanceLXIII (5):2195–2230.

Caballero, Ricardo J. and Pablo Kurlat. 2009. The “Surprising” Origin andNature of Financial Crises: A Macroeconomic Policy Proposal. Preparedfor the Jackson Hole Symposium on Financial Stability and Macroeco-nomic Policy, August 2009.

Cagetti, Marco, Lars Peter Hansen, Thomas J. Sargent, and NoahWilliams.2000. Robust Recursive Prediction and Control. Unpublished.

———. 2002. Robustness and Pricing with Uncertain Growth. Review ofFinancial Studies 15 (2):363–404.

Cameron, Robert H and William T Martin. 1947. The Behavior of Measureand Measurability under Change of Scale in Wiener Space. Bulletin ofthe American Mathematical Society 53 (2):130–137.

Campbell, John Y. 1987. Does Saving Anticipate Declining Labor Income?An Alternative Test of the Permanent Income Hypothesis. Econometrica55 (6):1249–73.

Campbell, John Y. and John H. Cochrane. 1999. By Force of Habit: AConsumption-Based Explanation of Aggregate Stock Market Behavior.Journal of Political Economy 107 (2):205–51.


Campi, Marco C. and Matthew R. James. 1996. Nonlinear Discrete-TimeRisk-Sensitive Optimal Control. International Journal of Robust andNonlinear Control 6:1–19.

Carroll, Christopher D. 1992. The Buffer-Stock Theory of Saving: SomeMacroeconomic Evidence. Brookings Papers on Economic Activity1992 (2):61–156.

Cecchetti, Stephen G., Pok sang Lam, and Nelson C. Mark. 1994. Test-ing volatility restrictions on intertemporal marginal rates of substitu-tion implied by Euler equations and asset returns. Journal of Finance49 (1):123–52.

———. 2000. Asset Pricing with Distorted Beliefs: Are Equity ReturnsToo Good to Be True? American Economic Review 90 (4):787–805.

Cerreia-Vioglio, Simone, Fabio Maccheroni, Massimo Marinacci, and LuigiMontrucchio. 2008. Uncertainty Averse Preferences. Working paper,Collegio Carlo Alberto.

———. 2011. Ambiguity and Robust Statistics. Working Paper 382, IGIER.

Chamberlain, Gary. 2000. Econometric Applications of Maxmin ExpectedUtility Theory. Journal of Applied Econometrics 15:625–44.

Chang, Roberto. 1998. Credible Monetary Policy in an Infinite HorizonModel: Recursive Approaches. Journal of Economic Theory 81 (2):431–61.

Chari, V. V., Lawrence J. Christiano, and Patrick J. Kehoe. 1994. OptimalFiscal Policy in a Business Cycle Model. Journal of Political Economy102 (4):617–52.

Chari, V. V., Patrick J. Kehoe, and Ellen R. McGrattan. 2007. BusinessCycle Accounting. Econometrica 75 (3):781–836.

Chen, Zengjing and Larry G. Epstein. 2002. Ambiguity, Risk and AssetReturns in Continuous Time. Econometrica 70 (4):1403–43.

Chernoff, Herman. 1952. A Measure of Asymptotic Efficiency for Tests of aHypothesis Based on the Sum of Observations. Annals of MathematicalStatistics 23:493–507.


Cho, In-Koo and Kenneth Kasa. 2006. Learning and Model Validation.Unpublished.

———. 2008. Learning Dynamics and Endogenous Currency Crises. Un-published.

Cho, In-Koo, Noah Williams, and Thomas J. Sargent. 2002. Escaping Nash Inflation. Review of Economic Studies 69:1–40.

Chow, C. K. 1957. An Optimum Character Recognition System UsingDecision Functions. IRE Transactions on Electronic Computers 6:247–54.

Christiano, Lawrence J. 1987. Cagan’s Model of Hyperinflation under Ra-tional Expectations. International Economic Review 28:33–49.

Christiano, Lawrence J., Martin Eichenbaum, and David Marshall. 1991.The Permanent Income Hypothesis Revisited. Econometrica 59 (2):397–423.

Christiano, Lawrence J., Martin Eichenbaum, and Charles L. Evans. 2005.Nominal rigidities and the dynamic effects of a shock to monetary policy.Journal of Political Economy 113 (1):1–45.

Cochrane, John H. 1989. The Sensitivity of Tests of the Intertemporal Al-location of Consumption to Near-Rational Alternatives. American Eco-nomic Review 79:319–37.

———. 1997. Where Is the Market Going? Uncertain Facts and Novel The-ories. Federal Reserve Bank of Chicago Economic Perspectives 21 (6):3–37.

———. 2001. Asset Pricing. Princeton, NJ: Princeton University Press.

Cochrane, John H. and Lars P. Hansen. 1992. Asset Pricing Explorationsfor Macroeconomics. NBER Macroeconomics Annual 1992 115–65.

Cogley, Timothy and Thomas J. Sargent. 2005. The Conquest of US Inflation: Learning and Robustness to Model Uncertainty. Review of Economic Dynamics 8:528–63.


———. 2008a. The Market Price of Risk and the Equity Premium: ALegacy of the Great Depression? Journal of Monetary Economics 55:454–78.

———. 2008b. The market price of risk and the equity premium: A legacyof the Great Depression? Journal of Monetary Economics 55 (3):454–76.

———. 2009. Diverse Beliefs, Survival and the Market Price of Risk. Eco-nomic Journal 119 (536):354–76.

Cogley, Timothy, Riccardo Colacito, and Thomas Sargent. 2005. Benefitsfrom US Monetary Policy Experimentation in the Days of Samuelson andSolow and Lucas. Unpublished.

Cogley, Timothy, Riccardo Colacito, and Thomas J. Sargent. 2007. Benefitsfrom US Monetary Policy Experimentation in the Days of Samuelson andSolow and Lucas. Journal of Money, Credit, and Banking 39 (s1):67–99.

Cogley, Timothy, Riccardo Colacito, Lars Peter Hansen, and Thomas J.Sargent. 2008. Robustness and US Monetary Policy Experimentation.Journal of Money, Credit, and Banking 40 (8):1599–1623.

Constantinides, George M. 1990. Habit Formation: A Resolution of theEquity Premium Puzzle. Journal of Political Economy 98:519–43.

Constantinides, George M and Darrell Duffie. 1996. Asset Pricing withHeterogeneous Consumers. Journal of Political Economy 104 (2):219–40.

Cox, John C., Jonathan E., Jr. Ingersoll, and Stephen A. Ross. 1985. AnIntertemporal General Equilibrium Model of Asset Prices. Econometrica53 (2):363–84.

Croce, Mariano M., Martin Lettau, and Sydney C. Ludvigson. 2006. In-vestor Information, Long-Run Risk, and the Duration of Risky CashFlows. Working paper 12912, NBER.

Csiszar, Imre. 1991. Why Least Squares and Maximum Entropy? An Ax-iomatic Approach to Inference for Linear Inverse Problems. Annals ofStatistics 19 (4):2032–66.


David, Alexander. 1997. Fluctuating Confidence in Stock Markets: Impli-cations for Returns and Volatility. Journal of Financial and QuantitativeAnalysis 32 (4):457–62.

———. 2008. Heterogeneous Beliefs, Speculation, and the Equity Premium.Journal of Finance LXIII (1):41–83.

David, Alexander and Pietro Veronesi. 1999. Option Prices with Uncer-tain Fundamentals: Theory and Evidence on the Dynamics of ImpliedVolatilities and Over-Underreaction in the Options Market. Mimeo.

Deaton, Angus. 1991. Saving and Liquidity Constraints. Econometrica59 (5):1221–48.

Dembo, A. and O. Zeitouni. 1986. Parameter Estimation of Partially Ob-served Continuous Time Stochastic Processes via the EM Algorithm.Stochastic Processes and their Applications 23 (1):91–113.

Dem’ianov, Vladimir F. and Vassili N. Malozemov. 1974. Introduction toMinimax. New York: Wiley.

Dennis, Richard. 2008. Robust control with commitment: A modification to Hansen-Sargent. Journal of Economic Dynamics and Control 32:2061–84.

Descartes, Rene. 1901. The method, meditations and philosophy ofDescartes. Tudor Publishing Company.

Detemple, Jerome B. 1986. Asset Pricing in a Production Economy withIncomplete Information. Journal of Finance 41 (2):383–90.

Diaconis, Persi and David Freedman. 1986. On the Consistency of BayesEstimates. Annals of Statistics (1):1–26.

Dolmas, Jim. 1998. Risk Preferences and the Welfare Cost of BusinessCycles. Review of Economic Dynamics 1 (3):646–76.

Donsker, Monroe E. and S. R. Srinivasa Varadhan. 1975. On a VariationalFormula for the Principal Eigenvalue for Operators with Maximum Prin-ciple. Proceedings of the National Academy of Sciences 72 (3):780–83.


———. 1976. On the Principal Eigenvalue of Second-Order Elliptic Dif-ferential Equations. Communications in Pure and Applied Mathematics29:595–621.

Doob, Joseph L. 1953. Stochastic Processes. New York: John Wiley andSons.

Dothan, Michael U. and David Feldman. 1986. Equilibrium Interest Ratesand Multiperiod Bonds in a Partially Observable Economy. Journal ofFinance 41 (2):369–82.

Dow, James and Sergio Ribeiro da Costa Werlang. 1992. Uncertainty Aver-sion, Risk Aversion, and the Optimal Choice of Portfolio. Econometrica60 (1):197–204.

———. 1994. Learning under Knightian Uncertainty: The Law of LargeNumbers for Non-additive Probabilities. Unpublished.

Duffie, Darrell. 2001. Dynamic Asset Pricing Theory, Third Edition. Prince-ton, NJ: Princeton University Press.

Duffie, Darrell and Larry G. Epstein. 1992. Stochastic Differential Utility.Econometrica 60 (2):353–94.

Duffie, Darrell and Pierre-Louis Lions. 1992. PDE solutions of StochasticDifferential Utility. Journal of Mathematical Economics 21 (6):577–606.

Dumas, Bernard, Raman Uppal, and Tan Wang. 2000. Efficient Intertem-poral Allocations with Recursive Utility. Journal of Economic Theory93 (2):240–59.

Dupuis, Paul and Richard S. Ellis. 1997. A Weak Convergence Approach tothe Theory of Large Deviations. Wiley Series in Probability and Statistics.New York: John Wiley and Sons.

Dupuis, Paul, Matthew R. James, and Ian Petersen. 1998. Robust Proper-ties of Risk Sensitive Control. LCDS 98-15, Brown University.

Dynkin, E. B. 1978. Sufficient Statistics and Extreme Points. Annals ofProbability 6 (5):705–30.


Dynkin, Evgenii B. 1956. Markov Processes and Semigroups of Operators.Theory of Probability and Its Applications 1 (1):22–33.

Ekeland, Ivar and Thomas Turnbull. 1983. Infinite-Dimensional Optimiza-tion and Convexity. Chicago Lectures in Mathematics. Chicago: TheUniversity of Chicago Press.

Elliott, Robert J. 1982. Stochastic Calculus and Applications. New York:Springer-Verlag.

Elliott, Robert J., Lakhdar Aggoun, and John B. Moore. 1995. HiddenMarkov Models. Estimation and Control. New York: Springer-Verlag.

Ellison, Martin and Thomas J. Sargent. 2009. A defence of the FOMC.Manuscript, Oxford University and New York University.

Ellsberg, Daniel. 1961. Risk, Ambiguity, and the Savage Axioms. Quarterly Journal of Economics 75 (4):643–69.

Engle, R. and Clive W. J. Granger. 1987. Co-integration and Error Correc-tion: Representation, Estimation and Testing. Econometrica 55:251–76.

Empson, William. 1947. Seven Types of Ambiguity. New York: New Directions Books.

Epstein, L. and S. Zin. 1989a. Substitution, Risk Aversion and the TemporalBehavior of Consumption and Asset Returns: A Theoretical Framework.Econometrica 57:937–69.

Epstein, Larry G. 1988. Risk Aversion and Asset Prices. Journal of Mon-etary Economics 22 (2):179–92.

Epstein, Larry G. and Martin Schneider. 2003a. IID: Independently andIndistinguishably Distributed. Journal of Economic Theory 113 (1):32–50.

———. 2003b. Recursive Multiple Priors. Journal of Economic Theory113 (1):1–31.

———. 2008. Ambiguity, Information Quality, and Asset Pricing. Journalof Finance 63 (1):197–228.


Epstein, Larry G. and Tan Wang. 1994. Intertemporal Asset Pricing UnderKnightian Uncertainty. Econometrica 62 (2):283–322.

Epstein, Larry G. and Stanley E. Zin. 1989b. Substitution, Risk Aver-sion, and the Temporal Behavior of Consumption and Asset Returns: ATheoretical Framework. Econometrica 57 (4):937–69.

Epstein, Larry G. and Stanley E Zin. 1990. ‘First-order’ Risk Aversion andthe Equity Premium Puzzle. Journal of Monetary Economics 26 (3):387–407.

Epstein, Larry G. and Stanley E. Zin. 1991. Substitution, Risk Aversion,and the Temporal Behavior of Consumption and Asset Returns: An Em-pirical Analysis. Journal of Political Economy 99 (2):263–86.

Ergin, Haluk and Faruk Gul. 2009. A Subjective Theory of CompoundLotteries. Journal of Economic Theory 144 (3):899–929.

Ethier, Stewart N. and Thomas G. Kurz. 1986. Markov Processes. Charac-terization and Convergence. John Wiley.

Evans, David S. and Boyan Jovanovic. 1989. An Estimated Model of En-trepreneurial Choice under Liquidity Constraints. Journal of PoliticalEconomy 97 (4):808–27.

Evans, George W. and Seppo Honkapohja. 2003. Learning and Expectationsin Macroeconomics. Princeton, NJ: Princeton University Press.

Fan, Jianqing and Irene Gijbels. 1996. Local Polynomial Modeling and ItsApplications. Chapman and Hall.

Fan, Ky. 1952. Fixed-Point and Minimax Theorems in Locally ConvexTopological Linear Spaces. Proceedings of the National Academy of Sci-ences 38:121–26.

———. 1953. Minimax Theorems. Proceedings of the National Academy ofSciences 39:42–47.

Farhi, Emmanuel and Ivan Werning. 2008. Optimal savings distortions withrecursive preferences. Journal of Monetary Economics 55 (1):21–42.


Fellner, William. 1965. Probability and Profit: A Study of Economic Behav-ior Along Bayesian Lines. Irwin Series in Economics. Homewood, Illinois:Richard D. Irwin, Inc.

Ferguson, Thomas S. 1967. Mathematical Statistics: A Decision TheoreticApproach. New York: Academic Press.

de Finetti, Bruno. 1937. La Prevision: Ses Lois Logiques, Ses Sources Sub-jectives. Annales de l’Institute Henri Poincare 7:1–68. English translationin Kyburg and Smokler (eds.), S tudies in Subjective Probability, Wiley,New York, 1964.

Fisher, Jonas D. M. 2006. The Dynamic Effects of Neutral and Investment-Specific Technology Shocks. Journal of Political Economy 114 (3):413–51.

Fleming, Wendell H. and H. Mete Soner. 1993. Controlled Markov Pro-cesses and Viscosity Solutions. Applications of Mathematics. New York:Springer-Verlag.

Fleming, Wendell H. and Panagiotis E. Souganidis. 1989. On the Exis-tence of Value Functions of Two-Player, Zero Sum Stochastic DifferentialGames. Indiana University Mathematics Journal 38:293–314.

Follmer, Hans. 1985. An Entropy Approach to the Time Reversal of Diffu-sion Processes. In Metivier and Pardoux (1985), 156–63.

Follmer, Hans and Nina Gantert. 1997. Entropy Minimization andSchrodinger Processes in Infinite Dimensions. Annals of Probability25:901–26.

Francis, Bruce A. 1987. A Course in H-infinity Control Theory, vol. 88.Lecture Notes in Control and Information Sciences.

Friedman, Milton. 1953. The Effects of Full Employment Policy on Eco-nomic Stability: A Formal Analysis. In Essays in Positive Economics,edited by Milton Friedman, 117–32. Chicago, Illinois: University ofChicago Press.

Frisch, Ragnar. 1959. A Complete Scheme for Computing all Direct andCross Demand Elasticities in a Model with Many Sectors. Econometrica27:177–96.


Fudenberg, Drew and David K. Levine. 1998. The Theory of Learning inGames. Cambridge, MA: MIT Press.

Gallant, A. Ronald, Lars Peter Hansen, and George Tauchen. 1990. Us-ing Conditional Moments of Asset Payoffs to Infer the Volatility of In-tertemporal Marginal Rates of Substitution. Journal of Econometrics45 (1):141–79.

Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995.Bayesian Data Analysis. Chapman and Hall.

Giannoni, Marc P. 2002. Does Model Uncertainty Justify Caution? RobustOptimal Monetary Policy In A Forward-Looking Model. MacroeconomicDynamics 6 (01):111–44.

———. 2007. Robust optimal monetary policy in a forward-looking modelwith parameter and shock uncertainty. Journal of Applied Econometrics22 (1):179–213.

Gilboa, Itzhak and David Schmeidler. 1989. Maxmin Expected Utility withNon-Unique Prior. Journal of Mathematical Economics 18 (2):141–53.

Gilboa, Itzhak, Andrew Postlewaite, and David Schmeidler. 2007. Prob-ability and Uncertainty in Economic Modeling, Second Version. PIERWorking Paper Archive 08-002, Penn Institute for Economic Research,Department of Economics, University of Pennsylvania.

Giordani, Paolo and Paul Soderlind. 2004. Solution of macromodels withHansen-Sargent robust policies: some extensions. Journal of EconomicDynamics and Control 28:2367–97.

Girsanov, I. V. 1960. On Transforming a Certain Class of Stochastic Pro-cesses by Absolutely Continuous Substitution of Measures. Theory ofProbability and its Applications 5 (3):285–301.

Glover, Keith and John C. Doyle. 1988. State-space formulae for all stabiliz-ing controllers that satisfy an H∞-norm bound and relations to relationsto risk sensitivity. Systems and Control Letters 11 (3):167–72.

Gordin, M. I. 1969. The Central Limit Theorem for Stationary Processes.Soviet Mathematics Doklady 10:1174–76.


Gourieroux, Christian and Alain Monfort. 1996. Simulation-Based Econo-metric Methods. CORE Lectures Series. Oxford University Press.

Gray, Peter R. 2009. Probability, Random Processes and Ergodic Properties.New York: Springer, second ed.

Gumen, Anna and Andrei Savochkin. 2010. Ambiguity, Dynamic Consis-tency, and Multiplier Preferences. New York University, Department ofEconomics and Stern School of Business.

Hall, Peter and C. C. Heyde. 1980. Martingale Limit Theory and Its Ap-plication. Boston: Academic Press.

Hall, Robert E. 1978. Stochastic Implications of the Life Cycle-PermanentIncome Hypothesis: Theory and Evidence. Journal of Political Economy86 (6):971–88.

Hamilton, James D. 1989. A New Approach to the Economic Analysisof Nonstationary Time Series and the Business Cycle. Econometrica57 (2):357–84.

Hansen, Lars Peter. 1987. Calculating Asset Prices in Three ExampleEconomies, chap. 6. New York: Cambridge University Press.

———. 2007. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk.American Economic Review 97 (2):1–30.

———. 2008a. Modeling the Long Run: Valuation in Dynamic StochasticEconomies. Prepared for the Fisher-Schultz Lecture at the 2006 EuropeanMeetings of the Econometric Society.

———. 2008b. Modeling the Long Run: Valuation in Dynamic Stochastic Economies. Presented as the Fisher-Schultz Lecture to the Econometric Society.

Hansen, Lars Peter and James J. Heckman. 1996. The Empirical Founda-tions of Calibration. Journal of Economic Perspectives 10 (1):87–104.

Hansen, Lars Peter and Ravi Jagannathan. 1991. Implications of SecurityMarket Data for Models of Dynamic Economies. Journal of PoliticalEconomy 99:225–62.


Hansen, Lars Peter and Scott F. Richard. 1987. The Role of ConditioningInformation in Deducing Testable Restrictions Implied by Dynamic AssetPricing Models. Econometrica 55 (3):587–614.

Hansen, Lars Peter and Thomas J. Sargent. 1980. Formulating and estimat-ing dynamic linear rational expectations models. Journal of EconomicDynamics and Control 2:7–46.

———. 1983. Aggregation over Time and the Inverse Optimal PredictorProblem for Adaptive Expectations in Continuous Time. InternationalEconomic Review 24:1–20.

———. 1993. Seasonality and approximation errors in rational expectationsmodels. Journal of Econometrics 55 (1-2):21–55.

———. 1995a. Discounted Linear Exponential Quadratic Gaussian Control.IEEE Transactions on Automatic Control 40 (5):968–71.

———. 1995b. Discounted Linear Exponential Quadratic Gaussian Control.IEEE Transactions on Automatic Control 40 (5):968–71.

———. 1996. Recursive Linear Models of Dynamic Economies. Forthcom-ing.

———. 1998. Discounted Robust Filtering and Control in the FrequencyDomain. University of Chicago and Stanford University.

———. 1999. Five Games and Two Objective Functions that PromoteRobustness. Mimeo.

———. 2001a. Robust Control and Model Uncertainty. American EconomicReview 91:60–66.

———. 2001b. Robust Control and Model Uncertainty. American EconomicReview, Papers and Proceedings 91:60–66.

———. 2001c. Time Inconsistency of Robust Control? Unpublished.

———. 2003a. Decentralizing economies with preferences for robustness.Unpublished.

———. 2003b. Robust Control of Forward-Looking Models. Journal ofMonetary Economics 50:581–604.


———. 2003c. Robust Control of Forward-Looking Models. Journal ofMonetary Economics 50 (3):581–604.

———. 2004a. Discounting, Commitment, and Recursive Formulations ofRobust Estimation and Control. Unpublished.

———. 2004b. Recursive Robust Decisions with Hidden States. Unpub-lished.

———. 2005a. Recursive Robust Decisions with Hidden States. Unpub-lished.

———. 2005b. Robust Estimation and Control under Commitment. Jour-nal of Economic Theory 124 (2):258–301.

———. 2005c. Robust estimation and control under commitment. Journalof Economic Theory 124 (2):258–301.

———. 2005d. Robust Estimation and Control Without Commitment.Unpublished.

———. 2006a. Fragile beliefs and the price of model uncertainty. Unpub-lished.

———. 2006b. Recursive Formulations of Robust Estimation and ControlWithout Commitment. Unpublished.

———. 2006c. Robust Estimation and Control for LQ Gaussian Problemswithout Commitment. Unpublished.

———. 2006d. Robust Estimation and Control for LQ Gaussian problemsWithout Commitment. Unpublished.

———. 2007a. Recursive Models of Dynamic Linear Economies. Unpub-lished monograph, University of Chicago and Hoover Institution.

———. 2007b. Recursive Robust Estimation and Control without Com-mitment. Journal of Economic Theory 136:1–27.

———. 2007c. Recursive robust estimation and control without commit-ment. Journal of Economic Theory 136 (1):1–27.


———. 2008a. Fragile Beliefs and the Price of Uncertainty. University ofChicago and New York University.

———. 2008b. Robustness. Princeton, NJ: Princeton University Press.

———. 2008c. Robustness. Princeton, NJ: Princeton University Press.

———. 2008d. Robustness. Princeton, NJ: Princeton University Press.

———. 2009. Robustness, Estimation, and Detection. Unpublished.

———. 2010. Fragile beliefs and the price of uncertainty. QuantitativeEconomics 1 (1):129–62.

———. 2011. Robustness and ambiguity in continuous time. Journal ofEconomic Theory 146 (3):1195–1223.

———. 2013a. Recursive Models of Dynamic Linear Economies. Princeton,NJ: Princeton University Press.

———. 2013b. Recursive Models of Dynamic Linear Economies. Princeton,NJ: Princeton University Press.

———. 2014. Risk, Uncertainty, and Value. Princeton, NJ: PrincetonUniversity Press.

Hansen, Lars Peter and Jose A. Scheinkman. 1995. Back to the Future:Generating Moment Implications for Continuous Time Markov Processes.Econometrica 63:767–804.

———. 2002. Semigroup Pricing. Unpublished.

———. 2009a. Long-term Risk: An Operator Approach. Econometrica77 (1):177–234.

———. 2009b. Long-term Risk: An Operator Approach. Econometrica77 (1):177–234.

Hansen, Lars Peter and Kenneth J Singleton. 1983. Stochastic Consump-tion, Risk Aversion, and the Temporal Behavior of Asset Returns. Jour-nal of Political Economy 91 (2):249–65.


Hansen, Lars Peter, William Roberds, and Thomas J. Sargent. 1991. TimeSeries Implications of Present-Value Budget Balance and of MartingaleModels of Consumption and Taxes. Rational Expectations Econometrics121–61.

Hansen, Lars Peter, John Heaton, and Erzo Gerrit Jan Luttmer. 1995.Econometric Evaluation of Asset Pricing Models. Review of FinancialStudies 8 (2):237–74.

Hansen, Lars Peter, Jose A. Scheinkman, and Nizar Touzi. 1998. Spec-tral Methods for Identifying Scalar Diffusions. Journal of Econometrics86 (1):1–32.

Hansen, Lars Peter, Thomas J. Sargent, and Thomas D., Jr. Tallarini. 1999.Robust Permanent Income and Pricing. Review of Economic Studies66:873–907.

Hansen, Lars Peter, Thomas J. Sargent, and Neng E. Wang. 2002. RobustPermanent Income and Pricing with Filtering. Macroeconomic Dynamics6:40–84.

Hansen, Lars Peter, Nicholas Polson, and Thomas J. Sargent. 2006a. FragileBeliefs with parameter estimation. Unpublished.

Hansen, Lars Peter, Thomas J. Sargent, G. A. Turmuhambetova, and NoahWilliams. 2006b. Robust Control, Min-Max Expected Utility, and ModelMisspecification. Journal of Economic Theory 128:45–90.

Hansen, Lars Peter, John Heaton, Junghoon Lee, and Nikolai Roussanov.2007. Intertemporal Substitution and Risk Aversion. Handbook of Econo-metrics 6A (1):251–69.

Hansen, Lars Peter, John Heaton, and Nan Li. 2008a. ConsumptionStrikes Back? Measuring Long Run Risk. Journal of Political Economy116 (2):260–302.

Hansen, Lars Peter, Ricardo Mayer, and Thomas J. Sargent. 2008b. RobustHidden Markov LQG Problems. University of Chicago and New YorkUniversity.


Hansen, Lars Peter, Ricardo Mayer, and Thomas J. Sargent. 2010. Robust hidden Markov LQG problems. Journal of Economic Dynamics and Control 34 (10):1951–66.

Harrison, J. Michael and David M. Kreps. 1978. Speculative Investor Behavior in a Market with Heterogeneous Expectations. Quarterly Journal of Economics 92 (2):323–36.

———. 1979a. Martingales and Arbitrage in Multiperiod Securities Markets. Journal of Economic Theory 20 (3):381–408.

———. 1979b. Martingales and Arbitrage in Multiperiod Securities Markets. Journal of Economic Theory 20 (3):381–408.

Heaton, John. 1993. The Interaction Between Time-Nonseparable Preferences and Time Aggregation. Econometrica 61 (2):353–85.

———. 1995. An Empirical Investigation of Asset Pricing with Temporally Dependent Preference Specifications. Econometrica 63 (3):681–718.

Heaton, John and Deborah J. Lucas. 1996. Evaluating the Effects of Incomplete Markets on Risk Sharing and Asset Pricing. Journal of Political Economy 104 (3):443–87.

Heckman, James J. 1974. Effects of Child-Care Programs on Women's Work Effort. In Economics of the Family: Marriage, Children and Human Capital, edited by Theodore W. Schultz, 491–518. Chicago: University of Chicago Press.

Hellman, Martin E. and Josef Raviv. 1970. Probability Error, Equivocation, and the Chernoff Bound. IEEE Transactions on Information Theory 16 (4):368–72.

Hewitt, Edwin and Leonard J. Savage. 1955. Symmetric measures on Cartesian products. Transactions of the American Mathematical Society 80:470–501.

Hume, David. 1748. An Enquiry concerning Human Understanding. P. F. Collier & Son. Harvard Classics, Volume 37.

Hurwicz, Leonid. 1962. On the Structural Form of Interdependent Systems. In Logic, Methodology and Philosophy of Science, 232–39. Stanford, CA: Stanford University Press.

Ito, Kiyosi and Shinzo Watanabe. 1965. Transformation of Markov Processes by Multiplicative Functionals. Annales de l'Institut Fourier 15:13–30.

Jackson, Matthew O., Ehud Kalai, and Rann Smorodinsky. 1999. Bayesian Representation of Stochastic Processes Under Learning: De Finetti Revisited. Econometrica 67 (4):875–93.

Jacobson, David. 1973. Optimal Stochastic Linear Systems with Exponential Performance Criteria and Their Relation to Deterministic Differential Games. IEEE Transactions on Automatic Control 18 (2):124–31.

Jacobson, David H. 1977. Extensions of Linear-quadratic Control, Optimization and Matrix Theory. New York: Academic Press.

James, Matthew R. 1992. Asymptotic Analysis of Nonlinear Stochastic Risk-Sensitive Control and Differential Games. Mathematics of Control, Signals, and Systems 5:401–17.

———. 1995. Recent Developments in Nonlinear H∞ Control. Department of Engineering, Australian National University.

James, Matthew R. and John Baras. 1996. Partially Observed Differential Games, Infinite Dimensional Hamilton-Jacobi-Isaacs Equations, and Nonlinear H∞ Control. SIAM Journal on Control and Optimization 34 (4):1342–64.

James, Matthew R., John Baras, and Robert J. Elliott. 1994. Risk-sensitive Control and Dynamic Games for Partially Observed Discrete-time Nonlinear Systems. IEEE Transactions on Automatic Control 39 (4):780–91.

Jobert, A., A. Platania, and L. C. G. Rogers. 2006. A Bayesian Solution to the Equity Premium Puzzle. Statistical Laboratory, University of Cambridge.

Johnsen, Thore H. and John B. Donaldson. 1985a. The Structure of Intertemporal Preferences under Uncertainty and Time Consistent Plans. Econometrica 53 (6):1451–58.

———. 1985b. The Structure of Intertemporal Preferences under Uncertainty and Time Consistent Plans. Econometrica 53 (6):1451–58.

———. 1985c. The Structure of Intertemporal Preferences under Uncertainty and Time Consistent Plans. Econometrica 53 (6):1451–58.

Jorgenson, Dale W. 1967. Discussion. American Economic Review: Papers and Proceedings 57 (2):557–59.

Jovanovic, Boyan. 1979. Job Matching and the Theory of Turnover. Journal of Political Economy 87 (5):972–90.

———. 1982. Selection and the Evolution of Industry. Econometrica 50 (3):649–70.

Jovanovic, Boyan and Yaw Nyarko. 1995. The Transfer of Human Capital. Journal of Economic Dynamics and Control 19:1033–64.

———. 1996. Learning by Doing and the Choice of Technology. Econometrica 64 (6):1299–1310.

Kabanov, Ju M., R. S. Lipcer, and A. N. Sirjaev. 1977. On the Question of Absolute Continuity and Singularity of Probability Measures. Mathematics of the USSR-Sbornik 33 (2):203–21.

———. 1979. Absolute Continuity and Singularity of Locally Absolutely Continuous Probability Distributions. I. Mathematics of the USSR-Sbornik 35 (5):631–96.

Kalai, Ehud and Ehud Lehrer. 1993. Rational Learning Leads to Nash Equilibrium. Econometrica 61 (5):1019–45.

Kandel, Shmuel and Robert F. Stambaugh. 1991. Asset Returns and Intertemporal Preferences. Journal of Monetary Economics 27 (1):39–71.

Karantounias, Anastasios G. 2008. Robustness and Ramsey Plans. Ph.D. thesis, New York University.

———. 2009a. Managing Pessimistic Expectations and Fiscal Policy. Working paper 2009-29a, Federal Reserve Bank of Atlanta.

———. 2009b. Unpublished paper with Lars Peter Hansen and Thomas J. Sargent. Federal Reserve Bank of Atlanta.

———. 2012. Managing pessimistic expectations and fiscal policy. Theoretical Economics.

Karantounias, Anastasios G., Lars Peter Hansen, and Thomas J. Sargent. 2007. Ramsey taxation and fear of misspecification: doubts of the planner. Unpublished paper, New York University and University of Chicago.

———. 2009. Ramsey Taxation and Fear of Misspecification XXXX. Working paper, Federal Reserve Bank of Atlanta.

Karatzas, Ioannis and Steven E. Shreve. 1991. Brownian Motion and Stochastic Calculus. New York: Springer-Verlag, second ed.

Kasa, Kenneth. 2001. A robust Hansen-Sargent prediction formula. Economics Letters 71 (1):43–48.

———. 2002. Model Uncertainty, Robust Policies, And The Value Of Commitment. Macroeconomic Dynamics 6 (1):145–66.

———. 2004. Learning, Large Deviations, and Recurrent Currency Crises. International Economic Review 45:141–73.

———. 2006. Robustness and Information Processing. Review of Economic Dynamics 9 (1):1–33.

Kazamaki, N. 1977. On a Problem of Girsanov. Tohoku Mathematical Journal 29 (4):597–600.

Keynes, John Maynard. 1936. The General Theory of Employment, Interest and Money. New York: Harcourt, Brace and Company.

Klibanoff, Peter, Massimo Marinacci, and Sujoy Mukerji. 2005a. A Smooth Model of Decision Making under Ambiguity. Econometrica 73 (6):1849–92.

———. 2005b. A Smooth Model of Decision Making under Ambiguity. Econometrica 73 (6):1849–92.

———. 2009. Recursive Smooth Ambiguity Preferences. Journal of Economic Theory 144 (3):930–76.

Knight, Frank H. 1921. Risk, Uncertainty, and Profit. New York: Hart, Schaffner, and Marx; Houghton Mifflin.

Knox, Thomas A. 2003. Foundations for Learning How to Invest When Returns are Uncertain. Unpublished.

Kocherlakota, Narayana and Christopher Phelan. 2009. On the robustness of laissez-faire. Journal of Economic Theory 144 (6):2372–87.

Kocherlakota, Narayana R. 1990a. Disentangling the Coefficient of Relative Risk Aversion from the Elasticity of Intertemporal Substitution: An Irrelevance Result. Journal of Finance 45 (1):175–90.

———. 1990b. On the 'discount' factor in growth economies. Journal of Monetary Economics 25 (1):43–47.

———. 1996. Implications of Efficient Risk Sharing without Commitment. Review of Economic Studies 63 (4):595–609.

Kogan, Leonid and Raman Uppal. 2002. Risk Aversion and Optimal Portfolio Policies in Partial and General Equilibrium Economies. Unpublished paper, Sloan School of Management and London Business School.

Kontoyiannis, Ioannis and Sean P. Meyn. 2003. Spectral Theory and Limit Theorems for Geometrically Ergodic Markov Processes. Annals of Applied Probability 13:304–62.

———. 2005. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electronic Journal of Probability 10 (3):61–123.

Koopmans, Tjalling C. 1960. Stationary ordinal utility and impatience. Econometrica 287–309.

Kreps, David M. 1988. Notes on the Theory of Choice. Underground Classics in Economics. Westview Press.

———. 1998. Anticipated Utility and Dynamic Choice. In Frontiers of research in economic theory: The Nancy L. Schwartz Memorial Lectures, edited by Ehud Kalai and Morton I. Kamien, no. 29 in Econometric Society Monographs, 242–74. Cambridge: Cambridge University Press.

Kreps, David M. and Evan L. Porteus. 1978a. Temporal Resolution of Uncertainty and Dynamic Choice Theory. Econometrica 46 (1):185–200.

———. 1978b. Temporal Resolution of Uncertainty and Dynamic Choice Theory. Econometrica 46 (1):185–200.

———. 1978c. Temporal Resolution of Uncertainty and Dynamic Choice Theory. Econometrica 46 (1):185–200.

Krusell, Per and Anthony A., Jr. Smith. 1996. Rules of Thumb in Macroeconomic Equilibrium: A Quantitative Analysis. Journal of Economic Dynamics and Control 20 (4):527–58.

Krylov, N. and N. Bogolioubov. 1937. La Theorie generale de la mesure dans son application a l'etude des systemes de la mecanique non lineaire. Annals of Mathematics 38:65–113.

Kullback, S. and R. A. Leibler. 1951. On Information and Sufficiency. Annals of Mathematical Statistics 22:79–86.

Kunita, H. 1969. Absolute Continuity of Markov Processes and Generators. Nagoya Mathematical Journal 36:1–26.

Kurz, Mordecai. 1997. Endogenous Economic Fluctuations: Studies in the Theory of Rational Beliefs. New York: Springer.

Kydland, Finn E. and Edward C. Prescott. 1980. Dynamic optimal taxation, rational expectations and optimal control. Journal of Economic Dynamics and Control 2:79–91.

———. 1996. The Computational Experiment: An Econometric Tool. Journal of Economic Perspectives 10 (1):69–85.

Laibson, David. 1997. Golden Eggs and Hyperbolic Discounting. Quarterly Journal of Economics 112 (2):443–78.

Lauritzen, Steffen. 1988. Extremal Families and Systems of Sufficient Statistics. New York, Berlin, and Heidelberg: Springer-Verlag.

———. 2007a. Exchangeability and de Finetti's Theorem. Unpublished slides.

———. 2007b. Sufficiency, Partial Exchangeability, and Exponential Families. Unpublished slides.

Leamer, Edward E. 1978. Specification searches: ad hoc inference with nonexperimental data. New York, New York: Wiley.

LeCam, L. and G. L. Yang. 1989. Asymptotics in Statistics. New York: Springer-Verlag.

Lei, Chon Io. 2001. Why Don't Investors have Large Positions in Stocks? A Robustness Perspective. Ph.D. thesis, University of Chicago.

Leitemo, Kai and Ulf Soderstrom. 2008. Robust Monetary Policy in the New Keynesian Framework. Macroeconomic Dynamics 12 (Supplement S1):126–35.

Leland, Hayne. 1968. Savings and Uncertainty: the Precautionary Demand for Savings. Quarterly Journal of Economics 82:465–73.

Levin, Andrew T., Alexei Onatski, John C. Williams, and Noah Williams. 2005. Monetary Policy Under Uncertainty in Micro-Founded Macroeconometric Models. In NBER Macroeconomics Annual, edited by M. Gertler and K. Rogoff, 229–87. Cambridge, Massachusetts: MIT Press.

Liptser, R. S. and A. N. Shiryaev. 2000. Statistics of Random Processes, vol. I: General Theory. Applications of Mathematics. Berlin: Springer, second ed.

Liptser, Robert Shevilevich and Al'bert Nikolaevich Shiryayev. 1977. Statistics of Random Processes. New York: Springer-Verlag.

Liu, Jun, Jun Pan, and Tan Wang. 2005. An Equilibrium Model of Rare-Event Premia and its Implication for Option Smirks. Review of Financial Studies 18 (1):131–64.

Ljung, L. 1978. Convergence analysis of Parametric Identification Methods. IEEE Transactions on Automatic Control 23:770–83.

Ljungqvist, Lars and Thomas J. Sargent. 2004a. Recursive Macroeconomic Theory, 2nd Edition. Cambridge, MA: The MIT Press.

———. 2004b. Recursive Macroeconomic Theory, 2nd Edition. Cambridge, Massachusetts: MIT Press, 2nd ed.

———. 2011. Recursive Macroeconomic Theory. Cambridge, Mass.: MIT Press.

Lucas, Robert E., Jr. 1975. An Equilibrium Model of the Business Cycle. Journal of Political Economy 83:1113–44.

———. 1976a. Econometric Policy Evaluation: A Critique. In The Phillips Curve and Labor Markets, edited by Karl Brunner and Allan H. Meltzer, 19–46. Amsterdam: North-Holland.

———. 1976b. Econometric policy evaluation: A critique. Carnegie-Rochester Conference Series on Public Policy 1:19–46.

———. 1978. Asset prices in an Exchange Economy. Econometrica 46:1429–45.

———. 1982. Interest Rates and Prices in a Two-Country World. Journal of Monetary Economics 10 (3):335–60.

———. 1987a. Models of Business Cycles. Oxford and New York: Basil Blackwell.

———. 1987b. Models of Business Cycles, vol. 26. Oxford and New York: Basil Blackwell.

———. 2003. Macroeconomic Priorities. American Economic Review, Papers and Proceedings 93:1–14.

Lucas, Robert E., Jr. and Edward C. Prescott. 1971. Investment Under Uncertainty. Econometrica 39 (5):659–81.

Lucas, Robert E., Jr. and Thomas J. Sargent. 1981a. Introduction. In Rational Expectations and Econometric Practice. Minneapolis, Minnesota: University of Minnesota Press.

———. 1981b. Rational Expectations and Econometric Practice. Minneapolis, Minnesota: University of Minnesota Press.

Lucas, Robert E., Jr. and Nancy L. Stokey. 1983. Optimal fiscal and monetary policy in an economy without capital. Journal of Monetary Economics 12 (1):55–93.

———. 1984a. Optimal growth with many consumers. Journal of Economic Theory 32 (1):139–71.

———. 1984b. Optimal growth with many consumers. Journal of Economic Theory 32 (1):139–71.

Luce, R. Duncan and Howard Raiffa. 1957. Games and Decisions. New York: J. Wiley.

Luenberger, David G. 1969. Optimization by Vector Space Methods. New York: Wiley.

Maccheroni, Fabio, Massimo Marinacci, and Aldo Rustichini. 2004. Variational Representation of Preferences under Ambiguity. Working paper series 5/2004, ICER.

———. 2006a. Ambiguity Aversion, Robustness, and the Variational Representation of Preferences. Econometrica 74 (6):1447–98.

———. 2006b. Dynamic Variational Preferences. Journal of Economic Theory 128:4–44.

Maenhout, Pascal. 2001. Robust Portfolio Rules, Hedging and Asset Pricing. Unpublished paper, INSEAD.

Maenhout, Pascal J. 2004. Robust Portfolio Rules and Asset Pricing. Review of Financial Studies 17 (4):951–83.

———. 2006. Robust portfolio rules and detection-error probabilities for a mean-reverting risk premium. Journal of Economic Theory 000:000–00.

Magill, Michael J. P. 1977. Some new results on the local stability of the process of capital accumulation. Journal of Economic Theory 15 (1):174–210.

Marcellino, Massimiliano and Mark Salmon. 2002. Robust Decision Theory and the Lucas Critique. Macroeconomic Dynamics 6:167–85.

Marcet, Albert and Ramon Marimon. 1998. Recursive Contracts. Unpublished paper, Universitat Pompeu Fabra and European University Institute.

———. 2011. Recursive Contracts. Discussion Paper 1055, Centre for Economic Performance, LSE.

Marcet, Albert and Thomas J. Sargent. 1989. Convergence of Least Squares Learning Mechanisms in Self-Referential Linear Stochastic Models. Journal of Economic Theory 48:337–68.

Marcet, Albert and Andrew Scott. 2009. Debt and deficit fluctuations and the structure of bond markets. Journal of Economic Theory 144 (1):473–501.

Marschak, Jacob. 1953. Economic Measurements for Policy and Prediction. In Studies in econometric method, edited by Tjalling Charles Koopmans and William C. Hood, chap. 1, 1–26. John Wiley and Sons, Inc.

Mehra, Rajnish and Edward C. Prescott. 1985a. The equity premium: A puzzle. Journal of Monetary Economics 15 (2):145–61.

———. 1985b. The Equity Premium Puzzle: A Solution? Journal of Monetary Economics 22:133–36.

Melino, Angelo and Larry G. Epstein. 1995. An Empirical Analysis of Asset Returns under 'Non-Bayesian Rational Expectations'. University of Toronto.

Merton, Robert C. 1973. An Intertemporal Capital Asset Pricing Model. Econometrica 41 (5):867–87.

———. 1975. The Asymptotic Theory of Growth Under Uncertainty. Review of Economic Studies 42 (3):375–93.

Metivier, Michel and Etienne Pardoux, eds. 1985. Stochastic Differential Systems, vol. 69 of Lecture Notes in Control and Information Sciences. Springer-Verlag.

Meyn, Sean P. and Richard L. Tweedie. 1993a. Markov Chains and Stochastic Stability. London: Springer-Verlag.

———. 1993b. Stability of Markovian Processes III: Foster-Lyapunov Criteria for Continuous Time Processes. Advances in Applied Probability 25:518–48.

Miller, Bruce L. 1974. Optimal Consumption with a Stochastic Income Stream. Econometrica 42:253–66.

Moscarini, Giuseppe and Lones Smith. 2002. The Law of Large Demand for Information. Econometrica 70 (6):2351–66.

Muth, John F. 1960a. Optimal Properties of Exponentially Weighted Forecasts. Journal of the American Statistical Association 55:299–306.

———. 1960b. Optimal Properties of Exponentially Weighted Forecasts. Journal of the American Statistical Association 55:299–306.

Neumann, John Von and Oskar Morgenstern. 1944a. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press.

———. 1944b. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press.

Newell, Richard G. and William A. Pizer. 2003. Discounting the Future: How Much Do Uncertain Rates Increase Valuations? Journal of Environmental Economics and Management 46:52–71.

———. 2004. Uncertain Discount Rates in Climate Policy Analysis. Energy Policy 32:519–29.

Newman, Charles M. 1973. On the orthogonality of independent increment processes. In Topics in Probability Theory, edited by Daniel W. Stroock and S. R. Srinivasa Varadhan, 93–111. New York: Courant Institute of Mathematical Sciences.

Newman, Charles M. and Barton W. Stuck. 1979. Chernoff Bounds for Discriminating between Two Markov Processes. Stochastics 2 (1-4):139–53.

Novikov, A. A. 1971. On Moment Inequalities for Stochastic Integrals. Theory of Probability and its Applications 16 (3):538–41.

Obstfeld, Maurice. 1994a. Evaluating risky consumption paths: the role of intertemporal substitutability. European Economic Review 38 (7):1471–86.

———. 1994b. Evaluating Risky Consumption Paths: The Role of Intertemporal Substitutability. European Economic Review 38 (7):1471–86.

Olalla, Myriam García and Alejandro Ruiz Gomez. 2011. Robust control and central banking behaviour. Economic Modelling 28 (3):1265–78.

Onatski, Alexei and James H. Stock. 2002. Robust Monetary Policy Under Model Uncertainty In A Small Model Of The US Economy. Macroeconomic Dynamics 6 (1):85–110.

Onatski, Alexei and Noah Williams. 2003a. Modeling Model Uncertainty. Journal of the European Economic Association 1 (5):1087–1122.

———. 2003b. Modeling Model Uncertainty. Journal of the European Economic Association 1 (5):1087–1122.

Orlik, Anna and Ignacio Presno. 2012. On Credible Monetary Policies With Model Uncertainty. Board of Governors, Federal Reserve System, Washington, DC.

Petersen, Ian R., Matthew R. James, and Paul Dupuis. 2000a. Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints. IEEE Transactions on Automatic Control 45:398–412.

———. 2000b. Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints. IEEE Transactions on Automatic Control 45:398–412.

von Plato, Jan. 1982. The Significance of the Ergodic Decomposition of Stationary Measures for the Interpretation of Probability. Synthese 53:419–32.

Pratt, John W. 1964. Risk Aversion in the Small and in the Large. Econometrica 32 (1–2):122–36.

Prescott, Edward C. 1986. Theory Ahead of Business Cycle Measurement. Carnegie-Rochester Conference Series on Public Policy 25 (1):11–44.

Primiceri, Giorgio. 2006. Why Inflation Rose and Fell: Policymakers' Beliefs and US Postwar Stabilization Policy. Quarterly Journal of Economics 121 (3):867–901.

Ramsey, Frank P. 1931. Truth and Probability. In The Foundations of Mathematics and other Logical Essays, edited by R. B. Braithwaite, 156–98. London: Kegan Paul, Trench, Trubner and Co., New York: Harcourt, Brace and Company. Originally written in 1926.

Revuz, Daniel and Marc Yor. 1994. Continuous Martingales and Brownian Motion. New York: Springer-Verlag, second ed.

Romer, Christina D. and David H. Romer. 2008. The FOMC versus the Staff: Where Can Monetary Policymakers Add Value? American Economic Review 98 (2):230–35.

Runolfsson, Thordur. 1994. Optimal Control of Stochastic Systems with an Exponential-of-integral Performance Criterion. Systems and Control Letters 22:451–56.

Ryder, Harl E., Jr. and Geoffrey M. Heal. 1973. Optimal Growth with Intertemporally Dependent Preferences. Review of Economic Studies 40 (1):1–31.

Ryll-Nardzewski, C. 1957. On Stationary Sequences of Random Variables and de Finetti's Equivalence. Colloquium Mathematicum 4:146–56.

Sandroni, Alvaro. 2000. Do Markets Favor Agents able to Make Accurate Predictions? Econometrica 68 (6):1303–41.

Sargent, Thomas J. 2005. Note on Michael Woodford's paper. Discussion prepared for conference on monetary economics at the Federal Reserve Bank of New York.

Savage, Leonard J. 1954. The Foundations of Statistics. New York: John Wiley and Sons.

Scheinkman, Jose A. and Wei Xiong. 2003. Overconfidence and Speculative Bubbles. Journal of Political Economy 111 (6):1183–1220.

Schroder, Mark and Costis Skiadas. 1999. Optimal Consumption and Portfolio Selection with Stochastic Differential Utility. Journal of Economic Theory 89 (1):68–126.

Sclove, Stanley L. 1983. Time-Series Segmentation: A Model and A Method. Information Sciences 29 (1):7–25.

Segal, Uzi. 1990. Two-Stage Lotteries without the Reduction Axiom. Econometrica 58 (2):349–77.

Shapiro, Matthew D. and Mark W. Watson. 1988. Sources of Business Cycle Fluctuations. NBER Macroeconomics Annual 3:111–48.

Shiller, Robert J. 1972. Rational Expectations and the Term Structure of Interest Rates. Ph.D. thesis, Massachusetts Institute of Technology.

———. 1982. Consumption, Asset Prices and Economic Fluctuations. Carnegie-Rochester Conference Series on Public Policy 17:203–38.

Shin, Yongseok. 2006. Ramsey meets Bewley: Optimal Government Financing with Incomplete Markets. Unpublished paper, University of Wisconsin.

Silverman, Bernard W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.

Sims, Christopher A. 1971a. Discrete Approximations to Continuous Time Distributed Lags in Econometrics. Econometrica 39 (3):545–63.

———. 1971b. Distributed Lag Estimation When the Parameter-Space is Explicitly Infinite-Dimensional. Annals of Mathematical Statistics.

———. 1972. The Role of Approximate Prior Restrictions in Distributed Lag Estimation. Journal of the American Statistical Association 169–75.

———. 1974. Seasonality in Regression. Journal of the American Statistical Association 69 (347):618–26.

———. 1980. Macroeconomics and Reality. Econometrica 48:1–48.

———. 1993. Rational expectations modeling with seasonally adjusted data. Journal of Econometrics 55 (1-2):9–19.

———. 1996. Macroeconomics and Methodology. Journal of Economic Perspectives 10 (1):105–20.

———. 2001. Pitfalls of a Minimax Approach to Model Uncertainty. American Economic Review 91 (2):51–54.

Singleton, Kenneth J. 2006. Empirical Dynamic Asset Pricing: Model Specification and Econometric Assessment. Princeton, NJ: Princeton University Press.

Siniscalchi, Marciano. 2006. A behavioral characterization of plausible priors. Journal of Economic Theory 128 (1):91–135.

Skiadas, Costis. 2001. Robust Control and Recursive Utility. Department of Finance, Kellogg School of Management, Northwestern University.

Sleet, Christopher and Sevin Yeltekin. 2006. Optimal taxation with endogenously incomplete debt markets. Journal of Economic Theory 127 (1):36–73.

Smets, Frank and Raf Wouters. 2003a. An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area. Journal of the European Economic Association 1 (5):1123–75.

———. 2003b. An estimated dynamic stochastic general equilibrium model of the Euro area. Journal of the European Economic Association 1 (5):1123–75.

Spear, Stephen E. and Sanjay Srivastava. 1987. On Repeated Moral Hazard with Discounting. Review of Economic Studies 54 (4):599–617.

Strzalecki, Tomasz. 2008a. Axiomatic Foundations of Multiplier Preferences. Mimeo, Harvard University.

———. 2008b. Axiomatic Foundations of Multiplier Preferences. Northwestern University Department of Economics.

———. 2009. Temporal Resolution of Uncertainty and Recursive Models of Ambiguity Aversion. Working paper, Harvard University, Department of Economics.

Sundaresan, Suresh M. 1989. Intertemporally Dependent Preferences and the Volatility of Consumption and Wealth. Review of Financial Studies 2 (1):73–89.

Svensson, Lars E. O. 2007. Robust Control Made Simple: Lecture Notes. First draft: March 2000.

Svensson, Lars E. O. and Noah Williams. 2008. Optimal Monetary Policy under Uncertainty in DSGE Models: A Markov Jump-Linear-Quadratic Approach. NBER Working Paper 13892, National Bureau of Economic Research, Inc.

Tallarini, Thomas D., Jr. 1998. Risk Sensitive Real Business Cycles. Mimeo.

———. 2000a. Risk-sensitive real business cycles. Journal of Monetary Economics 45 (3):507–32.

———. 2000b. Risk-Sensitive Real Business Cycles. Journal of Monetary Economics 45 (3):507–32.

Tetlow, Robert J. 2007. On the robustness of simple and optimal monetary policy rules. Journal of Monetary Economics 54 (5):1397–1405.

Tetlow, Robert J. and Brian Ironside. 2007. Real-Time Model Uncertainty in the United States: The Fed, 1996-2003. Journal of Money, Credit, and Banking 39 (7):1533–61.

Tetlow, Robert J. and Peter von zur Muehlen. 2001. Robust monetary policy with misspecified models: Does model uncertainty always call for attenuated policy? Journal of Economic Dynamics and Control 25 (6-7):911–49.

———. 2004. Avoiding Nash Inflation: Bayesian and Robust Responses to Model Uncertainty. Review of Economic Dynamics 7 (4):869–99.

———. 2009. Robustifying learnability. Journal of Economic Dynamics and Control 33 (2):296–316.

Thomas, Jonathan and Tim Worrall. 1988. Self Enforcing Wage Contracts. Review of Economic Studies 55:541–54.

Tsirel'son, B. S. 1975. An Example of a Stochastic Differential Equation Having No Strong Solution. Theory of Probability and its Applications 20:427–30.

Uhlig, Harald. 2009. A model of a systemic bank run. Working Paper 15072, NBER.

Ustunel, A. S. and M. Zakai. 2000. Transformation of Measure. Springer.

Van Der Ploeg, Frederick. 1993. A Closed-form Solution for a Model of Precautionary Saving. Review of Economic Studies 60:385–95.

Veronesi, Pietro. 2000. How Does Information Quality Affect Stock Market Returns? Journal of Finance 55:807–37.

Wald, Abraham. 1939. Contributions to the Theory of Statistical Estimation and Testing Hypotheses. Annals of Mathematical Statistics 10 (4):299–326.

Wallis, Kenneth. 1974. Seasonal adjustment and relations between variables. Journal of the American Statistical Association 69:18–31.

Walsh, Carl E. 2004. Robustly Optimal Instrument Rules and Robust Control: An Equivalence Result. Journal of Money, Credit, and Banking 36 (6):1105–13.

Wang, Tan. 2001. Two Classes of Multi-Prior Preferences. Unpublished.

Weil, Philippe. 1989. The Equity Premium Puzzle and the Risk-Free Rate Puzzle. Journal of Monetary Economics 24 (3):401–21.

———. 1990. Nonexpected Utility in Macroeconomics. Quarterly Journal of Economics 105 (1):29–42.

———. 1993. Precautionary Savings and the Permanent Income Hypothesis. Review of Economic Studies 60 (2):367–83.

Weiland, Volker. 2005. Comment on 'Certainty Equivalence and Model Uncertainty'. In Models and Monetary Policy: Research in the Tradition of Dale Henderson, Richard Porter and Peter Tinsley, edited by Athanasios Orphanides, David Reifschneider, and Jonathan Faust. Washington, DC: Board of Governors of the Federal Reserve System.

Weitzman, Martin L. 2005. A Unified Bayesian Theory of Equity 'Puzzles'. Harvard University.

White, Halbert. 1982. Maximum Likelihood Estimation of Misspecified Models. Econometrica 50:1–26.

———. 1994. Estimation, Inference, and Specification Analysis. Cambridge University Press.

Whittle, Peter. 1981. Risk-sensitive Linear/Quadratic/Gaussian Control. Advances in Applied Probability 13:764–77.

———. 1982. Optimization Over Time, Vol. 1. John Wiley & Sons, Inc.

———. 1983. Optimization Over Time, Vol. 2. John Wiley & Sons, Inc.

———. 1989a. Entropy-minimizing and Risk-sensitive Control Rules. Systems and Control Letters 13 (4):1–7.

———. 1989b. Risk-Sensitive Linear Quadratic Gaussian Control. Advances in Applied Probability 13:764–77.

———. 1990. Risk-Sensitive Optimal Control. New York: John Wiley & Sons.

Wilmott, Paul, Sam Howison, and Jeff Dewynne. 1995. The Mathematics of Financial Derivatives. Cambridge: Cambridge University Press.

Wonham, W. M. 1964. Some Applications of Stochastic Differential Equations to Optimal Nonlinear Filtering. SIAM Journal on Control 2:347–68.

Woodford, Michael. 2010. Robustly Optimal Monetary Policy with Near-Rational Expectations. American Economic Review 100 (1):274–303.

Zames, George. 1981. Feedback and Optimal Sensitivity: Model Reference Transformations, Multiplicative Seminorms, and Approximate Inverses. IEEE Transactions on Automatic Control 26:301–20.

Zeldes, Stephen P. 1989. Optimal Consumption with Stochastic Income: Deviation from Certainty Equivalence. Quarterly Journal of Economics 104:275–98.

Zellner, Arnold. 1962. An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias. Journal of the American Statistical Association 57:348–68.

Zha, Tao. 1999. Block recursion and structural vector autoregressions. Journal of Econometrics 90 (2):291–316.

Zhou, Kemin, John C. Doyle, and Keith Glover. 1996. Robust and Optimal Control, vol. 40. Upper Saddle River, NJ: Prentice Hall.

Zhu, Xiaodong. 1992. Optimal fiscal policy in a stochastic growth model. Journal of Economic Theory 58 (2):250–89.

Author Index

Adam, Klaus 383
Aggoun, Lakhdar 268, 269
Ai, Hengjie 305
Alvarez, Fernando 217, 220, 221, 246
Anderson, Evan 102
Anderson, Evan W. 32, 71, 143–145, 147, 152, 155, 169, 188, 201, 219, 221, 241, 242, 245, 263, 264, 290, 292, 297, 301, 315, 332, 333, 343, 371, 377, 382
Araujo, Aloisio 155
Bansal, Ravi 5, 9–11, 128–130, 265, 294, 302, 321, 322, 326, 331, 332
Barillas, Francisco 301, 315
Basar, T. 156, 297
Battigalli, Pierpaolo 338
Becker, Gary S. 37
Bergemann, Dirk 263, 268
Bernhard, P. 156, 297
Bewley, Truman 28
Blackwell, D. 107, 171, 178
Boldrin, Michele 50
Borovicka, Jaroslav 367, 371
Bossaerts, Peter 306, 312
Bray, Margaret 83
Breeden, Douglas T. 110, 310
Brennan, Michael J. 305
Brock, W. A. 109, 110
Brock, William A. 70, 263
Bucklew, James A. 2
Burnside, Craig 44, 58
Cameron, Robert H. 121, 122
Campbell, John Y. 36, 137, 332
Carroll, Christopher D. 43, 70, 71
Cecchetti, Stephen G. 58
Cerreia-Vioglio, Simone 232, 233, 253, 300, 338
Chamberlain, Gary 178
Chen, Zengjing 102, 144, 151
Chernoff, Herman 116, 118, 119, 290
Cho, In-Koo 82, 263
Chow, C. K. 125
Christiano, Lawrence J. 28, 50
Cochrane, John H. 56, 70, 127, 137, 241, 332
Cogley, Timothy 263, 266, 269, 296, 305, 306, 312, 329
Colacito, Riccardo 263, 266, 269, 296, 305
Constantinides, George M. 137
Croce, Mariano M. 305
Csiszar, Imre 98
David, Alexander 305
Deaton, Angus 43
Dennis, Richard 358, 377
Detemple, Jerome B. 305
Dolmas, Jim 246

Donaldson, John B. 290
Doyle, John C. 28, 29, 35
Dubins, L. 171
Duffie, Darrell 101, 137, 147, 149, 168, 188, 304
Dumas, Bernard 367
Dupuis, Paul 143, 146, 169, 172, 174, 204, 284
Durlauf, Steven N. 263
Dynkin, Evgenii B. 91
Eichenbaum, Martin 44, 50
Ekeland, Ivar 211
Elliott, Robert J. 165, 268, 269
Ellis, Richard S. 146, 204, 284
Epstein, L. 7, 29, 31, 81, 147, 168, 227, 293
Epstein, Larry G. 16, 17, 28, 29, 34, 35, 52, 53, 70, 101, 102, 132, 144, 147, 149, 151, 157, 168, 188, 195–200, 227, 265, 291, 301
Ergin, Haluk 293
Ethier, Stewart N. 80, 101
Evans, George W. 79
Fan, Ky 211, 212
Fisher, Jonas D. M. 50
Fleming, Wendell H. 103, 148, 185–187, 210, 213, 357
Follmer, Hans 206
Francis, Bruce A. 28
Fudenberg, Drew 79
Gallant, A. Ronald 56
Gilboa, Itzhak 6, 28, 34, 70, 82, 143, 152, 153, 155, 162, 163, 169, 172, 177, 200, 232, 234, 300, 338
Giordani, Paolo 358
Girshick, M. 107, 178
Glover, Keith 28, 29, 35
Gomez, Alejandro Ruiz 358
Gul, Faruk 293
Hall, Robert E. 28, 36, 39
Hansen, Lars P. 102
Hansen, Lars Peter 9, 10, 12, 22, 25, 28, 29, 31–34, 36, 42, 45, 52, 56–58, 71, 78, 80, 81, 83, 88–90, 93, 100, 108, 109, 114, 115, 127, 129, 135, 143–147, 152, 153, 155, 168, 169, 172, 188, 200, 201, 217, 219–222, 224, 226, 227, 231–236, 240–242, 245, 246, 255–257, 261–266, 269, 270, 277, 279–281, 283, 285, 287–297, 300, 301, 303–306, 311, 313, 315, 317–321, 332–334, 340–342, 346, 352–354, 357, 358, 368, 371, 376, 377, 382
Harrison, J. Michael 52, 300
Heal, Geoffrey M. 37
Heaton, John 36, 37, 50, 58, 137, 321, 333
Hellman, Martin E. 118, 125
Hommes, Cars 70
Honkapohja, Seppo 79
Jackson, Matthew O. 171
Jacobson, David 1, 15, 16, 20, 22, 29, 32, 69, 75, 167, 169, 292
Jacobson, David H. 1, 15, 29
Jagannathan, Ravi 56, 57, 127, 129, 217, 220–222, 224, 245, 304
James, Matthew R. 29, 34, 53, 100, 102, 143, 146, 168, 169, 172, 174, 188, 284, 358, 371
Jermann, Urban J. 217, 220, 221, 246
Jobert, A. 332
Johnsen, Thore H. 290
Jorgenson, Dale W. 78
Jovanovic, Boyan 263, 268

Kabanov, Ju M. 170, 205
Kalai, Ehud 171
Karantounias, Anastasios G. 12, 338, 342
Karatzas, Ioannis 164
Kasa, Kenneth 263
Klibanoff, Peter 233, 293, 301
Kocherlakota, Narayana R. 230
Koopmans, Tjalling C. 17
Kreps, David M. 7, 16, 17, 31, 52, 83, 220, 236, 293, 300
Krusell, Per 70
Kunita, H. 93, 97, 101, 209
Kurz, Mordecai 78
Kurz, Thomas G. 80, 101
Laibson, David 278
LeBaron, Blake D. 70
Lehrer, Ehud 171
Lei, Chon Io 106
Leitemo, Kai 358
Leland, Hayne 70
Lettau, Martin 305
Levine, David K. 79
Li, Nan 321, 333
Lions, Pierre-Louis 101
Liptser, R. S. 165, 179, 182, 205, 209
Liu, Jun 292
Ljungqvist, Lars 345, 346
Lucas, Deborah J. 137
Lucas, Robert E., Jr. 7, 8, 17, 29, 51, 69, 78, 109, 110, 217, 220–222, 230, 241, 245–247, 249, 252, 253, 257
Ludvigson, Sydney C. 305
Luenberger, David G. 146, 147, 172, 174
Luttmer, Erzo Gerrit Jan 58
Maccheroni, Fabio 6, 177, 232, 233, 253, 300, 338
Maenhout, Pascal 106
Maenhout, Pascal J. 292
Marcet, Albert 70, 341, 346
Marimon, Ramon 341, 346
Marinacci, Massimo 6, 177, 232, 233, 253, 293, 300, 301, 338
Mark, Nelson C. 58
Marshall, David 50
Martin, William T. 121, 122
Mayer, Ricardo 333, 334
Mehra, Rajnish 110
Melino, Angelo 29, 70
Metivier, Michel 405
Miller, Bruce L. 70
Montrucchio, Luigi 232, 233, 253, 300
Moore, John B. 268, 269
Morgenstern, Oskar 16
Moscarini, Giuseppe 125
Mukerji, Sujoy 233, 293, 301
Murphy, Kevin M. 37
Muth, John F. 28
Neumann, John Von 16
Newman, Charles M. 124
Nyarko, Yaw 263, 268
Obstfeld, Maurice 246
Olalla, Myriam García 358
Onatski, Alexei 200
Orlik, Anna 12, 338
Pan, Jun 292
Pardoux, Etienne 405
Petersen, Ian 143
Petersen, Ian R. 146, 169, 172, 174, 284
Platania, A. 332
Polson, Nicholas 296
Porteus, Evan L. 7, 16, 17, 31, 220, 236, 293
Pratt, John W. 138, 241
Prescott, Edward C. 110

Presno, Ignacio 12, 338
Raviv, Josef 118, 125
Rebelo, Sergio 44
Revuz, Daniel 88, 90, 180, 187
Richard, Scott F. 311
Roberds, William 36
Rogers, L. C. G. 332
Runolfsson, Thordur 100, 101
Rustichini, Aldo 6, 177, 232, 300
Ryder, Harl E., Jr. 37
Sandroni, Alvaro 155
sang Lam, Pok 58
Sargent, Thomas J. 9, 10, 12, 22, 25, 28, 29, 31–34, 36, 42, 45, 52, 70, 71, 78, 80–83, 100, 114, 115, 135, 143–147, 152, 153, 155, 168, 169, 172, 188, 200, 201, 217, 219–221, 226, 227, 231–236, 240–242, 245, 246, 255–257, 261–266, 269, 270, 277, 279–281, 283, 285, 287–297, 300, 301, 305, 306, 312, 313, 315, 317–320, 329, 332–334, 340–342, 345, 346, 352–354, 357, 358, 368, 371, 376, 377, 382
Scheinkman, Jose A. 80, 88–90, 93, 108, 109, 300, 303
Schmeidler, David 6, 28, 34, 70, 82, 143, 152, 153, 155, 162, 163, 169, 172, 177, 200, 232, 234, 300, 338
Schneider, Martin 132, 157, 195–200, 265, 291, 301
Schroder, Mark 101, 188
Segal, Uzi 293
Shiller, Robert J. 222
Shiryaev, A. N. 165, 179, 182, 205, 209
Shreve, Steven E. 164
Sims, Christopher A. 78
Skiadas, Costis 101, 188
Smith, Anthony A., Jr. 70
Smith, Lones 125
Smorodinsky, Rann 171
Soderlind, Paul 358
Soderstrom, Ulf 358
Soner, H. Mete 103, 186
Souganidis, Panagiotis E. 148, 185–187, 210, 213, 357
Stokey, Nancy L. 17
Strzalecki, Tomasz 232, 300
Stuck, Barton W. 124
Sundaresan, Suresh M. 37
Tallarini, Thomas D., Jr. 4, 6–9, 58, 80, 133, 139, 146, 172, 217, 219–222, 224, 225, 227, 230, 236, 240, 245–247, 249, 252, 253, 255, 256, 263, 265, 290, 293, 295, 301, 315, 320, 332
Tauchen, George 56
Turmuhambetova, G. A. 81, 143, 146, 147, 220, 231, 232, 234–236, 263, 280, 285, 291, 300, 315, 353, 357, 358
Turnbull, Thomas 211
Uppal, Raman 367
Valimaki, J. 263, 268
Van Der Ploeg, Frederick 29
Veronesi, Pietro 305
Lipcer, R. S. 170, 205
Sirjaev, A. N. 170, 205
Walsh, Carl E. 358, 359
Wang, Neng E. 80, 240, 333, 334
Wang, Tan 28, 29, 34, 35, 52, 53, 70, 177, 189, 292, 367
Weil, Philippe 16, 29, 31, 57, 220, 224
Weiland, Volker 263
Weitzman, Martin L. 332
West, Kenneth D. 263

Whittle, Peter 1–3, 15, 16, 22, 29, 35, 69, 167, 169, 283–285, 292, 297
Williams, Noah 81, 82, 143, 146, 147, 200, 220, 231, 232, 234–236, 263, 280, 285, 291, 300, 315, 353, 357, 358
Wonham, W. M. 268
Woodford, Michael 12, 340, 343, 345, 368, 369, 377, 383, 386
Xiong, Wei 300
Yaron, Amir 5, 9–11, 128–130, 265, 294, 302, 321, 322, 326, 331, 332
Yor, Marc 88, 90, 180, 187
Zames, George 28
Zeldes, Stephen P. 28
Zhou, Kemin 28
Zia, Yihong 305
Zin, S. 7, 29, 31, 81, 147, 168, 227, 293
Zin, Stanley E. 16, 17, 227

Subject Index

Riccati equation, 21
risk adjustment, 21
risk sensitivity, 21
