Macroeconomic Theory I
Southern Methodist University
Erwan Quintin
Federal Reserve Bank of Dallas∗
First draft: July 25, 2006
This version: December 18, 2007
∗Erwan Quintin: Research Department, Federal Reserve Bank of Dallas, 2200 N. Pearl St., Dallas, TX 75201, (214) 922 5157, [email protected]. The views expressed herein are those of the authors and may not reflect the views of the Federal Reserve Bank of Dallas or the Federal Reserve System. This document draws heavily from the class notes of Tim Kehoe, Ed Prescott, Jim Dolmas and Dirk Krueger. I would like to thank Erasmus Kersting and my SMU students for weeding out many errors in previous versions of this document. All remaining errors are mine.
Contents
1 Course information 5
1.1 Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Grading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 List of topics and readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Neoclassical Growth Theory 9
2.1 The Ramsey optimal growth problem . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 A specific example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.3 The bottom line and the golden rule . . . . . . . . . . . . . . . . . . 16
2.2 The Solow Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Dynamic efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Population growth and progress . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Endogenous growth (Ak models) . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Diamond’s overlapping generation model . . . . . . . . . . . . . . . . . . . . 21
2.6 Existence and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.1 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Intertemporal General Equilibrium Models 35
3.1 Infinitely-lived consumer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Market structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.2 Welfare theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1.3 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.4 Money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Overlapping generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Separating hyperplane theorem . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Deterministic Dynamic Programming 55
4.1 Principle of optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.1 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.2 Contraction mapping theorem . . . . . . . . . . . . . . . . . . . . . . 62
4.2.3 Theorem of the Maximum . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Characteristics of the value function . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Value function iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.5 Application to the Ramsey problem . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 Deterministic dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5 Stochastic dynamic programming 81
5.1 Probability theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Transition functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4 Stochastic control problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.5 The stochastic principle of optimality . . . . . . . . . . . . . . . . . . . . . . 90
5.6 The stochastic Ramsey problem . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 Bibliography 95
Chapter 1
Course information
1.1 Course description
My goal in this course is to teach you some of the tools, models and techniques one needs
in order to read and write academic papers in macroeconomics. These tools can be a bit
difficult to learn. I encourage you to cooperate as much as possible with your classmates
and to talk to me whenever you get stuck on an assignment or have questions about the
material.
1.2 Resources
The main source of material for this course is my notes. I will also recommend readings from
the following textbooks:
1. N. L. Stokey and R. E. Lucas with E. C. Prescott, Recursive Methods in Economic
Dynamics, Harvard University Press, 1989.
2. C. Azariadis, Intertemporal Macroeconomics, Oxford: Blackwell Publishers, 1993.
3. L. Ljungqvist and T. J. Sargent, Recursive Macroeconomic Theory, MIT Press, 2000.
4. R. J. Barro and X. Sala-I-Martin, Economic Growth, McGraw Hill Publishers, 1995.
Finally, you should have handy a good micro theory textbook (such as Mas-Colell, Whinston
and Green) and a good textbook in real analysis (such as Rudin's Principles of Mathematical
Analysis).
1.3 Grading
Homework (30%), Midterm (35%), Final (35%). I will announce exam dates as soon as
possible.
There will be four problem sets. I will assign them once the appropriate material has
been covered, and you will have one week to complete them. There is no need to type
your answers, but if I can't read them you won't get credit, so make sure they are legible.
Homework problems are difficult, and they involve computer programming in some cases.
(You should start learning how to use Matlab right now. You should also start reading the
basic topology section of a senior-level real analysis textbook.)
Although I encourage you to collaborate with other students, each student must turn in
his or her own set of answers. In particular, everyone should write and turn in their own
code. Your code should be as neat as possible and contain comments that help me follow
your train of thought.
1.4 List of topics and readings
1. Neoclassical Growth Theory
Notes, chapter 2.
Jim Dolmas’ Matlab notes (and any other material you find useful for becoming proficient
with Matlab).
SLP, chapters 2, 3.
Azariadis, chapters 13, 14.
Ljungqvist and Sargent, chapter 11.
Barro and Sala-I-Martin, Intro, chapters 1, 2, plus the corresponding appendices.
Solow, R. (1956), “A Contribution to the Theory of Economic Growth”, Quarterly Journal
of Economics, 70, 64-94.
Diamond, P. (1965), “National Debt in a Neoclassical Growth Model”, American Economic
Review, 55, 1026-50.
Jones, L. E., and Manuelli, R. E., (2005) “Neoclassical Models of Endogenous Growth: The
Effects of Fiscal Policy, Innovation and Fluctuations,” in: Philippe Aghion and Steven
Durlauf (ed.), Handbook of Economic Growth, edition 1, volume 1, chapter 1.
2. Intertemporal General Equilibrium Models
Notes, chapter 3.
Kehoe, T. J. (1989) “Intertemporal General Equilibrium Models”, in Frank H. Hahn, editor,
The Economics of Missing Markets, Information, and Games, Oxford University Press,
363-93.
Ljungqvist and Sargent, chapter 8.
SLP, chapter 15.
3. Deterministic Dynamic Programming
Notes, chapter 4.
SLP, chapters 4, 5.
Ljungqvist and Sargent, chapters 2, 3.
4. Stochastic Dynamic Programming
Notes, chapter 5.
SLP, chapters 8, 9, 11.
Chapter 2
Neoclassical Growth Theory
2.1 The Ramsey optimal growth problem
2.1.1 Set-up
Consider an economy with one representative household and one representative firm. You
should think of these two representative agents as standing in for a large number of identical
households and firms, say a continuum of both types of agents distributed uniformly over
the unit interval (a continuum of mass one).1
Time is indexed by t ∈ {0, 1, 2, 3, . . .} and both agents live forever.2 There are three
types of commodities:
1. a consumption good,
2. physical capital,
3. labor.
1The “large number” story is meant to justify the assumption that these stand-in agents behave competitively, i.e. that they take all prices as given. Atomistic agents literally have no influence on aggregate variables.
2The assumption that households live forever turns out to simplify things quite a bit. For one thing, as we will see in chapter 4, economies where households live forever lend themselves to the use of stationary dynamic programming techniques. Think of it for now as approximating long life spans. Alternatively, as Barro (1973) explains, households who value the welfare of their offspring effectively solve an infinite horizon problem.
The household is endowed with a quantity a0 > 0 of physical capital at date 0 and with
one unit of labor in each period. In period 0, the household sells both factors to the firm
and earns
a0R0 + w0
where, respectively, R0 and w0 are the prices of capital and labor. This income can be
consumed or saved as physical capital to be used in period 1. Consumption (c0) and savings
(a1) at date 0 must solve
c0 + a1 = a0R0 + w0.
Similarly, letting (ct, at+1) denote the household’s decisions and (Rt, wt) be the factor prices
at date t, we must have:
ct + at+1 = atRt + wt.
Given a sequence {R_t, w_t}_{t=0}^{+∞} of prices, the household chooses a non-negative consumption-saving sequence {c_t, a_{t+1}}_{t=0}^{+∞} to maximize:

∑_{t=0}^{+∞} β^t U(c_t)
subject to:
ct + at+1 = atRt + wt for all t ≥ 0
where U is continuous, strictly increasing and strictly concave on IR+, continuously differentiable on IR++, and where β ∈ (0, 1) (the discount factor) measures the impatience of households. We will assume that U is bounded to make sure that ∑_{t=0}^{+∞} β^t U(c_t) is always a bounded sum (it is then dominated by a geometric series of modulus β < 1). This will turn out to entail no loss of generality. We will also assume that lim_{c→0} U′(c) = +∞ to make sure that the household always chooses to consume strictly positive amounts. (Show this.)
A very important question (and a type of question that we will ask over and over in this
class) is whether a solution exists to the household’s problem. Another important question
is whether the solution is unique. We will deal with those questions after fully stating the
Ramsey problem.
The firm operates a technology that, each period, transforms quantities k ≥ 0 of physical
capital and n ≥ 0 of labor into the consumption good according to a production function
F (k, n) that is continuously differentiable on IR++ and satisfies:
1. F (0, n) = 0 and F (k, 0) = 0 for all n, k ≥ 0,
2. For all k > 0, F (k, •) is strictly increasing and strictly concave,
3. For all n > 0, F (•, n) is strictly increasing and strictly concave,
4. F satisfies constant returns to scale (CRS): for all θ > 0, F (θk, θn) = θF (k, n),
5. lim_{k→+∞} F1(k, n) = 0 and lim_{k→0} F1(k, n) = +∞ for all n > 0,
6. lim_{n→+∞} F2(k, n) = 0 and lim_{n→0} F2(k, n) = +∞ for all k > 0.
We will refer to production functions that satisfy those properties as neoclassical produc-
tion functions. The last two sets of conditions are often called Inada conditions. Because F
satisfies CRS, note that for all (k, n) > (0, 0), F(k, n) = nF(k/n, 1) = nf(k/n), where f is called the intensive production function.
Note also that for all (k, n) > (0, 0), F1(k, n) = f′(k/n) while F2(k, n) = f(k/n) − (k/n)f′(k/n). In other words, marginal products only depend on the ratio between k and n, i.e. they are homogeneous of degree zero (this is a general result: the partial derivatives of functions that are homogeneous of degree r are homogeneous of degree r − 1).
The firm chooses inputs of capital and labor so as to maximize profits. That is, at date
t, it chooses kt and nt to maximize
F (kt, nt) + (1 − δ)kt − ktRt − ntwt.
where δ is the rate of depreciation of physical capital.3

3Instead of assuming the household sells its capital to the firm, we could assume that it rents it to the firm at rate r_t in period t and also receives the undepreciated part of capital at the end of each period. These two formulations are clearly equivalent with r_t ≡ R_t − (1 − δ) for all t ≥ 0.

A competitive equilibrium in this environment is a sequence {R_t, w_t}_{t=0}^{+∞} of prices, a sequence {c_t, a_{t+1}}_{t=0}^{+∞} of household decisions, and a sequence {k_t, n_t}_{t=0}^{+∞} of firm decisions such that:
1. Given prices, {c_t, a_{t+1}}_{t=0}^{+∞} solves the household’s problem;
2. Given prices, {k_t, n_t}_{t=0}^{+∞} solves the firm’s problem;
3. The market for capital clears: kt = at for all t ≥ 0;
4. The market for labor clears: nt = 1 for all t ≥ 0.
Note that given the Inada conditions we have imposed on U and F , any equilibrium must
be such that at (hence kt) is strictly positive for all t. Also because of these conditions, it is
easy to show that the capital stock series cannot grow without bound in equilibrium.
Note also that the equilibrium definition makes no mention of the market for the con-
sumption good. That’s because it must clear too, by Walras’ law (look it up), since all other
markets clear. To see this, note that profit maximization on the part of firms implies for all
t ≥ 0:
F1(kt, nt) + (1 − δ) = Rt
F2(kt, nt) = wt
Then, the consumer’s budget constraint together with market clearing conditions for
capital and labor imply that at date t,
ct + kt+1 = ktRt + wt = kt(F1(kt, nt) + (1 − δ)) + ntF2(kt, nt) = F (kt, nt) + (1 − δ)kt
where the last equality follows from Euler’s theorem for homogeneous functions (look it up.)
But this condition exactly says that the supply of the consumption good equals the demand
for the consumption good, that gross output (F (kt, nt)) equals consumption plus investment
(kt+1 − (1 − δ)kt).
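Euler's theorem for CRS functions (kF1 + nF2 = F) is what closes the goods market above. A quick numerical sanity check, in Python with an illustrative Cobb-Douglas technology and a hypothetical parameter value:

```python
# Euler's theorem for a function homogeneous of degree one:
# k*F1(k, n) + n*F2(k, n) = F(k, n), checked for an illustrative
# Cobb-Douglas technology (alpha is a hypothetical parameter).
alpha = 0.3

def F(k, n):
    return k**alpha * n**(1 - alpha)

def F1(k, n):
    return alpha * k**(alpha - 1) * n**(1 - alpha)  # marginal product of capital

def F2(k, n):
    return (1 - alpha) * k**alpha * n**(-alpha)     # marginal product of labor

k, n = 1.7, 1.0
assert abs(k * F1(k, n) + n * F2(k, n) - F(k, n)) < 1e-12
```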
Recalling now that U is continuously differentiable on IR++, necessary first order condi-
tions for an interior solution to the household’s problem are that, for all t ≥ 0:
βtU ′(ct) = λt
λt = λt+1Rt+1
where λt > 0 is the multiplier associated with the budget constraint at date t. Because
U is concave, these conditions are actually sufficient as long as the following transversality
condition holds:
lim_{t→+∞} λ_t k_{t+1} = 0 (2.1.1)
Some intuition for this last condition can be gained from considering the case where the
household lives for a finite number T of periods. In that case, it is optimal to eat all capital
in the last period (set kT+1 = 0) unless the consumer is satiated and the value of capital is
zero (λ_T = β^T U′(c_T) = 0). Taking this reasoning to the limit yields the condition above. A
more careful argument is in SLP, chapter 4.4
Note that combining the first two optimality conditions for the household’s problem, using the firm’s optimality conditions and market clearing conditions yields

U′(c_t)/(βU′(c_{t+1})) = f′(k_{t+1}) + (1 − δ) for all t ≥ 0 (2.1.2)
Together with the aggregate clearing condition,
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0 (2.1.3)
and the transversality condition,
lim_{t→+∞} β^t U′(c_t) k_{t+1} = 0 (2.1.4)
we have a full description of competitive equilibria. An allocation {c_t, k_{t+1}}_{t=0}^{+∞} is (part of) a competitive equilibrium if and only if it satisfies (2.1.2)-(2.1.4) given k0.
4SLP actually prove the sufficiency of a slightly different but equivalent version of the transversality condition. I prefer stating conditions that can be motivated directly from finite versions of infinite problems. Establishing the necessity of transversality conditions is difficult. See references provided in SLP at the end of chapter 4.

Do competitive equilibria exist? Do they maximize welfare? To answer these questions, consider an omniscient, benevolent social planner who can allocate resources as they please in this environment. They are benevolent in that among non-negative allocations {c_t, k_{t+1}}_{t=0}^{+∞} that satisfy the aggregate resource constraint in all periods, they choose the one that maximizes the welfare of the representative household. That is, given k0, the planner maximizes
∑_{t=0}^{+∞} β^t U(c_t)
subject to, for all t ≥ 0:
ct + kt+1 = f(kt) + (1 − δ)kt,
and ct, kt ≥ 0.
This problem is called the Ramsey optimal growth problem. Note that solving this problem
seems much easier than looking for competitive equilibrium allocations. Yet, the two tasks
are (essentially) the same. To see this, note that first order conditions for the planner’s
problem are (after some manipulations which you should carry out):
U′(c_t)/(βU′(c_{t+1})) = f′(k_{t+1}) + (1 − δ) for all t ≥ 0
and,
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0
and that the planner must satisfy the following condition:
lim_{t→+∞} β^t U′(c_t) k_{t+1} = 0
But these are exactly conditions (2.1.2)-(2.1.4). Therefore, competitive allocations and solutions to the social planner’s problem coincide. It follows that competitive allocations are Pareto optimal: no other feasible allocation exists that gives the household higher welfare.
One can show that the planner maximizes a strictly concave objective function over a
convex choice set, that this choice set is compact in the product topology, and that the
objective function is continuous in that topology as well. (Since we are encountering these
notions for the first time, this is worth a digression. The details are in section 2.6.) Therefore,
a unique solution to the planner’s problem exists, hence a unique competitive equilibrium
exists.
2.1.2 A specific example
Assume that for all (k, n) ≥ (0, 0), F(k, n) = k^α n^{1−α} and that for all c ≥ 0, U(c) = log c.5
Note that log is not defined at zero in the standard real numbers, but that is of no concern.
First, one could extend the function to IR+ using the extended reals. More simply, note that
for any strictly positive sequence {c_t}_{t=0}^{+∞}:

exp(∑_{t=0}^{+∞} β^t log(c_t)) = ∏_{t=0}^{+∞} c_t^{β^t}

so that the utility function on the right represents the same preferences as ∑_{t=0}^{+∞} β^t log(c_t).
That utility function (which we could use instead without any effect on household decisions)
is defined everywhere on IR_+^∞ as long as consumption is bounded. But in this environment
consumption is bounded since capital is bounded above (show). We can use that fact to
impose an effective bound on the utility function.
In this specific example, the evolution of consumption and savings in the Ramsey problem
is governed for all t ≥ 0 by:
c_{t+1} = β(αk_{t+1}^{α−1} + (1 − δ))c_t (2.1.5)
k_{t+1} = k_t^α + (1 − δ)k_t − c_t (2.1.6)
The evolution of (c_t, k_{t+1}) can be summarized as follows. Consumption rises as long as β(αk_{t+1}^{α−1} + (1 − δ)) > 1, i.e. as long as the gross return on capital exceeds the inverse β^{−1} of the discount factor, and falls otherwise. The positive6 steady state level of capital, then, is:
k* = (α/(β^{−1} − (1 − δ)))^{1/(1−α)}.
5log refers to the natural logarithm function, as is standard practice in the United States, though most countries would write ln for it.
6There is also a degenerate, zero-capital steady state in this economy. That steady state is unstable, however: the economy never converges there unless it starts there.
As for capital, it rises whenever consumption is below net output (gross output minus depreciation), i.e. whenever c_t ≤ k_t^α − δk_t, and falls otherwise.
Those dynamics can be represented on a phase diagram. Since we have argued that a unique optimal path exists, there is one and only one c0 such that the path implied by (2.1.5)-(2.1.6) is compatible with the non-negativity of savings, the non-negativity of consumption and the transversality condition. That unique path is called the saddlepath. In problem 3, I ask you to compute it for specific values of all parameters.
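As a preview of the computation, here is one standard way to find the saddlepath numerically: a shooting algorithm that bisects on c0, iterating (2.1.5)-(2.1.6) and discarding initial consumption levels that drive capital to zero. The sketch is in Python rather than Matlab, and the parameter values are hypothetical; with full depreciation (δ = 1) the model has the well-known closed-form policy c_t = (1 − αβ)k_t^α, which the code uses as a check.

```python
# Shooting algorithm for the saddlepath of the log-utility, Cobb-Douglas
# Ramsey model, equations (2.1.5)-(2.1.6). Parameter values are illustrative.
alpha, beta, delta = 0.3, 0.95, 1.0
k0 = 0.05

# Positive steady state from the Euler equation (2.1.5):
kstar = (alpha / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))

def crashes(c0, T=400):
    """True if initial consumption c0 eventually drives capital to zero."""
    k, c = k0, c0
    for _ in range(T):
        k_next = k**alpha + (1 - delta) * k - c        # resource constraint (2.1.6)
        if k_next <= 0:
            return True                                # infeasible: ate the capital stock
        c = beta * (alpha * k_next**(alpha - 1) + (1 - delta)) * c  # Euler (2.1.5)
        k = k_next
    return False

# Bisection: consumption levels above the saddlepath crash, levels at or
# below it do not, so the feasibility threshold is the saddlepath c0.
lo, hi = 0.0, k0**alpha + (1 - delta) * k0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if crashes(mid):
        hi = mid
    else:
        lo = mid
c0 = lo

# With delta = 1, the known closed-form policy is c = (1 - alpha*beta) * k^alpha.
assert abs(c0 - (1 - alpha * beta) * k0**alpha) < 1e-4
```

The same bisection works for δ < 1, where no closed form is available; only the crash test and the steady state change.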
I should emphasize that phase diagrams are heuristic tools and no substitute for a careful
proof of global convergence in the Ramsey model. We will provide such a proof in chapter
4 once we have the tools of dynamic programming at hand.
2.1.3 The bottom line and the golden rule
To summarize, the growth story implied by the Ramsey model is as follows in the case where
the economy starts below its steady state level of capital k∗. Capital and consumption both
rise and converge towards their steady state value where the marginal product of capital
equals the gross rate β−1 of time preference.
We know that this path maximizes welfare, but does it maximize steady state consump-
tion? In other words, consider the following problem. Assume the planner was free to choose
the initial level k of capital but had to commit to maintaining this level of capital forever.
Which level would they choose? That is, if the planner could choose a steady state value and
ignore the welfare consequences of transiting there, what capital stock would they choose?
This steady-state-consumption maximizing level of capital is called the golden rule capital
stock.
Since maintaining k requires setting consumption equal to f(k) − δk (that is, eating net output), the planner would choose to set f′(k) = δ, i.e. to set the gross marginal return on capital, f′(k) + (1 − δ), to one (make sure you see why this is the relevant condition). Instead,
the Ramsey solution converges to a strictly lower level of capital. Why? That is because
maximizing steady state consumption is not the Ramsey planner’s objective. The Ramsey
planner does not get to ignore the transition to steady state. While reaching the golden
rule level of capital is entirely feasible for the planner, the saving policy this requires implies
suboptimally low consumption in early periods. There is such a thing as saving too much.
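The gap between the Ramsey steady state and the golden rule is easy to quantify. A Python sketch with a hypothetical Cobb-Douglas parameterization (f(k) = k^α):

```python
# Ramsey steady state vs. golden rule capital stock for an illustrative
# Cobb-Douglas technology f(k) = k^alpha. Parameter values are hypothetical.
alpha, beta, delta = 0.3, 0.95, 0.1

# Ramsey steady state: f'(k*) + (1 - delta) = 1/beta
k_ramsey = (alpha / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))
# Golden rule: f'(k_GR) = delta, maximizing steady state consumption f(k) - delta*k
k_gold = (alpha / delta) ** (1 / (1 - alpha))

def c_ss(k):
    # steady state consumption associated with maintaining k forever
    return k**alpha - delta * k

# The Ramsey economy converges below the golden rule whenever beta < 1 ...
assert k_ramsey < k_gold
# ... giving up a little steady state consumption in exchange for higher
# consumption along the transition:
assert c_ss(k_ramsey) < c_ss(k_gold)
```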
2.2 The Solow Model
2.2.1 Set-up
Solow (1956) considers exactly the same environment as Ramsey except that he posits that savings (hence investment) are a fraction s > 0 of gross output at all dates. Clearly, given this ad hoc assumption on saving behavior, we have no reason to expect that the resulting path of capital and consumption is optimal.
Since consumption is a function of output, hence of capital only, the model boils down to the following first order difference equation for capital, for all t:
kt+1 = sf(kt) + (1 − δ)kt
This problem lends itself very nicely to graphical analysis. Again, from any initial level of capital, capital converges to a unique steady state value. Along the transition, if we start below steady state, consumption and the marginal product of labor rise while the marginal product of capital falls.
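The difference equation above is straightforward to iterate. A Python sketch under an illustrative Cobb-Douglas technology and hypothetical parameter values:

```python
# Iterating the Solow difference equation k' = s*f(k) + (1 - delta)*k
# for an illustrative Cobb-Douglas technology f(k) = k^alpha.
alpha, s, delta = 0.3, 0.2, 0.1   # hypothetical parameter values

def f(k):
    return k**alpha

k = 0.1                            # arbitrary initial capital stock
for _ in range(1000):
    k = s * f(k) + (1 - delta) * k

# The steady state solves s*k^alpha = delta*k, i.e. k* = (s/delta)^(1/(1-alpha)),
# and the iteration converges to it from any positive starting point.
kstar = (s / delta) ** (1 / (1 - alpha))
assert abs(k - kstar) < 1e-8
```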
2.2.2 Dynamic efficiency
Is the resulting allocation efficient? In general, no. And making this case doesn’t even
require specifying preferences beyond the assumption that households always prefer more
consumption.
Given k0 = a0, call a non-negative path {c_t, k_{t+1}}_{t=0}^{+∞} feasible if it satisfies the aggregate resource constraint in every period. Define a feasible path {c_t, k_{t+1}}_{t=0}^{+∞} to be dynamically inefficient if there exists another feasible path {c′_t, k′_{t+1}}_{t=0}^{+∞} such that c′_t ≥ c_t for all t with a strict inequality in at least one period. That is, a feasible capital path exists that gives the household at least as much consumption in every period and strictly more in at least one period.
Assume now that s is such that the steady state of capital exceeds the golden rule capital
stock level kGR. Then, from some date T on, f ′(kt) < δ. Set instead k′t = kGR for all t ≥ T
and c′t = f(kGR) − δkGR. The resulting path is feasible and yields more consumption than
the Solow path from T − 1 on. (I ask you for the complete argument in problem 5.)
Households in this economy are saving too much. In this case, R_t = f′(k_t) + (1 − δ) < 1 past some date.
in the one-sector optimal growth context:
Theorem 1. A feasible path {c_t, k_{t+1}}_{t=0}^{+∞} is inefficient if and only if lim_{t→+∞} ∑_{s=0}^{t} ∏_{i=0}^{s−1} R_i < +∞.
As an illustration of this very nice result, notice that when in the Solow model the capital path eventually exceeds the golden rule capital stock, we have that R_t = f′(k_t) + (1 − δ) < 1 − ε eventually for some ε > 0. But this means that ∑_{t=0}^{+∞} ∏_{i=0}^{t−1} R_i eventually behaves like a geometric sum of modulus smaller than 1 − ε, hence converges.
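This geometric-sum argument can be checked numerically. The Python sketch below starts a Solow economy at the steady state implied by a saving rate above the golden rule level (s > α for Cobb-Douglas; the parameterization is hypothetical), so that R_t < 1 in every period and the partial sums in Cass's criterion converge:

```python
# Illustration of Cass's criterion: with a saving rate above the golden
# rule level (s > alpha for Cobb-Douglas), the Solow path is dynamically
# inefficient and sum_t prod_{i<t} R_i converges. Numbers are illustrative.
alpha, s, delta = 0.3, 0.6, 0.1        # s > alpha: oversaving

k = (s / delta) ** (1 / (1 - alpha))   # start at the (oversaving) steady state
total, prod = 0.0, 1.0
for t in range(2000):
    total += prod                      # partial sum of Cass's criterion
    R = alpha * k**(alpha - 1) + (1 - delta)   # gross return R_t
    prod *= R
    k = s * k**alpha + (1 - delta) * k         # Solow law of motion

# R_t is constant and below one here, so the partial sums converge to 1/(1 - R).
R_ss = alpha * k**(alpha - 1) + (1 - delta)
assert R_ss < 1
assert abs(total - 1 / (1 - R_ss)) < 1e-6
```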
2.3 Population growth and progress
The economies we have considered so far all converge to a steady state where consumption,
the capital stock and output are constant. To be useful as models of growth, they need to
be consistent with the fact that, in most countries, output and consumption do grow over
time.
The easiest (and perhaps most sensible) way to generate some growth in the model is to assume that the quantity of labor the household can deliver grows geometrically at rate g > 0.
Note that whether this is because the household is getting bigger (population growth) or because the household is able to deliver more labor per unit of time (productivity growth) is immaterial. Technological progress, for instance, is one reason why a unit of household time could come to represent more labor for the firm.
At any rate, at date t the resource constraint now becomes:

c_t + k_{t+1} = F(k_t, (1 + g)^t) + (1 − δ)k_t
⇐⇒ c_t/(1 + g)^t + k_{t+1}/(1 + g)^t = F(k_t/(1 + g)^t, 1) + (1 − δ)k_t/(1 + g)^t
⇐⇒ ĉ_t + k̂_{t+1}(1 + g) = f(k̂_t) + (1 − δ)k̂_t

where for date t and variable x_t, x̂_t ≡ x_t/(1 + g)^t.
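The chain of equivalences above only uses constant returns to scale. A small Python check, with an illustrative Cobb-Douglas F and hypothetical numbers, confirms that the resource constraint in levels holds exactly when its detrended counterpart does:

```python
# Check that the resource constraint in levels is equivalent to its
# detrended form c_hat + (1+g)*k_hat' = f(k_hat) + (1-delta)*k_hat.
# The Cobb-Douglas technology and all numbers are illustrative.
alpha, delta, g = 0.3, 0.1, 0.02
t = 5
kt, kt1, ct = 0.8, 0.85, 0.5       # arbitrary date-t levels

def F(k, n):
    return k**alpha * n**(1 - alpha)

def f(x):
    return F(x, 1.0)

lhs = ct + kt1 - (F(kt, (1 + g)**t) + (1 - delta) * kt)            # levels version
k_hat, k1_hat, c_hat = kt / (1+g)**t, kt1 / (1+g)**(t+1), ct / (1+g)**t
rhs = c_hat + (1 + g) * k1_hat - (f(k_hat) + (1 - delta) * k_hat)  # detrended version

# The two residuals agree up to the scale factor (1+g)^t, so one is zero
# exactly when the other is:
assert abs(lhs - (1 + g)**t * rhs) < 1e-12
```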
It is natural in this environment to look for a balanced growth path, i.e. an equilibrium where all variables grow at the same rate as labor. To make sure that such a path exists, we need some assumption on preferences. Assuming that U is of the Constant Relative Risk Aversion (CRRA) sort, i.e. that U(c) = c^σ/σ for all c, where σ < 1, works. (Note that σ can be negative and that σ = 0 is the log case. Now would be a good time to read Jones and Manuelli, 2005.)
Then, in the Ramsey problem, the planner chooses a path {ĉ_t, k̂_{t+1}}_{t=0}^{+∞} that satisfies the resource constraint above and maximizes

∑_{t=0}^{+∞} β^t U((1 + g)^t ĉ_t) = ∑_{t=0}^{+∞} (β(1 + g)^σ)^t ĉ_t^σ/σ

using the fact that preferences are of the CRRA type. To make sure that the sum is always bounded we need to assume that β̂ ≡ β(1 + g)^σ < 1, which is an implicit bound on the growth rate compatible with balanced growth. First order conditions become:

β̂^t (ĉ_t)^{σ−1} = λ_t
λ_t(1 + g) = λ_{t+1}(f′(k̂_{t+1}) + (1 − δ))
where λ_t is the multiplier associated with date t’s resource constraint. Combining gives:

ĉ_{t+1}/ĉ_t = (β̂(f′(k̂_{t+1}) + (1 − δ))/(1 + g))^{1/(1−σ)}
Together with the resource constraint we have a system that looks exactly like the one we
had before, and the analysis of the system is the same as before except that per-labor-unit variables replace our old variables. In particular, in steady state, consumption, capital and output all grow at the rate g > 0 of progress.
As Jones and Manuelli (1990) explain, the reason why no growth can take place in the
Ramsey environment with fixed labor is that the only reproducible input in that model is
capital, and that returns to capital eventually become dominated by the rate of physical
depreciation. Growth requires that returns to all reproducible inputs be sufficiently high
asymptotically. What we did in this section is allow labor to grow exogenously, making it
reproducible in a trivial sense. Since the combined returns to capital and labor are constant,
perpetual growth then becomes possible.
We only considered the case of exogenous labor growth so far, but one could instead
assume that households can invest in labor the way they invest in capital, as in human
capital models. Then returns to all reproducible factors would no longer be diminishing,
making asymptotic growth once again possible. Models built around these ideas are called
endogenous growth models since growth then results from the accumulation decisions of
agents. While these models share a number of features with exogenous models of growth
(such as the balanced asymptotic nature of the equilibrium path), they also make predictions
that clearly separate them from exogenous models, as the next section discusses.
2.4 Endogenous growth (Ak models)
Assume that returns to physical capital are constant so that f(k) = Ak for all k ≥ 0
where A > 0. The premise that returns to physical capital are not diminishing (or, rather,
that they do not converge to zero as k becomes large given a fixed amount of labor) may
seem incongruous but you should think of capital here in a broad sense as representing all
reproducible inputs. For more on this, see e.g. Rebelo (1991) or McGrattan (1998).
Then, assuming that U is of the CRRA sort with parameter σ < 1, manipulations of the
first order conditions of the corresponding social planner problem yield, for all t ≥ 0:
c_{t+1}/c_t = (β(1 + A − δ))^{1/(1−σ)}.
If a balanced growth equilibrium path exists where consumption, capital and output all grow at rate g ≥ 0, the above condition then implies that 1 + g ≡ (β(1 + A − δ))^{1/(1−σ)}. Obviously, growth requires that β(1 + A − δ) > 1. Recall also that the planner’s problem is well defined only provided that β(1 + g)^σ < 1. This gives us a range of parameters compatible with balanced growth. In fact, under this condition, the balanced growth path where all endogenous variables immediately and permanently grow at rate g is the only solution to the planner’s problem.
Models of this sort turn out to make sharp predictions for the impact of government
policies on growth that differ greatly from the predictions of models with exogenous growth
(such as the model described in the previous section.) Assume for instance that in any
given period t, fraction τ of gross output Akt is taxed and that the proceeds from taxation
are dissipated.7 Such an economy is equivalent to an economy with productivity parameter A(1 − τ) < A; hence taxes permanently lower the growth rate.
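A small Python computation, with hypothetical parameter values, makes the point concrete: the tax acts exactly like a lower A and therefore lowers the balanced growth rate.

```python
# Balanced growth in the Ak model: 1 + g = (beta*(1 + A - delta))^(1/(1-sigma)),
# and the effect of a hypothetical output tax tau, which acts like lowering A.
beta, sigma, delta, A, tau = 0.96, -1.0, 0.05, 0.12, 0.2

def growth_rate(prod):
    return (beta * (1 + prod - delta)) ** (1 / (1 - sigma)) - 1

g_no_tax = growth_rate(A)
g_tax = growth_rate(A * (1 - tau))

assert g_no_tax > 0                      # beta*(1 + A - delta) > 1 here
assert g_tax < g_no_tax                  # the tax permanently lowers growth
assert beta * (1 + g_no_tax)**sigma < 1  # planner's problem remains well defined
```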
In models with exogenous growth however, asymptotic growth rates are completely inde-
pendent of net production opportunities, hence they are independent of taxes. (Make sure
that you can convince yourself of this.)
Another way to put this is that policies that affect investment rates have permanent effects
on an economy’s growth rate in endogenous models. This sharp prediction of endogenous
models has been extensively tested, with mixed results (see e.g. McGrattan, 1998, for more
on this.)
2.5 Diamond’s overlapping generation model
Assume that each period a new household is born that lives for two periods. At date t
therefore, there are two households: one born at date t − 1 that is in the second and final
period of its life, and one born at date t.
Assume that in the first period of their life households can deliver one unit of labor but
that they do not work when old. Date t household then splits its labor income between
7All that matters for this argument is that the way taxes are used does not impact marginal utilities or production opportunities.
consumption in the first period of its life, c^y_t (y for young), and savings, a_{t+1}. In the second period of its life, the date-t household simply sells its physical capital and consumes c^o_{t+1} = a_{t+1}R_{t+1}.
To get the economy off the ground it is necessary to assume that at date 0 there is a household in the second period of its life (the initial old) and that this initial generation is endowed with physical capital a0. The initial old simply consume their income: c^o_0 = a_0R_0.
Date t household chooses a non-negative consumption-saving profile to solve:

max U(c^y_t) + βU(c^o_{t+1})

subject to

c^y_t + a_{t+1} = w_t
c^o_{t+1} = a_{t+1}R_{t+1}
where, as before, we assume that U is strictly increasing, strictly concave, continuous on IR+
and continuously differentiable on IR++ while β < 1. The firm, for its part, does exactly
what it did in the Ramsey environment which implies as before that for all t:
R_t = f′(k_t) + (1 − δ)   (2.5.1)
w_t = f(k_t) − f′(k_t)k_t   (2.5.2)
A competitive equilibrium in this context is a sequence of prices {R_t, w_t}_{t=0}^{∞}, a consumption
level c^o_0 for the initial old, policies (c^y_t, a_{t+1}, c^o_{t+1})_{t=0}^{+∞} for all other generations, and policies
{k_t, n_t}_{t=0}^{∞} for the firm such that:

1. Given prices, c^o_0 solves the initial old's problem (c^o_0 = a_0R_0);

2. Given prices, (c^y_t, a_{t+1}, c^o_{t+1}) solves date-t household's problem;

3. Given prices, {k_t, n_t}_{t=0}^{∞} solves the firm's problem for all t ≥ 0;

4. The market for capital clears: k_t = a_t for all t ≥ 0;
5. The market for labor clears: n_t = 1 for all t ≥ 0.
Do equilibria exist? Are they unique? Are they optimal? To answer those questions,
observe first that because U is strictly concave there is a unique solution to the problem of
each household given w_t and R_{t+1}. In turn, prices only depend on the capital stock, so that we
can write a(k_t, k_{t+1}) for date t household's savings decision given (k_t, k_{t+1}). In equilibrium,
we must have, for all t,
a(k_t, k_{t+1}) = k_{t+1}.   (2.5.3)
Assume that savings increase with the interest rate. Then an increase in k_{t+1} lowers R_{t+1},
which in turn means that a(k_t, k_{t+1}) is a decreasing function of k_{t+1}. Hence exactly one
solution to the above equation exists.8 The assumption that savings rise with the interest
rate is often called the gross substitutability assumption. It says that when the relative price
of future consumption falls, agents choose to shift more resources to the second period of
their life (or borrow less from future income.)
Then (2.5.3) can be solved for k_{t+1} as a function of k_t. Write G(k) for the resulting value
of next period's capital given current capital k. The evolution of the system is fully described
by k_{t+1} = G(k_t) for all t ≥ 0 and we have:
Theorem 2. If U is such that savings rise with the rental rate, then a unique equilibrium
exists.
In general however, there may be several solutions to equation (2.5.3), that is, G may be
set-valued (a correspondence) and several equilibria may exist. (See Azariadis, 1993, for a
very nice discussion and several interesting examples.)
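To see the fixed-point construction at work, here is a numerical sketch. The functional forms are mine, not the text's: CRRA utility U(c) = c^{1−ρ}/(1 − ρ) with ρ < 1 (so savings rise with the interest rate and gross substitutability holds), f(k) = k^α, and δ = 1. The household's Euler equation U′(c^y) = βRU′(c^o) then delivers savings a = w·x/(1 + x) with x = (βR^{1−ρ})^{1/ρ}, and G(k) is computed by bisection on the market clearing condition (2.5.3).

```python
alpha, beta, rho = 0.33, 0.9, 0.5       # rho < 1 delivers gross substitutability

def wage(k):                             # w(k) = f(k) - f'(k)k = (1-alpha) k**alpha
    return (1 - alpha) * k ** alpha

def R(k):                                # gross return with delta = 1: R = f'(k)
    return alpha * k ** (alpha - 1)

def savings(k, k_next):
    """a(k, k'): savings implied by the Euler equation U'(c_y) = beta R U'(c_o)."""
    x = (beta * R(k_next) ** (1 - rho)) ** (1 / rho)
    return wage(k) * x / (1 + x)

def G(k):
    """Solve a(k, k') = k' by bisection; savings - k' is decreasing in k'."""
    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if savings(k, mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = 1.0
for _ in range(100):                     # iterate k_{t+1} = G(k_t)
    k = G(k)
print(k)                                 # candidate steady state of the OLG economy
```

At the limit point, savings evaluated at (k, k) reproduce k, so the capital market clears period after period.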
How about welfare? Since all households solve a finite horizon problem, a transversality
condition that prevents overaccumulation no longer emanates from the household problem
8 Existence only requires that a solution to this capital market clearing equation exist given any k_t, which we can guarantee under weak conditions. It suffices for instance to assume that consumption when old is a normal good. To see this, note first that the theorem of the maximum (see chapter 4) implies that a(k, ·) is continuous for any k > 0. Furthermore, given k > 0, a(k, k′) < k′ when k′ is high enough since a(k, k′) < f(k) + (1 − δ)k for any k′ > 0. On the other hand, a(k, k′) = c^o(w(k), R(k′))/R(k′) > k′ ⇐⇒ c^o(w(k), R(k′)) > k′R(k′). The left-hand side rises as k′ falls since c^o is normal (make sure you see that), while the right-hand side (which is bounded above by f(k′) + (1 − δ)k′) becomes vanishingly small as k′ does. It follows that a(k, k′) > k′ when k′ is small enough. An appeal to the intermediate value theorem completes the argument.
and we cannot rule out the possibility of dynamic inefficiency. (See Diamond, 1965, for much more on this.)
Problem 6 provides a way to build specific examples that show that the first welfare
theorem may fail to hold in this model. One can specify technological opportunities and
preferences so that the equilibrium dynamics of capital are exactly those of the Solow model.
Then, one can trivially build an example where capital converges to a value that exceeds the
golden rule level, a Pareto inefficient outcome.
To see that the competitive equilibrium is inefficient in that case, note that the rental
rate of capital (the gross return on savings) is less than one eventually. But households in
this environment can always agree to transfer resources from the young to the old at a rate
of one-for-one. An arrangement such as social security or fiat money could therefore raise
everybody’s returns on savings without hurting anyone. We will elaborate on this possibility
in the next chapter.
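The construction of Problem 6 can be previewed numerically. With log utility, f(k) = k^α and δ = 1, the equilibrium dynamics reduce to k_{t+1} = [β/(1 + β)](1 − α)k_t^α, a Solow-type law of motion, and the golden rule solves f′(k) = 1, i.e. k_GR = α^{1/(1−α)}. The sketch below (illustrative parameters, not from the text) checks that the steady state exceeds the golden rule level and that the steady-state gross return on savings falls below one.

```python
alpha, beta = 0.2, 0.9
s = beta / (1 + beta) * (1 - alpha)   # effective savings rate out of output

k = 1.0
for _ in range(200):                  # iterate the Solow-type law of motion
    k = s * k ** alpha

k_ss = s ** (1 / (1 - alpha))         # closed form for the steady state
k_gr = alpha ** (1 / (1 - alpha))     # golden rule: f'(k) = 1 when delta = 1
R_ss = alpha * k_ss ** (alpha - 1)    # gross return on savings at steady state

print(k_ss > k_gr, R_ss)              # over-accumulation: R_ss < 1, so a
                                      # one-for-one young-to-old transfer beats
                                      # saving at the margin
```

Since R_ss < 1, a social-security-like transfer offers every generation a better return than physical capital.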
For now, you should remember that the key results we have established in the Ramsey
model (that equilibria are always unique, optimal and display global convergence to a unique
steady state) may all fail to hold in overlapping generations models. Depending on which
questions you wish to ask, these may or may not be desirable features.
2.6 Existence and uniqueness
2.6.1 Existence
What conditions guarantee that maximization problems have solutions? In particular, under
what conditions does the Ramsey problem we studied in this chapter have a solution? To
answer these questions, we need some basic notions of topology. If you haven’t already, you
should start the process of reading a Real Analysis textbook such as Rudin’s Principles of
Mathematical Analysis from cover to cover. We will start with some definitions.
Let (X, d) be a metric space. That is, X is a set and d is a distance function on X × X
that satisfies for all (x1, x2, x3) ∈ X3:
1. d(x1, x2) ≥ 0 with equality if and only if x1 = x2
2. d(x1, x2) = d(x2, x1)
3. d(x1, x2) ≤ d(x1, x3) + d(x3, x2)
The ball centered at x ∈ X of radius ε is the set B_ε(x) ≡ {y ∈ X : d(x, y) ≤ ε}.
A sequence {x_n}_{n=0}^{∞} ∈ X^∞ converges to x ∈ X if for all ε > 0 there exists N such that
n > N implies x_n ∈ B_ε(x).
A set A is called open if whenever x ∈ A there exists ε > 0 such that B_ε(x) ⊂ A. A is
closed if its complement in X is open. Equivalently (show), a set A is called closed if any
convergent sequence {x_n}_{n=0}^{+∞} ∈ A^∞ converges to a point in A.
A set A is called bounded if it is contained in a ball. A set A is called totally bounded if
for all ε > 0, A can be covered with a finite number of balls of radius ε. (For an example of a
metric space with sets that are bounded but not totally bounded, scroll down 3 paragraphs.)
An open cover of set A is a collection {O_α} of open sets such that A ⊂ ∪_α O_α.
A set A is called compact if any open cover of A contains a finite number of sets that cover
A. Equivalently in a metric space, a set is called compact if it is closed and totally bounded.
Also equivalently in a metric space, a set is called compact if every sequence in the set has a
convergent subsequence that converges to a point in the set. (This last notion is often called
sequential compactness. It is the most convenient to use in many proofs.)
In IR^n with the standard Euclidean norm, a set is compact if and only if it is closed and
bounded. With a different metric, boundedness is no longer sufficient. For instance, let X = IR
and for all (x, y) ∈ IR², let d(x, y) = 1 if x ≠ y, and d(x, y) = 0 otherwise. This is called
the discrete metric. Sets are compact in that metric if and only if they are finite. (Exercise:
which sets are open in this metric?) IR is bounded but not totally bounded in this metric.9
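A toy computation illustrates the discrete metric (the point set below is a made-up finite sample):

```python
# In the discrete metric, every singleton is a ball of radius 1/2, so every
# set is open; and only finite sets are compact.
X = list(range(10))
d = lambda x, y: 0.0 if x == y else 1.0
ball = lambda x, eps: {y for y in X if d(x, y) <= eps}

print(ball(3, 0.5))   # {3}: singletons are balls, hence every set is open
print(ball(3, 1.0))   # the whole space: any radius >= 1 covers everything
```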
A real function f : X → IR is called continuous on X if for any open (closed) set O
of reals, f^{−1}(O) is open (closed). In metric spaces, there are two alternative, equivalent
definitions of continuity. A function f is continuous on X if for all x ∈ X and ε > 0 there
exists δ > 0 such that d(x, y) < δ =⇒ |f(x) − f(y)| < ε. Finally, a function f is continuous
9 This example should tell you that boundedness of a set is really not a very interesting notion topologically speaking. Any metric can be transformed into a bounded metric via a simple transformation without changing anything about the nature of the space under study. But, in this topologically equivalent metric, all sets are bounded.
if for every convergent sequence {x_n} in X,

lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n).
Let A be a subset of X. Define arg max_A f = {x ∈ A : f(x) ≥ f(y) ∀y ∈ A}. Here's the
result we need:
Theorem 3. (Weierstrass) Let (X, d) be a metric space, A be a subset of X, and f be a
function on X. If f is continuous on A and A is compact, then arg maxA f is not empty.
Proof. For each x ∈ A define S_x = {y ∈ X : f(y) < f(x)}. Because f is continuous, all
these sets are open. Take any finite subset {x_1, x_2, . . . , x_n} of A and let x_i be such that
f(x_i) ≥ f(x_j) for j = 1, 2, . . . , n. Then x_i ∉ ∪_{j=1,...,n} S_{x_j}. This means that no finite
subfamily of {S_x}_{x∈A} covers A; hence, since A is compact, {S_x}_{x∈A} itself cannot cover A.
Any point of A left uncovered maximizes f on A, so arg max_A f is not empty.
The assumption that f is continuous is stronger than what we need, as the proof makes
clear. We only need f to be such that the sets Sx defined in the proof are open for all x ∈ X.
Such a function is called upper semi-continuous.
In our Ramsey problem, A is the planner's choice set. That is,

A = { {c_t, k_{t+1}}_{t=0}^{+∞} : c_t ≥ 0, k_{t+1} ≥ 0, c_t + k_{t+1} = f(k_t) + (1 − δ)k_t for all t ≥ 0, with k_0 = a_0 }

while f({c_t, k_{t+1}}_{t=0}^{+∞}) = ∑_{t=0}^{+∞} β^t U(c_t). We need to find a metric space that contains A such
that A is compact and f is continuous.
A metric space that does the trick is (IR²)^∞ (the set of sequences of pairs of real numbers)
equipped with the product topology metric. The product distance between two elements x
and y of (IR²)^∞ is d(x, y) = ∑_{t=0}^{+∞} θ^t |x_t − y_t|/(1 + |x_t − y_t|) where θ < 1. (Note that x_t and y_t are pairs;
|x_t − y_t| denotes the Euclidean distance between x_t and y_t in IR².) A sequence of sequences converges
to a given sequence in this metric if and only if it converges in each coordinate.
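A sketch of this metric in code (the weight θ and the test sequences are made up for illustration); the truncation uses the fact that each term in the sum is bounded by θ^t, so the tail past T contributes less than θ^T/(1 − θ):

```python
import math

def product_distance(x, y, theta=0.5, tol=1e-12):
    """d(x, y) = sum_t theta^t |x_t - y_t| / (1 + |x_t - y_t|).

    x and y map dates t to coordinates. Each term is at most theta^t, so
    truncating the sum at T leaves an error below theta^T / (1 - theta);
    T is chosen to push that error below tol.
    """
    T = math.ceil(math.log(tol * (1 - theta)) / math.log(theta))
    d = 0.0
    for t in range(T):
        gap = abs(x(t) - y(t))
        d += theta ** t * gap / (1 + gap)
    return d

# Coordinate-wise convergence implies convergence in this metric: the
# sequences x^i with x^i_t = 1 + 1/i converge coordinate by coordinate to
# the constant sequence c_t = 1, and their product distance to it shrinks.
c = lambda t: 1.0
dists = [product_distance(lambda t, i=i: 1.0 + 1.0 / i, c) for i in (1, 10, 100)]
print(dists)   # strictly decreasing toward zero
```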
Conditions we have imposed on the production function imply that M > 0 exists such
that kt < M for all t (Show). In turn this implies that consumption is bounded as well.
That A is compact now follows directly from a powerful theorem for product spaces called
Tychonoff’s theorem which says that any Cartesian product of compact sets is compact.
For completeness however, let us argue directly that A is totally bounded and closed in
the product topology. Closedness follows from coordinate-wise convergence (show it). To
prove total boundedness, fix ε > 0 and pick T such that ∑_{t=T}^{+∞} θ^t < ε/2. Then find a finite
number of sequences such that for the first T coordinates all elements of A are within distance
ε/2 of those sequences. (Define a grid of mesh ε/T in each coordinate for both consumption and
capital and take your finite set of sequences to be all possible combinations of grid points
before T, and 0 in all coordinates after T.) Since the tails of all elements of A past T can
only add ε/2 in distance, we have our finite set of balls covering all of A.
Now we need to argue that f is continuous. To see that it is, recall that U is bounded
in absolute value by some number M > 0. (We didn't really need to assume it: the fact that consumption
is bounded above imposes an effective bound on U.) Take a sequence of sequences {c^i_t}_{t=0}^{+∞}
that converges to a sequence {c*_t}_{t=0}^{+∞} in the product topology. In other words, for all t, c^i_t → c*_t
as i gets large. Fix ε > 0 and T and, using the fact that U is a continuous real function, choose i large
enough so that |U(c^i_t) − U(c*_t)| ≤ ε/T for all t < T. Then,

|∑_{t=0}^{+∞} β^t U(c^i_t) − ∑_{t=0}^{+∞} β^t U(c*_t)| ≤ ∑_{t=0}^{+∞} β^t |U(c^i_t) − U(c*_t)|
  = ∑_{t=0}^{T−1} β^t |U(c^i_t) − U(c*_t)| + ∑_{t=T}^{+∞} β^t |U(c^i_t) − U(c*_t)|
  ≤ ε + 2Mβ^T/(1 − β).

Because β < 1, the last expression can be made as small as we wish by making T large
enough. This implies that |∑_{t=0}^{+∞} β^t U(c^i_t) − ∑_{t=0}^{+∞} β^t U(c*_t)| → 0 as i grows large, and we have
shown that f is continuous in the product metric, which is the last thing we needed in order
to apply Weierstrass' theorem.
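The tail bound in the display above is easy to check numerically. In the sketch below, the bounded utility function is an arbitrary illustration, not the course's U:

```python
beta = 0.9
M = 1.0
U = lambda c: c / (1.0 + c)          # an arbitrary utility bounded by M = 1

def discounted(cs):
    """Discounted utility of a finite consumption list."""
    return sum(beta ** t * U(c) for t, c in enumerate(cs))

# Two consumption paths that agree on their first T coordinates can differ
# in discounted utility by at most 2*M*beta**T/(1 - beta), no matter how
# wildly they disagree in the tail.
T, N = 20, 200
c1 = [1.0] * N
c2 = [1.0] * T + [100.0] * (N - T)   # large disagreement after date T
gap = abs(discounted(c1) - discounted(c2))
bound = 2 * M * beta ** T / (1 - beta)
print(gap, bound)                     # the gap sits below the bound
```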
A metric that does not work in this case is the "sup-norm" metric, a metric we will
use repeatedly in the dynamic programming section of this course. The trick in finding a
topology that works is to note that topologies where many sets are open (like the discrete
topology, where all sets are open and all functions are continuous, but the only sets that are
compact are finite sets) make compactness difficult to show but continuity easy to show.
The opposite is true for topologies where few sets are open (like the trivial topology, where
the only open sets are the space itself and the empty set; the only continuous real functions
then are the constant functions, but all sets are compact). We need a topology that is just
right.
2.6.2 Uniqueness
When are solutions to maximization problems unique? To answer that question we need to
introduce the notion of convexity. For this, we need (yet) a bit more structure. A set S is
called a real linear space if
1. for all (x, y) ∈ S×S there exists an element of S called the sum of x and y and denoted
by x + y that satisfies all the standard properties of addition,
2. for all (α, x) ∈ IR×S there exists an element of S called the (scalar) product of α and
x and denoted by αx that satisfies the standard properties of scalar multiplication,
3. for all (α, θ, x, y) ∈ IR2 × S2,
(α + θ)(x + y) = α(x + y) + θ(x + y) = αx + αy + θx + θy.
A subset A of a linear space is called convex if for all (x, y) ∈ A×A, and for all θ ∈ [0, 1],
θx + (1 − θ)y ∈ A. A function f on A is called concave if for all (x, y) ∈ A × A, and for all
θ ∈ [0, 1], f(θx+(1−θ)y) ≥ θf(x)+(1−θ)f(y). It is called strictly concave if the inequality
is strict whenever θ ∈ (0, 1) and x ≠ y.
The result we need is that whenever A is convex and f is strictly concave, arg max_A f
contains at most one element. To see this, assume that two distinct elements of A maximize
f. Then any strict convex combination of these two elements yields a strictly higher value of f,
contradicting the premise that the two elements are maximizers. All told, if f is continuous and strictly
concave, and if A is compact and convex, then arg max_A f contains exactly one element.
It is easy to see (and you should show) that the Ramsey planner maximizes a strictly
concave function on a convex set.10 So at most one optimal allocation exists. Since we have
argued above that at least one exists, the Ramsey problem yields exactly one solution.
10 Here are some details. In order to use our uniqueness result, it is useful to extend the planner's choice set to A = { {c_t, k_{t+1}}_{t=0}^{+∞} : c_t ≥ 0, k_{t+1} ≥ 0, c_t + k_{t+1} ≤ f(k_t) + (1 − δ)k_t for all t ≥ 0, with k_0 = a_0 }, i.e. to allow the planner to waste resources. This entails no loss of generality since we are only adding an option the planner will choose to ignore, but it makes the choice set convex since f is concave. Now we can use our uniqueness result. But since the objective function depends on consumption alone, the result only implies in and of itself that at most one optimal consumption path exists. To see that this, in turn, implies a unique capital path only requires noting that the planner always exhausts all resources in all periods.
2.7 Problems
Problem 1
A household lives for two periods and chooses (c1, c2) ≥ (0, 0) to solve:
max U(c1) + βU(c2)
subject to:
c1 ≤ y
c2 = (y − c1)(1 + r)
where y > 0, r > −1 and U is a function on IR+.
1. Assume that U is continuous on IR+. Show that the household’s problem has a solution.
2. Assume in addition that U is strictly concave; show that the solution is unique.
3. Impose some additional conditions on U such that c1(r, y) is differentiable in r and y. (Use standard intermediate-micro arguments.)
4. Impose the additional restriction on the household's choice set that (c1, c2) > (0, 0) and that lim_{c→0} U′(c) = +∞. Show that c1 is independent of r if and only if U is the log function (up to a linear transformation). This will require that you solve a second order differential equation.
5. Find a utility function U such that savings (y − c1) decrease with the interest rate. That is, find a utility function such that c1 rises with the interest rate.
Problem 2
Nicholas Kaldor argued a long time ago that total labor income is a fairly stable share of GDP in most countries across time. Recently, Douglas Gollin has argued that this is also roughly true across countries (regardless of the stage of economic development). For our neoclassical economy to be consistent with these facts, we need that for all possible capital levels k > 0, F_2(k, 1)/F(k, 1) be a constant α ∈ (0, 1).
1. Assume that F satisfies all that a neoclassical production function must. Show that F_2(k, 1)/F(k, 1) = α for all k > 0 if and only if f(k) = Ak^{1−α} where A > 0. This will require that you solve a first order differential equation. Don't forget to explain why A > 0. [Note: This result is why most papers that deal with real business cycle or development questions in one-sector models use a Cobb-Douglas aggregate production function. People who use other functional forms have some explaining to do.]
2. Assume that F(k, n) = [αk^σ + (1 − α)n^σ]^{1/σ} where σ < 1. What happens to the labor share along the transition path when σ < 0, when σ > 0, when σ = 0 (use L'Hospital's rule in this case)?
Problem 3
Consider the Ramsey problem we studied in class with U(c) = log(c) for all c > 0, β = 0.95, δ = 0.1, F(k, n) = 10k^{0.33}n^{0.67} and k_0 = 1. Use Matlab to solve for the equilibrium path to steady state.
1. What is c0 (approximately)? (This is for me to check that your program is correct.)
2. Use Matlab to plot the phase diagram we drew in class and the saddlepath.
3. Plot (on one chart) the capital stock path over the first 50 periods for β = 0.9, β = 0.95, β = 0.975, holding everything else equal. Explain the effect of changes in β on the capital stock path.
Hints: You have a system of two first order difference equations (the Euler equation for consumption and the aggregate resource constraint) to simulate. Do this as follows:
1. Begin by computing the steady state values of k and c.
2. Guess c0
3. Since you know k0, the resource constraint and your c0 guess imply k1.
4. Given k1 the Euler equation for consumption implies c1.
5. Repeat the previous two steps to generate the first 50 values of {c_t, k_{t+1}} given your c_0 guess.
6. Update your c_0 guess and repeat the procedure until k(50) and c(50) are near their steady state values.
Problem 4
Consider a neoclassical economy populated by a representative household and a representative firm. Time is discrete and infinite. The household is endowed with one unit of labor in each period and a quantity a_0 > 0 of physical capital at date 0. The firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1).
1. Define the intensive form f of production function F. (Don't forget to specify a domain of definition.)
2. State the problem which the golden rule capital stock level k_GR must solve and state a condition that fully characterizes k_GR.
3. Assume that F(k, n) = k^α n^{1−α} for all (k, n) ≥ 0 where α ∈ (0, 1) and that the household consumes c_t = (1 − s)F(k_t, n_t) at all dates t where s ∈ (0, 1). Find a value s_GR (as a function of the model's parameters) such that the steady state capital stock strictly exceeds k_GR if and only if s > s_GR.
4. Assume now that the household is endowed with labor (1 + g)^t at date t where g > 0. (F is back to an arbitrary neoclassical production function.) Define a balanced growth path. Show that along any balanced growth path the marginal product of capital is constant.
Problem 5
Consider a neoclassical economy populated by a representative household and a representative firm. Time is discrete and infinite. The household is endowed with one unit of labor in each period and a quantity a_0 > 0 of physical capital at date 0. The firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1). Denote by f the intensive form of production function F.
1. Define what it means for a capital stock and consumption sequence {c_t, k_{t+1}}_{t=0}^{∞} to be feasible and what it means for {c_t, k_{t+1}}_{t=0}^{∞} to be dynamically inefficient given k_0 = a_0 > 0.
2. Show that any feasible sequence {c_t, k_{t+1}}_{t=0}^{∞} such that {k_{t+1}}_{t=0}^{∞} rises monotonically to a value k* > k_GR is dynamically inefficient.
Problem 6
Consider an economy where time is discrete and infinite and where each period a two-period lived household is born. For all t ≥ 0, the household born at date t is endowed with one unit of time in the first period of their life which they supply to a representative firm for wage w_t ≥ 0. They can save part of these earnings as physical capital which they rent to the firm in the second and last period of their life at rental rate R_{t+1}. All told, date t household chooses a non-negative consumption-saving profile (c^y_t, c^o_{t+1}) to solve:

max U(c^y_t) + βU(c^o_{t+1})

subject to

c^y_t + a_{t+1} = w_t
c^o_{t+1} = a_{t+1}R_{t+1}

where, as usual, we assume that U is strictly increasing, strictly concave on IR+ and continuously differentiable on IR++, while β < 1.
At date 0 there is a household in the second period of its life (the initial old) that is endowed with physical capital a_0. The initial old simply consume their income: c^o_0 = a_0R_0. Finally, the firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1]. Denote by f the intensive form of production function F. At date t and given factor prices, the firm
chooses non-negative k_t and n_t to maximize

F(k_t, n_t) + (1 − δ)k_t − k_tR_t − n_tw_t.
1. Define a competitive equilibrium in this environment.
2. Show that if U is the log function (ignore as usual the fact that the log function is not defined at zero), if f(k) = k^α where α ∈ (0, 1), and if δ = 1, then the dynamic path of capital implied by this model has the same form as in the Solow model.
3. Under the assumptions of question 2, find a condition on β and α (hence on s) that implies that the capital stock converges to a value that exceeds the golden rule capital stock.
4. Under that condition and those same assumptions, describe in no more than 5 sentences a social contract that would raise the utility of all generations.
Problem 7
Consider the Ramsey problem with U(c) = log c for all c > 0, f(k) = k^α for all k > 0 and δ = 1.
1. Prove that the optimal solution is such that c_t = sf(k_t) for all t ≥ 0 where s ∈ (0, 1), and find s.
2. Find the steady state level of capital as a function of the model's parameters.
3. Prove analytically (no graphs) in this case that the optimal capital path converges to this steady state level regardless of initial conditions.
Problem 8
Consider a social planner who chooses a non-negative sequence {c_t, k_{t+1}}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t c_t^{1−ρ}/(1 − ρ)

subject to:

c_t + k_{t+1} = Ak_t for all t ≥ 0

given an initial level k_0 of the capital stock, and where β ∈ (0, 1), A > 0 and ρ > 0.
1. What condition do you need to impose on the problem's parameters to guarantee that it has a unique solution?
2. For the remainder of this problem, take as given that a unique solution exists. What additional condition do you need to impose on the problem's parameters to guarantee that the solution exhibits positive growth? What is the growth rate of output in this economy?
3. Prove that the solution to the social planner problem is such that c_t = s(Ak_t) for all t ≥ 0 where s ∈ (0, 1), and find s.
Problem 9
Consider a social planner who chooses a non-negative sequence {c_t, k_{t+1}}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(c_t)

subject to:

c_t + k_{t+1} = f(k_t) + (1 − δ)k_t for all t ≥ 0

given an initial level k_0 of the capital stock, where β and δ are both in (0, 1), where U : IR+ → IR is strictly concave, strictly increasing, continuous, bounded, continuously differentiable on IR++, satisfies an Inada condition at zero, and where f is the intensive form of a neoclassical production function.
1. Define what it means for an allocation to be feasible and for an allocation to be dynamically efficient in this context.
2. Prove that the solution to the social planner problem (take as given that it exists and is unique) is dynamically efficient.
3. Find an allocation that is dynamically efficient, but does not solve the social planner's problem.
Chapter 3
Intertemporal General Equilibrium
Models
This chapter introduces the two canonical models of modern macroeconomics and the two
standard interpretations for these models. It draws heavily on Kehoe (1989), a great survey
article by Tim Kehoe that contains a lot of what one should know after their first quarter
of Ph.D. macro. Tim makes the article available on his webpage.
3.1 Infinitely-lived consumer
As in chapter 2, time is discrete and infinite throughout this chapter. As before, index time
by t ∈ {0, 1, 2, . . .}. There is one consumption good1 but there is no production (hence no
firm, no labor and no physical capital.) There are h agents called consumers. Consumer
j ∈ {1, . . . , h} is endowed with quantity w^j_t ≥ 0 of the consumption good at date t ≥ 0.
We assume that all consumers' endowment sequences are bounded. They order non-negative
consumption sequences {c^j_t}_{t=0}^{∞} according to the following utility function:

∑_{t=0}^{∞} β^t_j U_j(c^j_t)
1 The paper studies the case with many consumption goods but little is lost by looking at the one-good case.
where β_j ∈ (0, 1) for all j, U_j is strictly concave, strictly increasing and continuous on IR+,
continuously differentiable on IR++, and lim_{c→0} U′_j(c) = +∞ so as not to have to worry about
corner solutions. Consumers can always choose to eat their endowment in each period. But
trade could improve their lot. There are several ways to introduce trade in this environment.
We will consider two, the canonical ones.
3.1.1 Market structures
In the Arrow-Debreu (AD) market structure, all trade takes place at date 0. At date 0,
consumers trade future contracts that specify all deliveries in all periods. Let pt be the
price of one unit of consumption good at date t in terms of an arbitrary unit of account,
or numeraire. That is, consumers can trade promises to deliver one unit of date t good for
promises to deliver p_t/p_{t′} unit(s) of date t′ good.
Therefore, consumer j can select any consumption sequence {c^j_t}_{t=0}^{∞} that satisfies:

∑_{t=0}^{∞} p_t c^j_t ≤ ∑_{t=0}^{∞} p_t w^j_t.
In effect, consumers sell (rights to) their endowment at date zero and then choose among
consumption profiles whose value at date 0 is less than that of their initial endowment.
Clearly, scaling all prices up or down by the same factor does not change any of the consumers'
choice sets, and we could for instance normalize p_0 to one so that all prices are in terms of the
date 0 consumption good, making the date 0 good the numeraire.
To summarize, in the AD market structure, consumer j solves:

max ∑_{t=0}^{∞} β^t_j U_j(c^j_t)

subject to:

∑_{t=0}^{∞} p_t c^j_t ≤ ∑_{t=0}^{∞} p_t w^j_t
c^j_t ≥ 0 for all t
First order conditions associated with this problem are

β^t_j U′_j(c^j_t) = λ_j p_t for all t, j   (3.1.1)

where λ_j > 0 is the Lagrange multiplier associated with consumer j's budget constraint.
An Arrow-Debreu equilibrium is a sequence {p_t}_{t=0}^{∞} of prices and consumption profiles
{c^j_t}_{t=0}^{∞} for each consumer such that:

1. For all j ∈ {1, . . . , h} and given prices, {c^j_t}_{t=0}^{∞} solves consumer j's problem.

2. The market for the consumption good clears in all periods: ∑_{j=1}^{h} c^j_t = ∑_{j=1}^{h} w^j_t for all t.
In the sequential market structure, trade takes place every period. At each date t,
households can trade the consumption good on spot markets and trade securities that, for
each unit of consumption good invested, yield quantity 1 + rt+1 > 0 of the consumption
good at date t + 1. Consumer j thus chooses a sequence {c^j_t, b^j_{t+1}}_{t=0}^{∞} of consumption and
investment to satisfy:

c^j_0 + b^j_1 = w^j_0
c^j_t + b^j_{t+1} = w^j_t + b^j_t(1 + r_t) for all t ≥ 1
c^j_t ≥ 0 for all t ≥ 0
Solve the first equation for b^j_1, plug into date 1's constraint and divide by (1 + r_1) to
obtain:

c^j_0 + c^j_1/(1 + r_1) = w^j_0 + w^j_1/(1 + r_1) − b^j_2/(1 + r_1)

Proceeding recursively, one obtains for all T > 1:

∑_{t=0}^{T} p_t c^j_t = ∑_{t=0}^{T} p_t w^j_t − b^j_{T+1}/Π_{i=1}^{T}(1 + r_i)

where p_t = (Π_{s=0}^{t}(1 + r_s))^{−1} with r_0 = 0. As long as lim_{T→∞} b^j_T/Π_{i=1}^{T−1}(1 + r_i) = 0, we obtain the
same type of budget constraint as before.
In the next subsection, we will in fact argue that the two market structures are equiv-
alent in the sense that equilibrium allocations under one market structure are equilibrium
allocations under the other.
While we allow consumers to buy and sell securities, it is necessary to put a bound on the
quantity of securities that they can sell. Otherwise, given any set of prices, consumers could
improve upon any consumption profile by borrowing additional amounts of the consumption
good in a given period and financing this purchase by selling as many securities as necessary
in future periods. These borrowing strategies are called Ponzi schemes and must be ruled
out for an equilibrium to exist.
In the finite T-period case, we'd simply require that b^j_{T+1} ≥ 0 for all j. Here, we will
impose the constraint that for all j:

lim_{t→∞} b^j_t / Π_{i=1}^{t−1}(1 + r_i) ≥ 0.
This constraint implies that at any point in time, the present value of any consumer’s debt
cannot exceed the difference between the present value of their remaining endowment, and
the present value of their remaining consumption path. Put another way, consumers must
eventually pay their debts. In particular, one easily shows that this constraint rules out
Ponzi schemes.2
To summarize, in the sequential market structure, consumer j solves:

max ∑_{t=0}^{∞} β^t_j U_j(c^j_t)

2 Note that in writing this no-Ponzi-scheme constraint, we are implicitly imposing the constraint that lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) be well defined. This entails no loss of generality since that limit must exist in equilibrium. At this stage one could require only that lim inf_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) ≥ 0.
subject to, for all t ≥ 0:

c^j_t + b^j_{t+1} = w^j_t + b^j_t(1 + r_t) where b^j_0 = 0
c^j_t ≥ 0
lim_{t→∞} b^j_t / Π_{i=1}^{t−1}(1 + r_i) ≥ 0
A sequential-market equilibrium is a sequence {r_t}_{t=0}^{∞} of interest rates, and, for all j =
1, . . . , h, consumption profiles {c^j_t}_{t=0}^{∞} and sequences of bond holdings {b^j_{t+1}}_{t=0}^{∞} such that:

1. For all j ∈ {1, . . . , h} and given prices, {c^j_t, b^j_{t+1}}_{t=0}^{∞} solves consumer j's problem.

2. The market for the consumption good clears in all periods: ∑_{j=1}^{h} c^j_t = ∑_{j=1}^{h} w^j_t for all t.
Do we need to specify that the bond market must clear? No. To see this, sum up the
budget constraints of all agents at date 0, use the resource constraint to argue that we must
have ∑_{j=1}^{h} b^j_1 = 0, and proceed recursively (you should do it). We can now formally state the
following equivalence result.
Proposition 1. Assume that {p_t}_{t=0}^{+∞} and {c^j_t}_{t=0,...,+∞, j=1,...,h} is an Arrow-Debreu equilibrium.
Then {r_t}_{t=0}^{+∞} and {c^j_t, b^j_{t+1}}_{t=0,...,+∞, j=1,...,h} is a sequential market equilibrium with

r_0 = 0
r_t = p_{t−1}/p_t − 1 for all t > 0
b^j_{t+1} = w^j_t + b^j_t(1 + r_t) − c^j_t for all j and t ≥ 0

with b^j_0 = 0 for all j.
Conversely, assume that {r_t}_{t=0}^{+∞} and {c^j_t, b^j_{t+1}}_{t=0,...,+∞, j=1,...,h} is a sequential market equilibrium.
Then {p_t}_{t=0}^{+∞} and {c^j_t}_{t=0,...,+∞, j=1,...,h} is an Arrow-Debreu equilibrium with p_t =
(Π_{s=0}^{t}(1 + r_s))^{−1} for all t ≥ 0 where r_0 = 0.
Proof. This result follows trivially from observing that under the constructed prices, the
consumers' choice sets are unchanged, provided we can deal with one complication. Is it the
case that lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) = 0 in the consumer's problem for all j in a sequential market
equilibrium? To see that it is, let λ^j_t be the shadow price (Lagrange multiplier) associated
with consumer j's date t constraint. Then we must have for all t ≥ 0 and j ∈ {1, . . . , h}:

λ^j_t = β^t_j U′_j(c^j_t)   (3.1.2)
λ^j_t = λ^j_{t+1}(1 + r_{t+1})   (3.1.3)
lim sup_{t→∞} λ^j_t b^j_{t+1} ≤ 0   (3.1.4)

The last condition is the proper version of the transversality condition in this context and
says that the consumer cannot overaccumulate assets along an optimal path. The operator
lim sup applied to a sequence takes the supremum of all limits of subsequences of the original
sequence. It is always defined although it could be plus or minus infinity.3
Now, back to the proof. Consumer j's first order conditions imply that λ^j_t = λ^j_0 (Π_{i=1}^{t}(1 + r_i))^{−1}
for all t, so that the transversality condition may be rewritten as lim sup_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) ≤ 0.
Combined with the no-Ponzi constraint, this implies lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) = 0, as needed.
Therefore, the two market structures are equivalent in the sense that a consumption
allocation is part of an Arrow-Debreu equilibrium if and only if it is part of a sequential
market equilibrium.
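The price construction in Proposition 1 is easy to verify numerically. The sketch below uses illustrative interest-rate, endowment, and consumption paths (none of them from the notes): it builds Arrow-Debreu prices from the interest rates, recovers bond holdings from the sequential budget constraint, and checks that the date-0 Arrow-Debreu budget holds up to the discounted terminal bond position.

```python
# Numerical check of Proposition 1 (illustrative paths, not from the notes).
T = 400
r = [0.0] + [0.05] * T                 # interest rates, with r_0 = 0 by convention
w = [1.0] * (T + 1)                    # constant endowment stream
# A feasible plan: save 0.1 in even periods, consume it plus interest next period
c = [0.9 if t % 2 == 0 else 1.0 + 0.1 * 1.05 for t in range(T + 1)]

# Arrow-Debreu prices p_t = (prod_{s<=t} (1 + r_s))^{-1}
p, cum = [], 1.0
for t in range(T + 1):
    cum *= 1.0 + r[t]
    p.append(1.0 / cum)

# Bond holdings implied by c_t + b_{t+1} = w_t + b_t (1 + r_t), b_0 = 0
b = [0.0]
for t in range(T + 1):
    b.append(w[t] + b[t] * (1.0 + r[t]) - c[t])

# Telescoping the budget constraints gives
#   sum_t p_t (c_t - w_t) = -p_T * b_{T+1},
# which vanishes as T grows, so the Arrow-Debreu budget holds with equality.
ad_gap = sum(p[t] * (c[t] - w[t]) for t in range(T + 1))
print(ad_gap)                          # numerically ~0
```

The key step is exactly the telescoping argument used in the text: multiplying each sequential budget constraint by $p_t$ and summing leaves only the discounted terminal bond position.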
3.1.2 Welfare theorems
A consumption allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is called Pareto optimal if it is feasible (it satisfies the aggregate resource constraint in each period) and no other feasible allocation $\{\hat{c}^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ satisfies:

$$\sum_{t=0}^{\infty} \beta_j^t U_j(\hat{c}^j_t) \ge \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t) \quad \text{for all } j$$

with at least one strict inequality. In other words, it is not possible to increase the welfare of one consumer without reducing that of another consumer.
3 See Michel (1990) for the gruesome details. In the Ramsey problem, the non-negativity of the capital stock enables us to use simple limits instead of this more complicated expression.

In finite dimensional spaces, competitive equilibria are Pareto optimal under very weak assumptions, a result known as the first welfare theorem (look it up). In the infinitely-lived consumer model, the first welfare theorem holds under equally general conditions:
Proposition 2. Assume that $\{p_t\}_{t=0}^{+\infty}$ and $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is an Arrow-Debreu equilibrium. Then $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is Pareto optimal.
Proof. Assume that $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is not Pareto optimal and let $\{\hat{c}^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ be a feasible allocation that dominates it. Then $\{\hat{c}^j_t\}$ must cost each consumer at least as much as his endowment is worth (why?), and strictly more for at least one consumer (why?). That is,

$$\sum_{t=0}^{\infty} p_t \hat{c}^j_t \ge \sum_{t=0}^{\infty} p_t w^j_t \quad \text{for all } j$$

with at least one strict inequality. Noting that in equilibrium we must have $\sum_{t=0}^{\infty} p_t w^j_t < \infty$ for all $j$ (why?), this implies (summing over $j$) that

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t \hat{c}^j_t > \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t,$$

hence

$$\sum_{t=0}^{\infty} p_t \left( \sum_{j=1}^{h} \hat{c}^j_t \right) > \sum_{t=0}^{\infty} p_t \left( \sum_{j=1}^{h} w^j_t \right).$$

But this cannot be, since feasibility of the alternative allocation means that

$$\sum_{j=1}^{h} \hat{c}^j_t \le \sum_{j=1}^{h} w^j_t \quad \text{for all } t \ge 0.$$

This contradiction completes the proof.
This result, in turn, implies that competitive equilibria must solve a planner’s problem
(another standard micro result that works in this infinitely-lived case.)
Proposition 3. An allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is Pareto optimal if and only if there exist (Pareto) weights $\{\alpha_j : j = 1,\dots,h\}$ such that $\sum_{j=1}^{h} \alpha_j = 1$ and $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves:

$$\max \sum_{j=1}^{h} \alpha_j \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$$
subject to:

$$\sum_{j=1}^{h} c^j_t = \sum_{j=1}^{h} w^j_t \quad \text{for all } t \ge 0.$$
Proof. Assume that $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves the planner's problem for some weights $\{\alpha_j : j = 1,\dots,h\}$ such that $\sum_{j=1}^{h} \alpha_j = 1$. Note that if $j$ is such that $\alpha_j = 0$, then $c^j_t = 0$ for all $t$, since otherwise resources allocated to consumer $j$ could be redirected to a consumer with strictly positive weight. Assume, by way of contradiction, that $\{c^j_t\}$ is not Pareto optimal. Then another feasible allocation exists that raises everybody's utility, strictly for some $j$. If that $j$ is a consumer with zero weight, redirect all of his resources to a consumer with positive weight. We have then raised $\sum_{j=1}^{h} \alpha_j \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$ strictly, which contradicts the fact that $\{c^j_t\}$ solved the planner's problem.

The fun part is the converse. For that we need a bit more machinery, which is introduced in section 3.3. Completing the proof is the last part of homework 2.
We have established that competitive equilibrium allocations are optimal. We will now
establish a converse of sorts to this result by showing that all Pareto optimal allocations are
competitive equilibria. We will do so using standard calculus tools for speed. Chapter
15 in Stokey, Lucas and Prescott provides a proof that does not require any differentiability
assumption.
Let $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ be Pareto optimal. We have established that such an allocation solves the planner's problem for some set $\{\alpha_j : j = 1,\dots,h\}$ of weights. Therefore, it must satisfy the following set of first-order conditions:

$$\alpha_j \beta_j^t U_j'(c^j_t) = \pi_t \quad \text{for all } t, j \qquad (3.1.5)$$

for a set $\{\pi_t : t = 0,\dots,\infty\}$ of Lagrange multipliers. But conditions (3.1.1) are the same as conditions (3.1.5), with $\pi_t$ playing the role of prices and $\frac{1}{\alpha_j}$ playing the role of consumer $j$'s multiplier. So a solution to the social planner's problem solves each consumer's problem if and only if it satisfies the consumer's budget constraint at these candidate prices. In general, that is not the case (consider, for instance, what happens if we set $\alpha_1 = 0$ in the planner's problem). But with a bit of redistribution, we can support any Pareto optimal allocation as a competitive
equilibrium.
Here are two redistribution schemes that work. First, make the desired allocation the new endowment. That is, for all $j$, impose a sequence $\{\tau^j_t\}_{t=0}^{\infty}$ of good transfers on consumer $j$ such that $w^j_t + \tau^j_t = c^j_t$. At the new endowments $\{w^j_t + \tau^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$, the allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is just affordable for each consumer. Since it satisfies all other first-order conditions, it solves the consumer's problem. Since the allocation is Pareto optimal, it is feasible. So we have constructed a competitive equilibrium.
There is a simpler set of transfers that also works and need only take place at date 0. For all $j$, let

$$t^j = \sum_{t=0}^{\infty} \pi_t (c^j_t - w^j_t)$$

where the $\pi_t$'s are the Lagrange multipliers associated with the social planner's problem which $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves. Assume that consumer $j$ receives transfer $t^j$ in terms of the numeraire at date 0. Their new problem is:

$$\max \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$$

subject to:

$$\sum_{t=0}^{\infty} \pi_t c^j_t \le t^j + \sum_{t=0}^{\infty} \pi_t w^j_t$$

$$c^j_t \ge 0 \quad \text{for all } t$$
Because $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ satisfies the amended budget constraint of every consumer by construction and satisfies all other first-order conditions, it solves each consumer's amended problem. Hence we have the following result, a version of the Second Welfare Theorem:
Proposition 4. Every Pareto Optimal allocation can be supported as an Arrow-Debreu equi-
librium with transfers.
3.1.3 An example
The tools we have developed can simplify the search for competitive equilibria. Rather than solve all consumers' problems and look for prices that clear markets, we can solve a planner's problem instead. We know that competitive equilibria must be solutions to one such problem. But which one? The one whose solution requires no transfers. This approach was developed by Negishi (1960).
We will illustrate the idea in the context of a simple example. Assume that $h = 2$, that $U_j = \log$ and $\beta_j = \beta \in (0,1)$ for $j = 1,2$ (insert the same remarks as always about the fact that $\log$ is not defined at zero), and that $w^1 = (1, 0, 1, 0, 1, \dots)$ while $w^2 = (0, 1, 0, 1, \dots)$. Competitive equilibria must solve, for some $\alpha_1 \in (0,1)$:

$$\max\; \alpha_1 \sum_{t=0}^{\infty} \beta^t \log(c^1_t) + (1 - \alpha_1) \sum_{t=0}^{\infty} \beta^t \log(c^2_t)$$

subject to:

$$c^1_t + c^2_t = 1 \quad \text{for all } t \ge 0.$$
First-order conditions are, for all $t \ge 0$:

$$\frac{\alpha_1 \beta^t}{c^1_t} = \frac{\alpha_2 \beta^t}{c^2_t} = \pi_t$$

where $\pi_t$ is the Lagrange multiplier associated with date $t$'s resource constraint. This implies, for all $t \ge 0$:

$$c^1_t = \frac{\alpha_1}{1 - \alpha_1}\, c^2_t,$$

or, given the resource constraint in each period, $c^1_t = \alpha_1$ and $c^2_t = 1 - \alpha_1$. Furthermore, "prices" are $\pi_t = \frac{\alpha_1 \beta^t}{c^1_t} = \beta^t$ for all $t \ge 0$.
Here we could take a shortcut. We now know that competitive allocations, like all optimal allocations, give each consumer constant consumption over time. We also know that prices must be $\pi_t = \beta^t$. Let $c^1$ be agent 1's constant consumption. The budget constraint implies that

$$\sum_{t=0}^{+\infty} \beta^t c^1 = \frac{c^1}{1-\beta} = \sum_{t=0}^{+\infty} \beta^t w^1_t = \frac{1}{1-\beta^2},$$

so that $c^1 = \frac{1-\beta}{1-\beta^2} = \frac{1}{1+\beta}$. Similarly, $c^2 = \frac{\beta(1-\beta)}{1-\beta^2} = \frac{\beta}{1+\beta}$ is consumer 2's constant consumption level.
But, to illustrate Negishi’s method let us also take the long route. Given Pareto weights
(α1, 1 − α1), transfers needed to implement the optimal allocation are:
t1 =+∞∑
t=0
βt(α1 − w1t )
=α1
1 − β− 1
1 − β2
and,
t2 =+∞∑
t=0
βt(1 − α1 − w2t )
=1 − α1
1 − β− β
1 − β2
Competitive equilibria correspond to values for α1 that make those transfers zero. Algebra
shows that the only value of α1 that meets this requirement is: α1 = 1−β1−β2 = 1
1+β. And, of
course, we arrive at the same answer.
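The Negishi calculation above is easy to verify numerically. The sketch below uses the example's own endowment pattern and multipliers $\pi_t = \beta^t$; the value $\beta = 0.9$ and the truncation horizon are illustrative choices.

```python
# Verify that the transfer t^1(alpha_1) vanishes exactly at alpha_1 = 1/(1+beta).
beta = 0.9   # an illustrative discount factor
T = 500      # truncation horizon for the infinite sums

w1 = [1.0 if t % 2 == 0 else 0.0 for t in range(T)]  # agent 1's endowments
pi = [beta ** t for t in range(T)]                   # planner multipliers / prices

def transfer1(alpha1):
    # t^1 = sum_t pi_t (alpha_1 - w^1_t): transfer required by agent 1
    return sum(pi[t] * (alpha1 - w1[t]) for t in range(T))

alpha_star = 1.0 / (1.0 + beta)
print(transfer1(alpha_star))   # ~0: no transfer needed at alpha_1 = 1/(1+beta)
```

Since `transfer1` is strictly increasing in its argument, $\alpha_1 = \frac{1}{1+\beta}$ is indeed the only weight requiring no transfers.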
3.1.4 Money
Fiat, unbacked money cannot have any value in the infinitely-lived agent model. To see this, assume that we endow each agent $j = 1,\dots,h$ with quantity $m^j \ge 0$ of unbacked money. Consumer $j$'s budget constraint becomes

$$\sum_{t=0}^{\infty} p_t c^j_t \le m^j + \sum_{t=0}^{\infty} p_t w^j_t.$$

Summing over all consumers gives:

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t c^j_t = \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t + \sum_{j=1}^{h} m^j.$$

But in equilibrium the fact that the resource constraint holds with equality in all periods implies that

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t c^j_t = \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t,$$

hence that

$$\sum_{j=1}^{h} m^j = 0.$$
To summarize, we have established that competitive equilibria are always Pareto optimal in the infinitely-lived agent model and that there is no room for fiat money. Furthermore, standard arguments show that competitive equilibria are generically finite in number (see Kehoe, 1989, for more on this point). All these properties may be violated in overlapping generations models, to which we now turn.
3.2 Overlapping generations
Assume that each period a consumer is born who lives for exactly two periods. The consumer born at date $t$ is endowed with $w_1$ in the first period of his life and $w_2$ in the second. There is also an initial generation, alive at date 0, that lives for one period and has endowment $w_2$ of the consumption good and some amount $m$ of unbacked money.

Let $p_t$ denote the price of date $t$ consumption in units of unbacked money. The initial old eat the largest amount of the consumption good compatible with their budget constraint:

$$p_0 c^{-1}_0 = p_0 w_2 + m.$$

As for consumers born at date $t \ge 0$, denote by $c^t_s$ their consumption in period $s = t, t+1$. They solve

$$\max\; u(c^t_t, c^t_{t+1})$$

subject to

$$p_t c^t_t + p_{t+1} c^t_{t+1} = p_t w_1 + p_{t+1} w_2,$$

where $u$ satisfies the same assumptions as in the previous section.
An Arrow-Debreu equilibrium is a sequence $\{p_t\}_{t=0}^{\infty}$ of prices, an initial money level $m$, a consumption level $c^{-1}_0$ for the initial generation, and consumption profiles $\{(c^t_t, c^t_{t+1})\}_{t=0}^{\infty}$ such that, given prices:

1. $c^{-1}_0$ solves the initial generation's problem (i.e. $c^{-1}_0 = w_2 + \frac{m}{p_0}$);

2. for all $t \ge 0$, $(c^t_t, c^t_{t+1})$ solves generation $t$'s problem;

3. the market for goods clears for all $t \ge 0$: $c^t_t + c^{t-1}_t = w_1 + w_2$.
As before, we could assume instead that trading takes place in a sequential fashion. The date $t$ consumer would then face the following two constraints:

$$c^t_t + m_t = w_1$$

$$c^t_{t+1} = w_2 + m_t(1 + r_{t+1})$$

where $(1 + r_{t+1}) = \frac{p_t}{p_{t+1}}$ and $m_t$ denotes security (or money) holdings. In other words, $m_t$ is a claim that delivers $\frac{p_t}{p_{t+1}}$ worth of the consumption good at date $t+1$ per unit invested at date $t$. Note that these claims are unbacked. Agents are willing to hold positive amounts of those claims only provided they know that they will be able to exchange them for positive quantities of the consumption good in the next period (by trading with agents as yet unborn). These holdings are, in other words, fiat money holdings.4
A sequential market equilibrium is a sequence $\{r_t\}_{t=0}^{\infty}$ of interest rates, an initial money level $m$, a consumption level $c^{-1}_0$ for the initial generation, consumption profiles $\{(c^t_t, c^t_{t+1})\}_{t=0}^{\infty}$, and money holdings $\{m_t\}_{t=0}^{\infty}$ such that, given interest rates:

1. $c^{-1}_0$ solves the initial generation's problem (i.e. $c^{-1}_0 = w_2 + m$);

2. for all $t \ge 0$, $(c^t_t, c^t_{t+1}, m_t)$ solves generation $t$'s (sequential) problem;

3. the market for goods clears for all $t \ge 0$: $c^t_t + c^{t-1}_t = w_1 + w_2$.
The same equivalence result holds as in the infinitely-lived agent case. Henceforth, we
will work with the Arrow-Debreu trading structure for concreteness.
4 Interpretations of negative unbacked security holdings are a bit more convoluted. See Kehoe (1989) for a discussion.
Because $u$ is continuous and strictly concave, a unique solution to the problem solved by date $t$ agents exists (see homework 1). Let $y(p_t, p_{t+1}) = c^t_t(p_t, p_{t+1}) - w_1$ be the excess demand of agents born at date $t$ when young, while $z(p_t, p_{t+1}) = c^t_{t+1}(p_t, p_{t+1}) - w_2$ is their excess demand when old. The initial old have excess demand $z(p_0, m) = \frac{m}{p_0}$ at date 0.

Equilibrium requires that $y(p_0, p_1) + z(p_0, m) = 0$ and that, for all $t \ge 1$, $y(p_t, p_{t+1}) + z(p_{t-1}, p_t) = 0$. It is also easy to see that excess demands are homogeneous of degree zero in prices: they depend only on the price ratio or, in other words, on the interest rate. Furthermore, the fact that all consumers exhaust their budget constraints tells us that, for all $t \ge 0$,

$$p_t\, y(p_t, p_{t+1}) + p_{t+1}\, z(p_t, p_{t+1}) = 0,$$

which is Walras' law.

Computing equilibria can be done by solving the system of market-clearing conditions forward. Fix $\frac{m}{p_0}$. This gives us $z(p_0, m)$, hence, in turn, $y(p_0, p_1)$ from market clearing. Does that tell us uniquely what $p_1$ must be (recall that $p_0 \equiv 1$ by convention)? It does provided $y$ is monotonic in $\frac{p_1}{p_0}$, i.e. provided the gross substitutability assumption is met. (Otherwise, we have several ways to proceed.)

Assuming that gross substitutability holds, we can then get $z(p_0, p_1)$ uniquely, hence, by market clearing, $y(p_1, p_2)$. Proceeding in this fashion gives us a full path for excess demands, hence for prices. (The graphical version of this is figure 16.2 in Kehoe, 1989.)
Autarky is always an equilibrium. To see this, start at $\frac{m}{p_0} = 0$, which, recursively, implies that all excess demands are zero. Under gross substitutability, the constant price ratio that supports that equilibrium is unique. In general, there is another steady state that may entail non-zero values for $\frac{m}{p_0}$, even negative values, in which case it is probably best to think of $-\frac{m}{p_0}$ as a tax on the initial old.

We will treat the case where agents want to transfer resources from the first to the second period of life (the case where supporting autarky in equilibrium involves negative interest rates, that is, the case where the slope of the offer curve looks as drawn in figure 16.2 in Kehoe, 1989).
Assume then that the marginal rate of substitution between consumption when young and consumption when old is less than 1 at the endowment point. That is, assume that $\frac{u_1(w_1, w_2)}{u_2(w_1, w_2)} < 1$. Assume further that the gross substitutability assumption holds, that is, that $c^t_t(p_t, p_{t+1})$ rises with $\frac{p_{t+1}}{p_t}$.

Then there are two steady states: autarky, which corresponds to $\frac{m}{p_0} = 0$, and a steady state with constant prices, i.e. a gross interest rate of one, which entails $\frac{m}{p_0} = z(1,1)$. We will refer to this second steady state as the stationary monetary equilibrium. There is also a continuum of equilibria, one associated with each possible value of $\frac{m}{p_0}$ in $(0, z(1,1))$. In this continuum, the only Pareto optimal equilibrium is the stationary monetary equilibrium. Indeed, all other equilibria involve negative interest rates at all dates. But since one-for-one transfers between generations can always be arranged, negative interest rates are suboptimal for all generations born at dates $t \ge 1$. As for the initial old, a gross interest rate of one corresponds to the highest possible transfer to them.
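Continuing with the assumed log-utility example and endowments $(w_1, w_2) = (2, 1)$, the stationary monetary equilibrium can be computed in closed form: at constant prices the consumer smooths consumption perfectly, and $\frac{m}{p_0}$ equals $z(1,1)$.

```python
# Stationary monetary equilibrium under assumed log utility, (w1, w2) = (2, 1).
w1, w2 = 2.0, 1.0

def demands(q):
    # log utility: spend half of lifetime income w1 + q*w2 in each period
    income = w1 + q * w2
    return income / 2.0, income / (2.0 * q)   # consumption when young, when old

c_young, c_old = demands(1.0)   # q = 1: the stationary monetary equilibrium
m_over_p0 = c_old - w2          # = z(1,1), the real value of money held when old
print(c_young, c_old, m_over_p0)   # perfect smoothing at 1.5 each, m/p0 = 0.5
assert c_young + c_old == w1 + w2  # the goods market clears every period
```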
Another way to see the suboptimality of all equilibria except the monetary steady state is to apply the general criterion of Balasko and Shell (1980). According to that criterion, equilibria are optimal if and only if

$$\sum_{t=1}^{\infty} \frac{1}{p_t} = +\infty.$$

All equilibria other than the monetary steady state converge to a steady state such that $\frac{p_t}{p_{t+1}} = 1 + r < 1$, where $r$ is the autarkic interest rate, a negative number by assumption. So, as $t$ grows large, $p_t$ behaves like a geometric sequence of modulus greater than 1, hence $\sum_{t=1}^{\infty} \frac{1}{p_t} < +\infty$.
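The criterion is easy to see numerically. The sketch below uses the autarkic price ratio $q = 2$ from the assumed $(w_1, w_2) = (2, 1)$ log-utility example: prices explode geometrically, so the partial sums of $\sum_t 1/p_t$ stay bounded.

```python
# Partial sums of sum_t 1/p_t along a path with p_{t+1}/p_t = 2 (autarkic
# ratio of the assumed (2, 1) example): the series converges, so by the
# Balasko-Shell criterion such an equilibrium is not Pareto optimal.
q = 2.0                      # price growth factor, p_0 = 1
p, partial = 1.0, 0.0
for t in range(1, 200):
    p *= q                   # prices grow geometrically
    partial += 1.0 / p
print(partial)               # bounded above by 1, since sum_{t>=1} 2^{-t} = 1

# By contrast, at the monetary steady state p_t = 1 for all t, so the partial
# sums grow without bound and the optimality criterion is satisfied.
```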
To summarize, in overlapping generations models, equilibria need not be optimal, there is room for money, and many equilibria usually exist.

This effectively completes the main material of this chapter. The following section provides the result we need to complete the proof of proposition 3.
3.3 Separating hyperplane theorem
In the plane, drawing a few pictures should convince you that one can always draw a straight line between two disjoint convex sets. You should also be able to convince yourself that convexity is necessary. This remains true in higher-dimensional real linear spaces. To see this, we need a couple of definitions.
Let $X$ be a real linear space. A real linear functional on $X$ is a function $\varphi : X \to \mathbb{R}$ that satisfies, for all $(x_1, x_2) \in X^2$ and $(\alpha, \beta) \in \mathbb{R}^2$:

$$\varphi(\alpha x_1 + \beta x_2) = \alpha\, \varphi(x_1) + \beta\, \varphi(x_2).$$

A hyperplane in $X$ is a level set of a linear functional, i.e. a set of points $x \in X$ satisfying $\varphi(x) = c$, where $c$ is a real number. In the plane, hyperplanes are straight lines.

When $X$ is equipped with a metric $d$, we call $\varphi$ a continuous linear functional if it is continuous in that metric. In Euclidean spaces, linear functionals are always continuous. But that is not true in general. (Now would be a good time to start reading chapter 15 in Stokey, Lucas and Prescott. It is an intimidating chapter, but it is nothing but a generalization of the results we have established in this chapter to arbitrary spaces.)
The theorem we need is known as the Hahn-Banach theorem, or the separating hyperplane
theorem.
Theorem 4. (Hahn-Banach) Let $S$ be a linear space equipped with a metric. Let $A, B \subset S$ be convex sets such that $A$ does not contain any interior point of $B$. Assume either that $S$ is finite dimensional or that $B$ has an interior point. Then there exist a continuous linear functional $\varphi$, not identically zero, and a constant $c$ such that:

$$\varphi(x) \le c \le \varphi(y) \quad \text{for all } x \in A \text{ and all } y \in B.$$

Note that if $S = \mathbb{R}^n$ then $\varphi(x) = \sum_{i=1}^{n} a_i x_i$ for some non-zero vector $a$ of real numbers.
We can now prove proposition 3. Let $F$ be the set of all feasible allocations $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$, and for $c \in F$ and $j = 1,\dots,h$ write $V_j(c) = \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$. The utility feasibility set is:

$$A = \{V \in \mathbb{R}^h : \exists c \in F \text{ such that } V_j \le V_j(c) \text{ for all } j\}.$$

That is, $A$ is the set of utility levels that the planner can implement. One easily shows that $A$ is convex (do it).

Now let $c^*$ be Pareto optimal and let $V^* = V(c^*)$. Define $B = \{V \in \mathbb{R}^h : V \ge V^* \text{ and } V \ne V^*\}$. $B$ is the set of utility assignments that give every consumer at least as much utility as $V^*$, strictly more for at least one consumer. Again, $B$ is convex. Also, $A$ does not contain any interior point of $B$, since $c^*$ is Pareto optimal. So one can apply the Hahn-Banach theorem to the sets $A$ and $B$, and this turns out to be exactly what we need to complete the proof of the proposition, as you will show in homework 2.
3.4 Problems
Problem 1
Consider a pure-endowment economy with 2 infinitely-lived agents where, for $j = 1, 2$, $\beta_j = \beta \in (0,1)$ and $U_j = \log$. Endowments are $w^1 = (1, 3, 1, 3, 1, \dots)$ and $w^2 = (3, 1, 3, 1, \dots)$.
1. Define an Arrow-Debreu equilibrium in this environment.
2. Show that a unique Arrow-Debreu equilibrium exists (no need to invoke Weierstrass' theorem here) and find it.
3. Define a Pareto optimal allocation.
4. Show that the allocation $(c^1_t, c^2_t) = (2, 2)$ for all $t$ is Pareto optimal. (Use proposition 3 or a direct argument.)
5. Explain how to implement that allocation as an Arrow-Debreu equilibrium with transfers.
Problem 2
Consider an overlapping generations economy where the representative consumer has utility function $u(c_1, c_2) = \log c_1 + \log c_2$ for all $(c_1, c_2) > (0, 0)$ and endowment $(w_1, w_2) > (0, 0)$. The initial old are endowed with quantity $w_2$ of the good and some amount $m$ of the numeraire, and have strictly monotonic preferences.
1. Define an Arrow-Debreu equilibrium in this economy.
2. Provide a condition under which an equilibrium exists where money has positive value.
3. There are two equilibria with constant inflation under that condition. What are they? Are they both Pareto optimal? Explain.
4. Calculate excess demand functions.
5. Assume that $(w_1, w_2) = (2, 1)$. Use Matlab to draw the offer curve and draw the equilibrium path of both excess demands when the excess demand of the initial old is 0.4.
Problem 3
1. Write out the necessity part of proposition 3 in detail. (For instance, when you invoke the Hahn-Banach theorem, check that its conditions are met.)
2. Prove that as long as all endowment sequences are bounded, the social planner's problem defined in proposition 3 has a solution for any possible set of weights.
Problem 4
Consider an economy where time is discrete and where each period a two-period-lived consumer is born. Consumers born at date $t \ge 0$ are endowed with quantity $w_1 > 0$ of the consumption good when young and $w_2 > 0$ when old. They order consumption profiles $(c^t_t, c^t_{t+1})$ according to a utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ that is continuously differentiable on $\mathbb{R}_{++}$, strictly concave, and strictly increasing in both arguments.

At date 0 there is an initial old generation that is endowed with quantity $w_2$ of the good and $m \ge 0$ of money. They want to consume as much as possible.

1. Define an Arrow-Debreu equilibrium.

2. Assume that $\frac{u_1(w_1, w_2)}{u_2(w_1, w_2)} > 1$. Show that the Arrow-Debreu equilibrium where $(c^t_t, c^t_{t+1}) = (w_1, w_2)$ for all $t \ge 0$ and $m = 0$ is Pareto optimal.

3. Assume that $u(c^t_t, c^t_{t+1}) = \sqrt{c^t_t} + \frac{1}{2}\sqrt{c^t_{t+1}}$ for all $t \ge 0$ and that $(w_1, w_2) = (9, 1)$. If there is a stationary monetary equilibrium in this economy, find it. If there is no stationary monetary equilibrium, explain why.
Problem 5
Consider a discrete-time environment populated by two infinitely-lived consumers. There is one consumption good and no production. Consumer $j \in \{1, 2\}$ is endowed with quantity $w^j_t \ge 0$ of the consumption good at date $t$ and assigns consumption profiles $\{c^j_t\}_{t=0}^{+\infty}$ utility $\sum_{t=0}^{+\infty} \beta_j^t U_j(c^j_t)$, where $\beta_j \in (0,1)$ and $U_j : [0, +\infty) \to \mathbb{R}$ is strictly concave, continuously differentiable on $\mathbb{R}_{++}$, and strictly increasing on $\mathbb{R}_+$ with $\lim_{c \to 0} U_j'(c) = +\infty$.

1. Assume that $\beta_1 = \beta_2 = \beta$ and that $w^1 = (0, 2, 0, 2, \dots)$ while $w^2 = (2, 0, 2, 0, \dots)$. Show that in any Arrow-Debreu equilibrium both consumers choose constant consumption profiles and that prices satisfy $p_{t+1} = \beta p_t$ for all $t \ge 0$. Using that information, find the unique Arrow-Debreu equilibrium in this case.

2. Assume now that $\beta_1 < \beta_2$ and that $w^1_t = w^2_t = 0.5$ for all $t \ge 0$. Show that any Arrow-Debreu equilibrium is such that $\lim_{t \to +\infty} c^1_t = 0$ while $\lim_{t \to +\infty} c^2_t = 1$.
Chapter 4
Deterministic Dynamic Programming
4.1 Principle of optimality
Many deterministic dynamic optimization problems in economics are special cases of the following general class of stationary control problems. At date $t$ there is a vector $x_t \in X$ that describes the state of the system, where $X \subset \mathbb{R}^n$ for some integer $n$. An agent can select a vector of actions $a_t$ drawn from a set $\Gamma(x_t) \subset Y \subset \mathbb{R}^m$, $m \in \mathbb{N}$, that depends on the state of the system, where we assume that $\Gamma(x_t) \ne \emptyset$ for all $x_t \in X$.

Depending on the current state and the action selected, the state in period $t+1$ is given by $g(x_t, a_t)$. The function $g$ is called the law of motion.

In period $t$, given state $x_t \in X$ and action $a_t \in Y$, the agent earns reward $R(x_t, a_t)$. He orders paths $\{x_t, a_t\}_{t=0}^{+\infty}$ of states and actions according to the utility function $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$, where $\beta \in (0,1)$. Given an initial value $x_0$ of the state, the agent solves:

$$\sup \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$$

subject to:

$$a_t \in \Gamma(x_t) \quad \text{for all } t \ge 0$$

$$x_{t+1} = g(x_t, a_t) \quad \text{for all } t \ge 0$$
This problem is called stationary because the set $\{X, Y, \Gamma, g, \beta, R\}$ of objects that fully defines it does not depend on time. As in Stokey, Lucas and Prescott (SLP), we will call this version of the problem the sequential problem and refer to it as problem (SP). We will also assume throughout this chapter that, for all $x_0 \in X$ and all sequences that satisfy the problem's constraints, $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$ is well defined.
We replace for the moment the standard max operator with the sup operator to allow for the case where a solution does not exist.1 For any set $X$ of real numbers, $\sup X \in [-\infty, +\infty]$ is $X$'s least upper bound, i.e. the unique value such that:

1. $x \in X \implies x \le \sup X$, and

2. $y < \sup X \implies \exists x \in X$ such that $x > y$.

All sets of real numbers have a least upper bound in $[-\infty, +\infty]$. Furthermore, the real numbers satisfy the least upper bound property: any bounded set of real numbers has a finite least upper bound. This property, in fact, characterizes the real numbers, which can be constructed as the set of all least upper bounds of sets of rational numbers.
In principle, the supremum in (SP) could be +∞ or −∞. For simplicity, we will rule
out this possibility and concentrate our attention on the bounded returns case where R is
bounded above and below. In most cases that economists deal with, R is either explicitly
or implicitly bounded. SLP treat the general case and you should read all the details there.
Note, importantly, that assuming that R is bounded does not suffice to guarantee that a
solution to (SP) exists. As you know by now, we need continuity and compactness conditions.
More on that soon.
Define $\Pi(x_0)$ to be the set of action sequences that are feasible for the agent given initial state $x_0 \in X$. That is,

$$\Pi(x_0) = \left\{ \{a_t\}_{t=0}^{+\infty} : \exists \{x_t\}_{t=1}^{+\infty} \text{ such that, for all } t \ge 0,\ a_t \in \Gamma(x_t) \text{ and } x_{t+1} = g(x_t, a_t) \right\}.$$

1 Recall from chapter 2 that a solution to (SP) exists, for instance, if $R$ is continuous and bounded and if $\Gamma(x_t)$ is compact for all $x_t \in X$. The assumption that $R$ is bounded can be significantly relaxed in many cases. It is sufficient, for instance, to assume that $\Gamma$ does not allow the state to grow so fast that the reward grows faster than $\beta^{-1}$ forever (see proposition 1 in Jones and Manuelli, 1990).
An element of $\Pi(x_0)$ is called a feasible plan. With this notation, let

$$v^*(x_0) = \sup_{\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)} \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$$

where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$. This function gives the supremum in problem (SP) for any possible value of the initial state.

By definition of the supremum, $v^*(x_0)$ is the only value that satisfies:

1. $v^*(x_0) \ge \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$ for all $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$;

2. $\forall \varepsilon > 0$, $\exists \{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$ such that $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) > v^*(x_0) - \varepsilon$, where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$.
One way to build a feasible plan from any initial state $x_0$ is to choose some action $a_0 \in \Gamma(x_0)$ and then choose a continuation plan $\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))$. Doing so yields utility

$$R(x_0, a_0) + \sum_{t=0}^{+\infty} \beta^{t+1} R(x_{t+1}, a_{t+1}) = R(x_0, a_0) + \beta \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1})$$

where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$. Since this is true for all possible continuation plans $\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))$,

$$v^*(x_0) \ge R(x_0, a_0) + \beta \sup_{\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))} \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) = R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$

But the choice of $a_0$ was also arbitrary, so we get:

$$v^*(x_0) \ge \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)). \qquad (4.1.1)$$
We now want to argue that (4.1.1) holds as an equality for all $x_0 \in X$, which is Theorem 4.2 in SLP and is known as Bellman's principle of optimality. We provide a quick proof here:

Proposition 5. For all $x_0 \in X$,

$$v^*(x_0) = \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$
Proof. We know that $v^*(x_0) \ge \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0))$. Given $\varepsilon > 0$, find a plan $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$ that comes within $\varepsilon$ of $v^*(x_0)$. Since that plan must be feasible, $\{a_t\}_{t=1}^{+\infty} \in \Pi(g(x_0, a_0))$, so that

$$\sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) \le v^*(g(x_0, a_0)).$$

But then,

$$v^*(x_0) - \varepsilon < \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) = R(x_0, a_0) + \beta \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) \le R(x_0, a_0) + \beta v^*(g(x_0, a_0)) \le \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$

Since $\varepsilon$ is arbitrary, this implies $v^*(x_0) \le \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0))$.
To check that you understand the argument above you should ask yourself: why does
this proof differ from SLP’s? Where did we use the assumption that R is bounded?
The Bellman equation is a recursive functional equation. It is a functional equation
because it must hold for all x0 ∈ X hence defines a condition which function v∗ must meet.
It is recursive because it defines v∗ in terms of itself. This use of language is a bit premature
however since we have yet to show that the Bellman equation defines anything, in other
words that a unique function (v∗) satisfies it. We will now provide conditions under which
the Bellman equation does define v∗.
Notice that the equation defines an operator on functions. To see this, for any real-valued function $h$, let

$$Th(x) = \sup_{a \in \Gamma(x)} R(x, a) + \beta h(g(x, a)) \quad \text{for all } x \in X. \qquad (4.1.2)$$

A function $v$ on $X$ satisfies the Bellman equation if $v = Tv$. We have shown that $v^* = Tv^*$.
Is there any other solution to the Bellman equation? Not among bounded functions:
Proposition 6. If v satisfies the Bellman equation and v is bounded then v = v∗.
Proof. Assume that $v = Tv$. Then, for any $x_0$ and $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$,

$$v(x_0) \ge R(x_0, a_0) + \beta v(g(x_0, a_0)) \ge R(x_0, a_0) + \beta R(x_1, a_1) + \beta^2 v(g(x_1, a_1))$$

where $x_1 = g(x_0, a_0)$. Proceeding recursively shows that, for all $n \ge 0$,

$$v(x_0) \ge \sum_{t=0}^{n} \beta^t R(x_t, a_t) + \beta^{n+1} v(g(x_n, a_n))$$

where, for all $n \ge 0$, $x_{n+1} = g(x_n, a_n)$. As $n$ becomes large, the last term becomes vanishingly small because $v$ is bounded, which implies that

$$v(x_0) \ge \sup_{\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)} \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) = v^*(x_0).$$
To get the opposite inequality, fix $\varepsilon > 0$ and let $\{\delta_t\}_{t=0}^{+\infty}$ be such that $\delta_t > 0$ for all $t$ and $\sum_{t=0}^{+\infty} \delta_t < \varepsilon$. Pick $a_0$ so that $v(x_0) \le R(x_0, a_0) + \beta v(g(x_0, a_0)) + \delta_0$. Then, for all $t > 0$ and given $x_t = g(x_{t-1}, a_{t-1})$, pick $a_t$ so that $v(x_t) \le R(x_t, a_t) + \beta v(g(x_t, a_t)) + \delta_t$. By construction, it now follows that, for all $T > 0$,

$$v(x_0) \le \sum_{t=0}^{T} \beta^t R(x_t, a_t) + \beta^{T+1} v(g(x_T, a_T)) + \sum_{t=0}^{T} \delta_t.$$

Taking limits and using the fact that $v$ is bounded then gives

$$v(x_0) \le \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) + \varepsilon \le v^*(x_0) + \varepsilon.$$

But since $\varepsilon$ was arbitrary, this implies $v(x_0) \le v^*(x_0)$ for all $x_0$, and we are done.
We have thus established that in the space B(X) of bounded functions, v ∈ B(X) is the
supremum in (SP) if and only if v satisfies the Bellman equation. This fact enables one to
rely on the beautiful machinery that is dynamic programming. Studying the properties of v∗
and optimal policies becomes amazingly simpler than it would be without those tools. The
single best illustration of what dynamic programming can do is Lucas and Prescott (1971).
The tools they use there, which we fully explore in the next section, are so powerful that
oodles of macro papers follow almost exactly the path Lucas and Prescott traced. A great
example is Hopenhayn (1992).
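The machinery can be put to work immediately by iterating the operator $T$ from (4.1.2). The sketch below applies it to a small savings problem on a finite grid; the grid, production function, and parameters are illustrative choices, not from the notes, and the finite grid keeps rewards bounded as the chapter assumes.

```python
# Value-function iteration: repeatedly apply the Bellman operator T of (4.1.2).
import math

beta = 0.9
grid = [0.1 * k for k in range(1, 31)]   # capital grid, the state space X
n = len(grid)

def f(x):
    # assumed production technology (illustrative, not from the notes)
    return x ** 0.3

# Actions: next period's capital x' on the grid with consumption f(x) - x' > 0;
# reward R(x, x') = log(f(x) - x'), bounded on this finite grid.
def T_op(v):
    return [max(math.log(f(x) - xp) + beta * v[j]
                for j, xp in enumerate(grid) if f(x) - xp > 1e-10)
            for x in grid]

v = [0.0] * n                # initial guess v_0 = 0
for _ in range(500):         # iterate v_{k+1} = T v_k
    v = T_op(v)

# v now approximates the unique bounded fixed point v* of the Bellman equation
print(v[0], v[-1])
```

Section 4.2's contraction mapping theorem is what guarantees that these iterates converge to $v^*$ at geometric rate $\beta$ from any bounded starting guess.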
4.2 Tools
Our goal for the remainder of this chapter is to use the principle of optimality to say as much as we can about solutions to stationary control problems. To that end, we need two key tools: the theorem of the maximum and the contraction mapping theorem.
4.2.1 Banach spaces
Banach spaces are complete, normed linear spaces.
We already know from chapter 2 what a linear space is. Let $X$ be a real linear space (if you don't remember what "real" refers to here, look it up). A norm on $X$ is a function $\|\cdot\| : X \to \mathbb{R}_+$ that satisfies, for all $(x_1, x_2) \in X \times X$ and $\alpha \in \mathbb{R}$:

1. $\|x_1\| \ge 0$, with equality if and only if $x_1 = 0$;

2. $\|\alpha x_1\| = |\alpha| \|x_1\|$;

3. $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$.

This should remind you a lot of the way we defined a metric. In fact, each norm induces a metric: $d(x_1, x_2) = \|x_1 - x_2\|$ for all $(x_1, x_2) \in X \times X$. You should verify, as a useful and easy exercise, that $d$ is in fact a metric.
Many examples of normed linear (or vector) spaces are provided in SLP (Exercise 3.4).
We will jump right into the space that interests us most in this chapter. Let (X, d) be
a metric space and let B(X) be the set of all bounded real functions on X. To make
B(X) a linear space, we need a notion of addition and a notion of scalar multiplication.
For any two functions g and h in B(X) and any scalar α ∈ IR, define g + h ∈ B(X) by
(g +h)(x) = g(x)+h(x) and αg ∈ B(X) by (αg)(x) = αg(x) for all x ∈ X. That is, addition
and scalar multiplication are defined in the standard pointwise fashion. It is easy to check
that B(X) together with those two operations is a real linear space.
Now we need a norm. For g in B(X), define ‖g‖ = supx∈X |g(x)|. This norm is called
the supnorm and the topology it induces on B(X) is called the supnorm topology. It is
paramount that you verify that the supnorm is in fact a norm. Take notice also of why we
need to restrict our attention to bounded functions. Otherwise, ‖g‖ would not necessarily
be finite.
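To make the supnorm concrete, here is a tiny numerical sketch (my own illustration, with arbitrary functions; a grid only approximates the supremum):

```python
# Approximate the supnorm distance between two bounded functions on X = [0, 1]
# by taking the max of |g - h| over a fine grid. This only approximates the
# true supremum, but here the maximizer x = 1/2 happens to lie on the grid.
import numpy as np

x = np.linspace(0.0, 1.0, 10001)   # fine grid on [0, 1]
g = x ** 2                          # g(x) = x^2, bounded on [0, 1]
h = x                               # h(x) = x, bounded on [0, 1]

# sup_x |g(x) - h(x)| = max of x - x^2, attained at x = 1/2 with value 1/4
sup_dist = np.max(np.abs(g - h))
```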
The one term that remains to be defined is “complete”. A sequence {xn} ⊂ (X, d) is called a
Cauchy sequence if limn 7→∞ supm≥n d(xm, xn) = 0. Put another way, a sequence is Cauchy if
for every ε > 0 there exists N large enough such that m, n ≥ N =⇒ d(xm, xn) ≤ ε. Put yet
another way, a sequence is Cauchy if for every ε > 0 the sequence is eventually contained
in a ball of radius ε. Thinking about this a bit should make it clear that for real sequences,
Cauchy sequences must converge. Subsets of metric spaces with that property are called
complete: the set contains the limit point of all its Cauchy sequences.
Here’s a result that we will need.
Proposition 7. Let (X, d) be a metric space. B(X) together with the supnorm is a Banach
space.
Proof. B(X) is a normed linear space with the operations we introduced above. We only
need to show that it is complete. Let {gn} be a Cauchy sequence in B(X). The sequence
{gn(x)} of real numbers is Cauchy for every x ∈ X; therefore it converges to some real
number g(x). Since we can do this for all x, this gives a candidate function for {gn} to
converge to.
Now we need to argue that {gn} converges to g in the supnorm. Fix ε > 0 and pick
N > 0 such that ‖gn − gm‖ < ε/2 whenever m, n ≥ N. Then, for any given x ∈ X and any n ≥ N,

|gn(x) − g(x)| ≤ |gn(x) − gm(x)| + |gm(x) − g(x)| ≤ ε/2 + |gm(x) − g(x)| < ε

for m ≥ N large enough, since {gn} converges pointwise to g.
It only remains to show that g is bounded. You should do it.
Now here’s a very useful, simple result.
Proposition 8. Let (X, d) be a complete metric space and X ′ be a closed subset of X. Then
(X ′, d) is a complete metric space.
Proof. Let {xn} be a Cauchy sequence in X ′. Since (X, d) is complete, {xn} converges to an
element of X. Since X ′ is closed, that element is in X ′ and we are done.
Equipped with this, we can now show that C(X), the space of continuous, bounded real
functions on X equipped with the supnorm, is a Banach space.
Proposition 9. Let (X, d) be a metric space. C(X) together with the supnorm is a Banach
space.
Proof. Everything but completeness is obvious. By the previous result, we only need to
show that C(X) is a closed subset of B(X). Let {gn} ⊂ C(X) converge to g in the supnorm
and, for x ∈ X, take any sequence {xk} that converges to x. Then, for all integers k and n,
|g(xk) − g(x)| ≤ |g(xk) − gn(xk)| + |gn(xk) − gn(x)| + |gn(x) − g(x)|. By picking n large
enough, we can make the first and last terms of the right-hand side as small as we want.
Then, because gn is continuous, we can make the middle term as small as desired as well by
letting k grow large. Hence g is continuous, hence C(X) is closed, and we are done.
4.2.2 Contraction mapping theorem
One big reason why Banach spaces are useful is the contraction mapping theorem. Let g
be a function from a metric space (X, d) to itself. It is called a contraction mapping with
modulus β < 1 if for all (x1, x2) ∈ X × X, d(g(x1), g(x2)) ≤ βd(x1, x2). A fixed point of g is
an element x ∈ X such that x = g(x).
Theorem 5. Let g be a contraction mapping on (X, d) with modulus β < 1. If (X, d) is
complete, then g has a unique fixed point x. Furthermore, for all x0 ∈ X, d(x, g^n(x0)) ≤
β^n d(x, x0).
We omit the proof here because it is tedious, but you should read it in SLP and understand
it. The proof consists of starting from any point of X and applying g repeatedly to that
point. The resulting set of points forms a Cauchy sequence hence converges. Because g is
continuous (very strongly in fact: it is Lipschitz continuous), the limit point is a fixed point.
Furthermore, it is clear that the iterative procedure converges there at geometric rate β.
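Here is a minimal numerical sketch of that geometric convergence (my own toy example, not from SLP): iterate the contraction g(x) = 0.5x + 1 on the real line, whose modulus is β = 0.5 and whose fixed point is x = 2.

```python
# Iterate a contraction mapping and verify the geometric bound of Theorem 5:
# d(x*, g^n(x0)) <= beta^n * d(x*, x0).
def g(x):
    return 0.5 * x + 1.0        # contraction on IR with modulus beta = 0.5

beta = 0.5
x_star = 2.0                    # unique fixed point: 2 = 0.5 * 2 + 1

x = 10.0                        # arbitrary starting point x0
initial_error = abs(x - x_star)
errors = []
for n in range(20):             # apply g repeatedly
    x = g(x)
    errors.append(abs(x - x_star))

# Check the bound error_n <= beta^n * error_0 at every iteration.
geometric = all(errors[n] <= beta ** (n + 1) * initial_error + 1e-12
                for n in range(20))
```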
Here’s a trivial consequence of the previous result that ranks among the most useful
results in recursive methods. (You should prove it.)
Corollary 1. Let g be a contraction mapping with modulus β < 1 on a complete metric
space (X, d). Let X ′ be a closed subset of X. If g(X ′) ⊂ X ′ then g’s fixed point is in X ′.
Here is why it is useful. Assume that we wish to show that the fixed point of a contraction
mapping g satisfies property P . One way to do this is to show that the set of points that
satisfy P is closed and that g(x) satisfies P whenever x does. Two steps and we’re done.
Showing directly that a particular mapping is a contraction can be tough. Blackwell
provided a set of sufficient conditions that one can use for that purpose.
Theorem 6. Let X be a subset of IRn. Assume that T : B(X) 7→ B(X) satisfies:
1. (monotonicity) For all f, g ∈ B(X) such that f(x) ≤ g(x) ∀x ∈ X, Tf(x) ≤ Tg(x)
∀x ∈ X,
2. (discounting) There exists β < 1 such that for all f ∈ B(X), a ≥ 0 and x ∈ X, T (f + a)(x) ≤
Tf(x) + βa (where (f + a)(x) means f(x) + a for all x).
Then T is a contraction mapping with modulus β.
In the next section, we will apply this result to the Bellman mapping defined in 4.1.2 to
show that it is a contraction. Its unique fixed point v∗ can then be computed by repeated
iteration on T , a procedure called value function iteration.
4.2.3 Theorem of the Maximum
It is often useful to characterize how the value and policy functions associated with a dynamic
control problem vary with the parameters of the problem, say β or any parameter in the
specific functional forms that define R, g or Γ. For this we need yet another amazingly
useful, general result called the Theorem of the Maximum.
Again, we need to invest in a bit of structure. A correspondence h from set X to set Y
is a function2 that associates with each x ∈ X a non-empty subset h(x) of Y . When X and
Y are metric spaces, we can define two notions of continuity for correspondences, both of
which imply continuity in the standard sense when h is single-valued (hence happens to be
a function in the standard sense.)
2 From X to 2^Y, the set of all subsets of Y.
A correspondence h on X is called compact-valued if h(x) is compact for all x ∈ X. It is
called convex-valued if h(x) is convex for all x ∈ X.
A correspondence h from a metric space (X, dX) to a metric space (Y, dY ) is upper-
hemicontinuous (u.h.c) at x ∈ X if for any open subset O of Y such that h(x) ⊂ O there
exists a neighborhood V of x such that h(V ) ⊂ O. The correspondence is called u.h.c on X
if it is u.h.c at x for all x ∈ X.
When h happens to be compact-valued, one can establish upper-hemicontinuity using the
following fact. If h is compact-valued on X, then it is u.h.c at x if for any sequence {xn} in
X that converges to x and for every sequence {yn} in Y with yn ∈ h(xn) for all n, {yn} has
a convergent subsequence that converges to a point in h(x). (See Hildenbrand and Kirman,
1988, p. 262, for a proof.)
Next, the correspondence h is called lower-hemicontinuous (l.h.c) at x ∈ X if for any
open subset O of Y such that h(x) ∩ O ≠ ∅, there exists a neighborhood V of x such that
h(x′) ∩ O ≠ ∅ for all x′ in V . The correspondence is called l.h.c on X if it is l.h.c at x for all
x ∈ X.
There is once again a sequential definition of lower-hemicontinuity (which, notice, requires
no compactness assumption). The correspondence h is l.h.c at x if for any sequence {xn} in X
that converges to x ∈ X and for any y ∈ h(x), there exists a sequence {yn} that converges
to y with yn ∈ h(xn) for all n.
Finally, a correspondence is continuous if it is both u.h.c and l.h.c.
Also note that in order to define these notions, one only needs to know which sets are
open: they are topological notions. See Hildenbrand and Kirman (1988) for a great treatment
of everything of importance pertaining to correspondences. You should also make sure that
you understand these notions fully. Homework 3 will give you some practice but you should
work out as many exercises as possible in SLP’s chapter 3.
For our purpose, the key result we need is Berge’s Theorem of the maximum.
Theorem 7. Let X ⊂ IRn and Y ⊂ IRm, let f : X×Y 7→ IR be a continuous function and let
Γ : X 7→ Y be a continuous, compact-valued correspondence. Then, the function h : X 7→ IR
defined by h(x) = maxy∈Γ(x) f(x, y) is continuous, and the correspondence π : X 7→ Y defined
by π(x) = arg maxy∈Γ(x) f(x, y) is non-empty, compact-valued and u.h.c.
You should read and understand the proof of this result in SLP. The bottom line for our
purposes is that in a maximization problem that is parameterized by a list x of objects, the
value of the problem and the set of solutions vary upper-hemicontinuously with parameters
as long as the objective function and the choice set are continuous.
Note that this result contains Weierstrass’ theorem (it’s the “non-empty” part of the last
sentence). Furthermore, observe that if f happens to be strictly concave and Γ happens to be
convex, we know from chapter 1 that π is single-valued. In that case, it is then a continuous
function. Under those stronger assumptions, we can get a stronger continuity result. In
the set-up of the theorem of the maximum, consider a sequence of strictly concave objective
functions {fn} that converge to f in the supnorm, and assume that f is strictly concave as
well.3 Letting πn and π be the corresponding policy functions, a natural question to ask
is whether {πn} converges to π in some sense. We need two more definitions to state the
result we need.
We say that {πn} converges to π pointwise if for all x ∈ X, |πn(x) − π(x)| converges to
zero as n grows large. We say that {πn} converges to π uniformly if {πn} converges to π in
the supnorm. Uniform convergence implies pointwise convergence, but the converse is not
true. (Find a counter-example.)
Theorem 8. In the set-up described in the previous two paragraphs, {πn} converges to π
pointwise. If in addition X is compact, {πn} converges to π uniformly.
This is theorem 3.8 in SLP. Now, we’re in business.
4.3 Characteristics of the value function
The Bellman equation says that v∗ is the fixed point of an operator T on the space B(X)
of bounded functions defined by equation (4.1.2). We begin by establishing that T is a
contraction mapping on B(X).
Lemma 1. The operator T defined by equation (4.1.2) is a contraction on B(X).
3 Functional analysis is tricky and one should never assume that something is true unless one can prove it, however intuitive a particular result appears. The fact that all functions in {fn} are strictly concave does not imply that f is, even though {fn} converges to f in a very strong sense. It implies that f is concave, however. See homework 3.
Proof. Since R is bounded, Tv is bounded if v is. Therefore, T does map B(X) into itself. To
verify that T is a contraction, we will check that Blackwell’s sufficient conditions are met. If,
for w, v ∈ B(X), w(x) ≥ v(x) for all x, then R(x, a) + βw(g(x, a)) ≥ R(x, a) + βv(g(x, a)) for
all a ∈ Γ(x) so that (taking sups) Tw(x) ≥ Tv(x) for all x ∈ X; T satisfies monotonicity.
Turning to discounting, for all (x, a) ∈ X × Y and any constant c ≥ 0 we have that
R(x, a) + β(v(g(x, a)) + c) = R(x, a) + βv(g(x, a)) + βc, so that (taking sups) T (v + c)(x) = Tv(x) + βc. Since β < 1, T
satisfies discounting, and it is therefore a contraction with modulus β.
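As a sanity check of this lemma, one can discretize a control problem and verify the contraction inequality numerically. The sketch below is my own illustration, not part of the text: a Ramsey-style Bellman operator with log utility and full depreciation on an arbitrary capital grid, applied to two arbitrary bounded functions.

```python
# Verify numerically that the Bellman operator satisfies
# ||Tw - Tv|| <= beta * ||w - v|| in the supnorm.
# Parameters and the grid are arbitrary choices for illustration.
import numpy as np

beta, alpha, A = 0.95, 0.3, 1.0
grid = np.linspace(0.05, 2.0, 60)        # capital grid; delta = 1 for simplicity

def T(v):
    """Tv(k) = max over grid k' of log(A*k**alpha - k') + beta * v(k')."""
    Tv = np.empty_like(v)
    for i, k in enumerate(grid):
        c = A * k ** alpha - grid                        # implied consumption
        obj = np.where(c > 0,
                       np.log(np.maximum(c, 1e-300)) + beta * v,
                       -np.inf)                           # infeasible choices
        Tv[i] = obj.max()
    return Tv

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, grid.size)    # two arbitrary bounded "value functions"
v = rng.uniform(-1.0, 1.0, grid.size)
lhs = np.max(np.abs(T(w) - T(v)))        # supnorm distance after applying T
rhs = beta * np.max(np.abs(w - v))       # beta times the original distance
```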
This implies that T has a unique fixed point. (We already knew that: we showed directly
earlier that v∗ is the only bounded function that satisfies the Bellman equation. Notice,
however, how easy things are once we have all the machinery of functional analysis at hand.)
It also implies that this fixed point v∗ can be computed by iterating on T starting from any initial
guess, a procedure called value function iteration. Before seeing in detail how value function
iteration works, it is useful to say as much as we possibly can about v∗ and optimal plans.
So, what else can we say about v∗? Dynamic programming tells us that v∗ inherits any
property which T preserves as long as the set of functions with that property is closed in the
supnorm topology. Here’s a first illustration:
Proposition 10. Assume that R and g rise with x and that Γ is monotone in the sense that
Γ(x) ⊂ Γ(x′) whenever x′ ≥ x. Then v∗ is an increasing function.
Proof. Assume that v is an increasing function. Then R(x, a) + βv(g(x, a)) rises with x
for all a, since R and g rise with x and v is increasing by assumption. Since the set of feasible actions does not
decrease when x rises, taking sups then shows that Tv is increasing as well. So T preserves
monotonicity. Because the set of bounded increasing functions is closed under the supnorm
(see homework 3), the unique fixed point of T is increasing as well.
To be able to say more (by appealing to the Theorem of the Maximum), let us now
assume that:
Assumption 1. R, Γ and g are continuous and monotonically increasing in x and X is
compact.
Then,
Proposition 11. Under assumption 1, v∗ is a continuous, increasing function. Further-
more, if R is strictly increasing in x, so is v∗.
Proof. Under assumption 1 and by the theorem of the maximum, T preserves continuity
(elaborate.). Since C(X) is a Banach space, the fixed point of T must be in C(X) as well.
Henceforth then, we can restrict our search for fixed points to C(X). Under that restriction,
Weierstrass’ theorem implies that a solution to the maximization problem that defines T
exists and we may replace the sup operator with a standard max operator. For x ∈ X,
let π(x) = arg maxa∈Γ(x) R(x, a) + βv∗(g(x, a)) and pick an element a ∈ π(x). Then, if R
increases strictly in its first argument, x < x′ for any x′ ∈ X implies that
v∗(x) = R(x, a) + βv∗(g(x, a)) < R(x′, a) + βv∗(g(x′, a)) ≤ v∗(x′).
The strict inequality uses the strict monotonicity of R and the monotonicity of v∗ and g.
The last weak inequality uses the monotonicity of Γ.
The correspondence π : X 7→ Y defined by π(x) = arg maxa∈Γ(x) R(x, a) + βv∗(g(x, a))
is called the optimal policy correspondence. One can use it to build an optimal plan from
any initial state by drawing a0 ∈ π(x0) and recursively an ∈ π(xn) where for all n > 0
xn = g(xn−1, an−1). In fact, under assumption 1, you should convince yourself that this is
the only way to build an optimal plan. It follows from this trivial observation that a unique
optimal plan exists from any initial state if and only if π(x) is single-valued (a function) for
all x ∈ X.
Under assumption 1, the theorem of the maximum implies that π is u.h.c, a fact that
comes in very handy when deep, general existence theorems such as Kakutani’s fixed point
theorem (See Hildenbrand and Kirman, 1988) must be invoked. But can we impose additional
assumptions on the control problem to guarantee that π is always single-valued? Yes, and
as we have discussed on many occasions already, convex choice sets and strictly concave
objectives are the answer.
Here it is useful to transform the problem as in SLP into one where agents choose x′
directly rather than an action. Define Q(x) = {x′ : x′ = g(x, a) for some a ∈ Γ(x)}. In the
Ramsey problem, Q and Γ coincide. Also define F (x, x′) = maxa:g(x,a)=x′ R(x, a). In this
transformed problem, the optimal policy correspondence π : X 7→ X is defined for all x ∈ X
by:
π(x) = arg max_{x′∈Q(x)} F (x, x′) + βv∗(x′).
Note that the assumptions we have made so far on Γ and R carry over to Q and F . Here are the
additional assumptions we need:
Assumption 2. X is convex, Q has a convex graph, and F is strictly concave.
The assumption that Q has a convex graph means that for any (x, x′) ∈ (X × X), any
(y, y′) ∈ Q(x)×Q(x′), and any θ ∈ [0, 1], θy+(1−θ)y′ ∈ Q(θx+(1−θ)x′). You should draw
this in the one-dimensional case to get some intuition for the language “convex graph.”
Proposition 12. Under assumptions 1 and 2, v∗ is strictly concave and π is single valued.
Proof. Note first that v∗(x) = maxx′∈Q(x) F (x, x′) + βv∗(x′) and let T be the corresponding
contraction operator on C(X). The set of concave, continuous functions4 is a closed subset
of C(X) (see homework 3). Furthermore, T preserves concavity since F is concave. Now
pick any two states (x1, x2) with x1 ≠ x2 and let (x′1, x′2) be the corresponding optimal choices
of next-period states. For θ ∈ (0, 1), let xθ = θx1 + (1 − θ)x2 and x′θ = θx′1 + (1 − θ)x′2.
Because Q has a convex graph, x′θ ∈ Q(xθ). So

v∗(xθ) ≥ F (xθ, x′θ) + βv∗(x′θ)
> θF (x1, x′1) + (1 − θ)F (x2, x′2) + β(θv∗(x′1) + (1 − θ)v∗(x′2))
= θv∗(x1) + (1 − θ)v∗(x2).

The strict inequality uses the strict concavity of F and the concavity of v∗. So v∗ is strictly
concave. Single-valuedness follows from the same arguments as always.
If for all (x, x′) ∈ X × X the set {a ∈ Γ(x) : g(x, a) = x′} is a singleton (as is the case in
the Ramsey problem, for instance), then the optimal action correspondence is single-valued when π is. Let us restrict our
4 Concave functions are continuous on open sets. Continuity is therefore almost redundant here, but almost does not cut it. Concave functions are a lot of things (differentiable, for one) almost everywhere, but ignoring the few potential problem points can lead to big mistakes.
attention henceforth to the case where π is in fact single-valued. The optimal action a at
state x must solve:
max_{a∈Γ(x)} R(x, a) + βv∗(g(x, a)).
When v∗ is differentiable,5 standard tools from calculus can be used to compute optimal
actions and characterize policy function π. When is v∗ differentiable? Here’s a theorem.
Theorem 9. (Benveniste and Scheinkman) Assume that X is convex, that v∗ is concave,
and let x0 be in the interior of X. Assume that there exists a concave, differentiable function
W defined on a neighborhood D of x0 such that W (x0) = v∗(x0) and W (x) ≤ v∗(x) for all
x ∈ D. Then v∗ is differentiable at x0 and its partial derivatives there coincide with those of W .
The idea behind this result is simple and is illustrated in Figure 4.1 in SLP. The only
type of non-differentiability a concave v∗ can have on the interior of X is an upward-pointing kink.
If v∗ lies above W and touches it at x0, as the premise of the theorem requires, then any
supergradient of v∗ at x0 is also a supergradient of W at x0. Since W is differentiable, that
supergradient is unique, so v∗ can have no kink at x0. Now, we get:
Theorem 10. If F is differentiable on the interior of the graph of Q and assumptions 1
and 2 hold, if x0 is in the interior of X and π(x0) is in the interior of Q(x0), then v∗ is
continuously differentiable at x0.
It would be very useful to know when v∗ is twice differentiable. Then, not only could we
find optimal policies using standard calculus tools, we could also characterize how π depends
on parameters using the implicit function theorem. Getting v∗ to be twice differentiable
requires much stronger assumptions however. Finding assumptions that work became an
important question in macroeconomics in the late 1980s, with many of the field’s greatest
minds thinking about this problem. The problem was eventually solved by (then) University
of Chicago student Manuel S. Santos. The relevant part of Manuel’s dissertation is in Santos
(1991). Clearly, given the dynamic programming principles we have used so far, for v∗ to be
5 You should look up a definition of what it means for a multivariate function like v∗ to be differentiable at a particular point. Multivariate differentiability is the natural extension of the one-dimensional case: the function can be well-approximated at that point by a hyperplane. This is the case for instance when the function has continuous partial derivatives.
twice differentiable, R must be. It turns out that, in general, it must also satisfy a strong
form of concavity.
4.4 Value function iteration
Consider a stationary control problem (X, Y, Γ, g, β, R) where R is bounded (above and
below). Solving the problem entails finding v∗ and the set of optimal plans for any possible
initial state x0 ∈ X. The tools we have developed in the previous section can enable us
to say quite a bit about both objects. (See the Ramsey illustration in the next section.)
But when we ask quantitative questions (many interesting questions in macroeconomics are
quantitative: how much of phenomenon X does factor Y account for?) we need to compute
(approximations to) optimal policies.
The contraction mapping theorem tells us how to do that. We could start from any guess
v0 for the value function, but probably the most useful guess is v0 ≡ 0. Then, for all x0,

v1(x0) = (Tv0)(x0) = max_{a∈Γ(x0)} R(x0, a) + βv0(g(x0, a)) = max_{a∈Γ(x0)} R(x0, a)
can be naturally interpreted as the maximum utility the agent could derive if they lived for
one period and entered this one period in state x0. Similarly, vn(x0) = (T^n v0)(x0) is the
maximum utility the agent can generate when they have n periods to live. We know that {vn}
converges to v∗ uniformly and, under sufficient conditions, {πn} (the optimal policy in the
n-period case) also converges uniformly to π.
There are several ways to implement this procedure with computer help. We will outline
a specific procedure in the Ramsey case below.
4.5 Application to the Ramsey problem
Recall that the Ramsey planner chooses non-negative sequences {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)
subject to:
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0
and to a given initial level of the capital stock. This fits into our description of a dynamic
control problem where capital plays the role of the state variable and consumption (or
investment, pick one) at date t is the action.
Letting kmax be the unique positive solution to δk = f(k), a natural choice for X (the
state space) is [0, kmax] = [0, f(kmax) + (1 − δ)kmax] and this is the natural choice for the
action set Y as well. Note that X is convex and compact. Next, given capital level k, the
choice set is Γ(k) = [0, f(k) + (1 − δ)k]. In homework 3, you will show that Γ is continuous
and convex-valued. The transition function is g(k, c) = f(k) + (1 − δ)k − c for all c ∈ Γ(k).
Since Y is bounded, we can assume without any loss of generality that U is bounded
above. We also assume that U is defined everywhere on Y ,6 that it is strictly concave,
strictly increasing and continuously differentiable on (0,∞). Finally we will assume that
limc7→0 U ′(c) = +∞.
The value function v : X 7→ IR associated with the Ramsey problem is the unique solution
to the following Bellman equation:
v(k) = max_{c∈[0,f(k)+(1−δ)k]} U(c) + βv(f(k) + (1 − δ)k − c)
for all k ∈ X. The arguments we developed in the previous section imply that v is continuous,
strictly increasing, and continuously differentiable on the interior of X. They also imply that
the optimal consumption policy c(k) is unique for all k in X. We will now argue that c rises
strictly on X. If k = 0, c(k) = 0, trivially. On the other hand, if k > 0, c(k) > 0 since
limc7→0 U ′(c) = +∞.
What’s more, optimality requires that c(k) < f(k)+(1−δ)k. This is because limk 7→0 v′(k) =
+∞ (see homework 3.)
Now take any k ∈ (0, kmax]. Since U and v are differentiable, U is strictly concave, and,
as argued above, optimal consumption is always in the interior of the choice set, a necessary and
sufficient condition for optimal consumption given k ∈ (0, kmax] is:

U ′(c) = βv′(f(k) + (1 − δ)k − c).
Because v is continuously differentiable and strictly concave, v′ is strictly decreasing.7
Assume that capital rises from k to k′. If c does not change, the right-hand side of the
first-order condition falls while the left-hand side is unchanged. If c falls that makes matters
worse. So c must rise. It follows that c rises strictly with k on the interior of X. Since
c(0) = 0 and we could (without changing any of the arguments above) extend X past kmax,
c rises monotonically on all of X.
The fact that c rises with k will help with the computations I ask you to perform in
homework 3. Another fact that should help is that there is a pretty tight bound on how
much c can rise when k rises: for k′ > k, c(k′) < c(k) + [f(k′) + (1 − δ)k′] − [f(k) + (1 − δ)k].
In other words, when output rises, consumption rises by less than output does. This means
that for any k′ > k, c(k′) ∈ [c(k), c(k) + [f(k′) + (1 − δ)k′] − [f(k) + (1 − δ)k]]. In homework
3, I ask you to compute v and c on [0, kmax] via value function iteration.
The first thing you need to do is to discretize the state space, i.e. create a vector of N
points {k1, k2, k3, . . . , kN} in [0, kmax] with kN = kmax (you can make k1 = 0 too, but keep in
mind that we know what v and c are there). This set of points is often called a grid. You can
make the points equally spaced though people often use different assignment schemes when
they feel that it is more important to be precise in certain parts of the state space than in
others.
Then create a vector v0 of initial guesses for the value function at each grid point, starting
with the zero vector. We might actually skip the first iteration since we know that v1 = Tv0
is given by v1(ki) = U(f(ki) + (1 − δ)ki) for all i ∈ {1, . . . , N}.
Things get more interesting with the next iteration. What is the optimal consumption
policy given guess v1? Given capital k, c(k) solves: maxc∈[0,f(k)+(1−δ)k] U(c) + βv1(f(k) +
(1 − δ)k − c). The problem is that we know what v1 is at the grid points, but not off them.
7 Look it up. What you need here is not a theorem that says that v′′ is strictly negative. First of all, this need not be true even if v is twice differentiable. More importantly, we have not provided conditions that guarantee that it is twice differentiable.
There are several ways to proceed.
You can first use brute force and constrain the agent to always land on grid points, i.e. restrict
their choice set to consumption values such that f(k) + (1 − δ)k − c ∈ {k1, k2, k3, . . . , kN}.
There are only a finite number of possibilities to try, and Matlab can quickly tell you which
is best.
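A minimal sketch of this brute-force scheme (my own illustration in Python rather than Matlab, with arbitrary parameter values): take δ = 1, f(k) = Ak^α and log utility, so the computed value function can be checked against the closed form for this case derived later in this section.

```python
# Brute-force value function iteration: next-period capital is constrained to
# lie on the capital grid itself, so each Bellman maximization is a finite
# search. Illustrative parameters; delta = 1, f(k) = A*k**alpha, U = log.
import numpy as np

beta, alpha, A = 0.95, 0.3, 1.0
kmax = A ** (1.0 / (1.0 - alpha))    # unique positive solution of delta*k = f(k)
grid = np.linspace(0.01, kmax, 200)

v = np.zeros(grid.size)              # initial guess v0 = 0
for _ in range(500):
    v_new = np.empty_like(v)
    for i, k in enumerate(grid):
        c = A * k ** alpha - grid    # consumption if next capital is each grid point
        obj = np.where(c > 0,
                       np.log(np.maximum(c, 1e-300)) + beta * v,
                       -np.inf)
        v_new[i] = obj.max()
    v = v_new

# Compare with the closed form v(k) = a + b*log(k) of the delta = 1 log case.
b = alpha / (1 - alpha * beta)
a = (np.log((1 - alpha * beta) * A) / (1 - beta)
     + (alpha * beta / (1 - alpha * beta)) * np.log(alpha * beta * A) / (1 - beta))
max_gap = np.max(np.abs(v - (a + b * np.log(grid))))
```

Because the choice set is restricted to the grid, the computed value function lies (weakly) below the exact one, but it should remain close to it even with a modest grid.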
You can slightly improve over this by allowing c to fall anywhere in a different (presumably
finer) grid {c1, c2, c3, . . . , cM} ⊂ [0, kmax]. Then there is no guarantee that k′ = f(k) + (1 −
δ)k − c ∈ {k1, k2, k3, . . . , kN}. To get the value of vn−1 at k′ you need to use some form
of interpolation. The simplest (and my favorite because it is the only form that preserves
concavity) is linear interpolation. If k′ ∈ [ki, ki+1] for some i ∈ {1, . . . , N}, then approximate
v(k′) with

[(ki+1 − k′)vn−1(ki) + (k′ − ki)vn−1(ki+1)] / (ki+1 − ki),

where vn−1 is the last guess. Again, pick the value of consumption that yields the maximum utility.
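The interpolation formula above, written out (my own snippet; NumPy's `np.interp` implements the same piecewise-linear rule on a sorted grid):

```python
# Linear interpolation of a "last guess" v_{n-1} between bracketing grid points.
import numpy as np

k_grid = np.array([0.0, 1.0, 2.0, 3.0])
v_grid = np.log(1.0 + k_grid)            # some concave values of v_{n-1} on the grid

def v_interp(k_prime):
    """((k_{i+1} - k')v(k_i) + (k' - k_i)v(k_{i+1})) / (k_{i+1} - k_i)."""
    i = int(np.searchsorted(k_grid, k_prime)) - 1
    i = min(max(i, 0), len(k_grid) - 2)  # clamp to a valid bracketing interval
    ki, ki1 = k_grid[i], k_grid[i + 1]
    return ((ki1 - k_prime) * v_grid[i]
            + (k_prime - ki) * v_grid[i + 1]) / (ki1 - ki)

# Agrees with numpy's built-in routine at off-grid points.
gap = max(abs(v_interp(x) - np.interp(x, k_grid, v_grid)) for x in (0.3, 1.7, 2.9))
```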
Finally, you can use the first order condition. We know that c(k) must solve U ′(c) =
βv′(f(k) + (1 − δ)k − c). While we do not know what v′ is, under the linear approximation
above we know that it is a step function with value

[vn−1(ki+1) − vn−1(ki)] / (ki+1 − ki)

between two consecutive grid points ki and ki+1.
One issue is that there are two possible values for v′ when we are exactly at a grid point. There
you can use the right-hand derivative and proceed with the loop as before. In other words,
an exact solution to the first-order condition may not exist, but using right-hand derivatives
guarantees that the algorithm converges, as it should, to the kink point when it is the optimal
solution.
Now let us start at k1. We know that optimal consumption is in [0, f(k1) + (1 − δ)k1], so
define bounds c̲ = 0 and c̄ = f(k1) + (1 − δ)k1.
Let’s begin with the guess c = (c̲ + c̄)/2. At that c, evaluate U ′(c) − βv′(f(k1) + (1 − δ)k1 − c), where
v′ is replaced with the step function described above at iteration n. If this value is too high,
c is too low, so replace c̲ by c: c (we now know) is a lower bound. In the opposite case,
c is too high, so update c̄ to c.
Then make your next consumption guess (c̲ + c̄)/2 and proceed. This procedure is called
dichotomy. It works very well and converges at a geometric rate to a solution. You should
easily convince yourself that after q divisions of the consumption interval, consumption is
within (f(k1) + (1 − δ)k1)/2^{q+1} of the solution.
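Here is a sketch of the dichotomy step in code (my own illustration; to make it checkable I use the exact derivative v′(k′) = b/k′ from the δ = 1, log-utility case discussed at the end of this section instead of the step-function approximation, so the answer has a known closed form):

```python
# Dichotomy (bisection) on the first-order condition U'(c) = beta*v'(f(k) - c),
# with U = log, f(k) = A*k**alpha, delta = 1, and the known v'(k') = b/k'.
# Illustrative parameters; the closed form is c(k) = (1 - alpha*beta)*A*k**alpha.
beta, alpha, A = 0.95, 0.3, 1.0
b = alpha / (1 - alpha * beta)       # slope coefficient of v(k) = a + b*log(k)

def foc(c, k):
    """U'(c) - beta * v'(f(k) - c); strictly decreasing in c."""
    return 1.0 / c - beta * b / (A * k ** alpha - c)

k = 0.5
c_lo, c_hi = 0.0, A * k ** alpha     # bounds on feasible consumption
for _ in range(60):                  # each pass halves the bracketing interval
    c = 0.5 * (c_lo + c_hi)
    if foc(c, k) > 0:                # marginal utility too high: c is too low
        c_lo = c
    else:                            # c is too high
        c_hi = c

c_closed_form = (1 - alpha * beta) * A * k ** alpha
```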
For i > 1 we know that c(ki) ∈ [c(ki−1), c(ki−1) + [f(ki) + (1 − δ)ki] − [f(ki−1) + (1 − δ)ki−1]],
so we can once again use dichotomy to find the unique solution to the first order condition.
If you use one of the first two methods, use fine grids if you want to arrive at a decently
precise answer. If you use the last method, a coarse grid suffices (which is why the last
method is superior to the first two in my eyes).
Regardless of how you choose to compute c(ki) for all i, you can update your guess for the
value function by vn(ki) = U(c(ki)) + βvn−1(f(ki) + (1 − δ)ki − c(ki)) for all i. You
should proceed until the value function is almost invariant, i.e. until maxi |vn(ki) − vn−1(ki)| <
ε, where ε is some small tolerance level. In practice, computation time is cheap in this case:
just iterate 500 times. We know that after 500 iterations we are within β^{500}v1(kmax) of the true
value function (how do we know that?), and that should be good enough.
In the problem set, I ask you to carry out these computations in two cases for which the
final answer is known which will enable you to make sure that your program is working. One
is the case we worked out using a shooting algorithm in chapter 2. The second one is the
case where δ = 1, f(k) = Akα for all k where A > 0 and α ∈ (0, 1) and U is the log function.
In that case, we can find the problem’s solution analytically.
It is important to recognize that in this case U is not bounded below (as usual, we can
bound it above without any loss of generality.) Nevertheless, it is trivial to establish that
the Bellman equation still holds and that the value function satisfies the same properties as
before.8 So the value function is the unique fixed point to the following functional equation:
v(k) = max_{0≤c≤Ak^α} log c + βv(Ak^α − c)

for all k ∈ (0, kmax].
A guess for v (that turns out to be right) is v(k) = a + b log k for all k ∈ (0, kmax], where
a and b are constants. To see this, note that under that guess, c(k) for k > 0 is the unique
solution to

1/c = bβ/(Ak^α − c),
8 At k = 0, define v(k) = −∞ and operate in the extended reals.
so that c(k) = Ak^α/(1 + bβ) for all k. Plugging this back into the Bellman equation shows that our
guess is correct if and only if:
guess is correct if and only if:
a + b log k = log[Ak^α/(1 + bβ)] + β[a + b log(bβAk^α/(1 + bβ))]
for all k ∈ (0, kmax]. Some algebra then shows that this holds if and only if

b = α/(1 − αβ)

and

a = log[(1 − αβ)A]/(1 − β) + [αβ/(1 − αβ)] · log[αβA]/(1 − β).
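A quick numerical check of these coefficients (mine, with arbitrary parameter values): plug c(k) = Ak^α/(1 + bβ) into the right-hand side of the Bellman equation and verify that it reproduces a + b log k.

```python
# Verify that v(k) = a + b*log(k), with a and b as derived above, satisfies the
# Bellman equation of the delta = 1, log-utility case at several capital levels.
import math

beta, alpha, A = 0.96, 0.36, 1.2         # arbitrary illustrative parameters
b = alpha / (1 - alpha * beta)
a = (math.log((1 - alpha * beta) * A) / (1 - beta)
     + (alpha * beta / (1 - alpha * beta)) * math.log(alpha * beta * A) / (1 - beta))

def v(k):
    return a + b * math.log(k)

worst = 0.0
for k in (0.1, 0.5, 1.0, 1.3):
    c = A * k ** alpha / (1 + b * beta)  # interior solution of the FOC
    rhs = math.log(c) + beta * v(A * k ** alpha - c)
    worst = max(worst, abs(v(k) - rhs))  # should be zero up to rounding error
```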
4.6 Deterministic dynamics
The tools we have developed in this chapter can be used to study the dynamic evolution of
the state variable(s) in a given control problem. We will illustrate this by providing a proof
of global convergence in the Ramsey model. In chapter 2, we described heuristic tools that
strongly suggest that capital converges to a unique steady state value from any positive value
of the initial stock. We will now prove this formally.
The following draws heavily from pages 133-136 in Stokey, Lucas and Prescott, and I
would recommend that you read that section as well as the entire chapter on deterministic
dynamics. In particular, SLP correctly emphasize that global convergence and well-behaved
dynamics in the Ramsey model are fragile results. Relaxing the one-sector assumption
and/or the neoclassical assumptions on the production and utility functions suffices
to produce a very different outcome.
In any event, the result we wish to show is:
Proposition 13. In the Ramsey model, the equilibrium path of capital converges to a unique
steady state value from any positive initial value of the capital stock.
Proof. As we argued above, the optimal investment policy function h : [0, kmax] 7→ [0, kmax]
satisfies for any k > 0:
βv′(h(k)) = U ′(f(k) + (1 − δ)k − h(k)) (4.6.1)
and v′(k) = U ′(f(k) + (1 − δ)k − h(k))(f ′(k) + (1 − δ)) (4.6.2)
Equation (4.6.1) implies that h is strictly increasing on [0, kmax], and the theorem of the maximum
implies that h is continuous. What's more, the unique steady state value k∗ of the capital
stock is the unique strictly positive solution on [0, kmax] to h(k) = k. Equations (4.6.1) and (4.6.2) then imply that
k∗ is the unique value that satisfies β(f′(k) + 1 − δ) = 1 (an observation we already made
on several occasions in chapter 2).
Now we want to argue that [h(k) < k and k > 0] if and only if k > k∗. This will imply
that the path of capital can be studied on a graph that looks qualitatively identical to the
standard Solow model graph. In particular, we have only one strictly positive steady state
and global convergence to it from any strictly positive initial condition.
To establish the desired result, note that since v is strictly concave, v′(k1) > (=, <) v′(k2) if and
only if k1 < (=, >) k2. In particular, for any k > 0, [v′(k) − v′(h(k))][k − h(k)] ≤ 0, with
equality if and only if h(k) = k, that is, if and only if k = k∗. Together with equations (4.6.1) and (4.6.2) and a bit of
algebra, this yields

(f′(k) + 1 − δ − 1/β)(k − h(k)) ≤ 0

with equality if and only if h(k) = k. Since f′ is strictly decreasing, f′(k) + 1 − δ − 1/β is
negative exactly when k > k∗, so the inequality forces h(k) < k for k > k∗ and h(k) > k for
0 < k < k∗, as needed.
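The proposition can also be illustrated numerically: compute h by value-function iteration and check that capital converges to k∗ from below and from above. A Python sketch with log utility and illustrative parameters A = 1, α = 0.3, β = 0.95, δ = 0.1 (these values are assumptions for the experiment, not part of the proof):

```python
import numpy as np

A, alpha, beta, delta = 1.0, 0.3, 0.95, 0.1
kstar = (alpha * A * beta / (1 - beta * (1 - delta))) ** (1 / (1 - alpha))
kmax = (A / delta) ** (1 / (1 - alpha))      # unique positive solution to delta*k = f(k)

grid = np.linspace(0.05, kmax, 800)
resources = A * grid ** alpha + (1 - delta) * grid           # f(k) + (1 - delta) k
c = resources[:, None] - grid[None, :]                        # implied consumption
util = np.where(c > 0, np.log(np.where(c > 0, c, 1.0)), -np.inf)

v = np.zeros(len(grid))
for _ in range(5000):
    v_new = (util + beta * v[None, :]).max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-9:
        v = v_new
        break
    v = v_new
h = grid[(util + beta * v[None, :]).argmax(axis=1)]           # investment policy h(k)

def simulate(k, T=2000):
    # iterate the discretized policy from an initial stock k
    for _ in range(T):
        k = h[np.argmin(np.abs(grid - k))]
    return k

# capital converges to a grid neighborhood of k* from either side
print(kstar, simulate(0.5), simulate(20.0))
```

The computed h is (weakly) increasing on the grid, and simulated paths from k0 = 0.5 and k0 = 20 both settle near k∗, up to grid resolution.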
4.7 Problems
Problem 1
1. Show that the supnorm is a norm on C(X) where X ⊂ IRn (n ∈ IN).
2. Show that pointwise convergence does not imply uniform convergence.
3. Show that the set of bounded, increasing real functions on a bounded subset X of IRn
(n ∈ IN) equipped with the supnorm is a complete metric space.
4. Show that the set of bounded, strictly increasing real functions on a subset of IR equipped with the supnorm is not a complete metric space.
5. Show that the set of bounded, concave real functions on a subset X of IRn (n ∈ IN) equipped with the supnorm is a complete metric space.
6. Show that the set of bounded, strictly concave real functions on IR+ equipped with the supnorm is not a complete metric space.
7. Let f be a continuous real function on IR+ and for all k ≥ 0 define Γ(k) = [0, f(k)]. Show that Γ is a continuous correspondence. What condition do you need to impose on f to guarantee that Γ has a convex graph? (Prove that your condition is sufficient.)
Problem 2
Consider a Ramsey planner who chooses non-negative sequences {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)

subject to

ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0,

given an initial level k0 of the capital stock. As usual, f is the intensive form of a neoclassical production function, β ∈ [0, 1), δ is in [0, 1], and U is bounded, strictly concave and continuously differentiable on IR++ with lim_{c→0} U′(c) = +∞.
1. Let v be the value function associated with this problem. Write the Bellman equation which v must solve.
2. Show that the mapping which the Bellman equation defines is a contraction mapping.
3. Show that lim_{k→0} v′(k) = +∞.
4. Let kmax be the unique positive solution to δk = f(k) and let c : [0, kmax] → [0, kmax] be the optimal consumption policy function. Show that for any k′ > k > 0,

c(k′) − c(k) ≤ f(k′) + (1 − δ)k′ − [f(k) + (1 − δ)k].

5. Assume that β = 0.95, δ = 0.1, f(k) = 10k^0.33 for all k, and U is the log function. (Ignore as usual the fact that U is not defined at 0.) Use value-function iteration to compute an approximation to v and c on [0, kmax]. Plot both.
6. Let k0 = 1 and plot the optimal path of capital over the first 50 periods. Compare your plot to the plot you obtained in homework 1 for the same set of parameters.
7. Assume now that δ = 1. What are v and c in this case (exactly, don't use the computer yet)? Use value-function iteration to compute an approximation to v and c on [0, kmax]. Plot the first 5 iterates of v and c and compare them to their exact form.
Problem 3 (Cake-eating problem)
Consider a social planner who chooses a non-negative sequence {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)

subject to

ct + kt+1 = kt for all t ≥ 0,

given an initial level k0 of the capital stock. As usual, β ∈ (0, 1) and U is bounded, strictly increasing, strictly concave and continuously differentiable on IR+ with lim_{c→0} U′(c) = +∞.
1. Let v be the value function associated with this problem. Write the Bellman equation which v must solve.
2. Show that the operator which the Bellman equation defines is a contraction mapping on the set of bounded functions on [0, k0].
3. Show that v rises strictly and continuously with k and that v is strictly concave.
4. Let c : [0, k0] → [0, k0] be the optimal consumption policy function. Show that c increases strictly and continuously with k.
5. Show that lim_{k→0} v′(k) = +∞.
6. Show that the capital stock does not converge to a positive steady state value in this environment.
7. Assume that U is the log function (ignore as usual the fact that the log function is not defined at zero), guess that v(k) = a + b log k for all k > 0 where a, b > 0, and verify that your guess is correct.
8. What is the optimal consumption policy function when β = 0?
Chapter 5
Stochastic dynamic programming
Economic decisions are often made under some uncertainty. In the context of dynamic
optimization problems, the reward or the evolution of the state associated with particular
decisions may in part be random or stochastic. For instance, the true shape of the production
function may depend on the state of technology at date t and this state may not be known
with full precision until date t itself.
In many fields (including philosophy), entire books are devoted to defining what words
like “random” mean in a deep sense. For our purposes, we simply want to think about the
situation where an agent’s rewards or opportunities depend in part on the outcome of a
random experiment, an experiment whose outcome cannot be fully determined a priori.
The canonical random experiment is the flip of a coin. We can list all the possible
outcomes of this experiment and assign probabilities to them quite easily. What we need is a
framework that enables us to do this for all random experiments one can think of. For this
we need some notions of probability theory.
The following is a quick introduction to probability theory followed by a section that
extends our deterministic results to the stochastic case. Chapter 9 in SLP is somewhat
misleading in this respect by suggesting that the principle of optimality loses some generality
when one introduces uncertainty. It does not, as the next edition of SLP will explain. The
last sentence of the first paragraph of section 9.1, in particular, is entirely wrong. But this
is a technical detail and chapter 9 gives an excellent treatment of stochastic programming
techniques overall.
5.1 Probability theory
Outcomes of random experiments are draws from a set Ω (sometimes called the universe). For
instance, if the experiment we have in mind is the roll of an ordinary die, Ω = {1, 2, 3, 4, 5, 6}.
An event is a subset of the universe Ω. Sometimes we will want to consider all possible
subsets of Ω but in big spaces this creates problems. In general, we restrict events to be
members of a specific class of subsets called a sigma-algebra.
A sigma-algebra is a subset F of 2^Ω (the set of all subsets of Ω) such that

1. Ω ∈ F;
2. if A ∈ F, the complement A^c ≡ Ω − A of A is also in F;
3. for any countable collection {Ai}_{i∈I} ⊂ F, ⋃_{i∈I} Ai ∈ F.
It is trivial to see that these properties imply that sigma-algebras contain the empty set
and that they are closed under countable intersection. Properties of sigma-algebras guarantee
that whenever we can talk about an event occurring, we can talk about it not occurring as
well. They also enable us to speak of “at least one” of several possible events occurring and
“none of a list of events occurring.” Elements of F are called measurable sets.
A pair (Ω, F) is called a measurable space. A measure on a measurable space (Ω, F) is a
function µ : F → [−∞, +∞] such that

1. µ(∅) = 0;
2. µ(A) ≥ 0 for all A ∈ F;
3. for any countable, disjoint collection {Ai}_{i∈I} ⊂ F, µ(⋃_{i∈I} Ai) = ∑_{i∈I} µ(Ai).
The triplet (Ω,F , µ) is called a measure space. A measure P such that P (Ω) = 1 is called
a probability measure and the measure space (Ω,F , P ) is then called a probability space.
In our die-rolling example, Ω = {1, 2, 3, 4, 5, 6}, and since Ω is finite it is natural to take
F = 2^Ω.1 The event (or measurable set) {2, 4, 6}, for instance, is the event: “The outcome of
the roll is an even number.”
1How many sets does 2^Ω contain? Answering counting questions such as this one is 99% of the art of calculating probabilities.
If we believe the die to be fair, then it is natural to posit that all outcomes are equally
likely. It is natural then to equip (Ω, 2^Ω) with a uniform probability measure defined for all
A ∈ 2^Ω by P(A) = #A/#Ω, where #A is the cardinality of set A, i.e. the number of elements it
contains. For instance, P({2, 4, 6}) = 3/6 = 0.5.
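These counting computations are easy to mechanize. A minimal Python sketch of the uniform measure on a finite universe (the function name `P` is my own shorthand, not standard library notation):

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4, 5, 6})

def P(A):
    """Uniform probability measure: P(A) = #A / #Omega."""
    return Fraction(len(frozenset(A) & omega), len(omega))

print(P({2, 4, 6}))      # 1/2

# the sigma-algebra 2^Omega contains 2^6 = 64 events
events = list(chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1)))
print(len(events))       # 64
```

Exact rational arithmetic (`Fraction`) keeps the counting honest: every probability is a ratio of cardinalities.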
When we work with uncountable spaces such as the real line, it is “difficult” to work with
the set of all subsets of the universe. On the real line one convenient sigma-algebra to use
is the smallest sigma-algebra that contains all intervals (whether open, closed, half-closed,
bounded, unbounded . . . ). That sigma-algebra is called the Borel sigma-algebra and its
members are called Borel sets. In fact, economists often feel that they “have to” use this
particular sigma-algebra, natural as that construction is. This belief often leads to needless
losses of generality, as in the case of SLP's chapter 9. More on that later.
One could devote an entire course to discussing the structure of probability theory. We
don’t have that kind of time, but you should read chapter 7 in SLP as carefully as possible
and run any questions you may have by me.
The next notion we need is that of a random variable. A real-valued random variable X
on a probability space (Ω, F, P) is a function X : Ω → IR such that for any c ∈ IR,
{ω ∈ Ω : X(ω) ≤ c} ∈ F.
In other words, a function is a random variable if for all Borel subsets B of the real line we
can assign a probability to any event of the form X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B}. Obviously,
we could similarly define a random variable from (Ω,F , P ) into any measure space. Random
variables are special cases of measurable functions in the context of probability spaces.
One easily shows that {X^{−1}(B) : B is a Borel set} is a sigma-algebra contained in F.
It has a name: it is the sigma-algebra induced by X. Similarly, X induces a probability
distribution PX on the real line defined by PX(B) = P (X−1(B)) for all Borel sets B.
Here’s an example. Consider a bet that pays one dollar if the roll of a die turns out
to be even and nothing otherwise. Letting X be the payoff associated with the bet, X ∈
{0, 1}, the sigma-algebra induced by X is {∅, {1, 3, 5}, {2, 4, 6}, Ω}, and PX is the
probability distribution that puts probability one half on zero, one half on one, and nothing
elsewhere.
A key class of measurable functions is the set of simple functions. Simple functions on a
measurable space (Ω, F) are linear combinations of finitely many indicator functions. That is,
for ω ∈ Ω,

f(ω) = ∑_{i=1}^{n} ai 1_{Ai}(ω)

where n is an integer, {Ai}_{i=1}^{n} ⊂ F is a finite collection of measurable sets, {ai}_{i=1}^{n} are real
numbers, and for all i, 1_{Ai} is the indicator function that takes value 1 if ω ∈ Ai, 0 otherwise.
These functions are key for the following reasons. A non-negative function is measurable if and only
if it is the pointwise limit of an increasing sequence of simple functions. In fact, every bounded
measurable function is the uniform limit of a sequence of simple functions.
(The proof is in SLP, Theorem 7.5.) Because simple functions are easy to deal with, these
facts are the main ingredient of many (most?) proofs in measure theory.
They are used, for one thing, to define a notion of integration (or expectation) on arbitrary
measure spaces. Let f = ∑_{i=1}^{n} ai 1_{Ai} be a simple function on a measure space (Ω, F, µ). The
integral of f with respect to µ is

∫_Ω f dµ = ∑_{i=1}^{n} ai µ(Ai).
Then, for an arbitrary measurable function f, let f⁺ = max(0, f) be the non-negative
part of f. Letting S be the set of simple functions φ such that 0 ≤ φ ≤ f⁺ (meaning φ(ω) ≤ f⁺(ω)
for all ω ∈ Ω), we can define:

∫_Ω f⁺ dµ = sup_{φ∈S} ∫_Ω φ dµ.
One similarly defines the negative part f⁻ of f by f⁻ = max(0, −f) and its integral
exactly as above. A measurable function f is called integrable with respect to µ if both its
negative and positive parts have finite integrals. Then,

∫_Ω f dµ = ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ.
When Ω = IR (or some other real interval), when F is the set of Borel sets on IR, and
when µ is the only measure on F such that µ([a, b]) = b − a for any interval [a, b], this notion
of integration coincides with the standard Riemann integral wherever the latter is defined. It is more
generally referred to as the Lebesgue integral.
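To make the construction concrete, here is a small Python sketch (exact rational arithmetic) of the increasing simple-function approximation for f(x) = x on [0, 1] with Lebesgue measure. The dyadic level sets used below are the standard textbook choice, an assumption of this illustration; the simple integrals increase toward the familiar answer ∫₀¹ x dx = 1/2:

```python
from fractions import Fraction

def simple_integral(n):
    # phi_n takes the value k / 2^n on the level set [k/2^n, (k+1)/2^n),
    # k = 0, ..., 2^n - 1, each of Lebesgue measure 2^-n, so its integral is
    # sum over k of (k / 2^n) * mu(level set)
    return sum(Fraction(k, 2**n) * Fraction(1, 2**n) for k in range(2**n))

vals = [simple_integral(n) for n in range(1, 10)]
print([float(x) for x in vals[:3]])   # increasing: 0.25, 0.375, 0.4375
print(float(vals[-1]))                # close to 0.5
```

The sequence of simple integrals is monotonically increasing and its supremum is the Lebesgue (here also Riemann) integral 1/2.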
More importantly for our purposes, when µ is a probability measure, it coincides with
the standard expectation operator. Considering once again our roll-of-the-die example and
letting X be the random variable associated with a bet that pays one dollar if the roll of a
die turns out to be even and nothing otherwise,

∫ X(ω)dµ = E(X) = ∑_{i=1}^{6} (1/6) X(ωi) = 0.5.
In the case of a real random variable X with a Riemann-integrable density function f, the
corresponding probability measure on IR is µ([a, b]) = ∫_a^b f(x)dx for all intervals [a, b], and

∫ X dµ = E(X) = ∫_IR x f(x)dx.
The bottom line is that the broad notion of integral we have introduced encompasses all
familiar cases.
Sometimes it is useful to combine different random experiments, and hence different probability
spaces. Given two measurable spaces (X, F) and (Y, G), the product space (X × Y, F × G)
is the measurable space one obtains when F × G is the smallest sigma-algebra that contains all
sets of the form A × B where A ∈ F and B ∈ G.
Finally, we will need the notion of conditional expectation. Let (Ω, F, P) be a probability
space and let A be an element of F such that P(A) > 0. Then we can define the probability of
event B in F conditional on A as usual as P(B|A) ≡ P(B ∩ A)/P(A).
For instance, what is the probability that the outcome of the die roll is 2 given that the
outcome is even? It is

P({2}|{2, 4, 6}) = P({2} ∩ {2, 4, 6})/P({2, 4, 6}) = P({2})/P({2, 4, 6}) = 1/3.
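Continuing the die example in code (a sketch; `P` and `cond` are my own helper names, not standard ones):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(A):
    # uniform measure on the finite universe
    return Fraction(len(A & omega), len(omega))

def cond(B, A):
    # P(B | A) = P(B intersect A) / P(A), defined whenever P(A) > 0
    return P(B & A) / P(A)

print(cond({2}, {2, 4, 6}))   # 1/3
```

Note that `cond(., A)` is itself a probability measure: it assigns mass 1 to A and renormalizes within it.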
It is easy to show that P(•|A) is a probability measure on F. We can then define
the expectation of a random variable X conditional on A as E(X|A) = ∫ X dP(•|A). In
general, we need to extend this notion to sets of measure zero. For instance, given a pair
(X, Y) of real random variables with a continuous joint density f and continuous, everywhere
positive marginal densities fX and fY, we would like a notion of conditional expectation
that coincides with the standard notion. This can be done in great generality, as discussed
in SLP in section 7.7.
5.2 Transition functions
In stochastic problems, the state often depends on the value of a random variable. Further-
more, future values of these shocks may depend on past values. In all, the evolution of the
agent’s opportunities and rewards depends on a sequence of random variables. Indexed sets
of random variables (of which sequences are a special case) are called stochastic processes.
The study of stochastic processes is an important branch of probability theory. Here, we
only need to be able to talk about the transition of a process from one state to another (but
you should read chapter 8 in SLP in its entirety.)
Let (Z, Z) be a measurable space. A transition function is a function Q : Z × Z → [0, 1]
such that for all z ∈ Z, Q(z, •) is a probability measure on (Z, Z) and, for each A ∈ Z,
Q(•, A) is a measurable function.
In words, Q(z, •) is the distribution of next period’s shock given this period’s value and
Q(•, A) gives the likelihood of a particular event as a function of today’s shock. All told,
Q(z, A) is the probability of landing in set A next period given current state z. Since Q(z, •) is
a well-defined probability measure, we can take expectations with respect to it and write, for all Z-measurable
and Q-integrable functions f,

(Tf)(z) = ∫ f(z′) Q(z, dz′).

Henceforth we restrict our attention to bounded, measurable functions, which implies integrability.
T then defines an operator from the set of bounded Z-measurable functions to itself.
Note that Q also induces a mapping from the set of probability measures on (Z, Z)
to itself. Given a distribution λ of shocks today, next period's distribution is
given for each A ∈ Z by

(T∗λ)(A) = ∫ Q(z, A) λ(dz).
That the mappings T and T∗ are well defined and preserve Z-measurability of course requires
proof; the proofs are in SLP's chapter 8. One can also show that the two mappings are
intimately related (they are dual notions): for any bounded Z-measurable function f
and any probability measure λ on (Z, Z), one shows with a bit of work that

∫ (Tf)(z) λ(dz) = ∫ f(z′) (T∗λ)(dz′).

In words, it does not matter whether we apply the transition operator to f or to λ first; we arrive
at the same expected value next period.
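On a finite state space this duality is just matrix algebra: Q is a stochastic matrix, Tf = Qf, T∗λ = λQ, and the two integrals are λ(Qf) = (λQ)f. A quick numerical check (the specific matrix and vectors below are randomly generated, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.random((n, n))
Q /= Q.sum(axis=1, keepdims=True)     # rows of Q are the measures Q(z, .)
f = rng.random(n)                     # a bounded measurable function
lam = rng.random(n)
lam /= lam.sum()                      # a probability measure over today's states

lhs = lam @ (Q @ f)                   # integral of Tf against lambda
rhs = (lam @ Q) @ f                   # integral of f against T* lambda
print(lhs - rhs)                      # zero up to floating-point error
```

The identity holds exactly (it is associativity of matrix multiplication), which is why the finite case is a useful sanity check for the general duality.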
We say that Q satisfies the Feller property if Tf is continuous whenever f is, and that it
is monotonic if Tf is increasing whenever f is. An example is probably useful at this stage.
5.3 Markov chains
(Reading the first section of chapter 11 will be useful for what follows.)
Assume that the state space S for the stochastic shock is a finite set {s1, s2, . . . , sn}. The
natural sigma-algebra for such a set is the set of all subsets of S. Transition functions simply
assign a probability Πij of moving from state si to state sj, and distributions of shocks are
1 × n vectors p such that ∑_i pi = 1.
The resulting process is called a Markov chain. The transition function is fully summarized
by an n × n matrix Π. If the current shock is si, the distribution of shocks in the next
period is (Πij : j = 1, . . . , n). Generally, the process maps a distribution p of shocks into the
distribution pΠ one period ahead and, recursively, pΠⁿ after n periods.
A question that one often asks in economics is whether pΠn converges to some distribution
as n grows large. Obviously, if pΠn converges to p∗ (in IRn), it must be the case that p∗Π = p∗
(by continuity).
A distribution like p∗ is called an invariant distribution. All Markov chains have at least
one. How many it has depends on how many ergodic sets the chain has. A set E ⊂ S is
called ergodic if from all s ∈ E, the probability P(s, E) of remaining in E is one and if no
proper subset of E has this property.
A set E is called transient if there is a positive probability of leaving it and never returning
to it.
Some work shows that the state space of a Markov chain can be partitioned into ergodic
sets and one transient set, and that invariant distributions of Markov chains are the convex
combinations of at most n distributions (that can be computed quite easily via matrix
multiplication, see Theorem 11.1 in SLP).
A necessary and sufficient condition for a Markov chain to have a unique invariant distribution
is that it have exactly one ergodic set. This is the case if and only if there exists a state
that is eventually visited with positive probability from any state. Finally, the sequence
pΠⁿ converges to this invariant distribution from any initial distribution p under certain
conditions (see Theorem 11.4), which hold for instance if Π has only strictly positive entries.
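These facts are easy to check numerically for a small chain. A sketch (the transition matrix below is an arbitrary example with strictly positive entries, so it has a unique invariant distribution and pΠⁿ converges to it):

```python
import numpy as np

Pi = np.array([[0.9, 0.1],
               [0.4, 0.6]])              # strictly positive entries

# an invariant distribution solves p* Pi = p*: a left eigenvector of Pi
# for eigenvalue 1, normalized to sum to one
eigvals, eigvecs = np.linalg.eig(Pi.T)
p_star = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
p_star = p_star / p_star.sum()
print(p_star)                            # approximately [0.8, 0.2] for this matrix

# p Pi^n converges to p* from any initial distribution
p = np.array([1.0, 0.0])
for _ in range(200):
    p = p @ Pi
print(np.abs(p - p_star).max())          # essentially zero
```

Solving pΠ = p directly for this matrix gives p∗ = (0.8, 0.2), and the second eigenvalue (0.5 here) governs the speed of convergence of pΠⁿ.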
5.4 Stochastic control problems
Now we have all the tools we need to extend our deterministic dynamic programming results
to the stochastic case.
Consider an agent who solves a control problem in the presence of stochastic shocks
whose evolution is described by a transition function Q on a measurable space (Z, Z). The other
state variables, as before, take values in a set X ⊂ IRn, and actions come from a set Y ⊂ IRm,
where n and m are integers.
The choice set Γ is now a correspondence from X ×Z to Y and the reward function R is
defined on X × Y × Z. The transition function for the endogenous state is now a function
g from X × Y × Z into X. There is, as before, a discount factor.
At date t the set of all possible shock histories is Z^t and we can equip this set with the
product sigma-algebra Z^t. We denote a particular element of Z^t by z^t and the sth element
of z^t by z^t_s. It is easy to see (and formally shown in section 8.2 of SLP) that the transition
function induces a unique probability distribution µ^t(z0, •) over date-t histories that depends
on the initial state.
At date 0 the agent may choose any action a0 ∈ Γ(x0, z0). From then on, the agent must
make her plan contingent on future realizations of the stochastic shock. A plan, then, is a
sequence {πt} of Z^t-measurable functions into the action set Y. A plan {πt} is feasible if for all t,
πt is Z^t-measurable, and:

1. π0 ∈ Γ(x0, z0);
2. πt(z^t) ∈ Γ(xt, z^t_t) for all z^t, where xt = g(xt−1, πt−1(z^{t−1}), z^{t−1}_{t−1}) is the endogenous state implied by history z^{t−1}.
Obviously, a necessary condition to impose before proceeding is that a feasible plan exists.
In turn, this requires first that Γ have a measurable selection. But the key thing to recognize
here is that we have yet to say exactly what we mean by “measurability,” i.e. to choose a
specific sigma-algebra Z. If we insist on Borel measurability, we will run into trouble: the
existence of a Borel-measurable selection does not guarantee that the value function is in
turn Borel measurable (hence integrable), and there is no guarantee that a Borel-measurable
policy function exists.
But these are superficial problems. There is no reason (at this level of generality) why
one should insist on Borel measurability. The analogue in the existence problem we studied
in chapter 1 would be to insist on a particular topology and to give up if that particular
topology does not work.
In any event, the bottom line is that a notion of measurability that makes the stochastic
principle of optimality as general as the deterministic one exists. It is called universal
measurability and is defined for instance in Bertsekas and Shreve (1978). Importantly, one
can start, as is typical, with Borel-measurable objects. A Borel transition function has a
unique universally measurable extension, and Borel measurability of the other objects implies
universal measurability. Assuming for instance that Γ has a Borel selection implies a fortiori
that it has a universally measurable one. The value function is then always universally
measurable, and a universally measurable Markov policy exists. In a word (or two), there is no
problem.
5.5 The stochastic principle of optimality
Let Π(x0, z0) be the set of feasible plans. The value function associated with the stochastic
control problem is

v∗(x0, z0) = sup_{π∈Π(x0,z0)} ∑_{t=0}^{+∞} ∫_{Z^t} β^t R(x(z^t), πt(z^t), z^t_t) µ^t(dz^t)

where x(z0) = x0 and, for all t > 0 and all possible histories z^t, x(z^t) = g(x(z^{t−1}), πt−1(z^{t−1}), z^{t−1}_{t−1}).
This seems messy, but the bottom line is that under very general conditions v∗ solves
the following functional equation:

v(x, z) = sup_{a∈Γ(x,z)} R(x, a, z) + β ∫ v(g(x, a, z), z′) Q(z, dz′)   for all z ∈ Z and x ∈ X.
The operator associated with the expression above defines a contraction mapping T from
bounded, universally measurable functions to bounded, universally measurable functions.
Furthermore (assuming as before that R is bounded), v∗ is the only bounded fixed point
of T. As before, we can ensure that v∗ is continuous, strictly increasing, strictly concave and
differentiable in the endogenous state by assuming that R, Γ and Q satisfy certain properties.
This is developed in section 9.2 of SLP.
Also, we can use π(x, z) = arg max_{a∈Γ(x,z)} { R(x, a, z) + β ∫ v(g(x, a, z), z′) Q(z, dz′) } to build
an optimal Markov plan for any initial conditions. The plan is Markov in that the action
it specifies at a particular date depends only on the current value of the state (x, z), not
on the date. And everything can be computed via value-function iteration, just
like in the deterministic case.
Let’s now illustrate all this with an example.
5.6 The stochastic Ramsey problem
Consider a variation of the Ramsey problem we have studied many times in this course
in which the only thing that changes is the production function. At a particular date, if
the economy has capital stock k > 0, output is zf(k), where f satisfies the same
assumptions as always and z ∈ [zL, zH ] where zL > 0 and zH < ∞. Furthermore, z follows
a Markov process with a stationary transition function Q that is monotonic, satisfies the
Feller property and induces a unique, globally ergodic invariant distribution. The Bellman
equation associated with this problem is, for all (k, z) ∈ IR+ × [zL, zH]:

v(k, z) = max_{0≤c≤zf(k)} U(c) + βE[v(zf(k) + (1 − δ)k − c, z′) | z].
This problem is studied in great detail by Brock and Mirman (1972) when productivity
shocks are independent across periods (a trivial sort of Markov process), and by Mehra and
Donaldson (1983) when shocks are correlated. In both cases, it is easy to show that both
investment and consumption rise with output. They are independent of the current shock in
the independent case, but depend on the shock even at equal output in the correlated case
since the current value of the shock affects expectations of future shocks. How they depend
on the shock depends on the shape of preferences, the degree of risk aversion in particular.
One big difference between the stochastic case and the certainty case is that the steady
state depends on the reward function in the first case but not in the second. Among other
things, more risk-averse agents accumulate more capital when faced with uninsurable uncertainty:
they save more for precautionary reasons, as emphasized by Aiyagari (1994).
In both the correlated and the independent case (under the assumptions we have imposed
on Q), the distribution of capital converges to an invariant distribution that does not depend
on initial conditions. To understand what this means, assume that we start from any initial
level of capital k0. The value of the capital stock at date t > 0 can be computed given any
sequence of shocks and the optimal consumption policy. Because shocks are uncertain, kt is a
random variable with distribution Ft. Saying that the distribution of capital converges to an
invariant distribution is saying that Ft converges (uniformly, see Brock and Mirman, 1972)
to some invariant distribution F. In particular, if we draw a long sequence of shocks and
the corresponding series {kt} of capital stocks and drop the first half (say) of the sequence,
we have a sample of capital stocks that is approximately drawn from F, and we can approximate
the distribution's moments (mean, variance, . . . ) using this sample.
Another way to think about this is to assume that the economy is populated by a con-
tinuum of households who operate the same production function as above but face different
shocks from one another. Because there is a continuum of households, we can assume that
Π gives the fraction of households who experience each of the possible transitions in each
period. Because households face different (idiosyncratic) shocks, their stocks of capital differ.
The results of Brock and Mirman imply that in such an environment, the distribution of
capital across households converges to F. (See Aiyagari, 1994, for more on this.)
Consider finally the case where z is binary, i.e. drawn from {zL, zH}, and follows a Markov
chain with stationary transition matrix

Π = ( pLL      1 − pLL )
    ( 1 − pHH  pHH     )

where 0 < pLL < 1 − pHH < 1. Since z can take two values, this is a system of two functional
equations:
v(k, zL) = max_{0≤c≤zLf(k)} U(c) + β[pLL v(zLf(k) + (1 − δ)k − c, zL) + (1 − pLL) v(zLf(k) + (1 − δ)k − c, zH)]

v(k, zH) = max_{0≤c≤zHf(k)} U(c) + β[(1 − pHH) v(zHf(k) + (1 − δ)k − c, zL) + pHH v(zHf(k) + (1 − δ)k − c, zH)]
This can be taken to the computer just as easily as before. Letting kmax be the unique
solution to zHf(k) = δk, the natural choices for X and Y are [0, kmax]. Then we can start
with v(•, z) ≡ 0 on [0, kmax] for z ∈ {zL, zH}. Using value-function iteration just like before
(see homework 4) we can compute (approximations to) the optimal policy function c(k, z)
and the value function v(k, z).
We can even simulate the economy's stationary distribution by drawing from it (see
homework 4) and ask, for instance, about the effects of risk on aggregate savings. Fun stuff.
5.7 Problems
Problem 1
1. Under what conditions do finite-state Markov chains induce a transition function that satisfies the Feller property?
2. Under what conditions do finite-state Markov chains induce a transition function that is monotonic?
3. Consider a two-state Markov chain with transition matrix

Π = ( 0.8  0.2 )
    ( 0.2  0.8 )

Compute the chain's unique invariant distribution. Does the chain converge there from any initial distribution?
4. Consider a two-state Markov chain with transition matrix

Π = ( 0    1   )
    ( 0.5  0.5 )

Show that the chain has a unique invariant distribution and show that it converges there from any initial state.
5. Consider a two-state Markov chain with transition matrix

Π = ( 1  0 )
    ( 0  1 )

How many invariant distributions does this chain have?
Problem 2
Consider the stochastic Ramsey problem described in section 5.6 of the notes. Assume that the stochastic shock follows a Markov chain with two states {zL, zH} and with stationary transition matrix

Π = ( pLL      1 − pLL )
    ( 1 − pHH  pHH     )

where 0 < pLL < 1 − pHH < 1.
1. Instead of (k, z), define (y, z) to be the state of the system, where y is current output. Write the corresponding Bellman equation.
2. Show that the operator which the Bellman equation defines on IR+ × {zL, zH} is a contraction mapping.
3. Assume that zL = 9 and zH = 11, and that

Π = ( 0.8  0.2 )
    ( 0.2  0.8 )

Let kmax be the unique solution to δk = zHf(k) and let c : [0, kmax] × {zL, zH} → [0, kmax] be the optimal consumption policy function. Assume that β = 0.95, δ = 0.1, f(k) = 10k^0.33 for all k, and U is the log function. (Ignore as usual the fact that U is not defined at 0.) Use value-function iteration to compute an approximation to v and c on [0, kmax] × {zL, zH}. Plot v(•, zL), v(•, zH), c(•, zL), c(•, zH).
4. As we have argued in class, this economy converges to a steady state distribution of capital. We can simulate this steady state distribution using a process called Markov Chain Monte Carlo. Specifically, begin with k0 = 1 and z0 = 9. The policy function you computed above gives you k1. Draw z1 using the chain's transition matrix and the computer's random number generator. Proceed and generate 20,000 draws for k. Drop the first 10,000 (the assumption here is that after 10,000 periods we are effectively drawing from the invariant distribution). Plot the histogram associated with the remaining 10,000 draws.
5. Calculate the average value of the capital stock. How does this compare to the steady state value of capital when zL = zH = 10? (Is it higher or lower?) Suggest an explanation for this finding.
Chapter 6
Bibliography
Ayagari, R. (1994) “Uninsured Idiosyncratic Risk and Aggregate Saving,” Quarterly Jour-nal of Economics, 109: 659-84.
Balasko, Y. and Shell, K., (1980) “The Overlapping-Generations Model I: The Case of Pure Exchange without Money,” Journal of Economic Theory, 23, 281-306.
Barro, R. (1974) “Are Government Bonds Net Wealth?” Journal of Political Economy, 82, 1095-1117.
Brock, W. A. and Mirman, L. J., (1972) “Optimal Economic Growth and Uncertainty: The Discounted Case,” Journal of Economic Theory, 4, 479-513.
Cass, D. (1972) “On Capital Overaccumulation in the Aggregative, Neoclassical Model of Economic Growth: A Complete Characterization,” Journal of Economic Theory, 4, 200-23.
Diamond, P. (1965) “National Debt in a Neoclassical Growth Model,” American Economic Review, 55, 1026-50.
Donaldson, J. B., and Mehra, R., (1983) “Stochastic Growth with Correlated Production Shocks,” Journal of Economic Theory, 29, 282-312.
Hildenbrand, W. and Kirman, A. P., (1988) “Equilibrium Analysis: Variations on Themes by Edgeworth and Walras,” North-Holland.
Hopenhayn, H. (1992) “Entry, Exit, and Firm Dynamics in Long Run Equilibrium,” Econometrica, Vol. 60, No. 5, 1127-1150.
Jones, L. E., and Manuelli, R. E., (1990) “A Convex Model of Equilibrium Growth: Theory and Policy Implications,” Journal of Political Economy, 98, 1008-1038.
Jones, L. E., and Manuelli, R. E., (2005) “Neoclassical Models of Endogenous Growth: The Effects of Fiscal Policy, Innovation and Fluctuations,” in: Philippe Aghion and Steven Durlauf (eds.), Handbook of Economic Growth, edition 1, volume 1, chapter 1.
Kehoe, T. J. (1989) “Intertemporal General Equilibrium Models,” in Frank H. Hahn, editor, The Economics of Missing Markets, Information, and Games, Oxford University Press, 363-93.
Lucas, R. and Prescott, E. (1971) “Investment under Uncertainty,” Econometrica, Vol. 39, No. 5, 659-681.
McGrattan, E. R. (1998) “A Defense of AK Growth Models,” Federal Reserve Bank of Minneapolis Quarterly Review, 22, 13-27.
Michel, P. (1990) “Some Clarifications on the Transversality Condition,” Econometrica, Vol. 58, No. 3, 705-723.
Negishi, T. (1960) “Welfare Economics and Existence of an Equilibrium for a Competitive Economy,” Metroeconomica, 12, 92-7.
Ramsey, F. (1928) “A Mathematical Theory of Saving,” Economic Journal, 38, 543-59.
Rebelo, S. (1991) “Long-Run Policy Analysis and Long-Run Growth,” Journal of Political Economy, 99, 500-521.
Santos, M. S. (1991) “Smoothness of the Policy Function in Discrete Time Economic Models,” Econometrica, Vol. 59, No. 5, 1365-1382.
Solow, R. (1956) “A Contribution to the Theory of Economic Growth,” Quarterly Journal of Economics, 70, 64-94.