Macroeconomic Theory I
Southern Methodist University
Erwan Quintin
Federal Reserve Bank of Dallas∗
First draft: July 25, 2006
This version: December 18, 2007
∗Erwan Quintin: Research Department, Federal Reserve Bank of Dallas, 2200 N. Pearl St., Dallas, TX 75201, (214) 922 5157, [email protected]. The views expressed herein are those of the authors and may not reflect the views of the Federal Reserve Bank of Dallas or the Federal Reserve System. This document draws heavily from the class notes of Tim Kehoe, Ed Prescott, Jim Dolmas and Dirk Krueger. I would like to thank Erasmus Kersting and my SMU students for weeding out many errors in previous versions of this document. All remaining errors are mine.
Contents
1 Course information 5
1.1 Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Grading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 List of topics and readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Neoclassical Growth Theory 9
2.1 The Ramsey optimal growth problem . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 A specific example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.3 The bottom line and the golden rule . . . . . . . . . . . . . . . . . . 16
2.2 The Solow Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Dynamic efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Population growth and progress . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Endogenous growth (Ak models) . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Diamond’s overlapping generation model . . . . . . . . . . . . . . . . . . . . 21
2.6 Existence and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.1 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Intertemporal General Equilibrium Models 35
3.1 Infinitely-lived consumer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Market structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.2 Welfare theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1.3 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.4 Money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Overlapping generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Separating hyperplane theorem . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Deterministic Dynamic Programming 55
4.1 Principle of optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.1 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.2 Contraction mapping theorem . . . . . . . . . . . . . . . . . . . . . . 62
4.2.3 Theorem of the Maximum . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Characteristics of the value function . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Value function iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.5 Application to the Ramsey problem . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 Deterministic dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5 Stochastic dynamic programming 81
5.1 Probability theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Transition functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4 Stochastic control problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.5 The stochastic principle of optimality . . . . . . . . . . . . . . . . . . . . . . 90
5.6 The stochastic Ramsey problem . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 Bibliography 95
Chapter 1
Course information
1.1 Course description
My goal in this course is to teach you some of the tools, models and techniques one needs
in order to read and write academic papers in macroeconomics. These tools can be a bit
difficult to learn. I encourage you to cooperate as much as possible with your classmates
and to talk to me whenever you get stuck on an assignment or have questions about the
material.
1.2 Resources
The main source of material for this course is my notes. I will also recommend readings from
the following textbooks:
1. N. L. Stokey and R. E. Lucas with E. C. Prescott, Recursive Methods in Economic
Dynamics, Harvard University Press, 1989.
2. C. Azariadis, Intertemporal Macroeconomics, Oxford: Blackwell Publishers, 1993.
3. L. Ljungqvist and T. J. Sargent, Recursive Macroeconomic Theory, MIT Press, 2000.
4. R. J. Barro and X. Sala-I-Martin, Economic Growth, McGraw Hill Publishers, 1995.
Finally, you should have handy a good micro theory textbook (such as Mas-Colell, Whinston
and Green) and a good textbook in real analysis (such as Rudin's Principles of Mathematical
Analysis).
1.3 Grading
Homework (30%), Midterm (35%), Final (35%). I will announce exam dates as soon as
possible.
There will be four problem sets. I will assign them once the appropriate material has
been covered, and you will have one week to complete them. There is no need to type
your answers, but if I can't read them you won't get credit, so make sure they are legible.
Homework problems are difficult, and they involve computer programming in some cases.
(You should start learning how to use Matlab right now. You should also start reading the
basic topology section of a senior-level real analysis textbook.)
Although I encourage you to collaborate with other students, each student must turn in
his or her own set of answers. In particular, everyone should write and turn in their own
code. Your code should be as neat as possible and contain comments that help me follow
your train of thought.
1.4 List of topics and readings
1. Neoclassical Growth Theory
Notes, chapter 2.
Jim Dolmas’ Matlab notes (and any other material you find useful for becoming proficient
with Matlab).
SLP, chapters 2, 3.
Azariadis, chapters 13, 14.
Ljungqvist and Sargent, chapter 11.
Barro and Sala-I-Martin, Intro, chapters 1, 2, plus the corresponding appendices.
Solow, R. (1956), “A Contribution to the Theory of Economic Growth”, Quarterly Journal
of Economics, 70, 64-94.
Diamond, P. (1965), “National Debt in a Neoclassical Growth Model”, American Economic
Review, 55, 1026-50.
Jones, L. E., and Manuelli, R. E., (2005) “Neoclassical Models of Endogenous Growth: The
Effects of Fiscal Policy, Innovation and Fluctuations,” in: Philippe Aghion and Steven
Durlauf (ed.), Handbook of Economic Growth, edition 1, volume 1, chapter 1.
2. Intertemporal General Equilibrium Models
Notes, chapter 3.
Kehoe, T. J. (1989) “Intertemporal General Equilibrium Models”, in Frank H. Hahn, editor,
The Economics of Missing Markets, Information, and Games, Oxford University Press,
363-93.
Ljungqvist and Sargent, chapter 8.
SLP, chapter 15.
3. Deterministic Dynamic Programming
Notes, chapter 4.
SLP, chapters 4, 5.
Ljungqvist and Sargent, chapters 2, 3.
4. Stochastic Dynamic Programming
Notes, chapter 5.
SLP, chapters 8, 9, 11.
Chapter 2
Neoclassical Growth Theory
2.1 The Ramsey optimal growth problem
2.1.1 Set-up
Consider an economy with one representative household and one representative firm. You
should think of these two representative agents as standing in for a large number of identical
households and firms, say a continuum of both types of agents distributed uniformly over
the unit interval (a continuum of mass one).1
Time is indexed by t ∈ {0, 1, 2, 3, . . .} and both agents live forever.2 There are three
types of commodities:
1. a consumption good,
2. physical capital,
3. labor.
1The “large number” story is meant to justify the assumption that these stand-in agents behave competitively, i.e. that they take all prices as given. Atomistic agents literally have no influence on aggregate variables.
2The assumption that households live forever turns out to simplify things quite a bit. For one thing, as we will see in chapter 4, economies where households live forever lend themselves to the use of stationary dynamic programming techniques. Think of it for now as approximating long life spans. Alternatively, as Barro (1973) explains, households who value the welfare of their offspring effectively solve an infinite horizon problem.
The household is endowed with a quantity a0 > 0 of physical capital at date 0 and with
one unit of labor in each period. In period 0, the household sells both factors to the firm
and earns
a0R0 + w0
where, respectively, R0 and w0 are the prices of capital and labor. This income can be
consumed or saved as physical capital to be used in period 1. Consumption (c0) and savings
(a1) at date 0 must solve
c0 + a1 = a0R0 + w0.
Similarly, letting (ct, at+1) denote the household’s decisions and (Rt, wt) be the factor prices
at date t, we must have:
ct + at+1 = atRt + wt.
Given a sequence {R_t, w_t}_{t=0}^{+∞} of prices, the household chooses a non-negative consumption-saving sequence {c_t, a_{t+1}}_{t=0}^{+∞} to maximize:

∑_{t=0}^{+∞} β^t U(c_t)
subject to:
ct + at+1 = atRt + wt for all t ≥ 0
where U is continuous, strictly increasing and strictly concave on IR+, continuously differentiable on IR++, and where β ∈ (0, 1) (the discount factor) measures the impatience of households. We will assume that U is bounded to make sure that ∑_{t=0}^{+∞} β^t U(c_t) is always a bounded sum (it is then dominated by a geometric series of modulus β < 1). This will turn out to entail no loss of generality. We will also assume that lim_{c→0} U′(c) = +∞ to make sure that the household always chooses to consume strictly positive amounts. (Show this.)
A very important question (and a type of question that we will ask over and over in this
class) is whether a solution exists to the household’s problem. Another important question
is whether the solution is unique. We will deal with those questions after fully stating the
Ramsey problem.
The firm operates a technology that, each period, transforms quantities k ≥ 0 of physical
capital and n ≥ 0 of labor into the consumption good according to a production function
F (k, n) that is continuously differentiable on IR++ and satisfies:
1. F (0, n) = 0 and F (k, 0) = 0 for all n, k ≥ 0,
2. For all k > 0, F (k, •) is strictly increasing and strictly concave,
3. For all n > 0, F (•, n) is strictly increasing and strictly concave,
4. F satisfies constant returns to scale (CRS): for all θ > 0, F (θk, θn) = θF (k, n),
5. lim_{k→+∞} F1(k, n) = 0 and lim_{k→0} F1(k, n) = +∞ for all n > 0,
6. lim_{n→+∞} F2(k, n) = 0 and lim_{n→0} F2(k, n) = +∞ for all k > 0.
We will refer to production functions that satisfy those properties as neoclassical produc-
tion functions. The last two sets of conditions are often called Inada conditions. Because F
satisfies CRS, note that for all (k, n) > (0, 0), F(k, n) = nF(k/n, 1) = nf(k/n), where f is called the intensive production function.
Note also that for all (k, n) > (0, 0), F1(k, n) = f′(k/n) while F2(k, n) = f(k/n) − (k/n)f′(k/n). In other words, marginal products only depend on the ratio between k and n, i.e. they are homogeneous of degree zero (this is a general result: the partial derivatives of functions that are homogeneous of degree r are homogeneous of degree r − 1).
The firm chooses inputs of capital and labor so as to maximize profits. That is, at date
t, it chooses kt and nt to maximize
F (kt, nt) + (1 − δ)kt − ktRt − ntwt.
where δ is the rate of depreciation of physical capital.3

3Instead of assuming the household sells its capital to the firm, we could assume that it rents it to the firm at rate r_t in period t and also receives the undepreciated part of capital at the end of each period. These two formulations are clearly equivalent with r_t ≡ R_t − (1 − δ) for all t ≥ 0.

A competitive equilibrium in this environment is a sequence {R_t, w_t}_{t=0}^{+∞} of prices, a sequence {c_t, a_{t+1}}_{t=0}^{+∞} of household decisions, and a sequence {k_t, n_t}_{t=0}^{+∞} of firm decisions such that:
1. Given prices, {c_t, a_{t+1}}_{t=0}^{+∞} solves the household’s problem;
2. Given prices, {k_t, n_t}_{t=0}^{+∞} solves the firm’s problem;
3. The market for capital clears: kt = at for all t ≥ 0;
4. The market for labor clears: nt = 1 for all t ≥ 0.
Note that given the Inada conditions we have imposed on U and F , any equilibrium must
be such that at (hence kt) is strictly positive for all t. Also because of these conditions, it is
easy to show that the capital stock series cannot grow without bound in equilibrium.
Note also that the equilibrium definition makes no mention of the market for the con-
sumption good. That’s because it must clear too, by Walras’ law (look it up), since all other
markets clear. To see this, note that profit maximization on the part of firms implies for all
t ≥ 0:
F1(kt, nt) + (1 − δ) = Rt
F2(kt, nt) = wt
Then, the consumer’s budget constraint together with market clearing conditions for
capital and labor imply that at date t,
ct + kt+1 = ktRt + wt = kt(F1(kt, nt) + (1 − δ)) + ntF2(kt, nt) = F (kt, nt) + (1 − δ)kt
where the last equality follows from Euler’s theorem for homogeneous functions (look it up.)
But this condition exactly says that the supply of the consumption good equals the demand
for the consumption good, that gross output (F (kt, nt)) equals consumption plus investment
(kt+1 − (1 − δ)kt).
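Euler's theorem for CRS functions (kF1 + nF2 = F) is what closes the goods market above. A quick numerical sanity check, in Python with an illustrative Cobb-Douglas technology and a hypothetical parameter value:

```python
# Euler's theorem for a function homogeneous of degree one:
# k*F1(k, n) + n*F2(k, n) = F(k, n), checked for an illustrative
# Cobb-Douglas technology (alpha is a hypothetical parameter).
alpha = 0.3

def F(k, n):
    return k**alpha * n**(1 - alpha)

def F1(k, n):
    return alpha * k**(alpha - 1) * n**(1 - alpha)  # marginal product of capital

def F2(k, n):
    return (1 - alpha) * k**alpha * n**(-alpha)     # marginal product of labor

k, n = 1.7, 1.0
assert abs(k * F1(k, n) + n * F2(k, n) - F(k, n)) < 1e-12
```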
Recalling now that U is continuously differentiable on IR++, necessary first order condi-
tions for an interior solution to the household’s problem are that, for all t ≥ 0:
βtU ′(ct) = λt
λt = λt+1Rt+1
where λt > 0 is the multiplier associated with the budget constraint at date t. Because
U is concave, these conditions are actually sufficient as long as the following transversality
condition holds:
lim_{t→+∞} λ_t k_{t+1} = 0 (2.1.1)
Some intuition for this last condition can be gained from considering the case where the
household lives for a finite number T of periods. In that case, it is optimal to eat all capital
in the last period (set kT+1 = 0) unless the consumer is satiated and the value of capital is
zero (λ_T = β^T U′(c_T) = 0). Taking this reasoning to the limit yields the condition above. A
more careful argument is in SLP, chapter 4.4
Note that combining the first two optimality conditions for the household’s problem, using the firm’s optimality conditions and market clearing conditions yields

U′(c_t)/(βU′(c_{t+1})) = f′(k_{t+1}) + (1 − δ) for all t ≥ 0 (2.1.2)
Together with the aggregate clearing condition,
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0 (2.1.3)
and the transversality condition,
lim_{t→+∞} β^t U′(c_t) k_{t+1} = 0 (2.1.4)
we have a full description of competitive equilibria. An allocation {c_t, k_{t+1}}_{t=0}^{+∞} is (part of) a competitive equilibrium if and only if it satisfies (2.1.2)-(2.1.4) given k0.
4SLP actually prove the sufficiency of a slightly different but equivalent version of the transversality condition. I prefer stating conditions that can be motivated directly from finite versions of infinite problems. Establishing the necessity of transversality conditions is difficult. See references provided in SLP at the end of chapter 4.

Do competitive equilibria exist? Do they maximize welfare? To answer these questions, consider an omniscient, benevolent social planner who can allocate resources as they please in this environment. They are benevolent in that among non-negative allocations {c_t, k_{t+1}}_{t=0}^{+∞} that satisfy the aggregate resource constraint in all periods, they choose the one that maximizes the welfare of the representative household. That is, given k0, the planner maximizes
∑_{t=0}^{+∞} β^t U(c_t)
subject to, for all t ≥ 0:
ct + kt+1 = f(kt) + (1 − δ)kt,
and ct, kt ≥ 0.
This problem is called the Ramsey optimal growth problem. Note that solving this problem
seems much easier than looking for competitive equilibrium allocations. Yet, the two tasks
are (essentially) the same. To see this, note that first order conditions for the planner’s
problem are (after some manipulations which you should carry out):
U′(c_t)/(βU′(c_{t+1})) = f′(k_{t+1}) + (1 − δ) for all t ≥ 0
and,
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0
and that the planner must satisfy the following condition:
lim_{t→+∞} β^t U′(c_t) k_{t+1} = 0
But these are exactly conditions (2.1.2)-(2.1.4). Therefore, competitive allocations and solutions to the social planner’s problem coincide. It follows that competitive allocations are Pareto optimal: no other feasible allocation exists that gives the household higher welfare.
One can show that the planner maximizes a strictly concave objective function over a
convex choice set, that this choice set is compact in the product topology, and that the
objective function is continuous in that topology as well. (Since we are encountering these
notions for the first time, this is worth a digression. The details are in section 2.6.) Therefore,
a unique solution to the planner’s problem exists, hence a unique competitive equilibrium
exists.
2.1.2 A specific example
Assume that for all (k, n) ≥ (0, 0), F(k, n) = k^α n^{1−α} and that for all c ≥ 0, U(c) = log c.5
Note that log is not defined at zero in the standard real numbers, but that is of no concern.
First, one could extend the function to IR+ using the extended reals. More simply, note that
for any strictly positive sequence {c_t}_{t=0}^{+∞}:

exp(∑_{t=0}^{+∞} β^t log(c_t)) = ∏_{t=0}^{+∞} c_t^{β^t}

so that the utility function on the right represents the same preferences as ∑_{t=0}^{+∞} β^t log(c_t).
That utility function (which we could use instead without any effect on household decisions)
is defined everywhere on IR_+^∞ as long as consumption is bounded. But in this environment
consumption is bounded since capital is bounded above (show). We can use that fact to
impose an effective bound on the utility function.
In this specific example, the evolution of consumption and savings in the Ramsey problem
is governed for all t ≥ 0 by:
c_{t+1} = β(αk_{t+1}^{α−1} + (1 − δ))c_t (2.1.5)
k_{t+1} = k_t^α + (1 − δ)k_t − c_t (2.1.6)
The evolution of (c_t, k_{t+1}) can be summarized as follows. Consumption rises as long as β(αk_{t+1}^{α−1} + (1 − δ)) > 1, i.e. as long as the gross return on capital exceeds the inverse β^{−1} of the discount factor, and falls otherwise. The positive6 steady state level of capital, then, is:
k* = (α/(β^{−1} − (1 − δ)))^{1/(1−α)}.
5log refers to the natural logarithm function, as is standard practice in the United States, though most countries would write ln for it.
6There is also a degenerate, zero-capital steady state in this economy. That steady state is unstable, however: the economy never converges there unless it starts there.
As for capital, it rises whenever consumption is below net output (gross output minus depreciation), i.e. whenever c_t ≤ k_t^α − δk_t, and falls otherwise.
Those dynamics can be represented on a phase diagram. Since we have argued that a unique optimal path exists, there is one and only one c0 such that the path implied by (2.1.5)-(2.1.6) is compatible with the non-negativity of savings, the non-negativity of consumption and the transversality condition. That unique path is called the saddlepath. In problem 3, I ask you to compute it for specific values of all parameters.
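As a preview of the computation, here is one standard way to find the saddlepath numerically: a shooting algorithm that bisects on c0, iterating (2.1.5)-(2.1.6) and discarding initial consumption levels that drive capital to zero. The sketch is in Python rather than Matlab, and the parameter values are hypothetical; with full depreciation (δ = 1) the model has the well-known closed-form policy c_t = (1 − αβ)k_t^α, which the code uses as a check.

```python
# Shooting algorithm for the saddlepath of the log-utility, Cobb-Douglas
# Ramsey model, equations (2.1.5)-(2.1.6). Parameter values are illustrative.
alpha, beta, delta = 0.3, 0.95, 1.0
k0 = 0.05

# Positive steady state from the Euler equation (2.1.5):
kstar = (alpha / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))

def crashes(c0, T=400):
    """True if initial consumption c0 eventually drives capital to zero."""
    k, c = k0, c0
    for _ in range(T):
        k_next = k**alpha + (1 - delta) * k - c        # resource constraint (2.1.6)
        if k_next <= 0:
            return True                                # infeasible: ate the capital stock
        c = beta * (alpha * k_next**(alpha - 1) + (1 - delta)) * c  # Euler (2.1.5)
        k = k_next
    return False

# Bisection: consumption levels above the saddlepath crash, levels at or
# below it do not, so the feasibility threshold is the saddlepath c0.
lo, hi = 0.0, k0**alpha + (1 - delta) * k0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if crashes(mid):
        hi = mid
    else:
        lo = mid
c0 = lo

# With delta = 1, the known closed-form policy is c = (1 - alpha*beta) * k^alpha.
assert abs(c0 - (1 - alpha * beta) * k0**alpha) < 1e-4
```

The same bisection works for δ < 1, where no closed form is available; only the crash test and the steady state change.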
I should emphasize that phase diagrams are heuristic tools and no substitute for a careful
proof of global convergence in the Ramsey model. We will provide such a proof in chapter
4 once we have the tools of dynamic programming at hand.
2.1.3 The bottom line and the golden rule
To summarize, the growth story implied by the Ramsey model is as follows in the case where
the economy starts below its steady state level of capital k∗. Capital and consumption both
rise and converge towards their steady state value where the marginal product of capital
equals the gross rate β−1 of time preference.
We know that this path maximizes welfare, but does it maximize steady state consump-
tion? In other words, consider the following problem. Assume the planner was free to choose
the initial level k of capital but had to commit to maintaining this level of capital forever.
Which level would they choose? That is, if the planner could choose a steady state value and
ignore the welfare consequences of transiting there, what capital stock would they choose?
This steady-state-consumption maximizing level of capital is called the golden rule capital
stock.
Since maintaining k requires setting consumption equal to f(k) − δk (that is, eating net output), the planner would choose to set f′(k) = δ, i.e. to set the gross marginal return on capital, f′(k) + (1 − δ), to one (make sure you see why this is the relevant condition). Instead,
the Ramsey solution converges to a strictly lower level of capital. Why? That is because
maximizing steady state consumption is not the Ramsey planner’s objective. The Ramsey
planner does not get to ignore the transition to steady state. While reaching the golden
rule level of capital is entirely feasible for the planner, the saving policy this requires implies
suboptimally low consumption in early periods. There is such a thing as saving too much.
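The gap between the Ramsey steady state and the golden rule is easy to quantify. A Python sketch with a hypothetical Cobb-Douglas parameterization (f(k) = k^α):

```python
# Ramsey steady state vs. golden rule capital stock for an illustrative
# Cobb-Douglas technology f(k) = k^alpha. Parameter values are hypothetical.
alpha, beta, delta = 0.3, 0.95, 0.1

# Ramsey steady state: f'(k*) + (1 - delta) = 1/beta
k_ramsey = (alpha / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))
# Golden rule: f'(k_GR) = delta, maximizing steady state consumption f(k) - delta*k
k_gold = (alpha / delta) ** (1 / (1 - alpha))

def c_ss(k):
    # steady state consumption associated with maintaining k forever
    return k**alpha - delta * k

# The Ramsey economy converges below the golden rule whenever beta < 1 ...
assert k_ramsey < k_gold
# ... giving up a little steady state consumption in exchange for higher
# consumption along the transition:
assert c_ss(k_ramsey) < c_ss(k_gold)
```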
2.2 The Solow Model
2.2.1 Set-up
Solow (1956) considers exactly the same environment as Ramsey except that he posits that savings (hence investment) are a fraction s > 0 of gross output at all dates. Clearly, given this ad hoc assumption on saving behavior, we have no reason to expect that the resulting path of capital and consumption is optimal.
Since consumption is a function of output, hence of capital only, the model boils down to the following first order difference equation for capital, for all t:
kt+1 = sf(kt) + (1 − δ)kt
This problem lends itself very nicely to graphical analysis. Again, from any initial level of capital, capital converges to a unique steady state value. Along the transition, if we start below steady state, consumption and the marginal product of labor rise while the marginal product of capital falls.
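The difference equation above is straightforward to iterate. A Python sketch under an illustrative Cobb-Douglas technology and hypothetical parameter values:

```python
# Iterating the Solow difference equation k' = s*f(k) + (1 - delta)*k
# for an illustrative Cobb-Douglas technology f(k) = k^alpha.
alpha, s, delta = 0.3, 0.2, 0.1   # hypothetical parameter values

def f(k):
    return k**alpha

k = 0.1                            # arbitrary initial capital stock
for _ in range(1000):
    k = s * f(k) + (1 - delta) * k

# The steady state solves s*k^alpha = delta*k, i.e. k* = (s/delta)^(1/(1-alpha)),
# and the iteration converges to it from any positive starting point.
kstar = (s / delta) ** (1 / (1 - alpha))
assert abs(k - kstar) < 1e-8
```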
2.2.2 Dynamic efficiency
Is the resulting allocation efficient? In general, no. And making this case doesn’t even
require specifying preferences beyond the assumption that households always prefer more
consumption.
Given k0 = a0, call a non-negative path {c_t, k_{t+1}}_{t=0}^{+∞} feasible if it satisfies the aggregate resource constraint in every period. Define a feasible path {c_t, k_{t+1}}_{t=0}^{+∞} to be dynamically inefficient if there exists another feasible path {c′_t, k′_{t+1}}_{t=0}^{+∞} such that c′_t ≥ c_t for all t with a strict inequality in at least one period. That is, a feasible capital path exists that gives the household at least as much consumption in every period and strictly more in at least one period.
Assume now that s is such that the steady state of capital exceeds the golden rule capital
stock level kGR. Then, from some date T on, f ′(kt) < δ. Set instead k′t = kGR for all t ≥ T
and c′t = f(kGR) − δkGR. The resulting path is feasible and yields more consumption than
the Solow path from T − 1 on. (I ask you for the complete argument in problem 5.)
Households in this economy are saving too much. In this case, R_t = f′(k_t) + (1 − δ) < 1 past some date.
in the one-sector optimal growth context:
Theorem 1. A feasible path {c_t, k_{t+1}}_{t=0}^{+∞} is inefficient if and only if lim_{t→+∞} ∑_{s=0}^{t} ∏_{i=0}^{s−1} R_i < +∞.
As an illustration of this very nice result, notice that when in the Solow model the capital path eventually exceeds the golden rule capital stock, we have that R_t = f′(k_t) + (1 − δ) < 1 − ε eventually for some ε > 0. But this means that ∑_{t=0}^{+∞} ∏_{i=0}^{t−1} R_i eventually behaves like a geometric sum of modulus smaller than 1 − ε, hence converges.
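This geometric-sum argument can be checked numerically. The Python sketch below starts a Solow economy at the steady state implied by a saving rate above the golden rule level (s > α for Cobb-Douglas; the parameterization is hypothetical), so that R_t < 1 in every period and the partial sums in Cass's criterion converge:

```python
# Illustration of Cass's criterion: with a saving rate above the golden
# rule level (s > alpha for Cobb-Douglas), the Solow path is dynamically
# inefficient and sum_t prod_{i<t} R_i converges. Numbers are illustrative.
alpha, s, delta = 0.3, 0.6, 0.1        # s > alpha: oversaving

k = (s / delta) ** (1 / (1 - alpha))   # start at the (oversaving) steady state
total, prod = 0.0, 1.0
for t in range(2000):
    total += prod                      # partial sum of Cass's criterion
    R = alpha * k**(alpha - 1) + (1 - delta)   # gross return R_t
    prod *= R
    k = s * k**alpha + (1 - delta) * k         # Solow law of motion

# R_t is constant and below one here, so the partial sums converge to 1/(1 - R).
R_ss = alpha * k**(alpha - 1) + (1 - delta)
assert R_ss < 1
assert abs(total - 1 / (1 - R_ss)) < 1e-6
```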
2.3 Population growth and progress
The economies we have considered so far all converge to a steady state where consumption,
the capital stock and output are constant. To be useful as models of growth, they need to
be consistent with the fact that, in most countries, output and consumption do grow over
time.
The easiest (and perhaps most sensible) way to generate some growth in the model is to assume that the quantity of labor the household can deliver grows geometrically at rate g > 0.
Note that whether this is because the household is getting bigger (population growth) or because the household is able to deliver more labor per unit of time (productivity growth) is immaterial. Technological progress, for instance, is one reason why a unit of household time could come to represent more labor for the firm.
At any rate, at date t the resource constraint now becomes:

c_t + k_{t+1} = F(k_t, (1 + g)^t) + (1 − δ)k_t
⇐⇒ c_t/(1 + g)^t + k_{t+1}/(1 + g)^t = F(k_t/(1 + g)^t, 1) + (1 − δ)k_t/(1 + g)^t
⇐⇒ ĉ_t + k̂_{t+1}(1 + g) = f(k̂_t) + (1 − δ)k̂_t

where for date t and variable x_t, x̂_t ≡ x_t/(1 + g)^t.
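The chain of equivalences above only uses constant returns to scale. A small Python check, with an illustrative Cobb-Douglas F and hypothetical numbers, confirms that the resource constraint in levels holds exactly when its detrended counterpart does:

```python
# Check that the resource constraint in levels is equivalent to its
# detrended form c_hat + (1+g)*k_hat' = f(k_hat) + (1-delta)*k_hat.
# The Cobb-Douglas technology and all numbers are illustrative.
alpha, delta, g = 0.3, 0.1, 0.02
t = 5
kt, kt1, ct = 0.8, 0.85, 0.5       # arbitrary date-t levels

def F(k, n):
    return k**alpha * n**(1 - alpha)

def f(x):
    return F(x, 1.0)

lhs = ct + kt1 - (F(kt, (1 + g)**t) + (1 - delta) * kt)            # levels version
k_hat, k1_hat, c_hat = kt / (1+g)**t, kt1 / (1+g)**(t+1), ct / (1+g)**t
rhs = c_hat + (1 + g) * k1_hat - (f(k_hat) + (1 - delta) * k_hat)  # detrended version

# The two residuals agree up to the scale factor (1+g)^t, so one is zero
# exactly when the other is:
assert abs(lhs - (1 + g)**t * rhs) < 1e-12
```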
It is natural in this environment to look for a balanced growth path, i.e. an equilibrium where all variables grow at the same rate as labor. To make sure that such a path exists, we need some assumption on preferences. Assuming that U is of the Constant Relative Risk Aversion (CRRA) sort, i.e. that U(c) = c^σ/σ for all c, where σ < 1, works. (Note that σ can be negative and that σ = 0 is the log case. Now would be a good time to read Jones and Manuelli, 2005.)
Then, in the Ramsey problem, the planner chooses a path {ĉ_t, k̂_{t+1}}_{t=0}^{+∞} that satisfies the resource constraint above and maximizes

∑_{t=0}^{+∞} β^t U((1 + g)^t ĉ_t) = ∑_{t=0}^{+∞} (β(1 + g)^σ)^t ĉ_t^σ/σ

using the fact that preferences are of the CRRA type. To make sure that the sum is always bounded we need to assume that β̂ ≡ β(1 + g)^σ < 1, which is an implicit bound on the growth rate compatible with balanced growth. First order conditions become:

β̂^t (ĉ_t)^{σ−1} = λ_t
λ_t(1 + g) = λ_{t+1}(f′(k̂_{t+1}) + (1 − δ))
where λ_t is the multiplier associated with date t’s resource constraint. Combining gives:

ĉ_{t+1}/ĉ_t = (β̂(f′(k̂_{t+1}) + (1 − δ))/(1 + g))^{1/(1−σ)}
Together with the resource constraint we have a system that looks exactly like the one we
had before, and the analysis of the system is the same as before except that per-labor-unit variables replace our old variables. In particular, in steady state, consumption, capital and output all grow at the rate g > 0 of progress.
As Jones and Manuelli (1990) explain, the reason why no growth can take place in the
Ramsey environment with fixed labor is that the only reproducible input in that model is
capital, and that returns to capital eventually become dominated by the rate of physical
depreciation. Growth requires that returns to all reproducible inputs be sufficiently high
asymptotically. What we did in this section is allow labor to grow exogenously, making it
reproducible in a trivial sense. Since the combined returns to capital and labor are constant,
perpetual growth then becomes possible.
We only considered the case of exogenous labor growth so far, but one could instead
assume that households can invest in labor the way they invest in capital, as in human
capital models. Then returns to all reproducible factors would no longer be diminishing,
making asymptotic growth once again possible. Models built around these ideas are called
endogenous growth models since growth then results from the accumulation decisions of
agents. While these models share a number of features with exogenous models of growth
(such as the balanced asymptotic nature of the equilibrium path), they also make predictions
that clearly separate them from exogenous models, as the next section discusses.
2.4 Endogenous growth (Ak models)
Assume that returns to physical capital are constant so that f(k) = Ak for all k ≥ 0
where A > 0. The premise that returns to physical capital are not diminishing (or, rather,
that they do not converge to zero as k becomes large given a fixed amount of labor) may
seem incongruous but you should think of capital here in a broad sense as representing all
reproducible inputs. For more on this, see e.g. Rebelo (1991) or McGrattan (1998).
Then, assuming that U is of the CRRA sort with parameter σ < 1, manipulations of the
first order conditions of the corresponding social planner problem yield, for all t ≥ 0:
c_{t+1}/c_t = (β(1 + A − δ))^{1/(1−σ)}.
If a balanced growth equilibrium path exists where consumption, capital and output all grow at rate g ≥ 0, the above condition then implies that 1 + g ≡ (β(1 + A − δ))^{1/(1−σ)}. Obviously, growth requires that β(1 + A − δ) > 1. Recall also that the planner’s problem is well defined only provided that β(1 + g)^σ < 1. This gives us a range of parameters compatible with balanced growth. In fact, under this condition, the balanced growth path where all endogenous variables immediately and permanently grow at rate g is the only solution to the planner’s problem.
Models of this sort turn out to make sharp predictions for the impact of government
policies on growth that differ greatly from the predictions of models with exogenous growth
(such as the model described in the previous section.) Assume for instance that in any
given period t, fraction τ of gross output Akt is taxed and that the proceeds from taxation
are dissipated.7 Such an economy is equivalent to an economy with productivity parameter A(1 − τ) < A; hence taxes permanently lower the growth rate.
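A small Python computation, with hypothetical parameter values, makes the point concrete: the tax acts exactly like a lower A and therefore lowers the balanced growth rate.

```python
# Balanced growth in the Ak model: 1 + g = (beta*(1 + A - delta))^(1/(1-sigma)),
# and the effect of a hypothetical output tax tau, which acts like lowering A.
beta, sigma, delta, A, tau = 0.96, -1.0, 0.05, 0.12, 0.2

def growth_rate(prod):
    return (beta * (1 + prod - delta)) ** (1 / (1 - sigma)) - 1

g_no_tax = growth_rate(A)
g_tax = growth_rate(A * (1 - tau))

assert g_no_tax > 0                      # beta*(1 + A - delta) > 1 here
assert g_tax < g_no_tax                  # the tax permanently lowers growth
assert beta * (1 + g_no_tax)**sigma < 1  # planner's problem remains well defined
```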
In models with exogenous growth however, asymptotic growth rates are completely inde-
pendent of net production opportunities, hence they are independent of taxes. (Make sure
that you can convince yourself of this.)
Another way to put this is that policies that affect investment rates have permanent effects
on an economy’s growth rate in endogenous models. This sharp prediction of endogenous
models has been extensively tested, with mixed results (see e.g. McGrattan, 1998, for more
on this.)
2.5 Diamond’s overlapping generation model
Assume that each period a new household is born that lives for two periods. At date t
therefore, there are two households: one born at date t − 1 that is in the second and final
period of its life, and one born at date t.
Assume that in the first period of their life households can deliver one unit of labor but
that they do not work when old. Date t household then splits its labor income between
7All that matters for this argument is that the way taxes are used does not impact marginal utilities or production opportunities.
consumption in the first period of its life, c^y_t (y for young), and savings, a_{t+1}. In the second period of its life, the date-t household simply sells its physical capital and consumes c^o_{t+1} = a_{t+1}R_{t+1}.
To get the economy off the ground it is necessary to assume that at date 0 there is a household in the second period of its life (the initial old) and that this initial generation is endowed with physical capital a0. The initial old simply consume their income: c^o_0 = a_0R_0.
Date t household chooses a non-negative consumption-saving profile to solve:

max U(c^y_t) + βU(c^o_{t+1})

subject to

c^y_t + a_{t+1} = w_t
c^o_{t+1} = a_{t+1}R_{t+1}
where, as before, we assume that U is strictly increasing, strictly concave, continuous on IR+
and continuously differentiable on IR++ while β < 1. The firm, for its part, does exactly
what it did in the Ramsey environment which implies as before that for all t:
R_t = f′(k_t) + (1 − δ)   (2.5.1)
w_t = f(k_t) − f′(k_t)k_t   (2.5.2)
A competitive equilibrium in this context is a sequence of prices {R_t, w_t}_{t=0}^{∞}, a consumption
level c^o_0 for the initial old, policies (c^y_t, a_{t+1}, c^o_{t+1})_{t=0}^{+∞} for all other generations, and policies
{k_t, n_t}_{t=0}^{∞} for the firm such that:

1. Given prices, c^o_0 solves the initial old's problem (c^o_0 = a_0R_0);

2. Given prices, (c^y_t, a_{t+1}, c^o_{t+1}) solves date-t household's problem;

3. Given prices, {k_t, n_t}_{t=0}^{∞} solves the firm's problem for all t ≥ 0;

4. The market for capital clears: k_t = a_t for all t ≥ 0;
5. The market for labor clears: n_t = 1 for all t ≥ 0.
Do equilibria exist? Are they unique? Are they optimal? To answer those questions,
observe first that because U is strictly concave there is a unique solution to the problem of
each household given w_t and R_{t+1}. In turn, prices only depend on the capital stock, so that we
can write a(k_t, k_{t+1}) for date t household's savings decision given (k_t, k_{t+1}). In equilibrium,
we must have, for all t,
a(k_t, k_{t+1}) = k_{t+1}.   (2.5.3)
Assume that savings increase with the interest rate. Then an increase in k_{t+1} lowers R_{t+1},
which in turn means that a(k_t, k_{t+1}) is a decreasing function of k_{t+1}. Hence exactly one
solution to the above equation exists.8 The assumption that savings rise with the interest
rate is often called the gross substitutability assumption. It says that when the relative price
of future consumption falls, agents choose to shift more resources to the second period of
their life (or borrow less from future income.)
Then (2.5.3) can be solved for k_{t+1} as a function of k_t. Write G(k) for the resulting value
of next period's capital given current capital k. The evolution of the system is fully described
by k_{t+1} = G(k_t) for all t ≥ 0 and we have:
Theorem 2. If U is such that savings rise with the rental rate, then a unique equilibrium
exists.
In general however, there may be several solutions to equation (2.5.3), that is, G may be
set-valued (a correspondence) and several equilibria may exist. (See Azariadis, 1993, for a
very nice discussion and several interesting examples.)
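To see the fixed-point construction at work, here is a numerical sketch. The functional forms are mine, not the text's: CRRA utility U(c) = c^{1−ρ}/(1 − ρ) with ρ < 1 (so savings rise with the interest rate and gross substitutability holds), f(k) = k^α, and δ = 1. The household's Euler equation U′(c^y) = βRU′(c^o) then delivers savings a = w·x/(1 + x) with x = (βR^{1−ρ})^{1/ρ}, and G(k) is computed by bisection on the market clearing condition (2.5.3).

```python
alpha, beta, rho = 0.33, 0.9, 0.5       # rho < 1 delivers gross substitutability

def wage(k):                             # w(k) = f(k) - f'(k)k = (1-alpha) k**alpha
    return (1 - alpha) * k ** alpha

def R(k):                                # gross return with delta = 1: R = f'(k)
    return alpha * k ** (alpha - 1)

def savings(k, k_next):
    """a(k, k'): savings implied by the Euler equation U'(c_y) = beta R U'(c_o)."""
    x = (beta * R(k_next) ** (1 - rho)) ** (1 / rho)
    return wage(k) * x / (1 + x)

def G(k):
    """Solve a(k, k') = k' by bisection; savings - k' is decreasing in k'."""
    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if savings(k, mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = 1.0
for _ in range(100):                     # iterate k_{t+1} = G(k_t)
    k = G(k)
print(k)                                 # candidate steady state of the OLG economy
```

At the limit point, savings evaluated at (k, k) reproduce k, so the capital market clears period after period.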
How about welfare? Since all households solve a finite horizon problem, a transversality
condition that prevents overaccumulation no longer emanates from the household problem
8 Existence only requires that a solution to this capital market clearing equation exist given any k_t, which we can guarantee under weak conditions. It suffices for instance to assume that consumption when old is a normal good. To see this, note first that the theorem of the maximum (see chapter 4) implies that a(k, ·) is continuous for any k > 0. Furthermore, given k > 0, a(k, k′) < k′ when k′ is high enough since a(k, k′) < f(k) + (1 − δ)k for any k′ > 0. On the other hand, a(k, k′) = c^o(w(k), R(k′))/R(k′) > k′ ⇐⇒ c^o(w(k), R(k′)) > k′R(k′). The left-hand side rises as k′ falls since c^o is normal (make sure you see that), while the right-hand side (which is bounded above by f(k′) + (1 − δ)k′) becomes vanishingly small as k′ does. It follows that a(k, k′) > k′ when k′ is small enough. An appeal to the intermediate value theorem completes the argument.
and we cannot rule out the possibility of dynamic inefficiency. (See Diamond, 1965, for much more on this.)
Problem 6 provides a way to build specific examples that show that the first welfare
theorem may fail to hold in this model. One can specify technological opportunities and
preferences so that the equilibrium dynamics of capital are exactly those of the Solow model.
Then, one can trivially build an example where capital converges to a value that exceeds the
golden rule level, a Pareto inefficient outcome.
To see that the competitive equilibrium is inefficient in that case, note that the rental
rate of capital (the gross return on savings) is less than one eventually. But households in
this environment can always agree to transfer resources from the young to the old at a rate
of one-for-one. An arrangement such as social security or fiat money could therefore raise
everybody’s returns on savings without hurting anyone. We will elaborate on this possibility
in the next chapter.
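The construction of Problem 6 can be previewed numerically. With log utility, f(k) = k^α and δ = 1, the equilibrium dynamics reduce to k_{t+1} = [β/(1 + β)](1 − α)k_t^α, a Solow-type law of motion, and the golden rule solves f′(k) = 1, i.e. k_GR = α^{1/(1−α)}. The sketch below (illustrative parameters, not from the text) checks that the steady state exceeds the golden rule level and that the steady-state gross return on savings falls below one.

```python
alpha, beta = 0.2, 0.9
s = beta / (1 + beta) * (1 - alpha)   # effective savings rate out of output

k = 1.0
for _ in range(200):                  # iterate the Solow-type law of motion
    k = s * k ** alpha

k_ss = s ** (1 / (1 - alpha))         # closed form for the steady state
k_gr = alpha ** (1 / (1 - alpha))     # golden rule: f'(k) = 1 when delta = 1
R_ss = alpha * k_ss ** (alpha - 1)    # gross return on savings at steady state

print(k_ss > k_gr, R_ss)              # over-accumulation: R_ss < 1, so a
                                      # one-for-one young-to-old transfer beats
                                      # saving at the margin
```

Since R_ss < 1, a social-security-like transfer offers every generation a better return than physical capital.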
For now, you should remember that the key results we have established in the Ramsey
model (that equilibria are always unique, optimal and display global convergence to a unique
steady state) may all fail to hold in overlapping generations models. Depending on which
questions you wish to ask, these may or may not be desirable features.
2.6 Existence and uniqueness
2.6.1 Existence
What conditions guarantee that maximization problems have solutions? In particular, under
what conditions does the Ramsey problem we studied in this chapter have a solution? To
answer these questions, we need some basic notions of topology. If you haven’t already, you
should start the process of reading a Real Analysis textbook such as Rudin’s Principles of
Mathematical Analysis from cover to cover. We will start with some definitions.
Let (X, d) be a metric space. That is, X is a set and d is a distance function on X × X
that satisfies for all (x1, x2, x3) ∈ X3:
1. d(x1, x2) ≥ 0 with equality if and only if x1 = x2
2. d(x1, x2) = d(x2, x1)
3. d(x1, x2) ≤ d(x1, x3) + d(x3, x2)
The ball centered at x ∈ X of radius ε is the set B_ε(x) ≡ {y ∈ X : d(x, y) ≤ ε}.
A sequence {x_n}_{n=0}^{∞} ∈ X^∞ converges to x ∈ X if for all ε > 0 there exists N such that
n > N implies x_n ∈ B_ε(x).
A set A is called open if whenever x ∈ A there exists ε > 0 such that B_ε(x) ⊂ A. A is
closed if its complement in X is open. Equivalently (show), a set A is called closed if any
convergent sequence {x_n}_{n=0}^{+∞} ∈ A^∞ converges to a point in A.
A set A is called bounded if it is contained in a ball. A set A is called totally bounded if
for all ε > 0, A can be covered with a finite number of balls of radius ε. (For an example of a
metric space with sets that are bounded but not totally bounded, scroll down 3 paragraphs.)
An open cover of set A is a collection {O_α} of open sets such that A ⊂ ∪_α O_α.
A set A is called compact if any open cover of A contains a finite number of sets that cover
A. Equivalently in a metric space, a set is called compact if it is closed and totally bounded.
Also equivalently in a metric space, a set is called compact if every sequence in the set has a
convergent subsequence that converges to a point in the set. (This last notion is often called
sequential compactness. It is the most convenient to use in many proofs.)
In IR^n with the standard Euclidean norm, a set is compact if and only if it is closed and
bounded. With a different metric, boundedness is no longer sufficient. For instance, let X = IR
and for all (x, y) ∈ IR², let d(x, y) = 1 if x ≠ y, and d(x, y) = 0 otherwise. This is called
the discrete metric. Sets are compact in that metric if and only if they are finite. (Exercise:
which sets are open in this metric?) IR is bounded but not totally bounded in this metric.9
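A toy computation illustrates the discrete metric (the point set below is a made-up finite sample):

```python
# In the discrete metric, every singleton is a ball of radius 1/2, so every
# set is open; and only finite sets are compact.
X = list(range(10))
d = lambda x, y: 0.0 if x == y else 1.0
ball = lambda x, eps: {y for y in X if d(x, y) <= eps}

print(ball(3, 0.5))   # {3}: singletons are balls, hence every set is open
print(ball(3, 1.0))   # the whole space: any radius >= 1 covers everything
```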
A real function f : X → IR is called continuous on X if for any open (closed) set O
of reals, f^{−1}(O) is open (closed). In metric spaces, there are two alternative, equivalent
definitions of continuity. A function f is continuous on X if for all x ∈ X and ε > 0 there
exists δ > 0 such that d(x, y) < δ =⇒ |f(x) − f(y)| < ε. Finally, a function f is continuous
9 This example should tell you that boundedness of a set is really not a very interesting notion topologically speaking. Any metric can be transformed into a bounded metric via a simple transformation without changing anything about the nature of the space under study. But, in this topologically equivalent metric, all sets are bounded.
if for every convergent sequence {x_n} in X,

lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n).
Let A be a subset of X. Define arg max_A f = {x ∈ A : f(x) ≥ f(y) ∀y ∈ A}. Here's the
result we need:
Theorem 3. (Weierstrass) Let (X, d) be a metric space, A be a subset of X, and f be a
function on X. If f is continuous on A and A is compact, then arg maxA f is not empty.
Proof. For each x ∈ A define S_x = {y ∈ X : f(y) < f(x)}. Because f is continuous, all
these sets are open. Take any finite subset {x_1, x_2, . . . , x_n} of A and let x_i be such that
f(x_i) ≥ f(x_j) for j = 1, 2, . . . , n. Then x_i ∉ ∪_{j=1,...,n} S_{x_j}. This means that no finite
subfamily of {S_x}_{x∈A} covers A; hence, since A is compact, {S_x}_{x∈A} itself cannot cover A.
Any point of A left uncovered maximizes f on A, so arg max_A f is not empty.
The assumption that f is continuous is stronger than what we need, as the proof makes
clear. We only need f to be such that the sets Sx defined in the proof are open for all x ∈ X.
Such a function is called upper semi-continuous.
In our Ramsey problem, A is the planner's choice set. That is,

A = { {c_t, k_{t+1}}_{t=0}^{+∞} : c_t ≥ 0, k_{t+1} ≥ 0, c_t + k_{t+1} = f(k_t) + (1 − δ)k_t for all t ≥ 0, with k_0 = a_0 }

while f({c_t, k_{t+1}}_{t=0}^{+∞}) = ∑_{t=0}^{+∞} β^t U(c_t). We need to find a metric space that contains A such
that A is compact and f is continuous.
A metric space that does the trick is (IR²)^∞ (the set of sequences of pairs of real numbers)
equipped with the product topology metric. The product distance between two elements x
and y of (IR²)^∞ is d(x, y) = ∑_{t=0}^{+∞} θ^t |x_t − y_t|/(1 + |x_t − y_t|) where θ < 1. (Note that x_t and y_t are pairs;
|x_t − y_t| denotes the Euclidean distance between x_t and y_t in IR².) A sequence of sequences converges
to a given sequence in this metric if and only if it converges in each coordinate.
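A sketch of this metric in code (the weight θ and the test sequences are made up for illustration); the truncation uses the fact that each term in the sum is bounded by θ^t, so the tail past T contributes less than θ^T/(1 − θ):

```python
import math

def product_distance(x, y, theta=0.5, tol=1e-12):
    """d(x, y) = sum_t theta^t |x_t - y_t| / (1 + |x_t - y_t|).

    x and y map dates t to coordinates. Each term is at most theta^t, so
    truncating the sum at T leaves an error below theta^T / (1 - theta);
    T is chosen to push that error below tol.
    """
    T = math.ceil(math.log(tol * (1 - theta)) / math.log(theta))
    d = 0.0
    for t in range(T):
        gap = abs(x(t) - y(t))
        d += theta ** t * gap / (1 + gap)
    return d

# Coordinate-wise convergence implies convergence in this metric: the
# sequences x^i with x^i_t = 1 + 1/i converge coordinate by coordinate to
# the constant sequence c_t = 1, and their product distance to it shrinks.
c = lambda t: 1.0
dists = [product_distance(lambda t, i=i: 1.0 + 1.0 / i, c) for i in (1, 10, 100)]
print(dists)   # strictly decreasing toward zero
```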
Conditions we have imposed on the production function imply that M > 0 exists such
that kt < M for all t (Show). In turn this implies that consumption is bounded as well.
That A is compact now follows directly from a powerful theorem for product spaces called
Tychonoff’s theorem which says that any Cartesian product of compact sets is compact.
For completeness however, let us argue directly that A is totally bounded and closed in
the product topology. Closedness follows from coordinate-wise convergence (show it). To
prove total boundedness, fix ε > 0 and pick T such that ∑_{t=T}^{+∞} θ^t < ε/2. Then find a finite
number of sequences such that for the first T coordinates all elements of A are within distance
ε/2 of those sequences. (Define a grid of mesh ε/T in each coordinate for both consumption and
capital and take your finite set of sequences to be all possible combinations of grid points
before T, and 0 in all coordinates after T.) Since the tails of all elements of A past T can
only add ε/2 in distance, we have our finite set of balls covering all of A.
Now we need to argue that f is continuous. To see that it is, recall that U is bounded
in absolute value by some number M > 0. (We didn't really need to assume it: the fact that consumption
is bounded above imposes an effective bound on U.) Take a sequence of sequences {c^i_t}_{t=0}^{+∞}
that converges to a sequence {c*_t}_{t=0}^{+∞} in the product topology. In other words, for all t, c^i_t → c*_t
as i gets large. Fix ε > 0 and T and, using the fact that U is a continuous real function, choose i large
enough so that |U(c^i_t) − U(c*_t)| ≤ ε/T for all t < T. Then,

|∑_{t=0}^{+∞} β^t U(c^i_t) − ∑_{t=0}^{+∞} β^t U(c*_t)| ≤ ∑_{t=0}^{+∞} β^t |U(c^i_t) − U(c*_t)|
  = ∑_{t=0}^{T−1} β^t |U(c^i_t) − U(c*_t)| + ∑_{t=T}^{+∞} β^t |U(c^i_t) − U(c*_t)|
  ≤ ε + 2Mβ^T/(1 − β).

Because β < 1, the last expression can be made as small as we wish by making T large
enough. This implies that |∑_{t=0}^{+∞} β^t U(c^i_t) − ∑_{t=0}^{+∞} β^t U(c*_t)| → 0 as i grows large, and we have
shown that f is continuous in the product metric, which is the last thing we needed in order
to apply Weierstrass' theorem.
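The tail bound in the display above is easy to check numerically. In the sketch below, the bounded utility function is an arbitrary illustration, not the course's U:

```python
beta = 0.9
M = 1.0
U = lambda c: c / (1.0 + c)          # an arbitrary utility bounded by M = 1

def discounted(cs):
    """Discounted utility of a finite consumption list."""
    return sum(beta ** t * U(c) for t, c in enumerate(cs))

# Two consumption paths that agree on their first T coordinates can differ
# in discounted utility by at most 2*M*beta**T/(1 - beta), no matter how
# wildly they disagree in the tail.
T, N = 20, 200
c1 = [1.0] * N
c2 = [1.0] * T + [100.0] * (N - T)   # large disagreement after date T
gap = abs(discounted(c1) - discounted(c2))
bound = 2 * M * beta ** T / (1 - beta)
print(gap, bound)                     # the gap sits below the bound
```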
A metric that does not work in this case is the "sup-norm" metric, a metric we will
use repeatedly in the dynamic programming section of this course. The trick in finding a
topology that works is to note that topologies where many sets are open (like the discrete
topology, where all sets are open and all functions are continuous, but the only sets that are
compact are finite sets) make compactness difficult to show but continuity easy to show.
The opposite is true for topologies where few sets are open (like the trivial topology, where
the only open sets are the space itself and the empty set; the only continuous real functions
then are the constant functions, but all sets are compact). We need a topology that is just
right.
2.6.2 Uniqueness
When are solutions to maximization problems unique? To answer that question we need to
introduce the notion of convexity. For this, we need (yet) a bit more structure. A set S is
called a real linear space if
1. for all (x, y) ∈ S×S there exists an element of S called the sum of x and y and denoted
by x + y that satisfies all the standard properties of addition,
2. for all (α, x) ∈ IR×S there exists an element of S called the (scalar) product of α and
x and denoted by αx that satisfies the standard properties of scalar multiplication,
3. for all (α, θ, x, y) ∈ IR2 × S2,
(α + θ)(x + y) = α(x + y) + θ(x + y) = αx + αy + θx + θy.
A subset A of a linear space is called convex if for all (x, y) ∈ A×A, and for all θ ∈ [0, 1],
θx + (1 − θ)y ∈ A. A function f on A is called concave if for all (x, y) ∈ A × A, and for all
θ ∈ [0, 1], f(θx+(1−θ)y) ≥ θf(x)+(1−θ)f(y). It is called strictly concave if the inequality
is strict whenever θ ∈ (0, 1) and x ≠ y.
The result we need is that whenever A is convex and f is strictly concave, arg max_A f
contains at most one element. To see this, assume that two distinct elements of A maximize
f. Then any strict convex combination of these two elements yields a strictly higher value of f,
contradicting the premise that the two elements are maximizers. All told, if f is continuous and strictly
concave, and if A is compact and convex, then arg max_A f contains exactly one element.
It is easy to see (and you should show) that the Ramsey planner maximizes a strictly
concave function on a convex set.10 So at most one optimal allocation exists. Since we have
argued above that at least one exists, the Ramsey problem yields exactly one solution.
10 Here are some details. In order to use our uniqueness result, it is useful to extend the planner's choice set to A = { {c_t, k_{t+1}}_{t=0}^{+∞} : c_t ≥ 0, k_{t+1} ≥ 0, c_t + k_{t+1} ≤ f(k_t) + (1 − δ)k_t for all t ≥ 0, with k_0 = a_0 }, i.e. to allow the planner to waste resources. This entails no loss of generality since we are only adding an option the planner will choose to ignore, but it makes the choice set convex since f is concave. Now we can use our uniqueness result. But since the objective function depends on consumption alone, the result only implies in and of itself that at most one optimal consumption path exists. To see that this, in turn, implies a unique capital path only requires noting that the planner always exhausts all resources in all periods.
2.7 Problems
Problem 1
A household lives for two periods and chooses (c1, c2) ≥ (0, 0) to solve:
max U(c1) + βU(c2)
subject to:
c1 ≤ y
c2 = (y − c1)(1 + r)
where y > 0, r > −1 and U is a function on IR+.
1. Assume that U is continuous on IR+. Show that the household’s problem has a solution.
2. Assume in addition that U is strictly concave; show that the solution is unique.
3. Impose some additional conditions on U such that c1(r, y) is differentiable in r and y. (Use standard intermediate-micro arguments.)
4. Impose the additional restriction on the household's choice set that (c1, c2) > (0, 0) and that lim_{c→0} U′(c) = +∞. Show that c1 is independent of r if and only if U is the log function (up to a linear transformation). This will require that you solve a second order differential equation.
5. Find a utility function U such that savings (y − c1) decrease with the interest rate. That is, find a utility function such that c1 rises with the interest rate.
Problem 2
Nicholas Kaldor argued a long time ago that total labor income is a fairly stable share of GDP in most countries across time. Recently, Douglas Gollin has argued that this is also roughly true across countries (regardless of the stage of economic development). For our neoclassical economy to be consistent with these facts, we need that for all possible capital levels k > 0, F_2(k, 1)/F(k, 1) be a constant α ∈ (0, 1).
1. Assume that F satisfies all that a neoclassical production function must. Show that F_2(k, 1)/F(k, 1) = α for all k > 0 if and only if f(k) = Ak^{1−α} where A > 0. This will require that you solve a first order differential equation. Don't forget to explain why A > 0. [Note: This result is why most papers that deal with real business cycle or development questions in one-sector models use a Cobb-Douglas aggregate production function. People who use other functional forms have some explaining to do.]
2. Assume that F(k, n) = [αk^σ + (1 − α)n^σ]^{1/σ} where σ < 1. What happens to the labor share along the transition path when σ < 0, when σ > 0, when σ = 0 (use L'Hospital's rule in this case)?
Problem 3
Consider the Ramsey problem we studied in class with U(c) = log(c) for all c > 0, β = 0.95, δ = 0.1, F(k, n) = 10k^{0.33}n^{0.67} and k_0 = 1. Use Matlab to solve for the equilibrium path to steady state.
1. What is c0 (approximately)? (This is for me to check that your program is correct.)
2. Use Matlab to plot the phase diagram we drew in class and the saddlepath.
3. Plot (on one chart) the capital stock path over the first 50 periods for β = 0.9, β = 0.95, β = 0.975, holding everything else equal. Explain the effect of changes in β on the capital stock path.
Hints: You have a system of two first order difference equations (the Euler equation for consumption and the aggregate resource constraint) to simulate. Do this as follows:
1. Begin by computing the steady state values of k and c.
2. Guess c0
3. Since you know k0, the resource constraint and your c0 guess imply k1.
4. Given k1 the Euler equation for consumption implies c1.
5. Repeat the previous two steps to generate the first 50 values of {c_t, k_{t+1}} given your c_0 guess.
6. Update your c_0 guess and repeat the procedure until k(50) and c(50) are near their steady state values.
Problem 4
Consider a neoclassical economy populated by a representative household and a representative firm. Time is discrete and infinite. The household is endowed with one unit of labor in each period and a quantity a_0 > 0 of physical capital at date 0. The firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1).
1. Define the intensive form f of production function F. (Don't forget to specify a domain of definition.)
2. State the problem which the golden rule capital stock level k_GR must solve and state a condition that fully characterizes k_GR.
3. Assume that F(k, n) = k^α n^{1−α} for all (k, n) ≥ 0 where α ∈ (0, 1) and that the household consumes c_t = (1 − s)F(k_t, n_t) at all dates t where s ∈ (0, 1). Find a value s_GR (as a function of the model's parameters) such that the steady state capital stock strictly exceeds k_GR if and only if s > s_GR.
4. Assume now that the household is endowed with labor (1 + g)^t at date t where g > 0. (F is back to an arbitrary neoclassical production function.) Define a balanced growth path. Show that along any balanced growth path the marginal product of capital is constant.
Problem 5
Consider a neoclassical economy populated by a representative household and a representative firm. Time is discrete and infinite. The household is endowed with one unit of labor in each period and a quantity a_0 > 0 of physical capital at date 0. The firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1). Denote by f the intensive form of production function F.
1. Define what it means for a capital stock and consumption sequence {c_t, k_{t+1}}_{t=0}^{∞} to be feasible and what it means for {c_t, k_{t+1}}_{t=0}^{∞} to be dynamically inefficient given k_0 = a_0 > 0.
2. Show that any feasible sequence {c_t, k_{t+1}}_{t=0}^{∞} such that {k_{t+1}}_{t=0}^{∞} rises monotonically to a value k* > k_GR is dynamically inefficient.
Problem 6
Consider an economy where time is discrete and infinite and where each period a two-period lived household is born. For all t ≥ 0, the household born at date t is endowed with one unit of time in the first period of their life which they supply to a representative firm for wage w_t ≥ 0. They can save part of these earnings as physical capital which they rent to the firm in the second and last period of their life at rental rate R_{t+1}. All told, date t household chooses a non-negative consumption-saving profile (c^y_t, c^o_{t+1}) to solve:

max U(c^y_t) + βU(c^o_{t+1})

subject to

c^y_t + a_{t+1} = w_t
c^o_{t+1} = a_{t+1}R_{t+1}

where, as usual, we assume that U is strictly increasing, strictly concave on IR+ and continuously differentiable on IR++, while β < 1.
At date 0 there is a household in the second period of its life (the initial old) that is endowed with physical capital a_0. The initial old simply consume their income: c^o_0 = a_0R_0. Finally, the firm can transform inputs (k, n) ≥ (0, 0) into the consumption good according to a neoclassical production function F, and capital depreciates at rate δ ∈ (0, 1]. Denote by f the intensive form of production function F. At date t and given factor prices, the firm
chooses non-negative k_t and n_t to maximize

F(k_t, n_t) + (1 − δ)k_t − k_tR_t − n_tw_t.
1. Define a competitive equilibrium in this environment.
2. Show that if U is the log function (ignore as usual the fact that the log function is not defined at zero), if f(k) = k^α where α ∈ (0, 1), and if δ = 1, then the dynamic path of capital implied by this model has the same form as in the Solow model.
3. Under the assumptions of question 2, find a condition on β and α (hence on s) that implies that the capital stock converges to a value that exceeds the golden rule capital stock.
4. Under that condition and those same assumptions, describe in no more than 5 sentences a social contract that would raise the utility of all generations.
Problem 7
Consider the Ramsey problem with U(c) = log c for all c > 0, f(k) = k^α for all k > 0 and δ = 1.
1. Prove that the optimal solution is such that c_t = sf(k_t) for all t ≥ 0 where s ∈ (0, 1), and find s.
2. Find the steady state level of capital as a function of the model's parameters.
3. Prove analytically (no graphs) in this case that the optimal capital path converges to this steady state level regardless of initial conditions.
Problem 8
Consider a social planner who chooses a non-negative sequence {c_t, k_{t+1}}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t c_t^{1−ρ}/(1 − ρ)

subject to:

c_t + k_{t+1} = Ak_t for all t ≥ 0

given an initial level k_0 of the capital stock, and where β ∈ (0, 1), A > 0 and ρ > 0.
1. What condition do you need to impose on the problem's parameters to guarantee that it has a unique solution?
2. For the remainder of this problem, take as given that a unique solution exists. What additional condition do you need to impose on the problem's parameters to guarantee that the solution exhibits positive growth? What is the growth rate of output in this economy?
3. Prove that the solution to the social planner problem is such that c_t = s(Ak_t) for all t ≥ 0 where s ∈ (0, 1), and find s.
Problem 9
Consider a social planner who chooses a non-negative sequence {c_t, k_{t+1}}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(c_t)

subject to:

c_t + k_{t+1} = f(k_t) + (1 − δ)k_t for all t ≥ 0

given an initial level k_0 of the capital stock, where β and δ are both in (0, 1), where U : IR+ → IR is strictly concave, strictly increasing, continuous, bounded, continuously differentiable on IR++, satisfies an Inada condition at zero, and where f is the intensive form of a neoclassical production function.
1. Define what it means for an allocation to be feasible and for an allocation to be dynamically efficient in this context.
2. Prove that the solution to the social planner problem (take as given that it exists and is unique) is dynamically efficient.
3. Find an allocation that is dynamically efficient, but does not solve the social planner's problem.
Chapter 3
Intertemporal General Equilibrium
Models
This chapter introduces the two canonical models of modern macroeconomics and the two
standard interpretations for these models. It draws heavily on Kehoe (1989), a great survey
article by Tim Kehoe that contains a lot of what one should know after their first quarter
of Ph.D. macro. Tim makes the article available on his webpage.
3.1 Infinitely-lived consumer
As in chapter 2, time is discrete and infinite throughout this chapter. As before, index time
by t ∈ {0, 1, 2, . . .}. There is one consumption good1 but there is no production (hence no
firm, no labor and no physical capital.) There are h agents called consumers. Consumer
j ∈ {1, . . . , h} is endowed with quantity w^j_t ≥ 0 of the consumption good at date t ≥ 0.
We assume that all consumers' endowment sequences are bounded. They order non-negative
consumption sequences {c^j_t}_{t=0}^{∞} according to the following utility function:

∑_{t=0}^{∞} β^t_j U_j(c^j_t)
1 The paper studies the case with many consumption goods but little is lost by looking at the one-good case.
where β_j ∈ (0, 1) for all j, U_j is strictly concave, strictly increasing and continuous on IR+,
continuously differentiable on IR++, and lim_{c→0} U′_j(c) = +∞ so as not to have to worry about
corner solutions. Consumers can always choose to eat their endowment in each period. But
trade could improve their lot. There are several ways to introduce trade in this environment.
We will consider two, the canonical ones.
3.1.1 Market structures
In the Arrow-Debreu (AD) market structure, all trade takes place at date 0. At date 0,
consumers trade future contracts that specify all deliveries in all periods. Let pt be the
price of one unit of consumption good at date t in terms of an arbitrary unit of account,
or numeraire. That is, consumers can trade promises to deliver one unit of date t good for
promises to deliver p_t/p_{t′} unit(s) of date t′ good.
Therefore, consumer j can select any consumption sequence {c^j_t}_{t=0}^{∞} that satisfies:

∑_{t=0}^{∞} p_t c^j_t ≤ ∑_{t=0}^{∞} p_t w^j_t.
In effect, consumers sell (rights to) their endowment at date zero and then choose among
consumption profiles whose value at date 0 is less than that of their initial endowment.
Clearly, scaling all prices up or down by the same factor does not change any of the consumers'
choice sets, and we could for instance normalize p_0 to one so that all prices are in terms of the
date 0 consumption good, making the date 0 good the numeraire.
To summarize, in the AD market structure, consumer j solves:

max ∑_{t=0}^{∞} β^t_j U_j(c^j_t)

subject to:

∑_{t=0}^{∞} p_t c^j_t ≤ ∑_{t=0}^{∞} p_t w^j_t
c^j_t ≥ 0 for all t
First order conditions associated with this problem are

β^t_j U′_j(c^j_t) = λ_j p_t for all t, j   (3.1.1)

where λ_j > 0 is the Lagrange multiplier associated with consumer j's budget constraint.
An Arrow-Debreu equilibrium is a sequence {p_t}_{t=0}^{∞} of prices and consumption profiles
{c^j_t}_{t=0}^{∞} for each consumer such that:

1. For all j ∈ {1, . . . , h} and given prices, {c^j_t}_{t=0}^{∞} solves consumer j's problem.

2. The market for the consumption good clears in all periods: ∑_{j=1}^{h} c^j_t = ∑_{j=1}^{h} w^j_t for all t.
In the sequential market structure, trade takes place every period. At each date t,
households can trade the consumption good on spot markets and trade securities that, for
each unit of consumption good invested, yield quantity 1 + rt+1 > 0 of the consumption
good at date t + 1. Consumer j thus chooses a sequence {c^j_t, b^j_{t+1}}_{t=0}^{∞} of consumption and
investment to satisfy:

c^j_0 + b^j_1 = w^j_0
c^j_t + b^j_{t+1} = w^j_t + b^j_t(1 + r_t) for all t ≥ 1
c^j_t ≥ 0 for all t ≥ 0
Solve the first equation for b^j_1, plug into date 1's constraint and divide by (1 + r_1) to
obtain:

c^j_0 + c^j_1/(1 + r_1) = w^j_0 + w^j_1/(1 + r_1) − b^j_2/(1 + r_1)

Proceeding recursively, one obtains for all T > 1:

∑_{t=0}^{T} p_t c^j_t = ∑_{t=0}^{T} p_t w^j_t − b^j_{T+1}/Π_{i=1}^{T}(1 + r_i)

where p_t = (Π_{s=0}^{t}(1 + r_s))^{−1} with r_0 = 0. As long as lim_{T→∞} b^j_T/Π_{i=1}^{T−1}(1 + r_i) = 0, we obtain the
same type of budget constraint as before.
In the next subsection, we will in fact argue that the two market structures are equiv-
alent in the sense that equilibrium allocations under one market structure are equilibrium
allocations under the other.
While we allow consumers to buy and sell securities, it is necessary to put a bound on the
quantity of securities that they can sell. Otherwise, given any set of prices, consumers could
improve upon any consumption profile by borrowing additional amounts of the consumption
good in a given period and financing this purchase by selling as many securities as necessary
in future periods. These borrowing strategies are called Ponzi schemes and must be ruled
out for an equilibrium to exist.
In the finite T-period case, we'd simply require that b^j_{T+1} ≥ 0 for all j. Here, we will
impose the constraint that for all j:

lim_{t→∞} b^j_t / Π_{i=1}^{t−1}(1 + r_i) ≥ 0.
This constraint implies that at any point in time, the present value of any consumer’s debt
cannot exceed the difference between the present value of their remaining endowment, and
the present value of their remaining consumption path. Put another way, consumers must
eventually pay their debts. In particular, one easily shows that this constraint rules out
Ponzi schemes.2
To summarize, in the sequential market structure, consumer j solves:

max ∑_{t=0}^{∞} β^t_j U_j(c^j_t)

2 Note that in writing this no-Ponzi-scheme constraint, we are implicitly imposing the constraint that lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) be well defined. This entails no loss of generality since that limit must exist in equilibrium. At this stage one could require only that lim inf_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) ≥ 0.
subject to, for all t ≥ 0:

c^j_t + b^j_{t+1} = w^j_t + b^j_t(1 + r_t) where b^j_0 = 0
c^j_t ≥ 0
lim_{t→∞} b^j_t / Π_{i=1}^{t−1}(1 + r_i) ≥ 0
A sequential-market equilibrium is a sequence {r_t}_{t=0}^{∞} of interest rates, and, for all j =
1, . . . , h, consumption profiles {c^j_t}_{t=0}^{∞} and sequences of bond holdings {b^j_{t+1}}_{t=0}^{∞} such that:

1. For all j ∈ {1, . . . , h} and given prices, {c^j_t, b^j_{t+1}}_{t=0}^{∞} solves consumer j's problem.

2. The market for the consumption good clears in all periods: ∑_{j=1}^{h} c^j_t = ∑_{j=1}^{h} w^j_t for all t.
Do we need to specify that the bond market must clear? No. To see this, sum up the
budget constraints of all agents at date 0, use the resource constraint to argue that we must
have ∑_{j=1}^{h} b^j_1 = 0, and proceed recursively (you should do it). We can now formally state the
following equivalence result.
Proposition 1. Assume that {p_t}_{t=0}^{+∞} and {c^j_t}_{t=0,...,+∞, j=1,...,h} is an Arrow-Debreu equilibrium.
Then {r_t}_{t=0}^{+∞} and {c^j_t, b^j_{t+1}}_{t=0,...,+∞, j=1,...,h} is a sequential market equilibrium with

r_0 = 0
r_t = p_{t−1}/p_t − 1 for all t > 0
b^j_{t+1} = w^j_t + b^j_t(1 + r_t) − c^j_t for all j and t ≥ 0

with b^j_0 = 0 for all j.
Conversely, assume that {r_t}_{t=0}^{+∞} and {c^j_t, b^j_{t+1}}_{t=0,...,+∞, j=1,...,h} is a sequential market equilibrium.
Then {p_t}_{t=0}^{+∞} and {c^j_t}_{t=0,...,+∞, j=1,...,h} is an Arrow-Debreu equilibrium with p_t =
(Π_{s=0}^{t}(1 + r_s))^{−1} for all t ≥ 0 where r_0 = 0.
Proof. This result follows trivially from observing that under the constructed prices, the
consumers' choice sets are unchanged, provided we can deal with one complication. Is it the
case that lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) = 0 in the consumer's problem for all j in a sequential market
equilibrium? To see that it is, let λ^j_t be the shadow price (Lagrange multiplier) associated
with consumer j's date t constraint. Then we must have for all t ≥ 0 and j ∈ {1, . . . , h}:

λ^j_t = β^t_j U′_j(c^j_t)   (3.1.2)
λ^j_t = λ^j_{t+1}(1 + r_{t+1})   (3.1.3)
lim sup_{t→∞} λ^j_t b^j_{t+1} ≤ 0   (3.1.4)

The last condition is the proper version of the transversality condition in this context and
says that the consumer cannot overaccumulate assets along an optimal path. The operator
lim sup applied to a sequence takes the supremum of all limits of subsequences of the original
sequence. It is always defined although it could be plus or minus infinity.3
Now, back to the proof. Consumer j's first order conditions imply that λ^j_t = λ^j_0 (Π_{i=1}^{t}(1 + r_i))^{−1}
for all t, so that the transversality condition may be rewritten as lim sup_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) ≤ 0.
Combined with the no-Ponzi constraint, this implies lim_{t→∞} b^j_t/Π_{i=1}^{t−1}(1 + r_i) = 0, as needed.
Therefore, the two market structures are equivalent in the sense that a consumption
allocation is part of an Arrow-Debreu equilibrium if and only if it is part of a sequential
market equilibrium.
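The price construction in Proposition 1 is easy to verify numerically. The sketch below uses illustrative interest-rate, endowment, and consumption paths (none of them from the notes): it builds Arrow-Debreu prices from the interest rates, recovers bond holdings from the sequential budget constraint, and checks that the date-0 Arrow-Debreu budget holds up to the discounted terminal bond position.

```python
# Numerical check of Proposition 1 (illustrative paths, not from the notes).
T = 400
r = [0.0] + [0.05] * T                 # interest rates, with r_0 = 0 by convention
w = [1.0] * (T + 1)                    # constant endowment stream
# A feasible plan: save 0.1 in even periods, consume it plus interest next period
c = [0.9 if t % 2 == 0 else 1.0 + 0.1 * 1.05 for t in range(T + 1)]

# Arrow-Debreu prices p_t = (prod_{s<=t} (1 + r_s))^{-1}
p, cum = [], 1.0
for t in range(T + 1):
    cum *= 1.0 + r[t]
    p.append(1.0 / cum)

# Bond holdings implied by c_t + b_{t+1} = w_t + b_t (1 + r_t), b_0 = 0
b = [0.0]
for t in range(T + 1):
    b.append(w[t] + b[t] * (1.0 + r[t]) - c[t])

# Telescoping the budget constraints gives
#   sum_t p_t (c_t - w_t) = -p_T * b_{T+1},
# which vanishes as T grows, so the Arrow-Debreu budget holds with equality.
ad_gap = sum(p[t] * (c[t] - w[t]) for t in range(T + 1))
print(ad_gap)                          # numerically ~0
```

The key step is exactly the telescoping argument used in the text: multiplying each sequential budget constraint by $p_t$ and summing leaves only the discounted terminal bond position.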
3.1.2 Welfare theorems
A consumption allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is called Pareto optimal if it is feasible (it satisfies the aggregate resource constraint in each period) and no other feasible allocation $\{\hat{c}^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ satisfies:

$$\sum_{t=0}^{\infty} \beta_j^t U_j(\hat{c}^j_t) \ge \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t) \quad \text{for all } j$$

with at least one strict inequality. In other words, it is not possible to increase the welfare of one consumer without reducing that of another consumer.
3 See Michel (1990) for the gruesome details. In the Ramsey problem, the non-negativity of the capital stock enables us to use simple limits instead of this more complicated expression.

In finite dimensional spaces, competitive equilibria are Pareto optimal under very weak assumptions, a result known as the first welfare theorem (look it up). In the infinitely-lived consumer model, the first welfare theorem holds under equally general conditions:
Proposition 2. Assume that $\{p_t\}_{t=0}^{+\infty}$ and $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is an Arrow-Debreu equilibrium. Then $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is Pareto optimal.
Proof. Assume that $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is not Pareto optimal and let $\{\hat{c}^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ be a feasible allocation that dominates it. Then $\{\hat{c}^j_t\}$ must cost each consumer at least as much as his endowment is worth (why?), and strictly more for at least one consumer (why?). That is,

$$\sum_{t=0}^{\infty} p_t \hat{c}^j_t \ge \sum_{t=0}^{\infty} p_t w^j_t \quad \text{for all } j$$

with at least one strict inequality. Noting that in equilibrium we must have $\sum_{t=0}^{\infty} p_t w^j_t < \infty$ for all $j$ (why?), this implies (summing over $j$) that

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t \hat{c}^j_t > \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t,$$

hence

$$\sum_{t=0}^{\infty} p_t \left( \sum_{j=1}^{h} \hat{c}^j_t \right) > \sum_{t=0}^{\infty} p_t \left( \sum_{j=1}^{h} w^j_t \right).$$

But this cannot be, since feasibility of the alternative allocation means that

$$\sum_{j=1}^{h} \hat{c}^j_t \le \sum_{j=1}^{h} w^j_t \quad \text{for all } t \ge 0.$$

This contradiction completes the proof.
This result, in turn, implies that competitive equilibria must solve a planner’s problem
(another standard micro result that works in this infinitely-lived case.)
Proposition 3. An allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is Pareto optimal if and only if there exist (Pareto) weights $\{\alpha_j : j = 1,\dots,h\}$ such that $\sum_{j=1}^{h} \alpha_j = 1$ and $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves:

$$\max \sum_{j=1}^{h} \alpha_j \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$$
subject to:

$$\sum_{j=1}^{h} c^j_t = \sum_{j=1}^{h} w^j_t \quad \text{for all } t \ge 0.$$
Proof. Assume that $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves the planner's problem for some weights $\{\alpha_j : j = 1,\dots,h\}$ such that $\sum_{j=1}^{h} \alpha_j = 1$. Note that if $j$ is such that $\alpha_j = 0$, then $c^j_t = 0$ for all $t$, since otherwise resources allocated to consumer $j$ could be redirected to a consumer with strictly positive weight. Assume, by way of contradiction, that $\{c^j_t\}$ is not Pareto optimal. Then another feasible allocation exists that raises everybody's utility, strictly for some $j$. If that $j$ is a consumer with zero weight, redirect all of his resources to a consumer with positive weight. We have then raised $\sum_{j=1}^{h} \alpha_j \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$ strictly, which contradicts the fact that $\{c^j_t\}$ solved the planner's problem.

The fun part is the converse. For that we need a bit more machinery, which is introduced in section 3.3. Completing the proof is the last part of homework 2.
We have established that competitive equilibrium allocations are optimal. We will now
establish a converse of sorts to this result by showing that all Pareto optimal allocations are
competitive equilibria. We will do so using standard calculus tools for speed. Chapter
15 in Stokey, Lucas and Prescott provides a proof that does not require any differentiability
assumption.
Let $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ be Pareto optimal. We have established that such an allocation solves the planner's problem for some set $\{\alpha_j : j = 1,\dots,h\}$ of weights. Therefore, it must satisfy the following set of first-order conditions:

$$\alpha_j \beta_j^t U_j'(c^j_t) = \pi_t \quad \text{for all } t, j \qquad (3.1.5)$$

for a set $\{\pi_t : t = 0,\dots,\infty\}$ of Lagrange multipliers. But conditions (3.1.1) are the same as conditions (3.1.5), with $\pi_t$ playing the role of prices and $\frac{1}{\alpha_j}$ playing the role of consumer $j$'s multiplier. So a solution to the social planner's problem solves each consumer's problem if and only if it satisfies the consumer's budget constraint at these candidate prices. In general, that is not the case (consider, for instance, what happens if we set $\alpha_1 = 0$ in the planner's problem). But with a bit of redistribution, we can support any Pareto optimal allocation as a competitive
equilibrium.
Here are two redistribution schemes that work. First, make the desired allocation the new endowment. That is, for all $j$, impose a sequence $\{\tau^j_t\}_{t=0}^{\infty}$ of good transfers on consumer $j$ such that $w^j_t + \tau^j_t = c^j_t$. At the new endowments $\{w^j_t + \tau^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$, the allocation $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ is just affordable for each consumer. Since it satisfies all other first-order conditions, it solves the consumer's problem. Since the allocation is Pareto optimal, it is feasible. So we have constructed a competitive equilibrium.
There is a simpler set of transfers that also works and need only take place at date 0. For all $j$, let

$$t^j = \sum_{t=0}^{\infty} \pi_t (c^j_t - w^j_t)$$

where the $\pi_t$'s are the Lagrange multipliers associated with the social planner's problem which $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ solves. Assume that consumer $j$ receives transfer $t^j$ in terms of the numeraire at date 0. Their new problem is:

$$\max \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$$

subject to:

$$\sum_{t=0}^{\infty} \pi_t c^j_t \le t^j + \sum_{t=0}^{\infty} \pi_t w^j_t$$

$$c^j_t \ge 0 \quad \text{for all } t$$
Because $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$ satisfies the amended budget constraint of every consumer by construction and satisfies all other first-order conditions, it solves each consumer's amended problem. Hence we have the following result, a version of the Second Welfare Theorem:
Proposition 4. Every Pareto Optimal allocation can be supported as an Arrow-Debreu equi-
librium with transfers.
3.1.3 An example
The tools we have developed can simplify the search for competitive equilibria. Rather than solve all consumers' problems and look for prices that clear markets, we can solve a planner's problem instead. We know that competitive equilibria must be solutions to one such problem. But which one? The one whose solution requires no transfers. This approach was developed by Negishi (1960).
We will illustrate the idea in the context of a simple example. Assume that $h = 2$, that $U_j = \log$ and $\beta_j = \beta \in (0,1)$ for $j = 1,2$ (insert the same remarks as always about the fact that $\log$ is not defined at zero), and that $w^1 = (1, 0, 1, 0, 1, \dots)$ while $w^2 = (0, 1, 0, 1, \dots)$. Competitive equilibria must solve, for some $\alpha_1 \in (0,1)$:

$$\max\; \alpha_1 \sum_{t=0}^{\infty} \beta^t \log(c^1_t) + (1 - \alpha_1) \sum_{t=0}^{\infty} \beta^t \log(c^2_t)$$

subject to:

$$c^1_t + c^2_t = 1 \quad \text{for all } t \ge 0.$$
First-order conditions are, for all $t \ge 0$:

$$\frac{\alpha_1 \beta^t}{c^1_t} = \frac{\alpha_2 \beta^t}{c^2_t} = \pi_t$$

where $\pi_t$ is the Lagrange multiplier associated with date $t$'s resource constraint. This implies, for all $t \ge 0$:

$$c^1_t = \frac{\alpha_1}{1 - \alpha_1}\, c^2_t,$$

or, given the resource constraint in each period, $c^1_t = \alpha_1$ and $c^2_t = 1 - \alpha_1$. Furthermore, "prices" are $\pi_t = \frac{\alpha_1 \beta^t}{c^1_t} = \beta^t$ for all $t \ge 0$.
Here we could take a shortcut. We now know that competitive allocations, like all optimal allocations, give each consumer constant consumption over time. We also know that prices must be $\pi_t = \beta^t$. Let $c^1$ be agent 1's constant consumption. The budget constraint implies that

$$\sum_{t=0}^{+\infty} \beta^t c^1 = \frac{c^1}{1-\beta} = \sum_{t=0}^{+\infty} \beta^t w^1_t = \frac{1}{1-\beta^2},$$

so that $c^1 = \frac{1-\beta}{1-\beta^2} = \frac{1}{1+\beta}$. Similarly, $c^2 = \frac{\beta(1-\beta)}{1-\beta^2} = \frac{\beta}{1+\beta}$ is consumer 2's constant consumption level.
But, to illustrate Negishi’s method let us also take the long route. Given Pareto weights
(α1, 1 − α1), transfers needed to implement the optimal allocation are:
t1 =+∞∑
t=0
βt(α1 − w1t )
=α1
1 − β− 1
1 − β2
and,
t2 =+∞∑
t=0
βt(1 − α1 − w2t )
=1 − α1
1 − β− β
1 − β2
Competitive equilibria correspond to values for α1 that make those transfers zero. Algebra
shows that the only value of α1 that meets this requirement is: α1 = 1−β1−β2 = 1
1+β. And, of
course, we arrive at the same answer.
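The Negishi calculation above is easy to verify numerically. The sketch below uses the example's own endowment pattern and multipliers $\pi_t = \beta^t$; the value $\beta = 0.9$ and the truncation horizon are illustrative choices.

```python
# Verify that the transfer t^1(alpha_1) vanishes exactly at alpha_1 = 1/(1+beta).
beta = 0.9   # an illustrative discount factor
T = 500      # truncation horizon for the infinite sums

w1 = [1.0 if t % 2 == 0 else 0.0 for t in range(T)]  # agent 1's endowments
pi = [beta ** t for t in range(T)]                   # planner multipliers / prices

def transfer1(alpha1):
    # t^1 = sum_t pi_t (alpha_1 - w^1_t): transfer required by agent 1
    return sum(pi[t] * (alpha1 - w1[t]) for t in range(T))

alpha_star = 1.0 / (1.0 + beta)
print(transfer1(alpha_star))   # ~0: no transfer needed at alpha_1 = 1/(1+beta)
```

Since `transfer1` is strictly increasing in its argument, $\alpha_1 = \frac{1}{1+\beta}$ is indeed the only weight requiring no transfers.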
3.1.4 Money
Fiat, unbacked money cannot have any value in the infinitely-lived agent model. To see this, assume that we endow each agent $j = 1,\dots,h$ with quantity $m^j \ge 0$ of unbacked money. Consumer $j$'s budget constraint becomes

$$\sum_{t=0}^{\infty} p_t c^j_t \le m^j + \sum_{t=0}^{\infty} p_t w^j_t.$$

Summing over all consumers gives:

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t c^j_t = \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t + \sum_{j=1}^{h} m^j.$$

But in equilibrium the fact that the resource constraint holds with equality in all periods implies that

$$\sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t c^j_t = \sum_{j=1}^{h} \sum_{t=0}^{\infty} p_t w^j_t,$$

hence that

$$\sum_{j=1}^{h} m^j = 0.$$
To summarize, we have established that competitive equilibria are always Pareto optimal in the infinitely-lived agent model and that there is no room for fiat money. Furthermore, standard arguments show that competitive equilibria are generically finite in number (see Kehoe, 1989, for more on this point). All these properties may be violated in overlapping generations models, to which we now turn.
3.2 Overlapping generations
Assume that each period a consumer is born who lives for exactly two periods. The consumer born at date $t$ is endowed with $w_1$ in the first period of his life and $w_2$ in the second. There is also an initial generation, alive at date 0, that lives for one period and has endowment $w_2$ of the consumption good and some amount $m$ of unbacked money.

Let $p_t$ denote the price of date $t$ consumption in units of unbacked money. The initial old eat the largest amount of the consumption good compatible with their budget constraint:

$$p_0 c^{-1}_0 = p_0 w_2 + m.$$

As for consumers born at date $t \ge 0$, denote by $c^t_s$ their consumption in period $s = t, t+1$. They solve

$$\max\; u(c^t_t, c^t_{t+1})$$

subject to

$$p_t c^t_t + p_{t+1} c^t_{t+1} = p_t w_1 + p_{t+1} w_2,$$

where $u$ satisfies the same assumptions as in the previous section.
An Arrow-Debreu equilibrium is a sequence $\{p_t\}_{t=0}^{\infty}$ of prices, an initial money level $m$, a consumption level $c^{-1}_0$ for the initial generation, and consumption profiles $\{(c^t_t, c^t_{t+1})\}_{t=0}^{\infty}$ such that, given prices:

1. $c^{-1}_0$ solves the initial generation's problem (i.e. $c^{-1}_0 = w_2 + \frac{m}{p_0}$);

2. for all $t \ge 0$, $(c^t_t, c^t_{t+1})$ solves generation $t$'s problem;

3. the market for goods clears for all $t \ge 0$: $c^t_t + c^{t-1}_t = w_1 + w_2$.
As before, we could assume instead that trading takes place in a sequential fashion. The date $t$ consumer would then face the following two constraints:

$$c^t_t + m_t = w_1$$

$$c^t_{t+1} = w_2 + m_t(1 + r_{t+1})$$

where $(1 + r_{t+1}) = \frac{p_t}{p_{t+1}}$ and $m_t$ denotes security (or money) holdings. In other words, $m_t$ is a claim that delivers $\frac{p_t}{p_{t+1}}$ worth of the consumption good at date $t+1$ per unit invested at date $t$. Note that these claims are unbacked. Agents are willing to hold positive amounts of those claims only provided they know that they will be able to exchange them for positive quantities of the consumption good in the next period (by trading with agents as yet unborn). These holdings are, in other words, fiat money holdings.4
A sequential market equilibrium is a sequence $\{r_t\}_{t=0}^{\infty}$ of interest rates, an initial money level $m$, a consumption level $c^{-1}_0$ for the initial generation, consumption profiles $\{(c^t_t, c^t_{t+1})\}_{t=0}^{\infty}$, and money holdings $\{m_t\}_{t=0}^{\infty}$ such that, given interest rates:

1. $c^{-1}_0$ solves the initial generation's problem (i.e. $c^{-1}_0 = w_2 + m$);

2. for all $t \ge 0$, $(c^t_t, c^t_{t+1}, m_t)$ solves generation $t$'s (sequential) problem;

3. the market for goods clears for all $t \ge 0$: $c^t_t + c^{t-1}_t = w_1 + w_2$.
The same equivalence result holds as in the infinitely-lived agent case. Henceforth, we
will work with the Arrow-Debreu trading structure for concreteness.
4 Interpretations of negative unbacked security holdings are a bit more convoluted. See Kehoe (1989) for a discussion.
Because $u$ is continuous and strictly concave, a unique solution to the problem solved by date $t$ agents exists (see homework 1). Let $y(p_t, p_{t+1}) = c^t_t(p_t, p_{t+1}) - w_1$ be the excess demand of agents born at date $t$ when young, while $z(p_t, p_{t+1}) = c^t_{t+1}(p_t, p_{t+1}) - w_2$ is their excess demand when old. The initial old have excess demand $z(p_0, m) = \frac{m}{p_0}$ at date 0.

Equilibrium requires that $y(p_0, p_1) + z(p_0, m) = 0$ and that, for all $t \ge 1$, $y(p_t, p_{t+1}) + z(p_{t-1}, p_t) = 0$. It is also easy to see that excess demands are homogeneous of degree zero in prices: they depend only on the price ratio or, in other words, on the interest rate. Furthermore, the fact that all consumers exhaust their budget constraints tells us that, for all $t \ge 0$,

$$p_t\, y(p_t, p_{t+1}) + p_{t+1}\, z(p_t, p_{t+1}) = 0,$$

which is Walras' law.

Computing equilibria can be done by solving the system of market-clearing conditions forward. Fix $\frac{m}{p_0}$. This gives us $z(p_0, m)$, hence, in turn, $y(p_0, p_1)$ from market clearing. Does that tell us uniquely what $p_1$ must be (recall that $p_0 \equiv 1$ by convention)? It does provided $y$ is monotonic in $\frac{p_1}{p_0}$, i.e. provided the gross substitutability assumption is met. (Otherwise, we have several ways to proceed.)

Assuming that gross substitutability holds, we can then get $z(p_0, p_1)$ uniquely, hence, by market clearing, $y(p_1, p_2)$. Proceeding in this fashion gives us a full path for excess demands, hence for prices. (The graphical version of this is figure 16.2 in Kehoe, 1989.)
Autarky is always an equilibrium. To see this, start at $\frac{m}{p_0} = 0$, which, recursively, implies that all excess demands are zero. Under gross substitutability, the constant price ratio that supports that equilibrium is unique. In general, there is another steady state that may entail non-zero values for $\frac{m}{p_0}$, even negative values, in which case it is probably best to think of $-\frac{m}{p_0}$ as a tax on the initial old.

We will treat the case where agents want to transfer resources from the first to the second period of life (the case where supporting autarky in equilibrium involves negative interest rates, that is, the case where the slope of the offer curve looks as drawn in figure 16.2 in Kehoe, 1989).
Assume then that the marginal rate of substitution between consumption when young and consumption when old is less than 1 at the endowment point. That is, assume that $\frac{u_1(w_1, w_2)}{u_2(w_1, w_2)} < 1$. Assume further that the gross substitutability assumption holds, that is, that $c^t_t(p_t, p_{t+1})$ rises with $\frac{p_{t+1}}{p_t}$.

Then there are two steady states: autarky, which corresponds to $\frac{m}{p_0} = 0$, and a steady state with constant prices, i.e. a gross interest rate of one, which entails $\frac{m}{p_0} = z(1,1)$. We will refer to this second steady state as the stationary monetary equilibrium. There is also a continuum of equilibria, one associated with each possible value of $\frac{m}{p_0}$ in $(0, z(1,1))$. In this continuum, the only Pareto optimal equilibrium is the stationary monetary equilibrium. Indeed, all other equilibria involve negative interest rates at all dates. But since one-for-one transfers between generations can always be arranged, negative interest rates are suboptimal for all generations born at dates $t \ge 1$. As for the initial old, a gross interest rate of one corresponds to the highest possible transfer to them.
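Continuing with the assumed log-utility example and endowments $(w_1, w_2) = (2, 1)$, the stationary monetary equilibrium can be computed in closed form: at constant prices the consumer smooths consumption perfectly, and $\frac{m}{p_0}$ equals $z(1,1)$.

```python
# Stationary monetary equilibrium under assumed log utility, (w1, w2) = (2, 1).
w1, w2 = 2.0, 1.0

def demands(q):
    # log utility: spend half of lifetime income w1 + q*w2 in each period
    income = w1 + q * w2
    return income / 2.0, income / (2.0 * q)   # consumption when young, when old

c_young, c_old = demands(1.0)   # q = 1: the stationary monetary equilibrium
m_over_p0 = c_old - w2          # = z(1,1), the real value of money held when old
print(c_young, c_old, m_over_p0)   # perfect smoothing at 1.5 each, m/p0 = 0.5
assert c_young + c_old == w1 + w2  # the goods market clears every period
```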
Another way to see the suboptimality of all equilibria except the monetary steady state is to apply the general criterion of Balasko and Shell (1980). According to that criterion, equilibria are optimal if and only if

$$\sum_{t=1}^{\infty} \frac{1}{p_t} = +\infty.$$

All equilibria other than the monetary steady state converge to a steady state such that $\frac{p_t}{p_{t+1}} = 1 + r < 1$, where $r$ is the autarkic interest rate, a negative number by assumption. So, as $t$ grows large, $p_t$ behaves like a geometric sequence of modulus greater than 1, hence $\sum_{t=1}^{\infty} \frac{1}{p_t} < +\infty$.
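The criterion is easy to see numerically. The sketch below uses the autarkic price ratio $q = 2$ from the assumed $(w_1, w_2) = (2, 1)$ log-utility example: prices explode geometrically, so the partial sums of $\sum_t 1/p_t$ stay bounded.

```python
# Partial sums of sum_t 1/p_t along a path with p_{t+1}/p_t = 2 (autarkic
# ratio of the assumed (2, 1) example): the series converges, so by the
# Balasko-Shell criterion such an equilibrium is not Pareto optimal.
q = 2.0                      # price growth factor, p_0 = 1
p, partial = 1.0, 0.0
for t in range(1, 200):
    p *= q                   # prices grow geometrically
    partial += 1.0 / p
print(partial)               # bounded above by 1, since sum_{t>=1} 2^{-t} = 1

# By contrast, at the monetary steady state p_t = 1 for all t, so the partial
# sums grow without bound and the optimality criterion is satisfied.
```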
To summarize, in overlapping generations models, equilibria need not be optimal, there is room for money, and many equilibria usually exist.

This effectively completes the main material of this chapter. The following section provides the result we need to complete the proof of proposition 3.
3.3 Separating hyperplane theorem
In the plane, drawing a few pictures should convince you that one can always draw a straight line between two disjoint convex sets. You should also be able to convince yourself that convexity is necessary. This remains true in higher-dimensional real linear spaces. To see this, we need a couple of definitions.
Let $X$ be a real linear space. A real linear functional on $X$ is a function $\varphi : X \to \mathbb{R}$ that satisfies, for all $(x_1, x_2) \in X^2$ and $(\alpha, \beta) \in \mathbb{R}^2$:

$$\varphi(\alpha x_1 + \beta x_2) = \alpha\, \varphi(x_1) + \beta\, \varphi(x_2).$$

A hyperplane in $X$ is a level set of a linear functional, i.e. a set of points $x \in X$ satisfying $\varphi(x) = c$, where $c$ is a real number. In the plane, hyperplanes are straight lines.

When $X$ is equipped with a metric $d$, we call $\varphi$ a continuous linear functional if it is continuous in that metric. In Euclidean spaces, linear functionals are always continuous. But that is not true in general. (Now would be a good time to start reading chapter 15 in Stokey, Lucas and Prescott. It is an intimidating chapter, but it is nothing but a generalization of the results we have established in this chapter to arbitrary spaces.)
The theorem we need is known as the Hahn-Banach theorem, or the separating hyperplane
theorem.
Theorem 4. (Hahn-Banach) Let $S$ be a linear space equipped with a metric. Let $A, B \subset S$ be convex sets such that $A$ does not contain any interior point of $B$. Assume either that $S$ is finite dimensional or that $B$ has an interior point. Then there exist a continuous linear functional $\varphi$, not identically zero, and a constant $c$ such that:

$$\varphi(x) \le c \le \varphi(y) \quad \text{for all } x \in A \text{ and all } y \in B.$$

Note that if $S = \mathbb{R}^n$ then $\varphi(x) = \sum_{i=1}^{n} a_i x_i$ for some non-zero vector $a$ of real numbers.
We can now prove proposition 3. Let $F$ be the set of all feasible allocations $\{c^j_t\}_{t=0,\dots,+\infty,\,j=1,\dots,h}$, and for $c \in F$ and $j = 1,\dots,h$ write $V_j(c) = \sum_{t=0}^{\infty} \beta_j^t U_j(c^j_t)$. The utility feasibility set is:

$$A = \{V \in \mathbb{R}^h : \exists c \in F \text{ such that } V_j \le V_j(c) \text{ for all } j\}.$$

That is, $A$ is the set of utility levels that the planner can implement. One easily shows that $A$ is convex (do it).

Now let $c^*$ be Pareto optimal and let $V^* = V(c^*)$. Define $B = \{V \in \mathbb{R}^h : V \ge V^* \text{ and } V \ne V^*\}$. $B$ is the set of utility assignments that give every consumer at least as much utility as $V^*$, strictly more for at least one consumer. Again, $B$ is convex. Also, $A$ does not contain any interior point of $B$, since $c^*$ is Pareto optimal. So one can apply the Hahn-Banach theorem to the sets $A$ and $B$, and this turns out to be exactly what we need to complete the proof of the proposition, as you will show in homework 2.
3.4 Problems
Problem 1
Consider a pure-endowment economy with 2 infinitely-lived agents where, for $j = 1, 2$, $\beta_j = \beta \in (0,1)$ and $U_j = \log$. Endowments are $w^1 = (1, 3, 1, 3, 1, \dots)$ and $w^2 = (3, 1, 3, 1, \dots)$.
1. Define an Arrow-Debreu equilibrium in this environment.
2. Show that a unique Arrow-Debreu equilibrium exists (no need to invoke Weierstrass' theorem here) and find it.
3. Define a Pareto optimal allocation.
4. Show that the allocation $(c^1_t, c^2_t) = (2, 2)$ for all $t$ is Pareto optimal. (Use proposition 3 or a direct argument.)
5. Explain how to implement that allocation as an Arrow-Debreu equilibrium with transfers.
Problem 2
Consider an overlapping generations economy where the representative consumer has utility function $u(c_1, c_2) = \log c_1 + \log c_2$ for all $(c_1, c_2) > (0, 0)$ and endowment $(w_1, w_2) > (0, 0)$. The initial old are endowed with quantity $w_2$ of the good and some amount $m$ of the numeraire, and have strictly monotonic preferences.
1. Define an Arrow-Debreu equilibrium in this economy.
2. Provide a condition under which an equilibrium exists where money has positive value.
3. There are two equilibria with constant inflation under that condition. What are they? Are they both Pareto optimal? Explain.
4. Calculate excess demand functions.
5. Assume that $(w_1, w_2) = (2, 1)$. Use Matlab to draw the offer curve and draw the equilibrium path of both excess demands when the excess demand of the initial old is 0.4.
Problem 3
1. Write out the necessity part of proposition 3 in detail. (For instance, when you invoke the Hahn-Banach theorem, check that its conditions are met.)
2. Prove that as long as all endowment sequences are bounded, the social planner's problem defined in proposition 3 has a solution for any possible set of weights.
Problem 4
Consider an economy where time is discrete and where each period a two-period-lived consumer is born. Consumers born at date $t \ge 0$ are endowed with quantity $w_1 > 0$ of the consumption good when young and $w_2 > 0$ when old. They order consumption profiles $(c^t_t, c^t_{t+1})$ according to a utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ that is continuously differentiable on $\mathbb{R}_{++}$, strictly concave, and strictly increasing in both arguments.

At date 0 there is an initial old generation that is endowed with quantity $w_2$ of the good and $m \ge 0$ of money. They want to consume as much as possible.

1. Define an Arrow-Debreu equilibrium.

2. Assume that $\frac{u_1(w_1, w_2)}{u_2(w_1, w_2)} > 1$. Show that the Arrow-Debreu equilibrium where $(c^t_t, c^t_{t+1}) = (w_1, w_2)$ for all $t \ge 0$ and $m = 0$ is Pareto optimal.

3. Assume that $u(c^t_t, c^t_{t+1}) = \sqrt{c^t_t} + \frac{1}{2}\sqrt{c^t_{t+1}}$ for all $t \ge 0$ and that $(w_1, w_2) = (9, 1)$. If there is a stationary monetary equilibrium in this economy, find it. If there is no stationary monetary equilibrium, explain why.
Problem 5
Consider a discrete-time environment populated by two infinitely-lived consumers. There is one consumption good and no production. Consumer $j \in \{1, 2\}$ is endowed with quantity $w^j_t \ge 0$ of the consumption good at date $t$ and assigns consumption profiles $\{c^j_t\}_{t=0}^{+\infty}$ utility $\sum_{t=0}^{+\infty} \beta_j^t U_j(c^j_t)$, where $\beta_j \in (0,1)$ and $U_j : [0, +\infty) \to \mathbb{R}$ is strictly concave, continuously differentiable on $\mathbb{R}_{++}$, and strictly increasing on $\mathbb{R}_+$ with $\lim_{c \to 0} U_j'(c) = +\infty$.

1. Assume that $\beta_1 = \beta_2 = \beta$ and that $w^1 = (0, 2, 0, 2, \dots)$ while $w^2 = (2, 0, 2, 0, \dots)$. Show that in any Arrow-Debreu equilibrium both consumers choose constant consumption profiles and that prices satisfy $p_{t+1} = \beta p_t$ for all $t \ge 0$. Using that information, find the unique Arrow-Debreu equilibrium in this case.

2. Assume now that $\beta_1 < \beta_2$ and that $w^1_t = w^2_t = 0.5$ for all $t \ge 0$. Show that any Arrow-Debreu equilibrium is such that $\lim_{t \to +\infty} c^1_t = 0$ while $\lim_{t \to +\infty} c^2_t = 1$.
Chapter 4
Deterministic Dynamic Programming
4.1 Principle of optimality
Many deterministic dynamic optimization problems in economics are special cases of the following general class of stationary control problems. At date $t$ there is a vector $x_t \in X$ that describes the state of the system, where $X \subset \mathbb{R}^n$ for some integer $n$. An agent can select a vector of actions $a_t$ drawn from a set $\Gamma(x_t) \subset Y \subset \mathbb{R}^m$, $m \in \mathbb{N}$, that depends on the state of the system, where we assume that $\Gamma(x_t) \ne \emptyset$ for all $x_t \in X$.

Depending on the current state and the action selected, the state in period $t+1$ is given by $g(x_t, a_t)$. The function $g$ is called the law of motion.

In period $t$, given state $x_t \in X$ and action $a_t \in Y$, the agent earns reward $R(x_t, a_t)$. He orders paths $\{x_t, a_t\}_{t=0}^{+\infty}$ of states and actions according to the utility function $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$, where $\beta \in (0,1)$. Given an initial value $x_0$ of the state, the agent solves:

$$\sup \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$$

subject to:

$$a_t \in \Gamma(x_t) \quad \text{for all } t \ge 0$$

$$x_{t+1} = g(x_t, a_t) \quad \text{for all } t \ge 0$$
This problem is called stationary because the set $\{X, Y, \Gamma, g, \beta, R\}$ of objects that fully defines it does not depend on time. As in Stokey, Lucas and Prescott (SLP), we will call this version of the problem the sequential problem and refer to it as problem (SP). We will also assume throughout this chapter that, for all $x_0 \in X$ and all sequences that satisfy the problem's constraints, $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$ is well defined.
We replace for the moment the standard max operator with the sup operator to allow for the case where a solution does not exist.1 For any set $X$ of real numbers, $\sup X \in [-\infty, +\infty]$ is $X$'s least upper bound, i.e. the unique value such that:

1. $x \in X \implies x \le \sup X$, and

2. $y < \sup X \implies \exists x \in X$ such that $x > y$.

All sets of real numbers have a least upper bound in $[-\infty, +\infty]$. Furthermore, the real numbers satisfy the least upper bound property: any bounded set of real numbers has a finite least upper bound. This property, in fact, characterizes the real numbers, which can be constructed as the set of all least upper bounds of sets of rational numbers.
In principle, the supremum in (SP) could be +∞ or −∞. For simplicity, we will rule
out this possibility and concentrate our attention on the bounded returns case where R is
bounded above and below. In most cases that economists deal with, R is either explicitly
or implicitly bounded. SLP treat the general case and you should read all the details there.
Note, importantly, that assuming that R is bounded does not suffice to guarantee that a
solution to (SP) exists. As you know by now, we need continuity and compactness conditions.
More on that soon.
Define $\Pi(x_0)$ to be the set of action sequences that are feasible for the agent given initial state $x_0 \in X$. That is,

$$\Pi(x_0) = \left\{ \{a_t\}_{t=0}^{+\infty} : \exists \{x_t\}_{t=1}^{+\infty} \text{ such that, for all } t \ge 0,\ a_t \in \Gamma(x_t) \text{ and } x_{t+1} = g(x_t, a_t) \right\}.$$

1 Recall from chapter 2 that a solution to (SP) exists, for instance, if $R$ is continuous and bounded and if $\Gamma(x_t)$ is compact for all $x_t \in X$. The assumption that $R$ is bounded can be significantly relaxed in many cases. It is sufficient, for instance, to assume that $\Gamma$ does not allow the state to grow so fast that the reward grows faster than $\beta^{-1}$ forever (see proposition 1 in Jones and Manuelli, 1990).
An element of $\Pi(x_0)$ is called a feasible plan. With this notation, let

$$v^*(x_0) = \sup_{\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)} \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$$

where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$. This function gives the supremum in problem (SP) for any possible value of the initial state.

By definition of the supremum, $v^*(x_0)$ is the only value that satisfies:

1. $v^*(x_0) \ge \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t)$ for all $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$;

2. $\forall \varepsilon > 0$, $\exists \{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$ such that $\sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) > v^*(x_0) - \varepsilon$, where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$.
One way to build a feasible plan from any initial state $x_0$ is to choose some action $a_0 \in \Gamma(x_0)$ and then choose a continuation plan $\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))$. Doing so yields utility

$$R(x_0, a_0) + \sum_{t=0}^{+\infty} \beta^{t+1} R(x_{t+1}, a_{t+1}) = R(x_0, a_0) + \beta \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1})$$

where, for all $t \ge 0$, $x_{t+1} = g(x_t, a_t)$. Since this is true for all possible continuation plans $\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))$,

$$v^*(x_0) \ge R(x_0, a_0) + \beta \sup_{\{a_{t+1}\}_{t=0}^{+\infty} \in \Pi(g(x_0, a_0))} \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) = R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$

But the choice of $a_0$ was also arbitrary, so we get:

$$v^*(x_0) \ge \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)). \qquad (4.1.1)$$
We now want to argue that (4.1.1) holds as an equality for all $x_0 \in X$, which is Theorem 4.2 in SLP and is known as Bellman's principle of optimality. We provide a quick proof here:

Proposition 5. For all $x_0 \in X$,

$$v^*(x_0) = \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$
Proof. We know that $v^*(x_0) \ge \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0))$. Given $\varepsilon > 0$, find a plan $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$ that comes within $\varepsilon$ of $v^*(x_0)$. Since that plan must be feasible, $\{a_t\}_{t=1}^{+\infty} \in \Pi(g(x_0, a_0))$, so that

$$\sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) \le v^*(g(x_0, a_0)).$$

But then,

$$v^*(x_0) - \varepsilon < \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) = R(x_0, a_0) + \beta \sum_{t=0}^{+\infty} \beta^t R(x_{t+1}, a_{t+1}) \le R(x_0, a_0) + \beta v^*(g(x_0, a_0)) \le \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0)).$$

Since $\varepsilon$ is arbitrary, this implies $v^*(x_0) \le \sup_{a_0 \in \Gamma(x_0)} R(x_0, a_0) + \beta v^*(g(x_0, a_0))$.
To check that you understand the argument above you should ask yourself: why does
this proof differ from SLP’s? Where did we use the assumption that R is bounded?
The Bellman equation is a recursive functional equation. It is a functional equation
because it must hold for all x0 ∈ X hence defines a condition which function v∗ must meet.
It is recursive because it defines v∗ in terms of itself. This use of language is a bit premature
however since we have yet to show that the Bellman equation defines anything, in other
words that a unique function (v∗) satisfies it. We will now provide conditions under which
the Bellman equation does define v∗.
Notice that the equation defines an operator on functions. To see this, for any real-valued function $h$, let

$$Th(x) = \sup_{a \in \Gamma(x)} R(x, a) + \beta h(g(x, a)) \quad \text{for all } x \in X. \qquad (4.1.2)$$

A function $v$ on $X$ satisfies the Bellman equation if $v = Tv$. We have shown that $v^* = Tv^*$.
Is there any other solution to the Bellman equation? Not among bounded functions:
Proposition 6. If v satisfies the Bellman equation and v is bounded then v = v∗.
Proof. Assume that $v = Tv$. Then, for any $x_0$ and $\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)$,

$$v(x_0) \ge R(x_0, a_0) + \beta v(g(x_0, a_0)) \ge R(x_0, a_0) + \beta R(x_1, a_1) + \beta^2 v(g(x_1, a_1))$$

where $x_1 = g(x_0, a_0)$. Proceeding recursively shows that, for all $n \ge 0$,

$$v(x_0) \ge \sum_{t=0}^{n} \beta^t R(x_t, a_t) + \beta^{n+1} v(g(x_n, a_n))$$

where, for all $n \ge 0$, $x_{n+1} = g(x_n, a_n)$. As $n$ becomes large, the last term becomes vanishingly small because $v$ is bounded, which implies that

$$v(x_0) \ge \sup_{\{a_t\}_{t=0}^{+\infty} \in \Pi(x_0)} \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) = v^*(x_0).$$
To get the opposite inequality, fix $\varepsilon > 0$ and let $\{\delta_t\}_{t=0}^{+\infty}$ be such that $\delta_t > 0$ for all $t$ and $\sum_{t=0}^{+\infty} \delta_t < \varepsilon$. Pick $a_0$ so that $v(x_0) \le R(x_0, a_0) + \beta v(g(x_0, a_0)) + \delta_0$. Then, for all $t > 0$ and given $x_t = g(x_{t-1}, a_{t-1})$, pick $a_t$ so that $v(x_t) \le R(x_t, a_t) + \beta v(g(x_t, a_t)) + \delta_t$. By construction, it now follows that, for all $T > 0$,

$$v(x_0) \le \sum_{t=0}^{T} \beta^t R(x_t, a_t) + \beta^{T+1} v(g(x_T, a_T)) + \sum_{t=0}^{T} \delta_t.$$

Taking limits and using the fact that $v$ is bounded then gives

$$v(x_0) \le \sum_{t=0}^{+\infty} \beta^t R(x_t, a_t) + \varepsilon \le v^*(x_0) + \varepsilon.$$

But since $\varepsilon$ was arbitrary, this implies $v(x_0) \le v^*(x_0)$ for all $x_0$, and we are done.
We have thus established that in the space B(X) of bounded functions, v ∈ B(X) is the
supremum in (SP) if and only if v satisfies the Bellman equation. This fact enables one to
rely on the beautiful machinery that is dynamic programming. Studying the properties of v∗
and optimal policies becomes amazingly simpler than it would be without those tools. The
single best illustration of what dynamic programming can do is Lucas and Prescott (1971).
The tools they use there, which we fully explore in the next section, are so powerful that
oodles of macro papers follow almost exactly the path Lucas and Prescott traced. A great
example is Hopenhayn (1992).
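The machinery can be put to work immediately by iterating the operator $T$ from (4.1.2). The sketch below applies it to a small savings problem on a finite grid; the grid, production function, and parameters are illustrative choices, not from the notes, and the finite grid keeps rewards bounded as the chapter assumes.

```python
# Value-function iteration: repeatedly apply the Bellman operator T of (4.1.2).
import math

beta = 0.9
grid = [0.1 * k for k in range(1, 31)]   # capital grid, the state space X
n = len(grid)

def f(x):
    # assumed production technology (illustrative, not from the notes)
    return x ** 0.3

# Actions: next period's capital x' on the grid with consumption f(x) - x' > 0;
# reward R(x, x') = log(f(x) - x'), bounded on this finite grid.
def T_op(v):
    return [max(math.log(f(x) - xp) + beta * v[j]
                for j, xp in enumerate(grid) if f(x) - xp > 1e-10)
            for x in grid]

v = [0.0] * n                # initial guess v_0 = 0
for _ in range(500):         # iterate v_{k+1} = T v_k
    v = T_op(v)

# v now approximates the unique bounded fixed point v* of the Bellman equation
print(v[0], v[-1])
```

Section 4.2's contraction mapping theorem is what guarantees that these iterates converge to $v^*$ at geometric rate $\beta$ from any bounded starting guess.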
4.2 Tools
Our goal for the remainder of this chapter is to use the principle of optimality to say as much as we can about solutions to stationary control problems. To that end, we need two key tools: the theorem of the maximum and the contraction mapping theorem.
4.2.1 Banach spaces
Banach spaces are complete, normed linear spaces.
We already know from chapter 2 what a linear space is. Let $X$ be a real linear space (if you don't remember what "real" refers to here, look it up). A norm on $X$ is a function $\|\cdot\| : X \to \mathbb{R}_+$ that satisfies, for all $(x_1, x_2) \in X \times X$ and $\alpha \in \mathbb{R}$:

1. $\|x_1\| \ge 0$, with equality if and only if $x_1 = 0$;

2. $\|\alpha x_1\| = |\alpha| \|x_1\|$;

3. $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$.

This should remind you a lot of the way we defined a metric. In fact, each norm induces a metric: $d(x_1, x_2) = \|x_1 - x_2\|$ for all $(x_1, x_2) \in X \times X$. You should verify, as a useful and easy exercise, that $d$ is in fact a metric.
Many examples of normed linear (or vector) spaces are provided in SLP (Exercise 3.4).
We will jump right into the space that interests us most in this chapter. Let (X, d) be
a metric space and let B(X) be the set of all bounded real functions on X. To make
B(X) a linear space, we need a notion of addition and a notion of scalar multiplication.
For any two functions g and h in B(X) and any scalar α ∈ IR, define g + h ∈ B(X) by
(g +h)(x) = g(x)+h(x) and αg ∈ B(X) by (αg)(x) = αg(x) for all x ∈ X. That is, addition
and scalar multiplication are defined in the standard pointwise fashion. It is easy to check
that B(X) together with those two operations is a real linear space.
Now we need a norm. For g in B(X), define ‖g‖ = supx∈X |g(x)|. This norm is called
the supnorm and the topology it induces on B(X) is called the supnorm topology. It is
paramount that you verify that the supnorm is in fact a norm. Take notice also of why we
need to restrict our attention to bounded functions. Otherwise, ‖g‖ would not necessarily
be finite.
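To make the supnorm concrete, here is a tiny numerical sketch (my own illustration, with arbitrary functions; a grid only approximates the supremum):

```python
# Approximate the supnorm distance between two bounded functions on X = [0, 1]
# by taking the max of |g - h| over a fine grid. This only approximates the
# true supremum, but here the maximizer x = 1/2 happens to lie on the grid.
import numpy as np

x = np.linspace(0.0, 1.0, 10001)   # fine grid on [0, 1]
g = x ** 2                          # g(x) = x^2, bounded on [0, 1]
h = x                               # h(x) = x, bounded on [0, 1]

# sup_x |g(x) - h(x)| = max of x - x^2, attained at x = 1/2 with value 1/4
sup_dist = np.max(np.abs(g - h))
```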
The one term that remains to be defined is “complete”. A sequence {xn} ⊂ (X, d) is called a
Cauchy sequence if limn 7→∞ supm≥n d(xm, xn) = 0. Put another way, a sequence is Cauchy if
for every ε > 0 there exists N large enough such that m, n ≥ N =⇒ d(xm, xn) ≤ ε. Put yet
another way, a sequence is Cauchy if for every ε > 0 the sequence is eventually contained
in a ball of radius ε. Thinking about this a bit should make it clear that for real sequences,
Cauchy sequences must converge. Subsets of metric spaces with that property are called
complete: the set contains the limit point of all its Cauchy sequences.
Here’s a result that we will need.
Proposition 7. Let (X, d) be a metric space. B(X) together with the supnorm is a Banach
space.
Proof. B(X) is a normed linear space with the operations we introduced above. We only
need to show that it is complete. Let {gn} be a Cauchy sequence in B(X). The sequence
{gn(x)} of real numbers is Cauchy for every x ∈ X; therefore it converges to some real
number g(x). Since we can do this for all x, this gives a candidate function for {gn} to
converge to.
Now we need to argue that {gn} converges to g in the supnorm. Fix ε > 0 and pick
N > 0 such that ‖gn − gm‖ < ε/2 whenever m, n ≥ N. Then, for any given x ∈ X and any n ≥ N,

|gn(x) − g(x)| ≤ |gn(x) − gm(x)| + |gm(x) − g(x)| ≤ ε/2 + |gm(x) − g(x)| < ε

for m ≥ N large enough, since {gn} converges pointwise to g.
It only remains to show that g is bounded. You should do it.
Now here’s a very useful, simple result.
Proposition 8. Let (X, d) be a complete metric space and X ′ be a closed subset of X. Then
(X ′, d) is a complete metric space.
Proof. Let {xn} be a Cauchy sequence in X ′. Since (X, d) is complete, {xn} converges to an
element of X. Since X ′ is closed, that element is in X ′ and we are done.
Equipped with this, we can now show that C(X), the space of continuous, bounded real
functions on X equipped with the supnorm, is a Banach space.
Proposition 9. Let (X, d) be a metric space. C(X) together with the supnorm is a Banach
space.
Proof. Everything but completeness is obvious. By the previous result, we only need to
show that C(X) is a closed subset of B(X). Let {gn} ⊂ C(X) converge to g in the supnorm
and, for x ∈ X, take any sequence {xk} that converges to x. Then, for all integers k and n,
|g(xk) − g(x)| ≤ |g(xk) − gn(xk)| + |gn(xk) − gn(x)| + |gn(x) − g(x)|. By picking n large
enough, we can make the first and last terms of the right-hand side as small as we want.
Then, because gn is continuous, we can make the middle term as small as desired as well by
letting k grow large. Hence g is continuous, hence C(X) is closed, and we are done.
4.2.2 Contraction mapping theorem
One big reason why Banach spaces are useful is the contraction mapping theorem. Let g
be a function from a metric space (X, d) to itself. It is called a contraction mapping with
modulus β < 1 if for all (x1, x2) ∈ X × X, d(g(x1), g(x2)) ≤ βd(x1, x2). A fixed point of g is
an element x ∈ X such that x = g(x).
Theorem 5. Let g be a contraction mapping on (X, d) with modulus β < 1. If (X, d) is
complete, then g has a unique fixed point x. Furthermore, for all x0 ∈ X, d(x, g^n(x0)) ≤
β^n d(x, x0).
We omit the proof here because it is tedious, but you should read it in SLP and understand
it. The proof consists of starting from any point of X and applying g repeatedly to that
point. The resulting set of points forms a Cauchy sequence hence converges. Because g is
continuous (very strongly in fact: it is Lipschitz continuous), the limit point is a fixed point.
Furthermore, it is clear that the iterative procedure converges there at geometric rate β.
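Here is a minimal numerical sketch of that geometric convergence (my own toy example, not from SLP): iterate the contraction g(x) = 0.5x + 1 on the real line, whose modulus is β = 0.5 and whose fixed point is x = 2.

```python
# Iterate a contraction mapping and verify the geometric bound of Theorem 5:
# d(x*, g^n(x0)) <= beta^n * d(x*, x0).
def g(x):
    return 0.5 * x + 1.0        # contraction on IR with modulus beta = 0.5

beta = 0.5
x_star = 2.0                    # unique fixed point: 2 = 0.5 * 2 + 1

x = 10.0                        # arbitrary starting point x0
initial_error = abs(x - x_star)
errors = []
for n in range(20):             # apply g repeatedly
    x = g(x)
    errors.append(abs(x - x_star))

# Check the bound error_n <= beta^n * error_0 at every iteration.
geometric = all(errors[n] <= beta ** (n + 1) * initial_error + 1e-12
                for n in range(20))
```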
Here’s a trivial consequence of the previous result that ranks among the most useful
results in recursive methods. (You should prove it.)
Corollary 1. Let g be a contraction mapping with modulus β < 1 on a complete metric
space (X, d). Let X ′ be a closed subset of X. If g(X ′) ⊂ X ′ then g’s fixed point is in X ′.
Here is why it is useful. Assume that we wish to show that the fixed point of a contraction
mapping g satisfies property P . One way to do this is to show that the set of points that
satisfy P is closed and that g(x) satisfies P whenever x does. Two steps and we’re done.
Showing directly that a particular mapping is a contraction can be tough. Blackwell
provided a set of sufficient conditions that one can use for that purpose.
Theorem 6. Let X be a subset of IRn. Assume that T : B(X) 7→ B(X) satisfies:
1. (monotonicity) For all f, g ∈ B(X) such that f(x) ≤ g(x) ∀x ∈ X, Tf(x) ≤ Tg(x)
∀x ∈ X,
2. (discounting) There exists β < 1 such that for all f ∈ B(X), a ≥ 0 and x ∈ X, T (f + a)(x) ≤
Tf(x) + βa (where (f + a)(x) means f(x) + a for all x).
Then T is a contraction mapping with modulus β.
In the next section, we will apply this result to the Bellman mapping defined in 4.1.2 to
show that it is a contraction. Its unique fixed point v∗ can then be computed by repeated
iteration on T , a procedure called value function iteration.
4.2.3 Theorem of the Maximum
It is often useful to characterize how the value and policy functions associated with a dynamic
control problem vary with the parameters of the problem, say β or any parameter in the
specific functional forms that define R, g or Γ. For this we need yet another amazingly
useful, general result called the Theorem of the Maximum.
Again, we need to invest in a bit of structure. A correspondence h from set X to set Y
is a function2 that associates with each x ∈ X a non-empty subset h(x) of Y . When X and
Y are metric spaces, we can define two notions of continuity for correspondences, both of
which imply continuity in the standard sense when h is single-valued (hence happens to be
a function in the standard sense.)
2 From X to 2^Y, the set of all subsets of Y.
A correspondence h on X is called compact-valued if h(x) is compact for all x ∈ X. It is
called convex-valued if h(x) is convex for all x ∈ X.
A correspondence h from a metric space (X, dX) to a metric space (Y, dY ) is upper-
hemicontinuous (u.h.c) at x ∈ X if for any open subset O of Y such that h(x) ⊂ O there
exists a neighborhood V of x such that h(V ) ⊂ O. The correspondence is called u.h.c on X
if it is u.h.c at x for all x ∈ X.
When h happens to be compact-valued, one can establish upper-hemicontinuity using the
following fact. If h is compact-valued on X, then it is u.h.c at x if for any sequence {xn} in
X that converges to x and for every sequence {yn} in Y with yn ∈ h(xn) for all n, {yn} has
a convergent subsequence that converges to a point in h(x). (See Hildenbrand and Kirman,
1988, p. 262, for a proof.)
Next, the correspondence h is called lower-hemicontinuous (l.h.c) at x ∈ X if for any
open subset O of Y such that h(x) ∩ O ≠ ∅, there exists a neighborhood V of x such that
h(x′) ∩ O ≠ ∅ for all x′ in V . The correspondence is called l.h.c on X if it is l.h.c at x for all
x ∈ X.
There is once again a sequential definition of lower-hemicontinuity (which, notice, requires
no compactness assumption). The correspondence h is l.h.c at x if for any sequence {xn} in X
that converges to x ∈ X and for any y ∈ h(x), there exists a sequence {yn} that converges
to y with yn ∈ h(xn) for all n.
Finally, a correspondence is continuous if it is both u.h.c and l.h.c.
Also note that in order to define these notions, one only needs to know which sets are
open: they are topological notions. See Hildenbrand and Kirman (1988) for a great treatment
of everything of importance pertaining to correspondences. You should also make sure that
you understand these notions fully. Homework 3 will give you some practice but you should
work out as many exercises as possible in SLP’s chapter 3.
For our purpose, the key result we need is Berge’s Theorem of the maximum.
Theorem 7. Let X ⊂ IRn and Y ⊂ IRm, let f : X×Y 7→ IR be a continuous function and let
Γ : X 7→ Y be a continuous, compact-valued correspondence. Then, the function h : X 7→ IR
defined by h(x) = maxy∈Γ(x) f(x, y) is continuous, and the correspondence π : X 7→ Y defined
by π(x) = arg maxy∈Γ(x) f(x, y) is non-empty, compact-valued and u.h.c.
You should read and understand the proof of this result in SLP. The bottom line for our
purposes is that in a maximization problem that is parameterized by a list x of objects, the
value of the problem and the set of solutions vary upper-hemicontinuously with parameters
as long as the objective function and the choice set are continuous.
Note that this result contains Weierstrass’ theorem (it’s the “non-empty” part of the last
sentence). Furthermore, observe that if f happens to be strictly concave and Γ happens to be
convex, we know from chapter 1 that π is single-valued. In that case, it is then a continuous
function. Under those stronger assumptions, we can get a stronger continuity result. In
the set-up of the theorem of the maximum, consider a sequence of strictly concave objective
functions {fn} that converge to f in the supnorm, and assume that f is strictly concave as
well.3 Letting πn and π be the corresponding policy functions, a natural question to ask
is whether {πn} converges to π in some sense. We need two more definitions to state the
result we need.
We say that {πn} converges to π pointwise if for all x ∈ X, |πn(x) − π(x)| converges to
zero as n grows large. We say that {πn} converges to π uniformly if {πn} converges to π in
the supnorm. Uniform convergence implies pointwise convergence, but the converse is not
true. (Find a counter-example.)
Theorem 8. In the set-up described in the previous two paragraphs, {πn} converges to π
pointwise. If in addition X is compact, {πn} converges to π uniformly.
This is theorem 3.8 in SLP. Now, we’re in business.
4.3 Characteristics of the value function
The Bellman equation says that v∗ is the fixed point of an operator T on the space B(X)
of bounded functions defined by equation (4.1.2). We begin by establishing that T is a
contraction mapping on B(X).
Lemma 1. The operator T defined by equation (4.1.2) is a contraction on B(X).
3 Functional analysis is tricky and one should never assume that something is true unless one can prove it, however intuitive a particular result appears. The fact that all functions in {fn} are strictly concave does not imply that f is, even though {fn} converges to f in a very strong sense. It implies that f is concave, however. See homework 3.
Proof. Since R is bounded, Tv is bounded if v is. Therefore, T does map B(X) into itself. To
verify that T is a contraction, we will check that Blackwell’s sufficient conditions are met. If,
for w, v ∈ B(X), w(x) ≥ v(x) for all x, then R(x, a) + βw(g(x, a)) ≥ R(x, a) + βv(g(x, a)) for
all a ∈ Γ(x) so that (taking sups) Tw(x) ≥ Tv(x) for all x ∈ X; T satisfies monotonicity.
Turning to discounting, for all (x, a) ∈ X × Y and any constant c ≥ 0 we have that
R(x, a) + β(v(g(x, a)) + c) = R(x, a) + βv(g(x, a)) + βc, so that (taking sups) T (v + c)(x) = Tv(x) + βc. Since β < 1, T
satisfies discounting, and it is therefore a contraction with modulus β.
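As a sanity check of this lemma, one can discretize a control problem and verify the contraction inequality numerically. The sketch below is my own illustration, not part of the text: a Ramsey-style Bellman operator with log utility and full depreciation on an arbitrary capital grid, applied to two arbitrary bounded functions.

```python
# Verify numerically that the Bellman operator satisfies
# ||Tw - Tv|| <= beta * ||w - v|| in the supnorm.
# Parameters and the grid are arbitrary choices for illustration.
import numpy as np

beta, alpha, A = 0.95, 0.3, 1.0
grid = np.linspace(0.05, 2.0, 60)        # capital grid; delta = 1 for simplicity

def T(v):
    """Tv(k) = max over grid k' of log(A*k**alpha - k') + beta * v(k')."""
    Tv = np.empty_like(v)
    for i, k in enumerate(grid):
        c = A * k ** alpha - grid                        # implied consumption
        obj = np.where(c > 0,
                       np.log(np.maximum(c, 1e-300)) + beta * v,
                       -np.inf)                           # infeasible choices
        Tv[i] = obj.max()
    return Tv

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, grid.size)    # two arbitrary bounded "value functions"
v = rng.uniform(-1.0, 1.0, grid.size)
lhs = np.max(np.abs(T(w) - T(v)))        # supnorm distance after applying T
rhs = beta * np.max(np.abs(w - v))       # beta times the original distance
```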
This implies that T has a unique fixed point. (We already knew that: we showed directly
earlier that v∗ is the only bounded function that satisfies the Bellman equation. Notice,
however, how easy things are once we have all the machinery of functional analysis at hand.)
It also implies that this fixed point v∗ can be computed by iterating on T starting from any initial
guess, a procedure called value function iteration. Before seeing in detail how value function
iteration works, it is useful to say as much as we possibly can about v∗ and optimal plans.
So, what else can we say about v∗? Dynamic programming tells us that v∗ inherits any
property which T preserves as long as the set of functions with that property is closed in the
supnorm topology. Here’s a first illustration:
Proposition 10. Assume that R and g rise with x and that Γ is monotone in the sense that
Γ(x) ⊂ Γ(x′) whenever x′ ≥ x. Then v∗ is an increasing function.
Proof. Assume that v is an increasing function. Then R(x, a) + βv(g(x, a)) rises with x
for all a, since R and g rise with x and v is increasing by assumption. Since the set of feasible actions does not
decrease when x rises, taking sups then shows that Tv is increasing as well. So T preserves
monotonicity. Because the set of bounded increasing functions is closed under the supnorm
(see homework 3), the unique fixed point of T is increasing as well.
To be able to say more (by appealing to the Theorem of the Maximum), let us now
assume that:
Assumption 1. R, Γ and g are continuous and monotonically increasing in x and X is
compact.
Then,
Proposition 11. Under assumption 1, v∗ is a continuous, increasing function. Further-
more, if R is strictly increasing in x, so is v∗.
Proof. Under assumption 1 and by the theorem of the maximum, T preserves continuity
(elaborate.). Since C(X) is a Banach space, the fixed point of T must be in C(X) as well.
Henceforth then, we can restrict our search for fixed points to C(X). Under that restriction,
Weierstrass’ theorem implies that a solution to the maximization problem that defines T
exists and we may replace the sup operator with a standard max operator. For x ∈ X,
let π(x) = arg maxa∈Γ(x) R(x, a) + βv∗(g(x, a)) and pick an element a ∈ π(x). Then, if R
increases strictly in its first argument, x < x′ for any x′ ∈ X implies that
v∗(x) = R(x, a) + βv∗(g(x, a)) < R(x′, a) + βv∗(g(x′, a)) ≤ v∗(x′).
The strict inequality uses the strict monotonicity of R and the monotonicity of v∗ and g.
The last weak inequality uses the monotonicity of Γ.
The correspondence π : X 7→ Y defined by π(x) = arg maxa∈Γ(x) R(x, a) + βv∗(g(x, a))
is called the optimal policy correspondence. One can use it to build an optimal plan from
any initial state by drawing a0 ∈ π(x0) and recursively an ∈ π(xn) where for all n > 0
xn = g(xn−1, an−1). In fact, under assumption 1, you should convince yourself that this is
the only way to build an optimal plan. It follows from this trivial observation that a unique
optimal plan exists from any initial state if and only if π(x) is single-valued (a function) for
all x ∈ X.
Under assumption 1, the theorem of the maximum implies that π is u.h.c, a fact that
comes in very handy when deep, general existence theorems such as Kakutani’s fixed point
theorem (See Hildenbrand and Kirman, 1988) must be invoked. But can we impose additional
assumptions on the control problem to guarantee that π is always single-valued? Yes, and
as we have discussed on many occasions already, convex choice sets and strictly concave
objectives are the answer.
Here it is useful to transform the problem as in SLP into one where agents choose x′
directly rather than an action. Define Q(x) = {x′ : x′ = g(x, a) for some a ∈ Γ(x)}. In the
Ramsey problem, Q and Γ coincide. Also define F (x, x′) = maxa:g(x,a)=x′ R(x, a). In this
transformed problem, the optimal policy correspondence π : X 7→ X is defined for all x ∈ X
by:
π(x) = arg max_{x′∈Q(x)} F (x, x′) + βv∗(x′).
Note that the assumptions we have made so far on Γ and R carry over to Q and F . Here are the
additional assumptions we need:
Assumption 2. X is convex, Q has a convex graph, and F is strictly concave.
The assumption that Q has a convex graph means that for any (x, x′) ∈ (X × X), any
(y, y′) ∈ Q(x)×Q(x′), and any θ ∈ [0, 1], θy+(1−θ)y′ ∈ Q(θx+(1−θ)x′). You should draw
this in the one-dimensional case to get some intuition for the language “convex graph.”
Proposition 12. Under assumptions 1 and 2, v∗ is strictly concave and π is single valued.
Proof. Note first that v∗(x) = maxx′∈Q(x) F (x, x′) + βv∗(x′) and let T be the corresponding
contraction operator on C(X). The set of concave, continuous functions4 is a closed subset
of C(X) (see homework 3). Furthermore, T preserves concavity since F is concave. Now
pick any two states (x1, x2) with x1 ≠ x2 and let (x′1, x′2) be the corresponding optimal choices
of next-period states. For θ ∈ (0, 1), let xθ = θx1 + (1 − θ)x2 and x′θ = θx′1 + (1 − θ)x′2.
Because Q has a convex graph, x′θ ∈ Q(xθ). So

v∗(xθ) ≥ F (xθ, x′θ) + βv∗(x′θ)
> θF (x1, x′1) + (1 − θ)F (x2, x′2) + β(θv∗(x′1) + (1 − θ)v∗(x′2))
= θv∗(x1) + (1 − θ)v∗(x2).

The strict inequality uses the strict concavity of F and the concavity of v∗. So v∗ is strictly
concave. Single-valuedness follows from the same arguments as always.
If for all (x, x′) ∈ X × X the set {a ∈ Γ(x) : g(x, a) = x′} is a singleton (as is the case in
the Ramsey problem, for instance), then the optimal action correspondence is single-valued when π is. Let us restrict our
4 Concave functions are continuous on open sets. Continuity is therefore almost redundant here, but almost does not cut it. Concave functions are a lot of things (differentiable, for one) almost everywhere, but ignoring the few potential problem points can lead to big mistakes.
attention henceforth to the case where π is in fact single-valued. The optimal action a at
state x must solve:
max_{a∈Γ(x)} R(x, a) + βv∗(g(x, a)).
When v∗ is differentiable,5 standard tools from calculus can be used to compute optimal
actions and characterize policy function π. When is v∗ differentiable? Here’s a theorem.
Theorem 9. (Benveniste and Scheinkman) Assume that X is convex, that v∗ is concave,
and let x0 be in the interior of X. Assume that there exists a concave, differentiable function
W defined on a neighborhood D of x0 such that W (x0) = v∗(x0) and W (x) ≤ v∗(x) for all
x ∈ D. Then v∗ is differentiable at x0 and its partial derivatives there coincide with those of W .
The idea behind this result is simple and is illustrated in Figure 4.1 in SLP. The only
type of non-differentiability a concave v∗ can have on the interior of X is an upward-pointing kink.
If v∗ lies above W and touches it at x0, as the premise of the theorem requires, then any
supergradient of v∗ at x0 is also a supergradient of W at x0. Since W is differentiable, that
supergradient is unique, so v∗ can have no kink at x0. Now, we get:
Theorem 10. If F is differentiable on the interior of the graph of Q and assumptions 1
and 2 hold, if x0 is in the interior of X and π(x0) is in the interior of Q(x0), then v∗ is
continuously differentiable at x0.
It would be very useful to know when v∗ is twice differentiable. Then, not only could we
find optimal policies using standard calculus tools, we could also characterize how π depends
on parameters using the implicit function theorem. Getting v∗ to be twice differentiable
requires much stronger assumptions however. Finding assumptions that work became an
important question in macroeconomics in the late 1980s, with many of the field’s greatest
minds thinking about this problem. The problem was eventually solved by (then) University
of Chicago student Manuel S. Santos. The relevant part of Manuel’s dissertation is in Santos
(1991). Clearly, given the dynamic programming principles we have used so far, for v∗ to be
5 You should look up a definition of what it means for a multivariate function like v∗ to be differentiable at a particular point. Multivariate differentiability is the natural extension of the one-dimensional case: the function can be well-approximated at that point by a hyperplane. This is the case for instance when the function has continuous partial derivatives.
twice differentiable, R must be. It turns out that, in general, it must also satisfy a strong
form of concavity.
4.4 Value function iteration
Consider a stationary control problem (X, Y, Γ, g, β, R) where R is bounded (above and
below). Solving the problem entails finding v∗ and the set of optimal plans for any possible
initial state x0 ∈ X. The tools we have developed in the previous section can enable us
to say quite a bit about both objects. (See the Ramsey illustration in the next section.)
But when we ask quantitative questions (many interesting questions in macroeconomics are
quantitative: how much of phenomenon X does factor Y account for?) we need to compute
(approximations to) optimal policies.
The contraction mapping theorem tells us how to do that. We could start from any guess
v0 for the value function, but probably the most useful guess is v0 ≡ 0. Then, for all x0,

v1(x0) = (Tv0)(x0) = max_{a∈Γ(x0)} R(x0, a) + βv0(g(x0, a)) = max_{a∈Γ(x0)} R(x0, a)
can be naturally interpreted as the maximum utility the agent could derive if they lived for
one period and entered this one period in state x0. Similarly, vn(x0) = (T^n v0)(x0) is the
maximum utility the agent can generate when they have n periods to live. We know that {vn}
converges to v∗ uniformly and, under sufficient conditions, {πn} (the optimal policy in the
n-period case) also converges uniformly to π.
There are several ways to implement this procedure with computer help. We will outline
a specific procedure in the Ramsey case below.
4.5 Application to the Ramsey problem
Recall that the Ramsey planner chooses non-negative sequences {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)
subject to:
ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0
and to a given initial level of the capital stock. This fits into our description of a dynamic
control problem where capital plays the role of the state variable and consumption (or
investment, pick one) at date t is the action.
Letting kmax be the unique positive solution to δk = f(k), a natural choice for X (the
state space) is [0, kmax] = [0, f(kmax) + (1 − δ)kmax] and this is the natural choice for the
action set Y as well. Note that X is convex and compact. Next, given capital level k, the
choice set is Γ(k) = [0, f(k) + (1 − δ)k]. In homework 3, you will show that Γ is continuous
and convex-valued. The transition function is g(k, c) = f(k) + (1 − δ)k − c for all c ∈ Γ(k).
Since Y is bounded, we can assume without any loss of generality that U is bounded
above. We also assume that U is defined everywhere on Y ,6 that it is strictly concave,
strictly increasing and continuously differentiable on (0,∞). Finally we will assume that
limc7→0 U ′(c) = +∞.
The value function v : X 7→ IR associated with the Ramsey problem is the unique solution
to the following Bellman equation:
v(k) = max_{c∈[0,f(k)+(1−δ)k]} U(c) + βv(f(k) + (1 − δ)k − c)
for all k ∈ X. The arguments we developed in the previous section imply that v is continuous,
strictly increasing, and continuously differentiable on the interior of X. They also imply that
the optimal consumption policy c(k) is unique for all k in X. We will now argue that c rises
strictly on X. If k = 0, c(k) = 0, trivially. On the other hand, if k > 0, c(k) > 0 since
limc7→0 U ′(c) = +∞.
What’s more, optimality requires that c(k) < f(k)+(1−δ)k. This is because limk 7→0 v′(k) =
+∞ (see homework 3.)
Now take any k ∈ (0, kmax]. Since U and v are differentiable, U is strictly concave, and,
as argued above, optimal consumption is always in the interior of the choice set, a necessary and
sufficient condition for optimal consumption given k ∈ (0, kmax] is:

U ′(c) = βv′(f(k) + (1 − δ)k − c).
Because v is continuously differentiable and strictly concave, v′ is strictly decreasing.7
Assume that capital rises from k to k′. If c does not change, the right-hand side of the
first-order condition falls while the left-hand side is unchanged. If c falls that makes matters
worse. So c must rise. It follows that c rises strictly with k on the interior of X. Since
c(0) = 0 and we could (without changing any of the arguments above) extend X past kmax,
c rises monotonically on all of X.
The fact that c rises with k will help with the computations I ask you to perform in
homework 3. Another fact that should help is that there is a pretty tight bound on how
much c can rise when k rises: for k′ > k, c(k′) < c(k) + [f(k′) + (1 − δ)k′] − [f(k) + (1 − δ)k].
In other words, when output rises, consumption rises by less than output does. This means
that for any k′ > k, c(k′) ∈ [c(k), c(k) + [f(k′) + (1 − δ)k′] − [f(k) + (1 − δ)k]]. In homework
3, I ask you to compute v and c on [0, kmax] via value function iteration.
The first thing you need to do is to discretize the state space, i.e. create a vector of N
points {k1, k2, k3, . . . , kN} in [0, kmax] with kN = kmax (you can make k1 = 0 too, but keep in
mind that we know what v and c are there). This set of points is often called a grid. You can
make the points equally spaced though people often use different assignment schemes when
they feel that it is more important to be precise in certain parts of the state space than in
others.
Then create a vector v0 of initial guesses for the value function at each grid point, starting
with the zero vector. We might actually skip the first iteration since we know that v1 = Tv0
is given by v1(ki) = U(f(ki) + (1 − δ)ki) for all i ∈ {1, . . . , N}.
Things get more interesting with the next iteration. What is the optimal consumption
policy given guess v1? Given capital k, c(k) solves: maxc∈[0,f(k)+(1−δ)k] U(c) + βv1(f(k) +
(1 − δ)k − c). The problem is that we know what v1 is at the grid points, but not off them.
7 Look it up. What you need here is not a theorem that says that v′′ is strictly negative. First of all, this need not be true even if v is twice differentiable. More importantly, we have not provided conditions that guarantee that it is twice differentiable.
There are several ways to proceed.
You can first use brute force and constrain the agent to always land on grid points, i.e. restrict
their choice set to consumption values such that f(k) + (1 − δ)k − c ∈ {k1, k2, k3, . . . , kN}.
There are only a finite number of possibilities to try, and Matlab can quickly tell you which
is best.
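A minimal sketch of this brute-force scheme (my own illustration in Python rather than Matlab, with arbitrary parameter values): take δ = 1, f(k) = Ak^α and log utility, so the computed value function can be checked against the closed form for this case derived later in this section.

```python
# Brute-force value function iteration: next-period capital is constrained to
# lie on the capital grid itself, so each Bellman maximization is a finite
# search. Illustrative parameters; delta = 1, f(k) = A*k**alpha, U = log.
import numpy as np

beta, alpha, A = 0.95, 0.3, 1.0
kmax = A ** (1.0 / (1.0 - alpha))    # unique positive solution of delta*k = f(k)
grid = np.linspace(0.01, kmax, 200)

v = np.zeros(grid.size)              # initial guess v0 = 0
for _ in range(500):
    v_new = np.empty_like(v)
    for i, k in enumerate(grid):
        c = A * k ** alpha - grid    # consumption if next capital is each grid point
        obj = np.where(c > 0,
                       np.log(np.maximum(c, 1e-300)) + beta * v,
                       -np.inf)
        v_new[i] = obj.max()
    v = v_new

# Compare with the closed form v(k) = a + b*log(k) of the delta = 1 log case.
b = alpha / (1 - alpha * beta)
a = (np.log((1 - alpha * beta) * A) / (1 - beta)
     + (alpha * beta / (1 - alpha * beta)) * np.log(alpha * beta * A) / (1 - beta))
max_gap = np.max(np.abs(v - (a + b * np.log(grid))))
```

Because the choice set is restricted to the grid, the computed value function lies (weakly) below the exact one, but it should remain close to it even with a modest grid.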
You can slightly improve over this by allowing c to fall anywhere in a different (presumably
finer) grid {c1, c2, c3, . . . , cM} ⊂ [0, kmax]. Then there is no guarantee that k′ = f(k) + (1 −
δ)k − c ∈ {k1, k2, k3, . . . , kN}. To get the value of vn−1 at k′ you need to use some form
of interpolation. The simplest (and my favorite because it is the only form that preserves
concavity) is linear interpolation. If k′ ∈ [ki, ki+1] for some i ∈ {1, . . . , N}, then approximate
v(k′) with

[(ki+1 − k′)vn−1(ki) + (k′ − ki)vn−1(ki+1)] / (ki+1 − ki),

where vn−1 is the last guess. Again, pick the value of consumption that yields the maximum utility.
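The interpolation formula above, written out (my own snippet; NumPy's `np.interp` implements the same piecewise-linear rule on a sorted grid):

```python
# Linear interpolation of a "last guess" v_{n-1} between bracketing grid points.
import numpy as np

k_grid = np.array([0.0, 1.0, 2.0, 3.0])
v_grid = np.log(1.0 + k_grid)            # some concave values of v_{n-1} on the grid

def v_interp(k_prime):
    """((k_{i+1} - k')v(k_i) + (k' - k_i)v(k_{i+1})) / (k_{i+1} - k_i)."""
    i = int(np.searchsorted(k_grid, k_prime)) - 1
    i = min(max(i, 0), len(k_grid) - 2)  # clamp to a valid bracketing interval
    ki, ki1 = k_grid[i], k_grid[i + 1]
    return ((ki1 - k_prime) * v_grid[i]
            + (k_prime - ki) * v_grid[i + 1]) / (ki1 - ki)

# Agrees with numpy's built-in routine at off-grid points.
gap = max(abs(v_interp(x) - np.interp(x, k_grid, v_grid)) for x in (0.3, 1.7, 2.9))
```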
Finally, you can use the first order condition. We know that c(k) must solve U ′(c) =
βv′(f(k) + (1 − δ)k − c). While we do not know what v′ is, under the linear approximation
above we know that it is a step function with value

[vn−1(ki+1) − vn−1(ki)] / (ki+1 − ki)

between two consecutive grid points ki and ki+1.
One issue is that there are two possible values for v′ when we are exactly at a grid point. There
you can use the right-hand derivative and proceed with the loop as before. In other words,
an exact solution to the first-order condition may not exist, but using right-hand derivatives
guarantees that the algorithm converges, as it should, to the kink point when it is the optimal
solution.
Now let us start at k1. We know that optimal consumption is in [0, f(k1) + (1 − δ)k1], so
define bounds c̲ = 0 and c̄ = f(k1) + (1 − δ)k1.
Let’s begin with the guess c = (c̲ + c̄)/2. At that c, evaluate U ′(c) − βv′(f(k1) + (1 − δ)k1 − c), where
v′ is replaced with the step function described above at iteration n. If this value is too high,
c is too low, so replace c̲ by c: c (we now know) is a lower bound. In the opposite case,
c is too high, so update c̄ to c.
Then make your next consumption guess (c̲ + c̄)/2 and proceed. This procedure is called
dichotomy. It works very well and converges at a geometric rate to a solution. You should
easily convince yourself that after q divisions of the consumption interval, consumption is
within (f(k1) + (1 − δ)k1)/2^{q+1} of the solution.
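Here is a sketch of the dichotomy step in code (my own illustration; to make it checkable I use the exact derivative v′(k′) = b/k′ from the δ = 1, log-utility case discussed at the end of this section instead of the step-function approximation, so the answer has a known closed form):

```python
# Dichotomy (bisection) on the first-order condition U'(c) = beta*v'(f(k) - c),
# with U = log, f(k) = A*k**alpha, delta = 1, and the known v'(k') = b/k'.
# Illustrative parameters; the closed form is c(k) = (1 - alpha*beta)*A*k**alpha.
beta, alpha, A = 0.95, 0.3, 1.0
b = alpha / (1 - alpha * beta)       # slope coefficient of v(k) = a + b*log(k)

def foc(c, k):
    """U'(c) - beta * v'(f(k) - c); strictly decreasing in c."""
    return 1.0 / c - beta * b / (A * k ** alpha - c)

k = 0.5
c_lo, c_hi = 0.0, A * k ** alpha     # bounds on feasible consumption
for _ in range(60):                  # each pass halves the bracketing interval
    c = 0.5 * (c_lo + c_hi)
    if foc(c, k) > 0:                # marginal utility too high: c is too low
        c_lo = c
    else:                            # c is too high
        c_hi = c

c_closed_form = (1 - alpha * beta) * A * k ** alpha
```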
For i > 1 we know that c(ki) ∈ [c(ki−1), c(ki−1) + [f(ki) + (1 − δ)ki] − [f(ki−1) + (1 − δ)ki−1]],
so we can once again use dichotomy to find the unique solution to the first order condition.
If you use one of the first two methods, use fine grids if you want to arrive at a decently
precise answer. If you use the last method, a coarse grid suffices (which is why the last
method is superior to the first two in my eyes).
Regardless of how you choose to compute c(ki) for all i, you can update your guess for the
value function by vn(ki) = U(c(ki)) + βvn−1(f(ki) + (1 − δ)ki − c(ki)) for all i. You
should proceed until the value function is almost invariant, i.e. until maxi |vn(ki) − vn−1(ki)| <
ε, where ε is some small tolerance level. In practice, computation time is cheap in this case:
just iterate 500 times. We know that after 500 iterations we are within β^{500}v1(kmax) of the true
value function (how do we know that?), and that should be good enough.
In the problem set, I ask you to carry out these computations in two cases for which the
final answer is known which will enable you to make sure that your program is working. One
is the case we worked out using a shooting algorithm in chapter 2. The second one is the
case where δ = 1, f(k) = Akα for all k where A > 0 and α ∈ (0, 1) and U is the log function.
In that case, we can find the problem’s solution analytically.
It is important to recognize that in this case U is not bounded below (as usual, we can
bound it above without any loss of generality.) Nevertheless, it is trivial to establish that
the Bellman equation still holds and that the value function satisfies the same properties as
before.8 So the value function is the unique fixed point to the following functional equation:
v(k) = max_{0≤c≤Ak^α} log c + βv(Ak^α − c)

for all k ∈ (0, kmax].
A guess for v (that turns out to be right) is v(k) = a + b log k for all k ∈ (0, kmax], where
a and b are constants. To see this, note that under that guess, c(k) for k > 0 is the unique
solution to

1/c = bβ/(Ak^α − c),
8 At k = 0, define v(k) = −∞ and operate in the extended reals.
so that c(k) = Ak^α/(1 + bβ) for all k. Plugging this back into the Bellman equation shows that our
guess is correct if and only if:
guess is correct if and only if:
a + b log k = log[Ak^α/(1 + bβ)] + β[a + b log(bβAk^α/(1 + bβ))]
for all k ∈ (0, kmax]. Some algebra then shows that this holds if and only if

b = α/(1 − αβ)

and

a = log[(1 − αβ)A]/(1 − β) + [αβ/(1 − αβ)] · log[αβA]/(1 − β).
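A quick numerical check of these coefficients (mine, with arbitrary parameter values): plug c(k) = Ak^α/(1 + bβ) into the right-hand side of the Bellman equation and verify that it reproduces a + b log k.

```python
# Verify that v(k) = a + b*log(k), with a and b as derived above, satisfies the
# Bellman equation of the delta = 1, log-utility case at several capital levels.
import math

beta, alpha, A = 0.96, 0.36, 1.2         # arbitrary illustrative parameters
b = alpha / (1 - alpha * beta)
a = (math.log((1 - alpha * beta) * A) / (1 - beta)
     + (alpha * beta / (1 - alpha * beta)) * math.log(alpha * beta * A) / (1 - beta))

def v(k):
    return a + b * math.log(k)

worst = 0.0
for k in (0.1, 0.5, 1.0, 1.3):
    c = A * k ** alpha / (1 + b * beta)  # interior solution of the FOC
    rhs = math.log(c) + beta * v(A * k ** alpha - c)
    worst = max(worst, abs(v(k) - rhs))  # should be zero up to rounding error
```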
4.6 Deterministic dynamics
The tools we have developed in this chapter can be used to study the dynamic evolution of
the state variable(s) in a given control problem. We will illustrate this by providing a proof
of global convergence in the Ramsey model. In chapter 2, we described heuristic tools that
strongly suggest that capital converges to a unique steady state value from any positive value
of the initial stock. We will now prove this formally.
The following draws heavily from pages 133-136 in Stokey, Lucas and Prescott, and I
would recommend that you read that section as well as the entire chapter on deterministic
dynamics. In particular, SLP correctly emphasize that global convergence and well-behaved
dynamics in the Ramsey model are fragile results. Relaxing the one-sector assumption
and/or the neoclassical assumptions on the production and utility functions suffices
to produce a very different outcome.
In any event, the result we wish to show is:
Proposition 13. In the Ramsey model, the equilibrium path of capital converges to a unique
steady state value from any positive initial value of the capital stock.
Proof. As we argued above, the optimal investment policy function h : [0, kmax] 7→ [0, kmax]
satisfies for any k > 0:
βv′(h(k)) = U ′(f(k) + (1 − δ)k − h(k)) (4.6.1)
and v′(k) = U ′(f(k) + (1 − δ)k − h(k))(f ′(k) + (1 − δ)) (4.6.2)
Equation (4.6.1) implies that h is strictly increasing on [0, kmax], and the theorem of the maximum
implies that h is continuous. What's more, the unique steady state value k∗ of the capital
stock is the unique strictly positive solution on [0, kmax] to h(k) = k. Equations (4.6.1) and (4.6.2) then imply that
k∗ is the unique value that satisfies β(f′(k) + 1 − δ) = 1 (an observation we already made
on several occasions in chapter 2).
Now we want to argue that [h(k) < k and k > 0] if and only if k > k∗. This will imply
that the path of capital can be studied on a graph that looks qualitatively identical to the
standard Solow model graph. In particular, we have only one strictly positive steady state
and global convergence to it from any strictly positive initial condition.
To establish the desired result, note that since v is strictly concave, v′(k1) > (=, <) v′(k2) if and
only if k1 < (=, >) k2. In particular, for any k > 0, [v′(k) − v′(h(k))][k − h(k)] ≤ 0, with
equality if and only if h(k) = k, that is, if and only if k = k∗. Together with equations (4.6.1) and (4.6.2) and a bit of
algebra, this yields

(f′(k) + 1 − δ − 1/β)(k − h(k)) ≤ 0

with equality if and only if h(k) = k. Since f′ is strictly decreasing, f′(k) + 1 − δ − 1/β is
negative exactly when k > k∗, so the inequality forces h(k) < k for k > k∗ and h(k) > k for
0 < k < k∗, as needed.
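The proposition can also be illustrated numerically: compute h by value-function iteration and check that capital converges to k∗ from below and from above. A Python sketch with log utility and illustrative parameters A = 1, α = 0.3, β = 0.95, δ = 0.1 (these values are assumptions for the experiment, not part of the proof):

```python
import numpy as np

A, alpha, beta, delta = 1.0, 0.3, 0.95, 0.1
kstar = (alpha * A * beta / (1 - beta * (1 - delta))) ** (1 / (1 - alpha))
kmax = (A / delta) ** (1 / (1 - alpha))      # unique positive solution to delta*k = f(k)

grid = np.linspace(0.05, kmax, 800)
resources = A * grid ** alpha + (1 - delta) * grid           # f(k) + (1 - delta) k
c = resources[:, None] - grid[None, :]                        # implied consumption
util = np.where(c > 0, np.log(np.where(c > 0, c, 1.0)), -np.inf)

v = np.zeros(len(grid))
for _ in range(5000):
    v_new = (util + beta * v[None, :]).max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-9:
        v = v_new
        break
    v = v_new
h = grid[(util + beta * v[None, :]).argmax(axis=1)]           # investment policy h(k)

def simulate(k, T=2000):
    # iterate the discretized policy from an initial stock k
    for _ in range(T):
        k = h[np.argmin(np.abs(grid - k))]
    return k

# capital converges to a grid neighborhood of k* from either side
print(kstar, simulate(0.5), simulate(20.0))
```

The computed h is (weakly) increasing on the grid, and simulated paths from k0 = 0.5 and k0 = 20 both settle near k∗, up to grid resolution.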
4.7 Problems
Problem 1
1. Show that the supnorm is a norm on C(X) where X ⊂ IRn (n ∈ IN).
2. Show that pointwise convergence does not imply uniform convergence.
3. Show that the set of bounded, increasing real functions on a bounded subset X of IRn
(n ∈ IN) equipped with the supnorm is a complete metric space.
4. Show that the set of bounded, strictly increasing real functions on a subset of IR equipped with the supnorm is not a complete metric space.
5. Show that the set of bounded, concave real functions on a subset X of IRn (n ∈ IN) equipped with the supnorm is a complete metric space.
6. Show that the set of bounded, strictly concave real functions on IR+ equipped with the supnorm is not a complete metric space.
7. Let f be a continuous real function on IR+ and for all k ≥ 0 define Γ(k) = [0, f(k)]. Show that Γ is a continuous correspondence. What condition do you need to impose on f to guarantee that Γ has a convex graph? (Prove that your condition is sufficient.)
Problem 2
Consider a Ramsey planner who chooses non-negative sequences {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)

subject to

ct + kt+1 = f(kt) + (1 − δ)kt for all t ≥ 0,

given an initial level k0 of the capital stock. As usual, f is the intensive form of a neoclassical production function, β ∈ [0, 1), δ is in [0, 1], and U is bounded, strictly concave and continuously differentiable on IR++ with lim_{c→0} U′(c) = +∞.
1. Let v be the value function associated with this problem. Write the Bellman equation which v must solve.
2. Show that the mapping which the Bellman equation defines is a contraction mapping.
3. Show that lim_{k→0} v′(k) = +∞.
4. Let kmax be the unique positive solution to δk = f(k) and let c : [0, kmax] → [0, kmax] be the optimal consumption policy function. Show that for any k′ > k > 0,

c(k′) − c(k) ≤ f(k′) + (1 − δ)k′ − [f(k) + (1 − δ)k].

5. Assume that β = 0.95, δ = 0.1, f(k) = 10k^0.33 for all k, and U is the log function. (Ignore as usual the fact that U is not defined at 0.) Use value-function iteration to compute an approximation to v and c on [0, kmax]. Plot both.
6. Let k0 = 1 and plot the optimal path of capital over the first 50 periods. Compare your plot to the plot you obtained in homework 1 for the same set of parameters.
7. Assume now that δ = 1. What are v and c in this case (exactly, don't use the computer yet)? Use value-function iteration to compute an approximation to v and c on [0, kmax]. Plot the first 5 iterates of v and c and compare them to their exact form.
Problem 3 (Cake-eating problem)
Consider a social planner who chooses a non-negative sequence {ct, kt+1}_{t=0}^{+∞} to maximize

∑_{t=0}^{+∞} β^t U(ct)

subject to

ct + kt+1 = kt for all t ≥ 0,

given an initial level k0 of the capital stock. As usual, β ∈ (0, 1) and U is bounded, strictly increasing, strictly concave and continuously differentiable on IR+ with lim_{c→0} U′(c) = +∞.
1. Let v be the value function associated with this problem. Write the Bellman equation which v must solve.
2. Show that the operator which the Bellman equation defines is a contraction mapping on the set of bounded functions on [0, k0].
3. Show that v rises strictly and continuously with k and that v is strictly concave.
4. Let c : [0, k0] → [0, k0] be the optimal consumption policy function. Show that c increases strictly and continuously with k.
5. Show that lim_{k→0} v′(k) = +∞.
6. Show that the capital stock does not converge to a positive steady state value in this environment.
7. Assume that U is the log function (ignore as usual the fact that the log function is not defined at zero), guess that v(k) = a + b log k for all k > 0 where a, b > 0, and verify that your guess is correct.
8. What is the optimal consumption policy function when β = 0?
Chapter 5
Stochastic dynamic programming
Economic decisions are often made under some uncertainty. In the context of dynamic
optimization problems, the reward or the evolution of the state associated with particular
decisions may in part be random or stochastic. For instance, the true shape of the production
function may depend on the state of technology at date t and this state may not be known
with full precision until date t itself.
In many fields (including philosophy), entire books are devoted to defining what words
like “random” mean in a deep sense. For our purposes, we simply want to think about the
situation where an agent’s rewards or opportunities depend in part on the outcome of a
random experiment, an experiment whose outcome cannot be fully determined a priori.
The canonical random experiment is the flip of a coin. We can list all the possible
outcomes of this experiment and assign probabilities to them quite easily. What we need is a
framework that enables us to do this for all random experiments one can think of. For this
we need some notions of probability theory.
The following is a quick introduction to probability theory followed by a section that
extends our deterministic results to the stochastic case. Chapter 9 in SLP is somewhat
misleading in this respect by suggesting that the principle of optimality loses some generality
when one introduces uncertainty. It does not, as the next edition of SLP will explain. The
last sentence of the first paragraph of section 9.1, in particular, is entirely wrong. But this
is a technical detail and chapter 9 gives an excellent treatment of stochastic programming
techniques overall.
5.1 Probability theory
Outcomes of random experiments are draws from a set Ω (sometimes called the universe). For
instance, if the experiment we have in mind is the roll of an ordinary die, Ω = {1, 2, 3, 4, 5, 6}.
An event is a subset of the universe Ω. Sometimes we will want to consider all possible
subsets of Ω but in big spaces this creates problems. In general, we restrict events to be
members of a specific class of subsets called a sigma-algebra.
A sigma-algebra is a subset F of 2^Ω (the set of all subsets of Ω) such that

1. Ω ∈ F;
2. if A ∈ F, the complement A^c ≡ Ω − A of A is also in F;
3. for any countable collection {Ai}_{i∈I} ⊂ F, ⋃_{i∈I} Ai ∈ F.
It is trivial to see that these properties imply that sigma-algebras contain the empty set
and that they are closed under countable intersection. Properties of sigma-algebras guarantee
that whenever we can talk about an event occurring, we can talk about it not occurring as
well. They also enable us to speak of “at least one” of several possible events occurring and
“none of a list of events occurring.” Elements of F are called measurable sets.
A pair (Ω, F) is called a measurable space. A measure on a measurable space (Ω, F) is a
function µ : F → [−∞, +∞] such that

1. µ(∅) = 0;
2. µ(A) ≥ 0 for all A ∈ F;
3. for any countable, disjoint collection {Ai}_{i∈I} ⊂ F, µ(⋃_{i∈I} Ai) = ∑_{i∈I} µ(Ai).
The triplet (Ω,F , µ) is called a measure space. A measure P such that P (Ω) = 1 is called
a probability measure and the measure space (Ω,F , P ) is then called a probability space.
In our die-rolling example, Ω = {1, 2, 3, 4, 5, 6}, and since Ω is finite it is natural to take
F = 2^Ω.1 The event (or measurable set) {2, 4, 6}, for instance, is the event: “The outcome of
the roll is an even number.”
1How many sets does 2^Ω contain? Answering counting questions such as this one is 99% of the art of calculating probabilities.
If we believe the die to be fair, then it is natural to posit that all outcomes are equally
likely. It is natural then to equip (Ω, 2^Ω) with a uniform probability measure defined for all
A ∈ 2^Ω by P(A) = #A/#Ω, where #A is the cardinality of set A, i.e. the number of elements it
contains. For instance, P({2, 4, 6}) = 3/6 = 0.5.
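These counting computations are easy to mechanize. A minimal Python sketch of the uniform measure on a finite universe (the function name `P` is my own shorthand, not standard library notation):

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4, 5, 6})

def P(A):
    """Uniform probability measure: P(A) = #A / #Omega."""
    return Fraction(len(frozenset(A) & omega), len(omega))

print(P({2, 4, 6}))      # 1/2

# the sigma-algebra 2^Omega contains 2^6 = 64 events
events = list(chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1)))
print(len(events))       # 64
```

Exact rational arithmetic (`Fraction`) keeps the counting honest: every probability is a ratio of cardinalities.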
When we work with uncountable spaces such as the real line, it is “difficult” to work with
the set of all subsets of the universe. On the real line one convenient sigma-algebra to use
is the smallest sigma-algebra that contains all intervals (whether open, closed, half-closed,
bounded, unbounded . . . ). That sigma-algebra is called the Borel sigma-algebra and its
members are called Borel sets. In fact, economists often feel that they “have to” use this
particular sigma-algebra, natural as that construction is. This belief often leads to needless
losses of generality, as in the case of SLP's chapter 9. More on that later.
One could devote an entire course to discussing the structure of probability theory. We
don’t have that kind of time, but you should read chapter 7 in SLP as carefully as possible
and run any questions you may have by me.
The next notion we need is that of a random variable. A real-valued random variable X
on a probability space (Ω, F, P) is a function X : Ω → IR such that for any c ∈ IR,
{ω ∈ Ω : X(ω) ≤ c} ∈ F.
In other words, a function is a random variable if for all Borel subsets B of the real line we
can assign a probability to any event of the form X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B}. Obviously,
we could similarly define a random variable from (Ω,F , P ) into any measure space. Random
variables are special cases of measurable functions in the context of probability spaces.
One easily shows that {X^{−1}(B) : B is a Borel set} is a sigma-algebra contained in F.
It has a name: it is the sigma-algebra induced by X. Similarly, X induces a probability
distribution PX on the real line defined by PX(B) = P (X−1(B)) for all Borel sets B.
Here’s an example. Consider a bet that pays one dollar if the roll of a die turns out
to be even and nothing otherwise. Letting X be the payoff associated with the bet, X ∈
{0, 1}, the sigma-algebra induced by X is {∅, {1, 3, 5}, {2, 4, 6}, Ω}, and PX is the
probability distribution that puts probability one half on zero, one half on one, and nothing
elsewhere.
A key class of measurable functions is the set of simple functions. Simple functions on a
measurable space (Ω, F) are linear combinations of finitely many indicator functions. That is,
for ω ∈ Ω,

f(ω) = ∑_{i=1}^{n} ai 1_{Ai}(ω)

where n is an integer, {Ai}_{i=1}^{n} ⊂ F is a finite collection of measurable sets, {ai}_{i=1}^{n} are real
numbers, and for all i, 1_{Ai} is the indicator function that takes value 1 if ω ∈ Ai, 0 otherwise.
These functions are key for the following reasons. A non-negative function is measurable if and only
if it is the pointwise limit of an increasing sequence of simple functions. In fact, every bounded
measurable function is the uniform limit of a sequence of simple functions.
(The proof is in SLP, Theorem 7.5.) Because simple functions are easy to deal with, these
facts are the main ingredient of many (most?) proofs in measure theory.
They are used, for one thing, to define a notion of integration (or expectation) on arbitrary
measure spaces. Let f = ∑_{i=1}^{n} ai 1_{Ai} be a simple function on a measure space (Ω, F, µ). The
integral of f with respect to µ is

∫_Ω f dµ = ∑_{i=1}^{n} ai µ(Ai).
Then, for an arbitrary measurable function f, let f⁺ = max(0, f) be the non-negative
part of f. Letting S be the set of simple functions φ such that 0 ≤ φ ≤ f⁺ (meaning φ(ω) ≤ f⁺(ω)
for all ω ∈ Ω), we can define:

∫_Ω f⁺ dµ = sup_{φ∈S} ∫_Ω φ dµ.
One similarly defines the negative part f⁻ of f by f⁻ = max(0, −f) and its integral
exactly as above. A measurable function f is called integrable with respect to µ if both its
negative and positive parts have finite integrals. Then,

∫_Ω f dµ = ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ.
When Ω = IR (or some other real interval), when F is the set of Borel sets on IR, and
when µ is the only measure on F such that µ([a, b]) = b − a for any interval [a, b], this notion
of integration coincides with the standard Riemann integral wherever the latter is defined. It is more
generally referred to as the Lebesgue integral.
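To make the construction concrete, here is a small Python sketch (exact rational arithmetic) of the increasing simple-function approximation for f(x) = x on [0, 1] with Lebesgue measure. The dyadic level sets used below are the standard textbook choice, an assumption of this illustration; the simple integrals increase toward the familiar answer ∫₀¹ x dx = 1/2:

```python
from fractions import Fraction

def simple_integral(n):
    # phi_n takes the value k / 2^n on the level set [k/2^n, (k+1)/2^n),
    # k = 0, ..., 2^n - 1, each of Lebesgue measure 2^-n, so its integral is
    # sum over k of (k / 2^n) * mu(level set)
    return sum(Fraction(k, 2**n) * Fraction(1, 2**n) for k in range(2**n))

vals = [simple_integral(n) for n in range(1, 10)]
print([float(x) for x in vals[:3]])   # increasing: 0.25, 0.375, 0.4375
print(float(vals[-1]))                # close to 0.5
```

The sequence of simple integrals is monotonically increasing and its supremum is the Lebesgue (here also Riemann) integral 1/2.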
More importantly for our purposes, when µ is a probability measure, it coincides with
the standard expectation operator. Considering once again our roll-of-the-die example and
letting X be the random variable associated with a bet that pays one dollar if the roll of a
die turns out to be even and nothing otherwise,

∫ X(ω)dµ = E(X) = ∑_{i=1}^{6} (1/6) X(ωi) = 0.5.
In the case of a real random variable X with a Riemann-integrable density function f, the
corresponding probability measure on IR is µ([a, b]) = ∫_a^b f(x)dx for all intervals [a, b], and

∫ X dµ = E(X) = ∫_IR x f(x)dx.
The bottom line is that the broad notion of integral we have introduced encompasses all
familiar cases.
Sometimes it is useful to combine different random experiments, and hence different probability
spaces. Given two measurable spaces (X, F) and (Y, G), the product space (X × Y, F × G)
is the measurable space one obtains when F × G is the smallest sigma-algebra that contains all
sets of the form A × B where A ∈ F and B ∈ G.
Finally, we will need the notion of conditional expectation. Let (Ω, F, P) be a probability
space and let A be an element of F such that P(A) > 0. Then we can define the probability of
event B in F conditional on A as usual as P(B|A) ≡ P(B ∩ A)/P(A).
For instance, what is the probability that the outcome of the die roll is 2 given that the
outcome is even? It is

P({2}|{2, 4, 6}) = P({2} ∩ {2, 4, 6})/P({2, 4, 6}) = P({2})/P({2, 4, 6}) = 1/3.
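Continuing the die example in code (a sketch; `P` and `cond` are my own helper names, not standard ones):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(A):
    # uniform measure on the finite universe
    return Fraction(len(A & omega), len(omega))

def cond(B, A):
    # P(B | A) = P(B intersect A) / P(A), defined whenever P(A) > 0
    return P(B & A) / P(A)

print(cond({2}, {2, 4, 6}))   # 1/3
```

Note that `cond(., A)` is itself a probability measure: it assigns mass 1 to A and renormalizes within it.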
It is easy to show that P(•|A) is a probability measure on F. We can then define
the expectation of a random variable X conditional on A as E(X|A) = ∫ X dP(•|A). In
general, we need to extend this notion to sets of measure zero. For instance, given a pair
(X, Y) of real random variables with a continuous joint density f and continuous, everywhere
positive marginal densities fX and fY, we would like a notion of conditional expectation
that coincides with the standard notion. This can be done in great generality, as discussed
in SLP in section 7.7.
5.2 Transition functions
In stochastic problems, the state often depends on the value of a random variable. Further-
more, future values of these shocks may depend on past values. In all, the evolution of the
agent’s opportunities and rewards depends on a sequence of random variables. Indexed sets
of random variables (of which sequences are a special case) are called stochastic processes.
The study of stochastic processes is an important branch of probability theory. Here, we
only need to be able to talk about the transition of a process from one state to another (but
you should read chapter 8 in SLP in its entirety.)
Let (Z, Z) be a measurable space. A transition function is a function Q : Z × Z → [0, 1]
such that for all z ∈ Z, Q(z, •) is a probability measure on (Z, Z) and, for each A ∈ Z,
Q(•, A) is a measurable function.
In words, Q(z, •) is the distribution of next period’s shock given this period’s value and
Q(•, A) gives the likelihood of a particular event as a function of today’s shock. All told,
Q(z, A) is the probability of landing in set A next period given current state z. Since Q(z, •) is
a well-defined probability measure, we can take expectations with respect to it and write, for all Z-measurable
and Q-integrable functions f,

(Tf)(z) = ∫ f(z′) Q(z, dz′).

Henceforth we restrict our attention to bounded, measurable functions, which implies integrability.
T then defines an operator from the set of bounded Z-measurable functions to itself.
Note that Q also induces a mapping from the set of probability measures on (Z, Z)
to itself. Given a distribution λ of shocks today, next period's distribution is
given for each A ∈ Z by

(T∗λ)(A) = ∫ Q(z, A) λ(dz).
That the mappings T and T∗ are well defined and preserve Z-measurability of course requires
proof; the proofs are in SLP's chapter 8. One can also show that the two mappings are
intimately related (they are dual notions): for any bounded Z-measurable function f
and any probability measure λ on (Z, Z), one shows with a bit of work that

∫ (Tf)(z) λ(dz) = ∫ f(z′) (T∗λ)(dz′).

In words, it does not matter whether we apply the transition operator to f or to λ first; we arrive
at the same expected value next period.
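On a finite state space this duality is just matrix algebra: Q is a stochastic matrix, Tf = Qf, T∗λ = λQ, and the two integrals are λ(Qf) = (λQ)f. A quick numerical check (the specific matrix and vectors below are randomly generated, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.random((n, n))
Q /= Q.sum(axis=1, keepdims=True)     # rows of Q are the measures Q(z, .)
f = rng.random(n)                     # a bounded measurable function
lam = rng.random(n)
lam /= lam.sum()                      # a probability measure over today's states

lhs = lam @ (Q @ f)                   # integral of Tf against lambda
rhs = (lam @ Q) @ f                   # integral of f against T* lambda
print(lhs - rhs)                      # zero up to floating-point error
```

The identity holds exactly (it is associativity of matrix multiplication), which is why the finite case is a useful sanity check for the general duality.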
We say that Q satisfies the Feller property if Tf is continuous whenever f is, and that it
is monotonic if Tf is increasing whenever f is. An example is probably useful at this stage.
5.3 Markov chains
(Reading the first section of chapter 11 will be useful for what follows.)
Assume that the state space S for the stochastic shock is a finite set {s1, s2, . . . , sn}. The
natural sigma-algebra for such a set is the set of all subsets of S. Transition functions simply
assign a probability Πij of moving from state si to state sj, and distributions of shocks are
1 × n vectors p such that ∑_i pi = 1.
The resulting process is called a Markov chain. The transition function is fully summarized
by an n × n matrix Π. If the current shock is si, the distribution of shocks in the next
period is (Πij : j = 1, . . . , n). Generally, the process maps a distribution p of shocks into the
distribution pΠ one period ahead and, recursively, pΠⁿ after n periods.
A question that one often asks in economics is whether pΠn converges to some distribution
as n grows large. Obviously, if pΠn converges to p∗ (in IRn), it must be the case that p∗Π = p∗
(by continuity).
A distribution like p∗ is called an invariant distribution. All Markov chains have at least
one. How many it has depends on how many ergodic sets the chain has. A set E ⊂ S is
called ergodic if from all s ∈ E, the probability P(s, E) of remaining in E is one and if no
proper subset of E has this property.
A set E is called transient if there is a positive probability of leaving it and never returning
to it.
Some work shows that the state space of a Markov chain can be partitioned into ergodic
sets and one transient set, and that invariant distributions of Markov chains are the convex
combinations of at most n distributions (that can be computed quite easily via matrix
multiplication, see Theorem 11.1 in SLP).
A necessary and sufficient condition for a Markov chain to have a unique invariant distribution
is that it have exactly one ergodic set. This is the case if and only if there exists a state
that is eventually visited with positive probability from any state. Finally, the sequence
pΠⁿ converges to this invariant distribution from any initial distribution p under certain
conditions (see Theorem 11.4), which hold for instance if Π has only strictly positive entries.
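These facts are easy to check numerically for a small chain. A sketch (the transition matrix below is an arbitrary example with strictly positive entries, so it has a unique invariant distribution and pΠⁿ converges to it):

```python
import numpy as np

Pi = np.array([[0.9, 0.1],
               [0.4, 0.6]])              # strictly positive entries

# an invariant distribution solves p* Pi = p*: a left eigenvector of Pi
# for eigenvalue 1, normalized to sum to one
eigvals, eigvecs = np.linalg.eig(Pi.T)
p_star = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
p_star = p_star / p_star.sum()
print(p_star)                            # approximately [0.8, 0.2] for this matrix

# p Pi^n converges to p* from any initial distribution
p = np.array([1.0, 0.0])
for _ in range(200):
    p = p @ Pi
print(np.abs(p - p_star).max())          # essentially zero
```

Solving pΠ = p directly for this matrix gives p∗ = (0.8, 0.2), and the second eigenvalue (0.5 here) governs the speed of convergence of pΠⁿ.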
5.4 Stochastic control problems
Now we have all the tools we need to extend our deterministic dynamic programming results
to the stochastic case.
Consider an agent who solves a control problem in the presence of stochastic shocks
whose evolution is described by a transition function Q on a measurable space (Z, Z). The other
state variables, as before, take values in a set X ⊂ IRn, and actions come from a set Y ⊂ IRm,
where n and m are integers.
The choice set Γ is now a correspondence from X ×Z to Y and the reward function R is
defined on X × Y × Z. The transition function for the endogenous state is now a function
g from X × Y × Z into X. There is, as before, a discount factor.
At date t the set of all possible shock histories is Z^t and we can equip this set with the
product sigma-algebra Z^t. We denote a particular element of Z^t by z^t and the sth element
of z^t by z^t_s. It is easy to see (and formally shown in section 8.2 of SLP) that the transition
function induces a unique probability distribution µ^t(z0, •) over date-t histories that depends
on the initial state.
At date 0 the agent may choose any action a0 ∈ Γ(x0, z0). From then on, the agent must
make her plan contingent on future realizations of the stochastic shock. A plan, then, is a
sequence {πt} of Z^t-measurable functions into the action set Y. A plan {πt} is feasible if for all t,
πt is Z^t-measurable, and:

1. π0 ∈ Γ(x0, z0);
2. πt(z^t) ∈ Γ(xt, z^t_t) for all z^t, where xt = g(xt−1, πt−1(z^{t−1}), z^{t−1}_{t−1}) is the endogenous state implied by history z^{t−1}.
Obviously, a necessary condition to impose before proceeding is that a feasible plan exists.
In turn, this requires first that Γ have a measurable selection. But the key thing to recognize
here is that we have yet to say exactly what we mean by “measurability,” i.e. to choose a
specific sigma-algebra Z. If we insist on Borel measurability, we will run into trouble: the
existence of a Borel-measurable selection does not guarantee that the value function is in
turn Borel measurable (hence integrable), and there is no guarantee that a Borel-measurable
policy function exists.
But these are superficial problems. There is no reason (at this level of generality) why
one should insist on Borel measurability. The analogue in the existence problem we studied
in chapter 1 would be to insist on a particular topology and to give up if that particular
topology does not work.
In any event, the bottom line is that a notion of measurability that makes the stochastic
principle of optimality as general as the deterministic one exists. It is called universal
measurability and is defined for instance in Bertsekas and Shreve (1978). Importantly, one
can start, as is typical, with Borel-measurable objects. A Borel transition function has a
unique universally measurable extension, and Borel measurability of the other objects implies
universal measurability. Assuming for instance that Γ has a Borel selection implies a fortiori
that it has a universally measurable one. The value function is then always universally
measurable, and a universally measurable Markov policy exists. In a word (or two), there is no
problem.
5.5 The stochastic principle of optimality
Let Π(x0, z0) be the set of feasible plans. The value function associated with the stochastic
control problem is

v∗(x0, z0) = sup_{π∈Π(x0,z0)} ∑_{t=0}^{+∞} ∫_{Z^t} β^t R(x(z^t), πt(z^t), z^t_t) µ^t(dz^t)

where x(z0) = x0 and, for all t > 0 and all possible histories z^t, x(z^t) = g(x(z^{t−1}), πt−1(z^{t−1}), z^{t−1}_{t−1}).
This seems messy, but the bottom line is that under very general conditions v∗ solves
the following functional equation:

v(x, z) = sup_{a∈Γ(x,z)} R(x, a, z) + β ∫ v(g(x, a, z), z′) Q(z, dz′)   for all z ∈ Z and x ∈ X.
The operator associated with the expression above defines a contraction mapping T from
bounded, universally measurable functions to bounded, universally measurable functions.
Furthermore (assuming as before that R is bounded), v∗ is the only bounded fixed point
of T. As before, we can ensure that v∗ is continuous, strictly increasing, strictly concave and
differentiable in the endogenous state by assuming that R, Γ and Q satisfy certain properties.
This is developed in section 9.2 of SLP.
Also, we can use π(x, z) = arg max_{a∈Γ(x,z)} { R(x, a, z) + β ∫ v(g(x, a, z), z′) Q(z, dz′) } to build
an optimal Markov plan for any initial conditions. The plan is Markov in that the action
it specifies at a particular date depends only on the current value of the state (x, z), not
on the date. And everything can be computed via value-function iteration, just
like in the deterministic case.
Let’s now illustrate all this with an example.
5.6 The stochastic Ramsey problem
Consider a variation of the Ramsey problem we have studied many times in this course
in which the only thing that changes is the production function. At a particular date, if
the economy has capital stock k > 0, output is zf(k), where f satisfies the same
assumptions as always and z ∈ [zL, zH ] where zL > 0 and zH < ∞. Furthermore, z follows
a Markov process with a stationary transition function Q that is monotonic, satisfies the
Feller property and induces a unique, globally ergodic invariant distribution. The Bellman
equation associated with this problem is, for all (k, z) ∈ IR+ × [zL, zH]:

v(k, z) = max_{0≤c≤zf(k)} U(c) + βE[v(zf(k) + (1 − δ)k − c, z′) | z].
This problem is studied in great detail by Brock and Mirman (1972) when productivity
shocks are independent across periods (a trivial sort of Markov process), and by Mehra and
Donaldson (1983) when shocks are correlated. In both cases, it is easy to show that both
investment and consumption rise with output. They are independent of the current shock in
the independent case, but depend on the shock even at equal output in the correlated case
since the current value of the shock affects expectations of future shocks. How they depend
on the shock depends on the shape of preferences, the degree of risk aversion in particular.
One big difference between the stochastic case and the certainty case is that the steady
state depends on the reward function in the first case but not in the second. Among other
things, more risk-averse agents accumulate more capital when faced with uninsurable uncertainty:
they save more for precautionary reasons, as emphasized by Aiyagari (1994).
In both the correlated and the independent case (under the assumptions we have imposed
on Q), the distribution of capital converges to an invariant distribution that does not depend
on initial conditions. To understand what this means, assume that we start from any initial
level of capital k0. The value of the capital stock at date t > 0 can be computed given any
sequence of shocks and the optimal consumption policy. Because shocks are uncertain, kt is a
random variable with distribution Ft. Saying that the distribution of capital converges to an
invariant distribution is saying that Ft converges (uniformly, see Brock and Mirman, 1972)
to some invariant distribution F. In particular, if we draw a long sequence of shocks and
the corresponding series {kt} of capital stocks and drop the first half (say) of the sequence,
we have a sample of capital stocks that is approximately drawn from F, and we can approximate
the distribution's moments (mean, variance, . . . ) using this sample.
Another way to think about this is to assume that the economy is populated by a con-
tinuum of households who operate the same production function as above but face different
shocks from one another. Because there is a continuum of households, we can assume that
Π gives the fraction of households who experience each of the possible transitions in each
period. Because households face different (idiosyncratic) shocks, their stocks of capital differ.
The results of Brock and Mirman imply that in such an environment, the distribution of
capital across households converges to F. (See Aiyagari, 1994, for more on this.)
Consider finally the case where z is binary, i.e. drawn from {zL, zH}, and follows a Markov
chain with stationary transition matrix

Π = ( pLL      1 − pLL )
    ( 1 − pHH  pHH     )

where 0 < pLL < 1 − pHH < 1. Since z can take two values, this is a system of two functional
equations:
v(k, zL) = max_{0≤c≤zLf(k)} U(c) + β[pLL v(zLf(k) + (1 − δ)k − c, zL) + (1 − pLL) v(zLf(k) + (1 − δ)k − c, zH)]

v(k, zH) = max_{0≤c≤zHf(k)} U(c) + β[(1 − pHH) v(zHf(k) + (1 − δ)k − c, zL) + pHH v(zHf(k) + (1 − δ)k − c, zH)]
This can be taken to the computer just as easily as before. Letting kmax be the unique
solution to zHf(k) = δk, the natural choices for X and Y are [0, kmax]. Then we can start
with v(•, z) ≡ 0 on [0, kmax] for z ∈ {zL, zH}. Using value-function iteration just like before
(see homework 4) we can compute (approximations to) the optimal policy function c(k, z)
and the value function v(k, z).
We can even simulate the economy's stationary distribution by drawing from it (see
homework 4) and ask, for instance, about the effects of risk on aggregate savings. Fun stuff.
5.7 Problems
Problem 1
1. Under what conditions do finite-state Markov chains induce a transition function that satisfies the Feller property?
2. Under what conditions do finite-state Markov chains induce a transition function that is monotonic?
3. Consider a two-state Markov chain with transition matrix

Π = ( 0.8  0.2 )
    ( 0.2  0.8 )

Compute the chain's unique invariant distribution. Does the chain converge there from any initial distribution?
4. Consider a two-state Markov chain with transition matrix

Π = ( 0    1   )
    ( 0.5  0.5 )

Show that the chain has a unique invariant distribution and show that it converges there from any initial state.
5. Consider a two-state Markov chain with transition matrix

Π = ( 1  0 )
    ( 0  1 )

How many invariant distributions does this chain have?
Problem 2
Consider the stochastic Ramsey problem described in section 5.6 of the notes. Assume that the stochastic shock follows a Markov chain with two states {zL, zH} and with stationary transition matrix

Π = ( pLL      1 − pLL )
    ( 1 − pHH  pHH     )

where 0 < pLL < 1 − pHH < 1.
1. Instead of (k, z), define (y, z) to be the state of the system, where y is current output. Write the corresponding Bellman equation.
2. Show that the operator which the Bellman equation defines on IR+ × {zL, zH} is a contraction mapping.
3. Assume that zL = 9 and zH = 11, and that

Π = ( 0.8  0.2 )
    ( 0.2  0.8 )

Let kmax be the unique solution to δk = zHf(k) and let c : [0, kmax] × {zL, zH} → [0, kmax] be the optimal consumption policy function. Assume that β = 0.95, δ = 0.1, f(k) = 10k^0.33 for all k, and U is the log function. (Ignore as usual the fact that U is not defined at 0.) Use value-function iteration to compute an approximation to v and c on [0, kmax] × {zL, zH}. Plot v(•, zL), v(•, zH), c(•, zL), c(•, zH).
4. As we have argued in class, this economy converges to a steady state distribution of capital. We can simulate this steady state distribution using a process called Markov Chain Monte Carlo. Specifically, begin with k0 = 1 and z0 = 9. The policy function you computed above gives you k1. Draw z1 using the chain's transition matrix and the computer's random number generator. Proceed and generate 20,000 draws for k. Drop the first 10,000 (the assumption here is that after 10,000 periods we are effectively drawing from the invariant distribution). Plot the histogram associated with the remaining 10,000 draws.
5. Calculate the average value of the capital stock. How does this compare to the steady state value of capital when zL = zH = 10? (Is it higher or lower?) Suggest an explanation for this finding.
Chapter 6
Bibliography
Ayagari, R. (1994) “Uninsured Idiosyncratic Risk and Aggregate Saving,” Quarterly Jour-nal of Economics, 109: 659-84.
Balasko, Y. and Shell, K., (1980) “The Overlapping-Generations Model I: The Case of Pure Exchange without Money,” Journal of Economic Theory, 23, 281-306.
Barro, R. (1974) “Are Government Bonds Net Wealth?” Journal of Political Economy, 82, 1095-1117.
Brock, W. A. and Mirman, L. J., (1972) “Optimal Economic Growth and Uncertainty: The Discounted Case,” Journal of Economic Theory, 4, 479-513.
Cass, D. (1972) “On Capital Overaccumulation in the Aggregative, Neoclassical Model of Economic Growth: A Complete Characterization,” Journal of Economic Theory, 4, 200-23.
Diamond, P. (1965) “National Debt in a Neoclassical Growth Model,” American Economic Review, 55, 1026-50.
Donaldson, J. B., and Mehra, R., (1983) “Stochastic Growth with Correlated Production Shocks,” Journal of Economic Theory, 29, 282-312.
Hildenbrand, W. and Kirman, A. P., (1988) “Equilibrium Analysis: Variations on Themes by Edgeworth and Walras,” North-Holland.
Hopenhayn, H. (1992) “Entry, Exit, and Firm Dynamics in Long Run Equilibrium,” Econometrica, Vol. 60, No. 5, 1127-1150.
Jones, L. E., and Manuelli, R. E., (1990) “A Convex Model of Equilibrium Growth: Theory and Policy Implications,” Journal of Political Economy, 98, 1008-1038.
Jones, L. E., and Manuelli, R. E., (2005) “Neoclassical Models of Endogenous Growth: The Effects of Fiscal Policy, Innovation and Fluctuations,” in: Philippe Aghion and Steven Durlauf (eds.), Handbook of Economic Growth, edition 1, volume 1, chapter 1.
Kehoe, T. J. (1989) “Intertemporal General Equilibrium Models,” in Frank H. Hahn, editor, The Economics of Missing Markets, Information, and Games, Oxford University Press, 363-93.
Lucas, R. and Prescott, E. (1971) “Investment under Uncertainty,” Econometrica, Vol. 39, No. 5, 659-681.
McGrattan, E. R. (1998) “A Defense of AK Growth Models,” Federal Reserve Bank of Minneapolis Quarterly Review, 22, 13-27.
Michel, P. (1990) “Some Clarifications on the Transversality Condition,” Econometrica, Vol. 58, No. 3, 705-723.
Negishi, T. (1960) “Welfare Economics and Existence of an Equilibrium for a Competitive Economy,” Metroeconomica, 12, 92-7.
Ramsey, F. (1928) “A Mathematical Theory of Saving,” Economic Journal, 38, 543-59.
Rebelo, S. (1991) “Long-Run Policy Analysis and Long-Run Growth,” Journal of Political Economy, 99, 500-521.
Santos, M. S. (1991) “Smoothness of the Policy Function in Discrete Time Economic Models,” Econometrica, Vol. 59, No. 5, 1365-1382.
Solow, R. (1956) “A Contribution to the Theory of Economic Growth,” Quarterly Journal of Economics, 70, 64-94.