Summary of the papers on “Increasing risk”

by Rothschild and Stiglitz

Seminar work

Pavol Majher

1169390

Economic Literature Seminar

Lecturer: prof. Manfred Nermuth

November 2011

1 Introduction

The comparison of the variability (or “riskiness”) of different random variables has long been of particular interest to economists. Over the past decades, several approaches to this problem have been developed. Among the most influential works in this field is the series of articles by Rothschild and Stiglitz (1970, 1971, 1972), which has become widely recognized over time.

The aim of this seminar work is to give an overview of the theory and results presented in these papers. Our structure follows the layout of the original articles. We start by introducing four concepts for comparing variability and provide a deeper theoretical background for one of them. We then define three partial orderings that correspond to these different approaches to risk comparison. As the main result of Rothschild and Stiglitz (1970), we present the proof of their mutual equivalence. We also add several remarks on the comparison with mean-variance analysis and on the references mentioned in Rothschild and Stiglitz (1972).

In the second part of our work, we present the second paper, Rothschild and Stiglitz (1971), which focuses on economic applications of the derived framework. We review examples such as savings under uncertainty, a portfolio problem and a firm’s production problem. In each case, the focus is on the impact that a higher degree of uncertainty has on the decision-making process.

2 Theoretical Background

2.1 Different Concepts of Risk Comparison

We start with an informal introduction of different approaches to risk comparison, which are formalized later in section 3.1. To establish criteria for deciding when a random variable Y is more “variable” than another random variable X, Rothschild and Stiglitz (1970) list four possible answers:

1. Y is equal to X plus noise

It is reasonable that a random variable created from the original one by adding some uncorrelated noise should be riskier than the original. To illustrate this concept, let X be a lottery ticket that pays $a_i$ with probability $p_i$ (where $\sum_i p_i = 1$). Then Y can be viewed as a lottery ticket that pays $b_i$ with probability $p_i$, where $b_i$ is either $a_i$ or a lottery ticket with expected value $a_i$.

2. Every risk averter prefers X to Y

A risk averter is defined as an individual with a concave utility function. Thus X can be viewed as less risky than Y if EU(X) ≥ EU(Y) for every concave utility function U (given that X and Y have the same mean).

3. X has less weight in the tails than Y

For random variables X and Y with density functions f and g, it seems natural to regard X as less variable if g is obtained from f by shifting some probability weight from the center towards the tails in such a way that the mean remains the same.

4. Y has a greater variance than X

Comparing the variances of two random variables is a commonly used tool for comparing their riskiness.

As shown later (in particular in section 3.2), the first three concepts are mutually equivalent definitions of greater riskiness, while the last one provides a quite different approach. More on this difference is presented in section 3.3.

Before moving forward, we introduce the notation used in the original paper. From now on, X and Y denote random variables with cumulative distribution functions (cdf’s) F and G and (in case they exist) densities f and g. The original paper restricts its results to cdf’s whose points of increase lie in a bounded interval, conveniently represented by the interval [0, 1]. As the authors mention, extending the results to cdf’s defined on the whole real line would require solving several rather difficult convergence problems, which are moreover of little economic interest. In addition, it has been shown (e.g. in Mayer (1966) or Strassen (1965)) that only part of the results carries over to more general domains (for more detail see section 3.4).

2.2 Mean Preserving Spread

Most of the presented concepts can be formalized quite intuitively; the exception is the third approach, which compares the weight in the tails. This part is therefore devoted to a geometrically motivated definition of this approach to risk comparison that is sufficiently general and analytically convenient.

We start with the definition of mean preserving spreads for continuous as well as discrete random variables.

Definition 1. Mean Preserving Spreads: Densities

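Let’s denote as mean preserving spread (MPS) a step function s(x) defined in the following way:

$$
s(x) = \begin{cases}
\alpha \ge 0 & \text{for } a < x < a+t, \\
-\alpha \le 0 & \text{for } a+d < x < a+d+t, \\
-\beta \le 0 & \text{for } b < x < b+t, \\
\beta \ge 0 & \text{for } b+e < x < b+e+t, \\
0 & \text{otherwise,}
\end{cases}
$$

where

$$0 \le a \le a+t \le a+d \le a+d+t \le b \le b+t \le b+e \le b+e+t \le 1$$

and

$$\beta e = \alpha d.$$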

Note that an MPS is constructed such that

$$\int_0^1 s(x)\,dx = 0 \quad\text{and}\quad \int_0^1 x\,s(x)\,dx = 0.$$

Therefore, if we construct a function g = f + s from a density function f such that g(x) ≥ 0 for all x, then g is also a density function with the same mean as f. Furthermore, we say that a density g differs from a density f by a single MPS if the difference g − f is an MPS.
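The two identities satisfied by an MPS can be verified directly from definition 1. The following short computation is our own worked check (it is not reproduced from the original paper); it only uses the fact that all four blocks of s have the same width t:

```latex
\begin{aligned}
\int_0^1 s(x)\,dx &= \alpha t - \alpha t - \beta t + \beta t = 0,\\
\int_0^1 x\,s(x)\,dx
 &= \alpha\Big(\int_a^{a+t} x\,dx - \int_{a+d}^{a+d+t} x\,dx\Big)
  + \beta\Big(\int_{b+e}^{b+e+t} x\,dx - \int_b^{b+t} x\,dx\Big)\\
 &= -\alpha\,d\,t + \beta\,e\,t = t\,(\beta e - \alpha d) = 0 .
\end{aligned}
```

The last step is exactly where the requirement βe = αd enters: the weight moved to the left over the distance d must balance the weight moved to the right over the distance e.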

We can formalize a similar concept for discrete random variables.

Definition 2. Mean Preserving Spreads: Discrete Distributions

Let’s have the discrete r.v.’s X and Y described in the following way:

$$\Pr(X = a_i) = f_i \quad\text{and}\quad \Pr(Y = a_i) = g_i,$$

where $(a_i)$ is an increasing sequence of real numbers between 0 and 1 and $\sum_i f_i = \sum_i g_i = 1$. Moreover, let $f_i = g_i$ for all $i$ except four, say $i_1 < i_2 < i_3 < i_4$. Then we say that Y differs from X by a single MPS if (denoting $a_k = a_{i_k}$, $f_k = f_{i_k}$ and $g_k = g_{i_k}$)

$$g_1 - f_1 = f_2 - g_2 \ge 0, \qquad f_3 - g_3 = g_4 - f_4 \ge 0 \qquad\text{and}\qquad \sum_{k=1}^{4} a_k (g_k - f_k) = 0.$$
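The discrete definition can be checked mechanically. The following is a minimal sketch, assuming both distributions are given as lists of exact fractions on a common sorted grid; the function name `is_single_mps` and the example numbers are our own illustration and do not come from the original paper. For simplicity the sketch requires exactly four points with different weights.

```python
from fractions import Fraction as Fr

def is_single_mps(a, f, g):
    """Return True if g differs from f by a single MPS on the points a (definition 2)."""
    # Indices at which the two probability vectors differ.
    diff = [i for i in range(len(a)) if f[i] != g[i]]
    if len(diff) != 4:
        return False
    i1, i2, i3, i4 = diff            # increasing, because a is sorted
    up_left = g[i1] - f[i1]          # weight gained at the outer left point
    up_right = g[i4] - f[i4]         # weight gained at the outer right point
    moves_ok = (up_left == f[i2] - g[i2] and up_left >= 0
                and up_right == f[i3] - g[i3] and up_right >= 0)
    # The spread must leave the mean unchanged.
    mean_ok = sum(a[i] * (g[i] - f[i]) for i in diff) == 0
    return moves_ok and mean_ok

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 4)] * 4
g = [Fr(3, 8), Fr(1, 8), Fr(1, 8), Fr(3, 8)]        # weight 1/8 pushed outwards on both sides
g_bad = [Fr(3, 8), Fr(1, 8), Fr(3, 16), Fr(5, 16)]  # outward shifts that change the mean

print(is_single_mps(a, f, g))      # True
print(is_single_mps(a, f, g_bad))  # False
```

Exact fractions are used so that the equalities in the definition can be tested without rounding tolerance.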

2.3 The Integral Conditions

Now we use the notion of an MPS to introduce the integral conditions, which formalize the approach of comparing the weight in the tails of distributions. Consider two densities f and g that differ by a single MPS s as in definition 1. The difference S = G − F of the corresponding cdf’s can then be expressed as the integral

$$S(x) = \int_0^x s(u)\,du.$$

It’s easy to see that S(0) = S(1) = 0. Moreover, definition 1 yields

$$\exists z \in [0,1]: \quad S(x) \ge 0 \ \text{ if } x \le z, \qquad S(x) \le 0 \ \text{ if } x > z. \tag{1}$$

Finally, let’s denote $T(y) = \int_0^y S(x)\,dx$. Integration by parts gives

$$T(1) = \int_0^1 S(x)\,dx = \big[x S(x)\big]_0^1 - \int_0^1 x\,s(x)\,dx = 0 \tag{2}$$

and consequently, using (1) and (2),

$$T(y) \ge 0, \qquad y \in [0,1). \tag{3}$$

The conditions (2) and (3) are from now on referred to as the integral conditions. Note

that along with (1) they also hold for S = G−F , where G and F are discrete distributions

differing by a single MPS.
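For discrete distributions the integral conditions can be evaluated exactly, because S = G − F is a step function and T is therefore piecewise linear. The sketch below is our own illustration under that assumption; the helper names `integral_T` and `satisfies_integral_conditions` are hypothetical and not taken from the original paper.

```python
from fractions import Fraction as Fr

def integral_T(a, f, g):
    """Values of T(y) = integral_0^y (G - F) at the breakpoints a[1], ..., a[-1], 1."""
    points = list(a) + [Fr(1)]
    S = Fr(0)   # value of S = G - F on the interval starting at a[i]
    T = Fr(0)   # running integral of S
    values = []
    for i in range(len(a)):
        S += g[i] - f[i]                      # S jumps at a[i]
        T += S * (points[i + 1] - points[i])  # S is constant until the next point
        values.append(T)
    return values

def satisfies_integral_conditions(a, f, g):
    values = integral_T(a, f, g)
    # T is piecewise linear and starts at 0, so T >= 0 on [0, 1) holds
    # iff it holds at every breakpoint; the last value is T(1).
    return values[-1] == 0 and all(v >= 0 for v in values)

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 4)] * 4
g = [Fr(3, 8), Fr(1, 8), Fr(1, 8), Fr(3, 8)]   # g differs from f by a single MPS

print(satisfies_integral_conditions(a, f, g))  # True
print(satisfies_integral_conditions(a, g, f))  # False: reversing the roles violates (3)
```

In this example T rises to 1/32 on the left part of the interval and returns to 0 at y = 1, which is the pattern described by (1), (2) and (3).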

In order to use the concept of an MPS as a foundation for a definition of greater variability, we have to address transitivity, that is, we have to ask whether G could have been obtained from F by a sequence of MPS’s, where F and G denote the cdf’s of the compared random variables X and Y. Using two theoretical statements, we show that a convenient criterion for this comparison is contained in the integral conditions (2) and (3).

First, we state that if G is obtained from F by a sequence of MPS’s, then G − F satisfies (2) and (3). The proof is omitted, as it is straightforward.

Theorem 1. Let’s assume that

(a) there is a sequence of cdf’s $F_n$ converging (weakly) to G ($F_n \to G$),

(b) $F_n$ differs from $F_{n-1}$ by a single MPS denoted $S_n$ (i.e. $F_n = F_{n-1} + S_n = F_0 + \sum_{i=1}^{n} S_i$, with $F_0 = F$).

Then $G = F + \sum_{i=1}^{\infty} S_i = F + S$ and S satisfies the integral conditions (2) and (3).

Now we provide the non-trivial result (in a sense the converse of Theorem 1) that if G − F satisfies the integral conditions, then G can be approximated by F and a sequence of MPS’s.

Theorem 2. Let’s assume that G − F satisfies the integral conditions (2) and (3). Then there exist sequences $F_n$ and $G_n$ such that $F_n \to F$, $G_n \to G$ and, for each n, $G_n$ could have been obtained from $F_n$ by a finite number of MPS’s.

This theorem follows from two partial results: the first lemma proves it for step functions with a finite number of steps, and the second one concerns the approximation of arbitrary cdf’s F and G to any desired degree by step functions that moreover satisfy the integral conditions. For lack of space, the explicit proofs are not provided in this work; nevertheless, the basic idea of each of them is presented. For the complete proofs see the original paper (Rothschild and Stiglitz, 1970, p. 232).

Lemma 1. Assume that cdf’s F and G have a finite number of increase points and

moreover S = G− F satisfies the integral conditions (2) and (3).

Then there exists a sequence of cdf’s F0, . . . , Fn, where Fi differs from Fi−1 by a single

MPS for all i = 1, . . . , n, such that F0 = F and Fn = G.

In the proof, we “decompose” the function S step by step into individual MPS’s, which are in turn used to gradually construct the sequence F1, . . . , Fn from the function F. As S is by assumption a step function with a finite number of steps, this process terminates after a finite number of iterations.

Before presenting the next lemma, note that we measure the distance between two arbitrary functions $f_1$ and $f_2$ by the following metric:

$$\|f_1 - f_2\| = \int_0^1 |f_1(x) - f_2(x)|\,dx.$$

Lemma 2. Denote $T(y) = \int_0^y (G(x) - F(x))\,dx$ for cdf’s F and G. Furthermore, assume that T(y) fulfills (2) and (3). Then for each n there exist cdf’s $F_n$ and $G_n$ of discrete random variables with a finite number of increase points, which satisfy

$$\|F_n - F\| + \|G_n - G\| \le \frac{4}{n},$$

and moreover $T_n(y) = \int_0^y (G_n(x) - F_n(x))\,dx$ meets the integral conditions.

The first part of the proof consists of a construction of $F_n$ and $G_n$ for fixed n. We divide the interval [0, 1] into n subintervals $I_1, \dots, I_n$ of equal length. We then show that if $F_n(x)$ is any step function that is constant on each of these intervals and satisfies $F_n(x) \in F(I_i)$ for $x \in I_i$, then $\|F_n - F\| \le 2/n$. Using a similar approach for G, we obtain the first part of the lemma.

As we see, the inequality in the lemma is satisfied by a broad class of step functions $F_n$ and $G_n$. Therefore, in the second part of the proof we define one particular pair of functions $F_n$ and $G_n$ and prove that the corresponding $T_n$ (defined as above) meets the integral conditions. Let’s remark that the values $f_i$ and $g_i$ of the functions $F_n$ and $G_n$ on the interval $I_i$ are chosen such that $f_i \in F(I_i)$, $g_i \in G(I_i)$ and $(g_i - f_i)/n = \int_{I_i} (G(x) - F(x))\,dx$.

In conclusion, the presented results show that the statement that the random variable Y has “more weight in the tails” than X can be represented analytically by the integral conditions (2) and (3) being satisfied by the difference of the distribution functions.

3 Partial Orderings of Distribution Functions

We move on to the next section, which summarizes the main theoretical results of the original work by Rothschild and Stiglitz (1970). After formally defining three different approaches to risk comparison, we state and prove their mutual equivalence. We conclude this part with remarks on the difference from mean-variance analysis and an overview of the literature.

3.1 Definition of Partial Orderings

The definition of greater uncertainty should be based on the concept of a partial ordering on a set of distribution functions. Therefore we start with a definition of this relation.

Definition 3. Partial Ordering

A binary relation ≤p on a set is called a partial ordering if it is reflexive, transitive and antisymmetric (meaning that X ≤p Y and Y ≤p X imply X = Y ).

In the previous section, we formalized the comparison of the “weight in the tails” of distribution functions by the concept of an MPS. Now we formally define the corresponding approach to risk comparison.

Definition 4. Define $F \le_I G$ iff G − F (or, more precisely, $T(y) = \int_0^y (G(x) - F(x))\,dx$) satisfies conditions (2) and (3).

To justify this definition, we have to prove the following:

Lemma 3. The relation ≤I is a partial ordering.

The fact that it is transitive and reflexive is evident. In order to prove antisymmetry, assume F ≤I G and G ≤I F and construct S1 = G − F and S2 = F − G. Clearly S1(x) + S2(x) = 0, which implies T1(y) + T2(y) = 0. Since the integral conditions give T1(y) ≥ 0 and T2(y) ≥ 0, we obtain T1(y) = T2(y) = 0 for all y. This in turn implies Si = 0 (i = 1, 2) almost everywhere (up to a set of measure zero), since a non-zero part of Si on a set of positive measure would make Ti non-zero for some y.

The second definition corresponds to the statement that less risky random variables are

preferred by every risk averter.

Definition 5. Define $F \le_u G$ if and only if

$$\int_0^1 U(x)\,dF(x) \ge \int_0^1 U(x)\,dG(x)$$

for every bounded concave function U.

In this case again the properties of transitivity and reflexivity are apparent. As to antisymmetry, it is a consequence of Theorem 3 below.

Finally, let’s formalize the notion that adding noise to a distribution increases the riskiness of the given random variable.

Definition 6. Define $F \le_a G$ iff there exists a joint distribution function H(x, z) of random variables X and Z defined on [0, 1] × [−1, 1] such that, with

$$J(y) = \Pr(X + Z \le y),$$

we have

$$F(x) = H(x, 1), \qquad G(y) = J(y) \qquad\text{and}\qquad E(Z \mid X = x) = 0.$$

Notice that the equivalent definition for random variables X and Y is: $X \le_a Y$ iff $Y =_d X + Z$ (equality in distribution, not $Y = X + Z$) for some random variable Z such that $E(Z \mid X) = 0$.

An important characterization can be given for discrete distributions X and Y concentrated at a finite number of points. It can be shown that its formal structure is the same as that of the theoretical frameworks for the inequality of income distributions and the informativeness of information structures.

Assume that the distributions of X and Y are determined by the concentration points $a_i$ and probabilities $f_i$, $g_i$ ($i = 1, \dots, n$) such that

$$\Pr(X = a_i) = f_i \quad\text{and}\quad \Pr(Y = a_i) = g_i.$$

Now let’s define a random variable Z that depends on X through the conditional probabilities

$$c_{ij} = \Pr(Z = a_j - a_i \mid X = a_i), \qquad i, j = 1, \dots, n.$$

As a result, $X \le_a Y$ iff there exist numbers $c_{ij} \ge 0$ such that

$$\sum_{j=1}^{n} c_{ij} = 1, \qquad i = 1, \dots, n, \tag{4}$$

$$\sum_{j=1}^{n} c_{ij}(a_j - a_i) = 0, \qquad i = 1, \dots, n, \tag{5}$$

$$g_j = \sum_{i=1}^{n} f_i c_{ij}, \qquad j = 1, \dots, n. \tag{6}$$

Comparing this statement with the previous definition, we see that expression (4) ensures that Z is a random variable, condition (5) corresponds to E(Z|X = x) = 0 and equation (6) corresponds to the statement $Y =_d X + Z$.

We conclude this characterization with the matrix form of conditions (4), (5) and (6), where e stands for the all-ones vector (e = (1, . . . , 1)′), a for the column vector of the points $a_i$, and f and g for the row vectors of probabilities:

$$Ce = e, \qquad Ca = a, \qquad g = fC. \tag{7}$$

The matrix form of the above equations is very convenient, as it allows us to easily prove the property of transitivity for discrete distributions with a finite number of points.

Lemma 4. If random variables X1, X2 and X3 are concentrated at a finite number of points, then X1 ≤a X2 ≤a X3 implies X1 ≤a X3.

The proof uses the fact that if the matrices C1 and C2 satisfy conditions (7) for the pairs (X1, X2) and (X2, X3) respectively, then the matrix C∗ = C1C2 satisfies them for the pair (X1, X3).
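The closure of conditions (7) under matrix multiplication can be illustrated numerically. The following is a minimal sketch with made-up 3×3 matrices on the points 0, 1/2, 1; the helper names `matmul`, `vecmat` and `is_noise_matrix` are our own and are not part of the original paper.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def vecmat(f, C):
    """Row vector times matrix, i.e. g = fC."""
    return [sum(f[i] * C[i][j] for i in range(len(f))) for j in range(len(f))]

def is_noise_matrix(C, a):
    """Check Ce = e and Ca = a with nonnegative entries (first two conditions in (7))."""
    nonneg = all(c >= 0 for row in C for c in row)
    rows_sum_to_one = all(sum(row) == 1 for row in C)
    mean_preserving = all(sum(c * aj for c, aj in zip(row, a)) == ai
                          for row, ai in zip(C, a))
    return nonneg and rows_sum_to_one and mean_preserving

a = [Fr(0), Fr(1, 2), Fr(1)]
C1 = [[Fr(1), Fr(0), Fr(0)], [Fr(1, 4), Fr(1, 2), Fr(1, 4)], [Fr(0), Fr(0), Fr(1)]]
C2 = [[Fr(1), Fr(0), Fr(0)], [Fr(1, 2), Fr(0), Fr(1, 2)], [Fr(0), Fr(0), Fr(1)]]
C_star = matmul(C1, C2)

f1 = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]   # distribution of X1
f2 = vecmat(f1, C1)                   # distribution of X2 = X1 + noise
f3 = vecmat(f2, C2)                   # distribution of X3 = X2 + noise

print(is_noise_matrix(C_star, a))     # True
print(f3 == vecmat(f1, C_star))       # True
```

The second check is just associativity of matrix multiplication, (f C1) C2 = f (C1 C2), which is why a single matrix C∗ links X1 directly to X3.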

3.2 Equivalence Theorem

We now proceed to the main theoretical finding of Rothschild and Stiglitz (1970), which establishes the equivalence of the different approaches to risk comparison.

Theorem 3. The following statements are mutually equivalent:

(A) F ≤I G,

(B) F ≤u G,

(C) F ≤a G.

As this theorem is essential for the original paper, we present its proof in our work, though slightly modified. To obtain the desired equivalence, we divide it into a sequence of implications and prove each of them individually.

(C) ⇒ (B)

We assume that F ≤a G, i.e. Y =d X + Z and E(Z|X = x) = 0 for some random variable Z. Let U be an arbitrary concave function. For fixed X, we take expectations with respect to Z and use Jensen’s inequality to obtain

$$E\big[U(X + Z) \mid X\big] \le U\big(E[X + Z \mid X]\big) = U(X).$$

Taking expectations with respect to X, we finally obtain

$$E\,E\big[U(X + Z) \mid X\big] = E\,U(Y) \le E\,U(X).$$

(B) ⇒ (A)

As F ≤u G, definition 5 gives $\int_0^1 U(x)\,dS(x) \le 0$ for every concave U, where S = G − F. Using the fact that x and −x are both concave, we obtain $\int_0^1 x\,dS(x) \le 0$ and also $\int_0^1 -x\,dS(x) \le 0$, which together imply $\int_0^1 x\,dS(x) = 0$. Integrating by parts and using S(0) = S(1) = 0,

$$0 = \int_0^1 x\,dS(x) = \big[x S(x)\big]_0^1 - \int_0^1 S(x)\,dx = -\int_0^1 S(x)\,dx = -T(1),$$

which yields the integral condition (2). Next, let’s consider the special function $b_y(x) = \max(y - x, 0)$ for fixed y. Since $-b_y(x)$ is concave, we obtain

$$0 \le \int_0^1 b_y(x)\,dS(x) = \int_0^y (y - x)\,dS(x) = y S(y) - \int_0^y x\,dS(x) = y S(y) - \big[x S(x)\big]_0^y + \int_0^y S(x)\,dx = T(y),$$

which is the integral condition (3).

(A) ⇒ (C)

Let’s first consider discrete random variables X and Y with cdf’s F and G, defined as follows

$$\Pr(X = a_i) = f_i \quad\text{and}\quad \Pr(Y = a_i) = g_i,$$

which differ by a single MPS. Following definition 2, let’s consider the four points $a_1 < a_2 < a_3 < a_4$ at which the probability weights differ. Denoting $\gamma_k = g_k - f_k$, we obtain

$$\gamma_1 = -\gamma_2 \ge 0, \qquad \gamma_4 = -\gamma_3 \ge 0 \qquad\text{and}\qquad \sum_{k=1}^{4} a_k \gamma_k = 0.$$

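Let’s define the matrix C in the following way:

$$
C = \begin{pmatrix}
1 & 0 & 0 & 0 \\[4pt]
\dfrac{\gamma_1 (a_4 - a_2)}{f_2 (a_4 - a_1)} & \dfrac{g_2}{f_2} & 0 & \dfrac{\gamma_1 (a_2 - a_1)}{f_2 (a_4 - a_1)} \\[10pt]
\dfrac{\gamma_4 (a_4 - a_3)}{f_3 (a_4 - a_1)} & 0 & \dfrac{g_3}{f_3} & \dfrac{\gamma_4 (a_3 - a_1)}{f_3 (a_4 - a_1)} \\[10pt]
0 & 0 & 0 & 1
\end{pmatrix}.
$$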

Based on the characterization of ≤a for discrete random variables following definition 6, it suffices to show that the elements $c_{ij}$ of the matrix C satisfy conditions (4), (5) and (6). It’s easy to verify that conditions (4) and (5) are met, so that Z defined by

$$c_{ij} = \Pr(Z = a_j - a_i \mid X = a_i)$$

is a random variable and satisfies E(Z|X) = 0. To show that $Y =_d X + Z$ (condition (6)), we follow the approach of the original proof and define a discrete variable $Y^1 = X + Z$. The fact that E(Z) = 0 implies $E(Y^1) = E(X) = E(Y)$. Moreover, $Y^1$ can differ from Y only in the probability weights of the points $a_1$, $a_2$, $a_3$ and $a_4$. However, by the definition of Z and $Y^1$ we obtain

$$\Pr(Y^1 = a_2) = \Pr(X = a_2)\,\Pr(Z = 0 \mid X = a_2) = f_2\,\frac{g_2}{f_2} = g_2 = \Pr(Y = a_2)$$

and similarly $\Pr(Y^1 = a_3) = \Pr(Y = a_3)$. Therefore the two random variables can differ only in the probabilities of the points $a_1$ and $a_4$. But since the probabilities sum to one and $a_1 < a_4$, unequal probabilities at these points would yield unequal means, which contradicts $E(Y^1) = E(Y)$. Thus $Y =_d Y^1$.
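The construction of the matrix C can be checked on a concrete single MPS. The sketch below is our own illustration, assuming strictly positive weights f2 and f3 so that the divisions are defined; the function name `mps_noise_matrix` and the numbers are hypothetical and not taken from the original paper.

```python
from fractions import Fraction as Fr

def mps_noise_matrix(a, f, g):
    """Matrix C from the proof of (A) => (C) for four points a1 < a2 < a3 < a4."""
    a1, a2, a3, a4 = a
    f1, f2, f3, f4 = f
    g1, g2, g3, g4 = g
    gamma1, gamma4 = g1 - f1, g4 - f4   # weight gained at the two outer points
    span = a4 - a1
    return [
        [Fr(1), Fr(0), Fr(0), Fr(0)],
        [gamma1 * (a4 - a2) / (f2 * span), g2 / f2, Fr(0), gamma1 * (a2 - a1) / (f2 * span)],
        [gamma4 * (a4 - a3) / (f3 * span), Fr(0), g3 / f3, gamma4 * (a3 - a1) / (f3 * span)],
        [Fr(0), Fr(0), Fr(0), Fr(1)],
    ]

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 4)] * 4
g = [Fr(3, 8), Fr(1, 8), Fr(1, 8), Fr(3, 8)]   # differs from f by a single MPS

C = mps_noise_matrix(a, f, g)
rows_ok = all(sum(row) == 1 for row in C)                              # condition (4)
mean_ok = all(sum(c * (aj - ai) for c, aj in zip(row, a)) == 0
              for row, ai in zip(C, a))                                # condition (5)
g_from_C = [sum(f[i] * C[i][j] for i in range(4)) for j in range(4)]   # condition (6)

print(rows_ok, mean_ok, g_from_C == g)   # True True True
```

In this example the second row of C is (3/8, 1/2, 0, 1/8): a mass at a2 stays put with probability g2/f2 = 1/2 and is otherwise sent to a1 or a4 in the proportion that keeps its conditional mean at a2.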

We use theoretical findings already derived above to extend this result to all cdf’s. First, by lemmas 1 and 4 the implication also holds for all discrete distributions with a finite number of points. Finally, theorem 2 extends the validity of the result to all cdf’s.

Assuming F ≤I G, theorem 2 guarantees the existence of sequences $\{F_n\}$ and $\{G_n\}$ of discrete distributions with a finite number of increase points such that $F_n \to F$, $G_n \to G$ and $F_n \le_I G_n$, which implies $F_n \le_a G_n$ by the first part of the proof. Let $H_n(x, z)$ denote the joint distribution function of random variables $X_n$ and $Z_n$ such that, with $J_n(y) = \Pr(X_n + Z_n \le y)$,

$$J_n(y) = G_n(y), \qquad F_n(x) = H_n(x, 1) \qquad\text{and}\qquad E(Z_n \mid X_n) = 0.$$

The last condition can be represented as

$$\int_0^1 \int_{-1}^{1} u(x)\,z\,dH_n(x, z) = 0 \tag{8}$$

for all continuous functions u(x) on [0, 1]. Let’s denote the left-hand side of (8) by $M_n$. Since the distribution functions $H_n$ are stochastically bounded, there exists a subsequence $\{H_{n'}\}$ of the sequence $\{H_n\}$ such that $H_{n'} \to H(x, z)$, where H(x, z) is the joint distribution function of random variables X and Z. It’s easy to see that $H_{n'}(x, 1) \to F(x)$ and $J_{n'} \to G$. Moreover, $M_{n'} \to \int_0^1 \int_{-1}^{1} u(x)\,z\,dH(x, z)$, which (since $M_{n'} \equiv 0$) implies $\int_0^1 \int_{-1}^{1} u(x)\,z\,dH(x, z) = 0$ and hence E(Z|X) = 0, thus completing the proof.

3.3 Remarks on Mean-Variance Analysis

The following part contains several remarks on the approach that compares the variances of random variables. In section 2.1 we introduced four different concepts of risk comparison; however, the equivalence proven above holds only for three of them, excluding the mean-variance analysis described by the ordering ≤v (X ≤v Y if E(X) = E(Y ) and E(X²) ≤ E(Y²)). One reason for this is that the relations ≤I , ≤u and ≤a are partial orderings, while the mean-variance ordering is a complete ordering of distributions with the same mean.

This characteristic is considered to be a disadvantage rather than an advantage, as there are examples of random variables X1 and X2 with the same mean such that $E(X_1^2) < E(X_2^2)$ and $E(U(X_1)) < E(U(X_2))$ for some nonquadratic concave function U. In fact, it can be shown that a function U is quadratic (and convex) if and only if X ≥v Y implies E(U(X)) ≥ E(U(Y )). On the other hand, the partiality of the orderings ≤I , ≤u and ≤a can be demonstrated e.g. by a case where $T(y) = \int_0^y (F(x) - G(x))\,dx$ changes sign. In such a case, the distributions F and G cannot be ordered.
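The claim that a lower variance does not guarantee a higher expected utility for every risk averter can be illustrated with two small lotteries. The example below is our own construction (the distributions and the utility function min(x, 1/5) are assumptions chosen for illustration, not taken from the original paper).

```python
from fractions import Fraction as Fr

def expect(dist, fn=lambda x: x):
    """Expected value of fn(X) for a discrete distribution given as (value, probability) pairs."""
    return sum(p * fn(x) for x, p in dist)

X1 = [(Fr(0), Fr(1, 5)), (Fr(5, 8), Fr(4, 5))]      # small chance of a very bad outcome
X2 = [(Fr(1, 5), Fr(1, 2)), (Fr(4, 5), Fr(1, 2))]   # symmetric two-point lottery

def U(x):
    """A concave, nonquadratic utility function (piecewise linear with a kink at 1/5)."""
    return min(x, Fr(1, 5))

square = lambda x: x * x

print(expect(X1), expect(X2))                   # 1/2 1/2   (same mean)
print(expect(X1, square), expect(X2, square))   # 5/16 17/50 (X1 has the smaller variance)
print(expect(X1, U), expect(X2, U))             # 4/25 1/5  (yet U prefers X2)
```

Here the mean-variance ordering ranks X1 as less risky, while this particular risk averter strictly prefers X2, so the two lotteries cannot be ranked by ≤u.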

Regarding mean-variance analysis, Rothschild and Stiglitz note Tobin’s conjecture that such an approach may be appropriate for a restricted class of distributions. The authors agree; however, they object (see Rothschild and Stiglitz, 1970, chap. IV) that these restrictions are far too severe, allowing only for changes in distributions from F to G such that F (x) = G(ax + b) for some a > 0 and b (compare Tobin (1965)).

3.4 Previous Literature on Given Topic

We conclude this part with a remark on the previous literature on the covered topics, as presented in Rothschild and Stiglitz (1972). As the authors report, although they had considered their result to be entirely new, various sources proved them wrong.

The presented results on the equivalence of risk-comparison approaches, in particular theorem 3 as the main result of the paper, had already been known, especially to mathematical statisticians. For some time, they had held an important place in a branch of statistical theory called “the comparison of experiments”.

For example, these findings are presented in the book by Blackwell and Girshick (1954, chap. 12). More general and more modern methods can be found in chapter 11 of Mayer (1966) and in Strassen (1965). Let’s note here that these references show that the equivalence between the ≤a and ≤u orderings holds for distributions defined over more general spaces than the interval [0, 1], such as compact subsets of Rⁿ. Unfortunately, the ordering ≤I doesn’t seem to allow such a generalization.

4 Economic Applications

In the final part of this seminar work, we review the results presented in Rothschild and Stiglitz (1971), which provide examples of economic applications of the findings derived in Rothschild and Stiglitz (1970). As the authors state, two approaches to investigating the effect of risk on economic decisions are reviewed here: the effects of increasing risk and the choice of a probability distribution.

The first part offers an alternative to mean-variance analysis for the problem of the economic effects of increasing risk. To provide a general framework, let’s assume that an individual chooses the level of some control parameter α to maximize the expected utility $\int U(\theta, \alpha)\,dF(\theta)$, where θ is a random variable. The optimality condition for α is

$$\frac{\partial}{\partial \alpha} \int U(\theta, \alpha)\,dF(\theta) = E\,U_\alpha(\theta, \alpha) = 0. \tag{9}$$

Assume further that α∗ is the unique solution of (9) and that $U_\alpha$ is decreasing in α in a neighbourhood of α∗. If $U_\alpha(\theta, \alpha)$ is a concave function of θ, our definition of risk comparison (in particular definition 5, concerned with the behavior of all risk averters described by concave utility functions) implies that an increase in riskiness decreases α∗. Similarly, if $U_\alpha(\theta, \alpha)$ is convex in θ, the value of α∗ increases as uncertainty grows.

In what follows, we try to apply this idea and determine the conditions for convexity and concavity of the functions in question. As a general conclusion, we show that mean-variance analysis yields results that are misleadingly general, in contrast to our approach established by theorem 3. Moreover, we show that the Arrow-Pratt concepts of relative and absolute risk aversion provide a convenient way to examine the conditions for convexity or concavity of a given function.

After introducing the main ideas, we present several examples of their possible application in well-known economic models. In part 4.1 we address the topic of savings and uncertainty. Part 4.2 is then devoted to a portfolio problem, with several remarks on the more general combined portfolio-savings problem. The last subsection 4.3 deals with a firm’s production problem.

To be precise, Rothschild and Stiglitz (1971) contains two more examples of economic applications, which deal with a multi-stage planning problem and the choice of output level for a competitive firm. Although they are quite interesting, we don’t present them in detail for lack of space.

Finally, in part 4.4 we show how the equivalence of the three alternative approaches from Rothschild and Stiglitz (1970) (reviewed in part 3.2) can be applied to prove some general theorems dealing with the choice among probability distributions.

4.1 Savings and Uncertainty

In the first example we analyze the effect of increased risk in the rate of return on savings. An individual wishes to allocate a given wealth $W_0$ between consumption today and tomorrow. Wealth not consumed today is invested and yields the random return e per dollar invested. The expected two-period utility is

$$E\big[U(C_1) + (1 - \delta)\,U(C_2)\big] = U((1 - s) W_0) + (1 - \delta)\,E\,U(s W_0 e), \tag{10}$$

with savings rate s and pure rate of time discount δ. We assume that the individual is a risk averter, with a utility function satisfying U′ > 0 and U′′ < 0. Setting the derivative of (10) with respect to s equal to zero, we obtain the necessary and (because of risk aversion) also sufficient condition for utility maximization:

$$U'((1 - s) W_0) = (1 - \delta)\,E\big[U'(s W_0 e)\,e\big]. \tag{11}$$

Intuitively, increased uncertainty in the return on savings could affect savings in two opposite ways: savings could either drop because “a bird in the hand is worth two in the bush”, or grow since a risk averter saves more when facing increased uncertainty. Formally, whether greater risk increases or decreases the optimal level of savings s∗ depends on the convexity or concavity of $e\,U'(s W_0 e)$ in e. As a result, under increasing risk the level of s∗ grows if

$$2 U''(C) + U'''(C)\,C > 0 \tag{12}$$

and drops if the reverse inequality holds. Note that the condition U′′′(C) ≤ 0 suffices for increasing risk to decrease savings.

Applying the Arrow-Pratt concept, we can reformulate these results using the coefficient of relative risk aversion (R = −CU′′/U′). It can be observed that R′ has the same sign as −(U′′′C + U′′(1 + R)); thus inequality (12) holds if R is nonincreasing and greater than one. On the other hand, R nondecreasing and less than one yields the opposite inequality.
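The role of relative risk aversion can be illustrated numerically for the two-period problem. The sketch below is our own illustration and assumes CRRA utility with U′(C) = C^(−a), for which 2U′′ + U′′′C = a(a − 1)C^(−a−1), so that (12) holds exactly when a > 1. The function name `optimal_savings_rate`, the discount rate and the two return distributions are hypothetical choices, not values from the original paper.

```python
def optimal_savings_rate(a, returns, delta=0.05, w0=1.0):
    """Solve the first-order condition of the savings problem by bisection.

    Assumes CRRA marginal utility U'(C) = C**(-a); `returns` is a list of
    (return per dollar, probability) pairs.
    """
    def foc(s):
        # Expected discounted marginal benefit of saving minus its marginal cost.
        benefit = (1 - delta) * sum(p * (s * w0 * e) ** (-a) * e for e, p in returns)
        cost = ((1 - s) * w0) ** (-a)
        return benefit - cost          # strictly decreasing in s

    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        if foc(mid) > 0:
            lo = mid                   # saving more is still worthwhile
        else:
            hi = mid
    return (lo + hi) / 2

safe = [(1.0, 1.0)]                    # certain return
risky = [(0.5, 0.5), (1.5, 0.5)]       # same mean, more spread

for a in (2.0, 0.5):
    s_safe = optimal_savings_rate(a, safe)
    s_risky = optimal_savings_rate(a, risky)
    print(a, round(s_safe, 4), round(s_risky, 4))
```

With a = 2 (relative risk aversion above one) the riskier return raises the savings rate, while with a = 0.5 it lowers it, in line with the two intuitive arguments discussed above.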

Let’s conclude this example with a comment on the application of mean-variance analysis. As already noted in part 3.3, this approach is equivalent to the assumption that U is quadratic. However, if $U(C) = aC - \tfrac{1}{2} b C^2$, then we can express the right-hand side of (11) as

$$(1 - \delta)\big(a E(e) - b s W_0 E(e^2)\big),$$

which decreases as E(e²) grows. Consequently, s has to drop in order to restore the equality (11). As a result, this approach yields a conclusion that is compatible only with the first argument (growing risk decreases savings) while ruling out the second one (savings increase as variability rises).

4.2 Portfolio Problem and Combined Portfolio-Savings Problem

Let’s now address the portfolio problem. Assume that an investor wishes to divide his portfolio between money with a zero rate of return and a risky asset with a random rate of return e. If $W_0$ denotes his initial wealth and α the fraction of this wealth invested in the risky asset, his terminal wealth is $W(\alpha) = W_0(\alpha e + 1)$. Again, our objective is to maximize the expected utility of terminal wealth EU(W (α)), with the utility function U satisfying the “risk averter” conditions formulated in the previous problem (i.e. U′ > 0, U′′ < 0).

Let’s denote by F the distribution function of e. Then the optimal α has to satisfy the first-order condition

$$H(\alpha) = W_0\,E[U' e] = W_0 \int U'(W(\alpha))\,e\,dF(e) = 0.$$

Notice that given the assumptions on the utility function, this condition is necessary as well as sufficient (since H′(α) < 0). Let’s again consider a change in the variability of e. Our question is how the optimal level of α reacts to such a change.

Using mean-variance analysis and a quadratic utility function $U(W) = aW - \tfrac{1}{2} b W^2$, we obtain $\alpha = \dfrac{(a - bW_0)\,E(e)}{b W_0\,E(e^2)}$. Thus if e becomes riskier (i.e. E(e²) increases with E(e) remaining constant), the optimal level of α has to fall. However, this result need not be true in general, though it is misleadingly presented as such. This can be observed using the approach established by theorem 3.
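The dependence of the quadratic-utility solution on the second moment of the return can be made concrete with a small calculation. The sketch below evaluates the formula for α stated above for two return distributions with the same mean; the function name `quadratic_alpha` and all parameter values are our own illustrative assumptions.

```python
from fractions import Fraction as Fr

def quadratic_alpha(a, b, w0, returns):
    """Optimal risky share for U(W) = a*W - b*W**2/2 and W = w0*(alpha*e + 1).

    `returns` is a list of (rate of return, probability) pairs.
    """
    m1 = sum(p * e for e, p in returns)        # E(e)
    m2 = sum(p * e * e for e, p in returns)    # E(e^2)
    return (a - b * w0) * m1 / (b * w0 * m2)

a_coef, b_coef, w0 = Fr(5, 4), Fr(1), Fr(1)
narrow = [(Fr(-1, 10), Fr(1, 2)), (Fr(3, 10), Fr(1, 2))]   # E(e) = 1/10, E(e^2) = 1/20
wide = [(Fr(-3, 10), Fr(1, 2)), (Fr(1, 2), Fr(1, 2))]      # E(e) = 1/10, E(e^2) = 17/100

print(quadratic_alpha(a_coef, b_coef, w0, narrow))   # 1/2
print(quadratic_alpha(a_coef, b_coef, w0, wide))     # 5/34
```

Under quadratic utility the share of the risky asset depends on the distribution of e only through E(e) and E(e²), which is why this specification cannot capture the richer comparative statics discussed next.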

Consider that the distribution of e changes from F to a more variable G, with the new optimal allocation parameter $\hat\alpha$ satisfying $\int U'(W(\hat\alpha))\,e\,dG(e) = 0$. Let’s define the function S = G − F ; then $\hat\alpha \gtreqless \alpha$ if $\int U'(W(\alpha))\,e\,dS(e) \gtreqless 0$. Denote $V(e) = U'(W(\alpha))$ and further assume that F and G have their points of increase confined to the interval (a, b). Now we see that the condition

$$\int_a^b V(e)\,e\,dS(e) \le 0 \tag{13}$$

for all positive and decreasing V and all S satisfying the integral conditions (2) and (3) implies that an increase in variability decreases the demand for risky assets by all risk-averse individuals. Moreover, using (3) and the second mean value theorem of integral calculus, we obtain a sufficient condition for (13) in the form

$$\forall c \in (a, b): \quad \int_a^c e\,dS(e) = h(c) \le 0.$$

Furthermore, it can be shown that this is also a necessary condition. Suppose, to the contrary, that h(c) > 0 for some c, while (13) holds for all positive and decreasing V. Now consider

$$V(e) = \begin{cases} V_1 & \text{for } a \le e < c, \\ V_2 & \text{for } c \le e \le b, \end{cases}$$

where $V_1 > V_2 > 0$. Since $h(b) = \int_a^b e\,dS(e) = 0$ (the means of F and G coincide), we get $\int_a^b V(e)\,e\,dS(e) = (V_1 - V_2)\,h(c) > 0$, a contradiction.

Concerning the statement that increasing variability decreases the demand for risky assets, the authors claim that it is possible to exhibit increasing concave utility functions that always satisfy it, and to prove that this type of utility function does not have the property that increasing risk always increases α.

Regarding the application of the Arrow-Pratt concept of risk aversion, let’s first denote Z(e) = eU′(W (α)). We can interpret the previous results in the sense that concavity of Z(e) implies that the optimal α does not increase when e becomes riskier. Using the relative and absolute risk aversion coefficients R = −U′′W/U′ and A = −U′′/U′, we can express Z′′(e) in the following form:

$$Z''(e) = \big[(1 - R + A W_0)\,U'' + (W_0 A' - R')\,U'\big]\,W_0\,\alpha.$$

Thus nondecreasing relative risk aversion less than or equal to one, together with nonincreasing absolute risk aversion, is sufficient for an increase in risk to decrease the share of the risky asset.

We conclude this example with a note on the combined portfolio-savings problem. In the model we consider an individual who maximizes the expected value of the discounted utility of consumption

$$E \sum_{t=0}^{\infty} (1 - \delta)^t\,U(C_t),$$

where δ represents the discount rate and $C_t$ denotes consumption at time t, subject to the stochastic constraints

$$W_{t+1} = (W_t - C_t)\,r_t,$$

where $W_t$ stands for the wealth at time t and $r_t - 1$ is the stochastic rate of return, which combines the returns of two assets according to

$$r_t = \alpha r_{t1} + (1 - \alpha) r_{t2}.$$

The parameter α is the fraction invested in the first asset, and $r_{t1}$ and $r_{t2}$ represent the returns of asset 1 and asset 2 respectively.

We again ask what effect an increase in the risk of one of the assets’ returns has on
portfolio allocation and savings. Although it seems reasonable that an increase in the
variance of one asset decreases the proportion invested in this asset, it can be shown that
under special conditions an increase in variability could have the opposite effect.

We can also analyze the effect on the savings rate. Considering the CRRA utility function
U(C) = C^(1−a)/(1 − a) (for a > 0, a ≠ 1), we obtain that if a > 1, then an increase in the
variability of r increases the savings rate, while a < 1 gives the opposite result.
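
The role of a can be explored with a small numerical sketch. It assumes that returns are
independent and identically distributed over time and uses the closed-form savings rate
s = [(1 − δ)E(r^(1−a))]^(1/a), which the standard dynamic-programming argument yields for
CRRA utility. This formula, the two-point return distributions and all names below are
assumptions made for illustration and are not taken from the text above.

```python
def savings_rate(a, returns, delta=0.1):
    """Savings rate s = ((1 - delta) * E[r**(1 - a)]) ** (1 / a) for CRRA utility.

    `returns` maps each gross return r to its probability (assumed i.i.d. over time).
    """
    moment = sum(p * r ** (1.0 - a) for r, p in returns.items())
    return ((1.0 - delta) * moment) ** (1.0 / a)


# Two hypothetical return distributions with the same mean (1.05);
# the second one is a mean-preserving spread of the first.
low_risk = {1.0: 0.5, 1.1: 0.5}
high_risk = {0.8: 0.5, 1.3: 0.5}

for a in (0.5, 2.0):
    print(a, savings_rate(a, low_risk), savings_rate(a, high_risk))
```

Comparing the two printed rates for a given a shows the direction in which the savings rate
moves when the return becomes riskier.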

4.3 Firm’s Production Problem

As the last example in this part on the effects of increased risk, we review a problem of
production planning. Consider a firm with uncertain output Q over the next period. The
goal of the firm is to minimize the expected cost of production. Assume further a
two-factor concave production function P(K, L), which represents the production process,
i.e. Q = P(K, L). K represents capital, which cannot be varied in the short run, and L
stands for labor, which, on the contrary, is variable.

The expected costs of production are given by the expression

\[ E[rK + wL(K, Q)] = rK + wE[L(K, Q)], \tag{14} \]

where r is the cost of capital, w is the cost of labor and L(K, Q) stands for the amount of
labor required to produce Q with capital K. Our question is what happens to the expected
costs as the variability of Q increases. To answer it we use the fact that L(K, Q) is
convex in Q for any given level of K (this is implied by the concavity of P). Therefore,
using our approach given by definition 5, we obtain that higher variability of Q always
results in higher expected costs.
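
A minimal numerical sketch of this result follows. It assumes the production function
P(K, L) = sqrt(KL), so that L(K, Q) = Q²/K is convex in Q. The functional form, the prices
and the two output distributions are hypothetical choices for illustration.

```python
def labor_required(k, q):
    """Labor needed to produce q with capital k when Q = sqrt(K * L) (assumed form)."""
    return q ** 2 / k


def expected_cost(k, output_dist, r=1.0, w=2.0):
    """Expected cost r*K + w*E[L(K, Q)], following expression (14)."""
    expected_labor = sum(p * labor_required(k, q) for q, p in output_dist.items())
    return r * k + w * expected_labor


# Both distributions have mean output 10; the second is a mean-preserving spread.
low_risk = {9.0: 0.5, 11.0: 0.5}
high_risk = {5.0: 0.5, 15.0: 0.5}

print(expected_cost(4.0, low_risk), expected_cost(4.0, high_risk))
```

With K = 4 the sketch gives an expected cost of 54.5 for the less variable output and 66.5
for the more variable one, although mean output is the same in both cases.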

The subsequent problem, which is more difficult to answer, concerns the reaction of the
optimal level of K to an increase in the variability of Q. The authors point out that the
answer is related to the elasticity of substitution between K and L. Let us start by
deriving the first order condition from (14):

\[ \frac{r}{w} = -E\left[ \frac{\partial L(K, Q)}{\partial K} \right], \]

which can be interpreted as saying that the factor-price ratio must equal the mean value
(or the average) of the marginal rate of substitution −∂L/∂K.

We conclude with two examples of particular production functions. First we consider the
production function with a constant elasticity of substitution

\[ Q(K, L) = \left( \delta K^{\rho} + (1 - \delta) L^{\rho} \right)^{1/\rho}. \]

It can be shown that the condition ρ ≤ 0 (or equivalently an elasticity of substitution
less than or equal to one) implies convexity of −∂L/∂K with respect to Q, meaning that an
increase in the variability of Q causes a rise in the optimal level of K.

As a second example, we look at the production function with infinite elasticity of
substitution

\[ Q(K, L) = bK + aL. \]

Let G(Q) denote the distribution function of Q. Then it can be shown (for details see
Rothschild and Stiglitz, 1971, p. 79) that the response of K to an increase in the
variability of Q depends on the term G^{-1}(1 − ar/(wb)). To be more specific, the optimal
level of K increases if G^{-1}(1 − ar/(wb)) rises, or (equivalently) if the probability
that Q exceeds bK increases.
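
The dependence on the quantile can be sketched numerically. The example below assumes that
Q is normally distributed and that the first order condition gives bK = G^{-1}(1 − ar/(wb));
the normal distribution, the parameter values and the function name are illustrative
assumptions rather than part of the original analysis.

```python
from statistics import NormalDist


def optimal_capital(mu, sigma, r, w, a=1.0, b=1.0):
    """Optimal K for Q = b*K + a*L, computed as G^{-1}(1 - a*r/(w*b)) / b.

    Q is assumed to be normal with mean mu and standard deviation sigma.
    """
    quantile_level = 1.0 - a * r / (w * b)
    return NormalDist(mu, sigma).inv_cdf(quantile_level) / b


# Cheap capital (ar/(wb) = 0.2): the relevant quantile lies above the mean.
print(optimal_capital(100, 10, r=0.2, w=1.0), optimal_capital(100, 20, r=0.2, w=1.0))
# Expensive capital (ar/(wb) = 0.8): the relevant quantile lies below the mean.
print(optimal_capital(100, 10, r=0.8, w=1.0), optimal_capital(100, 20, r=0.8, w=1.0))
```

In this sketch a larger standard deviation raises the optimal K when capital is cheap and
lowers it when capital is expensive, which illustrates why the sign of the effect cannot be
determined without looking at the quantile.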

4.4 Choosing a Probability Distribution

Finally, we present several examples in which our theoretical results (the definition of
variability and the basic theorem on equivalence) are applied to prove some general
theorems dealing with the choice of a probability distribution from a set of possible
probability distributions.

We start with the diversification theorem. Consider an individual who can allocate his
given initial wealth W0 between two securities. Their values next period, e1 and e2 (per
dollar invested), are assumed to be independent and identically distributed. The investor
chooses b to maximize the expected utility

\[ EU(W) = EU\big( (b e_1 + (1 - b) e_2) W_0 \big), \]

where U is a concave function. The diversification theorem states that the optimal b is
b = 1/2, independently of the utility function.

To prove this statement, let us define yb = (be1 + (1 − b)e2)W0. Note that we can write
yb = y1/2 + (b − 1/2)(e1 − e2)W0 and furthermore E(e1 − e2 | y1/2) = 0. By definition 6
this gives y1/2 ≤a yb. Using theorem 3 we obtain y1/2 ≤u yb, therefore all
individuals with concave utility functions prefer y1/2 to yb.
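
The theorem can be checked numerically in a simple case. The sketch below assumes a
two-point distribution for e1 and e2 and logarithmic utility; both are illustrative
choices, and any other concave utility function can be passed in their place.

```python
from itertools import product
from math import log


def expected_utility(b, values, w0=1.0, utility=log):
    """E U((b*e1 + (1-b)*e2) * W0) for i.i.d. e1, e2 given as {value: probability}."""
    return sum(
        p1 * p2 * utility((b * e1 + (1.0 - b) * e2) * w0)
        for (e1, p1), (e2, p2) in product(values.items(), repeat=2)
    )


# Hypothetical payoff per dollar invested: 0.5 or 1.5 with equal probability.
values = {0.5: 0.5, 1.5: 0.5}
grid = [i / 20 for i in range(21)]  # candidate shares b = 0, 0.05, ..., 1
best_b = max(grid, key=lambda b: expected_utility(b, values))
print(best_b)  # 0.5
```

Replacing log with another concave function, for example the square root, leaves the
maximizer on this grid at b = 1/2.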

The second and final example deals with the Rao-Blackwell theorem. Assume a probability
distribution depending on an unknown parameter θ and consider a sample of random variables
x = (x1, . . . , xn) generated from this distribution. Furthermore, suppose that the
criterion for choosing an estimator d(x) of the parameter θ based on the sample x is the
minimization of the expected value of a convex loss function L(d(x)). The Rao-Blackwell
theorem states that for any estimator d(x) and any convex L, the existence of a sufficient
statistic T for θ implies the existence of an estimator d∗ at least as good as d(x) in the
sense that EL(d∗(x)) ≤ EL(d(x)).

To prove this, let us define d∗(x) = E(d(x)|T) for every value of T. It is easy to see that
we have to prove d∗(x) ≤u d(x), which is, by theorem 3, equivalent to d∗(x) ≤a d(x).
Consider the random variable z defined by the equation d(x) = d∗(x) + z. By the definition
of d∗ it holds that E(z|T) = E(z|d∗) = 0. Thus we may conclude (by definition 6) that
d∗(x) ≤a d(x).
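
The construction of d∗ can be illustrated by exact enumeration in a small discrete case.
The sketch assumes a sample of two Bernoulli(θ) draws, the crude estimator d(x) = x1, the
sufficient statistic T = x1 + x2 and squared-error loss; these choices and all function
names are hypothetical and serve only as an example of a convex loss.

```python
from itertools import product


def outcome_prob(x, theta):
    """Probability of the 0/1 sample x under independent Bernoulli(theta) draws."""
    prob = 1.0
    for xi in x:
        prob *= theta if xi == 1 else 1.0 - theta
    return prob


def rao_blackwellize(d, t, n, theta):
    """Return d*(x) = E[d(x) | T(x)], computed by enumerating all samples (0 < theta < 1)."""
    num, den = {}, {}
    for x in product((0, 1), repeat=n):
        weight = outcome_prob(x, theta)
        num[t(x)] = num.get(t(x), 0.0) + weight * d(x)
        den[t(x)] = den.get(t(x), 0.0) + weight
    return lambda x: num[t(x)] / den[t(x)]


def expected_loss(d, n, theta):
    """Expected squared-error loss E[(d(x) - theta)^2]."""
    return sum(
        outcome_prob(x, theta) * (d(x) - theta) ** 2
        for x in product((0, 1), repeat=n)
    )


theta, n = 0.3, 2
d = lambda x: x[0]   # crude estimator: the first observation only
t = sum              # sufficient statistic: number of successes
d_star = rao_blackwellize(d, t, n, theta)
print(expected_loss(d, n, theta), expected_loss(d_star, n, theta))
```

In this example d∗ is the sample mean, and its expected loss is half that of d. Because T
is sufficient, the conditional expectation does not actually depend on θ, even though the
enumeration uses θ to compute it.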

5 Conclusion

The aim of this seminar work is to provide a summary of the series of papers by Rothschild
and Stiglitz (1970, 1971, 1972). We start with the section on theoretical background,
introducing four different approaches generally used to compare the variability of random
variables. In particular, we explain in depth and formalize the concept of comparing “the
weight in the tails” of the random variables’ distributions.

In the second part, we focus on the equivalence theorem as the main result of this series
of articles. After introducing partial orderings and defining three variability-comparison
approaches within this framework, we provide the theorem itself along with the complete
proof. The contribution of this theoretical finding is that it establishes the equivalence
of different perspectives on the comparison of risk, thus providing a basis for a
convenient definition of greater variability.

The final section is devoted to examples of economic problems as potential applications of
the derived results. We present simple models of savings, portfolio allocation and a firm’s
choice of production level. In each of these examples, we address the impact of increased
variability on the optimal levels of the variables in the model. We show that the solutions
given by mean-variance analysis can often be misleading, since they omit the possibility of
outcomes other than the single predicted result. Finally, we apply the results to prove
some general theorems dealing with the choice of a probability distribution.

To conclude, the original works are among the most essential contributions on the economic
problem of risk assessment and comparison. Although, as the authors themselves admit, the
presented theoretical findings had been known before and therefore are not completely new,
the papers provide a comprehensive theoretical background and thorough insight into the
given issues.

References

Blackwell, D. and M. A. Girshick (1954). Theory of Games and Statistical Decisions.

Wiley, New York.

Meyer, P. A. (1966). Probability and Potentials. Blaisdell, Waltham, MA.

Rothschild, M. and J. E. Stiglitz (1970). Increasing Risk: I. A Definition. Journal of

Economic Theory 2 (3), 225–243.

Rothschild, M. and J. E. Stiglitz (1971). Increasing Risk II: Its Economic Consequences.

Journal of Economic Theory 3 (1), 66–84.

Rothschild, M. and J. E. Stiglitz (1972). Addendum to “Increasing Risk: I. A Definition”.

Journal of Economic Theory 5 (2), 306–306.

Strassen, V. (1965). The Existence of Probability Measures with Given Marginals. The

Annals of Mathematical Statistics 36 (2), 423–439.

Tobin, J. (1965). The Theory of Portfolio Selection. In F. Hahn and F. Brechling (Eds.),

The Theory of Interest Rates. MacMillan, London.
