

WHAT FINITE-ADDITIVITY CAN ADD TO DECISION THEORY

By Mark J. Schervish, Teddy Seidenfeld, Rafael Stern, and Joseph B. Kadane

Carnegie Mellon University and Federal University of São Carlos

We examine general decision problems with loss functions that are bounded below. We allow the loss function to assume the value ∞. No other assumptions are made about the action space, the types of data available, the types of non-randomized decision rules allowed, or the parameter space. By allowing prior distributions and the randomizations in randomized rules to be finitely-additive, we prove very general complete class and minimax theorems. Specifically, under the sole assumption that the loss function is bounded below, we show that every decision problem has a minimal complete class and all admissible rules are Bayes rules. We also show that every decision problem has a minimax rule and a least-favorable distribution and that every minimax rule is Bayes with respect to the least-favorable distribution. Some special care is required to deal properly with infinite-valued risk functions and integrals taking infinite values.

1. Introduction. The following example, adapted from Example 3 of Schervish, Seidenfeld and Kadane (2009), is a case in which countably-additive randomized rules do not contain a minimal complete class. It involves a discontinuous version of squared-error loss in which a penalty is added if the prediction and the event being predicted are on opposite sides of a critical cutoff.

Example 1. A decision maker is going to offer predictions for an event B and its complement. The parameter space is Θ = {B, B^C} while the action space is A = [0, 1]², pairs of probability predictions. The decision maker suffers the sum of two losses (one for each prediction), each of which equals the usual squared-error loss (the square of the difference between the indicator of the event and the corresponding prediction) plus a penalty of 0.5 if the prediction is on the opposite side of 1/2 from the indicator of the event.

AMS 2000 subject classifications: Primary 62A01; secondary 62C05.
Keywords and phrases: admissible rule, Bayes rule, complete class, least-favorable distribution, minimax rule.


In symbols, the loss function equals

L(θ, (a1, a2)) = (I_B − a1)² + (I_{B^C} − a2)² + (1/2) ×
    { I_{[0,1/2]}(a1) + I_{(1/2,1]}(a2)   if θ = B,
    { I_{(1/2,1]}(a1) + I_{[0,1/2]}(a2)   if θ = B^C.

To keep matters simple, we assume that no data are available, but one could rework the example with potential data at the cost of more complicated calculations. Figure 1 is a plot of all of the pairs (L(B, (a1, a2)), L(B^C, (a1, a2))), which shows all of the possible risk functions of non-randomized rules (pure strategies when there are no data available).

[Figure 1: scatter of the risks of pure strategies, plotting risk at B against risk at B-complement; both axes run from 0.0 to 3.0.]

Fig 1. The risk functions of non-randomized rules in Example 1

The admissible non-randomized strategies are the pairs (p, 1 − p) for p ∈ [0, 1]. The corresponding points in Figure 1 are

(i) from (0,3) up to but not including (0.5,1.5) in the upper left, corresponding to p ∈ [0, 1/2),
(ii) (1,1) in the middle section, corresponding to p = 1/2, and
(iii) from but not including (1.5,0.5) to (3,0) in the lower right, corresponding to p ∈ (1/2, 1].
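As a numerical check of this geometry, here is a minimal Python sketch; the helper simply transcribes the loss displayed above (no data enter, since none are available). The output reproduces the endpoints (3, 0) and (0, 3), the isolated point (1, 1) at p = 1/2, and the one-sided limits (1.5, 0.5) and (0.5, 1.5) near p = 1/2.

```python
def loss(theta_is_B, a1, a2):
    # squared-error part plus the 0.5 penalty from the display above
    ind_B = 1.0 if theta_is_B else 0.0
    squared = (ind_B - a1) ** 2 + ((1.0 - ind_B) - a2) ** 2
    if theta_is_B:
        penalty = 0.5 * ((a1 <= 0.5) + (a2 > 0.5))
    else:
        penalty = 0.5 * ((a1 > 0.5) + (a2 <= 0.5))
    return squared + penalty

for p in (0.0, 0.5 - 1e-9, 0.5, 0.5 + 1e-9, 1.0):
    a1, a2 = p, 1.0 - p
    print(p, (loss(True, a1, a2), loss(False, a1, a2)))
```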

The countably-additive randomized rules have risk functions in the convex hull of the set plotted in Figure 1. The resulting set is not closed, and its lower boundary is missing all points on the closed line segment from (0.5, 1.5) to (1.5, 0.5) except (1, 1). One consequence of these points being missing is that there are many inadmissible rules, corresponding to points just above that line segment, that are dominated by other inadmissible rules, but not dominated by an admissible rule. In other words, there is no complete class in this problem.

If, however, one is willing to introduce finitely-additive randomizations, all of the missing risk functions of admissible rules become available. For example, there are finitely-additive probabilities P− and P+ on [0, 1] such that P−(A) = 1 for every set of the form A = (1/2 − ε, 1/2) and P+(C) = 1 for every set of the form C = (1/2, 1/2 + ε). Every missing point on the line segment between (0.5, 1.5) and (1.5, 0.5) is the risk function of a randomized rule that gives a1 the distribution αP− + (1 − α)P+ for some α ∈ [0, 1/2) ∪ (1/2, 1] and sets a2 = 1 − a1. For example, the risk function for α = 0 is the point (0.5, 1.5) while the risk function for α = 1 is the point (1.5, 0.5).
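Because the loss of a pure strategy has one-sided limits at a1 = 1/2, the risk of the randomization αP− + (1 − α)P+ is the corresponding convex combination of the two limiting risk points, which is exactly how the missing segment gets filled in. A minimal sketch, reusing loss() from above and approximating the two finitely-additive limits numerically:

```python
d = 1e-9
# P-: a1 concentrated just below 1/2; P+: a1 just above 1/2 (with a2 = 1 - a1)
r_minus = (loss(True, 0.5 - d, 0.5 + d), loss(False, 0.5 - d, 0.5 + d))  # ~ (1.5, 0.5)
r_plus = (loss(True, 0.5 + d, 0.5 - d), loss(False, 0.5 + d, 0.5 - d))   # ~ (0.5, 1.5)
for alpha in (0.0, 0.25, 0.75, 1.0):
    print(alpha, tuple(alpha * m + (1 - alpha) * p
                       for m, p in zip(r_minus, r_plus)))
```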

One result (Theorem 1) that we prove in this paper is that every decision problem with a loss function bounded below has a minimal complete class consisting of Bayes rules if finitely-additive randomizations are allowed. There are several complete class theorems in the countably-additive literature that make additional assumptions about the loss function (e.g., continuous, convex) and about the distributions of the available data (e.g., exponential family distributions). Some of these results can be found in Berger and Srinivasan (1978); Brown (1971) and Section 5.7 of Lehmann and Casella (1998). Another of our theorems is a general minimax theorem (Theorem 2) stating that every decision problem with a loss function bounded below has a minimax rule and least-favorable distribution such that all minimax rules are Bayes with respect to that least-favorable distribution. Heath and Sudderth (1972) prove a finitely-additive minimax theorem when risk functions are bounded. Heath and Sudderth (1978) consider cases in which a Bayesian analysis will be performed and the joint distribution of data and parameter can be computed by integration in both orders (a property not possessed by all finitely-additive distributions). In these situations they present results about Bayes rules and extended admissible rules.

Our results also cover cases in which the loss function is allowed to assume the value ∞. An example of such a loss function is the logarithmic loss for predicting events. In the notation of Example 1, replace (I_B − a1)² + (I_{B^C} − a2)² by −log(a1[1 − a2]) I_B − log(a2[1 − a1]) I_{B^C}. Dealing with loss functions that assume the value ∞ requires special care. In particular, the set of functions that need to be integrated is not a linear space.
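A minimal sketch of this logarithmic loss in the setting of Example 1, showing that it is non-negative (hence bounded below) but attains ∞ when a categorical prediction is wrong:

```python
import math

def log_loss(theta_is_B, a1, a2):
    # -log(a1 (1 - a2)) if theta = B, and -log(a2 (1 - a1)) if theta = B^C;
    # the argument lies in [0, 1], so the loss is >= 0 and can be infinite
    prob = a1 * (1.0 - a2) if theta_is_B else a2 * (1.0 - a1)
    return math.inf if prob == 0.0 else -math.log(prob)

print(log_loss(True, 0.9, 0.1))   # finite, roughly 0.21
print(log_loss(True, 0.0, 0.1))   # infinity: a1 = 0 while B occurred
```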

The paper is organized as follows. We start with a general overview of decision theory in Section 2. The next three sections give mathematical background needed to prove the main theorems. The background begins with topology and convergence in Section 3. At that point, we have enough tools to give an overview of the theory of integration with respect to finitely-additive probabilities in Section 4. Because risk functions are integrals of loss functions with respect to various probability distributions, it is necessary to understand what it means to integrate a function with respect to a finitely-additive probability. The final bit of background includes some results on separation of convex sets of unbounded functions in Section 5. After the mathematical background, we return to results specific to decision theory. Throughout the paper, we assume that the loss function is bounded below. No other assumption is made about the structural parts of the decision problem. Section 6 gives properties of the risk sets of general decision problems involving finitely-additive probabilities and randomizations with loss functions that are bounded below. The first of the main theorems is the complete class theorem (Theorem 1) that appears in Section 7. It states that every decision problem has a minimal complete class of rules consisting of admissible Bayes rules. The second theorem is Theorem 2 in Section 8, which says that every decision problem has a minimax rule and a least-favorable distribution. Also, each minimax rule is Bayes with respect to the least-favorable distribution. Section 9 gives some results on the existence and non-existence of Bayes rules with respect to specific types of priors.

2. General Decision Theory. Table 1 contains some notation that gets used frequently.

2.1. Decision Rules.

Definition 1. A non-randomized rule is a function γ : X → A. We denote by H0 the set of all non-randomized rules that the decision maker has available. The risk function of a non-randomized rule γ is the function R(·, γ) : Θ → ĪR defined by

(1) R(θ, γ) = ∫_X L(θ, γ(x)) P_θ(dx).

A randomized rule δ is a coherent prevision on M_{H0}. Its risk function is the function

(2) R(θ, δ) = ∫_{H0} R(θ, γ) δ(dγ).

A randomized rule δ ∈ S_{H0} is called simple.
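When δ is simple, the integral in (2) is a finite convex combination of the risk functions of the finitely many rules that δ supports. A minimal sketch in the no-data setting of Example 1, where a non-randomized rule reduces to an action; the two rules and weights below are hypothetical:

```python
# Risk of a simple randomized rule via equation (2): with delta supported on
# finitely many pure rules, the integral is a finite weighted sum.
rules = [(0.25, 0.75), (0.75, 0.25)]   # two pure rules (a1, a2); hypothetical
weights = [0.5, 0.5]                   # delta's probabilities on those rules

def risk_simple(theta_is_B):
    # reuses loss() from the Example 1 sketch above
    return sum(w * loss(theta_is_B, a1, a2)
               for w, (a1, a2) in zip(weights, rules))

print(risk_simple(True), risk_simple(False))
```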


Table 1
Notation used throughout this document. The subscript Z on symbols like M_Z, Λ_Z, etc. will vary when referring to a specific set like Θ or H0.

Symbol: Meaning
A: The action space
X: The space where available data take their values
Θ: The parameter space that indexes the set of data distributions
P_θ: For each θ ∈ Θ, P_θ is a probability measure on X
ĪR: IR ∪ {∞}, endowed with the topology consisting of all open subsets of IR together with all sets of the form (c,∞) ∪ {∞}
Z: A generic set that serves as the domain of a class of ĪR-valued functions
M_Z: The subset of ĪR^Z consisting of those elements that are bounded below
L(·, ·): The loss function, mapping Θ × A to ĪR and bounded below
P_Z: The set of all coherent previsions (see Section 4) on M_Z
Λ_Z: The subset of P_Z consisting of those coherent previsions that are minimum extensions (see Definition 13) of their restrictions to the bounded functions
S_Z: The subset of Λ_Z consisting of the minimum extensions (see Definition 13) of all simple probability measures on M_Z, i.e., those supported on a finite set
H0: The subset of A^X consisting of all non-randomized rules available to the decision maker
R(·, δ): The risk function of a rule δ, as defined in (2)
R: The set of risk functions for all finitely-additive randomized rules
R1: The subset of R whose randomizations are minimum extensions (see Definition 13)

Each risk function is an element of M_Θ. We can think of a non-randomized rule γ as the special case of a randomized rule δ with δ(A) = I_A(γ), for A ⊆ H0. With this understanding, (2) is a generalization of (1). The operational understanding of a randomized rule δ is that a decision maker can select a γ from H0 using an auxiliary randomization with distribution δ.

We are deliberately vague about which elements of A^X constitute the set H0. The reason is that our results are general enough to cover whatever set H0 one wishes to contemplate. One could choose all of A^X, or just those rules that are formal Bayes rules with respect to various priors, or whatever subset of A^X one wishes. The rules can arise from fixed-sample-size experiments or from sequential experiments. The only structure we assume for the decision problem is that the loss L is bounded below and that all finitely-additive randomizations (as defined in Definition 1) are allowed.

Wald and Wolfowitz (1951) described two types of randomized rules. They refer to the randomizations we defined in Definition 1, leading to risk functions of the form (2), as special randomizations. They refer to a different kind of randomization, which is common in countably-additive decision theory, as a general randomization. A general randomization is a mapping from X to probability distributions over A. In this manner, the randomization performed after observing a particular x can be virtually unrelated to the randomization that is performed after observing each other x. The theory proceeds by defining

(3) L(θ, δ(x)) = ∫_A L(θ, a) δ(x)(da),

and then inserting (3) directly into (1) to define the risk function of a general randomization δ as

(4) R(θ, δ) = ∫_X ∫_A L(θ, a) δ(x)(da) P_θ(dx).

Special randomizations are special cases of general randomizations. For the case in which all of the P_θ probabilities and all of the general randomizations are countably additive, we prove Lemma 30 (in Appendix A), which states that the risk functions that result from general randomizations are included in the set of risk functions that result from special randomizations. Aside from the restriction to countably-additive probabilities, the major difference between general and special randomizations involves the orders in which the integrals are performed. For general randomizations, the inner integral is over the randomization, with the outer integral over the data distribution P_θ. For special randomizations, the integral over the data distribution is inside, and the outer integral is over the randomization. For finitely-additive integrals, the order of integration matters to a larger extent than it does for countably-additive integrals.
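In the finite, countably-additive case the two orders of integration trivially agree, which is the situation covered by Lemma 30. A minimal sketch with a two-point data space; the data distribution, rules, randomization, and loss below are all hypothetical:

```python
# Special vs. general randomization in a finite, countably-additive setting:
# the two iterated integrals of the loss coincide.
X_pts = [0, 1]
P_theta = {0: 0.3, 1: 0.7}                 # data distribution under one theta
gammas = [lambda x: (0.2, 0.8),            # two non-randomized rules
          lambda x: (0.9, 0.1)]
delta = [0.4, 0.6]                         # randomization over the two rules

def L_theta(a):                            # loss at this theta (hypothetical)
    a1, a2 = a
    return (1 - a1) ** 2 + a2 ** 2

# special randomization: data integral inside, randomization outside, as in (2)
special = sum(d * sum(P_theta[x] * L_theta(g(x)) for x in X_pts)
              for d, g in zip(delta, gammas))
# general randomization: randomization inside, data integral outside, as in (4)
general = sum(P_theta[x] * sum(d * L_theta(g(x)) for d, g in zip(delta, gammas))
              for x in X_pts)
assert abs(special - general) < 1e-12
```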

2.2. Admissibility, Risk Sets, and Bayes Rules.

Definition 2. Let Z be a set, and let f, g ∈ ĪR^Z. We say that g dominates f if (i) g(z) ≤ f(z) for all z ∈ Z and (ii) there is z such that g(z) < f(z). The lower boundary of a subset A of ĪR^Z, denoted ∂_L A, is the set of all functions f in the closure of A such that there is no g in the closure of A that dominates f.

The lower boundary of a set A of functions is defined in terms of the closure of A. If more than one topology is available, one needs to be clear about which closure one is using. The topology of pointwise convergence of functions is always available for a function space, since it is the product topology obtained by identifying each function from a space S1 to S2 as an element of S2^{S1}. If the functions are all bounded, the topology of uniform convergence is also available. Every net that converges uniformly also converges pointwise. Hence, the pointwise closure of a set of functions is at least as large as the uniform closure when both are available. Lemma 6 shows that the uniform topology contains the pointwise topology for real-valued functions. For unbounded and ĪR-valued functions, the extensions of the uniform metric to include both bounded and unbounded functions are not very useful. For example, no net of bounded functions can converge uniformly to an unbounded function.

Definition 3. The risk set R is the set of all risk functions of decision rules. A decision rule δ dominates another decision rule δ′ if R(·, δ) dominates R(·, δ′). A rule δ is admissible if no other rule dominates δ. A subset C of the set of all decision rules is called a complete class if, for every δ′ ∉ C, there exists a δ ∈ C such that δ dominates δ′. A set C is an essentially complete class if, for every δ′ ∉ C, there exists δ ∈ C such that R(θ, δ) ≤ R(θ, δ′) for all θ. A(n essentially) complete class is minimal if no proper subset is (essentially) complete.

For the remainder of this paper, the risk set R is the set of all risk functions of randomized rules as defined in Definition 1.

In addition to (as well as related to) the risk function, the Bayes risk of a decision rule with respect to one or several prior distributions is important to decision making. A prior distribution λ over Θ can also be interpreted (as in Section 4) as a coherent prevision on M_Θ, hence as an element of P_Θ.

Definition 4. If δ is a randomized rule, the Bayes risk of δ with respect to λ is the value

(5) r(λ, δ) = ∫_Θ R(θ, δ) λ(dθ).

Let r0(λ) = inf_δ r(λ, δ). If

(6) r(λ, δ0) = r0(λ),

then δ0 is called a Bayes rule with respect to λ.
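For the two-point parameter space of Example 1, (5) is a two-term sum, and a Bayes rule among the pure strategies can be approximated by a grid search. A minimal sketch, reusing loss() from above; the prior value q is hypothetical:

```python
q = 0.3                                    # hypothetical prior probability of B

def bayes_risk(a1, a2):
    # equation (5) for a two-point parameter space
    return q * loss(True, a1, a2) + (1 - q) * loss(False, a1, a2)

grid = [i / 1000 for i in range(1001)]
best = min(((a, 1 - a) for a in grid), key=lambda pair: bayes_risk(*pair))
print(best, bayes_risk(*best))
# without the penalty the minimizer in this family is a1 = q; here the penalty
# is piecewise constant in a1, so the grid search still returns a1 = 0.3
```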

3. Topology Background. Our results rely on the theory of product sets, nets, compact sets, and ultrafilters.

3.1. Product Sets.


Definition 5. Let {Z_α}_{α∈ℵ} be a collection of sets, each of which has a topology. The product topology on the product set ∏_{α∈ℵ} Z_α is the topology that has as a sub-base the collection of all sets of the form ∏_{α∈ℵ} Y_α where Y_α = Z_α for all but at most one α, and for that one α, Y_α is an open subset of Z_α.

If we let Z = ∪_{α∈ℵ} Z_α, then the product set ∏_{α∈ℵ} Z_α can be thought of as the set of all functions f : ℵ → Z such that f(α) ∈ Z_α for all α. If all Z_α are the same set Z, then the product set is often written Z^ℵ.

Definition 6. Let {Z_α}_{α∈ℵ} be a collection of sets. For each β ∈ ℵ, the function f_β : ∏_{α∈ℵ} Z_α → Z_β defined by f_β(g) = g(β) is called an evaluation functional or a coordinate-projection function.

It is not difficult to show that the product topology has two other equivalent characterizations: (i) the smallest topology such that all of the evaluation functionals are continuous, and (ii) the topology of pointwise convergence of the functions in the product set.

Every open set in a topological space is the union of arbitrarily many basic open sets. Each basic open set is the intersection of finitely many sub-basic open sets. Hence, the following result is straightforward.

Proposition 1. For every open set N in a product space, there exist a finite integer n, points α1, . . . , αn ∈ ℵ, and open sets N_j ⊆ Z_{α_j} for j = 1, . . . , n such that N contains

(7) {f : f(α_j) ∈ N_j, for j = 1, . . . , n}.

In particular, every neighborhood of a function g in the product space must contain a set of the form (7) such that g(α_j) ∈ N_j for j = 1, . . . , n.

3.2. Nets. Our main use for nets is to identify the closure of a set.

Definition 7. A partial order on a set D is a binary relation ≤_D with the following properties: (i) for all η ∈ D, η ≤_D η (reflexive), and (ii) if η ≤_D β and β ≤_D γ, then η ≤_D γ (transitive). A directed set is a set D with a partial order ≤_D that has the following additional property: (iii) for all η, β ∈ D, there exists γ ∈ D such that η ≤_D γ and β ≤_D γ.

Definition 8. Let D be a directed set with partial order ≤_D. A function r : D → T is called a net on D in T. Such a net is denoted either (D, r) or x = {x_η}_{η∈D}, where x_η = r(η) for η ∈ D.¹ If {x_η}_{η∈D} is a net, then for each η ∈ D, the set

A_η = {x_β : η ≤_D β}

is called a tail of x. If T is a topological space, we say that the net x converges to x* ∈ T if every neighborhood of x* contains a tail of x.

Note that the set of positive integers is a directed set, and each sequence is a net on the positive integers. Every result that holds for all nets holds a fortiori for all sequences. There are nets that are not sequences.

Example 2. Let D be the collection of all finite subsets of an uncountable set X. Create the partial order ≤_D on D defined by η1 ≤_D η2 if η1 ⊆ η2. Notice that η_j ≤_D (η1 ∪ η2) for j = 1, 2, making D a directed set. Let T be the collection of all indicator functions of subsets of X, and let x_η = I_η for each η ∈ D. It is straightforward to show that {x_η}_{η∈D} converges to I_X in the topology of pointwise convergence of real-valued functions on X. It is also easy to see that no sequence of indicators of finite sets can converge to the indicator of an uncountable set.
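A finite stand-in for Example 2 (the three-element universe below replaces the uncountable X purely for illustration): once a finite set η contains a fixed point z, every later element in the inclusion order also contains z, so the value of the net at z is eventually constant at 1.

```python
# Pointwise convergence of the net of Example 2, on a small stand-in set.
universe = {"a", "b", "c"}            # stands in for the uncountable X

def indicator(eta):
    return lambda z: 1 if z in eta else 0

z = "b"
for eta in [set(), {"a"}, {"a", "b"}, {"a", "b", "c"}]:
    print(sorted(eta), indicator(eta)(z))
# every superset of {"b"} (a tail in the partial order by inclusion) gives 1,
# so the net's value at z is eventually the constant 1
```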

The following results are taken from Dunford and Schwartz (1957).

Proposition 2 (I.7.2, p. 27). The closure of a set C is the set of all limits of nets in C that converge.

Proposition 3 (I.8.2, p. 32). The product of a collection of Hausdorff spaces is Hausdorff in the product topology.

Definition 9. Let x = {x_η}_{η∈D} be a net in a topological space T. We call a net y = {y_γ}_{γ∈D′} a subnet of x if there exists a function h : D′ → D with the following properties: (i) y_γ = x_{h(γ)} for all γ ∈ D′, (ii) γ1 ≤_{D′} γ2 implies h(γ1) ≤_D h(γ2), and (iii) for every η ∈ D there exists γ ∈ D′ such that h(γ) ≥_D η. A cluster point of x is a point p ∈ T such that, for every neighborhood N of p and every η ∈ D, there exists η′ ∈ D such that η′ ≥_D η and x_{η′} ∈ N.

It is trivial to see that a net is a subnet of itself. If we think of a sequence as a net, then it is also trivial that a subsequence is a subnet.

¹The latter notation makes the net look like a generalization of a sequence, hence the alternative name generalized sequence.


Lemma 1. For a convergent net in a Hausdorff space, there is a unique limit that is also the unique cluster point.

Proof. Clearly, the limit of a convergent net is a cluster point. The proof will be complete if we prove that every cluster point equals every limit. Assume to the contrary that there exists a cluster point p′ that is not equal to a limit p of a convergent net {x_η}_{η∈D}. Let N and N′ be disjoint neighborhoods of p and p′ respectively. There exists η0 ∈ D such that x_η ∈ N for all η ≥_D η0. Because p′ is a cluster point, there exists η′ ∈ D such that η0 ≤_D η′ and x_{η′} ∈ N′. But x_{η′} ∈ N also, which contradicts N′ ∩ N = ∅.

Lemma 2. Let x = {x_η}_{η∈D} be a net in a topological space T, and suppose that p is a cluster point. Then there exists a subnet {y_γ}_{γ∈D′} of x that converges to p.

Proof. For each neighborhood N of p and each η ∈ D, let α_{N,η} ∈ D be such that η ≤_D α_{N,η} and x_{α_{N,η}} ∈ N. Let

D′ = {(N, η) : η ∈ D and N is a neighborhood of p}.

We show next that D′ is a directed set. Define ≤_{D′} by (N1, η1) ≤_{D′} (N2, η2) to mean N2 ⊆ N1 and η1 ≤_D η2, which is clearly transitive and reflexive. If (N1, η1), (N2, η2) ∈ D′, let η′ ∈ D be such that η_j ≤_D η′ for j = 1, 2. Then for j = 1, 2, (N_j, η_j) ≤_{D′} (N1 ∩ N2, η′) ∈ D′. Let h : D′ → D be h(N, η) = α_{N,η}, which clearly satisfies properties (ii) and (iii) in the definition of subnet. Let y_{(N,η)} = x_{α_{N,η}} for each (N, η) ∈ D′. To see that {y_{(N,η)}}_{(N,η)∈D′} converges to p, let N0 be a neighborhood of p. Then y_{(N,η)} ∈ N0 for all N ⊆ N0 and all η ∈ D.

Lemma 3. A net converges if and only if every subnet converges to the same limit.

Proof. The “if” direction is trivial because the net itself is a subnet. For the “only if” direction, assume that the net {x_η}_{η∈D} converges to p. Let {y_γ}_{γ∈D′} be a subnet with corresponding function h. Let N be a neighborhood of p. Let η_N be such that for all η ≥_D η_N, x_η ∈ N. Let γ_N ∈ D′ be such that h(γ_N) ≥_D η_N. If γ ≥_{D′} γ_N, then h(γ) ≥_D h(γ_N) ≥_D η_N and y_γ = x_{h(γ)} ∈ N.

The following results are straightforward.


Proposition 4. Let x = {x_η}_{η∈D} be a net in a topological space T with a subnet {y_γ}_{γ∈D′}. Let f : T → V be a function to another topological space V. Then {f(x_η)}_{η∈D} is a net in V with a subnet {f(y_γ)}_{γ∈D′}.

Proposition 5. Let Z be a set, and let {f_η}_{η∈D} and {g_η}_{η∈D} be convergent nets in ĪR^Z. Let f and g be the respective limits. If f_η(z) ≤ g_η(z) for all z ∈ Z and η ∈ D, then f(z) ≤ g(z) for all z ∈ Z.

3.3. Compact Sets. The following results are taken from Dunford and Schwartz (1957).

Proposition 6 (I.7.9, p. 29). A topological space is compact if and only if every net has a cluster point.

Proposition 7 (I.5.7(a), p. 17). A closed subset of a compact space is compact.

Proposition 8 (I.8.5, p. 32). The product of a collection of compact spaces is compact in the product topology.

It is well known that a subset of IR is compact if and only if it is closed and bounded.

Lemma 4. For each real number b, the set [b,∞] is a compact Hausdorff subset of ĪR.

Proof. First, we prove that ĪR is Hausdorff, hence so is every subset. Let x ≠ y be elements of ĪR. If both are finite, then the open intervals (x − ε, x + ε) and (y − ε, y + ε) are disjoint, where ε < |x − y|/2. If x < ∞ = y, then (x − 1, x + 1) and (x + 2,∞] are disjoint open intervals. Next, we prove that [b,∞] is compact. Let A = {A_α}_{α∈ℵ} be an open cover of [b,∞]. Since ∞ ∈ [b,∞], there exists α0 ∈ ℵ such that A_{α0} contains a set of the form (c,∞]. For the other elements of A, let B_α = A_α \ {∞}. Then {B_α}_{α≠α0} is an open cover of [b, c], which has a finite subcover, because [b, c] is a compact subset of IR. That finite subcover, together with A_{α0}, forms a finite subcover of [b,∞] from A.

The supremum and infimum are functionals defined on spaces of ĪR-valued functions. They are not continuous in general, but they do have the following property.


Lemma 5. Let {f_η}_{η∈D} be a convergent net (with limit f0) in ĪR^Z, for some set Z.

• If sup_{z∈Z} f_η(z) ≤ c_η for all η and {c_η}_{η∈D} converges to c0, then sup_{z∈Z} f0(z) ≤ c0.
• If inf_{z∈Z} f_η(z) ≥ c_η for all η and {c_η}_{η∈D} converges to c0, then inf_{z∈Z} f0(z) ≥ c0.

Proof. Consider the claim about the supremum first. The claim is vacuous if c0 = ∞. If c0 < ∞, suppose, to the contrary, that sup_{z∈Z} f0(z) > c0. Then there exist ε > 0 and z0 ∈ Z such that f0(z0) > c0 + ε. Because f_η converges pointwise to f0, there exists η0 such that η ≥_D η0 implies that f_η(z0) > c0 + 2ε/3. Since c_η converges to c0, there exists η1 such that η ≥_D η1 implies that c_η < c0 + ε/3, hence f_η(z0) < c0 + ε/3. There exists η2 such that η_j ≤_D η2 for j = 0, 1. Hence, η ≥_D η2 implies both f_η(z0) > c0 + 2ε/3 and f_η(z0) < c0 + ε/3, a contradiction. A similar argument works for the claim about the infimum.

Lemma 5 has a useful corollary.

Corollary 1. Let c ∈ ĪR. Then {f : inf_{z∈Z} f(z) ≥ c} and {f : sup_{z∈Z} f(z) ≤ c} are closed in the pointwise topology.

3.4. Ultrafilters.

Definition 10. Let S be a set and let U be a non-empty collection of subsets of S. We call U an ultrafilter on S if (i) A ∈ U and A ⊆ B implies B ∈ U, (ii) A, B ∈ U implies A ∩ B ∈ U, and (iii) for every A ⊆ S, A ∈ U if and only if A^C ∉ U. An ultrafilter on S is principal if there exists s ∈ S such that U consists of all subsets of S that contain s. Such an s is called the atom of U. Other ultrafilters are called non-principal.

The following result is from Comfort and Negrepontis (1974), and it gives general conditions under which ultrafilters exist.

Proposition 9 (Theorem 2.18, p. 39). Let B be a Boolean algebra, and let F ⊂ B. If F has the finite intersection property, then there is an ultrafilter on B that contains F.

Ultrafilters on directed sets are the main ones that we will need. If D is a directed set, then the collection of all tails of D has the finite-intersection property (every finite subcollection has non-empty intersection). By Proposition 9, there is an ultrafilter U_D that contains all tails of D.
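Only principal ultrafilters can be exhibited explicitly; non-principal ones come from Proposition 9 together with a choice principle. A minimal sketch checking properties (i)-(iii) of Definition 10 for the principal ultrafilter at a chosen atom of a three-element set:

```python
from itertools import chain, combinations

S = {1, 2, 3}
atom = 2
subsets = [frozenset(c) for c in
           chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]
U = {A for A in subsets if atom in A}      # principal ultrafilter at `atom`

# (i) upward closed
assert all(B in U for A in U for B in subsets if A <= B)
# (ii) closed under finite intersection
assert all(A & B in U for A in U for B in U)
# (iii) exactly one of A and its complement belongs to U
assert all((A in U) != (frozenset(S - A) in U) for A in subsets)
```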


3.5. Uniform Topology. If F is a space of bounded functions, there is an alternative topology that can be used with F.

Definition 11. Let F be the set of bounded real-valued functions defined on Z. The topology of uniform convergence (also called the uniform topology) has as a sub-base all sets of the form

(8) V_{f,ε} = {g : |f(z) − g(z)| < ε, for all z},

for each f ∈ F and each ε > 0.

There is a connection between the uniform topology and the pointwise (product) topology for a set of bounded functions.

Lemma 6. Let F be the set of bounded real-valued functions defined on a set Z. Every subset A of F that is closed (open) in the pointwise topology is also closed (open) in the uniform topology.

Proof. Since a set is closed if and only if its complement is open, proving one statement proves the other. Let A be open in the pointwise topology. We need to show that, for every f ∈ A, there is a uniform neighborhood of f contained in A. Let f ∈ A. Since A is open (in the pointwise topology), it contains a (pointwise) basic open set M that includes f. We know that M has the form {g : g(z_j) ∈ M_j, for j = 1, . . . , n} for some integer n, some open sets M1, . . . , M_n in IR, and some z1, . . . , z_n in Z. Since f ∈ M, each M_j contains an interval of the form (f(z_j) − ε_j, f(z_j) + ε_j). Let ε = min{ε1, . . . , ε_n}. Then M contains G = {g : |g(z_j) − f(z_j)| < ε, for j = 1, . . . , n}. The set V_{f,ε} defined by (8) is a subset of G. So V_{f,ε} ⊆ G ⊆ M ⊆ A.

The converse to Lemma 6 is false.

Example 3. Let f ∈ F and let A be the ε ball around f, i.e.,

A = {g ∈ F : |g(z) − f(z)| < ε for all z}.

Then A is a basic open set in the uniform topology. On the other hand, for every open set B in the pointwise topology there exists z0 ∈ Z such that, for every real r, there exists g ∈ B such that g(z0) = r. If we choose r ∉ (f(z0) − ε, f(z0) + ε), we see that g ∉ A.

For the remainder of the paper, we will mention which topologies are used in the proofs that rely on topological features.


4. Coherent Previsions.

4.1. Unbounded and Infinite-Valued Functions. De Finetti (1974) laid out the theory of coherent previsions for bounded random variables in great detail. In particular, a finitely-additive probability can be defined on an arbitrary collection of subsets of a general set Z, including the power set. For this reason, measurability conditions are often not included in theorems about finitely-additive probabilities.

In this paper, we need an extension of the finitely-additive theory from bounded random variables both to unbounded random variables, as was done by Schervish, Seidenfeld and Kadane (2014), and to random variables that assume the value ∞ as well. We stop short of defining coherent previsions for random variables that both assume the value ∞ and are unbounded below.

Definition 12. Let Z be a set, and let F be a subset of ĪR^Z. Let P : F → ĪR ∪ {−∞}. An acceptable gamble is a linear combination of the form

(9) G(z) = ∑_{j=1}^n α_j [X_j(z) − c_j],

where each X_j ∈ F, c_j = P(X_j) for each j such that P(X_j) is finite, α_j ≥ 0 for each j such that either P(X_j) = ∞ or there is z such that X_j(z) = ∞, and α_j ≤ 0 for each j such that P(X_j) = −∞. We call P a coherent prevision on F if sup_z G(z) ≥ 0 for every acceptable gamble G.

The conditions involving infinite values are needed to prevent ∞ − ∞ from appearing in the formula for an acceptable gamble. If P is a coherent prevision on F, X ∈ F, and −X ∈ ĪR^Z, then P(−X) = −P(X) is coherent with P even if −X ∉ F and even if P(X) ∈ {∞, −∞}. (Note that such an X could not assume the value ∞.) Indeed, every acceptable gamble that involves P(X_j) = −∞ is equal to one with X_j replaced by −X_j, so we could define coherence without allowing −∞ previsions in the formula for an acceptable gamble. For a set F containing functions that are all bounded below, a coherent prevision cannot take the value −∞.

The following result allows us to extend coherent previsions to sets that are either convex or linear.

Proposition 10. Let P be a coherent prevision on a subset F ⊆ ĪR^Z consisting of functions that are bounded below. Then it is coherent to extend P to the convex hull of F by P(α1 X1 + · · · + α_n X_n) = ∑_{j=1}^n α_j P(X_j) for all finite n, all X1, . . . , X_n ∈ F, and all non-negative α1, . . . , α_n that add to 1. It is also coherent to extend P to all linear combinations of the form α1 X1 + · · · + α_n X_n where each X_j ∈ F satisfies P(X_j) < ∞ and α_j ≥ 0 for each j for which there exists z such that X_j(z) = ∞. The extension is P(α1 X1 + · · · + α_n X_n) = ∑_{j=1}^n α_j P(X_j).

For bounded functions, we have the following characterization of coherent previsions.

Proposition 11. Let F be a linear space of bounded functions. Then P is a coherent prevision on F if and only if P is a linear functional that satisfies (i) P(1) = 1, and (ii) P(X) ≥ 0 for all non-negative X.

The following trivial result is useful, but it illustrates the limitations associated with previsions of unbounded functions.

Lemma 7. If F is a set of unbounded functions that are all bounded below, there is a coherent prevision P with P(f) = ∞ for all f ∈ F.

Proof. In (9), every α_j ≥ 0 is required, hence (9) is unbounded above so long as at least one α_j ≠ 0.

The following result says that a coherent prevision on the set of all bounded random variables can be extended, in one operation, to all random variables bounded below. This is in contrast to the fundamental theorem of prevision, which allows more general extensions, but those more general extensions require that previsions be assigned to new random variables one at a time. Heath and Sudderth (1978, eqn. 1.2) state a similar result without being explicit about the possibility of the extension taking infinite values.

Lemma 8. Let J be a linear space of bounded real-valued functions defined on a set Z, and let P be a coherent prevision on J. Let K ⊃ J be a subset of M_Z such that, for each X ∈ K and each real m ≥ 0, X ∧ m ∈ J. For each X ∈ K, define

P′(X) = sup_{Y∈J, Y≤X} P(Y).

Then P′ = P on J. Let K0 be the set of all linear combinations of elements of K ∪ J of the form ∑_{j=1}^n α_j X_j where α_j > 0 for each j such that P′(X_j) = ∞ or such that {z : X_j(z) = ∞} ≠ ∅. Define Q(Y) = ∑_{j=1}^n α_j P′(X_j) for Y ∈ K0. Then Q is well-defined and is a coherent prevision on K0.


Proof. Because P is coherent, Y ≤ X implies P(Y) ≤ P(X) for X, Y ∈ J. Hence P′(X) = P(X) for X ∈ J. Also, P′(αX) = αP′(X) if α > 0. For this reason, there is no loss of generality in assuming that all α_j are 1 or −1.

Next, we show that P′(X1 + X2) = P′(X1) + P′(X2) for X1, X2 ∈ K. Notice that P′(X) = lim_{m→∞} P(X ∧ m), because every bounded Y ≤ X is bounded above by X ∧ m where m = sup_z Y(z). Then notice that, for all m,

(X1 ∧ m/2) + (X2 ∧ m/2) ≤ (X1 + X2) ∧ m ≤ (X1 ∧ m) + (X2 ∧ m).

The limits of the left-hand and right-hand expressions are both P′(X1) + P′(X2), while the limit of the middle expression is P′(X1 + X2).

Next, we show that Q is well-defined. We need to show that, if (i) X = ∑_{j=1}^n α_j X_j with α_j = 1 for each j such that P′(X_j) = ∞ or {z : X_j(z) = ∞} ≠ ∅, (ii) Y = ∑_{k=1}^m β_k Y_k with β_k = 1 for each k such that P′(Y_k) = ∞ or {z : Y_k(z) = ∞} ≠ ∅, and (iii) X(z) = Y(z) for all z, then Q(X) = Q(Y). Rewrite the equation

∑_{j=1}^n α_j X_j = ∑_{k=1}^m β_k Y_k

by moving all terms with −1 coefficients to the other side. The result has the form

Z1 + · · · + Z_ℓ = W1 + · · · + W_s,

where each Z_i is either an X_j with α_j = 1 or a Y_k with β_k = −1, and each W_i is either a Y_k with β_k = 1 or an X_j with α_j = −1. Also, all of the Z_i and W_i are in K. We just proved that P′ is additive on K, so

P′(Z1) + · · · + P′(Z_ℓ) = P′(W1) + · · · + P′(W_s).

Now, move the terms that corresponded to negative coefficients (all of which are finite) back to their original sides of the equation, and we get that Q(X) = Q(Y).

Finally, we prove that Q is coherent. Let K1 be that part of K0 where Q is finite and each element of K1 is real-valued. From what we've already shown, it follows that, on K1, (i) Q is a linear functional, (ii) Q(1) = 1, and (iii) Q(X) ≥ 0 if X ≥ 0. It follows that Q is a coherent prevision on K1. The only case of gambles not yet covered is that involving both elements of K1 and functions with infinite prevision and/or functions that take the value ∞. Let Q(X1) = ∞, let Q(X2) be finite, and let X3 take the value ∞ with Q(X3) finite. We need to show that sup_z [X1(z) − c + X2(z) − Q(X2) + X3(z) − Q(X3)] ≥ 0 and sup_z [X1(z) − c + X2(z) − Q(X2)] ≥ 0. First, note that

sup_z [X1(z) − c + X2(z) − Q(X2) + X3(z) − Q(X3)] = ∞ > 0.

Finally, let Y ∈ J be such that Y ≤ X1 and ∞ > P(Y) > c. Then,

sup_z [X1(z) − c + X2(z) − Q(X2)] ≥ sup_z [Y(z) − P(Y) + X2(z) − Q(X2)] = sup_z [Y(z) − Q(Y) + X2(z) − Q(X2)],

which is non-negative because Y and X2 are in K1.

Definition 13. We refer to Q in Lemma 8 as the minimum extension of P. If a prevision Q is the minimum extension of its restriction to the bounded random variables, then we say that Q is a minimum extension.

The term “minimum extension” is used because it assigns to all functions that are bounded below the minimum of all possible coherent previsions that are consistent with P. It is straightforward to see that the measure-theoretic definition of expectation with respect to a countably-additive probability is a minimum extension.
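For a countably-additive discrete probability, the minimum extension is computed by the truncation limit P′(X) = lim_{m→∞} P(X ∧ m) used in Lemma 8. A minimal numerical sketch (the distribution and random variables are hypothetical), with one finite and one infinite minimum extension:

```python
# Minimum extension by truncation: P'(X) = lim_{m -> infinity} P(X ∧ m).
pmf = {n: 2.0 ** (-n) for n in range(1, 60)}   # ~ geometric(1/2) on {1, 2, ...}

def truncated_prevision(X, m):
    return sum(p * min(X(n), m) for n, p in pmf.items())

X1 = lambda n: float(n)     # P'(X1) = sum of n 2^{-n} = 2, finite
X2 = lambda n: 2.0 ** n     # truncations grow without bound: P'(X2) = infinity
for m in (10.0, 1e3, 1e6):
    print(m, truncated_prevision(X1, m), truncated_prevision(X2, m))
```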

Lemma 9. Let X be bounded below. Let P be a minimum extension such that P(X) < ∞. Then P(X) = lim_{m→∞} P[X I_{{X≤m}}].

Proof. Since P[X I_{{X≤m}}] is non-decreasing in m, it converges to some number c1 ≤ P(X). Hence mP(X > m) converges to c2 = P(X) − c1 ≥ 0. We need to prove that c2 = 0. Assume, to the contrary, that c2 > 0. Then

lim_{n→∞} [2^n P(X > 2^n)] / [2^{n−1} P(X > 2^{n−1})] = 1.

Also,

(10) P(X) ≥ ∑_{n=1}^∞ 2^{n−1} P(2^{n−1} < X ≤ 2^n).

For each n ≥ 1, let d_n = 2P(X > 2^n)/P(X > 2^{n−1}). We've assumed that lim_{n→∞} d_n = 1. So P(2^{n−1} < X ≤ 2^n) = P(X > 2^n)(1 + ε_n) where lim_{n→∞} ε_n = 0. It follows from (10) that

P(X) ≥ ∑_{n=1}^∞ 2^{n−1} P(X > 2^{n−1})(1 + ε_n) = ∞,

which contradicts P(X) < ∞.


Lemma 10. Let X be bounded below. Let P be a minimum extension such that P(X) < ∞. Let A = {X = ∞}. Then P(X I_A) = 0 and P(A) = 0.

Proof. Lemma 9 tells us that P(X I_A) = lim_{m→∞} P(X I_A I_{{X≤m}}), but I_A I_{{X≤m}} = 0, so P(X I_A I_{{X≤m}}) = 0 for all m and P(X I_A) = 0. If P(A) > 0, then P([X ∧ m] I_{{X>0}}) ≥ P([X ∧ m] I_A) ≥ mP(A), which goes to infinity as m → ∞, contradicting P(X) < ∞.

Since we make use of Lemma 7 in the proof of Theorem 2, we should note that the coherent prevision in Lemma 7 will not in general be a minimum extension.

Example 4. Let Z be an uncountable set. Then Z contains uncountably many disjoint countable subsets. Let B be an uncountable collection of countable disjoint subsets of Z. For each B ∈ B, with enumeration B = {z1, z2, . . .}, define the unbounded function f_B(z) = ∑_{n=1}^∞ n I_{{z_n}}(z). The minimum extension of a probability λ on Z assigns positive prevision to f_B if and only if λ(B) > 0. At most countably many elements of B can have positive probability, hence at most countably many of the functions f_B can have positive (let alone infinite) prevision under the minimum extension of λ.

4.2. Ultrafilter Probabilities. There is a correspondence between ultrafilters and 0-1-valued probabilities.

Lemma 11. Let Z be a set, and let F be a field of subsets of Z. A finitely-additive probability P defined on F takes only the values 0 and 1 if and only if (i) P can be extended to P′ defined on 2^Z and (ii) there is an ultrafilter U of subsets of Z such that P′(E) = 1 if and only if E ∈ U.

Proof. For the “if” direction, the restriction of P′ to F takes only the values 0 and 1. For the “only if” direction, V = {E ∈ F : P(E) = 1} has the finite-intersection property, hence there is an ultrafilter U of subsets of Z such that V ⊆ U. Define P′(E) = 1 if E ∈ U and P′(E) = 0 if E ∉ U. It is clear that P′ is finitely-additive on 2^Z.

Definition 14. Let U be an ultrafilter of subsets of some set Z. We call the probability P defined by P(E) = 1 if E ∈ U and P(E) = 0 if E ∉ U the probability corresponding to U.

Lemma 12. Let D be a set. Let U be an ultrafilter of subsets of D. Let P be the minimum extension of the probability on D that corresponds to U. Then, for each f ∈ M_D,

∫_D f(η) P(dη) = sup_{B∈U} inf_{η∈B} f(η) = inf_{B∈U} sup_{η∈B} f(η).

Proof. Let k0 = ∫_D f(η) P(dη). Since P(B) = 1 for all B ∈ U, Lemma 10 implies that k0 ≤ sup_{η∈B} f(η) for all B ∈ U. Similarly, k0 ≥ inf_{η∈B} f(η) for all B ∈ U. It follows that

(11) sup_{B∈U} inf_{η∈B} f(η) ≤ k0 ≤ inf_{B∈U} sup_{η∈B} f(η).

If the two endpoints, call them a ≤ b, of (11) are not equal, then for each c ∈ (a, b) precisely one of {η : f(η) ≤ c} or {η : f(η) > c} is in U. If it is the first of these, it contradicts b = inf_{B∈U} sup_{η∈B} f(η). If it is the second one, it contradicts a = sup_{B∈U} inf_{η∈B} f(η). Hence a = b, and both inequalities in (11) are equalities.

An alternative expression for ∫_D f(η) P(dη) is

∫_D f(η) P(dη) = sup{x : f^{−1}([x,∞]) ∈ U} = inf{x : f^{−1}((−∞, x]) ∈ U}.

The proof is similar to the proof of Lemma 12.
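For a principal ultrafilter, both displayed formulas can be checked by brute force, and the integral is simply evaluation at the atom. A minimal sketch on a four-point set (non-principal ultrafilters on infinite sets cannot be listed explicitly):

```python
import itertools

D = [0, 1, 2, 3]
atom = 2                               # atom of a principal ultrafilter on D
f = {0: 5.0, 1: -1.0, 2: 3.5, 3: 7.0}
U = [set(c) for r in range(1, len(D) + 1)
     for c in itertools.combinations(D, r) if atom in c]

sup_inf = max(min(f[e] for e in B) for B in U)   # sup_B inf_{eta in B} f(eta)
inf_sup = min(max(f[e] for e in B) for B in U)   # inf_B sup_{eta in B} f(eta)
assert sup_inf == inf_sup == f[atom]             # the integral is f(atom)
```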

Lemma 13. Let Z be a set. Let D be a directed set and let f = {f_η}_{η∈D} be a net in M_Z. An element g of ĪR^Z is a cluster point of f if and only if there is an ultrafilter U that contains all tails of D whose corresponding probability on D has minimum extension P such that g(z) = ∫_D f_η(z) P(dη) for all z ∈ Z.

Proof. For the “only if” direction, assume first that f converges to g. Let U be any ultrafilter that contains all tails of D, and let P be the minimum extension of the corresponding probability on D. Let h(z) = ∫_D f_η(z) P(dη) for each z ∈ Z. We need to show that, for each z ∈ Z and each neighborhood N of g(z), h(z) ∈ N. Let z ∈ Z. Lemma 12, applied to ultrafilters of subsets of D, says that

(12) sup_{B∈U} inf_{η∈B} f_η(z) = h(z) = inf_{B∈U} sup_{η∈B} f_η(z).

If g(z) is finite, let ε > 0, and let N be the interval (g(z) − ε, g(z) + ε). Because f converges to g, there exists η ∈ D such that f_β(z) ∈ N for all β ≥_D η. Let B = A_η = {β ∈ D : η ≤_D β}, which is in U. It follows that the left and right sides of (12) are respectively at least g(z) − ε and at most g(z) + ε. Hence h(z) ∈ N. If g(z) = ∞, let N = (c,∞]. Then there exists η ∈ D such that f_β(z) > c for all β ≥_D η. Let B = A_η, which is in U. It follows that the left side of (12) is at least c, so that h(z) ∈ N. Next, assume that g is merely a cluster point of f. Then there exists a subnet f′ = {r_τ}_{τ∈D′} that converges to g. The previous argument shows that for every ultrafilter U′ on D′ that contains all tails of D′, g(z) = ∫_{D′} r_τ(z) P′(dτ), where P′ is the probability that corresponds to U′. Let h : D′ → D be the cofinal function that embeds D′ in D, preserving order. Then r_τ = f_{h(τ)}, so that g(z) = ∫_{D′} f_{h(τ)}(z) P′(dτ). Let P be the probability on D induced by h from P′. Then

∫_D f_η(z) P(dη) = ∫_{D′} f_{h(τ)}(z) P′(dτ).

For the “if” direction, let U be an ultrafilter that contains all tails of D such that g(z) = ∫_D f_η(z) P(dη) for all z ∈ Z, where P is the minimum extension of the probability on D that corresponds to U. We need to show that g is a cluster point of f. Specifically, we need to show that, for every neighborhood N of g and every η ∈ D, there exists β ≥_D η such that f_β ∈ N. It suffices to prove this claim for all neighborhoods of the form N = {h : h(z_j) ∈ N_j, for j = 1, . . . , n}, for arbitrary positive integer n, distinct z1, . . . , z_n ∈ Z, and neighborhoods N1, . . . , N_n of the form N_j = (g(z_j) − ε, g(z_j) + ε) for ε > 0 if g(z_j) is finite and N_j = (c,∞] if g(z_j) = ∞. So, let N be of the form just described, and let η ∈ D. We need to find β ∈ D such that η ≤_D β and f_β(z_j) ∈ N_j for j = 1, . . . , n. We know that, for all z ∈ Z,

sup_{B∈U} inf_{β∈B} f_β(z) = g(z) = inf_{B∈U} sup_{β∈B} f_β(z).

For each j such that g(z_j) is finite, let B_j ∈ U be such that

inf_{β∈B_j} f_β(z_j) > g(z_j) − ε and sup_{β∈B_j} f_β(z_j) < g(z_j) + ε.

For each j such that g(z_j) = ∞, let B_j ∈ U be such that

inf_{β∈B_j} f_β(z_j) > c.

If we replace B_j by B_j ∩ A_η, all of the last three inequalities above continue to hold. For each j, let β_j ∈ B_j, and let β ≥_D β_j for all j. Then η ≤_D β and f_β(z_j) ∈ N_j for all j.


Corollary 2. Let Z be a set. Let D be a directed set and let f = {f_η}_{η∈D} be a net in ĪR^Z. The net f converges to g if and only if, for every ultrafilter U that contains all tails of D with corresponding probability on D having minimum extension P, g(z) = ∫_D f_η(z) P(dη) for all z ∈ Z.

4.3. Topology of Coherent Previsions. The set of coherent previsions is a set of functions defined on the space of random variables (functions) under consideration. In general, if Z is a space and we think of M_Z as the set of all ĪR-valued functions that are bounded below, we can think of P_Z as a set of functions from M_Z to ĪR. As such, P_Z has a product topology, which is also the topology of pointwise convergence. That is, a net {Q_η}_{η∈D} in P_Z converges to R if and only if {Q_η(f)}_{η∈D} converges to R(f) for all f ∈ M_Z.

Lemma 14. Let Z be a set. Let Q = {Q_η}_{η∈D} be a net of coherent previsions on a subset J of M_Z. Every cluster point of Q is a coherent prevision on J.

Proof. Since every cluster point of every net is the limit of some (possibly different) convergent net, it suffices to prove the result for limits of convergent nets. Let R be the limit of Q so that, for each J ∈ J, lim_η Q_η(J) = R(J). We show that R is a coherent prevision indirectly. Suppose to the contrary that there are J1, . . . , J_n ∈ J, non-zero real numbers α1, . . . , α_n and c1, . . . , c_n, and ε > 0 such that

∑_{j=1}^n α_j [J_j(z) − c_j] ≤ −ε,

for all z ∈ Z and such that α_j ≥ 0 for each j such that R(J_j) = ∞ or J_j(y) = ∞ for some y, and c_j = R(J_j) for each j such that R(J_j) < ∞. Since Q_η(J_j) converges to R(J_j) for each j, there exists η0 ∈ D such that |Q_{η0}(J_j) − R(J_j)| ≤ ε/(n|α_j|) for all j such that R(J_j) < ∞ and Q_{η0}(J_j) ≥ c_j for each j such that R(J_j) = ∞. It follows that

(13) ∑_{j=1}^n α_j [J_j(z) − d_j] ≤ −ε/2,

where d_j = Q_{η0}(J_j) for each j such that R(J_j) < ∞ and d_j = c_j for each j such that R(J_j) = ∞. Since Q_{η0}(J_j) ≥ d_j for each j such that Q_{η0}(J_j) < ∞, the left-hand side of (13) becomes even more negative if we replace each d_j by Q_{η0}(J_j) when R(J_j) = ∞ but Q_{η0}(J_j) < ∞. The resulting inequality implies that Q_{η0} is incoherent as a prevision for elements of J, a contradiction.


Lemma 15. Let Z be a set. Then S_Z is dense in Λ_Z in the topology of pointwise convergence.

Proof. Let λ0 ∈ Λ_Z. Let N′ be a neighborhood of λ0 in the pointwise convergence (product) topology. Then N′ contains a basic open set of the form

(14) N = {λ ∈ Λ_Z : λ(f_j) ∈ N_j, for j = 1, . . . , n},

where each f_j ∈ ĪR^Z is bounded below and each N_j is a neighborhood of λ0(f_j) in ĪR. Define

J1 = {j : λ0(f_j) < ∞},  J2 = {j : λ0(f_j) = ∞}.

Without loss of generality, we can assume that there exist ε_j > 0 for each j ∈ J1 and c_j for each j ∈ J2 such that N_j = (λ0(f_j) − ε_j, λ0(f_j) + ε_j) for each j ∈ J1, and N_j = (c_j, ∞] for each j ∈ J2. Let ε = min_{j∈J1} ε_j and c = max_{j∈J2} c_j. For each j ∈ J2, there exists m_j such that λ0(f_j ∧ m_j) > c + ε. Let a = min_j inf_z f_j(z). Let g_j = f_j − a for j ∈ J1 and g_j = (f_j ∧ m_j) − a for j ∈ J2. This makes each g_j non-negative. We will find a simple probability measure λ_N such that

|λ0(g_j) − λ_N(g_j)| < ε for all j ∈ J1 and λ_N(g_j) > c − a for all j ∈ J2.

Since f_j ≥ g_j + a for j ∈ J2, the above clearly implies that λ_N ∈ N. Let g = ∑_{j=1}^n g_j. Note that λ0(g) < ∞. Let m be large enough so that λ0(g) < λ0(g I_{{g≤m}}) + ε/3, which exists by Lemma 9. Note that

∑_{j=1}^n λ0(g_j I_{{g≤m}}) = λ0(g I_{{g≤m}}) > λ0(g) − ε/3 = ∑_{j=1}^n λ0(g_j) − ε/3.

Hence

ε/3 > ∑_{j∈J1} [λ0(g_j) − λ0(g_j I_{{g≤m}})].

Since each λ0(g_j) ≥ λ0(g_j I_{{g≤m}}), we have λ0(g_j) < λ0(g_j I_{{g≤m}}) + ε/3 for each j. Let A = {z : g(z) ≤ m}. Let ℓ = ⌈3m/ε⌉. For each j ∈ J1 and k = 1, . . . , ℓ, let A_{j,k} = A ∩ {z : (k − 1)ε ≤ g_j(z) < kε}. Then, for each j ∈ J1, A_{j,1}, . . . , A_{j,ℓ} partitions A. Let B1, . . . , B_r be the elements of the common refinement of these partitions that satisfy λ0(B_k) > 0 for each k. If z_k ∈ B_k for all k, then

|∑_{k=1}^r λ0(B_k) g_j(z_k) − λ0(g_j I_A)| < 2ε/3,

for all j. Let z0 ∈ A and B0 = A^C. For E ⊆ Z, define

λ_N(E) = ∑_{k=0}^r λ0(B_k) I_E(z_k).

Then

λ_N(g_j) = ∑_{k=1}^r λ0(B_k) g_j(z_k) + λ0(A^C) g_j(z0).

Since g_j(z0) ≥ 0, for each j ∈ J2 we have

λ_N(g_j) ≥ λ0(g_j I_A) − 2ε/3 > λ0(g_j) − ε > (c + ε − a) − ε = c − a.

Because z0 ∈ A, we have 0 ≤ g_j(z0) ≤ g(z0) ≤ m. Hence, for j ∈ J1,

g_j(z0) λ0(A^C) ≤ λ0(g I_{A^C}) ≤ ε/3.

It follows that, for j ∈ J1,

λ0(g_j) − ε < λ0(g_j I_A) − 2ε/3 ≤ λ_N(g_j) ≤ λ0(g_j I_A) + 2ε/3 + g_j(z0) λ0(A^C) ≤ λ0(g_j) + ε.

So |λ_N(g_j) − λ0(g_j)| < ε for all j ∈ J1 and λ_N(g_j) > c − a for all j ∈ J2, and λ_N ∈ N.

Corollary 2 and Lemma 15 combine to provide the following representation.

Corollary 3. Let Z be a set. A coherent prevision λ is in Λ_Z if and only if there exist a net {λ_η}_{η∈D} in S_Z of simple previsions and a coherent prevision P ∈ Λ_D corresponding to an ultrafilter on D containing all tails of D such that

λ(f) = ∫_D λ_η(f) P(dη),

for all f ∈ M_Z.


4.4. Order Matters. The following is as close as we come to a Fubini/Tonelli theorem involving finitely-additive integrals.

Lemma 16. Let X and Y be sets. Let P be the finitely-additive probability corresponding to an ultrafilter U of subsets of Y. Let Q be a countably-additive discrete probability on X. Then P[Q] = Q[P] for bounded functions.

Proof. Let A ⊆ X × Y. For each x ∈ X, let A_x = {y ∈ Y : (x, y) ∈ A}. For each y ∈ Y, let A^y = {x ∈ X : (x, y) ∈ A}. For each x ∈ X, P(A_x) = 1 or 0 according as A_x ∈ U or A_x ∉ U. Let C_A = {x : A_x ∈ U}. Then Q[P](A) = Q(C_A). For each y ∈ Y, let f_A(y) = Q(A^y). Lemma 12 says that P[Q](A) equals the common value of sup_{B∈U} inf_{y∈B} f_A(y) = inf_{B∈U} sup_{y∈B} f_A(y). For each ε > 0, define D_{A,ε} = {y : f_A(y) ∈ (P[Q](A) − ε, P[Q](A) + ε)}. Note that D_{A,ε} ∈ U. The preceding analysis applies to every subset A of X × Y.

We would like to prove that, for every ε > 0,

(15) {y : Q(C_A) − ε ≤ f_A(y) ≤ Q(C_A) + ε} ∈ U.

This would imply that |P[Q](A) − Q(C_A)| ≤ ε for all ε > 0, hence P[Q](A) = Q(C_A) = Q[P](A).

There exists a countable subset {z1, z2, . . .} of X such that ∑_{j=1}^∞ Q({z_j}) = 1. Let {x1, . . . , x_n} be a large enough subset of C_A so that ∑_{j=1}^n Q({x_j}) > Q(C_A) − ε. Since each A_{x_j} ∈ U, the set B_A ≡ ∩_{j=1}^n A_{x_j} is in U, and for each y ∈ B_A, {x1, . . . , x_n} ⊆ A^y. It follows that f_A(y) ≥ Q(C_A) − ε for every y ∈ B_A. Apply the argument immediately above to A^C to conclude that f_{A^C}(y) ≥ Q(C_{A^C}) − ε for all y in an ultrafilter set that we will call B_{A^C}. Notice that (A^C)_x = (A_x)^C for all x ∈ X. It follows that C_{A^C} = C_A^C and f_{A^C}(y) = 1 − f_A(y) for all y ∈ Y. Hence, Q(C_{A^C}) = 1 − Q(C_A), and f_A(y) ≤ Q(C_A) + ε for every y ∈ B_{A^C}. The set B_A ∩ B_{A^C} ∈ U is then a subset of the set defined in (15).
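When U is principal, integration against P is evaluation at the atom, and the identity P[Q] = Q[P] can be verified directly; Example 5 below shows what goes wrong when U is non-principal and Q is not discrete. A minimal sketch with hypothetical sets and weights:

```python
import random

Xs = ["x1", "x2", "x3"]
Ys = [0, 1, 2]
y0 = 1                                   # atom: P(B) = 1 iff y0 in B
Q = {"x1": 0.2, "x2": 0.5, "x3": 0.3}    # discrete, countably-additive
random.seed(0)
A = {(x, y) for x in Xs for y in Ys if random.random() < 0.5}

# P[Q](A): f_A(y) = Q(A^y); integrating against a principal P evaluates at y0
f_A = {y: sum(q for x, q in Q.items() if (x, y) in A) for y in Ys}
PQ = f_A[y0]
# Q[P](A): P(A_x) = 1 iff y0 in A_x, i.e. C_A = {x : (x, y0) in A}
QP = sum(q for x, q in Q.items() if (x, y0) in A)
assert abs(PQ - QP) < 1e-12
```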

Here is an example to show why Lemma 16 has the condition that Q be a discrete countably-additive probability rather than a general countably-additive probability.

Example 5. Let X = {0, 1}∞, the set of countable binary sequences.LetQ0 be the distribution a sequenceX = {Xn}∞n=1 of independent Bernoullirandom variables each with Pr(Xn = 1) = 1/2. Let Y be the positive in-tegers, and let P be the finitely-additive probability corresponding to anon-principal ultrafilter U of subsets of the integers. Let E ⊆ X × Y bedefined as follows: For each y ∈ Y, let Ey = {x : xy = 1}, i.e., the setof sequences in which the yth coordinate is 1. Each Ey is a measurable


Set $E = \bigcup_{y=1}^{\infty} (E^y \times \{y\})$, which is also measurable, and each $E^y$ is the $y$-section of $E$. For each $x \in \mathcal{X}$, the $x$-section of $E$ is $E_x = \{y : x_y = 1\}$, i.e., the set of subscripts of those terms in the sequence $x$ that equal 1. We will also need the set $A = \{x : E_x \in \mathcal{U}\}$, which is not measurable. To see this, suppose that $A$ is measurable. Because $\mathcal{U}$ is non-principal, whether $E_x \in \mathcal{U}$ is unaffected by changing finitely many coordinates of $x$, so $A$ is in the tail $\sigma$-field of the sequence $X$. By the Kolmogorov 0-1 law, $Q_0(A) \in \{0, 1\}$. Similarly, $A^C$ is in the tail $\sigma$-field, so $Q_0(A^C) \in \{0, 1\}$. But $A^C = \{x : E_x^C \in \mathcal{U}\}$, where $E_x^C = \{y : x_y = 0\}$. By the inherent symmetry in the distribution of $X$, we must have $Q_0(A) = Q_0(A^C)$. But we can't have $Q_0(A) = Q_0(A^C) = 0$, and we can't have $Q_0(A) = Q_0(A^C) = 1$, so $A$ is not measurable. It follows that the inner and outer measures of $A$ are not equal, so there is a number $c \neq 1/2$ between the inner and outer measures. By symmetry, the interval between the inner and outer measures is symmetric around $1/2$, so we can take $c > 1/2$. Extend $Q_0$ to the measure $Q$ on the $\sigma$-field generated by $A$ and the original domain of $Q_0$ with $Q(A) = c$.

The remainder of the example is devoted to showing that $P[Q](E) \neq Q[P](E)$. First, note that $Q(E^y) = Q(X_y = 1) = 1/2$ for all $y \in \mathcal{Y}$, hence
\[
P[Q](E) = \int_{\mathcal{Y}} Q(E^y)\, P(dy) = \frac{1}{2}.
\]
Next, note that $P(E_x) = I_A(x)$, so
\[
Q[P](E) = \int_{\mathcal{X}} P(E_x)\, Q(dx) = \int_{\mathcal{X}} I_A(x)\, Q(dx) = Q(A) = c > \frac{1}{2}.
\]

Lemma 17. Under the conditions of Lemma 16, the minimum extension of the common prevision $Q[P] = P[Q]$ is the iterated integral $Q[P]$ of the minimum extensions.

Proof. The minimum extension of $Q[P]$ is
\[
\begin{aligned}
Q'(f) &= \lim_{n \to \infty} \int_{\mathcal{X}} \int_{\mathcal{Y}} [f(x, y) \wedge n]\, P(dy)\, Q(dx) \\
&= \int_{\mathcal{X}} \lim_{n \to \infty} \int_{\mathcal{Y}} [f(x, y) \wedge n]\, P(dy)\, Q(dx) \\
&= \int_{\mathcal{X}} \int_{\mathcal{Y}} f(x, y)\, P(dy)\, Q(dx),
\end{aligned}
\]
where the second equality follows from monotone convergence and the fact that $Q$ is countably additive.
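For intuition, the truncation that defines a minimum extension can be computed numerically in a simple countably-additive case. The following Python sketch is our own toy example (all names hypothetical): it shows $Q(f \wedge n)$ increasing to the untruncated integral for an unbounded non-negative $f$, which is the monotone-convergence step used in the proof.

# Q({x}) = 2^-x on the positive integers (truncated so the script ends);
# f(x) = x is unbounded, and Q(f ^ n) increases to E[f] = 2.
Q = {x: 2.0 ** -x for x in range(1, 60)}
f = {x: float(x) for x in Q}

def prevision(g):
    return sum(Q[x] * g[x] for x in Q)

truncated = [prevision({x: min(f[x], n) for x in f}) for n in (1, 2, 4, 8, 16, 10**6)]
print(truncated)   # monotonically approaches 2.0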


5. Separation of Convex Sets. The following definition extends some common terms to deal with $\overline{\mathbb{R}}$-valued functions.

Definition 15. Let $\mathcal{Z}$ be a set. Let $F \subseteq \overline{\mathbb{R}}^{\mathcal{Z}}$ be convex. For each $p \in F$ and $f \in \overline{\mathbb{R}}^{\mathcal{Z}}$, define
\[
A_{p,f,F} = \{a \in (0, \infty) : p + f/a \in F\}.
\]
We say that $p$ is an internal point of $F$ if $A_{p,f,F} \neq \emptyset$ for every bounded $f$. If $0$ is an internal point of $F$, define
\[
t_F(f) = \inf A_{0,f,F}.
\]
We call $t_F$ the support function of $F$.

Example 6. Let $g \in \mathbb{R}^{\mathcal{Z}}$ and $F = \{f \in \mathbb{R}^{\mathcal{Z}} : f(z) < g(z) \text{ for all } z\}$. Then $F$ is convex, and $g - \epsilon$ is an internal point for each $\epsilon > 0$. In fact, for every $\epsilon > 0$ and every bounded $h \geq \epsilon$, $g - h$ is an internal point of $F$. Functions that get arbitrarily close to $g$ are not internal.

Lemma 18. Let $g$ and $k$ be real-valued functions. Let $F$ be a convex set with $g$ as an internal point. Let $F'$ be another convex set that contains $k$. Suppose that $f' - f$ is well defined for every $f' \in F'$ and every $f \in F$. Then $k - g$ is internal to $F' - F$.

Proof. Let $h$ be a bounded function. We need to show that there exists $a > 0$ such that $k - g + h/a \in F' - F$. Since $g$ is internal to $F$ and $-h$ is bounded, there exists $a > 0$ such that $g + (-h)/a \in F$. Then $k - [g + (-h)/a] = k - g + h/a \in F' - F$. (Note that none of the arithmetic in this proof involves infinite values.)

Lemma 19. Let $F$ be a convex set with $0$ as an internal point. Then

(i) if $f \in F$, then $t_F(f) \in [0, 1]$;
(ii) if $f \notin F$, then $t_F(f) \in [1, \infty]$;
(iii) $t_F(f_1 + f_2) \leq t_F(f_1) + t_F(f_2)$ for all $f_1, f_2$;
(iv) $t_F(\alpha f) = \alpha t_F(f)$ for all $f$ and all $\alpha > 0$; and
(v) $F_F = \{f : t_F(f) < \infty \text{ and } t_F(-f) < \infty\}$ is a linear space that contains all bounded functions.

Proof. Part (i): Let $f \in F$. Since $f/1 = f$, we see that $1 \in A_{0,f,F}$ and $t_F(f) \leq 1$.


Part (ii): Let $f \notin F$. If $A_{0,f,F} = \emptyset$, then $t_F(f) = \infty > 1$. For the rest of the proof of part (ii), suppose that $A_{0,f,F} \neq \emptyset$. Note that $f = a(f/a) + (1 - a)0$ for all $a \in (0, 1]$, so if $f/a \in F$ for some $a \in (0, 1]$, convexity (together with $0 \in F$) would put $f \in F$. Hence $f/a \notin F$ for all $a \in (0, 1]$, and $t_F(f) \geq 1$.

Part (iii): The inequality holds if either $t_F(f_1) = \infty$ or $t_F(f_2) = \infty$. For the rest of the proof of part (iii), assume that $\alpha_j = t_F(f_j) < \infty$ for both $j = 1, 2$. Let $\epsilon > 0$. Then $f_j/(\alpha_j + \epsilon) \in F$ for $j = 1, 2$. Since $F$ is convex,
\[
\frac{\alpha_1 + \epsilon}{\alpha_1 + \alpha_2 + 2\epsilon} \cdot \frac{f_1}{\alpha_1 + \epsilon} + \frac{\alpha_2 + \epsilon}{\alpha_1 + \alpha_2 + 2\epsilon} \cdot \frac{f_2}{\alpha_2 + \epsilon} = \frac{f_1 + f_2}{\alpha_1 + \alpha_2 + 2\epsilon}
\]
is in $F$. Hence $t_F(f_1 + f_2) \leq \alpha_1 + \alpha_2 + 2\epsilon$ for all $\epsilon$, which implies $t_F(f_1 + f_2) \leq \alpha_1 + \alpha_2$.

Part (iv): Let $\alpha > 0$. For every $a > 0$ and every $f$, $\alpha f/(\alpha a) \in F$ if and only if $f/a \in F$, so that $A_{0,\alpha f,F} = \alpha A_{0,f,F}$ for all $f$. Hence $t_F(\alpha f) = \alpha t_F(f)$.

Part (v): Let $f_1, f_2 \in F_F$ and $\alpha_1, \alpha_2 \in \mathbb{R}$. Let
\[
f'_j = \begin{cases} f_j & \text{if } \alpha_j \geq 0, \\ -f_j & \text{if } \alpha_j < 0. \end{cases}
\]
Then $\alpha_1 f_1 + \alpha_2 f_2 = |\alpha_1| f'_1 + |\alpha_2| f'_2$. By part (iv), $|\alpha_j| f'_j \in F_F$ for $j = 1, 2$, and by part (iii), $|\alpha_1| f'_1 + |\alpha_2| f'_2 \in F_F$. So $\alpha_1 f_1 + \alpha_2 f_2 \in F_F$. That $F_F$ contains all bounded functions is immediate from the fact that $0$ is an internal point.
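On a finite $\mathcal{Z}$, the support function of the set $F$ of Example 6 can be computed in closed form, and the properties in Lemma 19 can be spot-checked numerically. The following Python sketch is our own illustration: the formula $\max(0, \max_z f(z)/g(z))$ is a direct consequence of Definition 15 when $g > 0$, and the script verifies parts (i)-(iv) on random inputs.

import random

random.seed(1)
Z = range(6)
g = [1.0 + random.random() for _ in Z]   # g(z) > 0, so 0 is an internal point of F

def t_F(f):
    # Support function of F = {f : f(z) < g(z) for all z}: for g > 0 on a
    # finite Z, inf{a > 0 : f/a in F} reduces to max(0, max_z f(z)/g(z)).
    return max(0.0, max(f[z] / g[z] for z in Z))

for _ in range(1000):
    f1 = [random.uniform(-2, 2) for _ in Z]
    f2 = [random.uniform(-2, 2) for _ in Z]
    alpha = random.uniform(0.1, 5.0)
    # Lemma 19 (i)-(ii): t_F is at most 1 on F and at least 1 off F.
    if all(f1[z] < g[z] for z in Z):
        assert t_F(f1) <= 1.0
    else:
        assert t_F(f1) >= 1.0 - 1e-12
    # Lemma 19 (iii): subadditivity.
    assert t_F([a + b for a, b in zip(f1, f2)]) <= t_F(f1) + t_F(f2) + 1e-12
    # Lemma 19 (iv): positive homogeneity.
    assert abs(t_F([alpha * x for x in f1]) - alpha * t_F(f1)) < 1e-9
print("all checks passed")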

Lemma 20. Let $F$ be a convex set with $0$ as an internal point. Suppose that, for every $f \in F$, every $g \geq f$ is in $F$. If $f_1 \leq f_2$, then $t_F(f_1) \geq t_F(f_2)$.

Proof. Let $f_1 \leq f_2$. We prove that $t_F(f_1) \geq t_F(f_2)$ in cases. Case 1: Let $f_2 \notin F$. Then $A^C_{0,f_2,F} \neq \emptyset$. Let $a \in A^C_{0,f_2,F}$, so that $f_2/a \notin F$. If $f_1/a \in F$, then $f_1/a \leq f_2/a$ would imply $f_2/a \in F$, a contradiction. So $f_1/a \notin F$, and $A^C_{0,f_2,F} \subseteq A^C_{0,f_1,F}$. Then $A_{0,f_1,F} \subseteq A_{0,f_2,F}$, which means $\inf A_{0,f_1,F} \geq \inf A_{0,f_2,F}$ and $t_F(f_1) \geq t_F(f_2)$. Case 2: Let $f_2 \in F$ with $f_1 \notin F$. Then $t_F(f_1) \geq 1 \geq t_F(f_2)$. Case 3: Let $f_2 \in F$ with $f_1 \in F$. Then $A_{0,f_1,F} \neq \emptyset$. For each $a \in A_{0,f_1,F}$ we have $f_1/a \in F$ and $f_1/a \leq f_2/a$, which implies $f_2/a \in F$. So $A_{0,f_1,F} \subseteq A_{0,f_2,F}$ and $\inf A_{0,f_1,F} \geq \inf A_{0,f_2,F}$, so that $t_F(f_1) \geq t_F(f_2)$.

Lemma 21. Let $G$ be a closed convex subset of $\overline{\mathbb{R}}^{\mathcal{Z}}$ consisting of non-negative functions. Let $G'$ be a superset of $G$ such that, for each $h \in G' \setminus G$, there exists $g \in G$ with $h \geq g$. Then $\partial_L \overline{G'} = \partial_L G$.

Proof. Clearly, $G \subseteq G'$, so $G \subseteq \overline{G'}$ and $\partial_L G \subseteq \overline{G'}$. Note that every $h \in G' \setminus G$ is dominated by an element of $G$. Let $h \in \overline{G'} \setminus G$. Then there is a net $f = \{f_\eta\}_{\eta \in D}$ in $G'$ such that $\lim_\eta f_\eta = h$. Clearly, every tail of $f$


must contain elements of $G' \setminus G$, for otherwise $f$ would have a cluster point (hence its limit) in the closed set $G$. Let $g = \{g_\gamma\}_{\gamma \in \Gamma}$ be a subnet consisting of elements of $G' \setminus G$. Then $g$ converges to $h$. For each $\gamma \in \Gamma$, let $k_\gamma \in G$ be such that $g_\gamma \geq k_\gamma$. Since $G$ is compact, $k = \{k_\gamma\}_{\gamma \in \Gamma}$ has a convergent subnet whose limit we call $\ell$. The corresponding subnet of $g$ converges to $h$, and $\ell \leq h$ by Proposition 5. Since $\ell \in G$ and $h \notin G$, it must be that $\ell \neq h$, so $\ell$ dominates $h$. Hence, every element of $\overline{G'} \setminus G$ is dominated by an element of $G$. Hence, no element of $\overline{G'} \setminus G$ can dominate an element of $\partial_L G$, which makes $\partial_L G = \partial_L \overline{G'}$.

Lemma 22. Let $G$ be a closed convex subset of $\overline{\mathbb{R}}^{\mathcal{Z}}$ consisting of non-negative functions such that $g \in G$ and $f \geq g$ implies $f \in G$. Let $k$ be a real-valued function, and define
\[
H_k = \{h \in \mathcal{M}_{\mathcal{Z}} : h(z) < k(z) \text{ for each } z\}.
\]
Suppose that $H_k \cap G = \emptyset$. Then there exists $\lambda_k \in \Lambda_{\mathcal{Z}}$ such that $\lambda_k(k) \leq \lambda_k(g)$ for all $g \in G$.

Proof. Note that $H_k$ is convex. Every element of $F = G - H_k - 1$ is a well-defined function. Note that $-1 \notin F$ because $H_k \cap G = \emptyset$. However, for each $c > -1$, the constant function $c = k - (k - c - 1) - 1$ is in $F$. It is easy to show that

(16) $t_F(c) = \begin{cases} 0 & \text{if } c > 0, \\ -c & \text{if } c \leq 0. \end{cases}$

The first step in the proof is to show that $F$ satisfies the conditions of Lemma 20. Let $g_1 - h_1 - 1$ and $g_2 - h_2 - 1$ be in $F$, and let $\alpha \in [0, 1]$. Then $h' = \alpha h_1 + (1 - \alpha) h_2 \in H_k$ and $g' = \alpha g_1 + (1 - \alpha) g_2 \in G$. It follows that $F$ is convex. Next, since $k - 1$ is internal to $H_k$, $k - (k - 1) = 1$ is internal to $G - H_k$ by Lemma 18. Hence $0$ is internal to $F$. Let $f \in F$, so that $f = g - h - 1$ for some $h \in H_k$ and some $g \in G$. If $f' \geq f$, we need to show $f' \in F$. Let $\ell \geq 0$ be such that $f' = f + \ell$. (If $f(z) = f'(z) = \infty$ for some $z$, $\ell(z)$ can be any non-negative number.) Then $g + \ell \in G$ and $f' = g + \ell - h - 1 \in F$. This completes the proof that $F$ satisfies the conditions of Lemma 20.

The next step in the proof is to perform a construction similar to that of a separating hyperplane between $H_k$ and $G$. Specifically, we attempt to find a coherent prevision $\lambda$ on a linear space $\mathcal{L} \subseteq \overline{\mathbb{R}}^{\mathcal{Z}}$ such that

(17) $-\lambda(f) \leq t_F(f)$, for all $f \in \mathcal{L}$.


The importance of satisfying (17) is that, if $\mathcal{L}$ contains $F \cup \{-1\}$, then $\lambda(f) \geq -1$ for all $f \in F$, which implies that $\lambda(h) \leq \lambda(g)$ for all $h \in H_k$ and all $g \in G$.

We start with a coherent prevision $\lambda_0$ on the space $\mathcal{L}_0$ of constant functions, satisfying (17). We then extend it to a coherent prevision $\lambda_1$ on the set of all bounded functions in $\overline{\mathbb{R}}^{\mathcal{Z}}$, still satisfying (17). We follow the proof of the Hahn-Banach theorem; we cannot apply that theorem directly because not all linear functionals are coherent previsions. Let $\mathcal{L}_0$ be the set of all constant real-valued functions in $\overline{\mathbb{R}}^{\mathcal{Z}}$. Define $\lambda_0(f) = c$ if $f$ is the constant function $f(z) = c$ for all $z$. Then $\lambda_0$ is trivially a coherent prevision on $\mathcal{L}_0$. Also, $-\lambda_0(c) \leq t_F(c)$ by (16). Next, partially order all extensions of $\lambda_0$ that satisfy (17) to coherent previsions on linear spaces of bounded functions. That is, we denote an extension $\lambda$ of $\lambda_0$ that satisfies (17) on a linear space $\mathcal{L}$ of bounded functions containing $\mathcal{L}_0$ by $(\lambda, \mathcal{L})$, and we define the partial order $(\lambda_a, \mathcal{L}_a) \preceq (\lambda_b, \mathcal{L}_b)$ to mean that $\mathcal{L}_a \subseteq \mathcal{L}_b$ and $\lambda_b$ agrees with $\lambda_a$ on $\mathcal{L}_a$. Since every chain in this partial order has the union of its domains (with the common extension) as an upper bound, Zorn's lemma says that there is a maximal element, $(\lambda_1, \mathcal{L}_1)$.

The next step is to show that $\mathcal{L}_1$ contains all bounded functions. Assume to the contrary that there is a bounded $h \notin \mathcal{L}_1$. Then there is such an $h$ that also satisfies $h \geq 0$, $t_F(h) < \infty$, and $t_F(-h) < \infty$. (Every bounded function is the difference between two bounded non-negative functions, and $t_F$ is finite on bounded functions because $0$ is internal to $F$.) Let $\mathcal{F}_1$ be the linear span of $\mathcal{L}_1 \cup \{h\}$, so that
\[
\mathcal{F}_1 = \{f + \alpha h : \alpha \in \mathbb{R}, f \in \mathcal{L}_1\},
\]
and for each $g \in \mathcal{F}_1$ there are a unique $\alpha_g \in \mathbb{R}$ and $f_g \in \mathcal{L}_1$ such that $g = f_g + \alpha_g h$. Define
\[
\lambda_1^c(g) = \lambda_1(f_g) + \alpha_g c,
\]
for each real $c$. We need to show that there exists $c \geq 0$ such that $-\lambda_1^c(g) \leq t_F(g)$ for all $g \in \mathcal{F}_1$. Let $f, \ell \in \mathcal{L}_1$. Then
\[
\begin{aligned}
\lambda_1(f) - \lambda_1(\ell) &= \lambda_1(f - \ell) \\
&\geq -t_F(f - \ell) \\
&= -t_F(f + h - h - \ell) \\
&\geq -t_F(f + h) - t_F(-\ell - h),
\end{aligned}
\]
so that
\[
\lambda_1(f) + t_F(f + h) \geq \lambda_1(\ell) - t_F(-h - \ell).
\]
(Note that all quantities involved in the above equations are finite.) It follows that

(18) $\sup_{\ell \in \mathcal{L}_1} [\lambda_1(\ell) - t_F(-h - \ell)] \leq \inf_{f \in \mathcal{L}_1} [\lambda_1(f) + t_F(f + h)]$.


Because $h \geq 0$, $\lambda_1(\ell) = -\lambda_1(-\ell) \leq t_F(-\ell) \leq t_F(-h - \ell)$ for all $\ell \in \mathcal{L}_1$, where the last inequality uses Lemma 20. So the left-hand side of (18) is at most 0. Let $c \geq 0$ be such that $-c$ is in the closed interval between the two sides of (18). If $\alpha_g > 0$, let $f = f_g/\alpha_g$ on the right-hand side of (18), so that
\[
\begin{aligned}
-c &\leq \lambda_1(f_g/\alpha_g) + t_F(f_g/\alpha_g + h), \\
-\alpha_g c &\leq \lambda_1(f_g) + t_F(f_g + \alpha_g h), \\
-\lambda_1(f_g) - \alpha_g c &\leq t_F(f_g + \alpha_g h), \\
-\lambda_1^c(g) &\leq t_F(g).
\end{aligned}
\]
If $\alpha_g < 0$, let $\ell = f_g/\alpha_g$ on the left-hand side of (18), so that
\[
\begin{aligned}
\lambda_1(f_g/\alpha_g) - t_F(-h - f_g/\alpha_g) &\leq -c, \\
-\lambda_1(f_g) - t_F(f_g + \alpha_g h) &\leq \alpha_g c, \\
-\lambda_1(f_g) - \alpha_g c &\leq t_F(f_g + \alpha_g h), \\
-\lambda_1^c(g) &\leq t_F(g).
\end{aligned}
\]
Hence, we have extended $\lambda_1$ to a coherent prevision on a larger domain while satisfying (17), which contradicts the maximality of $(\lambda_1, \mathcal{L}_1)$. It follows that $\mathcal{L}_1$ contains all bounded functions.

Next, we extend $\lambda_1$ to a domain that includes all functions that are bounded below. Let $\lambda_k$ be the minimum extension of $\lambda_1$, namely
\[
\lambda_k(f) = \sup_{\text{bounded } g \leq f} \lambda_1(g).
\]
It follows that, for every $f$ that is bounded below and every bounded $g \leq f$,
\[
\lambda_k(f) \geq \lambda_1(g), \qquad -\lambda_k(f) \leq -\lambda_1(g) \leq t_F(g) \leq t_F(f),
\]
where the third inequality follows from what we proved for bounded $g$, and the fourth inequality follows from Lemma 20.

Next, we prove that $\lambda_k$ provides the desired separation between $G$ and $H_k$. We know that $\lambda_k(f) \geq -1$ for each $f \in F$ that is bounded below. For each $g \in G$ and bounded $h \in H_k$, $f = g - h - 1 \in F$ and $f$ is bounded below, hence $\lambda_k(f) \geq -1$. If $h$ is bounded, so is $h + 1$, and

(19) $\lambda_k(g) \geq \lambda_k(h + 1) - 1 = \lambda_k(h)$,


for all $g \in G$ and all bounded $h \in H_k$. For each $\epsilon > 0$ and $m > 0$, the function $(k - \epsilon) \wedge m \in H_k$ is bounded and is no greater than $k - \epsilon$. So
\[
\lambda_k(k - \epsilon) = \lim_{m \to \infty} \lambda_k((k - \epsilon) \wedge m) \leq \sup_{h \in H_k} \lambda_k(h) \leq \lambda_k(g),
\]
for all $g \in G$. So
\[
\lambda_k(k) \leq \epsilon + \sup_{h \in H_k} \lambda_k(h) \leq \epsilon + \lambda_k(g),
\]
for all $g \in G$, and $\lambda_k(g) \geq \lambda_k(k) - \epsilon$ for every $g \in G$ and $\epsilon > 0$. It follows that $\lambda_k(g) \geq \lambda_k(k)$ for all $g \in G$.
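In finite dimensions, the prevision produced by Lemma 22 is an ordinary separating hyperplane, and it can be found by linear programming. The Python sketch below is a finite illustration of our own (the generators and $k$ are hypothetical toy data, and scipy is used only for the LP solver): it maximizes the worst margin $\lambda \cdot (g_i - k)$ over probability vectors $\lambda$. Because $G$ is taken to be the upward closure of the convex hull of the generators and $\lambda \geq 0$, checking the generators suffices, and a non-negative optimum exhibits a $\lambda_k$ with $\lambda_k(k) \leq \lambda_k(g)$ for all $g \in G$.

import numpy as np
from scipy.optimize import linprog

gens = np.array([[1.0, 0.0],   # generators of G (upward closure of their hull)
                 [0.0, 1.0]])
k = np.array([0.5, 0.5])       # H_k = {h : h < k pointwise} is disjoint from G

n_z = gens.shape[1]
# Variables (lam_1, ..., lam_nz, t).  Maximize t subject to
# lam . (g_i - k) >= t for each generator, sum(lam) = 1, lam >= 0.
c = np.zeros(n_z + 1); c[-1] = -1.0            # minimize -t
A_ub = np.hstack([-(gens - k), np.ones((len(gens), 1))])
b_ub = np.zeros(len(gens))
A_eq = np.array([[1.0] * n_z + [0.0]])
b_eq = np.array([1.0])
bounds = [(0, None)] * n_z + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
lam, t = res.x[:n_z], res.x[-1]
assert t >= -1e-9      # separation: lam . k <= lam . g for all g in G
print("lam =", lam, " worst margin =", t)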

Lemma 23. Let $G$ be a closed convex subset of $\overline{\mathbb{R}}^{\mathcal{Z}}$ consisting of non-negative functions. Let $k \in \partial_L G$. Then there exists a coherent prevision $\lambda_k \in \Lambda_{\mathcal{Z}}$ such that $\lambda_k(k) \leq \lambda_k(g)$ for all $g \in G$.

Proof. If $k$ is real-valued, Lemma 22 implies the conclusion. If $k(z) = \infty$ for all $z$, then $G$ is the singleton $\{k\}$, and every $\lambda$ can serve as $\lambda_k$. If $k$ takes the value $\infty$ and at least one finite value, let $\mathcal{Z}_k = \mathcal{Z} \setminus \{z : k(z) = \infty\}$. Let $G_k$ be the set of restrictions of the elements of $G$ to the sub-domain $\mathcal{Z}_k$, so that $G_k$ is a convex set of non-negative functions such that $g \in G_k$ and $f \geq g$ implies $f \in G_k$. Let $k'$ be the restriction of $k$ to $\mathcal{Z}_k$.

First, we show that $k' \in \partial_L G_k$. Assume to the contrary that there is $g \in G_k$ such that $g(z) \leq k'(z)$ for all $z \in \mathcal{Z}_k$ and $g(z_0) < k'(z_0)$ for some $z_0 \in \mathcal{Z}_k$. Let $g' = \{g_\eta\}_{\eta \in D}$ be a net in $G_k$ that converges to $g$. Each $g_\eta$ is the restriction to $\mathcal{Z}_k$ of some $h_\eta \in G$. Let $h$ be a cluster point of $h' = \{h_\eta\}_{\eta \in D}$, let $h''$ be a subnet that converges to $h$, and let $g''$ be the corresponding subnet of $g'$, which still converges to $g$. It follows that $g$ is the restriction of $h$ to $\mathcal{Z}_k$. Since $k(z) = \infty$ for all $z \in \mathcal{Z} \setminus \mathcal{Z}_k$, we have $h(z) \leq k(z)$ for all $z$ and $h(z_0) < k(z_0)$, contradicting $k \in \partial_L G$.

Apply Lemma 22 with $k$ replaced by $k'$ and $G$ replaced by $G_k$ to get a coherent prevision $\lambda' \in \Lambda_{\mathcal{Z}_k}$ such that $\lambda'(k') \leq \lambda'(g)$ for all $g \in G_k$. For each $g \in \overline{\mathbb{R}}^{\mathcal{Z}}$, define $\lambda_k(g)$ to be $\lambda'$ of the restriction of $g$ to $\mathcal{Z}_k$. Then $\lambda_k(k) \leq \lambda_k(g)$ for all $g \in G$, and $\lambda_k \in \Lambda_{\mathcal{Z}}$.

6. Properties of the Risk Set. In all results in this section, the topology that we assume for the set $\overline{\mathbb{R}}^{\mathcal{Z}}$ of $\overline{\mathbb{R}}$-valued functions on a set $\mathcal{Z}$ is the product topology, which is also the topology of pointwise convergence.

Lemma 24. Let $b \in \mathbb{R}$. Let $G_0$ be a subset of $\mathcal{M}_{\mathcal{Z}}$ such that $g(z) \geq b$ for all $z \in \mathcal{Z}$ and all $g \in G_0$. Let $G$ be the convex hull of $G_0$, and let $\overline{G}$ be the


closure of $G$. Then $\overline{G}$ is compact. A function $g \in \mathcal{M}_{\mathcal{Z}}$ is in $\overline{G}$ if and only if there is a $P \in \Lambda_{G_0}$ such that

(20) $g(z) = \int_{G_0} f(z)\, P(df)$.

Proof. By Proposition 7, $\overline{G}$ is compact. By Lemma 15, (20) holds if and only if there is a net $\{P_\eta\}_{\eta \in D}$ of simple previsions in $S_{G_0}$ that converges pointwise to $P$; that is, for each $\eta \in D$ there are finitely many positive $\alpha_{\eta,j}$ that add to 1 and $f_{\eta,j} \in G_0$, $j = 1, \ldots, n_\eta$, such that $P_\eta(B) = \sum_{j=1}^{n_\eta} \alpha_{\eta,j} I_B(f_{\eta,j})$ for each subset $B$ of $G_0$. That $P_\eta \to P$ pointwise is equivalent to

(21) $g(z) = \lim_\eta \int_{G_0} f(z)\, P_\eta(df) = \lim_\eta \sum_{j=1}^{n_\eta} \alpha_{\eta,j} f_{\eta,j}(z)$,

for all $z$. Note that (21) occurs if and only if $g$ is the limit of the net of functions $\{g_\eta\}_{\eta \in D}$ defined by $g_\eta = \sum_{j=1}^{n_\eta} \alpha_{\eta,j} f_{\eta,j}$, which are convex combinations of elements of $G_0$ and hence elements of $G$. In summary, we have shown that (20) holds if and only if $g$ is the pointwise limit of a net of elements of $G$. Proposition 2 says that $g$ is the limit of a net of elements of $G$ if and only if $g \in \overline{G}$.

We will apply Lemma 24 with $\mathcal{Z} = \Theta$ and $G_0$ equal to one of several subsets of $\mathcal{M}_\Theta$. In all cases, we start with a set $B$ of decision rules and let $G_0$ be the set of corresponding risk functions. That is, each $f \in G_0$ corresponds to a rule $\gamma \in B$ with $f(\theta) = R(\theta, \gamma)$.

First, let $B = H_0$ be the set of non-randomized rules under consideration. As we stated in Section 2.1, the set $H_0$ can be an arbitrary subset of $\mathcal{A}^{\mathcal{X}}$. With this understanding, the convex hull $G$ is the set of risk functions of simple randomized rules. That is, each $f \in G$ is of the form $\sum_{j=1}^{n} \alpha_j f_j$ with each $f_j$ the risk function of some non-randomized rule $\gamma_j \in B$. If $\delta$ is a randomized rule as in (2), then $\int_{G_0} R(\theta, \gamma)\, \delta(d\gamma)$ is $R(\theta, \delta)$. Let $R_B$ be the set of risk functions of randomized rules whose randomizations are minimum extensions. Lemma 24 says that $\overline{G} = R_B$. That is, we have the following result.

Corollary 4. The set $R_B$ is compact and is the closure of the convex hull of the set of risk functions of the non-randomized rules in $B$.
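For a finite toy problem, the building block of Corollary 4 is easy to see numerically: the risk function of a simple randomized rule is just the convex combination of the rows of the risk matrix. The following Python sketch (all numbers our own) confirms this both exactly and by simulating the randomization.

import random
import numpy as np

risk = np.array([[1.0, 4.0, 0.0],      # risk[i, j] = R(theta_j, gamma_i)
                 [3.0, 1.0, 2.0],
                 [0.0, 2.0, 5.0]])
alpha = [0.5, 0.3, 0.2]                # weights of the simple randomization

mix_risk = np.array(alpha) @ risk      # R(theta, delta) as a convex combination

# Monte Carlo check at one theta: randomly select which rule to use with
# probabilities alpha; the average incurred risk approaches the mixture risk.
random.seed(2)
theta_j = 1
draws = [risk[random.choices(range(3), weights=alpha)[0], theta_j]
         for _ in range(200_000)]
print(sum(draws) / len(draws), mix_risk[theta_j])   # Monte Carlo vs. exact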


For the special case in which $B$ is the set of all non-randomized rules, we denote $R_B$ by $R_1$.

As another application of Lemma 24, let $B$ be the set of all countably-additive general randomizations that are minimum extensions and whose risk functions are of the form (4). For this case, we also assume that the $P_\theta$ are all countably-additive minimum extensions. Then $G_0$ equals the set of all risk functions of the form (4), and it is convex, so that $G = G_0$. Lemma 30 in Appendix A shows that this set $G_0$ is contained in the set $R_1$. Hence the closure of $G_0$ is equal to $R_1$, and we have the following result.

Corollary 5. Assume that the $P_\theta$ are all countably additive. The set of all risk functions of finitely-additive minimum-extension randomizations of countably-additive general randomized rules equals $R_1$.

Finally, we are also interested in the entire risk set $R$, consisting of the risk functions of all finitely-additive randomizations of non-randomized rules. This set could be larger than $R_1$. The potentially extra risk functions would be those corresponding to randomizations that are not minimum extensions. However, the following is immediate from the definition of minimum extension.

Proposition 12. Let $\mathcal{Z}$ be a set, and let $\lambda_0$ be a coherent prevision on the bounded elements of $\mathcal{M}_{\mathcal{Z}}$. Let $\lambda$ be the minimum extension of $\lambda_0$, and let $\lambda'$ be a different extension of $\lambda_0$ to $\mathcal{M}_{\mathcal{Z}}$. Then $\lambda'(f) \geq \lambda(f)$ for all $f \in \mathcal{M}_{\mathcal{Z}}$.

It follows that all risk functions in $R \setminus R_1$ correspond to inadmissible rules. Lemma 21 says that $\partial_L R = \partial_L R_1$.

7. A Complete Class Theorem. Our complete class theorem has only one assumption, namely that the loss function is bounded below. We need one lemma first.

Lemma 25. Let $\mathcal{Z}$ be a set. For every $z \in \mathcal{Z}$ and every $c \in \overline{\mathbb{R}}$, the set $H = \{f \in \overline{\mathbb{R}}^{\mathcal{Z}} : f(z) = c\}$ is closed. Also, both $H_< = \{f \in \overline{\mathbb{R}}^{\mathcal{Z}} : f(z) < c\}$ and $H_> = \{f \in \overline{\mathbb{R}}^{\mathcal{Z}} : f(z) > c\}$ are open.

Proof. For finite $c$, each of $H_>$ and $H_<$ is a sub-basic open set, hence their union $H^C$ is open. For $c = \infty$, $H_> = \emptyset$, and every cluster point $f$ of every net in $H$ has $f(z) = \infty$, hence $H$ is closed.


Theorem 1 (Complete Class Theorem). The decision rules whose risk functions are on $\partial_L R$ form a minimal complete class. Each admissible rule is a Bayes rule.

Proof. By the definition of $\partial_L R$, every rule whose risk function is on $\partial_L R$ is admissible, and every rule whose risk function is not on $\partial_L R$ is inadmissible. No proper subset $C$ of $\partial_L R$ could be a complete class, because each element of $\partial_L R$ that is not in $C$ is not dominated by an element of $C$ (or by anything else). What remains, for the first claim (about a minimal complete class), is to show that every rule whose risk function is not on $\partial_L R$ (i.e., is inadmissible) is dominated by a rule whose risk function is on $\partial_L R$.

Let $f_0 \notin \partial_L R$. If $f_0 \in R \setminus R_1$, then $f_0$ is dominated by an element of $R_1$ by Proposition 12. So it suffices to show that each element of $R_1 \setminus \partial_L R_1$ is dominated by an element of $\partial_L R_1$. Let $f_0 \in R_1 \setminus \partial_L R_1$, and let $F_0$ be $\{f_0\}$ together with the set of all elements of $R_1$ that dominate $f_0$. First, we show that $F_0$ is closed. Let $h \in F_0^C$. We need to show that $h$ has a neighborhood that is disjoint from $F_0$. Since $h \neq f_0$ and $h$ does not dominate $f_0$, there exists $\theta_0$ such that $h(\theta_0) > f_0(\theta_0)$. Let $N = \{g : g(\theta_0) > f_0(\theta_0)\}$, which is a neighborhood of $h$. It is clear that $N \cap F_0 = \emptyset$, because every element $k$ of $F_0$ (at least weakly) dominates $f_0$ and hence satisfies $k(\theta_0) \leq f_0(\theta_0)$.

Since $F_0$ is closed and is a subset of the compact set $R_1$, every net in $F_0$ has a convergent subnet. Well-order ($\prec$) the elements of $\Theta$ as $\{\theta_\gamma : \gamma \in \Gamma\}$. For $\gamma = 1$, let $\xi_1 = \inf_{g \in F_0} g(\theta_1)$. Let $\{g_n\}_{n=1}^{\infty}$ be a sequence of elements of $F_0$ such that $\lim_{n \to \infty} g_n(\theta_1) = \xi_1$, and let $f_1 \in F_0$ be the limit of a convergent subnet. Let $F_1 = \{g \in F_0 : g(\theta_1) = \xi_1\}$, which is closed by Lemma 25 and which we just showed is non-empty. We have just proved the following induction hypothesis for $\gamma = 1$:

Induction hypothesis: There is a decreasing collection of non-empty closed subsets $\{F_\eta\}_{1 \preceq \eta \preceq \gamma}$ of $F_0$ and a collection of functions $\{f_\eta\}_{1 \preceq \eta \preceq \gamma}$ such that (i) $f_\eta \in F_\eta$, and (ii) $f_\gamma(\theta_\eta) = \inf_{g \in F_\eta} g(\theta_\eta)$ for all $1 \preceq \eta \preceq \gamma$.

Let $\gamma$ be a successor ordinal, and assume that the induction hypothesis is true for each $\eta \prec \gamma$. Let $\xi_\gamma = \inf_{g \in F_{\gamma-1}} g(\theta_\gamma)$. Let $\{g_n\}_{n=1}^{\infty}$ be a sequence of elements of $F_{\gamma-1}$ such that $\lim_{n \to \infty} g_n(\theta_\gamma) = \xi_\gamma$, and let $f_\gamma \in F_{\gamma-1}$ be the limit of a convergent subnet. Let $F_\gamma = \{g \in F_{\gamma-1} : g(\theta_\gamma) = \xi_\gamma\}$, which is closed and which we just showed is non-empty. Then the induction hypothesis is true for $\gamma$. Next, let $\gamma$ be a limit ordinal. Let $h_\eta \in F_\eta$ for each $\eta \prec \gamma$. The net $\{h_\eta\}_{\eta \prec \gamma}$ has a convergent subnet, whose limit we will call $f_{\gamma-}$. By construction, for every $\eta \prec \gamma$, $h_\psi(\theta_\eta)$ is constant in $\psi$ for $\psi \succeq \eta$, hence $f_{\gamma-}(\theta_\eta) = h_\eta(\theta_\eta)$ for every $\eta \prec \gamma$, and $f_{\gamma-} \in \bigcap_{\eta \prec \gamma} F_\eta = F_{\gamma-}$, which is closed, non-empty, and a subset of every $F_\eta$ for $\eta \prec \gamma$. Now, apply the successor ordinal argument to $\gamma$ with $F_{\gamma-}$ standing in for $F_{\gamma-1}$ wherever it appears.


So, for every $\gamma \in \Gamma$, there is $f_\gamma$ that satisfies the induction hypothesis. Let $f$ be a cluster point of the net $\{f_\gamma\}_{\gamma \in \Gamma}$.

Next, we show that $f$ is in the lower boundary. Assume, to the contrary, that $f$ is not in the lower boundary. Then there exist $g$ and $\theta_0$ such that $g(\theta) \leq f(\theta)$ for all $\theta$ and $g(\theta_0) < f(\theta_0)$. Let $\gamma \in \Gamma$ be such that $\theta_\gamma = \theta_0$. By construction,
\[
f(\theta_0) = f_\gamma(\theta_0) = \inf_{g' \in F_{\gamma-}} g'(\theta_0),
\]
where $F_{\gamma-}$ stands for $F_{\gamma-1}$ if $\gamma$ is a successor. This contradicts $g(\theta_0) < f(\theta_0)$.

Finally, we show that all admissible rules are Bayes. Let $\delta$ be admissible with risk function $k$, which we just proved is in $\partial_L R_1$. Now apply Lemma 23 with $G = \{g : g \geq f \text{ for some } f \in R_1\}$ and $\mathcal{Z} = \Theta$.
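For finitely many rules and a finite $\Theta$, the lower boundary of Theorem 1 can be found by direct search rather than by the transfinite construction in the proof. The following sketch is a toy example of our own: it removes the risk functions dominated by another rule or by a mixture, leaving the admissible ones.

import numpy as np

risk = np.array([[1.0, 3.0],
                 [3.0, 1.0],
                 [2.0, 2.5],    # dominated by the 0.5/0.5 mix of rows 0 and 1
                 [2.5, 2.5]])   # dominated by row 2

def dominates(f, g, tol=1e-12):
    """f dominates g: f <= g everywhere and f < g somewhere (lower is better)."""
    return np.all(f <= g + tol) and np.any(f < g - tol)

mix = 0.5 * risk[0] + 0.5 * risk[1]   # risk function of a simple randomized rule
admissible = [i for i in range(len(risk))
              if not any(dominates(risk[j], risk[i]) for j in range(len(risk)))
              and not dominates(mix, risk[i])]
print("admissible rows:", admissible)   # rows 0 and 1 survive; 2 and 3 do not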

8. A Minimax Theorem.

Definition 16. A rule $\delta$ is called a minimax rule if

(22) $\sup_\theta R(\theta, \delta) = \inf_\gamma \sup_\theta R(\theta, \gamma)$.

A coherent prevision $\lambda_0$ on $\mathcal{M}_\Theta$ is called a least-favorable distribution if $r_0(\lambda_0) = \sup_\lambda r_0(\lambda)$. The right-hand side of (22) is called the minimax value of the decision problem.

The proof of our minimax theorem makes use of both the pointwise anduniform topologies on various sets.

Theorem 2 (Minimax theorem). There exist a minimax rule and a least-favorable prior. Every minimax rule is Bayes with respect to the least-favorable prior.

Proof. Let $R_0$ be the subset of $R_1$ consisting of the bounded elements. First, suppose that $R_0 = \emptyset$. Lemma 7 says that the prevision $\lambda_0$ defined by $\lambda_0(f) = \infty$ for all $f \in R$ is coherent, and $\lambda_0$ is clearly least-favorable. Every risk function is unbounded if and only if the minimax value of the decision problem is $\infty$, in which case every rule is minimax. Since $r_0(\lambda_0) = \infty$, every rule is Bayes with respect to $\lambda_0$.

For the rest of the proof, suppose that $R_0 \neq \emptyset$. Without loss of generality, assume that $L$ is non-negative. Let $f_* \in R_0$, and let $s_* = \sup_\theta f_*(\theta)$. Let $F$ be the set of all elements of $\mathcal{M}_\Theta$ that are bounded above by $s_*$. By Corollary 1, $F$ is a closed set in both topologies, so $F \cap R_1$ is closed. Also, note that $F \cap R_1 = F \cap R_0$. No decision rule whose risk function lies outside of $F$ could be a minimax rule. For each real $s \leq s_*$, let $A_s = \{f \in F : \sup_\theta f(\theta) < s\}$. Let


s0 = inf{s : As ∩ R0 6= ∅}. It is clear then that s0 = infδ supθ R(θ, δ). Let{fn}∞n=1 be a sequence of elements of R0∩F such that supθ fn(θ) ≤ s0+1/n.Since R0 ∩ F is closed (and compact in the pointwise topology) there isf0 ∈ R0∩F that is a (pointwise) cluster point of {fn}∞n=1 and supθ f0(θ) ≤ s0

by Lemma 5. So supθ f0(θ) = s0 and f0 is the risk function of a minimaxrule.

LetG = {f ∈ IR

Θ: f ≥ g, for some g ∈ R1}.

Let Z = Θ, and let k(θ) = s0 for all θ. Then the conditions of Lemma 22hold. There exists a coherent prevision λ0 such that λ0(k) = s0 and λ0(g) ≥s0 for all g ∈ R1. In particular,

(23) λ0(f) ≥ s0, for each f that is the risk function of a minimax rule}.

Because the risk function f of a minimax rule satisfies supθ f(θ) ≤ s0 wealso have

(24) λ(f) ≤ s0, for all λ.

In particular, λ0(f) ≤ s0 which combines with (23) to imply that λ0(f) = s0,and f is Bayes with respect to λ0. This makes rB(λ0) = s0. But (24) impliesthat rB(λ) ≤ s0 for all λ. This makes λ0 least-favorable.
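When $\Theta$ and the set of non-randomized rules are finite, Theorem 2 reduces to the classical matrix-game computation, which makes its conclusions easy to verify numerically. The following Python sketch (a toy risk matrix of our own, solved with scipy's linprog) computes a minimax randomized rule and a least-favorable prior and checks that the minimax rule is Bayes against that prior; it is a standard computation, not the paper's general proof.

import numpy as np
from scipy.optimize import linprog

R = np.array([[1.0, 4.0],     # R[i, j] = R(theta_j, gamma_i)
              [3.0, 1.0],
              [2.0, 3.0]])
n_rules, n_theta = R.shape

# Minimax randomized rule: minimize v subject to sum_i alpha_i R[i, j] <= v
# for every theta_j, with alpha in the simplex.  Variables (alpha, v).
c = np.zeros(n_rules + 1); c[-1] = 1.0
A_ub = np.hstack([R.T, -np.ones((n_theta, 1))])
b_ub = np.zeros(n_theta)
A_eq = np.array([[1.0] * n_rules + [0.0]]); b_eq = np.array([1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_rules + [(None, None)])
alpha, v = res.x[:n_rules], res.x[-1]

# Least-favorable prior: maximize w subject to sum_j lam_j R[i, j] >= w for
# every rule i, with lam in the simplex (the dual game).
c2 = np.zeros(n_theta + 1); c2[-1] = -1.0
A_ub2 = np.hstack([-R, np.ones((n_rules, 1))])
b_ub2 = np.zeros(n_rules)
A_eq2 = np.array([[1.0] * n_theta + [0.0]]); b_eq2 = np.array([1.0])
res2 = linprog(c2, A_ub=A_ub2, b_ub=b_ub2, A_eq=A_eq2, b_eq=b_eq2,
               bounds=[(0, None)] * n_theta + [(None, None)])
lam, w = res2.x[:n_theta], res2.x[-1]

assert abs(v - w) < 1e-7            # minimax value equals the maximin value
# The minimax rule attains the minimum Bayes risk under lam (it is Bayes):
assert alpha @ R @ lam <= min(R @ lam) + 1e-7
print("minimax value:", v, "least-favorable prior:", lam)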

The least-favorable distribution, in the case of Theorem 2 in which all risk functions are unbounded, is not necessarily the minimum extension of a finitely-additive probability.

9. Bayes Rules. In this section, we present some results on the existence and admissibility of Bayes rules, and one result about uniform dominance of rules that fail to be Bayes.

Lemma 26. Let $\lambda \in S_\Theta$. Then there is a Bayes rule with respect to $\lambda$ with risk function on the lower boundary.

Proof. Let $c = \inf_{f \in R} \lambda(f)$. For each positive integer $m$, there is $f_m \in R_1$ such that $\lambda(f_m) \leq c + 1/m$. Since $R_1$ is compact, there is a subnet $\{f_\eta\}$ of $\{f_m\}_{m=1}^{\infty}$ that converges to an element $f \in R_1$. Lemma 33 says that $\lambda$ is continuous, so $\lambda(f) = \lim_\eta \lambda(f_\eta) = c$. If $f$ is on the lower boundary, let $g = f$. If $f$ is not on the lower boundary, Theorem 1 says that there is $g$ on the lower boundary that dominates $f$, hence $\lambda(g) = c$ because it can be no lower. Every $\delta$ such that $R(\cdot, \delta) = g$ is a Bayes rule.


When the loss function is bounded, we can show that Bayes rules exist with respect to a larger class of priors. In this case, $R = R_1$, because all risk functions are integrals over $\mathcal{A}$ of bounded functions of $(\theta, a)$.

Lemma 27. Suppose that the loss function is bounded. Let $\lambda$ be a countably-additive discrete coherent prevision on $\mathcal{M}_\Theta$. Then there is a Bayes rule with respect to $\lambda$ with risk function on the lower boundary.

Proof. Let $c = r_0(\lambda) = \inf_{f \in R} \lambda(f)$. For each positive integer $m$, there is $f_m \in R$ such that $\lambda(f_m) \leq c + 1/m$. Since $R$ is compact, there is a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ that converges to an element $f \in R$. Let $\mathcal{U}$ be an ultrafilter of subsets of $\mathbb{N}$ with corresponding probability $P$ such that

(25) $\lim_{k \to \infty} f_{n_k}(\theta) = \int_{\mathbb{N}} f_m(\theta)\, P(dm)$,

for all $\theta$. Then,
\[
\begin{aligned}
c &= \lim_{k \to \infty} \int_\Theta f_{n_k}(\theta)\, \lambda(d\theta) \\
&= \int_{\mathbb{N}} \int_\Theta f_m(\theta)\, \lambda(d\theta)\, P(dm) \\
&= \int_\Theta \int_{\mathbb{N}} f_m(\theta)\, P(dm)\, \lambda(d\theta) \\
&= \int_\Theta f(\theta)\, \lambda(d\theta),
\end{aligned}
\]
where the third equality follows from Lemma 16, and the final equality follows from (25). If $f$ is on the lower boundary, let $g = f$. If $f$ is not on the lower boundary, Theorem 1 says that there is $g$ on the lower boundary that dominates $f$, hence $\lambda(g) = c$ because it can be no lower. Every $\delta$ such that $R(\cdot, \delta) = g$ is a Bayes rule.

Lemma 28. If a Bayes rule with respect to $\lambda$ exists, then there is an admissible Bayes rule with respect to $\lambda$.

Proof. Let $\delta_1$ be a Bayes rule with respect to $\lambda$ with risk function $f_1$. Then $r_0(\lambda) = r(\lambda, \delta_1)$. If $f_1 \in \partial_L R$, then $\delta_1$ is admissible. If $f_1 \notin \partial_L R$, Theorem 1 shows that there is $f_2 \in \partial_L R$ that dominates $f_1$, so that $f_2 \leq f_1$ and
\[
r_0(\lambda) \leq r(\lambda, f_2) \leq r(\lambda, f_1) = r_0(\lambda),
\]
so $f_2$ is also Bayes with respect to $\lambda$.


The one theorem conspicuous by its absence is one that would say that every prior distribution has a Bayes rule. The following is a counterexample to that missing theorem.

Example 7. Let $\Theta = (0, 1)$, and let the action space $\mathcal{A}$ be the set of all real-valued functions of the form $c I_A$, where $A$ is a non-empty open sub-interval of $\Theta$ and $c$ is the reciprocal of the length of $A$. Let the loss function be $L(\theta, a) = a(\theta) + d_a$, where $d_a = 0.1(1 - |A|)$ and $|A|$ is the length of the interval $A$ corresponding to $a$. Consider the prior $\lambda_0$ corresponding to an ultrafilter containing all sets of the form $(0, b)$ for $b \in (0, 1)$. Also, consider the sequence of non-randomized rules $\{\gamma_n\}_{n=1}^{\infty}$ with $\gamma_n = [n/(n-1)] I_{(1/n, 1)}$. The Bayes risk of each such rule is $r(\lambda_0, \gamma_n) = 0.1/n$, and the infimum of these Bayes risks is 0, which is clearly also the infimum of the Bayes risks over all decision rules. The remainder of this example is devoted to showing that no decision rule has Bayes risk equal to 0. If $\{R(\theta, \delta_\eta)\}_{\eta \in D}$ is a net of risk functions of simple randomized rules, then each one has the form

(26) $R(\theta, \delta_\eta) = \sum_{j=1}^{n_\eta} \frac{\alpha_{\eta,j}}{b_{\eta,j} - a_{\eta,j}} I_{(a_{\eta,j}, b_{\eta,j})}(\theta) + 0.1 \sum_{j=1}^{n_\eta} \alpha_{\eta,j} (1 - b_{\eta,j} + a_{\eta,j})$,

where $n_\eta$ is the number of components of the simple randomization $\delta_\eta$, the $j$th component is the action $[1/(b_{\eta,j} - a_{\eta,j})] I_{(a_{\eta,j}, b_{\eta,j})}$, and the $\alpha_{\eta,j}$ are the coefficients in the convex combination. Let $f_\eta(\theta)$ denote the first sum on the right-hand side of (26). Let $R(\theta, \delta_B)$ be a cluster point of $\{R(\theta, \delta_\eta)\}_{\eta \in D}$. We will prove that $r(\lambda_0, \delta_B) > 0$. According to Lemma 13, there is an ultrafilter $\mathcal{U}$ of subsets of $D$ and a corresponding probability $P$ such that
\[
R(\theta, \delta_B) = \int_D R(\theta, \delta_\eta)\, P(d\eta).
\]
The Bayes risk of $\delta_B$ with respect to $\lambda_0$ is then

(27) $r(\lambda_0, \delta_B) = \lim_\eta 0.1 \sum_{j=1}^{n_\eta} \alpha_{\eta,j} (1 - b_{\eta,j} + a_{\eta,j}) + \lim_{\theta \to 0} f(\theta)$,

where $f(\theta) = \lim_\eta f_\eta(\theta)$ for all $\theta$. In order for $\delta_B$ to be Bayes, we need both terms in (27) to be zero. We can make the first term in (27) smaller, while simultaneously reducing or keeping $f_\eta(\theta)$ the same for all $\eta$ and $\theta$, by replacing each $b_{\eta,j}$ by 1. Let $c \in (0, 1)$. We can make the first term in (27) smaller, while keeping $\lim_{\theta \to 0} f(\theta)$ the same, by replacing each $a_{\eta,j} > c$ by $c$. With these changes, we find that, for each $\theta > c$, $f_\eta(\theta) \geq 1$, so $f(\theta) \geq 1$. Either (i) for every $c$ there is a subnet of the net such that for each $\eta$ in the


subnet there exists $a_{\eta,j} < c$, or (ii) there is a tail of the net with $a_{\eta,j} \geq c$ for all $\eta$ in the tail. If case (i) is true, the subnet corresponding to each $c$ converges to $f$, and for each $\theta > 0$ there is $c > 0$ such that $\theta > c$, hence $f(\theta) \geq 1$. This makes the second term in (27) at least 1. If case (ii) is true, the first term in (27) is at least $0.1c$. Either way, $r(\lambda_0, \delta_B) > 0$.
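The first half of Example 7 can be checked numerically. In the Python sketch below (our own encoding of the rules; the ultrafilter prior itself cannot be represented computationally, but it "sees" only the values of the risk function near 0), the risk of $\gamma_n$ at points below $1/n$ is exactly the penalty $0.1/n$, so the Bayes risks tend to 0 as claimed.

def risk(n, theta):
    # gamma_n = [n/(n-1)] * I_{(1/n, 1)}; the interval has length 1 - 1/n.
    a_theta = n / (n - 1) if 1.0 / n < theta < 1.0 else 0.0
    penalty = 0.1 * (1.0 - (1.0 - 1.0 / n))        # d_a = 0.1 * (1 - |A|)
    return a_theta + penalty

for n in (2, 10, 100, 1000):
    # Below 1/n the indicator vanishes, so the risk equals 0.1/n, which is
    # all that a prior concentrating on the sets (0, b) can detect.
    print(n, risk(n, theta=0.5 / n), 0.1 / n)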

The failure of some priors to have Bayes rules may seem puzzling. For each prior $\lambda$, there are rules with Bayes risk arbitrarily close to $r_0(\lambda)$, and the lower boundary is contained in every risk set. The problem is that not all coherent previsions are continuous functions on $\mathcal{M}_\Theta$. The lack of continuity is due not to the finite-additivity of the previsions, but rather to the large number of nets that converge in the pointwise topology. By contrast, in the uniform topology, all finitely-additive previsions are continuous. We explore this relationship between prior distributions and pointwise convergence in more detail in Appendix B.

The following is a generalization of a theorem of Pearce (1984) which wasrestricted to finite action and parameter spaces. It is also an extension ofTheorem 2 of Heath and Sudderth (1978) to cases in which the order inwhich the data and parameter integrals are performed matters.

Lemma 29. Assume that there is a rule $\delta_0$ with bounded risk function such that $r(\lambda, \delta_0) > r_0(\lambda)$ for every $\lambda$; that is, there is no $\lambda$ such that $\delta_0$ is Bayes with respect to $\lambda$. Then there are a rule $\delta_1$ and $\epsilon > 0$ such that $R(\theta, \delta_0) \geq R(\theta, \delta_1) + \epsilon$ for all $\theta$.

Proof. Consider a new decision problem with loss $L'(\theta, a) = L(\theta, a) - R(\theta, \delta_0)$, so that $L'$ is bounded below. Let $R'$ be the new risk function, so that

(28) $R'(\theta, \delta) = R(\theta, \delta) - R(\theta, \delta_0)$,

for all $\delta$ and $\theta$. In this new problem, there is still no $\lambda$ such that $\delta_0$ is a Bayes rule. Also, $R'(\theta, \delta_0) = 0$ for all $\theta$. Theorem 2 applies, and there are a minimax rule $\delta_1$ and a least-favorable distribution $\lambda_1$. Let $\epsilon = -\sup_\theta R'(\theta, \delta_1)$. Since $\delta_0$ is not Bayes with respect to $\lambda_1$, it is not minimax. Since $\sup_\theta R'(\theta, \delta_0) = 0$, it follows that
\[
-\epsilon = \sup_\theta R'(\theta, \delta_1) < 0,
\]
so $R'(\theta, \delta_1) \leq -\epsilon < 0$ for all $\theta$. It follows that $\epsilon > 0$ and
\[
-\epsilon \geq R(\theta, \delta_1) - R(\theta, \delta_0),
\]
for all $\theta$. So $R(\theta, \delta_0) \geq R(\theta, \delta_1) + \epsilon$ for all $\theta$.


10. Discussion. We consider a general decision problem with a general loss function. The only assumption that we make about the loss function is that it is bounded below. The loss is allowed to take the value $\infty$. We put no restrictions on the distributions of the data that might be observed and used in decision rules. We allow all finitely-additive and countably-additive randomized rules. We prove that the risk set contains its lower boundary, and we prove general complete class and minimax theorems. In particular, all decision problems have a minimal complete class consisting of Bayes rules, and all decision problems have minimax rules and a least-favorable distribution.

We have defined the various parts of the decision problem to match the classical countably-additive theory as closely as we could. In particular, we defined a risk function for each randomized rule before introducing prior distributions. The Bayes risk then becomes the integral of the risk function with respect to the prior, which is also the prevision of the risk function. For non-randomized rules, we defined the risk function as the integral of the loss function with respect to the distribution of the data. We then defined the risk of a randomized rule to be the integral of the risk functions of non-randomized rules with respect to a finitely-additive probability over the set $G_0$ of non-randomized rules. This type of randomization is what Wald and Wolfowitz (1951) called a special randomization in the countably-additive setting. Special randomizations differ from general randomizations, where the loss function is redefined for randomized rules by integrating the loss function for a specific data value $x$ with respect to an $x$-specific randomization over $\mathcal{A}$. We show both that our risk set for special randomizations contains all of the risk functions that result from general randomizations in the countably-additive setting (Lemma 30) and that one can include all of those risk functions of general randomizations, together with the risk functions of the non-randomized rules, in a set $G_0$ before performing the finitely-additive special randomizations (Corollary 5).

We have done no a posteriori Bayesian analysis. That is, we have computed neither posterior distributions of parameters nor formal Bayes rules, which minimize posterior risk. In fact, we have made no concessions to a Bayesian who might wish to start with a general finitely-additive joint distribution of data and parameter, as Heath and Sudderth (1972, 1978) do. Instead, we require that the data integral (given $\theta$) be performed before the parameter integral (prior). A Bayesian with a general finitely-additive joint distribution of data and parameter might require that the integration be performed differently. As we saw in Example 5, the order in which one does integrals can make a difference in finitely-additive distributions even when


the integrands are positive and/or bounded. This is in sharp contrast to the countably-additive theory, in which Fubini and Tonelli theorems apply. In addition, decision problems involve a third integral over the action space, so there are six possible orders in which the integrals could be computed. The one we have chosen is (as we mentioned above) as close as we could come to the order used in the classical countably-additive approach to decision theory, and it accommodates all of the decision rules and risk functions that are available in the countably-additive theory.

Another fundamental difference between finitely-additive and countably-additive probabilities and expectation (prevision) operators is their domains of definition. Countably-additive probabilities are traditionally defined on $\sigma$-fields of subsets of their underlying spaces, and their associated expectation operators are defined on sets of functions that are measurable with respect to that same $\sigma$-field. A countably-additive probability can be extended in some cases beyond the $\sigma$-field on which it was originally defined. (Example 5 contains an example of such an extension.) Such extensions, beyond the measure completion of a probability, are rarely studied in detail because they are not unique and they fail to admit regular conditional probabilities (Seidenfeld, Schervish and Kadane (2001), Corollary 1). Finitely-additive probabilities can be defined on arbitrary collections of subsets of their underlying spaces, including the power sets. The associated coherent prevision operators can be defined on arbitrary sets of real-valued functions. Measurability of sets and/or functions is rarely an issue, except when proving that some extension is unique. Lemma 8 is one example of a result that relies on a "measurability" assumption, namely that $X \wedge m \in J$ for all $X \in J$ and all $m \geq 0$. Even if such an assumption failed for a set $J$, because coherent previsions can be extended to arbitrary supersets of their domains, one could extend the prevision to a superset of $J$ where the assumption holds before applying Lemma 8.

APPENDIX A: COUNTABLY-ADDITIVE RANDOMIZATIONS

As promised in Sections 2.1 and 6, we prove that the risk functions of the form (4) are included in the pointwise closure of $G$.

Lemma 30. Assume that each $P_\theta$ is countably-additive and a minimum extension. Also, assume that every randomization included in each randomized rule $\delta$ is countably-additive and a minimum extension. Let $G$ be the convex hull of the set of risk functions of non-randomized rules. Then every function of the form (4) is in the closure of $G$.

Proof. Let $H$ be the set of all functions of the form (4) with all


probabilities countably-additive. We need to show that every neighborhood of every element of $H$ contains an element of $G$. For a general element of $\overline{\mathbb{R}}^\Theta$, a neighborhood in the product topology is the union of arbitrarily many basic open sets, hence we can restrict attention to basic open sets. A basic open set is the intersection of finitely many one-dimensional open sets, each of which has one of the following two forms:

• $\{f : |f(\theta) - c| < \epsilon\}$ for some $\epsilon > 0$, some $\theta$, and some real $c$, or
• $\{f : f(\theta) > c\}$ for some $\theta$ and some real $c$.

Let $h \in H$, so that $\delta$ becomes fixed for the remainder of the proof. The neighborhoods of $h$ that we need to show intersect $G$ have the form

(29) $V = \{f : f(\theta_j) \in N_j, \text{ for } j = 1, \ldots, m\}$,

for some $m$ and some $\theta_1, \ldots, \theta_m$, with $N_j = (h(\theta_j) - \epsilon_j, h(\theta_j) + \epsilon_j)$ for all $j \in J_1$ and $N_j = (c_j, \infty]$ for all $j \in J_2 \cup J_3$, where each $\epsilon_j > 0$ and the sets $J_1, J_2, J_3$ are defined below.

Let $h$ be a function of the form (4), and let $V$ have the form (29). We need to find an element of $G$ that is in $V$. The rule $\delta_*$ whose risk function we find in $G \cap V$ will have the following form. There will be finitely many elements $a_0, \ldots, a_M$ of $\mathcal{A}$ and a partition $D_1, \ldots, D_R$ of $\mathcal{X}$ such that $\delta_* = \sum_{v \in \mathcal{V}} r_v \delta_v$, where $\mathcal{V}$ is a finite set and, for each $v \in \mathcal{V}$, (i) $\delta_v$ is constant on each $D_t$, $t = 1, \ldots, R$, and (ii) $\delta_v$ takes values only from the set $\{a_0, a_1, \ldots, a_M\}$. There are $(M + 1)^R$ possible $\delta_v$. The remainder of the proof is devoted to finding appropriate $a_0, \ldots, a_M$ along with an appropriate partition $D_1, \ldots, D_R$ and the corresponding weights $\{r_v\}_{v \in \mathcal{V}}$.

For each $j = 1, \ldots, m$, let $Q_j$ be the (countably-additive) prevision on $\overline{\mathbb{R}}^{\mathcal{A}}$ defined by $Q_j(Y) = \int_{\mathcal{X}} \int_{\mathcal{A}} Y(a)\, \delta(x)(da)\, P_{\theta_j}(dx)$. This makes $h(\theta_j) = Q_j[L(\theta_j, \cdot)]$. For $j = 1, \ldots, m$, let
\[
\begin{aligned}
B_j &= \{a : L(\theta_j, a) = \infty\}, \\
J_3 &= \{j : Q_j(B_j) > 0\}, \\
J_2 &= \{j \notin J_3 : h(\theta_j) = \infty\}, \text{ and} \\
J_1 &= \{j : h(\theta_j) < \infty\}.
\end{aligned}
\]
Let $\epsilon = \min\{\epsilon_j : j \in J_1\}$ and $c = \max\{c_j : j \in J_2 \cup J_3\}$. By construction, $Q_j(B_j) = 0$ for $j \in J_1$, and $Q_j(B_j) > 0$ for $j \in J_3$. For each $j \in J_3$, let $b_j \in B_j$, so that $L(\theta_j, b_j) = \infty$. Let $w$ be the number of distinct $b_j$ for $j \in J_3$.


Define
\[
\begin{aligned}
E_\infty &= \bigcup_{j=1}^{m} B_j, \\
E_n &= E_\infty \cup \{a : L(\theta_j, a) < n, \text{ for } j = 1, \ldots, m\}.
\end{aligned}
\]
Then $\bigcup_{n=1}^{\infty} E_n = \mathcal{A}$. For each $j \in J_1$, there exists $n_j$ such that

(30) $\left| h(\theta_j) - Q_j\left[ I_{E_{n_j}} L(\theta_j, \cdot) \right] \right| < \frac{\epsilon}{4}$.

For each $j \in J_2$, there exists $n_j$ such that
\[
Q_j\left[ I_{E_{n_j}} L(\theta_j, \cdot) \right] > c + \epsilon.
\]
Let $n = \max_{j \in J_1 \cup J_2} n_j$.

Next, we select $M$ and the $M + 1$ elements $a_0, \ldots, a_M$ of $\mathcal{A}$. For each $j \in J_1 \cup J_2$, partition $\bigcup_{j' \in J_1 \cup J_2} \{a : L(\theta_{j'}, a) \leq n\}$ into sets of the form $\{a : L(\theta_j, a) \in [(k - 1)\epsilon/4, k\epsilon/4)\}$, for $1 \leq k \leq \lfloor 1 + 4n/\epsilon \rfloor$. Then form the common refinement of these partitions. Let the non-empty sets in that common refinement be $A_1, \ldots, A_K$. For each $\ell \geq 1$ and each $j \in J_1 \cup J_2$, $L(\theta_j, a)$ (as a function of $a$) varies by less than $\epsilon/4$ on $A_\ell$. Let $a_\ell \in A_\ell$ for each $\ell = 1, \ldots, K$, and let $a_0 = a_1$. If $w > 0$, let $a_{K+1}, \ldots, a_{K+w}$ be the distinct values of $b_j$ for $j \in J_3$ in some order. Then $M = K + w$. Let
\[
\begin{aligned}
A_0 &= \left[ \bigcup_{j \in J_1 \cup J_2} \{a : L(\theta_j, a) > n\} \right] \setminus E_\infty, \\
A_{K+1} &= E_\infty.
\end{aligned}
\]
It follows that $\{A_0, \ldots, A_{K+1}\}$ is a partition of $\mathcal{A}$. For $j \in J_1$,

(31) $\left| Q_j\left[ I_{E_{n_j}} L(\theta_j, \cdot) \right] - \sum_{\ell=1}^{M} L(\theta_j, a_\ell)\, Q_j(A_\ell) \right| < \frac{\epsilon}{4}$,

and
\[
\frac{\epsilon}{4} \geq Q_j\left[ I_{E_{n_j}^C} L(\theta_j, \cdot) \right] \geq L(\theta_j, a_0)\, Q_j(A_0).
\]

Next, we form the partition $D_1, \ldots, D_R$. Let $\gamma = 1/\lceil 4n/\epsilon \rceil$, and for each $\ell = 0, \ldots, K + 1$, partition $\Lambda_{\mathcal{A}}$ into sets of the form $\{P : P(A_\ell) = 0\}$ and $\{P : P(A_\ell) \in ((k - 1)\gamma, k\gamma]\}$ for $1 \leq k \leq \lfloor 1 + 1/\gamma \rfloor$. Then form the common refinement of these partitions for $\ell = 0, \ldots, K + 1$. Let the non-empty sets


in that refinement be $C_1, \ldots, C_R$. Then, as a function of $P$ for fixed $\ell$ and $t$, $P(A_\ell)$ varies by less than $\epsilon/(4n)$ as $P$ varies over $C_t$. For each $x$, $\delta(x)(\cdot)$ lies in one and only one $C_t$. For $t = 1, \ldots, R$, let $D_t = \{x : \delta(x) \in C_t\}$.

Next, we construct the set of non-randomized rules $\{\delta_v\}_{v \in \mathcal{V}}$. Let $\mathcal{V}$ be the set of functions from $\{1, \ldots, R\}$ to $\{0, \ldots, M\}$, which has $(M + 1)^R$ elements. For each $v \in \mathcal{V}$, define
\[
\delta_v(x) = \sum_{t=1}^{R} a_{v(t)} I_{D_t}(x).
\]
That is, $\delta_v$ is the non-randomized rule such that, for each $t$, $\delta_v(x) = a_{v(t)}$ for all $x \in D_t$.

for all x ∈ Dt.Next, we construct the weights for each δv in the convex combination. For

each t = 1, . . . , R, let µt ∈ Ct. Let st,` = µt(A`) for ` = 0, . . . ,K. If w > 0,

let st,` = µt(AK+1)/w for ` = K + 1, . . . ,M . Clearly,∑M

`=0 st,` = 1 for each

t = 1, . . . , R. The weights are rv =∏Rt=1 st,v(t). It is straightforward to show

that for each ` = 0, . . . ,M and each t = 1, . . . , R,

(32)∑

{v:v(t)=`}

rv =∑

{v:v(t)=`}

R∏t=1

µt(A`) = st,`,

by induction on R. (The case R = 1 is trivial, and for R > 1, {v : v(t) = `}looks just like the R−1 case with each weight having an extra factor of st,`.)It then follows that

∑v∈V rv = 1.

Finally, we show that the risk function of $\delta_* = \sum_{v \in \mathcal{V}} r_v \delta_v$ is in $V$. The risk function of $\delta_*$ is
\[
\begin{aligned}
R(\theta, \delta_*) &= \sum_{t=1}^{R} P_\theta(D_t) \left[ \sum_{v \in \mathcal{V}} r_v L(\theta, a_{v(t)}) \right] \\
&= \sum_{t=1}^{R} P_\theta(D_t) \sum_{\ell=0}^{M} \sum_{\{v : v(t) = \ell\}} r_v L(\theta, a_\ell) \\
&= \sum_{t=1}^{R} P_\theta(D_t) \sum_{\ell=0}^{M} s_{t,\ell} L(\theta, a_\ell),
\end{aligned}
\]
by (32). For $j \in J_1 \cup J_2$ and each $x \in D_t$,

(33) $\left| \int_{\{a : L(\theta_j, a) \leq n\}} L(\theta_j, a)\, \delta(x)(da) - \sum_{\ell=0}^{M-w} s_{t,\ell} L(\theta_j, a_\ell) \right| \leq \frac{\epsilon}{4}$.


Integrate both sides of (33) with respect to $P_{\theta_j}$ to see that

(34) $\left| Q_j\left[ I_{\{L(\theta_j, \cdot) \leq n\}} L(\theta_j, \cdot) \right] - \sum_{t=1}^{R} P_{\theta_j}(D_t) \sum_{\ell=0}^{M-w} s_{t,\ell} L(\theta_j, a_\ell) \right| < \frac{\epsilon}{4}$.

For $j \in J_1$, combine (34) with (30) and the fact that $P_{\theta_j}(D_t) = 0$ for each $j \in J_1$ and each $t$ such that the randomizations in $C_t$ assign positive probability to $A_{K+1}$, to get
\[
|h(\theta_j) - R(\theta_j, \delta_*)| < \frac{\epsilon}{2}.
\]
For $j \in J_2$, we get $R(\theta_j, \delta_*) > c$. For $j \in J_3$, $R(\theta_j, \delta_*) \geq \epsilon/(8w)\, L(\theta_j, b_j) = \infty$. So the risk function of $\delta_*$ is in $V$.
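For finite $\mathcal{X}$ and finitely many actions, the product-weight construction in (32) is exact rather than approximate, and this can be verified directly. The following Python sketch (toy data of our own) mixes all $(M+1)^R$ step rules $\delta_v$ with weights $r_v = \prod_t s_{t,v(t)}$ and confirms that the mixture reproduces the risk of the randomized rule.

import itertools
import numpy as np

R_, M_ = 3, 2                        # |X| = R_, actions {0, ..., M_}
rng = np.random.default_rng(3)
s = rng.dirichlet(np.ones(M_ + 1), size=R_)   # s[t, l] = prob. of action l at x = t
L = rng.uniform(0, 5, size=M_ + 1)            # loss L(theta, a_l) at a fixed theta
P_X = rng.dirichlet(np.ones(R_))              # distribution of the data x

# Risk of the randomized rule: integrate the loss over x and the action.
risk_randomized = sum(P_X[t] * sum(s[t, l] * L[l] for l in range(M_ + 1))
                      for t in range(R_))

# Risk of the mixture of all (M_+1)^R_ non-randomized step rules delta_v.
risk_mixture = 0.0
for v in itertools.product(range(M_ + 1), repeat=R_):
    r_v = np.prod([s[t, v[t]] for t in range(R_)])
    risk_mixture += r_v * sum(P_X[t] * L[v[t]] for t in range(R_))

assert abs(risk_randomized - risk_mixture) < 1e-12
print(risk_randomized, risk_mixture)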

APPENDIX B: A SORE POINT ABOUT POINTWISE CONVERGENCE

The pointwise topology allows convergence of nets to occur where one might not expect it.

Lemma 31. Let $\mathcal{Z}$ be a set. Let $f$ and $g$ be elements of $\overline{\mathbb{R}}^{\mathcal{Z}}$. There exists a net $\{f_\eta\}_{\eta \in D}$ that converges pointwise to $g$ such that, for each $\eta$, $f_\eta(z) = f(z)$ for all but finitely many $z$.

Proof. Let $D$ be the directed set of all finite subsets of $\mathcal{Z}$, as in Example 2. Define the net
\[
f_\eta(z) = \begin{cases} g(z) & \text{if } z \in \eta, \\ f(z) & \text{if } z \notin \eta. \end{cases}
\]
Clearly, $f_\eta(z) = f(z)$ for all but finitely many $z$, namely those in the set $\eta$. To see that $f_\eta \to g$, let $N$ be a neighborhood of $g$. Then $N$ contains a set of the form
\[
N' = \{h : h(z_j) \in N_j, \text{ for } j = 1, \ldots, n\},
\]
where $n$ is a positive integer, $z_1, \ldots, z_n \in \mathcal{Z}$, $N_1, \ldots, N_n$ are open sets in $\overline{\mathbb{R}}$, and $g \in N'$. The proof will be complete if we can show that there exists $\eta_N$ such that $\eta_N \leq_D \eta$ implies $f_\eta \in N'$. Because $g \in N'$, $g(z_j) \in N_j$ for $j = 1, \ldots, n$. Let $\eta_N = \{z_1, \ldots, z_n\}$. Then $\eta_N \leq_D \eta$ implies $f_\eta(z_j) = g(z_j)$ for $j = 1, \ldots, n$, and $f_\eta \in N'$.
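The net in Lemma 31 can be visualized along any chain of finite subsets of a countable $\mathcal{Z}$. The toy Python sketch below (our own illustration, with $\mathcal{Z}$ truncated to 100 points so the script terminates) modifies $f$ on larger and larger finite sets and watches the agreement with $g$ grow, which is exactly how the net converges pointwise.

Z = range(100)
f = {z: 0.0 for z in Z}
g = {z: float(z) for z in Z}

def f_eta(eta):
    """The net element indexed by the finite set eta."""
    return {z: (g[z] if z in eta else f[z]) for z in Z}

chain = [set(range(k)) for k in (0, 10, 50, 100)]   # a cofinal chain in D
for eta in chain:
    h = f_eta(eta)
    agree = sum(1 for z in Z if h[z] == g[z])
    print(f"|eta| = {len(eta):3d}: agrees with g at {agree} points")
# In general, the net converges to g pointwise along the directed set of
# ALL finite subsets of Z, while each stage differs from f only finitely.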

For $c \in \overline{\mathbb{R}}$, let
\[
H_{\lambda,c} = \{f : \lambda(f) = c\}.
\]


When trying to find a Bayes rule with respect to a coherent prevision $\lambda$ on $\mathcal{M}_\Theta$, we seek functions that lie in $H_{\lambda,c} \cap R$, for $c = \inf_{f \in R} \lambda(f)$. A consequence of Lemma 31 is the following result, which makes it difficult to find Bayes rules in the pointwise topology.

Lemma 32. Let $\lambda$ be a coherent prevision on $\overline{\mathbb{R}}^\Theta$ such that $\lambda(f) = \lambda(g)$ whenever $f$ and $g$ differ at only finitely many points. (That is, $\lambda(A) = 0$ for each finite subset $A$ of $\Theta$.) For each pair $(c, d)$ of elements of $\overline{\mathbb{R}}$ and each $g \in H_{\lambda,d}$, there exists a net contained in $H_{\lambda,c}$ that converges to $g$.

Proof. Let $f \in H_{\lambda,c}$ and $g \in H_{\lambda,d}$, and apply Lemma 31: each element $f_\eta$ of the resulting net differs from $f$ at only finitely many points, so $\lambda(f_\eta) = c$, while $f_\eta \to g$.

Lemma 32 says that, if $\lambda$ assigns probability 0 to each finite set, then each $H_{\lambda,c}$ is dense in $\overline{\mathbb{R}}^\Theta$. No matter where you look in the risk set, you will find risk functions "in the neighborhood" that have every possible Bayes risk. Lemma 32 applies to both finitely-additive and countably-additive previsions.

There is one set of prior distributions for which Bayes rules can be found.

Lemma 33. Let $\mathcal{Z}$ be a set. A coherent prevision $\lambda$ on $\mathcal{M}_{\mathcal{Z}}$ is continuous in the pointwise topology if and only if $\lambda$ is simple and is a minimum extension.

Proof. For the "if" direction, note that all evaluation functionals are continuous in the pointwise topology, so every convex combination of them is continuous, and a simple prevision that is a minimum extension is a convex combination of evaluation functionals. For the "only if" direction, we show that all other coherent previsions are discontinuous. First, let $\lambda$ be a simple prevision that is not a minimum extension. Then there exists a function $f$ such that $\lim_{m \to \infty} \lambda(f \wedge m) < \lambda(f)$ while $\lim_{m \to \infty} (f \wedge m) = f$ pointwise. Next, assume that the restriction of $\lambda$ to indicators of events takes infinitely many different values. Then there is a countable partition $\mathcal{Z} = \bigcup_{n=1}^{\infty} \mathcal{Z}_n$ such that $\lambda(\mathcal{Z}_n) > 0$ for all $n$. Let $f_n(z) = I_{\mathcal{Z}_n}(z)/\lambda(\mathcal{Z}_n)$ for each $n$. Then $f_n$ converges pointwise to 0, but $\lambda(f_n) = 1$ for all $n$. Next, assume that the restriction of $\lambda$ to indicator functions takes only finitely many different values. Then there exist finitely many distinct ultrafilters $\mathcal{U}_1, \ldots, \mathcal{U}_n$ such that $\lambda = \sum_{j=1}^{n} \alpha_j P_j$, where each $P_j$ is the probability associated with $\mathcal{U}_j$, each $\alpha_j > 0$, and $\sum_{j=1}^{n} \alpha_j = 1$. Then there is a partition $\mathcal{Z} = \bigcup_{j=1}^{n} \mathcal{Z}_j$ where $\mathcal{Z}_j \in \mathcal{U}_j$ for each $j$. If each $\mathcal{U}_j$ is principal, then $\lambda$ is simple. Finally, assume that there is $j$ such that $\mathcal{U}_j$ is non-principal. Then $P_j$ is not countably-additive. Since $P_j$ takes only the values 0 and 1, there must be a partition $\mathcal{Z}_j = \bigcup_{k=1}^{\infty} A_k$ such that $P_j(A_k) = 0$ for all $k$. Define $g_m(z) = I_{\bigcup_{i=m}^{\infty} A_i}(z)$. Then $g_m$ converges pointwise to 0, but $\lambda(g_m) = \alpha_j$ for all $m$.
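The middle case in the proof of Lemma 33 is easy to see numerically, even for a countably-additive prevision. In the Python sketch below (our own toy example on the positive integers, truncated for display), the functions $f_n = I_{\mathcal{Z}_n}/\lambda(\mathcal{Z}_n)$ converge to 0 pointwise while $\lambda(f_n) = 1$ for every $n$, exhibiting the discontinuity in the pointwise topology.

# lambda({n}) = 2^-n on the positive integers; f_n = I_{{n}} / lambda({n}).
lam = {z: 2.0 ** -z for z in range(1, 51)}

def prevision(f):
    return sum(lam[z] * f.get(z, 0.0) for z in lam)

for n in (1, 5, 25, 50):
    f_n = {n: 1.0 / lam[n]}            # equals 0 at every other coordinate
    print(n, prevision(f_n))           # always 1.0, though f_n -> 0 pointwise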


As a purely mathematical aside, the proof of Lemma 33 could be modified to prove the following.

Proposition 13. Let $\mathcal{Z}$ be a set. The set of continuous linear functionals on $\mathbb{R}^{\mathcal{Z}}$ with the pointwise topology is the linear span of the evaluation functionals.

REFERENCES

Berger, J. O. and Srinivasan, C. (1978). Generalized Bayes Estimators in Multivariate Problems. Annals of Statistics 6 783-801.
Brown, L. D. (1971). Admissible Estimators, Recurrent Diffusions, and Insoluble Boundary Value Problems. Annals of Mathematical Statistics 42 855-903.
Comfort, W. W. and Negrepontis, S. (1974). The Theory of Ultrafilters. Springer-Verlag, New York.
de Finetti, B. (1974). Theory of Probability. Wiley, New York.
Dunford, N. and Schwartz, J. T. (1957). Linear Operators, Part I: General Theory. Wiley, Hoboken, NJ.
Heath, D. and Sudderth, W. (1972). On a Theorem of de Finetti, Oddsmaking, and Game Theory. Annals of Mathematical Statistics 43 2072-2077.
Heath, D. and Sudderth, W. (1978). On Finitely Additive Priors, Coherence, and Extended Admissibility. Annals of Statistics 6 333-345.
Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer-Verlag, New York.
Pearce, D. (1984). Rationalizable Strategic Behavior and the Problem of Perfection. Econometrica 52 1029-1050.
Schervish, M. J., Seidenfeld, T. and Kadane, J. B. (2009). Proper Scoring Rules, Dominated Forecasts, and Coherence. Decision Analysis 6 202-221.
Schervish, M. J., Seidenfeld, T. and Kadane, J. B. (2014). Infinite Previsions and Finitely Additive Expectations. DOI: 10.1214/14-AOS1203SUPP.
Seidenfeld, T., Schervish, M. J. and Kadane, J. B. (2001). Improper Regular Conditional Distributions. Annals of Probability 29 1612-1624.
Wald, A. and Wolfowitz, J. (1951). Two Methods of Randomization in Statistics and the Theory of Games. Annals of Mathematics 52 581-586.

Mark J. Schervish and Joseph B. Kadane
Department of Statistics and Data Science
Carnegie Mellon University
Pittsburgh, PA 15213
E-mail: [email protected]
[email protected]

Teddy Seidenfeld
Department of Statistics and Data Science
and Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
E-mail: [email protected]

Rafael Stern
Department of Statistics
Federal University of São Carlos
São Carlos, SP, Brazil
E-mail: [email protected]
