GAMES AND ECONOMIC BEHAVIOR 16, 1–21 (1996), Article No. 0071

Compensation Principle in Repeated Games

Kong-Pin Chen∗

Institute for Social Sciences and Philosophy, Academia Sinica, Taipei 11529, Taiwan

Received February 15, 1994

A behavioral refinement of subgame perfect equilibria for repeated games is proposed. A temporal deviation-free equilibrium requires that an equilibrium should not assign an action profile from which a unilateral deviation can make all the players better off at that stage. We characterize the set of equilibrium payoffs and give a simple algorithm to find it. An extension of it, compensating equilibrium, requires that an equilibrium should leave no room for the possibility that a multishot unilateral deviation can make all the players better off. We partially characterize the equilibrium. Its relation to other refinement concepts is also discussed. Journal of Economic Literature Classification Numbers: C72, D74. © 1996 Academic Press, Inc.

1. INTRODUCTION

It is a distinguishing feature in repeated game theory that the players' utilities depend only on total payoff, which is the sum of stage game payoffs. This means that what the players care about is the total payoff itself, not the path by which this payoff is attained. That is, so long as the total payoff remains the same, the players do not care about the intertemporal composition of it. But the usual practice in the theory of repeated games is to find a path that renders a particular payoff; in order to support it, any deviation from this path will be punished, regardless of the nature of the deviation. Under this practice there are some strategies which are not reasonable from a behavioral viewpoint.

For instance, consider the game in Fig. 1. If the game is played twice, there will be a subgame perfect equilibrium which plays (a_3, b_3) first, and then (a_2, b_2). To support it, if any player deviates in the first stage, then instead of (a_2, b_2) they play (a_1, b_1) in the final stage. This is an unreasonable strategy because it is designed in a way to make sure that all the players receive low payoffs in the first stage; i.e., it is a self-punishing strategy which prevents the players from receiving higher payoffs when that is possible with a unilateral deviation in the first stage. For instance, if player 1 plays a_1 instead of a_3 in the first stage, the payoff vector in that stage will be (2, 1) instead of (0, 0). This is considered a deviation and the punishment (a_1, b_1) will be invoked in the second stage. However, both players are made better off by this deviation. So if it were really to occur, player 2 would be happy with player 1's deviation and thus, most likely, would forget about it and play (a_2, b_2) as if both players had fulfilled their "obligation" in the first stage. But if so, the action profile (a_3, b_3) will never be used on the equilibrium path, because if it were, player 1 would have an incentive to deviate at that stage and enjoy the same continuation payoff.

∗ I thank William Thomson, David Oliver, Shigeo Muto, and the seminar and conference participants at the University of Rochester, National Taiwan University, the 1992 Far Eastern Meeting of the Econometric Society, and the Fourth International Conference on Game Theory for their comments. Most of all, I am extremely grateful to Jeff Banks and Drew Fudenberg for their encouragement, two referees for their insightful comments, and Sang-Chul Suh for his initial collaboration. Financial support from NSC (84-2415-H001-013) is gratefully acknowledged.

0899-8256/96 $18.00
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

FIG. 1. Motivation for refinement.

The main point is, if the players care only about the payoff itself, they should not mind which path they are on as long as this path gives them payoffs as high as the original path would. So we should think of the equilibrium of a repeated game as a "promise" that gives every player a specific payoff at each stage, and a player is happy as long as at every stage he receives at least the amount of payoff he is entitled to, regardless of which path the players are on. A particular implication of this is that a deviation which makes every player better off should be ignored.

In this paper we propose a refinement of subgame perfect equilibrium based on this intuition, which we call a "temporal deviation-free equilibrium." It requires that an equilibrium strategy should never assign an action profile from which a unilateral deviation can make every player better off at that stage. We characterize the equilibrium payoff and show that it has a striking effect when the players use only pure strategies. We also give a simple algorithm to find the equilibrium payoff.¹

We then suggest an extension of this refinement, which is called the "compensating equilibrium." It requires that an equilibrium strategy should leave no room for the chance that a multishot unilateral deviation can make all the players better off. The raison d'être is that, if a player has a multishot unilateral deviation which makes all the players better off, this player can ask, and the other players are likely to agree, to go back to the original path as if no player had ever deviated. That is, deviations that benefit every player are in effect encouraged. We show that this equilibrium always exists, and its payoffs contain the convex hull of stage game Nash equilibrium payoffs. We also partially characterize the equilibrium payoffs. Its relation to other refinement concepts in the literature is then discussed.

The organization of the remainder of the paper is as follows. Section 2 gives the formal model. Section 3 defines and characterizes the temporal deviation-free equilibrium. Numerous examples are given to illustrate its refining power. Section 4 defines a compensating equilibrium and partially characterizes it. Section 5 discusses the relation between the compensating equilibrium and a prominent equilibrium concept in repeated games, renegotiation-proofness. Section 6 concludes the paper.

2. THE MODEL

In this section we define a simple repeated game with discounting.

The Stage Game Γ

Γ = (A_1, A_2, . . . , A_n; π_1, π_2, . . . , π_n) ≡ (A, π) is a normal form game. A_i is a finite set of pure strategies and Δ(A_i) is the set of mixed strategies for player i. Denumerate the elements in A as a^1, a^2, . . . , a^M, where M is the number of pure strategy profiles in A. Let Δ(A) ≡ ×_{i=1}^n Δ(A_i). π_i: Δ(A) → R is the payoff function for player i. Let a and s be generic elements of A and Δ(A), respectively. Assume that mixed strategies are observable; i.e., the randomizing devices can be publicly observed.² In order to distinguish the strategies of the repeated game we will abuse the terminology a little by calling both a and s actions.

¹ This refinement is related to what is called an "efficient breach" in the literature on contract economics (Polinsky, 1983). The point of the "efficient breach" principle is that if a breach can produce an additional surplus that makes some of the agents better off without hurting the others, then they should do it. This, in the context of repeated games, implies that on the equilibrium path there should not be any chance for a "breach".

² This is a commonly made assumption in the standard repeated game analysis, and is known not to matter much. Fudenberg and Maskin (1991) have shown that even if public randomization is not

The Repeated Game Γ^∞(δ)

Γ^∞(δ) is the infinitely repeated version of Γ with discount factor δ.

A path is a sequence of action profiles s = {s(t)}_{t=1}^∞, where s(t) ∈ Δ(A) for all t. A history of length t is s^t = (s(1), . . . , s(t)) ∈ Δ(A)^t. The payoff for player i from path s is

V_i(s) = (1 − δ) Σ_{t=1}^∞ δ^{t−1} π_i(s(t)).

A strategy for player i is a sequence of functions σ_i = {σ_i(t)}_{t=1}^∞, where σ_i(1) ∈ Δ(A_i) and σ_i(t): Δ(A)^{t−1} → Δ(A_i) for all t ≥ 2. Let σ = (σ_1, . . . , σ_n). Let s(σ) be the path induced by the strategy σ. The expected payoff for player i from σ is denoted by V_i(s(σ)). Let σ|s^t be the strategy induced by the history s^t, i.e., σ|s^t(s^m) = σ(s^t, s^m) for all s^m ∈ Δ(A)^m.

σ is a Nash equilibrium if V_i(s(σ)) ≥ V_i(s(σ'_i, σ_{−i})) for all σ'_i and for all i. σ is a subgame perfect equilibrium (SPE) if σ|s^t is a Nash equilibrium for any possible history s^t.

Let m_i = min_{s_{−i} ∈ Δ(A)_{−i}} max_{a_i ∈ A_i} π_i(a_i, s_{−i}) be the minimax payoff for player i, and let π_i(z^i) = m_i; z^i is the action profile which gives i his minimax payoff. Also let F = CO{π(a) | a ∈ A}, the convex hull of the feasible payoff set, and F* = {v ∈ F | v_i > m_i ∀i}, the feasible and individually rational payoff set.
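As a small illustration of these objects, the pure-action analogue of the minimax value m_i (the variant used for the algorithm in Section 3; the general definition above allows mixed punishments) can be computed directly. The bimatrix below is a hypothetical prisoners'-dilemma-style game, not one of the paper's figures:

```python
# Pure-action minimax value: a sketch of
#   m_i = min over the opponent's actions of max over i's own actions of pi_i,
# with both sides restricted to pure actions. The payoff matrix is an
# illustrative assumption (a PD-style game), not taken from the paper.

def pure_minimax(payoff, player):
    """payoff[(a1, a2)] -> (pi_1, pi_2); returns player's pure minimax value."""
    acts1 = sorted({a for a, _ in payoff})
    acts2 = sorted({b for _, b in payoff})
    own, opp = (acts1, acts2) if player == 1 else (acts2, acts1)

    def stage(a_own, a_opp):
        key = (a_own, a_opp) if player == 1 else (a_opp, a_own)
        return payoff[key][player - 1]

    # The opponent picks the action that minimizes i's best-response payoff.
    return min(max(stage(a, b) for a in own) for b in opp)

# Hypothetical PD-style payoffs:
g = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
     ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
m = (pure_minimax(g, 1), pure_minimax(g, 2))
```

Here the opponent's minimaxing action is D, holding each player to his punishment value of 0.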

3. TEMPORAL DEVIATION-FREE EQUILIBRIUM

In this section we propose a refinement of SPE, the temporal deviation-free equilibrium, which requires that a deviation by one player which makes all the players better off should be ignored. This implies that an action profile from which a unilateral deviation can make all the players better off should not be part of an equilibrium. We first characterize the equilibrium payoff. Then we give a simple algorithm to find it when the players use only pure actions. This algorithm is then applied to some familiar games in various examples.

available and a mixed strategy is not observable, the set of subgame perfect equilibria remains the same. This is proved by replacing all the mixed strategies on the equilibrium path by sequences of pure strategies yielding exactly the same payoffs. When a mixed minimax strategy is called for in the punishment phase, the continuation payoff is carefully constructed so that the punishers are indifferent between all the pure strategies which are given positive weight by this minimax strategy. Unfortunately, this construction cannot be applied in our model because some of the pure strategies that are required to replace the mixed strategies are eliminated by the refinement restriction (see Section 3). So the assumption of the observability of a mixed strategy does matter in our model. However, to give a satisfactory treatment of it may require lengthy manipulation of the order of the actions and the value of continuation payoffs and is somewhat beyond our main interest. So we content ourselves with simply assuming that the mixed strategy is observable.

DEFINITION 1. Let σ be a subgame perfect equilibrium of Γ^∞(δ). Suppose there are two histories s^t = (s^{t−1}, σ(t)(s^{t−1})) and s̄^t = (s^{t−1}, s̄(t)) such that

s̄(t) = (s̄_i(t), σ_{−i}(s^{t−1})),
π(s̄(t)) ≥ π(σ(t)(s^{t−1})).

Then σ is called a temporal deviation-free equilibrium (TDFE) if

σ|s^t = σ|s̄^t.

This definition says that, at some stage t, after a certain history s^{t−1}, σ prescribes that σ(s^{t−1}) be played. But if at this stage, instead of playing σ_i(s^{t−1}), player i deviates unilaterally and plays s̄_i(t), and in his doing so every player is at least as well off, then the TDFE requires that the players keep playing the game as if no player has deviated.

Remarks. (1) A TDFE always exists because playing the stage game Nash equilibrium at all stages is a TDFE.

(2) It is clear from Definition 1 that an action profile from which some unilateral deviation makes the deviant strictly better off without hurting the other players will never be used by a TDFE, because if it were, the deviant could make himself better off at that stage and still enjoy the same continuation payoff, violating the requirement of SPE. So the definition of TDFE can be rephrased as follows:

DEFINITION 1′. Let σ be a SPE of Γ^∞(δ) and E be the set of actions s ∈ Δ(A) such that there exist i ∈ N and s'_i ∈ Δ(A_i) with

π(s'_i, s_{−i}) ≥ π(s), π_i(s'_i, s_{−i}) > π_i(s); (1)

then σ is a TDFE if σ|s^t(t + 1) ∉ E for all s^t and t.

Let D(s̄_i) = {s ∈ E | s_i = s̄_i}. Let D_i = {s̄_i ∈ Δ(A_i) | D(s̄_i) = (s̄_i, Δ(A)_{−i})}. D ≡ ×_{i=1}^n D_i.

Elements in E are those action profiles from which a unilateral deviation can make the deviant strictly better off and at the same time make the rest of the players at least as well off.

Elements in D(s̄_i) are action profiles in E whose ith component is s̄_i. D(s̄_i) can be empty.


FIG. 2. Elements in D_i.

Elements in D_i are those actions of player i which, when combined with any s_{−i} ∈ Δ(A)_{−i}, are always in E. This means that a TDFE will not prescribe player i to use actions in D_i because there is always a chance for a Pareto-improving unilateral deviation (not necessarily by i). For example, in the game of Fig. 2, any action profile for player 2 which assigns positive weight on b_3 is in D_2.

Given any s_{−i} ∈ Δ(A)_{−i}\D_{−i}, define v̄_i(s_{−i}) = max_{a_i ∈ A_i} π_i(a_i, s_{−i}). Let x_i(s_{−i}) ∈ A_i be such that π_i(x_i(s_{−i}), s_{−i}) = v̄_i(s_{−i}). Let m̄_i = min_{s_{−i}} {π_i(x_i(s_{−i}), s_{−i}) | (x_i(s_{−i}), s_{−i}) ∉ E}. Finally, let π_i(x̄^i) = m̄_i; x̄^i is the action profile which attains the constrained "minimax" for i, m̄_i. By definition x̄^i is not in E.

In a TDFE, there are restrictions on the actions the players can use to minimax some player. In particular, when minimaxing player i, any player j ≠ i will not use actions in D_j. (That is why we require that s_{−i} ∈ Δ(A)_{−i}\D_{−i} above.) Also, if any x ∈ E happens to be the action profile that minimaxes i, then it will not be used by TDFE either. (That is why we require (x_i(s_{−i}), s_{−i}) ∉ E when we define m̄_i above.) So the minimax payoff that any player i can be held down to, m̄_i, might be higher than m_i when the TDFE restriction is imposed. (See Example 7.)

In order to state the theorem formally we need one more definition.

DEFINITION (Abreu et al., 1994). Let V = CO{π(s) | s ∈ Δ(A)\E} and V* = {v ∈ V | v_i > m̄_i ∀i}. V satisfies payoff asymmetry if there exist v^i ∈ V for i = 1, 2, . . . , n such that v^i_i < v^j_i for all j ≠ i.

THEOREM 1. If V satisfies payoff asymmetry, then for any v ∈ V* there exists δ̲ < 1 such that v can be supported by a TDFE if δ ≥ δ̲. Moreover, only elements in V* can be TDFE payoffs.

This is in fact a Folk-theorem-like result for when the players are precluded from using action profiles in E.

In order to prove the theorem, we need several previous results.


THEOREM (Abreu et al., 1994). Assume that F* satisfies payoff asymmetry. Then for any v ∈ F*, there exists δ̲ < 1 such that v is a SPE outcome for any repeated game with discount factor δ ≥ δ̲.³

This is a result which greatly weakens the condition previously needed to prove the Folk Theorem (viz., full dimensionality; see Fudenberg and Maskin, 1986). The idea of the proof is that under payoff asymmetry it can be shown that, given any v ∈ V*, there exist v^k ∈ V*, k = 1, 2, . . . , n, such that v^k_k < v^i_k for all i ≠ k, v^k ≫ m̄ ≡ (m̄_1, . . . , m̄_n), and v^k_k < v_k. In order to support v as a SPE outcome, the players play a path yielding the payoff v and use z^k and v^k as punishments against player k if he deviates.

LEMMA (Fudenberg and Maskin, 1991). If δ > 1 − 1/M, then for any v ∈ F* there is a sequence of pure actions a = {a(t)} such that V(a) = v.

This lemma says that even if restricted to pure actions, it is still possible to generate any feasible and individually rational payoff by playing a sequence of appropriately ordered pure actions. That is, restricting to pure actions loses nothing in terms of possible payoffs. However, it can be easily shown that it applies also to mixed actions and to V*. That is, for any v ∈ V* there exists a sequence of mixed actions {s(t)}_{t=1}^∞ whose payoff is v.
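A minimal one-dimensional sketch of the idea behind this lemma (the lemma itself concerns payoff vectors; the scalar case below, with M = 2 stage payoffs and hypothetical values, shows how δ > 1 − 1/M lets a greedy deterministic sequence hit an arbitrary target payoff exactly):

```python
# Greedy construction of a pure payoff sequence whose discounted average
# equals a target v in [lo, hi]. At each stage we track the "residual"
# target w (the payoff still owed per stage) via v = (1-delta)*p + delta*w',
# and pick the stage payoff p in {lo, hi} that keeps w' inside [lo, hi].
# With delta > 1/2 (= 1 - 1/M for M = 2) one of the two choices always works.

def pure_sequence(v, lo, hi, delta, horizon):
    assert lo <= v <= hi and delta > 0.5
    seq, w = [], v
    for _ in range(horizon):
        # Choosing hi is feasible iff the residual stays at least lo.
        p = hi if (w - (1 - delta) * hi) / delta >= lo else lo
        seq.append(p)
        w = (w - (1 - delta) * p) / delta
    return seq

def discounted_value(seq, delta):
    # (1 - delta) * sum_t delta^(t-1) * p_t, indexing t from 0 here.
    return (1 - delta) * sum(delta ** t * p for t, p in enumerate(seq))

path = pure_sequence(0.3, 0.0, 1.0, 0.9, 400)
```

The truncation error after T stages is at most δ^T times the payoff range, so the finite approximation converges to the target geometrically.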

We now use the results above to prove our theorem.

Proof of Theorem 1. The second part of the theorem, that only elements in V* can be TDFE payoffs, is obvious, since any player k can guarantee a minimum payoff m̄_k in the game by playing the best response against x̄^k in every stage. We now prove the first part of the theorem. By payoff asymmetry there exist vectors v^1, . . . , v^k, . . . , v^n in V* with v^k ≫ m̄, v^k_k < v_k, and v^k_k < v^i_k for all i ≠ k and for all k. Any v ∈ V* can be written as a linear combination of elements in Δ(A)\E. So by the lemma we can find a sequence of action profiles {s(t)}_{t=1}^∞ in Δ(A)\E such that V({s(t)}_{t=1}^∞) = v.

Because v^k ∈ V* for k = 1, . . . , n, we can again find n sequences of action profiles {s^k(t)}_{t=1}^∞ in Δ(A)\E with V({s^k(t)}) = v^k. By the theorem of Abreu et al. (1994), v can be supported by the following strategy:

Phase I: Play according to the path {s(t)}_{t=1}^∞. If any player k deviates, go to phase II_k.

Phase II_k: Play x̄^k for T_k stages, then go to phase III_k. If any j deviates, start II_j.

Phase III_k: Play according to the path {s^k(t)}_{t=1}^∞. If any player j deviates, go to phase II_j.

³ This theorem is in fact stated in terms of the NEU condition, which is implied by payoff asymmetry. It is easier to work with the latter in our context.


It is easy to show that if T_k is chosen appropriately the strategy above is a SPE. Note that all s(t)'s, x̄^k's, and s^k(t)'s are in Δ(A)\E, so in no stage will σ assign elements in E. This means that σ is also a TDFE. Q.E.D.

Next we give a few examples to illustrate how to find the TDFE payoff set.

EXAMPLE 1 (Chicken). In the chicken game of Fig. 3a we first want to find the set E. Let r (t) be the weight that player 1's (2's) action s_1 (s_2) puts on a_1 (b_1); that is, s_1(a_1) = r, s_2(b_1) = t. Then

E = {s ∈ Δ(A) | ∂π_i(s)/∂r ≥ 0 for all i, ∂π_1(s)/∂r > 0, r < 1}
∪ {s ∈ Δ(A) | ∂π_i(s)/∂t ≥ 0 for all i, ∂π_2(s)/∂t > 0, t < 1}
∪ {s ∈ Δ(A) | ∂π_i(s)/∂t ≤ 0 for all i, ∂π_2(s)/∂t < 0, t > 0}
∪ {s ∈ Δ(A) | ∂π_i(s)/∂r ≤ 0 for all i, ∂π_1(s)/∂r < 0, r > 0}
= {s ∈ Δ(A) | r < 1/2 and t < 1} ∪ {s ∈ Δ(A) | t < 1/2 and r < 1}.

So

π(Δ(A)\E) = CO{(5, 1), (4, 4)} ∪ CO{(1, 5), (4, 4)} ∪ H,

where

H = {π(s) | 1 ≥ s_1(a_1) ≥ 1/2, 1 ≥ s_2(b_1) ≥ 1/2} = CO{(5/2, 9/2), (4, 4), (9/2, 5/2), (5/2, 5/2)}.

Also, (m̄_1, m̄_2) = (1, 1). By Theorem 1 the TDFE payoff set is CO{(1, 5), (5, 1), (5/2, 5/2), (4, 4)}, which is the shaded region in Fig. 3b. Compared with the SPE payoff set, which is CO{(1, 1), (1, 5), (4, 4), (5, 1)}, the TDFE payoff set eliminates the less efficient part of the SPE payoffs.
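The characterization of E above can be spot-checked numerically. The payoff matrix used below is an assumption: the standard chicken payoffs (a_1, b_1) = (4, 4), (a_1, b_2) = (1, 5), (a_2, b_1) = (5, 1), (a_2, b_2) = (0, 0), inferred to be consistent with the payoff sets reported in this example (Fig. 3 itself is not reproduced here). Because stage payoffs are bilinear in the weights (r, t), any Pareto-improving unilateral deviation extends in the same direction to an endpoint, so checking r′, t′ ∈ {0, 1} suffices:

```python
# Membership test for E in the chicken game of Example 1, under the
# ASSUMED payoffs (4,4),(1,5),(5,1),(0,0). A profile (r, t) is in E iff
# some unilateral endpoint deviation makes the deviant strictly better
# off and the other player no worse off.

def payoffs(r, t):
    """Expected stage payoffs when 1 puts weight r on a1, 2 weight t on b1."""
    p1 = 4*r*t + 1*r*(1 - t) + 5*(1 - r)*t + 0*(1 - r)*(1 - t)
    p2 = 4*r*t + 5*r*(1 - t) + 1*(1 - r)*t + 0*(1 - r)*(1 - t)
    return p1, p2

def in_E(r, t, eps=1e-12):
    base = payoffs(r, t)
    # Endpoint deviations by player 1 (vary r) and by player 2 (vary t);
    # the second tuple entry records who the deviant is.
    devs = [(payoffs(rp, t), 0) for rp in (0.0, 1.0)] + \
           [(payoffs(r, tp), 1) for tp in (0.0, 1.0)]
    return any(d[0] >= base[0] - eps and d[1] >= base[1] - eps
               and d[i] > base[i] + eps
               for d, i in devs)
```

On a grid, this agrees with the closed form E = {r < 1/2, t < 1} ∪ {t < 1/2, r < 1} derived above.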

EXAMPLE 2 (Battle of the Sexes). In the Battle of the Sexes game in Fig. 4a, we can calculate in the same way as in Example 1 to show that Δ(A)\E = {2/3 ≥ s_1(a_1) ≥ 1/3 and 2/3 ≥ s_2(b_1) ≥ 1/3} ∪ {(a_1, b_2)} ∪ {(a_2, b_1)}. So the TDFE payoff set is CO{(2/3, 2/3), (1, 2), (2, 1), (1, 2/3), (2/3, 1)}, the shaded region in Fig. 4b. Note that the SPE payoff set is CO{(2/3, 2/3), (2/3, 4/3), (1, 2), (2, 1), (4/3, 2/3)}.

The reason for the results in the two examples above is that actions corresponding to the eliminated region are "bad", in that one player's deviation from them can improve the payoff of both.

EXAMPLE 3 (Cournot Game). Two firms are engaged in Cournot competition. The firms can produce at zero cost. The market demand for the output is P = a − bQ if Q ≤ a/b and is zero otherwise, where a, b > 0. It is easy to show that

Δ(A)\E = {(q_1, q_2) > 0 | a − bq_1 − 2bq_2 ≥ 0, a − 2bq_1 − bq_2 ≥ 0} ∪ {(a/2b, 0), (0, a/2b)},


FIG. 3. Chicken.

the shaded region in Fig. 5a.⁴ So π(Δ(A)\E) = {(π_1, π_2) > 0 | π_1 + π_2 ≤ a^2/4b}. However, since (m̄_1, m̄_2) = (a^2/9b, a^2/9b), the TDFE payoff set is {(π_1, π_2) ≥ (a^2/9b, a^2/9b) | π_1 + π_2 ≤ a^2/4b}, in which every firm receives at least the Cournot equilibrium payoff. (See Fig. 5b.)

There is a simple intuition why each firm will receive at least the Cournot payoff. In order to force firm 1's payoff down below the Cournot level a^2/9b in every stage, the other firm (2) must produce more than the Cournot amount a/3b. However, if firm 2 is producing more than a/3b (e.g., at (a/4b, a/2b)), then it can always reduce its output toward its best response quantity (to, for example, (a/4b, 3a/8b)) such that both firms are made better off, violating the restriction of TDFE.

FIG. 4. Battle of the Sexes.

⁴ There is an interesting contrast between the restriction of TDFE in this example and that of a weakly renegotiation-proof equilibrium under imperfect monitoring, which requires the firms to produce in the region R ∪ S. See Chen (1995).

FIG. 5. Cournot Game.
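This deviation is easy to spot-check numerically (with the illustrative normalization a = b = 1, so the Cournot output is a/3b = 1/3 and the Cournot profit a^2/9b = 1/9):

```python
# Spot-check of the Cournot intuition above with a = b = 1.
# At (q1, q2) = (a/4b, a/2b), firm 2 over-produces; cutting back to its
# best response 3a/8b raises BOTH firms' profits, so such profiles are
# ruled out by the TDFE restriction.

def profit(q1, q2, a=1.0, b=1.0):
    price = max(a - b * (q1 + q2), 0.0)   # P = a - bQ, truncated at zero
    return price * q1, price * q2

before = profit(0.25, 0.5)      # (a/4b, a/2b): firm 2 above Cournot output
after = profit(0.25, 0.375)     # firm 2 moves to its best response 3a/8b
```

At the Cournot profile (1/3, 1/3), each firm earns the benchmark profit 1/9 that the text identifies as the floor of the TDFE payoff set.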

EXAMPLE 4 (Prisoners' Dilemma). See Fig. 6. No action profile can be eliminated in the Prisoners' Dilemma, so the TDFE payoff set is exactly the same as that of SPE.

For the rest of the section we will consider an interesting case in which the players use only pure actions. In this case we have to assume that there exists a pure strategy Nash equilibrium for Γ.⁵ Here Theorem 1 still holds; the only difference is that now m_i should be interpreted as the minimax in pure actions, which might be greater than the value of the mixed-action minimax for all the players. We can find a very simple algorithm that characterizes the TDFE payoff: First check every row and eliminate all those payoff vectors that are Pareto-dominated by another vector in the same row (and strictly so for the column player). This is equivalent to eliminating those action profiles s satisfying (1) for i = 2. Then check every column and eliminate those payoff vectors that are Pareto-dominated (strictly so for the row player) by another vector in the same column. This is equivalent to eliminating those action profiles s satisfying (1) for i = 1. Then check every box, . . . , and so on until we exhaust all the players. Then compute the (pure action) minimax payoff m̄ of the game with the action profiles that remain. The TDFE payoff set is the set of feasible and individually rational payoffs of this "reduced game".

⁵ Otherwise there might be no SPE for Γ^∞. For instance, in the repeated matching pennies game there is no SPE in pure actions: the only SPE is for the players to play half–half mixing in every stage.

FIG. 6. Prisoners' Dilemma.

In the following we apply this algorithm to find the TDFE payoffs of somefamiliar games.

EXAMPLE 5 (Battle of the Sexes). See Fig. 4. Checking the first row we find that the payoff (0, 0) is Pareto-dominated by (1, 2), with the payoff for the column player strictly smaller (0 < 2). So we eliminate the action profile (a_1, b_1). Similarly, the action profile (a_2, b_2) is eliminated by (a_2, b_1) when we check the second row. Next we check every column for the row player but eliminate nothing further (than (a_1, b_1) and (a_2, b_2)). So (a_2, b_1) and (a_1, b_2) are the only action profiles played. Moreover, m̄ = (m̄_1, m̄_2) = (1, 1). Thus the TDFE payoff set is the line segment between (1, 2) and (2, 1), which is the Pareto frontier of the feasible and individually rational payoff set. Note that the SPE payoff set when the players can use only pure strategies is CO{(1, 2), (2, 1), (1, 1)}.
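The elimination-plus-minimax procedure can be sketched in a few lines of code, using a Battle of the Sexes matrix read off from this example ((a_1, b_2) = (1, 2), (a_2, b_1) = (2, 1), zeros elsewhere — an assumption, since Fig. 4 is not reproduced here); for this game the constrained minimax m̄ happens to coincide with the unconstrained pure minimax m:

```python
# Sketch of the pure-action algorithm: drop every profile that has a
# Pareto-dominating unilateral deviation (the set E of Definition 1'),
# then compute the pure minimax point. The BoS payoffs are assumed, as
# noted in the lead-in.

def eliminate(payoff):
    """Return the profiles NOT in E (no Pareto-improving unilateral deviation)."""
    def dominated(p):
        (a, b), (u1, u2) = p, payoff[p]
        for (c, d), (v1, v2) in payoff.items():
            same_row = (c == a and d != b)   # player 2 deviates within a row
            same_col = (d == b and c != a)   # player 1 deviates within a column
            if same_row and v2 > u2 and v1 >= u1:
                return True
            if same_col and v1 > u1 and v2 >= u2:
                return True
        return False
    return {p for p in payoff if not dominated(p)}

def pure_minimax_point(payoff):
    acts1 = sorted({a for a, _ in payoff})
    acts2 = sorted({b for _, b in payoff})
    m1 = min(max(payoff[(a, b)][0] for a in acts1) for b in acts2)
    m2 = min(max(payoff[(a, b)][1] for b in acts2) for a in acts1)
    return m1, m2

bos = {("a1", "b1"): (0, 0), ("a1", "b2"): (1, 2),
       ("a2", "b1"): (2, 1), ("a2", "b2"): (0, 0)}
survivors = eliminate(bos)
```

Running this reproduces the conclusion of the example: only the two coordination profiles survive, and the minimax point is (1, 1).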

EXAMPLE 6 (Chicken). See Fig. 3. The SPE payoff set is CO{(1, 1), (1, 5), (5, 1), (4, 4)}. On the other hand, the action profile (a_2, b_2) is eliminated (by either the row or the column player). So (m̄_1, m̄_2) = (1, 1), and V* = CO{(4, 4), (1, 5), (5, 1)}. Note that this set is smaller than the TDFE payoff set when the players can use mixed actions (Example 1).

In the following example we show that the value of m̄_i can be different from m_i. This is because when some of the actions cannot be used by the players to punish the deviant, the minimax payoff of the deviant might be higher.


FIG. 7. m_i lower than m̄_i.

EXAMPLE 7. Consider the game in Fig. 7. The minimax payoff is m = (0, 1). However, b_1 ∈ D_2; i.e., player 2 never uses b_1 in a TDFE, so m̄ = (1, 1).

4. AN EXTENSION

In this section we propose a refinement which is an extension of TDFE. We require that an equilibrium strategy should leave no room for the possibility that a unilateral multishot deviation can make all the players better off. For if such a deviation exists, then after the deviation the deviant might with good reason try to persuade the other players to go back to the original path as if no deviation had occurred. And since every player (if they care about payoff only) is better off, the deviant might succeed. But if so, the original equilibrium path will not be credible. To require that no such chance exists means the players want to exhaust all these kinds of possibilities from the beginning.

DEFINITION 2. Let σ be a SPE. σ is called a compensating equilibrium (CE) if there do not exist any history s^T and t_1 ≤ T such that

(1) s(t + 1) = σ(t + 1)(s^t) for all 0 ≤ t ≤ t_1 − 1;

(2) σ_j(t + 1)(s^t) = s_j(t + 1) for all 0 ≤ t ≤ T − 1 and all j ≠ i;

(3) (1 − δ) Σ_{t=t_1}^T δ^{t−t_1} π(s(t)) ≥ V^T(σ|s^{t_1−1}), with the inequality strict for player i;

where V^T(σ|s^{t_1−1}) = (1 − δ) Σ_{t=t_1}^T δ^{t−t_1} π(s(σ|s^{t_1−1})(t)) is the total payoff from stage t_1 to T after the history s^{t_1−1}.


FIG. 8. CE restriction illustrated.

Note that no matter how many deviations a deviant makes, we are always comparing the payoff of the resulting path with that of the original path. In doing so we are taking the original path as some kind of "focal point" (Schelling, 1960) to which the players expect to go back when their payoffs within a time interval of deviation are at least as great as can be expected from it. This is in contrast with DeMarzo (1992), where the "focal point" is a deviation path (made possibly by many players) suggested by a "leader".

To give a more concrete example of what this definition means, suppose in Fig. 8 that path a is the SPE path s(σ). Player i first deviates at stage t_1, after which σ prescribes that the players play according to path b. Then at stage t_2 player i deviates again, after which they follow path c. But on this path player i deviates again at stage T, after which they follow path d. If it happens that all the players' payoffs on the shaded path (the path they went through under all the deviations above) are at least as great as they would have received had i not deviated (the path from A to B), with i's payoff strictly higher, then σ is not a good equilibrium, because player i, after the deviation, might be able to convince all the other players to go back to path a (point C and after) successfully.

Remarks. (i) This requirement is obviously stronger than TDFE because if we let t_1 = T it reduces to TDFE.

(ii) Playing stage game Nash equilibria in all stages regardless of history is a CE, so a CE always exists.

(iii) Because of (ii) the set of CE payoffs contains the convex hull of the stage game Nash equilibrium payoffs.

Next we want to give conditions under which we can characterize the CE payoff set. Since the complex intertemporal trade-off involved in the definition of CE makes it very difficult to characterize its payoff set, we content ourselves with finding out when CE and TDFE or SPE yield identical payoffs.

DEFINITION 3. Condition α: Let {s(t)}_{t=0}^∞ ⊂ Δ(A)\E. If there exist a positive integer m, a player k, and {x(t)}_{t=0}^m ⊂ Δ(A_k) such that

π_k(x(0), s_{−k}(0)) − π_k(s(0)) + Σ_{t=1}^m δ^t [π_k(x(t), x̄^k_{−k}) − π_k(s(t))] > 0,

then there exists j(k) such that

π_{j(k)}(x(0), s_{−k}(0)) − π_{j(k)}(s(0)) + Σ_{t=1}^m δ^t [π_{j(k)}(x(t), x̄^k_{−k}) − π_{j(k)}(s(t))] < 0,

where x̄^k is the constrained minimax strategy against k.

Condition α1: The same inequalities above hold when {s(t)}_{t=0}^∞ ⊂ Δ(A) and x̄^k_{−k} is replaced by z^k_{−k}.

Condition α (α1) states the following: suppose {s(t)}_{t=0}^∞ is the original path and player k deviates by playing {x(t)}_{t=0}^m and is punished by being minimaxed with x̄^k (z^k for condition α1). If his total payoff from his first deviation (t = 0) to any t = m (when he is still being minimaxed) is greater than that given by the original path {s(t)}_{t=0}^m, then there must be another player j(k) who is hurt. This is a condition that, given any player, there is always another player whose interest conflicts with his in a very weak sense.

THEOREM 2. If Γ satisfies condition α, then the set of CE payoffs coincides with the set of TDFE payoffs. If Γ satisfies condition α1, then the set of CE payoffs coincides with the set of SPE payoffs.

Proof. We first want to show that, given conditionα, the strategy constructedin Theorem 1 is a CE. Suppose playerk is contemplating a multishot deviationand his first deviation occurs at staget1. Conditionα insures that if playerkgains (relative to the original equilibrium path) from staget1 to any stage inphase IIk, then there must be another playerj (k) who is hurt (relative to theoriginal equilibrium path). Sok is not able to make every player better off bymaking a multishot deviation from staget1 to any stage in IIk. It now sufficesto show thatk is unable to do so from staget1 to any stage in phase IIIk either.Note that a single deviation byk in phase IIIk will send the players back to IIk

immediately, so it is enough to show thatk is unable to violate the CE requirementfrom staget1 to the stage of his first deviation (call itt2) in phase IIIk. Since thebest payoff playerk can receive in staget2 is bk ≡ maxa∈1(A) πk(a), we onlyhave to makeTk large enough (i.e., punishk severely enough) so that, even ifhe receivesbk at t2, his payoff fromt1 to t2 is still less than that given by theoriginal equilibrium path. In summary, the CE requirement is satisfied since inphrase IIk if player k gains then there must be another playerj (k) who is hurt;and in phase IIIk playerk is definitely worse off even if he deviates again.

The same argument applies when some j ≠ k is being punished. That is, exactly the same argument can be used to show that when the players are in either phase IIj or IIIj (j ≠ k), player k will not be able to violate the CE requirement by a multishot deviation either. So if condition α holds, then the strategy constructed in Theorem 1 is also a CE and, consequently, the payoff set coincides with the TDFE payoff set. If α1 holds, exactly the same argument shows that the CE payoff set coincides with that of SPE. Q.E.D.


EXAMPLE 8. Consider the PD game in Fig. 6. We will show that it satisfies condition α1, so that the set of SPE payoffs is the same as that of CE. We do this for the case when k = 1; the case for k = 2 is exactly the same. Suppose {s(t)}∞_{t=0} is any path in Δ(A) and let m and {x(t)}^m_{t=0} ⊂ Δ(A1) be such that

π1(x(0), s2(0)) − π1(s(0)) + Σ_{t=1}^{m} δ^t [π1(x(t), x^1_2) − π1(s(t))] > 0.

For notational simplicity assume that si(t) and x(t) denote the weights that players i and 1 put on the action C. Since x^1_2 is the minimax strategy against player 1, x^1_2 puts zero weight on C. So the above inequality reduces to

s1(0) − x(0) + Σ_{t=1}^{m} δ^t (s1(t) − x(t) − 3 s2(t)) > 0.

This implies that

0 > x(0) − s1(0) + Σ_{t=1}^{m} δ^t (x(t) − s1(t) + 3 s2(t)).    (2)

At the same time, the change in player 2's payoff between t = 0 and t = m is

3 (x(0) − s1(0)) + Σ_{t=1}^{m} δ^t (3 x(t) − 3 s1(t) + s2(t)).    (3)

Multiplying both sides of (2) by 3 and using the fact that s2(t) ≥ 0, we have

0 > 3 (x(0) − s1(0)) + 3 Σ_{t=1}^{m} δ^t (x(t) − s1(t) + 3 s2(t))
  ≥ 3 (x(0) − s1(0)) + Σ_{t=1}^{m} δ^t (3 x(t) − 3 s1(t) + s2(t)).

That is, (3) < 0. This implies that PD satisfies condition α1, and by Theorem 2 the CE payoff set is exactly the same as the SPE payoff set.6
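As a sanity check on the algebra, one can sample random paths and verify numerically that whenever player 1's deviation gain (in the reduced form above) is positive, expression (3) is negative. This is only an illustration of the implication, not part of the proof:

```python
import random

def check_alpha1(trials=20000, m=5):
    """Numerically verify: a positive deviation gain for player 1 forces a
    negative payoff change (expression (3)) for player 2."""
    for _ in range(trials):
        delta = random.uniform(0.05, 0.95)
        # weights on C along the original path (s1, s2) and the deviation (x)
        s1 = [random.random() for _ in range(m + 1)]
        s2 = [random.random() for _ in range(m + 1)]
        x = [random.random() for _ in range(m + 1)]
        # player 1's deviation gain, reduced form from Example 8
        gain1 = (s1[0] - x[0]) + sum(
            delta**t * (s1[t] - x[t] - 3 * s2[t]) for t in range(1, m + 1))
        # change in player 2's payoff, expression (3)
        change2 = 3 * (x[0] - s1[0]) + sum(
            delta**t * (3 * x[t] - 3 * s1[t] + s2[t]) for t in range(1, m + 1))
        if gain1 > 0:
            assert change2 < 0  # player 2 is necessarily hurt
    return True
```

Algebraically the check cannot fail, since (3) equals −3 times the gain minus 8 Σ δ^t s2(t), which is strictly negative whenever the gain is positive.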

This example shows that our refinement concept is not stronger than SPE in the repeated prisoners' dilemma. We will explain this in more detail in the concluding section.

5. RELATION TO THE LITERATURE

There has recently been an explosion of literature on the refinement of repeated game equilibria. Bernheim and Ray (1989) and Farrell and

6 It is possible to explicitly construct a strategy that uses the definition of CE directly to prove the same fact, as is done in a previous version of this paper.


Maskin (1989) are seminal papers on renegotiation-proof equilibrium. Abreu and Pearce (1989), Abreu et al. (1993), and Pearce (1987) offer another view on renegotiation-proofness. Bergin and MacLeod (1993) is an axiomatic treatment which unifies several refinement concepts. Benoit and Krishna (1993) focuses on finitely repeated games. Asheim (1991) is a refinement using von Neumann's concept of stability. Chen (1995) discusses renegotiation-proofness under imperfect information. Rabin (1991) presents a refinement called reneging-proof equilibrium, which requires that after a player deviates he loses the trust of his opponents: they will go to a path on which the deviant will have no chance to make any more profitable deviations, that is, the deviant will play his best response at every stage thereafter. Rabin (1993) goes a step further than the rest of the literature in seeking a behavioral justification for why some stage game actions are not used (like the TDFE restriction in this paper): the players may feel that they are being treated fairly or unfairly by their opponents.

The refinement in the literature which is closest to the compensating equilibrium proposed here is weakly renegotiation-proof equilibrium (WRPE) (Bernheim and Ray, 1989, and Farrell and Maskin, 1989). It is evident from the definition of CE that there is a strong flavor of renegotiation: after every unilateral deviation the deviant can try to convince the other players to go back to the original path. If his deviation makes all the players better off, then CE assumes that he can successfully do so. But if this is true, then the original path will not be credible, because this player will have an incentive to deviate. On the other hand, WRPE requires collective rationality on every subgame of the equilibrium path. Thus if at a certain stage the players can find another continuation payoff which strictly Pareto-dominates the continuation payoff at this point, then they will unanimously agree to go to that subgame, and the original path will be incredible. So although both CE and WRPE allow negotiation, they are quite different conceptually. However, one advantage of our refinement over that of the RPE literature is that the assumptions incorporated in our solution concept are appropriate even when there is little scope for communication. While RPE supposes that players can communicate enough to coordinate on postdeviation continuation equilibria, all that is needed here is that the players enter the game with the rule of thumb "if deviations do not hurt anybody, then do not react to them." They do not have to figure out or coordinate on a new plan, as is implicit in RPE.7 We also give two examples below to show that the two concepts have no logical relation.

EXAMPLE 9. In the Battle of the Sexes game in Fig. 4, the TDFE payoff set is

A = {(2/3, 2/3), (2/3, 1), (1, 2), (2, 1), (1, 2/3)},

7 I thank a referee for pointing this out to me. I virtually reproduce his comments.


FIG. 9. WRPE vs CE payoff sets.

as is shown in Example 2. So the CE payoff set is a subset of A. By Theorem 1 in Farrell and Maskin (1989, p. 332) we know that any v ∈ F* can be supported by a WRPE. This can be shown by setting both a1 and a2 in that theorem to be the mixed action (1/3, 2/3) and invoking the theorem. So in the Battle of the Sexes game the WRPE payoff set contains the CE payoff set.

EXAMPLE 10. It is shown in Farrell and Maskin (1989, p. 341) that the WRPE payoff set of the advertising game is CO{(2, 1), (1, 2), (2, 0), (0, 2), (0, 0)}, that is, the shaded area in Fig. 9. We want to show that the CE payoff set is B = CO{(0, 3), (3, 0), (0, 0)}.

Let v be any payoff vector in B. Let {a(t)}∞_{t=1} be a path whose total payoff is v. Obviously, a(t) can be chosen to be either (a1, b2), (a2, b1), or (a3, b3). Let σ be such that:

Phase 1. Play according to {a(t)}∞_{t=1} as long as nobody deviates. If any player deviates, go to phase 2.

Phase 2. Play the stage game Nash equilibrium (a3, b3) forever.

σ is in fact a trigger strategy and is easily seen to be a SPE. To show that σ is also a CE, we consider player 1's deviations only; the argument for player 2 is symmetric. Also, in order to ease notation, we argue as if there were no discounting. The proof can easily be adapted if we consider discounting explicitly and the discount factor is close to 1.

First note that both players receive payoff 0 at every stage in phase 2, which is the lowest payoff they can receive at any stage under {a(t)}∞_{t=1}. So player 1 cannot make his first deviation at a stage when (a3, b3) is played, because he will then receive 0 or −2 first, and then at most 0 forever, which is unprofitable from his first deviation to any later stage. He will not make his first deviation at a stage


when (a1, b2) is played either, because this only decreases his payoff without increasing that of player 2. So the only stage at which player 1 can consider his first deviation is when (a2, b1) is played, under which the players receive (0, 3). Player 1's deviation makes the payoff either (2, −2) (if he deviates to a3) or (1, 1) (if he deviates to a1). Consider the first case. By deviating to a3, player 1 increases his payoff by 2. However, player 2 loses 5 (from 3 to −2). In order to compensate player 2, player 1 has to deviate from a3 to a1 at least three times in phase 2. (Note that at the other stages in phase 2 they both receive 0, which is the lowest payoff they can receive, so we do not have to worry that at these stages their payoffs might be higher than under the original path.) But by doing so he decreases his payoff by 2 × 3 = 6, which means that his total payoff must decrease by at least 6 (compared with the original path). This more than offsets the gain from his first deviation. So player 1 will not make his first deviation at (a2, b1) by playing a3. Next consider the second case. If player 1 deviates to a1, he will increase his payoff by 1. However, player 2's payoff is decreased by 2. In order to compensate player 2, player 1 has to deviate from a3 to a1 at least twice in phase 2. (Note that this is a game with discounting.) But by doing so he decreases his payoff by 4, which again more than offsets his original gain.

We have shown that for any vector in B there exists a SPE which supports it. Further, under this strategy the requirement of CE is never violated, so it is also a CE. In this example the CE payoff set contains that of WRPE.
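The compensation arithmetic for the first case can be checked mechanically. The per-play numbers below (each compensating play of a1 in phase 2 giving player 2 a gain of 2 and costing player 1 a loss of 2) are inferred from the 2 × 3 = 6 calculation in the text, not stated payoffs of the game:

```python
import math

# Case 1: player 1 deviates from (a2, b1) to a3.
gain_p1 = 2              # player 1's one-shot gain: 0 -> 2
loss_p2 = 5              # player 2's one-shot loss: 3 -> -2
benefit_p2_per_play = 2  # inferred: each compensating play of a1 gives player 2 +2
cost_p1_per_play = 2     # inferred: each compensating play costs player 1 2

k = math.ceil(loss_p2 / benefit_p2_per_play)  # minimum compensating plays
print(k)                     # -> 3
print(cost_p1_per_play * k)  # -> 6, which exceeds gain_p1 = 2

# Case 2: deviating to a1 instead: gain 1, player 2 loses 2. With discounting,
# one compensating play yields strictly less than 2, so at least two plays are
# needed, costing 4 > 1.
```

In both cases the minimum compensation wipes out the deviation gain, which is exactly why σ satisfies the CE requirement.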

6. CONCLUSION

There has been a well-known embarrassment of riches for the Folk Theorem in repeated games, viz., the equilibrium payoff set is too large to be useful in terms of predictive power. In particular, a strategy can be designed to ensure that all the players have very low payoffs, which is annoying from the viewpoint of collective rationality. Many refinements have been proposed to try to pick out the efficient payoffs by requiring Pareto efficiency in the subgames. Ironically, a refinement insistent on efficiency in all subgames often leads to inefficient outcomes, or even worse, nonexistence. In this paper we suggest a refinement in the spirit of the "efficient breach" principle, and this does lead to more efficient outcomes. In fact, it is easy to show that, unlike some of the refinement concepts which can also eliminate payoffs on the Pareto frontier as equilibria, the refinement proposed here only eliminates payoffs which are Pareto inferior to some other payoff. However, our starting point is not Pareto efficiency per se. Rather, we impose behaviorally plausible assumptions as our refining criterion. Moreover, one should be warned that because we always eliminate from the set of actions those which are Pareto dominated in the sense proposed, the punishments available might not be as severe as when we require only subgame perfection; so the discount factor needed in order to support a particular payoff might be


FIG. 10. Augmented Prisoners’ Dilemma.

greater than that for subgame perfection. This means there exists a range of discount factors which can support a SPE payoff but not that of TDFE or CE. For instance, consider the augmented prisoners' dilemma in Fig. 10, where n is a positive integer and ε is a small positive number.

Suppose the game is played repeatedly. The action profile (E, E) is eliminated by the TDFE requirement. So to support (4, 4) as a TDFE payoff, we have to require that (1 − δ)n − δ < 0, or δ > n/(n + 1). On the other hand, for a SPE to support (4, 4), let a defection lead to playing (E, E) for one stage and to reversion to (C, C) thereafter. To support (E, E) being played once after a defection, (D, D) is played forever if one of the players fails to play E. This SPE can support (4, 4) if δ > n/(n + 4). So if n/(n + 4) < δ < n/(n + 1), then (4, 4) can be supported by a SPE but not by a TDFE. The reason for this is, of course, that the severe punishment (E, E) cannot be used by a TDFE.8 But in fact this is the common disadvantage of all refinements that invoke Pareto efficiency as a criterion of refinement.
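The two thresholds can be tabulated to confirm that the gap is nonempty for every n; a minimal sketch:

```python
from fractions import Fraction

def tdfe_threshold(n):
    # (1 - d)n - d < 0  <=>  d > n/(n + 1)
    return Fraction(n, n + 1)

def spe_threshold(n):
    return Fraction(n, n + 4)

# For every n the interval (n/(n+4), n/(n+1)) is nonempty: these discount
# factors support (4, 4) as an SPE payoff but not as a TDFE payoff.
for n in range(1, 100):
    assert spe_threshold(n) < tdfe_threshold(n)

print(tdfe_threshold(4), spe_threshold(4))  # -> 4/5 1/2
```

For example, with n = 4 any δ strictly between 1/2 and 4/5 supports (4, 4) as a SPE payoff but not as a TDFE payoff.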

The TDFE requirement can also be criticized on the grounds that it cares too much about stage game payoffs instead of total payoffs, when what the players really care about is the latter. But it is not argued here that the players should care only about total payoffs; it is simply argued that once a strategy is agreed upon, every player knows how much he will receive in each stage, so any stage game payoff which is more than expected should be regarded as a "windfall" and should not be objected to by any player.

One aspect that is not touched on in this paper is a study of how the players

8 This example is taken entirely from a referee’s comments. I thank him for permitting me to use it.


interpret deviations, which has been a significant problem in multistage games. While this omission might seem unreasonable for compensating equilibrium, it also has its advantage: a player does not have to make any inference about the intention of the deviant on the various out-of-equilibrium paths, an issue for which, despite its importance in the study of repeated games, it is difficult to find a criterion with consensus. The player only compares the payoff he receives with what he would have received along the equilibrium path. This is simple, and accords well with the basic assumption that the player cares only about his own payoff.

Also, the refinement proposed here works well (is more stringent than SPE) when the games being repeated have a flavor of coordination (e.g., Battle of the Sexes and Chicken); that is, when there exist action profiles from which a deviation makes all the players better off. For games in which the players' interests conflict (e.g., Prisoners' Dilemma or zero-sum games), there is not much room for Pareto improvement (in the sense of our refinement), so the refinement is not stronger than SPE.

Finally, we have been unable to fully characterize the set of CE payoffs in this paper. Our feeling is that, even if there is a way to do so, considering the complicated intertemporal relations involved in the definition of CE, it is almost impossible to obtain a clear and simple characterization.

REFERENCES

ABREU, D., DUTTA, P., and SMITH, L. (1994). "The Folk Theorem for Discounted Repeated Games: A NEU Condition," Econometrica 62, 939–948.

ABREU, D., and PEARCE, D. (1989). "A Perspective on Renegotiation in Repeated Games," mimeo, Harvard University.

ABREU, D., PEARCE, D., and STACCHETTI, E. (1993). "Renegotiation and Symmetry in Repeated Games," J. Econ. Theory 60, 217–240.

ASHEIM, G. (1991). "Extending Renegotiation-Proofness to Infinite Horizon Games," Games Econ. Behav. 3, 278–294.

BENOIT, J.-P., and KRISHNA, V. (1993). "Renegotiation in Finitely Repeated Games," Econometrica 61, 303–324.

BERGIN, J., and MACLEOD, B. (1993). "Efficiency and Renegotiation in Repeated Games," J. Econ. Theory 61, 42–73.

BERNHEIM, D., and RAY, D. (1989). "Collective Dynamic Consistency in Repeated Games," Games Econ. Behav. 1, 295–326.

CHEN, K.-P. (1995). "On Renegotiation-Proof Equilibrium under Imperfect Monitoring," J. Econ. Theory 65, 600–610.

DEMARZO, P. M. (1992). "Coalition, Leadership, and Social Norms: The Power of Suggestion in Games," Games Econ. Behav. 4, 72–100.

FARRELL, J., and MASKIN, E. (1989). "Renegotiation in Repeated Games," Games Econ. Behav. 1, 327–360.

FUDENBERG, D., and MASKIN, E. (1986). "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica 54, 533–554.


FUDENBERG, D., and MASKIN, E. (1991). "On the Dispensability of Public Randomization in Discounted Repeated Games," J. Econ. Theory 53, 428–438.

PEARCE, D. (1987). "Renegotiation-Proof Equilibria: Collective Rationality and Intertemporal Cooperation," mimeo, Cowles Foundation, Yale University.

POLINSKY, M. (1983). An Introduction to Law and Economics. Boston: Little, Brown.

RABIN, M. (1991). "Reneging and Renegotiation," mimeo, UC Berkeley.

RABIN, M. (1993). "Incorporating Fairness into Game Theory," Amer. Econ. Rev. 83, 1281–1302.

SCHELLING, T. (1960). The Strategy of Conflict. Cambridge, MA: Harvard Univ. Press.