

Games and Economic Behavior 88 (2014) 16–28

Justifiable punishments in repeated games

Miguel Aramendia a,∗, Quan Wen b

a BRiDGE group, Economia Aplicada IV, Universidad del Pais Vasco, Bilbao 48015, Spain
b Department of Economics, University of Washington, Seattle, WA 98195-3330, USA

Article history: Received 12 June 2012; available online 7 August 2014.

JEL classification: C72, D74.

Keywords: Repeated game; Folk theorem; Renegotiation proof.

Abstract

In repeated games, subgame perfection requires that all continuation strategy profiles be effective in enforcing the equilibrium; they serve as punishments should deviations occur. It does not ask whether a punishment can be justified by the deviation, which creates a great deal of freedom in constructing equilibrium strategies and results in the well-known folk theorem. We introduce justifiable punishments in repeated games. After one player deviates, the corresponding continuation, or punishment, is justifiable if either the deviation is bad for the other player or the continuation itself is good for the other player. We characterize the set of payoff vectors that can be supported by subgame perfect equilibria with justifiable punishments as the discount factor goes to one. This limiting set of equilibrium payoffs can be quite different from the set of subgame perfect equilibrium payoffs. Any efficient, feasible, and strictly individually rational payoff can be supported by an equilibrium with justifiable punishments.

© 2014 Published by Elsevier Inc.

1. Introduction

It has long been recognized that numerous new equilibria emerge when a stage game is played repeatedly. The well-known folk theorem asserts that, under certain conditions, any feasible and strictly individually rational payoff vector of a stage game can be sustained by a subgame perfect equilibrium in the corresponding infinitely repeated game when the players value their future highly enough; see, for example, Fudenberg and Maskin (1986).1 Typically, there are many different strategy profiles that result in the same payoff vector. Most strategy profiles studied in the literature punish the deviating player in the continuation game, regardless of the consequence of the deviation or the intention of the deviating player. As long as punishments are effective in deterring deviations, sequential rationality itself does not impose any other restriction on the punishments. In order to weaken the equilibrium conditions, researchers often use the most severe punishment available to punish a player for any of his deviations. This independence between deviations and punishments creates a great deal of freedom in constructing equilibrium strategies. As a result, almost all reasonable, i.e., feasible and strictly individually rational, payoffs may arise in equilibria. The negative side of this coin is that the folk theorem provides little guidance on which equilibrium to adopt when applying repeated game models in research.

In reality, however, histories often provide meaningful references or justifications for what the future plan of actions should be. A change in the future course of actions must be caused by some incident outside of what we expect to happen, such as a deviation in a repeated game. Social justice often provides the options on what to do in the future to

* Corresponding author. E-mail addresses: [email protected] (M. Aramendia), [email protected] (Q. Wen).

1 For a more comprehensive overview on the recent developments in the repeated game literature, see Mailath and Samuelson (2006).

http://dx.doi.org/10.1016/j.geb.2014.07.004
0899-8256/© 2014 Published by Elsevier Inc.


everyone (the non-deviating player) except the party who is responsible for the incident (the deviating player). A change in the future course of actions that is good for the non-deviating player is “economically” justifiable, because it is in this player’s interest to make such a change. If the non-deviating player is the victim of the incident, punishing the deviating player is “morally” justifiable even if carrying out the punishment is costly to the non-deviating player. On the other hand, if the non-deviating player is a beneficiary of the incident, carrying out a punishment that is costly to him would be unreasonable, because a costly punishment then cannot be justified on either economic or moral grounds.

Based on these arguments about which continuations are deemed justifiable, we study subgame perfect equilibria in two-person repeated games in which all continuations are justifiable.2 An equilibrium in a repeated game can be viewed as an implicit contract between the players. The strategy profile prescribes a course of actions in the future for every possible history of play. If a player deviates from what the strategy profile prescribes, i.e., breaches the implicit contract, it is necessary to change the future course of actions. Whether such a change is justifiable depends on the consequence of the deviation. If one player’s deviation is bad for the other player, then any continuation is justifiable, as a means to punish such “bad” behavior. Otherwise, a continuation is justifiable if and only if the non-deviating player benefits from the new continuation, meaning that he receives a higher continuation payoff from it than from the continuation that follows when no such deviation occurs. Requiring all continuations to be justifiable seems intuitive and innocuous, yet as Example 1 demonstrates, it has dramatic implications for what payoffs can be supported by subgame perfect equilibria with justifiable punishments.3

Example 1. Consider the infinitely repeated game with the following stage game:

        L        R
T    2, 2     0, −1
B   −1, 0    −2, −2

First observe that (T, L) is the unique Nash equilibrium of the stage game, and its payoff vector (2, 2) strictly dominates all other feasible payoff vectors. According to the folk theorem of Fudenberg and Maskin (1986), every feasible payoff vector that strictly dominates the minmax point (0, 0) can be supported by a SPE of the repeated game when the discount factor is sufficiently close to 1. For example, to support a symmetric payoff vector that is strictly dominated by (2, 2), Fudenberg and Maskin (1986) adopted a simple strategy profile (see Abreu, 1988) in which the two players play the mutual minmax (B, R) for some number of periods, followed by the stage-game Nash equilibrium (T, L) forever. This simple strategy profile restarts this sequence of plays after someone deviates. During the mutual minmax (B, R), if player 1 deviates to T, player 2’s payoff increases from −2 to −1, so player 2 actually benefits from player 1’s deviation. In some sense, player 1’s deviation from B to T is a good “initiation” aiming for the better situation (T, L) for both players. Punishing player 1 by restarting this sequence of play also lowers player 2’s future payoff, so it is neither “economically” nor “morally” justifiable for player 2 to do so. As we will show later in detail, this repeated game has a unique subgame perfect equilibrium in which all continuations are justifiable: the two players play the dominant stage-game Nash equilibrium (T, L) in every period.
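As a concrete check (our own sketch, not part of the paper), the average discounted payoff of this simple strategy profile can be computed directly:

```python
# Sketch (ours, not the paper's): average discounted payoff in Example 1's
# simple strategy profile -- play the mutual minmax (B, R) for k periods,
# then the stage-game Nash equilibrium (T, L) forever. Both players get
# -2 per period in the first phase and 2 per period afterwards.
def avg_payoff(k: int, delta: float) -> float:
    phase1 = (1 - delta ** k) * (-2)  # (1-d) * sum_{t=1..k} d^(t-1) * (-2)
    phase2 = delta ** k * 2           # discounted value of (T, L) forever
    return phase1 + phase2

# With k = 2 and delta = 0.9 both players average 1.24, a symmetric payoff
# strictly dominated by (2, 2), as the folk-theorem construction intends.
print(avg_payoff(2, 0.9))
```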

Imposing justifiable continuations or punishments explicitly links players’ future course of actions to their current payoffs, which has been absent from most studies on repeated games with perfect monitoring. In repeated games with imperfect monitoring, such as Green and Porter (1984) (see also Mailath and Samuelson, 2006, for more references), players do not directly observe what actions have been played, but rather some imperfect signal of the current play. In most cases, such a signal is directly linked to a player’s stage-game payoff, such as a market price. Therefore, whether to start a punishment depends on the nature of the signal. If the signal is “good,” meaning all players receive relatively high payoffs, it is likely that the players have complied with the strategy profile, so no revision of the future course of actions is needed. If, on the other hand, a player received a significantly lower payoff in a period than expected when no deviation occurs, this could signal that someone has deviated, and a punishment will be implemented. Even though we focus on repeated games with perfect monitoring in this paper, the practice of using the current payoff as the trigger of future punishment in repeated games with imperfect monitoring provides another motivation for justifiable punishments. After all, why should a player care about what action the other player takes? Instead, a player should care about the welfare consequences of the other player’s action and of the corresponding continuations.

Our research belongs to the literature on equilibrium refinements for repeated games. One significant achievement in this area is the idea of renegotiation proofness.4 The basic idea of renegotiation proofness is that any continuation should be Pareto efficient, or at least efficient within the set of all possible continuations of the equilibrium strategy profile. When motivating this

2 It is not obvious which continuations should be considered justifiable in an N-person repeated game. For example, after a deviation that benefits two non-deviating players, if one of them is better off but the other is worse off under the punishment, the two non-deviating players may not reach a consensus on whether the punishment is justifiable.

3 Example 1 also explicitly motivates the idea of justifiable punishments in the context of a repeated game.

4 See Bernheim and Ray (1989), Farrell and Maskin (1989), van Damme (1989), Abreu and Pearce (1991), Asheim (1991), Abreu et al. (1993), Bergin and MacLeod (1993), Benoit and Krishna (1993), and Wen (1996).


concept, Farrell and Maskin (1989) argue that “threats of punishment may be implausible if punishing one player hurts the other(s)” and that “if players can renegotiate after a defection, such a punishment may not be carried out.” Although how the players renegotiate is never explicitly modeled in those works, most versions of renegotiation-proof equilibrium concepts implicitly require that all the players, including the deviating player, can renegotiate on which continuation equilibrium to adopt. When (re)negotiation is explicitly modeled in the context of a repeated game, Busch and Wen (1995) and Houba (1997) show that players do not always agree on efficient continuations.5 Because of the findings in these models, we may want to seek a rationale for which punishments are more plausible that goes beyond efficiency and symmetry, which is why we take a quite different approach and restrict punishments to be justifiable.

The consideration of justifiable punishments differs from that of renegotiation-proof equilibrium in several respects. First, for justifiable punishments, only the non-deviating player is given the option to carry out the punishment or to ignore the deviation. Unlike in a renegotiation-proof equilibrium, we do not compare the punishment with every other continuation in the game, but only with the continuation that follows when no deviation occurs, which is what the players expect in this equilibrium. In addition, a punishment could be Pareto dominated by the continuation that follows when no deviation occurs, provided the deviation makes the non-deviating player worse off, whereas no continuation is dominated by another continuation in a renegotiation-proof equilibrium. Carrying out such a costly punishment can be justified by the non-deviating player because he is made worse off by the deviation.

The idea of justifiable punishments is related to the compensation principle proposed by Chen (1996), which also places a great deal of emphasis on the welfare consequence of an action rather than the action itself. More specifically, Chen (1996) argues that any deviation that makes all players better off leaves no place for punishment. This compensation principle therefore simply eliminates, in any equilibrium of a repeated game, any action profile at which some player has a deviation that makes all players better off. It thus rules out punishments from which some player may benefit, even when the deviation itself makes all players better off in the short run, which is what we observe here. Under sequential rationality and welfare-maximizing behavior, it is perfectly consistent for a player who benefits from the punishment to insist on implementing the punishment even if this player is better off from the original deviation. It is not hard to see that, with respect to one-shot deviations, the refinement based on this compensation principle is stronger than what is implied by justifiable punishments. There is no logical relationship between the weakly renegotiation-proof equilibrium concept and subgame perfect equilibria that satisfy this compensation principle.

In this paper, we characterize the payoff vectors that can be supported by subgame perfect equilibria with justifiable punishments. We first identify necessary conditions on these payoffs, and then show that almost all such payoff vectors can be supported by subgame perfect equilibria with justifiable punishments when players are sufficiently patient. This limiting set of equilibrium payoffs can be quite different from the set of subgame perfect equilibrium payoffs. Any efficient, feasible, and strictly individually rational payoff can still be supported by a subgame perfect equilibrium with justifiable punishments. In a general framework for the discussion of renegotiation in repeated games, Bergin and MacLeod (1993) conclude that the conflict between self-enforcement and efficiency is “the problem of renegotiation.” Here, in sharp contrast, all the “good” payoffs can be supported by a subgame perfect equilibrium with justifiable punishments. We illustrate this fact with two examples in the concluding remarks.

The rest of this paper is organized as follows. Section 2 lays out the preliminaries of the repeated game model. The formal definition of justifiable punishments and some simple properties are given in Section 3. Section 4 presents our main analysis characterizing subgame perfect equilibria with justifiable punishments. Section 5 concludes and compares subgame perfect equilibria with justifiable punishments with weakly renegotiation-proof equilibria.

2. Preliminaries

Let G = (A1, A2, u1, u2) be a two-player stage game in normal form, where Ai is the set of player i’s actions and ui : A = A1 × A2 → R is player i’s payoff function for i ∈ {1, 2}. Assume that Ai is compact and ui is continuous for i ∈ {1, 2}, and that the stage game G admits at least one Nash equilibrium.6 Any payoff vector in the convex hull of u(A) = {u(a) : a ∈ A}, denoted by Co[u(A)], is said to be feasible. Player i’s minmax value in G is given by min_{aj∈Aj} max_{ai∈Ai} ui(ai, aj) for j ≠ i.7

Any payoff vector that strictly/weakly dominates the minmax payoff vector is strictly/weakly individually rational. For later reference, denote the set of feasible and weakly individually rational payoffs as

F* = { v ∈ Co[u(A)] : vi ≥ min_{aj∈Aj} max_{ai∈Ai} ui(ai, aj) for i = 1 and 2 },

5 Although these two models were motivated by bargaining problems with endogenous disagreement games (see also Haller and Holden, 1990, and Fernandez and Glazer, 1991), they can be treated as repeated games in which players can negotiate a binding contract on what continuation to play in every period.

6 Here we interpret each ai ∈ Ai as a “pure action” of player i in the stage game. The concept of justifiable punishment crucially depends on the interpretation of players’ stage-game actions. We thank a referee for pointing out this issue. The existence of a stage-game Nash equilibrium ensures the existence of a simple and intuitive subgame perfect equilibrium with justifiable punishments.

7 By convention, we write player i’s stage-game payoff in the forms of either ui(a) or ui(ai , a j).


which is convex and compact in R² due to the compactness of the stage-game action sets and the continuity of the players’ payoff functions.
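For a finite stage game, the minmax values in the definition above can be computed by brute force. The following sketch (function name and game representation are ours) recovers the minmax point (0, 0) of Example 1:

```python
# Sketch (ours): pure-action minmax value min_{a_j} max_{a_i} u_i(a_i, a_j)
# for a finite two-player game given as a dict {(a1, a2): (u1, u2)}.
def minmax(payoffs, i):
    a1s = sorted({a[0] for a in payoffs})
    a2s = sorted({a[1] for a in payoffs})
    own, opp = (a1s, a2s) if i == 0 else (a2s, a1s)

    def u(ai, aj):
        # Player i's payoff at the profile where i plays ai and j plays aj.
        profile = (ai, aj) if i == 0 else (aj, ai)
        return payoffs[profile][i]

    # Opponent j minimizes over his actions the best reply payoff of i.
    return min(max(u(ai, aj) for ai in own) for aj in opp)

# Example 1's stage game: each player's minmax value is 0.
g = {("T", "L"): (2, 2), ("T", "R"): (0, -1),
     ("B", "L"): (-1, 0), ("B", "R"): (-2, -2)}
print(minmax(g, 0), minmax(g, 1))  # 0 0
```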

Let G∞(δ) denote the corresponding infinitely repeated game of G with discount factor δ ∈ (0, 1). In this repeated game, h(t) = {a(1), ..., a(t)} represents a history of length t ≥ 1 periods, where a(s) ∈ A is the stage-game action profile played in period s ≤ t. The set of histories of length t ≥ 0 periods is then H(t) = A^t, where A^0 = ∅ denotes the initial null history at the beginning of the repeated game G∞(δ). The set of all finite histories is H = ⋃_{t=0}^{∞} H(t). An infinite stream of stage-game action profiles π = {a(t)}_{t=1}^{∞} is referred to as an outcome path of the repeated game. From such an outcome path π, player i’s average discounted payoff in the repeated game G∞(δ) is given by

Ui(π) = (1 − δ) ∑_{t=1}^{∞} δ^{t−1} ui(a(t)).

A strategy σi of player i in G∞(δ) is a function that maps the set of histories H into the set of player i’s actions Ai. Any strategy profile σ = (σ1, σ2) induces a unique outcome path π^σ = {a^σ(t)}_{t=1}^{∞}, where a^σ(1) = σ(h(0)) = σ(∅) and a^σ(t) = σ(a^σ(1), ..., a^σ(t − 1)) for all t > 1. Each player i evaluates strategy profile σ according to his average discounted payoff from the induced outcome path, Ui(σ) ≡ Ui(π^σ). A strategy profile σ = (σ1, σ2) is a Nash equilibrium of G∞(δ) if, for i ≠ j ∈ {1, 2}, player i’s strategy σi is a best response to his opponent j’s strategy σj. A strategy profile σ is a subgame perfect equilibrium (SPE) of G∞(δ) if σ induces a Nash equilibrium in the continuation subgame after every possible finite history h ∈ H. Since we interpret each ai ∈ Ai as a “pure action” of player i in the stage game, we focus on “pure strategy” subgame perfect equilibria of the repeated game.

3. Justifiable punishments

To formalize the concept of SPE and to introduce justifiable punishments, some new notation is in order. Given a strategy profile σ and any history h ∈ H, the players play σ(h) ∈ A in the period immediately after history h. If no one deviates, then the history after the current period will be (h, σ(h)). Let σ(h,σ(h)) be the continuation strategy profile induced by σ in the subgame after history (h, σ(h)). On the other hand, if a deviation occurs in the period immediately after history h, then the history after the deviation will be (h, a) for some a ≠ σ(h). Analogously, the continuation strategy profile induced by σ in the subgame after history (h, a) is denoted by σ(h,a). With this notation, a strategy profile σ is a SPE if and only if, for all histories h ∈ H and all unilateral deviations by player i with ai ≠ σi(h) and aj = σj(h),

(1 − δ)ui(a) + δUi(σ(h,a)) ≤ Ui(σh) ≡ (1 − δ)ui(σ(h)) + δUi(σ(h,σ(h))).  (1)

That is, player i’s payoff from deviating to ai ≠ σi(h), a weighted sum of his current payoff ui(a) and his continuation payoff Ui(σ(h,a)), is no more than his payoff from complying with the strategy profile σ, given by the right-hand side of (1). The continuation strategy profile σ(h,a) is often referred to as a “punishment” for player i’s unilateral deviation ai ≠ σi(h). Condition (1) ensures that the punishment σ(h,a) is effective in enforcing the action σi(h) prescribed by the strategy profile. Subgame perfection, however, does not concern itself with which types of player i’s deviations are “punishable”. Even if player i’s deviation is “good” for player j, subgame perfection still requires the corresponding punishment to be carried out by player j. Player j’s incentive to carry out such a punishment is in turn reinforced by a punishment for player j should j deviate in punishing player i, and vice versa.

As we argued in the introduction, a deviation by player i that is beneficial to player j should not be punished unless the punishment itself is also beneficial to player j in the continuation game. In that case, punishments that are beneficial to player j are “economically” justifiable because it is in player j’s self-interest to carry them out in the continuation games. If player i’s deviation costs player j, on the other hand, carrying out punishments that make player j worse off in the continuation games is “morally” justifiable. However, carrying out a punishment that makes player j worse off in the continuation game, after a deviation by player i that benefits player j, would not be justifiable.

Definition 2. Given any strategy profile σ, any history h, and any unilateral deviation a by player i such that ai ≠ σi(h) and aj = σj(h), the continuation σ(h,a) is not justifiable if

uj(a) ≥ uj(σ(h)) and Uj(σ(h,a)) < Uj(σ(h,σ(h))).  (2)

A strategy profile is a SPE with justifiable punishments if it is a SPE and it has no continuation that is not justifiable.
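Condition (2) can be checked mechanically. In the sketch below, the function and the illustrative numbers are ours: they assume δ = 0.9 and the two-period restart punishment of Example 1, under which the restart punishment is flagged as not justifiable.

```python
# Sketch of Definition 2's test (ours): after player i deviates, the
# punishment is NOT justifiable exactly when the deviation does not hurt
# player j (u_j(a) >= u_j(sigma(h))) yet the punishment lowers j's
# continuation payoff (U_j(sigma_{h,a}) < U_j(sigma_{h,sigma(h)})).
def is_justifiable(uj_dev, uj_presc, Uj_punish, Uj_equil):
    not_justifiable = uj_dev >= uj_presc and Uj_punish < Uj_equil
    return not not_justifiable

# Example 1 with delta = 0.9, punishment = two periods of (B, R) then (T, L):
# player 1's deviation from (B, R) to (T, R) raises player 2's stage payoff
# (-1 > -2), while restarting the punishment lowers player 2's continuation
# payoff (1.24 < 1.6) -- so the restart is not justifiable.
print(is_justifiable(uj_dev=-1, uj_presc=-2, Uj_punish=1.24, Uj_equil=1.6))  # False
```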

In this paper, we study SPEs with justifiable punishments. It is obvious that playing a specific stage-game Nash equilibrium in every period is a SPE with justifiable punishments in G∞(δ) for all δ ∈ (0, 1). Furthermore, playing any fixed sequence of stage-game Nash equilibria,8 where the players never revise their continuation play, is also a SPE with justifiable punishments in G∞(δ) for all δ ∈ (0, 1). Therefore, the existence of a Nash equilibrium in the stage game G guarantees

8 In other words, a strategy profile prescribes a specific stage-game Nash equilibrium in a period after all possible histories that lead to this period; for all t ≥ 1, σ(h) = σ(h′) is a stage-game Nash equilibrium for all h, h′ ∈ H(t).


the existence of a SPE with justifiable punishments in the corresponding infinitely repeated game G∞(δ) for all δ ∈ (0, 1). Our goal is to characterize the set of payoffs that can be supported by SPEs with justifiable punishments, which is a non-empty subset of the feasible and weakly individually rational payoffs F*. For any discount factor δ, the set of SPE payoffs with justifiable punishments contains the convex hull of all stage-game Nash equilibrium payoffs because playing any sequence of stage-game Nash equilibria is a SPE with justifiable punishments in the repeated game. We will show how the stage game determines the limiting set of average payoffs of SPEs with justifiable punishments as the discount factor goes to 1.

4. Main results

In this section, we investigate SPEs with justifiable punishments. For expositional convenience, we refer to a SPE with justifiable punishments simply as an equilibrium. Since justifiable punishments impose an additional constraint on the continuation SPE after one player’s deviation that benefits the other player, this type of deviation plays a crucial role in the analysis. For any stage-game action profile a ∈ A, denote the set of player i’s deviations that do not decrease player j’s stage-game payoff as

Di(a) = { a′i ∈ Ai : uj(aj, a′i) ≥ uj(aj, ai) for j ≠ i }.  (3)

Observe that Di(a) is a non-empty and compact subset of Ai because ai ∈ Di(a) for all a ∈ A by definition, uj(·) is continuous, and A is compact. Also observe that Di(a) contains deviations that could be either good or bad for player i himself. In general, Di(a) depends on a ∈ A.
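For a finite stage game, Di(a) from (3) can be enumerated directly. A sketch (ours, with our own representation) applied to Example 1's stage game:

```python
# Sketch of (3) for a finite game (ours): D_i(a) collects player i's
# deviations a_i' that leave player j's stage payoff no lower than at a.
def D(payoffs, a, i):
    j = 1 - i
    acts_i = sorted({p[i] for p in payoffs})

    def profile(ai):
        # The profile where i plays ai and j keeps his action from a.
        return (ai, a[1]) if i == 0 else (a[0], ai)

    return {ai for ai in acts_i if payoffs[profile(ai)][j] >= payoffs[a][j]}

# Example 1's stage game: D_1((B, R)) = {T, B}, since player 1's switch
# to T raises player 2's payoff from -2 to -1.
g = {("T", "L"): (2, 2), ("T", "R"): (0, -1),
     ("B", "L"): (-1, 0), ("B", "R"): (-2, -2)}
print(D(g, ("B", "R"), 0))  # {'T', 'B'} (set order may vary)
print(D(g, ("T", "L"), 0))  # {'T'}
```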

Now we are ready to present the following proposition that provides a set of necessary conditions on the equilibrium payoff vectors for all discount factors δ ∈ (0, 1).

Proposition 3. For all equilibrium payoff vectors v = (v1, v2) and all λ1 > 0 and λ2 > 0,

λj vj − λi vi ≤ sup_{a∈A} [ λj uj(a) − λi max_{a′i∈Di(a)} ui(a′i, aj) ]  for i ≠ j ∈ {1, 2}.  (4)

Proof. Recall that the set of equilibrium payoffs, denoted V, is a non-empty subset of F*. Because F* is bounded, V is also bounded. For i ≠ j ∈ {1, 2} and λ1, λ2 > 0, sup_{v∈V} [λj vj − λi vi] is well-defined and finite. Because V generally depends on the discount factor δ, so does sup_{v∈V} [λj vj − λi vi].

Consider any equilibrium payoff vector v ∈ V. Without loss of generality, suppose that this equilibrium consists of u(a) in the first period and a continuation equilibrium with payoff vector v′ ∈ V after the first period. Accordingly, we have

v = (1 − δ)u(a) + δv′.  (5)

For any a′i ∈ Di(a), let v′′ be the corresponding continuation payoff after player i unilaterally deviates to a′i ∈ Di(a); subgame perfection condition (1) and justifiable punishment condition (2) require that

(1 − δ)ui(a′i, aj) + δv′′i ≤ vi  and  v′′j ≥ v′j.  (6)

With (5) and (6), we obtain

δ sup_{v∈V} [λj vj − λi vi] ≥ δ(λj v′′j − λi v′′i)
  ≥ δλj v′j − λi [vi − (1 − δ)ui(a′i, aj)]
  = λj vj − (1 − δ)λj uj(a) − λi [vi − (1 − δ)ui(a′i, aj)]
  = λj vj − λi vi − (1 − δ)[λj uj(a) − λi ui(a′i, aj)].  (7)

Note that if a′i = ai, then v′ = v′′ and the above inequality holds trivially. Because (7) holds for all a′i ∈ Di(a) and all a ∈ A, we have

δ sup_{v∈V} [λj vj − λi vi] ≥ λj vj − λi vi − (1 − δ) sup_{a∈A} [λj uj(a) − λi max_{a′i∈Di(a)} ui(a′i, aj)].

Rearranging the last inequality yields

λj vj − λi vi ≤ δ sup_{v∈V} [λj vj − λi vi] + (1 − δ) sup_{a∈A} [λj uj(a) − λi max_{a′i∈Di(a)} ui(a′i, aj)],

sup_{v∈V} [λj vj − λi vi] ≤ δ sup_{v∈V} [λj vj − λi vi] + (1 − δ) sup_{a∈A} [λj uj(a) − λi max_{a′i∈Di(a)} ui(a′i, aj)],

from which we obtain (4) because δ < 1. □


Fig. 1. The half space generated by a specific a ∈ A for λi/λj > 0.

Observe that the right-hand side of (4),

rij(λi, λj) ≡ sup_{a∈A} [ λj uj(a) − λi max_{a′i∈Di(a)} ui(a′i, aj) ],  (8)

is determined by the stage game G only; it does not depend on the discount factor δ.9 Proposition 3 asserts that the set of equilibrium payoffs must lie in the half space defined by (4): λ2 v2 − λ1 v1 ≤ r12(λ1, λ2). Starting from any action profile a ∈ A, suppose player 1 can increase both players’ stage-game payoffs by deviating to a′1 ∈ D1(a), as shown in Fig. 1. The half space associated with this action profile a is then determined by the vector (−λ1, λ2) and the point (u1(a′1, a2), u2(a1, a2)). The set of equilibrium payoffs then lies in the largest such half space over all a ∈ A, namely λ2 v2 − λ1 v1 ≤ r12(λ1, λ2).

Now, taking all possible (λi, λj) > 0, the set of equilibrium payoffs for all δ ∈ (0, 1) must be a subset of the intersection of the set of feasible and weakly individually rational payoffs with all those half spaces. More specifically, Proposition 3 implies

Proposition 4. For all δ ∈ (0, 1), the set of equilibrium payoffs is a subset of V*, where

V* ≡ F* ∩ ⋂_{i≠j∈{1,2}} ⋂_{(λi,λj)>0} { v ∈ R² : λj vj − λi vi ≤ rij(λi, λj) }.  (9)

Observe that V* defined by (9) is a convex and closed subset of F* because it is the intersection of the convex and closed set F* with closed half spaces. Also, V* contains all stage-game Nash equilibrium payoff vectors, hence it is non-empty.

Now reconsider Example 1. Proposition 4 implies that playing the stage-game Nash equilibrium (T, L) in every period is the unique SPE with justifiable punishments in the repeated game.10 Taking a = (B, R), for example, we have D1(a) = {T, B} and

( max_{a′1∈D1(a)} u1(a′1, R), u2(B, R) ) = (0, −2).

Hence any equilibrium payoff vector (v_1, v_2) must lie in the half space 2v_2 ≤ 3v_1 − 2. Similarly, by symmetry, any equilibrium payoff vector must also lie in the half space 3v_2 ≥ 2v_1 + 2. Together with feasibility, Proposition 4 implies V* = {(2, 2)}, hence playing the stage-game Nash equilibrium (T, L) in every period is the unique SPE with justifiable punishments in G(δ) for all δ ∈ (0, 1). This example demonstrates that requiring punishments to be justifiable sometimes yields a very sharp equilibrium refinement in a repeated game.

On the other hand, our next example shows a very different result in a repeated game whose stage game is similar to that of Example 1.

Example 5. In the infinitely repeated game with the following stage game:

9 Although A is compact and u is continuous, we may not replace sup in (8) by max because λ_j u_j(a) − λ_i max_{a'_i ∈ D_i(a)} u_i(a'_i, a_j) may not be continuous. A counterexample is available from the authors upon request.
10 For illustration purposes and simplicity, we focus on pure stage-game actions only. Even if we allow for mixed stage-game actions and assume all past mixed action profiles are observable in the repeated game, playing the stage-game Nash equilibrium in every period is still the unique SPE with justifiable punishments.


        L        R
T     2, 2     0, 1
B     1, 0    −2, −2

every feasible and strictly individually rational payoff can be supported by SPE with justifiable punishments for sufficiently large δ ∈ (0, 1).

Observe that every player has a dominant action and a zero minmax payoff, and the Nash equilibrium payoff vector (2, 2) strictly dominates every other feasible payoff vector. Again, the Folk Theorem is applicable in this example. In contrast to Example 1, however, note that D_1(T, R) = {T} and D_1(T, L) = {T}, hence Proposition 3 implies that any equilibrium payoff vector (v_1, v_2) must lie in the half space 2v_2 ≤ v_1 + 2. By symmetry again, any equilibrium payoff vector (v_1, v_2) must also lie in the half space v_2 ≥ 2v_1 − 2. Proposition 4 implies V* = F* in Example 5, which is very different from what we have in Example 1. Furthermore, almost all payoff vectors in V* = F* can be supported by SPE with justifiable punishments in the repeated game. For example, u(T, R) = (0, 1) ∈ V* can be supported by an equilibrium with the simple strategy profile with the following outcome path and punishment paths:

π^0 = (T, R)^∞,   π^1 = (T, R)^∞,   π^2 = (B, L)^∞.

Note that π^i is player i's punishment path after any history in which player i was the last player to deviate. During the outcome path π^0 and punishment path π^1, player 1 obviously has no incentive to deviate. On the other hand, if player 2 deviates from playing R to playing L, player 1's stage-game payoff is higher, and so is player 2's. When

(1 − δ) × 2 + δ × 0 ≤ 1  ⇔  δ ≥ 1/2,

player 2 has no incentive to deviate from (T, R) during the outcome path π^0 and punishment path π^1. By symmetry, player 1 has no incentive to deviate during punishment path π^2. Implementing player 2's punishment path π^2 is justifiable because player 1 receives his highest continuation payoff of 1 in this strategy profile. Therefore, this boundary point (0, 1) of V* can be supported as an equilibrium payoff vector. Using this equilibrium as the punishment for player 1, and the one with outcome path (B, L)^∞ as the punishment for player 2, we can support any feasible and strictly individually rational payoff vector in (0, 1) × (0, 1) by a SPE with justifiable punishments.
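The deviation sets and the bound r_12(λ_1, λ_2) in this example are easy to check numerically. A minimal sketch in Python (the dictionary encoding of the stage game and the helper names are our own):

```python
import itertools

# Stage game of Example 5: rows T, B for player 1; columns L, R for player 2.
U1 = {('T', 'L'): 2, ('T', 'R'): 0, ('B', 'L'): 1, ('B', 'R'): -2}
U2 = {('T', 'L'): 2, ('T', 'R'): 1, ('B', 'L'): 0, ('B', 'R'): -2}
A1, A2 = ['T', 'B'], ['L', 'R']

def D1(a):
    """D_1(a): a_1 itself plus player 1's deviations that strictly raise u_1
    and weakly raise u_2."""
    a1, a2 = a
    return {a1} | {b1 for b1 in A1
                   if U1[(b1, a2)] > U1[a] and U2[(b1, a2)] >= U2[a]}

def r12(l1, l2):
    """The bound of eq. (8): sup over a of l2*u2(a) - l1*max_{a1' in D1(a)} u1(a1', a2)."""
    return max(l2 * U2[a] - l1 * max(U1[(b1, a[1])] for b1 in D1(a))
               for a in itertools.product(A1, A2))

print(D1(('T', 'R')))   # {'T'}: player 1 has no profitable deviation at (T, R)
print(r12(1, 2))        # 2, so the half space is 2*v2 - v1 <= 2
```

With (λ_1, λ_2) = (1, 2), the supremum of 2 is attained at (T, L) and (T, R), reproducing the half space 2v_2 ≤ v_1 + 2 above.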

Similarly, we can support any boundary point of V*, say (α, β), between (0, 1) and (2, 2). For sufficiently large δ, Lemma 2 of Fudenberg and Maskin (1991) states that there is an outcome path, say π^0, with average payoff vector (α, β) such that the payoff vector from any of its continuations is within ε of (α, β), for some small positive ε < (β − α)/2. Note that this outcome path π^0 must consist of (T, L) and (T, R) only, and player 1 has no profitable deviation from either action profile. By symmetry, there is an outcome path π^2 from which player 1 receives β and player 2 receives α. Now consider the simple strategy profile with outcome path π^0, player 1's punishment path π^1 = π^0, and player 2's punishment path π^2. Because α < β − 2ε, player 2's payoff from any continuation of π^0 is at least β − ε > α + ε > α, which is what player 2 receives when he is punished with outcome path π^2. This additional gap of ε > 0 is enough to deter player 2 from deviating from (T, R) when

(1 − δ) × 2 + δα ≤ (1 − δ) × 1 + δ(α + ε)  ⇔  δ ≥ 1/(1 + ε).

Therefore, payoff vector (α, β) is supported by this simple strategy profile for sufficiently large δ. By construction, every continuation in this simple strategy profile is justifiable. By the same argument, using this equilibrium as the punishment for player 1 and the symmetric one with payoff vector (β, α) as the punishment for player 2, we can support any feasible payoff vector in (α, β) × (α, β) when the discount factor is large enough. Obviously, payoff vector (2, 2) can be supported by the repetition of the stage-game Nash equilibrium. To conclude Example 5, we have shown that every strictly individually rational payoff vector in V* can be supported by a SPE with justifiable punishments for a sufficiently large discount factor.
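The threshold δ ≥ 1/(1 + ε) can be verified by evaluating player 2's one-shot deviation gain directly; a small sketch with exact rational arithmetic (the numerical values of α and ε are hypothetical):

```python
from fractions import Fraction

def deviation_gain(delta, alpha, eps):
    """Player 2's gain from deviating from (T, R): one period of stage payoff 2
    followed by continuation alpha, versus stage payoff 1 followed by
    continuation alpha + eps."""
    comply = (1 - delta) * 1 + delta * (alpha + eps)
    deviate = (1 - delta) * 2 + delta * alpha
    return deviate - comply

eps = Fraction(1, 4)
alpha = Fraction(1, 2)              # hypothetical boundary point payoff
threshold = 1 / (1 + eps)           # = 4/5
assert deviation_gain(threshold, alpha, eps) == 0        # constraint binds exactly
assert deviation_gain(Fraction(9, 10), alpha, eps) < 0   # no incentive to deviate
assert deviation_gain(Fraction(7, 10), alpha, eps) > 0   # threshold violated
```

Note that the gain simplifies to (1 − δ) − δε, which is non-positive exactly when δ ≥ 1/(1 + ε), independently of α.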

Example 5 not only demonstrates a folk-theorem-like characterization for SPEs with justifiable punishments (Proposition 7), but also illustrates an important feature of the set V*, which is very useful in the proof of Proposition 7. More specifically, note that at the stage-game action profiles that lead to either (2, 2) or (0, 1) in Example 5, player 1 has no profitable deviation. Roughly speaking, at any stage-game action profile that leads to a boundary point of V* where player 1 receives his least possible payoff in V* while player 2's payoff is fixed, player 1 generally has no profitable deviation that is also strictly beneficial to player 2. To present this result formally, let B_i be the set of stage-game action profiles at which player i has no profitable deviation that is also weakly beneficial to player j. That is, for all a ∈ B_i, there is no a'_i ∈ A_i such that u_i(a'_i, a_j) > u_i(a_i, a_j) and u_j(a_j, a'_i) ≥ u_j(a_j, a_i). The action profiles in B_i will be used to construct justifiable punishments for player i. We now focus on the interior of V*. Note that the existence of a stage-game Nash equilibrium does not guarantee that V* has a non-empty interior, as Example 1 shows.11 In order to make the proof of Proposition 7 easy to understand, we first establish the following lemma:

11 If V ∗ has an empty interior, then Lemma 6 and Proposition 7 hold vacuously.


Lemma 6. For any interior point v of V*, there are two stage-game action profiles b^i, c^i ∈ B_i, possibly b^i = c^i, such that

v_i > ρ_i u_i(b^i) + (1 − ρ_i) u_i(c^i)   and   v_j < ρ_i u_j(b^i) + (1 − ρ_i) u_j(c^i)   (10)

for all ρ_i in an open subinterval of [0, 1].

Proof. For any a ∈ A, it is easy to see that (a'_i, a_j) ∈ B_i and D_i(a'_i, a_j) ⊂ D_i(a) for any a'_i ∈ arg max_{a''_i ∈ D_i(a)} u_i(a''_i, a_j). For all λ_i, λ_j > 0, we have

λ_j u_j(a_j, a'_i) − λ_i max_{a''_i ∈ D_i(a'_i, a_j)} u_i(a''_i, a_j) ≥ λ_j u_j(a_j, a_i) − λ_i max_{a''_i ∈ D_i(a)} u_i(a''_i, a_j).

This implies that, in order to determine the bounds r_{ij}(λ_i, λ_j), we just need to consider the action profiles in B_i:

r_{ij}(λ_i, λ_j) ≡ sup_{a ∈ B_i} [λ_j u_j(a) − λ_i max_{a'_i ∈ D_i(a)} u_i(a'_i, a_j)].

Now, suppose that Lemma 6 is false. Then v would not be bounded by all the half spaces constructed from the action profiles in B_i. In other words, there exist some λ_i, λ_j ≥ 0 with (λ_i, λ_j) ≠ (0, 0) such that for all a ∈ B_i,

λ_j v_j − λ_i v_i ≥ λ_j u_j(a) − λ_i u_i(a) = λ_j u_j(a) − λ_i max_{a'_i ∈ D_i(a)} u_i(a'_i, a_j),   (11)

because max_{a'_i ∈ D_i(a)} u_i(a'_i, a_j) = u_i(a) for all a ∈ B_i. Inequality (11) implies that λ_j v_j − λ_i v_i ≥ r_{ij}(λ_i, λ_j), which contradicts the fact that v is an interior point of V*. □

Lemma 6 asserts that for any interior point v ∈ V*, there are at least some convex combinations of two payoff vectors, resulting from action profiles in B_i, that pay player i less and player j more than at v. It is possible that b^i = c^i, so that u_i(b^i) = u_i(c^i) < v_i, u_j(b^i) = u_j(c^i) > v_j, and the two inequalities of (10) hold for all ρ_i ∈ (0, 1). Otherwise, without loss of generality, u_i(b^i) < v_i ≤ u_i(c^i) and u_j(b^i) ≤ v_j < u_j(c^i). Accordingly, the two inequalities of (10) hold for all

ρ_i ∈ ( [u_i(c^i) − v_i] / [u_i(c^i) − u_i(b^i)], [u_j(c^i) − v_j] / [u_j(c^i) − u_j(b^i)] ).
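For completeness, this interval follows from solving the two inequalities of (10) for ρ_i, using u_i(c^i) − u_i(b^i) > 0 and u_j(c^i) − u_j(b^i) > 0; in LaTeX:

```latex
% Solve each inequality of (10) for \rho_i; each division is by a
% negative quantity, which reverses the direction of the inequality.
v_i > \rho_i u_i(b^i) + (1-\rho_i) u_i(c^i)
  \;\Longleftrightarrow\;
\rho_i > \frac{u_i(c^i) - v_i}{u_i(c^i) - u_i(b^i)},
\qquad
v_j < \rho_i u_j(b^i) + (1-\rho_i) u_j(c^i)
  \;\Longleftrightarrow\;
\rho_i < \frac{u_j(c^i) - v_j}{u_j(c^i) - u_j(b^i)}.
```

That the lower bound lies below the upper bound for an interior v is exactly what the proof of Lemma 6 establishes.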

This open interval of ρ_i will be used to construct a justifiable punishment path that supports v with a Fudenberg and Maskin (1991) sequence of b^i and c^i. In Example 5, for any point in (0, 1) × (0, 1) we have b^1 = (T, R), and for any other interior point of V* we have b^1 = (T, R) and c^1 = (T, L). Note that even if v is individually rational, u(b^i) and u(c^i) may not be individually rational in general. As demonstrated in Example 5, we will use b^i and c^i to construct justifiable punishments, after player i's deviations that are beneficial to both players, to support interior points.
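In Example 5 the set B_1 can be enumerated directly; a small sketch (the encoding of the stage game is our own, as is the strict-for-i, weak-for-j deviation convention implied by the definition of B_i):

```python
# Profiles where player 1 has no deviation that strictly helps himself
# and weakly helps player 2 (the set B_1 for Example 5's stage game).
U1 = {('T', 'L'): 2, ('T', 'R'): 0, ('B', 'L'): 1, ('B', 'R'): -2}
U2 = {('T', 'L'): 2, ('T', 'R'): 1, ('B', 'L'): 0, ('B', 'R'): -2}
A1, A2 = ['T', 'B'], ['L', 'R']

B1 = [(a1, a2) for a1 in A1 for a2 in A2
      if not any(U1[(b1, a2)] > U1[(a1, a2)] and U2[(b1, a2)] >= U2[(a1, a2)]
                 for b1 in A1 if b1 != a1)]
print(B1)   # [('T', 'L'), ('T', 'R')]
```

Consistent with the text, B_1 consists exactly of the profiles that lead to (2, 2) and (0, 1).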

Proposition 7. Any interior point of V ∗ can be supported by a SPE with justifiable punishments for sufficiently large discount factor δ ∈ (0, 1).

Proof. Let v be an interior point of V*. Here we provide a simplified proof by assuming that v can be achieved by one-shot play in the stage game, i.e., v = u(a) for some a ∈ A, and that for each i ∈ {1, 2} there exists one b^i ∈ B_i such that v_i > u_i(b^i) and v_j < u_j(b^i), by Lemma 6. This simplified proof demonstrates how to construct a SPE in which the two players play a in every period and all continuations are justifiable. We will then discuss how this simplified proof can be modified and generalized to accommodate the general case.

Let m^i denote an action profile that minmaxes player i in the stage game. By definition, player i has no deviation from m^i that is beneficial to himself. This feature of m^i resembles what we have for b^i: player i has no deviation from b^i that is beneficial to himself and also weakly beneficial to player j. Inspired by simple strategy profiles, we consider a strategy profile defined by five outcome paths (π^0, π^1, π̄^1, π^2, π̄^2), where

π^0 = (a, . . . , a, . . .),
π^i = (b^i, . . . , b^i (τ_i periods), a, . . .)   for i = 1 and 2,
π̄^i = (m^i, . . . , m^i (κ_i periods), b^i, . . . , b^i (τ_i periods), a, . . .)   for i = 1 and 2.

• The two players start the repeated game by following path π^0; they play a in every period as long as no one has deviated yet.
• According to the ongoing path, which can be any of these five paths:
– if player i deviates such that his deviation does not increase his own stage-game payoff (a type-A deviation), or if both players deviate, then the two players continue the ongoing path as if no deviation had occurred;


– if player i deviates such that his deviation does not decrease player j's stage-game payoff and increases his own stage-game payoff (a type-B deviation), then the two players start path π^i in the following period;
– if player i deviates such that his deviation decreases player j's stage-game payoff and increases his own stage-game payoff (a type-C deviation), then the two players start path π̄^i in the following period.

This strategy profile resembles a simple strategy profile of Abreu (1988); the difference here is that we need two types of punishment paths for each player, used after his type-B and type-C deviations, respectively, so that all continuations are guaranteed to be justifiable. In the rest of this proof, we show that there exist finite τ_i and κ_i such that, for δ sufficiently close to 1, the strategy profile defined above is a SPE and all its continuations are justifiable. First, it is obvious that no one has an incentive to make a type-A deviation. We next focus on type-B and type-C deviations in this strategy profile.

Case 1: The strategy profile calls for playing a and then path π^0. Player i has no incentive to take any type-B deviation if

f_i(δ) ≡ (1 − δ) max_{a_i} u_i(a_i, a_j) + δ U_i(π^i) − U_i(π^0) ≤ 0.   (12)

Note that lim_{δ→1} f_i(δ) = 0 and

lim_{δ→1} f'_i(δ) = τ_i [u_i(a) − u_i(b^i)] − max_{a_i} u_i(a_i, a_j) + u_i(a),

which can be made positive for τ_i large enough, such as

τ_i > [max_{a_i} u_i(a_i, a_j) − u_i(a)] / [u_i(a) − u_i(b^i)].   (13)

For any τ_i that satisfies (13), choose δ large enough so that U(π^i) ∈ V* and f'_i(δ) > 0, which implies that (12) holds. Hence, player i has no incentive to make any type-B deviation from the action profile a along outcome path π^0, as long as τ_i and δ satisfy the conditions described above.
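To see how (13) works, one can evaluate f_i(δ) for a punishment path that plays b^i for τ_i periods and then a forever. A sketch with hypothetical payoffs (u_i(a) = 2, u_i(b^i) = 0, one-shot deviation payoff 3, so (13) gives τ_i > 1/2 and τ_i = 1 suffices):

```python
def f(delta, tau, dev, u_a, u_b):
    """f_i(delta) of (12): one-shot deviation payoff `dev`, then the punishment
    path pi^i = (b^i for tau periods, then a forever), versus staying on pi^0."""
    U_pi_i = (1 - delta**tau) * u_b + delta**tau * u_a   # average payoff of pi^i
    U_pi_0 = u_a                                         # average payoff of pi^0
    return (1 - delta) * dev + delta * U_pi_i - U_pi_0

# With tau = 1 (satisfying (13)), f is non-positive for delta >= 1/2:
for delta in (0.5, 0.6, 0.9, 0.99):
    assert f(delta, tau=1, dev=3, u_a=2, u_b=0) <= 0
# For small delta the constraint fails, as expected:
assert f(0.4, tau=1, dev=3, u_a=2, u_b=0) > 0
```

With these numbers f(δ) = 2δ² − 3δ + 1 = (2δ − 1)(δ − 1), so the deviation constraint holds exactly for δ ∈ [1/2, 1).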

Because U(π^i) ∈ V* implies u_i(m^i) < U_i(π^i), we have U_i(π̄^i) < U_i(π^i) for any κ_i > 0. Therefore, u_i(m^i) < U_i(π̄^i) and (12) are also sufficient for player i not to have an incentive to make any type-C deviation during the play of a.

Case 2: The strategy profile calls for playing b^i (for at most τ_i periods) and then path π^0 (play a forever). Recall that player i has no type-B deviation because b^i ∈ B_i. Player i has no incentive to take any type-C deviation if

g_i(δ) = (1 − δ) max_{a_i} u_i(a_i, b^i_j) + δ U_i(π̄^i) − U_i(π^i) ≤ 0.   (14)

Observe that lim_{δ→1} g_i(δ) = 0 and

lim_{δ→1} g'_i(δ) = κ_i [u_i(a) − u_i(m^i)] − max_{a_i} u_i(a_i, b^i_j) + u_i(a),

which is positive for κ_i large enough, such as

κ_i > [max_{a_i} u_i(a_i, b^i_j) − u_i(a)] / [u_i(a) − u_i(m^i)].   (15)

For any κ_i that satisfies (15), choose δ large enough so that U(π̄^i) ∈ V* and g'_i(δ) > 0, which implies (14). Hence, player i has no incentive to make any type-C deviation during the play of b^i.

Case 3: The strategy profile calls for playing m^i (for at most κ_i periods) and then path π^i. Obviously, player i has no beneficial deviation while being minmaxed.

Case 4: The strategy profile calls for playing b^j (for at most τ_j periods) and then path π^0. If player i does not deviate, his continuation payoff will be U_i(π^j) > U_i(π^0), due to u_i(b^j) > u_i(a) from the construction. Condition (12) then implies that player i has no incentive to take a deviation of any type from b^j, for the same reason as in Case 1.

Case 5: The strategy profile calls for playing m^j (for at most κ_j periods) and then path π^j. Player i has no incentive to take any type-B deviation if

h_i(δ) = (1 − δ) max_{a_i} u_i(a_i, m^j_j) + δ U_i(π^i) − [(1 − δ^{κ_j}) u_i(m^j) + δ^{κ_j} U_i(π^j)] ≤ 0.   (16)

Note again that lim_{δ→1} h_i(δ) = 0 and

lim_{δ→1} h'_i(δ) = u_i(a) − max_{a_i} u_i(a_i, m^j_j) + τ_i [u_i(a) − u_i(b^i)] − κ_j [u_i(a) − u_i(m^j)],

which is positive for large enough τ_i, such as

τ_i > [max_{a_i} u_i(a_i, m^j_j) − u_i(a) + κ_j (u_i(a) − u_i(m^j))] / [u_i(a) − u_i(b^i)].   (17)

For any τ_i that satisfies (17), choose δ large enough so that U(π̄^j) ∈ V* and h'_i(δ) > 0, which implies (16). Hence, player i has no incentive to make any type-B deviation. Because u_i(m^i) < U_i(π^i) implies U_i(π̄^i) < U_i(π^i), condition (16) also ensures that player i has no incentive to make any type-C deviation during the play of m^j.

For both i = 1 and 2, first choose κ_i according to (15), then choose τ_i according to (13) and (17). Given κ_i and τ_i, we have (12), (14), and (16) for δ sufficiently close to 1. Consequently, our strategy profile is a SPE with average payoff vector v.

To conclude this proof, we just need to show that the continuation SPE after any type-B deviation is justifiable, because the continuation of the ongoing path after a type-A deviation and any continuation after a type-C deviation are always justifiable. Note that player i has no type-B deviation when playing m^i or b^i. If player i ever makes a type-B deviation when playing a, m^j, or b^j, on the other hand, player j's continuation payoff will be U_j(π^i) > v_j, which is more than U_j(π^j) and U_j(π̄^j). Therefore, punishment path π^i is also justifiable in support of π^0, π^j, and π̄^j, which concludes this simplified proof.

We now discuss how to generalize this simplified proof when v ∈ V* cannot result from any one-shot play and, for each i ∈ {1, 2}, there are two different action profiles b^i, c^i ∈ B_i with the required properties, by Lemma 6. Given the two strict inequalities of (10), there exists a small ε > 0 such that the two inequalities of (10) continue to hold if we replace v_i by v_i − ε and v_j by v_j + ε, possibly for a smaller but still open interval of ρ_i. Lemma 2 of Fudenberg and Maskin (1991) states that for a sufficiently large discount factor δ, there exists an outcome path, say π^0, with average discounted payoff vector v such that any continuation path of π^0 leads to a payoff vector within ε of v. Next, supposing without loss of generality that u(b^i) is Pareto dominated by u(c^i), we can construct a finite path where b^i is played for some periods and then c^i is played afterwards. The average discounted payoff from such a finite path pays player i less than v_i − ε and pays player j more than v_j + ε. In the simplified proof, replace path π^0 by this π^0, replace v_i by v_i − ε and v_j by v_j + ε, and replace b^i by the finite path described above. It is then routine to verify that the strategy profile described in the proof with these modifications constitutes a SPE with justifiable punishments, after selecting the parameters so that all the equilibrium conditions hold. □

The key in the proof of Proposition 7 is to tailor two different types of punishments to two different types of deviations. If a player's deviation decreases the other player's stage-game payoff, any punishment is justifiable. On the other hand, if a player's deviation does not decrease the other player's stage-game payoff, justifiability restricts what kinds of continuations can be used. The structure of the set V* allows us to construct justifiable punishments for all type-B deviations; indeed, the proof of Proposition 7 is complicated precisely by the need for justifiable punishments after all type-B deviations.

For a Pareto efficient action profile a ∈ A, a player cannot increase both players' payoffs by any deviation, i.e., D_i(a) = {a_i}. Consequently, u(a) is bounded by the half spaces used in constructing the set V*, and hence u(a) ∈ V*. Our next proposition addresses the efficient payoff vectors in V* that are excluded from Proposition 7.

Proposition 8. Any efficient, feasible, and strictly individually rational payoff vector can be supported by SPE with justifiable punishments for a sufficiently large discount factor δ ∈ (0, 1).

Proof. Recall that V* contains all efficient, feasible, and strictly individually rational payoff vectors, but that they are not in the interior of V*. Let v = u(a), for simplicity, be an efficient and strictly individually rational payoff vector. If v is the only efficient and strictly individually rational payoff vector, then a must be a stage-game Nash equilibrium, hence v can be supported by the repetition of the Nash equilibrium a in the repeated game. Otherwise, there is more than one efficient and strictly individually rational payoff vector, say u(a^1) and u(a^2), such that for j ≠ i,

u_i(a^i) ≤ u_i(a) ≤ u_i(a^j)   and   u_i(a^i) < u_i(a^j).   (18)

Note that v = u(a) can be either u(a^1) or u(a^2). Let m^i be an action profile that minmaxes player i. Consider a modified simple strategy profile defined by the following three paths:

π^0 = (a, . . . , a, . . .),
π^i = (m^i, . . . , m^i (τ_i periods), a^i, . . . , a^i, . . .)   for i = 1 and 2.

Players start with the outcome path π^0, and switch to player i's punishment path π^i after player i makes either a type-B or a type-C deviation from the ongoing path; if a player makes a type-A deviation (i.e., the deviator does not increase his stage-game payoff), the two players continue the ongoing path. For some τ_i and sufficiently large δ ∈ (0, 1), we have U_i(π^i) < U_i(π^j) due to (18). Note that neither player has a type-B deviation from a, a^1, or a^2. Player j may have a type-B deviation when playing m^i; U_i(π^i) < U_i(π^j) ensures that punishing player j with path π^j is justifiable after player j makes any type-B deviation from m^i. Accordingly, this modified simple strategy profile is a SPE and any of its continuations is justifiable. □

5. Concluding remarks

Based on a new set of reasonings about which continuations are deemed justifiable in a repeated game, we characterize subgame perfect equilibria with justifiable punishments when players are sufficiently patient. There are two types of


reasonings for the players to change their future course of actions in response to a breach, by one player, of what they are expected to play. The first type is when the continuation itself is good to the non-deviating player, who therefore has no reason to object to it. The second type is when the breach by one player is bad to the other player; then any continuation is considered justifiable for the harmful deviation, even if the continuation is bad to the non-deviating player. We show that such a simple and intuitive consideration eliminates many unreasonable equilibrium strategy profiles and their corresponding payoff vectors; Example 1, for instance, admits a unique SPE with justifiable punishments.

As we have discussed in the introduction, this research fits into the literature on equilibrium refinements for repeated games. One important and influential work in this literature is the concept of weakly renegotiation-proof (WRP) equilibrium of Farrell and Maskin (1989). The main idea behind the WRP equilibrium concept is that if a Pareto dominated continuation is called to be played, then all the players would have an incentive to negotiate away from it, although no explicit negotiation is modeled. Requiring no Pareto dominance among the continuations eliminates the players' incentive to renegotiate the continuation equilibrium, and hence such a SPE is renegotiation proof.

We now discuss some of the differences between the concept of WRP equilibrium and SPE with justifiable punishments. These two equilibrium concepts are built on different behavioral assumptions about continuation strategy profiles in repeated games. Even though a WRP equilibrium strategy profile may involve continuations that are not justifiable, we have:

Proposition 9. Any payoff vector that can be supported by a WRP equilibrium can also be supported by a SPE with justifiable punishments.

Proof. Consider any WRP equilibrium σ of G∞(δ). After history h, the equilibrium σ calls for the two players to play σ(h) in the current period. If player i deviates to a_i ≠ σ_i(h) while player j complies with a_j = σ_j(h), then there are two possibilities to deal with:

1. If u_i(a) > u_i(σ(h)), then subgame perfection (Eq. (1)) implies that U_i(σ(h,a)) < U_i(σ(h,σ(h))). Because σ is WRP, U(σ(h,σ(h))) does not Pareto dominate U(σ(h,a)), and hence we must have U_j(σ(h,σ(h))) ≤ U_j(σ(h,a)). In other words, continuation σ(h,a) is justifiable.

2. If u_i(a) ≤ u_i(σ(h)), we can replace continuation σ(h,a) by σ(h,σ(h)), the continuation after player i does not deviate. In doing so, player i's continuation payoff is not any higher, so player i still has no incentive to deviate, and continuation σ(h,σ(h)), played should player i deviate to a, is justifiable because player j has the same continuation as in the case where player i does not deviate.

Observe that this modified strategy profile differs from the original WRP equilibrium σ only after a unilateral type-A deviation. Because neither player has an incentive to deviate in this modified strategy profile, it remains a SPE with the same payoff vector as the original WRP equilibrium. By construction, all continuations in this modified strategy profile are justifiable. □

Proposition 9 holds for all δ ∈ (0, 1). It does not imply that every WRP equilibrium is a SPE with justifiable punishments, but rather that any WRP equilibrium payoff vector can be supported by a SPE with justifiable punishments, possibly via a different strategy profile. The concept of WRP equilibrium is not a refinement of SPE with justifiable punishments, and the concept of SPE with justifiable punishments is not a refinement of WRP equilibrium either.

On the other hand, there are certainly payoff vectors that can be supported by SPE with justifiable punishments which are not WRP. Consider the following example of Farrell and Maskin (1989, p. 343):

Example 10. Consider the infinitely repeated game with the following stage game:

              Stone    Scissors    Paper
Paper          1, 0      0, 1       0, 0
Stone          0, 0      1, 0       0, 1
Scissors       0, 1      0, 0       1, 0

This stage game has a unique Nash equilibrium with payoff vector (1/3, 1/3), which is also the minmax payoff vector.
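The minmax claim is easy to check in both directions: if a player mixes uniformly, every action of the opponent yields exactly 1/3, so neither player can be held below, nor held to more than, 1/3. A minimal sketch with exact arithmetic (the matrix encoding is our own):

```python
from fractions import Fraction

# u1 for Example 10: rows Paper/Stone/Scissors (player 1),
# columns Stone/Scissors/Paper (player 2).
U1 = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]]
third = Fraction(1, 3)

# Uniform mixing by player 2 holds player 1 to exactly 1/3 (every row pays 1/3):
assert all(sum(third * u for u in row) == third for row in U1)
# Uniform mixing by player 1 guarantees exactly 1/3 (every column pays 1/3):
assert all(sum(third * U1[r][c] for r in range(3)) == third for c in range(3))
```

So player 1's minmax payoff is exactly 1/3, and by the symmetric payoff structure the same holds for player 2.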

Propositions 7 and 8 imply that every feasible and strictly individually rational payoff vector can be supported by SPE with justifiable punishments for sufficiently large δ ∈ (0, 1). Farrell and Maskin (1989) have shown that only the static Nash equilibrium payoff vector (1/3, 1/3) can be supported by WRP equilibrium for all δ ∈ (0, 1). This example demonstrates a sharp contrast between WRP equilibrium and SPE with justifiable punishments. In particular, none of the efficient payoff vectors in this example can be supported by WRP equilibrium, while Proposition 8 ensures that all efficient and strictly individually rational payoffs can be supported by SPE with justifiable punishments.

We conclude with an interesting example in which the set of WRP equilibrium payoffs is a proper subset of the payoffs that can be supported by SPE with justifiable punishments, which is in turn a proper subset of the SPE payoffs when δ is sufficiently close to 1.


Fig. 2. SPE, SPE with justifiable punishments (JP), and WRP equilibrium payoffs.

Example 11. Consider the infinitely repeated game with the following stage game:

        L         R
T     2, 0     −1, −1
B     1, 1      0, −1

Observe that (T, L) is the unique Nash equilibrium in the stage game, and every player has minmax payoff 0. According to the Folk Theorem, every payoff vector v in the convex hull of (0, 0), (1, 1), and (2, 0) such that v_2 > 0 can be supported by SPE when the discount factor δ is sufficiently close to 1. The results obtained in this paper imply that the interior points of the convex hull of (1/2, 0), (1, 1), and (2, 0), together with the points in the convex hull of (1, 1) and (2, 0), can be supported by SPE with justifiable punishments. However, (1, 1) cannot be supported by WRP equilibrium because there is no punishment that is both renegotiation-proof and effective should player 1 deviate. Since player 2 receives less than 1 in any continuation from which player 1 receives less than 1, such a continuation would be dominated by (1, 1) and hence cannot be renegotiation-proof in supporting payoff vector (1, 1). In fact, Theorem 1 of Farrell and Maskin (1989) implies that only the payoff vectors in the convex hull of (1/2, 0), (7/5, 3/5), and (2, 0) may be supported by WRP equilibrium.
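The claim that both minmax payoffs are 0 can be checked directly; in this game the pure-action minmax coincides with the mixed minmax (neither player can be pushed below 0 by mixing), so a simple scan over pure actions suffices (encoding our own):

```python
# Example 11's stage game: rows T, B (player 1); columns L, R (player 2).
U1 = [[2, -1], [1, 0]]
U2 = [[0, -1], [1, -1]]

# Player 2 minmaxes player 1 by choosing the column that minimizes
# player 1's best-response payoff, and symmetrically for player 1.
minmax1 = min(max(U1[r][c] for r in range(2)) for c in range(2))
minmax2 = min(max(U2[r][c] for c in range(2)) for r in range(2))
print(minmax1, minmax2)   # 0 0
```

Here player 2 holds player 1 to 0 by playing R, and player 1 holds player 2 to 0 by playing T.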

Fig. 2 illustrates the sets of payoffs that can be supported, respectively, by SPE, by SPE with justifiable punishments, and by WRP equilibria.12

Acknowledgments

We would like to thank Johannes Hörner, Matthew Jackson, Fahad Khalil, Jacques Lawarrée, Bentley MacLeod, Dennis O'Dea, Hamid Sabourian, Mike Waldman, Jörgen Weibull and participants of the 4th World Congress of the Game Theory Society and the SING8 conference for their comments. In particular, we would like to thank the editor and referees of this journal for their constructive suggestions. This research is supported by the Spanish Ministerio de Economía y Competitividad under project ECO2012-31346, and the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco (grupo de investigación IT 568-13).

References

Abreu, D., 1988. On the theory of infinitely repeated games with discounting. Econometrica 56, 383–396.
Abreu, D., Pearce, D., 1991. A perspective on renegotiation in repeated games. In: Selten, R. (Ed.), Game Equilibrium Models. Springer-Verlag, Berlin, pp. 44–55.
Abreu, D., Pearce, D., Stacchetti, E., 1993. Renegotiation and symmetry in repeated games. J. Econ. Theory 60, 217–240.
Asheim, G., 1991. Extending renegotiation-proofness to infinite horizon games. Games Econ. Behav. 3, 278–294.
Benoit, J.-P., Krishna, V., 1993. Renegotiation in finitely repeated games. Econometrica 61, 303–324.
Bergin, J., MacLeod, B., 1993. Efficiency and renegotiation in repeated games. J. Econ. Theory 61, 42–73.
Bernheim, B.D., Ray, D., 1989. Collective dynamic consistency in repeated games. Games Econ. Behav. 1, 295–326.
Busch, L.-A., Wen, Q., 1995. Perfect equilibria in a negotiation model. Econometrica 63, 545–565.
Chen, K.-P., 1996. Compensation principle in repeated games. Games Econ. Behav. 16, 1–21.
Farrell, J., Maskin, E., 1989. Renegotiation in repeated games. Games Econ. Behav. 1, 327–360.
Fernandez, R., Glazer, J., 1991. Striking for a bargain between two completely informed agents. Amer. Econ. Rev. 81, 240–252.
Fudenberg, D., Maskin, E., 1986. The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54, 533–554.
Fudenberg, D., Maskin, E., 1991. On the dispensability of public randomization in discounted repeated games. J. Econ. Theory 53, 428–438.

12 Details are available from the authors upon request.


Green, E.J., Porter, R.H., 1984. Noncooperative collusion under imperfect price information. Econometrica 52, 87–100.
Haller, H., Holden, S., 1990. A letter to the editor on wage bargaining. J. Econ. Theory 52, 232–236.
Houba, H., 1997. The policy bargaining model. J. Math. Econ. 28, 1–27.
Mailath, G., Samuelson, L., 2006. Repeated Games and Reputations. Oxford University Press, New York.
van Damme, E., 1989. Renegotiation-proof equilibria in repeated prisoners' dilemma. J. Econ. Theory 47, 206–217.
Wen, Q., 1996. On renegotiation-proof equilibria in finitely repeated games. Games Econ. Behav. 13, 286–300.