nash equilibria of finitely repeated games


International Journal of Game Theory, Vol. 16, Issue 3, page 197 - 204

Nash Equilibria of Finitely Repeated Games

By J.-P. Benoit 1 and V. Krishna 2

Abstract: Under weak conditions, any feasible and individually rational payoff vector of a one-shot game can be approximated by the average payoff in a Nash equilibrium of a finitely repeated game with a long enough horizon.

At least since the extensive discussion by Luce and Raiffa (1957), it has been well known that if the classic prisoners' dilemma (Example 1 below) is played a finite number of times, the unique Nash equilibrium outcome path of the repeated game involves playing the unique Nash equilibrium of the one-shot game in every period. This fact, when contrasted with the (Nash) folk theorem for infinitely repeated games, indicates a severe discontinuity in the set of equilibrium payoffs as the number of repetitions goes to infinity.

                Cheat       Cooperate
Cheat           0, 0        3, -1
Cooperate      -1, 3        2, 2

Example 1

1 Jean-Pierre Benoit, Graduate School of Business, Columbia University, New York, NY 10027, USA.

2 Vijay Krishna, Graduate School of Business Administration, Harvard University, Boston, MA 02163, USA.

We would like to thank two anonymous referees for their comments.

We show below that this discontinuity results because of a very special feature of the prisoners' dilemma. In general, finitely repeated games may have Nash equilibria which do not consist of one-shot equilibria strung together. While this has been recognized before (see, for instance, Van Damme 1983), neither the reasons lying behind this fact nor the limiting properties of Nash equilibrium payoffs of finitely repeated games seem to have been systematically explored. To be able to state our results precisely, we require some notation.

A (one-shot) game, G, in normal form consists of a set of n players, the action sets of the players and their payoff functions. Thus, we define $G = (A_1, A_2, \ldots, A_n; U_1, U_2, \ldots, U_n)$ where $A_i$ is player i's action set and $U_i : A \to \mathbb{R}$ is i's payoff function, where $A = A_1 \times A_2 \times \cdots \times A_n$. We will assume that the action sets, $A_i$, are finite. An $a \in A$ may also be referred to as an outcome of G. Players can randomize over their actions in the usual way by choosing an element $p_i$ of $\Delta(A_i)$, the set of probability distributions over $A_i$. For $p \in \Delta(A_1) \times \Delta(A_2) \times \cdots \times \Delta(A_n)$, let $p_{-i}$ denote the randomized actions of the (n - 1) players excluding i and let $b_i(p) \in A_i$ denote player i's best response to p. Let $v_i$ denote player i's minmax payoff. A payoff vector u is feasible if it is in the convex hull of the range of U; it is individually rational if $u_i \ge v_i$ for all i.

G(T) denotes the game that results when G is successively played T times (T is a positive integer). For $t = 1, 2, \ldots, T$, if $a^t \in A$ denotes the outcome of the game G(T) at time t, player i's payoff in G(T) is given by $(1/T) \sum_{t=1}^{T} U_i(a^t)$. A player's (randomized) strategy in G(T) is a T-tuple $\sigma_i = (\sigma_i^1, \sigma_i^2, \ldots, \sigma_i^T)$ where $\sigma_i^1 \in \Delta(A_i)$ and for $t > 1$, $\sigma_i^t : A^{t-1} \to \Delta(A_i)$. Given a strategy combination $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ in G(T) and $(a^1, a^2, \ldots, a^t) \in A^t$ ($t \le T$), we say that $(a^1, a^2, \ldots, a^t)$ is a σ-possible history if $a^1 \in \operatorname{supp} \sigma^1$, the support of $\sigma^1$, and for all $s \le t$, $a^s \in \operatorname{supp} \sigma^s(a^1, a^2, \ldots, a^{s-1})$.

Nash equilibria of G and G(T) are defined in the usual way.

Now, while it is true that any Nash equilibrium of a finitely repeated game must involve playing an equilibrium of the one-shot game in the final period of play, we will show later that this restriction does not, in general, impose severe constraints on earlier periods of play. In fact, the prisoners' dilemma has a very special feature: all players receive their minmax payoffs in every equilibrium. Given this, the players must not only be at a one-period best response in the last period of play, but also in the next to last period, since even if a deviation is detected they cannot be threatened with any retaliation in the last period. The reasoning continues all the way back to the first period. We wish to emphasize that it is the fact that equilibrium payoffs and minmax payoffs are identical which leads to this result. Neither the uniqueness of the equilibrium in the prisoners' dilemma nor its dominant strategy nature is crucial. A complete formal argument is given in our first proposition, which generalizes a result by Sorin (1983).
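The special feature described above can be checked mechanically for Example 1. The following is a minimal sketch, not part of the original paper: it enumerates pure-strategy Nash equilibria and computes minmax payoffs with the punisher restricted to pure actions (an assumption that is harmless here, since Cheat is a dominant action). The names `PAYOFFS`, `payoff`, `pure_nash_equilibria` and `pure_minmax` are illustrative.

```python
from itertools import product

# Example 1: (row action, column action) -> (row payoff, column payoff)
ACTIONS = ["Cheat", "Cooperate"]
PAYOFFS = {
    ("Cheat", "Cheat"): (0, 0),
    ("Cheat", "Cooperate"): (3, -1),
    ("Cooperate", "Cheat"): (-1, 3),
    ("Cooperate", "Cooperate"): (2, 2),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = row, 1 = column) from `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def pure_nash_equilibria():
    """All pure profiles in which each action is a best response to the other."""
    result = []
    for row, col in product(ACTIONS, repeat=2):
        row_ok = all(payoff(0, row, col) >= payoff(0, d, col) for d in ACTIONS)
        col_ok = all(payoff(1, col, row) >= payoff(1, d, row) for d in ACTIONS)
        if row_ok and col_ok:
            result.append((row, col))
    return result


def pure_minmax(player):
    """Minmax payoff when the punishing opponent uses pure actions only."""
    return min(
        max(payoff(player, own, other) for own in ACTIONS) for other in ACTIONS
    )


equilibria = pure_nash_equilibria()
minmax = (pure_minmax(0), pure_minmax(1))
equilibrium_payoffs = [PAYOFFS[e] for e in equilibria]
print(equilibria, equilibrium_payoffs, minmax)
```

In this sketch the only pure equilibrium is (Cheat, Cheat), and its payoff vector coincides with the minmax vector, which is the hypothesis U(e) = v of the proposition that follows.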

Proposition 0: Suppose that every Nash equilibrium e of G is such that U(e) = v. Suppose σ is a Nash equilibrium of G(T). Then for all $t \le T$ and all $(a^1, a^2, \ldots, a^{t-1})$ which are σ-possible histories, $\sigma^t(a^1, a^2, \ldots, a^{t-1})$ is a Nash equilibrium of G.


Proof: The proof is by induction on the number of stages. The result is trivial for G(1).

Suppose it is true for G(T - 1) and suppose that σ is a Nash equilibrium of G(T). Given $a \in A$, define $\sigma|a$ to be the strategies induced on the subgame G(T - 1) following a play of a in the first period. Formally, $\sigma^1|a = \sigma^2(a)$ and for $t > 1$, $\sigma^t|a\,(a^1, a^2, \ldots, a^{t-1}) = \sigma^{t+1}(a, a^1, a^2, \ldots, a^{t-1})$. First notice that if σ is a Nash equilibrium of G(T) and $a \in \operatorname{supp} \sigma^1$, then $\sigma|a$ is a Nash equilibrium of G(T - 1). Next observe that $(a^1, a^2, \ldots, a^t)$ is a σ-possible history if and only if $a^1 \in \operatorname{supp} \sigma^1$ and $(a^2, a^3, \ldots, a^t)$ is a $\sigma|a^1$-possible history.

Suppose $(a^1, a^2, \ldots, a^t)$ is a σ-possible history. By the induction hypothesis, for $t = 1, 2, \ldots, T - 1$, $\sigma^t|a^1\,(a^2, a^3, \ldots, a^t)$ is a Nash equilibrium of G, and hence for $t = 2, 3, \ldots, T$, $\sigma^t(a^1, a^2, \ldots, a^{t-1})$ is a Nash equilibrium of G. It remains to show that $\sigma^1$ is also a Nash equilibrium of G.

We argue by contradiction. If $\sigma^1$ is not a Nash equilibrium of G then there is a player i who can gain by playing a best response to $\sigma^1_{-i}$. Consider the strategy $\hat{\sigma}_i$ in G(T) defined as follows: play a best response to $\sigma^1_{-i}$ in period 1; after any history $(a^1, a^2, \ldots, a^{t-1})$, play a best response to $\sigma^t_{-i}(a^1, a^2, \ldots, a^{t-1})$ in period t. $\hat{\sigma}_i$ yields i a higher payoff in G(T) than $\sigma_i$ since player i does better in period 1 and at least as well in later periods. (Recall that i's payoff if the strategy combination σ is played is his first period payoff plus $(T - 1) v_i$.) This contradicts the fact that σ is a Nash equilibrium of G(T). □

In a game in which each player receives more than his individually rational payoff level at some Nash equilibrium, the reasoning given above is no longer valid. In periods of play before the final one, the players can be threatened with future retaliation and so they need not be at one-shot best responses. In fact, in all but the last few periods of play, any individually rational, feasible outcome is possible. As the horizon of the repeated game increases arbitrarily, all individually rational and feasible payoffs result (in the limit) from some Nash equilibrium of the repeated game. Thus, for these games, there is no discontinuity in the set of equilibrium payoffs as the number of repetitions gets larger. We make this precise in Theorem 1 below.

The basic idea is very simple. First, suppose that we wish to approximate the payoff from an individually rational outcome, a, and that the same Nash equilibrium, e, yields payoffs which are strictly higher than minmax levels for all players. By playing this equilibrium enough times towards the end of a path, every player could be induced to play "a" once at the beginning, if threatened with his minmax payoff for the remaining periods. Thus, (a, e, e, ..., e) could be supported as a Nash equilibrium outcome path. But if "a" is individually rational, it could be played as many times as we wish at the beginning for the same number of periods of "e". As the number of times "a" is played can be increased without bound, the resulting average payoff approximates the payoff from "a".

The formal proof builds on this idea, but is complicated by the fact that we are interested in approximating any feasible and individually rational vector of payoffs. This may not result from any single outcome and may have to be approximated itself. Finally, "good" Nash equilibria may be player specific. We now supply these details.


Theorem 1: Suppose that for each player j, there exists a Nash equilibrium $e^j$ of G such that $U_j(e^j) > v_j$. Let $u \in \mathbb{R}^n$ be any individually rational and feasible payoff vector for G. Then for all $\varepsilon > 0$, there exists a $T_0$ such that for all $T \ge T_0$ there is a Nash equilibrium σ of G(T) whose payoff vector U(σ) satisfies $\|U(\sigma) - u\| < \varepsilon$.

Proof: Since u is feasible, there exists a vector of weights $(p^1, p^2, \ldots, p^H)$ and a vector of outcomes $(c^1, c^2, \ldots, c^H)$ such that for all h, $p^h > 0$, $\sum_h p^h = 1$ and $\sum_h p^h U(c^h) = u$.

Let $Q^1, Q^2, \ldots, Q^H$ be positive integers such that for all h, k, $Q^h / Q^k = p^h / p^k$. (This assumes that $p^h / p^k$ is a rational number. Otherwise it can be approximated arbitrarily closely.)

(i) We claim that there exist integers $R^1, R^2, \ldots, R^n$ such that the path

$$(c^1, \ldots, c^1, c^2, \ldots, c^2, \ldots, c^H, \ldots, c^H, e^1, \ldots, e^1, e^2, \ldots, e^2, \ldots, e^n, \ldots, e^n) \tag{*}$$

where $c^h$ is played $Q^h$ times and $e^j$ is played $R^j$ times, is a Nash equilibrium outcome path of G(Q + R) for $Q = \sum_h Q^h$ and $R = \sum_j R^j$.

This outcome path is supported by strategies which call for any "deviator" to be punished by receiving his minmax payoff in every period of the remaining game.

We need to ensure that all players are at a best response. Consider player i in period t. If $t > Q$, i is clearly at a best response since only single stage Nash equilibria are played from period Q + 1 onwards. Therefore, suppose that $t \le Q$. Let $a^s$ be the outcome corresponding to period s prescribed by the path in (*). We want that:

$$U_i(b_i(a^t), a^t_{-i}) + (Q + R - t)\, v_i \le \sum_{s=t}^{Q} U_i(a^s) + \sum_{j=1}^{n} R^j U_i(e^j). \tag{1}$$

For (1) to hold it is sufficient that

$$U_i(b_i(a^t), a^t_{-i}) - U_i(a^t) + \sum_{s=t+1}^{Q} \left[ v_i - U_i(a^s) \right] \le R^i \left[ U_i(e^i) - v_i \right]. \tag{2}$$

To see this, rearrange (1) so that its right-hand side reads $\sum_j R^j [U_i(e^j) - v_i]$, and note that $U_i(e^j) \ge v_i$ for every j, since no Nash equilibrium of G gives a player less than his minmax payoff; the terms with $j \ne i$ can therefore be dropped.

By assumption, $U_i(e^i) > v_i$. Therefore, for all t, there exists $R^i$ such that this inequality is satisfied. By picking the largest such $R^i$ and repeating the procedure for $i = 1, 2, \ldots, n$, (i) is established.

With $R^i$ established as above, let k be any positive integer. Consider the outcome path (**) generated by repeating the first Q periods of the path in (*) k times and then playing R periods of Nash equilibria as before.
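Condition (2) can be turned into a small calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `required_R` and its argument layout are hypothetical, payoffs are supplied as plain numbers, and the result is the smallest nonnegative integer satisfying the sufficient condition (2) for every $t \le Q$ (a smaller number of equilibrium periods may still support the path, since (2) is not a necessary condition).

```python
import math
from fractions import Fraction


def required_R(path_payoffs, best_dev_payoffs, v_i, eq_payoff):
    """Smallest nonnegative integer R^i satisfying (2) for all t = 1..Q.

    path_payoffs[t-1]     : U_i(a^t), player i's payoff on the prescribed path
    best_dev_payoffs[t-1] : U_i(b_i(a^t), a^t_{-i}), the best one-period deviation
    v_i                   : player i's minmax payoff
    eq_payoff             : U_i(e^i), which must exceed v_i
    """
    if eq_payoff <= v_i:
        raise ValueError("condition (2) needs U_i(e^i) > v_i")
    Q = len(path_payoffs)
    worst = Fraction(0)  # largest left-hand side of (2) over t, floored at zero
    for t in range(1, Q + 1):
        gain = best_dev_payoffs[t - 1] - path_payoffs[t - 1]
        future_loss = sum(v_i - path_payoffs[s - 1] for s in range(t + 1, Q + 1))
        worst = max(worst, Fraction(gain + future_loss))
    return math.ceil(worst / Fraction(eq_payoff - v_i))


# Illustration with hypothetical numbers: a path paying 2 per period for three
# periods, a best deviation worth 3, minmax payoff -1, and equilibrium payoff 0.
r = required_R([2, 2, 2], [3, 3, 3], v_i=-1, eq_payoff=0)
print(r)
```

With these illustrative numbers the binding period is the last one before the equilibrium phase, where the one-period gain from deviating is 1 and a single equilibrium period is enough to offset it.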


(ii) We claim that (**) can also be supported as a Nash equilibrium by punishing any deviator with his minmax level for the remainder of the game. To verify this, we proceed by induction. (i) establishes its truth for k = 1. Suppose it can be supported for k = K - 1. Consider k = K. By hypothesis, i is at a best response for $t > Q$. Therefore, take $t \le Q$. We need to verify that:

$$U_i(b_i(a^t), a^t_{-i}) + (KQ + R - t)\, v_i \le U_i(a^t) + \sum_{s=t+1}^{KQ} U_i(a^s) + \sum_{j=1}^{n} R^j U_i(e^j). \tag{3}$$

Observe that from the definition of the $Q^h$'s,

$$\sum_{s=Q+1}^{KQ} U_i(a^s) = (K - 1)\, Q\, u_i \ge (K - 1)\, Q\, v_i, \tag{4}$$

where the last inequality follows from the fact that u is an individually rational payoff vector. Using (4), we have that a sufficient condition for (3) to hold is that:

$$U_i(b_i(a^t), a^t_{-i}) + (Q + R - t)\, v_i \le U_i(a^t) + \sum_{s=t+1}^{Q} U_i(a^s) + \sum_{j} R^j U_i(e^j). \tag{5}$$

But the last condition is equivalent to (1), which we verified as true in step (i). Thus, we have verified (ii).

Now the average payoff to player i in the equilibrium described in (**) is

$$\left[ kQ\, u_i + \sum_{j} R^j U_i(e^j) \right] \Big/ \left[ kQ + R \right].$$

In the limit as $k \to \infty$, this expression approximates $u_i$.

Finally, if T = kQ + R for some k, let (**) be the corresponding path. For T satisfying kQ + R < T < (k + 1)Q + R, add arbitrary (T - kQ - R) Nash equilibria to (**) at the end to obtain the corresponding path. The average payoff from these paths is the same as the average payoff from (**) in the limit. □
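The limiting step at the end of the proof can be illustrated numerically. The sketch below assumes hypothetical values (a target payoff of 2, a block length Q = 1, and two equilibrium phases of one period each paying 0); the function name `average_payoff` is illustrative. It evaluates the average payoff expression for the path (**) exactly, using rational arithmetic.

```python
from fractions import Fraction


def average_payoff(k, Q, u_i, R_list, eq_payoffs):
    """[k*Q*u_i + sum_j R^j * U_i(e^j)] / [k*Q + R], with R = sum_j R^j."""
    R = sum(R_list)
    tail = sum(r * e for r, e in zip(R_list, eq_payoffs))
    return Fraction(k * Q * u_i + tail) / (k * Q + R)


# Hypothetical inputs: u_i = 2, Q = 1, R^1 = R^2 = 1, U_i(e^1) = U_i(e^2) = 0.
averages = {k: average_payoff(k, 1, 2, [1, 1], [0, 0]) for k in (1, 10, 1000)}
gaps = {k: 2 - value for k, value in averages.items()}
for k in averages:
    print(k, averages[k], gaps[k])
```

For these inputs the gap between the target and the average is 4/(k + 2), so it shrinks at rate 1/k: the fixed R-period tail of one-shot equilibria is diluted as the cooperative block is repeated more often.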

Note that the conditions of the Theorem are easily relaxed. For two player games a sufficient condition for obtaining, in the limit, all feasible, individually rational points is that at least one player obtain more than his minmax payoff at some equilibrium. This condition is also necessary provided that the feasible region does indeed contain points strictly better than the minmax point. For three or more players, analogous conditions can be given but they are somewhat burdensome and unintuitive. As the Theorem already applies to a broad class of games, we state only this simpler result.

While the theorem has been stated in terms of an increasing time horizon, it affords another interpretation. Let the time horizon be fixed at one period and consider all payoffs to be flow payoffs. Now increase the speed with which players can react. Thus, for instance, a five-stage game is one in which players make five choices and receive a one fifth payoff at each stage. Then our theorem indicates that as the reaction speed increases, that is, as the number of stages increases, all individually rational payoffs become feasible (provided, of course, the assumptions of the theorem are met).

The power of the theorem can be seen when it is applied to a modified prisoners' dilemma. Example 2 is obtained by simply adding a strictly dominated strategy for each player to Example 1. This reduces the minmax payoff levels for both players.

                Punish      Cheat       Cooperate
Punish         -2, -2      -2, -1      -2, -2
Cheat          -1, -2       0, 0        3, -1
Cooperate      -2, -2      -1, 3        2, 2

Example 2

As a one-shot affair, this game retains all of the interesting features of the classic prisoners' dilemma, and is just as "disturbing". However, when repeated, cooperation becomes possible in all but the final period of play, and the average payoffs approach the cooperative payoff. These cooperative outcomes are sustained by having both players cooperate from the first stage to the second to last stage, and having them play the one-shot equilibrium in the last stage, provided no cheating or punishing occurs prior to this. If either player cheats or punishes before the last stage, both players punish for the remainder of the game. It is easily verified that these strategies form a Nash equilibrium of the finitely repeated game.
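The verification mentioned above can be sketched in a few lines. This is an illustrative check rather than a proof from the paper: it uses total (unaveraged) payoffs, exploits the symmetry of Example 2 to examine one player only, and relies on the observation that against these strategies a player's best deviation is fully described by the period t of his first deviation, after which the opponent plays Punish in every remaining period. The names `u`, `path_total`, `best_deviation_total` and `is_nash` are hypothetical.

```python
ACTIONS = ["Punish", "Cheat", "Cooperate"]

# Row player's payoffs in Example 2, keyed by (own action, opponent's action).
# The game is symmetric, so the same table serves the column player.
ROW_PAYOFF = {
    ("Punish", "Punish"): -2, ("Punish", "Cheat"): -2, ("Punish", "Cooperate"): -2,
    ("Cheat", "Punish"): -1, ("Cheat", "Cheat"): 0, ("Cheat", "Cooperate"): 3,
    ("Cooperate", "Punish"): -2, ("Cooperate", "Cheat"): -1, ("Cooperate", "Cooperate"): 2,
}


def u(own, other):
    return ROW_PAYOFF[(own, other)]


def path_total(T):
    """Total payoff from cooperating in periods 1..T-1 and cheating in period T."""
    return (T - 1) * u("Cooperate", "Cooperate") + u("Cheat", "Cheat")


def best_deviation_total(T, t):
    """Best total payoff for a player whose first deviation occurs in period t."""
    prescribed = "Cooperate" if t < T else "Cheat"
    before = (t - 1) * u("Cooperate", "Cooperate")
    # Best one-period deviation against the opponent's prescribed action.
    now = max(u(d, prescribed) for d in ACTIONS if d != prescribed)
    # Afterwards the opponent punishes; the deviator best-responds to Punish.
    punished = max(u(d, "Punish") for d in ACTIONS)
    return before + now + (T - t) * punished


def is_nash(T):
    """True if no first-deviation period yields more than the prescribed path."""
    return all(best_deviation_total(T, t) <= path_total(T) for t in range(1, T + 1))


results = {T: (is_nash(T), path_total(T) / T) for T in (1, 2, 5, 100)}
print(results)
```

In this sketch the tightest constraint is a deviation in the second to last period, where cheating gains 1 immediately and loses exactly 1 in the final period, so the player is indifferent and the strategies remain a (weak) best response.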

Now, consider the classic Cournot duopoly with linear demand. It is usually argued that this situation is essentially a prisoners' dilemma, in that the firms would like to agree on maximizing joint profits but cannot, since each one has the incentive to expand production away from the cartel level. While there is some truth in this proposition, the situation is, in fact, closer to the modified prisoners' dilemma of Example 2. The unique Nash equilibrium yields payoffs which are greater, in general, than individually rational levels. The firms can still threaten each other with production quantities which are greater than the one-shot equilibrium quantities. Thus, collusion will be possible in a Nash equilibrium even in finite horizon settings.


Although we have been assuming no discounting, it is relatively easy to see that a folk theorem also exists if players do not discount the future too strongly. Let $G_\delta(T)$ be the T-period repetition of G where the players' evaluation criterion is the "discounted average" of their payoffs ($0 < \delta < 1$ is the discount factor). Thus, for an outcome path $(a^1, a^2, \ldots, a^T)$ of $G_\delta(T)$ the corresponding payoff to player i is

$$\frac{1 - \delta}{1 - \delta^T} \sum_{t=1}^{T} \delta^{t-1} U_i(a^t).$$

The following theorem follows easily along the lines of Theorem 1 and hence we state it without a formal proof.

Theorem 2: Suppose that for each player j, there exists a Nash equilibrium $e^j$ of G such that $U_j(e^j) > v_j$. Let $u \in \mathbb{R}^n$ be any individually rational and feasible payoff vector in G. Then for all $\varepsilon > 0$ there exists a $\delta_0$ and a $T_0$ such that for all $\delta \ge \delta_0$ and all $T \ge T_0$ there exists a Nash equilibrium σ of $G_\delta(T)$ whose payoff vector $U_\delta(\sigma)$ satisfies $\|U_\delta(\sigma) - u\| < \varepsilon$.

Thus, as the discount factor increases, there is no discontinuity in the set of equilibrium payoffs as the game horizon goes to infinity. The same is not true, however, for a fixed discount factor. In Example 3 there exist payoffs of the infinitely repeated game which cannot be approximated in a finitely repeated game, no matter how great the number of repetitions.

      1, 1        0, 0        4, -10
      0, 0        0, 0        0, 0
    -10, 4        0, 0        2, 2

Example 3

Players discount using the factor $\delta = 1/2$. Each player's highest equilibrium payoff is 1 and minmax payoff is 0.

Now any equilibrium path must end with a series of one-shot equilibria. Consider the first attempt to incorporate a play of (2, 2) with probability 0.9 or more in an equilibrium. For at least one player, the payoff from following the path will be bounded above by 1.8, while deviating will yield more. Therefore, (2, 2) cannot be approached as an equilibrium payoff vector.

On the other hand, playing (2, 2) forever with the threat of going to (0, 0) is an equilibrium path of the infinitely repeated game.
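The contrast between the finite and infinite horizon in Example 3 can be illustrated with a simplified, pure-strategy version of the argument. The sketch below is an assumption-laden illustration, not the paper's own calculation (which allows (2, 2) to be played with probability 0.9 or more): it looks at the last period in which (2, 2) is played for sure, assumes the most favourable continuation (the best one-shot equilibrium payoff of 1 in every later period), and compares present values at $\delta = 1/2$. All names are hypothetical.

```python
from fractions import Fraction

delta = Fraction(1, 2)
# Payoffs read off Example 3: cooperation, best deviation against it,
# best one-shot equilibrium payoff, and minmax payoff.
COOPERATE, DEVIATE, BEST_EQ, MINMAX = 2, 4, 1, 0


def follow_upper_bound(n_remaining):
    """Present value of 2 now, then at most 1 in each of n_remaining periods."""
    return COOPERATE + sum(delta ** s * BEST_EQ for s in range(1, n_remaining + 1))


def deviate_lower_bound(n_remaining):
    """Present value of 4 now, then at least the minmax payoff afterwards."""
    return DEVIATE + sum(delta ** s * MINMAX for s in range(1, n_remaining + 1))


# Finite horizon: is following ever at least as good as deviating?
finite_ok = any(
    follow_upper_bound(n) >= deviate_lower_bound(n) for n in range(0, 200)
)

# Infinite horizon, per-period normalized: (2, 2) forever versus 4 once and
# (0, 0) in every later period.
infinite_follow = (1 - delta) * COOPERATE / (1 - delta)
infinite_deviate = (1 - delta) * DEVIATE

print(finite_ok, infinite_follow, infinite_deviate)
```

Under these assumptions the follow value stays strictly below 3 for every finite number of remaining periods while deviating is worth 4, so the last cooperative period cannot be sustained; with an infinite horizon the two normalized values are both 2, so following is a (weak) best response.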

We have shown that, in general, repeating a game a (large) finite number of times greatly increases the set of Nash equilibrium payoffs. However, the strategies we constructed do not form subgame perfect equilibria. In a separate paper (Benoit and Krishna 1985) we have addressed these issues and shown that, provided each player has at least two equilibrium payoffs in G and the feasible set is full dimensional, similar results can be obtained for subgame perfect equilibria. (Friedman (1985) examines the set of equilibria which can be sustained using trigger strategies.)

References

Benoit J-P, Krishna V (1985) Finitely repeated games. Econometrica 53:905-922

Friedman J (1985) Cooperative equilibria in finite horizon non-cooperative supergames. Journal of Economic Theory 35:390-398

Luce R, Raiffa H (1957) Games and decisions. Wiley, New York

Sorin S (1983) On repeated games with complete information. Manuscript, Mathematics of Operations Research (forthcoming)

Van Damme E (1983) Refinements of the Nash equilibrium concept. Springer, Berlin

Received June 1985 Revised version November 1985