# Theory of Repeated Games

Posted on 17-Feb-2017




Lecture Notes on Central Results

Yosuke YASUDA

Osaka University, Department of Economics

yasuda@econ.osaka-u.ac.jp

Last updated: May 21, 2015


Announcement

Course Website: You can find my course website at the link below: https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15

Textbook & Survey: MS is a comprehensive textbook on repeated games; K and P are highly readable survey articles that complement MS.

MS: Mailath and Samuelson, *Repeated Games and Reputations: Long-Run Relationships*, 2006.

K: Kandori, 2008.

P: Pearce, 1992.

Symbols used in the lectures: Ex: Example, Fg: Figure, Q: Question, Rm: Remark.

Finitely Repeated Games (1)

A repeated game, a specific class of dynamic game, is a suitable framework for studying the interaction between immediate gains and long-term incentives, and for understanding how a reputation mechanism can support cooperation.

Let $G = \{A_1, \dots, A_n; u_1, \dots, u_n\}$ denote a static game in which players 1 through $n$ simultaneously choose actions $a_1$ through $a_n$ from the action spaces $A_1$ through $A_n$, and the corresponding payoffs are $u_1(a_1, \dots, a_n)$ through $u_n(a_1, \dots, a_n)$.

Definition 1

The game $G$ is called the stage game of the repeated game. Given a stage game $G$, let $G(T)$ denote the finitely repeated game in which $G$ is played $T$ times, with the outcomes of all preceding plays observed before the next play begins.

Assume that the payoff for $G(T)$ is simply the sum of the payoffs from the $T$ stage games (future payoffs are not discounted).


Finitely Repeated Games (2)

Theorem 2

If the stage game $G$ has a unique Nash equilibrium, then, for any finite $T$, the repeated game $G(T)$ has a unique subgame perfect Nash equilibrium: the Nash equilibrium of $G$ is played in every stage, irrespective of the past history of play.

Proof.

We can solve the game by backward induction, that is, starting from the smallest subgame and working backward through the game.

In stage $T$, players play the unique Nash equilibrium of $G$.

Given that, players in stage $T-1$ again end up playing the same Nash equilibrium: whatever they play in stage $T-1$, the outcome of the last stage is unchanged, so their choice in stage $T-1$ affects only their stage-$(T-1)$ payoff.

This argument carries over backward through stage 1, which shows that the unique Nash equilibrium outcome is played in every stage (irrespective of the past history).
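A minimal sketch of the backward-induction argument, assuming (for illustration only) the prisoners' dilemma payoffs used later in these notes and checking pure action profiles only; this stage game has a strictly dominant action, so its unique equilibrium is pure. The names `pure_nash` and `backward_induction` are hypothetical. The key step is that the continuation payoff from later stages is the same after every history, so it just adds a constant to every cell of the current stage game.

```python
# Prisoners' dilemma used later in these notes: PAYOFFS[(a1, a2)] = (u1, u2).
ROWS, COLS = ["C1", "D1"], ["C2", "D2"]
PAYOFFS = {("C1", "C2"): (2, 2), ("C1", "D2"): (-1, 3),
           ("D1", "C2"): (3, -1), ("D1", "D2"): (0, 0)}


def pure_nash(payoffs):
    """All pure-strategy Nash equilibria of a two-player matrix game."""
    return [(r, c) for r in ROWS for c in COLS
            if all(payoffs[(r2, c)][0] <= payoffs[(r, c)][0] for r2 in ROWS)
            and all(payoffs[(r, c2)][1] <= payoffs[(r, c)][1] for c2 in COLS)]


def backward_induction(T):
    """Solve G(T) from stage T back to stage 1. The continuation payoff is
    history-independent, so stage t is the stage game plus a constant,
    which leaves its equilibrium unchanged."""
    continuation = (0, 0)
    path = []
    for _ in range(T):  # stages T, T-1, ..., 1
        shifted = {a: (u[0] + continuation[0], u[1] + continuation[1])
                   for a, u in PAYOFFS.items()}
        (ne,) = pure_nash(shifted)  # ValueError unless exactly one pure NE
        path.append(ne)
        continuation = shifted[ne]
    return path[::-1], continuation


print(backward_induction(3))  # ([('D1', 'D2'), ('D1', 'D2'), ('D1', 'D2')], (0, 0))
```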


Finitely Repeated Games (3)

When the stage game has more than one Nash equilibrium, multiple subgame perfect Nash equilibria may exist.

Furthermore, an action profile that is not a Nash equilibrium of the stage game may be sustained (in periods $t < T$) in a subgame perfect Nash equilibrium.

Q The following stage game will be played twice. Can players support the non-equilibrium outcome (M1, M2) in the first period?

| 1 \ 2 | L2 | M2 | R2 |
|---|---|---|---|
| L1 | 1, 1 | 5, 0 | 0, 0 |
| M1 | 0, 5 | 4, 4 | 0, 0 |
| R1 | 0, 0 | 0, 0 | 3, 3 |

Rm Note that the stage game has two pure-strategy Nash equilibria, (L1, L2) and (R1, R2), so what players choose in the first period can determine which equilibrium is played in the second period.
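One way to answer Q is to check a specific candidate strategy profile; the profile below is our own illustrative choice, not necessarily the one intended in the lecture: play (M1, M2) in period 1, then play (R1, R2) in period 2 if (M1, M2) was observed and (L1, L2) otherwise. Because period-2 play is a stage Nash equilibrium after every history, only one-shot deviations in period 1 need checking. The function names are hypothetical.

```python
# Stage game from the table above: PAYOFFS[(a1, a2)] = (u1, u2).
ROWS, COLS = ["L1", "M1", "R1"], ["L2", "M2", "R2"]
PAYOFFS = {
    ("L1", "L2"): (1, 1), ("L1", "M2"): (5, 0), ("L1", "R2"): (0, 0),
    ("M1", "L2"): (0, 5), ("M1", "M2"): (4, 4), ("M1", "R2"): (0, 0),
    ("R1", "L2"): (0, 0), ("R1", "M2"): (0, 0), ("R1", "R2"): (3, 3),
}


def pure_nash_equilibria():
    """All pure-strategy Nash equilibria of the stage game."""
    return [(r, c) for r in ROWS for c in COLS
            if all(PAYOFFS[(r2, c)][0] <= PAYOFFS[(r, c)][0] for r2 in ROWS)
            and all(PAYOFFS[(r, c2)][1] <= PAYOFFS[(r, c)][1] for c2 in COLS)]


def sustainable_in_period_one(target, reward, punish):
    """Play `target` in period 1; in period 2 play `reward` if `target` was
    observed, else `punish`. Assumes `reward` and `punish` are stage NE, so
    only period-1 deviations matter. Payoffs are summed (no discounting)."""
    r, c = target
    conform = [PAYOFFS[target][i] + PAYOFFS[reward][i] for i in range(2)]
    best_dev1 = max(PAYOFFS[(r2, c)][0] for r2 in ROWS if r2 != r)
    best_dev2 = max(PAYOFFS[(r, c2)][1] for c2 in COLS if c2 != c)
    return (conform[0] >= best_dev1 + PAYOFFS[punish][0]
            and conform[1] >= best_dev2 + PAYOFFS[punish][1])


print(pure_nash_equilibria())  # [('L1', 'L2'), ('R1', 'R2')]
print(sustainable_in_period_one(("M1", "M2"), ("R1", "R2"), ("L1", "L2")))  # True
```

Conforming gives each player 4 + 3 = 7, while the best deviation (to L1 or L2) gives 5 + 1 = 6: the one-period gain of 1 is outweighed by the loss of 2 from switching the second-period equilibrium.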


Infinitely Repeated Games (1)

Even if the stage game has a unique Nash equilibrium, there may be subgame perfect outcomes of the infinitely repeated game in which no stage-game outcome is a Nash equilibrium of $G$.

Let $G(\infty, \delta)$ denote the infinitely repeated game in which $G$ is repeated forever and the players share the discount factor $\delta$.

For each $t$, the outcomes of the $t-1$ preceding plays of the stage game are observed before the $t$-th stage begins.

Each player's payoff in $G(\infty, \delta)$ is the average payoff, defined as follows.

Definition 3

Given the discount factor $\delta$, the average payoff of the infinite sequence of payoffs $u_1, u_2, \dots$ is

$$(1-\delta)\left(u_1 + \delta u_2 + \delta^2 u_3 + \cdots\right) = (1-\delta)\sum_{t=1}^{\infty} \delta^{t-1} u_t.$$
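A small numerical sketch of Definition 3, assuming an infinite stream is approximated by a long finite prefix (the omitted tail changes the value by at most $\max_t |u_t| \cdot \delta^T$ after $T$ periods). The factor $1-\delta$ puts the average payoff on the same scale as a single stage payoff. The function name is hypothetical.

```python
def average_payoff(stream, delta):
    """(1 - delta) * sum_{t>=1} delta**(t-1) * u_t over a finite prefix of a
    payoff stream; the omitted tail is treated as zero."""
    total, weight = 0.0, 1.0
    for u in stream:
        total += weight * u
        weight *= delta
    return (1 - delta) * total


print(average_payoff([2] * 1000, 0.9))       # very close to 2: a constant stream averages to itself
print(average_payoff([3] + [0] * 999, 0.5))  # 1.5: a one-period gain of 3 is worth (1 - delta) * 3
```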


Infinitely Repeated Games (2)

There are a few important remarks:

The history of play through stage $t$ is the record of the players' choices in stages 1 through $t$.

The players might have chosen $(a_1^s, \dots, a_n^s)$ in stage $s$, where for each player $i$ the action $a_i^s$ belongs to $A_i$.

In the finitely repeated game $G(T)$ or the infinitely repeated game $G(\infty, \delta)$, a player's strategy specifies the action that she will take in each stage, for every possible history of play.

In the infinitely repeated game $G(\infty, \delta)$, each subgame beginning at any stage is identical to the original game.

In $G(T)$, a subgame beginning at stage $t+1$ is the repeated game in which $G$ is played $T-t$ times, denoted by $G(T-t)$.

In a repeated game, a Nash equilibrium is subgame perfect if the players' strategies constitute a Nash equilibrium in every subgame, i.e., after every possible history of play.


Unimprovability (1)

Definition 4

A strategy $\sigma_i$ is called a perfect best response to the other players' strategies if player $i$ has no incentive to deviate following any history.

Consider the following requirement, which at first glance looks much weaker than the perfect best response condition.

Definition 5

A strategy for $i$ is unimprovable against a vector of strategies of her opponents if there is no $(t-1)$-period history (for any $t$) such that $i$ could profit by deviating from her strategy in period $t$ only and conforming thereafter (i.e., switching back to the original strategy).

To verify the unimprovability of a strategy, one needs to check only one-shot deviations from the strategy, rather than arbitrarily complex deviations.


Unimprovability (2)

The following result simplifies the analysis of SPNE immensely.

It is the exact counterpart of a well-known result from dynamic programming due to Howard (1960), and was first emphasized in the context of self-enforcing cooperation by Abreu (1988).

Theorem 6

Let the payoffs of $G$ be bounded. In the repeated game $G(T)$ or $G(\infty, \delta)$, strategy $\sigma_i$ is a perfect best response to a profile of strategies $\sigma$ if and only if $\sigma_i$ is unimprovable against that profile.

The proof is simple and generalizes easily to a wide variety of dynamic and stochastic games with discounting and bounded payoffs.


Unimprovability (3)

Proof of $\Leftarrow$ (the direction $\Rightarrow$ is trivial). Consider the contrapositive, i.e., not a perfect best response $\Rightarrow$ not unimprovable.

1 If $\sigma_i$ is not a perfect best response, there must be a history after which it is profitable to deviate to some other strategy.

2 Then, because of discounting and boundedness of payoffs, there must exist a profitable deviation that involves defection in only finitely many periods (and conforms to $\sigma_i$ thereafter).

If a profitable deviation $\hat{\sigma}_i$ involves defection at infinitely many nodes, then for sufficiently large $T$, the strategy $\sigma_i'$ that agrees with $\hat{\sigma}_i$ until time $T$ and conforms to $\sigma_i$ thereafter is also a profitable deviation: payoffs after time $T$ are bounded and discounted by at least $\delta^T$, so for large $T$ they cannot offset the strict gain of $\hat{\sigma}_i$.

3 Consider a profitable deviation involving defection in the smallest possible number of periods, denoted by $T$.

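4 In such a profitable deviation, the final defection must itself be a profitable one-shot deviation: otherwise, conforming at that point instead would give a profitable deviation with only $T-1$ defections. Hence $\sigma_i$ is improvable (not unimprovable) at the history reached after the first $T-1$ defections.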


Repeated Prisoners' Dilemma (1)

Q The following prisoners' dilemma will be played infinitely many times. Under what conditions on $\delta$ can an SPNE support cooperation (C1, C2)?

| 1 \ 2 | C2 | D2 |
|---|---|---|
| C1 | 2, 2 | -1, 3 |
| D1 | 3, -1 | 0, 0 |

Suppose that player $i$ plays $C_i$ in the first stage. In the $t$-th stage, if the outcome of all $t-1$ preceding stages has been (C1, C2), then play $C_i$; otherwise, play $D_i$ (thereafter).

This strategy is called a trigger strategy, because player $i$ cooperates until someone fails to cooperate, which triggers a switch to noncooperation forever after.

If both players adopt this trigger strategy, then the outcome of the infinitely repeated game will be (C1, C2) in every stage.
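The trigger strategy is a function of the history alone, which the following sketch makes explicit. It uses plain labels C and D for both players (instead of C1, C2, D1, D2) and lets player 1 deviate once at a chosen period, after which she conforms to the strategy again; the names `trigger` and `play` are hypothetical.

```python
def trigger(history):
    """Trigger strategy: cooperate iff every earlier outcome was (C, C)."""
    return "C" if all(profile == ("C", "C") for profile in history) else "D"


def play(periods, deviation_period=None):
    """Outcome path when both players use the trigger strategy, except that
    player 1 plays D once, in `deviation_period` (1-indexed), if one is given."""
    history = []
    for t in range(1, periods + 1):
        a1, a2 = trigger(history), trigger(history)
        if t == deviation_period:
            a1 = "D"
        history.append((a1, a2))
    return history


print(play(4))                      # [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play(4, deviation_period=2))  # [('C', 'C'), ('D', 'C'), ('D', 'D'), ('D', 'D')]
```

A single defection switches both players to (D, D) from the next period on, which is exactly the punishment the incentive condition on the following slides has to weigh against the one-period gain.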


Repeated Prisoners' Dilemma (2)

To show that the trigger strategies form an SPNE, we must verify that they constitute a Nash equilibrium in every possible subgame of the infinitely repeated game.

Rm Since every subgame of an infinitely repeated game is identical to the game as a whole (thanks to its recursive structure), we have to consider only two types of subgames: (i) subgames in which all the outcomes of earlier stages have been (C1, C2), and (ii) subgames in which the outcome of at least one earlier stage differs from (C1, C2).

By unimprovability, it is sufficient to show that there is no profitable one-shot deviation after any possible history, i.e., in either type of subgame, (i) or (ii).

Players have no incentive to deviate in (ii), since the trigger strategy prescribes repeated play of the one-shot NE (D1, D2).
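The one-shot deviation check can be mechanized when strategies are represented as finite automata (a device used, e.g., in MS). The sketch below is our own encoding, not taken from the lecture: state "coop" covers subgames of type (i) and state "punish" covers type (ii). It computes each state's average payoff by value iteration and then tests every one-shot deviation at every state; players 1 and 2 are indexed 0 and 1.

```python
from itertools import product

# Stage game: the prisoners' dilemma above, U[(a1, a2)] = (u1, u2).
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
     ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
ACTIONS = ("C", "D")

# Trigger strategies as a two-state automaton (hypothetical encoding):
# state "coop" prescribes (C, C); state "punish" prescribes (D, D).
OUTPUT = {"coop": ("C", "C"), "punish": ("D", "D")}


def transition(state, profile):
    """Stay in "coop" only if (C, C) was just played; "punish" is absorbing."""
    return "coop" if state == "coop" and profile == ("C", "C") else "punish"


def state_values(delta, iterations=2000):
    """Average payoffs V[state] = (V1, V2) from following the automaton,
    found by iterating V(s) = (1 - delta) u(f(s)) + delta V(next state)."""
    V = {s: (0.0, 0.0) for s in OUTPUT}
    for _ in range(iterations):
        V = {s: tuple((1 - delta) * U[OUTPUT[s]][i]
                      + delta * V[transition(s, OUTPUT[s])][i] for i in range(2))
             for s in OUTPUT}
    return V


def profitable_one_shot_deviations(delta, tol=1e-9):
    """(state, player, action) triples at which deviating once and conforming
    afterwards beats conforming. Every history leads to some automaton state,
    so an empty list means both strategies are unimprovable."""
    V = state_values(delta)
    found = []
    for s, i in product(OUTPUT, range(2)):
        for a in ACTIONS:
            profile = tuple(a if k == i else OUTPUT[s][k] for k in range(2))
            value = (1 - delta) * U[profile][i] + delta * V[transition(s, profile)][i]
            if value > V[s][i] + tol:
                found.append((s, i, a))
    return found


print(profitable_one_shot_deviations(0.5))   # []
print(profitable_one_shot_deviations(0.25))  # [('coop', 0, 'D'), ('coop', 1, 'D')]
```

At $\delta = 0.25$ the only profitable deviations are defections in the cooperative state, matching the observation that type (ii) subgames never create incentive problems.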


Repeated Prisoners' Dilemma (3)

The following condition (with the common factor $1-\delta$ of the average payoffs dropped) guarantees that there is no profitable (one-shot) deviation in (i).

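$$2 + 2\delta + 2\delta^2 + \cdots \ge 3 + 0 \cdot \delta + 0 \cdot \delta^2 + \cdots \iff 2(\delta + \delta^2 + \cdots) \ge 1 \iff \frac{2\delta}{1-\delta} \ge 1 \iff \delta \ge \frac{1}{3}.$$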
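The same inequality can be written for a general prisoners' dilemma with reward $R$ for mutual cooperation, temptation $T > R$, and punishment payoff $P < R$: $R/(1-\delta) \ge T + \delta P/(1-\delta)$ rearranges to $\delta \ge (T-R)/(T-P)$. A tiny exact-arithmetic sketch (the function name is hypothetical) recovers the threshold above.

```python
from fractions import Fraction


def trigger_threshold(reward, temptation, punishment):
    """Smallest delta for which the trigger strategy deters a one-shot defection:
    reward / (1 - delta) >= temptation + delta * punishment / (1 - delta)
    <=>  delta >= (temptation - reward) / (temptation - punishment)."""
    return Fraction(temptation - reward, temptation - punishment)


print(trigger_threshold(reward=2, temptation=3, punishment=0))  # 1/3
```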

Mutual cooperation (C1, C2) can thus be sustained as an SPNE outcome by the trigger strategy when players are sufficiently patient ($\delta \ge 1/3$).

The trigger strategy (in the repeated prisoners' dilemma) is the severest possible punishment, since each player receives her minmax payoff (in every period) after a deviation occurs.


Folk Theorem: Preparation (1)

Rm The following exposition follows Fudenberg and Maskin (1986). For each $j$, choose $M^j = (M_1^j, \dots, M_n^j)$ so that

$$(M_1^j, \dots, M_{j-1}^j, M_{j+1}^j, \dots, M_n^j) \in \arg\min_{a_{-j}} \max_{a_j} u_j(a_j, a_{-j}),$$

with $M_j^j$ a best response of player $j$ to $M_{-j}^j$, and player $j$'s reservation value is defined by

$$v_j^* := \max_{a_j} u_j(a_j, M_{-j}^j) = u_j(M^j).$$

The strategies $M_{-j}^j = (M_1^j, \dots, M_{j-1}^j, M_{j+1}^j, \dots, M_n^j)$ are minimax strategies (which may not be unique) against player $j$, and $v_j^*$ is the lowest payoff to which the other players can hold player $j$, given that $j$ best-responds.

We refer to $(v_1^*, \dots, v_n^*)$ as the minimax point.
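For a two-player game, the definition can be evaluated directly by a min over the opponent's actions of a max over one's own. The sketch below restricts attention to pure actions, which is our simplifying assumption: Fudenberg and Maskin allow mixed minimax strategies, and the pure-action value can be strictly higher. The function name is hypothetical.

```python
def pure_minimax(payoffs, actions1, actions2):
    """Pure-action minimax values (v1*, v2*) of a two-player game with
    payoffs[(a1, a2)] = (u1, u2), together with the actions (m1, m2) that
    hold the opponent down: m2 minimaxes player 1, m1 minimaxes player 2.
    Ties are broken by action label."""
    v1, m2 = min((max(payoffs[(a1, a2)][0] for a1 in actions1), a2) for a2 in actions2)
    v2, m1 = min((max(payoffs[(a1, a2)][1] for a2 in actions2), a1) for a1 in actions1)
    return (v1, v2), (m1, m2)


PD = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
print(pure_minimax(PD, ["C", "D"], ["C", "D"]))  # ((0, 0), ('D', 'D'))
```

In the prisoners' dilemma the minimax point is (0, 0), attained by (D, D). In the two-period example game earlier in these notes, by contrast, the pure-action minimax value is 1 for each player, while player 2 mixing L2 and R2 with probabilities 3/4 and 1/4 holds player 1 to 3/4.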


Folk Theorem: Preparation (2)

Definition 7

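Let $V$ be the set of feasible payoffs, i.e., the convex hull of the payoff vectors $u$ yielded by (pure) action profiles, and let $V^*$ ($\subset V$) be the set of feasible payoffs that Pareto dominate the minimax point:

$$V^* = \{(v_1, \dots, v_n) \in V \mid v_i > v_i^* \text{ for all } i\}.$$

(Fudenberg and Maskin normalize the minimax point to the origin, in which case the condition reads $v_i > 0$.)

$V^*$ is called the set of individually rational payoffs.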
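For two players, membership in $V^*$ can be checked geometrically: build the convex hull of the pure-profile payoff vectors, test whether the candidate lies in it, and compare with the minimax point. The sketch below uses Andrew's monotone-chain hull (our choice of algorithm), assumes the hull is a genuine polygon (at least three non-collinear payoff vectors), and takes the minimax point as an input; the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_v_star(v, payoff_vectors, minimax, eps=1e-12):
    """True if v is feasible (in the hull, boundary included) and strictly
    above the minimax point in every coordinate."""
    hull = convex_hull(payoff_vectors)
    n = len(hull)
    feasible = all(cross(hull[k], hull[(k + 1) % n], v) >= -eps for k in range(n))
    return feasible and all(v[i] > minimax[i] for i in range(2))


PD_PAYOFFS = [(2, 2), (-1, 3), (3, -1), (0, 0)]
print(in_v_star((1, 1), PD_PAYOFFS, (0, 0)))      # True
print(in_v_star((2.5, 2.5), PD_PAYOFFS, (0, 0)))  # False: not feasible
print(in_v_star((3, -1), PD_PAYOFFS, (0, 0)))     # False: feasible but not individually rational
```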

There are several versions of the folk theorem.

The name comes from the fact that the statement (relying on NE rather than SPNE) was widely known among game theorists in the 1950s, even though no one had published it.


Folk Theorem (1)

Theorem 8 (Theorem A)

For any $(v_1, \dots, v_n) \in V^*$, if players discount the future sufficiently little, there exists a Nash equilibrium of the infinitely repeated game in which, for all $i$, player $i$'s average payoff is $v_i$.

If a player deviates, it may not be in the others' interest to go through with the punishment of minimaxing him forever. However, Aumann and Shapley (1976) and Rubinstein (1979) showed that, when there is no discounting, the counterpart of Theorem A holds for SPNE.

Theorem 9 (Theorem B)

For any $(v_1, \dots, v_n) \in V^*$ there exists a subgame perfect equilibrium in the infinitely repeated game with no discounting in which, for all $i$, player $i$'s expected payoff each period is $v_i$.


Folk Theorem (2)

One well-known case that admits both discounting and simple strategies is where the point to be sustained Pareto dominates the payoffs of a Nash equilibrium of the constituent game $G$.

Theorem 10 (Theorem C)

Suppose $(v_1, \dots, v_n) \in V$ Pareto dominates the payoffs $(y_1, \dots, y_n)$ of a (one-shot) Nash equilibrium $(e_1, \dots, e_n)$ of $G$. If players discount the future sufficiently little, there exists a subgame perfect equilibrium of the infinitely repeated game in which, for all $i$, player $i$'s average payoff is $v_i$.

Because the punishments used in Theorem C are less severe than those in Theorems A and B, its conclusion is weaker.

For example, Theorem C does not allow us to conclude that a Stackelberg outcome can be supported as an equilibrium in an infinitely repeated quantity-setting duopoly.


General Folk Theorem | Two Players

Abreu (1988) shows that there is no loss in restricting attention to simple punishments when players discount the future. Indeed, simple punishments are employed in the proof of the following result.

Theorem 11 (Theorem 1)

For any $(v_1, v_2) \in V^*$ there exists $\underline{\delta} \in (0, 1)$ such that, for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely repeated game in which player $i$'s average payoff is $v_i$ when players have discount factor $\delta$.

After a deviation by either player, the players (mutually) minimax each other for a certain number of periods, after which they return to the original path.

If a further deviation occurs during the punishment phase, the phase is restarted.
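To see why a finite punishment phase suffices once players are patient enough, here is a sketch specialized to the prisoners' dilemma of these notes (our choice of example). There, mutual minimaxing is (D, D), which gives each player her minimax payoff 0 and is also a stage Nash equilibrium, so nobody gains by deviating during the punishment phase (doing so yields -1 and restarts the phase). Only a defection from (C, C) needs checking: it gains 3 - 2 = 1 today and loses 2 in each of the $L$ punishment periods. The function name and its defaults are hypothetical.

```python
def min_punishment_length(delta, reward=2, temptation=3, punish=0, max_len=10_000):
    """Smallest L such that a one-shot defection from (C, C) is unprofitable when
    it is followed by L periods of mutual minimaxing, (D, D), and then a return
    to (C, C):  -(temptation - reward) + sum_{t=1}^{L} delta**t * (reward - punish) >= 0.
    Returns None if no L up to max_len works."""
    net = -(temptation - reward)
    for length in range(1, max_len + 1):
        net += delta ** length * (reward - punish)
        if net >= 0:
            return length
    return None


for delta in (0.9, 0.5, 0.4, 0.3):
    print(delta, min_punishment_length(delta))
```

For $\delta = 0.5$ one period of punishment already suffices and $\delta = 0.4$ needs two, while for $\delta = 0.3 < 1/3$ no finite length works, consistent with the trigger-strategy threshold derived earlier.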


General Folk Theorem | Three or More Players

The method used to establish Theorem 1, mutual minimaxing, does not extend to three or more players.

Theorem 12 (Theorem 2)

Assume that the dimensionality of $V^*$ equals $n$, the number of players, i.e., that the interior of $V^*$ (relative to $n$-dimensional space) is nonempty. Then, for any $(v_1, \dots, v_n)$ in $V^*$, there exists $\underline{\delta} \in (0, 1)$ such that for all $\delta \in (\underline{\delta}, 1)$ there exists a subgame perfect equilibrium of the infinitely repeated game with discount factor $\delta$ in which player $i$'s average payoff is $v_i$.

If a player deviates, he is minimaxed by the other players long enough to wipe out any gain from his deviation.

To induce the other players to go through with minimaxing him, they are ultimately given a reward in the form of an additional $\varepsilon$ in their average payoff.

The possibility of providing such a reward relies on the full dimensionality of the payoff set.
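The dimension of the convex hull of the pure-profile payoff vectors $u(a)$ equals the rank of the differences $u(a) - u(a^0)$ for a fixed profile $a^0$; whenever $V^*$ is nonempty it has the same dimension as $V$. The sketch below computes that rank with exact Gaussian elimination. The payoff vectors are hypothetical illustrations, not taken from a specific game in these notes.

```python
from fractions import Fraction


def payoff_set_dimension(payoff_vectors):
    """Dimension of the convex hull of the given payoff vectors, computed as the
    rank of the differences u(a) - u(a0) by exact Gaussian elimination."""
    base = [Fraction(x) for x in payoff_vectors[0]]
    rows = [[Fraction(x) - b for x, b in zip(v, base)] for v in payoff_vectors[1:]]
    rank = 0
    for col in range(len(base)):
        pivot = next((k for k in range(rank, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for k in range(len(rows)):
            if k != rank and rows[k][col] != 0:
                factor = rows[k][col] / rows[rank][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[rank])]
        rank += 1
    return rank


# Three players with identical payoffs in every profile: the payoff set is a segment.
print(payoff_set_dimension([(0, 0, 0), (1, 1, 1), (2, 2, 2)]))  # 1
# Hypothetical payoff vectors spanning all three dimensions.
print(payoff_set_dimension([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # 3
```

In the identical-payoff case, any punishment that lowers one player's payoff lowers everyone's by the same amount, so there is no room to reward the punishers separately, which is why the theorem requires dimension $n$.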


Imperfect Monitoring (1)

Perfect Monitoring: Players can fully observe the history of their past play. There is no monitoring difficulty or imperfection.

Bounded/Imperfect Recall: Players forget (part of) the history of their past play, especially that of the distant past, as time goes by.

Imperfect Monitoring: Players cannot directly observe the (full) history of their past play, but instead observe signals that depend on the actions taken in the previous period.

Public Monitoring: Players publicly observe a common signal.

Private Monitoring: Players privately receive different signals.


Imperfect Monitoring (2)

Punishment can be linked to deviation only indirectly.

Players can punish the deviator only in reaction to the common signals, since they cannot observe the deviation itself.

Even if no one has deviated, punishment is triggered when a bad signal realizes (which happens with positive probability).

Constructing (efficient) punishments becomes dramatically more difficult.
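A back-of-the-envelope sketch of why on-path punishment is costly, assuming the prisoners' dilemma payoffs of these notes (2 from mutual cooperation, 0 from (D, D)) and a hypothetical rule: "play (C, C) until the first bad public signal, then (D, D) forever". Let $p$ be the probability of the good signal when both cooperate (the same $p$ as in the example that follows). If nobody ever deviates, the average payoff solves $V = (1-\delta)\cdot 2 + \delta\,[pV + (1-p)\cdot 0]$, i.e., $V = 2(1-\delta)/(1-\delta p)$. This only computes the value of the rule; it does not check that the rule is an equilibrium.

```python
def grim_trigger_value(p, delta, coop_payoff=2.0, punish_payoff=0.0):
    """Average payoff of the (hypothetical) rule "play (C, C) until the first bad
    public signal, then play (D, D) forever" when nobody ever deviates.
    Solves V = (1 - delta) * coop + delta * (p * V + (1 - p) * punish)."""
    return ((1 - delta) * coop_payoff + delta * (1 - p) * punish_payoff) / (1 - delta * p)


for delta in (0.9, 0.99, 0.999):
    print(delta, grim_trigger_value(p=0.9, delta=delta))
```

With $p = 0.9$ the value falls from about 1.05 at $\delta = 0.9$ toward 0 as $\delta \to 1$: for any fixed $p < 1$ a bad signal eventually arrives almost surely, so permanent punishment destroys the gains from cooperation even though nobody deviates.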


Example | Prisoners' Dilemma (1)

Consider the following prisoners' dilemma as a stage game, in which each player cannot observe the rival's past actions.

Table: Ex ante payoffs $u_i(a_i, a_{-i})$

| 1 \ 2 | C | D |
|---|---|---|
| C | 2, 2 | -1, 3 |
| D | 3, -1 | 0, 0 |

Q Can each player deduce the rival's action from the realized payoff (and her own action)?

If that were the case, monitoring would not actually be imperfect...


Example | Prisoners' Dilemma (2)

Player $i$'s payoff in each period depends only on her own action, $a_i \in \{C, D\}$, and the public signal, $y \in \{g, b\}$, i.e., $u_i^*(y, a_i)$.

Table: Ex post payoffs $u_i^*(y, a_i)$

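| $a_i$ \ $y$ | $g$ | $b$ |
|---|---|---|
| $C$ | $\frac{3-p-2q}{p-q}$ | $-\frac{p+2q}{p-q}$ |
| $D$ | $\frac{3(1-r)}{q-r}$ | $-\frac{3r}{q-r}$ |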

$p, q, r$ ($0 < q, r < p < 1$) are the conditional probabilities that $g$ realizes:

$$p = \Pr\{g \mid CC\}, \quad q = \Pr\{g \mid DC\} = \Pr\{g \mid CD\}, \quad r = \Pr\{g \mid DD\}.$$
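The ex post payoffs are constructed so that their expectation, given the action profile, reproduces the ex ante table; this is what answers Q: the realized payoff reveals nothing beyond one's own action and the public signal. The sketch below checks this in exact arithmetic for one arbitrary choice of $p, q, r$ satisfying $0 < q, r < p < 1$; the function names are hypothetical.

```python
from fractions import Fraction as F


def ex_post(y, a, p, q, r):
    """Ex post payoff u_i*(y, a_i) from the table above."""
    if a == "C":
        return (3 - p - 2 * q) / (p - q) if y == "g" else -(p + 2 * q) / (p - q)
    return 3 * (1 - r) / (q - r) if y == "g" else -3 * r / (q - r)


def expected_payoff(a_i, a_j, p, q, r):
    """E[u_i*(y, a_i) | a_i, a_j], where Pr{g} is p, q or r depending on
    whether zero, one or two players defect."""
    prob_g = (p, q, r)[(a_i == "D") + (a_j == "D")]
    return prob_g * ex_post("g", a_i, p, q, r) + (1 - prob_g) * ex_post("b", a_i, p, q, r)


# Arbitrary illustrative values satisfying 0 < q, r < p < 1.
p, q, r = F(9, 10), F(1, 2), F(1, 5)
for a_i in "CD":
    print([str(expected_payoff(a_i, a_j, p, q, r)) for a_j in "CD"])
# ['2', '-1']
# ['3', '0']
```

The printed rows match player $i$'s ex ante payoffs (2 and -1 after C, 3 and 0 after D); the identity holds for any admissible $p, q, r$, as substituting the probabilities into the table confirms.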


Example | Prisoners' Dilemma (3)

To achieve cooperation, consider the (m...
