
Game Theory

Wolfgang Frimmel

Repeated Games

1 / 41

Recap: SPNE

The solution concept for dynamic games with complete information is the subgame perfect Nash Equilibrium (SPNE)

Selten (1965):

A strategy profile s∗ is a subgame perfect Nash equilibrium of an extensive game if the strategy profile s∗ is a Nash equilibrium in all subgames. Any finite extensive-form game with complete (but imperfect) information will have a SPNE (possibly involving mixed strategies)

Proof: by backward induction

The SPNE is not necessarily unique

SPNE eliminates all non-credible NE

Non-credible NE: the equilibrium relies on a threat against deviation that is not credible

2 / 41

Repeated Games

Until now, we considered so-called "one-shot games" with the (implicit) assumption that the game is played once among players who expect not to meet each other again

In real life, games are typically played within a larger context, and actions affect not only the present situation but may also have implications for the future

3 / 41

Repeated Games

Players may have considerations about the future, which also affect their behavior in the present, i.e. if the same players meet again repeatedly, threats and promises about future behavior can influence current behavior

Such situations are captured in repeated games, in which a "stage game" is played repeatedly

Normal-form or extensive-form games are repeated finitely or infinitely, regardless of what has been played in previous games, and often with the same set of players

The outcome of a repeated game is a sequence of outcomes

4 / 41

Finitely repeated game

Definition

Let T = {0, 1, ..., n} be the set of all possible dates

Let G be a stage game with perfect information, which is played at each t ∈ T

The payoff of each player in this larger game is the sum of payoffs the player receives in each stage game

Denote this larger game as GT

At the beginning of each repetition, a player considers what each player has played in the previous rounds

A strategy in the repeated game GT assigns a strategy to each stage game G

5 / 41

Two-stage game: Prisoners’ dilemma

Consider a situation in which two players play the Prisoners' Dilemma game:

      C     D
C   5,5   0,6
D   6,0   1,1

Now assume T = {0, 1} and that G is this Prisoners' Dilemma game. Then the repeated game GT can be represented in extensive form (the game tree is not reproduced here).

6 / 41

Two-stage game: Prisoners’ dilemma (cont.)

At t = 1, a history is a strategy profile of the game, indicating what has been played at t = 0: (C,C), (C,D), (D,C), (D,D)

GT has 4 subgames - note that payoffs are the sum of payoffs from both games (no discounting)!

We have (D,D) as a unique Nash equilibrium in each of these subgames → the actions at t = 1 are independent of what is played at t = 0

Given the behavior in t = 1, the game in t = 0 reduces to:

      C     D
C   6,6   1,7
D   7,1   2,2

We add 1 to each payoff, as this is the payoff of t = 1 for each player

The unique equilibrium of this reduced game is (D,D)

This is also the unique subgame-perfect equilibrium: at each history, each player plays D
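To make the backward-induction argument concrete, here is a minimal Python sketch (not from the slides; the payoff encoding and the helper name pure_nash are mine). It finds the stage-game NE, builds the reduced game at t = 0 by adding the continuation payoffs, and confirms that (D,D) is played at both dates:

```python
from itertools import product

ACTIONS = ['C', 'D']
STAGE = {('C', 'C'): (5, 5), ('C', 'D'): (0, 6),
         ('D', 'C'): (6, 0), ('D', 'D'): (1, 1)}

def pure_nash(payoffs):
    """All pure-strategy Nash equilibria of a 2-player normal-form game."""
    eq = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        if (all(u1 >= payoffs[(b, a2)][0] for b in ACTIONS) and
                all(u2 >= payoffs[(a1, b)][1] for b in ACTIONS)):
            eq.append((a1, a2))
    return eq

# t = 1: the stage-game NE is played regardless of the history
c1, c2 = STAGE[pure_nash(STAGE)[0]]          # continuation payoffs (1, 1)

# t = 0: reduced game obtained by adding the continuation payoffs to every cell
reduced = {a: (u1 + c1, u2 + c2) for a, (u1, u2) in STAGE.items()}
print(pure_nash(STAGE), pure_nash(reduced))  # [('D', 'D')] [('D', 'D')]
```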

7 / 41

n-stage games

What about arbitrary n?

On the last day n, independent of what has been played in the previous rounds, there is a unique Nash equilibrium for the resulting subgame: each player plays D.

The actions at day n − 1 therefore do not have any effect on what will be played on the next day.

Going back to date 0, we find a unique SPNE: at each t, for every outcome of the previous stage games, players play D

This is a general result!

8 / 41

Finitely repeated games with unique NE in stage game

Definition

Given a stage game G, let GT denote the finitely repeated game in which G is played T times, with the outcomes of all preceding plays observed before the next round. The payoffs for GT are simply the sum of the payoffs from the T stage games.

Selten’s Theorem

If the stage game G has a unique Nash equilibrium then, for any finite T, the repeated game GT has a unique subgame perfect Nash equilibrium: the Nash equilibrium of G is played in every stage

The proof can be found in any advanced game theory or microeconomics textbook

9 / 41

Finitely repeated games with multiple NE in stage game

Consider the following modified version of the Prisoners’ dilemma:

      C     D     R
C   5,5   0,6   0,0
D   6,0   1,1   0,0
R   0,0   0,0   4,4

There are two pure strategy NE in this stage game

Assume that this stage game is played twice

Playing any sequence of NE would be a SPNE

Now consider the following conditional strategy (conditional on a future NE):

In the first stage, players anticipate that the second-stage outcome will be a Nash equilibrium of the stage game, hence (D,D) or (R,R)

Players anticipate that (R,R) will be the second-stage outcome if the first-stage outcome is (C,C), whereas (D,D) will be the second-stage outcome otherwise

10 / 41

Finitely repeated games with multiple NE in stage game

Players' first-stage interactions amount to the following one-shot game:

      C     D     R
C   9,9   1,7   1,1
D   7,1   2,2   1,1
R   1,1   1,1   5,5

There are 3 pure-strategy Nash equilibria - (C,C), (D,D) and (R,R):

1 The NE (D,D) corresponds to (D,D) in the first stage and (D,D) in the second stage

2 The NE (R,R) corresponds to (R,R) in the first stage and (D,D) in the second stage

3 The NE (C,C) corresponds to (C,C) in the first stage and (R,R) in the second stage

11 / 41

Finitely repeated games with multiple NE in stage game

(D,D) and (R,R) are concatenations of Nash equilibrium outcomes of the stage game

(C,C) is a qualitatively different result - (C,C) in the first stage game is not a NE

Cooperation is possible in the first stage of a SPNE of a repeated game because of a credible threat and punishment

However, this SPNE depends on the assumption about players' anticipations for the second stage (see the conditional strategy)

Our conditional strategy requires playing (D,D) in the second stage, which appears silly if (R,R) is available

The credible punishment for a player who deviates from (C,C) in the first stage is playing a Pareto-dominated equilibrium in the second stage

12 / 41

Single-deviation principle

Verifying that a given strategy profile is a SPNE can be difficult - the game above has 10 subgames, hence 3^10 strategies for each player!

Solution: Single-deviation principle

Definition

Given the strategies of the other players, strategy si of player i in a repeated game satisfies the single-deviation principle if player i cannot gain by deviating from si in a single stage game, holding all other players' strategies and the rest of her own strategy fixed.

Proposition

In a finitely repeated game, a strategy profile s is a SPNE if and only if each player's strategy satisfies the single-deviation principle.

This proposition also extends to infinitely repeated games, given that future payoffs are discounted!

13 / 41

Single-deviation principle

Let’s check for our example:

The single-deviation principle requires checking for single deviations for each player at each stage:

Second stage: If (C,C) is observed, the best response is R (5 + 4 > 5 + 0)

First stage: A deviation in the first stage would yield a payoff of 6 + 1 < 5 + 4

No deviation is profitable - (C,C) in the first stage and (R,R) in the second stage is a SPNE
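The same check can be done mechanically. A small sketch, assuming the payoff table above and the conditional strategy as stated (the encoding and the helper second_stage are mine; by symmetry it suffices to check player 1):

```python
from itertools import product

ACTIONS = ['C', 'D', 'R']
STAGE = {('C', 'C'): (5, 5), ('C', 'D'): (0, 6), ('C', 'R'): (0, 0),
         ('D', 'C'): (6, 0), ('D', 'D'): (1, 1), ('D', 'R'): (0, 0),
         ('R', 'C'): (0, 0), ('R', 'D'): (0, 0), ('R', 'R'): (4, 4)}

def second_stage(history):
    """Prescribed second-stage action, conditional on the first-stage outcome."""
    return 'R' if history == ('C', 'C') else 'D'

# Second-stage deviations: at every history, the prescribed action must be a
# best response to the opponent's prescribed action.
for hist in product(ACTIONS, repeat=2):
    prescribed = second_stage(hist)   # both players are prescribed the same action
    assert STAGE[(prescribed, prescribed)][0] == max(
        STAGE[(a, prescribed)][0] for a in ACTIONS)

# First-stage deviation by player 1 (deviate once, then conform):
follow = STAGE[('C', 'C')][0] + STAGE[('R', 'R')][0]                # 5 + 4 = 9
for a in ['D', 'R']:
    hist = (a, 'C')
    dev = STAGE[hist][0] + STAGE[(second_stage(hist),) * 2][0]      # e.g. 6 + 1 = 7
    assert dev < follow

print("single-deviation principle satisfied")
```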

14 / 41

Finitely vs. infinitely repeated games

Credible threats and promises about future behavior can influencecurrent behavior

If the relationship is only finitely repeated, this is only true if the stage game has multiple equilibria

If G is a static game of complete information with multiple NEs, then the T-times repeated game GT may have SPNEs in which, for any t < T, the outcome in stage t is not a NE of G

For infinitely repeated games this result is stronger: even if the stage game G has a unique NE, there may be SPNEs of the infinitely repeated game G∞ in which the outcome is not a NE of the stage game G.

Hence, infinitely repeated games may be suitable for modeling cooperation sustained by threats and punishment strategies

15 / 41

Infinitely repeated games

Simply summing the payoffs from an infinite sequence of stage games does not provide a useful measure of players' payoffs in an infinitely repeated game. Why?

Solution: Use the discounted sum of the sequence of payoffs

Each player i has a payoff function ui and a discount factor δi ∈ [0, 1] such that an infinite sequence of outcomes (s^1, s^2, ...) is evaluated by:

ui(s^1) + δi ui(s^2) + δi² ui(s^3) + ... = ∑_{t=1}^∞ δi^(t−1) ui(s^t)

The discount factor δi measures how much a player cares about the future

when δi is close to 0, player i does not care about the future → impatient

when δi is close to 1, player i does care about the future → patient
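A tiny illustration (mine, not from the slides) of how δ governs the weight placed on future payoffs: it compares a long truncated discounted sum of a constant stream of 5 with the closed form 5/(1−δ).

```python
def present_value(payoffs, delta):
    """Discounted sum u(s^1) + δ·u(s^2) + δ²·u(s^3) + ..."""
    return sum(delta ** t * u for t, u in enumerate(payoffs))

stream = [5] * 1000                     # long (approximately infinite) stream
for delta in (0.1, 0.9, 0.99):
    pv = present_value(stream, delta)
    print(delta, round(pv, 2), round(5 / (1 - delta), 2))   # truncated PV vs 5/(1-δ)
```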

16 / 41

Infinitely repeated games

The infinitely repeated game differs only by the set of terminal histories, which is the set of infinite sequences (s^1, s^2, ...)

The payoff is the present value ∑_{t=1}^∞ δi^(t−1) ui(s^t)

Note: one could also use the present value as a measure of payoffs in finitely repeated games

One could also reinterpret the infinitely repeated game as a game that ends after a random number of repetitions, where 1 − δ is the probability that the game ends immediately and δ the probability that it continues for at least one more stage

Are infinitely repeated games likely to occur? Intuitively, in a lot of long-lasting interactions, the termination date of the interaction is typically unknown to players or plays only a minor role

17 / 41

SPNE in infinitely repeated games

Consider again the following prisoners’ dilemma:

      C     D
C   5,5   0,6
D   6,0   1,1

Analogous to finitely repeated games: playing the unique NE (D,D) in every stage game implies a NE in every subgame of the infinitely repeated game ⇒ SPNE

In the presence of credible punishment, we may also get SPNE different from Nash equilibrium outcomes of the stage game → there are strategies leading to (C,C) in every stage game as a SPNE

Examples of such strategies:

(Grim) trigger strategy

Tit-for-tat strategy

Limited punishment

18 / 41

Strategies in an infinitely repeated game

(Grim) trigger strategy

si(∅) = C and

si(s^1, ..., s^t) = C if (sj^1, ..., sj^t) = (C, ..., C), and D otherwise,

for every history (s^1, ..., s^t), where j is the other player

Player i chooses C at the start of the game and after any history in which every previous action of player j was C

Once player j chooses D, player i will also switch to action D

Once D is reached, this state is never left

19 / 41

Strategies in an infinitely repeated game (cont.)

Tit-for-tat

The length of the punishment depends on the behavior of the punished player

If the punished player continues to deviate by playing D, tit-for-tat continues to do so as well ⇒ no reversion to C

Whenever the punished player reverts to C, tit-for-tat reverts to C as well

In other words: do whatever the other player did in the previous period

20 / 41

Nash equilibrium: (Grim) trigger strategy

Assume that player 1 uses the grim trigger strategy

If player 2 uses this strategy as well → (C,C) will be the outcome in every period, with payoffs (5, 5, ...)

The discounted sum is 5 + 5δ + 5δ² + 5δ³ + ... = 5/(1−δ)

If player 2 uses a different strategy, then in at least one period her action is D

In all subsequent periods, player 1 chooses D as well, since it is a best response

Up to the first period in which player 2 chooses D, her payoff is 5 in each period

21 / 41

Nash equilibrium: (Grim) trigger strategy

Player 2's subsequent sequence of payoffs is (6, 1, 1, ...): she gains one unit from the deviation, but then receives only 1 instead of 5 in every later period due to player 1's reaction

Hence, the discounted sum from deviating is: 6 + δ + δ² + δ³ + ... = 6 + δ/(1−δ)

Cooperation is successful if the payoff of cooperation is at least as good as the payoff of cheating:

5/(1−δ) ≥ 6 + δ/(1−δ)

Cooperation if δ ≥ 1/5

In this example, only very impatient players with δ < 1/5 can increase their payoff by deviating

22 / 41

Nash equilibrium: Tit-for-tat

Assume that player 1 uses the tit-for-tat strategy

When player 2 also adheres to this strategy, the equilibrium outcome will be (C,C) in every period

Now assume D is a best response to tit-for-tat for player 2

Denote by t the first period in which player 2 chooses D → player 1 will choose D from period t + 1 onwards, until player 2 reverts to C

Player 2 has two options from period t + 1 onwards: revert to C and face the same situation as at the start of the game, or continue with D, in which case player 1 will continue to do so as well

So if player 2's best response to tit-for-tat is choosing D in some period, then she either alternates between C and D or chooses D in every period

23 / 41

Nash equilibrium: Tit-for-tat (cont.)

The payoff for alternating between C and D is (6, 0, 6, 0, ...) = 6/(1−δ²)

The payoff for staying with D is (6, 1, 1, ...) = 6 + δ/(1−δ)

The payoff of playing tit-for-tat is (5, 5, 5, ...) = 5/(1−δ)

Hence, tit-for-tat is a best response to tit-for-tat if and only if:

5/(1−δ) ≥ 6/(1−δ²)   and   5/(1−δ) ≥ 6 + δ/(1−δ)

Both of these conditions are equivalent to δ ≥ 1/5

Whenever δ ≥ 1/5, a strategy pair in which both players use tit-for-tat is a Nash equilibrium

24 / 41

Folk Theorem

A main objective of studying repeated games is to explore the relation between short-term and long-term incentives

When players are patient, their long-term incentives take over, and a large set of behaviors may result in equilibrium.

The equilibrium multiplicity is a general implication of (infinitely)repeated games

This main result is stated in the so-called Folk Theorem

Before stating it, we need to introduce two further definitions

25 / 41

Feasible payoffs

We call payoffs (x1, ..., xn) feasible in the stage game G if the payoffs are a convex combination (i.e. a weighted average) of the pure-strategy payoffs of G

Graphically, the set of feasible payoffs for the prisoners’ example:

[Figure: feasible payoff set of the Prisoners' Dilemma - the convex hull of the pure-strategy payoff points (1,1), (0,6), (6,0) and (5,5)]

Pure strategy payoffs (1, 1), (6, 0), (0, 6) and (5, 5) are feasible

All other pairs in the shaded region are weighted averages of pure-strategy payoffs
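Feasibility of a given payoff pair can be checked as a small linear program: do non-negative weights summing to one reproduce the target point? The sketch below is mine and assumes scipy is available.

```python
import numpy as np
from scipy.optimize import linprog

def feasible(target, points=((5, 5), (0, 6), (6, 0), (1, 1))):
    """Is `target` a convex combination of the pure-strategy payoff points?"""
    pts = np.array(points, dtype=float)            # shape (4, 2)
    A_eq = np.vstack([pts.T, np.ones(len(pts))])   # 2 payoff equations + weights sum to 1
    b_eq = np.append(np.array(target, dtype=float), 1.0)
    res = linprog(c=np.zeros(len(pts)), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return res.success

print(feasible((4, 4)))    # True: e.g. a mix of (5,5) and (1,1)
print(feasible((6, 6)))    # False: lies outside the convex hull
```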

26 / 41

Average payoffs

Players' payoffs are still defined over the present value (PV) of the infinite payoff stream, but expressed in terms of the average payoff from the same infinite sequence of payoffs.

The average payoff is the payoff that would have to be received in every stage game so as to yield the PV

Definition

Given the discount factor δ, the average payoff of the infinite sequence of payoffs u^1, u^2, ... is

(1 − δ) ∑_{t=1}^∞ δ^(t−1) u^t

Note that if there is a fixed payoff u in every stage, the PV would be u/(1−δ) ⇒ the average payoff is (1−δ)·PV = (1−δ)·u/(1−δ) = u

The average payoff is directly comparable to payoffs from a stage game

Since the average payoff is just a rescaling of the PV, maximising the average payoff is equivalent to maximising the PV
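A quick numerical check of this rescaling, with an arbitrarily chosen δ and a constant payoff of 5:

```python
delta, u, T = 0.9, 5.0, 10_000
pv = sum(delta ** (t - 1) * u for t in range(1, T + 1))   # truncated present value
print(round((1 - delta) * pv, 6))                         # ≈ 5.0, the stage payoff itself
```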

27 / 41

Folk Theorem

The Folk Theorem states that any feasible payoff vector that gives every player strictly more than a stage-game Nash equilibrium payoff can be supported in a SPNE when the players are sufficiently patient

Folk Theorem

Let G be a finite static game of complete information. Let (e1, ..., en) denote the payoffs from a Nash equilibrium of G, and let (x1, ..., xn) denote any other feasible payoffs from G. If xi > ei for every player i and if δ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game G∞ that achieves (x1, ..., xn) as the average payoff.

28 / 41

Folk Theorem: Proof (I)

The proof follows the arguments for the infinitely repeated Prisoners' dilemma

Let (ae1, ..., aen) be the NE of G that yields the equilibrium payoffs (e1, ..., en)

Let (ax1, ..., axn) be the action profile yielding the feasible payoffs (x1, ..., xn)

Consider the standard trigger strategy for each player i = 1, ..., n:

Play axi in the first stage. In the t-th stage, if the outcome of all t − 1 preceding stages has been (ax1, ..., axn), then play axi; otherwise play aei

Assume that all players have adopted this trigger strategy

29 / 41

Folk Theorem: Proof (II)

Since the others will play (ae1, ..., ae,i−1, ae,i+1, ..., aen) forever once one stage's outcome differs from (ax1, ..., axn), playing aei is then a best response for player i

What is the best response for player i in the first stage, and in any stage where all preceding outcomes have been (ax1, ..., axn)?

Let adi be player i's best deviation from (ax1, ..., axn) and di the corresponding payoff from this deviation

Hence we have the payoff relationship di ≥ xi > ei

The present value of player i's payoff sequence from deviating is

di + δei + δ²ei + ... = di + δei/(1−δ)

30 / 41

Folk Theorem: Proof (III)

Alternatively, playing axi will yield a payoff of xi.

If playing axi is optimal, the present value is

xi + δxi + δ²xi + ... = xi/(1−δ)

If playing adi is optimal, the present value is (see before)

di + δei/(1−δ)

So, playing axi is optimal if and only if

xi/(1−δ) ≥ di + δei/(1−δ)

or

δ ≥ (di − xi)/(di − ei)

31 / 41

Folk Theorem: Proof (IV)

Since this threshold value may differ among players, it is a NE for all players to play the trigger strategy if and only if

δ ≥ max_i (di − xi)/(di − ei)

The threshold discount factor for the trigger strategy being a NE is determined by:

the (short-term) gain from deviation ⇒ the higher the short-term gain from non-cooperation, the more difficult it is to achieve cooperation

the (long-term) loss from deviation ⇒ the higher the long-term loss from non-cooperation (i.e. the stronger the punishment), the easier it is to achieve cooperation
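As a sanity check (a worked example of mine, not on the slides), applying the threshold to the Prisoners' Dilemma above with target payoffs x = (5, 5), NE payoffs e = (1, 1) and best-deviation payoffs d = (6, 6) recovers the grim-trigger bound δ ≥ 1/5:

```python
def threshold(d, x, e):
    """Critical discount factor max_i (d_i - x_i) / (d_i - e_i)."""
    return max((di - xi) / (di - ei) for di, xi, ei in zip(d, x, e))

print(threshold(d=(6, 6), x=(5, 5), e=(1, 1)))   # 0.2, i.e. δ >= 1/5 as before
```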

32 / 41

Folk Theorem: Proof (V)

Is this Nash equilibrium also subgame perfect, i.e. is it a Nash equilibrium in every subgame of G(∞, δ)? There are two classes of subgames:

1 subgames in which all outcomes of earlier stages have been (ax1, ..., axn)

2 subgames in which the outcome of at least one earlier stage differs from (ax1, ..., axn)

If players adopt the trigger strategy, then in the first class of subgames players' strategies are again the trigger strategy; we just showed that this is a NE for the game as a whole

If players adopt the trigger strategy, then in the second class of subgames players' strategies are to repeat the stage-game equilibrium (ae1, ..., aen), which is also a NE for the game as a whole.

The trigger-strategy Nash equilibrium of the infinitely repeated game is therefore subgame perfect (if δ is sufficiently large)

33 / 41

Folk Theorem

The Folk theorem implies that any point in the area above and to the right of the red lines can be achieved as the average payoff in a SPNE, if the discount factor is sufficiently large

[Figure: feasible payoff set with the pure-strategy payoff points (1,1), (0,6), (6,0) and (5,5); the red lines pass through the NE payoff (1,1)]

Main message: Although repeated games allow for cooperative behavior, they also allow for an extremely wide range of behavior

34 / 41

Application: Cartels

Demand is given by p = A − Q, marginal cost is constant and equal to c, where A > c

There are n firms in the market, the stage game is Cournot

Firms’ discount factor is δ ∈ (0, 1)

Task:

1 Find the critical value of the discount factor needed to sustain collusion if firms use grim trigger strategies. Assume that collusive behavior involves equal sharing of monopoly output and profits

2 How does the minimum discount factor depend on the number of firms? (See the sketch below.)
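One possible sketch of this exercise (the derivation is mine, not worked out on the slides; it uses the standard monopoly, Cournot and best-response quantities, and A and c are arbitrary because the threshold does not depend on them):

```python
def critical_delta(n, A=10.0, c=1.0):
    """Grim trigger sustains equal sharing of monopoly profit iff δ >= this value."""
    m = A - c
    pi_coll = m ** 2 / (4 * n)                  # each firm's share of monopoly profit
    pi_cournot = (m / (n + 1)) ** 2             # stage-game Cournot NE profit
    q_others = (n - 1) * m / (2 * n)            # rivals' total collusive output
    q_dev = (m - q_others) / 2                  # best response to colluding rivals
    pi_dev = q_dev * (m - q_others - q_dev)     # one-period deviation profit
    return (pi_dev - pi_coll) / (pi_dev - pi_cournot)

for n in (2, 3, 5, 10):
    print(n, round(critical_delta(n), 3))       # 2: 0.529, and rising in n
```

Under these assumptions the critical discount factor rises towards 1 as n grows: with more firms, each firm's share of the collusive profit shrinks while the deviation profit does not fall as quickly, so collusion becomes harder to sustain.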

35 / 41

Application: Efficiency wages (Shapiro & Stiglitz, 1984)

Firms induce workers to work harder by paying high wages and threatening to fire workers caught shirking

Firms reduce their demand for labor, so some workers are employed at high wages, but involuntary unemployment increases

Larger pool of unemployed workers → the threat of firing becomes more severe

In competitive equilibrium, the wage w and unemployment rate u just induce workers not to shirk, and labor demand at w results in unemployment rate u

36 / 41

Efficiency wages: Stage game

Firm offers the worker a wage w .

If the worker rejects w , she becomes self-employed at wage w0.

In case of acceptance of w, the worker chooses either to supply effort (with disutility e) or to shirk (without any disutility)

The effort decision is unobserved by the firm, but the worker's output (low: y = 0, or high: y > 0) is observed

For simplicity: in case of high effort, output is high, but if the worker shirks, output is low

If the firm employs the worker at wage w, payoffs with effort are y − w for the firm and w − e for the worker; if the worker shirks, e = 0, and if output is low, y = 0

Assume that y − e > w0, hence it is efficient for the worker to be employed and supply effort

37 / 41

Efficiency wages: Stage game

Backward induction: the worker observes the wage offer w:

if w ≤ w0: reject

if w > w0: accept and set e = 0 (maximises payoff w − e!)

firms anticipate e = 0, so they set w = 0

the worker will then choose self-employment

Is there a way to offer a wage premium w > w0 in an infinitely repeated game? ⇒ Yes, if there is a credible threat to fire the worker in case of low output

Consider the following strategy:

The firm offers w = w∗ > w0 in the first period, and in each subsequent period if output is high, but offers w = 0 otherwise

Workers accept the firm's offer and provide effort if w∗ ≥ w0, but shirk otherwise

Trigger strategy: play cooperatively provided that all previous plays have been cooperative, but switch forever to the SPNE of the stage game in case of deviation.

38 / 41

Efficiency wages: Worker

The firm offers w∗ and the worker accepts. If the worker provides effort, output will be high, so the firm will offer w∗ also in the next period

If it is optimal for the worker to provide effort, the present value of the worker's payoff is:

Ve = (w∗ − e)/(1 − δ)

If the worker shirks, the present value of the worker's payoff is:

Vs = w∗ + δw0/(1 − δ)

An incentive to supply effort exists if

(w∗ − e)/(1 − δ) ≥ w∗ + δw0/(1 − δ)

or

w∗ ≥ w0 + e/δ

39 / 41

Efficiency wages: Firm

The firm decides between w = w∗ (inducing effort by threatening to fire in case of low output, leaving a profit of y − w∗ in each period) and w = 0 (inducing the worker to choose self-employment, leaving a profit of zero in each period)

It is optimal for the firm to offer a wage w = w∗ if

y − w∗ ≥ 0

Since y ≥ w∗ and the worker's condition requires w∗ ≥ w0 + e/δ, this is possible only if

y ≥ w0 + e/δ

Cooperation is a Nash equilibrium if

δ ≥ e/(y − w0)
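A small numerical check of both conditions, with illustrative parameter values of my own (y, w0, e and the candidate wage w∗ are not from the slides):

```python
y, w0, e = 10.0, 2.0, 1.0     # output, outside wage, effort cost (illustrative)
w_star = 4.0                  # candidate efficiency wage (illustrative)

def worker_ok(delta):
    """No-shirking condition (w*-e)/(1-δ) >= w* + δ*w0/(1-δ), i.e. w* >= w0 + e/δ."""
    return (w_star - e) / (1 - delta) >= w_star + delta * w0 / (1 - delta)

firm_ok = y - w_star >= 0     # the firm earns a non-negative per-period profit

for delta in (0.2, 0.5, 0.8):
    print(delta, worker_ok(delta) and firm_ok)
# δ = 0.2 fails (the worker would need w* >= w0 + e/δ = 7); δ = 0.5 and 0.8 succeed

print(e / (y - w0))           # lower bound on δ for which some feasible w* exists: 0.125
```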

40 / 41

Efficiency wages: Equilibrium

Recall that we have a sequential-move stage game, in which workers observe wage offers.

If δ is sufficiently large, cooperation is the SPNE outcome after histories in which firms have set w = w∗ (the high-wage, high-output histories)

What is the SPNE for all other histories? Workers will never supply effort, so firms induce them to choose self-employment by setting w = 0 from the next stage on → permanent self-employment

If the worker is ever caught shirking, w = 0 forever after; if the firm deviates from offering w = w∗, then the worker sets e = 0 forever after, so the firm cannot afford to employ the worker

Cooperation is only a SPNE if renegotiation is not feasible

41 / 41