Introduction to Game Theory

7. Repeated Games

Dana Nau

University of Maryland


Repeated Games

- Used by game theorists, economists, and social and behavioral scientists as highly simplified models of various real-world situations

Examples: Repeated Stag Hunt, Roshambo, Iterated Chicken Game, Repeated Matching Pennies, Iterated Prisoner's Dilemma, Repeated Ultimatum Game, Iterated Battle of the Sexes


Finitely Repeated Games

- In repeated games, some game G is played multiple times by the same set of agents
  • G is called the stage game
  • Usually (but not always), G is a normal-form game
- Each occurrence of G is called an iteration or a round
- Usually each agent knows what all the agents did in the previous iterations, but not what they're doing in the current iteration
  • Thus, an imperfect-information game with perfect recall
- Usually each agent's payoff function is additive

Prisoner's Dilemma:

            C       D
  C        3, 3    0, 5
  D        5, 0    1, 1

Iterated Prisoner's Dilemma, with 2 iterations:

                 Agent 1    Agent 2
  Round 1:          C          C
  Round 2:          C          D
  Total payoff:  3+0 = 3    3+5 = 8


Strategies

- The repeated game has a much bigger strategy space than the stage game
- One kind of strategy is a stationary strategy: use the same strategy at every iteration
- More generally, an agent's play at each stage may depend on the history
  • i.e., what happened in previous iterations

[Figure: the game tree of the Iterated Prisoner's Dilemma with 2 iterations.]


Backward Induction

- If the number of iterations is finite and known, we can use backward induction to get a subgame-perfect equilibrium
- Example: finitely many repetitions of the Prisoner's Dilemma
  • In the last round, the dominant strategy is D
  • That's common knowledge
  • So in the 2nd-to-last round, D also is the dominant strategy
  • …
  • The SPE is (D,D) on every round
- As with the Centipede game, this argument is vulnerable to both empirical and theoretical criticisms

[Trace: both agents play D in every one of rounds 1 through 4.]
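To make the induction concrete, here is a minimal Python sketch (hypothetical code, not from the slides): it finds each player's dominant action in the stage game, and since adding a constant continuation value to every outcome of a round preserves dominance, repeating that action in every round gives the SPE.

```python
# Hypothetical sketch of the backward-induction argument for the finitely
# repeated Prisoner's Dilemma. PAYOFF maps a joint move to
# (agent 1's payoff, agent 2's payoff).
PAYOFF = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}

def payoff(p, mine, theirs):
    """Payoff to player p (0 or 1) when p plays `mine` and the other plays `theirs`."""
    pair = (mine, theirs) if p == 0 else (theirs, mine)
    return PAYOFF[pair][p]

def dominant_action(p):
    """Player p's dominant action in the stage game, if one exists."""
    for a in 'CD':
        other = 'D' if a == 'C' else 'C'
        if all(payoff(p, a, b) >= payoff(p, other, b) for b in 'CD'):
            return a
    return None

# Backward induction: in the last round the dominant action is played; adding
# the (constant) continuation value to every outcome of an earlier round does
# not change which action dominates, so the same profile is played every round.
rounds = 4
spe = [(dominant_action(0), dominant_action(1))] * rounds
print(spe)   # [('D', 'D'), ('D', 'D'), ('D', 'D'), ('D', 'D')]
```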


Backward Induction when G is 0-Sum

- As before, backward induction works much better in zero-sum games
- In the last round, the equilibrium is the minimax profile
  • Each agent uses his/her minimax strategy
- That's common knowledge
- So in the 2nd-to-last round, the equilibrium again is the minimax strategies
- …
- The SPE is the minimax profile on every round


Infinitely Repeated Games

- An infinitely repeated game in extensive form would be an infinite tree
  • Payoffs can't be attached to any terminal nodes
  • Payoffs can't be the sums of the payoffs in the stage games (generally infinite)
- Two common ways around this problem. Let r_i^{(1)}, r_i^{(2)}, … be an infinite sequence of payoffs for agent i
  • Agent i's average reward is

      \lim_{k \to \infty} \frac{1}{k} \sum_{j=1}^{k} r_i^{(j)}

  • Agent i's future discounted reward is the discounted sum of the payoffs, i.e.,

      \sum_{j=1}^{\infty} \beta^{j} r_i^{(j)}

    where β (with 0 ≤ β ≤ 1) is a constant called the discount factor
- Two ways to interpret the discount factor:
  1. The agent cares more about the present than the future
  2. The agent cares about the future, but the game ends at any round with probability 1 − β
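As a quick illustration, here is a small Python sketch (assumed payoff numbers, not from the slides) computing both criteria for a finite prefix of a payoff sequence:

```python
def average_reward(payoffs):
    """Average reward over a finite prefix of the payoff sequence
    (the slide's criterion is the limit of this as the prefix grows)."""
    return sum(payoffs) / len(payoffs)

def discounted_reward(payoffs, beta):
    """Future discounted reward: sum over j of beta^j * r^(j), j = 1, 2, ..."""
    return sum(beta**j * r for j, r in enumerate(payoffs, start=1))

payoffs = [3, 0, 1, 1, 3]                        # hypothetical payoffs for agent i
print(average_reward(payoffs))                   # 1.6
print(round(discounted_reward(payoffs, 0.9), 5)) # 0.9*3 + 0.81*0 + ... = 5.85657
```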


Example

- Some well-known strategies for the Iterated Prisoner's Dilemma:
  » AllC: always cooperate
  » AllD (the Hawk strategy): always defect
  » Grim: cooperate until the other agent defects, then defect forever
  » Tit-for-Tat (TFT): cooperate on the first move. On the nth move, repeat the other agent's (n−1)th move
  » Tester: defect on move 1. If the other agent retaliates, play TFT. Otherwise, randomly intersperse cooperation and defection
- If the discount factor is large enough, each of the following is a Nash equilibrium: (TFT, TFT), (TFT, Grim), and (Grim, Grim)

[Traces: TFT or Grim vs. Tester: Tester opens with D, draws one retaliation, then both cooperate. TFT or Grim vs. AllD: after TFT/Grim's opening C, both defect forever. AllC, Grim, or TFT vs. AllC, Grim, or TFT: both cooperate throughout.]
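Here is one plausible Python rendering of these strategies (a sketch; the slides don't give code, and Tester's exact exploitation rule is an assumption):

```python
import random

# Each strategy maps (my history, their history) to 'C' or 'D'.
def all_c(mine, theirs): return 'C'
def all_d(mine, theirs): return 'D'
def grim(mine, theirs):  return 'D' if 'D' in theirs else 'C'
def tft(mine, theirs):   return theirs[-1] if theirs else 'C'

def tester(mine, theirs):
    if not mine:
        return 'D'                    # probe with an opening defection
    if 'D' in theirs:
        return tft(mine, theirs)      # they retaliated: settle into TFT
    return random.choice('CD')        # no retaliation: exploit at random

PAYOFF = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}

def play(s1, s2, rounds=10):
    """Play two strategies against each other; return histories and totals."""
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h1, h2), s2(h2, h1)
        r1, r2 = PAYOFF[(a, b)]
        h1.append(a); h2.append(b); p1 += r1; p2 += r2
    return ''.join(h1), ''.join(h2), p1, p2

print(play(tft, grim))    # mutual cooperation: ('CCCCCCCCCC', 'CCCCCCCCCC', 30, 30)
print(play(tft, all_d))   # TFT cooperates once, then defects: ('CDDDDDDDDD', 'DDDDDDDDDD', 9, 14)
```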


Equilibrium Payoffs for Repeated Games

- There's a "folk theorem" that tells what the possible equilibrium payoffs are in repeated games
- It says roughly the following:
  • In an infinitely repeated game whose stage game is G, there is a Nash equilibrium whose average payoffs are (p_1, p_2, …, p_n) if and only if
  • G has a mixed-strategy profile (s_1, s_2, …, s_n) with the following property:
    For each i, s_i's payoff would be ≤ p_i if the other agents used minimax strategies against i
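Stated a bit more formally (a reconstruction consistent with the slide; the notation v_i for agent i's minimax value is an assumption):

```latex
% Let v_i be agent i's minimax value in the stage game G: the most i can
% guarantee when the other agents jointly try to hold i's payoff down.
\[
  v_i \;=\; \min_{s_{-i}} \, \max_{s_i} \; u_i(s_i, s_{-i})
\]
% Then (p_1, \dots, p_n) arises as the average-payoff profile of some Nash
% equilibrium of the infinitely repeated game if and only if G has a mixed
% profile (s_1, \dots, s_n) achieving (p_1, \dots, p_n) with
\[
  p_i \;\ge\; v_i \qquad \text{for every agent } i .
\]
```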


Zero-Sum Repeated Games

- For two-player zero-sum repeated games, the folk theorem is still true, but it becomes vacuous
- Suppose we iterate a two-player zero-sum game G
  • Let V be the value of G (from the Minimax Theorem)
  • If agent 2 uses a minimax strategy against 1, then 1's maximum payoff is V
    – Thus the max value for p_1 is V, so the min value for p_2 is −V
  • If agent 1 uses a minimax strategy against 2, then 2's maximum payoff is −V
    – Thus the max value for p_2 is −V, so the min value for p_1 is V
- Thus in the iterated game, the only Nash-equilibrium payoff profile is (V, −V)
  • The only way to get this is if each agent always plays his/her minimax strategy
  • If agent 1 plays a non-minimax strategy s_1 and agent 2 plays his/her best response, 2's expected payoff will be higher than −V
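The value V can be computed by linear programming. A minimal sketch using scipy (an illustration, not part of the slides):

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Value and maximin strategy for the row player of zero-sum matrix A."""
    m, n = A.shape
    # variables: x (row mixed strategy) and v (game value); minimize -v
    c = np.concatenate([np.zeros(m), [-1.0]])
    # for every column j:  v - sum_i x_i * A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum x = 1
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]                      # v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Roshambo's payoff matrix for the row player: value 0, uniform maximin strategy
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
v, x = zero_sum_value(A)
print(round(v, 6), np.round(x, 3))   # approximately 0.0 and [0.333 0.333 0.333]
```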


Roshambo (Rock, Paper, Scissors)

              Rock     Paper    Scissors
  Rock        0, 0     −1, 1     1, −1
  Paper       1, −1     0, 0    −1, 1
  Scissors   −1, 1      1, −1    0, 0

- Nash equilibrium for the stage game: choose randomly, P = 1/3 for each move
- Nash equilibrium for the repeated game: always choose randomly, P = 1/3 for each move
  • Expected payoff = 0
- Let's see how that works out in practice …
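A quick simulation of the equilibrium (hypothetical code): both agents randomize uniformly, and the average payoff converges to 0.

```python
import random

MOVES = ['R', 'P', 'S']
BEATS = {'R': 'P', 'P': 'S', 'S': 'R'}   # BEATS[m] is the move that beats m

def score(mine, theirs):
    """Payoff to the first player for one iteration."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1

random.seed(0)
n = 100_000
total = sum(score(random.choice(MOVES), random.choice(MOVES)) for _ in range(n))
print(total / n)   # close to 0, the equilibrium expected payoff
```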


Roshambo (Rock, Paper, Scissors)

- 1999 international roshambo programming competition: www.cs.ualberta.ca/~darse/rsbpc1.html
- Round-robin tournament:
  • 55 programs, 1000 iterations for each pair of programs
  • Lowest possible score = −55000, highest possible score = 55000
- Average over 25 tournaments:
  • Highest score (Iocaine Powder): 13038
  • Lowest score (Cheesebot): −36006
- Very different from the game-theoretic prediction


- A Nash equilibrium strategy is best for you if the other agents also use their Nash equilibrium strategies
- In many cases, the other agents won't use Nash equilibrium strategies
  • If you can forecast their actions accurately, you may be able to do much better than the Nash equilibrium strategy
- Why won't the other agents use their Nash equilibrium strategies?
  • Because they may be trying to forecast your actions too
- Something analogous can happen in non-zero-sum games
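To see how forecasting can beat the equilibrium payoff, here is a toy sketch (the opponent's bias is an assumption, not from the slides): against an opponent who overplays Rock, a simple frequency forecaster earns about +0.3 per iteration instead of 0.

```python
import random
from collections import Counter

MOVES = ['R', 'P', 'S']
BEATS = {'R': 'P', 'P': 'S', 'S': 'R'}   # BEATS[m] is the move that beats m

def score(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1

def biased_opponent():
    # assumed bias: this opponent plays Rock half the time
    return random.choices(MOVES, weights=[0.5, 0.3, 0.2])[0]

random.seed(0)
seen, total, n = Counter(), 0, 100_000
for _ in range(n):
    theirs = biased_opponent()
    # forecast: counter the opponent's most frequent move so far
    mine = BEATS[seen.most_common(1)[0][0]] if seen else random.choice(MOVES)
    total += score(mine, theirs)
    seen[theirs] += 1
print(total / n)   # about +0.3 per iteration, versus 0 for equilibrium play
```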


Iterated Prisoner's Dilemma

- Multiple iterations of the Prisoner's Dilemma
- Widely used to study the emergence of cooperative behavior among agents
  • e.g., Axelrod (1984), The Evolution of Cooperation
- Axelrod ran a famous set of tournaments
  • People contributed strategies encoded as computer programs
  • Axelrod played them against each other

  ("If I defect now, he might punish me by defecting next time.")

Prisoner's Dilemma (the Nash equilibrium is (Defect, Defect)):

               Cooperate    Defect
  Cooperate      3, 3        0, 5
  Defect         5, 0        1, 1


TFT with Other Agents

- In Axelrod's tournaments, TFT usually did best
  » It could establish and maintain cooperation with many other agents
  » It could prevent malicious agents from taking advantage of it

[Traces: TFT vs. TFT, TFT vs. Grim, and TFT vs. AllC all cooperate throughout. TFT vs. Tester: Tester's opening defection draws one retaliation, then both cooperate. TFT vs. AllD: TFT cooperates only on the first move, then both defect forever.]


Example: World War I Trench Warfare

- A real-world example of the IPD, described in Axelrod's book: World War I trench warfare
- Incentive to cooperate:
  • If I attack the other side, then they'll retaliate and I'll get hurt
  • If I don't attack, maybe they won't either
- Result: evolution of cooperation
  • Although the two infantries were supposed to be enemies, they avoided attacking each other


IPD with Noise

- In noisy environments, there's a nonzero probability (e.g., 10%) that a "noise gremlin" will change some of the actions
  • Cooperate (C) becomes Defect (D), and vice versa
- Can use this to model accidents
  • Compute the score using the changed action
- Can also model misinterpretations
  • Compute the score using the original action

[Trace: one agent's C becomes a D through noise; the other agent wonders, "Did he really intend to do that?"]
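A sketch of the noise gremlin (hypothetical code; the 10% rate follows the slide's example). The two scoring conventions distinguish accidents from misinterpretations:

```python
import random

PAYOFF = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}
FLIP = {'C': 'D', 'D': 'C'}

def noisy_round(a1, a2, p_noise=0.1, misinterpretation=False, rng=random):
    """One iteration with the noise gremlin.
    Accident model: the action really changes, so score the changed action.
    Misinterpretation model: score the original action, but each agent will
    see (and react to) the changed one."""
    seen1 = FLIP[a1] if rng.random() < p_noise else a1
    seen2 = FLIP[a2] if rng.random() < p_noise else a2
    scored = (a1, a2) if misinterpretation else (seen1, seen2)
    return PAYOFF[scored], (seen1, seen2)

random.seed(3)
print(noisy_round('C', 'C'))   # each action is corrupted with probability 0.1
```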


Example of Noise

- Story from a British army officer in World War I:
  • I was having tea with A Company when we heard a lot of shouting and went out to investigate. We found our men and the Germans standing on their respective parapets. Suddenly a salvo arrived but did no damage. Naturally both sides got down and our men started swearing at the Germans, when all at once a brave German got onto his parapet and shouted out: "We are very sorry about that; we hope no one was hurt. It is not our fault. It is that damned Prussian artillery."
- The salvo wasn't the German infantry's intention
  • They neither expected nor desired it


Noise Makes It Difficult to Maintain Cooperation

- Consider two agents who both use TFT
- One accident or misinterpretation can cause a long string of retaliations

[Trace: both agents cooperate until noise turns one C into a D; from then on the agents alternate retaliations: C D, D C, C D, D C, …]
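A tiny simulation (hypothetical code) shows the echo: one flipped move turns two TFT agents into alternating retaliators.

```python
def tft(mine, theirs):
    return theirs[-1] if theirs else 'C'

h1, h2 = [], []           # move histories for two TFT agents
for i in range(10):
    a, b = tft(h1, h2), tft(h2, h1)
    if i == 2:
        a = 'D'           # the noise gremlin flips agent 1's move once
    h1.append(a); h2.append(b)
print(''.join(h1))        # CCDCDCDCDC  -- one flip starts an endless echo
print(''.join(h2))        # CCCDCDCDCD     of alternating retaliations
```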


Some Strategies for the Noisy IPD

- Principle: be more forgiving in the face of defections
- Tit-For-Two-Tats (TFTT)
  » Retaliate only if the other agent defects twice in a row
  • Can tolerate isolated defections, but is susceptible to exploitation of its generosity
  • Beaten by the Tester strategy I described earlier
- Generous Tit-For-Tat (GTFT)
  » Forgive randomly: small probability of cooperation if the other agent defects
  » Better than TFTT at avoiding exploitation, but worse at maintaining cooperation
- Pavlov (see the sketch below)
  » Win-Stay, Lose-Shift
  • Repeat my previous move if I earned 3 or 5 points in the previous iteration
  • Reverse my previous move if I earned 0 or 1 points in the previous iteration
  » Thus if the other agent defects continuously, Pavlov will alternately cooperate and defect
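Plausible Python renderings of the three strategies (a sketch; GTFT's forgiveness probability is an assumed parameter):

```python
import random

PAYOFF = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}

def tftt(mine, theirs):
    """Tit-for-Two-Tats: defect only after two consecutive defections."""
    return 'D' if theirs[-2:] == ['D', 'D'] else 'C'

def gtft(mine, theirs, generosity=0.1):
    """Generous TFT: like TFT, but forgive a defection with small probability."""
    if theirs and theirs[-1] == 'D':
        return 'C' if random.random() < generosity else 'D'
    return 'C'

def pavlov(mine, theirs):
    """Win-Stay, Lose-Shift: repeat my move after 3 or 5 points, reverse it after 0 or 1."""
    if not mine:
        return 'C'
    staying_paid = PAYOFF[(mine[-1], theirs[-1])][0] in (3, 5)
    return mine[-1] if staying_paid else ('D' if mine[-1] == 'C' else 'C')

# Against continuous defection, Pavlov alternates C, D, C, D, ... as noted above:
mine, theirs = [], []
for _ in range(6):
    mine.append(pavlov(mine, theirs)); theirs.append('D')
print(''.join(mine))   # CDCDCD
```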


Discussion

- The British army officer's story:
  • A German shouted, "We are very sorry about that; we hope no one was hurt. It is not our fault. It is that damned Prussian artillery."
  • The apology avoided a conflict
  • It was convincing because it was consistent with the German infantry's past behavior
    – The British had ample evidence that the German infantry wanted to keep the peace
- If you can tell which actions are affected by noise, you can avoid reacting to the noise
- IPD agents often behave deterministically
  • For others to cooperate with you, it helps if you're predictable
  • This makes it feasible to build a model from observed behavior


The DBS Agent

- Work by my recent PhD graduate, Tsz-Chiu Au
  • Now a postdoc at the University of Texas
- From the other agent's recent behavior, build a model π of the other agent's strategy
- Use the model to filter noise
- Use the model to help plan our next move

References:
  • Au & Nau. Accident or intention: That is the question (in the iterated prisoner's dilemma). AAMAS, 2006.
  • Au & Nau. Is it accidental or intentional? A symbolic approach to the noisy iterated prisoner's dilemma. In G. Kendall (ed.), The Iterated Prisoners Dilemma: 20 Years On. World Scientific, 2007.


Modeling the Other Agent

- A set of rules of the following form:
  if our last move was m and their last move was m′, then P[their next move will be C] = p
- Four rules: one for each of (C,C), (C,D), (D,C), and (D,D)
- For example, TFT can be described as
  • (C,C) → 1, (C,D) → 1, (D,C) → 0, (D,D) → 0
- How to get the probabilities?
  • One way: look at the agent's behavior in the recent past
  • During the last k iterations, what fraction of the time did the other agent cooperate at iteration j when the agents' moves were (x,y) at iteration j−1?
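A minimal sketch of this estimation (hypothetical code; DBS's actual bookkeeping may differ):

```python
from collections import defaultdict

def estimate_model(my_moves, their_moves, k=10):
    """Estimate P[they play C | (my last, their last)] from the last k iterations."""
    counts = defaultdict(lambda: [0, 0])   # (x, y) -> [times they played C, total]
    start = max(1, len(my_moves) - k)
    for j in range(start, len(my_moves)):
        key = (my_moves[j - 1], their_moves[j - 1])
        counts[key][1] += 1
        if their_moves[j] == 'C':
            counts[key][0] += 1
    return {key: c / n for key, (c, n) in counts.items()}

# TFT's behavior recovered from a trace: they copy my previous move
mine   = list('CCDCDDC')
theirs = list('CCCDCDD')
print(estimate_model(mine, theirs))
# {('C','C'): 1.0, ('D','C'): 0.0, ('C','D'): 1.0, ('D','D'): 0.0}
```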


Modeling the Other Agent

- π can only model a very small set of strategies
- It doesn't even model the Grim strategy correctly:
  • If Grim defects, it may be defecting because of something that happened many moves ago
- But we're not trying to model an agent's entire strategy, just its recent behavior
- If an agent's behavior changes, then the probabilities in π will change
  • e.g., after Grim defects a few times, the rules will give a very low probability of it cooperating again


Noise Filtering

- Suppose the applicable rule is deterministic
  • P[their next move will be C] = 0 or 1
- If the other agent's next move isn't what the rule predicts, then
  • Assume the observed action is noise
  • Behave as if the action were what the rule predicted

[Trace: two mutually cooperating agents; when an isolated defection appears, the agent reasons, "The other agent cooperates when I do. I think these defections are actually noise, so I won't retaliate here."]
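In code, the deterministic-rule filter might look like this (a sketch of the idea, with assumed data structures):

```python
def filter_noise(model, my_last, their_last, observed):
    """If the applicable rule is deterministic and contradicted, treat the
    observation as noise and substitute the predicted move."""
    p_c = model.get((my_last, their_last))
    if p_c == 1.0 and observed == 'D':
        return 'C'          # a defection the model says can't happen: assume noise
    if p_c == 0.0 and observed == 'C':
        return 'D'
    return observed         # rule is probabilistic, or the prediction matched

model = {('C','C'): 1.0, ('C','D'): 1.0, ('D','C'): 0.0, ('D','D'): 0.0}  # TFT
print(filter_noise(model, 'C', 'C', 'D'))   # 'C': the defection is ignored as noise
```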


Change of Behavior

- Anomalies in observed behavior can be due either to noise or to a genuine change of behavior
- Changes of behavior occur because the other agent can change its strategy at any time
  • E.g., if noise affects one of Agent 1's actions, this may trigger a change in Agent 2's behavior
  • Agent 1 does not know this
- How to distinguish noise from a real change of behavior?

[Trace: after an accidental defection, the other agent (Grim: "If you ever betray me, I will never forgive you") defects from then on. These defections are not noise.]


Detection of a Change of Behavior

- Temporary tolerance: when we observe unexpected behavior from the other agent,
  • Don't immediately decide whether it's noise or a real change of behavior
  • Instead, defer judgment for a few iterations
  • If the anomaly persists, then recompute π based on the other agent's recent behavior

[Trace: "The other agent cooperates when I do. The defections might be accidents, so I shouldn't lose my temper too soon. … I think the other agent's behavior has really changed, so I'll change mine too."]


Move Generation

- Modified version of game-tree search
- Use the model π to predict probabilities of the other agent's moves
- Compute the expected utility for move x as

    u_1(x) = \sum_{y \in \{C,D\}} u_1(x, y) \cdot P(y \mid \pi, \text{previous moves})

  where x = my move, y = other agent's move
- Choose the move with the highest expected utility

[Figure: a lookahead tree over the current iteration, the next iteration, and the iteration after next, with branches (C,C), (C,D), (D,C), (D,D).]


Example

Suppose we have the rules
  1. (C,C) → 0.7
  2. (C,D) → 0.4
  3. (D,C) → 0.1
  4. (D,D) → 0.1
and both agents played C in the last iteration, so rule 1 predicts P(C) = 0.7, P(D) = 0.3.

- Suppose we search to depth 1:
    u_1(C) = 0.7 u_1(C,C) + 0.3 u_1(C,D) = 2.1 + 0 = 2.1
    u_1(D) = 0.7 u_1(D,C) + 0.3 u_1(D,D) = 3.5 + 0.3 = 3.8
  » So D looks better
- Is D really what we should choose?
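The depth-1 computation in code (hypothetical; it reproduces the slide's numbers):

```python
PAYOFF1 = {('C','C'): 3, ('C','D'): 0, ('D','C'): 5, ('D','D'): 1}   # my payoffs
MODEL = {('C','C'): 0.7, ('C','D'): 0.4, ('D','C'): 0.1, ('D','D'): 0.1}

def eu(x, p_c):
    """Depth-1 expected utility of my move x when P[they play C] = p_c."""
    return p_c * PAYOFF1[(x, 'C')] + (1 - p_c) * PAYOFF1[(x, 'D')]

p_c = MODEL[('C', 'C')]         # both played C last iteration, so rule 1 applies
print(round(eu('C', p_c), 1))   # 2.1  = 0.7*3 + 0.3*0
print(round(eu('D', p_c), 1))   # 3.8  = 0.7*5 + 0.3*1, so D looks better at depth 1
```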


Example

With the same rules as before, rule 1 predicts P(C) = 0.7, P(D) = 0.3.

- It's not wise to choose D
  » On the move after that, the opponent will retaliate with P = 0.9
  » The depth-1 search didn't see this
- But if we search to depth d > 1, we'll see it
  • C will look better and we'll choose it instead
- In general, it's best to look far ahead
  » e.g., 60 moves


How to Search Deeper

- Game trees grow exponentially with search depth
  » How can we search the tree deeply?
- Key assumption: π accurately models the other agent's future behavior
- Then we can use dynamic programming
  » Makes the search polynomial in the search depth
  » Can easily search to depth 60
  » Equivalent to solving an acyclic MDP of depth 60
- This generates fairly good moves

[Figure: the lookahead collapses to four states per level, (C,C), (C,D), (D,C), and (D,D), across the current iteration, the next iteration, and the iteration after next.]
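A sketch of the dynamic program (hypothetical code): because the model's prediction depends only on the previous joint move, each level of the lookahead has just four states, so depth 60 is cheap. With the example rules from two slides back, the deep search prefers C where depth 1 preferred D.

```python
from functools import lru_cache

PAYOFF1 = {('C','C'): 3, ('C','D'): 0, ('D','C'): 5, ('D','D'): 1}   # my payoffs
MODEL = {('C','C'): 0.7, ('C','D'): 0.4, ('D','C'): 0.1, ('D','D'): 0.1}

@lru_cache(maxsize=None)
def value(state, depth):
    """Max expected utility over `depth` iterations, assuming the model
    predicts the other agent at every future iteration. The state is just
    the previous joint move, so the DP has only 4 states per level."""
    if depth == 0:
        return 0.0
    p_c = MODEL[state]
    best = float('-inf')
    for x in 'CD':
        u = sum(p * (PAYOFF1[(x, y)] + value((x, y), depth - 1))
                for y, p in (('C', p_c), ('D', 1 - p_c)))
        best = max(best, u)
    return best

def best_move(state, depth=60):
    p_c = MODEL[state]
    return max('CD', key=lambda x: sum(
        p * (PAYOFF1[(x, y)] + value((x, y), depth - 1))
        for y, p in (('C', p_c), ('D', 1 - p_c))))

print(best_move(('C', 'C')))   # 'C': the long-term value of cooperation wins
```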


20th Anniversary IPD Competition

http://www.prisoners-dilemma.com

- Category 2: IPD with noise
  • 165 programs participated
- DBS dominated the top 10 places
- Two agents scored higher than DBS
  • They both used master-and-slaves strategies


Master & Slaves Strategy

- Each participant could submit up to 20 programs
- Some submitted programs that could recognize each other
  • (by communicating pre-arranged sequences of Cs and Ds)
- The 20 programs worked as a team
  • 1 master, 19 slaves
- When a slave plays with its master:
  • the slave cooperates and the master defects, maximizing the master's payoff
- When a slave plays with an agent not on its team:
  • it defects, minimizing the other agent's payoff

("My goons give me all their money … and they beat up everyone else.")


Summary

- Finitely repeated games: backward induction
- Infinitely repeated games
  • average reward, future discounted reward
  • equilibrium payoffs
- Non-equilibrium strategies
  • opponent modeling in roshambo
  • iterated prisoner's dilemma with noise
    – opponent models based on observed behavior
    – detection and removal of noise
    – game-tree search against the opponent model
- 20th anniversary IPD competition