Introduction to Game Theory
7. Repeated Games
Dana Nau
University of Maryland
Repeated Games
! Used by game theorists, economists, and social and behavioral scientists
  as highly simplified models of various real-world situations
! Examples: Repeated Stag Hunt, Roshambo, Iterated Chicken Game, Repeated
  Matching Pennies, Iterated Prisoner's Dilemma, Repeated Ultimatum Game,
  Iterated Battle of the Sexes
Finitely Repeated Games
! In repeated games, some game G is played multiple times by the same set
  of agents
! G is called the stage game
  • Usually (but not always), G is a normal-form game
! Each occurrence of G is called an iteration or a round
! Usually each agent knows what all the agents did in the previous
  iterations, but not what they're doing in the current iteration
  ! Thus, an imperfect-information game with perfect recall
! Usually each agent's payoff function is additive
  Prisoner's Dilemma:
                 C      D
          C     3, 3   0, 5
          D     5, 0   1, 1

  Iterated Prisoner's Dilemma, with 2 iterations:
                Agent 1:   Agent 2:
    Round 1:       C          C
    Round 2:       C          D
    Total payoff: 3+0 = 3    3+5 = 8
Strategies
! The repeated game has a much bigger strategy space than the stage game
! One kind of strategy is a stationary strategy:
  ! Use the same strategy at every iteration
! More generally, an agent's play at each stage may depend on the history
  ! What happened in previous iterations
  (Figure: the Iterated Prisoner's Dilemma with 2 iterations, as above)
Backward Induction
! If the number of iterations is finite and known, we can use backward
  induction to get a subgame-perfect equilibrium
! Example: finitely many repetitions of the Prisoner's Dilemma
  ! In the last round, the dominant strategy is D
  ! That's common knowledge
  ! So in the 2nd-to-last round, D also is the dominant strategy
  ! …
! The SPE is (D,D) on every round
! As with the Centipede game, this argument is vulnerable to both
  empirical and theoretical criticisms
                Agent 1:   Agent 2:
    Round 1:       D          D
    Round 2:       D          D
    Round 3:       D          D
    Round 4:       D          D
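
The induction argument can be made concrete in code. Below is a minimal Python sketch (mine, not from the slides) that finds each round's dominant action in the stage game and accumulates the additive payoffs. Because the continuation value of the remaining rounds is a constant added to every stage outcome, D stays dominant in every round.

```python
# Hedged sketch: backward induction on the finitely repeated Prisoner's
# Dilemma. PD[(m1, m2)] gives (agent 1's payoff, agent 2's payoff).
PD = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}
ACTS = ('C', 'D')

def strictly_dominant(player):
    """A stage-game action that beats the alternative against every opponent move."""
    for a in ACTS:
        alt = 'C' if a == 'D' else 'D'
        if all(PD[(a, o)][0] > PD[(alt, o)][0] if player == 0
               else PD[(o, a)][1] > PD[(o, alt)][1] for o in ACTS):
            return a
    return None

def spe_payoffs(rounds):
    """Equilibrium payoffs of the repeated game, found by backward induction."""
    if rounds == 0:
        return (0, 0)
    cont = spe_payoffs(rounds - 1)        # subgame value: a constant offset
    move = (strictly_dominant(0), strictly_dominant(1))   # ('D', 'D')
    stage = PD[move]
    return (stage[0] + cont[0], stage[1] + cont[1])

print(spe_payoffs(4))   # (4, 4): (D,D) on all four rounds
```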
Backward Induction when G is 0-sum
! As before, backward induction works much better in zero-sum games
! In the last round, the equilibrium is the minimax profile
  • Each agent uses his/her minimax strategy
! That's common knowledge
! So in the 2nd-to-last round, it again is the minimax strategies
  …
! The SPE is the minimax profile on every round
Infinitely Repeated Games
! An infinitely repeated game in extensive form would be an infinite tree
  ! Payoffs can't be attached to any terminal nodes
  ! Payoffs can't be the sums of the payoffs in the stage games
    (generally infinite)
! Two common ways around this problem
  ! Let r_i^(1), r_i^(2), … be an infinite sequence of payoffs for agent i
  ! Agent i's average reward is
        \lim_{k \to \infty} \frac{1}{k} \sum_{j=1}^{k} r_i^{(j)}
  ! Agent i's future discounted reward is the discounted sum of the
    payoffs, i.e.,
        \sum_{j=1}^{\infty} \gamma^j \, r_i^{(j)}
    where γ (with 0 ≤ γ ≤ 1) is a constant called the discount factor
! Two ways to interpret the discount factor:
  1. The agent cares more about the present than the future
  2. The agent cares about the future, but the game ends at any round
     with probability 1 − γ
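
A small Python sketch of the two aggregations, applied to a finite prefix of the reward stream (the example rewards and γ = 0.9 are assumed values, not from the slides):

```python
# Hedged sketch of the two payoff aggregations on a finite prefix
# of the reward stream r_i^(1), r_i^(2), ...
def average_reward(rewards):
    """Empirical version of lim_{k->inf} (1/k) * sum_j r^(j)."""
    return sum(rewards) / len(rewards)

def discounted_reward(rewards, gamma=0.9):
    """sum_j gamma^j * r^(j), with j starting at 1 as on the slide."""
    return sum(gamma**j * r for j, r in enumerate(rewards, start=1))

rewards = [3, 3, 0, 5, 1]               # payoffs from five PD rounds
print(average_reward(rewards))          # 2.4
print(discounted_reward(rewards))       # 0.9*3 + 0.81*3 + ... ≈ 9.0
```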
Example
! Some well-known strategies for the Iterated Prisoner's Dilemma:
  » AllC: always cooperate
  » AllD (the Hawk strategy): always defect
  » Grim: cooperate until the other agent defects, then defect forever
  » Tit-for-Tat (TFT): cooperate on the first move. On the nth move,
    repeat the other agent's (n–1)th move
  » Tester: defect on move 1. If the other agent retaliates, play TFT.
    Otherwise, randomly intersperse cooperation and defection
! If the discount factor is large enough, each of the following is a
  Nash equilibrium:
  ! (TFT, TFT), (TFT, Grim), and (Grim, Grim)
  (Diagrams: TFT vs Tester, where Tester's opening defection is punished
  and both then cooperate; TFT or Grim vs AllD, one exploited cooperation
  and then mutual defection forever; AllC, Grim, or TFT against AllC,
  Grim, or TFT, cooperation on every move)
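
These strategies are easy to express in code. The following Python sketch is mine, not the slides'; in particular, the timing inside Tester (probe, wait one move, then check whether the probe was punished) is one plausible reading of the description above.

```python
import random

# Hedged sketches of the slide's strategies. Each maps
# (my history, opponent's history) to the next move.
def all_c(mine, theirs): return 'C'
def all_d(mine, theirs): return 'D'
def grim(mine, theirs):  return 'D' if 'D' in theirs else 'C'
def tft(mine, theirs):   return theirs[-1] if theirs else 'C'

def tester(mine, theirs):
    # One plausible reading: defect on move 1; if the opponent retaliates,
    # apologize once and settle into TFT; otherwise exploit at random.
    if len(mine) == 0: return 'D'
    if len(mine) == 1: return 'C'
    if theirs[1] == 'D':                          # the probe was punished
        return 'C' if len(mine) == 2 else tft(mine, theirs)
    return random.choice('CD')

def play(s1, s2, rounds=8):
    h1, h2 = [], []
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        h1.append(m1); h2.append(m2)
    return ''.join(h1), ''.join(h2)

print(play(tft, tester))   # ('CDCCCCCC', 'DCCCCCCC'): punished, then peace
```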
Equilibrium Payoffs for Repeated Games
! There's a "folk theorem" that tells what the possible equilibrium
  payoffs are in repeated games
! It says roughly the following:
  ! In an infinitely repeated game whose stage game is G, there is a
    Nash equilibrium whose average payoffs are (p1, p2, …, pn)
    if and only if
  ! G has a mixed-strategy profile (s1, s2, …, sn) with the following
    property:
    • For each i, si's payoff would be ≥ pi if the other agents used
      minimax strategies against i
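
For reference, here is a sketch of the usual notation behind "minimax strategies against i" (standard phrasing, not a quote from the slides): agent i's minimax value v_i is the payoff the other agents can hold i down to, and the folk theorem characterizes the equilibrium average payoffs as the feasible profiles that give every agent at least v_i.

```latex
% Agent i's minimax value in the stage game G:
v_i \;=\; \min_{s_{-i}} \; \max_{s_i} \; u_i(s_i,\, s_{-i})
% The folk theorem then says the equilibrium average payoffs of the
% infinitely repeated game are exactly the feasible profiles
% (p_1,\dots,p_n) with p_i \ge v_i for every agent i.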
Zero-Sum Repeated Games
! For two-player zero-sum repeated games, the folk theorem is still true,
  but it becomes vacuous
! Suppose we iterate a two-player zero-sum game G
  ! Let V be the value of G (from the Minimax Theorem)
  ! If agent 2 uses a minimax strategy against 1, then 1's maximum payoff is V
    • Thus the max value for p1 is V, so the min value for p2 is –V
  ! If agent 1 uses a minimax strategy against 2, then 2's maximum payoff is –V
    • Thus the max value for p2 is –V, so the min value for p1 is V
! Thus in the iterated game, the only Nash-equilibrium payoff profile is (V, –V)
! The only way to get this is if each agent always plays his/her minimax strategy
  • If agent 1 plays a non-minimax strategy s1 and agent 2 plays his/her
    best response, 2's expected payoff will be higher than –V
Roshambo (Rock, Paper, Scissors)
                 A2: Rock    Paper    Scissors
  A1: Rock         0, 0      –1, 1     1, –1
      Paper        1, –1      0, 0    –1, 1
      Scissors    –1, 1       1, –1    0, 0
! Nash equilibrium for the stage game:
  ! choose randomly, P = 1/3 for each move
! Nash equilibrium for the repeated game:
  ! always choose randomly, P = 1/3 for each move
! Expected payoff = 0
! Let's see how that works out in practice …
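
A quick Monte Carlo check of that prediction in Python (a sketch; the sample size is arbitrary):

```python
import random

# Uniform-random roshambo: each move beats the one it maps to.
BEATS = {'R': 'S', 'P': 'R', 'S': 'P'}

def payoff(a, b):
    """Agent 1's payoff: +1 for a win, -1 for a loss, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

n = 100_000
total = sum(payoff(random.choice('RPS'), random.choice('RPS'))
            for _ in range(n))
print(total / n)   # close to 0, the equilibrium expected payoff
```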
Roshambo (Rock, Paper, Scissors)
! 1999 international roshambo programming competition
  www.cs.ualberta.ca/~darse/rsbpc1.html
! Round-robin tournament:
  • 55 programs, 1000 iterations for each pair of programs
  • Lowest possible score = –55000, highest possible score = 55000
! Average over 25 tournaments:
  • Highest score (Iocaine Powder): 13038
  • Lowest score (Cheesebot): –36006
! Very different from the game-theoretic prediction
  (Payoff matrix as on the previous slide)
! A Nash equilibrium strategy is best for you if the other agents also
  use their Nash equilibrium strategies
! In many cases, the other agents won't use Nash equilibrium strategies
! If you can forecast their actions accurately, you may be able to do
much better than the Nash equilibrium strategy
! Why won’t the other agents use their Nash equilibrium strategies?
! Because they may be trying to forecast your actions too
! Something analogous can happen in non-zero-sum games
Iterated Prisoner's Dilemma
! Multiple iterations of the Prisoner's Dilemma
! Widely used to study the emergence of cooperative behavior among agents
  ! e.g., Axelrod (1984), The Evolution of Cooperation
! Axelrod ran a famous set of tournaments
  ! People contributed strategies encoded as computer programs
  ! Axelrod played them against each other

  Prisoner's Dilemma:
              P2: Cooperate   Defect
  P1: Cooperate    3, 3        0, 5
      Defect       5, 0        1, 1   ← (Defect, Defect) is the Nash equilibrium
  (Speech bubble in the figure: "If I defect now, he might punish me by
  defecting next time")
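
A minimal Axelrod-style round-robin in Python. This is a sketch, not Axelrod's actual setup: the entry list, the 200-round match length, and the scoring loop are all assumptions.

```python
# Round-robin IPD tournament over a small assumed entry list.
PD = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}

def tft(mine, theirs):   return theirs[-1] if theirs else 'C'
def all_d(mine, theirs): return 'D'
def grim(mine, theirs):  return 'D' if 'D' in theirs else 'C'

def match(s1, s2, rounds=200):
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        a, b = PD[(m1, m2)]
        p1 += a; p2 += b
        h1.append(m1); h2.append(m2)
    return p1, p2

entries = {'TFT': tft, 'AllD': all_d, 'Grim': grim}
totals = dict.fromkeys(entries, 0)
names = list(entries)
for i, n1 in enumerate(names):
    for n2 in names[i+1:]:               # each pair plays once
        p1, p2 = match(entries[n1], entries[n2])
        totals[n1] += p1; totals[n2] += p2
print(totals)   # {'TFT': 799, 'AllD': 408, 'Grim': 799}: cooperators outscore AllD
```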
TFT with Other Agents
! In Axelrod's tournaments, TFT usually did best
  » It could establish and maintain cooperation with many other agents
  » It could prevent malicious agents from taking advantage of it
  (Diagrams: TFT vs TFT, TFT vs Grim, and TFT vs AllC cooperate on every
  move; TFT vs Tester settles into cooperation after Tester's opening
  defection is punished; TFT vs AllD defects from round 2 onward)
Example:
! A real-world example of the IPD, described in Axelrod’s book:
! World War I trench warfare
! Incentive to cooperate:
! If I attack the other side, then they’ll retaliate and I’ll get hurt
! If I don’t attack, maybe they won’t either
! Result: evolution of cooperation
! Although the two infantries were supposed to be enemies, they
avoided attacking each other
IPD with Noise
! In noisy environments, there's a nonzero probability (e.g., 10%) that a
  "noise gremlin" will change some of the actions
  • Cooperate (C) becomes Defect (D), and vice versa
! Can use this to model accidents
  ! Compute the score using the changed action
! Can also model misinterpretations
  ! Compute the score using the original action
  (Diagram: noise turns one agent's C into a D; the other agent wonders,
  "Did he really intend to do that?")
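
In code, the noise model might look like the following sketch (the function names and the accident/misinterpretation encoding are mine):

```python
import random

def gremlin(move, p=0.1):
    """With probability p, flip C <-> D (the slide's example noise level)."""
    if random.random() < p:
        return 'D' if move == 'C' else 'C'
    return move

def noisy_round(intended, p=0.1, mode='accident'):
    """Return (scored action, observed action) for one agent's move.
    Accident: the changed action is both scored and observed.
    Misinterpretation: the original action is scored, the changed one observed."""
    changed = gremlin(intended, p)
    if mode == 'accident':
        return changed, changed
    return intended, changed
```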
Example of Noise
! Story from a British army officer in World War I:
  ! "I was having tea with A Company when we heard a lot of shouting and
    went out to investigate. We found our men and the Germans standing on
    their respective parapets. Suddenly a salvo arrived but did no damage.
    Naturally both sides got down and our men started swearing at the
    Germans, when all at once a brave German got onto his parapet and
    shouted out: 'We are very sorry about that; we hope no one was hurt.
    It is not our fault. It is that damned Prussian artillery.'"
! The salvo wasn't the German infantry's intention
  ! They neither expected nor desired it
Noise Makes it Difficult to Maintain Cooperation
! Consider two agents who both use TFT
! One accident or misinterpretation can cause a long string of retaliations
  (Diagram: noise turns one of agent 1's cooperations into a defection;
  the two agents then trade retaliations indefinitely)
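
A short self-contained Python demo of that echo: a single flipped move at round 3 (an arbitrary choice) locks two TFT agents into alternating retaliations.

```python
def tft(mine, theirs):
    return theirs[-1] if theirs else 'C'

h1, h2 = [], []
for t in range(10):
    m1, m2 = tft(h1, h2), tft(h2, h1)
    if t == 2:
        m1 = 'D'                  # the noise gremlin flips agent 1's C
    h1.append(m1); h2.append(m2)

print(''.join(h1))                # CCDCDCDCDC
print(''.join(h2))                # CCCDCDCDCD  (retaliations echo forever)
```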
Some Strategies for the Noisy IPD
! Principle: be more forgiving in the face of defections
! Tit-For-Two-Tats (TFTT)
» Retaliate only if the other agent defects twice in a row
• Can tolerate isolated instances of defections, but susceptible to
  exploitation of its generosity
• Beaten by the Tester strategy described earlier
! Generous Tit-For-Tat (GTFT)
» Forgive randomly: small probability of cooperation if the other agent defects
» Better than TFTT at avoiding exploitation, but worse at maintaining cooperation
! Pavlov
» Win-Stay, Lose-Shift
• Repeat previous move if I earn 3 or 5 points in the previous iteration
• Reverse previous move if I earn 0 or 1 points in the previous iteration
» Thus if the other agent defects continuously, Pavlov will alternately
  cooperate and defect
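
A Python sketch of Pavlov under the payoff matrix used throughout; reading "win" as earning at least 3 points is my encoding of the rule above. The driver shows the alternation against a continual defector.

```python
# Win-stay, lose-shift with the usual PD payoffs.
PD = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}

def pavlov(mine, theirs):
    if not mine:
        return 'C'
    my_last_payoff = PD[(mine[-1], theirs[-1])][0]
    if my_last_payoff >= 3:                       # earned 3 or 5: "win", stay
        return mine[-1]
    return 'D' if mine[-1] == 'C' else 'C'        # earned 0 or 1: "lose", shift

# Against a continual defector, Pavlov alternates C, D, C, D, ...
mine, theirs = [], []
for _ in range(6):
    mine.append(pavlov(mine, theirs)); theirs.append('D')
print(''.join(mine))   # CDCDCD
```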
Discussion
! The British army officer's story:
  ! a German shouted, "We are very sorry about that; we hope no one was
    hurt. It is not our fault. It is that damned Prussian artillery."
! The apology avoided a conflict
  ! It was convincing because it was consistent with the German infantry's
    past behavior
  ! The British had ample evidence that the German infantry wanted to
    keep the peace
! If you can tell which actions are affected by noise, you can avoid reacting
to the noise
! IPD agents often behave deterministically
! For others to cooperate with you it helps if you’re predictable
! This makes it feasible to build a model from observed behavior
The DBS Agent
! Work by my recent PhD graduate, Tsz-Chiu Au
  ! Now a postdoc at the University of Texas
! From the other agent's recent behavior, build a model π of the other
  agent's strategy
! Use the model to filter noise
! Use the model to help plan our next move

  Au & Nau. Accident or intention: That is the question (in the iterated
  prisoner's dilemma). AAMAS, 2006.
  Au & Nau. Is it accidental or intentional? A symbolic approach to the
  noisy iterated prisoner's dilemma. In G. Kendall (ed.), The Iterated
  Prisoners Dilemma: 20 Years On. World Scientific, 2007.
Modeling the other agent
! A set of rules of the following form:
    if our last move was m and their last move was m′,
    then P[their next move will be C] = p
! Four rules: one for each of (C,C), (C,D), (D,C), and (D,D)
! For example, TFT can be described as
  ! (C,C) → 1, (C,D) → 1, (D,C) → 0, (D,D) → 0
! How to get the probabilities?
  ! One way: look at the agent's behavior in the recent past
! During the last k iterations,
  ! What fraction of the time did the other agent cooperate at iteration j
    when the agents' moves were (x,y) at iteration j–1?
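
That sliding-window estimate is a few lines of Python (the window size k = 20 and the dictionary layout are assumptions, not from the papers):

```python
from collections import defaultdict

def fit_rules(my_hist, opp_hist, k=20):
    """Estimate P[they play C at iteration j | joint move at j-1],
    using only the last k iterations."""
    seen, coop = defaultdict(int), defaultdict(int)
    start = max(1, len(my_hist) - k)
    for j in range(start, len(my_hist)):
        key = (my_hist[j-1], opp_hist[j-1])       # (our move, their move)
        seen[key] += 1
        coop[key] += (opp_hist[j] == 'C')
    return {key: coop[key] / seen[key] for key in seen}

# Against a TFT opponent the estimates approach the description above:
# (C,C) -> 1, (C,D) -> 1, (D,C) -> 0, (D,D) -> 0.
```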
Modeling the other agent
! π can only model a very small set of strategies
! It doesn’t even model the Grim strategy correctly:
! If Grim defects, it may be defecting because of something that
happened many moves ago
! But we’re not trying to model an agent’s entire strategy, just its recent
behavior
! If an agent's behavior changes, then the probabilities in π will change
! e.g., after Grim defects a few times, the rules will give a very low
probability of it cooperating again
Noise Filtering
! Suppose the applicable rule is deterministic
  ! P[their next move will be C] = 0 or 1
! If the other agent's next move isn't what the rule predicts, then
  ! Assume the observed action is noise
  ! Behave as if the action were what the rule predicted
  (Diagram: the other agent cooperates when I do; "I think these
  defections are actually noise, so I won't retaliate here")
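
The filtering step itself is tiny; here is a sketch (how the corrected move feeds back into DBS's bookkeeping is agent-specific and not shown):

```python
def filter_move(p_coop, observed):
    """If the applicable rule is deterministic and contradicted,
    treat the observation as noise and keep the predicted move."""
    if p_coop == 1.0 and observed == 'D':
        return 'C'
    if p_coop == 0.0 and observed == 'C':
        return 'D'
    return observed   # rule is probabilistic, or prediction confirmed
```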
Change of Behavior
! Anomalies in observed behavior can be due either to noise or to a
  genuine change of behavior
! Changes of behavior occur because the other agent can change its
  strategy at any time
  ! E.g., if noise affects one of Agent 1's actions, this may trigger a
    change in Agent 2's behavior
    • Agent 1 does not know this
! How to distinguish noise from a real change of behavior?
  (Diagram: the other agent announces "I am Grim. If you ever betray me,
  I will never forgive you"; its defections after a triggering move are
  not noise)
Detection of a Change of Behavior
! Temporary tolerance: when we observe unexpected behavior from the
  other agent,
  ! Don't immediately decide whether it's noise or a real change of behavior
  ! Instead, defer judgment for a few iterations
! If the anomaly persists, then recompute π based on the other agent's
  recent behavior
  (Diagram: "The defections might be accidents, so I shouldn't lose my
  temper too soon" … "I think the other agent's behavior has really
  changed, so I'll change mine too")
Move generation
! Modified version of game-tree search
! Use the policy π to predict probabilities of the other agent's moves
! Compute the expected utility for move x as
      u_1(x) = \sum_{y \in \{C,D\}} u_1(x,y) \, P(y \mid \pi, \text{previous moves})
  where x = my move, y = other agent's move
! Choose the move with the highest expected utility
  (Diagram: the look-ahead tree over the current iteration, the next
  iteration, and the iteration after next, branching on (C,C), (C,D),
  (D,C), (D,D))
Example
! Suppose we have the rules
  1. (C,C) → 0.7
  2. (C,D) → 0.4
  3. (D,C) → 0.1
  4. (D,D) → 0.1
! Suppose the last joint move was (C,C), so rule 1 predicts
  P(C) = 0.7, P(D) = 0.3
! Suppose we search to depth 1
    u1(C) = 0.7 u1(C,C) + 0.3 u1(C,D) = 2.1 + 0 = 2.1
    u1(D) = 0.7 u1(D,C) + 0.3 u1(D,D) = 3.5 + 0.3 = 3.8
  » So D looks better
! Is D really what we should choose?
Example (continued)
! Same rules as before; rule 1 predicts P(C) = 0.7, P(D) = 0.3
! It's not wise to choose D
  » On the move after that, the opponent will retaliate with P = 0.9
  » The depth-1 search didn't see this
! But if we search to depth d > 1, we'll see it
  ! C will look better and we'll choose it instead
! In general, it's best to look far ahead
  » e.g., 60 moves
How to Search Deeper
! Game trees grow exponentially with search depth
  » How can we search the tree deeply?
! Key assumption: π accurately models the other agent's future behavior
! Then we can use dynamic programming
  » Makes the search polynomial in the search depth
  » Can easily search to depth 60
  » Equivalent to solving an acyclic MDP of depth 60
! This generates fairly good moves
  (Diagram: the look-ahead tree over the current iteration, the next
  iteration, and the iteration after next, branching on (C,C), (C,D),
  (D,C), (D,D))
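
Here is a hedged Python sketch of that dynamic program, using the rule probabilities from the example two slides back. Treating the last joint move as the MDP state and memoizing makes the look-ahead linear in the depth rather than exponential.

```python
from functools import lru_cache

PD = {('C','C'): (3,3), ('C','D'): (0,5), ('D','C'): (5,0), ('D','D'): (1,1)}
PI = {('C','C'): 0.7, ('C','D'): 0.4,
      ('D','C'): 0.1, ('D','D'): 0.1}   # P[they play C | last joint move]

@lru_cache(maxsize=None)                # memoization: 4 states x depth entries
def best(state, depth):
    """(best move, expected utility) with `depth` iterations left."""
    if depth == 0:
        return None, 0.0
    p_coop = PI[state]
    choices = []
    for x in 'CD':
        u = sum(p * (PD[(x, y)][0] + best((x, y), depth - 1)[1])
                for y, p in (('C', p_coop), ('D', 1 - p_coop)))
        choices.append((u, x))
    u, x = max(choices)
    return x, u

print(best(('C','C'), 1))    # ('D', 3.8): the myopic answer from the example
print(best(('C','C'), 60))   # ('C', ...): deeper search prefers cooperation
```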
20th Anniversary IPD Competition
http://www.prisoners-dilemma.com
! Category 2: IPD with noise
  ! 165 programs participated
! DBS dominated the top 10 places
! Two agents scored higher than DBS
  ! They both used master-and-slaves strategies
Master & Slaves Strategy
! Each participant could submit up to 20 programs
! Some submitted programs that could recognize each other
  ! (by communicating pre-arranged sequences of Cs and Ds)
! The 20 programs worked as a team
  • 1 master, 19 slaves
! When a slave plays with its master
  • Slave cooperates, master defects
    => maximizes the master's payoff
! When a slave plays with an agent not in its team
  • It defects
    => minimizes the other agent's payoff
  (Cartoon: "My goons give me all their money … and they beat up
  everyone else")
Summary
! Finitely repeated games – backward induction
! Infinitely repeated games
! average reward, future discounted reward
! equilibrium payoffs
! Non-equilibrium strategies
! opponent modeling in roshambo
! iterated prisoner’s dilemma with noise
• opponent models based on observed behavior
• detection and removal of noise
• game-tree search against the opponent model
! 20th anniversary IPD competition