bertinoro, 2011 stochastic games dario bauso › ~bagagiol › stochasticgames.pdf · bertinoro,...

Stochastic games

Bertinoro, 2011

Dario Bauso

• Applications

• Stochastic games: formulation

• Two-player zero-sum games

• Results and open questions

• References

freely inspired by Solan E (2009) Stochastic Games, in Encyclopedia of Database Systems, Springer.

Capital accumulation (Fishery)

• Two players jointly own a natural resource or productive asset.

• at every period they have to decide the amount of resource to consume

• the amount that is not consumed grows by a known (or an unknown) fraction

• state is the current amount of resource

• action set is the amount of resource to be exploited in the current period

• transition is influenced by the decisions of all the players, as well as by the random growth of the resource.

Taxation

• A government sets a tax rate at every period

• each citizen decides at every period how much to work and how much money to consume; the rest of the money grows at a known interest rate by the next period

• state is the citizens' amount of savings

• stage payoff of a citizen depends on i) the money he consumed, ii) the free time he has, iii) the total tax collected by the government.

• stage payoff of the government combines i) the average stage payoff of the citizens and ii) the amount of tax collected

Communication network

• A single-cell system with one receiver, in which multiple uplink transmitters share a single, slotted, synchronous classical collision channel

• at each time slot, transmitters decide whether to transmit and which packet to transmit

• state is channel congestion

• stage payoff combines the probability of successful transmission and the cost of transmission

• dropped packets are backlogged

Queues

• Individuals may choose between a slow private service provider and a powerful public service provider

• state is the current load of the public and private service providers

• the payoff (a cost) is the time to be served

Main ingredients

• Interactions among players

• environment changes in response to players’ behaviors

• players’ stage payoff depends on the players’ current behaviors and on the environment

Formulation

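• Set of players $N$

• state space $S$

• set of actions $A_i$ of player $i$

• set-valued function $A_i(\cdot) : S \to 2^{A_i}$, where $A_i(s) \subseteq A_i$ is the set of actions available to player $i$ in state $s$

• set of admissible state/action-profile pairs

$$SA := \{(s, a) : s \in S,\ a = (a_i)_{i \in N},\ a_i \in A_i(s)\}$$

• stage payoff function $u_i : SA \to \mathbb{R}$ of player $i$

• transition function $q : SA \to \Delta(S)$, where $\Delta(S)$ is the space of probability distributions over $S$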
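For finite state and action sets, the ingredients listed above can be stored directly as lookup tables. The following is a minimal Python sketch under that assumption; the class name `StochasticGame`, its dictionary layout, and the `is_valid` check are illustrative choices, not part of the original formulation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

State = str
Action = str
Profile = Tuple[Action, ...]  # one action per player, in player order


@dataclass
class StochasticGame:
    """Finite stochastic game (N, S, A_i(s), u_i, q) stored as lookup tables (illustrative layout)."""
    n_players: int
    states: List[State]
    # actions[s][i] is A_i(s), the actions available to player i in state s
    actions: Dict[State, List[List[Action]]]
    # payoff[(s, a)] is the vector (u_i(s, a)) for i in N
    payoff: Dict[Tuple[State, Profile], Tuple[float, ...]]
    # transition[(s, a)] maps each next state s' to q(s' | s, a)
    transition: Dict[Tuple[State, Profile], Dict[State, float]]

    def pairs(self):
        """Enumerate SA = {(s, a) : a_i in A_i(s) for every player i}."""
        for s in self.states:
            for a in product(*self.actions[s]):
                yield s, a

    def is_valid(self, tol: float = 1e-9) -> bool:
        """Check that u and q are defined on all of SA and each q(. | s, a) is a distribution on S."""
        for s, a in self.pairs():
            if len(self.payoff.get((s, a), ())) != self.n_players:
                return False
            dist = self.transition.get((s, a))
            if dist is None or any(p < 0 or t not in self.states for t, p in dist.items()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
        return True
```

The check mirrors the definitions: `pairs` enumerates exactly the set $SA$, and `is_valid` asks that $u$ and $q$ be defined on it with $q(\cdot \mid s, a) \in \Delta(S)$.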

Comments

Stochastic games generalize

• finite interactions, if at time $t$ play moves to an absorbing state with payoff 0

• (static) matrix games, if the interaction lasts a single stage ($T = 1$)

• repeated games, if there is a single state

• stopping games, if the stage payoff is 0 until some player chooses to quit, after which play moves to an absorbing state with nonzero payoff

• Markov decision problems, if there is a single player.

• Also, the payoff $u_i$ is a profit (to be maximized), but it can also be a cost (to be minimized)

• actions determine both the current payoff and the future state (hence future payoffs)

• actions, payoffs, and transitions depend only on the current state

Strategies

• given the past play at stage $t$:

$$(s_1, a_1, s_2, a_2, \ldots, s_t)$$

• a (pure) stationary strategy depends on the current state only:

$$\sigma_i(s_1, a_1, s_2, a_2, \ldots, s_t) = \sigma_i(s_t) \in A_i(s_t)$$

i.e., the past play $(s_1, a_1, \ldots, s_{t-1}, a_{t-1})$ does not count

• mixed strategy:

$$\sigma_i(s_1, a_1, s_2, a_2, \ldots, s_t) \in \Delta(A_i(s_t))$$

where $\Delta(A_i(s_t))$ is the set of probability distributions on $A_i(s_t)$

Strategies

• space of stationary mixed strategies of player $i$:

$$X_i = \prod_{s \in S} \Delta(A_i(s))$$

• profile of mixed strategies:

$$\sigma = (\sigma_i)_{i \in N}, \quad \sigma_i \in X_i$$

• the space of infinite plays $H_\infty = SA^{\mathbb{N}}$ is the set of all infinite sequences

$$(s_1, a_1, s_2, a_2, \ldots, s_t, a_t, \ldots)$$

• every profile of mixed strategies $\sigma = (\sigma_i)_{i \in N}$ and initial state $s_1$ induce a probability distribution $P_{s_1,\sigma}$ on $H_\infty$

• this yields a finite, or infinite ($T \to \infty$), stream of stage payoffs

$$u_i(s_t, a_t), \quad t = 1, 2, \ldots, T$$
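The induced distribution $P_{s_1,\sigma}$ can be explored by sampling plays. A minimal sketch, assuming a small hypothetical zero-sum game with an absorbing state `s0` and a stationary mixed profile `sigma` (both invented for illustration): each stage draws an action profile from the players' mixed actions at the current state, then the next state from $q(\cdot \mid s, a)$. Averaging many sampled plays gives a Monte Carlo estimate of the $T$-stage payoff defined in the set-ups.

```python
import random

# Hypothetical zero-sum game; state "s0" is absorbing with payoff 0.
# transition[(s, a)] lists (next_state, probability) pairs, i.e. q(. | s, a);
# payoff[(s, a)] is (u_1(s, a), u_2(s, a)).
transition = {
    ("s1", ("T", "L")): [("s1", 0.7), ("s0", 0.3)],
    ("s1", ("T", "R")): [("s1", 1.0)],
    ("s1", ("B", "L")): [("s0", 1.0)],
    ("s1", ("B", "R")): [("s1", 0.5), ("s0", 0.5)],
    ("s0", ("T", "L")): [("s0", 1.0)],
}
payoff = {
    ("s1", ("T", "L")): (1.0, -1.0),
    ("s1", ("T", "R")): (-1.0, 1.0),
    ("s1", ("B", "L")): (2.0, -2.0),
    ("s1", ("B", "R")): (0.0, 0.0),
    ("s0", ("T", "L")): (0.0, 0.0),
}
# Stationary mixed profile: sigma[i][s] is a distribution on A_i(s).
sigma = [
    {"s1": {"T": 0.5, "B": 0.5}, "s0": {"T": 1.0}},
    {"s1": {"L": 0.4, "R": 0.6}, "s0": {"L": 1.0}},
]


def draw(dist, rng):
    """Sample one outcome from a dict {outcome: probability}."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def simulate(s1, sigma, T, rng):
    """Draw the first T stages (s_t, a_t, u(s_t, a_t)) of a play under P_{s1, sigma}."""
    play, s = [], s1
    for _ in range(T):
        # Stationary: each player's mixed action depends on the current state only.
        a = tuple(draw(sigma_i[s], rng) for sigma_i in sigma)
        play.append((s, a, payoff[(s, a)]))
        s = draw(dict(transition[(s, a)]), rng)
    return play


def estimate_T_stage(s1, sigma, T, n_runs, rng, player=0):
    """Monte Carlo estimate of the T-stage payoff gamma_i^T(s1, sigma)."""
    total = 0.0
    for _ in range(n_runs):
        total += sum(u[player] for _, _, u in simulate(s1, sigma, T, rng)) / T
    return total / n_runs
```

For example, `simulate("s1", sigma, 10, random.Random(0))` returns one sampled play of length 10; passing a seeded `random.Random` keeps runs reproducible.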

Set-ups

• finite-horizon evaluation: the interaction lasts exactly $T$ stages

• discounted evaluation: the interaction lasts many stages and players discount stage payoffs (it is better to receive one dollar today than tomorrow)

• limsup evaluation: the interaction lasts many stages and players do not discount stage payoffs (the stage payoff at any single time $t$ is insignificant compared to the payoffs in all other stages)

Set-ups

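• $T$-stage payoff

$$\gamma_i^T(s_1, \sigma) := \mathbb{E}_{s_1,\sigma}\left[\frac{1}{T}\sum_{t=1}^{T} u_i(s_t, a_t)\right]$$

• $\lambda$-discounted payoff

$$\gamma_i^\lambda(s_1, \sigma) := \mathbb{E}_{s_1,\sigma}\left[\lambda \sum_{t=1}^{\infty} (1-\lambda)^{t-1} u_i(s_t, a_t)\right]$$

• limsup payoff

$$\gamma_i^\infty(s_1, \sigma) := \mathbb{E}_{s_1,\sigma}\left[\limsup_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} u_i(s_t, a_t)\right]$$

where $\mathbb{E}_{s_1,\sigma}$ denotes expectation with respect to $P_{s_1,\sigma}$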
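The quantities inside the expectations above can be computed for a single realized stream of stage payoffs. A minimal sketch with hypothetical helper names; since a program only sees finitely many stages, the discounted sum is truncated and the limsup criterion is approximated by the sequence of running averages.

```python
def t_stage_average(stream):
    """(1/T) * sum_{t=1}^{T} u_t for a realized finite stream u_1, ..., u_T."""
    return sum(stream) / len(stream)


def discounted(stream, lam):
    """lam * sum_{t>=1} (1 - lam)^(t-1) u_t, truncated to the given finite stream.

    The omitted tail is at most (1 - lam)^T * max|u| in absolute value.
    """
    # enumerate starts at 0, which is exactly the exponent t - 1
    return lam * sum((1 - lam) ** k * u for k, u in enumerate(stream))


def running_averages(stream):
    """Averages (1/T) * sum_{t<=T} u_t for T = 1, 2, ...; the limsup criterion looks at their limsup."""
    out, total = [], 0.0
    for T, u in enumerate(stream, start=1):
        total += u
        out.append(total / T)
    return out
```

A stream with a single payoff of 1 followed by zeros separates the three set-ups: the $\lambda$-discounted evaluation returns $\lambda$, the $T$-stage average returns $1/T$, and the running averages tend to 0, so under the limsup evaluation that first stage is insignificant. The factor $\lambda$ in front of the sum normalizes a constant stream of 1s to (almost) 1.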

Equilibria

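• $\sigma$ is a $T$-stage $\varepsilon$-equilibrium if

$$\gamma_i^T(s_1, \sigma) \ge \gamma_i^T(s_1, \sigma_i', \sigma_{-i}) - \varepsilon, \quad \forall\, s_1 \in S,\ i \in N,\ \sigma_i' \in X_i$$

• $\sigma$ is a $\lambda$-discounted $\varepsilon$-equilibrium if

$$\gamma_i^\lambda(s_1, \sigma) \ge \gamma_i^\lambda(s_1, \sigma_i', \sigma_{-i}) - \varepsilon, \quad \forall\, s_1 \in S,\ i \in N,\ \sigma_i' \in X_i$$

• $\sigma$ is a limsup $\varepsilon$-equilibrium if

$$\gamma_i^\infty(s_1, \sigma) \ge \gamma_i^\infty(s_1, \sigma_i', \sigma_{-i}) - \varepsilon, \quad \forall\, s_1 \in S,\ i \in N,\ \sigma_i' \in X_i$$

In each case, no player can gain more than $\varepsilon$ by a unilateral deviation.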
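For stationary profiles, the $\lambda$-discounted condition can be checked numerically. A minimal sketch, assuming finite states and actions stored as dictionaries (a hypothetical layout, with hypothetical function names). The payoff of a stationary profile is the fixed point of $\gamma(s) = \mathbb{E}_\sigma[\lambda u_i(s,a) + (1-\lambda)\sum_{s'} q(s' \mid s,a)\gamma(s')]$. Since the other players are stationary, a best deviation of player $i$ solves a Markov decision problem whose optimum is attained by a pure stationary strategy, so maximizing over pure actions state by state suffices.

```python
from itertools import product


def mixed_backup(sigma, s, v, i, lam, actions, u, q):
    """E_sigma[lam * u_i(s, a) + (1 - lam) * sum_{s'} q(s'|s, a) v(s')] at state s."""
    total = 0.0
    for a in product(*actions[s]):          # pure action profiles available at s
        p = 1.0
        for j, aj in enumerate(a):          # players randomize independently
            p *= sigma[j][s].get(aj, 0.0)
        if p > 0.0:
            cont = sum(pt * v[t] for t, pt in q[(s, a)].items())
            total += p * (lam * u[(s, a)][i] + (1 - lam) * cont)
    return total


def fixed_point(update, states, tol=1e-12):
    """Iterate the (1 - lam)-contraction v -> update(v, .) until successive iterates are tol-close."""
    v = {s: 0.0 for s in states}
    while True:
        w = {s: update(v, s) for s in states}
        if max(abs(w[s] - v[s]) for s in states) < tol:
            return w
        v = w


def is_discounted_eps_equilibrium(sigma, eps, lam, states, actions, u, q):
    """True if no player gains more than eps from a unilateral deviation, at every initial state."""
    for i in range(len(sigma)):
        # gamma_i^lam(., sigma): payoff of player i under the stationary profile.
        gamma = fixed_point(lambda v, s: mixed_backup(sigma, s, v, i, lam, actions, u, q), states)

        # Best deviation of player i against stationary sigma_{-i}: Bellman optimality
        # equation of an MDP, maximizing over player i's pure actions at each state.
        def best(v, s):
            return max(
                mixed_backup(sigma[:i] + [{**sigma[i], s: {ai: 1.0}}] + sigma[i + 1:],
                             s, v, i, lam, actions, u, q)
                for ai in actions[s][i]
            )

        best_reply = fixed_point(best, states)
        if any(gamma[s] < best_reply[s] - eps for s in states):
            return False
    return True
```

With a single state, this reduces to checking a stationary profile in a repeated game; for instance, in repeated matching pennies, uniform mixing by both players passes with $\varepsilon \approx 0$, while both players always choosing the same pure action fails.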

Two-player zero-sum stochastic game

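• the sum of payoffs is zero: $u_1(s, a) + u_2(s, a) = 0$ for all $(s, a) \in SA$

• The game admits at most one equilibrium payoff (termed the value of the game) at every initial state $s_1$

• a strategy $\sigma_1$ is $\varepsilon$-optimal if it guarantees the value up to $\varepsilon$; for the $T$-stage evaluation, with $v^T(s_1)$ the value at $s_1$:

$$\gamma_1^T(s_1, \sigma_1, \sigma_2) \ge v^T(s_1) - \varepsilon, \quad \forall\, \sigma_2 \in X_2$$

and each player's strategy in an $\varepsilon$-equilibrium is $2\varepsilon$-optimal

• Theorem [Shapley 1953, Fink 1964]: if all sets are finite, then for every $\lambda \in (0, 1]$ there exists a $\lambda$-discounted equilibrium in stationary strategies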

Proof

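• $V$: space of all functions $v : S \to \mathbb{R}$.

• for every $s \in S$ and $v \in V$, define the zero-sum matrix game $G_s^\lambda(v)$:

• action sets $A_1(s)$, $A_2(s)$ (the actions available at state $s$)

• payoff (Player 2 pays Player 1):

$$\lambda u_1(s, a) + (1-\lambda) \sum_{s' \in S} q(s' \mid s, a)\, v(s')$$

• define the value operator $\varphi : V \to V$ by $\varphi_s(v) = \mathrm{val}(G_s^\lambda(v))$

• $\varphi$ is a contraction, $\|\varphi(v) - \varphi(w)\|_\infty \le (1-\lambda)\|v - w\|_\infty$, so by the Banach fixed-point theorem it has a unique fixed point $\hat v^\lambda$, the $\lambda$-discounted value

• the stationary strategy that, at every stage $t$, plays an optimal mixed action of player $i$ in the matrix game $G_{s_t}^\lambda(\hat v^\lambda)$ is a $\lambda$-discounted 0-optimal strategy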
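The proof is constructive: iterating $v \mapsto \varphi(v)$ from any starting point converges geometrically, at rate $1-\lambda$, to $\hat v^\lambda$. Below is a minimal Python sketch of this Shapley value iteration. To keep it self-contained, the hypothetical helper `val` handles only matrix games with a pure saddle point or of size 2×2 (enough for the examples in these slides); general games would need a linear-programming solver. The game encoded at the bottom is the absorbing game of Example 1.

```python
def val(m):
    """Value of a zero-sum matrix game m (row player maximizes).

    Handles games with a pure saddle point (including every 1 x n and m x 1 game)
    and 2 x 2 games; larger games would need linear programming.
    """
    lower = max(min(row) for row in m)          # pure maxmin
    upper = min(max(col) for col in zip(*m))    # pure minmax
    if lower == upper:
        return lower
    if len(m) != 2 or len(m[0]) != 2:
        raise NotImplementedError("only 2 x 2 games without a saddle point are handled")
    (a, b), (c, d) = m
    return (a * d - b * c) / (a + d - b - c)   # fully mixed 2 x 2 solution


def shapley_iteration(states, actions, u1, q, lam, tol=1e-12):
    """Iterate v <- phi(v), phi_s(v) = val(G_s^lam(v)), until successive iterates are tol-close."""
    v = {s: 0.0 for s in states}
    while True:
        w = {}
        for s in states:
            rows, cols = actions[s]
            g = [[lam * u1[(s, (r, c))]
                  + (1 - lam) * sum(p * v[t] for t, p in q[(s, (r, c))].items())
                  for c in cols] for r in rows]
            w[s] = val(g)
        if max(abs(w[s] - v[s]) for s in states) < tol:
            return w
        v = w


# Example 1 (absorbing game): s1 and s0 are absorbing with payoff 1 and 0.
states = ["s0", "s1", "s2"]
actions = {"s0": (["T"], ["L"]), "s1": (["T"], ["L"]), "s2": (["T", "B"], ["L", "R"])}
table = {  # (state, (a1, a2)): (u1, next state); all transitions are deterministic
    ("s0", ("T", "L")): (0.0, "s0"),
    ("s1", ("T", "L")): (1.0, "s1"),
    ("s2", ("T", "L")): (0.0, "s2"),
    ("s2", ("T", "R")): (1.0, "s1"),
    ("s2", ("B", "L")): (1.0, "s1"),
    ("s2", ("B", "R")): (0.0, "s0"),
}
u1 = {k: pay for k, (pay, _) in table.items()}
q = {k: {nxt: 1.0} for k, (_, nxt) in table.items()}

v_hat = shapley_iteration(states, actions, u1, q, lam=0.25)
```

With $\lambda = 0.25$ the iterates approach $\hat v^\lambda(s_2) = (1-\sqrt{0.25})/(1-0.25) = 2/3$, the closed form obtained in Example 1.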

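Example 1: Absorbing game

State $s_2$ (each entry: stage payoff of Player 1, next state):

$$\begin{array}{c|cc} & L & R \\ \hline T & 0,\ s_2 & 1,\ s_1 \\ B & 1,\ s_1 & 0,\ s_0 \end{array}$$

State $s_1$: single action pair $(T, L)$ with payoff 1 and next state $s_1$ (absorbing)

State $s_0$: single action pair $(T, L)$ with payoff 0 and next state $s_0$ (absorbing)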

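• for every $v = (v_0, v_1, v_2) \in V = \mathbb{R}^3$, where $v_k = v(s_k)$, the games $G_s^\lambda(v)$ are

$$G_{s_2}^\lambda(v):\quad \begin{array}{c|cc} & L & R \\ \hline T & (1-\lambda)v_2 & \lambda + (1-\lambda)v_1 \\ B & \lambda + (1-\lambda)v_1 & (1-\lambda)v_0 \end{array}$$

$$G_{s_1}^\lambda(v):\ \lambda + (1-\lambda)v_1, \qquad G_{s_0}^\lambda(v):\ (1-\lambda)v_0 \qquad \text{(single entry } (T, L)\text{)}$$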

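• $\hat v^\lambda_0 = \mathrm{val}(G_{s_0}^\lambda(\hat v^\lambda)) = (1-\lambda)\hat v^\lambda_0$ yields $\hat v^\lambda_0 = 0$

• $\hat v^\lambda_1 = \mathrm{val}(G_{s_1}^\lambda(\hat v^\lambda)) = \lambda + (1-\lambda)\hat v^\lambda_1$ yields $\hat v^\lambda_1 = 1$

• $\hat v^\lambda_2 = \mathrm{val}(G_{s_2}^\lambda(\hat v^\lambda))$ yields

$$\hat v^\lambda_2 = \frac{1-\sqrt{\lambda}}{1-\lambda}$$

with optimal stationary strategies

$$\sigma_2 = \left[\frac{1-\sqrt{\lambda}}{1-\lambda}\,(L),\ \frac{\sqrt{\lambda}-\lambda}{1-\lambda}\,(R)\right], \qquad \sigma_1 = \left[\frac{1-\sqrt{\lambda}}{1-\lambda}\,(T),\ \frac{\sqrt{\lambda}-\lambda}{1-\lambda}\,(B)\right]$$

from the indifference conditions (with $\hat v^\lambda_1 = 1$, $\hat v^\lambda_0 = 0$)

$$\hat v^\lambda_2 = y(1-\lambda)\hat v^\lambda_2 + (1-y) = y \qquad (y \text{ prob. Player 2 plays } L)$$

$$\hat v^\lambda_2 = x(1-\lambda)\hat v^\lambda_2 + (1-x) = x \qquad (x \text{ prob. Player 1 plays } T)$$

Substituting $y = \hat v^\lambda_2$ into the first equation gives $(1-\lambda)(\hat v^\lambda_2)^2 - 2\hat v^\lambda_2 + 1 = 0$, whose root in $[0, 1]$ is $(1-\sqrt{\lambda})/(1-\lambda) = 1/(1+\sqrt{\lambda})$.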
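The closed form can be double-checked numerically. A minimal sketch (function names are hypothetical): plugging $\hat v^\lambda = (0, 1, \hat v^\lambda_2)$ into $G_{s_2}^\lambda$, each pure action of either player should earn exactly $\hat v^\lambda_2$ against the opponent's optimal mixed action, which confirms both optimality and the fixed-point equation $\hat v^\lambda_2 = \mathrm{val}(G_{s_2}^\lambda(\hat v^\lambda))$.

```python
import math


def example1_solution(lam):
    """Closed-form v-hat_2 and optimal mixed actions at s2 in Example 1 (0 < lam < 1)."""
    v2 = (1 - math.sqrt(lam)) / (1 - lam)
    x = v2  # probability that Player 1 plays T
    y = v2  # probability that Player 2 plays L
    return v2, x, y


def check_example1(lam, tol=1e-12):
    """Verify the indifference (hence fixed-point) conditions in G^lam_{s2}(v-hat)."""
    v2, x, y = example1_solution(lam)
    v1, v0 = 1.0, 0.0
    tl, tr = (1 - lam) * v2, lam + (1 - lam) * v1   # row T: columns L, R
    bl, br = lam + (1 - lam) * v1, (1 - lam) * v0   # row B: columns L, R
    payoffs = [
        y * tl + (1 - y) * tr,  # Player 1 plays T against sigma_2
        y * bl + (1 - y) * br,  # Player 1 plays B against sigma_2
        x * tl + (1 - x) * bl,  # Player 2 plays L against sigma_1
        x * tr + (1 - x) * br,  # Player 2 plays R against sigma_1
    ]
    # Every pure reply yields v2, so sigma_1 and sigma_2 are optimal and val(G) = v2.
    return all(abs(p - v2) <= tol for p in payoffs)
```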

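Example 2: Big Match

State $s_2$ (each entry: stage payoff of Player 1, next state):

$$\begin{array}{c|cc} & L & R \\ \hline T & 0,\ s_2 & 1,\ s_2 \\ B & 1,\ s_1 & 0,\ s_0 \end{array}$$

State $s_1$: single action pair $(T, L)$ with payoff 1 and next state $s_1$ (absorbing)

State $s_0$: single action pair $(T, L)$ with payoff 0 and next state $s_0$ (absorbing)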

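• for every $v = (v_0, v_1, v_2) \in V = \mathbb{R}^3$, where $v_k = v(s_k)$, the games $G_s^\lambda(v)$ are

$$G_{s_2}^\lambda(v):\quad \begin{array}{c|cc} & L & R \\ \hline T & (1-\lambda)v_2 & \lambda + (1-\lambda)v_2 \\ B & \lambda + (1-\lambda)v_1 & (1-\lambda)v_0 \end{array}$$

$$G_{s_1}^\lambda(v):\ \lambda + (1-\lambda)v_1, \qquad G_{s_0}^\lambda(v):\ (1-\lambda)v_0 \qquad \text{(single entry } (T, L)\text{)}$$

• $G_{s_0}^\lambda$ and $G_{s_1}^\lambda$ yield $\hat v^\lambda_0 = 0$ and $\hat v^\lambda_1 = 1$; then $\hat v^\lambda_2 = \mathrm{val}(G_{s_2}^\lambda(\hat v^\lambda))$ yields

$$\hat v^\lambda_2 = \frac{1}{2}$$

with optimal stationary strategies

$$\sigma_2 = \left[\frac{1}{2}\,(L),\ \frac{1}{2}\,(R)\right], \qquad \sigma_1 = \left[\frac{1}{1+\lambda}\,(T),\ \frac{\lambda}{1+\lambda}\,(B)\right]$$

from the indifference conditions

$$\hat v^\lambda_2 = y(1-\lambda)\hat v^\lambda_2 + (1-y)\left[\lambda + (1-\lambda)\hat v^\lambda_2\right] = y \qquad (y \text{ prob. Player 2 plays } L)$$

$$\hat v^\lambda_2 = x(1-\lambda)\hat v^\lambda_2 + (1-x) = x\left[\lambda + (1-\lambda)\hat v^\lambda_2\right] \qquad (x \text{ prob. Player 1 plays } T)$$

Substituting $y = \hat v^\lambda_2$ into the first equation gives $\lambda(1 - 2\hat v^\lambda_2) = 0$, hence $\hat v^\lambda_2 = \frac12$ for every $\lambda$.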
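The same numerical check applies to the Big Match (function names are hypothetical). It also makes visible that the value $\frac12$ does not depend on $\lambda$, while Player 1's optimal stationary strategy does: the probability of $B$, $\lambda/(1+\lambda)$, vanishes as $\lambda \to 0$.

```python
def big_match_solution(lam):
    """Closed-form v-hat_2 and optimal mixed actions at s2 in the Big Match (0 < lam < 1)."""
    v2 = 0.5
    x = 1 / (1 + lam)  # probability that Player 1 plays T
    y = 0.5            # probability that Player 2 plays L
    return v2, x, y


def check_big_match(lam, tol=1e-12):
    """Verify the indifference (hence fixed-point) conditions in G^lam_{s2}(v-hat)."""
    v2, x, y = big_match_solution(lam)
    v1, v0 = 1.0, 0.0
    tl, tr = (1 - lam) * v2, lam + (1 - lam) * v2   # row T: play stays in s2
    bl, br = lam + (1 - lam) * v1, (1 - lam) * v0   # row B: play is absorbed in s1 or s0
    payoffs = [
        y * tl + (1 - y) * tr,  # Player 1 plays T against sigma_2
        y * bl + (1 - y) * br,  # Player 1 plays B against sigma_2
        x * tl + (1 - x) * bl,  # Player 2 plays L against sigma_1
        x * tr + (1 - x) * br,  # Player 2 plays R against sigma_1
    ]
    return all(abs(p - v2) <= tol for p in payoffs)
```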

Results and open questions

• Can one find, for every stochastic game, a strategy profile that is an ε-equilibrium for every discount factor sufficiently small?

• Can one identify classes of games with a simple strategy profile (e.g., a stationary or periodic strategy profile) that is an ε-equilibrium for every discount factor sufficiently small?

• Theorem [Mertens and Neyman 1981]: in two-player zero-sum games, each player has a strategy that is ε-optimal for every discount factor sufficiently small

• Theorem [Vieille 2000]: every two-player non-zero-sum stochastic game has a strategy profile that is an ε-equilibrium for every discount factor sufficiently small.

Algorithms

• based on linear programming, for two-player zero-sum games

• extensions of the Lemke-Howson algorithm, for nonzero-sum games

• other algorithms based on fictitious play, value iteration, and policy improvement
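As a minimal illustration of policy improvement, the sketch below covers only the single-player special case (a Markov decision problem) under the $\lambda$-discounted criterion. It is Howard-style policy iteration, not an algorithm for general stochastic games, and the data layout and function names are hypothetical. Each round evaluates the current pure stationary policy by fixed-point iteration and then switches, state by state, to an action that strictly improves the one-step lookahead.

```python
def evaluate(policy, states, u, q, lam, tol=1e-13):
    """lam-discounted payoff of a pure stationary policy: fixed point of v = lam*u + (1-lam)*Q v."""
    v = {s: 0.0 for s in states}
    while True:
        w = {s: lam * u[(s, policy[s])]
                + (1 - lam) * sum(p * v[t] for t, p in q[(s, policy[s])].items())
             for s in states}
        if max(abs(w[s] - v[s]) for s in states) < tol:
            return w
        v = w


def policy_iteration(states, actions, u, q, lam):
    """Policy improvement for a single-player, lam-discounted problem; returns (policy, value)."""
    policy = {s: actions[s][0] for s in states}   # arbitrary initial pure stationary policy
    while True:
        v = evaluate(policy, states, u, q, lam)
        changed = False
        for s in states:
            def lookahead(a):
                return lam * u[(s, a)] + (1 - lam) * sum(p * v[t] for t, p in q[(s, a)].items())
            best = max(actions[s], key=lookahead)
            # Switch only on a strict improvement, so the loop cannot cycle between ties.
            if lookahead(best) > lookahead(policy[s]) + 1e-10:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v
```

In a toy problem where "stay" pays 0.5 forever and "go" pays 0 once and then 1 forever, "go" is optimal for small $\lambda$ (value $1-\lambda$) and "stay" for large $\lambda$.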

Additional and future directions

• approximation of games with infinite state and action spaces by finite games

• stochastic games in continuous time

• existence of a uniform equilibrium and a limsup equilibrium in multi-player stochastic games with finite state and action spaces.

• development of efficient algorithms that calculate the value of two-player zero-sum games.

• approachable and excludable sets in stochastic games with vector payoffs

References

• Filar JA, Vrieze K (1996) Competitive Markov decision processes. Springer.

• Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66

• Neyman A, Sorin S (2003) Stochastic games and applications. NATO Science Series. Kluwer

• Solan E (2009) Stochastic games. In: Encyclopedia of Database Systems. Springer.

• Vieille N (2000) Equilibrium in 2-person stochastic games I: A reduction. Israel J Math 119:55–91
