Repeated ARAT Games

Author(s): F. Evangelista, T. E. S. Raghavan and O. J. Vrieze

Source: Lecture Notes-Monograph Series, Vol. 30, Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (1996), pp. 13-28. Published by: Institute of Mathematical Statistics. Stable URL: http://www.jstor.org/stable/4355935

Statistics, Probability and Game Theory IMS Lecture Notes - Monograph Series (1996) Volume 30

REPEATED ARAT GAMES

F. Evangelista, T. E. S. Raghavan and O. J. Vrieze

University of Wisconsin, University of Illinois, University of Limburg

Abstract

We show the existence of $\varepsilon$-equilibria in stationary strategies for the class of repeated games with Additive Reward and Additive Transition (ARAT) structure. A new approach to existence questions for stochastic games is introduced in the proof: the strategy space of one player is perturbed so that the player uses any pure action with at least probability $\varepsilon$. For this $\varepsilon$-perturbed game, equilibria in stationary strategies exist. By analyzing the limit properties of these strategies, the existence of stationary $\varepsilon$-equilibria in the original game follows.

1. Introduction. When Shapley [1953] first defined stochastic games and proved the existence of value and stationary optimal strategies, he essentially gave a complete result for zero-sum games with discounted payoffs. Independently, Blackwell [1962, 1965] initiated the systematic study of Markovian decision processes. He investigated the nature of optimal policies for discounted payoffs and the relation between the discounted and the limiting average (undiscounted) case. Together with Ferguson (Blackwell and Ferguson [1968]), he analyzed the stochastic game called "The Big Match," revealing an important difference between Markovian decision processes and stochastic games. This zero-sum game with limiting average payoff has a value. But unlike Markovian decision processes, where the player has a stationary optimal strategy, only a behavioral $\varepsilon$-optimal strategy exists for the maximizer. In the Big Match, while one player has no power to terminate the game, the opponent can terminate the game at any time by choosing the absorbing row. It is this freedom to terminate the game that complicates the player's near-optimal strategy. Generalizing the Big Match as a single loop stochastic game which can be terminated by one player, Filar extended the game and showed that such games once again admit an $\varepsilon$-optimal, but only behavioral, strategy for the controlling player [Filar (1981)].
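For readers unfamiliar with it, the Big Match can be written, in one standard presentation (our rendering, not displayed in the original text), as a $2 \times 2$ repeated game in which the starred entries are absorbing:

```latex
% The Big Match (Blackwell and Ferguson, 1968); player I chooses rows.
% As long as player I plays the top row the game stays in the non-absorbing
% state; once he plays the bottom row, the entry chosen that day is received
% at every decision moment thereafter.
\[
\begin{pmatrix}
1 & 0 \\
0^{*} & 1^{*}
\end{pmatrix}
\]
```

The limiting average value of this game is $1/2$, yet the maximizer can guarantee $1/2 - \varepsilon$ only with history-dependent (behavioral) strategies.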

Attempts by Kohlberg [1974], and Bewley and Kohlberg [1976], to extend this result to general undiscounted payoffs led them to study the Puiseux expansion of the value function. Using ideas from Blackwell and Ferguson, and from Bewley and Kohlberg, Mertens and Neyman [1981] proved the existence of value for all limiting average zero-sum stochastic games.


Fink [1964] extended Shapley's result to nonzero-sum games, showing the existence of equilibria in stationary strategies for games with discounted payoffs. However, there is as yet no result analogous to that of Mertens and Neyman: the existence of $\varepsilon$-equilibria in nonzero-sum games with limiting average payoff is still an open problem.

Another direction in the study of stochastic games with limiting average payoff is to identify subclasses of games that have $\varepsilon$-equilibria in stationary strategies by imposing conditions on the structure of the payoffs and/or transition probabilities. The following examples from the literature are games with $\varepsilon$-equilibria in stationary strategies for both the zero-sum and nonzero-sum cases: irreducible/unichain stochastic games (Rogers [1969], Stern [1975], Thuijsman and Vrieze [1991]); single controller stochastic games (Parthasarathy and Raghavan [1981]); stochastic games with state independent transitions and separable rewards (Parthasarathy, Tijs and Vrieze [1984]). Filar and Raghavan [1991] provide a detailed survey of these games.

In this paper we add another class to the list of nonzero-sum games with $\varepsilon$-equilibria in stationary strategies. We show that repeated games with absorbing states for which the additive reward and additive transition (ARAT) property holds have $\varepsilon$-equilibria, and sometimes even equilibria ($\varepsilon = 0$), in stationary strategies.

"The Big Match" is an example of a repeated game with absorbing states. It has only one non-absorbing state; whenever the game exits from this state to one of the other states, the game is essentially over, since it will remain forever in one of the absorbing states with the same payoffs at every decision moment. Using the idea of threat strategies, Vrieze and Thuijsman [1989] showed that repeated games with absorbing states have $\varepsilon$-equilibria, generally in non-stationary strategies.

In a stochastic game with an ARAT structure, the rewards at each decision moment can be written as the sum of a term dependent on player I's action and the state variable and a term dependent on player II's action and the state variable. Moreover, the same holds for the transitions. Zero-sum ARAT games have optimal stationary strategies (Raghavan, Tijs and Vrieze [1985]), but this is not true in the nonzero-sum case (Evangelista [1993]). We end this introduction with the remark that Thuijsman and Raghavan [1994] showed that both switching control and ARAT stochastic games possess $\varepsilon$-equilibria which are stationary-like; that is, a retaliating player may need to use non-stationary strategies only in the case of a retaliation as a consequence of a deviation of the other player. Recently, Flesch [1995] gave an example of a perfect information game that does not have $\varepsilon$-equilibria in stationary strategies.

2. Preliminaries. In this section we will give some properties of repeated ARAT games which will be used in the proof of the main result of the paper. First we need some notations and definitions.

A stochastic game is identified by a six-tuple $\{S, \{A^1(s); s \in S\}, \{A^2(s); s \in S\}, r^1, r^2, p\}$. Here, $S = \{1,\ldots,K\}$ is the state space of the game; $A^k(s)$, $k \in \{1,2\}$, $s \in S$, is the action set of player $k$ in state $s$; $r^k$ denotes the payoff to player $k$, defined on the set of triples $(s,i,j)$ with $i \in A^1(s)$ and $j \in A^2(s)$. When the players choose $i$ and $j$ in state $s$, the payoff to player $k$ will be $r^k(s,i,j)$, and the probability that the next state will be $s'$ will be $p(s'|s,i,j)$, with $\sum_{s'} p(s'|s,i,j) = 1$.

For a repeated game with absorbing states, the non-absorbing state will be called state 1 and the absorbing states will be states $2,3,\ldots,K$. Strategies for the game are completely defined by the choices the players make in state 1. The dimension of this state is taken to be $M \times N$, i.e., player I has $M$ pure actions and player II has $N$ pure actions. A stationary strategy for player I will be denoted by $x \in \Delta_M := \{x \in \mathbb{R}^M : x(i) \ge 0,\ \sum_{i=1}^M x(i) = 1\}$. A stationary strategy for player II will be denoted by $y \in \Delta_N := \{y \in \mathbb{R}^N : y(j) \ge 0,\ \sum_{j=1}^N y(j) = 1\}$. When a player uses a stationary strategy, he chooses an action according to the same randomized selection at every decision moment; that is, when player I uses $x \in \Delta_M$, then $x(i)$ equals the probability that action $i$ will be chosen, $i = 1,2,\ldots,M$; likewise for $y \in \Delta_N$. A pure strategy in which action $i$ is chosen with probability 1 will be denoted by $e_i$.

An ARAT game is defined by the structural properties:

\[
r^k(s,i,j) = a_s^k(i) + b_s^k(j), \qquad k = 1,2,
\]
and
\[
p(s|i,j) = p(s,i) + p(s,j), \qquad s = 1,2,\ldots,K \text{ and all } i,j.
\]
Since both players have just one action in the absorbing states, we will drop the parameters $i$ and $j$ in the payoffs when $s = 2,3,\ldots,K$. For $s = 1$, the subscript "1" will be dropped. We will suppose that the actions of the players are arranged such that, for some integers $\bar M$ and $\bar N$,
\[
p(s,i) = 0, \quad \forall s \ge 2,\ i \in \{1,2,\ldots,\bar M\}; \qquad
\sum_{s=2}^{K} p(s,i) > 0, \quad i \in \{\bar M+1,\ldots,M\};
\]
\[
p(s,j) = 0, \quad \forall s \ge 2,\ j \in \{1,2,\ldots,\bar N\}; \qquad
\sum_{s=2}^{K} p(s,j) > 0, \quad j \in \{\bar N+1,\ldots,N\}.
\]
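The additive structure above is easy to test numerically. As a minimal illustration (the matrix and helper function below are our own, not from the paper), a payoff matrix $r(i,j)$ is of the form $a(i) + b(j)$ exactly when subtracting its first row and first column leaves it additive, which the following sketch checks:

```python
# Checking the ARAT (additive) structure of a payoff matrix:
# r[i][j] = a[i] + b[j] for some vectors a, b. Illustrative data only.
def arat_decompose(r):
    """Return (a, b) with r[i][j] == a[i] + b[j], or None if r is not additive."""
    a = [row[0] for row in r]                    # normalize b[0] = 0, so a[i] = r[i][0]
    b = [r[0][j] - r[0][0] for j in range(len(r[0]))]
    ok = all(abs(r[i][j] - (a[i] + b[j])) < 1e-12
             for i in range(len(r)) for j in range(len(r[0])))
    return (a, b) if ok else None

print(arat_decompose([[1.0, 3.0], [0.0, 2.0]]))  # additive → ([1.0, 0.0], [0.0, 2.0])
print(arat_decompose([[1.0, 0.0], [0.0, 1.0]]))  # matching pennies-like → None
```

A zero-sum matrix such as the identity fails the test, which is consistent with additivity being a genuine structural restriction.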

The actions $\bar M+1,\ldots,M$ and $\bar N+1,\ldots,N$ will be called absorbing actions. A randomized strategy $x$ that puts positive weight on some absorbing action will be called absorbing. Note that when $(x,y)$ is a strategy pair for which either $x$ or $y$ is absorbing, then the game will eventually reach one of the absorbing states. We will also use the term "absorbing" to denote a pair of strategies of this type. For a pair of absorbing strategies $(x,y)$ we denote by $\bar r^k(x,y)$ the one-step expected payoff to player $k$, given that absorption occurs. The one-step absorption probability is denoted by $p(x,y)$. Hence with probability $1 - p(x,y)$ the play will remain at state 1 and go on.

We define $C(x) := \{i : x(i) > 0\}$ and $C(y) := \{j : y(j) > 0\}$. If $(x,y)$ is a strategy pair for which $C(x) \subseteq \{1,2,\ldots,\bar M\}$ and $C(y) \subseteq \{1,2,\ldots,\bar N\}$, then the game will forever remain in state 1. We will call such a pair $(x,y)$ non-absorbing. At every decision moment the one-step expected payoff for player $k$ equals
\[
a^k(x) + b^k(y) := \sum_{i=1}^{M} a^k(i)\,x(i) + \sum_{j=1}^{N} b^k(j)\,y(j).
\]
In general, the limiting average expected payoff to player $k$ when the game starts at state $s$ and when strategies $(\pi,\sigma)$ are used by players I and II, with an immediate reward $I^k(s_\tau,i_\tau,j_\tau)$ to player $k$ on the $\tau$-th day, will be denoted by
\[
\Phi^k(\pi,\sigma)(s) = \liminf_{T \to \infty} \frac{1}{T}\, E_{\pi,\sigma}\Big( \sum_{\tau=1}^{T} I^k(s_\tau,i_\tau,j_\tau) \Big).
\]
In our model, state 1 is the only relevant state and from now on $\Phi^k$ will represent the payoff starting at state 1.

Lemma 2.1

(i) If stationary $(x,y)$ is absorbing, then
\[
\Phi^k(x,y) = \frac{\sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i)\,r^k(s) + \sum_{j=\bar N+1}^{N}\sum_{s=2}^{K} p(s,j)\,y(j)\,r^k(s)}
{\sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i) + \sum_{j=\bar N+1}^{N}\sum_{s=2}^{K} p(s,j)\,y(j)}. \tag{2.1}
\]

(ii) If $(x,y)$ is non-absorbing, then
\[
\Phi^k(x,y) = a^k(x) + b^k(y). \tag{2.2}
\]

Proof: After the game moves to an absorbing state, the expected payoff equals $\bar r^k(x,y)$ at every decision moment. Thus the expected one-step absorption payoffs count fully in the limiting average payoff. For decision moment $t$, the probability that absorption occurs exactly then equals $(1-p(x,y))^t\,p(x,y)$. Hence, when the pair $(x,y)$ is absorbing,
\[
\Phi^k(x,y) = \sum_{t=0}^{\infty} (1-p(x,y))^t\, p(x,y)\, \bar r^k(x,y) = \bar r^k(x,y).
\]
By the definition of the ARAT structure, the right-hand side of (2.1) follows: the numerator of (2.1) is the one-step expected payoff upon absorption and the denominator equals $p(x,y)$, so their quotient is $\bar r^k(x,y)$.

When $(x,y)$ is non-absorbing, then player $k$'s expected payoff equals $a^k(x) + b^k(y)$ at every decision moment. □
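Formula (2.1) can be checked against a direct simulation of the repeated game. The following sketch uses made-up ARAT data (two actions per player, two absorbing states; none of the numbers come from the paper) and compares the closed form with a Monte Carlo estimate of the limiting average payoff; the non-absorbing one-step payoffs are irrelevant here, since absorption occurs almost surely, so the simulation counts them as zero:

```python
import random

# Illustrative ARAT data (not from the paper): state 1 is non-absorbing,
# states 2 and 3 absorb with payoffs r(s); action 1 of each player is
# non-absorbing, action 2 is absorbing.
r = {2: 5.0, 3: 1.0}                                   # absorbing-state payoffs r(s)
p_row = {1: {2: 0.0, 3: 0.0}, 2: {2: 0.2, 3: 0.1}}     # p(s, i), row contribution
p_col = {1: {2: 0.0, 3: 0.0}, 2: {2: 0.15, 3: 0.05}}   # p(s, j), column contribution
x = {1: 0.6, 2: 0.4}                                   # stationary strategy of player I
y = {1: 0.7, 2: 0.3}                                   # stationary strategy of player II

def phi(x, y):
    """Closed form (2.1): expected absorption payoff over absorption probability."""
    num = sum(p_row[i][s] * x[i] * r[s] for i in x for s in r)
    num += sum(p_col[j][s] * y[j] * r[s] for j in y for s in r)
    den = sum(p_row[i][s] * x[i] for i in x for s in r)
    den += sum(p_col[j][s] * y[j] for j in y for s in r)
    return num / den

def simulate(x, y, runs=2000, horizon=500, seed=1):
    """Monte Carlo limiting-average payoff of the stationary pair (x, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        acc = 0.0
        for t in range(horizon):
            i = 1 if rng.random() < x[1] else 2
            j = 1 if rng.random() < y[1] else 2
            u = rng.random()
            absorbed = False
            for s in (2, 3):                 # additive (ARAT) transition law
                u -= p_row[i][s] + p_col[j][s]
                if u < 0:
                    acc = r[s] * (horizon - t)   # same payoff forever after
                    absorbed = True
                    break
            if absorbed:
                break
        total += acc / horizon
    return total / runs

print(round(phi(x, y), 3))                   # → 3.778
```

For these numbers the absorption probability per step is $0.18$, so the finite-horizon average differs from the closed form only by a small truncation bias, and the simulated value lands close to $3.78$.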

Corollary 2.2 If $x$ is absorbing and $y$ is non-absorbing, then the limiting average payoff is independent of $y$:
\[
\Phi^k(x,y) = \frac{\sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i)\,r^k(s)}{\sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i)}.
\]
Similarly, if $y$ is absorbing and $x$ is non-absorbing, then the payoff is independent of $x$.

To construct the perturbed game, we first let
\[
Y_n := \Big\{ y \in \Delta_N : \sum_{j=\bar N+1}^{N} y(j) \ge \frac{1}{n} \Big\}.
\]
Observe that

(i) each $y \in Y_n$ is absorbing;

(ii) $e_j \in Y_n$ for $j \in \{\bar N+1,\ldots,N\}$.

Define the perturbed game, $\Gamma_n$, as the repeated game with payoffs $\Phi^1$ and $\Phi^2$ and strategy spaces $\Delta_M$ and $Y_n$.

Lemma 2.3 For every $n$, the game $\Gamma_n$ has at least one equilibrium.

Proof: By Corollary 2.2, an equilibrium for the repeated game $\Gamma_n$ is a Nash equilibrium in the (one-step) two-person nonzero-sum game with payoffs $\Phi^1$ and $\Phi^2$ and strategy spaces $\Delta_M$ and $Y_n$. Since each stationary strategy pair $(x,y)$ with $y \in Y_n$ is absorbing, the payoff functions $\Phi^1$ and $\Phi^2$ are continuous on the compact, convex set $\Delta_M \times Y_n$. Furthermore, since all $a^k$ and $b^k$ may be supposed to be positive, it can be shown (see Evangelista [1993]) that $\Phi^1$ and $\Phi^2$ are quasi-concave on $\Delta_M \times Y_n$, i.e., for any real number $\alpha$, the sets $\{(x,y) : \Phi^k(x,y) \ge \alpha\}$ are convex. By a theorem of Glicksberg [1952] or Fan [1952], the nonzero-sum game $[\Phi^1, \Phi^2, \Delta_M, Y_n]$ has a pure Nash equilibrium, which in turn is an equilibrium of the repeated game $\Gamma_n$. □

The proof of the main theorem will be based on a sequence of equilibria $(x_n, y_n)$ of the games $\Gamma_n$, $n = 1,2,\ldots$.

The following definition is essential for our approach. Let
\[
X_{\mathrm{abs}} := \Big\{ x \in \Delta_M : \sum_{i=1}^{\bar M} x(i) < 1 \Big\}.
\]


Thus, $X_{\mathrm{abs}}$ consists of all the absorbing strategies of player I. Define, for $j \in \{1,2,\ldots,N\}$,
\[
E_j := \Big\{ x \in X_{\mathrm{abs}} : \max_{y \in \Delta_N} \Phi^2(x,y) = \Phi^2(x,e_j) \Big\}.
\]
Hence $E_j$ contains those $x \in X_{\mathrm{abs}}$ for which pure strategy $e_j$ of player II is a best reply. By Corollary 2.2 it follows that $E_{j_1} = E_{j_2}$ for all $j_1, j_2 \in \{1,2,\ldots,\bar N\}$, and so
\[
\Delta_M = X_{\mathrm{na}} \cup E_1 \cup \Big( \bigcup_{j=\bar N+1}^{N} E_j \Big), \tag{2.3}
\]
where $X_{\mathrm{na}} := \Delta_M \setminus X_{\mathrm{abs}}$ denotes the set of non-absorbing strategies of player I.

3. The Existence Proof. In accordance with (2.3) we distinguish three cases which cover all possibilities. If in the sequel we need an accumulation point of a bounded sequence, we suppose that an appropriate subsequence is chosen.

Case 1. There exists $x_n \in \bigcup_{j=\bar N+1}^{N} E_j$ for some $n$.

Case 2. Case 1 does not hold and $x_n \in E_1$ for all $n$.

Case 3. Cases 1 and 2 do not hold and $x_n \in X_{\mathrm{na}}$ for all $n \in \mathbb{N}$.

For each of these cases we will prove the existence of an $\varepsilon$-equilibrium.

Case 1.

Theorem 3.1 If $x_n \in E_{\bar j}$ for some $\bar j \in \{\bar N+1,\ldots,N\}$, then $(x_n, y_n)$ is an equilibrium in the original game $\Gamma$.

Proof: Since $(x_n, y_n)$ is an equilibrium of $\Gamma_n$, we have
\[
\Phi^1(x_n,y_n) \ge \Phi^1(x,y_n), \qquad \forall x \in \Delta_M. \tag{3.1}
\]
Since $e_{\bar j} \in Y_n$ and $Y_n \subseteq \Delta_N$,
\[
\Phi^2(x_n,y_n) = \max_{y \in Y_n} \Phi^2(x_n,y) \ge \Phi^2(x_n,e_{\bar j}).
\]
Also, $x_n \in E_{\bar j}$ implies that
\[
\Phi^2(x_n,e_{\bar j}) = \max_{y \in \Delta_N} \Phi^2(x_n,y) \ge \max_{y \in Y_n} \Phi^2(x_n,y),
\]
and so
\[
\Phi^2(x_n,y_n) = \max_{y \in \Delta_N} \Phi^2(x_n,y). \tag{3.2}
\]
The combination of (3.1) and (3.2) proves the theorem. □


Case 2. We start with a lemma that states that the total weight that $y_n$ puts outside $\{1,2,\ldots,\bar N\}$ is minimal, namely $1/n$. We use the following abbreviations:
\[
A^k(x) := \sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i)\,r^k(s), \qquad
a(x) := \sum_{i=\bar M+1}^{M}\sum_{s=2}^{K} p(s,i)\,x(i),
\]
\[
B^k(y) := \sum_{j=\bar N+1}^{N}\sum_{s=2}^{K} p(s,j)\,y(j)\,r^k(s), \qquad
b(y) := \sum_{j=\bar N+1}^{N}\sum_{s=2}^{K} p(s,j)\,y(j).
\]
With these abbreviations, (2.1) reads $\Phi^k(x,y) = \big(A^k(x) + B^k(y)\big)\big/\big(a(x) + b(y)\big)$ for every absorbing pair $(x,y)$.

Lemma 3.2 If $x_n \in E_1$ and $x_n \notin \bigcup_{j=\bar N+1}^{N} E_j$, then $\sum_{j=\bar N+1}^{N} y_n(j) = 1/n$.

Proof: If $x_n \in E_1$ and $x_n \notin E_j$ for each $j \in \{\bar N+1,\ldots,N\}$, then for every absorbing $y \in \Delta_N$,
\[
\frac{A^2(x_n)}{a(x_n)} > \frac{A^2(x_n)+B^2(y)}{a(x_n)+b(y)},
\]
which gives
\[
\frac{A^2(x_n)}{a(x_n)} > \frac{B^2(y)}{b(y)}, \qquad \text{for every absorbing } y \in \Delta_N. \tag{3.3}
\]
Suppose that the total weight that $y_n$ puts on $\{\bar N+1,\ldots,N\}$ is strictly more than $1/n$. Let $\tilde j \in C(y_n) \cap \{\bar N+1,\ldots,N\}$ be such that
\[
\frac{B^2(e_{\tilde j})}{b(e_{\tilde j})} = \min_{j \in C(y_n)\cap\{\bar N+1,\ldots,N\}} \frac{B^2(e_j)}{b(e_j)}.
\]
Consider the strategy $\bar y_n$ defined as
\[
\bar y_n(1) = y_n(1) + \delta, \qquad
\bar y_n(\tilde j) = y_n(\tilde j) - \delta, \qquad
\bar y_n(j) = y_n(j), \quad j \ne 1,\ j \ne \tilde j,
\]
where $\delta > 0$ is small enough so that
\[
\sum_{j=\bar N+1}^{N} \bar y_n(j) \ge \frac{1}{n}.
\]
It follows from (3.3) and the definition of $\tilde j$ that
\[
\frac{A^2(x_n)+B^2(y_n)}{a(x_n)+b(y_n)} > \frac{B^2(y_n)}{b(y_n)} \ge \frac{B^2(e_{\tilde j})}{b(e_{\tilde j})}.
\]
This implies that
\[
\Phi^2(x_n,\bar y_n) = \frac{A^2(x_n)+B^2(y_n) - \delta B^2(e_{\tilde j})}{a(x_n)+b(y_n) - \delta\, b(e_{\tilde j})}
> \frac{A^2(x_n)+B^2(y_n)}{a(x_n)+b(y_n)} = \Phi^2(x_n,y_n),
\]
which contradicts the equilibrium point assumption on $(x_n,y_n)$, since $\bar y_n \in Y_n$. Hence the assumption $\sum_{j=\bar N+1}^{N} y_n(j) > 1/n$ is incorrect. □

Corollary 3.3 In Case 2, we have $C(y_0) \subseteq \{1,\ldots,\bar N\}$, where $y_0 = \lim_n y_n$.

By the assumption of Case 2, $x_n \in E_1$, so $x_n \in X_{\mathrm{abs}}$. Define $\hat x_n$ as
\[
\hat x_n(i) = 0, \quad i = 1,2,\ldots,\bar M; \qquad
\hat x_n(i) = x_n(i)/\lambda_n, \quad i = \bar M+1,\ldots,M,
\]
where
\[
\lambda_n = \sum_{i=\bar M+1}^{M} x_n(i).
\]
So $\hat x_n$ puts weight only on absorbing actions.
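As a quick numerical illustration of this renormalization (the weights below are made up):

```python
# Restricting a stationary strategy to its absorbing actions (illustrative
# numbers only): non-absorbing actions are 1..M_bar, absorbing M_bar+1..M.
M_bar = 2
x_n = [0.5, 0.25, 0.125, 0.125]          # hypothetical equilibrium strategy
lam = sum(x_n[M_bar:])                   # lambda_n: total absorbing weight
x_hat = [0.0] * M_bar + [w / lam for w in x_n[M_bar:]]
print(x_hat)                             # → [0.0, 0.0, 0.5, 0.5]
```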

Theorem 3.4 For $\varepsilon > 0$, the pair $(\hat x_n, y_n)$ is an $\varepsilon$-equilibrium point in the original game $\Gamma$, for $n$ large enough.

Proof. Part I: In the first part of the proof we will show that $\Phi^1(\hat x_n,y_n) = \Phi^1(x_n,y_n)$. This will prove that for player I, $\hat x_n$ is a best reply to $y_n$. First, observe that
\[
\frac{A^k(\hat x_n)}{a(\hat x_n)} = \frac{(1/\lambda_n)\,A^k(x_n)}{(1/\lambda_n)\,a(x_n)} = \frac{A^k(x_n)}{a(x_n)}. \tag{3.4}
\]
Next, we show that
\[
\frac{A^1(x_n)}{a(x_n)} = \frac{B^1(y_n)}{b(y_n)}.
\]
a) Since $(x_n,y_n)$ is an equilibrium point, we have for $i \in \{1,\ldots,\bar M\}$:
\[
\frac{A^1(x_n)+B^1(y_n)}{a(x_n)+b(y_n)} = \Phi^1(x_n,y_n) \ge \Phi^1(e_i,y_n) = \frac{B^1(y_n)}{b(y_n)}.
\]
Hence
\[
\frac{A^1(x_n)}{a(x_n)} \ge \frac{B^1(y_n)}{b(y_n)}. \tag{3.5}
\]
b) Also, $\Phi^1(x_n,y_n) \ge \Phi^1(\hat x_n,y_n)$, so
\[
\frac{A^1(x_n)+B^1(y_n)}{a(x_n)+b(y_n)} \ge \frac{(1/\lambda_n)\,A^1(x_n)+B^1(y_n)}{(1/\lambda_n)\,a(x_n)+b(y_n)}
= \frac{A^1(x_n)+B^1(y_n)+(1/\lambda_n - 1)\,A^1(x_n)}{a(x_n)+b(y_n)+(1/\lambda_n - 1)\,a(x_n)}.
\]
Hence
\[
\frac{A^1(x_n)+B^1(y_n)}{a(x_n)+b(y_n)} \ge \frac{(1/\lambda_n - 1)\,A^1(x_n)}{(1/\lambda_n - 1)\,a(x_n)} = \frac{A^1(x_n)}{a(x_n)},
\]
and so
\[
\frac{B^1(y_n)}{b(y_n)} \ge \frac{A^1(x_n)}{a(x_n)}. \tag{3.6}
\]
Combining (3.5) and (3.6) gives
\[
\frac{A^1(x_n)}{a(x_n)} = \frac{B^1(y_n)}{b(y_n)}.
\]
But then
\[
\Phi^1(\hat x_n,y_n) = \frac{(1/\lambda_n)\,A^1(x_n)+B^1(y_n)}{(1/\lambda_n)\,a(x_n)+b(y_n)} = \frac{A^1(x_n)+B^1(y_n)}{a(x_n)+b(y_n)} = \Phi^1(x_n,y_n) \ge \Phi^1(x,y_n), \qquad \forall x \in \Delta_M.
\]

Part II: In the second part of the proof we will show that for player II, $y_n$ is an $\varepsilon$-best reply to $\hat x_n$, for $n$ large enough. Since $x_n \in E_1$,
\[
\frac{A^2(x_n)}{a(x_n)} \ge \frac{A^2(x_n)+B^2(y_n)}{a(x_n)+b(y_n)}. \tag{3.7}
\]
Because $(x_n,y_n)$ is an equilibrium point of the game $\Gamma_n$,
\[
\frac{A^2(x_n)+B^2(y_n)}{a(x_n)+b(y_n)} \ge \frac{A^2(x_n)+B^2(y)}{a(x_n)+b(y)}, \qquad \forall y \in Y_n. \tag{3.8}
\]
By (3.7) and (3.8),
\[
\frac{A^2(x_n)}{a(x_n)} \ge \frac{A^2(x_n)+B^2(y)}{a(x_n)+b(y)}, \qquad \forall y \in Y_n.
\]
Hence
\[
\frac{A^2(x_n)}{a(x_n)} \ge \frac{B^2(y)}{b(y)}, \qquad \forall y \in Y_n,
\]
and also
\[
\frac{A^2(\hat x_n)}{a(\hat x_n)} = \frac{(1/\lambda_n)\,A^2(x_n)}{(1/\lambda_n)\,a(x_n)} \ge \frac{B^2(y)}{b(y)}, \qquad \forall y \in Y_n.
\]
It follows that
\[
\Phi^2(\hat x_n,y) = \frac{A^2(\hat x_n)+B^2(y)}{a(\hat x_n)+b(y)} \le \frac{A^2(\hat x_n)}{a(\hat x_n)}, \qquad \forall y \in Y_n. \tag{3.9}
\]
Observe that, since $\hat x_n$ puts all its weight on the absorbing actions, while $y_n$ puts only weight $1/n$ on the absorbing actions, we have for $n$ large enough:
\[
\frac{A^2(\hat x_n)}{a(\hat x_n)} \le \frac{A^2(\hat x_n)+B^2(y_n)}{a(\hat x_n)+b(y_n)} + \varepsilon = \Phi^2(\hat x_n,y_n) + \varepsilon. \tag{3.10}
\]
Further, for $j = 1,\ldots,\bar N$:
\[
\Phi^2(\hat x_n,e_j) = \frac{A^2(\hat x_n)}{a(\hat x_n)}. \tag{3.11}
\]
By (3.9), (3.10) and (3.11), $\Phi^2(\hat x_n,y_n) \ge \Phi^2(\hat x_n,y) - \varepsilon$, $\forall y \in \Delta_N$. Parts I and II prove that the pair $(\hat x_n,y_n)$ is an $\varepsilon$-equilibrium point. □

Case 3. Recall that this is the case where Cases 1 and 2 do not hold and $x_n$ is non-absorbing, i.e., $x_n \in X_{\mathrm{na}}$. Let $\bar j \in \{\bar N+1,\ldots,N\}$ be such that:

1.
\[
\frac{B^2(e_{\bar j})}{b(e_{\bar j})} = \max_{j \in \{\bar N+1,\ldots,N\}} \frac{B^2(e_j)}{b(e_j)}; \tag{3.12}
\]

2.
\[
\frac{B^1(e_{\bar j})}{b(e_{\bar j})} \ge \frac{B^1(e_j)}{b(e_j)} \quad \text{whenever} \quad \frac{B^2(e_{\bar j})}{b(e_{\bar j})} = \frac{B^2(e_j)}{b(e_j)}, \qquad \forall j \in \{\bar N+1,\ldots,N\}. \tag{3.13}
\]

Case 3 itself is separated into two subcases. We distinguish between the cases $E_j \ne \emptyset$ for some $j \in \{\bar N+1,\ldots,N\}$ and $E_j = \emptyset$ for all $j \in \{\bar N+1,\ldots,N\}$.

Case 3A: $E_j \ne \emptyset$ for some $j \in \{\bar N+1,\ldots,N\}$.

Theorem 3.5 Let $x \in E_j$ for some $j \in \{\bar N+1,\ldots,N\}$. Then, for $\lambda$ small enough, the pair $((1-\lambda)x_n + \lambda x,\ y_n)$ is an $\varepsilon$-equilibrium point.

Proof: Recall that $E_j \subseteq X_{\mathrm{abs}}$, so $x \in X_{\mathrm{abs}}$. By the equilibrium property of $(x_n,y_n)$:
\[
\frac{B^2(y_n)}{b(y_n)} = \Phi^2(x_n,y_n) \ge \Phi^2(x_n,e_{\bar j}) = \frac{B^2(e_{\bar j})}{b(e_{\bar j})}.
\]
On the other hand, it follows from (3.12) that
\[
\frac{B^2(y_n)}{b(y_n)} \le \frac{B^2(e_{\bar j})}{b(e_{\bar j})},
\]
and so
\[
\frac{B^2(y_n)}{b(y_n)} = \frac{B^2(e_{\bar j})}{b(e_{\bar j})}. \tag{3.14}
\]
Now for fixed $\varepsilon > 0$ and fixed $n$, we can choose $\lambda > 0$ small enough such that the inequalities
\[
\Phi^2((1-\lambda)x_n + \lambda x,\ y_n) = \frac{B^2(y_n) + \lambda A^2(x)}{b(y_n) + \lambda a(x)} \ge \frac{B^2(y_n)}{b(y_n)} - \frac{\varepsilon}{2} \tag{3.15}
\]
and, for each $j \in \{\bar N+1,\ldots,N\}$,
\[
\Phi^2((1-\lambda)x_n + \lambda x,\ e_j) = \frac{B^2(e_j) + \lambda A^2(x)}{b(e_j) + \lambda a(x)} \le \frac{B^2(e_j)}{b(e_j)} + \frac{\varepsilon}{2} \tag{3.16}
\]
are both satisfied. Combining (3.12), (3.14), (3.15) and (3.16) yields
\[
\Phi^2((1-\lambda)x_n + \lambda x,\ e_j) \le \Phi^2((1-\lambda)x_n + \lambda x,\ y_n) + \varepsilon \tag{3.17}
\]
for each $j \in \{\bar N+1,\ldots,N\}$. Next, since $x \in E_j$ for an absorbing $j$, we have $\Phi^2(x,e_j) \ge \Phi^2(x,e_1)$, or
\[
\frac{A^2(x)+B^2(e_j)}{a(x)+b(e_j)} \ge \frac{A^2(x)}{a(x)}.
\]
Hence
\[
\frac{B^2(e_j)}{b(e_j)} \ge \frac{A^2(x)}{a(x)}.
\]
By the definition of $e_{\bar j}$, we obtain
\[
\frac{B^2(e_{\bar j})}{b(e_{\bar j})} \ge \frac{A^2(x)}{a(x)},
\]
which in combination with (3.14) leads to
\[
\frac{B^2(y_n)}{b(y_n)} \ge \frac{A^2(x)}{a(x)}.
\]
Hence, by (3.15) we have for each $j \in \{1,\ldots,\bar N\}$:
\[
\Phi^2((1-\lambda)x_n + \lambda x,\ e_j) = \frac{A^2(x)}{a(x)} \le \Phi^2((1-\lambda)x_n + \lambda x,\ y_n) + \varepsilon. \tag{3.18}
\]
The inequalities (3.17) and (3.18) show that $y_n$ is an $\varepsilon$-best answer to $(1-\lambda)x_n + \lambda x$.

For player I things are easier. Since $(x_n,y_n)$ is absorbing and $((1-\lambda)x_n + \lambda x,\ y_n)$ is absorbing for every $\lambda \ge 0$, it follows that
\[
\lim_{\lambda \downarrow 0} \Phi^1((1-\lambda)x_n + \lambda x,\ y_n) = \Phi^1(x_n,y_n).
\]
Since $x_n$ is a best reply to $y_n$, it is obvious that, for $\lambda > 0$ small enough, $(1-\lambda)x_n + \lambda x$ is an $\varepsilon$-best reply to $y_n$. This proves Theorem 3.5. □

Case 3B: $E_j = \emptyset$ for all $j \in \{\bar N+1,\ldots,N\}$.

Lemma 3.6 If $E_j = \emptyset$ for all $j \in \{\bar N+1,\ldots,N\}$, then $E_j = X_{\mathrm{abs}}$ for all $j \in \{1,\ldots,\bar N\}$.

Proof: The lemma is trivially implied by its condition, since each $x \in X_{\mathrm{abs}}$ belongs to at least one $E_j$, $j \in \{1,\ldots,N\}$, and the $E_j$'s for $j \in \{1,\ldots,\bar N\}$ are identical. □

Theorem 3.7 Let $x_n \in X_{\mathrm{na}}$ for all $n$ and suppose that $E_j = \emptyset$ for all $j \in \{\bar N+1,\ldots,N\}$. Then there exists an $\varepsilon$-equilibrium point for the original game $\Gamma$.

Proof: First suppose that
\[
\max_{i \in \{\bar M+1,\ldots,M\}} \frac{A^1(e_i)}{a(e_i)} \ge \min_{j \in \{\bar N+1,\ldots,N\}} \frac{B^1(e_j)}{b(e_j)}. \tag{3.19}
\]
Let $\tilde j$ be such that
\[
\frac{B^1(e_{\tilde j})}{b(e_{\tilde j})} = \min_{j \in \{\bar N+1,\ldots,N\}} \frac{B^1(e_j)}{b(e_j)}. \tag{3.20}
\]
Let $\tilde i$ be such that
\[
\frac{A^1(e_{\tilde i})}{a(e_{\tilde i})} = \max_{i \in \{\bar M+1,\ldots,M\}} \frac{A^1(e_i)}{a(e_i)}. \tag{3.21}
\]
Let $y_0 \in \Delta_{\bar N}$ be arbitrary. Consider the pair $(e_{\tilde i},\ (1-\lambda)y_0 + \lambda e_{\tilde j})$ for $\lambda > 0$ small enough. Since
\[
\lim_{\lambda \downarrow 0} \Phi^2(e_{\tilde i},\ (1-\lambda)y_0 + \lambda e_{\tilde j}) = \Phi^2(e_{\tilde i}, y_0),
\]
and $\Phi^2(e_{\tilde i}, y_0) = \max_{y \in \Delta_N} \Phi^2(e_{\tilde i}, y)$ because $e_{\tilde i} \in X_{\mathrm{abs}} = E_1$ by Lemma 3.6, it follows that for player II, $(1-\lambda)y_0 + \lambda e_{\tilde j}$ is an $\varepsilon$-best reply to $e_{\tilde i}$ for $\lambda$ small enough. For player I, note that for $i \in \{1,\ldots,\bar M\}$,
\[
\Phi^1(e_i,\ (1-\lambda)y_0 + \lambda e_{\tilde j}) = \frac{B^1(e_{\tilde j})}{b(e_{\tilde j})},
\]
and that
\[
\Phi^1(e_i,\ (1-\lambda)y_0 + \lambda e_{\tilde j}) = \frac{A^1(e_i) + \lambda B^1(e_{\tilde j})}{a(e_i) + \lambda b(e_{\tilde j})}
\]
for $i \in \{\bar M+1,\ldots,M\}$. It follows from (3.19), (3.20) and (3.21) that $e_{\tilde i}$ is an $\varepsilon$-best reply to $(1-\lambda)y_0 + \lambda e_{\tilde j}$ for $\lambda$ small enough.

The only remaining case is
\[
\max_{i \in \{\bar M+1,\ldots,M\}} \frac{A^1(e_i)}{a(e_i)} < \min_{j \in \{\bar N+1,\ldots,N\}} \frac{B^1(e_j)}{b(e_j)}. \tag{3.22}
\]

By the ARAT property it follows that the game restricted to $\{1,2,\ldots,\bar M\} \times \{1,2,\ldots,\bar N\}$ has a pure Nash equilibrium, say $(k,l)$: with additive rewards, each player's best reply is independent of the opponent's action. If
\[
a^1(k) + b^1(l) \ge \max_{i \in \{\bar M+1,\ldots,M\}} \frac{A^1(e_i)}{a(e_i)}
\]
and if
\[
a^2(k) + b^2(l) \ge \max_{j \in \{\bar N+1,\ldots,N\}} \frac{B^2(e_j)}{b(e_j)},
\]
then this pure equilibrium is also an equilibrium in the original repeated game. Let $\tilde i$ be such that
\[
\frac{A^1(e_{\tilde i})}{a(e_{\tilde i})} = \max_{i \in \{\bar M+1,\ldots,M\}} \frac{A^1(e_i)}{a(e_i)},
\]
and let $\tilde j$ be such that
\[
\frac{B^2(e_{\tilde j})}{b(e_{\tilde j})} = \max_{j \in \{\bar N+1,\ldots,N\}} \frac{B^2(e_j)}{b(e_j)}.
\]
If $\dfrac{A^1(e_{\tilde i})}{a(e_{\tilde i})} > a^1(k) + b^1(l)$, then we claim that $(e_{\tilde i}, e_l)$ is an equilibrium point. By the definition of $e_{\tilde i}$ and this assumption,
\[
\Phi^1(e_{\tilde i}, e_l) \ge \Phi^1(e_i, e_l), \qquad \forall i \in \{1,\ldots,M\},
\]
and
\[
\Phi^2(e_{\tilde i}, e_l) \ge \Phi^2(e_{\tilde i}, y), \qquad \forall y \in \Delta_N,
\]
since $e_{\tilde i} \in E_1 = E_l$ by Lemma 3.6.

If $\dfrac{B^2(e_{\tilde j})}{b(e_{\tilde j})} > a^2(k) + b^2(l)$, then we claim that $(e_k, e_{\tilde j})$ is an equilibrium point. By the definition of $e_{\tilde j}$ and this assumption,
\[
\Phi^2(e_k, e_{\tilde j}) \ge \Phi^2(e_k, e_j), \qquad \forall j \in \{1,\ldots,N\},
\]
and by assumption (3.22) we have
\[
\frac{A^1(x)}{a(x)} \le \frac{B^1(e_{\tilde j})}{b(e_{\tilde j})}, \qquad \forall x \in X_{\mathrm{abs}},
\]
so
\[
\frac{A^1(x) + B^1(e_{\tilde j})}{a(x) + b(e_{\tilde j})} \le \frac{B^1(e_{\tilde j})}{b(e_{\tilde j})}.
\]
Further, $\Phi^1(e_{i_1}, e_{\tilde j}) = \Phi^1(e_{i_2}, e_{\tilde j})$ for all $i_1, i_2 \in \{1,\ldots,\bar M\}$. This leads us to the conclusion that $e_k$ is a best answer to $e_{\tilde j}$.

All the possible cases have been considered, which completes the proof. □
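The pure-equilibrium claim for the restricted game used in the proof above rests only on additivity: each player's best reply is independent of the opponent's action, so $(\arg\max_i a^1(i),\ \arg\max_j b^2(j))$ is always a pure equilibrium. A minimal sketch with hypothetical numbers:

```python
# Additive (ARAT) bimatrix game: r1[i][j] = a1[i] + b1[j], r2[i][j] = a2[i] + b2[j].
# Best replies are independent of the opponent, so a pure Nash equilibrium
# always exists. Illustrative numbers only.
a1 = [3.0, 1.0, 2.0]; b1 = [0.5, 2.5]
a2 = [1.0, 4.0, 0.0]; b2 = [2.0, 1.0]

k = max(range(len(a1)), key=lambda i: a1[i])   # player I's dominant action
l = max(range(len(b2)), key=lambda j: b2[j])   # player II's dominant action

# Verify the equilibrium property against all unilateral deviations.
assert all(a1[k] + b1[l] >= a1[i] + b1[l] for i in range(len(a1)))
assert all(a2[k] + b2[l] >= a2[k] + b2[j] for j in range(len(b2)))
print(k, l)  # → 0 0
```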

Summarizing Theorems 3.1, 3.4, 3.5 and 3.7, we have:

Theorem 3.8 Every repeated ARAT game possesses an $\varepsilon$-equilibrium point in stationary strategies.

REFERENCES

Bewley, T. and Kohlberg, E. (1976). The asymptotic theory of stochastic games. Mathematics of Operations Research 1 197-208.

Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Statist. 33 719-726.

Blackwell, D. (1965). Discounted dynamic programming. Ann. Math. Statist. 36 226-235.

Blackwell, D. and Ferguson, T. S. (1968). The big match. Ann. Math. Statist. 39 159-163.

Evangelista, F. (1993). Equilibrium in bimatrix games and in repeated games with additive rewards and transitions. Ph.D. thesis, University of Illinois at Chicago.

Fan, K. (1952). Fixed point and minimax theorems in locally convex topological linear spaces. Proc. Nat. Acad. Sci. 38 121-126.

Filar, J. A. (1981). A single loop stochastic game which one player can terminate. Opsearch 18 185-203.

Filar, J. A. and Raghavan, T. E. S. (1991). Algorithms for stochastic games - a survey. Methods and Models of Operations Research 35 437-472.

Fink, A. M. (1964). Equilibrium points of stochastic noncooperative games. J. Sci. Hiroshima University, Series A 28 89-93.

Flesch, J. (1995). Tech. Rep. 09-1995, Department of Mathematics, University of Limburg, Maastricht.

Glicksberg, I. L. (1952). A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points. Proc. Amer. Math. Soc. 3 170-174.

Kohlberg, E. (1974). Repeated games with absorbing states. Ann. Statist. 2 724-738.

Mertens, J. F. and Neyman, A. (1981). Stochastic games. Int. J. of Game Theory 10 53-66.

Parthasarathy, T. and Raghavan, T. E. S. (1971). Some Topics in Two-Person Games. American Elsevier, New York.

Parthasarathy, T. and Raghavan, T. E. S. (1981). An orderfield property for stochastic games when one player controls transition probabilities. J. Optimization Theory and Appl. 33 375-392.

Parthasarathy, T., Tijs, S. H. and Vrieze, O. J. (1984). Stochastic games with state independent transitions and separable rewards. In Hammer, G. and Pallaschke, D. (eds), Selected Topics in Operations Research and Mathematical Economics. Springer-Verlag, Berlin, 262-271.

Raghavan, T. E. S., Tijs, S. H. and Vrieze, O. J. (1985). On stochastic games with additive reward and additive transition structure. J. Optimization Theory and Appl. 47 451-464.

Rogers, P. D. (1969). Nonzero-sum stochastic games. Report ORC 69-8, Operations Research Center, University of California, Berkeley.

Shapley, L. S. (1953). Stochastic games. Proc. Nat. Acad. Sci. USA 39 1095-1100.

Stern, M. A. (1975). On stochastic games with limiting average payoff. Ph.D. thesis, University of Illinois, Chicago.

Thuijsman, F. and Raghavan, T. E. S. (1994). Stochastic games with switching control or ARAT structure. Report M94-06, Department of Mathematics, University of Limburg, Maastricht.

Thuijsman, F. and Vrieze, O. J. (1991). Easy initial states in stochastic games. In Raghavan, T. E. S., et al. (eds), Stochastic Games and Related Topics. Kluwer Academic Publishers, Dordrecht, 85-100.

Vrieze, O. J. and Thuijsman, F. (1989). On equilibria in repeated games with absorbing states. Int. J. of Game Theory 18 293-310.

Fe Evangelista
Department of Mathematics
University of Wisconsin at Wausau
Wausau, WI 54401
[email protected]

O. J. Vrieze
Department of Mathematics
University of Limburg
P.O. Box 616, Maastricht
The Netherlands
[email protected]

T. E. S. Raghavan
Department of Mathematics, Statistics and Computer Science
University of Illinois
Chicago, IL 60607-7045
[email protected]
