0809.4326


    Logical Methods in Computer Science

Vol. 6 (3:13) 2010, pp. 1–27

    www.lmcs-online.org

    Submitted Oct. 20, 2009

    Published Sep. 1, 2010

    ALGORITHMS FOR GAME METRICS

KRISHNENDU CHATTERJEE a, LUCA DE ALFARO b, RUPAK MAJUMDAR c, AND VISHWANATH RAMAN d

a IST Austria (Institute of Science and Technology Austria)
e-mail address: [email protected]

b,d Computer Science Department, University of California, Santa Cruz
e-mail address: {luca,vishwa}@soe.ucsc.edu

c Department of Computer Science, University of California, Los Angeles
e-mail address: [email protected]

Abstract. Simulation and bisimulation metrics for stochastic systems provide a quantitative generalization of the classical simulation and bisimulation relations. These metrics capture the similarity of states with respect to quantitative specifications written in the quantitative μ-calculus and related probabilistic logics. We first show that the metrics provide a bound for the difference in long-run average and discounted average behavior across states, indicating that the metrics can be used both in system verification and in performance evaluation. For turn-based games and MDPs, we provide a polynomial-time algorithm for the computation of the one-step metric distance between states. The algorithm is based on linear programming; it improves on the previously known exponential-time algorithm based on a reduction to the theory of reals. We then present PSPACE algorithms for both the decision problem and the problem of approximating the metric distance between two states, matching the best known algorithms for Markov chains. For the bisimulation kernel of the metric, our algorithm works in time O(n^4) for both turn-based games and MDPs, improving the previously best known O(n^9 log n) time algorithm for MDPs. For a concurrent game G, we show that computing the exact distance between states is at least as hard as computing the value of concurrent reachability games and the square-root-sum problem in computational geometry. We show that checking whether the metric distance is bounded by a rational r can be done via a reduction to the theory of real closed fields, involving a formula with three quantifier alternations, yielding O(|G|^O(|G|^5)) time complexity, improving the previously known reduction, which yielded O(|G|^O(|G|^7)) time complexity. These algorithms can be iterated to approximate the metrics using binary search.

1998 ACM Subject Classification: F.4.1, F.1.1.

Key words and phrases: game semantics, minimax theorem, metrics, ω-regular properties, quantitative μ-calculus, probabilistic choice, equivalence of states, refinement of states.

A preliminary version of this paper, titled "Algorithms for Game Metrics," appeared in the IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, December 2008. This is a version with full proofs and extensions.

LOGICAL METHODS IN COMPUTER SCIENCE, DOI: 10.2168/LMCS-6 (3:13) 2010

© K. Chatterjee, L. de Alfaro, R. Majumdar, and V. Raman
Creative Commons: http://creativecommons.org/about/licenses

    1. Introduction

System metrics constitute a quantitative generalization of system relations. The bisimulation relation captures state equivalence: two states s and t are bisimilar if and only if they cannot be distinguished by any formula of the μ-calculus [5]. The bisimulation metric captures the degree of difference between two states: the bisimulation distance between s and t is a real number that provides a tight bound for the difference in value of formulas of the quantitative μ-calculus at s and t [12]. A similar connection holds between the simulation relation and the simulation metric.

The classical system relations are a basic tool in the study of boolean properties of systems, that is, the properties that yield a truth value. As an example, if a state s of a transition system can reach a set of target states R, written s |= ◇R in temporal logic, and t can simulate s, then we can conclude t |= ◇R. System metrics play a similarly fundamental role in the study of the quantitative behavior of systems. As an example, if a state s of a Markov chain can reach a set of target states R with probability at least 0.8, written s |= P≥0.8 ◇R, and if the metric simulation distance from t to s is 0.3, then we can conclude t |= P≥0.5 ◇R. The simulation relation is at the basis of the notions of system refinement and implementation, where qualitative properties are concerned. In analogous fashion, simulation metrics provide a notion of approximate refinement and implementation for quantitative properties.

We consider three classes of systems:

• Markov decision processes. In these systems there is one player. At each state, the player can choose a move; the current state and the move determine a probability distribution over the successor states.

• Turn-based games. In these systems there are two players. At each state, only one of the two players can choose a move; the current state and the move determine a probability distribution over the successor states.

• Concurrent games. In these systems there are two players. At each state, both players choose moves simultaneously and independently; the current state and the chosen moves determine a probability distribution over the successor states.

System metrics were first studied for Markov chains and Markov decision processes (MDPs) [12, 32, 33, 13, 14], and they have recently been extended to two-player turn-based and concurrent games [11]. The fundamental property of the metrics is that they provide a tight bound for the difference in value that formulas belonging to quantitative specification languages assume at the states of a system. More precisely, let qμ indicate the quantitative μ-calculus, a specification language in which many of the classical specification properties, including reachability and safety properties, can be written [10]. The metric bisimulation distance between two states s and t, denoted [s ≅g t], has the property that [s ≅g t] = sup_{φ∈qμ} |[[φ]](s) − [[φ]](t)|, where [[φ]](s) and [[φ]](t) are the values φ assumes at s and t. To each metric is associated a kernel: the kernel of a metric d is the relation that relates the pairs of states that have distance 0. The kernel of the simulation metric is probabilistic simulation; the kernel of the bisimulation metric is probabilistic bisimulation [27].

Metric as bound for discounted and long-run average payoff. Our first result is that the metrics developed in [11] provide a bound for the difference in long-run average and discounted average properties across states of a system. These average rewards play a central role in the theory of stochastic games, and in its applications to optimal control and


economics [4, 17]. Thus, the metrics of [11] are useful both for system verification and for performance evaluation, supporting our belief that they constitute the canonical metrics for the study of the similarity of states in a game. We point out that it is possible to define a discounted version [≅g]^α of the game bisimulation metric; however, we show that this discounted metric does not provide a bound for the difference in discounted values.

Algorithmic results. Next, we investigate algorithms for the computation of the metrics. The metrics can be computed in iterative fashion, following the inductive way in which they are defined. A metric d can be computed as the limit of a monotonically increasing sequence of approximations d0, d1, d2, . . . , where d0(s, t) is the difference in value that variables can have at states s and t. For k ≥ 0, d_{k+1} is obtained from d_k via d_{k+1} = H(d_k), where the operator H depends on the metric (bisimulation or simulation) and on the type of system. Our main results are as follows:

(1) Metrics for turn-based games and MDPs. We show that for turn-based games and MDPs, the one-step metric operator H for both bisimulation and simulation can be computed in polynomial time, via a reduction to linear programming (LP). The only previously known algorithm, which can be inferred from [11], had EXPTIME complexity and relied on a reduction to the theory of real closed fields; the algorithm thus had more a complexity-theoretic than a practical value. The key step in obtaining our polynomial-time algorithm consists in transforming the original sup-inf non-linear optimization problem (which required the theory of reals) into a quadratic-size inf linear optimization problem that can be solved via LP. We then present PSPACE algorithms, both for the decision problem of the metric distance between two states and for the problem of computing the approximate metric distance between two states, for turn-based games and MDPs. Our algorithms match the complexity of the best known algorithms for the sub-class of Markov chains [31].

(2) Metrics for concurrent games. For concurrent games, our algorithms for the H operator still rely on decision procedures for the theory of real closed fields, leading to an EXPTIME procedure. However, the algorithms that could be inferred from [11] had time complexity O(|G|^O(|G|^7)), where |G| is the size of the game; we improve this result by presenting algorithms with O(|G|^O(|G|^5)) time complexity.

(3) Hardness of metric computation in concurrent games. We show that computing the exact distance of states of concurrent games is at least as hard as computing the value of concurrent reachability games [15, 8], which is known to be at least as hard as solving the square-root-sum problem in computational geometry [18]. These two problems are known to lie in PSPACE, and have resisted many attempts to show that they are in NP.

(4) Kernel of the metrics. We present polynomial-time algorithms to compute the simulation and bisimulation kernels of the metrics for turn-based games and MDPs. Our algorithm for the bisimulation kernel of the metric runs in time O(n^4) (assuming a constant number of moves), as compared to the previously known O(n^9 log n) algorithm of [35] for MDPs, where n is the size of the state space. For concurrent games, the simulation and the bisimulation kernel can be computed in time O(|G|^O(|G|^3)), where |G| is the size of the game.

Our formulation of probabilistic simulation and bisimulation differs from the one previously considered for MDPs in [2]: there, the names of moves (called labels) must be preserved by simulation and bisimulation, so that a move from a state has at most one


candidate simulator move at another state. Our problem for MDPs is closer to the one considered in [35], where labels must be preserved, but where a label can be associated with multiple probability distributions (moves).

For turn-based games and MDPs, the algorithms for probabilistic simulation and bisimulation can be obtained from the LP algorithms that yield the metrics. For probabilistic simulation, the algorithm we obtain coincides with the algorithm previously published in [35]. The algorithm requires the solution of feasibility-LP problems with a number of variables and inequalities that is quadratic in the size of the system. For probabilistic bisimulation, we are able to improve on this result by providing an algorithm that requires the solution of feasibility-LP problems that have linearly many variables and constraints. Precisely, as for ordinary bisimulation, the kernel is computed via iterative refinement of a partition of the state space [24]. Given two states that belong to the same partition, to decide whether the states need to be split in the next partition-refinement step, we present an algorithm that requires the solution of a feasibility-LP problem with a number of variables equal to the number of moves available at the states, and a number of constraints linear in the number of equivalence classes. Overall, our algorithm for bisimulation runs in time O(n^4) (assuming a constant number of moves), considerably improving the O(n^9 log n) algorithm of [35] for MDPs, and providing for the first time a polynomial algorithm for turn-based games.

    2. Definitions

Valuations. Let [θ1, θ2] ⊆ IR be a fixed, non-singleton real interval. Given a set of states S, a valuation over S is a function f : S → [θ1, θ2] associating with every state s ∈ S a value θ1 ≤ f(s) ≤ θ2; we let F be the set of all valuations. For c ∈ [θ1, θ2], we denote by c the constant valuation such that c(s) = c at all s ∈ S. We order valuations pointwise: for f, g ∈ F, we write f ≤ g iff f(s) ≤ g(s) at all s ∈ S; we remark that F, under ≤, forms a lattice. Given a, b ∈ IR, we write a ⊔ b = max{a, b} and a ⊓ b = min{a, b}; we also let a ⊕ b = min{θ2, max{θ1, a + b}} and a ⊖ b = max{θ1, min{θ2, a − b}}. We extend ⊔, ⊓, +, −, ⊕, ⊖ to valuations by interpreting them in pointwise fashion.
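The interval-bounded operators ⊕ and ⊖ can be stated directly in code. A minimal sketch (Python; the endpoints θ1, θ2 are parameters of the definition, and the instantiation [0, 1] below is only an assumption for illustration):

```python
# Bounded addition and subtraction on a fixed interval [theta1, theta2].
# The choice theta1, theta2 = 0, 1 is an illustrative assumption.
theta1, theta2 = 0.0, 1.0

def oplus(a, b):
    # a (+) b = min{theta2, max{theta1, a + b}}
    return min(theta2, max(theta1, a + b))

def ominus(a, b):
    # a (-) b = max{theta1, min{theta2, a - b}}
    return max(theta1, min(theta2, a - b))

def lift(op, f, g):
    # Pointwise extension of a binary operator to valuations f, g : S -> [theta1, theta2],
    # here represented as dictionaries keyed by state.
    return {s: op(f[s], g[s]) for s in f}
```

For instance, `oplus(0.7, 0.6)` clamps to the upper endpoint 1.0, and `ominus(0.2, 0.5)` clamps to the lower endpoint 0.0.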

Game structures. For a finite set A, let Dist(A) denote the set of probability distributions over A. We say that p ∈ Dist(A) is deterministic if there is a ∈ A such that p(a) = 1. We assume a fixed finite set V of observation variables.

A (two-player, concurrent) game structure G = ⟨S, [·], Moves, Γ1, Γ2, δ⟩ consists of the following components [1, 8]:

• A finite set S of states.

• A variable interpretation [·] : V → [θ1, θ2]^S, which associates with each variable v ∈ V a valuation [v].

• A finite set Moves of moves.

• Two move assignments Γ1, Γ2 : S → 2^Moves \ {∅}. For i ∈ {1, 2}, the assignment Γi associates with each state s ∈ S the nonempty set Γi(s) ⊆ Moves of moves available to player i at state s.

• A probabilistic transition function δ : S × Moves × Moves → Dist(S), which gives the probability δ(s, a1, a2)(t) of a transition from s to t when player 1 plays move a1 and player 2 plays move a2.


At every state s ∈ S, player 1 chooses a move a1 ∈ Γ1(s), and simultaneously and independently player 2 chooses a move a2 ∈ Γ2(s). The game then proceeds to the successor state t ∈ S with probability δ(s, a1, a2)(t). We let Dest(s, a1, a2) = {t ∈ S | δ(s, a1, a2)(t) > 0}. The propositional distance p(s, t) between two states s, t ∈ S is the maximum difference in the valuation of any variable:

p(s, t) = max_{v∈V} |[v](s) − [v](t)|.

The kernel of the propositional distance induces an equivalence on states: for states s, t, we let s ≡ t if p(s, t) = 0. In the following, unless otherwise noted, the definitions refer to a game structure with components G = ⟨S, [·], Moves, Γ1, Γ2, δ⟩. We indicate the opponent of a player i ∈ {1, 2} by ∼i = 3 − i. We consider the following subclasses of game structures.

Turn-based game structures. A game structure G is turn-based if we can write S = S1 ∪ S2 with S1 ∩ S2 = ∅, where s ∈ S1 implies |Γ2(s)| = 1 and s ∈ S2 implies |Γ1(s)| = 1, and further, there exists a special variable turn ∈ V such that [turn](s) = θ1 iff s ∈ S1, and [turn](s) = θ2 iff s ∈ S2.

Markov decision processes. For i ∈ {1, 2}, we say that a structure is an i-MDP if for all s ∈ S, |Γ∼i(s)| = 1. For MDPs, we omit the (single) move of the player without a choice of moves, and write δ(s, a) for the transition function.

Moves and strategies. A mixed move is a probability distribution over the moves available to a player at a state. We denote by Di(s) ⊆ Dist(Moves) the set of mixed moves available to player i ∈ {1, 2} at s ∈ S, where:

Di(s) = {D ∈ Dist(Moves) | D(a) > 0 implies a ∈ Γi(s)}.

The moves in Moves are called pure moves. We extend the transition function to mixed moves by defining, for s ∈ S and x1 ∈ D1(s), x2 ∈ D2(s),

δ(s, x1, x2)(t) = Σ_{a1∈Γ1(s)} Σ_{a2∈Γ2(s)} δ(s, a1, a2)(t) · x1(a1) · x2(a2).
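Concretely, the extension of δ to mixed moves is a bilinear combination of the pure-move distributions. A minimal sketch (Python; the dictionary encoding of δ and of mixed moves is an assumed representation, not the paper's notation):

```python
# delta_s[(a1, a2)] is a dict t -> probability, i.e. delta(s, a1, a2) in Dist(S).
# x1, x2 are mixed moves: dicts a -> probability over the moves at s.
def delta_mixed(delta_s, x1, x2):
    """delta(s, x1, x2)(t) = sum over a1, a2 of delta(s, a1, a2)(t) * x1(a1) * x2(a2)."""
    out = {}
    for a1, p1 in x1.items():
        for a2, p2 in x2.items():
            for t, p in delta_s[(a1, a2)].items():
                out[t] = out.get(t, 0.0) + p * p1 * p2
    return out
```

For example, with a matching-pennies-style δ that sends matching moves to t1 and mismatched moves to t2, two uniform mixed moves yield probability 1/2 for each successor.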

A path of G is an infinite sequence s0, s1, s2, . . . of states in S, such that for all k ≥ 0, there exist moves a^k_1 ∈ Γ1(sk) and a^k_2 ∈ Γ2(sk) with δ(sk, a^k_1, a^k_2)(sk+1) > 0. We write Ω for the set of all paths, and Ω_s for the set of all paths starting from state s.

A strategy for player i ∈ {1, 2} is a function πi : S+ → Dist(Moves) that associates with every non-empty finite sequence σ ∈ S+ of states, representing the history of the game, a probability distribution πi(σ), which is used to select the next move of player i; we require that for all σ ∈ S* and states s ∈ S, if πi(σs)(a) > 0, then a ∈ Γi(s). We write Πi for the set of strategies for player i. Once the starting state s and the strategies π1 and π2 for the two players have been chosen, the game is reduced to an ordinary stochastic process, denoted G^{π1,π2}_s, which defines a probability distribution on the set of paths. We denote by Pr^{π1,π2}_s(·) the probability of a measurable event (a set of paths) with respect to this process, and denote by E^{π1,π2}_s(·) the associated expectation operator. For k ≥ 0, we let Xk : Ω → S be the random variable denoting the k-th state along a path.

One-step expectations and predecessor operators. Given a valuation f ∈ F, a state s ∈ S, and two mixed moves x1 ∈ D1(s) and x2 ∈ D2(s), we define the expectation of f from s under x1, x2 by:

E^{x1,x2}_s(f) = Σ_{t∈S} δ(s, x1, x2)(t) · f(t).


For a game structure G, for i ∈ {1, 2} we define the valuation transformer Prei : F → F by, for all f ∈ F and s ∈ S:

Prei(f)(s) = sup_{xi∈Di(s)} inf_{x∼i∈D∼i(s)} E^{xi,x∼i}_s(f).

Intuitively, Prei(f)(s) is the maximal expectation of f that player i can achieve after one step from s: this is the standard one-day or next-stage operator of the theory of repeated games [17].
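In an i-MDP the inner infimum disappears (the opponent has a single move), and by linearity the supremum over mixed moves is attained at a pure move. A sketch of Pre1 for a 1-MDP under this simplification (Python; the dictionary representation is an assumption made for illustration):

```python
# For a 1-MDP: moves[s] lists the moves of player 1 at s; delta[(s, a)] is a
# dict t -> probability. Then
#   Pre1(f)(s) = max over a in moves[s] of sum_t delta(s, a)(t) * f(t),
# since the sup over mixed moves is attained at a pure move (linearity).
def pre1(f, moves, delta):
    return {
        s: max(sum(p * f[t] for t, p in delta[(s, a)].items()) for a in acts)
        for s, acts in moves.items()
    }
```

For a state with two moves, one looping back (value 0.2) and one reaching a better state (value 0.8), the operator picks the better move.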

2.1. Quantitative μ-calculus. We consider the set of properties expressed by the quantitative μ-calculus (qμ). As discussed in [20, 10, 22], a large set of properties can be encoded in qμ, spanning from basic properties, such as maximal reachability and safety probability, to the maximal probability of satisfying a general ω-regular specification.

Syntax. The syntax of the quantitative μ-calculus is defined with respect to the set of observation variables V as well as a set MVars of calculus variables, which are distinct from the observation variables in V. The syntax is given as follows:

φ ::= c | v | V | ¬φ | φ ∨ φ | φ ∧ φ | φ ⊕ c | φ ⊖ c | pre1(φ) | pre2(φ) | μV. φ | νV. φ

for constants c ∈ [θ1, θ2], observation variables v ∈ V, and calculus variables V ∈ MVars. In the formulas μV. φ and νV. φ, we furthermore require that all occurrences of the bound variable V in φ occur in the scope of an even number of occurrences of the complement operator ¬. A formula φ is closed if every calculus variable V in φ occurs in the scope of a quantifier μV or νV. From now on, with abuse of notation, we denote by qμ the set of closed formulas of qμ. A formula φ is a player i formula, for i ∈ {1, 2}, if φ does not contain the pre∼i operator; we denote by qμi the syntactic subset of qμ consisting only of closed player i formulas. A formula is in positive form if the negation ¬ appears only in front of constants and observation variables, i.e., in the contexts ¬c and ¬v; we denote by qμ+ and qμ+i the subsets of qμ and qμi consisting only of positive formulas.

We remark that the fixpoint operators μ and ν will not be needed to achieve our results on the logical characterization of game relations. They have been included in the calculus because they allow the expression of many interesting properties, such as safety, reachability, and in general, ω-regular properties. The operators ⊕ and ⊖, on the other hand, are necessary for our results.

Semantics. A variable valuation ρ : MVars → F is a function that maps every calculus variable V ∈ MVars to a valuation in F. We write ρ[V ↦ f] for the valuation that agrees with ρ on all variables, except that V is mapped to f. Given a game structure G and a variable valuation ρ, every formula φ of the quantitative μ-calculus defines a valuation [[φ]]^G_ρ ∈ F (the superscript G is omitted if the game structure is clear from the context):

[[c]]ρ = c                          [[v]]ρ = [v]
[[V]]ρ = ρ(V)                       [[¬φ]]ρ = θ1 + θ2 − [[φ]]ρ
[[φ ⊕ c]]ρ = [[φ]]ρ ⊕ c             [[φ ⊖ c]]ρ = [[φ]]ρ ⊖ c
[[φ1 ∨ φ2]]ρ = [[φ1]]ρ ⊔ [[φ2]]ρ    [[φ1 ∧ φ2]]ρ = [[φ1]]ρ ⊓ [[φ2]]ρ
[[prei(φ)]]ρ = Prei([[φ]]ρ)
[[μV. φ]]ρ = inf{f ∈ F | f = [[φ]]ρ[V↦f]}    [[νV. φ]]ρ = sup{f ∈ F | f = [[φ]]ρ[V↦f]}

where i ∈ {1, 2}. The existence of the fixpoints is guaranteed by the monotonicity and continuity of all operators, and they can be computed by Picard iteration [10]. If φ is closed, [[φ]]ρ is independent of ρ, and we write simply [[φ]].


Discounted quantitative μ-calculus. A discounted version of the μ-calculus was introduced in [9]; we call it dμ. Let Λ be a finite set of discount parameters that take values in the interval [0, 1). The discounted μ-calculus extends qμ by introducing discounted versions of the player pre modalities. The syntax replaces prei(φ) for player i ∈ {1, 2} with its discounted variant α·prei(φ), where α ∈ Λ is a discount factor that discounts one-step valuations. Negation in the calculus is defined by ¬(α·pre1(φ)) = (1 − α) + α·pre2(¬φ). This leads to two additional pre-modalities for the players, (1 − α) + α·prei(φ).

2.2. Game bisimulation and simulation metrics. A directed metric is a function d : S² → IR≥0 which satisfies d(s, s) = 0 and the triangle inequality d(s, t) ≤ d(s, u) + d(u, t) for all s, t, u ∈ S. We denote by M ⊆ S² → IR the space of all directed metrics; this space, ordered pointwise, forms a lattice which we indicate with (M, ≤). Since d(s, t) may be zero for s ≠ t, these functions are pseudo-metrics as per prevailing terminology [32]. In the following, we omit "directed" and simply say metric when the context is clear. For a metric d, we indicate with C(d) the set of valuations k ∈ F where k(s) − k(t) ≤ d(s, t) for every s, t ∈ S. The metric transformer H_≾1 : M → M is defined as follows, for all d ∈ M and s, t ∈ S:

H_≾1(d)(s, t) = p(s, t) ⊔ sup_{k∈C(d)} (Pre1(k)(s) − Pre1(k)(t)).  (2.1)

The player 1 game simulation metric [≾1] is the least fixpoint of H_≾1; the game bisimulation metric [≅1] is the least symmetrical fixpoint of H_≾1, obtained as the least fixpoint of the transformer H_≅1 defined as follows, for all d ∈ M and s, t ∈ S:

H_≅1(d)(s, t) = H_≾1(d)(s, t) ⊔ H_≾1(d)(t, s).  (2.2)

The operator H_≾1 is monotonic, non-decreasing, and continuous in the lattice (M, ≤). We can therefore compute [≾1] using Picard iteration; we denote by [≾^n_1] = H^n_≾1(0) the n-iterate of this. From the determinacy of concurrent games with respect to ω-regular goals [21], we have that the game bisimulation metric is reciprocal, in that [≅1] = [≅2]; we will thus simply write [≅g]. Similarly, for all s, t ∈ S we have [s ≾1 t] = [t ≾2 s].

The main result in [11] about these metrics is that they are logically characterized by the quantitative μ-calculus of [10], whose syntax and semantics were recalled in Section 2.1; we refer the reader to [10] for further details. Given a game structure G, every closed formula φ of the quantitative μ-calculus defines a valuation [[φ]] ∈ F. Let qμ (respectively, qμ+1) consist of all quantitative μ-calculus formulas (respectively, all quantitative μ-calculus formulas with only the Pre1 operator and all negations before atomic propositions). The result of [11] shows that for all states s, t ∈ S,

[s ≾1 t] = sup_{φ∈qμ+1} ([[φ]](s) − [[φ]](t)),    [s ≅g t] = sup_{φ∈qμ} |[[φ]](s) − [[φ]](t)|.  (2.3)

Metrics for the discounted quantitative μ-calculus. We call dμα the discounted μ-calculus with all discount parameters equal to α. We define the discounted metrics via an α-discounted metric transformer H^α_≾1 : M → M, defined for all d ∈ M and all s, t ∈ S by:

H^α_≾1(d)(s, t) = p(s, t) ⊔ α · sup_{k∈C(d)} (Pre1(k)(s) − Pre1(k)(t)).  (2.4)

Again, H^α_≾1 is continuous and monotonic in the lattice (M, ≤). The α-discounted simulation metric [≾1]^α is the least fixpoint of H^α_≾1, and the α-discounted bisimulation metric [≅1]^α is the least symmetrical fixpoint of H^α_≾1. The following result follows easily by induction on


the Picard iterations used to compute the distances [9]; for all states s, t ∈ S and a discount factor α ∈ [0, 1),

[s ≾1 t]^α ≤ [s ≾1 t],    [s ≅1 t]^α ≤ [s ≅1 t].  (2.5)

Using techniques similar to the undiscounted case, we can prove that for every game structure G and discount factor α ∈ [0, 1), the fixpoint [≾i]^α is a directed metric and [≅i]^α is a metric, and that they are reciprocal, i.e., [s ≾1 t]^α = [t ≾2 s]^α and [≅1]^α = [≅2]^α. Given that the discounted bisimulation metric coincides for the two players, we write [≅g]^α instead of [≅1]^α and [≅2]^α. We now state without proof that the discounted μ-calculus provides a logical characterization of the discounted metric. The proof is based on induction on the structure of formulas, and closely follows the result for the undiscounted case [11]. Let dμα (respectively, dμα,+1) consist of all discounted μ-calculus formulas (respectively, all discounted μ-calculus formulas with only the Pre1 operator and all negations before atomic propositions). It follows that for all game structures G and states s, t ∈ S,

[s ≾1 t]^α = sup_{φ∈dμα,+1} ([[φ]](s) − [[φ]](t)),    [s ≅g t]^α = sup_{φ∈dμα} |[[φ]](s) − [[φ]](t)|.  (2.6)

Metric kernels. The kernel of the metric [≅g] (resp. [≅g]^α) defines an equivalence relation ≅g (resp. ≅g^α) on the states of a game structure: s ≅g t (resp. s ≅g^α t) iff [s ≅g t] = 0 (resp. [s ≅g t]^α = 0); the relation ≅g is called the game bisimulation relation [11] and the relation ≅g^α is called the discounted game bisimulation relation. Similarly, we define the game simulation preorder ≾1 as the kernel of the directed metric [≾1], that is, s ≾1 t iff [s ≾1 t] = 0. The discounted game simulation preorder is defined analogously.

    3. Bounds for Average and Discounted Payoff Games

From (2.3) it follows that the game bisimulation metric provides a tight bound for the difference in valuations of quantitative μ-calculus formulas. In this section, we show that the game bisimulation metric also provides a bound for the difference in average and discounted values of games. This lends further support to the game bisimulation metric, and its kernel, the game bisimulation relation, being the canonical game metrics and relations.

3.1. Discounted payoff games. Let π1 ∈ Π1 and π2 ∈ Π2 be strategies of player 1 and player 2, respectively. Let α ∈ [0, 1) be a discount factor. The α-discounted payoff v^α_1(s, π1, π2) for player 1 at a state s for a variable r ∈ V and the strategies π1 and π2 is defined as:

v^α_1(s, π1, π2) = (1 − α) Σ_{n=0}^{∞} α^n E^{π1,π2}_s([r](Xn)),  (3.1)

where Xn is a random variable representing the state of the game in step n. The discounted payoff for player 2 is defined as v^α_2(s, π1, π2) = −v^α_1(s, π1, π2). Thus, player 1 wins (and player 2 loses) the discounted sum of the valuations of r along the path, where the discount factor α weighs future rewards. Given a state s ∈ S, we are interested in finding the maximal payoff w^α_i(s) that player i can ensure against all opponent strategies, when the game starts from state s ∈ S. This maximal payoff is given by:

w^α_i(s) = sup_{πi∈Πi} inf_{π∼i∈Π∼i} v^α_i(s, πi, π∼i).


These values can be computed as the limit of the sequence of α-discounted, n-step rewards, for n → ∞. For i ∈ {1, 2}, we define a sequence of valuations w^α_i(0), w^α_i(1), w^α_i(2), . . . as follows: for all s ∈ S and n ≥ 0,

w^α_i(n + 1)(s) = (1 − α) · [r](s) + α · Prei(w^α_i(n))(s),  (3.2)

where the initial valuation w^α_i(0) is arbitrary. Shapley proved that w^α_i = lim_{n→∞} w^α_i(n) [28].
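For a 1-MDP, iteration (3.2) is ordinary discounted value iteration, with Pre1 a maximum over pure moves. A sketch under an assumed dictionary representation (Python; a fixed iteration count stands in for a convergence test, for simplicity):

```python
# Shapley iteration (3.2) for a 1-MDP:
#   w(n+1)(s) = (1 - alpha) * r[s] + alpha * max_a sum_t delta[(s, a)][t] * w(n)[t]
def discounted_value(states, moves, delta, r, alpha, iters=1000):
    w = {s: r[s] for s in states}   # w(0): any initial valuation works
    for _ in range(iters):
        w = {s: (1 - alpha) * r[s]
                + alpha * max(sum(p * w[t] for t, p in delta[(s, a)].items())
                              for a in moves[s])
             for s in states}
    return w
```

For a chain s → t with t absorbing, r(s) = 2, r(t) = 5, and α = 0.9, the iteration converges to w(t) = 5 and w(s) = (1 − 0.9)·2 + 0.9·5 = 4.7.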

3.2. Average payoff games. Let π1 and π2 be strategies of player 1 and player 2, respectively. The average payoff v1(s, π1, π2) for player 1 at a state s for a variable r ∈ V and the strategies π1 and π2 is defined as

v1(s, π1, π2) = liminf_{n→∞} (1/n) Σ_{k=0}^{n−1} E^{π1,π2}_s([r](Xk)),  (3.3)

where Xk is a random variable representing the k-th state of the game. The payoff for player 2 is v2(s, π1, π2) = −v1(s, π1, π2). A game structure G with average payoff is called an average reward game. The average value of the game G at s for player i ∈ {1, 2} is defined as

wi(s) = sup_{πi∈Πi} inf_{π∼i∈Π∼i} vi(s, πi, π∼i).

Mertens and Neyman established the determinacy of average reward games, and showed that the limit of the discounted value of a game, as the discount factor tends to 1, is the same as the average value of the game: for all s ∈ S and i ∈ {1, 2}, we have lim_{α→1} w^α_i(s) = wi(s) [23]. It is easy to show that the average value of a game is a valuation.

3.3. Metrics for discounted and average payoffs. We show that the game simulation metric [≾1] provides a bound for discounted and long-run average rewards. The discounted metric [≾1]^α, on the other hand, does not provide such a bound, as the following example shows.

Figure 1: Example that shows that the discounted metric may not be an upper bound for the difference in the discounted value across states. (The figure depicts the four states s, t, s′, t′ with [r]-values 2, 5, 2.1, 8, respectively.)

Example 1. Consider a game consisting of four states s, t, s′, t′ and a variable r, with [r](s) = 2, [r](s′) = 2.1, [r](t) = 5, and [r](t′) = 8, as shown in Figure 1. All players have only one move at each state, and the transition relation is deterministic: s moves to t, s′ moves to t′, and t and t′ are absorbing. Consider a discount factor α = 0.9. The 0.9-discounted metric distance between states s and s′ is [s ≅g s′]^0.9 = 0.9 · (8 − 5) = 2.7. For the difference in discounted values between the states we proceed as follows. Using formulation (3.2), taking w(0)(t) = 5: since state t is absorbing, we get w(1)(t) = (1 − 0.9) · 5 + 0.9 · 5 = 5, which leads to w(n)(t) = 5 for all n ≥ 0. Similarly, w(n)(t′) = 8 for all n ≥ 0. Therefore, the difference in discounted values between s′ and s, again using (3.2), is given by: w(s′) − w(s) = (1 − 0.9) · (2.1 − 2) + 0.9 · (8 − 5) = 2.71 > 2.7.
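The arithmetic of Example 1 can be replayed numerically with iteration (3.2); a sketch (Python; the state names and the fixed iteration count are assumptions):

```python
# Example 1: chains s -> t (absorbing) and s' -> t' (absorbing), alpha = 0.9.
alpha = 0.9
r = {"s": 2.0, "t": 5.0, "sp": 2.1, "tp": 8.0}
succ = {"s": "t", "t": "t", "sp": "tp", "tp": "tp"}   # deterministic single moves

w = dict(r)                                           # w(0) = [r]
for _ in range(200):
    w = {q: (1 - alpha) * r[q] + alpha * w[succ[q]] for q in w}

diff = w["sp"] - w["s"]             # difference in discounted values: 2.71
bound = alpha * (r["tp"] - r["t"])  # 0.9-discounted metric distance: 2.7
# diff > bound: the discounted metric is not an upper bound here.
```

The iteration converges to w(s) = 4.7 and w(s′) = 7.41, so the value difference 2.71 indeed exceeds the discounted metric distance 2.7.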


    In the following we consider player 1 rewards (the case for player 2 is identical).

    Theorem 1. The following assertions hold.

(1) For all game structures G, all α-discounted rewards w^α_1, and all states s, t ∈ S, we have: (a) w^α_1(s) − w^α_1(t) ≤ [s ≾1 t], and (b) |w^α_1(s) − w^α_1(t)| ≤ [s ≅g t].

(2) There exists a game structure G and states s, t ∈ S such that for all α-discounted rewards w^α_1: w^α_1(t) − w^α_1(s) > [t ≅g s]^α.

Proof. We first prove assertion (1)(a). As the metric can be computed via Picard iteration, we have for all n ≥ 0:

[s ≾^n_1 t] = p(s, t) ⊔ sup_{k∈C([≾^{n−1}_1])} (Pre1(k)(s) − Pre1(k)(t)).  (3.4)

We prove by induction on n ≥ 0 that w^α_1(n)(s) − w^α_1(n)(t) ≤ [s ≾^n_1 t]. For all s ∈ S, taking w^α_1(0)(s) = [r](s), the base case follows. Assume the result holds for n − 1 ≥ 0. We have:

w^α_1(n)(s) − w^α_1(n)(t)
= (1 − α) · [r](s) + α · Pre1(w^α_1(n − 1))(s) − (1 − α) · [r](t) − α · Pre1(w^α_1(n − 1))(t)
= (1 − α) · ([r](s) − [r](t)) + α · (Pre1(w^α_1(n − 1))(s) − Pre1(w^α_1(n − 1))(t))
≤ (1 − α) · p(s, t) + α · [s ≾^n_1 t] ≤ [s ≾^n_1 t],

where the last step follows by (3.4), since by the induction hypothesis we have w^α_1(n − 1) ∈ C([≾^{n−1}_1]). This proves assertion (1)(a). Given (1)(a), assertion (1)(b) follows from the definition [s ≅g t] = [s ≾1 t] ⊔ [t ≾1 s].

The example shown in Figure 1 proves the second assertion.

Using the fact that the limit of the discounted reward, for a discount factor that approaches 1, is equal to the average reward, we obtain that the metrics provide a bound for the difference in average values as well.

Corollary 1. For all game structures G and states s and t, we have (a) w1(s) − w1(t) ≤ [s ≾1 t] and (b) |w1(s) − w1(t)| ≤ [s ≅g t].

3.4. Metrics for total rewards. The total reward vT1(s, π1, π2) for player 1 at a state s for a variable r ∈ V and the strategies π1 ∈ Π1 and π2 ∈ Π2 is defined as [17]:

vT1(s, π1, π2) = liminf_{n→∞} (1/n) Σ_{k=0}^{n−1} Σ_{j=0}^{k} E^{π1,π2}_s([r](Xj)),  (3.5)

where Xj is a random variable representing the j-th state of the game. The payoff vT2(s, π1, π2) for player 2 is defined by replacing [r] with −[r] in (3.5). The total-reward value of the game G at s for player i ∈ {1, 2} is defined analogously to the average value, via:

wTi(s) = sup_{πi∈Πi} inf_{π∼i∈Π∼i} vTi(s, π1, π2).

    While the game simulation metric [g] provides an upper bound for the difference in dis-counted reward across states, as well as for the difference in average reward across states,it does not provide a bound for the difference in total reward. We now introduce a new


metric, the total reward metric, $[\oplus_g]$, which provides such a bound. For a discount factor $\alpha \in [0,1]$, we define a metric transformer $H^\alpha_{\oplus_1}: M \to M$ as follows. For all $d \in M$ and $s, t \in S$, we let:

$$H^\alpha_{\oplus_1}(d)(s,t) = p(s,t) + \alpha \sup_{k \in C(d)} \big(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\big). \eqno(3.6)$$

The metric $[\oplus_1^\alpha]$ (resp. $[\oplus_g^\alpha]$) is obtained as the least (resp. least symmetrical) fixpoint of (3.6). We write $[\oplus_1]$ for $[\oplus_1^1]$, and $[\oplus_g]$ for $[\oplus_g^1]$. These metrics are reciprocal, i.e., $[s \oplus_1^\alpha t] = [t \oplus_2^\alpha s]$ and $[s \oplus_1 t] = [t \oplus_2 s]$. If $\alpha < 1$ we get the discounted total reward metric, and if $\alpha = 1$ we get the undiscounted total reward metric. While the discounted total reward metric is bounded, the undiscounted total reward metric may not be. The total metrics provide bounds for the difference in discounted, average, and total reward between states.

Theorem 2. The following assertions hold.

(1) For all game structures $G$, all discount factors $\alpha \in [0,1)$, and all states $s, t \in S$: (a) $[s \oplus_1^\alpha t] \le (\theta_2-\theta_1)/(1-\alpha)$; (b) $[s \preceq_1^\alpha t] \le [s \oplus_1^\alpha t]$; (c) $w_1^\alpha(s) - w_1^\alpha(t) \le [s \oplus_1^\alpha t]$; (d) $w_1(s) - w_1(t) \le [s \oplus_1 t]$; (e) $w^T_1(s) - w^T_1(t) \le [s \oplus_1 t]$.

(2) There exists a game structure $G$ and states $s, t \in S$ such that $[s \oplus_1 t] = \infty$.

Proof. For assertion (1)(a), notice that $p(s,t) \le \theta_2 - \theta_1$. Consider the $n$-step Picard iterate towards the metric distance. We have

$$[s \oplus_1^{\alpha,n} t] \le \sum_{i=0}^{n} \alpha^i (\theta_2 - \theta_1).$$

In the limit this yields $[s \oplus_1^\alpha t] \le (\theta_2-\theta_1)/(1-\alpha)$. Assertion (1)(b) follows by induction on the Picard iterations that realize the metric distances: for all $n \ge 0$, $[s \preceq_1^{\alpha,n} t] \le [s \oplus_1^{\alpha,n} t]$. Assertion (1)(c) follows by the definition of the discounted total reward metric, in which the $\sqcup$ of the simulation metric is replaced with a $+$. By induction, for all $n \ge 0$, from the proof of Theorem 1 we have

$$w_1^{\alpha,(n)}(s) - w_1^{\alpha,(n)}(t) \le (1-\alpha)\,p(s,t) + \alpha\big(\mathrm{Pre}_1(w_1^{\alpha,(n-1)})(s) - \mathrm{Pre}_1(w_1^{\alpha,(n-1)})(t)\big) \le [s \oplus_1^{\alpha,n} t].$$

For assertion (1)(d), towards an inductive argument on the Picard iterates that realize the metrics, for all $n \ge 0$ we have $[s \preceq_1^n t] \le [s \oplus_1^n t]$, which in the limit gives $[s \preceq_1 t] \le [s \oplus_1 t]$. This leads to $w_1(s) - w_1(t) \le [s \oplus_1 t]$, using Corollary 1. This proves assertion (1)(d). We now prove assertion (1)(e) by induction, showing that for all $n \ge 0$, $w^{T,(n)}_1(s) - w^{T,(n)}_1(t) \le [s \oplus_1^n t]$. As the metric can be computed via Picard iteration, we have for all $n \ge 0$:

$$[s \oplus_1^n t] = p(s,t) + \sup_{k \in C([\oplus_1^{n-1}])} \big(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\big). \eqno(3.7)$$

We define a valuation transformer $u$ by $u(0) = [r]$ and, for all $n > 0$ and states $s \in S$,

$$u(n)(s) = [r](s) + \mathrm{Pre}_1(u(n-1))(s).$$


We take $w^{T,(0)}_1 = u(0) = [r]$ and, for $n > 0$, from the definition of total rewards (3.5), we get the $n$-step total reward value at a state $s \in S$ in terms of $u$ as

$$w^{T,(n)}_1(s) = \frac{1}{n} \sum_{i=1}^{n} u(i)(s).$$

Notice that $w^{T,(n)}_1(s) \le u(n)(s)$ for all $n \ge 0$. When $n = 0$, the result is immediate by the definition of $w^{T,(0)}_1$, noticing that $[s \oplus_1^0 t] = p(s,t)$. Assume the result holds for $n-1 \ge 0$. We have:

$$\begin{aligned}
w^{T,(n)}_1(s) - w^{T,(n)}_1(t)
&= \frac{1}{n}\sum_{i=1}^{n} u(i)(s) - \frac{1}{n}\sum_{i=1}^{n} u(i)(t)
 = \frac{1}{n}\sum_{i=1}^{n} \big(u(i)(s) - u(i)(t)\big) \\
&= \frac{1}{n}\sum_{i=1}^{n} \Big(\big([r](s) - [r](t)\big) + \big(\mathrm{Pre}_1(u(i-1))(s) - \mathrm{Pre}_1(u(i-1))(t)\big)\Big) \quad (3.8) \\
&\le \frac{1}{n}\sum_{i=1}^{n} [s \oplus_1^i t] \quad (3.9) \\
&\le [s \oplus_1^n t], \quad (3.10)
\end{aligned}$$

where (3.9) follows from (3.8) by (3.7), since by the induction hypothesis we have $u(i) \in C([\oplus_1^i])$ for all $0 \le i < n$, and (3.10) follows from (3.9) by the monotonicity of the undiscounted total reward metric. To prove assertion (2), consider the game structure on the left-hand side of Figure 1. The total reward at state $s$ is unbounded: $w^T_1(s) = 2 + 5 + \cdots = \infty$. Now consider a modified version of the game, with identical structure, and with states $s'$ and $t'$ corresponding to $s$ and $t$ of the original game. Let $[r](t') = 0$. In the modified game, $w^T_1(s') = 2$. From result (1)(e), since $w^T_1(s) = \infty$ and $w^T_1(s') = 2$, we have $[s \oplus_1 s'] = \infty$.

It is a simple observation that the quantitative $\mu$-calculus does not provide a logical characterization of $[\oplus_1]$ or $[\oplus_g]$. In fact, all formulas of the quantitative $\mu$-calculus have valuations in the interval $[\theta_1, \theta_2]$, while, as stated in Theorem 2, the total reward can be unbounded. The difference is essentially due to the fact that our version of the quantitative $\mu$-calculus lacks a $+$ operator. It is not clear how to introduce such a $+$ operator in a context sufficiently restricted to provide a logical characterization of $[\oplus_1]$; above all, it is not clear whether a canonical calculus, with interesting formal properties, would be obtained.

3.5. Metric kernels. We now show that the kernels of all the metrics defined in the paper coincide: an algorithm developed for the game kernels $\preceq_1$ and $\simeq_g$ computes the kernels of the corresponding discounted and total reward metrics as well.

Theorem 3. For all game structures $G$, states $s$ and $t$, and all discount factors $\alpha \in (0,1)$, the following statements are equivalent: (a) $[s \preceq_1 t] = 0$; (b) $[s \preceq_1^\alpha t] = 0$; (c) $[s \oplus_1^\alpha t] = 0$.


Proof. We prove (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (a); we assume $0 < \alpha < 1$. The implication (a) $\Rightarrow$ (b) is immediate from the definitions of the metrics. For (b) $\Rightarrow$ (c), we proceed by induction on the Picard iterates and show that, for all $n \ge 0$, $[s \preceq_1^{\alpha,n} t] = 0$ implies $[s \oplus_1^{\alpha,n} t] = 0$. Assume the claim holds up to $n-1$, and assume towards a contradiction that $[s \preceq_1^{\alpha,n} t] = 0$ but $[s \oplus_1^{\alpha,n} t] > 0$. Then there must be $k \in C([\oplus_1^{\alpha,n-1}])$ such that $\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t) > 0$. By our induction hypothesis, there exists a $\lambda > 0$ such that $k' = \lambda k \in C([\preceq_1^{\alpha,n-1}])$. Since $\mathrm{Pre}$ is multi-linear, the player optimal responses in $\mathrm{Pre}_1(k)(s)$ remain optimal for $k'$. But this means $\mathrm{Pre}_1(k')(s) - \mathrm{Pre}_1(k')(t) > 0$ for $k' \in C([\preceq_1^{\alpha,n-1}])$, leading to $[s \preceq_1^{\alpha,n} t] > 0$; a contradiction. Therefore, (b) $\Rightarrow$ (c). In a similar fashion we can show that (c) $\Rightarrow$ (a).

4. Algorithms for Turn-Based Games and MDPs

In this section, we present algorithms for computing the metric and its kernel for turn-based games and MDPs. We first present a polynomial-time algorithm that computes the operator $H_{\preceq_i}(d)$, which gives the exact one-step distance between two states, for $i \in \{1,2\}$. We then present a PSPACE algorithm to decide whether the limit distance between two states $s$ and $t$ (i.e., $[s \preceq_1 t]$) is at most a rational value $r$; our algorithm matches the best known bound for the special class of Markov chains [31]. Finally, we present improved algorithms for the important case of the kernel of the metrics. Since by Theorem 3 the kernels of the metrics introduced in this paper coincide, we present our algorithms for the kernel of the undiscounted metric. For the bisimulation kernel, our algorithm is significantly more efficient than previous algorithms.

4.1. Algorithms for the metrics. For turn-based games and MDPs, only one player has a choice of moves at a given state. We consider two player-1 states; a similar analysis applies to player-2 states. We remark that the distance between states in $S_i$ and $S_{\sim i}$ is always $\theta_2 - \theta_1$, due to the existence of the variable $\mathit{turn}$. For a metric $d \in M$ and states $s, t \in S_1$, since $p(s,t)$ is trivially computed from its definition, computing $H_{\preceq_1}(d)(s,t)$ entails evaluating the expression $\sup_{k \in C(d)} \big(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\big)$, which is the same as $\sup_{k \in C(d)} \sup_{x \in D_1(s)} \inf_{y \in D_1(t)} \big(\mathrm{E}^x_s(k) - \mathrm{E}^y_t(k)\big)$, since $\mathrm{Pre}_1(k)(s) = \sup_{x \in D_1(s)} \mathrm{E}^x_s(k)$ and $\mathrm{Pre}_1(k)(t) = \sup_{y \in D_1(t)} \mathrm{E}^y_t(k)$, player 1 being the only player with a choice of moves at states $s$ and $t$. By expanding the expectations, we get the following form:

$$\sup_{k \in C(d)} \sup_{x \in D_1(s)} \inf_{y \in D_1(t)} \Big( \sum_{u \in S} \sum_{a \in \Gamma_1(s)} \delta(s,a)(u)\, x(a)\, k(u) \;-\; \sum_{v \in S} \sum_{b \in \Gamma_1(t)} \delta(t,b)(v)\, y(b)\, k(v) \Big). \eqno(4.1)$$

We observe that the one-step distance as defined in (4.1) is a sup-inf non-linear (quadratic) optimization problem. We now present two lemmas by which we transform (4.1) into an inf of linear optimization problems, which we solve by linear programming (LP). The first lemma reduces (4.1) to an equivalent formulation that considers only pure moves at state $s$. The second lemma further reduces (4.1), using duality, to a formulation that can be solved by LP.


Lemma 1. For all turn-based game structures $G$, all player-$i$ states $s$ and $t$, and every metric $d \in M$, the following equality holds:

$$\sup_{k \in C(d)} \sup_{x \in D_i(s)} \inf_{y \in D_i(t)} \big(\mathrm{E}^x_s(k) - \mathrm{E}^y_t(k)\big) = \sup_{a \in \Gamma_i(s)} \inf_{y \in D_i(t)} \sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big).$$

Proof. We prove the result for player-1 states $s$ and $t$; the proof for player 2 is identical. Given a metric $d \in M$, we have:

$$\begin{aligned}
\sup_{k \in C(d)} \sup_{x \in D_1(s)} \inf_{y \in D_1(t)} \big(\mathrm{E}^x_s(k) - \mathrm{E}^y_t(k)\big)
&= \sup_{k \in C(d)} \Big( \sup_{x \in D_1(s)} \mathrm{E}^x_s(k) - \sup_{y \in D_1(t)} \mathrm{E}^y_t(k) \Big) \\
&= \sup_{k \in C(d)} \Big( \sup_{a \in \Gamma_1(s)} \mathrm{E}^a_s(k) - \sup_{y \in D_1(t)} \mathrm{E}^y_t(k) \Big) \quad (4.2) \\
&= \sup_{k \in C(d)} \sup_{a \in \Gamma_1(s)} \inf_{y \in D_1(t)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big) \\
&= \sup_{a \in \Gamma_1(s)} \sup_{k \in C(d)} \inf_{y \in D_1(t)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big) \quad (4.3) \\
&= \sup_{a \in \Gamma_1(s)} \inf_{y \in D_1(t)} \sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big) \quad (4.4)
\end{aligned}$$

For a fixed $k \in C(d)$, since pure optimal strategies exist at each state for turn-based games and MDPs, we can replace $\sup_{x \in D_1(s)}$ with $\sup_{a \in \Gamma_1(s)}$, yielding (4.2). Since the difference in expectations is multi-linear, $y \in D_1(t)$ ranges over a compact convex set of probability distributions, and $C(d)$ is a compact convex set, we can use the generalized minimax theorem [29] and interchange the innermost sup and inf to get (4.4) from (4.3).

The proof of Lemma 1 is illustrated by the following example.

Figure 2: An example illustrating the proof of Lemma 1 ((a) MDP 1; (b) MDP 2).

Example 4.1. Consider the example in Figure 2. In the MDPs shown in the figure, every move leads to a unique successor state, with the exception of move $e \in \Gamma_1(s)$, which leads to states $u$ and $v$ with equal probability. Assume the variable valuations are such that all states are at a propositional distance of 1 from one another. Without loss of generality, assume the valuation $k \in C(d)$ is such that $k(u) > k(v)$. By the linearity of expectations, for move $c \in \Gamma_1(s)$, $\mathrm{E}^c_s(k) \ge \mathrm{E}^x_s(k)$ for all $x \in D_1(s)$. Similar arguments can be made for $k(u) < k(v)$. This gives an informal justification for step (4.2) in the proof: given a $k \in C(d)$, there exist pure optimal strategies for the single player with a choice of moves at each state. While we can use pure moves at states $s$ and $t$ if $k \in C(d)$ is known, the principal difficulty in directly computing the left-hand side of the equality arises from the uncountably many values of $k$; the distance is the supremum over all possible values of $k$. In the final equality, step


(4.4), and hence by this lemma, we have avoided this difficulty by exhibiting an equivalent expression that picks a $k \in C(d)$ to expose the difference in the distributions induced over the states. As we shall see, this enables computing the one-step metric distance using a trans-shipping formulation. We remark that while we can use pure moves at state $s$, we cannot do so at state $t$ on the right-hand side of step (4.4) of the proof. First, the proof of the lemma depends on $D_1(t)$ being convex. Second, if we could restrict our attention to pure moves at state $t$, then we could replace $\inf_{y \in D_1(t)}$ with $\inf_{f \in \Gamma_1(t)}$ on the right-hand side; but this yields too fine a one-step distance. Consider move $e$ at state $s$. Neither $c$ nor $b$ at state $t$ yields a distribution over states that matches the distribution induced by $e$; we can then always pick a $k \in C(d)$ such that $\mathrm{E}^e_s(k) - \mathrm{E}^f_t(k) > 0$. If we choose $y \in D_1(t)$ such that $y(b) = y(c) = \frac{1}{2}$, we match the distribution induced by move $e$ from state $s$, which implies that for any choice of $k \in C(d)$, $\mathrm{E}^e_s(k) - \mathrm{E}^y_t(k) = 0$. Intuitively, the right-hand side of the equality can be interpreted as a game between a protagonist and an antagonist: the protagonist picks, for every pure move $a \in \Gamma_1(s)$, a $y \in D_1(t)$ to match the induced distributions over states; the antagonist then picks a $k \in C(d)$ to maximize the difference in the induced distributions. If the distributions match, then no choice of $k \in C(d)$ yields a difference in expectations bounded away from 0.

From Lemma 1, given $d \in M$, we can write the player-1 one-step distance between states $s$ and $t$ as follows:

$$\mathrm{OneStep}(s,t,d) = \sup_{a \in \Gamma_1(s)} \inf_{y \in D_1(t)} \sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big). \eqno(4.5)$$

Hence we compute, for all $a \in \Gamma_1(s)$, the expression

$$\mathrm{OneStep}(s,t,d,a) = \inf_{y \in D_1(t)} \sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big),$$

and then choose the maximum, i.e., $\max_{a \in \Gamma_1(s)} \mathrm{OneStep}(s,t,d,a)$. We now present a lemma that helps reduce the above inf-sup optimization problem to a linear program. We first introduce some notation. We denote by $\Lambda$ the set of variables $\lambda_{u,v}$, for $u, v \in S$. Given $a \in \Gamma_1(s)$ and a distribution $y \in D_1(t)$, we write $\lambda \in \Lambda(s,t,a,y)$ if the following linear constraints are satisfied:

(1) for all $v \in S$: $\sum_{u \in S} \lambda_{u,v} = \delta(s,a)(v)$;
(2) for all $u \in S$: $\sum_{v \in S} \lambda_{u,v} = \sum_{b \in \Gamma_1(t)} y(b)\,\delta(t,b)(u)$;
(3) for all $u, v \in S$: $\lambda_{u,v} \ge 0$.

Lemma 2. For all turn-based game structures and MDPs $G$, all $d \in M$, and all $s, t \in S$, the following assertion holds:

$$\sup_{a \in \Gamma_1(s)} \inf_{y \in D_1(t)} \sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big) = \sup_{a \in \Gamma_1(s)} \inf_{y \in D_1(t)} \inf_{\lambda \in \Lambda(s,t,a,y)} \sum_{u,v \in S} d(u,v)\, \lambda_{u,v}.$$

Proof. Since duality always holds in LP, from the LP-duality-based results of [32], for all $a \in \Gamma_1(s)$ and $y \in D_1(t)$, the maximization over all $k \in C(d)$ can be rewritten as a minimization problem as follows:

$$\sup_{k \in C(d)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^y_t(k)\big) = \inf_{\lambda \in \Lambda(s,t,a,y)} \sum_{u,v \in S} d(u,v)\, \lambda_{u,v}.$$


The formula on the right-hand side of the above equality is the trans-shipping formulation: it computes the minimum cost of shipping the distribution $\delta(s,a)$ into $\delta(t,y)$, with edge costs $d$. The result of the lemma follows.

Using the above result, we obtain the following LP for $\mathrm{OneStep}(s,t,d,a)$, over the variables (a) $\{\lambda_{u,v}\}_{u,v \in S}$ and (b) $y_b$ for $b \in \Gamma_1(t)$:

$$\text{Minimize } \sum_{u,v \in S} d(u,v)\, \lambda_{u,v} \text{ subject to} \eqno(4.6)$$

(1) for all $v \in S$: $\sum_{u \in S} \lambda_{u,v} = \delta(s,a)(v)$;
(2) for all $u \in S$: $\sum_{v \in S} \lambda_{u,v} = \sum_{b \in \Gamma_1(t)} y_b\,\delta(t,b)(u)$;
(3) for all $u, v \in S$: $\lambda_{u,v} \ge 0$;
(4) for all $b \in \Gamma_1(t)$: $y_b \ge 0$;
(5) $\sum_{b \in \Gamma_1(t)} y_b = 1$.
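For a fixed mixed move $y$, the inner minimization above is exactly the transportation problem of Lemma 2 between the distributions $\delta(s,a)$ and $\delta(t,y)$. The following sketch (an illustration, not part of the paper; the input encoding is ours) computes this trans-shipping cost exactly, as a min-cost flow solved by successive shortest paths over rational arithmetic:

```python
from fractions import Fraction as F

def trans_ship_cost(mu, nu, d):
    """Minimum cost of shipping distribution mu onto distribution nu,
    where moving one unit of mass from u to v costs d[u][v] (the
    trans-shipping formulation of Lemma 2).  Solved as a min-cost flow
    by successive shortest paths, with Bellman-Ford on the residual
    graph; Fractions keep the arithmetic exact."""
    states = sorted(set(mu) | set(nu))
    n = len(states)
    idx = {u: i for i, u in enumerate(states)}
    src, snk, N = 2 * n, 2 * n + 1, 2 * n + 2
    graph = [[] for _ in range(N)]  # adjacency lists of [to, cap, cost, rev]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, F(0), -cost, len(graph[u]) - 1])

    for u in states:
        add_edge(src, idx[u], F(mu.get(u, 0)), F(0))        # supplies
        add_edge(n + idx[u], snk, F(nu.get(u, 0)), F(0))    # demands
        for v in states:
            add_edge(idx[u], n + idx[v], F(1), F(d[u][v]))  # shipping edges

    total = F(0)
    while True:
        dist, prev = [None] * N, [None] * N
        dist[src] = F(0)
        for _ in range(N):  # Bellman-Ford (residual costs may be negative)
            changed = False
            for u in range(N):
                if dist[u] is None:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        changed = True
            if not changed:
                break
        if dist[snk] is None:
            return total  # no augmenting path left: all mass shipped
        f, v = None, snk  # bottleneck capacity along the shortest path
        while v != src:
            u, i = prev[v]
            f = graph[u][i][1] if f is None else min(f, graph[u][i][1])
            v = u
        v = snk           # augment along the path
        while v != src:
            u, i = prev[v]
            graph[u][i][1] -= f
            graph[v][graph[u][i][3]][1] += f
            v = u
        total += f * dist[snk]

# Unit cost across the two states, zero within (an indicator distance).
d01 = {'u': {'u': 0, 'v': 1}, 'v': {'u': 1, 'v': 0}}
cost_matched = trans_ship_cost({'u': F(1, 2), 'v': F(1, 2)},
                               {'u': F(1, 2), 'v': F(1, 2)}, d01)
cost_half = trans_ship_cost({'u': F(1, 2), 'v': F(1, 2)}, {'u': F(1)}, d01)
```

Matching distributions (as in the protagonist's move mix of Example 4.1) give cost 0, while shipping $\{u{:}\frac12, v{:}\frac12\}$ onto $\{u{:}1\}$ costs $\frac12$. The full LP (4.6) additionally optimizes over the mix $y$ itself, which this sketch does not do.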

Example 4.2. We now use the MDPs in Figures 3(a) and 3(b) to compute the simulation distance between states using the results in Lemma 1 and Lemma 2. In the figure, states of the same color have a propositional distance of 0, and states of different colors have a propositional distance of 1; $p(s,s') = p(t,t') = p(u,u') = p(v,v') = p(t,w') = 0$. In MDP 1, shown in Figure 3(a), $\delta(s,a)(t) = \delta(t,b)(v) = \delta(t,c)(u) = 1$ and $\delta(t,f)(u) = \delta(t,f)(v) = \frac{1}{2}$. In MDP 2, shown in Figure 3(b), $\delta(s',a)(w') = \delta(s',b)(t') = 1$, $\delta(t',c)(u') = \frac{1}{2} - \varepsilon$, $\delta(t',c)(v') = \frac{1}{2} + \varepsilon$, $\delta(w',e)(u') = \delta(w',f)(v') = 1 - \varepsilon$, and $\delta(w',e)(v') = \delta(w',f)(u') = \varepsilon$.

Figure 3: An example used to compute the simulation metric between states ((a) MDP 1; (b) MDP 2). States of the same color have a propositional distance of 0.

In Table 2, we show the simulation metric distances between states of the MDPs in Figure 3(a) and Figure 3(b). Consider states $t$ and $t'$. The move $c$ is the only move available to player 1 from state $t'$, and it induces a transition probability of $\frac{1}{2} + \varepsilon$ to state $v'$ and $\frac{1}{2} - \varepsilon$ to state $u'$. For the pure move $c$ at state $t$, the induced transition probabilities and edge costs in the trans-shipping formulation are shown in Figure 4(a). It is easy to see that the trans-shipping cost in this case is $\frac{1}{2} + \varepsilon$; it is shown in Table 1 in the row corresponding to move $c$ from state $t$ and the column corresponding to state $t'$. Similarly, the trans-shipping costs for the moves $b$ and $f$ from state $t$ are $\frac{1}{2} - \varepsilon$ and $\varepsilon$ respectively. The metric distance […]


We construct an existential theory-of-reals formula to decide whether $[s \preceq_1 t] \le r$. Since $[s \preceq_1 t]$ is the least fixed point of $H_{\preceq_1}$, we define a formula $\Phi(r)$ that is true iff, in the fixpoint, $[s \preceq_1 t] \le r$, as follows:

$$\exists d \in M. \Big[ \Big( \bigwedge_{u,v \in S} \mathrm{OneStep}(u,v,d) = d(u,v) \Big) \wedge \big(d(s,t) \le r\big) \Big].$$

If the formula $\Phi(r)$ is true, then there exists a fixpoint $d$ such that $d(s,t)$ is bounded by $r$, which implies that in the least fixpoint $d(s,t)$ is bounded by $r$. Conversely, if in the least fixpoint $d(s,t)$ is bounded by $r$, then the least fixpoint is a witness $d$ for $\Phi(r)$ being true. Since the existential theory of the reals is decidable in PSPACE [6], we have the following result.

Theorem 4.4 (Decision complexity for exact distance). For all turn-based game structures and MDPs $G$, given a rational $r$ and two states $s$ and $t$, whether $[s \preceq_1 t] \le r$ can be decided in PSPACE.

Approximation. Given a rational $\varepsilon > 0$, using binary search and $O(\log((\theta_2-\theta_1)/\varepsilon))$ calls to check the formula $\Phi(r)$, we can obtain an interval $[l,u]$ with $u - l \le \varepsilon$ such that $[s \preceq_1 t]$ lies in the interval $[l,u]$.

Corollary 2 (Approximation for exact distance). For all turn-based game structures and MDPs $G$, given a rational $\varepsilon > 0$ and two states $s$ and $t$, an interval $[l,u]$ with $u - l \le \varepsilon$ such that $[s \preceq_1 t] \in [l,u]$ can be computed in PSPACE.
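The binary search behind these approximation results can be sketched generically; `decide_at_most` below stands in for the decision oracle answering whether $[s \preceq_1 t] \le r$ (e.g., a call to the formula $\Phi(r)$), and is an assumption of the sketch:

```python
def approximate_distance(decide_at_most, lo, hi, eps):
    """Bracket the metric distance within eps by binary search, using
    O(log((hi - lo) / eps)) calls to the oracle decide_at_most(r),
    which answers whether the distance is <= r."""
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if decide_at_most(mid):
            hi = mid          # distance <= mid: shrink from above
        else:
            lo = mid          # distance > mid: shrink from below
    return lo, hi

# Illustration with a stand-in oracle whose hidden distance is 0.3.
lo, hi = approximate_distance(lambda r: r >= 0.3, 0.0, 1.0, 1e-4)
```

The invariant is that the true distance always stays inside $[lo, hi]$, so the returned interval has width at most $\varepsilon$ and contains the distance.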

4.2. Algorithms for the kernel. The kernel of the simulation metric $[\preceq_1]$ can be computed as the limit of the sequence $\preceq_1^0, \preceq_1^1, \preceq_1^2, \ldots$ of relations. For all $s, t \in S$, we have $(s,t) \in\ \preceq_1^0$ iff $s$ and $t$ have the same variable valuation. For all $n \ge 0$, we have $(s,t) \in\ \preceq_1^{n+1}$ iff $\mathrm{OneStep}(s,t,1_{\preceq_1^n}) = 0$, where $1_{\preceq_1^n}$ denotes the distance function that assigns distance 0 to the pairs in $\preceq_1^n$ and distance 1 to all other pairs. Checking the condition $\mathrm{OneStep}(s,t,1_{\preceq_1^n}) = 0$ corresponds to solving an LP feasibility problem for every $a \in \Gamma_1(s)$, as it suffices to replace the minimization goal $\gamma = \sum_{u,v \in S} 1_{\preceq_1^n}(u,v)\,\lambda_{u,v}$ with the constraint $\gamma = 0$ in the LP (4.6). We note that this is the same LP feasibility problem that was introduced in [35] as part of an algorithm to decide simulation of probabilistic systems in which each label may lead to one or more distributions over states.

For the bisimulation kernel, we present a more efficient algorithm, which also improves on the algorithms presented in [35]. The idea is to proceed by partition refinement, as usual for bisimulation computations. The refinement step is as follows: given a partition, two states $s$ and $t$ belong to the same refined partition iff every pure move from $s$ induces a probability distribution on the equivalence classes that can be matched by mixed moves from $t$, and vice versa. Precisely, we compute a sequence $Q_0, Q_1, Q_2, \ldots$ of partitions. Two states $s, t$ belong to the same class of $Q_0$ iff they have the same variable valuation. For $n \ge 0$, since by the definition of the bisimulation metric given in (2.2) we have $[s \simeq_g t] = 0$ iff $[s \preceq_1 t] = 0$ and $[t \preceq_1 s] = 0$, two states $s, t$ in a given class of $Q_n$ remain in the same class of $Q_{n+1}$ iff both $(s,t)$ and $(t,s)$ satisfy the feasibility LP problems $\mathrm{OneStepBis}$ given below. $\mathrm{OneStepBis}(s,t,Q)$ consists of one feasibility LP problem for each $a \in \Gamma(s)$. The problem for $a \in \Gamma(s)$ has the set of variables $\{x_b \mid b \in \Gamma(t)\}$ and the set of


constraints:

(1) for all $b \in \Gamma(t)$: $x_b \ge 0$;
(2) $\sum_{b \in \Gamma(t)} x_b = 1$;
(3) for all $V \in Q$: $\sum_{b \in \Gamma(t)} \sum_{u \in V} x_b\,\delta(t,b)(u) \ge \sum_{u \in V} \delta(s,a)(u)$.
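Because each of these problems is a small feasibility question, it can even be checked without an LP library by enumerating basic feasible solutions exactly over the rationals: a feasible system with nonnegativity constraints has a basic feasible solution supported on linearly independent columns. The following sketch (an illustration under assumed data structures, not the paper's algorithm) checks, for one move $a$ at $s$, whether some mixed move at $t$ matches $\delta(s,a)$ on every class of the partition; over a full partition, the "$\ge$" of constraint (3) forces equality, which is what the sketch tests:

```python
from fractions import Fraction as F
from itertools import combinations

def solve_unique(A, b):
    """Gaussian elimination over the rationals on [A | b]; returns the
    unique solution if the columns of A are independent and the system
    is consistent, and None otherwise."""
    rows = len(A)
    cols = len(A[0]) if A and A[0] else 0
    M = [[F(A[i][j]) for j in range(cols)] + [F(b[i])] for i in range(rows)]
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            return None                      # dependent columns
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        r += 1
    if any(M[i][-1] != 0 for i in range(r, rows)):
        return None                          # inconsistent
    return [M[i][-1] for i in range(cols)]

def one_step_bis_feasible(delta_s_a, moves_t, Q):
    """Is there x >= 0 with sum(x) = 1 over the moves of t whose induced
    distribution matches delta(s, a) on every class V of partition Q?
    A[V][b] is the mass move b sends into class V; we try all column
    subsets, looking for a basic feasible solution."""
    labels = list(moves_t)
    A = [[sum(moves_t[b].get(u, F(0)) for u in V) for b in labels] for V in Q]
    c = [sum(delta_s_a.get(u, F(0)) for u in V) for V in Q]
    for size in range(1, min(len(labels), len(Q)) + 1):
        for cols in combinations(range(len(labels)), size):
            x = solve_unique([[row[j] for j in cols] for row in A], c)
            if x is not None and all(xi >= 0 for xi in x) and sum(x) == 1:
                return True
    return False

# Illustrative data in the spirit of Example 4.1: at t the pure moves b, c
# reach v and u; mixing them 1/2-1/2 matches the uniform move at s.
moves_t = {'b': {'v': F(1)}, 'c': {'u': F(1)}}
feasible = one_step_bis_feasible({'u': F(1, 2), 'v': F(1, 2)}, moves_t,
                                 [{'u'}, {'v'}])
infeasible = one_step_bis_feasible({'u': F(1)}, {'b': {'v': F(1)}},
                                   [{'u'}, {'v'}])
```

Enumeration over column subsets is exponential in the number of moves in the worst case; the polynomial bounds of this section rely on solving the problem as an LP instead.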

In the following theorem, we show that two states $s, t \in S$ are $(n{+}1)$-step bisimilar iff $\mathrm{OneStepBis}(s,t,Q_n)$ and $\mathrm{OneStepBis}(t,s,Q_n)$ are feasible.

Theorem 4.5. For all turn-based game structures and MDPs $G$ and all $n \ge 0$, given two states $s, t \in S$ and an $n$-step bisimulation partition $Q_n$ of the states, such that for all $V \in Q_n$ and all $u, v \in V$ we have $[u \simeq_g v]^n = 0$, the following holds: $[s \simeq_g t]^{n+1} = 0$ iff $\mathrm{OneStepBis}(s,t,Q_n)$ and $\mathrm{OneStepBis}(t,s,Q_n)$ are both feasible.

Proof. We proceed by induction on $n$. Assume the result holds for all iteration steps up to $n$, and consider the case for $n+1$. In one direction, if $[s \simeq_g t]^{n+1} = 0$, then $[s \preceq_1 t]^{n+1} = [t \preceq_1 s]^{n+1} = 0$ by the definition of the bisimulation metric. We need to show that, given $[s \preceq_1 t]^{n+1} = 0$, $\mathrm{OneStepBis}(s,t,Q_n)$ is feasible; the proof is identical for $[t \preceq_1 s]^{n+1} = 0$. From the definition of the $(n{+}1)$-step simulation distance, given $p(s,t) = 0$ by our induction hypothesis, we have:

$$\forall b \in \Gamma_1(s).\ \inf_{x \in D_1(t)} \sup_{k \in C(d_n)} \big(\mathrm{E}^b_s(k) - \mathrm{E}^x_t(k)\big) \le 0. \eqno(4.7)$$

Consider a player-1 move $a \in \Gamma_1(s)$. Since we can interchange the order of the inf and sup in $\inf_{x \in D_1(t)} \sup_{k \in C(d_n)} (\mathrm{E}^a_s(k) - \mathrm{E}^x_t(k))$ by the generalized minimax theorem, the optimal values of $x \in D_1(t)$ and $k \in C(d_n)$ exist and depend only on $a$. Let $x_a$ and $k_a$ be the optimal values of $x$ and $k$ that realize the inf and sup. Using $x_a$ and $k_a$ in (4.7), we have $\mathrm{E}^{x_a}_t(k_a) \ge \mathrm{E}^a_s(k_a)$, i.e.:

$$\sum_{u \in S} \delta(t,x_a)(u)\, k_a(u) \;\ge\; \sum_{v \in S} \delta(s,a)(v)\, k_a(v)$$
$$\sum_{V \in Q_n} \sum_{u \in V} \delta(t,x_a)(u)\, k_a(u) \;\ge\; \sum_{V \in Q_n} \sum_{v \in V} \delta(s,a)(v)\, k_a(v) \eqno(4.8)$$
$$\sum_{V \in Q_n} k_a(V) \sum_{u \in V} \delta(t,x_a)(u) \;\ge\; \sum_{V \in Q_n} k_a(V) \sum_{v \in V} \delta(s,a)(v) \eqno(4.9)$$
$$\forall V \in Q_n.\ \sum_{u \in V} \delta(t,x_a)(u) \;\ge\; \sum_{u \in V} \delta(s,a)(u), \eqno(4.10)$$

where $k_a(V)$ denotes the common value of $k_a$ on the class $V$. Step (4.9) follows from (4.8) by noting that for all $V \in Q_n$ and all states $u, v \in V$, $d_n(u,v) = d_n(v,u) = 0$ by our hypothesis, so that $k(u) - k(v) \le d_n(u,v) = 0$ and $k(v) - k(u) \le d_n(v,u) = 0$, which implies $k(u) = k(v)$ for all $k \in C(d_n)$. To show that (4.10) follows from (4.9), assume towards a contradiction that there exists a $V \in Q_n$ such that $\sum_{u \in V} \delta(t,x_a)(u) < \sum_{u \in V} \delta(s,a)(u)$ (note that $\delta(t,x_a)$ and $\delta(s,a)$ are probability distributions, so in each case the probability mass allocated to the equivalence classes sums to 1). Further, for all $V \in Q_n$, for all $u, v \in V$, we have $d_n(u,v) = d_n(v,u) = 0$, and for all $u \in V$ and all


$w \in S \setminus V$, we have $d_n(u,w) = d_n(w,u) = 1$. Therefore, we can pick a feasible $k' \in C(d_n)$ such that $k'(v) > 0$ for all $v \in V$ and $k'(v) = 0$ for all other states. Using $k'$ we get $\mathrm{E}^a_s(k') - \mathrm{E}^{x_a}_t(k') > 0$, which means that $k_a$ is not optimal, contradicting (4.7).

In the other direction, assume that $\mathrm{OneStepBis}(s,t,Q_n)$ is feasible. We need to show that $[s \preceq_1 t]^{n+1} = 0$. Since $\mathrm{OneStepBis}(s,t,Q_n)$ is feasible, for every $a \in \Gamma_1(s)$ there exists a distribution $x_a \in D_1(t)$ such that $\forall V \in Q_n.\ \sum_{u \in V} \delta(t,x_a)(u) \ge \sum_{v \in V} \delta(s,a)(v)$. By our induction hypothesis, this implies that for all $k \in C(d_n)$ we have $\mathrm{E}^a_s(k) - \mathrm{E}^{x_a}_t(k) \le 0$, and in particular $\sup_{k \in C(d_n)} (\mathrm{E}^a_s(k) - \mathrm{E}^{x_a}_t(k)) \le 0$. Since $p(s,t) = 0$ by our hypothesis, and we have shown

$$\forall a \in \Gamma_1(s).\ \inf_{x \in D_1(t)} \sup_{k \in C(d_n)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^x_t(k)\big) \le 0,$$

we have, from Lemma 1,

$$[s \preceq_1 t]^{n+1} = p(s,t) \sqcup \sup_{a \in \Gamma_1(s)} \inf_{x \in D_1(t)} \sup_{k \in C(d_n)} \big(\mathrm{E}^a_s(k) - \mathrm{E}^x_t(k)\big) = 0.$$

In a similar fashion, if $\mathrm{OneStepBis}(t,s,Q_n)$ is feasible, then $[t \preceq_1 s]^{n+1} = 0$, which leads to $[s \simeq_g t]^{n+1} = 0$ by the definition of the bisimulation metric, as required.

Complexity. The number of partition refinement steps required for the computation of both the simulation and the bisimulation kernel is bounded by $O(|S|^2)$ for turn-based games and MDPs, where $S$ is the set of states. At every refinement step, at most $O(|S|^2)$ state pairs are considered, and for each state pair $(s,t)$, at most $|\Gamma(s)|$ LP feasibility problems need to be solved. Let us denote by $\mathrm{LPF}(n,m)$ the complexity of solving the feasibility of $m$ linear inequalities over $n$ variables. We obtain the following result.

Theorem 4.6. For all turn-based game structures and MDPs $G$, the following assertions hold:

(1) the simulation kernel can be computed in $O\big(n^4 \cdot m \cdot \mathrm{LPF}(n^2 + m,\ n^2 + 2n + m + 2)\big)$ time;
(2) the bisimulation kernel can be computed in $O\big(n^4 \cdot m \cdot \mathrm{LPF}(m,\ n + m + 1)\big)$ time;

where $n = |S|$ is the size of the state space and $m = \max_{s \in S} |\Gamma(s)|$.

Remark 1. The best known algorithm for $\mathrm{LPF}(n,m)$ works in time $O(n^{2.5} \log n)$ [34] (assuming each arithmetic operation takes unit time). The previous algorithm for the bisimulation kernel checked two-way simulation and hence has complexity $O(n^4 \cdot m \cdot (n^2+m)^{2.5} \log(n^2+m))$, whereas our algorithm works in time $O(n^4 \cdot m \cdot m^{2.5} \log m)$. For most practical purposes, the number of moves at a state is constant (i.e., $m$ is constant). For the case when $m$ is constant, the previous best known algorithm worked in $O(n^9 \log n)$ time, whereas our algorithm works in $O(n^4)$ time.
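The overall partition-refinement loop can be sketched as follows. The sketch specialises the refinement step to Markov chains (one move per state), where matching by mixed moves degenerates to equality of the induced class distributions; in the general MDP and game case, this equality test is replaced by the OneStepBis feasibility checks. The encoding of the chain is hypothetical:

```python
def bisim_partition(succ, val):
    """Partition refinement for the bisimulation kernel, specialised to
    Markov chains: succ maps each state to its (single) successor
    distribution, val to its variable valuation.  Q0 groups states with
    equal valuation; a refinement step splits a block whenever two of
    its states send different probability mass to the current blocks."""
    blocks = {}
    for s in succ:                      # Q0: same variable valuation
        blocks.setdefault(val[s], set()).add(s)
    part = list(blocks.values())
    while True:
        index = {s: i for i, B in enumerate(part) for s in B}
        def signature(s):               # mass sent to each current block
            mass = {}
            for u, p in succ[s].items():
                mass[index[u]] = mass.get(index[u], 0) + p
            return tuple(sorted(mass.items()))
        refined = {}
        for s in succ:
            refined.setdefault((index[s], signature(s)), set()).add(s)
        new_part = list(refined.values())
        if len(new_part) == len(part):  # stable: no block was split
            return new_part
        part = new_part

# Hypothetical chain: u, v absorbing with distinct valuations; s and t
# both reach them with probability 1/2 each, s2 with a different mix.
succ = {'u': {'u': 1.0}, 'v': {'v': 1.0},
        's': {'u': 0.5, 'v': 0.5}, 't': {'u': 0.5, 'v': 0.5},
        's2': {'u': 0.4, 'v': 0.6}}
val = {'u': 1, 'v': 0, 's': 2, 't': 2, 's2': 2}
partition = bisim_partition(succ, val)
```

On this chain the loop stabilises with $s$ and $t$ in one block and $s_2$ separated from them, mirroring the $O(|S|^2)$-step refinement bound discussed above.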

    5. Algorithms for Concurrent Games

In this section, we first show that the computation of the metric distance is at least as hard as the computation of optimal values in concurrent reachability games. The exact complexity of the latter is open, but it is known to be at least as hard as the square-root-sum problem, which is in PSPACE but whose membership in NP is a long-standing open problem [16, 18]. Next, we present algorithms, based on a decision procedure for the theory of real closed fields, both for checking bounds on the exact distance and for computing the kernels of the metrics. Our reduction to the theory of real closed fields removes one quantifier alternation


when compared to the previously known formula (inferred from [11]). This improves the complexity of the algorithm.

5.1. Reduction of reachability games to metrics. We will use the following terms in the result. A proposition is a boolean observation variable, and we say that a state $s$ is labeled by a proposition $q$ iff $q$ is true at $s$. A state $t$ is absorbing in a concurrent game if both players have only one action available at $t$, and the next state of $t$ is always $t$ (it is a state with a self-loop). For a proposition $q$, let $\Diamond q$ denote the set of paths that visit a state labeled by $q$ at least once. In concurrent reachability games, the objective is $\Diamond q$ for a proposition $q$, and without loss of generality all states labeled by $q$ are absorbing states.

Theorem 4. Consider a concurrent game structure $G$, with a single proposition $q$, such that all states labeled by $q$ are absorbing states. We can construct, in linear time, a concurrent game structure $G'$, with one additional state $t'$, such that for all $s \in S$ we have

$$[s \preceq_1 t'] = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr^{\pi_1,\pi_2}_s(\Diamond q).$$

Proof. The concurrent game structure $G'$ is obtained from $G$ by adding an absorbing state $t'$. The states that are not labeled by $q$, as well as the additional state $t'$, are labeled by the complement $\neg q$. Observe that there is only one proposition sequence from $t'$, namely $(\neg q)^\omega$. To prove the desired claim, we show that for all $s \in S$ we have $[s \preceq_1 t'] = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr^{\pi_1,\pi_2}_s(\Diamond q)$. From a state $s$ in $G'$, the possible proposition sequences can be expressed by the $\omega$-regular expression $(\neg q)^\omega + (\neg q)^* \cdot q^\omega$. Since the proposition sequence from $t'$ is $(\neg q)^\omega$, the supremum of the difference in values of quantitative $\mu$-calculus formulas at $s$ and $t'$ is obtained by satisfying the set of paths formalized as $(\neg q)^* \cdot q^\omega$ at $s$. The set of paths defined by $(\neg q)^* \cdot q^\omega$ is the same as the set of paths reaching $q$ in any number of steps, since all states labeled by $q$ are absorbing. Hence,

$$\sup_{\varphi \in q\mu^+} \big([[\varphi]](s) - [[\varphi]](t')\big) = [[\mu X.(q \vee \mathrm{Pre}_1(X))]](s).$$

It follows from the results of [10] that for all $s \in S$ we have

$$[[\mu X.(q \vee \mathrm{Pre}_1(X))]](s) = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr^{\pi_1,\pi_2}_s(\Diamond q).$$

From the above equalities and the logical characterization result (2.3), we obtain the desired result.

5.2. Algorithms for the metrics. We first prove a lemma that helps to obtain reduced-complexity algorithms for concurrent games. The lemma states that the distance $[s \preceq_1 t]$ is attained by restricting player 2 to pure moves at state $t$, for all states $s, t \in S$.

Lemma 3. For all concurrent game structures $G$ and all metrics $d \in M$, we have:

$$\begin{aligned}
& \sup_{k \in C(d)} \sup_{x_1 \in D_1(s)} \inf_{y_1 \in D_1(t)} \sup_{y_2 \in D_2(t)} \inf_{x_2 \in D_2(s)} \big(\mathrm{E}^{x_1,x_2}_s(k) - \mathrm{E}^{y_1,y_2}_t(k)\big) \\
=\; & \sup_{k \in C(d)} \sup_{x_1 \in D_1(s)} \inf_{y_1 \in D_1(t)} \sup_{b \in \Gamma_2(t)} \inf_{x_2 \in D_2(s)} \big(\mathrm{E}^{x_1,x_2}_s(k) - \mathrm{E}^{y_1,b}_t(k)\big).
\end{aligned} \eqno(5.1)$$


Proof. To prove our claim, we fix $k \in C(d)$ and player-1 mixed moves $x \in D_1(s)$ and $y \in D_1(t)$. We then have:

$$\begin{aligned}
\sup_{y_2 \in D_2(t)} \inf_{x_2 \in D_2(s)} \big(\mathrm{E}^{x,x_2}_s(k) - \mathrm{E}^{y,y_2}_t(k)\big)
&= \inf_{x_2 \in D_2(s)} \mathrm{E}^{x,x_2}_s(k) - \inf_{y_2 \in D_2(t)} \mathrm{E}^{y,y_2}_t(k) \quad (5.2) \\
&= \inf_{x_2 \in D_2(s)} \mathrm{E}^{x,x_2}_s(k) - \inf_{b \in \Gamma_2(t)} \mathrm{E}^{y,b}_t(k) \quad (5.3) \\
&= \sup_{b \in \Gamma_2(t)} \inf_{x_2 \in D_2(s)} \big(\mathrm{E}^{x,x_2}_s(k) - \mathrm{E}^{y,b}_t(k)\big),
\end{aligned}$$

where (5.3) follows from (5.2) since the decomposition on the right-hand side of (5.2) yields two independent linear optimization problems; the optimal values are attained at vertices of the convex hulls of the distributions induced by the pure player-2 moves at the two states. This easily leads to the result.

We now present algorithms for the metrics in concurrent games. Due to the reduction from concurrent reachability games shown in Theorem 4, it is unlikely that there is an NP algorithm for the metric distance between states. We therefore construct statements in the theory of real closed fields: first, to decide whether $[s \preceq_1 t] \le r$ for a rational $r$, so that we can approximate the metric distance between states $s$ and $t$; and second, to decide whether $[s \preceq_1 t] = 0$, in order to compute the kernels of the game simulation and bisimulation metrics.

The statements improve on the complexity that can be achieved by a direct translation of the statements of [11] into the theory of real closed fields. The complexity reduction is based on the observation that, using Lemma 3, we can replace a sup operator with a finite conjunction, and therefore reduce the quantifier complexity of the resulting formula. Fix a game structure $G$ and states $s$ and $t$ of $G$. We proceed to construct a statement in the theory of reals that can be used to decide whether $[s \preceq_1 t] \le r$ for a given rational $r$.

In the following, we use the variables $x_1$, $y_1$ and $x_2$ to denote the sets of variables $\{x_1(a) \mid a \in \Gamma_1(s)\}$, $\{y_1(a) \mid a \in \Gamma_1(t)\}$ and $\{x_2(b) \mid b \in \Gamma_2(s)\}$ respectively. We use $k$ to denote the set of variables $\{k(u) \mid u \in S\}$, and $d$ for the set of variables $\{d(u,v) \mid u, v \in S\}$. The variables $\alpha$, $\beta$, $\gamma$, $\delta$ range over the reals. For convenience, we assume $\Gamma_2(t) = \{b_1, \ldots, b_l\}$.

First, notice that we can write formulas stating that a variable $x$ is a mixed move for a player at state $s$, and that $k$ is a constructible predicate (i.e., $k \in C(d)$):

$$\mathrm{IsDist}(x, \Gamma_1(s)) \equiv \bigwedge_{a \in \Gamma_1(s)} x(a) \ge 0 \;\wedge\; \bigwedge_{a \in \Gamma_1(s)} x(a) \le 1 \;\wedge\; \sum_{a \in \Gamma_1(s)} x(a) = 1,$$

$$\mathrm{kBounded}(k, d) \equiv \bigwedge_{u \in S} \big(k(u) \ge \theta_1 \wedge k(u) \le \theta_2\big) \;\wedge\; \bigwedge_{u,v \in S} \big(k(u) - k(v) \le d(u,v)\big).$$
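For illustration, the two predicates can be generated mechanically as plain-text constraint strings (a sketch only; the string syntax is ours and is not the input format of any specific solver):

```python
def is_dist(xs):
    """IsDist(x, Gamma): every x(a) lies in [0, 1] and the x(a) sum to 1."""
    parts = [f"0 <= {x} <= 1" for x in xs]
    parts.append(" + ".join(xs) + " == 1")
    return " & ".join(parts)

def k_bounded(states):
    """kBounded(k, d): theta1 <= k(u) <= theta2, and k is d-Lipschitz:
    k(u) - k(v) <= d(u, v) for every ordered pair of states."""
    parts = [f"theta1 <= k_{u} <= theta2" for u in states]
    parts += [f"k_{u} - k_{v} <= d_{u}_{v}" for u in states for v in states]
    return " & ".join(parts)

dist_formula = is_dist(["x_a1", "x_a2"])
bound_formula = k_bounded(["s", "t"])
```

For two moves, `is_dist` yields the three conjuncts of $\mathrm{IsDist}$; note that the constraint count of `k_bounded` grows quadratically with the number of states, matching the $\bigwedge_{u,v \in S}$ above.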

In the following, we write bounded quantifiers of the form $\exists x_1 \in D_1(s)$ or $\forall k \in C(d)$, which mean, respectively, $\exists x_1.\ \mathrm{IsDist}(x_1, \Gamma_1(s)) \wedge \cdots$ and $\forall k.\ \mathrm{kBounded}(k,d) \to \cdots$.

Let $\psi(k, x_1, x_2, y_1, b)$ be the polynomial $\mathrm{E}^{x_1,x_2}_s(k) - \mathrm{E}^{y_1,b}_t(k)$. Notice that $\psi$ is a polynomial of degree 3. We write $a = \max\{a_1, \ldots, a_l\}$, for variables $a, a_1, \ldots, a_l$, as an abbreviation for the formula

$$\Big(a = a_1 \wedge \bigwedge_{i=1}^{l} a_1 \ge a_i\Big) \vee \cdots \vee \Big(a = a_l \wedge \bigwedge_{i=1}^{l} a_l \ge a_i\Big).$$


We construct the formula for game simulation in stages. First, we construct a formula $\varphi_1(d,s,t,k,x_1,\alpha)$, with free variables $d$, $k$, $x_1$, $\alpha$, such that $\varphi_1(d,s,t,k,x_1,\alpha)$ holds for a valuation of the variables iff

$$\alpha = \inf_{y_1 \in D_1(t)} \sup_{b \in \Gamma_2(t)} \inf_{x_2 \in D_2(s)} \big(\mathrm{E}^{x_1,x_2}_s(k) - \mathrm{E}^{y_1,b}_t(k)\big).$$

We use the following observation to move the innermost inf ahead of the sup over the finite set $\Gamma_2(t)$ (for a function $f$):

$$\sup_{b \in \Gamma_2(t)} \inf_{x_2 \in D_2(s)} f(b, x_2, x) = \inf_{x_2^{b_1} \in D_2(s)} \cdots \inf_{x_2^{b_l} \in D_2(s)} \max\big(f(b_1, x_2^{b_1}, x), \ldots, f(b_l, x_2^{b_l}, x)\big).$$

The formula $\varphi_1(d,s,t,k,x_1,\alpha)$ is given by:

$$\begin{aligned}
& \forall y_1 \in D_1(t).\ \forall x_2^{b_1} \in D_2(s) \ldots \forall x_2^{b_l} \in D_2(s).\ \forall w_1 \ldots w_l.\ \forall a. \\
& \exists \tilde{y}_1 \in D_1(t).\ \exists \tilde{x}_2^{b_1} \in D_2(s) \ldots \exists \tilde{x}_2^{b_l} \in D_2(s).\ \exists \tilde{w}_1 \ldots \tilde{w}_l.\ \exists \tilde{a}. \\
& \quad \Big[ \big( w_1 = \psi(k, x_1, x_2^{b_1}, y_1, b_1) \wedge \cdots \wedge w_l = \psi(k, x_1, x_2^{b_l}, y_1, b_l) \wedge a = \max\{w_1, \ldots, w_l\} \big) \to (a \ge \alpha) \Big] \\
& \quad \wedge \Big[ \tilde{w}_1 = \psi(k, x_1, \tilde{x}_2^{b_1}, \tilde{y}_1, b_1) \wedge \cdots \wedge \tilde{w}_l = \psi(k, x_1, \tilde{x}_2^{b_l}, \tilde{y}_1, b_l) \wedge \tilde{a} = \max\{\tilde{w}_1, \ldots, \tilde{w}_l\} \wedge (\tilde{a} \le \alpha) \Big].
\end{aligned}$$

Using φ₁, we construct a formula Φ(d, s, t, ε), with free variables d ∈ M and ε, such that Φ(d, s, t, ε) is true iff:

\[
\varepsilon \;=\; \sup_{k \in C(d)} \; \sup_{x_1 \in D_1(s)} \; \inf_{y_1 \in D_1(t)} \; \sup_{b \in \Gamma_2(t)} \; \inf_{x_2 \in D_2(s)} \bigl( E_s^{x_1, x_2}(k) - E_t^{y_1, b}(k) \bigr).
\]

The formula Φ is defined as follows:

\[
\Bigl( \forall k \in C(d).\; \forall x_1 \in D_1(s).\; \forall \alpha.\; \varphi_1(d, s, t, k, x_1, \alpha) \rightarrow (\alpha \leq \varepsilon) \Bigr) \;\wedge\; \Bigl( \exists k \in C(d).\; \exists x_1 \in D_1(s).\; \exists \alpha.\; \varphi_1(d, s, t, k, x_1, \alpha) \wedge (\alpha = \varepsilon) \Bigr). \tag{5.4}
\]

Finally, given a rational r, we can check whether [s ≼₁ t] ≤ r by checking whether the following sentence is true:

\[
\exists d \in M.\; \exists a \in M.\; \Bigl[ \Bigl( \bigwedge_{u, v \in S} \Phi(d, u, v, a(u, v)) \wedge (d(u, v) = a(u, v)) \Bigr) \wedge (d(s, t) \leq r) \Bigr]. \tag{5.5}
\]

The above sentence is true iff, in the least fixpoint, d(s, t) is bounded by r. As in the case of turn-based games and MDPs, given a rational ε > 0, using binary search and O(log(1/ε)) calls to a decision procedure that checks the sentence (5.5), we can compute an interval [l, u], with u − l ≤ ε, such that [s ≼₁ t] ∈ [l, u].
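The binary search itself is routine; a minimal Python sketch, with the theory-of-reals decision procedure abstracted as a Boolean oracle `decide_le` (a hypothetical stand-in of our own naming, not part of the paper), is:

```python
def approximate_distance(decide_le, eps):
    """Approximate the metric distance [s <=_1 t] within eps.

    decide_le(r) abstracts the decision procedure of this section: it
    returns True iff the sentence (5.5) holds, i.e. iff the distance is
    at most r.  Binary search on [0, 1] needs O(log(1/eps)) oracle calls.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if decide_le(mid):
            hi = mid          # distance <= mid: tighten the upper bound
        else:
            lo = mid          # distance > mid: tighten the lower bound
    return lo, hi             # [lo, hi] contains the distance, hi - lo <= eps

# Toy oracle: a made-up "true" distance of 0.3 stands in for the real decision procedure.
l, u = approximate_distance(lambda r: 0.3 <= r, 1e-3)
assert l <= 0.3 <= u and u - l <= 1e-3
```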

Complexity. Note that Φ is of the ∀∃∀ form, because φ₁ is of the ∃∀ form and φ₁ appears in a negative position in Φ. The formula Φ has (|S| + |Γ₁(s)| + 3) universally quantified variables, followed by (|S| + |Γ₁(s)| + 3 + 2(|Γ₁(t)| + |Γ₂(s)||Γ₂(t)| + |Γ₂(t)| + 2)) existentially quantified variables, followed by 2(|Γ₁(t)| + |Γ₂(s)||Γ₂(t)| + |Γ₂(t)| + 1) universally quantified variables. The sentence (5.5) introduces |S|² + |S|² existentially quantified variables ahead of Φ. The matrix of the formula is of length at most quadratic in the size of the game, and the maximum degree of any polynomial in the formula is 3. We define the size of a game G as |G| = |S| + T, where T = Σ_{s,t ∈ S} Σ_{a,b ∈ Moves} |δ(s, a, b)(t)|. Using the complexity of deciding a formula in the theory of real closed fields [3], which states that a formula with i quantifier blocks, where block i has l_i variables, and with p polynomials, can be decided in time O(p^{O(Π_i (l_i + 1))}), we get the following result.

Theorem 5.1 (Decision complexity for exact distance). For all concurrent game structures G, given a rational r and two states s and t, whether [s ≼₁ t] ≤ r can be decided in time O(|G|^{O(|G|^5)}).

Approximation. Given a rational ε > 0, using binary search and O(log(1/ε)) calls to check the sentence (5.5), we can obtain an interval [l, u] with u − l ≤ ε such that [s ≼₁ t] lies in the interval [l, u].

Corollary 3 (Approximation for exact distance). For all concurrent game structures G, given a rational ε > 0 and two states s and t, an interval [l, u] with u − l ≤ ε such that [s ≼₁ t] ∈ [l, u] can be computed in time O(log(1/ε) · |G|^{O(|G|^5)}).

In contrast, the formula to check whether [s ≼₁ t] ≤ r, for a rational r, as implied directly by the definition of H₁(d)(s, t) and without using Lemma 3, has five quantifier alternations due to the inner sup; combined with the 2|S|² existentially quantified variables in the sentence (5.5), this yields a decision complexity of O(|G|^{O(|G|^7)}).

5.3. Computing the kernels. Similar to the case of turn-based games and MDPs, the kernel of the simulation metric ≼₁ for concurrent games can be computed as the limit of the series ≼₁⁰, ≼₁¹, ≼₁², … of relations. For all s, t ∈ S, we have (s, t) ∈ ≼₁⁰ iff s ≡ t. For all n ≥ 0, we have (s, t) ∈ ≼₁ⁿ⁺¹ iff the following sentence φ_s is true:

\[
\forall a.\; \Phi(d_n, s, t, a) \rightarrow a = 0,
\]

where Φ is defined as in (5.4) and, at step n of the iteration, the distance between any pair of states u, v ∈ S is defined as follows:

\[
\forall u, v \in S.\; d_n(u, v) = \begin{cases} 0 & \text{if } (u, v) \in \;\preceq_1^n \\ 1 & \text{if } (u, v) \notin \;\preceq_1^n. \end{cases}
\]

To compute the bisimulation kernel, we again proceed by partition refinement, computing a sequence of partitions Q₀, Q₁, …, where s, t ∈ V for some V ∈ Qₙ implies (s, t) ∈ ≅₁ⁿ. We have (s, t) ∈ ≅₁ⁿ⁺¹ iff the following sentence φ_b is true for both state pairs (s, t) and (t, s):

\[
\forall a.\; \Phi(d_n, s, t, a) \rightarrow a = 0,
\]

where Φ is again as defined in (5.4) and, at step n of the iteration, the distance between any pair of states u, v ∈ S is defined as follows:

\[
\forall u, v \in S.\; d_n(u, v) = \begin{cases} 0 & \text{if } (u, v) \in \;\cong_1^n \\ 1 & \text{if } (u, v) \notin \;\cong_1^n. \end{cases}
\]


Complexity. In the worst case, we need O(|S|²) partition refinement steps to compute both the simulation and the bisimulation relation. At each partition refinement step, the number of state pairs we consider is bounded by O(|S|²). We can check whether φ_s and φ_b are true using a decision procedure for the theory of real closed fields. Therefore, we need O(|S|⁴) decision-procedure calls to compute the kernels. The partitioning of states based on these decisions can be done by any of the partition refinement algorithms, such as [25].
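The refinement loop itself is simple once the decision procedure is treated as a black box. The following Python sketch is ours, not the paper's pseudocode; the oracle `decide_zero` is a hypothetical stand-in for checking the sentence φ_s against the 0/1 distance dₙ encoded by the current relation:

```python
def simulation_kernel(states, decide_zero):
    """Iterate the kernel computation: repeatedly discard pairs rejected
    by the decision procedure until a fixpoint is reached.

    decide_zero(rel, s, t) abstracts the real-closed-field decision
    procedure: given the current relation rel (encoding the 0/1 distance
    d_n), it returns True iff "forall a. Phi(d_n, s, t, a) -> a = 0" holds.
    """
    # Start from an over-approximation of the initial relation <=_1^0.
    rel = {(s, t) for s in states for t in states}
    while True:
        new_rel = {(s, t) for (s, t) in rel if decide_zero(rel, s, t)}
        if new_rel == rel:
            return rel        # fixpoint reached: this is the kernel
        rel = new_rel

# Toy oracle standing in for the decision procedure, purely for illustration.
kernel = simulation_kernel([0, 1, 2], lambda rel, s, t: s <= t)
```

The loop makes at most O(|S|²) passes, and each pass queries at most O(|S|²) pairs, matching the O(|S|⁴) bound on decision-procedure calls above.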

Theorem 5.2. For all concurrent game structures G and states s and t, whether s ≼₁ t holds can be decided in O(|G|^{O(|G|^3)}) time, and whether s ≅_g t holds can be decided in O(|G|^{O(|G|^3)}) time.

6. Conclusion: Possible Applications and Open Problems

We have shown theoretical applications of game metrics with respect to discounted and long-run average values of games. An interesting question regarding game metrics is related to their usefulness in real-world applications. We now discuss possible applications of game metrics.

State space reduction. The kernels of the metrics are the simulation and bisimulation relations. These relations have been well studied in the context of transition systems, with applications in program analysis and verification. For example, in [19] the authors show that bisimulation-based state space reduction is practical and may result in an enormous reduction in model size, speeding up model checking of probabilistic systems.

Security. Bisimulation plays a critical role in the formal analysis of security protocols. If two instances of a protocol, parameterized by a message m, are bisimilar for messages m and m′, then the messages remain secret [7]. The authors use bisimulation in probabilistic transition systems to analyze probabilistic anonymity in security protocols.

Computational Biology. In the emerging area of computational systems biology, the authors of [30] use the metrics defined in the context of probabilistic systems [12, 32, 33] to compare reduced models of Stochastic Reaction Networks. These reaction networks are used to study intra-cellular behavior in computational systems biology. The reduced models are Continuous Time Markov Chains (CTMCs), and the comparison of different reduced models is via the metric distance between their initial states. A central question in the study of intra-cellular behavior is estimating the sizes of populations of various species that cohabit cells. The dynamics in this context is modeled as a stochastic process representing the temporal evolution of the species populations, given by a family (X(t))_{t≥0} of random vectors. For 0 ≤ i < N, N being the number of different species, X_i(t) is the population of species S_i at time t. In [26], the authors show how CTMCs that model system dynamics can be reduced to Discrete Time Markov Chains (DTMCs) using a technique called uniformization, or discrete-time conversion. The DTMCs are stochastically identical to the CTMCs and enable more efficient estimation of species populations. An assumption made in these studies is that systems are spatially homogeneous and thermally equilibrated; the molecules are well stirred in a fixed volume at a constant temperature. These assumptions enable the reduction of these systems to CTMCs, and to DTMCs in some cases.
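The uniformization step mentioned above is a standard construction (sketched here from the textbook definition, not taken from [26] directly): a CTMC with generator matrix Q is turned into a stochastically equivalent DTMC with transition matrix P = I + Q/Λ, for any uniformization rate Λ ≥ max_i |Q_ii|. A minimal Python illustration, with a made-up 2-state generator:

```python
# Uniformization: convert a CTMC generator Q into a DTMC transition
# matrix P = I + Q / lam, where lam >= max_i |Q[i][i]|.
# The generator below is a toy example, chosen only for illustration.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]
n = len(Q)
lam = max(abs(Q[i][i]) for i in range(n))   # uniformization rate
P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
     for i in range(n)]

# Each row of P is a probability distribution: entries are non-negative
# and sum to one, because row sums of a generator are zero.
for row in P:
    assert all(p >= 0 for p in row)
    assert abs(sum(row) - 1.0) < 1e-12
```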


In the applications we have discussed, non-determinism is modeled probabilistically. In applications where non-determinism needs to be interpreted demonically, rather than probabilistically, MDPs or turn-based games would be the appropriate framework for analysis. If the interaction between various sources of non-determinism needs to be modeled simultaneously, then concurrent games would be the appropriate framework. For the analysis of these general models, our results and algorithms will be useful.

Open Problems. While we have shown polynomial-time algorithms for the kernels of the simulation and bisimulation metrics for MDPs and turn-based games, the existence of a polynomial-time algorithm for the kernels of both metrics for concurrent games is an open problem. The existence of a polynomial-time algorithm to approximate the exact metric distance for turn-based games and MDPs is an open problem, as is the existence of a PSPACE algorithm for the decision problem of the exact metric distance in concurrent games.

Acknowledgments. The first, second, and fourth authors were supported in part by the National Science Foundation grants CNS-0720884 and CCR-0132780. The third author was supported in part by the National Science Foundation grants CCF-0427202 and CCF-0546170. We would like to thank the reviewers for their detailed comments, which helped us improve the paper.

    References

[1] R. Alur, T.A. Henzinger, and O. Kupferman. Alternating-time temporal logic. J. ACM, 49:672–713, 2002.
[2] C. Baier. Polynomial time algorithms for testing probabilistic bisimulation and simulation. In CAV, pages 50–61. Springer-Verlag, 1996.
[3] S. Basu. New results on quantifier elimination over real closed fields and applications to constraint databases. J. ACM, 46(4):537–555, 1999.
[4] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995. Volumes I and II.
[5] M.C. Browne, E.M. Clarke, and O. Grumberg. Characterizing finite Kripke structures in propositional temporal logic. Theoretical Computer Science, 59:115–131, 1988.
[6] J.F. Canny. Some algebraic and geometric computations in PSPACE. In STOC, pages 460–467. ACM Press, 1988.
[7] K. Chatzikokolakis, G. Norman, and D. Parker. Bisimulation for demonic schedulers. In L. de Alfaro, editor, 12th International Conference on Foundations of Software Science and Computation Structures (FOSSACS 09), volume 5504 of LNCS, pages 318–332. Springer, 2009.
[8] L. de Alfaro, T.A. Henzinger, and O. Kupferman. Concurrent reachability games. Theoretical Computer Science, 386(3):188–217, 2007.
[9] L. de Alfaro, T.A. Henzinger, and R. Majumdar. Discounting the future in systems theory. In Proc. 30th Int. Colloq. Aut. Lang. Prog., volume 2719 of Lect. Notes in Comp. Sci., pages 1022–1037. Springer-Verlag, 2003.
[10] L. de Alfaro and R. Majumdar. Quantitative solution of omega-regular games. Journal of Computer and System Sciences, 68:374–397, 2004.
[11] L. de Alfaro, R. Majumdar, V. Raman, and M. Stoelinga. Game relations and metrics. In LICS, pages 99–108. IEEE Computer Society Press, 2007.
[12] J. Desharnais, V. Gupta, R. Jagadeesan, and P. Panangaden. Metrics for labelled Markov systems. In CONCUR, pages 258–273. Springer-Verlag, 1999.
[13] J. Desharnais, V. Gupta, R. Jagadeesan, and P. Panangaden. Approximating labelled Markov processes. Information and Computation, 2002.
[14] J. Desharnais, V. Gupta, R. Jagadeesan, and P. Panangaden. The metric analogue of weak bisimulation for probabilistic processes. In LICS, pages 413–422. ACM Press, 2002.
[15] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games. In ICALP (2), pages 324–335. Springer-Verlag, 2006.
[16] K. Etessami and M. Yannakakis. On the complexity of Nash equilibria and other fixed points (extended abstract). In FOCS, pages 113–123. IEEE Computer Society Press, 2007.
[17] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[18] M.R. Garey, R.L. Graham, and D.S. Johnson. Some NP-complete geometric problems. In STOC, pages 10–22. ACM Press, 1976.
[19] J.-P. Katoen, T. Kemna, I.S. Zapreev, and D.N. Jansen. Bisimulation minimisation mostly speeds up probabilistic model checking. In TACAS, pages 87–101, 2007.
[20] D. Kozen. A probabilistic PDL. In Proc. 15th ACM Symp. Theory of Comp., pages 291–297, 1983.
[21] D.A. Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998.
[22] A. McIver and C. Morgan. Abstraction, Refinement, and Proof for Probabilistic Systems. Monographs in Computer Science. Springer-Verlag, 2004.
[23] J.F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10:53–66, 1981.
[24] R. Milner. Operational and algebraic semantics of concurrent processes. In Handbook of Theoretical Computer Science, volume B, pages 1202–1242. Elsevier Science Publishers, 1990.
[25] R. Paige and R.E. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 16(6):973–989, 1987.
[26] W. Sandmann and V. Wolf. Computational probability for systems biology. In FMSB, pages 33–47, 2008.
[27] R. Segala and N.A. Lynch. Probabilistic simulations for probabilistic processes. In CONCUR, pages 481–496. Springer-Verlag, 1994.
[28] L.S. Shapley. Stochastic games. Proc. Nat. Acad. Sci. USA, 39:1095–1100, 1953.
[29] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8:171–176, 1958.
[30] D. Thorsley and E. Klavins. A theory of approximation for stochastic biochemical processes. To appear in B. Ingalls and P. Iglesias, eds., Control-Theoretic Approaches to Systems Biology. MIT Press, 2009.
[31] F. van Breugel, B. Sharma, and J. Worrell. Approximating a behavioural pseudometric without discount for probabilistic systems. CoRR, abs/0803.3796, 2008.
[32] F. van Breugel and J. Worrell. An algorithm for quantitative verification of probabilistic transition systems. In CONCUR, pages 336–350. Springer-Verlag, 2001.
[33] F. van Breugel and J. Worrell. Towards quantitative verification of probabilistic transition systems. In ICALP, pages 421–432. Springer-Verlag, 2001.
[34] Y. Ye. Improved complexity results on solving real-number linear feasibility problems. Math. Program., 106(2):339–363, 2006.
[35] L. Zhang and H. Hermanns. Deciding simulations on probabilistic automata. In ATVA, pages 207–222. Springer-Verlag, 2007.