


0018-9286 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2016.2518639, IEEE Transactions on Automatic Control


Synthesis of Joint Control and Active Sensing Strategies under Temporal Logic Constraints

Jie Fu, Member, IEEE, and Ufuk Topcu, Member, IEEE

Abstract—This paper proposes an approach to control of discrete systems with incomplete information and sensing capabilities, with respect to temporal logic constraints. The approach introduces active sensing to alleviate computational effort in control design for systems interacting with uncontrollable environments under incomplete information. Particularly, it transforms a deterministic controller under complete information into a randomized, observation-based controller. Interleaving the latter with strategic queries to sensors, the temporal logic specification is proven to be satisfied almost surely. The effectiveness of the method is demonstrated with robotic motion planning examples.

Index Terms—Formal methods, control, temporal logic, active sensing, computational game theory.

I. INTRODUCTION

Interactions between a system and its dynamic, uncontrolled environment can be captured as a two-player zero-sum game. The winning strategy for the system with respect to a given temporal logic specification provides a reactive controller that guarantees correctness of the controlled system against all allowable behaviors of its environment. Synthesis of such controllers has been successfully applied to hardware design [1], control of autonomous robotic systems [2], [3], and vehicle management systems [4]. Moreover, software tool-sets [5]–[7] have been developed to facilitate the control design process.

A common yet often unjustified assumption in much of the present work on reactive synthesis is the availability of complete and precise information (about the system and environment states) during the execution of controllers. However, such an assumption is often impractical. Accordingly, control design methods have been developed for systems with incomplete information. Depending on the modeling of the environment, the interaction between the system and its environment gives rise to a partially observable Markov decision process (POMDP) for purely stochastic environments, and a two-player zero-sum game with partial information for non-deterministic environments. For a POMDP subject to temporal logic constraints, a finite-memory policy can be computed with the method in [8] to ensure the objective is satisfied with probability one for a given initial state, whenever such a policy exists. Two-player zero-sum games with partial information cannot be solved by POMDP solvers, due to the difference in the modeling

J. Fu is with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA, 01604. E-mail: [email protected]

U. Topcu is with the Department of Aerospace Engineering and Engineering Mechanics at The University of Texas at Austin, Austin, TX, 78712. E-mail: [email protected].

Manuscript received December 21, 2014; revised June 15, 2015.

of the environment. For such games, synthesis methods and tools have been developed with respect to two qualitative criteria [9]–[11]: a sure-winning controller ensures the satisfaction of a specification, while an almost-sure-winning controller is a randomized strategy that ensures satisfaction with probability 1. These solutions rely on a subset construction and have complexity exponential in the size of the state space [12]. The computational complexity largely limits the practical utility of these algorithms. In this paper, we study the synthesis problem with partial information from a fresh perspective: since control synthesis is computationally expensive in the case of partial information, is it possible to gather additional information during runtime execution and reduce computational complexity by utilizing such information, with guaranteed correctness with respect to the temporal logic specification?

More specifically, we introduce the notion of active sensing in reactive synthesis under temporal logic constraints in discrete finite-state transition systems. Discrete finite-state transition systems often arise as abstractions of complex continuous systems. Various abstraction methods have been developed in the past, such as state-space discretization [13], counterexample-guided methods [14], and domain-specific abstraction methods for linear systems [15], hybrid systems [16], [17], and robotic systems with pre-defined atomic controllers [18]. In this paper, we focus on the problem of synthesizing control and active sensing policies at the discrete level of abstraction.

Active sensing means that a system intentionally queries its sensors to reduce the ambiguity in its belief (a set in which the system believes the current state lies, or a probability distribution over states) in order to accomplish its task. In general, there are two different approaches to active sensing. One approach [19], [20] studies the problem of optimal information gathering and motion planning in POMDPs with an individual robot or a cooperative team of robots. The problem of synthesizing an active sensing strategy in a POMDP is transformed into a reinforcement learning problem, in which information gain is maximized and the accumulated cost of actions is minimized. The second approach, which is the one considered in this paper, is defined by introducing a set of sensing actions, also known as knowledge-producing actions [21]. These actions, when applied, reveal the truth values of certain propositional logic formulas. A system decides which sensing actions to apply at a given instance of execution, and uses the obtained information toward achieving its goal. The inclusion of sensing actions leads to two critical questions. The first is on control synthesis: given a system with partial observation, a set of sensing actions, and a temporal logic specification, how to design


an active sensing strategy and a control strategy such that the system satisfies the specification? The second question is on computational complexity: is it possible to reduce the computational complexity of synthesizing a provably correct controller with partial information by utilizing active sensing?

By explicitly introducing sensing actions into reactive synthesis, we have to not only consider the effect of control inputs but also prepare for all possible situations that may be revealed by applying such sensing actions. Thus, an offline computation of a strategy with partial information and sensing actions is at least as computationally heavy as synthesis with partial information itself. Recent studies [22]–[25] propose online planning with partial information and sensing actions as a way to overcome such complexity, since the system only needs to compute a strategy for a finite number of steps and replan with new information obtained through sensing actions. However, these works consider only traditional reach-avoid path planning objectives, which constitute a proper subset of the objectives expressible in temporal logic. Furthermore, these planners cannot provide correctness guarantees with respect to temporal logic constraints in the presence of dynamic environments. In this paper, we make the first effort that successfully incorporates active sensing into reactive synthesis for designing provably correct controllers with respect to temporal logic constraints.

We propose an online planning algorithm characterized by switching between two phases: exploitation and exploration with sensing. In the exploitation phase, the system takes control actions in order to satisfy its specification based on its current belief. During the exploration phase, the system uses sensing actions to reduce the ambiguity in its current belief. The correctness of the online control strategy is ensured by two key elements. The first element is a transformation from a sure-winning strategy in a game with complete information to an observation-based randomized exploitation strategy. The transformation is computationally efficient given that it does not involve a subset construction. However, this exploitation strategy may not be defined for all beliefs the system might encounter; when the system runs into a belief for which no exploitative action is applicable, the system applies a sequence of sensing actions to refine its current belief until it finds itself in a belief for which the exploitation strategy is defined. In this way, we make a trade-off between the amount of offline computation and run-time sensing effort. Note that the notions of exploration and exploitation in this context are different from those in [26], [27], in which the system has complete information (e.g., sensory information) yet incomplete knowledge of its stochastic environment (e.g., a model of the environment). Thus, in [26], [27], a planner explores to learn unknown dynamics rather than to acquire new information.

In addition to alleviating the expense of offline computation, the notion of sensing actions provides a systematic way of sensor design for systems under temporal logic constraints. Through active sensing, the system can select what and when to query, and what information to overlook for the time being. This type of local modification of the sensor configuration was not possible in the sensor frameworks examined in the past [12], [28], and is of great interest because it can potentially reduce the usage of sensors and the communication of information.

The rest of the paper is organized as follows. We begin with an informal problem statement and an overview of the solution approach. In section III, we discuss some preliminaries and give a formal problem statement. The main result is presented in section IV. We first present synthesis algorithms for an exploitation strategy and an active sensing strategy, and then prove that the given temporal logic specification can be satisfied with probability 1 by interleaving these two strategies. Section V presents a synthesis method for an online sensing strategy that trades the optimality of an active sensing strategy for computational efficiency. We demonstrate the solution with two case studies in the context of robotic motion planning and distributed sensors in section VI. Section VII concludes.

II. PROBLEM STATEMENT AND APPROACH OVERVIEW

We study the following problem: given a temporal logic specification, a partially observable environment, and a set of sensing actions that acquire additional information when applied, construct a controller with the available control inputs and sensing actions such that the controlled system satisfies the specification. We illustrate an overview of our approach through a running example, the so-called "Wumpus game" on a 4 × 4 gridworld [29].

Example 1. Consider the gridworld in Figure 1, in which a robot is to travel back and forth between the gold mine at the bottom-right corner and its home at the top-left corner infinitely often, while avoiding the Wumpus (a formal task specification is given in section III-B). The robot can move in any of the four compass directions (north (N), south (S), east (E) and west (W)), by one cell at a time. If it hits the boundary, it stays in its current cell. A monster, called the Wumpus, can move along the cells on the diagonal, but each time it can only move to one of its adjacent, empty cells along the diagonal. The robot is captured by the Wumpus if it runs into a cell currently occupied by the Wumpus, yet the Wumpus cannot walk into a cell already occupied by the robot. During execution, the robot does not observe the location of the Wumpus. The Wumpus emits stenches that drift to its adjacent cells, indicated by the waves in Figure 1. Suppose the robot is equipped with a sensor that can detect whether there is a stench in any of its adjacent cells, and each time the sensor can be applied to only one such cell. Given the cost of sensing, such as time and energy, it is not preferable to apply this sensor to all adjacent cells all the time. We therefore want to design a strategy that decides, at each step, whether it is necessary to apply the sensor, and if so, which cell (or cells) to sense, and which action to take, such that the robot accomplishes its task.

The solution we propose is a control policy that alternates exploitation with control actions and exploration with sensing actions. The interaction between a system and its environment is captured as a two-player, turn-based game in which the system is player 1 and its environment is player


Fig. 1. The robot and the Wumpus (the green monster). The robot has actions N, W, E, S for moving in the four compass directions. The Wumpus moves only along the diagonal, with actions U and D for moving to its northeast cell and to its southwest cell, respectively. (Images are taken from ©http://blog.athico.com/2012/01/wumpus-world.html)

2. Although the game state is only partially observed during control execution, we can still compute a sure-winning, deterministic strategy based on complete knowledge of the game dynamics. At runtime, with partial information, the system continuously infers a set of states it can be in, referred to as its belief. We then transform this sure-winning deterministic strategy into a belief-based (which can in turn be transformed into an observation-based) randomized exploitation strategy. The transformation ensures that, given the system's current belief, for any action that can be taken with non-zero probability according to the exploitation strategy, the system is guaranteed to reach some state from which it is still able to satisfy the specification. Moreover, for the actual state of the system, there is a non-zero probability of choosing the action indicated by the sure-winning strategy, even though the system may not know the actual state due to partial information. In Example 1, suppose the robot is at cell (2, 0). Even if it does not know exactly in which cell the Wumpus is, it can still perform either action N or W, because both actions bring it closer to home.

At runtime, the system applies the belief-based exploitation strategy as long as this strategy is defined for the current belief. However, the system may run into a belief for which the exploitation strategy is undefined. For example, if the robot, while in cell (1, 0), hypothesizes that the Wumpus can be in any of the cells along the diagonal, then it cannot make any further move without risking running into the Wumpus. In this case, the robot explores with a finite sequence of sensing actions according to an active sensing strategy in order to decide the next control action. With the acquired information, the system revises its current belief until it finds itself in a belief for which the exploitation strategy is defined. In the Wumpus game, if the robot detects whether there is a stench in cell (1, 0), then it has information to decide whether the Wumpus is in the set of cells {(0, 0), (1, 1)} or the set of cells {(2, 2), (3, 3)}. Once the sensing action is applied, if the Wumpus is in one of the cells of {(2, 2), (3, 3)}, the robot can apply either action 'N' or 'W' and reach home in 3 steps without any further detection of stenches.

Under a certain condition (discussed in Section IV-C), which basically requires the set of sensing actions to be sufficient for avoiding dead-ends during online planning, by alternating the exploitation strategy and the sensing strategy the system is ensured to achieve the goal almost surely, i.e., with probability 1. We now continue with some preliminary material needed for a formal problem statement and our solution.

III. PRELIMINARIES

A probability distribution on a finite set S is a function D : S → [0, 1] such that ∑_{s∈S} D(s) = 1. The support of D is the set Supp(D) = {s ∈ S | D(s) > 0}. The set of probability distributions on a finite set S is denoted D(S). Let Σ be a finite alphabet. Σ∗, Σ+ and Σω are the sets of strings over Σ of finite length, of length greater than or equal to one, and of infinite length, respectively. Given u and v in Σ∗, uv is the concatenation of u with v. A string u ∈ Σ∗ is a prefix of w ∈ Σ∗ (or w ∈ Σω) if there exists v ∈ Σ∗ (or v ∈ Σω) such that w = uv. Given a tuple s = (s1, s2, . . . , sn), let πi : s ↦ si be the projection function that maps s to its i-th entry.

A. Reactive systems with partial information

The interaction between a system and its environment is formulated as a game arena, and later the temporal logic specification is introduced to define the winning condition that completes the game. We assume that the system and its environment do not perform concurrent actions, and thus the game is turn-based. The following definition is derived from the notion of game arena in [30], augmented with a set of atomic propositions and a labeling function.

Definition 1. A deterministic game arena capturing the interactions of a system (player 1) and its dynamic environment (player 2) is

G = 〈Q, Σ, T, q0, AP, L〉

where the components are defined as follows.
• Q = Q1 ∪ Q2 is the set of states. At each state in Q1, the system takes an action. At each state in Q2, the environment takes an action.
• Σ = Σ1 ∪ Σ2 is the set of actions. Σ1 is the set of actions for the system, and Σ2 is the set of actions for the environment.
• T : Q × Σ → Q is the deterministic transition function. For each state q ∈ Qi, if T(q, σ) = q′, then σ ∈ Σi and q′ ∈ Qj, for (i, j) ∈ {(1, 2), (2, 1)}.
• q0 ∈ Q is the initial state.
• AP is a set of atomic propositions.
• L : Q → 2^AP is the labeling function.
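For concreteness, the following is a minimal Python sketch of a dictionary-based encoding of such an arena. It is an illustrative data structure under our own naming (GameArena, trans, label, and so on), not an implementation from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameArena:
    """Deterministic two-player turn-based arena G = <Q, Sigma, T, q0, AP, L>."""
    q1: frozenset        # player-1 (system) states, Q1
    q2: frozenset        # player-2 (environment) states, Q2
    sigma1: frozenset    # system actions, Sigma1
    sigma2: frozenset    # environment actions, Sigma2
    trans: dict          # (state, action) -> unique successor state (T)
    q0: object           # initial state
    label: dict          # state -> set of atomic propositions (L)

    def step(self, q, a):
        """Deterministic transition T(q, a)."""
        return self.trans[(q, a)]

    def enabled(self, q):
        """Actions enabled at q, restricted to the player who owns q."""
        acts = self.sigma1 if q in self.q1 else self.sigma2
        return [a for a in acts if (q, a) in self.trans]
```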

The definition of a game arena is similar to that of a game transition system [26] and a reactive system [28]. A state q ∈ Q is related to a propositional logic formula over a given set P of predicates. That is, each state q ∈ Q is associated with a truth assignment to each predicate p ∈ P. This association is captured by the interpretation function Π such that for any q ∈ Q and any predicate p ∈ P, Π(q)(p) ∈ {true, false}. We write Π(q) = ∧_{p∈P} ℓ_p, where ℓ_p = p if Π(q)(p) = true and ℓ_p = ¬p if Π(q)(p) = false, and where ∧ and ¬ are the logical connectives for conjunction and negation, respectively. It is the case that AP ⊆ P because not all state information is of concern with respect to the control objective.


Example 1 (cont.). In Example 1, the set P of predicates is (⋃_{x∈D1} xr = x) ∪ (⋃_{y∈D2} yr = y) ∪ (⋃_{x∈D1} xw = x) ∪ (⋃_{y∈D2} yw = y) ∪ (⋃_{x∈D1, y∈D2} Stench(x, y)) ∪ {t}, where (xr, yr), (xw, yw) are the positions of the robot and the Wumpus, respectively; t is a Boolean turn variable: when t = 1 the robot makes a move, otherwise t = 0 and the Wumpus makes a move; D1 is the range of the variables xr, xw and D2 is the range of the variables yr, yw. Stench(x, y) indicates that there is a stench at position (x, y). Since the assignments for xw and yw uniquely determine the value of the predicate Stench(x, y) for any x ∈ D1 and y ∈ D2, for simplicity we write q = ((xr, yr), (xw, yw), t) to denote a state.

Observations: The system has partial observation of the states in the set Q. Following [9], this partial observation of a state in Q is defined by an equivalence relation over the set of states, denoted R ⊆ Q × Q. Two states q and q′ are observation-equivalent, that is, (q, q′) ∈ R, if both q and q′ provide the same state information observable by the system. We denote the observations of states for the system by O ⊆ 2^Q, which is defined by the observation-equivalence classes in Q. Clearly, O is a partition of the state space Q. Moreover, we assume the system always knows whose turn it is to play, i.e., it observes the value of a Boolean turn variable which, when 1, indicates the system makes a move, and otherwise the environment makes a move.

We assume that the system cannot observe which action the environment performs and define an observation function Obs : Q ∪ Σ1 ∪ Σ2 → O ∪ Σ1 ∪ {λ} that satisfies the following properties: i) for every q, q′ ∈ Q, q′ ∈ Obs(q) if and only if q and q′ are observation-equivalent in the game arena G; ii) for any σ ∈ Σ1, Obs(σ) = σ; iii) for any σ ∈ Σ2, Obs(σ) = λ, the empty string, because the system cannot observe the action of its environment.

A run in G is a finite sequence ρ = q0q1 . . . qn ∈ Q∗ of states or an infinite sequence ρ = q0q1 . . . ∈ Qω such that q0 is the initial state and, for all i ≥ 0, there exists ai ∈ Σ such that T(qi, ai) = qi+1. If ρ is finite, the last element of ρ is a state, denoted Last(ρ). The observation sequence of a run ρ is the sequence Obs(ρ) = Obs(q0)Obs(q1) . . .. The set of prefixes of all runs in the game arena G is denoted Pref(G).

Example 1 (cont.). We assume that the robot always knows its own position and the value of the turn variable, i.e., the assignments for the variables xr, yr and t. Thus, for example, the set of states observation-equivalent to a state ((3, 0), (1, 1), 1) includes all states in which the robot is at cell (3, 0) and t = 1, while the Wumpus can be at any cell along the diagonal. That is, Obs((3, 0), (1, 1), 1) = {((3, 0), (xw, yw), 1) | xw, yw ∈ {0, 1, 2, 3}, xw = yw}.

a) Strategy: For both players 1 and 2, a deterministic strategy for player i is a function fi : Pref(G) → Σi and a randomized strategy is a function fi : Pref(G) → D(Σi). We say that player i follows strategy fi if, for any run ρ ∈ Pref(G) at which fi is defined, player i takes the action fi(ρ) if fi is deterministic, or an action σ ∈ Supp(fi(ρ)) with probability fi(ρ)(σ) if fi is randomized. Note that in the case when the deterministic strategy outputs sets of actions, i.e., fi : Pref(G) → 2^Σi, for a run ρ ∈ Pref(G), player i takes one action from fi(ρ) non-deterministically. Since the system has partial observation of the states, it can only apply an observation-based strategy f1, in the sense that for any two prefixes ρ and ρ′ ∈ Pref(G), if Obs(ρ) = Obs(ρ′), then f1(ρ) = f1(ρ′).

We introduce a set Γ of sensing actions for the system.

Definition 2. Consider the set P of predicates and the set Γ of sensing actions. For each sensing action a ∈ Γ, its effect is captured by the notation KWhether(φ, a), read as "know whether φ is true after taking a." It is an abbreviation for a formula indicating that, after applying the sensing action a, the truth value of a propositional logic formula φ over P is known. That is, KWhether(φ, a) evaluates to true if φ is true, and evaluates to false otherwise. KWhether(φ, a) implies KWhether(ϕ, a) provided that φ implies ϕ. To capture both global and local sensing capabilities, for a given state q, we denote by Γq ⊆ Γ the set of sensing actions enabled at q.

The notion of a sensing action is based on the notion of knowledge-producing actions in [21]. Similarly to [21], we make the following no-side-effect assumption on sensing actions.

Assumption 1. A sensing action will not change the value of state variables in the game.

The assumption is not restrictive because, in general, a control action can have both a physical effect (changing the state) and an epistemic effect (revealing some information about the state). Thus, if an action introduces both physical and epistemic changes, we simply consider it an ordinary control action and include it in Σ1. We call an action in Γ sensing to emphasize that it provides information about the current state, and an action in Σ physical to emphasize that it changes the state of the game; Γ ∩ Σ = ∅. We assume that at each turn of the system it can either choose a physical action, or several sensing actions followed by a physical action.

Example 1 (cont.). A sensing action Smell(x, y) represents that the robot detects whether there is a stench at cell (x, y) and, if there is, the proposition Stench(x, y) becomes true. The set of sensing actions for the robot is {Smell(x, y) | x, y ∈ {0, 1, 2, 3}}. Given the fact that the robot can only detect stenches nearby, that is, when the robot is at (xr, yr), Smell(x, y) is enabled if and only if |x − xr| ≤ 1 and |y − yr| ≤ 1. The effect of the sensing action Smell(x, y) is simply KWhether(Stench(x, y), Smell(x, y)).
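A sensing action can be sketched in Python as a named predicate test together with an enabledness condition. The encoding below is illustrative; in particular, the 4-adjacency stench semantics inside smell is our own assumption about how stenches drift, not a detail fixed by the paper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SensingAction:
    """A sensing action a with effect KWhether(phi, a): applying it reveals
    whether phi holds at the hidden current state, without changing it."""
    name: str
    phi: Callable[[object], bool]         # evaluates the sensed formula
    enabled_at: Callable[[object], bool]  # local sensing capability

def smell(x, y):
    """Smell(x, y) for the Wumpus example; states are ((xr,yr),(xw,yw),t)."""
    def stench(q):
        (_, _), (xw, yw), _ = q
        return abs(x - xw) + abs(y - yw) == 1   # assumed 4-adjacency drift
    def enabled(q):
        (xr, yr), _, _ = q
        return abs(x - xr) <= 1 and abs(y - yr) <= 1
    return SensingAction(f"Smell({x},{y})", stench, enabled)
```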

B. Specification language

A deterministic Büchi automaton (DBA) is a tuple As = 〈S, 2^AP, Ts, Is, Fs〉 where S is a finite state set, 2^AP is the alphabet with AP a set of atomic propositions, Is ∈ S is the initial state, and Ts : S × 2^AP → S is the transition function. The acceptance condition Fs is a set of states. The run for an infinite word w = w[0]w[1] . . . ∈ (2^AP)^ω is an infinite sequence of states s0s1 . . . ∈ S^ω where s0 = Is and si+1 = Ts(si, w[i]). A run ρ = s0s1 . . . is accepting in As if Inf(ρ) ∩ Fs ≠ ∅, where Inf(ρ) is the set of states that appear


infinitely often in ρ. A word w ∈ (2^AP)^ω is in the language of As if and only if the run ρ for the word w is accepting in As.

A DBA over atomic propositions AP represents a linear temporal logic (LTL) formula ϕ over AP [31]. Formally, the set of LTL formulas over a finite set AP of atomic propositions can be defined inductively as follows:
• any atomic proposition p ∈ AP is an LTL formula;
• if ϕ and ψ are LTL formulas, so are ¬ϕ, ϕ ∧ ψ, ◯ϕ and ϕ U ψ, where ◯ and U are the temporal modal operators for "next" and "until".

Additional temporal logic operators include ♦ (eventually) and □ (always), defined by ♦ϕ := true U ϕ and □ϕ := ¬♦¬ϕ.

It has been shown that an expressive subset of LTL formulas [32] can be encoded by the languages accepted by deterministic Büchi automata (DBAs). This subset of LTL formulas is defined by the grammar ϕ := ψ | ¬ϕ | ϕ ∧ ϕ | ϕ ∨ ϕ, where ψ is a formula in the logic defined by the grammar ψ := p | ψ ∧ ψ | ♦ψ [32]. An example formula that can be encoded into a DBA is □p ∨ ♦(q ∧ ♦r), where p, q, r are atomic propositions. It means that either p is always true, or eventually q and then r (in this order) become true. An example that is not in this subset is ♦(q ∨ □p) ∨ □♦r (either q becomes true eventually, or eventually p holds forever, or r holds infinitely often). In general, this subset of formulas is expressive enough to specify a wide range of desirable system properties including safety, sequencing, liveness, persistence and recurrence. Moreover, the method presented herein applies to all LTL formulas encodable by DBAs, which strictly include the LTL fragment defined in [32].
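As a small concrete illustration (ours, not the paper's), the DBA for the formula ♦p over AP = {p} has two states and can be encoded directly as Python dictionaries, with letters of the alphabet 2^AP represented as frozensets of atomic propositions:

```python
# DBA As = <S, 2^AP, Ts, Is, Fs> for "eventually p".
AP = {"p"}
S = {0, 1}                       # 0: p not yet seen; 1: p has been seen
Is = 0
Fs = {1}                         # accept iff state 1 is visited infinitely often
Ts = {
    (0, frozenset()): 0,
    (0, frozenset({"p"})): 1,
    (1, frozenset()): 1,         # once p has been seen, stay in state 1
    (1, frozenset({"p"})): 1,
}

def dba_run(word):
    """Trace the unique run of the DBA on a finite prefix of a word."""
    s, run = Is, [Is]
    for letter in word:
        s = Ts[(s, frozenset(letter))]
        run.append(s)
    return run

# Example: dba_run([set(), {"p"}, set()]) == [0, 0, 1, 1]
```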

With the labeling function L : Q → 2^AP of the game arena G, we relate the winning condition of the players with the specification DBA As. A run ρ ∈ Q^ω is winning for the system if and only if L(ρ) belongs to the language of As; otherwise, it is winning for the environment. If the system wins, the specification ϕ is satisfied.

Example 1 (cont.). In Example 1, the robot is assigned the following task: every time the gold is collected (resp. the home is reached), eventually the robot has to bring it to its home (resp. visit the gold mine again), and it has to do so infinitely often while avoiding the Wumpus. This task specification can be expressed as the LTL formula

ϕ = □♦(p1 ∧ ♦p2) ∧ □¬p3, (1)

where p1 is (xr, yr) = (3, 0), p2 is (xr, yr) = (0, 3), and p3 is (xr, yr) = (xw, yw). In this case, the set AP of atomic propositions is {p1, p2, p3}.

The control synthesis problem with respect to satisfying the specification (resp. satisfying the specification with probability one) amounts to computing a sure-winning (resp. almost-sure-winning) strategy for the system. We refer the reader to [12] for details on the methods for solving games with temporal logic specifications and partial information. Algorithms for sure-winning strategies and almost-sure-winning strategies are EXPTIME-complete [33], due to a subset construction. This subset construction starts with the set of states that are observation-equivalent to the initial state, and computes the set of subsets of states that can be reached from this initial set with all possible enabled actions. Then, it recursively computes the set of reachable subsets of states for each newly generated state set and each enabled action; in the worst case, it enumerates all subsets of the set of states in the game.

C. Problem statement

Formally, we solve the following problem in this paper.

Problem 1. Given a two-player game arena G = 〈Q, Σ, T, q0, AP, L〉, a DBA As = 〈S, 2^AP, Ts, Is, Fs〉, and a set Γ of sensing actions, design an observation-based strategy f : Q∗ → D(Σ1) ∪ Γ∗ with which the specification is satisfied with probability 1, whenever such a strategy exists.

Motivated by the exponential complexity of existing solutions for games with partial information [9], [12], we propose a synthesis method that ensures satisfaction with probability 1 and avoids the subset construction by sporadic sensor queries.

IV. MAIN RESULTS

We propose a solution to Problem 1 that includes a synthesis algorithm for the exploitation strategy using a game with complete information (in section IV-A) and an algorithm for an active sensing strategy (in section IV-B). Then, we prove that under the synthesized strategy the temporal logic specification is satisfied with probability 1, provided that a condition on the game and the set of sensing actions is satisfied. The condition is commonly used in online planning algorithms.

A. Games with complete information and the exploitation strategy

First, we construct a product game that encodes the specification, given by the DBA As, into the game arena G. Such a product operation is commonly used for control synthesis in discrete finite-state transition systems with temporal logic constraints [34], [35].

Definition 3. Given a game arena G = 〈Q, Σ, T, q0, AP, L〉 and a DBA As = 〈S, 2^AP, Ts, Is, Fs〉, the product game is G = G ⋉ As = 〈V, Σ, ∆, v0, F〉, with components defined as follows: V = Q × S is the set of states, partitioned into the set V1 = Q1 × S of the system's states and the set V2 = Q2 × S of the environment's states. The initial state is v0 = (q0, s0), where s0 = Ts(Is, L(q0)). The map ∆ : V × Σ → V is the transition function, defined as ∆((q, s), σ) = (q′, s′) with q′ = T(q, σ) and s′ = Ts(s, L(q′)). F = {(q, s) | s ∈ Fs} is the winning condition. A run ρ ∈ V^ω is winning for the system if it visits some states in the set F infinitely often; otherwise, it is winning for the environment.
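The product construction of Definition 3 is a straightforward reachable-state traversal. The sketch below assumes the GameArena and DBA encodings from the earlier sketches; it is illustrative, not the paper's implementation.

```python
def product_game(arena, Ts, Is, Fs):
    """Build the reachable part of G x As: returns (v0, delta, V, F)."""
    s0 = Ts[(Is, frozenset(arena.label[arena.q0]))]
    v0 = (arena.q0, s0)
    delta, seen, frontier = {}, {v0}, [v0]
    while frontier:
        q, s = frontier.pop()
        for a in arena.enabled(q):
            q2 = arena.step(q, a)
            s2 = Ts[(s, frozenset(arena.label[q2]))]   # DBA tracks the label
            delta[((q, s), a)] = (q2, s2)
            if (q2, s2) not in seen:
                seen.add((q2, s2))
                frontier.append((q2, s2))
    F = {v for v in seen if v[1] in Fs}                # Buchi winning states
    return v0, delta, seen, F
```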

A run in the game G is a sequence of game states ρ = v0v1 . . . such that v0 is the initial state and, for all i > 0, there exists a ∈ Σ such that vi = ∆(vi−1, a). The set of prefixes of all runs in the game G is Pref(G). Given a run ρ = v0v1 . . . = (q0, s0)(q1, s1) . . ., for convenience of notation, we denote by π1(ρ) = π1(v0)π1(v1) . . . = q0q1 . . . the


corresponding sequence of states in the underlying game arena, and by π2(ρ) = π2(v0)π2(v1) . . . = s0s1 . . . the corresponding sequence of states in the specification automaton.

By construction of the product game, if the system has complete information, its sure-winning strategy becomes a controller that ensures the specification is satisfied from any state at which the strategy is defined. The system's sure-winning strategy is a memoryless, deterministic policy WS : V → Σ1 and can be computed in time polynomial in the size |V × Σ| of the game. For the algorithm for computing such a policy, the reader is referred to [30].

b) Belief and belief update: The belief B of the system is a subset of the set V of game states in which the system thinks the game can possibly be, given its partially observed history. Formally, given a run ρ = v0v1 . . . vn, the belief of the system is B = {Last(ρ′) ∈ V | ρ′ ∈ Pref(G) and Obs(π1(ρ′)) = Obs(π1(ρ))}. In the game G, the initial belief is B0 = {(q, s) | q ∈ Obs(q0) and s = Ts(Is, L(q))}.

For clarity, we denote the set of beliefs by B ⊆ 2^V. When interacting with the environment through physical actions, the belief of the system is updated through a process called belief update, captured by the function

Update : B × (Σ1 ∪ {λ}) × O → B, (2)

where the empty string λ represents the unobserved environment action. Formally, the belief update is defined as follows: given a belief B, when the system takes an action a ∈ Σ1 and receives an observation o ∈ O, it updates its belief to B′ according to

B′ = Update(B, a, o) = {v′ | ∃v ∈ B such that ∆(v, a) = v′ and π1(v′) ∈ o}.

If it is the environment's turn, after the environment takes some action, the system receives an observation o ∈ O and then updates its current belief B to B′ as follows:

B′ = Update(B, λ, o) = {v′ | ∃v ∈ B, ∃σ ∈ Σ2 such that ∆(v, σ) = v′ and π1(v′) ∈ o}.
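Both cases of the belief update are one-step image computations followed by filtering against the received observation. A minimal sketch over the product-game transition dictionary from the earlier sketch (with None standing in for λ) could look as follows:

```python
def update_belief(belief, action, obs, delta, sigma2):
    """Belief update of Eq. (2). `action` is a system action in Sigma1, or
    None for the environment's unobserved move (the lambda case); `obs` is
    the set of arena states forming the received observation o."""
    if action is not None:                  # system's own, known move
        succ = {delta[(v, action)] for v in belief if (v, action) in delta}
    else:                                   # environment's hidden move
        succ = {delta[(v, s)] for v in belief
                for s in sigma2 if (v, s) in delta}
    # keep only successors consistent with the observation: pi_1(v') in o
    return {v for v in succ if v[0] in obs}
```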

Remark 1. Different notions of belief have been developed for different problem formulations. For example, in stochastic systems with partial observation and sensors with uncertainties, a belief is a probability distribution over states [36]. In the two-player deterministic games with partial observation and deterministic sensors studied in this paper, a belief is a subset of states.

A belief-based strategy: A belief-based, memoryless and randomized strategy is a function fb : B → D(Σ1). A belief-based strategy can be transformed into a randomized, observation-based strategy for the system f : Pref(G) → D(Σ1) ∪ {∅}, where ∅ means undefined, by letting f(ρ) = fb(B), where B = {Last(ρ′) ∈ V | ρ′ ∈ Pref(G) and Obs(π1(ρ′)) = Obs(ρ)}, if fb(B) is defined, and f(ρ) = ∅ otherwise. By construction, if ρ ∈ Q∗ and ρ′ ∈ Q∗ are observation-equivalent, the system obtains the same belief B after observing ρ and ρ′, and therefore takes an action according to the same distribution f(ρ) = f(ρ′) = fb(B) if fb(B) is defined.

So far, we have introduced two strategies. One is the deterministic sure-winning strategy WS : V → Σ1 in the product game G, which can be computed efficiently but requires perfect observation to execute at runtime. The second is a belief-based strategy fb : B → D(Σ1), which is applicable with partial observation. From the sure-winning strategy WS, we construct a belief-based exploitation strategy in the following way. Let Win1 ⊆ V be the set of states at which WS is defined, also known as the sure-winning region of the system. Given B ∈ B, let

Progress(B) = ⋃_{v∈B} WS(v), and Allow(B) = ⋂_{v∈B} Allow(v),

where Allow(v) = {σ ∈ Σ1 | ∆(v, σ) ∈ Win1}. Essentially, for each state v ∈ B, the sure-winning strategy suggests the action WS(v) to be taken by the system, which is then included in the set Progress(B). The set Allow(B) has the following property: no matter in which state of the belief B the game is, by taking an action in Allow(B), the next state will still be one for which the sure-winning strategy is defined. Then, if Progress(B) ⊆ Allow(B), we let fb(B)(σ) = 1/|Progress(B)| for each σ ∈ Progress(B). Otherwise, fb is undefined for B. Note that if Allow(B) is empty for a given belief B, then fb is undefined for B.
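The construction of fb from WS involves only set unions and intersections over the current belief. A minimal sketch, reusing the product-game transition dictionary and representing WS as a dict from winning states to actions (our encoding, not the paper's):

```python
def exploitation_strategy(belief, WS, delta, sigma1):
    """Belief-based exploitation strategy f_b: a uniform distribution over
    Progress(B) when Progress(B) is contained in Allow(B), else None."""
    if any(v not in WS for v in belief):
        return None                      # some state in B is outside Win1
    win1 = set(WS)                       # sure-winning region Win1
    progress = {WS[v] for v in belief}   # actions suggested by WS
    allow = set(sigma1)
    for v in belief:                     # Allow(B) = intersection of Allow(v)
        allow &= {a for a in sigma1
                  if (v, a) in delta and delta[(v, a)] in win1}
    if progress <= allow:
        return {a: 1.0 / len(progress) for a in progress}
    return None                          # f_b undefined: switch to sensing
```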

The belief-based strategy fb has the following property.

Lemma 1. If fb(B) is defined for a given belief B ⊆ V, then fb(B′) is also defined for any belief B′ ⊆ B.

Proof. Since fb(B) is defined, it holds that Progress(B) ⊆ Allow(B). For any belief B′ ⊆ B, Allow(B′) = ⋂_{v∈B′} Allow(v) ⊇ ⋂_{v∈B} Allow(v), and thus Allow(B′) ⊇ Allow(B). Meanwhile, Progress(B′) = ⋃_{v∈B′} WS(v) ⊆ ⋃_{v∈B} WS(v) = Progress(B). Then it holds that Progress(B′) ⊆ Progress(B) ⊆ Allow(B) ⊆ Allow(B′). By definition, fb is also defined for B′.

Example 1 (cont.). For illustration purposes, consider a simple objective "reaching home" that requires the robot to reach home without running into the Wumpus. The state in the corresponding DBA is 1 when the robot gets home and 0 otherwise. Given the robot's belief B = {(((3, 0), (xw, yw), 1), 0) | xw = yw, xw ∈ {0, 1, 2, 3}}, we obtain Progress(B) = {W, N} and Allow(B) = {N, W, S, E}, because with actions S and E the robot stays in its current cell, for which the sure-winning strategy is defined. We define the exploitation strategy as fb(B)(W) = fb(B)(N) = 0.5.

In some cases, the sure-winning strategy WS can be set-valued; for example, WS may be a permissive strategy [37]. For these cases, the observation-based exploitation strategy is derived from WS : V → 2^Σ1 as follows: if for all v ∈ B the set WS(v) ∩ Allow(B) is non-empty, then let Progress(B) = ⋃_{v∈B}(WS(v) ∩ Allow(B)) and fb(B)(σ) = 1/|Progress(B)| for each σ ∈ Progress(B).


By construction, for each state v in the belief B, at least one action in WS(v) will be chosen by the system with a non-zero probability.

So far, we have shown how to transform the sure-winning strategy with perfect observation in the product game into a randomized, belief-based exploitation strategy with partial observation. During control execution, the system only maintains and updates its current belief. At each turn of the system, after applying an action σ ∈ Σ1 at the belief B, the system receives an observation o ∈ O and updates its belief to B′ = Update(B, σ, o). Then, from the environment's move, the system obtains another observation o′ ∈ O and updates its belief to B′′ = Update(B′, λ, o′). The system applies fb(B′′) if fb is defined for B′′. The game continues with alternating moves of the system and its environment.

Complexity analysis for the exploitation strategy: For a two-player, turn-based product game G, the time complexity of finding a sure-winning strategy WS(·) with complete information is O(n × m), where n = |V| is the number of states and m = |{(v, σ, v′) | ∆(v, σ) = v′, v, v′ ∈ V, σ ∈ Σ}| is the number of transitions [30]. At each belief update, if it was after a move of the system player, the system computes a new belief and the computation is linear in the size of the current belief, which is at most |V|. If it was after a move of the environment player, the system computes a new belief in time linear in the product of the size of the current belief and the number of actions for the environment, which is at most O(|V| × |Σ2|). The computation of fb for each belief involves only set unions and intersections, which are linear in the sizes of the sets. Overall, the time complexity of computing the exploitation strategy fb is polynomial in the number of states and the number of transitions of the game G. In contrast, an offline method solves two-player games with partial information in time exponential in the size of the game. Online planning avoids this heavy computation by only considering, at each step, the one reachable belief that conforms to the system's observations.

The constraint enforced on the choice of actions by the set of allowable actions is necessary for keeping the system in the winning region of the sure-winning strategy WS. It is thus possible that, for some beliefs in B, the strategy fb is undefined. When the system reaches one such belief during execution, it needs to explore: it obtains new information and revises its belief with a finite number of sensing actions, until it finds itself in a belief for which fb is defined and can then continue to execute fb. The acquisition of information is done by applying an active sensing strategy, which is a function that maps a belief to a sensing action to be applied. Next, we present a synthesis method to compute an active sensing strategy that minimizes the maximal number of sensing actions per exploration phase.

B. Belief revision and the active sensing strategy

We show how the effect of a sensing action revises a belief.

Definition 4. Given a belief B and a sensing action a ∈ Γ with effect KWhether(φ, a), the revision of B by a is represented by

Knows(φ, a, B) = (B′, B \ B′),

where B′ ⊆ B is the set of states in which φ evaluates to true and B \ B′ is the set of states in which φ evaluates to false. Formally, for any v = (q, s) ∈ B′, it holds that Π(q) implies φ, and for any v = (q, s) ∈ B \ B′, it holds that Π(q) implies ¬φ.

If φ is true, the belief is revised to B′; otherwise, it is revised to B \ B′. A sensing action a ∈ Γ is enabled at a belief B if and only if, for each v ∈ B, action a is enabled at q = π1(v). We say that a belief B cannot be revised if, for any sensing action a enabled at B and any propositional logic formula ϕ whose value can be detected by applying a, given Knows(ϕ, a, B) = (B1, B2), it holds that Bi = B for either i = 1 or i = 2. In the following, we construct a tree structure that represents how the system revises its beliefs based on information obtained through sensing actions. Then, we propose a synthesis method for an active sensing strategy using this tree structure.

Given a belief BI, the belief revision tree with root BI is a tuple BRTree(BI) = 〈N, E〉, where N is the set of nodes in the tree, consisting of a subset of beliefs, and E ⊆ N × Γ × N is the set of edges. It is constructed as follows.
1) The root of the tree is BI.
2) At each node B ∈ N, for each sensing action a ∈ Γ enabled at the belief B, if there exists a formula ϕ whose value can be detected by a such that (B1, B2) = Knows(ϕ, a, B) and both B1 and B2 are non-empty, then we add two children B1, B2 of B and include the edges (B, a, B1), (B, a, B2) in the set E of edges.
3) A node B is a leaf of the tree if and only if either B cannot be further revised by any sensing action or the exploitation strategy is defined for B.

An edge (B, a, B′) ∈ E is also denoted B a−→ B′. We extend the edges in the usual way: if B u−→ B′′ and B′′ v−→ B′, then B uv−→ B′, for u, v ∈ Γ+.
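The tree construction amounts to repeatedly splitting beliefs with the Knows operator of Definition 4. A minimal sketch, reusing the SensingAction encoding from the earlier sketch and taking a predicate exploit_defined (assumed to test whether fb is defined at a belief) as a parameter:

```python
def knows(phi, belief):
    """Knows(phi, a, B): split B by the truth value of the sensed formula."""
    b_true = frozenset(v for v in belief if phi(v[0]))
    return b_true, frozenset(belief) - b_true

def build_brtree(root, sensing_actions, exploit_defined):
    """Build BRTree(B_I): nodes are beliefs, edges are (B, a, B')."""
    root = frozenset(root)
    nodes, edges, stack = {root}, set(), [root]
    while stack:
        b = stack.pop()
        if exploit_defined(b):
            continue                       # leaf: f_b is defined for b
        for a in sensing_actions:
            # a must be enabled at every state of the belief
            if not all(a.enabled_at(v[0]) for v in b):
                continue
            b1, b2 = knows(a.phi, b)
            if b1 and b2:                  # the action actually splits b
                for child in (b1, b2):
                    edges.add((b, a, child))
                    if child not in nodes:
                        nodes.add(child)
                        stack.append(child)
    return nodes, edges
```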

Example 1 (cont.). Let us illustrate the construction of a belief revision tree with Example 1 under the simple objective "reaching home." Given the belief BI = {(((2, 1), (x, x), 1), 0) | x ∈ {0, 1, 2}} and the set of sensing actions {Smell(x, y) | |x − 2| ≤ 1, |y − 1| ≤ 1}, we construct the belief revision tree in Figure 2. For clarity, we only consider the subset of sensing actions {Smell(x, y) | (x, y) ∈ {(1, 2), (1, 1), (1, 0)}} out of the 9 sensing actions enabled at BI in total.

The active sensing strategy is a function fs : B → Γ that maps a belief to a sensing action for the system to take. It is computed as follows. First, in the tree BRTree(BI), we compute a set of target nodes Reach ⊂ N such that a node B′ is included in Reach if and only if fb(B′) is defined. Similarly to the attractor computation for reachability objectives [30] with the target set Reach, we use the following recursion:
1) X0 = Reach, i = 0.
2) Xi+1 = Xi ∪ {B ∈ B | ∃a ∈ Γ such that ∀B′ ∈ B, (B, a, B′) ∈ E implies B′ ∈ Xi}, and we define fs(B) = a. In other words, a belief B is included in Xi+1 if there exists a sensing action a such that, when a is applied at B, the system is guaranteed to reach a belief in Xi.


Fig. 2. A fragment of the belief revision tree BRTree(BI) with BI = {(((2, 1), (x, x), 1), 0) | x ∈ {0, 1, 2}}, where each node x represents a game state (((2, 1), (x, x), 1), 0) in V and the sensing action Smell(x, y) is abbreviated as S(x, y). The leaves are marked with double circles. Note that, for the belief {1, 2}, which means the Wumpus can be either at cell (1, 1) or at cell (2, 2), the exploitation strategy is not defined and the robot has to further revise its belief when it reaches this node.

3) The computation terminates when i reaches some m ∈ N such that Xm+1 = Xm, and we output the sensing strategy fs.
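This recursion is a standard backward attractor computation on the tree. A minimal sketch over the (nodes, edges) representation from the BRTree sketch; it also records the level of each belief, used in Lemma 2 below (the level bookkeeping is our own rendering of the attractor levels):

```python
def sensing_strategy(nodes, edges, reach):
    """Attractor-style computation of f_s on BRTree: backward induction
    from the target set `reach` (beliefs where f_b is defined).
    Returns (f_s, level), both dicts keyed by beliefs."""
    level = {b: 0 for b in reach}
    fs = {}
    changed = True
    while changed:                        # iterate until X_{i+1} = X_i
        changed = False
        for b in nodes:
            if b in level:
                continue
            for a in {a for (src, a, _) in edges if src == b}:
                kids = [c for (src, aa, c) in edges if src == b and aa == a]
                if kids and all(c in level for c in kids):
                    # applying a at b is guaranteed to reach a lower level
                    level[b] = 1 + max(level[c] for c in kids)
                    fs[b] = a
                    changed = True
                    break
    return fs, level
```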

In principle, for a given belief B, both fb and fs may be defined at B. Here, we enforce the condition that whenever fb(B) is defined, fs(B) is not defined. In other words, the intersection of the domains of fb and fs is empty. We denote Xm = Attr(Reach), following the notion of an attractor of the target set Reach. By construction, for any belief in Attr(Reach), there exists a sensing strategy fs such that, whatever the outcomes of the applied sensing actions, the system arrives at some belief in Reach in finitely many steps by following fs. Furthermore, the resulting strategy minimizes the maximal number of sensing actions required in the exploration phase under the following constraint: during sensing, the system will not run into a belief outside the region Attr(Reach). We associate a number, called the level, with each belief B ∈ Attr(Reach): Level(B) = 0 for B ∈ Reach, and Level(B) = i if and only if B ∈ Xi \ Xi−1, for 1 ≤ i ≤ m. The property in the following lemma follows immediately from the properties of the attractor.

Lemma 2. Given the current belief BI ∈ B of the system and the active sensing strategy fs : B → Γ obtained with the attractor computation on BRTree(BI), if fs(BI) is defined, then the maximal number of sensing actions required during the exploration phase is Level(BI).

Proof. When BI ∈ Attr(Reach), active sensing stops when the system reaches a belief in Reach, and for each B ∈ Reach, Level(B) = 0. At each step of applying fs, the level of the system's belief strictly decreases by at least 1. Thus, the number of sensing actions required to reach a belief B ∈ Reach is at most Level(BI).

Example 1 (cont.). Consider the belief revision tree BRTree(BI) in Figure 2. In this case, Reach coincides with the set of leaves and Attr(Reach) contains all the nodes in the tree: X0 = Reach, X1 = Reach ∪ {{0, 1, 2}, {1, 2}} and X2 = X1 = Attr(Reach). The action fs(BI) can be either Smell(1, 0) or Smell(1, 1). If the Wumpus is actually in cell (2, 2), by taking action Smell(1, 0) the robot detects Stench(1, 0) = false, updates its belief to the singleton {(((2, 1), (2, 2), 1), 0)}, and then takes the physical action 'W' indicated by the exploitation strategy. If the Wumpus is not in (2, 2), after taking action Smell(1, 0) the robot detects Stench(1, 0) = true and knows that the Wumpus is in one of the cells in {(0, 0), (1, 1)}. It then takes action 'N' to avoid running into the Wumpus, as indicated by the exploitation strategy fb(B), with B = {(((2, 1), (0, 0), 1), 0), (((2, 1), (1, 1), 1), 0)}.

Complexity analysis for computing active sensing strategies: The computation time of an active sensing strategy fs is linear in the size of the belief revision tree. However, the construction of a belief revision tree with root BI is based on a subset construction, giving O(2^|BI|) worst-case time complexity. One way of dealing with this expensive belief revision computation is to restrict the maximal number of states in the root of the tree, i.e., |BI| ≤ n, by defining a new exploitation strategy f′b from fb such that f′b(B) = fb(B) if |B| ≤ n, and f′b(B) is undefined otherwise. In other words, before the ambiguity about the current game state grows too large, the system is forced to take some sensing actions to refine its belief; a minimal sketch of this restriction appears below. Alternatively, in Section V, we show that an opportunistic sensing strategy can be obtained without the construction of a belief revision tree and the attractor computation.
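As a minimal sketch, this restriction of fb to small beliefs is a one-line wrapper; the convention that an undefined strategy returns None is an assumption for illustration.

```python
def bounded_exploitation(fb, n):
    """f'_b: agrees with f_b on beliefs with at most n states and is
    undefined (returns None, by assumption) otherwise, forcing sensing
    before the ambiguity about the game state exceeds n states."""
    def fb_prime(belief):
        return fb(belief) if len(belief) <= n else None
    return fb_prime
```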

Up to this point in this subsection, we have developed an algorithm for synthesizing an active sensing strategy to revise the system's belief. However, if the belief BI is not in Attr(Reach), then, even if we can resolve the ambiguity to a certain extent by applying sensing actions randomly, we might run into a "dead-end" at which the exploitation strategy is not defined and no sensing action can further revise the system's current belief. Dead-ends are notorious pitfalls for online planning algorithms [24], [38], [39]. Since our method does not construct a complete solution to the synthesis problem with partial observation and sensing actions, it is no exception. On the other hand, a complete solution of games with partial observation and sensing actions requires a subset construction over all reachable subsets of states, for all enabled control and sensing actions. The exponential space required for a complete solution is prohibitively large for practical problems. In the following subsection, we prove that, by switching between the exploitation and active sensing strategies, the system satisfies the specification with probability 1, provided that a certain condition on the game and the set of sensing actions holds.

C. Almost-sure winning with active sensing and exploitation

At runtime, the system alternates between the exploitation strategy and the active sensing strategy. We call the system's strategy at runtime a composite strategy, denoted f : B → D(Σ1) ∪ Γ. The composite strategy combines the belief-based exploitation strategy fb : B → D(Σ1) and the sensing strategy fs : B → Γ as

\[
f(B) =
\begin{cases}
f_b(B), & \text{if } f_b(B) \text{ is defined;}\\
f_s(B), & \text{if } f_s(B) \text{ is defined.}
\end{cases}
\tag{3}
\]

Recall that, by the definition of the sensing strategy, the intersection of the domains of fb and fs is empty. The composite strategy f simply means that the system makes progress using fb if and only if fb is defined for its current belief. Otherwise, the system is stuck in the exploitation phase and has to switch to the exploration phase, using the sensing strategy fs.
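The dispatch in Eq. (3) can be sketched as follows; the conventions that fb returns a distribution over control actions (or None where undefined) and fs returns a sensing action (or None) are illustrative assumptions, not the paper's interfaces.

```python
import random

def composite_strategy(fb, fs, belief):
    """Eq. (3): exploit with f_b where it is defined, otherwise explore
    with f_s. The domains of f_b and f_s are disjoint by construction."""
    dist = fb(belief)                        # e.g., {action: probability}
    if dist is not None:
        # exploitation phase: sample a control action from the distribution
        actions, weights = zip(*dist.items())
        return random.choices(actions, weights=weights)[0]
    action = fs(belief)
    if action is not None:
        return action                        # exploration phase: sense
    raise RuntimeError("belief dead-end: neither f_b nor f_s is defined")
```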

A belief dead-end is a belief B at which both fb and fs are undefined. We say that a game with partial observation is devoid of belief dead-ends if the following condition is satisfied: for any belief B reachable from the initial belief B0 by following the composite strategy, if fb(B) is undefined, then fs(B) is defined and, for any game state v ∈ B, there exists a belief B′ ∈ Reach of BRTree(B) with v ∈ B′.

The following two conditions are sufficient for a game with partial observation to be devoid of belief dead-ends. 1) For each set Y ⊆ Q of states and each q ∈ Y, there exists a sequence w of sensing actions such that, after applying w, the system detects that the actual state is q. 2) For any belief B reachable from the initial belief B0 by following the composite strategy, if there exist v, v′ ∈ B with π1(v) = π1(v′) yet π2(v) ≠ π2(v′), then fb must be defined for the belief {v, v′}. The first condition requires a powerful sensing capability of the system. By definition, sensing actions cannot distinguish two game states v and v′ if they share the same state in Q but have different states in the set S of specification states. Thus, if we run into a belief which includes v and v′, then no matter which sensing actions are applied, the revised belief that contains v must also contain v′. If fb is undefined for {v, v′}, the system then runs into a belief dead-end. The second condition is enforced to avoid such belief dead-ends. Note that the requirement of no dead-ends is necessary to ensure the correctness of online planning algorithms. Similar notions of games without dead-ends have been proposed, for instance, belief-connected planning problems [24].

Example 1 (cont.). In this example, the set of sensing actions, which allows the robot to detect a stench in any surrounding cell, turns out to be sufficient with respect to the objective of "reaching home", even though this set of sensing actions does not necessarily reduce every belief to a singleton. For example, when the robot is in cell (0, 0) and the Wumpus is in cell (3, 3), with sensing actions the robot can only tell that the Wumpus is not in cell (1, 1) or (0, 0). Yet, with this information, it already knows which action to take using its exploitation strategy.

Next, we prove the correctness of the composite strategy. To this end, we recall a property of the solution of games with perfect observation from [30]: for the product game G = ⟨V, Σ, ∆, v0, F⟩ with perfect observation, the sure-winning region of the system can be partitioned as Win1 = ⋃_{i=0}^{m} Wi. Given a state v ∈ Win1, there exists a unique ordinal i such that v ∈ Wi. If v ∈ Wi ∩ V1 for some i > 0, then after applying WS(v), the system reaches a state v′ ∈ Wi−1. If v ∈ W0 ∩ V1, then after applying WS(v), the system reaches a state v′ ∈ Wj for some 0 ≤ j ≤ m. On the other hand, if v ∈ Wi ∩ V2 for some i > 0, then no matter which action the environment takes, the next state is in Wi−1. If v ∈ W0 ∩ V2, then no matter which action the environment takes, the next state is in Wj for some 0 ≤ j ≤ m.

Theorem 1. Given a game G = ⟨V, Σ, ∆, v0, F⟩ and the initial belief B0, if G is devoid of belief dead-ends and the composite strategy f is defined for B0, then following f ensures that, with probability 1, the set F of states is visited infinitely often.

Proof. Under the assumption that G is devoid of belief dead-ends, consider a belief B ∈ B that the system reaches by following the composite strategy f, and suppose fb is defined for B. By the definition of fb, each σ ∈ Progress(B) is chosen with probability 1/u, where u = |Progress(B)|. Suppose the true state v is in Wi. Since the action WS(v) is chosen with probability 1/u, with probability 1/u the system reaches a state in Wi−1 when i ≠ 0, or a state in Wk for some 0 ≤ k ≤ m when i = 0. Moreover, since Progress(B) ⊆ Allow(B), any action that can be taken by the system with non-zero probability is in Allow(B), and thus with probability 1 the system stays in Win1 even when choosing an action σ ≠ WS(v). Formally, suppose v is the actual state and v′ is the state after taking some action. Then

\[
\Pr(v' \in \mathit{Win}_1)
= \sum_{\sigma \in \mathit{Progress}(B)} \frac{1}{u}\, \Pr(v' \in \mathit{Win}_1 \mid v, \sigma)
= \sum_{\sigma \in \mathit{Progress}(B) \setminus \{\mathit{WS}(v)\}} \frac{1}{u}\, \Pr(v' \in \mathit{Win}_1 \mid v, \sigma) + \frac{1}{u}\, \Pr(v' \in \mathit{Win}_1 \mid v, \mathit{WS}(v))
= \frac{1}{u}(u - 1) + \frac{1}{u} = 1,
\]

where Pr(v′ ∈ Win1 | v, σ) is the probability of v′ being in Win1 provided that action σ is taken at the state v.

At each turn of the game, either the system takes a physical action (or a sequence of sensing actions followed by a physical one), or the environment takes an action. Given v ∈ Wi, 0 < i ≤ m, let Pr(v, ♦^i W0) denote the probability of reaching W0 from state v in i turns. When the system applies the strategy f, we have

\[
\Pr(v, \Diamond^{i} W_0) \ge \left(\tfrac{1}{|V|}\right)^{i} > 0,
\]

and the probability of not reaching W0 in i turns is at most

\[
1 - \left(\tfrac{1}{|V|}\right)^{i} \le 1 - \left(\tfrac{1}{|V|}\right)^{m+1} = r < 1,
\]

where m + 1 is the total number of partitions of Win1. Meanwhile, any action taken by the system or the environment ensures that the next state is within Win1. Suppose that, given the true state v ∈ Wi, i > 1, the system takes an action σ′ ≠ WS(v). Then, instead of reaching Wi−1, the system arrives at some v′′ ∈ Wj for 0 < j ≤ m, and the probability of not reaching W0 in j turns from v′′ is still at most r. Therefore, under the policy f, the probability of eventually reaching W0 from any state v ∈ Win1 is

\[
\Pr(v, \Diamond W_0) = \lim_{k \to \infty} \Pr(v, \Diamond^{k} W_0) = \lim_{k \to \infty} \bigl(1 - \Pr(v, \neg\Diamond^{k} W_0)\bigr) \ge \lim_{k \to \infty} \bigl(1 - r^{k}\bigr) = 1,
\]

and thus Pr(v, ♦W0) = 1.

Once W0 = F is entered, any action of the system or the environment ensures reaching a state in Win1, and the above reasoning applies again; hence, with the strategy f, the system revisits the set F of states infinitely often with probability 1.

The proof is partially inspired by the synthesis algorithm for games with partial observation [12], in which the set of states from which the system can achieve the goal with probability 1 is computed, and the observation-based strategy ensures that the system never leaves this set. Here, however, we do not compute this set. Rather, we replace it with the sure-winning region of the game with perfect observation. We can do so because of the ability to acquire information with sensing actions at runtime and the condition of no belief dead-ends: it is possible to reduce the uncertainty whenever such a reduction is needed.

V. ONLINE SENSING WITH REDUCED COMPUTATIONAL COMPLEXITY

Although the exploitation strategy does not require a subset construction in the game, the construction of the belief revision tree based on the sensing information is in fact exponential in the size of the belief at the root. In this section, we show that, by relaxing the requirement of finding a sensing strategy that minimizes the maximal number of sensing actions per exploration phase, an online sensing strategy can be developed that is guaranteed to reach a leaf of the belief revision tree at which fb is defined. In the worst case, this strategy applies every sensing action at most once. In practice, however, it is often the case that even with the online sensing strategy, only a small subset of sensing actions is applied during each exploration phase.

Given the current belief B, the online sensing strategy fs is defined as follows (a sketch is given below):

1) If fb(B) is undefined, proceed to step 2. Otherwise, proceed to step 3.

2) Let fs(B) = a, where the sensing action a satisfies (B1, B2) = Knows(ϕ, a, B) and both B1 and B2 are nonempty. If more than one such action exists, choose one at random. Then, if ϕ evaluates to true, the system arrives at the belief B1; let B = B1. Otherwise, ϕ evaluates to false; let B = B2. Go to step 1.

3) Terminate and exercise the exploitation strategy fb(B).

Next, we show that if there exists an active sensing strategy which ensures that the system reaches a belief at which fb is defined, then the online sensing strategy ensures that we reach a possibly different belief at which fb is also defined, though with potentially more sensing actions.
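A sketch of this loop, under assumed interfaces: split(a, B) returns the pair (B1, B2) of Knows(ϕ, a, B), query(a) performs the sensing action and returns the observed truth value of its formula, and fb returns None where undefined; all of these names are hypothetical.

```python
import random

def online_sensing(belief, fb, sensing_actions, split, query):
    """Steps 1-3 above: sense with randomly chosen splitting actions until
    the exploitation strategy f_b is defined; each action is used at most
    once per exploration phase (the worst case of the strategy)."""
    used = set()
    while fb(belief) is None:                          # step 1
        # candidate actions are those that actually split the belief
        candidates = [a for a in sensing_actions
                      if a not in used
                      and all(part for part in split(a, belief))]
        if not candidates:
            raise RuntimeError("belief dead-end during exploration")
        a = random.choice(candidates)                  # step 2
        used.add(a)
        b_true, b_false = split(a, belief)
        belief = b_true if query(a) else b_false
    return belief                                      # step 3: exploit
```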

First, we show that, for a given belief, the belief obtained after revision depends only on which sensing actions are applied, not on the order in which they are applied.

Lemma 3. Given a belief B ⊆ V, if there exist two sequences of sensing actions a1a2, a2a1 ∈ Γ∗ such that a1a2 updates the belief as

\[
B \xrightarrow{a_1} B_1 \xrightarrow{a_2} B_2 \quad \text{with } B_2 \subset B_1 \subset B,
\]

and a2a1 updates the belief as

\[
B \xrightarrow{a_2} B'_1 \xrightarrow{a_1} B'_2 \quad \text{with } B'_2 \subset B'_1 \subset B,
\]

then B2 = B′2.

Proof. The proof is based on the commutativity of conjunction. Suppose that the effect of a1 (resp. a2) is captured by the formula KWhether(ϕ1, a1) (resp. KWhether(ϕ2, a2)). For i = 1, 2, let φi = ϕi if KWhether(ϕi, ai) evaluates to true, and φi = ¬ϕi otherwise. We can conclude that, for all states in B2, φ1 ∧ φ2 evaluates to true. Likewise, for all states in B′2, φ2 ∧ φ1 evaluates to true. Since φ1 ∧ φ2 = φ2 ∧ φ1, B2 = B′2.
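The commutativity argument amounts to the fact that belief revision filters the belief by the observed truth values of the sensed formulas, and set filtering commutes. A toy check with hypothetical predicates standing in for ϕ1 and ϕ2:

```python
def revise(belief, phi, truth):
    """Keep exactly the states consistent with the observed truth of phi."""
    return {v for v in belief if phi(v) == truth}

belief = {0, 1, 2, 3, 4}
phi1 = lambda v: v % 2 == 0        # stand-in for the formula of a1
phi2 = lambda v: v < 3             # stand-in for the formula of a2

b12 = revise(revise(belief, phi1, True), phi2, True)   # a1 then a2
b21 = revise(revise(belief, phi2, True), phi1, True)   # a2 then a1
assert b12 == b21 == {0, 2}        # the order does not matter
```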

Lemma 4. Given a belief B ⊆ V and the actual state v ∈ B, if there exists an active sensing strategy fs such that fs(B) is defined, then applying the online sensing strategy ensures that the system reaches a belief at which fb is defined.

Proof. Suppose that, by applying the active sensing strategy fs, the system reaches a belief B′ at which fb is defined, with a sequence of sensing actions a0a1 . . . am. For any sequence of sensing actions w ∈ Γ∗, let Acts(w) be the set of sensing actions in w. By Lemma 3, as long as {ai, 0 ≤ i ≤ m} ⊆ Acts(w), we are ensured to arrive at a belief B′′ ⊆ B′, since any sensing action other than those in {ai, 0 ≤ i ≤ m} can only further refine the belief B′ into one of its subsets. Since the online sensing strategy in the worst case applies every sensing action in Γ once, it is guaranteed that either the system reaches a belief at which fb is defined before using all the actions in {ai, 0 ≤ i ≤ m}, or the sequence w of sensing actions used in the exploration phase satisfies {ai, 0 ≤ i ≤ m} ⊆ Acts(w) and the system reaches a belief B′′ ⊆ B′. According to Lemma 1, as fb is defined for B′, it is also defined for B′′.

The above lemma shows that if the system with the active sensing strategy can avoid running into a dead-end in online planning, so can it with the online sensing strategy. Note that although the composite strategy with online sensing can be computed in time polynomial in the size of the game, compared to the exponential complexity of any offline solution for partially observable games, the reduction in complexity comes at a cost: 1) only if there are no belief dead-ends can the composite strategy ensure that the specification is satisfied almost surely; 2) online sensing cannot guarantee the minimization of the maximal number of sensing actions during each exploration phase. Consequently, online sensing introduces a trade-off between computational complexity and the optimality of the sensing strategy.

VI. EXAMPLES

We apply the algorithm combining exploitation and exploration with sensing actions to Example 1 and to a robotic motion planning problem in a terrain with a set of distributed, stationary sensors. The implementations are in Python on a desktop with an Intel Core i5 processor and 16 GB of memory. For both examples, without the inclusion of sensing actions, it can be shown that neither observation-based sure-winning strategies nor almost-sure winning strategies exist, using the algorithms for games with partial observation [12]. That is, for these examples, the initial state is not included in the winning region of the system, which is computed using the algorithms in [12]. Intuitively, in the Wumpus game with partial information, the robot inevitably reaches a belief in which the Wumpus can be in any of the cells along the diagonal. When the robot is in a cell in the upper or lower diagonal, there is no action which ensures that the robot can cross the diagonal in order to reach its home or the gold mine without running into the Wumpus. A similar argument explains the non-existence of a winning strategy for the system in the other example.

A. Example I: The Wumpus game

Recall from Example 1 that the robot needs to carry gold from the mine to its home infinitely often while avoiding the Wumpus. This specification is given in LTL in Eq. (1).

Suppose that with each trip between Gold and Home the system collects 10 grams of gold. In the experiment, after 1000 steps (a step includes either a finite sequence, possibly of length 0, of the system's sensing actions followed by a control action, or an action of the environment), the robot collected 3700 grams of gold, i.e., completed 370 trips, with 100 sensing actions applied. In Figure 3, we show the belief updates obtained by alternating between the exploitation strategy and the active sensing strategy for the initial 100 steps. The maximal cardinality of the belief set over the control execution is 4, which means that the robot believes the Wumpus can be in any diagonal cell. However, as long as there is no danger of running into the Wumpus in the next step, the robot tolerates this uncertainty and revises its belief only when it needs more information about the Wumpus' location in order to avoid it.

To determine the minimal set of sensing actions required for the system to accomplish this task, we ran several experiments with different sensor configurations. It turns out that the minimal set is not unique: the goal can be accomplished if the system can detect whether there are stenches in its own cell and in the cells in its h and ℓ directions, with (h, ℓ) ∈ {(N, E), (N, S), (S, W)}. Certainly, any superset of a minimal set of sensing actions is also sufficient.

Fig. 3. The update in the number of possible Wumpus positions in the system's belief (the cardinality of the belief vs. the number of steps t, for the initial 100 steps).

B. Example II: Distributed sensors

We consider a gridworld example as shown in Figure 4. The assumption on the dynamics of the two robots, named bot1 and bot2, is the following: the position of bot1 has the discrete domain D1 = {(x1, y1) | 0 ≤ x1, y1 ≤ 9, x1, y1 ∈ ℤ}, where ℤ is the set of integers. The domain of bot2's position is the set of cells in the block enclosed by the red, bold-lined boundary: D2 = {(x2, y2) | 0 ≤ x2, y2 ≤ 9, x2, y2 ∈ ℤ, y2 − k ≤ x2 ≤ y2 + k}, where k is a parameter one can specify to restrict the motion of bot2. In this figure and the following experiments, k = 4. Given a set R = {r0, r1, . . . , rn} of n + 1 sensors, the range of sensor ri is determined by its location (x_ri, y_ri) and its angle θ_ri ∈ {0, 0.5π, π, 1.5π}. Specifically, the range of a sensor, called its ray, is a set of cells: ray(x_ri, y_ri, 0) = {(x, y) | x ≥ x_ri, y = y_ri}; ray(x_ri, y_ri, 0.5π) = {(x, y) | x = x_ri, y ≥ y_ri}; ray(x_ri, y_ri, π) = {(x, y) | x ≤ x_ri, y = y_ri}; and ray(x_ri, y_ri, 1.5π) = {(x, y) | x = x_ri, y ≤ y_ri}. A sensor with configuration (x_r, y_r, θ_r) outputs 1 if there is a moving object in the set of cells ray(x_r, y_r, θ_r); otherwise, it outputs 0. It is assumed that the ray only detects robots (e.g., by heat or by vision); in other words, if the ray of a sensor includes some obstacle but no robot, the sensor outputs 0. Depending on the locations and orientations of the sensors, we can estimate the position of a target. For example, given that sensors r1 and r0 both output 1, we know that there is a robot in the intersection of the rays of r1 and r0.
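The ray model admits a direct encoding; the following sketch (grid size, sensor tuples, and function names chosen for illustration) also shows how intersecting rays localizes a target.

```python
def ray(xr, yr, theta, size=10):
    """Cells covered by a sensor at (xr, yr) with angle theta, given in
    units of pi (0, 0.5, 1.0, or 1.5), on a size-by-size grid."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    if theta == 0.0:
        return {(x, y) for (x, y) in cells if x >= xr and y == yr}
    if theta == 0.5:
        return {(x, y) for (x, y) in cells if x == xr and y >= yr}
    if theta == 1.0:
        return {(x, y) for (x, y) in cells if x <= xr and y == yr}
    return {(x, y) for (x, y) in cells if x == xr and y <= yr}

def sensor_output(sensor, robot_positions):
    """1 if some robot lies on the sensor's ray, else 0; obstacles are
    invisible to the sensor, as assumed in the example."""
    xr, yr, theta = sensor
    return int(any(p in ray(xr, yr, theta) for p in robot_positions))

# If two sensors both output 1, a robot can be inferred to lie in the
# intersection of their rays (two hypothetical sensors shown):
candidates = ray(3, 3, 0.0) & ray(5, 1, 0.5)   # here {(5, 3)}
```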

Fig. 4. A partitioned workspace including two robots (bot1 and bot2, represented by the red and green blocks), some static obstacles represented by black crosses, and a set of sensors, each of which is represented by a circle with an arrow indicating the direction in which the sensor is pointing.

From bot1’s perspective, bot2 is its dynamic environment.The objective of bot1 is given in the form of a temporal logicformula ϕs1 := �♦(R2 ∧ ♦R1) ∧ ϕsafe, where ϕsafe meansbot1 shall not hit any obstacle or bot2.

The dynamics of bot1 are given by

\[
x_{k+1} =
\begin{cases}
x_k + u_1, & \text{if } 0 \le x_k + u_1 \le 9,\\
x_k, & \text{otherwise,}
\end{cases}
\qquad
y_{k+1} =
\begin{cases}
y_k + u_2, & \text{if } 0 \le y_k + u_2 \le 9,\\
y_k, & \text{otherwise,}
\end{cases}
\]

where (u1, u2) ∈ {(1, 0), (0, 1), (−1, 0), (0, −1), (1, 1), (1, −1), (−1, 1), (−1, −1)}. In other words, bot1 can move in the 8 compass directions, and if it tries to walk off the grid, it stays in its current cell.

The dynamics of bot2 are similar to those of bot1, except that the input of bot2 is restricted to the set {(1, 0), (0, 1), (−1, 0), (0, −1)}; that is, bot2 cannot move diagonally. If bot2 tries to walk out of the region enclosed by the red boundary, it has to stay in its current cell. A sketch of both transition functions follows.
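The following sketch encodes the two transition rules above; note the asymmetry implied by the text: bot1 saturates each coordinate independently at the grid boundary, while bot2 abandons the whole move if it would leave its region. Function and constant names are chosen for illustration.

```python
BOT1_INPUTS = [(1, 0), (0, 1), (-1, 0), (0, -1),
               (1, 1), (1, -1), (-1, 1), (-1, -1)]   # 8 compass directions
BOT2_INPUTS = [(1, 0), (0, 1), (-1, 0), (0, -1)]     # no diagonal moves

def step_bot1(pos, u):
    """Each coordinate saturates independently at the grid boundary."""
    x, y = pos
    nx = x + u[0] if 0 <= x + u[0] <= 9 else x
    ny = y + u[1] if 0 <= y + u[1] <= 9 else y
    return (nx, ny)

def step_bot2(pos, u, k=4):
    """bot2 stays in its current cell if the move would leave the band D2."""
    nx, ny = pos[0] + u[0], pos[1] + u[1]
    inside = 0 <= nx <= 9 and 0 <= ny <= 9 and ny - k <= nx <= ny + k
    return (nx, ny) if inside else pos
```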

In what follows, we show three experiments with different configurations of the sensors (by configuration we mean the locations and the number of sensors). Although the game between bot1 and bot2 has an infinite number of stages, we carry out each experiment for 1000 steps (at each step, either bot1 or bot2 makes a move). In the three experiments, the initial state of the game is ((0, 3), (7, 6), 1), which means that bot1 is at location (0, 3), bot2 is at (7, 6), and bot1 makes the first move. In this example, the number of states in the game is 13400. Since the construction of belief revision trees is time-consuming for a game of this size, we apply the online sensing strategy. Each decision (a physical action, or a sequence of sensing actions followed by a physical one) is made in about 0.001 to 0.02 seconds during the online planning. The computation of the sure-winning strategy given complete information took 29.56 seconds.

c) Experiment 1 with 10 sensors: The set of locations for the sensors is R = {(i, i) | 0 ≤ i ≤ 9}.

d) Experiment 2 with 8 sensors: The set of locations for the sensors is R = {(2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5), (8, 6), (9, 7)}.

e) Experiment 3 with 6 sensors: The set of locations for the sensors is R = {(4, 0), (5, 1), (6, 2), (7, 3), (8, 4), (9, 5)}.

The results are summarized in Table I. The first column shows the number of sensors installed in the environment. The second column is the computation time for a simulation of 1000 steps, which is linear in the number of steps. In the three experiments, bot1 manages to revisit R2 and R1 multiple times and would be able to do so infinitely often if the game continued. The third column shows the number of revisits to R2 (or R1) in 1000 steps. The last column shows the number of sensing actions applied in each of the three experiments. Since bot2 plays randomly, i.e., at each turn it chooses one of its available actions at random, the differences in the number of revisits, the number of applied sensing actions, and the computation time are in fact not a result of the sensor design but rather of the randomness in bot2's actions. Through these experiments, we observe that, in order for bot1 to accomplish the task specified by ϕs1, the minimum number of sensors is 6, and the sensor design should cover all the states where bot1 is in the workspace of bot2. Thus, the locations of the sensors matter: with a bad choice of sensor locations, even with 10 sensors, bot1 may not be able to accomplish this task due to unresolvable ambiguity in the game states. For example, placing two sensors next to each other horizontally or vertically provides less coverage than placing them next to each other diagonally. Figure 5 shows the belief updates for the initial 1000 steps in Experiment 1 with 10 sensors. For the other two experiments, the results are similar. In these experiments, the maximal cardinality of the beliefs is 67; at these states, bot1 believes bot2 can be anywhere in its workspace.

TABLE I
EXPERIMENT RESULTS WITH THREE DIFFERENT SENSOR MODELS.

Num. of sensors   Comp. time   Revisits   Num. of sensing acts
10                551          11         423
8                 540.16       10         421
6                 525.61       11         403

Fig. 5. The cardinality of beliefs vs. the number of steps in Experiment 1 with 10 sensors.

For comparison, we used the offline solver Alpaga [11] to compute a solution for this game with partial information. Alpaga reduces the partially observable game to one with complete information by a subset construction. Due to the large state space (13400 states), the solver did not generate any output after 30 minutes and was terminated. This comparison highlights the efficiency and practical usability of the proposed method.

VII. CONCLUSIONS

Our work shows that, when additional information can be obtained through sensing actions, one can transform a sure-winning strategy with complete information into an observation-based, almost-sure winning strategy with partial information, which is then combined, at runtime, with a sensing strategy to ensure that a given temporal logic specification is satisfied with probability 1. The method avoids a subset construction for solving games with partial information. Meanwhile, the active sensing strategy leads to a cost-efficient way of sensor design: although we require a set of sensing actions sufficient to avoid dead-ends at runtime, the system minimizes the usage of sensing actions by asking the most revealing queries, depending on which specification is to be satisfied and how much uncertainty the system has about the state at runtime.

A number of future extensions are of interest. First, the active sensing strategy currently minimizes the maximal number of sensing actions over each exploration phase. It is straightforward to design an active sensing strategy that instead minimizes an upper bound on the cost of sensing during each exploration phase with respect to a given cost function, by solving a constrained minimax problem: the system aims to minimize the cost of sensing actions for the worst-case scenario. Second, the outcomes of sensing actions are currently deterministic and obtained instantaneously; that is, for a given propositional formula, by applying the corresponding sensing action, the system knows the truth value of that formula. Since in practice many sensor outcomes are probabilistic and delayed, and the environment might change during an exploration phase with delayed sensor outcomes, in future work we will consider sensing actions with probabilistic and delayed outcomes. Moreover, we will consider uncertainties in both the sensor outcomes and the system dynamics. Uncertainty in the system and environment dynamics gives rise to a two-player stochastic game, for which different solution techniques [34] shall be applied for control synthesis with respect to temporal logic constraints. Third, the framework of planning with sensing actions can be adapted for robot motion planning in human-robot co-work, where a human provides information to the robotic system upon query. For this extension, we are currently investigating the modifications needed to account for delays, uncertainty, and ambiguity in the information provided by the human.

ACKNOWLEDGEMENT

The authors would like to thank Rüdiger Ehlers of the University of Bremen for inspiring discussions. This work is supported by AFOSR grant number FA9550-12-1-0302, ONR grant number N000141310778, and NSF grant number 1446479.

REFERENCES

[1] R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer, "Interactive presentation: Automatic hardware synthesis from specifications: a case study," in Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 2007, pp. 1188–1193.

[2] H. Kress-Gazit, T. Wongpiromsarn, and U. Topcu, "Correct, reactive robot control from abstraction and temporal logic specifications," IEEE Robotics and Automation Magazine, vol. 18, pp. 65–74, 2011.

[3] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning," IEEE Transactions on Automatic Control, vol. 57, no. 11, pp. 2817–2830, 2012.

[4] ——, "Formal synthesis of embedded control software for vehicle management systems," in Proceedings of AIAA Infotech Aerospace, 2011.

[5] A. Pnueli, Y. Saar, and L. D. Zuck, "JTLV: A framework for developing verification algorithms," in Computer Aided Verification, ser. Lecture Notes in Computer Science, T. Touili, B. Cook, and P. Jackson, Eds. Springer Berlin Heidelberg, 2010, vol. 6174, pp. 171–174.

[6] C. Finucane, G. Jing, and H. Kress-Gazit, "Designing reactive robot controllers with LTLMoP," in Automated Action Planning for Autonomous Mobile Robots, ser. AAAI Workshops, vol. WS-11-09, 2011.

[7] T. Wongpiromsarn, U. Topcu, N. Ozay, H. Xu, and R. M. Murray, "TuLiP: a software toolbox for receding horizon temporal logic planning," in Proceedings of the International Conference on Hybrid Systems: Computation and Control, 2011, pp. 313–314.

[8] K. Chatterjee, M. Chmelik, and M. Tracol, "What is decidable about partially observable Markov decision processes with omega-regular objectives?" Computer Science Logic, pp. 165–180, 2013.

[9] K. Chatterjee and L. Doyen, "Partial-observation stochastic games: How to win when belief fails," Annual IEEE Symposium on Logic in Computer Science, pp. 175–184, 2012.

[10] A. Arnold, A. Vincent, and I. Walukiewicz, "Games for synthesis of controllers with partial observation," Theoretical Computer Science, vol. 303, no. 1, pp. 7–34, 2003.

[11] D. Berwanger, K. Chatterjee, M. De Wulf, L. Doyen, and T. A. Henzinger, "Alpaga: A tool for solving parity games with imperfect information," in Proceedings of Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2009, pp. 58–61.

[12] K. Chatterjee, L. Doyen, T. A. Henzinger, and J.-F. Raskin, "Algorithms for omega-regular games with imperfect information," Logical Methods in Computer Science, vol. 3, no. 4, pp. 1–23, 2007.

[13] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning for dynamical systems," in Proceedings of the IEEE Conference on Decision and Control, 2009, pp. 5997–6004.

[14] R. Alur, T. Dang, and F. Ivancic, "Counterexample-guided predicate abstraction of hybrid systems," Theoretical Computer Science, vol. 354, no. 2, pp. 250–271, 2006.

[15] M. Kloetzer and C. Belta, "A fully automated framework for control of linear systems from temporal logic specifications," IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. 287–297, 2008.

[16] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas, "Discrete abstractions of hybrid systems," Proceedings of the IEEE, vol. 88, no. 7, pp. 971–984, 2000.

[17] J. Fu and H. Tanner, "Bottom-up symbolic control: Attractor-based planning and behavior synthesis," IEEE Transactions on Automatic Control, vol. 58, no. 12, pp. 3142–3155, 2013.

[18] H. Kress-Gazit, G. Fainekos, and G. Pappas, "Temporal-logic-based reactive mission and motion planning," IEEE Transactions on Robotics, vol. 25, no. 6, pp. 1370–1381, 2009.

[19] C. Kwok and D. Fox, "Reinforcement learning for sensing strategies," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 4, 2004, pp. 3158–3163.

[20] S. Zhang and M. Sridharan, "Active visual sensing and collaboration on mobile robots using hierarchical POMDPs," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems – Volume 1, 2012, pp. 181–188.

[21] R. Reiter, Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. Cambridge, MA: The MIT Press, 2001.

[22] R. P. Petrick and F. Bacchus, "Extending the knowledge-based approach to planning with incomplete information and sensing," in Proceedings of the Knowledge Representation Conferences, 2004, pp. 613–622.

[23] A. Albore, H. Palacios, and H. Geffner, "A translation-based approach to contingent planning," in Proceedings of the International Joint Conference on Artificial Intelligence, 2009, pp. 1623–1628.

[24] G. Shani and R. I. Brafman, "Replanning in domains with partial information and sensing actions," in Proceedings of the International Joint Conference on Artificial Intelligence, 2011, pp. 2021–2026.

[25] G. Shani, R. Brafman, S. Maliah, and E. Karpas, "Heuristics for planning under partial observability with sensing actions," Heuristics and Search for Domain-Independent Planning, pp. 30–36, 2013.

[26] Y. Chen, J. Tumova, A. Ulusoy, and C. Belta, "Temporal logic robot control based on automata learning of environmental dynamics," The International Journal of Robotics Research, vol. 32, no. 5, pp. 547–565, 2013.

[27] J. Fu and U. Topcu, "Probably approximately correct MDP learning and control with temporal logic constraints," in Proceedings of Robotics: Science and Systems, 2014.

[28] J. Fu, R. Dimitrova, and U. Topcu, "Abstractions and sensor design in partial-information, reactive controller synthesis," in Proceedings of the American Control Conference, 2014, pp. 2297–2304.

[29] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd ed. Pearson Education, 2003.

[30] E. Grädel, W. Thomas, and T. Wilke, Eds., Automata, Logics, and Infinite Games: A Guide to Current Research. Springer-Verlag New York, Inc., 2002.

[31] P. Gastin and D. Oddoux, "Fast LTL to Büchi automata translation," in Computer Aided Verification, ser. Lecture Notes in Computer Science, G. Berry, H. Comon, and A. Finkel, Eds., 2001, vol. 2102, pp. 53–65.

[32] R. Alur and S. La Torre, "Deterministic generators and games for LTL fragments," ACM Transactions on Computational Logic, vol. 5, no. 1, pp. 1–25, 2004.

[33] J. H. Reif, "The complexity of two-player games of incomplete information," Journal of Computer and System Sciences, vol. 29, no. 2, pp. 274–301, 1984.

[34] T. Chen, V. Forejt, M. Kwiatkowska, D. Parker, and A. Simaitis, "Automatic verification of competitive stochastic systems," Formal Methods in System Design, vol. 43, no. 1, pp. 61–92, 2013.

[35] J. Fu, H. G. Tanner, J. Heinz, and J. Chandlee, "Adaptive symbolic control for finite-state transition systems with grammatical inference," IEEE Transactions on Automatic Control, vol. 59, no. 2, pp. 505–511, Feb. 2014.

[36] S. Prentice and N. Roy, "The belief roadmap: Efficient planning in belief space by factoring the covariance," The International Journal of Robotics Research, vol. 28, no. 11-12, pp. 1448–1465, Nov. 2009.

[37] J. Bernet, D. Janin, and I. Walukiewicz, "Permissive strategies: from parity games to safety games," RAIRO – Theoretical Informatics and Applications, vol. 36, no. 3, pp. 261–275, 2002.

[38] B. Bonet and H. Geffner, "Planning with incomplete information as heuristic search in belief space," in Proceedings of Artificial Intelligence Planning and Scheduling Systems, 2000, pp. 52–61.

[39] A. Albore and H. Geffner, "Acting in partially observable environments when achievement of the goal cannot be guaranteed," in Proceedings of the International Conference on Automated Planning and Scheduling, 2009.

Jie Fu is an assistant professor in the Department of Electrical and Computer Engineering, with an affiliation with the Robotics Engineering Program, at Worcester Polytechnic Institute. Previously, she was a postdoctoral scholar at the University of Pennsylvania from 2013 to 2015. She received her M.Sc. degree in Electrical Engineering and Automation from Beijing Institute of Technology, and her Ph.D. degree in Mechanical Engineering from the University of Delaware. Her research interests include control theory, formal methods, and machine learning, with applications to the design of cyber-physical systems.

Ufuk Topcu is an assistant professor at The University of Texas at Austin. Previously, he was a postdoctoral scholar at the California Institute of Technology and a research assistant professor at the University of Pennsylvania. He received his Ph.D. from the University of California, Berkeley, his bachelor's degree from Boğaziçi University in Turkey, and his master's degree from the University of California, Irvine. His research focuses on the design and verification of autonomous networked systems.