186 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 20, NO. 1, JANUARY/FEBRUARY 1990

An Architecture for Adversarial Planning

CAROL APPLEGATE, MEMBER, IEEE, CHRISTOPHER ELSAESSER, AND JAMES SANBORN

Abstract—Planning in the battle management domain poses problems, in combination, that conventional Artificial Intelligence planning approaches fail to address. These problems include an unpredictable and dynamic environment; control of several semiautonomous intelligent agents; the need to adjust plans dynamically according to developments during plan execution; and, most importantly, the need to consider the presence of an adversary in devising plans. The requirements imposed on automated planning systems by battle management are examined, a planning approach that meets these requirements is described, and an architecture for investigating adversarial aspects of battle planning is presented.

I. INTRODUCTION

BATTLE MANAGEMENT involves directing the behavior of multiple agents acting cooperatively but semiautonomously on the battlefield, each receiving orders (the agents' goals) from superiors and taking concrete action in the environment to effect change. In military operations, each level of the command hierarchy engages in an ongoing process of 1) generating a course of action for its subordinates, 2) issuing orders to subordinates to carry out this plan, 3) monitoring the activities and status of subordinates to determine the potential for plan success or failure, and 4) modifying the plan and generating new orders accordingly. This domain differs from those that have been addressed by typical artificial intelligence (AI) planning systems because the events that actually take place cannot be predicted with certainty, but it is its adversarial nature that most complicates battle planning. Agents on the battlefield that are adversaries must be expected to attempt to foil each other's courses of action whenever possible. Because its adversary will not be compliant, either in its behavior or in its willingness to reveal its intentions, a planner must commit to action based on dated, often erroneous information about its adversary. Because of the limited ability of the planner and its agents to monitor activity throughout the battlefield, the planner must make assumptions about the outcomes of events, replanning in the face of new or changing assumptions.

No single planner, human or automated, can effectively control and coordinate all battlefield activity. Indeed, this is the reason that flexibility in carrying out orders is built into the military chain of command. Each agent must base its actions on its superior's orders and whatever information it has regarding the disposition of its adversary and its own forces. A lower echelon, being closer to concrete activity on the battlefield, must be more responsive to the dynamics of its immediate environment and have a faster reaction time than those planning higher level, more general tasks. As such, these agents reason more directly about the impact of ongoing change in the environment. From the higher echelon's perspective, the actions of a lower echelon can be considered reactions, although they might require substantial planning on the part of the lower echelon.

Manuscript received February 2, 1989; revised July 8, 1989. The authors are with the MITRE Corporation, Mail Stop W429, McLean, VA 22102-3481. IEEE Log Number 8930992.

Any planning system that addresses the issues posed by battle management will be fundamentally different in design from traditional AI planners. Most AI planning systems assume a generally benign domain, controlling a single, predictable agent in pursuit of an explicit, well-defined goal. The dynamic and unpredictable nature of military planning precludes the use of such planning technology. In the heat of battle, a course of action must often be modified to respond to a perceived threat, rather than to achieve a well-defined, higher level goal. The critical functions of plan execution and situation monitoring, neglected by most AI planning systems, determine the prospects for plan success or failure by enabling effective, ongoing plan expansion and modification.

The research reported herein examines adversarial planning for battle management, concentrating on 1) hypothesizing and reasoning about possible plans of an adversary during plan generation; 2) integrating the planning, acting, monitoring, and replanning processes in an adversarial environment; and 3) coordinating the actions of several semi-independent agents, each capable of planning on its own, into a coherent overall plan. The first step in our research has been to define the functionality required of an adversarial planning system. The following sections discuss the requirements from the perspective of incorporating adversarial planning with battlefield dynamics. Based on this analysis, we have devised an architecture for studying the automation of the battle management functionality of one layer in a command hierarchy. Section V presents an overview of the architecture.¹

¹The reader is encouraged to turn to Fig. 2 from time to time to understand the relationships between the components of the architecture.

0018-9472/90/0100-0186$01.00 ©1990 IEEE



II. ADVERSARIAL PLANNING FOR BATTLE MANAGEMENT

There has been at least one review of how AI planning relates to battle management [18], but it did not focus on the adversarial nature of battle management. This is not surprising; little planning research relates directly to the domain. Consider the application of traditional state-based planning to the battle management domain: a planner examines the state of the environment at some point in time, assumes a single course of action for its adversary, and generates a course of action that achieves the commander's intent. The resulting plan could then be "executed," provided assumptions made by the planner remain correct during execution. In particular, successful execution of the plan depends on the adversary's adhering to the planner's assumptions regarding its plan. If the adversary attempts to counter concrete activities of the planner, as could a human opponent, the planner would have to re-evaluate the situation and plan again. Again, the success of this new plan depends on deviations from the state model developed during plan generation being negligible. This is a manifestation of the problem with Strips-assumption planners that Wilkins calls "hierarchical promiscuity" [23]. Because of stratification in the military command hierarchy, a battle planner is unable to predict with certainty and, therefore, to reason directly about the effect of battlefield outcomes on later plan steps. This problem is the fundamental feature determining the architecture of a battle planner.

In battle planning, reasoning about the possible actions an adversary may take to counter an operation is necessary to maximize the potential for success of the resulting plan. Early theories of action in an adversarial environment from the "game theory" literature [10], [11], [20], which form the basis of decision strategies in AI game playing and multiagent cooperative behavior [13], are only of tangential interest to adversarial planning for battle management because they focus on optimal equilibrium strategies in conflict situations. Battle planning is concerned with upsetting equilibrium situations.

Game playing, distinct from game theory, was one of the first domains examined in AI [15]. But the reasoning used in two-player, perfect information games such as Chess and Go is not directly applicable to battle planning because it requires the space of possible adversarial responses to be well defined. Lehner proposes a game-tree search approach in which adversarial planning is "viewed as a process of generating a set of contingency plans that guarantees achievement of a specified objective no matter which of a set of anticipated-as-possible actions the adversary pursues" [9, p. 3]. But in order to develop "provably complete search procedures," Lehner must assume a closed form representation of the world, which is inappropriate to battle management. As Stachnick et al. point out, it is not possible to reason about all potential adversarial responses at a level of abstraction consistent with generation of orders for the next lower echelon because of the uncertainties of the battlefield:

First, the goal in the military problem is not presented as a state to be achieved. For example, defend in sector has a somewhat ill defined relationship to the physical states of the battlefield. Any particular array of forces may or may not represent a successful defend in sector mission. Secondly, the military operators, the tactical actions, are not provided as actions which have as input a particular world state and yield as output another element in the set of world states. . . . Again, these actions do not clearly map onto physical battlefield states and no unique physical state corresponds to achieving one of these actions. [18, p. 118]

Furthermore, the model of a sequence of alternating plays required for a game-tree approach does not accurately reflect the complexity of the battlefield, in which both sides simultaneously execute possibly interacting activities. Indeed, the "state" of the battlefield cannot be well defined in such situations.

There are other problems with game-tree search for battle planning. It is not clear how to compare states of battle in order to evaluate and choose among candidate plans. For example, a unit can minimize attrition by avoiding engagement with potential attackers, but doing so would be disastrous to an overall defense plan. The problem is that game-tree search can have a local focus of attention, as opposed to the more global view taken by military planners. By focusing on possible responses to plan steps, such a planner may fail to notice when an opponent is carrying out a winning strategy. To ensure the success of a plan, it is important not only to be reasonably certain that the enemy is unable to take effective countermeasures but also to recognize the enemy's strategy and ensure that it does not succeed. It is also the case that states of battle are not discrete and thus are not enumerable.²

There have been some planning approaches to game playing that could, in principle, overcome the conflict between local focus of control and global plan strategy. Wilkins [21] built a planner, called Paradise, that generates plans from a static analysis of tactically sharp middle game situations in chess. Paradise differs from other game playing programs in that it produces a contingency plan for several moves at a time. The plan guides tree search for several ply, reducing node expansion and the number of expensive static position evaluations. Such a strategy should allow the planner to discover effective plans with less local backtracking by allowing plan strategy to dominate static analysis at choice points.

The approach taken in Paradise is not directly applicable to battle management, however, because it is limited to static situations in a discrete state space. Because concrete action cannot be reasoned about directly by higher echelons (because of the inherent uncertainty of the battlefield), a purely state-based plan representation will be inadequate for battle planning. Perhaps more importantly, the flexibility afforded the adversary in Paradise was so limited that modeling and reasoning about the adversary's plan were not necessary: Paradise views the adversary's strategy simply as trying to delay loss of materiel. Paradise takes an offensive orientation and never considers changing it, as a battlefield commander does when the adversary presents the opportunity to attack or when the commander must switch from offense to defense under counterattack.

²Of course, game playing programs do not usually consider every possible state. However, the capability to do so is inherent in the search procedures.

Because there are infinitely many possible plans a battlefield adversary might follow (and counterplans, and responses to those, and so on), an adversarial planner must weigh possibilities and base its reasoning on the most likely counterplans. Ongoing analysis of the adversary's possible plans is crucial, not only to planning effective countermeasures but also to discovering points of enemy strength and weakness that should lead to a change in strategy.

Carbonell [1] developed an approach to reasoning about adversarial behavior called "counterplanning" that maintains a strategic view of plan conflicts. The approach, implemented in a program called Politics, uses the assumed preferences of the adversary in a goal network to determine how a specific action fits with the goals hypothesized for the adversary, analyzes the implications of the action for its own goals, and proposes an action to neutralize or counter the adversary's apparent plan.

Metalevel control in Politics is implemented through two heuristics called constructive and obstructive counterplanning. Constructive counterplanning is planning to continue to pursue one's own goal by alternate means when the adversary threatens achievement of the goal. Obstructive counterplanning is planning to prevent the opponent from achieving its conflicting goal. Selection of the counteraction is based on "strategies" keyed on relative importance relationships among the adversary's goals. For example, if an adversary makes a threatening action pursuant to a particular goal and there is another goal more important to the adversary, then threatening that more important goal might cause the original threat to be withdrawn.
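The choice between the two heuristics can be sketched as follows. This is a minimal illustration of the idea, not Carbonell's implementation; the goal names, the numeric importance scores, and the function interface are assumptions made for the example.

```python
def select_counterplan(adversary_goal, own_alternatives, adversary_goals):
    """Pick a counterplanning mode for a threat motivated by adversary_goal.

    own_alternatives: alternate means of pursuing our threatened goal.
    adversary_goals: hypothesized adversary goals mapped to assumed
    importance scores (higher = more important to the adversary).
    """
    # Constructive counterplanning: continue our own goal by other means.
    if own_alternatives:
        return ("constructive", own_alternatives[0])

    # Obstructive counterplanning: threaten an adversary goal more
    # important than the one motivating the threat, hoping the original
    # threat is withdrawn; otherwise block the threatening goal directly.
    threat_importance = adversary_goals.get(adversary_goal, 0)
    better = {g: w for g, w in adversary_goals.items() if w > threat_importance}
    if better:
        return ("obstructive", "threaten " + max(better, key=better.get))
    return ("obstructive", "block " + adversary_goal)
```

With no alternate means of its own and an adversary goal valued more highly than the threatening one, the sketch produces the importance-based obstructive response described above.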

Carbonell's work introduces the notion of reasoning about an adversary's intentions during counterplanning. But Carbonell did not incorporate a theory of plan generation; the planning mechanisms in Politics are triggered after recognition machinery determines the role of an adversary's action in a simple goal network.

These observations about the weaknesses of previous planning work in the battle management domain highlight several requirements for battle management systems. The first is that a battle planner must be responsive to ongoing change on the battlefield. This means that, while the planner may develop a long-term plan based on an initial estimate of the battlefield situation, it must be prepared to modify the plan as intelligence information becomes available during plan execution. Second, the planner must reason under assumptions about an adversary's plans and note when decisions depend on events occurring during the battle. As plan execution progresses, these assumptions should be tracked for validity. When intelligence data concerning the validity of an assumption become available, decisions dependent on the assumption must be reconsidered. In this way, the overall plan is fleshed out as information becomes available during execution. Third, the planner must consider that its agents carrying out the plan are, from its point of view, situated and must be allowed some autonomy. The following section discusses an approach to planning that takes these requirements into account.

III. AN APPROACH TO ADVERSARIAL PLANNING

The purpose of a battle planner is to construct and maintain a plan to achieve goals specified by the commander and to recognize when to suggest changes in the goals if they become unattainable. The battle plan, even at its highest level of abstraction, will necessarily bear some relationship to the goals of the adversary. Whereas a nonadversarial planner is concerned with interactions between steps in its own plan, an adversarial planner must also consider conflicts between two opposing plans. Planned activities may counter, or be countered by, activities planned by the adversary. It is this adversarial nature of battle planning that most distinguishes the required approach to planning. The relative importance of goals must be represented if the planner is to exploit the adversary's preferences in attempts to counter actions and to make trade-offs. The ability to reason about a plan's structure is also essential to coordinating the actions of multiple agents, where the success and failure of actions in one part of the plan can affect the outcome of other (later) parts.

Since it is impossible to know the adversary's plan before (or even during) a battle, adversarial planning in military domains is necessarily assumption based [4]. A battle planner uses assumptions 1) to discover or recognize problems in the plan as it unfolds, 2) to select among plan steps based on the degree of certainty the planner has in the adversary's assumed course of action, and 3) to pursue planning concurrently with execution, allowing feedback to influence subsequent decisionmaking. Assumptions concerning likely actions are made during the planning process and monitored as the battle progresses.
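The bookkeeping this requires can be sketched with a simple record that ties each assumption to the decisions resting on it, so that an intelligence report contradicting the assumption flags exactly those decisions for reconsideration. The record layout, the claim strings, and the report format are illustrative assumptions, not the paper's data structures.

```python
class Assumption:
    """One monitored assumption and the plan decisions that depend on it."""
    def __init__(self, claim):
        self.claim = claim        # e.g. "enemy reserves remain north of the river"
        self.valid = True         # believed true until intelligence says otherwise
        self.dependents = []      # identifiers of decisions resting on this claim

def invalidate(assumptions, report):
    """Mark assumptions contradicted by an intelligence report (a mapping
    from claim to observed truth value) and return the decisions that
    must be reconsidered."""
    to_reconsider = []
    for a in assumptions:
        if a.valid and report.get(a.claim) is False:
            a.valid = False
            to_reconsider.extend(a.dependents)
    return to_reconsider
```

Decisions not tied to a contradicted assumption are left alone, matching the idea that the plan is fleshed out incrementally rather than rebuilt wholesale.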

Because the adversary's planning process can be modeled as a mirror image of one's own planning,³ the same machinery used in plan generation could be used in developing hypothetical counterplans of the enemy. However, it may be more appropriate to develop offensive and defensive plans using different approaches, as human planners appear to do [12]. Thus some form of metaplanning may be required to direct the planner, depending on the mode.

Because of the dynamics of the battle management domain, "replanning" is subsumed in the process of maintaining a plan based on intelligence from the battlefield; only rarely should the situation be so misjudged that replanning degenerates to "planning again." However, there are situations in which it may be preferable to rework the plan extensively instead of making a minor change. A planner must balance the desire to modify as little of the plan as possible against the need to pursue the best overall plan. A planner that concentrates solely on limiting changes to an existing plan may fail to notice opportunities to succeed, as well as to avoid failure, as it pursues its current plan. Metaplanning control is needed to prevent the planner from becoming so obsessed with detail that it loses sight of the overall goals.

³That is not to say that the adversary's goals are the same.

A. Plan Representation

It has already been noted that a traditional state-based representation is not entirely appropriate for battle planning [18]. It appears that in battle planning, plan steps are most appropriately viewed as specifying how to achieve a task, as in the representation developed for Sipe [22]. This approach allows the planner to place less significance on states of the world than on the derivation and persistence of desired conditions in the world.

In most planning approaches, the "leaves" of a plan serve as primitive, executable actions. A battle planner's primitives constitute orders to subordinate units and are not themselves carried out in the execution environment. Even so, most of the representational detail required for execution and monitoring in traditional planners is necessary for battle planning by the strategic planner: a list of actions, temporal information constraining the order of actions (including interagent coordination), parameter bindings for variables in the actions (i.e., who, what, when, where), and preconditions of the actions [19].
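The representational detail listed above can be gathered into a single plan-step record. The field names and the sample values are hypothetical, chosen only to mirror the who/what/when/where bindings, ordering constraints, and preconditions the text enumerates.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    """One step of a strategic battle plan: an order to a subordinate unit."""
    action: str                                     # task to achieve, e.g. "counterattack"
    agent: str                                      # subordinate unit receiving the order (who)
    bindings: dict = field(default_factory=dict)    # what / when / where parameters
    after: list = field(default_factory=list)       # steps that must temporally precede this one
    preconditions: list = field(default_factory=list)

# A hypothetical order illustrating the fields.
step = PlanStep(action="counterattack",
                agent="2nd Brigade",
                bindings={"where": "hill 402", "when": "phase 2"},
                after=["defend-in-sector"],
                preconditions=["favorable force ratio"])
```

Temporal constraints are kept as explicit predecessor lists rather than fixed timestamps, since the ordering, not the absolute timing, is what the strategic planner commits to.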

The preconditions of an action may be partitioned between the planner's abstracted view of the battlefield and the concrete environment in which the agents of the planner carry out the activities specified in plans, depending on the level in the hierarchy at which the steps occur. Because a battle planner is necessarily separated from its multiple agents carrying out its orders, four types of preconditions must be considered.

1) Operator application criteria: things that may be assumed to be true or can be planned to be made true.

2) Success criteria: conditions that, if found to be untrue, cause plans to be changed because they predict failure of the entire plan or of specific steps.

3) The physical conditions that must exist in order for a primitive activity to be executed in the environment.

4) The conditions that should exist for an activity to achieve its intended effect in the environment.

Fig. 1. Strategic planning.

Type 1) and Type 2) preconditions concern the strategic planner during planning, monitoring, and replanning. An example of a Type 1) precondition is having a favorable force ratio in order to undertake a particular kind of attack; it might be either assumed or planned, say by moving reinforcements. Thus Type 1) preconditions are "local" to operator choice, and replanning is managed by the operator selection machinery of the strategic planner. Whereas Type 1) preconditions apply to individual activities or operators, Type 2) preconditions are global to the entire plan and are monitored and managed by metaplanning control mechanisms. An example of a Type 2) precondition might be that the Army has the initiative; it is basic to the entire plan, and significant replanning might occur if it becomes false. The difference between Type 3) and Type 4) preconditions is that it may be possible to execute a certain action 3), but not actually achieve the intended effect 4); e.g., an attack could be mounted, but fail. Type 3) and 4) preconditions are managed by the subordinate agent-planner as it decides when to execute a primitive activity.
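The division of responsibility described above can be summarized in a small lookup. The component labels are our shorthand for the roles the text assigns, not module names from the paper's architecture.

```python
# Which component manages each precondition type (illustrative labels).
PRECONDITION_HANDLERS = {
    "application_criteria": "strategic planner (operator selection)",    # Type 1: local to operator choice
    "success_criteria":     "metaplanning control (global monitoring)",  # Type 2: global to the entire plan
    "executability":        "subordinate agent-planner",                 # Type 3: physical conditions to act
    "effectiveness":        "subordinate agent-planner",                 # Type 4: conditions for intended effect
}

def handler_for(precondition_type):
    """Return the component responsible for monitoring a precondition type."""
    return PRECONDITION_HANDLERS[precondition_type]
```

Note that Types 3 and 4 share a handler: both are matters for the agent on the ground, even though only Type 4 speaks to whether the action accomplishes anything.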

B. Control

Most planners construct an entire plan before any actions are executed and monitor execution simply by checking that the preconditions for each plan operator are intact before the operator is executed [5]. However, such an execution monitoring strategy is appropriate only in domains where the planner can ensure that an entire plan is feasible at its lowest level of abstraction prior to execution [16], [19].

A hierarchical planner typically produces a complete plan at one level of abstraction before expanding the plan to the next level of detail [14]. Fig. 1 shows how adversarial plan development differs from strict hierarchical plan development. In battle planning, a very general but complete plan is developed before execution begins. Although the plan is likely to be revised as the battle progresses, its completion at a high level of abstraction is necessary to serve as guidance to lower echelons. The first level of the plan, the concept of operations, will usually be a conjunction of temporally ordered activities that correspond to phases of the battle (e.g., defend in sector, counterattack to seize the initiative, exploit weaknesses using reserves). But because of the dynamic and uncertain nature of the battlefield, it is not possible to construct a fully detailed plan prior to execution: often decisions on how to pursue some branch of a plan are contingent on information not available until part of the plan has been executed. Only the early plan steps are profitably chosen before execution begins. Note that this ongoing planning approach to the problem of hierarchical promiscuity differs from the approach taken in Sipe [23].

As indicated in the figure, some leaves of the strategic plan may become ready for execution before other branches of the plan have been expanded to the same level of detail. A continuous flow of information about the status of the planner's agents and their adversaries is required to continue planning. Thus intelligence gathering and interpretation are integral to the adversarial planning process and are not separate, independent functions.

A battle plan's later steps are necessarily contingent on the success of earlier steps and on intelligence gathered during execution, so the plan generally will not be complete in detail at any point in the battle. There are three obvious methods for planning for contingencies. One approach is to do as humans do and develop contingency plans. This approach will generally be limited to the higher, "strategic" layers in the plan hierarchy because not all contingencies can be accounted for exhaustively. Another approach is to use conditional operators that specify prior to execution the possible actions at each decision point. This approach cannot be used in the higher layers of the plan because the nature of the battlefield makes conditional operators intractably complex. The more appropriate approach is to conduct planning and execution in parallel, allowing agents on the battlefield a certain amount of autonomy to deal with contingencies, and to monitor execution to gain intelligence in order to make more informed decisions. As a consequence, an effective battle planner makes no distinction between plan generation and replanning. Details are developed during execution as information affecting choices becomes available.
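The parallel planning-and-execution approach can be sketched as a single loop over a frontier of tasks: primitive tasks are executed as soon as they surface, while abstract tasks are expanded alongside ongoing execution. This is a deliberately minimal sketch; a real battle planner would also fold intelligence reports into each expansion, and the callback interface is our assumption.

```python
def plan_and_execute(frontier, expand, is_primitive, execute):
    """Interleave plan expansion with execution over a task frontier.

    expand(task)      -> list of subtasks refining an abstract task
    is_primitive(task)-> True if the task is an executable order
    execute(task)     -> carries out a primitive task, returns a record
    """
    log = []
    while frontier:
        task = frontier.pop(0)
        if is_primitive(task):
            log.append(execute(task))       # act now; no waiting for full expansion
        else:
            frontier.extend(expand(task))   # refine this branch and keep going
    return log
```

Because execution and expansion share one loop, a ready leaf in one branch runs while a sibling branch is still abstract, which is exactly the behavior attributed to the strategic plan in the figure.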

Note that we have not discussed plan step choice in this discussion. This remains a topic for further research, although we anticipate employing a modified state-space approach combined with explicit risk assessment. To avoid confusion on this point in the following discussion, we emphasize that our planning approach does not use simulation to predict outcomes and choose among plan steps. The architecture described herein uses simulation strictly to model the execution of actions and thus provide information about activity in the environment to the planner; i.e., our simulation is a surrogate for actually executing the plan on the battlefield.

IV. INCORPORATING BATTLEFIELD DYNAMICS INTO PLANNING

Multiple agents acting semiautonomously and asynchronously populate the battlefield, each pursuing goals that may call for cooperation with others, as in the case of a coordinated attack, or may call for antagonistic confrontation, as in an adversarial engagement. Abstract, long-range planning can incorporate reasoning about an adversary but is inherently inadequate for planning the detailed actions that agents must perform to carry out orders. These agents have unique, limited perspectives on the state of the environment; higher echelons specify the goals their agents should pursue, but ultimately it is an agent's local situation on the battlefield that determines its response to events. Thus agents executing concrete actions must have the ability to react, as well as to act intentionally, in carrying out orders.⁴

Uncertainties in the environment are the fundamental source of the required dichotomy between "planned" and "reactive" behavior in battlefield planning. To incorporate battlefield dynamics into planning, each agent executing concrete actions must be capable of both reasoning in the context of the immediate situation and communicating with the higher level strategic planning system. The output of the high-level planning system may be viewed as a collection of orders to subordinate echelons that are in turn capable of planning and acting according to more immediate situations and more timely information about the actions of their adversaries.

In general, agents acting in a battlefield environment attempt to carry out orders subject to the constraints imposed by the agent's local situation. Agents must deal with interactions with one another, as well as with passive elements (physical obstacles), in the execution environment. Because of the independent, adversarial nature of these agents' activities, many interactions occur unexpectedly, thereby demanding real-time responses from agents. For agents acting in battlefield environments, this functionality is a prerequisite for successful operation. For instance, if a unit encounters an adversary it has not been ordered to attack, it may decide to try to outflank rather than engage the enemy unit in order to complete its current mission. However, if outflanking is not possible, the unit may need to stop and defend, or even withdraw, to ensure its survival. In addition, agents must communicate intelligence to their superiors and accept new or modified orders while acting in the environment.

A. Action Managers

Recently, the shortcomings of AI planning techniques have led to research in situated reasoning [6] aimed at incorporating the dynamics of realistic domains into the traditional plan generation framework. Much of this work is based on integrating plan generation with execution or

⁴Recall that from the point of view of the high-level planner, specific actions of its agents can be considered "reactions" although substantial planning on the part of the agents may be required to generate these actions.


developing conditional plans [7], [8], [17] in an attempt to incorporate inherently situation-driven reasoning into a goal-directed reasoning framework. In order to prevent undesirable states from occurring, these systems must have a goal to avoid such states. In the battle management domain, the space of undesirable states and conditions not only is large but also is defined relative to the orders and abilities of various agents. Thus a global distinction between good and bad states cannot be made. In fact, the notion of a "state" cannot be well-defined because of the simultaneous actions of multiple agents.

Our approach to modeling battlefield dynamics follows those described in [2] and [16]. These systems model activity in terms of concrete actions and direct observations made in the world. Rather than reasoning in the abstract about goals to be achieved, these systems reason in concrete situations about maintaining an agent's survival. As such, both approaches are based on situation-driven reasoning about the relationship between an agent and its environment. A limitation of these systems is that their higher level goals are static. In the battlefield domain, each agent must operate under diverse goals (orders) that may change over time, depending on the progress of a given battle.

To model realistically the dynamic environment of the battlefield, and the agents' reactions in the environment, it is necessary to model the decision making of these semiautonomous agents. Recall that the agents are directed by the planner but must make decisions about how to achieve the orders and must react to local situations of which the planner may be unaware. Our approach achieves this by using an action manager (AM) to control each agent that the planner directs. The function of an AM is to constrain an agent's activities based on an ongoing analysis of its local situation with respect to its orders. While the orders influence AM decision making, they do not provide an explicit plan. Indeed, given the dynamic, unpredictable nature of the battlefield, it is in general not possible to predetermine the measures an agent will need to undertake to carry out its orders. Rather, responses are made to predicted or existing conditions in the environment in order to maximize the potential for achieving specific goals. Each AM considers orders as goals to be achieved subject to the constraints imposed by the battlefield. Together, the collection of AMs provides an interface between the orders of the planner and the specific activities carried out by its agents in the environment.

AMs directly control the activities of agents in the execution environment and represent the lowest level of reasoning. As such, reactive behavior exhibited by these agents is determined by AMs, as opposed to the longer term replanning component of the strategic planner. Since changes in the state of the environment are made by the actions of many independent agents, each AM must monitor its local situation to determine new or changing information that may affect its activities. Thus its reasoning is situation driven.

However, an AM must take its current orders into account, in choosing both intentional actions and situation-driven reactions. Since the space of possible orders is diverse, the behavior of each AM may differ depending on its current orders. Thus an AM is not a situated automaton, hardwired to carry out a specific function. Since orders themselves do not imply specific plans of action, they are used as heuristics in selecting among alternative responses to given situations. The use of diverse orders to influence concrete activity in various ways provides a goal-directed reasoning component in the AM system.

As an example of the influence of various orders on situated reasoning, consider the possible responses of a regiment to the sighting of an enemy unit. If the regiment is acting under an achieve-tactical-objective order, where the desired objective is to the rear of the enemy unit, it may elect to maneuver around the enemy rather than to engage. On the other hand, if its orders are to defeat-in-detail, it may engage, and even pursue, the enemy. Finally, if the enemy is of sufficient strength, the regiment may need to defend itself and request reinforcements, no matter what its orders are.
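The order-dependent response selection described above can be sketched as a simple decision rule. This is a hedged illustration only: the function name, order strings, and the strength threshold are our own assumptions, not part of the paper's implementation.

```python
# Illustrative sketch: an action manager (AM) uses its current order as a
# heuristic to rank responses to a sighted enemy. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Situation:
    enemy_strength: float       # assumed ratio of enemy strength to own strength
    objective_behind_enemy: bool

def choose_response(order: str, sit: Situation) -> str:
    """Select a reaction to an enemy sighting, biased by the current order."""
    # Survival dominates, no matter what the orders are.
    if sit.enemy_strength > 2.0:
        return "defend-and-request-reinforcements"
    if order == "achieve-tactical-objective" and sit.objective_behind_enemy:
        return "outflank"       # maneuver around the enemy rather than engage
    if order == "defeat-in-detail":
        return "engage"         # engage, and possibly pursue, the enemy
    return "engage"             # default: treat contact as a threat

print(choose_response("achieve-tactical-objective", Situation(0.8, True)))  # outflank
print(choose_response("defeat-in-detail", Situation(1.0, False)))           # engage
```

Note that the same sighting yields different actions under different orders, which is precisely why an AM is not a hardwired situated automaton.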

The final requirement for incorporating battlefield dynamics in planning is for the agents to communicate intelligence information back to their superiors. This is required in order for the superior to track changes to assumptions that may require high-level plan modification and replanning. Thus, lower echelon units must monitor aspects of their surroundings based on the content of their orders, so that when intelligence relevant to the success or failure of an existing order becomes available to the unit, it is also made available to the unit's superior. This ongoing communication, in the form of orders and intelligence, is crucial to the overall ability of the system to update plans and achieve goals on the battlefield.

The AM approach is useful for several reasons. It is important that the planner make assumptions about abstract levels of activity on the battlefield. However, since the battlefield is dynamic, some system capable of real-time responses to ongoing activity is necessary. By emulating the semiautonomous, intelligent agent in the environment, the strategic planner may effectively ignore many of the problems associated with generating completely specified plans. In this way, the strategic planner operates under reasonable assumptions regarding the dynamics of its world. Whereas its agents must react to ongoing activity in the execution environment, the strategic planner may take a longer-term view of activity, sending orders to several subordinates to direct them as a group. This approach frees the strategic planner from concern for planning concrete activity in the execution environment, while ensuring that regiments respond effectively to changing situations. In addition, since AMs control concrete activity, agents may react to events in the absence of current orders. This is important to the strategic planner's replanning function, as there may be temporal gaps between orders during which agents must act autonomously to the best of their ability.


This use of situated reasoning to control execution does not imply that situated reasoning is a reasonable approach for generating plans for agents that must coordinate subordinates. Situated reasoning can, however, be used as a tool to model the planning behavior of agents that must plan and react on their own. A situated reasoning component can provide realistic results for monitoring by the strategic planner without the need to implement independent planners for each agent.

B. Execution Environment

Realistic automated planning in the battle management domain is not possible without a battlefield in which to execute plans. Given the layering of the command hierarchy, adversarial planning for battle management may be studied by modeling the concrete activity of the lowest level echelon considered in the plan.⁵ Sufficient information for planning can be generated without explicitly modeling the echelons below those addressed in the plan.

The dynamic and unpredictable nature of the battlefield imposes several requirements on the execution environment. First, the execution environment must model the incomplete domain information available to each agent on an ongoing basis in order for agents to modify their activities. To simulate an agent changing its plan, the environment must accept new activities during execution, accurately computing the effects of stopping some activities in midstream and beginning others. Since military units are capable of doing several things at once, the execution environment must simulate simultaneous actions on the part of each agent as well as simultaneous actions of several agents. Finally, the environment must determine the outcomes of interactions between agents, in addition to the results of simpler, noninteracting actions for each agent.

These requirements place two special demands on the execution environment. First, that the planning agents have incomplete information about their adversaries implies that the planning agents cannot predict with any certainty what the outcomes of their plans will be [3]. This requires the execution environment to notify plan monitors of pertinent events that occur during the simulated execution of their plans. Yet the system must reflect the reality that the planning agents are aware of only those events and opponents that they can observe in some way, so the execution environment must report only this observable information. Second, to support replanning, the execution environment must respond dynamically to changed plans, incorporating new actions into the current situation that has resulted from partially executed plans.

The execution environment in our implementation receives actions to simulate from the action managers described

⁵The planner that is the basis for the work reported herein is aimed at planning at the Soviet Army and U.S. Corps level; this means that the simulation should model the activities carried out by regiment- and brigade-level units.

in the previous section. These actions, called "activities," are indivisible, atomic actions from an AM's point of view. However, actions must be simulated in detail to determine interactions between the activities of multiple independent units. To see why this is true, consider a maneuver activity that specifies a destination and a desired time of arrival. If simulated maneuvers consider only the starting and destination locations, no interaction between agents will be detected when the agents plan maneuvers with intersecting paths.⁶ Maneuvers must be simulated in detail in order to detect these interactions.
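The intersecting-paths point can be illustrated with a standard 2-D segment intersection test: the endpoints of the two maneuvers below suggest no conflict, yet their detailed paths cross. The code is a sketch under our own naming assumptions, not part of the paper's system.

```python
# Why maneuvers must be simulated in detail: two paths whose endpoints do not
# coincide can still cross. Uses the classic orientation (ccw) test.

def ccw(a, b, c):
    """Signed area test: >0 if a->b->c turns counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

# Endpoints alone reveal nothing, but the detailed paths cross:
red_path = ((0, 0), (10, 10))
blue_path = ((0, 10), (10, 0))
print(segments_cross(*red_path, *blue_path))  # True -> an interaction to resolve
```

In a real execution environment the paths would also have to be checked for temporal overlap and unit footprints (footnote 6 notes that units occupy areas over time), but the geometric point is the same.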

Since the agents of both adversaries are executing plans simultaneously, it is important that the execution model not be an alternating sequence of "moves" from the two sides. Because a military unit is capable of doing several things at once, for instance moving and firing, the environment must also be capable of executing more than one activity at a time by a single agent.

Since our AMs must change their behavior to react to events in the environment, they must be notified about the outcomes of planned activities and about developments in the environment as they become "observable." Examples of such observable events are incoming fire, another unit moving into a position that is visible to the agent's forward observers, and encountering obstacles. However, ground truth events that are not visible to an agent should not be reported. As in reality, the agents must react with incomplete information about the environment.
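One way to realize this "report only what is observable" rule is a visibility filter over ground-truth events. The sketch below is purely illustrative; the event and agent fields and the circular sensor-range model are our own assumptions.

```python
# Illustrative sketch: the execution environment forwards to an AM only those
# ground-truth events its agent can observe (directed at it, or within range).

def observable(event, agent):
    """An event is observable if it targets the agent directly (e.g. incoming
    fire) or occurs within the agent's sensing range."""
    if event["target"] == agent["id"]:
        return True
    dx = event["x"] - agent["x"]
    dy = event["y"] - agent["y"]
    return (dx * dx + dy * dy) ** 0.5 <= agent["sensor_range"]

agent = {"id": "blue-1", "x": 0.0, "y": 0.0, "sensor_range": 10.0}
events = [
    {"target": "blue-1", "x": 50.0, "y": 50.0, "kind": "incoming-fire"},
    {"target": None, "x": 3.0, "y": 4.0, "kind": "unit-moved"},
    {"target": None, "x": 80.0, "y": 0.0, "kind": "unit-moved"},  # unseen ground truth
]
reported = [e for e in events if observable(e, agent)]
print([e["kind"] for e in reported])  # ['incoming-fire', 'unit-moved']
```

The third event happens in ground truth but is never reported, so the AM, like a real unit, acts on incomplete information.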

Because effective adversarial planning involves replanning, the environment must incorporate plan changes during plan execution. When the execution environment receives new activities from an AM, activities that are currently planned, but not yet executed, must be discarded in favor of the newly planned activities. Actions that have already been executed cannot be retracted, but the execution environment must be able to discontinue actions that have been only partially accomplished and begin executing the new activities that supersede the current ones. New activities must be executed in the ground truth situation that has resulted from all actions executed so far.

To support the requirements discussed above, our execution environment design is centered around a set of actions that are in progress at the current simulation time. Activities received from the AMs are decomposed into "plays," the basic unit of simulated action. Each play has a duration defined by the play's start and finish times. At any point in the simulation, plays with start times earlier than the current simulation time and finish times later than the current simulation time are considered to be "active" or "in progress." This queue of currently active plays, referred to as "plays in progress," is the focus of the execution environment. The starting or finishing of a play constitutes an "event" in the simulation. Plays with start times later than the current simulation time are maintained

⁶Military units are typically spread geographically and occupy areas for considerable time.


Fig. 2. Adversarial planning architecture.

separately from the plays in progress. A queue of plays, called a play-list, associated with each agent holds the plays planned for future execution by that agent. If we assume that replanning does not occur and other events do not prevent their execution, plays in the play-lists will become plays in progress as the simulation clock advances.

Separating planned plays from plays in progress and centering the simulation around the plays in progress (rather than around the agents or around a master queue of all planned events) 1) allow the execution environment to model simultaneous actions by both sides as well as multiple actions by a single agent; 2) localize the plays that can possibly interact, thus allowing easy determination of the effects of simultaneous actions on each other; and 3) provide the ability to discontinue partially executed activities while maintaining the effects generated by partial execution.
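The play-list and plays-in-progress bookkeeping described above might be sketched as follows. The class names, fields, and the `advance` method are our own illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of play-list vs. plays-in-progress bookkeeping (names assumed).
from dataclasses import dataclass

@dataclass
class Play:
    agent: str
    action: str
    start: float
    finish: float

class ExecutionEnvironment:
    def __init__(self):
        self.play_lists = {}    # agent -> plays planned for future execution
        self.in_progress = []   # plays active at the current simulation time
        self.now = 0.0

    def schedule(self, agent, plays):
        """Replanning: discard an agent's not-yet-started plays for new ones."""
        self.play_lists[agent] = list(plays)

    def advance(self, t):
        """Move the clock to t: start due plays, retire finished ones.
        A play starting or finishing constitutes a simulation 'event'."""
        self.now = t
        for agent, plays in self.play_lists.items():
            due = [p for p in plays if p.start <= t]
            self.play_lists[agent] = [p for p in plays if p.start > t]
            self.in_progress.extend(due)
        self.in_progress = [p for p in self.in_progress if p.finish > t]

ee = ExecutionEnvironment()
ee.schedule("red-1", [Play("red-1", "move", 0, 5), Play("red-1", "fire", 2, 4)])
ee.advance(3)
print([p.action for p in ee.in_progress])  # ['move', 'fire'] -> simultaneous actions
```

Because only the plays in progress are examined for interactions, a single agent can move and fire at once, and discarding an agent's play-list (via `schedule`) models replanning without retracting already-executed plays.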

V. ARCHITECTURE OVERVIEW

The requirements discussed in the previous sections fall into three categories: 1) requirements for adversarial planning, 2) requirements for modeling agents reacting on the battlefield, and 3) battlefield simulation requirements.

The architecture developed to meet the requirements in each of these categories is shown in Fig. 2. The architecture is composed of three major components: the strategic planner (SP), a collection of action managers (AMs), and the execution environment (EE). Each of these components is designed to address one requirement category, although there is necessarily some overlap in their functionality. Also shown in the figure is a module representing an adversary's planning and intelligence functions. This module is shown because the SP expects an opposing planner to be controlling adversarial agents in the game.

The SP generates a course of action to be carried out by agents on the battlefield, each of which is controlled by an individual AM. Communication between connected modules is bidirectional; thus reactions (by AMs) and plan modifications (by the SP) are possible. The SP initiates activity by sending orders for each agent to the appropriate AM. Each AM uses its current orders to influence local decision making, using its local view of the battlefield from the EE to determine activities on an ongoing basis. The EE simulates the execution of activities and reports observable information to each AM. AMs in turn send intelligence information to the SP. At any point during the simulation, the SP may send new orders to AMs in response to violated assumptions or perceived exploitations. AMs may change, rescind, or add regiment-level activities based on new orders or changing local situations.

VI. CONCLUSION

Our research goal is to study adversarial planning issues posed by battle management, including reasoning about hypothetical adversarial plans and integrating long-term strategic planning with near-term plan execution and reaction. The first phase of the research was to develop an adversarial planning architecture for battle management applications. This article has discussed the requirements of planning for battle management and the implications of those requirements on the architecture of an adversarial planning system.

Battle management is an ongoing process in which one must consider the adversary's likely actions in generating, executing, and modifying plans. Since a higher echelon planner cannot predict with certainty the course of events on the battlefield, it must plan under assumptions about these events, modifying its plans in light of battlefield intelligence. While the planner may generate a complete high-level plan based on an initial estimate of the situation, details can only be planned as intelligence data becomes available through the activities of the planner's subordinates or its own intelligence assets. One conclusion of this analysis is that one cannot automate the dynamics of battle planning without integrating planning with intelligence gathering and assessment.

To plan effectively in the battle management domain, a planner must continue to develop and refine its plan as the situation unfolds. This implies a requirement for concurrent planning and execution, with the planner using intelligence gathered by its agents acting in the environment to continue planning, and passing plan changes from the planner to the agents. To satisfactorily study adversarial planning, one must integrate the planner with an execution environment to provide the intelligence the planner needs to respond dynamically when unexpected events occur.

The implications of the battle management domain on the design of a planning system are the following: 1) in order to take effective measures against an adversary, a battle planner must build and maintain an internal model of its adversary's plan and its relationship to its own plan; 2) some means of metaplanning is required to keep local focus of attention from dominating strategy and to avoid


infinite regression in reasoning about moves and countermoves; and 3) a battle planner must interleave plan generation, modification, and execution since it cannot accurately predict the consequences of its or its adversary's actions in advance.

The architecture we have devised to study adversarial planning separates strategic planners, action managers, and an execution environment for the simulation of plan execution. Strategic planners pass orders to action managers that in turn decide how to implement orders according to the current situation. Activities are then specified to the execution environment and simulated to determine their results. The execution environment simulates the intelligence communications that would be received at the headquarters of the lowest level agents (implemented by the action managers). These analyze the information and pass appropriate intelligence to the strategic planners. The strategic planner uses this intelligence to modify existing plans and focus subsequent plan generation as required. Continuous monitoring of the situation and communication between modules are central to this approach.

A version of the architecture described in this article has been implemented. The system plans a Soviet combined arms army attack against NATO forces, centered in the Fulda gap area of the Federal Republic of Germany. It is almost certain that the policies and strategies for pursuing goals are substantially different for the two sides in a military conflict, especially when one is pursuing an offensive mission and the other a defensive one. The highest priority for future work is to discover whether these differences have important implications for setting goals and choosing plans, since our adversarial planner uses the same machinery to hypothesize its adversary's plan as it generates its own.

REFERENCES

[1] J. G. Carbonell, "Counterplanning: A strategy-based model of adversary planning in real-world situations," Artificial Intell., vol. 16, pp. 295-329, 1981.
[2] D. Chapman and P. Agre, "Pengi: An implementation of a theory of activity," in Proc. AAAI-87, 1987.
[3] E. H. Durfee and V. R. Lesser, "Planning coordinated actions in dynamic domains," in Proc. DARPA Knowledge-Based Planning Workshop, Dec. 1987.
[4] J. de Kleer, "An assumption-based TMS," Artificial Intell., vol. 28, no. 2, Mar. 1986.
[5] R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artificial Intell., vol. 2, pp. 189-203, 1971.
[6] M. P. Georgeff, "Planning," Annual Review of Computer Science, vol. 2, pp. 359-400, 1987.
[7] M. Georgeff, A. Lansky, and M. Schoppers, Reasoning and Planning in Dynamic Domains: An Experiment with a Mobile Robot, Technical Note 380, SRI International, 1987.
[8] L. Kaelbling, "An architecture for intelligent reactive systems," in Reasoning about Actions and Plans: Proc. 1986 Workshop, M. Georgeff and A. Lansky, Eds. Palo Alto, CA: Morgan Kaufman, 1986, pp. 395-410.
[9] P. Lehner, "Adversarial planning search procedures with provable properties," unpublished manuscript, 1987.
[10] R. Luce and H. Raiffa, Games and Decisions: Introduction and Critical Survey. New York: John Wiley, 1957.
[11] J. F. Nash, "Noncooperative games," Annals of Mathematics, vol. 54, no. 2, 1951.
[12] G. Powell and C. Schmidt, "Human corps-level planning: A first order computational model," in Proc. Artificial Intell. Syst. Government, 1989.
[13] J. Rosenschein and M. R. Genesereth, "Deals among rational agents," in Proc. Ninth Int. Joint Conf. Artificial Intell., Los Angeles, CA, 1985.
[14] E. Sacerdoti, "Planning in a hierarchy of abstraction spaces," in Proc. IJCAI-3, 1973, pp. 412-422.
[15] A. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Development, vol. 3, pp. 211-229, 1959.
[16] J. Sanborn, "A model of reaction for planning in dynamic environments," Master's thesis, Dept. of Computer Science, Univ. of Maryland, 1988.
[17] M. Schoppers, "Universal plans for reactive robots in unpredictable domains," in Proc. IJCAI 10, 1987, pp. 1039-1046.
[18] G. Stachnick, L. Appelbaum, P. Marks, J. Marsh, J. Rosenschein, M. Schoppers, and D. Shapiro, "Airland battle management planning study," Advanced Decision Systems TR-1127-01, Mountain View, CA, 1987.
[19] W. Swartout, "DARPA Workshop on Planning," AI Mag., vol. 9, no. 2, Summer 1988.
[20] J. von Neumann and O. Morgenstern, The Theory of Games and Economic Behavior. New York: John Wiley, 1953.
[21] D. Wilkins, "Using patterns and plans in chess," Artificial Intell., vol. 14, pp. 165-203, 1980.
[22] ——, "Domain-independent planning: Representation and plan generation," Artificial Intell., vol. 22, pp. 269-301, 1984.
[23] ——, Practical Planning. Palo Alto, CA: Morgan Kaufman, 1988.


Carol J. Applegate (M'83) received the B.S. degree in mathematics from the University of North Carolina, Charlotte, in 1978 and the M.S. degree in computer science from Purdue University, West Lafayette, IN, in 1980.

She is a member of the technical staff at General Dynamics Corp. Previously she worked at the MITRE Corporation, where she was involved in planning research for battlefield applications.

Ms. Applegate is a member of the IEEE Computer Society and the American Association for Artificial Intelligence.

Christopher Elsaesser received the B.S. and M.S. degrees in mathematics in 1975 and the M.S. degree in industrial and systems engineering in 1979, all from Ohio University, Athens, OH. He received the Ph.D. degree in engineering and public policy from Carnegie-Mellon University, Pittsburgh, PA, in 1989.

He is a Lead Engineer at the Artificial Intelligence Center of the MITRE Corporation, McLean, VA. His research interests include planning, uncertainty reasoning, and cognitive modeling.

Mr. Elsaesser is a member of the American Association for Artificial Intelligence.

James Sanborn received the B.S. degree in mathematics from the University of Rochester, Rochester, NY, in 1983, and the M.S. degree in computer science from the University of Maryland in 1983. He is presently conducting doctoral dissertation research on the integration of planning and situated reasoning for the Ph.D. degree in computer science at the University of Maryland, College Park.

He is a member of the technical staff with the Artificial Intelligence Center of the MITRE Corporation, McLean, VA. His research interests include planning and acting systems, reasoning in dynamic environments, AI and robotics, and object-oriented programming.