robust agent execution

Robust Agent Execution: Generalising Plans to Landscapes

"What I done on my MRes"

Planning in Context

We have a tendency to see planning as the end, rather than the means to the end.

It's important to consider what happens when plans are executed.

Typically, the assumptions we've made to make our models tractable also mean that the plan won't execute as intended.

This is a problem.

AI Background - Reactive Systems

Reactive Systems match subsets of stimuli to specific responses, which leads to a fast decision process.

Agents based around reaction can deal with changes in their environment.

But they tend to be ill-suited to acting towards long-term objectives.

AI Background - Deliberative Systems

Deliberation takes all the available stimuli and "searches" for the best decision.

Slow process, but leads to high quality decisions.

Very good for long term goal achievement.

Less well suited to environments where things are changing outside the agent's control, and fast-paced environments.

When Things Go Wrong

When things go wrong for a deliberative agent, it typically means that the world has deviated from the model.

Not too hard to remodel the world based on current observations and rethink, but it is time consuming.

That also assumes that the model is rich enough to capture unforeseen consequences.

For reactive agents, things are rarely going to go tragically wrong, but in dealing with "threats", the agent may not achieve its objectives.

The Problem

The fundamental problem is that some aspects of the world demand a reactive solution and some need a deliberative solution, but we always need to be thinking about the long term goals.

It isn't enough to label some aspects as being handled by a reactive component and others by a deliberative one - the interactions of the two can still be disruptive to goal satisfaction.

We can't reason fast enough to make near-real-time deliberative decisions.

A Potential Solution

The Integrated Influence Architecture is designed around a fundamental belief that neither Reaction nor Deliberation is adequate to our needs on its own, and neither can be allowed to dominate an agent's responses.

The basic approach is to allow an agent to make decisions based on a continuous range of stimuli from a mixture of sources.

This will allow the agent to react deliberatively, or deliberate reactively.

The Short Example

Reaction vs Deliberation: When you touch a hot oven, you don't need to consider what time it is to know that you should move your hand immediately.

Reaction with Deliberation: When you touch a hot oven, you don't need to consider what time it is to know that you should move your hand immediately, but it might be a good idea to move your hand in the direction of the first aid kit.

The Harsh Reality of Real-Time

Execution in Real-Time Dynamic Environments demands that decisions can be made in Real-Time (or more realistically Near-Real-Time).

If things are changing then the agent needs to be able to react to this quickly and efficiently.

A good example of this kind of environment (and a place to start coding) is video games.

Developers aim for 60fps execution - each frame gets roughly 16.7ms (1000ms / 60) of CPU time, but most of this goes on graphics processing etc. AI decisions might expect to get at most 1ms in total across all agents.

Mathematical Trickery

Searching state spaces is slow.

Evaluating mathematical functions is fast.

We use heuristics to evaluate individual states and guide search towards likely good solutions.

What if we were evaluating the entire state space?

Influence Maps

Influence Maps/Artificial Potential Fields are spatial mappings of location to value.

"Influence" radiates out from a point of interest, and the amount of influence exerted decays as the distance from the point increases.

Influence can be positive or negative to attract or repel an agent from the points.

Influence from multiple points can interact, e.g. additively or multiplicatively.
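
As a rough illustration of the points above, here is a minimal sketch of an additive influence map on a 2D grid. The exponential decay law, the `decay` rate and the function names are assumptions made for the example; the slides do not specify a particular falloff.

```python
import math

def influence_at(cell, sources, decay=0.5):
    """Sum the influence that every source exerts on one grid cell.

    Each source is (x, y, strength); a negative strength repels.
    Influence falls off exponentially with Euclidean distance. This is
    an assumed decay law; linear or inverse-square falloff would slot in
    the same way.
    """
    total = 0.0
    for sx, sy, strength in sources:
        distance = math.hypot(cell[0] - sx, cell[1] - sy)
        total += strength * math.exp(-decay * distance)
    return total

def build_influence_map(width, height, sources, decay=0.5):
    """Return a height x width grid (list of rows) of additive influence."""
    return [[influence_at((x, y), sources, decay) for x in range(width)]
            for y in range(height)]

# One attractor at (0, 0) and one equally strong repeller at (4, 4).
grid = build_influence_map(5, 5, [(0, 0, 100.0), (4, 4, -100.0)])
```

Cells near the attractor come out positive, cells near the repeller negative, and the two cancel at the midpoint. Swapping the `+=` for a product would give a multiplicative combination instead.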

Producing Landscapes From Domains

Influence maps attach value over points that are ordered (typically 2D overviews of areas).

In planning domains, our variables don't have this kind of spatial mapping.

Or do they...

DTG and SAS+

The SAS+ formalism is based on multi-valued variables (as opposed to PDDL's propositional approach).

Domain Transition Graphs (DTGs) define the manner in which variables can change value.

Gives a sense of adjacency of values within the domain of a single variable.

Each SAS+ variable can be seen then as having an order, allowing an Influence Map to be defined across the representation.
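
To make the idea of adjacency concrete, the sketch below stores the DTG of one SAS+ variable as an adjacency dict and measures how many transitions separate two values. The variable, its values and the `dtg_distance` helper are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical SAS+ variable: the location of a single package.
# Each key is one value of the variable; its list holds the values
# reachable by one action, so the dict as a whole is the variable's DTG.
package_location_dtg = {
    "depot_a": ["truck"],
    "truck": ["depot_a", "depot_b", "airport"],
    "depot_b": ["truck"],
    "airport": ["truck", "plane"],
    "plane": ["airport"],
}

def dtg_distance(dtg, start, goal):
    """Return the fewest transitions from start to goal, or None if unreachable."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        value = queue.popleft()
        if value == goal:
            return hops[value]
        for successor in dtg[value]:
            if successor not in hops:
                hops[successor] = hops[value] + 1
                queue.append(successor)
    return None

print(dtg_distance(package_location_dtg, "depot_a", "plane"))  # 3
```

This hop count is the notion of "distance" that lets influence decay across a DTG in the same way it decays across a grid.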

Influence Landscapes

Using a DTG as a spatial representation of our conceptual model, we have the kind of domain to which an Influence Map is suited.

Nodes we feel are important are attractive.

Nodes we need to avoid are repellent.

Influence propagates across the DTG.
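
A minimal sketch of this propagation, assuming a connected DTG with symmetric edges, geometric decay per hop and additive combination of sources. None of these choices are fixed by the slides, and the function names are illustrative.

```python
from collections import deque

def hops_from(dtg, source):
    """Breadth-first hop counts from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in dtg[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

def influence_landscape(dtg, sources, decay=0.5):
    """Add up the influence of every source node at every DTG node.

    sources maps node -> strength (positive attracts, negative repels).
    Each hop away from a source multiplies its influence by `decay`
    (an assumed decay rule).
    """
    landscape = {node: 0.0 for node in dtg}
    for source, strength in sources.items():
        for node, distance in hops_from(dtg, source).items():
            landscape[node] += strength * decay ** distance
    return landscape

# A four-node chain: "d" is attractive, "a" is mildly repellent.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
landscape = influence_landscape(chain, {"d": 100.0, "a": -40.0})
# landscape == {"a": -27.5, "b": 5.0, "c": 40.0, "d": 95.0}
```

The values rise steadily from "a" to "d", so an agent that always steps to its highest-valued neighbour is drawn along the chain towards the attractive node.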

From Layers to Stacks

The fundamental principle of the architecture is that no single method produces a complete view of the world.

A layered system involves giving priority to certain aspects, or arbitrating between them.

We instead use a "stack" model, in which each unit feeds directly into the executive.

This gives an architecture free from hierarchical bias and prioritisation.

Stacks / Landscape Generators

Each Stack is tasked with generating an Influence Landscape for a specific aspect of the world that the agent is acting in.

The landscapes are then fed to the executive which can incorporate all the relevant information into its decision making.

In the initial prototyping, three different stacks have been implemented:

Domain Structure

Environmental Data

Plan Data

Domain Structure

The Domain Structure stack analyses the structure of the domain.

Given a goal node, it propagates influence across a DTG, providing information for the shortest path through the DTG as well as highlighting the existence of alternate routes.

A naïve baseline for manipulating the environment, which other stacks then modify with more detailed information.
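
One way this stack could work is sketched below: influence spreads backwards from the goal so that each node's value reflects how many transitions it is from the goal, and the agent climbs the resulting gradient. The `peak` and `decay` values, the example DTG and both function names are assumptions for illustration, not the prototype's actual code.

```python
from collections import deque

def goal_landscape(dtg, goal, peak=100.0, decay=0.8):
    """Value of each node = peak * decay ** (transitions needed to reach goal)."""
    # Reverse the edges so BFS from the goal measures distance *to* the goal.
    reverse = {node: [] for node in dtg}
    for node, successors in dtg.items():
        for nxt in successors:
            reverse[nxt].append(node)
    hops = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for prev in reverse[node]:
            if prev not in hops:
                hops[prev] = hops[node] + 1
                queue.append(prev)
    # Nodes that cannot reach the goal get no influence at all.
    return {node: peak * decay ** hops[node] if node in hops else 0.0
            for node in dtg}

def follow_gradient(dtg, landscape, start, goal):
    """Greedily step to the best-valued successor until the goal (or a dead end)."""
    path = [start]
    while path[-1] != goal:
        successors = dtg[path[-1]]
        if not successors:
            break
        best = max(successors, key=landscape.get)
        if landscape[best] <= landscape[path[-1]]:
            break  # no uphill move available
        path.append(best)
    return path

# Hypothetical DTG with a short route (a-b-d) and a longer one (a-c-e-d).
dtg = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["d"], "d": []}
landscape = goal_landscape(dtg, "d")
print(follow_gradient(dtg, landscape, "a", "d"))  # ['a', 'b', 'd']
```

Node "c" still carries a positive value, which is how the alternate route stays visible to the agent if the preferred edge later becomes unattractive.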

Environmental Data

The Environmental Data Stack provides a view of the environment as the agent is sensing it.

It allows for updates to the perceived value of a state, and for these value alterations to be propagated out as influence to allow an agent to exploit opportunities or avoid dangers.

Types of Environmental Data

Environmental data takes two forms:

Preferences

Bias between options - either option is a valid choice at this decision point, but elements of the environment mean that one is more favoured than another.

Roadblocks

This option is not valid at this decision point - no matter what guidance other stacks are giving, the edge cannot be used.

Permanent or Transitory Roadblocks?

Should Roadblocks remove the edge from the representation, meaning it can never be used again?

In general Roadblocks do not signify an alteration to the physical characteristics of the world, rather an addition of other characteristics.

E.g. Actual roadblocks - the road still exists, and maybe will be available again within the lifecycle of the agent.
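
A sketch of how transitory roadblocks might be kept alongside preferences, so that a blocked edge stays in the representation and becomes usable again later. The class, its method names, the tick-based expiry and the use of -200 as an edge value are all assumptions made for this example.

```python
ROADBLOCK = -200.0  # sentinel value; assumed here to apply to edges

class EnvironmentalData:
    """Holds preferences and roadblocks without ever deleting an edge."""

    def __init__(self):
        self.preferences = {}  # edge -> bias added to that edge's value
        self.roadblocks = {}   # edge -> expiry tick, or None if open-ended

    def prefer(self, edge, bias):
        self.preferences[edge] = bias

    def block(self, edge, until=None):
        self.roadblocks[edge] = until

    def is_blocked(self, edge, now):
        if edge not in self.roadblocks:
            return False
        expiry = self.roadblocks[edge]
        return expiry is None or now < expiry

    def edge_value(self, edge, now):
        """Roadblocks override preferences while active, then fall away."""
        if self.is_blocked(edge, now):
            return ROADBLOCK
        return self.preferences.get(edge, 0.0)

env = EnvironmentalData()
env.prefer(("a", "b"), 10.0)
env.block(("a", "b"), until=5)
print(env.edge_value(("a", "b"), now=3))  # -200.0 while the roadblock is active
print(env.edge_value(("a", "b"), now=5))  # 10.0 once it has expired
```

Because the edge is never removed, the underlying preference survives the roadblock and applies again as soon as the block expires.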

Plan Data

Incorporating data from a proper planner is vital.

This is where the majority of reasoning is done, and issues such as Causal satisfaction are primarily handled.

Forms the basis of the execution - with no additional data and no problems, the plan should be what the agent executes.

Tight or Loose Conformity?

Already noted that no one landscape should dominate the agent's execution.

Deviation from the plan should be permissible and achievable if necessary.

Imagining the plan as a trajectory through the space, do we want to strongly describe the trajectory as a ridge through the space, or weakly describe it as a set of waypoints?

Allowing loose conformity through weak description gives scope for deviation and alternate routes being found.

Achieving Loose Conformity Via Clustering

Loose Conformity means it's necessary to find a subset of the nodes that the plan would traverse which are "important".

Based on SAS+ notation we know that nodes can be grouped together, such as cities representing the UK and EU.

Clustering allows us to build these groupings automatically (although not necessarily as obviously as by inspection).

Fuzzy Clustering Across DTGs

Using the Fuzzy c-Means algorithm, discover c separate clusters of nodes within a DTG.

c is defined as ceiling(sqrt(n/2)), where n is the number of nodes in the DTG.

Assign each node a random weight to each cluster

Calculate the centroid of the cluster based on the centroid being the node with the smallest average weighted distance to each node within the cluster.

Update weights based on relative distance to each centroid.

Repeat to stability.
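
The steps above can be sketched as follows. This is a medoid-style variant of Fuzzy c-Means, since centroids must be actual DTG nodes. It assumes a connected DTG with symmetric edges, hop count as the distance measure and the standard membership update with a fuzziness exponent of 2. The function names and parameters are illustrative, not the prototype's code.

```python
import math
import random
from collections import deque

def all_pairs_hops(dtg):
    """BFS hop counts between every pair of nodes (assumes a connected DTG)."""
    dist = {}
    for source in dtg:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in dtg[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        dist[source] = hops
    return dist

def fuzzy_c_medoids(dtg, fuzziness=2.0, max_iters=100, seed=0):
    """Return (centroids, weights); weights[node][k] is membership in cluster k."""
    rng = random.Random(seed)
    nodes = list(dtg)
    c = math.ceil(math.sqrt(len(nodes) / 2))
    dist = all_pairs_hops(dtg)
    # Assign each node a random weight to each cluster, normalised to sum to 1.
    weights = {}
    for node in nodes:
        raw = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(raw)
        weights[node] = [r / total for r in raw]
    centroids = None
    for _ in range(max_iters):
        # Centroid of cluster k: the node with the smallest weighted distance
        # to all nodes (same argmin as the weighted average).
        new_centroids = [
            min(nodes, key=lambda cand: sum(
                weights[n][k] ** fuzziness * dist[cand][n] for n in nodes))
            for k in range(c)
        ]
        # Update weights from the relative distance to each centroid.
        for node in nodes:
            d = [dist[node][cen] for cen in new_centroids]
            if 0 in d:  # the node is itself a centroid
                zeros = d.count(0)
                weights[node] = [1.0 / zeros if x == 0 else 0.0 for x in d]
            else:
                inv = [x ** (-2.0 / (fuzziness - 1.0)) for x in d]
                total = sum(inv)
                weights[node] = [v / total for v in inv]
        if new_centroids == centroids:
            break  # stable: centroids, and hence weights, no longer change
        centroids = new_centroids
    return centroids, weights

# Two three-node "cities" joined by a single a3-b1 link: n = 6, so c = 2.
two_cities = {
    "a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a1", "a2", "b1"],
    "b1": ["a3", "b2", "b3"], "b2": ["b1", "b3"], "b3": ["b1", "b2"],
}
centroids, weights = fuzzy_c_medoids(two_cities)
```

The result depends on the random initial weights, so different seeds can give different clusterings. `fuzziness` must be greater than 1 for the update to be defined.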

Using Clusters to Find Focal Nodes

Our hypothesis is that movement within a cluster is largely straightforward.

Consider Logistics with each city being a cluster - movement of a package within a city is easy.

The important aspect is traversal between clusters.

In Logistics, this is the choice of which city the package is being moved to.

We define the "Focal Nodes" of a DTG as those nodes that lie between two or more clusters - these govern the inter-cluster movement.
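
One possible reading of "lie between two or more clusters" is sketched below: assign each node to a single cluster (for instance, the one where its fuzzy weight is highest) and treat any node with a neighbour in a different cluster as focal. Both the hard assignment and the `focal_nodes` helper are assumptions for the example.

```python
def focal_nodes(dtg, cluster_of):
    """Return nodes with at least one DTG neighbour in a different cluster.

    cluster_of maps node -> cluster label (an assumed hard assignment).
    """
    return {
        node
        for node, neighbours in dtg.items()
        if any(cluster_of[n] != cluster_of[node] for n in neighbours)
    }

# Two hypothetical "cities" joined by a single a3-b1 link.
dtg = {
    "a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "b1"],
    "b1": ["a3", "b2"], "b2": ["b1", "b3"], "b3": ["b2"],
}
cluster_of = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
print(sorted(focal_nodes(dtg, cluster_of)))  # ['a3', 'b1']
```

Only the two nodes on the inter-city link are picked out, matching the intuition that they govern movement between clusters.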

Activating Focal Nodes

We are looking to find important nodes within a planned set of actions.

We have a list of nodes that are "Focal Nodes" within a DTG.

Activated Focal Nodes are those that appear in both lists: the nodes the plan traverses and the Focal Nodes of the DTG.

These are the waypoints that we use to form a landscape that loosely conforms to the original plan, and these nodes radiate influence to form the Plan Data Landscape.
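
Finding the activated nodes amounts to an intersection that keeps the plan's ordering. A minimal sketch, with hypothetical node names and a hypothetical helper:

```python
def activated_focal_nodes(plan_nodes, focal):
    """Focal nodes that the plan visits, in plan order, without repeats."""
    seen = set()
    activated = []
    for node in plan_nodes:
        if node in focal and node not in seen:
            seen.add(node)
            activated.append(node)
    return activated

plan_nodes = ["a1", "a2", "a3", "b1", "b2"]  # nodes the plan traverses
focal = {"a3", "b1", "c1"}                   # Focal Nodes of the DTG
print(activated_focal_nodes(plan_nodes, focal))  # ['a3', 'b1']
```

Keeping the plan order means the waypoints can be given influence in sequence, although the slides do not say whether the prototype does this.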

The Unified Landscape

Each individual Stack generates a Landscape in which each node has a value h with -100 < h < 100 (except roadblocks, which are signalled with -200).

In the initial implementation, these are simply summed to find the overall value of each node in the landscape.

More sophisticated techniques might be appropriate instead.
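
The simple summation can be sketched as below. The example values and the treatment of missing nodes as 0 are assumptions. Note that with three stacks and every non-roadblock value below 100, a roadblocked node always sums to less than zero.

```python
def unify(landscapes):
    """Sum per-node values across stack landscapes (dicts of node -> value).

    A node missing from a landscape contributes 0 (an assumption).
    """
    nodes = set().union(*landscapes)
    return {node: sum(ls.get(node, 0.0) for ls in landscapes)
            for node in nodes}

# Hypothetical output of the three prototype stacks for three nodes.
domain_structure = {"a": 60.0, "b": 80.0, "c": 99.0}
environmental = {"a": 0.0, "b": -200.0, "c": 10.0}  # "b" is roadblocked
plan_data = {"a": 0.0, "b": 90.0, "c": 50.0}

unified = unify([domain_structure, environmental, plan_data])
# unified == {"a": 60.0, "b": -30.0, "c": 159.0}
```

Even though the other two stacks both favour "b", the roadblock keeps its total negative.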

Agents in the Landscape

Agents now have a cohesive view of the landscape that is reflective of the input of reactive elements as well as deliberative.

The Stacks are independent of each other, so can run asynchronously to update the landscape, and can be parallelised.

Much of the heavy-lifting can be done offline, meaning that the execution-time components are kept very simple.

When Things Go Wrong

Loose conformity means that the plan data probably isn't grossly wrong immediately and can still be used to guide execution.

Domain structure analysis means that the agent can see alternative routes to the intended node. When the total cost of a route becomes too great, other routes will be used.

Environmental data means that the agent can be influenced by what is happening around it and can react to a dynamic environment.

Progress

The majority of the MRes work has been about getting the concepts tied down and working with very small scale prototypes to test out components.

Still not entirely satisfied that the specific details laid out here (or glossed over) are completely appropriate.

Entire system now functional and tested though!

Results So Far

Main issue is time taken to generate a representation of a problem.

On problems that have been translated:

Clustering creates usable FNs in >80% of cases

Time taken for each decision is ~1ms

This is slightly artificial given that sensing etc. is simulated

Agent is able to successfully negotiate the domain and obstacles to complete the goal.

Problem instances used are "easy", and not reflective of all domains or instances.

Now and Next

Right now, working on integrating my formalism with PDDL/SAS+ using Christian's "krtoolkit".

This will allow much wider testing on typical benchmark problems and a much more robust set of results to be gathered.

Pete's work with Constance and Christian's cross product of DTGs are very susceptible to having some ideas "borrowed".

The Next Few Years

Aim is to develop this from prototype into proven technique.

Need to have a much more robust implementation in place, ideally built into an actual application.

Want to show the extensibility of the approach by developing new Stacks, such as an Opponent Modelling Stack to highlight expectations of other agents' actions and how these can be factored in as an additional landscape.
