Meta-Level Control in

Multi-Agent Systems

Anita Raja and Victor Lesser

Department of Computer Science

University of Massachusetts

Amherst, MA 01002

2

Bounded Rationality

“A theory of rationality that does not give an account of problem-solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing solutions that are without operational significance.” (Herb Simon, 1958)

Basic Insight: Computations are actions with costs

3

Motivation

• Control actions like scheduling and coordination can be expensive

• Current multi-agent systems do not explicitly reason about these costs

• Need to account for costs at all levels of reasoning to provide accurate solutions

• Build a low-overhead meta-level control framework that reasons about the cost of different control actions

4

Assumptions

• Agent can pursue multiple tasks simultaneously

• Agent can partially fulfill or omit tasks

• Agent can coordinate with other agents to complete tasks

• Tasks have varying arrival times, deadlines and associated utilities

• Tasks have alternate ways of being achieved

• Objective function: MAX utility over a fixed time horizon

5

Agent Architecture

6

Meta-level Decision Taxonomy

• Whether to accept, delay or reject an incoming new task?

• How much effort to put into reasoning about a new task?

• Whether to negotiate with another agent about task transfer?

• Whether to renegotiate in case of failure of previous negotiation?

• Whether to re-evaluate current plan when a task completes?

7

Decision Tree for New task arrival event

8

Some State Features

Name  Description                                                  Value           Complexity
F0    Relative Utility of new task                                 High, Med, Low  Simple
F1    Relative Deadline of new task                                                Simple
F2    Relative Utility of current schedule                                         Simple
F8    Relation of slack fragments to current schedule                              Complex
F9    Relation of other agent’s slack fragments to non-local task  High, Med, Low  Complex

9

Some Heuristic Decisions

• If current schedule has low priority (expected quality is low) and incoming task is of high priority (high expected quality with tight deadline), then drop current schedule and schedule new task immediately.

• If current schedule has very high priority and new task has low expected utility and a tight deadline, then drop the new task.

• If current task to be scheduled has high execution uncertainty associated with it and a deadline that is not tight, then introduce high slack in the schedule and use medium scheduling effort.
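
The three heuristics above can be read as a small rule table. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Level`, `NewTask` and `meta_level_decision`, the three-valued scale, and the returned action labels are all hypothetical, and "very high priority" is approximated by the top level of the scale.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Hypothetical three-valued scale (the slides use High/Med/Low)."""
    LOW = 1
    MED = 2
    HIGH = 3


@dataclass
class NewTask:
    utility: Level             # relative expected utility of the new task
    deadline_tightness: Level  # HIGH means a tight deadline
    uncertainty: Level         # execution uncertainty (assumed field)


def meta_level_decision(schedule_priority: Level, task: NewTask) -> dict:
    """Apply the three heuristics in order; the rule order is an assumption."""
    tight = task.deadline_tightness is Level.HIGH

    # Heuristic 1: low-priority schedule, high-utility task with tight deadline.
    if schedule_priority is Level.LOW and task.utility is Level.HIGH and tight:
        return {"action": "drop_schedule_and_schedule_new_task"}

    # Heuristic 2: high-priority schedule, low-utility task with tight deadline.
    if schedule_priority is Level.HIGH and task.utility is Level.LOW and tight:
        return {"action": "drop_new_task"}

    # Heuristic 3: uncertain task whose deadline is not tight.
    if task.uncertainty is Level.HIGH and not tight:
        return {"action": "schedule", "slack": "high", "effort": "medium"}

    # The slides list only some heuristics; other cases are left undecided here.
    return {"action": "no_rule_matched"}
```

For example, `meta_level_decision(Level.HIGH, NewTask(Level.LOW, Level.HIGH, Level.LOW))` falls under the second heuristic and yields the `drop_new_task` label.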

10

Related Work

• Monitoring Progress of Anytime Algorithms (Hansen & Zilberstein)

– Uses dynamic programming for computation of a non-myopic stopping rule

• Predictability versus Responsiveness (Durfee & Lesser)

– Control amount of coordination using a user specified buffer

• Meta-level Control of Coordination Protocols (Kuwabara)

– Detects and handles exceptions by switching between protocols

– Does not account for overhead of reasoning process

11

Evaluation

• Compare system using hand-generated meta-level control (MLC) heuristics to

– Naïve multi-agent system with no explicit MLC

– Deterministic choice MLC

– Random choice MLC

– MLC with knowledge of environment characteristics including arrival model

• Environments are characterized by the following parameters

– Type of tasks: Simple (S), Complex (C), Combination (A)

– Frequency of Arrivals: High (H), Medium (M), Low (L)

– Deadline Tightness: High (H), Medium (M), Low (L)
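
The three parameters above suggest how three-letter environment labels such as SLM can be read. The sketch below assumes the letters follow the order in which the parameters are listed (task type, arrival frequency, deadline tightness); that ordering and the helper name `decode_environment` are assumptions, not something the slides state.

```python
# Letter codes taken from the parameter list; the dictionary names are illustrative.
TASK_TYPES = {"S": "simple", "C": "complex", "A": "combination"}
LEVELS = {"H": "high", "M": "medium", "L": "low"}


def decode_environment(code: str) -> dict:
    """Decode a label like 'SLM', assuming the order type/frequency/tightness."""
    if len(code) != 3:
        raise ValueError(f"expected a three-letter code, got {code!r}")
    task_type, frequency, tightness = code
    return {
        "task_type": TASK_TYPES[task_type],
        "arrival_frequency": LEVELS[frequency],
        "deadline_tightness": LEVELS[tightness],
    }
```

Under this reading, `decode_environment("SLM")` describes simple tasks arriving at low frequency with medium deadline tightness.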

12

An Example

13

Evaluation, Continued

[Figure: "Utility Measures" chart. Y-axis: Utility (0 to 200). X-axis: environments SLM, AML, CMM, CLL. Series: Random, Fixed, Heuristic, ArrModel.]

14

Evaluation, Continued

[Figure: "Utility values with varying task complexity and arrival frequencies" chart. Y-axis: Utility (0 to 140). X-axis: Tight DL, Med DL, Loose DL. Series: Random, Fixed, Heuristic, ArrModel.]

15

Contributions

• Meta-level control in a complex environment

• Designed agent architecture that reasons about overhead at all levels of the decision process

• Parametric control algorithm which reasons about effort and slack

• Identified state features for control using reinforcement learning

16

Future Work

• Implement Reinforcement-Learning based control algorithm

– Function approximation (Sarsa(λ), linear tile-coding)

– MDP states will be abstractions of actual system state

– Study effectiveness of RL algorithm on complex domain

• Compare performance of heuristic approach to RL approach
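
As a rough illustration of the planned function approximation, here is a minimal sketch of Sarsa(λ) with a linear value function over tile-coded binary features. It assumes a single state feature scaled to [0, 1), replacing eligibility traces, and arbitrary hyperparameters; the names `active_tiles` and `LinearSarsaLambda` are hypothetical, and none of this is taken from the authors' system.

```python
def active_tiles(x, n_tilings=4, tiles_per_tiling=8):
    """Return one active tile index per tiling for x in [0, 1).

    Each tiling is shifted by a fraction of a tile width, so every tiling
    needs tiles_per_tiling + 1 tiles to cover the shifted range.
    """
    indices = []
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_tiling)
        tile = int((x + offset) * tiles_per_tiling)
        indices.append(t * (tiles_per_tiling + 1) + tile)
    return indices


class LinearSarsaLambda:
    """Sarsa(lambda) with linear weights over binary features (a sketch)."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.95, lam=0.9):
        self.w = [[0.0] * n_features for _ in range(n_actions)]  # weights
        self.z = [[0.0] * n_features for _ in range(n_actions)]  # traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, active, action):
        # With binary features, Q(s, a) is the sum of the active weights.
        return sum(self.w[action][i] for i in active)

    def update(self, active, action, reward, next_active=None, next_action=None):
        """One on-policy update; pass next_active=None at a terminal step."""
        target = reward
        if next_active is not None:
            target += self.gamma * self.q(next_active, next_action)
        delta = target - self.q(active, action)

        decay = self.gamma * self.lam
        for a in range(len(self.z)):
            for i in range(len(self.z[a])):
                self.z[a][i] *= decay        # decay all traces
        for i in active:
            self.z[action][i] = 1.0          # replacing traces
        for a in range(len(self.w)):
            for i in range(len(self.w[a])):
                self.w[a][i] += self.alpha * delta * self.z[a][i]
        return delta
```

With the defaults, an agent needs `n_features = 4 * (8 + 1) = 36`. In practice the step size is often divided by the number of tilings, and the traces should be reset to zero at the start of each episode; both are omitted here for brevity.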

17

Research Questions

• What are the major obstacles to efficient meta-level control?

• How can costs be accurately included at all levels of reasoning?

• How to deal with the huge, complex state space?

• Is reinforcement learning a feasible approach to learn good meta-level control policies?
