Meta-Level Control in
Multi-Agent Systems
Anita Raja and Victor Lesser
Department of Computer Science
University of Massachusetts
Amherst, MA 01002
2
Bounded Rationality
“A theory of rationality that does not give an account of problem-solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing solutions that are without operational significance.” (Herb Simon, 1958)
Basic Insight: Computations are actions with costs
3
Motivation
• Control actions like scheduling and coordination can be expensive
• Current multi-agent systems do not explicitly reason about these costs
• Need to account for costs at all levels of reasoning to provide accurate solutions
• Build a meta-level control framework that itself has minimal cost and that reasons about the cost of different control actions
4
Assumptions
• Agent can pursue multiple tasks simultaneously
• Agent can partially fulfill or omit tasks
• Agent can coordinate with other agents to complete tasks
• Tasks have varying arrival times, deadlines and associated utilities
• Tasks have alternate ways of being achieved
• Objective function: MAX utility over a fixed time horizon
5
Agent Architecture
6
Meta-level Decision Taxonomy
• Whether to accept, delay or reject an incoming new task?
• How much effort to put into reasoning about a new task?
• Whether to negotiate with another agent about task transfer?
• Whether to renegotiate in case of failure of previous negotiation?
• Whether to re-evaluate current plan when a task completes?
7
Decision Tree for New task arrival event
8
Some State Features
Name | Description | Value | Complexity
F0 | Relative Utility of new task | High / Med / Low | Simple
F1 | Relative Deadline of new task | | Simple
F2 | Relative Utility of current schedule | | Simple
F8 | Relation of slack fragments to current schedule | | Complex
F9 | Relation of other agent’s slack fragments to non-local task | High / Med / Low | Complex
9
Some Heuristic Decisions
• If current schedule has low priority (expected quality is low) and incoming task is of high priority (high expected quality with tight deadline), then drop current schedule and schedule new task immediately.
• If current schedule has very high priority and new task has low expected utility and a tight deadline, then drop the new task.
• If current task to be scheduled has high execution uncertainty associated with it and a deadline that is not tight, then introduce high slack in the schedule and use medium scheduling effort.
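The three heuristics above can be read as ordered condition/action rules. The following is a minimal sketch of that reading, not the authors' implementation. The class, field and action names are hypothetical, "very high priority" is mapped to the HIGH level, and the task being scheduled in the third rule is assumed to be the newly arrived one.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = 0
    MED = 1
    HIGH = 2


@dataclass
class State:
    # Hypothetical field names standing in for the slide's state features.
    new_task_utility: Level
    new_task_deadline_tightness: Level
    new_task_uncertainty: Level
    schedule_utility: Level


def meta_level_decision(s: State) -> str:
    """Return a label for the first heuristic rule that fires."""
    tight = s.new_task_deadline_tightness is Level.HIGH
    # Rule 1: low-priority schedule, high-priority new task with a tight deadline.
    if s.schedule_utility is Level.LOW and s.new_task_utility is Level.HIGH and tight:
        return "drop_schedule_and_schedule_new_task"
    # Rule 2: high-priority schedule, low-utility new task with a tight deadline.
    if s.schedule_utility is Level.HIGH and s.new_task_utility is Level.LOW and tight:
        return "drop_new_task"
    # Rule 3: uncertain task whose deadline is not tight.
    if s.new_task_uncertainty is Level.HIGH and not tight:
        return "schedule_with_high_slack_medium_effort"
    # None of the three listed heuristics applies.
    return "no_rule_fired"
```

Checking the rules in a fixed order keeps the decision itself cheap, which matters because the meta-level controller is meant to add as little overhead as possible.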
10
Related Work
• Monitoring Progress of Anytime Algorithms (Hansen & Zilberstein)
– Uses dynamic programming for computation of a non-myopic stopping rule
• Predictability versus Responsiveness (Durfee & Lesser)
– Control amount of coordination using a user specified buffer
• Meta-level Control of Coordination Protocols (Kuwabara)
– Detects and handles exceptions by switching between protocols
– Does not account for overhead of reasoning process
11
Evaluation
• Compare system using hand-generated MLC heuristics to
– Naïve multi-agent system with no explicit MLC
– Deterministic choice MLC
– Random choice MLC
– MLC with knowledge of environment characteristics including arrival model
• Environments are characterized by the following parameters
– Type of tasks: Simple (S), Complex (C), Combination (A)
– Frequency of Arrivals: High (H), Medium (M), Low (L)
– Deadline Tightness: High (H), Medium (M), Low (L)
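The evaluation charts label environments with three-letter codes such as SLM. A small sketch of how such a code could be expanded follows. It assumes the letters come in the order the parameters are listed above (task type, arrival frequency, deadline tightness), which the slides do not state explicitly. The function and dictionary names are hypothetical.

```python
TASK_TYPE = {"S": "Simple", "C": "Complex", "A": "Combination"}
LEVEL = {"H": "High", "M": "Medium", "L": "Low"}


def decode_environment(code: str) -> dict:
    """Expand a three-letter environment label such as 'SLM'."""
    if len(code) != 3:
        raise ValueError(f"expected a three-letter code, got {code!r}")
    # Assumed letter order: task type, arrival frequency, deadline tightness.
    task, arrivals, deadline = code
    return {
        "task_type": TASK_TYPE[task],
        "arrival_frequency": LEVEL[arrivals],
        "deadline_tightness": LEVEL[deadline],
    }
```

Under this assumed ordering, decode_environment("SLM") describes simple tasks with low arrival frequency and medium deadline tightness.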
12
An Example
13
Evaluation, Continued
Utility Measures
[Figure: bar chart of Utility (axis 0 to 200) for environments SLM, AML, CMM and CLL; series: Random, Fixed, Heuristic, ArrModel]
14
Evaluation, Continued
Utility values with varying task complexity and arrival frequencies
[Figure: bar chart of Utility (axis 0 to 140) for Tight DL, Med DL and Loose DL; series: Random, Fixed, Heuristic, ArrModel]
15
Contributions
• Meta-level control in a complex environment
• Designed agent architecture that reasons about overhead at all levels of the decision process
• Parametric control algorithm which reasons about effort and slack
• Identified state features for control using reinforcement learning
16
Future Work
• Implement Reinforcement-Learning based control algorithm
– Function approximation (Sarsa(λ), linear tile-coding)
– MDP states will be abstractions of actual system state
– Study effectiveness of RL algorithm on complex domain
• Compare performance of heuristic approach to RL approach
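For reference, the function-approximation bullet above appears to refer to Sarsa(λ) with a linear value function. The following is a minimal sketch of one textbook update step with accumulating eligibility traces, not the authors' planned implementation. Tile coding is not shown: the feature vectors x and x_next are assumed to be the binary features a tile coder would produce for the current and next (state, action) pairs. The function name and default parameter values are illustrative.

```python
def sarsa_lambda_step(w, z, x, x_next, reward,
                      alpha=0.1, gamma=0.9, lam=0.8, done=False):
    """One Sarsa(lambda) update with linear values and accumulating traces.

    w: weights; z: eligibility trace; x, x_next: feature vectors of the
    current and next (state, action) pairs. All are equal-length lists.
    """
    q = sum(wi * xi for wi, xi in zip(w, x))
    q_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))
    # Temporal-difference error for this transition.
    delta = reward + gamma * q_next - q
    # Decay the old trace, then add the currently active features.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # Move every weight in proportion to its eligibility.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z
```

In a meta-level control setting, the features would be built from abstract state features (such as the High / Med / Low values on the state-features slide) paired with a control action, and the reward would be the utility accrued.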
17
Research Questions
• What are the major obstacles to efficient meta-level control?
• How can costs be accurately included at all levels of reasoning?
• How to deal with the huge, complex state space?
• Is reinforcement learning a feasible approach to learn good meta-level control policies?