Learning agents
Project Mid-semester Report
October 22nd, 2002
Group participants: Huayan Gao ([email protected]), Thibaut Jahan ([email protected]), David Keil ([email protected]), Jian Lian ([email protected])
Students in CSE 333, Distributed Component Systems
Professor Steven Demurjian
Department of Computer Science & Engineering, The University of Connecticut
Learning agents midsemester 10/22/02
CONTENTS
1. Objectives and goals
2. Topic summary
   2.1 Definition and classification of agents and intelligent agents
   2.2 Learning
   2.3 Platform
3. Topic breakdown
   3.1 Machine learning (David)
   3.2 A maze problem (David)
   3.3 Agent platform (Jian)
   3.4 Agent computing (Huayan)
   3.5 Distributed computing (Jian)
   3.6 Implementation using Together, UML, and Java (Thibaut)
   3.7 Extension to UML needed for multi-agent systems (Huayan)
4. Progress on project, changes in direction and focus
5. Planned activities
   5.1 Oct. 23 – Oct. 29
   5.2 Oct. 30 – Nov. 5
   5.3 Nov. 6 – Nov. 12
   5.4 Nov. 13 – Nov. 19
   5.5 Nov. 20 – Nov. 26
   5.6 Nov. 27 – Dec. 2
6. References
Appendix A: Risks
Appendix B: Categories of agent computing
Appendix C: Q-learning Algorithm
1. Objectives and goals
Our ambition is to build a general-architecture model of components for learning
agents. The project will investigate current research on software learning agents and will
implement a simple system of such agents. We will demonstrate our work with a
distributed learning agent system that interactively finds a policy for navigating a maze.
Our implementation will be component-based, using UML and Java.
We will begin with the notion of an intelligent agent, seeking to implement it in a
distributed agent environment on a pre-existing agent platform. We will refer to agents
in the sense of implemented mobile or distributed agents as “deployed agents.”
We will implement the different “generic” components so they can be assembled
easily into an agent. The project may also include investigation on scalability, robustness,
and adaptability of the system. Four candidate components of a distributed learning agent
are perception, action, communication, and learning.
Our design and implementation effort will focus narrowly on an artifact of realistically
limited scope that solves a well-defined, arbitrarily simplifiable maze problem using
Q-learning. We will relate the features of our implementation to recent research in the same
narrow area and to broader concepts encountered in the sources.
We have selected JADE (Java Agent Development Framework), a software
development framework aimed at multi-agent systems and applications conforming to
FIPA (Foundation for Intelligent Physical Agents) standards, for our learning
agents.
2. Topic summary
In this section we will discuss the following questions in detail: What is an agent?
What is learning? How are learning and agents combined? What agent platform will we
use?
2.1 Definition and classification of agents and intelligent agents
Researchers in the agent field have offered a variety of definitions. Some general
features that characterize agents are: autonomy, goal-orientedness, collaboration,
flexibility, ability to be self-starting, temporal continuity, character, adaptiveness,
mobility, and capacity to learn.
According to a definition from IBM, “Intelligent agents are software entities that
carry out some set of operations on behalf of a user or another program with some degree
of independence or autonomy, and in so doing, employ some knowledge or
representation of the user's goals or desires.”
“An autonomous agent is a system situated within and a part of an environment that
senses that environment and acts on it, over time, in pursuit of its own agenda and so as
to effect what it senses in the future” [fra-gra96]. The latter, broad definition is close to
the notion of intelligent agent used in the artificial-intelligence field, which has replaced
the logic-programming, knowledge-base-oriented paradigm.
2.2 Learning
Machine learning is a branch of artificial intelligence concerned with enabling
intelligent agents to improve their behavior. Among many categories of learning, we will
focus on reinforcement learning and the special case, Q-learning.
Reinforcement learning is online rational policy search; it uses ideas associated
with adaptive systems and is related to optimal control and dynamic programming [sut-bar98].
It is distinguished from traditional machine-learning approaches, which assumed
offline learning: knowledge was acquired during a separate training phase and only later
applied.
In the broader definition of intelligent agents, the agent responds to its environment
under a policy, which maps from a perceived state of the environment (determined by
agent percepts) to actions. An agent’s actions are a series of responses to previously
unknown, dynamically generated percepts. A rational agent is one that acts to maximize
its expected future reward or performance measure. Because its actions may affect the
environment, such an agent must incorporate thinking or planning ahead into its
computations. Because it obtains information from its environment only through
percepts, it may have incomplete knowledge of the environment. The agent must conduct
a trial-and-error search for a policy that obtains a high performance measure.
Reinforcement by means of rewards is part of that search.
For intelligent agents that use reinforcement learning, unlike systems that learn from
training examples, there arises the issue of exploiting obtained knowledge versus
exploring to obtain new information. Exploration gains no immediate reward and is
only useful if it can improve utility by improving future expected reward. Failing to
explore, however, means sacrificing any benefit of learning.
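One common way to balance the two, which we may adopt, is an ε-greedy rule: with a small probability the agent explores a random action, and otherwise it exploits its current Q-value estimates. A minimal Java sketch follows; the class and parameter names are our own illustrations, not part of any cited package:

```java
import java.util.Random;

/** Epsilon-greedy action selection: explore with probability epsilon,
 *  otherwise exploit the action with the highest estimated Q-value. */
class EpsilonGreedy {
    private final Random rng;
    private final double epsilon;   // exploration probability, e.g. 0.1

    EpsilonGreedy(double epsilon, long seed) {
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    /** qRow holds the estimated Q-values of each action in the current state. */
    int chooseAction(double[] qRow) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qRow.length);   // explore: a random action
        }
        int best = 0;                          // exploit: the greedy action
        for (int a = 1; a < qRow.length; a++) {
            if (qRow[a] > qRow[best]) best = a;
        }
        return best;
    }
}
```

Setting ε to zero recovers a purely exploiting agent, which illustrates the tradeoff: such an agent can never discover actions whose value it initially underestimates.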
2.3 Platform
JADE (Java Agent Development Framework) is a software framework fully
implemented in the Java language. It simplifies the implementation of multi-agent
systems through a middleware platform that claims to comply with the FIPA
specifications and through a set of tools that supports the debugging and deployment
phase. The agent platform can be distributed across machines and the configuration can
be controlled via a remote GUI.
According to the FIPA specification, agents communicate via asynchronous message
passing, in which objects of the ACLMessage class are the exchanged payloads. JADE
creates and manages a queue of incoming ACL messages; agents can access their queue
via a combination of several modes: blocking, polling, timeout-based, and pattern-matching
based. As transport mechanisms, Java RMI, event notification, and IIOP are
currently used.
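These access modes can be illustrated, by analogy only, with a plain-Java queue; the sketch below does not use the JADE API, and the class and method names are invented for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustrates blocking, polling, and timeout-based access
 *  to an agent's queue of incoming messages. */
class MessageQueueModes {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

    /** A message arrives asynchronously. */
    void deliver(String msg) { inbox.offer(msg); }

    /** Polling mode: return at once, null when no message is waiting. */
    String pollReceive() { return inbox.poll(); }

    /** Timeout mode: wait up to millis milliseconds, then give up. */
    String timedReceive(long millis) throws InterruptedException {
        return inbox.poll(millis, TimeUnit.MILLISECONDS);
    }

    /** Blocking mode: wait indefinitely until a message arrives. */
    String blockingReceive() throws InterruptedException {
        return inbox.take();
    }
}
```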
The standard model of an agent platform is represented in the following figure.
Fig. 2.3.1 The standard model of an agent platform
JADE is a FIPA-compliant agent platform, which includes the AMS (Agent
Management System), the DF (Directory Facilitator), and the ACC (Agent
Communication Channel). All three components are automatically activated at
platform start-up. The AMS provides white-page and life-cycle services,
maintaining a directory of agent identifiers (AIDs) and agent states. Each agent must
register with an AMS in order to get a valid AID. The Directory Facilitator (DF) is the
agent that provides the default yellow-page service in the platform. The Message
Transport System, also called the Agent Communication Channel (ACC), is the software
component controlling all exchange of messages within the platform, including
messages to and from remote platforms.
3. Topic breakdown
Our project will focus on grid-based problems for learning agents. Its aim is
similar to the one expounded in [rus-nor95], but we have extended that simple
example further; our realization will be more interesting, using multiple learning agents
and possibly dynamic rewards and walls. We will use JADE (Java Agent Development
Framework) as our main agent platform to develop and implement the maze.
3.1 Machine learning (David)
Part of this project will consist of investigating the literature on machine learning,
particularly reinforcement learning. David will lead this work.
The problem of learning in interaction with the agent’s environment is that of
reinforcement learning (RL). The learner executes a policy search, in some solutions
using a critic to convert reward inputs into guides for improving the policy (see figure below).
Fig. 3.1.1 Learning agent
Within reinforcement learning we will address Q-learning, a variant in which the
agent incrementally computes, from its interaction with its environment, a table of
expected aggregate future rewards, with values discounted as they extend into the future.
As it proceeds, the agent modifies the values in the table to refine its estimates. The Q
function maps each state-action pair to an expected discounted future reward; the optimal
action in a state is the one with the highest Q-value. The evolving table of estimated Q-values
is called Q̂.
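As a sketch of the data structure involved, Q̂ can be held in a two-dimensional array indexed by state and action, with the greedy policy read off by maximizing over actions. The encoding of states and actions as small integers is our assumption, not a committed design decision:

```java
/** The evolving table Q-hat, with greedy-policy extraction. */
class QTable {
    private final double[][] q;   // q[state][action] = estimated future reward

    QTable(int numStates, int numActions) {
        q = new double[numStates][numActions];   // initially all zero
    }

    void set(int state, int action, double value) { q[state][action] = value; }

    double get(int state, int action) { return q[state][action]; }

    /** The greedy policy: the action with the highest Q-hat value in this state. */
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }
}
```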
3.2 A maze problem (David)
The concrete problem described below will help to define how the project breaks
down into components:
Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for
which the learner must find a policy, where the reward is determined by eventually
reaching or not reaching a goal location in the maze.
Fig. 3.2.1 A maze problem
We propose to modify the original problem definition by permitting multiple
distributed agents that communicate, either directly or via the environment. Either the
multi-agent system, or each agent, will use Q-learning. The mazes can be made arbitrarily
simple or complex to fit the speed, computational power, and effectiveness of the system
we are able to develop in the time available.
A further interesting variant of the problem would be to allow the maze to change
dynamically, either autonomously or in response to the learning agents. Robust
reinforcement learners will adapt successfully to such changes.
3.3 Agent platform (Jian)
There are many agent platforms to choose from; a comparison is available at
http://www.ece.arizona.edu/~rinda/compareagents.html. We have chosen JADE (Java
Agent Development Framework) as our deployed-agent platform.
JADE (Java Agent Development Framework) is a software framework fully
implemented in the Java language. It simplifies the implementation of multi-agent
systems through middleware using a set of tools that supports the debugging and
deployment phase. The agent platform can be distributed across machines (which do not
even need to share the same OS) and the configuration can be controlled via a remote
GUI. The configuration can even be changed at run time by moving agents from one
machine to another, as and when required.
3.4 Agent computing (Huayan)
We will survey the agent paradigm of computing, focusing on rational agents, as
described in part 2 above. We will apply these concepts to the problem of machine
learning, as is done in much reinforcement-learning research.
We have defined an intelligent agent as a software entity that can monitor its
environment and act autonomously on behalf of a user or creator. To do this, an agent
must perceive relevant aspects of its environment, plan and carry out proper actions, and
communicate its knowledge to other agents and users. Learning agents will help us to
achieve these goals. Learning agents are adaptive: in difficult, changing
environments they may modify their behavior based on their previous experience.
The real problem with any intelligent-agent system is the amount of trust placed in
the agent's ability to cope with the information provided by its sensors in its environment.
Sometimes the agent’s learning capability is insufficient to achieve the anticipated goal;
this will be an emphasis of our study of the agent.
Advantages of learning agents are their ability to adapt to environmental change, their
customizability, and their flexibility. Disadvantages are the time needed to learn or
relearn, their ability to automate only preexisting patterns, and their consequent lack
of common sense.
3.5 Distributed computing (Jian)
In multi-agent learning in the strong sense, a common learning goal is pursued; in
the weaker sense, agents pursue separate goals but share information. Distributed agents
may identify or execute distinct learning subtasks [weiss99]. We will survey the literature
on distributed computing, looking for connections to learning agents, and will apply what
we find in an attempt to build a distributed system of cooperating learning agents.
3.6 Implementation using Together, UML, and Java (Thibaut)
The maze described above could be represented as a bitmap or a two-dimensional
array of squares. Starting with a simple example is useful in order to concentrate on good
component design and successful implementation.
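A minimal Java sketch of such a grid representation follows; the class name, the single goal square, and the reward values are illustrative assumptions, not the final design:

```java
/** A maze as a two-dimensional grid of squares: walls block movement,
 *  and one goal square carries a positive reward. */
class Maze {
    private final boolean[][] wall;   // true = the square is blocked
    private final int goalRow, goalCol;

    Maze(boolean[][] wall, int goalRow, int goalCol) {
        this.wall = wall;
        this.goalRow = goalRow;
        this.goalCol = goalCol;
    }

    /** A square is free when it lies inside the grid and is not a wall. */
    boolean isFree(int row, int col) {
        return row >= 0 && row < wall.length
            && col >= 0 && col < wall[0].length
            && !wall[row][col];
    }

    /** Reward observed on entering a square: 1 at the goal, 0 elsewhere. */
    double reward(int row, int col) {
        return (row == goalRow && col == goalCol) ? 1.0 : 0.0;
    }
}
```

Because the grid is just an array, mazes of any size can be generated, which supports the goal of arbitrary simplifiability stated above.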
We used the Together CC software to reverse engineer existing code of examples of
learning agents. We used two examples, the cat and mouse example, and the dog and cat
example, explained below. We are using these examples to extract from the class
diagrams a possible design for our agents.
Because multi-agent systems are both actors and software, their design does not follow
typical UML practice. The paper [fla-gei01], by Flake, Geiger, and Kuster, suggests that
UML does not offer full support for designing such agents.
We plan to use the Together CC software to implement these agents, starting with
their UML design. We have so far identified several distinct components that we think
will be used in these learning agents.
These Java-implemented agents would then be executed in the JADE
environment. The communication component will have to be specific to the Agent
Communication Language (ACL) used in JADE; this should be the only environment-dependent
component. We will try to make the other components (learning, perception,
action) as “generic” as possible.
Besides the design and implementation of the agents, we also have to design the
environment (the maze, etc.).
3.7 Extension to UML needed for multi-agent systems (Huayan)
The Unified Modeling Language (UML) is now widely used in software
engineering, so it is natural to apply UML to the design of agent systems. But many
UML applications focus on macro aspects of agent systems, such as agent interaction
and communication; the design of micro aspects of such agents, such as goals,
complex strategies, and knowledge, has often been left out. Standard UML therefore
cannot provide complete solutions for multi-agent systems. A detailed
description of how to use extended UML to implement multi-agent systems can be
found in [fla-gei01]. A Dog-Cat use-case diagram is given as follows:
Fig. 3.7.1 Dog-Cat Use-Case Diagram
In the figure above, agents are modeled as actors with square heads, and elements of
the environment are modeled as clouds. A goal case serves as a means of capturing the
high-level goals of an agent. Reaction cases are used to model how the environment directly
influences agents. An arc between an actor and a reactive use case expresses that the
actor is the source of the events triggering that use case. Fig. 3.7.1 illustrates the Dog-Cat
use case: the dog triggers the reactive use case DogDetected in the cat agent, and in the
environment the tree triggers the TreeDetected use case in the cat.
Below, we give similar use-case diagrams for the Cat-Mouse example and for the maze.
The rules of the cat-and-mouse game are: the cat catches the mouse and the mouse escapes
the cat, the mouse catches the cheese, and the game is over when the cat catches the mouse.
The Cat-Mouse use-case diagram is as follows:
Fig. 3.7.2 Cat-Mouse Use-Case Diagram
For the well-known maze problem mentioned in Section 3.2, we give the following use-case diagram:
Fig. 3.7.3 The Maze Problem Use-Case Diagram
4. Progress on project, changes in direction and focus
We meet at least every Tuesday after class. Our main change of focus has been the
identification of an existing Q-learning package, “Cat and Mouse” (URL:
http://www.cse.unsw.edu.au/~aek/catmouse/followup.html), implemented in Java, and an
existing agent platform, JADE.
Thibaut generated a class diagram of the Cat and Mouse Java code using Together.
Jian installed the Java code into the JADE platform to create a distributed environment
for the learner. Our goal is to implement agents that learn to pursue moving or stationary
goals (cat pursues mouse, mouse pursues cheese) or avoid negative rewards (mouse flees
cat).
Huayan found a similar example, “Dog and Cat,” described with use cases, and
located other sources related to agent-based systems.
The source for “Dog and Cat” [fla-gei01] raised the issue of the
limitations of standard UML use-case diagramming for depicting multi-agent
systems. The cat, for example, has the use case Escape while the dog has Chase; but these
two use cases denote the same set of events, seen from opposite perspectives.
David coded a simple maze reinforcement learner based on [rus-nor95] in C++, writing
classes for the maze, individual states in the maze, and the learning agent. At a later stage
this code could easily be ported to Java.
David also wrote C++ code for a system based on (Michie, Chambers, 1968) that
uses reinforcement learning to solve the classic problem of pole balancing, in which a
controller nudges a cart that sits on a track, with a pole balanced on it, trying to avoid
letting the pole fall. In this problem, physical states are on a continuum in four
dimensions, but may be quantized into a tractable number of discrete states from the
standpoint of the learner, leading to a solution.
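The quantization idea can be sketched as follows; the bin boundaries and the packing of per-dimension bin indices into a single state number are illustrative assumptions, not the values used by Michie and Chambers:

```java
/** Quantizes a continuous variable into a small number of discrete bins,
 *  and packs per-dimension bins into one discrete state number. */
class Quantizer {
    private final double[] boundaries;   // ascending cut points between bins

    Quantizer(double[] boundaries) { this.boundaries = boundaries; }

    /** Returns a bin index in [0, boundaries.length]. */
    int bin(double x) {
        int i = 0;
        while (i < boundaries.length && x >= boundaries[i]) i++;
        return i;
    }

    /** Combines per-dimension bin indices into a single state number,
     *  like digits in a mixed-radix numeral. */
    static int combine(int[] bins, int[] binCounts) {
        int state = 0;
        for (int d = 0; d < bins.length; d++) {
            state = state * binCounts[d] + bins[d];
        }
        return state;
    }
}
```

With, say, three bins per dimension in four dimensions, the continuum collapses to at most 81 discrete states, which is what makes a tabular learner tractable here.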
The two directions taken so far by group members are somewhat complementary;
the group may have to choose between them, however. Using the existing Cat-and-Mouse
system will certainly allow us to address harder problems, in which the
learner’s environment changes in response to the agent (e.g., the cat flees the dog). Using
JADE offers the best chance of attaining our goal of implementing distributed learning
agents that communicate. We may then seek to extend the existing solution by adding to
its Java code.
The approach of coding known solutions from scratch, on the other hand, guarantees
that at least one member of the group will understand the code; all members will
understand it if all participate in the coding or if the coder explains the code to
the others. We noticed that the Java code for Cat-and-Mouse is quite lengthy.
5. Planned activities
5.1 Oct. 23 – Oct. 29:
Consultation with the instructor on the platform and problem choices to be made.
Decision on the role in this project of the UML extension for multi-agent systems.
5.2 Oct. 30 – Nov. 5:
Java implementation of the learning aspect of the agents and enhancement of
communication efficiency. Each participant will code the components decided on and
described in the design part. Once these components are tested, they will be integrated
and the resulting system tested.
5.3 Nov. 6 – Nov. 12:
Extensions to code. Circulation of draft report.
5.4 Nov. 13 – Nov. 19:
Preliminary preparation of slides.
5.5 Nov. 20 – Nov. 26:
Preparation of the final report and last adjustments of the learning agents.
5.6 Nov. 27 – Dec. 2:
Polishing of report and slides.
6. References
[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.
[anders02] Chuck Anderson. Robust Reinforcement Learning with Static and Dynamic Stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.
[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 121ff.
[d’Inverno01] Mark d’Inverno, Michael Luck. Understanding Agent Systems. [PUB?] 2001.
[fla-gei01] Stephan Flake, Christian Geiger, Jochen M. Kuster. Towards UML-based analysis and design of multi-agent systems. International NAISO Symposium on Information Science Innovations in Engineering of Natural and Artificial Intelligent Systems (ENAIS’2001), Dubai, March 2001.
[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996. www.msci.memphis.edu/~franklin/AgentProg.html
[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp. 79-120.
[jac-byl] Ivar Jacobson and Stefan Bylund. A multi-agent system assisting software developers. Downloaded.
[Knapik98] Michael Knapik, Jay Johnson. Developing Intelligent Agents for Distributed Systems, 1998
[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B, MIT Press, 1990, pp. 1158-1199.
[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.
[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.
[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence. IEEE Expert, December 1996.
[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.
[SAG97] Software Agents Group, MIT Media Laboratory. “CHI97 Software Agents Tutorial”, http://pattie.www.media.mit.edu/people/pattie/CHI97/.
[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp. 201-258.
[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp. 259-298.
[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science Press, 1994.
[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.
[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, Keith Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.
[venners97] Bill Venners. The architecture of aglets. Java World, April, 1997.
[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall. A note on distributed computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.
[weiss99] Gerhard Weiss, Ed. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.
[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp. 27-77.
Reference to get title, author:
[xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.
Appendix A: Risks
Our objectives include avoiding several possible risks, including (1) the construction
of “toy worlds,” i.e., problem specifications tailored to the envisioned solution;
(2) complexity of design without performance gain; (3) overfitting the generalizable
components to the specific problem at hand, putting reusability at risk; (4) premature
commitment to a specific solution (Q-learning) as opposed to exploration of various
alternatives.
Appendix B: Categories of agent computing
A wide range of agent types exists.
Interface agents are computer programs that employ artificial intelligence
techniques to provide active assistance to a user with computer-based tasks.
Mobile agents are software processes capable of moving around networks
such as the World Wide Web, interacting with hosts, gathering information
on behalf of their owner and returning with requested information that is
found.
Co-operative agents can communicate with, and react to, other agents in a
multi-agent system within a common environment. Such an agent's view of
its environment might be very narrow due to its limited sensory capacity. Co-
operation exists when the actions of an agent achieve not only the agent's
own goals, but also the goals of agents other than itself.
Reactive Agents do not possess internal symbolic models of their
environment. Instead, the reactive agent “reacts” to a stimulus or input that is
governed by some state or event in its environment. This environmental event
triggers a reaction or response from the agent.
The application field of agent computing includes economics, business (commercial
databases), management, telecommunications (network management) and e-societies (as
for e-commerce). Techniques from databases, statistics, and machine learning are widely
used in agent applications. In the telecommunication field, agent technology is used to
support efficient (in terms of both cost and performance) service provision to fixed and
mobile users in competitive telecommunications environments.
Appendix C: Q-learning Algorithm
With a known model (M below) of the learner’s transition probabilities given a state
and an action, the following constraint equation holds for Q-values, where a and a′ are
actions, i and j states, M^a_ij the probability of reaching state j by taking action a in
state i, and R a reward:
Q(a, i) = R(i) + Σ_j M^a_ij max_a′ Q(a′, j)
Using the temporal-difference learning approach, which does not require a model,
we have the following update formula, applied with learning rate α after the learner
goes from state i to state j:
Q(a, i) ← Q(a, i) + α (R(i) + max_a′ Q(a′, j) − Q(a, i))
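This update translates directly into code. A minimal Java sketch follows; the integer encoding of states and actions, and the field names, are our assumptions for illustration:

```java
/** One temporal-difference Q-learning update, following the formula above:
 *  Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i)). */
class QLearner {
    final double[][] q;     // q[state][action], initially all zero
    final double alpha;     // learning rate

    QLearner(int states, int actions, double alpha) {
        this.q = new double[states][actions];
        this.alpha = alpha;
    }

    /** Apply the update after taking action a in state i, observing
     *  reward r and landing in state j. */
    void update(int i, int a, double r, int j) {
        double maxNext = q[j][0];               // max_a' Q(a', j)
        for (int a2 = 1; a2 < q[j].length; a2++) {
            if (q[j][a2] > maxNext) maxNext = q[j][a2];
        }
        q[i][a] += alpha * (r + maxNext - q[i][a]);
    }
}
```

For example, with α = 0.5 and a successor state whose best Q-value is 1.0, an entry currently at 0.0 with zero reward moves halfway toward 1.0, i.e. to 0.5.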
Within the objective of a simple implementation, we will aim to provide an analysis
of the time complexity, adaptability to dynamic environments, and scalability of
Q-learning agents as compared to more primitive reinforcement learners.