Learning Mediation Strategies in heterogeneous Multiagent Systems
Application to adaptive services
Romaric CHARTON
MAIA Team - UMASS Workshop
Wednesday, 23rd June, 2004
2/18
Presentation Overview
• Research and application fields
• heterogeneous Multiagent Systems (h-MAS)
• Typical example of interaction
• Markov Decision Process based Mediation
• Experiments
• Work in progress
3/18
Research fields
Domain: heterogeneous Multiagent Systems (h-MAS)
• Learning behaviours of agents that interact with human beings
• Organization of agents of different natures
Approach:
• Inspired by the Agent-Group-Role model (Gutknecht and Ferber 1998)
• Deal with real applications
– dynamic environments
– uncertainty
– incomplete knowledge
Use of Stochastic Models (MDP) + Reinforcement Learning
4/18
Application Domains: Interactive Services
Interaction with humans in real applications
• Provided on computers and networks
• Use of various communication media (telephone, e-mail, web, etc.)
• Examples: ordering online, searching for information, managing shares, etc.
(focus on Information Search services)
From Classical Interactive Services
Most of the time controlled by handwritten finite state machines (static scripts)
• Complexity (special cases and errors)
• Need for implicit / expert knowledge (for instance: the user model)
To Adaptive Services
• Ease the design and the control of interactions
• Robustness of the solution (special cases, unforeseen cases, etc.)
• Adapt the interaction to the user's behaviour, characteristics and preferences
5/18
Common features
• Agents with bounded rationality (Russell and Norvig 1995)
• Ability to communicate and to manage knowledge and resources
Partition of the agent set according to
• Their nature (human, software, etc.)
• Their subjective "confidence" (knowledge and influence on the others: goal delegation, ...)
Heterogeneous Multiagent Systems (h-MAS)
Problems
• How to bridge the language gap?
• How to match needs to capabilities?
• What if agents cannot be modified?
• What if some agents are human beings (Grislin-LeSturgeon and Peninou 1998)?
Our solution: add a Mediator Agent that will manage the interaction
6/18
An Information Search Problem: Flight booking
Goal: book a flight from Paris to Moscow
[Figure: a Customer (occasional, novice) in interaction with a Mediator, which sends a Query to an Information Source (not owned, with a cost) and gets Results back; customer difficulties shown: "Don't know how to formulate a request", "Too many/raw results..."]
Objective: enhance the service quality relative to classical search
7/18
Role of the Mediator Agent
Its goals
• Build the query that best matches the user's goal
• Provide relevant results to the user
• Maximize its utility (user satisfaction - source costs)
At any time, it can
• Ask the user about the query,
• Send the query to the information source, or
• Propose a limited number of results to the user
In return, it perceives the other agents' answers (values, results, selections, rejections, etc.)
It has to manage uncertainty and incomplete knowledge:
• From users (misunderstandings, partial knowledge of their needs)
• From the environment (noise and imperfect sensors)
8/18
MDP-based Interaction Control
[Figure: the Mediator and its environment (User and Source agents), annotated with S, R, A and T; the interaction sequence is the MDP to control]
Need to define: < S, A, T, R >
• S: state space
• A: Mediator actions
• T: transition functions
• R: reward function
Proposition:
Control an interaction sequence as a Markov Decision Process (MDP) and
find mediation strategies (MDP policies)
Problem: T and R depend on the user and source agents, so they are not known in advance!
Solution: learn the mediation strategy online by reinforcement learning
Choice: Q-Learning (Watkins 1989)
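As a reminder of how the chosen algorithm works, here is a minimal tabular Q-learning sketch in Python. It is not the prototype's code: states and actions can be any hashable values, and the defaults for alpha (learning rate), gamma (discount) and epsilon (exploration rate) are illustrative assumptions.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch).

    The alpha, gamma and epsilon defaults are assumptions, not values from the talk.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # probability of a random (exploratory) action
        self.q = defaultdict(float)  # Q[(state, action)], 0.0 when never visited
        self.rng = random.Random(seed)

    def choose(self, state):
        """Pick a random action with probability epsilon, else a greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, terminal=False):
        """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        target = reward
        if not terminal:
            # Bootstrap from the best estimated value of the next state.
            target += self.gamma * max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]
```

Q-learning only needs sampled transitions (s, a, r, s'), so the Mediator never has to know T or R explicitly; this is what makes it usable when both depend on the user and source agents.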
9/18
State Space of Interaction Sequences
$\Omega = U \times R$
Interaction with the source
R: element of the power set of all source objects, or unknown
Known objects matching the current query
$R = \{flight_1, \dots, flight_r\}$ or {unknown}
Complexity problem!
$|\Omega| = (2^n + 1)(2 + i)^m$
n: total number of source objects
m: number of attributes
i: average number of values per attribute
Interaction with the user
U: set of partial user queries
Attribute state ea:
• ‘?’ val is unknown
• ‘A’ val is assigned
• ‘F’ val cannot be specified
Current partial query (attribute values)
$U = \{(ea_1, val_1), \dots, (ea_m, val_m)\}$
Idea: use a state abstraction for the MDP
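To see why the concrete state space is a complexity problem, the count can be evaluated directly. In this Python sketch the values of n, m and i are hypothetical, and i is treated as an exact integer rather than an average. Each attribute is unknown, unspecifiable, or assigned one of i values (2 + i choices), and R is any subset of the n source objects or unknown (2^n + 1 choices).

```python
def concrete_state_count(n: int, m: int, i: int) -> int:
    """Size of the concrete state space |Omega| = (2**n + 1) * (2 + i)**m.

    n: total number of source objects; R is any subset of them, or 'unknown'
       -> 2**n + 1 possibilities.
    m: number of attributes; each is '?' (unknown), 'F' (cannot be specified)
       or 'A' with one of i values -> (2 + i) possibilities per attribute.
    """
    return (2 ** n + 1) * (2 + i) ** m


# Hypothetical sizes, chosen only to show how fast the space grows.
print(concrete_state_count(n=10, m=3, i=5))
print(len(str(concrete_state_count(n=1000, m=3, i=5))), "digits")
```

With only 10 source objects the count is already 351,575; with 1,000 objects it has 304 digits, far beyond what a tabular method can store.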
10/18
Abstract State Space (used for the MDP)
$S = S_U \times S_R$ (interaction with the user × interaction with the source)
S_U: set of user query formulation states
Current partial query formulation state: $s_U = \{ea_1, \dots, ea_m\}$
Attribute state ea:
• ‘?’ val is unknown
• ‘A’ val is assigned
• ‘F’ val cannot be specified
S_R: response quantity for the current query, $s_R = qr(|R|)$
S_R = {?, 0, +, *} (quantity classes):
$qr(|R|) = \begin{cases} ? & \text{if } |R| \text{ is unknown} \\ 0 & \text{if } |R| = 0 \\ + & \text{if } 0 < |R| \le nr_{max} \\ * & \text{if } |R| > nr_{max} \end{cases}$
$|S| = 4 \cdot 3^m$ (m: number of attributes)
A more tractable state space!
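The abstraction keeps only the state of each attribute (not its value) and the quantity class of the response. The Python sketch below shows one way to compute it; representing an unknown |R| as None and making the '+' class include exactly nrmax results are assumptions, as is the threshold value used in the example.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

ATTRIBUTE_STATES = ("?", "A", "F")       # unknown, assigned, cannot be specified
QUANTITY_CLASSES = ("?", "0", "+", "*")  # S_R


def qr(result_count: Optional[int], nr_max: int) -> str:
    """Quantity class of the current response.

    result_count is None while |R| is unknown (assumed representation).
    Assumption: '+' covers 1..nr_max results inclusive, '*' anything above.
    """
    if result_count is None:
        return "?"
    if result_count == 0:
        return "0"
    return "+" if result_count <= nr_max else "*"


def abstract_state(attribute_states: Sequence[str],
                   result_count: Optional[int],
                   nr_max: int) -> Tuple[Tuple[str, ...], str]:
    """Abstract state s = (s_U, s_R): attribute values are dropped,
    only their states and the response quantity class are kept."""
    return tuple(attribute_states), qr(result_count, nr_max)


def all_abstract_states(m: int):
    """Enumerate S = S_U x S_R, which has 3**m * 4 elements."""
    return [(s_u, s_r)
            for s_u in product(ATTRIBUTE_STATES, repeat=m)
            for s_r in QUANTITY_CLASSES]
```

For example, with a hypothetical nr_max = 10, a query whose departure and arrival cities are assigned, whose flight class is still unknown, and which matches 42 flights maps to (('A', 'A', '?'), '*').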
11/18
Actions and Rewards
Actions of the mediator
• Ask the user a question about an attribute (valuation, proposition, confirmation)
• Send the current query to the information source
• Ask the user to select a response
Rewards can be obtained through interaction
• with the user
  + R_selection: the user selects a proposition
  - R_timeout: interaction too long (user disconnection / time limit)
• with the information source
  + R_noresp: no results for a fully specified query
  - R_overnum: too many results (response quantity s_R = *)
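The action set and reward signs above can be written down directly. In this Python sketch the attribute names, event names and reward magnitudes are hypothetical: the slide gives only the sign of each reward. Giving 0 to every other step is also an assumption. Counting one question of each type per attribute, plus the two other actions, yields 3m + 2 actions.

```python
QUESTION_TYPES = ("valuation", "proposition", "confirmation")

# Hypothetical reward magnitudes: the slide only gives the sign of each reward.
REWARDS = {
    "selection": +10.0,  # the user selects a proposition
    "timeout": -10.0,    # interaction too long (disconnection / time limit)
    "noresp": +5.0,      # no results for a fully specified query
    "overnum": -1.0,     # too many results (s_R = '*')
}


def mediator_actions(attributes):
    """One question of each type per attribute, plus 'send query' and
    'ask the user to select a response': 3*m + 2 actions in total."""
    actions = [("ask", attr, q) for attr in attributes for q in QUESTION_TYPES]
    actions.append(("send_query",))
    actions.append(("ask_selection",))
    return actions


def reward(event: str) -> float:
    """Reward for an interaction event; any other step yields 0 (assumption)."""
    return REWARDS.get(event, 0.0)


actions = mediator_actions(["departure", "arrival", "class"])
print(len(actions))  # 11
```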
12/18
Experimentation: Flight booking
Training of the mediator on tasks with
• 3 attributes (cities of departure/arrival and flight class)
• 4 attributes (+ the time of day for taking off)
• 5 attributes (+ the airline)
Complexity growth as a function of the number of attributes:
# of attributes (m) | # of abstract states (4·3^m) | # of actions (3m+2) | # of Q-values ((12m+8)·3^m)
3 | 108 | 11 | 1 188
4 | 324 | 14 | 4 536
5 | 972 | 17 | 16 524
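The columns of this table follow from the formulas in their headers: the Q-table holds one value per (abstract state, action) pair, so its size is 4·3^m × (3m+2) = (12m+8)·3^m. The short Python check below recomputes the three rows.

```python
from typing import Tuple


def complexity(m: int) -> Tuple[int, int, int]:
    """(# abstract states, # actions, # Q-values) for m attributes."""
    states = 4 * 3 ** m          # |S_R| * |S_U| = 4 * 3^m
    actions = 3 * m + 2          # 3 question types per attribute + 2 other actions
    q_values = states * actions  # one Q-value per (state, action) = (12m + 8) * 3^m
    return states, actions, q_values


for m in (3, 4, 5):
    print(m, *complexity(m))
# 3 108 11 1188
# 4 324 14 4536
# 5 972 17 16524
```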
13/18
Learning Results: Flight booking
• 3 / 4 attributes: 99% success, minimal mediation length reached
• 5 attributes: more time required to converge and longer mediations
[Figure, left panel "Successful mediations": % of successful mediations (selection / no response) vs. number of iterations (0 to 200,000), for 3, 4 and 5 attributes]
[Figure, right panel "Average interaction length": average mediation length (number of actions per mediation) vs. number of iterations (0 to 200,000), for 3, 4 and 5 attributes]
14/18
Conclusion
Mediation Strategies in h-MAS
• Reinforcement learning of mediation strategies is possible
• Answers users' needs (those of the majority, but also particular ones, through profiles)
Software model
• Towards "user-oriented" design (utility based on user satisfaction)
• Implementation of a Mediator prototype
Limits
• Limited richness of the learning due to the simulated answer generator
• The user is at most partially observable
• Performance degrades for more complex tasks
15/18
Current Work
Deal with Partial Observation
• Challenge: get rid of the ad-hoc state space abstraction
• Key question: "What must be kept in / from the interaction history?"
• Study of memory-based approaches:
– HQ-Learning (Wiering and Schmidhuber 1997)
– U-Trees (McCallum 1995)
– ...
Deal with structured tasks
• Challenge: reduced state space complexity, better guidance ... and service composition?
• Main idea: exploit or discover the task structure (sub-tasks, dependencies, etc.)
• Hierarchical models are promising:
– MAXQ (Dietterich 2000) / HEXQ (Hengst 2002)
– HAM (Parr 1998) / PHAM (Andre and Russell 2000)
– H-MDP and H-POMDP
– ...
16/18
References
(Andre and Russell 2000) Andre D. and Russell S. J., Programmable Reinforcement Learning Agents. In NIPS, 2000.
(Dietterich 2000) Dietterich T. G., An overview of MAXQ hierarchical reinforcement learning. In SARA, 2000.
(Ferber 1995) Ferber J., Les Systèmes Multi-Agents. Vers une intelligence collective. Interéditions, 1995.
(Gutknecht and Ferber 1998) Gutknecht O. and Ferber J., Un méta-modèle organisationnel pour l'analyse, la conception et l'exécution de systèmes multi-agents. In JFIADSMA'98, pp. 267, 1998.
(Grislin-LeSturgeon and Peninou 1998) Grislin-Le Sturgeon E. and Péninou A., Les interactions Homme-SMA : réflexions et problématiques de conception. Systèmes Multi-Agents de l'interaction à la Socialité. In JFIADSMA'98, Hermès, pp. 133-145, 1998.
(Hengst 2002) Hengst B., Discovering Hierarchy in Reinforcement Learning with HEXQ. In ICML, pp. 243-250, Sydney, Australia, 2002.
(McCallum 1995) McCallum A. K., Reinforcement Learning with Selective Perception and Hidden State. PhD Thesis, University of Rochester, New York, 1995.
(Parr 1998) Parr R. E., Hierarchical Control and Learning for Markov Decision Processes. PhD Thesis, University of California, Berkeley, 1998.
(Russell and Norvig 1995) Russell S. and Norvig P., Artificial Intelligence: A Modern Approach, The Intelligent Agent Book. Prentice Hall, 1995.
(Watkins 1989) Watkins C., Learning from Delayed Rewards. PhD Thesis, King's College, University of Cambridge, England, 1989.
(Wiering and Schmidhuber 1997) Wiering M. and Schmidhuber J., HQ-Learning. Adaptive Behavior 6(2), 1997.
17/18
Thank you for your attention.
Any questions?