romaric charton maia team - umass workshop wednesday, 23 rd june, 2004

Learning Mediation Strategies in heterogeneous Multiagent Systems

Application to adaptive services

Romaric CHARTON

MAIA Team - UMASS Workshop

Wednesday, 23rd June, 2004

2/18

Presentation Overview

• Research and application fields

• heterogeneous Multiagent Systems (h-MAS)

• Typical example of interaction

• Markov Decision Process based Mediation

• Experiments

• Works in progress

3/18

Research fields

Domain : heterogeneous Multiagent Systems (h-MAS)

• Learning behaviours of agents that interact with human beings

• Organization of agents with different nature

Approach :

• Inspiration from the Agent-Group-Role model (Gutknecht and Ferber 1998)

• Deal with real applications

– dynamic environments

– uncertainty

– incomplete knowledge

Use of Stochastic Models (MDP) + Reinforcement Learning

4/18

Application Domains : Interactive services

Interaction with humans in real applications

• Provided over computers and networks

• Use of various communication media (telephone, e-mail, web, etc.)

• Examples : order online, search for information, manage shares, etc.

(focus on Information Search services)

From Classical Interactive Services

Most of the time controlled with handwritten finite state machines (static scripts)

• Complexity (particular cases and errors)

• Need for implicit / expert knowledge (for instance : the user model)

To Adaptive Services

• Ease the design and the control of interactions

• Robustness of the solution (particular cases, unforeseen cases, etc.)

• Adapt the interaction to the user's behaviour, characteristics and preferences

5/18

Heterogeneous Multiagent Systems (h-MAS)

Common features

• Bounded Rationality Agents (Russell and Norvig 1995)

• Ability to communicate and to manage knowledge and resources

Partition of the agent set according to

• Their nature (human, software, etc.)

• Their subjective "confidence" (knowledge and influence on the others : goal delegation, ...)

Problems

• How to bridge the language gap ?

• How to match needs to capabilities ?

• What if agents cannot be modified ?

• What if some agents are human beings (Grislin-LeSturgeon and Peninou 1998) ?

Our solution: add a Mediator Agent that will manage the interaction

6/18

An Information Search problem : Flight booking

Customer (occasional, novice)

• Goal : book a flight from Paris to Moscow

• "Don't know how to formulate a request"

• "Too many/raw results..."

[Diagram: Customer <- Interaction -> Mediator <- Query / Results -> Information Source (not owned, cost)]

Objective : Enhance the service quality relative to classical search

7/18

Role of the Mediator Agent

Its goals

• Build a query that best matches the user's goal

• Provide relevant results to the user

• Maximize its utility (user satisfaction - source costs)

At any time, it can

• Ask the user about the query,

• Send the query to the information source or

• Propose a limited number of results to the user

In return, it perceives the other agents' answers (values, results, selections, rejections, etc.)

It has to manage uncertainty and incomplete knowledge :

• From users (misunderstandings, partial knowledge of their needs)

• From the environment (noise and imperfect sensors)

8/18

MDP based Interaction Control

Need to define : < S, A, T, R >

• S : State space

• A : Mediator actions

• T : Transition function

• R : Reward function

[Diagram: the Mediator sends actions A to its environment (the User and the Source) and perceives states S and rewards R; T describes the transitions of the interaction sequence, which is the MDP to control.]

Proposition :

Control an interaction sequence as a Markov Decision Process (MDP), in order to find Mediation Strategies (MDP Policies)

Problem : T and R depend on user and source agents !

Solution : Learn the mediation strategy online by reinforcement

Choice : Q-Learning (Watkins 1989)
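As an illustration of the learning choice above, here is a minimal sketch of a tabular Q-Learning update with epsilon-greedy action selection. The function names, the learning rate `alpha`, the discount factor `gamma` and the exploration rate `epsilon` are assumptions made for this example; the slides do not give the values used by the mediator.

```python
import random
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, next_actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q(state, action).

    alpha and gamma are illustrative defaults, not values from the slides.
    """
    # Value of the best action available in the next state (0.0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Move Q(state, action) toward the one-step bootstrapped target.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Q-table with a default value of 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)
```

Because the update only needs observed transitions and rewards, T and R never have to be modelled explicitly, which is what makes the approach usable when they depend on unknown user and source agents.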

9/18

State Space of Interaction Sequences

Full state space = $U \times R$

Interaction with the user

U : set of partial user queries

Current partial query (attribute values)

$U = \{ (ea_1, val_1), \ldots, (ea_m, val_m) \}$

Attribute state ea :

• ‘?’ val is unknown

• ‘A’ val is assigned

• ‘F’ val cannot be specified

Interaction with the source

R : power set of all source objects

Known objects matching the current query

$R = \{ flight_1, \ldots, flight_r \}$ or {unknown}

Complexity Problem !

$|U \times R| = (2^n + 1)(2 + i)^m$

n : total number of source objects

m : number of attributes

i : average value count per attribute

Idea : use a State Abstraction for the MDP
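To make the complexity problem concrete, the size formula of the full state space can be evaluated directly. This is only a sketch: the function name is hypothetical and the values of n, m and i used in the example are invented for illustration, not taken from the experiments.

```python
def concrete_state_count(n: int, m: int, i: int) -> int:
    """Size of the full state space: (2^n + 1) * (2 + i)^m.

    n: total number of source objects
    m: number of attributes
    i: average value count per attribute
    """
    source_part = 2 ** n + 1    # every subset of the n objects, plus "unknown"
    user_part = (2 + i) ** m    # per attribute: '?', 'F', or one of i values
    return source_part * user_part


# Hypothetical example: 20 flights, 3 attributes, 10 values per attribute.
example_size = concrete_state_count(n=20, m=3, i=10)
```

The exponential term in n dominates, so even a small information source makes a tabular representation of this space impractical.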

10/18

Abstract State Space (used for the MDP)

$S = S_U \times S_R$

Interaction with the user

$S_U$ : set of user query formulation states

Current partial query formulation state

$s_U = \{ ea_1, \ldots, ea_m \}$

Attribute state ea :

• ‘?’ val is unknown

• ‘A’ val is assigned

• ‘F’ val cannot be specified

Interaction with the source

$S_R = \{?, 0, +, *\}$ : Quantity Classes

Response quantity for the current query

$s_R = qr(|R|)$

• qr = ? : R is unknown

• qr = 0 : $|R| = 0$

• qr = + : $0 < |R| \le nrmax$

• qr = * : $|R| > nrmax$

$|S| = 4 \cdot 3^m$, with m : number of attributes

A more tractable state space !
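The abstraction described on this slide can be sketched as a small mapping function. The names `qr`, `abstract_state` and `nr_max` are chosen for this example, and treating a count exactly equal to the threshold as class "+" is an assumption; the slide only shows the threshold on an axis.

```python
from typing import Optional, Sequence, Tuple

ATTRIBUTE_STATES = ("?", "A", "F")  # unknown, assigned, cannot be specified


def qr(result_count: Optional[int], nr_max: int) -> str:
    """Map the number of known matching objects to a quantity class."""
    if result_count is None:    # the source has not been queried yet
        return "?"
    if result_count == 0:
        return "0"
    if result_count <= nr_max:  # assumption: the threshold itself is acceptable
        return "+"
    return "*"                  # too many results


def abstract_state(attribute_states: Sequence[str],
                   result_count: Optional[int],
                   nr_max: int) -> Tuple[Tuple[str, ...], str]:
    """Build the abstract state (s_U, s_R) from the concrete observation."""
    for ea in attribute_states:
        if ea not in ATTRIBUTE_STATES:
            raise ValueError(f"unknown attribute state: {ea!r}")
    return tuple(attribute_states), qr(result_count, nr_max)
```

Attribute values and the identity of the matching objects are dropped by this mapping, which is why the abstract space grows with m only and no longer with n or i.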

11/18

Actions and Rewards

Actions of the mediator

• Ask the user a question about an attribute (valuation, proposition, confirmation)

• Send the current query to the information source

• Ask the user to select a response

Rewards can be obtained through interaction

• with the user

– $+R_{selection}$ : the user selects a proposition

– $-R_{timeout}$ : the interaction is too long (user disconnection / time limit)

• with the information source

– $+R_{noresp}$ : no results for a fully specified query

– $-R_{overnum}$ : too many results (response quantity $s_R = *$)
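The action set and reward signals listed above can be sketched as plain data. This example assumes that each of the three question kinds (valuation, proposition, confirmation) is a separate action per attribute, which is one reading of the slide. The reward magnitudes are placeholders: only their signs come from the presentation.

```python
QUESTION_KINDS = ("valuation", "proposition", "confirmation")


def mediator_actions(attributes):
    """Enumerate mediator actions for a list of attribute names."""
    # One question action per (attribute, question kind) pair (assumed reading).
    actions = [("ask", kind, attr)
               for attr in attributes
               for kind in QUESTION_KINDS]
    actions.append(("send_query",))     # send the current query to the source
    actions.append(("ask_selection",))  # ask the user to select a response
    return actions


# Signs follow the slide; the magnitudes (1.0) are hypothetical placeholders.
REWARDS = {
    "selection": +1.0,  # user selects a proposition
    "noresp": +1.0,     # no results for a fully specified query
    "timeout": -1.0,    # interaction too long
    "overnum": -1.0,    # too many results
}

flight_actions = mediator_actions(["departure", "arrival", "class"])
```

With three attributes this enumeration yields 11 actions.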

12/18

Experimentation : Flight booking

Training of the mediator on tasks with

• 3 attributes (cities of departure/arrival and flight class)

• 4 attributes (+ the time of day for taking off)

• 5 attributes (+ the airline)

Complexity growth as a function of the number of attributes :

# of attributes (m)    # of abstract states (4·3^m)    # of actions (3m+2)    # of Q-Values ((12m+8)·3^m)
3                      108                             11                     1 188
4                      324                             14                     4 536
5                      972                             17                     16 524
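The complexity figures for 3, 4 and 5 attributes can be re-derived from the formulas given in the column headers. The helper name below is hypothetical; the formulas are the ones stated on the slide.

```python
def complexity_row(m: int):
    """Return (m, abstract states, actions, Q-values) for m attributes."""
    states = 4 * 3 ** m          # 4 quantity classes x 3 states per attribute
    actions = 3 * m + 2          # number of mediator actions
    q_values = states * actions  # equals (12m + 8) * 3^m
    return m, states, actions, q_values


rows = [complexity_row(m) for m in (3, 4, 5)]
for m, states, actions, q_values in rows:
    print(f"m={m}: {states} states, {actions} actions, {q_values} Q-values")
```

The Q-table grows by a factor of roughly 3.6 to 3.8 for each added attribute, consistent with the slower convergence reported for 5 attributes.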

13/18

Learning results : Flight booking

• 3 / 4 attributes : 99% of success, minimal mediation length reached

• 5 attributes : more time required to converge and longer mediation

[Plot "Successful mediations": % of successful mediations (selection / no response), from 0 to 100, against the number of iterations, from 0 to 200 000; one curve each for 3, 4 and 5 attributes.]

[Plot "Average interaction length": average mediation length (number of actions per mediation), from 0 to 50, against the number of iterations, from 0 to 200 000; one curve each for 3, 4 and 5 attributes.]

14/18

Conclusion

Mediation Strategies in h-MAS

• Reinforcement learning of mediation strategies is possible

• Answers users' needs (those of the majority, but also particular ones, through profiles)

Software model

• Towards "user oriented" design (utility based on user's satisfaction)

• Implementation of a Mediator prototype

Limits

• Limited richness of the learning due to the simulated answer generator

• User is at most partially observable

• Degradation of performance for more complex tasks

15/18

Current Works

Deal with Partial Observation

• Challenge : Get rid of the ad-hoc state space abstraction

• Key question : "What must be kept in / from the interaction history ?"

• Study of memory based approaches :

– HQ-Learning (Wiering and Schmidhuber 1997)

– U-Trees (McCallum 1995)

– ...

Deal with structured tasks

• Challenge : Reduced state space complexity, better guidance ... and service composition ?

• Main idea : Exploit or discover the task structure (sub-tasks, dependencies, etc.)

• Hierarchical models are promising

– MAX-Q (Dietterich 2000) / HEX-Q (Hengst 2002)

– HAM (Parr 1998) / PHAM (Andre and Russell 2000)

– H-MDP and H-POMDP

– ...

16/18

References

(Andre and Russell 2000) Andre D. and Russell S. J., Programmable Reinforcement Learning Agents. In NIPS, 2000.

(Dietterich 2000) Dietterich T. G., An overview of MAXQ hierarchical reinforcement learning. In SARA, 2000.

(Ferber 1995) Ferber J., Les Systèmes Multi-Agents. Vers une intelligence collective. Interéditions, 1995.

(Gutknecht and Ferber 1998) Gutknecht O. and Ferber J., Un méta-modèle organisationnel pour l'analyse, la conception et l'exécution de systèmes multi-agents. In JFIADSMA'98, pp. 267, 1998.

(Grislin-LeSturgeon and Peninou 1998) Grislin-Le Sturgeon E. and Péninou A., Les interactions Homme-SMA : réflexions et problématiques de conception. Systèmes Multi-Agents de l'interaction à la Socialité. In JFIADSMA'98, Hermès, pp. 133-145, 1998.

(Hengst 2002) Hengst B., Discovering Hierarchy in Reinforcement Learning with HEXQ. In ICML, pp. 243-250, Sydney, Australia, 2002.

(McCallum 1995) McCallum A. K., Reinforcement Learning with selective Perception and Hidden State. PhD Thesis, University of Rochester, New York, 1995.

(Parr 1998) Parr R. E., Hierarchical Control and Learning for Markov Decision Processes. PhD Thesis, University of California, Berkeley, 1998.

(Russell and Norvig 1995) Russell S. and Norvig P., Artificial Intelligence: A Modern Approach, The Intelligent Agent Book. Prentice Hall, 1995.

(Watkins 1989) Watkins C., Learning from Delayed Rewards. PhD Thesis, King's College, University of Cambridge, England, 1989.

(Wiering and Schmidhuber 1997) Wiering M. and Schmidhuber J., HQ-Learning. In Adaptive Behavior 6:2, 1997.

17/18

Thank you for your attention.

Any questions ?
