
Page 1

Emergent Communication for Collaborative Reinforcement Learning

Yarin Gal and Rowan McAllister

MLG RCC

8 May 2014

Page 2

Game Theory

Multi-Agent Reinforcement Learning

Learning Communication

Page 3

Nash Equilibrium

“Nash equilibria are game-states s.t. no player would fare better by unilateral¹ change of their own action.”

¹ Performed by or affecting only one person involved in a situation, without the agreement of another.

Page 4

Prisoner’s Dilemma

                          Sideshow Bob
                     Cooperate    Defect
Snake   Cooperate      1,1          3,0
        Defect         0,3          2,2

(prison sentence in years)

Page 5

Pareto Efficiency

“Pareto optima are game-states s.t. no alternative state exists whereby each player would fare equal or better.”
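To make the two definitions concrete, here is a minimal Python sketch (not from the talk) that brute-forces both properties over the 2×2 Prisoner’s Dilemma payoffs from the previous slide. Payoffs are prison years, so each player prefers fewer.

```python
# A minimal sketch: brute-force check of which Prisoner's Dilemma outcomes
# are Nash equilibria and which are Pareto optimal. Payoffs are prison
# years, so lower is better.

years = {  # (snake_action, bob_action) -> (snake_years, bob_years)
    ("C", "C"): (1, 1), ("C", "D"): (3, 0),
    ("D", "C"): (0, 3), ("D", "D"): (2, 2),
}
actions = ["C", "D"]

def is_nash(a_snake, a_bob):
    # No player can reduce their own sentence by a unilateral change.
    s, b = years[(a_snake, a_bob)]
    snake_gains = any(years[(a, a_bob)][0] < s for a in actions)
    bob_gains = any(years[(a_snake, a)][1] < b for a in actions)
    return not (snake_gains or bob_gains)

def is_pareto(a_snake, a_bob):
    # No alternative outcome is equal or better for both players.
    s, b = years[(a_snake, a_bob)]
    return not any(
        s2 <= s and b2 <= b and (s2, b2) != (s, b)
        for (s2, b2) in years.values()
    )

for outcome in years:
    print(outcome, is_nash(*outcome), is_pareto(*outcome))
# Only (D, D) is a Nash equilibrium, yet it is the one outcome that is
# not Pareto optimal -- hence the dilemma.
```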

Page 6

Iterated Prisoner’s Dilemma Strategies

A_t ∈ {Cooperate (C), Defect (D)}

S_t ∈ {CC, CD, DC, DD} (previous game outcome)

π : ×_{i=2}^t S_i → A_t

Possible strategies π for Snake:

- Tit-for-Tat:

  π(s_t) = C if t = 1; a_{Bob, t−1} if t > 1

- Reinforce actions conditioned on game outcomes:

  π(s_t) = argmin_a E_T[accumulated prison years | s_t, a], updating the transition model T
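As an illustration, a minimal sketch (not from the talk) of the iterated game with the Tit-for-Tat strategy above; the always-defect opponent is an invented example for comparison.

```python
# A minimal sketch of the iterated Prisoner's Dilemma with Tit-for-Tat:
# cooperate on the first round, then copy the opponent's previous action.

years = {("C", "C"): (1, 1), ("C", "D"): (3, 0),
         ("D", "C"): (0, 3), ("D", "D"): (2, 2)}

def tit_for_tat(history):
    """history: list of (my_action, their_action) pairs from earlier rounds."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b, total_a, total_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(history_a), strategy_b(history_b)
        ya, yb = years[(a, b)]
        total_a, total_b = total_a + ya, total_b + yb
        history_a.append((a, b))
        history_b.append((b, a))
    return total_a, total_b

print(play(tit_for_tat, tit_for_tat))    # (10, 10): mutual cooperation
print(play(tit_for_tat, always_defect))  # (21, 18): exploited once, then defects
```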

Page 7

Game Theory

Multi-Agent Reinforcement Learning

Learning Communication

Page 8

Multi-Agent Reinforcement Learning

How can we learn mutually-beneficial collaboration strategies?

- Modelling: multi-agent MDPs, dec-MDPs

- Issues solving joint tasks:
  - decentralised knowledge with no centralised control,
  - credit assignment,
  - communication constraints

- Issues affecting individual agents:
  - state space explodes: O(|S|^#agents),
  - co-adaptation → dynamic non-Markov environment

Page 9

Markov Decision Process (MDP)

Stochastic environment characterised by tuple ⟨S, A, R, T, γ⟩, where:

- R : S × A × S′ → (−∞, ∞)

- T : S × A × S′ → [0, 1]

- γ ∈ [0, 1]
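For concreteness, a minimal sketch (not from the talk) of planning in such an MDP by value iteration; the two-state transition and reward tables here are invented purely for illustration.

```python
# A minimal sketch: value iteration on a small tabular MDP <S, A, R, T, gamma>
# matching the definition above. The toy two-state, two-action MDP is invented.
import numpy as np

n_states, gamma = 2, 0.9
T = np.array([  # T[s, a, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
])
R = np.array([  # R[s, a, s'] = reward
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.0, 0.0], [1.0, 0.0]],
])

V = np.zeros(n_states)
for _ in range(100):
    # Q[s, a] = sum_s' T[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
    Q = (T * (R + gamma * V)).sum(axis=2)
    V = Q.max(axis=1)
print(V, Q.argmax(axis=1))  # state values and a greedy policy
```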

Page 10

Multi-agent MDP (MMDP)

N-agent stochastic game characterised by tuple ⟨S, A, R, T, γ⟩, where:

- S = ×_{i=1}^N S_i

- A = ×_{i=1}^N A_i

- R = ×_{i=1}^N R_i , with R_i : S × A × S′ → ℝ

- T : S × A × S′ → [0, 1]

Page 11

Multi-agent Q-learning

- Oblivious agents [Sen et al., 1994]

  Q_i(s, a_i) ← (1 − α) Q_i(s, a_i) + α [R_i(s, a_i) + γ V_i(s′)]

  V*_i(s) = max_{a_i ∈ A_i} Q*_i(s, a_i)

- Common-payoff games [Claus and Boutilier, 1998]

  Q_i(s, a) ← (1 − α) Q_i(s, a) + α [R_i(s, a, s′) + γ V_i(s′)]

  V_i(s) ← max_{a_i ∈ A_i} Σ_{a_−i ∈ A∖A_i} P_i(s, a_−i) Q_i(s, a_i, a_−i)

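A minimal sketch (not the cited papers’ code) of the “oblivious agent” update above: each agent runs ordinary tabular Q-learning over its own actions, treating the other agents as part of the (now non-stationary) environment. The ε-greedy exploration is an added assumption.

```python
# A minimal sketch of the oblivious-agent update: plain per-agent tabular
# Q-learning, with the other agents folded into the environment.
import random
from collections import defaultdict

class ObliviousQAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(s, a_i)], zero-initialised
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        if random.random() < self.epsilon:   # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        # V_i(s') = max_{a_i} Q_i(s', a_i)
        v_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        # Q_i(s, a_i) <- (1 - alpha) Q_i(s, a_i) + alpha [R_i + gamma V_i(s')]
        self.Q[(s, a)] = ((1 - self.alpha) * self.Q[(s, a)]
                          + self.alpha * (r + self.gamma * v_next))
```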

Page 13

Independent vs Cooperative Learning

[Tan, 1993]: “Can N communicating agents outperform N non-communicating agents?”

Ways of communicating:

- Agents share Q-learning updates (thus syncing Q-values; sketched in code below):
  - Pro: each agent learns N-fold faster (per timestep),
  - Note: same asymptotic performance as independent agents.

- Agents share sensory information:
  - Pro: more information → better policies,
  - Con: more information → larger state space → slower learning.

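A minimal sketch (not Tan’s code) of the first scheme above: sharing Q-learning updates amounts to all N agents reading and writing a single Q-table, so the table receives N updates per timestep instead of one. The learning constants and action set are invented.

```python
# A minimal sketch: "sharing Q-learning updates" as one shared Q-table.
from collections import defaultdict

shared_Q = defaultdict(float)          # one table, visible to every hunter
alpha, gamma = 0.1, 0.9
actions = ["N", "S", "E", "W"]

def shared_update(s, a, r, s_next):
    v_next = max(shared_Q[(s_next, a2)] for a2 in actions)
    shared_Q[(s, a)] = (1 - alpha) * shared_Q[(s, a)] \
                       + alpha * (r + gamma * v_next)

# Every agent calls shared_update with its own experience each timestep,
# so all agents always act from identical, N-fold-faster-trained Q-values.
```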

Page 15

Hunter-Prey Problem

[Figure: 10 × 10 grid world with hunters and prey; axes x, y give the prey’s position relative to a hunter.]

Perceptual state: visual depth 2 (prey’s relative position). |S| = 5² + 1 = 26

R = 1.0 if a hunter catches a prey, i.e. (x_i, y_i) = (0, 0);
    −0.1 otherwise
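A minimal sketch (not from the talk) of one plausible encoding of this perceptual state: the prey’s relative position inside the 5×5 visual window, plus one extra state for “no prey in sight”, giving |S| = 5² + 1 = 26.

```python
# A minimal sketch of the hunter's perceptual state and reward.

DEPTH = 2
OUT_OF_SIGHT = 25   # index of the extra "prey not visible" state

def perceptual_state(hunter_xy, prey_xy):
    dx = prey_xy[0] - hunter_xy[0]
    dy = prey_xy[1] - hunter_xy[1]
    if abs(dx) > DEPTH or abs(dy) > DEPTH:
        return OUT_OF_SIGHT
    return (dx + DEPTH) * 5 + (dy + DEPTH)   # states 0..24

def reward(hunter_xy, prey_xy):
    return 1.0 if hunter_xy == prey_xy else -0.1

print(perceptual_state((4, 4), (4, 4)))   # 12: prey on top of the hunter
print(perceptual_state((0, 0), (9, 9)))   # 25: prey out of sight
```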

Page 16

Hunter-Prey Experiments

Experiment 1 – any hunter catches a prey:

- Baseline: 2 independent hunters, |S_i| = 5² + 1 = 26

- 2 hunters, communicating Q-value updates, |S_i| = 26

Experiment 2 – both hunters catch the same prey simultaneously:

- Baseline: 2 independent hunters, |S_i| = 26

- 2 hunters, communicating own locations, |S_i| ≈ 26 · 19² = 9386

- 2 hunters, communicating own + prey locations, |S_i| ≈ (19² + 1) · 19² = 130682

Page 17

Hunter-Prey Results

[Figure: average steps in training (cumulative) vs number of trials (50–500), comparing Independent and Same-policy hunters.]
Experiment 1: any hunter catches a prey

[Figure: average steps in training (for every 200 trials) vs number of trials (0–10000), comparing Independent, Passively-observing and Mutual-scouting hunters.]
Experiment 2: both hunters catch the same prey simultaneously

Page 18

Decentralised Sparse-Interaction MDP

[Melo and Veloso, 2011] Philosophy:

- N-agent coordination is hard since the size of the state space grows exponentially in N.

- ∴ Limit the scope of coordination to where it is probably most useful; plan and learn w.r.t. ‘local’ agent-agent interactions only.

The Dec-SIMDP framework determines when and how agents i and j coordinate vs act independently.

‘Decentralised’ = full joint state observability, but not full individual state observability (agent i only observes S_i plus nearby agents).

Page 19

Dec-SIMDP: Reducing Joint State Space

[Figure: two diagrams of the joint state space S_1 × S_2 — one coupled everywhere (global coupling), one coupled only in a small interaction region (local coupling only).]

Page 20

Dec-SIMDP: A Navigation Task

Navigation task: coordination necessary only when crossing the narrow doorway.

S_i = {1, ..., 20, D}

A_i = {N, S, E, W}

Z_i = S_i ∪ {6, 15, D} × {6, 15, D}

R(s, a) = 2 if s = (20, 9);
          1 if s_1 = 20 or s_2 = 9;
          −20 if s = (D, D);
          0 otherwise
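A minimal sketch of the reward above, assuming (as the reward table suggests, though the slide does not say so explicitly) that cell 20 is agent 1’s goal, cell 9 is agent 2’s goal, and D is the doorway cell.

```python
# A minimal sketch of the navigation-task reward: joint state s = (s1, s2),
# each agent occupies a cell 1..20 or the doorway cell "D".

def reward(s):
    s1, s2 = s
    if s == (20, 9):          # both agents at their goals
        return 2
    if s == ("D", "D"):       # collision in the narrow doorway
        return -20
    if s1 == 20 or s2 == 9:   # exactly one agent at its goal
        return 1
    return 0

print(reward((20, 9)), reward(("D", "D")), reward((20, 3)), reward((5, 5)))
# 2 -20 1 0
```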

Page 21

Sparse Interaction [video]

Four interconnected modular robots cooperate to change configuration: line → ring

Page 22

Teammate Modelling

[Mundhe and Sen, 2000]

Page 23

Credit Assignment

How should individual agents be credited w.r.t. total team performance (or utility)?

Page 24

Communication

“Shall we both choose to cooperate next round?”

“OK.”

                          Sideshow Bob
                     Cooperate    Defect
Snake   Cooperate      1,1          3,0
        Defect         0,3          2,2

(prison sentence in years)

Page 25

Unknown Languages

“ ?”

“What?”

                             Alien
                     Cooperate    Defect
Snake   Cooperate      1,1          3,0
        Defect         0,3          2,2

(prison sentence in years)

Page 26

Game Theory

Multi-Agent Reinforcement Learning

Learning Communication

Page 27

Learning communication

- How learning communication can help in RL collaboration

- Approaches to learning communication (ranging from linguistically motivated to a pragmatic view)

- What problems exist with learning communication?

Page 28

Learning communication for collaboration

How can learning communication help in RL collaboration?

- Forgoes expensive expert time for protocol planning

- Allows for a decentralised system without an external authority to decide on a communication protocol

- Life-long learning (adaptive tasks, e.g. future-proofed robots)


Page 31

Approaches to learning communication

From linguistic motivation to a pragmatic view – emergent languages

- Emergent languages:
  - Pidgin – a simplified language developed for communication between groups that do not have a common language
  - Creole – a pidgin language nativised by children as their primary language, e.g. Singlish

Page 32

Approaches to learning communication

From linguistic motivation to a pragmatic view – computational models

- A computational model for emergent languages should account for
  - polysemy (a word might have different meanings),
  - synonymy (a meaning might have different words),
  - ambiguity (two agents might associate different meanings to the same word),
  - and be open (agents may enter or leave the population, new words might emerge to describe meanings).


Page 36

Approaches to learning communication

From linguistic motivation to a pragmatic view – computational models

- [Steels, 1996] constructs a model in which words map to features of an object

Page 37

Approaches to learning communication

From linguistic motivation to a pragmatic view – computational models

- Agents learn each other’s word-feature mappings by selecting an object and describing one of its distinctive features

Page 38

Approaches to learning communication

From linguistic motivation to a pragmatic view – computational models

- An agent’s word-feature mapping is reinforced when both agents use the same word to identify a distinctive feature of the object
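A minimal sketch (loosely after [Steels, 1996]; the scoring constants and agent interface are invented) of this reinforcement loop: a speaker names a distinctive feature, and the word-feature association is strengthened in both lexicons when speaker and hearer agree.

```python
# A minimal sketch of a Steels-style naming game over word-feature mappings.
import random
from collections import defaultdict

class LexiconAgent:
    def __init__(self):
        self.score = defaultdict(float)   # (word, feature) -> association strength

    def word_for(self, feature):
        candidates = [(s, w) for (w, f), s in self.score.items() if f == feature]
        return max(candidates)[1] if candidates else None

    def interpret(self, word):
        candidates = [(s, f) for (w, f), s in self.score.items() if w == word]
        return max(candidates)[1] if candidates else None

def naming_game(speaker, hearer, feature):
    # Speaker picks its best word for the feature, inventing one if needed.
    word = speaker.word_for(feature) or "w%d" % random.randrange(10**6)
    if hearer.interpret(word) == feature:
        # Success: strongly reinforce the shared word-feature association.
        speaker.score[(word, feature)] += 1.0
        hearer.score[(word, feature)] += 1.0
    else:
        # Failure: both adopt the association weakly, so the pair can
        # succeed (and reinforce) in later games.
        speaker.score[(word, feature)] += 0.1
        hearer.score[(word, feature)] += 0.1

a, b = LexiconAgent(), LexiconAgent()
for _ in range(50):
    naming_game(a, b, random.choice(["red", "round", "large"]))
print(a.word_for("red") == b.word_for("red"))  # usually True after enough games
```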

Page 39

Approaches to learning communication

From linguistic motivation to a pragmatic view – formal framework

- Using RL we can formalise the ideas above
- For example, [Goldman et al., 2007] establish a formal framework where agents using different languages learn to coordinate
- In this framework a state space S describes the world,
- A_i describes the actions the i’th agent can perform,
- F_i(s) is the probability that agent i is in state s,
- Σ_i is the alphabet of messages agent i can communicate,
- and o_i is an observation of the state for agent i.


Page 44

Approaches to learning communication

From linguistic motivation to a pragmatic view – formal framework

- We define agent i’s policy to be a mapping from sequences (the history) of state-message pairs to actions

  δ_i : Ω* × Σ* → A_i ,

- and define a secondary mapping from sequences of state-message pairs to messages

  δ_i^Σ : Ω* × Σ* → Σ_i .

- A translation τ between languages Σ and Σ′ is a distribution over message pairs; each agent holds a distribution P_{τ,i} over translations between its own language and other agents’ languages,

- and meaning is interpreted as “what belief state would cause me to send the message I just received”.


Page 48

Learning communication: a model

[Figure: overview of the framework]

Page 49

Approaches to learning communication

From linguistic motivation to a pragmatic view – formal framework

- Several experiments were used to assess the framework.
- For example, two agents work to meet at a point in a gridworld according to a belief over the location of the other.

- Messages describing an agent’s location are exchanged, and their translations are updated depending on whether the agents meet or not.

- The optimal policies are assumed to be known before the agents try to learn how to communicate.
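A minimal sketch (not Goldman et al.’s actual algorithm) of this translation update: the agent keeps a distribution over possible translations of the other agent’s messages, and reweights it according to whether acting on a translation led to a successful meeting. The message sets and learning rate are invented.

```python
# A minimal sketch of learning a translation between two message alphabets.

my_messages = ["north", "south"]
their_messages = ["alpha", "beta"]

# P[(theirs, mine)]: probability that "theirs" translates to "mine"
P = {(t, m): 1.0 / len(my_messages) for t in their_messages for m in my_messages}

def translate(their_msg):
    """Most probable translation of the other agent's message."""
    return max(my_messages, key=lambda m: P[(their_msg, m)])

def update(their_msg, my_guess, success, lr=0.3):
    """Reinforce or penalise the guessed translation, then renormalise."""
    P[(their_msg, my_guess)] *= (1 + lr) if success else (1 - lr)
    total = sum(P[(their_msg, m)] for m in my_messages)
    for m in my_messages:
        P[(their_msg, m)] /= total

update("alpha", "north", success=True)   # the agents met after this guess
print(translate("alpha"))                # "north", now the favoured translation
```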

Page 50

Approaches to learning communication

From linguistic motivation to a pragmatic view – a pragmatic view

- Use in robotics:
  - A leader robot controlling a follower robot [Yanco and Stein, 1993]
  - Small robots pushing a box towards a source of light [Mataric, 1998]

[Figure: Leader-follower robots] [Figure: Box pushing]

Page 51

Approaches to learning communication

From linguistic motivation to a pragmatic view – a pragmatic view

- Use in robotics:
  - A leader robot controlling a follower robot

[Figure: communication diagram]

Page 52

Approaches to learning communication

From linguistic motivation to a pragmatic view – a pragmatic view

- Use in robotics:
  - A leader robot controlling a follower robot

[Figure: reinforcement regime]

Page 53

Why is learning communication difficult?

What problems exist with learning communication?

- Difficult to specify a framework
  - Many partial frameworks proposed with different approaches
- State space explosion
- Difficult to use for RL collaboration
  - No framework has been shown to improve on independent RL

These problems are not fully answered in current research.


Page 57

Up, Up and Away

Where this might go:

- Learning communication based on sparse interactions
  - Reduce state space complexity

- Selecting what to listen to in incoming communication
  - State space selection

- Cyber-warfare – better computer worms?
  - Developing unique communication protocols between cliques of agents

- Online learning of communication
  - Introducing a new agent into a system with existing agents
  - Finding the optimal policy with agents ignorant of one another, and then allowing agents to start communicating to improve collaboration

Lots to do for future research!


Page 62

References

Claus, C. and Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI, pages 746–752.

Goldman, C. V., Allen, M., and Zilberstein, S. (2007). Learning to communicate in a decentralized environment. Autonomous Agents and Multi-Agent Systems, 15(1):47–90.

Mataric, M. J. (1998). Using communication to reduce locality in distributed multiagent learning. Journal of Experimental & Theoretical Artificial Intelligence, 10(3):357–369.

Melo, F. S. and Veloso, M. (2011). Decentralized MDPs with sparse interactions. Artificial Intelligence, 175(11):1757–1789.

Mundhe, M. and Sen, S. (2000). Evolving agent societies that avoid social dilemmas. In GECCO, pages 809–816.

Sen, S., Sekaran, M., Hale, J., et al. (1994). Learning to coordinate without sharing information. In AAAI, pages 426–431.

Steels, L. (1996). Emergent adaptive lexicons. From Animals to Animats, 4:562–567.

Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337. Amherst, MA.

Yanco, H. and Stein, L. A. (1993). An adaptive communication protocol for cooperating mobile robots. In From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pages 478–485. The MIT Press, Cambridge, MA.