Comparative Evaluation of Cooperative Multi-Agent Deep Reinforcement Learning Algorithms

Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht

Autonomous Agents Research Group, School of Informatics, University of Edinburgh


Post on 21-Feb-2022

Page 2: Comparative Evaluation of Cooperative Multi-Agent Deep

1. Introduction and Motivation

2. Algorithms

3. Evaluation Environments

4. Evaluation Metrics

5. Results

6. Discussion

7. Conclusion

Page 3: Comparative Evaluation of Cooperative Multi-Agent Deep

Introduction and Motivation

● Lack of commonly used benchmarking environments

● Lack of consistent algorithm implementations

● Lack of consistent evaluation

● Measuring progress in MARL research

Page 5: Comparative Evaluation of Cooperative Multi-Agent Deep

Contributions

● We evaluate seven commonly used MARL algorithms

● We open-source two evaluation environments

● We discuss and analyse the algorithms' performance using several metrics

Page 7: Comparative Evaluation of Cooperative Multi-Agent Deep

Algorithms

We evaluate three types of algorithms:

1. Independent Learners: IQL [11], IA2C [4, 6]

2. Centralised Multi-Agent Policy Gradient: MADDPG [7], COMA [5], Central-V

3. Value Decomposition: VDN [10], QMIX [8]

Page 9: Comparative Evaluation of Cooperative Multi-Agent Deep

Environments

● MPE [7]

● SMAC [9]

● LBF

● RWARE (Robotic Warehouse)

Page 10: Comparative Evaluation of Cooperative Multi-Agent Deep

Environments

● Matrix games [3]: Climbing, Penalty

Page 12: Comparative Evaluation of Cooperative Multi-Agent Deep

Evaluation Protocol

● For each algorithm and task, we train several hyperparameter combinations with 10 seeds each

● In Matrix Games:
  ○ Train on-policy algorithms for 2.5M steps
  ○ Train off-policy algorithms for 250K steps
  ○ Evaluate 100 times during training, for 100 episodes at each evaluation

● In all other environments:
  ○ Train on-policy algorithms for 20M steps
  ○ Train off-policy algorithms for 2M steps
  ○ Evaluate 40 times during training, for 100 episodes at each evaluation
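The training budgets above imply how often each algorithm is evaluated. A minimal sketch of that arithmetic, assuming evaluations are evenly spaced over training (the slides do not state the spacing); `Protocol`, `PROTOCOLS` and `eval_interval` are illustrative names, not part of the authors' code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Training and evaluation budget for one (environment group, algorithm type) pair."""
    total_steps: int
    num_evaluations: int
    episodes_per_evaluation: int = 100  # from the slide
    seeds: int = 10                     # from the slide

    def eval_interval(self) -> int:
        # Assumption: evaluations are evenly spaced over the training budget.
        return self.total_steps // self.num_evaluations

    def eval_episodes_per_run(self) -> int:
        # Total evaluation episodes collected for a single seed.
        return self.num_evaluations * self.episodes_per_evaluation


# Budgets as listed on the slide.
PROTOCOLS = {
    ("matrix", "on-policy"): Protocol(2_500_000, 100),
    ("matrix", "off-policy"): Protocol(250_000, 100),
    ("other", "on-policy"): Protocol(20_000_000, 40),
    ("other", "off-policy"): Protocol(2_000_000, 40),
}

if __name__ == "__main__":
    for key, protocol in PROTOCOLS.items():
        print(key, protocol.eval_interval(), protocol.eval_episodes_per_run())
```

Under the even-spacing assumption, on-policy algorithms receive ten times as many environment steps between evaluations as off-policy ones, while the number of evaluation points per run is the same within each environment group.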

Page 13: Comparative Evaluation of Cooperative Multi-Agent Deep

Evaluation Metrics

● Maximum returns (maximum evaluation over different hyperparameters and evaluation points)

● Average returns (average among all evaluation points)

● Reliability metrics [1]:
  ○ Dispersion across Time (variability across training evaluations)
  ○ Short-term Risk across Time (maximum drop between two evaluations)
  ○ Long-term Risk across Time (maximum drop between any evaluation and the so-far best evaluation)
  ○ Dispersion across Runs (variability across different seeds)
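The metrics above can be sketched for a single hyperparameter configuration as follows. This is a simplified illustration that follows the slide's parenthetical descriptions, not the exact estimators defined in [1]; the function name `summarise`, the input layout and the use of standard deviation as the "variability" measure are all assumptions. The maximum over hyperparameters would be an outer `max` over several such summaries.

```python
from statistics import mean, pstdev


def summarise(returns):
    """Summarise evaluation returns, where returns[seed][k] is the mean
    return of one seed at evaluation point k (hypothetical layout)."""
    n_evals = len(returns[0])
    # Mean learning curve: average over seeds at each evaluation point.
    curve = [mean([run[k] for run in returns]) for k in range(n_evals)]
    diffs = [curve[k] - curve[k - 1] for k in range(1, n_evals)]

    # Long-term risk: largest drop below the best evaluation seen so far.
    best_so_far, long_term_risk = curve[0], 0.0
    for value in curve:
        best_so_far = max(best_so_far, value)
        long_term_risk = max(long_term_risk, best_so_far - value)

    return {
        "max_return": max(curve),
        "avg_return": mean(curve),
        # Assumption: spread of step-to-step changes as "variability across time".
        "dispersion_time": pstdev(diffs),
        # Largest drop between two consecutive evaluations (0 if none).
        "short_term_risk": max([0.0] + [-d for d in diffs]),
        "long_term_risk": long_term_risk,
        # Assumption: std across seeds, averaged over evaluation points.
        "dispersion_runs": mean(
            [pstdev([run[k] for run in returns]) for k in range(n_evals)]
        ),
    }


# Toy data: 2 seeds, 4 evaluation points.
example = summarise([[0.0, 2.0, 1.0, 3.0], [0.0, 4.0, 1.0, 5.0]])
```

In the toy data the mean curve rises, dips and recovers, so both risk values pick up the same single dip; on longer curves the long-term risk is at least as large as the short-term risk.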

Page 15: Comparative Evaluation of Cooperative Multi-Agent Deep

Results in Matrix Games

Page 16: Comparative Evaluation of Cooperative Multi-Agent Deep

Results in MPE

Page 17: Comparative Evaluation of Cooperative Multi-Agent Deep

Results in SMAC

Page 18: Comparative Evaluation of Cooperative Multi-Agent Deep

Results in LBF

Page 19: Comparative Evaluation of Cooperative Multi-Agent Deep

Results in RWARE

Page 21: Comparative Evaluation of Cooperative Multi-Agent Deep

Independent Learners

● Achieve high returns in most fully-observable environments

● Struggle to coordinate in partially-observable environments

● Cannot reason about the joint action of all agents

● Experience replay hinders the learning of IQL

● IA2C achieves better returns in coordination tasks because it learns on-policy

Page 22: Comparative Evaluation of Cooperative Multi-Agent Deep

Centralised Multi-Agent Policy Gradient

● Most effective in partially-observable environments

● MADDPG suffers from biased gradient estimates caused by Gumbel-Softmax sampling

● COMA suffers from high variance in the baseline computation

● Central-V is unable to reason about joint actions

● Centralised training can complicate the training procedure
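For context on the COMA point: its counterfactual baseline [5] marginalises one agent's action under that agent's own policy while holding the other agents' actions fixed. A minimal tabular sketch, assuming the joint Q-values are stored in a dict keyed by joint-action tuples; `counterfactual_advantage`, `q_joint` and `policy` are illustrative names, and the numbers are made up:

```python
def counterfactual_advantage(q_joint, policy, joint_action, agent):
    """Return Q(u) minus the counterfactual baseline for `agent`.

    q_joint:      dict mapping joint-action tuples to Q-values (one fixed state)
    policy:       list of action probabilities for `agent`
    joint_action: tuple with the action actually taken by every agent
    """
    baseline = 0.0
    for alternative, probability in enumerate(policy):
        # Swap in an alternative action for this agent only.
        counterfactual = list(joint_action)
        counterfactual[agent] = alternative
        baseline += probability * q_joint[tuple(counterfactual)]
    return q_joint[tuple(joint_action)] - baseline


# Made-up Q-values for two agents with two actions each.
q_joint = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
advantage = counterfactual_advantage(
    q_joint, policy=[0.5, 0.5], joint_action=(1, 1), agent=0
)
```

The baseline depends on a learned joint Q-function evaluated at every alternative action, so errors in those estimates feed directly into the advantage, which is one plausible reading of the variance issue noted above.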

Page 23: Comparative Evaluation of Cooperative Multi-Agent Deep

Value Decomposition

● Achieve high returns in the majority of environments

● Can reason about joint trajectories

● VDN is bounded by the linear decomposition

● Require highly informative reward signal
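The limit of a linear decomposition can be illustrated on a two-agent matrix game. The sketch below is an idealisation, not VDN training: it takes the best additive approximation Q1(a1) + Q2(a2) of a payoff matrix under uniform weighting of joint actions, then lets each agent act greedily on its own term. The payoff values are those commonly attributed to the Climbing game of [3] and should be treated as an assumption; all function names are illustrative.

```python
# Assumed Climbing payoffs (rows: agent 1's action, columns: agent 2's action).
CLIMBING = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def additive_fit(payoff):
    """Least-squares additive fit under uniform weighting:
    Q1(i) + Q2(j) = row_mean(i) + col_mean(j) - grand_mean."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand / 2 for row in payoff]
    q2 = [
        sum(payoff[i][j] for i in range(n_rows)) / n_rows - grand / 2
        for j in range(n_cols)
    ]
    return q1, q2


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


q1, q2 = additive_fit(CLIMBING)
greedy = (argmax(q1), argmax(q2))            # decentralised greedy actions
greedy_payoff = CLIMBING[greedy[0]][greedy[1]]
optimal_payoff = max(max(row) for row in CLIMBING)
```

With these payoffs the additive fit steers both agents to the safe action and away from the optimal joint action, because the large miscoordination penalties dominate each agent's individual term.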

Page 25: Comparative Evaluation of Cooperative Multi-Agent Deep

Future Work

● Include more algorithms such as IPPO, MAPPO

● Include more environments

● Evaluate implementation details, such as parameter sharing [2], reward standardisation

Page 26: Comparative Evaluation of Cooperative Multi-Agent Deep

Comparative Evaluation of Cooperative Multi-Agent Deep Reinforcement Learning Algorithms

https://arxiv.org/abs/2006.07869

Contributions:

1. We evaluate and compare seven MARL algorithms in 23 tasks using several evaluation metrics

2. We discuss and analyse the main benefits and limitations of the evaluated algorithms

Page 27: Comparative Evaluation of Cooperative Multi-Agent Deep

References

[1] Stephanie CY Chan, Sam Fishman, John Canny, Anoop Korattikara, and Sergio Guadarrama. 2020. Measuring the reliability of reinforcement learning algorithms. International Conference on Learning Representations (2020).

[2] Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, and Stefano V. Albrecht. 2021. Scaling multi-agent reinforcement learning with selective parameter sharing. ALA Workshop (2021).

[3] Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. AAAI Conference on Artificial Intelligence (1998).

[4] Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. 2017. OpenAI Baselines. https://github.com/openai/baselines.

[5] Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. 2018. Counterfactual multi-agent policy gradients. AAAI Conference on Artificial Intelligence (2018).

[6] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning (2016).

[7] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. Neural Information Processing Systems (2017).

[8] Tabish Rashid, Mikayel Samvelyan, Christian Schroeder De Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. International Conference on Machine Learning (2018).

[9] Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson. 2019. The StarCraft multi-agent challenge.

[10] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. 2018. Value-decomposition networks for cooperative multi-agent learning. International Conference on Autonomous Agents and Multi-Agent Systems (2018).

[11] Ming Tan. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. International Conference on Machine Learning (1993).
