Innorobo 2016 keynote - Making robots dream to face open environments
Making robots dream to face open environments
Stéphane Doncieux
What a machine can do
YASKAWA BUSHIDO PROJECT / industrial robot vs sword master
Deep Blue vs G. Kasparov 1997
Motion / Problem resolution
Doncieux, S. (to appear). Creativity: A Driver for Research on Robotics in Open Environments. Intellectica.
[Figure: performance versus context for Robot A and Robot B, in known and unknown contexts; performance in the unknown contexts is marked "???"]
How can a robot face a new environment?
1. Robustness
2. Learning
3. Development
[Diagram labels: Manual development, Autonomous development]
1. Robustness
[Diagram labels: Manual development, Autonomous development, Learning]
2. Learning
[Figure labels: Reward (high to low), A]
Learning the action to apply in a state to maximize reward.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT press.
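The idea of learning which action to apply in each state to maximize reward can be sketched with tabular Q-learning. This is only a minimal illustration under assumptions that are not in the talk: a hypothetical one-dimensional corridor, a reward of 1 at the right end, and arbitrary values for `alpha`, `gamma` and `epsilon`.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor (toy example, not from the talk)."""
    rng = random.Random(seed)
    actions = (-1, 1)                 # move left / move right
    goal = n_states - 1               # reaching the last cell gives reward 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def greedy(state):
        # Best-valued action, with ties broken at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy exploration.
            action = rng.choice(actions) if rng.random() < epsilon else greedy(state)
            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[(next_state, a)] for a in actions)
            # Move the estimate toward reward + discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state

    policy = {s: greedy(s) for s in range(goal)}
    return q, policy
```

Calling `q_learning()` returns the value table and the greedy policy; in this toy corridor the learned policy moves right in every state.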
2. Learning: continuous actions & states
[Figure: evolutionary loop. Random generation → Evaluation (genotype 00110100111 → fitness 8.3) → Selection → Variation, until Termination]
[Figure: evaluation. Genotype → Phenotype → Behavior in the Environment, from given Initial conditions → Fitness]
Evolutionary Robotics
Mouret, J.-B., Bredeche, N., & Doncieux, S. (2015). La robotique évolutionniste. Pour la Science, n°87, April-June 2015.
Doncieux, S., Bredeche, N., Mouret, J.-B., & Eiben, A. E. (2015). Evolutionary Robotics: What, Why, and Where to. Frontiers in Robotics and AI (Evolutionary Robotics section). doi:10.3389/frobt.2015.00004
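The loop of random generation, evaluation, selection, variation and termination can be sketched as follows. This is a generic illustration, not the method used in the cited work: the bit-string genotype, the truncation selection, the mutation rate and especially the `evaluate` function (a simple count of 1 bits standing in for a robot evaluation) are all assumptions.

```python
import random


def evaluate(genotype):
    """Placeholder fitness: number of 1 bits. In evolutionary robotics this step
    would build a controller from the genotype, run the robot, and score its behavior."""
    return sum(genotype)


def evolve(n_bits=12, pop_size=20, generations=60, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Random generation of the initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation: rank genotypes by fitness, best first.
        ranked = sorted(population, key=evaluate, reverse=True)
        # Termination: stop once the best possible fitness is reached.
        if evaluate(ranked[0]) == n_bits:
            break
        # Selection: keep the best half (truncation selection).
        parents = ranked[: pop_size // 2]
        # Variation: each parent yields one child by bit-flip mutation.
        children = [
            [1 - gene if rng.random() < mutation_rate else gene for gene in parent]
            for parent in parents
        ]
        population = parents + children
    best = max(population, key=evaluate)
    return best, evaluate(best)
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.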
Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274. doi:10.1177/0278364913495721
2. Learning: the representation is critical!
• Reinforcement Learning: fast, but requires an efficient representation
• Evolutionary Robotics: low-level representation, but slow…
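One way to see why the representation matters for table-based reinforcement learning is to count table entries when continuous sensor readings are cut into bins. The sketch below is an illustration only; the helper names `discretize` and `table_size`, the uniform binning and the example numbers are assumptions, not something stated in the talk.

```python
def discretize(value, low, high, n_bins):
    """Map a continuous reading in [low, high] to a bin index in 0..n_bins-1."""
    if not low < high:
        raise ValueError("low must be smaller than high")
    clipped = min(max(value, low), high)          # out-of-range readings are clamped
    index = int((clipped - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)                 # the upper bound falls in the last bin


def table_size(n_sensors, n_bins, n_actions):
    """Number of (state, action) entries a tabular learner would have to fill."""
    return (n_bins ** n_sensors) * n_actions


# Two sensors with 10 bins each and 4 actions give a small table;
# six such sensors already give millions of entries.
small = table_size(n_sensors=2, n_bins=10, n_actions=4)
large = table_size(n_sensors=6, n_bins=10, n_actions=4)
```

The growth is exponential in the number of sensors, which is one reason a compact, task-relevant representation makes this kind of learning fast.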
3. Development
Weng, J. (2004). Developmental robotics: Theory and experiments. International Journal of Humanoid Robotics, 1(2), 199–236.
[Diagram labels: Manual development, Autonomous development]
3. Development: insights from psychology
The importance of redescribing knowledge representations
"A specifically human way to gain knowledge is for the mind to exploit internally the information that it has already stored (both innate and acquired), by redescribing its representations or, more precisely, by iteratively re-representing in different representational formats what its internal representations represent" [Karmiloff-Smith 1996]
When to restructure and consolidate knowledge?
"Sleep consolidates recent memories and, concomitantly, could allow insight by changing their representational structure." [Wagner, 2004]
Kick-off meeting DREAM, Paris, 26/01/2015
Deferred Restructuring of Experience in Autonomous Machines
H2020 FET Proactive "Knowing, doing, being" 01/2015-12/2018
http://www.robotsthatdream.eu/ https://twitter.com/robotsthatdream
3. Development: changing representations
Daytime
- No initial policy
- No single task
- Motivations: curiosity, satisfying humans, global mission
- Sequence of learning episodes driven by motivations
- Behavior exploration, knowledge improvement, knowledge adaptation
- Skill; knowledge validation
- New situation: no reprogramming, fast adaptation
Nighttime (dream)
- Knowledge restructuring
- Transfer from short-term memory (STM) to long-term memory (LTM)
- Consolidated knowledge: task-relevant features, task contexts, abstract knowledge, new motivations
Scales
- Individual scale
- Collective scale: knowledge sharing between robots (better generalization, faster learning)
Data flow labels: daytime experience (large batch), small batch
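The alternation between daytime experience and nighttime transfer from short-term to long-term memory can be sketched as a toy data structure. This is not the DREAM architecture: the class name `DreamingAgent`, the `(context, outcome)` format and the choice of a running mean as "consolidated knowledge" are assumptions made purely for illustration.

```python
from collections import defaultdict


class DreamingAgent:
    """Toy agent with a short-term buffer and consolidated long-term statistics."""

    def __init__(self):
        self.short_term = []   # raw (context, outcome) pairs gathered during the day
        self.long_term = {}    # context -> (mean outcome, number of samples)

    def day(self, experiences):
        """Daytime: store raw experience without restructuring it."""
        self.short_term.extend(experiences)

    def night(self):
        """Nighttime: fold the day's batch into long-term memory, then clear the buffer."""
        batch = defaultdict(list)
        for context, outcome in self.short_term:
            batch[context].append(outcome)
        for context, outcomes in batch.items():
            mean, count = self.long_term.get(context, (0.0, 0))
            total = mean * count + sum(outcomes)
            count += len(outcomes)
            self.long_term[context] = (total / count, count)
        self.short_term.clear()
```

A typical use alternates `day(...)` with `night()`: after each night the short-term buffer is empty and the long-term entry for each context reflects all the days seen so far.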
1. Learning: direct policy search (neuroevolution)
- Task-agnostic representations
- Slow learning, limited generalization
- Generates examples of behaviours
2. Representation redescription
- Passive analysis
- Discrete actions and sensors to consider
3. Learning: discrete reinforcement learning
- Task-specific representations
- Fast learning, good generalization
- Learning 10 to 100 times faster
Development: bootstrapping simple manipulation skills
1. Day 1: sensorimotor babbling
2. "Night": learning to manipulate an object in simulation
3. Day 2: back to reality
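The first step, sensorimotor babbling, can be sketched as sending random motor commands and recording what happens. The function name `babble`, the command range of -1 to 1 and the `execute` callback (a stand-in for the real robot or a simulator) are hypothetical; the talk does not specify how babbling was implemented.

```python
import random


def babble(execute, n_trials=100, n_motors=2, seed=0):
    """Send random motor commands and record (command, observed effect) pairs.

    `execute` is a hypothetical callback standing in for the robot or simulator:
    it takes a list of motor values and returns whatever effect was observed.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_trials):
        command = [rng.uniform(-1.0, 1.0) for _ in range(n_motors)]
        dataset.append((command, execute(command)))
    return dataset


# Toy stand-in for a robot: the "effect" is just the sum of the motor values.
data = babble(lambda command: sum(command), n_trials=10)
```

The recorded pairs are the kind of raw experience that a later "night" phase could learn from in simulation before returning to the real robot.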
Thank you! Questions?
https://twitter.com/SDoncieux http://people.isir.upmc.fr/doncieux