![Page 1: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/1.jpg)
RL and deep learning
Nando de Freitas
![Page 2: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/2.jpg)
![Page 3: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/3.jpg)
![Page 4: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/4.jpg)
Google’s neural net learns just by watching YouTube videos
![Page 5: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/5.jpg)
Place cells in the hippocampus
![Page 6: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/6.jpg)
[Denil et al., 2012]
![Page 7: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/7.jpg)
Hierarchical reinforcement learning
High-level model-based learning for deciding when to navigate, park, pick up, and drop off passengers.
Mid-level active path learning for navigating a topological map.
Low-level active policy optimizer to learn control of continuous non-linear vehicle dynamics.
![Page 8: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/8.jpg)
Active Path Finding in Middle Level
The Navigate policy generates a sequence of waypoints on a topological map to navigate from a location to a destination.
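As a rough illustration of what "a sequence of waypoints on a topological map" can look like, the sketch below plans a route over a small weighted graph. It uses plain Dijkstra search as a stand-in, not the active path learning method named on the slide. The function `plan_waypoints`, the `road_map` graph, and its edge costs are all hypothetical.

```python
import heapq


def plan_waypoints(graph, start, goal):
    """Return the lowest-cost list of waypoints from start to goal, or None."""
    # Each frontier entry is (cost so far, current node, path taken so far).
    frontier = [(0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for neighbor, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None  # goal is unreachable from start


# Hypothetical topological map: nodes are junctions, weights are road costs.
road_map = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}

waypoints = plan_waypoints(road_map, "A", "D")
print(waypoints)
```

In a hierarchy like the one described on the previous slide, each waypoint in the returned list would presumably be handed to the low-level controller as its next target.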
![Page 9: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/9.jpg)
Low-Level: Trajectory following
[Figure labels: Vx, Vy, Yerr, err]
TORCS: 3D game engine that implements complex vehicle dynamics complete with manual and automatic transmission, engine, clutch, tire, and suspension models.
![Page 10: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/10.jpg)
Bayesian optimization was used to find the neural net low-level policy and the value function for the upper levels
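To make the idea of Bayesian optimization over policy parameters concrete, here is a minimal sketch under strong assumptions: a single scalar parameter `theta`, a toy `episode_return` function standing in for a simulator rollout, a zero-mean Gaussian process with an RBF kernel, and the expected-improvement acquisition function maximized over a fixed grid. None of these choices come from the slides, and all names are hypothetical.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    # Solve once for both the mean weights and the variance terms.
    solved = np.linalg.solve(k_oo, np.column_stack([y_obs, k_oq]))
    mean = k_oq.T @ solved[:, 0]
    var = 1.0 - np.sum(k_oq * solved[:, 1:], axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def episode_return(theta):
    # Hypothetical stand-in for running the policy in the simulator.
    return -(theta - 0.3) ** 2


def bayes_opt(objective, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=3)  # a few random initial evaluations
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[np.argmax(ei)]  # most promising parameter to try next
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    return x_obs[np.argmax(y_obs)], y_obs.max()


best_theta, best_return = bayes_opt(episode_return)
print(best_theta, best_return)
```

A real policy search would replace `theta` with a vector of network weights or hyperparameters and `episode_return` with noisy rollouts, which typically calls for a continuous acquisition optimizer rather than a grid.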
![Page 11: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/11.jpg)
![Page 12: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/12.jpg)
DeepMind approach
![Page 13: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/13.jpg)
DeepMind approach
![Page 14: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/14.jpg)
DeepMind approach
![Page 15: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/15.jpg)
Imitation learning & mirror neurons
![Page 16: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/16.jpg)
Imitation learning for Atari
[Dejan, Miroslav, 2014]
![Page 17: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/17.jpg)
Imitation learning for Atari
[Dejan Markovikj, Miroslav Bogdanovic, Misha Denil, Nando de Freitas, 2014]
![Page 18: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/18.jpg)
Inverse RL … or teaching DeepMind how to play Atari
![Page 19: RL and deep learning - mlss2014.com · Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level](https://reader033.vdocuments.net/reader033/viewer/2022052923/5c680e6409d3f2bb148caf3b/html5/thumbnails/19.jpg)
Next lecture: scalable learning
Thank you