deep reinforcement learning
TRANSCRIPT
![Page 1: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/1.jpg)
Deep RLAbecon 20.05.2016
![Page 2: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/2.jpg)
RL-type Problems
• game of chess, GO, Space Invaders
• balancing a unicycle
• investing in stock market
• running a business
• making fast food
• life…!
![Page 3: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/3.jpg)
Markov Decision Process
• S - set of states
• A - set of actions (or actions for state)
• P(s, s’ | a) - state change
• R(s, s’ | a) - reward
• ∈ [0, 1] - discount factor
![Page 4: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/4.jpg)
![Page 5: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/5.jpg)
Maximize the total discounted reward:
The GOAL
![Page 6: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/6.jpg)
: discount factor0 instant gratification
…
patience1
![Page 7: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/7.jpg)
Value Functions
http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
![Page 8: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/8.jpg)
SARSA
http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
![Page 9: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/9.jpg)
Q-Learning
http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
![Page 10: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/10.jpg)
Reinforce.jshttp://cs.stanford.edu/people/karpathy/reinforcejs/
// create an environment objectvar env = {};env.getNumStates = function() { return 8; }env.getMaxNumActions = function() { return 4; }
// create the DQN agentvar spec = { alpha: 0.01 } agent = new RL.DQNAgent(env, spec);
setInterval(function(){ // start the learning loop var action = agent.act(s); // s is an array of length 8 agent.learn(reward);}, 0);
![Page 11: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/11.jpg)
2013: Deep RL
http://arxiv.org/abs/1312.5602
![Page 12: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/12.jpg)
2014: Google buys DeepMind
![Page 13: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/13.jpg)
2015: AlphaGO
![Page 14: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/14.jpg)
Deep Q-Learning1. Do a feedforward pass for the current state s to get predicted Q-values
for all actions.
2. Do a feedforward pass for the next state s’ and calculate maximum overall network outputs max a’ Q(s’, a’).
3. Set Q-value target for action to r + γmax a’ Q(s’, a’) (use the max calculated in step 2). For all other actions, set the Q-value target to the same as originally returned from step 1, making the error 0 for those outputs.
4. Update the weights using backpropagation.
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
![Page 15: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/15.jpg)
Deep Q-Learning
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
![Page 16: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/16.jpg)
https://www.youtube.com/watch?v=32y3_iyHpBc
http://gabrielecirulli.github.io/2048/
![Page 17: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/17.jpg)
Asynchronous Gradient Descent
http://arxiv.org/abs/1602.01783
![Page 18: Deep Reinforcement Learning](https://reader031.vdocuments.net/reader031/viewer/2022022414/586f89851a28ab54768b6023/html5/thumbnails/18.jpg)
http://www.rethinkrobotics.com/baxter/