Introduction to Deep Q-network
Presenter: Yunshu Du
CptS 580 Deep Learning
10/10/2016
Deep Q-network (DQN)
• An artificial agent for general Atari game playing
– Learns to master 49 different Atari games directly from game screens
– Beats the best-performing learner from the same domain in 43 games
– Performs at a level comparable to a human expert (more than 75% of the human score) in 29 games
Deep Q-network (DQN)
• A demo of DQN playing Atari Breakout
https://www.youtube.com/watch?v=V1eYniJ0Rnk
DQN is reinforcement learning + CNN magic!
• “Q”: Q-learning, a reinforcement learning (RL) method in which the agent interacts with the environment to maximize future rewards
• “Deep”, “network”: deep artificial neural networks that learn general representations in complex environments
Q-Learning
• Action-value (Q) function
• Optimal Q function obeys Bellman equation
• The Q-Learning algorithm
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
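In common notation, the three items above can be written as follows. The discount factor γ and the learning rate α are symbols assumed here; the slide's own notation may differ.

```latex
% Action-value function: expected discounted return after taking
% action a in state s and then following policy \pi
Q^{\pi}(s,a) = \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \dots \mid s_t = s,\ a_t = a,\ \pi \right]

% Bellman optimality equation obeyed by the optimal Q function
Q^{*}(s,a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s',a') \mid s, a \right]

% Q-learning update after observing the transition (s, a, r, s')
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```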
Q-Learning
• Exploration vs. Exploitation
– Do I want to know as much as possible, or do my best at
things that I already know?
– ε-greedy exploration to select actions
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
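A minimal sketch of ε-greedy action selection together with one tabular Q-learning update. The function names, the dictionary-based `q_table`, and the default values of `alpha` and `gamma` are illustrative assumptions, not taken from the slides.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q value
    return max(range(len(q_values)), key=lambda a: q_values[a])

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; q_table maps state -> list of action values.

    Assumes next_state is not terminal (terminal handling is left out here).
    """
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

With `epsilon=0.0` the agent always exploits; with `epsilon=1.0` it always explores. In practice ε is usually decreased over the course of training.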
Example: Q-Learning for Atari Breakout
Q-Learning
• But what if there are too many states/actions?
– Solution: use a deep convolutional network with weights θ as the function approximator, Q(s,a; θ)
Deep Convolutional neural network (CNN)
• Extracts features directly from raw pixels
• Atari game image pre-processing: each input is 84x84x4 (four stacked 84x84 frames)
http://cs231n.github.io/convolutional-networks/
DQN Architecture
• Input image: 84x84x4
• Convolutional layers
– Conv 1: 32 filters, 8x8, stride 4; output size (84-8)/4+1 = 20, i.e. 20x20x32; #W0 = (8*8*4)*32 = 8192
– Conv 2: 64 filters, 4x4, stride 2; output size (20-4)/2+1 = 9, i.e. 9x9x64; #W1 = (4*4*32)*64 = 32768
– Conv 3: 64 filters, 3x3, stride 1; output size (9-3)/1+1 = 7, i.e. 7x7x64; #W2 = (3*3*64)*64 = 36864
• Reshape: 7*7*64 = 3136
• Fully connected layers
– 512 rectifier units
– Output: Q values for each action
• Any missing component?
http://www.slideshare.net/onghaoyi/distributed-deep-qlearning
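The layer arithmetic on this slide can be checked with a short script. This is a sketch that assumes no padding and ignores bias terms; the helper names are illustrative.

```python
def conv_output_size(input_size, kernel, stride):
    """Spatial output size of a convolution with no padding."""
    return (input_size - kernel) // stride + 1

def conv_weights(kernel, in_channels, filters):
    """Number of weights in a convolutional layer, ignoring biases."""
    return kernel * kernel * in_channels * filters

# (filters, kernel, stride) for the three convolutional layers on the slide
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size, channels = 84, 4  # 84x84 input with 4 stacked frames
for filters, kernel, stride in layers:
    weights = conv_weights(kernel, channels, filters)
    size = conv_output_size(size, kernel, stride)
    channels = filters
    print(f"{size}x{size}x{channels}, weights = {weights}")

flattened = size * size * channels  # input to the fully connected layer
print("flattened:", flattened)
```

The printed shapes and weight counts match the values on the slide, ending with a flattened vector of 3136 values.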
Deep Q-Learning
• Problem: reinforcement learning is known to be unstable, or even to diverge, when a nonlinear function approximator such as a neural network is used
– Correlation between samples
– Small updates to Q values may significantly change the policy
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
Deep Q-Learning
• Solutions in DQN
– Experience replay
• At each iteration, store the experience e_t = (s_t, a_t, r_t, s_{t+1}) in the data set D_t = {e_1, …, e_t}
• Draw random samples of experience (s,a,r,s′) ~ U(D) and apply the Q update in minibatch fashion
– Separate target network
• Clone Q(s,a; θ) to a separate target network Q̂(s,a; θ⁻) every C time steps
• Treat y = r + γ max_a′ Q̂(s′,a′; θ⁻) as the target (γ is the discount factor), with θ⁻ held fixed during the update
– Reward clipping
• Clip rewards to [-1, 1]: positive rewards become 1, negative rewards become -1, and 0 is unchanged
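Experience replay and reward clipping can be sketched in a few lines. This is an illustrative sketch, not the original implementation: the class name `ReplayBuffer`, the `capacity` parameter, and the choice to clip at storage time are assumptions.

```python
import random
from collections import deque

def clip_reward(reward):
    """Map positive rewards to 1, negative rewards to -1, and keep 0 unchanged."""
    return (reward > 0) - (reward < 0)

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) tuples with uniform sampling."""

    def __init__(self, capacity):
        # Once full, the oldest experience is dropped for each new one
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, clip_reward(reward), next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling, (s, a, r, s') ~ U(D), breaks the correlation
        # between consecutive time steps
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A training loop would call `store` after every environment step and `sample(32)` before every gradient update, once the buffer holds at least 32 experiences.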
Deep Q-network (DQN)
• Minimize the squared error loss between the target and the prediction
• Stochastic gradient descent w.r.t. the weights
– Minibatch of size 32
• Update weights using RMSprop: divide the gradient by a running average of its recent magnitude
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
https://en.wikipedia.org/wiki/Stochastic_gradient_descent
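Written out, the squared error loss for transitions sampled from the replay memory is commonly given as below, using the target network Q̂ with weights θ⁻. The second block is one common form of the RMSprop update. The symbols γ (discount factor), ρ (decay rate), η (learning rate), and ε (small constant) are assumptions introduced here.

```latex
% Squared error between the target and the prediction
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \Big[ \big(
    \underbrace{r + \gamma \max_{a'} \hat{Q}(s',a';\theta^{-})}_{\text{target}}
    - \underbrace{Q(s,a;\theta)}_{\text{prediction}} \big)^{2} \Big]

% RMSprop: scale the gradient g = \nabla_{\theta} L(\theta) by a running
% average of its squared magnitude
v \leftarrow \rho\, v + (1-\rho)\, g^{2}, \qquad
\theta \leftarrow \theta - \frac{\eta}{\sqrt{v} + \epsilon}\, g
```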
DQN: Putting Together
• Input → CNN → Q value for actions
• Store experience {s_t, a_t, r_t, s_{t+1}}, then sample a minibatch
• Calculate the target for each sample
• Calculate the gradient and update the weights
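The "calculate target" and loss steps can be sketched in plain Python. Here `target_q` and `online_q` are hypothetical callables that return a list of Q values for a state, and the `done` flag marking terminal transitions is an assumption added for completeness; neither appears on the slide.

```python
def dqn_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a') for each sample.

    batch: list of (state, action, reward, next_state, done) tuples.
    target_q: hypothetical callable, state -> list of Q values (target network).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)  # no future reward after a terminal state
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets

def squared_errors(batch, targets, online_q):
    """Per-sample squared error between the target and the online prediction."""
    return [
        (y - online_q(state)[action]) ** 2
        for (state, action, _r, _s2, _d), y in zip(batch, targets)
    ]
```

In a real agent the mean of these errors would be differentiated with respect to the online network's weights, which requires a deep learning framework and is outside this sketch.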
Q-learning vs. DQN
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
But … It’s not perfect!
• Reward clipping
– Agent can’t distinguish different scales of rewards (e.g., Pac-Man)
• Limited experience replay
– Might throw away important experiences
• High computational complexity
– Almost 10 days to train one game on a single GPU! Slower on physical robots
– 10+ GB to store experiences
Andrej Karpathy’s blog
Beyond DQN
• More stable learning
– Double DQN (van Hasselt, H et al. (2015)): use two Q-networks, one to select the action, the other to evaluate it
• Limited experience replay
– Prioritized Experience Replay (Schaul, T et al. (2016)): weight experience according to surprise
• High computational time complexity
– Parallel/distributed computing (Nair, A et al. (2015))
– Dueling network (Wang, Z et al. (2015)): split DQN into two channels
– Asynchronous RL (A3C) (Mnih, V et al. (2016)): can be trained on CPU
David Silver’s tutorial on Deep Reinforcement Learning
ICML 2016, http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
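The Double DQN idea of selecting with one network and evaluating with the other can be sketched as a target computation. `online_q` and `target_q` are hypothetical callables returning a list of Q values for a state, and the `done` flag is an assumption.

```python
def double_dqn_target(reward, next_state, done, online_q, target_q, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    next_values = online_q(next_state)
    best_action = max(range(len(next_values)), key=lambda a: next_values[a])
    return reward + gamma * target_q(next_state)[best_action]
```

Standard DQN would instead take the maximum over the target network's own values, which tends to overestimate Q values when those estimates are noisy.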
Beyond DQN
• Deep Policy Network for continuous control
– Simulated robots
– Physical robots
David Silver’s tutorial on Deep Reinforcement Learning
ICML 2016, http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
Beyond DQN
Mastering the game of Go with deep neural networks and tree search
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. and Dieleman, S., 2016.
So … DQN is not magic
• Q learning + CNN as function approximator
• Experience replay + separate target network + reward clipping = stabilized learning
• To be continued …