
Non-linear Value Function Approximation: Double Deep Q-Networks

Alina Vereshchaka

CSE4/510 Reinforcement Learning, Fall 2019

avereshc@buffalo.edu

October 8, 2019

*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, and Unnat Jain (Illinois)


Overview

1 Recap: DQN

2 Double Deep Q Network



Recap: Deep Q-Networks (DQN)

Represent value function by deep Q-network with weights w

Q(s, a, w) ≈ Q^π(s, a)

Define the objective function as the mean-squared TD error, with frozen target weights w⁻:

L(w) = E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w))²]

Leading to the following Q-learning gradient (the target is treated as a fixed label):

∂L(w)/∂w = −2 E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w)) ∂Q(s, a, w)/∂w]

Optimize the objective end-to-end by SGD, using ∂L(w)/∂w


Deep Q-Networks

DQN provides a stable solution to deep value-based RL

1 Use experience replay

2 Freeze target Q-network

3 Clip rewards or normalize the network adaptively to a sensible range (a sketch of the first two ingredients follows)

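Below is a minimal PyTorch sketch of the first two ingredients, experience replay and a frozen target Q-network. The class name ReplayBuffer, the layer sizes, and the sync policy are illustrative assumptions, not details from the lecture.

# Minimal sketch (names and sizes are placeholders) of the two main DQN
# stabilizers: experience replay and a frozen target network.
import random
from collections import deque

import torch
import torch.nn as nn


class ReplayBuffer:
    """Stores transitions and samples decorrelated minibatches from them."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        # s and s_next are assumed to be float tensors, a an int, done a 0/1 flag
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        s, a, r, s_next, done = zip(*random.sample(self.buffer, batch_size))
        return (torch.stack(s), torch.tensor(a), torch.tensor(r, dtype=torch.float32),
                torch.stack(s_next), torch.tensor(done, dtype=torch.float32))


def make_q_net(n_states=4, n_actions=2):            # sizes are placeholders
    return nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))


q_net = make_q_net()                                # updated every training step
target_net = make_q_net()                           # frozen copy used for targets
target_net.load_state_dict(q_net.state_dict())      # re-synced only every C steps

The replay buffer breaks the correlation between consecutive transitions, and target_net, re-synced only occasionally, keeps the regression target from chasing the very network that is being trained.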

Deep Q-Network (DQN) Architecture

Figure: naive DQN vs. the optimized DQN used by DeepMind (the naive network maps a state-action pair to a single Q-value; DeepMind's network maps the state to one Q-value per action).


DQN in Atari


DQN Algorithm

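The algorithm listing itself did not survive the transcript, so here is a hedged sketch of one DQN update step, reusing the q_net, target_net, and ReplayBuffer assumed in the previous sketch; it is not the exact pseudocode shown on the slide.

# One DQN training step: sample from replay, regress Q(s, a, w) onto the
# frozen-target value r + gamma * max_a' Q(s', a', w-).
import torch
import torch.nn.functional as F


def dqn_update(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    s, a, r, s_next, done = buffer.sample(batch_size)

    # Q(s, a, w) for the actions actually taken in the replayed transitions
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Target uses the frozen network and is excluded from the gradient
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values

    loss = F.mse_loss(q_sa, target)   # the objective L(w) from the recap
    optimizer.zero_grad()
    loss.backward()                   # dL(w)/dw via autograd
    optimizer.step()                  # SGD step
    return loss.item()

A full agent wraps this step in an environment loop with epsilon-greedy action selection and a periodic target_net sync, as in the original DQN algorithm (Mnih et al., 2015).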



Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

What is the main motivation?



Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

The chance of both estimators overestimating the same action is lower (a small numerical illustration follows)

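The claim above is easy to check numerically. In this toy example (not from the lecture) ten actions all have true value zero; taking the max over one noisy estimator overestimates, while selecting the action with one estimator and evaluating it with an independent second one does not.

# Toy illustration of maximization bias: true Q-values are all 0, yet
# max_a Q1(a) over noisy estimates is biased upward; Q2 evaluated at
# argmax_a Q1(a) is not, because the two noise sources are independent.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000
q1 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))   # noisy estimator 1
q2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))   # independent estimator 2

single = q1.max(axis=1)                                  # standard Q-learning style
double = q2[np.arange(n_trials), q1.argmax(axis=1)]      # double Q-learning style

print(f"E[max_a Q1(a)]        ~ {single.mean():+.3f}")   # roughly +1.5
print(f"E[Q2(argmax_a Q1(a))] ~ {double.mean():+.3f}")   # roughly  0.0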

Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))

Q target: r(s, a) + γ max_a′ Q1(s′, a′)

Double Q target: r(s, a) + γ Q2(s′, argmax_a′ Q1(s′, a′)) (a tabular sketch follows)

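A tabular sketch of the update above. The coin flip deciding which table plays the role of Q1 on a given step follows the standard double Q-learning scheme and is an assumption here, since the slide only writes the Q1 update.

# Tabular double Q-learning step for the update on the slide:
#   Q1(s, a) <- Q1(s, a) + alpha * (r + gamma * Q2(s', argmax_a' Q1(s', a')) - Q1(s, a))
import numpy as np


def double_q_step(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                        # with prob 0.5, swap the two tables' roles
        Q1, Q2 = Q2, Q1
    best_a = int(np.argmax(Q1[s_next]))           # Q1 selects the action ...
    target = r + gamma * Q2[s_next, best_a]       # ... Q2 evaluates it
    Q1[s, a] += alpha * (target - Q1[s, a])       # in-place update of the chosen table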



Double Deep Q Network

Two estimators:

Estimator Q1: the online network, which selects the best action

Estimator Q2: the target network, which evaluates Q for that action (see the sketch below)

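In Double DQN the two estimators are not separate tables: the online network plays the role of Q1 and selects the greedy action, while the already existing target network plays the role of Q2 and evaluates it. A hedged sketch of the resulting target, reusing q_net and target_net from the DQN sketches above:

# Double DQN target: action selection by the online q_net, evaluation by the
# frozen target_net. Drop-in replacement for the max-based target in dqn_update.
import torch


def ddqn_target(q_net, target_net, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)        # Q1: select a'
        q_eval = target_net(s_next).gather(1, best_a).squeeze(1)  # Q2: evaluate a'
        return r + gamma * (1.0 - done) * q_eval

Everything else in the DQN update stays the same, which is why Double DQN is often described as a one-line change to DQN.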


Are the Q-values accurate?

In practice, no: DQN's value estimates on Atari games run substantially higher than the true discounted returns its policies actually achieve, and Double DQN brings the estimates much closer to the truth (van Hasselt et al., 2016).

