
Non-linear Value Function Approximation: Double Deep Q-Networks

Alina Vereshchaka

CSE4/510 Reinforcement Learning, Fall 2019

avereshc@buffalo.edu

October 8, 2019

*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, and Unnat Jain (Illinois)


Overview

1 Recap: DQN

2 Double Deep Q Network



Recap: Deep Q-Networks (DQN)

Represent value function by deep Q-network with weights w

Q(s, a, w) ≈ Q^π(s, a)

Define the objective function as the mean-squared TD error, with frozen target weights w⁻:

L(w) = E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w))²]

Leading to the following Q-learning gradient (the target is treated as a fixed label):

∂L(w)/∂w = −2 E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w)) ∂Q(s, a, w)/∂w]

Optimize the objective end-to-end by SGD, using ∂L(w)/∂w


Deep Q-Networks

DQN provides a stable solution to deep value-based RL

1 Use experience replay

2 Freeze target Q-network

3 Clip rewards or normalize the network adaptively to a sensible range (a sketch of the first two ingredients follows)

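Below is a minimal PyTorch sketch of the first two ingredients, experience replay and a frozen target Q-network. The class name ReplayBuffer, the layer sizes, and the sync policy are illustrative assumptions, not details from the lecture.

# Minimal sketch (names and sizes are placeholders) of the two main DQN
# stabilizers: experience replay and a frozen target network.
import random
from collections import deque

import torch
import torch.nn as nn


class ReplayBuffer:
    """Stores transitions and samples decorrelated minibatches from them."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        # s and s_next are assumed to be float tensors, a an int, done a 0/1 flag
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        s, a, r, s_next, done = zip(*random.sample(self.buffer, batch_size))
        return (torch.stack(s), torch.tensor(a), torch.tensor(r, dtype=torch.float32),
                torch.stack(s_next), torch.tensor(done, dtype=torch.float32))


def make_q_net(n_states=4, n_actions=2):            # sizes are placeholders
    return nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))


q_net = make_q_net()                                # updated every training step
target_net = make_q_net()                           # frozen copy used for targets
target_net.load_state_dict(q_net.state_dict())      # re-synced only every C steps

The replay buffer breaks the correlation between consecutive transitions, and target_net, re-synced only occasionally, keeps the regression target from chasing the very network that is being trained.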

Deep Q-Network (DQN) Architecture

Figure: naive DQN vs. the optimized DQN used by DeepMind (the naive network maps a state-action pair to a single Q-value; DeepMind's network maps the state to one Q-value per action).


DQN in Atari


DQN Algorithm

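The algorithm listing itself did not survive the transcript, so here is a hedged sketch of one DQN update step, reusing the q_net, target_net, and ReplayBuffer assumed in the previous sketch; it is not the exact pseudocode shown on the slide.

# One DQN training step: sample from replay, regress Q(s, a, w) onto the
# frozen-target value r + gamma * max_a' Q(s', a', w-).
import torch
import torch.nn.functional as F


def dqn_update(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    s, a, r, s_next, done = buffer.sample(batch_size)

    # Q(s, a, w) for the actions actually taken in the replayed transitions
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Target uses the frozen network and is excluded from the gradient
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values

    loss = F.mse_loss(q_sa, target)   # the objective L(w) from the recap
    optimizer.zero_grad()
    loss.backward()                   # dL(w)/dw via autograd
    optimizer.step()                  # SGD step
    return loss.item()

A full agent wraps this step in an environment loop with epsilon-greedy action selection and a periodic target_net sync, as in the original DQN algorithm (Mnih et al., 2015).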



Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

What is the main motivation?



Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

The chance of both estimators overestimating the same action is lower (a small numerical illustration follows)

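The claim above is easy to check numerically. In this toy example (not from the lecture) ten actions all have true value zero; taking the max over one noisy estimator overestimates, while selecting the action with one estimator and evaluating it with an independent second one does not.

# Toy illustration of maximization bias: true Q-values are all 0, yet
# max_a Q1(a) over noisy estimates is biased upward; Q2 evaluated at
# argmax_a Q1(a) is not, because the two noise sources are independent.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000
q1 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))   # noisy estimator 1
q2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))   # independent estimator 2

single = q1.max(axis=1)                                  # standard Q-learning style
double = q2[np.arange(n_trials), q1.argmax(axis=1)]      # double Q-learning style

print(f"E[max_a Q1(a)]        ~ {single.mean():+.3f}")   # roughly +1.5
print(f"E[Q2(argmax_a Q1(a))] ~ {double.mean():+.3f}")   # roughly  0.0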

Double Q-learning

Two estimators:

Estimator Q1: obtain the best action

Estimator Q2: evaluate Q for that action

Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))

Q target: r(s, a) + γ max_a′ Q1(s′, a′)

Double Q target: r(s, a) + γ Q2(s′, argmax_a′ Q1(s′, a′)) (a tabular sketch follows)

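A tabular sketch of the update above. The coin flip deciding which table plays the role of Q1 on a given step follows the standard double Q-learning scheme and is an assumption here, since the slide only writes the Q1 update.

# Tabular double Q-learning step for the update on the slide:
#   Q1(s, a) <- Q1(s, a) + alpha * (r + gamma * Q2(s', argmax_a' Q1(s', a')) - Q1(s, a))
import numpy as np


def double_q_step(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                        # with prob 0.5, swap the two tables' roles
        Q1, Q2 = Q2, Q1
    best_a = int(np.argmax(Q1[s_next]))           # Q1 selects the action ...
    target = r + gamma * Q2[s_next, best_a]       # ... Q2 evaluates it
    Q1[s, a] += alpha * (target - Q1[s, a])       # in-place update of the chosen table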



Double Deep Q Network

Two estimators:

Estimator Q1: the online network, which selects the best action

Estimator Q2: the target network, which evaluates Q for that action (see the sketch below)

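In Double DQN the two estimators are not separate tables: the online network plays the role of Q1 and selects the greedy action, while the already existing target network plays the role of Q2 and evaluates it. A hedged sketch of the resulting target, reusing q_net and target_net from the DQN sketches above:

# Double DQN target: action selection by the online q_net, evaluation by the
# frozen target_net. Drop-in replacement for the max-based target in dqn_update.
import torch


def ddqn_target(q_net, target_net, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)        # Q1: select a'
        q_eval = target_net(s_next).gather(1, best_a).squeeze(1)  # Q2: evaluate a'
        return r + gamma * (1.0 - done) * q_eval

Everything else in the DQN update stays the same, which is why Double DQN is often described as a one-line change to DQN.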


Are the Q-values accurate?

In practice, no: DQN's value estimates on Atari games run substantially higher than the true discounted returns its policies actually achieve, and Double DQN brings the estimates much closer to the truth (van Hasselt et al., 2016).

