Non-linear Value Function Approximation: Double Deep Q-Networks

Alina Vereshchaka

CSE4/510 Reinforcement Learning, Fall 2019

avereshc@buffalo.edu

October 8, 2019

*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, Unnat Jain (Illinois).


Overview

1. Recap: DQN

2. Double Deep Q Network


Recap: Deep Q-Networks (DQN)

Represent value function by deep Q-network with weights w

Q(s, a, w) ≈ Q^π(s, a)
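As a concrete illustration (not on the slide), a minimal Q-network of this kind could be written in PyTorch as below; the layer sizes are illustrative assumptions.

    import torch.nn as nn

    # Minimal fully connected Q-network: maps a state vector to one
    # Q-value per discrete action, i.e. Q(s, ., w) in a single pass.
    def make_q_network(state_dim: int, n_actions: int) -> nn.Module:
        return nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )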

Define objective function

Leading to the following Q-learning gradient

Optimize the objective end-to-end by SGD, using ∂L(w)/∂w
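The equations on this slide are images and did not survive extraction. The standard DQN objective this recap refers to (Mnih et al., 2015) is the mean-squared TD error against a frozen target network with weights w⁻:

L(w) = E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w))²]

and its gradient, up to a constant factor, is

∂L(w)/∂w = −E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w)) · ∂Q(s, a, w)/∂w]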

Alina Vereshchaka (UB) CSE4/510 Reinforcement Learning, Lecture 12 October 8, 2019 4 / 18

Page 5: Non-linear Value Function Approximation: Double Deep Q

Deep Q-Networks

DQN provides a stable solution to deep value-based RL

1. Use experience replay

2. Freeze the target Q-network

3. Clip rewards or adaptively normalize the network to a sensible range

(The first two are sketched below.)
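A minimal sketch of experience replay and target freezing, assuming PyTorch modules for the two networks (names such as sync_target are illustrative, not from the slides):

    import random
    from collections import deque

    buffer = deque(maxlen=100_000)   # experience replay memory

    def store(s, a, r, s2, done):
        # Replaying transitions sampled uniformly from this buffer
        # breaks the temporal correlations of purely online updates.
        buffer.append((s, a, r, s2, done))

    def sample(batch_size=32):
        return random.sample(buffer, batch_size)

    def sync_target(online_net, target_net):
        # Freeze targets: copy the online weights only every N steps,
        # so the bootstrap target does not move with every update.
        target_net.load_state_dict(online_net.state_dict())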


Deep Q-Network (DQN) Architecture

[Figure: side-by-side comparison of a naive DQN architecture and the optimized DQN used by DeepMind.]


DQN in Atari


DQN Algorithm
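The algorithm listing on this slide is an image. The sketch below is a hedged reconstruction of the inner update step of the standard DQN algorithm (Mnih et al., 2015); ε-greedy acting and buffer management are omitted, and the tensor shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def dqn_update(online_net, target_net, optimizer, batch, gamma=0.99):
        # batch: s [B, d] float, a [B] long, r [B] float,
        #        s2 [B, d] float, done [B] float (1.0 if terminal)
        s, a, r, s2, done = batch
        q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a, w)
        with torch.no_grad():
            # bootstrap target uses the frozen network w⁻ and the max over a′
            target = r + gamma * (1 - done) * target_net(s2).max(1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()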


Double Q-learning


Two estimators:

Estimator Q1: selects the best action

Estimator Q2: evaluates Q for that action

What is the main motivation?


The max operator in standard Q-learning uses the same values both to select and to evaluate an action, so estimation noise biases the target upward; with two estimators, the chance that both overestimate the same action is lower.
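A quick numerical illustration of this point (not from the slides): with ten actions whose true values are all zero, taking the max of one noisy estimate is biased upward, while selecting with one estimator and evaluating with an independent one is unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    n1 = rng.normal(size=(100_000, 10))   # estimator 1 noise
    n2 = rng.normal(size=(100_000, 10))   # estimator 2 noise (independent)

    single = n1.max(axis=1).mean()        # E[max] of 10 N(0,1) draws, ~1.54
    best = n1.argmax(axis=1)              # select the action with estimator 1
    double = n2[np.arange(100_000), best].mean()   # evaluate with estimator 2, ~0

    print(f"single-estimator max: {single:.2f}, double estimator: {double:.2f}")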


Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))

Q Target: r(s, a) + γ max_a′ Q1(s′, a′)

Double Q Target: r(s, a) + γ Q2(s′, argmax_a′ Q1(s′, a′))
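A minimal sketch of the resulting tabular update (van Hasselt, 2010), assuming Q1 and Q2 are NumPy arrays indexed [state, action]; the coin flip that swaps the roles of the two tables is part of the original algorithm, and terminal-state handling is omitted for brevity.

    import random
    import numpy as np

    def double_q_update(Q1, Q2, s, a, r, s2, alpha=0.1, gamma=0.99):
        if random.random() < 0.5:
            Q1, Q2 = Q2, Q1                  # update the other table half the time
        best = np.argmax(Q1[s2])             # Q1 selects the best action...
        target = r + gamma * Q2[s2, best]    # ...Q2 evaluates it
        Q1[s, a] += alpha * (target - Q1[s, a])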



Double Deep Q Network

Two estimators:

Estimator Q1: selects the best action

Estimator Q2: evaluates Q for that action

In Double DQN, the online network plays the role of Q1 and the frozen target network plays the role of Q2, as the sketch below shows.
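A minimal PyTorch sketch of the Double DQN target (van Hasselt et al., 2016), assuming online_net and target_net map a batch of states to per-action Q-values; the function name and shapes are illustrative.

    import torch

    def ddqn_targets(r, s2, done, online_net, target_net, gamma=0.99):
        with torch.no_grad():
            # the online network selects the greedy action at s′ ...
            best = online_net(s2).argmax(dim=1, keepdim=True)
            # ... and the frozen target network evaluates that action
            next_q = target_net(s2).gather(1, best).squeeze(1)
            return r + gamma * (1 - done) * next_q

Compared with the plain DQN target, only the action selection changes: the argmax comes from the online network instead of the target network.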


Are the Q-values accurate?
