Non-linear Value Function Approximation: Double Deep Q-Networks

Alina Vereshchaka

CSE4/510 Reinforcement Learning, Fall 2019

avereshc@buffalo.edu

October 8, 2019

*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, Unnat Jain (Illinois).


Overview

1. Recap: DQN

2. Double Deep Q Network


Recap: Deep Q-Networks (DQN)

Represent value function by deep Q-network with weights w

Q(s, a, w) ≈ Q^π(s, a)
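As a concrete illustration (not on the slide), a minimal Q-network of this kind could be written in PyTorch as below; the layer sizes are illustrative assumptions.

    import torch.nn as nn

    # Minimal fully connected Q-network: maps a state vector to one
    # Q-value per discrete action, i.e. Q(s, ., w) in a single pass.
    def make_q_network(state_dim: int, n_actions: int) -> nn.Module:
        return nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )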

Define objective function

Leading to the following Q-learning gradient

Optimize the objective end-to-end by SGD, using ∂L(w)/∂w
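The equations on this slide are images and did not survive extraction. The standard DQN objective this recap refers to (Mnih et al., 2015) is the mean-squared TD error against a frozen target network with weights w⁻:

L(w) = E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w))²]

and its gradient, up to a constant factor, is

∂L(w)/∂w = −E[(r + γ max_a′ Q(s′, a′, w⁻) − Q(s, a, w)) · ∂Q(s, a, w)/∂w]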

Alina Vereshchaka (UB) CSE4/510 Reinforcement Learning, Lecture 12 October 8, 2019 4 / 18

Page 5: Non-linear Value Function Approximation: Double Deep Q

Deep Q-Networks

DQN provides a stable solution to deep value-based RL

1. Use experience replay

2. Freeze the target Q-network

3. Clip rewards or adaptively normalize the network to a sensible range

(The first two are sketched below.)
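A minimal sketch of experience replay and target freezing, assuming PyTorch modules for the two networks (names such as sync_target are illustrative, not from the slides):

    import random
    from collections import deque

    buffer = deque(maxlen=100_000)   # experience replay memory

    def store(s, a, r, s2, done):
        # Replaying transitions sampled uniformly from this buffer
        # breaks the temporal correlations of purely online updates.
        buffer.append((s, a, r, s2, done))

    def sample(batch_size=32):
        return random.sample(buffer, batch_size)

    def sync_target(online_net, target_net):
        # Freeze targets: copy the online weights only every N steps,
        # so the bootstrap target does not move with every update.
        target_net.load_state_dict(online_net.state_dict())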


Deep Q-Network (DQN) Architecture

[Figure: side-by-side comparison of a naive DQN architecture and the optimized DQN used by DeepMind.]


DQN in Atari


DQN Algorithm
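The algorithm listing on this slide is an image. The sketch below is a hedged reconstruction of the inner update step of the standard DQN algorithm (Mnih et al., 2015); ε-greedy acting and buffer management are omitted, and the tensor shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def dqn_update(online_net, target_net, optimizer, batch, gamma=0.99):
        # batch: s [B, d] float, a [B] long, r [B] float,
        #        s2 [B, d] float, done [B] float (1.0 if terminal)
        s, a, r, s2, done = batch
        q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a, w)
        with torch.no_grad():
            # bootstrap target uses the frozen network w⁻ and the max over a′
            target = r + gamma * (1 - done) * target_net(s2).max(1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()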


Double Q-learning


Two estimators:

Estimator Q1: selects the best action

Estimator Q2: evaluates Q for that action

What is the main motivation?


The max operator in standard Q-learning uses the same values both to select and to evaluate an action, so estimation noise biases the target upward; with two estimators, the chance that both overestimate the same action is lower.
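A quick numerical illustration of this point (not from the slides): with ten actions whose true values are all zero, taking the max of one noisy estimate is biased upward, while selecting with one estimator and evaluating with an independent one is unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    n1 = rng.normal(size=(100_000, 10))   # estimator 1 noise
    n2 = rng.normal(size=(100_000, 10))   # estimator 2 noise (independent)

    single = n1.max(axis=1).mean()        # E[max] of 10 N(0,1) draws, ~1.54
    best = n1.argmax(axis=1)              # select the action with estimator 1
    double = n2[np.arange(100_000), best].mean()   # evaluate with estimator 2, ~0

    print(f"single-estimator max: {single:.2f}, double estimator: {double:.2f}")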


Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))

Q Target: r(s, a) + γ max_a′ Q1(s′, a′)

Double Q Target: r(s, a) + γ Q2(s′, argmax_a′ Q1(s′, a′))
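A minimal sketch of the resulting tabular update (van Hasselt, 2010), assuming Q1 and Q2 are NumPy arrays indexed [state, action]; the coin flip that swaps the roles of the two tables is part of the original algorithm, and terminal-state handling is omitted for brevity.

    import random
    import numpy as np

    def double_q_update(Q1, Q2, s, a, r, s2, alpha=0.1, gamma=0.99):
        if random.random() < 0.5:
            Q1, Q2 = Q2, Q1                  # update the other table half the time
        best = np.argmax(Q1[s2])             # Q1 selects the best action...
        target = r + gamma * Q2[s2, best]    # ...Q2 evaluates it
        Q1[s, a] += alpha * (target - Q1[s, a])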



Double Deep Q Network

Two estimators:

Estimator Q1: selects the best action

Estimator Q2: evaluates Q for that action

In Double DQN, the online network plays the role of Q1 and the frozen target network plays the role of Q2, as the sketch below shows.
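A minimal PyTorch sketch of the Double DQN target (van Hasselt et al., 2016), assuming online_net and target_net map a batch of states to per-action Q-values; the function name and shapes are illustrative.

    import torch

    def ddqn_targets(r, s2, done, online_net, target_net, gamma=0.99):
        with torch.no_grad():
            # the online network selects the greedy action at s′ ...
            best = online_net(s2).argmax(dim=1, keepdim=True)
            # ... and the frozen target network evaluates that action
            next_q = target_net(s2).gather(1, best).squeeze(1)
            return r + gamma * (1 - done) * next_q

Compared with the plain DQN target, only the action selection changes: the argmax comes from the online network instead of the target network.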


Are the Q-values accurate?
