value iteration networks


Value Iteration Networks Aviv Tamar, Sergey Levine, and Pieter Abbeel Presenter: Sungjoon Choi arXiv:1602.02867v1 [cs.AI] 9 Feb 2

Upload: sungjoon-samuel

Post on 21-Feb-2017

Page 1: Value iteration networks

Value Iteration Networks

Aviv Tamar, Sergey Levine, and Pieter Abbeel

Presenter: Sungjoon Choi

arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

Page 2: Value iteration networks

This paper can be used for

Page 3: Value iteration networks

Convolutional Networks

Today, we will see a very clever interpretation of CNNs! A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.

Page 4: Value iteration networks

Convolutional Networks

When it comes to image processing, CNNs are used almost everywhere!

Page 5: Value iteration networks

Structured Prediction?

Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

Page 6: Value iteration networks

Path Planning?

Page 7: Value iteration networks

Why not just End to End?

Page 8: Value iteration networks

Is it Deep Q Learning?

No, it is different. DQN only models the Q-function with a CNN.

Page 9: Value iteration networks

Reinforcement Learning

What makes RL different from other methods?

We only get a reward at certain points, but we have to make a decision at every time step.

Page 10: Value iteration networks

RL: Value Iteration

So, we introduce the notion of value.

And, of course, ways to find the value function.
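As a reminder of what "finding the value function" looks like in the tabular case, here is a minimal sketch of value iteration in NumPy. The toy 4-state chain, the array layout (`P[a, s, s']`, `R[s, a]`) and the function name are assumptions made for illustration; they do not come from the slides or the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P: (A, S, S) array, P[a, s, s2] = probability of s -> s2 under action a.
    R: (S, A) array of immediate rewards.
    Returns the value function V and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem: a 4-state chain, actions 0 = left, 1 = right.
# The last state is an absorbing goal; stepping into it yields reward 1.
n = 4
P = np.zeros((2, n, n))
for s in range(n):
    if s == n - 1:
        P[:, s, s] = 1.0               # goal is absorbing
    else:
        P[0, s, max(s - 1, 0)] = 1.0   # move left (blocked at the wall)
        P[1, s, s + 1] = 1.0           # move right
R = np.zeros((n, 2))
R[n - 2, 1] = 1.0                      # reward for entering the goal

V, policy = value_iteration(P, R, gamma=0.9)
print(V)       # values decay with distance from the goal
print(policy)  # non-goal states choose action 1 (right)
```

The reward is only non-zero at one transition, yet every state ends up with a value, which is what lets the agent make a decision at every step.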

Page 11: Value iteration networks

Value Iteration via CNN?

The paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”

Page 12: Value iteration networks

Value Iteration via CNN?

Page 13: Value iteration networks

Value Iteration Block

Page 14: Value iteration networks

Value Iteration Block

The depth of the Q layer need not be the same as the number of actions.
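The analogy behind the VI block can be sketched in a few lines of NumPy: one value iteration step is a convolution over the reward and value maps that produces one Q channel per kernel, followed by a max over channels. This is only an illustration under stated assumptions: the kernels below are hand-set for deterministic moves on a grid (in a VIN they would be learned), and `correlate3x3`, `vi_block`, the 5x5 grid and the goal location are hypothetical names and values, not the authors' implementation.

```python
import numpy as np

def correlate3x3(img, kernel):
    """'Same'-size 2-D cross-correlation with zero padding for a 3x3 kernel."""
    H, W = img.shape
    padded = np.pad(img, 1)
    out = np.zeros((H, W))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + H, dj:dj + W]
    return out

def vi_block(reward, kernels_r, kernels_v, K):
    """Run K >= 1 rounds of (convolution -> max over channels)."""
    V = np.zeros_like(reward)
    for _ in range(K):
        # Each kernel pair yields one channel of the Q layer.
        Q = np.stack([correlate3x3(reward, kr) + correlate3x3(V, kv)
                      for kr, kv in zip(kernels_r, kernels_v)])
        # Max over the channel axis plays the role of max over actions.
        V = Q.max(axis=0)
    return Q, V

gamma = 0.9
# Kernel positions for: stay, up, down, left, right (assumed action set).
offsets = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]
kernels_r, kernels_v = [], []
for di, dj in offsets:
    kr = np.zeros((3, 3))
    kr[1, 1] = 1.0        # reward of the current cell
    kv = np.zeros((3, 3))
    kv[di, dj] = gamma    # discounted value of the cell moved to
    kernels_r.append(kr)
    kernels_v.append(kv)

reward = np.zeros((5, 5))
reward[2, 4] = 1.0        # hypothetical goal cell

Q, V = vi_block(reward, kernels_r, kernels_v, K=10)
policy = Q.argmax(axis=0)  # index into `offsets` for each cell
```

Here the Q layer has five channels because five kernels were written by hand. Nothing in the computation ties that number to the action set, so with learned kernels the channel count is a free design parameter.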

Page 15: Value iteration networks

Value Iteration Network

VI Block

Page 16: Value iteration networks

Value Iteration Network

Or just a feature extraction stage. (I guess)

Page 17: Value iteration networks

Hierarchical VI Network

Page 18: Value iteration networks

Grid-World Experiment

Page 19: Value iteration networks

Grid-World Experiment

Input: Sequence of states (locations)

Output: Sequence of actions (controls)

Page 20: Value iteration networks

Grid-World Experiment

Value Iteration Network vs. Direct Policy Learning

Page 21: Value iteration networks

Mars Rover Navigation

Page 22: Value iteration networks

Conclusion

A very clever idea: using a CNN as a building block for solving inverse reinforcement learning problems!

Make things differentiable and use deep networks; deep learning tools will take care of the rest.

Still at a conceptual level, but the potential is limitless.
