
Value Iteration Networks

Aviv Tamar, Sergey Levine, and Pieter Abbeel

Presenter: Sungjoon Choi

arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

This paper can be used for

Convolutional Networks

Today, we will see a very clever interpretation of CNNs! A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.

Convolutional Networks

When it comes to image processing, CNNs are used almost everywhere!

Structured Prediction?

Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

Path Planning?

Why not just End to End?

Is it Deep Q Learning?

No, it is different. DQN only uses a CNN to model the Q-function.

Reinforcement Learning

What makes RL different from other methods? We only get a reward at certain points,

but we have to make a decision at every time step.
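To see how a sparse reward still says something about every earlier decision, here is a minimal sketch (my own illustration, not from the paper) of the discounted return G_t = r_t + γ·G_{t+1}. The function name and γ = 0.9 are arbitrary choices.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Reward arrives only at the final step (e.g. reaching a goal).
print(discounted_returns([0, 0, 0, 1], gamma=0.9))
# roughly [0.729, 0.81, 0.9, 1.0], up to float rounding
```

Even though only the last step is rewarded, every earlier step gets a nonzero return that shrinks geometrically with its distance from the reward. The value function is the expectation of exactly this quantity.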

RL: Value Iteration

So, we introduce the notion of value.

And of course, ways to find the value function.
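A minimal sketch of value iteration, V(s) ← max_a [R(s, a) + γ·V(s')], on a tiny deterministic grid world. The map, the step cost of -1, γ = 0.9 and the fixed iteration count are all assumptions for illustration; the paper's grid-world setup differs.

```python
# Deterministic 4-connected grid world; '#' = obstacle, 'G' = goal (terminal).
GRID = [
    "....",
    ".#..",
    "...G",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, move):
    """Apply a move; bumping into an obstacle or the border leaves the agent in place."""
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
        return (r, c)
    return state


def value_iteration(gamma=0.9, step_reward=-1.0, iterations=50):
    states = [(r, c) for r in range(len(GRID))
              for c in range(len(GRID[0])) if GRID[r][c] != "#"]
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_V = {}
        for s in states:
            if GRID[s[0]][s[1]] == "G":
                new_V[s] = 0.0  # terminal goal state: no further cost
                continue
            # Bellman backup: best one-step reward plus discounted next-state value.
            new_V[s] = max(step_reward + gamma * V[step(s, m)]
                           for m in ACTIONS.values())
        V = new_V
    return V


V = value_iteration()
print(round(V[(0, 0)], 4))  # -4.0951: five steps from the goal
```

Each sweep propagates information one more cell away from the goal, which is the same "local update, repeated K times" structure that the VI block on the following slides maps onto convolutions.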

Value Iteration via CNN?

This paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”

Value Iteration via CNN?

Value Iteration Block

The depth of the Q layer need not be the same as the number of actions: the Q channels are learned, so they do not have to correspond one-to-one to the true actions.
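A rough, loop-based sketch of a VI block, assuming a given reward map R (in VIN it is produced by a learned network) and 3x3 kernels that I set by hand so each Q channel performs a deterministic move: Q_a = conv([R, V], W_a), then V = max over channels. In VIN these weights are learned; all names here are hypothetical. The last lines add a fifth "stay" channel to show that the number of Q channels is a free choice.

```python
def conv3x3(channels, kernels):
    """Zero-padded 3x3 cross-correlation of stacked input maps into one output map.

    channels: list of H x W maps (here [R, V]); kernels: one 3x3 kernel per channel.
    Zero padding means a value "off the grid" reads as 0 (not as a wall).
    """
    H, W = len(channels[0]), len(channels[0][0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            total = 0.0
            for x, k in zip(channels, kernels):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            total += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = total
    return out


def vi_block(R, q_kernels, K):
    """K iterations of: Q_a = conv([R, V], W_a) for each channel a; V = max_a Q_a."""
    H, W = len(R), len(R[0])
    V = [[0.0] * W for _ in range(H)]
    for _ in range(K):
        Q = [conv3x3([R, V], w) for w in q_kernels]
        V = [[max(q[i][j] for q in Q) for j in range(W)] for i in range(H)]
    return V


def move_kernel(di, dj, gamma=0.9):
    """Hand-set (R, V) kernels: reward at this cell + gamma * value of neighbour (di, dj)."""
    r_k = [[0.0] * 3 for _ in range(3)]
    r_k[1][1] = 1.0
    v_k = [[0.0] * 3 for _ in range(3)]
    v_k[di + 1][dj + 1] = gamma
    return (r_k, v_k)


R = [[0.0, 0.0, 0.0, 1.0]]  # 1 x 4 corridor, reward only at the right end
four = [move_kernel(di, dj) for di, dj in [(0, -1), (0, 1), (-1, 0), (1, 0)]]
five = four + [move_kernel(0, 0)]  # extra "stay" channel: 5 Q channels, 4 moves
v4 = vi_block(R, four, K=20)
v5 = vi_block(R, five, K=20)
print(v4)
print(v5)
```

In both runs the values increase toward the rewarded cell; the extra channel changes the numbers but not the mechanics. Because everything is convolution plus a channel-wise max, the whole block is differentiable and the kernels could be trained by backpropagation instead of set by hand.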

Value Iteration Network

VI Block

Value Iteration Network

Or just a feature extraction stage. (I guess)

Hierarchical VI Network

Grid-World Experiment

Input: Sequence of states (locations)

Output: Sequence of actions (controls)
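A small sketch of how a demonstrated sequence of locations can be turned into the sequence of actions a policy would be trained to predict. It assumes a 4-connected grid with these four move names; the paper's grid world may use a different action set, and `path_to_actions` is a hypothetical helper.

```python
# Hypothetical mapping from a (row, col) displacement to an action label.
MOVES = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}


def path_to_actions(path):
    """Convert consecutive grid locations into the moves that connect them."""
    actions = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        delta = (r1 - r0, c1 - c0)
        if delta not in MOVES:
            raise ValueError(f"non-adjacent step {(r0, c0)} -> {(r1, c1)}")
        actions.append(MOVES[delta])
    return actions


print(path_to_actions([(0, 0), (0, 1), (1, 1), (2, 1)]))
# ['right', 'down', 'down']
```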

Grid-World Experiment

Value Iteration Network vs. Direct Policy Learning

Mars Rover Navigation

Conclusion

A very clever idea: using a CNN as a building block for solving the inverse reinforcement learning problem!

Make things differentiable and use deep networks; deep learning tools will take care of the rest.

Still at a conceptual level, but the potential is limitless.
