Value Iteration Networks
Aviv Tamar, Sergey Levine, and Pieter Abbeel
Presenter: Sungjoon Choi
arXiv:1602.02867v1 [cs.AI] 9 Feb 2016
This paper can be used for
Convolutional Networks
Today, we will see a very clever interpretation of CNNs! A CNN is not just an efficient feature extractor: this paper draws an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.
When it comes to image processing, CNNs are used almost everywhere!
Structured Prediction?
Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.
Path Planning?
Why not just end-to-end?
Is it Deep Q-Learning?
No, it is different. DQN only uses a CNN to approximate the Q-function; it has no planning module embedded in the network.
Reinforcement Learning
What makes RL different from other methods? We only get a reward at certain points,
but we have to make a decision at every step.
RL: Value Iteration
So, we introduce the notion of value: the expected discounted sum of future rewards obtained from a given state.
And, of course, ways to find the value function.
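As a reference point, textbook value iteration alternates two steps until the values stop changing: Q(s, a) = R(s, a) + γ Σ_{s'} P(s' | s, a) V(s'), then V(s) = max_a Q(s, a). Below is a minimal NumPy sketch on a made-up three-state chain MDP; the arrays P and R and the discount gamma are assumptions chosen for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=1000):
    """Tabular value iteration.

    P: shape (A, S, S); P[a, s, t] is the probability of moving from s to t under action a.
    R: shape (S, A); R[s, a] is the immediate reward for taking action a in state s.
    Returns the value function V (shape (S,)) and a greedy policy (shape (S,)).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Q-step: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        # Max-step: V[s] = max_a Q[s, a]
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final values
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Usage: a 3-state chain 0 -> 1 -> 2, where state 2 is an absorbing goal
left = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
right = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
P = np.stack([left, right])                           # action 0 = left, action 1 = right
R = np.array([[0, 0], [0, 1], [0, 0]], dtype=float)  # +1 for stepping into the goal
V, policy = value_iteration(P, R, gamma=0.9)
# V is approximately [0.9, 1.0, 0.0]; the greedy action in states 0 and 1 is 1 ("right")
```

The two lines inside the loop, a weighted sum over next-state values followed by a max over actions, are the operations that the VIN maps onto a convolution and a channel-wise max.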
Value Iteration via CNN?
The paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”
Value Iteration Block
The depth (number of channels) of the Q layer need not be the same as the number of actions.
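A minimal NumPy sketch of one way to write the VI block on a 2D grid: the reward map R and the current value map V are stacked as two input channels, a 3×3 convolution produces the Q channels, and a max over channels gives the new V. This is repeated K times with the same kernels (weight sharing across iterations is an assumption of this sketch). The function names and the hand-set kernels in the usage example are hypothetical; in a VIN the kernels are learned.

```python
import numpy as np

def conv2d_same(x, w):
    """Multi-channel 2D cross-correlation with zero 'same' padding.

    x: input of shape (C_in, H, W); w: kernels of shape (C_out, C_in, k, k), k odd.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, H, W))
    for di in range(k):
        for dj in range(k):
            # Shifted view of the input for this kernel offset, contracted over input channels
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + H, dj:dj + W])
    return out

def vi_block(r, w, num_iters):
    """K rounds of: conv over [R, V] -> max over Q channels. Requires num_iters >= 1.

    r: reward map (H, W); w: kernels (num_q_channels, 2, k, k), shared across iterations.
    """
    v = np.zeros_like(r)
    for _ in range(num_iters):
        q = conv2d_same(np.stack([r, v]), w)  # (num_q_channels, H, W)
        v = q.max(axis=0)                      # max pooling over the channel dimension
    return q, v

# Usage: hand-set kernels that mimic deterministic moves on a 5x5 grid (illustrative only)
gamma = 0.9
moves = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # stay, up, down, left, right
w = np.zeros((len(moves), 2, 3, 3))
for a, (di, dj) in enumerate(moves):
    w[a, 0, 1, 1] = 1.0               # reward collected in the current cell
    w[a, 1, 1 + di, 1 + dj] = gamma   # discounted value of the cell reached by move a
r = np.zeros((5, 5))
r[2, 2] = 1.0                          # goal cell
q, v = vi_block(r, w, num_iters=10)
```

Here the five Q channels happen to correspond to five moves, but nothing in the code requires that: the kernel tensor can have any number of output channels, which is exactly the freedom the slide points out.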
Value Iteration Network
VI Block
Value Iteration Network
Or just a feature extraction stage. (I guess)
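Putting the pieces together, a VIN-style forward pass can be sketched as follows: a reward module f_R maps the observation to a reward map (here treated as just a feature extraction stage, namely a single 1×1 convolution), the VI block runs K iterations, an attention step keeps only the Q values at the agent's current cell, and a linear layer plus softmax acts as the reactive policy. This is a simplified NumPy sketch under those assumptions with random weights; the parameter names and shapes are hypothetical, and in a real VIN all weights are trained end to end by backpropagation.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded multi-channel cross-correlation: x (C_in, H, W), w (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, H, W))
    for di in range(k):
        for dj in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + H, dj:dj + W])
    return out

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def vin_forward(obs, pos, params, num_iters):
    """Simplified VIN-style policy; returns action probabilities. Requires num_iters >= 1.

    obs: observation image (C, H, W); pos: agent location (i, j).
    params (hypothetical keys):
        "w_r":  (1, C, 1, 1)  reward module f_R as a 1x1 conv
        "w_vi": (Q, 2, 3, 3)  VI block kernels over [R, V]
        "w_pi": (A, Q)        reactive policy on the attended Q values
    """
    r = conv2d_same(obs, params["w_r"])[0]                 # reward map (H, W)
    v = np.zeros_like(r)
    for _ in range(num_iters):
        q = conv2d_same(np.stack([r, v]), params["w_vi"])  # (Q, H, W)
        v = q.max(axis=0)
    i, j = pos
    q_att = q[:, i, j]                                     # attention: Q values at the agent's cell
    return softmax(params["w_pi"] @ q_att)

# Usage with random weights: 10 Q channels, 8 actions
rng = np.random.default_rng(0)
C, H, W, num_q, num_actions = 2, 8, 8, 10, 8
params = {
    "w_r": rng.normal(size=(1, C, 1, 1)),
    "w_vi": 0.1 * rng.normal(size=(num_q, 2, 3, 3)),
    "w_pi": rng.normal(size=(num_actions, num_q)),
}
obs = rng.integers(0, 2, size=(C, H, W)).astype(float)
probs = vin_forward(obs, (3, 4), params, num_iters=5)
```

The number of Q channels (10) differs from the number of actions (8); the final linear layer maps one to the other.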
Hierarchical VI Network
Grid-World Experiment
Input: sequence of states (locations)
Output: sequence of actions (controls)
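To train a policy that maps states to actions, a demonstrated path of grid locations has to be turned into (state, action) labels. Below is a small sketch of one way to do that, assuming 8-connected moves; the action encoding and the function name are hypothetical, not taken from the paper.

```python
# Hypothetical encoding of the 8 grid moves as (row offset, column offset)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1),
         (-1, -1), (-1, 1), (1, -1), (1, 1)]
MOVE_INDEX = {m: a for a, m in enumerate(MOVES)}

def path_to_training_pairs(path):
    """Turn a path [(i0, j0), (i1, j1), ...] into a list of (state, action_index) pairs."""
    pairs = []
    for (i, j), (ni, nj) in zip(path, path[1:]):
        move = (ni - i, nj - j)
        if move not in MOVE_INDEX:
            raise ValueError(f"states {(i, j)} -> {(ni, nj)} are not adjacent")
        pairs.append(((i, j), MOVE_INDEX[move]))
    return pairs

pairs = path_to_training_pairs([(0, 0), (1, 1), (1, 2), (2, 2)])
# pairs == [((0, 0), 7), ((1, 1), 3), ((1, 2), 1)]
```

Each pair is one supervised example: the network sees the map and the state and is trained to predict the action index.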
Value Iteration Network vs. Direct Policy Learning
Mars Rover Navigation
Conclusion
A very clever idea: using a CNN as a building block for solving the inverse reinforcement learning problem!
Make things differentiable and use deep networks; deep learning tools will take care of the rest.
Still at a conceptual level, but the potential is limitless.