value iteration networks


Upload: sungjoon-samuel

Post on 21-Feb-2017


Page 1: Value iteration networks

Value Iteration Networks

Aviv Tamar, Sergey Levine, and Pieter Abbeel

Presenter: Sungjoon Choi

arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

Page 2: Value iteration networks

This paper can be used for

Page 3: Value iteration networks

Convolutional Networks

Today we will see a very clever interpretation of CNNs. A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.

Page 4: Value iteration networks

Convolutional Networks

When it comes to image processing, CNNs are used almost everywhere!

Page 5: Value iteration networks

Structured Prediction?

Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

Page 6: Value iteration networks

Path Planning?

Page 7: Value iteration networks

Why not just End to End?

Page 8: Value iteration networks

Is it Deep Q Learning?

No, it is different. DQN only models the Q-function with a CNN.

Page 9: Value iteration networks

Reinforcement Learning

What makes RL different from other methods?

We only get a reward at certain points, but we have to make a decision at every step.

Page 10: Value iteration networks

RL: Value Iteration

So, we introduce the notion of value.

And of course, ways to find the value function.
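As a concrete illustration of "finding the value function", here is a minimal sketch of tabular value iteration on a small grid world. Everything in it is an assumption made for the example and is not taken from the slides: the deterministic four-direction moves, the discount `GAMMA = 0.9`, the reward layout, and the helper names `step`, `value_iteration`, and `greedy_action`.

```python
# Tabular value iteration on a tiny deterministic grid world (illustrative sketch).
GAMMA = 0.9                                    # assumed discount factor
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action, rows, cols):
    """Deterministic move; bumping into the border leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < rows and 0 <= c < cols:
        return (r, c)
    return state


def value_iteration(rewards, goal, n_iters=50, gamma=GAMMA):
    """Apply n_iters Bellman backups; rewards[r][c] is paid on entering (r, c)."""
    rows, cols = len(rewards), len(rewards[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_iters):
        new_values = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue  # the goal is terminal, so its value stays 0
                q_values = []
                for action in ACTIONS:
                    nr, nc = step((r, c), action, rows, cols)
                    q_values.append(rewards[nr][nc] + gamma * values[nr][nc])
                new_values[r][c] = max(q_values)  # V(s) = max_a Q(s, a)
        values = new_values
    return values


def greedy_action(values, rewards, state, gamma=GAMMA):
    """Pick the action with the largest one-step look-ahead value."""
    rows, cols = len(rewards), len(rewards[0])

    def q_value(action):
        nr, nc = step(state, action, rows, cols)
        return rewards[nr][nc] + gamma * values[nr][nc]

    return max(ACTIONS, key=q_value)


# Example: 3x3 grid, -1 per step, +10 for entering the goal in the corner.
rewards = [[-1.0] * 3 for _ in range(3)]
rewards[2][2] = 10.0
values = value_iteration(rewards, goal=(2, 2))
```

Even though the reward is only non-trivial at the goal, the value table gives every cell a number to act on, which is what lets the agent make a decision at every step.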

Page 11: Value iteration networks

Value Iteration via CNN?

This paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”
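For reference, the recursion that such a planning module unrolls is ordinary value iteration. The notation here is an assumption of this note rather than something shown on the slide: states $s$, actions $a$, reward $R$, transition probabilities $P$, and discount factor $\gamma$.

```latex
Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s'),
\qquad
V_{n+1}(s) = \max_{a} Q_n(s, a)
```

Read loosely, when states lie on a grid and transitions are local, the weighted sum over neighboring states resembles a convolution over the value map, and the maximum over actions resembles max-pooling across the channels of the resulting Q layer.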

Page 12: Value iteration networks

Value Iteration via CNN?

Page 13: Value iteration networks

Value Iteration Block

Page 14: Value iteration networks

Value Iteration Block

The depth of the Q layer need not be the same as the number of actions.
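The following is a rough sketch of how a value iteration block can be written as "convolve, then take the max over channels", repeated K times. It is an illustration under stated assumptions, not the authors' implementation: the kernels are hand-set shift filters instead of learned weights, the convolution is a plain 3x3 zero-padded cross-correlation, and the names `conv2d_same`, `vi_block`, and `shift_kernel` are hypothetical.

```python
# Sketch of a VI block: Q = conv(reward) + conv(value), V = max over Q channels.
GAMMA = 0.9                                    # assumed discount factor
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # one Q channel per grid move


def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding (the usual CNN 'convolution')."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[i][j] * image[rr][cc]
            out[r][c] = total
    return out


def vi_block(reward, reward_kernels, value_kernels, k_iters):
    """Run k_iters rounds of (convolution -> channel-wise max)."""
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    q_layers = []
    for _ in range(k_iters):
        q_layers = []
        # One Q channel per kernel pair; the channel count is a free choice.
        for w_r, w_v in zip(reward_kernels, value_kernels):
            q_r = conv2d_same(reward, w_r)
            q_v = conv2d_same(value, w_v)
            q_layers.append([[q_r[r][c] + q_v[r][c] for c in range(cols)]
                             for r in range(rows)])
        # Max over the channel dimension plays the role of max over actions.
        value = [[max(q[r][c] for q in q_layers) for c in range(cols)]
                 for r in range(rows)]
    return value, q_layers


def shift_kernel(dr, dc, weight):
    """Kernel that reads the neighbor at offset (dr, dc), scaled by weight."""
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[dr + 1][dc + 1] = weight
    return kernel


# Hand-set kernels that mimic deterministic moves; a trained network would learn these.
REWARD_KERNELS = [shift_kernel(dr, dc, 1.0) for dr, dc in OFFSETS]
VALUE_KERNELS = [shift_kernel(dr, dc, GAMMA) for dr, dc in OFFSETS]

reward_map = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
value_map, q_layers = vi_block(reward_map, REWARD_KERNELS, VALUE_KERNELS, k_iters=2)
```

Because the number of Q channels is simply the number of kernel pairs passed in, nothing in this structure ties it to the number of actions in the real task.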

Page 15: Value iteration networks

Value Iteration Network

VI Block

Page 16: Value iteration networks

Value Iteration Network

Or just a feature extraction stage. (I guess)

Page 17: Value iteration networks

Hierarchical VI Network

Page 18: Value iteration networks

Grid-World Experiment

Page 19: Value iteration networks

Grid-World Experiment

Input: Sequence of states (locations)

Output: Sequence of actions (controls)

Page 20: Value iteration networks

Grid-World Experiment

Value Iteration Network vs. Direct Policy Learning

Page 21: Value iteration networks

Mars Rover Navigation

Page 22: Value iteration networks

Conclusion

A very clever idea: using a CNN as a building block for solving inverse reinforcement learning problems!

Make things differentiable and use deep networks; deep learning tools will take care of the rest.

Still at a conceptual level, but the potential is limitless.