Post on 21-Feb-2017

Value Iteration Networks

Aviv Tamar, Sergey Levine, and Pieter Abbeel

Presenter: Sungjoon Choi

arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

This paper can be used for

Convolutional Networks

Today, we will see a very clever interpretation of CNNs! A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.

When it comes to image processing, CNNs are used almost everywhere!

Structured Prediction?

Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

Path Planning?

Why not just End to End?

Is it Deep Q Learning?

No, it is different. DQN only uses a CNN to model the Q-function.

Reinforcement Learning

What makes RL different from other methods?

We only get the reward at certain points, but we have to make a decision at every time step.

RL: Value Iteration

So, we introduce the notion of value.

And of course, ways to find the value function.
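As a minimal sketch of one such way, value iteration repeatedly replaces the value of each state with its reward plus the discounted value of the best neighboring state. The grid world, the 4-connected action set, the name `value_iteration`, and the default discount are all assumptions made for illustration; none of them comes from the slides.

```python
import numpy as np

# Hypothetical 4-connected action set: (row offset, column offset).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def value_iteration(reward, gamma=0.9, n_iters=50):
    """Run n_iters Bellman backups on a deterministic grid world.

    reward[r, c] is the reward for being in cell (r, c).
    A move that would leave the grid keeps the agent in place.
    """
    n_rows, n_cols = reward.shape
    value = np.zeros((n_rows, n_cols))
    for _ in range(n_iters):
        # Edge padding makes an off-grid neighbor equal to the current cell.
        padded = np.pad(value, 1, mode="edge")
        # One Q map per action: reward here + discounted value of the next cell.
        q = np.stack([
            reward + gamma * padded[1 + dr:1 + dr + n_rows, 1 + dc:1 + dc + n_cols]
            for dr, dc in ACTIONS
        ])
        # The value of a state is the Q value of its best action.
        value = q.max(axis=0)
    return value
```

Calling `value_iteration` on a reward map that is zero everywhere except at a goal cell gives values that decay with distance from the goal, so moving toward higher value leads to the goal.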

Value Iteration via CNN?

This paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”

Value Iteration Block

The depth of the Q layer need not be the same as the number of actions.
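The analogy can be sketched in a few lines: a convolution over the stacked reward and value maps produces one Q channel per filter pair, and a max over the channel dimension produces the next value map; repeating this K times mimics K steps of value iteration. The sketch below uses plain NumPy and hand-set 3x3 filters, whereas in the network the filters are learned by backpropagation. The names `correlate3x3`, `vi_block`, and `hand_set_weights` are hypothetical, and zero padding at the border is an assumption.

```python
import numpy as np

def correlate3x3(image, kernel):
    """Zero-padded 3x3 cross-correlation (a 'same' convolution layer)."""
    n_rows, n_cols = image.shape
    padded = np.pad(image, 1)
    out = np.zeros((n_rows, n_cols))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + n_rows, j:j + n_cols]
    return out

def vi_block(reward, w_reward, w_value, n_iters):
    """Repeat n_iters times: Q = conv([reward, value]); value = max over Q channels.

    w_reward and w_value have shape (n_channels, 3, 3). The number of
    channels is a free choice and need not equal the number of actions.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    value = np.zeros(reward.shape)
    for _ in range(n_iters):
        q = np.stack([
            correlate3x3(reward, w_r) + correlate3x3(value, w_v)
            for w_r, w_v in zip(w_reward, w_value)
        ])
        value = q.max(axis=0)  # max-pooling along the channel dimension
    return value, q

def hand_set_weights(gamma, offsets=((-1, 0), (1, 0), (0, -1), (0, 1))):
    """Filters that reproduce value iteration for deterministic grid moves."""
    w_reward = np.zeros((len(offsets), 3, 3))
    w_value = np.zeros((len(offsets), 3, 3))
    for a, (dr, dc) in enumerate(offsets):
        w_reward[a, 1, 1] = 1.0             # reward of the current cell
        w_value[a, 1 + dr, 1 + dc] = gamma  # discounted value of the next cell
    return w_reward, w_value
```

With the hand-set filters, each Q channel corresponds to one move. With learned filters the channels are free to represent something other than the actual actions, which is one way to read the remark about the depth of the Q layer.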

Value Iteration Network

VI Block

Value Iteration Network

Or just a feature extraction stage. (I guess)

Hierarchical VI Network

Grid-World Experiment

Input: Sequence of states (locations)
Output: Sequence of actions (controls)

Value Iteration Network vs. Direct Policy Learning

Mars Rover Navigation

Conclusion

A very clever idea: using a CNN as a building block for solving inverse reinforcement learning problems!

Make things differentiable and use deep networks; deep learning tools will take care of the rest.

Still at a conceptual level, but the potential is limitless.
