TRIALS AND TRIBULATIONS Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm

Upload: jerry

Post on 24-Feb-2016



TRANSCRIPT

Page 1: Trials and Tribulations

TRIALS AND TRIBULATIONS

Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm

Page 2: Trials and Tribulations

SUBJECT OF INVESTIGATION

How humans integrate visual object properties into their action policy when learning a novel visuomotor task.

• BubblePop!

Problem: Too many possible questions…

Solution: Motivate behavioral research by looking at modeling difficulties.

• Nonobvious crossroads

Page 3: Trials and Tribulations

APPROACH

Since the task provides only a scalar performance signal, the model must use reinforcement learning.

• Temporal Difference Back Propagation

Start with an extremely simplified version of the task and add back the complexity once you have a successful model.

Analyze the representational and architectural constraints necessary for each model.
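A minimal sketch of what a temporal-difference update trained by backpropagation could look like. The slides do not give the actual network or hyperparameters, so the class name `TinyValueNet`, the tanh hidden layer, the discount `gamma`, and the learning rate `lr` are all illustrative assumptions.

```python
import numpy as np

class TinyValueNet:
    """Hypothetical one-hidden-layer value network trained with TD(0)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        # Returns the predicted value and the hidden activations.
        h = np.tanh(self.W1 @ x + self.b1)
        return float(self.w2 @ h + self.b2), h

    def td_update(self, x, reward, x_next, terminal, gamma=0.9, lr=0.1):
        v, h = self.forward(x)
        v_next = 0.0 if terminal else self.forward(x_next)[0]
        delta = reward + gamma * v_next - v  # TD error
        # Backpropagate dV/dparams for the current state only
        # (semi-gradient TD), then scale by the TD error.
        grad_h = self.w2 * (1.0 - h ** 2)
        self.w2 += lr * delta * h
        self.b2 += lr * delta
        self.W1 += lr * delta * np.outer(grad_h, x)
        self.b1 += lr * delta * grad_h
        return delta
```

Calling `td_update` repeatedly on a rewarded terminal transition should pull the predicted value of that state toward the reward.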

Page 4: Trials and Tribulations

FIRST STEPS: DUMMY WORLD

5x5 grid-world

4 possible actions

• Up, down, left, right

1 unmoving target

Starting locations of target and agent randomly assigned

Fixed reward upon reaching target, and a new target is generated

Epoch ends after a fixed number of steps
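The dummy world on this slide can be sketched as a small environment. The class name `DummyWorld`, the default of 100 steps per epoch, the reward of 1.0, clamping at the walls, and never placing a new target on the agent's cell are assumptions; the slides do not specify them.

```python
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class DummyWorld:
    """Hypothetical 5x5 grid-world with one unmoving target."""

    def __init__(self, size=5, max_steps=100, reward=1.0, seed=None):
        self.size, self.max_steps, self.reward = size, max_steps, reward
        self.rng = random.Random(seed)
        self.reset()

    def _random_cell(self, exclude=None):
        # Draw cells until one differs from `exclude`.
        while True:
            cell = (self.rng.randrange(self.size), self.rng.randrange(self.size))
            if cell != exclude:
                return cell

    def reset(self):
        self.steps = 0
        self.agent = self._random_cell()
        self.target = self._random_cell(exclude=self.agent)
        return self.agent, self.target

    def step(self, action):
        dr, dc = MOVES[action]
        # Assumption: moving into a wall leaves the agent at the edge.
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        self.steps += 1
        reward = 0.0
        if self.agent == self.target:
            reward = self.reward
            self.target = self._random_cell(exclude=self.agent)
        done = self.steps >= self.max_steps
        return (self.agent, self.target), reward, done
```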

Page 5: Trials and Tribulations

DUMMY WORLD ARCHITECTURES

[Architecture diagram; recoverable labels:]

• 25 units for the grid: the whole grid (allocentric), or agent-centered (egocentric)

• 4 actions

• 1 context unit (ego only)

• Hidden layer of 8 units

• Expected reward
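One way to read the two 25-unit input codes is sketched below. The slides do not say how agent and target are marked, so both encodings are assumptions: the allocentric code marks the target with +1 and the agent with -1 on the world grid, and the egocentric code is a 5x5 window centered on the agent, in which a target outside the window is simply not visible.

```python
import numpy as np

def allocentric(agent, target, size=5):
    """World-centered code (assumed): target = +1, agent = -1."""
    grid = np.zeros((size, size))
    grid[target] = 1.0
    grid[agent] = -1.0
    return grid.ravel()

def egocentric(agent, target, view=5):
    """Agent-centered code (assumed): the agent sits at the window center."""
    grid = np.zeros((view, view))
    c = view // 2
    r = target[0] - agent[0] + c
    col = target[1] - agent[1] + c
    if 0 <= r < view and 0 <= col < view:
        grid[r, col] = 1.0  # the target is inside the window
    return grid.ravel()
```

Under this reading, a target can fall outside the egocentric window, which would be one reason an egocentric model needs extra context or memory.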

Page 6: Trials and Tribulations

BUILDING IN SYMMETRY

Current architectures learn each action independently.

‘Up’ is like ‘Down’, but different.

• It shifts the world

1 action, 4 different inputs

• “In which rotation of the world would you rather go ‘up’?”
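The "1 action, 4 different inputs" idea can be sketched as follows: rotate the agent-centered grid so that each candidate direction points up, score every rotation with a single 'up' evaluator, and pick the best. The egocentric grid layout and the function `up_value_stub` (a hand-written stand-in for the learned network) are assumptions for illustration.

```python
import numpy as np

ACTIONS = ["up", "right", "down", "left"]
# np.rot90 rotates counter-clockwise, so k quarter-turns bring
# 'right' (k=1), 'down' (k=2) and 'left' (k=3) to the 'up' position.
ROTATIONS = {"up": 0, "right": 1, "down": 2, "left": 3}

def rotate_for_action(ego_grid, action):
    return np.rot90(ego_grid, k=ROTATIONS[action])

def up_value_stub(ego_grid):
    """Hypothetical stand-in for the network's value of going 'up':
    negative Manhattan distance from the cell above the agent to the target."""
    c = ego_grid.shape[0] // 2
    rows, cols = np.nonzero(ego_grid)
    if len(rows) == 0:
        return 0.0
    return -float(abs(rows[0] - (c - 1)) + abs(cols[0] - c))

def best_action(ego_grid, up_value=up_value_stub):
    # One evaluator, four rotated inputs.
    return max(ACTIONS, key=lambda a: up_value(rotate_for_action(ego_grid, a)))
```

Because only one evaluator is trained, anything learned about going 'up' transfers to the other three directions.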

Page 7: Trials and Tribulations

WORLD SCALING

Scaled grid size up to 10x10

• Not as unrealistic as one might think… (tile coding)

Scaled number of targets

• Difference from 1 to 2, but not from 2 to many.

Confirmed ‘winning-est’ representation

Added memory

Page 8: Trials and Tribulations

NO LOW-HANGING FRUIT: THE RIPENESS PROBLEM

Added a ‘ripeness’ dimension to the target, and changed the reward function:

If target.ripeness > .60
    reward = 1;
Else
    reward = -.66667;

How the problem occurs:

1. At a high temperature you move randomly.

2. The random pops net zero reward.

3. The temperature lowers and you ignore the target entirely.
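Step 2 can be checked with a short calculation. Assuming ripeness is uniformly distributed on [0, 1] (the slides do not state the distribution), 40% of random pops earn +1 and 60% earn -.66667, so the expected reward of a random pop is approximately zero. The function names here are illustrative.

```python
def reward(ripeness, threshold=0.60):
    # Reward function from the slide.
    return 1.0 if ripeness > threshold else -0.66667

def expected_random_pop_reward(threshold=0.60):
    # Assumption: ripeness ~ Uniform(0, 1), so P(ripe) = 1 - threshold.
    p_ripe = 1.0 - threshold
    return p_ripe * reward(1.0, threshold) + (1.0 - p_ripe) * reward(0.0, threshold)
```

With no net payoff from popping at random, the agent has little to reinforce by the time the temperature drops.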

Page 9: Trials and Tribulations

ANNEALING AWAY THE CURSE OF PICKINESS

Page 10: Trials and Tribulations

A PSYCHOLOGICALLY PLAUSIBLE SOLUTION

No feedback for almost ripe

So how could we anneal our ripeness criterion?

Anneal the amount you care about unripe pops.

Differentiate internal and external reward functions
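A minimal sketch of separating internal from external reward, assuming the agent initially ignores the penalty for unripe pops and comes to care about it over training. The linear schedule, the `anneal_epochs` parameter, and the function names are hypothetical; the slides do not specify how the annealing was done.

```python
def external_reward(ripeness, threshold=0.60):
    # Reward delivered by the task, as defined on the ripeness slide.
    return 1.0 if ripeness > threshold else -0.66667

def care_weight(epoch, anneal_epochs=100):
    # Assumed linear schedule: 0 (penalties ignored) up to 1 (full penalty).
    return min(1.0, epoch / anneal_epochs)

def internal_reward(ripeness, epoch, anneal_epochs=100):
    r = external_reward(ripeness)
    if r > 0:
        return r  # ripe pops are always fully rewarded
    return care_weight(epoch, anneal_epochs) * r  # unripe penalty is annealed in
```

Early on, popping anything is never punished, so the agent can first learn to reach targets; the ripeness criterion then tightens as the weight approaches 1.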

Page 11: Trials and Tribulations

FUTURE DIRECTIONS

Investigate how the type of ripeness difficulty impacts computational demands.

• Difficulty due to reward schedule vs. perceptual acuity vs. redundancy vs. conjunctive-ness vs. ease of prediction

How to handle the ‘Feature Binding Problem’ in this context

• Emergent binding through deep learning?

Just keep increasing complexity and see what problems crop up.

• If the model gets to human-level performance without a hitch, then that’d be pretty good too.

Page 12: Trials and Tribulations

SUMMARY & DISCUSSION

Egocentric representations pay off in this domain, even with the added memory cost.

• In any domain with a single agent?

Symmetries in the action space can be exploited to greatly expedite learning

• Could there be a general mechanism for detecting such symmetries?

Difficult reward functions might be learnt via annealing internal reward signals.

• How could we have this annealing emerge from the model?

Page 13: Trials and Tribulations

QUESTIONS?