Incorporating Advice into Agents that Learn from Reinforcement Presented by Alp Sardağ


Page 1: Incorporating Advice into Agents that Learn from Reinforcement

Incorporating Advice into Agents that Learn from

Reinforcement

Presented by

Alp Sardağ

Page 2

Problem of RL
• Reinforcement learning usually requires a large number of training episodes.

How can this be overcome? Two approaches:
• Implicit representation of the utility function
• Allowing the Q-learner to accept advice at any time, in a natural manner

Page 3

Input Generalization
• To learn to play a game such as chess (roughly 10^120 states), it is impossible to visit all states.
• Implicit representation of the function: a form that allows the output to be calculated for any input, and is much more compact than a tabular form.

Example:

U(i) = w₁f₁(i) + w₂f₂(i) + ... + wₙfₙ(i)
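The weighted-feature form above can be sketched in a few lines of Python; the two features and their weights below are illustrative assumptions, not part of the slides.

```python
# Sketch of an implicit utility function U(i) = w1*f1(i) + ... + wn*fn(i).
# The features and weights are hypothetical, chosen only for illustration.

def linear_utility(weights, features, state):
    """Compute U(state) as a weighted sum of feature values."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Two hand-crafted features of a hypothetical board state:
features = [
    lambda s: s["material_balance"],  # f1: piece advantage
    lambda s: s["mobility"],          # f2: number of legal moves
]
weights = [0.7, 0.3]

state = {"material_balance": 2.0, "mobility": 10.0}
print(round(linear_utility(weights, features, state), 6))  # 0.7*2.0 + 0.3*10.0 = 4.4
```

The point of the compact form is that a handful of weights stands in for a table with one entry per state.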

Page 4

Connectionist Q-learning
• Since the function to be learned is characterized by a vector of weights w, neural networks are obvious candidates for learning those weights.

The new update rule:

w ← w + α [ r + U_w(j) − U_w(i) ] ∇_w U_w(i)
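For a linear U_w(s) = Σ_k w_k·s_k the gradient ∇_w U_w(i) is simply the state's feature vector, so the update rule can be sketched as below; the feature vectors, reward, and learning rate are made-up examples.

```python
# Sketch of the update  w <- w + alpha * (r + U_w(j) - U_w(i)) * grad_w U_w(i)
# for a linear utility function, where the gradient with respect to w is just
# the feature vector of state i. All numbers are illustrative.

def td_update(w, state_i, reward, state_j, alpha=0.1):
    """One temporal-difference step on the weights of a linear U_w."""
    u_i = sum(wk * xk for wk, xk in zip(w, state_i))
    u_j = sum(wk * xk for wk, xk in zip(w, state_j))
    td_error = reward + u_j - u_i                        # r + U_w(j) - U_w(i)
    return [wk + alpha * td_error * xk for wk, xk in zip(w, state_i)]

w = td_update([0.0, 0.0], state_i=[1.0, 0.0], reward=1.0, state_j=[0.0, 1.0])
print(w)  # [0.1, 0.0]: the weight on the active feature moves toward the reward
```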

Note: TD-Gammon learned to play better than Neurogammon.

Page 5

Example of Advice Taking

Advice: Don’t go into box canyons when opponents are in sight.

Page 6

General Structure of the RL Learner

A connectionist Q-learner augmented with advice-taking.

Page 7

Connectionist Q-learning
• Q(a,i): a utility function that maps states and actions to numeric values.

• Given a perfect version of this function, the optimal plan is simply to choose, in each state reached, the action with maximum utility.

• The utility function is implemented as a neural network, whose inputs describe the current state and whose outputs are the utility of each action.
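Greedy action selection from such a network can be sketched as follows; the action names and utility values are hypothetical, not from the slides.

```python
# Choosing the action with maximum utility Q(a, i), given the network's
# outputs for the current state. Actions and values are hypothetical.

def best_action(utilities):
    """Return the action whose estimated utility is largest."""
    return max(utilities, key=utilities.get)

# Hypothetical network outputs for one state:
utilities = {"MoveEast": 0.2, "MoveNorth": 0.9, "MoveWest": -0.1}
print(best_action(utilities))  # MoveNorth
```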

Page 8

Step 1 in Taking Advice
• Request the advice: instead of having the learner request advice, the external observer provides it whenever the observer feels it is appropriate. There are two reasons for this:
• It places less burden on the observer.
• It is an open question how to create the best mechanism for having an RL agent recognize its need for advice.

Page 9

Step 2 in Taking Advice
• Convert the advice to an internal representation: due to the complexities of natural language processing, the external observer expresses its advice using a simple programming language and a list of task-specific terms.

• Example:
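The slide's example is an image that did not survive the transcript. Purely as an illustration (the task-specific terms and the rule are hypothetical, echoing the earlier box-canyon advice), advice of the form IF condition THEN action might be encoded like this:

```python
# Hypothetical internal representation of one piece of advice; the terms
# EnemyInSight / InBoxCanyon are invented for illustration.

def condition_holds(terms, state):
    """All task-specific terms in the condition must hold in the current state."""
    return all(state.get(term, False) for term in terms)

advice = {"if": ["EnemyInSight", "InBoxCanyon"], "then": "MoveBackward"}

state = {"EnemyInSight": True, "InBoxCanyon": True}
if condition_holds(advice["if"], state):
    print(advice["then"])  # MoveBackward
```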

Page 10

Step 3 in Advice Taking
• Convert the advice into usable form: using techniques from knowledge compilation, a learner can convert high-level advice into a collection of directly interpretable statements.

Page 11

Step 4 in Advice Taking
• Use ideas from knowledge-based neural networks: install the operationalized advice into the connectionist representation of the utility function.
• A ruleset is converted into a network by mapping the “target concepts” of the ruleset to output units and creating hidden units that represent the intermediate conclusions.
• Rules are installed into networks incrementally.
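The mapping from rules to network units can be sketched for a single conjunctive rule; the weight magnitude (omega = 4) follows the usual knowledge-based-network convention, and the two-antecedent rule is a made-up example.

```python
# Sketch of installing a rule "IF a AND b THEN c" as a hidden unit: weights
# and bias are set so a sigmoid unit fires only when all antecedents are ~1.

import math

def install_conjunctive_rule(n_antecedents, omega=4.0):
    """Weights and bias for a sigmoid unit encoding an AND of n antecedents."""
    weights = [omega] * n_antecedents
    bias = -omega * (n_antecedents - 0.5)  # net input is positive only if all inputs ~1
    return weights, bias

def unit_output(inputs, weights, bias):
    net = sum(w * x for w, x in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))    # sigmoid activation

w, b = install_conjunctive_rule(2)
print(round(unit_output([1, 1], w, b), 2))  # 0.88: both antecedents true -> unit fires
print(round(unit_output([1, 0], w, b), 2))  # 0.12: one antecedent false -> unit silent
```

Because the installed weights are ordinary network weights, subsequent Q-learning can refine (or overrule) the advice.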

Page 12

Example

Page 13

Example Cont.

Advice added. Note that the inputs and outputs of the network remain unchanged; the advice only changes how the function from states to action utilities is calculated.

Page 14

Example Cont.

A multistep plan:

Page 15

Example Cont.

A multistep plan embedded in a REPEAT:

Page 16

Example Cont.

Advice that involves previously defined terms:

Page 17

Judge the Value of Advice
• Once the advice is inserted, the RL agent returns to exploring its environment, thereby integrating and refining the advice.

• In some circumstances, such as a game learner that can play against itself, it would be straightforward to evaluate the advice empirically.

• It would also be possible to allow the observer to retract or counteract bad advice.

Page 18

Test Bed

Test environment: (a) a sample configuration; (b) a sample division of the environment into sectors; (c) distances measured by the agent's sensors; (d) a neural network that computes the utility of actions.

Page 19

Methodology
• The agents are trained for a fixed number of episodes in each experiment.
• An episode consists of placing the agent into a randomly generated initial environment and allowing it to explore until it is captured or a threshold of 500 steps is reached.
• The environment is a 7x7 grid containing 15 obstacles, 3 enemy agents, and 10 rewards.
• 3 randomly generated environments.
• 10 randomly initialized networks.

• Average total reinforcement is measured by freezing the network and measuring the average reinforcement on a test set.
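The episode and evaluation loop described above can be sketched as follows; the environment interface (reset/step) is a hypothetical stand-in, not the paper's actual test bed.

```python
# Sketch of the methodology: run an episode until the agent is captured or
# the 500-step threshold elapses, then evaluate a frozen policy (no weight
# updates) by averaging reinforcement over a test set of environments.
# The env interface used here is hypothetical.

def run_episode(policy, env, max_steps=500):
    """Total reinforcement collected in one episode."""
    total = 0.0
    state = env.reset()
    for _ in range(max_steps):
        state, reward, captured = env.step(policy(state))
        total += reward
        if captured:
            break
    return total

def average_test_reinforcement(policy, test_envs):
    """Average reinforcement of a frozen policy over the test environments."""
    return sum(run_episode(policy, env) for env in test_envs) / len(test_envs)
```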

Page 20

Advice

Page 21

Test Set Results

Page 22

Test Set Results

The table above shows how well each piece of advice meets its intent.

Page 23

Related Work

Gordon and Subramanian (1994) developed a similar system. Their agent accepts high-level advice of the form IF condition THEN ACHIEVE goal, and operationalizes these rules using its background knowledge about goal achievement. The resulting rules are then refined using genetic algorithms.