PAPER BY
JONATHAN MUGAN & BENJAMIN KUIPERS
PRESENTED BY
DANIEL HOUGH
Learning Distinctions and Rules in a Continuous World through
Active Exploration
The Challenge
To build a robot which learns its environment like children do.
Piaget [1952] theorised that children construct this knowledge in stages.
Cohen [2002] proposed that children have a domain-general information processing system for bootstrapping knowledge.
Foundations
The Focus of the work: How a developing agent can learn temporal contingencies in the form of predictive rules over events.
Watson [2001] proposed a model of contingencies based on his observations of infant behaviour:
Prospective temporal contingency: Event B tends to follow Event A with a likelihood greater than chance.
Retrospective temporal contingency: Event A tends to come before Event B more often than chance.
Distinctions must be found to determine when an event has occurred.
Foundations
Drescher [1991] proposed a model inspired by Piaget where contingencies (here schemas) are found using marginal attribution.
Results that follow actions are identified, in a manner similar to Watson's model.
For each schema (in the form of an action + result), the algorithm searches for context (situation) that makes the result more likely to follow that action.
The Method: Introduction
Here, prospective contingencies as well as contingencies in which events occur simultaneously are represented using predictive rules
These rules are learned using a method inspired by marginal attribution
The difference from Drescher's work is the use of continuous variables.
This brings up the issue of determining when events occur, so distinctions must be found.
The Method: Introduction
Motor babbling method from last week to learn distinctions and contingencies.
That method was undirected, so it does not scale to larger problems – too much effort is wasted on uninteresting portions of the state space.
The Method: Introduction
In this algorithm, the agent receives as input the values of time-varying continuous variables but can only represent, reason about and construct knowledge using discrete values.
Continuous values are discretised using distinctions in the form of landmarks:
A discrete value v(t) for each continuous variable v'(t).
If, for landmarks v1 and v2, v1 < v'(t) < v2, then v(t) has the open interval between v1 and v2 as its value, v = (v1, v2).
This association means the agent can focus on changes of v – events.
The agent greedily learns rules that use one event to predict another.
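The landmark discretisation above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name and tolerance are my own assumptions.

```python
# Hypothetical sketch: map a continuous reading v'(t) to a qualitative
# value, given a sorted list of landmark values.

def qualitative_value(reading, landmarks, tol=1e-9):
    """Return the landmark the reading sits on, or the open interval
    (v1, v2) between landmarks that contains it."""
    for lm in landmarks:
        if abs(reading - lm) <= tol:
            return lm                      # reading coincides with a landmark
    lo = float("-inf")
    for lm in landmarks:                   # find the containing open interval
        if reading < lm:
            return (lo, lm)
        lo = lm
    return (lo, float("inf"))
```

For example, with landmarks [1.0, 2.0], a reading of 1.5 maps to the interval (1.0, 2.0), while a reading of exactly 1.0 maps to the landmark itself.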
The Method: How it's evaluated
The method is evaluated using a simulated robot based on the situation of a baby sitting in a high chair.
Fig. 1: Adorable Fig. 2: Less adorable
The Method: Knowledge Representation & Learning
The goal is for the agent to learn to identify landmark values from its own experience.
The importance of a qualitative distinction is estimated from the reliability of the rules that can be learned, given that distinction.
The qualitative representation is based on QSIM [Kuipers, 1994].
The Method: Knowledge Representation & Learning
A continuous variable x'(t), ranging over some subset of the real number line (-∞, +∞), is represented by a discrete variable x(t) for its magnitude and x''(t) for its direction of change.
In QSIM, magnitude is abstracted to a discrete variable x(t) that ranges over a quantity space Q(x) of qualitative values:
Q(x) = L(x) ∪ I(x)
where L(x) = {x1, ..., xn} are the landmark values and I(x) = {(-∞,x1), (x1,x2), ..., (xn,+∞)} are mutually disjoint open intervals.
The Method: Knowledge Representation & Learning
A quantity space with two landmarks, {x1, x2}, implies five distinct qualitative values:
Q(x) = {(-∞,x1), x1, (x1,x2), x2, (x2,+∞)}
A discrete variable x''(t) for the direction of change of x'(t) has a single intrinsic landmark at 0, so its initial quantity space is
Q(x'') = {(-∞,0), 0, (0,+∞)}
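The construction of Q(x) from its landmarks can be sketched directly: n landmarks interleaved with the n+1 open intervals between them give 2n+1 qualitative values. A minimal sketch, with names of my own choosing:

```python
def quantity_space(landmarks):
    """Build Q(x) = L(x) U I(x): interleave the sorted landmarks with
    the open intervals between them (and the two unbounded intervals)."""
    pts = [float("-inf")] + list(landmarks) + [float("inf")]
    space = []
    for a, b in zip(pts, pts[1:]):
        space.append((a, b))   # open interval (a, b)
        space.append(b)        # the landmark that closes it
    return space[:-1]          # drop the trailing +inf, which is not a landmark
```

With two landmarks this yields the five values listed above; with the single landmark 0 it yields the three-valued quantity space for a direction-of-change variable.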
The Method: Knowledge Representation & Learning: Events
If a is the qualitative value of a discrete variable A, meaning a ∈ Q(A), then the event A_t → a is defined by A(t−1) ≠ a and A(t) = a.
That is, an event takes place when a discrete variable A changes to value a at time t, from some other value.
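This event definition amounts to comparing the discrete state at t−1 with the state at t. A minimal sketch, assuming states are dicts from variable name to qualitative value (my own representation, not the paper's):

```python
def events(prev, curr):
    """Return the set of events A_t -> a: every variable whose
    qualitative value at time t differs from its value at t-1,
    paired with its new value a."""
    return {(name, val) for name, val in curr.items()
            if prev.get(name) != val}
```

Only variables that actually changed generate events, so a variable holding its value produces nothing.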
The Method: Knowledge Representation & Learning: Predictive Rules
This is how temporal contingencies are described.
There are two types of predictive rules:
Causal: one event occurs after another, later in time.
Functional: the events are linked by a function, so they happen at the same time.
The Method: Learning a predictive rule
The agent wants to learn a rule which predicts a certain event h.
It looks at other events and, if one of them, u, leads to h more often than the others, it creates a rule with that event as the antecedent.
It does so by starting from an initial rule with no context.
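The antecedent search can be sketched as counting, for each candidate event u, how often h follows it, and picking the u with the highest hit rate. This is a simplified illustration; the window size and the counting scheme are my assumptions, not the paper's exact statistics.

```python
# Hypothetical sketch: find the event u that best predicts event h.
# history is a list of event-sets, one set per timestep.

def best_antecedent(history, h, window=1):
    """Return the candidate event u maximising the fraction of
    occurrences of u that are followed by h within the window."""
    follows = {}   # u -> (times u occurred, times h followed)
    for t, evs in enumerate(history):
        for u in evs:
            if u == h:
                continue
            n, k = follows.get(u, (0, 0))
            hit = any(h in history[t + d]
                      for d in range(1, window + 1) if t + d < len(history))
            follows[u] = (n + 1, k + (1 if hit else 0))
    if not follows:
        return None
    return max(follows, key=lambda u: follows[u][1] / follows[u][0])
```

For instance, if h reliably follows u1 but never follows u2, the search returns u1 as the antecedent of the new rule.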
The Method: Landmarks
When a new landmark is inserted into Q(x), we replace one interval with two intervals and the dividing landmark; e.g. for a new landmark x* within (xi, xi+1), we get (xi,x*), x*, (x*,xi+1).
Whenever a new landmark is inserted, statistics about the previous state space are thrown out and new ones are built up. This means the reliability of each affected rule must be re-checked.
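The interval-splitting step can be sketched as a pure function over the quantity space (a list interleaving intervals and landmarks, as above); the statistics reset is omitted here. Names are my own.

```python
def insert_landmark(space, new_lm):
    """Replace the open interval containing new_lm with two intervals
    and the dividing landmark: (xi, x*) , x* , (x*, xi+1)."""
    out = []
    for q in space:
        if isinstance(q, tuple) and q[0] < new_lm < q[1]:
            out.extend([(q[0], new_lm), new_lm, (new_lm, q[1])])
        else:
            out.append(q)
    return out
```

Inserting a landmark into a quantity space with n landmarks grows it from 2n+1 to 2n+3 qualitative values.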
The Method: The Learning Process
1. Do 7 times:
   a) Actively explore the world, with a set of candidate goals coming from discrete variables in M, for 1000 timesteps
   b) Learn new causal and functional rules
   c) Learn new landmarks by examining statistics stored in rules and events
2. Gather 3000 more timesteps of experience to solidify the learned rules
3. Update the strata
4. Goto 1
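The outer loop above can be restated as a short Python sketch. The agent methods here are stand-ins for the paper's procedures, not its implementation; the stub class exists only to make the loop runnable.

```python
class RecordingAgent:
    """Minimal stub agent that only counts timesteps of experience."""
    def __init__(self):
        self.timesteps = 0
    def explore(self, n):        self.timesteps += n
    def learn_rules(self):       pass   # stand-in for rule learning
    def learn_landmarks(self):   pass   # stand-in for landmark learning
    def update_strata(self):     pass   # stand-in for updating the strata

def learning_process(agent, outer=1, rounds=7,
                     explore_steps=1000, solidify_steps=3000):
    for _ in range(outer):                # step 4: goto 1 (bounded here)
        for _ in range(rounds):           # step 1: do 7 times
            agent.explore(explore_steps)  # 1a: active exploration, 1000 steps
            agent.learn_rules()           # 1b: causal and functional rules
            agent.learn_landmarks()       # 1c: new distinctions
        agent.explore(solidify_steps)     # step 2: 3000 steps to solidify
        agent.update_strata()             # step 3
```

One outer iteration therefore accumulates 7 × 1000 + 3000 = 10,000 timesteps of experience.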
Evaluation: Experimental Setup
The robot has two motor variables, one for each of its degrees of freedom
A perceptual system creates variables for each of the two tracked objects in the environment: the hand and the block.
There are too many variables to reasonably explain here; each has various constraints.
During learning, if the block is knocked off the tray or is not moved for 300 timesteps, it is put back on the tray in a random position within reach of the agent.
Evaluation: Experimental Results
The algorithm was evaluated using the simple task of moving the block in a specified direction.
It was run five times using passive learning and five times using active learning; each run lasted 120,000 timesteps.
Each active run of the algorithm resulted in an average of 62 predictive rules.
The agent gains proficiency as it learns, until reaching a plateau at approximately 70,000 timesteps in both conditions.
Evaluation: Experimental Results
Active exploration does better: at 40,000 timesteps, active learning already reaches the level that passive learning only achieves at 60,000 timesteps.
The Complexity of Space and Time
The storage required to learn new rules is O(e²) in the number of events e, as is the number of possible rules – but only a small number of rules are actually learned by the agent.
Using marginal attribution, each rule requires O(e) storage, although all pairs of events are stored for simplicity.
Conclusion
At first, the agent could only determine the direction of movement of an object.
Through active exploration of its environment, using rules to learn distinctions and then using those distinctions to learn more rules, the agent progressed from a very simple representation towards a representation aligned with the natural "joints" of its environment.