Post on 15-Dec-2015
Experience-Oriented Artificial Intelligence
Rich Sutton
with special thanks to
Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone, Lawrence Saul, and Harry
Browne
Experience matters!
Not in the obvious sense - that you have to do a thing many times to get good at it
But just in the sense that you do things, that you live a life: that you take actions, receive sensations, that you pass through a trajectory of states over time
This is so obvious that it passes unnoticed. Like air, gravity
Experience is the actions taken and the sensations received, by the agent from its world: a continuing time sequence over the life of the agent
Experience is the minimal ontology
[Diagram: Agent and World, connected by experience]
Experience matters, and must be respected
Experience matters because it is what life is all about.
Experience is the final common path, the only result of all that goes on in the agent and world
• Experience is the most prominent feature of the computational problem we call AI
• It’s the central data structure, revealed and chosen over time
• It has a definite temporal structure: order is important; speed of decision is important
• There is a continuous flow of long duration (a lifetime!), not a sequence of isolated interactions whose order is irrelevant
Experience matters computationally
Experience in AI
Many, many AI systems have no experience. They don't have a life!
• Expert systems
• Knowledge bases like CYC
• Question-answering systems
• Puzzle solvers, or any planner that is designed to receive problem descriptions and emit solutions
Part of the new popularity of agent-oriented AI is that it highlights experience
Other AI systems have experience, but don’t “respect” it
Orienting around experience suggests radical changes in AI
Knowledge of the world should be knowledge of possible experiences
Planning should be about foreseeing and controlling experience
The state of the world should be a summary of past experience, relevant to future experience
Yet we rarely see these basic AI issues discussed in terms of experience
Is it possible or plausible that they could be? Yes! Would it matter if they were? Yes!
I am not claiming that knowledge comes from experience.
(I take no position on the nature/nurture controversy)
But only that knowledge is about experience.
And that, given that, it should be predictive.
Key Points
• Computational theory vs. just making it work: what to compute and why
• Experience is central to AI
• Knowledge should be about experience: the minimal ontology; grounding in experience from the bottom up
• A computational theory of knowledge must support abstraction, composition, decomposition; explicitness, verifiability
• Such modularity is the whole point of knowledge
Outline
• Experience as central to AI
• Predictive knowledge in general
• Generalized Transition Predictions (GTPs, or option models)
• Planning with GTPs (rooms-world example)
• State as predictions (PSRs)
• Prospects and conclusion
The I/O View of the World
We are used to taking an I/O view of the mind, of the agent: it does not matter what it is physically made of; what matters is what it does.
So we should be willing to consider the same I/O view of the world: it does not matter what it is physically made of; what matters is what it does.
The only thing that matters about the world is the experience it generates
Then the only thing to know or say about the world is what experience it generates
Thus, world knowledge must really be about future experience.
In other words, it must be a prediction
AI could be about Predictions
Hypothesis: Knowledge is predictive, about what-leads-to-what, under what ways of behaving.
What will I see if I go around the corner? Objects: what will I see if I turn this over? Active vision: what will I see if I look at my hand? Value functions: what is the most reward I know how to get?
Such knowledge is learnable, chainable, verifiable
Hypothesis: Mental activity is working with predictions: learning them; combining them to produce new predictions (reasoning); converting them to action (planning, reinforcement learning); figuring out which are most useful
Philosophical and Psychological Roots
Like classical British empiricism (1650–1800): knowledge is about experience; experience is central
But not anti-nativist (evolutionary experience)
Emphasizing sequential rather than simultaneous events; replacing association/contiguity with prediction/contingency
Close to Tolman's "Expectancy Theory" (1932–1950): cognitive maps, vicarious trial and error
Psychology struggled to make it a science (1890–1950): introspection; behaviorism, operational definitions; objectivity
Tolman & Honzik, 1930: "Reasoning in Rats"
[Maze diagram: a start box connected to a food box by Paths 1, 2, and 3, with blocks A and B closing off paths]
An old, simple, appealing idea
Mind as prediction engine! Predictions are learnable, combinable. They represent cause and effect, and can be pieced together to yield plans.
Perhaps this old idea is essentially correct. It just needs:
• Development, revitalization in modern forms
• Greater precision, formalization, mathematics
• The computational perspective to make it respectable
• Imagination, determination, patience (not rushing to performance)
Outline
• Experience as central to AI
• Predictive knowledge in general
• Generalized Transition Predictions (GTPs, or option models)
• Planning with GTPs (rooms-world example)
• State as predictions (PSRs)
• Prospects and conclusion
Machinery for General Transition Predictions
In steps of increasing expressiveness: simple state-transition predictions; mixtures of predictions; closed-loop termination; closed-loop action conditioning
The Simplest Transition Predictions
Experience: … s_t a_t s_{t+1} a_{t+1} s_{t+2} a_{t+2} …
1-step prediction (state s_t = A, action a_t = a):
    Pr{ s_{t+1} = B | s_t = A, a_t = a }
k-step prediction (actions given by a policy π):
    Pr{ s_{t+k} = B | s_t = A, a_t … a_{t+k−1} given by π }
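In the tabular case these simplest predictions can be computed exactly. A minimal sketch (the 3-state transition matrix is illustrative, not from the talk): under a fixed policy π, the k-step prediction is an entry of the k-th power of the policy's state-transition matrix.

```python
import numpy as np

# Illustrative 3-state transition matrix under a fixed policy pi:
# P[s, s'] = Pr{s_{t+1} = s' | s_t = s, a_t given by pi}
P = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.8],
    [0.5, 0.0, 0.5],
])

def k_step_prediction(P, s, s_next, k):
    """Pr{s_{t+k} = s_next | s_t = s, a_t ... a_{t+k-1} given by pi}
    is entry [s, s_next] of the k-th matrix power of P."""
    return np.linalg.matrix_power(P, k)[s, s_next]
```

For k = 1 this reduces to the 1-step prediction, i.e. the transition probability itself.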
Mixtures of k-step Predictions: Terminating over a period of time
Where will I be in 10–20 steps? Where will I be in roughly k steps?
[Figure: termination profiles as weightings over the time steps of interest, e.g. from k = 10 to k = 20 steps after now]
Arbitrary termination profiles are possible: short term, medium term, long term.
But sometimes anything like this is too loose and sloppy...
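A termination profile can be realized as a weighting w_k over horizons, giving a mixture of k-step predictions. A sketch under illustrative assumptions (the transition matrix and the uniform 10–20 step profile are mine):

```python
import numpy as np

# Illustrative 3-state transition matrix under a fixed policy
P = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.8],
    [0.5, 0.0, 0.5],
])

def mixture_prediction(P, s, weights):
    """Termination-profile mixture: sum_k w_k * Pr{s_{t+k} = . | s_t = s}.
    weights[k-1] is the probability of terminating after exactly k steps."""
    dist = np.zeros(P.shape[0])
    P_k = np.eye(P.shape[0])
    for w in weights:
        P_k = P_k @ P          # P^k after the k-th iteration
        dist += w * P_k[s]
    return dist

# "Where will I be in 10-20 steps?": uniform profile over k = 10..20
profile = np.zeros(20)
profile[9:] = 1 / 11
where = mixture_prediction(P, 0, profile)
```

If the weights sum to 1, the result is itself a probability distribution over states.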
Closed-loop Termination
Terminate depending on what happens. E.g., instead of "Will I finish this report soon?", which uses a soft termination profile (probably in about an hour), use "Will I be done when my boss gets here?", where only one precise but uncertain time matters.
[Figure: probability vs. time; a broad profile spread around 1 hr vs. a sharp spike when the boss arrives]
Closed-loop termination allows time specification to be both flexible and precise.
Instead of "what will I see at t+100?", can say "what will I see when I open the box?", or "when John arrives?", "when the bus comes?", "when I get to the store?"
Will we elect a black or a woman president first? Where will the tennis ball be when it reaches me? What time will it be when the talk starts?
A substantial increase in expressiveness
Closed-loop Action Conditioning
Each prediction has a closed-loop policy
Policy: States → Actions (or probabilities)
If you follow the policy, then you can predict and verify; otherwise not. If the policy is partly followed, temporal-difference methods can be used.
General Transition Predictions (GTPs)
Closed-loop terminations and policies correspond to arbitrary experiments, and to the results of those experiments
What will I see if I go into the next room? What time will it be when the talk is over? Is there a dollar in the wallet in my pocket? Where is my car parked? Can I throw the ball into the basket? Is this a chair situation? What will I see if I turn this object around?
Anatomy of a General Transition Prediction
1. Predictor (the knowledge): recognizes the conditions, makes the prediction
       p : S → M
2. Experiment (the verifier): policy, termination condition, measurement function(s)
       π : S → A (or 2^A)
       β : S → [0, 1]
       m : {A × S}* → M
The prediction approximates the expected measurement over the experience e = a_t s_{t+1} … a_{t+k−1} s_{t+k} ∈ {A × S}* generated by the experiment:
       p(s) ≈ Σ_e Pr{ e | s_t = s, π, β } m(e) = E_{π,β}{ m(e) }
(S: states, A: actions, M: measurement space)
Actions
Room-to-Room GTPs (General Transition Predictions)
[Figure: the four-rooms gridworld, with each room's 2 hallways (Sutton, Precup, & Singh, 1999)]
• 4 stochastic primitive actions: up, down, left, right; fail 33% of the time
• 8 multi-step GTPs ("Options": Precup 2000; Sutton, Precup, & Singh 1999), each with a policy, termination hallways, and a target (goal) hallway
• Predict: probability of reaching each terminal hallway
• Goal: minimize # steps + values for target and other outcome hallways
Example: Open-the-door
Predictor: uses visual input to estimate the probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door), and the expected cumulative cost (sub-par reward) in trying
Experiment: a policy for walking up to the door, shaping the grasp of the handle, turning, pulling, and opening the door; terminate on successful opening or on various failure conditions; measure the outcome and the cumulative cost
Example: RoboCup Soccer Pass
Predictor: uses perceived positions of ball, opponents, etc. to estimate the probabilities of: successful pass, openness of receiver; interception; reception failure; aborted pass, in trouble; aborted pass, something better to do; loss of time
Experiment: a policy for maneuvering the ball, or around the ball, to set up and pass; a termination strategy for aborting or recognizing completion; measurement of outcome and time
Outline
• Experience as central to AI
• Predictive knowledge in general
• Generalized Transition Predictions (GTPs, or option models)
• Planning with GTPs (rooms-world example)
• State as predictions (PSRs)
• Prospects and conclusion
Combining Predictions
If the mind is about predictions, then thinking is combining predictions to produce new ones.
Predictions obviously compose: if A→B and B→C, then A→C.
GTPs are designed to do this generally. They fit into "Bellman equations" of semi-Markov extensions of dynamic programming, and can also be used for simulation-based planning.
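In the semi-Markov view, each GTP's model has a final part (a distribution over outcome states) and a transient part (e.g., expected duration or cumulative reward), and chaining two models is one matrix product plus one accumulation. A minimal sketch with illustrative numbers (the 5-state example and durations are mine):

```python
import numpy as np

def compose(B1, t1, B2, t2):
    """Compose option models "1 then 2" (Bellman-style chaining).
    B[s, s'']: probability of ending at s'' after running 1 then 2 from s.
    t[s]: expected total transient measurement (e.g., duration) from s."""
    return B1 @ B2, t1 + B1 @ t2

# From A, option 1 ends at B with prob .8, B' with .1, B'' with .1
states = ["A", "B", "B'", "B''", "C"]
B1 = np.array([
    [0.0, 0.8, 0.1, 0.1, 0.0],
    [0,   1,   0,   0,   0],     # other states: unchanged by option 1
    [0,   0,   1,   0,   0],
    [0,   0,   0,   1,   0],
    [0,   0,   0,   0,   1],
], float)
t1 = np.array([4.0, 0, 0, 0, 0])  # option 1 takes 4 expected steps from A

B2 = np.eye(5)
B2[1] = [0, 0, 0, 0, 1]           # option 2 carries B to C
t2 = np.array([0, 3.0, 0, 0, 0])  # in 3 expected steps

B, t = compose(B1, t1, B2, t2)
```

This reproduces the slide's "T1 + .8 T2" pattern: since option 2 only does work from B, the composed expected duration from A is t1 + 0.8 · t2.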
Composing Predictions
[Diagram: A reaches B under (π1, β1) in time T1; B reaches C under (π2, β2) in time T2; composed, A reaches C under "π1β1 then π2β2" in time T1 + T2]
Final measurement (e.g., partial distribution of outcome states)
Transient measurement (e.g., elapsed time, cumulative reward)
Composing Predictions (stochastic outcomes)
[Diagram: from A, π1β1 ends at B with probability .8, at B′ with .1, at B″ with .1, in time T1; composing gives "π1β1 then, if B, π2β2", with expected time T1 + .8 T2]
Room-to-Room GTPs (General Transition Predictions)
[Figure repeated: the four-rooms gridworld, with each room's 2 hallways (Sutton, Precup, & Singh, 1999)]
• 4 stochastic primitive actions: up, down, left, right; fail 33% of the time
• 8 multi-step GTPs ("Options": Precup 2000; Sutton, Precup, & Singh 1999), each with a policy, termination hallways, and a target (goal) hallway
• Predict: probability of reaching each terminal hallway
• Goal: minimize # steps + values for target and other outcome hallways
Planning with GTPs
[Figure: value iteration on the rooms world, with V(goal) = 1; iterations #0, #1, #2 with cell-to-cell primitive actions vs. iterations #0, #1, #2 with room-to-room options (GTPs)]
Learning Path-to-Goal with and without GTPs
[Figure: steps per episode (log scale, 10 to 1000) vs. episodes (1 to 10,000) for three learners: primitives only, GTPs only, and GTPs & primitives]
Rooms Example: Simultaneous Learning of all 8 GTPs from their Goals
[Figure: RMS error in two subgoal state values (upper-hallway and lower-hallway subgoals), learned vs. ideal values, over 100,000 time steps; also the goal prediction]
All 8 hallway GTPs were learned accurately and efficiently while actions were selected totally at random.
Outline
• Experience as central to AI
• Predictive knowledge in general
• Generalized Transition Predictions (GTPs, or option models)
• Planning with GTPs (rooms-world example)
• State as predictions (PSRs)
• Prospects and conclusion
Predictive State Representations
Problem: So far we have assumed states, but the world really just gives information, "observations"
Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments: wallet's contents, John's location, presence of objects…
Prior work:
• Learning deterministic FSAs: Rivest & Schapire, 1987
• Adding stochasticity, an alternative to HMMs: Herbert Jaeger, 1999
• Adding action, an alternative to POMDPs: Littman, Sutton, & Singh, 2001
Summary of Results for Predictive State Rep’ns (PSRs)
• Compact, linear PSRs exist: # tests ≤ # states in the minimal POMDP; # tests ≤ Rivest & Schapire's diversity; # tests can be exponentially fewer than both the diversity and the POMDP
• Compact simulation/update process; construction algorithm from a POMDP
• The learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs
• There are natural EM-like algorithms (current work)
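The "compact simulation/update process" above is a ratio of linear functions of the state: the PSR state is a vector p of predictions for n core tests, and after taking action a and observing o, each prediction p(q_i | h) becomes p(q_i | h a o). A sketch with illustrative parameters (in a real PSR the rows of M_ao and the vector m_ao come from the system's dynamics; the numbers here are made up):

```python
import numpy as np

def psr_update(p, M_ao, m_ao):
    """Linear PSR state update after (action a, observation o):
    p'_i = p(q_i | h a o) = p(a o q_i | h) / p(a o | h)
         = (M_ao @ p)_i / (m_ao . p)
    where row i of M_ao gives the linear weights for the extended
    core test "a o q_i", and m_ao those for the one-step test "a o"."""
    return (M_ao @ p) / (m_ao @ p)

# Illustrative 2-core-test example
p = np.array([0.5, 0.2])
M_ao = np.array([[0.2, 0.1],
                 [0.0, 0.3]])
m_ao = np.array([0.4, 0.5])
p_next = psr_update(p, M_ao, m_ao)
```

The state never grows: every update stays within the n core-test predictions, which is what makes the representation compact.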
Empty Gridworld with Local Sensing
Four actions: up, down, right, left; and four sensory bits
Distance to Wall Predictions
    R: 0   RR: 0   RRR: 1   RRRR: 1   …
    D: 0   DD: 1   DDD: 1   …
Predictive State Representation (PSR)
4 GTPs suffice to identify each state; more are needed to update the PSR; many more are computed from the PSR (the "meaning" of predictions)
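Predictions like "RRR → 1" are outcomes of action-string experiments against the grid's walls. A sketch of such tests in an assumed 5×5 empty grid (the grid size and coordinate convention are mine, not the talk's):

```python
def wall_test(x, y, actions, width=5, height=5):
    """Run a string of Up/Down/Left/Right actions from (x, y) in an empty
    width x height grid; return the final 'bumped into wall' sensory bit."""
    moves = {"R": (1, 0), "L": (-1, 0), "U": (0, -1), "D": (0, 1)}
    bump = 0
    for a in actions:
        dx, dy = moves[a]
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            x, y, bump = nx, ny, 0   # moved freely
        else:
            bump = 1                 # blocked by a wall; stay in place
    return bump

# From the center of a 5x5 grid: two Rights stay clear, the third bumps
assert wall_test(2, 2, "RR") == 0
assert wall_test(2, 2, "RRR") == 1
```

A state's "meaning" is then the table of such test outcomes; two cells with the same table are indistinguishable to the agent.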
Suppose we add one non-uniformity
[Figure: the same gridworld with one non-uniformity added; the distance-to-wall predictions change accordingly]
Now there is much more to know. It would be challenging to program it all correctly.
Other Extension Ideas
• Stochasticity
• Egocentric motion
• Multiple Rooms
• Second agent
• Moveable objects
• Transient goals
It’s easy to make such problems arbitrarily challenging
Outline
• Experience as central to AI
• Predictive knowledge in general
• Generalized Transition Predictions (GTPs, or option models)
• Planning with GTPs (rooms-world example)
• State as predictions (PSRs)
• Prospects and conclusion
How Could These Ideas Proceed?
• Build systems! Build Gridworlds!
• A performance orientation would be problematic
• The “Knowledge Representation” guys may not be impressed
• But others, I think, will be very interested and appreciative - throughout modern probabilistic AI
The Experience Manifesto
Experience is the input and output of AI. An AI must have experience; it must have a life!
Knowledge is about experience. Not about objects, or people, or space, or time… except in so far as these things can be restated in terms of experience.
Knowledge is well expressed as predictions of experience. Predictions of experience have a much clearer meaning than any previously proposed kind of knowledge. Predictions of experience can be autonomously verified. Predictive knowledge is completely in the machine, not in a person!
Planning is about composing predictions to search through the space of attainable experiences.
World-state representations are also predictions of experience.
Key Points
• We should not try to fake intelligence or understanding
• Computational theory vs. just making it work: what to compute and why
• Experience is central to AI
• Knowledge should be about experience: the minimal ontology; grounding in experience from the bottom up
• A computational theory of knowledge must support abstraction, composition, decomposition; explicitness, verifiability
• Such modularity is the whole point of knowledge
Summary of the Predictive View of AI
Knowledge is predictions: about what-leads-to-what, under what ways of behaving. Such knowledge is learnable, chainable.
Mental activity is working with predictions: learning them; combining them to produce new predictions (reasoning); converting them to action (planning, reinforcement learning); figuring out which are most useful.
Predictions are verifiable: a natural way to self-maintain knowledge, which is essential for scaling AI beyond programming.
Most of the machinery is simple but potentially powerful
Is it powerful enough?