
Experience-Oriented Artificial Intelligence

Rich Sutton

with special thanks to

Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone, Lawrence Saul, and Harry Browne

Experience matters!

Not in the obvious sense: that you have to do a thing many times to get good at it

But just in the sense that you do things: that you live a life, that you take actions and receive sensations, that you pass through a trajectory of states over time

This is so obvious that it passes unnoticed, like air or gravity

Experience is
the actions taken by the agent and the sensations it receives from its world,
as a continuing time sequence over the life of the agent

Experience is the minimal ontology

[Figure: Agent and World, exchanging experience]

Experience matters, and must be respected

Experience matters because it is what life is all about.

Experience is the final common path, the only result of all that goes on in the agent and world

• Experience is the most prominent feature of the computational problem we call AI

• It’s the central data structure, revealed and chosen over time

• It has a definite temporal structure: order is important; speed of decision is important

• There is a continuous flow of long duration (a lifetime!), not a sequence of isolated interactions whose order is irrelevant

Experience matters computationally

Experience in AI

Many, many AI systems have no experience. They don't have a life!
• Expert systems
• Knowledge bases like CYC
• Question-answering systems
• Puzzle solvers, or any planner that is designed to receive problem descriptions and emit solutions

Part of the new popularity of agent-oriented AI is that it highlights experience

Other AI systems have experience, but don’t “respect” it

Orienting around experience suggests radical changes in AI

Knowledge of the world should be knowledge of possible experiences

Planning should be about foreseeing and controlling experience

The state of the world should be a summary of past experience, relevant to future experience

Yet we rarely see these basic AI issues discussed in terms of experience

Is it possible or plausible that they could be? Yes!
Would it matter if they were? Yes!

I am not claiming that knowledge comes from experience.

(I take no position on the nature/nurture controversy)

But only that knowledge is about experience.

And that, given that, it should be predictive.

Key Points

• Computational theory vs. just making it work
  What to compute and why
• Experience is central to AI
• Knowledge should be about experience
  The minimal ontology
  Grounding in experience from the bottom up
• A computational theory of knowledge must support
  Abstraction, composition, decomposition
  Explicitness, verifiability
• Such modularity is the whole point of knowledge

Outline

• Experience as central to AI

• Predictive knowledge in general

• General Transition Predictions (GTPs, or option models)

• Planning with GTPs (rooms-world example)

• State as predictions (PSRs)

• Prospects and conclusion

The I/O View of the World

We are used to taking an I/O view of the mind, of the agent:
It does not matter what it is physically made of. What matters is what it does.

So we should be willing to consider the same I/O view of the world:
It does not matter what it is physically made of. What matters is what it does.

The only thing that matters about the world is the experience it generates

Then the only thing to know or say about the world is what experience it generates

Thus, world knowledge must really be about future experience.

In other words, it must be a prediction

AI could be about predictions

Hypothesis: Knowledge is predictive
About what-leads-to-what, under what ways of behaving:
• What will I see if I go around the corner?
• Objects: What will I see if I turn this over?
• Active vision: What will I see if I look at my hand?
• Value functions: What is the most reward I know how to get?

Such knowledge is learnable, chainable, verifiable

Hypothesis: Mental activity is working with predictions
• Learning them
• Combining them to produce new predictions (reasoning)
• Converting them to action (planning, reinforcement learning)
• Figuring out which are most useful

Philosophical and Psychological Roots

• Like classical British empiricism (1650–1800)
  Knowledge is about experience
  Experience is central
• But not anti-nativist (evolutionary experience)
• Emphasizing sequential rather than simultaneous events
  Replace association/contiguity with prediction/contingency
• Close to Tolman’s “Expectancy Theory” (1932–1950)
  Cognitive maps, vicarious trial and error
• Psychology struggled to make it a science (1890–1950)
  Introspection
  Behaviorism, operational definitions
  Objectivity

Tolman & Honzik, 1930: “Reasoning in Rats”

[Figure: maze with a start box and a food box joined by Paths 1, 2, and 3, with Blocks A and B]

An old, simple, appealing idea

Mind as prediction engine!
• Predictions are learnable, combinable
• They represent cause and effect, and can be pieced together to yield plans

Perhaps this old idea is essentially correct. It just needs:
• Development, revitalization in modern forms
• Greater precision, formalization, mathematics
• The computational perspective to make it respectable
• Imagination, determination, patience
  – Not rushing to performance

Outline


• Predictive knowledge in general





In steps of increasing expressiveness:
• Simple state-transition predictions
• Mixtures of predictions
• Closed-loop termination
• Closed-loop action conditioning

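Machinery for General Transition Predictions

Experience: $\dots\ s_t\ a_t\ s_{t+1}\ a_{t+1}\ s_{t+2}\ a_{t+2}\ \dots$

1-step prediction (from state A, taking action a, to state B):
$\Pr\{s_{t+1} = B \mid s_t = A,\ a_t = a\}$

k-step prediction (from state A to state B, with actions given by policy π):
$\Pr\{s_{t+k} = B \mid s_t = A,\ a_t, \dots, a_{t+k-1} \text{ given by } \pi\}$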

The Simplest Transition Predictions
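To make the two simplest predictions concrete, here is a minimal sketch under stated assumptions. It assumes a small, fully observed world with known transition probabilities. The three states, two actions, the policy `PI`, and every number are hypothetical. The sketch shows that once π is fixed, the k-step prediction is the k-th power of the policy-averaged one-step transition matrix.

```python
# Hypothetical world: states A, B, C (indices 0, 1, 2) and actions "a", "b".
STATES = ["A", "B", "C"]
ACTIONS = ["a", "b"]

# P[action][s][s2] = Pr{s_{t+1} = s2 | s_t = s, a_t = action}  (made-up numbers)
P = {
    "a": [[0.1, 0.9, 0.0],
          [0.0, 0.2, 0.8],
          [0.0, 0.0, 1.0]],
    "b": [[0.9, 0.1, 0.0],
          [0.5, 0.5, 0.0],
          [0.0, 0.0, 1.0]],
}

# PI[s][action] = probability that the (hypothetical) policy pi picks `action` in s
PI = [{"a": 1.0, "b": 0.0},
      {"a": 0.5, "b": 0.5},
      {"a": 1.0, "b": 0.0}]

def one_step(s, a, s2):
    """1-step prediction: Pr{s_{t+1} = s2 | s_t = s, a_t = a}."""
    return P[a][s][s2]

def policy_matrix():
    """One-step transition matrix when every action is drawn from pi."""
    n = len(STATES)
    return [[sum(PI[s][a] * P[a][s][s2] for a in ACTIONS) for s2 in range(n)]
            for s in range(n)]

def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def k_step(k):
    """k-step predictions: entry [s][s2] is Pr{s_{t+k} = s2 | s_t = s, actions from pi}."""
    n = len(STATES)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    step = policy_matrix()
    for _ in range(k):
        M = matmul(M, step)
    return M
```

For instance, `k_step(2)[0][2]` is the probability of being in C two steps after starting in A. With these numbers it comes out to about 0.36 (0.9 × 0.4).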

Mixtures of k-step Predictions: Terminating over a period of time

Where will I be in 10–20 steps?

Where will I be in roughly k steps?

[Figure: termination profiles over time from now: a flat window from k = 10 to k = 20 steps, and a soft profile around k steps]

Arbitrary termination profiles are possible, e.g., short term, medium term, long term, over the time steps of interest

But sometimes anything like this is too loose and sloppy...
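A mixture of k-step predictions weights each k-step prediction by the probability that the prediction terminates after exactly k steps. The sketch below assumes two things. First, the termination profile is given as nonnegative weights over k that sum to 1. Second, the policy's one-step matrix P_π is known. Under those assumptions the mixture is Σ_k w_k P_π^k. The two-state chain and the helper names `mixture_prediction` and `window_profile` are hypothetical. The flat 10–20-step window is one such profile.

```python
def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def mixture_prediction(P_pi, weights):
    """Sum over k of weights[k] * (P_pi)^k, where weights[k] is the probability
    of terminating after exactly k steps (the termination profile)."""
    n = len(P_pi)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (P_pi)^0
    result = [[0.0] * n for _ in range(n)]
    for w in weights:
        if w:
            result = [[result[i][j] + w * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, P_pi)
    return result

def window_profile(lo, hi):
    """Flat profile for 'Where will I be in lo-hi steps?': equal weight on each k."""
    weights = [0.0] * (hi + 1)
    for k in range(lo, hi + 1):
        weights[k] = 1.0 / (hi - lo + 1)
    return weights

# Hypothetical two-state chain under a fixed policy: from A, stay with
# probability 0.5, otherwise move to B, which is absorbing.
P_PI = [[0.5, 0.5],
        [0.0, 1.0]]

ten_to_twenty = mixture_prediction(P_PI, window_profile(10, 20))
```

Here `ten_to_twenty[0][0]` averages the chances of still being in A after 10, 11, …, 20 steps. Any other profile, sharp or soft, is just a different weight list.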

Closed-loop Termination

Terminate depending on what happens. E.g., instead of “Will I finish this report soon?”, which uses a soft termination profile,

use “Will I be done when my boss gets here?”

[Figure: left, the soft profile, a probability over time for "probably in about an hour" (1 hr); right, a termination probability that is 0 until the boss arrives and 1 then: only one precise but uncertain time matters]

Closed-loop termination allows time specification to be both flexible and precise

Instead of “what will I see at t+100?”

Can say “what will I see when I open the box?”

Will we elect a black or a woman president first?
Where will the tennis ball be when it reaches me?
What time will it be when the talk starts?

or “when John arrives?” “when the bus comes?” “when I get to the store?”

A substantial increase in expressiveness
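The boss example can be sketched as a Monte Carlo experiment with a closed-loop termination condition. Here β(s) = 1 on the step the boss arrives and 0 before that. The measurement is whether the report is done at that moment. The toy model is an assumption: within each step, I may first finish the report and then the boss may arrive. The per-step probabilities and the helper names are hypothetical. The closed form in `exact` holds only for this toy model.

```python
import random

def run_until_boss(rng, p_finish, p_boss, max_steps=10_000):
    """One experiment. Each step: I may finish the report, then the boss may arrive.
    Closed-loop termination: beta = 1 on the step the boss arrives, 0 before.
    Measurement: 1.0 if the report is done at termination, else 0.0."""
    done = False
    for _ in range(max_steps):
        if not done and rng.random() < p_finish:
            done = True
        if rng.random() < p_boss:
            return 1.0 if done else 0.0
    return 1.0 if done else 0.0  # cut-off reached (vanishingly unlikely here)

def estimate(p_finish, p_boss, n=10_000, seed=0):
    """Average measurement over n independent runs of the experiment."""
    rng = random.Random(seed)
    return sum(run_until_boss(rng, p_finish, p_boss) for _ in range(n)) / n

def exact(p_finish, p_boss):
    """Closed form for this toy model: sum over the boss's (geometric) arrival step."""
    f, b = p_finish, p_boss
    return 1 - b * (1 - f) / (1 - (1 - b) * (1 - f))

prob_done = estimate(p_finish=0.02, p_boss=1 / 60)
```

With these made-up rates, both numbers land near 0.55. The question refers to one precise but uncertain time, rather than to a profile over all times.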

Closed-loop Action Conditioning

Each prediction has a closed-loop policy

Policy: States → Actions (or action probabilities)

If you follow the policy, then you can predict and verify
Otherwise not
If the policy is only partly followed, temporal-difference methods can be used

General Transition Predictions (GTPs)

Closed-loop terminations and policies

Correspond to arbitrary experiments and the results of those experiments

What will I see if I go into the next room?
What time will it be when the talk is over?
Is there a dollar in the wallet in my pocket?
Where is my car parked?
Can I throw the ball into the basket?
Is this a chair situation?
What will I see if I turn this object around?

Anatomy of a General Transition Prediction

1. Predictor (knowledge): recognizes the conditions, makes the prediction

2. Experiment (verifier): policy, termination condition, measurement function(s)

$p: S \to M$
$\pi: S \to A$ or $2^A$
$\beta: S \to [0,1]$
$m: (A \times S)^* \to M$

$p(s) \approx \sum_{e \in (A \times S)^*} \Pr\{e \mid s_t = s, \pi, \beta\}\, m(e) = E_{\pi,\beta}\{m(e)\}$, where $e = a_t s_{t+1} \cdots a_{t+k-1} s_{t+k}$

[Figure: States, Actions, and the Measurement space]
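The anatomy above maps onto a small data structure. Below is one hypothetical way to package a GTP in Python. The predictor p is the knowledge. The policy π, termination β, and measurement m define an experiment that can be run to verify p by Monte Carlo. Several parts are assumptions for illustration: the corridor world, its 0.55 success probability, and the closed-form predictor. The predictor is a gambler's-ruin formula standing in for what would normally be learned.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = int
Trajectory = List[Tuple[Action, State]]  # e = a_t s_{t+1} ... a_{t+k-1} s_{t+k}

@dataclass
class GTP:
    predictor: Callable[[State], float]               # p: S -> M (the knowledge)
    policy: Callable[[State, random.Random], Action]  # pi: S -> A
    termination: Callable[[State], float]             # beta: S -> [0, 1]
    measurement: Callable[[Trajectory], float]        # m: (A x S)* -> M

def run_experiment(gtp, step, s, rng, max_steps=10_000):
    """Follow pi from s until beta says stop; return m(e) for the trajectory e."""
    e: Trajectory = []
    for _ in range(max_steps):
        a = gtp.policy(s, rng)
        s = step(s, a, rng)
        e.append((a, s))
        if rng.random() < gtp.termination(s):
            break
    return gtp.measurement(e)

def verify(gtp, step, s, n=5_000, seed=0):
    """Compare p(s) with a Monte Carlo estimate of E_{pi,beta}{m(e)}."""
    rng = random.Random(seed)
    estimate = sum(run_experiment(gtp, step, s, rng) for _ in range(n)) / n
    return gtp.predictor(s), estimate

# Hypothetical corridor: cells 0..10, hallways at both ends.
N, P_SUCCESS = 10, 0.55

def corridor_step(s, a, rng):
    """Move in the chosen direction with probability P_SUCCESS, else the other way."""
    return s + (a if rng.random() < P_SUCCESS else -a)

def reach_right_probability(s):
    """Gambler's-ruin probability of reaching cell N before cell 0 from s."""
    r = (1 - P_SUCCESS) / P_SUCCESS
    return (1 - r ** s) / (1 - r ** N)

go_right = GTP(
    predictor=reach_right_probability,
    policy=lambda s, rng: +1,                            # always try to move right
    termination=lambda s: 1.0 if s in (0, N) else 0.0,   # stop at either hallway
    measurement=lambda e: 1.0 if e and e[-1][1] == N else 0.0,
)

predicted, observed = verify(go_right, corridor_step, 5)
```

The design separates knowledge from verifier. Any predictor, learned or hand-written, can be checked against the same experiment without changing it.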

Room-to-Room GTPs (General Transition Predictions)

4 stochastic primitive actions: up, down, right, left (fail 33% of the time)

8 multi-step GTPs (to each room's 2 hallways)

“Options”: Precup 2000; Sutton, Precup, & Singh 1999

Predict: probability of reaching each terminal hallway
Goal: minimize # steps + values for target and other outcome hallway

[Figure: rooms gridworld showing an option's policy, its termination hallways, and its target (goal) hallway]

Example: Open-the-door

Predictor: uses visual input to estimate
  • probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door)
  • expected cumulative cost (sub-par reward) in trying

Experiment:
  • Policy for walking up to the door, shaping the grasp of the handle, turning, pulling, and opening the door
  • Terminate on successful opening or various failure conditions
  • Measure outcome and cumulative cost

Example: RoboCup Soccer Pass

Predictor: uses perceived positions of ball, opponents, etc. to estimate probabilities of
  • Successful pass, openness of receiver
  • Interception
  • Reception failure
  • Aborted pass, in trouble
  • Aborted pass, something better to do
  • Loss of time

Experiment:
  • Policy for maneuvering the ball, or around the ball, to set up and pass
  • Termination strategy for aborting, recognizing completion
  • Measurement of outcome, time

Outline

• Planning with GTPs (rooms-world example)

Combining Predictions

If the mind is about predictions, then thinking is combining predictions to produce new ones

Predictions obviously compose: if A → B and B → C, then A → C

GTPs are designed to do this generally:
• They fit into the “Bellman equations” of semi-Markov extensions of dynamic programming
• They can also be used for simulation-based planning

Composing Predictions

A → B: policy π1, termination β1, transient T1
B → C: policy π2, termination β2, transient T2
A → C: π1, β1 then π2, β2, transient T1 + T2

Final measurement (e.g., partial distribution of outcome states)

Transient measurement (e.g., elapsed time, cumulative reward)
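One way to see the two kinds of measurement compose is sketched below, under an assumed convention in the spirit of the option models of Sutton, Precup, & Singh 1999, with no discounting. The final measurement is an outcome distribution, and it composes by matrix multiplication. The transient measurement is expected elapsed time, and it composes by adding the second model's expected time, weighted by where the first one ends. The three-state example and its numbers are hypothetical.

```python
def compose(P1, T1, P2, T2):
    """Model of 'run GTP 1, then GTP 2', built from the two component models.

    P[s][x]: probability that the GTP started in s ends in x (final measurement).
    T[s]:    expected elapsed time of the GTP started in s (transient measurement).
    """
    n = len(P1)
    P = [[sum(P1[s][x] * P2[x][y] for x in range(n)) for y in range(n)] for s in range(n)]
    T = [T1[s] + sum(P1[s][x] * T2[x] for x in range(n)) for s in range(n)]
    return P, T

A, B, C = 0, 1, 2
# Hypothetical models: GTP 1 takes A to B in 5 steps; GTP 2 takes B to C in 7 steps.
# Where a GTP does nothing, its row is an identity row with zero time.
P1 = [[0, 1, 0], [0, 1, 0], [0, 0, 1]]
T1 = [5.0, 0.0, 0.0]
P2 = [[1, 0, 0], [0, 0, 1], [0, 0, 1]]
T2 = [0.0, 7.0, 0.0]

P, T = compose(P1, T1, P2, T2)
```

Here `P[A][C]` is 1 and `T[A]` is 12.0, which is T1 + T2 as on the slide.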

Composing Predictions

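A → B: policy π1, termination β1, transient T1; outcomes B (.8), B′ (.1), B″ (.1)
B → C: policy π2, termination β2, transient T2
A → C: π1, β1, then if in B, π2, β2; transient T1 + .8 T2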
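The second composition, “then if B”, runs the second GTP only when the first one ends in B. The sketch below uses the same assumed conventions as a plain sequential composition: outcome distributions, and expected elapsed time with no discounting. Outcomes B′ and B″ simply stop where they are. So only the .8 of probability mass that reaches B picks up the second GTP's time, giving T1 + .8 T2. The values T1 = 10 and T2 = 20 and the helper name `then_if` are made up.

```python
def then_if(P1, T1, P2, T2, continue_in):
    """Run GTP 1; if it ends in a state in `continue_in`, run GTP 2, else stop there.
    P[s][y]: outcome distribution of the combined GTP; T[s]: its expected time."""
    n = len(P1)
    P = [[0.0] * n for _ in range(n)]
    T = [0.0] * n
    for s in range(n):
        T[s] = T1[s]
        for x in range(n):
            if P1[s][x] == 0:
                continue
            if x in continue_in:            # continue with GTP 2 from x
                T[s] += P1[s][x] * T2[x]
                for y in range(n):
                    P[s][y] += P1[s][x] * P2[x][y]
            else:                           # stop at x
                P[s][x] += P1[s][x]
    return P, T

A, B, B1, B2, C = range(5)   # B1 and B2 stand for B' and B''
# Only the models needed from A are filled in; other rows are left at zero.
P1 = [[0.0] * 5 for _ in range(5)]
P1[A][B], P1[A][B1], P1[A][B2] = 0.8, 0.1, 0.1
T1 = [10.0, 0.0, 0.0, 0.0, 0.0]
P2 = [[0.0] * 5 for _ in range(5)]
P2[B][C] = 1.0
T2 = [0.0, 20.0, 0.0, 0.0, 0.0]

P, T = then_if(P1, T1, P2, T2, continue_in={B})
```

From A, the combined GTP ends in C with probability .8 and in B′ or B″ with .1 each. Its expected time `T[A]` is 10 + .8 × 20 = 26.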


Planning with GTPs

[Figure: value iteration at iterations #0, #1, and #2, once with cell-to-cell primitive actions and once with room-to-room options (GTPs); V(goal) = 1 in both]
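The effect shown in the planning figure can be imitated in one dimension. The sketch below runs synchronous value iteration on a hypothetical 12-cell corridor, with V(goal) = 1, discount γ = 0.9, and zero reward elsewhere. It runs once with one-step primitive moves. It runs again with "go to the next hallway" option models, whose discounted transition weight is γ^d for a d-step trip. Cells 4 and 8 play the role of hallways. The corridor, the deterministic moves, and γ are assumptions, not the rooms world of the slide.

```python
GAMMA = 0.9                 # assumed discount rate
N, GOAL = 12, 11            # hypothetical corridor cells 0..11; goal at the right end
EXITS = (4, 8, GOAL)        # hypothetical hallway cells; each "room" exits to the right

def primitive_models(s):
    """One-step moves left and right: (discounted weight, next state) pairs."""
    return [(GAMMA, max(s - 1, 0)), (GAMMA, min(s + 1, N - 1))]

def option_models(s):
    """'Go to the next hallway on the right': reached after d steps, weight GAMMA**d."""
    target = next(h for h in EXITS if h > s)
    return [(GAMMA ** (target - s), target)]

def value_iteration(models, sweeps):
    """Synchronous (semi-Markov) value iteration: V(s) = max over models of w * V(next)."""
    V = [0.0] * N
    V[GOAL] = 1.0
    for _ in range(sweeps):
        V = [1.0 if s == GOAL else max(w * V[s2] for w, s2 in models(s))
             for s in range(N)]
    return V

with_primitives = value_iteration(primitive_models, sweeps=3)
with_options = value_iteration(option_models, sweeps=3)
```

After three sweeps, `with_primitives[0]` is still 0, because value has spread only three cells back from the goal. `with_options[0]` already equals 0.9**11, which is the value planning with primitives reaches only after eleven sweeps.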

Learning Path-to-Goal with and without GTPs

[Figure: steps per episode (10 to 1000) vs. episodes (1 to 10,000), for primitives, GTPs, and GTPs & primitives]

Rooms Example: Simultaneous Learning of All 8 GTPs from Their Goals

[Figure: RMS error in goal prediction vs. time steps (0 to 100,000); learned vs. ideal values for two subgoal states, the upper and lower hallway subgoals, vs. time steps (0 to 100,000)]

All 8 hallway GTPs were learned accurately and efficiently while actions were selected totally at random
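How can a GTP be learned while actions are chosen at random? One technique that fits this setting, for GTPs with deterministic policies, is intra-option learning (Sutton, Precup, & Singh 1999). Whenever the action actually taken agrees with the GTP's policy, the observed transition is a legitimate sample of the GTP's own experiment, so a temporal-difference update can be made. Below is a minimal sketch for a single "go right" GTP. Several parts are assumptions: the 1-D corridor, the step size, and the failure model, in which a move fails 1/3 of the time and goes the other way. That failure model is a 1-D stand-in for the slide's "fail 33% of the time".

```python
import random

N = 6                       # hypothetical corridor: cells 0..5
TARGET, OTHER = N - 1, 0    # the GTP's target hallway and its other outcome hallway
RIGHT, LEFT = +1, -1

def step(s, a, rng):
    """Primitive move; with probability 1/3 it fails and goes the other way (assumed)."""
    return s + (a if rng.random() < 2 / 3 else -a)

def learn_go_right_model(steps=400_000, alpha=0.005, seed=0):
    """Intra-option TD learning of p[s] ~ Pr{'go right' GTP ends in TARGET | start in s},
    while the behaviour policy picks LEFT or RIGHT uniformly at random."""
    rng = random.Random(seed)
    p = [0.0] * N
    s = rng.randint(1, N - 2)
    for _ in range(steps):
        a = rng.choice((RIGHT, LEFT))          # totally random behaviour
        s2 = step(s, a, rng)
        terminal = s2 in (TARGET, OTHER)       # beta(s2) = 1 at either hallway
        if a == RIGHT:                         # agrees with the GTP's policy: learn
            target = (1.0 if s2 == TARGET else 0.0) if terminal else p[s2]
            p[s] += alpha * (target - p[s])
        s = rng.randint(1, N - 2) if terminal else s2   # restart after a hallway
    return p

def exact(s):
    """Gambler's-ruin probability of reaching TARGET before OTHER under 'go right'."""
    r = (1 / 3) / (2 / 3)
    return (1 - r ** s) / (1 - r ** (N - 1))

learned = learn_go_right_model()
```

Every GTP whose policy agrees with the taken action can be updated from the same transition. That is what makes learning all of them at once, from a single random stream of experience, plausible.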

Outline

• State as predictions (PSRs)

Predictive State Representations

Problem: So far we have assumed states, but the world really just gives information: “observations”

Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments
Wallet’s contents, John’s location, presence of objects…

Prior work:
• Learning deterministic FSAs: Rivest & Schapire, 1987
• Adding stochasticity, an alternative to HMMs: Herbert Jaeger, 1999
• Adding action, an alternative to POMDPs: Littman, Sutton, & Singh 2001

Summary of Results for Predictive State Rep’ns (PSRs)

• Compact, linear PSRs exist
  # tests ≤ # states in minimal POMDP
  # tests ≤ Rivest & Schapire’s diversity
  # tests can be exponentially fewer than diversity and POMDP
• Compact simulation/update process
• Construction algorithm from POMDP
• Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs
• There are natural EM-like algorithms (current work)
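For reference, here is a summary in our own notation of the linear PSR prediction and update equations of Littman, Sutton, & Singh (2001). Q = {q_1, …, q_n} is the set of core tests, h a history, and t any test. Each m is a weight vector associated with a test, and p(t | h) is the probability of the test's observations if its actions are taken after h.

```latex
% Any test's prediction is linear in the core-test predictions:
p(t \mid h) = p(Q \mid h)^{\top} m_t

% Update of core test q_i after taking action a and observing o:
p(q_i \mid h a o) = \frac{p(a o q_i \mid h)}{p(a o \mid h)}
                  = \frac{p(Q \mid h)^{\top} m_{a o q_i}}{p(Q \mid h)^{\top} m_{a o}}
```

Updating the state after each action/observation pair therefore takes only dot products with precomputed weight vectors.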

Empty Gridworld with Local Sensing

Four actions: Up, Down, Right, Left
And four sensory bits

Distance to Wall Predictions

R: 0   RR: 0   RRR: 1   RRRR: 1   . . .

D: 0   DD: 1   DDD: 1   . . .

Predictive State Representation (PSR)

4 GTPs suffice to identify each state
More are needed to update the PSR
Many more are computed from the PSR

[Figure: the “meaning” of the predictions in the gridworld]

Suppose we add one non-uniformity

R: 0   RR: 0   RRR: 1   RRRR: 1   . . .

D: 0   DD: 1   DDD: 1   . . .

Now there is much more to know
It would be challenging to program it all correctly

Other Extension Ideas

• Stochasticity

• Egocentric motion

• Multiple Rooms

• Second agent

• Moveable objects

• Transient goals

It’s easy to make such problems arbitrarily challenging

Outline

• Prospects and conclusion

How Could These Ideas Proceed?

• Build systems! Build Gridworlds!

• A performance orientation would be problematic

• The “Knowledge Representation” guys may not be impressed

• But others, I think, will be very interested and appreciative, throughout modern probabilistic AI

The Experience Manifesto

Experience is the input and output of AI
An AI must have experience; it must have a life!

Knowledge is about experience
Not about objects, or people, or space, or time… except insofar as these things can be restated in terms of experience.

Knowledge is well expressed as predictions of experience
• Predictions of experience have a much clearer meaning than any previously proposed kind of knowledge
• Predictions of experience can be autonomously verified
• Predictive knowledge is completely in the machine, not in a person!

Planning is about composing predictions to search through the space of attainable experiences

World-state rep’ns are also predictions of experience

Key Points

• We should not try to fake intelligence or understanding
• Computational theory vs. just making it work
  What to compute and why
• Experience is central to AI
• Knowledge should be about experience
  The minimal ontology
  Grounding in experience from the bottom up
• A computational theory of knowledge must support
  Abstraction, composition, decomposition
  Explicitness, verifiability
• Such modularity is the whole point of knowledge

Summary of the Predictive View of AI

Knowledge is predictions
About what-leads-to-what, under what ways of behaving
Such knowledge is learnable, chainable

Mental activity is working with predictions
• Learning them
• Combining them to produce new predictions (reasoning)
• Converting them to action (planning, reinforcement learning)
• Figuring out which are most useful

Predictions are verifiable
A natural way to self-maintain knowledge, which is essential for scaling AI beyond programming

Most of the machinery is simple but potentially powerful

Is it powerful enough?
