Computed Prediction
So far, so good. What now?
Pier Luca Lanzi
Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA
RL: What is the problem?
[Diagram: the agent performs action a_t in state s_t; the environment returns reward r_{t+1} and next state s_{t+1}]
Compute a value function Q(s_t, a_t) mapping state-action pairs into expected future payoffs.
How much future reward is received when action a_t is performed in state s_t? What is the expected payoff for s_t and a_t?
GOAL: maximize the amount of reward received in the long run.
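The learning target above can be sketched with the standard tabular Q-learning update; the learning rate, discount factor, and the tiny two-state example below are illustrative, not taken from the slides.

```python
# Sketch of the tabular Q-learning update for Q(s_t, a_t).
# Q[s][a] is nudged toward the observed reward plus the discounted
# value of the best action in the next state.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q[s][a] += alpha * (target - Q[s][a])."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

# Toy two-state example: moving from state 0 with action "go" earns reward 1.
Q = {0: {"go": 0.0, "stay": 0.0}, 1: {"go": 0.0, "stay": 0.0}}
q_update(Q, s=0, a="go", r=1.0, s_next=1)
print(Q[0]["go"])  # 0.1 after one update
```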
Example: The Mountain Car
Task: drive an underpowered car up a steep mountain road (the GOAL is at the top).
a_t ∈ {acc. left, acc. right, no acc.}
s_t = (position, velocity)
r_t = 0 when the goal is reached, -1 otherwise.
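A minimal sketch of these dynamics, using the standard mountain-car constants from the reinforcement learning literature (the start state and bounds below are the conventional ones, not values stated in the slides):

```python
import math

# Sketch of the classic mountain-car transition: actions are
# -1 = acc. left, 0 = no acc., 1 = acc. right.
def step(position, velocity, action):
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))        # clamp velocity
    position = max(-1.2, min(0.6, position + velocity))
    reward = 0.0 if position >= 0.5 else -1.0         # r_t = 0 at the goal, -1 otherwise
    return position, velocity, reward

p, v, r = step(-0.5, 0.0, 1)   # start in the valley, accelerate right
print(p, v, r)
```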
Value Function Q(s_t, a_t)
What are the issues?
Exact representation is infeasible, so approximation is mandatory. The function is unknown: it is learnt online from experience.
Learning the unknown payoff function while also trying to approximate it: the approximator works on intermediate estimates, but it must also provide information to guide the learning.
Convergence is not guaranteed.
Classifiers
Learning Classifier Systems
Solve reinforcement learning problems
Represent the payoff function Q(s_t, a_t) as a population of rules, the classifiers.
Classifiers are evolved while Q(s_t, a_t) is learnt online.
[Figure: payoff surface for action A]
What is a classifier?
IF condition C is true for input s THEN the payoff of action a is p
[Figure: payoff p predicted over the interval [l, u]; condition C(s) = l ≤ s ≤ u]
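A toy sketch of such an interval-based classifier; the class name and field names are illustrative, not part of any LCS implementation referenced in the slides.

```python
from dataclasses import dataclass

# Hypothetical interval-based classifier:
# IF l <= s <= u THEN the payoff of `action` is `p`.
@dataclass
class Classifier:
    l: float       # lower bound of the condition
    u: float       # upper bound of the condition
    action: int
    p: float       # payoff prediction

    def matches(self, s: float) -> bool:
        """The condition C(s) = l <= s <= u."""
        return self.l <= s <= self.u

cl = Classifier(l=0.2, u=0.8, action=1, p=500.0)
print(cl.matches(0.5), cl.matches(0.9))  # True False
```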
General conditions covering large portions of the problem space.
Accurate approximations.
Generalization depends on how well conditions can partition the problem space.
What is the best representation for the problem?
Several representations have been developed to improve generalization.
[Figure: payoff landscape of action A]
What is computed prediction?
Replace the prediction p by a parametrized function p(x, w)
[Figure: linear prediction p(x, w) = w_0 + w_1 x over the interval [l, u]; condition C(s) = l ≤ s ≤ u]
IF condition C is true for input s THEN the value of action a is p(x, w)
Which Representation?
Which type of approximation?
Computed Prediction: Linear Approximation
Each classifier has a vector of parameters w. Classifier prediction is computed as
p(x, w) = w · x = w_0 x_0 + Σ_i w_i x_i,
where x is the input augmented with a constant term x_0.
Classifier weights are updated using the Widrow-Hoff update,
w_i ← w_i + (η / |x|²) (P − p(x, w)) x_i,
where η is the learning rate and P is the target payoff.
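A minimal sketch of linear computed prediction with the normalized Widrow-Hoff update described above; the training loop, learning rate, and target values are illustrative.

```python
import numpy as np

# Sketch of linear computed prediction with Widrow-Hoff (normalized
# delta rule) updates. x is the input augmented with a constant x0 term.
def predict(w, s, x0=1.0):
    x = np.concatenate(([x0], np.atleast_1d(s)))
    return float(w @ x)

def widrow_hoff(w, s, target, eta=0.2, x0=1.0):
    x = np.concatenate(([x0], np.atleast_1d(s)))
    error = target - w @ x
    return w + (eta / (x @ x)) * error * x   # w_i += (eta/|x|^2) * error * x_i

w = np.zeros(2)                  # w0 (offset) and w1 (slope)
for _ in range(200):             # repeatedly fit target payoff 10 at s = 0.5
    w = widrow_hoff(w, 0.5, 10.0)
print(round(predict(w, 0.5), 2))  # converges to 10.0
```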
Summary
Typical RL approach: What is the best approximator?
GOAL: learn the payoff function.
Typical LCS approach: What is the best representation for the problem?
What are the differences?
[Figure: chart plotting representations against approximators. Representations: intervals, messy coding, symbols, convex hulls, ellipsoids, 0/1/# strings. Approximators: gradient descent, radial basis functions, neural networks, tile coding, computed prediction.]
Boolean representation with sigmoid prediction
Boolean representation with neural prediction (O'Hara & Bull, 2004)
Real intervals with neural prediction
Convex hulls with linear prediction
To represent or to approximate?
Powerful representations allow the solution of difficult problems with basic approximators.
Powerful approximators may make the choice of the representation less critical.
Experiment
Consider a very powerful approximator that we know can solve a certain RL problem.
Use it to compute classifier prediction in an LCS and apply the LCS to solve the same problem.
Does genetic search still provide an advantage?
Computed prediction with Tile Coding
A powerful approximator developed in the reinforcement learning community.
Tile coding can solve the mountain car problem given an adequate parameter setting.
Classifier prediction is computed using tile coding; each classifier's tile coding has a different parameter setting.
When tile coding is used to compute classifier prediction, a single classifier can solve the whole problem.
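A toy sketch of tile coding over a one-dimensional input: several overlapping grids ("tilings") map a continuous input to a set of active binary features, and the prediction is the sum of the weights of the active tiles. The tiling counts and ranges below are illustrative, not the settings used in the experiments.

```python
# Minimal tile coding sketch: each tiling is a grid shifted by a fraction
# of the tile width, so nearby inputs share some but not all active tiles.
def active_tiles(s, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
    width = (hi - lo) / n_tiles
    tiles = []
    for t in range(n_tilings):
        offset = t * width / n_tilings          # shift each tiling slightly
        idx = int((s - lo + offset) / width)
        idx = min(idx, n_tiles)                 # clamp the shifted edge case
        tiles.append(t * (n_tiles + 1) + idx)   # unique index per (tiling, tile)
    return tiles

def tc_predict(weights, s):
    """Prediction = sum of the weights of the active tiles."""
    return sum(weights[i] for i in active_tiles(s))

weights = [0.0] * (4 * 9)
for i in active_tiles(0.3):
    weights[i] = 2.5                            # 4 active tiles -> prediction 10.0
print(tc_predict(weights, 0.3))
```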
What should we expect?
The performance? Computed prediction can perform as well as the approximator with the most adequate configuration.
The evolution of a population of classifiers provides advantages over a single approximator, even if the same approximator alone might solve the whole problem.
How do parameters evolve?
What now?
[Figure: the representation vs. approximator chart, with arrows from the Problem asking "Which representation?" and "Which approximator?"]
Which approximator?
Let evolution decide!
Population of classifiers using different approximators to compute prediction.
The genetic algorithm selects the best approximator for each problem subspace.
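A toy sketch of the idea: classifiers carrying different approximator types compete under fitness-proportionate selection, so the better-fitting type comes to dominate. The approximator names and fitness values below are illustrative.

```python
import random

# "Let evolution decide": each classifier records which approximator it
# uses; roulette-wheel selection favours the more accurate (fitter) ones.
random.seed(0)
population = [
    {"approx": "linear",    "fitness": 0.9},
    {"approx": "quadratic", "fitness": 0.6},
    {"approx": "constant",  "fitness": 0.1},
]

def roulette(pop):
    """Fitness-proportionate (roulette-wheel) selection."""
    total = sum(cl["fitness"] for cl in pop)
    r = random.uniform(0, total)
    for cl in pop:
        r -= cl["fitness"]
        if r <= 0:
            return cl
    return pop[-1]

picks = [roulette(population)["approx"] for _ in range(1000)]
print(picks.count("linear") > picks.count("constant"))  # True
```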
Evolving the best approximator
What next?
[Figure: the representation vs. approximator chart, with arrows from the Problem asking "Which representation?" and "Which approximator?"]
Evolving Heterogeneous Approximators
[Figure: heterogeneous approximators compared with the most powerful single approximator]
What next?
Allow different representations in the same population.
Let evolution evolve the most adequate representation for each problem subspace.
Then, allow different representations and different approximators to evolve all together.
Probably done for Boolean conditions.
Acknowledgements
Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL
Thank you! Any questions?