A NEAT Way for Evolving Echo State Networks
TRANSCRIPT
ECAI 2010
A NEAT Way for Evolving Echo State Networks
Kyriakos C. Chatzidimitriou, Pericles A. Mitkas
Intelligent Systems and Software Engineering Labgroup, Electrical and Computer Eng. Dept., Aristotle University of Thessaloniki
Informatics and Telematics Institute, Centre for Research and Technology-Hellas
Thessaloniki, Greece
ECAI 2010, Lisbon A NEAT Way for Evolving ESNs 2
The problem
• Engineer fully autonomous, intelligent agents
• Model as reinforcement learning problems
• Need good function approximators (FAs)
• Plus properties like:
  – Non-linear
  – Non-Markovian
  – Generalization
Adaptive function approximators
• Problems remain:
  – Q: Which FA to choose?
  – A: Something powerful and suitable
  – Q: How to adjust its parameters?
    • For neural nets: number of neurons? topology? weights?
  – A: Adaptive function approximators
• FAs built automatically, ad hoc, per problem/environment
• How? Through the synthesis of learning and evolution
  – Good for autonomy
  – Good for the user
Our proposal
• An adaptive FA methodology
  – Built bottom-up, combining powerful ideas and algorithms from the research literature into a single methodology
  – Each one fills a different gap, developing into something complete
  – Design goal: cover as many aspects as possible
The Ingredients
• 1 ESN (Echo State Network) [H. Jaeger]
• 50% NEAT (NeuroEvolution of Augmenting Topologies) [K. Stanley]
• 50% TD-learning
Basic Echo State Network
• If output units are linear: y(t) = w u(t) + w' x(t)
  – A linear function over (a) linear and (b) non-linear, temporal features
• Reservoir properties:
  – Large number of features
  – Sparse
  – Mean around 0
  – Spectral radius less than 1
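The reservoir properties above can be made concrete in a short sketch. This is an illustrative implementation, not the authors' code; the function names and defaults (`make_reservoir`, `density=0.1`, `spectral_radius=0.95`) are my own assumptions.

```python
import numpy as np

def make_reservoir(n, density=0.1, spectral_radius=0.95, seed=0):
    """Build a sparse reservoir matrix with the properties listed above."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n, n))   # weights in [-1, 1], mean ~0
    W *= rng.random((n, n)) < density         # keep only a sparse fraction
    rho = max(abs(np.linalg.eigvals(W)))      # current spectral radius
    if rho > 0:
        W *= spectral_radius / rho            # rescale so the radius is below 1
    return W

def esn_output(W_in, W_res, w, w_prime, u, x):
    """Linear readout y(t) = w u(t) + w' x(t) over input and reservoir state."""
    x_new = np.tanh(W_in @ u + W_res @ x)     # non-linear, temporal features
    return w @ u + w_prime @ x_new, x_new
```

Because the readout is linear in u(t) and x(t), only `w` and `w_prime` need to be trained, which is what makes the linear RL/SL algorithms mentioned later applicable.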
NEAT – Basic Principles
• Start minimally and complexify
• Weight & structural mutation
• Speciation through clustering to protect innovation
• Crossover of networks through historical markings on connections
• Adapt its principles and use it as a meta-search algorithm for ESNs
Evolution and learning
• Combine global and local search
• Evolution helps learning "avoid traps"
• Learning helps evolution "find better locations" nearby
• ESNs allow us to do that easily
  – Linear learning schemes, so one can use all the classic linear RL/SL learning algorithms (TD, LS, LSTD, etc.)
  – Need to adjust the evolution part: adapt NEAT
Initialization
• Start minimally with 1 reservoir neuron
  – XOR problem
• Input weights: random in [-1, 1]
• Output weights: 0
• Reservoir weights:
  – Random in [-1, 1]
  – With a given density
  – Mean(Wres) = 0
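A minimal sketch of this initialization, assuming a simple dict-based genome; the field names and the default density/spectral-radius values are hypothetical, not from the talk.

```python
import numpy as np

def init_genome(n_in, seed=0):
    """Minimal starting genome: a single reservoir neuron (start small, as in NEAT)."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.uniform(-1.0, 1.0, size=(1, n_in)),  # input weights in [-1, 1]
        "W_res": np.zeros((1, 1)),                       # reservoir weights, Mean(Wres) = 0
        "W_out": np.zeros(n_in + 1),                     # output weights start at 0
        "density": 0.1,                                  # assumed default density
        "spectral_radius": 0.9,                          # assumed default radius
    }
```

Starting the readout at zero means the initial network outputs nothing until learning or evolution adjusts it, mirroring NEAT's minimal-start philosophy.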
Mutation – Add node
• A node is added
  – Adds a new feature; the genome grows
  – Historical markings are assigned to the node
  – All its reservoir connections are initially disabled
• They are later enabled through link mutation or crossover
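A sketch of the add-node mutation under the same hypothetical genome layout as above: growing the reservoir matrix by one row and column of zeros is equivalent to adding a node whose connections are all disabled.

```python
import numpy as np

def mutate_add_node(genome, innovation_counter):
    """Add one reservoir neuron; its connections start disabled (all-zero)."""
    n = genome["W_res"].shape[0]
    W = np.zeros((n + 1, n + 1))
    W[:n, :n] = genome["W_res"]          # keep the existing reservoir weights
    genome["W_res"] = W                  # new row/column are zero = disabled links
    genome["node_markings"].append(innovation_counter)  # historical marking
    return innovation_counter + 1
```

The historical marking lets crossover later align this node against nodes in other genomes, as in NEAT.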
More Mutations
• Add/remove connections
• Mutate weights
  – Restart
  – Perturb
• Weights are added/changed so as to keep Mean(Wres) = 0
• Mutate density/spectral radius
  – Restart
  – Perturb
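Restart and perturb can share one sketch, since the slide applies both to weights and to the global density/spectral-radius parameters; the probabilities and noise scale below are illustrative, not the talk's values.

```python
import numpy as np

def mutate_value(v, rng, p_restart=0.1, sigma=0.1, low=-1.0, high=1.0):
    """Restart (redraw uniformly) or perturb (add small Gaussian noise) a
    weight or a global parameter such as density or spectral radius."""
    if rng.random() < p_restart:
        return float(rng.uniform(low, high))                       # restart
    return float(np.clip(v + rng.normal(0.0, sigma), low, high))   # perturb
```

For the spectral radius, `low`/`high` would be set to keep it below 1, preserving the echo state property.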
Crossover
[Diagram: two parent networks with innovation-numbered nodes]
• Let's assume the smallest genome is also the fittest.
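The alignment step shown in the next slides can be sketched as follows. This is a simplified illustration assuming each parent is a dict mapping historical markings to weights; the representation is mine, not the paper's.

```python
def align_genes(parent_a, parent_b):
    """Align two gene sets by their historical (innovation) markings.
    Returns the markings present in both parents and those in only one."""
    markings = sorted(set(parent_a) | set(parent_b))
    matched  = [m for m in markings if m in parent_a and m in parent_b]
    disjoint = [m for m in markings if (m in parent_a) != (m in parent_b)]
    return matched, disjoint
```

Matched genes can be inherited from either parent, while disjoint/excess genes typically come from the fittest parent, as in NEAT.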
Alignment
[Diagram: the two parents' genes aligned by historical markings]
Fittest
[Diagram: the genes of the fittest parent]
Alignment
[Diagram: the two parents' genes aligned by historical markings]
Largest
[Diagram: the genes of the largest parent]
Speciation
• ESNs are supposed to be sparse
• Structural similarity on connections, as in NEAT, would eliminate the notion of speciation
• Instead, similarity on the basic macroscopic ESN properties:
  – spectral radius, density, number of nodes
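A compatibility distance over these macroscopic properties might look like the sketch below; the coefficients `c1`–`c3` are illustrative placeholders, not values from the paper.

```python
def esn_distance(g1, g2, c1=1.0, c2=1.0, c3=0.1):
    """Compatibility distance on macroscopic ESN properties instead of
    per-connection structure. Genomes are dicts with the three properties."""
    return (c1 * abs(g1["spectral_radius"] - g2["spectral_radius"])
            + c2 * abs(g1["density"] - g2["density"])
            + c3 * abs(g1["n_nodes"] - g2["n_nodes"]))
```

Two genomes whose distance falls below a threshold would be clustered into the same species, protecting structural innovations just as NEAT's distance does.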
Learning
• Use simple gradient-descent (GD) TD-learning for RL
• Use least squares for time series; online updating is not required there
• Tons of methods can be used here (either TD-learning, or policy search using EC)
• Also tested Darwinian versus Lamarckian evolution
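Both learning modes are standard and fit the linear readout directly. A minimal sketch of a TD(0) readout update and a batch least-squares fit, with illustrative step sizes:

```python
import numpy as np

def td_update(w, phi, reward, phi_next, alpha=0.01, gamma=0.99):
    """One gradient-descent TD(0) step on the linear readout weights,
    where phi is the feature vector (input plus reservoir state)."""
    delta = reward + gamma * (w @ phi_next) - (w @ phi)  # TD error
    return w + alpha * delta * phi

def fit_readout(states, targets):
    """Batch least-squares readout for time series (no online updating)."""
    return np.linalg.lstsq(states, targets, rcond=None)[0]
```

Because only the readout weights change, the learned weights can either be written back into the genome (Lamarckian) or discarded after fitness evaluation (Darwinian).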
Basic Flow
Init Pop → Simulation + Learning → Fitness → Speciation → Selection → Mutation → Crossover → Next Gen (loop)
Champion → Generalization Performance
Experiments
• Reinforcement learning
  – Mountain Car
  – Single & double pole balancing
• Time series
  – Mackey-Glass
Time Series – Mackey-Glass
• Better prediction errors than another recent TWEANN-on-ESN algorithm
  – One order of magnitude, for both test and generalization errors
• Main differences: we start minimally, and we perform crossover and speciation
Mountain Car
• Same behavior as the NEAT+Q algorithm
  – NEAT+Q = "NEAT" + "Q-learning through back-propagation" – "recurrences"
• Same generalization behavior (around -50)
• Our approach also solves the non-Markovian 2D and 3D mountain car problems with learning (only position, not a speed signal, is available)
Pole Balancing
• Performance comparable to NEAT with respect to the number of networks evaluated
• Our approach takes more time due to the learning procedure
• A first warning bell: accommodate a more advanced learning algorithm than simple GD
Vs.
• Simple ESN
  – Problems probably due to online learning (online learning, RL, and NNs are not a good triplet)
  – Not a problem for our approach, since neuroevolution finds ESNs "that are better able to learn"
• Linear TD-learning (no reservoir)
  – No reservoir means no non-Markovian signals; worse behavior
• No learning, only evolution
  – No clear conclusions, but versus simple GD TD-learning (a second warning bell)
Experiments
• Reinforcement learning
  – Mountain Car
  – Single & double pole balancing
  – 3D Mountain Car
  – Server Job Scheduling [+++]: 15% improvement over NEAT+Q
• Time series
  – Mackey-Glass [+]
  – MSO [-]
  – Lorentz Attractor [+]
Future Work
• Even more automation, driven by the problem at hand
  – For example, adapting operator probabilities online
• Test new RL-TD learning techniques, e.g. iLSTD, GQ
• More difficult test-beds
  – TAC Ad Auctions
  – Poker
  – Real-time strategy games
ECAI 2010
Thank you for your attention
Questions?
Kyriakos Chatzidimitriou ([email protected])
http://issel.ee.auth.gr