IJCNN, International Joint Conference on Neural Networks, San Jose 2011
Motivated Learning in Autonomous Systems
Pawel Raif, Silesian University of Technology, Poland
Janusz A. Starzyk, Ohio University, USA
Outline
• Reinforcement Learning (RL)
• Goal Creation System (GCS): a self-organizing, pain-based network
• Motivated Learning (ML) as a combination of RL + GCS
• Simulation Results
• Possible Applications of ML
Machine Learning Methods
Machine learning: supervised learning, unsupervised learning, corrective learning, reinforcement learning; related approaches: hierarchical RL, intrinsic motivation.
Problems in "real world" applications, such as autonomous systems:
• "curse of dimensionality"
• lack of motivation for development
• "top-down approach" vs. "bottom-up approach"
Reinforcement Learning: learning through interaction with the environment.
[Diagram: the RL agent receives state s and reward r from the environment and responds with action a]
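For reference, a minimal sketch of this interaction loop with tabular Q-learning; the slides do not specify the algorithm, and the environment interface (reset(), step(), actions) is an assumption for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn action values purely through interaction: act, observe reward r
    and the next state, and update the estimate Q(s, a)."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # one-step temporal-difference update
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```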
Motivated Learning
ML combines an internal goal creation system (GCS) with reinforcement learning (RL).
Motivated learning (ML) is need-based motivation, goal creation, and learning in an embodied agent. The agent creates a hierarchy of goals based on primitive need signals and receives internal rewards for satisfying its goals (both primitive and abstract). ML applies to embodied intelligence (EI) working in a hostile environment.
[Diagram: the ML agent consists of an RL block plus a goal creation (GC) block that supplies goals (motivations) and internal reward, while state and action are exchanged with the environment]
Motivated Learning – the main idea: intrinsic motivations created by learning machines.
An intelligent agent learns how to survive in a hostile environment.
How to motivate a machine?
We suggest that the hostility of the environment is the most effective motivational factor.
Assumptions
1. The ML agent is independent: it can act autonomously in its environment and can choose its own way of development.
2. The ML agent's interface to the environment is the same as an RL agent's.
3. The environment is hostile to the agent.
4. Hostility may be active or passive (depleted resources).
5. The environment is fully observable.
Goal Creation System: neural self-organizing pain-based structures.
Goal creation scheme: a primitive pain is directly sensed; an abstract pain is introduced by solving a lower-level pain; a thresholded curiosity-based pain is also used.
Motivations and selection of a goal: motivations are like desires in a BDI agent; a WTA competition selects the motivation, and another WTA selects the goal.
[Diagram: self-organizing pain-based network. Sensory inputs S1, S2, ..., Sk and bias nodes B1, B2 connect through weights (wPG, wBP1, wBP2) to the primitive pain Pp, abstract pains P1, P2, goals G, and motivations M1, M2; two WTA stages select the winning motivation and goal. Goal abstraction ranges from the least abstract (Food, Grocery, Bank) to the most abstract (Office).]
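A minimal sketch of the two-stage WTA selection described above, assuming explicit pain levels and learned goal values; the actual network realizes this competition with self-organizing weighted connections, not an argmax over dictionaries, and the example values are hypothetical:

```python
def select_goal(pains, goals_for_motivation):
    """Two-stage winner-take-all: the strongest pain wins the motivation
    competition, then the best-valued goal for that motivation wins."""
    motivation = max(pains, key=pains.get)          # first WTA: dominant pain
    candidates = goals_for_motivation[motivation]
    goal = max(candidates, key=candidates.get)      # second WTA: best goal
    return motivation, goal

# hypothetical pain levels and goal values for illustration
pains = {"low sugar level": 0.8, "low food supplies": 0.3}
goals = {"low sugar level": {"eat food": 0.9, "explore": 0.1},
         "low food supplies": {"buy at grocery": 0.7}}
print(select_goal(pains, goals))   # -> ('low sugar level', 'eat food')
```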
SENSOR   | MOTOR    | INCREASE       | DECREASE
Food     | Eat      | Sugar level    | Food supplies
Grocery  | Buy      | Food supplies  | Money amount
Bank     | Withdraw | Money amount   | Bank account
Office   | Work     | Bank account   | Working possibilities
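The table maps directly onto a data structure; a sketch using only names from the table (how the simulator itself represents resources is not given in the slides):

```python
# Each resource (sensor): the motor action that uses it, what the action
# increases, and the higher-level resource it depletes.
RESOURCES = {
    "Food":    {"motor": "Eat",      "increases": "Sugar level",   "decreases": "Food supplies"},
    "Grocery": {"motor": "Buy",      "increases": "Food supplies", "decreases": "Money amount"},
    "Bank":    {"motor": "Withdraw", "increases": "Money amount",  "decreases": "Bank account"},
    "Office":  {"motor": "Work",     "increases": "Bank account",  "decreases": "Working possibilities"},
}
```

Satisfying a need at one level depletes the resource one level up, which is what eventually triggers a new, more abstract pain.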
Internal goals: a simple linear hierarchy between different goals.
Hierarchy of resources (and possible agent's goals).
Resources are distributed all over the "grid world".
[Figure: modified "grid world" with numbered resources scattered across the cells]
This environment is: complex, dynamically changing, fully observable.
The agent must locate resources and learn how to utilize them.
[Diagram: the environment, the agent's internal need signals, and its perception of resources]
Resources present in the environment can be used to satisfy the agent's needs; when they are missing, the agent experiences a subjective sense of "lack of resources".
By discovering useful resources and their dependencies, the learned hierarchy of internal goals expresses the complexity of the environment.
Resources are distributed all over the "grid world".
[Figure: grid world with resources 1–4 scattered across the cells]
Relationships between internal goals
Relationships between internal goals do not have to form a linear hierarchy. They may constitute a tree structure or a complex network of resource dependencies.
By discovering subsequent resources and their dependencies, the complexity of the internal goal network grows. However, each system may have unique experiences (reflecting its personal history of development).
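To illustrate the point, a purely hypothetical goal network (the resource names beyond the grocery chain are invented) in which one need can be satisfied through more than one resource:

```python
# Hypothetical non-linear goal network: each resource lists the resources
# it depends on, forming a tree or a general dependency graph.
GOAL_NETWORK = {
    "Food":    ["Grocery", "Garden"],   # food can be bought or grown
    "Grocery": ["Money"],
    "Garden":  ["Seeds", "Water"],
    "Money":   ["Office"],
    "Seeds":   ["Grocery"],
    "Office":  [],
    "Water":   [],
}
```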
[Diagram: designer-specified needs (need1, need2, need3) at the top of the hierarchy, connected to the top-level resources]
Experiment that combines ML & RL
Every resource discovered by the agent becomes a potential goal and is assigned a value function "level".
The Goal Creation System establishes new goals and switches the agent's activity between them.
The RL algorithm learns the value functions on the different levels.
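A hedged sketch of how this combination could be organized: the goal creation system keeps one pain signal and one value function (here a Q-table) per discovered goal, switches the active goal by WTA over the pains, and applies an RL update to the active goal only. All class and method names are illustrative; the slides do not give the implementation.

```python
import random
from collections import defaultdict

class MotivatedAgent:
    """One value function per goal; the goal creation system switches the
    agent's activity to whichever pain currently dominates (WTA)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.pains = {}   # goal name -> current pain level
        self.Q = {}       # goal name -> Q-table for that goal's value function

    def create_goal(self, name, initial_pain=0.0):
        # every newly discovered resource becomes a potential goal
        self.pains[name] = initial_pain
        self.Q[name] = defaultdict(float)

    def active_goal(self):
        # WTA over pain signals decides what the agent works on now
        return max(self.pains, key=self.pains.get)

    def act(self, goal, state):
        # epsilon-greedy policy with respect to the active goal's values
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        q = self.Q[goal]
        return max(self.actions, key=lambda a: q[(state, a)])

    def learn(self, goal, state, action, internal_reward, next_state):
        # standard one-step Q-learning update, applied to the active goal only
        q = self.Q[goal]
        best_next = max(q[(next_state, a)] for a in self.actions)
        q[(state, action)] += self.alpha * (
            internal_reward + self.gamma * best_next - q[(state, action)])
```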
Experiment results: switching between goals at the beginning … and at the end.
Initially the agent uses many iterations to reach a goal (red dots).
Sometimes it abandons the goal when another pain dominates.
Final runs are shorter and more successful.
Experiment results: comparing primitive pain levels of RL and ML.
Moving average of the primitive pain signal: initially the RL agent learns better, but its performance deteriorates as the resources are depleted.
Experiment results: effectiveness in terms of cumulative reward (the reward is determined by the designer of the experiment).
Comparison: Reinforcement Learning vs. Motivated Learning

Reinforcement Learning:
• Single value function (various objectives)
• Measurable rewards
• Predictable
• Objectives set by the designer
• Maximizes the reward
• Potentially unstable
• Learning effort increases with complexity
• Always active

Motivated Learning:
• Multiple value functions (one for each goal)
• Internal rewards
• Unpredictable
• Sets its own objectives
• Solves a minimax problem
• Always stable
• Learns better than RL in complex environments
• Acts when needed
Conclusions
The motivated learning method, based on a goal creation system, can improve the learning of autonomous agents in a special class of problems.
ML is especially useful in complex, dynamic environments, where it works according to a learned hierarchy of goals.
Individual goals use well-known reinforcement learning algorithms to learn their corresponding value functions.
ML builds internal representations of useful environment percepts through interaction with the environment.
ML switches the machine's attention and sets intended goals, becoming an important mechanism for a cognitive system.
"The real danger is not that computers will begin to think like man, but that man will begin to think like computers."
– Sydney J. Harris
References:
• J. A. Starzyk, J. T. Graham, P. Raif, and A.-H. Tan, "Motivated Learning for the Development of Autonomous Systems," Cognitive Systems Research, Special Issue on Computational Modeling and Application of Cognitive Systems, 12 January 2011.
• J. A. Starzyk, P. Raif, and A.-H. Tan, "Motivated Learning as an Extension of Reinforcement Learning," Fourth International Conference on Cognitive Systems (CogSys 2010), ETH Zurich, January 2010.
• J. A. Starzyk and P. Raif, "Motivated Learning Based on Goal Creation in Cognitive Systems," Thirteenth International Conference on Cognitive and Neural Systems, Boston University, May 2009.
• J. A. Starzyk, "Motivation in Embodied Intelligence," Frontiers in Robotics, Automation and Control, I-Tech Education and Publishing, Oct. 2008, pp. 83–110.