Higher Coordination with Less Control – A Result of Information Maximization in the Sensorimotor Loop
Keyan Zahedi, Nihat Ay, Ralf Der (Published on: May 19, 2012)
Artificial Neural Network, Biointelligence Lab
School of Computer Science and Engineering, Seoul National University
Presenter: Sangam Uprety
Student ID: 2012-82153
October 09, 2012
Contents
1. Abstract
2. Introduction
3. Learning Rule
4. Experiment
5. Results
6. Questions
7. Discussion and Conclusion
1. Abstract
• A novel learning method in the context of embodied artificial intelligence and self-organization
• Fewer assumptions and restrictions within the world and the underlying model
• Uses the principle of maximizing the predictive information in the sensorimotor loop
• Evaluated on robot chains of varying length with individually controlled, non-communicating segments
• Maximizing the predictive information per wheel leads to a more highly coordinated behavior
• Longer chains with less capable controllers outperform shorter chains with more complex controllers
2. Introduction
• Embodied artificial intelligence and cognitive systems use learning and adaptation rules
• Most are based on an underlying model – so they are limited to that model
• They use intrinsically generated reinforcement signals [prediction errors] as input to a learning algorithm
• A learning rule is needed that is independent of the model structure and requires fewer assumptions about the environment
• Self-organized learning
• Our way out: directly calculate the gradient of the policy from the current, locally available approximation of the predictive information
• A learning rule based on Shannon's information theory
• A neural network in which earlier layers maximize the information passed to the next layer
3. Learning Rule
3.1 Basic sensorimotor loop
W0, W1, …, Wt  world states
S0, S1, …, St  sensor states
M0, M1, …, Mt  memory states
A0, A1, …, At  actions
3. Learning Rule (Contd.)
• The sensor state St depends only on the current world state Wt.
• The memory state Mt+1 depends on the last memory state Mt, the previous action At, and the current sensor state St+1.
• The world state Wt+1 depends on the previous state Wt and on the action At.
• There is no connection between the action At and the memory state Mt+1, because we clearly distinguish between inputs and outputs of the memory Mt (which is equivalent to the controller).
• Any input is given by a sensor state St, and any output is given in the form of an action state At.
• The system may not monitor its outputs At directly, but only through a sensor, hence the sensor state St+1.
3.2 Reduced sensorimotor loop
• Progression from step t to t+1
• A, W, S are the present states, given by the distribution µ
• α(a|s) defines the policy
• β(w'|w,a) gives the evolution of the world, given the present world state w and action a
• γ(s'|w') gives the effect of the world on the sensor state
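The progression α → β → γ above can be sketched as a single sampling step. The array layouts, names, and the fixed seed below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(w, s, alpha, beta, gamma):
    """One pass through the reduced sensorimotor loop.

    alpha[s, a]     : policy,       P(a  | s)
    beta[w, a, w2]  : world law,    P(w' | w, a)
    gamma[w2, s2]   : sensor map,   P(s' | w')
    Each array is row-stochastic along its last axis.
    """
    a = rng.choice(alpha.shape[1], p=alpha[s])    # sample action from policy
    w2 = rng.choice(beta.shape[2], p=beta[w, a])  # world transition
    s2 = rng.choice(gamma.shape[1], p=gamma[w2])  # new sensor reading
    return w2, s2
```

With deterministic (0/1) distributions the step is reproducible; with stochastic ones it samples one trajectory of the loop.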
3.3 Derivation of the Learning Rule
The entropy H(X) of a random variable X, measuring its uncertainty, is:

H(X) = -Σx p(x) log2 p(x)

The mutual information of two random variables X and Y is:

I(X;Y) = H(X) - H(X|Y) = Σx,y p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]

This gives how much knowledge of Y reduces the uncertainty of X.
The maximal entropy is the entropy of the uniform distribution: H(X) <= log2 |X|.
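The two definitions can be checked numerically; this is a small self-contained helper, not part of the paper:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a distribution vector (0·log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution matrix pxy."""
    px = pxy.sum(axis=1)   # marginal of X (rows)
    py = pxy.sum(axis=0)   # marginal of Y (columns)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())
```

For a fair coin, H(X) = 1 bit, which meets the bound log2|X|; for a perfectly correlated joint, knowing Y removes all uncertainty about X, so I(X;Y) = H(X).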
The world dynamics β(w'|w,a) and the sensor map γ(s'|w') combine into the world model δ(s'|a,s), which predicts the next sensor state directly from the current action and sensor state.
• p(s), α(a|s) and δ(s'|a,s) are represented as matrices

Update rule for the sensor distribution p(s)
Update Rule for world model δ(s’|a,s)
Update rule for policy α(a|s)
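The slides give these three update equations only as images. As a rough illustration only, the sensor distribution and the world model can be tracked with exponential-averaging estimators like the following; the step size η, the array layout, and the function names are assumptions, not the authors' derivation, and the policy update (a gradient step on the approximated PI) is omitted:

```python
import numpy as np

def update_sensor_dist(p_s, s, eta=0.05):
    """Move the sensor-distribution estimate toward the observed state s."""
    target = np.zeros_like(p_s)
    target[s] = 1.0
    return (1 - eta) * p_s + eta * target

def update_world_model(delta, s, a, s2, eta=0.05):
    """delta[a, s, s2] estimates P(s' | a, s); nudge the observed row."""
    delta = delta.copy()
    row = delta[a, s]
    target = np.zeros_like(row)
    target[s2] = 1.0
    delta[a, s] = (1 - eta) * row + eta * target
    return delta
```

Both updates are convex combinations of distributions, so the estimates stay normalized after every step.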
4. Experiment
4.1 Simulators
YARS (Zahedi et al., 2008) has been used as the simulator.
4.2 Robots
Two-wheeled differential-drive robots with a circular body – the Khepera I robot (Mondada et al., 1993)
• Input and output: desired wheel velocity (At) and current actual velocity (St)
• At and St are mapped linearly to the interval [-1,1]
• -1: maximum negative speed (backwards motion)
• +1: maximum positive speed (forward motion)
• Robots are connected by a limited hinge joint with a maximal deviation of ±0.9 rad (a total range of approx. 100 degrees), avoiding intersection of neighboring robots
• Experiments with a single robot, three-, and five-segment chains
4.3 Controller
• Each robot is controlled locally
• Two control paradigms: combined and split
• No communication between the controllers
• Interaction occurs through the world state Wt, via the sensor St (current actual wheel velocity)
• r-c notation
• r ∈ {1, 3, 5}
• c ∈ {r, 2r}
4.4 Environment
• 8x8 meters, bounded, featureless environment
• Large enough for the chains to learn a coordinated behavior
5. Results
• Did the PI increase over time for all six configurations?
• Does maximization of the PI lead to qualitative changes in the behavior?
• Videos
5.1 Maximizing the predictive information
Fig. Average-PI plots for each of the six experiments: 1-1, 3-3, 5-5, 1-2, 3-6, 5-10
Comparison of intrinsically calculated PI (left) and PI calculated on recorded data per robot
5.2 Comparing Behaviors
Fig. Trajectories of the six systems for the first 10 minutes (gray) and the last 100 minutes (black)
1. All configurations explore the entire area
2. Longer consecutive trails relate to a higher average sliding-window coverage entropy
3. The configurations which show longer consecutive trails are those that reach a higher coverage entropy sooner
• For chains longer than one segment, movement only occurs if the majority of the segments moves in one direction
• This requires cooperation of the segments
• Cooperation is higher among the segments of the split configuration
• A higher PI relates to a higher coverage entropy and a higher sliding-window coverage entropy for the split controller paradigm
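As a rough illustration of the coverage-entropy measure, a trajectory's positions can be binned into grid cells and the entropy of the visit distribution computed. The 8x8 m bounds follow the environment description; the grid resolution and function name are assumptions:

```python
import numpy as np

def coverage_entropy(xy, bounds=8.0, bins=16):
    """Entropy (bits) of the distribution of visited grid cells.

    xy: (T, 2) array of positions in the area [0, bounds] x [0, bounds].
    """
    cells = np.clip((np.asarray(xy) / bounds * bins).astype(int), 0, bins - 1)
    counts = np.zeros((bins, bins))
    for cx, cy in cells:
        counts[cx, cy] += 1
    p = (counts / counts.sum()).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

A trajectory stuck in one cell scores 0 bits; one that covers all cells uniformly scores the maximum log2(bins²) bits, matching the intuition that higher coverage entropy means more of the arena was explored.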
5.3 Behavior Analysis
Chosen bins: -3/4, -1/2, 1/2, 3/4
With configuration 1-2
Transient plot: the wheel velocities oscillate between -1/2 and -3/4
For S = -1/2, A ∈ {-1/2, 1/2, 3/4}; given S = -1/2, A = -3/4 is chosen with probability 0.95
With probability 0.05, a change of the direction of the velocity occurs, leading to either a rotation of the system or an inversion of the translational behavior
The sensor entropy H(S) is high while the conditional entropy H(S'|S) is low, hence the PI is high
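The claim that a near-deterministic oscillation yields high PI can be checked numerically. The two-state chain below (switch with probability 0.95, as on the slide) is an illustrative simplification of the transient behavior:

```python
import numpy as np

def h(p):
    """Entropy in bits of a distribution vector (0·log 0 := 0)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Two sensor states; switch with probability 0.95, keep with 0.05.
T = np.array([[0.05, 0.95],
              [0.95, 0.05]])   # T[s, s'] = P(s' | s)

p_s = np.array([0.5, 0.5])     # stationary distribution is uniform

H_next = h(T.T @ p_s)                             # H(S')
H_cond = sum(p_s[s] * h(T[s]) for s in range(2))  # H(S' | S)
pi = H_next - H_cond                              # predictive information

print(round(H_next, 3), round(H_cond, 3), round(pi, 3))  # → 1.0 0.286 0.714
```

The marginal entropy H(S') is maximal (1 bit) because both states are visited equally often, while the near-deterministic transitions keep H(S'|S) small, so the PI stays close to its maximum.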
With configuration 3-6
• The velocity of a wheel is no longer influenced only by its own controller, but also by the actions of the other controllers
• The current direction of the wheel rotation is maintained with probability 0.6
• For the entire system to progress, at least two robots [i.e., four related controllers] must move in the same direction (probability 0.4)
5.4 Incremental Optimization
• The derived learning rule is able to maximize the predictive information for systems in the sensorimotor loop
• Increases of the PI relate to changes in the behavior, and here to a higher coverage entropy – an indirect measure of coordination among the coupled robots
6. Questions
Q.1 Explain the concept of the perception-action cycle in fig. 1. What are the essential characteristics of this concept? How is this concept distinguished from the traditional symbolic AI approach?
Q.2 Explain the simplified version of the perception-action cycle in fig. 2. What are its differences from the full version in fig. 1? How reasonable is this simplification? When will it work and when will it not?
Q.3 Define mutual information. Define the predictive information. Give a learning rule that maximizes the predictive information. Derive the learning rules.
Q.4 Explain the experimental tasks designed by the authors to evaluate the learning rule for predictive-information maximization. What is the setup? What is the task? What has been measured in the simulation experiments? Summarize the results. What is the conclusion of the experiments?
7. Discussion & Conclusion
• A novel approach to self-organized learning in the sensorimotor loop, free of assumptions about the world and restrictions on the model
• The learning algorithm is derived from the principle of maximizing the predictive information
• The average approximated predictive information increased over time in each of the experimental settings [Goal #1 achieved]
• The coverage entropy, a measure of coordinated behavior, is higher for chain configurations with more robots (and likewise with split controllers) [counterintuitive!]
Thank you!