17th International Congress of Mechanical Engineering, November 10–14, 2003, Holiday Inn Select Jaraguá Hotel
São Paulo - SP - Brazil
Authors: Areolino de Almeida Neto (UFMA), Bodo Heimann (University of Hannover), Luiz Carlos S. Góes (ITA), Cairo L. Nascimento Jr. (ITA)
Avoidance of Multiple Dynamic Obstacles
Objective
To drive a mobile robot along a safe path, using direction indications that avoid both dynamic and static obstacles.
[Figure: robot, obstacle and goal]
Reinforcement Learning
Characteristics:
• Intuitive data
• Cumulative learning
• Constructive solution
• Direct knowledge acquisition
• Well suited to decision making
Reinforcement Learning
States:
• Distance to the possible point of collision (4 values)
• Direction of the obstacle (8 values)
• Shortest distance between the obstacle and the robot's path (8 values)
• Time condition of arrival (3 values)
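The four state variables above can be folded into a single state number. The sketch below is one possible coding, not necessarily the one the authors used: it treats the four variables as digits of a mixed-radix number, which yields 4 × 8 × 8 × 3 = 768 distinct indices. The names `LEVELS`, `encode_state` and `decode_state`, and the ordering of the variables, are assumptions made for illustration.

```python
# Hypothetical mixed-radix coding of the four discretized state variables.
# Order (an assumption): collision distance, obstacle direction,
# clearance to the robot's path, time condition of arrival.
LEVELS = (4, 8, 8, 3)
NUM_STATES = 4 * 8 * 8 * 3


def encode_state(dist, direction, clearance, timing):
    """Map four discrete values to one index in the range 0..767."""
    index = 0
    for value, n_levels in zip((dist, direction, clearance, timing), LEVELS):
        if not 0 <= value < n_levels:
            raise ValueError(f"value {value} outside 0..{n_levels - 1}")
        index = index * n_levels + value
    return index


def decode_state(index):
    """Inverse of encode_state: recover the four discrete values."""
    values = []
    for n_levels in reversed(LEVELS):
        index, value = divmod(index, n_levels)
        values.append(value)
    return tuple(reversed(values))
```

Any bijective coding would do; the mixed-radix form is simply easy to invert when inspecting a trained table.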
Reinforcement Learning
Actions:
• Lateral velocity: 3 to the right, 1 null, 3 to the left
• Frontal velocity: 3 ahead, 1 null, 3 back
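With seven options per velocity component, the action set is the 7 × 7 grid of (lateral, frontal) pairs. A minimal sketch, assuming each component is coded as an integer level from -3 to 3 (negative meaning left or back, which is a convention chosen here, not taken from the slides):

```python
from itertools import product

# Assumed integer coding: -3..-1 = left/back, 0 = null, 1..3 = right/ahead.
VELOCITY_LEVELS = range(-3, 4)

# Every (lateral, frontal) combination: 7 * 7 = 49 actions.
ACTIONS = list(product(VELOCITY_LEVELS, VELOCITY_LEVELS))


def action_index(lateral, frontal):
    """Position of a (lateral, frontal) pair inside ACTIONS."""
    return (lateral + 3) * 7 + (frontal + 3)
```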
Reinforcement Learning
States are mapped to actions using a coding scheme. There are 4 × 8 × 8 × 3 = 768 states.
For each state there are 7 × 7 = 49 possible actions, each with a corresponding "evaluation value".
Training means building the "evaluation values" for every state and each of its possible actions.
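The evaluation values can be pictured as a 768 × 49 table indexed by state and action. The following sketch assumes a plain list-of-lists table initialised to zero and a hypothetical helper `best_actions` that ranks the actions of one state; the default `k=10` mirrors the ten actions the avoidance algorithm asks for.

```python
NUM_STATES = 768
NUM_ACTIONS = 49

# table[state][action] holds the evaluation value; zeros before training.
table = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]


def best_actions(table, state, k=10):
    """Return the indices of the k actions with the highest evaluation value."""
    row = table[state]
    return sorted(range(len(row)), key=lambda a: row[a], reverse=True)[:k]
```

For example, after setting `table[5][7] = 2.0` and `table[5][3] = 1.0`, `best_actions(table, 5, k=2)` returns `[7, 3]`.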
Reinforcement Learning
Training using only one obstacle:
• 1st level: Monte Carlo (~450,000 runs): direct, fast computation, evaluation function
• 2nd level: Q-learning: necessary in around 50 situations
[Equation: evaluation function f in terms of t, a and n, with constants 100, 10 and 600]
t: duration of movement; a: number of actions; n: iteration number
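The two training levels can be sketched as two table updates. This is a generic illustration, not the authors' code: the Monte Carlo step keeps a running average of the evaluation obtained from complete runs, and the Q-learning step applies the standard one-step update. The learning rate `alpha`, the discount `gamma` and the reward signal are assumptions, since the slides do not state them, and the evaluation function f itself is not reproduced here.

```python
def monte_carlo_update(table, counts, state, action, episode_return):
    """Running average of the returns observed for (state, action)."""
    counts[state][action] += 1
    n = counts[state][action]
    table[state][action] += (episode_return - table[state][action]) / n


def q_learning_update(table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (alpha and gamma are assumed)."""
    target = reward + gamma * max(table[next_state])
    table[state][action] += alpha * (target - table[state][action])
```

Starting from the Monte Carlo values and refining only the problematic situations with Q-learning matches the two-level order described above.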
Obstacle Avoidance
Architecture:
• Use of an a priori path (static environment)
• Detection of a possible collision
• Classification of a collision: possible or immediate
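The slides do not say how a collision is detected or how "possible" is told apart from "immediate". One common approach, shown here only as an assumed stand-in, is to compute the closest point of approach for robot and obstacle moving at constant velocity and to classify by the time remaining. The thresholds `safe_radius` and `immediate_horizon`, and the extra label `"none"`, are hypothetical.

```python
import math


def closest_approach(robot_pos, robot_vel, obs_pos, obs_vel):
    """Return (time, distance) of closest approach for constant velocities."""
    rx, ry = obs_pos[0] - robot_pos[0], obs_pos[1] - robot_pos[1]
    vx, vy = obs_vel[0] - robot_vel[0], obs_vel[1] - robot_vel[1]
    speed_sq = vx * vx + vy * vy
    # With no relative motion the current distance never changes.
    t = 0.0 if speed_sq == 0.0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def classify_collision(t, dist, safe_radius=1.0, immediate_horizon=2.0):
    """Label the situation; both thresholds are illustrative assumptions."""
    if dist >= safe_radius:
        return "none"
    return "immediate" if t <= immediate_horizon else "possible"
```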
Obstacle Avoidance
Algorithm for Multiple-Obstacle Avoidance:
• One obstacle is defined as the main obstacle, and actions are indicated based on it;
• The situation is divided into sectors:
[Figure: division of the situation into sectors]
Obstacle Avoidance
Algorithm for Multiple-Obstacle Avoidance:
• If the last action decided still has a chance of avoiding the obstacles (that is, if it can drive the robot through a free and sufficiently large sector), then it is maintained;
• If not, the RL technique indicates 10 actions for the present situation. For a new situation, these are the 10 best actions; otherwise, they are the 10 best actions belonging to the same quadrant as the last action;
Obstacle Avoidance
Algorithm for Multiple-Obstacle Avoidance:
• Among the 10 actions indicated, if more than one can drive the robot along a safe trajectory, the action with the smallest change in lateral velocity is chosen;
• If none can, the 10 actions indicated are reflected to the other side (left or right) and a safe trajectory is searched for again;
• If none is found, an action among the 10 best actions for all quadrants that presents the possibility of no collision is immediately chosen;
Obstacle Avoidance
Algorithm for Multiple-Obstacle Avoidance:
• If none is found, an action among the 10 best actions for all quadrants that makes it possible to arrive at the collision point before or after the obstacle is immediately chosen;
• Finally, if no such action is found, the robot should stop.
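The fallback chain of the multiple-obstacle algorithm can be sketched as a single function. This is an illustrative reading of the steps, not the authors' implementation. Actions are assumed to be (lateral, frontal) pairs, and the geometric checks are passed in as hypothetical predicates: `sector_is_free`, `is_safe`, `no_collision` and `avoids_by_timing`. The caller is assumed to supply `candidates` (the 10 actions indicated by RL) and `all_quadrant_best` (the 10 best over all quadrants). Reusing the smallest-lateral-change rule for the reflected actions is also an assumption.

```python
STOP = (0, 0)  # null lateral and frontal velocity


def choose_action(last_action, candidates, all_quadrant_best,
                  sector_is_free, is_safe, no_collision, avoids_by_timing):
    """Walk the fallback chain and return a (lateral, frontal) action."""
    # 1. Keep the last action while it still leads through a free sector.
    if last_action is not None and sector_is_free(last_action):
        return last_action

    reference = last_action[0] if last_action is not None else 0

    def smallest_lateral_change(actions):
        safe = [a for a in actions if is_safe(a)]
        if not safe:
            return None
        return min(safe, key=lambda a: abs(a[0] - reference))

    # 2. Safe candidate with the smallest change in lateral velocity.
    choice = smallest_lateral_change(candidates)
    if choice is not None:
        return choice

    # 3. Same search on the candidates reflected to the other side.
    mirrored = [(-lateral, frontal) for lateral, frontal in candidates]
    choice = smallest_lateral_change(mirrored)
    if choice is not None:
        return choice

    # 4. First action with no collision, then first one that passes the
    #    collision point before or after the obstacle.
    for predicate in (no_collision, avoids_by_timing):
        for action in all_quadrant_best:
            if predicate(action):
                return action

    # 5. Nothing works: stop the robot.
    return STOP
```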
Reinforcement Learning
Neural Representation:
• Problem: state-action matrix explosion (768 × 49 = 37,632 entries)
• Solution: neural representation
• Use of multiple neural networks:
  • training the 1st NN: E = D - Y1
  • training the 2nd NN: E = (D - Y1) - Y2
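The two error expressions describe residual training: the second model learns whatever the first one left unexplained, and the final output is Y1 + Y2. The sketch below reads D as the desired output and Y1, Y2 as the outputs of the two models, which is an interpretation of the slide notation. To stay short and dependency-free it swaps the neural networks for one-parameter least-squares models, and the target data are made up.

```python
def fit_scale(features, targets):
    """Least-squares weight w minimising sum((target - w * feature) ** 2)."""
    return (sum(f * t for f, t in zip(features, targets))
            / sum(f * f for f in features))


def sse(errors):
    """Sum of squared errors."""
    return sum(e * e for e in errors)


xs = [i / 10 for i in range(-10, 11)]
desired = [x + 0.5 * x * x for x in xs]  # D: made-up target values

# Stage 1: the first model is trained on E = D - Y1.
w1 = fit_scale(xs, desired)
y1 = [w1 * x for x in xs]
residual = [d - y for d, y in zip(desired, y1)]

# Stage 2: the second model is trained on E = (D - Y1) - Y2.
squares = [x * x for x in xs]
w2 = fit_scale(squares, residual)
y2 = [w2 * s for s in squares]

error_stage1 = sse(residual)
error_stage2 = sse(r - y for r, y in zip(residual, y2))
```

Because the second model is fitted to the residual by least squares, `error_stage2` cannot exceed `error_stage1` (up to rounding); with real networks the same ordering is expected but not guaranteed.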
Obstacle Avoidance
Results:
Obstacle Avoidance
Conclusion:
• Complex avoidance with primitive actions
• Direct knowledge with the Monte Carlo technique
• Improvement in knowledge with Q-learning
• Neural representation can compact the state-action matrix well
Acknowledgements: CAPES, DAAD, UFMA and ITA for the financial support