Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)
Basia Korel, Brown University, cs2950-z
February 15, 2010
Outline:
- Overview
- Demonstration Learning Algorithm
- Confident Execution
- Corrective Demonstration
- Limitations
- Option Class Algorithm
- Experiments and Results
- Conclusion
- Problem addressed: equivalent action choices
- Context: learning from demonstration
- In the real world, equivalent actions are demonstrated arbitrarily and inconsistently
- Resulting problem: the labeled training data lacks consistency
Contribution: identify, represent, and enact equivalent action choices
- Identify conflicting demonstrations
- Represent the choice among multiple actions in the policy
Common assumption of previous approaches: each state maps to one best action
Learning equivalent actions builds upon:
- Confident Execution: obtains teacher demonstrations and learns the action policy
- Corrective Demonstration: corrects execution mistakes through additional demonstrations
Confident Execution is an interactive learning algorithm. Given the current world state, the robot:
- Determines the need for a demonstration based on a confidence measure
- May request demonstrations to improve the policy
The robot's policy is represented by a classifier C : s → (a, c, db):
- Trained using states as inputs and actions as labels
- c is a measure of action-selection confidence
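The request-or-act loop of Confident Execution can be sketched as follows. This is a minimal illustration, not the paper's implementation: the classifier here is a crude 1-nearest-neighbor stand-in, and the names (`NNPolicy`, `confident_execution`, the threshold `tau`) are assumptions for the sketch.

```python
# Minimal sketch of a Confident Execution-style loop (illustrative only).
import math

class NNPolicy:
    """1-nearest-neighbor classifier; confidence decays with distance."""
    def __init__(self):
        self.data = []  # list of (state, action) demonstration pairs

    def train(self, state, action):
        self.data.append((state, action))

    def predict(self, state):
        if not self.data:
            return None, 0.0
        # nearest demonstrated state decides the action
        dist, action = min((math.dist(state, s), a) for s, a in self.data)
        return action, 1.0 / (1.0 + dist)  # crude confidence measure

def confident_execution(policy, state, teacher, tau=0.5):
    """Act autonomously when confident; otherwise request a demonstration."""
    action, conf = policy.predict(state)
    if conf < tau:                  # low confidence: ask the teacher
        action = teacher(state)     # teacher demonstrates the action
        policy.train(state, action) # update the policy with the new label
    return action
```

When the policy has never seen a nearby state, confidence is low and the teacher is queried; as demonstrations accumulate, the robot acts on its own more often.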
Corrective Demonstration: an algorithm in which the teacher corrects unwanted actions by providing supplementary corrective demonstrations
Assumptions made:
- One-to-one state-action mapping
- Consistent demonstrations
- A complete policy given enough demonstrations
These assumptions may fail in the real world:
- Multiple equivalent actions cause ambiguity
- Robot sensor noise may cause inconsistency
Option class: a cluster of data points that have been labeled with at least two different actions
Algorithm: extracts and explicitly models option classes in the robot’s policy
Given demonstration dataset D:
  M ← PointsInLowConfidenceRegion(D)
  d ← MeanNearestNeighborDist(D)
  C ← ConnectedComponents(M, d)
  for c ∈ C do
    A ← ActionClasses(c)
    if Size(c) > 3 and Size(A) > 1 then
      CreateClass(D, c, Option-A)
  UpdateClassifier(D)
  ResetClass(D)
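The extraction step above can be sketched in Python. The cluster-size condition (> 3 points) and the multi-action condition come from the pseudocode; everything else (single-linkage clustering as `ConnectedComponents`, a caller-supplied `low_confidence` test, the function names) is an assumed stand-in, not the paper's implementation.

```python
# Sketch of option-class extraction from low-confidence, multi-action
# clusters (helper implementations are illustrative stand-ins).
import math

def mean_nn_dist(points):
    """Mean distance from each point to its nearest neighbor."""
    dists = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        if others:
            dists.append(min(math.dist(p, q) for q in others))
    return sum(dists) / len(dists) if dists else 0.0

def connected_components(points, d):
    """Single-linkage components: points within distance d are connected."""
    comps, unseen = [], list(range(len(points)))
    while unseen:
        stack, comp = [unseen.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in unseen[:]:
                if math.dist(points[i], points[j]) <= d:
                    unseen.remove(j)
                    stack.append(j)
        comps.append(comp)
    return comps

def extract_option_classes(dataset, low_confidence):
    """dataset: list of (state, action); low_confidence: state -> bool."""
    low = [(s, a) for s, a in dataset if low_confidence(s)]
    states = [s for s, _ in low]
    d = mean_nn_dist(states)
    options = []
    for comp in connected_components(states, d):
        actions = {low[i][1] for i in comp}
        # a cluster with >3 points and >1 action label becomes an option class
        if len(comp) > 3 and len(actions) > 1:
            options.append((frozenset(comp), frozenset(actions)))
    return options
```

A tight cluster of low-confidence points labeled with two different actions is thus modeled as one option class rather than left as conflicting training labels.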
Obstacle avoidance domain:
Gathered data:
Evaluation: Confident Execution with and without option classes
Metrics:
- % of complete policies
- # of demonstrations
- NOT classification accuracy
Results (with option classes):
- Converges to a complete policy with much higher frequency
- Requires far fewer demonstrations
Multiple equivalent actions exist in the real world
Model action choices explicitly in the policy
Domain limitations: discrete action labels
Acknowledgments: Chad Jenkins, Brown RLAB, and the cs2950-z course staff/leaders