Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)
Basia Korel, Brown University, cs2950-z
February 15, 2010
Outline:
- Overview
- Demonstration Learning Algorithm
- Confident Execution
- Corrective Demonstration
- Limitations
- Option Class Algorithm
- Experiments and Results
- Conclusion
- Problem addressed: equivalent action choices
- Context: learning from demonstration
- In the real world, equivalent actions are demonstrated arbitrarily and inconsistently
- Resulting problem: the labeled training data lacks consistency
Contribution: identify, represent, and enact equivalent action choices
- Identify conflicting demonstrations
- Represent the choice among multiple actions in the policy
Common assumption of previous approaches: each state maps to one best action
Learning equivalent actions builds upon:
- Confident Execution: obtains teacher demonstrations and learns the action policy
- Corrective Demonstration: corrects execution mistakes through additional demonstrations
Confident Execution is an interactive learning algorithm. Given the current world state, the robot:
- Determines the need for a demonstration based on a confidence measure
- May request demonstrations to improve the policy
The robot's policy is represented by a classifier C : s → (a, c, db):
- Trained using states as inputs and actions as labels
- c is a measure of action-selection confidence
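The request-or-act loop of Confident Execution can be sketched as follows. This is a minimal illustration, not the paper's implementation: the classifier here is a crude 1-nearest-neighbor stand-in, and the names (`NNPolicy`, `confident_execution`, the threshold `tau`) are assumptions for the sketch.

```python
# Minimal sketch of a Confident Execution-style loop (illustrative only).
import math

class NNPolicy:
    """1-nearest-neighbor classifier; confidence decays with distance."""
    def __init__(self):
        self.data = []  # list of (state, action) demonstration pairs

    def train(self, state, action):
        self.data.append((state, action))

    def predict(self, state):
        if not self.data:
            return None, 0.0
        # nearest demonstrated state decides the action
        dist, action = min((math.dist(state, s), a) for s, a in self.data)
        return action, 1.0 / (1.0 + dist)  # crude confidence measure

def confident_execution(policy, state, teacher, tau=0.5):
    """Act autonomously when confident; otherwise request a demonstration."""
    action, conf = policy.predict(state)
    if conf < tau:                  # low confidence: ask the teacher
        action = teacher(state)     # teacher demonstrates the action
        policy.train(state, action) # update the policy with the new label
    return action
```

When the policy has never seen a nearby state, confidence is low and the teacher is queried; as demonstrations accumulate, the robot acts on its own more often.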
Corrective Demonstration: an algorithm in which the teacher corrects unwanted actions by providing supplementary corrective demonstrations
Assumptions made:
- One-to-one state-action mapping
- Consistent demonstrations
- A complete policy given enough demonstrations
These assumptions may fail in the real world:
- Multiple equivalent actions cause ambiguity
- Robot sensor noise may cause inconsistency
Option class: a cluster of data points that have been labeled with at least two different actions
Algorithm: extracts and explicitly models option classes in the robot’s policy
Given demonstration dataset D:
  M ← PointsInLowConfidenceRegion(D)
  d ← MeanNearestNeighborDist(D)
  C ← ConnectedComponents(M, d)
  for c ∈ C do
    A ← ActionClasses(c)
    if Size(c) > 3 and Size(A) > 1 then
      CreateClass(D, c, Option-A)
  UpdateClassifier(D)
  ResetClass(D)
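The extraction step above can be sketched in Python. The cluster-size condition (> 3 points) and the multi-action condition come from the pseudocode; everything else (single-linkage clustering as `ConnectedComponents`, a caller-supplied `low_confidence` test, the function names) is an assumed stand-in, not the paper's implementation.

```python
# Sketch of option-class extraction from low-confidence, multi-action
# clusters (helper implementations are illustrative stand-ins).
import math

def mean_nn_dist(points):
    """Mean distance from each point to its nearest neighbor."""
    dists = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        if others:
            dists.append(min(math.dist(p, q) for q in others))
    return sum(dists) / len(dists) if dists else 0.0

def connected_components(points, d):
    """Single-linkage components: points within distance d are connected."""
    comps, unseen = [], list(range(len(points)))
    while unseen:
        stack, comp = [unseen.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in unseen[:]:
                if math.dist(points[i], points[j]) <= d:
                    unseen.remove(j)
                    stack.append(j)
        comps.append(comp)
    return comps

def extract_option_classes(dataset, low_confidence):
    """dataset: list of (state, action); low_confidence: state -> bool."""
    low = [(s, a) for s, a in dataset if low_confidence(s)]
    states = [s for s, _ in low]
    d = mean_nn_dist(states)
    options = []
    for comp in connected_components(states, d):
        actions = {low[i][1] for i in comp}
        # a cluster with >3 points and >1 action label becomes an option class
        if len(comp) > 3 and len(actions) > 1:
            options.append((frozenset(comp), frozenset(actions)))
    return options
```

A tight cluster of low-confidence points labeled with two different actions is thus modeled as one option class rather than left as conflicting training labels.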
Obstacle avoidance domain:
Gathered data:
Evaluation: Confident Execution with and without option classes
Metrics:
- % of complete policies
- # of demonstrations
- NOT classification accuracy
Results (with option classes):
- Converges to a complete policy with much higher frequency
- Requires far fewer demonstrations
Multiple equivalent actions exist in the real world
Model action choices explicitly in the policy
Domain limitations: discrete action labels
Acknowledgments: Chad Jenkins, Brown RLAB, and the cs2950-z course staff/leaders