14. Behavioural Cloning of Control Skill
Ivan Bratko, Tanja Urbancic and Claude Sammut
Presenter: 송창빈
Contents
Introduction
Behavioural Cloning
Experiments
• Pole balancing
• Piloting aircraft
• Driving container cranes
• Production line scheduling
Discussion
Introduction
Controlling a complex dynamic system (a plane or a crane) requires a skilled operator
Behavioural cloning: using standard ML techniques to learn control rules from traces of human performance
Requires
• designing a suitable target concept representation
• choosing good training examples
• interpreting the results of ML
Behavioural Cloning
State: an attribute-value vector of the controlled system
• e.g. in piloting: throttle, flaps, ailerons, etc.
Action: a class value for a learning program
Action = f(State)
A time delay is introduced
Action(Time) = f(State(Time - Delay))
Behaviour trace: a sequence of pairs (State, Action)
In some cases, the behaviour traces are divided into phases
• e.g. a flight: take-off, straight-and-level flight, etc.
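The delayed formulation Action(Time) = f(State(Time - Delay)) can be illustrated by how the training pairs might be built from a logged trace. This is a minimal sketch, assuming the trace is sampled at fixed intervals so the delay can be expressed as a whole number of samples; the function name `make_delayed_examples` and the toy data are hypothetical and do not come from the slides.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


def make_delayed_examples(
    states: Sequence[State],
    actions: Sequence[str],
    delay_steps: int,
) -> List[Tuple[State, str]]:
    """Pair each action with the state observed delay_steps samples earlier.

    Assumption (not from the slides): states[t] and actions[t] were logged
    at the same instant, at a fixed sampling interval.
    """
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    # The first delay_steps actions have no sufficiently old state, so they are dropped.
    return [
        (states[t - delay_steps], actions[t])
        for t in range(delay_steps, len(actions))
    ]


# Hypothetical one-attribute trace.
states = [(0.0,), (0.1,), (0.2,), (0.3,)]
actions = ["left", "left", "right", "right"]
examples = make_delayed_examples(states, actions, delay_steps=1)
# examples == [((0.0,), "left"), ((0.1,), "right"), ((0.2,), "right")]
```

With `delay_steps=0` the function reduces to the undelayed form Action = f(State). The resulting pairs are what a learning program would receive as attribute vectors and class values.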
Pole Balancing
The Problem: balancing a pole on a cart moving on a track of limited length
Choice of Examples: 20 students used a joystick to control the pole-and-cart system, simulated on a PC
Time Delay: 400 ms
Clean-up Effect
• The averaging effect implicit in inductive generalization stripped away inconsistency and moments of inattention
• A clone outperforms the human operator in smaller ranges
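The clean-up effect can be shown with a toy example. The sketch below uses per-state majority voting as a stand-in for inductive generalization; it is not the learner used in the experiments, and the state labels, action names and counts are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def induce_majority_policy(examples: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return, for each state bucket, the action the operator chose most often.

    Majority voting is a deliberately crude stand-in for rule induction:
    it keeps the dominant behaviour and discards occasional slips.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for state_bucket, action in examples:
        votes[state_bucket][action] += 1
    return {bucket: counts.most_common(1)[0][0] for bucket, counts in votes.items()}


# Hypothetical trace: the operator is mostly consistent but slips a few times.
trace = (
    [("leaning_left", "push_left")] * 8
    + [("leaning_left", "push_right")] * 2   # moments of inattention
    + [("leaning_right", "push_right")] * 9
    + [("leaning_right", "push_left")] * 1   # inconsistency
)
policy = induce_majority_policy(trace)
# policy == {"leaning_left": "push_left", "leaning_right": "push_right"}
```

The induced policy never reproduces the three slips in the trace, which is the sense in which a clone can end up more consistent than the operator it was learned from.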
Brittleness: widely robust
Transparency of Induced Rules: a small number of readable rules were generated
Learning to Fly
The Problem
• Fly according to a predefined flight plan
• The data are collected from a flight simulation program
– logs of the actions taken by pilots
– the values of the simulator's state variables (pitch, roll, yaw, climb rate, air speed, etc.)
• Segmented into stages: take-off and climb, straight and level flight, turn, descend and line up on runway, land
• Induction programs used: C4.5, CART
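Segmenting the flight into stages amounts to splitting the trace so that each stage gets its own training set. The sketch below assumes each logged record already carries a stage label; how stages were actually detected is not stated in the slides, and the record layout and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

State = Dict[str, float]
Record = Tuple[str, State, str]  # (stage, state variables, action)


def split_by_stage(trace: Iterable[Record]) -> Dict[str, List[Tuple[State, str]]]:
    """Group (state, action) pairs by flight stage, preserving their order."""
    by_stage: Dict[str, List[Tuple[State, str]]] = defaultdict(list)
    for stage, state, action in trace:
        by_stage[stage].append((state, action))
    return dict(by_stage)


# Hypothetical records; attribute names follow the state variables listed above.
trace = [
    ("take-off and climb", {"pitch": 0.0, "air_speed": 40.0}, "throttle_full"),
    ("take-off and climb", {"pitch": 5.0, "air_speed": 80.0}, "elevator_back"),
    ("straight and level flight", {"pitch": 1.0, "air_speed": 110.0}, "no_change"),
]
datasets = split_by_stage(trace)
# Each stage's list can then be passed to a separate run of the induction program.
```

Learning one rule set per stage keeps each learning problem smaller, at the cost of needing some rule for deciding which stage the aircraft is currently in.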
Choice of Examples
• Mixing data from different pilots tends to confuse the induction
• Auto-pilots were constructed from traces of individual pilots
Time Delay: not critical, provided some delay was present
Brittleness
• The original clones were very brittle with respect to changes
• Trained in a noisy environment, clones were more robust
Transparency of Induced Rules
• Generated rules have been large and opaque
• Need high-level state variables
Container Cranes
The Problem
• Minimizing the time to transport a container
• System variables: position and velocity of the trolley; length and inclination angle of the rope, each with its velocity
• Induction programs used: RETIS, M5
Choice of Examples: the same control style
Time Delay: no clear indication of appropriate delay
Brittleness
• sensitive to the choice of particular traces and to the learning program settings
• not robust with respect to changes (e.g. different position)
Clean-up Effect
Production Line Scheduling
The Problem
• To determine an optimum allocation of labor for a period of time on a production line
• Between steps, buffers store the output of one step as input to the next
• The attribute is the buffer level
Choice of Examples: individual human schedulers
Time Delay: not a significant factor
Clean-up Effect: a clone stays closer to the specified buffer level than the human
Brittleness: quite robust
Transparency of Induced Rules: understandable by experts
Discussion
Choice of example traces for learning
• use training examples from the same subject only
Time delay between state and action
• try zero first and increase the delay gradually
Designing the representation (choosing attributes)
• useful to take into account the operator's verbal description of the skill
The clones lack conceptual structure
• goals and subgoals, phases and causality
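The advice on time delay (start at zero, then increase gradually) can be sketched as a simple sweep. The scoring used here, how consistently each delayed state maps to a single action, is an assumed proxy chosen to keep the example self-contained; it is not the evaluation used in the experiments, where a clone would be judged by running it on the task. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def consistency(states: Sequence[str], actions: Sequence[str], delay: int) -> float:
    """Fraction of examples whose action matches the majority action for
    the state seen `delay` samples earlier (assumed proxy score)."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for t in range(delay, len(actions)):
        votes[states[t - delay]][actions[t]] += 1
    total = sum(sum(counts.values()) for counts in votes.values())
    agree = sum(counts.most_common(1)[0][1] for counts in votes.values())
    return agree / total


def sweep_delays(
    states: Sequence[str], actions: Sequence[str], max_delay: int
) -> Tuple[int, float]:
    """Start at zero delay and increase it step by step, keeping the best score.

    Assumes max_delay is smaller than the trace length.
    """
    best_delay, best_score = 0, consistency(states, actions, 0)
    for delay in range(1, max_delay + 1):
        score = consistency(states, actions, delay)
        if score > best_score:  # keep the smallest delay among ties
            best_delay, best_score = delay, score
    return best_delay, best_score


# Hypothetical trace in which the operator reacts two samples late.
states = list("LLRRLRLLRRLR")
react = {"L": "left", "R": "right"}
actions = ["left", "left"] + [react[s] for s in states[:-2]]
best = sweep_delays(states, actions, max_delay=3)
# best == (2, 1.0)
```

On this toy trace only a delay of two samples makes the mapping from state to action fully consistent, so the sweep recovers the operator's reaction lag.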