14. behavioural cloning of control skill ivan bratko, tanja urbancic and claude sammut 발표 :...

14. Behavioural Cloning of Control Skill

Ivan Bratko, Tanja Urbancic and Claude Sammut

Presenter: 송창빈

Contents

Introduction

Behavioural Cloning

Experiments

• Pole balancing

• Piloting aircraft

• Driving container cranes

• Production line scheduling

Discussion

Introduction

Controlling a complex dynamic system (a plane or a crane) requires a skilled operator

Behavioural cloning: using standard ML techniques to learn control rules from traces of human performance

Requires

• designing a suitable target concept representation

• choosing good training examples

• interpreting the results of ML

Behavioural Cloning

State: an attribute-value vector of the controlled system

• e.g. in piloting: throttle, flaps, ailerons, etc.

Action: a class value for a learning program

Action = f(State)

A time delay is introduced

Action(Time) = f(State(Time - Delay))

Behaviour trace: a sequence of pairs (State, Action)

In some cases, the behaviour traces are divided into phases

• e.g. a flight: take-off, straight-and-level flight, etc.
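
The delayed mapping Action(Time) = f(State(Time - Delay)) can be sketched as a data-preparation step. This is a minimal illustration, assuming the trace is sampled at a fixed rate so the delay can be expressed as a number of samples; the function name `delayed_pairs` and the toy trace are hypothetical and do not come from the original study.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # attribute-value vector of the controlled system


def delayed_pairs(
    states: Sequence[State],
    actions: Sequence[str],
    delay_steps: int,
) -> List[Tuple[State, str]]:
    """Pair each action with the state observed `delay_steps` samples earlier."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    return [
        (states[t - delay_steps], actions[t])
        for t in range(delay_steps, len(actions))
    ]


# Toy trace (assumed data): one state attribute, one action label per sample.
states = [(0.0,), (0.1,), (0.2,), (0.3,)]
actions = ["none", "left", "left", "right"]

# With a delay of one sample, the action at time t is paired with the state at t-1.
pairs = delayed_pairs(states, actions, delay_steps=1)
```

The resulting (State, Action) pairs are what a standard attribute-value learner would receive as training examples, with Action as the class.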

Pole Balancing

The Problem: balancing a pole on a cart moving on a track of limited length

Choice of Examples: 20 students used a joystick to control the pole-and-cart system simulated on a PC

Time Delay: 400 ms

Clean-up Effect

• By the averaging effect implicit in inductive generalization, inconsistency and moments of inattention were stripped away

• A clone outperforms the human operator in smaller ranges
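
The clean-up effect can be illustrated with a deliberately simple stand-in for an inductive learner: a majority vote per state. This is only a sketch of the averaging idea, not the induction method used in the experiments; the discrete state labels, the action names and `majority_policy` are hypothetical.

```python
from collections import Counter, defaultdict


def majority_policy(pairs):
    """Map each state to the action the operator chose most often in it."""
    votes = defaultdict(Counter)
    for state, action in pairs:
        votes[state][action] += 1
    return {state: counts.most_common(1)[0][0] for state, counts in votes.items()}


# Assumed toy trace: the operator usually pushes towards the lean,
# but one sample records a lapse of attention.
trace = [
    ("leaning_left", "push_left"),
    ("leaning_left", "push_left"),
    ("leaning_left", "push_right"),  # inconsistent action (lapse)
    ("leaning_left", "push_left"),
    ("leaning_right", "push_right"),
    ("leaning_right", "push_right"),
]

policy = majority_policy(trace)
```

The isolated inconsistent action is outvoted, so the resulting policy is more consistent than the trace it was learned from.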

Pole Balancing

Brittleness: widely robust

Transparency of Induced Rules: a small number of readable rules were generated

Learning to Fly

The Problem

• Fly according to a predefined flight plan

• The data are collected from a flight simulation program

– logs of the actions taken by pilots

– the values of the simulator's state variables (pitch, roll, yaw, climb rate, air speed, etc.)

• Segmented into stages: take-off and climb, straight and level flight, turn, descend and line up on runway, land

• Induction programs used: C4.5, CART
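
Segmenting a flight into stages suggests learning one set of rules per stage and dispatching on the current stage. The sketch below shows that structure only. The learner `fit_majority` is a placeholder that always predicts the most frequent action of a stage; it stands in for a decision-tree learner such as C4.5 or CART and is not one. All names and sample values are hypothetical.

```python
from collections import Counter, defaultdict


def split_by_stage(samples):
    """Group (stage, state, action) samples into per-stage (state, action) lists."""
    by_stage = defaultdict(list)
    for stage, state, action in samples:
        by_stage[stage].append((state, action))
    return dict(by_stage)


def fit_majority(pairs):
    """Placeholder learner: ignores the state, predicts the stage's commonest action."""
    commonest = Counter(action for _, action in pairs).most_common(1)[0][0]
    return lambda state: commonest


def fit_per_stage(samples, fit=fit_majority):
    """Learn one controller per flight stage; `fit` can be any pair-based learner."""
    return {stage: fit(pairs) for stage, pairs in split_by_stage(samples).items()}


# Assumed toy samples: (stage, state attributes, action label).
samples = [
    ("take_off", {"air_speed": 40.0}, "throttle_up"),
    ("take_off", {"air_speed": 80.0}, "throttle_up"),
    ("take_off", {"air_speed": 120.0}, "pitch_up"),
    ("land", {"air_speed": 70.0}, "throttle_down"),
]

autopilot = fit_per_stage(samples)
action = autopilot["take_off"]({"air_speed": 60.0})
```

Passing a real induction routine as `fit` would keep the same per-stage dispatch while replacing the placeholder predictions.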

Learning to Fly

Choice of Examples

• Mixing data from different pilots tends to confuse the induction

• Auto-pilots were constructed from traces of individual pilots

Time Delay: the exact value was not critical as long as some delay was present

Brittleness

• The original clones were very brittle with respect to changes

• Trained in a noisy environment, clones were more robust

Transparency of Induced Rules

• Generated rules have been large and opaque

• Need high-level state variables

Container Cranes

The Problem

• Minimizing the time to transport a container

• System variables: trolley position and velocity; rope length and its velocity; rope inclination angle and its velocity

• Induction programs used: RETIS, M5

Choice of Examples: traces with the same control style

Time Delay: no clear indication of appropriate delay

Brittleness

• sensitive to the choice of particular traces and to the learning program settings

• not robust with respect to changes (e.g. different position)

Container Cranes

Clean-up Effect

Production Line Scheduling

The Problem

• To determine an optimum allocation of labor for a period of time on a production line

• Between steps, buffers store the output of one step as input to the next

• The attribute is the buffer level

Choice of Examples: individual human schedulers

Time Delay: not a significant factor

Clean-up Effect: a clone stays closer to the specified buffer level than the human

Brittleness: quite robust

Transparency of Induced Rules: understandable by experts

Discussion

Choice of example traces for learning

• use training examples from the same subject only

Time delay between state and action

• try zero delay first, then increase the delay gradually

Designing the representation (choosing attributes)

• useful to take into account the operator's verbal description of the skill

The clones lack the conceptual structure

• goals and subgoals, phases and causality
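
The advice on time delay (start at zero and increase gradually) can be sketched as a small search loop. The slides do not say how candidate delays were compared, so the scoring rule here is an assumption: the fraction of training pairs that agree with the majority action for their state. The names `consistency` and `pick_delay` and the toy trace are hypothetical.

```python
from collections import Counter, defaultdict


def consistency(states, actions, delay):
    """Fraction of delayed (state, action) pairs matching the per-state majority action."""
    votes = defaultdict(Counter)
    for t in range(delay, len(actions)):
        votes[states[t - delay]][actions[t]] += 1
    total = sum(sum(c.values()) for c in votes.values())
    agree = sum(max(c.values()) for c in votes.values())
    return agree / total if total else 0.0


def pick_delay(states, actions, max_delay):
    """Start with zero delay and keep a longer delay only if it scores strictly better."""
    best_delay, best_score = 0, consistency(states, actions, 0)
    for delay in range(1, max_delay + 1):
        score = consistency(states, actions, delay)
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay, best_score


# Assumed toy trace: the operator reacts one sample late to the observed state.
states = ["L", "R", "L", "L", "R", "R", "L", "R"]
respond = {"L": "push_L", "R": "push_R"}
actions = ["push_L"] + [respond[s] for s in states[:-1]]

best = pick_delay(states, actions, max_delay=3)
```

On this toy trace the search settles on a delay of one sample. The score is computed on the training pairs themselves and longer delays leave fewer pairs, so in practice a held-out trace or the performance of the resulting clone would be a more reliable criterion.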
