
Learning from Demonstration: Repetitive Movements for Autonomous Service Robotics

Holger [email protected]

Alin Albu-Schä[email protected]

Patrick van der [email protected]

German Aerospace Center, Institute of Robotics and Mechatronics
P.O. Box 1116, D-82230 Wessling, Germany

Abstract: This paper presents a method for learning and generating rhythmic movement patterns based on a simple central oscillator. It can be used to generate cyclic movements for a robot system which has to solve complex tasks. The system is laid out in such a way that multiple motion dimensions, or degrees of freedom of the robot, are represented independently of each other; therefore, an extension to higher-dimensional problems is easily possible. Guiding the robot by holding its end-effector, the user teaches simple movement primitives forming the basis for a more complex task. Each movement primitive is represented in the system using an oscillator combined with a learned nonlinear mapping. These primitives are then optimally combined into a complete solution to the posed problem. Said optimality is obtained using simulated annealing with the A∗ global search algorithm. Our approach is demonstrated on the problem of wiping a table, but can be used for many typical problems in service and household robotics.

I. INTRODUCTION

While most present-day applications of robots are still restricted to industrial environments, the field of service robotics has a huge potential for growth. The development of lightweight arms and hands as well as the progress in robot navigation systems (mobile platforms) on one side, and the development of advanced humanoid robots on the other, provide the hardware premises for a breakthrough in this field. Increased sensor feedback capabilities (force-torque, visual, tactile, etc.) and the growth of computing power make it possible to react more flexibly to changing environments. But providing the robots with an appropriate degree of autonomy which enables them to use these basic reactive features, as well as providing a simple and efficient human user interface, are still very challenging research topics. In particular, the robot programming task, which is carried out by an often technically unskilled user, has to be as simple as possible. In this context, programming by demonstration is a widely accepted paradigm. The usual approach, in which the human demonstrates a complete task and the recorded trajectory is followed by the robot, would certainly lack the flexibility needed to survive in common household environments. These environments have a continuously changing, complex structure.

A possible solution is the use of small motion primitives which the user teaches to the robot. Equipped with the appropriate algorithms, the robot uses these primitives in combination with sensory input to accomplish a family of complex tasks. In this paper, we demonstrate such algorithms which may be generally useful in this context. We demonstrate our approach in an application which consists of wiping surfaces which are previously specified by a user or detected by a vision system. Only a “wiping style” has to be taught by the user, in the form of a primitive movement pattern.

The solution presented in this paper is focused on the following topics:

• teaching and learning phase of the primitive movement using a torque-controlled manipulator;

• providing a trajectory generator (as an alternative to usual trajectory interpolation), which can combine these primitive movements into a smooth trajectory and which reacts adequately to external disturbances. Such disturbances could be obstacles or humans which may interact with the robot during execution;

• developing an algorithm which allows automatic movement generation from the taught primitives;

• execution of the entire task in a realistic environment, using the same manipulator, in order to validate the approach and test the required robustness and reactivity.

Most of the previous work on this topic tries to solve only small portions of the overall problem. In [3] the focus is on the combination of primitive tasks, yet no learning or autonomous problem-solving is available. Work on learning focuses mainly on learning a subtask, on how to decide which part of the demonstrated information is relevant, or on how to construct a neural oscillator [4], [6], [14], but no autonomous task-solving has been applied. References [8]–[10] present a more complete solution for learning a point-to-point or repetitive movement and generating it in a very flexible way. However, primitive movements are not combined in order to solve complex tasks. In [11] a similar system (with neural oscillators) is used to generate biped walking patterns. Another problem of programming by demonstration is presented in [5]: the demonstrated task has to be analysed for its relevant actions and segmented accordingly to ensure a general knowledge of how to solve such a task. Yet this is a quite different problem from the one presented in this paper, where the intention is to combine rhythmic movements into a task more complex than pick-and-place operations. In [2] movement primitives are used to navigate a marble through a marble maze. The focus lies on learning the combination of movement primitives (such as “move marble away from wall”) to successfully navigate the marble to its goal. No unforeseen external interaction occurs.

0-7803-8463-6/04/$20.00 ©2004 IEEE

The first part of our paper (section II), which handles learning the movement primitives and the trajectory generation, is based on the results from [8]–[10]. There, a novel approach for learning and generating rhythmic movement patterns is presented. In contrast to this approach, however, we have reduced the complexity while maintaining its flexibility. Learning is achieved by using a sum of weighted Gauss kernels which maps a centralised oscillator signal to the desired movement. To increase the robustness of execution, the difference between the desired and the actual robot position is used to slow down the oscillator and thereby delay further generation of the trajectory. Thus the system is able to cope with external disturbances and slows down if the robot cannot follow the trajectory with the desired velocity.

The second part of the paper (section III) presents a method to autonomously combine the learnt movement primitives in order to solve the given task. An algorithm to cover the entire surface with primitive movements is presented in section III-A, while finding an optimal execution order of these movements is addressed in section III-B. Finally, experimental results are presented and discussed.

II. LEARNING AND GENERATING RHYTHMIC MOVEMENTS

In this section we present the structure of the trajectory generator for the entire movement, as well as how the movement primitives are represented in the system.

The trajectory generator should reproduce the recorded trajectory with the desired degree of flexibility and be able to combine these primitives into a smooth movement. Therefore it should have the following features:

• it should be possible to stop the generation at any point of the movement;

• the learnt movement should be smooth in order to appear more natural;

• to enhance smoothness in transitions from one primitive movement to another, a dynamic interpolator to a moving goal is needed. This is also useful if the robot is displaced from the trajectory by external forces (e.g., by a human pushing it away) and has to return smoothly to the path after the disturbance disappears;

• the speed of the movement must be adjustable (e.g., to cope with dynamic constraints of the robot).

Most of these demands are met by the system developed in [8]–[10]. There, an oscillator that can be slowed down by an input signal is used as a control policy. This control policy allows the generation of the desired movement in a time-invariant way, and also allows a rudimentary reaction to disturbances. The oscillator signal is used as a driving signal for the weighted sum of Gauss kernels in order to generate the basic movement, which in turn can be modified (e.g., scaled, translated, or rotated) without disturbing the stability of the whole system. To ensure a smooth trajectory, a first-order filter is used. The difference between the desired and the actual robot position slows down the oscillator, and therefore the generated movement is slowed down in turn.

However, the presented system has some disadvantages. In the following we describe the system taken from [9] and introduce our modifications to overcome these disadvantages.

A. The oscillator

To determine the progress of the learnt movement primitive, [9] suggests the following nonlinear oscillator:

$$\dot{z} = -\frac{\mu}{E_0}(E - E_0)\,z - k^2 u, \tag{1}$$

$$\dot{u} = z\left[1 + \alpha_u(\tilde{y} - y)^2\right]^{-1}, \tag{2}$$

with $E(u, z) = \frac{z^2}{2} + \frac{k^2 u^2}{2}$ as the actual energy of the oscillator, $E_0$ the desired energy, $\mu$ the desired convergence rate, and $k$ the desired frequency. $\tilde{y}$ denotes the measured and $y$ the ordered robot position; $z, u \in \mathbb{R}$ are state variables. $\alpha_u$ is a scaling factor for the position error feedback. High values of $\alpha_u$ or high position errors $(\tilde{y} - y)$ draw $\dot{u}$ towards zero and therefore stop the oscillator. For $\alpha_u = 0$, Eqs. (1)–(2) reduce to a second-order system with the variable damping $d = \mu(E/E_0 - 1)$. It follows that $d < 0$ inside the limit cycle and $d > 0$ outside, so that the circle $E = E_0$ becomes an attractor. Fig. 1 illustrates the convergence to the limit cycle of the proposed oscillator.

[Figure 1: phase plot of the oscillator with axes $u \cdot k$ and $z$; the regions $d > 0$ and $d < 0$ are marked.]

Fig. 1. The oscillator proposed in [9]. It can be clearly seen that the circular limit cycle is an attractor. The plot is drawn for $\tilde{y} - y = 0$, $E_0 = 0.5$, $k = 2\pi$, $\mu = 10$, and $\alpha_u = 200$.
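The convergence to the limit cycle can be checked numerically. The following is a minimal sketch, not the implementation of [9]: it integrates Eqs. (1)–(2) with explicit Euler steps, using the parameter values from the caption of Fig. 1. The step size, the number of steps, the start states, and all function names are assumptions chosen for illustration.

```python
import math


def oscillator_energy(z, u, k=2 * math.pi):
    """Energy E(u, z) = z^2/2 + k^2 u^2/2 of the oscillator."""
    return 0.5 * z * z + 0.5 * k * k * u * u


def simulate_oscillator(z0, u0, mu=10.0, e0=0.5, k=2 * math.pi,
                        alpha_u=200.0, pos_error=0.0,
                        dt=1e-4, steps=50_000):
    """Integrate Eqs. (1)-(2) with explicit Euler steps (assumed scheme).

    pos_error stands for the difference between measured and ordered
    position; it is held constant here for simplicity.
    Returns the final state (z, u).
    """
    z, u = z0, u0
    for _ in range(steps):
        e = oscillator_energy(z, u, k)
        dz = -mu * (e - e0) / e0 * z - k * k * u      # Eq. (1)
        du = z / (1.0 + alpha_u * pos_error ** 2)     # Eq. (2)
        z += dt * dz
        u += dt * du
    return z, u


def phase(z, u, k=2 * math.pi):
    """Driving signal of Eq. (3)."""
    return math.atan2(z, k * u)


# One start state inside and one outside the limit cycle E = E0.
z_in, u_in = simulate_oscillator(0.1, 0.0)
z_out, u_out = simulate_oscillator(1.5, 0.2)
print(oscillator_energy(z_in, u_in), oscillator_energy(z_out, u_out))
```

Both runs should end with an energy close to $E_0 = 0.5$, which is the attractor behaviour described above; a finer step size or a higher-order integrator would reduce the small residual drift of the Euler scheme.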

The driving signal $\phi$ for the learning function in section II-B is generated by using atan2:

$$\phi = \operatorname{atan2}(z, ku). \tag{3}$$

This converts the oscillator states $u$ and $z$ into a sawtooth-shaped signal. However, this oscillator cannot be stopped at an arbitrary point of the interval covered by the driving signal. Since only $\dot{u}$ is driven to zero by the position error, the oscillator will continue to move until $\dot{z}$ is also zero. This means that up to half a rotation may be performed before $\dot{z}$ reaches 0 and $\phi$ ceases to change.

To enable an immediate stop at any point of the movement, we introduce a simpler oscillator that directly generates a sawtooth-shaped signal:

$$\phi'_i = \phi_i + \frac{m}{1 + (\alpha_\phi |f - \tilde{y}|)^{k_\phi}}, \tag{4}$$

$$\phi_{i+1} = \begin{cases} \phi'_i, & \text{if } \phi'_i < 1, \\ \phi'_i - 1, & \text{else,} \end{cases} \tag{5}$$

where $f$ is the desired¹ and $\tilde{y}$ the measured position. $\alpha_\phi$ and $k_\phi$ are fine-tuning constants; $\alpha_\phi$ defines a zone of low reaction and $k_\phi > 1$ the order of the braking effect. The slope of the oscillator signal lies in the range $[0, m]$: it equals $m$ for zero position error and approaches 0 as the error grows. By design, this simpler oscillator has no problems with slowing down; Fig. 2 shows a direct comparison of the braking performance of both oscillators.

[Figure 2: driving signal $\phi$ (0 to 1) over time (−1 to 1) for the two oscillators a and b.]

Fig. 2. Braking performance. Both oscillators (a is the original one as proposed in [9] and b is ours) receive a complete stopping signal ($|f - \tilde{y}| = \infty$) at time 0. As can be clearly seen, the signal of b halts immediately, while the original oscillator a may need some time to react.
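The update of Eqs. (4)–(5) is small enough to state directly in code. The sketch below is an illustration under assumed constants: the values of `m`, `alpha_phi`, and `k_phi` and the function name `step_phase` are not taken from the paper.

```python
def step_phase(phi, error, m=0.01, alpha_phi=10.0, k_phi=2.0):
    """One update of the sawtooth oscillator, Eqs. (4)-(5).

    error is the difference between desired and measured position.
    Returns (new_phi, wrapped), where wrapped is True when the signal
    takes its backward step from 1 to 0.
    """
    phi_next = phi + m / (1.0 + (alpha_phi * abs(error)) ** k_phi)  # Eq. (4)
    if phi_next < 1.0:                                              # Eq. (5)
        return phi_next, False
    return phi_next - 1.0, True


# Undisturbed run: phi grows by m per step.
phi = 0.0
for _ in range(50):
    phi, _wrapped = step_phase(phi, 0.0)

# Complete stopping signal: phi does not move at all.
held, _wrapped = step_phase(phi, float("inf"))
print(phi, held)
```

With an infinite position error the increment is exactly zero, so the signal halts within a single step, which is the behaviour of oscillator b in Fig. 2.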

Note that the convergence property of the oscillator of Eqs. (1) and (2), which is not present in Eqs. (4) and (5), is not truly exploited in [9]. The smooth convergence to the desired trajectory is obtained instead by a linear filter, similar to the one described in section II-C.

B. Learning a primitive movement

We approximate the movement by a sum of Gauss kernels for each degree of freedom independently:

$$\tilde{f} = \sum_{i=1}^{N} w_i \exp\left[-\frac{1}{2}(c_i - h_i \phi)^2\right]. \tag{6}$$

The parameters $w_i$, $c_i$, and $h_i$ have to be determined to optimally fit the movement. Many methods exist; we have chosen the Fletcher-Reeves conjugate gradient optimisation method, since it yields excellent results after only a few iterations.

The parameterisation of the primitive movements in $w_i$, $c_i$, and $h_i$ can, at a later stage, be used to classify such primitives [10].
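To make the fitting step concrete, the following sketch fits Eq. (6) to a sampled movement with a Fletcher-Reeves conjugate gradient iteration. It is a simplification and not the authors' implementation: only the weights $w_i$ are optimised, while the centres $c_i$ and widths $h_i$ are fixed on a regular grid, so the objective is quadratic and an exact line search can be used. The target signal, the number of kernels, and all names are assumptions.

```python
import math


def kernel_row(phi, centres, widths):
    """Kernel activations exp(-0.5 * (c_i - h_i * phi)^2) of Eq. (6)."""
    return [math.exp(-0.5 * (c - h * phi) ** 2)
            for c, h in zip(centres, widths)]


def evaluate(phi, weights, centres, widths):
    """Weighted kernel sum of Eq. (6) for one degree of freedom."""
    return sum(w * a for w, a in zip(weights, kernel_row(phi, centres, widths)))


def fit_weights(phis, targets, centres, widths, iterations=100):
    """Fletcher-Reeves CG on 0.5 * sum_j (f(phi_j) - target_j)^2, weights only."""
    rows = [kernel_row(p, centres, widths) for p in phis]
    n = len(centres)

    def gradient(w):
        residuals = [sum(wi * a for wi, a in zip(w, row)) - t
                     for row, t in zip(rows, targets)]
        return [sum(r * row[i] for r, row in zip(residuals, rows))
                for i in range(n)]

    w = [0.0] * n
    g = gradient(w)
    d = [-gi for gi in g]
    for _ in range(iterations):
        gg = sum(gi * gi for gi in g)
        if gg < 1e-20:
            break
        # Exact line search along d (the objective is quadratic in w).
        along = [sum(di * a for di, a in zip(d, row)) for row in rows]
        curvature = sum(v * v for v in along)
        if curvature == 0.0:
            break
        step = -sum(gi * di for gi, di in zip(g, d)) / curvature
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = gradient(w)
        beta = sum(gi * gi for gi in g_new) / gg   # Fletcher-Reeves factor
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


# Hypothetical demonstration: one period of a sine as the taught movement.
N = 11
widths = [15.0] * N
centres = [15.0 * i / (N - 1) for i in range(N)]  # kernel i peaks at i/(N-1)
phis = [j / 200 for j in range(200)]
targets = [math.sin(2 * math.pi * p) for p in phis]
weights = fit_weights(phis, targets, centres, widths)
rms = math.sqrt(sum((evaluate(p, weights, centres, widths) - t) ** 2
                    for p, t in zip(phis, targets)) / len(phis))
print(rms)
```

Optimising $c_i$ and $h_i$ as well, as described in the text, turns this into a nonlinear problem in which the exact line search has to be replaced by a numerical one; the Fletcher-Reeves update of the search direction stays the same.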

¹ We have chosen the desired position $f$ instead of the ordered position $y$ to keep the trajectory as close as possible to the planned one.

C. Finalisation

The function $\tilde{f}$ computed in Eq. (6) describes the generated position, which still has to be transformed (i.e., translated, rotated, and scaled):

$$f = H \tilde{f}, \tag{7}$$

with $H$ being a homogeneous transformation matrix. Therefore $f$ denotes the desired position.
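For the planar wiping case, Eq. (7) can be illustrated with a 3×3 homogeneous matrix. This is a sketch under assumptions: the composition order (scale, then rotate, then translate) and the helper names are chosen for illustration and are not specified in the paper.

```python
import math


def make_transform(tx, ty, angle, scale):
    """3x3 homogeneous matrix: scale, rotate by angle (rad), then translate."""
    c = scale * math.cos(angle)
    s = scale * math.sin(angle)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def apply_transform(h, point):
    """Eq. (7): map a generated planar point (x, y) to the desired position."""
    x, y = point
    return (h[0][0] * x + h[0][1] * y + h[0][2],
            h[1][0] * x + h[1][1] * y + h[1][2])


# Scale by 2, rotate by 90 degrees, shift by (1, 2).
H = make_transform(1.0, 2.0, math.pi / 2, 2.0)
print(apply_transform(H, (1.0, 0.0)))
```

The point (1, 0) is scaled to (2, 0), rotated to (0, 2), and shifted to approximately (1, 4).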

To ensure a smooth trajectory, a second-order filter is applied:

$$\ddot{y} + \alpha_y \dot{y} + \beta_y (y - f) = 0, \tag{8}$$

where

$$\alpha_y = \frac{d}{m}, \quad \beta_y = \frac{k}{m}, \tag{9}$$

with $d$ as damping, $m$ as mass, and $k$ as spring constant. By using a filter instead of a separate trajectory interpolator (e.g., Bézier splines), our system is able to move towards a moving goal, which would be difficult to achieve with a separate interpolator.
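The behaviour of the filter of Eqs. (8)–(9) towards a fixed and a moving goal can be sketched in a few lines. The values of `d`, `m`, `k`, the time step, and the semi-implicit Euler integration are assumptions made for this example.

```python
def filter_step(y, y_dot, f, d=20.0, m=1.0, k=100.0, dt=0.001):
    """One semi-implicit Euler step of Eq. (8).

    y is the ordered position, f the desired position.
    """
    alpha_y, beta_y = d / m, k / m                 # Eq. (9)
    y_ddot = -alpha_y * y_dot - beta_y * (y - f)   # Eq. (8) solved for y''
    y_dot += dt * y_ddot
    y += dt * y_dot
    return y, y_dot


# Fixed goal: the ordered position converges smoothly to f = 1.
y_fixed, v_fixed = 0.0, 0.0
for _ in range(3000):
    y_fixed, v_fixed = filter_step(y_fixed, v_fixed, 1.0)

# Moving goal (ramp with 0.5 units per second): the output follows with a lag.
y_ramp, v_ramp, goal = 0.0, 0.0, 0.0
for i in range(3000):
    goal = 0.5 * i * 0.001
    y_ramp, v_ramp = filter_step(y_ramp, v_ramp, goal)
print(y_fixed, goal - y_ramp)
```

For a goal moving with constant velocity $v$, the steady-state lag of Eq. (8) is $\alpha_y v / \beta_y$, which is about 0.1 for the values assumed here.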

An additional feature of the presented system, which generates rhythmic movements, is the fact that it works for an arbitrary number of degrees of freedom (DOF). The oscillator acts as a centralised control policy to synchronise the different DOFs, while every DOF is handled separately thereafter, by having one approximator (Eq. (6)) per DOF. Our table wiping scenario requires only 2 DOFs, as the region is planar and we are using Cartesian impedance control for the robot.

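Fig. 3 shows the structure for multiple DOFs.

[Figure 3: block diagram in which one oscillator signal $\phi$ drives the approximators $\tilde{f}_1, \ldots, \tilde{f}_n$, one per DOF, with the positions fed back to the oscillator.]

Fig. 3. How to extend the presented system to multiple DOFs.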

Our system, adapted from [8]–[10], demonstrates exceptional robustness. The generated movement can be disturbed in any way; the oscillator then stops for the duration of the disturbance, without losing significant parts of the desired movement. Even if the trajectory cannot be followed by the physical robot (e.g., the robot is too slow or there are obstacles in its way), our system adapts quickly to this constraint and changes the path generation accordingly.

III. BUILDING A COMPLEX TASK FROM LEARNT MOVEMENT PRIMITIVES

In the previous section we have introduced a system to learn and represent primitive movements which are taught by demonstration. We will now show how we can combine such primitives to autonomously construct a complex movement, e.g., in order to wipe a complete table. Contrary to approaches where the whole task is taught, our two-stage approach allows us to solve various different tasks without requiring further operator interaction. For instance, we can wipe any area of the table in which dirt is detected by a camera, or clean windows or arbitrary surfaces.

The presented system has been implemented and tested on the Light-Weight-Robot II (LWR-II), a 7-DOF robot arm developed at the Institute of Robotics and Mechatronics of the German Aerospace Centre (DLR). In this particular approach, the LWR-II runs with Cartesian impedance control [1], which allows us to interact with the robot arm by grasping and holding it during motion. In the teaching phase, the Cartesian stiffness for translational directions is set to zero, while the orientations are kept stiff. During the execution phase, all Cartesian stiffness values are high, except for the direction normal to the surface. This value is zero and the robot is commanded to exert a constant desired force in this normal direction. Since the impedance controller is implemented based on measurements of joint torque sensors, every collision along the structure of the robot, not only at the tip, is detected and slows down or even stops the movement.

In order to solve the given task, we define the following sub-tasks to wipe a table from recorded motion primitives:

A. fill the wiping region with primitives, according to several optimality constraints;

B. compute an optimal path;

C. execute the plan.

A. Filling the wiping region

In order to wipe the whole table, the primitive wipe operation must be distributed over the area with as little overlap as possible, while not leaving out any part of the wipeable region. This planning problem has some similarities to cutting or packing problems (e.g., [12]). The fundamental difference, however, is that in our case minimal overlap is allowed (and even required), whereas leaving out parts is strongly penalized.

For reasons of simplicity we assume that the whole area enclosed by the convex hull around the primitive wipe operation is also covered by that operation. Thus the covered region is fully described by this convex polygon. The whole table can then be considered to be wiped when the primitive wipe polygons are optimally distributed on the table.

Finding the optimal distribution of primitive wipe polygons on the table is an NP-hard problem, meaning that the effort required to find an exact solution is expected to grow non-polynomially with the number of polygons. To find a solution in a reasonable amount of time, we therefore use the following heuristic. We rasterise the wipeable region into, e.g., 100×100 segments or pixels. The primitive wiping polygons are drawn on this region using standard drawing algorithms (in this case Bresenham line drawing and scan-line fill), while $p_{i,j}$ counts how often the segment at location $(i, j)$ is drawn into. We can thus define the following error:

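$$\mathrm{err}_{i,j} = \begin{cases} d_1, & \text{if } p_{i,j} = 0 \text{ and } (i,j) \in R, \\ d_2\,(p_{i,j} - 1), & \text{if } p_{i,j} > 0 \text{ and } (i,j) \in R, \\ d_3 \cdot p_{i,j}, & \text{if } (i,j) \in \bar{R}, \end{cases} \tag{10}$$

with $R$ denoting the inside of the wiping region and $\bar{R}$ its outside. We have empirically chosen the constants $d_1 = 20$, $d_2 = 1$, and $d_3 = 5$. The sum over all these per-pixel error values defines the following global error function:

$$\mathrm{ERR} = \sum_{i,j} \mathrm{err}_{i,j}. \tag{11}$$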

A perfectly filled region has the error value $\mathrm{ERR} = 0$, and the value of ERR increases with more overlap and unfilled areas. The cost to compute ERR is $O(m\,n^2)$, with $m$ being the number of wiping movements and $n$ the number of raster segments in each of the two directions.
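The error function of Eqs. (10)–(11) can be sketched as follows. Note the swapped-in technique: instead of drawing the polygons with Bresenham lines and a scan-line fill, this example tests every pixel centre against every convex polygon, which keeps the same $O(m\,n^2)$ cost in the number of polygons and raster size. The unit-square coordinates, the region predicate `in_region`, and the function names are assumptions.

```python
def inside_convex(polygon, x, y):
    """True if (x, y) is inside or on a convex polygon given counter-clockwise."""
    count = len(polygon)
    for i in range(count):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % count]
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True


def coverage_error(polygons, in_region, n=100, d1=20, d2=1, d3=5):
    """ERR of Eq. (11) on an n x n raster of the unit square.

    in_region(x, y) tells whether a pixel centre belongs to the wiping
    region R; the constants d1, d2, d3 are those given in the text.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n
            p = sum(inside_convex(poly, x, y) for poly in polygons)  # p_ij
            if not in_region(x, y):
                total += d3 * p          # wiping outside the region
            elif p == 0:
                total += d1              # pixel left unwiped
            else:
                total += d2 * (p - 1)    # overlap
    return total


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
whole_table = lambda x, y: True
left_half = lambda x, y: x < 0.5

print(coverage_error([unit_square], whole_table, n=20))               # perfect fill
print(coverage_error([], whole_table, n=20))                          # nothing wiped
print(coverage_error([unit_square, unit_square], whole_table, n=20))  # full overlap
print(coverage_error([unit_square], left_half, n=20))                 # wiping outside R
```

On a 20×20 raster the four cases give 0, 8000 (400 unwiped pixels times $d_1$), 400 (400 doubly wiped pixels times $d_2$), and 1000 (200 outside pixels times $d_3$).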

Every primitive wipe polygon is determined by its position $(x, y)$ in the wipe region, its rotation angle, and its scaling factor (which was typically chosen between 0.9 and 1.3). A further free parameter of the system is the number $N$ of polygons used.

We use simulated annealing with a linear cooling scheme to find an optimal solution within the $4N + 1$ parameters. The quality of the resultant solution depends on the fineness of rasterisation as well as the cooling scheme; the slower the system is cooled, the lower the energy of the found solution is expected to be. Both factors are limited by the available computing time; a wiping robot taking longer than a few minutes to compute its solution is hardly useful.
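The annealing loop itself can be sketched on a reduced version of the layout problem. This is an illustration under strong simplifications, not the planner used in the experiments: the primitives are axis-aligned squares of fixed size on an integer grid, so each one has only two parameters instead of four, and their number is fixed. The start temperature, step count, move set, and all names are assumptions.

```python
import math
import random


def layout_error(squares, n, side, d1=20, d2=1, d3=5):
    """ERR of Eqs. (10)-(11) for axis-aligned squares; R is the n x n grid."""
    counts = {}
    for sx, sy in squares:
        for i in range(sx, sx + side):
            for j in range(sy, sy + side):
                counts[(i, j)] = counts.get((i, j), 0) + 1
    err, covered = 0, 0
    for (i, j), p in counts.items():
        if 0 <= i < n and 0 <= j < n:
            covered += 1
            err += d2 * (p - 1)          # overlap inside the region
        else:
            err += d3 * p                # wiping outside the region
    return err + d1 * (n * n - covered)  # unwiped pixels


def anneal(n=12, side=4, count=9, steps=4000, t_start=20.0, seed=0):
    """Simulated annealing with a linear cooling scheme.

    Returns (best_layout, best_error, initial_error).
    """
    rng = random.Random(seed)
    state = [(rng.randrange(-1, n - side + 2), rng.randrange(-1, n - side + 2))
             for _ in range(count)]
    energy = layout_error(state, n, side)
    initial_energy = energy
    best, best_energy = list(state), energy
    for step in range(steps):
        temperature = t_start * (1.0 - step / steps)   # linear cooling, stays > 0
        candidate = list(state)
        idx = rng.randrange(count)
        x, y = candidate[idx]
        candidate[idx] = (x + rng.choice((-1, 0, 1)), y + rng.choice((-1, 0, 1)))
        cand_energy = layout_error(candidate, n, side)
        delta = cand_energy - energy
        # Always accept improvements; accept deteriorations with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, energy = candidate, cand_energy
            if energy < best_energy:
                best, best_energy = list(state), energy
    return best, best_energy, initial_energy


best_layout, best_error, initial_error = anneal()
print(initial_error, best_error)
```

Keeping the best layout seen so far guarantees that the returned error is never worse than that of the random start layout; how close it comes to zero depends on the cooling speed, as noted above.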

B. Optimising the movement plan

To ensure an acceptable wiping time and a natural-looking wiping movement, we use a heuristic to search for a short path over the centres of the wiping movements. This is similar to the travelling salesman problem, and many viable algorithms exist. Since this particular task is not very complex and the number of polygons in a wipe region is limited, we decided to use the A∗ algorithm.

A∗ (e.g., [13]) is a best-first search, a modification of breadth-first search (BFS), that uses a heuristic to estimate the distance from the current state to the goal state. Only the path with the lowest estimated total cost is expanded at any time. As long as the heuristic underestimates the real distances, A∗ is guaranteed to find the shortest path.

For the heuristic $h$ we choose:

$$h = \gamma\,|O|, \tag{12}$$

with $|O|$ as the number of open nodes and $\gamma$ as a distance estimator that has to be set accordingly. One problem with A∗ is that it requires a large amount of memory (up to $O(N!)$ for a badly chosen $\gamma \approx 0$, where $N$ denotes the number of nodes). Overestimating the distance (and hence turning A∗ into an A algorithm) still results in a sufficiently short path, while memory usage is cut down significantly.
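A compact sketch of this search is given below. It assumes that the path starts at a fixed centre and does not return to it, that $|O|$ is the number of centres not yet visited, and that a search state is the pair of current centre and visited set; these modelling choices and all names are assumptions made for illustration.

```python
import heapq
import itertools
import math


def path_length(points, order):
    """Length of the open path visiting points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def a_star_order(points, gamma, start=0):
    """A* over (current node, visited set) with h = gamma * |O|, Eq. (12)."""
    n = len(points)
    all_nodes = frozenset(range(n))
    tie = itertools.count()                  # tie-breaker for equal priorities
    first = (start, frozenset([start]))
    frontier = [(gamma * (n - 1), next(tie), 0.0, first, [start])]
    best_cost = {first: 0.0}
    while frontier:
        _, _, cost, (node, visited), order = heapq.heappop(frontier)
        if visited == all_nodes:
            return order, cost
        if cost > best_cost.get((node, visited), math.inf):
            continue                         # stale queue entry
        for nxt in all_nodes - visited:
            new_cost = cost + math.dist(points[node], points[nxt])
            state = (nxt, visited | {nxt})
            if new_cost < best_cost.get(state, math.inf):
                best_cost[state] = new_cost
                open_count = n - len(state[1])   # |O|
                heapq.heappush(frontier, (new_cost + gamma * open_count,
                                          next(tie), new_cost, state,
                                          order + [nxt]))
    return [], math.inf


# Hypothetical centres of six wiping primitives.
centres = [(0, 0), (2, 1), (1, 3), (4, 0), (3, 3), (0, 2)]
shortest_edge = min(math.dist(p, q) for p, q in itertools.combinations(centres, 2))

exact_order, exact_cost = a_star_order(centres, shortest_edge)       # underestimating
quick_order, quick_cost = a_star_order(centres, 10 * shortest_edge)  # overestimating
print(exact_order, exact_cost)
print(quick_order, quick_cost)
```

Setting $\gamma$ to the shortest distance between any two centres never overestimates the remaining path, so the first result is a shortest path; the larger $\gamma$ of the second call trades this guarantee for a search that expands fewer states.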

Future work will include an optimisation over the start and end points of the movement primitives; perhaps even the directions at the start and end points will be included in the evaluation, to achieve an even more natural impression.

Fig. 4. Example of a short path over the centres of the primitive wiping movements (displayed by their convex hulls).

[Figure 5, three panels. Top: learned and original movement primitive. Middle: desired position, ordered position, measured position, and wiping region, plotted as x in mm over y in mm. Bottom: braking factor (0 to 1) over the axis "time*0.06s".]

Fig. 5. Recorded movements (middle) from an example run with roundish movements (top). Especially interesting is the braking factor (see Eq. (4)), as the spikes indicate the switching to a new transformation matrix $H$. It is evident that the roundish movement primitive can be easily followed by the robot.

C. Executing the movement plan

The filling (see section III-A) and the sequencing (see section III-B) together define the wiping plan which, in turn, is executed by the system described above.

The execution of the movement plan is straightforward. Since the repetitive primitive movement is generated by the system described in section II, only a new movement transformation matrix $H$ has to be used in Eq. (7) whenever the oscillator (Eqs. (4) and (5)) takes its backward step. Smoothness of the trajectory is still guaranteed by the second-order filter. Additionally, the maximal velocity is limited. The positional error term $(\alpha_\phi |f - \tilde{y}|)^{k_\phi}$ (see Eq. (4)) stops the oscillator whenever external influences keep the robot from following the trajectory, so that no significant part of the next movement primitive is lost. It can be clearly seen from Fig. 5 that the braking factor slows down the trajectory generation whenever a new transformation matrix $H$ is applied.

[Figure 6, three panels. Top: learned and original movement primitive. Middle: desired position, ordered position, measured position, and wiping region, plotted as x in mm over y in mm. Bottom: braking factor (0 to 1) over the axis "time*0.06s".]

Fig. 6. Recorded movements (middle) from an example run with zigzag movements (top). In contrast to the roundish movement (see Fig. 5), the braking factor indicates that the impedance-controlled robot has considerable difficulty following the zigzag movement at the given speed. Yet the wiping plan is executed satisfactorily.
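The switching rule can be sketched as a small loop around the oscillator of Eqs. (4)–(5). The increment `m`, the constants, the list-based plan, and the function name are assumptions for illustration; the kernel evaluation and the filter are left out to keep the example short.

```python
def execute_plan(transforms, position_errors, m=0.0625, alpha_phi=10.0, k_phi=2.0):
    """Advance to the next transformation matrix on each backward step.

    transforms is the ordered wiping plan (one entry per primitive) and
    position_errors the sequence of position errors per time step.
    Returns the index of the active transform at every time step.
    """
    phi, active, history = 0.0, 0, []
    for error in position_errors:
        phi += m / (1.0 + (alpha_phi * abs(error)) ** k_phi)   # Eq. (4)
        if phi >= 1.0:                                         # Eq. (5)
            phi -= 1.0
            active = min(active + 1, len(transforms) - 1)      # keep last primitive
        history.append(active)
    return history


plan = ["H0", "H1", "H2"]   # placeholders for transformation matrices

# Undisturbed: one primitive takes 16 steps with m = 1/16.
undisturbed = execute_plan(plan, [0.0] * 40)

# Robot held in place for 100 steps: the switch is delayed, not skipped.
disturbed = execute_plan(plan, [0.0] * 8 + [float("inf")] * 100 + [0.0] * 8)
print(undisturbed[15], undisturbed[-1], disturbed[-2], disturbed[-1])
```

In the disturbed run the oscillator stands still while the error is infinite, so the switch to the second primitive happens only after the remaining eight undisturbed steps have been completed.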

The impedance-controlled robot has no problem following the roundish movement, in contrast to the zigzag movement shown in Fig. 6. Although the oscillator is almost always slowed down there, the actual trajectory of the robot is still similar to the desired one, so the wiping action is accurately executed.

To demonstrate a momentary external disturbance, the LWR-II was simply gripped and moved away from its ordered position while executing a movement plan similar to Fig. 5. Fig. 7 presents a part of the movement recordings. As demanded, the further generation of the trajectory ceases as long as the disturbance lasts.

IV. FUTURE WORK

Further improvements would include an evaluation of genetic algorithms (GA) for optimising the wiping layout, since GAs should be able to evaluate $O(n^3)$ layouts per evaluation step ($n$ denotes the size of a generation, see [7]), in contrast to simulated annealing, which can evaluate only one per step.

[Figure 7, two panels: x in mm and y in mm, each plotted over the axis "time*0.06s" (0 to 200).]

Fig. 7. x and y recordings of a roundish movement with a momentary disturbance around time 80 – 100. The further trajectory generation clearly ceases for the duration of the disturbance.

Furthermore, different path-optimising algorithms will be evaluated, especially randomised ones, as they allow suboptimal solutions to be generated. In a demonstration scenario computing time is always short, so it is more important to find a good, yet suboptimal, solution than to wait indefinitely for the perfect one. Randomised algorithms (such as GA) are able to deliver this, as is our A∗ with an overestimating $\gamma$. Path optimisation should also consider the start and end points of the wiping primitive and preferably also the directions at these points. Randomised algorithms should be able to cope with this matter as well.

Another interesting aspect would be the extraction of one primitive wiping movement out of a continuously demonstrated wiping action, as we (humans) are better able to generate a sequence of rhythmic movements than exactly one; this is an ergonomic issue.

Eventually the robot will be extended to 10 DOFs (LWR-II on a moving platform), which will greatly enlarge the area it can reach. Furthermore, the system must be extended such that non-wipe zones are taken into account in the planning phase.

V. CONCLUSION

An approach for teaching and solving complex tasks in a service robotics scenario has been proposed. The human teacher demonstrates simple movement primitives by guiding the robot by the hand. The system first learns these movement primitives, then combines them autonomously into a complex movement plan. The idea is exemplified in an application in which the robot has to wipe a surface which is specified online. A major advantage of the proposed approach is the ability to cope with temporary disturbances and to adapt to robot limitations.

Fig. 8. The wiping LWR-II robot.

REFERENCES

[1] Albu-Schäffer, Alin and Hirzinger, Gerd. Cartesian impedance control techniques for torque controlled light-weight robots. ICRA, pages 657–663, 2002.

[2] Bentivegna, Darrin C., Atkeson, Christopher G., and Cheng, Gordon. Learning from observation and practice using primitives. Workshop on Robot Programming by Demonstration, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003.

[3] Brunner, Bernhard, Vogel, Jörg, and Hirzinger, Gerd. Aufgabenorientierte Fernprogrammierung von Robotern. at - Automatisierungstechnik, 49(7):312–319, 2001.

[4] Doya, K. and Yoshizawa. Adaptive synchronization of neural and physical oscillators. Advances in Neural Information Processing Systems, 1992.

[5] Ehrenmann, M., Zöllner, R., Rogalla, O., and Dillmann, R. Programming service tasks in household environments by human demonstration. 11th IEEE Int. Workshop on Robot and Human interactive Communication (ROMAN), pages 460–467, 2002.

[6] Ermentrout, B. and Kopell, N. Learning of phase lags in coupled neural oscillators. Neural Computation, 6:225–241, 1994.

[7] Goldberg, David E. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, et al., 1989.

[8] Ijspeert, Auke Jan, Nakanishi, Jun, and Schaal, Stefan. Learning attractor landscapes for learning motor primitives. Advances in Neural Information Processing Systems, 15, 2002.

[9] Ijspeert, Auke Jan, Nakanishi, Jun, and Schaal, Stefan. Learning rhythmic movements by demonstration using nonlinear oscillators. Proceedings of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pages 958–963, 2002.

[10] Ijspeert, Auke Jan, Nakanishi, Jun, and Schaal, Stefan. Movement imitation with nonlinear dynamical systems in humanoid robots. Proceedings of the IEEE International Conference on Robotics and Automation, pages 1398–1403, 2002.

[11] Nakanishi, Jun, Morimoto, Jun, Endo, Gen, Cheng, Gordon, Schaal, Stefan, and Kawato, Mitsuo. Learning from demonstration and adaptation of biped locomotion with dynamical movement primitives. Workshop on Robot Programming by Demonstration, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003.

[12] Milenkovic, Victor J. Rotational polygon overlap minimization. 13th Annual Symposium on Computational Geometry (SoCG), 1997.

[13] Nilsson, Nils J. Principles of Artificial Intelligence. Morgan Kaufmann Publishers, San Francisco, reprint edition, 1986.

[14] Nishii, J. A learning model for oscillatory networks. Neural Networks, 11:249–257, 1998.
