
Complex object manipulation with hierarchical optimal control

Alex Simpkins† and Emanuel Todorov‡

Abstract— This paper develops a hierarchical model predictive optimal control solution to the complex and interesting problem of object manipulation. Controlling an object through external manipulators is challenging, involving nonlinearities, redundancy, high dimensionality, contact breaking, underactuation, and more. Manipulation can be framed as essentially the same problem as locomotion (with slightly different parameters). Significant progress has recently been made on the locomotion problem. We develop a methodology to address the challenges of manipulation, extending the most current solutions to locomotion and solving the problem fast enough to run in a realtime implementation. We accomplish this by breaking up the single difficult problem into smaller, more tractable problems. Results are presented supporting this method.

Index Terms— Optimal control, object manipulation, legged locomotion, hierarchical control, optimization, adaptive control, nonlinear systems

I. INTRODUCTION

Manipulating objects is a complicated problem (Fig. 1). As you read this paper you are either controlling your mouse or flipping through paper, performing control in the context of uncertainty, contacts (changing dynamics), nonlinearities, redundancy, and high dimensionality. Both manipulation and locomotion are difficult, related problems which have not yet been completely solved in the control sense. We present extensions to current methods and demonstrate their application to simulated robots performing manipulation tasks. Additionally, we address a methodology for making difficult problems more parsimonious by breaking the problem into pieces.

Related work, and the connection between locomotion and manipulation: Among the most relevant work is that of [8], which assumes a grip is given and contacts are static, and then determines the forces to move an object in a prescribed manner. This is akin to solving part of our problem. Though this is groundbreaking work, it is limited for manipulation problems where the object may need to be in continuous rotation. In that case, the grip must be constantly changed, and how the grip is changed is a dynamic problem coupled to the low level problem of when, where, and how to move each individual manipulator. Many studies have been performed on how humans grip objects[13], but this assumes a somewhat quasi-static notion of holding an object. In robotics, work has been done on determining forces to constrain an object with a gripper[11] and on gripper reference positions[5], again in a static equilibrium sense, and only with a particular number of manipulators (usually a 1-degree-of-freedom gripper). Here we do not limit ourselves to quasi-static situations, but rather determine a solution approach which deals with static and dynamic interactions, and even changing numbers of manipulators.

This work was supported by the US National Science Foundation.

†Department of Computer Science and Engineering, University of Washington, Seattle, 185 Stevens Way, Seattle, WA 98195-2350, email: [email protected]

‡Department of Computer Science and Engineering, University of Washington, Seattle, 185 Stevens Way, Seattle, WA 98195-2350, email: [email protected]

Fig. 1. Overview of the two dimensional manipulation problem. How does one apply forces to an object to (a) not allow it to drop, and (b) cause it to follow a prescribed trajectory?

Significant work has been done in control toward improving the dynamics of gripper mechanisms themselves, such as [3], but this work does not solve the entire problem as one dynamic system, including the object and gripper dynamics. It assumes open loop trajectories to follow in order to achieve a given grip, dealing only with what is the lowest level of the present control scheme. Additionally, there is evidence that biological systems intelligently use the passive dynamics built into their bodies to aid in control rather than cancel their dynamics[16], [9]. The present work attempts to draw on the intelligent approaches used by biology.

Prior work has addressed the problem of locomotion, but we can consider manipulation of an object to be essentially functionally identical to locomotion, differing only in relative scale. In object manipulation, the manipulator is typically larger than the manipulated object (or on a similar scale), and so the object is moved and controlled relative to the manipulator origin. In locomotion, the object being manipulated is the ground, which is much larger than the manipulator, so the origin of the manipulator moves relative to the manipulated object. In both cases there is a relative movement between the object and the manipulator origin, and the goal is to control that movement.

If this assumption holds for the purposes of our model, approaches similar to those used for locomotion can be applied to manipulation. Unfortunately, locomotion is still largely an unsolved problem in the sense that most solutions require a great deal of insight and manual 'tweaking' by the experimenter to create a successful locomotion algorithm. In addition, most locomotion approaches can perform only a very limited number of behaviors (walk only over smooth ground, run only, etc.). However, the approach that is currently quite promising is a hierarchical control approach[7], [10], combined with model predictive control.


Indeed, the most successful examples can be found in computer graphics, where people are interested in creating interactive characters which function based upon physics rather than motion capture or kinematic data[18], [12], [2], [6]. This approach has the potential to create very adaptable interactive characters which have unscripted and rich behaviors. Such work has not yet been implemented in real physical robotic systems (see [1] for an overview), and this is currently in process in our lab.

On redundancy and underactuation: A parallel between walking and manipulation is that in both cases the end effector is underactuated, i.e., one does not have direct control over the object in question (either the ground relative to the agent or the small manipulated object relative to the fingers). This is another reason that similar techniques should be applied to both problem classes.

Contributions: This work contributes a new dynamic hierarchical optimal control approach to the manipulation problem, which can also be applied to the locomotion problem. It requires a minimum of manual tweaking, can be tuned through a few intuitive parameters, and can be expanded easily in terms of behavioral complexity. It is also a realtime implementation of this solution, requiring no offline computation to adapt or pre-learn trajectories. These methods are also designed to be implemented on real physical robots, beginning with those developed in our lab specifically for manipulation and locomotion. Another contribution is the concept of treating manipulation and locomotion as essentially the same problem with slightly different parameters. Finally, the method of handling joint limits by a usable workspace with a soft boundary, which acts as a spring when the manipulator is drawn past the workspace, facilitates numerical stability, and parallels biological systems in a way not yet implemented to our knowledge.

Paper outline: The rest of the paper is organized as follows: Section II describes the system model and hierarchical control approach to the problem, Section III describes the experiments used to test the algorithm, Section IV describes the results of the experiments and discusses the implications, and finally Section V presents a number of conclusions and discusses the next steps to be taken.

II. MODEL

Objects will be modeled in two dimensions (2D) in this paper, though all techniques developed here extend to three dimensions directly (and this is the topic of an upcoming paper). An object is represented by a center of mass, mass, inertia, and an external boundary. This allows objects of any 2D shape to be represented. Locations for contact points on the object are defined by angles relative to the origin of the object (a coordinate system located at its center of mass). The boundary of the object is represented in the control structure as a function of the angle relative to the origin (this is mathematically stored as a cubic spline representation in order to facilitate the concept of the controller 'learning' about various objects it manipulates).
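As a concrete illustration of this boundary representation (not the authors' implementation; the sampling, radius, and function names below are assumptions), a periodic cubic spline d(θ) can be fit to sampled boundary points and queried for the contact offset at any angle:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sample the boundary radius at a set of angles (here: a circle of radius 0.1 m).
theta_samples = np.linspace(0.0, 2.0 * np.pi, 33)
radius_samples = np.full_like(theta_samples, 0.1)

# Periodic cubic spline d(theta): distance from the center of mass to the surface.
d = CubicSpline(theta_samples, radius_samples, bc_type='periodic')

def surface_offset(theta):
    """Return (dx, dy): the surface point relative to the object origin at angle theta."""
    r = d(theta % (2.0 * np.pi))
    return r * np.cos(theta), r * np.sin(theta)

print(surface_offset(np.pi / 4.0))
```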

The system includes a given number (which may change at any time) of 2D manipulators (see Fig. 2) which can move in the plane of the object(s) and apply forces. These are 2D representations of manipulators which are in development in our laboratory and are currently in their prototype stages.

The manipulators are represented mathematically as a four-bar linkage where each bar has mass, inertia, and a center of mass. The end effector extends from the second link in the loop, and torques are applied at the first joint, with effects upon the first link and the fourth link. The nature of the link structure creates a bounded workspace, which is represented as a function of the lengths of each link and the range of motion of joints one and four.


Fig. 2. Each prismatic manipulator has two degrees of freedom, and measures position of each joint and output force.


As with the physical robot, the rotor inertia in the DC motors (as coupled through the drive system) dominates the inertia in the system, so in general the system behaves like a point mass at the endpoint. By treating the complex trajectory part of the control problem at this level (i.e., controlling a point mass), optimal trajectories can be found quite rapidly, and output forces can be generated at a lower level by local feedback. This facilitates a realtime implementation. Complex dynamics can also be addressed by the use of parallel processing, such as with GPUs, and by implementation (at least partially) in C rather than Matlab where loops are performed.

A. n-finger 2-D manipulation problem

It can be shown that the sum of forces and moments on the object with $n$ manipulators applying forces is given by

$\sum F_{x,o} = \sum_i f_{x_i} - m_o a_{x,o}$   (1)

$\sum F_{y,o} = \sum_i f_{y_i} - m_o g - m_o a_{y,o}$

$\sum M_o = \sum_i \left( -f_x d_y(\theta) + f_y d_x(\theta) \right)_i - J\ddot{\theta}_o$

with $f_i$, $d(\theta)$, $J$, and $\theta$ representing, respectively, the force applied by manipulator $i$, the surface location relative to the object center as a function of angle, the object rotational inertia, and the object angle relative to horizontal, with $(\cdot)_o$ representing 'with respect to the manipulated object of interest.'

An important constraint in contact is the following (unless slip occurs or the objects are moving apart, which is dealt with in the contact algorithm), with $a$ representing the acceleration of either the object, $a_o$, or point mass $i$, $a_i$, and $\ddot{\theta}_o$ the angular acceleration of the object:

$a_{x,i} = a_{x,o} + \ddot{\theta}_o d_x(\theta_i)$   (2)

$a_{y,i} = a_{y,o} + \ddot{\theta}_o d_y(\theta_i)$

This states that the point masses must move in such a way when in contact that they do not slip.


The velocities must also match during contact, and thus a discontinuous change (unless a soft contact algorithm is implemented) in velocity is experienced by the point masses and object. This can be modeled as an impulsive force which exactly brings about this change in velocity. The point masses are assumed to be insignificantly small compared to the object mass, though this assumption could be dropped with some minor adjustments to the equations.

When not in contact, the accelerations of the point masses are given by, with external forces due to the robot actuators $b_i$:

$a_{x,i} = \frac{1}{m_i} b_{x,i}$   (3)

$a_{y,i} = \frac{1}{m_i} b_{y,i} - g$

The object dynamics are given from Equation (1) after rearranging (note that in the absence of contact the object is only affected by gravity):

$\sum_i f_{x,i} - m_o a_{x,o} = 0$   (4)

$\sum_i f_{y,i} - m_o a_{y,o} = m_o g$

$-\sum_i f_{x,i} d_y(\theta_{i,p}) + \sum_i f_{y,i} d_x(\theta_{i,p}) - J\ddot{\theta}_o = 0$

The end effector dynamics when in contact with the object are given similarly by, after substituting in Equation (2):

$f_{x,i} + m_i a_{x,o} + m_i \ddot{\theta}_o d_x(\theta_i) = b_{x,i}$   (5)

$f_{y,i} + m_i a_{y,o} + m_i \ddot{\theta}_o d_y(\theta_i) = b_{y,i} + m_i g$

This can be rearranged more succinctly by combining the force, mass/inertial, and external force variables and constants as

$f - ma = b$   (6)

or in matrix form, where bold face denotes appropriately sized sub-matrices, and $I$ represents the identity matrix,

$A = \begin{bmatrix} 1 & 0 & -m_o & 0 & 0 \\ 0 & 1 & 0 & -m_o & 0 \\ -d_{y,i} & d_{x,i} & 0 & 0 & -J \\ I & 0 & m_i & 0 & m_i d_x(\theta_i) \\ 0 & I & 0 & m_i & m_i d_y(\theta_i) \end{bmatrix}$   (7)

$w = \begin{bmatrix} f_{x,i} & f_{y,i} & a_{x,o} & a_{y,o} & \ddot{\theta}_o \end{bmatrix}^T$   (8)

$b = \begin{bmatrix} 0 & m_o g & 0 & b_{x,i} & b_{y,i} + m_i g \end{bmatrix}^T$   (9)

$Aw = b$   (10)

$w = A^{+}b$   (11)

where $(x^{+})$ represents the pseudoinverse of $x$. The result, $w$, contains the forces the point masses apply on the object (and the corresponding opposing forces, by Newton's law of equal and opposite reactions), as well as the resulting acceleration of the main object. The size of the $A$ matrix changes 'on the fly' as point masses come into and out of contact. In this way one can solve for the set of contact forces and accelerations. If there are multiple solutions, this determines a local minimum.
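To make Equations (7)-(11) concrete, the following sketch assembles A, w, and b for a single point-mass finger in contact and solves w = A⁺b with the pseudoinverse. All numerical values are illustrative assumptions, and the matrix follows the single-finger reconstruction above rather than the authors' code.

```python
import numpy as np

# Illustrative parameters (assumed, not from the paper).
m_o, J, g = 0.5, 2e-3, 9.81        # object mass, inertia, gravity
m_i = 0.02                          # point-mass finger
dx, dy = 0.0, -0.1                  # contact point relative to object center
b_x, b_y = 0.0, 6.0                 # actuator force on the finger

# A w = b with w = [f_x, f_y, a_xo, a_yo, theta_ddot]^T  (Eqs. 7-9, one finger).
A = np.array([
    [1.0, 0.0, -m_o,  0.0,  0.0],
    [0.0, 1.0,  0.0, -m_o,  0.0],
    [-dy,  dx,  0.0,  0.0,  -J ],
    [1.0, 0.0,  m_i,  0.0,  m_i * dx],
    [0.0, 1.0,  0.0,  m_i,  m_i * dy],
])
b = np.array([0.0, m_o * g, 0.0, b_x, b_y + m_i * g])

# Eq. (11): minimum-norm / least-squares solution via the pseudoinverse.
w = np.linalg.pinv(A) @ b
f_x, f_y, a_xo, a_yo, theta_ddot = w
print(f_x, f_y, a_xo, a_yo, theta_ddot)
```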

We now have our problem: when in contact, compute the individual forces and locations which optimally track the reference position or force; when not in contact, compute the required individual force trajectories to move the fingers along a trajectory. The two states of contact and non-contact trajectories form an object manipulation 'gait.'

In order to maintain contact with the surface, a force normal to the surface must be maintained, unless the forces on the object are in static equilibrium. Let us consider the friction case. Essentially, one must maintain a friction force which balances the forces tangential to the surface at the contact point. Whatever the desired forces computed by the above equations from the reference trajectories of the object, the friction force required poses a constraint on the individual manipulator controls. In this paper, we assume the simplest form of friction, that of complete 'stickiness' of the object. In other words, any force into the object along the surface normal prevents the manipulator from slipping tangentially at the point of contact. This does not avoid the issue of dropping the object, and so the intention is to simplify the system dynamics without sacrificing the important aspects of this problem.

The problem also becomes more interesting when one considers redundant manipulation¹. In this case, a minimal intervention principle[16] can be used to generate appropriate control signals. This works by one of two approaches, depending on how one is interested in solving the problem. If we do not want the (already in contact) manipulators to adjust their 'grip' positions on the object to be optimal, then the problem is 'given a particular distribution of contact points, what is the set of minimal forces to apply which will produce the desired acceleration and maintain constraint satisfaction.' The second way to approach the problem is to consider the previous problem but add the following: 'additionally, what is the optimal configuration of manipulators to produce the minimal set of forces required to move the object with a desired acceleration.' With redundancy, this may be a problem that has multiple solutions, so the best solution will be considered to be the one which is the lowest dimensional. The trajectory to move an object becomes part of the optimization problem.

This can all be formulated within the optimal control framework[15], and solved either with a globally optimal method (approximate the value function or control surface over the region of interest) or with a locally optimal control method, such as iLQG². Additionally, iLQG can be initialized by an approximation to the globally optimal solution.

The actual robots to which these algorithms will be applied (which are called ModBots, developed in our lab by the first author) have more than enough dynamic capability to be controlled as point masses or ideal force applicators (with a carefully developed local controller, a well-identified system, and inertia dominated by the rotors of the motors), so the end effectors are modeled as point masses. This simplifies the optimization problem, which we will discuss below, and helps with making the algorithm applicable to realtime systems.

¹In other words, more degrees of freedom in the system (inputs) than in the object being manipulated (outputs), or end effector. An example in biological systems is the human arm, which has seven degrees of freedom, while the output end effector, the hand, can be oriented in a six-axis space (position and orientation). Therefore more than one solution exists for many tasks.

²iLQG refers to the "Iterative Linear Quadratic Gaussian" method.


Fig. 3. Hierarchical control scheme block diagram.

B. Overall hierarchical control methodology

Though the entire problem can be framed within a single-level stochastic optimal control formulation, this is a high dimensional nonlinear problem, and to solve it in realtime, even approximately, is difficult. Additionally, there is evidence that the human brain solves this problem in a hierarchical fashion, even though the hierarchies are highly integrated[4], therefore why not use nature as a model? Instead of one high dimensional, difficult problem, we can reduce this problem to several smaller, simpler ones which address each level of controlling an object in stages, and sum up to a solution to the more difficult problem. Solving the problem hierarchically, with multiple independent terms at the various stages, allows different strategies to be inserted at various points in the hierarchy without having to reformulate the entire problem. For example, one could decide that distributing manipulators evenly is key, and thus this stage would have different terms in the cost function, and in fact could be broken down into two sub-problems to solve for positions and forces.

There are essentially three components to this algorithm (Fig. 3). The top level computes, given a reference position and orientation (this can also be operated with force control), an ideal force and torque to be applied at the center of mass of the object. This can be done with a model predictive controller (using iLQG to allow for nonlinearities in the cost function and/or system), or a simple feedback controller (such as a proportional-derivative controller). The middle level computes, depending on the actual position of each end effector, the target locations for each point mass and the forces that should be applied to the point masses, either to get them into position or to apply the force to the object. The low level maps those endpoint coordinate commands into feasible joint torques, and performs feedback if needed to make that happen. The problem can be succinctly stated as follows (a schematic sketch of this loop is given after the list):

1) Determine the overall force to apply on the object (via optimal MPC, PD feedback, splines, etc.)
2) Decide, given the (maximum) number of manipulators, where to place them according to an optimal criterion
3) Decide, given placements (and this is coupled to 2), what force output vector each finger will perform once in contact/in place
4) Move them there and apply the forces, avoiding any collisions
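A minimal schematic of this loop is sketched below. Every function body here is a placeholder standing in for the methods described in the text (none of these names or gains come from the paper); it only shows how the three levels pass information downward each time-step.

```python
import numpy as np

def control_step(obj_state, finger_states, reference, dt=0.005):
    """One pass through the hierarchy; each stage is a placeholder for the methods in the text."""
    # 1) High level: overall force/torque on the object (MPC, PD, etc.).
    wrench = high_level_wrench(obj_state, reference)
    # 2)-3) Mid level: contact locations and per-finger forces realizing that wrench.
    targets, finger_forces = mid_level_distribute(wrench, obj_state, finger_states)
    # 4) Low level: map endpoint commands to joint torques and apply them.
    torques = [low_level_torques(fs, tgt, ff)
               for fs, tgt, ff in zip(finger_states, targets, finger_forces)]
    return torques

# Placeholder stage implementations so the skeleton runs (assumptions, not the paper's code).
def high_level_wrench(obj_state, reference):
    kp, kd = 50.0, 5.0
    return kp * (reference - obj_state[:3]) - kd * obj_state[3:]

def mid_level_distribute(wrench, obj_state, finger_states):
    n = len(finger_states)
    share = np.tile(wrench[:2] / n, (n, 1))          # split the planar force evenly
    targets = [fs[:2] for fs in finger_states]       # hold current contact points
    return targets, share

def low_level_torques(finger_state, target, force):
    jacobian_t = np.eye(2)                           # identity stand-in for J^T
    return jacobian_t @ force

obj = np.zeros(6)                                    # [x, y, theta, vx, vy, omega]
fingers = [np.zeros(4) for _ in range(3)]
print(control_step(obj, fingers, reference=np.array([0.1, 0.0, 0.0])))
```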

It may come to pass that individual end effectors must be shifted in terms of contact position. This does not need to be described as an independent step, as there are two strategies that can be employed to resolve it, and both work seamlessly as part of step three. The first, and the quickest to implement in a tuning sense, is to use a force field which affects finger placement based on various system states, essentially an output map of the second strategy.

The second strategy is to use an appropriately set-up optimal control problem with terms in the cost function for staying within bounds. The optimal approach will produce the correct movement and contact breaking behaviors, as this will simply be the optimal thing to do. This paper explores the force field methodology, and future work will exploit the knowledge gained to formulate the low level problem as an optimal control problem, solved with iLQG.

C. High level

For an in-depth explanation of the iLQG technique, see [17]. Essentially, this method iteratively approximates a linear quadratic Gaussian solution to the actual nonlinear cost function and nonlinear system about the current state. This method has a number of advantages discussed in the reference, but perhaps the most significant is its ability to construct a general solution approach to the varied problem structure which arises in manipulation. We applied it in the model predictive setting, running a time horizon of ten time-steps into the future.

Another simple high level controller, which is not model predictive, is a proportional-derivative (PD) controller, given (in discrete form) as

$e_k = r_k - y_k$   (12)

$u_k^{PD} = K_p e_k + \frac{K_d}{\Delta T}(e_k - e_{k-1})$

where $\Delta T$ is the time increment, $k$ is the current time-step, $K_d$ and $K_p$ are the differential and proportional gains, respectively, $e$ is the error, $y$ is the output state feedback, and $r$ is the reference. One can implement any of the variations on classical control approaches (i.e., including an integral term), depending on the goal. Generally, removing the integral term creates more compliant manipulation, and biological systems do tend to have some error tolerance during normal behavior[16].
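A direct transcription of Equation (12), with assumed gains and the 5 msec time-step used later in the experiments:

```python
class PDController:
    """Discrete PD controller, Eq. (12): u_k = Kp*e_k + (Kd/dT)*(e_k - e_{k-1})."""
    def __init__(self, kp, kd, dt):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.prev_error = 0.0

    def step(self, reference, output):
        error = reference - output                   # e_k = r_k - y_k
        u = self.kp * error + (self.kd / self.dt) * (error - self.prev_error)
        self.prev_error = error
        return u

# Example: track a step reference with a 5 ms time-step (assumed gains).
pd = PDController(kp=40.0, kd=0.8, dt=0.005)
print(pd.step(reference=1.0, output=0.0))
```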

D. Mid-level

Now, given the required force/torque to apply to the object, we need to determine the best places to apply the individual forces, given a certain number of manipulators, such that they combine into the overall forces we want. We also need to determine what those individual forces should be.

In order to address this problem, we must consider the whole behavior and system. Recall that the goal is to track the high level reference, which may move through a particular manipulator's workspace and beyond, or may not enter a particular manipulator's workspace at all. So it must be determined what a manipulator should do if the object to manipulate goes outside of its workspace.

We build a method by considering behavior in humans. Hold an object in the air and rotate it (if it is breakable, hold it over a pillow or other safety net, please), using only your fingers. As each finger nears the boundary of its workspace, if you pay attention to your experience, you feel an increasing need, a desire, to change the location of that finger to a more 'comfortable' one. It seems to take conscious effort to override this natural instinct. Essentially, what we can distill from this in our case is that, as one nears a boundary, the human feels a type of virtual force drawing the finger away from its current location.



Fig. 4. A force field that facilitates contact breaking can be constructed which is dependent on several factors: center of workspace, center of object, and finger position. The field changes (as shown in (b), (c), and (d)) as the object enters the region of usable workspace. Here the workspace and object are circular for clarity, but any shape can be represented. The field is scalable and movable. The arrows depict the force vector at a particular point in space. The concept is that as the finger is manipulating an object and moves too close to a workspace boundary, the field (which blends a position feedback controller with a number of other features) generates a force that causes a contact break and subsequent repositioning of the finger more toward the center of the workspace. Essentially this is an explicit representation of a cost function, though employed at the control policy level. The shaded circle in (b), (c), and (d) represents the object being manipulated, and the centered circle is the workspace definition. Outside the workspace a simple feedback pushes the finger back into the workspace, for stability purposes.

Then a trajectory evolves to a contact point more central to the workspace. We call this a virtual force because it is used in a weighted comparison with the contact force at each iteration; the one with the larger normal force component 'wins,' and is then used as the force applied to the end effector.

Workspace virtual force field: A method for distributing high level commands is the concept of a virtual force field (as depicted in Fig. 4(a)-4(d)) which is dependent on the shape of the workspace for each finger, the location of the end effector, and the location (or estimated location) of the object. This strategy is helpful for constructing terms which will ultimately be employed for a model predictive low level component. It is actually a parallel concept, except that the model predictive approach will compute control policies which may achieve lower mean squared error for tracking desired object trajectories than a feedback-only scheme. However, it is important to eventually compare a high level model predictive control (MPC) approach to a low level one, and finally to a scheme with all MPC methods.

This force is built from a number of terms in a function which we will call a 'Force-Cost' function (FC function), since the terms are applicable to either the optimal control setting or our force field setting. When referring to the computed output of the FC function, for simplicity, we will hereafter refer to the output as a 'cost' in either setting³. This function must include a variety of terms in order to address the required behaviors, which are:

1) Getting to a target given some state of the system
2) Applying some force to the object
3) Distribution of contact points
4) Desire to change contact points (break contact) when appropriate
5) Handling a limited workspace
6) What to do if the object moves outside of the workspace, or the end effector moves outside the workspace

Many of these behaviors can be combined into single equations. When in contact, the contact forces are computed with a quadratic (and thus very fast) constrained (to only apply forces toward the object surface, or the force is zero) optimization that minimizes the error between the sum of forces (due to any number of manipulators in contact) and the reference force. $d(\theta)$ is the shape of the object (a parametric curve or surface; note that in the case of a surface the parameter will be a 2D parameter), where $\theta$ is a parameter of the curve, and $f_r$ represents the reference force and torque to apply at the next time-step,

$f_{t_i,k} = \begin{bmatrix} f_{x_i} \\ f_{y_i} \\ f_{y_i} d_x(\theta_i) - f_{x_i} d_y(\theta_i) \end{bmatrix}$

$J(f_i)_k = \Big[ f_{r,k} - \sum_i f_{t_i,k} \Big]^2 + \sum_{(x,y),i} f_{i,k}^2$

where $\sum_{(x,y),i}$ is defined as 'over $x$, $y$, and all $i$.'

A second term to minimize maximum forces can be added, similar to a control cost. At each time-step the optimization is initialized with the previous result, which tends to produce smoother outputs. Thus the optimization problem is

$f_i^{opt} = \arg\min_{f_i} \big[ J(f_i)_k \big], \quad \text{s.t.} \quad f \bullet n < 0$   (13)

where $n$ is the normal vector to the surface of the object, $(\bullet)$ denotes the dot product, and $f_i$ implicitly includes both $x$ and $y$ dimensions here.
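As an illustrative sketch of this constrained force-distribution step (using a general-purpose SLSQP solver rather than whatever quadratic solver the authors used, and with assumed contact geometry, reference wrench, and effort weighting):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative two-finger setup on a circular object of radius 0.1 m (assumed values).
contacts = np.array([[0.0, -0.1], [0.0, 0.1]])       # d(theta_i): contact offsets from center
normals = np.array([[0.0, -1.0], [0.0, 1.0]])        # outward surface normals at the contacts
f_ref = np.array([0.0, 4.905, 0.0])                  # reference force/torque from the high level

def wrench(forces):
    """Sum of per-finger forces and moments, as in the f_t terms of the FC function."""
    f = forces.reshape(-1, 2)
    fx, fy = f[:, 0].sum(), f[:, 1].sum()
    tau = np.sum(f[:, 1] * contacts[:, 0] - f[:, 0] * contacts[:, 1])
    return np.array([fx, fy, tau])

def cost(forces):
    # Tracking error plus a (small, assumed) quadratic effort penalty.
    return np.sum((f_ref - wrench(forces)) ** 2) + 1e-3 * np.sum(forces ** 2)

# Each finger must push into the surface: f_i . n_i <= 0 (SLSQP expects g(x) >= 0).
cons = [{'type': 'ineq', 'fun': (lambda f, i=i: -f[2 * i:2 * i + 2] @ normals[i])}
        for i in range(len(contacts))]

f0 = np.zeros(2 * len(contacts))                     # warm start, e.g. the previous step's result
sol = minimize(cost, f0, method='SLSQP', constraints=cons)
print(sol.x.reshape(-1, 2), wrench(sol.x))
```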

The force field consists of the following terms, the first of which is, with gain $K_e$,

$F_e = \frac{K_e (x - x_c)}{1 + \|x - x_w\|^2}$   (14)

which generates a force, growing as the end effector location gets further from the estimated or actual center of the target object, to pull it toward the object surface; this force decreases as the end effector moves further from the center of its workspace, until the next term dominates. The 1 in the denominator prevents it from vanishing. This term works in concert with the second term to determine where contact will occur (Fig. 5(a)-5(d)). It is heavily affected by the location of the object relative to the workspace. The second term, with gain $K_s$, surface normal $n$, and parameter $a$, is

$F_s = K_s\, n\, \|x - x_w\|\, e^{-a(x - x_c)^2}$   (15)

³We state 'may' since the force field yields a predictive component, similar to an infinite horizon setting, since the resulting policy will act such that force over the entire behavior is minimized (recall we are considering force and cost parallel in our FC function), and so it is unclear whether one is superior, or whether they are functionally equivalent when provided with equivalent parameters.


The second term generates a force which tends to pull the end effector away from the surface of the object. This force is larger the further the end effector moves from the center of the workspace, but drops to zero as a Gaussian function of distance from the center of the object. This is what is responsible for contact breaking. The forces are summed,

$F_t = F_e + F_s$   (16)

This yields the field displayed in Fig. 4(b)-4(d). Note how the field adapts as the various system states change. Fig. 4(a) shows a simple force field concept: the further a point is from the center, the larger the force pulling toward the center of the field. More complex fields are possible. It is not difficult to determine empirical force fields from behavioral data, similar to inverse optimal control, or to use system identification methods to fit a function which reproduces biological data. Thus this method is extendable to many coordinated movement patterns, and parameters can be optimized for different styles of movement, or they can be learned in an active learning setting (such as [14]).

This is both a predictive and adaptive approach. It is predictive in that the endpoint is not computed; rather, the next point is a portion of an entire trajectory which would emerge if all the other states were to remain constant during the trajectory. Essentially, the MPC is recomputing all possible trajectories at each time-step, as the functions are continuous functions of the states, so the trajectory adapts as all states change.

In order to deal with workspace boundaries, a saturation effect is produced by switching the field to, where $K_k$ is a gain,

$F_t = K_k (x - x_w)$   (17)

which tends to push the manipulator back into the workspace if it exits it. In other words, this is similar to biological systems, where, as a boundary is reached, the combination of the tendons with the physiological structure creates a nonlinear spring effect (move your finger up as high as it can go by itself; that is the boundary of the workspace where you can control it. Using your other hand, you can move your finger quite a bit higher, but the effect is that of a spring: the finger pulls against your hand, back toward the center of its workspace).
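A sketch of the combined field of Equations (14)-(17). The signs here are written so the field behaves as described in the text (attraction toward the object, repulsion near the workspace edge, and a boundary spring pushing back inside); the gains, workspace radius, and sign conventions are assumptions rather than the paper's values.

```python
import numpy as np

def field_force(x, x_w, x_c, n, K_e=1.0, K_s=1.0, K_k=5.0, a=50.0, radius=0.3):
    """Virtual force on an end effector at x, following Eqs. (14)-(17).

    x_w: workspace center, x_c: object center, n: outward surface normal at the
    nearest contact, radius: assumed workspace radius used for the boundary switch."""
    if np.linalg.norm(x - x_w) > radius:
        return K_k * (x_w - x)                        # Eq. (17): boundary spring pushes back inside
    # Eq. (14): attraction toward the object, weakening away from the workspace center.
    F_e = K_e * (x_c - x) / (1.0 + np.linalg.norm(x - x_w) ** 2)
    # Eq. (15): repulsion from the surface, growing with distance from the workspace center.
    F_s = K_s * n * np.linalg.norm(x - x_w) * np.exp(-a * np.linalg.norm(x - x_c) ** 2)
    return F_e + F_s                                  # Eq. (16)

print(field_force(x=np.array([0.05, 0.0]), x_w=np.zeros(2),
                  x_c=np.array([0.2, 0.0]), n=np.array([-1.0, 0.0])))
```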

E. Low Level

The low level is responsible for mapping the commands for forces to apply at the endpoints into joint torques. There are several advantages to working in endpoint coordinates and then mapping them into joint angles and torques. These include structural variability in the actuators used (solve the problem, then reverse the joints, and no change in the overall solution occurs, but the system still works just as well), simpler kinematic equations (one removes several trigonometric relationships, which are nonlinear relations, from the optimizations), and more intuitive FC function terms.

The algorithm implements a bounded Newton's method optimization, initialized at the current state, to compute the nearest torques to apply which are within the capabilities of the virtual actuators at the joints and which map most closely to the desired end effector forces. In practice this turns out to work well and to execute with only a few iterations on average until convergence, even when a large change occurs, such as transitioning between contact and no contact.
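The sketch below substitutes a small bounded least-squares solve (scipy's lsq_linear) for the paper's bounded Newton's method: it finds joint torques within assumed actuator limits whose resulting endpoint force, through an assumed 2-link Jacobian, is closest to the commanded force.

```python
import numpy as np
from scipy.optimize import lsq_linear

def torques_for_force(J, f_desired, tau_max):
    """Torques within +/- tau_max whose endpoint force pinv(J^T) @ tau best matches f_desired."""
    A = np.linalg.pinv(J.T)                 # endpoint force produced by joint torques
    res = lsq_linear(A, f_desired, bounds=(-tau_max, tau_max))
    return res.x

# Assumed 2-link Jacobian at some configuration and a 3 N force command.
J = np.array([[-0.15, -0.05],
              [ 0.20,  0.10]])
print(torques_for_force(J, f_desired=np.array([0.0, 3.0]), tau_max=2.0))
```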


Fig. 5. Why workspace maximization is an important consideration when choosing where contact points occur. By choosing a point which puts each manipulator closest to the center of its workspace, the object workspace (how far in various directions the object can be moved without further changing contacts) is maximized, either for rotation or translation. If a series of commands into the future is predictable, this may skew where the best place for these contacts will be.

Finally, a simple PD controller tracks the required joint torques. We found that, in fact, this final component can be disabled and the algorithm functions well, but in practice it can improve robustness and allow the higher levels to be operated at a lower bandwidth than without low level feedback.

When implemented on the physical robot, the low level will run on the local digital signal controller, and so the overall loop running at the high and middle levels will execute in less than the current time-step.

III. EXPERIMENTS

Manipulate a known shape and mass/inertia with a specified number of end effectors:

Consider an object of arbitrary shape with some mass $m$ and inertia about the z-axis $I$. There are $n$ manipulators which have a (specified) limited range of motion and maximum force (or even a forcing function over the workspace, but for this experiment we consider a constant force capability over the workspace). These manipulators have bases located at $x_b$, and each workspace has a center relative to that base at $x_w$. All simulations are integrated using symplectic Euler integration and a time-step of 5 msec (animations are available at http://casimpkinsjr.radiantdolphinpress.com).
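A minimal version of the symplectic (semi-implicit) Euler step used for integration, applied to a point-mass finger; the mass and force values are assumptions for illustration:

```python
import numpy as np

def symplectic_euler_step(x, v, force, mass, dt=0.005):
    """Semi-implicit Euler: update velocity first, then position with the new velocity."""
    v_new = v + (force / mass) * dt
    x_new = x + v_new * dt
    return x_new, v_new

# Example: a 0.02 kg point-mass finger under gravity plus a 0.3 N upward actuator force.
x, v = np.zeros(2), np.zeros(2)
for _ in range(200):                                  # simulate 1 s at 5 ms steps
    f = np.array([0.0, 0.3 - 0.02 * 9.81])
    x, v = symplectic_euler_step(x, v, f, mass=0.02)
print(x, v)
```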

Grab object, reference track, not specifically attempting contact breaking: This consists of grabbing the object with the end effectors, which are initialized at some random non-contact state away from the object. Then, in the first sub-case, the object is brought back to the zero position and must track a sinusoidal angular reference trajectory. In the second sub-case, the object is moved through a sinusoidal pattern in both position and orientation.

Grab and perform continuous rotation: Rotate the object at a continuous velocity. This will force the fingers to change grip while maintaining appropriate forces to keep the object center at a fixed point.



Fig. 6. This shows the high accuracy and precision with which the optimization breaks up the generalized force and torque into forces to apply to the object through the finger endpoints. (a) shows one dimension of the reference force, and the sum of the multiple endpoint forces with three end effectors. (b) and (c) show the error between the two signals, with (c) zoomed in to show that the force sum is accurate to approximately 10 mN. (d) shows the object being pushed through a circular trajectory by three fingers initially not in contact (the circles denote the starting points). The reference is tracked very well.

IV. RESULTS

A. Convergence and execution speed

As the optimization is convex, convergence was achieved very rapidly. The iLQG algorithm, since it is solving a simplified, lower dimensional high level system, converges in an average of two iterations. The entire loop, though not fully optimized, took on average 1.1 msec when no contact occurred and, with four manipulators, 34.5 msec during contact (Fig. 7(c)). This was executed in Matlab R2007a running on a Macintosh computer with a 2.3 GHz dual core processor. This is fast enough to run in realtime, though it can be further optimized.

B. Tracking the high level reference forces

The ability to reproduce the reference forces required by the high level control is clear in Fig. 6(a)-6(d). The error between the desired force and the combined forces is very low, at under 20 mN when the forces are on a scale of 20 N. The object can be made to track a fast-moving circle reference while keeping the object horizontal. The error is quite Gaussian, which is to be expected. There is no evidence of oscillation or instability. Indeed, during impossible initializations (fingers too far apart, or object too low to grip), the algorithm behaved well, with no unpleasant accelerations or oscillations. Instead, a very biological-appearing behavior of reaching after the falling object appears.

C. Trajectory tracking in object manipulation

Fig. 7(a)-7(c) shows that the object position and orientation can be controlled using this methodology. This control works with various numbers of manipulators, as a result of the high level making a plan and the low level executing it.


Fig. 7. (a) The object position is shown in solid lines (x, y, θ, respectively), with each corresponding reference trajectory as a dashed line. The contact points happen at various intervals, with the first finger contacting at approximately t=0.081 sec and the last approximately 50 msec later (three finger manipulation). After the first finger contact, the object begins a trajectory toward the goal angle and position. This is desirable in object manipulation, where the number of fingers in contact is not as important as controlling the trajectory of the object. (b) The same plot representations and colors, this time demonstrating rapid tracking of all three axes. (c) Timing of the computational loop in Matlab (even with realtime graphics plots this is fairly fast).

Thus the low level can change structure (number of manipulators, physical structure, etc.) and the algorithm will produce similar results. It is interesting to note that the contact timing for each end effector is slightly different, but the algorithm's method of recomputing optimal forces to apply given the current configuration consistently works, since the output with two fingers in contact and one not is, for example, the same as if there were only two fingers in existence. The contact locations would change, but as the movement progresses through time the grip would change to be more optimal for two fingers only.

D. Contact breaking and continuous motions

Fig. 8(a)-8(c) depicts the manipulators successfully rotating an object (quite rapidly) to track a constant angular velocity. The fingers move through a number of gait transitions as necessary, similar to Fig. 8(c). It is worth noting that the trajectory, contact point, and moment of contact breaking are all implicit behaviors defined in the middle level. As one can see in Fig. 8(b), the parameters of the system could be optimized further to more closely track the reference, but object manipulation is not about perfectly tracking a reference; in fact, it has been shown that some error is completely ignored in the human sensorimotor system. Manipulation, and movement in general, is performed in order to achieve an overall goal, regardless of small perturbations of the system or small errors.

V. CONCLUSION

Object manipulation using multiple manipulators is a complex problem. Optimal control and model predictive methods appear to yield very useful strategies for addressing the multitude of behaviors which are required to create a manipulation algorithm that deals with more than a single simple behavior.



Fig. 8. (a) Depicts xy-data of three manipulators rotating an object and changing grip a number of times. The object-finger grip position relationship is interesting in that the 'gait' of the fingers varies depending on how large a correction is required as each finger needs to reposition itself. (b) Depicts a portion of this data, where the objective is to rotate the object continuously at a particular velocity while maintaining the xy-center position. (c) One manipulator during a contact break and return trajectory. The manipulator, as it nears the edge of its workspace (depicted as the shaded area), breaks contact, moves away from the object, and then follows a trajectory to a central-to-the-workspace contact point on the object. Note that collisions are only computed between end effectors and the object, and between end effectors and each other.

However, attacking it as a single-part problem is often not feasible in realtime, using even the most modern techniques and hardware. Since creating solutions which are adaptable and dynamic on-the-fly is important to emulate the complex behavior patterns of biological systems, a hierarchical control approach was proposed in this paper. Is this the one and only way to solve the problem? Is the brain doing exactly this? Though there is evidence that the brain does address control in an online and hierarchical fashion, more than half of the neurons in the brain are devoted to movement, so any single approach will not capture the richness of behavior patterns available to the biological organism. In addition, we realize that the model is not the reality, and so we do not claim this is precisely the process the brain follows, because that would be incorrect.

The method proposed here makes the process of generating new behaviors simply a matter of creating new terms in the FC function. This is among the first algorithms to demonstrate, without motion capture data, a method for object manipulation which can not only control an arbitrary object to track a reference trajectory, but also, when necessary, break contacts and determine how to grip an object successfully. This method requires minimal 'tweaking,' and can potentially have all parameters learned online.

It also runs in realtime, with no pre-computing required to solve the problem. The method of handling joint limits by a workspace with a spring-force boundary, which pushes the system back into the workspace, is drawn from biological systems, works well, and avoids rigid nonlinearities which can contribute to numerical instabilities.

Future work will compare these results to biological systems in depth, apply the method to physical robots, apply it to locomotion, and extend the equations to 3D and soft contacts. While we have not reported quantitative results regarding robustness to parameter perturbations (another item to present in the next paper), MPC with feedback tends to be quite forgiving regarding these issues. The difference becomes an error signal which is fed back into the system, and is then compensated for automatically. In fact, sometimes a simple model is purposefully used, with the intent of pushing the system to behave closer to the model. This can also be seen in biological systems, where the system behaves well even in the face of a poor model.

Solving the problems of locomotion and manipulation will not only contribute new mathematical techniques to science and engineering, it will also contribute to the development of new rehabilitation strategies, artificial limbs, assistive technologies, and far more.

REFERENCES

[1] H. Asada and J.-J. E. Slotine. Robot Analysis and Control. John Wiley and Sons, New York, NY, 1986.

[2] M. de Lasa, I. Mordatch, and A. Hertzmann. Feature-based locomotion controllers. ACM Transactions on Graphics (Proc. SIGGRAPH), 29(3), 2010.

[3] S. Jagannathan and G. Galan. Adaptive critic neural network-based object grasping controller using a three-fingered gripper. IEEE Trans. on Neural Networks, 15(2):395–407, March 2004.

[4] E. Kandel, J. Schwartz, and T. Jessell. Principles of Neural Science. McGraw-Hill, Inc., New York, 3rd edition, 1991.

[5] E. Kefalea, E. Mael, and R. P. Wurtz. An integrated object representation for recognition and grasping. Knowledge-Based Intelligent Information Engineering Systems, Third Conference, 1999.

[6] Y. Lee, S. Kim, and J. Lee. Data-driven biped control. ACM Transactions on Graphics (Proc. SIGGRAPH), 29(4), 2010.

[7] W. Li, E. Todorov, and X. Pan. Hierarchical feedback and learning for multi-joint arm movement control. In Proceedings of the IEEE Engineering in Medicine and Biology 27th Annual Conference, 2005.

[8] C. K. Liu. Dextrous manipulation from a grasping pose. ACM Transactions on Graphics (Proc. SIGGRAPH), 28(3), 2009.

[9] D. Liu and E. Todorov. Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. Journal of Neuroscience, 27(35):9354–9368, 2007.

[10] D. Liu and E. Todorov. Hierarchical optimal control of a 7-DOF arm model. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.

[11] A. Moon and M. Farsi. Grasp quality measures in the control of dextrous robot hands. IEEE Colloquium on Physical Modelling as a Basis for Control, pages 6/1–6/4, Feb. 1996.

[12] I. Mordatch, M. de Lasa, and A. Hertzmann. Robust physics-based locomotion using low-dimensional planning. ACM Transactions on Graphics (Proc. SIGGRAPH), 29(3), 2010.

[13] M. Santello, M. Flanders, and J. Soechting. Patterns of hand motion during grasping and the influence of sensory guidance. The Journal of Neuroscience, 22(4):1425–1435, February 2002.

[14] A. Simpkins and E. Todorov. Optimal tradeoff between exploration and exploitation. In Proceedings of the American Control Conference, 2008.

[15] R. Stengel. Stochastic Optimal Control: Theory and Application. John Wiley and Sons, 1986.

[16] E. Todorov and M. Jordan. A minimal intervention principle for coordinated movement. Advances in Neural Information Processing Systems, 15:27–34, 2003.

[17] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the American Control Conference, 2005.

[18] J. Wu and Z. Popovic. Terrain-adaptive bipedal locomotion control. ACM Transactions on Graphics (Proc. SIGGRAPH), 29(4):72:1–72:10, 2010.