3d walking based on online optimization - cs.cmu.edusfeng/sf_hum13.pdf · 3d walking based on...

3D Walking Based on Online Optimization

Siyuan Feng, X Xinjilefu, Weiwei Huang and Christopher G. Atkeson

Abstract— We present an optimization based real-time walk-ing controller for a full size humanoid robot. The controllerconsists of two levels of optimization, a high level trajectoryoptimizer that reasons about center of mass and swing foot tra-jectories, and a low level controller that tracks those trajectoriesby solving a floating base full body inverse dynamics problemusing Quadratic Programming. Our controller is capable ofwalking on rough terrain, and also achieves longer foot steps,faster walking speed, heel-strike and toe push-off. Results aredemonstrated with Boston Dynamics’ Atlas robot in simulation.

I. INTRODUCTION

We consider a generalized walking controller that workson uneven terrain is essential to enabling humanoid robotsto walk outside labs and be actually useful. Additionally, weargue that the controller needs to pay attention to a set of footsteps provided by either a planner or human operator. Thisis important in obstacle avoidance, rough terrain traversal,and walking in a cluttered indoor environment. Althoughrecent research in hand-crafted policy based walking con-trollers [1], [2], [3] has shown robust performance on uneventerrain and against external perturbations, they are essentiallywalking blindly and are fundamentally incompatible with therequirement described above. On the other hand, traditionalZero Moment Point (ZMP) based designs [4], [5] have yetto show convincing performance on uneven terrain. We areproposing a walking controller based on online optimizationthat handles uneven terrain. Figure 2 shows a simulatedAtlas robot using our controller to traverse rough terrain inDARPA’s Virtual Robotics Challenge.

Our approach is rooted in model-based optimal control, asit takes a desired sequence of foot steps, uses optimizationto generate a locally optimal trajectory, then tracks thistrajectory using inverse dynamics. The high level controllerperforms online trajectory optimization using DifferentialDynamic Programming (DDP)[6] with a simplified modelthat only reasons about the center of mass (COM) of therobot. We also use a quintic spline in Cartesian space tosmoothly connect the given foot steps for the swing foot.The low level controller takes these trajectories as inputs anduses a Quadratic Programming (QP) based inverse dynamicssolver to generate torques for all the joints. A schematic ofthe system is depicted in Figure 1.

The high level controller is similar in spirit to PreviewControl proposed by Kajita et al. [4] in the sense that weare also using a COM model, reasoning about ZMP andusing future information to guide the current trajectory. But

All the authors are with The Robotics Institute, Carnegie MellonUniversity, 5000 Forbes Avenue, Pittsburgh, PA 15213, {sfeng, xxinjile,cga}@cs.cmu.edu, [email protected]

our approach can be easily generalized to more complexnonlinear models and adjust foot steps while optimizingthe COM trajectory. We do not explicitly generate inversekinematics solutions or use high gain position controls totrack joint trajectories, which results in compliant walking.We explicitly add the z dimension in our COM model tohandle height variations on rough terrain. Like capture pointmethods [7], we take the next few steps into considerationduring trajectory optimization, although we do not planto come to rest at the end. Ogura et al. [8] investigatedgenerating human like walking with heel-strike and toe-off by parametrizing the swing foot trajectory, and using agenetic algorithm to find the parameters. In contrast, we usevery simple rules to guide the low level controller to achievethe same behaviors.

In the low level controller, we formulate the floatingbase inverse dynamics as a Quadratic Programming problem.We continue to use a formulation previously developed inour group [9], [10]. Unlike [11], [12] that use orthogonaldecomposition to project the allowable motions into the nullspace of constraint Jacobian, and minimize any combinationof linear and quadratic costs in the contact constraints andthe commands, we directly optimize a quadratic cost in termsof state accelerations, torques and contact forces. We are alsoable to directly reason about inequality constraints such ascenter of pressure within the support polygon, friction andtorque limits. Although it becomes a bigger QP problem,we are still able to solve it in real time. Hutter el al. [13]resolved redundancy in inverse dynamics using a kinematictask prioritization approach that ensures lower priority tasksalways exist in the null space of higher priority ones. Incontrast to their strictly hierarchical approach, we minimizea sum of weighted terms. We can directly specify the relativeimportance by adjusting the weights.

II. TRAJECTORY GENERATION

Given a sequence of desired foot steps, we assign uni-form timing to them. We then plan a COM trajectory thatminimizes stance foot ankle torque with DDP, which isan iterative trajectory optimization technique that updatesthe current control signals based on the spatial derivativesof the value function, and uses the updated controls togenerate a trajectory for the next iteration. The swing foottrajectory is generated using a quintic spline between thestarting and ending positions. For foot orientation, we takethe yaw angle specified in the foot step sequence, assume0 roll angle, and estimate the pitch angle by relative heightchange from consecutive foot steps. Body orientation at theend of the swing phase is computed by averaging the yaw

Fig. 1. The proposed controller consists of a trajectory optimizer andinverse dynamics solver. The former takes robot state q, q from a stateestimator and desired foot steps from a planner, and produces desired COMand swing foot trajectories. The latter takes these trajectories and q, q andsolves floating base full body inverse dynamics to generate torques.

angles from consecutive foot steps, assuming 0 roll angle,and a task specific pitch angle (e.g. leaning forward whenclimbing a steep ramp). Both body and foot orientation arerepresented as quaternions, and interpolated using sphericallinear interpolation (slerp).

In the current implementation of the high level controller,we approximate the entire robot as a point mass, withoutconsidering angular momentum. The dynamics for the simplemodel are xy

z

t

=

(xt−px)Fz

mzt(yt−py)Fz

mztFz

m − g

.The state, X = (x, y, z, x, y, z), is the location and velocityof the center of mass. The control u = (px, py, Fz) is thecommanded center of pressure and force in the z direction.The current high level controller does not know about steplength limits, and we are relying on the foot step planner togive us reasonable foot steps.

A. Differential Dynamic Programming

DDP applies Dynamic Programming along a trajectory.It can find globally optimal trajectories for problems withtime-varying linear dynamics and quadratic costs, and rapidlyconverge to local optimal trajectories for problems withnonlinear dynamics or costs. This approach modifies (andcomplements) existing approximate Dynamics Programmingapproaches in these ways: 1) We approximate the valuefunction and policy using many local models (quadratic forthe value function, linear for the policy) along the trajectory.2) We use trajectory optimization to directly optimize thesequence of commands u0,N−1 and states X0,N . 3) Refinedlocal models of the value function and policy are created asa byproduct of our trajectory optimization process.

We represent value functions and policies using Taylorseries approximations at each time step along a trajectory. Fora state Xp, the local quadratic model for the value function

Fig. 2. The proposed controller tested in the Rough Terrain Task inDARPA’s Virtual Robotics Challenge.

is

V p(X) ≈ V p0 + V p

X(X −Xp)

+1

2(X −Xp)TV p

XX(X −Xp),

where X is some query state, V p0 is the constant term, V p

X

is the first order gradient with respect to state evaluated atXp, and V p

XX is the second order spatial gradient evaluatedat Xp. The local linear policy is

up(X) = up0 −Kp(X −Xp),

where up0 is a constant term, and Kp is the first derivative ofthe local policy with respect to state evaluated at Xp, whichis also a gain matrix for a local linear controller. V0, VX ,VXX and K are stored along with the trajectory.

The one step cost function is

L(X,u) = 0.5(X −X∗)TQ(X −X∗)+ 0.5(u− u∗)TR(u− u∗),

where R is positive definite, and Q is positive semi-definite.X∗ is given as a square wave, instantly switching to the nextfoot step location and staying there for the entire stance withvelocities equal zero. u∗ is specified in a similar way with pxand py being at the desired center of pressure in the worldframe, and Fz = mg.

For each iteration of DDP, we propagate the spatialderivatives of the value function VXX and VX backward intime, and use this information to compute an update du to

the control signal. Then we perform a forward integrationpass using the updated controls to generate a new trajectory.Although we are performing nonlinear trajectory optimiza-tion, due to analytical gradients of the dynamics, this processis fast enough in an online setting.

B. Initialization

Given the last desired center of mass location and desiredcenter of pressure, (X∗, u∗) in the foot step sequence, wefirst compute a Linear Quadratic Regulator (LQR) solution atthat point, and use its policy to generate an initial trajectoryfrom the initial X0. VXX of this LQR solution is also usedto initialize the backward pass.

C. Backward pass

Given a trajectory, one can integrate the value functionand its first and second spatial derivatives backwards intime to compute an improved value function and policy.We utilize the “Q function” notation from reinforcementlearning: Qt(X,u) = V t+1(f(X,u)) + Lt(X,u) where t isthe time index. The backward pass of DDP can be expressedas

QtX = Lt

X + V tXf

tX

Qtu = Lt

u + V tXf

tu

QtXX = Lt

XX + V tXf

tXX + f tTX V t

XXftX

QtuX = Lt

uX + V tXf

tuX + f tTu V t

XXftX

Qtuu = Lt

uu + V tXf

tuu + f tTu V t

XXftu

Kt = (Qtuu)−1Qt

uX

δut = (Qtuu)−1Qt

u

V t−1X = Qt

X −QtuK

t

V t−1XX = Qt

XX −QtXuK

t.

Derivatives are taken with respect to the subscripts, andevaluated at (X,u).

D. Forward pass

Once we have computed the local linear feedback policyKt and updates for controls δut, we integrate forward intime using

utnew = (ut − δut)−Kt(Xtnew −Xt)

with Xt0new = X0. We terminate DDP when the cost-to-go at

X0 does not change much across iterations. This approachcan be thought of as a generalized version of Kajita’s previewcontrol [4]. Figure 3 shows trajectories of COM before andafter DDP optimization. The original trajectories are plottedwith dashed lines, and results are plotted with solid lines.State trajectories are shown in blue, and controls are in green.In the plots, Fz is normalized by body weight.

E. Double support

A short double support phase is planned to allow weightshifting and explicit controls on the touch down foot and liftoff foot.

0 0.5 1 1.5 2 2.5 3 3.5−0.5

0

0.5

1

time [s]

x[m

]

0 0.5 1 1.5 2 2.5 3 3.5−0.2

−0.1

0

0.1

0.2

time [s]

y[m

]

0 0.5 1 1.5 2 2.5 3 3.50.5

1

1.5

time [s]

z[m

]

x0px0pxd

px1x1

y0py0pyd

py1y1

z0Fz0/mgzd

Fz1/mgz1

Fig. 3. Desired center of pressure in XY plane and COM height trajectoriesare plotted with solid red lines. Initial trajectories before DDP optimizationare plotted in dashed lines, and optimized ones are shown in solid lines.States x, y and z are shown in blue, and controls px, py and Fz

mgare plotted

in green.

1) Heel-strike: We command the landing foot’s orienta-tion to have a pitch angle and allowed the center of pressureto move quickly from the heel to somewhere in the middleof the foot.

2) Toe push-off: During the lift off phase, we move thefoot reference point to the toe, and instead of commandingthe angular acceleration to be zero, we request a large pitchangular acceleration in the foot frame, set the desired centerof pressure to be at the toe, and let the low level controllerhandle the rest. A more elegant approach is perhaps elimi-nating the row in Acart and bcart in (1) that corresponds topitch angular acceleration entirely and let QP figure out therest.

III. FULL BODY INVERSE DYNAMICS

For the low level control, we use an inverse dynamicsapproach to solve for torques that would best satisfy thegiven desired motion. We formulate the full body floatingbase inverse dynamics problem as a Quadratic Programming(QP) problem, and apply a standard QP solver at each timestep to generate a set of torque commands.

A. QP formulation

The equations of motion and constraints equations for afloating base humanoid robot can be described as

M(q)q + h(q, q) = Sτ + JT (q)F

J(q)q + J(q, q)q = x.

(q, q) is the full state of the system including a 6 DOF jointat the floating base. M(q) is the inertia matrix. h(q, q) is thesum of gravitational, centrifugal and Coriolis forces. S is a

selection matrix that is the identity matrix except for the first6 degrees of freedom that corresponds to the floating base.τ is the joint torque vector. JT (q) is the stacked Jacobianmatrix for all the contacts. F is the vector of all stackedcontact forces. x is the contact location and orientation inCartesian space. The sizes of F and JT change depends onthe number of contacts.

We can rewrite the equations of motion as

[M(q) −S −JT (q)

] qτF

+ h(q, q) = 0.

Given a state (q, q), the equations of motion are linear interms of

[q τ F

]T.

Let X =[q τ F

]T. We can now formulate the inverse

dynamics as a Quadratic Programming problem

minXX TGX + gTX

s.t. CEX + ceT = 0

CIX + ciT >= 0.

The equality constraints for this QP problem are the equa-tions of motions described above, and the inequality con-straints consist of various terms such as joint torque limitsand ground force limits due to friction cone and center ofpressure remaining in the support polygons. The optimizationcriteria can be thought as a big least square minimizationproblem penalizing X for deviating from some desired X ∗.

B. QP optimization criteria

Essentially, we are optimizing a cost function of the form‖AX−b‖2, where X =

[q τ F

]T. A and b can be broken

down row-wise into smaller blocks, each emphasizing certaindesired behaviors with different weights. We will present afew concrete examples.

1) Cartesian space accelerations: Since

x = J(q)q + J(q, q)q,

we can penalize deviation from desired Cartesian spaceaccelerations using

Acart =[J(q) 0 0

]bcart = x∗ − J(q, q)q.

(1)

This is useful for task level controls such as COM,swing foot or hand tracking. For COM and swing foot, thedesired accelerations are specified by the high level trajectoryoptimizer described in Section II. For the stance foot, thedesired accelerations are set to zero. Rather than treating thestance foot accelerations as hard constraints, we find thatusing a soft penalty with high weights is generally morestable and faster to solve.

2) Net external force and center of mass motion: The re-lationship between COM linear acceleration and net externalforce is

Flin = m(xcom +[0 0 g

]T),

where m is the total mass, and g is the gravitational accel-eration. Net torque and change in angular momentum L arerelated in a similar way

Fang + (xcontact − xcom)× Flin = Iωcom,

where I is the total moment of inertia, and ωcom is theangular acceleration. We can compute

AcomF =[0 0

[I3×3 0

]]bcomF = m(x∗com +

[0 0 g

]T)

and

AcomTau =[0 0

[Cross(xcontact, xcom) I3×3

]]bcomTau = Iω∗com.

I3×3 is a 3 by 3 identity matrix. x∗com and ω∗com are desiredlinear and angular accelerations. Cross(xcontact, xcom) isthe matrix representing a cross product between (xcontact−xcom) and Flin.

3) Center of pressure tracking: Given the forces andtorques, bM,b F , specified in foot frame, the location of thecenter of pressure in the foot frame is

p =

[−bMy/

bFzbMx/

bFz

].

We can penalize center of pressure deviation with

Acop =

[0 0

[0 0 p∗x 0 1 00 0 p∗y −1 0 0

] [R 00 R

]]bcop = 0,

where (p∗x, p∗y) is the desired center of pressure in foot frame,

and R is the rotation matrix from the world frame to the footframe.

4) Weight distribution: In double support, it is oftendesirable to specify the desired weight distribution w∗ =Fzl/(Fzl +Fzr). We add this term to the cost function with

Aw =[0 0 Iw

]bw = 0,

where Iw is a row vector with zeros, except Iw(3) = 1−w∗and Iw(9) = −w∗.

5) Direct tracking and regularization: We can also di-rectly penalize X from desired values with

Astate =[I I I

]bstate =

[q∗ 0 0

]T,

where q∗ is the desired joint accelerations. This is usefulfor directly controlling the upper body joints, as well asregularizing X to make the QP problem well conditioned.

The final cost function uses the vertically concatenated Asand bs each multiplied with their corresponding weights, ws.

A =

wcartAcart

wcomFAcomF

...

, b = wcartbcartwcomF bcomF

...

Weights are summarized in Table I. wqdd is for joint

acceleration. wcom and wutorsowd are for COM position

TABLE IWEIGHTS FOR QP COST FUNCTION

wqdd wcomdd wutorsowd wfootdd

10−1 6× 10−1 1 1

wcomF wcomTau wregF wregTau

10−2 10−2 10−6 10−6

ww wcop wFd wTaud

10−3 8× 10−3 10−3 10−1

acceleration and upper torso orientation acceleration. wfootdd

is for foot position and orientation acceleration. wcomF andwcomTau are for net external force and torque. wregF andwregTau are regularization weights for contact force and jointtorques. ww is for weight distribution. wcop is for center ofpressure. wFd and wTaud penalize changes in contact forceand joint torques between two consecutive time steps.

C. Constraints

Torque limits can be easily added into the inequalityconstraints. Friction constraints are approximated by

|bFx| ≤ µbFz

|bFy| ≤ µbFz.

The center of pressure also has to be under the feet, whichcan be written as

d−x ≤ −bMy/bFz ≤ d+x

d−y ≤b Mx/bFz ≤ d+y ,

where bF and bM denotes forces and torques in the footframe, and d− and d+ are the sizes of foot. Equations ofmotion are treated as equality constraints in the QP problem.

IV. IMPLEMENTATION

A. Trajectory replanning

The high level trajectory planner replans whenever anew step is taken. It optimizes a COM trajectory for 3consecutive steps, although we will only use the trajectorythat corresponds to the first step. Because sometimes DDPcannot finish in one time step (1ms), we run DDP for thenext 3 foot steps on a separate thread, and use the trajectorythat was computed most recently in the low level controller.As for runtime, DDP completes within 1 to 50ms (dependingon terrain complexity), and QP finishes within 0.5ms.

B. Desired acceleration for the low level controller

We store the local linear policies in addition to theCOM trajectory, and use them to compute the desired COMaccelerations as an input to the low level controller. Since wehave a spline representation (quintic for position, and slerpfor orientation) for the swing foot trajectory, we can computeits desired acceleration with

xswing = −Kp(x− x∗)−Kd(x− x∗) + x∗,

where x∗, x∗, x∗ are on the spline. Upper body desiredjoint accelerations are also computed in a similar PD-servofashion.

C. Body orientation tracking

In practice, we discover that controlling upper torso ori-entation is better than directly controlling pelvis orientationbecause this allows the low level controller to use the spinejoints in addition to the leg joints. Upper torso orientationtracking is also done with a Jacobian similar to COMposition tracking.

D. Foot step orientation

We only require foot step yaw angle from the foot stepplanner. To generate a desired foot step orientation, we usethe specified yaw angle, assume 0 roll angle, and estimatethe pitch angle by relative height change from consecutivefoot steps. On rough terrain, orientation tracking is disabledon the landing foot once we detect a substantial Fz on theforce sensor.

E. Stance leg damping

In DARPA’s Virtual Robotics Challenge, rough terrain isrepresented by a triangular mesh. If the robot steps onto apointed region, the foot contact geometry no longer remainsrectangular, and the foot will “rock” on the “spike”. Wehandle this problem by having explicit joint damping on thestance leg, and adding a term to the QP’s cost function thatpenalizes large torque changes from previous time step.

F. Joint limits

When some joint is close to its limit, we command a largeacceleration in the other direction to push it back with

q∗ = −10 tan(π

(q − qlower

qupper − qlower− 0.5

)).

G. State estimation

The proposed controller needs estimates of the root posi-tion, velocity, orientation, angular velocity and all the jointpositions and velocities. All this information is gathered byseveral state estimators, which are discussed in depth in [14].

V. RESULTS

The proposed walking controller is tested on a simulatedBoston Dynamics’ Atlas robot in DARPA’s Virtual RoboticsChallenge setting. The Gazebo simulator is produced by theOpen Source Robotics Foundation. We run the simulationat real time on a computer with Intel Xeon(R) CPU E5-2687W CPU with 32G memory. This setup is necessaryfor simulating Atlas in the Gazebo simulator at real timealone. Our proposed controller uses two threads. One threadis dedicated to the QP solver running at 1KHz, and the othercomputes COM trajectory only when the robot has taken astep. The QP solver finishes within 0.5ms, and trajectoryoptimizer can take up to 50ms depends on the given footsteps.

1

1Video showing walking using our proposed controller: http://www.youtube.com/watch?v=lP52k_ZkOHM

http://www.youtube.com/watch?v=lP52k_ZkOHM

http://www.youtube.com/watch?v=lP52k_ZkOHM

92 93 94 95 96 97 98 99 100 101 1025

5.5

6

6.5

7

time [s]

x[m

]

comcomd

flfldfrfrd

92 93 94 95 96 97 98 99 100 101 102−8.5

−8

−7.5

time [s]

y[m

]

comcomd

flfldfrfrd

92 93 94 95 96 97 98 99 100 101 1020

0.5

1

1.5

time [s]

z[m

]

comcomd

flfldfrfrd

(a) COM and foot position tracking on rough terrain

92 93 94 95 96 97 98 99 100 101 102−0.5

0

0.5

1

time [s]

xd[m

/s] comd

comdd

92 93 94 95 96 97 98 99 100 101 102−1

−0.5

0

0.5

time [s]

yd[m

/s] comd

comdd

92 93 94 95 96 97 98 99 100 101 102−0.5

0

0.5

1

time [s]

zd[m

/s] comd

comdd

(b) COM velocity tracking on rough terrain

92 93 94 95 96 97 98 99 100 101 102−5

0

5

time [s]

xdd

[m/s

] comddcomddd

92 93 94 95 96 97 98 99 100 101 102−5

0

5

time [s]

ydd

[m/s

] comddcomddd

92 93 94 95 96 97 98 99 100 101 102−10

−5

0

5

time [s]

zdd

[m/s

]

comddcomddd

(c) COM acceleration tracking on rough terrain

Fig. 4. All actual traces are plotted in solid lines, and desired tracesare shown in dashed lines. In 4(a), left foot traces are shown in red lines,right foot in green, and COM are plotted in blue. The reference foot pointsare set to the heel. 4(b) shows velocity tracking of COM. In 4(c), actualCOM acceleration are computed by finite differencing the velocity trace,and capped at ±3m/s2.

Fig. 5. Snapshots taken for walking on flat ground with 0.7m step length,and 0.8s period.

A. Flat ground walking

We are able to demonstrate toe push-off and heel-strikebehavior on flat floor walking. We have achieved a maximumstep length of 0.8m. Figure 5 shows a sequence of snapshotstaken for walking with 0.7m step length, and 0.8s period onflat ground. The maximum speed we have achieved is onaverage 1.14m/s with 0.8m step and 0.7s period on flatground.

B. Rough terrain

The proposed controller can handle up to 0.4 rad inclinedslopes, and continuously climb stair steps that are 0.2m highand 0.4m apart. We are able to successfully walk on therough terrain environment provided in the Virtual RoboticsChallenge as well. In order to traverse rough terrain, an A*planner [15] is used to provide sequences of foot steps, whichare given as input to the proposed controller. The high levelcontroller takes the foot steps and generates desired footand COM trajectories for the low level controller to track.Figure 4 shows tracking performance for a segment of theexperiment.

VI. DISCUSSION AND FUTURE WORK

The most important contribution of this work is demon-strating that we can perform online trajectory optimizationand inverse dynamics in real time to handle rough terrainand achieve close to human like step length and speed on a3D full size humanoid robot. Reasoning about COM motionalone is enough to guide the inverse dynamics controller,which is a greedy optimizer, through moderate rough terrainwith height and surface normal changes. Another importantpoint is that our approach does not impose a predefinedcriteria on the set of allowable foot steps since trajectoriesare optimized online. Control and foot step planning canbe treated as almost completely independent problems. Thisfeature allows us to develop the foot step planner separately.Another interesting discovery was explicitly adding toe push-off and heel-strike between single stance phases. We have

shown that, with a very simple-minded design, we canachieve these behaviors and push step length and walkingspeed closer to those of humans. We will further explorethis issue in future work.

For flat ground walking, the proposed controller is main-taining constant COM height and torso pitch angle through-out the entire walking cycle, which is undesirable for achiev-ing higher speed and larger step length. We think the pointmass model is also limiting performance since it ignoresangular momentum and swing leg dynamics completely.The current trajectory optimizer implementation uses lessthan 10% of its allowable computation time. An immediateline of research is to expand the COM model to a moresophisticated one. We can include angular momentum andorientation [16], switch to a 3D Spring Loaded InvertedPendulum (SLIP) model [17] to capture natural arc likemotion observed in human walking, or add a telescoping legto capture some simple notion of stance leg configuration[18]. Another direction would involve simultaneously opti-mizing many trajectories. Apart from changing models, wecan also take the foot steps (location and timing) as part ofthe optimization. These choices can be treated as parametersin DDP, and be optimized along with the trajectory. Weimagine the foot step planner will hand us foot steps alongwith viable regions, and the trajectory optimizer will havethe freedom to pick the foot steps that are most suitable.This will be one way to have the planner implicitly takingrobot dynamics into consideration.

Our low level controller only greedily optimizes for thecurrent time step given a set of desired values to track. Wecould potentially add some form of a value function as part ofthe optimizing criteria to incorporate a notion of the future inthe low level controller. This value function can come fromperforming full-blown DDP on a periodic walking pattern[19], or once again from the trajectory optimization part.

Our biggest concern in applying this approach to realhardware is handling modeling error. One option is to adopta multiple model approach [20] during optimization andsimultaneously consider a distribution of models.

VII. CONCLUSIONS

We have designed a real time walking controller for a3D full size humanoid robot with online optimization. Thelow level controller solves full body floating base inversedynamics by formulating it as a Quadratic Programmingproblem. The high level controller guides it with trajectoriesthat are optimized online using simplified models of therobot. We have successfully applied this approach to bothflat ground and rough terrain walking, which is demonstratedusing a simulated Boston Dynamics’ Atlas robot. We havealso achieved more human like foot step length and walkingspeed by explicitly adding toe push-off and heel-strike in avery simple way.

ACKNOWLEDGEMENT

This material is based upon work supported in part bythe US National Science Foundation (ECCS-0824077, and

IIS-0964581) and the DARPA M3 and Robotics Challengeprograms. Special thanks to Eric Whitman for his help inimplementation and enlightening discussions.

REFERENCES

[1] S. Song and H. Geyer, “Generalization of a muscle-reflex controlmodel to 3d walking,” in IEEE Engineering in Medicine and BiologySociety (EMBC’13), 2013.

[2] J. M. Wang, S. R. Hamner, S. L. Delp, and V. Koltun, “Optimizinglocomotion controllers using biologically-based actuators and objec-tives,” in ACM Transactions on Graphics, 2012, pp. Vol. 31, No. 4,Article 25.

[3] J. M. Wang, D. J. Fleet, and A. Hertzmann, “Optimizing walking con-trollers for uncertain inputs and environments,” in ACM Transactionson Graphics, 2010, pp. Vol. 29, No. 4, Article 73.

[4] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi,and H. Hirukawa, “Biped walking pattern generation by using previewcontrol of zero-moment point,” in Robotics and Automation, 2003.Proceedings. ICRA ’03. IEEE International Conference on, vol. 2,2003, pp. 1620–1626 vol.2.

[5] T. Takubo, Y. Imada, K. Ohara, Y. Mae, and T. Arai, “Rough terrainwalking for bipedal robot by using ZMP criteria map,” in Roboticsand Automation, 2009. ICRA ’09. IEEE International Conference on,2009, pp. 788–793.

[6] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming.Elsevier, 1970.

[7] J. Pratt, J. Carff, S. Drakunov, and A. Goswami, “Capture point: Astep toward humanoid push recovery,” in Humanoid Robots, 2006 6thIEEE-RAS International Conference on, 2006, pp. 200–207.

[8] Y. Ogura, K. Shimomura, H. Kondo, A. Morishima, T. Okubo,S. Momoki, H. ok Lim, and A. Takanishi, “Human-like walking withknee stretched, heel-contact and toe-off motion by a humanoid robot,”in Intelligent Robots and Systems, 2006 IEEE/RSJ International Con-ference on, 2006, pp. 3976–3981.

[9] B. Stephens, “Push recovery control for force-controlled humanoidrobots,” Ph.D. dissertation, The Robotics Institute, Carnegie MellonUniversity, Pittsburgh, August 2011.

[10] E. Whitman and C. Atkeson, “Control of instantaneously coupledsystems applied to humanoid walking,” in Humanoid Robots (Hu-manoids), IEEE-RAS International Conference on, 2010, pp. 210–217.

[11] M. Mistry, J. Buchli, and S. Schaal, “Inverse dynamics control offloating base systems using orthogonal decomposition,” in Roboticsand Automation (ICRA), 2010 IEEE International Conference on,2010, pp. 3406–3412.

[12] L. Righetti, J. Buchli, M. Mistry, M. Kalakrishnan, and S. Schaal,“Optimal distribution of contact forces with inverse dynamics control,”International Journal of Robotics Research, pp. 280–298, 2013.

[13] M. Hutter, M. Hoepflinger, C. Gehring, C. D. R. M. Bloesch, andR. Siegwart, “Hybrid operational space control for compliant leggedsystems,” in Proc. of the 8th Robotics: Science and Systems Conference(RSS), 2012.

[14] X. Xinjilefu, S. Feng, W. Huang, and C. Atkeson, “Decoupled stateestimation for humanoid using full-body dynamics,” in InternationalConference on Robotics and Automation, 2014, p. submission.

[15] W. Huang, S. Feng, X. Xinjilefu, and C. Atkeson, “Autonomouspath planning of a humanoid robot on unknown rough terrain,”in International Conference on Robotics and Automation, 2014, p.submission.

[16] E. C. Whitman, B. J. Stephens, and C. G. Atkeson, “Torso rotationfor push recovery using a simple change of variables,” in IEEE-RASInternational Conference on Humanoid Robots, 2012.

[17] A. Wu and H. Geyer, “The 3-D spring-mass model reveals a time-based deadbeat control for highly robust running and steering inuncertain environments,” Robotics, IEEE Transactions on, vol. PP,no. 99, pp. 1–11, 2013.

[18] P. A. Bhounsule, “Gait planning and control of a compass gait walkerbased on energy regulation using ankle push-off and foot placement,”in International Conference on Robotics and Automation, 2014, p.submission.

[19] C. Liu, C. G. Atkeson, and J. Su, “Biped walking control using atrajectory library,” Robotica, 2012.

[20] E. Whitman and C. Atkeson, “Multiple model robust dynamic pro-gramming,” in American Control Conference (ACC), 2012, pp. 5998–6004.

3d walking based on online optimization - cs.cmu.edusfeng/sf_hum13.pdf · 3d walking based on...

Documents