an operational space control approach - isir · 2013-09-27 · an operational space control...

An Operational Space Control approach


to Motor Adaptation

Olivier Sigaud

Université Pierre et Marie Curie, PARIS 6http://people.isir.upmc.fr/sigaud

September 24, 2013

1 / 49


Table of contents

Introduction

Mechanical models

Learning methods

Control

Learning Robotics models

Results

Perspectives

2 / 49


Introduction

Learning one's body

I Babies don't know well their body

3 / 49


Introduction

Motor adaptation

I Adapting one's body model (kinematics, dynamics, ...) under changingcircumstances

4 / 49


Introduction

Motor adaptation: standard experiment

I Standard view: Motor adaptation results from learning a model of thedynamics

5 / 49


Introduction

Outline: goals of this class

Learning interaction models

I Impossible to model unknown objects

I Studying potential of supervised learning algorithms to control robots inunpredictable situations

I Presenting the tools (statistical learning + control framework) to give abasic account of motor adaptation

6 / 49


Mechanical models

Kinematics: some Mechanics

ξx = l1cos(q1) + l2cos(q2) + l3cos(q3)

ξy = l1sin(q1) + l2sin(q2) + l3sin(q3)

forwardkinematics

inversekinematics

ξ: operational positionq: articular position

7 / 49


Mechanical models

Velocity kinematics - Jacobian

νx = −l1sin(q1)q1 − l2sin(q2)q2 − l3sin(q3)q3

νy = l1cos(q1)q1 + l2cos(q2)q2 + l3cos(q3)q3

forwardvelocity

kinematics

inversevelocity

kinematics

q: articular velocityν: operational velocity

ν = J (q) q

8 / 49


Mechanical models

Dynamics: where forces come into play

inverse

dynamics

forward

dynamics

k

k+1

k+1

k

k

Forward and inverse dynamics (Lagrange or Newton-Euler equations)

q = A (q)−1 (τ − n (q, q)− g (q)− ε (q, q) + τ ext)

τ = A (q) q + n (q, q) + g (q) + ε (q, q)− τ ext

A: inertia matrixn: Coriolis and centrifugal e�ectsg: gravityε: unmodeled e�ectsτext: external forces

q: articular positionq: articular velocityq: articular accelerationτ : torques

9 / 49


Learning methods

Learning mechanical models = function approximation

?

I A mechanical model is a mapping between an input space and an outputspace. It can be seen as a collection of functions (one per outputdimension)

I Forward kinematics: ξ = J (q) q → ξ = F (q, q)

I Forward dynamics: q = A (q)−1 (Γ− n (q, q))→ q = G(q, q,Γ)

I Regression methods can approximate such functions

10 / 49


Learning methods

Incremental and iterative regression

I A model can be learned incrementally and iteratively from samples

I Incremental: the model is improved each time a new data point is received

I Iterative: the model Mi+1 is improved from Mi (e.g. gradient descent)

I Standard approach: linear approx. with (Recursive) Least Squares (RLS)

I The methods presented below are extensions to the non-linear case

11 / 49


Learning methods

Radial Basis Function Networks

−5 0 5 10 15−3

−2

−1

0

1

2

3

4

5

I Linear combination of Gaussian functions

I The weights are tuned so as to approximate the function

I Using some incremental/iterative approach (RLS, gradient descent...)

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control,

Signals, and Systems (MCSS), 2(4):303�314.

12 / 49


Learning methods

From Least Squares to Regularized Least Squares

I One way to tune weights is (Recursive) Least Squares (incremental anditerative)

I Optimize with lower weights (sacri�ce optimality):

I Regularized Least Squares w∗ = arg minw

λ2‖w‖2 + 1

2‖Y −XTw‖2

I Analytical solution: w∗ = (λI +XTX)−1XTY.

I Iterative and incremental computation of (λI +XTX)−1 throughCholesky decomposition

13 / 49


Learning methods

Incremental Receptive Fields Regularized Least Squares

W

W

W

W

1

2

3

4

I Approximate the function through its (approximate) Fourier transform

using random features zk(Xi) =√

2√Dcos(ωTkXi + bk), with

ωk ∼ N (0, 2γI) and bk ∼ U(0, 2π).I As RBFNs, but with K cosinus features → global versus localI Provides a strong grip against over-�tting (ignoring the high frequencies)I In practice, e�cient for large enough K, and easy to tune

Gijsberts, A. & Metta, G. (2011) �Incremental learning of robot dynamics using random features.� In IEEE

International Conference on Robotics and Automation (pp. 951�956).

14 / 49


Learning methods

Locally Weighted Regression

I Linear models are tuned with Least Squares

I Their importance is represented by a Gaussian function

Atkeson, C. (1991). Using locally weighted regression for robot learning. In Proceedings of the IEEE

International Conference on Robotics and Automation (ICRA), vol. 2, pp. 958�963.

15 / 49


Learning methods

LWR approximation: graphical intuition

wk(x) = e−12

(x−ck)TD(x−ck)

I Each RF tunes a local linearmodel

Ψk(x) = βk0 +R∑i=1

βki ski (x)

I Gaussians tell you how much eachRF contributes to the output

y(x) =

∑ki=1 wk(x)Ψk(x)∑k

i=1 wk(x)

I The global output (green line) is a weighted combination of linear models(straight lines)

16 / 49


Learning methods

LWPR: general goal

I Non-linear function approximation in very large spaces

I Using PLS to project linear models in a smaller space

I Good along local trajectories

Schaal, S., Atkeson, C. G., and Vijayakumar, S. (2002). Scalable techniques from nonparametric statistics

for real time robot learning. Applied Intelligence, 17(1):49�60.

17 / 49


Learning methods

GMR

ynew =K∑k=1

hk(Xnew)(µk,Y + Σk,Y XΣ−1k,Y (Xnew − µk,X))

Avec:

µk = [µTk,X , µTk,Y ]T and Σk =

(Σk,X Σk,XY

Σk,Y X Σk,Y X

)Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008) �Dynamical system modulation for robot learning

via kinesthetic demonstrations.� IEEE Transactions on Robotics, 24(6), 1463�1467.

18 / 49


Learning methods

XCSF: overview

I XCSF is a Learning Classi�er System [Holland, 1975]

I Linear models weighted by Gaussian functions (similar to LWPR)

I Linear models are updated using RLS

I Gaussian functions adaptation: Di and ci are updated using a GA

I Key feature: distinguish gaussian weights space and linear models space

LWPR: ξ =∑wi(q)F (i)yi(q)∑wi(q)F (i) XCSF:

ˆξ =

∑wi(q)F (i)yi(q)∑wi(q)F (i)

I Condensation: reduces population of local models to perform better ingeneralization

Wilson, S. W. (2001). Function approximation with a classi�er system. In Proceedings of the Genetic and

Evolutionary Computation Conference (GECCO-2001), pages 974�981, San Francisco, California, USA.Morgan Kaufmann.

19 / 49


Learning methods

LWR methods: main features

Algo LWR LWPR GMR XCSF

Number of RFs �xed growing �xed adaptivePosition of RFs �xed �xed adaptive adaptiveSize of RFs �xed adaptive adaptive adaptive

20 / 49


Learning methods

Learning methods survey: quick overview

−5 0 5 10 15−3

−2

−1

0

1

2

3

4

5

W

W

W

W

1

2

3

4

I Tuning weights (local/global features) vs Gaussian combination of linearmodels

I LWPR: PLS, fast implementation, the reference method

I XCSF: distinguish gaussian weights space and linear models space

I GMR: few features

I ISSGPR: easy tuning, no over-�tting

Sigaud. O. , Salaün, C. and Padois, V. (2011) �On-line regression algorithms for learning mechanical models

of robots: a survey,� Robotics and Autonomous Systems, 59:1115-1129.

21 / 49


Control

Resolve Motion Rate Control (Whitney 1969)

Planning InverseKinematics

InverseDynamics

I Also called CLIK (Closed Loop Inverse Kinematics)

I From task to torques

I Three steps architecture

I Trajectory generationI Inverse Kinematics and redundancyI Inverse Dynamics

22 / 49


Control

Resolve Motion Rate Control - Trajectory generation

Planning InverseKinematics

InverseDynamics

ξ: operational position

ξ†: desired operational position

ν?: desired operational velocity

First step, create a goal attractor. ν? = Kp

(ξ† − ξ

)23 / 49


Control

Resolve Motion Rate Control - Inverse kinematics

q: articular position

ν: operational velocity

q: articular velocity

Second step, inverse the kinematics. ν = J (q) q → q? = J (q)+ ν?

24 / 49


Control

Resolve Motion Rate Control - Inverse kinematics

InverseDynamics

q: articular position

ν: operational velocity

q: articular velocity

Second step, inverse the kinematics. ν = J (q) q → q? = J (q)+ ν?

24 / 49


Control

Control redundancy

q? = J (q)+ ν? q? = J1 (q)+ ν?1 + (J2 (q)PJ1)+ν?2

I redundancy : more actuated degrees of freedom than those necessary torealise a task

I PJ is a projector used to control redundancy

I necessary to have access to J to compute PJ

25 / 49


Control

Resolve Motion Rate Control - Inverse Dynamics

InverseDynamics

Γ: torques

M : inertia matrix

b: Coriolis and centrifugal e�ects

g: gravity

ε: unmodeled e�ects

Γext: external forces

Third step, compute the inverse dynamicsτ con = M (q) q? + b (q, q) + g (q) + ε (q, q)− τ ext

τ con = ID (q, q, q?)

26 / 49


Control

Resolve Motion Rate Control - Inverse Dynamics

Γ: torques

M : inertia matrix

b: Coriolis and centrifugal e�ects

g: gravity

ε: unmodeled e�ects

Γext: external forces

Third step, compute the inverse dynamicsτ con = M (q) q? + b (q, q) + g (q) + ε (q, q)− τ ext

τ con = ID (q, q, q?)

26 / 49



Learning mechanical models

I Forward kinematics: ξ = Fθ(q, q) (ξ = J (q) q)

I Forward dynamics: q = Gθ(q, q,Γ) q = A (q)−1 (Γ− n (q, q))

I Regression methods can approximate such functions

I The mapping can be learned incrementally from samples

I Can be used for interaction with unknown objects or users

27 / 49



Learning inverse kinematics with LWPR

I The model is learned with randommovements along an operationaltrajectory

I Input dimension: dim(ξ + q) = 29

I Output dimension: dim(q) = 26

D'Souza, A., Vijayakumar, S., and Schaal, S. (2001b). Learning inverse kinematics. In Proceedings of the

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 1, pages 298�303.

28 / 49



Learning forward/inverse velocity kinematics with LWPR

Forward kinematics Inverse kinematics

I Learning inverse kinematics is conceptually simpler

I But one loses the opportunity to make pro�t of redundancy

I Rather learn forward kinematics and inverse it

29 / 49



Learning forward velocity kinematics with LWPR/XCSF

Forward kinematics with LWPR Forward kinematics with XCSF

I Learning the forward velocity kinematics of a Kuka kr16 in simulation.

I They add a constraint to inverse the kinematics and determine the jointvelocities.

Butz, M., Pedersen, G., and Stalph, P. (2009). Learning sensorimotor control structures with XCSF:

redundancy exploitation and dynamic control. In Proceedings of the 11th Annual conference on Genetic andevolutionary computation, pages 1171�1178. ACM.

30 / 49



Learning dynamics with XCSF

I Learning dynamics is more di�cult

I In dynamics, there is no redundancy

I The dynamics model is 2/3 smaller with XCSF than with LWPR

31 / 49



Learning inverse dynamics with LWPR

I The model is learned along anoperational trajectory

I Input dimension:dim(q + q + q) = 90

I Output dimension: dim(Γ) = 30

I 7, 5.106 training data points and2200 receptive �elds

Vijayakumar, S., D'Souza, A., and Schaal, S. (2005). LWPR: A scalable method for incremental online

learning in high dimensions. Technical report, Edinburgh: Press of University of Edinburgh.

32 / 49



Learning inverse dynamics

q = A (q)−1 (τ − n (q, q)− g (q)− ε (q, q) + τext)

Learn Predict

Learning inverse dynamics Predict inverse dynamicswith random movements

along a trajectory

33 / 49



Learning inverse operational dynamics

I Peters and Schaal (2008) learninverse dynamics in theoperational space.

I The model is learned along anoperational trajectory.

I Input dimension :dim(q + q + ν) = 17

I Output dimension: dim(Γ) = 7

Peters, J. and Schaal, S. (2008). Learning to control in operational space. International Journal in Robotics

Research, 27(2):197�212.34 / 49



Optimal control with dynamics learned with LWPR

iLQG u plantlearneddynamics model

+

feedbackcontroller

x, dx

L, x

u

u

perturbationsxcost function(incl. target)

δ

-

- u +-uδ

I The inverse dynamics model is learned in thewhole space.

I Input dimension :dim(q + q + u) = 10 . Outputdimension : dim(q) = 2.

I 1, 2.106 training data points and 852 receptive�elds

I Learning a model of redundant actuationShoulder

Elbow

x

y

q1

q2

1

2

3

4

5

6

Mitrovic, D., Klanke, S., and Vijayakumar, S. (2008). Adaptive optimal control for redundantly actuated

arms. In Proceedings of the Tenth International Conference on Simulation of Adaptive Behavior.

35 / 49



Properties of models

I [D'Souza et al., 2001], [Vijayakumar et al., 2005] and [?] learn kinematicsand dynamics along a trajectory.

I [Butz et al., 2009] learn kinematics in the whole space but do not makepro�t of redundancy to combine several tasks.

I [Mitrovic et al., 2008] learn dynamics in the whole space to controlredundant actuators.

36 / 49



Camille Salaün's work: combining tasks

To perform several tasks with learnt models, we have chosen to

I learn separately forward kinematics and inverse dynamics

I use classical mathematical inversion to resolve redundancy

I learn models on whole space

I use LWPR and XCSF as learning algorithms

37 / 49


Results

Learning kinematics with LWPR

Point to point task500 steps babbling with the kinematics model we want to learn.

38 / 49


Results

Controlling redundancy with LWPR

compatible task incompatible task

39 / 49


Results

Learning kinematics of iCub in simulation

I Simulation of a three degrees of freedom shoulder plus one degrees offreedom elbow

40 / 49


Results

Learning kinematics on the real robot

iCub realising two tasks: following a circle and clicking a numpad

41 / 49


Results

Inverse dynamics and motor adaptation

Applying a vertical force after 2 seconds during a point to point task.

42 / 49


Results

Inverse dynamics and after e�ects

0.3

0.35

Releasing the force after 2 seconds during a point to point task.I We reproduce Shadmehr's experiments

43 / 49


Results

Learning dynamics

I Simulation of a three degrees of freedom planar arm

44 / 49


Results

Learning forward models

0.10.2

0.3

0.4

0.20.10.2

0.3

0.4

0.2 0.10.2

0.3

0.4

0.2

0.10.2

0.3

0.4

0.20.10.2

0.3

0.4

0.2 0.10.2

0.3

0.4

0.2

0.10.2

0.3

0.4

0.20.10.2

0.3

0.4

0.2 0.10.2

0.3

0.4

0.2

I For complex robots, the CAD model is not so accurate (calibration issue)

Sicard, G., Salaün, C., Ivaldi, S., Padois, V., and Sigaud, O. (2011) Learning the velocity kinematics of icub

for model-based control: XCSF versus LWPR. In Proceedings Humanoids 2011, pp. 570-575.

45 / 49


Results

Comparing algorithms

I Main di�culty: tuning parameters for fair comparison

I Many speci�c di�culties for robotics reproducibility

Droniou, A., Ivaldi, S., Padois, V., and Sigaud, O. (2012) Autonomous Online Learning of Velocity

Kinematics on the iCub: a Comparative Study. In IROS 2012, to appear

46 / 49


Perspectives

Motor adaptation and the cerebellum

s

s

MGD

x(t)

I Structural similarity between LWPR-like algos and cerebellum: PurkinjeCells = receptive �elds

I + the problem of state estimation over time given delays

47 / 49


Perspectives

Learning dynamical interactions with objects

I Using a force/torque sensor to detect exerted force on shoulder

I Using arti�cial skin to detect contact points

I Compliant control of motion (CODYCO EU project)

I Learning high-dimensional models

48 / 49


Perspectives

Any question?

49 / 49


Perspectives

Atkeson, C. (1991).

Using locally weighted regression for robot learning.In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages958�963.

Butz, M., Pedersen, G., and Stalph, P. (2009).

Learning sensorimotor control structures with XCSF: redundancy exploitation and dynamic control.In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 1171�1178.ACM.

Cybenko, G. (1989).

Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals, and Systems (MCSS), 2(4):303�314.

Droniou, A., Ivaldi, S., Padois, V., and Sigaud, O. (2012).

Autonomous online learning of velocity kinematics on the icub: a comparative study.In Proceedings IROS, page in press, Portugal.

D'Souza, A., Vijayakumar, S., and Schaal, S. (2001).

Learning inverse kinematics.In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),volume 1, pages 298�303.

Hersch, M., Guenter, F., Calinon, S., and Billard, A. (2008).

Dynamical system modulation for robot learning via kinesthetic demonstrations.IEEE Transactions on Robotics, 24(6):1463�1467.

Holland, J. H. (1975).

Adaptation in Natural and Arti�cial Systems: An Introductory Analysis with Applications to Biology,Control, and Arti�cial Intelligence.University of Michigan Press, Ann Arbor, MI.

Mitrovic, D., Klanke, S., and Vijayakumar, S. (2008).

49 / 49


Perspectives

Adaptive optimal control for redundantly actuated arms.In Proceedings of the Tenth International Conference on Simulation of Adaptive Behavior, pages 93�102.

Schaal, S., Atkeson, C. G., and Vijayakumar, S. (2002).

Scalable techniques from nonparametric statistics for real time robot learning.Applied Intelligence, 17(1):49�60.

Sicard, G., Salaun, C., Ivaldi, S., Padois, V., and Sigaud, O. (2011).

Learning the velocity kinematics of icub for model-based control: XCSF versus LWPR.In Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots, pages 570 � 575,Bled, Slovenia.

Sigaud, O., Salaun, C., and Padois, V. (2011).

On-line regression algorithms for learning mechanical models of robots: a survey.Robotics and Autonomous Systems, 59(12):1115�1129.

Vijayakumar, S., D'Souza, A., and Schaal, S. (2005).

LWPR: A scalable method for incremental online learning in high dimensions.Technical report, Edinburgh: Press of University of Edinburgh.

Wilson, S. W. (2001).

Function approximation with a classi�er system.In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 974�981,San Francisco, California, USA. Morgan Kaufmann.

49 / 49

an operational space control approach - isir · 2013-09-27 · an operational space control...

Documents