Symplectic and Subriemannian Geometry of Optimal

Transport

by

Paul Woon Yin Lee

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Mathematics

University of Toronto

Copyright © 2009 by Paul Woon Yin Lee

Abstract

Symplectic and Subriemannian Geometry of Optimal Transport

Paul Woon Yin Lee

Doctor of Philosophy

Graduate Department of Mathematics

University of Toronto

2009

This thesis is devoted to subriemannian optimal transportation problems. In the first part of the thesis, we consider cost functions arising from very general optimal control costs. We prove the existence and uniqueness of an optimal map between two given measures under certain regularity and growth assumptions on the Lagrangian, absolute continuity of the measures with respect to the Lebesgue class, and, most importantly, the absence of sharp abnormal minimizers. In particular, this result is applicable in the case where the cost function is the square of the subriemannian distance on a subriemannian manifold with a 2-generating distribution. This unifies and generalizes the corresponding Riemannian and subriemannian results of Brenier, McCann, Ambrosio-Rigot and Bernard-Buffoni. We also establish various properties of the optimal plan when abnormal minimizers are present.

The second part of the thesis is devoted to the infinite-dimensional geometry of optimal transportation on a subriemannian manifold. We start by proving the following nonholonomic version of the classical Moser theorem: given a bracket-generating distribution on a connected compact manifold (possibly with boundary), two volume forms of equal total volume can be isotoped by the flow of a vector field tangent to this distribution. Next, we describe formal solutions of the corresponding subriemannian optimal transportation problem and present the Hamiltonian framework for both the Otto calculus and its subriemannian counterpart as infinite-dimensional Hamiltonian reductions on diffeomorphism groups. Finally, we define a subriemannian analog of the Wasserstein metric on the space of densities and prove that the subriemannian heat equation defines a gradient flow on the subriemannian Wasserstein space with the potential given by the Boltzmann relative entropy functional.

The measure contraction property is one of the possible generalizations of Ricci curvature bounds to more general metric measure spaces. In the third part of the thesis, we discuss when a three dimensional contact subriemannian manifold satisfies such a property.

Acknowledgements

I would like to thank my advisor, Professor Boris Khesin, for his guidance, dedication, and invaluable advice throughout this project. I would also like to express my deep appreciation to Professor Andrei Agrachev, Professor Luigi Ambrosio, Professor Robert McCann and Professor Nassif Ghoussoub for various fruitful discussions and constant support. I am grateful to all the professors who taught me during my graduate studies. Specifically, I would like to convey my gratitude to Professor Velimir Jurdjevic for introducing me to the theory of optimal control and Professor Yael Karshon for teaching me symplectic geometry. Last but not least, I would like to thank the staff members, especially Ida Bulat, of the Mathematics Department at the University of Toronto for taking care of all my nonacademic problems.

Contents

1 Introduction

1.1 Part I: Optimal Transportation under Nonholonomic Constraints

1.2 Part II: A Nonholonomic Moser Theorem and Optimal Mass Transport

1.3 Part III: Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds

I Optimal Transportation under Nonholonomic Constraints

2 Background

2.1 Elementary Optimal Control Theory

2.2 Optimal Mass Transportation

3 Nonholonomic Optimal Transportation Problem

3.1 Existence and Uniqueness of an Optimal Map

3.2 Regularity of Control Costs

3.3 Applications: Mass Transportation on Subriemannian Manifolds

4 Optimal Transportation with Non-Lipschitz Cost

4.1 Normal Minimizers and Properties of Optimal Maps with Continuous Optimal Control Costs

4.2 Optimal Maps with Abnormal Minimizers

II A Nonholonomic Moser Theorem and Optimal Mass Transport

5 Classical and Nonholonomic Moser Theorems

6 Distributions on Diffeomorphism Groups

6.1 A Fibration on the Group of Diffeomorphisms

6.2 A Nonholonomic Distribution on the Diffeomorphism Group

6.3 Accessibility of Diffeomorphisms and Consequences

7 The Riemannian Geometry of Diffeomorphism Groups and Mass Transport

8 The Hamiltonian Mechanics on Diffeomorphism Groups

8.1 Averaged Hamiltonians

8.2 Riemannian Submersion and Symplectic Quotients

8.3 Hamiltonian Flows on the Diffeomorphism Groups

8.4 Hamiltonian Flows on the Wasserstein Space

9 The Subriemannian Geometry of Diffeomorphism Groups

9.1 Subriemannian Submersion

9.2 A Subriemannian Analog of the Otto Calculus

9.3 The Nonholonomic Heat Equation

III Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds

10 Revisiting Subriemannian Geometry

11 Generalized Curvatures

11.1 The General Case

11.2 The Three Dimensional Contact Case

12 Measure Contraction Properties

13 Isoperimetric Problems

IV Appendix

14 Proof of Pontryagin Maximum Principle for the Bolza Problem

15 Optimal Transportation and the Generalized Curvature

Bibliography

List of Figures

5.1 A nonholonomic Hodge decomposition.

6.1 The Moser theorem in both the classical and nonholonomic settings is a path-lifting property in the diffeomorphism group.

8.1 Hamiltonian flow of the Hamiltonian $H^M$ and its projection: the curve $\varphi_t(x)$ is the projection of the curve $\Psi_t^{H^M}(\nabla f(x))$ to the manifold M.

9.1 Subriemannian submersion: the horizontal subdistribution $T^{hor}$ is mapped isometrically to the tangent bundle TB of the base.

9.2 Projections of subriemannian geodesics from $(S^3, T)$ in the Hopf bundle give circles in $S^2$, only one of which, the equator, is a geodesic on the base $S^2$.

Chapter 1

Introduction

Imagine we have a pile of sand and we want to move it from one place to another in the most efficient way. This is an example of the optimal transportation problem of Monge, originally posed in 1781. Kantorovich in 1942 proposed to consider this problem together with its dual, which turned out to be linear. After this famous work of Kantorovich [23], significant progress on this problem came only in the 1990s. In 1991, Brenier proved in [16] that there is a unique optimal transport plan for the above problem when the efficiency of the transport plan is measured by the square of the Euclidean distance. Since then, various generalizations have been obtained for more general cost functions. These include the work of McCann [42], Ambrosio-Rigot [11], and Bernard-Buffoni [14]. In the first part of the thesis, we present a generalization of all the above to a more general type of cost function called an optimal control cost.

1.1 Part I: Optimal Transportation under Nonholonomic Constraints

Mathematically, the Monge-Kantorovich problem can be formulated as follows. The piles of sand are replaced by Borel probability measures µ and ν on a manifold M, and the transport plans between these measures are replaced by maps ϕ : M → M. The cost of transporting from a point x to another point y on the manifold is given by a function c : M × M → R. So, the total cost of the strategy ϕ is the average

$$\int_M c(x, \varphi(x))\,d\mu(x).$$

The main goal is to show existence and uniqueness of a map ϕ which minimizes the total

cost, as well as to find the optimal ϕ explicitly whenever possible. More precisely,

Problem 1.1 Find a map ϕ : M → M which pushes µ forward to ν and minimizes the following functional

$$\int_M c(x, \varphi(x))\,d\mu(x).$$
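
To make Problem 1.1 concrete, the following Python sketch treats a discrete analogue that is not taken from the thesis: µ and ν are assumed to be uniform measures on n distinct points of the real line, and the cost is c(x, y) = |x − y|². A map pushing µ forward to ν is then a bijection of the supports, so the minimization becomes a search over permutations. The helper names (`total_cost`, `brute_force_optimal_map`, `monotone_map`) and the sample points are illustrative assumptions.

```python
from itertools import permutations

def total_cost(xs, ys, perm):
    """Average cost of the map x_i -> y_perm[i] for c(x, y) = (x - y)^2."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

def brute_force_optimal_map(xs, ys):
    """Search all bijections between the supports (feasible only for small n)."""
    return min(permutations(range(len(ys))), key=lambda p: total_cost(xs, ys, p))

def monotone_map(xs, ys):
    """Match the k-th smallest source point to the k-th smallest target point."""
    src = sorted(range(len(xs)), key=lambda i: xs[i])
    dst = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(src, dst):
        perm[i] = j
    return tuple(perm)

# Hypothetical sample data: supports of the two uniform measures.
xs = [0.3, -1.0, 2.5, 0.9]
ys = [1.7, -0.4, 0.1, 3.0]
best = brute_force_optimal_map(xs, ys)
mono = monotone_map(xs, ys)
```

For this strictly convex cost on the line, the brute-force minimizer coincides with the monotone matching, which gives a small-scale picture of the existence and uniqueness statements discussed in this chapter.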

In 1942 Kantorovich studied a relaxed version of the above problem in his famous paper [33]. However, a major step toward solving the original problem was not achieved until the work of Brenier in 1991. In [16] Brenier proved the existence and uniqueness of an optimal map in the case where $M = \mathbb{R}^n$ and the cost function c is given by $c(x, y) = |x - y|^2$. Later, this was generalized by McCann [42] to the case of a closed Riemannian manifold M with the cost given by the square of the Riemannian distance, $c(x, y) = d^2(x, y)$. Recently, Bernard and Buffoni [14] generalized this further to the case where the cost c is the action associated to a Lagrangian function L : TM → R on a compact manifold M. More precisely, the cost is given by

$$c(x, y) = \inf \int_0^1 L(x(t), \dot{x}(t))\,dt, \tag{1.1}$$

where the infimum is taken over all curves x(·) joining the points x and y, and the Lagrangian L is fibrewise strictly convex with superlinear growth. Note that if g is a Riemannian metric with the corresponding Riemannian distance function d and the Lagrangian L is defined by L(v) = g(v, v), then the cost function c is the square $d^2$ of the Riemannian distance function.
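
The last remark can be checked numerically in the simplest setting. The sketch below assumes that M is the Euclidean plane with L(v) = |v|², and approximates the action in (1.1) for three curves with the same endpoints; the function names and the sample curves are hypothetical choices made for illustration. By the Cauchy-Schwarz inequality the action is bounded below by the squared length of the curve, so only the constant-speed straight segment should attain |x − y|².

```python
import math

def action(curve, n=2000):
    """Approximate the integral over [0, 1] of |x'(t)|^2 dt on a polygonal approximation."""
    total = 0.0
    dt = 1.0 / n
    prev = curve(0.0)
    for k in range(1, n + 1):
        cur = curve(k * dt)
        speed_sq = sum((a - b) ** 2 for a, b in zip(cur, prev)) / dt ** 2
        total += speed_sq * dt
        prev = cur
    return total

x, y = (0.0, 0.0), (3.0, 4.0)  # assumed endpoints, |x - y|^2 = 25

def straight(t):
    """Constant-speed segment from x to y."""
    return tuple(a + t * (b - a) for a, b in zip(x, y))

def reparametrized(t):
    """Same image as the segment, but traversed with non-constant speed."""
    return straight(t * t)

def bent(t):
    """A different curve with the same endpoints."""
    px, py = straight(t)
    return (px, py + math.sin(math.pi * t))

a_straight = action(straight)
a_repar = action(reparametrized)
a_bent = action(bent)
```

In this example the straight segment has action close to 25, while the reparametrized and bent curves have strictly larger action, in line with the identification of the cost with the squared distance.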

In the first part of the thesis, we consider costs similar to (1.1). However, instead of

minimizing among all curves, the infimum is taken over a subcollection of curves, called


admissible paths. These paths are given by a control system and the corresponding cost

function is called the optimal control cost. More precisely, a control system is a smooth

fiber-preserving map F of a locally trivial bundle P → M over the manifold M into its

tangent bundle TM . If the fibres of the bundle P → M are diffeomorphic to a set U ,

then the map F : P → TM can be written locally as F : (x, u) ↦ F(x, u), where x

is in the manifold M and u is in the set U . We assume that U is a closed subset of a

Euclidean space. Admissible controls are measurable bounded maps u(·) from [0, 1] to

U , while admissible paths are Lipschitz curves x(·) which satisfy the equation

$$\dot{x}(t) = F(x(t), u(t)), \tag{1.2}$$

where u(·) is an admissible control. Let L : M × U → R be a Lagrangian. Then the corresponding cost c is given by

$$c(x, y) = \inf_{(x(\cdot), u(\cdot))} \int_0^1 L(x(t), u(t))\,dt, \tag{1.3}$$

where the infimum is taken over all admissible pairs (x(·), u(·)) : [0, 1] → M × U such that x(0) = x and x(1) = y.

In interesting cases the dimension of U is smaller than that of M , and yet any two

points of M can be connected by an optimal admissible path. In other words the control

system works as a nonholonomic constraint. The shortage of admissible velocities does

not allow us to recover an optimal path from its initial point and initial velocity and the

Euler–Lagrange description of the extremals does not work well. On the other hand, the

Hamiltonian approach remains efficient thanks to the Pontryagin maximum principle.

Another difficulty is the appearance of so-called abnormal extremals (singularities of the

space of admissible paths) which make the situation quite different from the classical

optimal transport problem.

In Chapter 2 we recall the necessary basic notions of optimal control theory and the theory of optimal mass transportation. In Section 3.1, using arguments from the theory of optimal mass transportation and the Pontryagin maximum principle of optimal control theory, we establish the existence and uniqueness of an optimal map under certain regularity assumptions. More precisely, the following theorem holds:

Theorem 1.2 [5] Under certain regularity and growth conditions, there exists a unique solution to the Monge-Kantorovich problem (Problem 1.1) with the cost function c given by an optimal control cost.

All of the assumptions of Theorem 1.2 are mild except the Lipschitz continuity of the cost function. However, this continuity is well known in all the cases mentioned after Problem 1.1. So, the theorem generalizes the work in [16, 42, 14].

In Section 3.2 we study the Lipschitz continuity of optimal control costs. There are

two types of minimizers to the cost in (1.3). They are called normal and abnormal

minimizers. If abnormal minimizers are absent, which is the case in [16, 42, 14], the cost

is not only Lipschitz but even semi-concave (see [17]). However, abnormal minimizers

are unavoidable in many interesting problems and, in particular, in all subriemannian

problems. It turns out that not all abnormal minimizers are dangerous. To keep the

Lipschitz property of the cost (although not its semi-concavity) it is sufficient to have

no sharp abnormal minimizers. Geometric control theory provides effective conditions for sharpness (see, for instance, [7, 9]). These conditions allow us to prove Lipschitz continuity for a large class of optimal control costs. This, in turn, proves the existence and uniqueness of an optimal map for the corresponding Monge-Kantorovich problem.

A subriemannian manifold is a manifold M equipped with a plane distribution ∆ ⊆ TM and an inner product g defined on ∆. The subriemannian distance d(x, y) between two points x and y is defined as the length of the shortest path joining the two given points and tangent to the distribution ∆. In Section 3.3 the optimal transportation problem on a subriemannian manifold with cost function c given by the square $d^2$ of the subriemannian distance is considered. In this case all the mild regularity assumptions are satisfied and we prove the following result:


Theorem 1.3 [5] If the distribution ∆ is 2-generating (i.e. vector fields in ∆ together with their Lie brackets span all tangent spaces), then the function $d^2$ is Lipschitz.

As a corollary we prove the existence and uniqueness of an optimal map for the

subriemannian optimal transportation problem with a 2-generating distribution. This

generalizes the corresponding result by Ambrosio-Rigot [11] on the Heisenberg group.

In Chapter 4 we prove certain properties of the optimal plan when abnormal minimiz-

ers are present. In Section 4.1 we consider flows whose trajectories are strictly abnormal

minimizers. We show that these flows cannot give an optimal plan for all “nice” initial

measures for a continuous cost. In contrast, in Section 4.2, we show that these flows

are indeed optimal for an important class of problems with discontinuous costs.

1.2 Part II: A Nonholonomic Moser Theorem and Optimal Mass Transport

The classical Moser theorem establishes that the total volume is the only invariant for a

volume form on a compact connected manifold with respect to the diffeomorphism action.

In the second part of the thesis we first prove a nonholonomic counterpart of this result

and present its applications to the problems of nonholonomic optimal mass transport.

The equivalence for the diffeomorphism action is often formulated in terms of “stability” of the corresponding object: the existence of a diffeomorphism relating the initial object with a deformed one means that the initial object is stable, as it differs from the deformed one merely by a coordinate change. Gray showed in [28] that contact structures on a compact manifold are stable. Moser [43] established stability for volume forms and symplectic structures. A leafwise counterpart of Moser’s argument for foliations was presented by Ghys in [27], while stability of symplectic-contact pairs in transversal foliations was proved in [13]. In this part we establish stability of volume forms in the presence of any bracket-generating distribution on connected compact manifolds. Recall that a distribution τ on the manifold M is called bracket-generating, or completely nonholonomic, if local vector fields tangent to τ and their iterated Lie brackets span the entire tangent bundle of the manifold M. The following theorem, which we call a nonholonomic Moser theorem, is proved.

Theorem 1.4 [34] Any two volume forms of equal total volume on a connected compact manifold can be isotoped by the flow of a time-dependent vector field tangent to a given bracket-generating distribution.

A version for manifolds with boundary is also proved.

Nonholonomic distributions arise in various problems related to rolling or skating,

wherever the “no-slip” condition is present. For instance, a ball rolling over a table

defines a trajectory in a configuration space tangent to a nonholonomic distribution of

admissible velocities. Note that such a ball can be rolled to any point of the table and

stopped at any a priori prescribed position. The latter is a manifestation of the Chow-

Rashevsky theorem (see e.g. [45]): For a bracket-generating distribution τ on a connected

manifold M any two points in M can be connected by a horizontal path (i.e. a path

everywhere tangent to the distribution τ). The motivation for considering volume forms

(or densities) in a space with a distribution can be related to problems with many tiny

rolling balls: It is more convenient to consider the density of such balls, rather than look

at them individually.
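
As an illustration of the bracket-generating condition, the following Python sketch considers a standard model of the Heisenberg group on R³ with the frame X1 = ∂x − (y/2)∂z, X2 = ∂y + (x/2)∂z. This choice of coordinates and frame is an assumption made here for illustration, not a construction taken from the text. The sketch approximates the Lie bracket [X1, X2] by central differences and checks that X1, X2 and [X1, X2] span the tangent space at a sample point.

```python
def X1(p):
    """Assumed horizontal field X1 = (1, 0, -y/2)."""
    x, y, z = p
    return (1.0, 0.0, -y / 2.0)

def X2(p):
    """Assumed horizontal field X2 = (0, 1, x/2)."""
    x, y, z = p
    return (0.0, 1.0, x / 2.0)

def directional_derivative(field, p, v, h=1e-5):
    """Central difference approximation of D(field)(p) applied to v."""
    plus = field(tuple(a + h * b for a, b in zip(p, v)))
    minus = field(tuple(a - h * b for a, b in zip(p, v)))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def lie_bracket(f, g, p):
    """[f, g](p) = Dg(p) f(p) - Df(p) g(p)."""
    dg_f = directional_derivative(g, p, f(p))
    df_g = directional_derivative(f, p, g(p))
    return tuple(a - b for a, b in zip(dg_f, df_g))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

p = (0.7, -1.3, 2.0)  # arbitrary sample point
bracket = lie_bracket(X1, X2, p)
frame_det = det3(X1(p), X2(p), bracket)
```

In this model the bracket is the vertical field ∂z, so the determinant is nonzero at every point and the distribution spanned by X1 and X2 is bracket-generating after one bracket.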

Note that for an integrable distribution there is a foliation to which it is tangent

and a horizontal path always stays on the same leaf of this foliation. Furthermore, for

an integrable distribution the existence of an isotopy between volume forms requires an

infinite number of conditions. On the contrary, the nonholonomic Moser theorem shows

that a non-integrable bracket-generating distribution imposes only one condition on total

volume of the forms for the existence of the isotopy between them.


Closely related to the nonholonomic Moser theorem is the existence of a nonholonomic Hodge decomposition. We describe the related properties of the subriemannian Laplace operator. We also formulate the corresponding nonholonomic mass transport problem and describe its formal solutions as projections of horizontal geodesics on the diffeomorphism group for the $L^2$-Carnot-Carathéodory metric.

In order to give this description, we present the Hamiltonian framework for what is now called the Otto calculus. We also present the Riemannian submersion picture for problems of optimal mass transport. It turns out that the submersion properties can be naturally understood as an infinite-dimensional Hamiltonian reduction on diffeomorphism groups, and this admits a generalization to the nonholonomic setting. A nonholonomic analog of the Wasserstein metric on the space of densities is defined. Finally, the following is an extension to the subriemannian setting of Otto’s fundamental result on the heat equation:

Theorem 1.5 The subriemannian heat equation defines a gradient flow on the nonholonomic Wasserstein space with potential given by the Boltzmann relative entropy functional:

$$\mathrm{Ent}(\rho) := \int_M \rho \log(\rho)\, \mu.$$
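
Theorem 1.5 concerns the subriemannian heat equation. As a loose finite-dimensional analogue only, the Python sketch below assumes a discrete heat equation on a cycle graph with uniform reference measure (an assumption of this example, not the setting of the theorem) and monitors the discrete entropy Σ ρ_i log ρ_i. It is meant to illustrate the expected monotonicity of Ent along the heat flow, not the gradient flow structure itself; the step size and initial density are hypothetical choices.

```python
import math

def heat_step(rho, dt):
    """One explicit Euler step of the discrete heat equation on a cycle (unit spacing)."""
    n = len(rho)
    return [rho[i] + dt * (rho[(i - 1) % n] - 2 * rho[i] + rho[(i + 1) % n])
            for i in range(n)]

def entropy(rho):
    """Discrete Boltzmann entropy: sum of rho_i * log(rho_i), with 0 log 0 = 0."""
    return sum(r * math.log(r) for r in rho if r > 0)

n = 32
# Positive initial density with average value 1 (hypothetical choice).
rho = [1.0 + 0.9 * math.sin(2 * math.pi * i / n) for i in range(n)]
entropies = [entropy(rho)]
for _ in range(200):
    # dt <= 0.5 makes each step an average with nonnegative weights.
    rho = heat_step(rho, dt=0.2)
    entropies.append(entropy(rho))
```

Because each step with dt ≤ 0.5 replaces every value by a convex combination of neighboring values, Jensen's inequality implies that the discrete entropy cannot increase, while the total mass is preserved.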

1.3 Part III: Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds

In the past few years, several connections between optimal transportation problems and the curvature of Riemannian manifolds were found. One of them is the use of optimal transportation for an alternative definition of Ricci curvature lower bounds, developed in a series of papers [53, 20, 60]. Based on the ideas in these papers, a generalization of Ricci curvature lower bounds to general metric measure spaces, called the curvature dimension condition, was introduced in [39, 40, 58, 59, 48]. However, these conditions are not easy to check and there are no new examples. Recently the case of a Finsler manifold was studied in [49], but the result is very similar to that of the Riemannian case due to strict convexity of the corresponding Hamiltonian. The situation changes dramatically in the case of a subriemannian manifold. The reason is that the metric spaces we are dealing with have Hausdorff dimensions strictly greater than their topological dimensions. Therefore, the interplay of the metrics and the measures for these spaces should be significantly different from that of the Riemannian or Finsler case. One particular case of subriemannian manifolds, the Heisenberg group, is studied in [32]. In this case the space does not satisfy any curvature dimension condition. However, it satisfies a weaker condition, a modification of the so-called measure contraction property.

In the third part of this thesis, we study a subriemannian version of the measure con-

traction property for all three dimensional contact subriemannian manifolds, generalizing

the corresponding result in [32]. This study uses a subriemannian generalization of the

classical Riemannian curvature. The nature of this invariant is dynamical rather than

metrical: the generalized curvature is the simplest differential invariant of the geodesic flow

defined on the cotangent bundle T ∗M equipped with the bundle structure π : T ∗M → M .

The generalized curvature is easy to compute and we study its role in the measure con-

traction property in this thesis.

The structure of this part of the thesis is as follows. In Section 10 we recall and

introduce several notions from subriemannian geometry necessary for the third part of the

thesis. In Section 11 we recall and specialize the recent result of [37] on the curvature

type invariants of subriemannian manifolds to the three dimensional contact case. We

will also write down the explicit formulas for these invariants in this section. Section 12

contains the main theorem (Theorem 12.3) which shows that if the generalized curvatures

are bounded below by a constant, then the subriemannian manifold satisfies a generalized


measure contraction property MCP (r; 2, 3) (see Definition 12.4 below). In particular, if

the generalized curvatures are non-negative, then the subriemannian manifold satisfies

the measure contraction property MCP (0, 5). As a consequence, these spaces satisfy the

doubling property and a local Poincaré inequality. In Section 13 we specialize to the case where the contact subriemannian manifolds are related to isoperimetric problems or particles in magnetic fields. In this case the subriemannian manifold M is the total space of a principal bundle $\pi_M : M \to N$. The base space N is a smooth surface equipped with a Riemannian metric descending from the subriemannian metric of the total space M. The total space M satisfies the measure contraction property MCP (0, 5) if the surface N has non-negative Gauss curvature (Theorem 13.1). In particular, this is applicable to two famous examples: the Heisenberg group and the Hopf fibration.

The main results of this thesis are published in the papers [5, 34, 6]. In particular, the results in Part I of this thesis are contained in [5], which is accepted for publication by the Transactions of the American Mathematical Society.

Part I

Optimal Transportation under

Nonholonomic Constraints

Chapter 2

Background

In this chapter, we recall some basic notions and results from optimal control and optimal

transportation theory used in this thesis. We refer to [7, 26] and [63, 64] for more details

on optimal control and optimal transportation theory, respectively.

2.1 Elementary Optimal Control Theory

A control system is used to single out a family of curves called admissible paths. Such

a system consists of a family of vector fields parameterized by some variables. The

parameter is allowed to change in time so that one can switch from one vector field

to another. This time dependent choice of the parameter is called a control and the

corresponding time dependent vector field is called a control vector field. Integral curves

of the flows of all such control vector fields are called admissible paths. More precisely,

let M be a smooth manifold and let U be a closed subset in the Euclidean space Rm,

which is called the control set. Let F : M ×U → TM be a Lipschitz continuous function

such that Fu := F (·, u) : M → TM are smooth vector fields for each point u in the

control set U . In other words, F defines the family of vector fields mentioned above

parameterized by the variable u. Assume also that the function $(x, u) \mapsto \frac{\partial F}{\partial x}(x, u)$ is continuous. Curves u(·) : [0, 1] → U in the control set U which are locally bounded and measurable (i.e. $u(\cdot) \in L^\infty([0, 1], U)$) are called admissible controls (or simply controls).

A control system is the following family of ordinary differential equations parameterized by the variable u:

$$\dot{x}(t) = F(x(t), u(t)). \tag{2.1}$$

The solutions x(·) to the above control system are called admissible paths and the pairs (x(·), u(·)) are called admissible pairs.

The classical theory of ordinary differential equations implies that, for a given admissible control, the system (2.1) has a unique local solution, which satisfies the equation for almost all values of time t. Moreover, the resulting

local flow is smooth in the space variable x and Lipschitz in the time variable t. The

control system is complete if the flows of all control vector fields exist globally.

Let $x_0$ and $x_1$ be two points on the manifold M. The set of all admissible pairs (x(·), u(·)) for which the corresponding admissible paths x(·) start at the point $x_0$ is denoted by $C_{x_0}$, and the set of those that start at the point $x_0$ and end at the point $x_1$ is denoted by $C_{x_0}^{x_1}$. A control system is called controllable if any two points can be connected by an admissible path. In other words, the control system is controllable if the set $C_{x_0}^{x_1}$ is nonempty for any pair of points $x_0$ and $x_1$ on the manifold.

Fix a smooth function L : M × U → R, called the Lagrangian. The cost of an admissible pair (x(·), u(·)) is given by the integral $\int_0^1 L(x(t), u(t))\,dt$. The cost $c(x_0, x_1)$ between two points $x_0$ and $x_1$ is given by taking the infimum of the costs of all the admissible paths connecting $x_0$ and $x_1$. More precisely,

$$c(x_0, x_1) = \inf \int_0^1 L(x(t), u(t))\,dt, \tag{2.2}$$

where the infimum is taken over all admissible pairs (x(·), u(·)) for which the admissible paths x(·) connect the points $x_0$ and $x_1$ (i.e. $x(0) = x_0$ and $x(1) = x_1$). We declare that the cost is ∞ if there is no admissible path connecting the two points (i.e. when $C_{x_0}^{x_1}$ is empty).


The cost function defined above is said to be complete if the control system is complete

and given any pair of points (x0, x1) there exists an admissible pair which achieves the

infimum above.

Remark 2.1 The infimum of the problem in (2.2) can be equivalently characterized by

using the backward control system

$$\dot{x}(s) = -F(x(s), u(s))$$

for which the admissible paths are the same as those in (2.1) but moving in the opposite

direction in time. The infimum over all admissible paths of the backward control system

which start at the point x1 and end at the point x0 is the same as that in (2.2).

This fact will become important for the later discussion.

Next, we fix a function f and consider the following minimization problem, commonly

known as the Bolza problem:

Problem 2.2 Find an admissible pair (x(·), u(·)) which achieves the following infimum

$$\inf_{(x(\cdot), u(\cdot)) \in C_{x_0}} \left( \int_0^1 L(x(s), u(s))\,ds - f(x(1)) \right),$$

where the infimum is taken over all admissible pairs (x(·), u(·)) for which the admissible path x(·) starts at the point $x_0$.

For each point u in the control set U define the corresponding Hamiltonian function $H_u : T^*M \to \mathbb{R}$ by

$$H_u(p_x) = p_x(F(x, u)) + L(x, u).$$

If $H : T^*M \to \mathbb{R}$ is a function on the cotangent bundle, we denote its Hamiltonian vector field by $\vec{H}$. We denote the cotangent bundle projection by $\pi : T^*M \to M$ and recall that a covector α in $T^*M$ belongs to the subdifferential $d^- f_x$ of a continuous function f if there is a function g which touches f from below at x and $\alpha = dg_x$. Here, “g touches f from below at x” means that f(x) = g(x) and g ≤ f in a neighborhood of x.
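
The definition of the subdifferential can be probed numerically in one dimension. The sketch below, whose helper names are hypothetical, restricts attention to linear test functions g(t) = αt and samples a small neighborhood of x = 0, so it is a heuristic check under these assumptions rather than a proof. For f(t) = |t| the slopes α in [−1, 1] pass the test, while for f(t) = −|t| no slope does, which is consistent with that function not being sub-differentiable at 0.

```python
def touches_from_below(f, g, x, radius=1e-2, samples=201, tol=1e-12):
    """Check numerically that g(x) = f(x) and g <= f on a sampled neighborhood of x."""
    if abs(f(x) - g(x)) > tol:
        return False
    pts = [x - radius + 2 * radius * k / (samples - 1) for k in range(samples)]
    return all(g(t) <= f(t) + tol for t in pts)

def linear(alpha, x0=0.0, value=0.0):
    """Linear test function with slope alpha taking the given value at x0."""
    return lambda t: value + alpha * (t - x0)

# Candidate slopes -2.0, -1.9, ..., 2.0 (hypothetical grid).
slopes = [k / 10 for k in range(-20, 21)]
sub_abs = [a for a in slopes if touches_from_below(abs, linear(a), 0.0)]
sub_neg_abs = [a for a in slopes
               if touches_from_below(lambda t: -abs(t), linear(a), 0.0)]
```

Here `sub_abs` collects the sampled slopes in the subdifferential of |t| at 0, and `sub_neg_abs` is empty because no linear function can stay below −|t| on both sides of 0 while agreeing with it at 0.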


Next, we present an elementary version of the Pontryagin maximum principle which

gives necessary conditions for an admissible path to be a minimizer of the Bolza problem.

For the convenience of the readers a proof is given in the appendix.

Theorem 2.3 (Pontryagin Maximum Principle for the Bolza Problem)

Let (x(·), u(·)) be an admissible pair which achieves the infimum in Problem 2.2. Assume that the function f in Problem 2.2 is sub-differentiable at the point x(1). Then, for each α in the sub-differential $d^- f_{x(1)}$ of f, there exists a Lipschitz path $p : [0, 1] \to T^*M$ such that p(1) = −α and the following conditions hold for almost all values of time t in the interval [0, 1]:

$$\begin{cases} \pi(p(t)) = x(t), \\ \dot{p}(t) = \vec{H}_{u(t)}(p(t)), \\ H_{u(t)}(p(t)) = \min_{u \in U} H_u(p(t)). \end{cases} \tag{2.3}$$
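
The minimum condition in (2.3) can be made explicit in a simple special case. Assume, for illustration only, a control-affine system F(x, u) = Σ u_i X_i(x) with U = R^k and Lagrangian L(x, u) = Σ u_i². Writing h_i = p(X_i(x)), the Hamiltonian becomes H_u(p) = Σ u_i h_i + Σ u_i², which is minimized at u_i = −h_i/2 with minimum value −(1/4) Σ h_i². The Python sketch below, with hypothetical function names and sample values of h, compares this closed form with a grid search.

```python
def hamiltonian(u, h):
    """H_u(p) = sum_i u_i h_i + sum_i u_i^2, with h_i = p(X_i(x)) (assumed system)."""
    return sum(ui * hi for ui, hi in zip(u, h)) + sum(ui ** 2 for ui in u)

def minimizing_control(h):
    """Closed-form minimizer over U = R^k: u_i = -h_i / 2."""
    return [-hi / 2.0 for hi in h]

def minimized_hamiltonian(h):
    """Closed-form minimum value: -(1/4) * sum_i h_i^2."""
    return -sum(hi ** 2 for hi in h) / 4.0

h = [1.5, -0.8]  # hypothetical values of p(X_1), p(X_2)
u_star = minimizing_control(h)

# Compare against a grid search over controls in [-2, 2]^2.
grid = [k / 40.0 for k in range(-80, 81)]
grid_min = min(hamiltonian((a, b), h) for a in grid for b in grid)
```

In this special case the minimized Hamiltonian is a quadratic form in the covector, and the grid search does not find a value below the closed-form minimum.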

Remark 2.4 A distribution of rank k on a manifold M is a vector subbundle of rank k of the tangent bundle TM (i.e. it is a smooth assignment of a vector subspace of dimension k in each tangent space). Let ∆ be one such distribution and assume that the distribution ∆ is trivializable, i.e. there exists a system of vector fields $X_1, ..., X_k$ which span ∆ at every point: $\Delta_x = \mathrm{span}\{X_1(x), ..., X_k(x)\}$. Consider the following control system:

$$\dot{x}(t) = \sum_{i=1}^{k} u_i(t) X_i(x(t)) \tag{2.4}$$

with the initial condition x(0) = x and the final condition x(1) = y. Recall that we denote by $C_x^y$ the set of all admissible pairs (x(·), u(·)) such that the admissible path x(·) satisfies x(0) = x and x(1) = y.

Let L : M × U → R be the Lagrangian defined by $L(x, u) = \sum_{i=1}^{k} u_i^2$ and let c be the corresponding cost function

$$c(x, y) = \inf_{(x(\cdot), u(\cdot)) \in C_x^y} \int_0^1 L(x(t), u(t))\,dt, \tag{2.5}$$

where the infimum is taken over all admissible pairs (x(·), u(·)) for which the corresponding admissible path x(·) starts at x and ends at y.
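
As a concrete instance of the control system (2.4) and the cost (2.5), the sketch below assumes the frame X1 = (1, 0, −y/2), X2 = (0, 1, x/2) on R³ (a standard model of the Heisenberg group, chosen here for illustration) and integrates the system for a control that traces a circle of radius r in the (x, y)-plane. The integrator, the function names and the value of r are assumptions of this example. The endpoint is expected to be close to (0, 0, πr²), and the cost of this particular admissible pair, (2πr)², gives an upper bound for the cost c between the origin and that point.

```python
import math

def F(state, u):
    """Right-hand side u1*X1 + u2*X2 with X1 = (1, 0, -y/2), X2 = (0, 1, x/2)."""
    x, y, z = state
    u1, u2 = u
    return (u1, u2, (x * u2 - y * u1) / 2.0)

def integrate(control, lagrangian, x0, steps=2000):
    """Midpoint (RK2) integration of x' = F(x, u(t)) on [0, 1], accumulating the cost."""
    state, cost, dt = x0, 0.0, 1.0 / steps
    for k in range(steps):
        u_start = control(k * dt)
        u_mid = control((k + 0.5) * dt)
        # Half Euler step to estimate the state at the midpoint of the interval.
        half = tuple(s + 0.5 * dt * f for s, f in zip(state, F(state, u_start)))
        state = tuple(s + dt * f for s, f in zip(state, F(half, u_mid)))
        cost += lagrangian(u_mid) * dt
    return state, cost

r = 0.5  # hypothetical radius

def circle_control(t):
    """Control whose (x, y)-projection is a circle of radius r through the origin."""
    w = 2.0 * math.pi
    return (-w * r * math.sin(w * t), w * r * math.cos(w * t))

end_state, cost = integrate(circle_control,
                            lambda u: u[0] ** 2 + u[1] ** 2,
                            (0.0, 0.0, 0.0))
```

The vertical displacement equals the area enclosed by the projected curve, which shows how a two-dimensional set of controls can still reach points in the third direction.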

If the number k of vector fields is equal to the dimension n of the manifold M , then

the distribution ∆ coincides with the tangent bundle TM of the manifold M and all

paths are admissible. The system also defines a Riemannian metric on M by declaring that the vector fields $X_1, ..., X_n$ are orthonormal everywhere. The cost function c is the square $d^2$

of the Riemannian distance d and the minimizers of this system correspond to the length

minimizing geodesics on M . However, this does not necessarily work when the tangent

bundle is non-trivializable.

To overcome this difficulty, we can modify the general definition of a control system

in the following way. Let P be a locally trivial bundle over M with the bundle projection

$\pi_P : P \to M$ and let F : P → TM be a fibre preserving map (i.e. the image $F(P_x)$ of each fibre $P_x$ under the map F is contained in the corresponding tangent space $T_xM$ with the same base point x). The control system corresponding to the map F is given by

$$\dot{x}(t) = F(v(t)). \tag{2.6}$$

The admissible pairs are locally bounded measurable paths v(·) : [0, 1] → P in the bundle

P which project to a Lipschitz path in M (i.e. x(·) = πP (v(·)) is Lipschitz) and satisfy

the control system (2.6). If we let P be the trivial bundle M × U , we get back the

control system (2.1) introduced earlier. If a Lagrangian L : P → R is fixed, then the

corresponding cost function c is defined by

$$c(x, y) = \inf_{v(\cdot) \in C_x^y} \int_0^1 L(v(t))\,dt, \tag{2.7}$$


where the infimum is taken over all admissible pairs v(·) : [0, 1] → P such that the

corresponding admissible paths x(·) := πP (v(·)) satisfy x(0) = x and x(1) = y.

Let 〈·, ·〉 be a Riemannian metric on the manifold M . If P is the tangent bundle

TM of M , the map F is the identity map, and the Lagrangian L : P → R is given

by L(v) = 〈v, v〉, then the cost function c is equal to the square $d^2$ of the Riemannian

distance d. Note that the tangent bundle in this case can be non-trivializable and so this

new control system resolves the problem mentioned above.

Similar to the Riemannian case, if a distribution is trivializable, then one can define

the corresponding control system (2.4) using the trivializing vector fields X1, ..., Xk. The
admissible paths of the control system (2.4) are paths tangent to the distribution ∆.
A subriemannian metric 〈·, ·〉S can be defined by declaring that the vector fields X1, ..., Xk
are orthonormal (see Section 3.3 for the basics on subriemannian geometry). The cost
(2.5) is the square $d_S^2$ of the subriemannian distance $d_S$. For general distributions ∆ which

are non-trivializable, we consider the general control system (2.6) with fibre bundle given

by the distribution P = ∆ and the fibre preserving map F : ∆ → TM given by the

inclusion map. If the Lagrangian L is defined by a subriemannian metric L(v) = 〈v, v〉S,

then the cost is again the square of the subriemannian distance.

In the first part of the thesis (except for Section 4.2), we consider the control systems

of the form (2.1), in order to avoid heavy notation. However, all the results generalize
easily to the more general intrinsically defined systems which were just introduced.

2.2 Optimal Mass Transportation

The theory of optimal mass transportation deals with moving one mass to another in

the most efficient way. Mathematically, the masses are replaced by Borel probability

measures µ and ν on a manifold M , the transportation strategy is replaced by a map


ϕ : M → M and the efficiency is measured by a cost function c : M × M → R ∪ {+∞}.
The total cost of the strategy ϕ is given by the average
\[ \int_M c(x, \varphi(x))\, d\mu. \]

The strategy ϕ is said to move the mass µ to ν if the map ϕ pushes the measure µ

forward to ν. Here, we recall that the push forward measure $\varphi_*\mu$ of a measure µ by a
Borel map ϕ is defined by $\varphi_*\mu(B) = \mu(\varphi^{-1}(B))$ for all Borel sets B in M. The problem

of optimal mass transportation is to show existence and uniqueness of a Borel map which

pushes µ forward to ν and minimizes the total cost. More precisely,

Problem 2.5 Find a Borel map ϕ which achieves the infimum

\[ \inf_{\varphi_*\mu = \nu} \int_M c(x, \varphi(x))\, d\mu \]

among all Borel maps ϕ : M → M that push the Borel probability measure µ forward to

ν.

In many cases, by assuming absolute continuity of the measure µ with respect to the

Lebesgue measure, such a problem admits a solution which is unique (up to µ-measure

zero). This unique solution to Problem 2.5 is called the optimal map or the Brenier map.
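On finite sets, Problem 2.5 can be explored by brute force. The following is a minimal sketch, assuming uniform measures on two lists of distinct points of equal length, so that the Borel maps pushing µ forward to ν are exactly the bijections. The function name `optimal_map` and the sample points are hypothetical, chosen only for illustration.

```python
from itertools import permutations

def optimal_map(xs, ys, cost):
    """Brute-force Monge problem between the uniform measures on xs and ys.

    Every bijection xs -> ys pushes the first measure forward to the second,
    so we enumerate all of them and keep the cheapest one.
    """
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(ys))):
        # average of c(x, phi(x)) with respect to the uniform measure on xs
        total = sum(cost(x, ys[j]) for x, j in zip(xs, perm)) / len(xs)
        if total < best_cost:
            best_perm, best_cost = perm, total
    return {x: ys[j] for x, j in zip(xs, best_perm)}, best_cost

# Illustrative data on the real line with the quadratic cost |x - y|^2.
xs = [0.0, 1.0, 3.0]
ys = [2.5, 0.5, 5.0]
phi, total_cost = optimal_map(xs, ys, lambda x, y: (x - y) ** 2)
```

For the quadratic cost on the line, the minimizer found this way is the monotone rearrangement, which sends the smallest point of `xs` to the smallest point of `ys`, and so on.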

The first optimal map was found by Brenier in [16] in the case where the manifold
was $\mathbb{R}^n$ and the cost was $c(x, y) = |x - y|^2$. Later, this was generalized to arbitrary closed
connected Riemannian manifolds in [42], with the cost given by the square of the
Riemannian distance. The case of the Heisenberg group with the costs given by the
square of the subriemannian distance and the gauge distance was done in [11]. In [14]

a much more general cost arising from the Lagrange problem was considered. To define

such a cost, let L : TM → R be a Lagrangian function on the tangent bundle TM of a

compact manifold M . The cost is given by minimizing the action corresponding to the

Lagrangian. More precisely,


\[ c(x, y) = \inf_{x(0) = x,\ x(1) = y} \int_0^1 L(x(t), \dot{x}(t))\, dt, \tag{2.8} \]

where the infimum is taken over all smooth curves x(·) joining the points x and y.
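As a sanity check on (2.8), the following worked computation recovers Brenier's cost from the action. It is an illustration under the assumptions $M = \mathbb{R}^n$ and $L(x, v) = |v|^2$, and it uses only the Cauchy-Schwarz inequality and the triangle inequality for integrals.

```latex
\begin{align*}
\int_0^1 |\dot{x}(t)|^2\, dt
  \;\ge\; \Big( \int_0^1 |\dot{x}(t)|\, dt \Big)^2
  \;\ge\; \Big| \int_0^1 \dot{x}(t)\, dt \Big|^2
  \;=\; |y - x|^2 ,
\end{align*}
% Equality holds for the straight line x(t) = (1 - t) x + t y, hence
\[ c(x, y) = |x - y|^2 . \]
```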

The existence and uniqueness of an optimal map with the cost given by (2.8) was

shown under the following assumptions:

• The Lagrangian L is fibrewise strictly convex, i.e. the restriction of L to each

tangent space TxM is strictly convex.

• L has superlinear growth, i.e. L(v)/|v| → ∞ as |v| → ∞.

• The cost c is complete, i.e. the infimum (2.8) is always achieved by some $C^2$ smooth
path.

Recently, the compactness assumption on the manifold or on the measures was eliminated

in [23, 22].

In this thesis a connected manifold M without boundary is considered and the cost

function c is given by the optimal control problem (2.2). Next, we consider the relaxed

version of Problem 2.5, called the Kantorovich reformulation. Let π1 : M × M → M

and π2 : M × M → M be the projection onto the first and the second components,

respectively. Let Γ be the set of all joint measures Π on the product manifold M ×M

with marginals µ and ν: $(\pi_1)_*\Pi = \mu$ and $(\pi_2)_*\Pi = \nu$.

Problem 2.6 Find a measure Π which achieves the following infimum

\[ C(\mu, \nu) := \inf_{\Pi \in \Gamma} \int_{M \times M} c(x, y)\, d\Pi(x, y), \]

where the infimum is taken over all joint measures Π on the product manifold M ×M

with marginals µ and ν.

Remark 2.7 If ϕ is an optimal map in Problem 2.5, then $(\mathrm{id} \times \varphi)_*\mu$ is a joint measure in the
set Γ. It follows that Problem 2.6 is a relaxation of Problem 2.5.
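Remark 2.7 can be checked directly on finite sets. The sketch below assumes discrete measures stored as dictionaries from points to masses. The helper names `plan_from_map` and `marginals` and the sample data are hypothetical. It builds the joint measure induced by a map and verifies that its marginals are µ and the push forward of µ.

```python
def plan_from_map(mu, phi):
    """Joint measure (id x phi)_* mu, stored as a dict {(x, y): mass}."""
    plan = {}
    for x, mass in mu.items():
        key = (x, phi(x))
        plan[key] = plan.get(key, 0.0) + mass
    return plan

def marginals(plan):
    """First and second marginals of a discrete joint measure."""
    first, second = {}, {}
    for (x, y), mass in plan.items():
        first[x] = first.get(x, 0.0) + mass
        second[y] = second.get(y, 0.0) + mass
    return first, second

# Illustrative data: three source points, two target points.
mu = {0: 0.5, 1: 0.25, 2: 0.25}
phi = lambda x: 10 if x == 0 else 20
plan = plan_from_map(mu, phi)
first, second = marginals(plan)   # second is the push forward of mu by phi
```

A general element of Γ may split the mass at a point x among several targets, which no map can do. This is the sense in which Problem 2.6 enlarges the set of competitors.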


Before we proceed with the existence proof of an optimal map, let us look at the

following dual problem of Kantorovich, see [63] for history and the importance of this

dual problem for optimal transportation.

Let c be a cost function and let f be a function on the manifold M. The $c_1$-transform
of the function f is the function $f^{c_1}$ given by
\[ f^{c_1}(y) := \inf_{x \in M} [c(x, y) - f(x)]. \]
Similarly, the $c_2$-transform of the function f is defined by
\[ f^{c_2}(x) := \inf_{y \in M} [c(x, y) - f(y)]. \]
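On finite sets the two transforms are simple minimizations. The sketch below uses hypothetical helper names and illustrative integer data. It computes $f^{c_1}$, $f^{c_1 c_2}$ and $f^{c_1 c_2 c_1}$ and illustrates two standard facts that follow directly from the definitions: $f^{c_1 c_2} \ge f$, and a third transform gives nothing new.

```python
def c1_transform(f, xs, ys, c):
    """f^{c_1}(y) = min over x of c(x, y) - f(x), on finite sets."""
    return {y: min(c(x, y) - f[x] for x in xs) for y in ys}

def c2_transform(g, xs, ys, c):
    """g^{c_2}(x) = min over y of c(x, y) - g(y), on finite sets."""
    return {x: min(c(x, y) - g[y] for y in ys) for x in xs}

# Illustrative data: quadratic cost and an arbitrary integer-valued f.
xs = [0, 1, 2, 3]
ys = [0, 2, 4]
c = lambda x, y: (x - y) ** 2
f = {0: 0, 1: 5, 2: -1, 3: 2}

f1 = c1_transform(f, xs, ys, c)       # f^{c_1}
f12 = c2_transform(f1, xs, ys, c)     # f^{c_1 c_2}
f121 = c1_transform(f12, xs, ys, c)   # f^{c_1 c_2 c_1}
```

In this example f is not c-concave, since $f^{c_1 c_2}$ differs from f at two points, while $f^{c_1 c_2}$ itself is c-concave.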

The function f is a c-concave function if it satisfies $f^{c_1 c_2} = f$. Let F be the set of all pairs
of functions (g, h) on the manifold such that g : M → R ∪ {−∞} and h : M → R ∪ {−∞}
are in $L^1(\mu)$ and $L^1(\nu)$ respectively, and g(x) + h(y) ≤ c(x, y) for all pairs (x, y) in the
product manifold M × M. The dual problem of Kantorovich is the following maximization
problem:

Problem 2.8

\[ \sup_{(g, h) \in F} \int_M g\, d\mu + \int_M h\, d\nu. \]
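The constraint g(x) + h(y) ≤ c(x, y) gives weak duality at once: integrating it against any Π in Γ shows that the dual value never exceeds the cost of Π. The sketch below checks this on a small discrete example. The function names and the data are illustrative assumptions, and the product measure is used only because it always has the right marginals.

```python
def dual_value(g, h, mu, nu):
    """Integral of g with respect to mu plus integral of h with respect to nu."""
    return sum(g[x] * m for x, m in mu.items()) + sum(h[y] * m for y, m in nu.items())

def primal_value(plan, c):
    """Total cost of a discrete joint measure {(x, y): mass}."""
    return sum(c(x, y) * m for (x, y), m in plan.items())

c = lambda x, y: (x - y) ** 2
mu = {0: 0.5, 1: 0.5}
nu = {0: 0.25, 2: 0.75}

g = {0: 0.0, 1: 1.0}                                   # arbitrary function on supp(mu)
h = {y: min(c(x, y) - g[x] for x in mu) for y in nu}   # h = g^{c_1}, so g(x) + h(y) <= c(x, y)

# The product measure is one admissible joint measure with marginals mu and nu.
product_plan = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
```

Choosing h as the $c_1$-transform of g is the largest choice compatible with the constraint, which is why the supremum can be restricted to pairs of this form.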

The existence of a solution to the above problem is well-known, see [63, Theorem 1.3].

Theorem 2.9 Assume that there exist two functions a and b such that a is µ-measurable,
b is ν-measurable and the cost function c satisfies c(x, y) ≤ a(x) + b(y) for all (x, y) in
M × M. If c is also continuous, bounded below, and C(µ, ν) < ∞, then there exists a
c-concave function f such that the function f is in $L^1(\mu)$, its $c_1$-transform $f^{c_1}$ is in $L^1(\nu)$
and the pair $(f, f^{c_1})$ achieves the supremum in Problem 2.8.

The following theorem on the regularity of the dual pair above is also well-known;
stronger results can be found in [64, Chapter 12]. We give a simple proof here for the
convenience of the reader.


Theorem 2.10 Assume that the cost c(x, y) is continuous, bounded below, and the measures
µ and ν are compactly supported. Then the functions f and $f^{c_1}$ are upper semicontinuous.
If the function x ↦ c(x, y) is also locally Lipschitz on a set U and the Lipschitz
constant is independent of y locally, then f can be chosen locally Lipschitz on U.

Proof Fix ε > 0. Since $f(x) = \inf_{y \in M} [c(x, y) - f^{c_1}(y)]$, there exists y such that
$f(x) + \varepsilon/2 > c(x, y) - f^{c_1}(y)$. Also, we have $f(x') + f^{c_1}(y) \le c(x', y)$ for any x′ in M. So,
combining the two inequalities with the continuity of the cost c, we have
\[ f(x') - f(x) < c(x', y) - c(x, y) + \varepsilon/2 < \varepsilon \]
for any x′ sufficiently close to x. Therefore, f is upper semicontinuous.

Let K be a compact set containing the support of the measures µ and ν. Let

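\[ g(x) = \begin{cases} f(x), & \text{if } x \in K \\ -\infty, & \text{if } x \in M \setminus K, \end{cases} \qquad g'(x) = \begin{cases} f^{c_1}(x), & \text{if } x \in K \\ -\infty, & \text{if } x \in M \setminus K, \end{cases} \]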

then the pair (g, g′) achieves the maximum in Problem 2.8. Let $h = (g')^{c_2}$, then the pair
$(h, h^{c_1})$ also achieves the maximum. By definition of g′, we have
$h(x) = \inf_{y \in K} [c(x, y) - f^{c_1}(y)]$. By the same argument as in the proof of upper
semicontinuity, for any x and x′ in a compact subset K′ of U we can find y in K such that
\[ h(x') - h(x) < c(x', y) - c(x, y) + \varepsilon/2. \]
By the assumption on the cost c, the above inequality becomes
\[ h(x') - h(x) \le k\, d(x, x') + \varepsilon/2 \]
for some constant k > 0 which is independent of x and x′ on the subset K′. By switching the
roles of x and x′ and letting ε → 0, the result follows. □

The following theorem about minimizers in Problem 2.6 is well-known. See, for in-

stance, [63, Chapter 2], [64, Theorem 5.10].


Theorem 2.11 Under the assumptions of Theorem 2.9, Problem 2.6 admits a minimizer.
Moreover, the joint measure Π in the set Γ achieves the infimum in Problem 2.6 if and
only if Π is concentrated on the set
\[ \{(x, y) \in M \times M \mid f(x) + f^{c_1}(y) = c(x, y)\}. \]

Chapter 3

Nonholonomic Optimal

Transportation Problem

3.1 Existence and Uniqueness of an Optimal Map

In this section, we show that the optimal transportation problem with the cost given by

an optimal control cost (1.3) can be solved under certain regularity assumptions. Let

H : T ∗M → R be the function defined by

\[ H(p_x) = \max_{u \in U} \big( p_x(F(x, u)) - L(x, u) \big). \]
If H is well-defined and $C^2$, then we denote its Hamiltonian vector field by $\vec{H}$ and let
$e^{t\vec{H}}$ be its flow. Let f be the function defined in Theorem 2.9 and consider the map
ϕ : M × [0, 1] → M defined by $\varphi(x, t) = \pi(e^{t\vec{H}}(-df_x))$. The following joint result with

A. Agrachev is the precise version of Theorem 1.2.

Theorem 3.1 [5] The map $x \mapsto \varphi_1(x) := \varphi(x, 1)$ is the unique (up to µ-measure zero)
optimal map for Problem 2.5 with the cost c given by formula (2.2) under the following
assumptions:

1. The measures µ and ν are compactly supported and µ is absolutely continuous with

respect to the Lebesgue measure.

2. c is bounded below and c(x, y) is also locally Lipschitz in the x variable and the

Lipschitz constant is independent of y locally.

3. The cost c is complete, i.e. given any pair of points (x0, x1) in the manifold M ,

there exists an admissible pair (x(·), u(·)) such that the pair achieves the infimum for

the cost (2.2), where u(·) is locally bounded measurable, x(0) = x0 and x(1) = x1.

4. The Hamiltonian function H defined in (3.5) is well-defined and $C^2$.

5. The Hamiltonian vector field $\vec{H}$ is complete, i.e. its flow exists for all time.

The rest of this section is devoted to the proof of Theorem 3.1. Let $C_y$ be the set of
all admissible pairs such that the corresponding admissible paths x(·) start at the point
y (x(0) = y) and satisfy the following backward control system:
\[ \dot{x}(t) = -F(x(t), u(t)). \tag{3.1} \]
Let $C_y^x$ be the set of all those pairs in $C_y$ such that the corresponding admissible paths
x(·) end at the point x: x(1) = x.

First, we have the following observation:

Proposition 3.2 Let x be a point which achieves the infimum $f^{c_1}(y) = \inf_{x \in M} (c(x, y) - f(x))$
and let (x(·), u(·)) be an admissible pair in $C_y^x$ such that the corresponding admissible path
x(·) minimizes the cost given by
\[ c(x, y) = \inf_{(x(\cdot), u(\cdot)) \in C_y^x} \int_0^1 L(x(t), u(t))\, dt, \]
then (x(·), u(·)) achieves the following infimum
\[ f^{c_1}(y) = \inf_{(x(\cdot), u(\cdot)) \in C_y} \int_0^1 L(x(s), u(s))\, ds - f(x(1)). \tag{3.2} \]
If $\tilde{x}(t) = x(1 - t)$, then $\tilde{x}(\cdot)$ achieves the following infimum
\[ f^{c_1}(y) = \inf_{(x(\cdot), u(\cdot)) \in C^y} \int_0^1 L(x(s), u(s))\, ds - f(x(0)), \tag{3.3} \]
where $C^y$ denotes the set of all admissible pairs (x(·), u(·)) satisfying the following control
system:
\[ \dot{x}(t) = F(x(t), u(t)), \qquad x(1) = y. \]

Let u(·) be as in the above Proposition and let $\tilde{u}(t) = u(1 - t)$. Let $H_t : T^*M \to \mathbb{R}$ be
given by $H_t(p_x) = p_x(F(x, \tilde{u}(t))) - L(x, \tilde{u}(t))$. The following is a consequence of Theorem
2.3.

Proposition 3.3 Let x(·) be a curve that achieves the infimum in (3.2) and let
$\tilde{x}(t) = x(1 - t)$. Assume that α is contained in the subdifferential of the function f at the point
$\tilde{x}(0)$, then there exists a Lipschitz curve $\tilde{p} : [0, 1] \to T^*M$ in the cotangent bundle such
that $\tilde{p}(0) = -\alpha$ and the following conditions hold for almost all time t in the interval
[0, 1]:
\[ \begin{aligned}
&\pi(\tilde{p}(t)) = \tilde{x}(t), \\
&\dot{\tilde{p}}(t) = \vec{H}_t(\tilde{p}(t)), \\
&H_t(\tilde{p}(t)) = \max_{u \in U} \big( \tilde{p}(t)(F(\tilde{x}(t), u)) - L(\tilde{x}(t), u) \big).
\end{aligned} \tag{3.4} \]

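Proof By Theorem 2.3, there exists a curve $p : [0, 1] \to T^*M$ in the cotangent bundle
$T^*M$ such that
\[ \begin{aligned}
&\pi(p(t)) = x(t), \\
&p(1) = -\alpha, \\
&\dot{p}(t) = \vec{H}_{u(t)}(p(t)), \\
&H_{u(t)}(p(t)) = \min_{u \in U} \big( -p(t)(F(x(t), u(t))) + L(x(t), u(t)) \big),
\end{aligned} \]
where $H_u(p) = \min_{u \in U} \big( -p(F(x, u(t))) + L(x, u(t)) \big)$.
Let $\tilde{p}(t) = p(1 - t)$ and $\tilde{u}(t) = u(1 - t)$, then the equations above become
\[ \begin{aligned}
&\pi(\tilde{p}(t)) = \tilde{x}(t), \\
&\tilde{p}(0) = -\alpha, \\
&\dot{\tilde{p}}(t) = \vec{H}_{\tilde{u}(t)}(\tilde{p}(t)), \\
&H_{\tilde{u}(t)}(\tilde{p}(t)) = \max_{u \in U} \big( \tilde{p}(t)(F(\tilde{x}(t), \tilde{u}(t))) - L(\tilde{x}(t), \tilde{u}(t)) \big).
\end{aligned} \]
□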

Assume that the Hamiltonian function $H : T^*M \to \mathbb{R}$ defined by
\[ H(p_x) = \max_{u \in U} \big( p_x(F(x, u)) - L(x, u) \big) \tag{3.5} \]
is well-defined and $C^2$. Let $\vec{H}$ be the Hamiltonian vector field of the function H and
let $e^{t\vec{H}}$ be its flow. The function f defined in Theorem 2.9 is Lipschitz and so it is
differentiable almost everywhere by the Rademacher Theorem. Moreover, the map df :
M → T*M is measurable and locally bounded. So, if we let ϕ : M × [0, 1] → M be the
map defined by $\varphi(x, t) = \pi(e^{t\vec{H}}(-df_x))$, then the map ϕ is a Borel map.
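For orientation, here is the computation of (3.5) in a model case. It is a worked example under the assumptions that the control system is (2.4), the control set is $U = \mathbb{R}^k$ and the Lagrangian is $L(x, u) = \sum_{i=1}^{k} u_i^2$. The expression being maximized is a concave quadratic in u, so the maximum is found by setting the derivative in each $u_i$ to zero.

```latex
\begin{align*}
H(p_x) &= \max_{u \in \mathbb{R}^k} \sum_{i=1}^{k} \big( u_i\, p_x(X_i(x)) - u_i^2 \big)
        = \frac{1}{4} \sum_{i=1}^{k} p_x(X_i(x))^2 ,
\end{align*}
% the maximum is attained at u_i = p_x(X_i(x)) / 2 for i = 1, ..., k
```

In this case H is a smooth function on the cotangent bundle, so the requirement that H be well-defined and $C^2$ is satisfied.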

Proposition 3.4 Under the assumptions of Theorem 3.1, the following is true for µ-

almost all x: Given a point x in the support of µ, there exists a unique point y such

that

\[ f(x) + f^{c_1}(y) = c(x, y). \]

Moreover, the points x and y are related by y = ϕ(x, 1).

Proof We first claim that the infimum $f(x) = \inf_{y \in M} [c(x, y) - f^{c_1}(y)]$ is achieved for
µ-almost all x. Indeed, by assumption, we have $f(x) + f^{c_1}(y) \le c(x, y)$ for all (x, y) in
M × M. Also, if Π is the measure defined in Theorem 2.11, then $f(x) + f^{c_1}(y) = c(x, y)$
Π-almost everywhere. Since the first marginal of the measure Π is µ, the following
is true for µ-almost all x: there exists y in M such
that $f(x) + f^{c_1}(y) = c(x, y)$. This proves the claim.

Now, fix a point x for which the infimum $\inf_{y \in M} [c(x, y) - f^{c_1}(y)]$ is achieved and let
y be the point which achieves the infimum. By the proof of the above claim, x achieves
the infimum $f^{c_1}(y) = \inf_{x \in M} [c(x, y) - f(x)]$. Therefore, by completeness of the cost c
and Proposition 3.2, there exists an admissible path x(·) such that x(0) = x, x(1) = y and
x(·) achieves the infimum (3.3).

Since f is Lipschitz on a bounded open set U containing the support of µ and ν,
it is almost everywhere differentiable on U by the Rademacher Theorem. Since µ is
absolutely continuous with respect to the Lebesgue measure, f is also differentiable
µ-almost everywhere. By Proposition 3.3, for µ-almost all x, there exists a curve
$p(\cdot) : [0, 1] \to T^*M$ in the cotangent bundle $T^*M$ such that
\[ \begin{aligned}
&\dot{p}(t) = \vec{H}_t(p(t)), \\
&p(0) = -df_x, \\
&\pi(p(t)) = x(t), \\
&H_t(p(t)) = \max_{u \in U} \big( p(t)(F(x(t), u)) - L(x(t), u) \big),
\end{aligned} \]
where $H_t$ is the function on the cotangent bundle $T^*M$ given by
$H_t(p_x) = p_x(F(x, u(t))) - L(x, u(t))$.

By the definition of H, we have $H(p(t)) = H_t(p(t))$. But we also have $H(p) \ge H_t(p)$
for all p ∈ T*M, so the nonnegative function $H - H_t$ attains its minimum at p(t). Since
both H and $H_t$ are $C^2$, we have $dH(p(t)) = dH_t(p(t))$. Hence,
$\vec{H}_t(p(t)) = \vec{H}(p(t))$ for almost all t. The result follows from the uniqueness of a solution
to an ODE.
□

The rest of the arguments for the existence and uniqueness of an optimal map follow

from Theorem 2.11.

Proof of Theorem 3.1


As mentioned above, Problem 2.6 is a relaxation of Problem 2.5. We can recover

the latter from the former by restricting the minimization to joint measures of the form

(id× φ)∗µ, where φ is any Borel map pushing forward µ to ν. Therefore, the statement

of Theorem 3.1 follows from Theorem 2.11 and Proposition 3.4.

□

3.2 Regularity of Control Costs

In Theorem 3.1 we prove the existence and uniqueness of optimal maps under certain

regularity conditions on the cost. Most of the conditions in the theorem are easy to

verify, except for conditions (2) and (3). In this section we will present simple-to-verify

conditions implying (2) and (3), which guarantee this regularity. This includes the com-

pleteness and the Lipschitz regularity of the cost. First, we recall some basic notions in

the geometry of optimal control problems, see [7, 32] and references therein for details.

Fix a point x0 in the manifold M and assume that the control set U is Rk. In

this section we change our previous convention on admissible controls. From now on

admissible controls are mappings in L1([0, 1], U), rather than L∞([0, 1], U). Denote by

$C_{x_0}$ the set of all admissible pairs (x(·), u(·)) such that the corresponding admissible paths

x(·) start at x0. Moreover, we assume that the control system is of the following form:

\[ \dot{x}(t) = X_0(x(t)) + \sum_{i=1}^{k} u_i(t)\, X_i(x(t)), \tag{3.6} \]

where u(t) = (u1(t), ..., uk(t)) is a control and X0, X1, ..., Xk are fixed smooth vector fields

on the manifold M . The Cauchy problem for system (3.6) is well-posed for any locally

integrable vector-function u(·). We assume throughout this section that system (3.6) is

complete, i.e. all solutions of the system are defined on the whole semi-axis [0, +∞). This

completeness assumption is automatically satisfied if one of the following is true: either


(i) M is a compact manifold, or (ii) M is a Lie group and the fields Xi are left-invariant,

or (iii) M is a closed submanifold of the Euclidean space and the vector fields Xi satisfy

the following growth conditions |Xi(x)| ≤ c(1 + |x|), i = 0, 1, . . . k.

Define the endpoint map $\mathrm{End}_{x_0} : L^1([0, 1], \mathbb{R}^k) \to M$ by
\[ \mathrm{End}_{x_0}(u(\cdot)) = x(1), \]
where (x(·), u(·)) is the admissible pair corresponding to the control system (3.6) with
initial condition x(0) = x0. It is known that the endpoint map $\mathrm{End}_{x_0}$ is a smooth map
(see [45]). The critical points of the map $\mathrm{End}_{x_0}$ are called singular controls. Admissible
paths corresponding to singular controls are called singular paths.
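The endpoint map can be approximated numerically by integrating the control system. The following is a minimal sketch, assuming a system of the form (3.6) on $\mathbb{R}^3$ with zero drift and the two Heisenberg vector fields written in the docstring. This choice of fields, the function names `endpoint` and `square_loop`, and the step count are illustrative assumptions, not taken from the text.

```python
def endpoint(x0, control, steps=512):
    """Approximate End_{x0}(u) for the driftless Heisenberg system on R^3.

    Vector fields (an illustrative choice):
        X1 = d/dx - (y/2) d/dz,   X2 = d/dy + (x/2) d/dz.
    control : function t -> (u1, u2) defined on [0, 1]
    """
    def velocity(state, u):
        x, y, _ = state
        u1, u2 = u
        return (u1, u2, 0.5 * (x * u2 - y * u1))

    state, dt = tuple(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        # explicit midpoint (second-order Runge-Kutta) step
        k1 = velocity(state, control(t))
        mid = tuple(s + 0.5 * dt * k for s, k in zip(state, k1))
        k2 = velocity(mid, control(t + 0.5 * dt))
        state = tuple(s + dt * k for s, k in zip(state, k2))
    return state

def square_loop(t):
    """Piecewise constant control tracing the unit square counterclockwise."""
    return [(4, 0), (0, 4), (-4, 0), (0, -4)][min(int(4 * t), 3)]

x1 = endpoint((0.0, 0.0, 0.0), square_loop)
```

For this system the third coordinate of the endpoint is the signed area swept by the projection of the path to the (x, y)-plane, so the square loop returns to the vertical axis at height 1, the area of the unit square.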

We also need the Hessian of the mapping $\mathrm{End}_{x_0}$ at a critical point (see [7] for
details). For this let us consider the following general situation. Let E be a Banach space

which is a dense subspace of a Hilbert space H. Consider a map $\Phi : E \to \mathbb{R}^n$ for which
the restriction $\Phi|_W$ to any finite dimensional subspace W of the Banach space E is $C^2$.
Moreover, we assume that the first and second derivatives of all the restrictions $\Phi|_W$ are
continuous in the Hilbert space topology on the bounded subsets of the Banach space E.
In other words, for each point w in the space W, the following Taylor expansion holds:
\[ \Phi(v + w) - \Phi(v) = D_v\Phi(w) + \frac{1}{2} D_v^2\Phi(w) + o(|w|^2), \]
where $D_v\Phi$ is a linear map and $D_v^2\Phi$ is a quadratic mapping from E to $\mathbb{R}^n$. Moreover,
$\Phi(v)$, $D_v\Phi|_W$, and $D_v^2\Phi|_W$ depend continuously on v in the topology of the Hilbert space
H, while v is contained in a ball of E.

The Hessian $\mathrm{Hess}_v\Phi : \ker D_v\Phi \to \operatorname{coker} D_v\Phi$ of the function Φ is the restriction of the
second derivative $D_v^2\Phi$ to the kernel of the first derivative $D_v\Phi$ with values considered
up to the image of $D_v\Phi$. The Hessian is an intrinsically defined operator, i.e. a part of
$D_v^2\Phi$ which does not rely on the choice of variables in the source Banach space E and the
target Euclidean space $\mathbb{R}^n$.


Let p be a covector in the dual space $(\mathbb{R}^n)^*$ which annihilates the image of the first
derivative, i.e. $p(D_v\Phi) = 0$; then $p(\mathrm{Hess}_v\Phi)$ is a well-defined real quadratic form on $\ker D_v\Phi$.
We denote the Morse index of this quadratic form by $\mathrm{ind}(p\,\mathrm{Hess}_v\Phi)$. Recall that the

Morse index of a quadratic form is the supremum over all dimensions of the subspaces

on which the form is negative definite.

Definition 3.5 A critical point v of the map Φ is called sharp if there exists a covector
p ≠ 0 such that $p(D_v\Phi) = 0$ and $\mathrm{ind}(p\,\mathrm{Hess}_v\Phi) < +\infty$.

Needless to say, the spaces E, H, and Rn can be substituted by smooth manifolds (Ba-

nach, Hilbert, and n-dimensional) in the above consideration.

Going back to the control system (3.6), let (x(·), u(·)) be an admissible pair for this

system. We say that the control u(·) and the path x(·) are sharp if u(·) is a sharp critical

point of the endpoint map $\mathrm{End}_{x(0)}$.

One necessary condition for controls and paths to be sharp is the so-called Goh
condition.

Proposition 3.6 (The Goh condition) If $\mathrm{ind}(p\,\mathrm{Hess}_{u(\cdot)}\mathrm{End}_{x(0)}) < +\infty$, then
\[ p(t)(X_i(x(t))) = p(t)([X_i, X_j](x(t))) = 0, \qquad i, j = 1, \dots, k, \quad 0 \le t \le 1, \]
where $p(t) = P_{t,1}^* p$ and $P_{t,\tau}$ is the time-dependent local flow of the control system (3.6)
with control equal to u(·) and the time parameter τ.

See [7, Propositions 20.3 and 20.4], [9] and references therein for the proof and other effective
necessary and sufficient conditions of sharpness.

Now consider the optimal control problem of finding the minimizers for

\[ c(x, y) = \inf_{(x(\cdot), u(\cdot)) \in C_x^y} \int_0^1 L(x(t), u(t))\, dt, \tag{3.7} \]

where the infimum ranges over all admissible pairs (x(t), u(t)) corresponding to the con-

trol system (3.6) with the initial condition x(0) = x and the final condition x(1) = y.


Let H : T ∗M → R be the Hamiltonian function defined in (3.5). For simplicity, we

assume that the Hamiltonian is $C^2$. A minimizer x(·) of the above problem is called
normal if there exists a curve $p : [0, 1] \to T^*M$ in the cotangent bundle $T^*M$ such
that π(p(t)) = x(t) and p(·) is a trajectory of the Hamiltonian vector field $\vec{H}$. Singular
minimizers are also called abnormal. According to this imperfect terminology,
a minimizer can simultaneously be normal and abnormal. A minimizer which is not
normal is called strictly abnormal. Under some regularity and growth conditions on the
Lagrangian L, all strictly abnormal minimizers are sharp, see Theorem 3.7.

The following theorem gives simple sufficient conditions for completeness of the cost

function defined in (3.7). It is a combination of the well-known existence result (see [56])

and necessary optimality conditions (see [7]).

Theorem 3.7 (Completeness of costs) Let L be a Lagrangian function which satisfies

the following:

1. L is bounded below and, on each compact subset of M, there exists a constant K > 0
such that the ratio $\frac{|u|}{L(x, u) + K}$ tends to 0 uniformly as |u| → ∞;

2. for any compact C ⊂ M there exist constants a, b > 0 such that
$\left| \frac{\partial L}{\partial x}(x, u) \right| \le a\,(L(x, u) + |u|) + b$ for all x ∈ C, $u \in \mathbb{R}^k$;

3. the function $u \mapsto L(x, u)$ is a strongly convex function for all x ∈ M.

Then, for each pair of points (x, y) in the manifold M which satisfy c(x, y) < +∞,

there exists an admissible pair (x(·), u(·)) achieving the infimum in (3.7). Moreover, the

minimizer x(·) is either a normal or a sharp path.

Remark 3.8 Under the assumptions of Theorem 3.7, strictly abnormal minimizers are

sharp.


Remark 3.9 Theorem 3.7 leads to many examples that satisfy condition (3) in Theorem

3.1. In particular, this applies to the case where the control set is U = Rk and the

Lagrangian is $L(x, u) = \sum_{i=1}^{k} u_i^2$.
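The three conditions of Theorem 3.7 can be verified by hand for this Lagrangian. The check below is a short worked verification, written for $L(x, u) = |u|^2 = \sum_{i=1}^{k} u_i^2$ on $U = \mathbb{R}^k$; the choice K = 1 is one convenient constant among many.

```latex
\begin{align*}
&\text{(1)}\quad L \ge 0 \ \text{ and, with } K = 1,\quad
   \frac{|u|}{L(x, u) + K} = \frac{|u|}{|u|^2 + 1} \to 0
   \ \text{ as } |u| \to \infty, \text{ uniformly in } x; \\
&\text{(2)}\quad \frac{\partial L}{\partial x} \equiv 0
   \le a\,\big( L(x, u) + |u| \big) + b \ \text{ for any } a, b > 0; \\
&\text{(3)}\quad \frac{\partial^2 L}{\partial u^2} = 2\, \mathrm{Id},
   \ \text{ so } u \mapsto L(x, u) \text{ is strongly convex.}
\end{align*}
```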

Remark 3.10 It was shown that the normal optimal controls in Theorem 3.7 are locally

bounded, see [56]. This allows us to restrict the endpoint map to L∞([0, 1], U) in Theorem

3.11 below.

Next, we proceed to the main result of this section, which concerns the Lipschitz
regularity of the cost function. This takes care of condition (2) in Theorem 3.1. The
following theorem is also a more precise version of Theorem 1.3.

Theorem 3.11 [5] (Lipschitz regularity) Assume that the system (2.4) does not admit
sharp controls and the Lagrangian L satisfies conditions of Theorem 3.7. Then the set
\[ D = \{ (x, \mathrm{End}_x(u(\cdot))) \mid x \in M,\ u \in L^\infty([0, 1], \mathbb{R}^k) \} \]
is open in the product M × M. Moreover, the function $(x, y) \mapsto c(x, y)$ is locally
Lipschitz on the set D, where the cost c is given by (2.2).

Remark 3.12 In the case where the endpoint map is a submersion, there is no singular

control. Therefore, Theorem 3.11 is applicable. In particular, this theorem, together with

Theorems 3.1 and 3.7, can be used to treat the cases considered in [16, 42, 14]. In Section

3.3, we will consider a class of examples where the endpoint map is not necessarily a

submersion, but Theorem 3.11 is still applicable.

The rest of the section is devoted to the proof of Theorem 3.11.

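Definition 3.13 Given v in the Banach space E, we write $\mathrm{IND}_v\Phi \ge m$ if
\[ \mathrm{ind}(p\,\mathrm{Hess}_v\Phi) - \operatorname{codim} \operatorname{im} D_v\Phi \ge m \]
for any nonzero covector p in $(\mathbb{R}^n)^* \setminus \{0\}$ which annihilates the image of the first derivative
$D_v\Phi$: $p(D_v\Phi) = 0$.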


It is easy to see that $\{v \in E : \mathrm{IND}_v\Phi \ge m\}$ is an open subset of E for any integer m.

Let Bv(ε), Bx(ε) be the balls of radius ε in E and Rn centered at v and x, respectively.

The following is a qualitative version of the openness of a mapping Φ and any mapping

C0 close to it.

Definition 3.14 We say that the map $\Phi : E \to \mathbb{R}^n$ is r-solid at the point v of the
Banach space E if for some constant c > 0 and any sufficiently small ε > 0 the following
inclusion holds for any map $\tilde{\Phi} : B_v(\varepsilon) \to \mathbb{R}^n$ which is $C^0$ close to the map Φ:
\[ B_{\Phi(v)}(c\varepsilon^r) \subset \tilde{\Phi}(B_v(\varepsilon)). \]
As usual, to be $C^0$ close to Φ means that there exists δ > 0 such that
$\sup_{w \in B_v(\varepsilon)} |\tilde{\Phi}(w) - \Phi(w)| \le \delta$.

The Implicit function theorem together with the Brouwer fixed point theorem imply

that Φ is 1-solid at any regular point.

Lemma 3.15 If $\mathrm{IND}_v\Phi \ge 0$ then Φ is 2-solid at v.

Proof This lemma is a refinement of Theorem 20.3 from [7]. It can be proved by a
slight modification of the proof of the cited theorem. Obviously, we may assume that
v is a critical point of Φ. Moreover, by an argument in the proof of the theorem cited
above, we may assume that E is a finite dimensional space, v = 0 and Φ(0) = 0.
Let $E = E_1 \oplus E_2$, where $E_2 = \ker D_0\Phi$. For any v ∈ E we write $v = v_1 + v_2$, where
$v_1 \in E_1$, $v_2 \in E_2$. Now consider the mapping
\[ Q : v \mapsto D_0\Phi\, v_1 + \frac{1}{2} D_0^2\Phi(v_2), \qquad v \in E. \]
It is shown in the proof of [7, Theorem 20.3] that $Q^{-1}(0)$ contains regular points in
any neighborhood of 0. Hence, there exists c > 0 such that the image of any continuous
mapping $\tilde{Q} : B_0(1) \to \mathbb{R}^n$ sufficiently close (in $C^0$-norm) to $Q|_{B_0(1)}$ contains the ball $B_0(c)$.
Now, if $\tilde{\Phi}$ is $C^0$ close to Φ, we set $\Phi_\varepsilon(v) = \frac{1}{\varepsilon^2} \Phi(\varepsilon^2 v_1 + \varepsilon v_2)$ and
$\tilde{\Phi}_\varepsilon(v) = \frac{1}{\varepsilon^2} \tilde{\Phi}(\varepsilon^2 v_1 + \varepsilon v_2)$.
Then, by differentiating $\Phi_\varepsilon$ with respect to ε, it is easy to see that $\Phi_\varepsilon(v) = Q(v) + o(1)$
as ε → 0. This shows that $\Phi_\varepsilon$ and hence $\tilde{\Phi}_\varepsilon$ are $C^0$ close to Q for all sufficiently small ε.
Therefore, the image $\tilde{\Phi}_\varepsilon(B_0(1))$ contains the ball $B_0(c)$. This gives
\[ B_0(c) \subset \tilde{\Phi}_\varepsilon(B_0(1)) \subset \frac{1}{\varepsilon^2} \tilde{\Phi}(B_0(\varepsilon)) \]
and the result follows. □
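A two-dimensional toy example may help to read Definition 3.13 and Lemma 3.15. It is an illustration chosen here, with n = 1 and $E = \mathbb{R}^2$, and it only looks at the unperturbed map rather than at all $C^0$ close maps.

```latex
\begin{align*}
&\Phi(v_1, v_2) = v_1^2 - v_2^2, \qquad D_0\Phi = 0, \qquad
  \ker D_0\Phi = \mathbb{R}^2, \qquad \operatorname{codim} \operatorname{im} D_0\Phi = 1, \\
&\mathrm{Hess}_0\Phi(w) = D_0^2\Phi(w) = 2\,(w_1^2 - w_2^2), \qquad
  \mathrm{ind}(p\,\mathrm{Hess}_0\Phi) = 1 \ \text{ for every } p \neq 0, \\
&\mathrm{ind}(p\,\mathrm{Hess}_0\Phi) - \operatorname{codim} \operatorname{im} D_0\Phi = 0
  \ \Longrightarrow\ \mathrm{IND}_0\Phi \ge 0, \qquad
  \Phi(B_0(\varepsilon)) \supset (-\varepsilon^2, \varepsilon^2) = B_{\Phi(0)}(\varepsilon^2). \\
&\text{By contrast, for } \Psi(v_1, v_2) = v_1^2 + v_2^2 \text{ and } p = 1:\quad
  \mathrm{ind}(p\,\mathrm{Hess}_0\Psi) = 0, \ \text{ so } \mathrm{IND}_0\Psi \ge 0 \text{ fails,} \\
&\text{and } \Psi(B_0(\varepsilon)) \subset [0, \varepsilon^2) \text{ contains no neighborhood of } \Psi(0) = 0.
\end{align*}
```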

Remark 3.16 The minimization problem (3.7) can be rephrased into a constrained min-

imization problem in an infinite-dimensional space. For simplicity, consider the case of

M = Rn. Let (x(·), u(·)) be an admissible pair of the control system (3.6) and let

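$\varphi : \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k) \to \mathbb{R}$ be the function defined by
\[ \varphi(x, u(\cdot)) = \int_0^1 L(x(t), u(t))\, dt. \]
Let $\Phi : \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k) \to \mathbb{R}^n \times \mathbb{R}^n$ be the map
\[ \Phi(x, u(\cdot)) = (x, \mathrm{End}_x(u(\cdot))). \]
Finding the minimum in (3.7) is now equivalent to minimizing the function ϕ on the set
$\Phi^{-1}(x, y)$.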

Due to the above remark, we can consider the following general setting. Consider a

function ϕ : E → R on the Banach space E such that $\varphi|_W$ is a $C^2$-mapping for any finite
dimensional subspace W of E. Recall that the Hilbert space H contains E as a dense
subset. Assume that the function ϕ as well as the first and second derivatives of the
restrictions $\varphi|_W$ are continuous on the bounded subsets of E in the topology of H. Also,
recall that the map $\Phi : E \to \mathbb{R}^n$ is $C^2$ when restricted to any finite dimensional subspace
of E. Assume that K is a bounded subset of E that is compact in the topology of H
and satisfies the following property:
\[ \varphi(v) = \min\{ \varphi(w) \mid w \in E,\ \Phi(w) = \Phi(v) \} \]


for any v in the set K.

We define a function µ on Φ(K) by the formula µ(Φ(v)) = ϕ(v) for any v in K. In

the case discussed in Remark 3.16, K is the set of all minimizers and the function µ is

the cost function.

Lemma 3.17 If $\mathrm{IND}_v\Phi \ge 2$ for any v ∈ K, then µ is locally Lipschitz.

Proof Given v in the set K, there exists a finite dimensional subspace W of the Banach
space E such that $\mathrm{IND}_v(\Phi|_W) \ge 2$. Then $\mathrm{IND}_v(\Phi|_{W \cap \ker D_v\varphi}) \ge 0$. Hence
$\Phi|_{W \cap \ker D_v\varphi}$ is 2-solid at v and
\[ \Phi(B_v(\varepsilon) \cap W \cap \ker D_v\varphi) \supset B_{\Phi(v)}(c\varepsilon^2) \]
for some c and any sufficiently small ε.
Let x = Φ(v) and $|x - y| = c\varepsilon^2$, then y = Φ(w) for some $w \in B_v(\varepsilon) \cap W \cap \ker D_v\varphi$.
We have:
\[ \mu(y) - \mu(x) \le \varphi(w) - \mu(x) = \varphi(w) - \varphi(v) \le c'|w - v|^2 \le c'\varepsilon^2. \]
Here, we use the fact that w is in $\ker D_v\varphi$ for the second last inequality and that w is in
$B_v(\varepsilon)$ for the last inequality. Moreover, the compactness of K allows us to choose c, c′
and the bound for ε uniformly for all v ∈ K. In particular, we can exchange x and y in the last
inequality. Hence $|\mu(y) - \mu(x)| \le \frac{c'}{c}\, |y - x|$. □


We describe the proof only in the case $M = \mathbb{R}^n$ in order to simplify the language,
while the generalization to any manifold is straightforward. We set
\[ E = \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k), \qquad H = \mathbb{R}^n \times L^2([0, 1], \mathbb{R}^k), \]
\[ \Phi(x, u(\cdot)) = (x, \mathrm{End}_x(u(\cdot))), \qquad \varphi(x, u(\cdot)) = \int_0^1 L(x(t), u(t))\, dt, \]
and apply the above results.


First of all, $\mathrm{IND}_{(x, u(\cdot))}\Phi = \mathrm{IND}_{u(\cdot)}\mathrm{End}_x = +\infty$ for all (x, u(·)) since our system does
not admit sharp controls. Lemma 3.15 implies that Φ is 2-solid and D = Φ(E) is open.

Now let B be a ball in E equipped with the weak topology of H. The endpoint
mapping Φ is continuous as a mapping from B to $\mathbb{R}^{2n}$. The strict convexity of L implies
that there is some constant c > 0 such that
\[ \varphi(x_n, u_n(\cdot)) - \varphi(x, u(\cdot)) \ge c\, \| u_n(\cdot) - u(\cdot) \|_{L^2}^2 + o(1) \]
as $x_n \to x$, $u_n(\cdot) \rightharpoonup u(\cdot)$, and $(x_n, u_n(\cdot)) \in B$. Therefore,
$\liminf_{n \to \infty} \varphi(x_n, u_n(\cdot)) \ge \varphi(x, u(\cdot))$
and $\lim_{n \to \infty} \varphi(x_n, u_n(\cdot)) = \varphi(x, u(\cdot))$ if and only if $(x_n, u_n(\cdot))$ converges to (x, u(·)) in the
strong topology of H.

Assume that $\varphi(x_n, u_n(\cdot)) = \mu(\Phi(x_n, u_n(\cdot)))$ for all n. The inequality
$\varphi(x, u(\cdot)) < \lim_{n \to \infty} \varphi(x_n, u_n(\cdot))$ would imply that
\[ \mu(\Phi(x, u(\cdot))) < \lim_{n \to \infty} \mu(\Phi(x_n, u_n(\cdot))). \]
On the other hand, the openness of the map Φ implies that the map µ is upper semicontinuous.
Together with the continuity of Φ, we have the following inequality:
\[ \mu(\Phi(x, u(\cdot))) \ge \lim_{n \to \infty} \mu(\Phi(x_n, u_n(\cdot))). \]
Hence $\lim_{n \to \infty} \varphi(x_n, u_n(\cdot)) = \varphi(x, u(\cdot))$ and $(x_n, u_n(\cdot))$ converges to (x, u(·)) in the strong
topology of H.

Let C be a compact subset of D and

\[ K = \{ (x, u(\cdot)) \in E : \Phi(x, u(\cdot)) \in C,\ \varphi(x, u(\cdot)) = \mu(\Phi(x, u(\cdot))) \}. \]
Then K is contained in some ball B. Since B is equipped with the weak topology,
it is compact. Now the calculations of the previous two paragraphs imply compactness of K in
the strong topology of H. Finally, we derive the Lipschitz property of $\mu|_C$ from Lemma
3.17. □


3.3 Applications: Mass Transportation on Subriemannian Manifolds

In this section we will apply the results in the previous sections to some subriemannian

manifolds. First, let us recall some basic definitions.

Let ∆ and ∆′ be two (possibly singular) distributions on a manifold M . The distri-

bution [∆, ∆′] is defined by the span of all Lie brackets of vector fields in ∆ with vector

fields in ∆′, i.e.

\[ [\Delta, \Delta']_x = \operatorname{span}\{ [v, w](x) \mid v \in \Delta,\ w \in \Delta' \}. \]
Define inductively the following distributions: $\Delta^2 = \Delta + [\Delta, \Delta]$ and
$\Delta^k = \Delta^{k-1} + [\Delta, \Delta^{k-1}]$. A distribution ∆ is called k-generating if $\Delta^k = TM$ and the smallest such k
is called the degree of nonholonomy. Also, the distribution is called bracket generating
if it is k-generating for some k.
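A standard example of a 2-generating distribution is the Heisenberg distribution on $\mathbb{R}^3$. The computation below is a worked illustration; the particular normalization of the vector fields $X_1$, $X_2$ is an assumption made here for concreteness.

```latex
\begin{align*}
X_1 &= \partial_x - \tfrac{y}{2}\, \partial_z, \qquad
X_2 = \partial_y + \tfrac{x}{2}\, \partial_z, \\
[X_1, X_2] &= X_1\!\big( \tfrac{x}{2} \big)\, \partial_z - X_2\!\big( -\tfrac{y}{2} \big)\, \partial_z
            = \tfrac{1}{2}\, \partial_z + \tfrac{1}{2}\, \partial_z = \partial_z, \\
\Delta^2_x &= \operatorname{span}\{ X_1, X_2, [X_1, X_2] \}(x) = T_x\mathbb{R}^3 .
\end{align*}
```

Hence the distribution $\Delta = \operatorname{span}\{X_1, X_2\}$ has rank 2, is not involutive, and is 2-generating, so its degree of nonholonomy is 2.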

If ∆ is a bracket generating distribution, then it defines a flag of distributions by

\[ \Delta \subset \Delta^2 \subset \dots \subset TM. \]
The growth vector of the distribution ∆ at the point x is defined by $(\dim \Delta_x, \dim \Delta^2_x, \dots, \dim T_xM)$.

The distribution ∆ is called regular if the growth vector is the same for all x. Let

x(·) : [a, b] → M be an admissible curve, that is a Lipschitz curve almost everywhere

tangent to ∆. The following classical result on bracket generating distributions is the

starting point of subriemannian geometry.

Theorem 3.18 (Chow and Rashevskii) Given any two points x and y on a connected

manifold M with a bracket generating distribution, there exists an admissible curve join-

ing the two points.

Using the Chow-Rashevskii Theorem, we can define the subriemannian distance d. Let 〈, 〉 be a fibre inner product on the distribution ∆, called a subriemannian metric. The length of an admissible curve x(·) is defined in the usual way:

$$\mathrm{length}(x(\cdot)) = \int_a^b \sqrt{\langle \dot x(t), \dot x(t)\rangle}\, dt.$$

The subriemannian distance d(x, y) between two points x and y is defined as the infimum of the lengths of all admissible curves joining x and y. There is a quantitative version of the Chow-Rashevskii Theorem, called the Ball-Box Theorem, which gives Hölder continuity of the subriemannian distance; see [45] for details.
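The following minimal sketch makes the definitions concrete on a model case that is an assumption of this example, not part of the text: the Heisenberg distribution on $\mathbb{R}^3$ spanned by X1 = ∂x − (y/2)∂z and X2 = ∂y + (x/2)∂z, with X1, X2 orthonormal. The function name `lift_circle` is hypothetical. It lifts a planar circle through the origin to an admissible curve and returns the height reached and the subriemannian length.

```python
import math

def lift_circle(radius, steps=2000):
    """Lift a planar circle through the origin to a horizontal curve.

    Horizontal means dz = (x dy - y dx) / 2, i.e. the curve is tangent to
    span{X1, X2} with X1 = d/dx - (y/2) d/dz and X2 = d/dy + (x/2) d/dz.
    Returns (z_end, length): the height reached and the subriemannian length.
    """
    dt = 2.0 * math.pi / steps
    z = 0.0
    length = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt                    # midpoint rule in time
        x = radius * (1.0 - math.cos(t))      # counterclockwise circle
        y = -radius * math.sin(t)             # centred at (radius, 0)
        u1 = radius * math.sin(t)             # control u1 = dx/dt
        u2 = -radius * math.cos(t)            # control u2 = dy/dt
        z += 0.5 * (x * u2 - y * u1) * dt     # horizontality constraint
        length += math.hypot(u1, u2) * dt     # speed in the chosen metric
    return z, length
```

For a circle of radius r the lifted curve returns to x = y = 0 at height z = πr² with length 2πr. The point (0, 0, z) is therefore joined to the origin by an admissible curve, and the squared distance satisfies d² ≤ 4πz, a bound that is linear in the Euclidean displacement.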

Corollary 3.19 Let $d_S$ be the metric of a complete subriemannian space with a distribution ∆. The function $d_S^2$ is locally Lipschitz if and only if the distribution is 2-generating.

Proof Systems with 2-generating distributions do not admit sharp paths because of the Goh condition. So $d_S^2$ is locally Lipschitz by Theorem 3.11. Conversely, if the degree of nonholonomy k of the distribution is greater than 2, then it follows from the ball-box theorem [45, Theorem 2.10] that the function $d_S^2$ is not locally Lipschitz. Indeed, fix a point x in the manifold M and suppose that $d_S^2$ is locally Lipschitz. Since $d_S^2(x, x) = 0$, we have $d_S^2(x, y) \le c|x - y|$ for some constant c and for all y in a neighborhood U of x. On the other hand, by the ball-box theorem, for all sufficiently small ε there exists a point z in U whose subriemannian distance from x is $d_S(x, z) = \varepsilon$ and whose Euclidean distance from x satisfies $|x - z| < C\varepsilon^k = C d_S^k(x, z)$ for some constant C. Combining the two estimates gives $\varepsilon^2 \le c|x - z| < cC\varepsilon^k$, which is impossible for small ε because k > 2. This contradiction shows that $d_S^2$ is not locally Lipschitz. □
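The scaling behind this argument can be seen on a model case. As an illustrative assumption (the Engel group is not discussed in the text), take the Carnot group on $\mathbb{R}^4$ with coordinates $(x, y, z, w)$ and dilations of weights (1, 1, 2, 3), whose degree of nonholonomy is 3. Homogeneity of the distance under dilations gives, along the direction of highest weight:

```latex
\[
\delta_\lambda(x, y, z, w) = (\lambda x, \lambda y, \lambda^2 z, \lambda^3 w),
\qquad
d_S(0, \delta_\lambda q) = \lambda\, d_S(0, q),
\]
\[
d_S^2\bigl(0, (0,0,0,w)\bigr) = |w|^{2/3}\, d_S^2\bigl(0, (0,0,0,\pm 1)\bigr),
\qquad
\frac{d_S^2\bigl(0, (0,0,0,w)\bigr)}{|w|} \;\to\; \infty \quad (w \to 0).
\]
```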

Combining Corollary 3.19 with Theorem 3.1, we prove the existence and uniqueness

of an optimal map for a subriemannian manifold with a 2-generating distribution.

Theorem 3.20 [5] Let M be a complete subriemannian manifold defined by a 2-generating distribution. Then there exists a unique (up to µ-measure zero) optimal map for Monge's problem with the cost $c = d_S^2$, where $d_S$ is the subriemannian distance of M.

Remark 3.21 The locally Lipschitz property of the distance d off the diagonal is guaranteed for a much bigger class of distributions. In particular, it is proved in [3] that a generic distribution of rank > 2 does not admit non-constant sharp trajectories.

In the case of Carnot groups, the following estimates hold: a typical n-dimensional Carnot group with a rank k distribution does not admit nonconstant sharp trajectories if $n \le (k - 1)k + 1$, and it has nonconstant sharp length minimizing trajectories provided that $n \ge (k - 1)\left(\frac{k^2}{3} + \frac{5k}{6} + 1\right)$. Recall that a simply-connected Lie group endowed with a left-invariant distribution $V_1$ is a Carnot group if its Lie algebra $\mathfrak{g}$ is a graded nilpotent Lie algebra which is Lie generated by the subspace with the smallest grading (i.e. $\mathfrak{g} = V_1 \oplus V_2 \oplus \dots \oplus V_k$, $[V_i, V_j] = V_{i+j}$, $V_i = 0$ if $i > k$, and the subspace $V_1$ Lie-generates $\mathfrak{g}$).
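To get a feeling for the gap between the two estimates, they can be tabulated for small ranks. This is only a sketch that evaluates the two inequalities quoted above; the function name `carnot_thresholds` is hypothetical.

```python
from fractions import Fraction

def carnot_thresholds(k):
    """Evaluate the two dimension bounds for a rank-k distribution.

    Returns (no_sharp_max, sharp_min), where
    no_sharp_max is the largest n with n <= (k - 1) k + 1, and
    sharp_min is the smallest integer n with
    n >= (k - 1) (k^2 / 3 + 5 k / 6 + 1).
    """
    no_sharp_max = (k - 1) * k + 1
    bound = (k - 1) * (Fraction(k * k, 3) + Fraction(5 * k, 6) + 1)
    sharp_min = -(-bound.numerator // bound.denominator)  # exact ceiling
    return no_sharp_max, sharp_min
```

For k = 2 the two bounds are 3 and 4, so in rank 2 the estimates leave no gap: dimension 3 excludes nonconstant sharp trajectories, while dimension 4 and higher falls under the second estimate.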

Clearly, if the cost is locally Lipschitz off the diagonal, then the statement of Theorem 3.1 remains valid under the extra assumption that the supports of the initial measure µ and the final measure ν are disjoint: supp(µ) ∩ supp(ν) = ∅. In the subriemannian case a more general result is proved in [24], where it is shown that Lipschitz continuity of the distance d is needed only off the diagonal, even when the supports of the measures µ and ν are not disjoint.

Chapter 4

Optimal Transportation with Non-Lipschitz Cost

4.1 Normal Minimizers and Properties of Optimal Maps with Continuous Optimal Control Costs

According to Theorem 3.11, it remains to study the case where sharp controls exist. In

this section we will describe properties of optimal maps when the cost is continuous.

Normal minimizers will play a very important role.

We continue to study the optimal control problem (3.6), (3.7). As we already mentioned, strictly abnormal minimizers must be sharp. In addition, if X0 = 0 in (3.6), then the optimal control cost is continuous. According to the discussion at the end of the previous section, we expect strictly abnormal minimizers to be present mainly for generic rank 2 distributions on manifolds of dimension greater than 3 and for typical Carnot groups of sufficiently large corank. In these situations strictly abnormal minimizers are indeed unavoidable.

The existence of strictly abnormal minimizers for subriemannian manifolds was first

proved in [44]. In [61] and [38] it was shown that there are many strictly abnormal

minimizers in general for subriemannian manifolds, see Theorem 4.1 below. Finally, a

general theory of abnormal minimizers for rank 2 distributions was developed in [8]. We

refer to [45] for a detailed account of the history of abnormal minimizers and for further references.

Here is a sample result from [61] which is of interest to us:

Theorem 4.1 (Liu and Sussmann) Let M be a 4-dimensional manifold with a rank 2

regular bracket generating distribution ∆ and a subriemannian metric 〈, 〉. Let X1 and

X2 be two global sections of ∆ such that

1. X1 and X2 are everywhere orthonormal,

2. X1, X2, [X1, X2] and [X2, [X1, X2]] are everywhere linearly dependent,

3. X2, [X1, X2] and [X2, [X1, X2]] are everywhere linearly independent.

Then any sufficiently short segments of the integral curves of the vector field X2 are

strictly abnormal minimizers.

We call a local flow a strictly abnormal flow if the corresponding trajectories are all

strictly abnormal minimizers. An interesting question is whether the time-1-map of an

abnormal flow is an optimal map. The following theorem shows that this is not the case

for any reasonable initial measure and continuous cost.

Theorem 4.2 Assume that the cost c in (2.5) is continuous and bounded below, and that the support of the measure µ is equal to the closure of its interior. If ϕ : M → M is a continuous map such that $(\mathrm{id} \times \varphi)_*\mu$ achieves the infimum in Problem 2.6, then x and ϕ(x) can be connected by a normal minimizer for a dense set of points x in the support of µ.

Proof By Theorem 2.9, there exists a function $f : M \to \mathbb{R} \cup \{-\infty\}$ such that f and its $c_1$-transform $f^{c_1}$ achieve the supremum in Problem 2.8. Moreover, by Theorem 2.10, the functions f and $f^{c_1}$ are upper semicontinuous. By Theorem 2.11,

$$f(x) + f^{c_1}(\varphi(x)) = c(x, \varphi(x)) \qquad (4.1)$$

for µ-almost all x. Since f and $f^{c_1}$ are upper semicontinuous and c and ϕ are continuous, passing to the limit gives

$$f(x) + f^{c_1}(\varphi(x)) \ge c(x, \varphi(x))$$

for all x in the support U of µ. But $f(x) + f^{c_1}(y) \le c(x, y)$ for any x, y in the manifold M. So the equality (4.1) holds for all x in U. Therefore, x achieves the infimum $f^{c_1}(\varphi(x)) = \inf_{z \in M}[c(z, \varphi(x)) - f(z)]$ for all x in the support of µ. Moreover, using (4.1), it is easy to see that the function f is continuous on U. In particular, it is subdifferentiable on a dense subset of U. By Proposition 3.2 and Theorem 3.3, x and ϕ(x) can be connected by a normal minimizer if f is subdifferentiable at x. This proves the theorem. □

4.2 Optimal Maps with Abnormal Minimizers

In this section, we describe an important class of control systems which admit smooth

optimal maps built essentially from abnormal minimizers. Recall that abnormal mini-

mizers are singular trajectories of the control system, whose definition does not depend

on the Lagrangian.

Let $\rho : M \xrightarrow{G} N$ be a principal bundle with a connected Abelian structure group G and let $X_1, \dots, X_k$ be vertical vector fields which generate the action of G. Consider the following control system

$$\dot x(t) = X_0(x(t)) + \sum_{i=1}^k u_i(t) X_i(x(t)), \qquad (4.2)$$

where $X_0$ is a smooth vector field on M, and the re-scaled systems

$$\dot x(t) = \varepsilon X_0(x(t)) + \sum_{i=1}^k u_i(t) X_i(x(t)) \qquad (4.3)$$

for ε > 0.

We define the Hamiltonian $H : T^*N \to \mathbb{R}$ on the cotangent bundle $T^*N$ of the base N by

$$H(p_x) = \max\{p_x(d\rho(X_0(y))) \mid y \in \rho^{-1}(x)\}, \qquad (4.4)$$

where $p_x$ is a covector in $T^*N$. We also assume that the maximum above is achieved and finite for any p in $T^*N$.

A typical example is the Hopf bundle $\varphi : SU(2) \xrightarrow{S^1} S^2$ and a left-invariant vector field $X_0$. In this case, the Hamiltonian is given by H(p) = α|p|, where α is a constant and |p| is the length of the covector p with respect to the standard (constant curvature) Riemannian structure on the sphere, see [7, Section 22.2].

Consider the following control system on the base N with an admissible pair y(·) contained in the G-bundle $\rho : M \xrightarrow{G} N$ and an admissible trajectory x(t) = ρ(y(t)), see Remark 2.6:

$$\dot x(t) = d\rho(X_0(y(t))). \qquad (4.5)$$

The function H in (4.4) is the Hamiltonian of the time-optimal problem of the control

system (4.5). (Recall that the time optimal problem is the following minimization prob-

lem: Fix two points x0 and x1 in N and minimize the time t1 among all admissible

trajectories x(·) of the control system (4.5) such that x(t0) = x0 and x(t1) = x1.)

Remark 4.3 System (4.5) is the reduced system associated to system (4.2) according

to the reduction procedure described in [7, Chapter 22]. In particular, ρ transforms any

admissible trajectory of system (4.2) to the admissible trajectory of system (4.5). Also,

the smooth extremal trajectories of the time-optimal problem for system (4.5) are images

under the map ρ of singular trajectories of system (4.2).

For any ε > 0 and any $C^2$ smooth function f : N → R, we introduce the map

$$\Phi^\varepsilon_f : N \to N, \qquad \Phi^\varepsilon_f(x) = \pi\big(e^{\varepsilon \vec H}(d_x f)\big), \quad x \in N,$$

where $\pi : T^*N \to N$ is the standard projection and $t \mapsto e^{t\vec H}$ is the Hamiltonian flow of H. Set

$$D = \{p \in T^*N : H(p) > 0,\ H \text{ is of class } C^2 \text{ at } p\}.$$

Assume that $\Phi^\varepsilon_f$ pushes the measure µ′ forward to another measure ν′ on N. Consider some “lifts” µ and ν of the measures µ′ and ν′: $\rho_*\mu = \mu'$, $\rho_*\nu = \nu'$. Let Ψ : M → M be an optimal map pushing µ forward to ν. The following theorem says that Ψ is a covering of $\Phi^\varepsilon_f$: $\rho \circ \Psi = \Phi^\varepsilon_f \circ \rho$.

Theorem 4.4 Let K be a compact subset of N and $f \in C^2(N)$. Assume that $df|_K \subset D$. Let µ and ν be Borel probability measures such that $\mathrm{supp}(\rho_*\mu) \subset K$. Then, for any sufficiently small ε > 0 and any optimal Borel map Ψ : M → M of the control system (4.3) with any Lagrangian L : TM → R, the following is true whenever $\rho_*\nu = (\Phi^\varepsilon_f)_*(\rho_*\mu)$:

$$\rho \circ \Psi = \Phi^\varepsilon_f \circ \rho.$$

In particular, x and Ψ(x) are connected by singular trajectories.

Proof We start with the following

Definition 4.5 We say that a Borel map Q : K → N is ε-admissible for system (4.3)

if there exists a Borel map ϕ : K → L∞([0, ε], G) such that

Q(x0) = x (ε; ϕ(x0)(·)) , ∀x0 ∈ K,

where t ↦ x (t; ϕ(x0)(·)) is an admissible trajectory of the reduced control system (4.5)

with the initial condition x (0; ϕ(x0)(·)) = x0.

We are going to prove that $\Phi^\varepsilon_f$ is an admissible map, unique up to a set of $\rho_*\mu$-measure zero, which transforms $\rho_*\mu$ into $\rho_*\nu$. This fact implies the statement of the theorem.

The inequality $H(d_x f) > 0$ implies that $d\pi(\vec H(d_x f))$ is transversal to the level hypersurface of f passing through x. Hence the map $\Phi^\varepsilon_f$ is invertible in a neighborhood of K for any sufficiently small ε. Moreover, the curve $t \mapsto \Phi^t_f(x)$, 0 ≤ t ≤ ε, is the unique admissible trajectory of system (4.5) which starts at the hypersurface $f^{-1}(f(x))$ and arrives at the point $\Phi^\varepsilon_f(x)$ at a time not greater than ε. The last fact is proved by a simple adaptation of the standard sufficient optimality condition (see [7, Chapter 17]).


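Now we set

$$f_\varepsilon(x) = f\big((\Phi^\varepsilon_f)^{-1}(x)\big) + \varepsilon,$$

then $f_\varepsilon$ is a smooth function defined in a neighborhood of K.

The optimality property of $\Phi^\varepsilon_f$ implies that

$$f_\varepsilon(Q(x)) \le f_\varepsilon\big(\Phi^\varepsilon_f(x)\big)$$

for any ε-admissible map Q and any x ∈ K, and the inequality is strict at any point x where $Q(x) \ne \Phi^\varepsilon_f(x)$. In particular, if

$$\rho_*\mu\big(\{x \in K : Q(x) \ne \Phi^\varepsilon_f(x)\}\big) > 0,$$

then

$$\int_N f_\varepsilon \, d(Q_*(\rho_*\mu)) = \int_N f_\varepsilon \circ Q \, d(\rho_*\mu) < \int_N f_\varepsilon \circ \Phi^\varepsilon_f \, d(\rho_*\mu) = \int_N f_\varepsilon \, d(\rho_*\nu).$$

Hence $Q_*(\rho_*\mu) \ne \rho_*\nu$. This proves the first part of the theorem. Recall that the maps Ψ and $\Phi^\varepsilon_f$ satisfy the following relation:

$$\rho \circ \Psi = \Phi^\varepsilon_f \circ \rho.$$

It follows from this and Remark 4.3 that x and Ψ(x) are connected by a singular minimizer. □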

Part II

A Nonholonomic Moser Theorem and Optimal Mass Transport

Chapter 5

Classical and Nonholonomic Moser Theorems

The main goal of this chapter is to prove the following nonholonomic version of the

classical Moser theorem. It is also a more precise version of Theorem 1.4 mentioned in

the introduction. Consider a distribution τ on a compact manifold M (without boundary,

unless otherwise stated).

Theorem 5.1 Let τ be a bracket-generating distribution, and let $\mu_0, \mu_1$ be two volume forms on M with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists a diffeomorphism φ of M which is the time-one-map of the flow $\phi_t$ of a non-autonomous vector field $V_t$ tangent to the distribution τ everywhere on M for every t ∈ [0, 1], such that $\phi^*\mu_1 = \mu_0$.

Note that the existence of the “nonholonomic isotopy” φt is guaranteed by a single condition, the equality of the total volumes of µ0 and µ1, just like in the classical case:

Theorem 5.2 [43] Let M be a manifold without boundary, and let $\mu_0, \mu_1$ be two volume forms on M with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists a diffeomorphism φ of M, isotopic to the identity, such that $\phi^*\mu_1 = \mu_0$.

Remark 5.3 The classical Moser theorem has numerous variations and generalizations,

some of which we would like to mention.

a) Similarly one can show that not only the identity, but any diffeomorphism of M is

isotopic to a diffeomorphism which pulls back µ1 to µ0.

b) The Moser theorem also holds for a manifold M with boundary. In this case a

diffeomorphism φ is a time-one-map for a (non-autonomous) vector field Vt on M , tangent

to the boundary ∂M .

c) Moser also proved in [43] a similar statement for a pair of symplectic forms on a manifold M: if two symplectic structures can be deformed to each other among symplectic structures in the same cohomology class on M, this deformation can be carried out by a flow of diffeomorphisms of M.

Below we describe to which degree these variations extend to the nonholonomic case.

Apparently, the most straightforward generalization of the classical Moser theorem is its version “with parameters.” In this case, the volume forms on M depend smoothly on a parameter s and have the same total volume for each value of this parameter: $\int_M \mu_0(s) = \int_M \mu_1(s)$ for all s. The theorem guarantees that the corresponding diffeomorphism exists and depends smoothly on this parameter s.

The following theorem can be regarded as a modification of the parameter version:

Theorem 5.4 [34] Let π : N → B be a fibration of an n-dimensional manifold N over a k-dimensional base manifold B. Suppose that $\mu_0, \mu_1$ are two smooth volume forms on N. Assume that the pushforwards of these n-forms to B coincide, i.e. they give one and the same k-form on B: $\pi_*\mu_0 = \pi_*\mu_1$. Then there exists a diffeomorphism φ of N which is the time-one-map of a (non-autonomous) vector field $V_t$ tangent everywhere to the fibers of this fibration and such that $\phi^*\mu_1 = \mu_0$.

Remark 5.5 Note that in this version the volume forms are given on the ambient man-

ifold N , while in the parametric version of the Moser theorem we are given fiberwise


volume forms. There is also a similar version of this theorem for a foliation, cf. e.g.

[27]. In either case, for the corresponding diffeomorphism to exist the volume forms have

to satisfy infinitely many conditions (the equality of the total volumes as functions in

the parameter s or as the push-forwards π∗µ0 and π∗µ1). The case of a fibration (or a

foliation) corresponds to an integrable distribution τ , and presents the “opposite case” to

a bracket-generating distribution. Unlike the case of an integrable distribution, the exis-

tence of the corresponding isotopy between volume forms in the bracket-generating case

imposes only one condition, the equality of the total volumes of the two forms (regardless,

e.g., of the distribution growth vector at different points of the manifold).

First, we recall a proof of the classical Moser theorem. To show how the proof changes

in the nonholonomic case, we split it into several steps.

Proof

1) Connect the volume forms $\mu_0$ and $\mu_1$ by a “segment” $\mu_t = \mu_0 + t(\mu_1 - \mu_0)$, t ∈ [0, 1]. We will be looking for a diffeomorphism $g_t$ sending $\mu_t$ to $\mu_0$: $g_t^*\mu_t = \mu_0$. By taking the t-derivative of this equation, we get the following “homological equation” on the velocity $V_t$ of the flow $g_t$: $g_t^*(L_{V_t}\mu_t + \partial_t\mu_t) = 0$, where $\partial_t g_t(x) = V_t(g_t(x))$. This is equivalent to

$$L_{V_t}\mu_t = \mu_0 - \mu_1,$$

since $\partial_t\mu_t = -(\mu_0 - \mu_1)$.

By rewriting $\mu_0 - \mu_1 = \rho_t\mu_t$ for an appropriate function $\rho_t$, we reformulate the equation $L_{V_t}\mu_t = \rho_t\mu_t$ as the problem $\mathrm{div}_{\mu_t}V_t = \rho_t$ of looking for a vector field $V_t$ with a prescribed divergence $\rho_t$. Note that the total integral of the function $\rho_t$ (relative to the volume $\mu_t$) over M vanishes, which reflects the equality of the total volumes of $\mu_0$ and $\mu_1$.

2) We omit the index t for now and consider a Riemannian metric on M whose volume form is µ. We are looking for the required field V with prescribed divergence among gradient vector fields V = ∇u, which “transport the mass” in the fastest way. This leads us to the elliptic equation $\mathrm{div}_\mu(\nabla u) = \rho$, i.e. ∆u = ρ, where the Laplacian ∆ is defined by $\Delta u := \mathrm{div}_\mu\nabla u$ and depends on the Riemannian metric on M.

3) The key part of the proof is the following

Lemma 5.6 The Poisson equation ∆u = ρ on a compact Riemannian manifold M is solvable for any function ρ with zero mean: $\int_M \rho\,\mu = 0$ (with respect to the Riemannian volume form µ).

Proof First we describe the space $\mathrm{Coker}\,\Delta := (\mathrm{Im}\,\Delta)^{\perp_{L^2}}$, i.e. we find the space of all functions h which are $L^2$-orthogonal to the image Im ∆. By applying integration by parts twice, one has:

$$0 = \langle h, \Delta u\rangle_{L^2} = -\langle \nabla h, \nabla u\rangle_{L^2} = \langle \Delta h, u\rangle_{L^2}$$

for all smooth functions u on M. Then such functions h must be harmonic, and hence they are constant functions on M: $(\mathrm{Im}\,\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$ (see [62, p.402] for details). Since the image Im ∆ is closed, it is the $L^2$-orthogonal complement of the space of constant functions: $\mathrm{Im}\,\Delta = \{\mathrm{const}\}^{\perp_{L^2}}$. The condition of orthogonality to constants is exactly the condition of zero mean for ρ: $\langle \mathrm{const}, \rho\rangle_{L^2} = \int_M \rho\,\mu = 0$. Thus the equation ∆u = ρ has a weak solution for ρ with zero mean, and the ellipticity of ∆ implies that the solution is smooth for a smooth function ρ. □
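On the flat circle $S^1 = \mathbb{R}/\mathbb{Z}$ with µ = dx (a model case chosen here for illustration), the lemma can be checked directly with Fourier series: the zero-mean condition is exactly what is needed to divide by the symbol of the Laplacian.

```latex
\[
\rho(x) = \sum_{k \in \mathbb{Z}} \hat\rho_k\, e^{2\pi i k x},
\qquad
\hat\rho_0 = \int_{S^1} \rho\, dx = 0
\quad\Longrightarrow\quad
u(x) = -\sum_{k \ne 0} \frac{\hat\rho_k}{4\pi^2 k^2}\, e^{2\pi i k x}
\ \text{ solves }\ u'' = \rho .
\]
```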

4) Now, take $V_t := \nabla u_t$ and let $g^t_V$ be the corresponding flow on M. Since M is compact and $V_t$ is smooth, the flow exists for all time t. The diffeomorphism $\phi := g^1_V$, the time-one-map of the flow $g^t_V$, gives the required map which pulls back the volume form $\mu_1$ to $\mu_0$: $\phi^*\mu_1 = \mu_0$. □
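The steps of this proof can be traced numerically on the circle R/Z. The sketch below is an illustration under stated assumptions, not the construction used in the text: the densities 1 and 1 + A cos(2πx), the amplitude A, and all function names are hypothetical choices, and the field is obtained by integrating the one-dimensional homological equation once with zero integration constant.

```python
import math

A = 0.5  # amplitude of the target density; |A| < 1 keeps it positive

def density(t, x):
    """Density of mu_t = mu_0 + t (mu_1 - mu_0) on the circle R/Z."""
    return 1.0 + t * A * math.cos(2.0 * math.pi * x)

def velocity(t, x):
    """A field V_t with (V_t * density_t)' = density_0 - density_1."""
    return -A * math.sin(2.0 * math.pi * x) / (2.0 * math.pi * density(t, x))

def moser_map(x, steps=200):
    """Time-one map of the flow of V_t, integrated with classical RK4."""
    h = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h * k1 / 2)
        k3 = velocity(t + h / 2, x + h * k2 / 2)
        k4 = velocity(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def cdf_mu1(y):
    """Primitive of the density of mu_1 (its cumulative mass on [0, y])."""
    return y + A * math.sin(2.0 * math.pi * y) / (2.0 * math.pi)
```

In one dimension the pull-back condition for the time-one map φ says that cdf_mu1(φ(x)) − x is constant in x. With the field above this constant is zero, so comparing `cdf_mu1(moser_map(x))` with `x` at a few sample points checks the construction up to the integration error.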

Proof of Theorem 5.4, the Moser theorem for a fibration:

We start by defining a new volume form on the fibres F using the pushforward k-form $\nu_0 := \pi_*\mu_0$ on the base B and the volume n-form $\mu_0$ on N. Namely, consider the pull-back k-form $\pi^*\nu_0$ on N. Then there is a unique (n − k)-form $\mu^F_0$ on the fibers such that $\mu^F_0 \wedge \pi^*\nu_0 = \mu_0$. More precisely, let $v_1, \dots, v_{n-k}$ be a linearly independent set of vectors in the tangent space $T_xN$ of N at x which are tangent to the fibre. If we extend this set of tangent vectors to a basis $v_1, \dots, v_n$ of $T_xN$, then the volume form $\mu^F_0$ is defined by

$$\mu^F_0(v_1, \dots, v_{n-k}) = \frac{\mu_0(v_1, \dots, v_n)}{\pi^*\nu_0(v_{n-k+1}, \dots, v_n)}.$$

It follows from linear algebra that the above definition of $\mu^F_0$ is independent of the choice of extension and that there is only one such (n − k)-form on the fibre. Similarly we find $\mu^F_1$. Due to the equality of the pushforwards $\pi_*\mu_0$ and $\pi_*\mu_1$, the total volumes of $\mu^F_0$ and $\mu^F_1$ are fiberwise equal. Hence, by the Moser theorem applied to the fibres, there is a smooth vector field tangent to the fibers, smoothly depending on a base point, whose flow sends one of the (n − k)-forms, $\mu^F_1$, to the other, $\mu^F_0$. This field is defined globally on N, and hence its time-one-map pulls back $\mu_1$ to $\mu_0$. □

Now we turn to a nonholonomic distribution on a manifold.

Proof of Theorem 5.1, the nonholonomic version of the Moser theorem.

1) As before, we connect the forms by a segment $\mu_t$, t ∈ [0, 1], and we come to the same homological equation. The latter reduces to $\mathrm{div}_\mu V = \rho$ with $\int_M \rho\,\mu = 0$, but the equation now is for a vector field V tangent to the distribution τ.

2) Consider some Riemannian metric on M. Now we will be looking for the required field V in the form $V := P^\tau\nabla u$, where $P^\tau$ is the pointwise orthogonal projection of tangent vectors to the planes of our distribution τ.

We obtain the equation $\mathrm{div}_\mu(P^\tau\nabla u) = \rho$. Rewrite this equation by introducing the sub-Laplacian $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ associated to the distribution τ and the Riemannian metric on M. The equation on the potential u becomes $\Delta^\tau u = \rho$.

3) An analog of Lemma 5.6 is now as follows.

Proposition 5.7 [34] a) The sub-Laplacian operator $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ is a self-adjoint hypoelliptic operator. Its image is closed in $L^2$.

b) The equation $\Delta^\tau u = \rho$ on a compact Riemannian manifold M is solvable for any function ρ with zero mean: $\int_M \rho\,\mu = 0$.

Proof a) The principal symbol $\delta^\tau$ of the operator $\Delta^\tau$ is the sum of squares of vector fields forming a basis for the distribution τ: $\delta^\tau = \sum X_i^2$, where the $X_i$ form a horizontal orthonormal frame for τ. Since τ is bracket-generating, this is exactly the Hörmander condition of hypoellipticity [29] for the operator $\Delta^\tau$. The self-adjointness follows from the properties of projection and integration by parts. The closedness of the image in $L^2$ follows from the results of [54, 55].

b) We need to find the condition of weak solvability in $L^2$ for the equation $\Delta^\tau u = \rho$. Again, we are looking for all those functions h which are $L^2$-orthogonal to the image of $\Delta^\tau$ (or, which is the same, lie in the kernel of this operator):

$$0 = \langle h, \Delta^\tau u\rangle_{L^2} = \langle h, \mathrm{div}_\mu(P^\tau\nabla u)\rangle_{L^2}$$

for all smooth functions u on M. In particular, this should hold for u = h. Integrating by parts we come to

$$0 = \langle h, \mathrm{div}_\mu(P^\tau\nabla h)\rangle_{L^2} = -\langle\nabla h, P^\tau\nabla h\rangle_{L^2} = -\langle P^\tau\nabla h, P^\tau\nabla h\rangle_{L^2},$$

where in the last equality we used the projection property $(P^\tau)^2 = P^\tau = (P^\tau)^*$. Then $P^\tau\nabla h = 0$ on M, and hence the equation $\Delta^\tau u = \rho$ is solvable for any function $\rho \perp_{L^2} \{h \mid P^\tau\nabla h = 0\}$. We claim that all such functions h are constant on M. Indeed, the condition $P^\tau\nabla h = 0$ means that $L_X h = 0$ for any horizontal field X, i.e. a field tangent to the distribution τ. But then h must be constant along any horizontal path, and due to the Chow-Rashevskii theorem it must be constant everywhere on M. Thus the functions ρ must be $L^2$-orthogonal to all constants, and hence they have zero mean. This implies that the equation $\mathrm{div}_\mu(P^\tau\nabla u) = \rho$ is solvable for any $L^2$ function ρ with zero mean. For a smooth ρ the solution is also smooth due to hypoellipticity of the operator. □

4) Now consider the horizontal field $V_t := P^\tau\nabla u_t$. As before, the time-one-map of its flow exists for the smooth field $V_t$ on the compact manifold M, and it gives the required diffeomorphism φ. □


According to the classical Helmholtz-Hodge decomposition, any vector field W on a Riemannian manifold M can be uniquely decomposed into the sum W = V + U, where V = ∇f and $\mathrm{div}_\mu U = 0$. Proposition 5.7 suggests the following nonholonomic Hodge decomposition of vector fields on a manifold with a bracket-generating distribution:

Proposition 5.8 1) For a bracket-generating distribution τ on a Riemannian manifold M, any vector field W on M can be uniquely decomposed into the sum W = V + U, where the field $V = P^\tau\nabla f$ is tangent to the distribution τ, while the field U is divergence-free: $\mathrm{div}_\mu U = 0$. Here $P^\tau$ is the pointwise orthogonal projection to τ.

2) Moreover, if the vector field W is tangent to the distribution τ on M, then W = V + U, where $V = P^\tau\nabla f \parallel \tau$ as before, while the field U is divergence-free, tangent to τ, and $L^2$-orthogonal to V, see Figure 5.1.

Figure 5.1: A nonholonomic Hodge decomposition.

Proof Let $\rho := \mathrm{div}_\mu W$ be the divergence of W with respect to the Riemannian volume µ. First, note that $\int_M \rho\,\mu = 0$. Indeed, $\int_M (\mathrm{div}_\mu W)\,\mu = \int_M L_W\mu = 0$, since the total volume of µ is defined in a coordinate-free way, and does not change along the flow of the field W.

Now, apply Proposition 5.7 to find a solution of the equation $\mathrm{div}_\mu(P^\tau\nabla f) = \rho$. The field $V := P^\tau\nabla f$ is defined uniquely. Then the field U := W − V is divergence-free, which proves 1).

For a field $W \parallel \tau$, we define $V := P^\tau\nabla f$ in the same way. Note that $V \parallel \tau$ as well. Then U := W − V is both tangent to τ and divergence-free. Furthermore,

$$\langle U, V\rangle_{L^2} = \langle U, P^\tau\nabla f\rangle_{L^2} = \langle P^\tau U, \nabla f\rangle_{L^2} = \langle U, \nabla f\rangle_{L^2} = -\langle \mathrm{div}_\mu U, f\rangle_{L^2} = 0,$$

where we used the properties of U established above: $U \parallel \tau$ and $\mathrm{div}_\mu U = 0$. □

Above we defined a sub-Laplacian $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ for a function u on a Riemannian manifold M with a distribution τ.

Proposition 5.9 The sub-Laplacian $\Delta^\tau$ depends only on a subriemannian metric on the distribution τ and a volume form on the ambient manifold M.

Proof Note that the operator $P^\tau\nabla$ on a function u is the horizontal gradient $\nabla^\tau$ of u, i.e. the vector of the fastest growth of u among the directions in τ. If one chooses a local orthonormal frame $X_1, \dots, X_k$ in τ, then $P^\tau\nabla u = \sum_{i=1}^k (L_{X_i}u)X_i$. Thus the definition of the horizontal gradient relies on the subriemannian metric only.

The sub-Laplacian $\Delta^\tau\psi = \mathrm{div}_\mu(P^\tau\nabla\psi)$ also needs the volume form µ on the ambient manifold in order to take the divergence with respect to this form. □
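In a local orthonormal frame the dependence on these two pieces of data can be made explicit. The expansion below uses only the product rule $\mathrm{div}_\mu(gX) = L_X g + g\,\mathrm{div}_\mu X$; the Heisenberg fields in the second line are an illustrative choice and are not part of the text.

```latex
\[
\Delta^\tau u
= \mathrm{div}_\mu\Bigl(\sum_{i=1}^k (L_{X_i}u)\,X_i\Bigr)
= \sum_{i=1}^k \Bigl( L_{X_i}L_{X_i}u + (\mathrm{div}_\mu X_i)\,L_{X_i}u \Bigr).
\]
\[
X_1 = \partial_x - \tfrac{y}{2}\,\partial_z,\quad
X_2 = \partial_y + \tfrac{x}{2}\,\partial_z,\quad
\mu = dx\wedge dy\wedge dz
\ \Longrightarrow\
\mathrm{div}_\mu X_i = 0,\quad
\Delta^\tau u = X_1^2 u + X_2^2 u .
\]
```

The frame enters only through the subriemannian metric, and the first-order term enters only through µ.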

The corresponding nonholonomic heat equation ∂tu = ∆τu is also defined by the

subriemannian metric and a volume form.

For a manifold M with non-empty boundary ∂M and two volume forms $\mu_0, \mu_1$ of equal total volume, the classical Moser theorem establishes the existence of a diffeomorphism φ which is the time-one-map for the flow of a field $V_t$ tangent to ∂M and such that $\phi^*\mu_1 = \mu_0$.

The existence of the required gradient field $V_t = \nabla u$ is guaranteed by the following

Lemma 5.10 Let µ be a volume form on a Riemannian manifold M with boundary ∂M. The Poisson equation ∆u = ρ with Neumann boundary condition $\frac{\partial u}{\partial n} = 0$ on the boundary ∂M is solvable for any function ρ with zero mean: $\int_M \rho\,\mu = 0$.

Here $\frac{\partial}{\partial n}$ is the differentiation in the direction of the outer normal n on the boundary.

Proof Proceed in the same way as in Lemma 5.6 to find all functions $L^2$-orthogonal to the image Im ∆. The first integration by parts gives:

$$0 = \int_M h(\Delta u)\,\mu = -\int_M \langle\nabla h, \nabla u\rangle\,\mu + \int_{\partial M} h\,\frac{\partial u}{\partial n}\,\mu = -\int_M \langle\nabla h, \nabla u\rangle\,\mu,$$

where in the last equality we used the Neumann boundary condition. The second integration by parts gives:

$$0 = \int_M (\Delta h)\,u\,\mu - \int_{\partial M} \frac{\partial h}{\partial n}\,u\,\mu.$$

This equation holds for all smooth functions u on M, so any such function h must be harmonic in M and satisfy the Neumann boundary condition $\frac{\partial h}{\partial n} = 0$. Hence, these are constant functions on M: $(\mathrm{Im}\,\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$ (see [62, p.402] for details). This gives the same description as in the no-boundary case: the image Im ∆ with the Neumann condition consists of functions ρ with zero mean. □

Geometrically, the Neumann boundary condition means that there is no flux of density through the boundary ∂M: $0 = \frac{\partial u}{\partial n} = n \cdot \nabla u = n \cdot V$ on ∂M.
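On the interval M = [0, 1] with µ = dx (a one-dimensional model used here only as an illustration), both the solvability condition and the no-flux interpretation are visible after one integration:

```latex
\[
u'' = \rho \ \text{ on } [0,1], \quad u'(0) = 0
\quad\Longrightarrow\quad
V(x) = u'(x) = \int_0^x \rho(s)\, ds,
\qquad
V(1) = \int_0^1 \rho(s)\, ds .
\]
\[
\text{Hence } u'(1) = 0 \ \text{ holds if and only if } \ \int_0^1 \rho\, dx = 0 .
\]
```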

For distributions on manifolds with boundary, the solution of the Neumann problem

becomes a much more subtle issue, as the behavior of the distribution near the boundary

affects the flux of horizontal fields across the boundary, and hence the solvability in this

problem. However, there is a class of domains in length spaces for which the solvability

of the Neumann problem was established.

Let LS be a length space with the distance function d(x, y), defined as infimum of

lengths of continuous curves joining x, y ∈ LS. Consider domains in this space with the


property that sufficiently close points in those domains can be joined by a not very long

path which does not get too close to the domain boundary. The formal definition is as

follows.

Definition 5.11 An open set Ω ⊆ LS is called an (ε, δ)-domain if there exist δ > 0 and 0 < ε ≤ 1 such that for any pair of points p, q ∈ Ω with d(p, q) ≤ δ there is a continuous rectifiable curve γ : [0, T] → Ω starting at p and ending at q such that the length l(γ) of the curve γ satisfies

$$l(\gamma) \le \frac{1}{\varepsilon}\, d(p, q)$$

and

$$\min\{d(p, z), d(q, z)\} \le \frac{1}{\varepsilon}\, d(z, \partial\Omega)$$

for all points z on the curve γ.
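The two inequalities of the definition are easy to test on a sampled curve. The sketch below is a hedged illustration only: it uses the Euclidean plane and the unit disc as the length space and domain (an assumption, since the text has Carnot groups in mind), checks the second inequality only at the sample points, and the names `check_eps_delta` and `unit_disc_boundary_distance` are hypothetical.

```python
import math

def check_eps_delta(curve, eps, dist_to_boundary):
    """Check the two inequalities of an (eps, delta)-domain along a polyline.

    curve: list of points (x, y) from p to q, all assumed to lie in the domain.
    dist_to_boundary: function z -> d(z, boundary), supplied by the caller.
    Distances are Euclidean here, which is an assumption of this sketch.
    """
    p, q = curve[0], curve[-1]
    length = sum(math.dist(a, b) for a, b in zip(curve, curve[1:]))
    if length > math.dist(p, q) / eps:
        return False                      # the curve is too long
    for z in curve:
        near_end = min(math.dist(p, z), math.dist(q, z))
        if near_end > dist_to_boundary(z) / eps:
            return False                  # too close to the boundary
    return True

def unit_disc_boundary_distance(z):
    """Distance from a point of the open unit disc to the unit circle."""
    return 1.0 - math.hypot(z[0], z[1])
```

A straight chord between two points near the boundary of the disc passes the check for moderate ε, while an arc that hugs the boundary fails the second inequality. This is the behaviour the definition is designed to capture.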

A large source of (ε, δ)-domains is given by some classes of open sets in Carnot groups,

where the Carnot group itself is regarded as a length space with the Carnot-Carathéodory

distance, defined via the lengths of admissible (i.e. horizontal) paths, see e.g. [46]. There

is a natural notion of diameter (or, radius) for domains in length spaces.

Theorem 5.12 [34] Let τ be a bracket-generating distribution on a subriemannian manifold M with smooth boundary ∂M, and let $\mu_0, \mu_1$ be two volume forms on M with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Suppose that the interior of M is an (ε, δ)-domain of positive diameter.

Then there exists a diffeomorphism φ of M which is the time-one-map of the flow $\phi_t$ of a non-autonomous vector field $V_t$ tangent to the distribution τ everywhere on M and to the boundary ∂M for every t ∈ [0, 1], such that $\phi^*\mu_1 = \mu_0$.

The proof immediately follows from the result on solvability of the corresponding Neumann problem $\Delta^\tau u = \rho$ with $n \cdot (P^\tau\nabla u)|_{\partial M} = 0$ (or, which is the same, $\frac{\partial u}{\partial (P^\tau n)}\big|_{\partial M} = 0$) for such domains, established in [47, 46] (cf. Theorem 1.5 in [46]).

Chapter 6

Distributions on Diffeomorphism Groups

6.1 A Fibration on the Group of Diffeomorphisms

Let D be the group of all (orientation-preserving) diffeomorphisms of a manifold M. Its Lie algebra $\mathfrak{X}$ consists of all smooth vector fields on M. The tangent space to the diffeomorphism group at any point φ ∈ D is given by the right translation of the Lie algebra $\mathfrak{X}$ from the identity id ∈ D to φ:

$$T_\phi D = \{X \circ \phi \mid X \in \mathfrak{X}\}.$$

Fix a volume form µ of total volume 1 on the manifold M. Denote by $D_\mu$ the subgroup of volume-preserving diffeomorphisms, i.e. the diffeomorphisms preserving the volume form µ. The corresponding Lie algebra $\mathfrak{X}_\mu$ is the space of all vector fields on the manifold M which are divergence-free with respect to the volume form µ.

Let W be the set of all smooth normalized volume forms on M, which is called the (smooth) Wasserstein space. Consider the projection map πD : D → W defined by the push forward of the fixed volume form µ by the diffeomorphism φ, i.e. πD(φ) = $\phi_*\mu$. The projection πD : D → W defines a natural structure of a principal bundle on D

whose structure group is the subgroup $D_\mu$ of volume-preserving diffeomorphisms of M and whose fibers F are right cosets of this subgroup in D. Two diffeomorphisms φ and $\tilde\phi$ lie in the same fiber if they differ by a composition (on the right) with a volume-preserving diffeomorphism: $\tilde\phi = \phi \circ s$, $s \in D_\mu$.
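The reason is the functoriality of the push forward: composing on the right with a µ-preserving map does not change the image measure. Writing π for the projection πD of this section:

```latex
\[
s \in D_\mu \iff s_*\mu = \mu,
\qquad
\pi(\phi \circ s) = (\phi \circ s)_*\mu = \phi_*(s_*\mu) = \phi_*\mu = \pi(\phi).
\]
```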

On the group D we define two vector bundles Ver and Hor whose spaces at a diffeomorphism φ ∈ D consist of right translated divergence-free fields

$$\mathrm{Ver}_\phi = \{X \circ \phi \mid \mathrm{div}_{\phi_*\mu}X = 0\}$$

and gradient fields

$$\mathrm{Hor}_\phi := \{\nabla f \circ \phi \mid f \in C^\infty(M)\},$$

respectively. Note that the bundle Ver is defined by the fixed volume form µ, while Hor requires a Riemannian metric.¹

Proposition 6.1 The bundle Ver of translated divergence-free fields is the bundle of vertical spaces $T_\phi F$ for the fibration πD : D → W. The bundle Hor over D defines a horizontal distribution for this fibration πD.

Proof Let $\phi_t$ be a curve in a fibre of πD : D → W emanating from the point $\phi_0 = \phi$. Then $\phi_t = \phi_0 \circ s_t$, where $s_0 = \mathrm{id}$ and $s_t$ are volume-preserving diffeomorphisms for each t. Let $X_t$ be a family of divergence-free vector fields such that $\partial_t s_t = X_t \circ s_t$. Then the vector tangent to the curve $\phi_t = \phi_0 \circ s_t$ is given by $\frac{d}{dt}\big|_{t=0}(\phi_0 \circ s_t) = (\phi_{0*}X_0) \circ \phi_0$. Since $X_0$ is divergence-free with respect to µ, $\phi_{0*}X_0$ is divergence-free with respect to $\phi_*\mu$. Hence, any vector tangent to the fibre at φ is given by X ∘ φ, where X is a divergence-free field with respect to the form $\phi_*\mu$.

By the Hodge decomposition of vector fields, we have the direct sum TD = Hor ⊕ Ver. □

¹ The metric on M does not need to have the volume form µ. In the general case, $\mathfrak{X}_\mu$ consists of vector fields divergence-free with respect to µ, while the gradients are considered for the chosen metric on M.


Remark 6.2 The classical Moser theorem 5.2 can be thought of as the existence of a path-lifting property for the principal bundle πD : D → W: any deformation of volume forms can be traced by the corresponding flow, i.e. a path on the diffeomorphism group which projects to the deformation of forms. Its proof shows that this path-lifting property holds and has the uniqueness property in the presence of the horizontal distribution defined above by using the Hodge decomposition. Namely, given any path µt starting at µ0 in the smooth Wasserstein space W and a point φ0 in the fibre (πD)⁻¹(µ0), there exists a unique path φt in the diffeomorphism group which is tangent to the horizontal bundle Hor, starts at φ0, and projects to µt, see Figure 6.1.

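Figure 6.1: The Moser theorem in both the classical and nonholonomic settings is a path-lifting property in the diffeomorphism group. (The figure shows the fibre Dµ over µ = µ0, the path µt from µ0 to µ1 in the base W, and its horizontal lift φt in D from φ0 = id to φ1, with velocity ∇f in Horφt, under the projection π.)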


6.2 A Nonholonomic Distribution on the Diffeomorphism Group

Let τ be a bracket-generating distribution on the manifold M. Consider the right-invariant distribution T on the diffeomorphism group D defined at the identity id ∈ D of the group by the subspace in X of all those vector fields which are tangent to the distribution τ everywhere on M:

\[ T_\phi = \{ V \circ \phi \mid V(x) \in \tau_x \text{ for all } x \in M \}. \]

Proposition 6.3 The infinite-dimensional distribution T is a non-integrable distribution in D. Horizontal paths in this distribution are flows of non-autonomous vector fields tangent to the distribution τ on the manifold M.

Proof To see that the distribution T is non-integrable, we consider two horizontal vector fields V and W on M and the corresponding right-invariant vector fields Ṽ and W̃ on D. Their bracket at the identity of the group is (minus) the commutator of the vector fields V and W on M. This commutator does not belong to the plane Tid, since the distribution τ is non-integrable and, at least somewhere on M, the commutator of horizontal fields V and W is not horizontal.

The second statement immediately follows from the definition of T. □

Remark 6.4 Consider now the projection map πD : D → W in the presence of the

distribution T on D. The path lifting property in this case is a restatement of the

nonholonomic Moser theorem. Namely, for a curve {µt | µ0 = µ} in the space W of smooth densities, Theorem 5.1 proves that there is a curve {gt | g0 = id} in D, everywhere tangent to the distribution T and projecting to µt: πD(gt) = µt.


Recall that in the classical case the corresponding path lifting becomes unique once

we fix the gradient horizontal bundle Horφ ⊂ TφD for any diffeomorphism φ ∈ D. Similarly, in the nonholonomic case we consider the spaces of gradient projections instead of the gradient spaces: Hor^τ_id := {P^τ∇f | f ∈ C∞(M)}, where P^τ stands for the orthogonal projection onto the distribution τ in a given Riemannian metric on M. The right-translated gradient projections Hor^τ_φ := {(P^τ∇f) ∘ φ | f ∈ C∞(M)} define a horizontal bundle for the principal bundle D → W by the nonholonomic Hodge decomposition.

(Note also that in both classical and nonholonomic cases, the obtained horizontal distri-

butions on D are nonintegrable, cf. [52]. Indeed, the Lie bracket of two gradient fields

is not necessarily a gradient field, and similarly for gradient projections. Hence there

are no horizontal sections of the bundle D → W , tangent to these horizontal gradient

distributions.)

As we will see in Chapters 7 and 9, both gradient fields ∇f in the classical case and gradient projections P^τ∇f in the nonholonomic case allow one to move the densities in the “fastest way”, and are important in transport problems of finding an optimal (“shortest”) path between densities.

6.3 Accessibility of Diffeomorphisms and Consequences

A stronger statement was recently proven in [2], after my paper [34] was distributed in the archive:

Theorem 6.5 Every orientation preserving diffeomorphism in the diffeomorphism group

D can be accessed by a horizontal path tangent to the distribution T from the identity

diffeomorphism.

This theorem can be thought of as an analog of the Chow-Rashevsky theorem in the

infinite-dimensional setting of the group of diffeomorphisms, provided that the distribu-


tion T is bracket-generating on D. Note, however, that the Chow-Rashevsky theorem is

unknown in the general setting of an infinite-dimensional manifold, while there are only

“approximate” analogs of it, e.g. on a Hilbert manifold.

This theorem, together with the original Moser theorem, implies the nonholonomic Moser theorem 5.1 on volume forms. Moreover, it also implies the following nonholonomic version of the Moser theorem on symplectic structures from [43].

Corollary 6.6 Suppose that on a manifold M two symplectic structures ω0 and ω1 from

the same cohomology class can be connected by a path of symplectic structures in the same

class. Then for a bracket-generating distribution τ on M there exists a diffeomorphism

φ of M which is the time-one-map of a non-autonomous vector field Vt tangent to the

distribution τ everywhere on M and for every t ∈ [0, 1], such that φ∗ω1 = ω0.

This corollary follows from Theorem 6.5: one takes the diffeomorphism provided by the classical Moser theorem and realizes it by a horizontal path (tangent to the distribution T) on the diffeomorphism group, which exists according to that theorem.

Chapter 7

The Riemannian Geometry of Diffeomorphism Groups and Mass Transport

The differential geometry of diffeomorphism groups is closely related to the theory of

optimal mass transport, and in particular, to the problem of moving one density to another while minimizing a certain cost on a Riemannian manifold. In this chapter, we review the corresponding metric properties of the diffeomorphism group and the space of volume forms.

Let M be a compact Riemannian manifold without boundary (or, more generally, a

complete metric space) with a distance function d. Let µ and ν be two Borel probability

measures on the manifold M which are absolutely continuous with respect to the Lebesgue

measure. Consider the following optimal mass transport problem: Find a Borel map

φ : M → M that pushes the measure µ forward to ν and attains the infimum of the L²-cost functional ∫_M d²(x, φ(x)) µ among all such maps.

The set of all Borel probability measures is called the Wasserstein space. The minimal


cost of transport defines a metric d on this space:

\[ d^2(\mu, \nu) := \inf_{\phi} \Big\{ \int_M d^2(x, \phi(x)) \, \mu \;\Big|\; \phi_*\mu = \nu \Big\} . \tag{7.1} \]

This mass transport problem admits a unique solution φ (defined up to measure zero

sets), called an optimal map (see [16] for M = Rn and [42] for any compact connected

Riemannian manifold M without boundary). Furthermore, there exists a 1-parameter

family of Borel maps φt starting at the identity map φ0 = id, ending at the optimal map

φ1 = φ and such that φt is the optimal map pushing µ forward to νt := φt∗µ for any

t ∈ (0, 1). The corresponding 1-parameter family of measures νt describes a geodesic in

the Wasserstein space of measures with respect to the distance function d and is called

the displacement interpolation between µ and ν, see [63] for details. (More generally,

in mass transport problems one can replace d2 in the above formula by a cost function

c : M×M → R, while we mostly focus on the case c = d2 and its subriemannian analog.)
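As a toy illustration of the infimum in (7.1), one can replace the absolutely continuous measures by two finite sets of equally weighted points on the line. This discrete setting is an assumption made only for the example and is not the one treated in the text; the function names are hypothetical. A map pushing one measure to the other is then a permutation, and for the quadratic cost the monotone (sorted) matching attains the minimum.

```python
from itertools import permutations

def transport_cost(xs, ys, perm):
    """Average squared distance when xs[i] is sent to ys[perm[i]]."""
    n = len(xs)
    return sum((xs[i] - ys[perm[i]]) ** 2 for i in range(n)) / n

def brute_force_cost(xs, ys):
    """Minimum of the cost over all bijections (feasible only for tiny n)."""
    return min(transport_cost(xs, ys, p) for p in permutations(range(len(ys))))

def monotone_cost(xs, ys):
    """Cost of the order-preserving matching."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

xs = [0.9, 0.1, 0.4, 0.7, 0.3]
ys = [1.5, 0.2, 2.0, 0.8, 1.1]
best = brute_force_cost(xs, ys)
mono = monotone_cost(xs, ys)
```

The two values coincide, which is the one-dimensional discrete counterpart of the statement that the optimal map is the gradient of a convex function.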

In what follows, we consider a smooth version of the Wasserstein space, cf. Section

6.1. Recall that the smooth Wasserstein space W consists of smooth volume forms with

the total integral equal to 1. One can consider an infinite-dimensional manifold structure

on the smooth Wasserstein space, a (weak) Riemannian metric 〈 , 〉W , corresponding to

the distance function d, and geodesics on this space. Similar to the finite-dimensional

case, geodesics on the smooth Wasserstein space W can be formally defined as projections

of trajectories of the Hamiltonian vector field with the “kinetic energy” Hamiltonian in

the tangent bundle TW .

For a Riemannian manifold M both spaces D and W can be equipped with (weak)

Riemannian structures, i.e. can be formally regarded as infinite-dimensional Riemannian

manifolds, cf. [21]. (One can consider Hs-diffeomorphisms and Hs−1-forms of Sobolev

class s > n/2 + 1. Both sets can be considered as smooth Hilbert manifolds. However,

this is not applicable in the subriemannian case, discussed later, hence we confine ourselves to the C∞ setting, applicable in both cases.)


From now on we fix a Riemannian metric 〈, 〉M on the manifold M , whose Riemannian

volume is the form µ. On the diffeomorphism group we define a Riemannian metric 〈, 〉D

whose value at a point φ ∈ D is given by

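\[ \langle X_1 \circ \phi, X_2 \circ \phi \rangle_{D} := \int_M \langle X_1 \circ \phi(x), X_2 \circ \phi(x) \rangle_{M,\,\phi(x)} \, \mu . \tag{7.2} \]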

The action along a curve (or, “energy” of a curve) {φt | t ∈ [0, 1]} ⊂ D in this metric is defined in the following straightforward way:

\[ E(\{\phi_t\}) = \int_0^1 dt \int_M \langle \partial_t \phi_t, \partial_t \phi_t \rangle_M \, \mu . \]

If M is flat, D is locally isometric to the (pre-)Hilbert L2-space of (smooth) vector-

functions φ, see e.g. [57]. The following proposition is well-known.

Proposition 7.1 Let φt be a geodesic on the diffeomorphism group D with respect to the above Riemannian metric 〈, 〉D, and let Vt be the (time-dependent) velocity field of the corresponding flow: ∂tφt = Vt ∘ φt. Then the velocity Vt satisfies the inviscid Burgers equation on M:

\[ \partial_t V_t + \nabla_{V_t} V_t = 0 , \]

where ∇_{Vt}Vt stands for the covariant derivative of the field Vt on M along itself.

Proof In the flat case the geodesic equation is ∂t²φt = 0: this is the Euler-Lagrange equation for the action functional E. Differentiate ∂tφt = Vt ∘ φt with respect to time t and use this geodesic equation to obtain

\[ (\partial_t V_t) \circ \phi_t + (\nabla V_t \circ \phi_t)\, \partial_t \phi_t = 0. \tag{7.3} \]

After another substitution ∂tφt = Vt ∘ φt, the latter becomes

\[ (\partial_t V_t + \nabla_{V_t} V_t) \circ \phi_t = 0, \]

which is equivalent to the Burgers equation.

The non-flat case involves differentiation in the Levi-Civita connection on M and leads to the same Burgers equation, see details in [21, 35]. □


Remark 7.2 Smooth solutions of the Burgers equation correspond to non-interacting

particles on the manifold M flying along those geodesics on M which are defined by the

initial velocities V0(x). The Burgers flows have the form φt(x) = expM(tV0(x)), where

expM : TM → M is the Riemannian exponential map on M .

Proposition 7.3 [52] The bundle projection πD : D →W is a Riemannian submersion

of the metric 〈 , 〉D on the diffeomorphism group D to the Riemannian metric 〈 , 〉W on

the smooth Wasserstein space W for the L2-cost. The horizontal (i.e. normal to fibers)

spaces in the bundle D →W are right-translated gradient fields.

Recall that for two Riemannian manifolds Q and B, a Riemannian submersion π :

Q → B is a mapping onto B which has maximal rank and preserves lengths of horizontal

tangent vectors to Q, see e.g. [51]. For a bundle Q → B, this means that there is

a distribution of horizontal spaces on Q, orthogonal to the fibers, which is projected

isometrically to the tangent spaces to B. One of the main properties of a Riemannian

submersion gives the following feature of geodesics:

Corollary 7.4 Any geodesic, initially tangent to a horizontal space on the full diffeo-

morphism group D, always remains horizontal, i.e. tangent to the horizontal distribu-

tion. There is a one-to-one correspondence between geodesics on the base W starting at

the measure µ and horizontal geodesics in D starting at the identity diffeomorphism id.

Remark 7.5 In the PDE terms, the horizontality of a geodesic means that a solution

of the Burgers equation with a potential initial condition remains potential forever. This

also follows from the Hamiltonian formalism and the moment map geometry discussed

in the next section. Since horizontal geodesics in the group D correspond to geodesics

on the density space W , potential solutions of the Burgers equation (corresponding to

horizontal geodesics) move the densities in the fastest way. The corresponding time-one-

maps for Burgers potential solutions provide optimal maps for moving the density µ to

any other density ν, see [16, 42].


The Burgers potential solutions have the form φt(x) = expM(−t∇f(x)) as long as

the right-hand-side is smooth. The time-one-map φ1 for the flow φt provides an optimal

map between probability measures if the function f is a (d²)-concave function. We recall that the notion of c-concavity for a cost function c on M is defined as follows. For a function f its c1-transform is f^{c1}(y) = inf_{x∈M}(c(x, y) − f(x)), its c2-transform is f^{c2}(x) = inf_{y∈M}(c(x, y) − f(y)), and the function f is said to be c-concave if f^{c1c2} = f. Here, we consider the case c = d². The family of maps φt defines the displacement interpolation mentioned at the beginning of this chapter.
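The two transforms are easy to experiment with on a finite set of points. The sketch below assumes that M is replaced by 51 sample points of the interval [0, 1] with c = d² (a discretization introduced only for illustration; all names are hypothetical). It checks two standard facts: the double transform f^{c1c2} dominates f, and it is itself c-concave, i.e. a further double transform leaves it unchanged.

```python
import numpy as np

def c1_transform(f, cost):
    """f is indexed by x; returns y -> inf_x (c(x, y) - f(x))."""
    return np.min(cost - f[:, None], axis=0)

def c2_transform(g, cost):
    """g is indexed by y; returns x -> inf_y (c(x, y) - g(y))."""
    return np.min(cost - g[None, :], axis=1)

pts = np.linspace(0.0, 1.0, 51)
cost = (pts[:, None] - pts[None, :]) ** 2     # c(x, y) = d^2(x, y); rows x, columns y

f = np.sin(7 * pts) + 3 * pts**2              # an arbitrary function, not c-concave in general
f_cc = c2_transform(c1_transform(f, cost), cost)         # f^{c1 c2}
f_cccc = c2_transform(c1_transform(f_cc, cost), cost)    # transform applied once more
```

Comparing `f_cc` with `f` pointwise shows where the original function fails to be c-concave.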

Let θ and ν be volume forms with the same total volume and let g and h be functions on the manifold M defined by θ = g vol and ν = h vol, where vol is the Riemannian volume form. Then a diffeomorphism φ moving one density to the other (φ∗θ = ν) satisfies h(φ(x)) det(Dφ(x)) = g(x), where Dφ is the Jacobi matrix of the diffeomorphism φ. In the flat case the optimal map φ is gradient, φ = ∇f, and the corresponding convex potential f satisfies the Monge-Ampere equation

\[ \det(\operatorname{Hess} f(x)) = \frac{g(x)}{h(\nabla f(x))}, \]

since D(∇f) = Hess f. In the non-flat case, the optimal map is φ(x) = expM(−∇f(x)) for a (d²/2)-concave potential f, and the equation is Monge-Ampere-like, see [42, 63] for details. Below we describe the corresponding nonholonomic analogs of these objects.
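A one-dimensional example makes the Monge-Ampere equation concrete. The sketch assumes M = R with g the standard Gaussian density and h a Gaussian density with mean m and standard deviation s (densities chosen here for illustration, with hypothetical names). The increasing map φ(x) = m + s x pushes one to the other and is the derivative of the convex potential f(x) = m x + s x²/2, so that Hess f = s.

```python
import numpy as np

def gaussian(y, mean, std):
    """Density of the normal distribution N(mean, std^2)."""
    return np.exp(-(y - mean) ** 2 / (2 * std**2)) / (std * np.sqrt(2 * np.pi))

m, s = 1.5, 2.0                     # hypothetical target mean and standard deviation
x = np.linspace(-3.0, 3.0, 61)

g = gaussian(x, 0.0, 1.0)           # source density
grad_f = m + s * x                  # optimal map: gradient of f(x) = m x + s x^2 / 2
hess_f = np.full_like(x, s)         # second derivative of f

rhs = g / gaussian(grad_f, m, s)    # g(x) / h(grad f(x))
```

Both sides of the equation are constant and equal to s in this case.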

Chapter 8

The Hamiltonian Mechanics on Diffeomorphism Groups

In this chapter we present a Hamiltonian framework for the Otto calculus and, in particular, give a symplectic proof of Proposition 7.3 and Corollary 7.4 on the submersion properties along with their generalizations.

8.1 Averaged Hamiltonians

We fix a Riemannian metric 〈, 〉M on the manifold M and consider the corresponding

Riemannian metric 〈, 〉D on the diffeomorphism group D. This defines a map (X ∘ φ) ↦ 〈X ∘ φ, ·〉D from the tangent bundle TD to the cotangent bundle T∗D. By using this map,

one can pull back the canonical symplectic form ωT ∗D from the cotangent bundle T ∗D to

the tangent bundle TD, and regard the latter as a manifold equipped with the symplectic

form ωTD.1 Similarly, a symplectic structure ωTM can be defined on the tangent bundle

TM by pulling back the canonical symplectic form on the cotangent bundle T ∗M via the

Riemannian metric 〈, 〉M . The two symplectic forms are related as follows. A tangent

1 The consideration of the tangent bundle TD (instead of T∗D) as a symplectic manifold allows one to avoid dealing with duals of infinite-dimensional spaces here.


vector V in the tangent space T_{X∘φ}TD at the point X ∘ φ ∈ TD is a map from M to T(TM) = T²M such that π_{T²M} ∘ V = X ∘ φ, where π_{T²M} : T(TM) → TM is the tangent bundle projection. Let V1 and V2 be two tangent vectors in T_{X∘φ}TD at the point X ∘ φ, then the symplectic forms are related in the following way:

\[ \omega_{TD}(V_1, V_2) = \int_M \omega_{TM}(V_1(x), V_2(x)) \, \mu(x) , \]

where ωTM is understood as the pairing on T(TM) = T²M.

Definition 8.1 Let HM be a Hamiltonian function on the tangent bundle TM of the

manifold M . The averaged Hamiltonian function is the function HD on the tangent

bundle TD of the diffeomorphism group D obtained by averaging the corresponding

Hamiltonian HM over M in the following way: its value at a point X ∘ φ ∈ TφD is

\[ H_D(X \circ \phi) := \int_M H_M(X \circ \phi(x)) \, \mu(x) \tag{8.1} \]

for a vector field X ∈ X and a diffeomorphism φ ∈ D.

Consider the Hamiltonian flows for these Hamiltonian functions HM and HD on

the tangent bundles TM and TD, respectively, with respect to the standard symplectic

structures on the bundles. The following theorem can be viewed as a generalization of

Propositions 7.1 and 7.3.

Theorem 8.2 Each Hamiltonian trajectory for the averaged Hamiltonian function HD

on TD describes a flow on the tangent bundle TM , in which every tangent vector to M

moves along its own HM -Hamiltonian trajectory in TM .

Example 8.3 For the Hamiltonian KM(p, q) = (1/2)〈p, p〉M given by the “kinetic energy”

for the metric on M , the above theorem implies that any geodesic on D is a family of

diffeomorphisms of M , in which each particle moves along its own geodesic on M with

constant velocity, i.e. its velocity field is a solution to the Burgers equation, cf. Remark

7.2.
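For the reader's convenience, the flat case of this example can be worked out explicitly. The following is a sketch under the assumption M = Rⁿ with the Euclidean metric, with q ∈ M denoting the position, p the velocity, and the starting point of the trajectory taken to be the identity diffeomorphism with initial velocity field V (these conventions are ours).

```latex
\begin{aligned}
K_M(p, q) &= \tfrac12 \langle p, p \rangle, \\
\dot q &= \partial_p K_M = p, \qquad \dot p = -\partial_q K_M = 0, \\
\Psi^{K_M}_t(q, p) &= (q + t p,\; p), \\
\Psi^{K_D}_t(V)(x) &= \Psi^{K_M}_t(x, V(x)) = (x + t V(x),\; V(x)), \\
\phi_t(x) &= x + t V(x), \qquad V_t(\phi_t(x)) = V(x).
\end{aligned}
```

Differentiating the last identity in t gives (∂tVt + ∇_{Vt}Vt) ∘ φt = 0, i.e. the Burgers equation of Proposition 7.1.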


Below we discuss this theorem and its geometric meaning in detail. In particular, in

the above form, the statement is also applicable to the case of nonholonomic distributions

(i.e. subriemannian, or Carnot-Caratheodory spaces) discussed in the next section.

8.2 Riemannian Submersion and Symplectic Quotients

We start with a Hamiltonian proof of Proposition 7.3 on the Riemannian submersion

D → W of diffeomorphisms onto densities. Recall the following general construction in

symplectic geometry. Let π : Q → B be a principal bundle with the structure group G.

Lemma 8.4 (see e.g. [12]) The symplectic reduction of the cotangent bundle T ∗Q over

the G-action gives the cotangent bundle T ∗B = T ∗Q//G.

Proof The moment map J : T∗Q → g∗ associated with this action takes T∗Q to the dual of the Lie algebra g = Lie(G). For the G-action on T∗Q the moment map J is the projection of any cotangent space T∗_a Q to the cotangent space T∗_a F ≈ g∗ for the fiber F through a point a ∈ Q. The preimage J⁻¹(0) of the zero value is the subbundle of T∗Q consisting of covectors vanishing on fibers. Such covectors are naturally identified with covectors on the base B. Thus, factoring out the G-action, which moves the point a over the fiber F, we obtain the bundle T∗B. □

Suppose also that Q is equipped with a G-invariant Riemannian metric 〈, 〉Q.

Lemma 8.5 The Riemannian submersion of (Q, 〈, 〉Q) to the base B with the induced

metric 〈, 〉B is the result of the symplectic reduction.

Proof Indeed, the metric 〈, 〉Q gives a natural identification T ∗Q ≈ TQ of the tan-

gent and cotangent bundles for Q, and the “projected metric” is equivalent to a similar

identification for the base manifold B.


In the presence of a metric on Q, the preimage J⁻¹(0) is identified with all vectors in TQ orthogonal to fibers, that is, J⁻¹(0) is the horizontal subbundle in TQ. Hence, the symplectic quotient J⁻¹(0)/G can be identified with the tangent bundle TB. □

Proof of Proposition 7.3

Now we apply this “dictionary” to the diffeomorphism group D and the Wasserstein

space W . Consider the projection map πD : D → W as a principal bundle with the

structure group Dµ of volume-preserving diffeomorphisms of M . Recall that the vertical

space of this principal bundle at a point φ ∈ D consists of right-translations by the diffeomorphism φ of vector fields which are divergence-free with respect to the volume form φ∗µ: Verφ = {X ∘ φ | div_{φ∗µ}X = 0}, and the horizontal space is given by translated gradient fields: Horφ = {∇f ∘ φ | f ∈ C∞(M)}.

For each volume-preserving diffeomorphism ψ ∈ Dµ, the Dµ-action Rψ of ψ by right translations on the diffeomorphism group is given by

\[ R_\psi(\phi) = \phi \circ \psi. \]

The induced action TRψ : TD → TD on the tangent spaces of the diffeomorphism group is given by

\[ TR_\psi(X \circ \phi) = (X \circ \phi) \circ \psi. \]

One can see that for volume-preserving diffeomorphisms ψ this action preserves the Riemannian metric (7.2) on the diffeomorphism group D (it is the change of variable formula), while for a general diffeomorphism one has an extra factor Dψ, the Jacobian of ψ, in the integral. □
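The invariance statement at the end of the proof can be checked numerically. The sketch assumes that M is the circle R/Z with µ = dx, and the field X, the diffeomorphism φ and the two maps ψ are chosen by us purely for illustration. It compares the squared norm (7.2) of X ∘ φ with that of (X ∘ φ) ∘ ψ for a rotation (volume-preserving) and for a diffeomorphism that is not volume-preserving.

```python
import numpy as np

N = 2000
x = np.arange(N) / N                      # uniform grid on the circle R/Z, mu = dx

def X(y):
    """A sample vector field on the circle."""
    return np.cos(2 * np.pi * y)

def phi(y):
    """A sample diffeomorphism (here simply a quarter-turn rotation)."""
    return (y + 0.25) % 1.0

def squared_norm(points):
    """Riemann sum for  int |X(phi(psi(x)))|^2 dx,  with points = psi(x)."""
    return np.mean(X(phi(points)) ** 2)

rotation = (x + 300 / N) % 1.0                        # psi preserves dx
stretch = (x + 0.1 * np.sin(2 * np.pi * x)) % 1.0     # psi' = 1 + 0.2 pi cos(2 pi x) > 0, not constant

base = squared_norm(x)                    # psi = id
after_rotation = squared_norm(rotation)
after_stretch = squared_norm(stretch)
```

The rotation leaves the value unchanged, while the second map changes it; the discrepancy is exactly the effect of the extra Jacobian factor mentioned above.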

Remark 8.6 The explicit formula of the moment map J : TQ → X∗µ for the group of volume-preserving diffeomorphisms G = Dµ acting on Q = D is

\[ J(X \circ \phi)(Y) = \int_M \langle X, \phi_* Y \rangle_M \, \phi_*\mu , \]

where Y ∈ Xµ is any vector field on M divergence-free with respect to the volume form µ, X ∈ X, and φ ∈ D.

8.3 Hamiltonian Flows on the Diffeomorphism Groups

Let HQ : TQ → R be a Hamiltonian function invariant under the G-action on the tangent bundle of the total space Q. The restriction of the function HQ to the horizontal

bundle J−1(0) ⊂ TQ is also G-invariant, and hence descends to a function HB : TB → R

on the symplectic quotient, the tangent bundle of the base B. Symplectic quotients

admit the following reduction of Hamiltonian dynamics:

Proposition 8.7 [12] The Hamiltonian flow of the function HQ preserves the preimage

J−1(0), i.e. trajectories with horizontal initial conditions stay horizontal. Furthermore,

the Hamiltonian flow of the function HQ on the tangent bundle TQ of the total space Q

descends to the Hamiltonian flow of the function HB on the tangent bundle TB of the

base.

Now we are going to apply this scheme to the bundle D → W. For a fixed Hamiltonian function HM on the tangent bundle TM of the manifold M, consider the corresponding averaged Hamiltonian function HD on TD, given by the formula (8.1): HD(X ∘ φ) := ∫_M HM(X ∘ φ(x)) µ. The latter Hamiltonian is Dµ-invariant (as also follows from the change of variable formula) and it will play the role of the function HQ. Thus the flow for the averaged Hamiltonian HD descends to the flow of a certain Hamiltonian HW on TW.

Let us describe explicitly the corresponding flows on the tangent bundles of D and W. Let Ψ^{HM}_t : TM → TM be the Hamiltonian flow of the Hamiltonian HM on the tangent bundle of the manifold M, and let Ψ^{HD}_t : TD → TD denote the flow for the Hamiltonian function HD on the tangent bundle of the diffeomorphism group.


Theorem 8.8 (=8.2′) The Hamiltonian flows of the Hamiltonians HD and HM are related by

\[ \Psi^{H_D}_t (X \circ \phi)(x) = \Psi^{H_M}_t (X(\phi(x))) , \]

where, on the right-hand side, the flow Ψ^{HM}_t on TM transports the shifted field X(φ(x)), while, on the left-hand side, X ∘ φ is regarded as a tangent vector to D at the point φ.

Proof We prove this infinitesimally (cf. [21]). Let X_{HD} and X_{HM} be the Hamiltonian vector fields corresponding to the Hamiltonians HD and HM respectively. We claim that X_{HD}(X ∘ φ) = X_{HM} ∘ X ∘ φ. Indeed, by the definition of Hamiltonian fields, we have

\[ \omega_{TD}(X_{H_M} \circ X \circ \phi, Y) = \int_M \omega_{TM}\big(X_{H_M}(X(\phi(x))), Y(x)\big) \, \mu = \int_M dH_M\big|_{X(\phi(x))}(Y(x)) \, \mu(x) \]

for any Y ∈ TφD. By interchanging the integration and exterior differentiation, the latter expression becomes dHD|_{X∘φ}(Y). The result follows from the uniqueness of the Hamiltonian vector field which, in turn, is a consequence of the weak nondegeneracy of the symplectic form ωTD (cf. [21]). □

Remark 8.9 This theorem has a simple geometric meaning for the “kinetic energy” Hamiltonian function KM(v) := (1/2)〈v, v〉M on the tangent bundle TM. One of the possible definitions of geodesics in M is that they are projections to M of trajectories of the Hamiltonian flow on TM, whose Hamiltonian function is the kinetic energy. In other words, the Riemannian exponential map expM on the manifold M is the projection of the Hamiltonian flow Ψ^{KM}_t on TM. Similarly, the Riemannian exponential expD of the diffeomorphism group D is the projection of the Hamiltonian flow for the Hamiltonian KD(X ∘ φ) := (1/2) ∫_M 〈X ∘ φ, X ∘ φ〉M µ on TD.

Recall that the geodesics on the diffeomorphism group (described by the Burgers

equation, see Proposition 7.1) starting at the identity with the initial velocity V ∈ TidD are the flows which move each particle x on the manifold M along the geodesic with

the direction V (x). Such a geodesic is well defined on the diffeomorphism group D as


long as the particles do not collide. The corresponding Hamiltonian flow on the tangent

bundle TD of the diffeomorphism group describes how the corresponding velocities of

these particles vary (cf. Example 8.3).

For a more general Hamiltonian HM on the tangent bundle TM, each particle x ∈ M with an initial velocity V(x) will be moving along the corresponding characteristic, which is the projection to M of the corresponding trajectory Ψ^{HM}_t(V(x)) in the tangent bundle TM.

Now we would like to describe more explicitly horizontal geodesics and characteristics

on the diffeomorphism group D. Recall that Ψ^{HD}_t denotes the Hamiltonian flow of the

averaged Hamiltonian HD on the tangent bundle TD of the diffeomorphism group D.

If this Hamiltonian flow is gradient at the initial moment, it always stays gradient, as

implied by Corollary 7.4. Furthermore, the corresponding potential can be described as

follows.

Corollary 8.10 Let f be a function on the manifold M. Then the Hamiltonian flow for HD with the initial condition ∇f ∘ φ ∈ TφD has the form ∇ft ∘ φt, where φt ∈ D is a family of diffeomorphisms and ft is the family of functions on M starting at f0 = f and satisfying the Hamilton-Jacobi equation

\[ \partial_t f_t + H_M(\nabla f_t(x)) = 0 . \tag{8.2} \]

Proof This follows from the method of characteristics, which gives the following way of finding ft, the solution to the Hamilton-Jacobi equation (8.2). Consider the tangent vector ∇f(x) for each point x ∈ M. Denote by Ψ^{HM}_t : TM → TM the Hamiltonian flow for the Hamiltonian HM : TM → R and consider its trajectory t ↦ Ψ^{HM}_t(∇f(x)) starting at the tangent vector ∇f(x). Then project this trajectory to M using the tangent bundle projection πTM : TM → M to obtain a curve in M. It is given by the formula t ↦ πTM(Ψ^{HM}_t(∇f(x))). As x varies over the manifold M, this defines a flow φt := πTM ∘ Ψ^{HM}_t ∘ ∇f on M. (Note that this procedure defines a flow for small time t, while for larger times the map φt may cease to be a diffeomorphism, i.e. shock waves can appear.) The corresponding time-dependent vector field is gradient and defines the family ∇ft, the gradient of the solution to the Hamilton-Jacobi equation above, see Figure 8.1. □
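The construction in the proof can be tested on an explicitly solvable case. The sketch assumes M = R with the kinetic energy Hamiltonian HM(v) = v²/2 and the quadratic initial potential f(x) = a x²/2 with a > 0 (choices made here for illustration; the names are hypothetical). The characteristics are φt(x) = x + t∇f(x) = (1 + at)x, and the candidate solution is ft(x) = a x² / (2(1 + at)).

```python
import numpy as np

a = 0.8                                   # f(x) = a x^2 / 2, a hypothetical choice

def f_t(t, x):
    """Candidate solution of the Hamilton-Jacobi equation (8.2)."""
    return a * x**2 / (2 * (1 + a * t))

def grad_f_t(t, x):
    return a * x / (1 + a * t)

def phi_t(t, x):
    """Projection of the flow: each point moves with its initial velocity f'(x) = a x."""
    return x + t * a * x

x = np.linspace(-1.0, 1.0, 21)
t, h = 0.5, 1e-5

# Residual of  d/dt f_t + (grad f_t)^2 / 2 = 0  with a central difference in t
dt_f = (f_t(t + h, x) - f_t(t - h, x)) / (2 * h)
residual = dt_f + 0.5 * grad_f_t(t, x) ** 2

# The velocity is carried along characteristics: grad f_t(phi_t(x)) = grad f(x)
transported = grad_f_t(t, phi_t(t, x))
```

For a < 0 the same formulas break down at t = −1/a, where φt ceases to be a diffeomorphism; this is the shock formation mentioned in the proof.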

Figure 8.1: Hamiltonian flow of the Hamiltonian HM and its projection: the curve φt(x) is the projection of the curve Ψ^{HM}_t(∇f(x)) = ∇ft(φt(x)) to the manifold M.

Remark 8.11 The above corollary manifests that the Hamilton-Jacobi equation (8.2)

can be solved using the method of characteristics due to the built-in symmetry group of

all volume preserving diffeomorphisms.

8.4 Hamiltonian Flows on the Wasserstein Space

What is the corresponding flow on the tangent bundle TW of the Wasserstein space,

induced by the Hamiltonian flow on TD for the diffeomorphism group D after the pro-

jection πD : D → W? Fix a Hamiltonian HM on the tangent bundle TM which defines


the averaged Hamiltonian function HD on the tangent bundle TD, see Equation (8.1).

Now we describe explicitly the induced Hamiltonian HW on the tangent bundle TW .

Let (ν, η) be a tangent vector at a density ν on M, regarded as a point of the Wasserstein space W. The normalization of densities (∫_M ν = 1 for all ν ∈ W) gives the constraint for tangent vectors: ∫_M η = 0. Let f : M → R be a function that satisfies (−div_ν ∇f) ν = η. (Given (ν, η), such a function is defined uniquely up to an additive constant.) Then the induced Hamiltonian on the tangent bundle TW of the base W is given by

\[ H_W(\nu, \eta) = \int_M H_M(\nabla f(x)) \, \nu , \tag{8.3} \]

since ∇f is a vector of the horizontal distribution in TD.
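Formula (8.3) can be evaluated numerically in one dimension. The sketch below assumes M is the circle R/Z, ν = ρ dx and HM(v) = v²/2 (all choices and names are ours, for illustration only). The equation (−div_ν ∇f)ν = η then reads −(ρ f′)′ = η, which is integrated once, and the integration constant is fixed by requiring f to be periodic.

```python
import numpy as np

def induced_kinetic_hamiltonian(rho, eta, dx):
    """H_W(nu, eta) for H_M(v) = v^2 / 2 on the circle, with nu = rho dx.

    Solves -(rho f')' = eta for f' and returns 0.5 * int rho (f')^2 dx.
    """
    flux = -np.cumsum(eta) * dx                          # rho f' up to an additive constant
    const = -np.sum(flux / rho) / np.sum(1.0 / rho)      # makes f' integrate to zero
    df = (flux + const) / rho
    return 0.5 * np.sum(rho * df**2) * dx

N = 4000
x = (np.arange(N) + 0.5) / N              # cell midpoints
rho = np.ones(N)                          # uniform density
eta = np.cos(2 * np.pi * x)               # tangent vector with zero total integral
value = induced_kinetic_hamiltonian(rho, eta, 1.0 / N)

# For rho = 1 one finds f'(x) = -sin(2 pi x) / (2 pi), hence H_W = 1 / (16 pi^2)
exact = 1.0 / (16 * np.pi**2)
```

For a non-uniform ρ the same routine applies unchanged, although no closed form is claimed here.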

Now, the flow Ψ^{HW}_t of the corresponding Hamiltonian field on TW can be found explicitly by employing Proposition 8.7. Consider the flow φt := πTM ∘ Ψ^{HM}_t ∘ ∇f defined on M for small t in Corollary 8.10.

Theorem 8.12 The Hamiltonian flow Ψ^{HW}_t of the Hamiltonian function HW on the tangent bundle TW of the Wasserstein space W is

\[ \Psi^{H_W}_t(\nu, \eta) = (\nu_t, -L_{\nabla f_t} \nu_t) , \]

where L is the Lie derivative, the family of functions ft satisfies the Hamilton-Jacobi equation (8.2) for the Hamiltonian function HM on the tangent bundle TM, and the family νt = (φt)∗ν is the push forward of the volume form ν by the map φt defined above.

Proof The function HD(X ∘ φ) = ∫_M HM(X(φ(x))) µ(x) on the tangent bundle TD of the diffeomorphism group induces the Hamiltonian HW on TW. By virtue of the Hamiltonian reduction, Hamiltonian trajectories of HD contained in the horizontal bundle Hor = {∇f ∘ φ | f ∈ C∞(M)} descend to Hamiltonian trajectories of HW. The Hamiltonian flow Ψ^{HD} of the Hamiltonian HD is given by Ψ^{HD}(X ∘ φ) = Ψ^{HM} ∘ X ∘ φ, due to Theorem 8.8. By restricting this to the horizontal bundle Hor we have

\[ \Psi^{H_D}(\nabla f \circ \phi) = \Psi^{H_M} \circ \nabla f \circ \phi. \tag{8.4} \]

The flow Ψ^{HD} is described in Corollary 8.10 and has the form Ψ^{HD}(∇f ∘ φ) = ∇ft ∘ φt, where ft and φt are defined as required.

On the other hand, recall that the projection πD : D → W is defined by πD(φ) = φ∗µ. The differential DπD of this map πD is

\[ D\pi_D(X \circ \phi) := (\phi_*\mu, -L_X(\phi_*\mu)) . \]

The application of this relation to (8.4) gives the result. □

Remark 8.13 The time-one-map for the above density flow νt in the Wasserstein space W formally describes optimal transport maps for the Hamiltonian HM. In particular, it recovers the optimal map recently obtained in [14]. One considers the optimal transport problem for the functional

\[ \inf_{\phi} \Big\{ \int_M c(x, \phi(x)) \, \mu \;\Big|\; \phi_*\mu = \nu \Big\} \]

with the cost function c defined by

\[ c(x, y) = \inf \int_0^1 L(\gamma, \dot\gamma) \, dt , \]

where the infimum is taken over paths γ joining the points x and y, and the Lagrangian L : TM → R satisfies certain regularity and convexity assumptions, see [14]. The corresponding Hamiltonian HM in Theorem 8.12 is the Legendre transform of the Lagrangian L. Note that for the “kinetic energy” Lagrangian KM, the above map becomes the optimal map expM(−∇f) mentioned in Chapter 7, with expM : TM → M being the Riemannian exponential of the manifold M.

Chapter 9

The Subriemannian Geometry of Diffeomorphism Groups

In this chapter we develop the subriemannian setting for the diffeomorphism group. In

particular, we derive the geodesic equations for the “nonholonomic Wasserstein metric,”

and describe nonholonomic versions of the Monge-Ampere and heat equations.

Let M be a manifold with a fixed distribution τ on it. Recall that a subrieman-

nian metric is a positive definite inner product 〈 , 〉τ on each plane of the distribution τ

smoothly depending on a point in M . Such a metric can be defined by the bundle map

I : T∗M → τ, sending a covector αx ∈ T∗_xM to the vector Vx in the plane τx such that αx(U) = 〈Vx, U〉τ on vectors U ∈ τx. The subriemannian Hamiltonian Hτ : T∗M → R is the corresponding fiberwise quadratic form:

\[ H^\tau(\alpha_x) = \frac{1}{2} \langle V_x, V_x \rangle_\tau . \tag{9.1} \]

Let Ψ^{Hτ}_t be the Hamiltonian flow for time t of the subriemannian Hamiltonian Hτ on T∗M, while π_{T∗M} : T∗M → M is the cotangent bundle projection. Then the subriemannian exponential map expτ : T∗M → M is defined as the projection to M of the time-one-map of the above Hamiltonian flow on T∗M:

\[ \exp^\tau(t\alpha_x) := \pi_{T^*M} \Psi^{H^\tau}_t(\alpha_x). \tag{9.2} \]


This relation defines a normal subriemannian geodesic on M with the initial covector αx.

Note that the initial velocity of the subriemannian geodesic expτ (tαx) is Vx = Iαx ∈ τx.

So, unlike the Riemannian case, there are many subriemannian geodesics having the same

initial velocity Vx on M .
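The non-uniqueness of geodesics with a given initial velocity is easy to observe numerically. The sketch below assumes the Heisenberg group R³ with the distribution spanned by X1 = ∂x − (y/2)∂z and X2 = ∂y + (x/2)∂z and the metric making them orthonormal (a standard model chosen here for illustration, not an example taken from the text; the function names are hypothetical). It integrates Hamilton's equations for Hτ = (u1² + u2²)/2, where u_i = 〈α, X_i〉, with a fourth-order Runge-Kutta scheme.

```python
import numpy as np

def hamiltonian_field(state):
    """Hamiltonian vector field of H = (u1^2 + u2^2) / 2 on T*R^3."""
    x, y, z, px, py, pz = state
    u1 = px - 0.5 * y * pz                 # pairing of the covector with X1
    u2 = py + 0.5 * x * pz                 # pairing of the covector with X2
    return np.array([
        u1, u2, 0.5 * (x * u2 - y * u1),   # dq/dt =  dH/dp
        -0.5 * pz * u2, 0.5 * pz * u1, 0.0 # dp/dt = -dH/dq
    ])

def normal_geodesic_endpoint(p0, steps=1000):
    """Projection to R^3 of the time-one flow started at the origin with covector p0."""
    state = np.array([0.0, 0.0, 0.0, *p0])
    h = 1.0 / steps
    for _ in range(steps):                 # classical RK4 step
        k1 = hamiltonian_field(state)
        k2 = hamiltonian_field(state + 0.5 * h * k1)
        k3 = hamiltonian_field(state + 0.5 * h * k2)
        k4 = hamiltonian_field(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[:3]

# Two covectors at the origin differing only in the component pz
v_a = hamiltonian_field(np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0]))[:3]
v_b = hamiltonian_field(np.array([0.0, 0.0, 0.0, 1.0, 0.0, 2.0]))[:3]
end_a = normal_geodesic_endpoint((1.0, 0.0, 0.0))
end_b = normal_geodesic_endpoint((1.0, 0.0, 2.0))
```

Both covectors give the same initial velocity (1, 0, 0), yet the first geodesic is a straight line while the second one projects to an arc of a circle in the (x, y)-plane, so the endpoints differ.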

Let dτ be a subriemannian (or, Carnot-Caratheodory) distance on the manifold M ,

defined as the infimum of the lengths of all absolutely continuous admissible (i.e. tangent to τ) curves joining two given points. For a bracket-generating distribution τ any two

points can be joined by such a curve, so this distance is always finite. Consider the

corresponding optimal transport problem by replacing the Riemannian distance d in (7.1)

with the subriemannian distance dτ . Below we study the infinite-dimensional geometry

of this subriemannian version of the optimal transport problem. Although in general

normal subriemannian geodesics might not exhaust all the length minimizing geodesics in

subriemannian manifolds (see [44, 45]), we will see that in the problems of subriemannian

optimal transport one can confine oneself to only such geodesics!

9.1 Subriemannian Submersion

Consider the following general setting: Let (Q, T ) be a subriemannian space, i.e. a

manifold Q with a distribution T and a subriemannian metric 〈 , 〉τ on it. Suppose that

Q → B is a bundle projection to a Riemannian base manifold B.

Definition 9.1 The projection π : (Q, T ) → B is a subriemannian submersion if the

distribution T contains a horizontal subdistribution T hor, orthogonal (with respect to the

subriemannian metric) to the intersections of T with fibers, and the projection π maps

the spaces T hor isometrically to the tangent spaces of the base B, see Figure 9.1.

Let a subriemannian submersion π : (Q, T ) → B be a principal G-bundle Q → B,

where the distribution T and the subriemannian metric are invariant with respect to the

action of the group G. The following theorem is an analog of Corollary 7.4.


Figure 9.1: Subriemannian submersion: the horizontal subdistribution T hor is mapped isometrically to the tangent bundle TB of the base.

Theorem 9.2 For each point b in the base B and each point q in the fibre π−1(b) ⊂ Q over b, every Riemannian geodesic on the base B starting at b admits a unique lift to a subriemannian geodesic on Q starting at q with the velocity vector in T hor.

Example 9.3 Consider the standard Hopf bundle π : S3 → S2, with the two-dimensional

distribution T transversal to the fibers S1. Fix the standard metric on the base S2 and

lift it to a subriemannian metric on S3, which defines a subriemannian submersion. If the

distribution T is orthogonal to the fibers, the manifold (S3, T ) can locally be thought of

as the Heisenberg 3-dimensional group. Then all subriemannian geodesics on S3 with a

given horizontal velocity project to a 1-parameter family of circles on S2 with a common

tangent element. However, only one of these circles, the equator, is a geodesic on the

standard sphere S2. Thus the equator can be uniquely lifted to a subriemannian geodesic

on S3 with the given initial vector.


Note that the uniqueness of this lifting holds even if the distribution T is not orthog-

onal, but only transversal, say at a fixed angle, to the fibers S1, see Figure 9.2.

Figure 9.2: Projections of subriemannian geodesics from (S3, T ) in the Hopf bundle give circles in S2, only one of which, the equator, is a geodesic on the base S2.


To prove this theorem we describe the Hamiltonian setting of the subriemannian

submersion.

Let V er be the vertical subbundle in TQ (i.e. tangent planes to the fibers of the

projection Q → B). Define V er⊥ ⊂ T ∗Q to be the corresponding annihilator, i.e. V er⊥q

is the set of all covectors αq ∈ T ∗q Q at the point q ∈ Q which annihilate the vertical space

V erq.

Definition 9.4 The restriction of the subriemannian exponential map expτ : T ∗Q → Q

to the distribution V er⊥ is called the horizontal exponential

expτ : V er⊥ → Q

and the corresponding geodesics are the horizontal subriemannian geodesics.


The symplectic reduction identifies the quotient V er⊥/G with the cotangent bundle

T ∗B of the base. Note that the subdistribution T hor defines a horizontal bundle for the

principal bundle Q → B in the usual sense. The definition of subriemannian submersion

(translated to the cotangent spaces, where we replace T hor by V er⊥) gives that the

subriemannian Hamiltonian HT defined by (9.1) descends to a Riemannian Hamiltonian

HB,T on T ∗B. Moreover, Hamiltonian trajectories of HB,T starting at the cotangent

space T ∗b B are in one-to-one correspondence with the trajectories of HT starting at the

space V er⊥q . The projection of these Hamiltonian trajectories to the manifolds B and Q

via the cotangent bundle projections πT ∗B and πT ∗Q, respectively, gives the result. □

Corollary 9.5 For a subriemannian submersion, geodesics on the base give rise only to

normal geodesics in the total space.

In order to describe the geodesic geometry on the tangent, rather than cotangent,

bundle of the manifold Q, we fix a Riemannian metric on Q whose restriction to the

distribution τ is the given subriemannian metric 〈 , 〉τ . This Riemannian metric allows one

to identify the cotangent bundle T ∗Q with the tangent bundle TQ. Then the exponential

map expτ can be viewed as a map TQ → Q. It is convenient to think of T hor as

the horizontal bundle and identify it with the annihilator V er⊥. This way horizontal

subriemannian geodesics are geodesics with initial (co)vector in the horizontal bundle

T hor. This identification is particularly convenient for the infinite-dimensional setting,

where we work with the tangent bundle of the diffeomorphism group.

9.2 A Subriemannian Analog of the Otto Calculus.

Fix a Riemannian metric 〈 , 〉M on the manifold M . Let P τ : TM → τ be the orthogonal

projection of vectors on M onto the distribution τ with respect to this metric. Let (ν, η1)

and (ν, η2) be two tangent vectors in the tangent space at the point ν of the smooth


Wasserstein space. Recall that for a fixed volume form µ, we define the subriemannian

Laplacian as ∆τf := divµ(P τ∇f).

Define a nonholonomic Wasserstein metric as the (weak) Riemannian metric on the

(smooth) Wasserstein space W given by

\[ \langle(\nu,\eta_1),(\nu,\eta_2)\rangle_{W,T} := \int_M \langle P^\tau\nabla f_1(x), P^\tau\nabla f_2(x)\rangle_M\,\nu, \tag{9.3} \]

where functions f1 and f2 are solutions of the subriemannian Poisson equation

−(∆τfi)ν = ηi

for the measure ν.

Theorem 9.6 The geodesics on the Wasserstein space W equipped with the nonholo-

nomic Wasserstein metric (9.3) have the form (expτ (tP τ∇f))∗ν, where expτ : T hor → M

is the horizontal exponential map and ν is any point of W.

To prove this theorem we first note that the Riemannian metric 〈 , 〉D defined on the

diffeomorphism group restricts to a subriemannian metric 〈 , 〉D,T on the right invariant

bundle T .

Proposition 9.7 The map π : (D, T ) → W is a subriemannian submersion of the

subriemannian metric 〈 , 〉D,T on the diffeomorphism group with distribution T to the

nonholonomic Wasserstein metric 〈, 〉W,T .

Proof This statement can be derived from the Hamiltonian reduction, similarly to the

Riemannian case.

Here we prove it by an explicit computation. Recall that the map π : D → W is

defined by π(φ) = φ∗µ. Let X ∘ φ be a tangent vector at the point φ in the diffeomorphism group D. Consider the flow φt of the vector field X, and note that π(φt ∘ φ) = (φt)∗φ∗µ. To compute the derivative Dπ we differentiate this equation with respect to time t at t = 0:

\[ D\pi(X\circ\phi) = -\mathcal{L}_{X}(\phi_*\mu) = -(\mathrm{div}_{\phi_*\mu}X)\,\phi_*\mu, \]


by the definition of Lie derivative. A vector field X from the horizontal bundle T hor has the form (P τ∇f) ∘ φ, and for it the equation becomes

\[ D\pi((P^\tau\nabla f)\circ\phi) = -(\Delta^\tau f)\,\phi_*\mu, \]

where the Laplacian ∆τ is taken with respect to the volume form φ∗µ.

Therefore, for horizontal tangent vectors (P τ∇f1) ∘ φ and (P τ∇f2) ∘ φ at the point φ their subriemannian inner product is

\[ \langle(P^\tau\nabla f_1)\circ\phi, (P^\tau\nabla f_2)\circ\phi\rangle_{D} = \int_M \langle P^\tau\nabla f_1\circ\phi, P^\tau\nabla f_2\circ\phi\rangle_M\,\mu. \]

After the change of variables this becomes

\[ \int_M \langle P^\tau\nabla f_1, P^\tau\nabla f_2\rangle_M\,\phi_*\mu = \langle D\pi((P^\tau\nabla f_1)\circ\phi), D\pi((P^\tau\nabla f_2)\circ\phi)\rangle_{W,T}, \]

which completes the proof. □


To describe geodesics in the nonholonomic Wasserstein space we define the Hamiltonian HT : TD → R by

\[ H^{T}(X\circ\phi) := \int_M \langle(P^\tau X)\circ\phi, (P^\tau X)\circ\phi\rangle\,\mu. \tag{9.4} \]

The Hamiltonian flow with Hamiltonian HT has the form expτ ((tP τX) ∘ φ) according to Theorem 8.8. By taking its restriction to the bundle T hor and projecting to the base we obtain that the geodesics on the smooth Wasserstein space are

\[ (\exp^\tau((tP^\tau\nabla f)\circ\phi))_*\nu, \]

where ν = φ∗µ and P τ∇f is defined by the Hodge decomposition for the field X. This completes the proof of Theorem 9.6. □


Remark 9.8 For a horizontal subriemannian geodesic ϕt(x) := expτ (tP τ∇f(x)) with a smooth function f , the diffeomorphism ϕt satisfies (d/dt)ϕt = (P τ∇ft) ∘ ϕt and ft is the solution of the Hamilton-Jacobi equation

\[ \dot f_t + H^\tau(\nabla f_t) = 0 \tag{9.5} \]

with the initial condition f0 = f , see Corollary 8.10. This equation determines horizontal subriemannian geodesics on the diffeomorphism group D. In the Riemannian case, one can see that the vector fields Vt = (d/dt)ϕt = ∇ft ∘ ϕt satisfy the Burgers equation by taking the gradient of both sides in (9.5), cf. Proposition 7.1. Hence Equation (9.5) can be viewed as a subriemannian analog of the potential Burgers equation in D. However, a subriemannian analog of the Burgers equation for nonhorizontal (i.e. nonpotential) normal geodesics on the diffeomorphism group is not so explicit.
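The step from the Hamilton-Jacobi equation to the Burgers equation can be sketched in the simplest Riemannian situation. The computation below assumes, purely for illustration, that M is flat R^n with Hamiltonian H(p) = |p|^2/2; the Eulerian field u_t := ∇f_t is notation introduced here, related to the fields above by Vt = u_t ∘ ϕt.

```latex
\begin{aligned}
\partial_t f_t + \tfrac12\,|\nabla f_t|^2 &= 0, \\
\nabla\big(\tfrac12\,|\nabla f_t|^2\big) &= (\nabla f_t\cdot\nabla)\,\nabla f_t
  \qquad\text{(symmetry of the Hessian)}, \\
\partial_t u_t + (u_t\cdot\nabla)\,u_t &= 0, \qquad u_t := \nabla f_t .
\end{aligned}
```

The last line is the inviscid Burgers equation for a potential velocity field, which is why (9.5) plays the role of its potential form.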

Remark 9.9 If the function f is smooth, the time-one-map ϕ(x) := expτ (P τ∇f(x))

along the geodesics described in Theorem 9.6 satisfies the following nonholonomic analog

of the Monge-Ampere equation: h(ϕ(x)) det(Dϕ(x)) = g(x), where g and h are functions

on the manifold M defining two densities θ = g vol and ν = h vol.

Furthermore, for the case of the Heisenberg group this formal solution ϕ(x) coincides

with the optimal map obtained in [11]. The (minus) potential −f of the corresponding optimal map satisfies the c-concavity condition for c = (dτ)^2/2, where dτ is the subriemannian distance, cf. Remark 7.5.

9.3 The Nonholonomic Heat Equation

Consider the heat equation ∂tu = ∆u for a function u on the manifold M , where the operator ∆ is given by ∆f = divµ∇f . Upon multiplying both sides of the heat equation by the fixed volume form µ, one can regard it as an evolution equation on the


smooth Wasserstein space W . Note that the right-hand-side of the heat equation gives a

tangent vector (∆u)µ at the point uµ of the Wasserstein space. The Boltzmann relative

entropy functional Ent : W → R is defined by the integral

\[ \mathrm{Ent}(\nu) := \int_M \log(\nu/\mu)\,\nu. \tag{9.6} \]

The gradient flow of Ent on the Wasserstein space with respect to the metric d gives the

heat equation, see [52].
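The entropy-decreasing character of the classical heat flow can be observed numerically. The sketch below is only an illustration under simplifying assumptions that are not part of the text: M is the circle of length 2π, µ = dx, the density u is sampled on a uniform periodic grid, and the flow is approximated by an explicit finite-difference scheme. All function and variable names are hypothetical.

```python
import math


def entropy(u, dx):
    """Discrete version of Ent(nu) = int log(nu/mu) nu for nu = u dx, mu = dx."""
    return sum(v * math.log(v) for v in u) * dx


def heat_step(u, r):
    """One explicit finite-difference step of du/dt = u_xx on a periodic grid."""
    n = len(u)
    return [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) for i in range(n)]


n = 64
dx = 2.0 * math.pi / n
r = 0.25  # r = dt / dx**2; for r <= 1/2 each step is a convex averaging of neighbours
u = [1.0 + 0.5 * math.cos(i * dx) for i in range(n)]  # assumed positive initial density

entropies = [entropy(u, dx)]
for _ in range(200):
    u = heat_step(u, r)
    entropies.append(entropy(u, dx))

total_mass = sum(u) * dx
print("entropy at start and end:", entropies[0], entropies[-1])
print("total mass:", total_mass)
```

Because each step with r ≤ 1/2 replaces every value by a convex combination of its neighbours and x log x is convex, the discrete entropy cannot increase, while the total mass is preserved; this mirrors, at the discrete level, the gradient-flow statement above.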

Recall that one can define the subriemannian Laplacian: ∆τf := divµ(P τ∇f) for

a fixed volume form µ on M . The natural generalization of the heat equation to the

nonholonomic setting is as follows.

Definition 9.10 The nonholonomic (or, subriemannian) heat equation is the equation

∂tu = ∆τu on a time-dependent function u on M .

Below we show that this equation in the nonholonomic setting also admits a gradient

interpretation on the Wasserstein space.

Theorem 9.11 The nonholonomic heat equation ∂tu = ∆τu describes the gradient flow

on the Wasserstein space with respect to the relative entropy functional (9.6) and the

nonholonomic Wasserstein metric (9.3).

Namely, for the volume form νt := gt∗µ and the gradient ∇W,T with respect to the

metric 〈 , 〉W,T on the Wasserstein space one has

\[ \frac{\partial}{\partial t}\nu_t = -\nabla^{W,T}\mathrm{Ent}(\nu_t) = \Delta^\tau(\nu_t/\mu)\,\mu. \]

Proof Denote by (ν, η) a tangent vector to the Wasserstein space W at a point ν ∈ W ,

where η is a volume form of total integral zero. Let ∆τν be the subriemannian Laplacian

with respect to the volume form ν.

Let h and hEnt be real-valued functions on the manifold M such that −(∆τνh)ν = η

and −(∆τνhEnt)ν = ∇W,T Ent(ν) for the entropy functional Ent. Then, by definition of


the metric 〈 , 〉W,T given by (9.3), we have

\[ \langle(\nu,\nabla^{W,T}\mathrm{Ent}(\nu)),(\nu,\eta)\rangle_{W,T} = \int_M \langle P^\tau\nabla h_{\mathrm{Ent}}(x), P^\tau\nabla h(x)\rangle_M\,\nu. \tag{9.7} \]

On the other hand, by definitions of Ent and the gradient ∇W,T on the Wasserstein

space, one has:

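\[ \langle(\nu,\nabla^{W,T}\mathrm{Ent}(\nu)),(\nu,\eta)\rangle_{W,T} := \frac{d}{dt}\Big|_{t=0}\mathrm{Ent}(\nu+t\eta) = \frac{d}{dt}\Big|_{t=0}\int_M\Big[\log\Big(\frac{\nu+t\eta}{\mu}\Big)\Big](\nu+t\eta). \]

After differentiation and simplification the latter expression becomes ∫M log(ν/µ) η, where we used that ∫M η = 0. This can be rewritten as

\[ \int_M \log(\nu/\mu)\,\eta = -\int_M \log(\nu/\mu)\,\mathcal{L}_{P^\tau\nabla h}\nu = \int_M \big(\mathcal{L}_{P^\tau\nabla h}\log(\nu/\mu)\big)\,\nu, \]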

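by using the Leibniz property of the Lie derivative L on the Wasserstein space and the fact that −(∆τνh)ν = η. Note that the Lie derivative of a function along a vector field is the inner product of that field with the gradient of the function, and hence

\[ \int_M \big(\mathcal{L}_{P^\tau\nabla h}\log(\nu/\mu)\big)\,\nu = \int_M \langle\nabla\log(\nu/\mu), P^\tau\nabla h\rangle_M\,\nu = \int_M \langle P^\tau\nabla\log(\nu/\mu), P^\tau\nabla h\rangle_M\,\nu, \]

where the last equality holds because P τ is an orthogonal projection.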

Comparing the latter form with (9.7), we get P τ∇hEnt = P τ∇ log(ν/µ), or, after taking

the divergence of both parts and using the definition of function hEnt,

∇W,T Ent(ν) = −∆τν(log(ν/µ)) ν .

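Finally, let us show that the right-hand side of the above equation coincides with −∆τµ(ν/µ) µ. Indeed, the chain rule gives

\[ \mathcal{L}_{P^\tau\nabla\log(\nu/\mu)}\nu = \mathcal{L}_{(\mu/\nu)P^\tau\nabla(\nu/\mu)}\nu = (\mu/\nu)\,\mathcal{L}_{P^\tau\nabla(\nu/\mu)}\nu + d(\mu/\nu)\wedge i_{P^\tau\nabla(\nu/\mu)}\nu. \]

The last term is equal to (i_{P τ∇(ν/µ)} d(µ/ν)) ν = (L_{P τ∇(ν/µ)}(µ/ν)) ν, which implies that

\[ \mathcal{L}_{P^\tau\nabla\log(\nu/\mu)}\nu = \mathcal{L}_{P^\tau\nabla(\nu/\mu)}\mu \]

by the Leibniz property of the Lie derivative. Thus

\[ \Delta^\tau_\nu(\log(\nu/\mu))\,\nu = \mathrm{div}_\nu\big(P^\tau\nabla\log(\nu/\mu)\big)\,\nu = \mathcal{L}_{P^\tau\nabla\log(\nu/\mu)}\nu = \mathcal{L}_{P^\tau\nabla(\nu/\mu)}\mu = \Delta^\tau_\mu(\nu/\mu)\,\mu. \]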
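The same identity can be cross-checked in terms of the density. Writing u := ν/µ (a shorthand introduced here for illustration), one uses only that L_X(uµ) = div_µ(uX) µ for a vector field X and that u∇ log u = ∇u:

```latex
\begin{aligned}
\Delta^\tau_\nu(\log u)\,\nu
  &= \mathcal{L}_{P^\tau\nabla\log u}(u\mu)
   = \mathrm{div}_\mu\big(u\,P^\tau\nabla\log u\big)\,\mu \\
  &= \mathrm{div}_\mu\big(P^\tau\nabla u\big)\,\mu
   = \Delta^\tau_\mu(u)\,\mu .
\end{aligned}
```

The projection P τ is linear on each tangent space, so it commutes with multiplication by the function u; this is the only place where the nonholonomic setting enters.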


The above shows that the nonholonomic heat equation is the gradient flow on the

Wasserstein space for the same potential as the classical heat equation, but with respect

to the nonholonomic Wasserstein metric. □

Part III

Generalized Ricci Curvature Bounds

for Three Dimensional Contact

Subriemannian Manifolds


Chapter 10

Revisiting Subriemannian Geometry

In this chapter we recall basic notions in subriemannian geometry introduced in the

earlier part of the thesis and introduce several new concepts needed in this part of the

thesis. Recall that a subriemannian manifold is a triple (M, ∆, g), where M is a smooth

manifold, ∆ is a distribution (a vector subbundle ∆ of the tangent bundle TM of the

manifold M), and g is a fibrewise inner product defined on the distribution ∆. The

inner product g is also called a subriemannian metric. An absolutely continuous curve

γ : [0, 1] → M on the manifold M is called horizontal if it is almost everywhere tangent

to the distribution ∆. Using the inner product g, we can define the length l(γ) of a

horizontal curve γ by

\[ l(\gamma) = \int_0^1 g(\dot\gamma(t),\dot\gamma(t))^{1/2}\,dt. \]

The subriemannian or Carnot-Caratheodory distance dCC between two points x and

y on the manifold M is defined by

dCC(x, y) = inf l(γ), (10.1)

where the infimum is taken over all horizontal curves which start from x and end at y.

The above distance function may not be well-defined since there may exist two points

which are not connected by any horizontal curve. For this we assume that the distribution


∆ is bracket generating. It means that the vector fields contained in the distribution ∆

together with their iterated Lie brackets span all tangent spaces of the manifold M .

Under the bracket generating assumption, the subriemannian distance is well-defined

thanks to the Chow-Rashevskii Theorem (Theorem 3.18).

As in Riemannian geometry, horizontal curves which realize the infimum in (10.1) are

called length minimizing geodesics (or simply geodesics). From now on all manifolds are

assumed to be complete with respect to a given subriemannian distance. In particular, given any two points on the manifold, there is at least one geodesic joining them. Next

we will discuss one type of geodesics called normal geodesics. For this let us recall several

notions in the symplectic geometry of the cotangent bundle T ∗M . Let π : T ∗M → M be

the projection map. The tautological one-form θ on T ∗M is defined by

θα(V ) = α(dπ(V )),

where α is in the cotangent bundle T ∗M and V is a tangent vector on the manifold T ∗M

at α.

The symplectic two form ω on T ∗M is defined as the exterior derivative of the tau-

tological one form: ω = dθ. It is nondegenerate in the sense that ω(V, ·) = 0 if and only

if V = 0. Given a function H : T ∗M → R on the cotangent bundle, the Hamiltonian

vector field ~H is defined by i ~Hω = −dH. By the nondegeneracy of the symplectic form

ω, the Hamiltonian vector field ~H is uniquely defined.
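In local coordinates these definitions take a familiar form. The sketch below assumes coordinates (q^1, ..., q^n) on M with induced fibre coordinates (p_1, ..., p_n) on T ∗M; these coordinate names are introduced here only for illustration.

```latex
\begin{aligned}
\theta &= \sum_{i} p_i\,dq^i, \qquad
\omega = d\theta = \sum_{i} dp_i\wedge dq^i, \\
i_{\vec H}\,\omega = -dH
\quad&\Longrightarrow\quad
\vec H = \sum_{i}\Big(\frac{\partial H}{\partial p_i}\,\frac{\partial}{\partial q^i}
        - \frac{\partial H}{\partial q^i}\,\frac{\partial}{\partial p_i}\Big).
\end{aligned}
```

With these sign conventions the integral curves of the Hamiltonian vector field satisfy the usual Hamilton equations dq/dt = ∂H/∂p, dp/dt = −∂H/∂q.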

Given a distribution ∆ and a subriemannian metric g on it, we can associate with

it a Hamiltonian H on the cotangent bundle T ∗M . To do this let α : TxM → R be a

covector in the cotangent space T ∗xM at the point x. The subriemannian metric g defines

a bundle isomorphism I : ∆∗ → ∆ between the distribution ∆ and its dual ∆∗. It is

defined by

g(I(β), ·) = β(·),

where β is an element in the dual bundle ∆∗ of the distribution ∆.


By restricting the domain of the covector α to the subspace ∆x of the tangent space

TxM , it defines an element, still called α, in the dual space ∆∗. Therefore, I(α) is a

tangent vector contained in the space ∆x and the Hamiltonian H corresponding to the

subriemannian metric g is defined by

\[ H(\alpha) := \tfrac12\,\alpha(I(\alpha)) = \tfrac12\,g(I(\alpha), I(\alpha)). \]

Note that this construction defines the usual kinetic energy Hamiltonian in the Rieman-

nian case.
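For computations it helps to express this construction in a local orthonormal frame. The sketch below assumes a local orthonormal frame v_1, ..., v_k of ∆ and writes h_i(α) := α(v_i), the notation used later in Section 11.2; the rank k is a placeholder introduced here.

```latex
\begin{aligned}
I(\alpha) &= \sum_{i=1}^{k} h_i(\alpha)\,v_i, \\
\alpha(I(\alpha)) &= g(I(\alpha), I(\alpha)) = \sum_{i=1}^{k} h_i(\alpha)^2 .
\end{aligned}
```

In Section 11.2 the subriemannian Hamiltonian is normalized as one half of this sum of squares; when ∆ = TM the sum equals the squared norm of the covector α, which recovers the kinetic energy.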

Let $\vec H$ be the Hamiltonian vector field corresponding to the Hamiltonian H defined above, and denote the corresponding flow by $e^{t\vec H}$. If $t \mapsto e^{t\vec H}(\alpha)$ is a trajectory of this Hamiltonian flow, then its projection $t \mapsto \gamma(t) = \pi(e^{t\vec H}(\alpha))$ is a locally minimizing geodesic. That is, every sufficiently short segment of the curve γ is a minimizing geodesic between its endpoints. The minimizing geodesics obtained this way are called normal

geodesics. In the special case where the distribution ∆ is the whole tangent bundle

TM , the distance function (10.1) is the usual Riemannian distance and all geodesics

are normal. However, this is not the case for subriemannian manifolds in general. To

introduce another class of geodesics, consider the space Ω of horizontal curves with square

integrable derivatives. The endpoint map end : Ω → M is defined by taking an element γ in the space of curves Ω and giving the endpoint γ(1) of the curve: end(γ) = γ(1). Geodesics which are regular points of the endpoint map are automatically normal and those which are critical points are called abnormal. However, there are geodesics which are both normal and abnormal (see [45] and references therein for more details about abnormal

geodesics).

As an example consider a manifold M of dimension m equipped with a free and

proper Lie group action. If G is the group, then the quotient N := M/G is again a

manifold of dimension n and the quotient map πM : M → N defines a principal bundle

with a total space M , a base space N and a structure group G. The kernel of the

map dπM : TM → TN defines a distribution ver of rank m − n, called the vertical


bundle. An Ehresmann connection is a distribution hor, called the horizontal bundle, of rank

n which is fibrewise transversal to the vertical bundle ver. The bundle hor is a principal

bundle connection (or a connection) if it is preserved under the Lie group action. A

subriemannian metric, defined on a connection hor, which is invariant under the Lie

group action is called a metric of bundle type. This subriemannian metric descends to a

Riemannian metric on the base space N . In this thesis two examples will be considered.

They are the Heisenberg group and the Hopf fibration.

The Heisenberg group is a principal bundle with the three dimensional Euclidean

space R3 as its total space. If x, y, z are the coordinates of the total space, then the Lie group action is an R-action given by the flow of the vector field ∂z. The base space of this principal bundle is the two dimensional Euclidean space R2 = R3/R. The connection hor is defined by the span of the vector fields ∂x − (1/2)y∂z and ∂y + (1/2)x∂z. The subriemannian metric is defined by declaring that the above vector fields are orthonormal.
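Normal geodesics of the Heisenberg group can be explored numerically by integrating the Hamiltonian flow. The sketch below assumes the normalization H = (h1^2 + h2^2)/2 from Section 11.2, with h1 = px − (1/2)y pz and h2 = py + (1/2)x pz the Hamiltonian lifts of the two vector fields above, and uses a generic Runge-Kutta integrator. All names are hypothetical, and the integrator is a convenience choice rather than anything prescribed by the text.

```python
import math


def hamilton_rhs(state):
    """Hamilton's equations for H = (h1**2 + h2**2) / 2 on T*R^3."""
    x, y, z, px, py, pz = state
    h1 = px - 0.5 * y * pz  # lift of the field d/dx - (y/2) d/dz
    h2 = py + 0.5 * x * pz  # lift of the field d/dy + (x/2) d/dz
    return (
        h1,                       # dx/dt  =  dH/dpx
        h2,                       # dy/dt  =  dH/dpy
        0.5 * (x * h2 - y * h1),  # dz/dt  =  dH/dpz
        -0.5 * pz * h2,           # dpx/dt = -dH/dx
        0.5 * pz * h1,            # dpy/dt = -dH/dy
        0.0,                      # dpz/dt = -dH/dz
    )


def rk4_step(state, dt):
    """One classical Runge-Kutta step (a generic integrator, chosen for brevity)."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = hamilton_rhs(state)
    k2 = hamilton_rhs(shifted(k1, 0.5 * dt))
    k3 = hamilton_rhs(shifted(k2, 0.5 * dt))
    k4 = hamilton_rhs(shifted(k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def normal_geodesic_endpoint(covector, steps=1000):
    """Project the time-one Hamiltonian flow started over the origin to R^3."""
    state = (0.0, 0.0, 0.0) + tuple(covector)
    for _ in range(steps):
        state = rk4_step(state, 1.0 / steps)
    return state[:3]


# Two initial covectors at the origin with the same horizontal velocity (1, 0).
straight = normal_geodesic_endpoint((1.0, 0.0, 0.0))
curved = normal_geodesic_endpoint((1.0, 0.0, 1.0))
print("p_z = 0:", straight)
print("p_z = 1:", curved)
```

Both covectors give the initial velocity ∂x at the origin, yet the two geodesics differ: for pz = 0 the projection to the base R2 is a straight line, while for pz = 1 it is an arc of a circle tangent to that line. This illustrates, in the Heisenberg case, that only one of the geodesics with a given horizontal velocity projects to a geodesic of the base.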

The Hopf fibration is a principal S1-bundle over the two sphere S2 with the three

sphere S3 as the total space. The explicit formulas for the definition of the Hopf fibration

can be found in [19]. Here we recall them for the convenience of the reader. Let w, x, y, z

be the coordinates on the four dimensional Euclidean space R4. The flow of the vector

field Z = −x∂w + w∂x + z∂y − y∂z is a circle action on the unit sphere S3. The bundle

map πM is given by πM(w, x, y, z) = (w2+x2−y2−z2, 2wz+2xy, 2xz−2wy) which maps

the unit 3-sphere S3 to the unit 2-sphere S2. The vector fields −y∂w − z∂x + w∂y + x∂z

and −z∂w + y∂x− x∂y + w∂z define a distribution of rank 2 and a subriemannian metric

on S3. The subriemannian metric descends to a Riemannian metric on the 2-sphere S2

which is twice the usual metric induced from R3.
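The formulas above lend themselves to a quick numerical sanity check. The sketch below samples random points of S3 and verifies that the bundle map lands on the unit 2-sphere, that it is constant along the circle action generated by Z, and that the two horizontal vector fields are tangent to S3 and orthogonal to Z in the Euclidean metric of R4. The function names are hypothetical and the random sampling is only a convenience.

```python
import math
import random


def hopf_map(p):
    """Bundle map (w, x, y, z) -> S^2 as written in the text."""
    w, x, y, z = p
    return (w * w + x * x - y * y - z * z, 2 * w * z + 2 * x * y, 2 * x * z - 2 * w * y)


def circle_action(p, t):
    """Time-t flow of Z = -x d/dw + w d/dx + z d/dy - y d/dz."""
    w, x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (c * w - s * x, s * w + c * x, c * y + s * z, -s * y + c * z)


def frame(p):
    """The vertical field Z and the two horizontal fields at the point p."""
    w, x, y, z = p
    return {"Z": (-x, w, z, -y), "X1": (-y, -z, w, x), "X2": (-z, y, -x, w)}


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


random.seed(0)
worst = 0.0
for _ in range(100):
    raw = [random.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(dot(raw, raw))
    p = tuple(c / norm for c in raw)  # a point of the unit sphere S^3
    q = hopf_map(p)
    rotated = hopf_map(circle_action(p, random.uniform(0.0, 2.0 * math.pi)))
    fields = frame(p)
    errors = [
        abs(dot(q, q) - 1.0),                          # image lies on S^2
        max(abs(a - b) for a, b in zip(q, rotated)),   # invariance under the action
        abs(dot(fields["X1"], fields["Z"])),           # horizontal fields are
        abs(dot(fields["X2"], fields["Z"])),           # orthogonal to the fibre
        abs(dot(fields["X1"], p)),                     # and tangent to S^3
        abs(dot(fields["X2"], p)),
    ]
    worst = max(worst, max(errors))

print("largest deviation:", worst)
```

All deviations should be at the level of floating-point rounding, which is consistent with the distribution being a connection on the Hopf bundle.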

Chapter 11

Generalized Curvatures

11.1 The General Case

In this chapter we recall the definition of the curvature type invariants studied in [3,

10, 37]. Then, we specialize it to the case of a three dimensional contact subriemannian

manifold. First let us consider the following general situation. Let H : T ∗M → R be

a Hamiltonian and let ~H be the corresponding Hamiltonian vector field. If we denote

the flow of the vector field $\vec H$ by $e^{t\vec H}$ and a point on the manifold T ∗M by α, then the differential $de^{-t\vec H} : T_{e^{t\vec H}(\alpha)}T^*M \to T_\alpha T^*M$ of the map $e^{-t\vec H}$ is a symplectic transformation between the symplectic vector spaces $T_{e^{t\vec H}(\alpha)}T^*M$ and $T_\alpha T^*M$. Recall that the vertical space Vα at α of the bundle π : T ∗M → M is defined by the kernel of the map $d\pi_\alpha : T_\alpha T^*M \to T_{\pi(\alpha)}M$. Since each vertical space Vα is a Lagrangian subspace, the one parameter family of subspaces $t \mapsto J_\alpha(t) := de^{-t\vec H}(V_{e^{t\vec H}(\alpha)})$ defines a curve of Lagrangian

subspaces contained in the symplectic vector space TαT ∗M . This curve is called the

Jacobi curve at α. In other words if the space of all Lagrangian subspaces, called La-

grangian Grassmannian, of a symplectic vector space Σ is denoted by LG(Σ), then the

Jacobi curve above is a smooth curve in the Lagrangian Grassmannian LG(TαT ∗M).

The Lagrangian Grassmannian is a homogeneous space of the symplectic group, and


curvature type invariants of the Hamiltonian H are simply differential invariants of the

Jacobi curve under the action of the symplectic group (see [37]).

The construction of differential invariants for a general curve J(·) in the Lagrangian

Grassmannian LG(Σ) was done in the recent paper [37], though partial results were

obtained earlier (see [3, 10]). A principal step is the construction of the canonical splitting:

\[ \Sigma = J(t)\oplus J^{\circ}(t), \]

where $t \mapsto J^{\circ}(t)$ is another curve in the Lagrangian Grassmannian LG(Σ) such that $J^{\circ}(t)$ is intrinsically defined by the germ of the curve J(·) at time t.

In the case of the Jacobi curve Jα(·), we have the splitting of the symplectic vector space TαT ∗M : $T_\alpha T^*M = J_\alpha(t)\oplus J^{\circ}_\alpha(t)$. In particular the subspace Jα(0) is the vertical space Vα of the bundle π : T ∗M → M and the subspace $J^{\circ}_\alpha(0)$ is a complementary subspace to Jα(0) = Vα at time t = 0. Hence, $\{J^{\circ}_\alpha(0)\}_{\alpha\in T^*M}$ defines an Ehresmann connection on the bundle π : T ∗M → M . It is shown (see [3]) that this connection is torsion free since the $J^{\circ}_\alpha(0)$ are Lagrangian subspaces of the symplectic vector

space TαT ∗M . However, it is not a linear connection in general. In the Riemannian case

this is, under the identification of the tangent and cotangent spaces by the Riemannian

metric, simply the Levi-Civita connection (see [3]).

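Using the above splitting we can also define a generalization of the Ricci curvature of Riemannian geometry. Indeed let $\pi_{J_\alpha(t)}$ and $\pi_{J^{\circ}_\alpha(t)}$ be the projections, corresponding to the splitting $T_\alpha T^*M = J_\alpha(t)\oplus J^{\circ}_\alpha(t)$, onto the subspaces $J_\alpha(t)$ and $J^{\circ}_\alpha(t)$, respectively. Let w(·) be a path contained in the Jacobi curve Jα(·) (i.e. w(t) ∈ Jα(t)). Then the projection $\pi_{J^{\circ}_\alpha(t)}\dot w(t)$ of its derivative $\dot w(t)$ onto the subspace $J^{\circ}_\alpha(t)$ depends only on the vector w(t) but not on the curve w(·). Therefore, it defines a linear operator $\Phi^t_{J_\alpha J^{\circ}_\alpha} : J_\alpha(t) \to J^{\circ}_\alpha(t)$ by

\[ \Phi^t_{J_\alpha J^{\circ}_\alpha}(v) = \pi_{J^{\circ}_\alpha(t)}\,\frac{d}{dt}w(t), \qquad w(t) = v. \]

Similarly, we can also define another operator $\Phi^t_{J^{\circ}_\alpha J_\alpha} : J^{\circ}_\alpha(t) \to J_\alpha(t)$. Finally the generalized Ricci curvature is defined as the negative of the trace of the linear operator

\[ \Phi^0_{J^{\circ}_\alpha J_\alpha}\,\Phi^0_{J_\alpha J^{\circ}_\alpha} : J_\alpha(0) \to J_\alpha(0). \]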

Recall that a basis e1, ..., en, f1, ..., fn in a symplectic vector space with a symplectic

form ω is a Darboux basis if it satisfies ω(ei, ej) = ω(fi, fj) = 0, and ω(fi, ej) = δij. The

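canonical splitting $\Sigma = J(t)\oplus J^{\circ}(t)$ mentioned above, with $J^{\circ}(t)$ the canonical complement of J(t), is accompanied by a moving Darboux basis e1(t), ..., en(t), f1(t), ..., fn(t) of the symplectic vector space Σ satisfying

\[ J(t) = \mathrm{span}\{e_1(t),\dots,e_n(t)\}, \qquad J^{\circ}(t) = \mathrm{span}\{f_1(t),\dots,f_n(t)\} \]

and the structural equations

\[ \dot e_i(t) = c^1_{ij}(t)\,e_j(t) + c^2_{ij}(t)\,f_j(t), \qquad \dot f_i(t) = c^3_{ij}(t)\,e_j(t) + c^4_{ij}(t)\,f_j(t). \]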

This is an analog of the Frenet frame of a curve in the Euclidean space. The generalized

Ricci curvature is given by the trace of the matrix $-c^3(0)c^2(0)$, where $c^2(0)$ and $c^3(0)$ are the matrices with entries $c^2_{ij}(0)$ and $c^3_{ij}(0)$, respectively.

The most interesting cases for us are contact subriemannian structures on three di-

mensional manifolds. To define such a structure, let ∆ be a distribution of rank two (i.e.

∆ is a vector subbundle of the tangent bundle and the dimension of each fibre is two)

on a three dimensional manifold M . We assume that ∆ is 2-generating, meaning that the vector fields contained in the distribution ∆ together with their Lie brackets span all tangent spaces of M . In other words,

\[ TM = \mathrm{span}\{X_1, [X_2, X_3] \mid X_i \in \Delta\}. \]
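The Heisenberg group of Chapter 10 is the basic example of a 2-generating distribution. With X and Y denoting the two vector fields that span its distribution (labels introduced here for illustration), a direct computation of the bracket gives:

```latex
\begin{aligned}
X &= \partial_x - \tfrac12\,y\,\partial_z, \qquad
Y = \partial_y + \tfrac12\,x\,\partial_z, \\
[X, Y] &= X\big(\tfrac12 x\big)\,\partial_z - Y\big(-\tfrac12 y\big)\,\partial_z
        = \tfrac12\,\partial_z + \tfrac12\,\partial_z = \partial_z .
\end{aligned}
```

Since X, Y and ∂z are linearly independent at every point, one bracket suffices to span the tangent space of R3.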

The structural equations, in this case, have the following form:

Theorem 11.1 For each fixed α in the manifold T ∗M , there is a moving Darboux frame

\[ e_1(t),\ e_2(t),\ e_3(t),\ f_1(t),\ f_2(t),\ f_3(t) \]

in the symplectic vector space TαT ∗M and functions $R^{11}_t$, $R^{22}_t$ of time t such that e1(t), e2(t), e3(t) form a basis for the Jacobi curve Jα(t) and the frame satisfies the following structural equations

\[
\begin{aligned}
\dot e_1(t) &= f_1(t), \\
\dot e_2(t) &= e_1(t), \\
\dot e_3(t) &= f_3(t), \\
\dot f_1(t) &= -R^{11}_t e_1(t) - f_2(t), \\
\dot f_2(t) &= -R^{22}_t e_2(t), \\
\dot f_3(t) &= 0.
\end{aligned}
\]

Moreover, $e_3(t) = \frac{1}{\sqrt{2H}}(\vec E - t\vec H)$ and $f_3(t) = -\frac{1}{\sqrt{2H}}\vec H$.

Proof According to the main result in [37], there exist a family of Darboux frames e1(t), e2(t), e3(t), f1(t), f2(t), f3(t) and functions $R^{ij}_t$ of time t which satisfy

\[
\begin{aligned}
\dot e_1(t) &= f_1(t), \\
\dot e_2(t) &= e_1(t), \\
\dot e_3(t) &= f_3(t), \\
\dot f_1(t) &= -R^{11}_t e_1(t) - R^{31}_t e_3(t) - f_2(t), \\
\dot f_2(t) &= -R^{22}_t e_2(t) - R^{32}_t e_3(t), \\
\dot f_3(t) &= -R^{31}_t e_1(t) - R^{32}_t e_2(t) - R^{33}_t e_3(t).
\end{aligned}
\tag{11.1}
\]

Let δs be the dilation in the fibre direction defined by δs(α) = sα and let $\vec E$ be the Euler field defined by $\vec E(\alpha) = \frac{d}{ds}\big|_{s=1}\delta_s\alpha$. By the definition of the Jacobi curve Jα(t), the time dependent vector field $(e^{t\vec H})^*\vec E$ is contained in Jα(t). Next we need the following lemma.

Lemma 11.2 $(e^{t\vec H})^*\vec E = \vec E - t\vec H$.

Proof of Lemma 11.2

Using the definitions of the symplectic form ω and the Hamiltonian vector field $\vec H$, we have

\[ \omega(d\delta_s(\vec H(\alpha)), X(s\alpha)) = s\,\omega(\vec H(\alpha), d\delta_{1/s}(X(s\alpha))) = -s\,dH(d\delta_{1/s}(X(s\alpha))). \]

Since the Hamiltonian is homogeneous of degree two in the fibre direction, the above equation becomes

\[ \omega(d\delta_s(\vec H(\alpha)), X(s\alpha)) = -\frac{1}{s}\,dH(X(s\alpha)) = \frac{1}{s}\,\omega(\vec H(s\alpha), X(s\alpha)). \]

It follows that $\delta_s^*\vec H = s\vec H$, where $\delta_s^*\vec H$ is the pullback of the vector field $\vec H$ by the map δs. By comparing the flows of the above vector fields, we have

\[ e^{t\vec H}\circ\delta_s = \delta_s\circ e^{ts\vec H}. \]

By differentiating the above equation with respect to s and setting s = 1, it follows that $(e^{t\vec H})^*\vec E = \vec E - t\vec H$, as claimed. □
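Lemma 11.2 can be checked by hand in the simplest case. The sketch below assumes flat M = R^n with H(q, p) = |p|^2/2 and coordinates (q, p) on T ∗M (an illustrative choice, not the general setting of the lemma), and reads the star as the pullback by the flow, in line with the definition of the Jacobi curve.

```latex
\begin{aligned}
\vec H &= \sum_i p_i\,\partial_{q^i}, \qquad
\vec E = \sum_i p_i\,\partial_{p_i}, \qquad
e^{t\vec H}(q, p) = (q + tp,\ p), \\
\big((e^{t\vec H})^*\vec E\big)(q, p)
  &= de^{-t\vec H}\big(\vec E(q + tp,\ p)\big)
   = \sum_i p_i\,\big(\partial_{p_i} - t\,\partial_{q^i}\big)
   = \big(\vec E - t\vec H\big)(q, p).
\end{aligned}
```

Here the differential of the inverse flow sends a tangent vector (δq, δp) to (δq − t δp, δp), which produces the term −t times the Hamiltonian vector field.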

It follows from Lemma 11.2 that $\vec E - t\vec H = \sum_{i=1}^{3} a_i(t)e_i(t)$ for some functions ai of time t. If we differentiate with respect to time t twice and use (11.1), we get

\[
\begin{aligned}
&2\dot a_1(t)f_1(t) + 2\dot a_2(t)e_1(t) + 2\dot a_3(t)f_3(t) - a_1(t)\big(R^{11}_t e_1(t) + R^{31}_t e_3(t) + f_2(t)\big) \\
&\quad + a_2(t)f_1(t) - a_3(t)\big(R^{31}_t e_1(t) + R^{32}_t e_2(t) + R^{33}_t e_3(t)\big) \\
&\quad + \ddot a_1(t)e_1(t) + \ddot a_2(t)e_2(t) + \ddot a_3(t)e_3(t) = 0.
\end{aligned}
\]

If we equate the coefficients of the fi's, we get $a_1 \equiv a_2 \equiv \dot a_3 \equiv 0$. Therefore, $\vec E - t\vec H = a_3e_3(t)$ and $-\vec H = a_3f_3(t)$ for some constant a3 satisfying $(a_3)^2 = \omega(a_3f_3(t), a_3e_3(t)) = dH(\vec E) = 2H$. It follows that $R^{31} = R^{32} = R^{33} = 0$. □

11.2 The Three Dimensional Contact Case

In this section we come back to the case of a three dimensional contact subriemannian

manifold. We will write down explicit formulas (Theorem 11.3) for the generalized cur-

vature operator R and the moving Darboux frame e1(t),e2(t),e3(t), f1(t),f2(t),f3(t) in

Theorem 11.1.

Let ∆ be the contact distribution and let H be the Hamiltonian corresponding to a

given subriemannian metric g on ∆. Let σ be an annihilator 1-form of the distribution


∆. That means a vector v is in the distribution ∆ if and only if σ(v) = 0 (i.e. ker σ = ∆).

Since ∆ is a contact distribution, σ can be chosen in such a way that the restriction of

its exterior derivative dσ to the distribution ∆ is the volume form with respect to the

subriemannian metric g. Let v1, v2 be a local orthonormal frame in the distribution ∆

with respect to the subriemannian metric g and let v0 = e be the Reeb field defined by

the conditions ieσ = 1 and iedσ = 0. This defines a convenient frame v0, v1, v2 in the

tangent bundle TM and we let α0 = σ, α1, α2 be the corresponding dual co-frame in

the cotangent bundle T ∗M (i.e. αi(vj) = δij).
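For orientation, here is how this frame looks on the Heisenberg group of Chapter 10. The choice of sign for σ below is an assumption made so that dσ restricts to the volume form of the orthonormal frame v1, v2 in the stated order; it is a sketch, not a formula taken from the text.

```latex
\begin{aligned}
v_1 &= \partial_x - \tfrac12\,y\,\partial_z, \qquad
v_2 = \partial_y + \tfrac12\,x\,\partial_z, \qquad
v_0 = -\partial_z, \\
\sigma &= \alpha_0 = \tfrac12\,(x\,dy - y\,dx) - dz, \qquad
\alpha_1 = dx, \qquad \alpha_2 = dy, \\
d\sigma &= dx\wedge dy = \alpha_1\wedge\alpha_2, \qquad
\sigma(v_0) = 1, \qquad i_{v_0}\,d\sigma = 0, \qquad \sigma(v_1) = \sigma(v_2) = 0 .
\end{aligned}
```

With this choice the bracket of v1 and v2 equals ∂z = −v0, so the Reeb field is the negative of the generator of the R-action.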

The frame v0, v1, v2 and the co-frame α0, α1, α2 defined above induces a frame in

the tangent bundle TT ∗M of the cotangent bundle T ∗M . Indeed, let ~αi be the vector

fields on the cotangent bundle T ∗M defined by i~αiω = −αi. Note that the symbol αi in

the definition of ~αi represents the pull back π∗αi of the 1-form α on the manifold M by

the projection π : T ∗M → M . This convention of identifying forms in the manifold M

and its pull back on the cotangent bundle T ∗M will be used for the rest of this paper

without mentioning. Let ξ1 and ξ2 be the 1-forms defined by ξ1 = h1α2 − h2α1 and

ξ2 = h1α1 + h2α2, respectively, and let ~ξi be the vector fields defined by i~ξiω = −ξi.

Finally if we let hi : T ∗M → R be the Hamiltonian lift of the vector fields vi, defined by hi(α) = α(vi), then the vector fields $\vec h_0, \vec h_1, \vec h_2, \vec\sigma, \vec\xi_1, \vec\xi_2$ define a local frame for the tangent bundle TT ∗M of the cotangent bundle T ∗M . Under the above notation the subriemannian Hamiltonian is given by $H = \frac12\big((h_1)^2 + (h_2)^2\big)$ and the Hamiltonian vector field is $\vec H = h_1\vec h_1 + h_2\vec h_2$.

We also need the bracket relations of the vector fields v0, v1, v2. Let $a^k_{ij}$ be the functions on the manifold M defined by

\[ [v_i, v_j] = a^0_{ij}v_0 + a^1_{ij}v_1 + a^2_{ij}v_2. \tag{11.2} \]

The dual version of the above relation is

\[ d\alpha_k = -\sum_{0\le i<j\le 2} a^k_{ij}\,\alpha_i\wedge\alpha_j. \tag{11.3} \]

By (11.3) and the definition of the Reeb field e = v0, it follows that dσ = dα0 = α1 ∧ α2. Therefore, $a^0_{01} = a^0_{02} = 0$ and $a^0_{12} = -1$. If we also take the exterior derivative of the equation in (11.3), we get $a^1_{01} + a^2_{02} = 0$. Finally we come to the main theorem of this section:

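Theorem 11.3 The Darboux frame e1(t), e2(t), e3(t), f1(t), f2(t), f3(t) and the invariants $R^{11}_t$ and $R^{22}_t$ satisfy $e_i(t) = (e^{t\vec H})^*e_i(0)$, $f_i(t) = (e^{t\vec H})^*f_i(0)$, $R^{11}_t = (e^{t\vec H})^*R^{11}_0$, $R^{22}_t = (e^{t\vec H})^*R^{22}_0$, and

\[
\begin{aligned}
e_2(0) &= \frac{1}{\sqrt{2H}}\,\vec\sigma, \\
e_1(0) &= \frac{1}{\sqrt{2H}}\,\vec\xi_1, \\
f_1(0) &= \frac{1}{\sqrt{2H}}\,\big[h_1\vec h_2 - h_2\vec h_1 + \chi_0\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2\big], \\
f_2(0) &= \frac{1}{\sqrt{2H}}\,\big[2H\vec h_0 - h_0\vec H - \chi_1\vec\alpha_0 + (\vec\xi_1a)\vec\xi_1 - a\vec\xi_2\big], \\
R^{11}_0 &= h_0^2 + 2H\kappa - \tfrac32\,\vec\xi_1a, \\
R^{22}_0 &= \vec\xi_1a - 3\vec H\vec\xi_1\vec Ha + 3\vec H^2\vec\xi_1a + \vec\xi_1\vec H^2a,
\end{aligned}
\]

where

\[
\begin{aligned}
a &= dh_0(\vec H), \\
\chi_0 &= h_2h_{01} - h_1h_{02} + \vec\xi_1a, \\
\chi_1 &= h_0a + 2\vec H\vec\xi_1a - \vec\xi_1\vec Ha, \\
\kappa &= v_1a^2_{12} - v_2a^1_{12} - (a^1_{12})^2 - (a^2_{12})^2 - \tfrac12\,(a^2_{01} - a^1_{02}).
\end{aligned}
\]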

The rest of this section is devoted to the proof of this theorem. First we need a

few lemmas. Let hij : T ∗M → R be the Hamiltonian lift of the vector field [vi, vj]:

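hij(α) = α([vi, vj]). Then the commutator relations of the frame $\{\vec h_i, \vec\alpha_i \mid i = 0, 1, 2\}$ are given by the following:

Lemma 11.4

\[ [\vec h_i, \vec h_j] = \vec h_{ij}, \qquad [\vec h_i, \vec\alpha_j] = -\sum_k a^j_{ik}\,\vec\alpha_k, \qquad [\vec\alpha_i, \vec\alpha_j] = 0. \]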


Proof Since the Lie derivative of the symplectic form ω along the Hamiltonian vector

field ~hi vanishes,

\[ i_{[\vec h_i,\vec h_j]}\,\omega = \mathcal{L}_{\vec h_i}\,i_{\vec h_j}\omega = -d\,i_{\vec h_i}i_{\vec h_j}\omega. \tag{11.4} \]

The function ω(~hi,~hj) is equal to hij. Indeed, since ω = dθ, we have θ(~hi) = hi. By

using Cartan’s formula, it follows that

dhj(~hi) = ω(~hi,~hj) = dhj(~hi)− dhi(~hj)− θ([~hi,~hj]).

Since dπ(~hi) = vi, it implies that θ([~hi,~hj]) = hij. Therefore, we have

ω(~hi,~hj) = −dhi(~hj) = hij. (11.5)

If we combine this with (11.4), the first assertion of the lemma follows.

A calculation similar to the above one shows that

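\[ i_{[\vec h_i,\vec\alpha_j]}\,\omega = \mathcal{L}_{\vec h_i}\,i_{\vec\alpha_j}\omega. \]

By Cartan's formula, the above equation becomes

\[ i_{[\vec h_i,\vec\alpha_j]}\,\omega = -i_{\vec h_i}\,\pi^*d\alpha_j = -\pi^*(i_{v_i}d\alpha_j). \]

The second assertion follows from this and (11.3).

If we apply Cartan's formula again,

\[ i_{[\vec\alpha_i,\vec\alpha_j]}\,\omega = \mathcal{L}_{\vec\alpha_i}\,i_{\vec\alpha_j}\omega - i_{\vec\alpha_j}\,\mathcal{L}_{\vec\alpha_i}\omega = -i_{\vec\alpha_i}\,d(\pi^*\alpha_j) + i_{\vec\alpha_j}\,d(\pi^*\alpha_i). \]

Since $d\pi(\vec\alpha_i) = 0$, it follows that $i_{[\vec\alpha_i,\vec\alpha_j]}\omega = 0$. Therefore, the third assertion holds. □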

Let β = h1dh2 − h2dh1, then we also have the following relations:

Lemma 11.5

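\[ dh_i(\vec h_j) = -h_{ij}, \qquad \alpha_i(\vec h_j) = -dh_i(\vec\alpha_j) = \delta_{ij}, \qquad \alpha_i(\vec\alpha_j) = 0, \]

\[ \beta(\vec\xi_2) = dH(\vec\xi_1) = 0, \qquad \beta(\vec\xi_1) = -2H, \qquad dH(\vec\xi_2) = -2H. \]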


Proof The first assertion follows from (11.5) and the last two assertions follow from

dπ(~hi) = vi and dπ(~αi) = 0. A computation using αi(~hj) = δij proves the rest of the

assertions. □

Proof of Theorem 11.3 If we define E2 by E2(t) = (et ~H)∗~σ, then E2(t) is contained in

the Jacobi curve Jα(t). Since e1(t), e2(t), e3(t) span the Jacobi curve at time t,

E2(t) = c1(t)e1(t) + c2(t)e2(t) + c3(t)e3(t)

for some functions ci of time t.

Since H is the subriemannian Hamiltonian, the vector dπ( ~H) is contained in the

distribution ∆. Therefore, ω(~σ, ~H) = −π∗σ( ~H) = 0. Since f3 = −(2H)−1/2 ~H, we have

0 = ω(~σ, ~H) = ω(E2, ~H) = −(2H)−1/2c3(t).

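This shows that $c_3 \equiv 0$ and so $E_2(t) = c_1(t)e_1(t) + c_2(t)e_2(t)$. If we differentiate this with respect to time $t$, then we have

\[ (e^{t\vec H})^*[\vec H,\vec\sigma] = \dot E_2(t) = \dot c_1(t)e_1(t) - c_1(t)f_1(t) + \dot c_2(t)e_2(t) - c_2(t)e_1(t). \]

By Cartan's formula, it follows that

\[ \omega([\vec H,\vec\sigma],\vec\sigma) = \pi^*\sigma([\vec H,\vec\sigma]) = -\pi^*d\sigma(\vec H,\vec\sigma) = 0. \]

By combining this with the above equations for $E_2$ and $\dot E_2$, we have $c_1 \equiv 0$. If we differentiate the equation $E_2(t) = c_2(t)e_2(t)$ with respect to time $t$ again, we get

\[ E_2(t) = c_2(t)e_2(t), \]

\[ (e^{t\vec H})^*(\mathrm{ad}_{\vec H}(\vec\sigma)) = \dot c_2(t)e_2(t) + c_2(t)e_1(t), \]

\[ (e^{t\vec H})^*(\mathrm{ad}^2_{\vec H}(\vec\sigma)) = \ddot c_2(t)e_2(t) + 2\dot c_2(t)e_1(t) + c_2(t)f_1(t). \]

This gives $c := 1/c_2(0) = (\omega_\alpha(\mathrm{ad}^2_{\vec H}(\vec\sigma), \mathrm{ad}_{\vec H}(\vec\sigma)))^{-1/2}$, and $c_2(t) = (e^{t\vec H})^*c_2(0)$. Therefore, $e_2(t) = (e^{t\vec H})^*(c\vec\sigma)$.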


To find out what c is more explicitly, we first compute [ ~H, ~α0]. The Lie bracket is a

derivation in each of its entries, so

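\[ [\vec H,\vec\alpha_0] = [h_1\vec h_1 + h_2\vec h_2, \vec\alpha_0] = -dh_1(\vec\alpha_0)\vec h_1 - dh_2(\vec\alpha_0)\vec h_2 + h_1[\vec h_1,\vec\sigma] + h_2[\vec h_2,\vec\sigma]. \]

It follows from this, Lemma 11.4, and Lemma 11.5 that

\[ [\vec H,\vec\alpha_0] = h_1\vec\alpha_2 - h_2\vec\alpha_1 = \vec\xi_1. \]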

Next, we want to compute [ ~H, ~ξ1]. For this, let

\[ [\vec H,\vec\xi_1] = k_0\vec\alpha_0 + k_1\vec\xi_1 + k_2\vec\xi_2 + \sum_{i=0}^{3} c_i\vec h_i \tag{11.6} \]

for some functions $c_i$ and $k_i$.

To compute $c_0$, for instance, we apply $\alpha_0$ to both sides of (11.6). Using Lemma 11.5 and Cartan's formula, we have $c_0 = 0$. Similar computations give $c_1 = -h_2$ and $c_2 = h_1$. This shows that

\[ [\vec H,\vec\xi_1] = k_0\vec\alpha_0 + k_1\vec\xi_1 + k_2\vec\xi_2 + h_1\vec h_2 - h_2\vec h_1. \tag{11.7} \]

By applying $dh_0$ to both sides of (11.7) and using Lemma 11.5 again, we have $k_0 = h_2h_{01} - h_1h_{02} + \vec\xi_1 a$. Similar calculations using $\beta$ and $dH$ give

\[ [\vec H,\vec\xi_1] = h_1\vec h_2 - h_2\vec h_1 + \chi_0\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2, \tag{11.8} \]

where $\chi_0 = h_2h_{01} - h_1h_{02} + \vec\xi_1 a$ and $a = dh_0(\vec H)$.

It follows that

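\[ c^{-2} = \omega(\mathrm{ad}^2_{\vec H}(\vec\sigma), \mathrm{ad}_{\vec H}(\vec\sigma)) = 2H \]

and $e_2(0) = \frac{1}{\sqrt{2H}}\vec\alpha_0$. It also follows from Theorem 11.3 that

\[ e_1(0) = \frac{1}{\sqrt{2H}}\vec\xi_1, \qquad f_1(0) = \frac{1}{\sqrt{2H}}[\vec H,\vec\xi_1], \]

\[ \dot f_1(0) = \frac{1}{\sqrt{2H}}[\vec H,[\vec H,\vec\xi_1]], \qquad \ddot f_1(0) = \frac{1}{\sqrt{2H}}[\vec H,[\vec H,[\vec H,\vec\xi_1]]]. \tag{11.9} \]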


A computation similar to that of (11.8) gives

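\[ [\vec H,[\vec H,\vec\xi_1]] = -2H\vec h_0 + h_0\vec H + \chi_1\vec\alpha_0 + (\chi_2 + \chi_0 - \vec\xi_1 a)\vec\xi_1 + a\vec\xi_2, \tag{11.10} \]

where $\chi_1 = h_0 a + 2\vec H\vec\xi_1 a - \vec\xi_1\vec H a$ and $\chi_2 = h_0h_{12} + 2\vec H\vec\xi_1 h_{12} - \vec\xi_1\vec H h_{12}$.

It follows from Theorem 11.3, (11.8), and (11.10) that

\[ R^{11}_0 = -\chi_0 - \chi_2. \tag{11.11} \]

Since $\dot f_1(0) = -R^{11}_0 e_1(0) - f_2(0)$, it follows from (11.9), (11.10), and (11.11) that

\[ f_2(0) = \frac{1}{\sqrt{2H}}\left[2H\vec h_0 - h_0\vec H - \chi_1\vec\alpha_0 + (\vec\xi_1 a)\vec\xi_1 - a\vec\xi_2\right]. \]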

A long computation using the bracket relations (11.2) gives

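\[ \chi_2 = -(h_0)^2 + 2H\left[(a^1_{12})^2 + (a^2_{12})^2 - v_1a^2_{12} + v_2a^1_{12}\right] + \vec\xi_1 a \]

and

\[ \chi_0 - \tfrac{1}{2}\vec\xi_1 a = h_2h_{01} - h_1h_{02} + \tfrac{1}{2}\vec\xi_1 a = H(a^2_{01} - a^1_{02}). \]

It follows as claimed that

\[ R^{11}_0 = h_0^2 + 2H\kappa - \tfrac{3}{2}\vec\xi_1 a. \]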

To prove the formula for $R^{22}$, we differentiate the equation $\dot f_1(t) = -R^{11}_t e_1(t) - f_2(t)$ and combine it with the equation $\dot f_2(t) = -R^{22}_t e_2(t)$. We have

\[ R^{22}_0 e_2(0) = \ddot f_1(0) + (\vec H R^{11}_0)\, e_1(0) + R^{11}_0 f_1(0). \]

Therefore, by applying $dh_0$ to both sides and using $dh_0(e_1(0)) = 0$, we get

\[ R^{22}_0 = -\sqrt{2H}\left[dh_0(\ddot f_1(0)) + R^{11}_0\, dh_0(f_1(0))\right]. \]

By using Cartan's formula and (11.9), it follows that

\[ \sqrt{2H}\,dh_0(f_1(0)) = dh_0([\vec H,\vec\xi_1]) = -\vec\xi_1 a, \]

\[ \sqrt{2H}\,dh_0(\dot f_1(0)) = dh_0([\vec H,[\vec H,\vec\xi_1]]) = \vec\xi_1\vec H a - 2\vec H\vec\xi_1 a, \]

and therefore,

\[ \sqrt{2H}\,dh_0(\ddot f_1(0)) = 3\vec H\vec\xi_1\vec H a - 3\vec H^2\vec\xi_1 a - \vec\xi_1\vec H^2 a. \]

The formula for $R^{22}_0$ follows from this.

¤

Chapter 12

Measure Contraction Properties

The measure contraction property was introduced in [59] as one of the generalizations of curvature-dimension bounds to general metric measure spaces. In the setting of a subriemannian

manifold with a 2-generating distribution, a simpler definition can be given. To do this,

we first recall the recent results on the theory of optimal transportation in [5] and [24].

Let µ and ν be two Borel probability measures on the subriemannian manifold M with

a distribution ∆ and a subriemannian metric g. If we let H be the Hamiltonian corre-

sponding to the metric g and dCC be the corresponding subriemannian distance, then

the optimal transportation problem is the following minimization problem:

Find a Borel map $\varphi : M \to M$ which achieves the following infimum

\[ \inf \int_M d^2_{CC}(x, \varphi(x))\,d\mu(x) \tag{12.1} \]

where the infimum is taken over all Borel maps $\varphi$ which push $\mu$ forward to $\nu$ (i.e. $\mu(\varphi^{-1}(U)) = \nu(U)$ for all Borel sets $U$).

The minimizers to the above problem are called optimal maps. The following theorem

is one of the main results in [5, 24] which generalizes the earlier work in [16, 42]. This is

also Theorem 3.20 in this thesis.

Theorem 12.1 Assume that the distribution ∆ is 2-generating and the measure µ is absolutely continuous with respect to the Lebesgue measure. Then the optimal transportation


problem has a solution $\varphi$, and any optimal map is equal to this one µ-almost everywhere. Moreover, $\varphi$ is given by $\varphi(x) = \pi(e^{1\cdot\vec H}(-df_x))$ for some Lipschitz function $f$.

Many important results in the theory of optimal transportation rely on the study of

the following 1-parameter family of maps called displacement interpolation introduced

by R. McCann (see [64] for the history and importance of displacement interpolation):

ϕt(x) := π(et ~H(−dfx)).

If ϕ1 is the optimal map between the measures µ and ν, then ϕt is optimal between

µ and ϕt∗µ. All the above results also hold when the distance squared cost d2 is replaced

by costs defined by Lagrange’s problem (see [14, 5]). In those cases the subriemannian

Hamiltonian H in Theorem 12.1 is replaced by more general Hamiltonians.

If we set, in the displacement interpolation, f(x) = d2CC(x, x0) for some given point

x0 on the manifold M , then ϕ1 is the optimal map which pushes any measure µ to the

delta mass concentrated at a point x0. In this case the curves defined by t 7→ ϕt(x) :=

π(et ~H(−dfx)) are unique normal geodesics joining the points x and x0 for Lebesgue almost

all x.

Let η be a Borel measure on the manifold M and let ϕt be the map ϕt(x) =

$\pi(e^{t\vec H}(-df_x))$, where $f(x) = d^2_{CC}(x, x_0)$. The metric measure space $(M, d_{CC}, \eta)$ satisfies the measure contraction property $MCP(k, N)$ if

\[ (1-t)\left[\frac{\sin\left(\sqrt{\frac{k}{N-1}}\,d_{CC}(x_0, x)\right)}{\sin\left(\sqrt{\frac{k}{N-1}}\,d_{CC}(x_0, x)/(1-t)\right)}\right]^{N-1}\varphi_{t*}\eta(U) \le \eta(U) \]

for each η-measurable set $U$ and each point $x_0$ in the manifold $M$.

When $k = 0$, the measure contraction property becomes

\[ (1-t)^N\varphi_{t*}\eta(U) \le \eta(U). \]

Next we specialize to the case where M is a three dimensional manifold with a contact

distribution ∆ and a subriemannian metric g. Let dCC be the corresponding subrieman-

nian distance function and let R11, R22 be the invariants defined as in Theorem 11.1.


Recall that the kernel of the map dπ : TT ∗M → TM defines the vertical bundle V on

the manifold T ∗M . Let m be the three form on the manifold T ∗M such that it is zero on

the vertical spaces Vα and mα(f1(0), f2(0), f3(0)) = 1 where f1(0), f2(0), f3(0) is defined

as in Theorem 11.1. The following lemma shows that the Hamiltonian H is unimodular

(see Appendix 2 for the definition of unimodular).

Lemma 12.2 Let η be a smooth volume form on the manifold M such that $\eta(e, v_1, v_2) = 1$. Then $\pi^*\eta = \sqrt{2H}\,m$.

Proof Clearly, $\pi^*\eta$ is zero on the space V. Therefore, it is enough to show that $\pi^*\eta(f_1(0), f_2(0), f_3(0)) = \sqrt{2H}$, and this follows from Theorem 11.3 and the definition of η. ¤

We also use the same notation for the measure corresponding to the volume form η.

Finally we come to the main result:

Theorem 12.3 If $R^{11}_0(\alpha) \ge 2rH(\alpha)$ and $R^{22}_0 \ge 0$, then the metric measure space $(M, d_{CC}, \eta)$ satisfies

\[ (1-t)^5\varphi_{t*}\eta(U) \le \left[\frac{(1-t)(2 - 2\cos T_0 - T_0\sin T_0)}{2 - 2\cos T_t - T_t\sin T_t}\right]\varphi_{t*}\eta(U) \le \eta(U) \tag{12.2} \]

if $r > 0$ and

\[ \left[\frac{(1-t)(2 - 2\cosh T_0 + T_0\sinh T_0)}{2 - 2\cosh T_t + T_t\sinh T_t}\right]\varphi_{t*}\eta(U) \le \eta(U) \tag{12.3} \]

if $r < 0$, for each η-measurable set $U$ and each $x_0$ in the manifold $M$, where $T_t(x) = \frac{\sqrt{|r|}\,d_{CC}(x_0, x)}{1-t}$.

In particular, if $r \ge 0$, then $(M, d_{CC}, \eta)$ satisfies the measure contraction property $MCP(0, 5)$.

Definition 12.4 We say that a metric measure space satisfies the generalized measure

contraction property MCP (r; 2, 3) if either (12.2) or (12.3) holds.


Therefore, Theorem 12.3 says that if a three dimensional contact subriemannian manifold satisfies $R^{11}_0(\alpha) \ge 2rH(\alpha)$ and $R^{22}_0 \ge 0$, then it satisfies the generalized measure contraction property $MCP(r; 2, 3)$. Note also that the condition $MCP(0; 2, 3)$ is the same as $MCP(0, 5)$.

As a corollary of Theorem 12.3, we have the following doubling property (see [59]).

Corollary 12.5 (Doubling) Let $B_x(r)$ be the ball of radius $r$ centred at $x$ in the space $(M, d_{CC}, \eta)$. If $R^{11}_0 \ge 0$ and $R^{22}_0 \ge 0$, then it satisfies the following doubling property:

\[ \eta(B_x(2r)) \le 2^5\,\eta(B_x(r)). \]

Recall that a Borel function $h : M \to \mathbb R$ is an upper gradient of a function $f : M \to \mathbb R$ if

\[ |f(x(0)) - f(x(1))| \le l(x(\cdot))\int_0^1 h(x(s))\,ds \]

for each curve $x(\cdot)$ of finite length $l(x(\cdot))$.

The following local Poincaré inequality also holds as a corollary of Theorem 12.3 (see the proof of [40, Theorem 3.1] and [40, Theorem 2.5]).

Corollary 12.6 (Local Poincaré Inequality) If the manifold M is compact and the invariants $R^{11}_0$ and $R^{22}_0$ are non-negative, then $(M, d_{CC}, \eta)$ satisfies the following local Poincaré inequality

\[ \frac{1}{\eta(B_x(r))}\int_{B_x(r)} |f(x) - \langle f\rangle_{B_x(r)}|\,d\eta(x) \le \frac{Cr}{\eta(B_x(2r))}\int_{B_x(2r)} h(x)\,d\eta(x), \]

for some constant $C$, where $\langle f\rangle_{B_x(r)} = \frac{1}{\eta(B_x(r))}\int_{B_x(r)} f(x)\,d\eta(x)$.

The rest of this chapter is devoted to the proof of Theorem 12.3.

Proof of Theorem 12.3 From the main result in [17], the function f(x) = d(x, x0) is

locally semiconcave on M − x0. So, by [24, Theorem 3.5] and [24, Section 3.4], the

measures ϕt∗η are absolutely continuous with respect to the Lebesgue class. Therefore,


$\varphi_{t*}\eta = \rho_t\eta$ for some function $\rho_t$. Hence, it is enough to show that the following holds η-almost everywhere:

\[ \left[\frac{(1-t)(2 - 2\cos T_0 - T_0\sin T_0)}{2 - 2\cos T_t - T_t\sin T_t}\right]\rho_t \le 1 \tag{12.4} \]

if $r > 0$,

\[ \left[\frac{(1-t)(2 - 2\cosh T_0 + T_0\sinh T_0)}{2 - 2\cosh T_t + T_t\sinh T_t}\right]\rho_t \le 1 \tag{12.5} \]

if $r < 0$, and

\[ (1-t)^5\rho_t \le 1 \tag{12.6} \]

if $r = 0$.

The function f(x) = d(x, x0) is locally semiconcave on M − x0, so it is twice

differentiable almost everywhere by Alexandrov’s theorem (see for instance [64]). If we

denote the differential of the map $x \mapsto -df_x$ by $\mathcal F$, then $d\varphi_t = d\pi\,de^{t\vec H}\mathcal F$. Let $e_i(t)$ and $f_i(t)$ be the Darboux frame at α defined as in Theorem 11.1 and let $\varsigma_i = d\pi(f_i(0))$. Then the vectors $\mathcal F(\varsigma_1), \mathcal F(\varsigma_2), \mathcal F(\varsigma_3)$ span a linear subspace $W$ of $T_\alpha T^*M$, and each $\mathcal F(\varsigma_i)$ can be written as

\[ \mathcal F(\varsigma_i) = \sum_{j=1}^{3}\left(a_{ij}(t)e_j(t) + b_{ij}(t)f_j(t)\right) \quad\text{or}\quad \Psi = A_tE_t + B_tF_t, \]

where $A_t$ is the matrix with entries $a_{ij}(t)$, $B_t$ is the matrix with entries $b_{ij}(t)$, and $\Psi$, $E_t$, and $F_t$ are matrices with rows $\mathcal F(\varsigma_i)$, $e_i(t)$, and $f_i(t)$, respectively.

It follows from absolute continuity of the measure ϕt∗µ and the result in [5, 24] that

the map ϕt(x) is injective for η almost all x. We fix a point z for which the map ϕt

is injective and the path s 7→ ϕs(z) is minimizing. Such a point exists Lebesgue and

hence η almost everywhere. It follows from [1, Theorem 1.2] that there is no conjugate

point along the curve $s \mapsto \varphi_s(z)$. Therefore, the map $\varphi_s$ is nonsingular for each $s$ in $[0, t]$ and so $\rho_s(\varphi_s(z)) \ne 0$ for each $s$ in $[0, t]$. Let $S_t$ be the matrix defined as in Theorem 15.2. Recall that $S_t$ is defined as follows: the linear space $W$ is transversal to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\}$. Therefore, the linear subspace $W$ defined above is the graph of a linear map from the space $\mathrm{span}\{f_1(t), ..., f_n(t)\}$ to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\}$. Let $S_t$ be the corresponding matrix (i.e. the linear map is given by $f_i(t) \mapsto \sum_{j=1}^{3} S^{ij}_t e_j(t)$, where $S^{ij}_t$ are the entries of the matrix $S_t$). Finally, recall that $S_t$ satisfies $S_t = B_t^{-1}A_t$.

Using the structural equation (11.1) and Theorem 15.2, we get the following.

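Lemma 12.7 The matrix $S_t$ satisfies the following matrix Riccati equation:

\[ \dot S_t - R_t + S_tC_1 + C_1^TS_t - S_tC_2S_t = 0, \]

where

\[ R_t = \begin{pmatrix} R^{11}_t & 0 & 0\\ 0 & R^{22}_t & 0\\ 0 & 0 & 0 \end{pmatrix}, \qquad C_1 = \begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad C_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}. \]

If $t$ is sufficiently close to 1, then $S_t^{-1}$ exists and it is the solution to the following initial value problem:

\[ \frac{d}{dt}(S_t^{-1}) + S_t^{-1}R_tS_t^{-1} - C_1S_t^{-1} - S_t^{-1}C_1^T + C_2 = 0 \quad\text{and}\quad S_1^{-1} = 0. \]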

Proof The matrix Riccati equation follows from (11.1) and Theorem 15.2. To show that $S_1^{-1} = 0$, it is enough to show that $B_1 = 0$. If we let $\gamma$ be a path such that $\dot\gamma(0) = \varsigma_i$, then $\varphi_1(\gamma(s)) = x_0$. By differentiating this equation with respect to $s$, we get

\[ d\pi\,de^{1\cdot\vec H}\mathcal F(\varsigma_i) = d\varphi_1(\varsigma_i) = 0. \]

It follows that $\mathcal F(\varsigma_i)$ is contained in $\mathrm{span}\{e_1(1), e_2(1), e_3(1)\}$ and so $B_1 = 0$. ¤

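Let us consider the following simpler matrix Riccati equation for a comparison matrix $\bar S_t$:

\[ \frac{d}{dt}(\bar S_t^{-1}) + \bar S_t^{-1}R\,\bar S_t^{-1} - C_1\bar S_t^{-1} - \bar S_t^{-1}C_1^T + C_2 = 0 \tag{12.7} \]

together with the condition $\bar S_1^{-1} = 0$, where

\[ R = \begin{pmatrix} 2rH & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}. \]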


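Lemma 12.8 Let $\tau_t = (1-t)\sqrt{2|r|H}$. If $r > 0$, then the solution to (12.7) is given by

\[ \bar S_t = \begin{pmatrix} \dfrac{\tau_0(\sin\tau_t - \tau_t\cos\tau_t)}{D} & \dfrac{\tau_0^2(1-\cos\tau_t)}{D} & 0\\[2mm] \dfrac{\tau_0^2(1-\cos\tau_t)}{D} & \dfrac{\tau_0^3\sin\tau_t}{D} & 0\\[2mm] 0 & 0 & \dfrac{1}{1-t} \end{pmatrix} \]

where $D = 2 - 2\cos\tau_t - \tau_t\sin\tau_t$.

If $r < 0$, then it is

\[ \bar S_t = \begin{pmatrix} \dfrac{\tau_0(\tau_t\cosh\tau_t - \sinh\tau_t)}{D_h} & \dfrac{\tau_0^2(\cosh\tau_t - 1)}{D_h} & 0\\[2mm] \dfrac{\tau_0^2(\cosh\tau_t - 1)}{D_h} & \dfrac{\tau_0^3\sinh\tau_t}{D_h} & 0\\[2mm] 0 & 0 & \dfrac{1}{1-t} \end{pmatrix} \]

where $D_h = 2 - 2\cosh\tau_t + \tau_t\sinh\tau_t$.

Finally, if $r = 0$, then the solution becomes

\[ \bar S_t = \frac{1}{(1-t)^3}\begin{pmatrix} 4(1-t)^2 & 6(1-t) & 0\\ 6(1-t) & 12 & 0\\ 0 & 0 & (1-t)^2 \end{pmatrix}. \]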

Proof of Lemma 12.8 In the case $r = 0$, there is no quadratic term in the matrix Riccati equation. Therefore, the proof in this case is straightforward and will be omitted.
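Since the case $r = 0$ is left to the reader, the following short Python sketch may help as a sanity check. It takes the matrix stated in Lemma 12.8 for $r = 0$, uses a hand-computed candidate for its inverse (the functions `candidate_inverse` and `candidate_inverse_derivative` are assumptions of this sketch and do not appear in the text), and checks that this candidate solves the linear equation obtained from (12.7) when the quadratic term vanishes.

```python
# Sanity check for the r = 0 case of Lemma 12.8 (illustrative sketch only).
C1 = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
C2 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def comparison_matrix(t):
    """The matrix stated in Lemma 12.8 for r = 0."""
    u = 1.0 - t
    k = 1.0 / u**3
    return [[4 * u * u * k, 6 * u * k, 0.0],
            [6 * u * k, 12 * k, 0.0],
            [0.0, 0.0, u * u * k]]

def candidate_inverse(t):
    """Hand-computed inverse of comparison_matrix(t) (assumption of this sketch)."""
    u = 1.0 - t
    return [[u, -u * u / 2, 0.0],
            [-u * u / 2, u**3 / 3, 0.0],
            [0.0, 0.0, u]]

def candidate_inverse_derivative(t):
    """Entrywise t-derivative of candidate_inverse(t), computed by hand."""
    u = 1.0 - t
    return [[-1.0, u, 0.0],
            [u, -u * u, 0.0],
            [0.0, 0.0, -1.0]]

def linear_residual(t):
    """Largest entry of d/dt(U) - C1 U - U C1^T + C2, i.e. (12.7) with R = 0."""
    U = candidate_inverse(t)
    dU = candidate_inverse_derivative(t)
    C1U = matmul(C1, U)
    UC1T = matmul(U, transpose(C1))
    return max(abs(dU[i][j] - C1U[i][j] - UC1T[i][j] + C2[i][j])
               for i in range(3) for j in range(3))

def inverse_defect(t):
    """Largest entry of comparison_matrix(t) * candidate_inverse(t) - I."""
    P = matmul(comparison_matrix(t), candidate_inverse(t))
    return max(abs(P[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))
```

Both defects should vanish up to rounding for any $t$ in $[0, 1)$, and the candidate inverse is zero at $t = 1$, in agreement with the initial condition.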

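For other values of $r$, consider the matrix

\[ A = \begin{pmatrix} C_1 & -C_2\\ R & -C_1^T \end{pmatrix} \]

and the corresponding matrix differential equation $\frac{d}{dt}q = Aq$ together with the condition $q(1) = I$.

The fundamental solution is given by

\[ q(t) = e^{(t-1)A} = \begin{pmatrix} \cos\tau_t & 0 & 0 & \frac{\sin\tau_t}{\tau_0} & \frac{1-\cos\tau_t}{\tau_0^2} & 0\\ -\frac{\sin\tau_t}{\tau_0} & 1 & 0 & \frac{\cos\tau_t - 1}{\tau_0^2} & \frac{\sin\tau_t - \tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1 & 0 & 0 & 1-t\\ -\tau_0\sin\tau_t & 0 & 0 & \cos\tau_t & \frac{\sin\tau_t}{\tau_0} & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \]

if $r > 0$ and it is

\[ q(t) = e^{(t-1)A} = \begin{pmatrix} \cosh\tau_t & 0 & 0 & \frac{\sinh\tau_t}{\tau_0} & \frac{\cosh\tau_t - 1}{\tau_0^2} & 0\\ -\frac{\sinh\tau_t}{\tau_0} & 1 & 0 & \frac{1-\cosh\tau_t}{\tau_0^2} & \frac{\tau_t - \sinh\tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1 & 0 & 0 & 1-t\\ \tau_0\sinh\tau_t & 0 & 0 & \cosh\tau_t & \frac{\sinh\tau_t}{\tau_0} & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \]

if $r < 0$.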

It follows from [36, Theorem 1] that

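\[ \bar S_t^{-1} = \begin{pmatrix} \frac{\sin\tau_t}{\tau_0} & \frac{1-\cos\tau_t}{\tau_0^2} & 0\\ \frac{\cos\tau_t - 1}{\tau_0^2} & \frac{\sin\tau_t - \tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1-t \end{pmatrix}\begin{pmatrix} \cos\tau_t & \frac{\sin\tau_t}{\tau_0} & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{\tan\tau_t}{\tau_0} & \frac{\cos\tau_t - 1}{\tau_0^2\cos\tau_t} & 0\\ \frac{\cos\tau_t - 1}{\tau_0^2\cos\tau_t} & \frac{\tan\tau_t - \tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1-t \end{pmatrix} \]

if $r > 0$ and

\[ \bar S_t^{-1} = \begin{pmatrix} \frac{\sinh\tau_t}{\tau_0} & \frac{\cosh\tau_t - 1}{\tau_0^2} & 0\\ \frac{1-\cosh\tau_t}{\tau_0^2} & \frac{\tau_t - \sinh\tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1-t \end{pmatrix}\begin{pmatrix} \cosh\tau_t & \frac{\sinh\tau_t}{\tau_0} & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{\tanh\tau_t}{\tau_0} & \frac{1-\cosh\tau_t}{\tau_0^2\cosh\tau_t} & 0\\ \frac{1-\cosh\tau_t}{\tau_0^2\cosh\tau_t} & \frac{\tau_t - \tanh\tau_t}{\tau_0^3} & 0\\ 0 & 0 & 1-t \end{pmatrix} \]

if $r < 0$.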

Therefore, inverting the above matrix gives the result. ¤
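The closed forms above can be checked numerically. The Python sketch below is an illustration only: it evaluates the inverse matrix for $r > 0$, approximates its time derivative by central differences, and measures the residual of (12.7) with $2rH = \tau_0^2$; it also checks that the matrix of Lemma 12.8 is indeed its inverse. The function names and the sample values of $t$ and $\tau_0$ are choices made for this sketch.

```python
import math

C1 = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
C2 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def inverse_matrix(t, tau0):
    """Closed form of the inverse comparison matrix for r > 0."""
    tau = (1.0 - t) * tau0
    a = math.tan(tau) / tau0
    b = (math.cos(tau) - 1.0) / (tau0**2 * math.cos(tau))
    c = (math.tan(tau) - tau) / tau0**3
    return [[a, b, 0.0], [b, c, 0.0], [0.0, 0.0, 1.0 - t]]

def lemma_matrix(t, tau0):
    """Matrix stated in Lemma 12.8 for r > 0."""
    tau = (1.0 - t) * tau0
    D = 2.0 - 2.0 * math.cos(tau) - tau * math.sin(tau)
    s11 = tau0 * (math.sin(tau) - tau * math.cos(tau)) / D
    s12 = tau0**2 * (1.0 - math.cos(tau)) / D
    s22 = tau0**3 * math.sin(tau) / D
    return [[s11, s12, 0.0], [s12, s22, 0.0], [0.0, 0.0, 1.0 / (1.0 - t)]]

def riccati_residual(t, tau0, h=1e-5):
    """Largest entry of d/dt(U) + U R U - C1 U - U C1^T + C2 for U = inverse_matrix."""
    U = inverse_matrix(t, tau0)
    Up = inverse_matrix(t + h, tau0)
    Um = inverse_matrix(t - h, tau0)
    R = [[tau0**2, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    URU = matmul(matmul(U, R), U)
    C1U = matmul(C1, U)
    UC1T = matmul(U, transpose(C1))
    worst = 0.0
    for i in range(3):
        for j in range(3):
            dU = (Up[i][j] - Um[i][j]) / (2.0 * h)  # central difference in t
            res = dU + URU[i][j] - C1U[i][j] - UC1T[i][j] + C2[i][j]
            worst = max(worst, abs(res))
    return worst

def inverse_defect(t, tau0):
    """Largest entry of lemma_matrix * inverse_matrix - I."""
    P = matmul(lemma_matrix(t, tau0), inverse_matrix(t, tau0))
    return max(abs(P[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))
```

For instance, with $t = 0.3$ and $\tau_0 = 1.2$ (so that $\tau_t < \pi/2$ and the tangent is finite), both quantities should be zero up to discretization and rounding errors.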

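Let $\bar S_t$ denote the solution of the simpler equation (12.7). Since $R^{11}_t \ge 2rH$ and $R^{22}_t \ge 0$, by the comparison theorem for the matrix Riccati equation (see [25, Theorem 2.1]), we have $S_t^{-1} \ge \bar S_t^{-1} \ge 0$ for $t$ close enough to 1. Here $A \ge B$ means that $A - B$ is nonnegative definite. By monotonicity (see [15, Proposition V.1.6]), $0 \le S_t \le \bar S_t$ for $t$ close enough to 1. If we apply the same comparison principle to $S_t$ and $\bar S_t$, then $0 \le S_t \le \bar S_t$ for all $t$ in $[0, 1]$. Therefore,

\[ \mathrm{tr}\left(\bar S_t\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}\right) \ge \mathrm{tr}\left(S_t\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}\right). \tag{12.8} \]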

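If $r = 0$, then

\[ \bar S^{11}_t + \bar S^{33}_t = \frac{5}{1-t}. \]

If $r > 0$, then

\[ \bar S^{11}_t + \bar S^{33}_t = \frac{\tau_0(\sin\tau_t - \tau_t\cos\tau_t)}{2 - 2\cos\tau_t - \tau_t\sin\tau_t} + \frac{1}{1-t}. \]

If $r < 0$, then

\[ \bar S^{11}_t + \bar S^{33}_t = \frac{\tau_0(\tau_t\cosh\tau_t - \sinh\tau_t)}{2 - 2\cosh\tau_t + \tau_t\sinh\tau_t} + \frac{1}{1-t}. \]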

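If we integrate the above equations, we get

\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log(1-t)^5 \tag{12.9} \]

if $r = 0$,

\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log\left[\frac{(1-t)(2 - 2\cos\tau_t - \tau_t\sin\tau_t)}{2 - 2\cos\tau_0 - \tau_0\sin\tau_0}\right] \tag{12.10} \]

if $r > 0$, and

\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log\left[\frac{(1-t)(2 - 2\cosh\tau_t + \tau_t\sinh\tau_t)}{2 - 2\cosh\tau_0 + \tau_0\sinh\tau_0}\right] \tag{12.11} \]

if $r < 0$.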
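The integration step can be cross-checked numerically. The following Python sketch, which is only an illustration, integrates the trace expression for $r > 0$ with a composite Simpson rule and compares the result with the closed form (12.10). It relies on the observation that $D(\tau) = 2 - 2\cos\tau - \tau\sin\tau$ satisfies $\frac{d}{ds}D(\tau_s) = -\tau_0(\sin\tau_s - \tau_s\cos\tau_s)$. The helper names and the sample parameters are assumptions of this sketch.

```python
import math

def D(tau):
    """Denominator D = 2 - 2 cos(tau) - tau sin(tau) from Lemma 12.8."""
    return 2.0 - 2.0 * math.cos(tau) - tau * math.sin(tau)

def trace_integrand(s, tau0):
    """The (1,1) plus (3,3) entries of the comparison matrix for r > 0."""
    tau = (1.0 - s) * tau0
    first = tau0 * (math.sin(tau) - tau * math.cos(tau)) / D(tau)
    return first + 1.0 / (1.0 - s)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def closed_form(t, tau0):
    """Right-hand side of (12.10)."""
    return -math.log((1.0 - t) * D((1.0 - t) * tau0) / D(tau0))

def numerical_integral(t, tau0):
    return simpson(lambda s: trace_integrand(s, tau0), 0.0, t)
```

As $\tau_0 \to 0$ one has $D(\tau) \approx \tau^4/12$, so the closed form approaches $-\log(1-t)^5$, which is consistent with (12.9).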

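It follows from Theorem 15.2 that

\[ \rho_t(\varphi_t(z)) = e^{\int_0^t (S^{11}_s + S^{33}_s)\,ds}. \]

Combining this with (12.8), (12.9), (12.10), and (12.11) gives

\[ \left[\frac{(1-t)(2 - 2\cos\tau_t - \tau_t\sin\tau_t)}{2 - 2\cos\tau_0 - \tau_0\sin\tau_0}\right]\rho_t(\varphi_t(x)) \le 1 \]

if $r > 0$,

\[ \left[\frac{(1-t)(2 - 2\cosh\tau_t + \tau_t\sinh\tau_t)}{2 - 2\cosh\tau_0 + \tau_0\sinh\tau_0}\right]\rho_t(\varphi_t(x)) \le 1 \]

if $r < 0$, and

\[ (1-t)^5\rho_t(\varphi_t(x)) \le 1 \]

if $r = 0$.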

To complete the proof of the theorem, note that $\tau_t = (1-t)\sqrt{2|r|H} = (1-t)\sqrt{|r|}\,d_{CC}(x_0, z) = \sqrt{|r|}\,d_{CC}(x_0, \varphi_t(z))$. This shows that (12.4), (12.5) and (12.6) hold $\varphi_{t*}\mu$-almost everywhere. But $\rho_t$ vanishes µ-almost everywhere on any set of $\varphi_{t*}\mu$-measure zero. Therefore, (12.4), (12.5) and (12.6) hold µ-almost everywhere. ¤

Chapter 13

Isoperimetric Problems

In this chapter we specialize to the case which models the isoperimetric problem, or a particle in a constant magnetic field, on a Riemannian surface. More precisely, assume

that the vector field e, which is transversal to the distribution ∆, defines a free and proper

Lie group G-action (i.e. G = S1 or G = R). Then the quotient N := M/G is again a

manifold. Assume also that the subriemannian metric g is a metric of bundle type (i.e g

is invariant under the above action). Under these assumptions the subriemannian metric

g descends to a Riemannian metric on the surface N . In this case Theorem 11.3 and 12.3

simplify to

Theorem 13.1

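\[ e_2(0) = \frac{1}{\sqrt{2H}}\vec\sigma, \qquad e_1(0) = \frac{1}{\sqrt{2H}}\vec\xi_1, \]

\[ f_1(0) = \frac{1}{\sqrt{2H}}\left[h_1\vec h_2 - h_2\vec h_1 + 2Ha^2_{01}\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2\right], \]

\[ f_2(0) = \frac{1}{\sqrt{2H}}\left[2H\vec h_0 - h_0\vec H\right], \]

\[ R^{11}_0 = h_0^2 + 2H\kappa, \qquad R^{22}_0 = 0, \]

where κ is the Gauss curvature of the surface N.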

As a consequence the metric measure space (M,dCC , η) satisfies the generalized mea-

sure contraction property MCP (κ; 2, 3). In particular, it satisfies the measure contraction


property MCP (0, 5) if κ ≥ 0.

Proof

Since g is a metric of bundle type, the following holds.

Lemma 13.2 Under the above assumptions, the functions $a^k_{ij}$ in the bracket relation (11.2) satisfy

\[ a^0_{01} = a^0_{02} = a^1_{01} = a^2_{02} = 0 \quad\text{and}\quad a^2_{01} = -a^1_{02}. \]

Proof of Lemma 13.2 If the flow of the vector field e = v0 is denoted by ete, then the

invariance of the subriemannian metric under the group action implies that

g((ete)∗vi, (ete)∗vj) = δij, σ((ete)∗vj) = 0, i, j = 1, 2.

By differentiating the above equations with respect to time t, it follows that

αj([e, vi]) + αi([e, vj]) = 0, σ([e, vj]) = 0, i, j = 1, 2.

If we apply the bracket relations (11.2) of the frame $v_1, v_2, v_3$, we have

\[ a^j_{0i} + a^i_{0j} = \alpha_j([e, v_i]) + \alpha_i([e, v_j]) = 0, \qquad a^0_{0j} = \sigma([e, v_j]) = 0, \qquad i, j = 1, 2. \]

¤

It follows that

Lemma 13.3 The function $h_0$ is a constant of motion of the flow $e^{t\vec H}$, i.e. $a = dh_0(\vec H) = 0$.

Proof of Lemma 13.3 This follows from a general result in Hamiltonian reduction. In this special case it can also be seen as follows. By Lemma 11.5,

\[ dh_0(\vec H) = dh_0(h_1\vec h_1 + h_2\vec h_2) = h_1h_{10} + h_2h_{20}. \tag{13.1} \]

By Lemma 13.2 we also have

\[ h_{10} = -a^0_{01}h_0 - a^1_{01}h_1 - a^2_{01}h_2 = -a^2_{01}h_2. \]

Similarly $h_{20} = -a^1_{02}h_1$. The result follows from this, (13.1), and Lemma 13.2. ¤

It follows from Lemma 13.2 and Lemma 13.3 that $\chi_0 = 2Ha^2_{01}$. It remains to show that κ is the Gauss curvature of the surface N. By Lemma 13.2 κ simplifies to

\[ \kappa = v_1a^2_{12} - v_2a^1_{12} - (a^1_{12})^2 - (a^2_{12})^2. \tag{13.2} \]

Let $w_1$ and $w_2$ be a local orthonormal frame on the surface N. Let $\tilde w_1$ and $\tilde w_2$ be the horizontal lifts of $w_1$ and $w_2$, respectively. Since $\tilde w_1$ and $\tilde w_2$ are orthonormal with respect to the subriemannian metric, we can set $v_i = \tilde w_i$. It follows from (11.2) that $[w_1, w_2] = a^1_{12}w_1 + a^2_{12}w_2$. Let us denote the covariant derivative on the Riemannian manifold N by ∇. It follows from the Koszul formula ([50, Theorem 3.11]) that

\[ \nabla_{v_1}v_1 = -a^1_{12}v_2, \qquad \nabla_{v_2}v_2 = -a^2_{12}v_1, \qquad \nabla_{v_1}v_2 = a^1_{12}v_1, \qquad \nabla_{v_2}v_1 = -a^2_{12}v_2. \tag{13.3} \]

Since the covariant derivative ∇ is tensorial in the bottom slot and is a derivation in

the other slot, it follows from (13.3) that

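\[ \nabla_{[v_1,v_2]}v_1 = \nabla_{a^1_{12}v_1 + a^2_{12}v_2}v_1 = a^1_{12}\nabla_{v_1}v_1 + a^2_{12}\nabla_{v_2}v_1 = -\left[(a^1_{12})^2 + (a^2_{12})^2\right]v_2 \]

and

\[ [\nabla_{v_1}, \nabla_{v_2}]v_1 = \nabla_{v_1}\nabla_{v_2}v_1 - \nabla_{v_2}\nabla_{v_1}v_1 = -\nabla_{v_1}(a^2_{12}v_2) + \nabla_{v_2}(a^1_{12}v_2) = -(v_1a^2_{12})v_2 + (v_2a^1_{12})v_2 - 2a^1_{12}a^2_{12}v_1. \]

Therefore, it follows from the above calculation and (13.2) that the Gauss curvature is given by

\[ \kappa = \langle\nabla_{[v_1,v_2]}v_1 - [\nabla_{v_1}, \nabla_{v_2}]v_1, v_2\rangle = -(a^1_{12})^2 - (a^2_{12})^2 + v_1a^2_{12} - v_2a^1_{12}, \]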


as claimed. ¤

Recall that the Heisenberg group is the Euclidean space $\mathbb R^3$ equipped with the distribution $\Delta = \mathrm{span}\{v_1 = \partial_x - \tfrac{1}{2}y\partial_z,\ v_2 = \partial_y + \tfrac{1}{2}x\partial_z\}$. Let $g$ be the subriemannian metric for which $v_1$ and $v_2$ are orthonormal and let $H$ be the subriemannian Hamiltonian. The vector field $e$ which defines the action is given by $e = [v_1, v_2] = \partial_z$. This defines an $\mathbb R$-action and the quotient of the manifold M by this action is $N = M/G = \mathbb R^2$. The measure η is the Lebesgue measure. Therefore, by applying Theorem 12.3 and Theorem 13.1, we recover the following theorem of [32].

Theorem 13.4 The Heisenberg group with subriemannian metric defined above together

with the Lebesgue measure satisfies the measure contraction property MCP (0, 5).
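The bracket relation $[v_1, v_2] = \partial_z$ used above can be confirmed with a few lines of Python. This is a minimal sketch, assuming the coordinate convention $[X, Y] = (DY)X - (DX)Y$ for the Lie bracket and approximating the directional derivatives by central differences (which are exact up to rounding here because the coefficients are linear). The helper names are introduced for this illustration only.

```python
def v1(p):
    """v1 = d/dx - (y/2) d/dz in coordinates (x, y, z)."""
    x, y, z = p
    return (1.0, 0.0, -0.5 * y)

def v2(p):
    """v2 = d/dy + (x/2) d/dz in coordinates (x, y, z)."""
    x, y, z = p
    return (0.0, 1.0, 0.5 * x)

def directional_derivative(field, p, direction, h=1e-6):
    """Central-difference derivative of the components of `field` along `direction`."""
    plus = tuple(pi + h * di for pi, di in zip(p, direction))
    minus = tuple(pi - h * di for pi, di in zip(p, direction))
    return tuple((a - b) / (2.0 * h) for a, b in zip(field(plus), field(minus)))

def lie_bracket(X, Y, p):
    """Components of [X, Y] at p, using [X, Y] = (DY) X - (DX) Y."""
    dY_along_X = directional_derivative(Y, p, X(p))
    dX_along_Y = directional_derivative(X, p, Y(p))
    return tuple(a - b for a, b in zip(dY_along_X, dX_along_Y))
```

Evaluating `lie_bracket(v1, v2, p)` at any point `p` should return, up to rounding, the components of $\partial_z$.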

Next we look at the Hopf fibration. Let $S^3$ be the unit sphere in the four dimensional Euclidean space $\mathbb R^4$. The vector field $e = -x\partial_w + w\partial_x + z\partial_y - y\partial_z$ defines a circle $S^1$-action on $S^3$. The quotient $N = M/G$ is the 2-sphere $S^2$. The vector fields $-y\partial_w - z\partial_x + w\partial_y + x\partial_z$ and $-z\partial_w + y\partial_x - x\partial_y + w\partial_z$ define a distribution of rank 2 and a subriemannian metric on $S^3$. Finally, the volume form η is given by

\[ \eta = \tfrac{1}{2}\left(-z\,dw\wedge dx\wedge dy + w\,dx\wedge dy\wedge dz + y\,dw\wedge dx\wedge dz - x\,dw\wedge dy\wedge dz\right). \]

Theorem 13.5 The 3-sphere S3 equipped with the above subriemannian metric satisfies

the generalized measure contraction property MCP (2; 2, 3). In particular, it satisfies the

measure contraction property MCP (0, 5).
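As a quick consistency check on the vector fields used for the Hopf fibration, the Python sketch below verifies that $e$ and the two horizontal fields are tangent to $S^3$ and mutually orthonormal with respect to the Euclidean inner product of $\mathbb R^4$. Comparing with the Euclidean inner product is an assumption of this sketch; the text only states that the two horizontal fields define the subriemannian metric. Coordinates are ordered as $(w, x, y, z)$.

```python
import math

def e(p):
    """e = -x d/dw + w d/dx + z d/dy - y d/dz."""
    w, x, y, z = p
    return (-x, w, z, -y)

def v1(p):
    """v1 = -y d/dw - z d/dx + w d/dy + x d/dz."""
    w, x, y, z = p
    return (-y, -z, w, x)

def v2(p):
    """v2 = -z d/dw + y d/dx - x d/dy + w d/dz."""
    w, x, y, z = p
    return (-z, y, -x, w)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(p):
    n = math.sqrt(dot(p, p))
    return tuple(pi / n for pi in p)

def gram_matrix(p):
    """Gram matrix of (position, e, v1, v2) at a point p of the unit sphere."""
    vectors = [p, e(p), v1(p), v2(p)]
    return [[dot(a, b) for b in vectors] for a in vectors]

def max_defect_from_identity(p):
    """Largest deviation of the Gram matrix from the 4x4 identity."""
    G = gram_matrix(p)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))
```

A vanishing defect means that the three fields are orthogonal to the position vector (hence tangent to the sphere) and orthonormal among themselves at the chosen point.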

Part IV

Appendix


Chapter 14

Proof of Pontryagin Maximum

Principle for the Bolza Problem

This appendix is devoted to the proof of Theorem 2.3. The first step is to reduce the problem to a simpler one. Recall that the Bolza problem is the following minimization problem:

Find minimizers for

\[ \inf_{(x(\cdot),u(\cdot))\in C_{x_0}} \int_0^1 L(x(s), u(s))\,ds - f(x(1)) \]

where the infimum is taken over all admissible pairs $(x(\cdot), u(\cdot))$ satisfying the control system

\[ \dot x(s) = F(x(s), u(s)) \]

and initial condition $x(0) = x_0$.

Let $\bar x = (x, z)$ be a point in the product manifold $M \times \mathbb R$ and consider the following extended control system on it:

\[ \dot{\bar x} = \bar F(\bar x, u) := (F(x, u), L(x, u)). \tag{14.1} \]

Note that $\bar x(\cdot) = (x(\cdot), z(\cdot))$ satisfies this extended system and initial condition $\bar x(0) = (x_0, 0)$ if and only if $x(\cdot)$ satisfies the original control system in the Bolza problem with


the initial condition $x(0) = x_0$ and $z(t) = \int_0^t L(x(s), u(s))\,ds$. Therefore, Problem 2.2 is equivalent to the following problem:

Problem 14.1 Find minimizers for

\[ \inf_{(\bar x(\cdot),u(\cdot))\in C_{(x_0,0)}} \left(z(1) - f(x(1))\right), \tag{14.2} \]

where the infimum is taken over all admissible pairs satisfying the extended control system (14.1).

Problem 14.1 is an example of the Mayer problem. Namely, let g : N → R be a

function on the manifold N . Then the Mayer problem is the following minimization:

Problem 14.2 Find minimizers for

\[ \inf_{C_{x_0}} g(x(1)) \]

where the infimum is taken over all admissible pairs $(x(\cdot), u(\cdot))$ satisfying the control system

\[ \dot x = F(x, u) \]

on $N$ and initial condition $x(0) = x_0$.

Note that Problem 14.1 is the Mayer problem on the manifold N = M × R with

function g : M × R→ R given by g(x, z) = z − f(x). Also, if α is in the subdifferential

d−fx of f at x, then (−α, 1) is in the superdifferential d+g(x,z) of g at (x, z).

Next, we will prove a version of the Pontryagin maximum principle for the Mayer

problem and show how Theorem 2.3 follows from this. For each point u in the control

set U , define the corresponding Hamiltonian function Hu : T ∗N → R by

Hu(px) = px(F (x, u)).


Theorem 14.3 (Pontryagin Maximum Principle for the Mayer Problem)

Let (x(·), u(·)) be an admissible pair which achieves the infimum in Problem 14.2.

Assume that the function g in Problem 14.2 is super-differentiable at the point x(1)

and let α be in the super-differential d+gx(1) of g. Then there exists a Lipschitz path

p(·) : [0, 1] → T ∗N which, for almost all values of time t in the interval [0, 1], satisfies

the following:

\[ \begin{aligned} &\pi(p(t)) = x(t),\\ &p(1) = \alpha,\\ &\dot p(t) = \vec H_{u(t)}(p(t)),\\ &H_{u(t)}(p(t)) = \min_{u\in U} H_u(p(t)). \end{aligned} \tag{14.3} \]

Proof Fix a point v in the control set and a number τ in the interval [0, 1]. For each small

positive number $\varepsilon > 0$, let $u_\varepsilon$ be the admissible control defined by

\[ u_\varepsilon(t) = \begin{cases} u(t), & \text{if } t \notin [\tau - \varepsilon, \tau];\\ v, & \text{if } t \in [\tau - \varepsilon, \tau]. \end{cases} \]

Since the optimal control $u$ is locally bounded, the new control $u_\varepsilon$ defined above is also locally bounded. Let $P^\varepsilon_{t_0,t_1} : N \to N$ be the time-dependent local flow of the following ordinary differential equation

\[ \dot x(t) = F(x(t), u_\varepsilon(t)). \]

Here $P^\varepsilon_{0,t}(x)$ denotes the image of the point $x$ in the manifold $N$ under the local flow $P^\varepsilon_{0,t}$ at time $t$. It has the composition property $P^\varepsilon_{t_2,t_3}\circ P^\varepsilon_{t_1,t_2} = P^\varepsilon_{t_1,t_3}$. Also, recall that $P^\varepsilon_{t_0,t_1}$ depends smoothly on the space variables and it is Lipschitz with respect to the time variable.

Since $x(1) = P^0_{0,1}(x_0)$ and the function $g$ is minimizing at $x(1)$, the following is true for all $\varepsilon > 0$:

\[ g(P^\varepsilon_{0,1}(x_0)) \ge g(P^0_{0,1}(x_0)). \tag{14.4} \]

Let α be a point in the super-differential $d^+g_{x(1)}$ at the point $x(1)$. Then there exists a $C^1$ function $\phi : N \to \mathbb R$ such that $d\phi_{x(1)} = \alpha$ and $g - \phi$ has a local maximum at $x(1)$. Combining this with (14.4), we have

\[ g(P^0_{0,1}(x_0)) - \phi(P^\varepsilon_{0,1}(x_0)) \le g(P^\varepsilon_{0,1}(x_0)) - \phi(P^\varepsilon_{0,1}(x_0)) \le g(P^0_{0,1}(x_0)) - \phi(P^0_{0,1}(x_0)). \]

Simplifying this inequality we get

\[ \frac{\phi(P^\varepsilon_{0,1}(x_0)) - \phi(P^0_{0,1}(x_0))}{\varepsilon} \ge 0. \tag{14.5} \]

If $R_t$ denotes the flow of the vector field $F_v := F(\cdot, v)$, then

\[ P^\varepsilon_{0,1} = P^0_{\tau,1}\circ R_\varepsilon\circ P^0_{0,\tau-\varepsilon}. \tag{14.6} \]

So, if we assume that $\tau$ is a point of differentiability of the map $t \mapsto P^0_{0,t}$, which is true for almost all times $\tau$ in the interval $[0, 1]$, then $P^\varepsilon_{0,1}$ is differentiable with respect to $\varepsilon$ at zero. Therefore, we can let $\varepsilon$ go to 0 in (14.5) and obtain

\[ \alpha\left(\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}P^\varepsilon_{0,1}\right) \ge 0. \tag{14.7} \]

If we differentiate equation (14.6) with respect to $\varepsilon$ and set $\varepsilon$ to be zero, the equation becomes

\[ \frac{d}{d\varepsilon}\Big|_{\varepsilon=0}P^\varepsilon_{0,1} = (P^0_{\tau,1})_*(F_v - F_{u(\tau)})\circ P^0_{0,1}. \]

Substitute this equation back into (14.7) and we get the following:

\[ ((P^0_{\tau,1})^*\alpha)(F_v(x(\tau)) - F_{u(\tau)}(x(\tau))) \ge 0. \tag{14.8} \]

Define $p : [0, 1] \to T^*N$ by $p(t) = (P^0_{t,1})^*\alpha$; then the first two assertions of the theorem are clearly satisfied.

The following is well known (see [7] or [41]):

Lemma 14.4 Let θ = pdq be the natural 1-form on the cotangent bundle of the manifold

N , then for each diffeomorphism P : N → N , the pull back map P ∗ : T ∗N → T ∗N on

the cotangent bundle of the manifold preserves the 1-form θ.

Let Wt be the time-dependent vector field on the cotangent bundle of the manifold

which satisfies

\[ \frac{d}{dt}(P^0_{t,1})^* = W_t\circ(P^0_{t,1})^* \]

for almost all values of time $t$ in $[0, 1]$. If $\mathcal L_V$ denotes the Lie derivative with respect to a vector field $V$, then, by Lemma 14.4, the following is true for almost all values of time $t$ in $[0, 1]$:

\[ \mathcal L_{W_t}\theta = 0. \]

If $\omega = -d\theta$ is the canonical symplectic 2-form on the cotangent bundle, then, by using Cartan's formula, we have

\[ i_{W_t}\omega = d(\theta(W_t)). \]

Therefore, the vector field $W_t$ is a Hamiltonian vector field with the Hamiltonian given by

\[ H_{u(t)}(p) = p(F(x, u(t))). \]

This implies the third assertion of the theorem. The last assertion follows from (14.8).

¤

Going back to Problem 14.1, we can apply the Pontryagin Maximum Principle for

the Mayer problem. Let (x(·), z(·)) be an admissible pair which minimizes Problem 14.1

and let H t : T ∗M × R→ R be the function defined by


\[ H_t(p, l) := p(F(x, u(t))) + l\cdot L(x, u(t)). \]

By Theorem 14.3, there exists a curve $(p(\cdot), l(\cdot)) : [0, 1] \to T^*_xM \times \mathbb R$ such that $x(t) = \pi(p(t))$ and

\[ \begin{aligned} &(\dot p, \dot l) = \vec H_t(p, l),\\ &(p(1), l(1)) = (-\alpha, 1),\\ &H_t(p(t), l(t)) = \min_{u\in U}\left(p(t)(F(x(t), u)) + l(t)\cdot L(x(t), u)\right). \end{aligned} \tag{14.9} \]

From the first two equations in (14.9), we get $\dot l = 0$ and $l(1) = 1$. So, $l(t) \equiv 1$. Therefore, (14.9) is simplified to

\[ \begin{aligned} &\dot p = \vec H_u(p),\\ &p(1) = -\alpha,\\ &H_{u(t)}(p(t)) = \min_{u\in U}\left(p(t)(F(x(t), u)) + L(x(t), u)\right). \end{aligned} \tag{14.10} \]

This finishes the proof of Theorem 2.3.

Chapter 15

Optimal Transportation and the

Generalized Curvature

In this appendix we discuss the relations between optimal transportation and the gener-

alized curvature invariants. To do this, we first recall the displacement interpolation:

ϕt(x) = π(et ~H(−dfx)),

where H is a Hamiltonian function on the cotangent bundle T ∗M , ~H is the corresponding

Hamiltonian vector field, et ~H is its Hamiltonian flow and f is a function which is twice

differentiable at almost all points x. We also assume that the map ϕt is, at almost all

points x in the manifold M , nonsingular for all t in [0, 1).

We recall that the optimal maps to the optimal transportation problem (12.1) are of

the form given by ϕ1. Let µ be a smooth volume form on the manifold M and we denote

the corresponding measure by the same symbol µ. Assume that the measures ϕt∗µ are

absolutely continuous with respect to the measure µ and let ρt be the corresponding

density. In this appendix we describe the changes in the density ρt as a function of

time t using the generalized curvature invariants. This is analogous to the Jacobi field

calculations in [20, 60].

Recall that the vertical bundle V is given by the kernel of the map dπ : TT ∗M → TM


and the Jacobi curve Jα(t) at α corresponding to a Hamiltonian H is defined by

\[ J_\alpha(t) = de^{-t\vec H}(V_{e^{t\vec H}(\alpha)}). \]

Let $e_1(t), ..., e_n(t), f_1(t), ..., f_n(t)$ be a moving Darboux frame which satisfies

\[ J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\} \]

and assume that the frame satisfies the following structural equations

\[ \dot e_i(t) = c^1_{ij}(t)e_j(t) + c^2_{ij}(t)f_j(t), \qquad \dot f_i(t) = c^3_{ij}(t)e_j(t) + c^4_{ij}(t)f_j(t). \tag{15.1} \]

Let $C^k_t$ be the matrix with entries equal to the structural constants $c^k_{ij}(t)$. Note that the moving Darboux frame $e_i(t), f_i(t)$ and the structural constants $c^k_{ij}(t)$ depend on the point α in the manifold $T^*M$. Let $m$ be the $n$-form on the manifold $T^*M$ which satisfies $i_{e_i(0)}m = 0$ and $m_\alpha(f_1(0), ..., f_n(0)) = 1$. A Hamiltonian $H$ is unimodular with respect to an $n$-form η on the manifold $M$ if there is a function $K : T^*M \to \mathbb R$ which is invariant under the Hamiltonian flow $e^{t\vec H}$ such that $\pi^*\eta = Km$.

We will also assume that the structural equations are canonical. To say precisely

what it means, note that if ei(t) is a frame contained in the Jacobi curve Jα(t) at α, then

$de^{s\vec H}(e_i(s+t))$ is a frame contained in the Jacobi curve $J_{e^{s\vec H}(\alpha)}(t)$ at $e^{s\vec H}(\alpha)$. Therefore, we can let $e_1(t), ..., e_n(t), f_1(t), ..., f_n(t)$ be a moving Darboux frame at α satisfying (15.1) and define $\tilde e_i(t)$ and $\tilde f_i(t)$ by $\tilde e_i(t) := de^{s\vec H}e_i(s+t)$, $\tilde f_i(t) := de^{s\vec H}f_i(s+t)$. The structural equations are canonical if $\tilde e_1(t), ..., \tilde e_n(t), \tilde f_1(t), ..., \tilde f_n(t)$ is a moving Darboux frame at $e^{s\vec H}(\alpha)$ satisfying

\[ \dot{\tilde e}_i(t) = c^1_{ij}(t+s)\tilde e_j(t) + c^2_{ij}(t+s)\tilde f_j(t), \qquad \dot{\tilde f}_i(t) = c^3_{ij}(t+s)\tilde e_j(t) + c^4_{ij}(t+s)\tilde f_j(t). \]

Let us denote the differential of the map x 7→ −dfx by F , then the map dϕt satis-

fies dϕt = dπdet ~HF . If we let ςi = dπ(fi(0)), then the vectors F(ς1), ...,F(ςn) span a

linear subspace W of the symplectic vector space TαT ∗M . We write F(ςi) as a linear

combination with respect to the moving Darboux frame defined in (15.1):


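\[ \mathcal F(\varsigma_i) = \sum_{j=1}^{n}\left(a_{ij}(t)e_j(t) + b_{ij}(t)f_j(t)\right) \quad\text{or}\quad \Psi = A_tE_t + B_tF_t, \]

where $A_t = (a_{ij}(t))$, $B_t = (b_{ij}(t))$ and $\Psi$, $E_t$ and $F_t$ are matrices with rows $\mathcal F(\varsigma_i)$, $e_i(t)$ and $f_i(t)$ respectively.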

Lemma 15.1 Assume that the measures ϕt∗µ is absolutely continuous with respect to µ,

the Hamiltonian is unimodular with respect to µ and the structural equation is canonical,

then the density ρt of ϕt∗µ satisfies

ρt(ϕt(x)) det Bt = 1.

Proof Assume that e1(t), ..., en(t), f1(t), ..., fn(t) is a moving Darboux frame at α which

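satisfies (15.1). Using the definition of $\tilde e_i$ and $\tilde f_i$, we have

\[ de^{s\vec H}\mathcal F(\varsigma_i) = \sum_{j=1}^{n}\left(a_{ij}(s)\tilde e_j(0) + b_{ij}(s)\tilde f_j(0)\right). \]

Since the structural equations are canonical, it follows that

\[ m(de^{s\vec H}\mathcal F(\varsigma_1), ..., de^{s\vec H}\mathcal F(\varsigma_n)) = \det B_s. \]

Since the Hamiltonian is unimodular with respect to µ, i.e. $\pi^*\mu = Km$, the above expression implies that

\[ \mu(d\varphi_s(\varsigma_1), ..., d\varphi_s(\varsigma_n)) = K(e^{s\vec H}\alpha)\det B_s = K(\alpha)\det B_s. \]

Since the function $\rho_t$ is the density of the push forward measure $\varphi_{t*}\mu$ with respect to the measure µ (i.e. $\varphi_{t*}\mu = \rho_t\mu$), it follows that

\[ K(\alpha)\det B_0 = \mu(\varsigma_1, ..., \varsigma_n) = K(\alpha)\rho_s(\varphi_s(x))\det B_s. \]

Since $\pi(-df_x) = x$, $B_0$ is the identity matrix and the proof is complete. ¤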

By assumption, for almost all points $z$ in the manifold $M$, the map $d(\varphi_t)_z$ is nonsingular for all values of time $t$ in $[0, 1)$. It follows that the density $\rho_t(\varphi_t(z))$ is nonzero for each such $t$. Lemma 15.1 shows that the corresponding matrix $B_t$ is invertible, and so the linear space $W$ is transversal to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), \dots, e_n(t)\}$. Therefore, $W$ is the graph of a linear map from the space $\mathrm{span}\{f_1(t), \dots, f_n(t)\}$ to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), \dots, e_n(t)\}$. Let $S_t$ be the corresponding matrix (i.e. the linear map is given by $f_i(t) \mapsto \sum_{j=1}^{n} S^{ij}_t e_j(t)$, where $S^{ij}_t$ are the entries of the matrix $S_t$). Finally, we come to the main theorem of the appendix, proved with A. Agrachev.

Theorem 15.2 [6] Suppose that the same assumptions as in Lemma 15.1 hold and assume further that, for almost all $z$ in $M$, the map $d(\varphi_t)_z$ is nonsingular for all values of time $t$ in $[0, 1)$. Then the matrix $S_t$ satisfies the following matrix Riccati equation

\[
\dot S_t + C^3 + S_tC^1 - C^4S_t - S_tC^2S_t = 0
\]

and the density $\rho_t$ satisfies

\[
\rho_t(\varphi_t(z)) = e^{\int_0^t \mathrm{tr}(C^4 + S_sC^2)\,ds}.
\]

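Lemma 15.3 $S_t = B_t^{-1}A_t$.

Proof Since $f_i(t) + \sum_{j=1}^{n} S^{ij}_t e_j(t)$ is in the subspace $W$, which is spanned by the rows of $\Psi$, we have $F_t + S_tE_t = P_t\Psi = P_tA_tE_t + P_tB_tF_t$ for some matrix $P_t$. By comparing the terms, we have $P_tA_t = S_t$ and $P_tB_t = I$, so $P_t = B_t^{-1}$ and $S_t = B_t^{-1}A_t$. □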

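Proof of Theorem 15.2 By Lemma 15.3, $\Psi = B_tF_t + B_tS_tE_t$, and $\Psi$ does not depend on $t$. By differentiating this identity with respect to time $t$ and multiplying on the left by $B_t^{-1}$, we get

\[
B_t^{-1}\dot B_tF_t + \dot F_t + B_t^{-1}\dot B_tS_tE_t + \dot S_tE_t + S_t\dot E_t = 0.
\]

If we apply the structural equations, in matrix form $\dot E_t = C^1E_t + C^2F_t$ and $\dot F_t = C^3E_t + C^4F_t$, and compare the coefficients of $F_t$ and $E_t$, then we get

\[
B_t^{-1}\dot B_t + C^4 + S_tC^2 = 0,
\]

\[
\dot S_t + B_t^{-1}\dot B_tS_t + C^3 + S_tC^1 = 0.
\]

Substituting the first equation into the second, we see that $S_t$ satisfies the equation

\[
\dot S_t + C^3 + S_tC^1 - C^4S_t - S_tC^2S_t = 0.
\]

Finally, let $s_t = \rho_t(\varphi_t(x))$. Then we have, by Lemmas 15.1 and 15.3, the following:

\[
\frac{1}{s_t}\frac{d}{dt}s_t = \det B_t\,\frac{d}{dt}\det(B_t^{-1}) = -\mathrm{tr}(B_t^{-1}\dot B_t) = \mathrm{tr}(C^4 + S_tC^2).
\]

Since $B_0$ is the identity matrix, $s_0 = 1$, and integrating this identity gives the formula for the density claimed in the theorem. □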
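The algebra in the proof above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it takes $n = 2$, treats $C^1, \dots, C^4$ as constant matrices with arbitrarily chosen entries, and picks an arbitrary $A_0$ with $B_0 = I$; none of these values, nor the helper names (`mm`, `lin`, `rhs`, `shift`, `rk4_step`, `integrate`), come from the thesis. Constancy of $\Psi = A_tE_t + B_tF_t$ together with the structural equations gives $\dot A_t = -A_tC^1 - B_tC^3$ and $\dot B_t = -A_tC^2 - B_tC^4$. The sketch integrates this linear system alongside the Riccati equation and compares $S_t$ with $B_t^{-1}A_t$, and $1/\det B_t$ with $e^{\int_0^t \mathrm{tr}(C^4 + S_sC^2)\,ds}$.

```python
import math

def mm(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lin(*terms):
    """Linear combination sum(c * X) over (c, X) pairs of 2x2 matrices."""
    return [[sum(c * X[i][j] for c, X in terms) for j in range(2)]
            for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv(X):
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def trace(X):
    return X[0][0] + X[1][1]

# Illustrative constant coefficients (assumed values, not from the thesis).
C1 = [[0.1, 0.2], [0.0, -0.1]]
C2 = [[0.3, 0.0], [0.1, 0.2]]
C3 = [[0.0, 0.1], [0.2, 0.0]]
C4 = [[-0.2, 0.1], [0.0, 0.1]]
A0 = [[0.5, -0.1], [0.2, 0.3]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def rhs(state):
    """Time derivative of the state (A, B, S, L)."""
    A, B, S, L = state
    dA = lin((-1, mm(A, C1)), (-1, mm(B, C3)))  # from d/dt (A E + B F) = 0
    dB = lin((-1, mm(A, C2)), (-1, mm(B, C4)))
    # Riccati equation: S' = -C3 - S C1 + C4 S + S C2 S
    dS = lin((-1, C3), (-1, mm(S, C1)), (1, mm(C4, S)), (1, mm(mm(S, C2), S)))
    dL = trace(lin((1, C4), (1, mm(S, C2))))    # integrand tr(C4 + S C2)
    return dA, dB, dS, dL

def shift(state, h, deriv):
    """Return state + h * deriv, component by component."""
    A, B, S, L = state
    dA, dB, dS, dL = deriv
    return (lin((1, A), (h, dA)), lin((1, B), (h, dB)),
            lin((1, S), (h, dS)), L + h * dL)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(shift(state, h / 2, k1))
    k3 = rhs(shift(state, h / 2, k2))
    k4 = rhs(shift(state, h, k3))
    for c, k in ((h / 6, k1), (h / 3, k2), (h / 3, k3), (h / 6, k4)):
        state = shift(state, c, k)
    return state

def integrate(T=0.5, steps=500):
    # B_0 = I, hence S_0 = B_0^{-1} A_0 = A_0; L accumulates the trace integral.
    state = (A0, I2, A0, 0.0)
    h = T / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

A, B, S, L = integrate()
S_frame = mm(inv(B), A)  # S_t as given by Lemma 15.3
riccati_gap = max(abs(S[i][j] - S_frame[i][j])
                  for i in range(2) for j in range(2))
density_gap = abs(1.0 / det(B) - math.exp(L))
print(riccati_gap, density_gap)
```

Both printed gaps should be at the level of the integration error. The check uses only the linear algebra of the proof and does not model the symplectic structure or the Darboux condition on the frame.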

Bibliography

[1] A.A. Agrachev: Exponential mappings for contact sub-Riemannian structures. J. Dynamical and Control Systems, 2 (1996), 321–358.

[2] A.A. Agrachev, M. Caponigro: Families of vector fields which generate the group of diffeomorphisms, preprint, arXiv:0804.4403.

[3] A.A. Agrachev, J.P. Gauthier: On the subanalyticity of Carnot-Carathéodory distances, Ann. I. H. Poincaré – AN 18 (2001), 359–382.

[4] A. Agrachev, R. Gamkrelidze: Feedback-invariant optimal control theory and differential geometry, I. Regular extremals. J. Dynamical and Control Systems, 3 (1997), 343–389.

[5] A.A. Agrachev, P. Lee: Optimal Transportation under Nonholonomic Constraints, to appear in Trans. Amer. Math. Soc. (2008), 35pp.

[6] A.A. Agrachev, P. Lee: Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds, preprint, 31pp.

[7] A.A. Agrachev, Y. L. Sachkov: Control Theory from the Geometric Viewpoint, Encyclopedia of Mathematical Sciences, Vol. 87, Springer, 2004.

[8] A.A. Agrachev, A. V. Sarychev: Strong minimality of abnormal geodesics for 2-distributions, J. Dynamical and Control Systems, 1995, v.1, 139–176.

[9] A.A. Agrachev, A. V. Sarychev: Abnormal sub-Riemannian geodesics, Morse index and rigidity, Annales de l’Institut Henri Poincaré - Analyse non linéaire, v.13, 1996, 635–690.

[10] A. Agrachev, I. Zelenko: Geometry of Jacobi curves, I, II. J. Dynamical and Control Systems, 8 (2002), 93–140, 167–215.

[11] L. Ambrosio, S. Rigot: Optimal mass transportation in the Heisenberg group, J. Func. Anal. 208 (2004), 261–301.

[12] V.I. Arnold, A.B. Givental: Symplectic geometry, Dynamical systems IV, Encyclopaedia Math. Sci., 4, Springer, Berlin (2001), 1–138.

[13] G. Bande, P. Ghiggini, D. Kotschick: Stability theorems for symplectic and contact pairs, Int. Math. Res. Not. 68 (2004), 3673–3688.

[14] P. Bernard, B. Buffoni: Optimal mass transportation and Mather theory, J. Eur. Math. Soc. (JEMS) 9 (2007), no. 1, 85–121.

[15] R. Bhatia: Matrix analysis. Graduate Texts in Mathematics, 169. Springer-Verlag, New York, 1997.

[16] Y. Brenier: Polar factorization and monotone rearrangements of vector-valued functions, Comm. Pure Appl. Math. 44:4 (1991), 375–417.

[17] P. Cannarsa, L. Rifford: Semiconcavity results for optimal control problems admitting no singular minimizing controls, to appear in Ann. Inst. H. Poincaré Anal. Non Linéaire, http://math.unice.fr/%7Erifford/Papiers_en_ligne/CANRIFNEW.pdf

[18] P. Cannarsa, C. Sinestrari: Semiconcave Functions, Hamilton-Jacobi Equations, and Optimal Control, Birkhäuser Boston, Progress in Nonlinear Differential Equations and Their Applications, Vol. 58, 2004.

[19] D.C. Chang, I. Markina, A. Vasil’ev: Sub-Lorentzian geometry on anti-de Sitter space. J. Math. Pures Appl. (9) 90 (2008), no. 1, 82–110.

[20] D. Cordero-Erausquin, R. McCann, M. Schmuckenschläger: A Riemannian interpolation inequality à la Borell, Brascamp and Lieb. Invent. Math., 146: 219–257, 2001.

[21] D. Ebin, J. Marsden: Groups of diffeomorphisms and the motion of an incompressible fluid, Ann. of Math. (2) 92 (1970), 102–163.

[22] A. Fathi, A. Figalli: Optimal transportation on non-compact manifolds, Israel J. Math., to appear.

[23] A. Figalli: Existence, uniqueness and regularity of optimal transport maps, SIAM J. Math. Anal., 39 (2007), no. 1, 126–137.

[24] A. Figalli, L. Rifford: Mass Transportation on sub-Riemannian Manifolds, preprint, http://math.unice.fr/~rifford/Papiers_en_ligne/transpSRFigRif.pdf

[25] G. Freiling, G. Jank, H. Abou-Kandil: Generalized Riccati difference and differential equations. Linear Algebra Appl., 241/242, 291–303 (1996).

[26] R. V. Gamkrelidze: Principles of Optimal Control Theory, Plenum Publishing Corporation, New York, 1978.

[27] E. Ghys: Feuilletages riemanniens sur les variétés simplement connexes, Ann. Inst. Fourier (Grenoble) 34:4 (1984), 203–223.

[28] J.W. Gray: Some global properties of contact structures, Ann. of Math. 69:2 (1959), 421–450.

[29] L. Hörmander: Hypoelliptic second order differential equations, Acta Math., 119 (1967), 147–171.

[30] L. Hörmander: The analysis of linear partial differential operators III, Classics in Mathematics, Springer, 1983.

[31] N. Juillet: Geometric Inequalities and Generalized Ricci Bounds in the Heisenberg Group, to appear in IMRN.

[32] V. Jurdjevic: Geometric control theory, Cambridge Studies in Advanced Mathematics, 52. Cambridge University Press, Cambridge, 1997.

[33] L. Kantorovich: On the translocation of masses, C.R. (Doklady) Acad. Sci. URSS (N.S.), 37, 1942, 199–201.

[34] B. Khesin, P. Lee: A nonholonomic Moser theorem and optimal mass transport, preprint arXiv:0802.1551 (2008), 31pp.

[35] B. Khesin, G. Misiolek: Shock waves for the Burgers equation and curvatures of diffeomorphism groups, Proc. Steklov Inst. Math., v.250 (2007), 1–9.

[36] J.J. Levin: On the matrix Riccati equation. Proc. Amer. Math. Soc. 10 (1959), 519–524.

[37] C.B. Li, I. Zelenko: Differential geometry of curves in Lagrange Grassmannians with given Young diagram. arXiv:0708.1100v1.

[38] W.S. Liu, H.J. Sussmann: Shortest paths for sub-Riemannian metrics on rank-2 distributions, Memoirs of AMS, v.118, N. 569, 1995.

[39] J. Lott, C. Villani: Ricci curvature for metric-measure spaces via optimal transport, Ann. of Math. (2), in press.

[40] J. Lott, C. Villani: Weak curvature conditions and functional inequalities. J. Funct. Anal. 245 (2007), no. 1, 311–333.

[41] J. Marsden, T. Ratiu: Introduction to Mechanics and Symmetry, Texts in Applied Mathematics, Vol. 17, Springer, 1999.

[42] R. McCann: Polar factorization of maps in Riemannian manifolds, Geom. Funct. Anal., 11:3 (2001), 589–608.

[43] J. Moser: On the volume elements on a manifold, Trans. of the AMS, 120:2 (1965), 286–294.

[44] R. Montgomery: Abnormal Minimizers, SIAM J. Control and Optimization, vol. 32, no. 6, 1994, 1605–1620.

[45] R. Montgomery: A tour of subriemannian geometries, their geodesics and applications, AMS, Mathematical Surveys and Monographs, vol. 91, 2002.

[46] Duy-Minh Nhieu: The Neumann problem for sub-Laplacians on Carnot groups and the extension theorem for Sobolev spaces, Ann. Mat. Pura Appl. (4) 180 (2001), no. 1, 1–25.

[47] Duy-Minh Nhieu, N. Garofalo: Lipschitz continuity, global smooth approximations and extension theorems for Sobolev functions in Carnot-Carathéodory spaces, J. d’Analyse Math., 74 (1998), 67–97.

[48] S. Ohta: On the measure contraction property of metric measure spaces. Comment. Math. Helv. 82 (2007), no. 4, 805–828.

[49] S. Ohta: Finsler interpolation inequalities, to appear in Calc. Var. Partial Differential Equations.

[50] B. O’Neill: Semi-Riemannian geometry. With applications to relativity. Pure and Applied Mathematics, 103. Academic Press, Inc., New York, 1983.

[51] B. O’Neill: Submersions and geodesics, Duke Math. J., 34 (1967), 363–373.

[52] F. Otto: The geometry of dissipative evolution equations: the porous medium equation, Comm. Partial Differential Equations, 26:1-2 (2001), 101–174.

[53] F. Otto, C. Villani: Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361–400, 2000.

[54] L.P. Rothschild, E. Stein: Hypoelliptic differential operators and nilpotent groups, Acta Math., 137 (1976), 247–320.

[55] L.P. Rothschild, D. Tartakoff: Parametrices with $C^\infty$ error for $\Box_b$ and operators of Hörmander type, Partial differential equations and geometry (Proc. Conf., Park City, Utah, 1977), pp. 255–271, Lecture Notes in Pure and Appl. Math., 48, Dekker, New York, 1979.

[56] A. Sarychev, D. Torres: Lipschitzian regularity conditions for the minimizing trajectories of optimal control problems, in Nonlinear analysis and its applications to differential equations (Lisbon, 1998), 357–368, Progr. Nonlinear Differential Equations Appl., 43, Birkhäuser Boston, Boston, MA, 2001.

[57] A.I. Shnirelman: The geometry of the group of diffeomorphisms and the dynamics of an ideal incompressible fluid, Math. USSR-Sb. 56 (1987), 79–105.

[58] K.T. Sturm: On the geometry of metric measure spaces. Acta Math. 196, no. 1, 65–131 (2006).

[59] K.T. Sturm: On the geometry of metric measure spaces II. Acta Math. 196, no. 1, 133–177 (2006).

[60] K.T. Sturm and M.K. von Renesse: Transport inequalities, gradient estimates, entropy and Ricci curvature, Comm. Pure Appl. Math. 58 (2005), 923–940.

[61] H.J. Sussmann: A Cornucopia of Abnormal Sub-Riemannian Minimizers. Part I: The Four dimensional Case, IMA technical report no. 1073, December, 1992.

[62] M.E. Taylor: Partial Differential Equations I (Basic Theory), Applied Mathematical Sciences, vol. 115, Springer.

[63] C. Villani: Topics in Optimal Transportation, Graduate Studies in Mathematics 58, AMS, Providence, 2003.

[64] C. Villani: Optimal Transport - old and new, Grundlehren der mathematischen Wissenschaften, Vol. 338, Springer, 2009 preprint.
