Symplectic and Subriemannian Geometry of Optimal
Transport
by
Paul Woon Yin Lee
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Mathematics
University of Toronto
Copyright © 2009 by Paul Woon Yin Lee
Abstract
Symplectic and Subriemannian Geometry of Optimal Transport
Paul Woon Yin Lee
Doctor of Philosophy
Graduate Department of Mathematics
University of Toronto
2009
This thesis is devoted to subriemannian optimal transportation problems. In the first
part of the thesis, we consider cost functions arising from very general optimal control
costs. We prove the existence and uniqueness of an optimal map between two given
measures under certain regularity and growth assumptions on the Lagrangian, absolute
continuity of the measures with respect to the Lebesgue class, and, most importantly,
the absence of sharp abnormal minimizers. In particular, this result is applicable in the
case where the cost function is the square of the subriemannian distance on a subriemannian
manifold with a 2-generating distribution. This unifies and generalizes the correspond-
ing Riemannian and subriemannian results of Brenier, McCann, Ambrosio-Rigot and
Bernard-Buffoni. We also establish various properties of the optimal plan when abnor-
mal minimizers are present.
The second part of the thesis is devoted to the infinite-dimensional geometry of op-
timal transportation on a subriemannian manifold. We start by proving the following
nonholonomic version of the classical Moser theorem: given a bracket-generating distri-
bution on a connected compact manifold (possibly with boundary), two volume forms of
equal total volume can be isotoped by the flow of a vector field tangent to this distri-
bution. Next, we describe formal solutions of the corresponding subriemannian optimal
transportation problem and present the Hamiltonian framework for both the Otto cal-
culus and its subriemannian counterpart as infinite-dimensional Hamiltonian reductions
on diffeomorphism groups. Finally, we define a subriemannian analog of the Wasserstein
metric on the space of densities and prove that the subriemannian heat equation defines
a gradient flow on the subriemannian Wasserstein space with the potential given by the
Boltzmann relative entropy functional.
The measure contraction property is one of the possible generalizations of Ricci curvature
bounds to more general metric measure spaces. In the third part of the thesis, we discuss
when a three dimensional contact subriemannian manifold satisfies such a property.
Acknowledgements
I would like to thank my advisor, Professor Boris Khesin, for his guidance, dedication,
and invaluable advice along this project. I would also like to express my deep appreciation
to Professor Andrei Agrachev, Professor Luigi Ambrosio, Professor Robert McCann and
Professor Nassif Ghoussoub for various fruitful discussions and constant support.
I am grateful to all Professors who taught me during my graduate study. Specifically,
I would like to convey my gratitude to Professor Velimir Jurdjevic for introducing me
to the theory of optimal control and Professor Yael Karshon for teaching me symplectic
geometry. Last but not least, I would like to thank the staff members, especially Ida
Bulat, from the Mathematics Department at the University of Toronto for taking care of all
my nonacademic problems.
Contents
1 Introduction
1.1 Part I: Optimal Transportation under Nonholonomic Constraints
1.2 Part II: A Nonholonomic Moser Theorem and Optimal Mass Transport
1.3 Part III: Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds
I Optimal Transportation under Nonholonomic Constraints
2 Background
2.1 Elementary Optimal Control Theory
2.2 Optimal Mass Transportation
3 Nonholonomic Optimal Transportation Problem
3.1 Existence and Uniqueness of an Optimal Map
3.2 Regularity of Control Costs
3.3 Applications: Mass Transportation on Subriemannian Manifolds
4 Optimal Transportation with Non-Lipschitz Cost
4.1 Normal Minimizers and Properties of Optimal Maps with Continuous Optimal Control Costs
4.2 Optimal Maps with Abnormal Minimizers
II A Nonholonomic Moser Theorem and Optimal Mass Transport
5 Classical and Nonholonomic Moser Theorems
6 Distributions on Diffeomorphism Groups
6.1 A Fibration on the Group of Diffeomorphisms
6.2 A Nonholonomic Distribution on the Diffeomorphism Group
6.3 Accessibility of Diffeomorphisms and Consequences
7 The Riemannian Geometry of Diffeomorphism Groups and Mass Transport
8 The Hamiltonian Mechanics on Diffeomorphism Groups
8.1 Averaged Hamiltonians
8.2 Riemannian Submersion and Symplectic Quotients
8.3 Hamiltonian Flows on the Diffeomorphism Groups
8.4 Hamiltonian Flows on the Wasserstein Space
9 The Subriemannian Geometry of Diffeomorphism Groups
9.1 Subriemannian Submersion
9.2 A Subriemannian Analog of the Otto Calculus.
9.3 The Nonholonomic Heat Equation
III Generalized Ricci Curvature Bounds for Three Dimensional Contact Subriemannian Manifolds
10 Revisiting Subriemannian Geometry
11 Generalized Curvatures
11.1 The General Case
11.2 The Three Dimensional Contact Case
12 Measure Contraction Properties
13 Isoperimetric Problems
IV Appendix
14 Proof of Pontryagin Maximum Principle for the Bolza Problem
15 Optimal Transportation and the Generalized Curvature
Bibliography
List of Figures
5.1 A nonholonomic Hodge decomposition.
6.1 The Moser theorem in both the classical and nonholonomic settings is a path-lifting property in the diffeomorphism group.
8.1 Hamiltonian flow of the Hamiltonian $H_M$ and its projection: the curve $\phi_t(x)$ is the projection of the curve $\Psi^{H_M}_t(\nabla f(x))$ to the manifold $M$.
9.1 Subriemannian submersion: horizontal subdistribution $T^{hor}$ is mapped isometrically to the tangent bundle $TB$ of the base.
9.2 Projections of subriemannian geodesics from $(S^3, T)$ in the Hopf bundle give circles in $S^2$, only one of which, the equator, is a geodesic on the base $S^2$.
Chapter 1
Introduction
Imagine we have a pile of sand and we want to move it from one place to another in
the most efficient way. This is an example of the optimal transportation problem of Monge,
originally posed in 1781. In 1942, Kantorovich proposed to consider this problem together
with its dual, which turned out to be linear. After this famous work of Kantorovich [33],
significant progress on this problem came only in the 1990s. In 1991, Brenier proved in
[16] that there is a unique transport plan for the above problem when the efficiency of the
transport plan is measured by the square of the Euclidean distance. Since then, this result
has been generalized to more general cost functions, including in the work of
McCann [42], Ambrosio-Rigot [11] and Bernard-Buffoni [14]. In the first part of the thesis,
we present a generalization of all the above to a more general type of cost function called
an optimal control cost.
1.1 Part I: Optimal Transportation under Nonholonomic Constraints
Mathematically, the Monge-Kantorovich problem can be formulated as follows. The piles
of sand are replaced by Borel probability measures µ and ν on a manifold M and the
transport plans between these measures are replaced by maps ϕ : M → M . The cost for
transporting from a point x to another point y on the manifold is given by a function
c : M × M → R. So, the total cost of the strategy ϕ is the average
\[ \int_M c(x, \varphi(x))\,d\mu(x). \]
The main goal is to show existence and uniqueness of a map ϕ which minimizes the total
cost, as well as to find the optimal ϕ explicitly whenever possible. More precisely,
Problem 1.1 Find a map ϕ : M → M which pushes µ forward to ν and minimizes the
following functional
\[ \int_M c(x, \varphi(x))\,d\mu(x). \]
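As a hedged illustration of Problem 1.1 (not part of the original text), consider the discrete case where µ and ν are uniform measures on finitely many distinct points of the real line and the cost is c(x, y) = |x − y|^2. Maps pushing µ forward to ν are then bijections between the two point sets, so the problem can be solved by brute force. The sketch below, in which all function names are hypothetical, compares the brute-force minimizer with the monotone (sorted) matching, which is known to be the optimal map in this one-dimensional setting.

```python
from itertools import permutations

def total_cost(xs, ys, perm):
    """Average quadratic cost of the map sending xs[i] to ys[perm[i]]."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

def brute_force_optimal(xs, ys):
    """Minimize the total cost over all bijections (feasible for small n only)."""
    return min(permutations(range(len(ys))), key=lambda p: total_cost(xs, ys, p))

def monotone_map(xs, ys):
    """Match the k-th smallest source point to the k-th smallest target point."""
    src = sorted(range(len(xs)), key=lambda i: xs[i])
    dst = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(src, dst):
        perm[i] = j
    return tuple(perm)

# Illustrative data (arbitrary, distinct points).
xs = [0.3, -1.2, 2.5, 0.9]
ys = [1.0, 4.0, -0.5, 2.2]

best = brute_force_optimal(xs, ys)
mono = monotone_map(xs, ys)
print(best == mono, total_cost(xs, ys, mono))
```

The brute-force search is only there to confirm the monotone matching on a small instance; it scales factorially and is not meant as a practical solver.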
In 1942 Kantorovich studied a relaxed version of the above problem in his famous
paper [33]. However, a major step toward solving the original problem was not achieved
until the work of Brenier. In [16] Brenier proved the existence and uniqueness of an
optimal map in the case where $M = \mathbb{R}^n$ and the cost function c is given by $c(x, y) = |x - y|^2$.
Later, this was generalized by McCann [42] to the case of a closed Riemannian manifold
M with the cost given by the square of the Riemannian distance $c(x, y) = d^2(x, y)$.
Recently, Bernard and Buffoni [14] generalized this further to the case where the cost c
is the action associated to a Lagrangian function L : TM → R on a compact manifold
M . More precisely, the cost is given by
\[ c(x, y) = \inf \int_0^1 L(x(t), \dot{x}(t))\,dt, \tag{1.1} \]
where the infimum is taken over all curves x(·) joining the points x and y, and the
Lagrangian L is fibrewise strictly convex with superlinear growth. Note that if g is
a Riemannian metric with the corresponding Riemannian distance function d and the
Lagrangian L is defined by L(v) = g(v, v), then the cost function c is the square $d^2$ of the
Riemannian distance function.
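To see why the Lagrangian L(v) = g(v, v) produces the squared distance, a one-dimensional Euclidean sketch may help (an illustration under simplifying assumptions, not taken from the thesis). For paths on the real line with velocity u(t), the Cauchy-Schwarz inequality gives $\int_0^1 u^2\,dt \ge (\int_0^1 u\,dt)^2 = |x - y|^2$, with equality for the constant velocity. The hypothetical helpers below check this numerically for piecewise-constant velocities.

```python
import random

def action(controls):
    """Integral of u(t)^2 over [0, 1] for a control constant on n equal subintervals."""
    return sum(u * u for u in controls) / len(controls)

def endpoint(x0, controls):
    """Endpoint x(1) of the path solving x'(t) = u(t), x(0) = x0."""
    return x0 + sum(controls) / len(controls)

def random_control(x0, x1, n, rng):
    """Random piecewise-constant control, shifted so that the path ends at x1."""
    raw = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    shift = (x1 - x0) - sum(raw) / n
    return [u + shift for u in raw]

x0, x1, n = 0.5, 2.0, 100
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
constant = [x1 - x0] * n
samples = [random_control(x0, x1, n, rng) for _ in range(20)]

print(action(constant))                 # action of the constant-speed path
print(min(action(u) for u in samples))  # smallest action among the random competitors
```

Every sampled competitor joins the same endpoints, so the comparison is between admissible paths only; the constant-speed path attains the value |x1 − x0|^2.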
In the first part of the thesis, we consider costs similar to (1.1). However, instead of
minimizing among all curves, the infimum is taken over a subcollection of curves, called
admissible paths. These paths are given by a control system and the corresponding cost
function is called the optimal control cost. More precisely, a control system is a smooth
fiber-preserving map F of a locally trivial bundle P → M over the manifold M into its
tangent bundle TM . If the fibres of the bundle P → M are diffeomorphic to a set U ,
then the map F : P → TM can be written locally as F : (x, u) ↦ F (x, u), where x
is in the manifold M and u is in the set U . We assume that U is a closed subset of a
Euclidean space. Admissible controls are measurable bounded maps u(·) from [0, 1] to
U , while admissible paths are Lipschitz curves x(·) which satisfy the equation
\[ \dot{x}(t) = F(x(t), u(t)), \tag{1.2} \]
where u(·) is an admissible control. Let L : M × U → R be a Lagrangian. Then the
corresponding cost c is given by
\[ c(x, y) = \inf_{(x(\cdot),u(\cdot))} \int_0^1 L(x(t), u(t))\,dt, \tag{1.3} \]
where the infimum is taken over all admissible pairs (x(·), u(·)) : [0, 1] → M × U such
that x(0) = x and x(1) = y.
In interesting cases the dimension of U is smaller than that of M , and yet any two
points of M can be connected by an optimal admissible path. In other words the control
system works as a nonholonomic constraint. The shortage of admissible velocities does
not allow us to recover an optimal path from its initial point and initial velocity and the
Euler–Lagrange description of the extremals does not work well. On the other hand, the
Hamiltonian approach remains efficient thanks to the Pontryagin maximum principle.
Another difficulty is the appearance of so-called abnormal extremals (singularities of the
space of admissible paths) which make the situation quite different from the classical
optimal transport problem.
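The following sketch illustrates how a control system can act as a nonholonomic constraint. It uses the standard Heisenberg system on R^3, chosen here as an assumed example rather than taken from this passage: there are two controls but three state variables, with vector fields X1 = (1, 0, −y/2) and X2 = (0, 1, x/2). Driving the (x, y)-coordinates around a closed loop changes z by the signed area enclosed, so the third direction is reachable although no control acts on it directly. The integrator and all function names are hypothetical.

```python
import math

def F(state, u):
    """Right-hand side u1*X1 + u2*X2 with X1 = (1, 0, -y/2), X2 = (0, 1, x/2)."""
    x, y, z = state
    u1, u2 = u
    return (u1, u2, 0.5 * (x * u2 - y * u1))

def integrate(control, state=(0.0, 0.0, 0.0), steps=2000):
    """Integrate x'(t) = F(x(t), u(t)) on [0, 1] with the classical RK4 scheme."""
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        k1 = F(state, control(t))
        k2 = F(tuple(s + 0.5 * h * d for s, d in zip(state, k1)), control(t + 0.5 * h))
        k3 = F(tuple(s + 0.5 * h * d for s, d in zip(state, k2)), control(t + 0.5 * h))
        k4 = F(tuple(s + h * d for s, d in zip(state, k3)), control(t + h))
        state = tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def circle_control(radius):
    """Control whose (x, y)-projection is a counterclockwise circle through the origin."""
    w = 2.0 * math.pi
    return lambda t: (-radius * w * math.sin(w * t), radius * w * math.cos(w * t))

radius = 0.5
end = integrate(circle_control(radius))
print(end)                      # x and y return to 0, z does not
print(math.pi * radius ** 2)    # area enclosed by the loop, for comparison
```

Varying the radius changes the final height, which gives a concrete picture of how points off the horizontal plane are reached by admissible paths.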
In Chapter 2 we recall basic necessary notions in optimal control theory and the
theory of optimal mass transportation. In Section 3.1 by using the arguments from the
theory of optimal mass transportation and the Pontryagin maximum principle in optimal
control theory, the existence and uniqueness of an optimal map is established under certain
regularity assumptions. More precisely, the following theorem holds:
Theorem 1.2 [5] Under certain regularity and growth conditions, there exists a unique
solution to the Monge-Kantorovich problem (Problem 1.1) with the cost function c given by an
optimal control cost.
All of the assumptions of Theorem 1.2 are mild except the Lipschitz continuity of the
cost function. However, this continuity is known to hold in all the cases mentioned
after Problem 1.1. So, the theorem generalizes the work in [16, 42, 14].
In Section 3.2 we study the Lipschitz continuity of optimal control costs. There are
two types of minimizers to the cost in (1.3). They are called normal and abnormal
minimizers. If abnormal minimizers are absent, which is the case in [16, 42, 14], the cost
is not only Lipschitz but even semi-concave (see [17]). However, abnormal minimizers
are unavoidable in many interesting problems and, in particular, in all subriemannian
problems. It turns out that not all abnormal minimizers are dangerous. To keep the
Lipschitz property of the cost (although not its semi-concavity) it is sufficient to have
no sharp abnormal minimizers. Geometric control theory provides effective conditions
for sharpness (see, for instance, [7, 9]). These conditions allow us to prove Lipschitz
continuity for a large class of optimal control costs. This, in turn, proves the existence
and uniqueness of an optimal map for the corresponding Monge-Kantorovich problem.
A subriemannian manifold is a manifold M equipped with a plane distribution ∆ ⊆ TM
and an inner product g defined on ∆. The subriemannian distance d(x, y) between
two points x and y is defined as the length of the shortest path joining the two given
points and tangent to the distribution ∆. In Section 3.3 the optimal transportation
problem on a subriemannian manifold with cost function c given by the square $d^2$ of the
subriemannian distance is considered. In this case all the mild regularity assumptions
are satisfied and we prove the following result:
Theorem 1.3 [5] If the distribution ∆ is 2-generating (i.e. vector fields in ∆
together with their Lie brackets span all tangent spaces), then the function $d^2$ is Lipschitz.
As a corollary we prove the existence and uniqueness of an optimal map for the
subriemannian optimal transportation problem with a 2-generating distribution. This
generalizes the corresponding result by Ambrosio-Rigot [11] on the Heisenberg group.
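For the Heisenberg group, the 2-generating condition of Theorem 1.3 can be checked directly. The sketch below assumes the standard frame X1 = (1, 0, −y/2), X2 = (0, 1, x/2) on R^3 and the bracket convention [V, W] = DW·V − DV·W (both are assumptions made for illustration), approximates the Lie bracket by central differences, and confirms that X1, X2 and [X1, X2] are linearly independent at a few sample points.

```python
def X1(p):
    x, y, z = p
    return (1.0, 0.0, -0.5 * y)

def X2(p):
    x, y, z = p
    return (0.0, 1.0, 0.5 * x)

def directional_derivative(V, p, w, h=1e-5):
    """Central-difference approximation of the derivative of the field V at p along w."""
    plus = V(tuple(a + h * b for a, b in zip(p, w)))
    minus = V(tuple(a - h * b for a, b in zip(p, w)))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def lie_bracket(V, W, p):
    """[V, W](p) = DW(p) V(p) - DV(p) W(p) (sign convention assumed for this sketch)."""
    dW = directional_derivative(W, p, V(p))
    dV = directional_derivative(V, p, W(p))
    return tuple(a - b for a, b in zip(dW, dV))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Arbitrary sample points; a nonzero determinant means the three vectors span R^3 there.
points = [(0.0, 0.0, 0.0), (1.0, -2.0, 0.5), (-0.3, 0.7, 4.0)]
dets = [det3(X1(p), X2(p), lie_bracket(X1, X2, p)) for p in points]
print(dets)
```

Since the two fields alone span only a plane at each point, it is the single bracket that supplies the missing direction, which is what "2-generating" requires.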
In Chapter 4 we prove certain properties of the optimal plan when abnormal minimiz-
ers are present. In Section 4.1 we consider flows whose trajectories are strictly abnormal
minimizers. We show that these flows cannot give an optimal plan for all “nice” initial
measures for a continuous cost. On the contrary, in Section 4.2, we show that these flows
are indeed optimal for an important class of problems with discontinuous costs.
1.2 Part II: A Nonholonomic Moser Theorem and
Optimal Mass Transport
The classical Moser theorem establishes that the total volume is the only invariant for a
volume form on a compact connected manifold with respect to the diffeomorphism action.
In the second part of the thesis we first prove a nonholonomic counterpart of this result
and present its applications to the problems of nonholonomic optimal mass transport.
The equivalence for the diffeomorphism action is often formulated in terms of “sta-
bility” of the corresponding object: the existence of a diffeomorphism relating the initial
object with a deformed one means that the initial object is stable, as it differs from the
deformed one merely by a coordinate change. Gray showed in [28] that contact structures
on a compact manifold are stable. Moser [43] established stability for volume forms and
symplectic structures. A leafwise counterpart of Moser’s argument for foliations was pre-
sented by Ghys in [27], while stability of symplectic-contact pairs in transversal foliations
was proved in [13]. In this part we establish stability of volume forms in the presence of
any bracket-generating distributions on connected compact manifolds. Recall that a dis-
tribution τ on the manifold M is called bracket-generating, or completely nonholonomic,
if local vector fields tangent to τ and their iterated Lie brackets span the entire tangent
bundle of the manifold M . The following theorem, which we call a nonholonomic Moser
theorem, is proved.
Theorem 1.4 [34] Any two volume forms of equal total volume on a manifold can be
isotoped by the flow of a time dependent vector field tangent to the bracket generating
distribution.
A version for manifolds with boundary is also proved.
Nonholonomic distributions arise in various problems related to rolling or skating,
wherever the “no-slip” condition is present. For instance, a ball rolling over a table
defines a trajectory in a configuration space tangent to a nonholonomic distribution of
admissible velocities. Note that such a ball can be rolled to any point of the table and
stopped at any a priori prescribed position. The latter is a manifestation of the Chow-
Rashevsky theorem (see e.g. [45]): For a bracket-generating distribution τ on a connected
manifold M any two points in M can be connected by a horizontal path (i.e. a path
everywhere tangent to the distribution τ). The motivation for considering volume forms
(or, densities) in a space with distribution can be related to problems with many tiny
rolling balls: It is more convenient to consider the density of such balls, rather than look
at them individually.
Note that for an integrable distribution there is a foliation to which it is tangent
and a horizontal path always stays on the same leaf of this foliation. Furthermore, for
an integrable distribution the existence of an isotopy between volume forms requires an
infinite number of conditions. On the contrary, the nonholonomic Moser theorem shows
that a non-integrable bracket-generating distribution imposes only one condition on total
volume of the forms for the existence of the isotopy between them.
Closely related to the nonholonomic Moser theorem is the existence of a nonholo-
nomic Hodge decomposition. We describe the related properties of the subriemannian
Laplace operator. We also formulate the corresponding nonholonomic mass transport
problem and describe its formal solutions as projections of horizontal geodesics on the
diffeomorphism group for the $L^2$-Carnot-Carathéodory metric.
In order to give this description, the Hamiltonian framework for what is now called
the Otto calculus is presented. The Riemannian submersion picture for the problems
of optimal mass transport is also presented. It turns out
that the submersion properties can be naturally understood as an infinite-dimensional
Hamiltonian reduction on diffeomorphism groups, and this admits a generalization to the
nonholonomic setting. A nonholonomic analog of the Wasserstein metric on the space of
densities is defined. Finally, the following is an extension to the subriemannian setting
of Otto’s fundamental result on the heat equation:
Theorem 1.5 The subriemannian heat equation defines a gradient flow on the nonholo-
nomic Wasserstein space with potential given by the Boltzmann relative entropy func-
tional:
\[ \mathrm{Ent}(\rho) := \int_M \rho \log(\rho)\,\mu. \]
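As a rough numerical illustration of the statement in its classical form (Otto's result for the ordinary heat equation, not the subriemannian case treated in this thesis), one can discretize the heat equation on a cycle graph and watch the entropy decrease along the flow. The grid, time step and function names below are assumptions made for the sketch; monotonicity of the entropy here follows from Jensen's inequality, and the sketch does not verify the gradient flow structure itself.

```python
import math

def entropy(rho):
    """Discrete Boltzmann entropy: average of rho*log(rho) over the grid (0*log 0 = 0)."""
    return sum(r * math.log(r) for r in rho if r > 0) / len(rho)

def heat_step(rho, dt=0.25):
    """One explicit Euler step of the heat equation on a cycle graph with unit spacing."""
    n = len(rho)
    return [rho[i] + dt * (rho[i - 1] - 2 * rho[i] + rho[(i + 1) % n]) for i in range(n)]

# Density with respect to the uniform measure on 8 nodes; its average is 1.
rho = [4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0]
history = [entropy(rho)]
for _ in range(50):
    rho = heat_step(rho)
    history.append(entropy(rho))

print(history[0], history[-1])  # entropy at the start and after 50 steps
```

With dt = 0.25 each step is an average of a node with its two neighbours, so total mass is preserved and the density stays nonnegative.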
1.3 Part III: Generalized Ricci Curvature Bounds
for Three Dimensional Contact Subriemannian
Manifolds
In the past few years, several connections between the optimal transportation problems
and curvature of Riemannian manifolds were found. One of them is the use of optimal
transportation for an alternative definition of Ricci curvature lower bound developed in a
series of papers [53, 20, 60]. Based on the ideas in these papers, a generalization of Ricci
curvature lower bound for general metric measure spaces, called curvature dimension
condition, is introduced in [39, 40, 58, 59, 48]. However the conditions are not easy
to check and there is no new example. Recently the case of a Finsler manifold was
studied in [49], but the result is very similar to that of the Riemannian case due to strict
convexity of the corresponding Hamiltonian. The situation changes dramatically in the
case of a subriemannian manifold. The reason is that the class of metric spaces we are
dealing with have Hausdorff dimensions strictly greater than their topological dimensions.
Therefore, the interplay of the metrics and the measures for these spaces should be
significantly different from that of the Riemannian or Finsler case. One particular case
of subriemannian manifolds, the Heisenberg group, is studied in [32]. In this case the
space does not satisfy any curvature dimension condition. However, it satisfies a weaker
condition, a modification of the so-called measure contraction property.
In the third part of this thesis, we study a subriemannian version of the measure con-
traction property for all three dimensional contact subriemannian manifolds, generalizing
the corresponding result in [32]. This study uses a subriemannian generalization of the
classical Riemannian curvature. The nature of this invariant is dynamical rather than
metrical: the generalized curvature is the simplest differential invariant for the geodesic flow
defined on the cotangent bundle T ∗M equipped with the bundle structure π : T ∗M → M .
The generalized curvature is easy to compute and we study its role in the measure con-
traction property in this thesis.
The structure of this part of the thesis is as follows. In Section 10 we recall and
introduce several notions on subriemannian geometry necessary for the third part of the
thesis. In Section 11 we recall and specialize the recent result of [37] on the curvature
type invariants of subriemannian manifolds to the three dimensional contact case. We
will also write down the explicit formulas for these invariants in this section. Section 12
contains the main theorem (Theorem 12.3) which shows that if the generalized curvatures
are bounded below by a constant, then the subriemannian manifold satisfies a generalized
measure contraction property MCP (r; 2, 3) (see Definition 12.4 below). In particular, if
the generalized curvatures are non-negative, then the subriemannian manifold satisfies
the measure contraction property MCP (0, 5). As a consequence, these spaces satisfy the
doubling property and a local Poincaré inequality. In Section 13 we specialize to the case
where the contact subriemannian manifolds are related to isoperimetric problems or
particles in magnetic fields. In this case the subriemannian manifold M is the total space
of a principal bundle $\pi_M : M \to N$. The base space N is a smooth surface equipped with
a Riemannian metric descending from the subriemannian metric of the total space M .
The total space M satisfies the measure contraction property MCP (0, 5) if the surface
N has non-negative Gauss curvature (Theorem 13.1). In particular, this is applicable to
the two famous examples: the Heisenberg group and the Hopf fibration.
The main results of this thesis are published in the papers [5, 34, 6]. In particular, the
results in Part I of this thesis are contained in [5], which is accepted for publication by the
Transactions of the American Mathematical Society.
Part I
Optimal Transportation under
Nonholonomic Constraints
Chapter 2
Background
In this chapter, we recall some basic notions and results from optimal control and optimal
transportation theory used in this thesis. We refer to [7, 26] and [63, 64] for more detail
on optimal control and optimal transportation theory, respectively.
2.1 Elementary Optimal Control Theory
A control system is used to single out a family of curves called admissible paths. Such
a system consists of a family of vector fields parameterized by some variables. The
parameter is allowed to change in time so that one can switch from one vector field
to another. This time dependent choice of the parameter is called a control and the
corresponding time dependent vector field is called a control vector field. Integral curves
of the flows of all such control vector fields are called admissible paths. More precisely,
let M be a smooth manifold and let U be a closed subset in the Euclidean space Rm,
which is called the control set. Let F : M ×U → TM be a Lipschitz continuous function
such that Fu := F (·, u) : M → TM are smooth vector fields for each point u in the
control set U . In other words, F defines the family of vector fields mentioned above
parameterized by the variable u. Assume also that the function $(x, u) \mapsto \frac{\partial F}{\partial x}(x, u)$ is
continuous. Curves u(·) : [0, 1] → U in the control set U which are locally bounded and
measurable (i.e. $u(\cdot) \in L^\infty([0, 1], U)$) are called admissible controls (or simply controls).
A control system is the following family of ordinary differential equations parameterized
by the variable u:
\[ \dot{x}(t) = F(x(t), u(t)). \tag{2.1} \]
The solutions x(·) to the above control system are called admissible paths and (x(·), u(·))
are called admissible pairs.
The classical theory of ordinary differential equations implies that a unique solution
to the system (2.1) exists locally for almost all values of time t. Moreover, the resulting
local flow is smooth in the space variable x and Lipschitz in the time variable t. The
control system is complete if the flows of all control vector fields exist globally.
Let $x_0$ and $x_1$ be two points on the manifold M . The set of all admissible pairs
(x(·), u(·)) for which the corresponding admissible paths x(·) start at the point $x_0$ is
denoted by $C_{x_0}$, and the set of those that start at the point $x_0$ and end at the point $x_1$ is denoted
by $C_{x_0}^{x_1}$. A control system is called controllable if any two points can be connected by an
admissible path. In other words, the control system is controllable if the set $C_{x_0}^{x_1}$ is
nonempty for any pair of points $x_0$ and $x_1$ on the manifold.
Fix a smooth function L : M × U → R, called the Lagrangian. The cost of an admissible
pair (x(·), u(·)) is given by the integral $\int_0^1 L(x(t), u(t))\,dt$. The cost $c(x_0, x_1)$ between two
points $x_0$ and $x_1$ is given by taking the infimum of the costs of all the admissible paths
connecting $x_0$ and $x_1$. More precisely,
\[ c(x_0, x_1) = \inf \int_0^1 L(x(t), u(t))\,dt, \tag{2.2} \]
where the infimum is taken over all admissible pairs (x(·), u(·)) for which the admissible
paths x(·) connect the points $x_0$ and $x_1$ (i.e. $x(0) = x_0$ and $x(1) = x_1$). We declare that
the cost is ∞ if there is no admissible path connecting the two points (i.e. when $C_{x_0}^{x_1}$ is
empty).
The cost function defined above is said to be complete if the control system is complete
and given any pair of points (x0, x1) there exists an admissible pair which achieves the
infimum above.
Remark 2.1 The infimum of the problem in (2.2) can be equivalently characterized by
using the backward control system
\[ \dot{x}(s) = -F(x(s), u(s)) \]
for which the admissible paths are the same as those in (2.1) but moving in the opposite
direction in time. The infimum over all admissible paths of the backward control system
which start at the point x1 and end at the point x0 is the same as that in (2.2).
This fact will become important for the later discussion.
Next, we fix a function f and consider the following minimization problem, commonly
known as the Bolza problem:
Problem 2.2 Find an admissible pair (x(·), u(·)) which achieves the following infimum
\[ \inf_{(x(\cdot),u(\cdot)) \in C_{x_0}} \int_0^1 L(x(s), u(s))\,ds - f(x(1)), \]
where the infimum is taken over all admissible pairs (x(·), u(·)) for which the admissible
path x(·) starts at the point x0.
For each point u in the control set U define the corresponding Hamiltonian function
$H_u : T^*M \to \mathbb{R}$ by
\[ H_u(p_x) = p_x(F(x, u)) + L(x, u). \]
If $H : T^*M \to \mathbb{R}$ is a function on the cotangent bundle, we denote its Hamiltonian vector
field by $\vec{H}$. We denote the cotangent bundle projection by $\pi : T^*M \to M$ and recall
that a covector α in $T^*M$ belongs to the subdifferential $d^- f_x$ of a continuous function f
if there is a function g which touches f from below at x and $\alpha = dg_x$. Here, g touching
f from below at x means that f(x) = g(x) and g ≤ f in a neighborhood of x.
Next, we present an elementary version of the Pontryagin maximum principle which
gives necessary conditions for an admissible path to be a minimizer of the Bolza problem.
For the convenience of the readers a proof is given in the appendix.
Theorem 2.3 (Pontryagin Maximum Principle for the Bolza Problem)
Let (x(·), u(·)) be an admissible pair which achieves the infimum in Problem 2.2.
Assume that the function f in Problem 2.2 is sub-differentiable at the point x(1). Then,
for each α in the sub-differential $d^- f_{x(1)}$ of f , there exists a Lipschitz path $p : [0, 1] \to T^*M$
such that p(1) = −α and satisfies the following conditions for almost all values of
time t in the interval [0, 1]:
\[
\begin{aligned}
\pi(p(t)) &= x(t),\\
\dot{p}(t) &= \vec{H}_{u(t)}(p(t)),\\
H_{u(t)}(p(t)) &= \min_{u \in U} H_u(p(t)).
\end{aligned}
\tag{2.3}
\]
Remark 2.4 A distribution of rank k on a manifold M is a vector subbundle of rank k of
the tangent bundle TM (i.e. it is a smooth assignment of a vector subspace of dimension k
in each tangent space). Let ∆ be one such distribution and assume that the distribution
∆ is trivializable, i.e. there exists a system of vector fields $X_1, \dots, X_k$ which span ∆ at
every point: $\Delta_x = \mathrm{span}\{X_1(x), \dots, X_k(x)\}$. Consider the following control system:
\[ \dot{x}(t) = \sum_{i=1}^{k} u_i(t) X_i(x(t)) \tag{2.4} \]
with the initial condition x(0) = x and the final condition x(1) = y. Recall that we
denote by $C_x^y$ the set of all admissible pairs (x(·), u(·)) such that the admissible path x(·)
satisfies x(0) = x and x(1) = y.
Let L : M × U → R be the Lagrangian defined by $L(x, u) = \sum_{i=1}^{k} u_i^2$ and let c be the
corresponding cost function
\[ c(x, y) = \inf_{(x(\cdot),u(\cdot)) \in C_x^y} \int_0^1 L(x(t), u(t))\,dt \tag{2.5} \]
where the infimum is taken over all admissible pairs (x(·), u(·)) for which the corresponding
admissible path x(·) starts at x and ends at y.
If the number k of vector fields is equal to the dimension n of the manifold M , then
the distribution ∆ coincides with the tangent bundle TM of the manifold M and all
paths are admissible. It also defines a Riemannian metric on M by declaring that the
vector fields $X_1, \dots, X_n$ are orthonormal everywhere. The cost function c is the square $d^2$
of the Riemannian distance d and the minimizers of this system correspond to the length
minimizing geodesics on M . However, this does not necessarily work when the tangent
bundle is non-trivializable.
To overcome this difficulty, we can modify the general definition of a control system
in the following way. Let P be a locally trivial bundle over M with the bundle projection
$\pi_P : P \to M$ and let F : P → TM be a fibre preserving map (i.e. the image $F(P_x)$
of each fibre $P_x$ under the map F is contained in the corresponding tangent space $T_xM$
with the same base point x). The control system corresponding to the map F is given
by
\[ \dot{x}(t) = F(v(t)). \tag{2.6} \]
The admissible pairs are locally bounded measurable paths v(·) : [0, 1] → P in the bundle
P which project to a Lipschitz path in M (i.e. x(·) = πP (v(·)) is Lipschitz) and satisfy
the control system (2.6). If we let P be the trivial bundle M × U , we get back the
control system (2.1) introduced earlier. If a Lagrangian L : P → R is fixed, then the
corresponding cost function c is defined by
\[ c(x, y) = \inf_{v(\cdot) \in C_x^y} \int_0^1 L(v(t))\,dt, \tag{2.7} \]
where the infimum is taken over all admissible pairs v(·) : [0, 1] → P such that the
corresponding admissible paths x(·) := πP (v(·)) satisfy x(0) = x and x(1) = y.
Let 〈·, ·〉 be a Riemannian metric on the manifold M . If P is the tangent bundle
TM of M , the map F is the identity map, and the Lagrangian L : P → R is given
by L(v) = 〈v, v〉, then the cost function c is equal to the square $d^2$ of the Riemannian
distance d. Note that the tangent bundle in this case can be non-trivializable and so this
new control system resolves the problem mentioned above.
As in the Riemannian case, if a distribution ∆ is trivializable, then one can define
the corresponding control system (2.4) using the trivializing vector fields X1, ..., Xk.
The admissible paths of the control system (2.4) are the paths tangent to the distribution ∆.
A subriemannian metric 〈·, ·〉S can be defined by declaring that the vector fields X1, ..., Xk
are orthonormal (see Section 3.3 for the basics of subriemannian geometry). The cost
(2.5) is then the square $d_S^2$ of the subriemannian distance $d_S$. For a general distribution ∆ which
is non-trivializable, we consider the general control system (2.6) with the fibre bundle given
by the distribution, P = ∆, and the fibre preserving map F : ∆ → TM given by the
inclusion map. If the Lagrangian L is defined by a subriemannian metric, L(v) = 〈v, v〉S,
then the cost is again the square of the subriemannian distance.
In the first part of the thesis (except for Section 4.2), we consider control systems
of the form (2.1) in order to avoid heavy notation. However, all the results generalize
easily to the more general, intrinsically defined systems just introduced.
2.2 Optimal Mass Transportation
The theory of optimal mass transportation deals with moving one mass to another in
the most efficient way. Mathematically, the masses are replaced by Borel probability
measures µ and ν on a manifold M , the transportation strategy is replaced by a map
ϕ : M → M and the efficiency is measured by a cost function c : M × M → R ∪ {+∞}.
The total cost of the strategy ϕ is given by the average
\[ \int_M c(x, \varphi(x))\, d\mu. \]
The strategy ϕ is said to move the mass µ to ν if the map ϕ pushes the measure µ
forward to ν. Here, we recall that the push forward measure $\varphi_*\mu$ of a measure µ by a
Borel map ϕ is defined by $\varphi_*\mu(B) = \mu(\varphi^{-1}(B))$ for all Borel sets B in M. The problem
of optimal mass transportation is to show existence and uniqueness of a Borel map which
pushes µ forward to ν and minimizes the total cost. More precisely,
Problem 2.5 Find a Borel map ϕ which achieves the infimum
\[ \inf_{\varphi_*\mu = \nu} \int_M c(x, \varphi(x))\, d\mu \]
among all Borel maps ϕ : M → M that push the Borel probability measure µ forward to
ν.
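For finitely supported measures, Problem 2.5 can be explored by brute force. The sketch below is only an illustration under stated assumptions: µ and ν are uniform on three points of the real line, the cost is c(x, y) = (x − y)^2, and the helper names `push_forward` and `monge_brute_force` are hypothetical. For uniform measures on n distinct points each, the maps with ϕ∗µ = ν are exactly the bijections, so it suffices to enumerate permutations.

```python
from fractions import Fraction
from itertools import permutations


def push_forward(mu, phi):
    """Push-forward of a discrete measure mu (dict point -> mass) by a map phi."""
    nu = {}
    for x, mass in mu.items():
        y = phi(x)
        nu[y] = nu.get(y, 0) + mass
    return nu


def monge_brute_force(sources, targets, cost):
    """Minimize the average cost over all bijections sources -> targets."""
    best_cost, best_map = None, None
    for perm in permutations(targets):
        total = Fraction(sum(cost(x, y) for x, y in zip(sources, perm)), len(sources))
        if best_cost is None or total < best_cost:
            best_cost, best_map = total, dict(zip(sources, perm))
    return best_cost, best_map


def quadratic_cost(x, y):
    return (x - y) ** 2


sources, targets = [0, 1, 4], [2, 3, 7]
mu = {x: Fraction(1, 3) for x in sources}
nu = {y: Fraction(1, 3) for y in targets}
best_cost, best_map = monge_brute_force(sources, targets, quadratic_cost)
```

In this toy example the minimizer is the monotone rearrangement, as expected for a convex cost on the line.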
In many cases, by assuming absolute continuity of the measure µ with respect to the
Lebesgue measure, such a problem admits a solution which is unique (up to µ-measure
zero). This unique solution to Problem 2.5 is called the optimal map or the Brenier map.
The first optimal map was found by Brenier in [16] in the case where the manifold
was $\mathbb{R}^n$ and the cost was $c(x, y) = |x - y|^2$. Later, it was generalized to arbitrary closed
connected Riemannian manifolds in [42], with the cost given by the square of the
Riemannian distance. The case of the Heisenberg group with the cost given by the
square of the subriemannian distance or of the gauge distance was treated in [11]. In [14]
a much more general cost arising from the Lagrange problem was considered. To define
such a cost, let L : TM → R be a Lagrangian function on the tangent bundle TM of a
compact manifold M . The cost is given by minimizing the action corresponding to the
Lagrangian. More precisely,
\[ c(x, y) = \inf_{x(0)=x,\ x(1)=y} \int_0^1 L(x(t), \dot{x}(t))\, dt, \tag{2.8} \]
where the infimum is taken over all smooth curves x(·) joining the points x and y.
The existence and uniqueness of an optimal map with the cost given by (2.8) was
shown under the following assumptions:
• The Lagrangian L is fibrewise strictly convex, i.e. the restriction of L to each
tangent space TxM is strictly convex.
• L has superlinear growth, i.e. L(v)/|v| → ∞ as |v| → ∞.
• The cost c is complete, i.e. the infimum (2.8) is always achieved by some $C^2$ path.
Recently, the compactness assumption on the manifold or on the measures was eliminated
in [23, 22].
In this thesis a connected manifold M without boundary is considered and the cost
function c is given by the optimal control problem (2.2). Next, we consider the relaxed
version of Problem 2.5, called the Kantorovich reformulation. Let π1 : M × M → M
and π2 : M × M → M be the projections onto the first and the second components,
respectively. Let Γ be the set of all joint measures Π on the product manifold M × M
with marginals µ and ν: $\pi_{1*}\Pi = \mu$ and $\pi_{2*}\Pi = \nu$.
Problem 2.6 Find a measure Π which achieves the following infimum
\[ C(\mu, \nu) := \inf_{\Pi \in \Gamma} \int_{M \times M} c(x, y)\, d\Pi(x, y), \]
where the infimum is taken over all joint measures Π on the product manifold M ×M
with marginals µ and ν.
Remark 2.7 If ϕ is a Borel map pushing µ forward to ν, as in Problem 2.5, then $(\mathrm{id} \times \varphi)_*\mu$ is a joint
measure in the set Γ with the same total cost. It follows that Problem 2.6 is a relaxation of Problem 2.5.
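The relation between transport maps and joint measures can be checked directly on a finite example. This is a minimal sketch with made-up data: the measure `mu`, the map `phi` and the helper names are assumptions chosen for illustration. It builds the plan (id × ϕ)∗µ, computes its marginals, and compares the cost of the map with the cost of the plan.

```python
from fractions import Fraction


def plan_from_map(mu, phi):
    """The joint measure (id x phi)_* mu, stored as dict (x, y) -> mass."""
    return {(x, phi(x)): mass for x, mass in mu.items()}


def marginals(plan):
    """First and second marginals of a discrete joint measure."""
    first, second = {}, {}
    for (x, y), mass in plan.items():
        first[x] = first.get(x, 0) + mass
        second[y] = second.get(y, 0) + mass
    return first, second


def cost(x, y):
    return (x - y) ** 2


def phi(x):
    # Hypothetical, non-injective transport map.
    return 10 + x % 2


mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
plan = plan_from_map(mu, phi)
first, second = marginals(plan)

map_cost = sum(cost(x, phi(x)) * mass for x, mass in mu.items())
plan_cost = sum(cost(x, y) * mass for (x, y), mass in plan.items())
```

Here `first` recovers µ, `second` is the push-forward of µ by `phi`, and the two costs coincide, which is the content of the remark in this discrete setting.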
Before we proceed with the existence proof of an optimal map, let us look at the
following dual problem of Kantorovich, see [63] for history and the importance of this
dual problem for optimal transportation.
Let c be a cost function and let f be a function on the manifold M. The c1-transform
of the function f is the function $f^{c_1}$ given by
\[ f^{c_1}(y) := \inf_{x \in M} [c(x, y) - f(x)]. \]
Similarly, the c2-transform of the function f is defined by
\[ f^{c_2}(x) := \inf_{y \in M} [c(x, y) - f(y)]. \]
The function f is a c-concave function if it satisfies $f^{c_1 c_2} = f$. Let F be the set of all pairs
of functions (g, h) on the manifold such that g : M → R ∪ {−∞} and h : M → R ∪ {−∞} are in $L^1(\mu)$ and $L^1(\nu)$ respectively, and g(x) + h(y) ≤ c(x, y) for all pairs (x, y) in the
product manifold M×M . The dual problem of Kantorovich is the following maximization
problem:
Problem 2.8
\[ \sup_{(g,h) \in F} \left( \int_M g\, d\mu + \int_M h\, d\nu \right). \]
The existence of a solution to the above problem is well known, see [63, Theorem 1.3].
Theorem 2.9 Assume that there exist two functions a and b such that a is µ-measurable,
b is ν-measurable and the cost function c satisfies c(x, y) ≤ a(x) + b(y) for all (x, y) in
M × M. If c is also continuous, bounded below, and C(µ, ν) < ∞, then there exists a
c-concave function f such that f is in $L^1(\mu)$, its c1-transform $f^{c_1}$ is in $L^1(\nu)$
and the pair $(f, f^{c_1})$ achieves the supremum in Problem 2.8.
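The c1- and c2-transforms are easy to experiment with on a finite grid, where the infima become minima. The sketch below is an illustration only: the grid, the quadratic cost and the test function `f` are arbitrary assumptions, and integer data are used so that all comparisons are exact. It checks three standard facts: $f^{c_1 c_2} \ge f$, the pair $(f^{c_1 c_2}, f^{c_1})$ satisfies the constraint of Problem 2.8, and applying the transform a third time changes nothing.

```python
def c1_transform(f, cost, xs, ys):
    """f^{c1}(y) = min over x of [c(x, y) - f(x)] on finite grids."""
    return {y: min(cost(x, y) - f[x] for x in xs) for y in ys}


def c2_transform(h, cost, xs, ys):
    """h^{c2}(x) = min over y of [c(x, y) - h(y)] on finite grids."""
    return {x: min(cost(x, y) - h[y] for y in ys) for x in xs}


def cost(x, y):
    return (x - y) ** 2


xs = ys = list(range(-3, 4))
f = {x: -abs(x) ** 3 for x in xs}  # arbitrary integer-valued test function

f_c1 = c1_transform(f, cost, xs, ys)
f_c1c2 = c2_transform(f_c1, cost, xs, ys)      # the c-concave envelope of f
f_c1c2c1 = c1_transform(f_c1c2, cost, xs, ys)

dominates = all(f_c1c2[x] >= f[x] for x in xs)
feasible = all(f_c1c2[x] + f_c1[y] <= cost(x, y) for x in xs for y in ys)
```

In this finite setting a function f is c-concave exactly when `f_c1c2 == f`; the envelope `f_c1c2` always is.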
The following theorem on the regularity of the dual pair above is also well known;
stronger results can be found in [64, Chapter 12]. We give a simple proof here for the
convenience of the reader.
Theorem 2.10 Assume that the cost c(x, y) is continuous and bounded below, and that the
measures µ and ν are compactly supported. Then the functions f and $f^{c_1}$ are upper
semicontinuous. If the function x ↦ c(x, y) is also locally Lipschitz on a set U and the Lipschitz
constant is independent of y locally, then f can be chosen locally Lipschitz on U.
Proof Fix ε > 0. Since $f(x) = \inf_{y \in M} [c(x, y) - f^{c_1}(y)]$, there exists y such that
$f(x) + \varepsilon/2 > c(x, y) - f^{c_1}(y)$. Also, we have $f(x') + f^{c_1}(y) \le c(x', y)$ for any x′ in M.
Combining these two inequalities gives $f(x') - f(x) < c(x', y) - c(x, y) + \varepsilon/2$ and so, by
continuity of the cost c, we have
\[ f(x') - f(x) < \varepsilon \]
for any x′ sufficiently close to x. Therefore, f is upper semicontinuous.
Let K be a compact set containing the supports of the measures µ and ν. Let
\[ g(x) = \begin{cases} f(x), & \text{if } x \in K \\ -\infty, & \text{if } x \in M \setminus K, \end{cases} \qquad g'(x) = \begin{cases} f^{c_1}(x), & \text{if } x \in K \\ -\infty, & \text{if } x \in M \setminus K, \end{cases} \]
then the pair (g, g′) achieves the maximum in Problem 2.8. Let $h = (g')^{c_2}$, then the pair
$(h, h^{c_1})$ also achieves the maximum. By the definition of g′, we have $h(x) = \inf_{y \in K} [c(x, y) - f^{c_1}(y)]$.
By the same argument as in the proof of upper semicontinuity, for any x and x′
in a compact subset K′ of U we can find y in K such that
\[ h(x') - h(x) < c(x', y) - c(x, y) + \varepsilon/2. \]
By the assumption on the cost c, the above inequality becomes
\[ h(x') - h(x) \le k\, d(x, x') + \varepsilon/2 \]
for some constant k > 0 which is independent of x and x′ in the subset K′. Since ε > 0 is
arbitrary, switching the roles of x and x′ gives the result. □
The following theorem about minimizers in Problem 2.6 is well-known. See, for in-
stance, [63, Chapter 2], [64, Theorem 5.10].
Theorem 2.11 Under the assumptions of Theorem 2.9, Problem 2.6 admits a minimizer.
Moreover, the joint measure Π in the set Γ achieves the infimum in Problem 2.6 if and
only if Π is concentrated on the set
\[ \{(x, y) \in M \times M \mid f(x) + f^{c_1}(y) = c(x, y)\}. \]
Chapter 3
Nonholonomic Optimal
Transportation Problem
3.1 Existence and Uniqueness of an Optimal Map
In this section, we show that the optimal transportation problem with the cost given by
an optimal control cost (1.3) can be solved under certain regularity assumptions. Let
H : T∗M → R be the function defined by
\[ H(p_x) = \max_{u \in U} \left( p_x(F(x, u)) - L(x, u) \right). \]
If H is well-defined and $C^2$, then we denote its Hamiltonian vector field by $\vec{H}$ and let
$e^{t\vec{H}}$ be its flow. Let f be the function defined in Theorem 2.9 and consider the map
ϕ : M × [0, 1] → M defined by $\varphi(x, t) = \pi(e^{t\vec{H}}(-df_x))$. The following joint result with
A. Agrachev is the precise version of Theorem 1.2.
Theorem 3.1 [5] The map $x \mapsto \varphi_1(x) := \varphi(x, 1)$ is the unique (up to µ-measure zero)
optimal map for Problem 2.5 with the cost c given by formula (2.2) under the following
assumptions:
1. The measures µ and ν are compactly supported and µ is absolutely continuous with
respect to the Lebesgue measure.
2. c is bounded below, and c(x, y) is locally Lipschitz in the x variable with a
Lipschitz constant that is locally independent of y.
3. The cost c is complete, i.e. given any pair of points (x0, x1) in the manifold M,
there exists an admissible pair (x(·), u(·)) which achieves the infimum for
the cost (2.2), where u(·) is locally bounded measurable, x(0) = x0 and x(1) = x1.
4. The Hamiltonian function H defined in (3.5) is well-defined and $C^2$.
5. The Hamiltonian vector field $\vec{H}$ is complete, i.e. its flow exists for all time.
The rest of this section is devoted to the proof of Theorem 3.1. Let $C_y$ be the set of
all admissible pairs such that the corresponding admissible paths x(·) start at the point
y (x(0) = y) and satisfy the following backward control system:
\[ \dot{x}(t) = -F(x(t), u(t)). \tag{3.1} \]
Let $C_y^x$ be the set of all those pairs in $C_y$ such that the corresponding admissible paths
x(·) end at the point x: x(1) = x.
First, we have the following observation:
Proposition 3.2 Let x be a point which achieves the infimum $f^{c_1}(y) = \inf_{x \in M} (c(x, y) - f(x))$
and let (x(·), u(·)) be an admissible pair in $C_y^x$ such that the corresponding admissible path
x(·) minimizes the cost given by
\[ c(x, y) = \inf_{(x(\cdot),u(\cdot)) \in C_y^x} \int_0^1 L(x(t), u(t))\, dt, \]
then (x(·), u(·)) achieves the following infimum
\[ f^{c_1}(y) = \inf_{(x(\cdot),u(\cdot)) \in C_y} \left( \int_0^1 L(x(s), u(s))\, ds - f(x(1)) \right). \tag{3.2} \]
If $\tilde{x}(t) = x(1 - t)$ and $\tilde{u}(t) = u(1 - t)$, then $(\tilde{x}(\cdot), \tilde{u}(\cdot))$ achieves the following infimum
\[ f^{c_1}(y) = \inf_{(\tilde{x}(\cdot),\tilde{u}(\cdot)) \in \tilde{C}_y} \left( \int_0^1 L(\tilde{x}(s), \tilde{u}(s))\, ds - f(\tilde{x}(0)) \right), \tag{3.3} \]
where $\tilde{C}_y$ denotes the set of all admissible pairs $(\tilde{x}(\cdot), \tilde{u}(\cdot))$ satisfying the following control
system:
\[ \dot{\tilde{x}}(t) = F(\tilde{x}(t), \tilde{u}(t)), \qquad \tilde{x}(1) = y. \]
Let u(·) be as in the above Proposition and let $\tilde{u}(t) = u(1 - t)$. Let $H_t$ : T∗M → R be
given by $H_t(p_x) = p_x(F(x, \tilde{u}(t))) - L(x, \tilde{u}(t))$. The following is a consequence of Theorem
2.3.
Proposition 3.3 Let x(·) be a curve that achieves the infimum in (3.2) and let $\tilde{x}(t) = x(1 - t)$.
Assume that α is contained in the subdifferential of the function f at the point
$\tilde{x}(0)$, then there exists a Lipschitz curve $\tilde{p}$ : [0, 1] → T∗M in the cotangent bundle such
that $\tilde{p}(0) = -\alpha$ and the following conditions hold for almost all times t in the interval
[0, 1]:
\[ \begin{aligned} &\pi(\tilde{p}(t)) = \tilde{x}(t), \\ &\dot{\tilde{p}}(t) = \vec{H}_t(\tilde{p}(t)), \\ &H_t(\tilde{p}(t)) = \max_{u \in U} \left( \tilde{p}(t)(F(\tilde{x}(t), u)) - L(\tilde{x}(t), u) \right). \end{aligned} \tag{3.4} \]
Proof By Theorem 2.3, there exists a curve p : [0, 1] → T∗M in the cotangent bundle
T∗M such that
\[ \begin{aligned} &\pi(p(t)) = x(t), \\ &p(1) = -\alpha, \\ &\dot{p}(t) = \vec{H}^{u(t)}(p(t)), \\ &H^{u(t)}(p(t)) = \min_{u \in U} \left( -p(t)(F(x(t), u)) + L(x(t), u) \right), \end{aligned} \]
where $H^{u}(p_x) = -p_x(F(x, u)) + L(x, u)$.
Let $\tilde{p}(t) = p(1 - t)$, $\tilde{x}(t) = x(1 - t)$ and $\tilde{u}(t) = u(1 - t)$. Since $H_t = -H^{\tilde{u}(t)}$, the equations above become
\[ \begin{aligned} &\pi(\tilde{p}(t)) = \tilde{x}(t), \\ &\tilde{p}(0) = -\alpha, \\ &\dot{\tilde{p}}(t) = \vec{H}_t(\tilde{p}(t)), \\ &H_t(\tilde{p}(t)) = \max_{u \in U} \left( \tilde{p}(t)(F(\tilde{x}(t), u)) - L(\tilde{x}(t), u) \right). \end{aligned} \]
□
Assume that the Hamiltonian function H : T∗M → R defined by
\[ H(p_x) = \max_{u \in U} \left( p_x(F(x, u)) - L(x, u) \right) \tag{3.5} \]
is well-defined and $C^2$. Let $\vec{H}$ be the Hamiltonian vector field of the function H and
let $e^{t\vec{H}}$ be its flow. The function f defined in Theorem 2.9 is Lipschitz and so it is
differentiable almost everywhere by the Rademacher Theorem. Moreover, the map df :
M → T∗M is measurable and locally bounded. So, if we let ϕ : M × [0, 1] → M be the
map defined by $\varphi(x, t) = \pi(e^{t\vec{H}}(-df_x))$, then the map ϕ is a Borel map.
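For the quadratic Lagrangian $L(x, u) = \sum_i u_i^2$ and a control-linear system $F(x, u) = \sum_i u_i X_i(x)$, the maximum in (3.5) can be computed by hand: it is attained at $u_i = p_x(X_i)/2$ and equals $\frac{1}{4}\sum_i p_x(X_i)^2$. The sketch below compares this closed form with a brute-force maximization over a grid of controls. The Heisenberg frame on R^3, the sample point and covector, and all function names are assumptions made for illustration, not data from the thesis.

```python
def heisenberg_frame(q):
    """Two vector fields spanning the Heisenberg distribution on R^3
    (an illustrative choice of X1, X2)."""
    x, y, _ = q
    return [(1.0, 0.0, -y / 2), (0.0, 1.0, x / 2)]


def pair(p, v):
    """Evaluate the covector p on the vector v."""
    return sum(a * b for a, b in zip(p, v))


def hamiltonian_closed_form(p, q):
    # max over u of sum_i (u_i p(X_i) - u_i^2) is attained at u_i = p(X_i) / 2.
    return sum(pair(p, X) ** 2 for X in heisenberg_frame(q)) / 4


def hamiltonian_grid(p, q, radius=4.0, steps=65):
    """Brute-force maximum of p(F(x, u)) - L(x, u) over a grid of controls u."""
    a = [pair(p, X) for X in heisenberg_frame(q)]
    grid = [-radius + 2 * radius * i / (steps - 1) for i in range(steps)]
    return max(a[0] * u1 + a[1] * u2 - u1 * u1 - u2 * u2
               for u1 in grid for u2 in grid)


q = (1.0, 2.0, 0.0)
p = (1.0, 0.5, 2.0)
h_exact = hamiltonian_closed_form(p, q)
h_grid = hamiltonian_grid(p, q)
```

With these sample values the maximizing control lies on the grid, so the two numbers agree up to rounding; in general the grid value is only a lower bound for H.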
Proposition 3.4 Under the assumptions of Theorem 3.1, for µ-almost every point x in the
support of µ there exists a unique point y such that
\[ f(x) + f^{c_1}(y) = c(x, y). \]
Moreover, the points x and y are related by y = ϕ(x, 1).
Proof We first claim that the infimum $f(x) = \inf_{y \in M} [c(x, y) - f^{c_1}(y)]$ is achieved for
µ-almost all x. Indeed, by assumption, we have $f(x) + f^{c_1}(y) \le c(x, y)$ for all (x, y) in
M × M. Also, if Π is the optimal measure in Theorem 2.11, then $f(x) + f^{c_1}(y) = c(x, y)$
for Π-almost all (x, y). Since the first marginal of the measure Π is µ, for µ-almost all x
in the manifold M there exists y in M such
that $f(x) + f^{c_1}(y) = c(x, y)$. This proves the claim.
Now, fix a point x for which the infimum $\inf_{y \in M} [c(x, y) - f^{c_1}(y)]$ is achieved and let
y be the point which achieves the infimum. By the proof of the above claim, x achieves
the infimum $f^{c_1}(y) = \inf_{x \in M} [c(x, y) - f(x)]$. Therefore, by completeness of the cost c
and Proposition 3.2, there exists an admissible path $\tilde{x}(\cdot)$ such that $\tilde{x}(0) = x$, $\tilde{x}(1) = y$ and
$\tilde{x}(\cdot)$ achieves the infimum (3.3).
Since f is Lipschitz on a bounded open set U containing the supports of µ and ν,
it is almost everywhere differentiable on U by the Rademacher Theorem. Since µ is
absolutely continuous with respect to the Lebesgue measure, f is also differentiable
µ-almost everywhere. By Proposition 3.3, for µ-almost all x, there exists a curve
$\tilde{p}(\cdot)$ : [0, 1] → T∗M in the cotangent bundle T∗M such that
\[ \begin{aligned} &\dot{\tilde{p}}(t) = \vec{H}_t(\tilde{p}(t)), \\ &\tilde{p}(0) = -df_x, \\ &\pi(\tilde{p}(t)) = \tilde{x}(t), \\ &H_t(\tilde{p}(t)) = \max_{u \in U} \left( \tilde{p}(t)(F(\tilde{x}(t), u)) - L(\tilde{x}(t), u) \right), \end{aligned} \]
where $H_t$ is the function on the cotangent bundle T∗M given by $H_t(p_x) = p_x(F(x, \tilde{u}(t))) - L(x, \tilde{u}(t))$.
By the definition of H, we have $H(\tilde{p}(t)) = H_t(\tilde{p}(t))$. But we also have $H(p) \ge H_t(p)$
for all p ∈ T∗M. Since both H and $H_t$ are $C^2$ and the difference $H - H_t$ attains its minimum
at $\tilde{p}(t)$, we have $dH(\tilde{p}(t)) = dH_t(\tilde{p}(t))$. Hence,
$\vec{H}_t(\tilde{p}(t)) = \vec{H}(\tilde{p}(t))$ for almost all t. The result follows from the uniqueness of solutions
to an ODE.
□
The rest of the arguments for the existence and uniqueness of an optimal map follow
from Theorem 2.11.
Proof of Theorem 3.1
As mentioned above, Problem 2.6 is a relaxation of Problem 2.5. We can recover
the latter from the former by restricting the minimization to joint measures of the form
$(\mathrm{id} \times \phi)_*\mu$, where φ is any Borel map pushing µ forward to ν. Therefore, the statement
of Theorem 3.1 follows from Theorem 2.11 and Proposition 3.4.
□
3.2 Regularity of Control Costs
In Theorem 3.1 we proved the existence and uniqueness of optimal maps under certain
regularity conditions on the cost. Most of the conditions in the theorem are easy to
verify, except for conditions (2) and (3), namely the Lipschitz regularity and the
completeness of the cost. In this section we present simple-to-verify
conditions implying (2) and (3). First, we recall some basic notions in
the geometry of optimal control problems, see [7, 32] and references therein for details.
Fix a point x0 in the manifold M and assume that the control set U is $\mathbb{R}^k$. In
this section we change our previous convention on admissible controls. From now on
admissible controls are mappings in $L^1([0, 1], U)$, rather than $L^\infty([0, 1], U)$. Denote by
$C_{x_0}$ the set of all admissible pairs (x(·), u(·)) such that the corresponding admissible paths
x(·) start at x0. Moreover, we assume that the control system is of the following form:
\[ \dot{x}(t) = X_0(x(t)) + \sum_{i=1}^k u_i(t) X_i(x(t)), \tag{3.6} \]
where u(t) = (u1(t), ..., uk(t)) is a control and X0, X1, ..., Xk are fixed smooth vector fields
on the manifold M . The Cauchy problem for system (3.6) is well-posed for any locally
integrable vector-function u(·). We assume throughout this section that system (3.6) is
complete, i.e. all solutions of the system are defined on the whole semi-axis [0, +∞). This
completeness assumption is automatically satisfied if one of the following is true: either
(i) M is a compact manifold, or (ii) M is a Lie group and the fields Xi are left-invariant,
or (iii) M is a closed submanifold of the Euclidean space and the vector fields Xi satisfy
the following growth conditions |Xi(x)| ≤ c(1 + |x|), i = 0, 1, . . . k.
Define the endpoint map $\mathrm{End}_{x_0} : L^1([0, 1], \mathbb{R}^k) \to M$ by
\[ \mathrm{End}_{x_0}(u(\cdot)) = x(1), \]
where (x(·), u(·)) is the admissible pair corresponding to the control system (3.6) with
initial condition x(0) = x0. It is known that the endpoint map $\mathrm{End}_{x_0}$ is a smooth map
(see [45]). The critical points of the map $\mathrm{End}_{x_0}$ are called singular controls. Admissible
paths corresponding to singular controls are called singular paths.
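The endpoint map can be approximated numerically by integrating the control system. The following is a rough sketch under explicit assumptions: the system is the driftless Heisenberg system on R^3 (X0 = 0, with an illustrative choice of X1, X2), the integrator is the explicit midpoint rule, and the names `endpoint_map` and `circle_control` are hypothetical. For the control that traces a circle in the (x, y)-plane, the third coordinate of the endpoint is the enclosed area, 1/(4π).

```python
import math


def endpoint_map(x0, control, steps=2000):
    """Approximate End_{x0}(u) for the system
    x' = u1, y' = u2, z' = (x * u2 - y * u1) / 2 using the midpoint rule."""
    def velocity(q, t):
        u1, u2 = control(t)
        return (u1, u2, (q[0] * u2 - q[1] * u1) / 2)

    q, dt = tuple(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        k1 = velocity(q, t)
        mid = tuple(a + 0.5 * dt * b for a, b in zip(q, k1))
        k2 = velocity(mid, t + 0.5 * dt)
        q = tuple(a + dt * b for a, b in zip(q, k2))
    return q


def circle_control(t):
    """Unit-speed control whose projected path is a circle of radius 1 / (2 pi)."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))


end = endpoint_map((0.0, 0.0, 0.0), circle_control)
expected_z = 1 / (4 * math.pi)
```

The projected path closes up, so the endpoint differs from the starting point only in the direction of the bracket [X1, X2].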
We also need the Hessian of the mapping $\mathrm{End}_{x_0}$ at a critical point (see [7] for
details). For this let us consider the following general situation. Let E be a Banach space
which is a dense subspace of a Hilbert space H. Consider a map Φ : E → $\mathbb{R}^n$ for which
the restriction Φ|W to any finite dimensional subspace W of the Banach space E is $C^2$.
Moreover, we assume that the first and second derivatives of all the restrictions Φ|W are
continuous in the Hilbert space topology on the bounded subsets of the Banach space E.
In other words, for each point w in the space W, the following Taylor expansion holds:
\[ \Phi(v + w) - \Phi(v) = D_v\Phi(w) + \frac{1}{2} D_v^2\Phi(w) + o(|w|^2), \]
where $D_v\Phi$ is a linear map and $D_v^2\Phi$ is a quadratic mapping from E to $\mathbb{R}^n$. Moreover,
Φ(v), $D_v\Phi|_W$, and $D_v^2\Phi|_W$ depend continuously on v in the topology of the Hilbert space
H, while v is contained in a ball of E.
The Hessian $\mathrm{Hess}_v\Phi : \ker D_v\Phi \to \mathrm{coker}\, D_v\Phi$ of the function Φ is the restriction of the
second derivative $D_v^2\Phi$ to the kernel of the first derivative $D_v\Phi$ with values considered
up to the image of $D_v\Phi$. The Hessian is an intrinsically defined operator, i.e. the part of
$D_v^2\Phi$ which does not depend on the choice of coordinates in the source Banach space E and in the
target Euclidean space $\mathbb{R}^n$.
Let p be a covector in the dual space $\mathbb{R}^{n*}$ which annihilates the image of the first
derivative, $p(D_v\Phi) = 0$; then $p(\mathrm{Hess}_v\Phi)$ is a well-defined real quadratic form on $\ker D_v\Phi$.
We denote the Morse index of this quadratic form by $\mathrm{ind}(p\,\mathrm{Hess}_v\Phi)$. Recall that the
Morse index of a quadratic form is the supremum of the dimensions of the subspaces
on which the form is negative definite.
Definition 3.5 A critical point v of the map Φ is called sharp if there exists a covector
p ≠ 0 such that $p(D_v\Phi) = 0$ and $\mathrm{ind}(p\,\mathrm{Hess}_v\Phi) < +\infty$.
Needless to say, the spaces E, H, and $\mathbb{R}^n$ can be replaced by smooth manifolds (Banach,
Hilbert, and n-dimensional, respectively) in the above considerations.
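In finite dimensions the Morse index of a quadratic form $v \mapsto v^T A v$ is simply the number of negative eigenvalues of the symmetric matrix A, so the finiteness condition in Definition 3.5 only has content in infinite dimensions. Still, a small computation may help fix the notion. The sketch below is an illustration, not a method from the thesis: it counts negative pivots of symmetric Gaussian elimination, which by Sylvester's law of inertia equals the Morse index, and it assumes that all leading principal minors are nonzero since no pivoting is performed.

```python
from fractions import Fraction


def morse_index(matrix):
    """Number of negative pivots in Gaussian elimination of a symmetric matrix.
    Assumes every leading principal minor is nonzero (no pivoting)."""
    a = [[Fraction(entry) for entry in row] for row in matrix]
    n = len(a)
    negatives = 0
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: this simple sketch does not pivot")
        if pivot < 0:
            negatives += 1
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return negatives


index_saddle = morse_index([[1, 2], [2, 1]])                       # eigenvalues 3 and -1
index_mixed = morse_index([[-2, 0, 0], [0, -1, 1], [0, 1, 3]])
index_positive = morse_index([[2, 1], [1, 2]])
```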
Going back to the control system (3.6), let (x(·), u(·)) be an admissible pair for this
system. We say that the control u(·) and the path x(·) are sharp if u(·) is a sharp critical
point of the endpoint map $\mathrm{End}_{x(0)}$.
One necessary condition for controls and paths to be sharp is the so-called Goh
condition.
Proposition 3.6 (The Goh condition) If $\mathrm{ind}(p\,\mathrm{Hess}_{u(\cdot)}\mathrm{End}_{x(0)}) < +\infty$, then
\[ p(t)(X_i(x(t))) = p(t)([X_i, X_j](x(t))) = 0, \qquad i, j = 1, \ldots, k, \quad 0 \le t \le 1, \]
where $p(t) = P_{t,1}^* p$ and $P_{t,\tau}$ is the time-dependent local flow of the control system (3.6)
with control equal to u(·) and the time parameter τ.
See [7, Proposition 20.3, 20.4], [9] and references therein for the proof and other effective
necessary and sufficient conditions of sharpness.
Now consider the optimal control problem of finding the minimizers for
\[ c(x, y) = \inf_{(x(\cdot),u(\cdot)) \in C_x^y} \int_0^1 L(x(t), u(t))\, dt, \tag{3.7} \]
where the infimum ranges over all admissible pairs (x(t), u(t)) corresponding to the con-
trol system (3.6) with the initial condition x(0) = x and the final condition x(1) = y.
Let H : T∗M → R be the Hamiltonian function defined in (3.5). For simplicity, we
assume that the Hamiltonian is $C^2$. A minimizer x(·) of the above problem is called
normal if there exists a curve p : [0, 1] → T∗M in the cotangent bundle T∗M such
that π(p(t)) = x(t) and p(·) is a trajectory of the Hamiltonian vector field $\vec{H}$. Singular
minimizers are also called abnormal. According to this not-so-perfect terminology,
a minimizer can simultaneously be normal and abnormal. A minimizer which is not
normal is called strictly abnormal. Under some regularity and growth conditions on the
Lagrangian L, all strictly abnormal minimizers are sharp, see Theorem 3.7.
The following theorem gives simple sufficient conditions for completeness of the cost
function defined in (3.7). It is a combination of the well-known existence result (see [56])
and necessary optimality conditions (see [7]).
Theorem 3.7 (Completeness of costs) Let L be a Lagrangian function which satisfies
the following:
1. L is bounded below and, on each compact subset of M, there exists a constant K > 0
such that the ratio $\frac{|u|}{L(x,u)+K}$ tends to 0 uniformly as |u| → ∞;
2. for any compact C ⊂ M there exist constants a, b > 0 such that
$\left|\frac{\partial L}{\partial x}(x, u)\right| \le a(L(x, u) + |u|) + b$, ∀x ∈ C, u ∈ $\mathbb{R}^k$;
3. the function u ↦ L(x, u) is a strongly convex function for all x ∈ M.
Then, for each pair of points (x, y) in the manifold M which satisfy c(x, y) < +∞,
there exists an admissible pair (x(·), u(·)) achieving the infimum in (3.7). Moreover, the
minimizer x(·) is either a normal or a sharp path.
Remark 3.8 Under the assumptions of Theorem 3.7, strictly abnormal minimizers are
sharp.
Remark 3.9 Theorem 3.7 leads to many examples that satisfy condition (3) in Theorem
3.1. In particular, this applies to the case where the control set is U = $\mathbb{R}^k$ and the
Lagrangian is $L(x, u) = \sum_{i=1}^k u_i^2$.
Remark 3.10 It was shown that the normal optimal controls in Theorem 3.7 are locally
bounded, see [56]. This allows us to restrict the endpoint map to $L^\infty([0, 1], U)$ in Theorem
3.11 below.
Next, we proceed to the main result of this section, which concerns the Lipschitz
regularity of the cost function. This takes care of condition (2) in Theorem 3.1. The
following theorem is also a more precise version of Theorem 1.3.
Theorem 3.11 [5] (Lipschitz regularity) Assume that the system (2.4) does not admit
sharp controls and the Lagrangian L satisfies the conditions of Theorem 3.7. Then the set
\[ D = \{(x, \mathrm{End}_x(u(\cdot))) \mid x \in M,\ u \in L^\infty([0, 1], \mathbb{R}^k)\} \]
is open in the product M × M. Moreover, the function (x, y) ↦ c(x, y) is locally
Lipschitz on the set D, where the cost c is given by (2.2).
Remark 3.12 In the case where the endpoint map is a submersion, there are no singular
controls. Therefore, Theorem 3.11 is applicable. In particular, this theorem, together with
Theorems 3.1 and 3.7, can be used to treat the cases considered in [16, 42, 14]. In Section
3.3, we will consider a class of examples where the endpoint map is not necessarily a
submersion, but Theorem 3.11 is still applicable.
The rest of the section is devoted to the proof of Theorem 3.11.
Definition 3.13 Given v in the Banach space E, we write $\mathrm{IND}_v\Phi \ge m$ if
\[ \mathrm{ind}(p\,\mathrm{Hess}_v\Phi) - \mathrm{codim}\,\mathrm{im}\, D_v\Phi \ge m \]
for any nonzero covector p in $\mathbb{R}^{n*} \setminus \{0\}$ which annihilates the image of the first derivative
$D_v\Phi$: $p(D_v\Phi) = 0$.
It is easy to see that $\{v \in E : \mathrm{IND}_v\Phi \ge m\}$ is an open subset of E for any integer m.
Let $B_v(\varepsilon)$, $B_x(\varepsilon)$ be the balls of radius ε in E and $\mathbb{R}^n$ centered at v and x, respectively.
The following is a qualitative version of the openness of a mapping Φ and any mapping
$C^0$ close to it.
Definition 3.14 We say that the map Φ : E → $\mathbb{R}^n$ is r-solid at the point v of the
Banach space E if for some constant c > 0 and any sufficiently small ε > 0 the following
inclusion holds for any map $\hat{\Phi} : B_v(\varepsilon) \to \mathbb{R}^n$ which is $C^0$ close to the map Φ:
\[ B_{\Phi(v)}(c\varepsilon^r) \subset \hat{\Phi}(B_v(\varepsilon)). \]
As usual, to be $C^0$ close to Φ means that there exists δ > 0 such that
$\sup_{w \in B_v(\varepsilon)} |\hat{\Phi}(w) - \Phi(w)| \le \delta$.
The implicit function theorem together with the Brouwer fixed point theorem imply
that Φ is 1-solid at any regular point.
Lemma 3.15 If $\mathrm{IND}_v\Phi \ge 0$ then Φ is 2-solid at v.
Proof This lemma is a refinement of Theorem 20.3 from [7]. It can be proved by a
slight modification of the proof of the cited theorem. Obviously, we may assume that
v is a critical point of Φ. Moreover, by an argument in the proof of the theorem cited
above, we may assume that E is a finite dimensional space, v = 0 and Φ(0) = 0.
Let $E = E_1 \oplus E_2$, where $E_2 = \ker D_0\Phi$. For any v ∈ E we write $v = v_1 + v_2$, where
$v_1 \in E_1$, $v_2 \in E_2$. Now consider the mapping
\[ Q : v \mapsto D_0\Phi\, v_1 + \frac{1}{2} D_0^2\Phi(v_2), \qquad v \in E. \]
It is shown in the proof of [7, Theorem 20.3] that $Q^{-1}(0)$ contains regular points in
any neighborhood of 0. Hence, there exists c > 0 such that the image of any continuous
mapping $\hat{Q} : B_0(1) \to \mathbb{R}^n$ sufficiently close (in $C^0$-norm) to $Q|_{B_0(1)}$ contains the ball $B_0(c)$.
Now, if $\hat{\Phi}$ is $C^0$ close to Φ, we set $\Phi_\varepsilon(v) = \frac{1}{\varepsilon^2}\Phi(\varepsilon^2 v_1 + \varepsilon v_2)$ and
$\hat{\Phi}_\varepsilon(v) = \frac{1}{\varepsilon^2}\hat{\Phi}(\varepsilon^2 v_1 + \varepsilon v_2)$.
Then, by differentiating $\Phi_\varepsilon$ with respect to ε, it is easy to see that $\Phi_\varepsilon(v) = Q(v) + o(1)$
as ε → 0. This shows that $\Phi_\varepsilon$ and hence $\hat{\Phi}_\varepsilon$ are $C^0$ close to Q for all sufficiently small ε.
Therefore, the image $\hat{\Phi}_\varepsilon(B_0(1))$ contains the ball $B_0(c)$. This gives
\[ B_0(c) \subset \hat{\Phi}_\varepsilon(B_0(1)) \subset \frac{1}{\varepsilon^2}\hat{\Phi}(B_0(\varepsilon)) \]
and the result follows. □
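The anisotropic rescaling used in the proof can be watched on a toy map. In the sketch below everything is an assumption made for illustration: the map Φ : R^3 → R^2 given by Φ(a, b, c) = (a + bc, b^2 − c^2 + ab) has a critical point at 0 with $E_1$ the a-axis and $E_2 = \ker D_0\Phi$ the (b, c)-plane, so that Q(a, b, c) = (a + bc, b^2 − c^2). A direct computation gives $\Phi_\varepsilon - Q = (0, \varepsilon ab)$, and the code measures this difference on a grid in the cube $[-1, 1]^3$.

```python
def phi(a, b, c):
    """Toy map R^3 -> R^2 with a critical point at the origin."""
    return (a + b * c, b * b - c * c + a * b)


def q_map(a, b, c):
    """Q(v) = D_0 Phi v_1 + (1/2) D_0^2 Phi(v_2) for the toy map above."""
    return (a + b * c, b * b - c * c)


def phi_rescaled(eps, a, b, c):
    """Phi_eps(v) = eps^-2 * Phi(eps^2 v_1 + eps v_2), with v_1 = a and v_2 = (b, c)."""
    first, second = phi(eps * eps * a, eps * b, eps * c)
    return (first / eps ** 2, second / eps ** 2)


def sup_distance(eps, samples=9):
    """Largest sup-norm gap between Phi_eps and Q on a grid in [-1, 1]^3."""
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    worst = 0.0
    for a in grid:
        for b in grid:
            for c in grid:
                fe, fq = phi_rescaled(eps, a, b, c), q_map(a, b, c)
                worst = max(worst, abs(fe[0] - fq[0]), abs(fe[1] - fq[1]))
    return worst


gap_coarse = sup_distance(0.1)
gap_fine = sup_distance(0.01)
```

The gap shrinks linearly in ε for this example, consistent with $\Phi_\varepsilon = Q + o(1)$.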
Remark 3.16 The minimization problem (3.7) can be rephrased as a constrained
minimization problem in an infinite-dimensional space. For simplicity, consider the case
M = $\mathbb{R}^n$. Let (x(·), u(·)) be an admissible pair of the control system (3.6) and let
$\varphi : \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k) \to \mathbb{R}$ be the function defined by
\[ \varphi(x, u(\cdot)) = \int_0^1 L(x(t), u(t))\, dt. \]
Let $\Phi : \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k) \to \mathbb{R}^n \times \mathbb{R}^n$ be the map
\[ \Phi(x, u(\cdot)) = (x, \mathrm{End}_x(u(\cdot))). \]
Finding the minimum in (3.7) is now equivalent to minimizing the function ϕ on the set
$\Phi^{-1}(x, y)$.
Due to the above remark, we can consider the following general setting. Consider a
function ϕ : E → R on the Banach space E such that ϕ|W is a $C^2$-mapping for any finite
dimensional subspace W of E. Recall that the Hilbert space H contains E as a dense
subset. Assume that the function ϕ as well as the first and second derivatives of the
restrictions ϕ|W are continuous on the bounded subsets of E in the topology of H. Also,
recall that the map Φ : E → $\mathbb{R}^n$ is $C^2$ when restricted to any finite dimensional subspace
of E. Assume that K is a bounded subset of E that is compact in the topology of H
and satisfies the following property:
\[ \varphi(v) = \min\{\varphi(w) \mid w \in E,\ \Phi(w) = \Phi(v)\} \]
for any v in the set K.
We define a function µ on Φ(K) by the formula µ(Φ(v)) = ϕ(v) for any v in K. In
the case discussed in Remark 3.16, K is the set of all minimizers and the function µ is
the cost function.
Lemma 3.17 If $\mathrm{IND}_v\Phi \ge 2$ for any v ∈ K, then µ is locally Lipschitz.
Proof Given v in the set K, there exists a finite dimensional subspace W of the Banach
space E such that $\mathrm{IND}_v(\Phi|_W) \ge 2$. Then $\mathrm{IND}_v(\Phi|_{W \cap \ker D_v\varphi}) \ge 0$. Hence $\Phi|_{W \cap \ker D_v\varphi}$ is
2-solid at v and
\[ \Phi(B_v(\varepsilon) \cap W \cap \ker D_v\varphi) \supset B_{\Phi(v)}(c\varepsilon^2) \]
for some c and any sufficiently small ε.
Let x = Φ(v) and $|x - y| = c\varepsilon^2$, then y = Φ(w) for some $w \in B_v(\varepsilon) \cap W \cap \ker D_v\varphi$.
We have:
\[ \mu(y) - \mu(x) \le \varphi(w) - \mu(x) = \varphi(w) - \varphi(v) \le c'|w - v|^2 \le c'\varepsilon^2. \]
Here, we use the fact that w is in $\ker D_v\varphi$ for the second to last inequality and that w is in
$B_v(\varepsilon)$ for the last inequality. Moreover, the compactness of K allows us to choose c, c′
and the bound for ε uniformly for all v ∈ K. In particular, we can exchange x and y in the last
inequality. Since $\varepsilon^2 = |y - x|/c$, we obtain $|\mu(y) - \mu(x)| \le \frac{c'}{c}|y - x|$. □
Proof of Theorem 3.11
We describe the proof only in the case M = $\mathbb{R}^n$ in order to simplify the language,
while the generalization to any manifold is straightforward. We set
\[ E = \mathbb{R}^n \times L^\infty([0, 1], \mathbb{R}^k), \qquad H = \mathbb{R}^n \times L^2([0, 1], \mathbb{R}^k), \]
\[ \Phi(x, u(\cdot)) = (x, \mathrm{End}_x(u(\cdot))), \qquad \varphi(x, u(\cdot)) = \int_0^1 L(x(t), u(t))\, dt, \]
and apply the above results.
First of all, $\mathrm{IND}_{(x,u(\cdot))}\Phi = \mathrm{IND}_{u(\cdot)}\mathrm{End}_x = +\infty$ for all (x, u(·)) since our system does
not admit sharp controls. Lemma 3.15 implies that Φ is 2-solid and D = Φ(E) is open.
Now let B be a ball in E equipped with the weak topology of H. The endpoint
mapping Φ is continuous as a mapping from B to $\mathbb{R}^{2n}$. The strict convexity of L implies
that there is some constant c > 0 such that
\[ \varphi(x_n, u_n(\cdot)) - \varphi(x, u(\cdot)) \ge c\|u_n(\cdot) - u(\cdot)\|_{L^2}^2 + o(1) \]
as $x_n \to x$, $u_n(\cdot) \rightharpoonup u(\cdot)$, and $(x_n, u_n(\cdot)) \in B$. Therefore,
$\liminf_{n\to\infty} \varphi(x_n, u_n(\cdot)) \ge \varphi(x, u(\cdot))$, and
$\lim_{n\to\infty} \varphi(x_n, u_n(\cdot)) = \varphi(x, u(\cdot))$ if and only if $(x_n, u_n(\cdot))$ converges to (x, u(·)) in the
strong topology of H.
Assume that $\varphi(x_n, u_n(\cdot)) = \mu(\Phi(x_n, u_n(\cdot)))$ for all n. The inequality
$\varphi(x, u(\cdot)) < \liminf_{n\to\infty} \varphi(x_n, u_n(\cdot))$ would imply that
\[ \mu(\Phi(x, u(\cdot))) < \liminf_{n\to\infty} \mu(\Phi(x_n, u_n(\cdot))). \]
On the other hand, the openness of the map Φ implies that the map µ is upper
semicontinuous. Together with the continuity of Φ, we have the following inequality:
\[ \mu(\Phi(x, u(\cdot))) \ge \limsup_{n\to\infty} \mu(\Phi(x_n, u_n(\cdot))). \]
Hence $\lim_{n\to\infty} \varphi(x_n, u_n(\cdot)) = \varphi(x, u(\cdot))$ and $(x_n, u_n(\cdot))$ converges to (x, u(·)) in the strong
topology of H.
Let C be a compact subset of D and
\[ K = \{(x, u(\cdot)) \in E : \Phi(x, u(\cdot)) \in C,\ \varphi(x, u(\cdot)) = \mu(\Phi(x, u(\cdot)))\}. \]
Then K is contained in some ball B. Recall that B, equipped with the weak topology,
is compact. Now the calculations of the previous two paragraphs imply compactness of K in
the strong topology of H. Finally, we derive the Lipschitz property of µ|C from Lemma
3.17. □
3.3 Applications: Mass Transportation on Subriemannian Manifolds
In this section we will apply the results in the previous sections to some subriemannian
manifolds. First, let us recall some basic definitions.
Let ∆ and ∆′ be two (possibly singular) distributions on a manifold M. The distribution
[∆, ∆′] is defined by the span of all Lie brackets of vector fields in ∆ with vector
fields in ∆′, i.e.
\[ [\Delta, \Delta']_x = \mathrm{span}\{[v, w](x) \mid v \in \Delta,\ w \in \Delta'\}. \]
Define inductively the following distributions: $\Delta^2 = \Delta + [\Delta, \Delta]$ and
$\Delta^k = \Delta^{k-1} + [\Delta, \Delta^{k-1}]$. A distribution ∆ is called k-generating if $\Delta^k = TM$ and the smallest such k
is called the degree of nonholonomy. Also, the distribution is called bracket generating
if it is k-generating for some k.
If ∆ is a bracket generating distribution, then it defines a flag of distributions by
\[ \Delta \subset \Delta^2 \subset \cdots \subset TM. \]
The growth vector of the distribution ∆ at the point x is defined by $(\dim \Delta_x, \dim \Delta_x^2, \ldots, \dim T_xM)$.
The distribution ∆ is called regular if the growth vector is the same for all x. Let
x(·) : [a, b] → M be an admissible curve, that is a Lipschitz curve almost everywhere
tangent to ∆. The following classical result on bracket generating distributions is the
starting point of subriemannian geometry.
Theorem 3.18 (Chow and Rashevskii) Given any two points x and y on a connected
manifold M with a bracket generating distribution, there exists an admissible curve join-
ing the two points.
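Whether a given distribution is 2-generating can be checked pointwise by computing one layer of Lie brackets. The sketch below does this numerically for an assumed example, the Heisenberg distribution on R^3 spanned by `X1` and `X2`; the fields, the sample point and the helper names are illustrative and not taken from the thesis. The bracket is computed as $[X, Y] = DY \cdot X - DX \cdot Y$ with central differences, which are accurate up to rounding here because the fields are affine.

```python
def lie_bracket(X, Y, q, h=1e-3):
    """[X, Y](q) = DY(q) X(q) - DX(q) Y(q), with directional derivatives
    approximated by central differences of step h."""
    def directional(field, direction):
        plus = field([a + h * d for a, d in zip(q, direction)])
        minus = field([a - h * d for a, d in zip(q, direction)])
        return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

    dy_x = directional(Y, X(q))
    dx_y = directional(X, Y(q))
    return [a - b for a, b in zip(dy_x, dx_y)]


def X1(q):
    return [1.0, 0.0, -q[1] / 2]


def X2(q):
    return [0.0, 1.0, q[0] / 2]


def det3(r):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))


point = [0.3, -1.2, 2.0]
bracket = lie_bracket(X1, X2, point)
frame_det = det3([X1(point), X2(point), bracket])
```

A nonzero determinant means that X1, X2 and [X1, X2] span the tangent space at the chosen point, so in this example the growth vector there is (2, 3) and the degree of nonholonomy is 2.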
Using the Chow-Rashevskii Theorem, we can define the subriemannian distance d.
Let 〈·, ·〉 be a fibre inner product on the distribution ∆, called a subriemannian metric.
The length of an admissible curve x(·) is defined in the usual way:
\[ \mathrm{length}(x(\cdot)) = \int_a^b \sqrt{\langle \dot{x}(t), \dot{x}(t) \rangle}\, dt. \]
The subriemannian distance d(x, y) between two points x and y is
defined as the infimum of the lengths of all admissible curves joining x and y. There is
a quantitative version of the Chow-Rashevskii Theorem, called the Ball-Box Theorem,
which gives Hölder continuity of the subriemannian distance, see [45] for details.
Corollary 3.19 Let $d_S$ be the metric of a complete subriemannian space with a distribution
∆. The function $d_S^2$ is locally Lipschitz if and only if the distribution is 2-generating.
Proof Systems with 2-generating distributions do not admit sharp paths because of
the Goh condition. So $d_S^2$ is locally Lipschitz by Theorem 3.11. Conversely, if the degree
of nonholonomy k of the distribution is greater than 2, then it follows from the ball-box
theorem [45, Theorem 2.10] that the function $d_S^2$ is not Lipschitz. Indeed, let us fix a
point x in the manifold M. If $d_S^2$ is locally Lipschitz, then $d_S^2(x, y) \le c|x - y|$ for some
constant c and for all y in a neighborhood U of x. On the other hand, by the ball-box
theorem, for all sufficiently small ε > 0 there exists a point z in U whose subriemannian
distance $d_S$ from the point x is ε and whose Euclidean distance from x satisfies
$|x - z| < C\varepsilon^k = C d_S^k(x, z)$ for some constant C. Combining the two inequalities
gives $\varepsilon^2 \le c|x - z| < cC\varepsilon^k$, that is, $1 < cC\varepsilon^{k-2}$, which fails as ε → 0 because
k > 2. This gives a contradiction and so $d_S^2$ is not Lipschitz. □
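The contradiction in the proof above is a statement about a single quotient, and a few lines of arithmetic make it visible. The sketch below is only a schematic model: it takes the ball-box bound $|x - z| < C\varepsilon^k$ at face value, with an assumed constant C = 1, and evaluates the resulting lower bound $\varepsilon^2 / (C\varepsilon^k)$ for the Lipschitz quotient $d_S^2(x, z)/|x - z|$. The function name is hypothetical.

```python
def quotient_lower_bound(eps: float, k: int, C: float = 1.0) -> float:
    """Lower bound eps^2 / (C eps^k) for d_S^2(x, z) / |x - z| when d_S(x, z) = eps."""
    d_squared = eps ** 2      # squared subriemannian distance
    euclid_bound = C * eps ** k  # ball-box upper bound on the Euclidean distance
    return d_squared / euclid_bound

# k = 2: the bound stays constant, so no contradiction with a Lipschitz estimate.
bounds_k2 = [quotient_lower_bound(eps, 2) for eps in (1e-1, 1e-2, 1e-3)]
# k = 3: the bound grows like 1 / eps, so no Lipschitz constant c can work.
bounds_k3 = [quotient_lower_bound(eps, 3) for eps in (1e-1, 1e-2, 1e-3)]
```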
Combining Corollary 3.19 with Theorem 3.1, we prove the existence and uniqueness
of an optimal map for a subriemannian manifold with a 2-generating distribution.
Theorem 3.20 [5] Let M be a complete subriemannian manifold defined by a 2-generating
distribution. Then there exists a unique (up to µ-measure zero) optimal map for Monge's
problem with the cost c given by $c = d_S^2$, where $d_S$ is the subriemannian distance of M.
Remark 3.21 The locally Lipschitz property of the distance d off the diagonal is guaranteed
for a much bigger class of distributions. In particular, it is proved in [3] that
a generic distribution of rank > 2 does not admit non-constant sharp trajectories.
In the case of Carnot groups, the following estimates hold: a typical n-dimensional
Carnot group with a rank k distribution does not admit nonconstant sharp trajectories if
$n \le (k-1)k + 1$, and it has nonconstant sharp length-minimizing trajectories provided
that $n \ge (k-1)\left(\frac{k^2}{3} + \frac{5k}{6} + 1\right)$. Recall that a simply-connected Lie group endowed with
a left-invariant distribution $V_1$ is a Carnot group if its Lie algebra $\mathfrak{g}$ is a graded nilpotent
Lie algebra which is Lie generated by the subspace with the smallest grading (i.e.
$\mathfrak{g} = V_1 \oplus V_2 \oplus \dots \oplus V_k$, $[V_i, V_j] = V_{i+j}$, $V_i = 0$ if $i > k$, and the subspace $V_1$ Lie-generates
$\mathfrak{g}$).
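The two dimension estimates for typical Carnot groups are easy to tabulate. The following sketch reads the second bound as $(k-1)(k^2/3 + 5k/6 + 1)$ and classifies a pair (n, k) accordingly; the function names and the returned labels are hypothetical and only meant as an illustration.

```python
from fractions import Fraction

def no_sharp_bound(k: int) -> int:
    """Largest n for which the first estimate excludes nonconstant sharp trajectories."""
    return (k - 1) * k + 1

def sharp_bound(k: int) -> Fraction:
    """Threshold from the second estimate, kept exact with rational arithmetic."""
    return (k - 1) * (Fraction(k * k, 3) + Fraction(5 * k, 6) + 1)

def classify(n: int, k: int) -> str:
    """Which of the two estimates (if any) applies to a typical Carnot group."""
    if n <= no_sharp_bound(k):
        return "no nonconstant sharp trajectories"
    if n >= sharp_bound(k):
        return "nonconstant sharp minimizers exist"
    return "not covered by these estimates"

# Thresholds for small ranks: k -> (first bound, second bound).
table = {k: (no_sharp_bound(k), sharp_bound(k)) for k in (2, 3, 4)}
```

For rank k = 2 the two bounds are 3 and 4, so under this reading every dimension n ≥ 3 is covered by one of the estimates, while for k = 3 the dimensions strictly between 7 and 13 are left open.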
Clearly, if the cost is locally Lipschitz off the diagonal, then the statement of Theorem 4.1
remains valid after making the extra assumption that the supports of the initial
measure µ and the final measure ν are disjoint: supp(µ) ∩ supp(ν) = ∅. In the subriemannian
case a more general result is proved in [24], where it is shown that the Lipschitz
continuity of the distance d is only needed off the diagonal, even when the measures
µ and ν do not have disjoint supports.
Chapter 4
Optimal Transportation with
Non-Lipschitz Cost
4.1 Normal Minimizers and Properties of Optimal
Maps with Continuous Optimal Control Costs
According to Theorem 3.11, it remains to study the case where sharp controls exist. In
this section we will describe properties of optimal maps when the cost is continuous.
Normal minimizers will play a very important role.
We continue to study optimal control problem (3.6), (3.7). As we already mentioned,
strictly abnormal minimizers must be sharp. In addition, if X0 = 0 in (3.6), then the
optimal control cost is continuous. According to the discussion at the end of the previous
section, we expect strictly abnormal minimizers to be present mainly for generic rank
2 distributions on manifolds of dimension greater than 3 and for typical Carnot groups
of sufficiently large corank. In these situations strictly abnormal minimizers are indeed
unavoidable.
The existence of strictly abnormal minimizers for subriemannian manifolds was first
proved in [44]. In [61] and [38] it was shown that there are many strictly abnormal
minimizers in general for subriemannian manifolds, see Theorem 4.1 below. Finally, a
general theory of abnormal minimizers for rank 2 distributions was developed in [8]. We
refer to [45] for a detailed account of the history and references on abnormal minimizers.
Here is a sample result in [61] which is of interest to us:
Theorem 4.1 (Liu and Sussmann) Let M be a 4-dimensional manifold with a rank 2
regular bracket generating distribution ∆ and a subriemannian metric 〈, 〉. Let X1 and
X2 be two global sections of ∆ such that
1. X1 and X2 are everywhere orthonormal,
2. X1, X2, [X1, X2] and [X2, [X1, X2]] are everywhere linearly dependent,
3. X2, [X1, X2] and [X2, [X1, X2]] are everywhere linearly independent.
Then any sufficiently short segments of the integral curves of the vector field X2 are
strictly abnormal minimizers.
We call a local flow a strictly abnormal flow if the corresponding trajectories are all
strictly abnormal minimizers. An interesting question is whether the time-1-map of an
abnormal flow is an optimal map. The following theorem shows that this is not the case
for any reasonable initial measure and continuous cost.
Theorem 4.2 Assume that the cost c in (2.5) is continuous, bounded below, and the
support of the measure µ is equal to the closure of its interior. If ϕ : M → M is a
continuous map such that (id × ϕ)∗µ achieves the infimum in Problem 2.6, then x and
ϕ(x) can be connected by a normal minimizer on a dense set of x’s in the support of µ.
Proof By Theorem 2.9, there exists a function $f : M \to \mathbb{R} \cup \{-\infty\}$ such that f and
its $c_1$-transform $f^{c_1}$ achieve the supremum in Problem 2.8. Moreover, by Theorem 2.10, the
functions f and $f^{c_1}$ are upper semicontinuous. By Theorem 2.11,
\[ f(x) + f^{c_1}(\varphi(x)) = c(x, \varphi(x)) \tag{4.1} \]
for µ-almost all x. By upper semicontinuity of f and $f^{c_1}$, together with the continuity
of c and ϕ, we have
\[ f(x) + f^{c_1}(\varphi(x)) \ge c(x, \varphi(x)) \]
for all x in the support of µ. But $f(x) + f^{c_1}(y) \le c(x, y)$ for any x, y in the manifold M. So, the equality (4.1)
holds for all x in the support U of µ. Therefore, x achieves the infimum
$f^{c_1}(\varphi(x)) = \inf_{z \in M} [c(z, \varphi(x)) - f(z)]$ for all x in the support of µ. Moreover, using (4.1), it is easy
to see that the function f is continuous on U. In particular, it is subdifferentiable on a
dense set of U. By Proposition 3.2 and Theorem 3.3, x and ϕ(x) can be connected by a
normal minimizer if f is subdifferentiable at x. This proves the theorem. □
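The two facts about the transform used in the proof, the inequality $f(x) + f^{c_1}(y) \le c(x, y)$ and the existence of a point achieving the infimum, can be checked on a finite example. The sketch assumes the transform has the form $f^{c}(y) = \inf_x [c(x, y) - f(x)]$ (the definition from Chapter 2 is not restated in this section), and the point sets, the quadratic cost and the values of f are made up for illustration.

```python
def c_transform(f, xs, ys, cost):
    """Discrete transform f^c(y) = min over x of [cost(x, y) - f(x)]."""
    return {y: min(cost(x, y) - f[x] for x in xs) for y in ys}

def argmin_points(f, xs, y, cost):
    """Points x at which the minimum defining f^c(y) is achieved."""
    best = min(cost(x, y) - f[x] for x in xs)
    return [x for x in xs if abs(cost(x, y) - f[x] - best) < 1e-12]

def cost(x, y):
    return (x - y) ** 2  # assumed sample cost

xs = [0.0, 0.5, 1.0, 1.5]
ys = [0.25, 1.0, 1.25]
f = {0.0: 0.0, 0.5: 0.25, 1.0: -0.5, 1.5: 0.75}  # arbitrary sample potential

fc = c_transform(f, xs, ys, cost)
# f(x) + f^c(y) <= c(x, y) holds for every pair, with equality at the minimizers.
slack = {(x, y): cost(x, y) - f[x] - fc[y] for x in xs for y in ys}
```

In the proof, the pairs (x, ϕ(x)) play the role of the zero-slack pairs.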
4.2 Optimal Maps with Abnormal Minimizers
In this section, we describe an important class of control systems which admit smooth
optimal maps built essentially from abnormal minimizers. Recall that abnormal mini-
mizers are singular trajectories of the control system, whose definition does not depend
on the Lagrangian.
Let $\rho : M \xrightarrow{G} N$ be a principal bundle with a connected Abelian structure group G
and let $X_1, \dots, X_k$ be vertical vector fields which generate the action of G. Consider the
following control system
\[ \dot x(t) = X_0(x(t)) + \sum_{i=1}^{k} u_i(t) X_i(x(t)), \tag{4.2} \]
where $X_0$ is a smooth vector field on M, and the re-scaled systems
\[ \dot x(t) = \varepsilon X_0(x(t)) + \sum_{i=1}^{k} u_i(t) X_i(x(t)) \tag{4.3} \]
for ε > 0.
We define the Hamiltonian $H : T^*N \to \mathbb{R}$ on the cotangent bundle $T^*N$ of the base
N by
\[ H(p_x) = \max\{ p_x(d\rho(X_0(y))) \mid y \in \rho^{-1}(x) \}, \tag{4.4} \]
where $p_x$ is a covector in $T^*N$. We also assume that the maximum above is achieved for
any p in $T^*N$ and it is finite.
A typical example is the Hopf bundle $\phi : SU(2) \xrightarrow{S^1} S^2$ and a left-invariant vector
field F0. In this case, the Hamiltonian is given by H(p) = α|p|, where α is a constant
and |p| is the length of the covector p with respect to the standard (constant curvature)
Riemannian structure on the sphere, see [7, Section 22.2].
Consider the following control system on the base N with an admissible pair y(·)
contained in the G-bundle $\rho : M \xrightarrow{G} N$ and an admissible trajectory x(t) = ρ(y(t)), see
Remark 2.6:
\[ \dot x(t) = d\rho(X_0(y(t))). \tag{4.5} \]
The function H in (4.4) is the Hamiltonian of the time-optimal problem of the control
system (4.5). (Recall that the time optimal problem is the following minimization prob-
lem: Fix two points x0 and x1 in N and minimize the time t1 among all admissible
trajectories x(·) of the control system (4.5) such that x(t0) = x0 and x(t1) = x1.)
Remark 4.3 System (4.5) is the reduced system associated to system (4.2) according
to the reduction procedure described in [7, Chapter 22]. In particular, ρ transforms any
admissible trajectory of system (4.2) to the admissible trajectory of system (4.5). Also,
the smooth extremal trajectories of the time-optimal problem for system (4.5) are images
under the map ρ of singular trajectories of system (4.2).
For any ε > 0 and any $C^2$ smooth function f : N → R, we introduce the map
\[ \Phi^\varepsilon_f : N \to N, \qquad \Phi^\varepsilon_f(x) = \pi\big(e^{\varepsilon \vec H}(d_x f)\big), \quad x \in N, \]
where $\pi : T^*N \to N$ is the standard projection and $t \mapsto e^{t\vec H}$ is the Hamiltonian flow of
H. Set
\[ D = \{ p \in T^*N : H(p) > 0,\ H \text{ is of class } C^2 \text{ at } p \}. \]
Assume that $\Phi^\varepsilon_f$ pushes the measure µ′ forward to another measure ν′ on N. Consider
some “lifts” µ and ν of the measures µ′ and ν′: $\rho_*\mu = \mu'$, $\rho_*\nu = \nu'$. If Ψ : M → M
is an optimal map pushing µ forward to ν, then the following theorem says that Ψ is a
covering of $\Phi^\varepsilon_f$: $\rho \circ \Psi = \Phi^\varepsilon_f \circ \rho$.
Theorem 4.4 Let K be a compact subset of N and $f \in C^2(N)$. Assume that $df|_K \subset D$.
Let µ and ν be Borel probability measures such that $\operatorname{supp}(\rho_*\mu) \subset K$. Then, for
any sufficiently small ε > 0 and any optimal Borel map Ψ : M → M of the control
system (4.3) with any Lagrangian L : TM → R, the following is true whenever
$\rho_*\nu = (\Phi^\varepsilon_f)_*(\rho_*\mu)$:
\[ \rho \circ \Psi = \Phi^\varepsilon_f \circ \rho. \]
In particular, x and Ψ(x) are connected by singular trajectories.
Proof We start with the following
Definition 4.5 We say that a Borel map Q : K → N is ε-admissible for system (4.3)
if there exists a Borel map $\varphi : K \to L^\infty([0, \varepsilon], G)$ such that
\[ Q(x_0) = x(\varepsilon; \varphi(x_0)(\cdot)), \quad \forall x_0 \in K, \]
where $t \mapsto x(t; \varphi(x_0)(\cdot))$ is an admissible trajectory of the reduced control system (4.5)
with the initial condition $x(0; \varphi(x_0)(\cdot)) = x_0$.
We are going to prove that $\Phi^\varepsilon_f$ is an admissible map, unique up to a set of $\rho_*\mu$-measure
zero, which transforms $\rho_*\mu$ into $\rho_*\nu$. This fact implies the statement of the theorem.
The inequality $H(d_x f) > 0$ implies that $d\pi(\vec H(d_x f))$ is transversal to the level hypersurface
of f passing through x. Hence the map $\Phi^\varepsilon_f$ is invertible in a neighborhood of K for any
sufficiently small ε. Moreover, the curve $t \mapsto \Phi^t_f(x)$, 0 ≤ t ≤ ε, is the unique admissible
trajectory of system (4.5) which starts at the hypersurface $f^{-1}(f(x))$ and arrives at the
point $\Phi^\varepsilon_f(x)$ at a time not greater than ε. The last fact is proved by a simple
adaptation of the standard sufficient optimality condition (see [7, Chapter 17]).
Now we set
\[ f_\varepsilon(x) = f\big((\Phi^\varepsilon_f)^{-1}(x)\big) + \varepsilon, \]
then $f_\varepsilon$ is a smooth function defined in a neighborhood of K.
The optimality property of $\Phi^\varepsilon_f$ implies that
\[ f_\varepsilon(Q(x)) \le f_\varepsilon\big(\Phi^\varepsilon_f(x)\big) \]
for any ε-admissible map Q and any x ∈ K, and the inequality is strict at any point x
where $Q(x) \ne \Phi^\varepsilon_f(x)$. In particular, if
\[ \rho_*\mu\big(\{x \in K : Q(x) \ne \Phi^\varepsilon_f(x)\}\big) > 0, \]
then
\[ \int_N f_\varepsilon \, d(Q_*(\rho_*\mu)) = \int_N f_\varepsilon \circ Q \, d(\rho_*\mu) < \int_N f_\varepsilon \circ \Phi^\varepsilon_f \, d(\rho_*\mu) = \int_N f_\varepsilon \, d(\rho_*\nu). \]
Hence $Q_*(\rho_*\mu) \ne \rho_*\nu$. This proves the first part of the theorem. Recall that the maps
Ψ and $\Phi^\varepsilon_f$ satisfy the following relation:
\[ \rho \circ \Psi = \Phi^\varepsilon_f \circ \rho. \]
It follows from this and Remark 4.3 that x and Ψ(x) are connected by a singular minimizer. □
Part II
A Nonholonomic Moser Theorem
and Optimal Mass Transport
Chapter 5
Classical and Nonholonomic Moser
Theorems
The main goal of this chapter is to prove the following nonholonomic version of the
classical Moser theorem. It is also a more precise version of Theorem 1.4 mentioned in
the introduction. Consider a distribution τ on a compact manifold M (without boundary,
unless otherwise stated).
Theorem 5.1 Let τ be a bracket-generating distribution, and µ0, µ1 be two volume forms
on M with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists a diffeomorphism
φ of M which is the time-one-map of the flow $\phi_t$ of a non-autonomous vector field $V_t$
tangent to the distribution τ everywhere on M for every t ∈ [0, 1], such that $\phi^*\mu_1 = \mu_0$.
Note that the existence of the “nonholonomic isotopy” φt is guaranteed by the only
condition on equality of total volumes for µ0 and µ1, just like in the classical case:
Theorem 5.2 [43] Let M be a manifold without boundary, and let µ0, µ1 be two volume
forms on M with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists a
diffeomorphism φ of M, isotopic to the identity, such that $\phi^*\mu_1 = \mu_0$.
Remark 5.3 The classical Moser theorem has numerous variations and generalizations,
some of which we would like to mention.
a) Similarly one can show that not only the identity, but any diffeomorphism of M is
isotopic to a diffeomorphism which pulls back µ1 to µ0.
b) The Moser theorem also holds for a manifold M with boundary. In this case a
diffeomorphism φ is a time-one-map for a (non-autonomous) vector field Vt on M , tangent
to the boundary ∂M .
c) Moser also proved in [43] a similar statement for a pair of symplectic forms on a
manifold M: if two symplectic structures can be deformed to each other among symplectic
structures in the same cohomology class on M, this deformation can be carried out by
a flow of diffeomorphisms of M.
Below we describe to which degree these variations extend to the nonholonomic case.
Apparently, the most straightforward generalization of the classical Moser theorem
is its version “with parameters.” In this case, volume forms on M smoothly depend on
parameters and have the same total volume at each value of this parameter:
$\int_M \mu_0(s) = \int_M \mu_1(s)$ for all s. The theorem guarantees that the corresponding
diffeomorphism exists and depends smoothly on this parameter s.
The following theorem can be regarded as a modification of the parameter version:
Theorem 5.4 [34] Let π : N → B be a fibration of an n-dimensional manifold N over a
k-dimensional base manifold B. Suppose that µ0, µ1 are two smooth volume forms on N.
Assume that the pushforwards of these n-forms to B coincide, i.e. they give one and the
same k-form on B: $\pi_*\mu_0 = \pi_*\mu_1$. Then, there exists a diffeomorphism φ of N which is
the time-one-map of a (non-autonomous) vector field $V_t$ tangent everywhere to the fibers
of this fibration and such that $\phi^*\mu_1 = \mu_0$.
Remark 5.5 Note that in this version the volume forms are given on the ambient man-
ifold N , while in the parametric version of the Moser theorem we are given fiberwise
volume forms. There is also a similar version of this theorem for a foliation, cf. e.g.
[27]. In either case, for the corresponding diffeomorphism to exist the volume forms have
to satisfy infinitely many conditions (the equality of the total volumes as functions in
the parameter s or as the push-forwards π∗µ0 and π∗µ1). The case of a fibration (or a
foliation) corresponds to an integrable distribution τ , and presents the “opposite case” to
a bracket-generating distribution. Unlike the case of an integrable distribution, the exis-
tence of the corresponding isotopy between volume forms in the bracket-generating case
imposes only one condition, the equality of the total volumes of the two forms (regardless,
e.g., of the distribution growth vector at different points of the manifold).
First, we recall a proof of the classical Moser theorem. To show how the proof changes
in the nonholonomic case, we split it into several steps.
Proof
1) Connect the volume forms µ0 and µ1 by a “segment” $\mu_t = \mu_0 + t(\mu_1 - \mu_0)$, t ∈ [0, 1].
We will be looking for a diffeomorphism $g_t$ sending $\mu_t$ to $\mu_0$: $g_t^*\mu_t = \mu_0$. By taking the
t-derivative of this equation, we get the following “homological equation” on the velocity
$V_t$ of the flow $g_t$: $g_t^*(L_{V_t}\mu_t + \partial_t\mu_t) = 0$, where $\partial_t g_t(x) = V_t(g_t(x))$. This is equivalent to
\[ L_{V_t}\mu_t = \mu_0 - \mu_1, \]
since $\partial_t\mu_t = -(\mu_0 - \mu_1)$.
By rewriting $\mu_0 - \mu_1 = \rho_t\mu_t$ for an appropriate function $\rho_t$, we reformulate the equation
$L_{V_t}\mu_t = \rho_t\mu_t$ as the problem $\operatorname{div}_{\mu_t} V_t = \rho_t$ of looking for a vector field $V_t$ with a prescribed
divergence $\rho_t$. Note that the total integral of the function $\rho_t$ (relative to the volume $\mu_t$)
over M vanishes, which manifests the equality of total volumes for $\mu_t$.
2) We omit the index t for now and consider a Riemannian metric on M whose
volume form is µ. We are looking for a required field V with prescribed divergence
among gradient vector fields V = ∇u, which “transport the mass” in the fastest way.
This leads us to the elliptic equation divµ(∇u) = ρ, i.e. ∆u = ρ, where the Laplacian ∆
is defined by ∆u := divµ∇u and depends on the Riemannian metric on M .
3) The key part of the proof is the following
Lemma 5.6 The Poisson equation ∆u = ρ on a compact Riemannian manifold M is
solvable for any function ρ with zero mean: $\int_M \rho\,\mu = 0$ (with respect to the Riemannian
volume form µ).
Proof First we describe the space $\operatorname{Coker}\Delta := (\operatorname{Im}\Delta)^{\perp_{L^2}}$, i.e. find the space of all
functions h which are $L^2$-orthogonal to the image Im ∆. By applying integration by
parts twice, one has:
\[ 0 = \langle h, \Delta u\rangle_{L^2} = -\langle \nabla h, \nabla u\rangle_{L^2} = \langle \Delta h, u\rangle_{L^2} \]
for all smooth functions u on M. Then such functions h must be harmonic, and hence
they are constant functions on M: $(\operatorname{Im}\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$ (see [62, p.402] for details). Since
the image Im ∆ is closed, it is the $L^2$-orthogonal complement of the space of constant
functions: $\operatorname{Im}\Delta = \{\mathrm{const}\}^{\perp_{L^2}}$. The condition of orthogonality to constants is exactly
the condition of zero mean for ρ: $\langle \mathrm{const}, \rho\rangle_{L^2} = \int_M \rho\,\mu = 0$. Thus the equation ∆u = ρ
has a weak solution for ρ with zero mean, and the ellipticity of ∆ implies that the solution
is smooth for a smooth function ρ. □
4) Now, take $V_t := \nabla u_t$ and let $g^t_V$ be the corresponding flow on M. Since M is
compact and $V_t$ is smooth, the flow exists for all time t. The diffeomorphism $\phi := g^1_V$,
the time-one-map of the flow $g^t_V$, gives the required map which pulls back the volume
form µ1 to µ0: $\phi^*\mu_1 = \mu_0$. □
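The steps of this proof can be traced numerically in the simplest possible setting. The sketch below works on the interval [0, 1] with the assumed densities $\mu_0 = dx$ and $\mu_1 = (1 + \tfrac12\cos 2\pi x)\,dx$, both of total volume 1. In one dimension the homological equation reads $(m_t V_t)' = m_0 - m_1$ for the density $m_t$ of $\mu_t$, and choosing the solution with $V_t(0) = 0$ gives $m_t V_t = F_0 - F_1$, where $F_i$ are the cumulative distribution functions. The flow is integrated with a classical Runge-Kutta scheme; all names and the choice of densities are illustrative, not taken from the thesis.

```python
import math

def F0(x: float) -> float:
    """Cumulative distribution of mu0 = dx."""
    return x

def F1(x: float) -> float:
    """Cumulative distribution of mu1 = (1 + 0.5 cos(2 pi x)) dx."""
    return x + 0.5 * math.sin(2 * math.pi * x) / (2 * math.pi)

def density_t(t: float, x: float) -> float:
    """Density of the segment mu_t = mu0 + t (mu1 - mu0)."""
    return 1.0 + 0.5 * t * math.cos(2 * math.pi * x)

def velocity(t: float, x: float) -> float:
    """Solution of (m_t V_t)' = m_0 - m_1 that vanishes at x = 0."""
    return (F0(x) - F1(x)) / density_t(t, x)

def moser_map(x: float, steps: int = 200) -> float:
    """Time-one-map of the flow of V_t, integrated with the RK4 scheme."""
    h = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h / 2 * k1)
        k3 = velocity(t + h / 2, x + h / 2 * k2)
        k4 = velocity(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# phi^* mu1 = mu0 is equivalent to F1(phi(x)) = F0(x) on the interval.
errors = [abs(F1(moser_map(x)) - F0(x)) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

Since $V_t$ vanishes at both endpoints (the total volumes agree), the flow preserves the interval, which matches the tangency condition for manifolds with boundary in Remark 5.3 b).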
Proof of Theorem 5.4, the Moser theorem for a fibration:
We start by defining a new volume form on the fibres F using the pushforward
k-form $\nu_0 := \pi_*\mu_0$ on the base B and the volume n-form µ0 on N. Namely, consider the
pull-back k-form $\pi^*\nu_0$ on N. Then there is a unique (n − k)-form $\mu^F_0$ on fibers such that
$\mu^F_0 \wedge \pi^*\nu_0 = \mu_0$. More precisely, let $v_1, \dots, v_{n-k}$ be a linearly independent set of vectors
in the tangent space $T_xN$ of N at x which are tangent to the fibre. If we extend the above
set of tangent vectors to a basis $v_1, \dots, v_n$ of $T_xN$, then the volume form $\mu^F_0$ is defined by
\[ \mu^F_0(v_1, \dots, v_{n-k}) = \frac{\mu_0(v_1, \dots, v_n)}{\pi^*\nu_0(v_{n-k+1}, \dots, v_n)}. \]
It follows from linear algebra that the above definition of $\mu^F_0$ is independent of the choice
of extension and there is only one such (n − k)-form on the fibre. Similarly we find $\mu^F_1$.
Due to the equality of the pushforwards $\pi_*\mu_0$ and $\pi_*\mu_1$, the total volumes of $\mu^F_0$ and $\mu^F_1$
are fiberwise equal. Hence by the Moser theorem applied to the fibres, there is a smooth
vector field tangent to the fibers, smoothly depending on a base point, and whose flow
sends one of the (n − k)-forms, $\mu^F_1$, to the other, $\mu^F_0$. This field is defined globally on N,
and hence its time-one-map pulls back µ1 to µ0. □
Now we turn to a nonholonomic distribution on a manifold.
Proof of Theorem 5.1, the nonholonomic version of the Moser theorem.
1) As before, we connect the forms by a segment $\mu_t$, t ∈ [0, 1], and we come to the
same homological equation. The latter reduces to $\operatorname{div}_\mu V = \rho$ with $\int_M \rho\,\mu = 0$, but the
equation now is for a vector field V tangent to the distribution τ.
2) Consider some Riemannian metric on M. Now we will be looking for the required
field V in the form $V := P^\tau\nabla u$, where $P^\tau$ is a pointwise orthogonal projection of tangent
vectors to the planes of our distribution τ.
We obtain the equation $\operatorname{div}_\mu(P^\tau\nabla u) = \rho$. Rewrite this equation by introducing the
sub-Laplacian $\Delta^\tau u := \operatorname{div}_\mu(P^\tau\nabla u)$ associated to the distribution τ and the Riemannian
metric on M. The equation on the potential u becomes $\Delta^\tau u = \rho$.
3) An analog of Lemma 5.6 is now as follows.
Proposition 5.7 [34] a) The sub-Laplacian operator $\Delta^\tau u := \operatorname{div}_\mu(P^\tau\nabla u)$ is a
self-adjoint hypoelliptic operator. Its image is closed in $L^2$.
b) The equation $\Delta^\tau u = \rho$ on a compact Riemannian manifold M is solvable for any
function ρ with zero mean: $\int_M \rho\,\mu = 0$.
Proof a) The principal symbol $\delta^\tau$ of the operator $\Delta^\tau$ is the sum of squares of vector
fields forming a basis for the distribution τ: $\delta^\tau = \sum_i X_i^2$, where the $X_i$ form a horizontal
orthonormal frame for τ. This is exactly the Hörmander condition of hypoellipticity [29]
for the operator $\Delta^\tau$. The self-adjointness follows from the properties of projection and
integration by parts. The closedness of the image in $L^2$ follows from the results of [54, 55].
b) We need to find the condition of weak solvability in $L^2$ for the equation $\Delta^\tau u = \rho$.
Again, we are looking for all those functions h which are $L^2$-orthogonal to the image of
$\Delta^\tau$ (or, which is the same, in the kernel of this operator):
\[ 0 = \langle h, \Delta^\tau u\rangle_{L^2} = \langle h, \operatorname{div}_\mu(P^\tau\nabla u)\rangle_{L^2} \]
for all smooth functions u on M. In particular, this should hold for u = h. Integrating
by parts we come to
\[ 0 = \langle h, \operatorname{div}_\mu(P^\tau\nabla h)\rangle_{L^2} = -\langle \nabla h, P^\tau\nabla h\rangle_{L^2} = -\langle P^\tau\nabla h, P^\tau\nabla h\rangle_{L^2}, \]
where in the last equality we used the projection property $(P^\tau)^2 = P^\tau = (P^\tau)^*$. Then
$P^\tau\nabla h = 0$ on M, and hence the equation $\Delta^\tau u = \rho$ is solvable for any function
$\rho \perp_{L^2} \{h \mid P^\tau\nabla h = 0\}$. We claim that all such functions h are constant on M. Indeed, the
condition $P^\tau\nabla h = 0$ means that $L_X h = 0$ for any horizontal field X, i.e. a field tangent
to the distribution τ. But then h must be constant along any horizontal path, and due to
the Chow-Rashevskii theorem it must be constant everywhere on M. Thus the functions
ρ must be $L^2$-orthogonal to all constants, and hence they have zero mean. This implies
that the equation $\operatorname{div}_\mu(P^\tau\nabla u) = \rho$ is solvable for any $L^2$ function ρ with zero mean. For
a smooth ρ the solution is also smooth due to hypoellipticity of the operator. □
4) Now consider the horizontal field $V_t := P^\tau\nabla u_t$. As before, the time-one-map of its
flow exists for the smooth field $V_t$ on the compact manifold M, and it gives the required
diffeomorphism φ. □
According to the classical Helmholtz-Hodge decomposition, any vector field W on a
Riemannian manifold M can be uniquely decomposed into the sum W = V + U , where
V = ∇f and divµU = 0. Proposition 5.7 suggests the following nonholonomic Hodge
decomposition of vector fields on a manifold with a bracket-generating distribution:
Proposition 5.8 1) For a bracket-generating distribution τ on a Riemannian manifold
M, any vector field W on M can be uniquely decomposed into the sum W = V + U,
where the field $V = P^\tau\nabla f$ is tangent to the distribution τ, while the field U is
divergence-free: $\operatorname{div}_\mu U = 0$. Here $P^\tau$ is the pointwise orthogonal projection to τ.
2) Moreover, if the vector field W is tangent to the distribution τ on M, then W =
V + U, where $V = P^\tau\nabla f \parallel \tau$ as before, while the field U is divergence-free, tangent to τ,
and $L^2$-orthogonal to V, see Figure 5.1.
Figure 5.1: A nonholonomic Hodge decomposition.
Proof Let $\rho := \operatorname{div}_\mu W$ be the divergence of W with respect to the Riemannian volume
µ. First, note that $\int_M \rho\,\mu = 0$. Indeed, $\int_M (\operatorname{div}_\mu W)\,\mu = \int_M L_W\mu = 0$, since the volume
of µ is defined in a coordinate-free way, and does not change along the flow of the field
W.
Now, apply Proposition 5.7 to find a solution of the equation $\operatorname{div}_\mu(P^\tau\nabla f) = \rho$. The
field $V := P^\tau\nabla f$ is defined uniquely. Then the field U := W − V is divergence-free,
which proves 1).
For a field $W \parallel \tau$, we define $V := P^\tau\nabla f$ in the same way. Note that $V \parallel \tau$ as well.
Then U := W − V is both tangent to τ and divergence-free. Furthermore,
\[ \langle U, V\rangle_{L^2} = \langle U, P^\tau\nabla f\rangle_{L^2} = \langle P^\tau U, \nabla f\rangle_{L^2} = \langle U, \nabla f\rangle_{L^2} = -\langle \operatorname{div}_\mu U, f\rangle_{L^2} = 0, \]
where we used the properties of U established above: $U \parallel \tau$ and $\operatorname{div}_\mu U = 0$.
□
Above we defined a sub-Laplacian $\Delta^\tau u := \operatorname{div}_\mu(P^\tau\nabla u)$ for a function u on a Riemannian
manifold M with a distribution τ.
Proposition 5.9 The sub-Laplacian $\Delta^\tau$ depends only on a subriemannian metric on the
distribution τ and a volume form in the ambient manifold M.
Proof Note that the operator $P^\tau\nabla$ on a function u is the horizontal gradient $\nabla^\tau$ of u,
i.e. the vector of the fastest growth of u among the directions in τ. If one chooses a local
orthonormal frame $X_1, \dots, X_k$ in τ, then $P^\tau\nabla u = \sum_{i=1}^{k} (L_{X_i}u)\,X_i$. Thus the definition of
the horizontal gradient relies on the subriemannian metric only.
The sub-Laplacian $\Delta^\tau\psi = \operatorname{div}_\mu(P^\tau\nabla\psi)$ needs also the volume form µ in the ambient
manifold to take the divergence with respect to this form. □
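As an illustration of the frame formula for the horizontal gradient, the following sketch evaluates its coefficients $(L_{X_1}u, L_{X_2}u)$ and the sub-Laplacian for an assumed example that does not appear in the thesis: the Heisenberg-type frame $X_1 = \partial_x - \tfrac{y}{2}\partial_z$, $X_2 = \partial_y + \tfrac{x}{2}\partial_z$ on R^3, declared orthonormal, with the volume form dx ∧ dy ∧ dz. Both fields are divergence-free for this volume form, so the sub-Laplacian reduces to $X_1^2 u + X_2^2 u$. Lie derivatives are approximated by central differences, which are exact up to rounding for the quadratic test functions used here.

```python
def X1(p):
    x, y, z = p
    return (1.0, 0.0, -y / 2)

def X2(p):
    x, y, z = p
    return (0.0, 1.0, x / 2)

def lie_derivative(X, u, h=1e-3):
    """Return the function p -> (L_X u)(p), computed by a central difference along X(p)."""
    def Xu(p):
        v = X(p)
        plus = u(tuple(pi + h * vi for pi, vi in zip(p, v)))
        minus = u(tuple(pi - h * vi for pi, vi in zip(p, v)))
        return (plus - minus) / (2 * h)
    return Xu

def horizontal_gradient(u, p):
    """Coefficients of the horizontal gradient in the orthonormal frame (X1, X2)."""
    return (lie_derivative(X1, u)(p), lie_derivative(X2, u)(p))

def sub_laplacian(u, p):
    """X1(X1 u) + X2(X2 u); valid here because X1 and X2 are divergence-free."""
    return sum(lie_derivative(X, lie_derivative(X, u))(p) for X in (X1, X2))

point = (0.4, -0.6, 0.2)
grad_z = horizontal_gradient(lambda p: p[2], point)               # expected (-y/2, x/2)
lap_r2 = sub_laplacian(lambda p: p[0] ** 2 + p[1] ** 2, point)    # expected 4
lap_z = sub_laplacian(lambda p: p[2], point)                      # expected 0
```

Changing the volume form would add first-order terms $(L_{X_i}u)\operatorname{div}_\mu X_i$ to the sub-Laplacian, while the horizontal gradient would stay the same, in line with Proposition 5.9.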
The corresponding nonholonomic heat equation ∂tu = ∆τu is also defined by the
subriemannian metric and a volume form.
For a manifold M with non-empty boundary ∂M and two volume forms µ0, µ1 of equal
total volume, the classical Moser theorem establishes the existence of diffeomorphism
φ which is the time-one-map for the flow of a field Vt tangent to ∂M and such that
φ∗µ1 = µ0.
The existence of the required gradient field Vt = ∇u is guaranteed by the following
Lemma 5.10 Let µ be a volume form on a Riemannian manifold M with boundary
∂M. The Poisson equation ∆u = ρ with Neumann boundary condition $\frac{\partial u}{\partial n} = 0$ on the
boundary ∂M is solvable for any function ρ with zero mean: $\int_M \rho\,\mu = 0$.
Here $\frac{\partial}{\partial n}$ is the differentiation in the direction of outer normal n on the boundary.
Proof Proceed in the same way as in Lemma 5.6 to find all functions $L^2$-orthogonal to
the image Im ∆. The first integration by parts gives:
\[ 0 = \int_M h\,(\Delta u)\,\mu = -\int_M \langle \nabla h, \nabla u\rangle\,\mu + \int_{\partial M} h\,\Big(\frac{\partial}{\partial n}u\Big)\,\mu = -\int_M \langle \nabla h, \nabla u\rangle\,\mu, \]
where in the last equality we used the Neumann boundary conditions. The second
integration by parts gives:
\[ 0 = \int_M \langle \Delta h, u\rangle\,\mu - \int_{\partial M} \Big(\frac{\partial}{\partial n}h\Big)\,u\,\mu. \]
This equation holds for all smooth functions u on M, so any such function h must be
harmonic in M and satisfy the Neumann boundary condition $\frac{\partial h}{\partial n} = 0$. Hence, these are
constant functions on M: $(\operatorname{Im}\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$ (see [62, p.402] for details). This gives
the same description as in the no-boundary case: the image (Im ∆) with the Neumann
condition consists of functions ρ with zero mean. □
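A finite-dimensional analogue shows the same solvability condition. In the sketch below, the interval is replaced by a path graph with n nodes and the Neumann Laplacian by the graph Laplacian with no flux through the two end nodes; this discretization and the function names are assumptions made for illustration. The equation $Lu = \rho$ is solved by integrating the discrete flux $u_{i+1} - u_i = \rho_0 + \dots + \rho_i$, which is consistent at the last node exactly when ρ sums to zero.

```python
from typing import List, Optional

def neumann_laplacian(u: List[float]) -> List[float]:
    """Graph Laplacian on a path: sum of (u_j - u_i) over the neighbours j of i."""
    n = len(u)
    out = []
    for i in range(n):
        s = 0.0
        if i > 0:
            s += u[i - 1] - u[i]
        if i < n - 1:
            s += u[i + 1] - u[i]
        out.append(s)
    return out

def solve_neumann_poisson(rho: List[float], tol: float = 1e-12) -> Optional[List[float]]:
    """Solve L u = rho with u[0] = 0, or return None if rho does not have zero sum."""
    if abs(sum(rho)) > tol:
        return None  # the image of L is orthogonal to the constants
    u = [0.0]
    flux = 0.0
    for r in rho[:-1]:
        flux += r                # flux through the edge (i, i + 1)
        u.append(u[-1] + flux)
    return u

rho = [1.0, -2.0, 0.5, 0.5]          # zero sum, so a solution exists
u = solve_neumann_poisson(rho)
residual = [a - b for a, b in zip(neumann_laplacian(u), rho)]
unsolvable = solve_neumann_poisson([1.0, 1.0, 1.0])
```

The solution is unique up to an additive constant, the discrete counterpart of the kernel consisting of constant functions.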
Geometrically, the Neumann boundary condition means that there is no flux of density
through the boundary ∂M: $0 = \frac{\partial u}{\partial n} = n \cdot \nabla u = n \cdot V$ on ∂M.
For distributions on manifolds with boundary, the solution of the Neumann problem
becomes a much more subtle issue, as the behavior of the distribution near the boundary
affects the flux of horizontal fields across the boundary, and hence the solvability in this
problem. However, there is a class of domains in length spaces for which the solvability
of the Neumann problem was established.
Let LS be a length space with the distance function d(x, y), defined as infimum of
lengths of continuous curves joining x, y ∈ LS. Consider domains in this space with the
property that sufficiently close points in those domains can be joined by a not very long
path which does not get too close to the domain boundary. The formal definition is as
follows.
Definition 5.11 An open set Ω ⊆ LS is called an (ε, δ)-domain if there exist δ > 0 and
0 < ε ≤ 1 such that for any pair of points p, q ∈ Ω with d(p, q) ≤ δ there is a continuous
rectifiable curve γ : [0, T] → Ω starting at p and ending at q such that the length l(γ) of
the curve γ satisfies
\[ l(\gamma) \le \frac{1}{\varepsilon}\, d(p, q) \]
and
\[ \min\{d(p, z), d(q, z)\} \le \frac{1}{\varepsilon}\, d(z, \partial\Omega) \]
for all points z on the curve γ.
A large source of (ε, δ)-domains is given by some classes of open sets in Carnot groups,
where the Carnot group itself is regarded as a length space with the Carnot-Caratheodory
distance, defined via the lengths of admissible (i.e. horizontal) paths, see e.g. [46]. There
is a natural notion of diameter (or, radius) for domains in length spaces.
Theorem 5.12 [34] Let τ be a bracket-generating distribution on a subriemannian manifold
M with smooth boundary ∂M, and µ0, µ1 be two volume forms on M with the same
total volume: $\int_M \mu_0 = \int_M \mu_1$. Suppose that the interior of M is an (ε, δ)-domain of
positive diameter.
Then there exists a diffeomorphism φ of M which is the time-one-map of the flow $\phi_t$
of a non-autonomous vector field $V_t$ tangent to the distribution τ everywhere on M and
to the boundary ∂M for every t ∈ [0, 1], such that $\phi^*\mu_1 = \mu_0$.
The proof immediately follows from the result on solvability of the corresponding
Neumann problem $\Delta^\tau u = \rho$ with $n \cdot (P^\tau\nabla u)|_{\partial M} = 0$ (or, which is the same, $\frac{\partial u}{\partial(P^\tau n)}\big|_{\partial M} = 0$)
for such domains, established in [47, 46] (cf. Theorem 1.5 in [46]).
Chapter 6
Distributions on Diffeomorphism
Groups
6.1 A Fibration on the Group of Diffeomorphisms
Let $\mathcal{D}$ be the group of all (orientation-preserving) diffeomorphisms of a manifold M.
Its Lie algebra $\mathfrak{X}$ consists of all smooth vector fields on M. The tangent space to the
diffeomorphism group at any point $\phi \in \mathcal{D}$ is given by the right translation of the Lie
algebra $\mathfrak{X}$ from the identity $\mathrm{id} \in \mathcal{D}$ to φ:
\[ T_\phi\mathcal{D} = \{ X \circ \phi \mid X \in \mathfrak{X} \}. \]
Fix a volume form µ of total volume 1 on the manifold M. Denote by $\mathcal{D}_\mu$ the subgroup
of volume-preserving diffeomorphisms, i.e. the diffeomorphisms preserving the volume
form µ. The corresponding Lie algebra $\mathfrak{X}_\mu$ is the space of all vector fields on the manifold
M which are divergence-free with respect to the volume form µ.
Let $\mathcal{W}$ be the set of all smooth normalized volume forms on M, which is called the
(smooth) Wasserstein space. Consider the projection map $\pi_{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ defined by the
push forward of the fixed volume form µ by the diffeomorphism φ, i.e. $\pi_{\mathcal{D}}(\phi) = \phi_*\mu$.
The projection $\pi_{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ defines a natural structure of a principal bundle on $\mathcal{D}$
whose structure group is the subgroup $\mathcal{D}_\mu$ of volume-preserving diffeomorphisms of M
and whose fibers F are right cosets for this subgroup in $\mathcal{D}$. Two diffeomorphisms φ and $\tilde\phi$ lie
in the same fiber if they differ by a composition (on the right) with a volume-preserving
diffeomorphism: $\tilde\phi = \phi \circ s$, $s \in \mathcal{D}_\mu$.
On the group D we define two vector bundles Ver and Hor whose spaces at a diffeomorphism
φ ∈ D consist of right translated divergence-free fields
\[ \mathrm{Ver}_\phi = \{ X \circ \phi \mid \operatorname{div}_{\phi_*\mu} X = 0 \} \]
and gradient fields
\[ \mathrm{Hor}_\phi := \{ \nabla f \circ \phi \mid f \in C^\infty(M) \}, \]
respectively. Note that the bundle Ver is defined by the fixed volume form µ, while Hor
requires a Riemannian metric.¹
Proposition 6.1 The bundle Ver of translated divergence-free fields is the bundle of
vertical spaces TφF for the fibration πD : D → W. The bundle Hor over D defines a
horizontal distribution for this fibration πD.
Proof Let $\phi_t$ be a curve in a fibre of $\pi_{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ emanating from the point $\phi_0 = \phi$.
Then $\phi_t = \phi_0 \circ s_t$, where $s_0 = \mathrm{id}$ and $s_t$ are volume-preserving diffeomorphisms for each
t. Let $X_t$ be a family of divergence-free vector fields, such that $\partial_t s_t = X_t \circ s_t$. Then the
vector tangent to the curve $\phi_t = \phi_0 \circ s_t$ is given by
\[ \frac{d}{dt}\Big|_{t=0} (\phi_0 \circ s_t) = (\phi_{0*}X_0) \circ \phi_0. \]
Since $X_0$ is divergence-free with respect to µ, $\phi_{0*}X_0$ is divergence-free with respect to $\phi_*\mu$. Hence,
any vector tangent to the fibre at φ is given by $X \circ \phi$, where X is a
divergence-free field with respect to the form $\phi_*\mu$.
By the Hodge decomposition of vector fields, we have the direct sum $T\mathcal{D} = \mathrm{Hor} \oplus \mathrm{Ver}$.
□
¹ The metric on M does not need to have the volume form µ. In the general case, Xµ consists of vector fields divergence-free with respect to µ, while the gradients are considered for the chosen metric on M.
Remark 6.2 The classical Moser theorem 5.2 can be thought of as the existence of
path-lifting property for the principal bundle πD : D → W : any deformation of volume
forms can be traced by the corresponding flow, i.e. a path on the diffeomorphism group,
projected to the deformation of forms. Its proof shows that this path lifting property
holds and has the uniqueness property in the presence of the horizontal distribution
defined above by using the Hodge decomposition. Namely, given any path µt starting at
µ0 in the smooth Wasserstein space W and a point φ0 in the fibre (πD)⁻¹(µ0), there exists
a unique path φt in the diffeomorphism group which is tangent to the horizontal bundle
Hor, starts at φ0, and projects to µt, see Figure 6.1.
[Figure 6.1 (image not reproduced); labels: D, W, π, Dµ, µ = µ0, µt, µ1, φ0 = id, φt, φ1, Horφt, ∇f]
Figure 6.1: The Moser theorem in both the classical and nonholonomic settings is a
path-lifting property in the diffeomorphism group.
6.2 A Nonholonomic Distribution on the Diffeomorphism Group
Let τ be a bracket-generating distribution on the manifold M. Consider the right-invariant
distribution T on the diffeomorphism group D defined at the identity id ∈ D
of the group by the subspace in 𝔛 of all those vector fields which are tangent to the
distribution τ everywhere on M:
\[ T_\phi = \{ V \circ \phi \mid V(x) \in \tau_x \ \text{for all } x \in M \}. \]
Proposition 6.3 The infinite-dimensional distribution T is a non-integrable distribu-
tion in D. Horizontal paths in this distribution are flows of non-autonomous vector fields
tangent to the distribution τ on the manifold M.
Proof To see that the distribution T is non-integrable we consider two horizontal vector
fields V and W on M and the corresponding right-invariant vector fields Ṽ and W̃ on
D. The bracket of Ṽ and W̃ at the identity of the group is (minus) the commutator of the
vector fields V and W on M. For a suitable choice of V and W this commutator does not
belong to the plane Tid, since the distribution τ is non-integrable: at least somewhere
on M the commutator of horizontal fields V and W is not horizontal.
The second statement immediately follows from the definition of T. □
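To see the non-integrability mechanism on M itself, one can compute a commutator of two horizontal fields numerically. The sketch below assumes, purely for illustration, the Heisenberg distribution on R^3 spanned by V = ∂x − (y/2)∂z and W = ∂y + (x/2)∂z; this choice, the sample point p and the helper names are not taken from the text. Derivatives are approximated by central differences.

```python
# Commutator of two fields spanning the (assumed) Heisenberg distribution on R^3.

def V(p):
    x, y, z = p
    return (1.0, 0.0, -y / 2)

def W(p):
    x, y, z = p
    return (0.0, 1.0, x / 2)

def directional_derivative(F, p, u, h=1e-5):
    """Central-difference derivative of the vector field F at p in the direction u."""
    plus = F(tuple(pi + h * ui for pi, ui in zip(p, u)))
    minus = F(tuple(pi - h * ui for pi, ui in zip(p, u)))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def lie_bracket(F, G, p):
    """[F, G](p) = DG(p) F(p) - DF(p) G(p)."""
    dG = directional_derivative(G, p, F(p))
    dF = directional_derivative(F, p, G(p))
    return tuple(a - b for a, b in zip(dG, dF))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

p = (0.3, -1.2, 0.7)                 # arbitrary sample point
bracket = lie_bracket(V, W, p)       # approximately (0, 0, 1), i.e. d/dz
```

Since det3(V(p), W(p), bracket) is nonzero, the commutator is not a combination of V and W at p, so it leaves the distribution; the same fields, right-translated to D, then have a bracket outside T.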
Remark 6.4 Consider now the projection map πD : D → W in the presence of the
distribution T on D. The path lifting property in this case is a restatement of the
nonholonomic Moser theorem. Namely, for a curve {µt | µ0 = µ} in the space W of
smooth densities, Theorem 5.1 proves that there is a curve {gt | g0 = id} in D, everywhere
tangent to the distribution T and projecting to µt: πD(gt) = µt.
Recall that in the classical case the corresponding path lifting becomes unique once
we fix the gradient horizontal bundle Horφ ⊂ TφD for any diffeomorphism φ ∈ D. Similarly,
in the nonholonomic case we consider the spaces of gradient projections instead
of the gradient spaces: $\mathrm{Hor}^\tau_{\mathrm{id}} := \{ P^\tau \nabla f \mid f \in C^\infty(M) \}$, where $P^\tau$ stands for the
orthogonal projection onto the distribution τ in a given Riemannian metric on M. The
right-translated gradient projections $\mathrm{Hor}^\tau_\phi := \{ (P^\tau \nabla f) \circ \phi \mid f \in C^\infty(M) \}$ define a
horizontal bundle for the principal bundle D → W by the nonholonomic Hodge decomposition.
(Note also that in both classical and nonholonomic cases, the obtained horizontal distri-
butions on D are nonintegrable, cf. [52]. Indeed, the Lie bracket of two gradient fields
is not necessarily a gradient field, and similarly for gradient projections. Hence there
are no horizontal sections of the bundle D → W , tangent to these horizontal gradient
distributions.)
As we will see in Chapters 7 and 9, both gradient fields ∇f in the classical case
and gradient projections P τ∇f in the nonholonomic case allow one to move the den-
sities in the “fastest way”, and are important in transport problems of finding optimal
(“shortest”) path between densities.
6.3 Accessibility of Diffeomorphisms and Consequences
A stronger statement was proven in [2] after my paper [34] was posted on the
arXiv:
Theorem 6.5 Every orientation preserving diffeomorphism in the diffeomorphism group
D can be accessed by a horizontal path tangent to the distribution T from the identity
diffeomorphism.
This theorem can be thought of as an analog of the Chow-Rashevsky theorem in the
infinite-dimensional setting of the group of diffeomorphisms, provided that the distribution
T is bracket-generating on D. Note, however, that the Chow-Rashevsky theorem is
unknown in the general setting of an infinite-dimensional manifold, while there are only
“approximate” analogs of it, e.g. on a Hilbert manifold.
This theorem together with the original Moser theorem implies the nonholonomic Moser
theorem 5.1 on volume forms. Moreover, it also implies the following nonholonomic
version of the Moser theorem on symplectic structures from [43].
Corollary 6.6 Suppose that on a manifold M two symplectic structures ω0 and ω1 from
the same cohomology class can be connected by a path of symplectic structures in the same
class. Then for a bracket-generating distribution τ on M there exists a diffeomorphism
φ of M which is the time-one-map of a non-autonomous vector field Vt tangent to the
distribution τ everywhere on M and for every t ∈ [0, 1], such that φ∗ω1 = ω0.
This corollary follows from the theorem above: one takes the diffeomorphism provided
by the classical Moser theorem and realizes it by a horizontal path (tangent to the
distribution T) on the diffeomorphism group, which exists according to Theorem 6.5.
Chapter 7
The Riemannian Geometry of
Diffeomorphism Groups and Mass
Transport
The differential geometry of diffeomorphism groups is closely related to the theory of
optimal mass transport, and in particular, to the problem of moving one density to
another while minimizing a certain cost on a Riemannian manifold. In this section, we
review the corresponding metric properties of the diffeomorphism group and the space
of volume forms.
Let M be a compact Riemannian manifold without boundary (or, more generally, a
complete metric space) with a distance function d. Let µ and ν be two Borel probability
measures on the manifold M which are absolutely continuous with respect to the Lebesgue
measure. Consider the following optimal mass transport problem: Find a Borel map
φ : M → M that pushes the measure µ forward to ν and attains the infimum of the
L2-cost functional $\int_M d^2(x, \phi(x))\,\mu$ among all such maps.
The set of all Borel probability measures is called the Wasserstein space. The minimal
cost of transport defines a metric d on this space:
\[ d^2(\mu, \nu) := \inf_\phi \Big\{ \int_M d^2(x, \phi(x))\,\mu \;\Big|\; \phi_*\mu = \nu \Big\}. \tag{7.1} \]
This mass transport problem admits a unique solution φ (defined up to measure zero
sets), called an optimal map (see [16] for M = Rn and [42] for any compact connected
Riemannian manifold M without boundary). Furthermore, there exists a 1-parameter
family of Borel maps φt starting at the identity map φ0 = id, ending at the optimal map
φ1 = φ and such that φt is the optimal map pushing µ forward to νt := φt∗µ for any
t ∈ (0, 1). The corresponding 1-parameter family of measures νt describes a geodesic in
the Wasserstein space of measures with respect to the distance function d and is called
the displacement interpolation between µ and ν, see [63] for details. (More generally,
in mass transport problems one can replace d2 in the above formula by a cost function
c : M×M → R, while we mostly focus on the case c = d2 and its subriemannian analog.)
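A toy version of problem (7.1) may help fix ideas. The sketch below replaces the absolutely continuous measures of the text by two sets of equally weighted points on the real line (an assumption made only so that everything is finite) and compares the monotone matching, which is known to be optimal for the quadratic cost in one dimension, with an exhaustive search over all bijections. The function names and the sample data are hypothetical.

```python
from itertools import permutations

def transport_cost(xs, ys):
    """Average squared displacement when the i-th source point is sent to ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def monotone_map(xs, ys):
    """Send the i-th smallest source point to the i-th smallest target point."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    targets = sorted(ys)
    image = [None] * len(xs)
    for rank, i in enumerate(order):
        image[i] = targets[rank]
    return image

def brute_force_cost(xs, ys):
    """Minimal cost over all bijections between the two point sets."""
    return min(transport_cost(xs, perm) for perm in permutations(ys))

xs = [0.1, 0.9, 0.4, 0.7, 0.25]      # support of the source measure (sample data)
ys = [1.3, 0.2, 0.8, 1.9, 0.5]       # support of the target measure (sample data)
phi = monotone_map(xs, ys)           # phi[i] is the image of xs[i]
```

Here transport_cost(xs, phi) agrees with brute_force_cost(xs, ys); interpolating each point linearly between xs[i] and phi[i] gives the discrete counterpart of the displacement interpolation.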
In what follows, we consider a smooth version of the Wasserstein space, cf. Section
6.1. Recall that the smooth Wasserstein space W consists of smooth volume forms with
the total integral equal to 1. One can consider an infinite-dimensional manifold structure
on the smooth Wasserstein space, a (weak) Riemannian metric 〈 , 〉W , corresponding to
the distance function d, and geodesics on this space. Similar to the finite-dimensional
case, geodesics on the smooth Wasserstein space W can be formally defined as projections
of trajectories of the Hamiltonian vector field with the “kinetic energy” Hamiltonian in
the tangent bundle TW .
For a Riemannian manifold M both spaces D and W can be equipped with (weak)
Riemannian structures, i.e. can be formally regarded as infinite-dimensional Riemannian
manifolds, cf. [21]. (One can consider Hs-diffeomorphisms and Hs−1-forms of Sobolev
class s > n/2 + 1. Both sets can be considered as smooth Hilbert manifolds. However,
this is not applicable in the subriemannian case, discussed later, hence we confine ourselves to the
C∞ setting applicable in both cases.)
From now on we fix a Riemannian metric 〈, 〉M on the manifold M, whose Riemannian
volume is the form µ. On the diffeomorphism group we define a Riemannian metric 〈, 〉D
whose value at a point φ ∈ D is given by
\[ \langle X_1 \circ \phi, X_2 \circ \phi \rangle_D := \int_M \langle X_1 \circ \phi(x), X_2 \circ \phi(x) \rangle_{M,\,\phi(x)}\,\mu. \tag{7.2} \]
The action along a curve (or, “energy” of a curve) {φt | t ∈ [0, 1]} ⊂ D in this metric is
defined in the following straightforward way:
\[ E(\phi_t) = \int_0^1 dt \int_M \langle \partial_t \phi_t, \partial_t \phi_t \rangle_M\,\mu. \]
If M is flat, D is locally isometric to the (pre-)Hilbert L2-space of (smooth) vector-
functions φ, see e.g. [57]. The following proposition is well-known.
Proposition 7.1 Let φt be a geodesic on the diffeomorphism group D with respect to
the above Riemannian metric 〈, 〉D, and Vt be the (time-dependent) velocity field of the
corresponding flow: ∂tφt = Vt ∘ φt. Then the velocity Vt satisfies the inviscid Burgers
equation on M:
\[ \partial_t V_t + \nabla_{V_t} V_t = 0, \]
where $\nabla_{V_t} V_t$ stands for the covariant derivative of the field Vt on M along itself.
Proof In the flat case the geodesic equation is $\partial_t^2 \phi_t = 0$: this is the Euler-Lagrange
equation for the action functional E. Differentiate ∂tφt = Vt ∘ φt with respect to time t
and use this geodesic equation to obtain
\[ \partial_t V_t \circ \phi_t + \nabla V_t \cdot \partial_t \phi_t = 0. \tag{7.3} \]
After another substitution ∂tφt = Vt ∘ φt, the latter becomes
\[ (\partial_t V_t + \nabla_{V_t} V_t) \circ \phi_t = 0, \]
which is equivalent to the Burgers equation.
The non-flat case involves differentiation in the Levi-Civita connection on M and
leads to the same Burgers equation, see details in [21, 35]. □
Remark 7.2 Smooth solutions of the Burgers equation correspond to non-interacting
particles on the manifold M flying along those geodesics on M which are defined by the
initial velocities V0(x). The Burgers flows have the form φt(x) = expM(tV0(x)), where
expM : TM → M is the Riemannian exponential map on M .
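The free-particle picture of Remark 7.2 can be checked numerically in the flat one-dimensional case, where the exponential map is x + tV0(x). The sketch below assumes the initial velocity V0(x) = 0.5 sin x (an arbitrary choice, not from the text), inverts the flow by bisection to obtain the Eulerian velocity Vt, and estimates the Burgers residual ∂tVt + Vt ∂yVt by central differences. All function names are hypothetical.

```python
import math

def v0(x):
    """Assumed initial velocity field on the line."""
    return 0.5 * math.sin(x)

def flow(t, x):
    """Flat-case Burgers flow: each particle keeps its initial velocity."""
    return x + t * v0(x)

def inverse_flow(t, y):
    """Solve flow(t, x) = y by bisection; valid for 0 <= t < 2, where the flow is monotone."""
    lo, hi = y - 1.0, y + 1.0          # |t * v0| stays below 1 on the range used here
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if flow(t, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def velocity(t, y):
    """Eulerian velocity V_t, defined by V_t(flow(t, x)) = v0(x)."""
    return v0(inverse_flow(t, y))

def burgers_residual(t, y, h=1e-4):
    """Central-difference estimate of dV/dt + V dV/dy at (t, y)."""
    dt = (velocity(t + h, y) - velocity(t - h, y)) / (2 * h)
    dy = (velocity(t, y + h) - velocity(t, y - h)) / (2 * h)
    return dt + velocity(t, y) * dy

residuals = [burgers_residual(t, y) for t in (0.2, 0.5, 1.0) for y in (-1.0, 0.3, 2.0)]
```

The residuals vanish up to discretization error. For this V0 the flow stops being invertible at t = 2, where particles start to collide and the smooth solution ends.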
Proposition 7.3 [52] The bundle projection πD : D →W is a Riemannian submersion
of the metric 〈 , 〉D on the diffeomorphism group D to the Riemannian metric 〈 , 〉W on
the smooth Wasserstein space W for the L2-cost. The horizontal (i.e. normal to fibers)
spaces in the bundle D →W are right-translated gradient fields.
Recall that for two Riemannian manifolds Q and B, a Riemannian submersion π :
Q → B is a mapping onto B which has maximal rank and preserves lengths of horizontal
tangent vectors to Q, see e.g. [51]. For a bundle Q → B, this means that there is
a distribution of horizontal spaces on Q, orthogonal to the fibers, which is projected
isometrically to the tangent spaces to B. One of the main properties of a Riemannian
submersion gives the following feature of geodesics:
Corollary 7.4 Any geodesic, initially tangent to a horizontal space on the full diffeo-
morphism group D, always remains horizontal, i.e. tangent to the horizontal distribu-
tion. There is a one-to-one correspondence between geodesics on the base W starting at
the measure µ and horizontal geodesics in D starting at the identity diffeomorphism id.
Remark 7.5 In the PDE terms, the horizontality of a geodesic means that a solution
of the Burgers equation with a potential initial condition remains potential forever. This
also follows from the Hamiltonian formalism and the moment map geometry discussed
in the next section. Since horizontal geodesics in the group D correspond to geodesics
on the density space W , potential solutions of the Burgers equation (corresponding to
horizontal geodesics) move the densities in the fastest way. The corresponding time-one-
maps for Burgers potential solutions provide optimal maps for moving the density µ to
any other density ν, see [16, 42].
The Burgers potential solutions have the form φt(x) = expM(−t∇f(x)) as long as
the right-hand side is smooth. The time-one-map φ1 for the flow φt provides an optimal
map between probability measures if the function f is a (d²/2)-concave function. We
recall that the notion of c-concavity for a cost function c on M is defined as follows.
For a function f its c1-transform is $f^{c_1}(y) = \inf_{x \in M} (c(x, y) - f(x))$, its c2-transform is
$f^{c_2}(x) = \inf_{y \in M} (c(x, y) - f(y))$, and the function f is said to be c-concave if $f^{c_1 c_2} = f$.
Here, we consider the case c = d²/2. The family of maps φt defines the displacement
interpolation mentioned at the beginning of this chapter.
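The c-transforms are easy to experiment with on a grid. The sketch below assumes M is the interval [0, 1] sampled at 41 points, the cost c = d²/2 (for which the potential of a map of the form exp(−∇f) is c-concave), and an arbitrary test function f; none of these choices come from the text. Because the cost is symmetric, the c1- and c2-transforms are given by the same discrete formula.

```python
def c_transform(values, cost, n):
    """Discrete c-transform: g(j) = min over i of (cost(i, j) - values[i])."""
    return [min(cost(i, j) - values[i] for i in range(n)) for j in range(n)]

n = 41
grid = [k / (n - 1) for k in range(n)]                   # sample points in [0, 1]
cost = lambda i, j: 0.5 * (grid[i] - grid[j]) ** 2       # c = d^2 / 2 (symmetric)

f = [0.3 * x * (1 - x) - abs(x - 0.5) for x in grid]     # hypothetical test potential
f_c1 = c_transform(f, cost, n)
f_c1c2 = c_transform(f_c1, cost, n)
f_c1c2c1 = c_transform(f_c1c2, cost, n)
```

Two general facts are visible here: the double transform dominates f pointwise, and transforming a third time returns f_c1. A function is c-concave exactly when the first inequality is an equality everywhere.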
Let θ and ν be volume forms with the same total volume and let g and h be functions
on the manifold M defined by θ = g vol and ν = h vol, where vol is the Riemannian
volume form. Then a diffeomorphism φ moving one density to the other (φ∗θ = ν) satisfies
h(φ(x)) det(Dφ(x)) = g(x), where Dφ is the Jacobi matrix of the diffeomorphism φ. In
the flat case the optimal map φ is gradient, φ = ∇f, and the corresponding convex
potential f satisfies the Monge-Ampère equation
\[ \det(\operatorname{Hess} f(x)) = \frac{g(x)}{h(\nabla f(x))}, \]
since D(∇f) = Hess f. In the non-flat case, the optimal map is φ(x) = expM(−∇f(x))
for a (d²/2)-concave potential f, and the equation is Monge-Ampère-like, see [42, 63] for
details. Below we describe the corresponding nonholonomic analogs of these objects.
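As a sanity check of the flat Monge-Ampère equation, here is a one-dimensional worked example. The Gaussian densities and the parameters m and σ > 0 are assumptions made for illustration and do not appear in the text (the line is also not compact); the computation only confirms that the formula is consistent in this simple case.

```latex
% Assumed data on M = \mathbb{R}: standard Gaussian source, Gaussian target N(m, \sigma^2).
g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},
\qquad
h(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-m)^2/(2\sigma^2)} .

% Convex potential and its gradient map:
f(x) = m x + \tfrac{\sigma}{2} x^2,
\qquad
\phi(x) = \nabla f(x) = m + \sigma x .

% Both sides of the Monge-Ampere equation:
\det(\operatorname{Hess} f(x)) = f''(x) = \sigma,
\qquad
\frac{g(x)}{h(\nabla f(x))}
  = \frac{\tfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}}
         {\tfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(\sigma x)^2/(2\sigma^2)}}
  = \sigma .
```

Both sides equal σ, so the affine map x ↦ m + σx solves the equation for this pair of densities.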
Chapter 8
The Hamiltonian Mechanics on
Diffeomorphism Groups
In this section we present a Hamiltonian framework for the Otto calculus and, in par-
ticular, give a symplectic proof of Proposition 7.3 and Corollary 7.4 on the submersion
properties along with their generalizations.
8.1 Averaged Hamiltonians
We fix a Riemannian metric 〈, 〉M on the manifold M and consider the corresponding
Riemannian metric 〈, 〉D on the diffeomorphism group D. This defines a map
$X \circ \phi \mapsto \langle X \circ \phi, \cdot \rangle_D$ from the tangent bundle TD to the cotangent bundle T∗D. By using this map,
one can pull back the canonical symplectic form ωT∗D from the cotangent bundle T∗D to
the tangent bundle TD, and regard the latter as a manifold equipped with the symplectic
form ωTD.¹ Similarly, a symplectic structure ωTM can be defined on the tangent bundle
TM by pulling back the canonical symplectic form on the cotangent bundle T∗M via the
Riemannian metric 〈, 〉M. The two symplectic forms are related as follows. A tangent
vector V in the tangent space $T_{X \circ \phi} TD$ at the point X ∘ φ ∈ TD is a map from M to
T(TM) = T²M such that $\pi_{T^2M} \circ V = X \circ \phi$, where $\pi_{T^2M} : T(TM) \to TM$ is the tangent
bundle projection. Let V1 and V2 be two tangent vectors in $T_{X \circ \phi} TD$ at the point X ∘ φ,
then the symplectic forms are related in the following way:
\[ \omega_{TD}(V_1, V_2) = \int_M \omega_{TM}(V_1(x), V_2(x))\,\mu(x), \]
where ωTM is understood as the pairing on T(TM) = T²M.
¹ The consideration of the tangent bundle TD (instead of T∗D) as a symplectic manifold allows one
to avoid dealing with duals of infinite-dimensional spaces here.
Definition 8.1 Let HM be a Hamiltonian function on the tangent bundle TM of the
manifold M. The averaged Hamiltonian function is the function HD on the tangent
bundle TD of the diffeomorphism group D obtained by averaging the corresponding
Hamiltonian HM over M in the following way: its value at a point X ∘ φ ∈ TφD is
\[ H_D(X \circ \phi) := \int_M H_M(X \circ \phi(x))\,\mu(x) \tag{8.1} \]
for a vector field $X \in \mathfrak{X}$ and a diffeomorphism φ ∈ D.
Consider the Hamiltonian flows for these Hamiltonian functions HM and HD on
the tangent bundles TM and TD, respectively, with respect to the standard symplectic
structures on the bundles. The following theorem can be viewed as a generalization of
Propositions 7.1 and 7.3.
Theorem 8.2 Each Hamiltonian trajectory for the averaged Hamiltonian function HD
on TD describes a flow on the tangent bundle TM , in which every tangent vector to M
moves along its own HM -Hamiltonian trajectory in TM .
Example 8.3 For the Hamiltonian $K_M(p, q) = \frac{1}{2}\langle p, p \rangle_M$ given by the “kinetic energy”
for the metric on M , the above theorem implies that any geodesic on D is a family of
diffeomorphisms of M , in which each particle moves along its own geodesic on M with
constant velocity, i.e. its velocity field is a solution to the Burgers equation, cf. Remark
7.2.
Below we discuss this theorem and its geometric meaning in detail. In particular, in
the above form, the statement is also applicable to the case of nonholonomic distributions
(i.e. subriemannian, or Carnot-Carathéodory spaces) discussed in the next section.
8.2 Riemannian Submersion and Symplectic Quotients
We start with a Hamiltonian proof of Proposition 7.3 on the Riemannian submersion
D → W of diffeomorphisms onto densities. Recall the following general construction in
symplectic geometry. Let π : Q → B be a principal bundle with the structure group G.
Lemma 8.4 (see e.g. [12]) The symplectic reduction of the cotangent bundle T ∗Q over
the G-action gives the cotangent bundle T ∗B = T ∗Q//G.
Proof The moment map J : T∗Q → g∗ associated with this action takes T∗Q to the
dual of the Lie algebra g = Lie(G). For the G-action on T∗Q the moment map J is
the projection of any cotangent space $T^*_a Q$ to the cotangent space $T^*_a F \approx \mathfrak{g}^*$ for the fiber F
through a point a ∈ Q. The preimage J⁻¹(0) of the zero value is the subbundle of T∗Q
consisting of covectors vanishing on fibers. Such covectors are naturally identified with
covectors on the base B. Thus factoring out the G-action, which moves the point a over
the fiber F, we obtain the bundle T∗B. □
Suppose also that Q is equipped with a G-invariant Riemannian metric 〈, 〉Q.
Lemma 8.5 The Riemannian submersion of (Q, 〈, 〉Q) to the base B with the induced
metric 〈, 〉B is the result of the symplectic reduction.
Proof Indeed, the metric 〈, 〉Q gives a natural identification T ∗Q ≈ TQ of the tan-
gent and cotangent bundles for Q, and the “projected metric” is equivalent to a similar
identification for the base manifold B.
In the presence of metric in Q, the preimage J−1(0) is identified with all vectors in
TQ orthogonal to fibers, that is J−1(0) is the horizontal subbundle in TQ. Hence, the
symplectic quotient J⁻¹(0)/G can be identified with the tangent bundle TB. □
Proof of Proposition 7.3
Now we apply this “dictionary” to the diffeomorphism group D and the Wasserstein
space W . Consider the projection map πD : D → W as a principal bundle with the
structure group Dµ of volume-preserving diffeomorphisms of M . Recall that the vertical
space of this principal bundle at a point φ ∈ D consists of right-translations by the
diffeomorphism φ of vector fields which are divergence-free with respect to the volume
form φ∗µ: $\mathrm{Ver}_\phi = \{ X \circ \phi \mid \operatorname{div}_{\phi_*\mu} X = 0 \}$, and the horizontal space is given by translated
gradient fields: $\mathrm{Hor}_\phi = \{ \nabla f \circ \phi \mid f \in C^\infty(M) \}$.
For each volume-preserving diffeomorphism ψ ∈ Dµ, the Dµ-action Rψ of ψ by right
translations on the diffeomorphism group is given by
\[ R_\psi(\phi) = \phi \circ \psi. \]
The induced action TRψ : TD → TD on the tangent spaces of the diffeomorphism group
is given by
\[ TR_\psi(X \circ \phi) = (X \circ \phi) \circ \psi. \]
One can see that for volume-preserving diffeomorphisms ψ this action preserves the
Riemannian metric (7.2) on the diffeomorphism group D (it is the change of variable
formula), while for a general diffeomorphism one has an extra factor Dψ, the Jacobian of
ψ, in the integral. Thus the metric on D is Dµ-invariant, and Lemma 8.5 applies to the
bundle D → W. □
Remark 8.6 The explicit formula of the moment map $J : TQ \to \mathfrak{X}^*_\mu$ for the group of
volume-preserving diffeomorphisms G = Dµ acting on Q = D is
\[ J(X \circ \phi)(Y) = \int_M \langle X, \phi_* Y \rangle_M\,\phi_*\mu, \]
where $Y \in \mathfrak{X}_\mu$ is any vector field on M divergence-free with respect to the volume form
µ, $X \in \mathfrak{X}$, and φ ∈ D.
8.3 Hamiltonian Flows on the Diffeomorphism Groups
Let HQ : TQ → R be a Hamiltonian function invariant under the G-action on the tangent
bundle of the total space Q. The restriction of the function HQ to the horizontal
bundle J−1(0) ⊂ TQ is also G-invariant, and hence descends to a function HB : TB → R
on the symplectic quotient, the tangent bundle of the base B. Symplectic quotients
admit the following reduction of Hamiltonian dynamics:
Proposition 8.7 [12] The Hamiltonian flow of the function HQ preserves the preimage
J−1(0), i.e. trajectories with horizontal initial conditions stay horizontal. Furthermore,
the Hamiltonian flow of the function HQ on the tangent bundle TQ of the total space Q
descends to the Hamiltonian flow of the function HB on the tangent bundle TB of the
base.
Now we are going to apply this scheme to the bundle D → W. For a fixed Hamiltonian
function HM on the tangent bundle TM to the manifold M, consider the corresponding
averaged Hamiltonian function HD on TD, given by the formula (8.1):
$H_D(X \circ \phi) := \int_M H_M(X \circ \phi(x))\,\mu$. The latter Hamiltonian is Dµ-invariant (as also follows from the
change of variable formula) and it will play the role of the function HQ. Thus the flow
for the averaged Hamiltonian HD descends to the flow of a certain Hamiltonian HW on
TW.
Let us describe explicitly the corresponding flows on the tangent bundles of D and W. Let
$\Psi^{H_M}_t : TM \to TM$ be the Hamiltonian flow of the Hamiltonian HM on the tangent
bundle of the manifold M, and let $\Psi^{H_D}_t : TD \to TD$ denote the flow for the Hamiltonian
function HD on the tangent bundle of the diffeomorphism group.
Theorem 8.8 (=8.2′) The Hamiltonian flows of the Hamiltonians HD and HM are
related by
\[ \Psi^{H_D}_t(X \circ \phi)(x) = \Psi^{H_M}_t(X(\phi(x))), \]
where, on the right-hand side, the flow $\Psi^{H_M}_t$ on TM transports the shifted field X(φ(x)),
while, on the left-hand side, X ∘ φ is regarded as a tangent vector to D at the point φ.
Proof We prove this infinitesimally (cf. [21]). Let $X_{H_D}$ and $X_{H_M}$ be the Hamiltonian
vector fields corresponding to the Hamiltonians HD and HM respectively. We claim that
$X_{H_D}(X \circ \phi) = X_{H_M} \circ X \circ \phi$. Indeed, by the definition of Hamiltonian fields, we have
\[ \omega_{TD}(X_{H_M} \circ X \circ \phi, Y) = \int_M \omega_{TM}(X_{H_M}(X(\phi(x))), Y(x))\,\mu = \int_M (dH_M)_{X(\phi(x))}(Y(x))\,\mu(x) \]
for any Y ∈ TφD. By interchanging the integration and exterior differentiation, the latter
expression becomes $(dH_D)_{X \circ \phi}(Y)$. The result follows from the uniqueness of the Hamiltonian
vector field which, in turn, is a consequence of the weak nondegeneracy of the symplectic
form ωTD (cf. [21]). □
Remark 8.9 This theorem has a simple geometric meaning for the “kinetic energy”
Hamiltonian function $K_M(v) := \frac{1}{2}\langle v, v \rangle_M$ on the tangent bundle TM. One of the possible
definitions of geodesics in M is that they are projections to M of trajectories of the
Hamiltonian flow on TM, whose Hamiltonian function is the kinetic energy. In other
words, the Riemannian exponential map expM on the manifold M is the projection of
the Hamiltonian flow $\Psi^{K_M}_t$ on TM. Similarly, the Riemannian exponential expD of the
diffeomorphism group D is the projection of the Hamiltonian flow for the Hamiltonian
$K_D(X \circ \phi) := \frac{1}{2}\int_M \langle X \circ \phi, X \circ \phi \rangle_M\,\mu$ on TD.
Recall that the geodesics on the diffeomorphism group (described by the Burgers
equation, see Proposition 7.1) starting at the identity with the initial velocity V ∈ TidD
are the flows which move each particle x on the manifold M along the geodesic with
the direction V(x). Such a geodesic is well defined on the diffeomorphism group D as
long as the particles do not collide. The corresponding Hamiltonian flow on the tangent
bundle TD of the diffeomorphism group describes how the corresponding velocities of
these particles vary (cf. Example 8.3).
For a more general Hamiltonian HM on the tangent bundle TM, each particle x ∈ M
with an initial velocity V(x) will be moving along the corresponding characteristic, which
is the projection to M of the corresponding trajectory $\Psi^{H_M}_t(V(x))$ in the tangent bundle
TM.
Now we would like to describe more explicitly horizontal geodesics and characteristics
on the diffeomorphism group D. Recall that $\Psi^{H_D}_t$ denotes the Hamiltonian flow of the
averaged Hamiltonian HD on the tangent bundle TD of the diffeomorphism group D.
If this Hamiltonian flow is gradient at the initial moment, it always stays gradient, as
implied by Corollary 7.4. Furthermore, the corresponding potential can be described as
follows.
Corollary 8.10 Let f be a function on the manifold M. Then the Hamiltonian flow for
HD with the initial condition ∇f ∘ φ ∈ TφD has the form ∇ft ∘ φt, where φt ∈ D is a
family of diffeomorphisms and ft is the family of functions on M starting at f0 = f and
satisfying the Hamilton-Jacobi equation
\[ \partial_t f_t + H_M(\nabla f_t(x)) = 0. \tag{8.2} \]
Proof This follows from the method of characteristics, which gives the following way
of finding ft, the solution to the Hamilton-Jacobi equation (8.2). Consider the tangent
vector ∇f(x) for each point x ∈ M. Denote by $\Psi^{H_M}_t : TM \to TM$ the Hamiltonian
flow for the Hamiltonian HM : TM → R and consider its trajectory $t \mapsto \Psi^{H_M}_t(\nabla f(x))$
starting at the tangent vector ∇f(x). Then project this trajectory to M using the
tangent bundle projection πTM : TM → M to obtain a curve in M. It is given by the
formula $t \mapsto \pi_{TM}(\Psi^{H_M}_t(\nabla f(x)))$. As x varies over the manifold M, this defines a flow
$\phi_t := \pi_{TM} \circ \Psi^{H_M}_t \circ \nabla f$ on M. (Note that this procedure defines a flow for small time t,
while for larger times the map φt may cease to be a diffeomorphism, i.e. shock waves can
appear.) The corresponding time-dependent vector field is gradient and defines the family
∇ft, the gradient of the solution to the Hamilton-Jacobi equation above, see Figure 8.1.
□
[Figure 8.1 (image not reproduced); labels: TM, TxM, x, ∇f(x), M, φt(x), Tφt(x)M, $\Psi^{H_M}_t(\nabla f(x)) = \nabla f_t(\phi_t(x))$]
Figure 8.1: Hamiltonian flow of the Hamiltonian HM and its projection: The curve φt(x)
is the projection of the curve $\Psi^{H_M}_t(\nabla f(x))$ to the manifold M.
Remark 8.11 The above corollary manifests that the Hamilton-Jacobi equation (8.2)
can be solved using the method of characteristics due to the built-in symmetry group of
all volume preserving diffeomorphisms.
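The method of characteristics from Corollary 8.10 can be followed by hand in the simplest flat case. The choices M = R, HM(v) = v²/2 and f(x) = a x²/2 with a constant a > 0 are assumptions made for illustration only.

```latex
% Characteristics: each vector \nabla f(x) = a x is transported with constant velocity.
\Psi^{H_M}_t(\nabla f(x)) = \big(x + t\,a x,\; a x\big),
\qquad
\phi_t(x) = (1 + a t)\, x .

% Reading off the gradient at the moved point y = \phi_t(x):
\nabla f_t(y) = \frac{a\, y}{1 + a t},
\qquad
f_t(y) = \frac{a\, y^2}{2(1 + a t)} .

% Check of the Hamilton-Jacobi equation (8.2):
\partial_t f_t(y) + \tfrac12 \big(\nabla f_t(y)\big)^2
  = -\frac{a^2 y^2}{2(1 + a t)^2} + \frac{a^2 y^2}{2(1 + a t)^2} = 0 .
```

If instead a < 0, the map φt stops being invertible at t = −1/a, which is the shock formation mentioned in the proof of the corollary.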
8.4 Hamiltonian Flows on the Wasserstein Space
What is the corresponding flow on the tangent bundle TW of the Wasserstein space,
induced by the Hamiltonian flow on TD for the diffeomorphism group D after the pro-
jection πD : D → W? Fix a Hamiltonian HM on the tangent bundle TM which defines
the averaged Hamiltonian function HD on the tangent bundle TD, see Equation (8.1).
Now we describe explicitly the induced Hamiltonian HW on the tangent bundle TW .
Let (ν, η) be a tangent vector at a density ν on M, regarded as a point of the Wasserstein
space W. The normalization of densities ($\int \nu = 1$ for all ν ∈ W) gives the constraint
for tangent vectors: $\int_M \eta = 0$. Let f : M → R be a function that satisfies
$(-\operatorname{div}_\nu \nabla f)\,\nu = \eta$. (Given (ν, η), such a function is defined uniquely up to an additive
constant.) Then the induced Hamiltonian on the tangent bundle TW of the base W is
given by
\[ H_W(\nu, \eta) = \int_M H_M(\nabla f(x))\,\nu, \tag{8.3} \]
since ∇f is a vector of the horizontal distribution in TD.
Now, the flow $\Psi^{H_W}_t$ of the corresponding Hamiltonian field on TW can be found
explicitly by employing Proposition 8.7. Consider the flow $\phi_t := \pi_{TM} \circ \Psi^{H_M}_t \circ \nabla f$ defined
on M for small t in Corollary 8.10.
Theorem 8.12 The Hamiltonian flow $\Psi^{H_W}_t$ of the Hamiltonian function HW on the
tangent bundle TW of the Wasserstein space W is
\[ \Psi^{H_W}_t(\nu, \eta) = (\nu_t, -L_{\nabla f_t}\nu_t), \]
where L is the Lie derivative, the family of functions ft satisfies the Hamilton-Jacobi
equation (8.2) for the Hamiltonian function HM on the tangent bundle TM, and the
family νt = (φt)∗ν is the push forward of the volume form ν by the map φt defined above.
Proof The function $H_D(X \circ \phi) = \int_M H_M(X(\phi(x)))\,\mu(x)$ on the tangent bundle TD
of the diffeomorphism group induces the Hamiltonian HW on TW. By virtue of the
Hamiltonian reduction, Hamiltonian trajectories of HD contained in the horizontal bundle
$\mathrm{Hor} = \{ \nabla f \circ \phi \mid f \in C^\infty(M) \}$ descend to Hamiltonian trajectories of HW. Then the
Hamiltonian flow $\Psi^{H_D}_t$ of the Hamiltonian HD is given by $\Psi^{H_D}_t(X \circ \phi) = \Psi^{H_M}_t \circ X \circ \phi$,
due to Theorem 8.8. By restricting this to the horizontal bundle Hor we have
\[ \Psi^{H_D}_t(\nabla f \circ \phi) = \Psi^{H_M}_t \circ \nabla f \circ \phi. \tag{8.4} \]
The flow $\Psi^{H_D}_t$ is described in Corollary 8.10 and has the form $\Psi^{H_D}_t(\nabla f \circ \phi) = \nabla f_t \circ \phi_t$,
where ft and φt are defined as required.
On the other hand, recall that the projection πD : D → W is defined by πD(φ) = φ∗µ.
The differential DπD of this map πD is
\[ D\pi_D(X \circ \phi) := (\phi_*\mu, -L_X(\phi_*\mu)). \]
The application of this relation to (8.4) gives the result. □
Remark 8.13 The time-one-map for the above density flow νt in the Wasserstein space
W formally describes optimal transport maps for the Hamiltonian HM . In particular, it
recovers the optimal map recently obtained in [14]. One considers the optimal transport
problem for the functional
\[ \inf_\phi \Big\{ \int_M c(x, \phi(x))\,\mu \;\Big|\; \phi_*\mu = \nu \Big\} \]
with the cost function c defined by
\[ c(x, y) = \inf \int_0^1 L(\gamma, \dot\gamma)\,dt, \]
where the infimum is taken over paths γ joining the points x and y and the Lagrangian
L : TM → R satisfies certain regularity and convexity assumptions, see [14]. The corre-
sponding Hamiltonian HM in Theorem 8.12 is the Legendre transform of the Lagrangian
L. Note that for the “kinetic energy” Lagrangian KM , the above map becomes the opti-
mal map expM(−∇f) mentioned at the beginning of this section, with expM : TM → M
being the Riemannian exponential of the manifold M .
Chapter 9
The Subriemannian Geometry of
Diffeomorphism Groups
In this chapter we develop the subriemannian setting for the diffeomorphism group. In
particular, we derive the geodesic equations for the “nonholonomic Wasserstein metric,”
and describe nonholonomic versions of the Monge-Ampere and heat equations.
Let M be a manifold with a fixed distribution τ on it. Recall that a subrieman-
nian metric is a positive definite inner product 〈 , 〉τ on each plane of the distribution τ
smoothly depending on a point in M . Such a metric can be defined by the bundle map
I : T ∗M → τ , sending a covector αx ∈ T ∗xM to the vector Vx in the plane τx such that
αx(U) = 〈Vx, U〉τ on vectors U ∈ τx. The subriemannian Hamiltonian Hτ : T ∗M → R is
the corresponding fiberwise quadratic form:
\[ H_\tau(\alpha_x) = \frac{1}{2}\langle V_x, V_x \rangle_\tau. \tag{9.1} \]
Let $\Psi^{H_\tau}_t$ be the Hamiltonian flow for time t of the subriemannian Hamiltonian Hτ on
T∗M, while πT∗M : T∗M → M is the cotangent bundle projection. Then the subriemannian
exponential map expτ : T∗M → M is defined as the projection to M of the
time-one-map of the above Hamiltonian flow on T∗M:
\[ \exp_\tau(t\alpha_x) := \pi_{T^*M} \Psi^{H_\tau}_t(\alpha_x). \tag{9.2} \]
This relation defines a normal subriemannian geodesic on M with the initial covector αx.
Note that the initial velocity of the subriemannian geodesic expτ (tαx) is Vx = Iαx ∈ τx.
So, unlike the Riemannian case, there are many subriemannian geodesics having the same
initial velocity Vx on M .
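The non-uniqueness of normal geodesics with a given initial velocity can be observed numerically. The sketch below assumes the Heisenberg structure on R^3 with orthonormal horizontal frame X1 = ∂x − (y/2)∂z, X2 = ∂y + (x/2)∂z, so that Hτ = (h1² + h2²)/2 with hi the pairing of the covector with Xi; these coordinates, the Runge-Kutta integrator and all function names are illustrative choices, not taken from the text.

```python
def hamiltonian_field(state):
    """Hamilton's equations for H = (h1^2 + h2^2)/2 on T*R^3 (assumed Heisenberg case)."""
    x, y, z, px, py, pz = state
    h1 = px - 0.5 * y * pz      # pairing of the covector with X1 = d/dx - (y/2) d/dz
    h2 = py + 0.5 * x * pz      # pairing of the covector with X2 = d/dy + (x/2) d/dz
    return (h1, h2, 0.5 * (x * h2 - y * h1),      # velocity, always horizontal
            -0.5 * pz * h2, 0.5 * pz * h1, 0.0)   # covector evolution

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the Hamiltonian vector field."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = hamiltonian_field(state)
    k2 = hamiltonian_field(shifted(state, k1, dt / 2))
    k3 = hamiltonian_field(shifted(state, k2, dt / 2))
    k4 = hamiltonian_field(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def exp_tau(covector, base=(0.0, 0.0, 0.0), steps=1000):
    """Numerical time-one map of the flow, projected to the base point."""
    state = tuple(base) + tuple(covector)
    for _ in range(steps):
        state = rk4_step(state, 1.0 / steps)
    return state[:3]

alpha, beta = (1.0, 0.0, 0.0), (1.0, 0.0, 2.0)    # covectors differing only in p_z
v_alpha = hamiltonian_field((0.0, 0.0, 0.0) + alpha)[:3]
v_beta = hamiltonian_field((0.0, 0.0, 0.0) + beta)[:3]
end_alpha, end_beta = exp_tau(alpha), exp_tau(beta)
```

Both covectors give the same initial velocity at the origin, yet the endpoints end_alpha and end_beta differ: the first geodesic is a straight horizontal segment, while the second one bends, since the horizontal velocity rotates at the rate p_z.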
Let dτ be a subriemannian (or, Carnot-Carathéodory) distance on the manifold M,
defined as the infimum of the length of all absolutely continuous admissible (i.e. tangent
to τ) curves joining two given points. For a bracket-generating distribution τ any two
points can be joined by such a curve, so this distance is always finite. Consider the
corresponding optimal transport problem by replacing the Riemannian distance d in (7.1)
with the subriemannian distance dτ . Below we study the infinite-dimensional geometry
of this subriemannian version of the optimal transport problem. Although in general
normal subriemannian geodesics might not exhaust all the length minimizing geodesics in
subriemannian manifolds (see [44, 45]), we will see that in the problems of subriemannian
optimal transport one can confine oneself to only such geodesics!
9.1 Subriemannian Submersion
Consider the following general setting: Let (Q, T ) be a subriemannian space, i.e. a
manifold Q with a distribution T and a subriemannian metric 〈 , 〉τ on it. Suppose that
Q → B is a bundle projection to a Riemannian base manifold B.
Definition 9.1 The projection π : (Q, T ) → B is a subriemannian submersion if the
distribution T contains a horizontal subdistribution T hor, orthogonal (with respect to the
subriemannian metric) to the intersections of T with fibers, and the projection π maps
the spaces T hor isometrically to the tangent spaces of the base B, see Figure 9.1.
Let a subriemannian submersion π : (Q, T ) → B be a principal G-bundle Q → B,
where the distribution T and the subriemannian metric are invariant with respect to the
action of the group G. The following theorem is an analog of Corollary 7.4.
Figure 9.1: Subriemannian submersion: the horizontal subdistribution T hor is mapped isometrically to the tangent bundle TB of the base. (Figure labels: Q, B, π, T , T hor, F .)
Theorem 9.2 For each point b in the base B and each point q in the fibre π−1(b) ⊂ Q over
b, every Riemannian geodesic on the base B starting at b admits a unique lift to a
subriemannian geodesic on Q starting at q with velocity vector in T hor.
Example 9.3 Consider the standard Hopf bundle π : S3 → S2, with the two-dimensional
distribution T transversal to the fibers S1. Fix the standard metric on the base S2 and
lift it to a subriemannian metric on S3, which defines a subriemannian submersion. If the
distribution T is orthogonal to the fibers, the manifold (S3, T ) can locally be thought of
as the Heisenberg 3-dimensional group. Then all subriemannian geodesics on S3 with a
given horizontal velocity project to a 1-parameter family of circles on S2 with a common
tangent element. However, only one of these circles, the equator, is a geodesic on the
standard sphere S2. Thus the equator can be uniquely lifted to a subriemannian geodesic
on S3 with the given initial vector.
Note that the uniqueness of this lifting holds even if the distribution T is not orthog-
onal, but only transversal, say at a fixed angle, to the fibers S1, see Figure 9.2.
Figure 9.2: Projections of subriemannian geodesics from (S3, T ) in the Hopf bundle give
circles in S2, only one of which, the equator, is a geodesic on the base S2. (Figure labels: S1, S3, T , π, S2.)
Proof of Theorem 9.2
To prove this theorem we describe the Hamiltonian setting of the subriemannian
submersion.
Let V er be the vertical subbundle in TQ (i.e. tangent planes to the fibers of the
projection Q → B). Define V er⊥ ⊂ T ∗Q to be the corresponding annihilator, i.e. V er⊥q
is the set of all covectors αq ∈ T ∗q Q at the point q ∈ Q which annihilate the vertical space
V erq.
Definition 9.4 The restriction of the subriemannian exponential map expτ : T ∗Q → Q
to the distribution V er⊥ is called the horizontal exponential
expτ : V er⊥ → Q
and the corresponding geodesics are the horizontal subriemannian geodesics.
The symplectic reduction identifies the quotient V er⊥/G with the cotangent bundle
T ∗B of the base. Note that the subdistribution T hor defines a horizontal bundle for the
principal bundle Q → B in the usual sense. The definition of subriemannian submersion
(translated to the cotangent spaces, where we replace T hor by V er⊥) gives that the
subriemannian Hamiltonian HT defined by (9.1) descends to a Riemannian Hamiltonian
HB,T on T ∗B. Moreover, Hamiltonian trajectories of HB,T starting at the cotangent
space T ∗b B are in one-to-one correspondence with the trajectories of HT starting at the
space V er⊥q . The projection of these Hamiltonian trajectories to the manifolds B and Q
via the cotangent bundle projections πT ∗B and πT ∗Q, respectively, gives the result. □
Corollary 9.5 For a subriemannian submersion, geodesics on the base give rise only to
normal geodesics in the total space.
In order to describe the geodesic geometry on the tangent, rather than cotangent,
bundle of the manifold Q, we fix a Riemannian metric on Q whose restriction to the
distribution τ is the given subriemannian metric 〈 , 〉τ . This Riemannian metric allows one
to identify the cotangent bundle T ∗Q with the tangent bundle TQ. Then the exponential
map expτ can be viewed as a map TQ → Q. It is convenient to think of T hor as
the horizontal bundle and identify it with the annihilator V er⊥. This way horizontal
subriemannian geodesics are geodesics with initial (co)vector in the horizontal bundle
T hor. This identification is particularly convenient for the infinite-dimensional setting,
where we work with the tangent bundle of the diffeomorphism group.
9.2 A Subriemannian Analog of the Otto Calculus.
Fix a Riemannian metric 〈 , 〉M on the manifold M . Let P τ : TM → τ be the orthogonal
projection of vectors on M onto the distribution τ with respect to this metric. Let (ν, η1)
and (ν, η2) be two tangent vectors in the tangent space at the point ν of the smooth
Wasserstein space. Recall that for a fixed volume form µ, we define the subriemannian
Laplacian as ∆τf := divµ(P τ∇f).
Define a nonholonomic Wasserstein metric as the (weak) Riemannian metric on the
(smooth) Wasserstein space W given by
\[ \langle(\nu,\eta_1),(\nu,\eta_2)\rangle_{W,T} := \int_M \langle P^\tau\nabla f_1(x), P^\tau\nabla f_2(x)\rangle_M\,\nu, \qquad (9.3) \]
where functions f1 and f2 are solutions of the subriemannian Poisson equation
−(∆τfi)ν = ηi
for the measure ν.
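As an illustration, the subriemannian Laplacian entering the Poisson equation can be written out in a model case. The sketch below assumes the Heisenberg group on R3 with τ spanned by X1 = ∂x − (y/2)∂z and X2 = ∂y + (x/2)∂z, the Riemannian metric making X1, X2, ∂z orthonormal, and µ the Lebesgue volume; these choices are assumptions made only for this example.

```latex
% Assumed setting: Heisenberg group on R^3, \mu = dx\,dy\,dz,
% Riemannian metric with X_1, X_2, \partial_z orthonormal.
\begin{align*}
\nabla f &= (X_1 f)\,X_1 + (X_2 f)\,X_2 + (\partial_z f)\,\partial_z,
\qquad
P^\tau \nabla f = (X_1 f)\,X_1 + (X_2 f)\,X_2, \\
\operatorname{div}_\mu X_1 &= \partial_x(1) + \partial_z\!\left(-\tfrac{y}{2}\right) = 0,
\qquad
\operatorname{div}_\mu X_2 = \partial_y(1) + \partial_z\!\left(\tfrac{x}{2}\right) = 0, \\
\Delta^\tau f &= \operatorname{div}_\mu\!\left(P^\tau\nabla f\right)
 = X_1(X_1 f) + X_2(X_2 f).
\end{align*}
```

In this model case the subriemannian Laplacian is the sum of squares of the two horizontal vector fields.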
Theorem 9.6 The geodesics on the Wasserstein space W equipped with the nonholo-
nomic Wasserstein metric (9.3) have the form (expτ (tP τ∇f))∗ν, where expτ : T hor → M
is the horizontal exponential map and ν is any point of W.
To prove this theorem we first note that the Riemannian metric 〈 , 〉D defined on the
diffeomorphism group restricts to a subriemannian metric 〈 , 〉D,T on the right invariant
bundle T .
Proposition 9.7 The map π : (D, T ) → W is a subriemannian submersion of the
subriemannian metric 〈 , 〉D,T on the diffeomorphism group with distribution T to the
nonholonomic Wasserstein metric 〈, 〉W,T .
Proof This statement can be derived from the Hamiltonian reduction, similarly to the
Riemannian case.
Here we prove it by an explicit computation. Recall that the map π : D → W is
defined by π(φ) = φ∗µ. Let X ∘ φ be a tangent vector at the point φ in the diffeomorphism
group D. Consider the flow φt of the vector field X, and note that π(φt ∘ φ) = φt∗φ∗µ.
To compute the derivative Dπ we differentiate this equation with respect to time t at
t = 0:
\[ D\pi(X\circ\phi) = L_{-X}(\phi_*\mu) = -(\mathrm{div}_{\phi_*\mu}X)\,\phi_*\mu, \]
by the definition of Lie derivative. A vector field X from the horizontal bundle T hor has
the form (P τ∇f) ∘ φ, and for it the equation becomes
\[ D\pi((P^\tau\nabla f)\circ\phi) = -(\Delta^\tau f)\,\phi_*\mu, \]
where the Laplacian ∆τ is taken with respect to the volume form φ∗µ.
Therefore, for horizontal tangent vectors (P τ∇f1) ∘ φ and (P τ∇f2) ∘ φ at the point φ
their subriemannian inner product is
\[ \langle(P^\tau\nabla f_1)\circ\phi, (P^\tau\nabla f_2)\circ\phi\rangle_D = \int_M \langle P^\tau\nabla f_1\circ\phi, P^\tau\nabla f_2\circ\phi\rangle_M\,\mu. \]
After the change of variables this becomes
\[ \int_M \langle P^\tau\nabla f_1, P^\tau\nabla f_2\rangle_M\,\phi_*\mu = \langle D\pi((P^\tau\nabla f_1)\circ\phi), D\pi((P^\tau\nabla f_2)\circ\phi)\rangle_{W,T}, \]
which completes the proof. □
Proof of Theorem 9.6
To describe geodesics in the nonholonomic Wasserstein space we define the Hamiltonian
HT : TD → R by
\[ H^T(X\circ\phi) := \int_M \langle(P^\tau X)\circ\phi, (P^\tau X)\circ\phi\rangle\,\mu. \qquad (9.4) \]
The Hamiltonian flow with Hamiltonian HT has the form expτ ((tP τX) ∘ φ) according
to Theorem 8.8. By taking its restriction to the bundle T hor and projecting to the base
we obtain that the geodesics on the smooth Wasserstein space are
\[ (\exp^\tau((tP^\tau\nabla f)\circ\phi))_*\nu, \]
where ν = φ∗µ and P τ∇f is defined by the Hodge decomposition for the field X. This
completes the proof of Theorem 9.6. □
Remark 9.8 For a horizontal subriemannian geodesic ϕt(x) := expτ (tP τ∇f(x)) with
a smooth function f , the diffeomorphism ϕt satisfies $\frac{d}{dt}\varphi_t = (P^\tau\nabla f_t)\circ\varphi_t$, and ft is the
solution of the Hamilton-Jacobi equation
\[ \dot f_t + H^\tau(\nabla f_t) = 0 \qquad (9.5) \]
with the initial condition f0 = f , see Corollary 8.10. This equation determines horizontal
subriemannian geodesics on the diffeomorphism group D. In the Riemannian case, one
can see that the vector fields $V_t = \frac{d}{dt}\varphi_t = \nabla f_t\circ\varphi_t$ satisfy the Burgers equation by taking
the gradient of both sides in (9.5), cf. Proposition 7.1. Hence Equation (9.5) can
be viewed as a subriemannian analog of the potential Burgers equation in D. However,
a subriemannian analog of the Burgers equation for nonhorizontal (i.e. nonpotential)
normal geodesics on the diffeomorphism group is not so explicit.
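The step from the Hamilton-Jacobi equation to the Burgers equation in the Riemannian case can be sketched as follows. Here the Hamiltonian is assumed to be the kinetic energy, and the symbol u_t for the (Eulerian) velocity field ∇f_t is introduced only for this illustration.

```latex
% Riemannian case: H(\nabla f) = \tfrac12 |\nabla f|^2, with u_t := \nabla f_t.
% The Hessian of f_t is symmetric, so
% \nabla\bigl(\tfrac12 |\nabla f_t|^2\bigr) = \nabla_{\nabla f_t} \nabla f_t .
\begin{align*}
0 &= \nabla\Bigl(\partial_t f_t + \tfrac12 |\nabla f_t|^2\Bigr)
   = \partial_t \nabla f_t + \nabla_{\nabla f_t} \nabla f_t, \\
\text{that is,}\quad 0 &= \partial_t u_t + \nabla_{u_t} u_t .
\end{align*}
```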
Remark 9.9 If the function f is smooth, the time-one-map ϕ(x) := expτ (P τ∇f(x))
along the geodesics described in Theorem 9.6 satisfies the following nonholonomic analog
of the Monge-Ampere equation: h(ϕ(x)) det(Dϕ(x)) = g(x), where g and h are functions
on the manifold M defining two densities θ = g vol and ν = h vol.
Furthermore, for the case of the Heisenberg group this formal solution ϕ(x) coincides
with the optimal map obtained in [11]. The (minus) potential −f of the corresponding
optimal map satisfies the c-concavity condition for $c = d_\tau^2/2$, where $d_\tau$ is the subriemannian
distance, cf. Remark 7.5.
9.3 The Nonholonomic Heat Equation
Consider the heat equation ∂tu = ∆u for a function u on the manifold M , where the
operator ∆ is given by ∆f = divµ∇f . Upon multiplying both sides of the heat
equation by the fixed volume form µ, one can regard it as an evolution equation on the
smooth Wasserstein space W . Note that the right-hand-side of the heat equation gives a
tangent vector (∆u)µ at the point uµ of the Wasserstein space. The Boltzmann relative
entropy functional Ent : W → R is defined by the integral
\[ \mathrm{Ent}(\nu) := \int_M \log(\nu/\mu)\,\nu. \qquad (9.6) \]
The gradient flow of Ent on the Wasserstein space with respect to the metric d gives the
heat equation, see [52].
Recall that one can define the subriemannian Laplacian: ∆τf := divµ(P τ∇f) for
a fixed volume form µ on M . The natural generalization of the heat equation to the
nonholonomic setting is as follows.
Definition 9.10 The nonholonomic (or, subriemannian) heat equation is the equation
∂tu = ∆τu on a time-dependent function u on M .
Below we show that this equation in the nonholonomic setting also admits a gradient
interpretation on the Wasserstein space.
Theorem 9.11 The nonholonomic heat equation ∂tu = ∆τu describes the gradient flow
on the Wasserstein space with respect to the relative entropy functional (9.6) and the
nonholonomic Wasserstein metric (9.3).
Namely, for the volume form νt := gt∗µ and the gradient ∇W,T with respect to the
metric 〈 , 〉W,T on the Wasserstein space one has
\[ \frac{\partial}{\partial t}\nu_t = -\nabla^{W,T}\mathrm{Ent}(\nu_t) = \Delta^\tau(\nu_t/\mu)\,\mu. \]
Proof Denote by (ν, η) a tangent vector to the Wasserstein space W at a point ν ∈ W ,
where η is a volume form of total integral zero. Let ∆τν be the subriemannian Laplacian
with respect to the volume form ν.
Let h and hEnt be real-valued functions on the manifold M such that −(∆τνh)ν = η
and −(∆τνhEnt)ν = ∇W,T Ent(ν) for the entropy functional Ent. Then, by definition of
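the metric 〈 , 〉W,T given by (9.3), we have
\[ \langle(\nu,\nabla^{W,T}\mathrm{Ent}(\nu)),(\nu,\eta)\rangle_{W,T} = \int_M \langle P^\tau\nabla h_{\mathrm{Ent}}(x), P^\tau\nabla h(x)\rangle_M\,\nu. \qquad (9.7) \]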
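On the other hand, by definitions of Ent and the gradient ∇W,T on the Wasserstein
space, one has:
\[ \langle(\nu,\nabla^{W,T}\mathrm{Ent}(\nu)),(\nu,\eta)\rangle_{W,T} := \frac{d}{dt}\Big|_{t=0}\mathrm{Ent}(\nu+t\eta) = \frac{d}{dt}\Big|_{t=0}\int_M \Big[\log\Big(\frac{\nu+t\eta}{\mu}\Big)\Big](\nu+t\eta). \]
After differentiation and simplification the latter expression becomes $\int_M \log(\nu/\mu)\,\eta$, where
we used that $\int_M \eta = 0$. This can be rewritten as
\[ \int_M \log(\nu/\mu)\,\eta = -\int_M \log(\nu/\mu)\,L_{P^\tau\nabla h}\nu = \int_M \big(L_{P^\tau\nabla h}\log(\nu/\mu)\big)\,\nu, \]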
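by using the Leibniz property of the Lie derivative L on the Wasserstein space and
the fact that −(∆τνh)ν = η. Note that the Lie derivative of a function along a vector field is
the inner product of that field with the gradient of the function, and hence
\[ \int_M \big(L_{P^\tau\nabla h}\log(\nu/\mu)\big)\,\nu = \int_M \langle\nabla\log(\nu/\mu), P^\tau\nabla h\rangle_M\,\nu = \int_M \langle P^\tau\nabla\log(\nu/\mu), P^\tau\nabla h\rangle_M\,\nu. \]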
Comparing the latter form with (9.7), we get P τ∇hEnt = P τ∇ log(ν/µ), or, after taking
the divergence of both parts and using the definition of function hEnt,
∇W,T Ent(ν) = −∆τν(log(ν/µ)) ν .
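Finally, let us show that the right-hand-side of the above equation coincides with
−∆τµ(ν/µ) µ. Indeed, the chain rule gives
\[ L_{P^\tau\nabla\log(\nu/\mu)}\nu = L_{(\mu/\nu)P^\tau\nabla(\nu/\mu)}\nu = (\mu/\nu)\,L_{P^\tau\nabla(\nu/\mu)}\nu + d(\mu/\nu)\wedge i_{P^\tau\nabla(\nu/\mu)}\nu. \]
The last term is equal to $(i_{P^\tau\nabla(\nu/\mu)}d(\mu/\nu))\,\nu = (L_{P^\tau\nabla(\nu/\mu)}(\mu/\nu))\,\nu$, which implies that
\[ L_{P^\tau\nabla\log(\nu/\mu)}\nu = L_{P^\tau\nabla(\nu/\mu)}\mu \]
by the Leibniz property of the Lie derivative. Thus
\[ \Delta^\tau_\nu(\log(\nu/\mu))\,\nu = \mathrm{div}_\nu(P^\tau\nabla\log(\nu/\mu))\,\nu = L_{P^\tau\nabla\log(\nu/\mu)}\nu = L_{P^\tau\nabla(\nu/\mu)}\mu = \Delta^\tau_\mu(\nu/\mu)\,\mu. \]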
The above shows that the nonholonomic heat equation is the gradient flow on the
Wasserstein space for the same potential as the classical heat equation, but with respect
to the nonholonomic Wasserstein metric. □
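The first variation of the entropy used in the proof, namely that the derivative of Ent(ν + tη) at t = 0 equals the integral of log(ν/µ) η when η has total integral zero, can be checked on a finite state space. The sketch below is a discrete analogue only: measures are replaced by lists of positive weights, and the specific numbers are arbitrary illustrative data.

```python
import math

def relative_entropy(nu, mu):
    """Discrete analogue of Ent(nu): sum of nu_i * log(nu_i / mu_i)."""
    return sum(n * math.log(n / m) for n, m in zip(nu, mu))

def first_variation(nu, mu, eta):
    """Discrete analogue of the integral of log(nu / mu) against eta."""
    return sum(math.log(n / m) * e for n, m, e in zip(nu, mu, eta))

# Illustrative data: mu is uniform, nu is a probability vector,
# and eta is a perturbation whose entries sum to zero.
mu = [0.25, 0.25, 0.25, 0.25]
nu = [0.1, 0.2, 0.3, 0.4]
eta = [0.05, -0.02, -0.04, 0.01]

def perturbed(t):
    """The perturbed weights nu + t * eta."""
    return [n + t * e for n, e in zip(nu, eta)]

# Central finite difference of t -> Ent(nu + t * eta) at t = 0.
step = 1e-6
finite_difference = (
    relative_entropy(perturbed(step), mu) - relative_entropy(perturbed(-step), mu)
) / (2 * step)
closed_form = first_variation(nu, mu, eta)
print(finite_difference, closed_form)
```

The two printed numbers agree up to the finite-difference error; the extra term coming from differentiating the logarithm is the sum of the entries of eta, which vanishes.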
Part III
Generalized Ricci Curvature Bounds
for Three Dimensional Contact
Subriemannian Manifolds
Chapter 10
Revisiting Subriemannian Geometry
In this chapter we recall basic notions in subriemannian geometry introduced in the
earlier part of the thesis and introduce several new concepts needed in this part of the
thesis. Recall that a subriemannian manifold is a triple (M, ∆, g), where M is a smooth
manifold, ∆ is a distribution (a vector subbundle ∆ of the tangent bundle TM of the
manifold M), and g is a fibrewise inner product defined on the distribution ∆. The
inner product g is also called a subriemannian metric. An absolutely continuous curve
γ : [0, 1] → M on the manifold M is called horizontal if it is almost everywhere tangent
to the distribution ∆. Using the inner product g, we can define the length l(γ) of a
horizontal curve γ by
\[ l(\gamma) = \int_0^1 g(\dot\gamma(t),\dot\gamma(t))^{1/2}\,dt. \]
The subriemannian or Carnot-Caratheodory distance dCC between two points x and
y on the manifold M is defined by
dCC(x, y) = inf l(γ), (10.1)
where the infimum is taken over all horizontal curves which start from x and end at y.
The above distance function may not be well-defined since there may exist two points
which are not connected by any horizontal curve. For this we assume that the distribution
∆ is bracket generating. It means that the vector fields contained in the distribution ∆
together with their iterated Lie brackets span all tangent spaces of the manifold M .
Under the bracket generating assumption, the subriemannian distance is well-defined
thanks to the Chow-Rashevskii Theorem (Theorem 3.18).
As in Riemannian geometry, horizontal curves which realize the infimum in (10.1) are
called length minimizing geodesics (or simply geodesics). From now on all manifolds are
assumed to be complete with respect to a given subriemannian distance. It means that
given any two points on the manifold, there is at least one geodesic joining them. Next
we will discuss one type of geodesics called normal geodesics. For this let us recall several
notions in the symplectic geometry of the cotangent bundle T ∗M . Let π : T ∗M → M be
the projection map. The tautological one form θ on T ∗M is defined by
θα(V ) = α(dπ(V )),
where α is in the cotangent bundle T ∗M and V is a tangent vector on the manifold T ∗M
at α.
The symplectic two form ω on T ∗M is defined as the exterior derivative of the tautological
one form: ω = dθ. It is nondegenerate in the sense that ω(V, ·) = 0 if and only
if V = 0. Given a function H : T ∗M → R on the cotangent bundle, the Hamiltonian
vector field $\vec H$ is defined by $i_{\vec H}\omega = -dH$. By the nondegeneracy of the symplectic form
ω, the Hamiltonian vector field $\vec H$ is uniquely defined.
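To fix the sign conventions, the definitions above can be unwound in canonical coordinates. The coordinates (x_i, p_i) on T ∗M are introduced here only for illustration.

```latex
% Canonical coordinates (x_i, p_i) on T^*M; the tautological form is
% \theta = \sum_i p_i\,dx_i.
\begin{align*}
\omega &= d\theta = \sum_i dp_i \wedge dx_i, \\
i_X\omega &= \sum_i \bigl(b_i\,dx_i - a_i\,dp_i\bigr)
  \quad\text{for}\quad X = \sum_i \bigl(a_i\,\partial_{x_i} + b_i\,\partial_{p_i}\bigr), \\
i_{\vec H}\omega = -dH \quad &\Longleftrightarrow \quad
\vec H = \sum_i \left( \frac{\partial H}{\partial p_i}\,\partial_{x_i}
                     - \frac{\partial H}{\partial x_i}\,\partial_{p_i} \right).
\end{align*}
```

With these conventions the flow of the Hamiltonian vector field is given by the usual Hamilton equations.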
Given a distribution ∆ and a subriemannian metric g on it, we can associate with
it a Hamiltonian H on the cotangent bundle T ∗M . To do this let α : TxM → R be a
covector in the cotangent space T ∗xM at the point x. The subriemannian metric g defines
a bundle isomorphism I : ∆∗ → ∆ between the distribution ∆ and its dual ∆∗. It is
defined by
g(I(β), ·) = β(·),
where β is an element in the dual bundle ∆∗ of the distribution ∆.
By restricting the domain of the covector α to the subspace ∆x of the tangent space
TxM , it defines an element, still called α, in the dual space ∆∗. Therefore, I(α) is a
tangent vector contained in the space ∆x and the Hamiltonian H corresponding to the
subriemannian metric g is defined by
\[ H(\alpha) := \tfrac{1}{2}\,\alpha(I(\alpha)) = \tfrac{1}{2}\,g(I(\alpha), I(\alpha)). \]
Note that this construction defines the usual kinetic energy Hamiltonian in the Rieman-
nian case.
Let $\vec H$ be the Hamiltonian vector field corresponding to the Hamiltonian H defined
above, and denote the corresponding flow by $e^{t\vec H}$. If $t \mapsto e^{t\vec H}(\alpha)$ is a trajectory of the
above Hamiltonian flow, then its projection $t \mapsto \gamma(t) = \pi(e^{t\vec H}(\alpha))$ is a locally minimizing
geodesic. That is, every sufficiently short segment of the curve γ is a minimizing geodesic
between its endpoints. The minimizing geodesics obtained this way are called normal
geodesics. In the special case where the distribution ∆ is the whole tangent bundle
TM , the distance function (10.1) is the usual Riemannian distance and all geodesics
are normal. However, this is not the case for subriemannian manifolds in general. To
introduce another class of geodesics, consider the space Ω of horizontal curves with square
integrable derivatives. The endpoint map end : Ω → M is defined by taking an element γ
in the space of curves Ω and giving the endpoint γ(1) of the curve: end(γ) = γ(1). Geodesics
which are regular points of the endpoint map are automatically normal and those which
are critical points are called abnormal. However, there are geodesics which are both
normal and abnormal (see [45] and references therein for more detail about abnormal
geodesics).
As an example consider a manifold M of dimension m equipped with a free and
proper Lie group action. If G is the group, then the quotient N := M/G is again a
manifold of dimension n and the quotient map πM : M → N defines a principal bundle
with a total space M , a base space N and a structure group G. The kernel of the
map dπM : TM → TN defines a distribution ver of rank m − n, called the vertical
bundle. An Ehresmann connection is a distribution hor, called the horizontal bundle, of rank
n which is fibrewise transversal to the vertical bundle ver. The bundle hor is a principal
bundle connection (or a connection) if it is preserved under the Lie group action. A
subriemannian metric, defined on a connection hor, which is invariant under the Lie
group action is called a metric of bundle type. This subriemannian metric descends to a
Riemannian metric on the base space N . In this thesis two examples will be considered.
They are the Heisenberg group and the Hopf fibration.
The Heisenberg group is a principal bundle with the three dimensional Euclidean
space R3 as its total space. If x, y, z are the coordinates of the total space, then the Lie
group action is an R-action and it is given by the flow of the vector field ∂z. The base
space of this principal bundle is the two dimensional Euclidean space R2 = R3/R. The
connection hor is defined by the span of the vector fields $\partial_x - \frac{1}{2}y\,\partial_z$ and $\partial_y + \frac{1}{2}x\,\partial_z$. The
subriemannian metric is defined by declaring that the above vector fields are orthonormal.
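The bracket generating condition for this connection can be checked directly. In the sketch below the names X_1 and X_2 for the two vector fields are introduced only for illustration.

```latex
% X_1 = \partial_x - \tfrac{y}{2}\partial_z, \qquad X_2 = \partial_y + \tfrac{x}{2}\partial_z.
\begin{align*}
[X_1, X_2]
 &= X_1\!\left(\tfrac{x}{2}\right)\partial_z - X_2\!\left(-\tfrac{y}{2}\right)\partial_z
  = \tfrac12\,\partial_z + \tfrac12\,\partial_z
  = \partial_z .
\end{align*}
```

Hence X_1, X_2 and their bracket span the tangent space at every point, so the distribution is bracket generating after a single bracket.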
The Hopf fibration is a principal S1-bundle over the two sphere S2 with the three
sphere S3 as the total space. The explicit formulas for the definition of the Hopf fibration
can be found in [19]. Here we recall them for the convenience of the reader. Let w, x, y, z
be the coordinates on the four dimensional Euclidean space R4. The flow of the vector
field Z = −x∂w + w∂x + z∂y − y∂z is a circle action on the unit sphere S3. The bundle
map πM is given by $\pi_M(w,x,y,z) = (w^2+x^2-y^2-z^2,\ 2wz+2xy,\ 2xz-2wy)$, which maps
the unit 3-sphere S3 to the unit 2-sphere S2. The vector fields −y∂w − z∂x + w∂y + x∂z
and −z∂w + y∂x − x∂y + w∂z define a distribution of rank 2 and a subriemannian metric
on S3. The subriemannian metric descends to a Riemannian metric on the 2-sphere S2
which is twice the usual metric induced from R3.
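The formulas for the Hopf fibration can be sanity-checked numerically. The sketch below samples random points of S3 and verifies that the bundle map lands on the unit 2-sphere, that it is constant along the flow of Z, and that Z and the two vector fields spanning the distribution are tangent to S3 and mutually orthonormal in the Euclidean metric of R4. The function names and the sampling scheme are illustrative assumptions.

```python
import math
import random

def hopf_map(p):
    """Bundle map pi_M(w, x, y, z) from the text."""
    w, x, y, z = p
    return (w * w + x * x - y * y - z * z, 2 * w * z + 2 * x * y, 2 * x * z - 2 * w * y)

def fiber_field(p):
    """Components of Z = -x d/dw + w d/dx + z d/dy - y d/dz."""
    w, x, y, z = p
    return (-x, w, z, -y)

def horizontal_frame(p):
    """Components of the two vector fields spanning the distribution."""
    w, x, y, z = p
    return ((-y, -z, w, x), (-z, y, -x, w))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_point_on_s3(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    r = math.sqrt(dot(v, v))
    return tuple(c / r for c in v)

def rotate_along_fiber(p, angle):
    """Flow of Z for time `angle`: a rotation in the (w, x) and (y, z) planes."""
    w, x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * w - s * x, s * w + c * x, c * y + s * z, -s * y + c * z)

rng = random.Random(0)
max_error = 0.0
for _ in range(100):
    p = random_point_on_s3(rng)
    q = hopf_map(p)
    x1, x2 = horizontal_frame(p)
    zf = fiber_field(p)
    checks = [
        dot(q, q) - 1.0,                      # image lies on the unit 2-sphere
        dot(p, x1), dot(p, x2), dot(p, zf),   # all three fields are tangent to S^3
        dot(zf, x1), dot(zf, x2), dot(x1, x2),
        dot(x1, x1) - 1.0, dot(x2, x2) - 1.0,
    ]
    moved = hopf_map(rotate_along_fiber(p, 0.7))
    checks += [a - b for a, b in zip(moved, q)]  # pi_M is constant on fibres
    max_error = max(max_error, max(abs(c) for c in checks))
print(max_error)
```

All quantities vanish up to floating-point rounding, which is consistent with the distribution being transversal (in fact orthogonal) to the fibres of the bundle map.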
Chapter 11
Generalized Curvatures
11.1 The General Case
In this chapter we recall the definition of the curvature type invariants studied in [3,
10, 37]. Then, we specialize it to the case of a three dimensional contact subriemannian
manifold. First let us consider the following general situation. Let H : T ∗M → R be
a Hamiltonian and let $\vec H$ be the corresponding Hamiltonian vector field. If we denote
the flow of the vector field $\vec H$ by $e^{t\vec H}$ and a point on the manifold T ∗M by α, then the
differential $de^{-t\vec H} : T_{e^{t\vec H}(\alpha)}T^*M \to T_\alpha T^*M$ of the map $e^{-t\vec H}$ is a symplectic transformation
between the symplectic vector spaces $T_{e^{t\vec H}(\alpha)}T^*M$ and $T_\alpha T^*M$. Recall that the
vertical space Vα at α of the bundle π : T ∗M → M is defined as the kernel of the map
dπα : TαT ∗M → Tπ(α)M . Since each vertical space Vα is a Lagrangian subspace, the one
parameter family of subspaces $t \mapsto J_\alpha(t) := de^{-t\vec H}(V_{e^{t\vec H}(\alpha)})$ defines a curve of Lagrangian
subspaces contained in the symplectic vector space TαT ∗M . This curve is called the
Jacobi curve at α. In other words, if the space of all Lagrangian subspaces of a symplectic
vector space Σ, called the Lagrangian Grassmannian, is denoted by LG(Σ), then the
Jacobi curve above is a smooth curve in the Lagrangian Grassmannian LG(TαT ∗M).
The Lagrangian Grassmannian is a homogeneous space of the symplectic group, and
curvature type invariants of the Hamiltonian H are simply differential invariants of the
Jacobi curve under the action of the symplectic group (see [37]).
The construction of differential invariants for a general curve J(·) in the Lagrangian
Grassmannian LG(Σ) was done in the recent paper [37], though partial results were
obtained earlier (see [3, 10]). A principal step is the construction of the canonical splitting:
\[ \Sigma = J(t)\oplus\hat J(t), \]
where $t \mapsto \hat J(t)$ is another curve in the Lagrangian Grassmannian LG(Σ) such that $\hat J(t)$
is intrinsically defined by the germ of the curve J(·) at time t.
In the case of the Jacobi curve Jα(·), we have the splitting of the symplectic vector
space TαT ∗M : $T_\alpha T^*M = J_\alpha(t)\oplus\hat J_\alpha(t)$. In particular the subspace Jα(0) is the vertical
space Vα of the bundle π : T ∗M → M and the subspace $\hat J_\alpha(0)$ is a complementary
subspace to Jα(0) = Vα at time t = 0. Hence, $\{\hat J_\alpha(0)\}_{\alpha\in T^*M}$ defines an Ehresmann
connection on the bundle π : T ∗M → M . It is shown (see [3]) that this connection is
torsion free since the subspaces $\hat J_\alpha(0)$ are Lagrangian subspaces of the symplectic vector
space TαT ∗M . However, it is not a linear connection in general. In the Riemannian case
this is, under the identification of the tangent and cotangent spaces by the Riemannian
metric, simply the Levi-Civita connection (see [3]).
Using the above splitting we can also define a generalization of the Ricci curvature in
Riemannian geometry. Indeed, let $\pi_{J_\alpha(t)}$ and $\pi_{\hat J_\alpha(t)}$ be the projections, corresponding to
the splitting $T_\alpha T^*M = J_\alpha(t)\oplus\hat J_\alpha(t)$, onto the subspaces $J_\alpha(t)$ and $\hat J_\alpha(t)$, respectively. Let
w(·) be a path contained in the Jacobi curve Jα(·) (i.e. w(t) ∈ Jα(t)). Then the projection
$\pi_{\hat J_\alpha(t)}\dot w(t)$ of its derivative $\dot w(t)$ onto the subspace $\hat J_\alpha(t)$ depends only on the vector w(t)
but not on the curve w(·). Therefore, it defines a linear operator $\Phi^t_{J_\alpha\hat J_\alpha} : J_\alpha(t) \to \hat J_\alpha(t)$ by
\[ \Phi^t_{J_\alpha\hat J_\alpha}(v) = \pi_{\hat J_\alpha(t)}\frac{d}{dt}w(t), \qquad w(t) = v. \]
Similarly, we can also define another operator $\Phi^t_{\hat J_\alpha J_\alpha} : \hat J_\alpha(t) \to J_\alpha(t)$. Finally, the
generalized Ricci curvature is defined as the negative of the trace of the linear operator
\[ \Phi^0_{\hat J_\alpha J_\alpha}\Phi^0_{J_\alpha\hat J_\alpha} : J_\alpha(0) \to J_\alpha(0). \]
Recall that a basis e1, ..., en, f1, ..., fn in a symplectic vector space with a symplectic
form ω is a Darboux basis if it satisfies ω(ei, ej) = ω(fi, fj) = 0 and ω(fi, ej) = δij. The
canonical splitting $\Sigma = J(t)\oplus\hat J(t)$ mentioned above is accompanied by a moving Darboux
basis e1(t), ..., en(t), f1(t), ..., fn(t) of the symplectic vector space Σ satisfying
\[ J(t) = \mathrm{span}\{e_1(t),\ldots,e_n(t)\}, \qquad \hat J(t) = \mathrm{span}\{f_1(t),\ldots,f_n(t)\} \]
and the structural equations (with summation over the repeated index j)
\[ \dot e_i(t) = c^1_{ij}(t)e_j(t) + c^2_{ij}(t)f_j(t), \qquad \dot f_i(t) = c^3_{ij}(t)e_j(t) + c^4_{ij}(t)f_j(t). \]
This is an analog of the Frenet frame of a curve in the Euclidean space. The generalized
Ricci curvature is given by the trace of the matrix $-c^3(0)c^2(0)$, where $c^2(0)$ and $c^3(0)$ are
the matrices with entries $c^2_{ij}(0)$ and $c^3_{ij}(0)$, respectively.
The most interesting cases for us are contact subriemannian structures on three di-
mensional manifolds. To define such a structure, let ∆ be a distribution of rank two (i.e.
∆ is a vector subbundle of the tangent bundle and the dimension of each fibre is two)
on a three dimensional manifold M . We assume that ∆ is 2-generating, meaning that
the vector fields contained in the distribution ∆ together with their Lie brackets span all
tangent spaces of M . In other words,
\[ TM = \mathrm{span}\{X_1, [X_2, X_3] \mid X_i \in \Delta\}. \]
The structural equations, in this case, have the following form:
Theorem 11.1 For each fixed α in the manifold T ∗M , there is a moving Darboux frame
\[ e_1(t), e_2(t), e_3(t), f_1(t), f_2(t), f_3(t) \]
in the symplectic vector space TαT ∗M and functions $R^{11}_t$, $R^{22}_t$ of time t such that e1(t), e2(t), e3(t)
form a basis for the Jacobi curve Jα(t) and the frame satisfies the following structural equations:
\[ \begin{aligned}
\dot e_1(t) &= f_1(t), \\
\dot e_2(t) &= e_1(t), \\
\dot e_3(t) &= f_3(t), \\
\dot f_1(t) &= -R^{11}_t e_1(t) - f_2(t), \\
\dot f_2(t) &= -R^{22}_t e_2(t), \\
\dot f_3(t) &= 0.
\end{aligned} \]
Moreover, $e_3(t) = \frac{1}{\sqrt{2H}}(\vec E - t\vec H)$ and $f_3(t) = -\frac{1}{\sqrt{2H}}\vec H$.
Proof According to the main result in [37], there exist a family of Darboux frames
e1(t), e2(t), e3(t), f1(t), f2(t), f3(t) and functions $R^{ij}_t$ of time t which satisfy
\[ \begin{aligned}
\dot e_1(t) &= f_1(t), \\
\dot e_2(t) &= e_1(t), \\
\dot e_3(t) &= f_3(t), \\
\dot f_1(t) &= -R^{11}_t e_1(t) - R^{31}_t e_3(t) - f_2(t), \\
\dot f_2(t) &= -R^{22}_t e_2(t) - R^{32}_t e_3(t), \\
\dot f_3(t) &= -R^{31}_t e_1(t) - R^{32}_t e_2(t) - R^{33}_t e_3(t).
\end{aligned} \qquad (11.1) \]
Let δs be the dilation in the fibre direction defined by δs(α) = sα and let $\vec E$ be the
Euler field defined by $\vec E(\alpha) = \frac{d}{ds}\big|_{s=1}\delta_s\alpha$. By the definition of the Jacobi curve Jα(t), the
time dependent vector field $(e^{t\vec H})^*\vec E$ is contained in Jα(t). Next we need the following
lemma.
Lemma 11.2 $(e^{t\vec H})^*\vec E = \vec E - t\vec H$.
Proof of Lemma 11.2
Using the definitions of the symplectic form ω and the Hamiltonian vector field $\vec H$,
we have
\[ \omega(d\delta_s(\vec H(\alpha)), X(s\alpha)) = s\,\omega(\vec H(\alpha), d\delta_{1/s}(X(s\alpha))) = -s\,dH(d\delta_{1/s}(X(s\alpha))). \]
Since the Hamiltonian is homogeneous of degree two in the fibre direction, the above
equation becomes
\[ \omega(d\delta_s(\vec H(\alpha)), X(s\alpha)) = -\frac{1}{s}\,dH(X(s\alpha)) = \frac{1}{s}\,\omega(\vec H(s\alpha), X(s\alpha)). \]
It follows that $\delta_s^*\vec H = s\vec H$, where $\delta_s^*\vec H$ is the pullback of the vector field $\vec H$ by the map
δs. By comparing the flows of the above vector fields, we have
\[ e^{t\vec H}\circ\delta_s = \delta_s\circ e^{ts\vec H}. \]
If we differentiate the above equation with respect to s and set s = 1, it follows that
$(e^{t\vec H})^*\vec E = \vec E - t\vec H$ as claimed. □
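The last differentiation step of the proof can be spelled out as follows. This is a sketch under the conventions above; the pushforward notation with a lower star is introduced here only for illustration.

```latex
% Differentiate both sides of  e^{t\vec H}\circ\delta_s = \delta_s\circ e^{ts\vec H}
% at the point \alpha with respect to s, at s = 1.
\begin{align*}
\frac{d}{ds}\Big|_{s=1} e^{t\vec H}(\delta_s\alpha)
  &= de^{t\vec H}\bigl(\vec E(\alpha)\bigr), \\
\frac{d}{ds}\Big|_{s=1} \delta_s\bigl(e^{ts\vec H}(\alpha)\bigr)
  &= \vec E\bigl(e^{t\vec H}(\alpha)\bigr) + t\,\vec H\bigl(e^{t\vec H}(\alpha)\bigr), \\
\text{hence}\quad (e^{t\vec H})_*\vec E &= \vec E + t\vec H
\quad\text{and}\quad
(e^{t\vec H})^*\vec E = (e^{-t\vec H})_*\vec E = \vec E - t\vec H .
\end{align*}
```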
It follows from Lemma 11.2 that $\vec E - t\vec H = \sum_{i=1}^{3} a_i(t)e_i(t)$ for some functions ai of time
t. If we differentiate with respect to time t twice and use (11.1), we get
\[ \begin{aligned}
& 2\dot a_1(t)f_1(t) + 2\dot a_2(t)e_1(t) + 2\dot a_3(t)f_3(t) - a_1(t)\bigl(R^{11}_t e_1(t) + R^{31}_t e_3(t) + f_2(t)\bigr) \\
&\quad + a_2(t)f_1(t) - a_3(t)\bigl(R^{31}_t e_1(t) + R^{32}_t e_2(t) + R^{33}_t e_3(t)\bigr) \\
&\quad + \ddot a_1(t)e_1(t) + \ddot a_2(t)e_2(t) + \ddot a_3(t)e_3(t) = 0.
\end{aligned} \]
If we equate the coefficients of the fi’s, we get $a_1 \equiv a_2 \equiv \dot a_3 \equiv 0$. Therefore, $\vec E - t\vec H = a_3e_3(t)$
and $-\vec H = a_3f_3(t)$ for some constant a3 satisfying $(a_3)^2 = \omega(a_3f_3(t), a_3e_3(t)) = dH(\vec E) = 2H$.
It follows that $R^{31} = R^{32} = R^{33} = 0$. □
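The comparison of coefficients at the end of the proof can be organized as follows; this is a sketch that only restates the twice-differentiated identity component by component in the Darboux frame.

```latex
% Coefficients in  \frac{d^2}{dt^2}\sum_i a_i(t)\,e_i(t) = 0,  computed with (11.1).
\begin{align*}
f_2:&\quad -a_1 = 0, \\
f_1:&\quad 2\dot a_1 + a_2 = 0 \;\Longrightarrow\; a_2 = 0, \\
f_3:&\quad 2\dot a_3 = 0 \;\Longrightarrow\; a_3 \text{ is constant}, \\
e_1,\ e_2,\ e_3 \ (\text{with } a_1 = a_2 = 0,\ \ddot a_3 = 0):&\quad
 a_3 R^{31}_t = a_3 R^{32}_t = a_3 R^{33}_t = 0 .
\end{align*}
```

Since (a_3)^2 = 2H is nonzero away from the zero section, the last line gives the vanishing of the three remaining coefficients.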
11.2 The Three Dimensional Contact Case
In this section we come back to the case of a three dimensional contact subriemannian
manifold. We will write down explicit formulas (Theorem 11.3) for the generalized cur-
vature operator R and the moving Darboux frame e1(t),e2(t),e3(t), f1(t),f2(t),f3(t) in
Theorem 11.1.
Let ∆ be the contact distribution and let H be the Hamiltonian corresponding to a
given subriemannian metric g on ∆. Let σ be an annihilator 1-form of the distribution
∆. That means a vector v is in the distribution ∆ if and only if σ(v) = 0 (i.e. ker σ = ∆).
Since ∆ is a contact distribution, σ can be chosen in such a way that the restriction of
its exterior derivative dσ to the distribution ∆ is the volume form with respect to the
subriemannian metric g. Let v1, v2 be a local orthonormal frame in the distribution ∆
with respect to the subriemannian metric g and let v0 = e be the Reeb field defined by
the conditions ieσ = 1 and iedσ = 0. This defines a convenient frame v0, v1, v2 in the
tangent bundle TM and we let α0 = σ, α1, α2 be the corresponding dual co-frame in
the cotangent bundle T ∗M (i.e. αi(vj) = δij).
The frame v0, v1, v2 and the co-frame α0, α1, α2 defined above induce a frame in
the tangent bundle TT ∗M of the cotangent bundle T ∗M . Indeed, let $\vec\alpha_i$ be the vector
fields on the cotangent bundle T ∗M defined by $i_{\vec\alpha_i}\omega = -\alpha_i$. Note that the symbol αi in
the definition of $\vec\alpha_i$ represents the pull back π∗αi of the 1-form αi on the manifold M by
the projection π : T ∗M → M . This convention of identifying forms on the manifold M
with their pull backs on the cotangent bundle T ∗M will be used for the rest of this paper
without mentioning. Let hi : T ∗M → R be the Hamiltonian lift of the vector field vi, defined
by hi(α) = α(vi), and let $\vec h_i$ be the corresponding Hamiltonian vector field. Let ξ1 and ξ2 be
the 1-forms defined by ξ1 = h1α2 − h2α1 and ξ2 = h1α1 + h2α2, respectively, and let $\vec\xi_i$ be
the vector fields defined by $i_{\vec\xi_i}\omega = -\xi_i$.
Then the vector fields $\vec h_0, \vec h_1, \vec h_2, \vec\sigma, \vec\xi_1, \vec\xi_2$ define a local frame for the
tangent bundle TT ∗M of the cotangent bundle T ∗M . Under the above notation the
subriemannian Hamiltonian is given by $H = \frac{1}{2}((h_1)^2 + (h_2)^2)$ and the Hamiltonian vector
field is $\vec H = h_1\vec h_1 + h_2\vec h_2$.
We also need the bracket relations of the vector fields v0, v1, v2. Let akij be the functions
on the manifold M defined by
\[ [v_i, v_j] = a^0_{ij}v_0 + a^1_{ij}v_1 + a^2_{ij}v_2. \qquad (11.2) \]
The dual version of the above relation is
\[ d\alpha_k = -\sum_{0\le i<j\le 2} a^k_{ij}\,\alpha_i\wedge\alpha_j. \qquad (11.3) \]
By (11.3) and the definition of the Reeb field e = v0, it follows that dσ = dα0 = α1∧α2.
Therefore, $a^0_{01} = a^0_{02} = 0$ and $a^0_{12} = -1$. If we also take the exterior derivative of the
equation (11.3) with k = 0, we get $a^1_{01} + a^2_{02} = 0$. Finally we come to the main theorem of this
section:
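Theorem 11.3 The Darboux frame e1(t), e2(t), e3(t), f1(t), f2(t), f3(t) and the invariants
$R^{11}_t$ and $R^{22}_t$ satisfy $e_i(t) = (e^{t\vec H})^*e_i(0)$, $f_i(t) = (e^{t\vec H})^*f_i(0)$, $R^{11}_t = (e^{t\vec H})^*R^{11}_0$,
$R^{22}_t = (e^{t\vec H})^*R^{22}_0$, and
\[ \begin{aligned}
e_2(0) &= \tfrac{1}{\sqrt{2H}}\,\vec\sigma, \\
e_1(0) &= \tfrac{1}{\sqrt{2H}}\,\vec\xi_1, \\
f_1(0) &= \tfrac{1}{\sqrt{2H}}\,[h_1\vec h_2 - h_2\vec h_1 + \chi_0\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2], \\
f_2(0) &= \tfrac{1}{\sqrt{2H}}\,[2H\vec h_0 - h_0\vec H - \chi_1\vec\alpha_0 + (\vec\xi_1a)\vec\xi_1 - a\vec\xi_2], \\
R^{11}_0 &= h_0^2 + 2H\kappa - \tfrac{3}{2}\vec\xi_1a, \\
R^{22}_0 &= \vec\xi_1a - 3\vec H\vec\xi_1\vec Ha + 3\vec H^2\vec\xi_1a + \vec\xi_1\vec H^2a,
\end{aligned} \]
where
\[ \begin{aligned}
a &= dh_0(\vec H), \\
\chi_0 &= h_2h_{01} - h_1h_{02} + \vec\xi_1a, \\
\chi_1 &= h_0a + 2\vec H\vec\xi_1a - \vec\xi_1\vec Ha, \\
\kappa &= v_1a^2_{12} - v_2a^1_{12} - (a^1_{12})^2 - (a^2_{12})^2 - \tfrac{1}{2}(a^2_{01} - a^1_{02}).
\end{aligned} \]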
The rest of this section is devoted to the proof of this theorem. First we need a
few lemmas. Let hij : T ∗M → R be the Hamiltonian lift of the vector field [vi, vj]:
hij(α) = α([vi, vj]). Then the commutator relations of the frame $\{\vec h_i, \vec\alpha_i \mid i = 0, 1, 2\}$ are
given by the following:
Lemma 11.4
\[ [\vec h_i,\vec h_j] = \vec h_{ij}, \qquad [\vec h_i,\vec\alpha_j] = -\sum_k a^j_{ik}\vec\alpha_k, \qquad [\vec\alpha_i,\vec\alpha_j] = 0. \]
Proof Since the Lie derivative of the symplectic form ω along the Hamiltonian vector
field $\vec h_i$ vanishes,
\[ i_{[\vec h_i,\vec h_j]}\omega = L_{\vec h_i}i_{\vec h_j}\omega = -d\,i_{\vec h_i}i_{\vec h_j}\omega. \qquad (11.4) \]
The function $\omega(\vec h_i,\vec h_j)$ is equal to hij. Indeed, since ω = dθ, we have $\theta(\vec h_i) = h_i$. By
using Cartan’s formula, it follows that
\[ dh_j(\vec h_i) = \omega(\vec h_i,\vec h_j) = dh_j(\vec h_i) - dh_i(\vec h_j) - \theta([\vec h_i,\vec h_j]). \]
Since $d\pi(\vec h_i) = v_i$, it implies that $\theta([\vec h_i,\vec h_j]) = h_{ij}$. Therefore, we have
\[ \omega(\vec h_i,\vec h_j) = -dh_i(\vec h_j) = h_{ij}. \qquad (11.5) \]
If we combine this with (11.4), the first assertion of the lemma follows.
A calculation similar to the above one shows that
i[~hi,~αj ]ω = ~hii~αj
ω.
By Cartan’s formula, the above equation becomes
i[~hi,~αj ]ω = −i~hi
π∗dαj = −π∗(ividαj).
The second assertion follows from this and (11.3).
If we apply Cartan’s formula again,
i[~αi,~αj ]ω = ~αii~αjω − i~αj
~αiω = −i~αid(π∗αj) + i~αj
d(π∗αi)
Since dπ(~αi) = 0, it follows that i[~αi,~αj ]ω = 0. Therefore, the third holds. ¤
Let $\beta = h_1dh_2 - h_2dh_1$, then we also have the following relations:
Lemma 11.5
\[ dh_i(\vec h_j) = -h_{ij}, \qquad \alpha_i(\vec h_j) = -dh_i(\vec\alpha_j) = \delta_{ij}, \qquad \alpha_i(\vec\alpha_j) = 0, \]
\[ \beta(\vec\xi_2) = dH(\vec\xi_1) = 0, \qquad \beta(\vec\xi_1) = -2H, \qquad dH(\vec\xi_2) = -2H. \]
Proof The first assertion follows from (11.5) and the last two assertions of the first line follow from $d\pi(\vec h_i) = v_i$ and $d\pi(\vec\alpha_i) = 0$. A computation using $\alpha_i(\vec h_j) = \delta_{ij}$ proves the rest of the assertions. □
Proof of Theorem 11.3 If we define $E_2$ by $E_2(t) = (e^{t\vec H})^*\vec\sigma$, then $E_2(t)$ is contained in the Jacobi curve $J_\alpha(t)$. Since $e_1(t), e_2(t), e_3(t)$ span the Jacobi curve at time $t$,
\[ E_2(t) = c_1(t)e_1(t) + c_2(t)e_2(t) + c_3(t)e_3(t) \]
for some functions $c_i$ of time $t$.
Since $H$ is the subriemannian Hamiltonian, the vector $d\pi(\vec H)$ is contained in the distribution $\Delta$. Therefore, $\omega(\vec\sigma, \vec H) = -\pi^*\sigma(\vec H) = 0$. Since $f_3 = -(2H)^{-1/2}\vec H$, we have
\[ 0 = \omega(\vec\sigma, \vec H) = \omega(E_2, \vec H) = -(2H)^{-1/2}c_3(t). \]
This shows that $c_3 \equiv 0$ and so $E_2(t) = c_1(t)e_1(t) + c_2(t)e_2(t)$. If we differentiate this with respect to time $t$, then we have
\[ (e^{t\vec H})^*[\vec H, \vec\sigma] = \dot E_2(t) = \dot c_1(t)e_1(t) - c_1(t)f_1(t) + \dot c_2(t)e_2(t) - c_2(t)e_1(t). \]
By Cartan's formula, it follows that
\[ \omega([\vec H, \vec\sigma], \vec\sigma) = \pi^*\sigma([\vec H, \vec\sigma]) = -\pi^*d\sigma(\vec H, \vec\sigma) = 0. \]
By combining this with the above equations for $E_2$ and $\dot E_2$, we have $c_1 \equiv 0$. If we differentiate the equation $E_2(t) = c_2(t)e_2(t)$ with respect to time $t$ again, we get
\[
\begin{aligned}
E_2(t) &= c_2(t)e_2(t),\\
(e^{t\vec H})^*(\mathrm{ad}_{\vec H}(\vec\sigma)) &= \dot c_2(t)e_2(t) + c_2(t)e_1(t),\\
(e^{t\vec H})^*(\mathrm{ad}^2_{\vec H}(\vec\sigma)) &= \ddot c_2(t)e_2(t) + 2\dot c_2(t)e_1(t) + c_2(t)f_1(t).
\end{aligned}
\]
This gives $c := 1/c_2(0) = (\omega_\alpha(\mathrm{ad}^2_{\vec H}(\vec\sigma), \mathrm{ad}_{\vec H}(\vec\sigma)))^{-1/2}$, and $c_2(t) = (e^{t\vec H})^*c_2(0)$. Therefore, $e_2(t) = (e^{t\vec H})^*(c\vec\sigma)$.
To find out what $c$ is more explicitly, we first compute $[\vec H, \vec\alpha_0]$. The Lie bracket is a derivation in each of its entries, so
\[ [\vec H, \vec\alpha_0] = [h_1\vec h_1 + h_2\vec h_2, \vec\alpha_0] = -dh_1(\vec\alpha_0)\vec h_1 - dh_2(\vec\alpha_0)\vec h_2 + h_1[\vec h_1, \vec\sigma] + h_2[\vec h_2, \vec\sigma]. \]
It follows from this, Lemma 11.4, and Lemma 11.5 that
\[ [\vec H, \vec\alpha_0] = h_1\vec\alpha_2 - h_2\vec\alpha_1 = \vec\xi_1. \]
Next, we want to compute $[\vec H, \vec\xi_1]$. For this, let
\[ [\vec H, \vec\xi_1] = k_0\vec\alpha_0 + k_1\vec\xi_1 + k_2\vec\xi_2 + \sum_{i=0}^{3} c_i\vec h_i \tag{11.6} \]
for some functions $c_i$ and $k_i$.
To compute $c_0$ for instance, we apply $\alpha_0$ on both sides of (11.6). Using Lemma 11.5 and Cartan's formula, we have $c_0 = 0$. Similar computations give $c_1 = -h_2$ and $c_2 = h_1$. This shows that
\[ [\vec H, \vec\xi_1] = k_0\vec\alpha_0 + k_1\vec\xi_1 + k_2\vec\xi_2 + h_1\vec h_2 - h_2\vec h_1. \tag{11.7} \]
By applying $dh_0$ on both sides of (11.7) and using Lemma 11.5 again, we have $k_0 = h_2h_{01} - h_1h_{02} + \vec\xi_1a$. Similar calculations using $\beta$ and $dH$ give
\[ [\vec H, \vec\xi_1] = h_1\vec h_2 - h_2\vec h_1 + \chi_0\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2, \tag{11.8} \]
where $\chi_0 = h_2h_{01} - h_1h_{02} + \vec\xi_1a$ and $a = dh_0(\vec H)$.
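It follows that
\[ c^{-2} = \omega(\mathrm{ad}^2_{\vec H}(\vec\sigma), \mathrm{ad}_{\vec H}(\vec\sigma)) = 2H \]
and $e_2(0) = \frac{1}{\sqrt{2H}}\vec\alpha_0$. It also follows from Theorem 11.3 that
\[
\begin{aligned}
e_1(0) &= \tfrac{1}{\sqrt{2H}}\,\vec\xi_1,\\
f_1(0) &= \tfrac{1}{\sqrt{2H}}\,[\vec H, \vec\xi_1],\\
\dot f_1(0) &= \tfrac{1}{\sqrt{2H}}\,[\vec H, [\vec H, \vec\xi_1]],\\
\ddot f_1(0) &= \tfrac{1}{\sqrt{2H}}\,[\vec H, [\vec H, [\vec H, \vec\xi_1]]].
\end{aligned}
\tag{11.9}
\]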
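A computation similar to that of (11.8) gives
\[ [\vec H, [\vec H, \vec\xi_1]] = -2H\vec h_0 + h_0\vec H + \chi_1\vec\alpha_0 + (\chi_2 + \chi_0 - \vec\xi_1a)\vec\xi_1 + a\vec\xi_2 \tag{11.10} \]
where $\chi_1 = h_0a + 2\vec H\vec\xi_1a - \vec\xi_1\vec Ha$ and $\chi_2 = h_0h_{12} + 2\vec H\vec\xi_1h_{12} - \vec\xi_1\vec Hh_{12}$.
It follows from Theorem 11.3, (11.8), and (11.10) that
\[ R^{11}_0 = -\chi_0 - \chi_2. \tag{11.11} \]
Since $\dot f_1(0) = -R^{11}_0e_1(0) - f_2(0)$, it follows from (11.9), (11.10), and (11.11) that
\[ f_2(0) = \tfrac{1}{\sqrt{2H}}\,[2H\vec h_0 - h_0\vec H - \chi_1\vec\alpha_0 + (\vec\xi_1a)\vec\xi_1 - a\vec\xi_2]. \]
A long computation using the bracket relations (11.2) gives
\[ \chi_2 = -(h_0)^2 + 2H[(a^1_{12})^2 + (a^2_{12})^2 - v_1a^2_{12} + v_2a^1_{12}] + \vec\xi_1a \]
and
\[ \chi_0 - \tfrac{1}{2}\vec\xi_1a = h_2h_{01} - h_1h_{02} + \tfrac{1}{2}\vec\xi_1a = H(a^2_{01} - a^1_{02}). \]
It follows as claimed that
\[ R^{11}_0 = h_0^2 + 2H\kappa - \tfrac{3}{2}\vec\xi_1a. \]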
To prove the formula for $R^{22}$, we differentiate the equation $\dot f_1(t) = -R^{11}_te_1(t) - f_2(t)$ and combine it with the equation $\dot f_2(t) = -R^{22}_te_2(t)$. We have
\[ R^{22}_0e_2(0) = \ddot f_1(0) + (\vec HR^{11}_0)e_1(0) + R^{11}_0f_1(0). \]
Therefore, by applying $dh_0$ on both sides and using $dh_0(e_1(0)) = 0$, we get
\[ R^{22}_0 = -\sqrt{2H}\,[dh_0(\ddot f_1(0)) + R^{11}_0\,dh_0(f_1(0))]. \]
By using Cartan's formula and (11.9), it follows that
\[ \sqrt{2H}\,dh_0(f_1(0)) = dh_0([\vec H, \vec\xi_1]) = -\vec\xi_1a, \]
\[ \sqrt{2H}\,dh_0(\dot f_1(0)) = dh_0([\vec H, [\vec H, \vec\xi_1]]) = \vec\xi_1\vec Ha - 2\vec H\vec\xi_1a, \]
and therefore,
\[ \sqrt{2H}\,dh_0(\ddot f_1(0)) = 3\vec H\vec\xi_1\vec Ha - 3\vec H^2\vec\xi_1a - \vec\xi_1\vec H^2a. \]
The formula for $R^{22}_0$ follows from this.
□
Chapter 12
Measure Contraction Properties
The measure contraction property was introduced in [59] as one of the generalizations of curvature-dimension bounds to all metric measure spaces. In the setting of a subriemannian manifold with a 2-generating distribution, a simpler definition can be given. To do this, we first recall the recent results on the theory of optimal transportation in [5] and [24]. Let $\mu$ and $\nu$ be two Borel probability measures on the subriemannian manifold $M$ with a distribution $\Delta$ and a subriemannian metric $g$. If we let $H$ be the Hamiltonian corresponding to the metric $g$ and $d_{CC}$ be the corresponding subriemannian distance, then the optimal transportation problem is the following minimization problem:
Find a Borel map $\varphi : M \to M$ which achieves the following infimum
\[ \inf \int_M d^2_{CC}(x, \varphi(x))\,d\mu(x) \tag{12.1} \]
where the infimum is taken over all Borel maps $\varphi$ which push $\mu$ forward to $\nu$ (i.e. $\mu(\varphi^{-1}(U)) = \nu(U)$ for all Borel sets $U$).
The minimizers of the above problem are called optimal maps. The following theorem is one of the main results in [5, 24], which generalizes the earlier work in [16, 42]. This is also Theorem 3.20 in this thesis.
Theorem 12.1 Assume that the distribution $\Delta$ is 2-generating and the measure $\mu$ is absolutely continuous with respect to the Lebesgue measure. Then the optimal transportation problem has a solution $\varphi$, and any optimal map is equal to this one $\mu$-almost everywhere. Moreover, $\varphi$ is given by $\varphi(x) = \pi(e^{1\cdot\vec H}(-df_x))$ for some Lipschitz function $f$.
Many important results in the theory of optimal transportation rely on the study of the following 1-parameter family of maps called displacement interpolation introduced by R. McCann (see [64] for the history and importance of displacement interpolation):
\[ \varphi_t(x) := \pi(e^{t\vec H}(-df_x)). \]
If $\varphi_1$ is the optimal map between the measures $\mu$ and $\nu$, then $\varphi_t$ is optimal between $\mu$ and $\varphi_{t*}\mu$. All the above results also hold when the distance squared cost $d^2$ is replaced by costs defined by Lagrange's problem (see [14, 5]). In those cases the subriemannian Hamiltonian $H$ in Theorem 12.1 is replaced by more general Hamiltonians.
If we set, in the displacement interpolation, $f(x) = d^2_{CC}(x, x_0)$ for some given point $x_0$ on the manifold $M$, then $\varphi_1$ is the optimal map which pushes any measure $\mu$ to the delta mass concentrated at the point $x_0$. In this case the curves defined by $t \mapsto \varphi_t(x) := \pi(e^{t\vec H}(-df_x))$ are the unique normal geodesics joining the points $x$ and $x_0$ for Lebesgue almost all $x$.
Let $\eta$ be a Borel measure on the manifold $M$ and let $\varphi_t$ be the map $\varphi_t(x) = \pi(e^{t\vec H}(-df_x))$, where $f(x) = d^2_{CC}(x, x_0)$. The metric measure space $(M, d_{CC}, \eta)$ satisfies the measure contraction property $MCP(k, N)$ if
\[ (1-t)\left[\frac{\sin\left(\sqrt{\frac{k}{N-1}}\,d_{CC}(x_0, x)\right)}{\sin\left(\sqrt{\frac{k}{N-1}}\,d_{CC}(x_0, x)/(1-t)\right)}\right]^{N-1}\varphi_{t*}\eta(U) \le \eta(U) \]
for each $\eta$-measurable set $U$ and each point $x_0$ in the manifold $M$.
When $k = 0$, the measure contraction property becomes
\[ (1-t)^N\varphi_{t*}\eta(U) \le \eta(U). \]
Next we specialize to the case where $M$ is a three dimensional manifold with a contact distribution $\Delta$ and a subriemannian metric $g$. Let $d_{CC}$ be the corresponding subriemannian distance function and let $R^{11}$, $R^{22}$ be the invariants defined as in Theorem 11.1. Recall that the kernel of the map $d\pi : TT^*M \to TM$ defines the vertical bundle $V$ on the manifold $T^*M$. Let $m$ be the three-form on the manifold $T^*M$ such that it is zero on the vertical spaces $V_\alpha$ and $m_\alpha(f_1(0), f_2(0), f_3(0)) = 1$, where $f_1(0), f_2(0), f_3(0)$ are defined as in Theorem 11.1. The following lemma shows that the Hamiltonian $H$ is unimodular (see Appendix 2 for the definition of unimodular).
Lemma 12.2 Let $\eta$ be a smooth volume form on the manifold $M$ such that $\eta(e, v_1, v_2) = 1$, then $\pi^*\eta = \sqrt{2H}\,m$.
Proof Clearly, $\pi^*\eta$ is zero on the space $V$. Therefore, it is enough to show that $\pi^*\eta(f_1(0), f_2(0), f_3(0)) = \sqrt{2H}$, and this follows from Theorem 11.3 and the definition of $\eta$. □
We also use the same notation for the measure corresponding to the volume form $\eta$. Finally we come to the main result:
Theorem 12.3 If $R^{11}_0(\alpha) \ge 2rH(\alpha)$ and $R^{22}_0 \ge 0$, then the metric measure space $(M, d_{CC}, \eta)$ satisfies
\[ (1-t)^5\varphi_{t*}\eta(U) \le \left[\frac{(1-t)(2 - 2\cos T_0 - T_0\sin T_0)}{2 - 2\cos T_t - T_t\sin T_t}\right]\varphi_{t*}\eta(U) \le \eta(U) \tag{12.2} \]
if $r > 0$ and
\[ \left[\frac{(1-t)(2 - 2\cosh T_0 + T_0\sinh T_0)}{2 - 2\cosh T_t + T_t\sinh T_t}\right]\varphi_{t*}\eta(U) \le \eta(U) \tag{12.3} \]
if $r < 0$, for each $\eta$-measurable set $U$ and each $x_0$ in the manifold $M$, where $T_t(x) = \frac{\sqrt{r}\,d_{CC}(x_0, x)}{1-t}$.
In particular if $r \ge 0$, then $(M, d_{CC}, \eta)$ satisfies the measure contraction property $MCP(0, 5)$.
Definition 12.4 We say that a metric measure space satisfies the generalized measure contraction property $MCP(r; 2, 3)$ if either (12.2) or (12.3) holds.
Therefore, Theorem 12.3 says that if a three dimensional contact subriemannian manifold satisfies $R^{11}_0(\alpha) \ge 2rH(\alpha)$ and $R^{22}_0 \ge 0$, then it satisfies the generalized measure contraction property $MCP(r; 2, 3)$. Note also that the condition $MCP(0; 2, 3)$ is the same as $MCP(0, 5)$.
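The first inequality in (12.2), which compares the bracketed factor with $(1-t)^5$, can be sampled numerically. The sketch below is only an illustration under stated assumptions: `contact_coefficient` is a hypothetical helper name, `T0` stands for $T_0(x) = \sqrt{r}\,d_{CC}(x_0, x)$ with $r > 0$, and the samples are restricted to $T_t = T_0/(1-t) < 2\pi$, where the denominator stays positive.

```python
import math

def D(T):
    """The function 2 - 2 cos T - T sin T appearing in (12.2)."""
    return 2.0 - 2.0 * math.cos(T) - T * math.sin(T)

def contact_coefficient(t, T0):
    """Bracketed factor in (12.2) with T_t = T0 / (1 - t); requires T_t < 2*pi."""
    Tt = T0 / (1.0 - t)
    return (1.0 - t) * D(T0) / D(Tt)

samples = [(t, T0) for t in (0.1, 0.3, 0.5) for T0 in (0.5, 1.0, 2.0)]
for t, T0 in samples:
    # The bracketed factor should dominate (1 - t)^5 on these samples.
    print(t, T0, (1.0 - t) ** 5, contact_coefficient(t, T0))
```

Since $D(T) = T^4/12 + O(T^6)$, the factor tends to $(1-t)^5$ as $T_0 \to 0$, which matches the remark that $MCP(0; 2, 3)$ coincides with $MCP(0, 5)$.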
As a corollary of Theorem 12.3, we have the following doubling property (see [59]).
Corollary 12.5 (Doubling) Let $B_x(r)$ be the ball of radius $r$ centred at $x$ in the space $(M, d_{CC}, \eta)$. If $R^{11}_0 \ge 0$ and $R^{22}_0 \ge 0$, then it satisfies the following doubling property:
\[ \eta(B_x(2r)) \le 2^5\,\eta(B_x(r)). \]
Recall that a Borel function $h : M \to \mathbb{R}$ is an upper gradient of a function $f : M \to \mathbb{R}$ if
\[ |f(x(0)) - f(x(1))| \le l(x(\cdot))\int_0^1 h(x(s))\,ds \]
for each curve $x(\cdot)$ of finite length $l(x(\cdot))$.
The following local Poincaré inequality also holds as a corollary of Theorem 12.3 (see the proof of [40, Theorem 3.1] and [40, Theorem 2.5]).
Corollary 12.6 (Local Poincaré Inequality) If the manifold $M$ is compact and the invariants $R^{11}_0$ and $R^{22}_0$ are non-negative, then $(M, d_{CC}, \eta)$ satisfies the following local Poincaré inequality
\[ \frac{1}{\eta(B_x(r))}\int_{B_x(r)} |f(x) - \langle f\rangle_{B_x(r)}|\,d\eta(x) \le \frac{Cr}{\eta(B_x(2r))}\int_{B_x(2r)} h(x)\,d\eta(x), \]
for some constant $C$, where $\langle f\rangle_{B_x(r)} = \frac{1}{\eta(B_x(r))}\int_{B_x(r)} f(x)\,d\eta(x)$.
The rest of this chapter is devoted to the proof of Theorem 12.3.
Proof of Theorem 12.3 From the main result in [17], the function $f(x) = d(x, x_0)$ is locally semiconcave on $M \setminus \{x_0\}$. So, by [24, Theorem 3.5] and [24, Section 3.4], the measures $\varphi_{t*}\eta$ are absolutely continuous with respect to the Lebesgue class. Therefore, $\varphi_{t*}\eta = \rho_t\eta$ for some function $\rho_t$. Hence, it is enough to show that the following holds $\eta$-almost everywhere:
\[ \left[\frac{(1-t)(2 - 2\cos T_0 - T_0\sin T_0)}{2 - 2\cos T_t - T_t\sin T_t}\right]\rho_t \le 1 \tag{12.4} \]
if $r > 0$,
\[ \left[\frac{(1-t)(2 - 2\cosh T_0 + T_0\sinh T_0)}{2 - 2\cosh T_t + T_t\sinh T_t}\right]\rho_t \le 1 \tag{12.5} \]
if $r < 0$, and
\[ (1-t)^5\rho_t \le 1 \tag{12.6} \]
if $r = 0$.
The function $f(x) = d(x, x_0)$ is locally semiconcave on $M \setminus \{x_0\}$, so it is twice differentiable almost everywhere by Alexandrov's theorem (see for instance [64]). If we denote the differential of the map $x \mapsto -df_x$ by $\mathcal{F}$, then $d\varphi_t = d\pi\,de^{t\vec H}\mathcal{F}$. Let $e_i(t)$ and $f_i(t)$ be the Darboux frame at $\alpha$ defined as in Theorem 11.1 and let $\varsigma_i = d\pi(f_i(0))$. Then the vectors $\mathcal{F}(\varsigma_1), \mathcal{F}(\varsigma_2), \mathcal{F}(\varsigma_3)$ span a linear subspace $W$ of $T_\alpha T^*M$, and $\mathcal{F}(\varsigma_i)$ can be written as
\[ \mathcal{F}(\varsigma_i) = \sum_{j=1}^{3}(a_{ij}(t)e_j(t) + b_{ij}(t)f_j(t)) \quad\text{or}\quad \Psi = A_tE_t + B_tF_t, \]
where $A_t$ is the matrix with entries $a_{ij}(t)$, $B_t$ is the matrix with entries $b_{ij}(t)$, and $\Psi$, $E_t$, and $F_t$ are matrices with rows $\mathcal{F}(\varsigma_i)$, $e_i(t)$, and $f_i(t)$, respectively.
It follows from absolute continuity of the measure $\varphi_{t*}\mu$ and the result in [5, 24] that the map $\varphi_t(x)$ is injective for $\eta$-almost all $x$. We fix a point $z$ for which the map $\varphi_t$ is injective and the path $s \mapsto \varphi_s(z)$ is minimizing. Such a point exists Lebesgue almost everywhere, and hence $\eta$-almost everywhere. It follows from [1, Theorem 1.2] that there is no conjugate point along the curve $s \mapsto \varphi_s(z)$. Therefore, the map $\varphi_s$ is nonsingular for each $s$ in $[0, t]$ and so $\rho_s(\varphi_s(z)) \ne 0$ for each $s$ in $[0, t]$. Let $S_t$ be the matrix defined as in Theorem 15.2. Recall that $S_t$ is defined as follows: the linear space $W$ is transversal to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\}$. Therefore, the linear subspace $W$ defined above is the graph of a linear map from the space $\mathrm{span}\{f_1(t), ..., f_n(t)\}$ to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\}$. Let $S_t$ be the corresponding matrix (i.e. the linear map is given by $f_i(t) \mapsto \sum_{j=1}^{3} S^{ij}_te_j(t)$, where $S^{ij}_t$ are the entries of the matrix $S_t$). Finally recall that $S_t$ satisfies $S_t = B_t^{-1}A_t$.
Using the structural equation (11.1) and Theorem 15.2, we get the following.
Lemma 12.7 The matrices $S_t$ satisfy the following matrix Riccati equation:
\[ \dot S_t - R_t + S_tC_1 + C_1^TS_t - S_tC_2S_t = 0, \]
where
\[ R_t = \begin{pmatrix} R^{11}_t & 0 & 0\\ 0 & R^{22}_t & 0\\ 0 & 0 & 0 \end{pmatrix}, \qquad C_1 = \begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad C_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}. \]
If $t$ is sufficiently close to 1, then $S_t^{-1}$ exists and it is the solution to the following initial value problem
\[ \frac{d}{dt}(S_t^{-1}) + S_t^{-1}R_tS_t^{-1} - C_1S_t^{-1} - S_t^{-1}C_1^T + C_2 = 0 \quad\text{and}\quad S_1^{-1} = 0. \]
Proof The matrix Riccati equation follows from (11.1) and Theorem 15.2. To show that $S_1^{-1} = 0$, it is enough to show that $B_1 = 0$. If we let $\gamma$ be a path such that $\dot\gamma(0) = \varsigma_i$, then $\varphi_1(\gamma(s)) = x_0$. By differentiating this equation with respect to $s$, we get
\[ d\pi\,de^{1\cdot\vec H}\mathcal{F}(\varsigma_i) = d\varphi_1(\varsigma_i) = 0. \]
It follows that $\mathcal{F}(\varsigma_i)$ is contained in $\mathrm{span}\{e_1(1), e_2(1), e_3(1)\}$ and so $B_1 = 0$. □
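Let us consider the following simpler matrix Riccati equation:
\[ \frac{d}{dt}(\bar S_t^{-1}) + \bar S_t^{-1}\bar R\,\bar S_t^{-1} - C_1\bar S_t^{-1} - \bar S_t^{-1}C_1^T + C_2 = 0 \tag{12.7} \]
together with the condition $\bar S_1^{-1} = 0$, where
\[ \bar R = \begin{pmatrix} 2rH & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}. \]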
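Lemma 12.8 Let $\tau_t = (1-t)\sqrt{2|r|H}$. If $r > 0$, then the solution to (12.7) is given by
\[ \bar S_t = \begin{pmatrix} \frac{\tau_0(\sin\tau_t - \tau_t\cos\tau_t)}{D} & \frac{\tau_0^2(1 - \cos\tau_t)}{D} & 0\\ \frac{\tau_0^2(1 - \cos\tau_t)}{D} & \frac{\tau_0^3\sin\tau_t}{D} & 0\\ 0 & 0 & \frac{1}{1-t} \end{pmatrix} \]
where $D = 2 - 2\cos\tau_t - \tau_t\sin\tau_t$.
If $r < 0$, then it is
\[ \bar S_t = \begin{pmatrix} \frac{\tau_0(\tau_t\cosh\tau_t - \sinh\tau_t)}{D_h} & \frac{\tau_0^2(\cosh\tau_t - 1)}{D_h} & 0\\ \frac{\tau_0^2(\cosh\tau_t - 1)}{D_h} & \frac{\tau_0^3\sinh\tau_t}{D_h} & 0\\ 0 & 0 & \frac{1}{1-t} \end{pmatrix} \]
where $D_h = 2 - 2\cosh\tau_t + \tau_t\sinh\tau_t$.
Finally if $r = 0$, then the solution becomes
\[ \bar S_t = \frac{1}{(1-t)^3}\begin{pmatrix} 4(1-t)^2 & 6(1-t) & 0\\ 6(1-t) & 12 & 0\\ 0 & 0 & (1-t)^2 \end{pmatrix}. \]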
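The formulas of Lemma 12.8 can be sanity-checked numerically before reading the proof. The sketch below is an illustration, not the thesis's argument: it assumes the solution is written as $\bar S_t$ and checks, by central finite differences, that it satisfies the Riccati equation in the form of Lemma 12.7 with $R_t$ replaced by $\mathrm{diag}(2rH, 0, 0)$. The helper names `s_bar` and `riccati_residual`, and the parameter `w` standing for $\tau_0 = \sqrt{2rH}$ (with `w = 0` encoding $r = 0$), are hypothetical.

```python
import math

C1 = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
C2 = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def s_bar(t, w):
    """Candidate solution from Lemma 12.8 for r >= 0, with w = sqrt(2 r H)."""
    s = 1.0 - t
    if w == 0:
        return [[4 / s, 6 / s**2, 0.0], [6 / s**2, 12 / s**3, 0.0], [0.0, 0.0, 1 / s]]
    tau = w * s
    dd = 2 - 2 * math.cos(tau) - tau * math.sin(tau)
    s11 = w * (math.sin(tau) - tau * math.cos(tau)) / dd
    s12 = w**2 * (1 - math.cos(tau)) / dd
    s22 = w**3 * math.sin(tau) / dd
    return [[s11, s12, 0.0], [s12, s22, 0.0], [0.0, 0.0, 1 / s]]

def riccati_residual(t, w, h=1e-5):
    """Largest entry of dS/dt - R + S C1 + C1^T S - S C2 S, with dS/dt by central differences."""
    S = s_bar(t, w)
    Sp, Sm = s_bar(t + h, w), s_bar(t - h, w)
    R = [[w**2, 0, 0], [0, 0, 0], [0, 0, 0]]
    SC1 = matmul(S, C1)
    C1TS = matmul(transpose(C1), S)
    SC2S = matmul(matmul(S, C2), S)
    return max(
        abs((Sp[i][j] - Sm[i][j]) / (2 * h) - R[i][j] + SC1[i][j] + C1TS[i][j] - SC2S[i][j])
        for i in range(3) for j in range(3)
    )

print(riccati_residual(0.3, 0.0), riccati_residual(0.3, 1.5))
```

Both residuals should be at the level of finite-difference error. Note also that the $(1,1)$ and $(3,3)$ entries for $r = 0$ add up to $5/(1-t)$, which is where the exponent 5 in $MCP(0, 5)$ comes from.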
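Proof of Lemma 12.8 In the case $r = 0$, there is no quadratic term in the matrix Riccati equation. Therefore, the proof in this case is straightforward and will be omitted.
For other values of $r$, consider the matrix
\[ A = \begin{pmatrix} C_1 & -C_2\\ \bar R & -C_1^T \end{pmatrix} \]
and the corresponding matrix differential equation $\frac{d}{dt}q = Aq$ together with the condition $q(1) = I$.
The fundamental solution is given by
\[ q(t) = e^{(t-1)A} = \begin{pmatrix}
\cos\tau_t & 0 & 0 & \frac{\sin\tau_t}{\tau_0} & \frac{1-\cos\tau_t}{\tau_0^2} & 0\\
-\frac{\sin\tau_t}{\tau_0} & 1 & 0 & \frac{\cos\tau_t - 1}{\tau_0^2} & \frac{\sin\tau_t - \tau_t}{\tau_0^3} & 0\\
0 & 0 & 1 & 0 & 0 & 1-t\\
-\tau_0\sin\tau_t & 0 & 0 & \cos\tau_t & \frac{\sin\tau_t}{\tau_0} & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \]
if $r > 0$ and it is
\[ q(t) = e^{(t-1)A} = \begin{pmatrix}
\cosh\tau_t & 0 & 0 & \frac{\sinh\tau_t}{\tau_0} & \frac{\cosh\tau_t - 1}{\tau_0^2} & 0\\
-\frac{\sinh\tau_t}{\tau_0} & 1 & 0 & \frac{1 - \cosh\tau_t}{\tau_0^2} & \frac{\tau_t - \sinh\tau_t}{\tau_0^3} & 0\\
0 & 0 & 1 & 0 & 0 & 1-t\\
\tau_0\sinh\tau_t & 0 & 0 & \cosh\tau_t & \frac{\sinh\tau_t}{\tau_0} & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \]
if $r < 0$.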
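It follows from [36, Theorem 1] that
\[ \bar S_t^{-1} = \begin{pmatrix}
\frac{\sin\tau_t}{\tau_0} & \frac{1-\cos\tau_t}{\tau_0^2} & 0\\
\frac{\cos\tau_t - 1}{\tau_0^2} & \frac{\sin\tau_t - \tau_t}{\tau_0^3} & 0\\
0 & 0 & 1-t
\end{pmatrix}
\begin{pmatrix}
\cos\tau_t & \frac{\sin\tau_t}{\tau_0} & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}^{-1}
= \begin{pmatrix}
\frac{\tan\tau_t}{\tau_0} & \frac{\cos\tau_t - 1}{\tau_0^2\cos\tau_t} & 0\\
\frac{\cos\tau_t - 1}{\tau_0^2\cos\tau_t} & \frac{\tan\tau_t - \tau_t}{\tau_0^3} & 0\\
0 & 0 & 1-t
\end{pmatrix} \]
if $r > 0$ and
\[ \bar S_t^{-1} = \begin{pmatrix}
\frac{\sinh\tau_t}{\tau_0} & \frac{\cosh\tau_t - 1}{\tau_0^2} & 0\\
\frac{1 - \cosh\tau_t}{\tau_0^2} & \frac{\tau_t - \sinh\tau_t}{\tau_0^3} & 0\\
0 & 0 & 1-t
\end{pmatrix}
\begin{pmatrix}
\cosh\tau_t & \frac{\sinh\tau_t}{\tau_0} & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}^{-1}
= \begin{pmatrix}
\frac{\tanh\tau_t}{\tau_0} & \frac{1 - \cosh\tau_t}{\tau_0^2\cosh\tau_t} & 0\\
\frac{1 - \cosh\tau_t}{\tau_0^2\cosh\tau_t} & \frac{\tau_t - \tanh\tau_t}{\tau_0^3} & 0\\
0 & 0 & 1-t
\end{pmatrix} \]
if $r < 0$.
Therefore, inverting the above matrix gives the result. □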
Since $R^{11}_t \ge 2rH$ and $R^{22}_t \ge 0$, by the comparison theorem for the matrix Riccati equation (see [25, Theorem 2.1]), we have $S_t^{-1} \ge \bar S_t^{-1} \ge 0$ for $t$ close enough to 1. Here $A \ge B$ means that $A - B$ is nonnegative definite. By monotonicity (see [15, Proposition V.1.6]), $0 \le S_t \le \bar S_t$ for $t$ close enough to 1. If we apply the same comparison principle to $S_t$ and $\bar S_t$, then $0 \le S_t \le \bar S_t$ for all $t$ in $[0, 1]$. Therefore,
\[ \mathrm{tr}\left(\bar S_t\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}\right) \ge \mathrm{tr}\left(S_t\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}\right). \tag{12.8} \]
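If $r = 0$, then
\[ \bar S^{11}_t + \bar S^{33}_t = \frac{5}{1-t}. \]
If $r > 0$, then
\[ \bar S^{11}_t + \bar S^{33}_t = \frac{\tau_0(\sin\tau_t - \tau_t\cos\tau_t)}{2 - 2\cos\tau_t - \tau_t\sin\tau_t} + \frac{1}{1-t}. \]
If $r < 0$, then
\[ \bar S^{11}_t + \bar S^{33}_t = \frac{\tau_0(\tau_t\cosh\tau_t - \sinh\tau_t)}{2 - 2\cosh\tau_t + \tau_t\sinh\tau_t} + \frac{1}{1-t}. \]
If we integrate the above equations, we get
\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log(1-t)^5 \tag{12.9} \]
if $r = 0$,
\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log\left[\frac{(1-t)(2 - 2\cos\tau_t - \tau_t\sin\tau_t)}{2 - 2\cos\tau_0 - \tau_0\sin\tau_0}\right] \tag{12.10} \]
if $r > 0$, and
\[ \int_0^t (\bar S^{11}_s + \bar S^{33}_s)\,ds = -\log\left[\frac{(1-t)(2 - 2\cosh\tau_t + \tau_t\sinh\tau_t)}{2 - 2\cosh\tau_0 + \tau_0\sinh\tau_0}\right] \tag{12.11} \]
if $r < 0$.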
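The closed form (12.10) can be compared against a direct quadrature of the trace. The sketch below is illustrative only: the names `trace_term` and `simpson` are hypothetical, `w` stands for $\tau_0 = \sqrt{2rH}$ with $r > 0$, and composite Simpson's rule is used as a generic integrator.

```python
import math

def D(tau):
    return 2.0 - 2.0 * math.cos(tau) - tau * math.sin(tau)

def trace_term(s, w):
    """Integrand S11 + S33 for r > 0, with tau_s = w * (1 - s)."""
    tau = w * (1.0 - s)
    return w * (math.sin(tau) - tau * math.cos(tau)) / D(tau) + 1.0 / (1.0 - s)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

w, t = 1.5, 0.6
lhs = simpson(lambda s: trace_term(s, w), 0.0, t)
rhs = -math.log((1.0 - t) * D(w * (1.0 - t)) / D(w))
print(lhs, rhs)
```

The agreement reflects the identity $\frac{d}{d\tau}(2 - 2\cos\tau - \tau\sin\tau) = \sin\tau - \tau\cos\tau$, so the first summand of the integrand is a logarithmic derivative.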
It follows from Theorem 15.2 that
\[ \rho_t(\varphi_t(z)) = e^{\int_0^t (S^{11}_s + S^{33}_s)\,ds}. \]
Combining this with (12.8), (12.9), (12.10), and (12.11) gives
\[ \left[\frac{(1-t)(2 - 2\cos\tau_t - \tau_t\sin\tau_t)}{2 - 2\cos\tau_0 - \tau_0\sin\tau_0}\right]\rho_t(\varphi_t(x)) \le 1 \]
if $r > 0$,
\[ \left[\frac{(1-t)(2 - 2\cosh\tau_t + \tau_t\sinh\tau_t)}{2 - 2\cosh\tau_0 + \tau_0\sinh\tau_0}\right]\rho_t(\varphi_t(x)) \le 1 \]
if $r < 0$, and
\[ (1-t)^5\rho_t(\varphi_t(x)) \le 1 \]
if $r = 0$.
To complete the proof of the theorem, note that $\tau_t = (1-t)\sqrt{2|r|H} = (1-t)\sqrt{|r|}\,d_{CC}(x_0, z) = \sqrt{|r|}\,d_{CC}(x_0, \varphi_t(z))$. This shows that (12.4), (12.5) and (12.6) hold $\varphi_{t*}\mu$-almost everywhere. But $\rho_t$ vanishes $\mu$-almost everywhere on a set of $\varphi_{t*}\mu$-measure zero. Therefore, (12.4), (12.5) and (12.6) hold $\mu$-almost everywhere. □
Chapter 13
Isoperimetric Problems
In this chapter we specialize to the case which models the isoperimetric problem, or a particle in a constant magnetic field on a Riemannian surface. More precisely, assume that the vector field $e$, which is transversal to the distribution $\Delta$, defines a free and proper Lie group $G$-action (i.e. $G = S^1$ or $G = \mathbb{R}$). Then the quotient $N := M/G$ is again a manifold. Assume also that the subriemannian metric $g$ is a metric of bundle type (i.e. $g$ is invariant under the above action). Under these assumptions the subriemannian metric $g$ descends to a Riemannian metric on the surface $N$. In this case Theorems 11.3 and 12.3 simplify to
Theorem 13.1
\[
\begin{aligned}
e_2(0) &= \tfrac{1}{\sqrt{2H}}\,\vec\sigma,\\
e_1(0) &= \tfrac{1}{\sqrt{2H}}\,\vec\xi_1,\\
f_1(0) &= \tfrac{1}{\sqrt{2H}}\,[h_1\vec h_2 - h_2\vec h_1 + 2Ha^2_{01}\vec\alpha_0 + (\vec\xi_1h_{12})\vec\xi_1 - h_{12}\vec\xi_2],\\
f_2(0) &= \tfrac{1}{\sqrt{2H}}\,[2H\vec h_0 - h_0\vec H],\\
R^{11}_0 &= h_0^2 + 2H\kappa,\\
R^{22}_0 &= 0,
\end{aligned}
\]
where $\kappa$ is the Gauss curvature of the surface $N$.
As a consequence the metric measure space $(M, d_{CC}, \eta)$ satisfies the generalized measure contraction property $MCP(\kappa; 2, 3)$. In particular, it satisfies the measure contraction property $MCP(0, 5)$ if $\kappa \ge 0$.
Proof
Since $g$ is a metric of bundle type, the following holds.
Lemma 13.2 Under the above assumptions, the functions $a^k_{ij}$ in the bracket relation (11.2) satisfy
\[ a^0_{01} = a^0_{02} = a^1_{01} = a^2_{02} = 0 \quad\text{and}\quad a^2_{01} = -a^1_{02}. \]
Proof of Lemma 13.2 If the flow of the vector field $e = v_0$ is denoted by $e^{te}$, then the invariance of the subriemannian metric under the group action implies that
\[ g((e^{te})^*v_i, (e^{te})^*v_j) = \delta_{ij}, \qquad \sigma((e^{te})^*v_j) = 0, \qquad i, j = 1, 2. \]
By differentiating the above equations with respect to time $t$, it follows that
\[ \alpha_j([e, v_i]) + \alpha_i([e, v_j]) = 0, \qquad \sigma([e, v_j]) = 0, \qquad i, j = 1, 2. \]
If we apply the bracket relations (11.2) of the frame $v_0, v_1, v_2$, we have
\[ a^j_{0i} + a^i_{0j} = \alpha_j([e, v_i]) + \alpha_i([e, v_j]) = 0, \qquad a^0_{0j} = \sigma([e, v_j]) = 0, \qquad i, j = 1, 2. \]
□
It follows that
Lemma 13.3 The function $h_0$ is a constant of motion of the flow $e^{t\vec H}$, i.e. $a = dh_0(\vec H) = 0$.
Proof of Lemma 13.3 This follows from a general result in Hamiltonian reduction. In this special case this can also be seen as follows. By Lemma 11.5
\[ dh_0(\vec H) = dh_0(h_1\vec h_1 + h_2\vec h_2) = h_1h_{10} + h_2h_{20}. \tag{13.1} \]
By Lemma 13.2 we also have
\[ h_{10} = -a^0_{01}h_0 - a^1_{01}h_1 - a^2_{01}h_2 = -a^2_{01}h_2. \]
Similarly $h_{20} = -a^1_{02}h_1$. The result follows from this, (13.1), and Lemma 13.2. □
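Lemma 13.3 can be illustrated on the Heisenberg group discussed at the end of this chapter. The sketch below makes several assumptions that are not in the thesis: canonical coordinates $(x, y, z, p_x, p_y, p_z)$ on $T^*\mathbb{R}^3$, the standard form of Hamilton's equations, and a classical fourth-order Runge-Kutta integrator. In these coordinates $h_1 = p_x - \frac{1}{2}yp_z$, $h_2 = p_y + \frac{1}{2}xp_z$ and $h_0 = p_z$, the lift of $e = \partial_z$.

```python
def hamiltonian(state):
    x, y, z, px, py, pz = state
    h1 = px - 0.5 * y * pz
    h2 = py + 0.5 * x * pz
    return 0.5 * (h1 * h1 + h2 * h2)

def rhs(state):
    """Hamilton's equations (q' = dH/dp, p' = -dH/dq) for H = (h1^2 + h2^2) / 2."""
    x, y, z, px, py, pz = state
    h1 = px - 0.5 * y * pz
    h2 = py + 0.5 * x * pz
    return (h1, h2, 0.5 * (x * h2 - y * h1), -0.5 * pz * h2, 0.5 * pz * h1, 0.0)

def rk4_step(state, dt):
    def shifted(base, k, c):
        return tuple(b + c * ki for b, ki in zip(base, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

start = (0.2, -0.1, 0.0, 1.0, 0.5, 2.0)
state = start
for _ in range(1000):          # integrate the Hamiltonian flow on [0, 1]
    state = rk4_step(state, 1e-3)

h0_start, h0_end = start[5], state[5]
print(h0_start, h0_end, hamiltonian(start), hamiltonian(state))
```

Since $H$ does not depend on $z$, the momentum $p_z = h_0$ is constant along the flow, in line with $a = dh_0(\vec H) = 0$; the energy is also preserved up to integration error.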
It follows from Lemma 13.2 and Lemma 13.3 that $\chi_0 = 2Ha^2_{01}$. It remains to show that $\kappa$ is the Gauss curvature of the surface $N$. By Lemma 13.2 $\kappa$ simplifies to
\[ \kappa = v_1a^2_{12} - v_2a^1_{12} - (a^1_{12})^2 - (a^2_{12})^2. \tag{13.2} \]
Let $w_1$ and $w_2$ be a local orthonormal frame on the surface $N$. Let $\tilde w_1$ and $\tilde w_2$ be the horizontal lifts of $w_1$ and $w_2$, respectively. Since $\tilde w_1$ and $\tilde w_2$ are orthonormal with respect to the subriemannian metric, we can set $v_i = \tilde w_i$. It follows from (11.2) that $[w_1, w_2] = a^1_{12}w_1 + a^2_{12}w_2$. Let us denote the covariant derivative on the Riemannian manifold $N$ by $\nabla$. It follows from the Koszul formula ([50, Theorem 3.11]) that
\[
\begin{aligned}
\nabla_{v_1}v_1 &= -a^1_{12}v_2, & \nabla_{v_2}v_2 &= -a^2_{12}v_1,\\
\nabla_{v_1}v_2 &= a^1_{12}v_1, & \nabla_{v_2}v_1 &= -a^2_{12}v_2.
\end{aligned}
\tag{13.3}
\]
Since the covariant derivative $\nabla$ is tensorial in the bottom slot and is a derivation in the other slot, it follows from (13.3) that
\[
\begin{aligned}
\nabla_{[v_1,v_2]}v_1 &= \nabla_{a^1_{12}v_1 + a^2_{12}v_2}v_1\\
&= a^1_{12}\nabla_{v_1}v_1 + a^2_{12}\nabla_{v_2}v_1\\
&= -[(a^1_{12})^2 + (a^2_{12})^2]v_2
\end{aligned}
\]
and
\[
\begin{aligned}
[\nabla_{v_1}, \nabla_{v_2}]v_1 &= \nabla_{v_1}\nabla_{v_2}v_1 - \nabla_{v_2}\nabla_{v_1}v_1\\
&= -\nabla_{v_1}(a^2_{12}v_2) + \nabla_{v_2}(a^1_{12}v_2)\\
&= -(v_1a^2_{12})v_2 + (v_2a^1_{12})v_2 - 2a^1_{12}a^2_{12}v_1.
\end{aligned}
\]
Therefore, it follows from the above calculation and (13.2) that the Gauss curvature is given by
\[
\begin{aligned}
\kappa &= \langle\nabla_{[v_1,v_2]}v_1 - [\nabla_{v_1}, \nabla_{v_2}]v_1, v_2\rangle\\
&= -(a^1_{12})^2 - (a^2_{12})^2 + v_1a^2_{12} - v_2a^1_{12},
\end{aligned}
\]
as claimed. □
Recall that the Heisenberg group is the Euclidean space $\mathbb{R}^3$ equipped with the distribution $\Delta = \mathrm{span}\{v_1 = \partial_x - \frac{1}{2}y\partial_z,\ v_2 = \partial_y + \frac{1}{2}x\partial_z\}$. Let $g$ be the subriemannian metric for which $v_1$ and $v_2$ are orthonormal and let $H$ be the subriemannian Hamiltonian. The vector field $e$ which defines the action is given by $e = [v_1, v_2] = \partial_z$. This defines an $\mathbb{R}$-action and the quotient of the manifold $M$ by this action is $N = M/G = \mathbb{R}^2$. The measure $\eta$ is the Lebesgue measure. Therefore, by applying Theorem 12.3 and Theorem 13.1, we recover the following theorem of [32].
Theorem 13.4 The Heisenberg group with the subriemannian metric defined above together with the Lebesgue measure satisfies the measure contraction property $MCP(0, 5)$.
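The bracket relation $[v_1, v_2] = \partial_z$ used above is easy to confirm numerically. The following sketch assumes the coordinate convention $[X, Y]^k = \sum_i (X^i\partial_iY^k - Y^i\partial_iX^k)$ and approximates the partial derivatives by central differences, which are exact up to rounding here because the coefficients are linear. The helper name `lie_bracket` is hypothetical.

```python
def v1(p):
    x, y, z = p
    return (1.0, 0.0, -0.5 * y)

def v2(p):
    x, y, z = p
    return (0.0, 1.0, 0.5 * x)

def lie_bracket(X, Y, p, h=1e-4):
    """Components of [X, Y] at p, with partial derivatives by central differences."""
    n = len(p)
    Xp, Yp = X(p), Y(p)
    out = [0.0] * n
    for i in range(n):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        dY = [(a - b) / (2 * h) for a, b in zip(Y(plus), Y(minus))]
        dX = [(a - b) / (2 * h) for a, b in zip(X(plus), X(minus))]
        for k in range(n):
            out[k] += Xp[i] * dY[k] - Yp[i] * dX[k]
    return out

bracket = lie_bracket(v1, v2, (0.3, -1.2, 0.7))
print(bracket)   # components of [v1, v2] in the basis (d/dx, d/dy, d/dz)
```

The result does not depend on the base point, so $e = \partial_z$ is a constant vector field and its flow is the translation action of $\mathbb{R}$ along the $z$-axis.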
Next we look at the Hopf fibration. Let $S^3$ be the unit sphere in the four dimensional Euclidean space $\mathbb{R}^4$. The vector field $e = -x\partial_w + w\partial_x + z\partial_y - y\partial_z$ defines a circle $S^1$ action on $S^3$. The quotient $N = M/G$ is the 2-sphere $S^2$. The vector fields $-y\partial_w - z\partial_x + w\partial_y + x\partial_z$ and $-z\partial_w + y\partial_x - x\partial_y + w\partial_z$ define a distribution of rank 2 and a subriemannian metric on $S^3$. Finally the volume form $\eta$ is given by $\eta = \frac{1}{2}(-z\,dw\wedge dx\wedge dy + w\,dx\wedge dy\wedge dz + y\,dw\wedge dx\wedge dz - x\,dw\wedge dy\wedge dz)$.
Theorem 13.5 The 3-sphere $S^3$ equipped with the above subriemannian metric satisfies the generalized measure contraction property $MCP(2; 2, 3)$. In particular, it satisfies the measure contraction property $MCP(0, 5)$.
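A few elementary properties of the three vector fields in the Hopf example can be checked numerically. The sketch below is an illustration under assumptions that go beyond the text: coordinates are ordered $(w, x, y, z)$, orthonormality is tested with respect to the Euclidean inner product of $\mathbb{R}^4$, and the bracket uses the convention $[X, Y]^k = \sum_i (X^i\partial_iY^k - Y^i\partial_iX^k)$. The helper names are hypothetical.

```python
import math

def e(p):
    w, x, y, z = p
    return (-x, w, z, -y)

def v1(p):
    w, x, y, z = p
    return (-y, -z, w, x)

def v2(p):
    w, x, y, z = p
    return (-z, y, -x, w)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lie_bracket(X, Y, p, h=1e-4):
    """Components of [X, Y] at p, with partial derivatives by central differences."""
    n = len(p)
    Xp, Yp = X(p), Y(p)
    out = [0.0] * n
    for i in range(n):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        dY = [(a - b) / (2 * h) for a, b in zip(Y(plus), Y(minus))]
        dX = [(a - b) / (2 * h) for a, b in zip(X(plus), X(minus))]
        for k in range(n):
            out[k] += Xp[i] * dY[k] - Yp[i] * dX[k]
    return out

raw = (0.1, -0.4, 0.8, 0.3)
norm = math.sqrt(dot(raw, raw))
p = tuple(c / norm for c in raw)           # a point on the unit sphere S^3

fields = [e(p), v1(p), v2(p)]
tangency = [dot(f, p) for f in fields]     # all zero: the fields are tangent to S^3
gram = [[dot(a, b) for b in fields] for a in fields]   # identity matrix on S^3
bracket = lie_bracket(v1, v2, p)           # compare with 2 * e(p)
print(tangency, gram, bracket)
```

Under these conventions the computed bracket equals $2e$, so the two horizontal fields are bracket-generating and $e$ is transversal to the distribution they span.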
Part IV
Appendix
Chapter 14
Proof of Pontryagin Maximum
Principle for the Bolza Problem
This appendix is devoted to the proof of Theorem 2.3. The first step is to reduce the problem to a simpler one. Recall that the Bolza problem is the following minimization problem:
Find minimizers for
\[ \inf_{(x(\cdot),u(\cdot))\in C_{x_0}} \int_0^1 L(x(s), u(s))\,ds - f(x(1)) \]
where the infimum is taken over all admissible pairs $(x(\cdot), u(\cdot))$ satisfying the control system
\[ \dot x(s) = F(x(s), u(s)) \]
and initial condition $x(0) = x_0$.
Let $\tilde x = (x, z)$ be a point in the product manifold $M \times \mathbb{R}$ and consider the following extended control system on it:
\[ \dot{\tilde x} = \tilde F(\tilde x, u) := (F(x, u), L(x, u)). \tag{14.1} \]
Note that $\tilde x(\cdot) = (x(\cdot), z(\cdot))$ satisfies this extended system and initial condition $\tilde x(0) = (x_0, 0)$ if and only if $x(\cdot)$ satisfies the original control system in the Bolza problem with the initial condition $x(0) = x_0$ and $z(t) = \int_0^t L(x(s), u(s))\,ds$. Therefore, Problem 2.2 is equivalent to the following problem:
Problem 14.1 Find minimizers for
\[ \inf_{(\tilde x(\cdot),u(\cdot))\in C_{(x_0,0)}} (z(1) - f(x(1))), \tag{14.2} \]
where the infimum is taken over all admissible pairs satisfying the extended control system (14.1).
Problem 14.1 is an example of the Mayer problem. Namely, let $g : N \to \mathbb{R}$ be a function on the manifold $N$. Then the Mayer problem is the following minimization:
Problem 14.2 Find minimizers for
\[ \inf_{C_{x_0}} g(x(1)) \]
where the infimum is taken over all admissible pairs $(x(\cdot), u(\cdot))$ satisfying the control system
\[ \dot x = F(x, u) \]
on $N$ and initial condition $x(0) = x_0$.
Note that Problem 14.1 is the Mayer problem on the manifold $N = M \times \mathbb{R}$ with function $g : M \times \mathbb{R} \to \mathbb{R}$ given by $g(x, z) = z - f(x)$. Also, if $\alpha$ is in the subdifferential $d^-f_x$ of $f$ at $x$, then $(-\alpha, 1)$ is in the superdifferential $d^+g_{(x,z)}$ of $g$ at $(x, z)$.
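The reduction from the Bolza problem to the Mayer problem amounts to carrying the running cost as an extra state variable. The sketch below illustrates this on a scalar toy system; everything specific in it is an assumption made for the example: the dynamics $F(x, u) = u$, the Lagrangian $L(x, u) = \frac{1}{2}u^2$, the terminal function $f(x) = x^2$, the constant control, and the explicit Euler integrator.

```python
def integrate_extended(x0, control, F, L, steps=1000):
    """Explicit Euler for the extended system (x, z)' = (F(x, u), L(x, u)) on [0, 1], z(0) = 0."""
    dt = 1.0 / steps
    x, z = x0, 0.0
    for i in range(steps):
        u = control(i * dt)
        dx, dz = F(x, u), L(x, u)   # evaluate both components at the current state
        x += dt * dx
        z += dt * dz
    return x, z

# Hypothetical toy data, chosen so that the exact answer is easy to read off.
F = lambda x, u: u
L = lambda x, u: 0.5 * u * u
f = lambda x: x * x
control = lambda t: 2.0

x1, z1 = integrate_extended(0.0, control, F, L)
mayer_cost = z1 - f(x1)    # g(x(1), z(1)) = z(1) - f(x(1)), the Mayer form of the Bolza cost
print(x1, z1, mayer_cost)
```

Here $z(1)$ reproduces the integral $\int_0^1 L\,ds = 2$, so the Mayer cost $g(x(1), z(1))$ coincides with the Bolza cost $\int_0^1 L\,ds - f(x(1)) = -2$ for this control.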
Next, we will prove a version of the Pontryagin maximum principle for the Mayer problem and show how Theorem 2.3 follows from this. For each point $u$ in the control set $U$, define the corresponding Hamiltonian function $H_u : T^*N \to \mathbb{R}$ by
\[ H_u(p_x) = p_x(F(x, u)). \]
Theorem 14.3 (Pontryagin Maximum Principle for the Mayer Problem)
Let $(x(\cdot), u(\cdot))$ be an admissible pair which achieves the infimum in Problem 14.2. Assume that the function $g$ in Problem 14.2 is super-differentiable at the point $x(1)$ and let $\alpha$ be in the super-differential $d^+g_{x(1)}$ of $g$. Then there exists a Lipschitz path $p(\cdot) : [0, 1] \to T^*N$ which, for almost all values of time $t$ in the interval $[0, 1]$, satisfies the following:
\[
\begin{aligned}
\pi(p(t)) &= x(t),\\
p(1) &= \alpha,\\
\dot p(t) &= \vec H_{u(t)}(p(t)),\\
H_{u(t)}(p(t)) &= \min_{u\in U} H_u(p(t)).
\end{aligned}
\tag{14.3}
\]
Proof Fix a point $v$ in the control set and a number $\tau$ in the interval $[0, 1]$. For each small positive number $\varepsilon > 0$, let $u_\varepsilon$ be the admissible control defined by
\[ u_\varepsilon(t) = \begin{cases} u(t), & \text{if } t \notin [\tau - \varepsilon, \tau];\\ v, & \text{if } t \in [\tau - \varepsilon, \tau]. \end{cases} \]
Since the optimal control $u$ is locally bounded, the new control $u_\varepsilon$ defined above is also locally bounded. Let $P^\varepsilon_{t_0,t_1} : N \to N$ be the time-dependent local flow of the following ordinary differential equation
\[ \dot x(t) = F(x(t), u_\varepsilon(t)). \]
Here $P^\varepsilon_{0,t}(x)$ denotes the image of the point $x$ in the manifold $N$ under the local flow $P^\varepsilon_{0,t}$ at time $t$. It has the composition property $P^\varepsilon_{t_2,t_3} \circ P^\varepsilon_{t_1,t_2} = P^\varepsilon_{t_1,t_3}$. Also, recall that $P^\varepsilon_{t_0,t_1}$ depends smoothly on the space variables and it is Lipschitz with respect to the time variable.
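The perturbed control $u_\varepsilon$ defined above is a needle variation, and its first-order effect on the endpoint can be seen on a scalar toy system. The sketch below is an illustration under assumptions not taken from the thesis: the system is $\dot x = u$ (so $F(x, u) = u$ and the flow acts by translations), the reference control is constant, and the endpoint is computed with explicit Euler. The names `needle_variation` and `endpoint` are hypothetical.

```python
def needle_variation(u, v, tau, eps):
    """Return u_eps: equal to v on [tau - eps, tau] and to u(t) elsewhere."""
    def u_eps(t):
        return v if tau - eps <= t <= tau else u(t)
    return u_eps

def endpoint(control, x0=0.0, steps=20000):
    """Explicit Euler for the scalar toy system x' = u(t) on [0, 1]."""
    dt = 1.0 / steps
    x = x0
    for i in range(steps):
        x += dt * control(i * dt)
    return x

u = lambda t: 1.0                # reference control (constant, for illustration)
v, tau, eps = 4.0, 0.5, 0.05     # needle value, needle time, needle width
u_eps = needle_variation(u, v, tau, eps)

slope = (endpoint(u_eps) - endpoint(u)) / eps
print(slope, v - u(tau))         # difference quotient versus F^v - F^{u(tau)}
```

For this system the difference quotient approximates $F^v - F^{u(\tau)} = v - u(\tau)$, the vector that appears, pushed forward by the flow, in the derivative of $P^\varepsilon_{0,1}$ with respect to $\varepsilon$.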
Since $x(1) = P^0_{0,1}(x_0)$ and the function $g$ is minimizing at $x(1)$, the following is true for all $\varepsilon > 0$:
\[ g(P^\varepsilon_{0,1}(x_0)) \ge g(P^0_{0,1}(x_0)). \tag{14.4} \]
Let $\alpha$ be a point in the super-differential $d^+g_{x(1)}$ at the point $x(1)$. Then there exists a $C^1$ function $\phi : N \to \mathbb{R}$ such that $d\phi_{x(1)} = \alpha$ and $g - \phi$ has a local maximum at $x(1)$. Combining this with (14.4), we have
\[ g(P^0_{0,1}(x_0)) - \phi(P^\varepsilon_{0,1}(x_0)) \le g(P^\varepsilon_{0,1}(x_0)) - \phi(P^\varepsilon_{0,1}(x_0)) \le g(P^0_{0,1}(x_0)) - \phi(P^0_{0,1}(x_0)). \]
Simplifying this inequality we get
\[ \frac{\phi(P^\varepsilon_{0,1}(x_0)) - \phi(P^0_{0,1}(x_0))}{\varepsilon} \ge 0. \tag{14.5} \]
If $R_t$ denotes the flow of the vector field $F^v$, then
\[ P^\varepsilon_{0,1} = P^0_{\tau,1} \circ R_\varepsilon \circ P^0_{0,\tau-\varepsilon}. \tag{14.6} \]
So, if we assume that $\tau$ is a point of differentiability of the map $t \mapsto P^0_{0,t}$, which is true for almost all times $\tau$ in the interval $[0, 1]$, then $P^\varepsilon_{0,1}$ is differentiable with respect to $\varepsilon$ at zero. Therefore, we can let $\varepsilon$ go to 0 in (14.5) and obtain
\[ \alpha\left(\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} P^\varepsilon_{0,1}\right) \ge 0. \tag{14.7} \]
If we differentiate equation (14.6) with respect to $\varepsilon$ and set $\varepsilon$ to be zero, the equation becomes
\[ \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} P^\varepsilon_{0,1} = (P^0_{\tau,1})_*(F^v - F^{u(\tau)}) \circ P^0_{0,1}. \]
Substituting this equation back into (14.7), we get the following:
\[ ((P^0_{\tau,1})^*\alpha)(F^v(x(\tau)) - F^{u(\tau)}(x(\tau))) \ge 0. \tag{14.8} \]
Define $p : [0, 1] \to T^*N$ by $p(t) = (P^0_{t,1})^*\alpha$, then the first two assertions of the theorem are clearly satisfied.
The following is well known (see [7] or [41]):
Lemma 14.4 Let $\theta = p\,dq$ be the natural 1-form on the cotangent bundle of the manifold $N$, then for each diffeomorphism $P : N \to N$, the pull back map $P^* : T^*N \to T^*N$ on the cotangent bundle of the manifold preserves the 1-form $\theta$.
Let $W_t$ be the time-dependent vector field on the cotangent bundle of the manifold which satisfies
\[ \frac{d}{dt}(P^0_{t,1})^* = W_t \circ (P^0_{t,1})^* \]
for almost all values of time $t$ in $[0, 1]$. If $\mathcal{L}_V$ denotes the Lie derivative with respect to a vector field $V$, then, by Lemma 14.4, the following is true for almost all values of time $t$ in $[0, 1]$:
\[ \mathcal{L}_{W_t}\theta = 0. \]
If $\omega = -d\theta$ is the canonical symplectic 2-form on the cotangent bundle, then, by using Cartan's formula, we have
\[ i_{W_t}\omega = d(\theta(W_t)). \]
Therefore, the vector field $W_t$ is a Hamiltonian vector field with the Hamiltonian given by
\[ H_{u(t)}(p) = p(F(x, u(t))). \]
This implies the third assertion of the theorem. The last assertion follows from (14.8).
□
Going back to Problem 14.1, we can apply the Pontryagin Maximum Principle for the Mayer problem. Let $(x(\cdot), z(\cdot))$ be an admissible pair which minimizes Problem 14.1 and let $H_t : T^*M \times \mathbb{R} \to \mathbb{R}$ be the function defined by
\[ H_t(p, l) := p(F(x, u(t))) + l\cdot L(x, u(t)). \]
By Theorem 14.3, there exists a curve $(p(\cdot), l(\cdot)) : [0, 1] \to T^*_xM \times \mathbb{R}$ such that $x(t) = \pi(p(t))$ and
\[
\begin{aligned}
(\dot p, \dot l) &= \vec H_t(p, l),\\
(p(1), l(1)) &= (-\alpha, 1),\\
H_t(p(t), l(t)) &= \min_{u\in U}\big(p(t)(F(x(t), u)) + l(t)\cdot L(x(t), u)\big).
\end{aligned}
\tag{14.9}
\]
From the first equation in (14.9), we get $\dot l = 0$ and $l(1) = 1$. So, $l(t) \equiv 1$. Therefore, (14.9) is simplified to
\[
\begin{aligned}
\dot p &= \vec H_u(p),\\
p(1) &= -\alpha,\\
H_u(p(t), P(t)) &= \min_{u\in U}\big(p(t)(F(x(t), u)) + L(x(t), u)\big).
\end{aligned}
\tag{14.10}
\]
This finishes the proof of Theorem 2.3.
Chapter 15
Optimal Transportation and the
Generalized Curvature
In this appendix we discuss the relations between optimal transportation and the generalized curvature invariants. To do this, we first recall the displacement interpolation:
\[ \varphi_t(x) = \pi(e^{t\vec H}(-df_x)), \]
where $H$ is a Hamiltonian function on the cotangent bundle $T^*M$, $\vec H$ is the corresponding Hamiltonian vector field, $e^{t\vec H}$ is its Hamiltonian flow and $f$ is a function which is twice differentiable at almost all points $x$. We also assume that the map $\varphi_t$ is, at almost all points $x$ in the manifold $M$, nonsingular for all $t$ in $[0, 1)$.
We recall that the optimal maps of the optimal transportation problem (12.1) are of the form given by $\varphi_1$. Let $\mu$ be a smooth volume form on the manifold $M$ and we denote the corresponding measure by the same symbol $\mu$. Assume that the measures $\varphi_{t*}\mu$ are absolutely continuous with respect to the measure $\mu$ and let $\rho_t$ be the corresponding density. In this appendix we describe the changes in the density $\rho_t$ as a function of time $t$ using the generalized curvature invariants. This is analogous to the Jacobi field calculations in [20, 60].
Recall that the vertical bundle $V$ is given by the kernel of the map $d\pi : TT^*M \to TM$ and the Jacobi curve $J_\alpha(t)$ at $\alpha$ corresponding to a Hamiltonian $H$ is defined by
\[ J_\alpha(t) = de^{-t\vec H}(V_{e^{t\vec H}(\alpha)}). \]
Let $e_1(t), ..., e_n(t), f_1(t), ..., f_n(t)$ be a moving Darboux frame which satisfies
\[ J_\alpha(t) = \mathrm{span}\{e_1(t), ..., e_n(t)\} \]
and assume that the frame satisfies the following structural equations
\[ \dot e_i(t) = c^1_{ij}(t)e_j(t) + c^2_{ij}(t)f_j(t), \qquad \dot f_i(t) = c^3_{ij}(t)e_j(t) + c^4_{ij}(t)f_j(t). \tag{15.1} \]
Let $C^k_t$ be the matrix with entries equal to the structural constants $c^k_{ij}(t)$. Note that the moving Darboux frame $e_i(t), f_i(t)$ and the structural constants $c^k_{ij}(t)$ depend on the point $\alpha$ in the manifold $T^*M$. Let $m$ be the $n$-form on the manifold $T^*M$ which satisfies $i_{e_i(0)}m = 0$ and $m_\alpha(f_1(0), ..., f_n(0)) = 1$. A Hamiltonian $H$ is unimodular with respect to an $n$-form $\eta$ on the manifold $M$ if there is a function $K : T^*M \to \mathbb{R}$ which is invariant under the Hamiltonian flow $e^{t\vec H}$ such that $\pi^*\eta = Km$.
We will also assume that the structural equations are canonical. To say precisely
what this means, note that if $e_i(t)$ is a frame contained in the Jacobi curve $J_\alpha(t)$ at $\alpha$, then
$de^{s\vec H}(e_i(s+t))$ is a frame contained in the Jacobi curve $J_{e^{s\vec H}(\alpha)}(t)$ at $e^{s\vec H}(\alpha)$. Therefore, we
can let $e_1(t), \dots, e_n(t), f_1(t), \dots, f_n(t)$ be a moving Darboux frame at $\alpha$ satisfying (15.1) and
define $\tilde e_i(t)$ and $\tilde f_i(t)$ by $\tilde e_i(t) := de^{s\vec H}e_i(s+t)$, $\tilde f_i(t) := de^{s\vec H}f_i(s+t)$. The structural
equations are canonical if $\tilde e_1(t), \dots, \tilde e_n(t), \tilde f_1(t), \dots, \tilde f_n(t)$ is a moving Darboux frame at
$e^{s\vec H}(\alpha)$ satisfying
$$\dot{\tilde e}_i(t) = c^1_{ij}(t+s)\tilde e_j(t) + c^2_{ij}(t+s)\tilde f_j(t), \qquad \dot{\tilde f}_i(t) = c^3_{ij}(t+s)\tilde e_j(t) + c^4_{ij}(t+s)\tilde f_j(t).$$
Let us denote the differential of the map $x \mapsto -df_x$ by $\mathcal F$; then the map $d\varphi_t$ satisfies
$d\varphi_t = d\pi\, de^{t\vec H}\mathcal F$. If we let $\varsigma_i = d\pi(f_i(0))$, then the vectors $\mathcal F(\varsigma_1), \dots, \mathcal F(\varsigma_n)$ span a
linear subspace $W$ of the symplectic vector space $T_\alpha T^*M$. We write $\mathcal F(\varsigma_i)$ as a linear
combination with respect to the moving Darboux frame defined in (15.1):
$$\mathcal F(\varsigma_i) = \sum_{j=1}^{n}\big(a_{ij}(t)e_j(t) + b_{ij}(t)f_j(t)\big) \quad\text{or}\quad \Psi = A_tE_t + B_tF_t,$$
where $A_t = (a_{ij}(t))$, $B_t = (b_{ij}(t))$, and $\Psi$, $E_t$ and $F_t$ are the matrices with rows $\mathcal F(\varsigma_i)$, $e_i(t)$
and $f_i(t)$ respectively.
Lemma 15.1 Assume that the measures $\varphi_{t*}\mu$ are absolutely continuous with respect to $\mu$,
the Hamiltonian is unimodular with respect to $\mu$, and the structural equations are canonical.
Then the density $\rho_t$ of $\varphi_{t*}\mu$ satisfies
$$\rho_t(\varphi_t(x)) \det B_t = 1.$$
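Proof Assume that $e_1(t), \dots, e_n(t), f_1(t), \dots, f_n(t)$ is a moving Darboux frame at $\alpha$ which
satisfies (15.1). Using the definition of $\tilde e_i(t) = de^{s\vec H}e_i(s+t)$ and $\tilde f_i(t) = de^{s\vec H}f_i(s+t)$, we have
$$de^{s\vec H}\mathcal F(\varsigma_i) = \sum_{j=1}^{n}\big(a_{ij}(s)\tilde e_j(0) + b_{ij}(s)\tilde f_j(0)\big).$$
Since the structural equations are canonical, it follows that
$$m\big(de^{s\vec H}\mathcal F(\varsigma_1), \dots, de^{s\vec H}\mathcal F(\varsigma_n)\big) = \det B_s.$$
Since the Hamiltonian is unimodular with respect to $\mu$, we have $\pi^*\mu = Km$, and the above expression implies that
$$\mu(d\varphi_s(\varsigma_1), \dots, d\varphi_s(\varsigma_n)) = K(e^{s\vec H}\alpha)\det B_s = K(\alpha)\det B_s.$$
Since the function $\rho_t$ is the density of the pushforward measure $\varphi_{t*}\mu$ with respect to the
measure $\mu$ (i.e. $\varphi_{t*}\mu = \rho_t\mu$), it follows that
$$K(\alpha)\det B_0 = \mu(\varsigma_1, \dots, \varsigma_n) = K(\alpha)\rho_s(\varphi_s(x))\det B_s.$$
Since $\pi(-df_x) = x$, $B_0$ is the identity matrix and the proof is complete. □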
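The proof shows that $\det B_s$ is the Jacobian determinant of $\varphi_s$ measured against $\mu$, so Lemma 15.1 is the change-of-variables formula for the pushforward density. The following is a minimal numerical sketch of that identity in a toy model which is an assumption and not the setting of the thesis: $M = \mathbb{R}^2$, $\mu$ the Lebesgue measure, and $\varphi_t(x) = x - t\nabla f(x)$ for a quadratic $f$ with constant Hessian `hess`. The function name, the test function $g$ and the grid parameters are hypothetical choices made only for illustration.

```python
import numpy as np

def change_of_variables_check(hess, t, half_width=12.0, num=801):
    """Toy check of rho_t(phi_t(x)) * det B_t = 1 for the linear map
    phi_t(x) = (I - t*hess) x on R^2 with Lebesgue measure (assumed model)."""
    jac = np.eye(2) - t * hess      # d(phi_t); constant because phi_t is linear
    det_b = np.linalg.det(jac)      # plays the role of det B_t
    rho = 1.0 / det_b               # density predicted by the lemma

    # Uniform grid on a box large enough that the Gaussian tails are negligible.
    xs = np.linspace(-half_width, half_width, num)
    dx = xs[1] - xs[0]
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    pts = np.stack([gx.ravel(), gy.ravel()])   # shape (2, num*num)

    def g(p):
        # Smooth, rapidly decaying test function.
        return np.exp(-0.5 * np.sum(p**2, axis=0))

    lhs = np.sum(g(jac @ pts)) * dx * dx   # integral of g(phi_t(x)) d mu(x)
    rhs = rho * np.sum(g(pts)) * dx * dx   # integral of g(y) rho_t(y) d mu(y)
    return lhs, rhs

# Hypothetical symmetric Hessian and time; det(I - t*hess) stays positive here.
sample_hess = np.array([[0.5, 0.2], [0.2, -0.3]])
lhs, rhs = change_of_variables_check(sample_hess, 0.5)
print(lhs, rhs)
```

The two integrals agree up to quadrature error, which is the defining property of the density $\rho_t$ of $\varphi_{t*}\mu$ tested against $g$.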
By assumption, for almost all points $z$ in the manifold $M$, the map $d(\varphi_t)_z$ is
nonsingular for all values of time $t$ in $[0, 1)$. It follows that the density $\rho_t(\varphi_t(z))$ is nonzero
for each such $t$. Lemma 15.1 shows that the corresponding matrix $B_t$ is invertible and
so the linear space $W$ is transversal to the space $J_\alpha(t) = \mathrm{span}\{e_1(t), \dots, e_n(t)\}$.
Therefore, $W$ is the graph of a linear map from the space $\mathrm{span}\{f_1(t), \dots, f_n(t)\}$ to the space
$J_\alpha(t) = \mathrm{span}\{e_1(t), \dots, e_n(t)\}$. Let $S_t$ be the corresponding matrix (i.e. the linear map
is given by $f_i(t) \mapsto \sum_{j=1}^{n} S^{ij}_t e_j(t)$, where $S^{ij}_t$ are the entries of the matrix $S_t$). Finally we
come to the main theorem of the appendix, proved with A. Agrachev.
Theorem 15.2 [6] Suppose that the same assumptions as in Lemma 15.1 hold and assume
further that, for almost all $z$ in $M$, the map $d(\varphi_t)_z$ is nonsingular for all values of
time $t$ in $[0, 1)$. Then the matrix $S_t$ satisfies the following matrix Riccati equation
$$\dot S_t + C^3_t + S_tC^1_t - C^4_tS_t - S_tC^2_tS_t = 0$$
and the density $\rho_t$ satisfies
$$\rho_t(\varphi_t(z)) = e^{\int_0^t \mathrm{tr}(C^4_s + S_sC^2_s)\,ds}.$$
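Lemma 15.3 $S_t = B_t^{-1}A_t$.
Proof Since $f_i(t) + \sum_{j=1}^{n} S^{ij}_t e_j(t)$ is in the subspace $W$, we have
$F_t + S_tE_t = P_t\Psi = P_tA_tE_t + P_tB_tF_t$ for some matrix $P_t$. By comparing the terms, we have
$P_tA_t = S_t$ and $P_tB_t = I$, so $P_t = B_t^{-1}$ and $S_t = B_t^{-1}A_t$. □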
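Proof of Theorem 15.2 The matrix $\Psi$ does not depend on $t$. By differentiating
$\Psi = B_tF_t + B_tS_tE_t$ with respect to time $t$ and multiplying by $B_t^{-1}$, we get
$$B_t^{-1}\dot B_tF_t + \dot F_t + B_t^{-1}\dot B_tS_tE_t + \dot S_tE_t + S_t\dot E_t = 0.$$
If we apply the structural equations $\dot E_t = C^1_tE_t + C^2_tF_t$ and $\dot F_t = C^3_tE_t + C^4_tF_t$
and compare the coefficients of $F_t$ and $E_t$, then we get
$$B_t^{-1}\dot B_t + C^4_t + S_tC^2_t = 0,$$
$$\dot S_t + B_t^{-1}\dot B_tS_t + C^3_t + S_tC^1_t = 0.$$
Therefore, $S_t$ satisfies the equation
$$\dot S_t + C^3_t + S_tC^1_t - C^4_tS_t - S_tC^2_tS_t = 0.$$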
Finally let $s_t = \rho_t(\varphi_t(x))$, then we have, by Lemma 15.1 and 15.3, the following:
$$\frac{1}{s_t}\frac{d}{dt}s_t = \det B_t\,\frac{d}{dt}\det(B_t^{-1}) = -\mathrm{tr}(B_t^{-1}\dot B_t) = \mathrm{tr}(C^4_t + S_tC^2_t).$$
The rest of the theorem follows as claimed. □
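The algebra in the proofs of Lemma 15.3 and Theorem 15.2 can be checked numerically. The sketch below assumes constant structural matrices $C^1, \dots, C^4$ and an initial matrix $A_0$ with hypothetical values (in the text they depend on $t$ and $\alpha$). Since $\Psi = A_tE_t + B_tF_t$ is constant, (15.1) gives the linear system $\dot A_t = -A_tC^1 - B_tC^3$, $\dot B_t = -A_tC^2 - B_tC^4$ with $B_0 = I$. The code integrates this system and, independently, the Riccati equation together with $\frac{d}{dt}\log\rho_t = \mathrm{tr}(C^4 + S_tC^2)$, and then compares $B_t^{-1}A_t$ with $S_t$ and $1/\det B_t$ with $\rho_t$. The helper names `rk4` and `riccati_check` are illustrative only.

```python
import numpy as np

def rk4(rhs, y0, t_end, steps):
    """Classical fixed-step Runge-Kutta scheme for an autonomous system y' = rhs(y)."""
    y = np.array(y0, dtype=float)
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

def riccati_check(C1, C2, C3, C4, A0, t_end=0.5, steps=1000):
    """Return (max |B^{-1}A - S|, |1/det B - rho|) at time t_end."""
    n = C1.shape[0]

    def linear_rhs(y):
        # State is A stacked on B; the equations come from d(Psi)/dt = 0.
        A, B = y[:n], y[n:]
        return np.vstack([-A @ C1 - B @ C3, -A @ C2 - B @ C4])

    def riccati_rhs(y):
        # State is S stacked on one extra row whose first entry is log(rho).
        S = y[:n]
        dS = -C3 - S @ C1 + C4 @ S + S @ C2 @ S
        dlog = np.zeros((1, n))
        dlog[0, 0] = np.trace(C4 + S @ C2)
        return np.vstack([dS, dlog])

    AB = rk4(linear_rhs, np.vstack([A0, np.eye(n)]), t_end, steps)   # B_0 = I
    A, B = AB[:n], AB[n:]
    SL = rk4(riccati_rhs, np.vstack([A0, np.zeros((1, n))]), t_end, steps)  # S_0 = A_0, rho_0 = 1
    S, log_rho = SL[:n], SL[n, 0]

    s_err = np.max(np.abs(np.linalg.solve(B, A) - S))        # Lemma 15.3
    rho_err = abs(1.0 / np.linalg.det(B) - np.exp(log_rho))  # Lemma 15.1 and Theorem 15.2
    return s_err, rho_err

# Hypothetical constant coefficients, chosen small so that B_t stays invertible.
C1 = np.array([[0.1, 0.2], [0.0, -0.1]])
C4 = -C1.T
C2 = np.array([[1.0, 0.1], [0.1, 0.5]])
C3 = np.array([[-0.3, 0.05], [0.05, 0.2]])
A0 = np.array([[0.2, 0.1], [0.1, -0.4]])
print(riccati_check(C1, C2, C3, C4, A0))
```

Both returned discrepancies should be at the level of the integrator error. The time horizon is kept short on purpose: the comparison is only meaningful while $B_t$ remains invertible, which mirrors the nonsingularity assumption in Theorem 15.2.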
Bibliography
[1] A.A. Agrachev: Exponential mappings for contact sub-Riemannian structures. J.
Dynamical and Control Systems, 2 (1996), 321–358
[2] A.A. Agrachev, M. Caponigro: Families of vector fields which generate the group
of diffeomorphisms, preprint, arXiv:0804.4403
[3] A.A. Agrachev, J.P. Gauthier: On the subanalyticity of Carnot-Caratheodory dis-
tances, Ann. I. H. Poincare – AN 18, (2001), 359–382
[4] A. Agrachev, R. Gamkrelidze: Feedback–invariant optimal control theory and dif-
ferential geometry, I. Regular extremals. J. Dynamical and Control Systems, 3
(1997), 343–389
[5] A.A. Agrachev, P. Lee: Optimal Transportation under Nonholonomic Constraints,
to appear in Trans. Amer. Math. Soc. (2008), 35pp.
[6] A.A. Agrachev, P. Lee: Generalized Ricci Curvature Bounds for Three Dimensional
Contact Subriemannian Manifolds, preprint, 31pp.
[7] A.A. Agrachev, Y. L. Sachkov: Control Theory from the Geometric Viewpoint,
Encyclopedia of Mathematical Sciences, Vol. 87, Springer, 2004
[8] A.A. Agrachev, A. V. Sarychev: Strong minimality of abnormal geodesics for 2-
distributions, J. Dynamical and Control Systems, 1995, v.1, 139-176
[9] A.A. Agrachev, A. V. Sarychev: Abnormal sub-Riemannian geodesics, Morse index
and rigidity, Annales de l’Institut Henri Poincaré - Analyse non linéaire, v.13, 1996,
635-690
[10] A. Agrachev, I. Zelenko: Geometry of Jacobi curves, I, II. J. Dynamical and Control
Systems, 8 (2002), 93–140, 167–215
[11] L. Ambrosio, S. Rigot: Optimal mass transportation in the Heisenberg group, J.
Func. Anal. 208 (2004), 261–301.
[12] V.I. Arnold, A.B. Givental: Symplectic geometry, Dynamical systems IV, Ency-
clopaedia Math. Sci., 4, Springer, Berlin (2001), 1–138.
[13] G. Bande, P. Ghiggini, D. Kotschick: Stability theorems for symplectic and contact
pairs, Int. Math. Res. Not. 68 (2004), 3673–3688.
[14] P. Bernard, B. Buffoni: Optimal mass transportation and Mather theory, J. Eur.
Math. Soc. (JEMS) 9 (2007), no. 1, 85–121.
[15] R. Bhatia: Matrix analysis. Graduate Texts in Mathematics, 169. Springer-Verlag,
New York, 1997
[16] Y. Brenier: Polar factorization and monotone rearrangements of vector-valued func-
tions, Comm. Pure Appl. Math. 44:4 (1991), 375–417.
[17] P. Cannarsa, L. Rifford: Semiconcavity results for optimal control problems
admitting no singular minimizing controls, to appear in Ann. Inst. H. Poincaré
Anal. Non Linéaire, http://math.unice.fr/%7Erifford/Papiers_en_ligne/CANRIFNEW.pdf
[18] P. Cannarsa, C. Sinestrari: Semiconcave Functions, Hamilton-Jacobi Equations,
and Optimal Control, Birkhauser Boston, Progress in Nonlinear Differential Equa-
tions and Their Applications, Vol. 58, 2004
[19] D.C. Chang, I. Markina, A. Vasil’ev: Sub-Lorentzian geometry on anti-de Sitter
space. J. Math. Pures Appl. (9) 90 (2008), no. 1, 82–110
[20] D. Cordero-Erausquin, R. McCann, M. Schmuckenschläger: A Riemannian
interpolation inequality à la Borell, Brascamp and Lieb. Invent. Math., 146: 219-257,
2001.
[21] D. Ebin, J. Marsden: Groups of diffeomorphisms and the motion of an
incompressible fluid, Ann. of Math. (2) 92 (1970), 102-163
[22] A. Fathi, A. Figalli: Optimal transportation on non-compact manifolds, Israel J.
Math., to appear.
[23] A. Figalli: Existence, uniqueness and regularity of optimal transport maps, SIAM
J. Math. Anal., 39 (2007), no.1, 126-137.
[24] A. Figalli, L. Rifford: Mass Transportation on sub-Riemannian Manifolds,
preprint, http://math.unice.fr/~rifford/Papiers_en_ligne/transpSRFigRif.pdf
[25] G. Freiling, G. Jank, H. Abou-Kandil: Generalized Riccati difference and differen-
tial equations. Linear Algebra Appl., 241/242, 291-303 (1996)
[26] R. V. Gamkrelidze: Principles of Optimal Control Theory, Plenum Publishing
Corporation, New York, 1978
[27] E. Ghys: Feuilletages riemanniens sur les varietes simplement connexes, Ann. Inst.
Fourier (Grenoble) 34:4 (1984), 203–223.
[28] J.W. Gray: Some global properties of contact structures, Ann. of Math. 69:2 (1959),
421–450.
[29] L. Hormander: Hypoelliptic second order differential equations, Acta Math., 119
(1967), 147-171
[30] L. Hormander: The analysis of linear partial differential operators III, Classics in
Mathematics, Springer, 1983
[31] N. Juillet: Geometric Inequalities and Generalized Ricci Bounds in the Heisenberg
Group, to appear in IMRN
[32] V. Jurdjevic: Geometric control theory, Cambridge Studies in Advanced Mathe-
matics, 52. Cambridge University Press, Cambridge, 1997
[33] L. Kantorovich: On the translocation of masses, C.R. (Doklady) Acad. Sci.
URSS(N.S.), 37, 1942, 199-201
[34] B. Khesin, P. Lee: A nonholonomic Moser theorem and optimal mass transport,
preprint arXiv: 0802.1551 (2008), 31pp.
[35] B. Khesin, G. Misiolek: Shock waves for the Burgers equation and curvatures of
diffeomorphism groups, Proc. Steklov Inst. Math., v.250 (2007), 1–9.
[36] J.J. Levin: On the matrix Riccati equation. Proc. Amer. Math. Soc. 10 1959 519–
524.
[37] C.B. Li, I. Zelenko: Differential geometry of curves in Lagrange Grassmannians
with given Young diagram. arXiv:0708.1100v1
[38] W.S. Liu, H.J. Sussmann: Shortest paths for sub-Riemannian metrics on rank-2
distributions, Memoirs of AMS, v.118, N. 569, 1995
[39] J. Lott, C. Villani: Ricci curvature for metric-measure spaces via optimal transport,
Ann. of Math. (2), in press
[40] J. Lott, C. Villani: Weak curvature conditions and functional inequalities. J. Funct.
Anal. 245 (2007), no. 1, 311–333
[41] J. Marsden, T. Ratiu: Introduction to Mechanics and Symmetry, Texts in Applied
Mathematics, Vol. 17, Springer, 1999
[42] R. McCann: Polar factorization of maps in Riemannian manifolds, Geom. Funct.
Anal., 11:3 (2001), 589-608
[43] J. Moser: On the volume elements on a manifold, Trans. of the AMS, 120:2 (1965),
286–294.
[44] R. Montgomery: Abnormal Minimizers, SIAM J. Control and Optimization, vol.
32, no. 6, 1994, 1605-1620.
[45] R. Montgomery: A tour of subriemannian geometries, their geodesics and
applications, AMS, Mathematical Surveys and Monographs, vol. 91, 2002
[46] Duy-Minh Nhieu: The Neumann problem for sub-Laplacians on Carnot groups and
the extension theorem for Sobolev spaces, Ann. Mat. Pura Appl. (4) 180 (2001),
no. 1, 1–25.
[47] Duy-Minh Nhieu, N. Garofalo: Lipschitz continuity, global smooth approximations
and extension theorems for Sobolev functions in Carnot-Caratheodory spaces, J.
d’Analyse Math., 74 (1998), 67–97.
[48] S. Ohta: On the measure contraction property of metric measure spaces. Comment.
Math. Helv. 82 (2007), no. 4, 805–828
[49] S. Ohta: Finsler interpolation inequalities, to appear in Calc. Var. Partial Differ-
ential Equations
[50] B. O’Neill: Semi-Riemannian geometry. With applications to relativity. Pure and
Applied Mathematics, 103. Academic Press, Inc., New York, 1983.
[51] B. O’Neill: Submersions and geodesics, Duke Math. J., 34 (1967), 363–373.
[52] F. Otto: The geometry of dissipative evolution equations: the porous medium
equation, Comm. Partial Differential Equations, 26:1-2 (2001), 101–174.
[53] F. Otto, C. Villani: Generalization of an inequality by Talagrand and links with
the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361-400, 2000
[54] L.P. Rothschild, E. Stein: Hypoelliptic differential operators and nilpotent groups,
Acta Math., 137 (1976), 247–320.
[55] L.P. Rothschild, D. Tartakoff: Parametrices with C∞ error for $\Box_b$ and operators
of Hormander type, Partial differential equations and geometry (Proc. Conf., Park
City, Utah, 1977), pp. 255–271, Lecture Notes in Pure and Appl. Math., 48, Dekker,
New York, 1979.
[56] A. Sarychev, D. Torres: Lipschitzian regularity conditions for the minimizing
trajectories of optimal control problems, in Nonlinear analysis and its applications to
differential equations (Lisbon, 1998), 357-368, Progr. Nonlinear Differential
Equations Appl., 43, Birkhauser Boston, Boston, MA, 2001
[57] A.I. Shnirelman: The geometry of the group of diffeomorphisms and the dynamics
of an ideal incompressible fluid, Math. USSR-Sb. 56 (1987), 79–105.
[58] K.T. Sturm: On the geometry of metric measure spaces. Acta Math. 196, no.1,
65-131 (2006)
[59] K.T. Sturm: On the geometry of metric measure spaces II. Acta Math. 196, no. 1,
133-177 (2006)
[60] K.T. Sturm and M.K. von Renesse: Transport inequalities, gradient estimates,
entropy and Ricci curvature, Comm. Pure Appl. Math. 58 (2005), 923–940
[61] H.J. Sussmann: A Cornucopia of Abnormal Sub-Riemannian Minimizers. Part I:
The Four dimensional Case, IMA technical report no. 1073, December, 1992
[62] M.E. Taylor: Partial Differential Equations I (Basic Theory), Applied Mathemati-
cal Sciences, vol. 115, Springer
[63] C. Villani: Topics in Optimal Transportation, Graduate Studies in Mathematics
58, AMS, Providence, 2003
[64] C. Villani: Optimal Transport - old and new, Grundlehren der mathematischen
Wissenschaften , Vol. 338, Springer, 2009 preprint