Rough Paths Theory and its Application to Time Series
Analysis of Financial Data Streams
Antti K. Vauhkonen
Christ Church
University of Oxford
A thesis submitted in partial fulfillment of the degree of Master of Science in Mathematical Finance
Trinity 2017
Abstract
The signature of a continuous multi-dimensional path of bounded varia-
tion, i.e. the sequence of its iterated integrals, is a central concept in the
theory of rough paths. The key insight of this theory is that for any path of
finite p-variation with p ≥ 1 (e.g. sample paths of Brownian motion have
finite p-variation for any p > 2 almost surely), one can define a construct
analogous to signature, called its rough path lift, that incorporates all the
information required for solving controlled differential equations driven
by the given path. In the first part of this thesis we give an intuitive yet
mathematically rigorous introduction to rough paths.
Information encoded in the signatures of multi-dimensional discrete data
streams can also be utilised in their time series analysis, and in some
recent publications signatures of financial data streams have been used
as feature sets in linear regression for the purposes of classifying data
and making statistical predictions. In the second part of this thesis we
present a novel application of this signature-based approach in the context
of a market model where every variable is assumed to follow a diffusion
process that either has a constant underlying drift or reverts to some
long-term mean. Specifically, we show that third order areas of financial
data streams – special linear combinations of their fourth order iterated
integrals – provide an efficient means of determining the parameters of a
market variable given one of its realisations in a space of finitely many
Brownian sample paths that can drive the process, and thus enable one
to distinguish between the two fundamental modes of market behaviour,
namely upward or downward trending versus mean-reverting.
An interesting line of future research would be to investigate the possibility
of using third order areas as a tool for decomposing arbitrary market paths
into mean-reverting path components with a spectrum of mean reversion
speeds.
To the memory of my mother.
Acknowledgements
I would like to express my gratitude to my academic supervisor Prof.
Ben Hambly for his technical guidance, careful reading of my thesis and
valuable comments.
I also owe a big debt of gratitude to Dr. Daniel Jones for giving his time
so generously, for his wise counsel, and for his constant encouragement and
support, without which this thesis would probably never have been completed.
My sincere thanks are also due to my family for their help, support and
understanding while I worked on this thesis over a period that at times
must have seemed interminable.
Lastly, with love and eternal gratitude I remember my late mother, my
most steadfast supporter in all of my varied endeavours, who sadly didn’t
live to see this project reach its conclusion.
Contents
1 Rough paths theory
  1.1 Origins of rough paths
  1.2 Formal definition of rough paths
2 Application of rough paths theory to time series analysis of financial data streams
  2.1 Classical time series analysis
  2.2 Signatures as feature sets for linear regression analysis
  2.3 Lead and lag transforms of data streams
    2.3.1 Gyurko-Lyons-Kontkowski-Field method
    2.3.2 Flint-Hambly-Lyons method (Mark 1)
    2.3.3 Flint-Hambly-Lyons method (Mark 2)
  2.4 Area processes of multi-dimensional paths
    2.4.1 Definition and basic properties of areas
    2.4.2 Higher order areas
  2.5 Classification of paths using third order areas
    2.5.1 Diffusion process market model
    2.5.2 Areas for pairs of diffusion processes
    2.5.3 Classifying sample paths of diffusion processes by using third order areas
  2.6 Conclusion
References
Appendix 1: Quadratic variation and cross-variation of data streams
Appendix 2: Python code
List of Figures
1 GLKF method of lead-lag transforming data streams.
2 FHL (Mark 1) method of lead-lag transforming data streams.
3 FHL (Mark 2) method of lead-lag transforming data streams.
4 Area between path components X^i and X^j.
5 A typical 2-dimensional Brownian sample path.
6 Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
7 Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
8 Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes, all driven by the same Brownian path.
9 Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
10 Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
11 Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs, with the pairwise correlation between the Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.
List of Tables
1 Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.
Chapter 1
Rough paths theory
1.1 Origins of rough paths
There is no more fundamental question in science than that pertaining to the nature
of change. Since antiquity thinkers have pondered over problems concerning motion,
as illustrated by the famous paradoxes of Zeno. In one of them, Zeno argued that a
flying arrow occupies a particular position in space at any given instant of time, hence
is instantaneously motionless, and since time consists of instants, he concluded that
motion is just an illusion; and in another paradox the Greek hero Achilles was unable
to overtake a tortoise in a race where the latter had been given a head start, for in
order to accomplish this he would need to traverse an infinite number of (progressively
shorter) distances, which, according to Zeno, is impossible in a finite amount of time.
While the arrow paradox can be seen simply as an acute observation that motion
has no meaning with respect to a single instant of time – in fact any set of instants
which has zero measure – the notion of an infinite series that is convergent to a limit
provides a satisfactory resolution to the Achilles and tortoise paradox: specifically,
that a geometric series like 1/2 + 1/4 + 1/8 + 1/16 + ∙ ∙ ∙ that arises in the equivalent
dichotomy paradox does not grow without limit but converges to 1, enabling Achilles
to quickly pass the tortoise.
The concept of a limit of a function f that expresses the dependence of a variable y
on another variable x as y = f(x) was similarly crucial to the development in the late
1600s of modern differential and integral calculus which provides proper analytical
tools for the mathematical study of change. The chief among these is the derivative of
a function, usually denoted by f′(x), ẋ-style ḟ(x) or df(x)/dx, which, as the last
notation due to Leibniz suggests, was originally conceived as the quotient of an
infinitesimally small change df(x) in the value of the function f(x) corresponding
to an infinitesimally small change dx in the value of its argument x, until
derivatives were defined in a more rigorous way using the (ε, δ)-definition of a
limit in the early 19th century.
Rather than needing to differentiate given functions, one is often faced with the
(usually harder) inverse problem of finding a function F(x) whose derivative is a
given function f(x), i.e. solving the differential equation

dF(x)/dx = f(x).   (1.1)

By the fundamental theorem of calculus, such an antiderivative F(x) of f(x) is the
same as an indefinite integral of f(x), i.e.

F(x) = ∫_a^x f(z) dz

for any constant a < x in the domain of f where it is continuous.
Differential equations first emerged in the context of dynamical systems as a way
to implicitly describe their time evolution, and most fundamental laws in the mathe-
matical sciences from fluid dynamics and electromagnetism to general relativity and
quantum mechanics – and also mathematical finance – are expressed in terms of dif-
ferential equations. For example, if in (1.1), relabelling the independent variable t
for time, f(t) is a time-varying force acting on a body of mass m, then, according to
Newton’s second law of motion, the momentum mv(t) = m dx(t)/dt of the body, where
x(t) is its position at time t, is a solution of this differential equation. Indeed, this
is the first type of differential equation Newton considered and solved using infinite
series in his Methodus fluxionum et Serierum Infinitarum of 1671.
The second type of differential equations that Newton studied in the same work
are of the form

dy/dx = f(x, y),

and we will be especially interested in the special case where f is a function of the
unknown variable y only, i.e.

dy/dx = f(y).   (1.2)
However, not all functions are differentiable. Up to the second half of the 19th
century, it was a general belief among mathematicians that continuous functions had
to be everywhere differentiable except at some isolated singular points, until the
first examples of ‘pathological’ continuous functions that are nowhere differentiable
were constructed by Riemann and Weierstrass. Actually, far from being pathological,
such functions are in fact the norm rather than the exception, for almost all continuous
functions – viewed as sample paths of a Brownian motion – can be seen to be nowhere
differentiable!
For non-differentiable functions, we would like to generalise (1.2) and write it (in
a manner of Leibniz) in the following form:
dy = f(y) dx (1.3)
– hoping that we can still make sense of it subject to some conditions! We can think
of (1.3) as describing a dynamical system that evolves in such a way that the change
in its state y = y_t over an infinitesimally small time interval [t, t + dt] is given by the
product of its velocity in the current state, as specified by the vector field f on the
state space, and the corresponding increment in the control process xt driving the
system.
In general, the state space of a dynamical system is some manifold that is locally
either a Euclidean space or a Banach space, so that in these two cases (1.3) can be
rewritten as

dy_t = ∑_{i=1}^{d} f_i(y_t) dx^i_t   (1.4)

where y_t ∈ R^e, f_i : R^e → R^e and x^i_t ∈ R for i = 1, . . . , d, or

dy_t = f(y_t) dx_t   (1.5)

where y_t ∈ U, x_t ∈ V and f : U → L(V, U) with U and V (possibly infinite-
dimensional) Banach spaces. Equations of the type of (1.4) and (1.5) are called
controlled differential equations.
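To make the notion of a controlled differential equation concrete, here is a minimal first-order Euler sketch in Python (the language of Appendix 2) for a smooth driving path; the function name and the exponential test driver are illustrative assumptions of this transcript, not part of the thesis. It is precisely this naive scheme that breaks down for rough drivers, which motivates the rest of the chapter.

```python
import math
import numpy as np

def euler_cde(f, y0, x):
    """Naive first-order Euler scheme for dy_t = sum_i f_i(y_t) dx^i_t.

    f  : callable mapping a state y (shape (e,)) to an (e, d) matrix
         whose columns are the vector fields f_i evaluated at y
    y0 : initial state, shape (e,)
    x  : sampled driving path, shape (N+1, d)
    """
    y = np.array(y0, dtype=float)
    for k in range(len(x) - 1):
        y = y + f(y) @ (x[k + 1] - x[k])  # dy ~ f(y) dx over one step
    return y

# Usage: with the smooth driver x_t = t, the equation dy = y dx has the
# exact solution y_t = y_0 e^t, which the scheme reproduces closely.
x = np.linspace(0.0, 1.0, 10001).reshape(-1, 1)
y_T = euler_cde(lambda y: y.reshape(1, 1), [1.0], x)
```
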
Thus, assuming that initially at time 0 the system is in state y_0, solving for its
state y_t at an arbitrary time t > 0 involves iterating its equation of motion (1.5) an
infinite number of times and integrating all the infinitesimally small local changes
into a global change over the time interval [0, t], so that

y_t = y_0 + ∫_0^t f(y_u) dx_u.   (1.6)

Within the theory of rough paths, whose formal definition will be given in the next
section, the function I_f : (x_t, y_0) ↦ y_t is called the Ito map associated with the vector
field f.
In other words, in the language of differential geometry, for finding y_t we should
be able to integrate the one-form f on V, with values in the linear space of vector
fields on U, with respect to the path x_t in V. As one might expect, this cannot be done
in general without imposing some regularity conditions on f and x_t. It is a classical
result (the Picard-Lindelof theorem, see [10, Theorem 1.3]) that if the vector field f is
Lipschitz continuous and the control process x_t is a path of bounded variation in V,
then, for any initial condition y_0 ∈ U, (1.5) has a unique solution given by (1.6) where
the integral is defined as a Stieltjes integral. Under the weaker condition that f is
merely continuous, a solution is still guaranteed to exist by the Cauchy-Peano theorem
(see [10, Theorem 1.4]), but it may not be unique. But for less regular driving signals
– e.g. sample paths of a Brownian motion – classical integration methods are known
to fail. Let us see why this is the case by considering the formal solution of controlled
differential equations using iteration.
For simplicity, we shall consider the 1-dimensional case where x_t, y_t and f all take
values in R. Hence, formally integrating (1.4) gives

∫_{u=s}^{u=t} dy_u = ∫_{u=s}^{u=t} f(y_u) dx_u.   (1.7)

Under any reasonable definition of an integral, the left hand side of (1.7) must be
equal to δy_{s,t} := y_t − y_s, so we have

δy_{s,t} = ∫_{u=s}^{u=t} f(y_u) dx_u.   (1.8)
Further, expanding f about y_s in a formal Taylor series (effectively assuming that f
is an analytic function) on the right hand side of (1.8), then using (1.8) to substi-
tute integral expressions for the increments δy_{s,t} in the Taylor series expansion, and
repeating the procedure yields after three iterations

δy_{s,t} = f(y_s) ∫_{u=s}^{u=t} dx_u
         + f′(y_s)f(y_s) ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} dx_v dx_u
         + f′(y_s)^2 f(y_s) ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} ∫_{w=s}^{w=v} dx_w dx_v dx_u
         + (1/2) f″(y_s)f(y_s)^2 ∫_{u=s}^{u=t} ( ∫_{v=s}^{v=u} dx_v ) ( ∫_{w=s}^{w=u} dx_w ) dx_u + . . .
   (1.9)
where the remaining terms (not shown above) all involve fourth or higher order iter-
ated integrals. Moreover, provided that the above integrals satisfy the usual integra-
tion by parts formula, the integral in the last term can be written as

∫_{u=s}^{u=t} ( ∫_{v=s}^{v=u} dx_v ) ( ∫_{w=s}^{w=u} dx_w ) dx_u = 2 ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} ∫_{w=s}^{w=v} dx_w dx_v dx_u.
In the general multi-dimensional case, an expression analogous to (1.9) can be
just as easily derived for each component y^j_t of y_t for j = 1, . . . , e. For each integer
n ≥ 1, let us formally define nth order componentwise iterated integrals of a path x_t
in R^d over the time interval [s, t] by

x^{i_1,...,i_n}_{s,t} := ∫_{u_n=s}^{u_n=t} . . . ∫_{u_1=s}^{u_1=u_2} dx^{i_1}_{u_1} . . . dx^{i_n}_{u_n}   (1.10)

for i_1, . . . , i_n ∈ {1, . . . , d}. In particular, x^i_{s,t} = x^i_t − x^i_s for i = 1, . . . , d, so the first
order iterated integrals of x_t ∈ R^d are just its componentwise linear increments over
[s, t]. Then, for each j ∈ {1, . . . , e}, we have

y^j_t = y^j_s + ∑_{n=1}^{∞} ∑_{i_1,...,i_n ∈ {1,...,d}} F^j_{i_1,...,i_n}(y_s) x^{i_1,...,i_n}_{s,t}   (1.11)

where the functions F^j_{i_1,...,i_n} : R^e → R are products of partial derivatives of components
of the vector fields f^j_i : R^e → R evaluated at y_s for i = 1, . . . , d and j = 1, . . . , e, as
in (1.9).
As illustrated by (1.11), the importance of iterated integrals for solving controlled
differential equations is due to the fact that the local behaviour of the solution is
controlled by the sequence of iterated integrals of the path driving the equation –
assuming that the series in (1.11) converges and a solution does indeed exist. However,
this is by no means always the case. We need to remind ourselves that the solution
in (1.11) was derived under the strongest possible condition on the vector field f
(namely analyticity), and, moreover, we didn’t specify how the iterated integrals in
(1.10) should be constructed, but rather tacitly assumed that they can be canonically
defined as limits of Riemann sums even though we also didn’t impose any condition
on the regularity of the path x_t. To advance beyond the classical Picard-Lindelof and
Cauchy-Peano theorems, one would like to be able to solve (1.5) for vector fields which
satisfy some mildly stronger form of continuity than plain continuity, and for paths
which are not of bounded variation but whose irregularity – colloquially, roughness –
is nevertheless controlled.
For this purpose we introduce the concept of p-variation of a path. For a closed
bounded time interval [0, T], a subdivision D of [0, T] will be taken to mean a
finite ordered set of real numbers (t_0, t_1, . . . , t_k) such that 0 = t_0 < t_1 < ∙ ∙ ∙ < t_k = T,
and we denote the set of all subdivisions of [0, T] by D([0, T]). Then we make the following

Definition 1.1. Let x : [0, T] → R^d be a continuous function. Then, for any real
number p ≥ 1, the p-variation of x on [0, T] is defined by

‖x‖_{p,[0,T]} = ( sup_{D ∈ D([0,T])} ∑_{h=1}^{k} |x_{t_h} − x_{t_{h−1}}|^p )^{1/p}

where | ∙ | denotes the Euclidean norm on R^d.
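For sampled data, the supremum in Definition 1.1 over subdivisions drawn from the sample times can be computed exactly by dynamic programming. The following sketch (the helper name is an assumption of this transcript, not the thesis's) does so in O(N^2):

```python
import numpy as np

def p_variation(x, p):
    """p-variation of a discrete path (Definition 1.1 with subdivisions
    restricted to the sample times), via dynamic programming:
    best[j] = sup over subdivisions of [t_0, t_j] ending at t_j."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                  # treat a scalar path as d = 1
    n = len(x)
    best = np.zeros(n)
    for j in range(1, n):
        incs = np.linalg.norm(x[j] - x[:j], axis=1) ** p
        best[j] = np.max(best[:j] + incs)
    return best[-1] ** (1.0 / p)

# Usage: the zig-zag path 0, 1, 0, 1, 0 has 1-variation 4 (total
# variation) and 2-variation 2, since every unit move contributes 1^p.
```

Note that the dynamic programme realises the supremum, not a mesh limit, in line with the remark following the definition.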
The concept of p-variation can also be straightforwardly extended to paths x_t that
take values in an arbitrary Banach space V by replacing the Euclidean norm with a
norm ‖ ∙ ‖_V on V in the above definition. Up to reparameterisation, a path having
finite p-variation is equivalent to its being Holder continuous with exponent 1/p.
Of course, paths with finite 1-variation are just paths of bounded variation. It is worth
emphasising that the p-variation of a path is defined by taking the supremum over
all the subdivisions of the time interval, not as a limit as the mesh of the subdivision
tends to zero, as there are paths of finite (non-zero) p-variation with p > 1 for which
the latter is zero. One should also note that if a path has finite p-variation, then it
also has finite q-variation for all q > p.
As a major advance on the classical theory of integration, L. C. Young discovered
in 1936 (see [13]) that Stieltjes integrals can also be defined for paths which are
of unbounded variation but have finite p-variation for some p > 1, as long as the
integrand is a continuous function of finite q-variation such that 1/p + 1/q > 1. This
result allows (1.5) to be solved for paths of finite p-variation with 1 ≤ p < 2 provided
that the vector field f is Lipschitz-γ continuous with p − 1 < γ ≤ 1, and, subject to
these conditions, the Young integral, as a function of t, also has finite p-variation.
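The Young integral is the limit of ordinary left-point Riemann-Stieltjes sums as the mesh shrinks. A minimal sketch of these approximating sums on sampled paths (illustrative only; Young's theorem concerns when they converge, which the code does not prove):

```python
import numpy as np

def left_point_sum(y, x):
    """Left-point Riemann-Stieltjes sum sum_i y_{t_i} (x_{t_{i+1}} - x_{t_i}).
    Under Young's condition 1/p + 1/q > 1 these sums converge to the
    Young integral of y against x as the mesh shrinks."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(y[:-1] * np.diff(x)))

# Usage: for smooth paths the sums recover the classical integral,
# e.g. the integral of t^2 dt over [0, 1] is 1/3.
t = np.linspace(0.0, 1.0, 100001)
approx = left_point_sum(t ** 2, t)
```
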
However, even with Young’s extension of the classical theory, integration against
sample paths of stochastic processes remained tantalisingly out of reach for a long
time, as many important classes of stochastic processes have finite p-variation with
p ∈ [2, 3). In particular, almost all Brownian paths have infinite 2-variation and
finite p-variation for all p > 2 on any finite time interval – which is not to be confused
with the fundamental fact that the quadratic variation process of a Brownian motion
(B_t)_{t≥0} is deterministic, finite and equal to t – and sample paths of semi-martingales
almost surely have finite p-variation for p ∈ (2, 3).
It wasn’t until 1944 that integrals of some tractable stochastic processes against
Brownian motion were successfully defined, when K. Ito published his construction of
what is now called, in his honour, the Ito integral, which has subsequently been ex-
tended to other martingales and, further, to semi-martingales as integrators. Essentially,
the Ito integral is a Riemann-Stieltjes type of stochastic integral in that it is defined
as the limit of a sequence of Riemann sums of random variables that converges in
probability.
Unfortunately, such limits do not usually exist in a pathwise sense – which per-
haps isn’t all that surprising: while Brownian motion itself has exceedingly nice
properties (it is a Gaussian process with independent and stationary increments),
its sample paths are very rough, being (almost surely) nowhere differentiable
and having unbounded variation on any time interval (no matter how small). So,
in view of this, while Ito’s theory of stochastic integration ranks among the princi-
pal achievements of 20th century mathematics, developing a theory of integration for
Brownian motion paths would appear, on the face of it, an even more challenging
task, although some early results in this direction – notably the construction of the
Levy area of a 2-dimensional Brownian path, defined as the difference of two second
order iterated integrals – had been established even before the invention of stochastic
integrals.
Let us now examine in some detail, albeit somewhat heuristically, what goes wrong
when one tries to define iterated integrals of Brownian paths as classical Riemann
integrals, as this will give us important clues as to how one should formally define
rough paths. But first, as a precursor, let us briefly return to the construction of
iterated integrals for more regular paths.
For a continuous path x_t = (x^1_t, . . . , x^d_t) ∈ R^d on a time interval [0, T] all of whose
components are differentiable functions of t, we can define its nth iterated integrals
x^{i_1,...,i_n}_{s,t} over a subinterval [s, t] for any n ≥ 1 as limits of the sequences of Riemann
sums

S^n_{s,t}(N) = ∑_{i_n=1}^{N} ∑_{i_{n−1}=1}^{i_n} ∙ ∙ ∙ ∑_{i_1=1}^{i_2} (x^{i_1}_{t_{i_1}} − x^{i_1}_{t_{i_1−1}}) . . . (x^{i_{n−1}}_{t_{i_{n−1}}} − x^{i_{n−1}}_{t_{i_{n−1}−1}}) (x^{i_n}_{t_{i_n}} − x^{i_n}_{t_{i_n−1}})   (1.12)

where t_{i_k} − t_{i_k−1} = (t − s)/N for k = 1, . . . , n, so that

x^{i_1,...,i_n}_{s,t} = lim_{N→∞} S^n_{s,t}(N).   (1.13)
Assuming that (t − s) ≪ 1, we have, by Taylor’s theorem, that

x^{i_k}_{t_{i_k}} − x^{i_k}_{t_{i_k−1}} = ẋ^{i_k}(t_{i_k−1}) (t − s)/N + o((t − s)/N),

which, when substituted into (1.12), implies, by (1.13), that x^{i_1,...,i_n}_{s,t} ∼ (t − s)^n.
If x_t has bounded variation on [0, T], then its iterated integrals can be similarly
defined as Stieltjes integrals, and we also have x^{i_1,...,i_n}_{s,t} ∼ (t − s)^n.
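A quick numerical sketch of (1.12)-(1.13) for a smooth scalar path (the function name is an assumption of this transcript): the nested sum over the discrete simplex approximates the second iterated integral, and over [0, h] the value scales like h^2/2, exhibiting the (t − s)^n behaviour just claimed.

```python
import numpy as np

def second_iterated_integral(x):
    """Second-order iterated integral of a sampled scalar path via the
    nested Riemann sums of (1.12): the sum over j <= i of dx_j dx_i,
    with the inner sum realised as a cumulative sum."""
    dx = np.diff(np.asarray(x, dtype=float))
    return float(np.sum(np.cumsum(dx) * dx))

# Usage: for x_t = t the exact value over [0, 1] is 1/2, and over
# [0, 0.1] it is 0.1^2 / 2 = 0.005.
val_1 = second_iterated_integral(np.linspace(0.0, 1.0, 10001))
val_h = second_iterated_integral(np.linspace(0.0, 0.1, 10001))
```
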
Even when x_t is of unbounded variation but has finite p-variation for some p ∈
(1, 2), its iterated integrals can still be defined in this way as Young integrals, but now
x^{i_1,...,i_n}_{s,t} ∼ (t − s)^{n/p}. Thus, in common with paths of bounded variation, this means
that also in this case x^{i_1,...,i_n}_{s,t} = o(t − s) for any n ≥ 2, so that second and higher order
iterated integrals all become negligible as t → s.
Finally, let x_t be a sample path of a Brownian motion (B_t)_{0≤t≤T}, and, for the sake
of simplicity, let us assume that d = 1, i.e. the Brownian motion B_t is 1-dimensional.
It is instructive to consider iterated integrals of x_t from the viewpoint of expected
values of corresponding stochastic integrals of B_t using the defining properties of
Brownian motion – even though this will not lead us to precisely the right answers.
For example, the sum of the expected absolute increments of B_t over [0, T] in the
limit as the size of time increments tends to zero is given by

lim_{N→∞} ∑_{i=1}^{N} E[|B_{i(T/N)} − B_{(i−1)(T/N)}|] = lim_{N→∞} ∑_{i=1}^{N} √(2/π) √(T/N) = lim_{N→∞} √(2TN/π) = ∞

since, for all 0 ≤ s < t ≤ T, B_t − B_s is normally distributed with mean 0 and variance
t − s, and hence E[|B_t − B_s|] = √(2/π) √(t − s). These results suggest that a Brownian
path x_t has infinite variation on any finite time interval (as T can be made arbitrarily
small) – which is correct (almost surely) – and that x^1_{s,t} = x_t − x_s ∼ (t − s)^{1/2} – which
is almost correct.
Similarly, one can get a measure of the 2-variation of x_t on [0, T] by computing

∑_{i=1}^{N} E[(B_{i(T/N)} − B_{(i−1)(T/N)})^2] = ∑_{i=1}^{N} T/N = T,   (1.14)

which indicates that Brownian paths have finite 2-variation – which, as we know by
now, is nearly but not quite true.
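These two heuristic computations are easy to reproduce numerically. A hedged Monte Carlo sketch (the seed and step count are arbitrary choices of this transcript, not of the thesis):

```python
import numpy as np

# Simulate the increments of a Brownian path on [0, T] at mesh T/N.
rng = np.random.default_rng(0)
T, N = 1.0, 1_000_000
dB = rng.normal(0.0, np.sqrt(T / N), size=N)

abs_sum = float(np.sum(np.abs(dB)))  # ~ sqrt(2TN/pi): diverges as N grows
sq_sum = float(np.sum(dB ** 2))      # ~ T: the quadratic variation (1.14)
```

With these parameters the absolute-increment sum is already close to √(2·10^6/π) ≈ 798, while the squared sum stays close to T = 1, in agreement with the two displays above.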
Further, corresponding to the second order iterated integral x^2_{s,t} of x_t over [s, t]
defined by

x^2_{s,t} := ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} dx_v dx_u
we have the following discrete stochastic approximation

∑_{i=1}^{N} ∑_{j=1}^{i} (B_{s+j(t−s)/N} − B_{s+(j−1)(t−s)/N}) (B_{s+i(t−s)/N} − B_{s+(i−1)(t−s)/N}).   (1.15)

Since disjoint increments of Brownian motion are independent, taking the expectation of
(1.15) simply yields

∑_{i=1}^{N} E[(B_{s+i(t−s)/N} − B_{s+(i−1)(t−s)/N})^2]   (1.16)
which, by (1.14), is just equal to t − s. Thus, we might conjecture that x^2_{s,t} ∼
(t − s) – which again is slightly wrong. In order not to mislead the reader any
further, let us state the correct expressions for the orders of magnitude of the first
and second iterated integrals of Brownian paths x_t: for any p > 2, x^1_{s,t} ∼ (t − s)^{1/p}
and x^2_{s,t} ∼ (t − s)^{2/p}. Hence, for Brownian paths, second iterated integrals do not
become negligible as t → s, but rather both their first and second iterated integrals
are greater than first order in (t − s).
Now it is also apparent from (1.15) and (1.16) why the second iterated integral of
a Brownian path x_t cannot be defined as the limit of Riemann sums, for this would
include the following expression

lim_{N→∞} ∑_{i=1}^{N} (x_{s+i(t−s)/N} − x_{s+(i−1)(t−s)/N})^2

which is infinite since Brownian paths have unbounded 2-variation almost surely, while
the total contribution from the cross terms involving disjoint subintervals of [s, t] can
be expected to be finite and small.
In general, if x_t ∈ R^d is a path of finite p-variation on [0, T] for some p ≥ 1, then,
motivated by the above discussion, we would like its iterated integrals – if they can
be defined – to satisfy the analytic condition x^{i_1,...,i_n}_{s,t} ∼ (t − s)^{n/p} for all 0 ≤ s < t ≤ T
and n ≥ 1. In particular, this would mean that x^{i_1,...,i_n}_{s,t} = o(t − s) for any n > ⌊p⌋,
and thus, going back to our example of solving a controlled differential equation using
iteration and assuming that the control process has finite p-variation with p ∈ [3, 4),
its solution would be given to first order in (t − s) by the terms shown in (1.9), while
ignoring any of these terms would mean that the Ito map taking the control and an
initial condition to the solution would in general fail to be continuous.
One can now also fully appreciate the significance of p = 2 as a key threshold: for
paths of finite p-variation with 1 ≤ p < 2, iterated integrals can be canonically
defined as either Riemann-Stieltjes (p = 1) or Young (1 < p < 2) integrals even
though they are not required beyond first order linear increments for solving controlled
differential equations, whereas when p ≥ 2, second and possibly also higher order
iterated integrals would be needed in the solution – if only they could be defined! As we
will see, the theory of rough paths provides a general resolution to this fundamental
dichotomy. But for now, let us just note, as already mentioned, that Brownian
motion is one of those special stochastic processes with irregular sample paths for
which second order iterated integrals can be defined pathwise – the Levy area of a
2-dimensional Brownian path involving one specific construction – and with them all
differential equations controlled by Brownian paths can be solved (subject to the vector
fields being regular enough).
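For a sampled 2-dimensional path, the Levy area reduces to an antisymmetrised, shoelace-type sum. The following sketch is for sampled paths only (not Levy's stochastic construction for Brownian motion), with the unit circle as an illustrative test case of this transcript:

```python
import numpy as np

def levy_area(path):
    """Discrete Levy area 0.5 * (x^{12} - x^{21}) of a sampled
    2-dimensional path, based at its starting point:
    0.5 * sum_i [(x_i - x_0) dy_i - (y_i - y_0) dx_i]."""
    p = np.asarray(path, dtype=float)
    p = p - p[0]                       # base the path at the origin
    x, y = p[:, 0], p[:, 1]
    return float(0.5 * np.sum(x[:-1] * np.diff(y) - y[:-1] * np.diff(x)))

# Usage: for a closed loop the Levy area equals the enclosed signed
# area, e.g. close to pi for the unit circle traversed once anticlockwise.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
area = levy_area(circle)
```
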
The problem of integrating one-forms f along a path x_t in R^d essentially boils
down to being able to give meaning to the differential dx_t. The challenges that one
faces when trying to define differentials of less regular paths are well illustrated by
considering Brownian paths. While a differentiable function x(t) of time t becomes
smooth and linear on sufficiently small time scales, so that its differential can be
expressed as dx(t) = ẋ(t) dt, there is no time scale on which a typical Brownian
path w_t could be guaranteed to behave in a regular fashion: for, as δt → 0, δw_t :=
w_{t+δt} − w_t can go through a whole range of positive and negative values – indeed
taking arbitrarily large values with non-zero probability for any δt > 0 – so that
δw_t/δt does not approach any limiting value; all we can say is that |δw_t| is expected
to be of the order δt^{1/2}. Therefore, simply knowing the linear increments of a Brownian
path is not sufficient to define its differential.
For a differential equation of the form dy_t = ∑_{i=1}^{d} f_i(y_t) dx^i_t to make sense, we
should be able to write down a full expression for the change δy_{t,t+δt} in y_t over a
small time interval [t, t + δt] that is first order in δt, which, assuming that x_t has
finite p-variation for some p ≥ 1 on [t, t + δt], involves, as we have seen above, its nth
order iterated integrals for n = 1, . . . , ⌊p⌋, supposing that these satisfy the analytic
condition x^{i_1,...,i_n}_{t,t+δt} ∼ (δt)^{n/p} for all n ≥ 1. Then, letting δt → 0, any higher order terms
become negligible and vanish for an infinitesimally small change dt, so, by (1.11), we
have that

dy_t = ∑_{n=1}^{⌊p⌋} ∑_{i_1,...,i_n ∈ {1,...,d}} F_{i_1,...,i_n}(y_t) dx^{i_1,...,i_n}_t   (1.17)

which can then be integrated with respect to time t in the usual way. In this sense, we
can say that the sequence of iterated integrals dx^{i_1,...,i_n}_t := x^{i_1,...,i_n}_{t,t+dt} over the infinitesimal
time interval [t, t + dt] for n = 1, . . . , ⌊p⌋ describes the full differential dx_t of x_t.
We have seen above that the sequence of iterated integrals of a path x_t ∈ R^d
emerges in a natural way when one (formally) solves a differential equation controlled
by x_t through iteration. Moreover, it has been known since the works of K. T. Chen
in the 1970s (see [1]) that if x_t is a path of bounded variation and one forms all the
iterated integrals of x_t into a single mathematical object, called the signature of the
path, viewing it as an element of the infinite sequence of successive tensor product
powers of R^d, then this object can be shown to possess some remarkable algebraic
(multiplicative) properties.
The central idea of the theory of rough paths is to define for any path of finite
p-variation with p ≥ 1 an analogous object, called a p-rough path, as an extension of
the path into an extended tensor product space that satisfies the relevant algebraic
conditions. Furthermore, as part of the definition, second and higher order com-
ponents of a p-rough path with p ≥ 2 – which play the role of the canonically defined
iterated integrals of more regular paths with 1 ≤ p < 2, and provide the data that
enables differential equations controlled by the rough path to be solved – are assumed
to satisfy the analytic condition prescribed above, thus extending the concept of finite
p-variation to rough paths. All of these foundational ideas will be made rigorous in
the following section, where rough paths are formally defined.
1.2 Formal definition of rough paths
Let x : [0, T] → V be a continuous path of finite 1-variation, as defined in Definition
1.1, where V = R^d with an ordered set of basis vectors {e_1, . . . , e_d}, so that, for each
integer n ≥ 1, the set {e_{i_1} ⊗ . . . ⊗ e_{i_n} : i_1, . . . , i_n ∈ {1, . . . , d}} furnishes a basis for the
nth tensor power V^{⊗n} of V. For short, we shall denote e_{i_1} ⊗ . . . ⊗ e_{i_n} by e_{i_1,...,i_n}. It is
easy to see that, for each n ≥ 1, V^{⊗n} is isomorphic as a vector space to the space of
homogeneous polynomials of degree n in non-commuting indeterminates X_1, . . . , X_d.
Hence, the extended tensor product algebra T(V) of V defined by

T(V) := R ⊕ V ⊕ V^{⊗2} ⊕ . . .

with componentwise addition and multiplication induced by the tensor product is
isomorphic as an algebra to the space of all formal power series in X_1, . . . , X_d, with the
tensor product of elements of T(V) corresponding to the product of non-commuting
polynomials.
Under the above assumptions, we define for any n ≥ 1 the nth order iterated
integral x^n_{s,t} of the path x_t ∈ V over any time interval [s, t] with 0 ≤ s < t ≤ T as an
element of V^{⊗n} by

x^n_{s,t} = ∑_{i_1,...,i_n ∈ {1,...,d}} x^{i_1,...,i_n}_{s,t} e_{i_1,...,i_n}   (1.18)

where the coefficients x^{i_1,...,i_n}_{s,t} are defined in (1.10). By the multi-linearity of ten-
sor products, iterated path integrals can be equivalently expressed in the following
coordinate-free way:

x^n_{s,t} = ∫_{u_n=s}^{u_n=t} . . . ∫_{u_1=s}^{u_1=u_2} dx_{u_1} ⊗ ∙ ∙ ∙ ⊗ dx_{u_n}   (1.19)

which is very useful as it allows this definition to be generalised to paths that take
values in arbitrary infinite-dimensional Banach spaces.
We now have all the requisite ingredients for defining an object that will serve as
the prototype for rough paths.

Definition 1.2 (Signature). Let x : [0, T] → V be a continuous path of bounded
variation taking values in a Banach space V, and let Δ_T := {(s, t) : 0 ≤ s ≤ t ≤ T}.
Then the signature S(x) : Δ_T → T(V) of x is the continuous functional mapping
(s, t) to S(x)_{s,t} := (x^0_{s,t}, x^1_{s,t}, x^2_{s,t}, . . .) where x^n_{s,t} for n ≥ 1 are the iterated integrals
defined in (1.19) and x^0_{s,t} ≡ 1.
Signatures of bounded variation paths can readily be shown to have the following
fundamental property:
Theorem 1.3 (Multiplicative property). Let S(x) be the signature of a bounded
variation path x : [0, T] → V. Then, for all 0 ≤ s ≤ u ≤ t ≤ T, we have that

    S(x)_{s,u} ⊗ S(x)_{u,t} = S(x)_{s,t} .
This result is usually called Chen's identity (even though K. T. Chen was not the
first person to discover it), and the signature S(x) of a bounded variation path x is
commonly called the Chen lift of x, since it extends – lifts – a path x in V to an
element S(x) in T(V) such that the projection of S(x)_{s,t} onto V is x^1_{s,t} = x_t − x_s.
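Chen's identity is easy to verify numerically at the level of second order iterated integrals. The following sketch is purely illustrative (the sample points and helper names are ours, not part of the formal development): it computes the level-1 and level-2 terms of a piecewise-linear path exactly, one segment at a time, and checks that the signatures over two contiguous pieces multiply to the signature of the whole path.

```python
import numpy as np

def sig2(points):
    """Level-1 and level-2 iterated integrals of the piecewise-linear path
    through the given points; a linear segment with increment D contributes
    D at level 1 and D⊗D/2 at level 2, combined via Chen's identity."""
    d = points.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for k in range(len(points) - 1):
        D = points[k + 1] - points[k]
        s2 += np.outer(s1, D) + 0.5 * np.outer(D, D)  # Chen combination
        s1 += D
    return s1, s2

pts = np.array([[0., 0.], [1., 2.], [3., 1.], [2., 4.], [5., 3.]])
a1, a2 = sig2(pts[:3])   # signature over the first two segments
b1, b2 = sig2(pts[2:])   # signature over the remaining segments
c1, c2 = sig2(pts)       # signature over the whole path

# Chen's identity: (a ⊗ b)^1 = a^1 + b^1 and (a ⊗ b)^2 = a^2 + b^2 + a^1 ⊗ b^1
assert np.allclose(a1 + b1, c1)
assert np.allclose(a2 + b2 + np.outer(a1, b1), c2)
```

The update rule s2 + s1 ⊗ D + ½ D ⊗ D used inside the loop is itself an instance of Chen's identity applied to a single linear segment, whose level-2 term is ½ D ⊗ D.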
Let x : [t, u] → V and y : [v, w] → V be two arbitrary bounded variation paths
taking values in a Banach space V. Then the concatenation of x and y is defined to
be the path x ∗ y : [t, u + w − v] → V satisfying

    x ∗ y(s) = x(s)                          for t ≤ s ≤ u,
    x ∗ y(s) = x(u) + y(s − u + v) − y(v)    for u ≤ s ≤ u + w − v.
The set of V-valued bounded variation paths, denoted by BV(V), is clearly closed
under concatenation, and, moreover, as this operation is associative, (BV(V), ∗) is
a semigroup (or even a monoid, since each trivial path x : [t, t] → V is an identity
element for the operation of concatenation).
Thus, we have

    S(x)_{t,u} ⊗ S(y)_{v,w} = S(x ∗ y)_{t,u} ⊗ S(x ∗ y)_{u,u+w−v}     (1.20)

as signatures are invariant under time translations of paths, and, further, by Chen's
identity

    S(x ∗ y)_{t,u} ⊗ S(x ∗ y)_{u,u+w−v} = S(x ∗ y)_{t,u+w−v}     (1.21)
which, combined with (1.20), shows that the range of the signature map
S : BV(V) → T(V) is closed under multiplication in T(V) induced by the tensor
product ⊗. Moreover, every element v = (v_0, v_1, v_2, . . .) of T(V) with v_0 ∈ R\{0}
possesses an inverse element, namely

    v^{-1} = (1/v_0) ∑_{n=0}^{∞} (1 − v/v_0)^{⊗n}

where 1 is the multiplicative unit element (1, 0, 0, . . .), as one can directly verify. In
particular, the subset

    T̃(V) := { (1, v_1, v_2, . . .) : v_n ∈ V^{⊗n}, n ≥ 1 } ⊂ T(V)

is a group which contains the range of the signature map as a subgroup, since the
inverse of the signature of a bounded variation path is the signature of the path ‘run
backwards’, i.e. for any x : [s, u] → V belonging to BV(V)

    (S(x)_{s,u})^{-1} = S(←x)_{s,u}     (1.22)

where ←x(t) := x(s + u − t) for s ≤ t ≤ u (see [10, Proposition 2.14]). Hence, we have
established that the signature map is a homomorphism from the monoid (BV(V), ∗)
into the group (T̃(V), ⊗).
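The inverse formula above can be checked concretely in a truncated setting. In the following illustrative sketch (dimension, numerical values and helper names are chosen arbitrarily) we work in the truncation T^{(2)}(V) with V = R²; since 1 − v/v_0 has vanishing scalar component, the series terminates at n = 2.

```python
import numpy as np

d = 2  # dimension of V, chosen arbitrarily for this sketch

def tprod(a, b):
    """Truncated tensor product in T^(2)(V) = R ⊕ V ⊕ V⊗V."""
    return (a[0] * b[0],
            a[0] * b[1] + b[0] * a[1],
            a[0] * b[2] + b[0] * a[2] + np.outer(a[1], b[1]))

def tinv(v):
    """v^{-1} = (1/v0) · sum_n (1 - v/v0)^{⊗n}; the series terminates at
    n = 2 here because u = 1 - v/v0 has zero scalar component."""
    u = (0.0, -v[1] / v[0], -v[2] / v[0])
    u2 = tprod(u, u)
    return tuple((1.0 * (k == 0) + u[k] + u2[k]) / v[0] for k in range(3))

v = (2.0, np.array([1.0, -3.0]), np.arange(4.0).reshape(2, 2))
p = tprod(v, tinv(v))
print(p[0])   # 1.0, with p[1] and p[2] vanishing: v ⊗ v^{-1} = 1
```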
In fact, the projections of the range of the signature map S : BV(V) → T(V) onto
the truncated tensor product algebras T^{(n)}(V) := R ⊕ V ⊕ V^{⊗2} ⊕ · · · ⊕ V^{⊗n} are
Lie groups, as defined below, for all n ≥ 1.
Definition 1.4. For a Banach space V, let us define

    [V,ⁿV] := { [v_n, [ . . . [v_2, v_1] . . . ]] : v_i ∈ V, 1 ≤ i ≤ n }

for n ≥ 2, with [V,¹V] := V and [V,⁰V] := {0}, where the Lie bracket is defined as
[v, w] := v ⊗ w − w ⊗ v for all v, w ∈ V. Then the space of Lie polynomials of degree
n over V, denoted by L^{(n)}(V) and defined as

    L^{(n)}(V) := ⊕_{i=0}^{n} [V,ⁱV]

is a linear subspace of T^{(n)}(V) = ⊕_{i=0}^{n} V^{⊗i}. Further, let us define the exponential
map exp_n : T^{(n)}(V) → T^{(n)}(V) by

    exp_n(x) := ∑_{i=0}^{n} x^{⊗i}/i!

for any x ∈ T^{(n)}(V). Then G^{(n)}(V) := exp_n(L^{(n)}(V)) is a group with multiplication
in T^{(n)}(V) induced by the tensor product, and is called the free nilpotent Lie group
of step n over V, or the set of group-like elements in T^{(n)}(V).
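In the step-2 case the exponential map can be written out explicitly, which makes the group-like structure easy to inspect. The sketch below is a hypothetical numerical illustration (the values of l1 and the area coefficient are arbitrary): it forms exp_2 of a Lie polynomial l1 + l2, with l1 ∈ V and l2 a multiple of the bracket [e_1, e_2], and checks that at level 2 the symmetric part of the result is determined by level 1, while the antisymmetric part – the area – is a free Lie coordinate.

```python
import numpy as np

def exp2(l1, l2):
    """exp_2 of the Lie polynomial l1 + l2 (l1 in V, l2 in [V, V]) inside
    T^(2)(V): 1 + (l1 + l2) + (l1 + l2)^{⊗2}/2, where only l1⊗l1 survives
    at level 2."""
    return 1.0, l1, l2 + 0.5 * np.outer(l1, l1)

l1 = np.array([1.0, 2.0])                         # arbitrary element of V = R^2
e1, e2 = np.eye(2)
l2 = 0.7 * (np.outer(e1, e2) - np.outer(e2, e1))  # 0.7 · [e1, e2]

g0, g1, g2 = exp2(l1, l2)
assert np.allclose(g2 + g2.T, np.outer(g1, g1))   # symmetric part fixed by level 1
assert np.allclose(0.5 * (g2 - g2.T), l2)         # antisymmetric part: the 'area'
```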
Proposition 1.5 ([10, Proposition 2.27]). For any n ≥ 1, G^{(n)}(V) coincides with the
projection of the range of the signature map S : BV(V) → T(V) onto the truncated
tensor product algebra T^{(n)}(V). Thus, every element of G^{(n)}(V) can be expressed as
the truncated signature of a bounded variation path in V.
It is also natural to enquire about the kernel of the signature map in BV(V). By
(1.22), for any path x ∈ BV(V), S(x) ⊗ S(←x) = S(x ∗ ←x) = 1, i.e. any path
concatenated with its reverse path has trivial signature, and, furthermore, any path
that can be reduced to a constant path by successively removing pairs of path segments
of the form (x, ←x) also has trivial signature. Such paths are called tree-like, and one
should point out that path segments x and ←x in such paths do not necessarily need
to be adjacent (and may even be infinitesimal): e.g. a path of the form
x ∗ y ∗ z ∗ ←z ∗ ←y ∗ ←x is tree-like and has trivial signature. As a profound converse
statement, B. Hambly and T. Lyons have proved (see [6, Theorem 1]) for bounded
variation paths in finite-dimensional Euclidean spaces R^d that a path being tree-like
is also a necessary condition for it to have trivial signature. Thus, we can define an
equivalence relation on BV(V) by x ∼ y if and only if x ∗ ←y is tree-like. Then we
have that S(x) = S(y) if and only if S(x) ⊗ S(y)^{-1} = S(x) ⊗ S(←y) = S(x ∗ ←y) = 1,
i.e. if and only if x and y are tree-like equivalent.
In addition to the above geometric interpretation of signatures as elements of T(V)
that are in one-to-one correspondence with classes of tree-like equivalent bounded
variation paths in V, the signature of each bounded variation path x : [0, T] → V
can be characterised as the solution of the following ‘universal’ rough differential
equation, i.e. a differential equation on the extended tensor product algebra T(V):

    dS_t = S_t ⊗ dx_t     (1.23)

with the initial value S_0 = 1, and where dx_t represents the element (0, x_{t+dt} − x_t, 0, . . .)
of T(V). Indeed, it is nice to observe how the signature of a path builds up through
repeated application of tensor multiplication by infinitesimal path increments dx_t in
(1.23), so that S_t = S(x)_{0,t} is the unique solution to (1.23). Thus, informally we can
think of the signature of a bounded variation path as a universal non-commutative
exponential of the path. Furthermore, this provides a succinct proof of Theorem 1.3
above – viz. the multiplicative property of signatures – since for all 0 ≤ s ≤ t ≤ T
both S(x)_{0,s} ⊗ S(x)_{s,t} and S(x)_{0,t} satisfy the same differential equation (1.23) with
the same initial condition, and hence must be equal.
We take this key characteristic of signatures of bounded variation paths as the
defining property of a more general abstract object.
Definition 1.6 (Multiplicative functional). With the above notation, a multiplicative
functional is a continuous functional x : Δ_T → T(V) with x_{s,t} = (1, x^1_{s,t}, x^2_{s,t}, . . .),
where x^n_{s,t} ∈ V^{⊗n} for n ≥ 1, satisfying the multiplicative property

    x_{s,u} ⊗ x_{u,t} = x_{s,t}     (1.24)

for all 0 ≤ s ≤ u ≤ t ≤ T.
Next we extend the concept of p-variation to multiplicative functionals.
Definition 1.7 (p-variation). Let x : Δ_T → T(V) be a multiplicative functional.
Then, for any real number p ≥ 1, the p-variation of x on [0, T] is defined by

    ‖x‖_{p,[0,T]} = sup_{n≥1} ( sup_{D∈D([0,T])} ∑_{h=1}^{k} ‖x^n_{t_{h-1},t_h}‖^{p/n}_{V^{⊗n}} )^{n/p}

where ‖·‖_{V^{⊗n}}, n ≥ 1, denote a set of compatible norms on the tensor powers V^{⊗n},
and the inner supremum is taken over all finite partitions D = {0 = t_0 < t_1 < · · · <
t_k = T} in D([0, T]).
As with paths of finite p-variation in V, we note that any multiplicative functional
of finite p-variation in T(V) also has finite q-variation for all q > p. Moreover, a
multiplicative functional has finite p-variation, as defined above, if and only if it
satisfies the condition in the following
Proposition 1.8 ([11, Proposition 3.3.2]). Let x : Δ_T → T(V) be a multiplicative
functional. Then x has finite p-variation on [0, T] for some p ≥ 1, in the sense of
Definition 1.7 above, if and only if there exists a super-additive continuous function
ω : Δ_T → R_+, called a control function, such that

    ‖x^n_{s,t}‖_{V^{⊗n}} ≤ ω(s, t)^{n/p}

for all (s, t) ∈ Δ_T and n ≥ 1.
In fact, the above condition is the same, in the general context of Banach spaces,
as the analytic condition that we formulated earlier for iterated integrals of finite
p-variation paths in Euclidean spaces, with the control function ω(s, t) = C(t − s),
where C is a constant that may depend on p. However, the reader should be
cautioned about this terminology, as we have also used the word ‘control’ for paths
that drive controlled differential equations, rather than referring to functions that
control the regularity of iterated path integrals.
Finally, we can state the formal definition of a p-rough path.
Definition 1.9 (p-rough path). A rough path of regularity p, or a p-rough path
for short, is a multiplicative functional of finite p-variation.
The first fundamental result on rough paths in the development of the theory by
T. Lyons is the following theorem ([9, Theorem 2.2.1]); it shows that only the first
⌊p⌋ components of a p-rough path really matter, since a p-rough path is uniquely
determined by its truncature at level ⌊p⌋. For this reason, we may regard p-rough
paths as multiplicative functionals of degree ⌊p⌋ with finite p-variation, taking values
in the truncated tensor product algebra T^{(⌊p⌋)}(V).
Theorem 1.10 (Truncature of p-rough path). If x and y are p-rough paths in
T(V) such that x^n_{s,t} = y^n_{s,t} for all (s, t) ∈ Δ_T and n = 1, . . . , ⌊p⌋, then x = y.
Conversely, any multiplicative functional of degree ⌊p⌋ with finite p-variation can be
uniquely extended to a multiplicative functional with finite p-variation of arbitrarily
high degree r > ⌊p⌋.
By contrast, it is important to realise that for a p-rough path x : Δ_T → T^{(⌊p⌋)}(V)
and any k satisfying 1 < k ≤ ⌊p⌋, the terms x^n_{s,t} for n = k, . . . , ⌊p⌋ are never uniquely
determined by the lower order terms x^m_{s,t} for m = 1, . . . , k − 1. For example, a sample
path of a Brownian motion in R^d can be extended to a p-rough path for any 2 < p < 3
by defining its second order components as either Ito or Stratonovich integrals, which
are distinct in general.
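This non-uniqueness is visible in a small simulation; the following sketch is purely illustrative (seed, horizon and step count are arbitrary). From one and the same Brownian sample path it builds second order components with left-point Riemann sums (the Ito choice) and with midpoint sums (the Stratonovich choice); the two lifts differ by half the realised quadratic covariation.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed for reproducibility
T, N, d = 1.0, 200_000, 2
dW = rng.normal(0.0, np.sqrt(T / N), size=(N, d))
W = np.vstack([np.zeros(d), np.cumsum(dW, axis=0)])   # one Brownian sample path

ito = np.einsum('ki,kj->ij', W[:-1], dW)                    # left-point sums
strat = np.einsum('ki,kj->ij', 0.5 * (W[:-1] + W[1:]), dW)  # midpoint sums

# The difference is half the realised quadratic covariation, which tends to
# (T/2)·identity as N grows, so the two second order components are distinct.
print(strat - ito)
```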
Thus, abstracting the concept and characteristic properties of the signature of a
bounded variation path, a p-rough path is an extension of a path of finite p-variation
taking values in a Banach space V to its extended tensor product algebra T(V) or its
truncature T^{(⌊p⌋)}(V) at level ⌊p⌋ such that the second and higher order components
– to be interpreted as, or actually defining, its iterated path integrals – satisfy an
analogous regularity condition. In other words, a p-rough path incorporates a full
sequence of iterated integrals, and hence encodes all the necessary data to provide
an unambiguous solution to any rough differential equation controlled by the p-rough
path. However, only components up to order ⌊p⌋ are needed for the solution of an
RDE. In particular, as already noted, in the classical case where 1 ≤ p < 2 only first
order linear increments of the controlling path are required for integration.
The so-called Universal Limit Theorem ([10, Theorem 5.3]), the main result of
the theory of rough paths, asserts that any rough differential equation dy_t = f(y_t) dx_t
controlled by a p-rough path x, subject to the vector field f being Lipschitz-γ
continuous with 1 ≤ p < γ, has a unique solution y that is also a p-rough path, and
the Ito map I_f : (x, y_0) ↦ y is uniformly continuous. This is a deep and satisfying
result.
As to the meaning of the defining multiplicative property of p-rough paths, this
simply corresponds to the additive property of iterated integrals over contiguous time
domains: for example, equating second order components on the left and right hand
sides of (1.24) for 0 ≤ s ≤ t ≤ u ≤ T yields
    x^2_{s,t} + x^2_{t,u} + x^1_{s,t} ⊗ x^1_{t,u} = x^2_{s,u}

which expresses the natural requirement that the second order iterated integral over
[s, u] should equal the sum of the second order integrals over [s, t] and [t, u] plus the
product of the first order integrals (i.e. linear increments) over [s, t] and [t, u].
One should carefully note the presence of the tensor product term on the left hand
side of the above equation, for second and higher order components of multiplicative
functionals are not additive!
For any p ≥ 1, let Ω_p(V) denote the set of all p-rough paths x : Δ_T → T^{(⌊p⌋)}(V).
In particular, by our earlier remark, we note that Ω1(V ) is contained in Ωp(V ) for
all p > 1. Further, we can make Ωp(V ) into a metric space by equipping it with the
following metric:
Definition 1.11 (p-variation distance). If x and y are two elements of Ω_p(V),
then their p-variation distance is defined by

    d_p(x, y) = max_{1≤n≤⌊p⌋} ( sup_{D∈D([0,T])} ∑_{h=1}^{k} ‖x^n_{t_{h-1},t_h} − y^n_{t_{h-1},t_h}‖^{p/n}_{V^{⊗n}} )^{n/p} .
In fact, it is straightforward to show that (Ω_p(V), d_p) is a complete metric space
for all p ≥ 1 (see [11, Lemma 3.3.3]). However, for p ≥ 2, Ω_p(V) is not a linear space
due to the non-linearity of the multiplicative property: in general the sum or difference
of two multiplicative functionals fails to be multiplicative.
With this metric, we can identify an important subclass of p-rough paths in Ωp(V ),
namely those elements that can be approximated arbitrarily closely by 1-rough paths
as measured by the p-variation distance.
Definition 1.12 (Geometric p-rough paths). The closure in Ω_p(V) of the space
Ω_1(V) of 1-rough paths under the topology induced by the p-variation distance d_p is
called the space of geometric p-rough paths and denoted by GΩ_p(V).
Thus, an element x of Ω_p(V) is a geometric p-rough path if and only if there
exists a sequence (x_n)_{n≥1} of 1-rough paths such that lim_{n→∞} d_p(x_n, x) = 0. Based
on the above topological description, one may wonder what is ‘geometric’ about
geometric p-rough paths. The reason for this nomenclature is that geometric p-rough
paths take their values in the free nilpotent Lie group of step ⌊p⌋ – the very interesting
algebro-geometric object we defined above!
However, when p ≥ 2 there are also p-rough paths that take values in G^{(⌊p⌋)}(V)
but cannot be expressed as the limit of a sequence of 1-rough paths in the p-variation
distance. The space of all p-rough paths taking values in G^{(⌊p⌋)}(V) is called the
space of weakly geometric p-rough paths on V and denoted by WGΩ_p(V). Hence,
GΩ_p(V) ⊆ WGΩ_p(V), with the inclusion being strict for p ≥ 2. One should note that
even though by Proposition 1.5 each weakly geometric p-rough path x ∈ WGΩ_p([0, T], V)
can be expressed, for any (s, t) ∈ Δ_T, as the truncated signature of a bounded variation
path, there is in general no single y ∈ BV([0, T], V) satisfying x_{s,t} = S(y)_{s,t} for all
(s, t) ∈ Δ_T – unless, of course, x is actually a 1-rough path.
Generally, though, the difference between geometric and weakly geometric rough paths
is insignificant, and for our purposes we shall ignore it and simply speak of geometric
rough paths without distinction.
The key implication of this is that for any geometric p-rough path x, we have

    x_{s,t} = x_{0,s}^{-1} ⊗ x_{0,t}

for all 0 ≤ s ≤ t ≤ T, where x_{0,s}^{-1} is the group inverse of x_{0,s} in G^{(⌊p⌋)}(V), hence
providing for x_{s,t} the natural interpretation of an increment of the p-rough path x
over the time interval [s, t].
In addition to the two characterisations – a topological and an algebro-geometric
one – of geometric rough paths given above, there is also a third way of describing
them that is analytical in nature and somewhat more concrete than the previous ones.
Let V = R^d. Then it can be shown that among all the p-rough paths in Ω_p(V)
only the geometric ones x ∈ GΩ_p(V) satisfy the following identity

    x^{i_1,...,i_m}_{s,t} · x^{j_1,...,j_n}_{s,t} = ∑_{(k_1,...,k_{m+n}) ∈ {i_1,...,i_m} ⧢ {j_1,...,j_n}} x^{k_1,...,k_{m+n}}_{s,t}     (1.25)

where {i_1, . . . , i_m} ⧢ {j_1, . . . , j_n} is the shuffle product of these two sets of indices, i.e.
the set of all permutations of {i_1, . . . , i_m, j_1, . . . , j_n} that preserve the orderings of the
i_k (k = 1, . . . , m) and the j_l (l = 1, . . . , n) – just like the orderings of cards in each half
of the deck are preserved in a riffle shuffle. It was first observed by R. Ree (see [12])
that if x is the signature of a bounded variation path in V constructed canonically
by means of Riemann integrals, then x satisfies the above shuffle product identity.
The analytical content of this rather combinatorial-looking result may not be
immediately obvious, but if we set m = n = 1 above, then (1.25) becomes

    x^i_{s,t} · x^j_{s,t} = x^{i,j}_{s,t} + x^{j,i}_{s,t}

which, when interpreting the terms as normal first and second order iterated integrals,
is nothing other than the familiar integration by parts formula! Thus, the shuffle
product identity can be seen to be a generalisation of the integration by parts formula
to higher order iterated integrals.
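The case m = n = 1 can be checked numerically on a concrete smooth path, say x_u = (u, u²) on [0, 1] (an arbitrary choice for illustration), for which both sides of the identity equal x^1_{0,1} · x^2_{0,1} = 1.

```python
import numpy as np

N = 10_000
u = np.linspace(0.0, 1.0, N + 1)
x = np.stack([u, u ** 2], axis=1)      # the path x_u = (u, u^2)
dx = np.diff(x, axis=0)

inc = x[-1] - x[0]                     # first order integrals x^1 and x^2
mid = 0.5 * (x[:-1] + x[1:]) - x[0]    # midpoint values, based at x_0
x12 = np.sum(mid[:, 0] * dx[:, 1])     # second order integral x^{1,2}
x21 = np.sum(mid[:, 1] * dx[:, 0])     # second order integral x^{2,1}

# Integration by parts / shuffle identity with m = n = 1:
print(inc[0] * inc[1], x12 + x21)      # both equal 1 up to rounding
```

With midpoint sums the two second order integrals telescope exactly, so the identity holds up to floating-point rounding rather than merely in the limit of fine discretisation.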
Therefore, geometric rough paths can be characterised as those rough paths that
obey the standard rules of calculus, and it is in this fact that their analytical signifi-
cance lies.
To finish off our brief overview of the theory of rough paths, let us say a few words
about non-geometric p-rough paths for p ≥ 2, which hence do not follow the ordinary
rules of calculus without correction terms. In 2010, M. Gubinelli published a new
theory (see [4]) in which he defines branched rough paths as functionals mapping
from a simplex Δ_T into a Hopf algebra H that is generated, as an algebra, by the set
of rooted trees with vertices labelled by the basis elements of the path space V = R^d,
and which contains the tensor product algebra T(V) as the linear subspace spanned
by the set of linear, i.e. non-branched, trees. These functionals are required to satisfy
two algebraic conditions analogous to the multiplicative property and the shuffle
product identity for geometric rough paths, as well as an analytic condition that
corresponds to finite p-variation. Another parallel with the theory of geometric rough
paths is that the set of branched rough paths also forms a Lie group in the Hopf
algebra which is very similar to the free nilpotent Lie group. However, as branched,
i.e. non-geometric, rough paths will not be used in the rest of this thesis, we will not
explore this fascinating theory in greater detail.
In this chapter we have endeavoured to give an intuitive introduction to the theory
of rough paths. In particular, we have wanted to show that even though this is a
very modern theory, developed over the past two decades, it has deep historical roots
going back to classical infinitesimal calculus and further all the way to ancient Greek
mathematics, and indisputably represents one of the most important advances in the
mathematical study of change since the days of Newton and Leibniz.
From its initial, purely analytical problem of solving differential equations controlled
by irregular paths – equivalently, integrating differential forms along such paths –
the key insight of the theory of rough paths has been to take the whole sequence of
iterated path integrals as the fundamental object driving differential equations and
controlling the local behaviour of their solutions; endowed individually with a rich
algebraic structure, these iterated integrals collectively turn out to form a beautiful
geometric object. Underpinning the pleasing aesthetics of the theory of rough paths
lies its fundamental achievement of giving meaning to the differential of a function
with controlled irregularity – and, in the field of mathematical analysis, surely nothing
is more fundamental than that.
Chapter 2
Application of rough paths theory to time series analysis of financial data streams
In this chapter we consider ways in which the theory of rough paths can be applied to
analyse time series, particularly focussing on high frequency financial data streams.
We begin with a brief discussion of the methods traditionally employed in classical
time series analysis, and then give a literature review of recent applications of the
approach to use the signature of a financial time series for the purposes of data
classification and prediction based on supervised learning algorithms. Finally, we
present our own novel application of rough paths theory in the context of financial
data analysis.
2.1 Classical time series analysis
Financial data is usually obtained in discrete form as a time series: an ordered set
of multi-dimensional numerical values X = {(X^1_{t_i}, . . . , X^d_{t_i}) ∈ R^d : i = 0, . . . , N}
observed at finitely many time points t_0 < t_1 < · · · < t_N. In traditional time series
analysis it is a common approach to view discrete data points as samples of an under-
lying continuous time stochastic process which is typically assumed to have a specific
form in order to capture certain characteristics of the time series being modelled – e.g.
autoregression (AR), moving-average (MA) or conditional heteroscedasticity (CH) –
and whose parameters are estimated using regression techniques so as to best fit the
chosen model to the given data stream.
However, parametric approaches to modelling time series have several inherent
limitations. First of all, they usually depend to a large extent on the assumptions
made about the underlying (unknown) data-generating process, and thus are sub-
ject to the potential risk of model misspecification, for it may happen that a chosen
model cannot adequately describe a given time series even with an optimal calibra-
tion. Secondly, in some cases sampling may not be an effective way to approximate
a continuous process – especially when dealing with highly oscillatory processes – as
a sequence of data points may fail to capture the order of events between different
coordinates of a multi-dimensional process, and hence be incapable of detecting la-
tencies and causal effects within the structure of the data stream. Though increasing
sampling frequency normally improves the accuracy of a discrete approximation, this
is not always the case: for example, it is known that sampling a Brownian motion,
as the driving signal of a dynamical system, with arbitrarily high frequency does not
necessarily provide sufficient statistical information for its effects on the evolution of
the system to be predicted.
Moreover, sampling at high rates is beset with its own fundamental problems,
chief of which is the curse of dimensionality. For instance, recording a high fre-
quency financial data stream tick by tick – which may be only milliseconds apart –
is usually an inefficient way of representing such data, for it is bound to contain lots
of redundant information, meaningless market noise, that might obscure the main
structural characteristics of the data stream, and to carry out regression analysis on
increments as features of the data stream would be infeasible because of prohibitively
high dimensionality. Therefore, one would like to find methods to summarise high
frequency data streams in a more concise way – to compress big data sets without
losing key information – and thus to achieve significant dimension reduction enabling
standard regression techniques to be applied. As we will see, using the signature of
a data stream, truncated to a suitable order, as the feature set of the data stream
accomplishes these objectives.
2.2 Signatures as feature sets for linear regression
analysis
As we recall from Section 1.2 above, B. Hambly and T. Lyons showed in [6] that
the signature of a multi-dimensional path of finite length uniquely determines the
path up to tree-like equivalence (i.e. up to modifications that have null effect when
the path is used as a system control), and hence pointed out that mapping a path
to its signature can be viewed as a faithful data transform in the sense that no
information is lost in the process. Following this notion in [8], D. Levin, T. Lyons
and H. Ni were the first to propose using signatures of time series for the purposes
of analysing financial data, and demonstrated the potential this approach has for
machine learning and statistical inference. Specifically, in their paper the authors built
a general non-parametric model for determining the conditional distribution of the
output variable of a system as a linear functional of the components of the expected
truncated signature of a random input stream (for a large number of series of data
samples), and thus, using the signature of a data stream as a feature set, estimated the
functional relationship between an input stream and the corresponding noisy system
response by employing standard techniques of regression analysis. Moreover, they
showed that classical parametric time series models such as AR, ARCH and GARCH
can be considered to be special cases of the expected signature model.
Given that the signature is an intrinsically non-linear object, its use as a feature set
for linear regression analysis might, on first thought, seem somewhat counterintuitive.
However, if we assume that the conditional distribution of a future system output is a
smooth function of the signature of the current input stream, then it is reasonable
also to assume that this function can be well approximated locally by a polynomial.
But, by the shuffle product property of signatures, as stated in (1.25) above, any
polynomial of signature components, i.e. iterated path integrals, can be expressed as
a linear combination of higher order signature components. Thus, it is indeed natural
to assume a linear relationship between the expected signature of input streams and
the system output.
Since the signature of a d-dimensional path truncated to order n has (d^{n+1} − 1)/(d − 1)
components, one can see that the signature approach allows high frequency data
streams to be represented by relatively few features compared to the sampling method.
Furthermore, the number of these signature based features does not depend on the
sampling frequency whereas the dimensionality of increment features increases linearly
with the sampling frequency. This reduction in effective dimensionality through the
use of signature components as a feature set can lead to substantial efficiency gains
when performing regression analysis, and may also help to avoid overfitting issues
that often bedevil the use of high frequency data. In [8], D. Levin et al. showed that
for a 2-dimensional path one can predict the output of a system controlled by the path
more accurately by using the truncated signature of degree 4, and thus of dimension
31, than by using an increment feature set of dimension 250, which demonstrates that
signature features furnish a more efficient summary of a path in terms of its effects
than traditional increment features. The authors also compared the performance of
the expected signature model to that of a non-parametric Gaussian process model,
and found that it achieves similar forecasting accuracy with a computational cost
that is lower by two orders of magnitude!
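As a quick sanity check on this count (a trivial illustrative snippet), the number of components of a truncated signature is just the geometric sum 1 + d + · · · + dⁿ, which reproduces the dimension 31 quoted above for the 2-dimensional, degree-4 signature.

```python
def sig_dim(d: int, n: int) -> int:
    """Number of components of the signature of a d-dimensional path
    truncated to order n: 1 + d + d^2 + ... + d^n."""
    return (d ** (n + 1) - 1) // (d - 1)

print(sig_dim(2, 4))   # 31, versus e.g. the 250 increment features used in [8]
```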
Even though some information is inevitably lost when the full signature of a path
is projected onto a finite-dimensional truncated tensor product space, the low order
components contain most of the information with the truncation error decreasing
factorially as the degree of the truncated signature increases, and, moreover, these
leading components are not particularly sensitive to the sampling frequency used.
Hence, the iterated integrals of a path (of bounded variation) provide very efficient
statistics of the path in the sense that they determine the response of any linear
system driven by the path very accurately. Indeed, the beauty and power of this
whole approach lies in the fundamental fact that the signature of a path efficiently
summarises information on normal time scales in a way that enables the effects of the
path in dynamical interactions to be effectively predicted without needing to know the
behaviour of the path on microscopic scales – which for some less regular paths can
be highly complex.
In [5], G. Gyurko et al. took a similar approach: they embedded financial time
series into continuous processes using linear interpolation between discrete data
points, computed truncated signatures of the paths thus constructed, and, by
performing standard linear regression (combined with the LASSO shrinkage method)
on the signature components as a feature set, classified data streams in a given
learning set according to some selected properties, before proceeding to classify fresh
data streams based on their signatures in out-of-sample testing. For example, in one of
the numerical tests presented in [5], it was explored to what extent the signatures of
streams of WTI crude oil futures market data (including mid-price, bid-ask spread,
order imbalance and cumulative trading volume) sampled by the minute from stan-
dard 30 minute time intervals determine the time buckets they are sampled from,
and, by using standard statistical indicators to measure the accuracy of classification,
it was shown that a very small number of low-dimensional signature components of
data streams suffice to characterise their time buckets with a high degree of accu-
racy (the ratio of correct classification exceeding 90% in most cases). This example,
together with the other experiments presented in [5] – which aim to characterise two
different trade execution algorithms by distinguishing between parent orders generated
by them, and hence to detect their traces in market data – demonstrates again that
signatures of data streams efficiently capture information in a non-parametric way
that avoids traditional statistical modelling of time series data.
This paper is also notable for (i) introducing lead and lag transforms of a multi-
dimensional data stream – special types of time re-parameterisation of the data stream
that preserve its signature – in order to capture the quadratic variation of path com-
ponents, as this quantity – i.e. volatility – is of fundamental importance in financial
applications, and (ii) using first and higher order areas between path components to
analyse data streams. We will discuss these topics in detail in the next two sections,
especially the latter since in our novel application of the signature approach (to be
presented in Section 2.5) third order areas will play a key role as sensitive tools for
detecting mean-reverting behaviour in financial time series. However, let us first state
the fundamental properties of signatures that will be used in practice, as the theoret-
ical foundation of our numerical algorithms, to compute signatures of data streams
as well as the invariance property of signatures under time re-parameterisations upon
which the usability of lead and lag transforms rests.
Let X_t = (X^1_t, . . . , X^d_t) ∈ R^d be a continuous d-dimensional path of bounded
variation defined on a time interval [0, T]. For any multi-index, i.e. an ordered set of
indices I = (i_1, . . . , i_k) with k ≥ 1 and i_j ∈ {1, . . . , d} for j = 1, . . . , k, we define, as in
(1.10), the kth order iterated integral of the path X corresponding to the multi-index
I over the time interval [s, t] for any 0 ≤ s < t ≤ T by

    X^{(i_1,...,i_k)}_{s,t} := ∫_{s<u_1<···<u_k<t} dX^{i_1}_{u_1} · · · dX^{i_k}_{u_k} .
Then the signature S(X)_{s,t} of X over the time interval [s, t] is defined to be
the sequence of iterated integrals (X^I_{s,t})_{I∈I}, where I is the set of all multi-indices,
with the zeroth order component of the signature, corresponding to the empty set of
indices, defined to be 1; and, for any non-negative integer n, the truncated signature
S^n(X)_{s,t} of degree n is the sequence (X^I_{s,t})_{I∈I_n}, where I_n is the set of all multi-
indices that consist of at most n indices. With this notation, we have the following
key properties of signatures:
(i) Uniqueness ([6, Theorem 1]): The signature S(X)_{s,t} of a path (X_u)_{s≤u≤t} of
bounded variation taking values in R^d determines the path, i.e. the function
u ↦ (X_u − X_s) for s ≤ u ≤ t, up to tree-like equivalence. Moreover, if at least
one of the co-ordinates X^i_u with i ∈ {1, . . . , d} is a monotonically increasing
function of u, then the path (X_u)_{s≤u≤t} is uniquely determined by the signature
S(X)_{s,t}. However, the proof of this uniqueness result in [6] is non-constructive,
and recently X. Geng has provided an explicit method in a more general setting
for reconstructing a rough path from its signature (see [3]), thus effectively
inverting the signature map (X_u)_{s≤u≤t} ↦ S(X)_{s,t}. It should be noted that any
two 1-dimensional paths whose initial and final values differ by the same amount
are tree-like equivalent and hence have the same signature, irrespective of the way
the distance between the start and end points is traversed (whether travelling
straight or zigzagging). By contrast, this is not the case for multi-dimensional
paths, whose higher order iterated integrals are not uniquely determined by their
first order increments, but in general depend on the trajectories between the
start and end points. Nevertheless, by the uniqueness property and (iii) below,
the signature of a path of arbitrary dimension that is tree-like equivalent to a
linear path is uniquely determined by its first order increments.
(ii) Invariance under time re-parameterisations: For any continuous and monotonically increasing function $f : [0, T] \to [U, V]$ and $(i_1, \ldots, i_k) \in \mathcal{I}$, we have
$$\idotsint\limits_{s < u_1 < \cdots < u_k < t} dX^{i_1}_{u_1} \cdots \, dX^{i_k}_{u_k} = \idotsint\limits_{f(s) < u_1 < \cdots < u_k < f(t)} dX^{i_1}_{f^{-1}(u_1)} \cdots \, dX^{i_k}_{f^{-1}(u_k)}$$
for any $0 \leq s < t \leq T$. Therefore, $S(X)_{s,t} = S(\widetilde{X})_{f(s),f(t)}$, where the path $(\widetilde{X}_u)_{f(s) \leq u \leq f(t)}$ is an arbitrary time re-parameterisation of the original path $(X_u)_{s \leq u \leq t}$ such that $\widetilde{X}_u = X_{f^{-1}(u)}$ for $f(s) \leq u \leq f(t)$.
(iii) Signature of a linear path: If $X_t = X_0 + Yt$ for some fixed points $X_0$ and $Y = (Y_1, \ldots, Y_d)$ in $\mathbb{R}^d$ and all $t \in [0, T]$, then for any multi-index $(i_1, \ldots, i_k)$
$$X^{(i_1, \ldots, i_k)}_{s,t} = \frac{(t-s)^k}{k!} \prod_{j=1}^{k} Y_{i_j}$$
for any $0 \leq s < t \leq T$. Thus, each iterated integral of a linear path is simply the product of its increments in the relevant co-ordinates over the time interval divided by the factorial of the order of the iterated integral. This means that the signature of a linear path – in fact that of any path – is independent of its initial value $X_0$, or, to put it differently, signatures are invariant under translations of paths in the spatial domain.
(iv) Multiplicative property: For all $0 \leq s \leq t \leq u \leq T$, we have
$$S(X)_{s,t} \otimes S(X)_{t,u} = S(X)_{s,u} \, .$$
Application of lead and lag transforms to data streams relies on property (ii)
in that time re-parameterisations of paths leave their signatures invariant, whereas
properties (iii) and (iv) will be used to compute truncated signatures of data streams
through the following procedure: 1) for given data streams, continuous paths are
constructed by linearly interpolating between discrete data points, 2) the signature
of each linear segment of the path is computed using (iii), and 3) the signatures of
contiguous linear segments are joined together to form the signature of the whole
piecewise linear path using (iv).
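The three steps above can be sketched compactly in Python. The following is an illustrative reimplementation with our own function names, not the author's code exhibited in Appendix 2: each level of the truncated signature is stored as an order-$k$ tensor, a linear segment contributes $Y^{\otimes k}/k!$ by property (iii), and contiguous segments are joined by the tensor product of property (iv).

```python
import numpy as np

def segment_signature(increment, depth):
    """Truncated signature of one linear segment: level k is increment^{(x)k} / k!
    (property (iii)); level 0 is the scalar 1."""
    sig = [np.ones(())]
    for k in range(1, depth + 1):
        sig.append(np.multiply.outer(sig[-1], increment) / k)
    return sig

def chen_product(a, b):
    """Join two truncated signatures with the multiplicative property (iv):
    level k of the product is the sum over i of a_i (x) b_{k-i}."""
    depth = len(a) - 1
    return [sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
            for k in range(depth + 1)]

def stream_signature(points, depth):
    """Truncated signature of the piecewise linear interpolation of a data
    stream, given as an (N+1) x d array of data points."""
    points = np.asarray(points, dtype=float)
    sig = segment_signature(points[1] - points[0], depth)
    for start, end in zip(points[1:-1], points[2:]):
        sig = chen_product(sig, segment_signature(end - start, depth))
    return sig
```

For instance, for the two-segment stream $(0,0) \to (1,0) \to (1,1)$ truncated at degree 2, this yields first level $(1, 1)$ and second level $X^{(1,2)} = 1$, $X^{(2,1)} = 0$, so that the area $\frac{1}{2}(X^{(1,2)} - X^{(2,1)}) = \frac{1}{2}$ is the area of the triangle enclosed between the path and its chord.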
Note (on Numerical Algorithms). Even though free open-source software packages
are available for the computation of signatures1, all numerical algorithms used in this
thesis were developed from scratch and implemented in Python by the author. These
include a function that produces a time-indexed sequence of signatures, truncated to
an arbitrary degree specified by the user, for a given serial data stream of arbitrary
dimension, and functions that generate different types of lead and lag transforms
of input data streams, as well as various routines used to visualise outputs of such
functions. All of these programs were rigorously tested (e.g. by checking signature
components against iterated integrals computed in an Excel spreadsheet) to ensure
that they do not contain any bugs. Code samples are exhibited in Appendix 2.
2.3 Lead and lag transforms of data streams
Since financial data usually comes in the form of a time series, whereas signatures are
defined for continuous paths, our first task is to embed discrete data streams into paths
defined over continuous time intervals. Let $\mathcal{X} = \{(\mathcal{X}^1_{t_i}, \ldots, \mathcal{X}^d_{t_i}) \in \mathbb{R}^d : i = 0, \ldots, N\}$ be a set of data points observed at finitely many time points $t_0 < t_1 < \cdots < t_N$. Obviously there are various possible ways of embedding $\mathcal{X}$ into a continuous time path $(X_t)_{t_0 \leq t \leq t_N}$ so that $X_{t_i} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N$ – and different ways of ‘joining the dots’ generally produce paths with different signatures. The following methods
are the most relevant for our current purposes: the first two – constructing piecewise
linear or piecewise constant paths – are standard approaches (applied, for instance,
in [5] and [8], respectively), whereas the third method – lead and lag transforms –
was introduced by B. Hoff in his D.Phil. thesis [7] in 2005.
1 For example, the sigtools Python package, which is based on the libalgebra library of the CoRoPa project (downloadable from http://sourceforge.net/projects/coropa), was used in both [5] and [8].
(i) Piecewise linear interpolation: For $t \in [t_i, t_{i+1}]$ with $i = 0, \ldots, N-1$, we define
$$X_t = \mathcal{X}_{t_i} + \frac{t - t_i}{t_{i+1} - t_i} \left( \mathcal{X}_{t_{i+1}} - \mathcal{X}_{t_i} \right).$$
(ii) Piecewise constant ‘axis’ path: For $t \in [t_i, t_{i+1})$ with $i = 0, \ldots, N-1$, we set $X_t = \mathcal{X}_{t_i}$, so that at each time point $t_{i+1}$ ($i = 0, \ldots, N-1$) the path jumps discontinuously to the value $\mathcal{X}_{t_{i+1}}$. It is worth remarking that even though such axis paths are continuous time paths in the sense that they are defined for a continuous range of time values within a specified interval, they are clearly not continuous functions of time, and to call them that, as is done in some research papers (see e.g. [8]), is somewhat misleading.
For any embedding of a data stream $(\mathcal{X}_{t_i})_{i=0}^N$ into a continuous time path $(X_t)_{t_0 \leq t \leq t_N}$, we define the signature of the data stream as $S(X)_{t_0, t_N}$. As said, in general different embeddings yield different signatures, but it is easy to see that the piecewise linear path $X^{\text{lin}}_t$ and the piecewise constant path $X^{\text{con}}_t$ defined from the same data stream have the same signature. However, one should note that even though $S(X^{\text{lin}})_{t_i, t_j} = S(X^{\text{con}})_{t_i, t_j}$ for any $0 \leq i < j \leq N$, $S(X^{\text{lin}})_{t_i, t}$ does not equal $S(X^{\text{con}})_{t_i, t}$ for any $t \in (t_j, t_{j+1})$, since $S(X^{\text{con}})_{t_j, t} = (1, 0, 0, \ldots)$.
(iii) Lead and lag transforms: The idea behind lead and lag transforming a given $d$-dimensional time series $(\mathcal{X}_{t_i})_{i=0}^N$ is to create new backward (‘lag’) and forward (‘lead’) time series by adding data points to the original time series in two distinct ways that both preserve its increments, and hence leave its signature invariant, since such transforms are time re-parameterisations of the given data stream. The data points of the lag and lead transformed streams are then joined together to form axis or piecewise linear paths, depending on the method applied. Several different definitions of lead and lag transforms can be found in the literature, of which we will review three methods below.
2.3.1 Gyurko-Lyons-Kontkowski-Field method
In [5, Section 2.5], Gyurko, Lyons, Kontkowski and Field (‘GLKF’) defined the lead and lag transforms of a $d$-dimensional data stream $(\mathcal{X}_{t_i})_{i=0}^N$ as follows: for $i = 0, \ldots, N$, $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}_{t_i}$, and, for $i = 1, \ldots, N$, $\mathcal{X}^{\text{lead}}_{t_{i-1/2}} = \mathcal{X}_{t_i}$ and $\mathcal{X}^{\text{lag}}_{t_{i-1/2}} = \mathcal{X}_{t_{i-1}}$. Thus, the lead and lag transforms of a given stream of $N+1$ data points consist of $2N+1$ data points. Indeed, from the above description it is easy to see that they can be created by repeating the data points of the original stream, and deleting the first and last data points in order to obtain the lead and lag transforms, respectively. Hence, lead and lag transforms are time translations of each other, as illustrated below in Figure 1, which displays the lead and lag transforms produced by applying the GLKF method to a 1-dimensional data stream whose increments are randomly sampled from a standard normal distribution.
Figure 1: GLKF method of lead-lag transforming data streams.
This definition was motivated by the authors’ desire to be able to easily read off the volatilities of the components $(\mathcal{X}^j_{t_i})_{i=0}^N$, for $j = 1, \ldots, d$, of a given data stream from the signature of the $(2d)$-dimensional data stream $(\mathcal{Y}_{t_{i/2}})_{i=0}^{2N} = (\mathcal{X}^{\text{lead}}_{t_{i/2}}, \mathcal{X}^{\text{lag}}_{t_{i/2}})_{i=0}^{2N}$, as volatilities of market variables are highly relevant quantities in financial applications. For, it is straightforward to verify by a direct calculation that for any $j = 1, \ldots, d$
$$Y^{(j, \, j+d)}_{t_0, t_N} - Y^{(j+d, \, j)}_{t_0, t_N} = \sum_{i=0}^{N-1} \left( \mathcal{X}^j_{t_{i+1}} - \mathcal{X}^j_{t_i} \right)^2$$
where $(Y_t)_{t_0 \leq t \leq t_N}$ is the piecewise constant or piecewise linear interpolation of the data stream $(\mathcal{Y}_{t_{i/2}})_{i=0}^{2N}$, i.e. the quadratic variation of the $j$th component of the original data stream $(\mathcal{X}_{t_i})_{i=0}^N$ is equal to the difference between the iterated integral of the $j$th components of the lead and lag transformed data streams and the iterated integral of the same components in reverse order. This latter quantity is twice the area between the $j$th components of the lead and lag transformed data streams, as defined in Section 2.3 of [5]. We will use the same definition of area between path components in this work.
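This identity is easy to check numerically. The sketch below is our own illustrative code (not the routines of [5] or of Appendix 2): it builds the GLKF lead and lag transforms of a 1-dimensional stream, computes the second order iterated integrals of the piecewise linear interpolation exactly, and compares the difference of the two mixed iterated integrals with the realised quadratic variation.

```python
import numpy as np

def glkf_lead_lag(x):
    """GLKF lead and lag transforms of a 1-dimensional stream: repeat every
    data point, then drop the first point (lead) and the last point (lag);
    returns a (2N+1) x 2 array with columns (lead, lag)."""
    doubled = np.repeat(np.asarray(x, dtype=float), 2)
    return np.column_stack([doubled[1:], doubled[:-1]])

def second_level(path):
    """Second order iterated integrals X^{(i,j)} of the piecewise linear
    interpolation of the given vertices; the sum over segments is exact,
    since the integrand is linear on each segment."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    rel = path[:-1] - path[0]  # value at each segment start minus initial value
    return np.einsum('ki,kj->ij', rel, inc) + 0.5 * np.einsum('ki,kj->ij', inc, inc)

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(25))        # a 1-dimensional random walk
Y = second_level(glkf_lead_lag(x))
quadratic_variation = np.sum(np.diff(x) ** 2)
```

Here $Y^{(1,2)} - Y^{(2,1)}$ (lead integrated against lag minus lag integrated against lead) reproduces the realised quadratic variation to machine precision.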
Furthermore, in Section 2.5 of [5] it was claimed that “the (signed) area between the ith component of the lead-transform and the jth component of the lag-transform equals to the quadratic cross-variation of the trajectories X i and Xj”. Unfortunately, this is not a valid statement, as one can readily show either analytically or by numerical simulation; the current author has verified it in both ways. To prove the point mathematically, in Appendix 1 we provide an explicit calculation of the signature of a multi-dimensional data stream with one lead-transformed and one lag-transformed component.
2.3.2 Flint-Hambly-Lyons method (Mark 1)
In the early version of [2] (of February 2014), Flint, Hambly and Lyons (‘FHL’) used a different definition (see Definition 1.2 of that version of their paper) for lead and lag transforms, setting $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}_{t_{i-1}}$ for $i = 1, \ldots, N-1$, and $\mathcal{X}^{\text{lead}}_{t_{i+1/2}} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lag}}_{t_{i+1/2}} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N-1$, with $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}^{\text{lag}}_{t_0} = \mathcal{X}_{t_0}$ and $\mathcal{X}^{\text{lead}}_{t_N} = \mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_N}$, and then linearly interpolating between data points to form continuous time paths. Figure 2 below illustrates the lead and lag transforms of the same data stream as in Figure 1 produced by using this method.
From Figure 2, we can see that at the start and at the end of the data stream its
lead and lag transforms are not simple time translations of each other; nevertheless,
this method of lead-lag transforming a data stream does preserve its increments, and
hence leaves its signature invariant.
2.3.3 Flint-Hambly-Lyons method (Mark 2)
In the current version of [2] (of September 2016), the authors have modified their
earlier definition of lead and lag transforms (see Definition 2.1 of that version of the
paper), and now define them by setting $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}^{\text{lead}}_{t_{i+1/4}} = \mathcal{X}^{\text{lead}}_{t_{i+1/2}} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lead}}_{t_{i+3/4}} = \mathcal{X}_{t_{i+2}}$ for $i = 0, \ldots, N-2$, with $\mathcal{X}^{\text{lead}}_{t_{N-1}} = \mathcal{X}^{\text{lead}}_{t_{N-3/4}} = \mathcal{X}^{\text{lead}}_{t_{N-1/2}} = \mathcal{X}^{\text{lead}}_{t_{N-1/4}} = \mathcal{X}^{\text{lead}}_{t_N} = \mathcal{X}_{t_N}$, and $\mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}^{\text{lag}}_{t_{i+1/4}} = \mathcal{X}^{\text{lag}}_{t_{i+1/2}} = \mathcal{X}^{\text{lag}}_{t_{i+3/4}} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N-1$, with $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_{N-1}}$. (Strictly speaking, Definition 2.1 of [2] fails to assign a value to the penultimate point $\mathcal{X}^{\text{lead}}_{t_{N-1/4}}$ of the lead transform.) Thus, under this method the lead and lag transforms of a time series of $N+1$ data points consist of $4N+1$ data points. In Figure 3 below these are illustrated for the same data stream that was used in Figures 1 and 2.

Figure 2: FHL (Mark 1) method of lead-lag transforming data streams.
This new definition was suggested by a context in mathematical finance where an investor readjusts at time $t_{i+1}$ the amounts of stock he holds in his portfolio based on the stock prices at time $t_i$ – i.e. where there is a delay between receiving market information and acting on it by trading – for defining the lead and lag transforms of the time series of stock prices in this way allows one to express the profit (or loss) made by the investor’s trading strategy as an exact integral of a function of the lag transform with the lead transform as an integrator.
The problem with this definition is that lead and lag transforms specified as above do not preserve the increments of a data stream: since $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}_{t_1}$ and $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_{N-1}}$, the first and last increments of the original data stream are missing from the lead and lag transformed streams, respectively, as can be seen from Figure 3 below. Consequently, their signatures generally differ from that of the original data stream (as well as from each other), as one can readily verify with a numerical example. However, this situation can be easily remedied by redefining $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}_{t_0}$ and $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_N}$.

Figure 3: FHL (Mark 2) method of lead-lag transforming data streams.
2.4 Area processes of multi-dimensional paths
2.4.1 Definition and basic properties of areas
In Subsection 2.3.1 above we already encountered the concept of area between two
components of a multi-dimensional path. This is now formalised in the following
Definition 2.1 (Area). Let $X : u \in [0, T] \mapsto (X^1_u, \ldots, X^d_u) \in \mathbb{R}^d$ be a continuous path of finite length. Then the area $A^{(i,j)}_{s,t}$ between two path components $X^i_u$ and $X^j_u$ with $1 \leq i, j \leq d$ over any time interval $[s, t]$, where $0 \leq s \leq t \leq T$, is defined by
$$A^{(i,j)}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^i_u \, dX^j_v - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^j_u \, dX^i_v \right) = \frac{1}{2} \left( X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t} \right).$$
Figure 4: Area between path components $X^i$ and $X^j$.
As immediate consequences of the above definition, we have that $A^{(i,i)}_{s,t} = 0$ and $A^{(i,j)}_{s,t} = -A^{(j,i)}_{s,t}$ for all $1 \leq i, j \leq d$ and $0 \leq s \leq t \leq T$.
The quantity $A^{(i,j)}_{s,t}$ has a natural geometric interpretation – which also explains its name – as the signed area between the curve $u \mapsto (X^i_u, X^j_u)$ for $u \in [s, t]$ and the chord that connects the start point $(X^i_s, X^j_s)$ and end point $(X^i_t, X^j_t)$ of the curve. This is illustrated in Figure 4 above. For, it is clear that the second order iterated integrals $X^{(i,j)}_{s,t}$ and $X^{(j,i)}_{s,t}$ represent the areas between the curve and the vertical and horizontal axes, respectively, so that $X^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = X^{(i)}_{s,t} X^{(j)}_{s,t}$. Moreover, denoting the area between the curve and the chord by $A^{(i,j)}_{s,t}$, which is shaded yellow in Figure 4, we have that $A^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = \frac{1}{2} X^{(i)}_{s,t} X^{(j)}_{s,t}$, which, when substituted into the previous equation, yields $A^{(i,j)}_{s,t} = \frac{1}{2} \left( X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t} \right)$.
One should emphasize that $A^{(i,j)}_{s,t}$ is a signed area, i.e. that it can take positive or negative (or zero) values: indeed, if $A^{(i,j)}_{s,t} > 0$, then, as we have seen, it follows straight from the definition that $A^{(j,i)}_{s,t} < 0$; or, in geometric terms, reflecting the curve in Figure 4 in the chord connecting its start and end points, which corresponds to reversing the order of the path components in the area calculation, would give a negative area (of the same absolute magnitude) above the chord.
Figure 5: A typical 2-dimensional Brownian sample path.
However, sample paths of stochastic processes do not usually look like the smooth monotonic curve displayed in Figure 4; rather, a typical 2-dimensional Brownian-like random walk with independent normal increments is shown in Figure 5 above. It should be noted that for less regular paths that zigzag and cross themselves, the simple geometric interpretation of the area between path components as the signed area enclosed between the curve and the chord is no longer valid in general. In particular, it is easy to see that the sign of the area enclosed within a loop – a path segment whose start and end points coincide – depends on the direction in which the loop is traversed. Yet none of the many research articles or textbooks we have come across that deal with areas between path components point out this basic fact while presenting the standard geometric interpretation.
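The dependence of the sign on the direction of traversal is easy to demonstrate numerically. In the following illustrative sketch (our own code, with the area evaluated exactly for the piecewise linear approximation of the loop), a finely sampled unit circle is traversed in both directions:

```python
import numpy as np

def signed_area(path):
    """A^{(1,2)}_{0,T} of a 2-dimensional piecewise linear path per Definition 2.1:
    half the sum over segments of (X^1_mid - X^1_0) dX^2 - (X^2_mid - X^2_0) dX^1,
    exact because the integrand is linear on each segment."""
    path = np.asarray(path, dtype=float)
    d1, d2 = np.diff(path[:, 0]), np.diff(path[:, 1])
    r1 = 0.5 * (path[:-1, 0] + path[1:, 0]) - path[0, 0]
    r2 = 0.5 * (path[:-1, 1] + path[1:, 1]) - path[0, 1]
    return 0.5 * np.sum(r1 * d2 - r2 * d1)

u = np.linspace(0.0, 2.0 * np.pi, 20001)
circle = np.column_stack([np.cos(u), np.sin(u)])   # closed loop, start = end = (1, 0)
area_anticlockwise = signed_area(circle)
area_clockwise = signed_area(circle[::-1])
```

The anti-clockwise loop gives an area close to $+\pi$ and the clockwise loop an area close to $-\pi$, even though the two traversals occupy exactly the same set of points in the plane.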
Another invalid view about areas between path components that can be found in the literature is expressed in the following statement (see Section 2.3 of [5] under the heading ‘Lead-lag relationship’): “if an increase (respectively decrease) of the component X1 is typically followed by an increase (decrease) in the component X2, then the area A1,2 is positive. If a move in X1 is followed by a move in X2 to the opposite direction, the area is negative”. This assertion is clearly false, since, for example in Figure 4 above, both the path and its reflection in the chord connecting its start and end points have positive increments in their components, but the area enclosed by the former below the chord is positive whereas the area enclosed by the latter above the chord is negative. It is also evident that the area between path components which have increments of opposite signs can be either positive or negative (or zero).
Even though the area between path components does not capture correlation between increments in the components, as is clear from the above discussion, it does carry algebraic meaning: if the operation of taking iterated integrals of path components is viewed as a kind of ‘product’ on the space of path components, then the operation of computing areas between path components can be regarded as a ‘commutator’ or ‘Lie bracket’ on that space, and in this sense the area between two path components can be viewed as a measure of their non-commutativity. We will formalise this idea in the next subsection, and will subsequently explore what kind of algebraic structure it endows on the path space.
2.4.2 Higher order areas
In the previous subsection, we defined the area $A^{(i_1, i_2)}_{s,t}$ between two components $X^{i_1}_u$ and $X^{i_2}_u$ of a continuous $d$-dimensional path $(X^1_u, \ldots, X^d_u)_{0 \leq u \leq T}$ of finite length, where $i_k \in \{1, \ldots, d\}$ for $k = 1, 2$, over a time interval $[s, t]$ with $0 \leq s \leq t \leq T$. For a fixed $s$, $A^{(i_1, i_2)}_{s,u}$ is a function of $u$ for $s \leq u \leq t$, and thus $A^{(i_1, i_2)}_{s,u}$ can be viewed as a 1-dimensional path defined on the time interval $[s, t]$. Furthermore, as $A^{(i_1, i_2)}_{s,u}$ is clearly continuous and has finite length, we can define the second order area $A^{((i_1, i_2), i_3)}_{s,t}$ between three path components $X^{i_1}_u$, $X^{i_2}_u$ and $X^{i_3}_u$, where $i_k \in \{1, \ldots, d\}$ for $k = 1, 2, 3$, over a time interval $[s, t]$ with $0 \leq s \leq t \leq T$ as follows:
$$A^{((i_1, i_2), i_3)}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_1, i_2)}_{s,u} \, dX^{i_3}_v - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^{i_3}_u \, dA^{(i_1, i_2)}_{s,v} \right). \quad (2.1)$$
By the anti-commutativity of the area operation with respect to the order of path indices, we have that $A^{((i_1, i_2), i_3)}_{s,t} = -A^{(i_3, (i_1, i_2))}_{s,t} = A^{(i_3, (i_2, i_1))}_{s,t}$. However, one should carefully note that the operation of forming second order areas is not associative – the way path indices are bracketed certainly matters – so that in general $A^{((i_1, i_2), i_3)}_{s,t}$ is not equal to $A^{(i_1, (i_2, i_3))}_{s,t}$. For example, when $i_2 = i_3$, $A^{(i_1, (i_2, i_3))}_{s,t} = 0$, but for $i_1 \neq i_2$, $A^{((i_1, i_2), i_3)}_{s,t}$ may well be non-zero.
Let us examine the area differential $dA^{(i_1, i_2)}_{s,u} := A^{(i_1, i_2)}_{s,u+du} - A^{(i_1, i_2)}_{s,u}$ that appeared in the above definition, as it will not only enable us to express second and higher order areas (to be analogously defined shortly) as linear combinations of iterated integrals of one degree higher, but will also give the crucial idea for our classification of paths using third order areas.
Since the area differential can be written as
$$dA^{(i_1, i_2)}_{s,u} = \frac{1}{2} \left( \int_{v=s}^{v=u} dX^{i_1}_v \, dX^{i_2}_u - \int_{v=s}^{v=u} dX^{i_2}_v \, dX^{i_1}_u \right) = \frac{1}{2} \left\{ \left( X^{i_1}_u - X^{i_1}_s \right) dX^{i_2}_u - \left( X^{i_2}_u - X^{i_2}_s \right) dX^{i_1}_u \right\}, \quad (2.2)$$
we have that $dA^{(i_1, i_2)}_{s,u} = 0$ is equivalent to
$$dX^{i_2}_u = \frac{X^{i_2}_u - X^{i_2}_s}{X^{i_1}_u - X^{i_1}_s} \, dX^{i_1}_u$$
provided that $X^{i_1}_u - X^{i_1}_s \neq 0$. Therefore, it follows that $A^{(i_1, i_2)}_{s,u} = 0$ for all $u \in [s, t]$ if and only if
$$X^{i_2}_u = X^{i_2}_s + \frac{X^{i_2}_t - X^{i_2}_s}{X^{i_1}_t - X^{i_1}_s} \left( X^{i_1}_u - X^{i_1}_s \right)$$
with $X^{i_1}_t \neq X^{i_1}_s$, or $X^{i_1}_u = X^{i_1}_s$ for all $u \in [s, t]$. Thus, the area process $u \mapsto A^{(i_1, i_2)}_{s,u}$ for $u \in [s, t]$ is identically zero if and only if the point $(X^{i_1}_u, X^{i_2}_u)$ traces a straight line of fixed slope from the start point $(X^{i_1}_s, X^{i_2}_s)$ to the end point $(X^{i_1}_t, X^{i_2}_t)$ for $u \in [s, t]$ – possibly moving backwards and forwards or pausing for some time along the way. Indeed, this is a natural result bearing in mind the earlier picture of an area between a curve and the chord connecting its start and end points! For, in order to have a zero area enclosed between a curve and its chord at every point in time, the two must always coincide; equivalently, a non-zero area can arise only if the curve has non-zero curvature at some point in time. Thus, the area process of two path components can be seen to capture any curvature in their trajectory.
We can now use the above expression for the area differential to derive a general formula for a second order area as a linear combination of third order iterated integrals. For, by substituting (2.2) into (2.1), after some manipulation of the iterated integrals, we obtain
$$A^{(i_1, (i_2, i_3))}_{s,t} = \frac{1}{4} \left( X^{(i_1, i_2, i_3)}_{s,t} + X^{(i_2, i_1, i_3)}_{s,t} + X^{(i_3, i_2, i_1)}_{s,t} - X^{(i_1, i_3, i_2)}_{s,t} - X^{(i_2, i_3, i_1)}_{s,t} - X^{(i_3, i_1, i_2)}_{s,t} \right). \quad (2.3)$$
For indices $i_1$, $i_2$ and $i_3$ that are all distinct, the third order iterated integrals corresponding to all the permutations of them generally have different values, so that the expression for their second order area consists of $6\,(=3!)$ different terms, whereas, when $i_1 = i_2$ or $i_1 = i_3$, two of the terms cancel each other out and the remaining four comprise two pairs of equal terms, so that $A^{(i_1, (i_1, i_3))}_{s,t} = \frac{1}{2} \left( X^{(i_1, i_1, i_3)}_{s,t} - X^{(i_1, i_3, i_1)}_{s,t} \right)$, and, when $i_2 = i_3$, $A^{(i_1, (i_2, i_3))}_{s,t}$ of course vanishes. In particular, one should note that the signs of the iterated integrals in the expression for the second order area are not given by the signs of the permutations of the indices.
Analogously, third order areas involve four path components, and can be defined in two fundamentally different ways: as the area between the path component $X^{i_1}_u$ and the second order area $A^{(i_2, (i_3, i_4))}_{s,u}$ between path components $X^{i_2}_u$, $X^{i_3}_u$ and $X^{i_4}_u$, or as the area between the areas $A^{(i_1, i_2)}_{s,u}$ and $A^{(i_3, i_4)}_{s,u}$ of two pairs of path components. Formally, we define
$$A^{(i_1, (i_2, (i_3, i_4)))}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^{i_1}_u \, dA^{(i_2, (i_3, i_4))}_{s,v} - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_2, (i_3, i_4))}_{s,u} \, dX^{i_1}_v \right) \quad (2.4)$$
and
$$A^{((i_1, i_2), (i_3, i_4))}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_1, i_2)}_{s,u} \, dA^{(i_3, i_4)}_{s,v} - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_3, i_4)}_{s,u} \, dA^{(i_1, i_2)}_{s,v} \right). \quad (2.5)$$
For our path classification purposes, we will find third order areas of the latter type
much more useful, and in the sequel whenever we refer to a third order area without
qualification we will always mean a third order area of this type.
Applying the differential operator to (2.3) and substituting into (2.4) yields
$$\begin{aligned} A^{(i_1, (i_2, (i_3, i_4)))}_{s,t} = \frac{1}{8} \Big( &X^{(i_1, i_2, i_3, i_4)}_{s,t} + X^{(i_2, i_1, i_3, i_4)}_{s,t} + X^{(i_2, i_3, i_1, i_4)}_{s,t} \\ + \, &X^{(i_1, i_3, i_2, i_4)}_{s,t} + X^{(i_3, i_1, i_2, i_4)}_{s,t} + X^{(i_3, i_2, i_1, i_4)}_{s,t} \\ + \, &X^{(i_1, i_4, i_3, i_2)}_{s,t} + X^{(i_4, i_1, i_3, i_2)}_{s,t} + X^{(i_4, i_3, i_1, i_2)}_{s,t} \\ + \, &X^{(i_2, i_4, i_3, i_1)}_{s,t} + X^{(i_3, i_4, i_2, i_1)}_{s,t} + X^{(i_4, i_2, i_3, i_1)}_{s,t} \\ - \, &X^{(i_1, i_2, i_4, i_3)}_{s,t} - X^{(i_2, i_1, i_4, i_3)}_{s,t} - X^{(i_2, i_4, i_1, i_3)}_{s,t} \\ - \, &X^{(i_1, i_4, i_2, i_3)}_{s,t} - X^{(i_4, i_1, i_2, i_3)}_{s,t} - X^{(i_4, i_2, i_1, i_3)}_{s,t} \\ - \, &X^{(i_1, i_3, i_4, i_2)}_{s,t} - X^{(i_3, i_1, i_4, i_2)}_{s,t} - X^{(i_3, i_4, i_1, i_2)}_{s,t} \\ - \, &X^{(i_2, i_3, i_4, i_1)}_{s,t} - X^{(i_3, i_2, i_4, i_1)}_{s,t} - X^{(i_4, i_3, i_2, i_1)}_{s,t} \Big). \end{aligned} \quad (2.6)$$
Similarly, substituting (2.2) into (2.5) gives
$$\begin{aligned} A^{((i_1, i_2), (i_3, i_4))}_{s,t} = \frac{1}{8} \Big( &X^{(i_1, i_2, i_3, i_4)}_{s,t} + X^{(i_1, i_3, i_2, i_4)}_{s,t} + X^{(i_3, i_1, i_2, i_4)}_{s,t} \\ + \, &X^{(i_2, i_1, i_4, i_3)}_{s,t} + X^{(i_2, i_4, i_1, i_3)}_{s,t} + X^{(i_4, i_2, i_1, i_3)}_{s,t} \\ + \, &X^{(i_4, i_3, i_1, i_2)}_{s,t} + X^{(i_4, i_1, i_3, i_2)}_{s,t} + X^{(i_1, i_4, i_3, i_2)}_{s,t} \\ + \, &X^{(i_3, i_4, i_2, i_1)}_{s,t} + X^{(i_3, i_2, i_4, i_1)}_{s,t} + X^{(i_2, i_3, i_4, i_1)}_{s,t} \\ - \, &X^{(i_2, i_1, i_3, i_4)}_{s,t} - X^{(i_2, i_3, i_1, i_4)}_{s,t} - X^{(i_3, i_2, i_1, i_4)}_{s,t} \\ - \, &X^{(i_1, i_2, i_4, i_3)}_{s,t} - X^{(i_1, i_4, i_2, i_3)}_{s,t} - X^{(i_4, i_1, i_2, i_3)}_{s,t} \\ - \, &X^{(i_3, i_4, i_1, i_2)}_{s,t} - X^{(i_3, i_1, i_4, i_2)}_{s,t} - X^{(i_1, i_3, i_4, i_2)}_{s,t} \\ - \, &X^{(i_4, i_3, i_2, i_1)}_{s,t} - X^{(i_4, i_2, i_3, i_1)}_{s,t} - X^{(i_2, i_4, i_3, i_1)}_{s,t} \Big). \end{aligned} \quad (2.7)$$
Thus, as one can see from (2.6) and (2.7) above, the general expressions for the two types of third order area as linear combinations of $4! = 24$ fourth order iterated integrals are indeed different. Further, general formulae for different types of higher order areas could be worked out just as straightforwardly – though the tedium of such an exercise increases factorially, and for our current purposes it is unnecessary to go beyond third order areas.

In addition to proving (2.3), (2.6) and (2.7) mathematically, we have also verified these expressions numerically by computing second and third order areas from first principles according to their definitions (2.1), (2.4) and (2.5), and comparing the results obtained using the two methods to make sure that they agree.
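As an illustration of this kind of check, formula (2.3) can be verified on a fine grid. In the sketch below (our own illustrative code, not the routines of Appendix 2), the left-hand side is computed from first principles as the area between $X^{i_1}$ and the area process of $X^{i_2}$ and $X^{i_3}$, the right-hand side as the signed combination of third order iterated integrals; both use simple trapezoidal discretisations, and the two agree to within the $O(h^2)$ discretisation error.

```python
import numpy as np

def iterated_integral(path, index):
    """X^{(index)}_{0,u} along a finely sampled path ((M+1) x d array),
    by repeated trapezoidal integration."""
    I = np.ones(len(path))
    for i in index:
        inc = 0.5 * (I[:-1] + I[1:]) * np.diff(path[:, i])
        I = np.concatenate([[0.0], np.cumsum(inc)])
    return I

def area_process(a, b):
    """u -> A^{(a,b)}_{0,u} for two sampled 1-dimensional paths, per Definition 2.1."""
    ra = 0.5 * (a[:-1] + a[1:]) - a[0]
    rb = 0.5 * (b[:-1] + b[1:]) - b[0]
    inc = 0.5 * (ra * np.diff(b) - rb * np.diff(a))
    return np.concatenate([[0.0], np.cumsum(inc)])

# a random piecewise linear test path with 3 components, finely resampled
rng = np.random.default_rng(7)
coarse_t = np.linspace(0.0, 1.0, 6)
fine_t = np.linspace(0.0, 1.0, 6001)
X = np.column_stack([np.interp(fine_t, coarse_t, np.cumsum(rng.standard_normal(6)))
                     for _ in range(3)])

# left-hand side of (2.3): area between X^1 and the area process of X^2 and X^3
lhs = area_process(X[:, 0], area_process(X[:, 1], X[:, 2]))[-1]

# right-hand side of (2.3): signed combination of third order iterated integrals
signed_perms = [((0, 1, 2), 1), ((1, 0, 2), 1), ((2, 1, 0), 1),
                ((0, 2, 1), -1), ((1, 2, 0), -1), ((2, 0, 1), -1)]
rhs = 0.25 * sum(sign * iterated_integral(X, perm)[-1]
                 for perm, sign in signed_perms)
```

Checks of (2.6) and (2.7) follow the same pattern, with one more level of nesting of the area process.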
Finally, we return to our earlier suggestion of viewing the operation of computing areas as a Lie bracket on the space of 1-dimensional paths. For, if $(X^1_u)_{s \leq u \leq t}$ and $(X^2_u)_{s \leq u \leq t}$ are two paths defined over the time interval $[s, t]$, we can define their product $(X^1 * X^2)_{s \leq u \leq t}$ to be the path $u \mapsto X^{(1,2)}_{s,u}$ for $u \in [s, t]$, i.e. multiplying paths means computing their second order iterated integral. Further, we define the Lie bracket $[X^1, X^2]_{s \leq u \leq t}$ of $X^1_u$ and $X^2_u$ to be their commutator, so that for $u \in [s, t]$
$$[X^1, X^2]_u := (X^1 * X^2 - X^2 * X^1)_u = X^{(1,2)}_{s,u} - X^{(2,1)}_{s,u} = 2A^{(1,2)}_{s,u} \, . \quad (2.8)$$
It is clear that the multiplication $*$ on the path space is non-commutative, since in general $X^{(1,2)}_{s,u}$ is not equal to $X^{(2,1)}_{s,u}$, and it is also important to realise that it is not associative either, as can readily be seen through theoretical considerations as well as verified by numerical examples, i.e.
$$\left( (X^1 * X^2) * X^3 \right)_u \neq \left( X^1 * (X^2 * X^3) \right)_u$$
for arbitrary paths $X^1_u$, $X^2_u$ and $X^3_u$ defined on $[s, t]$. Moreover, it should be pointed out that generally $\left( (X^1 * X^2) * X^3 \right)_u$ is not equal to $X^{(1,2,3)}_{s,u}$: e.g. for a 3-dimensional linear path $(X^1_u, X^2_u, X^3_u)_{s \leq u \leq t}$ the former equals $\frac{1}{4} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$ whereas the latter equals $\frac{1}{6} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$.
Despite being both bilinear and anti-commutative, the Lie bracket defined by (2.8) does not endow the path space with the algebraic structure of a Lie algebra, because the Jacobi identity does not hold, due to the non-associativity of the $*$ multiplication. However, using (2.3), the Lie bracket is easily seen to satisfy the following interesting identity for any $u \in [s, t]$:
$$\left[ [X^1, X^2], X^3 \right]_u + \left[ [X^2, X^3], X^1 \right]_u + \left[ [X^3, X^1], X^2 \right]_u = X^{(1,2,3)}_{s,u} - X^{(1,3,2)}_{s,u} + X^{(2,3,1)}_{s,u} - X^{(2,1,3)}_{s,u} + X^{(3,1,2)}_{s,u} - X^{(3,2,1)}_{s,u} \, .$$
Thus, we note that the expression on the right hand side of the above Jacobi-type identity for our Lie bracket is an alternating sum of third order iterated integrals over all the permutations of the path indices, where the sign of the term indexed by $(i_1, i_2, i_3)$ with distinct $i_k \in \{1, 2, 3\}$ for $k = 1, 2, 3$ is the sign of the permutation that maps $(1, 2, 3)$ to $(i_1, i_2, i_3)$.
2.5 Classification of paths using third order areas
2.5.1 Diffusion process market model
The idea of modelling the evolution of stock prices and other variables in financial
markets by diffusion processes goes back to the pioneering work of L. Bachelier at the
turn of the 20th century. By a diffusion process we mean an n-dimensional stochastic
process Xt =(X1
t , . . . , Xnt
)t≥0
that satisfies a stochastic differential equation of the
form
dX it = μi(t,Xt)dt +
m∑
j=1
σ ij (t,Xt)dBj
t
where, for each i ∈ {1, . . . , n}, μi : Rn+1 → R and σ ij : Rn+1 → R are the drift and
diffusion (volatility) coefficients of X it , and
(Bj
t
)t≥0
are independent Brownian motion
processes for j = 1, . . . ,m.
In a financial setting, a diffusion process $X_t = (X^1_t, \ldots, X^n_t)_{t \geq 0}$ may be used to model a financial market that consists of $n$ market variables $X^i_t$ ($i = 1, \ldots, n$) such as stock and commodity prices, currency exchange rates and interest rates of various maturities. Thus, in such a modelling framework the market is driven by
m random processes – which may be interpreted as representing macro- and micro-
economic factors as well as political or other events that affect the prices of financial
instruments – whose impacts on the value of each market variable are superimposed
on an underlying growth rate or ‘drift’ that is specific to each variable but may
depend on the values of other variables as well as time. In an alternative, more
restricted formulation, each market variable would be driven by a single random
process associated with it, though the Brownian motions for different variables would
be assumed to be correlated, and the drift and volatility of each variable would be
functions of its own value and time only. In this section, we shall adopt this approach
to modelling a financial market, as specified below.
Some market variables could be expected to grow in value at a constant or possibly deterministic time-dependent rate: e.g. it might be appropriate to assume that the share price of a publicly listed company would grow at a constant rate subject to random
shocks due to market releases of company-specific information or general economic
data. By contrast, other market variables have a tendency to fluctuate around some
long-term mean levels in a cyclical fashion – e.g. interest rates exhibit such mean-
reverting behaviour over economic cycles – and for any such market variable $X^i_t$ one could postulate its drift to be of the form $\theta^i \left( \alpha^i - X^i_t \right)$, where the parameters $\theta^i$ and $\alpha^i$ are called its mean reversion speed and mean-reverting level (or long-term mean),
respectively, which in general might be time-dependent. Hence, if the current value
of such a mean-reverting process is below its long-term mean, the drift is positive,
whereas if its current value is above the mean-reverting level, the drift is negative, so
that in both cases the process tends to revert towards its long-term mean.
From now on, we shall consider a financial market consisting of $n$ variables $X^i_t$ ($i = 1, \ldots, n$), each of which follows either a Wiener process with a constant drift (‘CD process’)
$$dX^i_t = \mu^i \, dt + \sigma^i \, dB^i_t \quad (2.9)$$
where $\mu^i$ and $\sigma^i > 0$ are constants, or a mean-reverting Ornstein-Uhlenbeck process (‘MR process’)
$$dX^i_t = \theta^i \left( \alpha^i - X^i_t \right) dt + \sigma^i \, dB^i_t \quad (2.10)$$
where $\alpha^i$, $\theta^i > 0$ and $\sigma^i > 0$ are constants, and the correlation $\rho^{ij}$ between the Brownian motion $B^i_t$ driving the variable $X^i_t$ and the Brownian driver $B^j_t$ of another variable $X^j_t$ is given by $\mathbb{E}\left[ dB^i_t \, dB^j_t \right] = \rho^{ij} \, dt$.

For $t \geq 0$, (2.9) and (2.10) can be easily integrated to give
$$X^i_t = X^i_0 + \mu^i t + \sigma^i B^i_t \quad (2.11)$$
and
$$X^i_t = X^i_0 \, e^{-\theta^i t} + \alpha^i \left( 1 - e^{-\theta^i t} \right) + \sigma^i \int_0^t e^{-\theta^i (t-s)} \, dB^i_s \, . \quad (2.12)$$
Thus, for a mean-reverting variable $X^i_t$, the limit of $\mathbb{E}[X^i_t]$ as $t$ tends to infinity is $\alpha^i$, so this parameter is indeed the long-term mean of such a process.
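Sample paths of the two process types can be generated directly from the closed-form solutions. The sketch below (illustrative code with our own function names, assuming NumPy) uses (2.11) for a CD process and, for an MR process, the exact one-step Ornstein-Uhlenbeck transition implied by (2.12), which avoids the discretisation bias of an Euler scheme:

```python
import numpy as np

def simulate_cd(x0, mu, sigma, T, n, rng):
    """CD realisation on n steps over [0, T] via the exact solution (2.11)."""
    t = np.linspace(0.0, T, n + 1)
    dB = rng.standard_normal(n) * np.sqrt(T / n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    return x0 + mu * t + sigma * B

def simulate_mr(x0, theta, alpha, sigma, T, n, rng):
    """MR realisation via the exact Ornstein-Uhlenbeck transition implied by
    (2.12): X_{t+dt} = X_t e^{-theta dt} + alpha (1 - e^{-theta dt}) plus
    Gaussian noise with the exact conditional standard deviation."""
    dt = T / n
    decay = np.exp(-theta * dt)
    step_std = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] * decay + alpha * (1.0 - decay) + step_std * rng.standard_normal()
    return x
```

With the noise switched off ($\sigma = 0$), the CD path reduces to the straight line $X^i_0 + \mu^i t$ and the MR path decays exponentially towards $\alpha^i$, which provides a simple sanity check of the implementation.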
We will simulate the evolution of our market $X_t = (X^1_t, \ldots, X^n_t)_{0 \leq t \leq T}$, as defined above, over a finite time horizon $[0, T]$ by generating a large number of correlated Brownian sample paths $(B^i_t)_{0 \leq t \leq T}$ for the market variables $(X^i_t)_{0 \leq t \leq T}$, $i = 1, \ldots, n$, assuming that all pairs of Brownian motions driving distinct market variables have the same correlation $\rho$.
Even though in our diffusion model all paths of market variables are, by construction, realisations of CD or MR processes for some combination of drift (or mean reversion speed and long-term mean) and volatility parameters, it is usually far from apparent, just by looking at such market paths, which type of diffusion process was used to generate them and what parameter values those processes may have had. For example, individual realisations of an MR process for different Brownian sample paths might well appear upward or downward trending rather than mean-reverting, depending on the characteristics of the Brownian sample paths driving the process. Indeed, an arbitrary continuous path can be represented as a realisation of either a CD or an MR process with any combination of parameters, provided one is free to choose the continuous path serving as the Brownian driver of the given path; and, further, any number of arbitrary paths can all be regarded as realisations of, say, the same CD process driven by different Brownian sample paths.
As can be seen from (2.9) and (2.10), for a small time step δt, the drift term of the
increment of a CD or MR process is of the order of δt whereas the expected absolute
value of the diffusive term is of the order of√
δt. This means that over short time
scales the diffusive term tends to dominate the drift term with random Brownian
movements obscuring any underlying constant drift or mean-reverting trend, such
latent tendencies only manifesting themselves over longer time periods. In practice,
when trying to estimate the parameters of a CD process one commonly finds that while
one can usually obtain a reasonably accurate estimate for the volatility parameter by
calculating the standard deviation of increments for a small number of realised paths
– even for a single realisation – the sample mean of increments is often, even for a
significantly larger number of realisations, a woefully inadequate estimate for the drift
parameter.
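This phenomenon is easy to reproduce numerically. The following sketch (illustrative code, not part of the thesis programs; parameter values are arbitrary) simulates increments of a CD process and compares the volatility estimate from the sample standard deviation of increments with the drift estimate from their sample mean.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 0.5, 1.0      # true drift and volatility (arbitrary values)
T, n, m = 1.0, 100, 10    # horizon, steps per path, number of realisations
dt = T / n

dB = np.sqrt(dt) * np.random.randn(m, n)   # Brownian increments
dX = mu * dt + sigma * dB                  # increments of the CD process

sigma_hat = dX.std() / np.sqrt(dt)   # volatility estimate: accurate
mu_hat = dX.mean() / dt              # drift estimate: its standard error here is
                                     # sigma/sqrt(m*T) ~ 0.3, comparable to mu itself
print(sigma_hat, mu_hat)
```

Even with ten realisations the volatility estimate is within a few percent of its true value, while the drift estimate remains dominated by Brownian noise.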
2.5.2 Areas for pairs of diffusion processes
The key to classifying paths in our diffusion process market model is to derive general
expressions for the area differentials of pairs of realisations of CD or MR processes.
For two CD processes (X^{i_k}_t)_{0≤t≤T} with drift and volatility parameters μ_{i_k} and σ_{i_k},
where i_k ∈ {1, ..., n} for k ∈ {1, 2}, both of which are driven by the same Brownian
path (B_t)_{0≤t≤T}, by substituting (2.9) and (2.11) into (2.2) one readily obtains

dA(i_1, i_2)_{0,t} = (1/2) (μ_{i_1} σ_{i_2} − μ_{i_2} σ_{i_1}) (t dB_t − B_t dt)   (2.13)
for 0 ≤ t ≤ T. Similarly, for two MR processes (Y^{i_k}_t)_{0≤t≤T} with long-term mean
and volatility parameters α_{i_k} and σ_{i_k} and the same mean reversion speed θ, where
i_k ∈ {1, ..., n} for k ∈ {1, 2}, both of which are driven by the same Brownian path
(B_t)_{0≤t≤T}, by substituting (2.10) and (2.12) into (2.2) and after some algebra one
arrives at the following expression:

dA(i_1, i_2)_{0,t} = (1/2) ((α_{i_1} − Y^{i_1}_0) σ_{i_2} − (α_{i_2} − Y^{i_2}_0) σ_{i_1}) (f_t(θ) dB_t − θ Z_t(θ) dt)   (2.14)

where f_t(θ) = 1 − e^{−θt} and Z_t(θ) = ∫_0^t e^{−θ(t−s)} dB_s for 0 ≤ t ≤ T. We note that
as θ → 0, f_t(θ) ≈ θt and Z_t(θ) → B_t, so in the limit when the mean reversion
speed approaches zero, we recover (2.13) with the drifts μ_{i_k} equal to the initial drifts
θ(α_{i_k} − Y^{i_k}_0) of the two MR processes for k ∈ {1, 2}.
Just as straightforwardly one could also derive a (somewhat messier) expression
for the area differential of a CD process and an MR process driven by the same
Brownian path as a linear combination of dBt and dt terms whose coefficients are
functions of ft(θ), t, Bt and Zt(θ) as well as the parameters of the two processes, but
since this won’t be used in our subsequent analysis it is omitted.
Hence, as can be seen from (2.13) and (2.14), the increment of the area of two
CD or two MR processes with the same mean reversion speed and driven by the
same Brownian path over a small (non-infinitesimal) time interval δt has a stochastic
dependence on the cumulative value of the underlying Brownian motion as well as a
deterministic time dependence, and, depending on the relative magnitudes and signs
of these quantities, the area increment can have either the same or the opposite sign
as the Brownian increment δBt at different points in time along the path. Thus, in
this sense increments of the area process of two such CD or MR processes are in gen-
eral neither perfectly correlated nor anti-correlated with increments of the Brownian
motion driving the processes.
However, the key observation is that the ratio of area increments of two pairs
of CD processes or two pairs of MR processes that have the same mean reversion
speed, with all four processes driven by the same Brownian path, is a deterministic,
time-invariant constant that depends only on the drift and volatility parameters of
the four CD processes or on the long-term mean and volatility parameters of the
four MR processes (as well as their initial values, but is independent of the mean
reversion speed shared by the four MR processes). This means that in these cases the
trajectory t ↦ (A(i_1, i_2)_{0,t}, A(i_3, i_4)_{0,t}) is a straight line of constant gradient for t ∈ [0, T],
or, equivalently, the third order area A((i_1, i_2), (i_3, i_4))_{0,t} is identically equal to zero for all
t ∈ [0, T]. This is illustrated in Figure 6 below where, at time points t_i = (i/100)T for
i = 0, 1, ..., 100, the area of two MR processes is plotted against the area of another
pair of MR processes with different long-term mean and volatility parameters but
all four processes having the same mean reversion speed and driven by the same
Brownian path.
Figure 6: Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
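The straight-line property can be verified numerically. The sketch below (illustrative code with arbitrarily chosen parameters, not the thesis implementation; it uses CD processes for simplicity) computes discrete area processes by trapezoidal integration, which is exact for piecewise linear paths, and checks that by (2.13) both areas are proportional to the same underlying process, so that their own area, the third order area, vanishes up to floating-point rounding.

```python
import numpy as np

def area_process(x, y):
    """Discrete Levy area A(x, y)_{0,t} of two piecewise linear paths,
    computed exactly by the trapezoidal rule."""
    dx, dy = np.diff(x), np.diff(y)
    xm = 0.5 * (x[:-1] + x[1:]) - x[0]   # segment midpoints relative to start
    ym = 0.5 * (y[:-1] + y[1:]) - y[0]
    return np.concatenate([[0.0], np.cumsum(0.5 * (xm * dy - ym * dx))])

np.random.seed(1)
T, n = 1.0, 100
t = np.linspace(0.0, T, n + 1)
B = np.concatenate([[0.0], np.cumsum(np.sqrt(T / n) * np.random.randn(n))])

# Four CD processes X = X_0 + mu*t + sigma*B_t, all driven by the same B
# (parameter values below are arbitrary)
def cd(x0, mu, sigma):
    return x0 + mu * t + sigma * B

A12 = area_process(cd(0.0, 0.5, 1.0), cd(1.0, -0.3, 0.8))   # c12 = 0.7
A34 = area_process(cd(0.5, 1.2, 0.4), cd(0.0, 0.7, 1.5))    # c34 = 1.52

# The trajectory (A12, A34) is a straight line of gradient c34/c12,
# so the third order area is identically zero
A3 = area_process(A12, A34)
print(np.max(np.abs(A3)))   # zero up to floating-point rounding
```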
To underline the fundamental importance of the assumptions of the same mean
reversion speed for MR processes and a common Brownian driving path to the linear
relationship between areas of pairs of CD or MR processes, Figure 7 below illustrates
the dramatic impact that changing the mean reversion speed – even slightly – for one
of the MR processes that were used to create Figure 6 can have on this relationship
– it suddenly becomes highly non-linear!
Figure 7: Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
In fact, the area process of a pair of CD or MR processes with the same mean
reversion speed can sometimes be completely ‘derailed’ in this manner by changing
any parameter of one of the processes for the duration of a single time step only
while having a barely perceptible effect on the future evolution of the process that
has been momentarily perturbed. So, any hope that the above result on the linear
relationship between area processes could be extended to CD or MR processes with
deterministic time-dependent drift or volatility parameters is unfortunately forlorn (as
is also evident from a theoretical perspective by relaxing the parameter assumptions
in (2.13) and (2.14) above) – it only works for constant parameters.
If instead one pairs a CD process with an MR process and plots their area against
the area of a pair of either two CD processes or two MR processes or against that of
another mixed pair of CD and MR processes, one can no longer expect to observe a
linear relationship between the two area processes even though all four path processes
may still be driven by the same Brownian motion. And, naturally, the same will be
true for a scatter plot of the areas of a pair of CD paths and a pair of MR paths.
Figure 8 below exhibits the areas of a pair of CD processes and a mixed pair of CD
and MR processes plotted against each other for one Brownian sample path – and for
different realisations of Brownian motion this scatter plot would look very different!
Figure 8: Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes all driven by the same Brownian path.
As a further example of intriguing non-linear behaviour that can be witnessed in
some such cases, in Figure 9 below we display the final values of the areas of two
mixed pairs of CD and MR processes at the end of the time interval [0, T] for 500
Brownian sample paths.
As can be seen from Figure 9, different Brownian paths have quite disparate effects
on these two area processes resulting in a highly non-linear scatter plot. It is rather
surprising that by simply changing some of the parameter values (e.g. the long-term
means of the two MR processes) the relationship between the two area processes can
be rendered approximately linear, as is shown in Figure 10 below, so that for each
Brownian path both of these areas evolve in essentially the same way. Furthermore,
the linear relationship can be made even tighter by increasing mean reversion speeds
for the MR processes (without affecting the slope).

Figure 9: Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
As said, for the areas of two pairs of CD or MR processes to be linearly related,
in addition to all four processes having the same mean reversion speed (regarding
CD processes as having zero mean reversion speed even though in general they have
non-zero drifts), it is also a crucial requirement that they should all be driven by
the same Brownian path. To illustrate this point, in Figure 11 below are plotted
the terminal values of the areas of the two pairs of MR processes that were used to
produce Figure 6 for 500 simulation runs under the scenarios where the Brownian
motions driving the four MR processes have pairwise correlations of 1.00, 0.99 and
0.90, respectively. We observe in the scatter plots increasing dispersion about the
straight line as correlation is lowered. Indeed, this effect is quite pronounced even for a
marginal reduction in correlation from 1.00 to 0.99 though this makes the Brownian
paths diverge only slightly from one another. Reducing correlation further to 0.90 –
which is still a high value for path correlation – renders the linear relationship scarcely
discernible, and for lower values of correlation the points appear to be scattered at
random without any recognisable pattern.

Figure 10: Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
Alternatively, one could consider the third order area of four CD or MR processes
with the same mean reversion speed for a single simulation run (with the same four
sequences of independent Gaussian increments) for different values of the pairwise
correlation ρ. By examining the entries of the Cholesky factorization of the correlation
matrix² that is used to produce correlated Gaussian increments from four sequences
of independent drawings from the standard normal distribution, it is not too difficult
to see that, for values of ρ close to 1, the third order area is O(√(1 − ρ²)), so that,
for example, reducing correlation from 99.99% to 99.00% increases the third order
area approximately by a factor of 10. However, when all the diffusion processes have
the same parameters – so that the four paths are realisations of the same CD or MR
process for highly correlated Brownian motions – in the expression (2.7) for the third
order area all the fourth order iterated integrals that are O(√(1 − ρ²)) actually cancel
out, wherefore in this case the third order area is in fact O(1 − ρ²), and, for this
reason, it grows twice as fast on a logarithmic scale as correlation is reduced from 1.

Figure 11: Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs with the pairwise correlation between Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.

² For it can readily be shown that each of the entries below the top row (1, ρ, ρ, ρ) of the upper triangular Cholesky factor tends to some constant multiple of √(1 − ρ²) as ρ approaches 1.
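The footnoted claim about the Cholesky factor can be checked directly. In the sketch below (illustrative code; the two ρ values are the ones quoted in the text), we factorise the 4×4 equicorrelation matrix and confirm that the entries below the first row of the upper triangular factor shrink in proportion to √(1 − ρ²) as ρ → 1, so that lowering correlation from 99.99% to 99.00% scales them up by roughly a factor of 10.

```python
import numpy as np

def chol_equicorr(rho):
    """Upper triangular Cholesky factor of the 4x4 equicorrelation matrix."""
    C = np.full((4, 4), rho)
    np.fill_diagonal(C, 1.0)
    return np.linalg.cholesky(C).T   # numpy returns the lower factor

U1 = chol_equicorr(0.9999)
U2 = chol_equicorr(0.9900)

# Entries below the top row (1, rho, rho, rho) are O(sqrt(1 - rho^2));
# e.g. U[1,1] = sqrt(1 - rho^2) exactly
ratio = U2[1, 1] / U1[1, 1]
expected = np.sqrt((1 - 0.99**2) / (1 - 0.9999**2))
print(ratio, expected)   # both close to 10
```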
2.5.3 Classifying sample paths of diffusion processes by using third order areas
In the previous subsection, we established, via (2.13) and (2.14), the fundamental
result that the area process of the areas of two pairs of paths – i.e. the third order
area (of this type) of the four paths concerned – that are realisations of either CD
processes (with arbitrary drift and volatility parameters) or MR processes which
have the same mean reversion speed (but arbitrary long-term means and volatilities)
is identically zero as a function of time whenever all of these processes are driven by
the same Brownian motion.
We also saw that the third order area provides a sensitive means of detecting
differences in mean reversion speeds between four given diffusion processes that are
driven by the same Brownian motion, as the value of the third order area of their
sample paths can deviate spectacularly from zero even when discrepancies in mean
reversion speeds are so slight that their impacts on the sample paths are hardly visible.
Thus, third order areas can be used to classify sample paths of diffusion processes
according to their mean reversion speeds: given an arbitrary ‘market’ path that
is assumed to be the realisation of a CD or MR process for some Brownian sample
path, we can combine it with three other paths that are realisations of either CD
or MR processes with the same mean reversion speed (but drift or long-term mean
and volatility parameters that can be given arbitrary values) all driven by the same
Brownian motion, and run a sufficiently large number of simulations to determine the
value of mean reversion speed and the Brownian sample path that minimise the third
order area of these four paths. More precisely, we want to find the combination of
mean reversion speed and Brownian sample path that minimises the Euclidean norm
of the third order area over the time interval [0, T], as given by the expression below:

( Σ_{i=0}^{n} ( A((1, 2), (3, 4))_{0, i(T/n)} )² )^{1/2}
where n is the number of time steps per simulation. In this way, computing third
order areas of market paths with sets of three test paths of known characteristics can
be used to differentiate between the two basic modes of market behaviour, namely
upward or downward trending (with constant drift) versus mean-reverting.
However, going back to our earlier discussion in Subsection 2.5.1, since an arbitrary
market path can be represented as the realisation of either a CD or MR process
(with any given parameters) for some Brownian sample path, there are no absolute
grounds for regarding one path as ‘upward trending’, say, and another path as ‘mean-
reverting’ if one works with the whole sample space of Brownian motion, that is,
the set of all continuous paths. To make the distinction between these two types of
path meaningful, we need to restrict, for the purposes of our analysis, the cardinality
of the set of Brownian sample paths driving diffusion processes to a finite number.
Nevertheless, this number, which we will denote by N , can be chosen to be arbitrarily
large.
Let us now illustrate our path classification method with a numerical example. As a
first step, we generate 1,000 Brownian sample paths, each comprising 100 independent
Gaussian increments (so in this example n = 100 and N = 1000). For our market
path, we choose the realisation of an MR process with mean reversion speed θmkt
equal to 5.1 (and some arbitrary long-term mean and volatility) for the first Brownian
sample path (i.e. of index 1). Then, for three ‘test’ MR processes which have the same
mean reversion speed θtest (and fixed but arbitrary long-term means and volatilities),
we compute the third order area of the quadruple of market and test paths that are
the realisations of the test MR processes for each of the 1,000 Brownian sample paths
– so that in each case the three test paths are driven by the same Brownian motion
– and for each value of θtest ∈ {0.0, 1.0, . . . , 9.0} we record the minimum Euclidean
norm of the third order area and the index of the Brownian sample path that attains
the minimum area value³. The results of this experiment are displayed in Table 1.
Table 1: Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.
³ For θ_test = 0.0, the test paths are actually realisations of CD processes with different drifts and volatilities. Hence, for a market path that is the realisation of some CD process, the third order area would be identically zero when the test paths are driven by the first Brownian sample path.
There are several interesting observations that one can make from these results.
Firstly, we note a general tendency for the third order area to increase with increasing
value of θtest – which is indeed what one would expect on the basis of (2.14). However,
this trend is bucked at θtest = 5.0 – the value of mean reversion speed for the test
paths that is closest to that of the market path (recall θmkt = 5.1) – where we observe
a sharp dip in the minimum value of the Euclidean norm of the third order area
that is attained when the test paths are driven by the same Brownian motion as the
market path. It is also worth noticing that while for θtest = 6.0 the minimum third
order area is also attained by the same Brownian sample path (of index 1) – though
with much higher minimum value compared to when θtest = 5.0 – for all other values
of θtest different Brownian sample paths are responsible for producing the minima, as
shown in Table 1.
Having thus found an approximate value for θmkt – namely, that it is around 5.0 –
as well as correctly identified the Brownian sample path that drives the market path,
we could proceed to determine θmkt to an arbitrary degree of accuracy by iterating on
the value of θtest to make the third order area converge towards zero. However, there
is a more direct and efficient way to determine the exact value of θmkt as well as those
of the other parameters: since both the market path and the Brownian sample path
driving it are now completely known, we can write down equations for, say, the first
three increments of the market path using (2.10), and as these are three simultaneous
equations that are linear in θmkt, σmkt and θmktαmkt, they can be easily solved for
the mean reversion speed, volatility and long-term mean of the MR process whose
realisation the market path is.
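This last step can be sketched as follows (illustrative code with arbitrary parameter values; the Euler-discretised form of (2.10) is used to generate the increments, so the recovery is exact for data produced by the same scheme). Writing each increment as δX_k = (θα)δt − θX_k δt + σδB_k, three increments give a 3×3 linear system in the unknowns (θα, θ, σ).

```python
import numpy as np

theta, alpha, sigma = 5.1, 0.5, 1.0    # true MR parameters (arbitrary values)
dt = 0.01
dB = np.array([0.12, -0.05, 0.08])     # the (now known) Brownian increments

# First values of the market path from the Euler form of (2.10):
# dX_k = theta*(alpha - X_k)*dt + sigma*dB_k
X = [1.0]
for k in range(3):
    X.append(X[-1] + theta * (alpha - X[-1]) * dt + sigma * dB[k])
X = np.array(X)
dX = np.diff(X)

# Three equations, linear in u = (theta*alpha, theta, sigma):
# dX_k = u[0]*dt - u[1]*X_k*dt + u[2]*dB_k
M = np.column_stack([np.full(3, dt), -X[:3] * dt, dB])
u = np.linalg.solve(M, dX)

theta_hat, sigma_hat = u[1], u[2]
alpha_hat = u[0] / u[1]
print(theta_hat, alpha_hat, sigma_hat)   # recovers 5.1, 0.5, 1.0 up to rounding
```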
As this example demonstrates, in a market model where the evolution of every
variable is the realisation of a CD or MR diffusion process for one of finitely many
Brownian sample paths, the above method of computing third order areas can be used
to classify market paths into upward/downward trending versus mean-reverting ones
by efficiently determining the values of mean reversion speed and other parameters
as well as the Brownian sample path that drives the diffusion process.
In order to simulate the application of this path classification method to real mar-
ket data streams (i.e. arbitrary continuous paths), we also carried out an experiment
where market paths were realisations of CD or MR processes for Brownian sample
paths not belonging to the set of Brownian paths that were used to drive test paths.
Unfortunately, unlike in the previous experiment, in this experiment the approximate
value of mean reversion speed of the market path could not be identified as the value
of mean reversion speed for test paths that produces the lowest third order area –
even when the market and test paths had exactly the same mean reversion speed
their third order area did not stand out by having a distinctively low, or even zero,
value, and in most cases the Brownian driver of test paths that minimised third order
area bore no resemblance to the Brownian path driving the market path! Moreover,
increasing the number of Brownian sample paths from 1,000 to 10,000 or even 100,000
did not improve the performance of this path classification method. Evidently, the
lesson to learn from this latter exercise is that even in a very large but finite space
of Brownian sample paths the probability of chancing upon a path that is, in some
defined sense, close to a given arbitrary path is vanishingly small – in fact zero – and
in the previous subsection we saw how even tiny discrepancies in the driving paths of
CD or MR processes can have a dramatic impact on their area processes!
In view of this negative (though not at all surprising) result, it is clear that the
proposed method of classifying paths according to their mean reversion speeds is not
applicable, without modification, in the general market setting where paths of market
variables can be thought of as realisations of diffusion processes for arbitrary sample
paths of Brownian motion (i.e. any continuous paths). However, if the problem could
somehow be reduced to one involving a space of only finitely many Brownian sample
paths, this method would be rendered viable, as was shown in the first experiment.
One idea in this direction would be to try to use third order areas of quadruples of
an arbitrary market path and three CD or MR test paths computed, as in the above
experiments, for a range of mean reversion speeds and a finite number of Brownian
sample paths, in order to approximate a given market path by a sum of realisations
of CD and MR processes for paths in a fixed, finite sample space of Brownian motion.
This suggestion will be pursued by the author as a line of future research outside the
scope of this thesis.
2.6 Conclusion
We began this chapter by surveying existing literature for applications of the theory
of rough paths to time series analysis, and found that so far the two main approaches
have been to use the signatures of multi-dimensional discrete data streams as feature
sets in linear regression for the purposes of statistical classification and prediction –
an approach that can capture subtle underlying features of market behaviour when
applied to financial data – and to employ expected signatures of multi-dimensional
stochastic processes for estimating their parameters.
In our original research work presented in this thesis, we have taken a different,
more direct (non-statistical and non-probabilistic) approach by exploring ways in
which first and higher order areas of multi-dimensional data streams – an nth order
area being a specific linear combination of (n + 1)! signature components of order
(n+1) – can be used to classify data streams according to their basic characteristics.
Having first developed all the requisite mathematical and computational tools for
this task, we have shown, as a particular application of this approach, that in a
market model where every variable follows either a Wiener process with a constant
drift or a mean-reverting Ornstein-Uhlenbeck process driven by one of finitely many
Brownian sample paths, third order areas provide an efficient means of determining
the parameters of a market variable given any of its realisations, thus enabling one to
distinguish between the two fundamental modes of market behaviour, namely upward
or downward trending versus mean-reverting.
In conclusion, our path classification method based on third order areas represents
a novel way of using signature components of multi-dimensional paths for the purposes
of time series analysis of financial data streams.
An interesting idea for future research would be to investigate the possibility of
using third order areas as a tool for decomposing arbitrary market paths into mean-
reverting path components with a spectrum of mean reversion speeds.
References
[1] K. T. Chen. Iterated path integrals. Bulletin of the American Mathematical Society,
83:831–879, 1977.
[2] G. Flint, B. Hambly, and T. Lyons. Discretely sampled signals and the rough
Hoff process. Stochastic Processes and their Applications, arXiv:1310.4054v11,
2016.
[3] X. Geng. Reconstruction for the signature of a rough path. Preprint
arXiv:1508.06890v2, 2016.
[4] M. Gubinelli. Ramification of rough paths. Journal of Differential Equations,
248(4):693–721, 2010.
[5] L. J. Gyurko, T. Lyons, M. Kontkowski, and J. Field. Extracting information
from the signature of a financial data stream. Preprint arXiv:1307.7244v2, 2014.
[6] B. M. Hambly and T. J. Lyons. Uniqueness for the signature of a path of bounded
variation and the reduced path group. Annals of Mathematics, 171(1):109–167,
2010.
[7] B. Hoff. The Brownian frame process as a rough path. D.Phil. thesis, University
of Oxford, 2005.
[8] D. Levin, T. Lyons, and H. Ni. Learning from the past, predicting the statistics
for the future, learning an evolving system. Preprint arXiv:1309.0260v6, 2016.
[9] T. J. Lyons. Differential equations driven by rough signals. Revista Matemática
Iberoamericana, 14(2):215–310, 1998.
[10] T. J. Lyons, M. Caruana, and T. Lévy. Differential Equations Driven by Rough
Paths. Number 1908 in Lecture Notes in Mathematics. Springer-Verlag, 2007.
[11] T. J. Lyons and Z. Qian. System Control and Rough Paths. Oxford Mathematical
Monographs. Oxford University Press, 2002.
[12] R. Ree. Lie elements and an algebra associated with shuffles. Annals of Mathe-
matics, 68(2):210–220, 1958.
[13] L. C. Young. An inequality of Hölder type, connected with Stieltjes integration.
Acta Mathematica, 67:251–282, 1936.
Appendix 1: Quadratic variation and cross-variation of data streams
In this appendix we compute the signature of a multi-dimensional data stream after
lead- and lag-transforming two of its path components respectively.
Let (X_t)_{t=0}^{N} = (X^1_t e_1 + ⋯ + X^d_t e_d)_{t=0}^{N} be a data stream in V = R^d with a set of basis
vectors {e_1, ..., e_d}, and denote by (Y_{t/2})_{t=0}^{2N} the data stream derived from (X_t)_{t=0}^{N}
by lead-transforming its ith component and lag-transforming its jth component in
accordance with the GLKF method, so that Y^i_t = X^i_t and Y^j_t = X^j_t for t = 0, ..., N, and
Y^i_{t−1/2} = X^i_t and Y^j_{t−1/2} = X^j_{t−1} for t = 1, ..., N. As we are ultimately interested in the
area between the Y^i and Y^j path components – i.e. (half of) the difference between
the (i, j) and (j, i) second order signature components of Y – in our calculation we will
explicitly show only those signature components that can contribute to this quantity
(i.e. the first order increments in Y^i and Y^j in addition to the aforementioned second
order signature components).
Computing the signature of Y over the time interval [0, 1/2], we get

S(Y)_{0,1/2} = 1 + (X^i_1 − X^i_0) e_i + 0 e_j + (1/2)(X^i_1 − X^i_0)·0 e_i ⊗ e_j + (1/2)·0·(X^i_1 − X^i_0) e_j ⊗ e_i + ...
= 1 + (X^i_1 − X^i_0) e_i + ...

Likewise, S(Y)_{1/2,1} = 1 + (X^j_1 − X^j_0) e_j + ..., whence we obtain

S(Y)_{0,1} = S(Y)_{0,1/2} ⊗ S(Y)_{1/2,1} = 1 + (X^i_1 − X^i_0) e_i + (X^j_1 − X^j_0) e_j + (X^i_1 − X^i_0)(X^j_1 − X^j_0) e_i ⊗ e_j + ...
Similarly, we have

S(Y)_{1,2} = S(Y)_{1,3/2} ⊗ S(Y)_{3/2,2} = 1 + (X^i_2 − X^i_1) e_i + (X^j_2 − X^j_1) e_j + (X^i_2 − X^i_1)(X^j_2 − X^j_1) e_i ⊗ e_j + ...
which, as S(Y)_{0,1} ⊗ S(Y)_{1,2} = S(Y)_{0,2}, gives

S(Y)_{0,2} = 1 + (X^i_2 − X^i_0) e_i + (X^j_2 − X^j_0) e_j
+ {(X^i_1 − X^i_0)(X^j_1 − X^j_0) + (X^i_2 − X^i_1)(X^j_2 − X^j_1) + (X^i_1 − X^i_0)(X^j_2 − X^j_1)} e_i ⊗ e_j
+ (X^j_1 − X^j_0)(X^i_2 − X^i_1) e_j ⊗ e_i + ...
Further, extending the signature calculation to the next time step yields

S(Y)_{0,3} = 1 + (X^i_3 − X^i_0) e_i + (X^j_3 − X^j_0) e_j
+ {(X^i_1 − X^i_0)(X^j_1 − X^j_0) + (X^i_2 − X^i_1)(X^j_2 − X^j_1) + (X^i_3 − X^i_2)(X^j_3 − X^j_2)
+ (X^i_1 − X^i_0)(X^j_2 − X^j_1) + (X^i_2 − X^i_0)(X^j_3 − X^j_2)} e_i ⊗ e_j
+ {(X^j_1 − X^j_0)(X^i_2 − X^i_1) + (X^j_2 − X^j_0)(X^i_3 − X^i_2)} e_j ⊗ e_i + ...
Thus, we can see from the above expressions for S(Y)_{0,1}, S(Y)_{0,2}, S(Y)_{0,3}, etc.
that if X^i_t = X^j_t for all t = 0, ..., N – i.e. Y^i and Y^j are the lead- and lag-transforms
of the same data stream – then the difference between the (i, j) and (j, i) signature
components of Y is equal to the quadratic variation of this data stream. However,
when X^i and X^j are two different data streams, it is evident that subtracting the
(j, i) component from the (i, j) component does not in general cancel out the ‘extra’
terms in the latter, and so does not leave just the quadratic cross-variation of X^i and X^j. This can
also be demonstrated experimentally in random simulations, where (twice) the area
between lead- and lag-transformed data streams and the quadratic cross-variation of
the original streams usually have decidedly different values.
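The single-stream case is easy to reproduce numerically. The sketch below (illustrative code with a randomly generated stream; not the thesis script of Appendix 2) builds the GLKF lead-lag transform of one data stream, computes its second order signature components directly segment by segment, and checks that the difference between the (i, j) and (j, i) components equals the quadratic variation of the stream.

```python
import numpy as np

np.random.seed(4)
N = 20
X = np.cumsum(np.random.randn(N + 1))   # a one-dimensional data stream

# GLKF lead-lag transform: at half-integer times the lead component has
# already moved on to X_t while the lag component still holds X_{t-1}
lead = np.repeat(X, 2)[1:]      # X_0, X_1, X_1, X_2, X_2, ..., X_N
lag  = np.repeat(X, 2)[:-1]     # X_0, X_0, X_1, X_1, ..., X_{N-1}, X_N
Y = np.column_stack([lead, lag])

# Second order signature components of the piecewise linear path Y:
# each segment with increment d contributes outer(P - P_0, d) + outer(d, d)/2
S2 = np.zeros((2, 2))
P = np.zeros(2)                 # position relative to the starting point
for d in np.diff(Y, axis=0):
    S2 += np.outer(P, d) + 0.5 * np.outer(d, d)
    P += d

qv = np.sum(np.diff(X) ** 2)    # quadratic variation of the stream
print(S2[0, 1] - S2[1, 0], qv)  # equal
```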
Appendix 2: Python code
In this appendix we exhibit samples of source code of the computer programs written
in Python that have been used to produce all the computational results presented in
this thesis.
1. Signature function
The numerical algorithm that is fundamental to all the computational work done in
this piece of research is a function that generates a time-indexed sequence of truncated
signatures of an arbitrary degree specified by the user for a given multi-dimensional
time series. This is accomplished through the following two-step procedure: 1) by
repeated application of outer matrix multiplication (a standard Numpy functionality)
computing from path increments iterated integrals of successively higher orders for
each linear segment of the continuous multi-dimensional path constructed by linearly
interpolating between discrete data points, and constituting these into an incremen-
tal signature over the time step by appending them to a list starting with 1 as the
zeroth order component; and 2) by joining together incremental signatures for con-
tiguous time steps using tensor multiplication (by another application of outer matrix
multiplication) to form a cumulative signature over the whole time period.
The source code of our user-defined signature function is shown in full below.
import numpy as np
""" Definition of a function to compute a time-indexed sequence of truncated signatures
of a specified degree of multi-dimensional path increments """
""" For t=0,1,...,(t_fin-t_ini) and k=0,1,...,p, Sig( ) function computes signature components
(i.e. iterated integrals) S[t][k][d_1,...,d_k] with d_j=0,1,...,d-1 for j=1,...,k """
def Sig(path_increments, t_ini, t_fin, signature_degree):
    I = path_increments   # matrix of floats with n rows and d columns
    p = signature_degree  # degree of truncated signature
    n = len(I[:,0])       # number of increments per path component
    d = len(I[0,:])       # number of path components (dimension)
    # t_ini (0,...,n-1) and t_fin (1,...,n) are initial and final time points

    """ Initialize signature for time t_ini """
    Sig_0 = [1.0]
    Z = np.zeros(d)
    for i in range(p):
        s = np.multiply.outer(Sig_0[i], Z)
        Sig_0.append(s)
    Sig = Sig_0
    S = [Sig_0]

    """ Compute incremental signature for each time step and update cumulative signature """
    for i in range(t_ini, t_fin, 1):  # iterate over time points from t_ini to t_fin-1
        J = I[i,:]                    # extract ith row of path increments
        Sig_inc = [1.0]
        for j in range(p):            # compute incremental signature
            x = np.multiply.outer(Sig_inc[j], J)
            x = x/(j+1)
            Sig_inc.append(x)
        Sig_new = [1.0]
        for k in range(1, p+1, 1):    # k = 1,...,p
            y = Sig_0[k]
            for l in range(k+1):      # l = 0,1,...,k
                y = y + np.multiply.outer(Sig[l], Sig_inc[k-l])
            Sig_new.append(y)
        Sig = Sig_new                 # update cumulative signature
        S.append(Sig)                 # form sequence of signatures indexed by time
    return S                          # return value of user-defined signature function
2. Lead-lag transforms
Below we exhibit a Python script that can be used to obtain lead and lag transforms
of arbitrary multi-dimensional data streams in accordance with the three methods
examined in Section 2.3, and to compute their signatures – to check that they are
indeed invariant under these transforms – as well as to visualise the original and
lead-lag transformed paths.
import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib.ticker import MultipleLocator
majorLocator = MultipleLocator(0.20)
minorLocator = MultipleLocator(0.10)
""" Set dimensionality and number of time steps """
d = 3 # number of paths (dimensionality)
n = 10 # number of increments per path
p = 3 # degree of signature
""" Generate original Brownian paths """
np.random.seed(1) # initialize random number generation
I = np.random.randn(n, d) # standard normal random increments
z = np.zeros(d)
dP = np.vstack( (z, I) )
P = np.cumsum(dP, axis=0) # original Brownian paths
S = Sig(I, 0, n, p) # compute their signature
""" Lead transform paths (GLKF method) """
c1 = np.delete( np.repeat(P[:,0], 2), 0) # lead transform first path
c2 = np.delete( np.repeat(P[:,1], 2), 0) # lead transform second path
c3 = np.delete( np.repeat(P[:,2], 2), 0) # lead transform third path
R = np.column_stack( (c1, c2, c3) ) # lead transformed paths
R1 = np.delete(R, 0, 0) # delete top row
R2 = np.delete(R, 2*n, 0) # delete bottom row
J = R1 - R2 # increments of lead transformed paths
T = Sig(J, 0, 2*n, p) # compute their signature
""" Lag transform paths (GLKF method) """
C1 = np.delete( np.repeat(P[:,0], 2), (2*n + 1)) # lag transform first path
C2 = np.delete( np.repeat(P[:,1], 2), (2*n + 1)) # lag transform second path
C3 = np.delete( np.repeat(P[:,2], 2), (2*n + 1)) # lag transform third path
Q = np.column_stack( (C1, C2, C3) ) # lag transformed paths
Q1 = np.delete(Q, 0, 0) # delete top row
Q2 = np.delete(Q, 2*n, 0) # delete bottom row
K = Q1 - Q2 # increments of lag transformed paths
U = Sig(K, 0, 2*n, p) # compute their signature
""" Lead and lag transformed paths have the same signature as the original paths! """
print S # original signature
print T # lead transformed signature
print U # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
u = np.linspace(0.0, 1.0, (2*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(u, R[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(u, Q[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Gyurko-Lyons-Kontkowski-Field Lead-Lag Transforms', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
""" Lead transform paths (old FHL method) """
d1 = np.insert( np.repeat(P[:,0], 2), 2*n+2, P[n,0] ) # lead transform first path
d1 = np.delete( np.delete(d1, 1), 1)
d2 = np.insert( np.repeat(P[:,1], 2), 2*n+2, P[n,1] ) # lead transform second path
d2 = np.delete( np.delete(d2, 1), 1)
d3 = np.insert( np.repeat(P[:,2], 2), 2*n+2, P[n,2] ) # lead transform third path
d3 = np.delete( np.delete(d3, 1), 1)
V = np.column_stack( (d1, d2, d3) ) # lead transformed paths
V1 = np.delete(V, 0, 0) # delete top row
V2 = np.delete(V, 2*n, 0) # delete bottom row
L = V1 - V2 # increments of lead transformed paths
X = Sig(L, 0, 2*n, p) # compute their signature
""" Lag transform paths (old FHL method) """
D1 = np.insert( np.repeat(P[:,0], 2), 0, P[0,0] ) # lag transform first path
D1 = np.delete( np.delete(D1, 2*n), 2*n)
D2 = np.insert( np.repeat(P[:,1], 2), 0, P[0,1] ) # lag transform second path
D2 = np.delete( np.delete(D2, 2*n), 2*n)
D3 = np.insert( np.repeat(P[:,2], 2), 0, P[0,2] ) # lag transform third path
D3 = np.delete( np.delete(D3, 2*n), 2*n)
W = np.column_stack( (D1, D2, D3) ) # lag transformed paths
W1 = np.delete(W, 0, 0) # delete top row
W2 = np.delete(W, 2*n, 0) # delete bottom row
M = W1 - W2 # increments of lag transformed paths
Y = Sig(M, 0, 2*n, p) # compute their signature
""" Lead and lag transformed paths have the same signature as the original paths! """
print S # original signature
print X # lead transformed signature
print Y # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
u = np.linspace(0.0, 1.0, (2*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(u, V[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(u, W[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Flint-Hambly-Lyons Lead-Lag Transforms: Old Method', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
""" Lead transform paths (new FHL method) """
e1 = np.delete( np.repeat(np.delete(P[:,0], 0), 4) , 0) # lead transform first path
e1 = np.insert( np.insert( e1, 4*n - 1, P[n,0]), 4*n, P[n,0])
# in order to preserve signature, set ...
#e1[0] = 0.0
e2 = np.delete( np.repeat(np.delete(P[:,1], 0), 4) , 0) # lead transform second path
e2 = np.insert( np.insert( e2, 4*n - 1, P[n,1]), 4*n, P[n,1])
# in order to preserve signature, set ...
#e2[0] = 0.0
e3 = np.delete( np.repeat(np.delete(P[:,2], 0), 4) , 0) # lead transform third path
e3 = np.insert( np.insert( e3, 4*n - 1, P[n,2]), 4*n, P[n,2])
# in order to preserve signature, set ...
#e3[0] = 0.0
v = np.column_stack( (e1, e2, e3) ) # lead transformed paths
v1 = np.delete(v, 0, 0) # delete top row
v2 = np.delete(v, 4*n, 0) # delete bottom row
l = v1 - v2 # increments of lead transformed paths
x = Sig(l, 0, 4*n, p) # compute their signature
""" Lag transform paths (new FHL method) """
E1 = np.repeat(np.delete(P[:,0], n), 4) # lag transform first path (corrected way)
E1 = np.insert( E1, 4*n, P[n,0])
E1[4*n]=E1[4*n-1] # incorrect FHL definition
E2 = np.repeat(np.delete(P[:,1], n), 4) # lag transform second path (corrected way)
E2 = np.insert( E2, 4*n, P[n,1])
E2[4*n]=E2[4*n-1] # incorrect FHL definition
E3 = np.repeat(np.delete(P[:,2], n), 4) # lag transform third path (corrected way)
E3 = np.insert( E3, 4*n, P[n,2])
E3[4*n]=E3[4*n-1] # incorrect FHL definition
w = np.column_stack( (E1, E2, E3) ) # lag transformed paths
w1 = np.delete(w, 0, 0) # delete top row
w2 = np.delete(w, 4*n, 0) # delete bottom row
m = w1 - w2 # increments of lag transformed paths
y = Sig(m, 0, 4*n, p) # compute their signature
""" Neither lead nor lag transform preserves signature! """
print S # original signature
print x # lead transformed signature
print y # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
f = np.linspace(0.0, 1.0, (4*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(f, v[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(f, w[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Flint-Hambly-Lyons Lead-Lag Transforms: New Method', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
3. Area simulations
The code sample displayed below is an extract from a program that, for a set of
constant-drift Wiener and mean-reverting Ornstein-Uhlenbeck processes with user-
specified parameters and correlation matrix, computes first, second and third order
areas of their sample paths as functions of time for a given number of simulation
runs, using the closed-form formulae in terms of second, third and fourth order
iterated integrals, respectively, worked out in Section 2.4.
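As a sanity check on the signature-based formula used by Area1 below, namely one half of the antisymmetrised second order iterated integrals, the first order (Lévy) area of a piecewise-linear path can also be accumulated directly from its increments. The following stand-alone sketch is a hypothetical helper for such a cross-check, not part of the thesis code:

```python
import numpy as np

def levy_area_direct(P):
    """First order (Levy) area of a piecewise-linear 2-D path P of shape
    (n+1, 2), computed as 0.5 * integral of (x - x_0) dy - (y - y_0) dx,
    with the integral evaluated exactly on each linear segment."""
    dP = np.diff(P, axis=0)
    # segment midpoints relative to the starting point of the path
    mx = 0.5*(P[:-1, 0] + P[1:, 0]) - P[0, 0]
    my = 0.5*(P[:-1, 1] + P[1:, 1]) - P[0, 1]
    return 0.5*np.sum(mx*dP[:, 1] - my*dP[:, 0])
```

For the L-shaped path (0,0) -> (1,0) -> (1,1) this gives 0.5, agreeing with the value obtained from the antisymmetrised second order iterated integrals of that path.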
import numpy as np
import sys
""" Set parameters for simulation """
d = 8 # number of paths (i.e. dimensionality of market)
n = 100 # number of increments per path (i.e. number of time steps)
t_ini = 0 # initial time point
t_fin = n # final time point
p = 4 # degree of signature
dt = 1.0/n # length of time step
sqrdt = np.sqrt(dt)
rho = 0.99 # correlation of any pair of paths
M = 1000 # number of simulation runs
np.random.seed(1) # initialize random number generation
""" Set parameters for diffusion processes """
P = np.zeros((d, 3))
""" Each row of matrix P specifies mean reversion speed (set 0.0 for any constant drift Wiener
process), long-term mean or drift, and volatility of an Ornstein-Uhlenbeck or Wiener process """
P[0] = [0.0, 10.0, 5.0]
P[1] = [0.0, 5.0, 10.0]
P[2] = [0.0, -10.0, 5.0]
P[3] = [0.0, -5.0, 10.0]
P[4] = [10.0, 10.0, 5.0]
P[5] = [10.0, 5.0, 10.0]
P[6] = [10.0, -10.0, 5.0]
P[7] = [10.0, -5.0, 10.0]
iv = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] # initial values of the paths
correlation = 'same' # 'same' to use the same correlation (rho) for all pairs of paths
""" Choose pairs of paths for two first order areas """
# indices run from 0 to d-1
# set i1 indices to be less than d/2, i2 indices d/2 or greater; similarly for higher order areas
i1 = [0,1]
i2 = [4,5]
""" Choose triples of paths for two second order areas """
j1 = [0,1,2]
j2 = [4,5,6]
""" Choose quadruples of paths for two third order areas of type (a) """
k1 = [0,1,2,3]
k2 = [4,5,6,7]
""" Choose quadruples of paths for two third order areas of type (b) """
l1 = [0,1,2,3]
l2 = [4,5,6,7]
""" Specify distinct correlations for each pair of paths """
R = np.zeros((d, d))
R[0] = [1.0, 0.5, 0.4, 0.7, 0.3, 0.2, 0.4, 0.6]
R[1] = [0.0, 1.0, 0.3, 0.4, 0.2, 0.5, 0.6, 0.4]
R[2] = [0.0, 0.0, 1.0, 0.6, 0.4, 0.3, 0.5, 0.2]
R[3] = [0.0, 0.0, 0.0, 1.0, 0.4, 0.5, 0.3, 0.7]
R[4] = [0.0, 0.0, 0.0, 0.0, 1.0, 0.3, 0.2, 0.5]
R[5] = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.4, 0.2]
R[6] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5]
R[7] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
R = R + np.transpose(R) - np.eye(d) # correlation matrix
if rho < 1.0 or correlation != 'same':
    if correlation == 'same': # use the same correlation (rho) for all pairs of paths
        R = np.ones((d,d))*rho + np.eye(d)*(1 - rho)
    E = np.linalg.eigvals(R) # compute eigenvalues of correlation matrix
    if np.any(E <= 0):
        sys.exit("Correlation matrix is not positive definite")
    else:
        L = np.linalg.cholesky(R) # lower triangular Cholesky factor of correlation matrix
        U = np.transpose(L) # upper triangular Cholesky factor of correlation matrix
""" Simulation loop """
# For multiple simulation runs (M > 1), store first and higher order areas at the final time point (n)
A1 = [ ] # first order areas for first pair of paths
B1 = [ ] # first order areas for second pair of paths
A2 = [ ] # second order areas for first triple of paths
B2 = [ ] # second order areas for second triple of paths
A3a = [ ] # third order areas of type (a) for first quadruple of paths
B3a = [ ] # third order areas of type (a) for second quadruple of paths
A3b = [ ] # third order areas of type (b) for first quadruple of paths
B3b = [ ] # third order areas of type (b) for second quadruple of paths
# For a single simulation run (M = 1), store first and higher order areas at each time point (0, 1, ..., n)
a1 = [ ] # first order areas for first pair of paths
b1 = [ ] # first order areas for second pair of paths
a2 = [ ] # second order areas for first triple of paths
b2 = [ ] # second order areas for second triple of paths
a3a = [ ] # third order areas of type (a) for first quadruple of paths
b3a = [ ] # third order areas of type (a) for second quadruple of paths
a3b = [ ] # third order areas of type (b) for first quadruple of paths
b3b = [ ] # third order areas of type (b) for second quadruple of paths
for m in range(M):
    """ Generate independent standard normal path increments """
    dW = np.random.randn(n,d)
    if rho == 1.0 and correlation == 'same':
        for k in range(d):
            dW[:,k] = dW[:,0]
    """ Correlate path increments """
    if rho == 1.0 and correlation == 'same':
        dY = dW
    else:
        dY = np.dot(dW, U)
    """ Create correlated Brownian paths """
    Z = np.zeros(d)
    Z = np.vstack( (Z, dY*sqrdt) )
    Z = np.cumsum(Z, axis=0)
    """ Create path increments for diffusion processes """
    for k in range(d):
        if P[k, 0] == 0.0: # create increments for a Wiener process with drift
            dY[:,k] = dY[:,k]*P[k,2]*sqrdt + P[k,1]*dt
        else: # create increments for an Ornstein-Uhlenbeck process
            X = np.zeros(n+1)
            dX = np.zeros(n)
            X[0] = iv[k]
            for l in range(n): # l = 0,1,..., n-1
                dX[l] = P[k,0]*(P[k,1] - X[l])*dt + P[k,2]*sqrdt*dY[l,k]
                X[l+1] = X[l] + dX[l]
            dY[:,k] = dX
    """ Create paths for diffusion processes """
    Y_0 = iv # initial values of paths
    Y = np.vstack( (Y_0, dY) )
    X = np.cumsum(Y, axis=0) # entire paths
    """ Compute sequence of time-indexed signatures of multi-dimensional path increments """
    S1 = Sig(dY[:,:(d/2)], t_ini, t_fin, p)
    S2 = Sig(dY[:,(d/2):], t_ini, t_fin, p)
    s1 = S1[t_fin - t_ini] # signatures over the whole time interval
    s2 = S2[t_fin - t_ini]
    x1 = s1[2] # second order iterated integrals
    x2 = s2[2]
    if p > 2:
        y1 = s1[3] # third order iterated integrals
        y2 = s2[3]
    if p > 3:
        z1 = s1[4] # fourth order iterated integrals
        z2 = s2[4]
    """ Function to compute first order area """
    def Area1(i1, i2, x):
        # i1 is index of first path
        # i2 is index of second path
        # x is array of second order iterated integrals
        A1 = 0.5*(x[i1,i2] - x[i2,i1])
        return A1
    """ Function to compute second order area """
    def Area2(j1, j2, j3, y):
        # j1 is path to be integrated with area
        # j2 is first path for area
        # j3 is second path for area
        # y is array of third order iterated integrals
        # second order area A(j1,(j2,j3)) is computed
        A2 = 0.25*(y[j1,j2,j3] - y[j1,j3,j2] + y[j2,j1,j3] - y[j2,j3,j1] - y[j3,j1,j2] + y[j3,j2,j1])
        return A2
    """ Functions to compute third order areas (two types) """
    def Area3a(k1, k2, k3, k4, z):
        # k1 is path to be integrated with second order area
        # k2 is path to be integrated with area
        # k3 is first path for area
        # k4 is second path for area
        # z is array of fourth order iterated integrals
        # third order area A(k1,(k2,(k3,k4))) is computed
        A3a = 0.125*(z[k1,k2,k3,k4] + z[k2,k1,k3,k4] + z[k2,k3,k1,k4]
                     + z[k1,k3,k2,k4] + z[k3,k1,k2,k4] + z[k3,k2,k1,k4]
                     + z[k1,k4,k3,k2] + z[k4,k1,k3,k2] + z[k4,k3,k1,k2]
                     - z[k1,k2,k4,k3] - z[k2,k1,k4,k3] - z[k2,k4,k1,k3]
                     - z[k1,k3,k4,k2] - z[k3,k1,k4,k2] - z[k3,k4,k1,k2]
                     - z[k1,k4,k2,k3] - z[k4,k1,k2,k3] - z[k4,k2,k1,k3]
                     - z[k2,k3,k4,k1] - z[k3,k2,k4,k1] - z[k4,k3,k2,k1]
                     + z[k2,k4,k3,k1] + z[k3,k4,k2,k1] + z[k4,k2,k3,k1])
        return A3a
    def Area3b(l1, l2, l3, l4, z):
        # l1 is first path for first area
        # l2 is second path for first area
        # l3 is first path for second area
        # l4 is second path for second area
        # z is array of fourth order iterated integrals
        # third order area A((l1,l2),(l3,l4)) is computed
        A3b = 0.125*(z[l1,l2,l3,l4] + z[l1,l3,l2,l4] + z[l3,l1,l2,l4]
                     - z[l1,l2,l4,l3] - z[l1,l4,l2,l3] - z[l4,l1,l2,l3]
                     - z[l2,l1,l3,l4] - z[l2,l3,l1,l4] - z[l3,l2,l1,l4]
                     + z[l2,l1,l4,l3] + z[l2,l4,l1,l3] + z[l4,l2,l1,l3]
                     + z[l4,l3,l1,l2] + z[l4,l1,l3,l2] + z[l1,l4,l3,l2]
                     - z[l4,l3,l2,l1] - z[l4,l2,l3,l1] - z[l2,l4,l3,l1]
                     - z[l3,l4,l1,l2] - z[l3,l1,l4,l2] - z[l1,l3,l4,l2]
                     + z[l3,l4,l2,l1] + z[l3,l2,l4,l1] + z[l2,l3,l4,l1])
        return A3b
    A1.append(Area1(i1[0], i1[1], x1))
    B1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))
    if p > 2:
        A2.append(Area2(j1[0],j1[1],j1[2],y1))
        B2.append(Area2(j2[0]-(d/2),j2[1]-(d/2),j2[2]-(d/2),y2))
    if p > 3:
        A3a.append(Area3a(k1[0],k1[1],k1[2],k1[3],z1))
        B3a.append(Area3a(k2[0]-(d/2),k2[1]-(d/2),k2[2]-(d/2),k2[3]-(d/2),z2))
        A3b.append(Area3b(l1[0],l1[1],l1[2],l1[3],z1))
        B3b.append(Area3b(l2[0]-(d/2),l2[1]-(d/2),l2[2]-(d/2),l2[3]-(d/2),z2))
    if M == 1:
        for t in range(t_fin - t_ini + 1): # iterate over all time points (t = 0, 1, ..., n)
            s1 = S1[t] # signatures over time interval from 0 to time point t
            s2 = S2[t]
            x1 = s1[2] # second order iterated integrals
            x2 = s2[2]
            if p > 2:
                y1 = s1[3] # third order iterated integrals
                y2 = s2[3]
            if p > 3:
                z1 = s1[4] # fourth order iterated integrals
                z2 = s2[4]
            a1.append(Area1(i1[0], i1[1], x1))
            b1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))
            if p > 2:
                a2.append(Area2(j1[0],j1[1],j1[2],y1))
                b2.append(Area2(j2[0]-(d/2),j2[1]-(d/2),j2[2]-(d/2),y2))
            if p > 3:
                a3a.append(Area3a(k1[0],k1[1],k1[2],k1[3],z1))
                b3a.append(Area3a(k2[0]-(d/2),k2[1]-(d/2),k2[2]-(d/2),k2[3]-(d/2),z2))
                a3b.append(Area3b(l1[0],l1[1],l1[2],l1[3],z1))
                b3b.append(Area3b(l2[0]-(d/2),l2[1]-(d/2),l2[2]-(d/2),l2[3]-(d/2),z2))
# end of simulation loop
4. Path classification
Our final code sample, shown below, is an extract from the source code of a program
that implements our novel method of classifying paths as either upward/downward
trending or mean-reverting. The method is based on computing the third order area
of a given market path together with three test paths that are realisations of
diffusion processes with the same mean reversion speed, all driven by the same
Brownian path.
For each value of mean reversion speed assigned to the test processes, the program
computes the minimum Euclidean norm of the third order area of the given market
path and three test paths over all the finitely many Brownian paths that can drive
the processes, and identifies the Brownian path that attains the minimum. The
functions Sig and Area3b are as defined in the preceding listings.
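The Euler stepping of the market and test paths inside the loops below could be factored into a small helper. The following sketch is a hypothetical refactoring, not code from the thesis; it mirrors the branching used in the listings, where a zero mean reversion speed denotes a Wiener process with constant drift:

```python
import numpy as np

def diffusion_path(theta, mu, sigma, x0, dW, dt):
    """Euler scheme mirroring the listings: theta == 0 gives a Wiener process
    with constant drift mu, otherwise an Ornstein-Uhlenbeck process reverting
    to the long-term mean mu at speed theta. dW holds the Brownian increments
    (already scaled by sqrt(dt))."""
    X = np.empty(len(dW) + 1)
    X[0] = x0
    for m in range(len(dW)):
        drift = mu*dt if theta == 0.0 else theta*(mu - X[m])*dt
        X[m+1] = X[m] + drift + sigma*dW[m]
    return X
```

With sigma set to zero the scheme reduces to the deterministic skeleton of the process: a straight line of slope mu in the Wiener case, and (approximate) exponential relaxation towards mu in the Ornstein-Uhlenbeck case.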
import numpy as np
""" Set parameters for simulation """
n = 100 # number of increments per path (i.e. number of time steps)
t_ini = 0 # initial time point
t_fin = n # final time point
p = 4 # degree of signature
dt = 1.0/n # length of time step
sqrdt = np.sqrt(dt)
""" Generate Brownian paths driving market and test paths """
N = 1000 # number of Brownian paths
np.random.seed(4) # initialize random number generation
dV = np.random.randn(N, n) # standard normal increments
dW = np.transpose(dV) # for given n and random seed, dW[:,0] are increments of the first path for any N
W = np.zeros(N)
W = np.vstack( (W, sqrdt*dW) )
W = np.cumsum(W, axis=0) # Brownian paths
""" Set up matrices of increments and paths for computing third order areas """
dX = np.zeros((4,n,N)) # matrix of increments of four Wiener/OU paths for all underlying Brownian paths
X = np.zeros((4,n+1,N)) # matrix of four Wiener/OU paths for all underlying Brownian paths
a3b = np.zeros(n+1) # time-indexed sequence of third order areas for market and test paths
en = np.zeros(N) # Euclidean norm of third order area of market and test paths for each Brownian path
M = np.zeros((10,3)) # for each value of mrs, minimum norm of third order area and Brownian path that attains it
""" Initialize market and test paths """
ini_val = [0.0, 0.0, 0.0, 0.0] # initial values of market and test paths
for i in range(4):
X[i,0,:] = ini_val[i]*np.ones(N)
""" Set parameters for market path """
theta = 5.1 # mean reversion speed 'mrs' (theta = 0.0 for a Wiener process with constant drift)
mu = 7.5 # drift/long-term mean of Wiener/Ornstein-Uhlenbeck process
sigma = 10.0 # volatility
P = np.zeros((4, 3)) # matrix of parameters for market and test paths
P[0] = [theta, mu, sigma] # market path parameters
""" Generate market path """
if P[0,0] == 0.0: # create increments for a Wiener process with constant drift
    for m in range(n): # m = 0,1,..., n-1
        dX[0,m,0] = P[0,1]*dt + P[0,2]*sqrdt*dW[m,0] # the first Brownian path drives the market path
        X[0,m+1,0] = X[0,m,0] + dX[0,m,0]
else: # create increments for an Ornstein-Uhlenbeck process
    for m in range(n):
        dX[0,m,0] = P[0,0]*(P[0,1] - X[0,m,0])*dt + P[0,2]*sqrdt*dW[m,0]
        X[0,m+1,0] = X[0,m,0] + dX[0,m,0]
""" Set parameters for test paths """
P[1] = [1.0, 2.0*mu, 0.5*sigma] # first column entries to be multiplied by 'mrs' in the loop below
P[2] = [1.0, 0.0, 2.0*sigma]
P[3] = [1.0, (-1.0)*mu, 1.5*sigma]
""" Iterate over values of mean reversion speed (mrs) for test paths """
for mrs in range(10):
    """ Generate test paths for all underlying Brownian paths """
    for i in range(1, 4, 1): # i = 1, 2 and 3
        if mrs == 0.0: # create increments for a Wiener process with constant drift
            for m in range(n):
                dX[i,m,:] = P[i,1]*dt + P[i,2]*sqrdt*dW[m,:]
                X[i,m+1,:] = X[i,m,:] + dX[i,m,:]
        else: # create increments for an Ornstein-Uhlenbeck process
            for m in range(n):
                dX[i,m,:] = mrs*P[i,0]*(P[i,1] - X[i,m,:])*dt + P[i,2]*sqrdt*dW[m,:]
                X[i,m+1,:] = X[i,m,:] + dX[i,m,:]
    """ Compute signature of market and test paths for each underlying Brownian path """
    for j in range(N):
        I = np.column_stack( ( dX[0,:,0], dX[1,:,j], dX[2,:,j], dX[3,:,j] ) ) # path increment matrix
        S = Sig( I, t_ini, t_fin, p)
        for t in range(t_fin - t_ini + 1):
            z = S[t][4] # fourth order iterated integrals over interval from 0 to time point t
            a3b[t] = Area3b(0,1,2,3,z) # third order area of market and test paths
        en[j] = np.sqrt( sum(a3b**2) ) # Euclidean norm of third order area over the whole time interval
    M[mrs] = [ mrs, np.amin(en), np.argmin(en)+1 ]
print M