Rough Paths Theory and its Application to Time Series
Analysis of Financial Data Streams
Antti K. Vauhkonen
Christ Church
University of Oxford
A thesis submitted in partial fulfillment of the degree of Master of Science in Mathematical Finance
Trinity 2017
Abstract
The signature of a continuous multi-dimensional path of bounded varia-
tion, i.e. the sequence of its iterated integrals, is a central concept in the
theory of rough paths. The key insight of this theory is that for any path of
finite p-variation with p ≥ 1 (e.g. sample paths of Brownian motion have
finite p-variation for any p > 2 almost surely), one can define a construct
analogous to signature, called its rough path lift, that incorporates all the
information required for solving controlled differential equations driven
by the given path. In the first part of this thesis we give an intuitive yet
mathematically rigorous introduction to rough paths.
Information encoded in the signatures of multi-dimensional discrete data
streams can also be utilised in their time series analysis, and in some
recent publications signatures of financial data streams have been used
as feature sets in linear regression for the purposes of classifying data
and making statistical predictions. In the second part of this thesis we
present a novel application of this signature-based approach in the context
of a market model where every variable is assumed to follow a diffusion
process that either has a constant underlying drift or reverts to some
long-term mean. Specifically, we show that third order areas of financial
data streams – special linear combinations of their fourth order iterated
integrals – provide an efficient means of determining the parameters of a
market variable given one of its realisations in a space of finitely many
Brownian sample paths that can drive the process, and thus enable one
to distinguish between the two fundamental modes of market behaviour,
namely upward or downward trending versus mean-reverting.
An interesting line of future research would be to investigate the possibility
of using third order areas as a tool for decomposing arbitrary market paths
into mean-reverting path components with a spectrum of mean reversion
speeds.
To the memory of my mother.
Acknowledgements
I would like to express my gratitude to my academic supervisor Prof.
Ben Hambly for his technical guidance, careful reading of my thesis and
valuable comments.
I also owe a big debt of gratitude to Dr. Daniel Jones for giving his time
so generously, for his wise counsel, and for his constant encouragement and
support, without which this thesis would probably never have been completed.
My sincere thanks are also due to my family for their help, support and
understanding while I worked on this thesis over a period that at times
must have seemed interminable.
Lastly, with love and eternal gratitude I remember my late mother, my
most steadfast supporter in all of my varied endeavours, who sadly didn’t
live to see this project reach its conclusion.
Contents
1 Rough paths theory
  1.1 Origins of rough paths
  1.2 Formal definition of rough paths
2 Application of rough paths theory to time series analysis of financial data streams
  2.1 Classical time series analysis
  2.2 Signatures as feature sets for linear regression analysis
  2.3 Lead and lag transforms of data streams
    2.3.1 Gyurko-Lyons-Kontkowski-Field method
    2.3.2 Flint-Hambly-Lyons method (Mark 1)
    2.3.3 Flint-Hambly-Lyons method (Mark 2)
  2.4 Area processes of multi-dimensional paths
    2.4.1 Definition and basic properties of areas
    2.4.2 Higher order areas
  2.5 Classification of paths using third order areas
    2.5.1 Diffusion process market model
    2.5.2 Areas for pairs of diffusion processes
    2.5.3 Classifying sample paths of diffusion processes by using third order areas
  2.6 Conclusion
References
Appendix 1: Quadratic variation and cross-variation of data streams
Appendix 2: Python code
List of Figures
1 GLKF method of lead-lag transforming data streams.
2 FHL (Mark 1) method of lead-lag transforming data streams.
3 FHL (Mark 2) method of lead-lag transforming data streams.
4 Area between path components X^i and X^j.
5 A typical 2-dimensional Brownian sample path.
6 Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
7 Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
8 Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes, all driven by the same Brownian path.
9 Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
10 Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
11 Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs, with the pairwise correlation between the Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.
List of Tables
1 Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.
Chapter 1
Rough paths theory
1.1 Origins of rough paths
There is no more fundamental question in science than that pertaining to the nature
of change. Since antiquity thinkers have pondered over problems concerning motion,
as illustrated by the famous paradoxes of Zeno. In one of them, Zeno argued that a
flying arrow occupies a particular position in space at any given instant of time, hence
is instantaneously motionless, and since time consists of instants, he concluded that
motion is just an illusion; and in another paradox the Greek hero Achilles was unable
to overtake a tortoise in a race where the latter had been given a head start, for in
order to accomplish this he would need to traverse an infinite number of (progressively
shorter) distances, which, according to Zeno, is impossible in a finite amount of time.
While the arrow paradox can be seen simply as an acute observation that motion
has no meaning with respect to a single instant of time – in fact any set of instants
which has zero measure – the notion of an infinite series that is convergent to a limit
provides a satisfactory resolution to the Achilles and tortoise paradox: specifically,
that a geometric series like 1/2 + 1/4 + 1/8 + 1/16 + ∙ ∙ ∙ that arises in the equivalent
dichotomy paradox does not grow without limit but converges to 1, enabling Achilles
to quickly pass the tortoise.
The concept of a limit of a function f that expresses the dependence of a variable y
on another variable x as y = f(x) was similarly crucial to the development in the late
1600s of modern differential and integral calculus which provides proper analytical
tools for the mathematical study of change. The chief among these is the derivative of
a function, usually denoted by f′(x), ẋ-style ḟ(x) or df(x)/dx, which, as the last
notation due to Leibniz suggests, was originally conceived as the quotient of an
infinitesimally small change df(x) in the value of the function f(x) corresponding
to an infinitesimally small change dx in the value of its argument x, until
derivatives were defined in a more rigorous way using the (ε, δ)-definition of a
limit in the early 19th century.
Rather than needing to differentiate given functions, one is often faced with the
(usually harder) inverse problem of finding a function F(x) whose derivative is a
given function f(x), i.e. solving the differential equation

dF(x)/dx = f(x).   (1.1)

By the fundamental theorem of calculus, such an antiderivative F(x) of f(x) is the
same as an indefinite integral of f(x), i.e.

F(x) = ∫_a^x f(z) dz

for any constant a < x in the domain of f where it is continuous.
Differential equations first emerged in the context of dynamical systems as a way
to implicitly describe their time evolution, and most fundamental laws in the mathe-
matical sciences from fluid dynamics and electromagnetism to general relativity and
quantum mechanics – and also mathematical finance – are expressed in terms of dif-
ferential equations. For example, if in (1.1), relabelling the independent variable t
for time, f(t) is a time-varying force acting on a body of mass m, then, according to
Newton’s second law of motion, the momentum mv(t) = m dx(t)/dt of the body, where
x(t) is its position at time t, is a solution of this differential equation. Indeed, this
is the first type of differential equation Newton considered and solved using infinite
series in his Methodus fluxionum et Serierum Infinitarum of 1671.
The second type of differential equations that Newton studied in the same work
are of the form

dy/dx = f(x, y),

and we will be especially interested in the special case where f is a function of the
unknown variable y only, i.e.

dy/dx = f(y).   (1.2)
However, not all functions are differentiable. Up to the second half of the 19th
century, it was a general belief among mathematicians that continuous functions had
to be everywhere differentiable except at some isolated singular points, until the
first examples of ‘pathological’ continuous functions that are nowhere differentiable
were constructed by Riemann and Weierstrass. Actually, far from being pathological,
such functions are in fact the norm rather than the exception, for almost all continuous
functions – viewed as sample paths of a Brownian motion – can be seen to be nowhere
differentiable!
For non-differentiable functions, we would like to generalise (1.2) and write it (in
a manner of Leibniz) in the following form:
dy = f(y) dx (1.3)
– hoping that we can still make sense of it subject to some conditions! We can think
of (1.3) as describing a dynamical system that evolves in such a way that the change
in its state y = y_t over an infinitesimally small time interval [t, t + dt] is given by the
product of its velocity in the current state, as specified by the vector field f on the
state space, and the corresponding increment in the control process xt driving the
system.
In general, the state space of a dynamical system is some manifold that is locally
either a Euclidean space or a Banach space, so that in these two cases (1.3) can be
rewritten as

dy_t = ∑_{i=1}^{d} f_i(y_t) dx^i_t   (1.4)

where y_t ∈ R^e, f_i : R^e → R^e and x^i_t ∈ R for i = 1, . . . , d, or

dy_t = f(y_t) dx_t   (1.5)

where y_t ∈ U, x_t ∈ V and f : U → L(V, U) with U and V (possibly infinite-
dimensional) Banach spaces. Equations of the type of (1.4) and (1.5) are called
controlled differential equations.
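To make the notion of a controlled differential equation concrete, here is a minimal first-order Euler sketch in Python (the language of Appendix 2) for a smooth driving path; the function name and the exponential test driver are illustrative assumptions of this transcript, not part of the thesis. It is precisely this naive scheme that breaks down for rough drivers, which motivates the rest of the chapter.

```python
import math
import numpy as np

def euler_cde(f, y0, x):
    """Naive first-order Euler scheme for dy_t = sum_i f_i(y_t) dx^i_t.

    f  : callable mapping a state y (shape (e,)) to an (e, d) matrix
         whose columns are the vector fields f_i evaluated at y
    y0 : initial state, shape (e,)
    x  : sampled driving path, shape (N+1, d)
    """
    y = np.array(y0, dtype=float)
    for k in range(len(x) - 1):
        y = y + f(y) @ (x[k + 1] - x[k])  # dy ~ f(y) dx over one step
    return y

# Usage: with the smooth driver x_t = t, the equation dy = y dx has the
# exact solution y_t = y_0 e^t, which the scheme reproduces closely.
x = np.linspace(0.0, 1.0, 10001).reshape(-1, 1)
y_T = euler_cde(lambda y: y.reshape(1, 1), [1.0], x)
```
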
Thus, assuming that initially at time 0 the system is in state y_0, solving for its
state y_t at an arbitrary time t > 0 involves iterating its equation of motion (1.5) an
infinite number of times and integrating all the infinitesimally small local changes
into a global change over the time interval [0, t], so that

y_t = y_0 + ∫_0^t f(y_u) dx_u.   (1.6)

Within the theory of rough paths, whose formal definition will be given in the next
section, the function I_f : (x_t, y_0) ↦ y_t is called the Ito map associated with the vector
field f.
In other words, in the language of differential geometry, for finding y_t we should
be able to integrate the one-form f on V, with values in the linear space of vector
fields on U, with respect to the path x_t in V. As one might expect, this cannot be done
in general without imposing some regularity conditions on f and x_t. It is a classical
result (the Picard-Lindelof theorem, see [10, Theorem 1.3]) that if the vector field f is
Lipschitz continuous and the control process x_t is a path of bounded variation in V,
then, for any initial condition y_0 ∈ U, (1.5) has a unique solution given by (1.6) where
the integral is defined as a Stieltjes integral. Under the weaker condition that f is
merely continuous, a solution is still guaranteed to exist by the Cauchy-Peano theorem
(see [10, Theorem 1.4]), but it may not be unique. But for less regular driving signals
– e.g. sample paths of a Brownian motion – classical integration methods are known
to fail. Let us see why this is the case by considering the formal solution of controlled
differential equations using iteration.
For simplicity, we shall consider the 1-dimensional case where x_t, y_t and f all take
values in R. Hence, formally integrating (1.4) gives

∫_{u=s}^{u=t} dy_u = ∫_{u=s}^{u=t} f(y_u) dx_u.   (1.7)

Under any reasonable definition of an integral, the left hand side of (1.7) must be
equal to δy_{s,t} := y_t − y_s, so we have

δy_{s,t} = ∫_{u=s}^{u=t} f(y_u) dx_u.   (1.8)
Further, expanding f about y_s in a formal Taylor series (effectively assuming that f
is an analytic function) on the right hand side of (1.8), then using (1.8) to substi-
tute integral expressions for the increments δy_{s,t} in the Taylor series expansion, and
repeating the procedure yields after three iterations

δy_{s,t} = f(y_s) ∫_{u=s}^{u=t} dx_u
         + f′(y_s)f(y_s) ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} dx_v dx_u
         + f′(y_s)^2 f(y_s) ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} ∫_{w=s}^{w=v} dx_w dx_v dx_u
         + (1/2) f″(y_s)f(y_s)^2 ∫_{u=s}^{u=t} ( ∫_{v=s}^{v=u} dx_v ) ( ∫_{w=s}^{w=u} dx_w ) dx_u + . . .
   (1.9)
where the remaining terms (not shown above) all involve fourth or higher order iter-
ated integrals. Moreover, provided that the above integrals satisfy the usual integra-
tion by parts formula, the integral in the last term can be written as

∫_{u=s}^{u=t} ( ∫_{v=s}^{v=u} dx_v ) ( ∫_{w=s}^{w=u} dx_w ) dx_u = 2 ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} ∫_{w=s}^{w=v} dx_w dx_v dx_u.
In the general multi-dimensional case, an expression analogous to (1.9) can be
just as easily derived for each component y^j_t of y_t for j = 1, . . . , e. For each integer
n ≥ 1, let us formally define nth order componentwise iterated integrals of a path x_t
in R^d over the time interval [s, t] by

x^{i_1,...,i_n}_{s,t} := ∫_{u_n=s}^{u_n=t} . . . ∫_{u_1=s}^{u_1=u_2} dx^{i_1}_{u_1} . . . dx^{i_n}_{u_n}   (1.10)

for i_1, . . . , i_n ∈ {1, . . . , d}. In particular, x^i_{s,t} = x^i_t − x^i_s for i = 1, . . . , d, so the first
order iterated integrals of x_t ∈ R^d are just its componentwise linear increments over
[s, t]. Then, for each j ∈ {1, . . . , e}, we have

y^j_t = y^j_s + ∑_{n=1}^{∞} ∑_{i_1,...,i_n ∈ {1,...,d}} F^j_{i_1,...,i_n}(y_s) x^{i_1,...,i_n}_{s,t}   (1.11)

where the functions F^j_{i_1,...,i_n} : R^e → R are products of partial derivatives of components
of the vector fields f^j_i : R^e → R evaluated at y_s for i = 1, . . . , d and j = 1, . . . , e, as
in (1.9).
As illustrated by (1.11), the importance of iterated integrals for solving controlled
differential equations is due to the fact that the local behaviour of the solution is
controlled by the sequence of iterated integrals of the path driving the equation –
assuming that the series in (1.11) converges and a solution does indeed exist. However,
this is by no means always the case. We need to remind ourselves that the solution
in (1.11) was derived under the strongest possible condition on the vector field f
(namely analyticity), and, moreover, we didn’t specify how the iterated integrals in
(1.10) should be constructed, but rather tacitly assumed that they can be canonically
defined as limits of Riemann sums even though we also didn’t impose any condition
on the regularity of the path x_t. To advance beyond the classical Picard-Lindelof and
Cauchy-Peano theorems, one would like to be able to solve (1.5) for vector fields which
satisfy some mildly stronger form of continuity than plain continuity, and for paths
which are not of bounded variation but whose irregularity – colloquially, roughness –
is nevertheless controlled.
For this purpose we introduce the concept of p-variation of a path. For a closed
bounded time interval [0, T], a subdivision D of [0, T] will be taken to mean a
finite ordered set of real numbers (t_0, t_1, . . . , t_k) such that 0 = t_0 < t_1 < ∙ ∙ ∙ < t_k = T,
and we denote the set of all subdivisions of [0, T] by D([0, T]). Then we make the following

Definition 1.1. Let x : [0, T] → R^d be a continuous function. Then, for any real
number p ≥ 1, the p-variation of x on [0, T] is defined by

‖x‖_{p,[0,T]} = ( sup_{D ∈ D([0,T])} ∑_{h=1}^{k} |x_{t_h} − x_{t_{h−1}}|^p )^{1/p}

where | ∙ | denotes the Euclidean norm on R^d.
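For sampled data, the supremum in Definition 1.1 over subdivisions drawn from the sample times can be computed exactly by dynamic programming. The following sketch (the helper name is an assumption of this transcript, not the thesis's) does so in O(N^2):

```python
import numpy as np

def p_variation(x, p):
    """p-variation of a discrete path (Definition 1.1 with subdivisions
    restricted to the sample times), via dynamic programming:
    best[j] = sup over subdivisions of [t_0, t_j] ending at t_j."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                  # treat a scalar path as d = 1
    n = len(x)
    best = np.zeros(n)
    for j in range(1, n):
        incs = np.linalg.norm(x[j] - x[:j], axis=1) ** p
        best[j] = np.max(best[:j] + incs)
    return best[-1] ** (1.0 / p)

# Usage: the zig-zag path 0, 1, 0, 1, 0 has 1-variation 4 (total
# variation) and 2-variation 2, since every unit move contributes 1^p.
```

Note that the dynamic programme realises the supremum, not a mesh limit, in line with the remark following the definition.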
The concept of p-variation can also be straightforwardly extended to paths x_t that
take values in an arbitrary Banach space V by replacing the Euclidean norm with a
norm ‖ ∙ ‖_V on V in the above definition. Up to reparameterisation, a path having
finite p-variation is equivalent to its being Holder continuous with exponent 1/p.
Of course, paths with finite 1-variation are just paths of bounded variation. It is worth
emphasising that the p-variation of a path is defined by taking the supremum over
all the subdivisions of the time interval, not as a limit as the mesh of the subdivision
tends to zero, as there are paths of finite (non-zero) p-variation with p > 1 for which
the latter is zero. One should also note that if a path has finite p-variation, then it
also has finite q-variation for all q > p.
As a major advance on the classical theory of integration, L. C. Young discovered
in 1936 (see [13]) that Stieltjes integrals can also be defined for paths which are
of unbounded variation but have finite p-variation for some p > 1, as long as the
integrand is a continuous function of finite q-variation such that 1/p + 1/q > 1. This
result allows (1.5) to be solved for paths of finite p-variation with 1 ≤ p < 2 provided
that the vector field f is Lipschitz-γ continuous with p − 1 < γ ≤ 1, and, subject to
these conditions, the Young integral, as a function of t, also has finite p-variation.
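The Young integral is the limit of ordinary left-point Riemann-Stieltjes sums as the mesh shrinks. A minimal sketch of these approximating sums on sampled paths (illustrative only; Young's theorem concerns when they converge, which the code does not prove):

```python
import numpy as np

def left_point_sum(y, x):
    """Left-point Riemann-Stieltjes sum sum_i y_{t_i} (x_{t_{i+1}} - x_{t_i}).
    Under Young's condition 1/p + 1/q > 1 these sums converge to the
    Young integral of y against x as the mesh shrinks."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(y[:-1] * np.diff(x)))

# Usage: for smooth paths the sums recover the classical integral,
# e.g. the integral of t^2 dt over [0, 1] is 1/3.
t = np.linspace(0.0, 1.0, 100001)
approx = left_point_sum(t ** 2, t)
```
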
However, even with Young’s extension of the classical theory, integration against
sample paths of stochastic processes remained tantalisingly out of reach for a long
time, as many important classes of stochastic processes have finite p-variation with
p ∈ [2, 3). In particular, almost all Brownian paths have infinite 2-variation and
finite p-variation for all p > 2 on any finite time interval – which is not to be confused
with the fundamental fact that the quadratic variation process of a Brownian motion
(B_t)_{t≥0} is deterministic, finite and equal to t – and sample paths of semi-martingales
almost surely have finite p-variation for p ∈ (2, 3).
It wasn’t until 1944 that integrals of some tractable stochastic processes against
Brownian motion were successfully defined, when K. Ito published his construction of
what is now called, in his honour, the Ito integral, which has subsequently been ex-
tended to other martingales and, further, to semi-martingales as integrators. Essentially,
the Ito integral is a Riemann-Stieltjes type of stochastic integral in that it is defined
as the limit of a sequence of Riemann sums of random variables that converges in
probability.
Unfortunately, such limits do not usually exist in a pathwise sense – which per-
haps isn’t all that surprising: while Brownian motion itself has exceedingly nice
properties (it is a Gaussian process with independent and stationary increments),
its sample paths are very rough, being (almost surely) nowhere differentiable
and having unbounded variation on any time interval (no matter how small). So,
in view of this, while Ito’s theory of stochastic integration ranks among the princi-
pal achievements of 20th century mathematics, developing a theory of integration for
Brownian motion paths would appear, on the face of it, an even more challenging
task, although some early results in this direction – notably the construction of the
Levy area of a 2-dimensional Brownian path, defined as the difference of two second
order iterated integrals – had been established even before the invention of stochastic
integrals.
Let us now examine in some detail, albeit somewhat heuristically, what goes wrong
when one tries to define iterated integrals of Brownian paths as classical Riemann
integrals, as this will give us important clues as to how one should formally define
rough paths. But first, as a precursor, let us briefly return to the construction of
iterated integrals for more regular paths.
For a continuous path x_t = (x^1_t, . . . , x^d_t) ∈ R^d on a time interval [0, T] all of whose
components are differentiable functions of t, we can define its nth iterated integrals
x^{i_1,...,i_n}_{s,t} over a subinterval [s, t] for any n ≥ 1 as limits of the sequences of Riemann
sums

S^n_{s,t}(N) = ∑_{i_n=1}^{N} ∑_{i_{n−1}=1}^{i_n} ∙ ∙ ∙ ∑_{i_1=1}^{i_2} (x^{i_1}_{t_{i_1}} − x^{i_1}_{t_{i_1−1}}) . . . (x^{i_{n−1}}_{t_{i_{n−1}}} − x^{i_{n−1}}_{t_{i_{n−1}−1}}) (x^{i_n}_{t_{i_n}} − x^{i_n}_{t_{i_n−1}})   (1.12)

where t_{i_k} − t_{i_k−1} = (t − s)/N for k = 1, . . . , n, so that

x^{i_1,...,i_n}_{s,t} = lim_{N→∞} S^n_{s,t}(N).   (1.13)
Assuming that (t − s) ≪ 1, we have, by Taylor’s theorem, that

x^{i_k}_{t_{i_k}} − x^{i_k}_{t_{i_k−1}} = ẋ^{i_k}(t_{i_k−1}) (t − s)/N + o((t − s)/N),

which, when substituted into (1.12), implies, by (1.13), that x^{i_1,...,i_n}_{s,t} ∼ (t − s)^n.
If x_t has bounded variation on [0, T], then its iterated integrals can be similarly
defined as Stieltjes integrals, and we also have x^{i_1,...,i_n}_{s,t} ∼ (t − s)^n.
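A quick numerical sketch of (1.12)-(1.13) for a smooth scalar path (the function name is an assumption of this transcript): the nested sum over the discrete simplex approximates the second iterated integral, and over [0, h] the value scales like h^2/2, exhibiting the (t − s)^n behaviour just claimed.

```python
import numpy as np

def second_iterated_integral(x):
    """Second-order iterated integral of a sampled scalar path via the
    nested Riemann sums of (1.12): the sum over j <= i of dx_j dx_i,
    with the inner sum realised as a cumulative sum."""
    dx = np.diff(np.asarray(x, dtype=float))
    return float(np.sum(np.cumsum(dx) * dx))

# Usage: for x_t = t the exact value over [0, 1] is 1/2, and over
# [0, 0.1] it is 0.1^2 / 2 = 0.005.
val_1 = second_iterated_integral(np.linspace(0.0, 1.0, 10001))
val_h = second_iterated_integral(np.linspace(0.0, 0.1, 10001))
```
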
Even when x_t is of unbounded variation but has finite p-variation for some p ∈
(1, 2), its iterated integrals can still be defined in this way as Young integrals, but now
x^{i_1,...,i_n}_{s,t} ∼ (t − s)^{n/p}. Thus, in common with paths of bounded variation, this means
that also in this case x^{i_1,...,i_n}_{s,t} = o(t − s) for any n ≥ 2, so that second and higher order
iterated integrals all become negligible as t → s.
Finally, let x_t be a sample path of a Brownian motion (B_t)_{0≤t≤T}, and, for the sake
of simplicity, let us assume that d = 1, i.e. the Brownian motion B_t is 1-dimensional.
It is instructive to consider iterated integrals of x_t from the viewpoint of expected
values of corresponding stochastic integrals of B_t using the defining properties of
Brownian motion – even though this will not lead us to precisely the right answers.
For example, the sum of the expected absolute increments of B_t over [0, T] in the
limit as the size of time increments tends to zero is given by

lim_{N→∞} ∑_{i=1}^{N} E[|B_{i(T/N)} − B_{(i−1)(T/N)}|] = lim_{N→∞} ∑_{i=1}^{N} √(2/π) √(T/N) = lim_{N→∞} √(2TN/π) = ∞

since, for all 0 ≤ s < t ≤ T, B_t − B_s is normally distributed with mean 0 and variance
t − s, and hence E[|B_t − B_s|] = √(2/π) √(t − s). These results suggest that a Brownian
path x_t has infinite variation on any finite time interval (as T can be made arbitrarily
small) – which is correct (almost surely) – and that x^1_{s,t} = x_t − x_s ∼ (t − s)^{1/2} – which
is almost correct.
Similarly, one can get a measure of the 2-variation of x_t on [0, T] by computing

∑_{i=1}^{N} E[(B_{i(T/N)} − B_{(i−1)(T/N)})^2] = ∑_{i=1}^{N} T/N = T,   (1.14)

which indicates that Brownian paths have finite 2-variation – which, as we know by
now, is nearly but not quite true.
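These two heuristic computations are easy to reproduce numerically. A hedged Monte Carlo sketch (the seed and step count are arbitrary choices of this transcript, not of the thesis):

```python
import numpy as np

# Simulate the increments of a Brownian path on [0, T] at mesh T/N.
rng = np.random.default_rng(0)
T, N = 1.0, 1_000_000
dB = rng.normal(0.0, np.sqrt(T / N), size=N)

abs_sum = float(np.sum(np.abs(dB)))  # ~ sqrt(2TN/pi): diverges as N grows
sq_sum = float(np.sum(dB ** 2))      # ~ T: the quadratic variation (1.14)
```

With these parameters the absolute-increment sum is already close to √(2·10^6/π) ≈ 798, while the squared sum stays close to T = 1, in agreement with the two displays above.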
Further, corresponding to the second order iterated integral x^2_{s,t} of x_t over [s, t]
defined by

x^2_{s,t} := ∫_{u=s}^{u=t} ∫_{v=s}^{v=u} dx_v dx_u
we have the following discrete stochastic approximation

∑_{i=1}^{N} ∑_{j=1}^{i} (B_{s+j(t−s)/N} − B_{s+(j−1)(t−s)/N}) (B_{s+i(t−s)/N} − B_{s+(i−1)(t−s)/N}).   (1.15)

Since disjoint increments of Brownian motion are independent, taking the expectation of
(1.15) simply yields

∑_{i=1}^{N} E[(B_{s+i(t−s)/N} − B_{s+(i−1)(t−s)/N})^2]   (1.16)
which, by (1.14), is just equal to t − s. Thus, we might conjecture that x^2_{s,t} ∼
(t − s) – which again is slightly wrong. In order not to mislead the reader any
further, let us state the correct expressions for the orders of magnitude of the first
and second iterated integrals of Brownian paths x_t: for any p > 2, x^1_{s,t} ∼ (t − s)^{1/p}
and x^2_{s,t} ∼ (t − s)^{2/p}. Hence, for Brownian paths, second iterated integrals do not
become negligible as t → s, but rather both their first and second iterated integrals
are greater than first order in (t − s).
Now it is also apparent from (1.15) and (1.16) why the second iterated integral of
a Brownian path x_t cannot be defined as the limit of Riemann sums, for this would
include the following expression

lim_{N→∞} ∑_{i=1}^{N} (x_{s+i(t−s)/N} − x_{s+(i−1)(t−s)/N})^2

which is infinite since Brownian paths have unbounded 2-variation almost surely, while
the total contribution from the cross terms involving disjoint subintervals of [s, t] can
be expected to be finite and small.
In general, if x_t ∈ R^d is a path of finite p-variation on [0, T] for some p ≥ 1, then,
motivated by the above discussion, we would like its iterated integrals – if they can
be defined – to satisfy the analytic condition x^{i_1,...,i_n}_{s,t} ∼ (t − s)^{n/p} for all 0 ≤ s < t ≤ T
and n ≥ 1. In particular, this would mean that x^{i_1,...,i_n}_{s,t} = o(t − s) for any n > ⌊p⌋,
and thus, going back to our example of solving a controlled differential equation using
iteration and assuming that the control process has finite p-variation with p ∈ [3, 4),
its solution would be given to first order in (t − s) by the terms shown in (1.9), while
ignoring any of these terms would mean that the Ito map taking the control and an
initial condition to the solution would in general fail to be continuous.
One can now also fully appreciate the significance of p = 2 as a key threshold: for
paths of finite p-variation with 1 ≤ p < 2, iterated integrals can be canonically
defined as either Riemann-Stieltjes (p = 1) or Young (1 < p < 2) integrals even
though they are not required beyond first order linear increments for solving controlled
differential equations, whereas when p ≥ 2, second and possibly also higher order
iterated integrals would be needed in the solution – if only they could be defined! As we
will see, the theory of rough paths provides a general resolution to this fundamental
dichotomy. But for now, let us just note, as already mentioned, that Brownian
motion is one of those special stochastic processes with irregular sample paths for
which second order iterated integrals can be defined pathwise – the Levy area of a
2-dimensional Brownian path involving one specific construction – and with them all
differential equations controlled by Brownian paths can be solved (subject to the vector
fields being regular enough).
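For a sampled 2-dimensional path, the Levy area reduces to an antisymmetrised, shoelace-type sum. The following sketch is for sampled paths only (not Levy's stochastic construction for Brownian motion), with the unit circle as an illustrative test case of this transcript:

```python
import numpy as np

def levy_area(path):
    """Discrete Levy area 0.5 * (x^{12} - x^{21}) of a sampled
    2-dimensional path, based at its starting point:
    0.5 * sum_i [(x_i - x_0) dy_i - (y_i - y_0) dx_i]."""
    p = np.asarray(path, dtype=float)
    p = p - p[0]                       # base the path at the origin
    x, y = p[:, 0], p[:, 1]
    return float(0.5 * np.sum(x[:-1] * np.diff(y) - y[:-1] * np.diff(x)))

# Usage: for a closed loop the Levy area equals the enclosed signed
# area, e.g. close to pi for the unit circle traversed once anticlockwise.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
area = levy_area(circle)
```
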
The problem of integrating one-forms f along a path x_t in R^d essentially boils
down to being able to give meaning to the differential dx_t. The challenges that one
faces when trying to define differentials of less regular paths are well illustrated by
considering Brownian paths. While a differentiable function x(t) of time t becomes
smooth and linear on sufficiently small time scales, so that its differential can be
expressed as dx(t) = ẋ(t) dt, there is no time scale on which a typical Brownian
path w_t could be guaranteed to behave in a regular fashion: for, as δt → 0, δw_t :=
w_{t+δt} − w_t can go through a whole range of positive and negative values – indeed
taking arbitrarily large values with non-zero probability for any δt > 0 – so that
δw_t/δt does not approach any limiting value; all we can say is that |δw_t| is expected
to be of the order δt^{1/2}. Therefore, simply knowing the linear increments of a Brownian
path is not sufficient to define its differential.
For a differential equation of the form dy_t = ∑_{i=1}^{d} f_i(y_t) dx^i_t to make sense, we
should be able to write down a full expression for the change δy_{t,t+δt} in y_t over a
small time interval [t, t + δt] that is first order in δt, which, assuming that x_t has
finite p-variation for some p ≥ 1 on [t, t + δt], involves, as we have seen above, its nth
order iterated integrals for n = 1, . . . , ⌊p⌋, supposing that these satisfy the analytic
condition x^{i_1,...,i_n}_{t,t+δt} ∼ (δt)^{n/p} for all n ≥ 1. Then, letting δt → 0, any higher order terms
become negligible and vanish for an infinitesimally small change dt, so, by (1.11), we
have that

dy_t = ∑_{n=1}^{⌊p⌋} ∑_{i_1,...,i_n ∈ {1,...,d}} F_{i_1,...,i_n}(y_t) dx^{i_1,...,i_n}_t   (1.17)

which can then be integrated with respect to time t in the usual way. In this sense, we
can say that the sequence of iterated integrals dx^{i_1,...,i_n}_t := x^{i_1,...,i_n}_{t,t+dt} over the infinitesimal
time interval [t, t + dt] for n = 1, . . . , ⌊p⌋ describes the full differential dx_t of x_t.
We have seen above that the sequence of iterated integrals of a path x_t ∈ R^d
emerges in a natural way when one (formally) solves a differential equation controlled
by x_t through iteration. Moreover, it has been known since the works of K. T. Chen
in the 1970s (see [1]) that if x_t is a path of bounded variation and one forms all the
iterated integrals of x_t into a single mathematical object, called the signature of the
path, viewing it as an element of the infinite sequence of successive tensor product
powers of R^d, then this object can be shown to possess some remarkable algebraic
(multiplicative) properties.
The central idea of the theory of rough paths is to define for any path of finite
p-variation with p ≥ 1 an analogous object, called a p-rough path, as an extension of
the path into an extended tensor product space that satisfies the relevant algebraic
conditions. Furthermore, as part of the definition, second and higher order com-
ponents of a p-rough path with p ≥ 2 – which play the role of the canonically defined
iterated integrals of more regular paths with 1 ≤ p < 2, and provide the data that
enables differential equations controlled by the rough path to be solved – are assumed
to satisfy the analytic condition prescribed above, thus extending the concept of finite
p-variation to rough paths. All of these foundational ideas will be made rigorous in
the following section, where rough paths are formally defined.
1.2 Formal definition of rough paths
Let x : [0, T] → V be a continuous path of finite 1-variation, as defined in Definition
1.1, where V = R^d with an ordered set of basis vectors {e_1, . . . , e_d}, so that, for each
integer n ≥ 1, the set {e_{i_1} ⊗ . . . ⊗ e_{i_n} : i_1, . . . , i_n ∈ {1, . . . , d}} furnishes a basis for the
nth tensor power V^{⊗n} of V. For short, we shall denote e_{i_1} ⊗ . . . ⊗ e_{i_n} by e_{i_1,...,i_n}. It is
easy to see that, for each n ≥ 1, V^{⊗n} is isomorphic as a vector space to the space of
homogeneous polynomials of degree n in non-commuting indeterminates X_1, . . . , X_d.
Hence, the extended tensor product algebra T(V) of V defined by

T(V) := R ⊕ V ⊕ V^{⊗2} ⊕ . . .

with componentwise addition and multiplication induced by the tensor product is
isomorphic as an algebra to the space of all formal power series in X_1, . . . , X_d, with the
tensor product of elements of T(V) corresponding to the product of non-commuting
polynomials.
Under the above assumptions, we define for any n ≥ 1 the nth order iterated
integral x^n_{s,t} of the path x_t ∈ V over any time interval [s, t] with 0 ≤ s < t ≤ T as an
element of V^{⊗n} by

x^n_{s,t} = ∑_{i_1,...,i_n ∈ {1,...,d}} x^{i_1,...,i_n}_{s,t} e_{i_1,...,i_n}   (1.18)

where the coefficients x^{i_1,...,i_n}_{s,t} are defined in (1.10). By the multi-linearity of ten-
sor products, iterated path integrals can be equivalently expressed in the following
coordinate-free way:

x^n_{s,t} = ∫_{u_n=s}^{u_n=t} . . . ∫_{u_1=s}^{u_1=u_2} dx_{u_1} ⊗ ∙ ∙ ∙ ⊗ dx_{u_n}   (1.19)

which is very useful as it allows this definition to be generalised to paths that take
values in arbitrary infinite-dimensional Banach spaces.
We now have all the requisite ingredients for defining an object that will serve as
the prototype for rough paths.

Definition 1.2 (Signature). Let x : [0, T] → V be a continuous path of bounded
variation taking values in a Banach space V, and let Δ_T := {(s, t) : 0 ≤ s ≤ t ≤ T}.
Then the signature S(x) : Δ_T → T(V) of x is the continuous functional mapping
(s, t) to S(x)_{s,t} := (x^0_{s,t}, x^1_{s,t}, x^2_{s,t}, . . .) where x^n_{s,t} for n ≥ 1 are the iterated integrals
defined in (1.19) and x^0_{s,t} ≡ 1.
Signatures of bounded variation paths can readily be shown to have the following
fundamental property:
Theorem 1.3 (Multiplicative property). Let S(x) be the signature of a bounded
variation path x : [0, T] → V. Then, for all 0 ≤ s ≤ u ≤ t ≤ T, we have that

    S(x)_{s,u} ⊗ S(x)_{u,t} = S(x)_{s,t} .
This result is usually called Chen's identity (even though K. T. Chen was not the
first person to discover it), and the signature S(x) of a bounded variation path x is
commonly called the Chen lift of x, since it extends – lifts – a path x in V to an
element S(x) in T(V) such that the projection of S(x)_{s,t} onto V is x^1_{s,t} = x_t − x_s.
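Chen's identity is easy to verify numerically at the level of second order iterated integrals. The following sketch is purely illustrative (the sample points and helper names are ours, not part of the formal development): it computes the level-1 and level-2 terms of a piecewise-linear path exactly, one segment at a time, and checks that the signatures over two contiguous pieces multiply to the signature of the whole path.

```python
import numpy as np

def sig2(points):
    """Level-1 and level-2 iterated integrals of the piecewise-linear path
    through the given points; a linear segment with increment D contributes
    D at level 1 and D⊗D/2 at level 2, combined via Chen's identity."""
    d = points.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for k in range(len(points) - 1):
        D = points[k + 1] - points[k]
        s2 += np.outer(s1, D) + 0.5 * np.outer(D, D)  # Chen combination
        s1 += D
    return s1, s2

pts = np.array([[0., 0.], [1., 2.], [3., 1.], [2., 4.], [5., 3.]])
a1, a2 = sig2(pts[:3])   # signature over the first two segments
b1, b2 = sig2(pts[2:])   # signature over the remaining segments
c1, c2 = sig2(pts)       # signature over the whole path

# Chen's identity: (a ⊗ b)^1 = a^1 + b^1 and (a ⊗ b)^2 = a^2 + b^2 + a^1 ⊗ b^1
assert np.allclose(a1 + b1, c1)
assert np.allclose(a2 + b2 + np.outer(a1, b1), c2)
```

The update rule s2 + s1 ⊗ D + ½ D ⊗ D used inside the loop is itself an instance of Chen's identity applied to a single linear segment, whose level-2 term is ½ D ⊗ D.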
Let x : [t, u] → V and y : [v, w] → V be two arbitrary bounded variation paths
taking values in a Banach space V. Then the concatenation of x and y is defined to
be the path x ∗ y : [t, u + w − v] → V satisfying

    x ∗ y(s) = x(s)                          for t ≤ s ≤ u,
    x ∗ y(s) = x(u) + y(s − u + v) − y(v)    for u ≤ s ≤ u + w − v.
The set of V-valued bounded variation paths, denoted by BV(V), is clearly closed
under concatenation, and, moreover, as this operation is associative, (BV(V), ∗) is
a semigroup (or even a monoid, since each trivial path x : [t, t] → V is an identity
element for the operation of concatenation).
Thus, we have

    S(x)_{t,u} ⊗ S(y)_{v,w} = S(x ∗ y)_{t,u} ⊗ S(x ∗ y)_{u,u+w−v}     (1.20)

as signatures are invariant under time translations of paths, and, further, by Chen's
identity

    S(x ∗ y)_{t,u} ⊗ S(x ∗ y)_{u,u+w−v} = S(x ∗ y)_{t,u+w−v}     (1.21)
which, combined with (1.20), shows that the range of the signature map
S : BV(V) → T(V) is closed under multiplication in T(V) induced by the tensor
product ⊗. Moreover, every element v = (v_0, v_1, v_2, . . .) of T(V) with v_0 ∈ R\{0}
possesses an inverse element, namely

    v^{-1} = (1/v_0) ∑_{n=0}^{∞} (1 − v/v_0)^{⊗n}

where 1 is the multiplicative unit element (1, 0, 0, . . .), as one can directly verify. In
particular, the subset

    T̃(V) := { (1, v_1, v_2, . . .) : v_n ∈ V^{⊗n}, n ≥ 1 } ⊂ T(V)

is a group which contains the range of the signature map as a subgroup, since the
inverse of the signature of a bounded variation path is the signature of the path ‘run
backwards’, i.e. for any x : [s, u] → V belonging to BV(V)

    (S(x)_{s,u})^{-1} = S(←x)_{s,u}     (1.22)

where ←x(t) := x(s + u − t) for s ≤ t ≤ u (see [10, Proposition 2.14]). Hence, we have
established that the signature map is a homomorphism from the monoid (BV(V), ∗)
into the group (T̃(V), ⊗).
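The inverse formula above can be checked concretely in a truncated setting. In the following illustrative sketch (dimension, numerical values and helper names are chosen arbitrarily) we work in the truncation T^{(2)}(V) with V = R²; since 1 − v/v_0 has vanishing scalar component, the series terminates at n = 2.

```python
import numpy as np

d = 2  # dimension of V, chosen arbitrarily for this sketch

def tprod(a, b):
    """Truncated tensor product in T^(2)(V) = R ⊕ V ⊕ V⊗V."""
    return (a[0] * b[0],
            a[0] * b[1] + b[0] * a[1],
            a[0] * b[2] + b[0] * a[2] + np.outer(a[1], b[1]))

def tinv(v):
    """v^{-1} = (1/v0) · sum_n (1 - v/v0)^{⊗n}; the series terminates at
    n = 2 here because u = 1 - v/v0 has zero scalar component."""
    u = (0.0, -v[1] / v[0], -v[2] / v[0])
    u2 = tprod(u, u)
    return tuple((1.0 * (k == 0) + u[k] + u2[k]) / v[0] for k in range(3))

v = (2.0, np.array([1.0, -3.0]), np.arange(4.0).reshape(2, 2))
p = tprod(v, tinv(v))
print(p[0])   # 1.0, with p[1] and p[2] vanishing: v ⊗ v^{-1} = 1
```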
In fact, the projections of the range of the signature map S : BV(V) → T(V) onto
the truncated tensor product algebras T^{(n)}(V) := R ⊕ V ⊕ V^{⊗2} ⊕ · · · ⊕ V^{⊗n} are
Lie groups, as defined below, for all n ≥ 1.
Definition 1.4. For a Banach space V, let us define

    [V,ⁿV] := { [v_n, [ . . . [v_2, v_1] . . . ]] : v_i ∈ V, 1 ≤ i ≤ n }

for n ≥ 2, with [V,¹V] := V and [V,⁰V] := {0}, where the Lie bracket is defined as
[v, w] := v ⊗ w − w ⊗ v for all v, w ∈ V. Then the space of Lie polynomials of degree
n over V, denoted by L^{(n)}(V) and defined as

    L^{(n)}(V) := ⊕_{i=0}^{n} [V,ⁱV]

is a linear subspace of T^{(n)}(V) = ⊕_{i=0}^{n} V^{⊗i}. Further, let us define the exponential
map exp_n : T^{(n)}(V) → T^{(n)}(V) by

    exp_n(x) := ∑_{i=0}^{n} x^{⊗i}/i!

for any x ∈ T^{(n)}(V). Then G^{(n)}(V) := exp_n(L^{(n)}(V)) is a group with multiplication
in T^{(n)}(V) induced by the tensor product, and is called the free nilpotent Lie group
of step n over V, or the set of group-like elements in T^{(n)}(V).
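In the step-2 case the exponential map can be written out explicitly, which makes the group-like structure easy to inspect. The sketch below is a hypothetical numerical illustration (the values of l1 and the area coefficient are arbitrary): it forms exp_2 of a Lie polynomial l1 + l2, with l1 ∈ V and l2 a multiple of the bracket [e_1, e_2], and checks that at level 2 the symmetric part of the result is determined by level 1, while the antisymmetric part – the area – is a free Lie coordinate.

```python
import numpy as np

def exp2(l1, l2):
    """exp_2 of the Lie polynomial l1 + l2 (l1 in V, l2 in [V, V]) inside
    T^(2)(V): 1 + (l1 + l2) + (l1 + l2)^{⊗2}/2, where only l1⊗l1 survives
    at level 2."""
    return 1.0, l1, l2 + 0.5 * np.outer(l1, l1)

l1 = np.array([1.0, 2.0])                         # arbitrary element of V = R^2
e1, e2 = np.eye(2)
l2 = 0.7 * (np.outer(e1, e2) - np.outer(e2, e1))  # 0.7 · [e1, e2]

g0, g1, g2 = exp2(l1, l2)
assert np.allclose(g2 + g2.T, np.outer(g1, g1))   # symmetric part fixed by level 1
assert np.allclose(0.5 * (g2 - g2.T), l2)         # antisymmetric part: the 'area'
```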
Proposition 1.5 ([10, Proposition 2.27]). For any n ≥ 1, G^{(n)}(V) coincides with the
projection of the range of the signature map S : BV(V) → T(V) onto the truncated
tensor product algebra T^{(n)}(V). Thus, every element of G^{(n)}(V) can be expressed as
the truncated signature of a bounded variation path in V.
It is also natural to enquire about the kernel of the signature map in BV(V). By
(1.22), for any path x ∈ BV(V), S(x) ⊗ S(←x) = S(x ∗ ←x) = 1, i.e. any path
concatenated with its reverse path has trivial signature, and, furthermore, any path
that can be reduced to a constant path by successively removing pairs of path segments
of the form (x, ←x) also has trivial signature. Such paths are called tree-like, and one
should point out that path segments x and ←x in such paths do not necessarily need
to be adjacent (and may even be infinitesimal): e.g. a path of the form
x ∗ y ∗ z ∗ ←z ∗ ←y ∗ ←x is tree-like and has trivial signature. As a profound converse
statement, B. Hambly and T. Lyons have proved (see [6, Theorem 1]) for bounded
variation paths in finite-dimensional Euclidean spaces R^d that a path being tree-like
is also a necessary condition for it to have trivial signature. Thus, we can define an
equivalence relation on BV(V) by x ∼ y if and only if x ∗ ←y is tree-like. Then we
have that S(x) = S(y) if and only if S(x) ⊗ S(y)^{-1} = S(x) ⊗ S(←y) = S(x ∗ ←y) = 1,
i.e. if and only if x and y are tree-like equivalent.
In addition to the above geometric interpretation of signatures as elements of T(V)
that are in one-to-one correspondence with classes of tree-like equivalent bounded
variation paths in V, the signature of each bounded variation path x : [0, T] → V
can be characterised as the solution of the following ‘universal’ rough differential
equation, i.e. a differential equation on the extended tensor product algebra T(V):

    dS_t = S_t ⊗ dx_t     (1.23)

with the initial value S_0 = 1, and where dx_t represents the element (0, x_{t+dt} − x_t, 0, . . .)
of T(V). Indeed, it is nice to observe how the signature of a path builds up through
repeated application of tensor multiplication by infinitesimal path increments dx_t in
(1.23), so that S_t = S(x)_{0,t} is the unique solution to (1.23). Thus, informally we can
think of the signature of a bounded variation path as a universal non-commutative
exponential of the path. Furthermore, this provides a succinct proof of Theorem 1.3
above – viz. the multiplicative property of signatures – since for all 0 ≤ s ≤ t ≤ T
both S(x)_{0,s} ⊗ S(x)_{s,t} and S(x)_{0,t} satisfy the same differential equation (1.23) with
the same initial condition, and hence must be equal.
We take this key characteristic of signatures of bounded variation paths as the
defining property of a more general abstract object.
Definition 1.6 (Multiplicative functional). With the above notation, a multiplicative
functional is a continuous functional x : Δ_T → T(V) with x_{s,t} = (1, x^1_{s,t}, x^2_{s,t}, . . .),
where x^n_{s,t} ∈ V^{⊗n} for n ≥ 1, satisfying the multiplicative property

    x_{s,u} ⊗ x_{u,t} = x_{s,t}     (1.24)

for all 0 ≤ s ≤ u ≤ t ≤ T.
Next we extend the concept of p-variation to multiplicative functionals.
Definition 1.7 (p-variation). Let x : Δ_T → T(V) be a multiplicative functional.
Then, for any real number p ≥ 1, the p-variation of x on [0, T] is defined by

    ‖x‖_{p,[0,T]} = sup_{n≥1} ( sup_{D∈D([0,T])} ∑_{h=1}^{k} ‖x^n_{t_{h-1},t_h}‖^{p/n}_{V^{⊗n}} )^{n/p}

where ‖·‖_{V^{⊗n}}, n ≥ 1, denote a set of compatible norms on the tensor powers V^{⊗n},
and the inner supremum is taken over all finite partitions D = {0 = t_0 < t_1 < · · · <
t_k = T} in D([0, T]).
As with paths of finite p-variation in V, we note that any multiplicative functional
of finite p-variation in T(V) also has finite q-variation for all q > p. Moreover, a
multiplicative functional has finite p-variation, as defined above, if and only if it
satisfies the condition in the following
Proposition 1.8 ([11, Proposition 3.3.2]). Let x : Δ_T → T(V) be a multiplicative
functional. Then x has finite p-variation on [0, T] for some p ≥ 1, in the sense of
Definition 1.7 above, if and only if there exists a super-additive continuous function
ω : Δ_T → R_+, called a control function, such that

    ‖x^n_{s,t}‖_{V^{⊗n}} ≤ ω(s, t)^{n/p}

for all (s, t) ∈ Δ_T and n ≥ 1.
In fact, the above condition is the same, in the general context of Banach spaces,
as the analytic condition that we formulated earlier for iterated integrals of finite
p-variation paths in Euclidean spaces, with the control function ω(s, t) = C(t − s),
where C is a constant that may depend on p. However, the reader should be
cautioned about this terminology, as we have also used the word ‘control’ for paths
that drive controlled differential equations, rather than referring to functions that
control the regularity of iterated path integrals.
Finally, we can state the formal definition of a p-rough path.
Definition 1.9 (p-rough path). A rough path of regularity p, or a p-rough path
for short, is a multiplicative functional of finite p-variation.
The first fundamental result on rough paths in the development of the theory by
T. Lyons is the following theorem ([9, Theorem 2.2.1]); it shows that only the first
⌊p⌋ components of a p-rough path really matter, since a p-rough path is uniquely
determined by its truncature at level ⌊p⌋. For this reason, we may regard p-rough
paths as multiplicative functionals of degree ⌊p⌋ with finite p-variation, taking values
in the truncated tensor product algebra T^{(⌊p⌋)}(V).
Theorem 1.10 (Truncature of p-rough path). If x and y are p-rough paths in
T(V) such that x^n_{s,t} = y^n_{s,t} for all (s, t) ∈ Δ_T and n = 1, . . . , ⌊p⌋, then x = y.
Conversely, any multiplicative functional of degree ⌊p⌋ with finite p-variation can be
uniquely extended to a multiplicative functional with finite p-variation of arbitrarily
high degree r > ⌊p⌋.
By contrast, it is important to realise that for a p-rough path x : Δ_T → T^{(⌊p⌋)}(V)
and any k satisfying 1 < k ≤ ⌊p⌋, the terms x^n_{s,t} for n = k, . . . , ⌊p⌋ are never uniquely
determined by the lower order terms x^m_{s,t} for m = 1, . . . , k − 1. For example, a sample
path of a Brownian motion in R^d can be extended to a p-rough path for any 2 < p < 3
by defining its second order components as either Ito or Stratonovich integrals, which
are distinct in general.
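This non-uniqueness is visible in a small simulation; the following sketch is purely illustrative (seed, horizon and step count are arbitrary). From one and the same Brownian sample path it builds second order components with left-point Riemann sums (the Ito choice) and with midpoint sums (the Stratonovich choice); the two lifts differ by half the realised quadratic covariation.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed for reproducibility
T, N, d = 1.0, 200_000, 2
dW = rng.normal(0.0, np.sqrt(T / N), size=(N, d))
W = np.vstack([np.zeros(d), np.cumsum(dW, axis=0)])   # one Brownian sample path

ito = np.einsum('ki,kj->ij', W[:-1], dW)                    # left-point sums
strat = np.einsum('ki,kj->ij', 0.5 * (W[:-1] + W[1:]), dW)  # midpoint sums

# The difference is half the realised quadratic covariation, which tends to
# (T/2)·identity as N grows, so the two second order components are distinct.
print(strat - ito)
```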
Thus, abstracting the concept and characteristic properties of the signature of a
bounded variation path, a p-rough path is an extension of a path of finite p-variation
taking values in a Banach space V to its extended tensor product algebra T(V) or its
truncature T^{(⌊p⌋)}(V) at level ⌊p⌋ such that the second and higher order components
– to be interpreted as, or actually defining, its iterated path integrals – satisfy an
analogous regularity condition. In other words, a p-rough path incorporates a full
sequence of iterated integrals, and hence encodes all the necessary data to provide
an unambiguous solution to any rough differential equation controlled by the p-rough
path. However, only components up to order ⌊p⌋ are needed for the solution of an
RDE. In particular, as already noted, in the classical case where 1 ≤ p < 2 only first
order linear increments of the controlling path are required for integration.
The so-called Universal Limit Theorem ([10, Theorem 5.3]), the main result of
the theory of rough paths, asserts that any rough differential equation dy_t = f(y_t) dx_t
controlled by a p-rough path x, subject to the vector field f being Lipschitz-γ
continuous with 1 ≤ p < γ, has a unique solution y that is also a p-rough path, and
the Ito map I_f : (x, y_0) ↦ y is uniformly continuous. This is a deep and satisfying
result.
As to the meaning of the defining multiplicative property of p-rough paths, this
simply corresponds to the additive property of iterated integrals over contiguous time
domains: for example, equating second order components on the left and right hand
sides of (1.24) for 0 ≤ s ≤ t ≤ u ≤ T yields
    x^2_{s,t} + x^2_{t,u} + x^1_{s,t} ⊗ x^1_{t,u} = x^2_{s,u}

which expresses the natural requirement that the second order iterated integral over
[s, u] should equal the sum of the second order integrals over [s, t] and [t, u] plus the
product of the first order integrals (i.e. linear increments) over [s, t] and [t, u].
One should carefully note the presence of the tensor product term on the left hand
side of the above equation, for second and higher order components of multiplicative
functionals are not additive!
For any p ≥ 1, let Ω_p(V) denote the set of all p-rough paths x : Δ_T → T^{(⌊p⌋)}(V).
In particular, by our earlier remark, we note that Ω1(V ) is contained in Ωp(V ) for
all p > 1. Further, we can make Ωp(V ) into a metric space by equipping it with the
following metric:
Definition 1.11 (p-variation distance). If x and y are two elements of Ω_p(V),
then their p-variation distance is defined by

    d_p(x, y) = max_{1≤n≤⌊p⌋} ( sup_{D∈D([0,T])} ∑_{h=1}^{k} ‖x^n_{t_{h-1},t_h} − y^n_{t_{h-1},t_h}‖^{p/n}_{V^{⊗n}} )^{n/p} .
In fact, it is straightforward to show that (Ω_p(V), d_p) is a complete metric space
for all p ≥ 1 (see [11, Lemma 3.3.3]). However, for p ≥ 2, Ω_p(V) is not a linear space
due to the non-linearity of the multiplicative property: in general the sum or difference
of two multiplicative functionals fails to be multiplicative.
With this metric, we can identify an important subclass of p-rough paths in Ωp(V ),
namely those elements that can be approximated arbitrarily closely by 1-rough paths
as measured by the p-variation distance.
Definition 1.12 (Geometric p-rough paths). The closure in Ω_p(V) of the space
Ω_1(V) of 1-rough paths under the topology induced by the p-variation distance d_p is
called the space of geometric p-rough paths and denoted by GΩ_p(V).
Thus, an element x of Ω_p(V) is a geometric p-rough path if and only if there
exists a sequence (x_n)_{n≥1} of 1-rough paths such that lim_{n→∞} d_p(x_n, x) = 0. Based
on the above topological description, one may wonder what is ‘geometric’ about
geometric p-rough paths. The reason for this nomenclature is that geometric p-rough
paths take their values in the free nilpotent Lie group of step ⌊p⌋ – the very interesting
algebro-geometric object we defined above!
However, when p ≥ 2 there are also p-rough paths that take values in G^{(⌊p⌋)}(V)
but cannot be expressed as the limit of a sequence of 1-rough paths in the p-variation
distance. The space of all p-rough paths taking values in G^{(⌊p⌋)}(V) is called the
space of weakly geometric p-rough paths on V and denoted by WGΩ_p(V). Hence,
GΩ_p(V) ⊆ WGΩ_p(V), with the inclusion being strict for p ≥ 2. One should note that
even though by Proposition 1.5 each weakly geometric p-rough path x ∈ WGΩ_p([0, T], V)
can be expressed, for any (s, t) ∈ Δ_T, as the truncated signature of a bounded variation
path, there is in general no single y ∈ BV([0, T], V) satisfying x_{s,t} = S(y)_{s,t} for all
(s, t) ∈ Δ_T – unless, of course, x is actually a 1-rough path.
Generally, though, the difference between geometric and weakly geometric rough paths
is insignificant, and for our purposes we shall ignore it and simply speak of geometric
rough paths without distinction.
The key implication of this is that for any geometric p-rough path x, we have

    x_{s,t} = x_{0,s}^{-1} ⊗ x_{0,t}

for all 0 ≤ s ≤ t ≤ T, where x_{0,s}^{-1} is the group inverse of x_{0,s} in G^{(⌊p⌋)}(V), hence
providing for x_{s,t} the natural interpretation of an increment of the p-rough path x
over the time interval [s, t].
In addition to the two characterisations – a topological and an algebro-geometric
one – of geometric rough paths given above, there is also a third way of describing
them that is analytical in nature and somewhat more concrete than the previous ones.
Let V = R^d. Then it can be shown that among all the p-rough paths in Ω_p(V)
only the geometric ones x ∈ GΩ_p(V) satisfy the following identity

    x^{i_1,...,i_m}_{s,t} · x^{j_1,...,j_n}_{s,t} = ∑_{(k_1,...,k_{m+n}) ∈ {i_1,...,i_m} ⧢ {j_1,...,j_n}} x^{k_1,...,k_{m+n}}_{s,t}     (1.25)

where {i_1, . . . , i_m} ⧢ {j_1, . . . , j_n} is the shuffle product of these two sets of indices, i.e.
the set of all permutations of {i_1, . . . , i_m, j_1, . . . , j_n} that preserve the orderings of the
i_k (k = 1, . . . , m) and the j_l (l = 1, . . . , n) – just like the orderings of cards in each half
of the deck are preserved in a riffle shuffle. It was first observed by R. Ree (see [12])
that if x is the signature of a bounded variation path in V constructed canonically
by means of Riemann integrals, then x satisfies the above shuffle product identity.
The analytical content of this rather combinatorial-looking result may not be
immediately obvious, but if we set m = n = 1 above, then (1.25) becomes

    x^i_{s,t} · x^j_{s,t} = x^{i,j}_{s,t} + x^{j,i}_{s,t}

which, when interpreting the terms as normal first and second order iterated integrals,
is nothing other than the familiar integration by parts formula! Thus, the shuffle
product identity can be seen to be a generalisation of the integration by parts formula
to higher order iterated integrals.
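The case m = n = 1 can be checked numerically on a concrete smooth path, say x_u = (u, u²) on [0, 1] (an arbitrary choice for illustration), for which both sides of the identity equal x^1_{0,1} · x^2_{0,1} = 1.

```python
import numpy as np

N = 10_000
u = np.linspace(0.0, 1.0, N + 1)
x = np.stack([u, u ** 2], axis=1)      # the path x_u = (u, u^2)
dx = np.diff(x, axis=0)

inc = x[-1] - x[0]                     # first order integrals x^1 and x^2
mid = 0.5 * (x[:-1] + x[1:]) - x[0]    # midpoint values, based at x_0
x12 = np.sum(mid[:, 0] * dx[:, 1])     # second order integral x^{1,2}
x21 = np.sum(mid[:, 1] * dx[:, 0])     # second order integral x^{2,1}

# Integration by parts / shuffle identity with m = n = 1:
print(inc[0] * inc[1], x12 + x21)      # both equal 1 up to rounding
```

With midpoint sums the two second order integrals telescope exactly, so the identity holds up to floating-point rounding rather than merely in the limit of fine discretisation.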
Therefore, geometric rough paths can be characterised as those rough paths that
obey the standard rules of calculus, and it is in this fact that their analytical signifi-
cance lies.
To finish off our brief overview of the theory of rough paths, let us say a few words
about non-geometric p-rough paths for p ≥ 2, which hence do not follow the ordinary
rules of calculus without correction terms. In 2010, M. Gubinelli published a new
theory (see [4]) in which he defines branched rough paths as functionals mapping
from a simplex Δ_T into a Hopf algebra H that is generated, as an algebra, by the set
of rooted trees with vertices labelled by the basis elements of the path space V = R^d,
and which contains the tensor product algebra T(V) as the linear subspace spanned
by the set of linear, i.e. non-branched, trees. These functionals are required to satisfy
two algebraic conditions analogous to the multiplicative property and the shuffle
product identity for geometric rough paths, as well as an analytic condition that
corresponds to finite p-variation. Another parallel with the theory of geometric rough
paths is that the set of branched rough paths also forms a Lie group in the Hopf
algebra which is very similar to the free nilpotent Lie group. However, as branched,
i.e. non-geometric, rough paths will not be used in the rest of this thesis, we will not
explore this fascinating theory in greater detail.
In this chapter we have endeavoured to give an intuitive introduction to the theory
of rough paths. In particular, we have wanted to show that even though this is a
very modern theory, developed over the past two decades, it has deep historical roots
going back to classical infinitesimal calculus and further all the way to ancient Greek
mathematics, and indisputably represents one of the most important advances in the
mathematical study of change since the days of Newton and Leibniz.
From its initial, purely analytical problem of solving differential equations controlled
by irregular paths – equivalently, integrating differential forms along such paths –
the key insight of the theory of rough paths has been to take the whole sequence of
iterated path integrals as the fundamental object driving differential equations and
controlling the local behaviour of their solutions; endowed individually with a rich
algebraic structure, these iterated integrals collectively turn out to form a beautiful
geometric object. Underpinning the pleasing aesthetics of the theory of rough paths
lies its fundamental achievement of giving meaning to the differential of a function
with controlled irregularity – and, in the field of mathematical analysis, surely nothing
is more fundamental than that.
Chapter 2
Application of rough paths theory to time series analysis of financial data streams
In this chapter we consider ways in which the theory of rough paths can be applied to
analyse time series, particularly focussing on high frequency financial data streams.
We begin with a brief discussion of the methods traditionally employed in classical
time series analysis, and then give a literature review of recent applications of the
approach to use the signature of a financial time series for the purposes of data
classification and prediction based on supervised learning algorithms. Finally, we
present our own novel application of rough paths theory in the context of financial
data analysis.
2.1 Classical time series analysis
Financial data is usually obtained in discrete form as a time series: an ordered set
of multi-dimensional numerical values X = {(X^1_{t_i}, . . . , X^d_{t_i}) ∈ R^d : i = 0, . . . , N}
observed at finitely many time points t_0 < t_1 < · · · < t_N. In traditional time series
analysis it is a common approach to view discrete data points as samples of an under-
lying continuous time stochastic process which is typically assumed to have a specific
form in order to capture certain characteristics of the time series being modelled – e.g.
autoregression (AR), moving-average (MA) or conditional heteroscedasticity (CH) –
and whose parameters are estimated using regression techniques so as to best fit the
chosen model to the given data stream.
However, parametric approaches to modelling time series have several inherent
limitations. First of all, they usually depend to a large extent on the assumptions
made about the underlying (unknown) data-generating process, and thus are sub-
ject to the potential risk of model misspecification, for it may happen that a chosen
model cannot adequately describe a given time series even with an optimal calibra-
tion. Secondly, in some cases sampling may not be an effective way to approximate
a continuous process – especially when dealing with highly oscillatory processes – as
a sequence of data points may fail to capture the order of events between different
coordinates of a multi-dimensional process, and hence be incapable of detecting la-
tencies and causal effects within the structure of the data stream. Though increasing
sampling frequency normally improves the accuracy of a discrete approximation, this
is not always the case: for example, it is known that sampling a Brownian motion,
as the driving signal of a dynamical system, with arbitrarily high frequency does not
necessarily provide sufficient statistical information for its effects on the evolution of
the system to be predicted.
Moreover, sampling at high rates is beset with its own fundamental problems,
chief of which is the curse of dimensionality. For instance, recording a high fre-
quency financial data stream tick by tick – which may be only milliseconds apart –
is usually an inefficient way of representing such data, for it is bound to contain lots
of redundant information, meaningless market noise, that might obscure the main
structural characteristics of the data stream, and to carry out regression analysis on
increments as features of the data stream would be infeasible because of prohibitively
high dimensionality. Therefore, one would like to find methods to summarise high
frequency data streams in a more concise way – to compress big data sets without
losing key information – and thus to achieve significant dimension reduction enabling
standard regression techniques to be applied. As we will see, using the signature of
a data stream, truncated to a suitable order, as the feature set of the data stream
accomplishes these objectives.
2.2 Signatures as feature sets for linear regression
analysis
As we recall from Section 1.2 above, B. Hambly and T. Lyons showed in [6] that
the signature of a multi-dimensional path of finite length uniquely determines the
path up to tree-like equivalence (i.e. up to modifications that have null effect when
the path is used as a system control), and hence pointed out that mapping a path
to its signature can be viewed as a faithful data transform in the sense that no
information is lost in the process. Following this notion in [8], D. Levin, T. Lyons
and H. Ni were the first to propose using signatures of time series for the purposes
of analysing financial data, and demonstrated the potential this approach has for
machine learning and statistical inference. Specifically, in their paper the authors built
a general non-parametric model for determining the conditional distribution of the
output variable of a system as a linear functional of the components of the expected
truncated signature of a random input stream (for a large number of series of data
samples), and thus, using the signature of a data stream as a feature set, estimated the
functional relationship between an input stream and the corresponding noisy system
response by employing standard techniques of regression analysis. Moreover, they
showed that classical parametric time series models such as AR, ARCH and GARCH
can be considered to be special cases of the expected signature model.
Given that the signature is an intrinsically non-linear object, its use as a feature set
for linear regression analysis might, on first thought, seem somewhat counterintuitive.
However, if we assume that the conditional distribution of a future system output is a
smooth function of the signature of the current input stream, then it is reasonable
also to assume that this function can be well approximated locally by a polynomial.
But, by the shuffle product property of signatures, as stated in (1.25) above, any
polynomial of signature components, i.e. iterated path integrals, can be expressed as
a linear combination of higher order signature components. Thus, it is indeed natural
to assume a linear relationship between the expected signature of input streams and
the system output.
Since the signature of a d-dimensional path truncated to order n has (d^{n+1} − 1)/(d − 1)
components, one can see that the signature approach allows high frequency data
streams to be represented by relatively few features compared to the sampling method.
Furthermore, the number of these signature based features does not depend on the
sampling frequency whereas the dimensionality of increment features increases linearly
with the sampling frequency. This reduction in effective dimensionality through the
use of signature components as a feature set can lead to substantial efficiency gains
when performing regression analysis, and may also help to avoid overfitting issues
that often bedevil the use of high frequency data. In [8], D. Levin et al. showed that
for a 2-dimensional path one can predict the output of a system controlled by the path
more accurately by using the truncated signature of degree 4, and thus of dimension
31, than by using an increment feature set of dimension 250, which demonstrates that
signature features furnish a more efficient summary of a path in terms of its effects
than traditional increment features. The authors also compared the performance of
the expected signature model to that of a non-parametric Gaussian process model,
and found that it achieves similar forecasting accuracy with a computational cost
that is lower by two orders of magnitude!
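As a quick sanity check on this count (a trivial illustrative snippet), the number of components of a truncated signature is just the geometric sum 1 + d + · · · + dⁿ, which reproduces the dimension 31 quoted above for the 2-dimensional, degree-4 signature.

```python
def sig_dim(d: int, n: int) -> int:
    """Number of components of the signature of a d-dimensional path
    truncated to order n: 1 + d + d^2 + ... + d^n."""
    return (d ** (n + 1) - 1) // (d - 1)

print(sig_dim(2, 4))   # 31, versus e.g. the 250 increment features used in [8]
```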
Even though some information is inevitably lost when the full signature of a path
is projected onto a finite-dimensional truncated tensor product space, the low order
components contain most of the information with the truncation error decreasing
factorially as the degree of the truncated signature increases, and, moreover, these
leading components are not particularly sensitive to the sampling frequency used.
Hence, the iterated integrals of a path (of bounded variation) provide very efficient
statistics of the path in the sense that they determine the response of any linear
system driven by the path very accurately. Indeed, the beauty and power of this
whole approach lies in the fundamental fact that the signature of a path efficiently
summarises information on normal time scales in a way that enables the effects of the
path in dynamical interactions to be effectively predicted without needing to know the
behaviour of the path on microscopic scales – which for some less regular paths can
be highly complex.
In [5], G. Gyurko et al. took a similar approach: they embedded financial time
series into continuous processes using linear interpolation between discrete data
points, computed truncated signatures of the paths thus constructed, and, by
performing standard linear regression (combined with the LASSO shrinkage method)
on the signature components as a feature set, classified data streams in a given
learning set according to some selected properties, before proceeding to classify fresh
data streams based on their signatures in out-of-sample testing. For example, in one of
the numerical tests presented in [5], it was explored to what extent the signatures of
streams of WTI crude oil futures market data (including mid-price, bid-ask spread,
order imbalance and cumulative trading volume) sampled by the minute from stan-
dard 30 minute time intervals determine the time buckets they are sampled from,
and, by using standard statistical indicators to measure the accuracy of classification,
it was shown that a very small number of low-dimensional signature components of
data streams suffice to characterise their time buckets with a high degree of accu-
racy (the ratio of correct classification exceeding 90% in most cases). This example,
together with the other experiments presented in [5] – which aim to characterise two
different trade execution algorithms by distinguishing between parent orders generated
by them, and hence to detect their traces in market data – demonstrates again that
signatures of data streams efficiently capture information in a non-parametric way
that avoids traditional statistical modelling of time series data.
This paper is also notable for (i) introducing lead and lag transforms of a multi-
dimensional data stream – special types of time re-parameterisation of the data stream
that preserve its signature – in order to capture the quadratic variation of path com-
ponents, as this quantity – i.e. volatility – is of fundamental importance in financial
applications, and (ii) using first and higher order areas between path components to
analyse data streams. We will discuss these topics in detail in the next two sections,
especially the latter since in our novel application of the signature approach (to be
presented in Section 2.5) third order areas will play a key role as sensitive tools for
detecting mean-reverting behaviour in financial time series. However, let us first state
the fundamental properties of signatures that will be used in practice, as the theoret-
ical foundation of our numerical algorithms, to compute signatures of data streams
as well as the invariance property of signatures under time re-parameterisations upon
which the usability of lead and lag transforms rests.
Let X_t = (X^1_t, . . . , X^d_t) ∈ R^d be a continuous d-dimensional path of bounded
variation defined on a time interval [0, T]. For any multi-index, i.e. an ordered set of
indices I = (i_1, . . . , i_k) with k ≥ 1 and i_j ∈ {1, . . . , d} for j = 1, . . . , k, we define, as in
(1.10), the kth order iterated integral of the path X corresponding to the multi-index
I over the time interval [s, t] for any 0 ≤ s < t ≤ T by

    X^{(i_1,...,i_k)}_{s,t} := ∫_{s<u_1<···<u_k<t} dX^{i_1}_{u_1} · · · dX^{i_k}_{u_k} .
Then the signature S(X)_{s,t} of X over the time interval [s, t] is defined to be
the sequence of iterated integrals (X^I_{s,t})_{I∈I}, where I is the set of all multi-indices,
with the zeroth order component of the signature, corresponding to the empty set of
indices, defined to be 1; and, for any non-negative integer n, the truncated signature
S^n(X)_{s,t} of degree n is the sequence (X^I_{s,t})_{I∈I_n}, where I_n is the set of all multi-
indices that consist of at most n indices. With this notation, we have the following
key properties of signatures:
(i) Uniqueness ([6, Theorem 1]): The signature S(X)_{s,t} of a path (X_u)_{s≤u≤t} of
bounded variation taking values in R^d determines the path, i.e. the function
u ↦ (X_u − X_s) for s ≤ u ≤ t, up to tree-like equivalence. Moreover, if at least
one of the co-ordinates X^i_u with i ∈ {1, . . . , d} is a monotonically increasing
function of u, then the path (X_u)_{s≤u≤t} is uniquely determined by the signature
S(X)_{s,t}. However, the proof of this uniqueness result in [6] is non-constructive,
and recently X. Geng has provided an explicit method in a more general setting
for reconstructing a rough path from its signature (see [3]), thus effectively
inverting the signature map (X_u)_{s≤u≤t} ↦ S(X)_{s,t}. It should be noted that any
two 1-dimensional paths whose initial and final values differ by the same amount
are tree-like equivalent and hence have the same signature, irrespective of the way
the distance between the start and end points is traversed (whether travelling
straight or zigzagging). By contrast, this is not the case for multi-dimensional
paths, whose higher order iterated integrals are not uniquely determined by their
first order increments, but in general depend on the trajectories between the
start and end points. Nevertheless, by the uniqueness property and (iii) below,
the signature of a path of arbitrary dimension that is tree-like equivalent to a
linear path is uniquely determined by its first order increments.
(ii) Invariance under time re-parameterisations: For any continuous and monotonically increasing function $f : [0, T] \to [U, V]$ and $(i_1, \ldots, i_k) \in \mathcal{I}$, we have
$$\idotsint\limits_{s < u_1 < \cdots < u_k < t} dX^{i_1}_{u_1} \cdots \, dX^{i_k}_{u_k} = \idotsint\limits_{f(s) < u_1 < \cdots < u_k < f(t)} dX^{i_1}_{f^{-1}(u_1)} \cdots \, dX^{i_k}_{f^{-1}(u_k)}$$
for any $0 \leq s < t \leq T$. Therefore, $S(X)_{s,t} = S(\widetilde{X})_{f(s),f(t)}$, where the path $(\widetilde{X}_u)_{f(s) \leq u \leq f(t)}$ is an arbitrary time re-parameterisation of the original path $(X_u)_{s \leq u \leq t}$ such that $\widetilde{X}_u = X_{f^{-1}(u)}$ for $f(s) \leq u \leq f(t)$.
(iii) Signature of a linear path: If $X_t = X_0 + Yt$ for some fixed points $X_0$ and $Y = (Y_1, \ldots, Y_d)$ in $\mathbb{R}^d$ and all $t \in [0, T]$, then for any multi-index $(i_1, \ldots, i_k)$
$$X^{(i_1, \ldots, i_k)}_{s,t} = \frac{(t-s)^k}{k!} \prod_{j=1}^{k} Y_{i_j}$$
for any $0 \leq s < t \leq T$. Thus, each iterated integral of a linear path is simply the product of its increments in the relevant co-ordinates over the time interval divided by the factorial of the order of the iterated integral. This means that the signature of a linear path – in fact that of any path – is independent of its initial value $X_0$, or, to put it differently, signatures are invariant under translations of paths in the spatial domain.
(iv) Multiplicative property: For all $0 \leq s \leq t \leq u \leq T$, we have
$$S(X)_{s,t} \otimes S(X)_{t,u} = S(X)_{s,u} \, .$$
Application of lead and lag transforms to data streams relies on property (ii)
in that time re-parameterisations of paths leave their signatures invariant, whereas
properties (iii) and (iv) will be used to compute truncated signatures of data streams
through the following procedure: 1) for given data streams, continuous paths are
constructed by linearly interpolating between discrete data points, 2) the signature
of each linear segment of the path is computed using (iii), and 3) the signatures of
contiguous linear segments are joined together to form the signature of the whole
piecewise linear path using (iv).
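The three steps above can be sketched compactly in Python. The following is an illustrative reimplementation with our own function names, not the author's code exhibited in Appendix 2: each level of the truncated signature is stored as an order-$k$ tensor, a linear segment contributes $Y^{\otimes k}/k!$ by property (iii), and contiguous segments are joined by the tensor product of property (iv).

```python
import numpy as np

def segment_signature(increment, depth):
    """Truncated signature of one linear segment: level k is increment^{(x)k} / k!
    (property (iii)); level 0 is the scalar 1."""
    sig = [np.ones(())]
    for k in range(1, depth + 1):
        sig.append(np.multiply.outer(sig[-1], increment) / k)
    return sig

def chen_product(a, b):
    """Join two truncated signatures with the multiplicative property (iv):
    level k of the product is the sum over i of a_i (x) b_{k-i}."""
    depth = len(a) - 1
    return [sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
            for k in range(depth + 1)]

def stream_signature(points, depth):
    """Truncated signature of the piecewise linear interpolation of a data
    stream, given as an (N+1) x d array of data points."""
    points = np.asarray(points, dtype=float)
    sig = segment_signature(points[1] - points[0], depth)
    for start, end in zip(points[1:-1], points[2:]):
        sig = chen_product(sig, segment_signature(end - start, depth))
    return sig
```

For instance, for the two-segment stream $(0,0) \to (1,0) \to (1,1)$ truncated at degree 2, this yields first level $(1, 1)$ and second level $X^{(1,2)} = 1$, $X^{(2,1)} = 0$, so that the area $\frac{1}{2}(X^{(1,2)} - X^{(2,1)}) = \frac{1}{2}$ is the area of the triangle enclosed between the path and its chord.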
Note (on Numerical Algorithms). Even though free open-source software packages
are available for the computation of signatures1, all numerical algorithms used in this
thesis were developed from scratch and implemented in Python by the author. These
include a function that produces a time-indexed sequence of signatures, truncated to
an arbitrary degree specified by the user, for a given serial data stream of arbitrary
dimension, and functions that generate different types of lead and lag transforms
of input data streams, as well as various routines used to visualise outputs of such
functions. All of these programs were rigorously tested (e.g. by checking signature
components against iterated integrals computed in an Excel spreadsheet) to ensure
that they do not contain any bugs. Code samples are exhibited in Appendix 2.
2.3 Lead and lag transforms of data streams
Since financial data usually comes in the form of a time series, whereas signatures are
defined for continuous paths, our first task is to embed discrete data streams into paths
defined over continuous time intervals. Let $\mathcal{X} = \{(\mathcal{X}^1_{t_i}, \ldots, \mathcal{X}^d_{t_i}) \in \mathbb{R}^d : i = 0, \ldots, N\}$ be a set of data points observed at finitely many time points $t_0 < t_1 < \cdots < t_N$. Obviously there are various possible ways of embedding $\mathcal{X}$ into a continuous time path $(X_t)_{t_0 \leq t \leq t_N}$ so that $X_{t_i} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N$ – and different ways of ‘joining the dots’ generally produce paths with different signatures. The following methods
are the most relevant for our current purposes: the first two – constructing piecewise
linear or piecewise constant paths – are standard approaches (applied, for instance,
in [5] and [8], respectively), whereas the third method – lead and lag transforms –
was introduced by B. Hoff in his D.Phil. thesis [7] in 2005.
1 For example, the sigtools Python package, which is based on the libalgebra library of the CoRoPa project (downloadable from http://sourceforge.net/projects/coropa), was used in both [5] and [8].
(i) Piecewise linear interpolation: For $t \in [t_i, t_{i+1}]$ with $i = 0, \ldots, N-1$, we define
$$X_t = \mathcal{X}_{t_i} + \frac{t - t_i}{t_{i+1} - t_i} \left( \mathcal{X}_{t_{i+1}} - \mathcal{X}_{t_i} \right).$$
(ii) Piecewise constant ‘axis’ path: For $t \in [t_i, t_{i+1})$ with $i = 0, \ldots, N-1$, we set $X_t = \mathcal{X}_{t_i}$, so that at each time point $t_{i+1}$ ($i = 0, \ldots, N-1$) the path jumps discontinuously to the value $\mathcal{X}_{t_{i+1}}$. It is worth remarking that even though such axis paths are continuous time paths in the sense that they are defined for a continuous range of time values within a specified interval, they are clearly not continuous functions of time, and to call them that, as is done in some research papers (see e.g. [8]), is somewhat misleading.
For any embedding of a data stream $(\mathcal{X}_{t_i})_{i=0}^N$ into a continuous time path $(X_t)_{t_0 \leq t \leq t_N}$, we define the signature of the data stream as $S(X)_{t_0, t_N}$. As said, in general different embeddings yield different signatures, but it is easy to see that the piecewise linear path $X^{\text{lin}}_t$ and the piecewise constant path $X^{\text{con}}_t$ defined from the same data stream have the same signature. However, one should note that even though $S(X^{\text{lin}})_{t_i, t_j} = S(X^{\text{con}})_{t_i, t_j}$ for any $0 \leq i < j \leq N$, $S(X^{\text{lin}})_{t_i, t}$ does not equal $S(X^{\text{con}})_{t_i, t}$ for any $t \in (t_j, t_{j+1})$, since $S(X^{\text{con}})_{t_j, t} = (1, 0, 0, \ldots)$.
(iii) Lead and lag transforms: The idea behind lead and lag transforming a given $d$-dimensional time series $(\mathcal{X}_{t_i})_{i=0}^N$ is to create new backward (‘lag’) and forward (‘lead’) time series by adding data points to the original time series in two distinct ways that both preserve its increments, and hence leave its signature invariant, since such transforms are time re-parameterisations of the given data stream. The data points of the lag and lead transformed streams are then joined together to form axis or piecewise linear paths, depending on the method applied. Several different definitions of lead and lag transforms can be found in the literature, of which we will review three methods below.
2.3.1 Gyurko-Lyons-Kontkowski-Field method
In [5, Section 2.5], Gyurko, Lyons, Kontkowski and Field (‘GLKF’) defined the lead and lag transforms of a $d$-dimensional data stream $(\mathcal{X}_{t_i})_{i=0}^N$ as follows: for $i = 0, \ldots, N$, $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}_{t_i}$, and, for $i = 1, \ldots, N$, $\mathcal{X}^{\text{lead}}_{t_{i-1/2}} = \mathcal{X}_{t_i}$ and $\mathcal{X}^{\text{lag}}_{t_{i-1/2}} = \mathcal{X}_{t_{i-1}}$. Thus, the lead and lag transforms of a given stream of $N+1$ data points consist of $2N+1$ data points. Indeed, from the above description it is easy to see that they can be created by repeating the data points of the original stream, and deleting the first and last data points in order to obtain the lead and lag transforms, respectively. Hence, lead and lag transforms are time translations of each other, as illustrated below in Figure 1, which displays the lead and lag transforms produced by applying the GLKF method to a 1-dimensional data stream whose increments are randomly sampled from a standard normal distribution.
Figure 1: GLKF method of lead-lag transforming data streams.
This definition was motivated by the authors’ desire to be able to easily read off the volatilities of the components $(\mathcal{X}^j_{t_i})_{i=0}^N$, for $j = 1, \ldots, d$, of a given data stream from the signature of the $(2d)$-dimensional data stream $(\mathcal{Y}_{t_{i/2}})_{i=0}^{2N} = (\mathcal{X}^{\text{lead}}_{t_{i/2}}, \mathcal{X}^{\text{lag}}_{t_{i/2}})_{i=0}^{2N}$, as volatilities of market variables are highly relevant quantities in financial applications. For, it is straightforward to verify by a direct calculation that for any $j = 1, \ldots, d$
$$Y^{(j, \, j+d)}_{t_0, t_N} - Y^{(j+d, \, j)}_{t_0, t_N} = \sum_{i=0}^{N-1} \left( \mathcal{X}^j_{t_{i+1}} - \mathcal{X}^j_{t_i} \right)^2$$
where $(Y_t)_{t_0 \leq t \leq t_N}$ is the piecewise constant or piecewise linear interpolation of the data stream $(\mathcal{Y}_{t_{i/2}})_{i=0}^{2N}$, i.e. the quadratic variation of the $j$th component of the original data stream $(\mathcal{X}_{t_i})_{i=0}^N$ is equal to the difference between the iterated integral of the $j$th components of the lead and lag transformed data streams and the iterated integral of the same components in reverse order. This latter quantity is twice the area between the $j$th components of the lead and lag transformed data streams, as defined in Section 2.3 of [5]. We will use the same definition of area between path components in this work.
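This identity is easy to check numerically. The sketch below is our own illustrative code (not the routines of [5] or of Appendix 2): it builds the GLKF lead and lag transforms of a 1-dimensional stream, computes the second order iterated integrals of the piecewise linear interpolation exactly, and compares the difference of the two mixed iterated integrals with the realised quadratic variation.

```python
import numpy as np

def glkf_lead_lag(x):
    """GLKF lead and lag transforms of a 1-dimensional stream: repeat every
    data point, then drop the first point (lead) and the last point (lag);
    returns a (2N+1) x 2 array with columns (lead, lag)."""
    doubled = np.repeat(np.asarray(x, dtype=float), 2)
    return np.column_stack([doubled[1:], doubled[:-1]])

def second_level(path):
    """Second order iterated integrals X^{(i,j)} of the piecewise linear
    interpolation of the given vertices; the sum over segments is exact,
    since the integrand is linear on each segment."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    rel = path[:-1] - path[0]  # value at each segment start minus initial value
    return np.einsum('ki,kj->ij', rel, inc) + 0.5 * np.einsum('ki,kj->ij', inc, inc)

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(25))        # a 1-dimensional random walk
Y = second_level(glkf_lead_lag(x))
quadratic_variation = np.sum(np.diff(x) ** 2)
```

Here $Y^{(1,2)} - Y^{(2,1)}$ (lead integrated against lag minus lag integrated against lead) reproduces the realised quadratic variation to machine precision.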
Furthermore, in Section 2.5 of [5] it was claimed that “the (signed) area between the ith component of the lead-transform and the jth component of the lag-transform equals to the quadratic cross-variation of the trajectories X i and Xj”. Unfortunately, this is not a valid statement, as one can readily show either analytically or by numerical simulation; the current author has verified it in both ways. To prove the point mathematically, in Appendix 1 we provide an explicit calculation of the signature of a multi-dimensional data stream with one lead-transformed and one lag-transformed component.
2.3.2 Flint-Hambly-Lyons method (Mark 1)
In the early version of [2] (of February 2014), Flint, Hambly and Lyons (‘FHL’) used a different definition (see Definition 1.2 of that version of their paper) for lead and lag transforms, setting $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}_{t_{i-1}}$ for $i = 1, \ldots, N-1$, and $\mathcal{X}^{\text{lead}}_{t_{i+1/2}} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lag}}_{t_{i+1/2}} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N-1$, with $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}^{\text{lag}}_{t_0} = \mathcal{X}_{t_0}$ and $\mathcal{X}^{\text{lead}}_{t_N} = \mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_N}$, and then linearly interpolating between data points to form continuous time paths. Figure 2 below illustrates the lead and lag transforms of the same data stream as in Figure 1 produced by using this method.
From Figure 2, we can see that at the start and at the end of the data stream its
lead and lag transforms are not simple time translations of each other; nevertheless,
this method of lead-lag transforming a data stream does preserve its increments, and
hence leaves its signature invariant.
2.3.3 Flint-Hambly-Lyons method (Mark 2)
In the current version of [2] (of September 2016), the authors have modified their
earlier definition of lead and lag transforms (see Definition 2.1 of that version of the
paper), and now define them by setting $\mathcal{X}^{\text{lead}}_{t_i} = \mathcal{X}^{\text{lead}}_{t_{i+1/4}} = \mathcal{X}^{\text{lead}}_{t_{i+1/2}} = \mathcal{X}_{t_{i+1}}$ and $\mathcal{X}^{\text{lead}}_{t_{i+3/4}} = \mathcal{X}_{t_{i+2}}$ for $i = 0, \ldots, N-2$, with $\mathcal{X}^{\text{lead}}_{t_{N-1}} = \mathcal{X}^{\text{lead}}_{t_{N-3/4}} = \mathcal{X}^{\text{lead}}_{t_{N-1/2}} = \mathcal{X}^{\text{lead}}_{t_{N-1/4}} = \mathcal{X}^{\text{lead}}_{t_N} = \mathcal{X}_{t_N}$, and $\mathcal{X}^{\text{lag}}_{t_i} = \mathcal{X}^{\text{lag}}_{t_{i+1/4}} = \mathcal{X}^{\text{lag}}_{t_{i+1/2}} = \mathcal{X}^{\text{lag}}_{t_{i+3/4}} = \mathcal{X}_{t_i}$ for $i = 0, \ldots, N-1$, with $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_{N-1}}$. (Strictly speaking, Definition 2.1 of [2] fails to assign a value to the penultimate point $\mathcal{X}^{\text{lead}}_{t_{N-1/4}}$ of the lead transform.) Thus, under this method the lead and lag transforms of a time series of $N+1$ data points consist of $4N+1$ data points. In Figure 3 below these are illustrated for the same data stream that was used in Figures 1 and 2.

Figure 2: FHL (Mark 1) method of lead-lag transforming data streams.
This new definition was suggested by a context in mathematical finance where an investor readjusts at time $t_{i+1}$ the amounts of stock he holds in his portfolio based on the stock prices at time $t_i$ – i.e. where there is a delay between receiving market information and acting on it by trading – for defining the lead and lag transforms of the time series of stock prices in this way allows one to express the profit (or loss) made by the investor’s trading strategy as an exact integral of a function of the lag transform with the lead transform as an integrator.
The problem with this definition is that lead and lag transforms specified as above do not preserve the increments of a data stream: since $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}_{t_1}$ and $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_{N-1}}$, the first and last increments of the original data stream are missing from the lead and lag transformed streams, respectively, as can be seen from Figure 3 below. Consequently, their signatures generally differ from that of the original data stream (as well as from each other), as one can readily verify with a numerical example. However, this situation can be easily remedied by redefining $\mathcal{X}^{\text{lead}}_{t_0} = \mathcal{X}_{t_0}$ and $\mathcal{X}^{\text{lag}}_{t_N} = \mathcal{X}_{t_N}$.

Figure 3: FHL (Mark 2) method of lead-lag transforming data streams.
2.4 Area processes of multi-dimensional paths
2.4.1 Definition and basic properties of areas
In Subsection 2.3.1 above we already encountered the concept of area between two
components of a multi-dimensional path. This is now formalised in the following
Definition 2.1 (Area). Let $X : u \in [0, T] \mapsto (X^1_u, \ldots, X^d_u) \in \mathbb{R}^d$ be a continuous path of finite length. Then the area $A^{(i,j)}_{s,t}$ between two path components $X^i_u$ and $X^j_u$ with $1 \leq i, j \leq d$ over any time interval $[s, t]$, where $0 \leq s \leq t \leq T$, is defined by
$$A^{(i,j)}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^i_u \, dX^j_v - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^j_u \, dX^i_v \right) = \frac{1}{2} \left( X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t} \right).$$
Figure 4: Area between path components $X^i$ and $X^j$.
As immediate consequences of the above definition, we have that $A^{(i,i)}_{s,t} = 0$ and $A^{(i,j)}_{s,t} = -A^{(j,i)}_{s,t}$ for all $1 \leq i, j \leq d$ and $0 \leq s \leq t \leq T$.
The quantity $A^{(i,j)}_{s,t}$ has a natural geometric interpretation – which also explains its name – as the signed area between the curve $u \mapsto (X^i_u, X^j_u)$ for $u \in [s, t]$ and the chord that connects the start point $(X^i_s, X^j_s)$ and end point $(X^i_t, X^j_t)$ of the curve. This is illustrated in Figure 4 above. For, it is clear that the second order iterated integrals $X^{(i,j)}_{s,t}$ and $X^{(j,i)}_{s,t}$ represent the areas between the curve and the vertical and horizontal axes, respectively, so that $X^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = X^{(i)}_{s,t} X^{(j)}_{s,t}$. Moreover, denoting the area between the curve and the chord by $A^{(i,j)}_{s,t}$, which is shaded yellow in Figure 4, we have that $A^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = \frac{1}{2} X^{(i)}_{s,t} X^{(j)}_{s,t}$, which, when substituted into the previous equation, yields $A^{(i,j)}_{s,t} = \frac{1}{2} \left( X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t} \right)$.
One should emphasize that $A^{(i,j)}_{s,t}$ is a signed area, i.e. that it can take positive or negative (or zero) values: indeed, if $A^{(i,j)}_{s,t} > 0$, then, as we have seen, it follows straight from the definition that $A^{(j,i)}_{s,t} < 0$; or, in geometric terms, reflecting the curve in Figure 4 in the chord connecting its start and end points, which corresponds to reversing the order of the path components in the area calculation, would give a negative area (of the same absolute magnitude) above the chord.
Figure 5: A typical 2-dimensional Brownian sample path.
However, sample paths of stochastic processes do not usually look like the smooth monotonic curve displayed in Figure 4; rather, a typical 2-dimensional Brownian-like random walk with independent normal increments is shown in Figure 5 above. It should be noted that for less regular paths that zigzag and cross themselves, the simple geometric interpretation of the area between path components as the signed area enclosed between the curve and the chord is no longer valid in general. In particular, it is easy to see that the sign of the area enclosed within a loop – a path segment whose start and end points coincide – depends on the direction in which the loop is traversed. Yet none of the many research articles or textbooks we have come across that deal with areas between path components point out this basic fact while presenting the standard geometric interpretation.
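The dependence of the sign on the direction of traversal is easy to demonstrate numerically. In the following illustrative sketch (our own code, with the area evaluated exactly for the piecewise linear approximation of the loop), a finely sampled unit circle is traversed in both directions:

```python
import numpy as np

def signed_area(path):
    """A^{(1,2)}_{0,T} of a 2-dimensional piecewise linear path per Definition 2.1:
    half the sum over segments of (X^1_mid - X^1_0) dX^2 - (X^2_mid - X^2_0) dX^1,
    exact because the integrand is linear on each segment."""
    path = np.asarray(path, dtype=float)
    d1, d2 = np.diff(path[:, 0]), np.diff(path[:, 1])
    r1 = 0.5 * (path[:-1, 0] + path[1:, 0]) - path[0, 0]
    r2 = 0.5 * (path[:-1, 1] + path[1:, 1]) - path[0, 1]
    return 0.5 * np.sum(r1 * d2 - r2 * d1)

u = np.linspace(0.0, 2.0 * np.pi, 20001)
circle = np.column_stack([np.cos(u), np.sin(u)])   # closed loop, start = end = (1, 0)
area_anticlockwise = signed_area(circle)
area_clockwise = signed_area(circle[::-1])
```

The anti-clockwise loop gives an area close to $+\pi$ and the clockwise loop an area close to $-\pi$, even though the two traversals occupy exactly the same set of points in the plane.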
Another invalid view about areas between path components that can be found in the literature is expressed in the following statement (see Section 2.3 of [5] under the heading ‘Lead-lag relationship’): “if an increase (respectively decrease) of the component X1 is typically followed by an increase (decrease) in the component X2, then the area A1,2 is positive. If a move in X1 is followed by a move in X2 to the opposite direction, the area is negative”. This assertion is clearly false, since, for example in Figure 4 above, both the path and its reflection in the chord connecting its start and end points have positive increments in their components, but the area enclosed by the former below the chord is positive whereas the area enclosed by the latter above the chord is negative. It is also evident that the area between path components which have increments of opposite signs can be either positive or negative (or zero).
Even though the area between path components does not capture correlation between increments in the components, as is clear from the above discussion, it does carry algebraic meaning: if the operation of taking iterated integrals of path components is viewed as a kind of ‘product’ on the space of path components, then the operation of computing areas between path components can be regarded as a ‘commutator’ or ‘Lie bracket’ on that space, and in this sense the area between two path components can be viewed as a measure of their non-commutativity. We will formalise this idea in the next subsection, and will subsequently explore what kind of algebraic structure it endows on the path space.
2.4.2 Higher order areas
In the previous subsection, we defined the area $A^{(i_1, i_2)}_{s,t}$ between two components $X^{i_1}_u$ and $X^{i_2}_u$ of a continuous $d$-dimensional path $(X^1_u, \ldots, X^d_u)_{0 \leq u \leq T}$ of finite length, where $i_k \in \{1, \ldots, d\}$ for $k = 1, 2$, over a time interval $[s, t]$ with $0 \leq s \leq t \leq T$. For a fixed $s$, $A^{(i_1, i_2)}_{s,u}$ is a function of $u$ for $s \leq u \leq t$, and thus $A^{(i_1, i_2)}_{s,u}$ can be viewed as a 1-dimensional path defined on the time interval $[s, t]$. Furthermore, as $A^{(i_1, i_2)}_{s,u}$ is clearly continuous and has finite length, we can define the second order area $A^{((i_1, i_2), i_3)}_{s,t}$ between three path components $X^{i_1}_u$, $X^{i_2}_u$ and $X^{i_3}_u$, where $i_k \in \{1, \ldots, d\}$ for $k = 1, 2, 3$, over a time interval $[s, t]$ with $0 \leq s \leq t \leq T$ as follows:
$$A^{((i_1, i_2), i_3)}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_1, i_2)}_{s,u} \, dX^{i_3}_v - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^{i_3}_u \, dA^{(i_1, i_2)}_{s,v} \right). \quad (2.1)$$
By the anti-commutativity of the area operation with respect to the order of path indices, we have that $A^{((i_1, i_2), i_3)}_{s,t} = -A^{(i_3, (i_1, i_2))}_{s,t} = A^{(i_3, (i_2, i_1))}_{s,t}$. However, one should carefully note that the operation of forming second order areas is not associative – the way path indices are bracketed certainly matters – so that in general $A^{((i_1, i_2), i_3)}_{s,t}$ is not equal to $A^{(i_1, (i_2, i_3))}_{s,t}$. For example, when $i_2 = i_3$, $A^{(i_1, (i_2, i_3))}_{s,t} = 0$, but for $i_1 \neq i_2$, $A^{((i_1, i_2), i_3)}_{s,t}$ may well be non-zero.
Let us examine the area differential $dA^{(i_1, i_2)}_{s,u} := A^{(i_1, i_2)}_{s,u+du} - A^{(i_1, i_2)}_{s,u}$ that appeared in the above definition, as it will not only enable us to express second and higher order areas (to be analogously defined shortly) as linear combinations of iterated integrals of one degree higher, but will also give the crucial idea for our classification of paths using third order areas.
Since the area differential can be written as
$$dA^{(i_1, i_2)}_{s,u} = \frac{1}{2} \left( \int_{v=s}^{v=u} dX^{i_1}_v \, dX^{i_2}_u - \int_{v=s}^{v=u} dX^{i_2}_v \, dX^{i_1}_u \right) = \frac{1}{2} \left\{ \left( X^{i_1}_u - X^{i_1}_s \right) dX^{i_2}_u - \left( X^{i_2}_u - X^{i_2}_s \right) dX^{i_1}_u \right\}, \quad (2.2)$$
we have that $dA^{(i_1, i_2)}_{s,u} = 0$ is equivalent to
$$dX^{i_2}_u = \frac{X^{i_2}_u - X^{i_2}_s}{X^{i_1}_u - X^{i_1}_s} \, dX^{i_1}_u$$
provided that $X^{i_1}_u - X^{i_1}_s \neq 0$. Therefore, it follows that $A^{(i_1, i_2)}_{s,u} = 0$ for all $u \in [s, t]$ if and only if
$$X^{i_2}_u = X^{i_2}_s + \frac{X^{i_2}_t - X^{i_2}_s}{X^{i_1}_t - X^{i_1}_s} \left( X^{i_1}_u - X^{i_1}_s \right)$$
with $X^{i_1}_t \neq X^{i_1}_s$, or $X^{i_1}_u = X^{i_1}_s$ for all $u \in [s, t]$. Thus, the area process $u \mapsto A^{(i_1, i_2)}_{s,u}$ for $u \in [s, t]$ is identically zero if and only if the point $(X^{i_1}_u, X^{i_2}_u)$ traces a straight line of fixed slope from the start point $(X^{i_1}_s, X^{i_2}_s)$ to the end point $(X^{i_1}_t, X^{i_2}_t)$ for $u \in [s, t]$ – possibly moving backwards and forwards or pausing for some time along the way. Indeed, this is a natural result bearing in mind the earlier picture of an area between a curve and the chord connecting its start and end points! For, in order to have a zero area enclosed between a curve and its chord at every point in time, the two must always coincide; equivalently, a non-zero area can arise only if the curve has non-zero curvature at some point in time. Thus, the area process of two path components can be seen to capture any curvature in their trajectory.
We can now use the above expression for the area differential to derive a general formula for a second order area as a linear combination of third order iterated integrals. For, by substituting (2.2) into (2.1), after some manipulation of the iterated integrals, we obtain
$$A^{(i_1, (i_2, i_3))}_{s,t} = \frac{1}{4} \left( X^{(i_1, i_2, i_3)}_{s,t} + X^{(i_2, i_1, i_3)}_{s,t} + X^{(i_3, i_2, i_1)}_{s,t} - X^{(i_1, i_3, i_2)}_{s,t} - X^{(i_2, i_3, i_1)}_{s,t} - X^{(i_3, i_1, i_2)}_{s,t} \right). \quad (2.3)$$
For indices $i_1$, $i_2$ and $i_3$ that are all distinct, the third order iterated integrals corresponding to all the permutations of them generally have different values, so that the expression for their second order area consists of $6\,(=3!)$ different terms, whereas, when $i_1 = i_2$ or $i_1 = i_3$, two of the terms cancel each other out and the remaining four comprise two pairs of equal terms, so that $A^{(i_1, (i_1, i_3))}_{s,t} = \frac{1}{2} \left( X^{(i_1, i_1, i_3)}_{s,t} - X^{(i_1, i_3, i_1)}_{s,t} \right)$, and, when $i_2 = i_3$, $A^{(i_1, (i_2, i_3))}_{s,t}$ of course vanishes. In particular, one should note that the signs of the iterated integrals in the expression for the second order area are not given by the signs of the permutations of the indices.
Analogously, third order areas involve four path components, and can be defined in two fundamentally different ways: as the area between the path component $X^{i_1}_u$ and the second order area $A^{(i_2, (i_3, i_4))}_{s,u}$ between path components $X^{i_2}_u$, $X^{i_3}_u$ and $X^{i_4}_u$, or as the area between the areas $A^{(i_1, i_2)}_{s,u}$ and $A^{(i_3, i_4)}_{s,u}$ of two pairs of path components. Formally, we define
$$A^{(i_1, (i_2, (i_3, i_4)))}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dX^{i_1}_u \, dA^{(i_2, (i_3, i_4))}_{s,v} - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_2, (i_3, i_4))}_{s,u} \, dX^{i_1}_v \right) \quad (2.4)$$
and
$$A^{((i_1, i_2), (i_3, i_4))}_{s,t} := \frac{1}{2} \left( \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_1, i_2)}_{s,u} \, dA^{(i_3, i_4)}_{s,v} - \int_{v=s}^{v=t} \int_{u=s}^{u=v} dA^{(i_3, i_4)}_{s,u} \, dA^{(i_1, i_2)}_{s,v} \right). \quad (2.5)$$
For our path classification purposes, we will find third order areas of the latter type
much more useful, and in the sequel whenever we refer to a third order area without
qualification we will always mean a third order area of this type.
Applying the differential operator to (2.3) and substituting into (2.4) yields
$$\begin{aligned} A^{(i_1, (i_2, (i_3, i_4)))}_{s,t} = \frac{1}{8} \Big( &X^{(i_1, i_2, i_3, i_4)}_{s,t} + X^{(i_2, i_1, i_3, i_4)}_{s,t} + X^{(i_2, i_3, i_1, i_4)}_{s,t} \\ + \, &X^{(i_1, i_3, i_2, i_4)}_{s,t} + X^{(i_3, i_1, i_2, i_4)}_{s,t} + X^{(i_3, i_2, i_1, i_4)}_{s,t} \\ + \, &X^{(i_1, i_4, i_3, i_2)}_{s,t} + X^{(i_4, i_1, i_3, i_2)}_{s,t} + X^{(i_4, i_3, i_1, i_2)}_{s,t} \\ + \, &X^{(i_2, i_4, i_3, i_1)}_{s,t} + X^{(i_3, i_4, i_2, i_1)}_{s,t} + X^{(i_4, i_2, i_3, i_1)}_{s,t} \\ - \, &X^{(i_1, i_2, i_4, i_3)}_{s,t} - X^{(i_2, i_1, i_4, i_3)}_{s,t} - X^{(i_2, i_4, i_1, i_3)}_{s,t} \\ - \, &X^{(i_1, i_4, i_2, i_3)}_{s,t} - X^{(i_4, i_1, i_2, i_3)}_{s,t} - X^{(i_4, i_2, i_1, i_3)}_{s,t} \\ - \, &X^{(i_1, i_3, i_4, i_2)}_{s,t} - X^{(i_3, i_1, i_4, i_2)}_{s,t} - X^{(i_3, i_4, i_1, i_2)}_{s,t} \\ - \, &X^{(i_2, i_3, i_4, i_1)}_{s,t} - X^{(i_3, i_2, i_4, i_1)}_{s,t} - X^{(i_4, i_3, i_2, i_1)}_{s,t} \Big). \end{aligned} \quad (2.6)$$
Similarly, substituting (2.2) into (2.5) gives
$$\begin{aligned} A^{((i_1, i_2), (i_3, i_4))}_{s,t} = \frac{1}{8} \Big( &X^{(i_1, i_2, i_3, i_4)}_{s,t} + X^{(i_1, i_3, i_2, i_4)}_{s,t} + X^{(i_3, i_1, i_2, i_4)}_{s,t} \\ + \, &X^{(i_2, i_1, i_4, i_3)}_{s,t} + X^{(i_2, i_4, i_1, i_3)}_{s,t} + X^{(i_4, i_2, i_1, i_3)}_{s,t} \\ + \, &X^{(i_4, i_3, i_1, i_2)}_{s,t} + X^{(i_4, i_1, i_3, i_2)}_{s,t} + X^{(i_1, i_4, i_3, i_2)}_{s,t} \\ + \, &X^{(i_3, i_4, i_2, i_1)}_{s,t} + X^{(i_3, i_2, i_4, i_1)}_{s,t} + X^{(i_2, i_3, i_4, i_1)}_{s,t} \\ - \, &X^{(i_2, i_1, i_3, i_4)}_{s,t} - X^{(i_2, i_3, i_1, i_4)}_{s,t} - X^{(i_3, i_2, i_1, i_4)}_{s,t} \\ - \, &X^{(i_1, i_2, i_4, i_3)}_{s,t} - X^{(i_1, i_4, i_2, i_3)}_{s,t} - X^{(i_4, i_1, i_2, i_3)}_{s,t} \\ - \, &X^{(i_3, i_4, i_1, i_2)}_{s,t} - X^{(i_3, i_1, i_4, i_2)}_{s,t} - X^{(i_1, i_3, i_4, i_2)}_{s,t} \\ - \, &X^{(i_4, i_3, i_2, i_1)}_{s,t} - X^{(i_4, i_2, i_3, i_1)}_{s,t} - X^{(i_2, i_4, i_3, i_1)}_{s,t} \Big). \end{aligned} \quad (2.7)$$
Thus, as one can see from (2.6) and (2.7) above, the general expressions for the two types of third order area as linear combinations of $4! = 24$ fourth order iterated integrals are indeed different. Further, general formulae for different types of higher order areas could be worked out just as straightforwardly – though the tedium of such an exercise increases factorially, and for our current purposes it is unnecessary to go beyond third order areas.

In addition to proving (2.3), (2.6) and (2.7) mathematically, we have also verified these expressions numerically by computing second and third order areas from first principles according to their definitions (2.1), (2.4) and (2.5), and comparing the results obtained using the two methods to make sure that they agree.
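As an illustration of this kind of check, formula (2.3) can be verified on a fine grid. In the sketch below (our own illustrative code, not the routines of Appendix 2), the left-hand side is computed from first principles as the area between $X^{i_1}$ and the area process of $X^{i_2}$ and $X^{i_3}$, the right-hand side as the signed combination of third order iterated integrals; both use simple trapezoidal discretisations, and the two agree to within the $O(h^2)$ discretisation error.

```python
import numpy as np

def iterated_integral(path, index):
    """X^{(index)}_{0,u} along a finely sampled path ((M+1) x d array),
    by repeated trapezoidal integration."""
    I = np.ones(len(path))
    for i in index:
        inc = 0.5 * (I[:-1] + I[1:]) * np.diff(path[:, i])
        I = np.concatenate([[0.0], np.cumsum(inc)])
    return I

def area_process(a, b):
    """u -> A^{(a,b)}_{0,u} for two sampled 1-dimensional paths, per Definition 2.1."""
    ra = 0.5 * (a[:-1] + a[1:]) - a[0]
    rb = 0.5 * (b[:-1] + b[1:]) - b[0]
    inc = 0.5 * (ra * np.diff(b) - rb * np.diff(a))
    return np.concatenate([[0.0], np.cumsum(inc)])

# a random piecewise linear test path with 3 components, finely resampled
rng = np.random.default_rng(7)
coarse_t = np.linspace(0.0, 1.0, 6)
fine_t = np.linspace(0.0, 1.0, 6001)
X = np.column_stack([np.interp(fine_t, coarse_t, np.cumsum(rng.standard_normal(6)))
                     for _ in range(3)])

# left-hand side of (2.3): area between X^1 and the area process of X^2 and X^3
lhs = area_process(X[:, 0], area_process(X[:, 1], X[:, 2]))[-1]

# right-hand side of (2.3): signed combination of third order iterated integrals
signed_perms = [((0, 1, 2), 1), ((1, 0, 2), 1), ((2, 1, 0), 1),
                ((0, 2, 1), -1), ((1, 2, 0), -1), ((2, 0, 1), -1)]
rhs = 0.25 * sum(sign * iterated_integral(X, perm)[-1]
                 for perm, sign in signed_perms)
```

Checks of (2.6) and (2.7) follow the same pattern, with one more level of nesting of the area process.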
Finally, we return to our earlier suggestion of viewing the operation of computing areas as a Lie bracket on the space of 1-dimensional paths. For, if $(X^1_u)_{s \leq u \leq t}$ and $(X^2_u)_{s \leq u \leq t}$ are two paths defined over the time interval $[s, t]$, we can define their product $(X^1 * X^2)_{s \leq u \leq t}$ to be the path $u \mapsto X^{(1,2)}_{s,u}$ for $u \in [s, t]$, i.e. multiplying paths means computing their second order iterated integral. Further, we define the Lie bracket $[X^1, X^2]_{s \leq u \leq t}$ of $X^1_u$ and $X^2_u$ to be their commutator, so that for $u \in [s, t]$
$$[X^1, X^2]_u := (X^1 * X^2 - X^2 * X^1)_u = X^{(1,2)}_{s,u} - X^{(2,1)}_{s,u} = 2A^{(1,2)}_{s,u} \, . \quad (2.8)$$
It is clear that the multiplication $*$ on the path space is non-commutative, since in general $X^{(1,2)}_{s,u}$ is not equal to $X^{(2,1)}_{s,u}$, and it is also important to realise that it is not associative either, as can readily be seen through theoretical considerations as well as verified by numerical examples, i.e.
$$\left( (X^1 * X^2) * X^3 \right)_u \neq \left( X^1 * (X^2 * X^3) \right)_u$$
for arbitrary paths $X^1_u$, $X^2_u$ and $X^3_u$ defined on $[s, t]$. Moreover, it should be pointed out that generally $\left( (X^1 * X^2) * X^3 \right)_u$ is not equal to $X^{(1,2,3)}_{s,u}$: e.g. for a 3-dimensional linear path $(X^1_u, X^2_u, X^3_u)_{s \leq u \leq t}$ the former equals $\frac{1}{4} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$ whereas the latter equals $\frac{1}{6} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$.
Despite being both bilinear and anti-commutative, the Lie bracket defined by (2.8) does not endow the path space with the algebraic structure of a Lie algebra, because the Jacobi identity does not hold, due to the non-associativity of the $*$ multiplication. However, using (2.3), the Lie bracket is easily seen to satisfy the following interesting identity for any $u \in [s, t]$:
$$\left[ [X^1, X^2], X^3 \right]_u + \left[ [X^2, X^3], X^1 \right]_u + \left[ [X^3, X^1], X^2 \right]_u = X^{(1,2,3)}_{s,u} - X^{(1,3,2)}_{s,u} + X^{(2,3,1)}_{s,u} - X^{(2,1,3)}_{s,u} + X^{(3,1,2)}_{s,u} - X^{(3,2,1)}_{s,u} \, .$$
Thus, we note that the expression on the right hand side of the above Jacobi-type identity for our Lie bracket is an alternating sum of third order iterated integrals over all the permutations of the path indices, where the sign of the term indexed by $(i_1, i_2, i_3)$ with distinct $i_k \in \{1, 2, 3\}$ for $k = 1, 2, 3$ is the sign of the permutation that maps $(1, 2, 3)$ to $(i_1, i_2, i_3)$.
2.5 Classification of paths using third order areas
2.5.1 Diffusion process market model
The idea of modelling the evolution of stock prices and other variables in financial
markets by diffusion processes goes back to the pioneering work of L. Bachelier at the
turn of the 20th century. By a diffusion process we mean an n-dimensional stochastic
process Xt =(X1
t , . . . , Xnt
)t≥0
that satisfies a stochastic differential equation of the
form
dX it = μi(t,Xt)dt +
m∑
j=1
σ ij (t,Xt)dBj
t
where, for each i ∈ {1, . . . , n}, μi : Rn+1 → R and σ ij : Rn+1 → R are the drift and
diffusion (volatility) coefficients of X it , and
(Bj
t
)t≥0
are independent Brownian motion
processes for j = 1, . . . ,m.
In a financial setting, a diffusion process $X_t = (X^1_t, \ldots, X^n_t)_{t \geq 0}$ may be used to model a financial market that consists of $n$ market variables $X^i_t$ ($i = 1, \ldots, n$) such as stock and commodity prices, currency exchange rates and interest rates of various maturities. Thus, in such a modelling framework the market is driven by
m random processes – which may be interpreted as representing macro- and micro-
economic factors as well as political or other events that affect the prices of financial
instruments – whose impacts on the value of each market variable are superimposed
on an underlying growth rate or ‘drift’ that is specific to each variable but may
depend on the values of other variables as well as time. In an alternative, more
restricted formulation, each market variable would be driven by a single random
process associated with it, though the Brownian motions for different variables would
be assumed to be correlated, and the drift and volatility of each variable would be
functions of its own value and time only. In this section, we shall adopt this approach
to modelling a financial market, as specified below.
Some market variables could be expected to grow in value at a constant or possibly deterministic time-dependent rate: e.g. it might be appropriate to assume that the share price of a publicly listed company would grow at a constant rate subject to random
shocks due to market releases of company-specific information or general economic
data. By contrast, other market variables have a tendency to fluctuate around some
long-term mean levels in a cyclical fashion – e.g. interest rates exhibit such mean-
reverting behaviour over economic cycles – and for any such market variable $X^i_t$ one could postulate its drift to be of the form $\theta^i \left( \alpha^i - X^i_t \right)$, where the parameters $\theta^i$ and $\alpha^i$ are called its mean reversion speed and mean-reverting level (or long-term mean),
respectively, which in general might be time-dependent. Hence, if the current value
of such a mean-reverting process is below its long-term mean, the drift is positive,
whereas if its current value is above the mean-reverting level, the drift is negative, so
that in both cases the process tends to revert towards its long-term mean.
From now on, we shall consider a financial market consisting of $n$ variables $X^i_t$ ($i = 1, \ldots, n$), each of which follows either a Wiener process with a constant drift (‘CD process’)
$$dX^i_t = \mu^i \, dt + \sigma^i \, dB^i_t \quad (2.9)$$
where $\mu^i$ and $\sigma^i > 0$ are constants, or a mean-reverting Ornstein-Uhlenbeck process (‘MR process’)
$$dX^i_t = \theta^i \left( \alpha^i - X^i_t \right) dt + \sigma^i \, dB^i_t \quad (2.10)$$
where $\alpha^i$, $\theta^i > 0$ and $\sigma^i > 0$ are constants, and the correlation $\rho^{ij}$ between the Brownian motion $B^i_t$ driving the variable $X^i_t$ and the Brownian driver $B^j_t$ of another variable $X^j_t$ is given by $\mathbb{E}\left[ dB^i_t \, dB^j_t \right] = \rho^{ij} \, dt$.

For $t \geq 0$, (2.9) and (2.10) can be easily integrated to give
$$X^i_t = X^i_0 + \mu^i t + \sigma^i B^i_t \quad (2.11)$$
and
$$X^i_t = X^i_0 \, e^{-\theta^i t} + \alpha^i \left( 1 - e^{-\theta^i t} \right) + \sigma^i \int_0^t e^{-\theta^i (t-s)} \, dB^i_s \, . \quad (2.12)$$
Thus, for a mean-reverting variable $X^i_t$, the limit of $\mathbb{E}[X^i_t]$ as $t$ tends to infinity is $\alpha^i$, so this parameter is indeed the long-term mean of such a process.
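Sample paths of the two process types can be generated directly from the closed-form solutions. The sketch below (illustrative code with our own function names, assuming NumPy) uses (2.11) for a CD process and, for an MR process, the exact one-step Ornstein-Uhlenbeck transition implied by (2.12), which avoids the discretisation bias of an Euler scheme:

```python
import numpy as np

def simulate_cd(x0, mu, sigma, T, n, rng):
    """CD realisation on n steps over [0, T] via the exact solution (2.11)."""
    t = np.linspace(0.0, T, n + 1)
    dB = rng.standard_normal(n) * np.sqrt(T / n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    return x0 + mu * t + sigma * B

def simulate_mr(x0, theta, alpha, sigma, T, n, rng):
    """MR realisation via the exact Ornstein-Uhlenbeck transition implied by
    (2.12): X_{t+dt} = X_t e^{-theta dt} + alpha (1 - e^{-theta dt}) plus
    Gaussian noise with the exact conditional standard deviation."""
    dt = T / n
    decay = np.exp(-theta * dt)
    step_std = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] * decay + alpha * (1.0 - decay) + step_std * rng.standard_normal()
    return x
```

With the noise switched off ($\sigma = 0$), the CD path reduces to the straight line $X^i_0 + \mu^i t$ and the MR path decays exponentially towards $\alpha^i$, which provides a simple sanity check of the implementation.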
We will simulate the evolution of our market $X_t = (X^1_t, \ldots, X^n_t)_{0 \leq t \leq T}$, as defined above, over a finite time horizon $[0, T]$ by generating a large number of correlated Brownian sample paths $(B^i_t)_{0 \leq t \leq T}$ for the market variables $(X^i_t)_{0 \leq t \leq T}$, $i = 1, \ldots, n$, assuming that all pairs of Brownian motions driving distinct market variables have the same correlation $\rho$.
Even though in our diffusion model all paths of market variables are, by construction, realisations of CD or MR processes for some combination of drift (or mean reversion speed and long-term mean) and volatility parameters, it is usually far from apparent, just by looking at such market paths, which type of diffusion process was used to generate them and what parameter values those processes may have had. For example, individual realisations of an MR process for different Brownian sample paths might well appear upward or downward trending rather than mean-reverting, depending on the characteristics of the Brownian sample paths driving the process. Indeed, an arbitrary continuous path can be represented as a realisation of either a CD or an MR process with any combination of parameters, provided one is free to choose the continuous path serving as the Brownian driver of the given path; and, further, any number of arbitrary paths can all be regarded as realisations of, say, the same CD process driven by different Brownian sample paths.
As can be seen from (2.9) and (2.10), for a small time step δt, the drift term of the
increment of a CD or MR process is of the order of δt whereas the expected absolute
value of the diffusive term is of the order of√
δt. This means that over short time
scales the diffusive term tends to dominate the drift term with random Brownian
movements obscuring any underlying constant drift or mean-reverting trend, such
latent tendencies only manifesting themselves over longer time periods. In practice,
when trying to estimate the parameters of a CD process one commonly finds that while
one can usually obtain a reasonably accurate estimate for the volatility parameter by
calculating the standard deviation of increments for a small number of realised paths
– even for a single realisation – the sample mean of increments is often, even for a
significantly larger number of realisations, a woefully inadequate estimate for the drift
parameter.
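This phenomenon is easy to reproduce numerically. The following sketch (illustrative code, not part of the thesis programs; parameter values are arbitrary) simulates increments of a CD process and compares the volatility estimate from the sample standard deviation of increments with the drift estimate from their sample mean.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 0.5, 1.0      # true drift and volatility (arbitrary values)
T, n, m = 1.0, 100, 10    # horizon, steps per path, number of realisations
dt = T / n

dB = np.sqrt(dt) * np.random.randn(m, n)   # Brownian increments
dX = mu * dt + sigma * dB                  # increments of the CD process

sigma_hat = dX.std() / np.sqrt(dt)   # volatility estimate: accurate
mu_hat = dX.mean() / dt              # drift estimate: its standard error here is
                                     # sigma/sqrt(m*T) ~ 0.3, comparable to mu itself
print(sigma_hat, mu_hat)
```

Even with ten realisations the volatility estimate is within a few percent of its true value, while the drift estimate remains dominated by Brownian noise.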
2.5.2 Areas for pairs of diffusion processes
The key to classifying paths in our diffusion process market model is to derive general
expressions for the area differentials of pairs of realisations of CD or MR processes.
For two CD processes (X^{i_k}_t)_{0≤t≤T} with drift and volatility parameters μ_{i_k} and σ_{i_k},
where i_k ∈ {1, ..., n} for k ∈ {1, 2}, both of which are driven by the same Brownian
path (B_t)_{0≤t≤T}, by substituting (2.9) and (2.11) into (2.2) one readily obtains

dA(i_1, i_2)_{0,t} = (1/2) (μ_{i_1} σ_{i_2} − μ_{i_2} σ_{i_1}) (t dB_t − B_t dt)   (2.13)
for 0 ≤ t ≤ T. Similarly, for two MR processes (Y^{i_k}_t)_{0≤t≤T} with long-term mean
and volatility parameters α_{i_k} and σ_{i_k} and the same mean reversion speed θ, where
i_k ∈ {1, ..., n} for k ∈ {1, 2}, both of which are driven by the same Brownian path
(B_t)_{0≤t≤T}, by substituting (2.10) and (2.12) into (2.2) and after some algebra one
arrives at the following expression:

dA(i_1, i_2)_{0,t} = (1/2) ((α_{i_1} − Y^{i_1}_0) σ_{i_2} − (α_{i_2} − Y^{i_2}_0) σ_{i_1}) (f_t(θ) dB_t − θ Z_t(θ) dt)   (2.14)

where f_t(θ) = 1 − e^{−θt} and Z_t(θ) = ∫_0^t e^{−θ(t−s)} dB_s for 0 ≤ t ≤ T. We note that
as θ → 0, f_t(θ) ≈ θt and Z_t(θ) → B_t, so in the limit when the mean reversion
speed approaches zero, we recover (2.13) with the drifts μ_{i_k} equal to the initial drifts
θ(α_{i_k} − Y^{i_k}_0) of the two MR processes for k ∈ {1, 2}.
Just as straightforwardly one could also derive a (somewhat messier) expression
for the area differential of a CD process and an MR process driven by the same
Brownian path as a linear combination of dBt and dt terms whose coefficients are
functions of ft(θ), t, Bt and Zt(θ) as well as the parameters of the two processes, but
since this won’t be used in our subsequent analysis it is omitted.
Hence, as can be seen from (2.13) and (2.14), the increment of the area of two
CD or two MR processes with the same mean reversion speed and driven by the
same Brownian path over a small (non-infinitesimal) time interval δt has a stochastic
dependence on the cumulative value of the underlying Brownian motion as well as a
deterministic time dependence, and, depending on the relative magnitudes and signs
of these quantities, the area increment can have either the same or the opposite sign
as the Brownian increment δBt at different points in time along the path. Thus, in
this sense increments of the area process of two such CD or MR processes are in gen-
eral neither perfectly correlated nor anti-correlated with increments of the Brownian
motion driving the processes.
However, the key observation is that the ratio of area increments of two pairs
of CD processes or two pairs of MR processes that have the same mean reversion
speed, with all four processes driven by the same Brownian path, is a deterministic,
time-invariant constant that depends only on the drift and volatility parameters of
the four CD processes or on the long-term mean and volatility parameters of the
four MR processes (as well as their initial values, but is independent of the mean
reversion speed shared by the four MR processes). This means that in these cases the
trajectory t ↦ (A(i_1, i_2)_{0,t}, A(i_3, i_4)_{0,t}) is a straight line of constant gradient for t ∈ [0, T],
or, equivalently, the third order area A((i_1, i_2), (i_3, i_4))_{0,t} is identically equal to zero for all
t ∈ [0, T]. This is illustrated in Figure 6 below where, at time points t_i = (i/100)T for
i = 0, 1, ..., 100, the area of two MR processes is plotted against the area of another
pair of MR processes with different long-term mean and volatility parameters but
all four processes having the same mean reversion speed and driven by the same
Brownian path.
Figure 6: Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
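The straight-line property can be verified numerically. The sketch below (illustrative code with arbitrarily chosen parameters, not the thesis implementation; it uses CD processes for simplicity) computes discrete area processes by trapezoidal integration, which is exact for piecewise linear paths, and checks that by (2.13) both areas are proportional to the same underlying process, so that their own area, the third order area, vanishes up to floating-point rounding.

```python
import numpy as np

def area_process(x, y):
    """Discrete Levy area A(x, y)_{0,t} of two piecewise linear paths,
    computed exactly by the trapezoidal rule."""
    dx, dy = np.diff(x), np.diff(y)
    xm = 0.5 * (x[:-1] + x[1:]) - x[0]   # segment midpoints relative to start
    ym = 0.5 * (y[:-1] + y[1:]) - y[0]
    return np.concatenate([[0.0], np.cumsum(0.5 * (xm * dy - ym * dx))])

np.random.seed(1)
T, n = 1.0, 100
t = np.linspace(0.0, T, n + 1)
B = np.concatenate([[0.0], np.cumsum(np.sqrt(T / n) * np.random.randn(n))])

# Four CD processes X = X_0 + mu*t + sigma*B_t, all driven by the same B
# (parameter values below are arbitrary)
def cd(x0, mu, sigma):
    return x0 + mu * t + sigma * B

A12 = area_process(cd(0.0, 0.5, 1.0), cd(1.0, -0.3, 0.8))   # c12 = 0.7
A34 = area_process(cd(0.5, 1.2, 0.4), cd(0.0, 0.7, 1.5))    # c34 = 1.52

# The trajectory (A12, A34) is a straight line of gradient c34/c12,
# so the third order area is identically zero
A3 = area_process(A12, A34)
print(np.max(np.abs(A3)))   # zero up to floating-point rounding
```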
To underline the fundamental importance of the assumptions of the same mean
reversion speed for MR processes and a common Brownian driving path to the linear
relationship between areas of pairs of CD or MR processes, Figure 7 below illustrates
the dramatic impact that changing the mean reversion speed – even slightly – for one
of the MR processes that were used to create Figure 6 can have on this relationship
– it suddenly becomes highly non-linear!
Figure 7: Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
In fact, the area process of a pair of CD or MR processes with the same mean
reversion speed can sometimes be completely ‘derailed’ in this manner by changing
any parameter of one of the processes for the duration of a single time step only
while having a barely perceptible effect on the future evolution of the process that
has been momentarily perturbed. So, any hope that the above result on the linear
relationship between area processes could be extended to CD or MR processes with
deterministic time-dependent drift or volatility parameters is unfortunately forlorn (as
is also evident from a theoretical perspective by relaxing the parameter assumptions
in (2.13) and (2.14) above) – it only works for constant parameters.
If instead one pairs a CD process with an MR process and plots their area against
the area of a pair of either two CD processes or two MR processes or against that of
another mixed pair of CD and MR processes, one can no longer expect to observe a
linear relationship between the two area processes even though all four path processes
may still be driven by the same Brownian motion. And, naturally, the same will be
true for a scatter plot of the areas of a pair of CD paths and a pair of MR paths.
Figure 8 below exhibits the areas of a pair of CD processes and a mixed pair of CD
and MR processes plotted against each other for one Brownian sample path – and for
different realisations of Brownian motion this scatter plot would look very different!
Figure 8: Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes all driven by the same Brownian path.
As a further example of intriguing non-linear behaviour that can be witnessed in
some such cases, in Figure 9 below we display the final values of the areas of two
mixed pairs of CD and MR processes at the end of the time interval [0, T] for 500
Brownian sample paths.
As can be seen from Figure 9, different Brownian paths have quite disparate effects
on these two area processes resulting in a highly non-linear scatter plot. It is rather
surprising that by simply changing some of the parameter values (e.g. the long-term
means of the two MR processes) the relationship between the two area processes can
be rendered approximately linear, as is shown in Figure 10 below, so that for each
Brownian path both of these areas evolve in essentially the same way. Furthermore,
the linear relationship can be made even tighter by increasing mean reversion speeds
for the MR processes (without affecting the slope).

Figure 9: Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
As said, for the areas of two pairs of CD or MR processes to be linearly related,
in addition to all four processes having the same mean reversion speed (regarding
CD processes as having zero mean reversion speed even though in general they have
non-zero drifts), it is also a crucial requirement that they should all be driven by
the same Brownian path. To illustrate this point, in Figure 11 below are plotted
the terminal values of the areas of the two pairs of MR processes that were used to
produce Figure 6 for 500 simulation runs under the scenarios where the Brownian
motions driving the four MR processes have pairwise correlations of 1.00, 0.99 and
0.90, respectively. We observe in the scatter plots increasing dispersion about the
straight line as correlation is lowered. Indeed, this effect is quite pronounced even for a
marginal reduction in correlation from 1.00 to 0.99 though this makes the Brownian
paths diverge only slightly from one another. Reducing correlation further to 0.90 –
which is still a high value for path correlation – renders the linear relationship scarcely
discernible, and for lower values of correlation the points appear to be scattered at
random without any recognisable pattern.

Figure 10: Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
Alternatively, one could consider the third order area of four CD or MR processes
with the same mean reversion speed for a single simulation run (with the same four
sequences of independent Gaussian increments) for different values of the pairwise
correlation ρ. By examining the entries of the Cholesky factorization of the correlation
matrix² that is used to produce correlated Gaussian increments from four sequences
of independent drawings from the standard normal distribution, it is not too difficult
to see that, for values of ρ close to 1, the third order area is O(√(1 − ρ²)), so that,
for example, reducing correlation from 99.99% to 99.00% increases the third order
area approximately by a factor of 10. However, when all the diffusion processes have
the same parameters – so that the four paths are realisations of the same CD or MR
process for highly correlated Brownian motions – in the expression (2.7) for the third
order area all the fourth order iterated integrals that are O(√(1 − ρ²)) actually cancel
out, wherefore in this case the third order area is in fact O(1 − ρ²), and, for this
reason, it grows twice as fast on a logarithmic scale as correlation is reduced from 1.

Figure 11: Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs with the pairwise correlation between Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.

² For it can readily be shown that each of the entries below the top row (1, ρ, ρ, ρ) of the upper triangular Cholesky factor tends to some constant multiple of √(1 − ρ²) as ρ approaches 1.
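The footnoted claim about the Cholesky factor can be checked directly. In the sketch below (illustrative code; the two ρ values are the ones quoted in the text), we factorise the 4×4 equicorrelation matrix and confirm that the entries below the first row of the upper triangular factor shrink in proportion to √(1 − ρ²) as ρ → 1, so that lowering correlation from 99.99% to 99.00% scales them up by roughly a factor of 10.

```python
import numpy as np

def chol_equicorr(rho):
    """Upper triangular Cholesky factor of the 4x4 equicorrelation matrix."""
    C = np.full((4, 4), rho)
    np.fill_diagonal(C, 1.0)
    return np.linalg.cholesky(C).T   # numpy returns the lower factor

U1 = chol_equicorr(0.9999)
U2 = chol_equicorr(0.9900)

# Entries below the top row (1, rho, rho, rho) are O(sqrt(1 - rho^2));
# e.g. U[1,1] = sqrt(1 - rho^2) exactly
ratio = U2[1, 1] / U1[1, 1]
expected = np.sqrt((1 - 0.99**2) / (1 - 0.9999**2))
print(ratio, expected)   # both close to 10
```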
2.5.3 Classifying sample paths of diffusion processes by using third order areas
In the previous subsection, we established, via (2.13) and (2.14), the fundamental
result that the area process of the areas of two pairs of paths – i.e. the third order
area (of this type) of the four paths concerned – that are realisations of either CD
processes (with arbitrary drift and volatility parameters) or MR processes which
have the same mean reversion speed (but arbitrary long-term means and volatilities)
is identically zero as a function of time whenever all of these processes are driven by
the same Brownian motion.
We also saw that the third order area provides a sensitive means of detecting
differences in mean reversion speeds between four given diffusion processes that are
driven by the same Brownian motion, as the value of the third order area of their
sample paths can deviate spectacularly from zero even when discrepancies in mean
reversion speeds are so slight that their impacts on the sample paths are hardly visible.
Thus, third order areas can be used to classify sample paths of diffusion processes
according to their mean reversion speeds: given an arbitrary ‘market’ path that
is assumed to be the realisation of a CD or MR process for some Brownian sample
path, we can combine it with three other paths that are realisations of either CD
or MR processes with the same mean reversion speed (but drift or long-term mean
and volatility parameters that can be given arbitrary values) all driven by the same
Brownian motion, and run a sufficiently large number of simulations to determine the
value of mean reversion speed and the Brownian sample path that minimise the third
order area of these four paths. More precisely, we want to find the combination of
mean reversion speed and Brownian sample path that minimises the Euclidean norm
of the third order area over the time interval [0, T], as given by the expression below:

( Σ_{i=0}^{n} ( A((1, 2), (3, 4))_{0, i(T/n)} )² )^{1/2}
where n is the number of time steps per simulation. In this way, computing third
order areas of market paths with sets of three test paths of known characteristics can
be used to differentiate between the two basic modes of market behaviour, namely
upward or downward trending (with constant drift) versus mean-reverting.
However, going back to our earlier discussion in Subsection 2.5.1, since an arbitrary
market path can be represented as the realisation of either a CD or MR process
(with any given parameters) for some Brownian sample path, there are no absolute
grounds for regarding one path as ‘upward trending’, say, and another path as ‘mean-
reverting’ if one works with the whole sample space of Brownian motion, that is,
the set of all continuous paths. To make the distinction between these two types of
path meaningful, we need to restrict, for the purposes of our analysis, the cardinality
of the set of Brownian sample paths driving diffusion processes to a finite number.
Nevertheless, this number, which we will denote by N , can be chosen to be arbitrarily
large.
Let us now illustrate our path classification method with a numerical example. As a
first step, we generate 1,000 Brownian sample paths, each comprising 100 independent
Gaussian increments (so in this example n = 100 and N = 1000). For our market
path, we choose the realisation of an MR process with mean reversion speed θmkt
equal to 5.1 (and some arbitrary long-term mean and volatility) for the first Brownian
sample path (i.e. of index 1). Then, for three ‘test’ MR processes which have the same
mean reversion speed θtest (and fixed but arbitrary long-term means and volatilities),
we compute the third order area of the quadruple of market and test paths that are
the realisations of the test MR processes for each of the 1,000 Brownian sample paths
– so that in each case the three test paths are driven by the same Brownian motion
– and for each value of θtest ∈ {0.0, 1.0, . . . , 9.0} we record the minimum Euclidean
norm of the third order area and the index of the Brownian sample path that attains
the minimum area value³. The results of this experiment are displayed in Table 1.
Table 1: Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.
³ For θ_test = 0.0, the test paths are actually realisations of CD processes with different drifts and volatilities. Hence, for a market path that is the realisation of some CD process, the third order area would be identically zero when the test paths are driven by the first Brownian sample path.
There are several interesting observations that one can make from these results.
Firstly, we note a general tendency for the third order area to increase with increasing
value of θtest – which is indeed what one would expect on the basis of (2.14). However,
this trend is bucked at θtest = 5.0 – the value of mean reversion speed for the test
paths that is closest to that of the market path (recall θmkt = 5.1) – where we observe
a sharp dip in the minimum value of the Euclidean norm of the third order area
that is attained when the test paths are driven by the same Brownian motion as the
market path. It is also worth noticing that while for θtest = 6.0 the minimum third
order area is also attained by the same Brownian sample path (of index 1) – though
with much higher minimum value compared to when θtest = 5.0 – for all other values
of θtest different Brownian sample paths are responsible for producing the minima, as
shown in Table 1.
Having thus found an approximate value for θmkt – namely, that it is around 5.0 –
as well as correctly identified the Brownian sample path that drives the market path,
we could proceed to determine θmkt to an arbitrary degree of accuracy by iterating on
the value of θtest to make the third order area converge towards zero. However, there
is a more direct and efficient way to determine the exact value of θmkt as well as those
of the other parameters: since both the market path and the Brownian sample path
driving it are now completely known, we can write down equations for, say, the first
three increments of the market path using (2.10), and as these are three simultaneous
equations that are linear in θmkt, σmkt and θmktαmkt, they can be easily solved for
the mean reversion speed, volatility and long-term mean of the MR process whose
realisation the market path is.
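This last step can be sketched as follows (illustrative code with arbitrary parameter values; the Euler-discretised form of (2.10) is used to generate the increments, so the recovery is exact for data produced by the same scheme). Writing each increment as δX_k = (θα)δt − θX_k δt + σδB_k, three increments give a 3×3 linear system in the unknowns (θα, θ, σ).

```python
import numpy as np

theta, alpha, sigma = 5.1, 0.5, 1.0    # true MR parameters (arbitrary values)
dt = 0.01
dB = np.array([0.12, -0.05, 0.08])     # the (now known) Brownian increments

# First values of the market path from the Euler form of (2.10):
# dX_k = theta*(alpha - X_k)*dt + sigma*dB_k
X = [1.0]
for k in range(3):
    X.append(X[-1] + theta * (alpha - X[-1]) * dt + sigma * dB[k])
X = np.array(X)
dX = np.diff(X)

# Three equations, linear in u = (theta*alpha, theta, sigma):
# dX_k = u[0]*dt - u[1]*X_k*dt + u[2]*dB_k
M = np.column_stack([np.full(3, dt), -X[:3] * dt, dB])
u = np.linalg.solve(M, dX)

theta_hat, sigma_hat = u[1], u[2]
alpha_hat = u[0] / u[1]
print(theta_hat, alpha_hat, sigma_hat)   # recovers 5.1, 0.5, 1.0 up to rounding
```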
As this example demonstrates, in a market model where the evolution of every
variable is the realisation of a CD or MR diffusion process for one of finitely many
Brownian sample paths, the above method of computing third order areas can be used
to classify market paths into upward/downward trending versus mean-reverting ones
by efficiently determining the values of mean reversion speed and other parameters
as well as the Brownian sample path that drives the diffusion process.
In order to simulate the application of this path classification method to real mar-
ket data streams (i.e. arbitrary continuous paths), we also carried out an experiment
where market paths were realisations of CD or MR processes for Brownian sample
paths not belonging to the set of Brownian paths that were used to drive test paths.
Unfortunately, unlike in the previous experiment, in this experiment the approximate
value of mean reversion speed of the market path could not be identified as the value
of mean reversion speed for test paths that produces the lowest third order area –
even when the market and test paths had exactly the same mean reversion speed
their third order area did not stand out by having a distinctively low, or even zero,
value, and in most cases the Brownian driver of test paths that minimised third order
area bore no resemblance to the Brownian path driving the market path! Moreover,
increasing the number of Brownian sample paths from 1,000 to 10,000 or even 100,000
did not improve the performance of this path classification method. Evidently, the
lesson to learn from this latter exercise is that even in a very large but finite space
of Brownian sample paths the probability of chancing upon a path that is, in some
defined sense, close to a given arbitrary path is vanishingly small – in fact zero – and
in the previous subsection we saw how even tiny discrepancies in the driving paths of
CD or MR processes can have a dramatic impact on their area processes!
In view of this negative (though not at all surprising) result, it is clear that the
proposed method of classifying paths according to their mean reversion speeds is not
applicable, without modification, in the general market setting where paths of market
variables can be thought of as realisations of diffusion processes for arbitrary sample
paths of Brownian motion (i.e. any continuous paths). However, if the problem could
somehow be reduced to one involving a space of only finitely many Brownian sample
paths, this method would be rendered viable, as was shown in the first experiment.
One idea in this direction would be to try to use third order areas of quadruples of
an arbitrary market path and three CD or MR test paths computed, as in the above
experiments, for a range of mean reversion speeds and a finite number of Brownian
sample paths, in order to approximate a given market path by a sum of realisations
of CD and MR processes for paths in a fixed, finite sample space of Brownian motion.
This suggestion will be pursued by the author as a line of future research outside the
scope of this thesis.
2.6 Conclusion
We began this chapter by surveying existing literature for applications of the theory
of rough paths to time series analysis, and found that so far the two main approaches
have been to use the signatures of multi-dimensional discrete data streams as feature
sets in linear regression for the purposes of statistical classification and prediction –
an approach that can capture subtle underlying features of market behaviour when
applied to financial data – and to employ expected signatures of multi-dimensional
stochastic processes for estimating their parameters.
In our original research work presented in this thesis, we have taken a different,
more direct (non-statistical and non-probabilistic) approach by exploring ways in
which first and higher order areas of multi-dimensional data streams – an nth order
area being a specific linear combination of (n + 1)! signature components of order
(n+1) – can be used to classify data streams according to their basic characteristics.
Having first developed all the requisite mathematical and computational tools for
this task, we have shown, as a particular application of this approach, that in a
market model where every variable follows either a Wiener process with a constant
drift or a mean-reverting Ornstein-Uhlenbeck process driven by one of finitely many
Brownian sample paths, third order areas provide an efficient means of determining
the parameters of a market variable given any of its realisations, thus enabling one to
distinguish between the two fundamental modes of market behaviour, namely upward
or downward trending versus mean-reverting.
In conclusion, our path classification method based on third order areas represents
a novel way of using signature components of multi-dimensional paths for the purposes
of time series analysis of financial data streams.
An interesting idea for future research would be to investigate the possibility of
using third order areas as a tool for decomposing arbitrary market paths into mean-
reverting path components with a spectrum of mean reversion speeds.
References
[1] K. T. Chen. Iterated path integrals. Bulletin of the American Mathematical Society,
83:831–879, 1977.
[2] G. Flint, B. Hambly, and T. Lyons. Discretely sampled signals and the rough
Hoff process. Stochastic Processes and their Applications, arXiv:1310.4054v11,
2016.
[3] X. Geng. Reconstruction for the signature of a rough path. Preprint
arXiv:1508.06890v2, 2016.
[4] M. Gubinelli. Ramification of rough paths. Journal of Differential Equations,
248(4):693–721, 2010.
[5] L. J. Gyurko, T. Lyons, M. Kontkowski, and J. Field. Extracting information
from the signature of a financial data stream. Preprint arXiv:1307.7244v2, 2014.
[6] B. M. Hambly and T. J. Lyons. Uniqueness for the signature of a path of bounded
variation and the reduced path group. Annals of Mathematics, 171(1):109–167,
2010.
[7] B. Hoff. The Brownian frame process as a rough path. D.Phil. thesis, University
of Oxford, 2005.
[8] D. Levin, T. Lyons, and H. Ni. Learning from the past, predicting the statistics
for the future, learning an evolving system. Preprint arXiv:1309.0260v6, 2016.
[9] T. J. Lyons. Differential equations driven by rough signals. Revista Matemática
Iberoamericana, 14(2):215–310, 1998.
[10] T. J. Lyons, M. Caruana, and T. Lévy. Differential Equations Driven by Rough
Paths. Number 1908 in Lecture Notes in Mathematics. Springer-Verlag, 2007.
[11] T. J. Lyons and Z. Qian. System Control and Rough Paths. Oxford Mathematical
Monographs. Oxford University Press, 2002.
[12] R. Ree. Lie elements and an algebra associated with shuffles. Annals of Mathe-
matics, 68(2):210–220, 1958.
[13] L. C. Young. An inequality of Hölder type, connected with Stieltjes integration.
Acta Mathematica, 67:251–282, 1936.
Appendix 1: Quadratic variation and cross-variation of data streams
In this appendix we compute the signature of a multi-dimensional data stream after
lead- and lag-transforming two of its path components respectively.
Let (X_t)_{t=0}^{N} = (X^1_t e_1 + ⋯ + X^d_t e_d)_{t=0}^{N} be a data stream in V = R^d with a set of basis
vectors {e_1, ..., e_d}, and denote by (Y_{t/2})_{t=0}^{2N} the data stream derived from (X_t)_{t=0}^{N}
by lead-transforming its ith component and lag-transforming its jth component in
accordance with the GLKF method, so that Y^i_t = X^i_t and Y^j_t = X^j_t for t = 0, ..., N, and
Y^i_{t−1/2} = X^i_t and Y^j_{t−1/2} = X^j_{t−1} for t = 1, ..., N. As we are ultimately interested in the
area between the Y^i and Y^j path components – i.e. (half of) the difference between
the (i, j) and (j, i) second order signature components of Y – in our calculation we will
explicitly show only those signature components that can contribute to this quantity
(i.e. the first order increments in Y^i and Y^j in addition to the aforementioned second
order signature components).
Computing the signature of Y over the time interval [0, 1/2], we get

S(Y)_{0,1/2} = 1 + (X^i_1 − X^i_0) e_i + 0 e_j + (1/2)(X^i_1 − X^i_0)·0 e_i ⊗ e_j + (1/2)·0·(X^i_1 − X^i_0) e_j ⊗ e_i + ...
= 1 + (X^i_1 − X^i_0) e_i + ...

Likewise, S(Y)_{1/2,1} = 1 + (X^j_1 − X^j_0) e_j + ..., whence we obtain

S(Y)_{0,1} = S(Y)_{0,1/2} ⊗ S(Y)_{1/2,1} = 1 + (X^i_1 − X^i_0) e_i + (X^j_1 − X^j_0) e_j + (X^i_1 − X^i_0)(X^j_1 − X^j_0) e_i ⊗ e_j + ...
Similarly, we have

S(Y)_{1,2} = S(Y)_{1,3/2} ⊗ S(Y)_{3/2,2} = 1 + (X^i_2 − X^i_1) e_i + (X^j_2 − X^j_1) e_j + (X^i_2 − X^i_1)(X^j_2 − X^j_1) e_i ⊗ e_j + ...
which, as S(Y)_{0,1} ⊗ S(Y)_{1,2} = S(Y)_{0,2}, gives

S(Y)_{0,2} = 1 + (X^i_2 − X^i_0) e_i + (X^j_2 − X^j_0) e_j
+ {(X^i_1 − X^i_0)(X^j_1 − X^j_0) + (X^i_2 − X^i_1)(X^j_2 − X^j_1) + (X^i_1 − X^i_0)(X^j_2 − X^j_1)} e_i ⊗ e_j
+ (X^j_1 − X^j_0)(X^i_2 − X^i_1) e_j ⊗ e_i + ...
Further, extending the signature calculation to the next time step yields

S(Y)_{0,3} = 1 + (X^i_3 − X^i_0) e_i + (X^j_3 − X^j_0) e_j
+ {(X^i_1 − X^i_0)(X^j_1 − X^j_0) + (X^i_2 − X^i_1)(X^j_2 − X^j_1) + (X^i_3 − X^i_2)(X^j_3 − X^j_2)
+ (X^i_1 − X^i_0)(X^j_2 − X^j_1) + (X^i_2 − X^i_0)(X^j_3 − X^j_2)} e_i ⊗ e_j
+ {(X^j_1 − X^j_0)(X^i_2 − X^i_1) + (X^j_2 − X^j_0)(X^i_3 − X^i_2)} e_j ⊗ e_i + ...
Thus, we can see from the above expressions for S(Y)_{0,1}, S(Y)_{0,2}, S(Y)_{0,3}, etc.
that if X^i_t = X^j_t for all t = 0, ..., N – i.e. Y^i and Y^j are the lead- and lag-transforms
of the same data stream – then the difference between the (i, j) and (j, i) signature
components of Y is equal to the quadratic variation of this data stream. However,
when X^i and X^j are two different data streams, it is evident that subtracting the
(j, i) component from the (i, j) component does not in general cancel out the ‘extra’
terms in the latter, and so does not leave just the quadratic cross-variation of X^i and X^j. This can
also be demonstrated experimentally in random simulations, where (twice) the area
between lead- and lag-transformed data streams and the quadratic cross-variation of
the original streams usually have decidedly different values.
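The single-stream case is easy to reproduce numerically. The sketch below (illustrative code with a randomly generated stream; not the thesis script of Appendix 2) builds the GLKF lead-lag transform of one data stream, computes its second order signature components directly segment by segment, and checks that the difference between the (i, j) and (j, i) components equals the quadratic variation of the stream.

```python
import numpy as np

np.random.seed(4)
N = 20
X = np.cumsum(np.random.randn(N + 1))   # a one-dimensional data stream

# GLKF lead-lag transform: at half-integer times the lead component has
# already moved on to X_t while the lag component still holds X_{t-1}
lead = np.repeat(X, 2)[1:]      # X_0, X_1, X_1, X_2, X_2, ..., X_N
lag  = np.repeat(X, 2)[:-1]     # X_0, X_0, X_1, X_1, ..., X_{N-1}, X_N
Y = np.column_stack([lead, lag])

# Second order signature components of the piecewise linear path Y:
# each segment with increment d contributes outer(P - P_0, d) + outer(d, d)/2
S2 = np.zeros((2, 2))
P = np.zeros(2)                 # position relative to the starting point
for d in np.diff(Y, axis=0):
    S2 += np.outer(P, d) + 0.5 * np.outer(d, d)
    P += d

qv = np.sum(np.diff(X) ** 2)    # quadratic variation of the stream
print(S2[0, 1] - S2[1, 0], qv)  # equal
```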
Appendix 2: Python code
In this appendix we exhibit samples of source code of the computer programs written
in Python that have been used to produce all the computational results presented in
this thesis.
1. Signature function
The numerical algorithm that is fundamental to all the computational work done in
this piece of research is a function that generates a time-indexed sequence of truncated
signatures of an arbitrary degree specified by the user for a given multi-dimensional
time series. This is accomplished through the following two-step procedure: 1) by
repeated application of outer matrix multiplication (a standard Numpy functionality)
computing from path increments iterated integrals of successively higher orders for
each linear segment of the continuous multi-dimensional path constructed by linearly
interpolating between discrete data points, and constituting these into an incremen-
tal signature over the time step by appending them to a list starting with 1 as the
zeroth order component; and 2) by joining together incremental signatures for con-
tiguous time steps using tensor multiplication (by another application of outer matrix
multiplication) to form a cumulative signature over the whole time period.
The source code of our user-defined signature function is shown in full below.
import numpy as np
""" Definition of a function to compute a time-indexed sequence of truncated signatures
of a specified degree of multi-dimensional path increments """
""" For t=0,1,...,(t_fin-t_ini) and k=0,1,...,p, Sig( ) function computes signature components
(i.e. iterated integrals) S[t][k][d_1,...,d_k] with d_j=0,1,...,d-1 for j=1,...,k """
def Sig(path_increments, t_ini, t_fin, signature_degree):
    I = path_increments   # matrix of floats with n rows and d columns
    p = signature_degree  # degree of truncated signature
    n = len(I[:,0])       # number of increments per path component
    d = len(I[0,:])       # number of path components (dimension)
    # t_ini (0,...,n-1) and t_fin (1,...,n) are initial and final time points

    """ Initialize signature for time t_ini """
    Sig_0 = [1.0]
    Z = np.zeros(d)
    for i in range(p):
        s = np.multiply.outer(Sig_0[i], Z)
        Sig_0.append(s)
    Sig = Sig_0
    S = [Sig_0]

    """ Compute incremental signature for each time step and update cumulative signature """
    for i in range(t_ini, t_fin, 1):  # iterate over time points from t_ini to t_fin-1
        J = I[i,:]                    # extract ith row of path increments
        Sig_inc = [1.0]
        for j in range(p):            # compute incremental signature
            x = np.multiply.outer(Sig_inc[j], J)
            x = x/(j+1)
            Sig_inc.append(x)
        Sig_new = [1.0]
        for k in range(1, p+1, 1):    # k = 1,...,p
            y = Sig_0[k]
            for l in range(k+1):      # l = 0,1,...,k
                y = y + np.multiply.outer(Sig[l], Sig_inc[k-l])
            Sig_new.append(y)
        Sig = Sig_new                 # update cumulative signature
        S.append(Sig)                 # form sequence of signatures indexed by time
    return S                          # return value of user-defined signature function
2. Lead-lag transforms
Below we exhibit a Python script that can be used to obtain lead and lag transforms
of arbitrary multi-dimensional data streams in accordance with the three methods
examined in Section 2.3, and to compute their signatures – to check that they are
indeed invariant under these transforms – as well as to visualise the original and
lead-lag transformed paths.
import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib.ticker import MultipleLocator
majorLocator = MultipleLocator(0.20)
minorLocator = MultipleLocator(0.10)
""" Set dimensionality and number of time steps """
d = 3 # number of paths (dimensionality)
n = 10 # number of increments per path
p = 3 # degree of signature
""" Generate original Brownian paths """
np.random.seed(1) # initialize random number generation
I = np.random.randn(n, d) # standard normal random increments
z = np.zeros(d)
dP = np.vstack( (z, I) )
P = np.cumsum(dP, axis=0) # original Brownian paths
S = Sig(I, 0, n, p) # compute their signature
""" Lead transform paths (GLKF method) """
c1 = np.delete( np.repeat(P[:,0], 2), 0) # lead transform first path
c2 = np.delete( np.repeat(P[:,1], 2), 0) # lead transform second path
c3 = np.delete( np.repeat(P[:,2], 2), 0) # lead transform third path
R = np.column_stack( (c1, c2, c3) ) # lead transformed paths
R1 = np.delete(R, 0, 0) # delete top row
R2 = np.delete(R, 2*n, 0) # delete bottom row
J = R1 - R2 # increments of lead transformed paths
T = Sig(J, 0, 2*n, p) # compute their signature
""" Lag transform paths (GLKF method) """
C1 = np.delete( np.repeat(P[:,0], 2), (2*n + 1)) # lag transform first path
C2 = np.delete( np.repeat(P[:,1], 2), (2*n + 1)) # lag transform second path
C3 = np.delete( np.repeat(P[:,2], 2), (2*n + 1)) # lag transform third path
Q = np.column_stack( (C1, C2, C3) ) # lag transformed paths
Q1 = np.delete(Q, 0, 0) # delete top row
Q2 = np.delete(Q, 2*n, 0) # delete bottom row
K = Q1 - Q2 # increments of lag transformed paths
U = Sig(K, 0, 2*n, p) # compute their signature
""" Lead and lag transformed paths have the same signature as the original paths! """
print S # original signature
print T # lead transformed signature
print U # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
u = np.linspace(0.0, 1.0, (2*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(u, R[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(u, Q[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Gyurko-Lyons-Kontkowski-Field Lead-Lag Transforms', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
""" Lead transform paths (old FHL method) """
d1 = np.insert( np.repeat(P[:,0], 2), 2*n+2, P[n,0] ) # lead transform first path
d1 = np.delete( np.delete(d1, 1), 1)
d2 = np.insert( np.repeat(P[:,1], 2), 2*n+2, P[n,1] ) # lead transform second path
d2 = np.delete( np.delete(d2, 1), 1)
d3 = np.insert( np.repeat(P[:,2], 2), 2*n+2, P[n,2] ) # lead transform third path
d3 = np.delete( np.delete(d3, 1), 1)
V = np.column_stack( (d1, d2, d3) ) # lead transformed paths
V1 = np.delete(V, 0, 0) # delete top row
V2 = np.delete(V, 2*n, 0) # delete bottom row
L = V1 - V2 # increments of lead transformed paths
X = Sig(L, 0, 2*n, p) # compute their signature
""" Lag transform paths (old FHL method) """
D1 = np.insert( np.repeat(P[:,0], 2), 0, P[0,0] ) # lag transform first path
D1 = np.delete( np.delete(D1, 2*n), 2*n)
D2 = np.insert( np.repeat(P[:,1], 2), 0, P[0,1] ) # lag transform second path
D2 = np.delete( np.delete(D2, 2*n), 2*n)
D3 = np.insert( np.repeat(P[:,2], 2), 0, P[0,2] ) # lag transform third path
D3 = np.delete( np.delete(D3, 2*n), 2*n)
W = np.column_stack( (D1, D2, D3) ) # lag transformed paths
W1 = np.delete(W, 0, 0) # delete top row
W2 = np.delete(W, 2*n, 0) # delete bottom row
M = W1 - W2 # increments of lag transformed paths
Y = Sig(M, 0, 2*n, p) # compute their signature
""" Lead and lag transformed paths have the same signature as the original paths! """
print S # original signature
print X # lead transformed signature
print Y # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
u = np.linspace(0.0, 1.0, (2*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(u, V[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(u, W[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Flint-Hambly-Lyons Lead-Lag Transforms: Old Method', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
""" Lead transform paths (new FHL method) """
e1 = np.delete( np.repeat(np.delete(P[:,0], 0), 4) , 0) # lead transform first path
e1 = np.insert( np.insert( e1, 4*n - 1, P[n,0]), 4*n, P[n,0])
# in order to preserve signature, set ...
#e1[0] = 0.0
e2 = np.delete( np.repeat(np.delete(P[:,1], 0), 4) , 0) # lead transform second path
e2 = np.insert( np.insert( e2, 4*n - 1, P[n,1]), 4*n, P[n,1])
# in order to preserve signature, set ...
#e2[0] = 0.0
e3 = np.delete( np.repeat(np.delete(P[:,2], 0), 4) , 0) # lead transform third path
e3 = np.insert( np.insert( e3, 4*n - 1, P[n,2]), 4*n, P[n,2])
# in order to preserve signature, set ...
#e3[0] = 0.0
v = np.column_stack( (e1, e2, e3) ) # lead transformed paths
v1 = np.delete(v, 0, 0) # delete top row
v2 = np.delete(v, 4*n, 0) # delete bottom row
l = v1 - v2 # increments of lead transformed paths
x = Sig(l, 0, 4*n, p) # compute their signature
""" Lag transform paths (new FHL method) """
E1 = np.repeat(np.delete(P[:,0], n), 4) # lag transform first path (corrected way)
E1 = np.insert( E1, 4*n, P[n,0])
E1[4*n]=E1[4*n-1] # incorrect FHL definition
E2 = np.repeat(np.delete(P[:,1], n), 4) # lag transform second path (corrected way)
E2 = np.insert( E2, 4*n, P[n,1])
E2[4*n]=E2[4*n-1] # incorrect FHL definition
E3 = np.repeat(np.delete(P[:,2], n), 4) # lag transform third path (corrected way)
E3 = np.insert( E3, 4*n, P[n,2])
E3[4*n]=E3[4*n-1] # incorrect FHL definition
w = np.column_stack( (E1, E2, E3) ) # lag transformed paths
w1 = np.delete(w, 0, 0) # delete top row
w2 = np.delete(w, 4*n, 0) # delete bottom row
m = w1 - w2 # increments of lag transformed paths
y = Sig(m, 0, 4*n, p) # compute their signature
""" Neither lead nor lag transform preserves signature! """
print S # original signature
print x # lead transformed signature
print y # lag transformed signature
""" Plot the original and lead/lag transformed paths """
t = np.linspace(0.0, 1.0, n+1)
f = np.linspace(0.0, 1.0, (4*n)+1)
j = 1 # choose path (j = 0, ..., d-1)
plt.figure()
plt.plot(t, P[:,j], '-o', color='green', label = 'data stream') # original path
plt.plot(f, v[:,j], '-o', color='red', label = 'lead transform') # lead transformed path
plt.plot(f, w[:,j], '-o', color='blue', label = 'lag transform') # lag transformed path
plt.title('Flint-Hambly-Lyons Lead-Lag Transforms: New Method', fontsize=14)
ax = plt.axes()
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.grid(which=’minor’)
plt.xlabel(’time’)
plt.ylabel(’value’)
pylab.legend(loc = 'upper left')
plt.grid()
plt.show()
3. Area simulations
The code sample displayed below is an extract from a program that, for a set of
constant-drift Wiener and mean-reverting Ornstein-Uhlenbeck processes with user-
specified parameters and correlation matrix, computes first, second and third order
areas of their sample paths as functions of time for a given number of simulation
runs, using the closed-form formulae in terms of second, third and fourth order
iterated integrals, respectively, worked out in Section 2.4.
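As a sanity check on the signature-based formula used by Area1 below, namely one half of the antisymmetrised second order iterated integrals, the first order (Lévy) area of a piecewise-linear path can also be accumulated directly from its increments. The following stand-alone sketch is a hypothetical helper for such a cross-check, not part of the thesis code:

```python
import numpy as np

def levy_area_direct(P):
    """First order (Levy) area of a piecewise-linear 2-D path P of shape
    (n+1, 2), computed as 0.5 * integral of (x - x_0) dy - (y - y_0) dx,
    with the integral evaluated exactly on each linear segment."""
    dP = np.diff(P, axis=0)
    # segment midpoints relative to the starting point of the path
    mx = 0.5*(P[:-1, 0] + P[1:, 0]) - P[0, 0]
    my = 0.5*(P[:-1, 1] + P[1:, 1]) - P[0, 1]
    return 0.5*np.sum(mx*dP[:, 1] - my*dP[:, 0])
```

For the L-shaped path (0,0) -> (1,0) -> (1,1) this gives 0.5, agreeing with the value obtained from the antisymmetrised second order iterated integrals of that path.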
import numpy as np
import sys
""" Set parameters for simulation """
d = 8 # number of paths (i.e. dimensionality of market)
n = 100 # number of increments per path (i.e. number of time steps)
t_ini = 0 # initial time point
t_fin = n # final time point
p = 4 # degree of signature
dt = 1.0/n # length of time step
sqrdt = np.sqrt(dt)
rho = 0.99 # correlation of any pair of paths
M = 1000 # number of simulation runs
np.random.seed(1) # initialize random number generation
""" Set parameters for diffusion processes """
P = np.zeros((d, 3))
""" Each row of matrix P specifies mean reversion speed (set 0.0 for any constant drift Wiener
process), long-term mean or drift, and volatility of an Ornstein-Uhlenbeck or Wiener process """
P[0] = [0.0, 10.0, 5.0]
P[1] = [0.0, 5.0, 10.0]
P[2] = [0.0, -10.0, 5.0]
P[3] = [0.0, -5.0, 10.0]
P[4] = [10.0, 10.0, 5.0]
P[5] = [10.0, 5.0, 10.0]
P[6] = [10.0, -10.0, 5.0]
P[7] = [10.0, -5.0, 10.0]
iv = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] # initial values of the paths
correlation = 'same' # 'same' to use the same correlation (rho) for all pairs of paths
""" Choose pairs of paths for two first order areas """
# indices run from 0 to d-1
# set i1 indices to be less than d/2, i2 indices d/2 or greater; similarly for higher order areas
i1 = [0,1]
i2 = [4,5]
""" Choose triples of paths for two second order areas """
j1 = [0,1,2]
j2 = [4,5,6]
""" Choose quadruples of paths for two third order areas of type (a) """
k1 = [0,1,2,3]
k2 = [4,5,6,7]
""" Choose quadruples of paths for two third order areas of type (b) """
l1 = [0,1,2,3]
l2 = [4,5,6,7]
""" Specify distinct correlations for each pair of paths """
R = np.zeros((d, d))
R[0] = [1.0, 0.5, 0.4, 0.7, 0.3, 0.2, 0.4, 0.6]
R[1] = [0.0, 1.0, 0.3, 0.4, 0.2, 0.5, 0.6, 0.4]
R[2] = [0.0, 0.0, 1.0, 0.6, 0.4, 0.3, 0.5, 0.2]
R[3] = [0.0, 0.0, 0.0, 1.0, 0.4, 0.5, 0.3, 0.7]
R[4] = [0.0, 0.0, 0.0, 0.0, 1.0, 0.3, 0.2, 0.5]
R[5] = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.4, 0.2]
R[6] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5]
R[7] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
R = R + np.transpose(R) - np.eye(d) # correlation matrix
if rho < 1.0 or correlation != 'same':
    if correlation == 'same': # use the same correlation (rho) for all pairs of paths
        R = np.ones((d,d))*rho + np.eye(d)*(1 - rho)
    E = np.linalg.eigvals(R) # compute eigenvalues of correlation matrix
    if np.any(E <= 0):
        sys.exit("Correlation matrix is not positive definite")
    else:
        L = np.linalg.cholesky(R) # lower triangular Cholesky factor of correlation matrix
        U = np.transpose(L) # upper triangular Cholesky factor of correlation matrix
""" Simulation loop """
# For multiple simulation runs (M > 1), store first and higher order areas at the final time point (n)
A1 = [ ] # first order areas for first pair of paths
B1 = [ ] # first order areas for second pair of paths
A2 = [ ] # second order areas for first triple of paths
B2 = [ ] # second order areas for second triple of paths
A3a = [ ] # third order areas of type (a) for first quadruple of paths
B3a = [ ] # third order areas of type (a) for second quadruple of paths
A3b = [ ] # third order areas of type (b) for first quadruple of paths
B3b = [ ] # third order areas of type (b) for second quadruple of paths
# For a single simulation run (M = 1), store first and higher order areas at each time point (0, 1, ..., n)
a1 = [ ] # first order areas for first pair of paths
b1 = [ ] # first order areas for second pair of paths
a2 = [ ] # second order areas for first triple of paths
b2 = [ ] # second order areas for second triple of paths
a3a = [ ] # third order areas of type (a) for first quadruple of paths
b3a = [ ] # third order areas of type (a) for second quadruple of paths
a3b = [ ] # third order areas of type (b) for first quadruple of paths
b3b = [ ] # third order areas of type (b) for second quadruple of paths
for m in range(M):
    """ Generate independent standard normal path increments """
    dW = np.random.randn(n,d)
    if rho == 1.0 and correlation == 'same':
        for k in range(d):
            dW[:,k] = dW[:,0]
    """ Correlate path increments """
    if rho == 1.0 and correlation == 'same':
        dY = dW
    else:
        dY = np.dot(dW, U)
    """ Create correlated Brownian paths """
    Z = np.zeros(d)
    Z = np.vstack( (Z, dY*sqrdt) )
    Z = np.cumsum(Z, axis=0)
    """ Create path increments for diffusion processes """
    for k in range(d):
        if P[k, 0] == 0.0: # create increments for a Wiener process with drift
            dY[:,k] = dY[:,k]*P[k,2]*sqrdt + P[k,1]*dt
        else: # create increments for an Ornstein-Uhlenbeck process
            X = np.zeros(n+1)
            dX = np.zeros(n)
            X[0] = iv[k]
            for l in range(n): # l = 0,1,..., n-1
                dX[l] = P[k,0]*(P[k,1] - X[l])*dt + P[k,2]*sqrdt*dY[l,k]
                X[l+1] = X[l] + dX[l]
            dY[:,k] = dX
    """ Create paths for diffusion processes """
    Y_0 = iv # initial values of paths
    Y = np.vstack( (Y_0, dY) )
    X = np.cumsum(Y, axis=0) # entire paths
    """ Compute sequence of time-indexed signatures of multi-dimensional path increments """
    S1 = Sig(dY[:,:(d/2)], t_ini, t_fin, p)
    S2 = Sig(dY[:,(d/2):], t_ini, t_fin, p)
    s1 = S1[t_fin - t_ini] # signatures over the whole time interval
    s2 = S2[t_fin - t_ini]
    x1 = s1[2] # second order iterated integrals
    x2 = s2[2]
    if p > 2:
        y1 = s1[3] # third order iterated integrals
        y2 = s2[3]
    if p > 3:
        z1 = s1[4] # fourth order iterated integrals
        z2 = s2[4]
    """ Function to compute first order area """
    def Area1(i1, i2, x):
        # i1 is index of first path
        # i2 is index of second path
        # x is array of second order iterated integrals
        A1 = 0.5*(x[i1,i2] - x[i2,i1])
        return A1
    """ Function to compute second order area """
    def Area2(j1, j2, j3, y):
        # j1 is path to be integrated with area
        # j2 is first path for area
        # j3 is second path for area
        # y is array of third order iterated integrals
        # second order area A(j1,(j2,j3)) is computed
        A2 = 0.25*(y[j1,j2,j3] - y[j1,j3,j2] + y[j2,j1,j3] - y[j2,j3,j1] - y[j3,j1,j2] + y[j3,j2,j1])
        return A2
    """ Functions to compute third order areas (two types) """
    def Area3a(k1, k2, k3, k4, z):
        # k1 is path to be integrated with second order area
        # k2 is path to be integrated with area
        # k3 is first path for area
        # k4 is second path for area
        # z is array of fourth order iterated integrals
        # third order area A(k1,(k2,(k3,k4))) is computed
        A3a = 0.125*(z[k1,k2,k3,k4] + z[k2,k1,k3,k4] + z[k2,k3,k1,k4]
                     + z[k1,k3,k2,k4] + z[k3,k1,k2,k4] + z[k3,k2,k1,k4]
                     + z[k1,k4,k3,k2] + z[k4,k1,k3,k2] + z[k4,k3,k1,k2]
                     - z[k1,k2,k4,k3] - z[k2,k1,k4,k3] - z[k2,k4,k1,k3]
                     - z[k1,k3,k4,k2] - z[k3,k1,k4,k2] - z[k3,k4,k1,k2]
                     - z[k1,k4,k2,k3] - z[k4,k1,k2,k3] - z[k4,k2,k1,k3]
                     - z[k2,k3,k4,k1] - z[k3,k2,k4,k1] - z[k4,k3,k2,k1]
                     + z[k2,k4,k3,k1] + z[k3,k4,k2,k1] + z[k4,k2,k3,k1])
        return A3a
    def Area3b(l1, l2, l3, l4, z):
        # l1 is first path for first area
        # l2 is second path for first area
        # l3 is first path for second area
        # l4 is second path for second area
        # z is array of fourth order iterated integrals
        # third order area A((l1,l2),(l3,l4)) is computed
        A3b = 0.125*(z[l1,l2,l3,l4] + z[l1,l3,l2,l4] + z[l3,l1,l2,l4]
                     - z[l1,l2,l4,l3] - z[l1,l4,l2,l3] - z[l4,l1,l2,l3]
                     - z[l2,l1,l3,l4] - z[l2,l3,l1,l4] - z[l3,l2,l1,l4]
                     + z[l2,l1,l4,l3] + z[l2,l4,l1,l3] + z[l4,l2,l1,l3]
                     + z[l4,l3,l1,l2] + z[l4,l1,l3,l2] + z[l1,l4,l3,l2]
                     - z[l4,l3,l2,l1] - z[l4,l2,l3,l1] - z[l2,l4,l3,l1]
                     - z[l3,l4,l1,l2] - z[l3,l1,l4,l2] - z[l1,l3,l4,l2]
                     + z[l3,l4,l2,l1] + z[l3,l2,l4,l1] + z[l2,l3,l4,l1])
        return A3b
    A1.append(Area1(i1[0], i1[1], x1))
    B1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))
    if p > 2:
        A2.append(Area2(j1[0],j1[1],j1[2],y1))
        B2.append(Area2(j2[0]-(d/2),j2[1]-(d/2),j2[2]-(d/2),y2))
    if p > 3:
        A3a.append(Area3a(k1[0],k1[1],k1[2],k1[3],z1))
        B3a.append(Area3a(k2[0]-(d/2),k2[1]-(d/2),k2[2]-(d/2),k2[3]-(d/2),z2))
        A3b.append(Area3b(l1[0],l1[1],l1[2],l1[3],z1))
        B3b.append(Area3b(l2[0]-(d/2),l2[1]-(d/2),l2[2]-(d/2),l2[3]-(d/2),z2))
    if M == 1:
        for t in range(t_fin - t_ini + 1): # iterate over all time points (t = 0, 1, ..., n)
            s1 = S1[t] # signatures over time interval from 0 to time point t
            s2 = S2[t]
            x1 = s1[2] # second order iterated integrals
            x2 = s2[2]
            if p > 2:
                y1 = s1[3] # third order iterated integrals
                y2 = s2[3]
            if p > 3:
                z1 = s1[4] # fourth order iterated integrals
                z2 = s2[4]
            a1.append(Area1(i1[0], i1[1], x1))
            b1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))
            if p > 2:
                a2.append(Area2(j1[0],j1[1],j1[2],y1))
                b2.append(Area2(j2[0]-(d/2),j2[1]-(d/2),j2[2]-(d/2),y2))
            if p > 3:
                a3a.append(Area3a(k1[0],k1[1],k1[2],k1[3],z1))
                b3a.append(Area3a(k2[0]-(d/2),k2[1]-(d/2),k2[2]-(d/2),k2[3]-(d/2),z2))
                a3b.append(Area3b(l1[0],l1[1],l1[2],l1[3],z1))
                b3b.append(Area3b(l2[0]-(d/2),l2[1]-(d/2),l2[2]-(d/2),l2[3]-(d/2),z2))
# end of simulation loop
4. Path classification
Our final code sample, shown below, is an extract from the source code of a program
that implements our novel method of classifying paths as either upward/downward
trending or mean-reverting. The method is based on computing the third order area
of a given market path together with three test paths that are realisations of
diffusion processes with the same mean reversion speed, all driven by the same
Brownian path.
For each value of mean reversion speed assigned to the test processes, the program
computes the minimum Euclidean norm of the third order area of the given market
path and three test paths over all the finitely many Brownian paths that can drive
the processes, and identifies the Brownian path that attains the minimum. The
functions Sig and Area3b are as defined in the preceding listings.
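The Euler stepping of the market and test paths inside the loops below could be factored into a small helper. The following sketch is a hypothetical refactoring, not code from the thesis; it mirrors the branching used in the listings, where a zero mean reversion speed denotes a Wiener process with constant drift:

```python
import numpy as np

def diffusion_path(theta, mu, sigma, x0, dW, dt):
    """Euler scheme mirroring the listings: theta == 0 gives a Wiener process
    with constant drift mu, otherwise an Ornstein-Uhlenbeck process reverting
    to the long-term mean mu at speed theta. dW holds the Brownian increments
    (already scaled by sqrt(dt))."""
    X = np.empty(len(dW) + 1)
    X[0] = x0
    for m in range(len(dW)):
        drift = mu*dt if theta == 0.0 else theta*(mu - X[m])*dt
        X[m+1] = X[m] + drift + sigma*dW[m]
    return X
```

With sigma set to zero the scheme reduces to the deterministic skeleton of the process: a straight line of slope mu in the Wiener case, and (approximate) exponential relaxation towards mu in the Ornstein-Uhlenbeck case.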
import numpy as np
""" Set parameters for simulation """
n = 100 # number of increments per path (i.e. number of time steps)
t_ini = 0 # initial time point
t_fin = n # final time point
p = 4 # degree of signature
dt = 1.0/n # length of time step
sqrdt = np.sqrt(dt)
""" Generate Brownian paths driving market and test paths """
N = 1000 # number of Brownian paths
np.random.seed(4) # initialize random number generation
dV = np.random.randn(N, n) # standard normal increments
dW = np.transpose(dV) # for given n and random seed, dW[:,0] are increments of the first path for any N
W = np.zeros(N)
W = np.vstack( (W, sqrdt*dW) )
W = np.cumsum(W, axis=0) # Brownian paths
""" Set up matrices of increments and paths for computing third order areas """
dX = np.zeros((4,n,N)) # matrix of increments of four Wiener/OU paths for all underlying Brownian paths
X = np.zeros((4,n+1,N)) # matrix of four Wiener/OU paths for all underlying Brownian paths
a3b = np.zeros(n+1) # time-indexed sequence of third order areas for market and test paths
en = np.zeros(N) # Euclidean norm of third order area of market and test paths for each Brownian path
M = np.zeros((10,3)) # for each value of mrs, minimum norm of third order area and Brownian path that attains it
""" Initialize market and test paths """
ini_val = [0.0, 0.0, 0.0, 0.0] # initial values of market and test paths
for i in range(4):
X[i,0,:] = ini_val[i]*np.ones(N)
""" Set parameters for market path """
theta = 5.1 # mean reversion speed 'mrs' (theta = 0.0 for a Wiener process with constant drift)
mu = 7.5 # drift/long-term mean of Wiener/Ornstein-Uhlenbeck process
sigma = 10.0 # volatility
P = np.zeros((4, 3)) # matrix of parameters for market and test paths
P[0] = [theta, mu, sigma] # market path parameters
""" Generate market path """
if P[0,0] == 0.0: # create increments for a Wiener process with constant drift
    for m in range(n): # m = 0,1,..., n-1
        dX[0,m,0] = P[0,1]*dt + P[0,2]*sqrdt*dW[m,0] # the first Brownian path drives the market path
        X[0,m+1,0] = X[0,m,0] + dX[0,m,0]
else: # create increments for an Ornstein-Uhlenbeck process
    for m in range(n):
        dX[0,m,0] = P[0,0]*(P[0,1] - X[0,m,0])*dt + P[0,2]*sqrdt*dW[m,0]
        X[0,m+1,0] = X[0,m,0] + dX[0,m,0]
""" Set parameters for test paths """
P[1] = [1.0, 2.0*mu, 0.5*sigma] # first column entries to be multiplied by 'mrs' in the loop below
P[2] = [1.0, 0.0, 2.0*sigma]
P[3] = [1.0, (-1.0)*mu, 1.5*sigma]
""" Iterate over values of mean reversion speed (mrs) for test paths """
for mrs in range(10):
    """ Generate test paths for all underlying Brownian paths """
    for i in range(1, 4, 1): # i = 1, 2 and 3
        if mrs == 0.0: # create increments for a Wiener process with constant drift
            for m in range(n):
                dX[i,m,:] = P[i,1]*dt + P[i,2]*sqrdt*dW[m,:]
                X[i,m+1,:] = X[i,m,:] + dX[i,m,:]
        else: # create increments for an Ornstein-Uhlenbeck process
            for m in range(n):
                dX[i,m,:] = mrs*P[i,0]*(P[i,1] - X[i,m,:])*dt + P[i,2]*sqrdt*dW[m,:]
                X[i,m+1,:] = X[i,m,:] + dX[i,m,:]
    """ Compute signature of market and test paths for each underlying Brownian path """
    for j in range(N):
        I = np.column_stack( ( dX[0,:,0], dX[1,:,j], dX[2,:,j], dX[3,:,j] ) ) # path increment matrix
        S = Sig( I, t_ini, t_fin, p)
        for t in range(t_fin - t_ini + 1):
            z = S[t][4] # fourth order iterated integrals over interval from 0 to time point t
            a3b[t] = Area3b(0,1,2,3,z) # third order area of market and test paths
        en[j] = np.sqrt( sum(a3b**2) ) # Euclidean norm of third order area over the whole time interval
    M[mrs] = [ mrs, np.amin(en), np.argmin(en)+1 ]
print M