TRANSCRIPT
Introduction to Algorithmic Trading Strategies, Lecture 5
Pairs Trading by Stochastic Spread Methods
Haksun Li, [email protected]
www.numericalmethod.com
Outline
- First passage time
- Kalman filter
- Maximum likelihood estimate
- EM algorithm
References
While most papers on the basic cointegration methods emphasize the construction of a synthetic mean-reverting asset, the stochastic spread methods focus on the dynamics of the price of the synthetic asset.
Most referenced academic paper: Elliott, van der Hoek, and Malcolm (2005), "Pairs Trading": models the spread process as a state-space version of an Ornstein-Uhlenbeck process.
Jonathan Chiu, Daniel Wijaya Lukman, Kourosh Modarresi, Avinayan Senthi Velayutham (2011), "High-frequency Trading", Stanford University.
The idea has also appeared in a number of popular pairs trading books:
- Ehrman (2005), "The Handbook of Pairs Trading": technical analysis and charting for the spread.
- Vidyamurthy (2004), "Pairs Trading: Quantitative Methods and Analysis": ARMA models, HMM, some non-parametric approaches, and a Kalman filter model.
Spread as a Mean-Reverting Process
$x_{k+1} - x_k = (a - b x_k)\tau + \sigma\sqrt{\tau}\,\varepsilon_{k+1}$, $b > 0$
The long-term mean $= a/b$.
The rate of mean reversion $= b$.
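The discrete dynamics above can be simulated directly. The following minimal Python sketch (parameter values are illustrative, not taken from the lecture) generates a path and lets one check that it hovers around the long-term mean $a/b$:

```python
import random

def simulate_spread(a, b, sigma, tau, x0, n, seed=42):
    """Simulate the discretized mean-reverting spread
    x_{k+1} = x_k + (a - b*x_k)*tau + sigma*sqrt(tau)*eps_{k+1},
    with eps_{k+1} i.i.d. standard normal."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x.append(x[-1] + (a - b * x[-1]) * tau + sigma * (tau ** 0.5) * eps)
    return x

# Illustrative parameters: the path should hover around a/b = 2.0.
path = simulate_spread(a=1.0, b=0.5, sigma=0.1, tau=0.01, x0=0.0, n=50000)
```

Discarding an initial burn-in (the start value $x_0$ is far from the mean), the time average of the path approximates $a/b$.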
Sum of Power Series
We note that $\sum_{i=0}^{n-1} r^i = \frac{1 - r^n}{1 - r}$ for $r \neq 1$; for $|r| < 1$, $\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}$.
Unconditional Mean
$\mathrm{E}[x_{k+1}] = a\tau + (1 - b\tau)\,\mathrm{E}[x_k]$
Iterating from $x_0$: $\mathrm{E}[x_n] = (1 - b\tau)^n x_0 + \frac{a}{b}\left(1 - (1 - b\tau)^n\right)$
Long Term Mean
Since $|1 - b\tau| < 1$, $(1 - b\tau)^n \to 0$, so $\mathrm{E}[x_n] \to a/b$ as $n \to \infty$.
Unconditional Variance
$\mathrm{Var}[x_{k+1}] = (1 - b\tau)^2\,\mathrm{Var}[x_k] + \sigma^2\tau$
Starting from $\mathrm{Var}[x_0] = 0$: $\mathrm{Var}[x_n] = \sigma^2\tau\,\frac{1 - (1 - b\tau)^{2n}}{1 - (1 - b\tau)^2}$
Long Term Variance
$\mathrm{Var}[x_n] \to \frac{\sigma^2\tau}{1 - (1 - b\tau)^2} = \frac{\sigma^2}{b(2 - b\tau)} \approx \frac{\sigma^2}{2b}$ for small $\tau$.
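The long-term formulas can be sanity-checked by iterating the one-step mean and variance recursions until they converge. This short Python sketch (illustrative parameters, not from the lecture) does exactly that:

```python
def long_term_moments(a, b, sigma, tau, n=100000):
    """Iterate the one-step recursions
      E[x_{k+1}]   = a*tau + (1 - b*tau) * E[x_k]
      Var[x_{k+1}] = (1 - b*tau)^2 * Var[x_k] + sigma^2 * tau
    starting from x_0 = 0, and return the values after n steps."""
    m, v = 0.0, 0.0
    for _ in range(n):
        m = a * tau + (1.0 - b * tau) * m
        v = (1.0 - b * tau) ** 2 * v + sigma ** 2 * tau
    return m, v

a, b, sigma, tau = 1.0, 0.5, 0.2, 0.01
m, v = long_term_moments(a, b, sigma, tau)
# m should approach a/b; v should approach sigma^2*tau / (1 - (1 - b*tau)^2)
```

After enough iterations both quantities agree with the closed-form limits to machine precision.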
Observations and Hidden State Process
The hidden state process is:
$x_{k+1} = x_k + (a - b x_k)\tau + \sigma\sqrt{\tau}\,\varepsilon_{k+1}$, $\varepsilon_k \sim N(0, 1)$ i.i.d., $0 < b\tau < 1$
The observations:
$y_k = x_k + D\,\omega_k$, $\omega_k \sim N(0, 1)$ i.i.d.
We want to compute the expected state from observations: $\hat{x}_{k|k} = \mathrm{E}\left[x_k \mid y_1, \dots, y_k\right]$
First Passage Time
Standardized Ornstein-Uhlenbeck process: $Z(t) = \left(x(t) - \frac{a}{b}\right)\frac{\sqrt{2b}}{\sigma}$
First passage time: $T_{a,0} = \inf\left\{t \geq 0 : Z(t) = 0 \mid Z(0) = a\right\}$
The pdf of $T_{a,0}$ has a maximum value at
$\hat{t} = \frac{1}{2}\ln\left[1 + \frac{1}{2}\left(\sqrt{\left(a^2 - 3\right)^2 + 4a^2} + a^2 - 3\right)\right]$
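As an illustration, the following Python sketch evaluates the mode formula and checks numerically that it sits at the peak of the closed-form first-passage-time density of a standardized OU process started at $Z(0) = a$; the density expression is the standard one from the general OU first-passage literature, stated here as an assumption rather than quoted from the lecture:

```python
import math

def fpt_density(t, a):
    """pdf of the first passage time to 0 of a standardized OU process
    started at Z(0) = a (standard closed form; an assumption here)."""
    s = 1.0 - math.exp(-2.0 * t)
    return (abs(a) * math.sqrt(2.0 / math.pi) * math.exp(-t) / s ** 1.5
            * math.exp(-a * a * math.exp(-2.0 * t) / (2.0 * s)))

def fpt_mode(a):
    """Location of the maximum of the first-passage-time pdf."""
    r = math.sqrt((a * a - 3.0) ** 2 + 4.0 * a * a)
    return 0.5 * math.log(1.0 + 0.5 * (r + a * a - 3.0))

a = 1.5
t_hat = fpt_mode(a)   # a positive time; the density peaks here
```

Setting the derivative of the log-density to zero reduces to the quadratic $2u^2 + (a^2 - 1)u - 1 = 0$ in $u = e^{-2t}$, whose admissible root gives exactly the $\hat{t}$ formula above.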
A Sample Trading Strategy
Buy when $Z(t) = -a$; unwind after time $\hat{t}$.
Sell when $Z(t) = a$; unwind after time $\hat{t}$.
Kalman Filter The Kalman filter is an efficient recursive filter that estimates the state of a dynamic system from a series of incomplete and noisy measurements.
Conceptual Diagram
[Diagram: prediction at time t; update at time t+1 as new measurements come in; correct for a better estimation.]
A Linear Discrete System
$x_t = F_t x_{t-1} + B_t u_t + w_t$
$F_t$: the state transition model applied to the previous state $x_{t-1}$
$B_t$: the control-input model applied to the control vector $u_t$
$w_t \sim N(0, Q_t)$: the process noise, drawn from a multivariate Normal distribution
Observations and Noises
$y_t = H_t x_t + v_t$
$H_t$: the observation model mapping the true states to observations
$v_t \sim N(0, R_t)$: the observation noise
Discrete System Diagram
[Diagram omitted from transcript.]
Prediction
Predicted (a priori) state estimate: $\hat{x}_{t+1|t} = F_{t+1}\hat{x}_{t|t} + B_{t+1}u_{t+1}$
Predicted (a priori) estimate covariance: $P_{t+1|t} = F_{t+1}P_{t|t}F_{t+1}^{\top} + Q_{t+1}$
Update
Measurement residual: $\tilde{y}_{t+1} = y_{t+1} - H_{t+1}\hat{x}_{t+1|t}$
Residual covariance: $S_{t+1} = H_{t+1}P_{t+1|t}H_{t+1}^{\top} + R_{t+1}$
Optimal Kalman gain: $K_{t+1} = P_{t+1|t}H_{t+1}^{\top}S_{t+1}^{-1}$
Updated (a posteriori) state estimate: $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\tilde{y}_{t+1}$
Updated (a posteriori) estimate covariance: $P_{t+1|t+1} = \left(I - K_{t+1}H_{t+1}\right)P_{t+1|t}$
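The predict/update equations can be collected into a few lines of code. The sketch below is a scalar (one-dimensional) version, which is all the spread model needs; it omits the control-input term $B u$, and the constant-state usage example at the bottom is purely illustrative:

```python
def kalman_step(x, P, y, F, Q, H, R):
    """One scalar Kalman filter predict/update cycle.

    Predict:  x_pred = F*x,  P_pred = F*P*F + Q
    Update:   K = P_pred*H / (H*P_pred*H + R)
              x_new = x_pred + K*(y - H*x_pred)
              P_new = (1 - K*H) * P_pred
    """
    x_pred = F * x
    P_pred = F * P * F + Q
    S = H * P_pred * H + R      # residual covariance
    K = P_pred * H / S          # optimal Kalman gain
    x_new = x_pred + K * (y - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Illustrative use: filter noisy observations of a constant state
# (F=1, Q=0, H=1, R=1) starting from a diffuse prior.
x, P = 0.0, 1e6
for y in [1.2, 0.8, 1.1, 0.9, 1.05, 0.95]:
    x, P = kalman_step(x, P, y, F=1.0, Q=0.0, H=1.0, R=1.0)
```

With $F = 1$ and $Q = 0$ the filter reduces to a running average of the observations, and the posterior variance $P$ shrinks roughly like $R/n$.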
Computing the 'Best' State Estimate
Given $F$, $Q$, $H$, $R$, we define the conditional variance
$\Sigma_{t|s} \equiv \mathrm{E}\left[\left(x_t - \hat{x}_{t|s}\right)^2\right]$
Start with $\hat{x}_{0|0} = \mathrm{E}\left[x_0\right]$, $\Sigma_{0|0} = \mathrm{Var}\left[x_0\right]$.
Predicted (a Priori) State Estimation
$\hat{x}_{t+1|t} = \mathrm{E}\left[x_{t+1} \mid y_1, \dots, y_t\right] = F\,\hat{x}_{t|t}$
Predicted (a Priori) Variance
$\Sigma_{t+1|t} = \mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t}\right)^2\right]$
$= \mathrm{E}\left[\left(F\left(x_t - \hat{x}_{t|t}\right) + w_{t+1}\right)^2\right]$
$= F^2\,\Sigma_{t|t} + Q$
Minimize Posteriori Variance
Let the Kalman updating formula be
$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + k\left(y_{t+1} - H\hat{x}_{t+1|t}\right)$
We want to solve for $k$ such that the conditional variance is minimized:
$\Sigma_{t+1|t+1} = \mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t+1}\right)^2\right]$
Solve for K
$\mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t+1}\right)^2\right]$
$= \mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t} - k\left(y_{t+1} - H\hat{x}_{t+1|t}\right)\right)^2\right]$
$= \mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t} - k\left(Hx_{t+1} + v_{t+1} - H\hat{x}_{t+1|t}\right)\right)^2\right]$
$= \mathrm{E}\left[\left((1 - kH)\left(x_{t+1} - \hat{x}_{t+1|t}\right) - k v_{t+1}\right)^2\right]$
$= (1 - kH)^2\,\mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t}\right)^2\right] + k^2 R$
$= (1 - kH)^2\,\Sigma_{t+1|t} + k^2 R$
First Order Condition for k
$\frac{\mathrm{d}}{\mathrm{d}k}\left[(1 - kH)^2\,\Sigma_{t+1|t} + k^2 R\right] = 0$
$-2H(1 - kH)\,\Sigma_{t+1|t} + 2kR = 0$
$k\left(H^2\,\Sigma_{t+1|t} + R\right) = H\,\Sigma_{t+1|t}$
Optimal Kalman Filter
$k = \frac{H\,\Sigma_{t+1|t}}{H^2\,\Sigma_{t+1|t} + R}$
Updated (a Posteriori) State Estimation
So, we have the "optimal" Kalman updating rule:
$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + k\left(y_{t+1} - H\hat{x}_{t+1|t}\right)$
$= \hat{x}_{t+1|t} + \frac{H\,\Sigma_{t+1|t}}{H^2\,\Sigma_{t+1|t} + R}\left(y_{t+1} - H\hat{x}_{t+1|t}\right)$
Updated (a Posteriori) Variance
$\Sigma_{t+1|t+1} = \mathrm{E}\left[\left(x_{t+1} - \hat{x}_{t+1|t+1}\right)^2\right]$
$= (1 - kH)^2\,\Sigma_{t+1|t} + k^2 R$
$= \Sigma_{t+1|t} - 2kH\,\Sigma_{t+1|t} + k^2\left(H^2\,\Sigma_{t+1|t} + R\right)$
Substituting $k\left(H^2\,\Sigma_{t+1|t} + R\right) = H\,\Sigma_{t+1|t}$:
$= \Sigma_{t+1|t} - 2kH\,\Sigma_{t+1|t} + kH\,\Sigma_{t+1|t}$
$= (1 - kH)\,\Sigma_{t+1|t}$
Parameter Estimation
We need to estimate the parameters from the observable data before we can use the Kalman filter model.
We need to write down the likelihood function in terms of the parameters $\theta$, and then maximize it w.r.t. $\theta$.
Likelihood Function A likelihood function (often simply the likelihood) is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values.
Maximum Likelihood Estimate
We find $\hat{\theta}$ such that the likelihood $L(\theta)$ is maximized given the observations.
Example Using the Normal Distribution
We want to estimate the mean $\mu$ of a sample of size $n$ drawn from a Normal distribution.
$L(\mu) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(x_i - \mu\right)^2}{2\sigma^2}\right)$
Log-Likelihood
Maximizing the log-likelihood is equivalent to maximizing the following (dropping terms that do not depend on $\mu$):
$-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2$
First order condition w.r.t. $\mu$: $\sum_{i=1}^{n}\left(x_i - \mu\right) = 0$, hence $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i$.
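A quick numerical check of this example: the sketch below (sample values are made up for illustration) evaluates the Gaussian log-likelihood and confirms that the sample mean maximizes it:

```python
import math

def normal_log_likelihood(mu, sample, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data."""
    n = len(sample)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in sample) / (2.0 * sigma ** 2))

sample = [0.9, 1.3, 0.7, 1.1]
mle = sum(sample) / len(sample)   # the first-order condition gives the sample mean
```

Perturbing `mle` in either direction can only lower the log-likelihood, consistent with the first-order condition.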
Nelder-Mead
After we write down the likelihood function for the Kalman model in terms of $\theta$, we can run any multivariate optimization algorithm, e.g., Nelder-Mead, to search for $\hat{\theta} = \operatorname*{argmax}_{\theta} L(\theta)$.
The disadvantage is that it may not converge well, hence not landing close to the optimal solution.
Marginal Likelihood
For the set of hidden states $X$, we write the marginal likelihood
$L(\theta; Y) = p(Y \mid \theta) = \sum_{X} p(Y, X \mid \theta)$
Assuming we know the conditional distribution of $X$, we could instead maximize the following:
$\max_{\theta}\,\mathrm{E}_X\left[L\left(\theta; Y, X\right)\right]$, or
$\max_{\theta}\,\mathrm{E}_X\left[\log L\left(\theta; Y, X\right)\right]$
The expectation is a weighted sum of the (log-)likelihoods, weighted by the probability of the hidden states.
The Q-Function
Where do we get the conditional distribution of $X$ from?
Suppose we somehow have an (initial) estimate of the parameters, $\theta_0$. Then the model has no unknowns, and we can compute the distribution of $X$:
$p\left(X \mid Y, \theta_0\right)$
EM Intuition
Suppose we know $\theta$: we know the model completely, so we can find the hidden states $X$.
Suppose we know $X$: we can estimate $\theta$ by, e.g., maximum likelihood.
What do we do if we know neither $\theta$ nor $X$?
Expectation-Maximization Algorithm
Expectation step (E-step): compute the expected value of the log-likelihood function w.r.t. the conditional distribution of $X$ given $Y$ under $\theta_k$:
$Q\left(\theta \mid \theta_k\right) = \mathrm{E}_{X \mid Y, \theta_k}\left[\log L\left(\theta; Y, X\right)\right]$
Maximization step (M-step): find the parameters $\theta_{k+1}$ that maximize the Q-value:
$\theta_{k+1} = \operatorname*{argmax}_{\theta} Q\left(\theta \mid \theta_k\right)$
EM Algorithms for Kalman Filter
- Offline: Shumway and Stoffer, smoother approach, 1982
- Online: Elliott and Krishnamurthy, filter approach, 1999
A Trading Algorithm
From $y_1, y_2, \dots, y_t$, we estimate $\hat{\theta}_t$. Decide whether to make a trade at $t$ and unwind at $t+1$, or some time later, e.g., $t + \hat{t}$.
As $y_{t+1}$ arrives, estimate $\hat{\theta}_{t+1}$. Repeat.
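A minimal signal-generation sketch of this loop, assuming the parameter estimates $(a, b, \sigma)$ are already available at each step (the re-estimation itself is outside this sketch) and using an illustrative entry threshold of one long-term standard deviation:

```python
def trading_signals(spread, a, b, sigma, entry=1.0):
    """Generate entry signals from an observed spread series, assuming
    the OU parameters (a, b, sigma) have already been estimated.
    The spread is standardized by its long-term mean a/b and its
    long-term standard deviation sigma/sqrt(2b)."""
    mean = a / b
    sd = sigma / (2.0 * b) ** 0.5
    signals = []
    for x in spread:
        z = (x - mean) / sd
        if z <= -entry:
            signals.append("buy")    # spread unusually low: buy the spread
        elif z >= entry:
            signals.append("sell")   # spread unusually high: sell the spread
        else:
            signals.append("hold")
    return signals
```

The `entry` threshold and the decision rule are illustrative assumptions; the lecture's own rule enters at $Z(t) = \pm a$ and unwinds after the first-passage-time mode $\hat{t}$.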
Results (1)
[Chart omitted from transcript.]
Results (2)
[Chart omitted from transcript.]
Results (3)
[Chart omitted from transcript.]