Stochastic models of biochemical systems
David F. Anderson∗
Department of Mathematics
University of Wisconsin - Madison
University of Amsterdam
November 14th, 2012
Stochastic models of biochemical systems
Goal: give a broad introduction to stochastic models of biochemical systems, with minimal technical details.
Outline
1. Construct a useful representation for the most common continuous time Markov chain model for population processes.
2. Discuss some computational methods – sensitivity analysis.
3. Discuss various approximate models for these CTMCs.
Example: ODE Lotka-Volterra predator-prey model
Think of A as a prey and B as a predator:

$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{ prey at } t, \, \#\text{ predators at } t]^T$. Then

$$\dot{x}(t) = \kappa_1 x_1(t)\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2 x_1(t)x_2(t)\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3 x_2(t)\begin{bmatrix}0\\-1\end{bmatrix},$$

or

$$x(t) = x(0) + \kappa_1\int_0^t x_1(s)\,ds\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2\int_0^t x_1(s)x_2(s)\,ds\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3\int_0^t x_2(s)\,ds\begin{bmatrix}0\\-1\end{bmatrix}.$$
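For orientation, here is a minimal numerical sketch of this deterministic model (assuming SciPy is available; the initial condition of 1000 prey and 1000 predators is a hypothetical choice for illustration):

```python
from scipy.integrate import solve_ivp

k1, k2, k3 = 2.0, 0.002, 2.0

def lotka_volterra(t, x):
    # x[0] = prey (A), x[1] = predator (B)
    return [k1 * x[0] - k2 * x[0] * x[1],
            k2 * x[0] * x[1] - k3 * x[1]]

sol = solve_ivp(lotka_volterra, (0.0, 40.0), [1000.0, 1000.0], max_step=0.01)
print(sol.y[:, -1])  # populations at t = 40
```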
Lotka-Volterra
Think of A as a prey and B as a predator.
$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

[Figure: prey and predator trajectories over $t \in [0, 40]$, oscillating between roughly 700 and 1500 individuals.]
Biological example: transcription-translation
Gene transcription & translation:
$$G \xrightarrow{\kappa_1} G + M \quad \text{(transcription)}$$
$$M \xrightarrow{\kappa_2} M + P \quad \text{(translation)}$$
$$M \xrightarrow{\kappa_3} \emptyset \quad \text{(degradation)}$$
$$P \xrightarrow{\kappa_4} \emptyset \quad \text{(degradation)}$$
$$G + P \underset{\kappa_{-5}}{\overset{\kappa_5}{\rightleftharpoons}} B \quad \text{(binding/unbinding of gene)}$$

Cartoon representation:¹

$$X_1 \underset{N^{\alpha} q_2}{\overset{N^{\alpha} q_1}{\rightleftharpoons}} X_2, \qquad X_1 \xrightarrow{N\cdot\lambda_1} M, \qquad X_2 \xrightarrow{N\cdot\lambda_2} M, \qquad M \xrightarrow{\mu} \emptyset.$$

¹J. Paulsson, Physics of Life Reviews, 2, 2005, 157–175.
Another example: Viral infection

Let
1. T = viral template.
2. G = viral genome.
3. S = viral structure.
4. V = virus.

Reactions:

R1) $T + \text{“stuff”} \xrightarrow{\kappa_1} T + G$, $\kappa_1 = 1$
R2) $G \xrightarrow{\kappa_2} T$, $\kappa_2 = 0.025$
R3) $T + \text{“stuff”} \xrightarrow{\kappa_3} T + S$, $\kappa_3 = 1000$
R4) $T \xrightarrow{\kappa_4} \emptyset$, $\kappa_4 = 0.25$
R5) $S \xrightarrow{\kappa_5} \emptyset$, $\kappa_5 = 2$
R6) $G + S \xrightarrow{\kappa_6} V$, $\kappa_6 = 7.5 \times 10^{-6}$

- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.
Some examples
E. coli Heat Shock Response Model: 9 species, 18 reactions.²

²Hye Won Kang, presentation at SPA in 2007.
Modeling
1. These models (and much more complicated ones) have historically been predominantly modeled using ODEs.
2. However:
   2.1 there are often low numbers of molecules, which makes the timing of reactions more random (less averaging),
   2.2 when a reaction occurs, the system jumps to a new state by a non-trivial amount: 1 ≫ 0.
3. Researchers (mostly) lived with these shortcomings until the late 1990s and early 2000s, when it was shown that ODE models cannot capture important qualitative behavior of certain models:
   - λ-phage lysis-lysogeny decision mechanism (Arkin-McAdams 1998).
   - Green fluorescent protein.

ODEs were often the wrong modeling choice.
Specifying infinitesimal behavior
Q: What is a better modeling choice? It should have
1. a discrete state space, since we are counting molecules, and
2. stochastic dynamics.

Let's return to the development of ODEs.

An ordinary differential equation is specified by describing how a function should vary over a small period of time:

$$X(t + \Delta t) - X(t) \approx F(X(t))\,\Delta t.$$

A more precise description (consider a telescoping sum):

$$X(t) = X(0) + \int_0^t F(X(s))\,ds.$$
Infinitesimal behavior for jump processes
We are interested in functions that are piecewise constant and random. Changes, when they occur, won't be small. If "reaction k" occurs at time t,

$$X(t) - X(t-) = \zeta_k \in \mathbb{Z}^d.$$

What is small? The probability of seeing a jump of a particular size:

$$P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}_t\} \approx \lambda_{\zeta_k}(t)\,\Delta t.$$

Question: Can we specify the $\lambda_{\zeta_k}$ in some way that determines X?

- For the ODE, F depended on X.
- Maybe $\lambda_{\zeta_k}$ should depend on X?
Simple model
For example, consider the simple system
A + B → C
where one molecule each of A and B is being converted to one of C.
Intuition for standard stochastic model:
$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

where
- $\kappa$ is a positive constant, the reaction rate constant.
- $\mathcal{F}_t$ is all the information pertaining to the process up through time t.
Can we specify a reasonable model satisfying this assumption?
Background information: The Poisson process

We will view a Poisson process, Y(·), through the lens of an underlying point process.

(a) Let $\{e_i\}$ be i.i.d. exponential random variables with parameter one.
(b) Now, put points down on a line with spacing equal to the $e_i$:

[Diagram: points on a time line with successive gaps $e_1, e_2, e_3, \dots$]

- Let $Y_1(t)$ denote the number of points hit by time t.
- In the figure above, $Y_1(t) = 6$.

[Figure: a sample path of the counting process $Y_1$ on $[0, 20]$, with $\lambda = 1$.]
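A minimal sketch of this point-process view (the names are illustrative): exponential(1) gaps are accumulated into arrival times, and Y1(t) simply counts how many arrivals fall in [0, t].

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_rate_poisson_points(t_max):
    """Arrival times of a unit-rate Poisson process on [0, t_max]."""
    arrivals, t = [], 0.0
    while True:
        t += rng.exponential(1.0)      # i.i.d. exponential(1) spacings e_i
        if t > t_max:
            return np.array(arrivals)
        arrivals.append(t)

arrivals = unit_rate_poisson_points(20.0)
Y1 = lambda t: int(np.searchsorted(arrivals, t, side="right"))  # points hit by time t
print(Y1(10.0), Y1(20.0))
```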
The Poisson process

Let
- $Y_1$ be a unit-rate Poisson process.
- Define $Y_\lambda(t) \equiv Y_1(\lambda t)$.

Then $Y_\lambda$ is a Poisson process with parameter $\lambda$.

[Diagram: the same point process, with gaps $e_1, e_2, e_3, \dots$]

Intuition: The Poisson process with rate $\lambda$ is simply the number of points hit (of the unit-rate point process) when we run along the time frame at rate $\lambda$.

[Figure: a sample path of $Y_\lambda$ on $[0, 20]$, with $\lambda = 3$.]
The Poisson process
There is no reason $\lambda$ needs to be constant in time, in which case

$$Y_\lambda(t) \equiv Y\left(\int_0^t \lambda(s)\,ds\right)$$

is a non-homogeneous Poisson process with propensity/intensity $\lambda(t) \ge 0$. Thus

$$P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) > 0 \mid \mathcal{F}_t\} = 1 - \exp\left\{-\int_t^{t+\Delta t} \lambda(s)\,ds\right\} \approx \lambda(t)\,\Delta t.$$

Points:
1. We have "changed time" to convert a unit-rate Poisson process to one which has rate or intensity or propensity $\lambda(t)$.
2. We will use similar time changes of unit-rate processes to build the models of interest.
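A minimal sketch of such a time change (the intensity below is a hypothetical choice, and the integral of λ is accumulated with a crude left-endpoint rule):

```python
import numpy as np

rng = np.random.default_rng(1)

lam = lambda t: 2.0 + np.sin(t)        # hypothetical intensity, lambda(t) >= 0

def nonhomogeneous_poisson(t_max, dt=1e-4):
    """Arrival times of Y(int_0^t lambda(s) ds), Y a unit-rate Poisson process."""
    next_point = rng.exponential(1.0)  # next point of the underlying unit-rate process
    integral, t, arrivals = 0.0, 0.0, []
    while t < t_max:
        integral += lam(t) * dt        # accumulate int_0^t lambda(s) ds
        t += dt
        if integral >= next_point:     # the time change has reached the next point
            arrivals.append(t)
            next_point += rng.exponential(1.0)
    return arrivals

print(len(nonhomogeneous_poisson(20.0)))  # on average about int_0^20 lambda ≈ 40.6
```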
Return to models of interest
Consider the simple system

$$A + B \to C,$$

where one molecule each of A and B is being converted to one of C.

Intuition for standard stochastic model:

$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

where
- $\kappa$ is a positive constant, the reaction rate constant.
- $\mathcal{F}_t$ is all the information pertaining to the process up through time t.
Models of interest
A + B → C
Simple book-keeping says: if

$$X(t) = \begin{bmatrix} X_A(t) \\ X_B(t) \\ X_C(t) \end{bmatrix}$$

gives the state at time t, then

$$X(t) = X(0) + R(t)\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix},$$

where
- $R(t)$ is the number of times the reaction has occurred by time t, and
- $X(0)$ is the initial condition.

Goal: represent $R(t)$ in terms of a Poisson process.
Models of interest
Recall that for $A + B \to C$ our intuition was to specify the infinitesimal behavior

$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

and that for a counting process with specified intensity $\lambda(t)$ we have

$$P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) = 1 \mid \mathcal{F}_t\} \approx \lambda(t)\,\Delta t.$$

This suggests we can model

$$R(t) = Y\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right),$$

where Y is a unit-rate Poisson process. Hence

$$\begin{bmatrix} X_A(t) \\ X_B(t) \\ X_C(t) \end{bmatrix} \equiv X(t) = X(0) + \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} Y\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right).$$

This equation uniquely determines X for all t ≥ 0.
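A minimal simulation sketch of this representation: between jumps the intensity is constant, so the next point of the time-changed Poisson process arrives after an exponential time with rate κ·XA·XB. The rate constant and initial counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_AB_to_C(xA, xB, xC, kappa, t_max):
    """Exact path of A + B -> C via the random time change representation."""
    t, path = 0.0, [(0.0, xA, xB, xC)]
    while xA > 0 and xB > 0:
        rate = kappa * xA * xB               # current intensity kappa*XA(t)*XB(t)
        t += rng.exponential(1.0 / rate)     # waiting time to the next reaction
        if t > t_max:
            break
        xA, xB, xC = xA - 1, xB - 1, xC + 1  # state jumps by (-1, -1, 1)
        path.append((t, xA, xB, xC))
    return path

print(simulate_AB_to_C(xA=100, xB=80, xC=0, kappa=0.01, t_max=10.0)[-1])
```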
Build up model: Random time change representation of Kurtz

• Now consider a network of reactions involving d chemical species, $S_1, \dots, S_d$:

$$\sum_{i=1}^d \nu_{ik} S_i \longrightarrow \sum_{i=1}^d \nu'_{ik} S_i.$$

Denote the reaction vector as

$$\zeta_k = \nu'_k - \nu_k,$$

so that if reaction k occurs at time t,

$$X(t) = X(t-) + \zeta_k.$$

• The intensity (or propensity) of the kth reaction is $\lambda_k : \mathbb{Z}^d_{\ge 0} \to \mathbb{R}$.

• By analogy with before:

$$X(t) = X(0) + \sum_k R_k(t)\,\zeta_k,$$

with

$$X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k,$$

where the $Y_k$ are independent, unit-rate Poisson processes.
Mass-action kinetics
The standard intensity function chosen is mass-action kinetics:

$$\lambda_k(x) = \kappa_k \left(\prod_i \nu_{ik}!\right)\binom{x}{\nu_k} = \kappa_k \prod_i \frac{x_i!}{(x_i - \nu_{ik})!}.$$

Example: If $S_1 \to$ anything, then $\lambda_k(x) = \kappa_k x_1$.

Example: If $S_1 + S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_1 x_2$.

Example: If $2S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_2(x_2 - 1)$.
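A minimal sketch of evaluating these intensities, directly implementing the falling-factorial form above (the example reaction and numbers are hypothetical):

```python
def mass_action_intensity(kappa_k, nu_k, x):
    """lambda_k(x) = kappa_k * prod_i x_i! / (x_i - nu_ik)!  (falling factorials)."""
    lam = kappa_k
    for xi, nu in zip(x, nu_k):
        for j in range(nu):        # x_i * (x_i - 1) * ... * (x_i - nu_ik + 1)
            lam *= xi - j
    return max(lam, 0.0)

# 2 S2 -> anything with nu_k = (0, 2): lambda_k(x) = kappa_k * x2 * (x2 - 1)
print(mass_action_intensity(1.5, (0, 2), (7, 4)))  # 1.5 * 4 * 3 = 18.0
```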
Other ways to understand model
The infinitesimal generator of a Markov process determines the process:

$$\begin{aligned}
Af(x) &\stackrel{\mathrm{def}}{=} \lim_{h \to 0} \frac{1}{h}\left[E_x f(X(h)) - f(x)\right] \\
&= \lim_{h \to 0} \frac{1}{h}\left[\sum_k \left(f(x + \zeta_k) - f(x)\right) P(R_k(h) = 1)\right] + O(h) \\
&= \lim_{h \to 0} \frac{1}{h}\left[\sum_k \left(f(x + \zeta_k) - f(x)\right) \lambda_k(x)\,h\right] + O(h) \\
&= \sum_k \lambda_k(x)\left(f(x + \zeta_k) - f(x)\right).
\end{aligned}$$
Other ways to understand model
And we have Dynkin's formula (see Ethier and Kurtz, 1986, Ch. 1):

$$Ef(X(t)) - f(X_0) = E\int_0^t Af(X(s))\,ds.$$

Letting $f(y) = 1_x(y)$ above, so that

$$E[f(X(t))] = P\{X(t) = x\} = p_t(x),$$

gives the Kolmogorov forward equation (chemical master equation):

$$p_t'(x) = \sum_k \lambda_k(x - \zeta_k)\,p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x).$$
Equivalence of formulations

We now have three ways of making the infinitesimal specification

$$P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}^X_t\} \approx \lambda_k(X(t))\,\Delta t$$

precise:

1. The stochastic equation:
$$X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k.$$

2. The process is Markov with infinitesimal generator
$$(Af)(x) = \sum_k \lambda_k(x)\left(f(x + \zeta_k) - f(x)\right).$$

3. The master (forward) equation for the probability distributions:
$$p_t'(x) = \sum_k \lambda_k(x - \zeta_k)\,p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x).$$

Fortunately, if the solution of the stochastic equation doesn't blow up, the three are equivalent.

This model is an example of a continuous time Markov chain.
Example: Lotka-Volterra predator-prey model

Think of A as a prey and B as a predator:

$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{prey}, \#\text{predators}]^T$:

$$x(t) = x(0) + \kappa_1 \int_0^t x_1(s)\,ds \begin{bmatrix}1\\0\end{bmatrix} + \kappa_2 \int_0^t x_1(s)x_2(s)\,ds \begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3 \int_0^t x_2(s)\,ds \begin{bmatrix}0\\-1\end{bmatrix}.$$

Stochastic model. Let $X(t) = [\#\text{prey}, \#\text{predators}]^T$:

$$\begin{aligned}
X(t) = X(0) &+ Y_1\left(\kappa_1 \int_0^t X_1(s)\,ds\right)\begin{bmatrix}1\\0\end{bmatrix} + Y_2\left(\kappa_2 \int_0^t X_1(s)X_2(s)\,ds\right)\begin{bmatrix}-1\\1\end{bmatrix} \\
&+ Y_3\left(\kappa_3 \int_0^t X_2(s)\,ds\right)\begin{bmatrix}0\\-1\end{bmatrix}.
\end{aligned}$$
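A minimal Gillespie-style sketch of this stochastic model (exponential holding times, with the reaction chosen with probability proportional to its intensity; the initial condition is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(3)

kappa = np.array([2.0, 0.002, 2.0])
zeta = np.array([[1, 0], [-1, 1], [0, -1]])   # prey birth, predation, predator death

def intensities(x):
    return kappa * np.array([x[0], x[0] * x[1], x[1]])

def gillespie_lv(x0, t_max):
    t, x = 0.0, np.array(x0)
    while t < t_max:
        lam = intensities(x)
        lam0 = lam.sum()
        if lam0 == 0:                         # absorbed: nothing can fire
            break
        t += rng.exponential(1.0 / lam0)      # "when": exponential with total rate
        k = rng.choice(3, p=lam / lam0)       # "where": proportional to intensity
        x = x + zeta[k]
    return x

print(gillespie_lv([1000, 1000], t_max=40.0))
```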
Another example: Viral infection
The stochastic equations for the viral infection model above, with $X = (X_G, X_S, X_T, X_V)$, are

$$\begin{aligned}
X_1(t) &= X_1(0) + Y_1\left(\int_0^t X_3(s)\,ds\right) - Y_2\left(0.025\int_0^t X_1(s)\,ds\right) - Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right), \\
X_2(t) &= X_2(0) + Y_3\left(1000\int_0^t X_3(s)\,ds\right) - Y_5\left(2\int_0^t X_2(s)\,ds\right) - Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right), \\
X_3(t) &= X_3(0) + Y_2\left(0.025\int_0^t X_1(s)\,ds\right) - Y_4\left(0.25\int_0^t X_3(s)\,ds\right), \\
X_4(t) &= X_4(0) + Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right).
\end{aligned}$$
Computational methods
These are continuous time Markov chains!
Simulation/computation should be easy.
The most common simulation methods include:
1. Gillespie's algorithm – answers "where" and "when" (which reaction, and at what time) independently.
2. The next reaction method of Gibson and Bruck.

Each is an example of discrete event simulation; a simplified sketch of the next-reaction idea follows below.
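Here is a simplified next-reaction-style sketch (in the spirit of these methods, but without Gibson–Bruck's dependency graph and indexed priority queue): each reaction channel carries its own unit-rate internal clock, and the channel whose next firing time is smallest fires.

```python
import numpy as np

rng = np.random.default_rng(4)

def next_reaction(x0, zeta, intensities, t_max):
    """Simplified next-reaction simulation: one unit-rate clock per channel."""
    x = np.array(x0, dtype=float)
    K = len(zeta)
    T = np.zeros(K)                    # internal time consumed by each channel
    P = rng.exponential(1.0, size=K)   # internal time of each channel's next point
    t = 0.0
    while True:
        lam = np.asarray(intensities(x), dtype=float)
        dt = np.full(K, np.inf)
        active = lam > 0
        dt[active] = (P[active] - T[active]) / lam[active]  # time to each firing
        k = int(np.argmin(dt))
        if t + dt[k] > t_max:
            break
        t += dt[k]
        T += lam * dt[k]               # every internal clock advances
        x += zeta[k]
        P[k] += rng.exponential(1.0)   # draw channel k's next unit-rate point
    return x
```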
Numerical methods

Each exact method produces sample paths that can approximate values such as (which I will talk about tomorrow at CWI)

$$Ef(X(t)) \approx \frac{1}{n}\sum_{i=1}^n f(X_{[i]}(t)).$$

For example:
1. Means – expected virus yield.
2. Variances.
3. Probabilities.

Or sensitivities:

$$\frac{d}{d\theta} E f(\theta, X^\theta(t)).$$

Problem: solving using these algorithms can be computationally expensive:
1. Each path may require a significant number of computational steps.
2. We may require a significant number of paths.

Solution: we need to use novel stochastic representations to get good methods.
Specific computational problem: Gradient estimation / sensitivity analysis

We have

$$X^\theta(t) = X^\theta(0) + \sum_k Y_k\left(\int_0^t \lambda_k(\theta, X^\theta(s))\,ds\right)\zeta_k,$$

with $\theta \in \mathbb{R}^s$, and we define

$$J(\theta) = E f(\theta, X^\theta(t)).$$

We know how to estimate $J(\theta)$ using Monte Carlo. However, what if we want

$$J'(\theta) = \frac{d}{d\theta} E f(\theta, X^\theta(t))?$$

Thus, we want to know how sensitive our statistic is to perturbations in $\theta$. This tells us, for example:
1. The robustness of the system to perturbations in parameters.
2. Which parameters we need to estimate well from data, etc.

There are multiple methods. We will consider:
- Finite differences.
Finite differencing
This method is pretty straightforward and is therefore the most used. Simply note that

$$J'(\theta) = \frac{J(\theta + \varepsilon) - J(\theta)}{\varepsilon} + O(\varepsilon) = E\left[\frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^\theta(t))}{\varepsilon}\right] + O(\varepsilon).$$

Centered differencing reduces the bias to $O(\varepsilon^2)$.

The usual finite difference estimator is

$$D_N(\varepsilon) = \frac{1}{N} \sum_{i=1}^N \frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\varepsilon}.$$

Letting $\delta > 0$ be some desired accuracy (for a confidence interval), we need $N$ so that

$$\sqrt{\operatorname{Var}(D_N(\varepsilon))} \le \delta.$$
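A minimal sketch of this estimator, assuming a hypothetical helper `simulate(theta, seed)` that returns one sample of f(θ, X^θ(t)); the `common_seeds` flag previews the common random numbers idea discussed below.

```python
import numpy as np

def finite_difference(simulate, theta, eps, N, common_seeds=False):
    """D_N(eps) = (1/N) * sum_i [f(theta+eps, X_[i]) - f(theta, X_[i])] / eps."""
    rng = np.random.default_rng(5)
    diffs = []
    for _ in range(N):
        s1 = int(rng.integers(2**32))
        s2 = s1 if common_seeds else int(rng.integers(2**32))  # CRN: reuse the seed
        diffs.append((simulate(theta + eps, s1) - simulate(theta, s2)) / eps)
    diffs = np.array(diffs)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(N)  # estimate, std. error
```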
Finite differencing
We want

$$\sqrt{\operatorname{Var}(D_N(\varepsilon))} \le \delta,$$

with

$$D_N(\varepsilon) = \frac{1}{N} \sum_{i=1}^N \frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\varepsilon}.$$

If the paths are generated independently, then

$$\operatorname{Var}(D_N(\varepsilon)) = N^{-1}\varepsilon^{-2}\operatorname{Var}\!\left(f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))\right) = O(N^{-1}\varepsilon^{-2}),$$

implying

$$\frac{1}{\sqrt{N}}\,\frac{1}{\varepsilon} = O(\delta) \implies N = O(\varepsilon^{-2}\delta^{-2}).$$

Terrible. Worse than plain expectations.

How about common random numbers for variance reduction?
Common random numbers
It’s exactly what it sounds like. Reuse the random numbers used in thegeneration of
X θ+ε[i] (t) and X θ
[i](t).
Why?
Because:
Var(f (θ,X θ+ε[i] (t))− f (θ,X θ+ε
[i] (t))) = Var(f (θ,X θ+ε[i] (t))) + Var(f (θ,X θ
[i](t)))
− 2Cov(f (θ,X θ+ε[i] (t)), f (θ,X θ
[i](t))).
So, if we can “couple” the random variables, we can get a variance reduction!Sometimes substantial.
Common random numbers
- In the context of Gillespie's algorithm, we simply reuse all the same random numbers (uniforms).
- This can be achieved simply by setting the "seed" of the random number generator before generating $X^{\theta+\varepsilon}$ and $X^{\theta}$.
Common random numbers
CRN + Gillespie is a good idea:

1. It costs little in terms of implementation.
2. The variance reduction and gains in efficiency can be huge.

Thus, it is probably the most common method used today.

But: over time, the processes decouple, often completely.

Can we do better?
Coupling
Using common random numbers in the previous fashion is a way of "coupling" the two processes together.

Is there a natural way to couple processes using the random time change? Can we couple the Poisson processes?

Answer: yes. Multiple ways. I will show one which works very well.
How do we generate processes simultaneously
Suppose I want to generate:
- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could let $Y_1$ and $Y_2$ be independent, unit-rate Poisson processes, and set

$$Z_{13.1}(t) = Y_1(13.1\,t), \qquad Z_{13}(t) = Y_2(13\,t).$$

Using this representation, these processes are independent and, hence, not coupled. The variance of the difference is large:

$$\operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_1(13.1\,t)) + \operatorname{Var}(Y_2(13\,t)) = 26.1\,t.$$
How do we generate processes simultaneously
Suppose I want to generate:
- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

Instead, we could let $Y_1$ and $Y_2$ be independent unit-rate Poisson processes, and set

$$Z_{13.1}(t) = Y_1(13\,t) + Y_2(0.1\,t), \qquad Z_{13}(t) = Y_1(13\,t).$$

The variance of the difference is much smaller:

$$\operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_2(0.1\,t)) = 0.1\,t.$$

This uses the fact that the sum of independent Poisson processes is again a Poisson process.
How do we generate processes simultaneously
More generally, suppose we want
1. a non-homogeneous Poisson process with intensity $f(t)$, and
2. a non-homogeneous Poisson process with intensity $g(t)$.

We can let $Y_1$, $Y_2$, and $Y_3$ be independent, unit-rate Poisson processes and define (writing $a \wedge b = \min(a, b)$)

$$Z_f(t) = Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_2\left(\int_0^t f(s) - (f(s) \wedge g(s))\,ds\right),$$

$$Z_g(t) = Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_3\left(\int_0^t g(s) - (f(s) \wedge g(s))\,ds\right),$$

where we are using that, for example,

$$Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_2\left(\int_0^t f(s) - (f(s) \wedge g(s))\,ds\right) = Y\left(\int_0^t f(s)\,ds\right),$$

where $Y$ is a unit-rate Poisson process.
Parameter sensitivities

Couple the processes:

$$\begin{aligned}
X^{\theta+\varepsilon}(t) = X^{\theta+\varepsilon}(0) &+ \sum_k Y_{k,1}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k \\
&+ \sum_k Y_{k,2}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) - \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k, \\
X^{\theta}(t) = X^{\theta}(0) &+ \sum_k Y_{k,1}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k \\
&+ \sum_k Y_{k,3}\left(\int_0^t \lambda^{\theta}_k(X^{\theta}(s)) - \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k.
\end{aligned}$$
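A minimal Gillespie-style sketch of simulating this coupled pair: each reaction k is split into three channels (shared, perturbed-only, nominal-only) with the intensities above, so the two processes jump together whenever the shared channel fires. Here `intensities(theta, x)` is an assumed user-supplied function returning the vector of $\lambda_k(\theta, x)$.

```python
import numpy as np

rng = np.random.default_rng(7)

def coupled_pair(x0, zeta, intensities, theta, eps, t_max):
    """Simulate (X^{theta+eps}, X^theta) under the split coupling above."""
    xp, xn = np.array(x0, dtype=float), np.array(x0, dtype=float)
    t, K = 0.0, len(zeta)
    while t < t_max:
        lp = np.asarray(intensities(theta + eps, xp))
        ln = np.asarray(intensities(theta, xn))
        shared = np.minimum(lp, ln)            # Y_{k,1} channels: move both
        rates = np.concatenate([shared, lp - shared, ln - shared])
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)
        j = int(rng.choice(3 * K, p=rates / total))
        k = j % K
        if j < K:                              # shared channel: both jump
            xp += zeta[k]
            xn += zeta[k]
        elif j < 2 * K:                        # Y_{k,2}: perturbed process only
            xp += zeta[k]
        else:                                  # Y_{k,3}: nominal process only
            xn += zeta[k]
    return xp, xn
```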
Parameter sensitivities

Theorem³. Suppose $(X^{\theta+\varepsilon}, X^{\theta})$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which

$$E \sup_{t \le T}\left(f(\theta+\varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^{\theta}(t))\right)^2 \le C_{T,f}\,\varepsilon.$$

This lowers the variance of the estimator from

$$O(N^{-1}\varepsilon^{-2})$$

to

$$O(N^{-1}\varepsilon^{-1}),$$

an order of magnitude lower (in $\varepsilon$).

Point: a deeper mathematical understanding led to a better computational method.

³David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.
Analysis
Theorem. Suppose $(X^{\theta+\varepsilon}, X^{\theta})$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which

$$E \sup_{t \le T}\left(f(\theta+\varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^{\theta}(t))\right)^2 \le C_{T,f}\,\varepsilon.$$

Key observation of the proof:

$$X^{\theta+\varepsilon}(t) - X^{\theta}(t) = M^{\theta,\varepsilon}(t) + \int_0^t F^{\theta+\varepsilon}(X^{\theta+\varepsilon}(s)) - F^{\theta}(X^{\theta}(s))\,ds,$$

where "most" of the jumps have vanished. Now work on the martingale and the absolutely continuous part.
Example: gene transcription and translation
$$G \xrightarrow{2} G + M, \qquad M \xrightarrow{10} M + P, \qquad M \xrightarrow{k} \emptyset, \qquad P \xrightarrow{1} \emptyset.$$

We want

$$\frac{\partial}{\partial \theta} E\left[X^\theta_{\text{protein}}(30)\right], \qquad \theta \approx 1/4.$$

Method       R         95% CI          # updates    CPU time
Likelihood   689,600   −312.1 ± 6.0    2.9 × 10⁹    3,506.6 s
CMC          246,000   −319.3 ± 6.0    2.1 × 10⁹    2,364.8 s
CRP/CRN       25,980   −316.7 ± 6.0    2.2 × 10⁸      270.9 s
CFD            4,580   −319.9 ± 6.0    2.0 × 10⁷       29.2 s

Table: Each finite difference method used ε = 1/40. The exact value is J′(1/4) = −318.073.
Comparison from 5,000 samples each with ε = 1/40
[Figure: estimator variance versus time on $[0, 60]$, three panels: Coupled Finite Differences and Common Reaction Path (variance up to ≈ 60); Crude Monte Carlo (variance up to ≈ 500); Girsanov Transformation (variance up to ≈ 3000).]
Example: genetic toggle switch
$$\emptyset \;\underset{\lambda_2(X)}{\overset{\lambda_1(X)}{\rightleftharpoons}}\; X_1, \qquad \emptyset \;\underset{\lambda_4(X)}{\overset{\lambda_3(X)}{\rightleftharpoons}}\; X_2, \tag{1}$$

with intensity functions

$$\lambda_1(X(t)) = \frac{\alpha_1}{1 + X_2(t)^\beta}, \qquad \lambda_2(X(t)) = X_1(t),$$
$$\lambda_3(X(t)) = \frac{\alpha_2}{1 + X_1(t)^\gamma}, \qquad \lambda_4(X(t)) = X_2(t),$$

and parameter choice

$$\alpha_1 = 50, \quad \alpha_2 = 16, \quad \beta = 2.5, \quad \gamma = 1.$$

- Begin the process with initial condition [0, 0], and
- consider the sensitivity of $X_1$ as a function of $\alpha_1$ (a sketch of plugging this model into the earlier coupled-pair code follows below).
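Assuming the hypothetical `coupled_pair` helper sketched above, the toggle switch plugs in as follows (α1 plays the role of θ):

```python
import numpy as np

alpha2, beta, gamma = 16.0, 2.5, 1.0
zeta = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])  # birth/death of X1 and X2

def intensities(a1, x):
    return np.array([a1 / (1.0 + x[1] ** beta),        # lambda_1
                     x[0],                              # lambda_2
                     alpha2 / (1.0 + x[0] ** gamma),    # lambda_3
                     x[1]])                             # lambda_4

# e.g.: xp, xn = coupled_pair([0, 0], zeta, intensities, theta=50.0, eps=0.1, t_max=40.0)
```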
Example: genetic toggle switch
[Figure (a): variance up to time T = 40, values up to ≈ 0.08.]

Figure: Time plot of the variance of the Coupled Finite Difference estimator versus the Common Reaction Path estimator for the model (1). Each plot was generated using 10,000 sample paths. A perturbation of ε = 1/10 was used.
Are these representations only good for simulation? – LLN and ODEs (Tom Kurtz, ∼1970s)

Suppose $X^N(t) = O(N)$. Denote concentrations via

$$\bar{X}^N(t) = N^{-1} X^N(t) = O(1).$$

Under mild assumptions, we have

$$\lambda_k(X^N(t)) = \lambda_k(N \cdot X^N(t)/N) = N \lambda_k(\bar{X}^N(t)).$$

Then

$$X^N(t) = X^N(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X^N(s))\,ds\right)\zeta_k$$

becomes

$$\bar{X}^N(t) = \bar{X}^N(0) + \sum_k N^{-1} Y_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right)\zeta_k.$$

Using that

$$\lim_{N \to \infty} \sup_{u \le U} \left|N^{-1} Y(Nu) - u\right| = 0,$$

we find that $\bar{X}^N(t)$ converges to the solution of the classical ODE

$$x(t) = x(0) + \sum_k \int_0^t \lambda_k(x(s))\,ds \;\zeta_k \equiv x(0) + \int_0^t F(x(s))\,ds.$$
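A quick numerical check of the law of large numbers being used, $N^{-1} Y(Nu) \to u$ (a sketch; the grid-based supremum is an approximation):

```python
import numpy as np

rng = np.random.default_rng(8)

def scaled_poisson_sup_error(N, U=1.0):
    """Approximate sup_{u <= U} |N^{-1} Y(Nu) - u| for one unit-rate path Y."""
    arrivals = np.cumsum(rng.exponential(1.0, size=int(3 * N * U)))  # points of Y
    grid = np.linspace(0.0, U, 2000)
    Y = np.searchsorted(arrivals, N * grid, side="right")            # Y(Nu) on grid
    return np.abs(Y / N - grid).max()

for N in [10, 100, 1000, 10000]:
    print(N, scaled_poisson_sup_error(N))   # error shrinks, roughly like N**-0.5
```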
Are these representations only good for simulation? – LLN and ODEs.Tom Kurtz ∼ 1970’s
Suppose XN(t) = O(N). Denote concentrations via
X N(t) = N−1XN(t) = O(1).
Under mild assumptions, have
λk (XN(t)) = λk (N · XN(t)/N) = N · λk (X N(t)).
XN(t) = XN(0) +∑
k
Yk
(∫ t
0λk (XN(s))ds
)ξk
becomes
X N(t) = X N(0) +∑
k
N−1Yk
(N∫ t
0λk (XN(s))ds
)ξk
use thatlim
N→∞sup{u≤U}
∣∣N−1Y (Nu)− u∣∣ = 0,
find X N(t) converges to solution of classical ODE
x(t) = x(0) +∑
k
∫ t
0λk (x(s))ds · ξk ≡ x(0) +
∫ t
0F (X (s))ds.
Diffusions? Argument due to Tom KurtzSuppose XN(t) = O(N). Denote concentrations via
X N(t) = N−1XN(t) = O(1).
XN(t) = XN(0) +∑
k
Yk
(∫ t
0λk (XN(s))ds
)ξk
becomes
X N(t) = X N(0) +∑
k
N−1Yk
(N∫ t
0λk (XN(s))ds
)ξk
use that1√N
(Yk (Nu)− Nu) ≈ Wk (u)
find X N(t) well approximated by chemical Langevin process
X (t) = X (0) +∑
k
ζk
∫ t
0λk (X (s))ds +
1√N
∑k
∫ t
0
√λk (X (s))dWk (s).
ordX (t) = F (X (t))dt + N−1/2
∑k
ζk
√λk (X (t))dWk (t).
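A minimal Euler–Maruyama sketch of one step of this Langevin approximation, with λk understood as the scaled (concentration-level) intensities; all inputs are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(9)

def langevin_step(x, zeta, lam_fn, N, dt):
    """One Euler-Maruyama step of
       dX = sum_k zeta_k lam_k(X) dt + N^{-1/2} sum_k zeta_k sqrt(lam_k(X)) dW_k.
       zeta has shape (K, d); lam_fn(x) returns the K intensities."""
    lam = np.maximum(np.asarray(lam_fn(x), dtype=float), 0.0)  # keep sqrt well-defined
    dW = rng.normal(0.0, np.sqrt(dt), size=len(lam))           # Brownian increments
    return x + (zeta.T @ lam) * dt + (zeta.T @ (np.sqrt(lam) * dW)) / np.sqrt(N)
```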
Central limit theorem – Kurtz / Van Kampen

Suppose $X^N(t) = O(N)$. Denote concentrations via $\bar{X}^N(t) = N^{-1} X^N(t) = O(1)$, and let $x(t)$ denote the ODE solution. Let

$$U^N(t) = \sqrt{N}\left(\bar{X}^N(t) - x(t)\right) = \frac{X^N(t) - N x(t)}{\sqrt{N}}.$$

Then, writing $\hat{Y}_k(u) = Y_k(u) - u$ for the centered processes,

$$\begin{aligned}
U^N(t) &= \sqrt{N}\left(N^{-1} \sum_k \zeta_k \hat{Y}_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right)\right) + \sqrt{N} \int_0^t \left(F(\bar{X}^N(s)) - F(x(s))\right) ds \\
&\approx \frac{1}{\sqrt{N}} \sum_k \zeta_k \hat{Y}_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right) + \int_0^t DF(x(s))\,U^N(s)\,ds.
\end{aligned}$$

Using the martingale central limit theorem to show that

$$\frac{1}{\sqrt{N}} \hat{Y}_k(N \cdot) \Rightarrow W_k(\cdot),$$

we get $U^N \Rightarrow U$, where

$$U(t) = \sum_k \zeta_k W_k\left(\int_0^t \lambda_k(x(s))\,ds\right) + \int_0^t DF(x(s))\,U(s)\,ds.$$
Thanks!
References:
1. David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.
2. David F. Anderson and Thomas G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, Springer, 2011, Eds. Heinz Koeppl et al.
Funding: NSF-DMS-1009275.