Stochastic models of biochemical systems
David F. Anderson∗
Department of Mathematics
University of Wisconsin - Madison
University of Amsterdam
November 14th, 2012
Stochastic models of biochemical systems
Goal: give a broad introduction to stochastic models of biochemical systems, with minimal technical details.
Outline
1. Construct a useful representation for the most common continuous time Markov chain model for population processes.
2. Discuss some computational methods – sensitivity analysis.
3. Discuss various approximate models for these CTMCs.
Example: ODE Lotka-Volterra predator-prey model
Think of A as a prey and B as a predator:

$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{ prey at } t, \, \#\text{ predators at } t]^T$. Then

$$\dot{x}(t) = \kappa_1 x_1(t)\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2 x_1(t)x_2(t)\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3 x_2(t)\begin{bmatrix}0\\-1\end{bmatrix},$$

or

$$x(t) = x(0) + \kappa_1\int_0^t x_1(s)\,ds\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2\int_0^t x_1(s)x_2(s)\,ds\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3\int_0^t x_2(s)\,ds\begin{bmatrix}0\\-1\end{bmatrix}.$$
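For orientation, here is a minimal numerical sketch of this deterministic model (assuming SciPy is available; the initial condition of 1000 prey and 1000 predators is a hypothetical choice for illustration):

```python
from scipy.integrate import solve_ivp

k1, k2, k3 = 2.0, 0.002, 2.0

def lotka_volterra(t, x):
    # x[0] = prey (A), x[1] = predator (B)
    return [k1 * x[0] - k2 * x[0] * x[1],
            k2 * x[0] * x[1] - k3 * x[1]]

sol = solve_ivp(lotka_volterra, (0.0, 40.0), [1000.0, 1000.0], max_step=0.01)
print(sol.y[:, -1])  # populations at t = 40
```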
Lotka-Volterra
Think of A as a prey and B as a predator.
$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

[Figure: prey and predator trajectories over $t \in [0, 40]$, oscillating between roughly 700 and 1500 individuals.]
Biological example: transcription-translation
Gene transcription & translation:
$$G \xrightarrow{\kappa_1} G + M \quad \text{(transcription)}$$
$$M \xrightarrow{\kappa_2} M + P \quad \text{(translation)}$$
$$M \xrightarrow{\kappa_3} \emptyset \quad \text{(degradation)}$$
$$P \xrightarrow{\kappa_4} \emptyset \quad \text{(degradation)}$$
$$G + P \underset{\kappa_{-5}}{\overset{\kappa_5}{\rightleftharpoons}} B \quad \text{(binding/unbinding of gene)}$$

Cartoon representation:¹

$$X_1 \underset{N^{\alpha} q_2}{\overset{N^{\alpha} q_1}{\rightleftharpoons}} X_2, \qquad X_1 \xrightarrow{N\cdot\lambda_1} M, \qquad X_2 \xrightarrow{N\cdot\lambda_2} M, \qquad M \xrightarrow{\mu} \emptyset.$$

¹J. Paulsson, Physics of Life Reviews, 2, 2005, 157–175.
Another example: Viral infection

Let
1. T = viral template.
2. G = viral genome.
3. S = viral structure.
4. V = virus.

Reactions:

R1) $T + \text{“stuff”} \xrightarrow{\kappa_1} T + G$, $\kappa_1 = 1$
R2) $G \xrightarrow{\kappa_2} T$, $\kappa_2 = 0.025$
R3) $T + \text{“stuff”} \xrightarrow{\kappa_3} T + S$, $\kappa_3 = 1000$
R4) $T \xrightarrow{\kappa_4} \emptyset$, $\kappa_4 = 0.25$
R5) $S \xrightarrow{\kappa_5} \emptyset$, $\kappa_5 = 2$
R6) $G + S \xrightarrow{\kappa_6} V$, $\kappa_6 = 7.5 \times 10^{-6}$

- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.
Some examples
E. coli Heat Shock Response Model: 9 species, 18 reactions.²

²Hye Won Kang, presentation at SPA in 2007.
Modeling
1. These models (and much more complicated ones) have historically been predominantly modeled using ODEs.
2. However:
   2.1 there are often low numbers of molecules, which makes the timing of reactions more random (less averaging),
   2.2 when a reaction occurs, the system jumps to a new state by a non-trivial amount: 1 ≫ 0.
3. Researchers (mostly) lived with these shortcomings until the late 1990s and early 2000s, when it was shown that ODE models cannot capture important qualitative behavior of certain models:
   - λ-phage lysis-lysogeny decision mechanism (Arkin-McAdams 1998).
   - Green fluorescent protein.

ODEs were often the wrong modeling choice.
Specifying infinitesimal behavior
Q: What is a better modeling choice? It should have
1. a discrete state space, since we are counting molecules, and
2. stochastic dynamics.

Let's return to the development of ODEs.

An ordinary differential equation is specified by describing how a function should vary over a small period of time:

$$X(t + \Delta t) - X(t) \approx F(X(t))\,\Delta t.$$

A more precise description (consider a telescoping sum):

$$X(t) = X(0) + \int_0^t F(X(s))\,ds.$$
Infinitesimal behavior for jump processes
We are interested in functions that are piecewise constant and random. Changes, when they occur, won't be small. If "reaction k" occurs at time t,

$$X(t) - X(t-) = \zeta_k \in \mathbb{Z}^d.$$

What is small? The probability of seeing a jump of a particular size:

$$P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}_t\} \approx \lambda_{\zeta_k}(t)\,\Delta t.$$

Question: Can we specify the $\lambda_{\zeta_k}$ in some way that determines X?

- For the ODE, F depended on X.
- Maybe $\lambda_{\zeta_k}$ should depend on X?
Simple model
For example, consider the simple system
A + B → C
where one molecule each of A and B is being converted to one of C.
Intuition for standard stochastic model:
$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

where
- $\kappa$ is a positive constant, the reaction rate constant.
- $\mathcal{F}_t$ is all the information pertaining to the process up through time t.
Can we specify a reasonable model satisfying this assumption?
Background information: The Poisson process

We will view a Poisson process, Y(·), through the lens of an underlying point process.

(a) Let $\{e_i\}$ be i.i.d. exponential random variables with parameter one.
(b) Now, put points down on a line with spacing equal to the $e_i$:

[Diagram: points on a time line with successive gaps $e_1, e_2, e_3, \dots$]

- Let $Y_1(t)$ denote the number of points hit by time t.
- In the figure above, $Y_1(t) = 6$.

[Figure: a sample path of the counting process $Y_1$ on $[0, 20]$, with $\lambda = 1$.]
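A minimal sketch of this point-process view (the names are illustrative): exponential(1) gaps are accumulated into arrival times, and Y1(t) simply counts how many arrivals fall in [0, t].

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_rate_poisson_points(t_max):
    """Arrival times of a unit-rate Poisson process on [0, t_max]."""
    arrivals, t = [], 0.0
    while True:
        t += rng.exponential(1.0)      # i.i.d. exponential(1) spacings e_i
        if t > t_max:
            return np.array(arrivals)
        arrivals.append(t)

arrivals = unit_rate_poisson_points(20.0)
Y1 = lambda t: int(np.searchsorted(arrivals, t, side="right"))  # points hit by time t
print(Y1(10.0), Y1(20.0))
```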
The Poisson process

Let
- $Y_1$ be a unit-rate Poisson process.
- Define $Y_\lambda(t) \equiv Y_1(\lambda t)$.

Then $Y_\lambda$ is a Poisson process with parameter $\lambda$.

[Diagram: the same point process, with gaps $e_1, e_2, e_3, \dots$]

Intuition: The Poisson process with rate $\lambda$ is simply the number of points hit (of the unit-rate point process) when we run along the time frame at rate $\lambda$.

[Figure: a sample path of $Y_\lambda$ on $[0, 20]$, with $\lambda = 3$.]
The Poisson process
There is no reason $\lambda$ needs to be constant in time, in which case

$$Y_\lambda(t) \equiv Y\left(\int_0^t \lambda(s)\,ds\right)$$

is a non-homogeneous Poisson process with propensity/intensity $\lambda(t) \ge 0$. Thus

$$P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) > 0 \mid \mathcal{F}_t\} = 1 - \exp\left\{-\int_t^{t+\Delta t} \lambda(s)\,ds\right\} \approx \lambda(t)\,\Delta t.$$

Points:
1. We have "changed time" to convert a unit-rate Poisson process to one which has rate or intensity or propensity $\lambda(t)$.
2. We will use similar time changes of unit-rate processes to build the models of interest.
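A minimal sketch of such a time change (the intensity below is a hypothetical choice, and the integral of λ is accumulated with a crude left-endpoint rule):

```python
import numpy as np

rng = np.random.default_rng(1)

lam = lambda t: 2.0 + np.sin(t)        # hypothetical intensity, lambda(t) >= 0

def nonhomogeneous_poisson(t_max, dt=1e-4):
    """Arrival times of Y(int_0^t lambda(s) ds), Y a unit-rate Poisson process."""
    next_point = rng.exponential(1.0)  # next point of the underlying unit-rate process
    integral, t, arrivals = 0.0, 0.0, []
    while t < t_max:
        integral += lam(t) * dt        # accumulate int_0^t lambda(s) ds
        t += dt
        if integral >= next_point:     # the time change has reached the next point
            arrivals.append(t)
            next_point += rng.exponential(1.0)
    return arrivals

print(len(nonhomogeneous_poisson(20.0)))  # on average about int_0^20 lambda ≈ 40.6
```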
Return to models of interest
Consider the simple system

$$A + B \to C,$$

where one molecule each of A and B is being converted to one of C.

Intuition for standard stochastic model:

$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

where
- $\kappa$ is a positive constant, the reaction rate constant.
- $\mathcal{F}_t$ is all the information pertaining to the process up through time t.
Models of interest
A + B → C
Simple book-keeping says: if

$$X(t) = \begin{bmatrix} X_A(t) \\ X_B(t) \\ X_C(t) \end{bmatrix}$$

gives the state at time t, then

$$X(t) = X(0) + R(t)\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix},$$

where
- $R(t)$ is the number of times the reaction has occurred by time t, and
- $X(0)$ is the initial condition.

Goal: represent $R(t)$ in terms of a Poisson process.
Models of interest
Recall that for $A + B \to C$ our intuition was to specify the infinitesimal behavior

$$P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t,$$

and that for a counting process with specified intensity $\lambda(t)$ we have

$$P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) = 1 \mid \mathcal{F}_t\} \approx \lambda(t)\,\Delta t.$$

This suggests we can model

$$R(t) = Y\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right),$$

where Y is a unit-rate Poisson process. Hence

$$\begin{bmatrix} X_A(t) \\ X_B(t) \\ X_C(t) \end{bmatrix} \equiv X(t) = X(0) + \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} Y\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right).$$

This equation uniquely determines X for all t ≥ 0.
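A minimal simulation sketch of this representation: between jumps the intensity is constant, so the next point of the time-changed Poisson process arrives after an exponential time with rate κ·XA·XB. The rate constant and initial counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_AB_to_C(xA, xB, xC, kappa, t_max):
    """Exact path of A + B -> C via the random time change representation."""
    t, path = 0.0, [(0.0, xA, xB, xC)]
    while xA > 0 and xB > 0:
        rate = kappa * xA * xB               # current intensity kappa*XA(t)*XB(t)
        t += rng.exponential(1.0 / rate)     # waiting time to the next reaction
        if t > t_max:
            break
        xA, xB, xC = xA - 1, xB - 1, xC + 1  # state jumps by (-1, -1, 1)
        path.append((t, xA, xB, xC))
    return path

print(simulate_AB_to_C(xA=100, xB=80, xC=0, kappa=0.01, t_max=10.0)[-1])
```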
Build up model: Random time change representation of Kurtz

• Now consider a network of reactions involving d chemical species, $S_1, \dots, S_d$:

$$\sum_{i=1}^d \nu_{ik} S_i \longrightarrow \sum_{i=1}^d \nu'_{ik} S_i.$$

Denote the reaction vector as

$$\zeta_k = \nu'_k - \nu_k,$$

so that if reaction k occurs at time t,

$$X(t) = X(t-) + \zeta_k.$$

• The intensity (or propensity) of the kth reaction is $\lambda_k : \mathbb{Z}^d_{\ge 0} \to \mathbb{R}$.

• By analogy with before:

$$X(t) = X(0) + \sum_k R_k(t)\,\zeta_k,$$

with

$$X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k,$$

where the $Y_k$ are independent, unit-rate Poisson processes.
Mass-action kinetics
The standard intensity function chosen is mass-action kinetics:

$$\lambda_k(x) = \kappa_k \left(\prod_i \nu_{ik}!\right)\binom{x}{\nu_k} = \kappa_k \prod_i \frac{x_i!}{(x_i - \nu_{ik})!}.$$

Example: If $S_1 \to$ anything, then $\lambda_k(x) = \kappa_k x_1$.

Example: If $S_1 + S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_1 x_2$.

Example: If $2S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_2(x_2 - 1)$.
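A minimal sketch of evaluating these intensities, directly implementing the falling-factorial form above (the example reaction and numbers are hypothetical):

```python
def mass_action_intensity(kappa_k, nu_k, x):
    """lambda_k(x) = kappa_k * prod_i x_i! / (x_i - nu_ik)!  (falling factorials)."""
    lam = kappa_k
    for xi, nu in zip(x, nu_k):
        for j in range(nu):        # x_i * (x_i - 1) * ... * (x_i - nu_ik + 1)
            lam *= xi - j
    return max(lam, 0.0)

# 2 S2 -> anything with nu_k = (0, 2): lambda_k(x) = kappa_k * x2 * (x2 - 1)
print(mass_action_intensity(1.5, (0, 2), (7, 4)))  # 1.5 * 4 * 3 = 18.0
```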
Other ways to understand model
The infinitesimal generator of a Markov process determines the process:

$$\begin{aligned}
Af(x) &\stackrel{\mathrm{def}}{=} \lim_{h \to 0} \frac{1}{h}\left[E_x f(X(h)) - f(x)\right] \\
&= \lim_{h \to 0} \frac{1}{h}\left[\sum_k \left(f(x + \zeta_k) - f(x)\right) P(R_k(h) = 1)\right] + O(h) \\
&= \lim_{h \to 0} \frac{1}{h}\left[\sum_k \left(f(x + \zeta_k) - f(x)\right) \lambda_k(x)\,h\right] + O(h) \\
&= \sum_k \lambda_k(x)\left(f(x + \zeta_k) - f(x)\right).
\end{aligned}$$
Other ways to understand model
And we have Dynkin's formula (see Ethier and Kurtz, 1986, Ch. 1):

$$Ef(X(t)) - f(X_0) = E\int_0^t Af(X(s))\,ds.$$

Letting $f(y) = 1_x(y)$ above, so that

$$E[f(X(t))] = P\{X(t) = x\} = p_t(x),$$

gives the Kolmogorov forward equation (chemical master equation):

$$p_t'(x) = \sum_k \lambda_k(x - \zeta_k)\,p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x).$$
Equivalence of formulations

We now have three ways of making the infinitesimal specification

$$P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}^X_t\} \approx \lambda_k(X(t))\,\Delta t$$

precise:

1. The stochastic equation:
$$X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k.$$

2. The process is Markov with infinitesimal generator
$$(Af)(x) = \sum_k \lambda_k(x)\left(f(x + \zeta_k) - f(x)\right).$$

3. The master (forward) equation for the probability distributions:
$$p_t'(x) = \sum_k \lambda_k(x - \zeta_k)\,p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x).$$

Fortunately, if the solution of the stochastic equation doesn't blow up, the three are equivalent.

This model is an example of a continuous time Markov chain.
Example: Lotka-Volterra predator-prey model

Think of A as a prey and B as a predator:

$$A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset,$$

with $\kappa_1 = 2$, $\kappa_2 = 0.002$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{prey}, \#\text{predators}]^T$:

$$x(t) = x(0) + \kappa_1 \int_0^t x_1(s)\,ds \begin{bmatrix}1\\0\end{bmatrix} + \kappa_2 \int_0^t x_1(s)x_2(s)\,ds \begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3 \int_0^t x_2(s)\,ds \begin{bmatrix}0\\-1\end{bmatrix}.$$

Stochastic model. Let $X(t) = [\#\text{prey}, \#\text{predators}]^T$:

$$\begin{aligned}
X(t) = X(0) &+ Y_1\left(\kappa_1 \int_0^t X_1(s)\,ds\right)\begin{bmatrix}1\\0\end{bmatrix} + Y_2\left(\kappa_2 \int_0^t X_1(s)X_2(s)\,ds\right)\begin{bmatrix}-1\\1\end{bmatrix} \\
&+ Y_3\left(\kappa_3 \int_0^t X_2(s)\,ds\right)\begin{bmatrix}0\\-1\end{bmatrix}.
\end{aligned}$$
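A minimal Gillespie-style sketch of this stochastic model (exponential holding times, with the reaction chosen with probability proportional to its intensity; the initial condition is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(3)

kappa = np.array([2.0, 0.002, 2.0])
zeta = np.array([[1, 0], [-1, 1], [0, -1]])   # prey birth, predation, predator death

def intensities(x):
    return kappa * np.array([x[0], x[0] * x[1], x[1]])

def gillespie_lv(x0, t_max):
    t, x = 0.0, np.array(x0)
    while t < t_max:
        lam = intensities(x)
        lam0 = lam.sum()
        if lam0 == 0:                         # absorbed: nothing can fire
            break
        t += rng.exponential(1.0 / lam0)      # "when": exponential with total rate
        k = rng.choice(3, p=lam / lam0)       # "where": proportional to intensity
        x = x + zeta[k]
    return x

print(gillespie_lv([1000, 1000], t_max=40.0))
```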
Another example: Viral infection
The stochastic equations for the viral infection model above, with $X = (X_G, X_S, X_T, X_V)$, are

$$\begin{aligned}
X_1(t) &= X_1(0) + Y_1\left(\int_0^t X_3(s)\,ds\right) - Y_2\left(0.025\int_0^t X_1(s)\,ds\right) - Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right), \\
X_2(t) &= X_2(0) + Y_3\left(1000\int_0^t X_3(s)\,ds\right) - Y_5\left(2\int_0^t X_2(s)\,ds\right) - Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right), \\
X_3(t) &= X_3(0) + Y_2\left(0.025\int_0^t X_1(s)\,ds\right) - Y_4\left(0.25\int_0^t X_3(s)\,ds\right), \\
X_4(t) &= X_4(0) + Y_6\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right).
\end{aligned}$$
Computational methods
These are continuous time Markov chains!
Simulation/computation should be easy.
The most common simulation methods include:
1. Gillespie's algorithm – answers "where" and "when" (which reaction, and at what time) independently.
2. The next reaction method of Gibson and Bruck.

Each is an example of discrete event simulation; a simplified sketch of the next-reaction idea follows below.
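Here is a simplified next-reaction-style sketch (in the spirit of these methods, but without Gibson–Bruck's dependency graph and indexed priority queue): each reaction channel carries its own unit-rate internal clock, and the channel whose next firing time is smallest fires.

```python
import numpy as np

rng = np.random.default_rng(4)

def next_reaction(x0, zeta, intensities, t_max):
    """Simplified next-reaction simulation: one unit-rate clock per channel."""
    x = np.array(x0, dtype=float)
    K = len(zeta)
    T = np.zeros(K)                    # internal time consumed by each channel
    P = rng.exponential(1.0, size=K)   # internal time of each channel's next point
    t = 0.0
    while True:
        lam = np.asarray(intensities(x), dtype=float)
        dt = np.full(K, np.inf)
        active = lam > 0
        dt[active] = (P[active] - T[active]) / lam[active]  # time to each firing
        k = int(np.argmin(dt))
        if t + dt[k] > t_max:
            break
        t += dt[k]
        T += lam * dt[k]               # every internal clock advances
        x += zeta[k]
        P[k] += rng.exponential(1.0)   # draw channel k's next unit-rate point
    return x
```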
Numerical methods

Each exact method produces sample paths that can approximate values such as (which I will talk about tomorrow at CWI)

$$Ef(X(t)) \approx \frac{1}{n}\sum_{i=1}^n f(X_{[i]}(t)).$$

For example:
1. Means – expected virus yield.
2. Variances.
3. Probabilities.

Or sensitivities:

$$\frac{d}{d\theta} E f(\theta, X^\theta(t)).$$

Problem: solving using these algorithms can be computationally expensive:
1. Each path may require a significant number of computational steps.
2. We may require a significant number of paths.

Solution: we need to use novel stochastic representations to get good methods.
Specific computational problem: Gradient estimation / sensitivity analysis

We have

$$X^\theta(t) = X^\theta(0) + \sum_k Y_k\left(\int_0^t \lambda_k(\theta, X^\theta(s))\,ds\right)\zeta_k,$$

with $\theta \in \mathbb{R}^s$, and we define

$$J(\theta) = E f(\theta, X^\theta(t)).$$

We know how to estimate $J(\theta)$ using Monte Carlo. However, what if we want

$$J'(\theta) = \frac{d}{d\theta} E f(\theta, X^\theta(t))?$$

Thus, we want to know how sensitive our statistic is to perturbations in $\theta$. This tells us, for example:
1. The robustness of the system to perturbations in parameters.
2. Which parameters we need to estimate well from data, etc.

There are multiple methods. We will consider:
- Finite differences.
Finite differencing
This method is pretty straightforward and is therefore the most used. Simply note that

$$J'(\theta) = \frac{J(\theta + \varepsilon) - J(\theta)}{\varepsilon} + O(\varepsilon) = E\left[\frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^\theta(t))}{\varepsilon}\right] + O(\varepsilon).$$

Centered differencing reduces the bias to $O(\varepsilon^2)$.

The usual finite difference estimator is

$$D_N(\varepsilon) = \frac{1}{N} \sum_{i=1}^N \frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\varepsilon}.$$

Letting $\delta > 0$ be some desired accuracy (for a confidence interval), we need $N$ so that

$$\sqrt{\operatorname{Var}(D_N(\varepsilon))} \le \delta.$$
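A minimal sketch of this estimator, assuming a hypothetical helper `simulate(theta, seed)` that returns one sample of f(θ, X^θ(t)); the `common_seeds` flag previews the common random numbers idea discussed below.

```python
import numpy as np

def finite_difference(simulate, theta, eps, N, common_seeds=False):
    """D_N(eps) = (1/N) * sum_i [f(theta+eps, X_[i]) - f(theta, X_[i])] / eps."""
    rng = np.random.default_rng(5)
    diffs = []
    for _ in range(N):
        s1 = int(rng.integers(2**32))
        s2 = s1 if common_seeds else int(rng.integers(2**32))  # CRN: reuse the seed
        diffs.append((simulate(theta + eps, s1) - simulate(theta, s2)) / eps)
    diffs = np.array(diffs)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(N)  # estimate, std. error
```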
Finite differencing
We want

$$\sqrt{\operatorname{Var}(D_N(\varepsilon))} \le \delta,$$

with

$$D_N(\varepsilon) = \frac{1}{N} \sum_{i=1}^N \frac{f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\varepsilon}.$$

If the paths are generated independently, then

$$\operatorname{Var}(D_N(\varepsilon)) = N^{-1}\varepsilon^{-2}\operatorname{Var}\!\left(f(\theta + \varepsilon, X^{\theta+\varepsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))\right) = O(N^{-1}\varepsilon^{-2}),$$

implying

$$\frac{1}{\sqrt{N}}\,\frac{1}{\varepsilon} = O(\delta) \implies N = O(\varepsilon^{-2}\delta^{-2}).$$

Terrible. Worse than plain expectations.

How about common random numbers for variance reduction?
Common random numbers
It’s exactly what it sounds like. Reuse the random numbers used in thegeneration of
X θ+ε[i] (t) and X θ
[i](t).
Why?
Because:
Var(f (θ,X θ+ε[i] (t))− f (θ,X θ+ε
[i] (t))) = Var(f (θ,X θ+ε[i] (t))) + Var(f (θ,X θ
[i](t)))
− 2Cov(f (θ,X θ+ε[i] (t)), f (θ,X θ
[i](t))).
So, if we can “couple” the random variables, we can get a variance reduction!Sometimes substantial.
Common random numbers
- In the context of Gillespie's algorithm, we simply reuse all the same random numbers (uniforms).
- This can be achieved simply by setting the "seed" of the random number generator before generating $X^{\theta+\varepsilon}$ and $X^{\theta}$.
Common random numbers
CRN + Gillespie is a good idea:

1. It costs little in terms of implementation.
2. The variance reduction and gains in efficiency can be huge.

Thus, it is probably the most common method used today.

But: over time, the processes decouple, often completely.

Can we do better?
Coupling
Using common random numbers in the previous fashion is a way of "coupling" the two processes together.

Is there a natural way to couple processes using the random time change? Can we couple the Poisson processes?

Answer: yes. Multiple ways. I will show one which works very well.
How do we generate processes simultaneously
Suppose I want to generate:
- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could let $Y_1$ and $Y_2$ be independent, unit-rate Poisson processes, and set

$$Z_{13.1}(t) = Y_1(13.1\,t), \qquad Z_{13}(t) = Y_2(13\,t).$$

Using this representation, these processes are independent and, hence, not coupled. The variance of the difference is large:

$$\operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_1(13.1\,t)) + \operatorname{Var}(Y_2(13\,t)) = 26.1\,t.$$
How do we generate processes simultaneously
Suppose I want to generate:
- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

Instead, we could let $Y_1$ and $Y_2$ be independent unit-rate Poisson processes, and set

$$Z_{13.1}(t) = Y_1(13\,t) + Y_2(0.1\,t), \qquad Z_{13}(t) = Y_1(13\,t).$$

The variance of the difference is much smaller:

$$\operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_2(0.1\,t)) = 0.1\,t.$$

This uses the fact that the sum of independent Poisson processes is again a Poisson process.
How do we generate processes simultaneously
More generally, suppose we want
1. a non-homogeneous Poisson process with intensity $f(t)$, and
2. a non-homogeneous Poisson process with intensity $g(t)$.

We can let $Y_1$, $Y_2$, and $Y_3$ be independent, unit-rate Poisson processes and define (writing $a \wedge b = \min(a, b)$)

$$Z_f(t) = Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_2\left(\int_0^t f(s) - (f(s) \wedge g(s))\,ds\right),$$

$$Z_g(t) = Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_3\left(\int_0^t g(s) - (f(s) \wedge g(s))\,ds\right),$$

where we are using that, for example,

$$Y_1\left(\int_0^t f(s) \wedge g(s)\,ds\right) + Y_2\left(\int_0^t f(s) - (f(s) \wedge g(s))\,ds\right) = Y\left(\int_0^t f(s)\,ds\right),$$

where $Y$ is a unit-rate Poisson process.
Parameter sensitivities

Couple the processes:

$$\begin{aligned}
X^{\theta+\varepsilon}(t) = X^{\theta+\varepsilon}(0) &+ \sum_k Y_{k,1}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k \\
&+ \sum_k Y_{k,2}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) - \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k, \\
X^{\theta}(t) = X^{\theta}(0) &+ \sum_k Y_{k,1}\left(\int_0^t \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k \\
&+ \sum_k Y_{k,3}\left(\int_0^t \lambda^{\theta}_k(X^{\theta}(s)) - \lambda^{\theta+\varepsilon}_k(X^{\theta+\varepsilon}(s)) \wedge \lambda^{\theta}_k(X^{\theta}(s))\,ds\right)\zeta_k.
\end{aligned}$$
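A minimal Gillespie-style sketch of simulating this coupled pair: each reaction k is split into three channels (shared, perturbed-only, nominal-only) with the intensities above, so the two processes jump together whenever the shared channel fires. Here `intensities(theta, x)` is an assumed user-supplied function returning the vector of $\lambda_k(\theta, x)$.

```python
import numpy as np

rng = np.random.default_rng(7)

def coupled_pair(x0, zeta, intensities, theta, eps, t_max):
    """Simulate (X^{theta+eps}, X^theta) under the split coupling above."""
    xp, xn = np.array(x0, dtype=float), np.array(x0, dtype=float)
    t, K = 0.0, len(zeta)
    while t < t_max:
        lp = np.asarray(intensities(theta + eps, xp))
        ln = np.asarray(intensities(theta, xn))
        shared = np.minimum(lp, ln)            # Y_{k,1} channels: move both
        rates = np.concatenate([shared, lp - shared, ln - shared])
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)
        j = int(rng.choice(3 * K, p=rates / total))
        k = j % K
        if j < K:                              # shared channel: both jump
            xp += zeta[k]
            xn += zeta[k]
        elif j < 2 * K:                        # Y_{k,2}: perturbed process only
            xp += zeta[k]
        else:                                  # Y_{k,3}: nominal process only
            xn += zeta[k]
    return xp, xn
```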
Parameter sensitivities

Theorem³. Suppose $(X^{\theta+\varepsilon}, X^{\theta})$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which

$$E \sup_{t \le T}\left(f(\theta+\varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^{\theta}(t))\right)^2 \le C_{T,f}\,\varepsilon.$$

This lowers the variance of the estimator from

$$O(N^{-1}\varepsilon^{-2})$$

to

$$O(N^{-1}\varepsilon^{-1}),$$

an order of magnitude lower (in $\varepsilon$).

Point: a deeper mathematical understanding led to a better computational method.

³David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.
Analysis
Theorem. Suppose $(X^{\theta+\varepsilon}, X^{\theta})$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which

$$E \sup_{t \le T}\left(f(\theta+\varepsilon, X^{\theta+\varepsilon}(t)) - f(\theta, X^{\theta}(t))\right)^2 \le C_{T,f}\,\varepsilon.$$

Key observation of the proof:

$$X^{\theta+\varepsilon}(t) - X^{\theta}(t) = M^{\theta,\varepsilon}(t) + \int_0^t F^{\theta+\varepsilon}(X^{\theta+\varepsilon}(s)) - F^{\theta}(X^{\theta}(s))\,ds,$$

where "most" of the jumps have vanished. Now work on the martingale and the absolutely continuous part.
Example: gene transcription and translation
$$G \xrightarrow{2} G + M, \qquad M \xrightarrow{10} M + P, \qquad M \xrightarrow{k} \emptyset, \qquad P \xrightarrow{1} \emptyset.$$

We want

$$\frac{\partial}{\partial \theta} E\left[X^\theta_{\text{protein}}(30)\right], \qquad \theta \approx 1/4.$$

Method       R         95% CI          # updates    CPU time
Likelihood   689,600   −312.1 ± 6.0    2.9 × 10⁹    3,506.6 s
CMC          246,000   −319.3 ± 6.0    2.1 × 10⁹    2,364.8 s
CRP/CRN       25,980   −316.7 ± 6.0    2.2 × 10⁸      270.9 s
CFD            4,580   −319.9 ± 6.0    2.0 × 10⁷       29.2 s

Table: Each finite difference method used ε = 1/40. The exact value is J′(1/4) = −318.073.
Comparison from 5,000 samples each with ε = 1/40
[Figure: estimator variance versus time on $[0, 60]$, three panels: Coupled Finite Differences and Common Reaction Path (variance up to ≈ 60); Crude Monte Carlo (variance up to ≈ 500); Girsanov Transformation (variance up to ≈ 3000).]
Example: genetic toggle switch
$$\emptyset \;\underset{\lambda_2(X)}{\overset{\lambda_1(X)}{\rightleftharpoons}}\; X_1, \qquad \emptyset \;\underset{\lambda_4(X)}{\overset{\lambda_3(X)}{\rightleftharpoons}}\; X_2, \tag{1}$$

with intensity functions

$$\lambda_1(X(t)) = \frac{\alpha_1}{1 + X_2(t)^\beta}, \qquad \lambda_2(X(t)) = X_1(t),$$
$$\lambda_3(X(t)) = \frac{\alpha_2}{1 + X_1(t)^\gamma}, \qquad \lambda_4(X(t)) = X_2(t),$$

and parameter choice

$$\alpha_1 = 50, \quad \alpha_2 = 16, \quad \beta = 2.5, \quad \gamma = 1.$$

- Begin the process with initial condition [0, 0], and
- consider the sensitivity of $X_1$ as a function of $\alpha_1$ (a sketch of plugging this model into the earlier coupled-pair code follows below).
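Assuming the hypothetical `coupled_pair` helper sketched above, the toggle switch plugs in as follows (α1 plays the role of θ):

```python
import numpy as np

alpha2, beta, gamma = 16.0, 2.5, 1.0
zeta = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])  # birth/death of X1 and X2

def intensities(a1, x):
    return np.array([a1 / (1.0 + x[1] ** beta),        # lambda_1
                     x[0],                              # lambda_2
                     alpha2 / (1.0 + x[0] ** gamma),    # lambda_3
                     x[1]])                             # lambda_4

# e.g.: xp, xn = coupled_pair([0, 0], zeta, intensities, theta=50.0, eps=0.1, t_max=40.0)
```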
Example: genetic toggle switch
[Figure (a): variance up to time T = 40, values up to ≈ 0.08.]

Figure: Time plot of the variance of the Coupled Finite Difference estimator versus the Common Reaction Path estimator for the model (1). Each plot was generated using 10,000 sample paths. A perturbation of ε = 1/10 was used.
Are these representations only good for simulation? – LLN and ODEs (Tom Kurtz, ∼1970s)

Suppose $X^N(t) = O(N)$. Denote concentrations via

$$\bar{X}^N(t) = N^{-1} X^N(t) = O(1).$$

Under mild assumptions, we have

$$\lambda_k(X^N(t)) = \lambda_k(N \cdot X^N(t)/N) = N \lambda_k(\bar{X}^N(t)).$$

Then

$$X^N(t) = X^N(0) + \sum_k Y_k\left(\int_0^t \lambda_k(X^N(s))\,ds\right)\zeta_k$$

becomes

$$\bar{X}^N(t) = \bar{X}^N(0) + \sum_k N^{-1} Y_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right)\zeta_k.$$

Using that

$$\lim_{N \to \infty} \sup_{u \le U} \left|N^{-1} Y(Nu) - u\right| = 0,$$

we find that $\bar{X}^N(t)$ converges to the solution of the classical ODE

$$x(t) = x(0) + \sum_k \int_0^t \lambda_k(x(s))\,ds \;\zeta_k \equiv x(0) + \int_0^t F(x(s))\,ds.$$
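A quick numerical check of the law of large numbers being used, $N^{-1} Y(Nu) \to u$ (a sketch; the grid-based supremum is an approximation):

```python
import numpy as np

rng = np.random.default_rng(8)

def scaled_poisson_sup_error(N, U=1.0):
    """Approximate sup_{u <= U} |N^{-1} Y(Nu) - u| for one unit-rate path Y."""
    arrivals = np.cumsum(rng.exponential(1.0, size=int(3 * N * U)))  # points of Y
    grid = np.linspace(0.0, U, 2000)
    Y = np.searchsorted(arrivals, N * grid, side="right")            # Y(Nu) on grid
    return np.abs(Y / N - grid).max()

for N in [10, 100, 1000, 10000]:
    print(N, scaled_poisson_sup_error(N))   # error shrinks, roughly like N**-0.5
```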
Are these representations only good for simulation? – LLN and ODEs.Tom Kurtz ∼ 1970’s
Suppose XN(t) = O(N). Denote concentrations via
X N(t) = N−1XN(t) = O(1).
Under mild assumptions, have
λk (XN(t)) = λk (N · XN(t)/N) = N · λk (X N(t)).
XN(t) = XN(0) +∑
k
Yk
(∫ t
0λk (XN(s))ds
)ξk
becomes
X N(t) = X N(0) +∑
k
N−1Yk
(N∫ t
0λk (XN(s))ds
)ξk
use thatlim
N→∞sup{u≤U}
∣∣N−1Y (Nu)− u∣∣ = 0,
find X N(t) converges to solution of classical ODE
x(t) = x(0) +∑
k
∫ t
0λk (x(s))ds · ξk ≡ x(0) +
∫ t
0F (X (s))ds.
Diffusions? Argument due to Tom KurtzSuppose XN(t) = O(N). Denote concentrations via
X N(t) = N−1XN(t) = O(1).
XN(t) = XN(0) +∑
k
Yk
(∫ t
0λk (XN(s))ds
)ξk
becomes
X N(t) = X N(0) +∑
k
N−1Yk
(N∫ t
0λk (XN(s))ds
)ξk
use that1√N
(Yk (Nu)− Nu) ≈ Wk (u)
find X N(t) well approximated by chemical Langevin process
X (t) = X (0) +∑
k
ζk
∫ t
0λk (X (s))ds +
1√N
∑k
∫ t
0
√λk (X (s))dWk (s).
ordX (t) = F (X (t))dt + N−1/2
∑k
ζk
√λk (X (t))dWk (t).
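A minimal Euler–Maruyama sketch of one step of this Langevin approximation, with λk understood as the scaled (concentration-level) intensities; all inputs are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(9)

def langevin_step(x, zeta, lam_fn, N, dt):
    """One Euler-Maruyama step of
       dX = sum_k zeta_k lam_k(X) dt + N^{-1/2} sum_k zeta_k sqrt(lam_k(X)) dW_k.
       zeta has shape (K, d); lam_fn(x) returns the K intensities."""
    lam = np.maximum(np.asarray(lam_fn(x), dtype=float), 0.0)  # keep sqrt well-defined
    dW = rng.normal(0.0, np.sqrt(dt), size=len(lam))           # Brownian increments
    return x + (zeta.T @ lam) * dt + (zeta.T @ (np.sqrt(lam) * dW)) / np.sqrt(N)
```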
Central limit theorem – Kurtz / Van Kampen

Suppose $X^N(t) = O(N)$. Denote concentrations via $\bar{X}^N(t) = N^{-1} X^N(t) = O(1)$, and let $x(t)$ denote the ODE solution. Let

$$U^N(t) = \sqrt{N}\left(\bar{X}^N(t) - x(t)\right) = \frac{X^N(t) - N x(t)}{\sqrt{N}}.$$

Then, writing $\hat{Y}_k(u) = Y_k(u) - u$ for the centered processes,

$$\begin{aligned}
U^N(t) &= \sqrt{N}\left(N^{-1} \sum_k \zeta_k \hat{Y}_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right)\right) + \sqrt{N} \int_0^t \left(F(\bar{X}^N(s)) - F(x(s))\right) ds \\
&\approx \frac{1}{\sqrt{N}} \sum_k \zeta_k \hat{Y}_k\left(N \int_0^t \lambda_k(\bar{X}^N(s))\,ds\right) + \int_0^t DF(x(s))\,U^N(s)\,ds.
\end{aligned}$$

Using the martingale central limit theorem to show that

$$\frac{1}{\sqrt{N}} \hat{Y}_k(N \cdot) \Rightarrow W_k(\cdot),$$

we get $U^N \Rightarrow U$, where

$$U(t) = \sum_k \zeta_k W_k\left(\int_0^t \lambda_k(x(s))\,ds\right) + \int_0^t DF(x(s))\,U(s)\,ds.$$
Thanks!
References:
1. David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.
2. David F. Anderson and Thomas G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, Springer, 2011, Eds. Heinz Koeppl et al.
Funding: NSF-DMS-1009275.