Introduction to large deviation theory: Theory, applications, simulations
Hugo Touchette
School of Mathematical Sciences, Queen Mary, University of London
3rd International Summer School on Modern Computational Science: Simulation of Rare and Extreme Events
Oldenburg, Germany, 15-26 August 2011
Hugo Touchette (QMUL) Large deviations August 2011 1 / 74
Outline
Themes
Random variables / stochastic systems
Most probable values / typical states
Fluctuations around these states
Small vs large deviations / fluctuations / rare events
1 Large deviation theory
2 Applications
3 Simulations I
4 Simulations II
Lecture notes: arxiv:1106.4146
H. Touchette, The large deviation approach to statistical mechanics, Physics Reports 478, 1-69, 2009. arxiv:0804.0327
Example: Sum of Gaussian random variables
Sn = (1/n) ∑_{i=1}^n Xi,   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
[Figure: sample sequences of Xi and of the running mean Sn versus n]
Basic observations
Sn → µ in probability
Fluctuations ∼ 1/√n → 0
Sum of Gaussian random variables (cont’d)
Probability density function (pdf) of Sn:
p(Sn = s) = √(n/(2πσ²)) e^{−n(s−µ)²/(2σ²)}
Variance: var(Sn) = σ²/n → 0
Dominant part:
p(Sn = s) ≈ e^{−nI(s)}
Rate function:
I(s) = (s − µ)²/(2σ²)
[Figure: p(Sn = s) and I(s) for n = 10, 100, 500, centered at µ]
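Since the pdf of the Gaussian sample mean is known exactly (Sn is itself Gaussian with variance σ²/n), the dominant-part approximation can be checked numerically: the √n prefactor is o(n) and drops out of −(1/n) ln p. A minimal sketch with numpy and illustrative parameters:

```python
import numpy as np

# Exact log-pdf of S_n (Gaussian with mean mu, variance sigma^2/n)
# versus the rate function I(s) = (s - mu)^2 / (2 sigma^2).
mu, sigma, s = 1.0, 1.0, 2.0

def log_pdf(s, n):
    return 0.5 * np.log(n / (2 * np.pi * sigma**2)) - n * (s - mu)**2 / (2 * sigma**2)

rate = (s - mu)**2 / (2 * sigma**2)
# -(1/n) ln p(S_n = s) should converge to I(s); the prefactor contributes O(ln(n)/n).
errors = [abs(-log_pdf(s, n) / n - rate) for n in (10, 100, 1000)]
print(errors)  # decreasing toward 0
```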
Exercise: Calculation of p(Sn) via generating functions [Exercise 2.7.1]
Example: Sum of exponential random variables
Sn = (1/n) ∑_{i=1}^n Xi,   p(x) = (1/µ) e^{−x/µ}
[Figure: sample sequences of Xi and of the running mean Sn versus n]
Large deviation probability:
p(Sn = s) ≈ e^{−nI(s)}
Rate function:
I(s) = s/µ − 1 − ln(s/µ)
[Figure: p(Sn = s) and I(s) for n = 10, 100, 500, centered at µ]
Example: Random bits
Sequence of bits:
ω = 010011100101 (n bits),   P(0) = p0,   P(1) = p1 = 1 − p0
Empirical vector:
Ln,0(ω) = (# 0's in ω)/n,   Ln,1(ω) = (# 1's in ω)/n,   Ln = (Ln,0, Ln,1)
Example:
ω = 0001101001 (n = 10),   L10,0 = 6/10 = 3/5,   L10,1 = 4/10 = 2/5
Random bits (cont’d)
Probability of a sequence:
P(ω) = p0^{n Ln,0} p1^{n Ln,1}
Probability of Ln:
P(Ln = µ) = P(Ln,0 = µ0, Ln,1 = µ1) = n!/((nµ0)! (nµ1)!) p0^{nµ0} p1^{nµ1}
[Exercise 2.7.5]
Large deviation probability
P(Ln = µ) ≈ e^{−nD(µ)},   D(µ) = µ0 ln(µ0/p0) + µ1 ln(µ1/p1)
D = relative entropy
Zero of D: µ = (p0, p1)
Ln → (p0, p1) in probability
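The large deviation approximation above can be checked against the exact binomial probability of seeing n·µ0 zeros; a small sketch with the standard library (p0 = 0.3 is an illustrative choice):

```python
import math

# Check P(L_n = mu) ≈ e^{-n D(mu)} for random bits:
# -(1/n) ln P converges to the relative entropy D(mu).
p0 = 0.3
p1 = 1 - p0

def D(mu0):
    mu1 = 1 - mu0
    return mu0 * math.log(mu0 / p0) + mu1 * math.log(mu1 / p1)

def log_prob(mu0, n):
    k = round(n * mu0)                      # number of 0's
    return math.log(math.comb(n, k)) + k * math.log(p0) + (n - k) * math.log(p1)

gaps = [abs(-log_prob(0.5, n) / n - D(0.5)) for n in (10, 100, 1000)]
print(gaps)  # shrinking with n: the combinatorial prefactor is only subexponential
```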
Example: Spin system
Spin chain (configuration):
ω = ω1, ω2, . . . , ωn (n spins),   ωi ∈ {−1, +1}
Mean magnetization:
Mn = (1/n) ∑_{i=1}^n ωi
Density of states:
Ω(m) = # configurations ω with Mn = m
Large deviation form
Ω(m) ≈ e^{n s(m)},   s(m) = −((1 − m)/2) ln((1 − m)/2) − ((1 + m)/2) ln((1 + m)/2)
Large deviation theory
Random variable: An
Probability density: p(An = a)
Large deviation principle (LDP)
p(An = a) ≈ e−nI (a)
Meaning of ≈: ln p(a) = −nI(a) + o(n), i.e.
lim_{n→∞} −(1/n) ln p(a) = I(a)
Rate function: I (a) ≥ 0
Goals of large deviation theory
1 Prove that a large deviation principle exists
2 Calculate the rate function
Varadhan's Theorem
Exponential average:
⟨e^{nf(An)}⟩ = ∫ p(An = a) e^{nf(a)} da
Assume LDP for An:
p(An = a) ≈ e−nI (a)
Theorem: Varadhan (1966)
λ(f) = lim_{n→∞} (1/n) log⟨e^{nf(An)}⟩ = max_a {f(a) − I(a)}
Special case: f(a) = ka
λ(k) = max_a {ka − I(a)}
Heuristic derivation of Varadhan’s result
Gärtner-Ellis Theorem
Scaled cumulant generating function (SCGF)
λ(k) = lim_{n→∞} (1/n) ln⟨e^{nkAn}⟩,   k ∈ ℝ
Theorem: Gärtner (1977), Ellis (1984)
If λ(k) is differentiable, then
1 Existence of LDP:
p(An = a) ≈ e−nI (a)
2 Rate function:
I(a) = max_k {ka − λ(k)}
I (a) = Legendre transform of λ(k)
I (a) convex in this case
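The Legendre transform in the theorem is easy to carry out numerically on a grid of k values. A minimal sketch with numpy, using the Gaussian SCGF λ(k) = µk + σ²k²/2 (taken from the Gaussian example later in the deck) because its transform is known in closed form:

```python
import numpy as np

# Numerical Legendre transform I(a) = max_k [k a - lambda(k)], checked against
# the closed form I(a) = (a - mu)^2 / (2 sigma^2) for the Gaussian SCGF.
mu, sigma = 0.0, 1.0
k = np.linspace(-10, 10, 20001)            # k-grid, step 0.001
lam = mu * k + 0.5 * sigma**2 * k**2

a_vals = np.linspace(-2.0, 2.0, 9)
I_num = np.array([np.max(k * a - lam) for a in a_vals])
I_exact = (a_vals - mu)**2 / (2 * sigma**2)
err = float(np.max(np.abs(I_num - I_exact)))
print(err)  # only grid-discretization error
```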
Exercise: Legendre transforms [Exercise 2.7.8]
Contraction principle
LDP An −→ Bn LDP?
LDP for An: p(An = a) ≈ e^{−nIA(a)}
Contraction: Bn = f(An)
Probability for Bn:
p(Bn = b) = ∫_{f⁻¹(b)} p(An = a) da
Contraction principle
LDP for Bn: p(Bn = b) ≈ e^{−nIB(b)}
Rate function:
IB(b) = min_{a: f(a)=b} IA(a) = min_{f⁻¹(b)} IA(a)
Sums of IID random variables
Random variable:
Sn = (1/n) ∑_{i=1}^n Xi,   Xi ∼ p(x), IID
SCGF:
λ(k) = lim_{n→∞} (1/n) ln⟨e^{nkSn}⟩ = lim_{n→∞} (1/n) ln⟨∏_{i=1}^n e^{kXi}⟩ = ln⟨e^{kX}⟩
Gärtner-Ellis Theorem
I(s) = k(s)s − λ(k(s)),   λ′(k(s)) = s
I(s) is convex
Zero of I(s) at ⟨X⟩
Originally proved by Cramér (1938)
Example: Gaussian random variables [Exercise 3.6.1]
Sn = (1/n) ∑_{i=1}^n Xi,   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
Generating function:
⟨e^{kX}⟩ = ∫_{−∞}^{∞} e^{kx} (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx = e^{kµ+σ²k²/2}
Log-generating function:
λ(k) = ln⟨e^{kX}⟩ = kµ + (σ²/2)k²
Rate function:
I(s) = k(s)s − λ(k(s)) = (s − µ)²/(2σ²)
[Figure: p(Sn = s) and I(s) for n = 10, 100, 500, centered at µ]
Example: Exponential random variables* [Exercise 3.6.1]
Sn = (1/n) ∑_{i=1}^n Xi,   p(Xi = x) = (1/µ) e^{−x/µ},   x > 0
Log-generating function:
λ(k) = −ln(1 − µk),   k < 1/µ
Rate function:
I(s) = s/µ − 1 − ln(s/µ),   s > 0
[Figure: p(Sn = s) and I(s) for n = 10, 100, 500, centered at µ]
Example: Bernoulli sample mean [Exercise 3.6.1]
Sn = (1/n) ∑_{i=1}^n Xi,   Xi ∈ {0, 1},   P(Xi = 1) = α,   P(Xi = 0) = 1 − α
Discrete RV:
Sn ∈ {0, 1/n, 2/n, . . . , (n − 1)/n, 1}
Discrete probability distribution: P(Sn = s)
Values of Sn "fill" unit interval [0, 1] as n → ∞
Continuous-limit probability density:
p(Sn = s) = P(Sn ∈ [s, s + ∆s]) / ∆s
Smoothed pdf (weak convergence)
Bernoulli sample mean (cont’d)
[Figure: discrete distribution P(Sn = s), smoothed pdf p(Sn = s), and finite-n rate functions In(s) for n = 5, 25, 50, 100]
LDP: p(Sn = s) ≈ e^{−nI(s)}
Rate function:
I(s) = s ln(s/α) + (1 − s) ln((1 − s)/(1 − α))
Example: Cauchy random variables* [Exercise 3.6.1]
Sn = (1/n) ∑_{i=1}^n Xi,   p(Xi = x) = (1/π) 1/(x² + 1),   x ∈ ℝ
Generating function:
λ(k) = 0 for k = 0,   ∞ for k ≠ 0
No large deviations
I(s) = 0
P(Sn = s) has power-law tails (not exponential)
Sanov's Theorem [Exercise 3.6.8]
n IID random variables: ω = ω1, ω2, . . . , ωn
Empirical frequencies:
Ln,j = (1/n) ∑_{i=1}^n δ_{ωi,j} = #(ωi = j)/n
Empirical vector: Ln = (Ln,1, Ln,2, . . . , Ln,q)
SCGF:
λ(k) = ln ∑_{j=1}^q pj e^{kj}
Gärtner-Ellis Theorem
LDP: P(Ln = µ) ≈ e^{−nD(µ)}
Rate function: D(µ) = k(µ) · µ − λ(k(µ)) = ∑_{j=1}^q µj ln(µj/pj)
Markov processes
Donsker and Varadhan (1975)
Markov chain:
ω = ω1, ω2, . . . , ωn,   p(ω) = ρ(ω1) π(ω2|ω1) · · · π(ωn|ωn−1)
Additive process:
Sn = (1/n) ∑_{i=1}^n f(ωi)
Gärtner-Ellis Theorem
Tilted transition matrix: πk(ωn|ωn−1) = π(ωn|ωn−1) e^{kf(ωn)}
Dominant eigenvalue: ζ(πk)
SCGF: λ(k) = ln ζ(πk)
LDP: p(Sn = s) ≈ e^{−nI(s)},   I(s) = max_k {ks − λ(k)}
Exercise: SCGF for Markov chains
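A possible starting point for this exercise: build the tilted matrix for a two-state chain and take the log of its dominant eigenvalue. A minimal numpy sketch (the chain, the choice f(ω) = ω, and α = 0.25 are illustrative):

```python
import numpy as np

# SCGF of S_n = (1/n) sum f(omega_i) for a two-state Markov chain, computed as
# lambda(k) = ln zeta(pi_k), the log dominant eigenvalue of the tilted matrix
# pi_k(omega'|omega) = pi(omega'|omega) e^{k f(omega')}, here with f(omega) = omega.
alpha = 0.25
P = np.array([[1 - alpha, alpha],        # row omega, column omega': pi(omega'|omega)
              [alpha, 1 - alpha]])

def scgf(k, P):
    f = np.array([0.0, 1.0])
    tilted = P * np.exp(k * f)[None, :]  # multiply column omega' by e^{k f(omega')}
    return float(np.log(np.max(np.linalg.eigvals(tilted).real)))

# k = 0: the untilted matrix is stochastic, dominant eigenvalue 1, so lambda(0) = 0.
print(scgf(0.0, P))

# Sanity check against the IID limit alpha = 1/2, where lambda(k) = ln((1 + e^k)/2):
P_iid = np.full((2, 2), 0.5)
gap = abs(scgf(1.3, P_iid) - np.log((1 + np.exp(1.3)) / 2))
print(gap)  # ~0
```

The dominant eigenvalue is the same whichever row/column convention is used for the transition matrix, since transposing a matrix does not change its spectrum.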
Example: Binary Markov chain* [Exercise 3.6.10]
Sample mean:
Rn = (1/n) ∑_{i=1}^n ωi,   ωi ∈ {0, 1}
Transition matrix:
Π = ( π(0|0) π(0|1) ; π(1|0) π(1|1) ) = ( 1 − α  α ; α  1 − α ),   α ∈ (0, 1)
Rate function:
I(r) ≈ I0(r) + 2(1 − 2r)²ε + (2 − 32r² + 64r³ − 32r⁴)ε²,   ε = 1/2 − α
[Figure: (a) transition graph of the chain; (b) λ(k); (c) I(r), for α = 0.25, 0.5, 0.75]
General properties
Most probable value = min and zero of I
Zero of I = Law of Large Numbers
Parabolic minimum = Central Limit Theorem
[Figure: pn(a) and I(a), convex and nonconvex cases]
I(a) ≠ max_k {ka − λ(k)} when I is nonconvex
[Exercise 3.6.2]
Summary
Large deviation principle
p(An = a) ≈ e^{−nI(a)}
Valid for uncorrelated and correlated processes
Exponential term is dominant
Describes small and large fluctuations
Most probable value: min of I (a)
Law of large numbers: min (zero) of I (a)
Central limit theorem: Parabolic minimum of I(a)
Methods for obtaining I(a):
Gärtner-Ellis Theorem
Contraction principle
Exercises
2.7.1 – 2.7.10
3.6.1 – 3.6.10
Continuous-time processes
Stochastic process: x(t)
Path pdf (a functional):
p[x] = p({x(t)}_{t=0}^T) = Probability of path x(t)
Observable: AT[x]
Observable distribution:
p(AT = a) = ∫_{x(t): AT[x]=a} D[x] p[x]
Problems
Find p(AT = a)
Find most probable value of AT
Determine fluctuations around steady state
Scaling limits:
Long-time: T → ∞
Low noise
Low-noise limit of stochastic differential equations
Dynamical system: ẋ(t) = F(x(t))
Perturbed dynamics:
Ẋ(t) = F(X(t)) + √ε ξ(t)   (perturbation)
Low-noise limit: ε → 0
Gaussian white noise:
⟨ξ(t)⟩ = 0,   ⟨ξ(t)ξ(t′)⟩ = δ(t − t′)
[Figure: unperturbed path x(t) and noisy path X(t) started from x0]
Exercise: Simulating SDEs
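A standard way into this exercise is the Euler-Maruyama scheme, X_{i+1} = X_i + F(X_i)∆t + √(ε∆t) ξ_i with ξ_i standard normal. A minimal numpy sketch with an assumed Ornstein-Uhlenbeck drift F(x) = −γx (all parameter values are illustrative, not from the slides):

```python
import numpy as np

# Euler-Maruyama integration of dX = F(X) dt + sqrt(eps) dW with F(x) = -gamma*x.
rng = np.random.default_rng(0)
gamma, eps, dt, T = 1.0, 0.1, 0.01, 500.0
n_steps = int(T / dt)

x = np.empty(n_steps + 1)
x[0] = 1.0
noise = np.sqrt(eps * dt) * rng.standard_normal(n_steps)
for i in range(n_steps):
    # X_{i+1} = X_i + F(X_i) dt + sqrt(eps dt) xi_i
    x[i + 1] = x[i] - gamma * x[i] * dt + noise[i]

# For this drift the stationary variance is eps / (2 gamma) = 0.05;
# compare the long-run empirical variance after a burn-in.
v_est = float(np.var(x[5000:]))
print(v_est)
```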
LDP for the random paths
Functional LDP
p[x] ≈ e^{−J[x]/ε},   J[x] = (1/2) ∫_0^T [ẋ(t) − F(x(t))]² dt   (integrand: Lagrangian; integral: action)
Low-noise limit: ε → 0
Derived in maths by Freidlin and Wentzell (1984)
Derived in physics by Onsager and Machlup (1953)
Zero of J[x] = most probable dynamics = unperturbed dynamics:
ẋ* = F(x*)
Gaussian fluctuations around x*(t)
Exercise: Derivation of the action
Other LDPs by contraction
Transition probability:
p(x, T|x0) ≈ e^{−V(x,T|x0)/ε},   V(x, T|x0) = min_{x(t): x(0)=x0, x(T)=x} J[x]
Stationary distribution:
p(x) ≈ e^{−V(x)/ε},   V(x) = min_{x(t): x(−∞)=0, x(0)=x} J[x]
Exit time:
τε ≈ e^{V*/ε} in probability,   V* = min_{x∈∂D} min_{t≥0} V(x, t|xs)
[Figure: fluctuation paths from x0 to x, and exit from domain D through boundary ∂D]
Example: Ornstein-Uhlenbeck process [Exercise 3.6.14]
System: ẋ(t) = −γx(t) + √ε ξ(t)
Stationary distribution:
p(x) ≈ e^{−V(x)/ε},   V(x) = min_{x(t): x(0)=x0, x(∞)=x} I[x]
Euler-Lagrange equation:
δI[x*] = 0 ⇐⇒ (d/dt) ∂L/∂ẋ − ∂L/∂x = 0,   L = (1/2)(ẋ + γx)²
Solution: V(x) = I[x*] = γx²
General result
ẋ = −U′(x) + √ε ξ(t) =⇒ V(x) = 2U(x)
Example: Noisy Van der Pol oscillator* [Exercise 3.6.17]
Equations of motion:
ẋ = v
v̇ = −x + v(α − x² − v²) + √ε ξ(t)
Coupled Langevin equations
Bifurcation: Stable fixed point (α < 0) to stable limit cycle (α > 0)
[Figure: sample trajectories x(t) and phase portrait in the (x, v) plane]
Stationary distribution: p(r, θ) ≈ e^{−W(r)/ε}
Noisy Van der Pol oscillator (cont'd)
Solution:
W(r) = −αr² + r⁴/2
[Figure: W(r) surfaces and contour plots of the stationary distribution across the bifurcation]
Long-time limit of additive observables
Stochastic process: x(t)
Observable:
AT[x] = (1/T) ∫_0^T f(x(t)) dt
Gärtner-Ellis
SCGF:
λ(k) = lim_{T→∞} (1/T) ln⟨e^{TkAT}⟩,   ⟨e^{TkAT}⟩ = ∫ D[x] p[x] e^{TkAT[x]}
Rate function: I(a) = max_k {ka − λ(k)}
Donsker-Varadhan
Generator: L
Tilted generator: Lk = L + kf
SCGF: λ(k) = ζ(Lk)
Example: Pulled Brownian particle [Exercise 3.6.18]
Langevin dynamics:
m ẍ(t) = −α ẋ(t) (drag) − k[x(t) − vt] (spring force) + ξ(t) (noise)
Work:
WT = (1/T) ∫_0^T F(t) v dt = ∆U + QT
[Figure: particle pulled at speed v by a spring; potential U and heat Q]
Large deviation principle
SCGF:
λ(k) = lim_{T→∞} (1/T) ln⟨e^{TkWT}⟩ = ck(1 + k),   c = v²
Rate function:
I(w) = max_k {kw − λ(k)} = (w − c)²/(4c)
Summary
Path LDPs
p[x] ≈ e^{−I[x]/ε},   p(AT[x] = a) ≈ e^{−TI(a)}
LDPs for time-evolving or steady-state processes
Describes most probable state (or trajectory)
Describes fluctuations around most probable states
Most probable state = zero of I = min of I
Most probable state given by variational principle
Connection with classical mechanics
δI[x*] = 0 =⇒ Euler-Lagrange equation / Hamilton equations / Hamilton-Jacobi equation
Exercises
3.6.11 – 3.6.18
Other applications
Equilibrium statistical mechanics
Multifractals
Chaotic systems (thermodynamic formalism)
Disordered systems
Random walks in random environments
Spin glasses
Quenched/annealed large deviations
Nonequilibrium systems
Interacting particle models
Zero-range process
Exclusion process
Current, density profile
Fluctuation relations
Space/time large deviations
Equilibrium many-particle systems
N particles
Microstate: ω = ω1, ω2, . . . , ωN
Space of one particle: ωi ∈ Λ
Space of N particles: ΛN = Λ^N
Probability distribution on ΛN : P(ω)
Macrostate: MN(ω)
Probability distribution for MN:
P(MN = m) = ∑_{ω: MN(ω)=m} P(ω)
Problems
Find P(MN = m)
Find most probable values of MN (= equilibrium states)
Study fluctuations around most probable value
Consider thermodynamic limit N → ∞
Thermodynamic LDPs
Ensembles
Microcanonical
Pu(MN = m) ≈ e^{−NIu(m)}
Canonical
Pβ(MN = m) ≈ e^{−NIβ(m)}
Generalize and refine Einstein’s theory of fluctuations
Equilibrium and fluctuation properties given by rate function
Equilibrium states = min of I (m) = zero of I (m)
Equilibrium states given by variational principles
Entropy
P(UN = u) ≈ e^{Ns(u)},   s(u) = min_β {βu − ϕ(β)}
Legendre transform of thermo = Legendre transform of LDT
Legendre transform is valid only if s(u) is concave
Problem
Sequence of RVs: ω = X1,X2, . . . ,Xn
Observable (RV): Sn(ω)
Probability density:
p(Sn = s) = P(Sn ∈ [s, s + ∆s]) / ∆s = P(Sn ∈ ∆s) / ∆s
Goals
1 Sample Sn
2 Estimate p(Sn = s)
3 Test existence of LDP
4 Estimate rate function I (s)
Limits: n → ∞, ∆s → 0
Continuous-time problem: Time discretization
{X(t)}_{t=0}^τ −→ {Xi}_{i=1}^n,   Xi = X(i∆t)
Direct estimators
Configuration sample:
ω(1), ω(2), . . . , ω(L)
L copies or realizations of ω
Prior pdf: p(ω)
Observable sample:
s(1), s(2), . . . , s(L), s(j) = Sn(ω(j))
Estimator of p(Sn = s):
p̂(s) = P̂(∆s)/∆s = (1/(L∆s)) ∑_{j=1}^L 1_{∆s}(s(j))
Empirical pdf
Unbiased estimator [Exercise 4.7.3]
Exercise: What is an empirical pdf?
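One concrete answer to this exercise: the empirical pdf is just a normalized histogram, counts divided by L∆s so that it integrates to one. A minimal numpy sketch with illustrative Gaussian data:

```python
import numpy as np

# Empirical pdf: bin the sample and divide counts by L * Delta_s.
# (numpy's density=True option does the same bookkeeping.)
rng = np.random.default_rng(1)
L = 10_000
sample = rng.normal(size=L)

ds = 0.1
edges = np.arange(-5.0, 5.0 + ds, ds)
counts, _ = np.histogram(sample, bins=edges)
p_hat = counts / (L * ds)          # (1 / (L ds)) sum_j 1_{ds}(s_j)

total = float(np.sum(p_hat) * ds)
print(total)  # = 1 up to any samples falling outside the grid
```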
Direct estimators (cont'd)
Estimator of rate function:
Î(s) = −(1/n) ln p̂(s)
1 Estimate p̂(s) for fixed L and n
2 Estimate Î(s) for fixed L and n
3 Repeat for increasing values of n
[Figure: empirical pdfs p̂(s) and rate-function estimates Î(s) for increasing n]
Use large enough L to get good statistics
Problem with direct sampling
p(Sn = s) ≈ e^{−nI(s)}
Event Sn = s (or Sn ∈ ∆s) is exponentially rare
Choose L ∼ e^{nI(s)} to see this event
L exponential with n
Solution: Importance sampling
Sample ω with another pdf q(ω)
Choose q(ω) to make Sn = s probable
Correct estimation of p(Sn = s) according to q(ω) chosen
Example: Gaussian sample mean [Exercise 4.7.2]
Sn = (1/n) ∑_{i=1}^n Xi,   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
1 Generate x1, x2, . . . , xn ∼ Gaussian variates
2 Compute Sn
3 Repeat L times to obtain sample s(1), s(2), . . . , s(L)
4 Compute p̂(s) of sample
5 Compute Î(s)
6 Repeat for different n and L
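The steps above can be sketched in a few lines of numpy; parameter values are illustrative. Note how only values of s near the minimum of I are ever visited, which is the sampling problem discussed next:

```python
import numpy as np

# Direct (crude Monte Carlo) estimation of the rate function of the Gaussian
# sample mean: draw S_n L times, build the empirical pdf, then
# I_hat(s) = -(1/n) ln p_hat(s).
rng = np.random.default_rng(2)
mu, sigma, n, L = 0.0, 1.0, 20, 100_000

s_sample = rng.normal(mu, sigma, size=(L, n)).mean(axis=1)   # L realizations of S_n

ds = 0.05
edges = np.arange(-1.0, 1.0 + ds, ds)
counts, _ = np.histogram(s_sample, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
seen = counts > 0                                   # empty bins give no estimate
I_hat = -np.log(counts[seen] / (L * ds)) / n
I_exact = (centers[seen] - mu) ** 2 / (2 * sigma**2)

# Agreement near the minimum, up to an O(ln(n)/n) prefactor correction; values of s
# with I(s) much beyond (ln L)/n are never sampled at all.
core = np.abs(centers[seen]) < 0.5
err = float(np.max(np.abs(I_hat[core] - I_exact[core])))
print(err)
```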
Gaussian sample mean (cont’d)
[Figure: estimated rate functions Î_{n,L}(s): left, n = 10 with L = 100 to 100000; right, L = 10000 with n = 2, 5, 10, 20]
Increase L to sample tails
Increase L to smooth results (smaller error bars) [Exercise 4.7.4] [Exercise 4.7.5]
Increase L for increasing n
Choose L ∼ e^n
Importance sampling
Original sampling pdf: p(ω)
New sampling pdf: q(ω)
New estimator
q̂(s) = (1/(L∆s)) ∑_{j=1}^L 1_{∆s}(Sn(ω(j))) R(ω(j))
Importance sampling or likelihood ratio: R(ω) = p(ω)/q(ω)
q̂(s) is unbiased:
⟨q̂(s)⟩_q = ⟨p̂(s)⟩_p = p(Sn = s)
q̂(s) may have smaller variance than p̂(s)
Choose q such that var_q(q̂) < var_p(p̂)
Exercise: Derivation of new estimator
Exponential change of measure
Original sampling pdf: p(ω)
Exponentially tilted pdf:
pk(ω) = (e^{nkSn(ω)} / Wn(k)) p(ω),   k ∈ ℝ
Generating function:
Wn(k) = ⟨e^{nkSn}⟩_p = ∫ e^{nkSn(ω)} p(ω) dω
Likelihood ratio:
R(ω) = p(ω)/pk(ω) = e^{−nkSn(ω)} Wn(k) ≈ e^{−n[kSn(ω)−λ(k)]}
pk(ω) is efficient
Zero-variance estimator as n → ∞
How to choose k?
Example: Gaussian sample mean [Exercise 4.7.9]
Sn = (1/n) ∑_{i=1}^n Xi,   p(ω) = ∏_{i=1}^n p(Xi),   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
Choose k ∈ ℝ
Generate variates x1, x2, . . . , xn according to tilted pdf:
pk(xi) = e^{kxi} p(xi)/W(k) = (1/√(2πσ²)) e^{−(xi−µ−σ²k)²/(2σ²)}
Compute Sn
Repeat L times to obtain sample s(1), s(2), . . . , s(L)
Compute q̂(s)
Compute Î(s)
Repeat for k ∈ [kmin, kmax]
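For the Gaussian case the tilt is explicit (it shifts the mean to µ + σ²k and Wn(k) = e^{nλ(k)} is known), so the whole importance sampling loop fits in a short numpy sketch; k = 3 and the other parameters are illustrative:

```python
import numpy as np

# Importance sampling of the Gaussian sample mean with the exponential tilt:
# sample from p_k (mean shifted to mu + sigma^2 k) and reweight each realization
# by the likelihood ratio R = e^{-n k S_n} W_n(k), W_n(k) = e^{n lambda(k)}.
rng = np.random.default_rng(3)
mu, sigma, n, L = 0.0, 1.0, 50, 100_000
k = 3.0                                    # targets s = lambda'(k) = mu + sigma^2 k = 3
lam = mu * k + 0.5 * sigma**2 * k**2

s_sample = rng.normal(mu + sigma**2 * k, sigma, size=(L, n)).mean(axis=1)
weights = np.exp(-n * k * s_sample + n * lam)      # R(omega); no underflow in float64 here

ds = 0.01
in_bin = np.abs(s_sample - 3.0) < ds / 2
p_hat = np.sum(weights[in_bin]) / (L * ds)
I_hat = float(-np.log(p_hat) / n)
print(I_hat)  # near I(3) = 4.5: a probability of order e^{-225}, hopeless for direct sampling
```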
Gaussian sample mean (cont’d)
[Figure: rate-function estimates Î_{n,L}(s) from tilted sampling, L = 100 and L = 1000, n = 5 and n = 10]
pk is efficient to recover I (s) at s = λ′(k)
Scan k ∈ [kmin, kmax] to obtain desired range s ∈ [smin, smax]
Non-parametric: choose k such that s = λ′(k) to obtain I at s
Gaussian sample mean (cont’d)
SCGF:
λ(k) = µk + (1/2)σ²k²
Concentration point:
λ′(k) = µ + σ²k = s ⇒ k(s) = (s − µ)/σ²
Explicit form of tilted pdf:
p_{k(s)}(xi) = (1/√(2πσ²)) e^{−(xi−s)²/(2σ²)}
Importance sampling interpretation
Sn = s large deviation under p(ω)
↓
Sn = s typical event under p_{k(s)}(ω)
Metropolis (Monte Carlo) sampling
Problem
Tilted pdf pk involves Wn(k)
Tilted pdf pk assumes knowledge of SCGF λ(k)
Knowledge of SCGF ⇒ I (s)
Solutions
1 Use other types of estimators
2 Use Metropolis (Monte Carlo) sampling with pk
Based on pk(ω)/pk(ω′)
Free of Wn(k) and λ(k)
[Exercise 4.7.11]
Exercise: Illustration of Metropolis sampling
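One way to set up this illustration: sample pk(ω) ∝ e^{nkSn(ω)} p(ω) with Metropolis moves that refresh one coordinate from the prior p, so the acceptance ratio collapses to e^{k(x′ − xi)} and neither Wn(k) nor λ(k) ever appears. A minimal numpy sketch for the Gaussian case (parameters illustrative), where the tilted mean µ + σ²k is known exactly and can serve as a check:

```python
import numpy as np

# Metropolis sampling of p_k(omega) ∝ e^{n k S_n(omega)} p(omega) using only ratios.
rng = np.random.default_rng(4)
mu, sigma, n, k = 0.0, 1.0, 20, 1.5

x = rng.normal(mu, sigma, size=n)                 # initial configuration from p
trace = []
for step in range(200_000):
    i = rng.integers(n)
    prop = rng.normal(mu, sigma)                  # propose x_i' from the prior p
    # acceptance ratio p_k(omega')/p_k(omega) with proposal q = p:
    # the factors p and W_n(k) cancel, leaving e^{k (x' - x_i)}
    if np.log(rng.random()) < k * (prop - x[i]):
        x[i] = prop
    trace.append(x.mean())

s_mean = float(np.mean(trace[50_000:]))           # discard burn-in
print(s_mean)  # near mu + sigma^2 k = 1.5, the concentration point s = lambda'(k)
```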
Application to SDEs
Path pdf form
Original path pdf: p[x ]
Tilted pdf:
pk[x] = e^{TkST[x]} p[x] / WT(k) ≈ e^{−TIk[x]},   WT(k) = ⟨e^{TkST}⟩_p
Transition probability form
Original transition matrix: Π(∆t)
Tilted transition matrix:
Πk(∆t) = e^{k∆t x′} Π(∆t) / e^{λ(k)∆t} = e^{[kx′ + G − λ(k)]∆t}
Generator form
Original generator: G
Tilted generator: Gk = G + kx′ − λ(k)
Example: Gaussian additive process
SDE: ẋ(t) = ξ(t)
Pure Brownian motion without drift
Observable:
DT[x] = (1/T) ∫_0^T ẋ(t) dt = x(T)/T
Intuitive observations
Typical state: DT = 0
Modified process with typical state DT = d:
ẋ(t) = d + ξ(t)
Effective dynamics for large deviation
Explicit result:
I_{k(d)}[x] = I[x] − d DT[x] + d²/2 = (1/(2T)) ∫_0^T (ẋ − d)² dt
Exercise: Simulation of SDEs
Summary
Direct sampling of Sn with p(ω) inefficient
Requires exponential sample size L
Change sampling distribution to q(ω)
Make rare event under p more probable under q
Possible change of measure: Exponential measure
Exponential sampling efficient
Subexponential sample size
Structure of exponential measure = structure of LDT
Combine exponential sampling with Metropolis sampling
Other methods?
Sample mean method
Sample λ(k) instead of p(Sn = s)
Sampling with p(ω):
Sn → λ′(0) in probability
Sampling with pk(ω):
Sn → λ′(k) in probability
Estimator of Sn:
ŝ(k) = (1/L) ∑_{j=1}^L Sn(ω(j))
Estimator of λ(k):
λ̂(k) = ∫_0^k ŝ(k′) dk′
Compute I(s) by Legendre transform
No need to estimate p
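The whole pipeline (scan k, average Sn under the tilted pdf, integrate, Legendre-transform) can be sketched for the Gaussian case, where sampling Sn under pk just means shifting its mean to µ + σ²k; parameters are illustrative:

```python
import numpy as np

# Sample mean method: estimate s_hat(k) on a k-grid, integrate to lambda_hat(k)
# by the trapezoidal rule, then take the Legendre transform. No empirical pdf.
rng = np.random.default_rng(5)
mu, sigma, n, L = 0.0, 1.0, 50, 2_000

ks = np.linspace(0.0, 2.0, 41)
s_hat = np.array([rng.normal(mu + sigma**2 * k, sigma / np.sqrt(n), size=L).mean()
                  for k in ks])                  # s_hat(k) = sample mean of S_n under p_k
# lambda_hat(k) = integral_0^k s_hat(k') dk'
lam_hat = np.concatenate(([0.0], np.cumsum(0.5 * (s_hat[1:] + s_hat[:-1]) * np.diff(ks))))

I_hat = float(np.max(ks * 1.0 - lam_hat))        # Legendre transform at s = 1
print(I_hat)  # near (1 - mu)^2 / (2 sigma^2) = 0.5
```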
Example: Gaussian sample mean [Exercise 5.4.1]
Sn = (1/n) ∑_{i=1}^n Xi,   p(ω) = ∏_{i=1}^n p(Xi),   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
Choose k ∈ ℝ
Generate variates x1, x2, . . . , xL according to tilted pdf:
pk(xi) = e^{kxi} p(xi)/W(k) = (1/√(2πσ²)) e^{−(xi−µ−σ²k)²/(2σ²)}
Compute s for L large
Repeat for different k
Integrate results to obtain λ(k)
Obtain I by Legendre transform
Gaussian sample mean (cont’d)
[Figure: ŝ_L(k), λ̂_L(k), and Î_L(s) for L = 5, 50, 100]
No empirical pdf calculated
Non-convex artefacts for small L (λ̂(k) not convex)
Empirical generating function
IID sample mean: λ(k) = ln E[e^{kX}]
Estimator for λ(k):
λ̂(k) = ln (1/L) ∑_{j=1}^L e^{kX(j)}
Alternative estimator:
ŝ(k) = λ̂′(k) = ∑_{j=1}^L X(j) e^{kX(j)} / ∑_{j=1}^L e^{kX(j)}
Markov chain:
X1 + · · · + Xb (= Y1) + Xb+1 + · · · + X2b (= Y2) + · · · + Xn−b+1 + · · · + Xn (= Ym)
Markov estimator:
λ̂(k) = (1/b) ln (1/L) ∑_{j=1}^L e^{kY(j)},   m = n/b
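The IID estimator is one line of numpy; a sketch with an illustrative Gaussian sample, compared against the exact SCGF λ(k) = µk + σ²k²/2:

```python
import numpy as np

# Empirical SCGF for IID data: lambda_hat(k) = ln (1/L) sum_j e^{k X(j)}.
rng = np.random.default_rng(6)
mu, sigma, L = 0.0, 1.0, 1_000_000
x = rng.normal(mu, sigma, size=L)

lam_hat = {k: float(np.log(np.mean(np.exp(k * x)))) for k in (0.5, 1.0, 2.0)}
lam_exact = {k: mu * k + 0.5 * sigma**2 * k**2 for k in lam_hat}
print(lam_hat, lam_exact)

# The average is increasingly dominated by rare tail samples as |k| grows, which is
# why the method degrades for unbounded random variables.
```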
Example: Gaussian sample mean [Exercise 5.4.2]
Sn = (1/n) ∑_{i=1}^n Xi,   p(ω) = ∏_{i=1}^n p(Xi),   p(Xi = x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
Choose k ∈ ℝ
Generate variates x1, x2, . . . , xL according to original pdf p(xi)
Compute λ(k) for L large
Repeat for different k
Obtain I by Legendre transform
Repeat for larger L
Gaussian sample mean (cont’d)
[Figure: λ̂_L(k), ŝ(k), and Î_L(s) for L = 10 to 10000]
Efficient for RVs with bounded support [Exercise 5.4.2]
Less efficient for unbounded RVs
Application for SDEs* [Exercise 5.4.3]
Other methods
Optimal path methods
p(x, T|x0) ≈ e^{−V(x,T|x0)/ε},   V(x, T|x0) = min_{x(t): x(0)=x0, x(T)=x} J[x]
[Exercise 5.4.4 – 5.4.6]
Transition path method
Monte Carlo for paths
See Christoph Dellago
Splitting / cloning methods
See notes for references
Eigenvalue method
λ(k) = dominant eigenvalue of tilted generator Gk
See notes for references
Reading
H. Touchette, The large deviation approach to statistical mechanics, Physics Reports 478, 1-69, 2009. arxiv:0804.0327
Y. Oono, Large deviation and statistical physics, Prog. Theoret. Phys. Suppl. 99, 165-205, 1989
A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, Springer, New York, 1998
J. A. Bucklew, Introduction to Rare Event Simulation, Springer, New York, 2004