
Tutorial on quasi-Monte Carlo methods
Josef Dick
School of Mathematics and Statistics, UNSW, Sydney, [email protected]

Comparison: MCMC, MC, QMC
Roughly speaking:
• Markov chain Monte Carlo and quasi-Monte Carlo are for different types of problems;
• if you have a problem where Monte Carlo does not work, then chances are quasi-Monte Carlo will not work either;
• if Monte Carlo works, but you want a faster method ⇒ try (randomized) quasi-Monte Carlo (some tweaking might be necessary);
• quasi-Monte Carlo is an "experimental design" approach to Monte Carlo simulation.
In this talk we shall discuss how quasi-Monte Carlo can be faster than Monte Carlo under certain assumptions.

Outline
DISCREPANCY, REPRODUCING KERNEL HILBERT SPACES AND WORST-CASE ERROR
QUASI-MONTE CARLO POINT SETS
RANDOMIZATIONS
WEIGHTED FUNCTION SPACES AND TRACTABILITY

The task is to approximate an integral

  I_s(f) = ∫_{[0,1]^s} f(z) dz

for some integrand f by a quadrature rule

  Q_{N,s}(f) = (1/N) ∑_{n=0}^{N−1} f(x_n)

at sample points x_0, …, x_{N−1} ∈ [0,1]^s.

  ∫_{[0,1]^s} f(x) dx ≈ (1/N) ∑_{n=0}^{N−1} f(x_n)

In other words:
Area under curve = volume of cube × average function value.
• If x_0, …, x_{N−1} ∈ [0,1]^s are chosen randomly ⇒ Monte Carlo
• If x_0, …, x_{N−1} ∈ [0,1]^s are chosen deterministically ⇒ quasi-Monte Carlo
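A minimal Python sketch of the Monte Carlo half of this dichotomy (the integrand f(x) = x_1 x_2 is a hypothetical example with exact integral 1/4 over [0,1]², not from the slides):

```python
import random

def monte_carlo(f, s, N, seed=42):
    """Plain Monte Carlo: average f over N random points in [0,1]^s."""
    rng = random.Random(seed)
    return sum(f([rng.random() for _ in range(s)]) for _ in range(N)) / N

# Hypothetical test integrand: f(x) = x1 * x2 with exact integral 1/4.
est = monte_carlo(lambda x: x[0] * x[1], s=2, N=100_000)
```

Quasi-Monte Carlo replaces the random points by deterministic ones; the constructions later in the talk (lattice rules, digital nets, Halton and Kronecker sequences) all plug into the same average.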

Smooth integrands
• Integrand f : [0,1] → R, say continuously differentiable.
• We want to study the integration error:

  ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n).

• Representation:

  f(x) = f(1) − ∫_x^1 f′(t) dt = f(1) − ∫_0^1 1_{[0,t]}(x) f′(t) dt,

where 1_{[0,t]}(x) = 1 if x ∈ [0,t] and 0 otherwise.

Integration error
• Substitute:

  ∫_0^1 f(x) dx = ∫_0^1 ( f(1) − ∫_0^1 1_{[0,t]}(x) f′(t) dt ) dx
               = f(1) − ∫_0^1 ∫_0^1 1_{[0,t]}(x) f′(t) dx dt,

  (1/N) ∑_{n=0}^{N−1} f(x_n) = (1/N) ∑_{n=0}^{N−1} ( f(1) − ∫_0^1 1_{[0,t]}(x_n) f′(t) dt )
                             = f(1) − ∫_0^1 (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) f′(t) dt.

• Integration error:

  ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n)
    = ∫_0^1 ( (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − ∫_0^1 1_{[0,t]}(x) dx ) f′(t) dt.

Local discrepancy
Integration error:

  ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = ∫_0^1 ( (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t ) f′(t) dt.

Let P = {x_0, …, x_{N−1}} ⊂ [0,1]. Define the local discrepancy by

  Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t,  t ∈ [0,1].

Then

  ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = ∫_0^1 Δ_P(t) f′(t) dt.
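The local discrepancy Δ_P(t) is easy to evaluate for a given one-dimensional point set (a minimal sketch; the example points are hypothetical):

```python
def local_discrepancy(points, t):
    """Delta_P(t) = (1/N) * #{n : x_n in [0, t]} - t for points in [0,1]."""
    N = len(points)
    return sum(1 for x in points if x <= t) / N - t

# Both points of P = {0, 0.5} lie in [0, 0.5], so Delta_P(0.5) = 1 - 0.5.
d = local_discrepancy([0.0, 0.5], 0.5)
```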

Koksma-Hlawka inequality
  | ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) | = | ∫_0^1 Δ_P(t) f′(t) dt |
    ≤ ( ∫_0^1 |Δ_P(t)|^p dt )^{1/p} ( ∫_0^1 |f′(t)|^q dt )^{1/q}
    = ‖Δ_P‖_{L_p} ‖f′‖_{L_q},  where 1/p + 1/q = 1

and ‖g‖_{L_p} = ( ∫ |g|^p )^{1/p}.

Interpretation of discrepancy
Let P = {x_0, …, x_{N−1}} ⊂ [0,1]. Recall the definition of the local discrepancy:

  Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t,  t ∈ [0,1].

The local discrepancy measures the difference between the uniform distribution and the empirical distribution of the quadrature points P = {x_0, …, x_{N−1}}.

Its supremum over t is the Kolmogorov-Smirnov statistic for the difference between the empirical distribution of {x_0, …, x_{N−1}} and the uniform distribution.

Function space
Representation:

  f(x) = f(1) − ∫_x^1 f′(t) dt.

Define the inner product

  ⟨f, g⟩ = f(1) g(1) + ∫_0^1 f′(t) g′(t) dt

and norm

  ‖f‖ = ( |f(1)|² + ∫_0^1 |f′(t)|² dt )^{1/2}.

Function space:

  H = { f : [0,1] → R : f absolutely continuous and ‖f‖ < ∞ }.

Worst-case error
The worst-case error is defined by

  e(H, P) = sup_{f ∈ H, ‖f‖ ≤ 1} | ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) |.

Reproducing kernel Hilbert space
Recall:

  f(y) = f(1) · 1 + ∫_0^1 f′(x) (−1_{[0,x]}(y)) dx

and

  ⟨f, g⟩ = f(1) g(1) + ∫_0^1 f′(x) g′(x) dx.

Goal: find functions g_y ∈ H, one for each y ∈ [0,1], such that

  ⟨f, g_y⟩ = f(y) for all f ∈ H.

Conclusions:
• g_y(1) = 1 for all y ∈ [0,1];
• g_y′(x) = −1_{[0,x]}(y) = −1 if y ≤ x and 0 otherwise;
• make g_y continuous so that g_y ∈ H.

Reproducing kernel Hilbert space
It follows that

  g_y(x) = 1 + min{1−x, 1−y}.

The function

  K(x, y) := g_y(x) = 1 + min{1−x, 1−y},  x, y ∈ [0,1],

is called the reproducing kernel.

The space H is called a reproducing kernel Hilbert space (with reproducing kernel K).

Numerical integration in reproducing kernel Hilbert spaces

Function representation:

  ∫_0^1 f(z) dz = ∫_0^1 ⟨f, K(·, z)⟩ dz = ⟨ f, ∫_0^1 K(·, z) dz ⟩,

  (1/N) ∑_{n=0}^{N−1} f(x_n) = (1/N) ∑_{n=0}^{N−1} ⟨f, K(·, x_n)⟩ = ⟨ f, (1/N) ∑_{n=0}^{N−1} K(·, x_n) ⟩.

Integration error:

  ∫_0^1 f(z) dz − (1/N) ∑_{n=0}^{N−1} f(x_n) = ⟨ f, ∫_0^1 K(·, z) dz − (1/N) ∑_{n=0}^{N−1} K(·, x_n) ⟩ = ⟨f, h⟩,

where

  h(x) = ∫_0^1 K(x, z) dz − (1/N) ∑_{n=0}^{N−1} K(x, x_n).

Worst-case error in reproducing kernel Hilbert spaces
Thus

  e(H, P) = sup_{f ∈ H, ‖f‖ ≤ 1} | ∫_0^1 f(z) dz − (1/N) ∑_{n=0}^{N−1} f(x_n) |
          = sup_{f ∈ H, ‖f‖ = 1} |⟨f, h⟩|
          = sup_{f ∈ H, f ≠ 0} | ⟨ f/‖f‖, h ⟩ |
          = ⟨h, h⟩ / ‖h‖ = ‖h‖,

since the supremum is attained by choosing f = h/‖h‖ ∈ H.

Worst-case error in reproducing kernel Hilbert spaces
  e²(H, P) = ∫_0^1 ∫_0^1 K(x, y) dx dy − (2/N) ∑_{n=0}^{N−1} ∫_0^1 K(x, x_n) dx + (1/N²) ∑_{n,m=0}^{N−1} K(x_n, x_m).
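For the kernel K(x, y) = 1 + min{1−x, 1−y} the two integrals in this formula have elementary closed forms (∫∫ K dx dy = 4/3 and ∫_0^1 K(x, z) dx = 1 + (1−z) − (1−z)²/2; these are our own calculations, not from the slides), so the worst-case error of any point set can be evaluated exactly:

```python
def wce_squared(points):
    """e^2(H, P) for the kernel K(x, y) = 1 + min(1-x, 1-y) on [0,1]."""
    N = len(points)
    double_int = 4.0 / 3.0  # closed form of the double integral of K
    single = sum(1 + (1 - z) - (1 - z) ** 2 / 2 for z in points)
    pair = sum(1 + min(1 - x, 1 - y) for x in points for y in points)
    return double_int - 2.0 / N * single + pair / N ** 2

# A single point at 1/2 gives e^2 = 4/3 - 2*(11/8) + 3/2 = 1/12.
e2 = wce_squared([0.5])
```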

Numerical integration in higher dimensions
• Tensor product space H_s = H ⊗ ··· ⊗ H.
• Reproducing kernel

  K(x, y) = ∏_{i=1}^s [1 + min{1−x_i, 1−y_i}],

where x = (x_1, …, x_s), y = (y_1, …, y_s) ∈ [0,1]^s.
• Functions f ∈ H_s have mixed partial derivatives of order up to 1 in each variable,

  ∂^{|u|} f / ∂x_u ∈ L_2([0,1]^s),

where u ⊆ {1, …, s}, x_u = (x_i)_{i∈u}, |u| denotes the cardinality of u, and where ∂^{|u|} f / ∂x_u (x_u, 1) = 0 for all u ⊂ {1, …, s}.

Worst-case error
Again

  e²(H, P) = ∫_{[0,1]^s} ∫_{[0,1]^s} K(x, y) dx dy − (2/N) ∑_{n=0}^{N−1} ∫_{[0,1]^s} K(x, x_n) dx + (1/N²) ∑_{n,m=0}^{N−1} K(x_n, x_m),

and the integration error satisfies

  ∫_{[0,1]^s} f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = (−1)^{s+1} ∫_{[0,1]^s} Δ_P(x) (∂^s f / ∂x)(x) dx,

where

  Δ_P(x) = (1/N) ∑_{n=0}^{N−1} 1_{[0,x]}(x_n) − ∏_{i=1}^s x_i.

Discrepancy in higher dimensions
Point set P = {x_0, …, x_{N−1}} ⊂ [0,1]^s, t = (t_1, …, t_s) ∈ [0,1]^s.

• Local discrepancy:

  Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − ∏_{i=1}^s t_i,

where [0, t] = ∏_{i=1}^s [0, t_i].

Koksma-Hlawka inequality
Let f : [0,1]^s → R with

  ‖f‖ = ( ∫_{[0,1]^s} | (∂^s f / ∂x)(x) |^q dx )^{1/q},

and where ∂^{|u|} f / ∂x_u (x_u, 1) = 0 for all u ⊂ {1, …, s}. Then

  | ∫_{[0,1]^s} f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) | ≤ ‖f‖ ‖Δ_P‖_{L_p},

where 1/p + 1/q = 1.

⇒ Construct points P = {x_0, …, x_{N−1}} ⊂ [0,1]^s with small discrepancy

  L_p(P) = ‖Δ_P‖_{L_p}.

It is often useful to consider different reproducing kernels, yielding different worst-case errors and discrepancies. We will see further examples later.

Constructions of low-discrepancy sequences
• Lattice rules (Bilyk, Brauchart, Cools, D., Hellekalek, Hickernell, Hlawka, Joe, Keller, Korobov, Kritzer, Kuo, Larcher, L'Ecuyer, Lemieux, Leobacher, Niederreiter, Nuyens, Pillichshammer, Sinescu, Sloan, Temlyakov, Wang, Wozniakowski, ...)
• Digital nets and sequences (Baldeaux, Bierbrauer, Brauchart, Chen, D., Edel, Faure, Hellekalek, Hofer, Keller, Kritzer, Kuo, Larcher, Leobacher, Niederreiter, Owen, Özbudak, Pillichshammer, Pirsic, Schmid, Skriganov, Sobol', Wang, Xing, Yue, ...)
• Hammersley-Halton sequences (Atanassov, De Clerck, Faure, Halton, Hammersley, Kritzer, Larcher, Lemieux, Pillichshammer, Pirsic, White, ...)
• Kronecker sequences (Beck, Hellekalek, Larcher, Niederreiter, Schoissengeier, ...)

Lattice rules
Let N ∈ N and let

  g = (g_1, …, g_s) ∈ {1, …, N−1}^s.

Choose the quadrature points as

  x_n = {n g / N}, for n = 0, …, N−1,

where {z} = z − ⌊z⌋ for z ≥ 0, applied componentwise.
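A minimal sketch generating such a rank-1 lattice point set (the Fibonacci parameters N = 55, g = (1, 34) from the next slide serve as the example):

```python
def lattice_points(N, g):
    """Rank-1 lattice point set x_n = {n*g/N}, n = 0, ..., N-1."""
    return [tuple((n * gi % N) / N for gi in g) for n in range(N)]

# Fibonacci lattice rule: N = 55 points, generating vector g = (1, 34).
pts = lattice_points(55, (1, 34))
```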

Fibonacci lattice rules
Lattice rule with N = 55 points and generating vector g = (1,34).

Fibonacci lattice rules
Lattice rule with N = 89 points and generating vector g = (1,55).

In comparison: Random point set
Random set of 64 points generated by Matlab using the Mersenne Twister.

Lattice rules
How to find the generating vector g?

Reproducing kernel (for even α):

  K(x, y) = ∏_{i=1}^s ( 1 + (−1)^{α/2+1} ((2π)^α / α!) B_α({x_i − y_i}) ),

where {x_i − y_i} = (x_i − y_i) − ⌊x_i − y_i⌋ is the fractional part of x_i − y_i.

This is a reproducing kernel Hilbert space of Fourier series:

  f(x) = ∑_{h ∈ Z^s} f̂(h) e^{2πi h·x}.

Worst-case error:

  e²_α(g) = −1 + (1/N) ∑_{n=0}^{N−1} ∏_{i=1}^s ( 1 + (−1)^{α/2+1} ((2π)^α / α!) B_α({n g_i / N}) ),

where B_α is the Bernoulli polynomial of order α (α even). For instance B_2(x) = x² − x + 1/6, giving the factor 1 + 2π² B_2({n g_i / N}).

Component-by-component construction (Korobov, Sloan-Reztsov, Sloan-Kuo-Joe, Nuyens-Cools)

• Set g_1 = 1.
• For d = 2, …, s assume that we have found g_1, …, g_{d−1}. Then find g_d ∈ {1, …, N−1} which minimizes e_α(g_1, …, g_{d−1}, g) as a function of g, i.e.

  g_d = argmin_{g ∈ {1,…,N−1}} e²_α(g_1, …, g_{d−1}, g).

Using the fast Fourier transform (Nuyens, Cools), a good generating vector g ∈ {1, …, N−1}^s can be found in O(s N log N) operations.
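The fast FFT-based version is not reproduced here; the following naive O(sN²) sketch illustrates the search itself, using the α = 2 error with the factor 1 + 2π²B_2({n g_i / N}) (an assumption consistent with the B_2 example above):

```python
import math

def b2(x):
    """Bernoulli polynomial B_2(x) = x^2 - x + 1/6."""
    return x * x - x + 1.0 / 6.0

def cbc(N, s):
    """Naive component-by-component search (O(s*N^2) operations).

    Minimizes e^2(g) = -1 + (1/N) sum_n prod_i (1 + 2*pi^2*B_2({n*g_i/N})).
    """
    prod = [1.0] * N  # running product over the already-chosen components
    g = []
    for _ in range(s):
        best = None  # (e2, candidate, updated per-n products)
        for cand in range(1, N):
            trial = [prod[n] * (1 + 2 * math.pi ** 2 * b2((n * cand % N) / N))
                     for n in range(N)]
            e2 = -1 + sum(trial) / N
            if best is None or e2 < best[0]:
                best = (e2, cand, trial)
        g.append(best[1])
        prod = best[2]
    return g

g = cbc(34, 2)
```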

Hammersley-Halton sequence
Radical inverse function in base b:

• Let n ∈ N_0 have base-b expansion

  n = n_0 + n_1 b + ··· + n_{a−1} b^{a−1}.

• Set

  φ_b(n) = n_0 / b + n_1 / b² + ··· + n_{a−1} / b^a ∈ [0,1].

Hammersley-Halton sequence:
• Let p_1 = 2, p_2 = 3, p_3 = 5, p_4 = 7, … be the prime numbers in increasing order.
• Define quadrature points x_0, x_1, … by

  x_n = ( φ_{p_1}(n), φ_{p_2}(n), …, φ_{p_s}(n) ) for n = 0, 1, 2, ….
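A minimal sketch of the radical inverse and the resulting Halton points:

```python
def radical_inverse(n, b):
    """phi_b(n): mirror the base-b digits of n about the radix point."""
    x, base_inv = 0.0, 1.0 / b
    while n > 0:
        n, digit = divmod(n, b)
        x += digit * base_inv
        base_inv /= b
    return x

def halton(n, primes=(2, 3, 5, 7)):
    """n-th Halton point in s = len(primes) dimensions."""
    return tuple(radical_inverse(n, p) for p in primes)
```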

Hammersley-Halton sequence
Hammersley-Halton point set with 64 points.

Digital nets
• Choose a prime number b and the finite field Z_b = {0, 1, …, b−1} of order b.
• Choose C_1, …, C_s ∈ Z_b^{m×m}.
• Let n = n_0 + n_1 b + ··· + n_{m−1} b^{m−1} and set ~n = (n_0, …, n_{m−1})^T ∈ Z_b^m.
• Let

  ~y_{n,i} = C_i ~n for 1 ≤ i ≤ s, 0 ≤ n < b^m.

• For ~y_{n,i} = (y_{n,i,1}, …, y_{n,i,m})^T ∈ Z_b^m let

  x_{n,i} = y_{n,i,1} / b + ··· + y_{n,i,m} / b^m.

• Set x_n = (x_{n,1}, …, x_{n,s}) for 0 ≤ n < b^m.
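The construction can be sketched directly as matrix-vector products over Z_b (the identity generating matrix used in the example is our own choice, giving van der Corput-style points in base 2):

```python
def digital_net(C, b, m):
    """Digital net from generating matrices C = [C_1, ..., C_s] over Z_b.

    Each C_i is an m x m list of lists with entries in {0, ..., b-1}.
    """
    points = []
    for n in range(b ** m):
        # Base-b digits of n: n = n_0 + n_1*b + ... + n_{m-1}*b^{m-1}.
        digits, nn = [], n
        for _ in range(m):
            nn, d = divmod(nn, b)
            digits.append(d)
        point = []
        for Ci in C:
            # y = C_i * (n_0, ..., n_{m-1})^T over Z_b.
            y = [sum(Ci[r][c] * digits[c] for c in range(m)) % b
                 for r in range(m)]
            # x_{n,i} = y_1/b + ... + y_m/b^m.
            point.append(sum(y[r] / b ** (r + 1) for r in range(m)))
        points.append(tuple(point))
    return points

# With C_1 the identity matrix, digit n_0 becomes the leading digit of x.
pts = digital_net([[[1, 0], [0, 1]]], b=2, m=2)
```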

(t ,m, s)-net property
Let m, s ≥ 1 and b ≥ 2 be integers. A point set P = {x_0, …, x_{b^m−1}} is called a (t, m, s)-net in base b if, for all integers d_1, …, d_s ≥ 0 with

  d_1 + ··· + d_s = m − t,

the number of points in each elementary interval

  ∏_{i=1}^s [ a_i / b^{d_i}, (a_i + 1) / b^{d_i} ), where 0 ≤ a_i < b^{d_i},

is b^t.

If P = {x_0, …, x_{b^m−1}} ⊂ [0,1]^s is a (t, m, s)-net, then the local discrepancy function satisfies

  Δ_P( a_1 / b^{d_1}, …, a_s / b^{d_s} ) = 0

for all 0 ≤ a_i < b^{d_i}, d_i ≥ 0 with d_1 + ··· + d_s = m − t.

Sobol point set
The first 64 points of a Sobol sequence, which form a (0, 6, 2)-net in base 2.

Niederreiter-Xing point set
The first 64 points of a Niederreiter-Xing sequence.

Niederreiter-Xing point set
The first 1024 points of a Niederreiter-Xing sequence.

Kronecker sequence
Let α_1, …, α_s ∈ R be such that 1, α_1, …, α_s are linearly independent over Q. Let

  x_n = ( {n α_1}, …, {n α_s} ) for n = 0, 1, 2, ….

For instance, one can choose α_i = √p_i, where p_i is the i-th prime number.
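A minimal sketch, using α_i = √p_i for the first two primes:

```python
import math

def kronecker(n, alphas):
    """n-th Kronecker point ({n*alpha_1}, ..., {n*alpha_s})."""
    return tuple((n * a) % 1.0 for a in alphas)

alphas = (math.sqrt(2), math.sqrt(3))
p1 = kronecker(1, alphas)
```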

Kronecker sequence
The first 64 points of a Kronecker sequence.

Discrepancy bounds
• It is known that for all the constructions above one has

  L_p(P) ≤ C_s (log N)^{c(s)} / N for all N ≥ 1, 1 ≤ p ≤ ∞,

where C_s > 0 and c(s) ≈ s.
• Lower bound: for all point sets P consisting of N points we have

  L_p(P) ≥ C′_s (log N)^{(s−1)/2} / N for all N ≥ 1, 1 < p ≤ ∞.

In comparison, for a random point set:

  E(L_2(P)) = O( ( (log log N) / N )^{1/2} ).

For 1 < p < ∞, the exact order of convergence is known to be of order

  (log N)^{(s−1)/2} / N.

Explicit constructions of such point sets were achieved by Chen & Skriganov for p = 2 and by Skriganov for 1 < p < ∞.

Great open problem: what is the correct order of convergence of

  min_{P ⊂ [0,1]^s, |P| = N} L_∞(P)?

Randomized quasi-Monte Carlo
Introduce a random element into the deterministic point set.

This has several advantages:
• it yields an unbiased estimator;
• there is a statistical error estimate;
• a better rate of convergence of the random-case error compared to the worst-case error;
• it performs similarly to Monte Carlo for L_2 functions but has a better rate of convergence for smooth functions.

Randomly shifted lattice rules
Lattice rules can be randomized using a random shift:
• Lattice point set:

  {n g / N}, n = 0, 1, …, N−1.

• Shifted lattice rule: choose σ ∈ [0,1]^s uniformly distributed; the shifted lattice point set is then given by

  {n g / N + σ}, n = 0, 1, …, N−1.
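A sketch of the resulting estimator, averaging several independent shifts (the test integrand f(x) = x_1 x_2 and the Fibonacci parameters are our own choices):

```python
import random

def shifted_lattice_estimate(f, N, g, shifts, seed=7):
    """Average of `shifts` independent randomly shifted lattice rules."""
    rng = random.Random(seed)
    s = len(g)
    total = 0.0
    for _ in range(shifts):
        sigma = [rng.random() for _ in range(s)]  # fresh uniform shift
        total += sum(f([((n * g[i]) / N + sigma[i]) % 1.0 for i in range(s)])
                     for n in range(N)) / N
    return total / shifts

# Hypothetical test integrand: f(x) = x1*x2 with exact integral 1/4.
est = shifted_lattice_estimate(lambda x: x[0] * x[1], N=55, g=(1, 34), shifts=10)
```

The spread of the per-shift estimates also gives the statistical error estimate mentioned above, which a purely deterministic rule lacks.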

Lattice point set

Shifted lattice point set

Owen’s scrambling
• Let Z_b = {0, 1, …, b−1} and write

  x = x_1 / b + x_2 / b² + ···, x_1, x_2, … ∈ Z_b.

• Randomly choose permutations σ, σ_{x_1}, σ_{x_1,x_2}, … : Z_b → Z_b.
• Let

  y_1 = σ(x_1)
  y_2 = σ_{x_1}(x_2)
  y_3 = σ_{x_1,x_2}(x_3)
  ⋮

• Set

  y = y_1 / b + y_2 / b² + ···.
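A simplified sketch of this nested scrambling for a single coordinate, with permutations drawn lazily and cached by digit prefix (truncating to a fixed digit depth is a practical assumption, not part of the definition):

```python
import random

def make_scrambler(b=2, depth=20, seed=11):
    """Owen-style nested scrambler for one coordinate in base b."""
    rng = random.Random(seed)
    perms = {}  # digit prefix (x_1, ..., x_{k-1}) -> permutation of Z_b

    def scramble(x):
        digits, out = [], 0.0
        for k in range(depth):
            x *= b
            d = int(x)       # next base-b digit of x
            x -= d
            key = tuple(digits)
            if key not in perms:
                p = list(range(b))
                rng.shuffle(p)
                perms[key] = p
            out += perms[key][d] / b ** (k + 1)
            digits.append(d)
        return out

    return scramble

scr = make_scrambler()
y = scr(0.375)
```

Because the permutation applied to digit k depends only on the preceding digits, points sharing a digit prefix still share it after scrambling; this is what preserves the (t, m, s)-net property.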

Scrambled Sobol point set
The first 1024 points of a Sobol sequence (left) and a scrambled Sobol sequence (right).

Scrambled net variance
Let

  e(R) = ∫_{[0,1]^s} f(x) dx − (1/N) ∑_{n=0}^{b^m−1} f(y_n), N = b^m.

Then
• E(e(R)) = 0;
• Var(e(R)) = O( (log N)^s / N^{2α+1} )

for integrands with smoothness 0 ≤ α ≤ 1.

Dependence on the dimension
Although the convergence rate is better for quasi-Monte Carlo, there is a stronger dependence on the dimension:

• Monte Carlo: error = O(N^{−1/2});
• Quasi-Monte Carlo: error = O( (log N)^{s−1} / N ).

Notice that g(N) := N^{−1} (log N)^{s−1} is an increasing function for N ≤ e^{s−1}.

So if s is large, say s ≈ 30, then N^{−1} (log N)^{29} increases for N ≤ 10^{12}.

Intractable
For the integration error in H with reproducing kernel

  K(x, y) = ∏_{i=1}^s min{1−x_i, 1−y_i}

it is known that

  e²(H, P) ≥ ( 1 − N (8/9)^s ) e²(H, ∅).

⇒ The error can only decrease if

  N > constant · (9/8)^s.

Weighted function spaces (Sloan and Wozniakowski)
Study weighted function spaces: introduce weights γ_1, γ_2, … > 0 and define

  K(x, y) = ∏_{i=1}^s ( 1 + γ_i min{1−x_i, 1−y_i} ).

Then if

  ∑_{i=1}^∞ γ_i < ∞,

we have

  e(H_{s,γ}, P) ≤ C N^{−δ},

where C > 0 is independent of the dimension s and δ < 1.

Tractability
• Minimal error over all methods using N points:

  e*_N(H) = inf_{P : |P| = N} e(H, P);

• inverse of the error:

  N*(s, ε) = min{ N ∈ N : e*_N(H) ≤ ε e*_0(H) } for ε > 0;

• strong tractability:

  N*(s, ε) ≤ C ε^{−β}

for some constants C, β > 0 independent of the dimension s.

Sloan and Wozniakowski: for the function space considered above, we get strong tractability if

  ∑_{i=1}^∞ γ_i < ∞.

Tractability of star-discrepancy
Recall the local discrepancy function

  Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t)}(x_n) − t_1 ··· t_s, where P = {x_0, …, x_{N−1}}.

The star-discrepancy is given by

  D*_N(P) = sup_{t ∈ [0,1]^s} |Δ_P(t)|.

Result by Heinrich, Novak, Wasilkowski and Wozniakowski: the minimal star-discrepancy satisfies

  D*(s, N) = inf_P D*_N(P) ≤ C √(s/N) for all s, N ∈ N.

This implies that

  N*(s, ε) ≤ C s ε^{−2}.
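In one dimension the supremum in the star-discrepancy can be computed exactly from the sorted points via Niederreiter's formula D*_N = 1/(2N) + max_i |x_(i) − (2i−1)/(2N)|:

```python
def star_discrepancy_1d(points):
    """Exact star-discrepancy of a 1D point set (Niederreiter's formula)."""
    xs = sorted(points)
    N = len(xs)
    return 1 / (2 * N) + max(abs(x - (2 * i + 1) / (2 * N))
                             for i, x in enumerate(xs))

# The centred regular grid attains the minimal value D*_N = 1/(2N).
grid = [(2 * i + 1) / 8 for i in range(4)]  # N = 4
d_star = star_discrepancy_1d(grid)
```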

Extensions, open problems and future research
• Quasi-Monte Carlo methods which achieve convergence rates of order N^{−α} (log N)^{αs}, α > 1, for sufficiently smooth integrands;
• construction of point sets whose star-discrepancy is tractable;
• completely uniformly distributed point sets to speed up convergence in Markov chain Monte Carlo algorithms;
• infinite-dimensional integration: integrands have infinitely many variables;
• connections to codes, orthogonal arrays and experimental designs;
• quasi-Monte Carlo for function approximation;
• choosing the weights in applications;
• quasi-Monte Carlo on the sphere.

Book

Thank You!
Special thanks to
• Dirk Nuyens for creating many of the pictures in this talk;
• Friedrich Pillichshammer for letting me use a picture of his daughters;
• Fred Hickernell, Peter Kritzer, Art Owen, and Friedrich Pillichshammer for helpful comments on the slides;
• all colleagues who worked on quasi-Monte Carlo methods over the years.