Omiros' talk on the Bernoulli factory problem

Simulating events of unknown probability by reverse time martingales (poprzez martyngały z czasem odwróconym; i.e. via reverse-time martingales)

Omiros Papaspiliopoulos, UPF, Barcelona

Joint work with Krzysztof Latuszynski, Ioannis Kosmidis and Gareth O. Roberts (Warwick University)

Uploaded by: bigmc

Posted on 10-May-2015


Category: Education



DESCRIPTION

Talk given at the BigMC seminar, 15 Apr. 2010, by Prof. Omiros Papaspiliopoulos, UPF

TRANSCRIPT

Page 1: Omiros' talk on the Bernoulli factory problem

Simulating events of unknown probability by reverse time martingales

(poprzez martyngały z czasem odwróconym; via reverse-time martingales)

Omiros Papaspiliopoulos, UPF, Barcelona

joint work with Krzysztof Latuszynski, Ioannis Kosmidis, Gareth O. Roberts

(Warwick University)

Page 2: Omiros' talk on the  Bernoulli factory problem

Motivation

The Bernoulli factory - general results and previous approaches

Reverse time martingale approach to sampling

Application to the Bernoulli Factory problem

Page 3: Omiros' talk on the  Bernoulli factory problem

Generic description of the problem

- Let p ∈ (0, 1) be unknown.

- Given a black box that samples p−coins,

- can we construct a black box that samples f(p)−coins for a known f? For example f(p) = min(1, 2p).

Page 4: Omiros' talk on the  Bernoulli factory problem

Some history

(see for example Peres, 1992) von Neumann posed and solved the problem for f(p) = 1/2:

1. set n = 1;

2. sample Xn,Xn+1

3. if (Xn,Xn+1) = (0, 1) output 1 and STOP

4. if (Xn,Xn+1) = (1, 0) output 0 and STOP

5. set n := n + 2 and GOTO 2.

Let's check why this works: the pairs (0, 1) and (1, 0) each occur with probability p(1 − p), so conditioned on stopping the output is 1 with probability 1/2, regardless of p.
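The steps above can be sketched in Python (a minimal sketch; `p_coin` stands in for the black-box sampler, and the explicit index n is replaced by a loop):

```python
import random

def p_coin(p):
    """Black-box sampler of a p-coin; p is 'unknown' to the algorithm."""
    return 1 if random.random() < p else 0

def von_neumann_fair_coin(p):
    """von Neumann's trick for f(p) = 1/2: sample pairs until they differ.
    (0,1) has probability (1-p)*p and (1,0) has probability p*(1-p), so
    conditioned on stopping the output is 1 with probability exactly 1/2."""
    while True:
        x, y = p_coin(p), p_coin(p)
        if (x, y) == (0, 1):
            return 1
        if (x, y) == (1, 0):
            return 0
```

Note the algorithm never needs the value of p: it only needs the symmetry of the two discordant outcomes.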


Page 6: Omiros' talk on the  Bernoulli factory problem

The Bernoulli Factory problem

for known f and unknown p, how to generate an f (p)−coin?

von Neumann: f (p) = 1/2

Asmussen posed the case f(p) = 2p, which turned out to be difficult

Page 7: Omiros' talk on the  Bernoulli factory problem

Exact simulation of diffusions as Bernoulli factory

This is the description of EA (the Exact Algorithm) closest in spirit to Beskos and Roberts (2005).

Simulate XT at time T > 0 from:

dX_t = α(X_t) dt + dW_t,  X_0 = x ∈ R,  t ∈ [0, T]   (1)

driven by the Brownian motion W_t, 0 ≤ t ≤ T.

Page 8: Omiros' talk on the  Bernoulli factory problem

Let Ω ≡ C([0, T], R), with co-ordinate mappings B_t : Ω → R, t ∈ [0, T], such that for any t, B_t(ω) = ω(t), and the cylinder σ-algebra C = σ(B_t; 0 ≤ t ≤ T).

Write W^x = {W^x_t; 0 ≤ t ≤ T} for the Brownian motion started at x ∈ R, and W^{x,u} = {W^{x,u}_t; 0 ≤ t ≤ T} for the Brownian bridge.

Page 9: Omiros' talk on the  Bernoulli factory problem

1. The drift function α is differentiable.

2. The function h(u) = exp{A(u) − (u − x)²/(2T)}, u ∈ R, where A(u) = ∫_0^u α(y) dy, is integrable.

3. The function (α² + α′)/2 is bounded below by ℓ > −∞, and above by r + ℓ < ∞. Define

   φ(u) = (1/r) [ (α²(u) + α′(u))/2 − ℓ ] ∈ [0, 1].   (2)

Page 10: Omiros' talk on the  Bernoulli factory problem

Let Q be the probability measure induced by the solution X of (1) on (Ω, C), W the corresponding probability measure for W^x, and Z the probability measure defined by the following simple change of measure from W: dW/dZ(ω) ∝ exp{−A(B_T)}. Note that a stochastic process distributed according to Z has dynamics similar to those of Brownian motion, except that the marginal distribution at time T (with density, say, h) is biased according to A.

Then,

dQ/dZ(ω) ∝ exp{ −rT ∫_0^T T⁻¹ φ(B_t) dt } ≤ 1,  Z-a.s.   (3)


Page 12: Omiros' talk on the  Bernoulli factory problem

EA using Bernoulli factory

1. simulate u ∼ h;
2. generate a C_s coin where s := e^{−rT·J} and J := ∫_0^T T⁻¹ φ(W^{x,u}_t) dt;
3. if C_s = 1 output u and STOP;
4. if C_s = 0 GOTO 1.

Exploiting the Markov property, we can assume from now on that rT < 1.

Page 13: Omiros' talk on the  Bernoulli factory problem

The challenging part of the algorithm is Step 2, since exact computation of J is impossible due to the integration over a Brownian bridge path.

On the other hand, it is easy to generate J-coins:

C_J = I(ψ < φ(W^{x,u}_χ)),  ψ ∼ U(0, 1),  χ ∼ U(0, T),

independent of the Brownian bridge W^{x,u} and of each other.

Therefore, we deal with another instance of the problem studied in this article: given p-coins, how to generate f(p)-coins, where here f is the exponential function.

(Note the interplay between unbiased estimation and exact simulation here.)
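A single J-coin can be sketched in Python, using the standard Gaussian marginal of a Brownian bridge at an interior time: the bridge from x to u over [0, T], evaluated at χ, is Normal with mean x + (χ/T)(u − x) and variance χ(T − χ)/T. The function `phi` is an illustrative stand-in for the φ of equation (2):

```python
import math
import random

def j_coin(phi, x, u, T):
    """One J-coin: C_J = I(psi < phi(W^{x,u}_chi)), psi ~ U(0,1), chi ~ U(0,T).
    Averaging over chi and psi gives E[C_J | bridge] = (1/T) * integral of
    phi(W^{x,u}_t) dt = J, so the coin is unbiased for J."""
    chi = random.uniform(0.0, T)
    mean = x + (chi / T) * (u - x)          # Brownian-bridge marginal mean
    var = chi * (T - chi) / T               # Brownian-bridge marginal variance
    w = random.gauss(mean, math.sqrt(var))  # bridge value at time chi
    psi = random.random()
    return 1 if psi < phi(w) else 0
```

Only a single bridge point is ever simulated per coin, which is what makes the construction practical.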

Page 14: Omiros' talk on the  Bernoulli factory problem

Keane and O’Brien - existence result

- Keane and O'Brien (1994): let f : P ⊆ (0, 1) → [0, 1]. Then it is possible to simulate an f(p)−coin ⇐⇒ either
  - f is constant, or
  - f is continuous and for some n ∈ N and all p ∈ P satisfies

    min{f(p), 1 − f(p)} ≥ min{p, 1 − p}^n.

- However, their proof is not constructive.

Note that the result rules out min{1, 2p}, but not min{1 − 2ε, 2p}.


Page 16: Omiros' talk on the  Bernoulli factory problem

Nacu-Peres (2005) Theorem - Bernstein polynomial approach

- There exists an algorithm which simulates f ⇐⇒ there exist polynomials

  g_n(x, y) = Σ_{k=0}^n (n choose k) a(n, k) x^k y^{n−k},  h_n(x, y) = Σ_{k=0}^n (n choose k) b(n, k) x^k y^{n−k}

  with the following properties:

  - 0 ≤ a(n, k) ≤ b(n, k) ≤ 1;
  - (n choose k) a(n, k) and (n choose k) b(n, k) are integers;
  - lim_{n→∞} g_n(p, 1 − p) = f(p) = lim_{n→∞} h_n(p, 1 − p);
  - for all m < n, the coefficients of (x + y)^{n−m} g_m(x, y) are dominated by those of g_n(x, y), and the coefficients of (x + y)^{n−m} h_m(x, y) dominate those of h_n(x, y).

- Nacu & Peres provide coefficients for f(p) = min{2p, 1 − 2ε} explicitly.

- Given an algorithm for f(p) = min{2p, 1 − 2ε}, Nacu & Peres develop a calculus that reduces simulating every real analytic g to nestings of the algorithm for f.


Page 23: Omiros' talk on the  Bernoulli factory problem

too nice to be true?

- at time n the N-P algorithm computes sets A_n and B_n, subsets of all 0–1 strings of length n;
- the cardinalities of A_n and B_n are precisely (n choose k) a(n, k) and (n choose k) b(n, k);
- the upper polynomial approximation converges slowly to f;
- the lengths of the 0–1 strings are 2^15 = 32768 and above, e.g. 2^24 = 16777216;
- one has to deal efficiently with a set of 2^(2^24) strings, of length 2^24 each;
- we shall develop a reverse time martingale approach to the problem;
- we will construct reverse time super- and submartingales that perform a random walk on the Nacu-Peres polynomial coefficients a(n, k), b(n, k) and result in a black box whose algorithmic cost is linear in the number of original p−coins.


Page 30: Omiros' talk on the  Bernoulli factory problem

Before giving the most general algorithm, let us build up gradually to simulating events of unknown probability constructively.

Page 31: Omiros' talk on the  Bernoulli factory problem

Algorithm 1 - randomization

- Lemma: Sampling events of probability s ∈ [0, 1] is equivalent to constructing an unbiased estimator of s taking values in [0, 1] with probability 1.

- Proof: Let S, s.t. E S = s and P(S ∈ [0, 1]) = 1, be the estimator. Then draw G₀ ∼ U(0, 1), obtain S, and define a coin C_s := I{G₀ ≤ S}. Then

  P(C_s = 1) = E I(G₀ ≤ S) = E( E( I(G₀ ≤ S) | S ) ) = E S = s.

  The converse is straightforward, since an s−coin is an unbiased estimator of s with values in [0, 1].

- Algorithm 1
  1. simulate G₀ ∼ U(0, 1);
  2. obtain S;
  3. if G₀ ≤ S set C_s := 1, otherwise set C_s := 0;
  4. output C_s.
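Algorithm 1 can be sketched in a few lines of Python (the two-point estimator used in the test is an illustrative assumption, chosen so that E S = 0.5):

```python
import random

def coin_from_estimator(sample_S):
    """Algorithm 1: turn an unbiased [0,1]-valued estimator S of s into an
    exact s-coin, via a single auxiliary uniform G0.  Conditioning on S,
    P(coin = 1) = E[P(G0 <= S | S)] = E[S] = s."""
    g0 = random.random()   # G0 ~ U(0,1)
    s_hat = sample_S()     # obtain one draw of the estimator S
    return 1 if g0 <= s_hat else 0
```

The point of the lemma is the equivalence: any s-coin is itself an unbiased [0, 1]-valued estimator of s, so nothing is lost in either direction.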


Page 34: Omiros' talk on the  Bernoulli factory problem

Algorithm 2 - lower and upper monotone deterministicbounds

- let l₁, l₂, ... and u₁, u₂, ... be monotone sequences of lower and upper bounds for s converging to s, i.e.

  l_n ↑ s and u_n ↓ s.

- Algorithm 2
  1. simulate G₀ ∼ U(0, 1); set n = 1;
  2. compute l_n and u_n;
  3. if G₀ ≤ l_n set C_s := 1;
  4. if G₀ > u_n set C_s := 0;
  5. if l_n < G₀ ≤ u_n set n := n + 1 and GOTO 2;
  6. output C_s.

- Remark: P(N > n) = u_n − l_n, where N is the number of iterations.
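Algorithm 2 can be sketched as follows. The concrete bounds are an illustrative assumption: s = exp(−1) bracketed by the alternating partial sums of its Taylor series, which are monotone lower/upper bounds by the alternating series test:

```python
import math
import random

def coin_from_bounds(lower, upper):
    """Algorithm 2: lower(n) and upper(n) are deterministic monotone bounds
    with l_n increasing to s and u_n decreasing to s.  Outputs an exact
    s-coin; the number of refinements N satisfies P(N > n) = u_n - l_n."""
    g0 = random.random()
    n = 1
    while True:
        ln, un = lower(n), upper(n)
        if g0 <= ln:
            return 1
        if g0 > un:
            return 0
        n += 1   # g0 is still between the bounds: refine

# Illustrative bounds for s = exp(-1) = sum_k (-1)^k / k!
def lower(n):   # partial sum ending on a negative term: increasing lower bound
    return sum((-1) ** k / math.factorial(k) for k in range(2 * n))

def upper(n):   # partial sum ending on a positive term: decreasing upper bound
    return sum((-1) ** k / math.factorial(k) for k in range(2 * n + 1))
```

Only as many bound refinements are computed as are needed to decide on which side of s the uniform G₀ fell.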


Page 37: Omiros' talk on the  Bernoulli factory problem

This is a practically useful technique, suggested for example in Devroye (1986), and implemented for simulation from random measures in Papaspiliopoulos and Roberts (2008).

Page 38: Omiros' talk on the  Bernoulli factory problem

Algorithm 3 - monotone stochastic bounds

L_n ≤ U_n   (4)

L_n ∈ [0, 1] and U_n ∈ [0, 1]   (5)

L_{n−1} ≤ L_n and U_{n−1} ≥ U_n   (6)

E L_n = l_n ↑ s and E U_n = u_n ↓ s.   (7)

F₀ = {∅, Ω},  F_n = σ{L_n, U_n},  F_{k,n} = σ{F_k, F_{k+1}, ..., F_n} for k ≤ n.

- Algorithm 3
  1. simulate G₀ ∼ U(0, 1); set n = 1;
  2. obtain L_n and U_n conditionally on F_{1,n−1};
  3. if G₀ ≤ L_n set C_s := 1;
  4. if G₀ > U_n set C_s := 0;
  5. if L_n < G₀ ≤ U_n set n := n + 1 and GOTO 2;
  6. output C_s.


Page 41: Omiros' talk on the  Bernoulli factory problem

Algorithm 3 - monotone stochastic bounds

Lemma. Assume (4), (5), (6) and (7). Then Algorithm 3 outputs a valid s−coin. Moreover, the probability that it needs N > n iterations equals u_n − l_n.

Proof. The probability that Algorithm 3 needs more than n iterations equals E(U_n − L_n) = u_n − l_n → 0 as n → ∞. And since 0 ≤ U_n − L_n is a decreasing sequence a.s., we also have U_n − L_n → 0 a.s. So there exists a random variable S such that for almost every realization of the sequences {L_n(ω)}_{n≥1} and {U_n(ω)}_{n≥1} we have L_n(ω) ↑ S(ω) and U_n(ω) ↓ S(ω).

By (5) we have S ∈ [0, 1] a.s. Thus for a fixed ω the algorithm outputs an S(ω)−coin a.s. (by Algorithm 2). Clearly E L_n ≤ E S ≤ E U_n and hence E S = s.

Page 42: Omiros' talk on the  Bernoulli factory problem

Algorithm 4 - reverse time martingales

Recall the conditions

L_n ≤ U_n,
L_n ∈ [0, 1] and U_n ∈ [0, 1],
L_{n−1} ≤ L_n and U_{n−1} ≥ U_n,
E L_n = l_n ↑ s and E U_n = u_n ↓ s,

with F₀ = {∅, Ω}, F_n = σ{L_n, U_n}, F_{k,n} = σ{F_k, F_{k+1}, ..., F_n} for k ≤ n.

The final step is to weaken the 3rd condition and let L_n be a reverse time supermartingale and U_n a reverse time submartingale with respect to F_{n,∞}. Precisely, assume that for every n = 1, 2, ... we have

E(L_{n−1} | F_{n,∞}) = E(L_{n−1} | F_n) ≤ L_n a.s., and   (8)

E(U_{n−1} | F_{n,∞}) = E(U_{n−1} | F_n) ≥ U_n a.s.   (9)

Page 43: Omiros' talk on the  Bernoulli factory problem

Algorithm 4 - reverse time martingales

- Algorithm 4
  1. simulate G₀ ∼ U(0, 1); set n = 1; set L̃₀ ≡ L₀ ≡ 0 and Ũ₀ ≡ U₀ ≡ 1;
  2. obtain L_n and U_n given F_{0,n−1};
  3. compute L*_n = E(L̃_{n−1} | F_n) and U*_n = E(Ũ_{n−1} | F_n);
  4. compute

     L̃_n = L̃_{n−1} + (L_n − L*_n)/(U*_n − L*_n) · (Ũ_{n−1} − L̃_{n−1})   (10)

     Ũ_n = Ũ_{n−1} − (U*_n − U_n)/(U*_n − L*_n) · (Ũ_{n−1} − L̃_{n−1})   (11)

  5. if G₀ ≤ L̃_n set C_s := 1;
  6. if G₀ > Ũ_n set C_s := 0;
  7. if L̃_n < G₀ ≤ Ũ_n set n := n + 1 and GOTO 2;
  8. output C_s.
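One update of the transform in Step 4 can be sketched in Python (variable names are mine: `Lt_prev`/`Ut_prev` are the previous monotonised bounds, `Lstar`/`Ustar` their conditional expectations given F_n). A useful sanity check: when the raw bounds are already monotone, so that L*_n equals the previous lower bound and U*_n the previous upper bound, the transform reduces to the identity and simply returns (L_n, U_n):

```python
def martingale_step(Lt_prev, Ut_prev, Ln, Un, Lstar, Ustar):
    """One step of equations (10)-(11): map the super/submartingale pair
    (L_n, U_n) to monotone bounds, rescaling the remaining gap
    Ut_prev - Lt_prev by the relative positions of L_n and U_n inside
    the predicted interval [Lstar, Ustar]."""
    gap = Ut_prev - Lt_prev
    Lt = Lt_prev + (Ln - Lstar) / (Ustar - Lstar) * gap   # eq. (10)
    Ut = Ut_prev - (Ustar - Un) / (Ustar - Lstar) * gap   # eq. (11)
    return Lt, Ut
```

The transform preserves conditional means, which is exactly what the validity proof on the next slides establishes.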

Page 44: Omiros' talk on the  Bernoulli factory problem

Algorithm 4 - reverse time martingales

Theorem. Assume (4), (5), (7), (8) and (9). Then Algorithm 4 outputs a valid s−coin. Moreover, the probability that it needs N > n iterations equals u_n − l_n.

We show that L̃ and Ũ satisfy (4), (5), (7) and (6), and hence Algorithm 4 is valid since Algorithm 3 was valid. In fact, we have constructed a mean preserving transformation, in the sense that E[L̃_n] = E[L_n] and E[Ũ_n] = E[U_n]. Therefore, the proof is based on establishing this property and appealing to Algorithm 3.

Page 45: Omiros' talk on the  Bernoulli factory problem

- By construction L̃₀ = 0, Ũ₀ = 1 a.s., thus L*₁ = 0, U*₁ = 1.
- Therefore L̃₁ = L₁ and Ũ₁ = U₁ (intuitive).
- Thus,

  L̃₂ = L̃₁ + (L₂ − L*₂)/(U*₂ − L*₂) · (Ũ₁ − L̃₁).

- Take the conditional expectation given F₂.
- Therefore the result holds for n = 1, 2; then use induction (following the same approach).


Page 49: Omiros' talk on the  Bernoulli factory problem

Unbiased estimators

Theorem. Suppose that for an unknown value of interest s ∈ R, there exist a constant M < ∞ and random sequences L_n and U_n s.t.

P(L_n ≤ U_n) = 1 for every n = 1, 2, ...,

P(L_n ∈ [−M, M]) = 1 and P(U_n ∈ [−M, M]) = 1 for every n = 1, 2, ...,

E L_n = l_n ↑ s and E U_n = u_n ↓ s,

E(L_{n−1} | F_{n,∞}) = E(L_{n−1} | F_n) ≤ L_n a.s., and

E(U_{n−1} | F_{n,∞}) = E(U_{n−1} | F_n) ≥ U_n a.s.

Then one can construct an unbiased estimator of s.

Proof. After rescaling, one can use Algorithm 4 to sample events of probability (M + s)/2M, which gives an unbiased estimator of (M + s)/2M and consequently of s.

Page 50: Omiros' talk on the  Bernoulli factory problem

A version of the Nacu-Peres Theorem

An algorithm that simulates a function f on P ⊆ (0, 1) exists if and only if for all n ≥ 1 there exist polynomials g_n(p) and h_n(p) of the form

g_n(p) = Σ_{k=0}^n (n choose k) a(n, k) p^k (1 − p)^{n−k}  and  h_n(p) = Σ_{k=0}^n (n choose k) b(n, k) p^k (1 − p)^{n−k},

s.t.

(i) 0 ≤ a(n, k) ≤ b(n, k) ≤ 1,

(ii) lim_{n→∞} g_n(p) = f(p) = lim_{n→∞} h_n(p),

(iii) for all m < n, their coefficients satisfy

a(n, k) ≥ Σ_{i=0}^k [ (n−m choose k−i)(m choose i) / (n choose k) ] a(m, i),  b(n, k) ≤ Σ_{i=0}^k [ (n−m choose k−i)(m choose i) / (n choose k) ] b(m, i).   (12)
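The weights in (iii) are hypergeometric probabilities, and by Vandermonde's identity they sum to one over i; so (12) says that a(n, ·) dominates a hypergeometric smoothing of a(m, ·) (and b(n, ·) is dominated by one). A minimal numerical sketch of that normalisation:

```python
from math import comb

def hyp_weight(n, m, k, i):
    """Weight multiplying a(m, i) in condition (iii): the hypergeometric
    probability of i heads among the first m tosses, given k heads in n.
    math.comb returns 0 when the lower index exceeds the upper one."""
    return comb(n - m, k - i) * comb(m, i) / comb(n, k)

def total_weight(n, m, k):
    """Sum of the weights over i; equals 1 by Vandermonde's identity,
    so a constant a(n, k) = c satisfies (iii) with equality."""
    return sum(hyp_weight(n, m, k, i) for i in range(0, min(m, k) + 1))
```

This is why condition (iii) is exactly a reverse-time supermartingale/submartingale condition once a(n, k) and b(n, k) are read along the head-count process, as the proof below makes precise.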

Page 51: Omiros' talk on the  Bernoulli factory problem

Algorithm 4 - reverse time martingales

Proof: polynomials ⇒ algorithm.

- Let X₁, X₂, ... be iid tosses of a p−coin.
- Define {L_n, U_n}_{n≥1} as follows: if Σ_{i=1}^n X_i = k, let L_n = a(n, k) and U_n = b(n, k).
- In the rest of the proof we check that (4), (5), (7), (8) and (9) hold for {L_n, U_n}_{n≥1} with s = f(p). Thus executing Algorithm 4 with {L_n, U_n}_{n≥1} yields a valid f(p)−coin. Clearly (4) and (5) hold due to (i). For (7) note that E L_n = g_n(p) ↑ f(p) and E U_n = h_n(p) ↓ f(p).


Page 55: Omiros' talk on the  Bernoulli factory problem

Algorithm 4 - reverse time martingales

Proof - continued.

- To obtain (8) and (9), define H_n to be the number of heads in X₁, ..., X_n, i.e. H_n = Σ_{i=1}^n X_i, and let G_n = σ(H_n). Thus L_n = a(n, H_n) and U_n = b(n, H_n), hence F_n ⊆ G_n, and it is enough to check that E(L_m | G_n) ≤ L_n and E(U_m | G_n) ≥ U_n for m < n.
- The distribution of H_m given H_n is hypergeometric and

  E(L_m | G_n) = E(a(m, H_m) | H_n) = Σ_{i=0}^{H_n} [ (n−m choose H_n−i)(m choose i) / (n choose H_n) ] a(m, i) ≤ a(n, H_n) = L_n.

  Clearly the distribution of H_m given H_n is the same as the distribution of H_m given H_n, H_{n+1}, .... The argument for U_n is identical.


Page 59: Omiros' talk on the  Bernoulli factory problem

Practical issues for Bernoulli factory

Given a function f , finding polynomial envelopes satisfying propertiesrequired is not easy. Section 3 of N-P provides explicit formulas forpolynomial envelopes of f (p) = min2p, 1− 2ε that satisfy conditions,precisely a(n, k) and b(n, k) satisfy (ii) and (iii) and one can easilycompute n0 = n0(ε) s.t. for n ≥ n0 condition (i) also holds, which isenough for the algorithm (however n0 is substantial, e.g. n0(ε) = 32768for ε = 0.1 and it increases as ε decreases).

The probability that Algorithm 4 needs N > n inputs equals hn(p) − gn(p). The polynomials provided in N-P satisfy hn(p) − gn(p) ≤ C ρ^n for p ∈ [0, 1/2 − 4ε], guaranteeing fast convergence, and hn(p) − gn(p) ≤ D n^{−1/2} elsewhere. Using similar techniques one can establish polynomial envelopes s.t. hn(p) − gn(p) ≤ C ρ^n for p ∈ [0, 1] \ (1/2 − (2 + c)ε, 1/2 − (2 − c)ε).

Page 60: Omiros' talk on the  Bernoulli factory problem

Moreover, we note that despite the fact that the techniques developed in N-P for simulating a real analytic g exhibit exponentially decaying tails, they are often not practical. Nesting the algorithm for f(p) = min{2p, 1 − 2ε} k times is very inefficient: one needs at least n0(ε)^k original p−coins for a single output.

Nevertheless, both algorithms, i.e. the original N-P and our martingale modification, use the same number of original p−coins for a single f(p)−coin output with f(p) = min{2p, 1 − 2ε}, and consequently also for simulating any real analytic function using the methodology of N-P. A significant improvement in terms of p−coins can be achieved only if the monotone super/sub-martingales can be constructed directly and used along with Algorithm 3. This is discussed in the next subsection.

Page 61: Omiros' talk on the  Bernoulli factory problem

Bernoulli Factory for alternating series expansions

Proposition
Let f : [0, 1] → [0, 1] have an alternating series expansion

f(p) = ∑_{k=0}^∞ (−1)^k ak p^k with 1 ≥ a0 ≥ a1 ≥ . . .

Then an f(p)−coin can be simulated by Algorithm 3 and the probability that it needs N > n iterations equals an p^n.

Page 62: Omiros' talk on the  Bernoulli factory problem

Proof.
Let X1, X2, . . . be a sequence of p−coins and define

L0 := 0, U0 := a0,

Ln := Un−1 − an ∏_{k=1}^n Xk if n is odd, Ln := Ln−1 if n is even,

Un := Un−1 if n is odd, Un := Ln−1 + an ∏_{k=1}^n Xk if n is even.

Clearly (4), (5), (6) and (7) are satisfied with s = f(p). Moreover,

un − ln = E Un − E Ln = an p^n ≤ an.

Thus if an → 0, the algorithm converges for p ∈ [0, 1]; otherwise for p ∈ [0, 1).

The exponential function is precisely in this family.
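The construction above can be sketched in a few lines of Python. This is our reading of Algorithm 3, not code from the talk: a single uniform G is drawn once, the bounds Ln and Un are tightened one p−coin at a time, and the algorithm stops as soon as G falls below Ln (output 1) or above Un (output 0); the function names are ours.

```python
import math
import random

def alternating_series_coin(p_coin, a, rng=random):
    """Simulate an f(p)-coin for f(p) = sum_k (-1)^k a_k p^k with
    1 >= a_0 >= a_1 >= ...  (sketch of Algorithm 3)."""
    g = rng.random()            # single uniform, compared against [L_n, U_n]
    l, u = 0.0, a(0)            # L_0 = 0, U_0 = a_0
    prod = 1.0                  # running product X_1 * ... * X_n
    n = 0
    while True:
        if g <= l:              # G below the lower bound: output 1
            return 1
        if g > u:               # G above the upper bound: output 0
            return 0
        n += 1
        prod *= p_coin()        # draw one more p-coin
        if n % 2 == 1:          # odd n tightens the lower bound
            l = u - a(n) * prod
        else:                   # even n tightens the upper bound
            u = l + a(n) * prod

# Example from the slide: f(p) = exp(-p) has a_k = 1/k!, decreasing from a_0 = 1.
```

With p_coin = lambda: 1 if rng.random() < p else 0 and a = lambda k: 1 / math.factorial(k), the empirical frequency of 1s approaches exp(−p); since Un − Ln = an ∏ Xk and an → 0, the loop terminates almost surely.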


Page 64: Omiros' talk on the  Bernoulli factory problem

Some References

I A. Beskos, G.O. Roberts. Exact simulation of diffusions. Annals of Applied Probability, 15(4):2422–2444, 2005.

I A. Beskos, O. Papaspiliopoulos, G.O. Roberts. Retrospective exact simulation of diffusion sample paths with applications. Bernoulli, 12:1077–1098, 2006.

I A. Beskos, O. Papaspiliopoulos, G.O. Roberts, and P. Fearnhead. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society B, 68(3):333–382, 2006.

I L. Devroye. Non-uniform random variate generation. Springer-Verlag, New York, 1986.

I M.S. Keane and G.L. O'Brien. A Bernoulli factory. ACM Transactions on Modeling and Computer Simulation (TOMACS), 4(2):213–219, 1994.

Page 65: Omiros' talk on the  Bernoulli factory problem

I P.E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Springer-Verlag, 1995.

I K. Latuszynski, I. Kosmidis, O. Papaspiliopoulos, G.O. Roberts. Simulating events of unknown probabilities via reverse time martingales. Random Structures and Algorithms, to appear.

I S. Nacu and Y. Peres. Fast simulation of new coins from old. Annals of Applied Probability, 15(1):93–115, 2005.

I O. Papaspiliopoulos, G.O. Roberts. Retrospective Markov chain Monte Carlo for Dirichlet process hierarchical models. Biometrika, 95:169–186, 2008.

I Y. Peres. Iterating von Neumann's procedure for extracting random bits. Annals of Statistics, 20(1):590–597, 1992.