Fast evaluation of the inverse Poisson CDF
Mike Giles
University of Oxford
Mathematical Institute
Oxford e-Research Centre
GPU Technology Conference 2014
March 26, 2014
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 1 / 23
Outline
problem specification
incomplete Gamma function
CPUs versus GPUs
inverse approximation based on Temme expansion
Temme asymptotic evaluation
complete algorithm
results
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 2 / 23
Poisson CDF and inverse
A discrete Poisson random variable N with rate λ takes integer value n
with probability
e−λλn
n!
Hence, the cumulative distribution function is
C (n) ≡ P(N ≤ n) = e−λ
n∑
m=0
λm
m!.
To generate N, can take a uniform (0, 1) random variable U and then
compute N = C−1
(U), where N is the smallest integer such that
U ≤ C (N)
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 3 / 23
Poisson CDF and inverse
Illustration of the inversion process
4 6 8 10 12 14 16 180
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
u
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 4 / 23
Poisson CDF and inverse
When λ is fixed and not too large (λ<104 ?) can pre-compute C (n)and perform a table lookup.
When λ is variable but small (λ<10 ?) can use bottom-up/top-downsummation.
When λ is variable and large, then rejection methods can be used togenerate Poisson r.v.’s, but the inverse CDF is sometimes helpful:
stratified sampling
Latin hypercube
QMC
This is the problem I am concerned with — approximating C−1
(u) ata cost similar to the inverse Normal CDF, or inverse error function.
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 5 / 23
Poisson CDF and inverse
Illustration of the inversion process through rounding down of some
Q(u) ≡ C−1(u) to give C−1
(u)
4 6 8 10 12 14 16 180
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
u
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 6 / 23
Poisson CDF and inverse
Errors in approximating Q(u) can only lead to errors in rounding downif near an integer
4 6 8 10 12 14 16 180
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
u
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 7 / 23
Incomplete Gamma function
If X is a positive random variable with CDF
C (x) ≡ P(X < x) =1
Γ(x)
∫∞
λ
e−t tx−1 dt.
then integration by parts gives
P(⌊X ⌋ ≤ n) =1
n!
∫∞
λ
e−t tn dt = e−λ
n∑
m=0
λm
m!
=⇒ C−1
(u) = ⌊C−1(u)⌋
We will approximate Q(u) ≡ C−1(u) so that |Q̃(u)−Q(u)| < δ ≪ 1
This will round down correctly except when Q(u) is within δ of an integer– then we need to check some C (m)
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 8 / 23
CPUs and GPUs
On a CPU, if the costs of Q̃(u) and C (m) are CQ and CC , the averagecost is approximately
CQ + 2 δ CC .
However, on a GPU with a warp length of 32, the CC penalty is incurredif any thread in the warp needs it, so the average cost is
CQ +(1− (1−2 δ)32
)CC ≈ CQ + 64 δ CC if δ ≪ 1.
This pushes us to more accurate approximations for GPUs.
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 9 / 23
Temme expansion
Temme (1979) derived a uniformly convergent asymptotic expansionfor C (x) of the form
C (x) = Φ(λ
12 f (r)
)+ λ−
12 φ
(λ
12 f (r)
) ∞∑
n=0
λ−n an(r)
where r = x/λ and
f (r) ≡√
2 (1− r + r log r),
with the sign of the square root matching the sign of r−1.
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 10 / 23
Temme expansion
Based on this, can prove that the quantile function is
Q(u) ≈ λ r + c0(r)
wherer = f −1(w/
√λ), w = Φ−1(u)
and
c0(r) =log
(f (r)
√r/(r−1)
)
log r
Both f −1(s) and c0(r) can be approximated very accurately (over acentral range) by polynomials, and an additional ad hoc correction gives
Q̃T (u) = λ r + p2(r) + p3(r)/λ
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 11 / 23
Temme approximation
The function f (r)
0 0.5 1 1.5 2 2.5 3 3.5 4−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
r
f(r)
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 12 / 23
Temme approximation
Errors in f −1(s) and c0(r) approximations:
−1 0 1 2 3−8
−6
−4
−2
0
2
4
6
8
10x 10
−10
s
erro
r f−1 approximation
0 1 2 3 4−5
−4
−3
−2
−1
0
1
2
3
4
5x 10
−6
r
erro
r
c0 approximation
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 13 / 23
Temme approximation
Maximum error in Q̃T approximation:
101
102
103
104
105
0
0.2
0.4
0.6
0.8
1
1.2x 10
−5
x
max
imum
err
or
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 14 / 23
C (m) evaluation
In double precision, when Q̃(u) is too close to an integer m+1,we need to evaluate C (m) to choose between m and m+1.
When 12λ≤m≤2λ, this can be done very accurately using another
approximation due to Temme (1987).
Outside this range, a modified version of bottom-up / top-downsummation can be used, because successive terms decrease by factor2 or more.
In single precision this “correction” procedure does not improve theaccuracy.
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 15 / 23
Temme approximation
Maximum relative error in Temme approximation for C (m)
101
102
103
10−14
10−13
10−12
λ
max
imum
rel
ativ
e er
ror
1
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 16 / 23
The GPU algorithm (single precision)
given inputs: λ, u
if λ > 4w := Φ−1(u)s := w/
√λ
if smin<s<smax main branchr := p1(s)x := λ r + p2(r) + p3(r)/λ
elser := f −1(w/
√λ) Newton iteration
x := λ r + c0(r)x := x − 0.0218/(x+0.065λ)
end
n := ⌊x⌋
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 17 / 23
The GPU algorithm (single precision)
if x > 10return n
endend
use bottom-up summation to determine n
if u>0.5 and not accurate enoughuse top-down summation to determine n
end
Top-down summation finds smallest n such that
1−u ≥ e−λ
∞∑
m=n+1
λm
m!
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 18 / 23
The GPU algorithm (double precision)given inputs: λ, u
if λ > 4w := Φ−1(u)s := w/
√λ
if smin<s<smax
r := p1(s)x := λ r + p2(r) + p3(r)/λδ := 2×10−5
elser := f −1(w/
√λ)
x := λ r + c0(r)x := x − 0.0218/(x+0.065λ)δ := 0.01/λ
end
n := ⌊x+δ⌋
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 19 / 23
The GPU algorithm (double precision)
if x > 10if x−n > δ
return n
else if C (n) < u “correction” testreturn n
elsereturn n−1
endend
end
use bottom-up summation to determine n
if u>0.5 and not accurate enoughuse top-down summation to determine n
end
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 20 / 23
Accuracy
100
102
104
106
10−15
10−10
10−5
λ
L1error
poissinvfpoissinv
L1 errors of poissinvf and poissinv functions written in CUDA.
(It measures the fraction of the (0, 1) interval for which the error is ±1.)
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 21 / 23
Performance
Samples/sec for poissinvf and poissinv using CUDA 5.0
GTX670 K20λ poissinvf poissinv poissinvf poissinv
2 1.25e10 1.03e09 1.94e10 5.29e098 5.66e09 3.70e08 8.77e09 2.07e0932 8.07e09 6.98e08 1.25e10 4.20e09128 8.38e09 6.98e08 1.25e10 4.20e09mixed 4.91e09 3.00e08 6.83e09 1.64e09
normcdfinvf 1.96e10 2.70e10normcdfinv 9.61e08 7.15e09
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 22 / 23
Conclusions
By approximating the inverse incomplete Gamma function, havedeveloped an approach for inverting the Poisson CDF for λ>4
Computational cost is roughly cost of inverse Normal CDF functionplus three polynomials of degree 8–12
Report and open source CUDA implementation available now:http://people.maths.ox.ac.uk/gilesm/poissinv
Report also describes a second approximation which is faster forCPUs, but has more branching so is worse for GPUs
Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 23 / 23