Sampling Secondary Particles in High Energy PhysicsSimulation on the GPU
Soon Yung JunFermilab
for the Geant Vector/Coprocessor R&D Team
GPU Computing in High Energy Physics
Sept. 10-12, 2014
Pisa, Italy
Introduction
• MC methods are often the only practicalway to sample random variables governedby complicated probability distributions
• MC methods that are widely used in highenergy physics and detector simulation
– inverse transform (analytical solution)– acceptance and rejection– composition and rejection (Geant4)
• Compton scattering: Formula vs. Geant4
– Klein-Nishina dσ[γ(E, 0) → γ′(E′, sin θ)]
– average number of trials Ns = 1.25,sampling efficiency = 0.80 at E = 1 MeV
using a composition and rejection
• What is a problem?
Scattered Photon Energy (MeV)0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
/dE
σC
ompt
on S
catte
ring
d
1
1.5
2
2.5
3 Geant4Klein-Nishina
E = 1 MeV
)S
Number of Trials for Sampling (N0 1 2 3 4 5 6 7 8 9 10
SdN
/dN
1
10
210
310
410
510
610
<Ns> = 1.25
S. Y. Jun @GPU in HEP 9/12/2014 2
Simulation with Coprocessors
• Optimal algorithms for coprocessors (GPUs, many cores)
– maximal instruction throughput and data locality– minimal branch and memory access
• Problems of sampling with “acceptance/composition and rejection” methods
– non-deterministic loop counter (the number of trials in sampling)
//sample x and y for a given yo
do {
x = f(r1);
y = g(x,r2);
} while (y<yo);
– implicit (do-while) loops: divergence, not vectorizable– adopt/change/develop algorithms for parallel or vector architectures
S. Y. Jun @GPU in HEP 9/12/2014 3
Problem Statement and Background
• Explore physics algorithms for HEP detector simulation suitable for SIMD/SIMT
• A part of R&D efforts for Geant Vector/Coprocessor Simulation
– concurrent framework (support heterogeneous computing models)– vectorized geometry (bucketized particle transport)– physics models (tabulated/vectorized/massively parallel)
S. Y. Jun @GPU in HEP 9/12/2014 4
Review: Sampling Techniques
• Inverse transform (cumulative distribution):F (a) =
R a
−∞f(x)dx for ∀f(x) (p.d.f)
u[0, 1] = F (x) → ∃x = F−1(u)
• Acceptance and rejection (Von Neumann):try x, accept if
u[0, 1] ≤f(x)
Ch(x)
efficient if proportionality const. C → 1
• Composition and rejection: decomposewith density functions, fi(x) and rejectionfunctions, 0 < gi(x) < 1
f(x) =n
X
i=1
αifi(x)gi(x)
• Alias method (A.J Walker, 1974)
0
1
0
1
F(x)
F(x)
} f (xk)
xxk+1xk
u
xx = F−1(u)
Continuous
distribution
Discrete
distribution
u
(a)
(b)
C h(x)
C h(x)
f (x)
x
f (x)
(a)
(b)
S. Y. Jun @GPU in HEP 9/12/2014 5
Alias Method for N Discrete Outcomes
• Recast p.d.f f(x) → c, N equal probableevents, each with likelihood 1/N = c
– alias, a[recipient]= donor– non-alias probability, q[i]– pdf, pj = f(xj) for interpolation
• Sample xj with random u[0.1)
– uniform (Alias): N · u = i + αselect j = (u ≤ q[i]) ? i : a[i]
xd = j∆x, xu = (j + 1)∆x
xj = (1 − α)xd + αxu
– linear interpolation (Alias2): testif u′(pj + pj+1) ≤ (1 − u)pj + u · pj+1
then x = xj
else x = αxd + (1 − α)xu
• Effectively vectorized
100 200 300 400 500 6000
0.001
0.002
0.003
0.004
0.005
0.006
100 200 300 400 500 6000
0.001
0.002
0.003
0.004
0.005
0.006
ix100 200 300 400 500 600
Pro
babi
lity
Den
sity
0
0.001
0.002
0.003
0.004
0.005
0.006
f(x)c = 1/N
f(x) > c : donorf(x) < c : recipient
x100 200 300 400 500 600
Equ
al O
utco
me
0
0.001
0.002
0.003
0.004
0.005
0.006
f(x)c = 1/N
S. Y. Jun @GPU in HEP 9/12/2014 6
Implementation
• Sampling methods
MC methods Sampling Table
Composition and rejection do-while loop on-the-fly calc.
Inverse transform uniform (InverseCDF) q[double]
linear (InverseCDF2) q[double], p[double]
Alias uniform (Alias) a[int], q[double]
linear (Alias2) a[int], q[double], p[double]
• Physics models studied and size of sampling tables
Process Model Secondaries NS Bin Scale Table Size
γ Compton Klein-Nishina e− 1.012 linear 100×100
e− Bremsstrahlung Seltzer-Berger γ 1.943 log 100×1000
e− Ionization Moller e− 2.162 linear 100×100
e+ Ionization Bhabha e− 4.934 linear 100×100
Table 1: “Composition of rejection methods” used in Geant4 EM physics models
and tested for this talk with primary particle E = exp(−x/200) in x[1, 1001] MeV,
where NS = average number of trials, ǫ = Nevents/N(total trials)
S. Y. Jun @GPU in HEP 9/12/2014 7
Test Setups
• Build 2-dimensional sampling tables for each physics models on CPU
– size of double[m][n] (ex: m=energy bins, n=sampling bins)– convert to 1D array: inverseCDF (p, q)m×n and alias (a, p, q)m×n
– remove/refine valley/tail ambiguity for inverseCDF (see backup, p26)
• Data transfer to GPU
– sampling tables and random states: one time– primary tracks (particles): recurrent per task
• Kernel set-up (performance measurement): sample secondary particles including
– task1: read energy of particles and write back the updated momenta– task2: create secondary particles and store them to the global memory (GST)
• Copy secondaries to CPU for validations
• Other considerations: impact of conditional loops, texture memory,data layout (AoS vs. SoA)
S. Y. Jun @GPU in HEP 9/12/2014 8
Validation: GPU vs. CPU
• Comparisons: should be identical except the random sequence
– store secondaries on GPU and copy them to CPU (D2H)– compare entries of secondaries produced on GPU to those on CPU– ex: Compton - scattered energy (difference is a statistical fluctuation?)
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
Geant4
GPU
CPU
Geant4
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
InverseCDF
GPU
CPU
InverseCDF
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
InverseCDF2
GPU
CPU
InverseCDF2
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
Alias
GPU
CPU
Alias
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
10
210
310
410
510
610
710
Alias2
GPU
CPU
Alias2
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
GP
U/C
PU
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25Geant4Geant4
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
GP
U/C
PU
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25InverseCDFInverseCDF
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
GP
U/C
PU
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25InverseCDF2InverseCDF2
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
GP
U/C
PU
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25AliasAlias
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
GP
U/C
PU
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25Alias2Alias2
S. Y. Jun @GPU in HEP 9/12/2014 9
Validation: GPU vs. CPU
• Compton - scattered angle (rotated to the incident γ direction)
– the angle is (100%) correlated to the scattered energy, E′
E = meme+E(1−cos θ)
– difference is less than 1.0% for the given sample (a ruler of measurement)
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
610
Geant4
GPU
CPU
Geant4
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
610
InverseCDF
GPU
CPU
InverseCDF
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(6
10
InverseCDF2
GPU
CPU
InverseCDF2
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
610
Alias
GPU
CPU
Alias
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
610
Alias2
GPU
CPU
Alias2
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
GP
U/C
PU
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05Geant4Geant4
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
GP
U/C
PU
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05InverseCDFInverseCDF
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
GP
U/C
PU
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05InverseCDF2InverseCDF2
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
GP
U/C
PU
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05AliasAlias
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
GP
U/C
PU
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05Alias2Alias2
S. Y. Jun @GPU in HEP 9/12/2014 10
Method Validation: Comparisons on CPU
• Energy and angle of scattered particles (4 models × 5 methods)
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
Compton : Energy
inverseCDF
inverseCDF2
alias
alias2
geant4
Compton : Energy
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
610
Compton : AngleCompton : Angle
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
Met
hod/
Gea
nt4
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
Compton : Energy (Ratio)
inverseCDF
inverseCDF2
alias
alias2
Compton : Energy (Ratio)
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Met
hod/
Gea
nt4
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05
Compton : Angle (Ratio)
inverseCDF
inverseCDF2
alias
alias2
Compton : Angle (Ratio)
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
210
310
410
510
610
710
SeltzerBerger : Energy
inverseCDF
inverseCDF2
alias
alias2
geant4
SeltzerBerger : Energy
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
410
510
610
710
SeltzerBerger : AngleSeltzerBerger : Angle
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
Met
hod/
Gea
nt4
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
SeltzerBerger : Energy (Ratio)
inverseCDF
inverseCDF2
alias
alias2
SeltzerBerger : Energy (Ratio)
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Met
hod/
Gea
nt4
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
1.03
1.04
1.05
SeltzerBerger : Angle (Ratio)
inverseCDF
inverseCDF2
alias
alias2
SeltzerBerger : Angle (Ratio)
• The InverseCDF (uniform) shows a bias in Compton Scattering
S. Y. Jun @GPU in HEP 9/12/2014 11
Method Validation: Comparisons on CPU
• Energy and angle of scattered particles (4 models × 5 methods)
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
1
10
210
310
410
510
610
710
Moller : Energy
inverseCDF
inverseCDF2
alias
alias2
geant4
Moller : Energy
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
1
10
210
310
410
510
610
Moller : AngleMoller : Angle
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
Met
hod/
Gea
nt4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
Moller : Energy (Ratio)
inverseCDF
inverseCDF2
alias
alias2
Moller : Energy (Ratio)
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Met
hod/
Gea
nt4
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
Moller : Angle (Ratio)Moller : Angle (Ratio)
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
/dE
σd
1
10
210
310
410
510
610
710
Bhabha : Energy
inverseCDF
inverseCDF2
alias
alias2
geant4
Bhabha : Energy
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
)θdN
/dsi
n(
410
510
610
Bhabha : AngleBhabha : Angle
Sampled Energy (MeV)
100 200 300 400 500 600 700 800 900 1000
Met
hod/
Gea
nt4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
Bhabha : Energy (Ratio)Bhabha : Energy (Ratio)
)θSampled sin(
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Met
hod/
Gea
nt4
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
Bhabha : Angle (Ratio)
inverseCDF
inverseCDF2
alias
alias2
Bhabha : Angle (Ratio)
• Alias and Alias2 show good agreements compared with Geant4 for all models
S. Y. Jun @GPU in HEP 9/12/2014 12
Performance Measurements
• HardwareHost (CPU) Device (GPU)
Intel R© Xeon R© E5-2620 Nvidia K20(Kepler)
12 cores @ 2.0 GHz 2496 cores @0.7 GHz
– compilation: cuda 6.0, nvcc -arch=sm 35 –optimize 3 –use fast math
• Initialization
– build tables on CPU and allocate them to GPU : ∼ 3 msec for q[100 × 100]– set up random states: (blocks × threads) curandState
• Measurement
– 1000 events, 79872 tracks/event, sample one secondary particle/track– thread execution: blocks per grid = 26, threads per block = 192– time in [msec]: CPU Time (1 core) vs. GPU Time (all cores)
• Comparisons: relative gain, speedup (CPU/GPU) with RMS errors
– 5 sampling methods and 4 physics models– 2 kernels: task1, task2 (GST)
S. Y. Jun @GPU in HEP 9/12/2014 13
Relative Gain w.r.t Composition and Rejection
Sampling Method
InverseCDF InverseCDF2 Alias Alias2
Tim
e(R
ejec
tion)
/Tim
e(M
etho
d)
0
1
2
3
4Relative Gain: KleinNishina
CPU (+GST)CPUGPU (+GST)GPU
Relative Gain: KleinNishina
Sampling Method
InverseCDF InverseCDF2 Alias Alias2
Tim
e(R
ejec
tion)
/Tim
e(M
etho
d)
0
5
10
Relative Gain: SeltzerBergerCPU (+GST)CPUGPU (+GST)GPU
Relative Gain: SeltzerBerger
Sampling Method
InverseCDF InverseCDF2 Alias Alias2
Tim
e(R
ejec
tion)
/Tim
e(M
etho
d)
0
1
2
3
4Relative Gain: Moller
CPU (+GST)CPUGPU (+GST)GPU
Relative Gain: Moller
Sampling Method
InverseCDF InverseCDF2 Alias Alias2
Tim
e(R
ejec
tion)
/Tim
e(M
etho
d)
0
2
4
6Relative Gain: Bhabha
CPU (+GST)CPUGPU (+GST)GPU
Relative Gain: Bhabha
S. Y. Jun @GPU in HEP 9/12/2014 14
Performance: CPU/GPU
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
<C
PU
Tim
e/G
PU
Tim
e>
0
50
100
CPU/GPU: KleinNishinaSamplingSampling + GST
CPU/GPU: KleinNishina
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
<C
PU
Tim
e/G
PU
Tim
e>
0
50
100
150
CPU/GPU: SeltzerBerger
SamplingSampling + GST
CPU/GPU: SeltzerBerger
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
<C
PU
Tim
e/G
PU
Tim
e>
0
20
40
60
80CPU/GPU: Moller
SamplingSampling + GST
CPU/GPU: Moller
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
<C
PU
Tim
e/G
PU
Tim
e>
0
20
40
60
80CPU/GPU: Bhabha
SamplingSampling + GST
CPU/GPU: Bhabha
S. Y. Jun @GPU in HEP 9/12/2014 15
GPU Time by Kernel Components
• GPU time contribution by each step (reference: Moller InverseCDF)
– Base: invoke objects, read/write track energy (39%)– Random: invoke random states (33%)– InverseCDF: bin search and memory access for CDF (28%)– Alias (+22%), InverseCDF2 (+32%), Alias2 (+38%)
SampleBy Kernel
Base RandomInverseCDF
Alias InverseCDF2
Alias2
Mea
n T
ime/
Eve
nt [m
sec]
0
0.1
0.2
0.3
0.4GPU Time: Moller
S. Y. Jun @GPU in HEP 9/12/2014 16
Impact of Implicit Loops: Divergence
• Degradation due to divergence is not proportional to Nmax ∼ O(〈Ns〉 log Ntotal)
– cost for an additional trial is relatively inexpensive in GPU– impact of implicit loops is relatively weak when 〈NS〉 or Nmax is large
N of Trial LoopsG4 1 2 3 4 5 6 7 8 9
Mea
n T
ime/
Eve
nt [m
sec]
0
20
40
60
80
Moller: CPU
N of Trial LoopsG4 1 2 3 4 5 6 7 8 9
Mea
n T
ime/
Eve
nt [m
sec]
0
0.2
0.4
0.6
0.8
Moller: GPU
Model G4 NS G4 GPU [ms] Fixed NS Fixed GPU [ms] G4/Fixed
Klein-Nishina 1.012 0.214 ± 0.002 1.0 0.209 ± 0.002 1.03
Seltzer-Berger 1.943 3.675 ± 0.061 2.0 1.866 ± 0.011 1.97
Moller 2.162 0.393 ± 0.007 2.0 0.243 ± 0.002 1.62
Bhabha 4.934 0.552 ± 0.014 5.0 0.273 ± 0.001 2.03
S. Y. Jun @GPU in HEP 9/12/2014 17
SoA vs. AoS: Alias and Alias2
• AoS vs. SoA: struct AoS { struct SoA {
G4int a; G4int *a;
G4double p; G4double *p;
G4double q; G4double *q;
}; };
AoS *aos; SoA soa;
Sampling Model
AoS Alias SoA Alias AoS Alias2 SoA Alias2
Mea
n T
ime/
Eve
nt [m
sec]
0.2
0.25
0.3
0.35
0.4
GPU
Sampling Model
AoS Alias SoA Alias AoS Alias2 SoA Alias2
Mea
n T
ime/
Eve
nt [m
sec]
5
10
15
20
CPU
– GPU: SoA is worse than AoS (uncoalesced memory access by eachthread for random array lookups)
– CPU: SoA is better than AoS (∼ 20%)
S. Y. Jun @GPU in HEP 9/12/2014 18
Texture
• 1D Texture (double) for sampling (inverse CDF, PDF) tables
texture<int2, 1> texInverseCDF;
static __inline__ __device__ G4double FetchToDouble(texture<int2, 1> T, int i)
{
int2 v = tex1Dfetch(T,i);
return __hiloint2double(v.y, v.x);
}
Sampling MethodInverseCDF InverseCDFText InverseCDF2 InverseCDF2Text
Mea
n T
ime/
Eve
nt [m
sec]
0
0.2
0.4
0.6
1D Texture (double)SeltzerBergerKleinNishinaMollerBhbha
1D Texture (double)
– 10-20% improvement (use int2 and cast it to double, no linear filtering)
S. Y. Jun @GPU in HEP 9/12/2014 19
Conclusion
• Evaluated performance of different MC methods to sample secondaryparticles produced by EM physics processes
– performance degradation due to the thread divergence (with compositionand rejection method) is not as significant as expected
– inverse transform (inverse CDF method) or alias method improvecomputing performance in both CPU and GPU and may replace theconventional composition and rejection method
– alias methods show no bias in sampled distributions for EM modelstested and are suitable for both parallel and vector architectures
• Perspective
– adopt and develop HEP algorithms for HPC– GPU prototype as a task-driven (high throughput applications)– vector prototype as a framework-driven (vector pipeline)– portable codes for different architectures (code abstraction)
S. Y. Jun @GPU in HEP 9/12/2014 20
Acknowledgment
• Geant Vector/Coprocessor R&D Team: http://geant.cern.ch
– John Apostolakis, Rene Brun, Federico Carminati,Andrei Gheata, Mihaly Novak, Sandro Wenzel (CERN)
– Philippe Canal, Victor Daniel Elvira, Soon Yung Jun,Guilherme Lima (Fermilab, US)
– Marilena Bandieramonte (Universita e INFN, IT)– Georgios Bitzes (University of Athens, GR)– Johannes Christof De Fine Licht (University of Copenhagen, DK)– Laurent Duhem (Intel)– Raman Sehgal (Bhabha Atomic Research Centre, IN)– Oksana Shadura (National Technical Univ. of Ukraine)
• Members of the US ASCR-HEP Collaboration
– Azamat Mametjanov (ANL, US)– Boyana Norris (Univ. of Oregon, US)
S. Y. Jun @GPU in HEP 9/12/2014 21
Backup
S. Y. Jun @GPU in HEP 9/12/2014 22
Composition and Rejection Method
• Combination of the “composition” and rejection methods for the case that
– f(x) is a complicated function– acceptance and rejection is inefficient
• Widely used in Geant4 to sample secondary particles for a given interaction
– to sample x from the distribution f(x) which can be written as
f(x) =
nX
i=1
αifi(x)gi(x)
– where αi > 0 (fraction), fi(x) (PDFs), 0 ≤ gi(x) ≤ 1 (rejection functions)– select i based on αi, try x′ from fi(x) and calculate gi(x
′)– x = x′ if gi(x
′) < u (random u[0, 1)), otherwise try again
• Very powerful method for sampling x based on the distribution f(x) if
– fi/gi can be sampled/evaluated easily– f(x) may not be normalized and the mean number of trials is∑
i αi/C where∫
f(x)dx = C ≈ 1
S. Y. Jun @GPU in HEP 9/12/2014 23
Klein-Nishina Compton Scattering
• Klein-Nishina Differential Cross Section: γ(E, 0) → γ′(E′, sin θ)
dσ
dǫ=
Xonπr2ome
E2[1
ǫ+ ǫ][1 −
ǫ sin2 θ
1 + ǫ2]
ǫ =E′
E=
me
me + E(1 − cos θ)
• Composition and Rejection (Geant4)
f(ǫ) = [1
ǫ+ ǫ] =
2X
i=1
αifi(ǫ)
g(ǫ) = [1 −ǫ sin2 θ
1 + ǫ2]
decompose f(ǫ) with ǫo = 1/(1 + 2E/me) for the minimum γ energy as
α1 × f1(ǫ) = ln(1/ǫo) ×1
ǫ ln(1/ǫo), α2f2(ǫ) =
1
2(1 − ǫ
2o) ×
2ǫ
(1 − ǫ2o)
S. Y. Jun @GPU in HEP 9/12/2014 24
Inverse Probability Distribution (Technical Details)
• Building Inverse PDF, {F−1(u)[k] | k = 0, N}
– find the bin index jk for F−1(u)[k] searching F (x)[k] until ∆u × k < F (x)[j]
– build F−1(u)[k] = ∆x[(jk − 1) + δj] taking into account the spread in[jk − 1, jk] (linear interpolation for the same j over k)
//linear interpolation within the interval with the same index
unsigned int last, cnt, lcnt;
last = cnt = lcnt = 0;
for(int i = 0; i < n ; ++i) {
if(ip[i] > last) {
for(int j = 0 ; j < (cnt-lcnt) ; ++j ) {
q[j+lcnt] = dy*(last + (1.0*j/(cnt-lcnt)));
}
last = ip[i];
lcnt = cnt;
}
++cnt;
}
//for the last bin
q[n-1] = 1.0;
S. Y. Jun @GPU in HEP 9/12/2014 25
Alias Method (Technical Details)
• Building the alias table (a[n]) and the non-alias probability (q[n])
– build the discrete probability density function, p[k] = f(xk) with xk = k∆x
– calculate the average likelihood per equal probable event;
c =1
n − 1
– pick a pair (i, j) =(recipient,donor) such that non-zero p[i] ≤ c and p[j] > c– evaluate alias and non-alias probability for recipient
a[recipient] = donor;
q[recipient] = n*p[recipient];
– update pdfp[donor] = p[donor] - (c-p[recipient]);
p[recipient] = 0.0;
– repeat this for O(n) iterations
S. Y. Jun @GPU in HEP 9/12/2014 26
Performance: Time
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
Mea
n T
ime/
Eve
nt [m
sec]
-110
1
10
210
Time: KleinNishinaCPU (+GST)CPUGPU (+GST)GPU
Time: KleinNishina
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
Mea
n T
ime/
Eve
nt [m
sec]
-110
1
10
210
310
Time: SeltzerBergerCPU (+GST)CPUGPU (+GST)GPU
Time: SeltzerBerger
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
Mea
n T
ime/
Eve
nt [m
sec]
-110
1
10
210
Time: MollerCPU (+GST)CPUGPU (+GST)GPU
Time: Moller
Sampling Method
InverseCDF InverseCDF2 Alias Alias2 Rejection
Mea
n T
ime/
Eve
nt [m
sec]
-110
1
10
210
Time: BhabhaCPU (+GST)CPUGPU (+GST)GPU
Time: Bhabha
S. Y. Jun @GPU in HEP 9/12/2014 27