Parallel Random Generator - GDC 2015
Parallel Random Generator
Manny Ko, Principal Engineer, Activision
Outline
● Serial RNG
● Background
● LCG, LFG, crypto-hash
● Parallel RNG
● Leapfrog, splitting, crypto-hash
RNG - desiderata
● White noise like
● Repeatable for any # of cores
● Fast
● Small storage
RNG Quality
● DIEHARD
● Spectral test
● SmallCrush
● BigCrush
GPUBBS
Power Spectrum
Figure panels: power spectrum density, radial mean, radial variance
Serial RNG: LCG
● Linear-congruential (LCG)
● $X_i = (a X_{i-1} + c) \bmod M$
● a, c and M must be chosen carefully!
● Never choose $M = 2^{31}$! M should be a prime
● Park & Miller: $a = 16807$, $m = 2147483647 = 2^{31} - 1$. $m$ is a Mersenne prime!
● Most likely in your C runtime
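A minimal Python sketch of the Park & Miller generator, assuming c = 0 and a seed in [1, m-1]. The function names are illustrative and do not come from the talk.

```python
A = 16807          # Park & Miller multiplier
M = 2**31 - 1      # 2147483647, a Mersenne prime

def minstd_next(x):
    """One LCG step with c = 0: X_i = a * X_{i-1} mod m."""
    return (A * x) % M

def minstd_sequence(seed, count):
    """Return the `count` values that follow `seed` (1 <= seed < M)."""
    if not 1 <= seed < M:
        raise ValueError("seed must be in [1, M-1]")
    values = []
    x = seed
    for _ in range(count):
        x = minstd_next(x)
        values.append(x)
    return values

print(minstd_sequence(1, 3))   # [16807, 282475249, 1622650073]
```

Park & Miller give a check value for implementations: starting from seed 1, the 10000th value generated should be 1043618065.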
LCG: the good and bad
● Good:
● Simple and efficient even if we use mod
● Single word of state
● Bad:
● Short period: at most m
● Low bits are correlated, especially if $m = 2^n$
● Purely serial
LCG - bad
● $X_{k+1} = (3 X_k + 4) \bmod 8$
● {1,7,1,7, … }
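The short cycle above can be reproduced with a small Python sketch. The helper `lcg_cycle` is a hypothetical name; it iterates the recurrence until a value repeats and returns the repeating part.

```python
def lcg_cycle(a, c, m, seed):
    """Iterate X_{k+1} = (a*X_k + c) mod m until a value repeats; return the cycle."""
    first_seen = {}    # value -> index of its first occurrence
    order = []
    x = seed
    while x not in first_seen:
        first_seen[x] = len(order)
        order.append(x)
        x = (a * x + c) % m
    return order[first_seen[x]:]

print(lcg_cycle(3, 4, 8, seed=1))        # [1, 7]
print(len(lcg_cycle(5, 1, 8, seed=0)))   # 8
```

With the same modulus, a = 5 and c = 1 visit all 8 residues, which shows how much the choice of a and c matters.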
Mersenne Prime modulo
● IDIV can be 40~80 cycles for 32b/32b
● $k \bmod p$ where $p = 2^s - 1$:
● i = (k & p) + (k >> s);
● return (i >= p) ? i - p : i;
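A Python sketch of the division-free reduction, assuming 0 <= k <= p², which covers the product of two residues. The name `mod_mersenne` is illustrative.

```python
def mod_mersenne(k, s):
    """Compute k mod (2**s - 1) without a division, for 0 <= k <= (2**s - 1)**2."""
    p = (1 << s) - 1
    # 2**s is congruent to 1 mod p, so the high bits can be folded onto the low bits.
    i = (k & p) + (k >> s)
    return i - p if i >= p else i

p = 2**31 - 1
k = 16807 * 123456789
print(mod_mersenne(k, 31) == k % p)   # True
```

A single fold is enough only within the stated range; larger inputs would need the fold repeated.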
Lagged-Fibonacci Generator
● $X_i = X_{i-p} \odot X_{i-q}$; p and q are the lags
● $\odot$ is +, −, or * mod M (or XOR)
● ALFG: $X_n = (X_{n-j} + X_{n-k}) \bmod 2^m$
● * gives the best quality
● Period $= (2^p - 1)\,2^{b-3}$; $M = 2^b$
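A minimal additive lagged-Fibonacci sketch in Python. The lags (24, 55), the 32-bit word size, and the seeding from a plain list are assumptions made for illustration; the talk does not specify them.

```python
from collections import deque
from itertools import islice

def alfg(seed_values, j=24, k=55, bits=32):
    """Yield X_n = (X_{n-j} + X_{n-k}) mod 2**bits from the k most recent values."""
    if not 0 < j < k or len(seed_values) != k:
        raise ValueError("need 0 < j < k and exactly k seed values")
    if all(v % 2 == 0 for v in seed_values):
        raise ValueError("at least one seed value must be odd")
    mask = (1 << bits) - 1
    # state[-1] is X_{n-1}; state[-k] (the oldest entry) is X_{n-k}.
    state = deque((v & mask for v in seed_values), maxlen=k)
    while True:
        x = (state[-j] + state[-k]) & mask
        state.append(x)          # maxlen drops the oldest value automatically
        yield x

first_five = list(islice(alfg(list(range(1, 56))), 5))
```

The state is the last k values, which is the storage cost the LFG pays for its longer period.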
LFG
● The good:
● Very efficient: 2 ops + power-of-2 mod
● Much longer period than LCG
● Works directly in floats
● Higher quality than LCG
● ALFG can skip ahead
LFG – the bad
● Need to store max(p,q) floats
● Purely sequential: the multiplicative LFG can’t jump ahead.
Mersenne Twister
● Gold standard ?
● Large state (624 ints)
● Lots of flops
● Hard to leapfrog
● Limited parallelism
Figure: power spectrum
● End of Basic RNG Overview
Parallel RNG
● Maintain the RNG’s quality
● Same result regardless of the # of cores
● Minimal state, especially for GPU.
● Minimal correlation among the streams.
Random Tree
• 2 LCGs with different 𝑎
• L used to generate a seed for R
• No need to know the # of generators or the # of values per thread
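A Python sketch of the random-tree idea, assuming both LCGs share the modulus 2^31 - 1 and use c = 0. The second multiplier (48271) and the function names are assumptions for illustration, not values from the talk.

```python
M = 2**31 - 1
A_LEFT = 16807     # multiplier of L, which hands out seeds
A_RIGHT = 48271    # multiplier of R, which generates values (assumed choice)

def spawn_seeds(root_seed, n_generators):
    """Use L to derive one seed per generator from a single root seed."""
    seeds, x = [], root_seed
    for _ in range(n_generators):
        x = (A_LEFT * x) % M
        seeds.append(x)
    return seeds

def stream(seed, count):
    """Use R to produce `count` values from one generator's seed."""
    out, x = [], seed
    for _ in range(count):
        x = (A_RIGHT * x) % M
        out.append(x)
    return out

seeds = spawn_seeds(12345, 4)
per_thread = [stream(s, 3) for s in seeds]
```

Asking for more generators or more values only extends the existing seeds and streams, so neither count has to be fixed in advance.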
Leapfrog with 3 cores
• Each thread leaps ahead by $N$ using L
• Each thread uses its own R to generate its own sequence
• $N = \text{cores} \times \text{seq\_per\_core}$
Leapfrog
● Basic LCG without c:
● $L_{k+1} = a L_k \bmod m$
● $R_{k+1} = a^n R_k \bmod m$
● LCG: $A = a^n$ and $C = c(a^n - 1)/(a - 1)$; each core jumps ahead by n (# of cores)
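A Python sketch of leapfrogging an LCG across n cores. It builds A and C by composing the single step n times, which equals $a^n$ and $c(a^n - 1)/(a - 1)$ modulo m without needing a modular division. All function names are illustrative.

```python
def lcg_serial(a, c, m, seed, count):
    """Plain serial LCG: the `count` values that follow `seed`."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out

def leapfrog_params(a, c, m, n):
    """Return (A, C) such that n serial steps equal x -> (A*x + C) mod m."""
    A, C = 1, 0
    for _ in range(n):
        A, C = (a * A) % m, (a * C + c) % m
    return A, C

def lcg_leapfrog(a, c, m, seed, n_cores, per_core):
    """Core i starts at serial value i and then strides by n_cores."""
    A, C = leapfrog_params(a, c, m, n_cores)
    starts = lcg_serial(a, c, m, seed, n_cores)
    streams = []
    for x in starts:
        values = [x]
        for _ in range(per_core - 1):
            x = (A * x + C) % m
            values.append(x)
        streams.append(values)
    return streams

streams = lcg_leapfrog(16807, 0, 2**31 - 1, seed=1, n_cores=3, per_core=4)
interleaved = [streams[i % 3][i // 3] for i in range(12)]
print(interleaved == lcg_serial(16807, 0, 2**31 - 1, 1, 12))   # True
```

Interleaving the per-core streams round-robin gives back the serial sequence, but only for the n the streams were built with.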
Leapfrog with 3 cores
• Each sequence will not overlap
• Final sequence is the same as the serial code
Leapfrog – the good
● Same sequence as serial code
● Limited choice of RNG (e.g. no MLFG)
● No need to fix the # of random values used per core (need to fix ‘n’)
Leapfrog – the bad
● $a^n$ no longer has the good qualities of $a$
● A power-of-2 N produces correlated sub-sequences
● Need to fix ‘n’, the # of generators/sequences
● The period of the original RNG is shortened by a factor of ‘n’. A 32-bit LCG has a short period to start with.
Sequence Splitting
• If we know the # of values per thread, $n$:
• $L_{k+1} = a^n L_k \bmod m$
• $R_{k+1} = a R_k \bmod m$
• Each thread’s sequence is a subset of the serial code’s sequence
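A Python sketch of sequence splitting for an LCG with c = 0, assuming the number of values per thread is known up front. The name `split_streams` is hypothetical.

```python
def split_streams(a, m, seed, n_threads, per_thread):
    """Give each thread a contiguous block of `per_thread` serial values."""
    jump = pow(a, per_thread, m)       # a^n mod m, the multiplier of L
    streams = []
    start = seed
    for _ in range(n_threads):
        x, block = start, []
        for _ in range(per_thread):
            x = (a * x) % m            # R: ordinary serial step inside the block
            block.append(x)
        streams.append(block)
        start = (jump * start) % m     # L: move the block start n steps ahead
    return streams

blocks = split_streams(16807, 2**31 - 1, seed=1, n_threads=3, per_thread=4)
flat = [v for block in blocks for v in block]
```

Concatenating the blocks in thread order reproduces the serial sequence; each thread's start can be computed independently as `pow(a, i * per_thread, m) * seed % m`.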
Leapfrog and Splitting
● Only guarantees that the sequences are non-overlapping; says nothing about their quality
● Not invariant to the degree of parallelism
● Results change when the # of cores changes
● Serial and parallel code do not match
Lagged-Fibonacci Leapfrog
● LFG has a very long period
● Period $= (2^p - 1)\,2^{b-3}$; $M = 2^b$
● 𝑀 can be power-of-two!
● Much better quality than LCG
● No leapfrog for the best variant – ‘*’
● Luckily the ALFG supports leapfrogging
Issues with Leapfrog & Splitting
● LCG’s period gets even shorter
● Questionable quality
● ALFG is much better but has to store more state for the ‘lag’.
Crypto Hash
● MD5
● TEA: tiny encryption algorithm
Core Idea
1. input trivially prepared in parallel, e.g. linear ramp
2. feed input value into hash, independently and in parallel
3. output white noise
Figure: input → hash → output
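The three steps can be sketched in Python with MD5 from the standard library. Taking the first 4 bytes of the digest and scaling them to [0, 1) is an assumption made for this illustration, and `hash_noise` is a hypothetical name.

```python
import hashlib
import struct

def hash_noise(index):
    """Hash a 32-bit index with MD5 and map the first 4 digest bytes to [0, 1)."""
    digest = hashlib.md5(struct.pack("<I", index)).digest()
    (word,) = struct.unpack("<I", digest[:4])
    return word / 2**32

ramp = list(range(8))                      # step 1: linear ramp as input
noise = [hash_noise(i) for i in ramp]      # step 2: each element hashed independently
# step 3: `noise` holds the output values
```

Each output depends only on its own index, so the values are the same no matter how many threads evaluate them or in what order.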
TEA
● A Feistel coder
● Input is split into L and R
● 128-bit key
● F: shifts, XORs, and adds
TEA
Magic ‘delta’
● $\mathit{delta} = (\sqrt{5} - 1) \cdot 2^{31}$
● Avalanche in 6 cycles (often in 4)
● * mixes better than ^ but makes TEA twice as slow
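A Python sketch of TEA used as a hash, following the standard TEA encryption loop with the number of cycles exposed as a parameter. The key words, the default of 8 cycles in `tea_noise`, and the use of (x, y) coordinates as the input block are assumptions for illustration.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9     # floor((sqrt(5) - 1) * 2**31)

def tea_encrypt(v0, v1, key, cycles=32):
    """Encrypt the 64-bit block (v0, v1) with a 128-bit key of four 32-bit words."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(cycles):
        total = (total + DELTA) & MASK
        # Feistel structure: each half is updated from the other half.
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1

# Arbitrary example key; any four 32-bit words would do.
EXAMPLE_KEY = (0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E)

def tea_noise(x, y, key=EXAMPLE_KEY, cycles=8):
    """Hash integer coordinates (x, y) to a float in [0, 1)."""
    v0, _ = tea_encrypt(x & MASK, y & MASK, key, cycles)
    return v0 / 2**32

tile = [[tea_noise(x, y) for x in range(4)] for y in range(4)]
```

Lowering `cycles` trades mixing quality for speed, which is the tuning knob the avalanche figures above refer to.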
Applications
● Fractal terrain (vertex shader)
● Texture tiling (fragment shader)
SPRNG
● Good package by Michael Mascagni
● http://www.sprng.org/
References
● [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999.
● [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988.
● [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994.
● [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008.
● [Wheeler 95] TEA, a Tiny Encryption Algorithm, 1995.
Take Aways
● Look beyond LCG
● ALFG is worth a closer look
● Crypto-based hash is most promising – especially TEA.