accelerating rigid body simulation on today’s...
TRANSCRIPT
-
1 | Accelerating Rigid Body Simulation on Today’s GPUs
Accelerating Rigid Body Simulation
on Today’s GPUs
Takahiro Harada
-
2 | Accelerating Rigid Body Simulation on Today’s GPUs
RIGID BODY SIMULATION PIPELINE
Broad phase collision detection
– Quick check using bounding volumes
Narrow phase collision detection
– Detailed check using geometry
Solve A
B
C
D
E
F
-
3 | Accelerating Rigid Body Simulation on Today’s GPUs
WHY USE GPU?
Large scale simulation requires a lot of bodies
– Destruction
Increase the cost of a simulation
GPU has high-
– Peak performance
– Memory bandwidth
GPU is different from CPU
-
4 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
-
5 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Rahul Sathe, Rigid Body Collision
Detection on the GPU, Sig
Poster(2006)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
Cube map
Sathe(2006)
Cube map
! "#"$%&' $( %) ' **"+"' , %- . /. 0/"' , %' , %/1. %2 34%! "#$%&' "(#)&*+"#$%,- ,. "(#) / 01()%,234 56&7 8"4 &9": )&*"8"4 ,(,%": ) / 01()%,234 5&
; "4 01/ 4"56 ($&
#
@)#%/ , 5)-6 "C#)+"#$%&41"6 #-' #)-, 2#-0#)B&4A"*)' #(%"#-, )"%' "*)-, 37#
N"):' #*&, ' -2"%#)B&4A"*)' #@#( , 2#O7#D-3/ %"#J#' +&B' #)+"#(13&%-)+6 #-, #(*)-&, 7#! "#' +&&)#%(9' #0%&6 #*", )%&-2#&0#&4A"*)#@#))+"#?"%)"E#&0#
&4A"*)#O7#! "#*(1*/ 1()"#)+"#2-' )( , *"#4")B"", #)+"' "#)B$&-, )' #P2JQ#-, #B&%12#' $(*"7#= ' -, 3#)+"#2-' )( , *"#*/ 4"56 ($#1&&R5/ $C#B"#*( , #0-, 2#
&/ )#+&B#0(%#&4A"*)#@:' #4&/ , 2(%9#-' #0%&6 #-)' #*", )%&-2#-, #)+"#
2-%"*)-&, #&0#)+"#?"%)"E#&0#&4A"*)#O#P2HQ7#! "#)+", #*&6 $(%"#2H#( , 2#2J7#G0#2H#ST#2JC#)+", #)+"#(*)/ (1#2-' )( , *"#4")B"", #)+"#)B$&-, )' #-' #
1"' ' #)+(, #)+"#2-' )( , *"#0%&6 #*", )%&-2#)-)' #4&/ , 2(%97#8+-' #6 "( , ' #
)+()#B+", #2H#ST#2J#)+"#&4A"*)' #(%"#-, )"%' "*)-, 37#! "#%"$"( )#)+-' #' )"$#
0&%# (11# )+"# ?"%)-*"' # ( , 2# )+", # B-)+# )+"# &4A"*)' # @# ( , 2# O#-, )"%*+(, 3"2C#/ , )-1#)+-' #*&, 2-)-&, #-' #)%/ "7##
#
! "
# " $
# " %
# " &
# " '
# $'
# $$
# $"
! $
# " "
, $, "
, $
, "-. /012345/0, 6, 78/963:6; 01/03. 6"
:13? 6! " 6@87. >68A@510B? 526433CB@2
, $ DE/@546, 78/96F0/G00. 6! " 65. , 6#$"
H710E/73. 63:6; 01/7E0863:6234=>3. 6$6:13? 6E0. /016! " 63:6234=>3. 6"
, "
, " I 6/0
#C0
-
6 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Takahiro Harada et al. , Smoothed
Particle Hydrodynamics on GPUs,
Proc. of CGI(2007)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Cube map
Sathe(2006)
Cube map
Uniform
Grid
(Atomics)
Penalty
-
7 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Takahiro Harada, Real-time Rigid
Body Simulation on GPUs, GPU
Gems 3 (2007)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007) Cube map
Sathe(2006)
Cube map Particles
Uniform
Grid
(Atomics)
Penalty
-
8 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Scott Le Grand, Broad-phase
Collision Detection with CUDA,
GPU Gems3(2007)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007) Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Particles
Uniform
Grid
(Sort)
Uniform
Grid
(Atomics)
Penalty
-
9 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Peter Kipfer, LCP Algorithms for
Collision Detection using CUDA,
GPU Gems 3(2007)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Uniform
Grid
(Sort)
Uniform
Grid
(Atomics)
Penalty
-
10 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Takahiro Harada, Parallelizing the
Physics Pipeline, GDC(2009)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Constraint
Partially
Serial
Batching
Batching
Harada(2009)
Uniform
Grid
(Sort)
Uniform
Grid
(Atomics)
Penalty
-
11 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Richard Tonge, PhysX GPU Rigid
Bodies in Batman, Game
Programming Gems 8(2010)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Uniform
Grid
(Atomics)
Penalty
-
12 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Liu et al., Real-time Collision Culling
of a Million Bodies on Graphics
Processing Units, Siggraph
Asia(2010)
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Uniform
Grid
(Atomics)
Penalty Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Sweep &
Prune
-
13 | Accelerating Rigid Body Simulation on Today’s GPUs
Architecture & Algorithm
-
14 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU ARCHITECTURE
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
Global Memory
-
15 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Used GPU as a processor with
many ALUs
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Sweep &
Prune
Uniform
Grid
(Atomics)
Penalty
-
16 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU ARCHITECTURE
Global Memory
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
ALU
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
AL
U
LDS
Compute Unit (CU)
-
17 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
LCP
– A CU processes a pair
– Synchronization
– LDS
Broad phase collision
– Radix sort
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Particles
Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Geometry
Uniform
Grid
(Sort)
Sweep &
Prune
Uniform
Grid
(Atomics)
Penalty
-
18 | Accelerating Rigid Body Simulation on Today’s GPUs
GPU RIGID BODY TECHNIQUES
Solver using CU
Approximated narrowphase using
CU
– Performance
GPU Graphics GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Sweep &
Prune
Batching
Constraint
Representati
on for GPU
Uniform
Grid
(Atomics)
Penalty
-
19 | Accelerating Rigid Body Simulation on Today’s GPUs
Compute Unit Aware
Rigid Body Simulation
-
20 | Accelerating Rigid Body Simulation on Today’s GPUs
PIPELINE
Copy body and pair buffer
GPU allocates big buffers
– Contact
– Constraints
Narrow phase and solve is done on the GPU
Don’t have to read back big buffers
Bo
dy
Pai
r
Bo
dy
Co
nta
ct
Co
nst
rain
t
Pai
r
Merge
Dispatch Logic
CPU
Broad phase Collision
GPU
NP Collision
Solve
Body, Pair
Body Copy
Copy
Copy
Copy
-
21 | Accelerating Rigid Body Simulation on Today’s GPUs
NARROWPHASE – CU LEVEL PARALLELIZATION
Collision of a pair is independent
Use a SIMD lane for a pair
– Not enough resource for a SIMD lane
Use a CU for a pair
A CU processes and “append” colliding pairs
– HW accelerated append operation on AMD GPU
Load balancing
– CU can fetch a pair from queue
– Explicit pair split
CU level parallelization
Contact buffer
Pair buffer
Narrowphase
CU CU CU CU CU CU CU CU
-
22 | Accelerating Rigid Body Simulation on Today’s GPUs
NARROWPHASE – SIMD LANE PARALLELIZATION
A collision has to be split into parallel works
Parallel collision detection
Support arbitrary convex shapes
Too many contact points
– Increase cost of solver
Redundancy
– Use sync and LDS for vector wide operation
– Eliminate redundant contacts
Contact buffer
Pair buffer
Narrowphase
Read
Reduce
Write
Collide
-
23 | Accelerating Rigid Body Simulation on Today’s GPUs
SHAPE REPRESENTATION
-
24 | Accelerating Rigid Body Simulation on Today’s GPUs
SHAPE REPRESENTATION
-
25 | Accelerating Rigid Body Simulation on Today’s GPUs
SOLVER
Constraint based solver uses a Gauss Seidel method
– Velocity is input and output
– Penalty method
Sequential solve
Parallel solve is possible only if
– Bodies are not shared among constraints
When a body is colliding to several bodies, parallel
solve cannot be used
Solution
– Split constraints into batches
– Constraints in a batch doesn’t share bodies
-
26 | Accelerating Rigid Body Simulation on Today’s GPUs
BATCHING
2 level batching
– Global batching
Split into independent sections
– Local batching
Batch in each section
Contact buffer
Contact buffer
Local batching
CU CU CU CU CU CU CU CU
Global batching
CU CU CU CU CU CU CU CU
Contact buffer
-
27 | Accelerating Rigid Body Simulation on Today’s GPUs
ITERATIVE LOCAL BATCHING
A CU processes a section of contact buffer
– Local batching Global batching
– Extract independent pairs in a set
Stream processing pair buffer
– Read to local, reorder
Pairs are localized
– X Parallel batching
– X Serial batching
– O Iterative batching
Contact buffer
Local batching
Read
Extract Batch
Write
Contact buffer
-
28 | Accelerating Rigid Body Simulation on Today’s GPUs
SOLVER
A CU processes a section of contact buffer
GPU dispatches works by itself
– Read constraints
– Fill batch buffer
– Solve
– If the batch is done, go to the next batch
Previous work
– CPU had to dispatch a kernel per batch
Our method
– CPU dispatches some
– GPU dispatches works by itself
Solver
Read
Solve
Write
Contact buffer
Fill local batch
-
29 | Accelerating Rigid Body Simulation on Today’s GPUs
DEMO
OpenCL
Radeon HD5870
20K objects
About 30fps
GPU collide and solve ≈ 30ms
(Inc. data transfer)
-
30 | Accelerating Rigid Body Simulation on Today’s GPUs
DEMO
OpenCL
Radeon HD5870
12K objects
About 30fps
GPU collide and solve ≈ 30ms
(Inc. data transfer)
-
31 | Accelerating Rigid Body Simulation on Today’s GPUs
OFFLINE DEMO
400K objects
GPU collide and solve < 1s
-
32 | Accelerating Rigid Body Simulation on Today’s GPUs
LIMITATIONS
GPU prefers uniform work granularity
Some works are better to use CPUs
-
33 | Accelerating Rigid Body Simulation on Today’s GPUs
HETEROGENEOUS ERA
GPU Graphics Heterogeneous GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Uniform Grid
(Atomics)
Penalty Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Sweep &
Prune
Batching?
Constraint?
Representati
on for GPU
-
34 | Accelerating Rigid Body Simulation on Today’s GPUs
HETEROGENEOUS ERA
GPU Graphics Heterogeneous GPU Compute
Bro
ad p
hase
N
arr
ow
phase
Solv
e
SPH
Harada(2007)
Particle based rigid body
Harada(2007)
LCP
Kipfer(2007)
GPU Sweep & Prune
Liu(2010)
Cube map
Sathe(2006)
Sort based grid
Le Grand(2007)
Cube map Geometry Particles
Uniform Grid
(Atomics)
Penalty Constraint
Partially
Serial
Batching
Parallel
Batching
Constraint
Batching
Harada(2009) Batching
Tonge(2010)
Uniform
Grid
(Sort)
Sweep &
Prune
Batching?
Constraint?
Representati
on for GPU
AMD Fusion
– Llano, Zacate
Advantages
– Shared memory
– CPU & GPU can work together
-
35 | Accelerating Rigid Body Simulation on Today’s GPUs
HETEROGENEOUS PARTICLE BASED SIMULATION G
PU
CP
U0
CP
U1
CP
U2
Bu
ild
Ac
c. S
truc
ture
SS
Co
llisio
n
S
Integ.
LS
Co
llisio
n
LS
Co
llisio
n
LS
Co
llisio
n
Synchronization
Merge Merge Merge
LL Coll.
L
Integ.
Synchronization
S S
ortin
g
S S
ortin
g
S S
ortin
g
Synchronization
Takahiro Harada, Physics Simulation on Fusion Architecture, AMD Fusion
Developer Summit(2011)
-
36 | Accelerating Rigid Body Simulation on Today’s GPUs
REFERENCES
Peter Kipfer, LCP Algorithms for Collision Detection using CUDA, GPU Gems 3(2007)
Takahiro Harada et al. , Smoothed Particle Hydrodynamics on GPUs, Proc. of CGI(2007)
Takahiro Harada, Real-time Rigid Body Simulation on GPUs, GPU Gems 3 (2007)
Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, Siggraph
Asia(2010)
Takahiro Harada, Parallelizing the Physics Pipeline, GDC(2009)
Richard Tonge, PhysX GPU Rigid Bodies in Batman, Game Programming Gems 8(2010)
Rahul Sathe, Rigid Body Collision Detection on the GPU, Sig Poster(2006)
Scott Le Grand, Broad-phase Collision Detection with CUDA, GPU Gems3(2007)
Takahiro Harada, Physics Simulation on Fusion Architecture, AMD Fusion Developer Summint(2011)