compute-based gpu particle systemstwvideo01.ubm-us.net/o1/.../gareth_thomas_compute... ·...
TRANSCRIPT
![Page 1: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/1.jpg)
Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD
![Page 2: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/2.jpg)
Agenda
● Overview
● Collisions
● Sorting
● Tiled Rendering
● Conclusions
![Page 3: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/3.jpg)
Overview
● Why use the GPU?
● Highly parallel workload
● Free your CPU to do game code
● Leverage compute
![Page 4: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/4.jpg)
Overview
● Emit
● Simulate
● Sort
● Rendering
● Rasterization or Tiled Rendering
![Page 5: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/5.jpg)
Data Structures
Particle Pool
Sort List Dead List
Position, Velocity, Age, Color, etc..
uint index; float distanceSq; uint index;
![Page 6: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/6.jpg)
Particle Pool Position, Velocity, Age,
Color, etc..
Dead List uint index;
ConsumeStructuredBuffer<uint>
Emit Compute Shader Initialize Particles from Dead List
uint index = g_DeadList.Consume();
![Page 7: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/7.jpg)
Particle Pool
Dead List uint index;
AppendStructuredBuffer<uint>
Simulate Compute Shader Update Particles. Add alive ones to Sort List, add dead ones to
Dead List
Sort List uint index; float distanceSq
RWStructuredBuffer<float2>
g_DeadList.Append( index ); g_SortList.IncrementCounter();
RWStructuredBuffer<>
![Page 8: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/8.jpg)
Collisions
● Primitives
● Heightfield
● Voxel data
● Depth buffer [Tchou11]
![Page 9: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/9.jpg)
view space
Depth Buffer Collisions
● Project particle into screen space
● Read Z from depth buffer
● Compare view-space particle position vs view-space position of Z buffer value
● Use thickness value P(n)
P(n+1)
thickness
Z
![Page 10: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/10.jpg)
Depth Buffer Collision Response
● Use normal from G-buffer
● Or take multiple taps a depth buffer
● Watch out for depth discontinuities
![Page 11: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/11.jpg)
Sort Compute Shader
Sort List uint index; float distanceSq
RWStructuredBuffer<float2>
● Sort for correct alpha blending
● Additive blending just saturates the effect
● Bitonic sort parallelizes well on GPU
![Page 12: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/12.jpg)
Bitonic Sort
7 3 6 8
1 4 2 5
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2)
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2)
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 13: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/13.jpg)
2 5 1 4 6 8
7 3
Bitonic Sort (Pass 1)
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 2
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 14: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/14.jpg)
3 7
Bitonic Sort (Pass 2)
8 6
1 4 5 2
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 15: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/15.jpg)
3 6
Bitonic Sort (Pass 3)
8 7
5 4 1 2
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 16: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/16.jpg)
3 6
Bitonic Sort (Pass 4)
7 8
5 4 2 1
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 4
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 17: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/17.jpg)
3 4
Bitonic Sort (Pass 5)
2 1
5 6 7 8
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 18: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/18.jpg)
2 1
Bitonic Sort (Pass 6)
3 4
5 6 7 8
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
![Page 19: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/19.jpg)
Sort List
Vertex Shader Read Particle Buffer
Geometry Shader Expand one point to four. Billboard in view space.
Pixel Shader Texturing and tinting. Depth fade for soft particles.
Particle Pool
![Page 20: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/20.jpg)
Sort List
Vertex Shader Read particle buffer and billboard in view space
Pixel Shader Texturing and tinting. Depth fade for soft particles.
Particle Pool
Index Buffer
![Page 21: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/21.jpg)
Rasterization
● DrawIndexedIndirectInstanced() or DrawIndirectInstanced()
● VertexId = particle index (or VertexId/4 for VS billboarding)
● 1 instance
● Heavy overdraw on large particles – restricts game design
● Fit polygon billboard around texture [Persson09]
● Render to half size buffer [Cantlay07]
● Sorting issues
● Loss of fidelity
![Page 22: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/22.jpg)
Tiled Rendering ● Inspired by Forward+ [Harada12]
● Screen-space binning of particles instead of lights
● Per-tile
● Cull & Sort
● Per pixel/thread
● Evaluate color of each particle
● Blend together
● Composite back onto scene
![Page 23: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/23.jpg)
Tiled light particle culling
1 2
3
[1] [1,2,3] [2,3]
![Page 24: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/24.jpg)
Tiled particle culling
● Divide screen into tiles
● Fit asymmetric frustum around each tile
Tile0 Tile1 Tile2 Tile3
![Page 25: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/25.jpg)
Tiled particle culling
![Page 26: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/26.jpg)
Thread Group View ● numthreads[32,32,1]
● Culling 1024 particles in parallel
● Write visible indices to LDS
![Page 27: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/27.jpg)
Per Tile Bitonic Sort
● Because each thread adds a visible particle
● Particles are added to LDS in arbitrary order
● Need to sort
● Only sorting particles in tile rather than global list
![Page 28: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/28.jpg)
Tiled Rendering (1 thread = 1 pixel)
● Set accum color to float4( 0, 0, 0, 0 )
● For each particle in tile (back to front)
● Evaluate particle contribution
● Radius check
● Texture lookup
● Optional normal generation and lighting
● Manually blend
● color = ( srcA x srcCol ) + ( invSrcA x destCol )
● alpha = srcA + ( invSrcA x destA )
● Write to screen size UAV
![Page 29: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/29.jpg)
Tiled Rendering, improved! ● Set accum color to float4( 0, 0, 0, 0 )
● For each particle in tile (front to back)
● Evaluate particle contribution
● Manually blend [Bavoil08]
● color = ( invDestA x srcA x srcCol ) + destCol
● alpha = srcA + ( invSrcA x destA )
● if ( accum alpha > threshold )
accum alpha = 1 and bail
● Write to screen size UAV
![Page 30: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/30.jpg)
Coarse Culling ● Bin particles into 8x8
● UAV0 for indices ● Array split into sections using offsets
● UAV1 for storing particle count per bin ● 1 element per bin
● Use InterlockedAdd() to bump counter
● For each alive particle
● For each bin ● Test particle against bin’s frustum planes
● Bump counter in UAV1 to get slot to write to
● Add particle index to UAV0
![Page 31: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/31.jpg)
Demo
Demo with full source available soon
![Page 32: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/32.jpg)
Performance Results
mode frame time (ms)*
Rasterization 4.86
Tiled 3.15
*AMD Radeon R9 290X @ 1080p
Breakdown frame time (ms)*
Simulation 0.39
Coarse Culling 0.06
Tile Culling 0.43
Render 1.60
![Page 33: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/33.jpg)
Performance Results
mode frame time (ms)*
Rasterization 25.0
Tiled 5.1
*R9 290X @ 1080p
![Page 34: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/34.jpg)
Conclusions ● Leverage compute for particle simulations
● Depth buffer collisions
● Bitonic sort for correct blending
● Tiled rendering ● Faster than rasterization
● Great for combating heavy overdraw
● More predictable behavior
● Future work ● Volume tracing
● Add arbitrary geometry for OIT
![Page 35: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/35.jpg)
Questions?
Demo with full source available soon
http://developer.amd.com/tools/graphics-development/amd-radeon-sdk/
![Page 36: Compute-Based GPU Particle Systemstwvideo01.ubm-us.net/o1/.../Gareth_Thomas_Compute... · Compute-Based GPU Particle Systems Gareth Thomas Developer Technology Engineer, AMD](https://reader031.vdocuments.net/reader031/viewer/2022020215/5bb7be2f09d3f2751e8b608d/html5/thumbnails/36.jpg)
References ● [Tchou11] Chris Tchou, “Halo Reach Effects Tech”, GDC 2011
● [Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266
● [Cantlay07] Iain Cantlay, “High-Speed, Off-Screen Particles”, GPU Gems 3 2007
● [Harada12] Takahiro Harada et al, “Forward+: Bringing Deferred Lighting to the Next Level”, Short Papers, Eurographics 2012
● [Bavoil08] Louis Bavoil et al, “Order Independent Transparency with Dual Depth Peeling”, 2008