Holy Smoke! Faster Particle Rendering Using DirectCompute, by Gareth Thomas


Upload: amd-developer-central

Post on 16-Apr-2017


Holy Smoke! Faster Particle Rendering Using DirectCompute
AMD and Microsoft Developer Day, June 2014, Stockholm
Gareth Thomas
2nd June 2014

Faster Particle Rendering Using DirectCompute | AMD and Microsoft Game Developer Day - June 2 2014, Stockholm

Plan for Today
- Simulation Overview
- Collisions
- Sorting
- Tiled Rendering
- Conclusions


Overview: Motivation
Why use the GPU for simulation?
- Highly parallel workload
- Free your CPU to do other cool stuff
- Leverage compute
  - Take advantage of the Local Data Share (LDS)
  - Asynchronous compute on some platforms


Overview: How to Build a GPU Particle System
- Emit
- Simulate
- Sort
- Render
  - Rasterize billboards
  - Tiled rendering using DirectCompute


Simulation Overview: How the Simulation Fits Together
- Emit Compute Shader: reads free indices from the Dead List and writes new particle data into the global Particle Array.
- Simulate Compute Shader: updates particles; adds alive ones to the Alive List and dead ones to the Dead List.
- Particle Array: persistent array of particle data.
- Dead List: persistent list of free particle indices.
- Alive List: list of alive particle indices, rebuilt each frame by the Simulate Compute Shader.
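The interplay between the particle array, the dead list and the alive list can be sketched on the CPU as follows. This is only an illustration, not the shader code from the talk: the class name `ParticlePool`, the particle fields (`pos`, `vel`, `life`) and the use of `None` to mark a free slot are all assumptions made for the example.

```python
class ParticlePool:
    """CPU stand-in for the particle array, dead list and alive list."""

    def __init__(self, capacity):
        self.particles = [None] * capacity       # persistent particle data
        self.dead_list = list(range(capacity))   # persistent list of free indices
        self.alive_list = []                     # rebuilt every simulate() call

    def emit(self, count, make_particle):
        """Emit step: take free indices from the dead list, write particle data."""
        emitted = 0
        while emitted < count and self.dead_list:
            index = self.dead_list.pop()
            self.particles[index] = make_particle()
            emitted += 1
        return emitted

    def simulate(self, dt):
        """Simulate step: update particles and rebuild the alive list."""
        self.alive_list = []
        for index, p in enumerate(self.particles):
            if p is None:                        # slot is free, nothing to do
                continue
            p["life"] -= dt
            if p["life"] > 0.0:
                p["pos"] += p["vel"] * dt
                self.alive_list.append(index)
            else:                                # particle died: recycle its slot
                self.particles[index] = None
                self.dead_list.append(index)


pool = ParticlePool(4)
pool.emit(3, lambda: {"pos": 0.0, "vel": 1.0, "life": 1.0})
pool.simulate(0.5)
print(len(pool.alive_list), len(pool.dead_list))
```

On the GPU both lists would be append/consume style buffers with atomic counters; the Python lists only mimic that behaviour.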


Collisions: A GPU-Based Solution
- Can no longer use CPU-side physics engine for collisions
- Use depth buffer [Tchou11]
  - Project particle into screen space and read depth buffer
  - Project particle into view space
  - Transform depth buffer value into view space and compare depths
- Generate collision response
  - Use G-buffer normals
  - Or take multiple depth samples to reconstruct the normal
[Diagram: particle moving from P(n) to P(n+1) in view space, with scene depth Z and a thickness behind it]
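The depth comparison and the response can be sketched as below. This is a hedged illustration rather than the code from the talk: it assumes a Direct3D-style depth buffer in [0, 1] with view-space z increasing away from the camera, and the function names, the `restitution` parameter and the reflection formula are choices made for the example.

```python
def linearize_depth(d, near, far):
    """Convert a [0, 1] depth-buffer value into view-space depth.

    Assumes a standard Direct3D perspective projection.
    """
    return (near * far) / (far - d * (far - near))


def collides(particle_z, scene_z, thickness):
    """True if the particle is behind the visible surface, but only just.

    Particles further back than `thickness` are assumed to be in free
    space behind the geometry.
    """
    return scene_z < particle_z < scene_z + thickness


def reflect(velocity, normal, restitution):
    """Bounce the velocity off a surface with the given unit normal."""
    d = sum(v * n for v, n in zip(velocity, normal))
    if d >= 0.0:                      # already moving away from the surface
        return tuple(velocity)
    scale = (1.0 + restitution) * d
    return tuple(v - scale * n for v, n in zip(velocity, normal))


scene_z = linearize_depth(0.5, 0.1, 100.0)
if collides(scene_z + 0.01, scene_z, 0.5):
    print(reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0), 0.5))
```

The normal would come from the G-buffer or from neighbouring depth samples, as listed on the slide.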


Collisions: Problems with Using the Depth Buffer
- Only collides against geometry in the depth buffer
  - Particles would collide against the depth buffer even if they are behind the geometry
  - Use a thickness value to assume particles are in free space behind geometry
- Particles don't collide when they are off screen
  - Causes issues when particles at rest on the floor go off-screen: they fall through the world and disappear
  - Put particles to sleep in the simulation once they have come to rest
  - Use G-buffer to mark parts of the scene that particles can sleep on (static objects)
- Not multi-GPU friendly!
  - Switch off depth buffer collisions in MGPU mode
[Image caption: Fallen through world!]


Bitonic Sort

Example input: 7 3 6 8 1 4 2 5

    // arraySize: number of elements to sort (8 in this example)
    for( subArraySize=2; subArraySize<=arraySize; subArraySize*=2 )
    {
        for( compareDist=subArraySize/2; compareDist>0; compareDist/=2 )
        {
            // Begin: GPU part of the sort
            for each element n
                n = selectBitonic(n, n^compareDist);
            // End: GPU part of the sort
        }
    }

Pass  subArraySize  compareDist  Array at start of pass
1     2             1            7 3 6 8 1 4 2 5
2     4             2            3 7 8 6 1 4 5 2
3     4             1            3 6 8 7 5 4 1 2
4     8             4            3 6 7 8 5 4 2 1
5     8             2            3 4 2 1 5 6 7 8
6     8             1            2 1 3 4 5 6 7 8

Result after pass 6: 1 2 3 4 5 6 7 8
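The six passes of the 8-element example can be replayed with the short CPU sketch below. The slides do not show the body of `selectBitonic`, so the selection rule here (keep the minimum or maximum of the pair, depending on the element's position and the sort direction of its sub-array) is an assumption; the function name `bitonic_sort_trace` is likewise made up for the example.

```python
def bitonic_sort_trace(values):
    """Sort a power-of-two sized list, recording the array after each pass."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    passes = []
    sub_array_size = 2
    while sub_array_size <= n:
        compare_dist = sub_array_size // 2
        while compare_dist > 0:
            # One pass: every element picks its new value independently,
            # which is what makes the pass a good fit for one GPU thread each.
            result = a[:]
            for i in range(n):
                partner = i ^ compare_dist
                ascending = (i & sub_array_size) == 0
                keep_min = (i < partner) == ascending
                lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                result[i] = lo if keep_min else hi
            a = result
            passes.append((sub_array_size, compare_dist, a[:]))
            compare_dist //= 2
        sub_array_size *= 2
    return a, passes


sorted_values, passes = bitonic_sort_trace([7, 3, 6, 8, 1, 4, 2, 5])
for sub_array_size, compare_dist, state in passes:
    print(sub_array_size, compare_dist, state)
```

For particle rendering the keys would be view-space distances and the payload the particle indices; plain integers keep the trace readable.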


Rendering
Buffers: Sorted Alive List, Particle Pool
- Vertex Shader: read particle buffer
- Geometry Shader: expand one point to four; billboard in view space
- Pixel Shader: texturing and tinting; depth fade for soft particles


Rendering
Buffers: Sorted Alive List, Particle Pool, Index Buffer
- Vertex Shader: read particle buffer and billboard in view space
- Pixel Shader: texturing and tinting; depth fade for soft particles


Rendering: Rasterization for Old-School GPU Particle Systems
- The alive particle count is only available on the GPU
  - Use the indirect API
- DrawInstancedIndirect( GPU-args ) for Geometry Shader billboards
  - D3DPT_POINTLIST with no VB, IB or IA
  - VertexId = Particle index
  - VertexCountPerInstance = NumParticles
  - InstanceCount = 1
  - Geometry Shader expands the point into four vertices and a 2 triangle strip per billboard
- Or better still: DrawIndexedInstancedIndirect( GPU-args )
  - D3DPT_TRIANGLELIST, use IB
  - VertexId / 4 = Particle index
  - VertexId % 4 = Billboard corner index
  - IndexCountPerInstance = NumParticles * 6
  - InstanceCount = 1
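For the indexed variant, the static index buffer and the vertex-ID decoding might look like the sketch below. The corner order within each quad (0, 1, 2 then 2, 1, 3) is an assumed winding, not something the slides specify, and both function names are invented for the example.

```python
def build_quad_index_buffer(max_particles):
    """Six indices (two triangles) per particle, four vertices per particle."""
    indices = []
    for particle in range(max_particles):
        base = particle * 4
        indices += [base + 0, base + 1, base + 2,
                    base + 2, base + 1, base + 3]
    return indices


def decode_vertex_id(vertex_id):
    """Recover (particle index, billboard corner index) from the vertex ID."""
    return vertex_id // 4, vertex_id % 4


index_buffer = build_quad_index_buffer(2)
print(index_buffer)
print([decode_vertex_id(v) for v in index_buffer[6:]])
```

The buffer is built once for the maximum particle count; the per-frame IndexCountPerInstance written by the GPU (NumParticles * 6) then selects how much of it is used.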

Rendering: Problems with Rasterization
- Overdraw from large particles kills game performance!
  - Get artists to throttle back on the VFX
- Optimizations
  - Tightly fit polygons around texture [Persson09]
  - Render to smaller buffer [Cantlay07]
    - Sorting issues
    - Loss of fidelity


Tiled Rendering: Overview
- Inspired by Forward+ [Harada12]
  - Screen-space binning of particles instead of point lights!
- Use a 32x32 thread group to shade a 32x32 pixel tile in screen space
  - Cull particles (just like Forward+)
  - Sort particles
- Per pixel/thread
  - Evaluate colour of each particle
  - Blend together
- Composite back onto scene


Tiled Rendering
- Divide screen into tiles
- Build index lists of intersecting particles per tile
[Diagram: particles 1, 2 and 3 overlapping three tiles, giving the per-tile index lists [1], [1,2,3] and [2,3]]
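Building the per-tile index lists can be sketched in 2D as follows. This is a simplification for illustration: each particle is treated as a screen-space circle `(cx, cy, radius)` in pixels, whereas the talk culls against per-tile frusta in view space. The function names and the 0-based particle indices are assumptions of the example.

```python
TILE_SIZE = 32  # pixels, matching the 32x32 tiles used in the talk


def circle_overlaps_rect(cx, cy, radius, x0, y0, x1, y1):
    """Exact circle vs axis-aligned rectangle overlap test."""
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= radius * radius


def build_tile_lists(particles, width, height):
    """Return one list of particle indices per tile, in row-major tile order."""
    tiles_x = (width + TILE_SIZE - 1) // TILE_SIZE
    tiles_y = (height + TILE_SIZE - 1) // TILE_SIZE
    tile_lists = [[] for _ in range(tiles_x * tiles_y)]
    for index, (cx, cy, radius) in enumerate(particles):
        for ty in range(tiles_y):
            for tx in range(tiles_x):
                x0, y0 = tx * TILE_SIZE, ty * TILE_SIZE
                if circle_overlaps_rect(cx, cy, radius,
                                        x0, y0, x0 + TILE_SIZE, y0 + TILE_SIZE):
                    tile_lists[ty * tiles_x + tx].append(index)
    return tile_lists


# Three particles on a 96x32 strip of three tiles.
particles = [(28, 16, 10), (60, 16, 8), (70, 16, 10)]
print(build_tile_lists(particles, 96, 32))
```

With these sample positions the lists mirror the diagram, using indices 0 to 2 in place of particles 1 to 3.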


Tiled Rendering
- View space asymmetric frustum generated per tile
- Use camera's near plane
- Use camera's far plane
  - Or calculate far plane from depth buffer
[Diagram: Tile0, Tile1, Tile2, Tile3]
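One way to build and use such a per-tile frustum is sketched below. The conventions are assumptions, not taken from the talk: view space looks down +z, the tile is given by its extent in normalised device coordinates (x0..x1, y0..y1 in [-1, 1]), particles are bounding spheres, and all function names are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def tile_side_planes(x0, x1, y0, y1, tan_half_fov_x, tan_half_fov_y):
    """Inward-facing unit normals of the four side planes of one tile.

    All four planes pass through the camera origin, so a normal is enough.
    """
    tx, ty = tan_half_fov_x, tan_half_fov_y
    return [
        normalize((1.0, 0.0, -x0 * tx)),   # left
        normalize((-1.0, 0.0, x1 * tx)),   # right
        normalize((0.0, 1.0, -y0 * ty)),   # bottom
        normalize((0.0, -1.0, y1 * ty)),   # top
    ]


def sphere_in_tile_frustum(center, radius, planes, near, far):
    """Conservative sphere vs tile-frustum test in view space."""
    if center[2] + radius < near or center[2] - radius > far:
        return False
    return all(sum(n * c for n, c in zip(plane, center)) >= -radius
               for plane in planes)


planes = tile_side_planes(0.0, 0.5, 0.0, 0.5, 1.0, 1.0)
print(sphere_in_tile_frustum((2.5, 2.5, 10.0), 0.1, planes, 0.1, 100.0))
```

Replacing `far` with the maximum depth found in the tile corresponds to the "calculate far plane from depth buffer" option.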



Tiled Rendering: Thread Group View
- numthreads[ 32, 32, 1 ]
- Culling 1024 particles in parallel
- Add to LDS index list
- Write out to memory
  - Particle count
  - Particle indices


Tiled Rendering: Tile Complexity


Tiled Rendering: Per-Tile Bitonic Sort
- Cannot sort global list of particles
  - Because 1024 particles get culled in parallel, they get added to the visible list in arbitrary order
- Need to sort particles per tile
- This is a good thing!
  - Only need to sort a subset of the global list
  - Sort in a single pass in LDS, rather than in multiple passes through main memory


Tiled Rendering: Evaluating Tile Colour
- numthreads[ 32, 32, 1 ]
  - 1 thread = 1 pixel in screen space
- Set accumulation colour to float4( 0, 0, 0, 0 )
- For each particle in tile (back to front)
  - Evaluate particle contribution
    - UV generation & radius check
    - Texture lookup
    - Normal generation and lighting
  - Manually blend
    - Colour = ( srcA x srcCol ) + ( invSrcA x destCol )
    - Alpha = srcA + ( invSrcA x destA )
- Write result to screen size UAV


Tiled Rendering: Evaluating Tile Colour, Improved!
- numthreads[ 32, 32, 1 ]
  - 1 thread = 1 pixel in screen space
- Set accumulation colour to float4( 0, 0, 0, 0 )
- For each particle in tile (front to back)
  - Evaluate particle contribution
    - UV generation & radius check
    - Texture lookup
    - Normal generation and lighting
  - Manually blend [Bavoil08]
    - Colour = ( invDestA x srcA x srcCol ) + destCol
    - Alpha = srcA + ( invSrcA x destA )
  - if ( accumulation alpha > threshold )
    - accumulation alpha = 1 and bail
- Write result to screen size UAV
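The two blend orders can be compared per pixel with the sketch below. It applies the formulas from the two slides to a list of `(colour, alpha)` samples given front to back; the function names and the sample values are made up for the example, and real shading (texture lookup, lighting) is left out.

```python
def blend_back_to_front(samples_front_to_back):
    """Colour = srcA*srcCol + invSrcA*destCol, walking from the back."""
    colour, alpha = (0.0, 0.0, 0.0), 0.0
    for src_col, src_a in reversed(samples_front_to_back):
        colour = tuple(src_a * s + (1.0 - src_a) * d
                       for s, d in zip(src_col, colour))
        alpha = src_a + (1.0 - src_a) * alpha
    return colour, alpha


def blend_front_to_back(samples_front_to_back, threshold=None):
    """Colour = invDestA*srcA*srcCol + destCol, with optional early out.

    Returns (colour, alpha, number of particles evaluated).
    """
    colour, alpha = (0.0, 0.0, 0.0), 0.0
    evaluated = 0
    for src_col, src_a in samples_front_to_back:
        evaluated += 1
        colour = tuple((1.0 - alpha) * src_a * s + d
                       for s, d in zip(src_col, colour))
        alpha = src_a + (1.0 - src_a) * alpha
        if threshold is not None and alpha > threshold:
            alpha = 1.0          # pixel is effectively opaque: stop early
            break
    return colour, alpha, evaluated


samples = [((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.25), ((0.0, 0.0, 1.0), 0.75)]
print(blend_back_to_front(samples))
print(blend_front_to_back(samples))
```

Without the early out both orders produce the same colour and alpha; the front-to-back form is what allows a thread to stop once the pixel is nearly opaque, skipping the particles behind it.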


Tiled Rendering: Coarse Culling
- Bin particles into 8x8 grid
  - For each particle
    - For each bin
      - Test particle against bin
      - Add particle if visible
- UAV0 for particle indices (size = 8 x 8 x maxparticles)
  - Array split into 64 bins using offsets
- UAV1 for storing particle count per bin (size = 8 x 8)
  - 1 element per bin
  - Use InterlockedAdd() to bump the bin's counter
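The layout of the two buffers can be sketched on the CPU like this. It is an illustration under assumptions: particles are screen-space circles `(cx, cy, radius)`, the visibility test is a 2D overlap check rather than a frustum test, and a plain read-then-increment stands in for the atomic `InterlockedAdd()`. The function names are hypothetical.

```python
GRID = 8  # 8x8 coarse bins


def circle_overlaps_rect(cx, cy, radius, x0, y0, x1, y1):
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= radius * radius


def coarse_cull(particles, width, height, max_particles):
    """Fill a flat index buffer (UAV0) and a per-bin counter buffer (UAV1)."""
    assert len(particles) <= max_particles
    bin_w, bin_h = width / GRID, height / GRID
    index_buffer = [0] * (GRID * GRID * max_particles)   # UAV0
    bin_counts = [0] * (GRID * GRID)                     # UAV1
    for particle, (cx, cy, radius) in enumerate(particles):
        for b in range(GRID * GRID):
            bx, by = b % GRID, b // GRID
            if circle_overlaps_rect(cx, cy, radius,
                                    bx * bin_w, by * bin_h,
                                    (bx + 1) * bin_w, (by + 1) * bin_h):
                slot = bin_counts[b]          # value InterlockedAdd would return
                bin_counts[b] += 1
                index_buffer[b * max_particles + slot] = particle
    return index_buffer, bin_counts


indices, counts = coarse_cull([(50, 50, 10), (100, 100, 5)], 800, 800, 4)
print(counts[:10])
```

Each bin owns the slice starting at `bin * max_particles`, which is the "array split into 64 bins using offsets" from the slide.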


Tiled Rendering: Compute Shader Setup

Compute shader             Thread group            Mapping
Simulation                 numthreads[256, 1, 1]   1 thread per particle
Coarse Culling             numthreads[256, 1, 1]   1 thread per particle
Tile Culling and Sorting   numthreads[32, 32, 1]   1 thread per particle
Tile Rendering             numthreads[32, 32, 1]   1 thread per pixel

Data shown in the diagram (legend: Compute Shaders, LDS, Shader Output):
- Particle data (position, radius, colour etc)
- Updated particle data
- Per-bin frustum planes
- Per-bin particle indices
- Per-tile particle indices and distances
- Per-tile sorted particle indices
- Screen space colour buffer


Performance Results: Default View, ~35K Particles

Mode                       Frame time (ms)*
Rasterization              5.2
Tiled                      3.4

Breakdown                  Frame time (ms)*
Simulation                 0.50
Coarse Culling             0.06
Tile Culling and Sorting   0.37
Tiled Rendering            1.86

*AMD Radeon R9 290X @ 1080p


Performance Results: In Smoke View, ~35K Particles

Mode                       Frame time (ms)*
Rasterization              27.3
Tiled                      6.2

*AMD Radeon R9 290X @ 1080p


Conclusions
- Depth buffer collisions
  - Great bang-for-buck
  - Not perfect!
- Bitonic sort
  - Good fit for sorting on the GPU
- Tiled Rendering
  - Faster than rasterization
  - Great for combatting heavy overdraw
  - More predictable behaviour
- Future work
  - Add arbitrary geometry for OIT
  - Volume tracing


Questions?
Demo with full source coming soon:
http://developer.amd.com/tools/graphics-development/amd-radeon-sdk/

References
[Tchou11] Chris Tchou, "Halo Reach Effects Tech", GDC 2011
[Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266
[Cantlay07] Iain Cantlay, "High-Speed, Off-Screen Particles", GPU Gems 3, 2007
[Harada12] Takahiro Harada et al., "Forward+: Bringing Deferred Lighting to the Next Level", Short Papers, Eurographics 2012
[Bavoil08] Louis Bavoil et al., "Order Independent Transparency with Dual Depth Peeling", 2008

Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.
