TressFX: The Fast and the Furry, by Nicolas Thibieroz


Page 1: TressFX The Fast and The Furry by Nicolas Thibieroz

TRESSFX: THE FAST AND THE FURRY

AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM

NICOLAS THIBIEROZ, WORLDWIDE GAMING ENGINEERING MANAGER, AMD

Page 2: TressFX The Fast and The Furry by Nicolas Thibieroz

| TRESSFX THE FAST AND THE FURRY | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM

TRESSFX: NEXT-GENERATION HAIR AND FUR RENDERING

The time for next-gen quality is now

Tomb Raider pioneered next-gen hair
‒ Includes PS4/XB1

Users expect this level of quality for next-gen titles

You need to start thinking about this!

Page 3: TressFX The Fast and The Furry by Nicolas Thibieroz


WHAT MAKES GOOD HAIR/FUR?

[Figure: four comparison renders: Basic Rendering; Antialiasing; Antialiasing + Self Shadowing; Antialiasing + Self Shadowing + Transparency]

Demo

All three components are a must to ensure high quality

Transparency in particular is essential to next-gen visuals
‒ Requires an Order-Independent Transparency (OIT) solution

Page 4: TressFX The Fast and The Furry by Nicolas Thibieroz


ISOLINE TESSELLATION FOR HAIR/FUR? 1/2

Isoline tessellation has two tessellation factors
‒ First is line density (lines per invocation)
‒ Second is line detail (segments per line)

In theory provides easy LOD system
‒ Variable line density and detail by increasing both tessellation factors based on distance

[Figure: isoline patches at Tess = (1,1), Tess = (2,1), Tess = (2,2), Tess = (2,3), Tess = (3,3)]

Page 5: TressFX The Fast and The Furry by Nicolas Thibieroz


ISOLINE TESSELLATION FOR HAIR/FUR? 2/2

In practice isoline tessellation is not cost effective for this scenario

Lines are always 1-pixel thick
‒ Need Geometry Shader to extrude them into triangles for smooth edges
  ‒ Major impact on performance!
‒ Alternative is to enable MSAA
  ‒ Most engines are deferred so this causes a large performance impact
‒ No extrusion for smoothing edges and no MSAA = poor quality!

Bottom line: a pure Vertex Shader solution is faster
‒ Curvature is rarely a problem (dependent on vertices/strands at authoring time)
‒ If needed, the LOD benefit can be obtained in the Vertex Shader

Page 6: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE

TressFX 2 uses a deferred approach for best performance

Three main steps
STEP 1: Hair simulation
STEP 2: Store fragment properties into buffers
STEP 3: Fetch fragment properties, sort, selective shading and render
‒ Full shading on K-frontmost fragments
‒ “Tail” fragments are shaded with a simpler light equation and shadowing algorithm

Page 7: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 1: HAIR SIMULATION

[Diagram: Input geometry (SRV), holding pre-simulation line segments in model space, and the simulation parameters feed a chain of compute shaders (CS), which write the post-simulation geometry (UAV), holding post-simulation line segments in world space]

Simulation compute shaders
‒ Edge length constraint
‒ Local shape constraint
‒ Global shape constraint (not always needed for fur)
‒ Model transform
‒ Collision shape (not always needed for fur)
‒ External forces (wind, gravity, etc.)

Input model is a collection of line segments (each segment composed of up to 64 vertices)

Optionally divided into “master strands” and “slave strands” to optimize simulation performance
‒ Only master strands are simulated (e.g. 1:4 ratio)
‒ Slave strands use master strand simulation results with added noise
‒ Virtually no difference from full-scale simulation but much better simulation performance!
‒ Master:slave simulation ratio can also vary with distance for even better performance
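The master/slave split can be sketched on the CPU as follows. This is a minimal Python illustration, not the TressFX compute shader: the function names, the block layout of masters and slaves, the single random offset used as "noise" and the `simulate` callback are all assumptions made for the example.

```python
import random

def split_strands(num_strands, ratio):
    """Return (master_ids, slave_to_master) for a 1:ratio master:slave split.

    Assumed layout: strands are grouped in blocks of (1 + ratio); the first
    strand of each block is the master and the others are its slaves.
    """
    group = 1 + ratio
    masters = [i for i in range(num_strands) if i % group == 0]
    slave_to_master = {i: i - (i % group)
                       for i in range(num_strands) if i % group != 0}
    return masters, slave_to_master

def update_strands(strands, ratio, simulate, noise_scale=0.01, seed=0):
    """Simulate master strands only; slaves reuse the master's motion plus noise.

    strands: list of strands, each a list of (x, y, z) vertices. A slave is
    assumed to have the same vertex count as its master.
    simulate: callable taking one strand and returning its simulated vertices.
    """
    rng = random.Random(seed)
    masters, slave_to_master = split_strands(len(strands), ratio)
    result = list(strands)
    for m in masters:
        result[m] = simulate(strands[m])
    for s, m in slave_to_master.items():
        # Per-vertex displacement the simulation applied to the master strand.
        delta = [tuple(a - b for a, b in zip(new, old))
                 for new, old in zip(result[m], strands[m])]
        # Assumed noise model: one small random offset per slave strand.
        jitter = tuple(rng.uniform(-noise_scale, noise_scale) for _ in range(3))
        result[s] = [tuple(p + d + j for p, d, j in zip(v, dv, jitter))
                     for v, dv in zip(strands[s], delta)]
    return result
```

With a 1:4 ratio only one strand in five runs the full simulation; varying `ratio` with camera distance would give the distance-based variant mentioned above.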

Demo

Page 8: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS

[Diagram: world-space line segments and an index buffer (indexed triangle list) feed the VS, which extrudes the segments into triangles]

Page 9: TressFX The Fast and The Furry by Nicolas Thibieroz


LINE SEGMENT EXTRUSION INTO TRIANGLES

A lot of vertices go through rendering high-quality hair or fur!
‒ Geometry processing can therefore be a significant bottleneck

In previous versions of TressFX extrusion was done in Geometry Shader (don’t do it!) and then VS with Draw()

Much faster performance was obtained with pure VS solution and precomputed index buffers
‒ Maximizes post vertex cache use!

DrawIndexed() method: FAST!

[Diagram: line segments expanded into quads that share vertices, numbered 0 to 5]

Indexed triangle list = { ( 0, 1, 2 ), ( 2, 1, 3 ), ( 2, 3, 4 ), ( 4, 3, 5 ), ( … ) };

Draw() method: SLOW!

[Diagram: line segments expanded into quads with duplicated vertices, numbered 0 to 11]

Triangle list = { ( 0, 1, 2 ), ( 3, 4, 5 ), ( 6, 7, 8 ), ( 9, 10, 11 ), ( … ) };
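The precomputed index buffer follows a simple pattern, sketched below in Python. The function names are hypothetical and the vertex counts assume ideal post-vertex-cache reuse, which real hardware only approximates.

```python
def build_strand_indices(num_segments, base_vertex=0):
    """Indexed triangle list for one strand extruded into a strip of quads.

    Line-segment vertex k is expanded into quad vertices 2k and 2k + 1, so
    consecutive quads share an edge and its two vertices.
    """
    indices = []
    for seg in range(num_segments):
        v = base_vertex + 2 * seg
        indices += [v, v + 1, v + 2]       # first triangle of the quad
        indices += [v + 2, v + 1, v + 3]   # second triangle of the quad
    return indices

def vertex_shader_invocations(num_segments):
    """Vertices shaded per strand: (indexed with ideal reuse, non-indexed)."""
    indexed = 2 * (num_segments + 1)   # each expanded vertex shaded once
    non_indexed = 6 * num_segments     # Draw(): 3 vertices per triangle, no reuse
    return indexed, non_indexed
```

For two segments this reproduces the list on the slide, ( 0, 1, 2 ), ( 2, 1, 3 ), ( 2, 3, 4 ), ( 4, 3, 5 ). For a 64-vertex strand (63 segments) the counts are 128 shaded vertices with indexing against 378 without.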

Page 10: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS

[Diagram: world-space line segments and an index buffer (indexed triangle list) feed the VS, which extrudes the segments into triangles in homogeneous clip space; the PS then performs antialiasing]

Page 11: TressFX The Fast and The Furry by Nicolas Thibieroz


ANTIALIASING

Antialiasing (aka “coverage”) using analytical method
‒ This is NOT Multisampling Anti-Aliasing!

Compute pixel coverage on edges of hair strand triangles and convert it to an alpha value

Alpha value fades out based on distance from pixel centre to strand axis

Similar principle to Emil Persson’s phone wire Anti-Aliasing
http://www.humus.name/Articles/Persson_GraphicsGemsForGames.pdf
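A minimal sketch of distance-based coverage, in Python for readability. The linear falloff, the function names and the pixel-unit parameters are assumptions chosen for illustration; the TressFX sample's exact coverage formula may differ.

```python
import math

def distance_to_segment(p, a, b):
    """Distance in pixels from point p to the 2D segment a-b (screen space)."""
    ax, ay = a
    bx, by = b
    px, py = p
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * abx + (py - ay) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))

def strand_coverage(pixel_centre, a, b, strand_width_px):
    """Alpha in [0, 1] that fades with distance from pixel centre to strand axis.

    Assumed model: full coverage while the pixel lies inside the strand, then a
    linear fade over one pixel across the strand edge.
    """
    d = distance_to_segment(pixel_centre, a, b)
    return max(0.0, min(1.0, 0.5 * strand_width_px + 0.5 - d))
```

The resulting value is what gets stored as "coverage" alongside depth and tangent, and later used as the fragment's alpha.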

Page 12: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS

[Diagram: world-space line segments and an index buffer (indexed triangle list) feed the VS, which extrudes the segments into triangles in homogeneous clip space; the PS performs antialiasing. PS outputs: null RT, stencil, PPLL UAV (depth, tangent, coverage, next per fragment) and Head UAV]

Page 13: TressFX The Fast and The Furry by Nicolas Thibieroz


PER-PIXEL LINKED LISTS

Head UAV
‒ Each pixel location has a “head pointer” to a linked list in the PPLL UAV

PPLL UAV
‒ As new fragments are rendered, they are added to the next open location in the PPLL (using UAV counter)
‒ A link is created to the fragment pointed to by the head pointer
‒ Head pointer then points to the new fragment

// Retrieve current pixel count and increase counter
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;

// Exchange indices in LinkedListHead texture corresponding to pixel location
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);

// Append new element at the end of the Fragment and Link Buffer
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;
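The same three operations can be mimicked on the CPU to see how the lists grow. This Python sketch is single-threaded, so the atomic exchange becomes a plain read followed by a write; the class name, the dictionary node layout and the `NULL_POINTER` value are assumptions.

```python
NULL_POINTER = -1  # stand-in for the shader's end-of-list marker (assumed value)

class PerPixelLinkedLists:
    def __init__(self, width, height):
        self.width = width
        self.head = [NULL_POINTER] * (width * height)  # plays the "Head UAV"
        self.nodes = []                                # plays the "PPLL UAV"

    def add_fragment(self, x, y, depth, tangent, coverage):
        address = y * self.width + x
        new_index = len(self.nodes)      # role of IncrementCounter()
        old_head = self.head[address]    # InterlockedExchange: fetch old head...
        self.head[address] = new_index   # ...and store the new one
        self.nodes.append({"depth": depth, "tangent": tangent,
                           "coverage": coverage, "next": old_head})

    def fragments_at(self, x, y):
        """Walk one pixel's list, most recently added fragment first."""
        pointer = self.head[y * self.width + x]
        while pointer != NULL_POINTER:
            node = self.nodes[pointer]
            yield node
            pointer = node["next"]
```

Note that fragments come back in reverse submission order, not depth order, which is why a later sorting step is needed.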

[Diagram: PPLL UAV nodes (depth, tangent, coverage, next) linked from the Head UAV]

Memory requirements can be large!
‒ Width * Height * Average overdraw * sizeof (PPLL structure)
‒ Can use tiling approach in memory-constrained situations
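A worked instance of that formula, as a rough Python sketch. The overdraw of 8 and the 16-byte node (four 4-byte fields) are assumed figures for illustration only, and the separate head-pointer term is an addition not listed on the slide.

```python
def ppll_memory_bytes(width, height, average_overdraw, node_size_bytes,
                      head_entry_bytes=4):
    """Return (node buffer bytes, head pointer buffer bytes)."""
    # Width * Height * Average overdraw * sizeof(PPLL structure)
    node_buffer = width * height * average_overdraw * node_size_bytes
    # One head pointer per pixel (assumed 32-bit).
    head_buffer = width * height * head_entry_bytes
    return node_buffer, head_buffer

# 1080p, assumed average overdraw of 8, assumed 16-byte node.
nodes, heads = ppll_memory_bytes(1920, 1080, 8, 16)
print(f"{nodes / 2**20:.1f} MiB for nodes, {heads / 2**20:.1f} MiB for head pointers")
```

Under these assumptions the node buffer alone is about 253 MiB, which shows why a tiled approach can be attractive when memory is tight.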

Page 14: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER

[Diagram: a full screen quad/triangle goes through the VS to the PS; the PS reads the stencil, Head UAV and PPLL UAV and performs lighting]

Page 15: TressFX The Fast and The Furry by Nicolas Thibieroz


LIGHTING

Different options available
‒ Kajiya-Kay hair lighting model
‒ Marschner model
‒ Anything else that looks good!

Fragment properties storage requirements may limit your options!

TressFX 2 sample uses an approximation of the Marschner technique when rendering two highlights
‒ Unique fragment properties: depth, tangent vector

[Figure: hair renders showing primary highlights and secondary highlights]
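To illustrate why a tangent vector is enough to light a strand, here is a small Python sketch of Kajiya-Kay style terms. It uses the half-vector form of the specular term that is common in real-time implementations, which is an assumption here; it is not the Marschner approximation used by the TressFX 2 sample, and the function name and default exponent are hypothetical.

```python
import math

def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return v if length == 0.0 else tuple(c / length for c in v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kajiya_kay(tangent, to_light, to_eye, specular_exponent=32.0):
    """Return (diffuse, specular) for a hair fibre with the given tangent.

    to_light and to_eye point away from the shaded fragment.
    """
    t = _normalize(tangent)
    l = _normalize(to_light)
    e = _normalize(to_eye)
    # Diffuse: sine of the angle between the fibre tangent and the light.
    t_dot_l = _dot(t, l)
    diffuse = math.sqrt(max(0.0, 1.0 - t_dot_l * t_dot_l))
    # Specular: sine of the angle between the tangent and the half vector.
    h = _normalize(tuple(a + b for a, b in zip(l, e)))
    t_dot_h = _dot(t, h)
    specular = math.sqrt(max(0.0, 1.0 - t_dot_h * t_dot_h)) ** specular_exponent
    return diffuse, specular
```

Only the tangent is needed per fragment, which fits the storage constraint noted above; a second, shifted highlight would need additional terms.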

Page 16: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER

[Diagram: a full screen quad/triangle goes through the VS to the PS; the PS reads the stencil, Head UAV and PPLL UAV and performs lighting and shadows]

Page 17: TressFX The Fast and The Furry by Nicolas Thibieroz


SHADOWS

Three different cases

Hair self-shadowing
‒ Essential component to give next-gen volumetric quality look
‒ Simplified Deep Shadow Map technique

Hair casting shadows on body & environment
‒ Body: Need a very soft look at close range (blur shadow map)
‒ Environment: render (possibly simplified) hair geometry into cascaded shadow map

Environment casting shadows on hair
‒ Sample environment shadow map at hair fragment rendering time
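One possible way to simplify a deep shadow map is to estimate how many fibres lie between the light and the fragment from a single shadow-map depth. The Python sketch below shows that idea only; the slide does not describe the sample's actual method, so the model, the function name and every parameter are assumptions.

```python
def hair_self_shadow(fragment_depth, shadow_map_depth,
                     fibers_per_unit_depth, fiber_alpha):
    """Fraction of light reaching a hair fragment (1.0 means fully lit).

    Depths are measured from the light. shadow_map_depth is the depth of the
    hair surface nearest the light, as read from an ordinary shadow map.
    """
    depth_in_hair = max(0.0, fragment_depth - shadow_map_depth)
    # Assumed uniform fibre density between the hair surface and the fragment.
    fibers_in_front = depth_in_hair * fibers_per_unit_depth
    # Each fibre in front lets (1 - fiber_alpha) of the light through.
    return (1.0 - fiber_alpha) ** fibers_in_front
```

Fragments on the lit surface receive full light, and the result darkens smoothly with depth into the hair volume, which is what gives the volumetric look.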

Page 18: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER

[Diagram: a full screen quad/triangle goes through the VS to the PS; the PS reads the stencil, Head UAV and PPLL UAV and performs lighting and shadows]

K frontmost fragments: full shading, sorting and manual blending

Tail fragments: cheap shading, no sorting and manual blending

Page 19: TressFX The Fast and The Furry by Nicolas Thibieroz


SELECTIVE FRAGMENT SHADING
THIS IS WHERE THE MEAT OF THE CODE OCCURS!

// --- Fetch and store first K fragments into array ---

// Get the start of the linked list from the head pointer
uint pointer = LinkedListHeadSRV[In.vPosition.xy];

// Copy first K fragments from PPLL into kBuffer[]
NODE kBuffer[KBUFFER_SIZE];
for(int p=0; p<KBUFFER_SIZE; p++)
{
    if (pointer != NULLPOINTER)
    {
        kBuffer[p] = LinkedListSRV[pointer];
        pointer = LinkedListSRV[pointer].uNext;
    }
}

// --- Fetch next fragment and replace in array if closer ---

// Go through the rest of the linked list, and keep closest k fragments but
// not in sorted order
[allow_uav_condition]
for(int l=0; l < g_iMaxFragments; l++)
{
    if(pointer == NULLPOINTER) break;
    int id = 0;
    float max_depth = 0;

    // Find the furthest node in array
    [unroll]
    for(int i=0; i<KBUFFER_SIZE; i++)
    {
        float fDepth = kBuffer[i].depth;
        if(max_depth < fDepth)
        {
            max_depth = fDepth;
            id = i;
        }
    }

    // If linked list node is nearer than the furthest one in the local array
    // exchange the node in the local array for the one in the linked list
    NODE Node = LinkedListSRV[pointer];
    if (max_depth > Node.depth)
    {
        SWAP(Node, kBuffer[id]);   // id is the furthest entry found above
    }

    // --- Out of order cheap shading and blending for fragments outside array ---

    // Do simple shading and shadowing for nodes not part of the K closest fragments
    fragmentColor = ComputeSimpleShading(Node);
    // Out of order blending
    fcolor.xyz = mad(-fcolor.xyz, fragmentColor.w, fcolor.xyz) + fragmentColor.xyz * fragmentColor.w;
    fcolor.w = mad(-fcolor.w, fragmentColor.w, fcolor.w);

    // Retrieve next node pointer
    pointer = LinkedListSRV[pointer].uNext;
}

// --- Sort, shade and blend K frontmost fragments in array ---

// Blend the k nearest layers of fragments from back to front, where k = KBUFFER_SIZE
for(int j=0; j<KBUFFER_SIZE; j++)
{
    int id = 0;
    float max_depth = 0;

    // Find the furthest node in the array
    for(int i=0; i<KBUFFER_SIZE; i++)
    {
        float fDepth = kBuffer[i].depth;
        if(max_depth < fDepth)
        {
            max_depth = fDepth;
            id = i;
        }
    }

    // Take this node out of the next search
    NODE Node = kBuffer[id];
    kBuffer[id] = (NODE)0;

    // Do high quality shading and shadowing
    fragmentColor = ComputeHighQualityShading(Node);

    // Blend fragment color
    fcolor.xyz = mad(-fcolor.xyz, fragmentColor.w, fcolor.xyz) + fragmentColor.xyz * fragmentColor.w;
    fcolor.w = mad(-fcolor.w, fragmentColor.w, fcolor.w);
}

return fcolor;
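The control flow of that shader can be checked on the CPU with a short Python sketch. It mirrors the four stages but is not the shader itself: fragments are `(depth, payload)` tuples, the shading functions are passed in as callbacks, `k` is assumed to be at least 1, and all names are hypothetical.

```python
def blend(fcolor, fragment):
    """Blend one fragment over the running colour, as the shader's mad() lines do.

    fcolor = (r, g, b, remaining background weight); fragment = (r, g, b, alpha).
    """
    r, g, b, w = fcolor
    fr, fg, fb, a = fragment
    return (r * (1 - a) + fr * a, g * (1 - a) + fg * a,
            b * (1 - a) + fb * a, w * (1 - a))

def split_k_nearest(fragments, k):
    """Split fragments (in linked-list order) into the k nearest and the tail."""
    k_buffer = list(fragments[:k])          # fetch and store first K fragments
    tail = []
    for node in fragments[k:]:
        # Find the furthest entry currently held in the k-buffer.
        furthest = max(range(len(k_buffer)), key=lambda i: k_buffer[i][0])
        if node[0] < k_buffer[furthest][0]:
            # Nearer than the furthest entry: swap, the evicted node joins the tail.
            node, k_buffer[furthest] = k_buffer[furthest], node
        tail.append(node)
    return k_buffer, tail

def resolve_pixel(fragments, k, shade_full, shade_simple):
    k_buffer, tail = split_k_nearest(fragments, k)
    fcolor = (0.0, 0.0, 0.0, 1.0)
    for node in tail:                        # cheap shading, blended out of order
        fcolor = blend(fcolor, shade_simple(node))
    for node in sorted(k_buffer, key=lambda n: n[0], reverse=True):
        fcolor = blend(fcolor, shade_full(node))   # full shading, back to front
    return fcolor
```

Because the tail is blended first and unsorted, ordering errors are confined to the fragments that contribute least to the final colour.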

Page 20: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER

[Diagram: a full screen quad/triangle goes through the VS to the PS; the PS reads the stencil, Head UAV and PPLL UAV, performs lighting and shadows, and writes to the render target]

K frontmost fragments: full shading, sorting and manual blending

Tail fragments: cheap shading, no sorting and manual blending

Page 21: TressFX The Fast and The Furry by Nicolas Thibieroz


TRESSFX PERFORMANCE
FAST AND FURRY

High number of fragments required for quality look

Main bottleneck is shading all those fragments
‒ Not per-pixel linked list traversal!

Selective shading approach allows significant performance savings with minor or negligible quality tradeoffs

Technique                          Cost
Out of order, no shading           1.31 ms
Out of order, shading              2.80 ms
Deferred PPLL, selective shading   2.13 ms

Shading cost is ~ 1.5 ms
Selective shading is 24% faster than out of order with shading

Fur model with ~130,000 fur strands
Running on AMD Radeon 7970 @ 1080p

Distance-adaptive Shading and Simulation LOD further improves performance

Simulation LOD

Distance       Sim LOD Disabled   Sim LOD Enabled
Close range    1.01 ms            1.01 ms
Medium range   1.01 ms            0.70 ms
Long range     1.01 ms            0.37 ms

“K frontmost fragments” value can inversely scale with distance

Shading LOD

Distance       Shading LOD Disabled   Shading LOD Enabled
Close range    3.26 ms                3.26 ms
Medium range   3.23 ms                1.77 ms
Long range     2.52 ms                0.64 ms
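The quoted savings follow directly from the timings, and a distance-based K can be as simple as a clamped interpolation. In the Python sketch below the arithmetic uses the slide's numbers, while `k_for_distance`, its linear falloff and its parameters are assumptions; the sample may scale K differently.

```python
def percent_faster(baseline_ms, optimized_ms):
    """Relative time saved by the optimized path, in percent of the baseline."""
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms

def k_for_distance(distance, near, far, k_max, k_min):
    """Hypothetical shading LOD: K falls linearly from k_max at near to k_min at far."""
    t = (distance - near) / (far - near)
    t = max(0.0, min(1.0, t))
    return round(k_max + (k_min - k_max) * t)

# Figures from the technique table above.
shading_cost_ms = 2.80 - 1.31                  # shading vs. no shading
selective_gain = percent_faster(2.80, 2.13)    # selective vs. full shading
```

Here `shading_cost_ms` comes to 1.49 ms and `selective_gain` to just under 24%, matching the "~ 1.5 ms" and "24% faster" figures.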

Page 22: TressFX The Fast and The Furry by Nicolas Thibieroz


CONCLUSION AND QUESTIONS?

Next-gen hair/fur look at real-time performance is possible now!

Fast:
‒ Variable ratio master/slave compute simulations
‒ Vertex Shader extrusion of segments into triangles (do not use tessellation + GS)
‒ Deferred rendering with selective shading
‒ Distance-based shading and simulation LOD
‒ Optimized shaders!

Furry:

Full and free access to TressFX 2 SDK sample, code and documentation at:
http://developer.amd.com/tools-and-sdks/graphics-development/amd-radeon-sdk/


Page 23: TressFX The Fast and The Furry by Nicolas Thibieroz


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.