GS-4147, TressFX 2.0, by Bill Bilodeau
Presentation GS-4147 by Bill Bilodeau at the AMD Developer Summit (APU13), November 11-13, 2013

TressFX 2.0 AND BEYOND
BILL BILODEAU, AMD
DONGSOO HAN, AMD

2 TressFX 2.0 and Beyond NOVEMBER 12, 2013 | AMD DEVELOPER SUMMIT
TressFX 2.0 AND BEYOND
AGENDA
TressFX Overview
TressFX Rendering
‒ TressFX 2.0 improvements
TressFX Physics
Future Work

TressFX OVERVIEW

TressFX OVERVIEW
WHAT IS TressFX?
Realistic hair rendering and simulation
‒ Used in Tomb Raider
Goes beyond the simple shells-and-fins representation commonly used in games
Hair is rendered as thousands of strands with self-shadowing, antialiasing and transparency
Physical simulation of each strand using GPU compute shaders
Flexible enough to support different hair styles and different conditions

TressFX RENDERING
WHAT MAKES IT LOOK GOOD
What goes into good hair?
‒ Anti-aliasing
‒ Volumetric self-shadowing
‒ Transparency
(Figure: basic rendering; antialiasing; antialiasing + self-shadowing; antialiasing + self-shadowing + transparency)

TressFX RENDERING

TressFX RENDERING
LIGHTING MODEL
Kajiya-Kay Hair Lighting Model
‒ Anisotropic hair strand lighting model
‒ Uses the tangent along the strand instead of the normal for light reflections
‒ Instead of cos(N, H) as in Blinn-Phong, use sin(T, H)
Marschner Model
‒ Two specular highlights
‒ Primary highlight, colored by the light, shifted towards the tip
‒ Secondary highlight, colored by the hair, shifted towards the root
‒ TressFX uses an approximation of the Marschner technique to render the two highlights
(Figure: primary highlights; secondary highlights)
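The following minimal C++ sketch makes the lighting model concrete. It computes the Kajiya-Kay terms: sin(T, L) for diffuse, and sin(T, H) raised to an exponent for specular. For the two highlights, it shifts the tangent along the normal. This is a common real-time approximation of Marschner's shifted highlights, but assuming TressFX does exactly this is an assumption. The helper names, shift values and exponents are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal 3D vector for this sketch (not a TressFX type).
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v) {
    const float len2 = dot(v, v);
    assert(len2 > 0.0f);
    return (1.0f / std::sqrt(len2)) * v;
}

// sin of the angle between two unit vectors, computed as sqrt(1 - cos^2).
float sinAngle(Vec3 a, Vec3 b) {
    const float c = dot(a, b);
    return std::sqrt(std::max(0.0f, 1.0f - c * c));
}

// Kajiya-Kay diffuse term: sin(T, L) instead of cos(N, L).
float kajiyaKayDiffuse(Vec3 T, Vec3 L) { return sinAngle(T, L); }

// Kajiya-Kay specular term: sin(T, H)^exponent instead of cos(N, H)^exponent.
float kajiyaKaySpecular(Vec3 T, Vec3 H, float exponent) {
    return std::pow(sinAngle(T, H), exponent);
}

// Assumed Marschner approximation: shift the tangent along the normal so the
// highlight slides along the strand (sign and size of `shift` are tuning values).
Vec3 shiftTangent(Vec3 T, Vec3 N, float shift) { return normalize(T + shift * N); }

struct Highlights { float primary, secondary; };

// Two specular lobes with different shifts and exponents; the caller tints the
// primary with the light color and the secondary with the hair color.
Highlights twoHighlights(Vec3 T, Vec3 N, Vec3 H, float primaryShift, float secondaryShift,
                         float primaryExponent, float secondaryExponent) {
    return {kajiyaKaySpecular(shiftTangent(T, N, primaryShift), H, primaryExponent),
            kajiyaKaySpecular(shiftTangent(T, N, secondaryShift), H, secondaryExponent)};
}
```

When the tangent is perpendicular to H, the specular term peaks at 1. When the tangent is aligned with H, it drops to 0. This is what stretches the highlight across the strand direction.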

TressFX RENDERING
ANTI-ALIASING
Every hair strand is anti-aliased manually
‒ Not using hardware MSAA!
Compute the pixel coverage on the edges of hair strands and convert it to an alpha value
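The slide does not spell out the coverage computation. The sketch below assumes a simple 1D box filter: the coverage is the fraction of a one-pixel footprint, measured across the strand, that a strand of a given screen-space width overlaps. The function names and the `hairOpacity` factor are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Fraction of the footprint [-0.5, 0.5] (in pixels, perpendicular to the strand)
// overlapped by a strand of width `strandWidthPixels` whose center lies
// `distanceToCenter` pixels from the pixel center.
float strandCoverage(float distanceToCenter, float strandWidthPixels) {
    assert(strandWidthPixels >= 0.0f);
    const float lo = std::max(-0.5f, distanceToCenter - 0.5f * strandWidthPixels);
    const float hi = std::min(0.5f, distanceToCenter + 0.5f * strandWidthPixels);
    return std::max(0.0f, hi - lo);  // overlap length, already in [0, 1]
}

// Hypothetical conversion: coverage becomes alpha, scaled by a per-hair opacity,
// so thin or partially covered strands fade out instead of aliasing.
float coverageToAlpha(float coverage, float hairOpacity) {
    return std::clamp(coverage, 0.0f, 1.0f) * hairOpacity;
}
```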

TressFX RENDERING
SELF SHADOWING
Self Shadowing
‒ Uses a simplified Deep Shadow Map technique
(Figure: no self shadows; with self shadows)
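A simplified deep shadow map can be sketched as follows. The shadow map stores the light-space depth of the first hair fragment. For each fragment, the depth difference behind that point is converted into an estimated number of occluding strands. This C++ version assumes a uniform strand spacing and a constant per-strand opacity. `fiberSpacing` and `fiberAlpha` are hypothetical parameters, not confirmed TressFX names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Estimated light transmittance reaching a hair fragment.
float hairTransmittance(float fragmentLightDepth, float shadowMapDepth,
                        float fiberSpacing, float fiberAlpha) {
    assert(fiberSpacing > 0.0f && fiberAlpha >= 0.0f && fiberAlpha <= 1.0f);
    // Depth of hair between the first strand seen by the light and this fragment.
    const float depthRange = std::max(0.0f, fragmentLightDepth - shadowMapDepth);
    const float numFibers = depthRange / fiberSpacing;  // estimated occluding strands
    return std::pow(1.0f - fiberAlpha, numFibers);      // each strand lets (1 - alpha) through
}
```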

TressFX RENDERING
TRANSPARENCY
Order Independent Transparency (OIT) using Per-Pixel Linked Lists (PPLL)
Fragments are stored in linked lists on the GPU
The nearest K fragments are rendered in back-to-front order
(Figure: no transparency; with transparency)

TressFX 1.0 RENDERING
How rendering was done in version 1.0
‒ Render hair strand geometry into the A-buffer
‒ Do lighting, shadowing, and antialiasing
‒ Store fragment color with depth and coverage in a per-pixel linked list (PPLL)
‒ Render the K nearest fragments (K-buffer) in back-to-front order
‒ Blend the nearest K fragments in the correct order with transparency
‒ Blend the remaining fragments without sorting

TressFX 1.0 RENDERING A-BUFFER PASS
(Diagram: hair geometry → vertex shader → pixel shader, which computes coverage, lighting and shadows and writes to the Head UAV and the PPLL UAV; each PPLL node stores depth, color, coverage and next)

TressFX 1.0 RENDERING
PER-PIXEL LINKED LIST
GPU implementation of order independent transparency (OIT)
Head UAV
‒ Each pixel location has a “head pointer” to a linked list in the PPLL UAV
PPLL UAV
‒ As new fragments are rendered, they are added to the next open location in the PPLL (using the UAV counter)
‒ A link is created to the fragment pointed to by the head pointer
‒ The head pointer then points to the new fragment
// Retrieve the current fragment count and increment the UAV counter;
// the returned value is the index of the next open node in the PPLL
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;
// Atomically exchange the head pointer at this pixel location (address)
// with the new node index; the previous head is returned in uOldStartOffset
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);
// Link the new element to the previous head and write it to the open node,
// so it becomes the first element of this pixel's list
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;
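The following C++ sketch shows how the head pointers and the shared node pool interact. It emulates the same insertion on the CPU and then walks one pixel's list. The sketch is single-threaded, so a plain read and write stands in for `InterlockedExchange`. The sentinel value and the overflow check are assumptions for this sketch. The check runs before the head is swapped, so a dropped fragment never leaves a dangling pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;  // hypothetical "no next node" marker

struct Fragment {
    float depth;
    float coverage;
    std::uint32_t next;  // index of the next node in this pixel's list
};

struct PerPixelLinkedList {
    std::vector<std::uint32_t> head;  // one head pointer per pixel (Head UAV)
    std::vector<Fragment> nodes;      // shared node pool (PPLL UAV)
    std::uint32_t counter = 0;        // stands in for the UAV hidden counter

    PerPixelLinkedList(std::size_t pixelCount, std::size_t capacity)
        : head(pixelCount, kEndOfList), nodes(capacity) {
        assert(capacity < kEndOfList);
    }

    // Returns false when the pool is full (the overflow case).
    bool insert(std::size_t pixel, float depth, float coverage) {
        const std::uint32_t index = counter++;      // IncrementCounter()
        if (index >= nodes.size()) return false;
        const std::uint32_t oldHead = head[pixel];  // InterlockedExchange on the GPU
        head[pixel] = index;
        nodes[index] = {depth, coverage, oldHead};  // link to the previous head
        return true;
    }

    // Walks one pixel's list; fragments come back newest first.
    std::vector<float> depthsAt(std::size_t pixel) const {
        std::vector<float> out;
        for (std::uint32_t i = head[pixel]; i != kEndOfList; i = nodes[i].next)
            out.push_back(nodes[i].depth);
        return out;
    }
};
```

Each new node is linked in front of the old head, so a pixel's list comes back newest first. This is why a later pass has to select and sort the nearest fragments.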

TressFX 1.0 RENDERING K-BUFFER PASS
(Diagram: full-screen quad → vertex shader → pixel shader, which keeps a K-buffer of the nearest fragments, each storing depth, color and coverage, and blends them with transparency)

TressFX 1.0 RENDERING
HOW CAN WE MAKE IT FASTER?
Observation
‒ All fragments are lit and shadowed equally
‒ Even the ones buried under dozens of hair fragments that you can’t see
Solution
‒ Defer the lighting and shadowing until the K-buffer pass
‒ Render the nearest K fragments with high quality
‒ Render the remaining fragments with lower quality (but faster)
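The deferred scheme can be sketched for a single pixel in C++. Fragments carry only depth, coverage and a tangent (collapsed to one float here). The K nearest fragments are shaded with the expensive path and blended back to front. The rest are shaded cheaply and blended first, in whatever order they come. The two shading callbacks are hypothetical stand-ins for the real lighting and shadowing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// What a deferred A-buffer pass stores per fragment: no color, only what is
// needed to shade later. The tangent is collapsed to one float for this sketch.
struct HairFragment { float depth; float coverage; float tangentParam; };

// Hypothetical shading callback standing in for lighting plus shadowing.
using Shader = std::function<float(const HairFragment&)>;

// Standard "over" blend of a fragment with opacity `alpha` onto `dst`.
float blendOver(float dst, float src, float alpha) { return src * alpha + dst * (1.0f - alpha); }

float resolvePixel(std::vector<HairFragment> frags, std::size_t k, float background,
                   const Shader& highQuality, const Shader& lowQuality) {
    assert(highQuality && lowQuality);
    k = std::min(k, frags.size());
    // Move the K nearest fragments to the front, sorted nearest first;
    // the order of the remaining fragments is unspecified.
    std::partial_sort(frags.begin(), frags.begin() + static_cast<std::ptrdiff_t>(k), frags.end(),
                      [](const HairFragment& a, const HairFragment& b) { return a.depth < b.depth; });
    float color = background;
    // Farther fragments: cheap shading, blended without sorting.
    for (std::size_t i = k; i < frags.size(); ++i)
        color = blendOver(color, lowQuality(frags[i]), frags[i].coverage);
    // Nearest K: full-quality shading, blended back to front.
    for (std::size_t i = k; i-- > 0;)
        color = blendOver(color, highQuality(frags[i]), frags[i].coverage);
    return color;
}
```

`std::partial_sort` orders only the K nearest fragments. This matches the idea that only those need the correct order and the expensive shading.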

TressFX 2.0 RENDERING A-BUFFER PASS
(Diagram: hair geometry → vertex shader → pixel shader, which now computes only coverage; each PPLL node stores depth, coverage, tangent and next)

TressFX 2.0 RENDERING K-BUFFER PASS
(Diagram: full-screen quad → vertex shader → pixel shader, which now does the lighting and shadowing; each K-buffer entry stores depth, coverage and tangent, and the entries are blended with transparency)

TressFX 2.0 IMPROVEMENTS
CONTINUOUS LODs
Distance to the camera can be used to reduce the density of the hair
‒ Uniformly remove hair strands from the rendering
‒ To compensate for the missing strands, thicken the hair
‒ Adjust the minimum pixel coverage with distance
(Figure: full density hair; reduced density hair; reduced density with thicker strands)
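One possible way to implement the distance-based LOD is sketched below under stated assumptions. Density falls off linearly between two hypothetical distances. Strands are kept by index within fixed-size groups, which presumes strand indices are already spread evenly over the head. The width is scaled by 1/density so the total covered area stays roughly constant. None of these names or formulas are taken from the TressFX source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical LOD parameters: full density up to lodStart, reduced linearly
// to minDensity at lodEnd and beyond.
struct HairLodParams { float lodStart, lodEnd, minDensity; };

float densityAt(float distance, const HairLodParams& p) {
    assert(p.lodEnd > p.lodStart && p.minDensity > 0.0f && p.minDensity <= 1.0f);
    const float t = std::clamp((distance - p.lodStart) / (p.lodEnd - p.lodStart), 0.0f, 1.0f);
    return 1.0f + t * (p.minDensity - 1.0f);
}

// Uniform removal: keep the first `density` fraction of each group of strands,
// assuming consecutive indices are scattered over the scalp.
bool keepStrand(std::size_t strandIndex, float density, std::size_t groupSize = 16) {
    const std::size_t kept = static_cast<std::size_t>(density * groupSize + 0.5f);
    return (strandIndex % groupSize) < std::max<std::size_t>(kept, 1);
}

// Thicken the remaining strands to compensate for the removed ones.
float strandWidthScale(float density) { return 1.0f / density; }
```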

TressFX 2.0 IMPROVEMENTS
CODE RESTRUCTURING
The TressFX11 sample code is much more modular
All of the necessary TressFX code is in separate files for
‒ Rendering
‒ Simulation
‒ Mesh management
‒ Asset loading
Code for head rendering and the sample framework is completely separate
‒ Take the “TressFX” files to get just what you need
Better variable names
Removal of dead code
(Diagram of the code structure: Main, TressFXSimulate, TressFXRender, SceneRender, TressFXMesh, TressFXAssetLoader, Gaussian Filter, DX11Mesh and ObjImport, with the TressFX code highlighted)

TressFX 2.0 IMPROVEMENTS
MISCELLANEOUS IMPROVEMENTS
Vertex shader optimizations for rendering
‒ The hair draw call now uses an index buffer with a triangle list instead of looking up indices from a buffer
The PPLL head buffer is an RWTexture2D for better caching (tiled)
Hair shadow on the model is softer and less blocky
Various shader code optimizations
Porting Guide
Download the new TressFX 2.0 sample soon from our Radeon SDK:
http://developer.amd.com

TressFX 2.0 RENDERING
MEMORY CONSIDERATIONS
A-Buffer
‒ 2 UAVs, with sizes determined by the screen resolution
‒ Linked list head UAV: screen-resolution RWTexture2D, DXGI_FORMAT_R32_UINT
‒ Per-pixel linked list UAV: structured buffer, size = (number of pixels) x (average hair layers) x sizeof(LinkedListStructure)
‒ The default average number of hair layers is 8
‒ The linked list structure is currently 3 DWORDs: depth, coverage, tangent
Limited memory, but unbounded linked list
‒ This means too many fragments for a given pixel can overflow the PPLL
‒ This can cause artifacts
‒ Typically this only happens if the camera gets too close
(Chart: Total A-Buffer Memory (MB), 0 to 250, split into Linked List Head and Per-Pixel Linked List, at 720p and 1080p)
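A short C++ helper turns the sizing rule into numbers. It assumes 4 bytes per head pointer (DXGI_FORMAT_R32_UINT) and the 3-DWORD (12-byte) node stated for the linked list structure. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ABufferMemory {
    std::uint64_t headBytes;  // linked list head UAV
    std::uint64_t ppllBytes;  // per-pixel linked list UAV (node pool)
};

// head = pixels x 4 bytes; PPLL = pixels x average hair layers x node size.
ABufferMemory aBufferMemory(std::uint64_t width, std::uint64_t height,
                            std::uint64_t avgHairLayers = 8, std::uint64_t nodeBytes = 3 * 4) {
    assert(width > 0 && height > 0);
    const std::uint64_t pixels = width * height;
    return {pixels * 4, pixels * avgHairLayers * nodeBytes};
}

double toMiB(std::uint64_t bytes) { return static_cast<double>(bytes) / (1024.0 * 1024.0); }
```

At 1920x1080, this gives 8,294,400 bytes for the head texture and 199,065,600 bytes for the node pool, roughly 198 MiB in total. At 1280x720, the total is about 88 MiB.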

TressFX 2.0 RENDERING PERFORMANCE RESULTS
(Charts: Total Hair Render Time (ms), split into A-Buffer Pass and K-Buffer Pass, TressFX 1.0 vs. TressFX 2.0, on the R9 290X and the R9 280X)
> 2X performance increase!

TressFX SIMULATION

TressFX Simulation Topics
TressFX 1.0 Simulation Overview
‒ Main Interest
‒ Simulation Overview
‒ Constraints
  ‒ Global Shape Constraints
  ‒ Local Shape Constraints
  ‒ Edge Length Constraints
‒ Problems
TressFX Beyond
‒ General Constraint Formulation
‒ Tridiagonal Matrix-free Formulation
‒ Solving Linear System
‒ Benefits

Main Interest
Main interests of TressFX simulation
‒ Performance, performance and performance! – DirectCompute
‒ Styled hair – bending and twisting forces are important
‒ Stability – position based dynamics
‒ Conditions – wet, dry or heavy
‒ Wind – helps express dynamics even when the character is idle

Simulation Overview
CPU
    load hair data
    precompute rest-state values (can be done offline)
GPU – DirectCompute
    while simulation running do
        apply gravity
        integrate
        apply GSC (Global Shape Constraints)
        apply LSC (Local Shape Constraints)
        apply wind
        apply ELC (Edge Length Constraints)
        collision handling
GPU – Rendering pipeline (vertex buffer)
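The "apply gravity" and "integrate" steps can be sketched in C++ as damped position Verlet integration, a common choice for position based dynamics. The exact integrator and damping form TressFX uses are assumptions here. Positions are kept in two buffers (current and previous). The root vertex is skipped because it follows the head.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// A strand stored as current and previous vertex positions; index 0 is the root.
struct Strand {
    std::vector<Vec3> pos;
    std::vector<Vec3> prevPos;
};

// x_new = x + damping * (x - x_prev) + gravity * dt^2
// damping = 1 keeps all velocity, damping = 0 removes it.
void integrate(Strand& s, Vec3 gravity, float dt, float damping) {
    assert(s.pos.size() == s.prevPos.size());
    const float dt2 = dt * dt;
    for (std::size_t i = 1; i < s.pos.size(); ++i) {  // root is attached to the head
        const Vec3 p = s.pos[i];
        const Vec3 q = s.prevPos[i];
        s.pos[i] = {p.x + damping * (p.x - q.x) + gravity.x * dt2,
                    p.y + damping * (p.y - q.y) + gravity.y * dt2,
                    p.z + damping * (p.z - q.z) + gravity.z * dt2};
        s.prevPos[i] = p;
    }
}
```

The constraint passes (GSC, LSC, wind, ELC, collision handling) would then run on the updated positions.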

GLOBAL SHAPE CONSTRAINTS
GSC (Global Shape Constraints)
‒ The initial positions of the particles serve as the global goal positions
‒ The goal positions are rigid with respect to the character’s head transform
‒ You can think of the initial positions as a cage in which the vertices are trapped during simulation
‒ Easy and cheap: helps maintain the global shape but loses simulation detail
(Figure: initial goal position, current position, final position)
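A global shape constraint can be sketched as pulling every non-root vertex part of the way towards its goal position. For brevity, the head transform is reduced to a translation, and `stiffness` is a hypothetical parameter in [0, 1]. A full version would also apply the head rotation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Goal = rest position moved rigidly with the head (translation only here).
void applyGlobalShapeConstraint(std::vector<Vec3>& pos, const std::vector<Vec3>& restPos,
                                Vec3 headTranslation, float stiffness) {
    assert(pos.size() == restPos.size() && stiffness >= 0.0f && stiffness <= 1.0f);
    for (std::size_t i = 1; i < pos.size(); ++i) {  // root (i = 0) follows the head
        const Vec3 goal{restPos[i].x + headTranslation.x, restPos[i].y + headTranslation.y,
                        restPos[i].z + headTranslation.z};
        pos[i].x += stiffness * (goal.x - pos[i].x);
        pos[i].y += stiffness * (goal.y - pos[i].y);
        pos[i].z += stiffness * (goal.z - pos[i].z);
    }
}
```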

LOCAL SHAPE CONSTRAINTS
LSC (Local Shape Constraints)
‒ The goal positions are determined in the local frames
‒ The goal positions are still transformed into the world frame and applied to the vertex positions
(Figure: initial goal position, current position, final position)

LOCAL SHAPE CONSTRAINTS – CONT’
Local Transforms
‒ As in a robotic arm, an open-chain structure has joints, and each joint has a parent-child relationship with the joints connected to it
‒ $T^{i-1}_{i}$ transforms (translates and rotates) the child space $i$ into its parent space $i-1$
‒ Chaining the local transforms gives the global transforms:
$$T^{w}_{i} = T^{w}_{0} \cdot T^{0}_{1} \cdots T^{i-2}_{i-1} \cdot T^{i-1}_{i}$$
‒ The local frames must be updated at each particle

LOCAL SHAPE CONSTRAINTS – CONT’
Initialize and update local and global transforms
‒ Initialization is performed only once, on the CPU or offline
‒ The update is performed every frame on the GPU
‒ The update is a serial process along a strand but independent of other strands, so many strands are updated in parallel on the GPU
‒ With the local and global transforms, we can calculate the target vertex positions for local shape constraints
‒ Finally, update the two neighboring vertices to get stable convergence
(Figure: vertices i-1 and i; computing the local transform; updating positions)
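The chain of local transforms and the two-vertex update can be sketched in 2D C++. Here a rotation angle plus a translation stands in for the 3D frames. `globalTransforms` composes T^w_0 with the local transforms serially along the strand. `applyLocalShapeConstraint` moves vertex i+1 towards its rest offset expressed in vertex i's frame, splitting the correction between both vertices. All names and the 2D simplification are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// 2D rigid transform: rotate by `angle`, then translate by `t`.
struct Transform2D {
    float angle;
    Vec2 t;
    Vec2 rotate(Vec2 v) const {
        const float c = std::cos(angle), s = std::sin(angle);
        return {c * v.x - s * v.y, s * v.x + c * v.y};
    }
    Vec2 apply(Vec2 p) const {
        const Vec2 r = rotate(p);
        return {r.x + t.x, r.y + t.y};
    }
    // Composition: (A * B).apply(p) == A.apply(B.apply(p)).
    Transform2D operator*(const Transform2D& child) const {
        return {angle + child.angle, apply(child.t)};
    }
};

// global[0] = T^w_0 and global[i] = global[i-1] * T^{i-1}_i, where local[k] holds T^k_{k+1}.
std::vector<Transform2D> globalTransforms(const Transform2D& worldFromRoot,
                                          const std::vector<Transform2D>& local) {
    std::vector<Transform2D> global{worldFromRoot};
    for (const Transform2D& t : local) global.push_back(global.back() * t);
    return global;
}

// Goal for vertex i+1: its rest offset rotated into vertex i's current frame.
// Half the correction goes to each vertex, so stiffness 1 restores the rest offset.
void applyLocalShapeConstraint(Vec2& xi, Vec2& xNext, const Transform2D& frameI,
                               Vec2 restOffset, float stiffness) {
    assert(stiffness >= 0.0f && stiffness <= 1.0f);
    const Vec2 r = frameI.rotate(restOffset);
    const Vec2 d{0.5f * stiffness * (xi.x + r.x - xNext.x),
                 0.5f * stiffness * (xi.y + r.y - xNext.y)};
    xNext.x += d.x; xNext.y += d.y;
    xi.x -= d.x;    xi.y -= d.y;
}
```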

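EDGE LENGTH CONSTRAINTS
Each edge moves its two end vertices by half of the length error along the unit edge vector:
$$\Delta\mathbf{x}_i = 0.5\,\bigl(\lVert\mathbf{x}_{i+1}-\mathbf{x}_i\rVert - l_i\bigr)\,\frac{\mathbf{x}_{i+1}-\mathbf{x}_i}{\lVert\mathbf{x}_{i+1}-\mathbf{x}_i\rVert}, \qquad \Delta\mathbf{x}_{i+1} = -\Delta\mathbf{x}_i$$
where $0.5$ splits the correction equally, $\lVert\mathbf{x}_{i+1}-\mathbf{x}_i\rVert - l_i$ is how much the edge is stretched or compressed relative to its rest length $l_i$, and the last factor is the unit edge vector.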
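The slide's rule can be sketched as an iterated, Gauss-Seidel style pass over a strand. The rule combines a factor of 0.5, the amount the edge is stretched or compressed, and the unit edge vector. This C++ version assumes the root is pinned, so on the first edge the child absorbs the whole correction. The function name and that choice are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// restLength[i] is the rest length of edge (i, i+1); x[0] is the pinned root.
void applyEdgeLengthConstraints(std::vector<Vec3>& x, const std::vector<float>& restLength,
                                int iterations) {
    assert(restLength.size() + 1 == x.size());
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i + 1 < x.size(); ++i) {
            const Vec3 e{x[i + 1].x - x[i].x, x[i + 1].y - x[i].y, x[i + 1].z - x[i].z};
            const float len = std::sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
            if (len <= 0.0f) continue;                    // degenerate edge, skip
            const float s = (len - restLength[i]) / len;  // signed length error per unit of e
            const float wi = (i == 0) ? 0.0f : 0.5f;      // pinned root takes no correction
            const float wj = (i == 0) ? 1.0f : 0.5f;
            x[i].x += wi * s * e.x;  x[i].y += wi * s * e.y;  x[i].z += wi * s * e.z;
            x[i + 1].x -= wj * s * e.x;  x[i + 1].y -= wj * s * e.y;  x[i + 1].z -= wj * s * e.z;
        }
    }
}
```

Each pass only reduces the length error, so the number of iterations is a tuning decision. This is the cost that the tridiagonal formulation removes.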

Problems
Extreme acceleration
‒ When the character makes a sudden move, it can generate extreme linear and angular accelerations that stretch the hair far too long
‒ Even with many Edge Length Constraint iterations, the hair doesn’t recover its original length; as a result, it can look too stretchy
‒ A possible solution, used for Tomb Raider, was to enforce the Edge Length Constraints serially from the root to the tip of the hair, with extra damping
‒ We need a better way! And we did research!
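For comparison, the serial root-to-tip enforcement can be sketched as a single pass. Each vertex is placed at its rest length from its already-corrected parent. The extra damping used for Tomb Raider is not modeled, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// restLength[i - 1] is the rest length of edge (i - 1, i); x[0] is the root.
void enforceLengthsRootToTip(std::vector<Vec3>& x, const std::vector<float>& restLength) {
    assert(restLength.size() + 1 == x.size());
    for (std::size_t i = 1; i < x.size(); ++i) {
        const Vec3 e{x[i].x - x[i - 1].x, x[i].y - x[i - 1].y, x[i].z - x[i - 1].z};
        const float len = std::sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
        if (len <= 0.0f) continue;  // degenerate edge, leave as-is
        const float s = restLength[i - 1] / len;
        // Keep the edge direction, reset its length; only the child vertex moves.
        x[i] = {x[i - 1].x + s * e.x, x[i - 1].y + s * e.y, x[i - 1].z + s * e.z};
    }
}
```

One pass restores every edge length exactly. However, each vertex depends on its parent, so the pass is serial along the strand, and all of the correction is pushed towards the tip.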

Problems EXTREME ACCELERATION

Future TressFX Simulation

General Constraint Formulation

Tridiagonal Matrix Formulation
Special formulation for chain structures such as hair
‒ We don’t want to solve a big matrix equation, especially on the GPU!
‒ Let’s take advantage of the linear topology and serial indexing
(Equation annotations: general case, “We don’t want this!”; special case, “Much simpler!”; known terms, easy to compute; unknowns, what we are solving for)

SOLVING LINEAR SYSTEM
Solving Linear System
‒ The formulation doesn’t require an explicit matrix – good for the GPU!
‒ Only the diagonal, super-diagonal and sub-diagonal elements are non-zero – sparse!
‒ The system is diagonally dominant – a direct solver is a good choice!
‒ We can use the tridiagonal matrix algorithm (Thomas algorithm)
‒ So we can solve it on the GPU!
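The Thomas algorithm itself is short. The C++ sketch below solves a[i]·x[i-1] + b[i]·x[i] + c[i]·x[i+1] = d[i]. It runs in O(n) with one forward sweep and one back substitution, using only the three diagonals and no explicit matrix. How the coefficients are built from the hair constraints is not shown, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a: sub-diagonal (a[0] ignored), b: diagonal, c: super-diagonal (c[n-1] ignored),
// d: right-hand side. No pivoting; stable when the matrix is diagonally dominant.
std::vector<double> solveTridiagonal(const std::vector<double>& a, const std::vector<double>& b,
                                     const std::vector<double>& c, std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    std::vector<double> cp(n);  // modified super-diagonal
    cp[0] = c[0] / b[0];
    d[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {  // forward sweep: eliminate the sub-diagonal
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = (i + 1 < n) ? c[i] / m : 0.0;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (std::size_t i = n - 1; i-- > 0;)  // back substitution, from n-2 down to 0
        d[i] -= cp[i] * d[i + 1];
    return d;  // d now holds the solution x
}
```

Each strand yields its own small system. One natural GPU mapping is one thread per strand running these two sweeps, but that mapping is an assumption, not something the slides specify.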

FUR CASTLE

FUR MUSHROOM

GRASS

BENEFITS
No more iterations for Edge Length Constraints
‒ No need to guess the number of iterations
‒ Fixed computation cost
‒ Fast convergence

TressFX 2.0
CONCLUSIONS
TressFX 2.0 renders hair faster than the previous version
‒ More than 2X faster in some cases
TressFX is now fast enough to use on consoles
A more modular code structure means easier porting to your game
Realistic physics for hair simulation can now be extended to other objects
Stay tuned for more!
‒ Ongoing research to improve and expand the use of this technology

REFERENCE
Real-time Hair Simulation with Efficient Hair Style Preservation – Han, et al. VRIPHYS 2012
Tridiagonal Matrix Formulation for Inextensible Hair Strand Simulation – Han, et al. VRIPHYS 2013

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.