Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation Christiaan Gribble & Lee A. Butler GPU Technology Conference 21 March 2013

  • Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation

    Christiaan Gribble & Lee A. Butler GPU Technology Conference

    21 March 2013

  • Introductions

    Christiaan Gribble, SURVICE Engineering

    Lee A. Butler, US Army Research Laboratory

    Alexis Naveros, SURVICE Engineering

    Mark Butkiewicz, SURVICE Engineering

  • SURVICE Engineering

    • Support DoD community

    • Focus on combat systems

    – Safety

    – Survivability

    – Effectiveness

    • 400+ employees

    • 10 locations nationally

  • US Army Research Laboratory

    • US Army RDECOM

    – Corporate laboratory

    – 2000 civilian employees

    • Directorates

    – SLAD

    – Army Research Office

    – Many others

    • Still in the Top 500 list

  • Agenda

    • Application domains

    • Technical motivation

    • Rayforce GPU ray tracing engine

    • Cognition-Driven Simulation

    • Visual Simulation Laboratory


  • Application domains

    • Ballistic penetration

    • Radio frequency propagation

    • Thermal radiative transport

    • High-energy particle transport

  • Technical motivation

    [Figures: optical rendering; non-optical rendering]

  • Technical motivation

    Interval computation Interval generation

    • Difficult or impossible

    – Negative epsilon hacks

    – Missed/repeated hits

    • Performance impacts

    – Traversal restart

    – Operational overhead
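To make the restart problem concrete, here is a minimal sketch, assuming a toy 1D model in which each surface is just a t-value along the ray. The functions `first_hit` and `hits_by_restart` and the epsilon value are hypothetical illustrations, not part of Rayforce. Intervals are built by re-launching a first-hit query just past each hit, so a surface closer than the epsilon offset is skipped, and every hit costs one more full query.

```python
def first_hit(surfaces, t_min):
    """Stand-in for a first-hit query: nearest surface with t > t_min, or None."""
    ahead = [t for t in surfaces if t > t_min]
    return min(ahead) if ahead else None


def hits_by_restart(surfaces, eps):
    """Collect hits by restarting a first-hit query just past each hit."""
    hits, queries, t_min = [], 0, 0.0
    while True:
        queries += 1                    # each restart is a full traversal
        t = first_hit(surfaces, t_min)
        if t is None:
            return hits, queries
        hits.append(t)
        t_min = t + eps                 # epsilon offset to avoid re-hitting t


# Two nearly coincident surfaces followed by a third (made-up values).
surfaces = [1.0, 1.00005, 2.0]
hits, queries = hits_by_restart(surfaces, eps=1e-4)
print(hits, queries)
```

In this toy setup the surface at 1.00005 falls inside the epsilon window and is never reported, while the two reported hits take three separate queries.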

  • Rayforce

    • Programmable ray tracing engine

    • Designed for NVIDIA GPUs

    • High performance

    – Modern techniques

    – Novel acceleration structure

    – Multiple traversal algorithms

  • State-of-the-art ray tracing

    • Leverages modern techniques

    – Ray packets

    – Frustum tracing

    • Exploits hardware features

    – SIMD processing (v2.1)

    – Architecture-specific optimizations

    Proven techniques bolster high performance

  • Acceleration structure

    • kd-tree

    • Binary Space Partitioning tree

    • Regular grid

    • Bounding Volume Hierarchy

    Graph-based spatial indexing

  • Graph-based spatial indexing

    • Efficient

    – Uses memory very carefully

    – Improves cache performance

    – Reduces memory bandwidth

    • Flexible

    – Several traversal algorithms

    – Minimal overhead

    – User-configurable pipelines

    • Scalable

    – Handles complex scenes

    – Performance depends only on complexity along a ray

  • Traversal algorithms

    • First-hit

    – Nearest intersected primitive?

    – Visibility/bounce rays

    • Any-hit

    – Is any primitive intersected?

    – Shadow/ambient occlusion rays

    • Multi-hit

    – Which primitives are intersected?

    – Transparency & non-optical rendering
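The three query types can be contrasted with a small brute-force sketch. It assumes spheres stand in for triangles and a plain list stands in for the acceleration structure; the function names are hypothetical and do not reflect the Rayforce API.

```python
import math


def intersect(sphere, origin, direction):
    """Nearest positive t where the ray hits the sphere, or None.
    sphere = (center, radius); direction is assumed to be unit length."""
    (cx, cy, cz), radius = sphere
    ox, oy, oz = origin[0] - cx, origin[1] - cy, origin[2] - cz
    b = ox * direction[0] + oy * direction[1] + oz * direction[2]
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 0.0:
            return t
    return None


def all_hits(scene, origin, direction):
    hits = []
    for index, sphere in enumerate(scene):
        t = intersect(sphere, origin, direction)
        if t is not None:
            hits.append((t, index))
    return hits


def first_hit(scene, origin, direction):
    """Nearest intersected primitive (visibility/bounce rays)."""
    hits = all_hits(scene, origin, direction)
    return min(hits) if hits else None


def any_hit(scene, origin, direction):
    """Is any primitive intersected? (shadow/ambient occlusion rays)"""
    return any(intersect(s, origin, direction) is not None for s in scene)


def multi_hit(scene, origin, direction):
    """Which primitives are intersected, ordered by t along the ray."""
    return sorted(all_hits(scene, origin, direction))


# Made-up scene: a far sphere, an off-axis sphere the ray misses, a near sphere.
scene = [((0.0, 0.0, 10.0), 1.0), ((5.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 5.0), 1.0)]
origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
nearest = first_hit(scene, origin, direction)
occluded = any_hit(scene, origin, direction)
ordered = multi_hit(scene, origin, direction)
```

First-hit returns one record, any-hit returns only a boolean and can stop at the first intersection it finds, and multi-hit returns every intersection in ray order.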

  • Performance – tests

    Coherent workloads

    • vis – first-hit visibility

    – N · V shading

    • x-ray – all multi-hit intersections

    – alpha blending

    Incoherent workloads

    • ao – first-hit visibility

    – 32 AO rays/intersection

    • kajiya – first-hit visibility

    – shadows + 2 diffuse bounces
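The shading terms named for the two coherent tests can be sketched as follows. This is an illustration only: the use of the absolute value in N · V, the single fixed alpha per hit, and the function names are assumptions rather than details taken from the benchmark code.

```python
def n_dot_v(normal, ray_dir):
    """|N . V| with V pointing back along the ray; both assumed unit length."""
    return abs(sum(n * -d for n, d in zip(normal, ray_dir)))


def xray_blend(shades, alpha):
    """Front-to-back alpha blending of per-hit shades, ordered near to far."""
    color, transmittance = 0.0, 1.0
    for shade in shades:
        color += transmittance * alpha * shade
        transmittance *= 1.0 - alpha
    return color


# vis: one first-hit shade per ray.
vis_shade = n_dot_v((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))

# x-ray: blend the shades of all multi-hit intersections along the ray.
xray_shade = xray_blend([1.0, 1.0], alpha=0.5)
```

The x-ray test needs every intersection along the ray in order, which is why it exercises multi-hit traversal rather than first-hit.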

  • Performance – scenes

    Images rendered at 1024x768 pixels on an NVIDIA GeForce GTX 690

    ktank 1M tris

    conference 282K tris

    san miguel 10M tris

  • Performance – results

    [Bar chart: throughput in Mrps for the vis and x-ray tests (coherent workloads) and the ao and kajiya tests (incoherent workloads); vertical axis 0 to 1000]

  • Just for Fun …

    • 1920x1080 vs 1024x768

    • Single hit

    • No color, Lambertian only

    [Bar chart: vis throughput in Mrps; vertical axis 0 to 1400]

  • Multi-hit traversal

    • Which primitives are intersected?

    – One or more, & possibly all

    – Ordered by t-value along ray

    • Core operation in Rayforce

    – Avoids negative epsilon hacks

    – Alleviates traversal restart

    • Critical to interval generation

    – Handles bad geometry gracefully

    – Enables early exit

    • Applications

    – Physically based simulation

    – Order-independent transparency

    – …
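As an illustration of why ordered hits matter for interval generation, the sketch below pairs entry and exit hits into per-component intervals. It assumes watertight geometry, so each component contributes hits in entry/exit pairs; the component names and the `intervals_from_hits` helper are hypothetical.

```python
def intervals_from_hits(hits):
    """hits: iterable of (t, component_id) in any order.
    Returns [(component_id, t_in, t_out)] sorted by entry point."""
    open_at = {}        # component_id -> t of the unmatched entry hit
    intervals = []
    for t, component in sorted(hits):
        if component in open_at:
            intervals.append((component, open_at.pop(component), t))
        else:
            open_at[component] = t
    return sorted(intervals, key=lambda interval: interval[1])


# Made-up hits along one ray, deliberately out of order.
hits = [(3.0, "engine"), (1.0, "armor"), (4.5, "engine"), (1.2, "armor")]
intervals = intervals_from_hits(hits)
thickness = {name: t_out - t_in for name, t_in, t_out in intervals}
```

Each interval gives the line-of-sight thickness of one component. An unmatched entry hit is simply left unpaired in this sketch.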

  • Naïve multi-hit

    function TRAVERSE(root, ray)
      // Find all hits
      INITIALIZE(hitList)
      node ← root
      while VALID(node) do
        if !EMPTY(node) then
          for tri in node do
            if INTERSECT(tri, ray) then
              hitData ← (t-value, u, v, …)
              ADD(hitList, hitData)
            end if
          end for
        end if
        node ← NEXT(node)
      end while

      // Process desired hits
      for hitData in hitList do
        if !USERHIT(ray, hitData) then
          goto fini
        end if
      end for

      label fini:
      USEREND(ray)
    end function

    Simple & effective, but potentially slow
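A runnable rendition of the naïve scheme might look like the sketch below. It makes several assumptions: nodes are a plain list visited in order (standing in for NEXT), `intersect` is a user-supplied function returning a t-value or None, and the hit list is sorted by t before processing, which the slide does not state explicitly.

```python
def traverse_naive(nodes, ray, intersect, user_hit, user_end):
    """Gather every hit first, then hand them to the user callback in order."""
    hit_list = []
    for node in nodes:                      # stand-in for node = NEXT(node)
        for prim in node:
            t = intersect(prim, ray)
            if t is not None:
                hit_list.append((t, prim))
    hit_list.sort(key=lambda hit: hit[0])   # assumed: hits ordered by t-value

    for hit in hit_list:
        if not user_hit(ray, hit):          # user asks to stop...
            break                           # ...but all hits were already found
    user_end(ray)
    return len(hit_list)


# Toy 1D model: a primitive is its own t-value; negative means "behind the ray".
seen = []
found = traverse_naive(
    nodes=[[3.0, 1.0], [], [2.0, -1.0]],
    ray=None,
    intersect=lambda prim, ray: prim if prim > 0 else None,
    user_hit=lambda ray, hit: seen.append(hit[0]) or len(seen) < 2,
    user_end=lambda ray: None,
)
```

Here the callback only wants two hits, yet all three intersections are computed before it gets the chance to say so.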

  • Rayforce multi-hit

    function TRAVERSE(root, ray)
      node ← root
      while VALID(node) do
        if !EMPTY(node) then
          SET(flags, INIT)
          while TRUE do
            // Find some hits
            INITIALIZE(hitList)
            for tri in node do
              if !DONE(hitMask, tri) then
                if INTERSECT(tri, ray) then
                  hitData ← (t-value, u, v, …)
                  if ADD(hitList, hitData) then
                    SET(flags, REPEAT)
                  end if
                end if
              end if
            end for

            if GET(flags) == (INIT | REPEAT) then    // both INIT and REPEAT set
              INITIALIZE(hitMask)
              UNSET(flags, INIT)
            end if

            for hitData in hitList do
              if !USERHIT(ray, hitData) then
                goto fini    // Early exit
              end if
              if GET(flags) == REPEAT then
                DONE(hitMask, hitData, TRUE)
              end if
            end for

            if GET(flags) != REPEAT then
              break
            end if
            UNSET(flags, REPEAT)
          end while
        end if
        node ← NEXT(node)
      end while

      label fini:
      USEREND(ray)    // Per-ray cleanup
    end function

    Gains efficiency with early exit
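The per-node, bounded-list idea can be sketched in runnable form as follows. This is a simplified reading of the pseudocode, not the Rayforce implementation: it assumes nodes arrive in front-to-back order, that an overflowing hit list keeps its nearest entries and signals a repeat pass, and it uses a Python set in place of the lazily initialized hit mask. The `capacity` parameter and all names are hypothetical.

```python
import bisect


def traverse_bounded(nodes, ray, intersect, user_hit, user_end, capacity=2):
    """Process hits node by node with a fixed-size hit list and early exit.
    Returns the number of intersection tests performed."""
    tests, stop = 0, False
    for node in nodes:
        done = set()                        # plays the role of hitMask
        while not stop:
            hit_list, overflow = [], False
            for index, prim in enumerate(node):
                if index in done:
                    continue
                tests += 1
                t = intersect(prim, ray)
                if t is None:
                    continue
                bisect.insort(hit_list, (t, index))
                if len(hit_list) > capacity:
                    hit_list.pop()          # drop farthest hit, retry it later
                    overflow = True
            for t, index in hit_list:
                if not user_hit(ray, (t, node[index])):
                    stop = True             # early exit: skip remaining work
                    break
                done.add(index)
            if not overflow:
                break
        if stop:
            break
    user_end(ray)                           # per-ray cleanup
    return tests


def hit_t(prim, ray):
    """Toy 1D model: a primitive is its own t-value."""
    return prim if prim > 0 else None


nodes = [[3.0, 1.0, 2.0], [5.0, 4.0]]

seen = []
tests_early = traverse_bounded(
    nodes, None, hit_t,
    user_hit=lambda ray, hit: seen.append(hit[0]) or len(seen) < 2,
    user_end=lambda ray: None,
)

seen_all = []
tests_all = traverse_bounded(
    nodes, None, hit_t,
    user_hit=lambda ray, hit: seen_all.append(hit[0]) or True,
    user_end=lambda ray: None,
)
```

When the callback stops after two hits, the second node is never tested. When it wants everything, the repeat pass costs one extra test in the first node.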

  • Early Exit Buys Performance

    [Bar chart: scenes ktank, conf, san miguel; vertical axis 0 to 250; annotated gains +39.05%, +91.00%, +104.01%]

    Rayforce multi-hit outperforms naïve algorithm by 1.8x
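One way to reconcile the headline figure with the three annotated gains, assuming the 1.8x is the arithmetic mean of the per-scene gains (the slide does not say how it was derived):

```python
# Gains annotated on the chart, in percent.
gains_percent = [39.05, 91.00, 104.01]

mean_gain = sum(gains_percent) / len(gains_percent)   # about 78.02 percent
speedup = 1.0 + mean_gain / 100.0                      # about 1.78x

print(f"mean gain {mean_gain:.2f}% -> {speedup:.2f}x")
```

Rounded to one decimal place, that mean matches the 1.8x quoted above.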

  • Rayforce

    • Battle-tested techniques

    • Novel acceleration structure

    • Multi-hit ray traversal

    • Hand-tuned for CUDA

    Demonstrated high-performance GPU ray tracing

    first-hit, any-hit, multi-hit

    Demonstration: Quadro 3000M, 240 Fermi CUDA Cores @ 900 MHz

    Public LGPL v2.0 release of Rayforce now available!

  • Cognition-Driven Simulation

    • Perform visualization during simulation

    – As a by-product of computation

    – As computation progresses

    • Key advantages

    – Enables exploration & steering

    – Drives understanding & confidence

    – User cognition must be managed: too fast, and details are missed; too slow, and users disengage

    • Managed computation

    – Focus on most interesting features

    – Avoid uninteresting parts of parameter space

  • Visual Simulation Laboratory

    • A cross-platform, open-source application framework

    – Qt, OpenSceneGraph, & other technologies

    • The foundation used for several CDS simulation applications

    Public LGPL v2.0 release of VSL now available!

  • Get the software

    Rayforce

    Rayforce Website:

    http://rayforce.net

    Source code:

    http://sourceforge.net/projects/rayforce

    VSL

    VSL Website:

    http://vissimlab.org

    Source code:

    http://sourceforge.net/projects/vissimlab