real-time reyes-style adaptive surface subdivision anjul patney, john owens university of...

73
Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Post on 19-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Real-Time Reyes-Style Adaptive Surface Subdivision

Anjul Patney, John OwensUniversity of California, Davis

Page 2: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Offline Rendering

• Looks realistic• Virtually no visible artifacts• Renders on clusters of CPUs– Slow: hours per frame– Flexible rendering pipelines

Images courtesy: Pixar

Page 3: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Real-Time Rendering

• Looks good enough• Minor artifacts are OK• Renders on commodity GPUs– Fast: 60+ frames per second– Restrictive rendering pipeline

Images courtesy: www.ign.com

Page 4: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

There is a gap

• Geometric complexity– From Polygon Meshes to Smooth Surfaces

• Shading complexity– From HLSL to RenderMan shaders

• Special Effects– Motion Blur, depth-of-field

• Other complex effects– Global Illumination, Subsurface scattering,

ambient occlusion

Page 5: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

But commodity GPUs…

Vertex Processors

Composite

Fragment Processors

Host

Rasterization

Primitive Setup

Fragment Crossbar

FramebufferRef: NVIDIA GeForce 6 Architecture

Page 6: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

… have changed a lot

General-PurposeProcessors

Host

Thread Issue

Framebuffer

Setup / Rasterize

Thread Processor

Ref: NVIDIA G80 Architecture

Page 7: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

How does this affect things?

• Increased programmability– Arbitrary computation– Dynamic memory management– Irregular data structures

• Flexible Rendering– Compute for graphics– Offline quality in real-time ?

• But we must be careful

Page 8: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

There is a gap

• Geometric complexity– From Polygon Meshes to Smooth Surfaces

• Shading complexity– From HLSL to RenderMan shaders

• Special Effects– Motion Blur, depth-of-field

• Other complex effects– Global Illumination, Subsurface scattering,

ambient occlusion

Page 9: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Real-Time Reyes-Style Adaptive Surface Subdivision

Anjul Patney and John D. OwensSIGGRAPH Asia 2008

http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=952

Page 10: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Outline

• Motivation• Reyes Subdivision – algorithm– Challenges– Parallel formulation

• Subdivision on GPU – implementation– Issues– Solutions

• Results

Page 11: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Outline

• Motivation• Reyes Subdivision – algorithm– Challenges– Parallel formulation

• Subdivision on GPU – implementation– Issues– Solutions

• Results

Page 12: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Motivation

• Polygon-based Rendering is insufficient– Undesirable artifacts, especially along silhouettes– Complicated model representation– Model resolution is view-independent

• Can we expect performance from irregular computation on GPUs?

• Can GPUs support completely new pipelines?

Page 13: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Image courtesy: www.ign.com

Page 14: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Enter Reyes

• Industry standard in high-quality rendering

• Forms the architecture beneath RenderMan

• Pipeline features– Input: Parametric Surfaces– Rendering primitive: 0.5 x 0.5 pixel

micropolygons– Adaptive tessellation– Per-micropolygon programmable shading– Stochastic sampling

Micropolygongrids

Bound, Cull and Split

Dice (tessellate)

Shade

Sample and Filter

Higher-order surfaces

Split surfaces

Shaded grids

Final Pixels

Page 15: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Outline

• Motivation• Reyes Subdivision – algorithm– Challenges– Parallel formulation

• Subdivision on GPU – implementation– Issues– Solutions

• Results

Page 16: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 17: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 18: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 19: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 20: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 21: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 22: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 23: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 24: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 25: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 26: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 27: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 28: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 29: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 30: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 31: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 32: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

What is bad?

• Depth first subdivision is recursive!

• List of primitives is not static– Cull, split may destroy or generate primitives

• Unbounded memory– Dicing produces a huge number of micropolygons

Page 33: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Can we do this in parallel?

Page 34: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Can we do this in parallel?

Page 35: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Can we do this in parallel?

• A lot of independent operations– Our simplest model:• 5k primitives, 1.2M micropolygons

– Massively Parallel workload

• Regular Computation– Bound/Split/Dice all primitives together– SPMD friendly

Page 36: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 37: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 38: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

No

Bound and Cull

Diceable?

Dice

Split

Yes

Page 39: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 40: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 41: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 42: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 43: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 44: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 45: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Parallel Reyes Subdivision

No

Yes

Bound and Cull

Diceable?

Dice

SplitNo

Yes

Page 46: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Analogy: A Dynamic Work Queue

A B C D E F G H I

A A B C EE F H

Split No ActionCull

C

Page 47: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

How can we do these efficiently?

• Creating new primitives– How to dynamically allocate space?

• Culling unneeded primitives– How to avoid fragmentation?

Page 48: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

A child primitive is offset by the queue length

Our Choice – keep it simple…

A B C D E F G H I

A B C E F H A C E

Page 49: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

…and get rid of the holes later

A B C E F H A C E

A B C E F H A C E

Scan-based compact is fast!(Sengupta ‘07)

Work-queue stays contiguous

Page 50: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Outline

• Motivation• Reyes Subdivision – algorithm– Challenges– Parallel formulation

• Subdivision on GPU – implementation– Issues– Solutions

• Results

Page 51: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Platform

• NVIDIA GeForce 8800 GTX– 16 SMs, each with 32-wide effective SIMD– 16KB shared memory per SM– 768 MB total GPU memory, no cache

• NVIDIA CUDA 1.1– Grid/Block/Thread programming model– OpenGL interface through shared buffers

Page 52: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Implementation Details

• Input primitives – Bicubic Bézier Surfaces– Choice of primitive only affects implementation

• View Dependent Subdivision every frame– CPU-GPU input transfer only once– Suitable for animating control points

• Final micropolygons sent to OpenGL as a VBO– Flat-shaded and displayed for preview

Page 53: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Kernels Implemented

• Dice– Regular, symmetric computation on a highly

parallel workload– 256 threads per primitive– Primitive information in shared memory

• Bound/Split– Non-trivial to ensure efficiency in implementation

Page 54: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Bound/Split: Efficiency Goals

• Memory Coherence– Off-chip memory accesses must be efficient

• Computational Efficiency– Hardware SIMD must be maximally utilized

Page 55: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Memory Coherence

• After each iteration, work queue is compacted– Primitives always contiguous in memory

• Structure-Of-Arrays representation– Attributes across primitives adjacent in memory

• 99.5% of all accesses were fully coalesced

Page 56: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

SIMD Utilization

• Intra-Primitive parallelism– A primitive’s control points are mostly

independent– Execution path divergence is negligible

• 16 Threads per primitive– Vectorized Bound/Split– Use shared memory for

communication

• 90.16% of all branches were SIMD coherent

32 threads (1 warp)

Page 57: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Outline

• Motivation• Reyes Subdivision – algorithm– Challenges– Parallel formulation

• Subdivision on GPU – implementation– Issues– Solutions

• Results

Page 58: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Results - Killeroo

• 11532 patches 14426 grids

• 5 levels of subdivision• Bound/Split: 6.99 ms• Dice: 7.21 ms

• 4.2 frames per second (subdivision-only: 70)

Killeroo NURBS model courtesy headus 3D tools: http://headus.com.au

Page 59: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Results - Teapot

• 32 patches 4823 grids• 11 levels of subdivision• Bound/Split: 3.46 ms• Dice: 2.42 ms

• 12.4 frames per second (subdivision-only: 170)

Page 60: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

0 5000 10000 15000 200000

5

10

15

20

25

30

Total subdivision Bound / SplitDice

Number of grids

Tim

e (m

s)

Results – Random ModelsSubdivision time

proportional to number of micropolygons

Page 61: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Overheads

Killeroo Teapot Random Scene (30 patches)

0%

25%

50%

75%

100%

Render overheadCUDA-OpenGL mapping overheadNormal generation timeSubdivision time

GPUs are inefficient at rendering lots of small

triangles

Should ideally be zero; this is an acknowledged

CUDA limitation

Page 62: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Storage Issues

• Reyes pipeline suffers from unbounded memory demand– A huge number of micropolygons are generated– Transparency and Blending preclude early rejection

• Most implementations use screen-space buckets– But how does this work in parallel?– Large buckets present a more parallel workload– Small buckets have a smaller memory footprint

Page 63: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Screen-Space Buckets

0 128 256 384 5120

10

20

30

40

50

60

70

80

90

100

0

10

20

30

40

50

Cumulative subdivision time Maximum memory footprintAverage memory footprint

Bucket width (pixels)

Subd

ivis

ion

Tim

e (m

s)

Mem

ory

Usa

ge (M

B)

Acceptable performance!Small memory footprint!

Page 64: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Limitations

• All primitives must be split before dicing

• Cracks / Pinholes

• Uniform Dicing is wasteful

Page 65: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Limitations

• All primitives must be split before dicing

• Cracks / Pinholes

• Uniform Dicing is wasteful

Page 66: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Limitations

• All primitives must be split before dicing

• Cracks / Pinholes

• Uniform Dicing is wasteful

Page 67: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Limitations

• All primitives must be split before dicing

• Cracks / Pinholes

• Uniform Dicing is wasteful

Page 68: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Conclusions

• Recursive subdivision maps well to current GPUs– And works fast!– It is advantageous to use smooth primitives in

interactive rendering

• Fixed-function tessellation can be emulated– Dicing is already very fast (2 Mgrids/sec)

• It’s time to experiment with alternate pipelines

Page 69: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Future Work

• Crack Filling– Add dummy polygons during post-processing

• More of Reyes– Displacement Mapping– Offline quality Shading on GPUs– Parallel Stochastic Sampling (Wei ‘08)– A-buffer

Page 70: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

Thanks to

• Feedback and suggestions from– Per Christensen, Charles Loop, Dave Luebke, Matt

Pharr, and Daniel Wexler

• Financial support from– DOE Early Career PI Award– National Science Foundation– SciDAC Institute for Ultrascale Visualization

• Equipment support from NVIDIA

Page 71: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

BACKUP SLIDESReal-Time Reyes-Style Adaptive Surface Subdivision

Page 72: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

CUDA Thread Structure

Image courtesy: NVIDIA CUDA Programming Guide, 1.1

Page 73: Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis

CUDA Memory Architecture

Image courtesy: NVIDIA CUDA Programming Guide, 1.1