deferred lighting and post-processing on ps3

Deferred Lighting and Post Processing on PLAYSTATION®3 Matt Swoboda PhyreEngine™ Team Sony Computer Entertainment Europe (SCEE) R&D

Page 1: Deferred Lighting and Post-Processing on PS3

Deferred Lighting and Post Processing on PLAYSTATION®3

Matt Swoboda

PhyreEngine™ Team

Sony Computer Entertainment Europe (SCEE) R&D

Page 2: Deferred Lighting and Post-Processing on PS3

Where Are We Now?

• PS3 into its 3rd year

• Many developers on their 2nd generation engines

• Solved the basic problems

• SPUs STILL underused
  – But it’s improving

Page 3: Deferred Lighting and Post-Processing on PS3

But..

• GPU now the most common bottleneck

• Usually limited by fragment operations

• Many titles take > 1/3 of their time in post processing

• Most developers want to do even more fragment work
  – More / heavier post processing effects
  – Better lighting techniques / more lights / softer shadows
  – Longer shaders
  – Features ported from PC / other console hardware

Page 4: Deferred Lighting and Post-Processing on PS3

“We fixed the vertex bottleneck..”

• Many possible solutions to improve geometry performance beyond just “optimising the shader”
  – LOD
  – Occlusion culling & visibility culling
  – Move large vertex operations to SPU, e.g. skinning
  – SPU triangle culling

Page 5: Deferred Lighting and Post-Processing on PS3

What About Pixels?

• Fragment operations / post processing rarely optimised like geometry operations
  – Throw whole operation at the GPU
  – Same operation done for every pixel
  – Spatial optimization / branching considered too slow

• SPU not considered: “too slow”, “uses too much bandwidth”

Page 6: Deferred Lighting and Post-Processing on PS3

SPU pixel processing

• Yes, the SPU is fast enough to process pixels

• Won’t beat the GPU in a brute force race

• GPU specialises in rasterising triangles and sampling textures – has dedicated hardware

• SPU is a general purpose processor
  – Use flexibility to your advantage
  – Choose different code branches and fast paths

Page 7: Deferred Lighting and Post-Processing on PS3

Post Processing Effects on SPU

A Whirlwind Tour

Page 8: Deferred Lighting and Post-Processing on PS3

What to do on SPU

• Options:

• Offload whole processes from GPU to SPU

• Or use SPU and GPU together to do one process

Page 9: Deferred Lighting and Post-Processing on PS3

Depth Of Field Pre-Process

• High quality depth of field requires a long fragment shader
  – Read depth samples and colour samples in a kernel / disc
  – Check depths against centre pixel depth
  – Weight colours by depth check results

• Wasteful for “most” of the screen
  – All depth checks pass (out of focus) or all fail (in focus)
  – All fail == pass through original buffer
  – All pass == use pre-blurred buffer – separable gaussian blur

• Categorise the screen for these cases on SPU
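
The categorisation above can be sketched as a per-tile min/max test. This is only an illustration, not the PhyreEngine implementation: `classify_tile`, `focus_near` and `focus_far` are hypothetical names, and the in-focus depth range is assumed to be a simple interval.

```python
from enum import Enum

class FocusClass(Enum):
    IN_FOCUS = 0      # all depth checks fail: pass the original buffer through
    OUT_OF_FOCUS = 1  # all depth checks pass: use the pre-blurred buffer
    MIXED = 2         # neither: run the full depth-of-field shader

def classify_tile(tile_depths, focus_near, focus_far):
    """Classify one screen tile from the min and max of its depth samples.

    focus_near/focus_far are assumed bounds of the in-focus depth range.
    """
    lo, hi = min(tile_depths), max(tile_depths)
    if focus_near <= lo and hi <= focus_far:
        return FocusClass.IN_FOCUS
    if hi < focus_near or lo > focus_far:
        return FocusClass.OUT_OF_FOCUS
    return FocusClass.MIXED
```

Only tiles classified as `MIXED` would then need the long fragment shader; the other two classes map to the cheap paths listed above.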

Page 10: Deferred Lighting and Post-Processing on PS3

Depth Of Field Classification Results

• Post process depth buffer

• Classify by min/max depth

• Green: fully in focus
• Blue: fully out of focus
• Red: neither fully in nor fully out of focus

Page 11: Deferred Lighting and Post-Processing on PS3

Depth Of Field Pre-process results

• Pre-process only on SPU, blur operations on GPU
  – Goal: minimise overall frame time and latency

• Large blur w.r.t. depth
• 15 ms+ on GPU alone
• 1.5-2 ms on SPU + 3 ms on GPU

Page 12: Deferred Lighting and Post-Processing on PS3

Screen Tile Classification

• Categorise the screen using the range of depth values within a tile

• Powerful technique with many applications
  – Full screen effect optimization - DOF, SSAO..
  – Soft particles
  – Affecting lights
  – Occluder information
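
As a rough sketch of the first step, the depth range per tile can be gathered in one pass over the depth buffer. The function name, the row-major list layout and the 32-pixel tile size are assumptions for illustration; the slides do not state a tile size.

```python
def tile_depth_ranges(depth, width, height, tile=32):
    """Return {(tile_x, tile_y): (min_depth, max_depth)}.

    depth is a row-major list of width * height values; tile=32 is an
    assumed tile size, not one given in the talk.
    """
    ranges = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            lo, hi = float("inf"), float("-inf")
            for y in range(ty, min(ty + tile, height)):
                start = y * width + tx
                row = depth[start:start + min(tile, width - tx)]
                lo = min(lo, min(row))
                hi = max(hi, max(row))
            ranges[(tx // tile, ty // tile)] = (lo, hi)
    return ranges
```

Each application listed above would then consume these ranges in its own way, for example to pick a fast path or to reject lights for a tile.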

Page 13: Deferred Lighting and Post-Processing on PS3

Screen Space Ambient Occlusion (SSAO)

• Generate an ambient occlusion approximation using the depth buffer alone

• Perform a large kernel-based series of depth comparisons and sum the results

• Downsample output to ½ size for performance
  – Output normals for bilateral upsampling
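
A minimal depth-only occlusion kernel might look like the sketch below. It is not the kernel from the talk: the square kernel, the `bias` and `max_range` thresholds and the function name are all assumptions, and the half-size output and bilateral upsampling steps are left out.

```python
def ssao(depth, width, height, radius=2, bias=0.02, max_range=1.0):
    """Return an ambient term per pixel (1.0 = unoccluded, 0.0 = fully occluded).

    A neighbour counts as an occluder when it is nearer to the camera than
    the centre pixel by more than bias but less than max_range.
    """
    out = [1.0] * (width * height)
    for y in range(height):
        for x in range(width):
            centre = depth[y * width + x]
            occluded = total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sx, sy = x + dx, y + dy
                    if (dx or dy) and 0 <= sx < width and 0 <= sy < height:
                        total += 1
                        diff = centre - depth[sy * width + sx]
                        if bias < diff < max_range:
                            occluded += 1
            if total:
                out[y * width + x] = 1.0 - occluded / total
    return out
```

The inner loops contain no data-dependent memory access beyond the local neighbourhood, which is the property that makes this kind of kernel a reasonable fit for processing in tiles.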

Page 14: Deferred Lighting and Post-Processing on PS3

SPU Screen Space Ambient Occlusion Results

• GPU version: 10 ms+
• SPU version: 6 ms on 2 SPUs
• Used in “Donkey Trader” PhyreEngine game template

Page 15: Deferred Lighting and Post-Processing on PS3

Deferred Rendering

Page 16: Deferred Lighting and Post-Processing on PS3

Deferred Shading Overview

• Rasterise geometry information to multiple “GBuffers” (geometry buffers)

• Apply lighting and shading in a post process

Page 17: Deferred Lighting and Post-Processing on PS3

Demo

Page 18: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU

• The SPU can handle the deferred lighting process

• The GPU renders the geometry to GBuffers

• SPU and GPU execute in parallel
  – Total time: max( geometry, lighting )

Page 19: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU: Implementation (1)

• Process each pixel once

• Work out which lights affect each pixel

• Apply the N affecting lights in a loop

• Process the screen in tiles

• Use classification techniques per tile to optimise

Page 20: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU: Implementation (2)

• Calculate affecting lights per tile
  – Build a frustum around the tile using the min and max depth values in that tile
  – Perform frustum check with each light’s bounding volume
  – Compare light direction with tile average normal value

• Choose fast paths based on tile contents
  – No lights affect the tile? Use fast path
  – Check material values to see if any pixels are marked as lit
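
One way the per-tile frustum check could look is sketched below, assuming point lights bounded by spheres, a view space with the camera at the origin looking down +z, and tile extents given as x/z and y/z slopes. These conventions and both function names are assumptions, and the comparison against the tile's average normal is omitted.

```python
import math

def sphere_touches_tile(centre, radius, x0, x1, y0, y1, z_min, z_max):
    """Conservative test of a light's bounding sphere against a tile frustum.

    x0..x1 and y0..y1 are the tile's extents as view-space slopes (x/z, y/z);
    z_min..z_max is the depth range found in the tile.
    """
    cx, cy, cz = centre
    # Near/far planes come from the tile's min and max depth.
    if cz + radius < z_min or cz - radius > z_max:
        return False
    # Four side planes through the camera origin, normals pointing inwards.
    planes = [(1.0, 0.0, -x0), (-1.0, 0.0, x1), (0.0, 1.0, -y0), (0.0, -1.0, y1)]
    for nx, ny, nz in planes:
        dist = (nx * cx + ny * cy + nz * cz) / math.sqrt(nx * nx + ny * ny + nz * nz)
        if dist < -radius:
            return False
    return True

def lights_for_tile(lights, x0, x1, y0, y1, z_min, z_max):
    """Return indices of the (centre, radius) lights that may affect the tile."""
    return [i for i, (c, r) in enumerate(lights)
            if sphere_touches_tile(c, r, x0, x1, y0, y1, z_min, z_max)]
```

An empty result corresponds to the "no lights affect the tile" fast path. The plane test can report a few lights near frustum corners that do not actually touch the tile, which costs time but never drops a light.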

Page 21: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU: Implementation (3)

• Choose whether to process MSAA per tile
  – If no sample pair values differ, light only one sample from the pair, otherwise light both samples separately
  – Typically quite few tiles need both MSAA samples lit

[Figure: Tiles requiring MSAA]
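
The per-tile MSAA decision can be sketched as follows for 2x MSAA. `tile_needs_msaa`, `light_tile` and the `shade` callback are hypothetical names; each sample is assumed to be a comparable value standing in for that sample's GBuffer contents.

```python
def tile_needs_msaa(sample_pairs):
    """True if any pixel in the tile has two differing MSAA samples."""
    return any(a != b for a, b in sample_pairs)

def light_tile(sample_pairs, shade):
    """Light a tile of 2x MSAA sample pairs, shading once per pixel if possible.

    shade is an assumed callback that lights a single sample.
    """
    if tile_needs_msaa(sample_pairs):
        # Slow path: some pair differs, so light both samples of every pixel.
        return [(shade(a), shade(b)) for a, b in sample_pairs]
    # Fast path: every pair is identical, so light one sample and duplicate it.
    lit = [shade(a) for a, _ in sample_pairs]
    return [(v, v) for v in lit]
```

The fast path halves the number of `shade` calls for the tile, which is where the saving comes from when only a few tiles contain edges.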

Page 22: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU: Results

• 3 shadow casting lights, 100 point lights

• 2x MSAA, 720p
  – Lighting performed per sample

• Apply tone mapping on SPU
  – Virtually free

• Performance: > 60 fps, 3 SPUs for 11 ms each
  – No MSAA: 2 SPUs for 11 ms

Page 23: Deferred Lighting and Post-Processing on PS3

Deferred Lighting on SPU: Issues

• Potential latency
  – Must keep GPU busy while SPU process is running
  – Render something else or add a frame of latency

• Main memory requirements

• Shadows
  – Requires “random” texture access – not ideal for SPU
  – Can render shadows on GPU to a full screen buffer and use it on SPU

Page 24: Deferred Lighting and Post-Processing on PS3

Flavours of Deferred Lighting on SPU

• Full deferred render on SPU
  – Input all GBuffers, output final composited result

• Light pre-pass render on SPU
  – Input normal and depth only; calculate light result; sample in 2nd geometry pass

• Light tile classification data output?
  – SPU outputs information per tile about affecting lights
  – Do lighting calculations on GPU

Page 25: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting

Page 26: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting

• Also known as “god rays” or “light beams”

• Simulates the effect of light illuminating dust particles in the air

• Numerous fakes exist
  – Artist-placed geometry
  – Artist-placed particles

• Better: generate using the shadow map
  – Works in a “general case”

Page 27: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting

• Ray march through the shadow map
  – Trace one ray per pixel in screen space
  – Sample the depth buffer to determine the end of the ray

• Sample the shadow map at N points along the ray
  – N ~= 50
  – Attenuate and sum up the number of samples that passed

• Blur and add noise
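
The march for a single pixel can be sketched as below. The slides do not give the attenuation model or the light-space transform, so the exponential `falloff` and the `to_light` callback are assumptions, as are the function name and the unfiltered nearest-texel lookup. The blur and noise steps are not shown.

```python
import math

def march_ray(start, end, shadow_map, size, to_light, steps=50, falloff=0.0):
    """Return the attenuated fraction of ray samples that are lit (0..1).

    start is the eye position and end is the position reconstructed from the
    depth buffer. to_light maps a point to (u, v, depth) in the light's
    shadow-map space, with u and v in [0, 1).
    """
    lit = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        point = tuple(s + (e - s) * t for s, e in zip(start, end))
        u, v, light_depth = to_light(point)
        # Nearest-texel lookup, clamped to the map edges (no filtering).
        x = min(max(int(u * size), 0), size - 1)
        y = min(max(int(v * size), 0), size - 1)
        if light_depth <= shadow_map[y * size + x]:
            lit += math.exp(-falloff * t)
    return lit / steps
```

With `steps=50` this performs 50 shadow-map reads per pixel at positions that depend on the ray, which is the random-access pattern the following slides deal with.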

Page 28: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting

• Effect is a bit too slow to be practical on GPU: ~5ms

• Do it on SPU instead

• Parallelises with GPU easily
  – Result needed late in the render at compositing stage
  – Only needs depth and shadow map inputs

• Problem: must randomly sample from the shadow map

Page 29: Deferred Lighting and Post-Processing on PS3

Texture sampling on SPU

• “Random access” texture sampling is bad for SPU
• It’s bad for GPU, too, but sometimes you just have to do it
• GPU:
  – Fast access from texture cache; cache miss is slow
  – Dedicated hardware handles lookups, filtering and wrapping

• SPU:
  – Fast access from “texture cache” (SPU local memory)
  – Slow access on cache miss (DMA from main memory)
  – Cache lookups slow (no dedicated hardware)
  – Must manually handle filtering and wrapping (again, slow)

Page 30: Deferred Lighting and Post-Processing on PS3

Texture sampling on SPU

• Either:
  – Make the texture entirely fit in SPU local memory
  – Problem solved!
  – Still inefficient: random accesses reduce register parallelism

• Or:
  – Write a very good software cache
  – Locate potential cache misses early - long before you need the values
  – Avoid branches in sampling code

Page 31: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting on SPU

• Volumetric light result will be blurred
  – Don’t need full shadow map accuracy
  – No filtering on texture samples needed

• Downsample shadow map from 1024x1024, 32 bit to 256x256, 16 bit
  – 128 KB – fits in SPU local memory

• Fast enough to sample on SPU
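
The size reduction can be sketched as a 4x4 block reduction followed by quantisation to 16 bits. The slides do not say how each block is reduced, so taking the farthest depth in the block is an assumption, as are the function name and the depth range of 0 to 1.

```python
from array import array

def downsample_shadow_map(src, src_size, factor=4):
    """Reduce a square float depth map by `factor` and quantise to 16 bits.

    src is a row-major list of depths in [0, 1]. Keeping the farthest depth
    per block is an assumed choice, not one stated in the talk.
    """
    dst_size = src_size // factor
    dst = array("H", [0] * (dst_size * dst_size))  # unsigned 16-bit texels
    for y in range(dst_size):
        for x in range(dst_size):
            farthest = max(
                src[(y * factor + j) * src_size + x * factor + i]
                for j in range(factor) for i in range(factor))
            clamped = min(max(farthest, 0.0), 1.0)
            dst[y * dst_size + x] = int(clamped * 65535 + 0.5)
    return dst
```

For a 1024x1024 source the result has 256 x 256 texels at 2 bytes each, which is 131,072 bytes or 128 KB, compared with 4 MB for the 32-bit original.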

Page 32: Deferred Lighting and Post-Processing on PS3

Volumetric Lighting on SPU: Results

• Takes ~11 ms on 1 SPU

Page 33: Deferred Lighting and Post-Processing on PS3

Shadow Mapping on SPU (1)

• Needs the full-size shadow map
  – 1024x1024x32 bit == 4 MB: won’t fit in SPU local memory
  – We’ll have to write that “very good software cache”, then

• Pre-process the shadow map on SPU
  – Calculate min and max depth for each tile
  – Store in a low resolution depth hierarchy map
  – Output high resolution shadow map as cache tiles

Page 34: Deferred Lighting and Post-Processing on PS3

Shadow Mapping on SPU (2)

• Software cache with 32 entries
  – Each entry is a shadow map tile
  – Branchless determination of cache entry index for tile index

• Locate cache misses early
  – While detiling depth data – work out required shadow tiles
  – Pull in all cache-missed tiles

• Sample shadow map during lighting calculations
  – All required shadow tiles are now definitely in cache – lookup is branchless

• It’s quite slow
  – Locate tile in cache per pixel
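
One plausible shape for such a cache is a direct-mapped table, sketched below. The slides only say that the entry index is found without branching; mapping a tile to a slot with a bit mask is an assumption, as are the class and method names. `fetch_tile` stands in for a DMA transfer from main memory.

```python
CACHE_ENTRIES = 32  # power of two, so the slot can be found with a mask

class ShadowTileCache:
    """Direct-mapped cache of shadow map tiles (illustrative sketch only)."""

    def __init__(self, fetch_tile):
        self.fetch_tile = fetch_tile          # assumed stand-in for a DMA read
        self.tags = [-1] * CACHE_ENTRIES      # tile index held by each slot
        self.tiles = [None] * CACHE_ENTRIES

    @staticmethod
    def slot(tile_index):
        # Branchless mapping from tile index to cache entry.
        return tile_index & (CACHE_ENTRIES - 1)

    def prefetch(self, tile_indices):
        """Early pass: pull in every required tile that is not resident."""
        for t in tile_indices:
            s = self.slot(t)
            if self.tags[s] != t:
                self.tiles[s] = self.fetch_tile(t)
                self.tags[s] = t

    def lookup(self, tile_index):
        """Sampling-time lookup: no miss check, the tile is assumed resident."""
        return self.tiles[self.slot(tile_index)]
```

This sketch does not handle two required tiles that map to the same slot; a real cache would need a policy for that case, which the slides do not describe.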

Page 35: Deferred Lighting and Post-Processing on PS3

Shadow Mapping on SPU (3)

• Optimise via special cases to win back performance

• Use the low resolution shadow tile map
  – Always in SPU local memory
  – If pixel shadow z > tile max Z : definitely in shadow
  – If pixel shadow z < tile min Z : definitely not in shadow

• Check low resolution map before triggering cache fetches

• Classify whole screen tiles as in or out of shadow
  – Don’t need to sample high resolution shadow map at all for those tiles

[Figure: Tiles requiring high resolution shadow samples]
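
The two conservative tests and the whole-tile classification can be sketched as below. The function names and the string results are hypothetical; the depth convention (larger z is farther from the light) follows the comparisons on the slide.

```python
def coarse_shadow_test(pixel_z, tile_min_z, tile_max_z):
    """Test a pixel's light-space depth against a low resolution shadow tile.

    Returns "shadowed", "lit", or "unknown" when the high resolution tile
    would have to be sampled.
    """
    if pixel_z > tile_max_z:
        return "shadowed"   # behind every occluder in the tile
    if pixel_z < tile_min_z:
        return "lit"        # in front of every occluder in the tile
    return "unknown"

def classify_screen_tile(pixel_results):
    """Collapse per-pixel results into one result for a whole screen tile."""
    results = set(pixel_results)
    if len(results) == 1:
        return results.pop()
    return "unknown"
```

Only screen tiles that come back as `"unknown"` would need high resolution shadow tiles fetched into the cache.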

Page 36: Deferred Lighting and Post-Processing on PS3

Conclusion

Page 37: Deferred Lighting and Post-Processing on PS3

Conclusion

• New additions to your toolbox:
  – Tile-based classification techniques on SPU
  – Deferred lighting on SPU
  – Texture sampling on SPU

• Rendering is no longer just a GPU problem
  – Use general purpose nature of the SPU to your advantage

• Rethink fragment processing optimisation strategies
  – Make the GPU work smarter, not harder

Page 38: Deferred Lighting and Post-Processing on PS3

Conclusion

• Some titles are already using SPU post processing
  – Killzone 2

• PhyreEngine™ is here to help
  – (If you’re a registered PS3 developer) it’s on DevNet now
  – Not just an engine: also a reference
  – Comes with full source
  – Download it, learn from it, steal bits of the code
  – Check out the PhyreEngine™ SPU Post Processing Library