Bandwidth-Efficient Rendering
Marius Bjørge, ARM
Agenda
• Efficient on-chip rendering
• Post-processing
– Bloom
– Blur filters
Efficient on-chip rendering
• Extensions
– Framebuffer fetch
– Pixel Local Storage
• Why extensions?
– Surely mobile GPUs are already bandwidth-efficient?
Framebuffer fetch
• Read the current fragment’s previous color value
• ARM also supports reading the previous depth and stencil values of the current fragment
• Useful for
– Programmable blending
– Programmable depth/stencil testing
Pixel Local Storage (PLS)
• Per-pixel storage that is persistent throughout the lifetime of the frame
– Read/write access
– Storage stays on-chip
– Storage layout is declared per fragment shader invocation and does not depend on the framebuffer format
• Useful for
– Deferred shading
– Order Independent Transparency [1]
– Volume rendering
Pixel Local Storage (PLS)
• Rendering pipeline changes slightly when PLS is enabled
– Writing to PLS bypasses blending
• Note
– Fragment order
– PLS and color share the same memory location
Pixel Local Storage (PLS)
(Diagram: one Pixel Local Storage block used across two phases of the frame.)
• Opaque phase: Fill gbuffer, Light accumulation
• OIT phase: Init OIT, Transparent rendering, Resolve
• PLS layouts shown: R32UI R32UI R32UI R32UI and RGB10A2 RGB10A2 RG16F RG16F
• Other label: Color
• Note on the diagram: “At this point we change the layout of the PLS”
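As a small worked check of why the layout can be swapped in place, the sketch below adds up the bits per pixel of the two layouts that appear on this slide. The per-format sizes follow from the format names; the variable names `layout_a` and `layout_b` are placeholders, and which phase each layout belongs to is not assumed here.

```python
# Bits per pixel for each format named on the slide.
FORMAT_BITS = {
    "R32UI": 32,    # one 32-bit unsigned integer
    "RGB10A2": 32,  # 10 + 10 + 10 + 2 bits
    "RG16F": 32,    # two 16-bit floats
}


def layout_bits(layout):
    """Total bits per pixel occupied by a list of format names."""
    return sum(FORMAT_BITS[name] for name in layout)


# Placeholder names: the slide lists both layouts for the same storage.
layout_a = ["R32UI", "R32UI", "R32UI", "R32UI"]
layout_b = ["RGB10A2", "RGB10A2", "RG16F", "RG16F"]

print(layout_bits(layout_a), layout_bits(layout_b))  # prints: 128 128
```

Both layouts fill the same 128 bits per pixel, which is consistent with reinterpreting the same on-chip storage under a different declaration.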
Post-processing
• High-end mobile devices typically have small displays with massive resolutions
• Rendering at native resolution is often out of the question, especially if you add post-processing to the mix
• Solution: mixed resolution rendering
– Go as low as you can without sacrificing quality, and then upscale
Mobile post-processing
On-chip
• Color Grading
• Tonemapping
Off-chip
• Anti-aliasing
• Bloom
• Depth of Field
• Screen Space Ambient Occlusion
• Screen Space Reflections
Bloom
• Doesn’t have to be physically correct
• Wide + thin
• Stages: Threshold, Blur, Composite
Blur
• What makes a good blur filter?
• Goal:
– High quality
– Stable
– High performance
Box blur
• 5x5 box blur = 25 samples
• Separate the blurs into a horizontal and a vertical pass
– 5 + 5 = 10 samples
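The sample-count saving can be checked with a minimal CPU sketch. It assumes a grayscale image stored as a list of rows and clamp-to-edge addressing; the function names are illustrative and not taken from the talk. A direct 5x5 box blur reads 25 samples per pixel, while a horizontal pass followed by a vertical pass reads 5 + 5 = 10 and produces the same result.

```python
def clamp(v, lo, hi):
    return min(max(v, lo), hi)


def box_blur_2d(img, radius):
    """Direct 2D box blur: (2 * radius + 1) ** 2 samples per pixel."""
    h, w = len(img), len(img[0])
    n = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    total += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
            out[y][x] = total / n
    return out


def box_blur_1d(img, radius, horizontal):
    """One separable pass: 2 * radius + 1 samples per pixel."""
    h, w = len(img), len(img[0])
    n = 2 * radius + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for d in range(-radius, radius + 1):
                if horizontal:
                    total += img[y][clamp(x + d, 0, w - 1)]
                else:
                    total += img[clamp(y + d, 0, h - 1)][x]
            out[y][x] = total / n
    return out


def box_blur_separable(img, radius):
    """Horizontal pass followed by a vertical pass."""
    return box_blur_1d(box_blur_1d(img, radius, True), radius, False)
```

On a GPU the two passes would be two draw calls with an intermediate render target, so the saving in samples is traded against one extra write and read of that target.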
Gaussian blur
• Convolve a Gaussian function over the image
• Separable just like the box filter
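The separability claim follows directly from the form of the 2D Gaussian. The notation below (standard deviation \sigma, 1D kernel g) is introduced here for illustration and does not appear on the slides.

```latex
G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)
        = g(x)\, g(y),
\qquad
g(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{t^{2}}{2\sigma^{2}}\right)
```

Because the kernel factors into a function of x times a function of y, convolving with G is equivalent to a horizontal pass with g followed by a vertical pass with g.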
Linear sampling optimization [2]
• Reduce the number of texture lookups by exploiting the HW texture unit
– Modify sample offsets and Gaussian weights
• Get 9x9 at a similar cost to 5x5
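A minimal 1D sketch of the idea, under stated assumptions: the kernel weights come from a binomial row (one common way to approximate a Gaussian), the texture unit is modeled by a linearly interpolated fetch, and all function names are hypothetical. Two neighbouring taps with weights w1 and w2 are replaced by one fetch with weight w1 + w2 placed at their weighted average offset, so a 9-tap kernel needs only 5 fetches per pass.

```python
import math


def binomial_weights(taps):
    """Normalized binomial row, used here as a Gaussian approximation."""
    row = [math.comb(taps - 1, k) for k in range(taps)]
    total = sum(row)
    return [c / total for c in row]


def merge_pairs(weights):
    """Turn an odd-length symmetric kernel into (offset, weight) fetches."""
    center = len(weights) // 2
    taps = [(0.0, weights[center])]
    for i in range(1, center + 1, 2):
        w1 = weights[center + i]
        w2 = weights[center + i + 1] if i + 1 <= center else 0.0
        w = w1 + w2
        offset = (i * w1 + (i + 1) * w2) / w  # weighted average position
        taps.append((offset, w))
        taps.append((-offset, w))
    return taps


def sample_linear(signal, pos):
    """Linearly interpolated fetch with clamp-to-edge (1D texture unit model)."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i0 = int(math.floor(pos))
    i1 = min(i0 + 1, len(signal) - 1)
    t = pos - i0
    return signal[i0] * (1.0 - t) + signal[i1] * t


def blur_direct(signal, x, weights):
    """One output value using one fetch per kernel tap."""
    c = len(weights) // 2
    last = len(signal) - 1
    return sum(w * signal[min(max(x + k - c, 0), last)] for k, w in enumerate(weights))


def blur_merged(signal, x, taps):
    """Same output value using the merged, linearly filtered fetches."""
    return sum(w * sample_linear(signal, x + o) for o, w in taps)
```

For example, `merge_pairs(binomial_weights(9))` yields 5 fetches, and away from the image border `blur_merged` matches `blur_direct` up to floating-point rounding.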
Mixing resolutions
(Diagram: a chain of downsample and upsample passes, labeled Down, Up, Down, Up.)
Mixing resolutions
• Gets increasingly complicated when using separable kernels
(Diagram: Down, Horz, Vert, Down, Horz, Vert, Up, Horz, Vert, Up.)
Kawase blur [3]
“Dual filtering”
• Downsample filter
• Upsample filter
(Figure: sample weights 1/2, 1/8, 1/8, 1/8, 1/8.)
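The following is a rough CPU sketch of a downsample pass, not the author's shader. It assumes that the weights listed on this slide (1/2 once and 1/8 four times) belong to the downsample filter, with 1/2 on the center sample and 1/8 on each of four diagonal samples; the one-source-pixel diagonal offset, the clamp-to-edge bilinear fetch, and the function names are also assumptions made for illustration.

```python
import math


def sample_bilinear(img, x, y):
    """Bilinear fetch at continuous pixel-index coordinates, clamp-to-edge."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = img[y0][x0] * (1.0 - tx) + img[y0][x1] * tx
    bottom = img[y1][x0] * (1.0 - tx) + img[y1][x1] * tx
    return top * (1.0 - ty) + bottom * ty


def dual_downsample(img):
    """Halve the resolution using 5 bilinear fetches per output pixel."""
    h, w = len(img) // 2, len(img[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Center of the output pixel, expressed in source pixel indices.
            cx, cy = 2 * x + 0.5, 2 * y + 0.5
            total = 0.5 * sample_bilinear(img, cx, cy)  # weight 1/2
            for dx, dy in ((-1, -1), (1, -1), (-1, 1), (1, 1)):
                total += 0.125 * sample_bilinear(img, cx + dx, cy + dy)  # weight 1/8
            out[y][x] = total
    return out
```

The weights sum to 1, so a constant image stays constant. Each fetch lands between source pixels, so the bilinear filter averages four texels per fetch and the pass touches a wider footprint than its five lookups suggest.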
Comparing filters
Comparison setup
• 97x97 blur
• Gaussian used as reference
• Kawase
– First downsample to 1/16th resolution
– Set up with passes at distances 0, 1, 2, 3, 4, 4, 5, 6, 7
• “Dual filtering” set up with 8 passes
• Naïve method which relies on glGenerateMipmap
Panels: Input, Reference, Dual, Kawase
PSNR (as listed for the Dual and Kawase panels): 49.78 dB, 50.02 dB
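For readers unfamiliar with the metric, here is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE). It assumes 8-bit grayscale images stored as lists of rows; the talk does not state how its figures were computed, so this is only the textbook formula and the function name is illustrative.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    ]
    mse = sum(squared) / len(squared)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the filtered image is closer to the reference.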
Stability comparison
Panels (repeated on three slides): Input, Reference, Dual, Kawase, Naive
Performance comparison
Performance (ms), tested on a Mali-T760 MP8

| Filter | 1080p | 720p |
|---|---|---|
| Gaussian | 41.9 | 19.2 |
| Box | 21.4 | 9.9 |
| 5x5 gaussian | 23.5 | 7.0 |
| Kawase | 4.5 | 2.8 |
| Dual | 2.8 | 1.7 |
Bandwidth
(Bar chart: Reads and Writes, 0% to 100%, for Linear sampling, 5x5 gaussian reduction, Kawase and Dual. Data labels as extracted: 98%, 10%, 12%, 5%, 2%, 3%, 4%, 2%.)
Cache utilization
(Bar chart: cache hit rate, 0% to 80%. Linear sampling 2%, 5x5 gaussian reduction 70%, Kawase 63%, Dual 71%.)
Summary
• On-chip rendering
– Please use the extensions
• Bloom
– Multi-pass mixed resolution
– “Dual filter” blur
• Next steps
– Work on getting on-chip rendering into future core APIs
– Look into alternative data flows for doing blurs
Thanks!
• Questions?
• References
1. Efficient Rendering with Tile Local Storage [Siggraph 2014]
2. http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/
3. Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L [GDC 2003]