Visibility Optimization for Games

Post on 13-Jan-2015





A presentation held by Umbra Software lead programmer Sampo Lappalainen at China Game Developer Conference 2011.


1. Visibility Optimization for Games

  • Sampo Lappalainen, Lead Programmer, Umbra Software Ltd.

2. Introduction

  • Background in graphics programming
  • Hybrid Graphics, NVIDIA, Umbra Software
  • With Umbra since 2008
  • Graphics middleware for console and PC games
  • Emphasis on visibility

3. Roadmap

  • Motivation
  • Theory
  • Practice
  • Other applications
  • Demo


4. Motivation

  • Why is visibility optimization important?

5. Game World

6. Our Villain

7. Our Hero

8. Screen Shot

9. Game Worlds

  • Game developers want to make impressive game worlds
  • Hardware sets limits on what can and can't be done.
  • Game developers need to push the hardware to its limits.

10. Visibility Optimization

  • The most effective way to gain performance in games.
  • Two basic ways to do visibility optimization:
    • art and level design
    • technology
  • Games use a mix of both.

11. Visibility Optimization by Level Design

  • Artists design game worlds so that performance is acceptable.
  • Can be done in numerous ways, e.g.:
    • limiting view distance
    • limiting polygon or object count
    • modeling portals and cells

12-13. Visibility Optimization by Level Design

  • Time consuming and usually boring work.
  • Sets huge limits on what can and cannot be done.
  • May lead to monotonous level design.
  • Manual and non-recurring work.

14-16. Visibility Optimization by Technology

  • Gains:
    • No time wasted on rendering objects that don't contribute to the output image (no state changes, no draw calls, etc.).
    • AI, physics, game logic, etc. can be done at lower accuracy (or skipped altogether) for hidden objects.


17. Theory

  • Walkthrough of the key concepts

18. Terminology

  • Culling: removing hidden objects from rendering
  • Target: an object that can be hidden by others
  • Occluder: an object that blocks visibility
  • Rendering artifact: an unintended glitch in the output image

19. Metrics for comparison

  • GPU cost
  • CPU cost
  • Overall frame time
  • Memory usage
  • Precomputation time
  • Manual work
  • Culling power

20. Backface culling

  • Taken care of by the HW
  • Culling entire triangles based on their winding
  • No need to render the insides of an object
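
The winding test above can be sketched on the CPU as a signed-area check on the projected triangle. This is only an illustration of what the hardware does; the `Vec2` type, the function names, and the choice of counter-clockwise as front-facing are assumptions, not part of the talk.

```cpp
// Screen-space vertex after projection (hypothetical type for this sketch).
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive when the vertices run counter-clockwise with the y axis pointing up.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Assumption: counter-clockwise triangles are front-facing.
// Degenerate (zero-area) triangles are treated as culled.
bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea2(a, b, c) <= 0.0f;
}
```

Swapping any two vertices flips the sign, which is why a closed mesh with consistent winding never needs its inside faces drawn.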

21. Depth buffering

  • Taken care of by the HW
  • A two-dimensional buffer that stores a z-value for each screen pixel
  • Before processing shaders for a pixel to be rendered, test the z-value.
  • Allows drawing of unsorted geometry; however, sorting front to back still greatly improves performance

22. Hierarchical depth buffering

  • Replace the depth buffer with a depth pyramid
    • Bottom of the pyramid: full-resolution depth buffer
    • Higher levels: lower-resolution depth buffers where a single pixel represents the maximum z-value in a group of pixels in the level below
  • Hierarchically rasterize the polygon starting from the highest level
    • If the polygon is farther than the recorded value, early exit
    • If the polygon is closer, hierarchically test the lower levels
    • If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid
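
A minimal sketch of the pyramid described above, under several simplifying assumptions: the buffer is square with a power-of-two side, smaller z means closer, and the test is done for a single sample rather than a whole polygon. `DepthLevel`, `buildPyramid`, and `isHidden` are illustrative names, and propagating new depths back up the pyramid is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// One pyramid level: a square depth image (hypothetical layout for this sketch).
struct DepthLevel {
    int size;              // side length in pixels
    std::vector<float> z;  // row-major depths, smaller = closer

    float at(int x, int y) const {
        return z[static_cast<std::size_t>(y) * size + x];
    }
};

// levels[0] is the full-resolution buffer; each higher level stores the
// maximum (farthest) depth of the 2x2 block below it.
std::vector<DepthLevel> buildPyramid(const DepthLevel& base) {
    std::vector<DepthLevel> levels{base};
    while (levels.back().size > 1) {
        const int half = levels.back().size / 2;
        DepthLevel upper{half, std::vector<float>(static_cast<std::size_t>(half) * half)};
        const DepthLevel& lower = levels.back();
        for (int y = 0; y < half; ++y)
            for (int x = 0; x < half; ++x)
                upper.z[static_cast<std::size_t>(y) * half + x] = std::max(
                    {lower.at(2 * x, 2 * y), lower.at(2 * x + 1, 2 * y),
                     lower.at(2 * x, 2 * y + 1), lower.at(2 * x + 1, 2 * y + 1)});
        levels.push_back(std::move(upper));
    }
    return levels;
}

// True if a sample at full-resolution pixel (x, y) with depth z is hidden.
// Starts at the coarsest level so that distant samples exit early.
bool isHidden(const std::vector<DepthLevel>& levels, int x, int y, float z) {
    for (int l = static_cast<int>(levels.size()) - 1; l >= 0; --l) {
        if (z >= levels[l].at(x >> l, y >> l)) return true;  // farther than everything here
    }
    return false;
}
```

Because each coarse pixel holds the farthest depth beneath it, a sample that fails at a coarse level is guaranteed to fail at every finer one, so the early exit never rejects anything visible.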

23. Spatial hierarchies

  • Enable culling large portions of the game world with a single quick test
  • Dynamic objects can be moved within the hierarchy at runtime
  • Examples: BSP tree, kd-tree

24. Spatial hierarchies

25. View frustum culling

  • Culling objects that are outside the camera view cone
  • Test using object bounds
  • Tremendous speed-up when using a hierarchy
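
One common way to test object bounds against the frustum is a plane versus axis-aligned box check, sketched below. The types `Vec3`, `Plane`, and `Aabb`, the inward-pointing plane convention, and the function names are assumptions made for this example.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane with inward-pointing normal: a point p is inside when dot(n, p) + d >= 0.
struct Plane { Vec3 n; float d; };

// Axis-aligned bounding box given by its minimum and maximum corners.
struct Aabb { Vec3 lo, hi; };

// True if the whole box lies on the outside of the plane.
bool outsidePlane(const Plane& p, const Aabb& b) {
    // Pick the box corner farthest along the plane normal.
    const Vec3 v{p.n.x >= 0.0f ? b.hi.x : b.lo.x,
                 p.n.y >= 0.0f ? b.hi.y : b.lo.y,
                 p.n.z >= 0.0f ? b.hi.z : b.lo.z};
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f;
}

// Conservative test: a box is culled only if it is fully outside some plane.
bool isCulled(const std::array<Plane, 6>& frustum, const Aabb& b) {
    for (const Plane& p : frustum)
        if (outsidePlane(p, b)) return true;
    return false;
}
```

Applied to the bounds of a hierarchy node, the same test rejects every object beneath that node with one call, which is where the speed-up comes from.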

26-27. View Frustum Culling

28. Potentially Visible Set - PVS

  • A data structure that defines from-region visibility for a scene
  • Computed in a pre-process
  • Scene is divided into cells
  • Compute a bit matrix that lists all the visible objects for each cell
  • At runtime, visibility is a simple matrix lookup
  • How to find a good sub-division for a scene?
  • Cannot handle dynamic occluders
  • Target volume: extension to handle dynamic targets
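
The bit matrix and its runtime lookup might look like the sketch below. The class name `PvsMatrix` and its methods are hypothetical, and the precomputation that decides which bits to set is not shown.

```cpp
#include <cstddef>
#include <vector>

// One bit per (cell, object) pair; filled in by an offline pre-process.
class PvsMatrix {
public:
    PvsMatrix(std::size_t cellCount, std::size_t objectCount)
        : objectCount_(objectCount), bits_(cellCount * objectCount, false) {}

    // Called by the (not shown) precomputation step.
    void markVisible(std::size_t cell, std::size_t object) {
        bits_[cell * objectCount_ + object] = true;
    }

    // Runtime query: a single bit lookup.
    bool isVisible(std::size_t cell, std::size_t object) const {
        return bits_[cell * objectCount_ + object];
    }

    // All objects potentially visible from anywhere inside the given cell.
    std::vector<std::size_t> visibleFrom(std::size_t cell) const {
        std::vector<std::size_t> result;
        for (std::size_t o = 0; o < objectCount_; ++o)
            if (isVisible(cell, o)) result.push_back(o);
        return result;
    }

private:
    std::size_t objectCount_;
    std::vector<bool> bits_;  // bit-packed by the standard library
};
```

Memory grows with cells times objects, which is one reason the choice of subdivision matters.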

29. Portals

  • Place portals in the scene that connect the cells to form a portal graph
  • At runtime, find the portals of the current cell that are in the frustum
  • Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal
  • Same limitations with dynamic objects as with PVS systems
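
The traversal above can be sketched with a heavily simplified model: each portal is assumed to be already projected to a one-dimensional screen interval, and looking through a portal narrows the visible interval to the overlap. `Portal`, `Cell`, and both functions are hypothetical names; a real implementation would clip the 3D frustum or a screen rectangle instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Portal leading to another cell, with an assumed pre-projected screen extent [lo, hi].
struct Portal { std::size_t toCell; float lo, hi; };
struct Cell { std::vector<Portal> portals; };

// Depth-first walk of the portal graph, shrinking the view window at each portal.
void collectVisible(const std::vector<Cell>& cells, std::size_t cell, float lo, float hi,
                    std::vector<bool>& onPath, std::set<std::size_t>& visible) {
    visible.insert(cell);
    onPath[cell] = true;  // guard against walking in circles
    for (const Portal& p : cells[cell].portals) {
        const float newLo = std::max(lo, p.lo);
        const float newHi = std::min(hi, p.hi);
        if (newLo >= newHi || onPath[p.toCell]) continue;  // portal not seen through the window
        collectVisible(cells, p.toCell, newLo, newHi, onPath, visible);
    }
    onPath[cell] = false;
}

// Cells visible from the camera's cell, with the full screen spanning [-1, 1].
std::set<std::size_t> visibleCells(const std::vector<Cell>& cells, std::size_t cameraCell) {
    std::vector<bool> onPath(cells.size(), false);
    std::set<std::size_t> visible;
    collectVisible(cells, cameraCell, -1.0f, 1.0f, onPath, visible);
    return visible;
}
```

Only objects in the returned cells need further testing; everything else is culled without being touched.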

30. Rasterization-based

  • Render occluder geometry into a software coverage buffer
  • Test visibility using test geometry
  • Use temporal coherence to determine the initial set to be rendered
  • Handles both dynamic targets and occluders as long as they have occluder geometry
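
A minimal sketch of a software coverage buffer, assuming both occluder geometry and test geometry are reduced to screen-space rectangles at a single depth (smaller is closer). `DepthRect` and `CoverageBuffer` are illustrative names, and the temporal-coherence step is not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Screen-space rectangle [x0, x1) x [y0, y1) at one depth (assumed simplification).
struct DepthRect { int x0, y0, x1, y1; float depth; };

class CoverageBuffer {
public:
    CoverageBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * height,
                 std::numeric_limits<float>::infinity()) {}

    // Rasterize occluder geometry, keeping the nearest depth per pixel.
    void drawOccluder(const DepthRect& r) {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                depth_[index(x, y)] = std::min(depth_[index(x, y)], r.depth);
    }

    // Test geometry is visible if any covered pixel is nearer than the stored depth.
    bool isVisible(const DepthRect& r) const {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                if (r.depth < depth_[index(x, y)]) return true;
        return false;
    }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_, height_;
    std::vector<float> depth_;
};
```

The buffer can be much smaller than the frame buffer, as long as occluders are rasterized conservatively so that nothing visible is reported as hidden.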

31-38. Testing from coverage buffer

39. Occlusion Queries

  • Supported by GPUs since 2001.
  • GPU answers the question: how many pixels would have been visible if this object had been rendered?
  • Instead of rasterizing your own depth buffer, use the GPU depth buffer
  • Normally the query is done using bounding volumes (effective but not necessary).
  • No need for artist generated occluder geometry
  • GPU-CPU synchronization needed

40. Occlusion Queries

  • Determine the set of visible objects against the actual rendered geometry:
    • all pixels can be used as occluding material!

41. Using Occlusion Queries

  • Occlusion queries are a really powerful tool for visibility optimization.
  • Like all other features of the GPU, occlusion queries can be used ineffectively.
  • Special tricks are needed to get the most out of occlusion queries.

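42. Issuing Occlusion Queries

    // Test the object's bounds against the depth buffer without changing the image.
    disableColorWrite();
    disableDepthWrite();
    startQueryCounter();
    renderObjectBounds();
    stopQueryCounter();
    enableColorWrite();
    enableDepthWrite();
    // Draw the real object only if some pixels of its bounds passed.
    if (query->getResult() > 0)
        renderObject();

43. CPU-GPU synchronization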

  • With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (Parallel processing).
  • With occlusion queries the CPU needs to get query results back to be able to know if the object was visible or not.
  • The CPU needs to wait for the query results to be available.
  • No parallel processing (which is really bad).

44-47. Issuing Occlusion Queries

  • Fortunately GPU design has a solution for this problem.
  • GPUs can store multiple occlusion query results.
  • Occlusion queries can be batched.
  • Some GPUs have a limit on how many query results can be stored.

48. Batching Occlusion Queries

    disableColorWrite();
    disableDepthWrite();
    // Issue all queries first.
    for (each query) {
        startQueryCounter();
        renderObjectBounds();
        stopQueryCounter();
    }
    enableDepthWrite();
    enableColorWrite();
    // Then read the results back and draw what passed.
    for (each query) {
        if (query->getResult() > 0)
            renderObject();
    }

49. Batching Occlusion Queries

50. Latent Occlusion Queries

  • Some stalls may be introduced between frames.
  • The last query result needs to be read back before continuing.
  • Avoid GPU stalls by using the query results from the previous frame.
  • Read back the query results at the beginning of each frame.
  • Sounds like a perfect solution?

51-52. Latent Occlusion Queries

  • There are downsides to this.
  • Visible popping artifacts when objects become visible.
  • If the camera is moving slowly and FPS is good, no problem.
  • When multiple objects become visible, FPS typically drops (there's a lot more to render).
  • For example, when a door is opened.
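
The one-frame delay behind the popping can be shown with a small CPU-only simulation. Nothing here talks to a real GPU: `LatentObject`, `renderFrame`, and the `pixelsThisFrame` input that stands in for query results are all assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Per-object state for latent queries (hypothetical).
struct LatentObject {
    unsigned lastFramePixels = 1;  // non-zero so that new objects are drawn once
    bool drawn = false;
};

// pixelsThisFrame[i] simulates what the GPU would count for object i this frame.
void renderFrame(std::vector<LatentObject>& objects,
                 const std::vector<unsigned>& pixelsThisFrame) {
    for (std::size_t i = 0; i < objects.size(); ++i) {
        // Decide from the result that was issued during the previous frame.
        objects[i].drawn = objects[i].lastFramePixels > 0;
        // Issue this frame's query; its result is only consumed next frame.
        objects[i].lastFramePixels = pixelsThisFrame[i];
    }
}
```

An object that was hidden in one frame and becomes visible in the next is skipped for exactly one frame before it appears, which is the popping described above.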

53. L

