Introduction to Parallel Rendering Jian Huang, CS 594, Spring 2002



Page 1

Introduction to Parallel Rendering

Jian Huang, CS 594, Spring 2002

Page 2

Parallel Rendering

• Graphics rendering is computationally intensive
• Parallel computation is a natural way to achieve higher performance
• Two levels of parallelism:
  – Functional parallelism: pipelining
  – Data parallelism: multiple results computed at the same time

Page 3

Data Parallel Algorithms

• Many taxonomies exist for categorizing parallel algorithms:
  – Image space vs. object space
  – Shared memory architecture vs. distributed memory architecture
  – MPI, OpenMP, …
• A uniform framework is needed to study and understand parallel rendering

Page 4

A Core Problem

• How to partition the work and distribute the partitions:
  – Rendering requires the data involved to travel with each work partition
  – The rendered results must be collected to form the final image
  – The result is intensive data communication
• A landmark paper: “A Sorting Classification of Parallel Rendering”, Molnar et al., IEEE CG&A ’94.

Page 5

The Rendering Process

• The rendering pipeline:
  – Geometry processing: transformation, lighting, clipping
  – Rasterization: scan-conversion, shading, visibility
• Parallel processing:

Page 6

A New Perspective

• Rendering as a sorting process:
  – Sort from object coordinates to screen coordinates
  – Use this concept to study computational and communication costs

Page 7

Sorting

• Parallel geometry processing: assign a subset of primitives or objects to each processor
• Parallel rasterization processing: assign a sub-region of the screen to each processor
• The key procedure: calculating the effect of each primitive on each pixel
• Rendering is a ‘sorting’ process from each primitive into each screen pixel
  – This sort involves redistributing data among processors
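The idea of sorting primitives to the screen can be illustrated with a minimal sketch that buckets primitives by the screen tiles their bounding boxes overlap. The square-tile layout, the integer pixel bounding boxes, and the function names are assumptions made for illustration; the slides do not prescribe a particular partitioning.

```python
from collections import defaultdict

def tiles_overlapped(bbox, tile_size):
    """Return the (tx, ty) tiles touched by a screen-space bounding box.

    bbox is (xmin, ymin, xmax, ymax) in integer pixels, max edges inclusive.
    """
    xmin, ymin, xmax, ymax = bbox
    tx0, ty0 = xmin // tile_size, ymin // tile_size
    tx1, ty1 = xmax // tile_size, ymax // tile_size
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]

def sort_to_tiles(bboxes, tile_size):
    """Bucket primitive indices by every screen tile they overlap."""
    buckets = defaultdict(list)
    for index, bbox in enumerate(bboxes):
        for tile in tiles_overlapped(bbox, tile_size):
            buckets[tile].append(index)
    return dict(buckets)

# Three hypothetical primitives on a screen split into 64x64 tiles.
bboxes = [(2, 3, 10, 12), (60, 60, 70, 70), (100, 5, 120, 20)]
buckets = sort_to_tiles(bboxes, tile_size=64)
```

Primitive 1 straddles a tile corner, so it lands in four buckets. A primitive that overlaps several regions has to be sent to, and processed by, several processors, which is one source of the redistribution cost.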

Page 8

Where does the ‘sort’ take place?

• The location of this ‘sort’ determines the structure of the parallel algorithm
• This ‘sort’ can take place:
  – During geometry processing (sort-first)
  – Between geometry processing and rasterization (sort-middle)
  – During rasterization (sort-last)
• Each placement of the sort has distinct properties

Page 9

Sort-First

• Redistributing raw primitives

Page 10

Sort-Middle

• Redistributing screen primitives

Page 11

Sort-Last

• Redistributing fragments, samples, or pixels
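As a rough illustration of what redistributing pixels can mean in sort-last, the sketch below merges the frames produced by independent renderers with a per-pixel depth test. It assumes opaque surfaces, a smaller depth value meaning nearer to the viewer, and a frame stored as a flat list of (depth, color) pairs; none of these details come from the slides.

```python
INF = float("inf")  # depth of a pixel that a renderer did not touch

def z_merge(frames):
    """Merge full frames by keeping the nearest sample at each pixel.

    Each frame is a list of (depth, color) pairs, one per pixel.
    """
    merged = list(frames[0])
    for frame in frames[1:]:
        for i, (depth, color) in enumerate(frame):
            if depth < merged[i][0]:
                merged[i] = (depth, color)
    return merged

# Two hypothetical renderers, each holding a three-pixel frame.
frame_a = [(0.5, "red"), (INF, None), (0.9, "red")]
frame_b = [(0.7, "blue"), (0.2, "blue"), (INF, None)]
final = z_merge([frame_a, frame_b])
```

Each renderer can work on an arbitrary subset of the primitives, because visibility is only resolved at merge time.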

Page 12

Processing and Communication Model

• A refined model
• Assume a dataset containing n_r raw primitives with average size a_r.
• We will call primitives that result from tessellation display primitives. If T is the tessellation ratio, there are n_d = T n_r of these, with average size a_d = a_r / T. If there is no tessellation, T = 1, n_d = n_r, and a_d = a_r.
• Assume an image containing A pixels, with S samples to compute per pixel. For simplicity, assume that all primitives lie within the viewing frustum.
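The relations in this model are simple enough to check numerically. The following sketch only restates them; the function names and the example numbers are made up for illustration.

```python
def display_primitives(n_r, a_r, T=1):
    """Return (n_d, a_d): count and average size of display primitives.

    n_r raw primitives of average size a_r are tessellated with ratio T.
    """
    n_d = T * n_r
    a_d = a_r / T
    return n_d, a_d

def total_samples(A, S):
    """Number of samples to compute for A pixels at S samples per pixel."""
    return A * S

# Hypothetical dataset: 1000 raw primitives of average size 64, T = 4.
n_d, a_d = display_primitives(n_r=1000, a_r=64.0, T=4)
samples = total_samples(A=1280 * 1024, S=4)
```

Tessellation multiplies the primitive count by T and divides the average size by T, so the product n_d a_d stays equal to n_r a_r.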

Page 13

The Terms

Page 14

Analysis of Sort-First

• c: the proportion of primitives to be redistributed
• Advantages:
  – Low communication requirements when the tessellation ratio and the degree of oversampling are high, or when frame-to-frame coherence can be exploited.
  – Processors implement the entire rendering pipeline for a portion of the screen.
• Disadvantages:
  – Susceptible to load imbalance. Primitives may clump into regions, concentrating the work on a few renderers.
  – To take advantage of frame-to-frame coherence, retained mode and complex data handling code are necessary.

Cost over uni-processor rendering
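One way to read the proportion c under frame-to-frame coherence is as the share of primitives whose owning screen region changes between two consecutive frames. The sketch below measures it under that assumption; the per-primitive owner lists and the function name are hypothetical and are not taken from the slides.

```python
def redistribution_fraction(prev_owner, curr_owner):
    """Fraction of primitives whose owning processor changed between frames.

    prev_owner[i] and curr_owner[i] are the processor ids that own
    primitive i in the previous and the current frame.
    """
    moved = sum(1 for p, c in zip(prev_owner, curr_owner) if p != c)
    return moved / len(curr_owner)

# Eight primitives on four processors; two of them cross a region boundary.
prev_owner = [0, 0, 1, 2, 3, 3, 1, 2]
curr_owner = [0, 1, 1, 2, 3, 3, 1, 0]
c = redistribution_fraction(prev_owner, curr_owner)
```

With a slowly moving viewpoint most primitives stay in place, so c is small and little data has to be sent. Tracking ownership across frames is what requires the retained-mode data handling listed among the disadvantages.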

Page 15

Analysis of Sort-Middle

• Advantages: General and straightforward; redistribution occurs at a natural place in the pipeline.
• Disadvantages:
  – High communication costs if the tessellation ratio is high.
  – Susceptible to load imbalance between rasterizers when primitives are distributed unevenly over the screen.

Cost over uni-processor rendering
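Load imbalance between rasterizers can be quantified in several ways. A simple one, used here purely as an illustration, is the ratio of the busiest rasterizer's work to the mean work. The metric and the primitive counts below are assumptions, not part of the slides.

```python
def load_imbalance(work_per_processor):
    """Ratio of the maximum load to the mean load (1.0 means balanced)."""
    mean = sum(work_per_processor) / len(work_per_processor)
    return max(work_per_processor) / mean

# Primitives per screen region: spread evenly, then clumped in one region.
even = load_imbalance([100, 100, 100, 100])
clumped = load_imbalance([310, 30, 30, 30])
```

In the clumped case one rasterizer does about three times the average work, so the frame takes about three times as long as a balanced partition of the same 400 primitives would.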

Page 16

Analysis of Sort-Last

• Sparse merging: only merge the small region each processor renders
• Full-frame merging: always merge the whole frame buffer
• Advantages:
  – Renderers implement the full rendering pipeline and are independent until pixel merging.
  – Less prone to load imbalance.
  – SL-full merging can be embedded in a linear network, making it linearly scalable.
• Disadvantage: Pixel traffic may be extremely high, particularly when oversampling.

Cost over uni-processor rendering
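The difference in pixel traffic between the two merging strategies can be estimated with a back-of-the-envelope count of samples sent. The sketch assumes every renderer sends its data exactly once and that each sparse region is a rectangle; the function names and numbers are illustrative only.

```python
def full_frame_merge_samples(A, S, num_renderers):
    """Samples sent when every renderer contributes its whole frame buffer."""
    return A * S * num_renderers

def sparse_merge_samples(regions, S):
    """Samples sent when each renderer contributes only the region it drew.

    regions holds one (width, height) pair in pixels per renderer.
    """
    return S * sum(width * height for width, height in regions)

# Four hypothetical renderers, a 640x480 image, 4 samples per pixel.
A = 640 * 480
regions = [(100, 80), (64, 64), (200, 150), (32, 32)]
full = full_frame_merge_samples(A, S=4, num_renderers=len(regions))
sparse = sparse_merge_samples(regions, S=4)
```

Both counts grow linearly with S, which is why oversampling makes the traffic problem worse. Full-frame traffic does not depend on the scene content at all.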

Page 17

A Comparison

• Sort-first, sort-middle and sort-last.
• No category is strictly the best
• Different categories can also be combined in one implementation

Page 18

Parallel Volume Rendering

• Many algorithms distribute the data across processing nodes. Each node renders its portion of the data and generates a partial image. The partial images are then accumulated.
• Screen space partitioning (tiles or contiguous scan-lines) is also used.
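The slides do not say how the partial images are accumulated. A common choice in volume rendering is the "over" operator applied in depth order, and the sketch below assumes exactly that: premultiplied (color, alpha) pairs, a single gray channel, and partial results already sorted front to back.

```python
def over(front, back):
    """Composite premultiplied (color, alpha) 'front' over 'back'."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)

def composite_front_to_back(partials):
    """Accumulate one pixel's partial results, nearest node first."""
    result = (0.0, 0.0)  # fully transparent start
    for sample in partials:
        result = over(result, sample)
    return result

# One pixel as rendered by two nodes, listed front to back.
pixel = composite_front_to_back([(0.25, 0.5), (0.5, 0.5)])
```

Because "over" is associative but not commutative, nodes can merge in pairs or in a tree, as long as the front-to-back order of the data blocks is preserved.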

Page 19

Load Balancing

• For better load balancing:
  – Task queuing: the task queue can be ordered by decreasing task size, so that the concurrency gets finer until the queue is exhausted.
  – Load stealing: nodes steal smaller tasks from other nodes once they have completed their own tasks.
  – Time stamp: each task carries a timeout stamp; if a node cannot finish its task before the timeout, it takes the remainder of the task, re-partitions it, and re-distributes it.
• Hierarchical data structures, such as octrees and k-d trees, are commonly used.
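The task-queuing strategy can be simulated in a few lines. This is a sketch under simplifying assumptions: task sizes are known in advance, a task's size equals its running time, and the next task always goes to whichever node becomes free first. The function name and the task sizes are made up for illustration.

```python
import heapq

def schedule_largest_first(task_sizes, num_nodes):
    """Simulate nodes pulling tasks from a queue sorted by decreasing size.

    Returns (assignment, makespan), where assignment maps a node id to the
    list of task sizes it executed.
    """
    # Min-heap of (time at which the node becomes free, node id).
    free_at = [(0, node) for node in range(num_nodes)]
    heapq.heapify(free_at)
    assignment = {node: [] for node in range(num_nodes)}
    for size in sorted(task_sizes, reverse=True):
        t, node = heapq.heappop(free_at)
        assignment[node].append(size)
        heapq.heappush(free_at, (t + size, node))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan

assignment, makespan = schedule_largest_first([1, 2, 3, 4, 5, 6, 7, 8], 3)
```

Handing out the large tasks first leaves the small ones to fill the gaps at the end, which is the "concurrency gets finer" effect described above. In this run the total work of 36 is split as 13, 12, and 11.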

Page 20

Sum It Up

• Parallelism is just a tool
• It depends on which algorithm you parallelize
• What about an OpenGL application utilizing occlusion culling?