efficient clustered bvh update algorithm for highly-dynamic models

40
RT 08 RT 08 Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models Symposium on Interactive Ray Tracing Symposium on Interactive Ray Tracing 2008 2008 Los Angeles, California Los Angeles, California Kirill Garanzha Kirill Garanzha Department of Software for Computers Department of Software for Computers Bauman Moscow State Technical University, Russia Bauman Moscow State Technical University, Russia

Upload: ian-valentine

Post on 31-Dec-2015

47 views

Category:

Documents


2 download

DESCRIPTION

Symposium on Interactive Ray Tracing 2008 Los Angeles, California. Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha Department of Software for Computers Bauman Moscow State Technical University, Russia. BVH summary. - PowerPoint PPT Presentation

TRANSCRIPT

RT 08RT 08

Efficient Clustered

BVH Update Algorithm

for Highly-Dynamic Models

Symposium on Interactive Ray Tracing 2008Symposium on Interactive Ray Tracing 2008Los Angeles, CaliforniaLos Angeles, California

Kirill GaranzhaKirill Garanzha

Department of Software for ComputersDepartment of Software for ComputersBauman Moscow State Technical University, RussiaBauman Moscow State Technical University, Russia

RT 08RT 08BVH summaryBVH summary

Advantages:Advantages:Fast refittingFast refittingPredictable memory Predictable memory

consumptionconsumptionFastest empty space Fastest empty space

passing for SAH treespassing for SAH trees

BVH is the tree that encloses scene objectsBVH is the tree that encloses scene objects

Disadvantages:Disadvantages:Not ordered traversal of the ray in the spaceNot ordered traversal of the ray in the spaceTree-quality degradation after refittingTree-quality degradation after refittingNot very fast SAH-based buildNot very fast SAH-based build

RT 08RT 08Dynamic BVH relatedDynamic BVH related

BIH [Wächter EGSR06]BIH [Wächter EGSR06]Build from hierarchy [Hunt RT07]Build from hierarchy [Hunt RT07]SAH binning [Wald RT07]SAH binning [Wald RT07]Selective restructuring [Yoon EGSR07]Selective restructuring [Yoon EGSR07]

Fast BVH build assume some tree-Fast BVH build assume some tree-quality degradation for ray tracingquality degradation for ray tracing

RT 08RT 08Dynamic BVH relatedDynamic BVH related

In general these approaches apart In general these approaches apart from “Selective restructuring” produce from “Selective restructuring” produce splits for every BVH-node in every splits for every BVH-node in every frame – frame – brute forcebrute force

Fast BVH build assume some tree-Fast BVH build assume some tree-quality degradation for ray tracingquality degradation for ray tracing

RT 08RT 08

How to get rid of How to get rid of brute forcebrute force??

Dynamic BVH relatedDynamic BVH related

Lazy buildSelective restructuringAvoid full reconstruction of nodes

without strong dynamism under them

RT 08RT 08IdeaIdea

To measure cheaply the dynamism To measure cheaply the dynamism under node while refitting the BVHunder node while refitting the BVH

To apply the cheapest update To apply the cheapest update technique for the node that saves technique for the node that saves good SAH cost:good SAH cost:

1. Leave it refitted2. Relocate BVH-cluster in proper position

of the tree3. Rebuild the structure under node by

exploiting SAH-binning, existing hierarchy, degree of dynamism

RT 08RT 08Dynamism detectionDynamism detection

The type of dynamism is detected in the The type of dynamism is detected in the bottom-up refitting process for every bottom-up refitting process for every affected node:affected node:

Migrating node represents a BVH-cluster of coherently moving triangles

Exploded node represents a cluster of strong triangle explosions and growing overlaps

RT 08RT 08Detect migrating nodeDetect migrating node

Example:Example:

……

BB

……

AA

……

CC ……

DD

……

EE

Nodes B and C are moving. They represent clusters of coherently moving triangles

RT 08RT 08

……

BB

……

AA

……

CC

……

DD

……

EE

The SAH cost of the node (B+C) is improved. But nodes B and C should be united with others

Detect migrating nodeDetect migrating node

Example:Example:

RT 08RT 08

……

BB

……

AA

……

CC

……

DD

……

EE

Like now…

Example:Example:

Detect migrating nodeDetect migrating node

RT 08RT 08Formula:Formula:

dMNstSavedSAHCo

NSAHCost1

)(

)(

if

then NL and NR represent migrating clusters and should be relocated

NL and NR are children of N dM is the predefined threshold

Detect migrating nodeDetect migrating node

RT 08RT 08Detect exploded nodeDetect exploded node

Example:Example:

dMNstSavedSAHCo

NSAHCost1

)(

)(

……

NNRR

NNLL

The overlap between NL and NR is likely to grow if SAH cost of N is increasing

RT 08RT 08Example:Example:

dMNstSavedSAHCo

NSAHCost1

)(

)(

N is worth to rebuild if there is a big % of broken nodes under it

Detect exploded nodeDetect exploded node

NNRR

……

NNLL

RT 08RT 08Example:Example:

dMNstSavedSAHCo

NSAHCost1

)(

)(

Broken node – migrating or exploded. Broken count measures the dynamism under node

Detect exploded nodeDetect exploded node

NNRR

NNLL

RT 08RT 08Formula:Formula:

dMNstSavedSAHCo

NSAHCost1

)(

)(dM

NstSavedSAHCo

NSAHCost1

)(

)(

dMNstSavedSAHCo

NSAHCost1

)(

)(

if

then The cluster with root N should be restructured

There is big overlap between NL and NR

There is big % of broken nodes under N

NL and NR are children of N dE is the predefined threshold

Detect exploded nodeDetect exploded node

RT 08RT 08Refitting phaseRefitting phase

The bottom-up refitting phase:The bottom-up refitting phase:1.1. Updates BVsUpdates BVs2.2. Detects the dynamismDetects the dynamism3.3. Accumulates broken countsAccumulates broken counts

At the output it produces 2 arrays with At the output it produces 2 arrays with migrating and exploded nodesmigrating and exploded nodes

RT 08RT 08

How to obtain a lot of How to obtain a lot of smaller exploded clusters smaller exploded clusters to perform independent to perform independent rebuilds on them?rebuilds on them?

Independent clustersIndependent clusters

RT 08RT 08

When exploded node N is detected:Lock it. N will be considered as atomic

for rebuild process on a cluster above it.Unlock all exploded descendants of N

that intersect the BV of the overlap between NL and NR

Independent clustersIndependent clusters

RT 08RT 08

LockedLockedLockedLocked

LockedLocked

LockedLocked

NNLL

NNRR

Locked exploded descendants will be rebuilt independently and will be considered as atomic for some rebuild process above them

Independent clustersIndependent clusters

RT 08RT 08

LockedLockedLockedLocked

LockedLocked

LockedLocked

NNLL

NNRR

But if some of them intersect the BV of the overlap within some ancestor N then they create obstacles in the process eliminating the overlap

Independent clustersIndependent clusters

RT 08RT 08

LockedLockedLockedLocked

LockedLocked

LockedLocked

NNLL

NNRR

So they should be unlocked

Independent clustersIndependent clusters

RT 08RT 08Independent clustersIndependent clusters

L

L L…

…… …

The set of independent clusters for rebuilding may form the hierarchy. Their roots are locked

RT 08RT 08

… … …

Hierarchical rebuildHierarchical rebuild

L

1. New split should be created with SAH binning

4. Locked nodes are atomic

2. Smaller-sized rows are considered

while partitioning

3. In the next partition step the sub-row is refined if it has sufficient

% of Broken count 5. Other cluster, independent rebuild

This way saves some rebuild operations

RT 08RT 08Cluster migrationCluster migration

Migrating nodes are processed in 3 loops:Migrating nodes are processed in 3 loops:

1.1. Unlink each node from the treeUnlink each node from the tree

2.2. Adjust the remaining tree and search Adjust the remaining tree and search reinsertion nodesreinsertion nodes

3.3. Insert each node back into the treeInsert each node back into the tree

At every loop all migrating nodes are At every loop all migrating nodes are passed in the “end-begin” order – i.e. a passed in the “end-begin” order – i.e. a loop starts from nodes of higher levelsloop starts from nodes of higher levels

RT 08RT 08Cluster migrationCluster migration

Migrating nodes are processed in 3 loops:Migrating nodes are processed in 3 loops:

1.1. Unlink each node from the treeUnlink each node from the tree

2.2. Adjust the remaining tree and search Adjust the remaining tree and search reinsertion nodesreinsertion nodes

3.3. Insert each node back into the treeInsert each node back into the tree

The formula that detects migrating nodes The formula that detects migrating nodes and these 3 loops automatically resolve and these 3 loops automatically resolve the problem of the tree-thinning effectthe problem of the tree-thinning effect

RT 08RT 08Cluster migrationCluster migration

The insertion is a The insertion is a recursive process recursive process of taking decisions of taking decisions

XX

XXLL XXRR

NN

NNLL NNRR

Where insert N when start from X?

? <= 2. Descend into

one of the children

XX

XXLL XXRR NN ? <=

XX

XXLL XXRR

NNLL NNRR

3. Decompose N

, ? <=

1. Unite X and N

UU

XX NN

The variant of decision is taken according to its SAH evaluation

N is decomposed when its insertion produces severe

overlap in X.

RT 08RT 08Memory managerMemory manager

Problem: frequent updates of frequent updates of pointers in the proposed highly-pointers in the proposed highly-dynamic structure result in cache-dynamic structure result in cache-efficiency degradation.efficiency degradation.

Property of restructurings: in either in either rebuild or insert process a new BVH-rebuild or insert process a new BVH-node is allocated under some parent node is allocated under some parent node.node.

RT 08RT 08Memory managerMemory manager

Acceleration structure for pre-allocated Acceleration structure for pre-allocated array of BVH-nodes improves the situation:array of BVH-nodes improves the situation:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15BVH nodes array

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

4 4 4 4

16Selector acceleration structure

It accelerates the allocation of a free BVH-It accelerates the allocation of a free BVH-node that is the nearest to the given one in node that is the nearest to the given one in the memory space (e.g. nearest to parent) the memory space (e.g. nearest to parent)

RT 08RT 08

This strategy tries to keep the BVH-layout This strategy tries to keep the BVH-layout of reasonable cache-efficiencyof reasonable cache-efficiency

Memory managerMemory manager

Acceleration structure for pre-allocated Acceleration structure for pre-allocated array of BVH-nodes improves the situation:array of BVH-nodes improves the situation:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15BVH nodes array

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

4 4 4 4

16Selector acceleration structure

RT 08RT 08BenchmarksBenchmarksBVH Updater – Core 2 Quad 2.4 GHz – Core 2 Quad 2.4 GHz (1 core utilization)(1 core utilization)

Ray Tracer – GeForce 8800 GTX and – GeForce 8800 GTX and CUDA, mono-ray tracerCUDA, mono-ray tracer

UNC Dynamic models:

Exploding dragon (252K triangles)

Cloth simulation (92K triangles)

Colliding balls (146K triangles)

RT 08RT 08Dragon timingsDragon timings

0102030405060708090

100110120130140

0 10 20 30 40 50 60 70 80 90 100

Frame

Tim

e, m

s

Refit only Total updateMigrate only Render onlyRebuild only

RT 08RT 08Cloth timingsCloth timings

05

1015202530354045505560

0 10 20 30 40 50 60 70 80 90 100

Frame

Tim

e, m

s

Refit only Total updateMigrate only Render onlyRebuild only

RT 08RT 08Balls timingsBalls timings

05

10152025303540455055606570

0 10 20 30 40 50 60 70 80 90 100

Frame

Tim

e, m

s

Refit only Total updateMigrate only Render onlyRebuild only

RT 08RT 08Rebuild partitionsRebuild partitions

0%

10%

20%30%

40%

50%

60%

70%80%

90%

100%

0 10 20 30 40 50 60 70 80 90 100

Frame

Num

ber

of re

build

part

itions

Dragon Cloth Balls

Relative number of rebuild partitions. Model specific 100% = number of partitions for the full-rebuild (i.e. number of inner nodes)

RT 08RT 08Independent clustersIndependent clusters

0100200300400500600700800900

1000110012001300

0 10 20 30 40 50 60 70 80 90 100

Frame

Num

ber

of in

dependent clu

ste

rs

Dragon Cloth Balls

The number of detected independent exploded clusters

RT 08RT 08Rendering performanceRendering performance

80%

85%

90%

95%

100%

105%

0 10 20 30 40 50 60 70 80 90 100

Frame

Rela

tive p

erf

orm

ance

Dragon Cloth Balls

Relative rendering performance of the BVH in comparison to the one produced by full SAH binned-rebuild

RT 08RT 08Dragon videoDragon video

RT 08RT 08Cloth videoCloth video

RT 08RT 08Balls videoBalls video

The algorithm adapts to the rigid motion associating every ball with a separate subtree and then exploits only migrating updates

RT 08RT 08ConclusionsConclusions

The algorithm unites advantages of several The algorithm unites advantages of several updating techniques for the BVH.updating techniques for the BVH. Cheapest update techniques are utilized Cheapest update techniques are utilized when they can keep reasonable BVH quality.when they can keep reasonable BVH quality. The algorithm is applicable to various types The algorithm is applicable to various types of dynamic models.of dynamic models. Lots of produced independent clusters for Lots of produced independent clusters for rebuild will be useful in a future parallel version.rebuild will be useful in a future parallel version.