adaptive, multiresolution visualization of large data sets...

Adaptive, Multiresolution Visualization of Large Data Sets using a Distributed Memory Octree


Lori A. Freitag

Raymond M. Loy

Abstract

The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The user may interactively change the resolution of the reduced data set either globally or by specifying a region of interest. In this way, high resolution can be obtained in local subregions without sacrificing graphics performance. We describe the software architecture of the system, give details pertaining to the use of a distributed memory octree used to create the reduced data set, and present performance results for the visualization of Rayleigh-Taylor instability and x-ray burst simulation data sets.

Keywords. Interactive Visualization, Multi-Resolution Visualization, Adaptive Visualization, Parallel Octrees

1 Introduction

A critical step in the computational solution of application problems is the interactive exploration and visualization of the resulting data sets. However, in today’s computational environment, the data sets produced by simulations performed on giga- and teraflop-scale massively parallel processors (MPPs) often exceed the memory and performance capacity of typical graphics workstations. Currently, interactive performance of high-end graphics workstations is limited to data sets that contain roughly �� logically regular grid points. This is an order of magnitude smaller than many data sets produced by large-scale simulations, and the discrepancy between what is easily visualized and what scientists would like to visualize is likely to continue. Thus, the interactive visualization of very large data sets requires either (1) a postprocessing step to reduce the number of data points sent to the visualization environment or (2) sophisticated parallel rendering algorithms that work with the full-resolution data set. Both techniques have been used successfully, and in this paper we present an adaptive, multi-resolution data reduction method that combines both approaches.

The authors were supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

Assistant Scientist, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439. [email protected].

Postdoctoral Appointee, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL [email protected].


To create a reduced data set, researchers commonly build a hierarchical, multiresolution representation of the data or geometric model to be visualized (see [1] for an overview of these methods). In these techniques, a series of coarse representations of the data is constructed using, for example, quadtrees or octrees [2, 3], progressive meshes [4, 5], or wavelets [6]. The level of detail in each region is controlled through a variety of mechanisms, such as error tolerance bounds that control fidelity to the original model, or user input, such as field of view. These methods are useful for fast navigation through the data set, but maximum resolution is limited by the memory size and speed of the graphics workstation.

In contrast, parallel rendering algorithms often work with full-resolution data sets either as the computation proceeds or as a postprocessing step [7, 8, 9, 10]. The data is distributed across the processors of the MPP, and derived visualization entities such as isosurfaces or streamlines are computed in parallel and communicated to the graphics workstation for display. The advantage of this technique is that no information is lost in a data reduction process; however, for scientists who run their parallel applications at remote sites, there are two potential drawbacks to using this method for interactive visualization. First, many MPP supercomputers are accessed through batch queuing systems, making it difficult to predict when the “interactive” job will run. Second, these techniques are useful for interactive exploration of the data set only if there is sufficient network bandwidth between the graphics workstation and the parallel computer.

To address these problems, we have developed a system that combines data reduction techniques and parallel computing to obtain fast performance on a local graphics workstation while retaining access to the full-resolution data set. Like the parallel rendering algorithms mentioned above, we first distribute the data across the processors of a parallel computer, but rather than compute derived visualization entities, we create a hierarchical representation of the data set using a distributed-memory octree data structure. If the user has interactive access to the MPP, the reduced data set is communicated directly to the local graphics environment for visualization and exploration. In this scenario, the derived visualization quantities are computed locally, which significantly alleviates the bandwidth requirements of the parallel rendering algorithms mentioned above. As regions of interest are identified, the octree may be easily adapted to reflect changing resolution requirements. Additional detail is obtained by refining the octree, whereas reduced detail is obtained by pruning or truncated traversal. Thus the user can “zoom in” and obtain more resolution in local subregions without sacrificing graphics performance by simultaneously coarsening the data outside the region of interest. If the user does not have interactive access to the MPP, the same functionality can be obtained by saving each reduced data set to a file that is transferred to and visualized on the local graphics workstation.

The remainder of this paper is organized as follows. In Section 2, we describe the software architecture of the system, both the envisioned final toolkit and its current instantiation. In Section 3, we describe the creation of the reduced data set using the parallel octree infrastructure. In particular, we discuss the parallel implementation of data insertion into the octree and the dynamic load balancing of octants as resolution needs change. In Section 4, we show the use of this system to visualize data from Rayleigh-Taylor instability simulations and an x-ray burst calculation. We provide performance results for the creation of each reduced data set and the corresponding visualization frame rates. We also analyze the fidelity of the reduced data sets to the original data as the level of detail is changed. Finally, in Section 5 we offer concluding remarks and indicate directions of future work.

2 Software Architecture

The interactive data reduction system is comprised of four major components:

1. field data input to the parallel octree code either through file input or through interactive requests to a running application,


2. a parallel octree code to create the reduced data set,

3. the local visualization environment, which can consist of externally developed desktop tools such as vtk [12], the General Mesh Viewer (GMV) [13], and IRIS Explorer [14], or custom tools built for state-of-the-art display devices such as the CAVE virtual reality theater [15], and

4. a portable communication infrastructure that allows the user to make interactive requests for new reduced data sets from the visualization environment and allows the octree code to obtain new field data from the running application.

These four components and their interactions are shown in Figure 1. The arrows between the components indicate the communication necessary for an interactive environment. The width of the arrows indicates the relative size of the messages; small messages are needed to request new reduced data sets or field data values; large messages are required to transfer information from the application to the octree code and from the octree code to the visualization environment. Solid boxes and solid arrows indicate components and interactions that currently exist; dashed box outlines and arrows indicate components and interaction models that are planned for future instantiations of the toolkit. The visualization toolkits listed in Roman font are currently supported; near-term support is planned for those in italics. Asterisks indicate extensible toolkits that will support interactivity; the others must be used with file input and output. We note that because our reduced data set is derived from an adaptive octree data structure, only visualization tools that can support multi-resolution or unstructured data sets are considered.

Figure 1: The four primary components of the data reduction toolkit and their interactions

In the current instantiation, scalar field data is loaded from a file and distributed across the processors of an MPP. As it is distributed, it is inserted into the parallel octree data structure according to a user-defined criterion. The reduced data set is formed by averaging or agglomerating the data associated with each leaf octant, and these values are either stored to a file for later processing or communicated directly to the graphics workstation for visualization. Details pertaining to the implementation and use of the parallel octree are given in Section 3.


We currently support three visualization environments: GMV, a custom desktop toolkit based on vtk classes, and a custom CAVE virtual reality environment. GMV is a freely available software package from Los Alamos that is distributed in binary form. It is lightweight and easy to use but not extensible to support interactive requests to the parallel octree code. To support dynamic, adaptive level-of-detail requests, we developed a visualization tool using the freely available vtk class library. This custom tool allows the user to interactively change the level of detail either globally throughout the computational domain or locally by defining a bounding box around a region of interest. Visualization techniques currently supported in this tool include isosurface contouring, cutting planes, and surface/mesh visualization, and we note that this set of functionalities can be easily extended by taking full advantage of the vtk classes. All results presented in this paper were obtained from the vtk environment. The third visualization environment we support is a custom tool developed for the CAVE virtual reality theater. This environment is still in the early stages of development, and we plan to use it to test the utility of the hierarchical visualization paradigm in an immersive environment.

The final component of the software system is a communication infrastructure which meets the following design requirements.

It must be both portable and flexible so that the parallel octree code and the visualization environment can run simultaneously on different, possibly remote, computer architectures.

It must allow dynamic linking and decoupling of multiple processes so that (1) the user can periodically monitor a long-running application and (2) multiple scientists can collaborate while visualizing their data.

It must help manage synchronization of distributed data sets to ensure that the reduced data set is self-consistent.

One communication library that meets all of these requirements is the ALICE Memory Snooper (AMS) [11]. This library was designed to provide a lightweight API and infrastructure that enables computational steering and remote monitoring of application programs. The design is based on a client/server model that allows users to connect to a running application and access or modify the application’s published variables at user-defined synchronization points. Thus it can be used both for fulfilling interactive requests for new data sets from a long-running simulation and for modifying or “steering” the simulation. In the current instantiation, the octree code publishes variables relating to the criteria for refinement and coarsening of the tree and also the reduced data set. The vtk client accesses that information and is able to change the refinement and coarsening criteria, define a bounding box to specify a region of interest, and access the resulting reduced data set. AMS portability and flexibility are achieved by using sockets and the TCP/IP protocol.

3 Creating Reduced Data Sets

Our data reduction technique is based on creating a hierarchical representation of the data using a parallel octree. We first describe the criteria used for inserting field data into the octree and the error indicators associated with the reduction stored on each octant leaf. We then describe the parallel octree infrastructure, including the data structures and techniques used for parallel creation, coarsening, and traversal of the tree.


3.1 Data Reduction

To create the reduced data set, field data values are inserted into the appropriate leaf octants using their spatial location. A user-defined insertion criterion is used to determine whether the octant should divide to create eight new leaf octants. Currently the user sets a maximum number of points to be inserted into each octant. This technique maintains approximately uniform fidelity to the solution data for $h$-adaptive meshes in which the error at each data point is roughly constant. For uniform meshes or $p$-adaptive methods, we plan to use error indicator techniques such as energy norms or gradient bounds to determine when octants should be subdivided.
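The insertion rule above can be illustrated with a minimal serial sketch. It is not the authors' parallel implementation: the class name `Octant`, the `max_depth` guard against coincident points, and the helper `random_tree` are all assumptions introduced here for illustration.

```python
import random


class Octant:
    """One node of a serial octree covering the box [lo, hi)."""

    def __init__(self, lo, hi, depth=0):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.points = []       # (x, y, z, value) tuples held by a leaf
        self.children = None   # list of 8 Octants once subdivided

    def _mid(self):
        return [(a + b) / 2.0 for a, b in zip(self.lo, self.hi)]

    def _child_for(self, p):
        # Bit k of the child index is set when p lies in the upper half of axis k.
        mid = self._mid()
        idx = sum(1 << k for k in range(3) if p[k] >= mid[k])
        return self.children[idx]

    def _split(self):
        mid = self._mid()
        self.children = []
        for idx in range(8):
            lo = tuple(mid[k] if (idx >> k) & 1 else self.lo[k] for k in range(3))
            hi = tuple(self.hi[k] if (idx >> k) & 1 else mid[k] for k in range(3))
            self.children.append(Octant(lo, hi, self.depth + 1))

    def insert(self, p, n_max, max_depth=16):
        """Insert p into its leaf; divide the leaf if it now holds more than n_max points."""
        node = self
        while node.children is not None:
            node = node._child_for(p)
        node.points.append(p)
        # max_depth is an added safeguard (assumption) so coincident points cannot recurse forever.
        if len(node.points) > n_max and node.depth < max_depth:
            node._split()
            pending, node.points = node.points, []
            for q in pending:
                node.insert(q, n_max, max_depth)

    def leaves(self):
        if self.children is None:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def random_tree(n_points, n_max, seed=0):
    """Hypothetical demo: fill the unit cube with random points carrying the value 1.0."""
    rng = random.Random(seed)
    root = Octant((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    for _ in range(n_points):
        root.insert((rng.random(), rng.random(), rng.random(), 1.0), n_max)
    return root
```

Calling `random_tree(200, 10)` yields a tree in which every leaf holds at most ten points, so densely sampled regions end up with smaller octants.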

The field data values are averaged or otherwise agglomerated after insertion into the octree. To provide an indication of the error associated with the reduced data set, for each leaf octant we compute and store statistical values such as the standard deviation, $\sigma$, and maximum deviation from the mean, $\epsilon$. These values are normalized by the mean to yield $\hat{\sigma}$ and $\hat{\epsilon}$, respectively, which are included as additional scalar fields to be visualized so that the user has an indication of the fidelity of the reduced data set to the original data set. These measures of error also serve to highlight potential regions of interest; the cells with a large deviation from the average value are likely to have fine-scale structure that was not adequately captured by the reduction process.
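As a small illustration of these per-leaf indicators, the sketch below computes the mean, the standard deviation, the maximum deviation from the mean, and their mean-normalized forms for the values stored in one leaf. The function name and dictionary keys are hypothetical, and the use of the population (rather than sample) standard deviation and of NaN for a zero mean are assumptions; the text does not specify either.

```python
import math


def leaf_statistics(values):
    """Return the reduced value and error indicators for one leaf octant."""
    if not values:
        raise ValueError("leaf holds no data values")
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (assumption: the paper does not say which form is used).
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Largest absolute deviation of any stored value from the leaf mean.
    max_dev = max(abs(v - mean) for v in values)
    if mean == 0.0:
        # Normalization by the mean is undefined here; NaN is a choice made for this sketch.
        sigma_hat = max_dev_hat = math.nan
    else:
        sigma_hat = sigma / abs(mean)
        max_dev_hat = max_dev / abs(mean)
    return {
        "mean": mean,
        "sigma": sigma,
        "max_dev": max_dev,
        "sigma_hat": sigma_hat,
        "max_dev_hat": max_dev_hat,
    }
```

For the values 1, 2, and 3 the mean is 2 and the maximum deviation is 1, so the normalized maximum deviation is 0.5.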

3.2 Parallel Octree Infrastructure

To effectively manage a large distributed data set, the octree must also be distributed across the processors of the MPP. In Figure 2, we show the parallel octree data structure; the figure on the left shows an example tree, and the figure on the right shows the same tree distributed across three processors of a parallel computer. Efficient traversal of the parallel octree data structure is enabled by interoctant links, which may be either local or off-processor (shown by solid and dotted lines, respectively). Each processor maintains a local root list of octants whose parents are off-processor. Spatial information associated with each of these local root octants allows the coordinate information of its descendants to be easily inferred without communication. An octree partitioning algorithm using a space-filling curve [16], together with arbitrary octant migration, is used for load balancing the field data and the associated octree. If the data reduction were closely coupled to the parallel application, octant migration could be performed to track the location of the application data.

Figure 2: A portion of an octree (left) and the same tree distributed across three processors (right). Dotted links are nonlocal; squares indicate local root spatial information; local root lists are labeled LR.
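To make the idea of partitioning along a space-filling curve concrete, here is a minimal sketch. The paper does not say which curve is used, so the Morton (Z-order) key below is an assumption, as are the function names and the representation of each leaf by integer coordinates plus a weight (for example, its number of data points).

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer coordinates (Z-order)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_by_curve(leaves, nprocs):
    """Assign each leaf (ix, iy, iz, weight) to a processor.

    Leaves are ordered along the curve and the ordered sequence is cut into
    nprocs contiguous pieces of roughly equal total weight. The returned list
    is aligned with the input list.
    """
    total = sum(leaf[3] for leaf in leaves)
    if total <= 0:
        raise ValueError("total weight must be positive")
    order = sorted(range(len(leaves)), key=lambda i: morton_key(*leaves[i][:3]))
    owners = [0] * len(leaves)
    running = 0.0
    for i in order:
        weight = leaves[i][3]
        # The midpoint of this leaf's weight interval decides which piece it falls in.
        owners[i] = min(nprocs - 1, int((running + weight / 2.0) * nprocs / total))
        running += weight
    return owners
```

Because neighbors along the curve tend to be neighbors in space, each processor receives a spatially compact set of octants, which is what keeps most interoctant links local.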


When creating an initial parallel octree from file input, processor 0 is responsible for reading the file and distributing the data to the processor that owns the corresponding piece of the domain. Because the data set is too large to fit in a single processor’s memory, we read a small portion of the data set, distribute those points to their respective processors, and repeat the process until the entire file is loaded. To maintain good parallel performance, the octree is periodically rebalanced during this process. After rebalancing, the spatial information and processor association of all local roots are collected to processor 0, which allows processor ownership of any point to be determined easily. If a point is contained in more than one local root, the subtree associated with the smallest of these local roots contains the leaf octant into which this point should be inserted. For example, a point lying in octant D in Figure 2 is contained in the bounds of octant C as well as C’s parent, A. In this case, D is the smallest local root and the point is sent to processor 2.
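The ownership rule (send the point to the processor holding the smallest local root that contains it) can be sketched as follows. The function name and the `(lo, hi, processor)` tuple layout are hypothetical, and the box coordinates in the usage note are made up to mimic the A, C, D nesting of Figure 2.

```python
def owner_of_point(point, local_roots):
    """Return the processor owning `point`.

    local_roots is a list of (lo, hi, processor) entries, one per local root,
    where lo and hi are the corners of the root's bounding box [lo, hi).
    """
    best = None
    for lo, hi, proc in local_roots:
        inside = all(l <= x < h for l, x, h in zip(lo, point, hi))
        if not inside:
            continue
        volume = 1.0
        for l, h in zip(lo, hi):
            volume *= (h - l)
        # Keep the smallest enclosing local root seen so far.
        if best is None or volume < best[0]:
            best = (volume, proc)
    if best is None:
        raise ValueError("point lies outside every local root")
    return best[1]
```

With roots `A = ((0, 0, 0), (8, 8, 8), 0)`, `C = ((0, 0, 0), (4, 4, 4), 1)`, and `D = ((0, 0, 0), (2, 2, 2), 2)`, the point `(1, 1, 1)` lies in all three boxes and is assigned to processor 2, matching the example in the text. The processor numbers given to A and C here are assumptions; only D's processor is stated in the text.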

Once the parallel octree has been constructed, refinement and coarsening to comply with new visualization criteria are performed using the algorithm in Figure 3. The algorithm is performed in parallel for each processor’s local roots and no communication is necessary. Leaves are recursively subdivided when they contain more than the prescribed maximum number of points, $N_{\max}$. A set of leaf siblings is pruned when they each contain fewer than the prescribed minimum number of points, $N_{\min}$, and the total number of points among the siblings is less than $N_{\max}$. We note that it is not always possible to meet both the maximum and the minimum insertion criteria; in our implementation $N_{\max}$ takes precedence. The current algorithm declines to coarsen when off-processor links are encountered. Complete coarsening could be accomplished by migrating terminal octants which are local roots to their parent’s processor and repeating the tree adjustment. When no local roots are terminal, the process is complete.
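A serial, runnable sketch of the adjustment in Figure 3 is given below for a single local root. It is an interpretation, not the authors' code: the `Node` class, the `local` flag standing in for off-processor links, the `refine` helper, and the rule that a leaf already within bounds returns `NO_COARSEN` are assumptions made so that the example is self-contained.

```python
NO_COARSEN = -1  # sentinel: this subtree must not be merged into its parent


class Node:
    def __init__(self, lo, hi, points=None, local=True):
        self.lo, self.hi = lo, hi
        self.points = list(points or [])   # (x, y, z) tuples stored at a leaf
        self.children = []                 # empty list means terminal
        self.local = local                 # False models an off-processor child


def refine(node, n_max, max_depth=16, depth=0):
    """Recursively subdivide a leaf until no leaf holds more than n_max points."""
    if len(node.points) <= n_max or depth >= max_depth:
        return
    mid = [(a + b) / 2.0 for a, b in zip(node.lo, node.hi)]
    buckets = [[] for _ in range(8)]
    for p in node.points:
        buckets[sum(1 << k for k in range(3) if p[k] >= mid[k])].append(p)
    node.points = []
    for idx in range(8):
        lo = tuple(mid[k] if (idx >> k) & 1 else node.lo[k] for k in range(3))
        hi = tuple(node.hi[k] if (idx >> k) & 1 else mid[k] for k in range(3))
        child = Node(lo, hi, buckets[idx])
        node.children.append(child)
        refine(child, n_max, max_depth, depth + 1)


def subtree_adjust(node, n_max, n_min):
    """Adjust the subtree rooted at node; return its point count or NO_COARSEN."""
    if not node.children:                       # terminal octant
        n = len(node.points)
        if n > n_max:
            refine(node, n_max)
            return NO_COARSEN
        return n if n < n_min else NO_COARSEN   # within bounds: not prunable (assumption)
    total, blocked = 0, False
    for child in node.children:
        if not child.local:                     # decline to coarsen across processors
            blocked = True
            continue
        result = subtree_adjust(child, n_max, n_min)
        if result == NO_COARSEN:
            blocked = True
        else:
            total += result
    if blocked:
        return NO_COARSEN
    if total < n_max:                           # prune: move the children's data up
        for child in node.children:
            node.points.extend(child.points)
        node.children = []
    return total if total < n_min else NO_COARSEN


def collect_leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in collect_leaves(child)]
```

Calling `subtree_adjust` with a small `n_max` refines an overfull leaf, and calling it again on the same tree with large `n_max` and `n_min` merges the children back, which mirrors how a user would move between levels of detail.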

After the parallel octree has been adapted to meet the insertion criteria, copies of the octants with their average data (but without the much larger original data) are migrated to processor 0. Processor 0 then publishes a unique vertex list, the octree connectivity, and the reduced data set, which is accessed by the visualization program.
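One way to build a unique vertex list and cell connectivity from a set of leaf octants is sketched below. The function name and the representation of each leaf as a `(lo, hi)` box are assumptions. The sketch deduplicates corners by exact coordinate equality, which presumes that shared corners are computed identically by neighboring octants.

```python
def build_unstructured_mesh(leaves):
    """Return (vertices, cells) for a list of (lo, hi) leaf boxes.

    vertices is a list of unique (x, y, z) corner coordinates; each cell is a
    list of eight indices into vertices, ordered so that bit k of the corner
    number selects the upper bound along axis k.
    """
    index_of = {}
    vertices = []
    cells = []
    for lo, hi in leaves:
        cell = []
        for corner in range(8):
            v = tuple(hi[k] if (corner >> k) & 1 else lo[k] for k in range(3))
            if v not in index_of:          # first time this corner is seen
                index_of[v] = len(vertices)
                vertices.append(v)
            cell.append(index_of[v])
        cells.append(cell)
    return vertices, cells
```

Two unit cubes that share a face produce 12 unique vertices rather than 16, and the four shared indices appear in both connectivity lists. The averaged leaf values would then be attached as per-cell data in the same order as `cells`.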

4 System Performance

To test the performance of our system, we selected three data sets of varying size, dimensionality, and mesh type that have arisen in a joint project between Argonne National Laboratory and the University of Chicago. The first two data sets are from Rayleigh-Taylor (R-T) simulations in both two and three dimensions. The two-dimensional R-T data set is from the adaptive Piecewise Parabolic Method (PPM) code developed at the University of Chicago that is a descendant of PARAMESH [17] and PROMETHEUS [18]. The three-dimensional R-T data set is from an adaptive tetrahedral mesh code using a Discontinuous Galerkin Euler formulation developed at Rensselaer Polytechnic Institute [16]. The third data set is from a three-dimensional simulation of an x-ray burst on a neutron star. This simulation also uses the adaptive PPM code and the data set is particularly challenging to visualize due to the extreme scales in length and variable values (for example, density values range over a #�$% spectrum). The problem mesh types, data sizes, and dimensionalities are summarized in Table 1.

For each test problem, the data reduction was performed on $P$ processors of an SGI Onyx2 (250 MHz MIPS R10000 processors), where $P$ = � and ) for the two- and three-dimensional R-T data sets, and $P$ = #�� for the x-ray burst data set. The vtk visualization environment was run on a single Onyx2 Reality Monster graphics engine.

For the experiments presented here, coarse initial octrees are created using the insertion criteria $N_{\max} = N_{\min} = 100$ and 300 for the two- and three-dimensional R-T data sets, respectively, and $N_{\max} = N_{\min} = 3500$ for the x-ray burst data set. The initial data distribution and octree construction are performed as described in Section 3 and required 7.98 seconds for the two-dimensional data set, 376 seconds for the three-dimensional Rayleigh-Taylor data set, and 1735 seconds for the x-ray burst problem. We note that significant performance improvements for this aspect of the problem can be achieved by using collective operations and parallel I/O, which is the target of near-term work. Once the data is distributed, the leaf octant averages and statistics are computed on each of the processors, requiring 0.0335, 2.24, and 0.8273 seconds, respectively, for the three test problems. To test the interactive performance of this system, we change the global insertion criterion and/or specify a region of interest to be displayed at a higher resolution for each data set.

subtree_adjust(oct, N_max, N_min) {
    if (terminal(oct)) {
        if (npoints(oct) > N_max) {
            recursively refine oct until N_max satisfied
            return(NO_COARSEN);
        }
        else if (npoints(oct) < N_min)
            return(npoints(oct));
        else
            return(NO_COARSEN);    /* leaf is within bounds and is not a pruning candidate */
    }

    /* octant is nonterminal */
    total = 0;
    for i = 0..7
        if (child i is on local processor)
            total += subtree_adjust(child i, N_max, N_min);

    if (any child returned NO_COARSEN or is nonlocal)
        return(NO_COARSEN);

    if (total < N_max) {
        delete children of oct, placing their data at oct
    }

    if (total < N_min)
        return(total);
    else
        return(NO_COARSEN);
}

Figure 3: The algorithm for adjusting the subtree rooted at oct to the new insertion criteria, $N_{\min}$ and $N_{\max}$. This operation is performed in parallel for all local roots on all processors.

Table 1: Data set characteristics

Simulation        Dimension   Mesh Type          Data Location   Number of Cells
Rayleigh-Taylor   2-D         Block Structured   Cell-Center     53,480
Rayleigh-Taylor   3-D         Tetrahedral        Cell-Center     373,611
X-ray burst       3-D         Block Structured   Cell-Center     6,718,976

The performance results for these experiments are given in Table 2. For each level of detail, we give the insertion criterion, $N_{\max}$ ($= N_{\min}$), the number of resulting leaf octants, $N$, and the percentage of the full data set to which $N$ corresponds, %. We report insertion criteria specified for regions of interest as $a,b$, where $a$ is the initial global criterion and $b$ is the criterion used for octree insertion in the specified region. We also report the maximum $\hat{\sigma}$ over all octants, the maximum $\hat{\epsilon}$ over all octants, the average $\hat{\sigma}$, and the average $\hat{\epsilon}$ (recall that $\hat{\sigma}$ and $\hat{\epsilon}$ were defined in Section 3 to be the normalized standard deviation and the normalized maximum deviation from the mean, respectively). The first two values give the worst-case scenario for fidelity to the original data set; the latter two values give a measure of the overall fidelity of the reduced data set. In addition, we include the time in seconds, $T_a$, to revise the octree from the original criteria given in the first row to the new criteria, and the frame rate, $F_r$, for vtk visualization (measured in frames/second) when displaying the wireframe octree representation. The reduced data sets corresponding to the R-T experiments are shown in Figures 5 and 6, respectively.
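The two-part criterion notation (for example, 100,2 in Table 2) can be read as a per-octant choice between a global value and a region-of-interest value. The sketch below shows one plausible way to make that choice from bounding boxes; the function names and the use of a strict overlap test are assumptions, not a description of the authors' code.

```python
def boxes_overlap(lo_a, hi_a, lo_b, hi_b):
    """True when two axis-aligned boxes share interior volume (touching faces do not count)."""
    return all(la < hb and lb < ha for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))


def criterion_for(octant_lo, octant_hi, global_n, roi_n, roi_lo, roi_hi):
    """Pick the insertion criterion for one octant.

    Octants that overlap the user's bounding box use the finer
    region-of-interest criterion; all others keep the global one.
    """
    if boxes_overlap(octant_lo, octant_hi, roi_lo, roi_hi):
        return roi_n
    return global_n
```

With a global criterion of 100 and a region criterion of 2, an octant overlapping the bounding box is refined until each leaf holds at most two points, while the rest of the domain stays coarse.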

Table 2: Results for two- and three-dimensional Rayleigh-Taylor simulation data sets

$N_{\max}$   $N$      %      Max. $\hat{\sigma}$   Max. $\hat{\epsilon}$   Avg. $\hat{\sigma}$   Avg. $\hat{\epsilon}$   $T_a$    $F_r$

2-D Rayleigh-Taylor
100          820      1.6    0.397    1.081    0.134    0.294    —        175
10           13120    6.2    0.396    0.932    0.081    0.154    .888     13
100,2        9073     17.2   0.397    1.081    0.010    0.022    .505     20

3-D Rayleigh-Taylor
300          5296     1.4    .114     .259     .0294    .0670    —        55
40           34984    9.3    .088     .206     .0154    .026     1.27     10
300,20       38336    10.2   .079     .132     .0124    .017     1.38     9

3-D X-ray Burst
3500         5614     .08    22.38    50.60    .629     2.56     —        31
2000         13134    .20    16.01    25.64    .550     1.75     5.64     13
1000         16126    .25    16.01    25.64    .542     1.62     13.88    12

The largest errors in both two and three dimensions for the Rayleigh-Taylor instability problems are located along the discontinuity between the two fluids. Because these features are much smaller than the leaf octants, the maximum $\hat{\sigma}$ (about forty percent in two dimensions and nine percent in three dimensions) only slowly decreases as $N$ increases. The average $\hat{\sigma}$ decreases more significantly as $N$ increases, which implies that the reduced Rayleigh-Taylor data sets are fairly accurate representations of the original data sets: errors of eight percent in 2-D and 1.5 percent in 3-D are achieved when the reduced data set is approximately 6 and 9 percent of the full data set, respectively. For the x-ray burst problem, the errors associated with the reduced data sets presented are still quite large; the average $\hat{\sigma}$ is 54 percent for the finest level of refinement, indicating that further octree refinement is necessary. For the Rayleigh-Taylor data sets, a region of interest was specified and refined starting from the global criterion given in the first row. In each case a significant improvement in the average error indicators was obtained, particularly in two dimensions, where the criterion was chosen to eliminate all error in the specified region. The time required to adapt the octree, $T_a$, was less than a second for the small two-dimensional test problem and approximately 1.3 seconds for the three-dimensional R-T problem. The adaptation of the neutron star data set required more time because each leaf octant contains more data points, increasing the sorting and reassignment time of the field data values during refinement.


To further illustrate the relationship between the error measures and the number of octants associated with global refinements, we plot average $\hat{\sigma}$ and $\hat{\epsilon}$ values for both the two- and three-dimensional Rayleigh-Taylor data sets for criteria of $N_{\max}$ = ��, …, #�$��Z in two dimensions and $N_{\max}$ = Z�, …, #�$��Z in three dimensions in Figure 4. As $N$ increases, both $\hat{\sigma}$ and $\hat{\epsilon}$ decrease toward zero and toward each other.

[Plots: "Errors vs. N, Two-Dimensional R-T" and "Errors vs. N, Three-Dimensional R-T"; horizontal axis N, vertical axis Normalized Error; curves labeled sigma and max error.]

Figure 4: The relationship between $N$ and the error measures average $\hat{\sigma}$ (sigma) and average $\hat{\epsilon}$ (max error) for the Rayleigh-Taylor data sets in both two and three dimensions

In Figures 5 and 6, we show the reduced R-T data sets corresponding to the cases discussed in Table 2. In each case we show the reduced data set image, the error plots associated with that level of detail, and the associated leaf octants. The light-colored octant leaves in the middle row of figures indicate regions in which the error associated with the data reduction is large. These regions correspond to the sharp density interface between the two incompressible fluids. As the user requests finer detail, more leaf octants are created, the regions of large error reduce in size, and more fine-scale features are visible. In the final column, we show the cases corresponding to refinement of only a region of interest, as indicated by the bounding box outline in each figure.

5 Directions of Future Work

Ongoing work for this toolkit includes adding new functionalities to improve the flexibility of the octree code, performance tuning and testing, and eventually adding computational steering capabilities to monitor and change ongoing large-scale simulations. To improve the flexibility, we will provide more options for averaging or interpolating the data as it is inserted into the octree and for determining when to subdivide leaf octants including, for example, gradient tests on scalar fields. We will also provide subroutine stubs that allow users to custom design the data reduction method most appropriate for their simulation data.


Figure 5: From left to right, two-dimensional Rayleigh-Taylor images at three different levels of detail. For each level of detail we show density (top), the standard deviation resulting from the data reduction (center), and the corresponding octant leaves (bottom).


Figure 6: From left to right, three-dimensional Rayleigh-Taylor images at three different levels of detail. For each level of detail we show density (top), the standard deviation resulting from the data reduction (center), and the corresponding octant leaves (bottom).


In addition, several performance improvements can be obtained in the parallel octree code. In particular, processing the initial data set from a file can be improved by leveraging ongoing work in collective communication within the ROMIO implementation of MPI-IO, and octant migration between processors can be improved by agglomerating messages to destination processors. We also plan to test the performance on wide area networks to tune the infrastructure for remote access to the reduced data set. Initial efforts will examine performance of a local ethernet connection, a 10 Mb/s ethernet WAN between ANL and Chicago, and an ATM WAN between ANL and Chicago.

Acknowledgments

We acknowledge the FLASH project for providing the motivating application described in this paper. In particular, we thank Mike Singer of the University of Chicago for his assistance in preparing the Rayleigh-Taylor and x-ray burst data sets for testing with our system. We would also like to thank Matt Ahrens and Ibrahima Ba for their help in incorporating the ALICE Memory Snooper into the data reduction software system. Finally, we thank the reviewers for their insightful comments which helped improve the quality of this paper.

References

[1] Paul Heckbert and Michael Garland. Multiresolution modeling for fast rendering. In Proceedings of Graphics Interface 94, pages 43–50, 1994.

[2] Peter Lindstrom, David Koller, William Ribarsky, Larry Hodges, Nick Faust, and Gregory Turner. Real-time, continuous level of detail rendering of height fields. In Computer Graphics Proceedings SIGGRAPH 96, Annual Conference Series, pages 109–118. ACM, 1996.

[3] Brian Von Herzen and Alan Barr. Accurate triangulations of deformed, intersecting surfaces. In Computer Graphics Proceedings, SIGGRAPH 87, volume 21, pages 103–110. ACM, 1987.

[4] Hugues Hoppe. Efficient implementation of progressive meshes. Technical Report MSR-TR-98-02, Microsoft Research, Microsoft Corporation, 1998.

[5] Hugues Hoppe. Progressive meshes. In Computer Graphics SIGGRAPH 96 Proceedings, pages 99–108, 1996.

[6] Jos Roerdink and Michel Westenberg. Wavelet-based volume visualization. Technical Report IWI 98-9-06, Institute for Mathematics and Computing Science, University of Groningen, 1998.

[7] T. W. Crockett and T. Orloff. A MIMD rendering algorithm for distributed memory architectures. In Proceedings of the Parallel Rendering Symposium, pages 35–42, 1993.

[8] Thomas Crockett. Beyond the renderer: Software architecture for parallel graphics and visualization. Technical Report ICASE Report No. 96-75, Institute for Computer Applications in Science and Engineering, 1996.

[9] Kwan-Liu Ma. Parallel rendering of 3D AMR data on SGI/Cray T3E. In Proceedings of the Frontiers 99 Conference, pages 138–145, 1999.


[10] Robert Haimes. pV3: A distributed system for large-scale unsteady CFD visualization. AIAA paper 94-0321, January 1994.

[11] Ibrahima Ba, Christopher Malon, and Barry Smith. Design of the ALICE Memory Snooper, http://www.mcs.anl.gov/ams, 1999.

[12] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit, An Object-Oriented Approach to 3D Graphics. Prentice Hall PTR, Upper Saddle River, New Jersey, 1998.

[13] Frank Ortega. General mesh viewer, user’s manual, 1999.

[14] IRIS explorer, release 3.5, user’s guide, Unix version, 1993.

[15] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti. Surround-screen projection-based virtual reality: The design and implementation of the CAVE. In ACM SIGGRAPH 93 Proceedings, pages 135–142. ACM, 1993.

[16] Joseph E. Flaherty, Raymond M. Loy, Mark S. Shephard, Boleslaw K. Szymanski, James D. Teresco, and Louis H. Ziantz. Adaptive local refinement with octree load-balancing for the parallel solution of three-dimensional conservation laws. IMA Preprint Series 1483, Institute for Mathematics and Its Applications, University of Minnesota, 1997. To appear, J. Parallel and Dist. Comput.

[17] P. MacNeice, K. Olson, M. Mobarry, R. deFainchtein, and C. Packer. A parallel adaptive mesh refinement community toolkit. Submitted to Computer Physics Communications, 1999.

[18] B. Fryxell, E. Muller, and D. Arnett. Hydrodynamics and nuclear burning. Max-Planck-Institut fur Astrophysik Report 449, 1989.

