parallel visualization, data formatting, software overvie · • largest datasets require use of...

51
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation Parallel Visualization, Data Formatting, Software Overview Sean Ahern – Remote Data Analysis and Visualization Center Friday, July 9, 2010

Upload: others

Post on 08-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of

the National Science Foundation

Parallel Visualization, Data Formatting,Software Overview

Sean Ahern – Remote Data Analysis and Visualization Center

Friday, July 9, 2010

Page 2: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

“Who are you?”

ORNL co-I for the SciDAC Visualization and Analysis Center

for Enabling Technologies (VACET)

Director of the University of Tennessee Center for Remote Data

Analysis and Visualization

Visualization lead at ORNL’s Leadership Computing Facility

Friday, July 9, 2010

Page 3: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

How do we handle the hard cases?

But what about large, distributed data?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or distributed rendering?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or distributed displays?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or all three?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Friday, July 9, 2010

Page 4: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Applications are pushing data set sizes toward Exascale

Spatial resolution

Astrophysics, Radiation transport, Combustion,

Chemistry, Molecular dynamics, Fusion, …

Temporal resolution

Molecular dynamics, Climate, Fusion,

Multivariate

Climate,Radiation transport, V&V,

Ensembles

Climate, V&V,

Friday, July 9, 2010

Page 5: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Data scale limits scientific understanding

• Spatial resolution increasing in many domains

• Too many zones to see on screen (with or without high-res Powerwall)

• Temporal complexity increasing

• Climate simulations will have 100,000s of time steps

• Manually finding temporal patterns is tedious and error-prone

• Multivariate overload

• Climate simulations have 100-200 variables

• Issues of data models and domain-specific data

• E.g. Multigroup radiation fields

Friday, July 9, 2010

Page 6: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

HP

C S

ystem

Input

Query

Result

!

Sim

ulation

Datageneration

End-to-end schematic

Friday, July 9, 2010

Page 7: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Dataset size growth

Hero

Large

Medium

Small

IncreasingDataset Size

Friday, July 9, 2010

Page 8: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Data size categories

• Small:

• Data is small enough to easily move anywhere

• Analysis/vis is generally done on local workstations

• Medium:

• Data won’t fit on local workstations

• Generally have to process on “fat” SMP systems

• Large:

• Rather painful to move

• Requires distributed parallelism

• Hero:

• Functionally impossible to move

• Only approachable on largest computational platforms

Hero

Large

Medium

Small

IncreasingDataset Size

Friday, July 9, 2010

Page 9: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Hero

Large

Medium

Small

“Large” data – remote visualization

• Largest datasets require use of institutional resources

• Reduces data movement issues

• Allows exploitation of multiple GPUs

• Provides visualization to remote users

• Exploited by VisIt, ParaView, EnSight

Friday, July 9, 2010

Page 10: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Hero

Large

Medium

Small

“Hero” data

• Purchasing separate analysis systems at the petascale can be prohibitively expensive: $5-20 million

• Working to move largest vis/analysistools to HPC architecture

• VisIt on Cray XT4/5

• See Cray User’s Group ‘08

• ParaView on Cray XT4/5

• See Cray User’s Group ’09

• Parallel R on Cray XT5

• This year (hopefully)

Friday, July 9, 2010

Page 11: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

A lot of success has been through data flow networks (pipelines)

Friday, July 9, 2010

Page 12: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

A lot of success has been through data flow networks (pipelines)

• Different data formats:

• NetCDF, HDF, text, CSV, PDB, …

• Different types of data operations:

• Slicing, resampling, mesh transforms, filtering, …

• Different ways of plotting:

• Pseudocolor, isosurfaces, volume rendering

These are independent of each other!

Friday, July 9, 2010

Page 13: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Make these independent modules

•Data reading

•Data operations

•Data plotting

Read data

Filter

Filter

Plot

Friday, July 9, 2010

Page 14: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Make these independent modules

•Data reading

•Data operations

•Data plotting

Read dataRead data

Filter

FilterFilter

Plot PlotThis is adata flow network

Friday, July 9, 2010

Page 15: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Many software systems use this method

•VTK

•AVS/Express

•SCIRun

•…

Friday, July 9, 2010

Page 16: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Fine for small to medium sized data

Hero

Large

Medium

Small

IncreasingDataset Size

Friday, July 9, 2010

Page 17: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

How do we handle the hard cases?

But what about large, distributed data?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or distributed rendering?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or distributed displays?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Or all three?

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

01001101011001 11001010010101 00101010100110 11101101011011 00110010111010

Friday, July 9, 2010

Page 18: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

A good solution: data parallelization for all components

• Identical data flow networks on each processor.

• Networks differentiated by portion of data they operate on.

• Minimal synchronization

• “Scattered/gather”

• No distribution (i.e. scatter), because scatter is done through choice of what data to read.

• Gather: done when rendering

• Distribution of data is key

P1 P2 P3P0I/O

ParallelSimulationCode P0

P1P3

P2

ParallelAnalysisCode

Proc 0 Proc 1 Proc 2

Rendering/compositingFriday, July 9, 2010

Page 19: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Decomposition of data is key issue

• Parallelization strategy: each processor works on piece of data

• Key: Must have good decomposition of data

• Two I/O situations:

• Reading from file, file format enforces decomposition

• Reading from file, file format allows for arbitrary decomposition

• Three processing situations:

• Pre-calculated decomposition

• Must do “overloading” of chunks when more chunks than processors.

• Dynamic decomposition at I/O

• Must understand how to read out arbitrary chunks of data

• On-the-fly redistribution

• Processing-dependent

P0P1

P3

P2

Friday, July 9, 2010

Page 20: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

This can take us a long way

2T cells, 32K procs

Machine Type Problem Size # coresJaguar

Franklin

Dawn

Juno

Ranger

Purple

Cray XT5 2T 32k

Cray XT4 1T, 2T 16k, 32k

BG/P 4T 64k

Linux 1T 16k, 32k

Sun Linux 1T 16k

AIX 0.5T 8k

• Weak scaling study (isocontouring, volume rendering): ~63M cells/core

Friday, July 9, 2010

Page 21: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

This can take us a long way

2T cells, 32K procs

Machine Type Problem Size # coresJaguar

Franklin

Dawn

Juno

Ranger

Purple

Cray XT5 2T 32k

Cray XT4 1T, 2T 16k, 32k

BG/P 4T 64k

Linux 1T 16k, 32k

Sun Linux 1T 16k

AIX 0.5T 8k

Since this work, people havereached 4 trillion and 8 trillion cells.

• Weak scaling study (isocontouring, volume rendering): ~63M cells/core

Friday, July 9, 2010

Page 22: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

This is only part of the puzzle

P1 P2 P3P0I/O

ParallelSimulationCode P0

P1P3

P2

ParallelAnalysisCode

Proc 0 Proc 1 Proc 2

Rendering/compositingFriday, July 9, 2010

Page 23: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

The job of turning geometry into imagery is complex

• One or more geometry generators

• Fed to one or more displays

geometry

geometry

geometry

geometry

geometry

display

display

display

display

geometry

geometry

geometry

geometry

geometry

display

geometry

display

display

display

display

Friday, July 9, 2010

Page 24: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

geometry

geometry

geometry

geometry

geometry

display

Rendering to a single tile(sort last)

• Depth “sort” – Only good for opaque geometry

• For each set of geometry, generate image and depth

• When merging, compare depth and choose pixel that’s closest to the viewer

• Alpha blending – Parallel volume rendering / transparent geometry

• Process all geometry in back-to-front order (not always easy to do in parallel)

• Blend geometry with transparency, creating a final image

• Compositing algorithms:

• Binary tree

• Direct send

• Binary swap / n-way swap

• 2-3 swap

• Radix-swap

• …

Friday, July 9, 2010

Page 25: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

geometry

geometry

geometry

geometry

geometry

display

Rendering to a single tile(sort last)

• Depth “sort” – Only good for opaque geometry

• For each set of geometry, generate image and depth

• When merging, compare depth and choose pixel that’s closest to the viewer

• Alpha blending – Parallel volume rendering / transparent geometry

• Process all geometry in back-to-front order (not always easy to do in parallel)

• Blend geometry with transparency, creating a final image

• Compositing algorithms:

• Binary tree

• Direct send

• Binary swap / n-way swap

• 2-3 swap

• Radix-swap

• …

Friday, July 9, 2010

Page 26: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

geometry

display

display

display

display

Rendering to multiple tiles(sort first)

• Break up geometry based upon camera frustum

• Select only geometry that corresponds to each tile

• Send subset of geometry over network to be rendered

Friday, July 9, 2010

Page 27: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

geometry

display

display

display

display

Rendering to multiple tiles(sort first)

• Break up geometry based upon camera frustum

• Select only geometry that corresponds to each tile

• Send subset of geometry over network to be rendered

Used for many PowerwallsFriday, July 9, 2010

Page 28: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Both at the same time

• Complicated combinations of sort first and sort last

• Some libraries to make this easier:

• Chromium

• IceT

• Equalizer

• Paracomp

• SAGE

geometry

geometry

geometry

geometry

geometry

display

display

display

display

Friday, July 9, 2010

Page 29: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Performance is highly dependent upon I/O rates

• Reading data into memory takes the most time

• Much more than data processing

• Much more than rendering and compositing

• Parallel filesystems are essential for large data

• Data formats designed for parallel filesystems are essential for large data

Friday, July 9, 2010

Page 30: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

“Joule metric” run

Isocontour% time

spent in I/O

103 M cells

321 M cells

2.48 s 87.9%

11.62 s 94.9%

• Weak scaling study (isocontouring,volume rendering): 103M-321M cells on 4k-12k cores

• Superlinear scaling in processing

• But wallclock time is dominated by I/O

Friday, July 9, 2010

Page 31: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Trillion cell study confirms that I/O dominates

2T cells, 32K procs

• Weak scaling study (isocontouring, volume rendering): ~63M cells/core

Machine Type Problem Size # coresJaguar

Franklin

Dawn

Juno

Ranger

Purple

Cray XT5 2T 32k

Cray XT4 1T, 2T 16k, 32k

BG/P 4T 64k

Linux 1T 16k, 32k

Sun Linux 1T 16k

AIX 0.5T 8k

-Approx I/O time: 2-5 minutes-Approx processing time: 10 seconds

Friday, July 9, 2010

Page 32: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

VisIt: an end user visualization and analysis tool for extremely large data

• 5 major use cases: data exploration, data analysis, visual debugging, comparison, and presentation

• Demonstrated scaling to >100k cores

• Strong emphasis on building a usable, robust tool for scientists.

• Plugin model allows for wide range of capabilities

• Open source project, with direct support from DOE (SciDAC, NEAMS, ASC, BER) and NSF (TeraGrid XD)

• visitusers.org wiki with 400+ pages of documentation and over 80,000 page views

• Over 100,000 downloads

• R&D100 Award in 2005

Friday, July 9, 2010

Page 33: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

ParaView: open-source, multi-platform data analysis and visualization application

• Based on VTK: Deep pipeline modification

• Quantitative and qualitative use cases

• Developed to process large datasets using distributed memory computing resources: laptops to supercomputers

• Superb, best-in-class rendering infrastructure

• Customizable GUI

• Access grid integration for collaboration

• Commercial support from Kitware

Friday, July 9, 2010

Page 34: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

EnSight: leading commercial parallel visualization tool

• Specialized for engineering analysis

• CFD, FEA, etc.

• Commercial support from CEI

• Distributed rendering infrastruture

• Direct Powerwall support

• Virtual reality

• Distributed parallel data processing and rendering

Friday, July 9, 2010

Page 35: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Rmpi: Parallel statistical analysis

MPI (via modified Rmpi)

Lustre file system

Node 0 Node 1 Node 2

• Write data reader to bring data from parallel file system

• Develop data-parallel analysis with full capability of R on every node

Node 0 produces graphics files

Friday, July 9, 2010

Page 36: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Central file store

• Scalable parallel throughput

• Lustre, GPFS, Panassas, ...

• Shared between source and destination

• Provides “zero copy” access to simulation results

• Allows smartdeployment ofremotevisualizationcapabilities

Friday, July 9, 2010

Page 37: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Data formatting is important to data understanding

!e purpose of computing is insight, not numbers.

– Richard Hamming

!e most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but '!at's funny ...'

– Isaac Asimov

Friday, July 9, 2010

Page 38: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

We want our data model to reflect how the scientist “thinks” of the data

Simple Complex

bits

bytes

words

arraysnamedarrays

multidimensionalarrays

coordinatefields

scalars

meshes

vectors

atoms

materials

bonds

meshrefinement

collections

timeinterpolation

Each step gets us closer to the ideal.But each step is more domain-specific.

Friday, July 9, 2010

Page 39: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

• All data lives on a mesh

• Discretizes space into points and cells

• 1D, 2D, 3D

• All of these over time (up to 4D)

• Can have lower-dimensional meshes in ahigher-dimensional space (e.g. 2D surface in 3D space)

• Provides a place for data to be located

• Defines how data is interpolated

Meshes

Friday, July 9, 2010

Page 40: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Mesh structure

• 1D Curves

• 2D/3D meshes

• Rectilinear

• Curvilinear

• Unstructured

• Points

• AMR

• Molecular

• Domain decomposed

• Ghost zones/nodes

Unstructured

RectilinearCurve

Curvilinear

Points

Molecular

AMR

Friday, July 9, 2010

Page 41: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Variables

• Scalars, Vectors, Tensors

• Sits on points or cells of a mesh

• Points: linear interpolation

• Cells: piecewise constant

• Could have different dimensionality than the mesh (e.g. 3D vector data on a 2D mesh)

Friday, July 9, 2010

Page 42: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Materials

• Sometimes used in engineering codes

• Describes disjoint spatial regions at a sub-grid level

• Volume/area fractions

Friday, July 9, 2010

Page 43: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Parallel meshes

• Provides aggregation for meshes

• A mesh may be composed of hundreds of thousands of mesh “blocks”.

• Allows data parallelism

Friday, July 9, 2010

Page 44: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

AMR meshes

• Mesh blocks can be associated with patches and levels.

• Allows for aggregation of meshes into AMR hierarchy levels.

Friday, July 9, 2010

Page 45: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Data model goals

• Rich

• Encompasses computational domain as much as possible

• Sharability

• Being able to easily move data between tools

• Being able to give data to other people

• Self description

• Changes to data shouldn’t require a change in the format

• New data should be discoverable without human intervention

• Rapid I/O

• Minimize time dumping out data

• Minimize time reading data

• Parallel I/O

• Support access by multiple threads at once on parallel platforms.

• More…

Friday, July 9, 2010

Page 46: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Data Storage

NetCDF

HDF5Silo

EnSight Gold

FluentFlash

ADIOS

XDMF

PDBS3D

VTK

OpenFOAMCGNS

ChomboTecplot

ViSUS

STL

Friday, July 9, 2010

Page 47: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Underlying libraries

• NetCDF:

• Named array/scalar storage

• Generalized attributes

• Parallel access

• HDF5:

• Named array/scalar storage

• Hierarchical storage

• Parallel read and write

• ASCII

• Simple for humans to read and write

• Horrible for computers to read, especially in parallel

Friday, July 9, 2010

Page 48: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Higher-level libraries and formats

• ADIOS:

• Used for very fast I/O off HPC systems

• HDF5 and native storage

• Silo:

• Very rich (parallel) data model on top of HDF5

• PDB (Protein Data Bank)

• Used for storage of molecular data

• Based on ASCII

• VTK:

• Native storage for the VTK pipeline visualization system

• Both ASCII and binary formats

Friday, July 9, 2010

Page 49: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Higher-level libraries and formats (cont)

• XDMF:

• Neat format that splits “data” and “metadata”

• ASCII XML file for metadata

• HDF5 file for heavyweight data.

Friday, July 9, 2010

Page 50: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Community standards

• NetCDF Climate and Forecast (CF) convention:

• Promotes sharing of data among climate researchers

• Standard naming

• Coordinate systems

• Dimensions, missing data, units, etc.

• Many communities build formats on top of others:

• S3D is over NetCDF

• GTC, Chombo, Enzo, PFLOTRAN, Flash, etc. are over HDF5

• …

Friday, July 9, 2010

Page 51: Parallel Visualization, Data Formatting, Software Overvie · • Largest datasets require use of institutional resources • Reduces data movement issues • Allows exploitation of

Summary

• Parallel visualization is an important capability for gaining insight from petascale (and soon to be exascale) data.

• It’s as difficult of a job as doing the original HPC simulation itself.

• Rendering is not yet a solved problem.

• Data storage and data models are critical to high performance and scientific understanding.

Friday, July 9, 2010