TRANSCRIPT
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Parallel Visualization, Data Formatting, Software Overview
Sean Ahern – Remote Data Analysis and Visualization Center
Friday, July 9, 2010
“Who are you?”
ORNL co-I for the SciDAC Visualization and Analysis Center for Enabling Technologies (VACET)
Director of the University of Tennessee Center for Remote Data Analysis and Visualization
Visualization lead at ORNL’s Leadership Computing Facility
How do we handle the hard cases?
But what about large, distributed data?
Or distributed rendering?
Or distributed displays?
Or all three?
Applications are pushing data set sizes toward Exascale
• Spatial resolution: Astrophysics, Radiation transport, Combustion, Chemistry, Molecular dynamics, Fusion, …
• Temporal resolution: Molecular dynamics, Climate, Fusion, …
• Multivariate: Climate, Radiation transport, V&V, …
• Ensembles: Climate, V&V, …
Data scale limits scientific understanding
• Spatial resolution increasing in many domains
• Too many zones to see on screen (with or without high-res Powerwall)
• Temporal complexity increasing
• Climate simulations will have 100,000s of time steps
• Manually finding temporal patterns is tedious and error-prone
• Multivariate overload
• Climate simulations have 100-200 variables
• Issues of data models and domain-specific data
• E.g. Multigroup radiation fields
End-to-end schematic
[Diagram: a simulation generates data on an HPC system; the scientist submits input and queries against it and receives results.]
Dataset size growth
[Diagram: pyramid of dataset size categories (Small, Medium, Large, Hero) in order of increasing dataset size.]
Data size categories
• Small:
• Data is small enough to easily move anywhere
• Analysis/vis is generally done on local workstations
• Medium:
• Data won’t fit on local workstations
• Generally have to process on “fat” SMP systems
• Large:
• Rather painful to move
• Requires distributed parallelism
• Hero:
• Functionally impossible to move
• Only approachable on largest computational platforms
“Large” data – remote visualization
• Largest datasets require use of institutional resources
• Reduces data movement issues
• Allows exploitation of multiple GPUs
• Provides visualization to remote users
• Exploited by VisIt, ParaView, EnSight
“Hero” data
• Purchasing separate analysis systems at the petascale can be prohibitively expensive: $5-20 million
• Working to move largest vis/analysis tools to HPC architecture
• VisIt on Cray XT4/5
• See Cray User’s Group ‘08
• ParaView on Cray XT4/5
• See Cray User’s Group ’09
• Parallel R on Cray XT5
• This year (hopefully)
A lot of success has been through data flow networks (pipelines)
• Different data formats:
• NetCDF, HDF, text, CSV, PDB, …
• Different types of data operations:
• Slicing, resampling, mesh transforms, filtering, …
• Different ways of plotting:
• Pseudocolor, isosurfaces, volume rendering
These are independent of each other!
Make these independent modules
• Data reading
• Data operations
• Data plotting
Read data → Filter → Filter → Plot
This is a data flow network
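As a concrete sketch, here is the same three-module network in VTK's Python bindings. The class names are real VTK modules; the input file name is hypothetical.

```python
# A data flow network in VTK: reader -> filter -> mapper/actor (plot).
import vtk

reader = vtk.vtkXMLImageDataReader()          # data reading
reader.SetFileName("simulation_output.vti")   # hypothetical input file

contour = vtk.vtkContourFilter()              # data operation
contour.SetInputConnection(reader.GetOutputPort())
contour.SetValue(0, 0.5)                      # extract the 0.5 isosurface

mapper = vtk.vtkPolyDataMapper()              # data plotting
mapper.SetInputConnection(contour.GetOutputPort())
actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
window.Render()   # demand-driven: pulls data through the whole network
```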
Many software systems use this method
• VTK
• AVS/Express
• SCIRun
• …
Fine for small to medium sized data
How do we handle the hard cases?
But what about large, distributed data? Or distributed rendering? Or distributed displays? Or all three?
A good solution: data parallelization for all components
• Identical data flow networks on each processor.
• Networks differentiated by portion of data they operate on.
• Minimal synchronization
• “Scatter/gather”
• No distribution (i.e. scatter), because scatter is done through choice of what data to read.
• Gather: done when rendering
• Distribution of data is key
[Diagram: a parallel simulation code (P0-P3) writes data through I/O to a parallel analysis code (P0-P3); results from Proc 0-Proc 2 are merged by rendering/compositing.]
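A minimal sketch of this pattern with mpi4py, under an assumed one-file-per-rank chunk layout: every rank runs the identical pipeline, "scatter" happens purely through what each rank chooses to read, and the only gather is a final reduction (standing in here for image compositing).

```python
# Identical pipeline on every rank; only the data chunk differs.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# "Scatter" happens at read time: each rank chooses its own chunk.
chunk = np.load(f"chunk_{rank:04d}.npy")      # hypothetical chunk file

# Identical network on each processor (a threshold filter stands in
# for the real slice/contour/render stages).
local_count = int((chunk > 0.5).sum())

# The only gather is at the end (in practice: image compositing).
total = comm.reduce(local_count, op=MPI.SUM, root=0)
if rank == 0:
    print("cells above threshold:", total)
```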
Decomposition of data is key issue
• Parallelization strategy: each processor works on piece of data
• Key: Must have good decomposition of data
• Two I/O situations:
• Reading from file, file format enforces decomposition
• Reading from file, file format allows for arbitrary decomposition
• Three processing situations:
• Pre-calculated decomposition
• Must do “overloading” of chunks when there are more chunks than processors (see the sketch after this list)
• Dynamic decomposition at I/O
• Must understand how to read out arbitrary chunks of data
• On-the-fly redistribution
• Processing-dependent
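A sketch of that "overloading" step under a round-robin assignment; the assignment policy is one common choice, not something the talk prescribes.

```python
# Round-robin "overloading": each rank takes every nprocs-th chunk.
def chunks_for_rank(rank, nprocs, nchunks):
    return [c for c in range(nchunks) if c % nprocs == rank]

for r in range(4):                     # e.g. 10 chunks on 4 ranks
    print(r, chunks_for_rank(r, 4, 10))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6]
# 3 [3, 7]
```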
This can take us a long way
• Weak scaling study (isocontouring, volume rendering): ~63M cells/core (e.g. 2T cells on 32K procs)

Machine    Type        Problem size   # cores
Jaguar     Cray XT5    2T             32k
Franklin   Cray XT4    1T, 2T         16k, 32k
Dawn       BG/P        4T             64k
Juno       Linux       1T             16k, 32k
Ranger     Sun Linux   1T             16k
Purple     AIX         0.5T           8k

Since this work, people have reached 4 trillion and 8 trillion cells.
This is only part of the puzzle
[Diagram: the same pipeline as before: parallel simulation code, I/O, parallel analysis code, rendering/compositing.]
The job of turning geometry into imagery is complex
• One or more geometry generators
• Fed to one or more displays
[Diagram: three configurations: many geometry generators feeding many displays; many geometry generators feeding one display; one geometry generator feeding many displays.]
Rendering to a single tile (sort last)
• Depth “sort” – Only good for opaque geometry (per-pixel merge sketched after this list)
• For each set of geometry, generate image and depth
• When merging, compare depth and choose pixel that’s closest to the viewer
• Alpha blending – Parallel volume rendering / transparent geometry
• Process all geometry in back-to-front order (not always easy to do in parallel)
• Blend geometry with transparency, creating a final image
• Compositing algorithms:
• Binary tree
• Direct send
• Binary swap / n-way swap
• 2-3 swap
• Radix-swap
• …
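A minimal numpy sketch of the per-pixel depth-compare step that these compositing algorithms build on; the real algorithms (binary swap, direct send, and so on) differ in how partial images are communicated between processors, not in this merge rule.

```python
# Per-pixel depth compare for opaque sort-last compositing.
import numpy as np

def depth_composite(rgb_a, z_a, rgb_b, z_b):
    """Keep, per pixel, the fragment closest to the viewer (smaller z)."""
    a_wins = z_a < z_b
    rgb = np.where(a_wins[..., None], rgb_a, rgb_b)
    return rgb, np.minimum(z_a, z_b)

# Two ranks' partial renderings of a 2x2 image:
rgb0, z0 = np.zeros((2, 2, 3)), np.array([[0.2, 0.9], [0.9, 0.2]])
rgb1, z1 = np.ones((2, 2, 3)),  np.array([[0.5, 0.5], [0.5, 0.5]])
rgb, z = depth_composite(rgb0, z0, rgb1, z1)  # per-pixel winner
```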
Rendering to multiple tiles (sort first)
• Break up geometry based upon camera frustum
• Select only geometry that corresponds to each tile
• Send subset of geometry over network to be rendered
Used for many Powerwalls
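A sketch of the tile-selection test at the heart of sort-first rendering, assuming a hypothetical 2x2 tile layout in normalized screen coordinates.

```python
# Bin geometry to tiles by overlapping screen-space bounding boxes.
def tiles_for_bbox(bbox, tiles):
    xmin, ymin, xmax, ymax = bbox
    return [tid for tid, (tx0, ty0, tx1, ty1) in tiles.items()
            if xmin < tx1 and xmax > tx0 and ymin < ty1 and ymax > ty0]

# Hypothetical 2x2 wall in normalized screen coordinates:
tiles = {0: (0.0, 0.0, 0.5, 0.5), 1: (0.5, 0.0, 1.0, 0.5),
         2: (0.0, 0.5, 0.5, 1.0), 3: (0.5, 0.5, 1.0, 1.0)}
print(tiles_for_bbox((0.4, 0.4, 0.6, 0.6), tiles))  # spans all four: [0, 1, 2, 3]
```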
Both at the same time
• Complicated combinations of sort first and sort last
• Some libraries to make this easier:
• Chromium
• IceT
• Equalizer
• Paracomp
• SAGE
Performance is highly dependent upon I/O rates
• Reading data into memory takes the most time
• Much more than data processing
• Much more than rendering and compositing
• Parallel filesystems are essential for large data
• Data formats designed for parallel filesystems are essential for large data
“Joule metric” run
Problem size   Isocontour time   % time spent in I/O
103 M cells    2.48 s            87.9%
321 M cells    11.62 s           94.9%

• Weak scaling study (isocontouring, volume rendering): 103M-321M cells on 4k-12k cores
• Superlinear scaling in processing
• But wallclock time is dominated by I/O
Trillion cell study confirms that I/O dominates
2T cells, 32K procs
• Weak scaling study (isocontouring, volume rendering): ~63M cells/core
• Approx I/O time: 2-5 minutes
• Approx processing time: 10 seconds
VisIt: an end user visualization and analysis tool for extremely large data
• 5 major use cases: data exploration, data analysis, visual debugging, comparison, and presentation
• Demonstrated scaling to >100k cores
• Strong emphasis on building a usable, robust tool for scientists.
• Plugin model allows for wide range of capabilities
• Open source project, with direct support from DOE (SciDAC, NEAMS, ASC, BER) and NSF (TeraGrid XD)
• visitusers.org wiki with 400+ pages of documentation and over 80,000 page views
• Over 100,000 downloads
• R&D100 Award in 2005
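For flavor, a minimal session in VisIt's Python CLI (run with "visit -cli"); the database path and variable name here are hypothetical.

```python
# Minimal VisIt CLI session; the compute engine can launch in parallel.
OpenDatabase("localhost:/path/to/run.silo")   # hypothetical database
AddPlot("Pseudocolor", "density")             # hypothetical variable
AddOperator("Slice")
DrawPlots()
SaveWindow()
```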
ParaView: open-source, multi-platform data analysis and visualization application
• Based on VTK: Deep pipeline modification
• Quantitative and qualitative use cases
• Developed to process large datasets using distributed memory computing resources: laptops to supercomputers
• Superb, best-in-class rendering infrastructure
• Customizable GUI
• Access Grid integration for collaboration
• Commercial support from Kitware
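A comparable minimal sketch of ParaView's scripted pipeline, run with pvpython; the file and field names are hypothetical.

```python
# Minimal pvpython pipeline: read, contour, render, save.
from paraview.simple import *

reader = OpenDataFile("/path/to/run.vtu")     # hypothetical dataset
contour = Contour(Input=reader)
contour.ContourBy = ["POINTS", "pressure"]    # hypothetical field
contour.Isosurfaces = [0.5]
Show(contour)
Render()
SaveScreenshot("contour.png")
```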
EnSight: leading commercial parallel visualization tool
• Specialized for engineering analysis
• CFD, FEA, etc.
• Commercial support from CEI
• Distributed rendering infrastructure
• Direct Powerwall support
• Virtual reality
• Distributed parallel data processing and rendering
Rmpi: Parallel statistical analysis
[Diagram: analysis nodes (Node 0, Node 1, Node 2) communicate over MPI (via a modified Rmpi) and read from a Lustre file system; Node 0 produces the graphics files.]
• Write a data reader to bring data in from the parallel file system
• Develop data-parallel analysis with the full capability of R on every node
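Purely as an illustration of the same pattern (the talk's actual tool is R via a modified Rmpi), here is a hypothetical mpi4py analog; the paths are made up.

```python
# Each node reads its slice, analyzes locally; node 0 makes graphics.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

slab = np.load(f"/lustre/run/slab_{rank:04d}.npy")  # hypothetical reader
local_mean = float(slab.mean())                     # data-parallel analysis

means = comm.gather(local_mean, root=0)
if rank == 0:                                       # node 0 produces graphics
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    plt.plot(means)
    plt.savefig("means.png")
```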
Central file store
• Scalable parallel throughput
• Lustre, GPFS, Panasas, ...
• Shared between source and destination
• Provides “zero copy” access to simulation results
• Allows smart deployment of remote visualization capabilities
Data formatting is important to data understanding
The purpose of computing is insight, not numbers.
– Richard Hamming
The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny ...'
– Isaac Asimov
We want our data model to reflect how the scientist “thinks” of the data
[Diagram: a spectrum from simple to complex: bits, bytes, words, arrays, named arrays, multidimensional arrays, coordinate fields, scalars, meshes, vectors, atoms, materials, bonds, mesh refinement, collections, time interpolation.]
Each step gets us closer to the ideal. But each step is more domain-specific.
Meshes
• All data lives on a mesh
• Discretizes space into points and cells
• 1D, 2D, 3D
• All of these over time (up to 4D)
• Can have lower-dimensional meshes in a higher-dimensional space (e.g. 2D surface in 3D space)
• Provides a place for data to be located
• Defines how data is interpolated
Mesh structure
• 1D Curves
• 2D/3D meshes
• Rectilinear
• Curvilinear
• Unstructured
• Points
• AMR
• Molecular
• Domain decomposed
• Ghost zones/nodes
Variables
• Scalars, Vectors, Tensors
• Sit on points or cells of a mesh
• Points: linear interpolation
• Cells: piecewise constant
• Could have different dimensionality than the mesh (e.g. 3D vector data on a 2D mesh)
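A one-dimensional numpy sketch of the two interpolation rules; the mesh and values are made up.

```python
# Point-centered data: linear interpolation between nodes.
# Cell-centered data: constant within each cell.
import numpy as np

nodes = np.array([0.0, 1.0, 2.0])           # 1D mesh: cells [0,1] and [1,2]
point_data = np.array([10.0, 20.0, 40.0])   # one value per node
cell_data = np.array([1.0, 2.0])            # one value per cell

x = 0.25
print(np.interp(x, nodes, point_data))           # 12.5 (linear)
print(cell_data[np.searchsorted(nodes, x) - 1])  # 1.0 (piecewise constant)
```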
Materials
• Sometimes used in engineering codes
• Describes disjoint spatial regions at a sub-grid level
• Volume/area fractions
Parallel meshes
• Provides aggregation for meshes
• A mesh may be composed of hundreds of thousands of mesh “blocks”.
• Allows data parallelism
AMR meshes
• Mesh blocks can be associated with patches and levels.
• Allows for aggregation of meshes into AMR hierarchy levels.
Data model goals
• Rich
• Encompasses computational domain as much as possible
• Sharability
• Being able to easily move data between tools
• Being able to give data to other people
• Self description
• Changes to data shouldn’t require a change in the format
• New data should be discoverable without human intervention
• Rapid I/O
• Minimize time dumping out data
• Minimize time reading data
• Parallel I/O
• Support access by multiple threads at once on parallel platforms.
• More…
Data Storage
NetCDF, HDF5, Silo, EnSight Gold, Fluent, Flash, ADIOS, XDMF, PDB, S3D, VTK, OpenFOAM, CGNS, Chombo, Tecplot, ViSUS, STL
Underlying libraries
• NetCDF:
• Named array/scalar storage
• Generalized attributes
• Parallel access
• HDF5:
• Named array/scalar storage
• Hierarchical storage
• Parallel read and write (see the sketch after this list)
• ASCII
• Simple for humans to read and write
• Horrible for computers to read, especially in parallel
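A minimal sketch of parallel HDF5 access through h5py's MPI ("mpio") driver, which requires an MPI-enabled h5py build and a run under mpirun; the file name and dataset shape are hypothetical.

```python
# Each rank writes its own slab of one shared dataset.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

with h5py.File("fields.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("density", shape=(nprocs, 1024), dtype="f8")
    dset[rank, :] = np.random.rand(1024)   # shared file, independent slabs
```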
Higher-level libraries and formats
• ADIOS:
• Used for very fast I/O off HPC systems
• HDF5 and native storage
• Silo:
• Very rich (parallel) data model on top of HDF5
• PDB (Protein Data Bank)
• Used for storage of molecular data
• Based on ASCII
• VTK:
• Native storage for the VTK pipeline visualization system
• Both ASCII and binary formats
Higher-level libraries and formats (cont)
• XDMF:
• Neat format that splits “data” and “metadata”
• ASCII XML file for metadata
• HDF5 file for heavyweight data.
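A sketch of that split: a hypothetical heavyweight array written to HDF5, plus a small XDMF XML file pointing at it. All names and sizes are made up.

```python
# Heavy data in HDF5; light XML metadata referencing it.
import h5py
import numpy as np

with h5py.File("heavy.h5", "w") as f:
    f["pressure"] = np.random.rand(8, 8, 8)    # hypothetical field

xml = """<?xml version="1.0" ?>
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="box" GridType="Uniform">
      <Topology TopologyType="3DCoRectMesh" Dimensions="8 8 8"/>
      <Geometry GeometryType="ORIGIN_DXDYDZ">
        <DataItem Dimensions="3">0 0 0</DataItem>
        <DataItem Dimensions="3">1 1 1</DataItem>
      </Geometry>
      <Attribute Name="pressure" Center="Node">
        <DataItem Dimensions="8 8 8" Format="HDF">heavy.h5:/pressure</DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>
"""
with open("light.xmf", "w") as f:
    f.write(xml)
```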
Community standards
• NetCDF Climate and Forecast (CF) convention:
• Promotes sharing of data among climate researchers
• Standard naming
• Coordinate systems
• Dimensions, missing data, units, etc.
• Many communities build formats on top of others:
• S3D is over NetCDF
• GTC, Chombo, Enzo, PFLOTRAN, Flash, etc. are over HDF5
• …
Summary
• Parallel visualization is an important capability for gaining insight from petascale (and soon to be exascale) data.
• It’s as difficult a job as doing the original HPC simulation itself.
• Rendering is not yet a solved problem.
• Data storage and data models are critical to high performance and scientific understanding.