Data Formats (HPC Visualization with ParaView Workshop)
Shuaib Arshad
April 23, 2014
Supported Data Types
• ParaView Data (.pvd)
• VTK (.vtp, .vtu, .vti, .vts, .vtr)
• VTK Legacy (.vtk)
• VTK Multi Block (.vtm, .vtmb, .vtmg, .vthd, .vthb)
• Partitioned VTK (.pvtu, .pvti, .pvts, .pvtr)
• ADAPT (.nc, .cdf, .elev, .ncd)
• ANALYZE (.img, .hdr)
• ANSYS (.inp)
• AVS UCD (.inp)
• BOV (.bov)
• BYU (.g)
• CAM NetCDF (.nc, .ncdf)
• CCSM MTSD (.nc, .cdf, .elev, .ncd)
• CCSM STSD (.nc, .cdf, .elev, .ncd)
• CEAucd (.ucd, .inp)
• CMAT (.cmat)
• CML (.cml)
• CTRL (.ctrl)
• Chombo (.hdf5, .h5)
• Claw (.claw)
• Comma Separated Values (.csv)
• Cosmology Files (.cosmo, .gadget2)
• Curve2D (.curve, .ultra, .ult, .u)
• DDCMD (.ddcmd)
• Digital Elevation Map (.dem)
• Dyna3D (.dyn)
• EnSight (.case, .sos)
• Enzo boundary and hierarchy
• ExodusII (.g, .e, .exe, .ex2, .ex2v.., etc)
• ExtrudedVol (.exvol)
• FVCOM (MTMD, MTSD, Particle, STSD)
• Facet Polygonal Data
• Flash multiblock files
• Fluent Case Files (.cas)
• GGCM (.3df, .mer)
• GTC (.h5)
• GULP (.trg)
• Gadget (.gadget)
• Gaussian Cube File (.cube)
• JPEG Image (.jpg, .jpeg)
• LAMMPS Dump (.dump)
• LAMMPS Structure Files
• LODI (.nc, .cdf, .elev, .ncd)
• LODI Particle (.nc, .cdf, .elev, .ncd)
• LS-DYNA (.k, .lsdyna, .d3plot, d3plot)
• M3DC1 (.h5)
• MFIX Unstructured Grid (.RES)
• MM5 (.mm5)
• MPAS NetCDF (.nc, .ncdf)
• Meta Image (.mhd, .mha)
• Miranda (.mir, .raw)
• Multilevel 3d Plasma (.m3d, .h5)
• NASTRAN (.nas, .f06)
• Nek5000 Files
• Nrrd Raw Image (.nrrd, .nhdr)
• OpenFOAM Files (.foam)
• PATRAN (.neu)
• PFLOTRAN (.h5)
• PLOT2D (.p2d)
• PLOT3D (.xyz, .q, .x, .vp3d)
• PLY Polygonal File Format
• PNG Image Files
• POP Ocean Files
• ParaDIS Files
• Phasta Files (.pht)
• Pixie Files (.h5)
• ProSTAR (.cel, .vrt)
• Protein Data Bank (.pdb, .ent)
• Raw Image Files
• Raw NRRD image files (.nrrd)
• SAMRAI (.samrai)
• SAR (.SAR, .sar)
• SAS (.sasgeom, .sas, .sasdata)
• SESAME Tables
• SLAC netCDF mesh and mode data
• SLAC netCDF particle data
• Silo (.silo, .pdb)
• Spheral (.spheral, .sv)
• SpyPlot CTH
• SpyPlot (.case)
• SpyPlot History (.hscth)
• Stereo Lithography (.stl)
• TFT Files
• TIFF Image Files
• TSurf Files
• Tecplot ASCII (.tec, .tp)
• Tecplot Binary (.plt)
• Tetrad (.hdf5, .h5)
• UNIC (.h5)
• VASP CHGCA (.CHG)
• VASP OUT (.OUT)
• VASP POSTCAR (.POS)
• VPIC (.vpc)
• VRML (.wrl)
• Velodyne (.vld, .rst)
• VizSchema (.h5, .vsh5)
• Wavefront Polygonal Data (.obj)
• WindBlade (.wind)
• XDMF and hdf5 (.xmf, .xdmf)
• XMol Molecule
ParaView Data Model
• Uses VTK Data Model
• Fundamental data structure is data object
  – Scientific dataset (Rectilinear grid, FE mesh)
  – Abstract data structure (graph, tree)
• Data structure building blocks
  – Mesh (topology, geometry)
  – Attributes
VTK Data Model
Mesh
• Actual data structures vary
• Common abstractions:
  – Vertices
  – Cells
    • Used to discretize a region
    • Various types (tetrahedra, hexahedra)
  – Cells mapped to vertices by connectivity
  – Faces stored only for polyhedra
• Completely defined by topology and spatial coordinates of vertices
Attributes
• Defines discrete values of a field over the mesh (pressure, temperature, velocity, stress tensor)
• Stored as data arrays, and can have arbitrary number of components
• Can be associated with points, cells, or neither
Uniform Rectilinear Grid
• Implicit definition of topology and point coordinates
• Complete definition requires:
  – Extents – min, max indices in each direction
  – Origin – position of the point with index (0, 0, 0)
  – Spacing – inter-point distance, each direction independently defined
• npts_total = npts_x * npts_y * npts_z
• coord = origin + index * spacing
• (i, j, k) flat index = k * (npts_x * npts_y) + j * npts_x + i
• All cells are of the same type
• Regular nature requires less storage; some algorithms are optimized to take advantage of it
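The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration, assuming extents that start at index 0; the function names and the sample grid are made up for the example and are not part of VTK or ParaView.

```python
def flat_index(i, j, k, npts_x, npts_y):
    """Flat index of point (i, j, k); x varies fastest, then y, then z."""
    return k * (npts_x * npts_y) + j * npts_x + i


def point_coord(index, origin, spacing):
    """Per-axis coordinate of a point: origin + index * spacing."""
    return tuple(o + n * s for o, n, s in zip(origin, index, spacing))


# Hypothetical 4 x 3 x 2 grid used only for illustration.
npts = (4, 3, 2)
origin = (0.0, 0.0, 0.0)
spacing = (0.5, 1.0, 2.0)

npts_total = npts[0] * npts[1] * npts[2]             # 24 points
last = flat_index(3, 2, 1, npts[0], npts[1])         # 1*12 + 2*4 + 3 = 23
corner = point_coord((3, 2, 1), origin, spacing)     # (1.5, 2.0, 2.0)
```

Only the origin, spacing, and extents are stored; every coordinate is computed on demand, which is where the storage saving comes from.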
Rectilinear Grid
• Implicit definition of topology and semi-implicit
definition of point coordinates
• Complete definition requires:
– Extents – min, max indices in each direction
– 3 Arrays defining coordinates in x-, y-, and z- directions,
having lengths npts_x, npts_y and npts_z respectively
• coord = (coord_array_x(i), coord_array_y(j),
coord_array_z(k))
• (i, j, k) flat index = k * (npts_x * npts_y) + j * npts_x + i
• All cells are of the same type
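A small Python sketch of the semi-implicit coordinate lookup, where each axis array is indexed by its own index. The arrays and the function name are invented for illustration; they are not VTK API calls.

```python
def rectilinear_coord(i, j, k, xs, ys, zs):
    """Point (i, j, k) takes x from xs[i], y from ys[j], and z from zs[k]."""
    return (xs[i], ys[j], zs[k])


# Hypothetical non-uniform axis coordinates (npts_x = 4, npts_y = 3, npts_z = 2).
xs = [0.0, 0.1, 0.4, 1.0]
ys = [0.0, 2.0, 3.0]
zs = [-1.0, 1.0]

p = rectilinear_coord(2, 1, 0, xs, ys, zs)            # (0.4, 2.0, -1.0)

# Values stored for the axes versus storing every point explicitly.
stored_values = len(xs) + len(ys) + len(zs)           # 9
explicit_values = 3 * len(xs) * len(ys) * len(zs)     # 72
```

The gap between `stored_values` and `explicit_values` grows quickly with grid size, which is the storage argument for preferring this type over a curvilinear grid when the axes are independent.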
Curvilinear Grid
• Also called Structured Grid
• Implicit definition of topology and explicit definition
of point coordinates
• Complete definition requires:
– Extents – min, max indices in each direction
– Array of point coords – stores position of every vertex
explicitly
• coord = coord_array (idx_flat)
• (i, j, k) flat index = k * (npts_x * npts_y) + j * npts_x + i
• All cells are of the same type
AMR Dataset
• Native support
• Collection of Uniform Rectilinear
grids grouped under increasing
refinement ratios
• Support for masking (blanking) sub-
regions of the rectilinear grids using
an array of bytes
Unstructured Grid
• Most general primitive dataset type
• Explicit definition of topology and point
coordinates
• Significantly increased memory
requirement, so use only if previous
options can’t be used
• Supports large number of cell types, all of
which can exist within one grid
Polygonal Grid
• Polydata
• Specialized version of unstructured
grid for efficient rendering
• Consists of:
– 0D cells (vertices and polyvertices)
– 1D cells (lines and polylines)
– 2D cells (polygons and triangle strips)
Table
• Tabular dataset consisting of rows and columns
• Can be loaded using various file formats like CSV
• Can be converted to other datasets
• Filters operating on tables:
– Table to Points
– Table to Structured Grid
Multiblock Dataset
• Tree of datasets where leaf
nodes are simple datasets (all
of the above except AMR)
• Used to group together related
datasets
Multipiece Dataset
• Similar to Multiblock
• Group together datasets that are part of a
whole mesh – same type and same attributes
• Used to collect datasets produced by a
parallel sim without having to append the
meshes
• Can be produced only using certain readers
• Not possible to extract individual pieces
Introduction to HDF5
What is HDF5?
• HDF5 == Hierarchical Data Format, v5
• Open file format
  – Designed for high volume or complex data
• Open source software
  – Works with data in the format
• A data model
  – Structures for data organization and specification
August 7, 2013 Extreme Scale Computing HDF5 17
www.hdfgroup.org
HDF5 is designed …
• for high volume and/or complex data
• for every size and type of system (portable)
• for flexible, efficient storage and I/O
• to enable applications to evolve in their use of HDF5 and to accommodate new models
• to support long-term data preservation
HDF5 File
An HDF5 file is a container that holds data objects.
[Figure: an HDF5 file holding data objects, including the table below]
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
HDF5 Data Model
a.k.a. HDF5 Abstract Data Model, a.k.a. HDF5 Logical Data Model
• HDF5 objects: File, Group, Link, Dataset, Attribute, Dataspace, Datatype
HDF5 Dataset
• HDF5 datasets organize and contain "raw data values".
• HDF5 datatype describes individual data elements.
• HDF5 dataspace describes the logical layout of the data elements.
• A dataset is a multi-dimensional array of identically typed data elements
• Specifications for single data element and array dimensions:
  – HDF5 Datatype: Integer, 32-bit LE
  – HDF5 Dataspace: Rank = 3; Dimensions: Dim_0 = 4, Dim_1 = 5, Dim_2 = 7
HDF5 Dataspace
• Describes the logical layout of the elements in an HDF5 dataset
  – NULL: no elements
  – Scalar: single element
  – Simple array (most common): multiple elements organized in a rectangular array
    • rank = number of dimensions
    • dimension sizes = number of elements in each dimension
    • maximum number of elements in each dimension, may be fixed or unlimited
HDF5 Datatypes
• Describe individual data elements in an HDF5 dataset
• Wide range of datatypes supported
• Integer
• Float
• Enum
• Array
• User-defined (e.g., 13-bit integer)
• Variable length types (e.g., strings)
• Compound (similar to C structs)
• Many more …
HDF5 Dataset
Dataspace: Rank = 2, Dimensions = 5 x 3
Datatype: 32-bit Integer
[Figure: 5 x 3 array of 32-bit integer elements]
How is data stored?
• Contiguous (default)
  – Data elements stored physically adjacent to each other
• Chunked
  – Better access time for subsets; extendible
• Chunked & Compressed
  – Improves storage efficiency, transmission speed
[Figure: buffer in memory mapped to data in the file for each layout]
HDF5 Dataset with Compound Datatype
Compound Datatype: int16, char, int32, 2x3x2 array of float32
Dataspace: Rank = 2, Dimensions = 5 x 3
[Figure: 5 x 3 array in which every element is one compound value]
HDF5 Attributes
• Typically contain user metadata
• Have a name and a value
• Attributes “decorate” HDF5 objects
• Value is described by a datatype and a dataspace
• Analogous to a dataset, but does not support partial I/O operations and cannot be compressed or extended
HDF5 Groups and Links
• HDF5 groups and links organize data objects.
• Every HDF5 file has a root group ("/").
[Figure: root group "/" linking to groups "SimOut" and "Viz", the lat/lon/temp table, experiment notes (Serial Number: 99378920, Date: 3/13/09, Configuration: Standard 3), and the values Parameters = 10;100;1000 and Timestep = 36,000]
HDF5 Home Page
• HDF5 home page: http://hdfgroup.org/HDF5/
  – Latest release: HDF5 1.8.11 (1.8.12 coming in November 2013)
• HDF5 source code:
  – Written in C, and includes optional C++, Fortran 90 APIs, and High Level APIs
  – Contains command-line utilities (h5dump, h5repack, h5diff, ..) and compile scripts
• HDF5 pre-built binaries:
  – When possible, include C, C++, F90, and High Level libraries. Check ./lib/libhdf5.settings file.
  – Built with and require the SZIP and ZLIB external libraries
HDF5 Software Layers & Storage
[Figure: HDF5 software layers and storage]
• Apps and tools: h5dump, HDFview, netCDF-4, H5Part, High Level APIs, Java Interface
• HDF5 Library
  – API: Language Interfaces (C, Fortran, C++); HDF5 Data Model Objects (Groups, Datasets, Attributes, …); Tunable Properties (Chunk Size, I/O Driver, …)
  – Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
  – Virtual File Layer, I/O Drivers: Posix I/O, Split Files, MPI I/O, Custom
• Storage: HDF5 File Format; File, Split Files, File on Parallel Filesystem, Other
Useful Tools For New Users
• h5dump: Tool to "dump" or display contents of HDF5 files
• h5cc, h5c++, h5fc: Scripts to compile applications
• HDFView: Java browser to view HDF5 files
  http://www.hdfgroup.org/hdf-java-html/hdfview/
• HDF5 Examples (C, Fortran, Java, Python, Matlab)
  http://www.hdfgroup.org/ftp/HDF5/examples/
General Programming Paradigm
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
• Properties of object are optionally defined
  – Creation properties (e.g., use chunked storage)
  – Access properties
The General HDF5 API
• C, Fortran, Java, C++, and .NET bindings
• IDL, MATLAB, Python (h5py, PyTables)
• C routines begin with prefix H5?
  – ? is a character corresponding to the type of object the function acts on
Example Functions:
H5D : Dataset interface      e.g., H5Dread
H5F : File interface         e.g., H5Fopen
H5S : dataSpace interface    e.g., H5Sclose
The HDF5 API
• For flexibility, the API is extensive
  – 300+ functions
• This can be daunting… but there is hope
  – A few functions can do a lot
  – Start simple
  – Build up knowledge as more features are needed
Victorinox Swiss Army Cybertool 34
Basic Functions
H5Fcreate (H5Fopen)           create (open) File
H5Screate_simple/H5Screate    create dataSpace
H5Dcreate (H5Dopen)           create (open) Dataset
H5Dread, H5Dwrite             access Dataset
H5Dclose                      close Dataset
H5Sclose                      close dataSpace
H5Fclose                      close File
Other Common Functions
DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
Groups: H5Gcreate, H5Gopen, H5Gclose
Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate
h5py Package
• Pythonic interface to HDF5 binary data format
• Allows storage of large numerical datasets
• Uses NumPy and Python metaphors like dictionary and NumPy array syntax
• More information: http://www.h5py.org/
h5py – Create File
h5py – Create Dataset
h5py – Create Dataset (2)
h5py – Create Attribute
h5py – Create Group
h5py – Create Groups
h5py – Create Datasets in Groups
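The code from the h5py slides above did not survive in this text, so the following is a stand-in sketch rather than the original material. It walks through the same steps (file, dataset, attribute, groups, datasets in groups) using h5py calls that exist in current releases; the file name, dataset names, and values are all invented for illustration, and the in-memory `core` driver is an assumption chosen so the example leaves no file behind.

```python
import numpy as np
import h5py

# Create file. driver="core" with backing_store=False keeps it in memory;
# drop both arguments to write a real file named example.h5 (hypothetical name).
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Create dataset from an existing NumPy array (5 x 3, 32-bit integers).
    temperature = np.arange(15, dtype="int32").reshape(5, 3)
    dset = f.create_dataset("temperature", data=temperature)

    # Create dataset (2): empty, chunked, and gzip-compressed, from shape and dtype.
    f.create_dataset("pressure", shape=(4, 5, 7), dtype="f8",
                     chunks=(2, 5, 7), compression="gzip")

    # Create attribute: attributes behave like a small dictionary on the object.
    dset.attrs["units"] = "K"

    # Create group(s): intermediate groups in a path are created automatically.
    sim_out = f.create_group("SimOut")
    f.create_group("Viz/frames")

    # Create datasets in groups, via the group object or dictionary-style paths.
    sim_out.create_dataset("timestep", data=np.array([36000]))
    f["Viz/frames/mask"] = np.zeros((2, 2), dtype="uint8")

    # Read back a few things while the file is still open.
    names = sorted(f.keys())
    shape = dset.shape
    units = dset.attrs["units"]
    first_row = f["temperature"][0, :].tolist()   # partial read with NumPy slicing
    has_mask = "Viz/frames/mask" in f
```

The slicing on `f["temperature"]` reads only the requested part of the dataset, which is the h5py counterpart of the partial I/O functions listed earlier.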
Introduction to XDMF
XDMF
• eXtensible Data Model and Format
• XML based
• Standardized method to exchange scientific data between HPC codes and tools
Data Format
• Raw data to be manipulated
• Type, precision, location, rank, and dimensions completely describe any dataset
• Light data
  – Data description (metadata)
  – Typically less than 1000 values
  – Can be passed around easily
  – Stored in XML
• Heavy data
  – Actual raw values of the dataset
  – Megabytes, Terabytes etc
  – Movement needs to be kept at minimum
  – Typically stored in HDF5, raw, or similar data formats
• Redundantly stored in both XML and HDF5
Data Model
• Describes the intended use of the data
• Stored using XML
• Targeted at scientific simulation data focusing on scalars, vectors, and tensors defined on a grid
• Structured and Unstructured grids are described using their topology and geometry
• Calculated, time varying values are attributes of the grid
• The actual values for the grid geometry, connectivity and attribute values are contained in data format
• Separation of data format and model allows for efficient storage
Data Model contd…
• HPC data is viewed as hierarchy of Domains
• Domain must contain at least 1 grid
• Grid
  – Basic representation of both geometric and computed/measured values
  – Group of elements with structured or unstructured topology
• Geometry
  – Specifies X, Y, and Z positions of the Grid
• One or more attributes
  – Store any values associated with the Grid or individual cells
XDMF API
• C++ API to read and write XDMF data from applications
• Wrappers for Python, Tcl, and Java
XML
<ElementTag AttributeName="AttributeValue" AttributeName="AttributeValue" … >
  Cdata
</ElementTag>
• Case sensitive
• Made up of:
  – Elements
  – Entities
  – Processing information
• Element:
  – <tag Name1="Value1" Name2="Value2"> Cdata </tag>
  – <!-- This is a comment -->
• "Well formed" XML
  – Syntactically correct (quotes match, elements end properly)
• "Valid" XML
  – Conforms to the Schema or DTD
• 2 extensions used (XInclude, XPath)
XInclude
• Allows for inclusion of files that are not well-formed XML
<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="Example3.xmf"/>
</Xdmf>
XPath
• Allows for elements in the XML document and the API to reference specific elements
/Xdmf/Domain/Grid
/Xdmf/Domain/Grid[10]
/Xdmf/Domain/Grid[@Name="Copper Plate"]
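To see what these three path forms select, here is a rough sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and resolves paths relative to the element you search from (so the leading `/Xdmf` becomes `.`). The grid names other than "Copper Plate" are invented, and this is not how the XDMF library itself resolves paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical light-data document with three grids.
XDMF_TEXT = """<Xdmf Version="2.0">
  <Domain>
    <Grid Name="Steel Plate"/>
    <Grid Name="Copper Plate"/>
    <Grid Name="Bolt"/>
  </Domain>
</Xdmf>"""

root = ET.fromstring(XDMF_TEXT)  # root is the <Xdmf> element

# /Xdmf/Domain/Grid : every Grid under Domain
all_grids = root.findall("./Domain/Grid")

# /Xdmf/Domain/Grid[2] : positional predicate, counted from 1
second = root.find("./Domain/Grid[2]")

# /Xdmf/Domain/Grid[@Name="Copper Plate"] : select by attribute value
copper = root.find("./Domain/Grid[@Name='Copper Plate']")

grid_names = [g.get("Name") for g in all_grids]
```

In this sample document the positional and the attribute query happen to return the same element, because "Copper Plate" is the second grid.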
Minimal file
• All valid XDMF should appear between <Xdmf> and </Xdmf>
<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0">
</Xdmf>
Entities
• Entities are XML's basic substitution mechanism; they are good for improving readability
<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" [
<!ENTITY cellDimZXY "45 30 120">
]>
<Xdmf Version="2.0">
  ...
  &cellDimZXY;
  ...
</Xdmf>
Elements
<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" [
<!ENTITY cellDimZXY "45 30 120">
]>
<Xdmf Version="2.0">
  <Domain>
    <Grid>
      <Topology> </Topology>
      <Geometry> </Geometry>
      <Attribute> </Attribute>
      <Attribute> </Attribute>
    </Grid>
  </Domain>
</Xdmf>
DataItem
• Uniform - single array of values
<DataItem Dimensions="3">
  1.0 2.0 3.0
</DataItem>
DataItem
• Uniform contd …
<DataItem Dimensions="3">
  1.0 2.0 3.0
</DataItem>
<DataItem ItemType="Uniform" Format="XML" NumberType="Float" Precision="4" Rank="1" Dimensions="3">
  1.0 2.0 3.0
</DataItem>
DataItem
• Uniform contd …
<DataItem ItemType="Uniform" Format="HDF" NumberType="Float" Precision="8" Dimensions="64 128 256">
  OutputData.h5:/Results/Iteration 100/Part 2/Pressure
</DataItem>
<DataItem ItemType="Uniform" Format="Binary" Dimensions="64 128 256">
  PressureFile.bin
</DataItem>
DataItem
• Collection – 1D array of DataItem
• Tree – Hierarchical structure of DataItem
<DataItem Name="Tree Example" ItemType="Tree">
  <DataItem ItemType="Tree">
    <DataItem Name="Collection1" ItemType="Collection">
      <DataItem Dimensions="3">
        1.0 2.0 3.0
      </DataItem>
      <DataItem Dimensions="4">
        4 5 6 7
      </DataItem>
    </DataItem>
  </DataItem>
  <DataItem Name="Collection2" ItemType="Collection">
    <DataItem Dimensions="3">
      7 8 9
    </DataItem>
    <DataItem Dimensions="4">
      10 11 12 13
    </DataItem>
  </DataItem>
  <DataItem ItemType="Uniform" Format="HDF" NumberType="Float" Precision="8" Dimensions="64 128 256">
    OutputData.h5:/Results/Iteration 100/Part 2/Pressure
  </DataItem>
</DataItem>
[Figure: Tree Example containing Collection 1 (1.0 2.0 3.0; 4 5 6 7), Collection 2 (7 8 9; 10 11 12 13), and OutputData.h5:/Results/Iteration 100/Part 2/Pressure]
DataItem
• HyperSlab – subset of some other DataItem, specified by:
  – Start
  – Stride
  – Count
Example:
  – Source data: HDF5 file
  – Source dimensions: 100 x 200 x 300 x 3
  – Start at [0, 0, 0, 0]
  – End at [50, 100, 150, 2]
  – Include every other plane
DataItem
• HyperSlab contd …
<DataItem ItemType="HyperSlab" Dimensions="25 50 75 3" Type="HyperSlab">
  <DataItem Dimensions="3 4" Format="XML">
    0 0 0 0
    2 2 2 1
    25 50 75 3
  </DataItem>
  <DataItem Name="Points" Dimensions="100 200 300 3" Format="HDF">
    MyData.h5:/XYZ
  </DataItem>
</DataItem>
The 3 x 4 DataItem holds the start, stride, and count rows, in that order.
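As a quick check on the numbers in this example, the sketch below expands start, stride, and count into the indices picked along each axis. The helper name is made up, and the code only does the index arithmetic; it does not read any HDF5 data.

```python
def hyperslab_indices(start, stride, count):
    """Indices selected along each axis: start, start + stride, ... (count values)."""
    return [[s + n * st for n in range(c)]
            for s, st, c in zip(start, stride, count)]


source_dims = (100, 200, 300, 3)
start = (0, 0, 0, 0)
stride = (2, 2, 2, 1)      # every other plane in the first three axes
count = (25, 50, 75, 3)

selected = hyperslab_indices(start, stride, count)
last_selected = [axis[-1] for axis in selected]             # [48, 98, 148, 2]
result_dims = tuple(len(axis) for axis in selected)         # (25, 50, 75, 3)

# The selection must stay inside the source array on every axis.
fits = all(last < dim for last, dim in zip(last_selected, source_dims))
```

`result_dims` matches the `Dimensions` attribute of the outer HyperSlab DataItem, which is how the two descriptions are meant to agree.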
Grid
• Container of info related to 2D and 3D points, structured or unstructured connectivity, and assigned values
• Types:
  – Uniform – a homogeneous single grid
  – Collection – array of uniform grids with same attributes
  – Tree – hierarchical group
  – SubSet – portion of another grid
Grid contd …
<Grid Name="Car Wheel" GridType="Tree">
  <Grid Name="Tire" GridType="Uniform">
    <Topology ...
    <Geometry ...
  </Grid>
  <Grid Name="Lug Nuts" GridType="Collection">
    <Grid Name="Lug Nut 0" GridType="Uniform">
      <Topology ...
      <Geometry ...
    </Grid>
    <Grid Name="Lug Nut 1" GridType="Uniform">
      <Topology ...
      <Geometry ...
    </Grid>
  </Grid>
  ...
Topology
• Describes general organization of data
  – Structured (2DSMesh, 2DRectMesh, 2DCoRectMesh, 3DSMesh, 3DRectMesh, 3DCoRectMesh)
  – Linear (Polyvertex, Polyline, Polygon, …)
  – Quadratic (Edge_3, Tri_6, Quad_8, …)
  – Arbitrary (Mixed)
Geometry
• Describes XYZ values of the mesh
• Organization
  – XYZ
  – XY
  – X_Y_Z
  – VXVYVZ
  – ORIGIN_DXDYDZ
  – ORIGIN_DXDY
Attribute
• Defines values associated with the mesh
• Values:
  – Scalar
  – Vector
  – Tensor
  – Tensor6
  – Matrix
• Centered:
  – Node
  – Edge
  – Face
  – Cell
  – Grid
Example
<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" [
<!ENTITY HeavyData "claw.ptc0000">
]>
<Xdmf Version="2.0">
  <Domain>
    <Grid GridType="Uniform">
      <Topology TopologyType="3DCoRectMesh" Dimensions="257 5 5"/>
      <Geometry GeometryType="Origin_DxDyDz">
        <DataItem Dimensions="3" Format="XML">
          0 0 0
        </DataItem>
Example contd …
        <DataItem Dimensions="3" Format="XML">
          0.0078125 0.5 0.5
        </DataItem>
      </Geometry>
      <Attribute Name="A1" AttributeType="Scalar" Center="Cell">
        <DataItem ItemType="HyperSlab" Dimensions="256 4 4" Type="HyperSlab">
          <DataItem Dimensions="3 4" Format="XML">
            0 0 0 0
            1 1 1 4
            256 4 4 1
          </DataItem>
Example contd …
          <DataItem Dimensions="256 4 4 4" NumberType="Float" Precision="8" Format="Binary" Endian="Big" Seek="8">
            &HeavyData;
          </DataItem>
        </DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>
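A file with the same overall shape as this example can be generated from a script. The sketch below uses Python's standard `xml.etree.ElementTree` instead of the XDMF API, and it keeps the cell values inline (Format="XML") instead of pointing at a heavy-data file, so it is a simplified variant of the slide's example. The function names, grid name, and sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET


def join(values):
    """Space-separated text, the form used inside DataItem elements."""
    return " ".join(str(v) for v in values)


def build_uniform_grid(node_dims, origin, spacing, cell_values):
    """Build an <Xdmf> tree: one 3DCoRectMesh grid with a cell-centered scalar."""
    cell_dims = [n - 1 for n in node_dims]   # one fewer cell than nodes per axis
    expected = cell_dims[0] * cell_dims[1] * cell_dims[2]
    if len(cell_values) != expected:
        raise ValueError(f"expected {expected} cell values, got {len(cell_values)}")

    xdmf = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(xdmf, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="Example", GridType="Uniform")
    ET.SubElement(grid, "Topology", TopologyType="3DCoRectMesh",
                  Dimensions=join(node_dims))

    # Origin_DxDyDz geometry: first DataItem is the origin, second the spacing.
    geometry = ET.SubElement(grid, "Geometry", GeometryType="Origin_DxDyDz")
    for vector in (origin, spacing):
        item = ET.SubElement(geometry, "DataItem", Dimensions="3", Format="XML")
        item.text = join(vector)

    attribute = ET.SubElement(grid, "Attribute", Name="A1",
                              AttributeType="Scalar", Center="Cell")
    data = ET.SubElement(attribute, "DataItem", Dimensions=join(cell_dims),
                         NumberType="Float", Precision="8", Format="XML")
    data.text = join(cell_values)
    return xdmf


# Tiny hypothetical grid: 3 x 2 x 2 nodes, hence 2 x 1 x 1 cells.
root = build_uniform_grid((3, 2, 2), (0, 0, 0), (0.5, 0.5, 0.5), [1.0, 2.0])
xml_text = ET.tostring(root, encoding="unicode")
reparsed = ET.fromstring(xml_text)   # round-trip to confirm it is well formed
```

The node-versus-cell relationship is the same one visible in the slide's example, where a topology of 257 5 5 nodes carries a cell-centered attribute of 256 4 4 values.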
Resources
• XDMF main page: http://www.xdmf.org/index.php/Main_Page
Questions?