Earth Science Platform
DESCRIPTION
Science platforms are made up of (at least) four planks: data formats, services, tools, and conventions. I focus here on formats and conventions, specifically the HDF5 format, already used in many disciplines, and the Climate and Forecast (CF) and HDF-EOS conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high-performance computing and supports arbitrary grouping and annotation. Community conventions built on top of the format are critical for making data useful. The CF conventions were created for relatively simple gridded data types, while the HDF-EOS conventions originally considered more complex data (swaths). Making simple conventions more complex makes adoption more difficult. Governance of conventions must balance community input against the need for stable data processing systems.
TRANSCRIPT
AGU, Fall 2013, December 12, 2013
The Earth Science Platform
Ted Habermann, Mike Folk, The HDF Group
Services
Formats
Conventions
Tools
HDF5
Formats with HDF Inside
High Performance / Parallel Computing
Problem: Support the I/O and analysis needs of a state-of-the-art plasma physics code
Novel Accomplishments: Ran a trillion-particle VPIC* simulation on 120,000 Hopper cores and generated a 350 TB dataset
Parallel HDF5 achieved a peak I/O rate of 35 GB/s and 80% sustained bandwidth
Developed a hybrid parallel FastQuery that uses FastBit to exploit multicore hardware
FastQuery took 10 minutes to index and 3 seconds to query energetic particles
SC12 paper, XLDB 2012 poster
Impact: Demonstrated software scalability for writing and analyzing ~40 TB HDF5 files; enabled novel discoveries in plasma physics
*VPIC: Vector Particle-in-Cell
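The split between a 10-minute indexing step and a 3-second query reflects how bitmap indexes work: the index is built once, and later range queries mostly combine precomputed bitmaps instead of scanning the raw data. The toy sketch below illustrates that idea with an uncompressed, binned bitmap index in pure Python. It is not the FastBit or FastQuery API (FastBit also compresses its bitmaps and runs over HDF5 data in parallel); the function names, bin edges, and energy values are hypothetical.

```python
from bisect import bisect_right

# Toy binned bitmap index (hypothetical, not FastBit's API).
# One bitmap per bin; a Python int serves as the bitset.
# Bit i of bitmaps[b] is set when record i falls in bin b.


def build_index(values, edges):
    """Build one bitmap per bin; edges are sorted bin boundaries."""
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        b = bisect_right(edges, v)  # bin number for this value
        bitmaps[b] |= 1 << i
    return bitmaps


def query_at_least(bitmaps, edges, values, threshold):
    """Return the indices i with values[i] >= threshold."""
    b = bisect_right(edges, threshold)  # bin that contains the threshold
    hits = 0
    for bm in bitmaps[b + 1:]:  # every value in these bins exceeds threshold
        hits |= bm
    # Only the bin that straddles the threshold needs the raw values.
    candidates, i = bitmaps[b], 0
    while candidates:
        if candidates & 1 and values[i] >= threshold:
            hits |= 1 << i
        candidates >>= 1
        i += 1
    return [j for j in range(len(values)) if (hits >> j) & 1]


if __name__ == "__main__":
    energies = [0.5, 3.2, 1.1, 7.8, 2.0, 5.5]  # made-up particle energies
    edges = [1.0, 2.0, 4.0]
    index = build_index(energies, edges)
    print(query_at_least(index, edges, energies, 3.0))  # [1, 3, 5]
```

Only the one bin that contains the threshold touches the underlying values; all other bins are answered from the index alone, which is where the speedup comes from.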
Grouping Data and Metadata (HDF-EOS)
HDF File with HDF-EOS Conventions
  Grids
    Grid_1 ... Grid_N
      Data Fields
        Data Field.1
        Data Field.2
      Attributes
  Swaths
    Swath_1 ... Swath_N
      Data Fields
        Data Field.1
        Data Field.2
      Profile Fields
        Profile Field.1
        Profile Field.2
      Geolocation Fields
        Latitude
        Longitude
        Time
        Colatitude
  Zonal Averages
  Points
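In HDF5, this kind of grouping maps directly onto nested groups, with datasets and attributes at the leaves. The sketch below models the tree above as nested Python dictionaries and lists the HDF5-style path of every node. It is an illustration only: the names come from the slide, Grid_1 and Swath_1 stand in for Grid_1 ... Grid_N and Swath_1 ... Swath_N, and real HDF-EOS files use their own fixed group names, which this sketch does not reproduce.

```python
# Toy in-memory model of the HDF-EOS grouping (names taken from the slide).
# Dicts stand in for HDF5 groups; None marks a leaf
# (a dataset, or here the grid's attribute set).
HDF_EOS_LAYOUT = {
    "Grids": {
        "Grid_1": {
            "Data Fields": {"Data Field.1": None, "Data Field.2": None},
            "Attributes": None,
        },
    },
    "Swaths": {
        "Swath_1": {
            "Data Fields": {"Data Field.1": None, "Data Field.2": None},
            "Profile Fields": {"Profile Field.1": None, "Profile Field.2": None},
            "Geolocation Fields": {
                "Latitude": None,
                "Longitude": None,
                "Time": None,
                "Colatitude": None,
            },
        },
    },
    "Zonal Averages": {},
    "Points": {},
}


def walk_paths(tree, prefix=""):
    """Yield an HDF5-style path for every group and leaf, depth first."""
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        yield path
        if isinstance(child, dict):
            yield from walk_paths(child, path)


if __name__ == "__main__":
    for p in walk_paths(HDF_EOS_LAYOUT):
        print(p)
```

Because every swath carries its own Geolocation Fields group, a reader can locate latitude and longitude by path alone, which is the kind of agreement a convention adds on top of the bare format.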
Conventions / History
Level 1: Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g., platform ephemeris) computed and appended but not applied to Level 0 data.
Level 2: Derived geophysical variables at the same resolution and location as Level 1 source data.
Level 3 / Model Results: Variables mapped on uniform space-time grid scales, usually with some completeness and consistency.
Figure: conventions arranged along a processing-level axis (Levels 1, 2, 3)
HDF-EOS types: Swath, Points, Grid, Zonal Average
CF Feature Types: Points, TimeSeries, Trajectory, Profile, TimeSeriesProfile, TrajectoryProfile
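In CF (version 1.6 and later), a file declares which discrete sampling geometry it holds through the global attribute `featureType`, whose allowed values are the six feature types listed above. The sketch below checks such a value and returns its canonical CF spelling. The attribute dictionary is hypothetical, reading attributes from a real netCDF or HDF5 file is left out, and the case-insensitive comparison is a design choice of this sketch; consult the CF document before relying on it.

```python
# Discrete sampling geometry types allowed in the CF global
# attribute "featureType" (CF 1.6 and later).
CF_FEATURE_TYPES = (
    "point",
    "timeSeries",
    "trajectory",
    "profile",
    "timeSeriesProfile",
    "trajectoryProfile",
)


def canonical_feature_type(value):
    """Return the canonical CF spelling of value, or None if it is not a CF feature type.

    Surrounding whitespace is stripped and case is ignored.
    """
    lookup = {name.lower(): name for name in CF_FEATURE_TYPES}
    return lookup.get(value.strip().lower())


if __name__ == "__main__":
    # Hypothetical global attributes, as if read from a file.
    attrs = {"Conventions": "CF-1.6", "featureType": "TimeSeries"}
    print(canonical_feature_type(attrs["featureType"]))  # timeSeries
    print(canonical_feature_type("swath"))  # None
```

Note that "swath", the central HDF-EOS type for Level 1 and Level 2 data, is not among the CF feature types.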
Convention Governance
Community / Users
Operational Data Processing System
Community
Using HDF to share data? Tweet #HDFInside
Acknowledgements
This work was partially supported by NASA contract number NNG10HP02C.
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.