earth science platform

December 12, 2013. The Earth Science Platform. Ted Habermann, Mike Folk, The HDF Group. Services, Formats, Conventions, Tools. AGU, Fall 2013

Upload: ted-habermann

Post on 18-Nov-2014


DESCRIPTION

Science platforms are made up of (at least) four planks: data formats, services, tools, and conventions. I focus here on formats and conventions, specifically the HDF5 format and the Climate and Forecast (CF) and HDF-EOS conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high-performance computing and supports arbitrary grouping and annotation. Community conventions layered on top of the format are critical for making data useful. The CF conventions were created for relatively simple gridded data types, while the HDF-EOS conventions originally addressed more complex data such as swaths. Making simple conventions more complex makes them harder to adopt. Governance of conventions must balance community input against the need for stable data processing systems.

TRANSCRIPT

Page 1: Earth Science Platform

AGU, Fall 2013, December 12, 2013

The Earth Science Platform

Ted Habermann, Mike Folk, The HDF Group

Services, Formats, Conventions, Tools

Page 2: Earth Science Platform


HDF5

Formats with HDF Inside
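One practical way to check whether a file has "HDF inside" is to look for the HDF5 format signature. The sketch below assumes the rule from the HDF5 File Format Specification: the 8-byte signature appears at offset 0 or, when a user block precedes it, at 512, 1024, 2048, and so on. The function names are hypothetical and not part of any HDF library.

```python
import os
from typing import Optional

# HDF5 format signature: \x89 'H' 'D' 'F' \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def hdf5_signature_offset(path: str) -> Optional[int]:
    """Return the byte offset of the HDF5 signature in `path`, or None if absent.

    Candidate offsets are 0, then 512, 1024, 2048, ... (each one doubles the
    previous), which allows for an optional user block at the start of the file.
    """
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None


def has_hdf_inside(path: str) -> bool:
    """True if the file carries an HDF5 signature (a netCDF-4 file would)."""
    return hdf5_signature_offset(path) is not None
```

For a netCDF-4 file this should report an HDF5 signature, while a classic netCDF-3 file, which begins with the bytes `CDF`, should not. Only the candidate offsets are read, so large files are not scanned in full.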


Page 3: Earth Science Platform


High Performance / Parallel Computing

Problem: Support I/O and analysis needs for state-of-the-art plasma physics code

Novel Accomplishments: Ran a trillion-particle VPIC* simulation on 120,000 Hopper cores and generated a 350 TB dataset

Parallel HDF5 achieved a peak I/O rate of 35 GB/s and sustained 80% of peak bandwidth

Developed a hybrid-parallel FastQuery, built on FastBit, to exploit multicore hardware

FastQuery took 10 minutes to index and 3 seconds to query energetic particles

SC12 paper, XLDB 2012 poster

Impact: Demonstrated software scalability for writing and analyzing ~40 TB HDF5 files

Enabled novel discoveries in plasma physics

*Vector Particle-in-Cell
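The figures on this slide can be related by a quick back-of-the-envelope calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and treats each ~40 TB file as one snapshot of all 10^12 particles; the slide does not say how files map to timesteps, so that mapping is an assumption. Because 35 GB/s is a peak rate, the resulting write time is a lower bound.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumptions: decimal units, and one ~40 TB file holds one snapshot
# of all 1e12 particles, written at the 35 GB/s peak rate.
TB = 1e12
GB = 1e9

particles = 1e12          # "trillion particle" simulation
file_size = 40 * TB       # "~40TB HDF5 files" (bytes)
peak_rate = 35 * GB       # "peak 35GB/s I/O rate" (bytes per second)

bytes_per_particle = file_size / particles
# Peak rate is a ceiling, so this is the minimum time to write one file.
min_write_seconds = file_size / peak_rate

print(f"{bytes_per_particle:.0f} bytes per particle")
print(f"at least {min_write_seconds / 60:.1f} minutes per file at peak rate")
```

Running it prints 40 bytes per particle and a lower bound of about 19 minutes to write one ~40 TB file at the peak rate.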


Page 4: Earth Science Platform


Grouping Data and Metadata (HDF-EOS)

HDF File with HDF-EOS Conventions

Grids: Grid_1 to Grid_N, each with Data Fields (Data Field.1, Data Field.2) and Attributes

Swaths: Swath_1 to Swath_N, each with Data Fields (Data Field.1, Data Field.2), Profile Fields (Profile Field.1, Profile Field.2), and Geolocation Fields (Latitude, Longitude, Time, Colatitude)

Zonal Averages

Points
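The grouping on this slide is a tree of groups and datasets, which can be sketched with nested Python dicts. Group and dataset names follow the slide's labels and are illustrative only; they are not taken from the HDF-EOS specification, and a real HDF-EOS file defines its own layout and structural metadata. Attributes are attached to groups rather than stored as child objects, so this sketch leaves them out of the path list.

```python
from typing import Dict, List

# A group maps child names to sub-trees; a dataset (leaf) is an empty dict.
Tree = Dict[str, "Tree"]


def grid_group() -> Tree:
    return {"Data Fields": {"Data Field.1": {}, "Data Field.2": {}}}


def swath_group() -> Tree:
    return {
        "Data Fields": {"Data Field.1": {}, "Data Field.2": {}},
        "Profile Fields": {"Profile Field.1": {}, "Profile Field.2": {}},
        "Geolocation Fields": {
            "Latitude": {}, "Longitude": {}, "Time": {}, "Colatitude": {},
        },
    }


def hdfeos_layout(n_grids: int, n_swaths: int) -> Tree:
    """Build the slide's layout; each grid/swath gets its own fresh sub-tree."""
    return {
        "Grids": {f"Grid_{i}": grid_group() for i in range(1, n_grids + 1)},
        "Swaths": {f"Swath_{i}": swath_group() for i in range(1, n_swaths + 1)},
        "Zonal Averages": {},
        "Points": {},
    }


def paths(tree: Tree, prefix: str = "") -> List[str]:
    """List every group/dataset path, HDF5-style, in depth-first order."""
    out: List[str] = []
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        out.append(path)
        out.extend(paths(child, path))
    return out


for p in paths(hdfeos_layout(n_grids=1, n_swaths=1)):
    print(p)
```

Each grid and swath is built by calling a factory function, so no two groups share the same mutable dict; adding a field to `Grid_1` does not silently add it to `Grid_2`.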


Page 5: Earth Science Platform


Conventions / History

Level 1 (1A): Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g., platform ephemeris) computed and appended but not applied to Level 0 data.

Level 2: Derived geophysical variables at the same resolution and location as Level 1 source data.

Level 3 / Model Results: Variables mapped on uniform space-time grid scales, usually with some completeness and consistency.

Figure: conventions (CF, HDF-EOS) versus processing level (1, 2, 3). HDF-EOS structures: Swath, Points, Grid, ZonalAverage. CF Feature Types: Points, Timeseries, Trajectory, Profile, TimeSeriesProfile, TrajectoryProfile. One region is marked "??".
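The CF feature types listed on this slide are declared in a file through the global attribute `featureType`, whose values CF spells `point`, `timeSeries`, `trajectory`, `profile`, `timeSeriesProfile`, and `trajectoryProfile`. The sketch below assumes the global attributes have already been read into a plain dict by some netCDF or HDF5 reader; the function name is hypothetical, and matching is exact.

```python
from typing import Mapping

# CF discrete sampling geometry feature types (values of `featureType`).
CF_FEATURE_TYPES = frozenset({
    "point", "timeSeries", "trajectory",
    "profile", "timeSeriesProfile", "trajectoryProfile",
})


def cf_feature_type(global_attrs: Mapping[str, object]) -> str:
    """Return the CF featureType, raising ValueError if missing or unrecognized."""
    value = global_attrs.get("featureType")
    if value is None:
        raise ValueError("missing featureType global attribute")
    if value not in CF_FEATURE_TYPES:
        raise ValueError(f"unrecognized featureType: {value!r}")
    return str(value)


print(cf_feature_type({"Conventions": "CF-1.6", "featureType": "timeSeries"}))
```

A gridded file typically carries no `featureType` at all, so a caller would treat the `ValueError` for a missing attribute as "not a discrete sampling geometry" rather than as a hard failure.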

Page 6: Earth Science Platform


Convention Governance


Community / Users vs. Operational Data Processing System

Page 7: Earth Science Platform


Community


Using HDF to share data? Tweet #HDFInside

Page 8: Earth Science Platform


Acknowledgements

This work was partially supported by NASA contract number NNG10HP02C.

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.
