WMO ET-ADRS, 23-25 April 2008: Hierarchical Data Format (HDF). Manuel Fuentes (ECMWF), Erdem Erdi (Turkish State Meteorological Service)


WMO ET-ADRS
Hierarchical Data Format (HDF)

Manuel Fuentes (ECMWF)

Erdem Erdi (Turkish State Meteorological Service)

Outline

Brief introduction to HDF

SWOT Analysis

Practical examples

Hierarchical Data Format: HDF

HDF is a file format

HDF files are self-described

HDF technologies at present include two data management formats (HDF4 and HDF5) and libraries, a modular data browser/editor, associated tools and utilities, and a conversion library

Both HDF4 and HDF5 were designed as general scientific formats, adaptable to virtually any scientific or engineering application, and both have also been used successfully in non-technical areas

HDF5 is particularly good at dealing with data where complexity and scalability are important

Features provided by HDF5 technology

Unlimited size, extensibility, and portability

General data model

Unlimited variety of datatypes

Flexible, efficient I/O

Flexible data storage

Data transformation and complex subsetting

HDF5: Unlimited size, extensibility, and portability

HDF5 does not limit the size of files or the size or number of objects in a file.

The HDF5 format and library are extensible and designed to evolve gracefully to satisfy new demands.

HDF5 functionality and data are portable across virtually all computing platforms, and the library is distributed with C, C++, Java, and Fortran 90 programming interfaces.

HDF5: General data model

The HDF5 data model supports complex data relationships and dependencies through its grouping and linking mechanisms.

HDF5 accommodates many common types of metadata and arbitrary user-defined metadata.

HDF5: Unlimited variety of datatypes

HDF5 supports a rich set of pre-defined datatypes as well as the creation of an unlimited variety of complex user-defined datatypes.

Datatype definitions can be shared among objects in an HDF file, providing a powerful and efficient mechanism for describing data.

Datatype definitions include information such as byte order (endianness), size, and floating point representation, to fully describe how the data is stored, ensuring portability to other platforms.
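The byte-order point can be made concrete with a minimal Python sketch. It uses only the standard `struct` module, not the HDF5 library, and the sample value is an arbitrary choice for illustration. It shows why a stored number is ambiguous unless the datatype records its byte order and size.

```python
import struct

value = 1013.25  # arbitrary sample value, exactly representable as a 32-bit float

# The same 4-byte IEEE 754 float, serialised in both byte orders.
big = struct.pack(">f", value)     # big-endian
little = struct.pack("<f", value)  # little-endian

# The two encodings contain the same bytes in reverse order.
assert big == little[::-1]

# Reading with the recorded byte order recovers the value on any platform.
decoded = struct.unpack(">f", big)[0]

# Reading with the wrong byte order silently yields a different number.
misread = struct.unpack("<f", big)[0]

print(decoded, misread)
```

A self-describing format avoids the second case by storing the byte order alongside the data, so the reader never has to guess.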

Data model and datatypes

HDF5: Flexible, efficient I/O

HDF5, through its virtual file layer, offers extremely flexible storage and data transfer capabilities. Standard (POSIX), parallel, and network I/O file drivers are provided with HDF5.

Application developers can write additional file drivers to implement customized data storage or transport capabilities.

The parallel I/O driver for HDF5 reduces access times on parallel systems by reading/writing multiple data streams simultaneously.

HDF5: Flexible data storage

HDF5 employs various compression, extensibility, and chunking strategies to improve access, management, and storage efficiency.

HDF5 provides for external storage of raw data, allowing raw data to be shared among HDF5 files and/or applications, and often saving disk space.
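The idea behind chunking combined with compression can be illustrated without the HDF5 library. The following plain-Python sketch uses the standard `zlib` module. The chunk size, the synthetic data and the helper names `write_chunks` and `read_value` are assumptions made for illustration, and the layout does not reflect how HDF5 actually stores chunks on disk.

```python
import struct
import zlib

CHUNK_VALUES = 1000  # values per chunk (illustrative choice)

def write_chunks(values):
    """Compress a list of 32-bit integers chunk by chunk."""
    chunks = []
    for start in range(0, len(values), CHUNK_VALUES):
        part = values[start:start + CHUNK_VALUES]
        raw = struct.pack(f"<{len(part)}i", *part)
        chunks.append(zlib.compress(raw))
    return chunks

def read_value(chunks, index):
    """Read one value by decompressing only the chunk that holds it."""
    chunk_no, offset = divmod(index, CHUNK_VALUES)
    raw = zlib.decompress(chunks[chunk_no])
    return struct.unpack_from("<i", raw, offset * 4)[0]

# Slowly varying synthetic data compresses well.
data = [i // 50 for i in range(10_000)]
chunks = write_chunks(data)

raw_size = len(data) * 4
stored_size = sum(len(c) for c in chunks)
print(f"raw: {raw_size} bytes, stored: {stored_size} bytes")
print(read_value(chunks, 4321))
```

Because each chunk is compressed independently, a reader that needs one value pays the decompression cost of a single chunk rather than of the whole dataset.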

HDF5: Data transformation and complex subsetting

HDF5 enables datatype and spatial transformation during I/O operations.

HDF5 data I/O functions can operate on selected subsets of the data, reducing transferred data volume and improving access speed.
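To illustrate why subset-aware I/O reduces transferred volume, here is a small sketch that reads a rectangular block from a row-major grid while touching only the bytes the block covers. It is plain Python over an in-memory buffer, not the HDF5 API. The grid dimensions and the helper `read_block` are hypothetical.

```python
import io
import struct

ROWS, COLS = 100, 200
ITEM = struct.calcsize("<f")  # 4 bytes per value

# Build an in-memory "file" holding a ROWS x COLS grid in row-major order.
grid_file = io.BytesIO()
for r in range(ROWS):
    row = [float(r * COLS + c) for c in range(COLS)]
    grid_file.write(struct.pack(f"<{COLS}f", *row))

def read_block(f, row0, nrows, col0, ncols):
    """Read a rectangular block, seeking past everything outside it."""
    block, bytes_read = [], 0
    for r in range(row0, row0 + nrows):
        f.seek((r * COLS + col0) * ITEM)
        raw = f.read(ncols * ITEM)
        bytes_read += len(raw)
        block.append(list(struct.unpack(f"<{ncols}f", raw)))
    return block, bytes_read

block, bytes_read = read_block(grid_file, row0=10, nrows=5, col0=20, ncols=8)
total = ROWS * COLS * ITEM
print(block[0][:3], f"{bytes_read} of {total} bytes read")
```

The block of 5 x 8 values requires 160 bytes of I/O out of an 80000-byte grid. That is the effect the slide describes, obtained here by hand rather than through a library selection mechanism.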

Governance: The HDF Group

The mission of The HDF Project is:

• to develop, promote, deploy and support open and free technologies that facilitate scientific data exchange, access, analysis, archiving and discovery

• to ensure long-term availability and support for HDF technologies and, by extension, long-term accessibility of data stored using HDF technologies

The HDF Group currently includes 15 full-time staff members and 3 to 5 students. The group’s annual budget is $2.1 million, most of which is provided by the government sector

Copyright

http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html

HDF5 (Hierarchical Data Format 5) Software Library and Utilities with Copyright 2006-2008 by The HDF Group (THG).

NCSA HDF5 (Hierarchical Data Format 5) Software Library and Utilities with Copyright 1998-2006 by the Board of Trustees of the University of Illinois.

All rights reserved.

Copyright (cont.)

Redistribution and use in source and binary forms, with or without modification, are permitted for any purpose (including commercial purposes) provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer

Redistributions in binary form must reproduce the above copyright notice (which is on the previous slide), this list of conditions, and the following disclaimer in the documentation and/or materials provided with the distribution

In addition, redistributions of modified forms of the source or binary code must carry prominent notices stating that the original code was changed and the date of the change

All publications or advertising materials mentioning features or use of this software are asked, but not required, to acknowledge that it was developed by The HDF Group and by the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign and credit the contributors

Neither the name of The HDF Group, the name of the University, nor the name of any Contributor may be used to endorse or promote products derived from this software without specific prior written permission from THG, the University, or the Contributor, respectively

DISCLAIMER: THIS SOFTWARE IS PROVIDED BY THE HDF GROUP (THG) AND THE CONTRIBUTORS "AS IS" WITH NO WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED. In no event shall THG or the Contributors be liable for any damages suffered by the users arising out of the use of this software, even if advised of the possibility of such damage

SWOT Analysis: Criteria

Ability to present information pertinent to WMO Programmes

Ability to encode textual information, such as warnings

Ability for usage in operational data exchanges

Ability for usage in transmission of information to users outside NMHSs

Ability for usage in storage systems by NMHSs, centres or other users

Compliance and status with regard to existing standards

Inter-operability, translation back and forth to other DRSs

Can it be used to envelop objects?

Available and widespread support (skills and technology)

SWOT: Present information pertinent to WMO

Ability/suitability to present information pertinent to WMO Programmes and Member needs including weather, climate, water, atmospheric constituents, oceanography, aviation and other related environmental information

Any 2-D or 3-D data from meteorology, hydrology or similar sciences can be handled

Used by satellite applications in meteorology because it is suitable for large and complex data

There are few tools for non-satellite meteorological data

It’s not clear how to handle millions of bulletins:

• Group bulletins into one big file

• 1 bulletin per file (minimum HDF5 file size: 2 Kbyte)
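A rough back-of-the-envelope calculation shows why the one-bulletin-per-file option is a concern. The sketch below treats the 2 Kbyte minimum file size quoted above as a floor on each file. The bulletin count and average bulletin size are hypothetical figures chosen only to illustrate the arithmetic.

```python
MIN_HDF5_FILE_BYTES = 2 * 1024  # minimum HDF5 file size quoted above

def storage_one_per_file(n_bulletins, bulletin_bytes):
    """Lower bound: each file occupies at least the minimum file size."""
    return n_bulletins * max(bulletin_bytes, MIN_HDF5_FILE_BYTES)

def payload_only(n_bulletins, bulletin_bytes):
    """Raw payload, i.e. the floor for any grouped layout."""
    return n_bulletins * bulletin_bytes

n, size = 1_000_000, 300  # hypothetical: one million 300-byte bulletins
separate = storage_one_per_file(n, size)
payload = payload_only(n, size)

print(f"one file each: >= {separate / 1e6:.0f} MB")
print(f"payload alone:    {payload / 1e6:.0f} MB")
print(f"ratio: {separate / payload:.1f}x")
```

Under these assumptions the per-file floor dominates the payload several times over, which is the motivation for grouping bulletins into larger files.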

SWOT: Present information for pictorial display

Ability/suitability to present information for pictorial display

HDF can store and present graphical data in 2 or 3 dimensions, in both raster and vector form. Tools can display the information graphically

SWOT: Encode textual information

Ability/suitability to encode textual information, such as warnings

HDF can store textual information of any length

Suitable for storing metadata

SWOT: Encode Metadata

SWOT: Usage in operational data exchanges

Ability/suitability for usage in operational data exchanges (real time or otherwise) between NMHSs and centres, including information on existing usage, especially its extent

EUMETSAT dissemination (EUMETCast) supports HDF as a delivery format

There is no common naming convention for satellite data. Only TERRA and AQUA share a naming convention, because the same team developed both satellites; otherwise each satellite has its own naming convention and order of elements in the file

There are 2 attempts to standardize satellite data in HDF:

• KNMI-HDF5: Special library for encoding

• HDF-EOS: TERRA, AQUA & Petabytes more

SWOT: Usage in operational data exchanges (cont.)

A significant portion of operational satellite-based meteorological data and products is distributed to the meteorological community in HDF, either in near real time or from archives.

Some examples are:

EUMETSAT SAF Products (NWC SAF, LAND SAF)

EUMETSAT EPS data

NASA EOS Data and products

SWOT: Transmission of information to users outside NMHSs

Ability/suitability for usage in transmission of information to users outside NMHSs or centres, including information on existing usage, especially its extent

HDF is widely used in scientific communities:

• Universities, research labs

• Space agencies (like NASA and EUMETSAT)

HDF is mainly used for satellite data

Use of HDF across a variety of disciplines and users:

• Encourages development of tools

• Makes it easy to use outside NMHSs

Software is publicly available, with supported tools

SWOT: Usage in storage systems

Ability/suitability for usage in storage systems by NMHSs, centres or other users, including information on existing usage, especially its extent

Parallel I/O

Machine independent

Compression

The HDF Group is committed to ensuring the long-term accessibility of HDF-stored data

EUMETSAT does not store data in HDF, but converts from the raw data

NASA archives the Earth Observing System data in HDF

“Grouping” of data at archiving time may impose restrictions on how data can be retrieved

SWOT: Standards

Compliance and status with regard to existing standards. Are they open standards? Which body oversees them? Is there any proprietary nature to them? Are they flexible enough to accommodate our current and foreseen needs? How are they updated, and is it a straightforward process?

The HDF format is very suitable for GIS (as it can handle both data and metadata in the same file). However, it is not widely used for GIS because of the lack of a convention (schema)

It is governed by The HDF Group

Compression: the SZIP method is proprietary; ZLIB is open

The HDF licence seems flexible

The library is updated regularly in a straightforward manner

SWOT: Interoperability

How suitable is the DRS to the WIS and to developing the appropriate metadata? Is existing documentation good? How much variance is there in current implementations? Are the existing flavours inter-operable?

HDF can meet the metadata requirements of the WIS

Documentation is good, with lots of examples

There are 2 implementations (HDF4 and HDF5):

• Tools exist to convert from HDF4 to HDF5

• No direct inter-operability between the 2 implementations

SWOT: Conversion to/from other DRS

What are the issues for translating back and forth to other DRSs?

Translation can be lossless when both sides use the same encoding method (the encoding is not standardised)

Encoding: offset and scale factor. Tools are not aware of the encoding

Compression can be used instead of encoding in order to avoid larger files

Compression is transparent for users

Native datatypes are 1, 2, 4 or 8 bytes in size
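The offset and scale factor encoding mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not an HDF or BUFR routine: the function names `pack` and `unpack`, the 16-bit target width, and the sample temperatures and packing parameters are all assumptions.

```python
def pack(values, offset, scale):
    """Encode floats as unsigned 16-bit integers: n = round((v - offset) / scale)."""
    packed = []
    for v in values:
        n = round((v - offset) / scale)
        if not 0 <= n <= 0xFFFF:
            raise ValueError(f"{v} does not fit in 16 bits with this offset/scale")
        packed.append(n)
    return packed

def unpack(packed, offset, scale):
    """Decode: v = offset + n * scale."""
    return [offset + n * scale for n in packed]

temps = [250.00, 273.15, 288.47, 301.99]  # hypothetical temperatures in kelvin
offset, scale = 200.0, 0.01               # hypothetical packing parameters

packed = pack(temps, offset, scale)
restored = unpack(packed, offset, scale)

# Re-encoding with the same offset and scale reproduces the same integers,
# which is why translation can be lossless when the encoding is shared.
repacked = pack(restored, offset, scale)

max_error = max(abs(a - b) for a, b in zip(temps, restored))
print(packed, repacked == packed, max_error)
```

A tool that is unaware of the encoding would see only the integers in `packed`. The offset and scale therefore have to travel with the data, for example as metadata, for the values to be interpreted correctly.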

SWOT: Envelope objects

Can they be used to envelop objects or act as a pseudo-carrier for other data formats?

HDF can handle/envelop any kind of data format, either binary or ASCII

HDF can handle BLOBs (stream of bytes)

SWOT: Support, skills and technology

Available and widespread support for the DRS (skills and technology)

The HDF Group’s commitment to:

• Support HDF

• Ensure long-term accessibility of the data

Established user community

SWOT Summary: Strengths

HDF5 can store and present 2-D or 3-D data (gridded fields), together with metadata

Rich set of predefined datatypes and data relationships

High performance features:

Parallel I/O

Unlimited dimensions

Compression

Unlimited size and amount of data

I/O functions can operate on subsets of data

Open data format and free software (libraries and tools)

Operational services (EUMETCast) support HDF

SWOT Summary: Weaknesses

HDF is a file format, as opposed to a message/bulletin format (like GRIB or BUFR)

There is no convention:

Names to use

Order in which to store elements

May not handle point observation data well

There aren’t many tools for meteorological data

Comparison with NetCDF:

HDF5’s general data model makes writing data more difficult than NetCDF

HDF5 will be the storage format for NetCDF4

SWOT Summary: Opportunities

Using HDF5 may improve inter-operability with other disciplines

Using HDF5 may improve usability of meteorological data outside NMHSs

Software is publicly available, with numerous general-purpose tools and programming-language interfaces

SWOT Summary: Threats

The HDF format is developed and maintained by a single group (The HDF Group). Any problem with funding could jeopardise the existence of the format or its support

Meteorology would be a very small community compared to other users of HDF. The requirements of the meteorological community may not carry much weight within the HDF community

Practical Examples

Data received at ECMWF, converted to BUFR, then used by the forecasting system:

HDF4

• Microwave brightness temperature from the Tropical Rainfall Measuring Mission (TRMM)

• Rainfall from the Tropical Rainfall Measuring Mission (TRMM)

HDF5

• METOP GOME-2 total column ozone data

• Aura OMI ozone data

Each data stream has its own HDF-to-BUFR conversion tool

Practical Examples (cont.)

EUMETSAT

HDF-EOS

HDF-KNMI

We haven’t found any examples of field observations (SYNOP, METAR, radiosondes, ...)

Practical Example: MODIS swath data from channel 3

Practical Example: 3-D data

Schwarzschild metric (spatial components only)

Practical Example: MSG total precipitable water

Practical Example: SAF cloudtype product

Conclusion

HDF5 is going to be the base for NetCDF4. It makes more sense to focus on NetCDF4 than on HDF5

Support for subsetting

Parallel I/O

Unlimited dimensions

Compression

Remove current limitations on file size

HDF is a file format as opposed to GRIB/BUFR which are message (or bulletin) formats

HDF might not be suitable for operational exchange of meteorological data between NMHSs, but it may be suitable for presenting meteorological information to other users and disciplines