
CAGES/Data Workshop

MAGS Data Session

Primer Documents

• Descriptions of Various Data Formats

• MAGS Data Archive Plan

• MAGS “Legacy” Archive Plan

• MAGS Data Access Policy

• Partnership with Mackenzie GEWEX Study (MAGS)

• MAGS Data Documentation Guidelines

• MAGS Documentation and Archive Guidelines: Clarification for Models


MAGS Archive

• Descriptions of Various Data Formats

• MAGS Data Archive Plan


Descriptions of Various Data Formats:

Proprietary:

RPN Standard
The standard model format used by RPN. This format is largely proprietary and the necessary libraries are available only on a limited selection of supported platforms (SGI, HP/UX, Linux). This is the format in which the CMC GEWEX (RFE/GEM) archives reside.

Converters:
fst2idl - interfaces RPN standard files with IDL [must be on a supported platform]
fst2dir - converts an RPN standard file into an ASCII directory and a myriad of small data files
Some converters to ASCII format exist as well.

CCRN
This is the CCRN format used by the GCM and CRCM. The data format is open.

Converters:
Some converters exist at CMC between the RPN and CCRN formats.

Widely Supported Meteorological Formats:
(from: http://rsd-www.nrl.navy.mil/7214/data_fmt.html and http://www.cgd.ucar.edu/cas/tn404/text/tn404_5.html)

BUFR
The World Meteorological Organization (WMO) code form FM 94 BUFR (Binary Universal Form for the Representation of meteorological data) is a binary code designed to represent, employing a continuous binary stream, any meteorological data. There is, however, nothing uniquely meteorological about BUFR. The meteorological emphasis is the result of the origin of the code. The code form may be applied to any numerical or qualitative data type.

The form can get quite complex, which allows a compact representation of the information. Because the BUFR format can handle a wide variety of data set types (radiosondes, surface obs, custom data sets, etc.), special-purpose readers are often developed for specific data types. This is not because of differences in reading (decoding) the messages, but because of different methods of handling the data once they are decoded. General-purpose programs exist for creating ASCII dumps of a message's contents.

The standard describes the structure of a binary message, and as such describes the file format. What it does not specify is how multiple messages may be combined into one file. The practice in the user community is to concatenate messages together, and decoder libraries will process this file structure. The standard also does not distinguish between files created with FORTRAN and with C. FORTRAN-created files contain header and footer records while C-created files do not, owing to differences between the I/O conventions of the two languages. However, most decoder libraries take this into account and can read messages created with either FORTRAN or C.

The Naval Research Laboratory has developed a BUFR library written in C. The current version can be used to develop programs to create and decode BUFR messages using the commonly accepted BUFR practices. It also makes use of external master and local tables. Some features of the BUFR specification are not implemented in this or other available BUFR libraries.

A BUFR-to-CDF converter exists, but it is Sun (4.1.1) and VAX specific [information dated: 1995].
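For orientation only, here is a minimal sketch of decoding a single BUFR message with the ecCodes Python bindings, a modern library that is not part of the material above; the file name and the "airTemperature" key are illustrative assumptions, not MAGS specifics.

    # Sketch: decode one BUFR message with ecCodes (assumed installed);
    # "obs.bufr" and "airTemperature" are placeholder names.
    from eccodes import (codes_bufr_new_from_file, codes_set,
                         codes_get_array, codes_release)

    with open("obs.bufr", "rb") as f:
        msg = codes_bufr_new_from_file(f)               # read the next message
        codes_set(msg, "unpack", 1)                     # expand the packed data section
        temps = codes_get_array(msg, "airTemperature")  # values by element name
        print(temps)
        codes_release(msg)                              # free the message handle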


Common Data Format (CDF)
CDF (Common Data Format) was initially developed by NASA Goddard about 1980 as an interface and a toolkit for archival and access to multidimensional data on a VAX using VMS FORTRAN. Over the years it has evolved into a machine-independent standard and is often used by different NASA groups for storing space and earth science data.

Supported by IDL.
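To give a concrete feel for the CDF access model in a scripting environment, the sketch below reads a CDF file with the third-party cdflib Python package; the package, the file name and the variable name are assumptions for illustration, not part of the original primer.

    # Sketch: read one variable and its attributes from a NASA CDF file
    # using cdflib (assumed installed); names are placeholders.
    import cdflib

    cdf = cdflib.CDF("example.cdf")
    print(cdf.cdf_info())                  # variables, attributes, encoding
    data = cdf.varget("Temperature")       # one variable as a NumPy array
    attrs = cdf.varattsget("Temperature")  # its attached attributes (units, etc.)
    print(data.shape, attrs)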

GRIB
GRIB is the World Meteorological Organization (WMO) format for GRIdded Binary data. It is the standard used by two of the world's largest operational meteorological centers (NMC and ECMWF). It is widely used in the international meteorological community, the NWS, FNMOC, and AFGWC, for the interchange of gridded data fields. The standard allows for local extensions, which are typically used within a specified group (e.g., the NWS has local extensions that it uses within its organization). Messages exchanged between centers do not use these local extensions unless prior arrangements have been made, in which case the local extensions to the tables must be distributed prior to their use.

Converters:
IDL subroutines to read GRIB are available
Various readers and converters are available in the public domain
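As one concrete example in the spirit of the public-domain readers mentioned above, the following sketch lists and reads GRIB messages with the pygrib Python package; the package, the file name and the field name are assumptions for illustration only.

    # Sketch: inventory a GRIB file and decode one field with pygrib
    # (assumed installed); "forecast.grib" and "Temperature" are placeholders.
    import pygrib

    grbs = pygrib.open("forecast.grib")
    for grb in grbs:                          # each message is one 2-D gridded field
        print(grb)                            # inventory line: name, level, date
    grbs.seek(0)
    grb = grbs.select(name="Temperature")[0]
    values = grb.values                       # decoded 2-D grid
    lats, lons = grb.latlons()                # matching coordinate arrays
    print(values.shape, lats.min(), lats.max())
    grbs.close()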

Hierarchical Data Format (HDF)
HDF (Hierarchical Data Format) is a general, extensible scientific data exchange format created by NCSA. There are two main versions: HDF4 and HDF5 (introduced in 1999). HDF4 is incompatible with HDF5. HDF emphasizes a single common format for the data, on which many interfaces can be built. A netCDF interface to HDF4 is provided, but there is no support for mixing HDF and netCDF structures. In other words, HDF4 software can read HDF and netCDF but can only write in HDF4. Both HDF4 and HDF5 are more flexible than netCDF but are also more complicated. HDF is often used to archive and transmit raster images. An extension of this format (HDF-EOS) is used for EOS satellite data.

Converters:
Supported by IDL and other commercial graphics suites
NetCDF2HDF - converts NetCDF files to HDF
HDF2JPG - converts HDF files to JPG images
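The hierarchical grouping, attached attributes and built-in compression described above can be seen in the small sketch below, which uses HDF5 via the h5py Python package rather than HDF4; the file, group and dataset names are made up for illustration.

    # Sketch: write and re-read a small HDF5 file with h5py (assumed installed).
    import numpy as np
    import h5py

    with h5py.File("swath.h5", "w") as f:
        grp = f.create_group("geophysical")                   # hierarchical grouping
        dset = grp.create_dataset("brightness_temp",
                                  data=np.random.rand(10, 20),
                                  compression="gzip")         # built-in compression
        dset.attrs["units"] = "K"                             # attached metadata

    with h5py.File("swath.h5", "r") as f:
        dset = f["geophysical/brightness_temp"]
        print(dset.shape, dset.attrs["units"])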

Network Common Data Format (NetCDF)
NetCDF (Network Common Data Form) is an interface for scientific data access which implements a machine-independent, self-describing, extendible file format. NetCDF was originally developed by Unidata for the storage and exchange of data within the space and earth science communities. It is based on the earlier work on the Common Data Form (CDF). Strangely, netCDF is not compatible with the NASA CDF and no translation software currently exists. (At the time of this publication [1995], NASA is working on a new CDF which should be compatible with netCDF.) netCDF emphasizes a single common interface to data, implemented on top of an architecture-independent representation. Access to data within netCDF files is determined by user-written software. Access can be random or sequential. The only kind of data structure directly supported by the netCDF abstraction is a collection of multidimensional variables with attached vector attributes. NetCDF is not particularly well suited for storing linked lists, trees, sparse matrices or other kinds of data structures requiring pointers.
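A minimal sketch of the netCDF abstraction just described (named dimensions, multidimensional variables with attached attributes), written with the netCDF4 Python package; the package and all dimension, variable and attribute names are assumptions for illustration.

    # Sketch: create and re-read a small netCDF file with the netCDF4 package.
    import numpy as np
    from netCDF4 import Dataset

    with Dataset("basin.nc", "w") as ds:
        ds.createDimension("time", None)           # named, unlimited dimension
        ds.createDimension("station", 5)
        temp = ds.createVariable("temp", "f4", ("time", "station"))
        temp.units = "K"                            # attached attribute
        temp[0, :] = np.full(5, 273.15)

    with Dataset("basin.nc") as ds:
        print(list(ds.dimensions), ds["temp"].shape, ds["temp"].units)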


Converters:
Supported by IDL
ncdump / ncgen - netCDF <--> ASCII translators
MexCDF - interface for MATLAB
MexEPS - interface for MATLAB
ReadNC - module for SGI/NAG Iris Explorer
grads2nc - converts GrADS data files to netCDF
gribtonc - a GRIB to netCDF converter (part of Unidata's "decoders" package)
nc2v5d - U. Wis. is working on a converter to allow Vis5D to work with netCDF data
DDI - conversion between netCDF, DRS, HDF, AVS, Explorer

Less Common Data Formats:

Spatial Data Transfer Standard (SDTS)
The SDTS is a language for communicating spatial information. U.S. Federal agencies developed it to allow agencies to share spatial data among applications which use different hardware, software, and operating systems. It was adopted in 1992 as Federal Information Processing Standard (FIPS) 173, is the standard for spatial data exchange for U.S. Federal Government agencies, and is now ISO 8211. The SDTS specifies exchange constructs, addressing formats, structure, and content for spatially referenced vector and raster (including gridded) data.

IDL subroutines to read are available.


Features of Common Scientific Data Formats; Access Information
(http://www.cgd.ucar.edu/cas/tn404/text/tn404_app-b.html)

Table B.1: Comparison of three scientific-data-management systems

Feature | CDF | HDF | netCDF
Languages supported | C, f77 | C, f77, f90, C++ | C, f77, f90, C++
Inherent data types | char, short, int, float, double | byte, short, long, float, double, string | byte, char, short, long, float, double
User-definable types | no | yes | no
Data-conversion method | XDR, native | XDR, native | XDR
Maximum array dimension | 10 | unlimited | 32
Extended array dimension | yes | yes | yes
Hyperset access | yes | yes | yes
User-definable attributes | yes | yes | yes
Attribute types | any | any | any
Named dimensions | no | yes | yes
Array-index ordering | row, column | row, column | row, column
Shareability | yes | yes | yes
Compression | no | yes | no
Supporting tools | yes | many | ncdump, ncgen, a few others

Adapted from: Brown et al., 1993: "Software for Portable Scientific Data Management," Computers in Physics, 7, 304-308.

GRIB is not a "tool" for scientific data management. It is designed for efficient archival and transmission of two-dimensional gridded arrays. It is a flat file format which is "quasi-self-describing" (a table look-up procedure is used). Access would be sequential. It is used for archival by the world's largest operational meteorological centers (NMC and ECMWF).

Table B.2: More CDF, netCDF, HDF, GRIB information

Format | ftp | URL
CDF | nssdca.gsfc.nasa.gov (cd pub/cdf) | http://nssdc.gsfc.nasa.gov/cdf/cdf
netCDF | unidata.ucar.edu (cd pub/netcdf) | http://www.unidata.ucar.edu/packages/netcdf
HDF | ftp.ncsa.uiuc.edu (cd HDF) | http://hdf.ncsa.uiuc.edu:8001/
GRIB | ncardata.ucar.edu (cd libraries/grib) | ftp://ncardata.ucar.edu/libraries/grib


[The following comparison is a bit biased as it comes from a CDF site, but there is still some interesting information]
(from: http://nssdc.gsfc.nasa.gov/cdf/html/FAQ.html)

What are the differences between CDF and netCDF, and CDF and HDF?

The differences between the following formats are based on a high-level overview of each. The information was obtained from various publicly available documentation and articles. To the best of our knowledge, the information below is accurate, although the accuracy may deviate from time to time depending on the timing of new releases. The best and most complete way to evaluate what package best fulfills your requirements is to acquire a copy of the documentation and software from each institution and examine them thoroughly.

CDF vs. netCDF

CDF was designed and developed in 1985 by the National Space Science Data Center (NSSDC) at NASA/GSFC. CDF was originally written in FORTRAN and only available in the VAX/VMS environment. NetCDF was developed a few years later by the National Center for Atmospheric Research (NCAR). The netCDF model was based on the CDF conceptual model but provided a number of additional features (such as C language bindings, portability to a number of platforms, a machine-independent data format, etc.). Today both models and the existing software have matured substantially and are quite similar in most respects, although they do differ in the following ways:

Although the interfaces provide the same basic functionality, they differ syntactically. (See the users guides for details.)

NetCDF supports named dimensions (i.e., TEMP[x, y, ...]) whereas CDF utilizes the traditional logical (i.e., TEMP[true, true, ...]) method of indicating dimensionality.

CDF supports both multi-file and single-file filing systems, whereas netCDF supports only single-file filing systems.

CDF software can transparently access data files in any encoding currently supported by the CDF library (for example, a CDF application running on a Sun can read and write data encoded in a VAX format), in addition to the machine-independent (XDR) encoding. netCDF software reads and writes data only in the XDR encoding.

The CDF library supports an internal caching algorithm in which the user can make modifications (if so desired) to tweak performance.

The netCDF data object is currently accessible via the HDF software; CDF is not.

As part of the CDF distribution, there exist a number of easy-to-use tools and utilities that enable the user to edit, browse, list, prototype, subset, export to ASCII, compare, etc. the contents of CDF data files.

All CDF development (ports, new features, etc.) and user support activities are centralized.

CDF vs. HDF

CDF is a scientific data management software package and format based on a multidimensional (array) model. HDF is a Hierarchical Data Format developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. The HDF data model is based on the hierarchical relationship and dependencies among data. Although the two models differ significantly (in many ways like comparing apples to oranges) in their level of abstraction and in the way their inherent structures are defined and accessed, there exists a large overlap in the types of scientific data that each can support. Some of the obvious differences are as follows:

The HDF structure is based on a tagged format, storing tag identifiers (i.e., utility, raster image, scientific data set, and Vgroup/Vdata tags) for each inherent data object. The basic structure of HDF consists of an index with the tags of the objects in the file, pointers to the data associated with the tags, and the data themselves. The CDF structure is based on variable definitions (name, data type, number of dimensions, sizes, etc.) where a collection of data elements is defined in terms of a variable. The structure of CDF allows one to define an unlimited number of variables that are completely independent (loosely coupled) of one another and disparate in nature, a group of variables that illustrate a strong dependency (tightly coupled) on one another, or both simultaneously. In addition, CDF supports extensive metadata capabilities (called attributes), which enable the user to define further the contents of a CDF file.

HDF supports a set of interface routines for each supported object type (Raster Image, Palettes, Scientific Data Sets, Annotation, Vset, and Vgroup). CDF supports two interfaces from which a CDF file can be accessed: the Internal Interface and the Standard Interface. The Internal Interface is very robust and consists of one variable-argument subroutine call that enables a user to utilize all the functionality supported by the CDF software. The Standard Interface is built on top of the Internal Interface and consists of 23 subroutine calls with a fixed argument list. The Standard Interface provides a mechanism by which novice programmers can quickly and easily create a CDF data file.

HDF currently offers some compression for storing certain types of data objects, such as images. CDF supports compression of any data type with a choice of run-length encoding, Huffman, adaptive Huffman, and Gnu's ZIP algorithms.

CDF supports an internal cache whose size the user can modify through the Internal Interface to enhance performance on specific machines.

HDF data files are difficult to update. Data records are physically stored in a contiguous fashion; therefore, if a data record needs to be extended it usually means that the entire file has to be rewritten. CDF maintains an internal directory of pointers for all the variables in a CDF file and does not require all the data elements for a given variable to be contiguous. Therefore, existing variables can be extended, modified, and deleted, and new variables added to the existing file.

In the late 1980s the CDF software was redesigned and rewritten (CDF 2.0) in C. With little or no impact on performance, the redesign provided an open framework that could be easily extended to incorporate new functionality and features when needed. CDF is currently at Version 2.7, and performance has been enhanced significantly.

CDF supports both host encoding and the machine-independent (XDR) encoding. In addition, the CDF software can transparently access data files in any encoding currently supported by the CDF library (for example, a CDF application running on a Sun can read and write data encoded in a VAX format). HDF supports both host encoding and the machine-independent (XDR) encoding.


MAGS Data Archive Plan

To ensure the success of MAGS, careful attention must be given to the preservation of the data sets. The attached table has been compiled to show the data sets being collected, the contact person, and the location and availability of the data sets. This information is necessary:

• to ensure that the data are archived,
• to facilitate the exchange of data within MAGS,
• to provide the basis for a compilation CD-ROM archive of the CAGES observations.

Additions, corrections or comments regarding the information in the table are welcome.

Observation | Location | Contact | Year(s) | Format | Archive Location | Time Period

Satellite
  NOAA IR | Basin | Bussieres | 94-95, 98-99 | images / raw data | on line / off line | continuous
  NOAA Visible | Basin | Bussieres | 94-95, 98-99 | images / raw data | on line / off line | continuous
  SPOT | Basin | Marsh | 98-99 | images | off line | continuous
  GOES IR | Basin | Kochtubajda/Crawford | 98-99 | images | on line / off line | continuous
  GOES Visible | Basin | Kochtubajda/Crawford | 98-99 | images | on line / off line | continuous

Upper Air
  Canadian Operational | Fort Nelson | Strong/Proctor/Crawford | 94-95, 98-99 | ASCII/binary | on line | continuous/IOP
  Canadian Operational | Fort Smith | Strong/Proctor/Crawford | 94-95, 98-99 | ASCII/binary | on line | continuous/IOP
  Canadian Operational | Inuvik | Strong/Proctor/Crawford | 94-95, 98-99 | ASCII/binary | on line | continuous/IOP
  Canadian Operational | Norman Wells | Strong/Proctor/Crawford | 94-95, 98-99 | ASCII/binary | on line | continuous/IOP
  Canadian Operational | Whitehorse | Strong/Proctor/Crawford | 94-95, 98-99 | ASCII/binary | on line | continuous/IOP
  MAGS | Fort Simpson | Strong/Proctor | 98-99 | ASCII/binary | on line | IOP
  BASE | Tuktoyaktuk | Crawford | 94-95 | ASCII/binary | on line | IOP
  GPS | Fort Smith | Strong/Proctor | 98-99 | ASCII/binary | on line | IOP
  American Operational | Yakutat, AK | Gyakum | 94-95, 98-99 | ASCII/binary | on line | IOP

Aircraft
  MAGS | Twin Otter | Schuepp | 98-99 | ASCII/binary | on line | IOP
  BASE | Convair | Crawford | 94-95 | ASCII/binary | on line | IOP

Synoptic Charts | Basin | Burford/Hudson | 94-95, 98-99 | hardcopy | on line | continuous

Radar
  Operational | Carvel/Spirit River | Hudak | 98-99 | images / raw data | on line | continuous
  MAGS | Fort Simpson | Hudak/Currie | 98-99 | images / raw data | on line | IOP
  BASE | Inuvik | Hudak | 94-95 | images / raw data | on line | IOP
  BASE | Tuktoyaktuk | Crawford/Asuma | 94-95 | images / raw data | on line | IOP

Surface Stations
  Operational | (many) | Crawford | 94-95, 98-99 | ASCII | on line | continuous
  MAGS | Fort Good Hope | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Carp Lake | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Fort Simpson A | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Lindberg Landing | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Watson Lake | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | MacMillan Pass | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Dease Lake | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Inuvik U/A | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Inner Whalebacks | Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Trail Valley Creek | Marsh/Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Havikpak | Marsh/Kochtubajda | 98-99 | ASCII | on line | continuous
  MAGS | Jean Marie | Pietroniro/Kochtubajda | 98-99 | ASCII | on line | continuous
  BASE | Inuvik U/A | Crawford/Agnew | 94-95 | ASCII | on line | continuous
  BASE | Inuvik Townsite | Crawford/Agnew | 94-95 | ASCII | on line | continuous
  BASE | Tuktoyaktuk | Crawford/Agnew | 94-95 | ASCII | on line | continuous
  | Norman Wells | Eley | 94-95, 98-99 | ASCII | on line | continuous

Gridded T & Precipitation | Basin | Hogg | 94-95, 98-99 | ASCII | on line | continuous
Snow Courses | (many) | Walker | 94-95, 98-99 | ASCII | on line | IOP
Sea Ice | Beaufort Sea | Agnew | 94-95, 98-99 | images | on line | IOP
Research Basins | Inuvik | Marsh/Prowse | 94-95, 98-99 | ASCII | off line | continuous
Research Basins | Manners Creek | Marsh/Prowse | 94-95, 98-99 | ASCII | off line | continuous
Orographic Precipitation | Fort Nelson | Proctor | 98-99 | ASCII | on line | IOP
Snow Melt | | Marsh | 98-99 | ASCII | off line | IOP
Blowing Snow | | Pomeroy | 98-99 | ASCII | off line | IOP
Evaporation | HYDRA | Rouse | 98-99 | ASCII | on line | IOP
Evaporation | Buoy #45141 | Rouse | 98-99 | ASCII | off line | IOP
Evaporation | Hexoid | Rouse | 98-99 | ASCII | off line | IOP
Evaporation/Radiation | Great Slave Lake | Schertzer | 98-99 | ASCII | off line | IOP
Thermistor Chain | Great Slave Lake | Rouse | 98-99 | ASCII | off line | IOP
Isotope | | Gibson | 98-99 | ASCII | off line | IOP
Discharge | (many) | Soulis | 94-95, 98-99 | ASCII | off line | continuous/IOP

Water Balance
  Snow Survey | Yellowknife | Spence | 98-99 | ASCII | off line | IOP
  Hydrology Site | Yellowknife | Spence | 98-99 | ASCII | off line | IOP
  Gauge Data | Yellowknife | Spence | 98-99 | ASCII | off line | IOP

Lightning | Basin | Kochtubajda | 94-95, 98-99 | ASCII | off line | continuous

Models
  GCM | Basin/Global | Hogue | 94-95, 98-99 | binary | off line | continuous
  RCM | Basin/Regional | Mackay | 94-95, 98-99 | binary | off line | continuous/IOP
  MC2 | Basin/Regional | Yau | 94-95, 98-99 | binary | off line | IOP
  RFE/GEM | Basin/Global | Hogue | 94-95, 98-99 | binary | on line | continuous
  MOLTS | (many) | Hogue | 94-95, 98-99 | binary | on line | continuous
  ECMWF | Basin/Global | | | binary | off line | continuous
  ETA | Basin/Global | | | binary | off line | continuous
  CLASS | Basin | Verseghy | 94-95, 98-99 | binary | off line | continuous/IOP
  WATFLOOD | Basin | Soulis | 94-95, 98-99 | binary | off line | continuous/IOP
  SLURP | Basin | | | binary | off line | continuous/IOP


MAGS “Legacy” Datasets

• MAGS “Legacy” Archive Plan


MAGS “Legacy” Archive Plan

The enhanced observations and data collected during MAGS will be archived in three series of CD-ROMs detailing:

• Water-Years
• Case Studies
• Processes and Individual Phenomena

Additions, corrections or comments regarding the information in the table are welcome.

MAGS Permanent (“Legacy”) Archives:

Series | Volume | Version | Title | Status
Series I - Processes and Individual Phenomena | 1 | 1.0 | Vertical Profiles through Precipitating Clouds | released September 1998
Series I | 2 | | Mackenzie Basin DEMs | in production
Series I | 3 | | Mackenzie Basin Land Cover | in production
Series I | 4 | 1.0 | MAGS Satellite Observations | released (27+ CDs)
Series I | 5 | | Water Vapour Diurnal Cycle in MAGS | in production
Series I | 6 | | IPIX Radar Observations | in production (173 CDs)
Series I | 7 | | MAGS Enhanced Surface Observations | in planning
Series I | 8 | 1.0 | Discharge Measurements During the 1999 Breakup Period | released October 2000
Series II - Case Studies | 1 | 2.0 | BASE Observations September 30 1994 | released September 1998
Series II | 2 | 1.0 | CAGES Case Study Observations | to be identified:
  • Spring IOP Hailstorm
  • others??
Series III - Water Years | 1 | | 1994-1995 | in planning
Series III | 2 | | 1998-1999 | in planning


MAGS Policies & Procedures

• MAGS Data Access Policy

• Partnership with Mackenzie GEWEX Study (MAGS)

• MAGS Data Documentation Guidelines

• MAGS Documentation and Archive Guidelines: Clarification for Models


MAGS Data Access Policy
Tuesday, February 08, 2000

Originally adopted: November 24, 1997

Revisions Accepted: February 15, 2000

1. INTRODUCTION

The Mackenzie GEWEX Study (MAGS) Data Access Policy has been established to promote and govern access to data collected within the MAGS research area by MAGS Principal Investigators (PIs), MAGS Co-Investigators (CIs), non-participants, and other Canadian and international groups.

MAGS embraces as open an approach as possible to the exchange of and access to data for scientific and non-commercial/non-profit uses. This approach must respect the rights of the data originators, who have invested considerable effort in obtaining and/or generating data. For this reason, policies for those participating in MAGS are different from those for groups not participating in or contributing to the project. These policies are described in the following sections. All relevant terms are defined in Section 7.

This data access policy must be reviewed on an annual basis. At that time, the policy may be modified to improve its usefulness.

2. REQUESTS FROM MAGS INVESTIGATORS

Requests from MAGS PIs and CIs will be given priority over those from non-participants in the project. Access to datasets for MAGS research will be unrestricted for MAGS PIs and CIs after the data have been quality-controlled and documented by the originating MAGS PI. MAGS CIs will direct their data requests through their associated MAGS PI.

2.1 Special MAGS Datasets

Special MAGS-funded datasets (observations and model results) can be obtained by MAGS PIs through the MAGS WWW home page. To ensure that only MAGS PIs and CIs have access, this portion of the MAGS WWW home page will be password protected. Passwords will be supplied by the MAGS Data Manager. MAGS datasets can also be obtained directly from the originating MAGS PI.

In accordance with the Canadian GEWEX Data Policy (Appendix A), MAGS PIs also have a responsibility to make their MAGS-funded datasets openly available to other MAGS PIs and CIs after an initial reasonable period for quality control and documentation (following the MAGS Data Documentation Guidelines). This period will nominally be no longer than one year unless there are special circumstances (for example, if early release of data would jeopardize a graduate thesis) and an extension is granted by the Science Committee. After this quality control period, use of the data will be restricted to other MAGS PIs and CIs for a period of one year, after which the data must be made publicly available. For this reason, MAGS discourages PIs and CIs from seeking additional funding from sources that require proprietary control over data access.

2.2 Operational Datasets [EC/MSC Climate Datasets]

Operational datasets from the Environment Canada Meteorological Service of Canada (MSC, formerly known as the Atmospheric Environment Program) climate archive will be made freely available only to MAGS PIs and their associated CIs for use in their MAGS research. Use of data obtained from the climate archive is governed by the MSC Data Product Licence Agreement (Appendix B). MAGS PIs and CIs are urged to read this agreement carefully. Access to the data will only be available from a password protected portion of the MAGS WWW home page when accessed from a MAGS PI or CI’s Internet Domain. This will ensure that only MAGS PIs and CIs have free access to the operational datasets. Passwords will be assigned by the MAGS Data Manager.

3. REQUESTS FROM NON-PARTICIPANTS

Requests for data from the research community at large are invited, but requests by non-participants would need to be approved on a case-by-case basis. Non-participants may become MAGS CIs by teaming with a MAGS PI, who will subsequently apply to the MAGS Science Committee for permission to allow the new MAGS CI.

3.1 Special MAGS Datasets

Special MAGS datasets (observational and modelling results) will only be made available to non-participants after a one-year period in which MAGS PIs and CIs have exclusive access. This period will begin after the MAGS dataset has been quality controlled by the originator.

3.2 Operational Datasets [EC/MSC Climate Datasets]

Operational datasets from the Environment Canada MSC climate archive will not be supplied to non-participants by MAGS. These datasets can be obtained from the Climate Information Branch of Environment Canada. Similarly, archived model output data may be obtained from the Canadian Meteorological Centre. Charges for the data and additional data use restrictions may apply.


4. REQUESTS FROM COMMERCIAL ENTITIES

Permission for anyone (MAGS Principal Investigators, MAGS Co-Investigators or non-participants) to use MAGS datasets for commercial/profitable purposes must be given by the MAGS Science Committee.

5. REQUESTS FROM OTHER NATIONAL AND INTERNATIONAL GROUPS

National and international commitments for sharing of MAGS datasets will be considered on a case-by-case basis by the MAGS Science Committee.

6. ACKNOWLEDGMENT

Any use of MAGS datasets, by MAGS participants (PIs or CIs) or non-participants, requires acknowledgment of the MAGS project in any resulting publications and, when applicable, acknowledgment of the MAGS PI responsible for generating the dataset, with references to earlier studies for which the data were obtained.

7. DEFINITIONS

7.1 MAGS Dataset

A MAGS dataset can be:

• observational, either station data or gridded data, resulting from MAGS-funded work, or
• model-derived output, either utilizing MAGS special observations or resulting from MAGS-funded work.

7.2 MAGS Principal Investigator (PI)

A MAGS Principal Investigator (PI) is defined as one who:

• contributes scientifically to MAGS, and
• receives funding from MAGS.

7.3 MAGS Co-Investigator (CI)

A MAGS Co-Investigator (CI) is defined as one who:

• is working with a MAGS PI, and
• has been recognized as such by the MAGS Science Committee.

7.4 Non-Participant

Non-participating researchers include any researchers who are neither MAGS Principal Investigators nor MAGS Co-Investigators.


Appendix A

The Canadian GEWEX Data Policy

The Canadian GEWEX Programme data policy statement is intended to be used as a basis for establishing the national policy concerning access to Canadian GEWEX data, and to promote internationally the mutual access to GEWEX data where appropriate. It was adopted by the Canadian Global Energy and Water Cycle Experiment Science Committee on March 9, 1993. The policy statement is as follows:

1. The Canadian GEWEX Programme requires an early and continuing commitment to the establishment, maintenance, validation, description, accessibility, and distribution of high-quality, long-term data sets.

2. Full and open sharing of the full suite of global data sets for all Canadian GEWEX researchers is a fundamental objective.

3. Preservation of all data needed to meet Canadian GEWEX research objectives is required. A clearinghouse process should be established to prevent the purging and loss of important data sets.

4. Data archives must include easily accessible information about the data holdings, including quality assessments, supporting ancillary information, and guidance and aids for locating and obtaining the data.

5. National and international standards should be used to the greatest extent possible for media and for processing and communication of global data sets.

6. Data should be provided at the lowest possible cost to Canadian GEWEX researchers in the interest of full and open access. This cost should, as a first principle, be no more than the cost of reproduction and distribution. Agencies should act to streamline administrative arrangements for exchanging data among researchers.

7. Data, whether standard network data or special data collected by projects supported by Canadian GEWEX, should be made openly available beyond an initial reasonable period for quality control.


Appendix B

Limited use software and data product licence agreement for Meteorological Service of Canada supported research projects

The Government of Canada (Environment Canada) is the owner of all intellectual property rights (including copyright) in this software and data product. You are granted a non-exclusive, non-assignable and non-transferrable licence to use this software and data product subject to the terms below.

This software and data product are temporarily provided to you by Environment Canada for the sole purpose of carrying out the Meteorological Service of Canada supported research project entitled:

MACKENZIE GEWEX STUDY:______________________________________________________.

As such, this software and data product constitutes a contribution by Environment Canada to the above-noted research project. Environment Canada shall be cited as a contributor in any publications resulting from this project wherein the products were used.

This software and data product remain the property of Environment Canada. Upon the completion of the project, or in the event you cease to be involved in the project, the product shall be returned to Environment Canada and any temporary copies of the product destroyed.

This licence is not a sale of any or all of the owner's rights. This product may only be used by you for the Atmospheric Environment Program supported research project identified above, and you may not rent, lease, lend, sub-license or transfer the data product or any of your rights under this agreement to anyone else. You may not develop for commercial sale any other product derived from this product.

You may not transfer this product to or store it in any electronic network for use by more than one user unless you obtain prior written permission from Environment Canada.

This product is provided "as-is", and the owner makes no warranty, either express or implied, including but not limited to warranties of merchantability and fitness for a particular purpose. In no event will the owner be liable for any indirect, special, consequential or other similar damages. If you fail to comply with any term of this agreement you may be liable to the licensor for breach of contract.

It is YOUR RESPONSIBILITY to ensure that your use of this product complies with these terms and to seek prior written permission from Environment Canada for any uses not permitted or not specified in this agreement.

ANY USE WHATSOEVER OF THIS SOFTWARE AND DATA PRODUCT SHALL CONSTITUTE YOUR ACCEPTANCE OF THE TERMS OF THIS AGREEMENT.

FOR FURTHER INFORMATION PLEASE CONTACT:

Climate and Water Products Division
Environment Canada
4905 Dufferin Street
Downsview, Ontario
M3H 5T4
Tel: (416) 739-4328
Fax: (416) 739-4446


Partnership with Mackenzie GEWEX Study (MAGS)

Background of MAGS

1988: The World Climate Research Program (WCRP) initiated the Global Energy and Water Cycle Experiment (GEWEX) to observe and model the energy flux and hydrological cycle in different environments to facilitate prediction of global and regional climatic change.
1994: Initiation of the Mackenzie GEWEX Study (MAGS), a Canadian contribution to GEWEX, as a research network comprising government and university scientists, partners and users of the research.
1995: The Canadian Climate Research Network provides support for the first year of MAGS.
1996: NSERC (the Natural Sciences and Engineering Research Council) provides 4-year funding for university scientists to participate with Environment Canada scientists to undertake Phase 1 of MAGS research.
2000: NSERC provides a 5-year Research Network Grant to support university participation in a partnership of universities, government departments and private industries to undertake Phase 2 of MAGS research.
2001: Beginning of MAGS-2.

Objectives of MAGS

The overall goals are:
· to understand and model the high-latitude water and energy cycles that play roles in the climate system
· to improve our ability to assess the changes to Canada’s water resources that arise from climate variability and anthropogenic climate change.

Through the Phase 1 study, we have:
· gained knowledge of the cold climate processes
· quantified the major processes affecting the water and energy cycles of the Mackenzie Basin
· developed a database for model parameterization and verification
· forged a framework for modelling the transport of moisture and energy.

The objectives of Phase 2 (2001-05) are:
· to integrate the knowledge gained in Phase 1 to understand the atmospheric and hydrological cycles as one unified system
· to develop a hierarchy of models that range from the small basin scale to coupled models for the entire region
· to apply our predictive capabilities to climatic, water resources and environmental issues in the Mackenzie Basin and other cold climate regions.

Advantages of Partnership

· institutions establish connections with MAGS science and scientists
· problems can be addressed with a broad scope through collaboration
· information exchange among scientists and managers or partner institutions
· the MAGS label can be used for leveraging funds

An effective MAGS administration has been set up to:
· promote and support atmospheric, hydrological and other associated sciences
· facilitate networking among institutions, scientists and users
· coordinate activities and organize scientific events
· disseminate results both nationally and internationally

Policy for Partnership

MAGS welcomes partnership with scientists, institutions and users in Canada and around the world. For consideration or renewal of partnership, a potential partner will submit the following information to the Science Committee of MAGS:
· title of project or programme
· name of institution and persons involved
· description of project or programme
· relevance of project or programme to MAGS
· resources and collaborations expected of MAGS
· support of or contributions to MAGS through the project

The application will be reviewed by the Science Committee and approved for a specified time period if it is deemed to contribute to the MAGS goals. Any approved partner will abide by the MAGS policy and will submit an annual report to MAGS, to be included in our annual review of activities. Renewal of partnership is subject to the review and approval by the Science Committee.

Inquiries

Additional information on MAGS can be found on our website
http://www.gewex.com/mags.html

or by contacting

MAGS Secretariat
National Hydrology Research Centre
11 Innovation Blvd., Saskatoon
Saskatchewan, Canada S7N 3H5

Phone: (306) 975-5809
Fax: (306) 975-6515
E-mail: [email protected]

Page 29: CAGES/Data Workshop MAGS Data Session · All CDF development (ports, new features, etc.) and user support activities are centralized. CDF vs. HDF CDF is a scientific data management

MAGS Data Documentation Guidelines

*DRAFT*
Tuesday, February 08, 2000

To assist in meeting the objectives of MAGS to understand the Mackenzie River Basin’s water and energy cycles, and to provide a useful and lasting legacy for further research, it is necessary that all data collected for MAGS be properly documented.

It is the responsibility of the originating MAGS PI to ensure that the data collected are properly documented. The MAGS Data Access Policy provides for a time period after data collection for the MAGS PI to quality-control and document his/her data. Data documentation should be complete enough to allow unfamiliar researchers to replicate and use the data in the future. The documentation should contain the following headings:

MAGS Data Documentation Guidelines

0. Title

1. Abstract - Name the dataset and describe why the measurement was undertaken and how it relates to MAGS.

2. Contact Information - Give sufficient detail (name, affiliation, full address, telephone and fax numbers, e-mail, etc.) to contact those most knowledgeable about the dataset.

3. Site Description - including the following:
   - Data Period(s) and Location(s)
   - Equipment used - including manufacturer and model numbers
   - Methods/Software used - in acquiring the data
   - Data Format - including examples

4. Data Processing/Quality Control - including the following:
   - Methods/Software used - in acquiring and processing the data
   - Post-Collection Data Processing - description of any processing done on the data
   - Quality Control Methods - give an indication as to the degree of quality control
   - Datasets Archived - original "raw" data should be one of the archived datasets, in addition to any processed or QCed data

5. References
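For PIs who keep their metadata in machine-readable form, a minimal sketch of a record following the headings above is shown below (plain Python; every value is a placeholder rather than real MAGS information).

    # Sketch: a placeholder documentation record mirroring the headings above.
    dataset_doc = {
        "title": "Example MAGS surface dataset",
        "abstract": "Why the measurement was made and how it relates to MAGS.",
        "contact": {"name": "...", "affiliation": "...", "email": "..."},
        "site_description": {
            "period_and_location": "1998-1999, Mackenzie Basin station (placeholder)",
            "equipment": "manufacturer and model numbers",
            "methods_software": "acquisition software and version",
            "data_format": "ASCII, one record per line (example attached)",
        },
        "processing_quality_control": {
            "post_collection_processing": "...",
            "qc_methods": "degree of quality control applied",
            "datasets_archived": ["raw", "quality-controlled"],
        },
        "references": [],
    }
    print(dataset_doc["title"])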


Examples of documentation are available (for MAGS Ground-Based Surface Weather Observations and CAGES Upper-Air Data), and the MAGS Data Manager is available to assist with production of documentation.

These guidelines must be reviewed on an annual basis. At that time, the guidelines may be modified to improve their usefulness.


MAGS Documentation and Archive Guidelines: Clarification for Models

*Accepted Wednesday, June 27, 2001*

To assist in meeting the objectives of MAGS to understand the Mackenzie River Basin’s water and energy cycles, and to provide a useful and lasting legacy for further research, it is necessary that significant model results for MAGS be properly documented and archived.

It is the responsibility of the originating MAGS Investigators to ensure that the model output is properly documented. The MAGS Data Access Policy provides for a time period after data collection for the MAGS Investigator to quality-control and document his/her data. With respect to model output, the end of a data collection period is defined as the model run after a significant model revision (e.g. change of physics; improved routing; coupling). Data documentation should be complete enough to allow unfamiliar researchers to use the data in the future. The documentation should contain the following headings:

MAGS Data Documentation Guidelines

0. Title - Model name, version number.

1. Abstract - Briefly describe the model and its properties, and describe why the model run was undertaken and how it relates to MAGS.

2. Contact Information - Give sufficient detail (name, affiliation, full address, telephone and fax numbers, e-mail, etc.) to contact those most knowledgeable about the model run.

3. Run Description - including the following (valid web links acceptable):
   - Period(s) and Location(s)/Resolution/Map Projection
   - Initialization and Boundary Data used
   - Model used - complete description of the model, physics package, any coupling state, etc.
   - Data Format - including examples
   - Archive Location/Media - online link or offline contact person

5. References

The MAGS Information Manager is available to assist with production of documentation.

These guidelines must be reviewed on an annual basis. At that time, the guidelines may be modified to improve their usefulness.