Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards

Ruth Duerr, NSIDC
MiQun Yang, THG
Azhar Sikander, NSIDC
Choonghwan Lee, THG



Outline

• Motivation

• Goals

• Standards

• Plans and Status

Motivation

Technologies change regularly and organizations come and go, but data must survive.

But preserving data takes more than just preserving the bits: all the components of an AIP are critical.


Project Goals

• Prototype development of Archive Information Packages for HDF data:
  - For entire data sets
  - For individual “granules”

• Test usability of digital library standards with geospatial data

Metadata Standards - METS

• Metadata Encoding and Transmission Standard
• An initiative of the Digital Library Federation
• Provides the means to convey the metadata necessary for
  - management of digital objects within a repository
  - exchange of objects between repositories (or between repositories and their users)
• Designed to facilitate
  - shared development of information management tools/services
  - interoperable exchange of digital materials

METS - A very brief overview

• <metsHdr>: describes the METS document itself, e.g., creator or editor
• <dmdSec>: describes the object using some external standard, e.g., MARC, FGDC, Dublin Core
• <amdSec>: describes object creation, storage, intellectual property rights, source info, provenance, etc., e.g., PREMIS
• <fileSec>: provides an inventory of all of the files that are part of the object described
• <structMap>: a physical or logical map of the organization of the materials described
• <structLink>: allows specification of hyperlinks between parts of the map (mostly useful when preserving websites)
• <behaviorSec>: used to associate executable code with parts of the content
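To make the overview concrete, the following is a minimal sketch that counts which top-level sections occur in a METS document. The element names and namespace come from the METS schema rather than from the slide; the sample document, its ID values, and its date are illustrative placeholders only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# The seven top-level METS sections, in schema order.
METS_SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
                 "structMap", "structLink", "behaviorSec"]


def section_counts(mets_xml: str) -> dict:
    """Count how often each top-level METS section occurs in a document."""
    root = ET.fromstring(mets_xml)
    return {name: len(root.findall(f"{{{METS_NS}}}{name}"))
            for name in METS_SECTIONS}


# Illustrative placeholder document (not taken from the project).
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <metsHdr CREATEDATE="2008-01-01T00:00:00"/>
  <dmdSec ID="DMD1"/>
  <amdSec ID="AMD1"/>
  <fileSec/>
  <structMap><div/></structMap>
</mets>"""

counts = section_counts(SAMPLE)
for name, n in counts.items():
    print(f"{name:12s} {n}")
```

In the METS schema only the structural map is mandatory, so a check like this is mainly useful for confirming that a package carries the optional sections a given archive profile expects.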

ISO 19115 Geographic Information - Metadata

• Purpose
  - Characterize geographic data properly
  - Facilitate organization and management of metadata for geographic data
  - Enable users to efficiently use such data
  - Facilitate discovery, retrieval, and reuse
  - Enable data assessment

ISO 19115 entities

• Identification
• Constraints
• Data Quality
• Maintenance Information
• Spatial Representation
• Reference System
• Content Information
• Portrayal Catalogue Reference
• Distribution
• Metadata Extension Information
• Application Schema Information

Metadata Standards - PREMIS

• Provide a core preservation metadata set with broad applicability across the digital preservation community
• Developed by an OCLC and RLG sponsored international working group
  - Representatives from libraries, museums, archives, government, and the private sector
• Maintained by the Library of Congress
• Based on the OAIS reference model
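As an illustration of the kind of core preservation metadata PREMIS covers, here is a minimal sketch that records an identifier, a fixity digest, the size, and the format name for one file. It assumes the PREMIS version 2 namespace and element names; the function name, the "local" identifier type, the identifier value, and the demo bytes are hypothetical, and no attempt is made to validate the result against the PREMIS schema.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumption: PREMIS version 2 namespace.
PREMIS_NS = "info:lc/xmlns/premis-v2"


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_object(path: str, identifier: str, format_name: str = "HDF5") -> ET.Element:
    """Build a simplified PREMIS object description for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large granules are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    obj = ET.Element(q("object"))
    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "local"  # hypothetical type
    ET.SubElement(ident, q("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q("objectCharacteristics"))
    ET.SubElement(chars, q("compositionLevel")).text = "0"
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = digest.hexdigest()
    ET.SubElement(chars, q("size")).text = str(os.path.getsize(path))
    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name
    return obj


# Demo on a throwaway file standing in for a data granule.
DEMO_BYTES = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(DEMO_BYTES)
demo_obj = premis_object(tmp.name, "demo-granule-001")
os.remove(tmp.name)
print(ET.tostring(demo_obj, encoding="unicode"))
```

Recording the digest and size alongside the format is what lets a later custodian confirm that the bits are intact before worrying about whether they can still be interpreted.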

Current Program Plan

[Flow diagram]
Data: NSIDC/ECS HDF4 data → H4toH5 → NetCDF4/HDF5 data (CDM/NetCDF4)
Metadata: NSIDC/ECS metadata → ECS to METS (Data Set), ECS to METS (Granule) → METS, ISO 19115
Result: HDF5 AIP

HDF5 File Level Archive Information Packages

HDF5 AIP Components

• Data file (HDF5): NSIDC/ECS HDF4 data → H4toH5 → NetCDF4/HDF5 data
• Metadata file (METS):

Primary Schema          Extension Schema

<mets>
|---<dmdSec> ---------- <ISO 19115>
|---<amdSec>
|     |--<techMD>
|     |--<rightsMD>     PREMIS
|     |--<sourceMD>
|---<fileGrp>
|---<structMap>

http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
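The metadata file layout can be sketched as a small builder that produces an empty METS skeleton with one ISO 19115 wrapper under <dmdSec> and PREMIS wrappers under <amdSec>. This is an assumption-laden illustration, not the project's implementation: the function names, ID values, the "granule" division type, and the OTHERMDTYPE label "ISO19115" are hypothetical, and <fileGrp> is placed inside <fileSec> as the METS schema requires even though the diagram shows it directly under <mets>.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def md_section(parent, tag, sec_id, mdtype, other=None):
    """Add a metadata section with an empty mdWrap/xmlData wrapper."""
    sec = ET.SubElement(parent, m(tag), ID=sec_id)
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other
    wrap = ET.SubElement(sec, m("mdWrap"), attrs)
    return ET.SubElement(wrap, m("xmlData"))  # extension-schema XML goes here


def build_aip_mets(h5_name: str) -> ET.Element:
    """Build a METS skeleton describing a single HDF5 data file."""
    root = ET.Element(m("mets"))
    # Descriptive metadata: ISO 19115 (label is a hypothetical choice).
    md_section(root, "dmdSec", "DMD1", "OTHER", "ISO19115")
    # Administrative metadata: PREMIS in each subsection.
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    for tag in ("techMD", "rightsMD", "sourceMD"):
        md_section(amd, tag, f"{tag}1", "PREMIS")
    # File inventory pointing at the HDF5 data file.
    file_sec = ET.SubElement(root, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"))
    f = ET.SubElement(grp, m("file"), ID="FILE1", ADMID="techMD1")
    ET.SubElement(f, m("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": h5_name})
    # Structural map tying the file into the package.
    smap = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(smap, m("div"), TYPE="granule")
    ET.SubElement(div, m("fptr"), FILEID="FILE1")
    return root


aip = build_aip_mets("granule.h5")  # placeholder file name
print(ET.tostring(aip, encoding="unicode"))
```

Keeping the extension-schema content inside `xmlData` wrappers means the ISO 19115 and PREMIS fragments can be generated and validated separately before being dropped into the package.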

Data Set Level Archive Information Package

[Diagram: a data set level AIP made up of multiple HDF AIPs together with multiple contextual information items]

File Level AIP Activity Status

• Development of a map from NSIDC/ECS metadata to METS/PREMIS/ISO 19115 completed
• Implementation underway
• Issues
  - Auxiliary file handling - own AIP or not?
    o E.g., browse files, processing history, PGEs
    o Granules vs files
  - Schema redundancy

Data Set AIP Activity Status

• Contextual information availability assessed for MODIS data
  - Currently GCSRLTA information requirements are being met
  - Much of the information is available via a variety of websites, many of which are dynamically updated
  - Format of the material varies widely
  - Some material should be considered geographic data sets in their own right
  - Much of the material applies to multiple data sets

Data Set AIP Activity Status

• Local sources of metadata identified
  - ECS Earth Science Data Type (ESDT) definitions
  - NSIDC data set catalog and documentation
• Data set catalog to ISO 19115 metadata translator implemented - to be released operationally soon
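A catalog-to-ISO 19115 translation of this kind might look roughly like the sketch below. It is not the NSIDC translator: the record keys (`id`, `title`, `summary`), the sample values, and the function names are hypothetical, and the output assumes the ISO 19139 XML encoding of ISO 19115 (the `gmd` and `gco` namespaces). Only a few identification fields are emitted, so the result is a fragment rather than a complete, valid metadata record.

```python
import xml.etree.ElementTree as ET

# Assumption: ISO 19139 XML encoding of ISO 19115.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _child(parent, ns, tag):
    """Append a namespaced child element and return it."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def _char_string(parent, tag, text):
    """Append <gmd:tag><gco:CharacterString>text</...></...>."""
    el = _child(parent, GMD, tag)
    _child(el, GCO, "CharacterString").text = text
    return el


def catalog_to_iso(record: dict) -> ET.Element:
    """Map a (hypothetical) catalog record to an ISO 19115 fragment."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    _char_string(root, "fileIdentifier", record["id"])
    ident = _child(_child(root, GMD, "identificationInfo"),
                   GMD, "MD_DataIdentification")
    citation = _child(_child(ident, GMD, "citation"), GMD, "CI_Citation")
    _char_string(citation, "title", record["title"])
    _char_string(ident, "abstract", record["summary"])
    return root


# Placeholder record, not a real catalog entry.
sample_record = {
    "id": "EXAMPLE-0001",
    "title": "Example snow cover data set",
    "summary": "Placeholder abstract.",
}
iso_root = catalog_to_iso(sample_record)
print(ET.tostring(iso_root, encoding="unicode"))
```

A fragment like this is what would sit inside the descriptive metadata section of the data set level package once the remaining mandatory ISO 19115 elements are filled in.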