Preservation Environment Working Group

Upload: malia

Post on 02-Feb-2016

Page 1: Preservation Environment Working Group

GGF-17 Astro Workshop www.gridforum.org

Preservation Environment Working Group

• Officers:
  - Bruce Barkstrom (NASA Langley)
  - Reagan Moore (SDSC)

• Goals
  - Demonstrate interoperability between multiple preservation environments that are based on data grid technology

• Interactions with Astro Working Group
  - IVOA preservation working group
  - Define standards for preservation of astronomy collections
    - Sustainability
    - Governance
    - Preservation authenticity, integrity, infrastructure independence
  - Standards
    - FITS data format
    - UCD semantics
    - Hyperatlas plates
    - IVOA access services

Page 2: Preservation Environment Working Group

Intellectual Property Policy

• I acknowledge that participation in GGF8 is subject to the GGF Intellectual Property Policy.

• Intellectual Property Notices Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  - the GGF plenary session,
  - any GGF working group or portion thereof,
  - the GFSG, or any member thereof on behalf of the GFSG,
  - the GFAC, or any member thereof on behalf of the GFAC,
  - any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices,
  - the GFD Editor or the GWD process

• Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions.

• Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.

• GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

Page 3: Preservation Environment Working Group

Preservation Components

• Authenticity - manage links to preservation metadata
  - Data grid
  - OGSA naming / OGSA DAIS / Information Dissemination / DFDL

• Integrity - assure data and metadata are not corrupted, track chain of custody, manage access controls, update state information
  - Data grid
  - OGSA naming / OGSA DAIS / Grid File Systems / OGSA Data / Grid Information Retrieval / OGSA Authorization

• Infrastructure independence - assure that no dependencies are introduced on use of a particular vendor product
  - Data grid
  - Grid File Systems / DFDL / OGSA Data Replication / Grid Storage Management / GridFTP / Transaction Management / OGSA Data / Grid Remote Procedure Call

Page 4: Preservation Environment Working Group

Preservation Approach

• Standard semantics
  - IVOA - Uniform Content Descriptors

• Standard data encoding format
  - IVOA - FITS file

• Standard access services
  - IVOA - Cone Search, Simple Image Access Protocol, Simple Spectral Access Protocol, VOEvent notification, Mosaic service

• Standard validation services
  - FITS header validation - correct coordinate information
  - HyperAtlas standard plates - re-project pixels to standard plate

• Federation across independent systems
  - Address sustainability by replicating across sustainability models
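
The FITS header validation item above can be illustrated with a minimal sketch. It assumes the header has already been parsed into a plain dictionary, and it checks only that the basic world coordinate keywords for two axes (CTYPEn, CRVALn, CRPIXn) are present and that the numeric ones hold numbers. The function name and the choice of required keywords are illustrative assumptions, not the working group's validation service.

```python
# Hypothetical helper: checks a parsed FITS header (a dict) for basic
# celestial coordinate keywords. The required-keyword list is an assumption.
REQUIRED_WCS_KEYWORDS = ("CTYPE1", "CTYPE2", "CRVAL1", "CRVAL2", "CRPIX1", "CRPIX2")
NUMERIC_PREFIXES = ("CRVAL", "CRPIX")


def validate_wcs_header(header):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for key in REQUIRED_WCS_KEYWORDS:
        if key not in header:
            problems.append(f"missing keyword {key}")
            continue
        value = header[key]
        if key.startswith(NUMERIC_PREFIXES):
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} is not numeric: {value!r}")
    return problems


# Illustrative headers (values are made up for the example)
good_header = {
    "CTYPE1": "RA---TAN", "CTYPE2": "DEC--TAN",
    "CRVAL1": 150.0, "CRVAL2": 2.2,
    "CRPIX1": 256.5, "CRPIX2": 256.5,
}
bad_header = {"CTYPE1": "RA---TAN", "CRVAL1": "150.0"}

print(validate_wcs_header(good_header))
print(validate_wcs_header(bad_header))
```

A real service would also need to check that the coordinate values are physically sensible; this sketch only checks presence and type.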

Page 5: Preservation Environment Working Group

Data Grids as Basis for Preservation

• Authenticity mechanisms
  - Link images to preservation metadata
  - Provenance information for source of image (FITS header extraction)
  - Descriptive information - UCDs

• Integrity mechanisms
  - Chain of custody - tracking where images have been stored
  - Audit trail - tracking operations performed on images
  - Persistent name spaces for users, files, metadata
  - Checksums
  - Replicas
  - Validation of checksums, synchronization of replicas
  - Federation - managing integrity across independent data grids

• Infrastructure independence
  - Ability to migrate archives onto new technology
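
The "validation of checksums, synchronization of replicas" item can be sketched as follows. This is an in-memory illustration under stated assumptions: replicas are held in a dictionary keyed by a storage label, the registered checksum is an MD5 digest, and a corrupted replica is repaired by copying from any replica that still matches. The labels and function names are hypothetical and do not come from any data grid API.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 digest of a byte string, as a hex string."""
    return hashlib.md5(data).hexdigest()


def validate_replicas(registered_checksum, replicas):
    """Return the labels of replicas whose content no longer matches the checksum."""
    return sorted(name for name, data in replicas.items()
                  if md5_hex(data) != registered_checksum)


def synchronize_replicas(registered_checksum, replicas):
    """Overwrite corrupted replicas from a valid copy; return the repaired labels."""
    bad = validate_replicas(registered_checksum, replicas)
    good = [name for name in replicas if name not in bad]
    if not good:
        raise RuntimeError("no valid replica left to copy from")
    for name in bad:
        replicas[name] = replicas[good[0]]
    return bad


# Illustrative data: three replicas, one with a flipped final byte
image = b"SIMPLE  =                    T"
registered = md5_hex(image)  # checksum recorded when the image was ingested
replicas = {"site_a": image, "site_b": image, "site_c": image[:-1] + b"F"}

repaired = synchronize_replicas(registered, replicas)
print(repaired)
print(validate_replicas(registered, replicas))
```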

Page 6: Preservation Environment Working Group

NOAO Preservation - Irene Barg

Federated SRB data grids

• Goals:
  - Replicate images
  - Deposit into an archive
  - Maintain availability
  - Capture data daily

• Implementation
  - Federation of data grids
  - Pull environment
  - Reliable transport

• Preservation environment
  - Separate data grid
  - Reliable storage

Archive

Page 7: Preservation Environment Working Group

Sustainability - Federation of Federations

GGF Data Grid Interoperability Demonstration

Data Grid | Country     | SRB version | Demo user ggfsdsc | SRB zone name | Storage resource logical name | I/O (MB/sec)
APAC      | Australia   | 3.4.0-P     | yes               | AU            | StoreDemoResc_AU              | 3.9
NOAO      | Chile/US    | 3.4.1       | yes               | noao-ls-t3-z1 | noao-ls-t3-fs                 |
ChinaGrid | China       | CGSP-II     | (software)        |               |                               |
IN2P3     | France      | 3.4.0-P     | yes               | ccin2p3       | LyonFS4                       | [25.]
DEISA     | Italy       | 3.4.0-P     | yes               | DEISA         | demo-cineca                   |
KEK       | Japan       | 3.4.0-P     | yes               | KEK-CRC       | rsr01-ufs                     | 7.4
SARA      | Netherlands | 3.4.0-P     | yes               | SARA          | SaraStore                     |
IB        | New Zealand | 3.4.1       | yes               | aucklandZone  | aucklandResc                  | (0.3)
ASGC      | Taiwan      | 3.4.0-P     | yes               | TWGrid        | SDSC-GGF_LRS1                 | (0.1)
NCHC      | Taiwan      | 3.4.0-P     | yes               | ecogrid       | ggf-test                      |
RAL       | UK          | 3.4.0-P     | (firewall)        | tdmg2zone     |                               |
IB        | UK          | 3.4.1       | yes               | avonZone      | avonResc                      |
WunGrid   | UK          | 3.3.1       | (hardware)        | SDSC-wun      | sfs-tape                      |
Purdue    | US          | 3.4.0-P     | yes               | Purdue        | uxResc1                       | (2.5)
Teragrid  | US          | 3.4.0-P     | yes               | SDSC-GGF      | sfs-disk                      |
U Md      | US          | 3.4.0-P     | yes               | umiacs        | narasrb02-unix1               |

Page 8: Preservation Environment Working Group

Preservation at Scale

• Creation of standard plates for publication in a Hyperatlas - Roy Williams (Caltech)

• Used Montage mosaic code developed at IPAC/Caltech (John Good)
  - Created mosaics by re-projecting 4,121,440 images from the 2MASS archive of 8 TB that had been replicated to the Teragrid.
  - Because of overlap, required manipulating 6,275,494 files, and 14 TB of data.
  - Processing time was over 100,000 CPU-hours on the Teragrid.
  - Each mosaic covered a 6 degree square
  - Tiled each mosaic into a 12x12 array
  - Registered plates into the Hyperatlas

• Advantages
  - Standard projection
  - Ability to composite images for improved signal to noise ratio
  - Incorporated domain knowledge in generation of the standard product
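
The figures quoted on this slide imply a few derived quantities, worked out in the short calculation below. A 6 degree mosaic tiled 12x12 gives plates 0.5 degrees on a side, 144 per mosaic. The mean file size is only a rough estimate: it assumes 1 TB = 10**12 bytes and that the 14 TB is spread evenly over the 6,275,494 files, neither of which the slide states.

```python
# Figures quoted on the slide
mosaic_side_deg = 6
tiles_per_side = 12
source_images = 4_121_440
files_manipulated = 6_275_494
data_manipulated_tb = 14

tile_side_deg = mosaic_side_deg / tiles_per_side
tiles_per_mosaic = tiles_per_side ** 2
files_per_image = files_manipulated / source_images

# Assumption: 1 TB = 10**12 bytes; the slide does not say which convention it uses.
mean_file_mb = data_manipulated_tb * 10**12 / files_manipulated / 10**6

print(f"tile side: {tile_side_deg} degrees")
print(f"tiles per mosaic: {tiles_per_mosaic}")
print(f"files handled per source image: {files_per_image:.2f}")
print(f"mean size per file handled: {mean_file_mb:.2f} MB")
```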

Page 9: Preservation Environment Working Group

Collection-based Approach

• Authenticity - assertions made by creator of records
  - Provenance metadata
  - Descriptive metadata
  - Encapsulation of metadata with data in an Archival Information Package
  - Validation of consistency between authenticity metadata and stored data
    - Verify data file exists for each metadata record
    - Verify for each stored data file, a metadata record exists
  - Validation of provenance metadata
    - Verify consistency of defined metadata attributes across all records
    - Verify preservation consistency constraints (a record appears only once)
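
The four verification steps listed above can be sketched as a single pass over a collection. This is a minimal illustration, assuming each metadata record is a dictionary with a "file" key naming its data file and that storage can be listed as a set of file names. The record schema, attribute names, and function name are hypothetical.

```python
from collections import Counter


def check_collection(metadata_records, stored_files, required_attributes):
    """Compare metadata records against stored file names.

    metadata_records: list of dicts, each with a "file" key (assumed schema).
    stored_files: iterable of file names present in storage.
    required_attributes: attribute names every record is expected to define.
    """
    referenced = [record["file"] for record in metadata_records]
    stored = set(stored_files)
    counts = Counter(referenced)
    return {
        # a metadata record exists but its data file does not
        "records_without_file": sorted(set(referenced) - stored),
        # a data file exists but no metadata record describes it
        "files_without_record": sorted(stored - set(referenced)),
        # a defined attribute is missing from some record
        "records_missing_attributes": sorted(
            {r["file"] for r in metadata_records
             if any(attr not in r for attr in required_attributes)}),
        # the same record appears more than once
        "duplicate_records": sorted(n for n, c in counts.items() if c > 1),
    }


# Illustrative collection with one problem of each kind
records = [
    {"file": "a.fits", "band": "J"},
    {"file": "b.fits", "band": "H"},
    {"file": "b.fits", "band": "H"},
    {"file": "c.fits"},
]
stored = ["a.fits", "b.fits", "d.fits"]

report = check_collection(records, stored, required_attributes=("file", "band"))
for problem, names in report.items():
    print(problem, names)
```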

Page 10: Preservation Environment Working Group

Collection-Based Approach

• Authenticity
  - Validation of assertions about the collection
    - Characterization of assertions as management policies
    - Mapping of management policies to executable rules
    - Specification of state information on which the rules operate
    - Specification of state information to manage rule outcomes

• Implementation

  Granularity of application | Type of rule
  Enterprise                 | Setting of rule parameters
  Archives                   | Aperiodic rule
  Collection                 | Periodic rules
  Record                     | Atomic rules
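
One way to picture the mapping from management policies to executable rules is the sketch below. Each rule carries the policy it implements, the granularity at which it applies, and a check that reads state information; the outcome is written back as new state. The rule type is looked up from the granularity using the table on this slide. The class, field names, and example policies are assumptions made for illustration, not part of any rule engine described in the talk.

```python
from dataclasses import dataclass
from typing import Callable

# Granularity-to-rule-type mapping taken from the slide's table
RULE_TYPE_BY_GRANULARITY = {
    "enterprise": "setting of rule parameters",
    "archive": "aperiodic",
    "collection": "periodic",
    "record": "atomic",
}


@dataclass
class Rule:
    policy: str                      # the management policy, in words
    granularity: str                 # key into RULE_TYPE_BY_GRANULARITY
    check: Callable[[dict], bool]    # reads state information, returns pass/fail

    @property
    def rule_type(self) -> str:
        return RULE_TYPE_BY_GRANULARITY[self.granularity]


def apply_rules(rules, state):
    """Run each rule against the state and record its outcome as state information."""
    outcomes = {}
    for rule in rules:
        outcomes[rule.policy] = {"type": rule.rule_type, "passed": rule.check(state)}
    return outcomes


# Hypothetical policies and state, for illustration only
rules = [
    Rule("each record appears only once", "collection",
         lambda s: len(s["record_ids"]) == len(set(s["record_ids"]))),
    Rule("record has a checksum", "record",
         lambda s: bool(s.get("checksum"))),
]
state = {"record_ids": ["r1", "r2", "r2"], "checksum": "abc"}

outcomes = apply_rules(rules, state)
for policy, outcome in outcomes.items():
    print(policy, outcome)
```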

Page 11: Preservation Environment Working Group

Collection-based Approach

• Integrity - assertions made by archivists that both the data and metadata are uncorrupted, the chain of custody can be tracked, all actions are performed by identified persons, and the risk of data loss has been minimized

• Requires mechanisms for:
  - Checksums - checks based on file size, System5 checksum, MD5 checksum
  - Replicas, backups, versions
  - Synchronization - between replicas, between system buffers and storage, between archives and local storage
  - Federation - replication of both metadata and data, while coordinating name spaces
  - Authentication - unique identity for archivists independently of storage system
  - Authorization - access controls managed independently of storage system
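
The checksum item lists checks of increasing cost: file size, System5 checksum, MD5 checksum. The sketch below shows the idea with two of those tiers, comparing the cheap file size first and computing the MD5 digest only when the size matches. The System5 checksum tier is left out, and the function names and return strings are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, expected_size, expected_md5):
    """Return "ok", "size mismatch", or "md5 mismatch" for one stored file."""
    if os.path.getsize(path) != expected_size:
        return "size mismatch"          # cheap check fails, skip the digest
    if file_md5(path) != expected_md5:
        return "md5 mismatch"
    return "ok"


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "image.fits")
    with open(path, "wb") as handle:
        handle.write(b"example image bytes")
    size, md5 = os.path.getsize(path), file_md5(path)
    status_clean = check_file(path, size, md5)

    with open(path, "r+b") as handle:   # overwrite the first byte, same length
        handle.write(b"E")
    status_flipped = check_file(path, size, md5)

    with open(path, "ab") as handle:    # append a byte, changing the length
        handle.write(b"!")
    status_grown = check_file(path, size, md5)

print(status_clean, status_flipped, status_grown)
```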

Page 12: Preservation Environment Working Group

Implementations

• NARA
  - Research prototype persistent archive
  - Electronic Records Archive
  - Persistent Archive Testbed

• SDSC
  - NSDL persistent archive
  - CDL Digital Preservation Repository

• NASA Langley
  - Archive Next Generation - ANGe

• Taiwan

• Caspar / Digital Curation Centre

• Diligent

Page 13: Preservation Environment Working Group

Preservation Services

• Appraisal
  - DAIS / Grid File Systems

• Accession
  - GridFTP / Grid File Systems / DAIS / Transaction Management / OGSA Data / OGSA Naming / GridFTP

• Description
  - DAIS / OGSA Naming / DFDL / Transaction Management

• Arrangement
  - Grid File Systems / DAIS

• Preservation
  - Grid File Systems / Grid Storage Management / OGSA Data Replication / GridFTP / Transaction Management / OGSA Naming

• Access
  - DAIS / DFDL / Grid File Systems / GridFTP / Transaction Management

Page 14: Preservation Environment Working Group

Propose Preservation Demonstration

• Formal validation of existing archives
  - Consistency between metadata and stored data
  - Verification of name space integrity

• Formal extraction of records
  - Bulk operations to extract metadata

• Formal deposition of records into a federated data grid
  - Federation with a second data grid
  - Bulk operations to load metadata and data into remote data grid

• Formal validation of new archives
  - Consistency between metadata and stored data
  - Verification of name space integrity

• Formal export of records from the new archive and import back into the original archives, without loss of authenticity or integrity
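
The round trip proposed on this slide can be sketched in miniature. The example below models an archive as a dictionary of logical names, builds a manifest of checksums and metadata for it, copies it to a stand-in "remote" archive and back, and compares manifests at each stage. The archive layout, the `bulk_copy` stand-in for bulk extraction and loading, and all names are assumptions; a real demonstration would move data between federated data grids.

```python
import hashlib


def manifest(archive):
    """Map each logical name to (MD5 of the data, sorted metadata items)."""
    return {
        name: (hashlib.md5(entry["data"]).hexdigest(),
               tuple(sorted(entry["metadata"].items())))
        for name, entry in archive.items()
    }


def bulk_copy(source):
    """Stand-in for bulk extraction and loading: copies data and metadata."""
    return {name: {"data": bytes(entry["data"]), "metadata": dict(entry["metadata"])}
            for name, entry in source.items()}


def differences(before, after):
    """Logical names that were lost, added, or changed between two manifests."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))


# Illustrative archive with two records
original = {
    "/zoneA/plates/p001.fits": {"data": b"plate one", "metadata": {"band": "J"}},
    "/zoneA/plates/p002.fits": {"data": b"plate two", "metadata": {"band": "H"}},
}

remote = bulk_copy(original)    # deposit into a second data grid
returned = bulk_copy(remote)    # export back into the original archive
clean_diff = differences(manifest(original), manifest(returned))

# Simulate a loss of integrity in transit to show that it would be detected
remote["/zoneA/plates/p002.fits"]["data"] = b"plate tw0"
damaged_diff = differences(manifest(original), manifest(remote))

print(clean_diff)
print(damaged_diff)
```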