SFU Library's METS-Bagger Tool

Upload: marcus-emmanuel-barnes

Post on 25-May-2015

DESCRIPTION

Normalizing existing digitized content into standardized packages for robust long-term management. A report on SFU Library's METS-Bagger tool, with a discussion of the benefits, design principles used for the packaging specification, and potential next steps. Presented at Code4Lib BC, November 28, 2013.

TRANSCRIPT

Page 1: SFU Library's METS-Bagger Tool

METS-Bagger Tool

Normalizing existing digitized content into standardized packages for robust long-term management.

Marcus Emmanuel Barnes

#c4lbc

2013-11-28

Page 2: SFU Library's METS-Bagger Tool

Background

● SFU Library holds about 15 TB of content

○ the Library has created high-quality master versions of content it has digitized using ‘preservation-friendly’ formats.

○ descriptive metadata exists for almost all of it.

However, this content was not previously managed according to generally accepted digital preservation practices.

Page 3: SFU Library's METS-Bagger Tool

Solution

● SFU Library Digitized Content Packaging Specification

● METS-Bagger tool for normalizing existing digitized content based on this specification for robust long-term management.

Page 4: SFU Library's METS-Bagger Tool

METS-Bagger Tool

● Two components:

○ Collection normalization script

○ Integrity scripts based on collection manifest

Page 5: SFU Library's METS-Bagger Tool

Collection Normalization

● Processes existing collections of files into a format compliant with the SFU Library Digitized Content Packaging Specification

● Packaging Formats:

○ METS (http://www.loc.gov/standards/mets/)

○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
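
To make the BagIt side of the packaging concrete, here is a minimal sketch of a bag built with only the Python standard library. It is not the METS-Bagger code: the function name `make_bag`, the choice of MD5, and the version string `0.97` (one of the BagIt draft versions) are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(item_dir: Path, bag_dir: Path) -> Path:
    """Hypothetical helper: copy an item's files into a minimal BagIt bag."""
    data_dir = bag_dir / "data"
    shutil.copytree(item_dir, data_dir)  # the payload lives under data/

    # Bag declaration file required by the BagIt draft.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        # read_bytes() is fine for a sketch; large masters should be hashed in chunks.
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

In a layout like this the item's METS file would simply be one more file in the payload, and the finished bag directory could then be serialized, for example with `shutil.make_archive`.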

Page 6: SFU Library's METS-Bagger Tool

How Collection Normalization Works

1. Configuration file for settings

2. Script walks the directory tree of a collection and compiles a list of files to be preserved

3. Files are collated into items (e.g., newspaper issue), and a METS file is generated for each item

4. Item files and the associated METS file are bagged (and serialized)

5. Future: A collection manifest is created for the collection for integrity checking (automatic or manual).
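
Steps 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual logic: it assumes each directory below the collection root holds exactly one item (for example one newspaper issue), and the function name `collate_items` and the default extensions are hypothetical.

```python
import os
from collections import defaultdict


def collate_items(collection_root, extensions=(".tif", ".jp2")):
    """Walk the collection and group files of interest by parent directory.

    Assumption: one directory corresponds to one item.
    """
    items = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(collection_root):
        for name in sorted(filenames):
            # Keep only the file types that are to be preserved.
            if name.lower().endswith(tuple(extensions)):
                item_id = os.path.relpath(dirpath, collection_root)
                items[item_id].append(os.path.join(dirpath, name))
    return dict(items)
```

Each key of the returned dictionary identifies an item, and its value is the list of files that would be described in that item's METS file and then bagged.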

Page 7: SFU Library's METS-Bagger Tool

Before and After Processing

Page 8: SFU Library's METS-Bagger Tool

Design Principles

● a minimalist implementation - uses as few METS and BagIt options as possible.

● incorporates three widely implemented and understood standards: METS, BagIt and UUID (Universally Unique Identifiers)

● Technical metadata included in METS should include at a minimum bit-level checksums, file type identification, creating application, and where possible format validity

● Whenever possible, include descriptive metadata for the item in the METS file.
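
As a rough illustration of these principles, the sketch below builds a very small METS document for one item using Python's `xml.etree.ElementTree`. It is an assumption-laden example rather than the specification itself: the UUID is placed in `OBJID`, checksums are SHA-256, and `mimetypes` stands in for real file type identification (the tool uses a FITS plugin for technical metadata, which this sketch does not call). The function name `build_mets` and the `USE="master"` value are hypothetical.

```python
import hashlib
import mimetypes
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(item_dir: Path) -> ET.Element:
    """Hypothetical helper: describe every file of an item in a minimal METS tree."""
    # A fresh UUID identifies the item (assumed placement: OBJID).
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": f"urn:uuid:{uuid.uuid4()}"})
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "item"})

    files = sorted(p for p in item_dir.rglob("*") if p.is_file())
    for number, path in enumerate(files, start=1):
        file_id = f"FILE{number:04d}"
        attrs = {
            "ID": file_id,
            "SIZE": str(path.stat().st_size),
            "CHECKSUM": hashlib.sha256(path.read_bytes()).hexdigest(),
            "CHECKSUMTYPE": "SHA-256",
        }
        # Stand-in for file type identification; guesses from the extension only.
        mime, _encoding = mimetypes.guess_type(path.name)
        if mime:
            attrs["MIMETYPE"] = mime
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", attrs)
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path.relative_to(item_dir).as_posix()},
        )
        # The structural map points back at each file by its ID.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_mets(Path("some_item")), encoding="unicode")` would give the XML text; descriptive metadata, where available, would go in an additional `dmdSec` that this sketch leaves out.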

Page 9: SFU Library's METS-Bagger Tool

Script Details

● Configuration file, main script, log file, processed collection output directory

● Written in Python so the tool can be used on multiple platforms

● Plugins for technical metadata (FITS) and descriptive metadata.

● Configuration options include:

○ test run (limited run size)

○ skipping technical metadata creation

○ file types of interest
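
The configuration options listed above might look something like the following when read with Python's `configparser`. The slides do not show the real configuration file, so every section name, option name, and path here is invented for illustration.

```python
import configparser

# Hypothetical configuration text; all names and paths are assumptions.
EXAMPLE_CONFIG = """
[collection]
source_dir = /path/to/collection
output_dir = /path/to/processed

[run]
test_run = true
test_run_limit = 5
skip_technical_metadata = false
file_types = .tif, .jp2, .pdf
"""


def load_settings(text):
    """Parse the configuration text into a plain dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    run = parser["run"]
    return {
        "source_dir": parser["collection"]["source_dir"],
        "output_dir": parser["collection"]["output_dir"],
        # Test run: process only a limited number of items.
        "test_run": run.getboolean("test_run", fallback=False),
        "test_run_limit": run.getint("test_run_limit", fallback=0),
        # Lets a run skip the (slow) technical metadata step.
        "skip_technical_metadata": run.getboolean("skip_technical_metadata", fallback=False),
        # File types of interest, normalized to lowercase extensions.
        "file_types": [
            ext.strip().lower()
            for ext in run.get("file_types", fallback="").split(",")
            if ext.strip()
        ],
    }
```

Keeping the settings in a separate file like this means the main script does not need to be edited when moving from one collection to the next.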

Page 10: SFU Library's METS-Bagger Tool

Future

● Addition of manifest and integrity checking tools that check a collection against its manifest

● Additional plugins

● Sharing code on GitHub
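
Since the integrity tools are described as future work, the following is only a guess at what checking a collection against its manifest could look like. It assumes a hypothetical manifest format of one `<sha256>  <relative path>` line per file; the function names are likewise made up.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large master files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_collection(root: Path, manifest_path: Path):
    """Return (relative_path, problem) pairs for missing or altered files.

    Assumed manifest format: "<sha256>  <relative path>" per line.
    """
    problems = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

An empty list means every file named in the manifest is present and unchanged, which makes the check easy to run on a schedule or by hand.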

Page 11: SFU Library's METS-Bagger Tool

Thank You

This work was made possible by the support of:

● Simon Fraser University Library

● SFU Library Systems group

● Mark Jordan @mjordan