SFU Library's METS-Bagger Tool
DESCRIPTION
Normalizing existing digitized content into standardized packages for robust long-term management. A report on SFU Library's METS-Bagger tool, with a discussion of the benefits, design principles used for the packaging specification, and potential next steps. Presented at Code4Lib BC, November 28, 2013.
TRANSCRIPT
METS-Bagger Tool
Normalizing existing digitized content into standardized packages for robust long-term management.
Marcus Emmanuel Barnes
#c4lbc
2013-11-28
Background
● SFU Library holds about 15 TB of content
○ the Library has created high-quality master versions of content it has digitized using ‘preservation-friendly’ formats.
○ descriptive metadata exists for almost all of it.
However, this content was not previously managed with generally accepted digital preservation practice.
Solution
● SFU Library Digitized Content Packaging Specification
● METS-Bagger tool for normalizing existing digitized content based on this specification for robust long-term management.
METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on collection manifest
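A minimal sketch of what a manifest-based integrity check could look like, using only the Python standard library. The function names, the SHA-256 algorithm, and the "checksum, two spaces, relative path" line format are assumptions made for illustration; they are not taken from the METS-Bagger code.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(collection_dir, manifest_path):
    """Record one 'checksum  relative/path' line per file in the collection."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(collection_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, collection_dir).replace(os.sep, "/")
            lines.append(f"{file_sha256(full)}  {rel}\n")
    with open(manifest_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)


def check_manifest(collection_dir, manifest_path):
    """Return a list of (relative_path, problem) tuples; empty means all good."""
    problems = []
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            expected, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(collection_dir, *rel.split("/"))
            if not os.path.isfile(full):
                problems.append((rel, "missing"))
            elif file_sha256(full) != expected:
                problems.append((rel, "checksum mismatch"))
    return problems
```

In this sketch the manifest is stored outside the collection directory, so it can be re-run later (manually or on a schedule) without the manifest listing itself.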
Collection Normalization
● Processes existing collections of files into a format compliant with the SFU Library Digitized Content Packaging Specification
● Packaging Formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
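To make the BagIt side concrete, here is a hand-rolled sketch of the smallest useful bag: a bagit.txt declaration, a data/ payload directory, and a payload manifest. It is written with the standard library for illustration only; the function names, the SHA-256 manifest algorithm, the "0.97" version string (the draft current around the time of the talk), and zip serialization are assumptions, and the real tool may rely on an existing BagIt library instead.

```python
import hashlib
import os
import shutil


def make_minimal_bag(source_files, bag_dir):
    """Copy files into bag_dir/data and write bagit.txt plus a payload manifest."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir)
    manifest_lines = []
    for src in source_files:
        name = os.path.basename(src)
        dest = os.path.join(data_dir, name)
        shutil.copy2(src, dest)
        with open(dest, "rb") as handle:
            checksum = hashlib.sha256(handle.read()).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{checksum}  data/{name}\n")
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as handle:
        handle.write("BagIt-Version: 0.97\n")
        handle.write("Tag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8") as handle:
        handle.writelines(manifest_lines)
    return bag_dir


def serialize_bag(bag_dir):
    """Serialize the bag as <bag_dir>.zip with the bag directory as the top-level entry."""
    parent, name = os.path.split(os.path.abspath(bag_dir))
    return shutil.make_archive(os.path.join(parent, name), "zip", root_dir=parent, base_dir=name)
```

This sketch flattens the payload to file names only, so two source files with the same name would collide; a real implementation would preserve relative paths.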
How Collection Normalization Works
1. Configuration file for settings
2. Script walks the directory tree of a collection, compiles list of files to be preserved
3. Files are collated into items (e.g., newspaper issue), METS file is generated
4. Item files and associated METS file are bagged (and serialized)
5. Future: A collection manifest is created for the collection for integrity checking (automatic or manual).
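The walk-and-collate steps above can be sketched roughly as follows. This assumes, purely for illustration, that each item (e.g., a newspaper issue) lives in its own directory and that files of interest are selected by extension; the function name and default extensions are hypothetical, and the actual script may collate items differently.

```python
import os
from collections import defaultdict


def collate_items(collection_root, extensions=(".tif", ".jp2")):
    """Walk a collection and group files of interest by their parent directory.

    Returns a dict mapping an item identifier (the directory path relative
    to the collection root) to a sorted list of file paths for that item.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    items = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(collection_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.lower().endswith(wanted):
                item_id = os.path.relpath(dirpath, collection_root)
                items[item_id].append(os.path.join(dirpath, name))
    return dict(items)
```

Each entry in the returned dictionary is then a candidate for METS generation and bagging.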
Before and After Processing
Design Principles
● a minimalist implementation - uses as few METS and BagIt options as possible.
● incorporates three widely implemented and understood standards: METS, BagIt and UUID (Universally Unique Identifiers)
● Technical metadata in the METS file should include, at a minimum, bit-level checksums, file type identification, creating application and, where possible, format validity
● Whenever possible, include descriptive metadata for the item in the METS file.
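As an illustration of the minimalist approach, the sketch below builds a bare-bones METS document that carries a UUID-based object identifier and a bit-level checksum per file. It covers only the checksum part of the technical metadata; file type identification, creating application, and format validity would come from a tool such as FITS. The function names, the "master" and "item" attribute values, and the SHA-256 choice are assumptions, not the SFU specification's actual profile.

```python
import hashlib
import os
import uuid
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_mets(file_paths, item_dir):
    """Build a minimal METS tree: one fileSec and one flat structMap."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": f"urn:uuid:{uuid.uuid4()}"})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "item"})
    for index, path in enumerate(file_paths, start=1):
        file_id = f"file-{index:04d}"
        file_el = ET.SubElement(
            file_grp,
            f"{{{METS_NS}}}file",
            {"ID": file_id, "CHECKSUM": sha256_of(path), "CHECKSUMTYPE": "SHA-256"},
        )
        href = os.path.relpath(path, item_dir).replace(os.sep, "/")
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        # Point the structural map at the file just described.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return ET.ElementTree(mets)
```

The resulting tree can be written next to the item files with `tree.write(path, encoding="utf-8", xml_declaration=True)` before bagging.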
Script Details
● Configuration file, main script, log file, processed collection output directory
● Written in Python so the tool can be used on multiple platforms
● Plugins for technical metadata (FITS) and descriptive metadata.
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
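A configuration file covering the options listed above might be read along these lines. The section names, option names, and INI format are all hypothetical and chosen only to show the idea; the tool's real configuration file may look quite different.

```python
import configparser

# Hypothetical configuration; every key name here is an assumption.
EXAMPLE_CONFIG = """
[collection]
source_dir = /path/to/collection
output_dir = /path/to/output

[run]
test_run = yes
test_run_limit = 10
skip_technical_metadata = no
file_types = .tif, .jp2, .pdf
"""


def load_settings(text):
    """Parse INI-style settings text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    run = parser["run"]
    file_types = tuple(
        ext.strip().lower()
        for ext in run.get("file_types", fallback="").split(",")
        if ext.strip()
    )
    return {
        "source_dir": parser["collection"]["source_dir"],
        "output_dir": parser["collection"]["output_dir"],
        "test_run": run.getboolean("test_run", fallback=False),
        "test_run_limit": run.getint("test_run_limit", fallback=0),
        "skip_technical_metadata": run.getboolean("skip_technical_metadata", fallback=False),
        "file_types": file_types,
    }
```

With settings like these, a test run would stop after the configured number of items, and the file type list would decide which files are picked up during the directory walk.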
Future
● Addition of manifest and integrity checking tools that check a collection against its manifest
● Additional plugins
● Sharing code on GitHub
Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan