Standards Showcase: PREMIS (Preservation metadata) Rebecca Guenther, Library of Congress ALA Annual 2006 LC booth presentation June 24-25, 2006

Page 1

Standards Showcase: PREMIS (Preservation metadata)

Rebecca Guenther, Library of Congress

ALA Annual 2006
LC booth presentation
June 24-25, 2006

Page 2

Overview

What is preservation metadata?
Background
PREMIS work
• Survey
• Data dictionary
Features of the data dictionary
Implementing PREMIS
Future

Page 3

Digital preservation: advances & remaining challenges

Groups around the world and conferences continue to make significant progress in raising awareness of the digital preservation imperative

Gradual shift in focus from articulating the problem to solving it …
• Not so much “Why is digital preservation important?” anymore; rather, “What must be done to achieve preservation objectives?”

Many practical challenges in implementing reliable, sustainable digital preservation programs

One key implementation challenge: preservation metadata

Page 4

Preservation metadata includes:

Provenance:
• Who has had custody/ownership of the digital object?

Authenticity:
• Is the digital object what it purports to be?

Preservation Activity:
• What has been done to preserve the digital object?

Technical Environment:
• What is needed to render and use the digital object?

Rights Management:
• What intellectual property rights (IPR) must be observed?

Makes digital objects self-documenting across time

[Figure: Content packaged with Preservation Metadata, shown 10 years on, 50 years on, and forever]

Page 5

PREMIS background …

Pre-2002: various preservation metadata element sets released
• Different scopes, purposes, underlying models/assumptions
• No international standard; little consolidation of expertise/best practice

June 2002: Preservation Metadata Framework
• International working group (jointly sponsored by OCLC, RLG)
• Comprehensive, high-level description of types of information constituting preservation metadata
• Used OAIS reference model as starting point
• Set of “prototype” preservation metadata elements
• Consensus-based foundation for developing formal preservation metadata specifications … but not an “off-the-shelf, ready to implement” solution

Post-2002: Needed implementable preservation metadata, with guidelines for application and use, relevant to a wide range of digital preservation systems and contexts
• Motivated formation of PREMIS Working Group

Page 6

PREMIS Working Group

Preservation metadata: key component of sustainable digital preservation

June 2003: OCLC, RLG sponsored international working group:
• PREMIS: Preservation Metadata: Implementation Strategies

Objective:
• Define implementable, core preservation metadata, with guidelines/recommendations for management and use

Membership:
• > 30 experts from 5 countries: libraries, museums, archives, government agencies, private sector
• Co-Chairs: Priscilla Caplan (FCLA), Rebecca Guenther (LC)

Page 7

Membership

Priscilla Caplan, FCLA (Chair)
Rebecca Guenther, LC (Chair)
Michael Alexander, British Library
George Barnum, GPO
Charles Blair, U. of Chicago
Olaf Brandt, U. of Göttingen
Adam Farquhar, British Library
David Gewirtz, Yale
Kevin Glavash, MIT/Dspace
Cathy Hartman, U. of N. Texas
Helen Hodgart, British Library
Nancy Hoebelheinrich, Stanford
Roger Howard/Sally Hubbard, Getty Museum
Pam Kircher, OCLC
John Kunze, Calif. Digital Library
Brian Lavoie, OCLC liaison
Robin Dale, RLG liaison
Vicky McCarger, LA Times
Jerry McDonough, NYU/METS
Evan Owens, JSTOR
Erin Rhodes, NARA
Madi Solomon, Walt Disney Co.
Angela Spinazze, ATSPIN
Gunter Waibel, RLG
Lisa Weber, NARA
Robin Wendler, Harvard
Hilde van Wijngaarden, KB
Andrew Wilson, NAA

Page 8

Advisory Committee

Howard Besser, UCLA
Liz Bishoff, OCLC (via Colorado Digitization Program)
Gerard Clifton, National Library of Australia
Gail Hodge, CENDI
Steve Knight, National Library of New Zealand
Maggie Jones, Digital Preservation Coalition
Nancy McGovern, Cornell
Cliff Morgan, Wiley UK
Richard Rinehart, U. of California, Berkeley

Page 9

Survey Report

September 2004: Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community

Survey of existing and planned digital repositories:
• Mission, content, funding, preservation policies/strategies, take-up of OAIS, access mechanisms, and more …
• Use of metadata to support repository processes, functions, policies; types of metadata collected; metadata storage/management practices

~50 responses:
• 28 libraries, 7 archives, 3 museums, and 11 other
• 13 different countries; 45% from U.S.
• 38% in planning; 33% development; 46% production

Snapshot of current practices and emerging trends related to managing preservation metadata in digital archiving systems

• Variety of preservation contexts, institution types, and domains

Page 10

Survey findings

Little experience with digital preservation
• Most didn’t have active preservation strategy
• Many not yet in production
• Cannot assess adequacy of metadata

Lack of common vocabulary and conceptual framework
• Informed by OAIS reference model
• Difference of opinion as to meaning of OAIS compliance

Metadata
• Many recording rights, provenance, technical, administrative, descriptive and structural

Most repositories serve goals of both preservation and access

Page 11

PREMIS Data Dictionary

May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group

237-page report includes:
• PREMIS Data Dictionary 1.0
• Accompanying report (context, data model, assumptions)
• Special topics, glossary, usage examples
• Set of XML schemas to support implementation

Data Dictionary: comprehensive, practical resource for implementing preservation metadata in digital archiving systems

• Comprehensive view of information requirements needed to support digital preservation

• Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation

• Builds on previous work

Page 12

From theory to practice …

[Diagram: OAIS, Framework, PREMIS Data Dictionary, Digital Archiving Systems; label: Preservation Metadata Requirements]

Page 13

Winner: 2005 Digital Preservation Award

Page 14

Some guiding principles and assumptions …

“Implementable, core, preservation metadata”:
• “Preservation metadata”: maintain viability, renderability, understandability, authenticity, identity in a preservation context
• “Core”: what most preservation repositories need to know to preserve digital materials over the long term
• “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows

Implementation neutral:
• No assumptions on specific implementation
• Promote flexibility/interoperability
• Focus on semantic units: what you need to know (implementation-neutral) vs. metadata elements: how you record it (implementation-specific)

• Information that needs to be “recoverable” from the digital archiving system, independent of local implementation
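
The distinction between a semantic unit and a metadata element can be illustrated with a minimal Python sketch. It assumes one semantic unit, the size of an object in bytes, and two hypothetical local encodings (a JSON field named `sizeBytes` and an XML element named `file_size`); neither encoding comes from the Data Dictionary. The only point is that the same value stays recoverable from either system.

```python
import json
import xml.etree.ElementTree as ET

# Semantic unit (what must be known): the object's size in bytes.
SIZE_IN_BYTES = 2038927

# Hypothetical repository A records it as a JSON field.
record_a = json.dumps({"sizeBytes": SIZE_IN_BYTES})

# Hypothetical repository B records it as an XML element.
root = ET.Element("file")
ET.SubElement(root, "file_size").text = str(SIZE_IN_BYTES)
record_b = ET.tostring(root, encoding="unicode")


def recover_size_from_a(record: str) -> int:
    """Read the size back from repository A's JSON encoding."""
    return json.loads(record)["sizeBytes"]


def recover_size_from_b(record: str) -> int:
    """Read the size back from repository B's XML encoding."""
    return int(ET.fromstring(record).findtext("file_size"))
```

Both helpers return the same integer, which is what "recoverable, independent of local implementation" amounts to in this toy case.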

Page 15

Scope of data dictionary

Implementation independent
Descriptive metadata out of scope
Technical metadata applying to all or most format types
Media or hardware details are limited
Business rules are essential for working repositories, but not covered
Rights information for preservation actions, not access

Page 16

PREMIS data model

[Diagram: five entity types: Intellectual Entities, Objects, Rights, Agents, Events]
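
As a rough illustration, the five entity types could be modelled as linked records. This is a minimal Python sketch: the class names follow the slide and the link fields mirror the linking semantic units listed on later slides, but the remaining fields, the identifier strings, and the helper `events_for` are assumptions made for the example, not the Data Dictionary's definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntellectualEntity:
    identifier: str


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    identifier: str
    event_type: str
    linking_agent_ids: List[str] = field(default_factory=list)
    linking_object_ids: List[str] = field(default_factory=list)


@dataclass
class Rights:
    identifier: str
    related_object_id: str
    granting_agent_id: str


@dataclass
class PreservedObject:
    identifier: str
    linking_event_ids: List[str] = field(default_factory=list)
    linking_intellectual_entity_ids: List[str] = field(default_factory=list)
    linking_permission_statement_ids: List[str] = field(default_factory=list)


# Hypothetical records: one file, ingested by one agent, under one permission.
report = IntellectualEntity("ie-1")
repository = Agent("agent-1", "Example Repository")
ingest = Event("event-1", "ingestion", ["agent-1"], ["obj-1"])
permission = Rights("perm-1", related_object_id="obj-1", granting_agent_id="agent-1")
pdf_file = PreservedObject("obj-1", ["event-1"], ["ie-1"], ["perm-1"])


def events_for(obj: PreservedObject, events: List[Event]) -> List[Event]:
    """Return the events that an object links to by identifier."""
    return [e for e in events if e.identifier in obj.linking_event_ids]
```

Links are held as identifiers rather than object references, so each record can be stored and exchanged on its own.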

Page 17

Sample Data Dictionary entry

Semantic unit: size

Semantic components: None

Definition: The size in bytes of the file or bitstream stored in the repository.

Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.

Data constraint: Integer

Object category    Representation    File              Bitstream
Applicability      Not applicable    Applicable        Applicable
Repeatability                        Not repeatable    Not repeatable
Obligation                           Optional          Optional

Examples: 2038927

Creation/Maintenance notes: Automatically obtained by the repository.

Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.
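
The entry above can be exercised with a short Python sketch. It assumes the object is an ordinary file on local disk (a bitstream is treated as if it were stored in its own file) and uses a hypothetical helper, `record_size`, that refuses to record size for a representation, since the entry marks size as not applicable there. The dictionary layout is illustrative only.

```python
import os
import tempfile

# Object categories for which the entry marks `size` as applicable.
SIZE_APPLICABLE = {"file", "bitstream"}


def record_size(path: str, object_category: str) -> dict:
    """Return a record holding the size semantic unit for a stored file."""
    if object_category not in SIZE_APPLICABLE:
        raise ValueError(f"size is not applicable to {object_category!r}")
    size = os.path.getsize(path)  # obtained automatically, always in bytes
    return {"objectCategory": object_category, "size": size}


# Usage: write a small temporary file and record its size.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"preservation metadata")
    sample_path = handle.name

sample_record = record_size(sample_path, "file")
os.remove(sample_path)
```

Because the value is always in bytes, no unit field is stored, which matches the usage note; an exchange format would still need to state that convention.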

Page 18

Semantic units pertaining to objects

objectIdentifier
preservationLevel
objectCategory
objectCharacteristics
creatingApplication
originalName
storage
environment
signatureInformation
relationship
linkingEventIdentifier
linkingIntellectualEntityIdentifier
linkingPermissionStatementIdentifier

Page 19

Semantic units pertaining to Events

eventIdentifier
eventType
eventDateTime
eventDetail
eventOutcome
eventOutcomeDetail
linkingAgentIdentifier
linkingObjectIdentifier
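
To make these units concrete, here is a minimal Python sketch that records one event, a fixity check, using the semantic unit names from the slide as dictionary keys. The identifier values, the event type string, the outcome strings, and the helper `fixity_check_event` are assumptions for illustration; the Data Dictionary does not prescribe this encoding.

```python
import hashlib
from datetime import datetime, timezone


def fixity_check_event(event_id, object_id, agent_id, content, expected_sha256):
    """Build an event record for a SHA-256 fixity check on `content` (bytes)."""
    actual = hashlib.sha256(content).hexdigest()
    matched = actual == expected_sha256
    return {
        "eventIdentifier": event_id,
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": "SHA-256 digest recomputed and compared",
        "eventOutcome": "pass" if matched else "fail",
        "eventOutcomeDetail": f"computed digest {actual}",
        "linkingAgentIdentifier": agent_id,
        "linkingObjectIdentifier": object_id,
    }


# Usage with hypothetical identifiers and in-memory content.
data = b"example bitstream"
good = fixity_check_event("event-1", "obj-1", "agent-1", data,
                          hashlib.sha256(data).hexdigest())
bad = fixity_check_event("event-2", "obj-1", "agent-1", data, "0" * 64)
```

The event carries both linking identifiers, so the record shows which agent acted on which object without embedding either.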

Page 20

Semantic units pertaining to Agents

agentIdentifier
agentName
agentType

Page 21

Semantic units pertaining to Rights

permissionStatement
permissionStatementIdentifier
relatedObject
grantingAgent
grantingAgreement
permissionGranted
act
restriction
termOfGrant
permissionNote
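
Because the Rights entity is scoped to permissions for preservation actions, a repository might consult it before acting on an object. The following Python sketch assumes a simplified, hypothetical encoding in which permissionGranted holds a list of grants, each carrying an act and a list of restrictions; the act names, the identifier values, and the helper `is_permitted` are illustrative and not taken from the Data Dictionary.

```python
def is_permitted(statements, object_id, act):
    """Return True if a statement for `object_id` grants `act` without restriction."""
    for statement in statements:
        if statement["relatedObject"] != object_id:
            continue
        for grant in statement["permissionGranted"]:
            if grant["act"] == act and not grant["restriction"]:
                return True
    return False


# Hypothetical permission statement for one object.
statements = [
    {
        "permissionStatementIdentifier": "perm-1",
        "relatedObject": "obj-1",
        "grantingAgent": "agent-7",
        "permissionGranted": [
            {"act": "replicate", "restriction": []},
            {"act": "migrate", "restriction": ["only after format review"]},
        ],
    }
]
```

Treating any restricted grant as not permitted is a deliberately conservative choice for the sketch; a working repository would apply its own business rules, which the Data Dictionary leaves out of scope.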

Page 22

Community interest

As of March 2006:
• ~25,000 “hits” on Data Dictionary
• More than 100 subscribers to the PREMIS Implementers’ Group discussion list

PREMIS Data Dictionary product of collaboration and consensus
• PREMIS membership reflects variety of institutions, domains, countries
• Multiplicity of perspectives promotes applicability in multiplicity of contexts
• Digital preservation is a shared problem; this invites shared solutions

Data Dictionary useful to any institution or organization committed to the long-term preservation of digital materials

Page 23

PREMIS Maintenance Activity

http://www.loc.gov/standards/premis/

Permanent Web presence, hosted by Library of Congress

Centralized destination for information, announcements, and other PREMIS-related resources

Discussion list for PREMIS implementers (PIG list)

Coordinate future revisions of Data Dictionary and XML schema

Editorial committee being established to guide development and revisions

Page 24

Current activities

Documenting errata and proposed revisions to Data Dictionary (feedback through PIG list)
• http://www.loc.gov/standards/premis/changes.html

PREMIS Implementers’ Registry
• http://www.loc.gov/standards/premis/premis-registry.html

Consultancies, etc.:
• Rights issues for digital preservation (Karen Coyle)
• PREMIS implementation guidelines and recommendations (Deborah Woodyard-Robinson)
• PREMIS-to-OAIS mapping (Brian Lavoie)

PREMIS on the road:
• Digital Curation Centre PREMIS workshop (July 17-18, Glasgow)
• Repository workshop at National Library of Australia (Aug. 31)
• Investigating workshops in US

Page 25

Going forward …

Establish Editorial committee

First revision of Data Dictionary

Work with other initiatives (e.g., METS, Z39.87) to integrate PREMIS with existing standards, technologies, best practices

Contribute preservation metadata resources to digital preservation community that are:
• Openly available
• Oriented toward practical implementation
• Supported by a long-term commitment
• Tools

Page 26

Some implementers …

MathArc (Germany): a joint project of Cornell (NSF-funded) and SUB Göttingen (DFG-funded) to build a distributed archive for mathematical journals, held at two archives to keep the information redundant.

DAITSS (Florida): a preservation repository for the use of the libraries of the public universities of Florida. Uses a locally developed software application (DAITSS), which implements most of the PREMIS data elements.

Ex Libris (DigiTool): an enterprise solution for the management of digital assets in libraries and academic environments, consisting of a number of modules, each designed to address different needs, functions, and workflows pertaining to the life cycle of a digital object.

For more information see:• http://www.loc.gov/premis/premis-registry.html

Page 27

Conclusion

PREMIS Data Dictionary provides a critical piece of reliable digital preservation infrastructure composed of technology, standards, and best practice

PREMIS Data Dictionary is a building block with which effective, sustainable digital preservation strategies can be implemented

PREMIS Data Dictionary tightly focused on implementation:
• Practical implementation was guiding principle in all discussions
• Developed tools to support implementation; released with Data Dictionary
• Further work, with encouragement for international participation and tools development, is ongoing

Unglamorous but necessary infrastructure!

Page 28

URLs, etc.

PREMIS Maintenance Activity:

http://www.loc.gov/standards/premis/

PREMIS Working Group:

http://www.oclc.org/research/projects/pmwg/

Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group:

http://www.oclc.org/research/projects/pmwg/premis-final.pdf

Please send project information to Implementers’ Registry and join the PIG list!