incompatible or interoperable? a mets bridge for a small gap between two digital preservation...

20
Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@ msu.edu Aaron Collie Digital Curation Librarian [email protected]

Upload: darlene-marshall

Post on 22-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Incompatible or Interoperable?A METS bridge for a small gap between two digital preservation software packages

Lucas MakMetadata & [email protected]

Aaron CollieDigital Curation [email protected]

Page 2: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

What we wanted

What we found

What we did

METS

Page 3: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

We bridged a gap, but we didn’t close the bridge

METS

Page 4: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

METS 1.8

METS

Archivematica Output: METS.xml AIP DIP

METS

Fedora Commons Input: METS Fedora Extension

• fedora-batch-ingest.sh Datastreams!

Staging12 TB

Dark Archive84 TB

Serving5 TB

AIP

DIPDIP

AIP

Page 5: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

METS

METS Fedora Ext. 1.1

XSL

Humans

Staging12 TB

METS 1.8

DIP

AIP

ProQuest

Page 6: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Persistent ID (PID)

METS

PQ_DATA

DC

MODS

PREMIS

(…)

METS

DIP

AIP (“E”)

METS Fedora Ext. 1.1

Page 7: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Why?

We wanted to be able to control and systematize ingest at the microservice level And we like the direction Archivematica is taking

We wanted to pipe technical and preservation metadata into Fedora Commons This was the reason we got started

We haven’t contributed to the open source community, and we wanted something to learn on. We are thinking of it as professional development…

Page 8: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Comparing Archivematica & Fedora METS Different schema

Archivematica: METS v. 1.8• http://www.loc.gov/standards/mets/version18/mets.xsd

Fedora: Fedora METS 1.1• http://fedora-commons.org/definitions/1/0/mets-fedora-ext1-1.xsd

Differences in structure, elements, attributes, & values allowed

Page 9: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<structMap> Archivematica

• Physical structMap of the bag (i.e. directory structure) Fedora: No <structMap> per v.1.1*

Solution: Structure represented by <GROUPID> & <SEQ> attributes of <mets:file> <SEQ> by page no. embedded in filename

• only physical arrangement is possible unless changing file naming convention to include logical info

<GROUPID> by file type/usage (e.g. preservation master, high/low resolution access copies)

* <mets:structMap> is allowed in schema v.1.0 (used until Fedora 3.0)

Page 10: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<fileSec> Archivematica

• Two file groups: “Original” & “Submission documentation”– Original: digital objects– Submission documentation: descriptive metadata XML files

Fedora• Datastreams to be ingested as files

– Files of digital objects and others (e.g. Archivematica METS)– Descriptive metadata XML files are ingested as “inline XML

datastreams”» Copy all XML files in “Submission documentation” into

separate <dmdSecFedora> elements

Page 11: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<amdSec> Archivematica: Hierarchical structure

<amdSec ID=“amdSec1”><techMD ID=“techMD1”/>…<digiProvMD ID=“digiProvMD1”/>

</amdSec><amdSec ID=“amdSec2”>

<techMD ID=“techMD2”/>…<digiProvMD ID=“digiProvMD2”/>

</amdSec>

• 1 digital file has 1 <amdSec>• All <techMD>, <rightsMD>, <sourceMD> and <digiProvMD> pertaining to the same file are

nested under the same <amdSec>

Page 12: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Fedora: Flat structure <amdSec ID=“tech1”><techMD ID=“tech1.0”/></amdSec><amdSec ID=“digiProv1”><digiProvMD ID=“digiProv1.0”/></amdSec><amdSec ID=“tech2”><techMD ID=“tech2.0”/></amdSec>

• To accommodate inline XML datastream versioning– ID (syntax DSn.v) contains both:

» the number of the inline datastream (n) and » the version number of the datastream (v)

– Individual <amdSec> serves as container and its ID serves to indicate datastream number

– <techMD> and alike have their IDs to indicate datastream version number

Page 13: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<AMDID> attribute in <mets:file> Archivematica

• Pointing to one <amdSec>, which has <techMD>, <rightsMD>, <sourceMD>, and <digiProvMD> nested within, per file– <mets:file ID=“file1” AMDID=“amdSec1”/>

Fedora• Pointing to multiple <amdSec>, each of which contains <techMD>,

<rightsMD>, <sourceMD>, or <digiProvMD>, per file– <mets:file ID=“file1” AMDID= “tech1 rights1 source1

digiProv1”/>

Page 14: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<dmdSec> Archivematica

• Only 1 Dublin Core record is allowed to describe the SIP• Constrained by Archivematica workflow instead of METS schema• Additional descriptive metadata XML records are included in “Submission

documentation” folder Fedora

• Fedora extension element: <dmdSecFedora>• Allowed MDTYPE: MARC, EAD, DC, NISOIMG, LC-AV, VRA, TEI Header, DDI, FGDC,

& OTHER• Copy XML files in “Submission documentation” folder into separate

<dmdSecFedora>– MODS has to be labeled as “OTHER”– Use namespace URI to assign correct “MDTYPE”

» Does not work with TEI Header or EAD

Page 15: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<mets:metsHdr> Archivematica

• Does not use (optional in METS schema) Fedora

• <RECORDSTATUS> attribute to indicate whether the object is “active”, “inactive” or “deleted”

• Hard-coding in with constant data<mets:metsHdr RECORDSTATUS="A"> <mets:agent ROLE="IPOWNER" TYPE="ORGANIZATION"> <mets:name>MSU Libraries Digital and Multimedia Center</mets:name> </mets:agent> </mets:metsHdr>

Page 16: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

<OWNERID> attribute in <mets:file> Archivematica

• Does not use (optional in METS schema) Fedora

• To indicate whether the file is “managed by Fedora internally”, “externally referenced”, or “redirected”– Though optional according to Fedora-METS schema

• Determine based on filename or file format– Archivematica add “checksum” into filename for files

generated during the preservation workflow

Page 17: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Proposed Workflow

84TB

Dark Archive(s)

Staging Area 12 TB

Serving Share(s)

Web Display

METS

AIP DIP

Page 18: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

What a bridge gets us: Automatically extracts and captures technical &

preservation metadata Eases handling of complex objects with lots of

metadata or parts Maintains and manages separate AIP/DIP packages

METS

Page 19: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

What a full integration might benefit from: Archivematica A/DIP Content Model & Solution Pack Integrated AIP management

Including dashboard GUI Including JMS messaging

Integrated rebuilds from filesystem Currently supported in Fedora Commmons On Roadmap for archivematica

Automated ingest, improved handling

Page 20: Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages Lucas Mak Metadata & CatalogLibrarian makw@msu.edu

Questions?

Lucas Mak ([email protected]) Aaron Collie ([email protected])