Can a Consortium Build a Viable
Preservation Repository?
Presentation at CNI
March 31, 2014
Bradley Daigle (APTrust – University of Virginia)
Stephen Davis (Columbia University)
Linda Newman (University of Cincinnati)
Suzanne Thorin (APTrust – University of Virginia)
Scott Turnbull (APTrust – University of Virginia)
www.aptrust.org
Academic Preservation Trust
Academic Preservation Trust, a consortium
of 17 institutions, is taking a community
approach in building and managing a
repository infrastructure that will provide
long-term preservation of the scholarly
record. APTrust will also be a DPN first
node.
APTrust Institutions
Columbia University
Johns Hopkins University
Indiana University
North Carolina State University
Penn State University
Stanford University
Syracuse University
University of Chicago
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech
APTrust is hosted by the University of
Virginia, which fully supports 5 ½ staff,
including space and equipment.
Program Director
Lead Engineer
Junior Engineer
Systems Engineer
Content Lead (1/2 time)
Membership Dues
Member dues: $20,000 annually
Supports partner meetings, conference
travel, contract and cloud services,
marketing, and the web site
What is the problem we are trying
to solve?
Columbia University
University of Cincinnati
University of Virginia
Columbia University – Use Case 1
Columbia University Libraries / Information Services has made
commitments …
to granting agencies to provide long-term digital
archiving for digital content created with grant funds
to third-party content creators to provide
permanent access to born-digital content acquired
from them
to continuing to collect and preserve archival
collections, now partly or wholly born-digital content
to permanently preserve University-generated
archival and research content
Columbia University – Use Case 2
We must preserve the content of …
Local Digitization Projects
Preservation-Related Digitization
Institutional Repository / Data Sets
Born Digital Archival Content
Archived Web Sites
Super Dark Archives – highly secure
Columbia University – Questions
Why create our own single-institution long-term preservation repository?
Why divert scarce existing CUL/IS internal equipment funds to storage on a permanent basis?
Why divert scarce existing CUL/IS staff time to creation, enhancement and maintenance of our own local preservation repository, permanently?
Why undergo the costs and staff investment in obtaining local TRAC certification?
University of Cincinnati – Use Case
Question: Why is digital preservation
important to us?
Answer: We have digital collections
where the original source material has
deteriorated or is about to be
intentionally destroyed (magnetic
tapes; nitrate negatives, which are
considered flammable). The digital object is THE
ONLY object.
Magnetic tape image by Daniel P. B. Smith. Released under the GNU Free Documentation License. http://en.wikipedia.org/wiki/File:Magtape1.jpg
Nitrate negative from Cincinnati Subway and Street Improvements (digital collection) http://drc.libraries.uc.edu/handle/2374.UC/702759
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer:
We just moved a repository system from Columbus,
Ohio, to our Cincinnati campus.
10 TB of data, in 16 different VMDKs (virtual machine
disk images), was transferred over the internet.
Checksums were created for each VMDK and verified
upon receipt, some taking 24 hours to calculate.
Checksums were also created for more than one million files,
compared with the information in the repository database, and
re-compared after the storage format was changed (from
VMDK to NFS).
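The verify-upon-receipt step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the tooling Cincinnati actually used: the function names and the choice of SHA-256 are assumptions. Reading in chunks is what makes it workable for multi-hundred-gigabyte disk images, and it is also why a single checksum can take many hours.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large disk images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path, expected_checksum, algorithm="sha256"):
    """Return True if the received file matches the checksum recorded by the sender."""
    # Hex digests are case-insensitive, so normalize before comparing.
    return file_checksum(path, algorithm) == expected_checksum.lower()
```

The sender would record file_checksum() for each image before transfer, and the receiver would call verify_transfer() with that recorded value after the copy completes.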
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: (continued)
We decided to test a full backup and restore. This
took over a week, and we discovered that 16 of our
digital assets were corrupt. We diagnosed the cause,
adjusted, and repeated without error – but if we had
not been comparing before and after checksums of all
files we would not have known about the corruption.
This process took 1.5 months and offered a striking
example of the care that must be taken to avoid losing
data when moving large amounts of it.
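A before-and-after comparison of the kind that exposed the 16 corrupt assets might look like the sketch below. It is an illustration under assumptions (SHA-256, hypothetical function names, manifests held in memory), not a description of the actual procedure: one manifest is built before the backup, another after the restore, and the two are compared file by file.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(before, after):
    """Report files that went missing, appeared, or changed between two manifests."""
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "corrupt": sorted(name for name in shared if before[name] != after[name]),
    }
```

A restore would be accepted only when all three lists in the report are empty.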
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: Our credibility is at stake. We want to be
believed.
Photograph; President Nixon with Elvis Presley; 20 Dec 1970; Richard Nixon Presidential Library and
Museum, Yorba Linda, California.
http://www.nixonlibrary.gov/forresearchers/find/av/photo/images/12_20_70_3.gif
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: (continued)
We are promoting a new digital repository to our
faculty. Its raison d'être – why researchers should
deposit their digital assets in this repository rather than
or in addition to several short-term delivery systems on
our campus – is long term persistence.
We have promised that their assets will also be
preserved in a dark archive such as the Academic
Preservation Trust. We have stated that preservation
means bit-level integrity and format migration.
We have asserted that the Libraries’ traditional mission
of preservation of the cultural record now applies to
the digital scholarly record.
University of Virginia – Use Case
Integral part of our preservation and
curatorial landscape
Soup to nuts process for analogue
materials
◦ Selection
◦ Digitization
◦ Management
◦ Stewardship
UVa - continued
Born Digital
◦ It is all about transfer
◦ Disk images awaiting
arrangement
◦ Need and I/O space
◦ Digital Scholarship
Wish we had this years
ago
UVa Landscape
Local disk (please only temporary) /
scratch disk
Spinning disk – still only backup
Local HSM – local tape backup
APTrust – more robust preservation
actions
DPN – dark archive
Basic Technology Goals
Simple submission packaging – BagIt
Strong Chain of Custody – Logging
Format agnostic basic preservation - Fixity
Strong auditing and reporting - PREMIS
Easily reference items between systems – Identifiers
Simple distribution package for restoration - BagIt
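To make the BagIt and fixity goals concrete, here is a minimal sketch of writing and checking a bag using only the Python standard library. It follows the basic BagIt layout (a bagit.txt declaration, a data/ payload directory, and a checksum manifest); the function names, the choice of SHA-256, and the 0.97 version string are assumptions for illustration, and none of this reflects APTrust's actual bag profile or tag files.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a SHA-256 manifest."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir if needed
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # read_bytes() is fine for a sketch; large payloads should be read in chunks.
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def validate_bag(bag_dir):
    """Return the payload paths that are missing or no longer match the manifest."""
    bag_dir = Path(bag_dir)
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(maxsplit=1)
        target = bag_dir / relative_path
        if not target.is_file():
            failures.append(relative_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            failures.append(relative_path)
    return failures
```

Because the same package format is used for submission and for restoration, the depositor can run the same validation on a restored bag that the repository ran on ingest.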
Flow of Content in APTrust
[Diagram]
Submission Bag
• Metadata (TagFiles)
• Preservation Files
  • data/File1
  • data/File2
  • data/File3
Ingest: Break apart bag and manage as separate Fedora objects
Intellectual Object (related Fedora objects)
• Generic File1
• Generic File2
• Generic File3
Restore: Repackage to same bag format
DPN Bags: Bagged separately in DPN to support versioning
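The ingest and restore flow on this slide can be modeled roughly as below. This is only a sketch of the idea that a bag is broken apart into one intellectual object with a separately managed record per file, then repackaged into the same shape on restore. The class names echo the slide's labels, but the fields, the identifier scheme, and the in-memory representation are all hypothetical; the real system manages these as Fedora objects.

```python
from dataclasses import dataclass, field


@dataclass
class GenericFile:
    identifier: str  # hypothetical scheme: "<object id>/<path inside the bag>"
    path: str
    content: bytes


@dataclass
class IntellectualObject:
    identifier: str
    tags: dict  # contents of the bag's metadata tag files
    files: list = field(default_factory=list)


def ingest(object_id, tag_files, payload):
    """Break a submission bag apart into one object plus one record per payload file."""
    obj = IntellectualObject(identifier=object_id, tags=dict(tag_files))
    for path, content in sorted(payload.items()):
        obj.files.append(GenericFile(f"{object_id}/{path}", path, content))
    return obj


def restore(obj):
    """Repackage the object into the same (tag files, payload) shape it arrived in."""
    return dict(obj.tags), {f.path: f.content for f in obj.files}
```

Keeping each file as its own record is what would allow individual files to be bagged separately downstream, as the slide describes for DPN versioning.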
Challenges
Abstracting away from specific repository
software
Identifying content across distributed
systems
Scaling solutions are still a mixed bag
Managing dependencies in a consortium
Deleting content requires some more
work
Sustainability of Service
Common development frameworks –
Hydra
Use available cloud services - AWS
Align with evolving preservation
ecosystem – OAIS & DDP
◦ Fedora 4
◦ Standards like OAIS and DDP
APTrust and TRAC Certification
APTrust is committed to working toward TRAC certification.
APTrust is the first ever repository to be built from the ground up taking TRAC into account.
A Certification Working Group has been established and will be advising and consulting with the APTrust staff and partners on TRAC objectives.
Initial development work is proceeding at the level of Digital Object Management and Infrastructure.
Examples of TRAC Requirements
“The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.”
“The repository shall have short- and long-term business planning processes in place to sustain the repository over time.”
“The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.”
“The repository shall have the appropriate number of staff to support all functions and services.”
“The repository shall have and use a convention that generates persistent, unique identifiers.”
Academic Preservation Trust – part
of the evolving national digital
preservation infrastructure
“The Task Force envisions the development of a national system of digital archives, which it defines as repositories of digital information that are collectively responsible for the long-term accessibility of the nation’s social, economic, cultural and intellectual heritage instantiated in digital form.”
Preserving Digital Information. Report of the Task Force on Archiving of Digital Information, commissioned by The Commission on Preservation and Access and the Research Libraries Group. May 1, 1996. Executive Summary, iii.