Project update: A collaborative approach to “filling the digital preservation gap” for Research Data Management Julie Allinson Technology Development Manager Library & Archives University of York 6 November 2015

Upload: jenny-mitcham

Post on 08-Apr-2017

TRANSCRIPT

Page 1: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Project update: A collaborative approach to “filling the digital preservation gap” for Research Data Management
Julie Allinson
Technology Development Manager
Library & Archives
University of York

6 November 2015

Page 2: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Filling the digital preservation gap: Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

Page 3: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

This is a collaboration

University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist

University of York:
• Julie Allinson – Technology Development Manager
• Jen Mitcham – Digital Archivist

Artefactual Systems

Jisc

Page 4: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Project structure
• Phase 1 – explore: testing, research, thinking - produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proofs of concept at York and Hull (6 months)

Page 5: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Phase 1: Read all about it!

http://digital-archiving.blogspot.co.uk/

Page 6: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Why do we need digital preservation for research data?

• There is a digital preservation gap in current RDM infrastructures

• We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously

• Funder requirements around retention

Page 7: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

University of York RDM questionnaire 2013

• Which data management issues have you come across in your research over the last five years?
– “Inability to read files in old software formats on old media or because of expired software licences”
– 24% of 181 researchers who answered this question admitted this had been a problem for them

Why do we need digital preservation for research data?

Page 8: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Why Archivematica?

“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today.”

Page 9: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems

From https://www.archivematica.org/en/

Page 10: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Archivematica for RDM?
• Flexible - can support different institutional needs and workflows
• Automates many digital preservation tasks
• Can be integrated with other systems
• Good for those with limited resources
• Enhancements driven by and for the digital preservation community

Page 11: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Archivematica for RDM?

It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time

Page 12: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Phase 2: Improving Archivematica
1. Deliverable 1: Automated DIP regeneration
2. Deliverable 2: METS parsing tools
3. Deliverable 3: Generic search REST API (proof-of-concept)
4. Deliverable 4: Support multiple checksum algorithms
5. Deliverable 5: Enhance PRONOM integration
6. Deliverable 6: Automation tools documentation

Page 13: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable One

✓ Research data needs to be kept, but we don’t know if anyone will ever want it and it might be *massive*

The Solution: enable the DIP to be generated ‘on request’ and not as part of the initial ingest

Page 14: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable Two

✓ We want to be able to grab the DIP, and metadata about it, for pulling into our repository

The Solution: a library to help with parsing and creating METS files
https://github.com/artefactual-labs/mets-reader-writer
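To make the METS parsing idea concrete, here is a minimal sketch that pulls file identifiers and paths out of a METS file section. It uses only Python's standard library, not the mets-reader-writer library linked above, and the sample document, the `list_files` helper and the file names are all illustrative assumptions rather than real Archivematica output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the METS and XLink standards.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# A tiny hand-written METS fragment (assumption: illustrative only).
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/data.csv"/>
      </mets:file>
      <mets:file ID="file-2">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/readme.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def list_files(mets_xml):
    """Return one dict per mets:file with its ID, file group USE and path."""
    root = ET.fromstring(mets_xml)
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        use = group.get("USE")
        for file_el in group.findall("mets:file", NS):
            flocat = file_el.find("mets:FLocat", NS)
            # Attributes in a namespace are addressed as {uri}localname.
            path = None
            if flocat is not None:
                path = flocat.get("{%s}href" % NS["xlink"])
            files.append({"id": file_el.get("ID"), "use": use, "path": path})
    return files


if __name__ == "__main__":
    for entry in list_files(SAMPLE_METS):
        print(entry["id"], entry["use"], entry["path"])
```

A repository integration would typically use a listing like this to decide which files and metadata to pull in alongside the DIP.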

Page 15: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable Three

✓ We want to be able to report on what we have

The Solution: a search API to answer basic questions about the number of files in storage, their formats, date of ingest, etc.

** we’re working with DMAOnline @lancaster
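The kind of report described here can be sketched as a simple aggregation over file records. This does not call the Archivematica search API; the record fields (`name`, `format`, `ingested`), the sample data and the `summarise` helper are assumptions made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical records, standing in for whatever a search API might return.
RECORDS = [
    {"name": "survey.csv", "format": "CSV", "ingested": date(2015, 9, 14)},
    {"name": "notes.pdf", "format": "PDF", "ingested": date(2015, 10, 2)},
    {"name": "results.csv", "format": "CSV", "ingested": date(2015, 10, 20)},
]


def summarise(records):
    """Answer the basic questions: how many files, which formats, ingested when."""
    ingest_dates = [record["ingested"] for record in records]
    return {
        "file_count": len(records),
        "formats": dict(Counter(record["format"] for record in records)),
        "first_ingest": min(ingest_dates) if ingest_dates else None,
        "last_ingest": max(ingest_dates) if ingest_dates else None,
    }


if __name__ == "__main__":
    print(summarise(RECORDS))
```

Statistics in this shape are the sort of thing a dashboard such as DMAOnline could consume, though the actual exchange format is not described in the slides.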

Page 16: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable Four

✓ With large datasets, the current checksum mechanism in Archivematica could be a bottleneck

The Solution: support for multiple checksum algorithms
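As a rough illustration of why a choice of algorithm helps, the sketch below computes a checksum with a selectable algorithm while reading data in chunks, so a large file never has to fit in memory. It is not Archivematica's implementation; the `checksum` helper, the default algorithm and the chunk size are assumptions.

```python
import hashlib
import io


def checksum(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary stream chunk by chunk with the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps reading until read() returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    data = b"research data " * 1000
    for name in ("md5", "sha1", "sha256"):
        print(name, checksum(io.BytesIO(data), name))
```

For a real file, the stream would come from `open(path, "rb")`. A faster algorithm can reduce the cost of fixity checks on very large datasets, while a stronger one may be preferred where collision resistance matters.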

Page 17: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable Five

✓ What about all those file formats that Archivematica can’t identify?

The Solution: a mechanism for running file identification with multiple tools and a report of unidentified formats, working with PRONOM to improve their coverage
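The approach of trying several identification tools and reporting what none of them recognise can be sketched as follows. The two "tools" here are toy stand-ins written for this example, not real identification tools, and the format labels are plain strings rather than PRONOM identifiers.

```python
def identify_by_signature(name, data):
    """Toy stand-in for a signature-based tool: inspect the leading bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return None


def identify_by_extension(name, data):
    """Toy stand-in for an extension-based tool."""
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return {"csv": "csv", "txt": "text"}.get(extension)


# Tools are tried in order; the first non-None answer wins.
TOOLS = [identify_by_signature, identify_by_extension]


def identify_all(files, tools=TOOLS):
    """Return (identified, unidentified) for a mapping of file name to bytes."""
    identified = {}
    unidentified = []
    for name, data in files.items():
        result = None
        for tool in tools:
            result = tool(name, data)
            if result is not None:
                break
        if result is None:
            unidentified.append(name)
        else:
            identified[name] = result
    return identified, sorted(unidentified)


if __name__ == "__main__":
    sample = {
        "report.pdf": b"%PDF-1.4 example",
        "table.csv": b"a,b\n1,2\n",
        "model.xyz": b"\x00\x01\x02",
    }
    found, missing = identify_all(sample)
    print("identified:", found)
    print("unidentified:", missing)
```

The list of unidentified files is the useful output here: it is the raw material for a report that could be fed back to a format registry.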

Page 18: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Deliverable Six

✓ We want to make it easier for institutions to adopt Archivematica

The Solution: documentation and screencasts for Archivematica automation tools, e.g. https://wiki.archivematica.org/Getting_started#Installation

Page 19: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

All of these new features will become part of the core Archivematica code in 2016

Page 20: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Phase 3
• The plan is to run a third phase of the project to:
✓ implement prototype RDM workflows with preservation using the new Archivematica features at York and Hull
✓ use the search API to populate DMAOnline with stats about datasets
✓ do more community outreach
• We will be pitching to Jisc in December for phase three #fingerscrossed

Page 21: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

How do York plan to use Archivematica?

[Diagram of the planned workflow. Components shown: Pure, RDMonitor, Archivematica, AIP, AIP Store, PURE Web Services, Archivematica REST API, DIP, Repository, Data Catalogue. Key: human to human, machine to machine, human to machine.]

Page 22: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management
Page 23: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Where to find out more

http://www.york.ac.uk/borthwick/

Page 24: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

The Bigger Picture
• Jisc are looking at building shared services for RDM
• Our project is inputting into the specification and discussion
• One area we’d be interested to find out more about is the appetite for ‘above campus’ options - discussion planned for later.

Page 25: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

How could you use Archivematica?

• Host it in-house and link it to an existing repository/access system (for example DSpace, CONTENTdm, Fedora/Hydra ...or a CRIS)

• Host it in-house and use as a standalone system (you would need to have a storage system in place and establish a way of facilitating access to the data)

• Sign up for a hosted instance of Archivematica with archivesDIRECT (combines Archivematica with DuraCloud storage)

• Sign up for a hosted instance of Archivematica with Arkivum (combines Archivematica with Arkivum storage)

Page 26: Project update: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Thanks!

[email protected]

Useful links:
Borthwick website: http://www.york.ac.uk/borthwick/
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170