the reality of digital transfer @archivesnz
DESCRIPTION
Presentation for the Archives New Zealand Records Management Network Event describing the reality of digital transfer. It looks at the potential scale of digital transfers, based on the largest collections we investigated during the initial transfers project, and compares it with the accession work we are investigating at the time of writing. It also covers some of the challenges involved and how we are tackling them.
TRANSCRIPT
Department of Internal Affairs
The Reality of Digital Transfer
@ArchivesNZ
Ross Spencer, Talei Masters
Archives New Zealand
Records Management Network Event,
Tuesday November 25 2014
Background
Born Digital and Cultural Heritage Conference
Melbourne*: http://bit.ly/1utAqz0
Spencer, Braden, Hutar, Masters, Crouch, Mosely,
Fly Away Home: Pilot Transfer of Born-digital Records at
Archives New Zealand
Collected our experiences from late 2013 through to early
2014. Royal Commission work through to GDAP Closure
and beginning of eAccessions.
* http://playitagainproject.org/conference-report/
A missing piece of the jigsaw…
• An appraisal of the technical challenges
• The first of a much bigger puzzle?
• We understood a minimal set of descriptive
metadata e.g. transfer metadata file; mapping
of EDRMS fields to that schema
• But the collection profile was missing –
technical implications of digital preservation…
And the numbers were/are huge!
Royal Commission on the Pike River Coal Mine Tragedy
Two EDRMS: Lotus Notes DMS, AccessData Summation
374,264 Files (200GB)
66,580 Directories
3,892 Unidentified Objects
15 Unidentified Extensions
87 Known Formats
55,425 Duplicates (Content)
Analysis time: 108 minutes
24,190 Files (5GB)
641 Directories
1,254 Unidentified Objects
8 Unidentified Extensions
62 Known Formats
6,200 Duplicates (Content)
Analysis time: 44 minutes
There’s more…
The Canterbury Earthquakes Royal Commission (partial stats)
One EDRMS: Lotus Notes DMS… (but a different flavour!)
11,505 Files (57GB)
246 Directories
123 Unidentified Objects
2 Unidentified Extensions
55 Known Formats
2,468 Duplicates (Content)
Analysis time: stats not collected
Performance of tools…
Just one (fairly profound?) example for you: Pike River
metadata extraction and checksum generation (‘triage’) took
2949m21.680s
49 Hours!
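The arithmetic behind the 49-hour figure can be checked with a short sketch. The duration string looks like the output of a shell `time` command, which is an assumption here; the function name `real_time_to_hours` is hypothetical and used only for illustration.

```python
import re


def real_time_to_hours(elapsed: str) -> float:
    """Convert a duration written as '<minutes>m<seconds>s' into hours."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", elapsed)
    if match is None:
        raise ValueError(f"unrecognised duration: {elapsed!r}")
    minutes = int(match.group(1))
    seconds = float(match.group(2))
    return (minutes * 60 + seconds) / 3600


# The triage figure quoted on the slide.
hours = real_time_to_hours("2949m21.680s")
print(f"{hours:.2f} hours")  # prints "49.16 hours"
```

Rounded down, that is the 49 hours quoted above.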
Questions already forming…
• How do we speed things up?
• How do we make reporting consistent?
• Where do we begin with this information?
• Some answers already appearing: the stats report is now
generated by a Python script written in response to these
issues: https://github.com/exponential-decay/droid-sqlite-analysis
• Relies only on The National Archives’ DROID tool: a file
listing, format ID, and checksumming utility
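As a rough illustration of how a collection profile like the ones on the earlier slides might be derived from a DROID export, here is a minimal sketch. It is not the droid-sqlite-analysis script. The column names (`TYPE`, `EXT`, `SIZE`, `PUID`, `MD5_HASH`), the sample rows, and the definition of a duplicate (a file whose checksum is shared with at least one other file) are all assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a DROID CSV export.
SAMPLE_EXPORT = """TYPE,NAME,EXT,SIZE,PUID,MD5_HASH
Folder,reports,,,,
File,a.pdf,pdf,100,fmt/18,aaa
File,b.pdf,pdf,100,fmt/18,aaa
File,c.xyz,xyz,50,,bbb
File,d.doc,doc,70,fmt/40,ccc
"""


def summarise(rows, hash_column="MD5_HASH"):
    """Build a small collection profile from DROID-style rows."""
    files = [r for r in rows if r.get("TYPE") != "Folder"]
    folders = [r for r in rows if r.get("TYPE") == "Folder"]
    # A file with no PUID is treated as unidentified (assumption).
    unidentified = [r for r in files if not r.get("PUID")]
    checksums = Counter(r[hash_column] for r in files if r.get(hash_column))
    return {
        "files": len(files),
        "bytes": sum(int(r.get("SIZE") or 0) for r in files),
        "directories": len(folders),
        "unidentified_objects": len(unidentified),
        "unidentified_extensions": len({r["EXT"] for r in unidentified if r.get("EXT")}),
        "known_formats": len({r["PUID"] for r in files if r.get("PUID")}),
        # Count every file whose checksum appears more than once.
        "duplicates": sum(n for n in checksums.values() if n > 1),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
report = summarise(rows)
for label, value in report.items():
    print(f"{label}: {value}")
```

Generating the report from one script, rather than by hand, is one way to keep the figures consistent from one transfer to the next.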
eAccession One [e1]
Legacy accessions that give us the opportunity to apply lessons
learned from the Initial Digital Transfers…
175 Files (166.5 MB)
10 Directories
0 Unidentified Objects
0 Unidentified Extensions
7 Known Formats
0 Duplicates (content)
eAccession Four [e4]
eAccessions were seen to be the least complex and allowed
us to focus, primarily, on the challenge of ingest…
1295 Files (565.0 MB)
6 Directories
2 Unidentified Objects
1 Unidentified Extension
12 Known Formats
2 Duplicates (content)
Note: An issue obscured in the original statistics…
A number of false positives! System files
identified as something more generic.
Thumbnail preview files and Serif PagePlus files
can look like MS Office file-like
objects.
Technical Challenges in e1 and e4
• [Tools] Ability to handle multi-byte character encodings, e.g. Māori macrons
‘Ā’.
• [Tools] Unidentified files and false positives.
• [Tools] Recording of pre-conditioning actions on ingest into digital
preservation system.
• [Tools] Implementing CSV ingest mechanism; configuration, code, and
workflow.
• [Pre-conditioning / Tools] Digital preservation system’s ability (Rosetta)
to handle contiguous spaces in filenames.
• [Pre-conditioning] One invalid JPEG. Required rearrangement of
application marker segments.
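Two of the challenges above, multi-byte characters and contiguous spaces in filenames, together with the need to record pre-conditioning actions, can be illustrated with a small sketch. This is not the workflow used for e1 and e4. The function name `precondition_filename`, the choice of NFC normalisation, and the decision to collapse runs of spaces into one space are all assumptions made for the example.

```python
import re
import unicodedata


def precondition_filename(name: str):
    """Return (new_name, log), where log lists each action taken or issue flagged."""
    log = []

    # Compose characters such as 'a' + combining macron into a single code point.
    normalised = unicodedata.normalize("NFC", name)
    if normalised != name:
        log.append("normalised Unicode to NFC")

    # Collapse runs of two or more spaces into a single space.
    collapsed = re.sub(r" {2,}", " ", normalised)
    if collapsed != normalised:
        log.append("collapsed contiguous spaces")

    # Macrons are kept as-is, only flagged so that downstream tools can be checked.
    if not collapsed.isascii():
        log.append("flagged non-ASCII characters (left unchanged)")

    return collapsed, log


new_name, log = precondition_filename("Wha\u0304nau  report.doc")
print(new_name)
for entry in log:
    print(" -", entry)
```

Returning the log alongside the new name means every change can be recorded with the object at ingest rather than reconstructed afterwards.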
What next..?
• One step at a time. Accessions e1 and e4; develop capability
further with e2 and e3.
• Incorporate metadata extraction tool JHOVE into process
following experience with e1 and e4, possibly via FITS
• Refine current metrics and the presentation of statistics, e.g.
make them more useful for Archivists working on the born-digital
material we’re already in possession of…
• Ideal: Archivists’ knowledge (processes, analysis, diagnosis)
becomes actuated.
What next..?
• SCALE!
Thank you!