American Archives Horror Story
ARCHIVE
LTO FAILURE AND DATA LOSS
Who We Are: WGBH Media Library and Archives (MLA)
Who We Are: American Archive of Public Broadcasting (AAPB)
...and more than 120 public radio and television stations and archives nationwide
Digitization recently completed
WGBH sent 7,010 tapes for digitization
Returned on 17 LTO-6 tapes
The Born Digital Deliverable
• 5,000 hours of digitized and born digital media
• Up to 59,000 files
• Not to exceed 5.24 terabytes after transcoding
We had some challenges
• Lack of staff resources at stations
• Absence of existing metadata
• Unique identifiers ≠ actual names of files
• Limitations of our metadata management system
• Bicycling hard drives
• Access quality vs. preservation quality
• 5.24 terabytes became 300+ terabytes
The Plan at WGBH
• Send multiple batches totaling 13,500 video and audio files
• Pull 300 TB of files over our network and place them on 76 hard drives of 3 TB each
– Files stored on an LTO-4 robotic tape library in IT
– Checksums did not exist for most files
– Many files were up to 100 GB each
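A quick back-of-envelope check of the plan's numbers, sketched in shell. It assumes nominal decimal capacities (1 TB = 10^12 bytes, as drives are labeled; formatted capacity is somewhat lower), and `drives_needed` is an illustrative name. At 3 TB per drive, 300 TB needs at least 100 drives, while 76 drives hold 228 TB nominal, so presumably some drives were filled, shipped, and reused.

```shell
#!/usr/bin/env bash
# Back-of-envelope capacity check for the plan above.
# Assumption: nominal decimal capacities (1 TB = 10^12 bytes), as drives are
# labeled; usable formatted capacity is somewhat lower.

# Minimum number of drives for total_tb terabytes at drive_tb per drive
# (integer ceiling division).
drives_needed() {
  local total_tb=$1 drive_tb=$2
  echo $(( (total_tb + drive_tb - 1) / drive_tb ))
}

drives_needed 300 3   # prints 100
echo $(( 76 * 3 ))    # prints 228 (TB across 76 drives)
```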
THE PROBLEM
Of 2,069 files pulled for Batch 3, Part 1, 1,195 failed after reaching Crawford:
693 failed initial analysis
394 failed QC
108 failed transcode
= 57.8% failure rate
The next batch had 1,310 failures out of 2,826 files (about 46%)
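The failure rates can be recomputed from the raw counts. This sketch uses awk for the division because bash arithmetic is integer-only; `failure_rate` is a hypothetical helper, and the three failure categories are summed to confirm they account for all 1,195 failures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: failures as a percentage of files, one decimal place.
# awk does the division because bash arithmetic is integer-only.
failure_rate() {
  awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f%%\n", 100 * f / n }'
}

# Batch 3, part 1: the three failure categories from the slide.
failures=$(( 693 + 394 + 108 ))
echo "$failures"                # prints 1195
failure_rate "$failures" 2069   # prints 57.8%
failure_rate 1310 2826          # prints 46.4%
```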
THE PROCESS
Start with a CSV file listing, for each file, its final name at the receiving end, its full path on the source end, and the ID of the offline storage tape that holds it
A shell script then:
- sorts the files by storage tape number
- logs into the DAM using ssh
- transfers each file with scp, through Artesia, from the LTO-4 tape (where it is stored as a tarball) onto a 3 TB hard drive
Later versions of the script used tar rather than scp
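A minimal sketch of the sort-and-transfer loop, not the original WGBH script. It assumes a header-less CSV with columns `final_name,source_path,tape_id` and no commas inside fields, plus placeholder variables `DAM_HOST` and `DEST_DIR`; none of these names come from the deck. It shows the scp variant; a tar-based version would replace `transfer_one`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the transfer loop; not the original WGBH script.
# Assumed CSV (no header, no commas inside fields):
#   final_name,source_path,tape_id
# DAM_HOST and DEST_DIR are placeholder variables.
set -euo pipefail

# Order rows by tape ID (column 3) so all files from one tape are requested
# together. Plain lexical order is enough to make same-tape rows adjacent.
sort_by_tape() {
  sort -t, -k3,3 "$1"
}

# Copy one file from the DAM host onto the destination drive with scp.
# stdin is redirected defensively so the copy cannot consume the CSV stream.
transfer_one() {
  local final_name=$1 source_path=$2
  scp "${DAM_HOST}:${source_path}" "${DEST_DIR}/${final_name}" < /dev/null
}

# Walk the sorted CSV, logging each transfer and any failure to stderr.
run_transfers() {
  local csv=$1 final_name source_path tape_id
  sort_by_tape "$csv" | while IFS=, read -r final_name source_path tape_id; do
    echo "tape ${tape_id}: ${source_path} -> ${final_name}" >&2
    transfer_one "$final_name" "$source_path" || echo "FAILED ${final_name}" >&2
  done
}
```

A run would look like `DAM_HOST=dam.example.org DEST_DIR=/mnt/drive01 run_transfers files.csv`, where the host and mount point are placeholders. Failed names are only logged here; the revised process re-runs the script over them.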
THE PROCESS (REVISED)
After transfer, compare the megabyte block counts of the source and destination files
(No checksums: generating them for such large files took too long under the time pressure)
Failed items are automatically removed from the drive
The transfer script is re-run until all files download successfully
If a file fails repeatedly, assume it has failed on the LTO tape; recall the backup tape from Iron Mountain and attempt to stage the file from there
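A sketch of the revised check-and-retry logic. It assumes the source file is readable locally (for example through a mount); against the remote DAM, the source size would have to be fetched over ssh instead. `mb_blocks`, `same_block_count`, `transfer_with_retry`, and the default of three attempts are hypothetical. Sizes come from `wc -c`, rounded up to 1 MiB blocks, for portability between GNU and BSD systems.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the revised process: copy, compare sizes in
# 1 MiB blocks, delete the copy on mismatch, retry a bounded number of times.
# Assumes the source path is readable locally (e.g. via a mount).

# File size in 1 MiB blocks, rounded up; wc -c is portable across GNU and BSD.
mb_blocks() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 1048575) / 1048576 ))
}

# Succeeds when source and destination have the same block count.
same_block_count() {
  [ "$(mb_blocks "$1")" -eq "$(mb_blocks "$2")" ]
}

# copy_cmd is any command taking (source, destination), e.g. an scp or tar
# wrapper. max_attempts defaults to 3 (an arbitrary illustrative choice).
transfer_with_retry() {
  local src=$1 dest=$2 copy_cmd=$3 max_attempts=${4:-3} attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$copy_cmd" "$src" "$dest" && same_block_count "$src" "$dest"; then
      return 0
    fi
    rm -f -- "$dest"   # failed item removed from the drive
  done
  # Repeated failure: per the slide, treat as failed on LTO; recall backup tape.
  echo "gave up on $src after $max_attempts attempts" >&2
  return 1
}
```

A block-count match only shows that the sizes agree to within a megabyte; corruption that preserves file size would pass, which is exactly what the skipped checksums would have caught.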
THE PROGRESS
Many files that initially failed eventually transferred successfully, either from the initial tape or from a backup, after multiple attempts
Others were never successfully transferred
Out of a planned 10,648 files in the batch, 2,173 were never successfully downloaded – a 20% failure rate
BREAKING DOWN THE FAILURES
ffmpeg -i "${filename}"
mediainfo -f "${filename}"
“moov atom not found”
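“moov atom not found” means ffmpeg could not locate the MP4/QuickTime `moov` box, which holds the index needed to play the file; this typically happens when a file is truncated or was never finalized. Below is a hypothetical triage helper that labels ffmpeg's log text. The matched strings are real ffmpeg messages, but the labels are illustrative and are not Crawford's categories. Because ffmpeg usually prints “Invalid data found when processing input” right after the moov error, the moov pattern is checked first.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: reads ffmpeg log text on stdin, prints a label.
# The patterns are real ffmpeg messages; the labels are illustrative only.
classify_probe_log() {
  local log
  log=$(cat)
  case $log in
    *"moov atom not found"*) echo "missing-moov" ;;
    *"Invalid data found when processing input"*) echo "invalid-data" ;;
    *"No such file or directory"*) echo "missing-file" ;;
    *) echo "no-known-error" ;;
  esac
}

# Usage sketch (requires ffmpeg). With no output file, ffmpeg -i always exits
# with an error, so only its log text is classified.
probe_file() {
  ffmpeg -hide_banner -i "$1" 2>&1 | classify_probe_log
}
```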
QC FAILURE
Files that play but show evidence of corruption, defined by Crawford as “issues that would make the file unusable,” for example:
• a green screen with no audio
• a video that plays for two seconds before the screen goes black or grey
• pixels shifted out of place in a zigzag pattern
• audio that is only digital noise
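Some of these symptoms can be screened automatically. As one example, ffmpeg's `blackdetect` filter logs runs of black frames, which should flag a video that goes black after a couple of seconds. It would not catch a green screen, zigzag pixels, or noise-only audio. The sketch sums the logged `black_duration` values; the function names and thresholds (`d=2`, `pix_th=0.10`) are illustrative choices, not Crawford's QC settings.

```shell
#!/usr/bin/env bash
# Hypothetical QC helper: sums the black_duration values that ffmpeg's
# blackdetect filter writes to its log, read on stdin. Prints seconds.
total_black_seconds() {
  awk '{
         for (i = 1; i <= NF; i++)
           if ($i ~ /^black_duration:/) { split($i, kv, ":"); total += kv[2] }
       }
       END { printf "%.2f\n", total }'
}

# Usage sketch (requires ffmpeg): d=2 reports black runs of at least 2 s;
# pix_th=0.10 sets how dark a pixel must be to count as black.
scan_black() {
  ffmpeg -hide_banner -nostats -i "$1" \
    -vf blackdetect=d=2:pix_th=0.10 -an -f null - 2>&1 | total_black_seconds
}
```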
THE PROGNOSIS
Sample data: 5,000 files with checksums generated at creation
1,012 of those files could not be transferred from LTO, even after multiple attempts
However, MD5s on LTO show the files are unchanged
So the files are good – but can’t be reached?
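Checking a retrieved copy against the MD5 recorded at creation shows whether the bytes that arrived match the bytes originally written. Below is a minimal sketch. It assumes GNU `md5sum` is available or, failing that, BSD/macOS `md5 -q`; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a retrieved file against the MD5 recorded
# at creation. Uses GNU md5sum if available, else BSD/macOS md5 -q.

# Print the lowercase hex MD5 of a file. Reading via stdin avoids md5sum's
# escaping of unusual file names in its output.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum < "$1" | awk '{ print $1 }'
  else
    md5 -q < "$1"
  fi
}

# Succeed if the file's MD5 matches the expected value (case-insensitive).
verify_md5() {
  local expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(file_md5 "$1")
  [ "$actual" = "$expected" ]
}
```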
THE POSSIBILITIES
Files were bad before they went onto LTO: the production environment provides little opportunity for QC
Files are good, but inaccessible on LTO because of problems with the way the data is stored on the tape or with the interaction of the different technologies used to get it out
THE PROBLEMS NOW
Administrative distance between institutional IT and archival needs makes it difficult to get clear answers about the technology we’re using
Staff turnover means information about the original systems and data transfer processes has been lost
Local LTO systems are incompatible with the older tapes, making direct testing currently impossible
NEXT STEPS
Acquire a Linux machine for direct testing of LTO-4 tapes
Test different transfer protocols
Investigate the SL8500 tape library and SAMFS/QFS further
Look for patterns in inaccessible files (file size, date uploaded, system architecture on storage tape)
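Looking for patterns could start with something as simple as failure rates per file-size bucket. This sketch assumes a header-less CSV of `name,size_bytes,status`, where status is `ok` or `failed`. That format and the bucket boundaries are assumptions; the boundaries were picked with the deck's note that many files were up to 100 GB in mind. The same approach extends to upload date or tape ID by bucketing on a different column.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: failure rate per file-size bucket.
# Assumed input CSV (no header): name,size_bytes,status  (status: ok|failed)
# Bucket boundaries are illustrative assumptions.
failure_rate_by_size() {
  awk -F, '
    {
      gb = $2 / 1e9                       # decimal gigabytes
      if (gb < 1)       bucket = "<1GB"
      else if (gb < 10) bucket = "1-10GB"
      else if (gb < 50) bucket = "10-50GB"
      else              bucket = ">=50GB"
      total[bucket]++
      if ($3 == "failed") failed[bucket]++
    }
    END {
      for (b in total)
        printf "%s %d/%d %.1f%%\n", b, failed[b], total[b], 100 * failed[b] / total[b]
    }' "$1" | sort
}
```

Each output line reads `bucket failed/total rate`. A bucket whose rate sits well above the overall rate would be a lead worth following on the tape side.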