The Library of Congress Experience
Wednesday 5/21/2014
SMPTE Bits by the Bay 2014
James Snyder – Library of Congress
LOC Storage
• SANs & NAS used for transactional storage:
– Shared production storage (files created directly on shared storage)
– Quality control & proxy generation production storage
– Staging area for data robot & transmission to backup off-site location
• Failed hard drives replaced as needed
• Complete systems replaced every 5 years or so
LOC Transactional Storage
• SAN connectivity: Minimum of FC-8 to meet required transfer and processing speeds
– Dual FC-8 and FC-16 also in use for very high speed applications like film scanning
• NAS connectivity:
– 1Gig-E OK for audio and SD content
– 10Gig-E fiber required for HD content
• If connections shared with other devices, storage devices must be given network priority
LOC Digital Repository
• Data set is effectively permanent
• Archive contents must stand on their own (no external databases required to know all about essence within a file)
• Must be file format agnostic
• Must scale to very large size (EiB+)
• Very Low Bit Error Rates (BER)
– 10^-19
LOC-MBRS Content Archive
• Dual copies, geographically dispersed
• Oracle Sun StorageTek T10000-C data tapes
– 5 TB per tape
– 9800 slots, 4900 currently populated
– Will skip a generation & go to E tapes when available
• SHA-1 cryptographic checksum used to verify integrity of files while in transit to archive
– Also used to verify integrity of the archive
• Metadata maintained in databases
– Copy needs to be inserted in each archive file as well
• Proxy files maintained on servers; also stored in the archive with the archive, QC, metadata files
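The SHA-1 transit check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Library's actual tooling: the function names `sha1_of_file` and `verify_transfer` are hypothetical, and the file is read in chunks because archive files are often far larger than memory.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source_path, archived_path):
    """Return True when the copy that arrived has the same checksum as the source."""
    return sha1_of_file(source_path) == sha1_of_file(archived_path)
```

Storing the digest alongside the file at ingest would let the same function be reused later to re-verify the archive itself.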
LOC-MBRS Content Archive
• Lesson learned: keeping associated files together with a persistent ID is important
• Current system:
– file system creates sequentially numbered IDs for each number in the file and associates it with the MAVIS record in the database
– Problem: original file names are lost
– Tie to the original unique MAVIS ID is lost at the file level: file names have no relation to the original ID
– When files are pulled from the archive the sequential number is retained as the file name, making renaming a requirement for any workflow tracking
LOC-MBRS Content Archive
• Lesson learned: keeping associated files together with a persistent ID is important
• What is needed:
– Wrap associated assets together in one archive file/object
– Create single MAVIS-based unique ID that persists all the way to the file name in the archive
– Append the MAVIS ID to all file names so original file name is retained, but also tied to the master asset ID number should the files get separated
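One way to picture the "append the ID to the original file name" idea is the sketch below. It is an assumption-laden illustration: the slides do not describe the MAVIS ID format, so the ID is treated as an opaque string, and the function names and the underscore separator are hypothetical choices.

```python
from pathlib import PurePath


def tag_with_asset_id(original_name, asset_id, separator="_"):
    """Append the asset ID to the file name, keeping the original name and extension."""
    path = PurePath(original_name)
    return f"{path.stem}{separator}{asset_id}{path.suffix}"


def split_asset_id(tagged_name, separator="_"):
    """Recover (original_name, asset_id) from a name produced by tag_with_asset_id."""
    path = PurePath(tagged_name)
    # The ID is whatever follows the last separator in the stem.
    stem, _, asset_id = path.stem.rpartition(separator)
    return f"{stem}{path.suffix}", asset_id
```

For example, `tag_with_asset_id("reel_03.mxf", "1234567")` gives `reel_03_1234567.mxf`, so a file that gets separated from its siblings still carries both its original name and the master asset ID. The round trip only works if the ID itself never contains the separator.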
LOC-MBRS Content Archive
• Lesson learned: more than 100 million files and the file management system starts to slow down significantly
• File dependent formats need to be in wrappers
– DPX: each film frame is a single file
– 16mm collection: 40 million feet = 1.6 billion frames
– 35mm nitrate collection: 140 million feet = 2.24 billion frames
– MXF? AXF? Depends on how you process your data
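The frame counts above follow from the standard frames-per-foot figures: 40 for 16mm and 16 for 4-perf 35mm. The short sketch below reproduces the arithmetic and compares the result with the 100 million file threshold from the same slide. The names are illustrative only, and it assumes one DPX file per frame with no wrapper.

```python
# Frames per foot of film; the 35mm figure assumes standard 4-perf frames.
FRAMES_PER_FOOT = {"16mm": 40, "35mm": 16}

# Point at which the file management system was observed to slow down.
FILE_COUNT_THRESHOLD = 100_000_000


def frames_in(feet, gauge):
    """Number of frames (and therefore unwrapped DPX files) in a length of film."""
    return feet * FRAMES_PER_FOOT[gauge]


frames_16mm = frames_in(40_000_000, "16mm")    # 1,600,000,000
frames_35mm = frames_in(140_000_000, "35mm")   # 2,240,000,000
total_frames = frames_16mm + frames_35mm       # 3,840,000,000

# How many times over the threshold the unwrapped collection would be.
times_over = total_frames / FILE_COUNT_THRESHOLD
```

Stored as individual files, the two collections alone would exceed the threshold by a factor of about 38, which is the case for wrapping frame sequences into a single object.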
Film Storage Calculator
Other LOC Repositories
• Hold digital copies of virtually every type of content
• Dual copies, geographically dispersed
• Oracle Sun StorageTek T10000-C & D data tapes
– C: 5 TB/tape; D: 8.5 TB/tape
• IBM TS-1140 also used
• SHA-1 cryptographic checksum used to verify integrity of files while in transit to archive
– Also used to verify integrity of the archive
• Metadata handled in different ways depending on internal client needs
Long Term Archive Challenges
• Standardized data structures to ease long-term repository collection maintenance
– Currently vendor dependent
– AXF standard designed to enhance long-term sustainability of a content collection even through multiple migrations
• File specs, Archive Object specs and metadata standards must be well documented
What Is Archived
• All files are saved individually
• Includes all files produced in the archive process for each asset:
– Archive file
– Viewing/listening proxy
– Any production proxies
– Metadata files
• Not a sustainable model for the future
• Future: AXF standard (SMPTE 2034) Archive Objects may replace BagIt archive objects
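To make the idea of an archive object concrete, here is a minimal sketch that gathers one asset's files into a BagIt-style directory: a `data/` payload folder, a `bagit.txt` declaration and a `manifest-sha1.txt` listing a checksum per payload file. It is a simplified illustration, not the Library's implementation and not a validated BagIt writer. The function name `make_bag` and the version string are assumptions, and a production tool would hash large files in chunks rather than reading them whole.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(asset_files, bag_dir):
    """Copy an asset's files into bag_dir/data and write a SHA-1 manifest beside them."""
    bag = Path(bag_dir)
    payload = bag / "data"
    payload.mkdir(parents=True)

    manifest_lines = []
    for source in map(Path, asset_files):
        target = payload / source.name
        shutil.copyfile(source, target)
        # Hash the copy that landed in the bag, not the source.
        digest = hashlib.sha1(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{source.name}")

    # Bag declaration; the version string here is an assumption.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha1.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

The archive file, proxies and metadata files for one asset then travel as a single unit, and the manifest lets any later reader confirm that nothing was lost or altered.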
Migration at Five+ Years: Lessons Learned
Lessons Learned
• Mass migration is not only possible, but relatively easy once you have the processes figured out
• Physical workflows (people, cataloging, movement of assets) MUST be created and implemented at the same time as the technical workflows
• Metadata is one of the most complex challenges, but MUST be solved
• Get the humans out of the process as much as possible!
• Cataloging processes must be updated to raise throughput
– Cataloging terminology and media terminology are very different!
Lessons Learned
• A lot of work is required to maintain the machinery to play back content
• Everything it took to run the machines 10-50 years ago is still needed today!
– Manuals, training for personnel, spare parts
– Compressed air; 3-phase power
• Some parts simply can’t be replaced and are failing due to age
– Integrated circuits don’t age well! They weren’t designed to last for decades
Lessons Learned
• Fixity checks on files are essential
• Data tapes are still the lowest cost-per-unit for medium and long-term storage
• Don’t move your data tapes if you don’t have to!
– The very act of moving them physically increases bit error rates
• Bit error rate matters!
• What does “30 years” really mean?
Lessons Learned
• You CAN make archiving a part of an overall production facility & workflows
– Archive files, production proxies, streaming proxies: one workflow, multiple outputs
– Metadata: Identify, use, document, IMPLEMENT!!!
• You must design, document and implement files and metadata just as the facility itself was designed, documented and implemented.
• Change control documentation is essential
Digital Archive: The Challenge
• Long term archival storage planning:
– Regular migrations (every 5-7 years)
– Verifying your archive
– Cryptographic checksums (SHA-1) to validate archive integrity
• Future data workflows
– Updating file wrappers as necessary
  • Archival MXF spec AS-07 being worked on
– Updating metadata within file wrappers at migration points
• Plan for the future of storage
– What happens when increased capacities level off?
Migrations
• Every 5-7 years
• Plan to skip a tape generation:
– T10000-C tape contents will be migrated when the T10000-E tapes become available
– Some or all of the retired T10000 tapes & drives kept for age testing
• Is the 30 year vendor rating real? Let’s find out….
Born Digital Content
©, collections and beyond
Born Digital
• Two meanings:
– Content category
  • Content created digitally as files
– Born Digital System: how to handle content
• Born Digital File Reception & Processing
– Secure file processing system with strong security procedures
• Live Capture:
– Live recordings
– Retention for commercial verification
• Physical Media Intake
– Ingesting file content stored on file-based physical media
Born Digital: LOC
• Content types:
– Direct electronic submission for © & collections
  • Television, film industry, internet; software, gaming & learning
– Files generated by Live Capture & Physical Media Intake
– Internal Library audiovisual productions
– Files generated by direct archive file creation:
  • Convert Congressional video and audio from physical media to direct archival encoding
– Direct encoding to HD JPEG2000 Lossless MXF OP1a of LIVE video with metadata
– Will add 2-5 PB per year to archive when in production
– Requires automated workflow to minimize human involvement
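To get a feel for what 2-5 PB per year means in tape terms, the sketch below divides the yearly intake by the per-tape capacities quoted elsewhere in this deck (5 TB for T10000-C, 8.5 TB for T10000-D). It is rough arithmetic under stated assumptions: decimal units (1 PB = 1000 TB), no compression or file-system overhead, and an explicit `copies` argument because the slides do not say whether the 2-5 PB figure already counts both geographically dispersed copies. The function name is hypothetical.

```python
import math

# Native capacities per tape in TB, as quoted on the slides.
TAPE_CAPACITY_TB = {"T10000-C": 5.0, "T10000-D": 8.5}


def tapes_per_year(petabytes_per_year, tape_model, copies):
    """Whole tapes needed per year for the given intake and number of copies."""
    terabytes = petabytes_per_year * 1000  # decimal units assumed
    tapes_for_one_copy = math.ceil(terabytes / TAPE_CAPACITY_TB[tape_model])
    return tapes_for_one_copy * copies


low_c = tapes_per_year(2, "T10000-C", copies=2)    # 800 tapes
high_c = tapes_per_year(5, "T10000-C", copies=2)   # 2000 tapes
low_d = tapes_per_year(2, "T10000-D", copies=2)    # 472 tapes
high_d = tapes_per_year(5, "T10000-D", copies=2)   # 1178 tapes
```

Against a 9800-slot library, an intake at the upper end of that range would consume a sizeable share of the free slots each year, which is one reason tape generation planning matters.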
Born Digital
• Challenges:
– Interoperability for both current use and for the long term
  • Users need to view today
  • Retention period for archiving & use
– Long-term archive storage
• Use international standards whenever possible:
– Interoperability: MXF & metadata
– Long-term archive storage:
  • SMPTE AXF (2034-1)
Future Challenges?
Future Challenges: Content
• 4k Digital Cinema is here
• 4k video will start fall 2014
• 4k home camcorders are in the wings
• 8k is in the wings
• Wider color ranges
• Higher frame rates
• Eventual retirement of NTSC (1.001) frame rates?
– Please oh please oh PLEASE!!!
– No more drop frame time codes
– Transition will need to be planned: consumer equipment can’t handle even integer frame rates yet
• Future archive file designs?
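For readers who have not had to deal with it, the sketch below shows why drop-frame time code exists. At 30/1.001 (about 29.97) frames per second, a counter that labels 30 frames per second drifts against the wall clock, so the usual convention skips frame numbers 00 and 01 at the start of every minute except each tenth minute. The code assumes that common 29.97 fps convention, and the function name is hypothetical.

```python
def frames_to_drop_frame_timecode(frame_count):
    """Convert a 29.97 fps frame count to an HH:MM:SS;FF drop-frame label."""
    dropped_per_minute = 2
    frames_per_minute = 30 * 60 - dropped_per_minute   # 1798 in a "dropping" minute
    frames_per_ten_minutes = 10 * 30 * 60 - 9 * dropped_per_minute  # 17982

    ten_minute_blocks, remainder = divmod(frame_count, frames_per_ten_minutes)

    # Nine of every ten minutes skip two frame numbers at their start.
    skipped = dropped_per_minute * 9 * ten_minute_blocks
    if remainder > dropped_per_minute:
        skipped += dropped_per_minute * (
            (remainder - dropped_per_minute) // frames_per_minute
        )

    # After adding back the skipped labels, plain 30 fps arithmetic applies.
    label = frame_count + skipped
    frames = label % 30
    seconds = (label // 30) % 60
    minutes = (label // 1800) % 60
    hours = label // 108000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"
```

Frame 1799 is labelled `00:00:59;29` and the very next frame is `00:01:00;02`. With integer frame rates none of this bookkeeping would be needed.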
Future Challenges
• Finding enough equipment to keep the migrations going
– Hey buddy, can you spare a piece of obsolete equipment?
• Growing the Digital Repository into the exabyte realm…and beyond?
– Zettabyte….yottabyte….then what….?
• Developing the knowledge and training needed to make sure the employees working on your project are adequately trained with proper documentation
– We are the LAST GENERATION to have worked with analog in the production environment! The next will have to be taught.
– Manuals, basic training info, retired standards documentation
• Updating workflows and workflow software for new requirements
Future Challenges
• Dealing with how our collection continues to age
• Studying how our digital collection’s physical storage & equipment ages
• Watch how automated QC functionality is working & adjusting as necessary
• Encourage vendors to think beyond the 2-5 year survival window: just because you WANT your equipment to obsolesce doesn’t mean it won’t be out there for another 50 years!
• Storage vendors: what do the survival time period statements (like ‘30 years’) REALLY mean? What’s under the hood?
IT Issues
• Most commercial IT equipment has bit error rates of 10^-14, including Ethernet backbone equipment: what good is storage BER of 10^-17 when your system’s best BER is 10^-14?
• How often to check data integrity?
– Continuous process above a certain archive size
– Reading the data can also damage it!
• How often to migrate?
– Individual files: every 5-10 years
– Update the metadata when you migrate
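The bit error rate question can be made tangible with a back-of-the-envelope calculation. The sketch below treats BER as an independent per-bit error probability, which is a simplification, and multiplies it by the number of bits moved. The function name and the 1 PB example volume are illustrative assumptions, not figures from the slides.

```python
PETABYTE = 10**15  # bytes, decimal units assumed


def expected_bit_errors(bytes_moved, bit_error_rate):
    """Expected number of flipped bits when moving bytes_moved through a link or device."""
    return bytes_moved * 8 * bit_error_rate


# Moving 1 PB through equipment at each error rate mentioned in this deck.
errors_at_1e14 = expected_bit_errors(PETABYTE, 1e-14)   # about 80 errors
errors_at_1e17 = expected_bit_errors(PETABYTE, 1e-17)   # about 0.08 errors
errors_at_1e19 = expected_bit_errors(PETABYTE, 1e-19)   # about 0.0008 errors
```

On these assumptions, every petabyte pushed through 10^-14 equipment picks up dozens of bit errors regardless of how good the storage medium is, which is why checksum verification at each hop matters.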
Thank You!
James Snyder
Senior Systems Administrator
National Audio Visual Conservation Center
Culpeper, [email protected]
202-707-7097