Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
Posted on 19-Oct-2014
Smithsonian Institution Archives. Lynda Schmitz Fuhrig. Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives. MARAC Fall 2011 presentation.
Why Can’t I Read This File? Born-Digital Challenges
at the Smithsonian Institution Archives
Lynda Schmitz Fuhrig
Mid-Atlantic Regional Archives Conference Fall 2011, Bethlehem, PA
Smithsonian Institution Archives’ Mission
• Appraise, acquire, and preserve
• Offer a range of research and reference services
• Create and promote products and services that broaden the understanding of the Smithsonian
• Provide professional archival and conservation expertise
Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.
SI Archives Digital Services Division
• Curate and preserve born-digital collections
• Digitize images, video, and audio
• Research digital preservation issues
• Promote the archives through web and outreach
SIA Accession 11-124
Born-digital records that document the Smithsonian’s history:
• Text
• Images
• Drawings/CAD
• Databases and spreadsheets
• Audio
• Video
• Websites and social media
• Email accounts
Many are part of mixed collections of paper and electronic records.
Transferred on removable media or via server/FTP.
SIA Accession 11-281
SI Archives’ procedures
• Inspect media
• Virus scan
• Conduct transfer/ingest with checksums
• Make a copy
• Analyze files for formats and issues
• Convert proprietary files to preservation formats
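The "transfer/ingest with checksums" step can be illustrated with a short Python sketch. This is not the Archives’ in-house script: it assumes SHA-256 (the slides do not name a checksum algorithm), and the function names `sha256_of` and `transfer_with_fixity` are hypothetical. Each file is hashed before and after copying, and any mismatch stops the transfer.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity(source_dir: Path, dest_dir: Path) -> dict[str, str]:
    """Copy every file under source_dir into dest_dir and verify each copy.

    Returns a manifest mapping each relative path (POSIX style) to its checksum.
    Raises ValueError if a copy's checksum differs from its source.
    """
    manifest: dict[str, str] = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        before = sha256_of(src)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps where the OS allows
        after = sha256_of(dst)
        if before != after:
            raise ValueError(f"fixity mismatch for {rel}")
        manifest[rel.as_posix()] = before
    return manifest
```

Keeping the returned manifest with the accession lets later fixity checks recompute each checksum and compare it against the value recorded at ingest.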
Current preservation formats
• MS Word/WordPerfect: PDF/A or PDF
• PowerPoint, Excel: PDF/A or PDF
• GIF, JPG, BMP, etc.: TIF
• Access databases: SIARD XML
• Audio: WAV/BWF
• Websites: crawled and captured as WARC
• Email: saved to XML following the CERP/EMCAP preservation schema
• Born-digital video: not straightforward; there are different options
• Digitized video: Motion JPEG 2000
Tools for processing
• Open source and proprietary software
• JHOVE, DROID, FITS (FITS is also a format)
• MediaInfo
• In-house batch scripts
• Duke Data Accessioner
• Curator’s Workbench (under evaluation)
• CERP parser (SIA and Rockefeller Archive Center)
Files in disguise
• No extension – right-click and open the file in Notepad to see its coding; this is especially helpful with WordPerfect
• Wrong extension – a .doc could be a Word file or a WordPerfect file
• A .bmp that is actually a JPG
• Complete unknowns that date back 20 years or more
Accession 10-052
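Opening a file in Notepad works because most formats begin with a fixed byte signature (a "magic number"). A minimal Python sketch of the same check follows. It assumes only the handful of signatures listed in it (for example `FF 57 50 43` for WordPerfect 5.x and later, and the OLE2 header used by Word 97-2003 .doc files); the names `sniff`, `check_file`, and the `EXPECTED_EXTENSIONS` table are hypothetical. Tools such as DROID or TrID rely on far larger signature registries.

```python
from pathlib import Path
from typing import Optional

# Illustrative subset of leading-byte signatures; not a full registry.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # Word/Excel/PowerPoint 97-2003
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"\xffWPC", "wordperfect"),  # WordPerfect 5.x and later
    (b"II*\x00", "tiff"),         # little-endian TIFF
    (b"MM\x00*", "tiff"),         # big-endian TIFF
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),               # short signatures last
    (b"MZ", "exe"),               # DOS/Windows executables
]

# Hypothetical: extensions treated as consistent with each detected type.
EXPECTED_EXTENSIONS = {
    "ole2": {".doc", ".xls", ".ppt"},
    "png": {".png"},
    "gif": {".gif"},
    "pdf": {".pdf"},
    "wordperfect": {".wpd", ".wp", ".wp5"},
    "tiff": {".tif", ".tiff"},
    "jpeg": {".jpg", ".jpeg"},
    "bmp": {".bmp"},
    "exe": {".exe"},
}


def sniff(data: bytes) -> Optional[str]:
    """Return a format label for the leading bytes, or None if no signature matches."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def check_file(path: Path) -> str:
    """Report whether a file's content signature agrees with its extension."""
    with path.open("rb") as fh:
        head = fh.read(16)
    label = sniff(head)
    ext = path.suffix.lower()
    if label is None:
        return f"{path.name}: unknown signature {head[:8].hex(' ')}"
    if not ext:
        return f"{path.name}: no extension, content looks like {label}"
    if ext not in EXPECTED_EXTENSIONS[label]:
        return f"{path.name}: extension {ext} but content looks like {label}"
    return f"{path.name}: {label}, extension OK"
```

A WordPerfect file saved as `memo.doc` would be reported as "extension .doc but content looks like wordperfect", which is the disguise described on the slide.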
Older files
• Gerber
• PCD (Kodak Photo CD)
• EXE (Executables)
Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.
DATs (Digital Audio Tapes)
Transfer them now, if you can!
Production of DAT machines has ended
Tapes are susceptible to fungus and other problems
DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106
It Says It Is PDF/A
Accession 08-149
But It’s Not PDF/A
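A file that "says it is" PDF/A normally makes that claim in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below reads only that declaration, assuming the XMP packet is stored uncompressed; `claimed_pdfa_level` is a hypothetical name. It cannot tell whether the file actually conforms, which is exactly the gap this slide illustrates.

```python
import re
from typing import Optional

# Match pdfaid:part / pdfaid:conformance written either as XMP attributes
# (pdfaid:part="1") or as elements (<pdfaid:part>1</pdfaid:part>).
_PART = re.compile(rb"""pdfaid:part(?:=["']|>)\s*(\d)""")
_CONF = re.compile(rb"""pdfaid:conformance(?:=["']|>)\s*([A-Za-z])""")


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level a file *claims*, e.g. "1B", or None if no claim.

    Only the metadata declaration is read; conformance is not checked.
    Assumes the XMP packet is stored uncompressed in the file.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(pdf_bytes)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level
```

Confirming conformance still requires a validator, such as JHOVE’s PDF module or veraPDF; a check like this is only useful for finding which files make the claim so they can be validated.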
Software incompatibility issues
New formats/flavors/technologies
• Geospatial PDF
• WWF – a PDF that can’t be printed
• Keep an eye on mobile sites/apps
• 3D scanning and printing: point clouds
Digital forensics
Resources for formats
Sustainability of Digital Formats – Library of Congress: http://www.digitalpreservation.gov/formats
PRONOM – The National Archives in the UK: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
Unified Digital Formats Registry (expected to be operational in 2012): http://www.udfr.org/
FILExt – File Extension Source: http://filext.com/
TrID – File Identifier: http://mark0.net/soft-trid-e.html
Lynda Schmitz Fuhrig
Digital Services [email protected]
Smithsonian Institution Archives website: http://siarchives.si.edu