Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives Lynda Schmitz Fuhrig Mid-Atlantic Regional Archives Conference Fall 2011, Bethlehem, PA


Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives

Lynda Schmitz Fuhrig
Mid-Atlantic Regional Archives Conference Fall 2011, Bethlehem, PA


Smithsonian Institution Archives’ Mission

• Appraise, acquire, and preserve
• Offer a range of research and reference services
• Create and promote products and services that broaden the understanding of the Smithsonian
• Provide professional archival and conservation expertise

Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.


SI Archives Digital Services Division

• Curate and preserve born-digital collections

• Digitize images, video, and audio

• Research digital preservation issues

• Promote the archives through web and outreach

SIA Accession 11-124


Born-digital records that document the Smithsonian’s history

• Text
• Images
• Drawings/CAD
• Databases and spreadsheets
• Audio
• Video
• Websites and social media
• Email accounts

Many are part of mixed collections of paper and electronic records

Removable media or server/FTP transfer

SIA Accession 11-281


SI Archives’ procedures

• Inspect media

• Virus scan

• Conduct transfer/ingest with checksums

• Make copy

• Analyze files for formats and issues

• Convert proprietary files to preservation formats
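
The "transfer/ingest with checksums" and "make copy" steps above can be sketched roughly as follows. This is a minimal illustration, not the Archives' actual tooling: the slides do not say which checksum algorithm is used, so SHA-256 is an assumption, and the function names `sha256_of` and `transfer_with_checksums` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checksums(source_dir: Path, target_dir: Path) -> Dict[str, str]:
    """Copy every file under source_dir to target_dir and verify each copy.

    Returns a manifest mapping relative paths to checksums.
    SHA-256 is an assumed choice; the source only says "checksums".
    """
    manifest: Dict[str, str] = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        before = sha256_of(source)      # checksum of the original
        shutil.copy2(source, target)    # copy, keeping timestamps where possible
        after = sha256_of(target)       # checksum of the copy
        if before != after:
            raise IOError(f"Checksum mismatch for {relative}")
        manifest[relative.as_posix()] = before
    return manifest
```

The returned manifest could be stored alongside the accession so that later fixity checks can recompute and compare the same values.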


Current preservation formats

MS Word/WordPerfect: PDF/A or PDF

PowerPoint, Excel: PDF/A or PDF

GIF, JPG, BMP, etc.: TIF

Access databases: SIARD XML

Audio: WAV/BWF

Websites: crawled and captured as WARC

Email: saved to XML following CERP/EMCAP preservation schema

Born-digital video: not straightforward; different options

Digitized video: Motion JPEG 2000


Tools for processing

• Open source and proprietary software
• JHOVE, DROID, FITS (FITS is also a format)
• MediaInfo
• In-house batch scripts
• Duke Data Accessioner
• Evaluating Curator’s Workbench
• CERP (SIA-Rockefeller Archive Center) parser


Files in disguise

• No extension – right click to open in Notepad to see coding, especially helpful with WordPerfect

• Wrong extension – a .doc could be Word or it could be WordPerfect; a BMP that is a JPG

• Complete unknowns that date back 20 years or more
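
Looking inside a file rather than trusting its extension can also be done programmatically by checking the leading "magic" bytes. The sketch below is only an illustration of that idea, covering a handful of well-known signatures; the names `SIGNATURES`, `sniff_format`, and `sniff_file` are hypothetical, and identification tools such as DROID rely on far larger signature sets.

```python
# Leading-byte signatures for a few common formats (not exhaustive).
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Word .doc)"),
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"II*\x00", "TIFF image"),
    (b"MM\x00*", "TIFF image"),
    # "BM" is only two bytes, so a text file starting with "BM" would also match.
    (b"BM", "BMP image"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file; 'unknown' if no match."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

With this approach, a file named `report.doc` whose first bytes are `FF 57 50 43` would be reported as WordPerfect, and a `.bmp` that starts with `FF D8 FF` would be reported as JPEG.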

Accession 10-052


Older files

• Gerber
• PCD (Kodak Photo CD)
• EXE (Executables)

Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.


DATs (Digital Audio Tapes)

Transfer them now, if you can!

Machine production ended

Tapes susceptible to fungus, other problems

DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106


It Says It Is

PDF/A

Accession 08-149


But It’s Not PDF/A
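
The gap between what a file says it is and what it actually is can be illustrated with PDF/A. A PDF declares PDF/A conformance through a `pdfaid:part` entry in its XMP metadata, and that declaration is only a claim. The sketch below reads the claim and nothing more; the function name `claimed_pdfa_part` is hypothetical, and confirming real conformance requires a dedicated PDF/A validator.

```python
import re

# Matches both XMP serializations of the claim:
#   attribute form:  pdfaid:part="1"
#   element form:    <pdfaid:part>1</pdfaid:part>
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def claimed_pdfa_part(data: bytes):
    """Return the PDF/A part number a file claims in its XMP, or None.

    This reports only what the file says about itself. It does not
    check fonts, color spaces, or any other PDF/A requirement.
    """
    match = PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```

A file for which this returns `1` but which then fails validation is exactly the "says it is, but it's not" case.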


Software incompatibility issues


New formats/flavors/technologies

Geospatial PDF

WWF – PDF that doesn’t print

Keep an eye on mobile sites/apps

3D scanning and printing - Point clouds


Digital forensics


Resources for formats

Sustainability of Digital Formats – Library of Congress
http://www.digitalpreservation.gov/formats

PRONOM – The National Archives in the UK
http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

Unified Digital Formats Registry – Expected date of operation 2012
http://www.udfr.org/

FILExt – File Extension Source
http://filext.com/

TrID – File Identifier
http://mark0.net/soft-trid-e.html


Lynda Schmitz Fuhrig
Digital Services [email protected]

Smithsonian Institution Archives website: http://siarchives.si.edu