why can’t i read this file? born-digital challenges at the smithsonian institution archives

Post on 19-Oct-2014


DESCRIPTION

Smithsonian Institution Archives Lynda Schmitz Fuhrig Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives MARAC Fall 2011 presentation

TRANSCRIPT

Why Can’t I Read This File? Born-Digital Challenges

at the Smithsonian Institution Archives

Lynda Schmitz Fuhrig

Mid-Atlantic Regional Archives Conference Fall 2011, Bethlehem, PA

Smithsonian Institution Archives’ Mission

• Appraise, acquire, and preserve
• Offer a range of research and reference services
• Create and promote products and services that broaden the understanding of the Smithsonian
• Provide professional archival and conservation expertise

Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.

SI Archives Digital Services Division

• Curate and preserve born-digital collections

• Digitize images, video, and audio

• Research digital preservation issues

• Promote the archives through web and outreach

SIA Accession 11-124

Born-digital records that document the Smithsonian’s history

• Text
• Images
• Drawings/CAD
• Databases and spreadsheets
• Audio
• Video
• Websites and social media
• Email accounts

Many are part of mixed collections of paper and electronic records

Received on removable media or by server/FTP transfer

SIA Accession 11-281

SI Archives’ procedures

• Inspect media

• Virus scan

• Conduct transfer/ingest with checksums

• Make copy

• Analyze files for formats and issues

• Convert proprietary files to preservation formats
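The "transfer/ingest with checksums" and "make copy" steps above can be sketched as follows. This is a minimal illustration only, not the Archives' actual workflow: the choice of SHA-256 and the function names `sha256_of` and `transfer_with_checksums` are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checksums(source_dir: Path, dest_dir: Path) -> dict[str, str]:
    """Copy every file under source_dir into dest_dir and verify each copy.

    Hypothetical helper: returns a manifest mapping relative paths to
    checksums and raises if a copy does not match its source.
    """
    manifest = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = dest_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        before = sha256_of(source)      # checksum of the original
        shutil.copy2(source, target)    # copy, keeping timestamps where possible
        after = sha256_of(target)       # checksum of the copy
        if before != after:
            raise IOError(f"Checksum mismatch for {relative}")
        manifest[relative.as_posix()] = before
    return manifest
```

Storing the returned manifest alongside the accession makes it possible to re-run the same checksum later and detect silent corruption of the copy.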

Current preservation formats

• MS Word/WordPerfect: PDF/A or PDF
• PowerPoint, Excel: PDF/A or PDF
• GIF, JPG, BMP, etc.: TIF
• Access databases: SIARD XML
• Audio: WAV/BWF
• Websites: crawled and captured as WARC
• Email: saved to XML following the CERP/EMCAP preservation schema
• Born-digital video: not straightforward; different options
• Digitized video: Motion JPG2000

Tools for processing

• Open source and proprietary software
• JHOVE, DROID, FITS (FITS is also a format)
• MediaInfo
• In-house batch scripts
• Duke Data Accessioner
• Evaluating Curator’s Workbench
• CERP (SIA-Rockefeller Archive Center) parser

Files in disguise

• No extension – right click to open in Notepad to see coding, especially helpful with WordPerfect

• Wrong extension – a .doc could be Word or it could be WordPerfect; a BMP that is really a JPG

• Complete unknowns that date back 20 years or more
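Missing and wrong extensions can often be caught by reading the first few bytes of a file instead of trusting its name, which is the same thing opening it in Notepad does by eye. The sketch below is an illustration under stated assumptions: the `SIGNATURES` table and the function `sniff_format` are hypothetical names, the list covers only a handful of widely documented signatures, and identification tools such as DROID do this far more thoroughly.

```python
from pathlib import Path

# A few widely documented leading byte signatures ("magic numbers").
# Illustrative only; short signatures such as b"BM" can give false positives.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Office)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"BM", "BMP"),
]


def sniff_format(path):
    """Return a format label based on the file's leading bytes, or None."""
    with Path(path).open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None  # a "complete unknown" as far as this short table goes
```

Comparing the sniffed label with the file's extension flags the "files in disguise" cases, for example a .doc whose header is the WordPerfect signature or a .bmp whose header is JPEG.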

Accession 10-052

Older files

• Gerber
• PCD (Kodak Photo CD)
• EXE (Executables)

Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.

DATs (Digital Audio Tapes)

Transfer them now, if you can!

Machine production ended

Tapes susceptible to fungus, other problems

DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106

It Says It Is

PDF/A

Accession 08-149

But It’s Not PDF/A

Software incompatibility issues
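A file can carry a PDF/A label without conforming to the standard. PDF/A files declare themselves through a `pdfaid:part` entry in their XMP metadata, so a quick script can report what a file claims. The sketch below, with the hypothetical helper `declared_pdfa_part`, only detects that claim. It assumes the metadata is stored as plain UTF-8 text in the file, and it does not validate anything; confirming conformance takes a dedicated validator.

```python
import re
from pathlib import Path

# Matches both XMP spellings of the claim:
#   <pdfaid:part>1</pdfaid:part>   and   pdfaid:part="1"
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"'](\d)[\"']|>\s*(\d)\s*<)")


def declared_pdfa_part(path):
    """Return the PDF/A part number a file claims in its XMP, or None.

    This reports the declaration only. A file that says it is PDF/A
    may still fail validation.
    """
    data = Path(path).read_bytes()  # fine for a sketch; large files may need chunking
    match = PDFA_PART.search(data)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))
```

A result of `None` means no claim was found; a number means the file says it is PDF/A, which is the point where a validator should take over.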

New formats/flavors/technologies

Geospatial PDF

WWF – PDF that doesn’t print

Keep an eye on mobile sites/apps

3D scanning and printing - Point clouds

Digital forensics

Resources for formats

Sustainability of Digital Formats – Library of Congress
http://www.digitalpreservation.gov/formats

Pronom – The National Archives in the UK
http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

Unified Digital Formats Registry – Expected date of operation 2012
http://www.udfr.org/

FILExt – File Extension Source
http://filext.com/

TrID – File Identifier
http://mark0.net/soft-trid-e.html

Lynda Schmitz Fuhrig
Digital Services Division
schmitzfuhrigl@si.edu

Smithsonian Institution Archives website:
http://siarchives.si.edu
