Organization, Clarity, and Sanity: Digitization for the Future on a Shoestring
University of Alabama Libraries
Jody L. DeRidder [email protected]
Image courtesy of Life Magazine
Or
Digital Library Development From the Bottom Up
Libraries organize information… primarily books.
Trinity College Library, Dublin, as captured by Candida Höfer in her book Libraries (Thames and Hudson, UK: 2005).
Photo credit: Flickr user "Libby", used with permission (creative commons)
If libraries organize books… why not digital files?
It’s all information!
A digital object may belong in MANY potential virtual collections…
… but it originated from ONE SINGLE ANALOG collection. Provenance trumps all!
Slavery; African Americans; Sheet Music; Tombigbee River; Southern History; … and more
“Gum Tree Canoe,” Published by G.P. Reed (Boston: 1847). Wade Hall collection of Southern History and Culture, Hoole Special Collections, University of Alabama Libraries.
Bringing Order to Chaos
University of Alabama Libraries
Holder ID: u0003
Collection ID: 0000023
Item ID: 0000007
Sequence ID: 0005
Archival File: u0003_0000023_0000007_0005.tif
1) Clarity
2) Low cost
3) Simple
4) Extensible
u0003_0001980_0000001 is the first digitized item in the MSS 1980 collection
HOLDER ID
COLLECTION ID
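The identifier scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the Libraries' own tooling: the names `FileId`, `parse_identifier`, and `build_identifier` are hypothetical, and the sketch assumes segments are joined by underscores, with the sequence segment optional for item-level identifiers such as u0003_0001980_0000001.

```python
from typing import NamedTuple, Optional


class FileId(NamedTuple):
    """The segments of an identifier (hypothetical helper type)."""
    holder: str
    collection: str
    item: str
    sequence: Optional[str]  # None for item-level identifiers


def parse_identifier(name: str) -> FileId:
    """Split an identifier or file name into its segments."""
    # Drop a file extension such as ".tif" if one is present.
    stem = name.rsplit(".", 1)[0] if "." in name else name
    parts = stem.split("_")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 segments, got {len(parts)}: {name!r}")
    # Assumption: every segment after the holder ID is purely numeric.
    if not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"non-numeric segment in {name!r}")
    sequence = parts[3] if len(parts) == 4 else None
    return FileId(parts[0], parts[1], parts[2], sequence)


def build_identifier(holder: str, collection: str, item: str,
                     sequence: Optional[str] = None) -> str:
    """Join the segments back into an identifier (no extension)."""
    parts = (holder, collection, item, sequence)
    return "_".join(p for p in parts if p is not None)
```

For example, `parse_identifier("u0003_0000023_0000007_0005.tif")` yields holder `u0003`, collection `0000023`, item `0000007`, and sequence `0005`.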
The Digitization Working Area…
Collection folders are named for the collection identifier. Allowed subfolders include:
Admin, Metadata, Scans, Transcripts
Compound objects have their own subfolders for pages, named for the item.
And a Collection Folder in the Working Area
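A folder convention like this is easy to check automatically. The sketch below is one possible check, assuming the four allowed subfolder names are spelled exactly as listed above; the function name `unexpected_subfolders` is hypothetical, and the check looks only at the top level of a collection folder.

```python
from pathlib import Path
from typing import List

# Subfolder names allowed directly inside a collection folder.
ALLOWED_SUBFOLDERS = {"Admin", "Metadata", "Scans", "Transcripts"}


def unexpected_subfolders(collection_dir: Path) -> List[str]:
    """Return the names of top-level subfolders that are not allowed."""
    return sorted(
        child.name
        for child in collection_dir.iterdir()
        if child.is_dir() and child.name not in ALLOWED_SUBFOLDERS
    )
```

An empty list means the collection folder follows the convention. Loose files at the top level are ignored here, which is a design choice of this sketch rather than a stated rule.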
An Example of the Lowest-Cost Model: The Alabama Digital Preservation Network http://www.adpn.org/
http://www.lockss.org/
Lots of Copies Keeps Stuff Safe!!
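The idea behind keeping many copies can be illustrated with a simple checksum comparison. This is only a conceptual sketch, not the LOCKSS protocol itself: it assumes each site's copy is available as bytes, and the function name `find_damaged_copies` and the site labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a copy's content."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: Dict[str, bytes]) -> List[str]:
    """Return the sites whose copy disagrees with the majority digest.

    Expects at least one copy; an empty mapping raises IndexError.
    """
    digests = {site: sha256_of(data) for site, data in copies.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority_digest)
```

With three copies where one has been altered, the altered site is the only one returned, and it could then be repaired from any of the agreeing copies.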
Identification, Organization and Consistency
Each segment of numbers:
Holder ID, Collection ID, Item ID, Sequence ID
is used in the directory structure.
The directory for u0003_0000003_0002_001.tif
is simply:
u0003/ 0000003/ 0002/ 001/
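Because the directory is derived directly from the file name, the mapping takes only a few lines. A minimal sketch, assuming underscore-separated segments and a single file extension; the function name `archive_directory` is hypothetical.

```python
from pathlib import PurePosixPath


def archive_directory(filename: str) -> PurePosixPath:
    """Map a file name to its directory, one level per identifier segment."""
    stem = PurePosixPath(filename).stem  # drop ".tif" or another extension
    return PurePosixPath(*stem.split("_"))


print(archive_directory("u0003_0000003_0002_001.tif"))
# prints u0003/0000003/0002/001
```

No lookup table or database is needed: the name alone says where the file lives.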
Dropping the Technical Metadata in… where it belongs
Makes METS creation a Piece of Cake!
(and redundant!)
Using FITS, the File Information Tool Set developed by Harvard which encapsulates JHOVE, DROID, ExifTool and other tools: http://code.google.com/p/fits/
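As a rough sketch of how technical metadata might be dropped next to each archival file, the helper below builds a FITS command line. It relies on the `-i` (input) and `-o` (output) options of the FITS launcher script; the function name `fits_command`, the `.fits.xml` output naming, and the assumption that `fits.sh` is on the PATH are all hypothetical choices for illustration.

```python
from pathlib import PurePosixPath
from typing import List


def fits_command(archival_file: str, fits_script: str = "fits.sh") -> List[str]:
    """Build a FITS command that writes its XML beside the archival file."""
    src = PurePosixPath(archival_file)
    # Hypothetical naming: same base name as the file, with ".fits.xml".
    output = src.with_name(src.stem + ".fits.xml")
    return [fits_script, "-i", str(src), "-o", str(output)]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where FITS is installed.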
Bringing Content Up to the Level Of the WEB!!! Greater Usability and Access == Longer Life
Images … ImageMagick: http://www.imagemagick.org (it’s free!)
Protected archive area:
u0003/ 0000023/ 0000007/ 0005/ u0003_0000023_0000007_0005.tif
Web accessible area:
u0003/ 0000023/ 0000007/ 0005/ (thumb, mid-, and large-size derivatives)
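Generating the derivatives into a parallel web-accessible tree could look roughly like this. The sketch only builds ImageMagick command lines and does not run them; the pixel sizes, the `_thumb`/`_mid`/`_large` file suffixes, the JPEG output format, and the function name `derivative_commands` are all assumptions made for illustration. It uses the classic `convert` command; ImageMagick 7 installations may prefer `magick`.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical bounding boxes for each derivative size.
DERIVATIVE_SIZES: Dict[str, str] = {
    "thumb": "128x128",
    "mid": "512x512",
    "large": "2048x2048",
}


def derivative_commands(archival_file: str, web_root: str) -> List[List[str]]:
    """Build one ImageMagick command per derivative size."""
    src = PurePosixPath(archival_file)
    # Mirror the archive layout: one directory level per identifier segment.
    out_dir = PurePosixPath(web_root, *src.stem.split("_"))
    commands = []
    for label, geometry in DERIVATIVE_SIZES.items():
        target = out_dir / f"{src.stem}_{label}.jpg"
        commands.append(["convert", str(src), "-resize", geometry, str(target)])
    return commands
```

Each command list can be handed to `subprocess.run` once the output directory exists. The archival TIFF is only read, never modified, so the protected area stays untouched.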
Audio … LAME: http://lame.sourceforge.net
OCR … Tesseract: http://code.google.com/p/tesseract-ocr/
http://acumen.lib.ua.edu
ACCESS! Via Acumen
(also free!)
XML agnostic
No ingest
No metadata modifications
All content easily accessible
Open to search engines
Jody L. DeRidder [email protected]
UA Digital Services wiki: http://intranet.lib.ua.edu/groups/Digital_Services