Eric Luhrs
Digital Initiatives Librarian
Special Collections & College Archives
MetaDB Development at Lafayette College
Haruki Yamaguchi
Class of 2011
Department of Computer Science
CS320: December 1, 2010
About this talk
1. WHY: digital collection management
– The shift from analog to digital
– Preserving our digital heritage
– Fast-moving field
2. WHAT: usage overview
– Digitization workflow
– Why this is important
– Brief version history
3. HOW: design overview
– Development environment
– Application design
– Database interaction
4. END: wrap-up
– Demo (time permitting)
Talk Outline
Ever-Expanding Digital Collections
The Lafayette
140 years online, ~43,000 searchable pages
http://digital.lafayette.edu/collections/newspaper
Lafayette Digital Repository
Open Access to faculty & College publications
http://dspace.lafayette.edu
East Asia Image Collection
~3,000 images from books, photographs, negatives, slides
http://digital.lafayette.edu/collections/eastasia
Analog versus digital
– Can easily find 1000-year-old book
– Where is the page we scanned last week?
– How to manage digital material in the future?
– Are we headed toward a digital dark age?
– Organization, storage, and retrieval of information is a big field trying to keep up with fast-changing technology
Preserving Our Digital Heritage
1. Standardization is the first step toward preservation
– Automation prevents human error
– Allows us to isolate specific content types
– Ubiquitous formats aid standards, which in turn aid migration
2. Strengthens digital collection building efforts
– Allows me to work faster and smarter
– Subject exports create stronger collections
MetaDB: Return on Investment
File Input
• High Resolution Master Images
Metadata Input
• Descriptive
• Administrative
• Technical
Collection Output
• CSV & TSV data
• Derivative Images
Workflow Management
Digital Asset Management System
(CONTENTdm, DSpace, Drupal)
Descriptive MD (subject specialist)
• Title
• Description
• Subjects
• […]
Administrative MD (librarian)
• Collection
• Publisher
• Access Rights
• […]
Technical MD (automated)
• File Format
• File Size
• Checksum
• […]
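The automated technical metadata listed above (file format, file size, checksum) can be sketched in a few lines of Java. This is only an illustration, not MetaDB's actual code: the class name `TechnicalMetadata`, the choice of SHA-256, and reading the format from the file extension are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical value class; MetaDB's real classes are not shown in the talk.
final class TechnicalMetadata {
    final String fileFormat; // taken from the file extension (an assumption)
    final long fileSize;     // size in bytes
    final String checksum;   // SHA-256 as lowercase hex (algorithm is an assumption)

    TechnicalMetadata(String fileFormat, long fileSize, String checksum) {
        this.fileFormat = fileFormat;
        this.fileSize = fileSize;
        this.checksum = checksum;
    }

    static TechnicalMetadata extract(Path file) throws IOException, NoSuchAlgorithmException {
        // Derive the format label from the extension, e.g. "scan.tif" -> "TIF".
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String format = dot >= 0 ? name.substring(dot + 1).toUpperCase(Locale.ROOT) : "UNKNOWN";

        // Stream the file through the digest so large master images are not loaded into memory.
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Render the digest bytes as lowercase hexadecimal.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new TechnicalMetadata(format, Files.size(file), hex.toString());
    }
}
```

Calling `TechnicalMetadata.extract(path)` on a master image returns the three values with no manual entry, which is the point of automating this category.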
MetaDB Allows Us to Automate & Distribute Collection of Metadata
Asset Management
System
CSV, TSV data
• Dublin Core metadata standard
• Common file format outputs
Derivative Images
• Created from multiple image formats
• Custom image sizes
• Pan/zoom interface
• Banding/branding
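As a rough illustration of the TSV output described above, the sketch below writes one header row and one row per item. The class name `TsvExport` and the column names are assumptions (the columns borrow Dublin Core element names such as `title` and `subject`); MetaDB's real export code is not shown in the talk.

```java
import java.util.List;
import java.util.Map;

// Hypothetical exporter: one header row, then one tab-separated row per item.
final class TsvExport {
    static String toTsv(List<String> columns, List<Map<String, String>> items) {
        StringBuilder out = new StringBuilder();
        out.append(String.join("\t", columns)).append('\n');
        for (Map<String, String> item : items) {
            for (int i = 0; i < columns.size(); i++) {
                if (i > 0) {
                    out.append('\t');
                }
                String value = item.get(columns.get(i));
                // Missing fields become empty cells so every row has the same width.
                out.append(clean(value == null ? "" : value));
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Tabs and line breaks inside a value would break the row layout, so replace them with spaces.
    private static String clean(String value) {
        return value.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }
}
```

A fixed column list keeps every export of a collection in the same shape, which makes the file easier to load into a system such as CONTENTdm.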
Completed MetaDB Collection
TSV JPG
CONTENTdm
Better Digital Collections
Increased Visibility
New Acquisition
Greater Knowledge
Improve Workflow
Why this Work is Important
MetaDB Version History
Version 0 Microsoft Access Database shared over local Novell network
Version 1 MySQL database with simple web-based HTML interface
Version 2 MySQL database with PHP / YUI JavaScript interface
Version 3 Postgres database with Java / jQuery JavaScript interface
Major Features in Latest Version
Version 3.1
• Table view editing
• Controlled vocabularies
• Drag/drop field ordering
• Technical metadata extraction
• Automatic derivative creation
• Image banding/branding
• Web-based user management
• Vastly improved user interface
Development Environment
MetaDB0 (Production)
MetaDB1 (Development)
svn.lafayette.edu
New releases
Feedback
Bugs
Test
Patches/Features
Application Design
MetaDB Service API
AJAX
Database
Servlets
Images
ImageMagick
Back-End
Front-End
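The diagram places ImageMagick on the image side of the back end. One plausible way for a Java back end to request a derivative is to build an ImageMagick command line and hand it to a process launcher. The sketch below only builds the argument list; the class name `DerivativeCommand` and the file names are hypothetical, and the talk does not say how MetaDB actually invokes ImageMagick.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds an ImageMagick resize command as an argument list.
final class DerivativeCommand {
    static List<String> resize(String master, String derivative, int maxWidth, int maxHeight) {
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");                  // ImageMagick 7 installs may use "magick" instead
        cmd.add(master);                     // high-resolution master image
        cmd.add("-resize");
        cmd.add(maxWidth + "x" + maxHeight); // fit within this box, preserving aspect ratio
        cmd.add(derivative);                 // output format follows the file extension
        return cmd;
    }
}
```

The list can then be run with `new ProcessBuilder(cmd).inheritIO().start()`. Passing arguments as a list rather than one concatenated string avoids shell quoting problems with unusual file names.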
Update Database
Data
Whitelist
Cross-checking
Authentication
Log
Feedback
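The update path above (whitelist, cross-checking, authentication, log, feedback) can be sketched as a single validation step that runs before anything is written to the database. Everything here is an assumption made for illustration: the class name `UpdateValidator`, the in-memory sets standing in for session and schema lookups, the order of the checks, and the wording of the feedback strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical validator; real lookups would hit the session store and the database.
final class UpdateValidator {
    private static final Logger LOG = Logger.getLogger(UpdateValidator.class.getName());

    private final Set<String> activeSessions;       // authenticated session IDs
    private final Set<String> editableFields;       // whitelist of field names clients may change
    private final Map<String, Integer> itemCounts;  // project name -> number of items in it

    UpdateValidator(Set<String> activeSessions, Set<String> editableFields,
                    Map<String, Integer> itemCounts) {
        this.activeSessions = activeSessions;
        this.editableFields = editableFields;
        this.itemCounts = itemCounts;
    }

    // Returns a feedback message for the front end; "ok" means the update may proceed.
    String validate(String sessionId, String project, int item, String field) {
        String feedback;
        Integer count = itemCounts.get(project);
        if (!activeSessions.contains(sessionId)) {
            feedback = "rejected: not authenticated";
        } else if (!editableFields.contains(field)) {
            feedback = "rejected: field not on whitelist";
        } else if (count == null || item < 1 || item > count) {
            // Cross-check that the item really exists in the named project.
            feedback = "rejected: unknown project or item";
        } else {
            feedback = "ok";
        }
        LOG.info("update " + project + "/" + item + "/" + field + ": " + feedback);
        return feedback;
    }
}
```

Checking the field name against a whitelist means a request can never name an arbitrary database column, even if the front end is bypassed.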
Retrieve from Database
Data
Authenticate
Request
Gather data
Wrap in objects
Unpack into JSON
Log
Project: cpw-nofuko
Item: 1
Type: Descriptive Metadata
Session ID: AHJ7HA…
Concurrency Check
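The "gather data, wrap in objects, unpack into JSON" steps of the retrieval path might look roughly like the sketch below. The class name `ItemMetadata` and the JSON layout are assumptions, and the hand-written serializer is used only to keep the example self-contained; a real application would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper object for one item's metadata, serializable to JSON for the AJAX front end.
final class ItemMetadata {
    final String project;
    final int item;
    final Map<String, String> fields = new LinkedHashMap<>(); // keeps field order stable

    ItemMetadata(String project, int item) {
        this.project = project;
        this.item = item;
    }

    String toJson() {
        StringBuilder json = new StringBuilder();
        json.append("{\"project\":").append(quote(project));
        json.append(",\"item\":").append(item);
        json.append(",\"fields\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return json.append("}}").toString();
    }

    // Escapes quotes, backslashes, and control characters as JSON requires.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

Wrapping rows in objects before serializing keeps the database layout out of the front end: the user interface only ever sees the JSON shape.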
Populate User Interface
Data
Widgets
Templates
Facts & Figures
• Development : January 2009 - Present
• Size: 110+ Java classes, ~30,000 lines of code
• Database: ~120,000+ rows of data
• Images: ~200GB disk space
• Subversion: Revision 3864
Eric Luhrs [email protected]
Haruki Yamaguchi [email protected]
http://metadb.lafayette.edu
What does this mean for you?