

Page 1: Eric Luhrs Digital Initiatives Librarian Special Collections & College Archives luhrse@lafayette.edu MetaDB Development at Lafayette College Haruki Yamaguchi

Eric Luhrs

Digital Initiatives Librarian, Special Collections & College Archives

luhrse@lafayette.edu

MetaDB Development at Lafayette College

Haruki Yamaguchi

Class of 2011, Department of Computer Science


CS320: December 1, 2010

Page 2

Talk Outline

About this talk
1. WHY: digital collection management
   – The shift from analog to digital
   – Preserving our digital heritage
   – Fast-moving field
2. WHAT: usage overview
   – Digitization workflow
   – Why this is important
   – Brief version history
3. HOW: design overview
   – Development environment
   – Application design
   – Database interaction
4. END: wrap-up
   – Demo (time permitting)

Page 3

Ever-Expanding Digital Collections

The Lafayette (newspaper): 140 years online, ~43,000 searchable pages
http://digital.lafayette.edu/collections/newspaper

Lafayette Digital Repository: open access to faculty & College publications
http://dspace.lafayette.edu

East Asia Image Collection: ~3,000 images from books, photographs, negatives, slides
http://digital.lafayette.edu/collections/eastasia

Page 4

Preserving Our Digital Heritage

Analog versus digital
– We can easily find a 1,000-year-old book
– Where is the page we scanned last week?
– How will we manage digital material in the future?
– Are we headed toward a digital dark age?
– The organization, storage, and retrieval of information is a large field trying to keep up with fast-changing technology

Page 5

MetaDB: Return on Investment

1. Standardization is the first step toward preservation
   – Automation prevents human error
   – Allows us to isolate specific content types
   – Ubiquitous formats and standards aid migration
2. Strengthens digital collection building efforts
   – Allows me to work faster and smarter
   – Subject experts create stronger collections

Page 6

Workflow Management

File Input
• High-resolution master images

Metadata Input
• Descriptive
• Administrative
• Technical

Collection Output
• CSV & TSV data
• Derivative images

Digital Asset Management System (CONTENTdm, DSpace, Drupal)

Page 7

MetaDB Allows Us to Automate & Distribute Collection of Metadata

Descriptive metadata (subject specialist)
• Title
• Description
• Subjects
• […]

Administrative metadata (librarian)
• Collection
• Publisher
• Access Rights
• […]

Technical metadata (automated)
• File Format
• File Size
• Checksum
• […]

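The technical metadata on this slide (file format, file size, checksum) is the part MetaDB collects automatically. The Python sketch below shows one way such extraction can work; it is not MetaDB's own Java code. The choice of SHA-256 for the checksum, the small magic-byte table, and the function names `sniff_format` and `technical_metadata` are assumptions made for illustration.

```python
import hashlib
import os

# Hypothetical signature table: leading "magic" bytes of a few common master formats.
MAGIC_BYTES = {
    b"II*\x00": "image/tiff",        # little-endian TIFF
    b"MM\x00*": "image/tiff",        # big-endian TIFF
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_format(path):
    """Guess a MIME type from the file's first bytes; 'application/octet-stream' if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, mime in MAGIC_BYTES.items():
        if head.startswith(magic):
            return mime
    return "application/octet-stream"


def technical_metadata(path, chunk_size=1 << 20):
    """Return format, size in bytes, and a SHA-256 checksum (assumed algorithm) for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large master images never sit in memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "format": sniff_format(path),
        "size_bytes": os.path.getsize(path),
        "checksum_sha256": digest.hexdigest(),
    }
```

Because nothing here needs a person, this is the kind of field that can be filled in at upload time, leaving descriptive and administrative fields to subject specialists and librarians.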

Page 8

MetaDB Allows Us to Automate & Distribute Collection of Metadata


CSV, TSV data
• Dublin Core metadata standard
• Common file format outputs
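A minimal Python sketch of a Dublin Core export, assuming records are stored as dictionaries keyed by field label and using a hypothetical label-to-element mapping (`DC_MAP`); the slides do not show MetaDB's actual mapping or column names.

```python
import csv

# Hypothetical mapping from MetaDB field labels to Dublin Core elements.
DC_MAP = {
    "Title": "dc.title",
    "Description": "dc.description",
    "Subjects": "dc.subject",
    "Publisher": "dc.publisher",
    "Access Rights": "dc.rights",
    "File Format": "dc.format",
}


def export_records(records, out, delimiter="\t"):
    """Write records (dicts keyed by field label) to `out` with Dublin Core column headers.

    The default tab delimiter gives TSV; pass delimiter="," for CSV.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(DC_MAP.values())
    for rec in records:
        row = []
        for label in DC_MAP:
            value = rec.get(label, "")
            # Multi-valued fields (e.g. several subjects) are joined with semicolons.
            if isinstance(value, (list, tuple)):
                value = "; ".join(value)
            row.append(value)
        writer.writerow(row)
```

To write a file, open it with `open("export.tsv", "w", newline="", encoding="utf-8")` and pass the handle as `out`. Keeping one mapping table means the TSV and CSV outputs always share the same Dublin Core columns.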

Derivative Images
• Created from multiple image formats
• Custom image sizes
• Pan/zoom interface
• Banding/branding
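MetaDB's back end uses ImageMagick for images. The Python sketch below builds, but does not run, an ImageMagick command for one resized derivative with an optional branding band along the bottom edge. The 800-pixel limit, band height, colors, font size, and the helper name `derivative_command` are assumptions; MetaDB's real settings are not given in the slides. On ImageMagick 7 the program is usually invoked as `magick` rather than `convert`.

```python
def derivative_command(master, output, max_px=800, band_text=None, band_px=30):
    """Build (but do not run) an ImageMagick 'convert' command for one derivative.

    The '>' geometry flag shrinks only images larger than max_px; it never enlarges.
    """
    cmd = ["convert", master, "-resize", f"{max_px}x{max_px}>"]
    if band_text:
        # With gravity south, -splice adds band_px rows at the bottom;
        # -annotate then writes the label inside that band.
        cmd += [
            "-background", "black", "-gravity", "south", "-splice", f"0x{band_px}",
            "-fill", "white", "-pointsize", "14", "-annotate", "+0+8", band_text,
        ]
    cmd.append(output)
    return cmd
```

With ImageMagick installed, `subprocess.run(derivative_command("master.tif", "web.jpg", band_text="Lafayette College"), check=True)` would produce the derivative; passing the arguments as a list avoids shell quoting problems with file names.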

Page 9

Completed MetaDB Collection → TSV + JPG → CONTENTdm

Page 10

Why This Work Is Important

• Better digital collections
• Increased visibility
• New acquisitions
• Greater knowledge
• Improved workflow

Page 11

MetaDB Version History

Version 0: Microsoft Access database shared over a local Novell network
Version 1: MySQL database with a simple web-based HTML interface
Version 2: MySQL database with a PHP / YUI JavaScript interface
Version 3: PostgreSQL database with a Java / jQuery JavaScript interface

Page 12

Major Features in Latest Version

Version 3.1
• Table view editing
• Controlled vocabularies
• Drag/drop field ordering
• Technical metadata extraction
• Automatic derivative creation
• Image banding/branding
• Web-based user management
• Vastly improved user interface
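A controlled vocabulary restricts a field to an agreed list of terms, which keeps values consistent across catalogers. A minimal Python sketch of the check, assuming a hypothetical `Format` vocabulary; which fields MetaDB controls, and with which terms, is not stated in the slides.

```python
# Hypothetical vocabulary; the real MetaDB term lists are not shown in the slides.
VOCABULARIES = {
    "Format": {"Photograph", "Postcard", "Slide", "Negative"},
}


def check_controlled(field, values, vocabularies=VOCABULARIES):
    """Return the values that are NOT in the field's controlled vocabulary.

    Fields without a vocabulary accept free text, so nothing is rejected.
    """
    allowed = vocabularies.get(field)
    if allowed is None:
        return []
    return [v for v in values if v not in allowed]
```

Returning the rejected terms, rather than a bare true/false, lets an interface point the cataloger at the exact misspelling.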

Page 13

Development Environment

• MetaDB0 (production)
• MetaDB1 (development)
• svn.lafayette.edu (Subversion repository)
• New releases, feedback, bugs
• Test
• Patches/features

Page 14

Application Design

• Front-End: AJAX calls to the MetaDB Service API
• Back-End: servlets, database, images processed with ImageMagick

Page 15

Update Database

• Data
• Whitelist
• Cross-checking
• Authentication
• Log
• Feedback
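One way to read the update flow is as a chain of checks that must all pass before a value is written. The Python sketch below runs them in the order the slide lists them. The in-memory dictionaries standing in for the database and session store, the meaning given to "cross-checking" (the item number must exist in the project), the session ID, and the name `update` are all assumptions, not MetaDB's Java implementation.

```python
import logging

log = logging.getLogger("metadb-sketch")

# Hypothetical in-memory stand-ins for the database and session store.
FIELDS = {"cpw-nofuko": {"Title", "Description", "Subjects"}}  # whitelisted fields per project
ITEM_COUNTS = {"cpw-nofuko": 2}                                 # number of items per project
SESSIONS = {"session-123": "editor1"}                           # session ID -> user name
DB = {}                                                         # (project, item, field) -> value


def update(project, item, field, value, session_id):
    """Apply one metadata edit; return (success, message) as feedback for the interface."""
    # Whitelist: only accept fields defined for this project.
    if field not in FIELDS.get(project, set()):
        return False, f"field {field!r} not allowed in {project!r}"
    # Cross-check: the item number must exist in the project.
    if not 1 <= item <= ITEM_COUNTS.get(project, 0):
        return False, f"item {item} not in {project!r}"
    # Authentication: the session must belong to a logged-in user.
    user = SESSIONS.get(session_id)
    if user is None:
        return False, "not authenticated"
    DB[(project, item, field)] = value
    # Log who changed what, then report success back to the interface.
    log.info("%s set %s/%d/%s", user, project, item, field)
    return True, "saved"
```

Every rejected request returns a message instead of raising, so the front end always has feedback to show the user.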

Page 16

Retrieve from Database

• Data
• Authenticate request
• Gather data, wrap in objects, unpack into JSON
• Log
• Concurrency check

Example request: Project: cpw-nofuko; Item: 1; Type: Descriptive Metadata; Session ID: AHJ7HA…
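A comparable Python sketch for retrieval, again under stated assumptions: the concurrency check is modeled as a simple per-item edit lock, the hypothetical `Field` dataclass plays the role of the wrapper objects, and the session IDs and stored values are invented placeholders. The slide lists the concurrency check last; here it runs before any data is gathered so that an item locked by another editor is never handed out.

```python
import json
import logging
from dataclasses import asdict, dataclass

log = logging.getLogger("metadb-sketch")

# Hypothetical stand-ins: session store, per-item locks, and stored descriptive metadata.
SESSIONS = {"session-123": "editor1", "session-456": "editor2"}
LOCKS = {}  # (project, item) -> session ID currently editing it (release is omitted here)
ROWS = {("cpw-nofuko", 1): [("Title", "Sample title"), ("Subjects", "Japan")]}


@dataclass
class Field:
    label: str
    value: str


def retrieve(project, item, session_id):
    """Return the item's descriptive metadata as a JSON string, or raise on failure."""
    # Authenticate the request.
    if session_id not in SESSIONS:
        raise PermissionError("not authenticated")
    # Concurrency check: take the lock, or refuse if another session already holds it.
    holder = LOCKS.setdefault((project, item), session_id)
    if holder != session_id:
        raise RuntimeError("item locked by another session")
    # Gather the rows and wrap each one in an object.
    fields = [Field(label, value) for label, value in ROWS.get((project, item), [])]
    log.info("%s read %s/%d", SESSIONS[session_id], project, item)
    # Unpack the objects into JSON for the AJAX front end.
    return json.dumps({"project": project, "item": item,
                       "fields": [asdict(f) for f in fields]})
```

Wrapping rows in objects before serializing keeps the JSON shape defined in one place, which is what a jQuery front end would rely on when it fills in widgets and templates.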

Page 17

Populate User Interface

• Data
• Widgets / templates

Page 18

Facts & Figures

• Development: January 2009 to present

• Size: 110+ Java classes, ~30,000 lines of code

• Database: 120,000+ rows of data

• Images: ~200GB disk space

• Subversion: Revision 3864

Page 19

Eric Luhrs, luhrse@lafayette.edu
Haruki Yamaguchi

http://metadb.lafayette.edu

What does this mean for you?