Building the Universal Library: Introducing HathiTrust Patricia A. Steele Indiana University Libraries John Price Wilkin University of Michigan Libraries December 8, 2008

Building the Universal Library: Introducing HathiTrust

Patricia A. Steele, Indiana University Libraries

John Price Wilkin, University of Michigan Libraries

December 8, 2008

www.hathitrust.org

The Vision

The Reasons
• Google Digitization Project
• Collective agreement with CIC announced in June 2007
  – U of Michigan and U of Wisconsin projects already underway

The Reasons
• Librarians value preservation
  – How do we ensure digital files are preserved?

The Reasons
• Librarians value access
  – How do we create a comprehensive and coherent body of materials?
• Librarians believe in cooperation
  – How do we achieve a common goal?

The Beginning
• In 2007, the CIC agreed to establish a shared digital repository
• The University of Michigan and Indiana University were the initial leaders of this effort

The Beginning

The Name
• The name… hathitrust.org
  – hathi.org
  – olifant.org
  – silverback.org
  – kingkong.org
  – toomai.org

The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Banking Analogy

The Logo

The Partners
• When HathiTrust was announced in October 2008, its full partners included:
  – University of California system
  – CIC (Committee on Institutional Cooperation): University of Chicago, University of Illinois, Indiana University, University of Iowa, University of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, University of Wisconsin-Madison
  – University of Virginia

The Differences

Sorting the Issues
• Cost model
  – Partners are charged a one-time start-up fee based on the number of volumes they add to the repository, plus an annual fee for curating those volumes.
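The two fees behave differently over time: one is paid once and the other recurs. The sketch below shows that arithmetic. It assumes both fees scale linearly with volume count, although the slide says only that the start-up fee is "based on" it. The per-volume rates are made-up placeholders, not HathiTrust's actual prices.

```python
def total_cost(volumes: int, years: int,
               startup_per_volume: float, annual_per_volume: float) -> float:
    """Hypothetical cost model: one-time start-up fee plus an annual curation fee.

    Assumes both fees are simple per-volume rates (an assumption, not the
    published pricing).
    """
    startup = volumes * startup_per_volume            # charged once, at deposit
    curation = volumes * annual_per_volume * years    # charged every year
    return startup + curation


# Illustrative rates only.
print(total_cost(volumes=100_000, years=5,
                 startup_per_volume=0.50, annual_per_volume=0.20))
```

Under these assumptions, the curation term eventually dominates for a long-lived collection, because it grows with every year of storage.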

Sorting the Issues
• Governance

Sorting the Issues
• Impact of Google settlement
  – Full access to materials
  – More quickly than a court ruling
  – Even a win in court would have left content locked up for years

HathiTrust Architecture
• Storage in Ann Arbor and Indianapolis
• Encrypted backup to a second Ann Arbor location
• Inbound validation, standards-based object storage, and related metadata
• Rights database for rights metadata
• Online catalog as source and storage for descriptive metadata

Page image and metadata repository
• Objectives:
  – A guiding principle: store archival images, create deliverables on demand
  – Incorporate practices specific to a TDR (Trusted Digital Repository)
• Simple filesystem layout using the Pairtree structure
  – One directory per volume, with all files inside a zip and an associated METS file
  – Namespaces allow otherwise conflicting identifiers to coexist
  – Namespaces for institutions and, if needed, for types of identifiers within an institution
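Pairtree turns an identifier into a directory path in two steps. It first "cleans" the identifier: bytes outside visible ASCII and a few reserved characters are hex-encoded as `^hh`, and `/ : .` become `= + ,`. It then splits the result into two-character segments. The sketch below implements those rules and puts a namespace directory in front. The `<root>/<namespace>/pairtree_root/...` layout, the `namespace.localid` identifier form, and all function names are assumptions for illustration, not HathiTrust's code.

```python
import posixpath
from typing import List

# Characters the Pairtree spec hex-encodes as ^hh, in addition to any byte
# outside the visible ASCII range 0x21-0x7E.
_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied to the remaining characters.
_SUBST = {"/": "=", ":": "+", ".": ","}


def pairtree_clean(identifier: str) -> str:
    """Clean an identifier so it is safe to use as a path component."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _ENCODE:
            out.append(f"^{byte:02x}")
        else:
            out.append(_SUBST.get(ch, ch))
    return "".join(out)


def shorties(cleaned: str) -> List[str]:
    """Split a cleaned identifier into 2-character directory names."""
    return [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]


def volume_dir(root: str, htid: str) -> str:
    """Map a hypothetical 'namespace.localid' identifier to its volume directory.

    The namespace directory keeps identical local ids from different
    institutions from colliding.
    """
    namespace, local_id = htid.split(".", 1)
    cleaned = pairtree_clean(local_id)
    return posixpath.join(root, namespace, "pairtree_root",
                          *shorties(cleaned), cleaned)


print(volume_dir("/sdr1/obj", "mdp.39015012345678"))
```

The volume's zip and METS file would then sit inside the returned directory. The same local identifier deposited under two namespaces (for example `mdp.123` and `wu.123`, both hypothetical) maps to two distinct paths.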

Rights database, pt. 1
• What information to store?
  – Considered complexity and maintenance
  – Considered using MARC directly
  – Needed to accommodate both bib record-derived rights and manual overrides
• Approach: examine the bib record, determine the authoritative copyright status, and store the rights attribute, source, reason, and timestamp
• Stored in MySQL

Rights database, pt. 2
• Each rights attribute must have a reason:
  – bib: bibliographically derived
  – man: manual access control override
  – ddd: due diligence documented
• Typical rights attributes in use:
  – pd: public domain
  – pdus: public domain for US viewers*
  – inc: in copyright
  – nobody (override): no access
• Source (e.g., 'google')
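The slides name MySQL and list the fields stored for each determination. Below is a minimal sketch of such a table. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs standalone. The column names, the append-only history, and the rule that a manual (`man`) override outranks other reasons are assumptions, not the production schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Tuple

# One row per rights determination (sqlite3 stands in for MySQL here).
SCHEMA = """
CREATE TABLE rights (
    id     TEXT NOT NULL,  -- volume identifier
    attr   TEXT NOT NULL,  -- rights attribute, e.g. pd, pdus, inc, nobody
    reason TEXT NOT NULL,  -- bib, man, or ddd
    source TEXT NOT NULL,  -- e.g. 'google'
    time   TEXT NOT NULL   -- ISO 8601 UTC timestamp
)
"""


def record_rights(conn: sqlite3.Connection, vol_id: str, attr: str, reason: str,
                  source: str, when: Optional[str] = None) -> None:
    """Append a rights determination; earlier rows are kept as history."""
    if when is None:
        when = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO rights VALUES (?, ?, ?, ?, ?)",
                 (vol_id, attr, reason, source, when))


def current_rights(conn: sqlite3.Connection,
                   vol_id: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (attr, reason, source, time) currently in force, or None.

    Hypothetical precedence: a manual override ('man') beats any other reason,
    so re-deriving rights from the bib record cannot silently undo it; ties
    go to the most recent timestamp.
    """
    return conn.execute(
        "SELECT attr, reason, source, time FROM rights WHERE id = ? "
        "ORDER BY (reason = 'man') DESC, time DESC LIMIT 1",
        (vol_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```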

Pageturner: page image retrieval
(Diagram components: rights database, GeoIP database, archival page image, online page image, library catalog metadata, METS XML, XML, XSLT, HTML, browser)
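The page-image path pairs the rights database with a GeoIP database, so the access decision depends on where the viewer is. Below is a hypothetical decision function for the rights attributes pd, pdus, inc, and nobody. It assumes the GeoIP lookup has already turned the viewer's IP address into an ISO country code, and it denies anything it does not recognize.

```python
from typing import Optional


def full_view_allowed(attr: str, viewer_country: Optional[str]) -> bool:
    """Decide whether a viewer may see full page images of a volume.

    attr: rights attribute from the rights database.
    viewer_country: ISO country code from a GeoIP lookup, or None if the
    lookup failed (treated as "not known to be in the US").
    """
    if attr == "pd":
        return True                        # public domain everywhere
    if attr == "pdus":
        return viewer_country == "US"      # public domain only for US viewers
    return False                           # inc, nobody, unknown codes: deny


for attr, country in [("pd", "FR"), ("pdus", "US"), ("pdus", "FR"), ("inc", "US")]:
    print(attr, country, full_view_allowed(attr, country))
```

Denying by default means a new or misspelled attribute can never accidentally open up in-copyright content.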

HathiTrust and TRAC
• Automatic validation in GROOVE
  – Check the barcode check digit using the Luhn algorithm
  – Fixity check on JPEG, TIFF, and UTF-8 files using MD5
  – Well-formedness and embedded metadata check on JPEG, TIFF, and UTF-8 files using JHOVE
  – Various completeness cross-checks
  – Failures are retried; an administrator eventually intervenes
• Periodic fixity checks using MD5
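Two of these checks are simple to sketch: a Luhn check digit on a barcode and an MD5 fixity comparison against a recorded checksum. The function names below are hypothetical. This is a minimal sketch of the standard techniques, not GROOVE's code.

```python
import hashlib
from typing import BinaryIO


def luhn_valid(barcode: str) -> bool:
    """Return True if the last digit is a valid Luhn check digit."""
    if not barcode or any(c not in "0123456789" for c in barcode):
        return False
    total = 0
    # Walk right to left, doubling every second digit starting with the one
    # just left of the check digit; a doubled value above 9 has 9 subtracted.
    for i, ch in enumerate(reversed(barcode)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large page images need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, expected_md5: str) -> bool:
    """Compare a file's current MD5 with the checksum recorded at ingest."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected_md5.lower()


print(luhn_valid("79927398713"), luhn_valid("79927398710"))
```

The same `fixity_ok` call serves both inbound validation and the periodic re-checks: only the timing differs.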

OAIS Reference Model
(Diagram mapping HathiTrust components onto the OAIS Reference Model:
• Content sources: Google [OCA], in-house conversion
• Ingest: GRIN, internal data loading, GROOVE (JHOVE)
• Metadata: MARC record extensions (Aleph), Rights DB
• Access: Page Turner, HathiTrust API, OAI, GeoIP DB, CNRI Handles, [Solr]
• Archival object: METS/PREMIS object, TIFF G4/JPEG2000, OCR, MD5 checksums
• Delivered object: METS object, PNG, OCR, PDF
• Storage: Isilon, site replication, TSM, MD5 checksum validation)

METS Object
• Why METS?
  – Can serve as both an Archival Information Package and a Dissemination Information Package
  – Designed to record the relationships between the pieces of complex digital objects
  – Can be created automatically as texts are loaded or reloaded

METS Object
• What’s there?
  – metsHdr with an ID and CREATEDATE
  – dmdSec with a URL
  – Two techMDs referencing notes files
  – Two fileGrps (images and OCR)
  – Physical structMap tying together the files with any metadata (page numbers or features)
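Python's standard ElementTree can build a skeleton with the parts just listed. Element and attribute names come from the METS schema. The identifier values, the `MARC` MDTYPE on the catalog reference, the `SYSTEM`-typed file locations, the file-ID scheme, and the helper names are assumptions for illustration.

```python
import os.path
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("METS", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

HREF = f"{{{XLINK_NS}}}href"
MIME = {".tif": "image/tiff", ".jp2": "image/jp2", ".txt": "text/plain"}


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mime_for(name):
    return MIME.get(os.path.splitext(name)[1].lower(), "application/octet-stream")


def build_mets(obj_id, created, catalog_url, notes_files, pages):
    """Build a minimal METS tree.

    pages: iterable of (seq, page_number, feature, image_file, ocr_file);
    page_number and feature may be None.
    """
    root = ET.Element(m("mets"), {"OBJID": obj_id})
    ET.SubElement(root, m("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})  # points at the catalog record
    ET.SubElement(dmd, m("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "MARC", HREF: catalog_url})

    amd = ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})  # one techMD per notes file
    for n, name in enumerate(notes_files, start=1):
        tech = ET.SubElement(amd, m("techMD"), {"ID": f"TECH{n}"})
        ET.SubElement(tech, m("mdRef"), {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                                         "MDTYPE": "OTHER", HREF: name})

    file_sec = ET.SubElement(root, m("fileSec"))
    groups = {use: ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
              for use in ("image", "ocr")}

    struct = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct, m("div"), {"TYPE": "volume"})
    for seq, page_number, feature, image_file, ocr_file in pages:
        attrs = {"TYPE": "page", "ORDER": str(seq)}
        if page_number:
            attrs["ORDERLABEL"] = page_number   # printed page number
        if feature:
            attrs["LABEL"] = feature            # e.g. a title page or contents
        page_div = ET.SubElement(volume, m("div"), attrs)
        for use, href in (("image", image_file), ("ocr", ocr_file)):
            file_id = f"{use.upper()}{seq:08d}"
            f = ET.SubElement(groups[use], m("file"), {"ID": file_id, "MIMETYPE": mime_for(href)})
            ET.SubElement(f, m("FLocat"), {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM", HREF: href})
            ET.SubElement(page_div, m("fptr"), {"FILEID": file_id})  # ties page to its files
    return root


example = build_mets("mdp.39015000000000", "2008-12-08T00:00:00",
                     "http://example.org/catalog/000000001",
                     ["notes1.txt", "notes2.txt"],
                     [(1, None, "FRONT_COVER", "00000001.tif", "00000001.txt"),
                      (2, "1", None, "00000002.jp2", "00000002.txt")])
print(ET.tostring(example, encoding="unicode")[:200])
```

Because the tree is generated from the file list, it can be rebuilt whenever a volume is reloaded.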

HathiTrust Services
• Preservation of the digital surrogate
• Access (within the bounds of law and the settlement)
  – Viewing
  – Redistribution
• Services for print-disabled users
• Section 108
• Non-consumptive research

HathiTrust Branding

Legal Status of the Books
• Outside of the settlement
  – Public domain content digitized by libraries is unconstrained
  – Libraries continue to do preservation-related work with in-copyright works (Sec. 108)
• Settlement
  – LDC or cooperative LDC (HathiTrust)
  – Services for print-disabled users
  – Non-consumptive research
  – Section 108 uses
  – General discovery
  – Sharing of public domain content

HathiTrust Future
• Expansion of partnership
• New services
• Revision of governance
• Refinement of content

Contacts, etc.
• http://www.HathiTrust.org (see sitemap)
• Patricia Steele <[email protected]>
• John Wilkin <[email protected]>

Digital library for the future