Building a Fedora Architecture to Support Diverse Collections
Jon Dunn, Ryan Scherle
Digital Library Program, Indiana University
Indiana University Digital Library Program
Joint venture of Libraries and University Information Technology Services (UITS), formed in 1997
Bloomington-based; supporting 8 campuses
Engaged in digital collection building, infrastructure design/management, and research activities
Supporting library, archive, museum, academic department, and faculty-based digital collections projects
Digital Library Content Types at IU
Books
Manuscripts
Photographs
Art images
Music audio
Video
Sheet music
Musical score images
Music notation files
…and more
Current DLP Technical Environment: Access Systems
DLXS (University of Michigan)
  Text
  Finding Aids
  Bibliographic information
IBM Content Manager
Locally-developed systems
  Cushman Photograph Collection
  DIDO: Digital Images Delivered Online
  Variations2
  Page turners (sheet music, METS Navigator)
Current DLP Technical Environment: Storage
DLP server disk storage
Tivoli Storage Manager
IU Massive Data Storage System (MDSS)
  HPSS software
  1.6 petabytes of StorageTek and IBM automated tape
  Access via FTP, PFTP, HSI
Motivations for a repository
Centralize access and preservation functions for IU’s digital collections
Reduce DLP staff time and attention needed to create and maintain collections
Enable librarians, curators, archivists to digitize new collections
Enable digital preservation
DL Infrastructure Project
Proposal funded by University Information Technology Services to reengineer digital library infrastructure around Fedora
Builds on experience with Fedora in context of EVIA Digital Archive (ethnomusicology video)
Building services and tools around Fedora
  Searching/browsing of metadata and content
  End-user UI for display/navigation of metadata and content
  Cataloging and ingest tools
  Preservation services
IU Content Models
Defining a content model
Focus on what you can do with an object
  Behaviors are primary
  Behaviors are the way all external processes will interact with the object
  Keep datastreams “private”
Diversity
  Multiple media types
  Multiple brands
  Multiple tools
Standard disseminators
  All objects subscribe to the default disseminator
  Most objects subscribe to the metadata disseminator
  Most objects subscribe to type-specific disseminators
Metadata dissem
  getMetadata(type)
Default dissem
  getDefaultView
  getLabel
  getFullView
  getPreview
  getAssetDefinition
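The two standard disseminators can be read as small interfaces that objects implement, with datastreams hidden behind them. The following is a minimal in-memory sketch under that reading, not the IU implementation: only the method names come from the slide, while the `SimpleImage` class, its constructor, the `render_listing` helper, and every return value are hypothetical placeholders.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DefaultDisseminator(Protocol):
    """Behaviors every object is expected to expose (method names from the slide)."""

    def getLabel(self) -> str: ...
    def getPreview(self) -> str: ...
    def getDefaultView(self) -> str: ...
    def getFullView(self) -> str: ...
    def getAssetDefinition(self) -> str: ...


@runtime_checkable
class MetadataDisseminator(Protocol):
    """Behavior most objects expose for returning metadata of a given type."""

    def getMetadata(self, type: str) -> str: ...


class SimpleImage:
    """Hypothetical object: callers use behaviors, never the private fields."""

    def __init__(self, pid: str, label: str, metadata: dict[str, str]) -> None:
        self._pid = pid
        self._label = label
        self._metadata = metadata  # e.g. {"DC": "<dc/>"}

    def getLabel(self) -> str:
        return self._label

    def getPreview(self) -> str:
        return f"[thumbnail of {self._pid}]"  # placeholder rendering

    def getDefaultView(self) -> str:
        return f"[screen-size view of {self._pid}]"  # placeholder rendering

    def getFullView(self) -> str:
        return f"[large view of {self._pid}]"  # placeholder rendering

    def getAssetDefinition(self) -> str:
        return f"asset:{self._pid}"  # placeholder; real semantics not given in the slides

    def getMetadata(self, type: str) -> str:
        return self._metadata[type]


def render_listing(obj: DefaultDisseminator) -> str:
    """A generic tool that relies only on the default disseminator."""
    return f"{obj.getLabel()}: {obj.getPreview()}"
```

Because `render_listing` depends only on the default behaviors, the same function would work for an image, a book, or a collection, which is the point of making behaviors primary.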
Simple images
Each image is a single Fedora object
Images are available in a variety of sizes
Each image belongs to one or more collections
[Diagram: Collection obj (Collection, Default, and Metadata dissem); Image obj (Image, Default, and Metadata dissem)]
[Diagram: Collection obj (Collection, Default, and Metadata dissem); Book obj (Paged, Default, and Metadata dissem); Page obj (Image, Default, and Metadata dissem)]
Object-level disseminators
Image: getThumbnail, getScreenSize, getLarge, getMaster
Video: getSmilFile, playSmilFile, getStructMap, getActionObject, getObjectID
PagedImage: getNumChildren, getChildren
PagedText: getSummary, getChunkList, getChunk(label), getRawText, getFriendlyText, getTextPage(num)
Printable: getPrintableVersion
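External tools would reach these behaviors over HTTP. The sketch below builds a dissemination URL, assuming the Fedora 2.x API-A-LITE pattern `/fedora/get/{PID}/{bDefPID}/{methodName}`; the host, the object PID, and the behavior-definition PIDs are hypothetical, and the parameter name `num` is taken from the `getTextPage(num)` signature on the slide.

```python
from urllib.parse import quote, urlencode


def dissemination_url(base: str, pid: str, bdef_pid: str, method: str, **params: object) -> str:
    """Build a dissemination URL (assumed Fedora 2.x API-A-LITE layout)."""
    # Colons are kept unescaped because PIDs use the form namespace:id.
    path = "/".join(
        [
            base.rstrip("/"),
            "get",
            quote(pid, safe=":"),
            quote(bdef_pid, safe=":"),
            quote(method),
        ]
    )
    # Method parameters, if any, travel as a query string.
    return f"{path}?{urlencode(params)}" if params else path


if __name__ == "__main__":
    # Hypothetical host and PIDs, for illustration only.
    print(dissemination_url("http://localhost:8080/fedora", "iudl:123", "bdef:Image", "getThumbnail"))
    print(dissemination_url("http://localhost:8080/fedora", "iudl:456", "bdef:PagedText", "getTextPage", num=3))
```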
Collection-level disseminators
Collection: getSize, listMembers(start, max)
CollectionRender: renderItemPreview(pid), renderItemFullView(pid)
CollectionPagedImage: viewPageTurner(pid, pagenum)
CollectionPagedText: viewText(pid, pagenum, style), viewChunk(pid, label, style), viewPage(pid, num, style)
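The `getSize` and `listMembers(start, max)` pair suggests paged access to collection members. Here is a minimal sketch of how a client might walk a whole collection page by page; the in-memory `Collection` class and the `iter_members` helper are hypothetical, and a zero-based `start` is an assumption the slides do not confirm.

```python
from typing import Iterable, Iterator


class Collection:
    """Hypothetical in-memory stand-in for a collection object."""

    def __init__(self, member_pids: Iterable[str]) -> None:
        self._members = list(member_pids)

    def getSize(self) -> int:
        return len(self._members)

    def listMembers(self, start: int, max: int) -> list[str]:
        # Assumption: start is a zero-based offset, max is the page size.
        return self._members[start:start + max]


def iter_members(collection: Collection, page_size: int = 2) -> Iterator[str]:
    """Yield every member PID, fetching at most page_size PIDs per call."""
    start = 0
    size = collection.getSize()
    while start < size:
        page = collection.listMembers(start, page_size)
        if not page:  # defensive stop if the collection shrank meanwhile
            break
        yield from page
        start += len(page)
```

Paging keeps each call small, which matters when a collection holds thousands of items and a browse page only shows a few at a time.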
Image Demos
Sample Image
Frank M. Hohenberger Collection
U.S. Steel Collection
But what about the metadata?
Different content types have different types of metadata
  MARC for general library holdings
  MODS for collections we catalog
  TEI for textual collections
  EAD for archival collections
  Combinations: some items need METS for structure, TEI for text, MODS for description, etc.
The solution: METS
No, not the Fedora METS
METS within a datastream, and everything else within the METS
A standard way of dealing with DC, MODS, technical, structural, provenance, process, etc.
Sample Image
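With everything wrapped inside one METS document, a `getMetadata(type)` behavior could be little more than a lookup by metadata type. The sketch below illustrates that idea with the Python standard library; the sample document, the `get_metadata` function, and the choice to match on the `MDTYPE` attribute of `mdWrap` are assumptions for illustration, not the IU implementation.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical, heavily trimmed METS document with one MODS record inside.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Sample Image</mods:title></mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


def get_metadata(mets_xml: str, md_type: str):
    """Return the first element wrapped with the given MDTYPE, or None."""
    root = ET.fromstring(mets_xml)
    for wrap in root.iterfind(".//mets:mdWrap", NS):
        if wrap.get("MDTYPE") == md_type:
            data = wrap.find("mets:xmlData", NS)
            if data is not None and len(data):
                return data[0]  # the embedded record's root element
    return None


if __name__ == "__main__":
    mods = get_metadata(SAMPLE, "MODS")
    print(mods.findtext("mods:titleInfo/mods:title", namespaces=NS))
```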
Implementing the disseminators
Simple Image: DC, THUMBNAIL, SCREEN, LARGE, METADATA, RELS-EXT
Paged Object: DC, METADATA, RELS-EXT
Collection: DC, METADATA, INGEST_CONFIG
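The datastream lists above amount to a per-content-model checklist. A minimal sketch of a conformance check follows, assuming the datastream IDs are exactly those on the slide; the dictionary keys and the `missing_datastreams` function are hypothetical names.

```python
# Datastream IDs per content model, as listed on the slide.
REQUIRED_DATASTREAMS = {
    "SimpleImage": {"DC", "THUMBNAIL", "SCREEN", "LARGE", "METADATA", "RELS-EXT"},
    "PagedObject": {"DC", "METADATA", "RELS-EXT"},
    "Collection": {"DC", "METADATA", "INGEST_CONFIG"},
}


def missing_datastreams(content_model: str, datastream_ids: list[str]) -> list[str]:
    """Return the required datastream IDs that an object lacks, sorted by name."""
    required = REQUIRED_DATASTREAMS[content_model]
    return sorted(required - set(datastream_ids))


if __name__ == "__main__":
    # An image object that has not yet had its derivatives generated.
    print(missing_datastreams("SimpleImage", ["DC", "THUMBNAIL", "METADATA", "RELS-EXT"]))
```

A check like this could run before ingest so that objects which cannot satisfy their disseminators never reach the repository.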
Want more info?
More detailed content model pages are available on our project wiki.
IU Fedora Tools
Ingest Tool
The Ingest Tool transforms raw metadata and media files into Fedora objects that conform to our content models.
[Diagram: MODS, EAD, JPG, and PDF inputs; Ingest Tool; FOXML and datastreams; Fedora]
METS Navigator
METS Navigator is a METS-based system for displaying and navigating multi-image digital objects.
It was built to be extendible and configurable.
Web pages with navigational structure are built from metadata in the repository.
Available from http://metsnav.sourceforge.net
Demos
Default METS Navigator Collection
Jane Johnson Collection
Using METS Navigator with Fedora
METS document must meet minimal format requirements
  Logical and physical structMap
  Files marked with USE and GROUPID attributes
  Files are URLs that point to Fedora
METS Navigator may be called from a disseminator, but it is better if called separately.
Full integration instructions
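The format requirements above can be checked mechanically before a document is handed to METS Navigator. The following is a rough sketch of such a check, not METS Navigator's own validation: the `check_mets` function is hypothetical, matching `TYPE` values case-insensitively is an assumption, and accepting `USE` on either the `file` element or its enclosing `fileGrp` is an assumption about how files are marked.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"


def check_mets(mets_xml: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    root = ET.fromstring(mets_xml)
    problems = []

    # Requirement: both a logical and a physical structMap.
    types = {(sm.get("TYPE") or "").lower() for sm in root.iter(METS + "structMap")}
    for needed in ("logical", "physical"):
        if needed not in types:
            problems.append(f"missing {needed} structMap")

    # Requirement: files marked with USE and GROUPID.
    for grp in root.iter(METS + "fileGrp"):
        for f in grp.findall(METS + "file"):
            fid = f.get("ID", "?")
            if not (f.get("USE") or grp.get("USE")):
                problems.append(f"file {fid} has no USE")
            if not f.get("GROUPID"):
                problems.append(f"file {fid} has no GROUPID")
    return problems
```

The third requirement, that file locations are URLs pointing to Fedora, is left out because it depends on the local repository address.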
Cataloging tools
No good solutions for non-MARC descriptive/structural metadata creation
  Some exist for specific domains: e.g. art image cataloging
Need content- or collection-appropriate interfaces
Catalog directly into Fedora or into database?
  Data synchronization issues
Common framework or separate tools?
Starting to investigate
Delivery tools
Right now: collection-specific web sites
Moving towards: generic applications appropriate to content models
  Examples: documentary photos, art images, books, sheet music…
May integrate components from other places (e.g. Virginia collector tool)
Exposing metadata to external services via OAI-PMH, SRU (for Metasearch)
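For the OAI-PMH route, an external harvester would issue requests such as `ListRecords` against the repository's OAI endpoint. The sketch below only builds such a request URL; `verb`, `metadataPrefix`, and `set` are standard OAI-PMH request parameters, while the base URL and the set name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc", set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("http://example.org/oai", set_spec="photos"))
```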
Other tools and services via Fedora Service Framework
Search tool
  Expanded, with thesaurus support
Preservation integrity services
Infrastructure Project Challenges
Time and resources vs. scope of work
Sorting out old collections – digital archeology
Implementing new infrastructure while continuing to do new projects
Maintaining current functionality
Infrastructure Project Challenges
Metadata entry / cataloging tool design
Integration with MDSS/HPSS - classes of storage
  Art images
  Searching system
  Preservation system
Thank You!
Contact info:
Jon Dunn [email protected]
Ryan Scherle [email protected]
Infrastructure project wiki: http://wiki.dlib.indiana.edu/confluence/display/INF