Digital Libraries Made Easy
2004 SAMLA Convention
Roanoke, Virginia
November 12, 2004
Edward A. Fox
Digital Library Research Laboratory & Dept. of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected] http://fox.cs.vt.edu
http://fox.cs.vt.edu/talks/2004/
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, AOL, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
• VT Faculty/Staff: Debra Dudley, Weiguo Fan, Gail McMillan, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
• VT Students: Yuxin Chen, Shahrooz Feizabadi, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Fernando Das Neves, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, …
• Leonid Kalinichenko: for advice shaping this tutorial
Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
  • MIT Press: Arms, plus related by Borgman, Licklider (1965)
  • Morgan Kaufmann: Witten... (several), Lesk (2nd edition soon)
• Conferences
  • ECDL: www.ecdl2005.org
  • ICADL: http://icadl2004.sjtu.edu.cn
  • JCDL: www.jcdl2005.org
• Associations
  • ASIS&T DL SIG
  • IEEE TCDL: www.ieee-tcdl.org (student awards, consortium)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/
[Figure: nodes are Domain Concepts (theory), Modeling Language (Meta-Model), Model, DL Architecture, Running DL, Actors, “Real” World, and “real” world object; edge labels are instance of, used to compose, abstracted from, represented by, and interpreted as]
Digital Libraries Shorten the Chain from
Editor
Publisher
A&I
Consolidator
Library
Reviewer
DLs Shorten the Chain to
Author
Reader
Digital Library
Editor
Reviewer
Teacher
Learner
Librarian
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
DLs: Why of Global Interest?
• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic and technological growth, education
• DL - a domain for international collaboration
  • wherein all can contribute and benefit
  • which leverages investment in networking
  • which provides useful content on Internet & WWW
  • which will tie nations and peoples together more strongly and through deeper understanding
Libraries of the Future
J.C.R. Licklider, 1965, MIT Press
World
Nation
State
City
Community
5S Definition: Digital Libraries are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
Synchronous Scholarly Communication
Same time, Same or different place
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: one axis is labeled Computing (flops) and Digital content, running from less to more; the other is labeled Communications (bandwidth, connectivity)]
Digital Libraries technology trajectory: intellectual access to globally distributed information
Note: we should consider 4 dimensions: computing, communications, content, and community (people)
Digital Library Content
Content Types
• Articles, Reports, Books
• Text Documents
• Speech, Music
• Video, Audio
• (Aerial) Photos
• Geographic Information
• Models, Simulations
• Software, Programs
• Genome: Human, animal, plant
• Bio Information
• 2D, 3D, VR, CAT
• Images and Graphics
AmericanSouth.Org – Roles, Content
SOLINET | Libraries (Data Providers) | Scholars
Intellectual Organization Controlled vocabulary Metadata extension development
Collection Decisions Selection Criteria
Selection Criteria Controlled vocabulary
Central Server Maintenance | Local Server Maintenance | Provision of Context
Metadata Repository | Metadata Creation/Maintenance | Organizational Structure and Annotation Tools
Central Interface Design/Maintenance | Local Interface Design/Maintenance | Selection of Other Annotation Tools
Central Indices Creation/Maintenance | Local Indices | Selection of Thesauri
Coordination of Metadata Gateway Development | Gateway Implementation | Concept Mapping
Digital Objects
Content Area Description Audio Digital Finding Aid MSS Other Photo Video MF Total
African-American cultural life 6 4 6 9 4 12 3 10 18 72
Agricultural crisis of late 19th century 1 1 3 1 1 4 8 19
Codification of segregation laws 1 3 2 1 1 8 16
Configuration of white supremacy 1 3 3 3 1 9 20
Cultural values and activities 3 1 5 17 4 15 1 5 20 71
Disenfranchising movements 1 2 2 1 2 1 6 15
Educational movements 6 1 1 18 6 21 3 5 27 98
Emergence of Holiness & Pentecostal Groups 1 1 1 7 10
Emergence of new musical forms 3 1 1 1 2 8
Emergence of organized groups expressing farmers concerns 2 2 1 8 13
… … … … … … … … … … …
Total Each Format 41 14 51 161 38 133 13 79 301 831
Outline
1. Introduction
2. Historical Perspective
• Computing-related (ACM-DL, CSTC, CITIDEL), NSDL
• DLI, Workshop Results: Chatham
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
CS -> CSTC -> CRIM
• NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization, multimedia
• Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org
Browsing (1)
Browsing (2)
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)
• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)
Multi-dimensional Categorization
[Figure: resources categorized along three dimensions]
• Language: English, Spanish
• Topic: Java, Multimedia, Algorithms
• Quality: Identified by crawl, Nominated, Editor reviewed, Peer reviewed
Overview of CITIDEL architecture
[Figure: digital library architecture for local and interoperable CITIDEL services, in three layers]
• User portals: educators, administrators, learners
• Digital library services: multilingual searching, revising, annotating, filtering, browsing, administering
• Repositories: annotations, filtering profiles, user profiles, union metadata
• An OAI data harvester and an OAI data provider connect to remote and peer digital libraries (e.g., NSDL-CIS)
CITIDEL Technology Features
•Component architecture (Open Digital Library)
•Re-use and compose re-deployable digital library components.
•Built Using Open Standards & Technologies
•OAI: Used to collect DL resources and for DL interoperability
•XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …)
•Perl: Component Integration
•ESSEX: Search Engine Functionality
•Very fast, utilizing in-memory processing
•Includes snap-shots for persistence
•Multi-scheming
•Integrates multiple classifications / views through maps, closure
Cluster Search Results from CITIDEL
Cluster NDLTD-Computing
CITIDEL -> NSDL
• A collection project in the
• National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)
Supports:
Users
Content
Tools
(profiles)
(metadata)
(protocols)
Learning communities
Customizable collections
Application services
Enables: Environments for
• Communication
• Collaboration
• Creation
• Validation
• Evaluation
• Recognition
• ...
• Discovery
• Stability
• Reliability
• Reusability
• Interoperability
• Customizability
• ...
of Resources
AND
NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or approaches or users
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: components arranged around a core NSDL “bus”]
• Layers: User Interfaces, Usage Enhancement, Collection Building
• Portals & clients
• NSDL services and other NSDL services
• CI services: annotation, discussion, personalization, authentication, browsing
• Core services: information retrieval, metadata gathering
• Core collection-building services: harvesting, protocols
• NSDL collections, special databases, referenced items & collections
Outline
1. Introduction
2. Historical Perspective
• Computing-related (ACM-DL, CSTC, CITIDEL), NSDL
• DLI, Workshop Results
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/
Information Life Cycle
[Figure: two renderings of the information life cycle]
• Phases: Creation, Distribution, Seeking, Utilization
• Activities: Authoring, Modifying, Describing, Organizing, Indexing, Storing, Archiving, Retrieving, Distributing, Networking, Accessing, Filtering, Searching, Browsing, Recommending, Using, Creating, Retention/Mining
• States: Active, Semi-Active, Inactive, Discard
• Quality-related labels: Accessibility, Accuracy, Believability, Completeness, Conformance, Pertinence, Preservability, Relevance, Significance, Similarity, Timeliness
Benefits
• Ease of use
• Effectiveness
• “The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” - IITA Workshop report
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Reagan Moore, Ed Fox, June 2002, for NSF
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
• Key concepts: do, mdo, coll, catalog, repository, service, archive, DL
• Interoperability: federated search, harvesting, OAI
• Architecture: distributed, clusters, LOCKSS
• Digitization, preservation
4. Software Solutions
5. Advanced Issues
Outside Key Set, but Important: Interfaces
• 5S perspective: spaces, scenarios
• Taxonomy of interface components
• Workflow
• Visualization
• Environments
• Design
• Usability testing
Also Important: Epub, SGML, XML
• 5S perspective: streams, structures, scenarios
• Authoring
• Rendering, presenting
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structure queries
Also Important: Databases
• 5S perspective: structures, streams, scenarios
• Extending database technology
• Structured and unstructured info
• Multimedia databases
• Link databases
• Performance
• Replicated storage, I2-DSI (details following)
Also Important: Agents
• 5S perspective: societies, streams, spaces, scenarios, structures
• Protocols
• Knowledge interchange
• Negotiation, registries
• Distributed issues
• Webbots (automatic indexing)
• Ontologies (standard upper)
Also Important: Economics
• 5S perspective: societies, scenarios
• E-commerce
• Sustainability
• Preservation and archiving
  • DLF, Besser, Lorie, Gladney
• Self-archiving
• Open collections
• Economic models, business plans
Also Important: IPR
• 5S perspective: societies, scenarios
• Intellectual property rights
• Legal issues
• Terms and conditions
• Copyright
• Patents, trademarks
• Distributed rights management
• Security
Also Important: Social Issues
• 5S perspective: societies, scenarios
• Cooperation, collaboration
• Annotation, ratings
• Digital divide
• Educational applications
• Cultural heritage
• Museums (AMICO)
• Organizational acceptance
• Personalization
• Internationalization
What is Key depends on your DL Definition
• Library ++ (library+archive+museum+…)
• Distributed information system + organization + effective interface
• User community + collection + services
• Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation
Our Perspective on Key Concepts
• Recall the 5S approach
  • Minimal digital library
  • Metamodel for minimal digital library
  • Metamodel for “born digital standard” DLs
  • Metamodel for architectural DL
• Here, focus on key concepts in minimal DL
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
  • Is the DO version the same, better, or worse?
  • Decision for ETDs: structured + rendered
• Surrogate for “real” object
  • Not covered explicitly in metamodel for a minimal DL
  • Crucial in metamodel for archaeology DL
Metadata Objects (MDOs)
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
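A crosswalk between two of the schemes listed above can be pictured as a field mapping. The following is a minimal Python sketch, assuming a handful of MARC tags (245, 100, 650, 520) mapped to unqualified Dublin Core elements. The record layout and the `crosswalk` helper are hypothetical names chosen for illustration; a real mapping would also handle subfields, indicators, and many more tags.

```python
# Hypothetical, partial MARC-to-Dublin-Core crosswalk (illustration only).
MARC_TO_DC = {
    "245": "title",        # title statement
    "100": "creator",      # main entry, personal name
    "650": "subject",      # topical subject heading
    "520": "description",  # summary note
}


def crosswalk(marc_record):
    """Map a list of (tag, value) MARC-like pairs to a Dublin Core dict.

    Dublin Core elements are repeatable, so each element holds a list.
    Tags without a mapping are skipped rather than guessed at.
    """
    dc = {}
    for tag, value in marc_record:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


sample = [
    ("245", "An example thesis title"),
    ("100", "Doe, Jane"),
    ("650", "Digital libraries"),
    ("650", "Metadata"),
    ("999", "local field with no Dublin Core equivalent"),
]
dc_record = crosswalk(sample)
```

Skipping unmapped tags, instead of forcing them into a catch-all element, is one possible design choice; it keeps the target record clean at the cost of losing local data.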
Other Key Definitions
• coll, catalog, repository, service, archive, (minimal) DL
• See Gonçalves et al. in April 2004 ACM Transactions on Information Systems (TOIS)
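The key definitions build on one another, which a few data structures can make concrete. This is a loose Python sketch, not the formalism of the TOIS paper: the class names, fields, and the title-word index are all assumptions made for illustration. It treats a repository as storage for collections of digital objects, a catalog as metadata describing those objects, and a minimal DL as the two combined with indexing, searching, and browsing services.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    handle: str  # unique identifier
    stream: str  # content, here plain text


@dataclass
class MinimalDL:
    # repository: collection name -> digital objects stored in it
    repository: dict = field(default_factory=dict)
    # metadata catalog: handle -> descriptive metadata
    catalog: dict = field(default_factory=dict)
    # index built by the indexing service: term -> set of handles
    index: dict = field(default_factory=dict)

    def submit(self, collection, obj, metadata):
        """Store an object, catalog its metadata, and index its title."""
        self.repository.setdefault(collection, []).append(obj)
        self.catalog[obj.handle] = metadata
        for term in metadata.get("title", "").lower().split():
            self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query):
        """Searching service: handles whose title has every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        hits = [self.index.get(t, set()) for t in terms]
        return set.intersection(*hits)

    def browse(self, collection):
        """Browsing service: handles in a collection, in submission order."""
        return [o.handle for o in self.repository.get(collection, [])]


dl = MinimalDL()
dl.submit("etds", DigitalObject("etd-1", "full text ..."),
          {"title": "Digital Library Clusters"})
dl.submit("etds", DigitalObject("etd-2", "full text ..."),
          {"title": "Open Digital Library Components"})
```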
5S
structures (2) streams (1) spaces (4) scenarios (7) societies (10)
structural metadata specification (11)
descriptive metadata specification (12)
repository (19)
collection (17)
indexing service (20)
structured stream (15)
digital object (16)
metadata catalog (18)
browsing service (23)
searching service (21)
digital library (minimal) (24)
services (8)
sequence (A.3)
graph (A.6) function (A.2)
measurable(A.10), measure(A.11), probability (A.12), vector(A.13), topological (A.14) spaces
event (6) state (5)
hypertext (22)
sequence (A.3)
[Figure: 5S concepts (Streams, Structures, Spaces, Scenarios, Societies; services: indexing, browsing, searching; hypertext; structured stream; descriptive metadata specification) extended with Arch-prefixed concepts: ArchObj, ArchColl, ArchDO, ArchDColl, ArchDR, Arch Metadata catalog, Arch Descriptive Metadata specification, SpaTemOrg, StraDia, Minimal ArchDL]
[Figure: 5S metamodel concept map]
• Streams: text, audio, image, video
• Structures: digital object, metadata specifications, collection, catalog, repository, index
• Spaces: measurable, measure, probabilistic, metric, topological, vector
• Scenarios: service, scenario, event, operation
• Societies: service manager, actor
• Relations: describes, stores, contains, is_a, is_version_of/cites/links_to, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces
Models | Examples | Objectives
Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers responsible for running DL services; actors that use those; and relationships among them
Services
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
• Infrastructure Services
  • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
  • Repository-Building
    • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
    • Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
[Figure: composition of services, with edges marked p or e]
• Information Satisfaction Services (fundamental and composite): Searching, Browsing, Requesting, Recommending, Filtering, Binding, Visualizing, Expanding query
• Infrastructure Services (Add_Value): Rating, Training, Indexing
• Data passed between services: query, query’, anchor, handle, user model, query/category, {digital object}, Collection, binder, space, classifier, Index, transformer, {(digital object, actor, rate)}
• Society: actor
[Figure: 5S across the DL development life cycle (Requirements, Analysis, Design, Implementation, Test); labels are 5S, 5SL, OO Classes, Workflow, Components, DL Evaluation, 5SGraph, 5SLGen, Formal Theory/Metamodel, DL XML Log]
5SLGen: Automatic DL Generation
[Figure: labels are 5S Meta Model, 5SLGraph, DL Expert, DL Designer, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, and the roles Practitioner, Researcher, Teacher; stages are Requirements (1), Analysis (2), Design (3), Implementation (4)]
Interoperability through Standards
• Protocols/federation
  • Z39.50, CIMI
  • Dienst, NCSTRL
  • OAI protocol
• Metadata
  • TEI: inline, detailed (structure in stream)
  • MARC: two-level, fine-grained
  • Dublin Core: high-level, 15 elements
  • RDF: describing resources/collections, annotation
  • OAMS -> DC and others used in OAI
Interoperability and IR
• Information storage and retrieval
• Search, Retrieval, Resource Discovery
• Boolean vs. natural language
• Search engines
• Indexing, phrases, thesauri, concepts
• Federated search and harvesting, OAI
• Integrating links and ratings
• Crawlers, spiders, metasearch, fusion
Open Archives Initiative (OAI)
• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
• Protocol for Metadata Harvesting (PMH)
  • Simplicity
  • Generality
  • Extensibility
• Support for PMH => Open Archive (OA)
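To make the harvesting idea concrete, here is a small Python sketch of one PMH exchange: building a `ListRecords` request URL and pulling Dublin Core titles and the resumption token out of a response. The base URL, the sample response, and the helper names are made up for illustration, and no network call is made; the verb, the `oai_dc` metadata prefix, and the XML namespaces follow OAI-PMH version 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(response_xml):
    """Return (list of dc:title values, resumption token or None)."""
    root = ET.fromstring(response_xml)
    titles = [t.text for t in root.iter(DC_NS + "title")]
    token = root.find(".//" + OAI_NS + "resumptionToken")
    return titles, (token.text if token is not None and token.text else None)


# Hypothetical response from a data provider (illustration only).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
titles, token = parse_titles(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until the data provider sends none, which is what keeps the protocol simple on both sides.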
OAI = Technical Umbrella for Practical Interoperability…
Reference Libraries
Publishers
E-Print Archives
…that can be exploited by different communities
Museums
OAI – Repository Perspective
Required: Protocol
[Figure: a repository holding digital objects (DO) and metadata objects (MDO)]
OAI – Black Box Perspective
[Figure: seven open archives, OA 1 through OA 7]
Tiered Model of Interoperability
[Figure: tiers are document models, metadata harvesting, and mediator services; data providers and service providers are linked by metadata harvesting; example services are discovery, current awareness, and preservation]
The World According to OAI
Architectural Issues
• Internet middleware
• Independent system / part of federation
• Decompositions vary
  • search engine, browser, DBMS, MM support
  • repository, handle server, client
  • information resources + mediators, bus or agent collection + client with workspace/environment
• Metrics: e.g., for federated search
Clusters
• How can computer clusters scale with collections and user communities to achieve cost-effective solutions for DLs?
• Paul Mather dissertation by early 2005
• Modeling and simulation
• Cluster size
• Communication fabric and patterns
• Disks and nodes
• Characterize DL collections: file sizes
• Characterize user workload: logs
• Special considerations:
  • Linear hashing of names
  • Replication of popular objects
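One way to read "linear hashing of names" is as the classic linear hashing scheme applied to placing named objects on cluster nodes, so that the cluster can grow one node at a time without rehashing everything. The Python sketch below works under that assumption; the function names, the use of SHA-1, and the `etd-N` object names are illustrative choices, not details taken from the dissertation work mentioned above.

```python
import hashlib


def stable_hash(name: str) -> int:
    """Hash a name to a non-negative integer that is stable across runs."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(name: str, initial_nodes: int, level: int, split: int) -> int:
    """Return the node index for `name` under linear hashing.

    initial_nodes: cluster size before any growth
    level: number of completed doubling rounds
    split: index of the next node to be split in the current round
    """
    h = stable_hash(name)
    node = h % (initial_nodes * 2 ** level)
    if node < split:
        # This node was already split in the current round, so the
        # next-level hash chooses between it and its new sibling.
        node = h % (initial_nodes * 2 ** (level + 1))
    return node


# A cluster that started with 4 nodes and has split nodes 0 and 1,
# so 6 nodes are currently in use.
names = [f"etd-{i}" for i in range(1000)]
placement = [node_for(n, initial_nodes=4, level=0, split=2) for n in names]
```

Adding a node only moves objects off the one node being split, which is the property that makes the scheme attractive when collections grow steadily.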
LOCKSS
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels
• Initial content: journals
• Emory (Martin Halbert)
  • Help deploy and adapt
  • Help apply in other contexts
• Another registry
  • Set of publisher manifests (information providers)
  • Set of storage systems (archival storage)
• NDIIPP: AmericanSouth, MetaArchive
OCKHAM Library Network
[Figure: the OCKHAM Library Network connects NSDL services, OCKHAM services, and library services, serving teachers, learners, and librarians]
• Simplicity (a la OCCAM’s razor)
• Support by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P
Lightweight Protocols
• “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive.
• The success of protocols considered lightweight is illuminating.
• Examples: TCP/IP, HTTP, LDAP, and the OAI PMH
Reference Models
• Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration
• Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)
Digitization and Preservation Community and Activity (selected)
• Archivists worldwide
• International collaboration
  • Million book project in US, China, India (Reddy, Chen, Balakrishnan)
• US Library of Congress
  • Matching funds
  • American Memory
  • Infrastructure: NDIIPP
• Dutch National Library + IBM
• Associations: ARL, DLF
• People
  • Harnad: Self-archiving movement
  • Lorie: Universal virtual computer
  • Gladney: technology, philosophy (http://home.pacbell.net/hgladney/ddq_3_1.htm)
  • Besser, Trant, …
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
• Open Source: Greenstone, eprints, Kepler, DSpace, Fedora, ETD-DB, ODL
• Commercial: IBM Content Manager, VTLS’ VITAL
• Comparison: by capability - institutional repository, by environment (library, WWW, personal use)
• Evaluation, usability
5. Advanced Issues
Open Source DL Examples
• Eprints (www.eprints.org)
• Fedora
• Greenstone (www.greenstone.org)
• Many systems in NSF DLI projects
• VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD
What is a Digital Object Repository?
Also called: digital rep., digital asset rep., institutional repository
Stores and maintains digital objects (assets)
Provides external interface for Digital Objects: Creation, Modification, Access
Enforces access policies
Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
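The responsibilities listed for a digital object repository (creation, modification, access, and policy enforcement) can be sketched in a few lines of Python. Everything here is hypothetical: the class, the policy signature, and the staff/guest roles are assumptions made for illustration and do not describe any particular product.

```python
class AccessDenied(Exception):
    """Raised when the access policy rejects an operation."""


class ObjectRepository:
    def __init__(self, policy):
        # policy: callable (user, action, object_id) -> bool
        self._objects = {}
        self._policy = policy

    def _check(self, user, action, object_id):
        if not self._policy(user, action, object_id):
            raise AccessDenied(f"{user} may not {action} {object_id}")

    def create(self, user, object_id, content):
        self._check(user, "create", object_id)
        if object_id in self._objects:
            raise ValueError(f"{object_id} already exists")
        self._objects[object_id] = content

    def modify(self, user, object_id, content):
        self._check(user, "modify", object_id)
        if object_id not in self._objects:
            raise KeyError(object_id)
        self._objects[object_id] = content

    def access(self, user, object_id):
        self._check(user, "access", object_id)
        return self._objects[object_id]


def staff_write_public_read(user, action, object_id):
    """Example policy: anyone may read; only 'staff' may create or modify."""
    return action == "access" or user == "staff"


repo = ObjectRepository(staff_write_public_read)
repo.create("staff", "etd-1", "first draft")
repo.modify("staff", "etd-1", "final version")
guest_view = repo.access("guest", "etd-1")
```

Passing the policy in as a function keeps enforcement in one place, so every external operation goes through the same check.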
Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
Self Archiving of Institutional Research
Theses and Dissertations (VTLS NDLTD Project)
Article preprints and post prints
Internal documents and maps
Management of digital collections
Preservation of materials – decentralized approach
Housing of teaching materials
Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view, access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view, metadata necessary to manage the object
• Datastreams: protected view, content that makes up the “basis” of the object
  • EAD, TEI, DC, MARC, VRA Core, MIX, etc.
  • Images, E-books, E-journals, Music, Video, etc.
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
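The four parts of the digital object described above (PID, disseminators, system metadata, datastreams) map naturally onto a small data structure. The Python sketch below is an illustration of that layering only; it is not the Fedora API, and the class name, field names, and the `preview` disseminator are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FedoraLikeObject:
    pid: str  # globally unique persistent id
    # internal view: metadata needed to manage the object
    system_metadata: Dict[str, str] = field(default_factory=dict)
    # protected view: the content that forms the basis of the object
    datastreams: Dict[str, bytes] = field(default_factory=dict)
    # public view: named access methods that produce disseminations
    disseminators: Dict[str, Callable[["FedoraLikeObject"], bytes]] = field(
        default_factory=dict
    )

    def disseminate(self, method: str) -> bytes:
        """Run a named access method; callers never read datastreams directly."""
        return self.disseminators[method](self)


obj = FedoraLikeObject(
    pid="demo:1",
    system_metadata={"state": "active"},
    datastreams={"IMAGE": b"\x89PNG-rest-of-image-bytes"},
    disseminators={"preview": lambda o: o.datastreams["IMAGE"][:4]},
)
preview = obj.disseminate("preview")
```

Routing access through disseminators is what lets the stored content change format without changing the public interface.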
Fedora™ Repository
[Figure: repository architecture]
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider (reached over HTTP and SOAP)
• Clients: Client Application, Batch Program, Server Application, Web Browser
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policy Enforcement, Policy Mgmt
• Storage Subsystem: Digital Objects, Datastreams, Policies, Users/Groups (XML Files, Relational DB)
• External Content Retriever: Content from External Content Sources over HTTP and FTP
• Services: Remote Service, Local Service
Adapted from Slide by V. Chachra, VTLS
[Figure sequence: users on one side, digital objects (programs, documents, images, video) on the other]
1. users ? digital objects
2. digital library: monolithic and/or custom-built web-based application
3. componentized digital library
4. open digital library: Open Archives (OA) linked by PMH and XPMH
Legend:
• XPMH: Open Digital Library Protocol (Extended OAI-PMH)
• PMH: Protocol for Metadata Harvesting
• Open Digital Library Component: Extended Open Archive
• OA: Open Archive
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Open Digital Library
• Network of Extended Open Archives where each node acts as either a provider of data, services or both.
• Component = Node
• Protocol = Arc
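The "component = node, protocol = arc" view can be illustrated with a toy pipeline in Python. In this sketch, plain method calls stand in for the protocol arcs; actual ODL components would exchange requests over extended OAI-PMH instead. The class names and record layout are assumptions made for illustration.

```python
class Archive:
    """Leaf node: an open archive serving its own records."""

    def __init__(self, records):
        self._records = list(records)

    def list_records(self):
        return list(self._records)


class Union:
    """Node that harvests several upstream nodes and merges their records."""

    def __init__(self, *sources):
        self._sources = sources

    def list_records(self):
        merged = []
        for source in self._sources:
            merged.extend(source.list_records())
        return merged


class Filter:
    """Node that re-exposes only the upstream records passing a predicate."""

    def __init__(self, source, predicate):
        self._source = source
        self._predicate = predicate

    def list_records(self):
        return [r for r in self._source.list_records() if self._predicate(r)]


class Search:
    """Service node: case-insensitive title search over one upstream node."""

    def __init__(self, source):
        self._source = source

    def query(self, term):
        term = term.lower()
        return [r for r in self._source.list_records()
                if term in r["title"].lower()]


site_a = Archive([{"id": "a1", "title": "Digital Library Clusters", "year": 2004}])
site_b = Archive([{"id": "b1", "title": "Library Usability", "year": 1998},
                  {"id": "b2", "title": "Harvesting Metadata", "year": 2003}])
union = Union(site_a, site_b)
recent = Filter(union, lambda r: r["year"] >= 2000)
search = Search(recent)
results = search.query("library")
```

Because every node exposes the same `list_records` interface, nodes can be rewired or reused without changing their neighbors, which is the point of the network-of-archives design.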
Open Digital Library Components
• Running now
  • XML-File (data provider from file system)
  • Search: simple or in-memory (Essex) or generalized
  • Union, browse, recent, filter
  • E-journal/review, Submit, Edit, Annotation
  • Recommender, Rating; Mirroring (see JCDL’02)
  • Working with NCSA: from DB, unstructured text
• Others in process
  • Classification/categorization
  • Registry (and other connections with web services)
Example Open Digital Library
[Figure: ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org). ETD collections ETD-1 through ETD-4 (documents, programs, images, video) are reached through PMH and combined by Union components; Filter, Search, Browse, and Recent components sit between the unions and a user interface for students and researchers. Protocol labels shown: PMH, ODLUnion, ODLSearch, ODLBrowse, ODLRecent.]
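The Union, Filter, and Recent components named in this example can be illustrated with a few plain functions. This is a hedged sketch of the idea only: the record layout (`id`, `title`, `datestamp`), the function names, and the sample data are assumptions, and the real components communicate over a harvesting protocol rather than in-memory lists.

```python
from datetime import date


def union(*archives):
    """Merge records from several archives, keeping one record per identifier."""
    merged = {}
    for archive in archives:
        for record in archive:
            merged.setdefault(record["id"], record)
    return list(merged.values())


def filter_records(records, predicate):
    """Pass through only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]


def recent(records, n):
    """Return the n most recently datestamped records, newest first."""
    return sorted(records, key=lambda r: r["datestamp"], reverse=True)[:n]


# Hypothetical ETD collections; identifiers, titles, and dates are made up.
etd_1 = [{"id": "etd-1:001", "title": "Thesis A", "datestamp": date(2004, 3, 1)}]
etd_2 = [
    {"id": "etd-2:007", "title": "Thesis B", "datestamp": date(2004, 9, 15)},
    {"id": "etd-1:001", "title": "Thesis A", "datestamp": date(2004, 3, 1)},
]

all_etds = union(etd_1, etd_2)
latest = recent(all_etds, 1)
after_june = filter_records(all_etds, lambda r: r["datestamp"] > date(2004, 6, 1))
```

Because each function takes records in and gives records out, they can be chained in any order, which is the property that lets such components be composed into a larger library.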
OAI, ODL, DL-in-a-box
• Open Archives Initiative
  • since 1999, www.openarchives.org
• Open Digital Libraries
  • since 2001, from www.dlib.vt.edu
  • with Hussein Suleman (now U. Cape Town)
• DL-in-a-box
  • NSDL support since 2001
  • Aimed to help new collections / services projects
  • http://dlbox.nudl.org
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
• Open Source: Greenstone, eprints, Kepler, DSpace, Fedora, ETD-DB, ODL
• Commercial: IBM Content Manager, VTLS’ VITAL
• Comparison: by capability - institutional repository, by environment (library, WWW, personal use)
• Evaluation, usability
5. Advanced Issues
Commercial DL Examples
• IBM Digital Library
• Virtua (www.vtls.com)
• Fedora -> VITAL
• Some systems from NSF DLI projects
• Google
Conceptual Category | Feature Name
Discovery Tools | Searching
Discovery Tools | Browsing
Discovery Tools | Syndication & Notification
Aggregation Tools | Personal Collections
Aggregation Tools | Content Aggregator and Packaging Tool
Community & Evaluation | Evaluation System
Community & Evaluation | Context Usage Illustrators
Community & Evaluation | Wish Lists
Source: WCET LOR Study, 2004
Case Study: NCSTRL Costs/Benefits
Stakeholders | Sample Potential Cost | Sample Potential Benefit
Providers: Faculty | Lower value for P&T | Faster publishing
Providers: Students | Less recognition | Broader set of outlets
Providers: Practitioners | Limited relevance | Ease of publishing, > quantity
Users: Faculty | Lower quality of work | Broader access to resources
Users: Students | Higher access costs (vs. department available material) | Lower access costs (vs. journal available material)
Users: Departments | New maintenance costs | Broader visibility
Users: University libraries | Additional access costs | Access to new resources
Users: Practitioners | More difficult access | Access to new resources
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
• Challenges, open problems
• Promising approaches
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
DL Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• Scalability, sustainability, interoperability
• DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ...
• Need tools & methods to make them easier to build
NDLTD: How can a university get involved?
• Select planning/implementation team
  • Graduate School
  • Library
  • Computing / Information Technology
  • Institutional Research / Educ. Tech.
• Join online, give us contact names
  • www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
• Build interest and consensus
• Start trial / allow optional submission
[Figure: ETD workflow. Student gets committee signatures and submits ETD; signed; Grad School; Library catalogs ETD, access is opened to the new research; WWW; NDLTD.]
[Figure: ETD Union Collection (OAI). Elements shown: Virginia Tech ETD Archive, Brazil ETD Archive, OCLC ETD Archive, …; Merged Metadata Collection; VIRTUA; ODL (VT); Future: recommender, … Legend: OAI Data Provider, OAI Service Provider, OAI Harvesting.]
Union catalog: OCLC
• OCLC will expand OAI data provider on TDs.
• Is getting data from WorldCat (so, from many sites!).
• Will harvest from all others who contact them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
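Harvesting a set of ETD records in Dublin Core, as described above, comes down to an OAI-PMH `ListRecords` request. The sketch below builds such a request URL and extracts identifiers and titles from a response. The base URL `http://example.org/oai`, the set name `ETD`, and the sample response are placeholders invented for illustration; they are not OCLC's actual endpoint or set name. The `verb`, `metadataPrefix`, and `set` parameters and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL, optionally limited to a set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        ident = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((ident, title))
    return pairs


# Made-up response fragment, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2004-11-12</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", set_spec="ETD")
records = parse_titles(SAMPLE)
```

A real harvester would also follow the `resumptionToken` element to page through large result lists; that step is omitted here to keep the sketch short.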
OCLC SRU Interface
Union catalog: VTLS, VT
• VTLS will enhance search/browse service for ETDs
• Will harvest from OCLC’s set of ETD records
• Will receive through other mechanisms
• Will work with MARC-21 and ETD-MS
• VT will continue to offer experimental services
ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)
VTLS Union Catalog Content Languages
The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish.
Examples follow
Language = German; hits = 137
Full record display
Complex to Simple
MARC ($50) Dublin Core (DC)
+thesis
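The move from a complex format to a simple one can be pictured as a crosswalk: many MARC fields collapse into a few Dublin Core elements, and thesis-specific elements are then added on top. The sketch below is a simplified assumption, not an official crosswalk: the four tag mappings are a small illustrative subset, the record is a plain dictionary rather than a real MARC record, and the `thesis.degree.*` names follow the ETD-MS element naming as assumed here.

```python
# Simplified, assumed crosswalk from MARC tags to Dublin Core elements.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "520": "description",  # summary / abstract
    "650": "subject",      # subject added entry, topical term
}


def marc_to_dc(marc_record):
    """Map a MARC-like dict (tag -> list of values) to a Dublin Core dict."""
    dc = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # fields without a mapping are dropped: complex to simple
        dc.setdefault(element, []).extend(values)
    return dc


def add_thesis(dc, degree_name, grantor):
    """Extend a Dublin Core dict with thesis-specific elements (DC + thesis)."""
    etd = dict(dc)
    etd["thesis.degree.name"] = [degree_name]
    etd["thesis.degree.grantor"] = [grantor]
    return etd


# Hypothetical record; the values are made up for illustration.
marc = {
    "100": ["Doe, Jane"],
    "245": ["A Hypothetical Dissertation"],
    "650": ["Digital libraries", "Metadata"],
    "999": ["local field with no Dublin Core equivalent"],
}
dc = marc_to_dc(marc)
etd = add_thesis(dc, "PhD", "Example University")
```

The dropped `999` field shows the cost of simplification: anything without a counterpart in the simpler scheme is lost in the conversion.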
Why ETD? Short Answer
• For Students:
  • Gain knowledge and skills for the Information Age
  • Richer communication (digital information, multimedia, …)
• For Universities:
  • Easy way to enter the digital library field and benefit thereby
• For the World:
  • Global digital library – large, useful, many services
• General:
  • Save time and money
  • Increased visibility for all associated with research results
ETANA-DL: 5S Extension
• 5S and component architecture to allow handling of very complex DL applications: archaeology
• Information visualization, clustering
• Mappings across streams, structure, spaces
Case Study (Archaeology): ETANA
• NSF ITR with CWRU (and Vanderbilt …)
• Faster DL development
  • for complex application domains,
  • with suitable tailoring
• Approach
  • ODL – pool of components
  • 5S – theory-based generation of systems
ETANA Website
Lahav Website
Megiddo Opening Screen
Locus Screen: Pictures
View all
Area Screen: Distribution of Artifacts
ETANA-DL Website
Archaeology DL – Approach
• Solve the following DL problems:
  • interoperability,
  • making primary data available,
  • data preservation
• Modeling archaeological information systems
  • using 5S theory to design system and services
• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks
ETANA-DL Schema Design
[Figure: schema tree for an ETANA-DL Object. Labels shown: Owner, Collection, Partition, Subpartition, Locus, Container, ID; object types Bone (Count, Animal, …), Seed (Species, Name, …), Figurine (Description, Dimensions, …).]
Data Mapping
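Data mapping here means translating each site's local records into one shared schema. A minimal sketch of that idea follows; the local column names, the target field names, and the sample records are hypothetical and are not taken from the actual site databases or the ETANA-DL schema.

```python
# Hypothetical per-site field mappings: local column name -> common field.
SITE_MAPPINGS = {
    "Lahav": {"locus_no": "locus", "bone_type": "animal", "qty": "count"},
    "Nimrin": {"Locus": "locus", "Taxon": "animal", "N": "count"},
}


def map_record(site, local_record):
    """Translate one site-specific record into the common schema."""
    mapping = SITE_MAPPINGS[site]
    common = {"owner": site}
    for local_name, value in local_record.items():
        if local_name in mapping:
            common[mapping[local_name]] = value
        # Unmapped local fields are ignored in this sketch.
    return common


# Made-up records showing two different local layouts for the same kind of find.
lahav_bone = map_record("Lahav", {"locus_no": "L101", "bone_type": "sheep", "qty": 4})
nimrin_bone = map_record("Nimrin", {"Locus": "N7", "Taxon": "goat", "N": 2, "Notes": "worn"})
```

Once both records share the same field names, a single union catalog can index and search them without knowing which site they came from.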
ETANA-DL Architecture
[Figure: architecture overview. Labels shown: Users, Services, Data, ETANA-DL, Union Services, DigBase, DigKit.]
ETANA-DL Architecture: DigBase and DigKit
[Figure: site databases (Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, new sites) connect through database wrappers to the ETANA-DL union catalog. The user interface offers Search, Browse, Recommend, Note, Personalize, Review, Visualizations, and archaeology-specific services (work in progress, …).]
ETANA-DL Architecture
[Figure: component view. Labels shown: Archaeological Site (DigBase DB, Data Mapping Component, OAI Data Provider, DigKit, Configure); OAI; ETANA-DL (Union Catalog, Inverted Files, Index, Search Component, Browse Component, Browse DB, Services DB, Other ETANA-DL Services, XOAI, Web Interface).]
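The inverted files behind the search component map each term to the records that contain it, so a query is answered by intersecting term postings. The sketch below shows that technique in its simplest form; the whitespace tokenizer, the AND-only query semantics, and the sample catalog entries are assumptions for illustration, not a description of the ETANA-DL search component.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Map each lowercase term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in text.lower().split():
            index[term].add(rec_id)
    return index


def search(index, query):
    """Return ids of records containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up union catalog entries keyed by a site-prefixed identifier.
catalog = {
    "lahav:1": "Iron Age bone fragment",
    "nimrin:7": "Bronze Age seed sample",
    "megiddo:3": "Iron Age figurine",
}
index = build_inverted_file(catalog)
iron_age = search(index, "iron age")
```

Building the index once and intersecting small posting sets at query time is what keeps search fast as the union catalog grows.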
Searching – Search Results
Searching – Advanced Search
Searching – Advanced Search Results
Summary
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources)
  • www.citidel.org
• NCSTRL (computing technical reports)
  • www.ncstrl.org
• NDLTD (electronic theses and dissertations worldwide)
  • www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library)
  • www.nsdl.org
• OAI (Open Archives Initiative)
  • www.openarchives.org
• Virginia Tech Digital Library Research Laboratory (DLRL, www.dlib.vt.edu)
  • 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION, ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …
Questions/Discussion?