digital libraries made easy 2004 samla convention roanoke, virginia november 12, 2004 edward a. fox...

Digital Libraries Made Easy

2004 SAMLA Convention
Roanoke, Virginia
November 12, 2004

Edward A. Fox
Digital Library Research Laboratory & Dept. of Computer Science, Virginia Tech, Blacksburg, VA 24061

fox@vt.edu http://fox.cs.vt.edu

http://fox.cs.vt.edu/talks/2004/

Acknowledgements (Selected)

• Sponsors: ACM, Adobe, AOL, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

• VT Faculty/Staff: Debra Dudley, Weiguo Fan, Gail McMillan, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

• VT Students: Yuxin Chen, Shahrooz Feizabadi, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Fernando Das Neves, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, …

• Leonid Kalinichenko: for advice shaping this tutorial

Other Collaborators (Selected)

• Brazil: FUA, UFMG, UNICAMP

• Case Western Reserve University

• Emory, Notre Dame, Oregon State

• Germany: Univ. Oldenburg

• Mexico: UDLA (Puebla), Monterrey

• College of NJ, Hofstra, Penn State, Villanova

• University of Arizona

• University of Florida, Univ. of Illinois

• University of Virginia

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

4. Software Solutions

5. Advanced Issues

For More Information

• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
  • MIT Press: Arms, plus related by Borgman, Licklider (1965)
  • Morgan Kaufmann: Witten... (several), Lesk (2nd edition soon)
• Conferences
  • ECDL: www.ecdl2005.org
  • ICADL: http://icadl2004.sjtu.edu.cn
  • JCDL: www.jcdl2005.org
• Associations
  • ASIS&T DL SIG
  • IEEE TCDL: www.ieee-tcdl.org (student awards, consortium)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/

[Figure: diagram relating Domain Concepts (theory), Modeling Language (Meta-Model), Model, DL Architecture, Running DL, Actors, and “real” world objects. Relationship labels: instance of, abstracted from, used to compose, represented by, interpreted as.]

Digital Libraries Shorten the Chain from

Editor

Publisher

A&I

Consolidator

Library

Reviewer

DLs Shorten the Chain to

Author

Reader

Digital Library

Editor

Reviewer

Teacher

Learner

Librarian

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Submission: http://etd.vt.edu

• Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

DLs: Why of Global Interest?

• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly

• Knowledge and information are essential to economic and technological growth, education

• DL - a domain for international collaboration
  • wherein all can contribute and benefit
  • which leverages investment in networking
  • which provides useful content on Internet & WWW
  • which will tie nations and peoples together more strongly and through deeper understanding

Libraries of the Future

J.C.R. Licklider, 1965, MIT Press

World

Nation

State

City

Community

5S Definition: Digital Libraries are complex systems that

• help satisfy info needs of users (societies)

• provide info services (scenarios)

• organize info in usable ways (structures)

• present info in usable ways (spaces)

• communicate info with users (streams)

Synchronous Scholarly Communication

Same time, Same or different place

Asynchronous, Digital Library Mediated Scholarly Communication

Different time and/or place

Locating Digital Libraries in Computing and Communications Technology Space

[Figure: axes labeled Computing (flops), Communications (bandwidth, connectivity), and Digital content, each running from less to more. Digital Libraries technology trajectory: intellectual access to globally distributed information.]

Note: we should consider 4 dimensions: computing, communications, content, and community (people)

Digital Library Content

Content Types:

• Articles, Reports, Books
• Text Documents
• Speech, Music
• Video, Audio
• (Aerial) Photos
• Geographic Information
• Models, Simulations
• Software, Programs
• Genome: Human, animal, plant
• Bio Information
• 2D, 3D, VR, CAT
• Images and Graphics

AmericanSouth.Org – Roles, Content

SOLINET, Libraries (Data Providers), Scholars

Intellectual Organization Controlled vocabulary Metadata extension development

Collection Decisions Selection Criteria

Selection Criteria Controlled vocabulary

Central Server Maintenance Local Server Maintenance Provision of Context

Metadata Repository Metadata Creation/Maintenance Organizational Structure and Annotation Tools

Central Interface Design/Maintenance Local Interface Design/Maintenance Selection of Other Annotation Tools

Central Indices Creation/Maintenance Local Indices Selection of Thesauri

Coordination of Metadata Gateway Development Gateway Implementation Concept Mapping

Digital Objects

Columns: Content Area Description; Audio; Digital; Finding Aid; MSS; Other; Photo; Video; MF; Print; Total
(rows with fewer than ten values had empty cells in the original table)

African-American cultural life 6 4 6 9 4 12 3 10 18 72
Agricultural crisis of late 19th century 1 1 3 1 1 4 8 19
Codification of segregation laws 1 3 2 1 1 8 16
Configuration of white supremacy 1 3 3 3 1 9 20
Cultural values and activities 3 1 5 17 4 15 1 5 20 71
Disenfranchising movements 1 2 2 1 2 1 6 15
Educational movements 6 1 1 18 6 21 3 5 27 98
Emergence of Holiness & Pentecostal Groups 1 1 1 7 10
Emergence of new musical forms 3 1 1 1 2 8
Emergence of organized groups expressing farmers concerns 2 2 1 8 13
…
Total Each Format 41 14 51 161 38 133 13 79 301 831

Outline

1. Introduction

2. Historical Perspective

• Computing-related (ACM-DL, CSTC, CITIDEL), NSDL

• DLI, Workshop Results: Chatham

3. Topical Perspective

4. Software Solutions

5. Advanced Issues

CS -> CSTC -> CRIM

• NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/

• College of NJ, U. Ill. Springfield, Virginia Tech

• Focus initially on labs, visualization, multimedia

• Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)

CS Teaching Center (CSTC)

• Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units.

• Learners benefit from having well-crafted modules that have been reviewed and tested.

• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.

• ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org

Browsing (1)

Browsing (2)

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)

• Domain: computing / information technology

• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …

• Submission & Collection: sub/partner collections www.citidel.org

www.CITIDEL.org

• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)

• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)

Multi-dimensional Categorization

• Language: English, Spanish
• Topic: Java, Multimedia, Algorithms
• Quality: Nominated, Editor reviewed, Identified by crawl, Peer reviewed
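As a rough illustration of multi-dimensional categorization, the sketch below tags each resource along the three dimensions named on the slide (language, topic, quality) and selects resources by any combination of them. The `Resource` class, the `select` function, and the sample records are hypothetical and invented for this example; they are not taken from CITIDEL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One catalogued resource, tagged along three independent dimensions."""
    title: str
    language: str   # e.g. "English", "Spanish"
    topic: str      # e.g. "Java", "Multimedia", "Algorithms"
    quality: str    # e.g. "Peer reviewed", "Identified by crawl"


def select(resources, **facets):
    """Return the resources whose attributes match every requested facet."""
    return [r for r in resources
            if all(getattr(r, name) == value for name, value in facets.items())]


# Hypothetical sample records, invented for illustration only.
SAMPLE = [
    Resource("Sorting visualizer", "English", "Algorithms", "Peer reviewed"),
    Resource("Applet de listas", "Spanish", "Java", "Editor reviewed"),
    Resource("Video codec lab", "English", "Multimedia", "Identified by crawl"),
    Resource("Graph search notes", "English", "Algorithms", "Nominated"),
]

hits = select(SAMPLE, language="English", topic="Algorithms")
print([r.title for r in hits])
```

Because each dimension is independent, a user can narrow by one facet, several, or none; calling `select(SAMPLE)` with no facets returns every resource.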

Overview of CITIDEL architecture

[Figure: digital library architecture for local and interoperable CITIDEL services, in three layers.]

• User portals: Educators, Administrators, Learners
• Digital library services: Multilingual Searching, Revising, Annotating, Filtering, Browsing, Administering
• Repositories: Annotations, Filtering Profiles, User Profiles, Union Metadata
• OAI Data Harvester and OAI Data Provider: links to remote and peer digital libraries (e.g., NSDL-CIS)

CITIDEL Technology Features

• Component architecture (Open Digital Library)
  • Re-use and compose re-deployable digital library components.
• Built using open standards & technologies
  • OAI: used to collect DL resources and for DL interoperability
  • XSL and XML: interface rendering with multi-lingual, community-based translation of screens and content (Spanish, …)
  • Perl: component integration
• ESSEX: search engine functionality
  • Very fast, utilizing in-memory processing
  • Includes snapshots for persistence
• Multi-scheming
  • Integrates multiple classifications / views through maps, closure

Cluster Search Results from CITIDEL

Cluster NDLTD-Computing

CITIDEL -> NSDL

• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL

• National Science Digital Library

• www.nsdl.org

• (Next slides courtesy Lee Zia, NSF)

Supports:

• Users (profiles)
• Content (metadata)
• Tools (protocols)
• Learning communities
• Customizable collections
• Application services

Enables: Environments for

• Communication
• Collaboration
• Creation
• Validation
• Evaluation
• Recognition
• ...

AND

• Discovery
• Stability
• Reliability
• Reusability
• Interoperability
• Customizability
• ...

of Resources

NSDL Program Tracks

• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources

• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty

• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form

• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks

• Pathways: large efforts across broad ranges of areas or approaches or users

NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup

[Figure: a core NSDL “bus” with three labeled groups: Collection Building, Usage Enhancement, and User Interfaces. Elements shown: NSDL collections; referenced items & collections; special databases; core collection-building services (harvesting, protocols); core services (information retrieval, metadata gathering); CI services (annotation, discussion, personalization, authentication, browsing); other NSDL services; portals & clients.]

Outline

1. Introduction

2. Historical Perspective

• Computing-related (ACM-DL, CSTC, CITIDEL), NSDL

• DLI, Workshop Results

3. Topical Perspective

4. Software Solutions

5. Advanced Issues

Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/

Information Life Cycle

[Figure: life-cycle diagram. Phases: Creation, Distribution, Seeking, Utilization. Activities: authoring, modifying, describing, organizing, indexing, storing, archiving, retrieving, distributing, networking, accessing, filtering, searching, browsing, recommending, using, retention / mining, discard. Object states: Active, Semi-Active, Inactive. Quality dimensions: Accuracy, Completeness, Conformance, Timeliness, Accessibility, Preservability, Pertinence, Relevance, Significance, Similarity, Believability.]

Benefits

• Ease of use

• Effectiveness

• “The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” - IITA Workshop report

Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact

Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization

Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data

Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding

Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance

(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security

(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development

History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding

Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness

Reagan Moore, Ed Fox, June 2002, for NSF

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

• Key concepts: do, mdo, coll, catalog, repository, service, archive, DL

• Interoperability: federated search, harvesting, OAI

• Architecture: distributed, clusters, LOCKSS

• Digitization, preservation

4. Software Solutions

5. Advanced Issues

Outside Key Set, but Important: Interfaces

• 5S perspective: spaces, scenarios

• Taxonomy of interface components

• Workflow

• Visualization

• Environments

• Design

• Usability testing

Also Important: Epub, SGML, XML

• 5S perspective: streams, structures, scenarios

• Authoring

• Rendering, presenting

• Tagging, Markup, DOM

• Semi-structured information

• Dual-publishing, eBooks

• Styles (XSL, XSLT)

• Structure queries

Also Important: Databases

• 5S perspective: structures, streams, scenarios

• Extending database technology

• Structured and unstructured info

• Multimedia databases

• Link databases

• Performance

• Replicated storage, I2-DSI (details following)

Also Important: Agents

• 5S perspective: societies, streams, spaces, scenarios, structures

• Protocols

• Knowledge interchange

• Negotiation, registries

• Distributed issues

• Webbots (automatic indexing)

• Ontologies (standard upper)

Also Important: Economics

• 5S perspective: societies, scenarios

• E-commerce

• Sustainability

• Preservation and archiving: DLF, Besser, Lorie, Gladney

• Self-archiving

• Open collections

• Economic models, business plans

Also Important: IPR

• 5S perspective: societies, scenarios

• Intellectual property rights

• Legal issues

• Terms and conditions

• Copyright

• Patents, trademarks

• Distributed rights management

• Security

Also Important: Social Issues

• 5S perspective: societies, scenarios

• Cooperation, collaboration

• Annotation, ratings

• Digital divide

• Educational applications

• Cultural heritage

• Museums (AMICO)

• Organizational acceptance

• Personalization

• Internationalization

What is Key depends on your DL Definition

• Library ++ (library+archive+museum+…)

• Distributed information system + organization + effective interface

• User community + collection + services

• Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation

Our Perspective on Key Concepts

• Recall the 5S approach
  • Minimal digital library
  • Metamodel for minimal digital library
  • Metamodel for “born digital standard” DLs
  • Metamodel for architectural DL

• Here, focus on key concepts in minimal DL

Digital Objects (DOs)

• Born digital

• Digitized version of “real” object
  • Is the DO version the same, better, or worse?
  • Decision for ETDs: structured + rendered

• Surrogate for “real” object
  • Not covered explicitly in metamodel for a minimal DL
  • Crucial in metamodel for archaeology DL

Metadata Objects (MDOs)

• MARC

• Dublin Core

• RDF

• IMS

• OAI (Open Archives Initiative)

• Crosswalks, mappings

• Ontologies

• Topic maps, concept maps

Other Key Definitions

• coll, catalog, repository, service, archive, (minimal) DL

• See Gonçalves et al. in April 2004 ACM Transactions on Information Systems (TOIS)
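To make the key terms above more concrete, here is a minimal sketch of how a digital object, a collection, a metadata catalog, a repository, and indexing, searching, and browsing services might fit together. It is an informal illustration only, not the formal definitions from the TOIS article: every class, field, and method name is hypothetical, the repository is simplified to hold a single collection, and the sample records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str     # unique identifier for the object
    streams: dict   # stream name -> content, e.g. {"text": "..."}


@dataclass
class Repository:
    """Stores one collection of digital objects plus its metadata catalog."""
    collection: dict = field(default_factory=dict)  # handle -> DigitalObject
    catalog: dict = field(default_factory=dict)     # handle -> metadata dict

    def submit(self, obj, metadata):
        self.collection[obj.handle] = obj
        self.catalog[obj.handle] = metadata


class MinimalDL:
    """A repository with indexing, searching, and browsing services."""

    def __init__(self, repository):
        self.repository = repository
        self.index = {}  # term -> set of handles

    def index_all(self):
        """Indexing service: build a term index over the text streams."""
        self.index.clear()
        for handle, obj in self.repository.collection.items():
            for term in obj.streams.get("text", "").lower().split():
                self.index.setdefault(term, set()).add(handle)

    def search(self, term):
        """Searching service: handles of objects containing the term."""
        return sorted(self.index.get(term.lower(), set()))

    def browse(self, field_name):
        """Browsing service: group handles by one metadata field."""
        groups = {}
        for handle, md in self.repository.catalog.items():
            groups.setdefault(md.get(field_name, "unknown"), []).append(handle)
        return groups


# Hypothetical sample content, invented for illustration only.
repo = Repository()
repo.submit(DigitalObject("etd-1", {"text": "Digital library interoperability"}),
            {"creator": "A. Student", "type": "thesis"})
repo.submit(DigitalObject("etd-2", {"text": "Cluster scaling for library services"}),
            {"creator": "B. Student", "type": "dissertation"})
dl = MinimalDL(repo)
dl.index_all()
print(dl.search("library"), dl.browse("type"))
```

The point of the sketch is the separation of concerns: content lives in digital objects, descriptions live in the catalog, and services are built on top of both.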

5S: concepts underlying the minimal digital library (numbers as shown in the figure)

• 5S constructs: streams (1), structures (2), spaces (4), scenarios (7), societies (10)
• Mathematical foundations: function (A.2), sequence (A.3), graph (A.6), measurable (A.10), measure (A.11), probability (A.12), vector (A.13), topological (A.14) spaces
• Built on these: state (5), event (6), services (8), structural metadata specification (11), descriptive metadata specification (12), structured stream (15), digital object (16), collection (17), metadata catalog (18), repository (19), indexing service (20), searching service (21), hypertext (22), browsing service (23), digital library (minimal) (24)

[Figure: the 5S concept map extended for an archaeological DL. General concepts: Streams, Structures, Spaces, Scenarios, Societies, structured stream, descriptive metadata specification, hypertext, services, indexing, browsing, searching. Archaeology-specific concepts: SpaTemOrg, StraDia, ArchObj, ArchColl, ArchDO, ArchDColl, ArchDR, Arch descriptive metadata specification, Arch metadata catalog, Minimal ArchDL.]

[Figure: 5S metamodel diagram.]

• Streams: text, audio, image, video
• Structures: digital object, collection, catalog, repository, index, metadata specifications
• Spaces: measurable, measure, probabilistic, metric, vector, topological
• Scenarios: service, scenario, event, operation
• Societies: actor, service manager
• Relationship labels: describes, stores, contains, is_a, is_version_of / cites / links_to, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from / includes, association, uses, employs, produces

Models | Examples | Objectives

Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data

Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content

Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components

Scenarios | Searching, browsing, recommending | Details the behavior of DL services

Societies | Service managers, learners, teachers, etc. | Defines managers responsible for running DL services; actors that use those; and relationships among them

Services

• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
• Infrastructure Services
  • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
  • Repository-Building
    • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
    • Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting

[Figure: composition of digital library services, each marked fundamental or composite. Information Satisfaction Services: Searching, Browsing, Recommending, Filtering, Binding, Visualizing, Expanding query, Requesting. Infrastructure Services (Add_Value): Rating, Training, Indexing. Inputs and outputs shown: query, anchor, actor (Society), Collection, {digital object}, user model, query/category, space, query’, handle, binder, classifier, index, transformer, {(digital object, actor, rate)}. Edges are labeled p and e.]

5SLGen: Automatic DL Generation

[Figure: development stages Requirements (1), Analysis (2), Design (3), Implementation (4), Test, with associated items 5S, 5SL, OO Classes, Workflow, Components, DL Evaluation, 5SGraph, 5SLGen, Formal Theory/Metamodel, DL XML Log. Generation pipeline elements: DL Expert, 5S Meta Model, 5SLGraph, DL Designer, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services; users: Practitioner, Researcher, Teacher.]

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

• Key concepts: do, mdo, coll, catalog, repository, service, archive, DL

• Interoperability: federated search, harvesting, OAI

• Architecture: distributed, clusters, LOCKSS

• Digitization, preservation

4. Software Solutions

5. Advanced Issues

Interoperability through Standards

• Protocols/federation
  • Z39.50, CIMI
  • Dienst, NCSTRL
  • OAI protocol

• Metadata
  • TEI: inline, detailed (structure in stream)
  • MARC: two-level, fine-grained
  • Dublin Core: high-level, 15 elements
  • RDF: describing resources/collections, annotation
  • OAMS -> DC and others used in OAI
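The contrast between fine-grained MARC and high-level Dublin Core can be illustrated with a small crosswalk sketch. The 15 element names are those of the Dublin Core Metadata Element Set. The tag-to-element mapping shown is a deliberately simplified, field-level assumption made for illustration (real crosswalks work at the subfield level and cover many more tags), and the function name and sample record are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Simplified, field-level crosswalk for a few common MARC tags.
# Assumption for illustration: subfields and indicators are ignored.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "520": "description",  # summary note
    "650": "subject",      # topical subject heading
}


def marc_to_dc(marc_fields):
    """Map (tag, value) pairs to a dict of DC element -> list of values."""
    record = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is not None:  # tags without a mapping are dropped
            record.setdefault(element, []).append(value)
    return record


# Hypothetical record, invented for illustration only.
sample = [
    ("100", "Doe, Jane"),
    ("245", "A hypothetical thesis"),
    ("650", "Digital libraries"),
    ("650", "Metadata"),
    ("999", "local note"),
]
dc = marc_to_dc(sample)
print(dc)
```

The sketch also shows the usual cost of a crosswalk to a simpler scheme: detail with no counterpart (here the local `999` field) is lost, while every Dublin Core element remains optional and repeatable.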

Interoperability and IR

• Information storage and retrieval

• Search, Retrieval, Resource Discovery

• Boolean vs. natural language

• Search engines

• Indexing, phrases, thesauri, concepts

• Federated search and harvesting, OAI

• Integrating links and ratings

• Crawlers, spiders, metasearch, fusion

Open Archives Initiative (OAI)

• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
• Protocol for Metadata Harvesting (PMH)
  • Simplicity
  • Generality
  • Extensibility
• Support for PMH => Open Archive (OA)
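A harvester's side of the PMH can be sketched in a few lines: build a `ListRecords` request URL, then pull identifiers and Dublin Core titles out of the XML response. The verb, the `metadataPrefix` and `set` arguments, and the XML namespaces are those of OAI-PMH 2.0; the base URL, the function names, and the sample response are hypothetical, and the sketch parses a canned string rather than contacting a server. Resumption tokens and error responses are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request against an open archive."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, [titles]) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(OAI + "header/" + OAI + "identifier")
        titles = [t.text for t in record.iter(DC + "title")]
        results.append((identifier, titles))
    return results


# Hypothetical response, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("http://example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

That a usable client fits in this little code is a fair illustration of the simplicity goal: requests are plain HTTP with a handful of arguments, and responses are XML.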

OAI = Technical Umbrella for Practical Interoperability…

…that can be exploited by different communities: Reference Libraries, Publishers, E-Print Archives, Museums

OAI – Repository Perspective

Required: Protocol

[Figure: a repository holding digital objects (DOs) and metadata objects (MDOs).]

OAI – Black Box Perspective

[Figure: open archives OA 1 through OA 7.]

Tiered Model of Interoperability

• Mediator services
• Metadata harvesting
• Document models

The World According to OAI

[Figure: Service Providers (Discovery, Current Awareness, Preservation) and Data Providers, linked by metadata harvesting.]

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

• Key concepts: do, mdo, coll, catalog, repository, service, archive, DL

• Interoperability: federated search, harvesting, OAI

• Architecture: distributed, clusters, LOCKSS

• Digitization, preservation

4. Software Solutions

5. Advanced Issues

Architectural Issues

• Internet middleware
• Independent system / part of federation
• Decompositions vary
  • search engine, browser, DBMS, MM support
  • repository, handle server, client
  • information resources + mediators, bus or agent collection + client with workspace/environment
• Metrics: e.g., for federated search

Clusters

• How can computer clusters scale with collections and user communities to achieve cost-effective solutions for DLs?

• Paul Mather dissertation by early 2005
  • Modeling and simulation
  • Cluster size
  • Communication fabric and patterns
  • Disks and nodes
  • Characterize DL collections: file sizes
  • Characterize user workload: logs
  • Special considerations:
    • Linear hashing of names
    • Replication of popular objects
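The slide mentions linear hashing of names without detail. As one possible reading, the sketch below applies textbook linear hashing to object names so that storage can grow one bucket (for example, one cluster node) at a time while moving only the names in the bucket being split. The class and method names are hypothetical, CRC32 is assumed as the hash function because it is stable across runs, and nothing here is taken from the dissertation itself.

```python
import zlib


class LinearHashPlacement:
    """Maps object names to buckets (e.g. cluster nodes) via linear hashing."""

    def __init__(self, initial_buckets=2):
        self.n0 = initial_buckets
        self.level = 0   # number of completed doubling rounds
        self.split = 0   # next bucket to be split in the current round
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, name):
        h = zlib.crc32(name.encode("utf-8"))
        addr = h % (self.n0 * 2 ** self.level)
        if addr < self.split:
            # This bucket was already split in the current round,
            # so the finer hash of the next level applies.
            addr = h % (self.n0 * 2 ** (self.level + 1))
        return addr

    def insert(self, name):
        self.buckets[self._address(name)].append(name)

    def lookup(self, name):
        """Return the bucket number holding the name, or None if absent."""
        addr = self._address(name)
        return addr if name in self.buckets[addr] else None

    def grow(self):
        """Add one bucket by splitting the bucket at the split pointer."""
        moved = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.level += 1   # round complete: table size has doubled
            self.split = 0
        for name in moved:    # only the split bucket's names are rehashed
            self.insert(name)


placement = LinearHashPlacement(initial_buckets=2)
names = [f"etd-{i}" for i in range(100)]
for n in names:
    placement.insert(n)
for _ in range(5):
    placement.grow()
print(len(placement.buckets), [len(b) for b in placement.buckets])
```

Replication of popular objects, the other consideration on the slide, is not modeled here.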

LOCKSS

• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
  • Initial focus on lower levels
  • Initial content: journals
• Emory (Martin Halbert)
  • Help deploy and adapt
  • Help apply in other contexts
• Another registry
  • Set of publisher manifests (information providers)
  • Set of storage systems (archival storage)
• NDIIPP: AmericanSouth, MetaArchive
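The idea that lots of copies keep stuff safe can be sketched as a simple audit: compare content hashes across sites, take the majority as correct, and repair the outliers from a good copy. This is a loose illustration under stated assumptions, not the LOCKSS polling protocol, which is considerably more elaborate; the function names and the sample sites are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so copies can be compared without shipping them."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies):
    """Repair minority copies in place.

    copies: dict mapping site name -> bytes held at that site.
    Returns the list of sites whose copy was replaced.
    """
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        raise ValueError("no majority; cannot decide which copy is intact")
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for site, content in list(copies.items()):
        if digest(content) != winner:
            copies[site] = good
            repaired.append(site)
    return repaired


# Hypothetical sites and content, invented for illustration only.
copies = {
    "site-a": b"journal issue 12",
    "site-b": b"journal issue 12",
    "site-c": b"journal issue 1?",   # damaged copy
}
fixed = audit_and_repair(copies)
print(fixed)
```

The more independent copies there are, the less likely it is that damage or tampering reaches a majority of them at once.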

OCKHAM Library Network

[Figure: the OCKHAM Library Network shown with NSDL, NSDL Services, OCKHAM Services, and Library Services, serving Teachers, Learners, and Librarians.]

OCKHAM

• Simplicity (à la Occam’s razor)

• Support by Mellon and DLF

• Four main ideas:

1. Components

2. Lightweight protocols

3. Open reference models (e.g., 5S, OAIS)

4. Community perspective and involvement

• Funded by NSF in NSDL, with P2P

Lightweight Protocols

• “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive.

• The success of protocols considered lightweight is illuminating.

• Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

Reference Models

• Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration

• Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement

OCKHAM Proposed Services

• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

• Key concepts: do, mdo, coll, catalog, repository, service, archive, DL

• Interoperability: federated search, harvesting, OAI

• Architecture: distributed, clusters, LOCKSS

• Digitization, preservation

4. Software Solutions

5. Advanced Issues

Digitization and Preservation Community and Activity (selected)

• Archivists worldwide
• International collaboration
  • Million book project in US, China, India (Reddy, Chen, Balakrishnan)
• US Library of Congress
  • Matching funds
  • American Memory
  • Infrastructure: NDIIPP
• Dutch National Library + IBM
• Associations: ARL, DLF
• People
  • Harnad: Self-archiving movement
  • Lorie: Universal virtual computer
  • Gladney: technology, philosophy (http://home.pacbell.net/hgladney/ddq_3_1.htm)
  • Besser, Trant, …

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

4. Software Solutions

• Open Source: Greenstone, eprints, Kepler, DSpace, Fedora, ETD-DB, ODL

• Commercial: IBM Content Manager, VTLS’ VITAL

• Comparison: by capability - institutional repository, by environment (library, WWW, personal use)

• Evaluation, usability

5. Advanced Issues

Open Source DL Examples

• Eprints (www.eprints.org)

• Fedora

• Greenstone (www.greenstone.org)

• Many systems in NSF DLI projects

• VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD

What is a Digital Object Repository?

Also called: digital rep., digital asset rep., institutional repository

• Stores and maintains digital objects (assets)
• Provides external interface for Digital Objects
  • Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations

Adapted from Slide by V. Chachra, VTLS

Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)

• Self Archiving of Institutional Research
• Theses and Dissertations (VTLS NDLTD Project)
• Article preprints and post prints
• Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects

Adapted from Slide by V. Chachra, VTLS

Fedora™ Digital Object Architecture

• Persistent ID (PID): globally unique persistent id
• Disseminators: public view; access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view; metadata necessary to manage the object
• Datastreams: protected view; content that makes up the “basis” of the object
  • EAD, TEI, DC, MARC, VRA Core, MIX, etc.
  • Images, E-books, E-journals, Music, Video, etc.

The Mellon Fedora Project

Adapted from Slide by V. Chachra, VTLS
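The four parts of the digital object architecture (persistent ID, disseminators, system metadata, datastreams) can be mirrored in a small data structure. This is a conceptual sketch only: it is not the Fedora API, and the class, the method names, the PID, and the sample datastreams are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    pid: str                                              # globally unique persistent id
    system_metadata: dict = field(default_factory=dict)   # internal view
    datastreams: dict = field(default_factory=dict)       # protected view: name -> bytes
    disseminators: dict = field(default_factory=dict)     # public view: name -> function

    def disseminate(self, method):
        """Public access path: run a named access method over the object."""
        return self.disseminators[method](self)


# Hypothetical object, invented for illustration only.
obj = DigitalObject(
    pid="demo:1",
    system_metadata={"state": "active"},
    datastreams={"DC": b"<dc/>", "TEXT": b"hello world"},
)
obj.disseminators["get_text"] = lambda o: o.datastreams["TEXT"].decode("utf-8")
obj.disseminators["list_streams"] = lambda o: sorted(o.datastreams)

print(obj.disseminate("get_text"), obj.disseminate("list_streams"))
```

Clients in this sketch never touch `datastreams` directly; they ask for a dissemination by name, which is the separation between public and protected views that the slide describes.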

Fedora™ Repository

[Figure: repository architecture.]

• Clients: Client Application, Batch Program, Server Application, Web Browser (HTTP, SOAP)
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policy Enforcement, Policy Mgmt, Policies, Users/Groups
• Storage Subsystem: Digital Objects, Datastreams, Content, XML Files, Relational DB
• External Content Retriever: External Content Sources (HTTP, FTP)
• Local Service, Remote Service

Adapted from Slide by V. Chachra, VTLS

[Figure sequence: each panel shows users on one side and digital objects (programs, documents, images, videos) on the other.]

1. users, ?, digital objects
2. digital library: monolithic and/or custom-built web-based application
3. componentized digital library (connections between components shown as question marks)
4. open digital library (components shown as OA, connected by PMH and XPMH)

Legend:

• Open Digital Library Protocol = Extended OAI-PMH (XPMH)
• PMH = Protocol for Metadata Harvesting
• Open Digital Library Component = Extended Open Archive
• OA = Open Archive

Open Digital Library Deployments

• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box

Open Digital Library

• Network of Extended Open Archives, where each node acts as a provider of data, services, or both.

• Component = Node

• Protocol = Arc
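The node-and-arc view on this slide can be sketched as a small graph model. This is only an illustration under stated assumptions, not code from the ODL project: the `Component` class, the `harvest_from` method, and the component names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """A node: an extended open archive providing data, services, or both."""
    name: str
    provides_data: bool = False
    provides_service: bool = False
    sources: list[Component] = field(default_factory=list)

    def harvest_from(self, other: Component) -> None:
        """Add an arc: this component harvests from `other` over the protocol."""
        self.sources.append(other)


def upstream(component: Component) -> list[str]:
    """Names of every component reachable by following arcs (depth-first)."""
    seen: list[str] = []
    stack = list(component.sources)
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.append(node.name)
            stack.extend(node.sources)
    return seen


# Hypothetical network: two archives feed a union, which feeds a search service.
archive_a = Component("archive-a", provides_data=True)
archive_b = Component("archive-b", provides_data=True)
union = Component("union", provides_data=True, provides_service=True)
search = Component("search", provides_service=True)
union.harvest_from(archive_a)
union.harvest_from(archive_b)
search.harvest_from(union)
```

Here `upstream(search)` lists the union component and both archives, which is the sense in which a service node depends on the data nodes behind it.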

Open Digital Library Components

• Running now:

• XML-File (data provider from file system)

• Search: simple or in-memory (Essex) or generalized

• Union, browse, recent, filter

• E-journal/review, Submit, Edit, Annotation

• Recommender, Rating; Mirroring (see JCDL’02)

• Working with NCSA: from DB, unstructured text

• Others in process:

• Classification/categorization

• Registry (and other connections with web services)
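To make the role of the union, filter, and recent components more concrete, here is a minimal sketch in which plain function calls stand in for the extended OAI-PMH exchanges between components. The record fields (`identifier`, `datestamp`, `language`) and the sample data are assumptions for illustration, not the actual ODL interfaces.

```python
from typing import Callable, Iterable

Record = dict[str, str]


def union(*sources: Iterable[Record]) -> list[Record]:
    """Merge records from several providers, keeping the first copy of each identifier."""
    merged: dict[str, Record] = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["identifier"], record)
    return list(merged.values())


def filter_records(records: Iterable[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Pass through only the records accepted by the predicate."""
    return [record for record in records if keep(record)]


def recent(records: Iterable[Record], n: int) -> list[Record]:
    """Return the n newest records, ordered by ISO 8601 datestamp."""
    return sorted(records, key=lambda r: r["datestamp"], reverse=True)[:n]


# Hypothetical records from two data providers; "a:2" appears in both.
site_a = [
    {"identifier": "a:1", "datestamp": "2004-01-05", "language": "en"},
    {"identifier": "a:2", "datestamp": "2004-03-01", "language": "de"},
]
site_b = [
    {"identifier": "a:2", "datestamp": "2004-03-01", "language": "de"},
    {"identifier": "b:1", "datestamp": "2004-02-10", "language": "en"},
]

merged = union(site_a, site_b)
english = filter_records(merged, lambda r: r["language"] == "en")
newest = recent(merged, 1)
```

Because each function takes and returns a list of records, the components can be chained in any order, which is the property the component list above relies on.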

Example Open Digital Library

[Figure: ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org). Labels: ETD collections ETD-1 to ETD-4 among program, document, image, and video objects; PMH links; components ODL-Union, ODL-Search, ODL-Browse, ODL-Recent, and Filter; a user interface for students and researchers.]

OAI, ODL, DL-in-a-box

• Open Archives Initiative

• since 1999, www.openarchives.org

• Open Digital Libraries

• since 2001, from www.dlib.vt.edu

• with Hussein Suleman (now U. Cape Town)

• DL-in-a-box

• NSDL support since 2001

• Aimed to help new collections / services projects

• http://dlbox.nudl.org

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

4. Software Solutions

• Open Source: Greenstone, eprints, Kepler, DSpace, Fedora, ETD-DB, ODL

• Commercial: IBM Content Manager, VTLS’ VITAL

• Comparison: by capability - institutional repository, by environment (library, WWW, personal use)

• Evaluation, usability

5. Advanced Issues

Commercial DL Examples

• IBM Digital Library

• Virtua (www.vtls.com)

• Fedora -> VITAL

• Some systems from NSF DLI projects

• Google

Conceptual Category | Feature Name
Discovery Tools | Searching
Discovery Tools | Browsing
Discovery Tools | Syndication & Notification
Aggregation Tools | Personal Collections
Aggregation Tools | Content Aggregator and Packaging Tool
Community & Evaluation | Evaluation System
Community & Evaluation | Context Usage Illustrators
Community & Evaluation | Wish Lists

(WCET LOR Study, 2004)

Case Study: NCSTRL Costs/Benefits

Stakeholders | Sample Potential Cost | Sample Potential Benefit
Providers: Faculty | Lower value for P&T | Faster publishing
Providers: Students | Less recognition | Broader set of outlets
Providers: Practitioners | Limited relevance | Ease of publishing, > quantity
Users: Faculty | Lower quality of work | Broader access to resources
Users: Students | Higher access costs (vs. department available material) | Lower access costs (vs. journal available material)
Users: Departments | New maintenance costs | Broader visibility
Users: University libraries | Additional access costs | Access to new resources
Users: Practitioners | More difficult access | Access to new resources

Outline

1. Introduction

2. Historical Perspective

3. Topical Perspective

4. Software Solutions

5. Advanced Issues

• Challenges, open problems

• Promising approaches

Digital Libraries --- Objectives

• World Lit.: 24hr / 7day / from desktop

• Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies

• Ubiquitous, Higher Quality, Lower Cost

• Education, Knowledge Sharing, Discovery

• Disintermediation -> Collaboration

• Universities Reclaim Property

• Interactive Courseware, Student Works

• Scalable, Sustainable, Usable, Useful

DL Challenges

• Preservation - so people will trust DLs

• Supporting infrastructure - networks, ...

• Scalability, sustainability, interoperability

• DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ...

• Need tools & methods to make them easier to build

NDLTD: How can a university get involved?

• Select planning/implementation team

• Graduate School

• Library

• Computing / Information Technology

• Institutional Research / Educ. Tech.

• Join online, give us contact names

• www.ndltd.org/join

• Adapt Virginia Tech or other proven approach

• Build interest and consensus

• Start trial / allow optional submission

[Figure: ETD submission workflow. Student gets committee signatures and submits ETD -> signed, Grad School -> Library catalogs ETD, access is opened to the new research -> WWW, NDLTD]

ETD Union Collection (OAI)

[Figure: labels include VIRTUA, Merged Metadata Collection, ODL (VT), Virginia Tech ETD Archive, Brazil ETD Archive, OCLC ETD Archive, …, and "Future: recommender, …". Legend: OAI Data Provider, OAI Service Provider, OAI Harvesting.]

Union catalog: OCLC

• OCLC will expand OAI data provider on TDs.

• Is getting data from WorldCat (so, from many sites!).

• Will harvest from all others who contact them.

• Need DC and either ETD-MS or MARC.

• Has a set for ETDs.
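As a rough illustration of how a harvester might ask a data provider for the records in an ETD set, the sketch below builds an OAI-PMH `ListRecords` request URL. The verb and argument names (`metadataPrefix`, `set`, `from`, `resumptionToken`) come from the OAI-PMH specification, and `oai_dc` is the Dublin Core prefix every repository must support. The base URL and the set name `etd` are placeholders, not OCLC's actual values.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it replaces all the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        if from_date is not None:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address and set name.
first_page = list_records_url("http://example.org/oai", set_spec="etd")
next_page = list_records_url("http://example.org/oai", resumption_token="token-1")
```

A harvester would fetch `first_page`, read any resumption token from the response, and repeat with that token until the provider returns none. Prefixes for richer formats such as ETD-MS or MARC are repository-specific and would be discovered with the `ListMetadataFormats` verb.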

OCLC SRU Interface

Union catalog: VTLS, VT

• VTLS will enhance search/browse service for ETDs

• Will harvest from OCLC’s set of ETD records

• Will receive through other mechanisms

• Will work with MARC-21 and ETD-MS

• VT will continue to offer experimental services

ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

VTLS Union Catalog Content Languages

The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish.

Examples follow

Language = German; hits = 137

Full record display

Complex to Simple

MARC ($50) Dublin Core (DC)

+thesis
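One way to picture the move from a complex format to a simple one is a field-level crosswalk. The sketch below maps a handful of common MARC fields to Dublin Core elements. It is a simplified illustration, not the full official crosswalk and not the mapping used by any NDLTD service; the input layout (a list of tag, subfield, value triples) and the sample record are assumptions.

```python
# A few well-known MARC (tag, subfield) pairs and the Dublin Core element
# each is commonly mapped to. Deliberately incomplete.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def marc_to_dc(fields: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Convert (tag, subfield, value) triples to repeatable Dublin Core elements."""
    dc: dict[str, list[str]] = {}
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:  # fields without a mapping are dropped
            dc.setdefault(element, []).append(value)
    return dc


# Hypothetical thesis record.
record = [
    ("100", "a", "Doe, Jane"),
    ("245", "a", "A sample thesis title"),
    ("260", "c", "2004"),
    ("502", "a", "Thesis (Ph.D.)"),
    ("650", "a", "Digital libraries"),
    ("650", "a", "Metadata"),
]
dc_record = marc_to_dc(record)
```

Fields with no entry in the table, such as the 502 dissertation note in the sample, are dropped, which shows the loss of detail that comes with the simpler format.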

Why ETD? Short Answer

• For Students:

• Gain knowledge and skills for the Information Age

• Richer communication (digital information, multimedia, …)

• For Universities:

• Easy way to enter the digital library field and benefit thereby

• For the World:

• Global digital library – large, useful, many services

• General:

• Save time and money

• Increased visibility for all associated with research results

ETANA-DL: 5S Extension

• 5S and component architecture to allow handling of very complex DL applications: archaeology

• Information visualization, clustering

• Mappings across streams, structure, spaces

Case Study (Archaeology): ETANA

• NSF ITR with CWRU (and Vanderbilt …)

• Faster DL development for complex application domains, with suitable tailoring

• Approach:

• ODL – pool of components

• 5S – theory-based generation of systems

ETANA Website

Lahav Website

Megiddo Opening Screen

Locus Screen: Pictures

View all

Area Screen: Distribution of Artifacts

ETANA-DL Website

Archaeology DL – Approach

• Solve the following DL problems: interoperability, making primary data available, data preservation

• Modeling archaeological information systems using 5S theory to design system and services

• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks

ETANA-DL Schema Design

[Figure: ETANA-DL schema. An ETANA-DL Object carries Owner, Collection, Partition, Subpartition, Locus, Container, ID, …; the object types shown are Bone (Animal, Count, …), Seed (Species, Name, …), and Figurine (Description, Dimensions, …).]

Data Mapping
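A data mapping step of this kind can be sketched as a rename from a site's local field names to the shared schema. The target names below (Owner, Collection, Partition, Locus, ID) are taken from the schema design slide. The local column names, the sample values, and the `map_record` helper are hypothetical and do not reflect the actual ETANA-DL mapping component.

```python
def map_record(local: dict, mapping: dict, constants=None) -> dict:
    """Rename a site's local fields to the shared schema and add fixed values."""
    mapped = dict(constants or {})
    for local_name, global_name in mapping.items():
        if local_name in local:  # skip fields this record does not have
            mapped[global_name] = local[local_name]
    return mapped


# Hypothetical local column names for one excavation database.
site_mapping = {
    "FIELD": "Partition",
    "LOCUS_NO": "Locus",
    "BONE_ID": "ID",
    "TAXON": "Animal",
}

# Values that are the same for every record from this site.
site_constants = {"Owner": "Example Excavation", "Collection": "Bone"}

local_row = {"FIELD": "A", "LOCUS_NO": "1024", "BONE_ID": "B-77", "TAXON": "Ovis", "NOTES": "n/a"}
union_row = map_record(local_row, site_mapping, site_constants)
```

Keeping the mapping as data rather than code means a new site can be added by writing a new table, which fits the "New Sites" entry in the architecture diagrams.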

ETANA-DL Architecture

[Figure: users, services, and data in ETANA-DL. Labels: Union Services, Users, DigBase, DigKit.]

ETANA-DL Architecture: DigBase and DigKit

[Figure: site databases (Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, new sites) and database wrappers; the ETANA-DL union catalog; a user interface with Search, Browse, Recommend, Note, Personalize, Review, Visualizations, and archaeology-specific services. Marked as work in progress.]

ETANA-DL Architecture

[Figure: labels on the archaeological site side: DigBase DB, Data Mapping Component, OAI Data Provider, DigKit (Configure). Labels on the ETANA-DL side: Union Catalog, Inverted Files, Services DB, Index, Browse DB, Browse Component, Search Component, Other ETANA-DL Services, Web Interface. Links are labeled OAI and XOAI.]

Searching – Search Results

Searching – Advanced Search

Searching – Advanced Search Results

Summary

1. Introduction

2. Historical Perspective

3. Topical Perspective

4. Software Solutions

5. Advanced Issues

Selected Links - http://fox.cs.vt.edu

• CITIDEL (computing education resources)

• www.citidel.org

• NCSTRL (computing technical reports)

• www.ncstrl.org

• NDLTD (electronic theses and dissertations worldwide)

• www.ndltd.org and etdguide.org

• NSDL (National Science Digital Library)

• www.nsdl.org

• OAI (Open Archives Initiative)

• www.openarchives.org

• Virginia Tech Digital Library Research Laboratory (DLRL, www.dlib.vt.edu)

• 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION, ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …

Questions/Discussion?
