ECDL Workshop: “Extending Interoperability of Digital Libraries: Building on the Open Archives Initiative”
Lisbon – September 21, 2000
Edward A. Fox, http://fox.cs.vt.edu
CS DLRL Internet TIC
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
Sponsors: CNI, DLF, Dept. of Energy, DFG, NASA, NSF, …
VT Faculty/Staff: Anthony Atkins, …
VT Students: Fernando Das Neves, George Fillipini, Robert France, Marcos Goncalves, Hussein Suleman, …
Program
9-10:30 Session 1 – Introduction
10:30-11 Break
11-12:30 Session 2 – Technical Details
12:30-2 Lunch
2-3:30 Session 3 – Discussion
3:30-3:50 Break
3:50-4:20 Session 4 – Presentations
4:20-5 Session 5 – Moving Forward
Program
9-10:30 Session 1 – Introduction
– Introductory Remarks (Fox, Lagoze) – 15 min
– Introductions from Participants (Fox, Lagoze) – 30 min
– Historical Overview (Fox) – 45 min
Introductory Remarks - Fox
Welcome!
Thanks to conference organizers
Program/Logistics
Latest in a series of meetings that have shaped OAI during its first year
Introductions from Participants - 1
“Straw Polls”
Training: CS / LIS / Sciences / Humanities / ?
Work Now: CS / LIS / Sciences / Humanities / ?
Location: University / Industry / Gov. / Assn. / ?
OAI Connection: Run an “archive” or DL or collection / Manage data / Develop software / Standards / ?
Introductions from Participants - 2
OAI Meeting Involvement: Santa Fe mtg / San Antonio mtg / Technical Committee / Cornell mtg / Steering Committee
OAI Trials: Opened an archive / Developed software for OAI / ?
OAI Project: Wrote proposal / Plan to write a proposal / Have internally funded project / Have externally funded project
Introductions from Participants
Short Statements (20 seconds per person)
– Name (pronounced slowly, clearly)
– Country
– Affiliation (institution/organization)
Historical Overview - Fox
Meetings
– Santa Fe – “archives of the world unite”
Philosophy
Repositories / Building on Black Boxes
Approaches to building repositories
VT view
Some proposals for funding
Development efforts
Open Archives Initiative (OAI)
xxx@LANL, high-energy physics (Ginsparg, 1991)
CSTR + WATERS = NCSTRL (Lagoze, 1994)
xxx + NCSTRL = CoRR collaboration (1998)
Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe
– led by LANL, CNI, DLF, Mellon --> OAI Santa Fe Convention (see Feb. 2000 D-Lib Magazine article)
Follow-on mtgs: 6/3 @ San Antonio, 9/21 @ Lisbon (ECDL)
Archives -> Open Archives
– Support unique archive identifiers
– Implement Open Archives metadata set (DC, using XML)
– Implement OA harvesting protocol (derived from Dienst protocol)
– Register the archive
Build tools, layer other services: linking, searching, …
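The harvesting step listed above can be illustrated with a minimal sketch of how a harvester might form one request URL. It assumes the verb and argument names of the later OAI-PMH 2.0 specification (`ListRecords`, `metadataPrefix=oai_dc`), which may differ from the Dienst-derived protocol in use in 2000; `build_request` and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Build the GET URL for one harvesting request (hypothetical helper).

    Assumes OAI-PMH 2.0 style arguments such as verb=ListRecords and
    metadataPrefix=oai_dc; the 2000-era Dienst-derived protocol may differ.
    """
    query = {"verb": verb, **params}  # verb first, then arguments in call order
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical data provider base URL
    print(build_request("http://archive.example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc"))
```

Because `from` is a reserved word in Python, a date-bounded request would pass it as `build_request(base, "ListIdentifiers", **{"from": "2000-01-01"})`.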
Open Archives (protoproto)
ArXiv & Los Alamos National Lab
CogPrints & U. Southampton
NACA & NASA (reports)
NCSTRL & Cornell U.
NDLTD & Virginia Tech
RePEc & U. Surrey
Total of around 200K records
Original Open Archives Members
American Physical Society
California Digital Library
Caltech
Coalition for Networked Info.
Cornell University
Harvard University
Library of Congress
Los Alamos Nat’l Lab
Mellon Foundation
NASA Langley Research Cntr
Old Dominion University
Stanford University
U. of Ghent
U. of Surrey
U. of Southampton
Vanderbilt University
Virginia Tech
Washington University
Open Archives Future – 1st View
EconWPA (U. Washington)
e-biomed -> PubMed Central (NIH)
PubScience (DOE)
Clinical Medicine Netprints (+ other HighWire Press holdings)
University ePub (California Digital Library)
All public e-prints (MIT)
Scholar’s Forum (Caltech)
Int’l: CERN, Germany, India, Mexico, …
Goal: millions of books/articles/reports / yr
OAI Philosophy
Self-archiving = submission mechanism
Long-term storage system = archive
Open interface = harvesting mechanism
Data provider + service provider
Start with “gray literature”
– e-prints/pre-prints, reports, dissertations, …
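The data provider / service provider split can be pictured with a small in-memory model: archives expose records, and a service provider harvests them incrementally into one merged index. This is a sketch under stated assumptions, not OAI software: the classes, the identifier strings, and the rule "harvest everything changed on or after the last harvest date" are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Record:
    identifier: str  # hypothetical scheme; must be unique across all archives
    datestamp: date  # date of last change, used for incremental harvesting
    title: str


@dataclass
class DataProvider:
    """An archive that exposes its records through an open interface."""
    name: str
    records: List[Record] = field(default_factory=list)

    def list_records(self, since: Optional[date] = None) -> List[Record]:
        # Records changed on or after `since`; all records when since is None.
        return [r for r in self.records if since is None or r.datestamp >= since]


@dataclass
class ServiceProvider:
    """Harvests metadata from many data providers into one merged index."""
    index: Dict[str, Record] = field(default_factory=dict)
    last_harvest: Dict[str, date] = field(default_factory=dict)

    def harvest(self, provider: DataProvider, today: date) -> int:
        since = self.last_harvest.get(provider.name)
        fresh = provider.list_records(since)
        for r in fresh:
            # Unique identifiers let a re-harvested record replace its old copy.
            self.index[r.identifier] = r
        self.last_harvest[provider.name] = today
        return len(fresh)
```

For example, a first harvest of a provider returns all its records; a later harvest returns only those whose datestamp is on or after the previous harvest date, so the service provider need not re-fetch the whole archive.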
Black Box OAI-ETD Perspective
(Diagram of ETD collections seen as black boxes: ISTEC (Ibero-America), PhysDis, NSYSU (Taiwan), ADT (Australia), BN.PT (Portugal), www.theses.org, CyberTheses (Francophone), VT, Dissert.Online (Germany), MIT, OhioLINK, CBUC (Catalunya), NDC (Greece), CIC, U. Bergen (Norway), …)
Approaches to Open Archives
Build by Discipline
Build by Institution
(Diagram labels: Author, Category, Interdisciplinary, Year, Language, Query, …)
Mechanisms
Sharing
– Join initiative, run software
– Make metadata and archive available
Aggregating
– By discipline
– By institution
– By genre
Automating
– Workflow
– Harvesting and providing services
– Federated searching
– Dynamic linking (e.g., with SFX)
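The aggregating mechanisms above (by discipline, institution, or genre) amount to grouping harvested records on one metadata facet. A minimal sketch follows; the record layout and the facet names are hypothetical, and records lacking the facet are collected under "unknown".

```python
from collections import defaultdict
from typing import Dict, List


def aggregate(records: List[dict], facet: str) -> Dict[str, List[str]]:
    """Group harvested record identifiers by one facet value.

    `facet` is a hypothetical field name such as "discipline",
    "institution" or "genre"; records lacking it go under "unknown".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec.get(facet, "unknown")].append(rec["id"])
    return dict(groups)
```

The same harvested pool can then back a discipline view and an institution view without re-harvesting, which is one reason to keep aggregation a service-layer step rather than a property of each archive.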
VT View of the Open Archives Initiative (OAI)
Enable sharing of publication metadata and full-text by digital libraries
Standardize low-level mechanisms to share contents of libraries
Build higher-level user-centric and administrative services in meta-libraries
Install organizational mechanisms to support the technical processes
Virginia Tech Projects
MARC XML-DTD
Computer Science Teaching Center (CSTC)
W3C Web Characterization Repository
OAI Repository Explorer
Networked Digital Library of Theses and Dissertations (NDLTD)
OAI-Campus (esp. multimedia)
MARC XML-DTD
XML Transport format for US-MARC records
Standardized metadata exchange format for traditional library services joining OAI
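To show what carrying MARC records in XML involves, here is a minimal sketch that serializes tag/subfield pairs. MARC tags 100 (main entry, personal name) and 245 (title statement) are standard, but the element names `record`, `datafield` and `subfield` and the function `marc_to_xml` are hypothetical and are not taken from the VT MARC XML-DTD; indicators and the leader are omitted.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Field = Tuple[str, List[Tuple[str, str]]]  # (MARC tag, [(subfield code, value), ...])


def marc_to_xml(fields: List[Field]) -> str:
    """Serialize MARC-style fields as one XML record (illustrative element names)."""
    record = ET.Element("record")
    for tag, subfields in fields:
        # Attributes go in a dict: SubElement's own 2nd parameter is named `tag`.
        datafield = ET.SubElement(record, "datafield", {"tag": tag})
        for code, value in subfields:
            ET.SubElement(datafield, "subfield", {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")
```

A call such as `marc_to_xml([("245", [("a", "Digital libraries")])])` yields one `record` element whose `datafield` carries `tag="245"`.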
CS Teaching Center (CSTC)
Collection of reviewed online resources used to aid in teaching of Computer Science
Supports author submission and peer-review process for new ACM Journal on Educational Resources in Computing (JERIC)
Connected with NSDL (NSF 00-44)
http://www.cstc.org
W3C Web Characterization Repository
Online database of metadata related to publications, tools and data sets dealing with Web characterization
Project of the Web Characterization Activity working group of the World-Wide-Web Consortium (www.w3c.org/WCA)
http://purl.org/net/repository
OAI Repository Explorer
Serves as a compliance test
Allows browsing of open archives using only the OAI protocol
Sends requests on behalf of the user, parses and checks responses, and displays a browsable interface
Will detect most discrepancies in protocol
http://purl.org/net/explorer
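The Explorer's parse-and-check step can be approximated by a small validator. This sketch assumes the OAI-PMH 2.0 response envelope (root `OAI-PMH` in namespace `http://www.openarchives.org/OAI/2.0/`, containing `responseDate`, `request`, then either the verb element or `error`), which was specified after this workshop; `check_response` is hypothetical and checks far less than the Explorer itself.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: OAI-PMH 2.0 namespace, which postdates this talk.
NS = "{http://www.openarchives.org/OAI/2.0/}"


def check_response(xml_text: str, verb: str) -> List[str]:
    """Return a list of problems found in one harvesting response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != NS + "OAI-PMH":
        problems.append(f"unexpected root element {root.tag}")
    for child in ("responseDate", "request"):
        if root.find(NS + child) is None:
            problems.append(f"missing <{child}>")
    # A valid reply carries either the requested verb's element or an error.
    if root.find(NS + verb) is None and root.find(NS + "error") is None:
        problems.append(f"neither <{verb}> nor <error> present")
    return problems
```

An empty list means the response passed these few checks; each string describes one discrepancy, in the order the checks run.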
OAI-Campus
Undergrad term project for Honors course on digital libraries
Aim is to have many OAs on campus
Emphasis will be on multimedia collections
Survey developed for campus: http://intercom.virginia.edu/SurveySuite/Surveys/OAiVT
Funding Success
NSF-DFG / VT-Oldenburg: OAI research for next 3 years
2 countries
2 domains
– Physics
– Electronic theses and dissertations (ETDs)
Evolution of existing efforts to use OAI
Refinement of services as ontologies develop
Funding Failures
NSF ITR – Large
US Dept. of Education (FIPSE) – 5 sites
– Training
– Graduate students
Figure 1. Layers Related to Open Archives Initiative
(Diagram layers. Services: search/browse, authoring, citation checking, submission, metadata creation, editorial reviewing and certification, copy-edit/add value, citation DB updating, authority control, preservation, conversion, text/MM editing, gazetteer, cataloging, collaboration, annotation, summarization, citation/linking (SFX, CiteSeer). Registry: archives (name, ID, description, terms and conditions, …), metadata formats (name, XML DTD, …), archive formats (name, standard, preservation process, …), protocols, tools. Repositories: NCSTRL, EconWPA, RePEc, and the repository for NDLTD, reached through the Open Archives Harvesting Protocol with metadata formats OA Metadata Set and NDLTD Standard (DC-based) Set, plus a transaction log and training resources. Partitions: VT, UVA, Caltech, each holding metadata and full-content records.)
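The figure lists an OA Metadata Set and a DC-based NDLTD set as metadata formats. A minimal sketch of reading Dublin Core fields out of such a record is below. The namespace `http://purl.org/dc/elements/1.1/` is the standard Dublin Core element namespace; the wrapper element in the usage example and the function `dc_fields` are hypothetical, and NDLTD-specific extension elements are simply ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def dc_fields(xml_text: str) -> Dict[str, List[str]]:
    """Collect Dublin Core elements (title, creator, ...) into lists by name."""
    root = ET.fromstring(xml_text)
    fields: Dict[str, List[str]] = {}
    for elem in root.iter():
        if elem.tag.startswith(DC):  # skip wrapper and non-DC elements
            name = elem.tag[len(DC):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields
```

Repeatable elements such as `creator` come back as lists, since Dublin Core allows every element to occur more than once.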
Other Development Efforts
Cornell Software
Los Alamos Software
Southampton Software
ODU Software
Other Software
Registered Archives
Program
11-12:30 Session 2 – Technical Details
– Expanding the Scope and New Technical Agreements (Lagoze) – 60 min
– Framing the Discussion for the Afternoon (Fox) – 30 min
Framing the Discussion for the Afternoon – Fox - 1
Divide into groups soon for lunch
Sit and discuss in groups during lunch
Groups report back in afternoon
– Present comments orally
– Lead discussion of those comments
Groups submit report later through email
Framing the Discussion for the Afternoon – Fox - 2
Possible Groups:
Political agendas and their unfolding
– “Gray literature”, Courseware/NSDL, …
Guiding principles for technical agenda
– What is an archive?
– What is best terminology?
Implementation plans for OAI core
Framing the Discussion for the Afternoon – Fox - 3
Possible Groups (cont’d):
Requirements for OAI-related services / design of component-based DL
Implementation plans for OAI-related services
Linking OAI with other initiatives: science data, …
Program
2-3:30 Session 3 – Discussion
– Funding Agencies/Sponsors – 30 min
– General Discussion (Fox, Lagoze) – 60 min
General Discussion
Reactions to OAI agreements
Applications of OAI to communities represented by attendees
Program
3:50-4:20 Session 4 – Presentations
– Costantino Thanos (IEI-CNR, Italy)
– Robert Tansley (U. Southampton, UK)
– Eberhard Hilf (U. Oldenburg, Germany)
Program
4:20-5 Session 5 – Moving Forward (Fox, Lagoze) – 40 min
– Plans for implementation
– Future research agendas
– Community building: listservs, …
VT - 1
General-purpose tools:
Hussein’s Perl implementation
Marcos’ Java implementation
OAI Repository Explorer – Version 2
http://purl.org/net/explorer
VT - 2
NSDL / XXDL?
Bill Graves, Collegis/Eduprise
IMS
UNC Wilmington
Chemistry, CS, Math, …
CSTC already involved in OAI
VT - 3
MARIAN:
Evolved from CODER (~1987)
C/C++ version: SIGIR’93
Research and production DL system
Harvest/Gateway: Dienst, “Harvest”, OAI, Z39.50
+ OAI to Greenstone, Phronesis
(Diagram of www.theses.org: boxes labeled DO and MDO behind the OAI protocol, with MDOs grouped into sets by subject and sets by origin; components labeled MARIAN, Dienst, VTLS, Harvest, Z39.50, OAI-1, OAI-2, …)
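The www.theses.org diagram groups metadata into sets by subject and sets by origin, which lets a harvester ask for only part of an archive. A minimal sketch of set-based selection follows. It assumes set names that nest with `:` as a separator (the later OAI-PMH `setSpec` convention), and the memberships and identifiers are hypothetical.

```python
from typing import Dict, List


def records_in_set(memberships: Dict[str, List[str]], set_spec: str) -> List[str]:
    """Return identifiers of records in `set_spec` or any set nested under it.

    Assumes hierarchical set names separated by ':' (e.g. 'subject:physics'),
    following the later OAI-PMH convention.
    """
    prefix = set_spec + ":"
    return sorted(
        rid
        for rid, specs in memberships.items()
        if any(s == set_spec or s.startswith(prefix) for s in specs)
    )
```

Because one record may belong to several sets, the same thesis can appear both under its subject and under its origin without being stored twice.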