
Page 1: U.S. Government Use  of the OAI-PMH

U.S. Government Use of the OAI-PMH

Michael L. Nelson

Old Dominion University

Norfolk Virginia, USA

[email protected] http://www.cs.odu.edu/~mln/

Indo-US Workshop on Open Digital Libraries and Interoperability

Arlington, VA - June 23-25, 2003

Page 2

Acknowledgements

• ODU: K. Maly, M. Zubair, J. Bollen, X. Liu

• LANL: R. Luce, X. Liu

• NASA: G. Roncaglia, J. Rocker

• MAGiC (UK): P. Needham

Page 3

Outline

• Review:
  – OAI-PMH
  – data provider / service provider model
    • including “aggregators”
• Role of registration for repositories
• NASA projects
• OSTI demo project
• Technical Report Interchange (TRI)
  – NASA, DOE, DOD

Page 4

Disclaimer: Scientific and Technical Information (STI)

• This talk will cover US Government focused / sponsored STI only

• This talk will not cover American Memory
  – a cultural history project from the Library of Congress (LoC)
    • http://memory.loc.gov/
  – the LoC played a significant role in the definition and early adoption of the OAI-PMH

Page 5

Acronym Review

• NASA: CASI (Center for AeroSpace Information), http://www.sti.nasa.gov/
• Department of Energy: OSTI (Office of Scientific and Technical Information), http://www.osti.gov/
• Department of Defense: DTIC (Defense Technical Information Center), http://www.dtic.mil/

• LaRC = Langley Research Center
• LANL = Los Alamos National Laboratory
• Sandia = Sandia National Laboratories
• AFRL = Air Force Research Laboratory

Page 6

The Rise and Fall of Distributed Searching

• wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice
  – Davis & Lagoze, JASIS 51(3), pp. 273-280
  – Powell & French, Proc. 5th ACM DL, pp. 264-265

• distributed searching of N nodes is still viable, but only for small values of N
  – NCSTRL: N > 100; bad
  – NTRS/NIX: N <= 20; ok (but could be better)
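Why large N hurts can be sketched with a back-of-the-envelope availability model. This minimal sketch assumes every node answers independently with the same hypothetical availability p; neither the independence assumption nor the value 0.99 comes from NCSTRL or NTRS measurements.

```python
def all_nodes_respond(p: float, n: int) -> float:
    """Probability that all n nodes answer, assuming independent failures."""
    return p ** n


def expected_missing(p: float, n: int) -> float:
    """Expected number of nodes that fail to answer a single query."""
    return n * (1.0 - p)


if __name__ == "__main__":
    p = 0.99  # hypothetical per-node availability, not a measured value
    for n in (20, 100):
        print(f"N={n}: P(complete result) = {all_nodes_respond(p, n):.3f}, "
              f"expected missing nodes = {expected_missing(p, n):.1f}")
```

Under these assumptions the chance of a complete answer falls from about 0.82 at N = 20 to about 0.37 at N = 100. A search that waits for every node is also only as fast as the slowest one to respond, which is one reason harvesting metadata ahead of time scales better.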

Page 7

resource – item – record

• resource: the object the metadata is about (here, David)
• item: all available metadata about David; item = identifier
• records: Dublin Core metadata, MARC metadata, SPECTRUM metadata; record = identifier + metadata format + datestamp
• set membership is an item-level property
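The resource / item / record distinction maps naturally onto a small data model. This is a minimal sketch, not code from any OAI-PMH toolkit: the identifier "oai:example.org:david", the set "sculpture", and the prefixes "marc21" and "spectrum" are hypothetical; only "oai_dc" is a prefix the protocol itself requires.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    identifier: str       # identifier of the item this record belongs to
    metadata_prefix: str  # names the metadata format, e.g. "oai_dc"
    datestamp: str        # date of creation, modification, or deletion
    metadata: str         # the serialized metadata itself


@dataclass
class Item:
    identifier: str
    # Set membership is an item-level property, so it lives here, not on Record.
    sets: set[str] = field(default_factory=set)
    # One record per metadata format: identifier + prefix selects a record.
    records: dict[str, Record] = field(default_factory=dict)

    def add_record(self, record: Record) -> None:
        if record.identifier != self.identifier:
            raise ValueError("record identifier must match its item")
        self.records[record.metadata_prefix] = record


if __name__ == "__main__":
    david = Item("oai:example.org:david", sets={"sculpture"})
    for prefix in ("oai_dc", "marc21", "spectrum"):
        david.add_record(Record(david.identifier, prefix, "2003-06-23", "<metadata/>"))
    print(sorted(david.records))
```

Keeping sets on Item rather than Record mirrors the slide's point that all records disseminated from one item share its set membership.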

Page 8

Overview of OAI-PMH Verbs

Verb: Function

metadata about the repository:
• Identify: description of repository
• ListMetadataFormats: metadata formats supported by repository
• ListSets: sets defined by repository

harvesting verbs:
• ListIdentifiers: OAI unique ids contained in repository
• ListRecords: listing of N records
• GetRecord: listing of a single record

Most verbs take arguments: dates, sets, ids, metadata formats, and resumption token (for flow control).
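OAI-PMH requests are plain HTTP GETs with the verb and its arguments in the query string. This minimal sketch builds such URLs with Python's standard urllib.parse. It only checks the verb name and that resumptionToken (an exclusive argument) is not combined with others; it does not enforce each verb's required arguments. Spelling the "from" argument as from_ is a hypothetical convention, needed because "from" is a Python keyword.

```python
from urllib.parse import urlencode

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL for one of the six verbs."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # resumptionToken is exclusive: it may only be sent together with the verb.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken cannot be combined with other arguments")
    params = {"verb": verb}
    for key, value in args.items():
        # 'from' is a Python keyword, so callers spell it from_.
        params["from" if key == "from_" else key] = value
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(oai_request("http://techreports.larc.nasa.gov/ltrs/oai2.0/", "Identify"))
    print(oai_request("http://ntrs.nasa.gov/oai2.0", "ListRecords",
                      metadataPrefix="oai_dc", from_="2003-01-01"))
```

The first call reproduces the LTRS Identify URL used later in this talk; the second is a selective harvest by date.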

Page 9

Data Providers / Service Providers

data providers (repositories)

service providers (harvesters)
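A service provider harvests a data provider one response page at a time. The sketch below uses Python's standard xml.etree.ElementTree and two hypothetical, trimmed ListIdentifiers responses (real responses also carry responseDate and request elements). It collects identifiers and follows resumptionTokens until an empty token marks the end of the list. The fetch callable is an assumption standing in for the HTTP GET a real harvester would issue.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Optional

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_page(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the identifiers on one ListIdentifiers page and its resumptionToken."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext("oai:identifier", namespaces=NS)
           for h in root.iterfind(".//oai:ListIdentifiers/oai:header", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def harvest_all(fetch: Callable[[Optional[str]], str]) -> list[str]:
    """Follow resumptionTokens until the complete list has been retrieved.

    fetch(None) returns the first page; fetch(token) returns the page for token.
    """
    identifiers: list[str] = []
    token: Optional[str] = None
    while True:
        ids, token = parse_page(fetch(token))
        identifiers.extend(ids)
        if token is None:
            return identifiers


# Two hypothetical response pages standing in for a real data provider.
PAGES = {
    None: """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier><datestamp>2003-01-01</datestamp></header>
    <header><identifier>oai:example.org:2</identifier><datestamp>2003-01-02</datestamp></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>""",
    "page2": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:3</identifier><datestamp>2003-01-03</datestamp></header>
    <resumptionToken/>
  </ListIdentifiers>
</OAI-PMH>""",
}

if __name__ == "__main__":
    print(harvest_all(PAGES.__getitem__))
```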

Page 10

Aggregators

data providers (repositories)

service providers (harvesters)

aggregator

aggregators allow for:
• scalability for OAI-PMH
• load balancing
• community building
• discovery
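An aggregator harvests several data providers and then re-exposes the merged result through its own OAI-PMH interface. This is a minimal sketch under stated assumptions: the Header type, the source field, and the "keep the newest datestamp" merge policy are hypothetical choices. Real aggregators must also handle issues such as provenance and deleted records, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Header:
    identifier: str
    datestamp: str  # ISO 8601 at day granularity, so string order == date order
    source: str     # baseURL of the data provider it was harvested from


def aggregate(*providers: Iterable[Header]) -> dict[str, Header]:
    """Merge harvested headers, keeping the newest copy of each identifier."""
    merged: dict[str, Header] = {}
    for headers in providers:
        for h in headers:
            current = merged.get(h.identifier)
            if current is None or h.datestamp > current.datestamp:
                merged[h.identifier] = h
    return merged


def list_identifiers(merged: dict[str, Header], from_: Optional[str] = None) -> list[str]:
    """What a downstream harvester would get from the aggregator's own interface."""
    return sorted(h.identifier for h in merged.values()
                  if from_ is None or h.datestamp >= from_)


if __name__ == "__main__":
    a = [Header("oai:example.org:1", "2003-01-01", "http://a.example.org/oai"),
         Header("oai:example.org:2", "2003-02-01", "http://a.example.org/oai")]
    b = [Header("oai:example.org:1", "2003-03-01", "http://b.example.org/oai")]
    print(list_identifiers(aggregate(a, b), from_="2003-02-15"))
```

Downstream harvesters then make one connection to the aggregator instead of one per repository, which is where the scalability and load-balancing benefits come from.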

Page 11

Aggregators

• Frequently interchangeable terms:
  – aggregators: likely to be community / institutionally focused
  – caches: store a copy, less likely to be community-oriented
  – proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
    • Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03

• To learn more about aggregators, caches & proxies:
  – http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
  – http://www.cs.odu.edu/~mln/jcdl03/

Page 12

Example Aggregators

• Arc - http://arc.cs.odu.edu/
  – first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001
    • http://www.dlib.org/dlib/april01/liu/04liu.html

• Celestial - http://celestial.eprints.org/
  – among other services, it provides a history of harvests (successful vs. errors)
    • http://celestial.eprints.org/cgi-bin/status

Page 13

OAI-PMH 2.0 Registration

Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html

75 repositories registered

??? unregistered repositories

unregistered because:
• testing / development
• not for public harvesting
• public, but “low-profile”
• never got around to it…
• ???

ratio of data providers to service providers (DP:SP) ≈ 5:1

Page 14

Registration Is Nice… But Not Required

• OAI-PMH is (becoming) the “http” for digital libraries
  – there is no central registry of http servers
    • remember the NCSA “What’s New” page? (ca. 1994)

• There will never be “registration support” in OAI-PMH
  – registries are a type of service provider, built on top of OAI-PMH
  – registration will be an integral part of community building
  – friends…

Page 15

<friends>

• A lightweight, optional, DP-centric method to communicate the existence of “others”

http://techreports.larc.nasa.gov/ltrs/oai2.0/?verb=Identify

..
<description>
  <friends ..namespace stuff..>
    <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
    <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
    <baseURL>http://horus.riacs.edu/perl/oai/</baseURL>
    <baseURL>http://ston.jsc.nasa.gov/collections/TRS/oai/</baseURL>
  </friends>
</description>
..
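A harvester can pull the friends list out of an Identify response with a few lines of standard-library XML parsing. This minimal sketch matches on element local names, so it does not hard-code the friends schema namespace (elided on the slide). The sample response is trimmed and hypothetical, and "urn:example:friends" is a placeholder namespace, not the real one.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def friends_base_urls(identify_xml: str) -> list[str]:
    """baseURLs listed inside any <friends> container of an Identify response."""
    root = ET.fromstring(identify_xml)
    urls: list[str] = []
    for el in root.iter():
        if local_name(el.tag) == "friends":
            urls.extend(child.text.strip() for child in el
                        if local_name(child.tag) == "baseURL" and child.text)
    return urls


# Trimmed, hypothetical Identify response; the repository's own top-level
# <baseURL> must not be mistaken for a friend.
IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <baseURL>http://techreports.larc.nasa.gov/ltrs/oai2.0/</baseURL>
    <description>
      <friends xmlns="urn:example:friends">
        <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
        <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    print(friends_base_urls(IDENTIFY))
```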

Page 16

<friends>…</friends>

NASA <friends> example (diagram: a harvester issuing Identify requests to these repositories):

• http://techreports.larc.nasa.gov/ltrs/oai2.0/
• http://naca.larc.nasa.gov/oai2.0/
• http://ntrs.nasa.gov/oai2.0/
• http://ston.jsc.nasa.gov/collections/TRS/oai/
• http://horus.riacs.edu/perl/oai/
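Starting from one repository, a harvester can discover the rest by repeatedly issuing Identify and following <friends> links. This minimal sketch does a breadth-first crawl over that graph. The get_friends callable stands in for "send Identify, parse the friends list"; treating baseURLs that differ only by a trailing slash as equal is an assumption. Only the LTRS friends list comes from the slide; the NTRS entry is hypothetical, added to show that links back to already-seen repositories are skipped.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def normalize(base_url: str) -> str:
    # Assumption: baseURLs differing only by a trailing slash are the same.
    return base_url.rstrip("/")


def discover(start: str, get_friends: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first crawl of <friends> links, visiting each baseURL once."""
    first = normalize(start)
    seen = {first}
    order = [first]
    queue = deque([first])
    while queue:
        for friend in get_friends(queue.popleft()):
            url = normalize(friend)
            if url not in seen:
                seen.add(url)
                order.append(url)
                queue.append(url)
    return order


FRIENDS = {
    "http://techreports.larc.nasa.gov/ltrs/oai2.0": [
        "http://naca.larc.nasa.gov/oai2.0",
        "http://ntrs.nasa.gov/oai2.0",
        "http://horus.riacs.edu/perl/oai/",
        "http://ston.jsc.nasa.gov/collections/TRS/oai/",
    ],
    "http://ntrs.nasa.gov/oai2.0": ["http://techreports.larc.nasa.gov/ltrs/oai2.0/"],
}

if __name__ == "__main__":
    for url in discover("http://techreports.larc.nasa.gov/ltrs/oai2.0/",
                        lambda u: FRIENDS.get(u, [])):
        print(url)
```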

Page 17

Use of <friends>

Slide from S. Warner, Cornell University

Page 18

Langley Technical Report Server

• publicly available
  – began as an anonymous ftp server in 1992; http access in 1993
  – model for other technical report servers at other NASA centers

• details in NASA TM-109162

• mostly LaTeX, MS Word, other systems
  – some scanned reports

http://techreports.larc.nasa.gov/ltrs/
http://techreports.larc.nasa.gov/ltrs/oai2.0/

Page 19

NACA Technical Report Server

• publicly available
  – began in 1996
  – details in NASA TM-1999-209127

• scanned reports from 1917-1958
  – NACA = predecessor to NASA

• contents mirrored with the MAGiC project
  – a UK-based grey-literature preservation project
  – OAI-PMH used to mirror contents

http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/

Page 20

NACA Report 1345

as seen through its native DL: http://naca.larc.nasa.gov/

Page 21

NACA Report 1345

as seen through MAGiC: http://www.magic.ac.uk/

Page 22

NACA Report 1345

as seen through Scirus (Elsevier): http://www.scirus.com/

Page 23

NACA Report 1345

as seen through OAIster

http://oaister.umdl.umich.edu/

Page 24

NACA Report 1345

as seen through my.OAI (FS Consulting): http://www.myoai.com/

Page 25

NTRS OAI Architecture

Diagram: a user searches NTRS (e.g., for “cfd applications”); NTRS harvests metadata from LTRS, ATRS, GTRS, CASITRS, . . .

• NTRS keeps a local copy of metadata; all searching, browsing, etc. performed on the metadata here
• metadata harvested offline, through the OAI interface
• each node independently maintained
• individual nodes can still support direct user interaction
• content (reports) remains archived at the local sites
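The key point of this architecture is that a query never touches the individual nodes: it runs against the locally harvested metadata, and only the link to the full report leads back to the originating site. This minimal sketch of such a local index uses a simple boolean-AND title search; the record fields and sample records are hypothetical, and NTRS itself used MySQL rather than anything like this.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    identifier: str
    title: str
    node: str  # repository it was harvested from, e.g. "LTRS"
    url: str   # the report itself stays archived at that node


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Metadata-only index: every search runs here, no node is contacted."""

    def __init__(self) -> None:
        self.records: dict[str, HarvestedRecord] = {}
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, rec: HarvestedRecord) -> None:
        self.records[rec.identifier] = rec
        for term in tokenize(rec.title):
            self.postings[term].add(rec.identifier)

    def search(self, query: str) -> list[str]:
        """Identifiers whose titles contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    idx = LocalIndex()
    idx.add(HarvestedRecord("oai:example.org:ltrs-1", "CFD Applications in Aeronautics",
                            "LTRS", "http://example.org/ltrs/1"))
    idx.add(HarvestedRecord("oai:example.org:casi-3", "Wind tunnel data",
                            "CASITRS", "http://example.org/casi/3"))
    print(idx.search("cfd applications"))
```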

Page 26

NASA Technical Report Server

• publicly available
• replacement for the former distributed searching version of NTRS
  – MySQL
  – Va Tech harvester
  – modified “bucket”
  – details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (July 2003)

• a service provider & aggregator
  – same OAI-PMH baseURL as used for interactive searching

http://ntrs.nasa.gov/

Page 27

NASA Technical Report Server

• advanced, fielded search

• explicit query routing
  – 12 NASA repositories
  – 4 non-NASA repositories
    • turned “off” by default
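Explicit query routing amounts to a per-repository on/off switch with a default state. This minimal sketch assumes, following the slide's indentation, that "off by default" applies to the non-NASA repositories and that user choices simply override the defaults. The repository names are hypothetical; the slide gives only the counts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Mapping, Optional


@dataclass(frozen=True)
class Repository:
    name: str
    nasa: bool


def route(repos: Iterable[Repository],
          toggles: Optional[Mapping[str, bool]] = None) -> list[str]:
    """Names of the repositories a query is sent to.

    Default state: NASA repositories on, non-NASA repositories off.
    Explicit user toggles (name -> on/off) override the default.
    """
    toggles = toggles or {}
    return [r.name for r in repos if toggles.get(r.name, r.nasa)]


# Hypothetical names; the slide gives only the counts (12 NASA, 4 non-NASA).
REPOS = ([Repository(f"nasa-{i}", True) for i in range(1, 13)]
         + [Repository(f"other-{i}", False) for i in range(1, 5)])

if __name__ == "__main__":
    print(len(route(REPOS)), len(route(REPOS, {"other-1": True})))
```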

Page 28

non-NASA repositories: > 0.5M records

Page 29

NASA DLs in the Larger STI Realm

NTRS

LTRS, ATRS, CASITRS, …

DOE, DOD, Universities, Publishers, . . ., International

NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata.

NTRS could also harvest metadata from other DLs and provide access to non-NASA content.

We hope to influence the direction of the science.gov effort to use OAI-PMH.

This could be a fully connected graph.

Page 30

OSTI Energy Citations Database

• OAI-PMH support just recently added (Feb 2003)
  – not yet officially announced or registered
  – 20k records, 8k full-text

• other OSTI collections planned

http://www.osti.gov/energycitations/

Page 31

Technical Report Interchange

• Goal: share technical reports between 4 US government labs without creating new digital libraries for users to learn!
  – NASA Langley Research Center
  – Air Force Research Laboratory
  – Los Alamos National Laboratory (DOE)
  – Sandia National Laboratories (DOE)

• Solution: use cooperating OAI-PMH caches at each site to
  – export local contents
  – ingest remote contents

Page 32

TRI Production System - Status

LaRC TRI System

LANL TRI System

Sandia TRI System

AFRL TRI System

ODU TRI System (Listener)

Records coming in from other TRI systems

Records going out to other TRI systems

Legend: Proposed / In Production

Slide from M. Zubair, ODU

Page 33

Mappings in TRI

Laboratory: native metadata format; native source commercial DL system; native destination commercial DL system

• LaRC: MARC; BASIS+; (TBD)
• LANL: MARC + local fields; Geac ADVANCE; Science Server
• AFRL: COSATI; Sirsi STILAS; Sirsi STILAS
• Sandia: MARC; Horizon; Verity

Details in Liu, et al., ECDL 2002; the above table is also taken from the same paper.
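Each site's native records have to be crosswalked into Dublin Core before they can be exchanged. This minimal sketch covers only MARC and uses a small illustrative subset of a MARC-to-unqualified-Dublin-Core mapping. The actual TRI mappings, including COSATI and LANL's local fields, are described in Liu et al. and are not reproduced here. Representing a MARC field as a tag plus a dict of subfields is a simplification that ignores indicators and repeated subfield codes.

```python
from __future__ import annotations

# Illustrative subset of a MARC -> unqualified Dublin Core crosswalk,
# keyed by (MARC tag, subfield code).
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
    ("700", "a"): "contributor",
}

# Simplification: one value per subfield code within a field.
MarcField = tuple[str, dict[str, str]]


def marc_to_dc(fields: list[MarcField]) -> dict[str, list[str]]:
    dc: dict[str, list[str]] = {}
    for tag, subfields in fields:
        for code, value in subfields.items():
            element = MARC_TO_DC.get((tag, code))
            if element is not None:
                # Drop leading/trailing spaces and ISBD punctuation " / : ; ,".
                dc.setdefault(element, []).append(value.strip(" /:;,"))
    return dc


if __name__ == "__main__":
    print(marc_to_dc([("245", {"a": "Sample report title /"}),
                      ("260", {"c": "1958"})]))
```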

Page 34

A Single TRI Module

Local DB

Scheduler

• Read new data from remote DL
• Write new data published in local DL

Input Directory

Local DL Manager

• Remote Data in DC format
• Local Data in DC format

Write remote data to local format

Output Directory

Read local data and convert to DC format

Connect to remote DL by OAI protocol

OAI Harvester Control

Legend: Common Modules in all three DLs / Specific module for each DL

Slide from M. Zubair, ODU
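One scheduled pass of a TRI module can be sketched as three steps: harvest remote records (already in DC) into the input directory, convert them to the local format, and export newly published local records as DC into the output directory. This is a minimal sketch, not the TRI code. The callables are hypothetical hooks, the comments' split between common and DL-specific parts is an assumption, and JSON files stand in for the DC XML the real system exchanges.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_cycle(
    harvest_remote: Callable[[], Iterable[dict]],  # assumed common: OAI harvester, yields DC
    to_local: Callable[[dict], dict],              # assumed DL-specific: DC -> local format
    new_local: Callable[[], Iterable[dict]],       # assumed DL-specific: new local records
    to_dc: Callable[[dict], dict],                 # assumed DL-specific: local format -> DC
    input_dir: Path,
    output_dir: Path,
) -> list:
    """One scheduled TRI-style cycle; returns remote records in local format."""
    # 1. Read new data from the remote DL into the input directory (DC).
    for i, rec in enumerate(harvest_remote()):
        (input_dir / f"remote-{i:06d}.json").write_text(json.dumps(rec))
    # 2. Write remote data to local format for ingest by the local DL.
    ingested = [to_local(json.loads(p.read_text()))
                for p in sorted(input_dir.glob("remote-*.json"))]
    # 3. Read new local data, convert it to DC, and place it in the output
    #    directory, where the site's OAI-PMH interface can expose it.
    for i, rec in enumerate(new_local()):
        (output_dir / f"local-{i:06d}.json").write_text(json.dumps(to_dc(rec)))
    return ingested


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inp, out = Path(tmp, "in"), Path(tmp, "out")
        inp.mkdir()
        out.mkdir()
        print(run_cycle(
            harvest_remote=lambda: [{"title": "Remote report"}],
            to_local=lambda dc: {"local_title": dc["title"]},
            new_local=lambda: [{"local_title": "Local report"}],
            to_dc=lambda loc: {"title": loc["local_title"]},
            input_dir=inp,
            output_dir=out,
        ))
```

Keeping the DL-specific conversions behind small hooks is what lets the same harvesting and scheduling code be reused at every site.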

Page 35

The Future: Community Building

• Ultimately, protocols and metadata formats are not what makes a difference

• Rather, what matters is the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)

• The best current example: the Open Language Archives Community
  – http://www.language-archives.org/

• OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends

Page 36

STI Communities

• Government produced/sponsored STI
  – http://ntrs.nasa.gov/
  – http://www.osti.gov/energycitations/
  – http://dlib.cs.odu.edu/tri/

• Academia
  – self-archiving vs. institutional archives
    • http://www.soros.org/openaccess/
    • http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm

• Commercial publishers
  – e.g. BioMed Central
    • http://www.biomedcentral.com/