OAI from the needle box Humboldt Universität Berlin, March 20, 2002 Thomas Krichel Palmer School of Library and Information Science Long Island University With apologies to Carl Lagoze


Page 1

OAI from the needle box

Humboldt Universität Berlin, March 20, 2002

Thomas Krichel

Palmer School of Library and Information Science

Long Island University

With apologies to Carl Lagoze

Page 2

Where I come from...

• Trained economist
• Early (1991) visionary of free online scholarship
• Creator of NetEc in 1993
• Principal founder of RePEc in 1997
  – Largest distributed academic digital library (DL) in the world
  – Collection that is open for
    • Contribution
    • Usage
  – Grown to over 200 archives and over 10 partly interoperable user services

Page 3

Metadata collection process

• Metadata is expensive to collect.
• Free online scholarship requires academic self-documentation.
• Building a free metadata collection is difficult:
  – no established business model
  – no established funding channels
• Only a collaborative effort will succeed.

Page 4

The example of eprint servers

• attractive building block for the transformation of scholarly communication
• but isolated efforts do not make for a scholarly communication system
• need to federate archives
• need to interoperate with other scholarly communication components

Page 5

Example: e-print accessibility

[Diagram: five boxes, each labelled "e-print"]

Page 6

Example: e-print accessibility

[Diagram: five boxes, each labelled "e-print"]

Page 7

metadata harvesting

[Diagram: a box labelled "metadata" and five boxes labelled "e-print"]

Page 8

metadata harvesting

[Diagram: a box labelled "metadata" listing Author, Title, Abstract, Identifier, and five boxes labelled "e-print"]

Page 9

other examples

• within the area of scholarly communication

• already implemented in RePEc

• Sharing of log data between service providers

• Provision of non-document data for document data providers

• personal data

• institutional data

Page 10

core concepts in OAI 1.1

• shared metadata format
• low-barrier interoperability
• data-provider / service-provider model
• metadata harvesting model
• parallel metadata formats

[Diagram labels: OAI 1.1 protocol; Dublin Core; Community specific; HTTP based; Reply: XML Schema, self contained]

Page 11

harvester / repository

[Diagram: a harvester and a repository connected by the OAI protocol; labels: support data, harvesting data, items]

Page 12

OAI protocol requests

Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets

Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord

[Diagram: harvester (service provider) and repository (data provider)]

Page 13

HTTP encoding - requests

BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1

GET
GET http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1

POST
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
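The two request encodings on this slide can be sketched with Python's standard library. This is only an illustration and not part of the original talk: the helper names `build_get_url` and `build_post_request` are hypothetical, and the base URL is the example host from the slide.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def build_get_url(base_url, verb, **arguments):
    """Return the GET form: base URL, '?', then the keyword arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def build_post_request(base_url, verb, **arguments):
    """Return (url, headers, body) for the equivalent POST form."""
    body = urlencode({"verb": verb, **arguments}).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Content-Length is the byte length of the encoded body.
        "Content-Length": str(len(body)),
    }
    return base_url, headers, body


if __name__ == "__main__":
    print(build_get_url(BASE_URL, "ListIdentifiers", set="S1"))
    print(build_post_request(BASE_URL, "ListIdentifiers", set="S1"))
```

Both forms carry the same keyword arguments; only their position (query string or request body) differs. `urlencode` also percent-encodes reserved characters such as the colons in an identifier.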

Page 14

HTTP encoding - responses

<?xml version="1.0" encoding="UTF-8" ?>
<GetRecord
    xmlns="http://oai.namespace.uri"
    xmlns:xsi="http://w3.namespace.uri"
    xsi:schemaLocation="http://oai.namespace.uri
                        http://oai.schemaURL">
  <responseDate>2000-01-19T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>
    record contents
  </record>
  additional records
</GetRecord>

[Callouts: response header; XML namespaces; response data]
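A harvester might read the response header fields roughly as follows. This is a sketch under assumptions: the sample is a cleaned-up, abbreviated copy of the slide's response, the namespace URI is the slide's placeholder rather than a real OAI namespace, and `parse_response` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI copied from the slide; not a real OAI namespace.
OAI_NS = "http://oai.namespace.uri"

SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri">
  <responseDate>2000-01-19T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>record contents</record>
</GetRecord>
"""


def parse_response(xml_bytes):
    """Extract the verb, the response header fields and the record count."""
    root = ET.fromstring(xml_bytes)
    ns = {"oai": OAI_NS}
    return {
        # The root element is named after the verb; drop the "{namespace}" part.
        "verb": root.tag.split("}", 1)[-1],
        "responseDate": root.findtext("oai:responseDate", namespaces=ns),
        "requestURL": root.findtext("oai:requestURL", namespaces=ns),
        "record_count": len(root.findall("oai:record", ns)),
    }


if __name__ == "__main__":
    for key, value in parse_response(SAMPLE_RESPONSE).items():
        print(key, "=", value)
```

Because the response repeats the request URL, it is self-contained: a harvester can tell what was asked without keeping its own state.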

Page 15

record

<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>

[Callouts: protocol support; format-specific metadata; community-specific record data]
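The three parts of a record (header, metadata, about) can be pulled apart as in the sketch below. It is illustrative only: the sample is the slide's record with the markup tidied, `parse_record` is a hypothetical helper, and the record is parsed standalone, so `header` and its children carry no namespace here (inside a full response they would inherit the response's default namespace).

```python
import xml.etree.ElementTree as ET

SAMPLE_RECORD = """<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
"""

# Namespace URIs exactly as they appear on the slide.
DC = {"dc": "http://purl.org/dc"}
EA = {"ea": "http://www.arXiv.org/ea"}


def parse_record(xml_text):
    """Split a record into its header, metadata and about fields."""
    record = ET.fromstring(xml_text)
    return {
        # header: identifier and datestamp
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        # metadata: here a Dublin Core title
        "title": record.findtext("metadata/dc:dc/dc:title", namespaces=DC),
        # about: here a usage statement
        "usage": record.findtext("about/ea:ea/ea:usage", namespaces=EA),
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE_RECORD))
```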

Page 16

selective harvesting - datestamps

[Diagram: a repository holding records; harvest only the records within a date range]

Page 17

selective harvesting - sets

[Diagram: a repository whose records are grouped into sets S1 and S2; harvest only the records within set S1]
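Seen from the repository side, selective harvesting amounts to filtering the record list by datestamp and by set. The sketch below shows the idea on in-memory data. The `Record` class, the `select_records` function and the sample records are all hypothetical, and treating both date bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical in-memory stand-in for a repository record."""
    identifier: str
    datestamp: date
    sets: frozenset = frozenset()


def select_records(records, from_=None, until=None, set_spec=None):
    """Yield the records matching the optional date range and set."""
    for record in records:
        if from_ is not None and record.datestamp < from_:
            continue  # older than the lower bound
        if until is not None and record.datestamp > until:
            continue  # newer than the upper bound
        if set_spec is not None and set_spec not in record.sets:
            continue  # not a member of the requested set
        yield record


# Made-up sample data for illustration only.
REPOSITORY = [
    Record("oai:eg:001", date(1999, 1, 1), frozenset({"S1"})),
    Record("oai:eg:002", date(2001, 6, 15), frozenset({"S1"})),
    Record("oai:eg:003", date(2001, 7, 1), frozenset({"S2"})),
]

if __name__ == "__main__":
    for record in select_records(REPOSITORY, from_=date(2001, 1, 1), set_spec="S1"):
        print(record.identifier)
```

The two criteria are independent, so a harvester can combine a date range with a set to fetch only what changed in the part of the repository it cares about.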

Page 18

Communication re OAI

• lists: subscribe via http://www.openarchives.org

• oai-general list

• oai-implementers list

• web: http://www.openarchives.org

• FAQ: http://www.openarchives.org/faq.htm

• mail: [email protected]

Page 19

revision of specifications

• Version 1.1 specifications frozen for 12-18 months:
  – stable for experimentation; not definitive
  – minimize risk for early adopters
  – maximize chances for future interoperability across communities

The technical committee is working on the "definitive" specifications. They will come out on 2002-05-01.

Page 20

The technical committee

- Herbert Van de Sompel (LANL)
- Carl Lagoze (Cornell U)
- Thomas Krichel (Long Island U & RePEc)
- Jeff Young (OCLC)
- Tim Cole (U of Illinois at Urbana-Champaign)
- Hussein Suleman (Virginia Tech)
- Simeon Warner (Cornell U & arXiv)
- Michael Nelson (NASA & NACA)
- Caroline Arms (Library of Congress)
- Muhammad Zubair (Old Dominion U & ARC)
- Steven Bird (U Penn & Open Language Archive Community)
- Robert Tansley (MIT & DSpace)
- Andy Powell (UKOLN, UK)
- Mogens Sandfær (DTV, Denmark)
- Thomas Severiens (Oldenburg U & Physnet)
- Thomas Baron (CERN)
- Les Carr (U of Southampton)
- Thomas Place (Tilburg U)

Page 21

Issues in front of the committee

• Error Handling
• SOAP
• Harvesting Granularity
• Mandatory DC
• Set Semantics and Collection Description
• XML Schema
• Result Set Filtering
• Flow Control, Result Set Cardinality, Response Level Container
• Awareness Mechanisms
• Multiple Metadata Return and "Best" Metadata Selection
• Machine Readable Rights Management
• From GetRecord to GetRecords
• Deduping Issues
• Idempotency of base URLs
• XML format for mini-archives
• Response compression

Page 22

Thank you for your attention!

Thomas Krichel
Palmer School of Library and Information Science
720 Northern Boulevard
Brookville NY 11548-1300
USA
http://openlib.org/home/[email protected]

Page 23

Error handling

• badArgument
• badGranularity
• badResumptionToken
• badVerb
• cannotDisseminateFormat
• idDoesNotExist
• noRecordsMatch
• noSetHierarchy
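A harvester could turn these codes into exceptions along the following lines. The slide gives only the code names, so the wire format here is an assumption: the sketch expects each error as an `error` element with a `code` attribute, which is how the later OAI-PMH 2.0 specification reports them. The `OAIError` class, the `find_errors` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Error codes listed on the slide.
ERROR_CODES = frozenset({
    "badArgument", "badGranularity", "badResumptionToken", "badVerb",
    "cannotDisseminateFormat", "idDoesNotExist", "noRecordsMatch",
    "noSetHierarchy",
})


class OAIError(Exception):
    """Hypothetical exception carrying a protocol error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code
        self.message = message


def find_errors(xml_text):
    """Return an OAIError for every <error code="..."> element found."""
    errors = []
    for element in ET.fromstring(xml_text).iter():
        # Compare local names only, so any default namespace is ignored.
        if element.tag.split("}", 1)[-1] == "error":
            errors.append(OAIError(element.get("code", ""),
                                   (element.text or "").strip()))
    return errors


# Made-up response body; the element layout is an assumption (see above).
SAMPLE_ERROR = '<response><error code="badVerb">Illegal verb</error></response>'

if __name__ == "__main__":
    for error in find_errors(SAMPLE_ERROR):
        known = "known" if error.code in ERROR_CODES else "unknown"
        print(error, f"({known} code)")
```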

Page 24

SOAP

• SOAP is a mechanism to transmit service requests over the Internet.

• As yet it is not a fully matured protocol.

• A SOAP compatible version of the protocol may be written later.

Page 25

Harvesting granularity

• From and Until arguments may allow finer timestamps, down to one second.

• Level supported is chosen by the data provider and set in the response to the Identify verb.

• All times expressed in UTC.
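As a sketch of what the two granularities could look like in practice, the helper below formats a timestamp in UTC at day or at second resolution. The pattern strings follow the later OAI-PMH 2.0 specification and are an assumption relative to this slide. The `format_datestamp` function and its constants are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Granularity labels as used by the later OAI-PMH 2.0 specification (assumption).
DAY = "YYYY-MM-DD"
SECOND = "YYYY-MM-DDThh:mm:ssZ"


def format_datestamp(moment, granularity=DAY):
    """Format a timezone-aware datetime in UTC at the given granularity."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)  # all times are expressed in UTC
    if granularity == DAY:
        return utc.strftime("%Y-%m-%d")
    if granularity == SECOND:
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unsupported granularity: {granularity!r}")


if __name__ == "__main__":
    # 19:30:30 at UTC-04:00 is 23:30:30 UTC on the same day.
    local = datetime(2002, 3, 20, 19, 30, 30,
                     tzinfo=timezone(timedelta(hours=-4)))
    print(format_datestamp(local, DAY))
    print(format_datestamp(local, SECOND))
```

A harvester would first read the supported level from the Identify response and then format its From and Until arguments at that level, never finer.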