OAI-PMH harvester for agricultural knowledge gathering (Development, testing and implementation)

OAI-PMH harvester for agricultural knowledge gathering

(Development, testing and implementation)

Francesco Castellani and Stefka Kaloyanova

4 February 2009

Overview

• Introduction
• The main requirements for an OAI-PMH harvester
• Selection and rationale
• Requirements for data providers
• OAI framework, workflow and the six verbs
• AGRIS Network and OAI-PMH
• Setup of a harvester
• Installation
• Technical details
• Main functions
• Management and troubleshooting
• Results, summary and conclusions
• Next steps

Introduction

Main role of a harvester:

To set up a mechanism for automatically gathering metadata and saving it in a common place (a central repository), such as a file system or a database.

The main requirements for an OAI-PMH harvester

• To retrieve and define the remote OAI data providers to be harvested
• To collect data from them according to the rules and requirements of the OAI-PMH protocol (usually done automatically)
• To ensure that this data is saved in the central file system or database repository for further indexing and search at the service provider (portal)

Many harvesters available as OSS

• Selection (pros and cons)
  • PKP harvester
  • OCLC harvester
• Evaluation and testing
  • PKP harvester
  • OCLC harvester
• Selection of the OCLC harvester and its adaptation to the existing AGRIS flow

The requirements for OAI-PMH data providers

• To expose data over the Internet according to the six verbs of OAI-PMH
• To allow selective harvesting by date/set
• To use resumption tokens for flow control
• To ensure response compression, validation and normalization of the data
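As an illustration of flow control with resumption tokens, here is a minimal Python sketch of how a harvester might page through a large ListRecords response. It is not the code of the harvester presented here: `harvest_all` and the `fetch` callable (which takes a dict of OAI-PMH arguments and returns the response XML as a string) are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's "{uri}tag" notation
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_all(fetch, first_args):
    """Yield every <record> element, following resumption tokens.

    `fetch` is a hypothetical callable: it receives a dict of OAI-PMH
    arguments and returns the response body as an XML string.
    """
    args = dict(first_args)
    while True:
        root = ET.fromstring(fetch(args))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token
        args = {"verb": first_args["verb"],
                "resumptionToken": token.text.strip()}
```

Passing the transport in as `fetch` keeps the paging logic independent of HTTP details such as compression or retries.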

OAI framework

[Diagram: the harvester (service provider) sends OAI-PMH requests for selective harvesting (datestamp, set) to the repositories (data provider), which return OAI-PMH XML records.]

DP (data provider): ensures that the Internet-accessible institutional repositories expose metadata for their digital objects to harvesters following OAI-PMH rules

SP (service provider): operates a harvester as a means of collecting metadata and provides extended services using the harvested metadata

The quality of the service is proportional to the quality of the data harvested.

Workflow: database - OAI-PMH-harvester

[Diagram: components, from the service provider side to the data provider side: Harvester; ISISOAI (OAI plug-in / Java layer); WWWISIS or wxis; CDS/ISIS database. XML responses are passed back towards the harvester.]

Script interaction with the database

Script: http://www4.fao.org/cgi-bin/oaiagris.exe?database=agris&search_type=query&query=ID=UY2006005761&table=mont&lang=oai&format_name=oaidc

OAI request

Request: http://www4.fao.org:8080/oaiagris/OAIHandler?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Aagris.uruguay%3AUY2006005761

OAI-PMH: the six verbs

Verb                  Function
Identify              Describes the repository
ListMetadataFormats   Gives all metadata formats supported by the repository
ListSets              Describes the possible subsets defined by the repository (semantic or by type of document)
ListIdentifiers       Lists record identifiers for a given set, date range and metadata format
ListRecords           Gives all records for a given set, date range and metadata format
GetRecord             Gets a single record by identifier
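To make the verbs concrete, the following Python sketch builds request URLs from a base URL, a verb and its arguments. `build_request` is a hypothetical helper written for this example, not part of the harvester; the base URL and identifier are the ones from the GetRecord request in the workflow slide.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url, verb, **arguments):
    """Return the URL of an OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Arguments left as None (for example, no set selected) are omitted
    query.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(query)


url = build_request(
    "http://www4.fao.org:8080/oaiagris/OAIHandler",
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="oai:agris.uruguay:UY2006005761",
)
print(url)
```

Because `from` is a reserved word in Python, a date-range argument has to be passed as `**{"from": "2009-01-01"}`.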

AGRIS network

[Diagram: AGRIS network, in three columns (data provider, harvester, service provider). Labels on the data provider side: local databases exposed through OAIagris (marked "not on Internet" or "accessible on Internet"), a data aggregator hosting metadata (KAINet) with OAIcat, FAOBIB, and a data provider repository, serving OAI DC and OAI AGRIS AP. Labels on the harvester side: data harvesters and a file system XML repository. Labels on the service provider side: AGRIS service provider (AGRIS services), KAINet service provider, and OAIster.]

Technical details

• Customized Java application on top of OCLC Harvester2, which provides an OAI-PMH harvester framework
• Open Source Software (OSS), ready to be included in the CVS repository
• Frameworks used in this project:
  • Hibernate (object-relational mapping (ORM) for RDBMS independence), persistence layer
  • Quartz (for the scheduling framework)
  • Prototype AJAX framework for the Web user interface (mainly used for AGRIS centers information)
• RDBMS (MySQL) database to keep statistics

Setup of a harvester

• Installation
• Register the data providers to be harvested (parameters)
• Establish the scheduling procedure (parameters)
• Define the output files and where they are to be saved

Installation:

• Installation of Tomcat
• Installation of Java
• Installation of MySQL
• Installation of the harvester

Functionalities:

• Scheduler
• Data Provider
  • Add new
  • List / Modify / Delete
• Statistics
  • List Data Providers
  • Trace Log

Define parameters for each Data Provider

• Activate or deactivate data provider
• Title *
• Description
• URL *
• Data Provider's Name
• Administrator's E-mail
• Metadata Format *
• Set Specification
• Start Date (YYYY / MM / DD)

Define data providers (DP)

• Requires Title and URL to identify the DP
• Dynamic recognition of the data provider's parameters using OAI-PMH verbs (Identify, ListSets, and ListMetadataFormats for the metadataPrefix)
• Additional information taken from the AGRIS data providers (mdb file):
  • center code (CC), name and acronym
  • description of the participating center
  • search in AGRIS portal etc.
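The dynamic recognition step can be pictured as parsing the responses to ListMetadataFormats and ListSets. The Python sketch below is an assumption about how this could be done, not the harvester's actual code: the function names are hypothetical, and the `agris_ap` prefix in the sample response is illustrative only.

```python
import xml.etree.ElementTree as ET

# Prefix map for the OAI-PMH 2.0 response namespace
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(xml_text):
    """Return the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [e.text for e in root.findall(path, NS)]


def set_specs(xml_text):
    """Return the setSpec values from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(".//oai:set/oai:setSpec", NS)]


# Shortened sample response; the second prefix is an illustrative value
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>agris_ap</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(metadata_prefixes(SAMPLE))  # ['oai_dc', 'agris_ap']
```

The values returned this way could then populate the Metadata Format and Set Specification choices for a data provider.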

Parameters for metadata format and subset selection

• Available subsets as returned by the OAI-PMH ListSets verb, and selection of the one suitable for AGRIS (if none is selected, the whole database will be harvested)
• Available formats for storage from ListMetadataFormats:
  • AGRIS AP
  • DC
  • others

Defining schedule for each data provider

• Continuous (runs every N minutes)
• Daily (runs every day at a given time)
• Weekly (runs every week on a given day and time)
• Monthly (runs every month on a given day and time)
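The four schedule types can be illustrated by computing the next start time from the current time. The harvester itself relies on Quartz for scheduling; the Python sketch below is only a stand-in for that logic, and `next_run` and its parameters are hypothetical names. For simplicity it limits the monthly day to 1-28.

```python
from datetime import datetime, time, timedelta


def next_run(now, kind, every_minutes=1, at=time(0, 0), weekday=0, day=1):
    """Return the next start time after `now` for one schedule type.

    weekday: 0 = Monday ... 6 = Sunday.
    day: day of the month, limited to 1-28 in this sketch.
    """
    if kind == "continuous":
        return now + timedelta(minutes=every_minutes)

    # Today's date at the configured time of day
    candidate = datetime.combine(now.date(), at)
    if kind == "daily":
        return candidate if candidate > now else candidate + timedelta(days=1)
    if kind == "weekly":
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        return candidate if candidate > now else candidate + timedelta(days=7)
    if kind == "monthly":
        if not 1 <= day <= 28:
            raise ValueError("this sketch supports days 1-28 only")
        candidate = candidate.replace(day=day)
        if candidate > now:
            return candidate
        # Roll over to the same day of the following month
        if now.month == 12:
            return candidate.replace(year=now.year + 1, month=1)
        return candidate.replace(month=now.month + 1)
    raise ValueError(f"unknown schedule type: {kind}")
```

For example, `next_run(datetime(2009, 2, 4, 12, 0), "daily", at=time(3, 0))` gives 5 February 2009 at 03:00, because the 03:00 slot has already passed on the 4th.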

Data storage parameters *

• Identify format/type of storage *
• File prefix for the data provider *

List of defined data providers

• List, delete or modify the parameters for a data provider
• Trace log for each data provider

List of Data providers defined for harvesting

Scheduler / status of the harvesting

As for topic Two

Define a Data Provider for harvesting

List of Data providers expanded for delete or modify

Statistics: Trace log

Statistics: Trace log

Results from the harvesting/Trace logs

Structure of the result XML files

• Ordered by data provider
• by format
• by subset

Result file from FAOBIB harvesting

Management of the harvesting

• Status (active/not active)
• Management of errors
• Statistics kept in the MySQL database, including:
  • the last range harvested;
  • the date of the last harvest, used as the starting point for the next harvest;
  • number of records harvested;
  • name of the XML files generated
• Administration
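The role of the stored statistics in incremental harvesting can be sketched as follows. This is an illustration under stated assumptions: the table layout, the function names and the file name in the usage note are hypothetical, and SQLite from the Python standard library stands in for the MySQL database so that the example is self-contained.

```python
import sqlite3


def open_stats():
    """Create an in-memory statistics table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE harvest_stats (
               provider    TEXT,     -- data provider identifier
               range_from  TEXT,     -- start of harvested range, YYYY-MM-DD
               range_until TEXT,     -- end of harvested range, YYYY-MM-DD
               records     INTEGER,  -- number of records harvested
               xml_file    TEXT      -- name of the generated XML file
           )"""
    )
    return db


def record_harvest(db, provider, range_from, range_until, records, xml_file):
    """Store the statistics of one completed harvest."""
    db.execute(
        "INSERT INTO harvest_stats VALUES (?, ?, ?, ?, ?)",
        (provider, range_from, range_until, records, xml_file),
    )


def next_from(db, provider, default_start):
    """Return the `from` date for the next harvest of a provider.

    ISO dates sort correctly as text, so MAX() finds the latest one.
    Falls back to the configured start date if nothing was harvested yet.
    """
    row = db.execute(
        "SELECT MAX(range_until) FROM harvest_stats WHERE provider = ?",
        (provider,),
    ).fetchone()
    return row[0] or default_start
```

After `record_harvest(db, "faobib", "2000-01-01", "2009-02-04", 120, "faobib_0001.xml")`, a call to `next_from(db, "faobib", "2000-01-01")` returns `"2009-02-04"`. Since OAI-PMH date ranges are inclusive, restarting from that date may fetch some records again, which is safer than skipping records added later the same day.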

What was done until now:

• Harvester developed (shown to the group)
• Testing with more than 15 different repositories (SciELO, Orton Library, FAOBIB, BIBSYS, National Library of Portugal, hosted WEBAGRIS databases (Uruguay, Peru))
• Fixing of bugs and implementing many new FAO requirements (or changes)
• Full documentation and installation package available

List of additional works done:

• Error handling: in case of bad AGRIS AP XML, the process should stop after the third attempt that produces empty XML
• Adding “monthly” as a possible harvesting period in the scheduler
• Changing the RDBMS that keeps the statistics to MySQL
• Introducing login and password
• Enabling changes to the path for the XML files
• Adding the number of records harvested to the initial display of the DP
• Additional modifications of the menus
• Adding additional parameters (CC, name, acronym etc.) for the data provider, taken from the mdb for AGRIS data providers
• Changing the naming of the produced output files to include the center code
• Cleaning of the OAI part and the wrong namespaces in the XML result
• Adding an activate/deactivate function
• Improvement of the statistics
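The error-handling rule in the first item (stop after the third attempt that produces empty XML) could look roughly like the Python sketch below. It is an assumption about the behaviour, not the harvester's code: `harvest_with_retries` and `fetch` are hypothetical names, and "empty XML" is simplified to a missing or blank response body.

```python
def harvest_with_retries(fetch, max_attempts=3):
    """Return the first non-empty response body, or None.

    `fetch` is a hypothetical callable that performs one harvest attempt
    and returns the resulting XML as a string. After `max_attempts`
    empty results the process gives up instead of retrying forever.
    """
    for _attempt in range(max_attempts):
        body = fetch()
        if body and body.strip():
            return body
    return None
```

A caller would treat a `None` result as a failed harvest for that data provider and record it in the trace log.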

Testing and implementation

• Testing: installation at FAO (on the commonly accessible server GILS09) for further testing
• Creation of distribution package and documentation
• Presenting to the management and other colleagues in FAO
• Installation on another server, or just redirecting the output to the existing directory for AGRIS production
• Mechanism for inclusion in the AGRIS production cycle
• Troubleshooting for OAI-PMH repositories

Summary / Conclusions

• The goal of the harvester
• Benefits for AGRIS
• Possibility to use it with other FAO OA projects
• Future implementation and use in-house and by our partners

What next

• Help AGRIS centres to install the OAI-PMH plug-in and expose it outside the firewall
• Facilitating hosting services for some data providers
• Installing the harvester at other aggregators, from AGRIS harvesting to AGRIS portal
• Follow-up actions

Close

• New way of organizing AGRIS harvesting
• It is not a user interface but a scheduler
• Not a search interface
• Its success depends on the quality of the data exported by the OAI-PMH plug-in

Thank you