BioCASE web services for germplasm data sets, at FAO, Rome (2006)

FAO, Rome, March 2nd 2006, Dag Endresen, NGB, IPGRI. Sharing of biodiversity data with Web Services: Demonstration of BioCASE

Uploaded by dag-endresen on 07-Dec-2014


DESCRIPTION

Sharing of biodiversity data with web services: demonstration of the BioCASE software, at the Food and Agriculture Organization of the United Nations (FAO), 2nd March 2006.

TRANSCRIPT

Page 1: BioCASE web services for germplasm data sets, at FAO, Rome (2006)

FAO, Rome, March 2nd 2006, Dag Endresen, NGB, IPGRI

Sharing of biodiversity data with Web Services

Demonstration of BioCASE

Page 2

Sharing of biodiversity data with BioCASE, March 2, 2006, FAO, Rome

TOPICS

Biodiversity data

Data Standards

Data exchange tools

The BioCASE data provider software

Decentralized data network

Page 3

Biodiversity collections data

Different kinds of biodiversity collection data describe very similar data objects:

Preserved reference collections, such as those in museums and herbaria.

Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections.

Data collections, from surveys of objects in the field, such as observations.

These collections have most of their attributes in common, although the terminology used to describe them may differ substantially. [http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm]

Page 4

Germplasm data, seed genebanks

Germplasm genebanks are biodiversity collections.

Collection-level data: metadata about genebank institutes and the germplasm collections they hold.

Unit-level data: the unit-level data for germplasm collections are the accessions. Genebank accessions have most of the same properties and attributes as other biodiversity specimens.
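To make the two levels concrete, here is a minimal Python sketch (Python being the language BioCASE is written in). The class and field names are hypothetical illustrations, not MCPD or ABCD element names: a collection-level record describes the genebank, and each unit-level record is one accession.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Accession:
    """Unit-level data: one genebank accession (hypothetical field names)."""
    accession_number: str
    genus: str
    species: str
    origin_country: Optional[str] = None  # e.g. an ISO 3166 country code


@dataclass
class GermplasmCollection:
    """Collection-level data: metadata about a genebank and its holdings."""
    institute_code: str
    institute_name: str
    accessions: List[Accession] = field(default_factory=list)

    def genera(self) -> List[str]:
        """Distinct genera held in the collection, sorted alphabetically."""
        return sorted({a.genus for a in self.accessions})


genebank = GermplasmCollection("XYZ001", "Example genebank")  # made-up institute
genebank.accessions.append(Accession("A-1", "Hordeum", "vulgare", "NOR"))
genebank.accessions.append(Accession("A-2", "Avena", "sativa"))
print(genebank.genera())  # ['Avena', 'Hordeum']
```

Keeping accessions inside the collection object mirrors the slide's point: the same accession structure can be reused whether the holder is a seed bank, a herbarium or a botanical garden.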

Page 5

Data Standards

Page 6

Crop Descriptors

The IPGRI crop descriptor lists (like those of other crop networks) are developed to meet the specific needs of individual crops.

The MCPD (FAO/IPGRI Multi-Crop Passport Descriptors) is designed to be compatible with the IPGRI crop-specific descriptor lists and with the FAO World Information and Early Warning System (WIEWS).

The MCPD descriptor list is compatible with ABCD (2.06).

Page 7

Taxonomic Database Working Group

Standards development and maintenance

Darwin Core 2: "Element definitions designed to support the sharing and integration of primary biodiversity data" [http://darwincore.calacademy.org/]

Access to Biological Collection Data (ABCD) 2.06: "An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)" [http://www.bgbm.org/TDWG/CODATA/Schema/]

Page 8

ABCD: Access to Biological Collection Data

ABCD is a common data specification for data on biological specimens and observations (including plant genetic resources seed banks).

The design goal is to be both comprehensive and general (about 1200 elements).

Development of ABCD started after the 2000 TDWG meeting.

ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE, and GBIF.

The MCPD descriptor list is now completely mapped to, and compatible with, ABCD 2.06.

[http://www.bgbm.org/TDWG/CODATA/Schema/]
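A descriptor-to-ABCD mapping of this kind can be sketched as a lookup table. This is an illustration under stated assumptions: only five well-known MCPD descriptor codes are included, and the ABCD 2.06-style paths were written for this example; they are not the published MCPD-to-ABCD mapping, which should be consulted for real use. The function name `mcpd_to_abcd` is hypothetical.

```python
# Illustrative subset only. The MCPD descriptor codes are real, but these
# ABCD 2.06-style paths were written for this example and are NOT the
# published MCPD-to-ABCD mapping.
UNIT = "/DataSets/DataSet/Units/Unit"
BOTANICAL = (UNIT + "/Identifications/Identification/Result/TaxonIdentified"
             "/ScientificName/NameAtomised/Botanical")

MCPD_TO_ABCD = {
    "INSTCODE": UNIT + "/SourceInstitutionID",
    "ACCENUMB": UNIT + "/UnitID",
    "GENUS": BOTANICAL + "/GenusOrMonomial",
    "SPECIES": BOTANICAL + "/FirstEpithet",
    "ORIGCTY": UNIT + "/Gathering/Country/ISO3166Code",
}


def mcpd_to_abcd(record):
    """Split an MCPD record (dict) into ({ABCD path: value}, {unmapped: value}).

    Unmapped descriptors are returned rather than dropped, so gaps in the
    mapping stay visible."""
    mapped, unmapped = {}, {}
    for descriptor, value in record.items():
        path = MCPD_TO_ABCD.get(descriptor)
        if path is None:
            unmapped[descriptor] = value
        else:
            mapped[path] = value
    return mapped, unmapped


mapped, unmapped = mcpd_to_abcd({"ACCENUMB": "A-1", "GENUS": "Hordeum", "SAMPSTAT": "300"})
print(unmapped)  # {'SAMPSTAT': '300'}
```

Returning the unmapped descriptors separately is a design choice for auditing a mapping: a "completely mapped" list should leave that dictionary empty for every record.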

Page 9

PGR sub-unit of ABCD

[Figure: the PGR (plant genetic resources) sub-unit of ABCD]

Page 10

Generation Challenge Program: GCP_Passport_1.03

The GCP Passport data exchange schema was developed within the GCP (Generation Challenge Program).

Similar XML schemas are under development for phenotype (trait) data and genotype data.

Page 11

Demo Data Portal

A demo data portal was developed, providing live access to selected BioCASE data providers.

[http://geifir.ngb.se/abcdproto/default.jsp]

Page 12

Create your own BioCASE data schema

Create an XML schema (xsd file) of your data model and publish the schema online (http://...).

Create a Concept Mapping Configuration (CMF) file from the XML schema [http://ww3.bgbm.org/biocase/utilities/process_schema.html] (or use your own BioCASE installation: ... /utilities/process_schema.html).

Save the resulting XML (CMF file) into the cmf folder of your BioCASE installation to make it available for local mapping: .../biocase/configuration/templates/cmf/cmf_your-preferred-file-name.xml

Visit [http://ww3.bgbm.org/bps2/GenerateCmFiles] for more information.
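The first step, describing your data model as an XML schema, can be sketched with Python's standard `xml.etree.ElementTree` module. The function name `make_schema`, the `Record` element and the namespace URL are hypothetical; the output is a deliberately minimal XSD (every field optional and string-typed), a starting point rather than something the schema-processing utility is guaranteed to accept unchanged.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize schema tags as xs:...


def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


def make_schema(root_name, fields, target_ns):
    """Build a minimal XSD: a root element containing repeated <Record>
    elements, each with one optional, string-typed child per field name."""
    schema = ET.Element(xs("schema"), {"targetNamespace": target_ns,
                                       "elementFormDefault": "qualified"})
    root = ET.SubElement(schema, xs("element"), name=root_name)
    root_seq = ET.SubElement(ET.SubElement(root, xs("complexType")), xs("sequence"))
    record = ET.SubElement(root_seq, xs("element"), name="Record", maxOccurs="unbounded")
    rec_seq = ET.SubElement(ET.SubElement(record, xs("complexType")), xs("sequence"))
    for name in fields:
        ET.SubElement(rec_seq, xs("element"), name=name, type="xs:string", minOccurs="0")
    return ET.tostring(schema, encoding="unicode")


# Hypothetical data model and namespace; save the output as an .xsd file
# and publish it at a URL before processing it into a CMF.
print(make_schema("Accessions", ["AccessionNumber", "Genus", "Species"],
                  "http://example.org/my-genebank"))
```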

Page 13

Biodiversity informatics data exchange tools

Page 14

Data Provider Software

A distributed network of data providers makes it possible to retrieve structured data from multiple, distributed, heterogeneous databases across the Internet.

DiGIR, Distributed Generic Information Retrieval. [http://digir.net]

BioCASE, The Biological Collection Access Service for Europe.

[http://www.biocase.org/]
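The distributed query pattern behind such networks can be sketched as a parallel fan-out. In this hypothetical Python sketch each provider is represented by a plain function; in a real network it would wrap an HTTP request to a remote provider's web service. The function `query_all` and the stand-in providers are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(providers, query, max_workers=4):
    """Send one query to every provider in parallel and merge the answers.

    `providers` maps a provider name to a callable that takes the query and
    returns a list of records. Results are tagged with the provider name;
    a provider that raises is recorded in `errors` instead of failing the
    whole search, as one would want when some servers are offline."""
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                results.extend((name, record) for record in future.result())
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors


# Stand-in providers (hypothetical); real ones would call remote web services.
def provider_a(query):
    return [r for r in ["Hordeum vulgare", "Avena sativa"] if query in r]


def provider_b(query):
    raise ConnectionError("provider offline")


hits, failed = query_all({"A": provider_a, "B": provider_b}, "Avena")
print(hits, failed)  # [('A', 'Avena sativa')] {'B': 'provider offline'}
```

Threads suit this case because the work is dominated by waiting on the network, and isolating failures per provider keeps one unreachable database from hiding everyone else's data.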

Page 15

Protocol integration - TAPIR

There is a need to integrate the current protocols in use by different biodiversity informatics community networks.

At the 2004 TDWG meeting, the unified protocol was presented and named TAPIR, the TDWG Access Protocol for Information Retrieval.

New BioCASE and DiGIR software will implement the TAPIR protocol.

Will TAPIR also help us to integrate GBIF with the BioMOBY community?

[http://ww3.bgbm.org/tapir]

Page 16

BioMOBY

BioMOBY is an international research project on methodologies for biological data representation, distribution, and discovery.

BioMOBY has been chosen as the web service framework for the Generation Challenge Program. [http://www.biomoby.org/]

Work is in progress to develop BioMOBY and BioCASE interoperability.

Page 17

BioCASE data provider software

BioCASE: Biological Collection Access Service for Europe

[http://www.biocase.org/]

Page 18

BioCASE establishes unified web-based access to biological collections in Europe while leaving control of the information with the collection holders.

ABCD is the main data definition used by BioCASE.

The software is designed to be generic: it can handle any schema and connect to any SQL-capable database.

BioCASE provides GBIF with full access to its registry, so being a BioCASE provider also means being a GBIF provider.

[http://www.biocase.org/]

BioCASE: Biological Collection Access Service for Europe
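The "generic" design can be illustrated by separating abstract schema concepts from local table columns. The sketch below is an assumption-laden illustration, not the BioCASE implementation: a hypothetical dictionary plays the role a concept mapping plays, the concept paths are shortened placeholders, and `sqlite3` stands in for other Python DB-API drivers (note that the `?` placeholder style differs between drivers).

```python
import sqlite3

# Hypothetical concept-to-column mapping, standing in for what a CMF does in
# BioCASE: abstract schema concepts on the left, local "table.column" on the right.
MAPPING = {
    "/Unit/UnitID": "accession.accenumb",
    "/Unit/Genus": "accession.genus",
    "/Unit/Species": "accession.species",
}


def search(conn, concept, value, wanted):
    """Answer 'concept = value' using local columns; return {concept: value} rows.

    Table and column names come only from MAPPING (trusted configuration);
    the user-supplied value is passed as a bound parameter."""
    columns = ", ".join(MAPPING[c] for c in wanted)
    column = MAPPING[concept]
    table = column.split(".", 1)[0]
    sql = f"SELECT {columns} FROM {table} WHERE {column} = ?"  # sqlite3 paramstyle
    return [dict(zip(wanted, row)) for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accession (accenumb TEXT, genus TEXT, species TEXT)")
conn.executemany("INSERT INTO accession VALUES (?, ?, ?)",
                 [("A-1", "Hordeum", "vulgare"), ("A-2", "Avena", "sativa")])
print(search(conn, "/Unit/Genus", "Avena", ["/Unit/UnitID", "/Unit/Species"]))
# [{'/Unit/UnitID': 'A-2', '/Unit/Species': 'sativa'}]
```

Because queries are expressed in concepts, a client never needs to know the provider's table layout; only the mapping changes from one database to the next.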

Page 19

BioCASE [http://www.biocase.org/]

BioCASE runs on MS Windows, Mac OS X, Linux, BSD, Solaris...

BioCASE works with many different databases: PostgreSQL, MySQL, Oracle, MS Access, MS SQL Server, ...

BioCASE works with Unicode: אבדו ضاإطقكغب ששچپچ

BioCASE is open source

BioCASE is developed in the Python programming language

CVS

Page 20

Distributed BioCASE network

Page 21

BioCASE protocol stack

[Figure: BioCASE protocol stack. Client domain: the client sends a query using ABCD concepts as BioCASE protocol XML over HTTP across the Internet. Provider domain: PyWrapper (XML/CGI), configured by XML files (PSF, CMF) and the ABCD schema, queries the unit data provider's database with SQL and returns a BioCASE protocol XML response containing ABCD data.]

Page 22

Required configuration:

Web server: Any CGI compliant web server: Apache, IIS, etc.

Database: all major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In principle, any database with a Python library should work.

Python: BioCASE is developed in the Python programming language; install version 2.3 or later.

[http://ww3.bgbm.org/bps2/DocumentationToc]

[http://www.biocase.org/products/provider_software/index.shtml]

BioCASE Provider Software v2.3.1

Page 23

Download the provider software and unzip the archive file [provider_software_2.3.1.tar.gz]

For example, uncompress it into [C:\biocase\]. Configure your web server to publish the www folder, for example so that [C:\biocase\] is accessible through [http://localhost/biocase/].

Download and install the latest Python software [http://www.python.org/download/]

Execute the [C:\biocase\setup.py] script. On a UNIX-like system:

%> cd biocase

%> python setup.py

Test your installation [http://localhost/biocase]

BioCASE installation

[http://ww3.bgbm.org/bps2/Installation]

Page 24

Install third-party software [http://localhost/biocase/utilities/testlibs.cgi]

Follow the links from the Library test page.

After successful installation, the "installed version" column displays the version that was installed.

BioCASE

To update the BioCASE software:

Download the new release, unzip it to a temporary folder, then execute setup.py and follow the instructions.
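The library test page reports installed third-party libraries. A rough stand-alone check in the same spirit, which only tests whether a module can be imported, could look like the sketch below; the module names in `CANDIDATES` are common Python database driver modules chosen for illustration, not BioCASE's own library list.

```python
import importlib.util

# Illustrative module names of common Python database drivers; this is not
# BioCASE's own library list.
CANDIDATES = ["sqlite3", "MySQLdb", "psycopg2", "cx_Oracle", "pyodbc"]


def check_libraries(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_libraries(CANDIDATES).items():
    print(f"{name:10} {'installed' if ok else 'missing'}")
```

`find_spec` locates a module without importing it, so the check has no side effects even for drivers that would fail at import time for other reasons.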

Page 25

After successful installation you will need to configure your data provider. Follow the instructions in the BioCASE documentation to configure:

Data sources. If you provide several datasets or databases, each is configured as an individual data source.

Database connection, so that the software can access your database.

Database structure. Define the relevant tables, the primary keys and foreign keys.

Data model. Map your database model to the standard represented by the XML Schemas you choose.

BioCASE configuration

[http://ww3.bgbm.org/bps2/Configuration]
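The four configuration items can be pictured as one data structure. The following Python sketch is hypothetical (BioCASE stores its configuration in its own XML files, whose format is not reproduced here); it shows how declared tables, keys and mappings relate, and how a simple consistency check can catch mappings to undeclared tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str
    primary_key: str
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> referenced table


@dataclass
class DataSource:
    """One data source: connection, database structure and data model mapping.
    (Hypothetical structure; BioCASE keeps this in its own XML config files.)"""
    name: str
    connection: Dict[str, str]   # e.g. driver, host, database name
    tables: List[Table]
    mapping: Dict[str, str]      # schema concept -> "table.column"

    def problems(self) -> List[str]:
        """Report foreign keys and mappings that point at undeclared tables."""
        known = {t.name for t in self.tables}
        errors = []
        for t in self.tables:
            for column, ref in t.foreign_keys.items():
                if ref not in known:
                    errors.append(f"{t.name}.{column} references unknown table {ref}")
        for concept, target in self.mapping.items():
            table = target.split(".", 1)[0]
            if table not in known:
                errors.append(f"{concept} is mapped to unknown table {table}")
        return errors


ds = DataSource(
    name="my_genebank",
    connection={"driver": "sqlite3", "database": "genebank.db"},
    tables=[Table("accession", "id", {"taxon_id": "taxon"})],
    mapping={"/Unit/UnitID": "accession.accenumb", "/Unit/Genus": "taxon.genus"},
)
print(ds.problems())
```

Here the `taxon` table is referenced but never declared, so both the foreign key and the genus mapping are reported; declaring the table clears both.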

Page 26

Example of a service request

All exchanged data is formatted with XML tags.
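Since requests are plain XML, a search request can be built with the standard library. The namespace URI and element names below (`request`, `header`, `type`, `search`, `requestFormat`, `responseFormat`, `filter`, `like`, `count`) are modelled on BioCASE protocol 1.3 as an assumption and should be checked against the protocol schema; the function name and filter values are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace and element names modelled on BioCASE protocol 1.3;
# verify against the protocol schema before relying on them.
PROTOCOL_NS = "http://www.biocase.org/schemas/protocol/1.3"
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"
ET.register_namespace("", PROTOCOL_NS)  # emit the protocol namespace as default


def p(tag):
    """Qualify a tag with the (assumed) protocol namespace."""
    return f"{{{PROTOCOL_NS}}}{tag}"


def search_request(path, pattern, start=0, limit=10):
    """Build a search request for records whose `path` matches `pattern`."""
    request = ET.Element(p("request"))
    header = ET.SubElement(request, p("header"))
    ET.SubElement(header, p("type")).text = "search"
    search = ET.SubElement(request, p("search"))
    ET.SubElement(search, p("requestFormat")).text = ABCD_NS
    response_format = ET.SubElement(search, p("responseFormat"),
                                    start=str(start), limit=str(limit))
    response_format.text = ABCD_NS
    condition = ET.SubElement(ET.SubElement(search, p("filter")), p("like"), path=path)
    condition.text = pattern
    ET.SubElement(search, p("count")).text = "false"
    return ET.tostring(request, encoding="unicode")


print(search_request("/DataSets/DataSet/Units/Unit/UnitID", "A-*"))
```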

Page 27

Example of a service response
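On the client side, a response can be processed with any XML parser. This sketch extracts every `UnitID` regardless of namespace; the sample response is a hand-made, heavily trimmed illustration with assumed namespaces, not real provider output.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def unit_ids(response_xml):
    """Return the text of every UnitID element, whatever its namespace."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter() if local(el.tag) == "UnitID"]


# Hand-made, trimmed sample; namespaces are assumptions, not real provider output.
SAMPLE = """<response xmlns="http://www.biocase.org/schemas/protocol/1.3">
  <content>
    <DataSets xmlns="http://www.tdwg.org/schemas/abcd/2.06">
      <DataSet><Units>
        <Unit><UnitID>A-1</UnitID></Unit>
        <Unit><UnitID>A-2</UnitID></Unit>
      </Units></DataSet>
    </DataSets>
  </content>
</response>"""

print(unit_ids(SAMPLE))  # ['A-1', 'A-2']
```

Matching on local names keeps the sketch independent of namespace versions; a stricter client would match fully qualified names instead.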

Page 28

TAPIR

TAPIR will offer you more advanced request formats.
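TAPIR also allows simple key-value-pair (KVP) requests sent as URL parameters. A minimal sketch of building such a URL follows. It assumes the operation is passed in an `op` parameter and that the five operations are ping, capabilities, metadata, inventory and search; the helper name is hypothetical, and any further parameter names are left to the caller and should be taken from the TAPIR specification.

```python
from urllib.parse import urlencode

# ASSUMPTION: TAPIR KVP requests carry the operation in an "op" parameter;
# check parameter names against the TAPIR specification.
TAPIR_OPERATIONS = {"ping", "capabilities", "metadata", "inventory", "search"}


def tapir_url(endpoint, op, **params):
    """Build a KVP request URL for a TAPIR endpoint (hypothetical helper)."""
    if op not in TAPIR_OPERATIONS:
        raise ValueError(f"unknown TAPIR operation: {op}")
    return endpoint + "?" + urlencode({"op": op, **params})


print(tapir_url("http://example.org/tapir.cgi/mydatasource", "ping"))
# http://example.org/tapir.cgi/mydatasource?op=ping
```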

Page 29

TAPIR service request

TAPIR will offer you more advanced request formats.

Page 30

TAPIR service response

singer:/sourcename
singer:/taxonomy/genus
singer:/taxonomy/species
singer:/taxonomy/subspecies
singer:/holding/ID
singer:/holding/name
singer:/origin/collecting/countrysource
singer:/origin/collecting/countrysourceID
singer:/status/biologicalstatus
singer:/status/biologicalstatusID
...
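The concepts in this response are hierarchical paths under the `singer:` prefix. A small sketch that turns a flat record keyed by such paths into nested dictionaries follows; the function name `nest` and the sample values are hypothetical.

```python
def nest(record):
    """Turn {'singer:/taxonomy/genus': 'Oryza', ...} into nested dicts,
    e.g. {'taxonomy': {'genus': 'Oryza'}}; the 'singer:' prefix is dropped."""
    tree = {}
    for concept, value in record.items():
        _, sep, path = concept.partition(":/")
        if not sep:
            raise ValueError(f"not a prefixed concept path: {concept!r}")
        *parents, leaf = path.split("/")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


print(nest({
    "singer:/taxonomy/genus": "Oryza",
    "singer:/taxonomy/species": "sativa",
    "singer:/origin/collecting/countrysource": "IND",
}))
```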

Page 31

Decentralized data network with web services

Page 32

Data warehouse model (slide by Samy Gaiji, IPGRI)

Page 33

Decentralized model (slide by Samy Gaiji, IPGRI)

Page 34

Data flow from genebanks to EURISCO and ECCDBs

Page 35

Decentralized model

Page 36

Genebanks on BioCASE

The BioCASE data provider software was implemented at almost all CGIAR germplasm centers during autumn 2005.

Several other genebanks have installed the GBIF web service technology: Nordic Gene Bank, IPK Gatersleben, IHAR (DiGIR), USDA GRIN and CGN, with more to follow soon.

Page 37

Germplasm data indexing tools

We are building data indexing methodologies for access to germplasm data with BioCASE.

This work is planned as the basis for a Germplasm Clearing House Mechanism.

Development is in cooperation with GBIF, which indexes basic biodiversity data using a similar approach.

[http://chm.grinfo.net/index.php]
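The indexing idea can be sketched as an inverted index built from records harvested from several providers. The record layout (`genus`, `accession`) and the function name are hypothetical; a production index would also need to track harvest dates, handle updates and persist to a database.

```python
from collections import defaultdict


def build_index(harvest):
    """Build a genus -> [(provider, accession number)] index.

    `harvest` maps each provider name to the records harvested from it;
    genus keys are lower-cased so lookups are case-insensitive."""
    index = defaultdict(list)
    for provider, records in harvest.items():
        for record in records:
            index[record["genus"].lower()].append((provider, record["accession"]))
    return dict(index)


index = build_index({
    "genebank-1": [{"genus": "Hordeum", "accession": "A-1"}],
    "genebank-2": [{"genus": "hordeum", "accession": "B-7"},
                   {"genus": "Avena", "accession": "B-8"}],
})
print(index["hordeum"])  # [('genebank-1', 'A-1'), ('genebank-2', 'B-7')]
```

The index only points back to the providers; the full records stay with the genebanks, which matches the decentralized model.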

Page 38

BioCASE and germplasm data [http://chm.grinfo.net/index.php?app=data_providers]

Page 39

Globally Unique Identifiers (GUIDs), such as LSIDs (Life Science Identifiers) [http://lsid.sourceforge.net/]

Biodiversity informatics workflow tools (BioMOBY and Taverna, Kepler and SEEK...)

Germplasm Clearing House Mechanism [http://chm.grinfo.net/]

TAPIR

Works in progress
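LSIDs are URNs of the form urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal Python parser for that form follows as a sketch; it does not implement LSID resolution, and the example identifiers are made up.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts.
    The 'urn:lsid:' prefix is matched case-insensitively."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


print(parse_lsid("urn:lsid:example.org:accessions:A-1:2"))
```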

Page 40

Thank you for listening!