Science Environment for Ecological Knowledge: Ecogrid Interfaces
Dave Vieglais
The Natural History Museum and Biodiversity Research Center
University of Kansas
Science Environment for Ecological Knowledge
Research Objectives
Access to ecological and environmental data
• Enable data sharing & re-use
• Enhance data discovery at global scales
Scalable analysis and synthesis
• Taxonomic, spatial, temporal, and conceptual integration of data
• Enable communication and collaboration for analysis
• Address data heterogeneity issues
• Enable re-use of analytical components
Informatics Challenges for SEEK
Data is heterogeneous
• Syntax, schema, semantics
• From many disciplines: biodiversity surveys, hydrology, atmospheric chemistry, spatial data, behavioral experiments, …
• Data on economics, demographics, legal issues, …
Data is distributed
SEEK Components
EcoGrid
• Ecological, biodiversity and environmental data
• Computational access
Analysis and Modeling System
• Modeling scientific workflows
Semantic Mediation System
• “Smart” data discovery
• Knowledge-based data integration
• Knowledge-based analysis integration
Knowledge Representation
• Ontologies for describing ecology
Building the EcoGrid
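[Figure: diagram of EcoGrid nodes. Labels include the sites AND, SEV, LUQ, VCR, HBR, NTL, NRS, PISCO1, PISCO2, and OBFS and the centers SDSC, NET, KU, and NCEAS; node types shown are Metacat node, Site node, SRB node, and DiGIR node.]
Networks in the figure legend:
• LTER Network (24)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)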
SEEK EcoGrid
• Integrate diverse data networks from ecology, biodiversity, and environmental sciences: Metacat, DiGIR, SRB, Xanthoria, ...
• EML is the core for data documentation
• Access to computational resources via the Grid (OGSA)
Ecological Metadata Language (EML)
Metadata: a means to manage ecological data
• There is no universal data model for ecology
• Accommodate heterogeneity and dispersion
EML
• Discovery information: Creator, Title, Abstract, Keyword, etc.
• Coverage: geographic, temporal, and taxonomic extent
• Logical and physical data structure
• Data semantics via unit definitions and typing
• Protocols and methods
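The EML categories above map naturally onto a nested document. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming element names modeled on EML 2.x (`dataset`, `creator`, `keywordSet`, `coverage`, `boundingCoordinates`, `taxonomicClassification`). It omits required pieces such as the `eml:eml` root with its `packageId` and `system` attributes and `geographicDescription`, so treat it as an illustration of the discovery and coverage structure, not valid EML. The function name `build_eml_sketch` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_eml_sketch(title, surname, abstract, keywords, bbox, taxon):
    """Build a minimal EML-style <dataset> element.

    bbox is (west, east, north, south) in decimal degrees and taxon is a
    (rank name, rank value) pair. Element names are modeled on EML 2.x,
    but the result is NOT a schema-valid EML document.
    """
    dataset = ET.Element("dataset")

    # Discovery information: title, creator, abstract, keywords
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = surname
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for kw in keywords:
        ET.SubElement(keyword_set, "keyword").text = kw

    # Coverage: geographic and taxonomic extent
    coverage = ET.SubElement(dataset, "coverage")
    geo = ET.SubElement(coverage, "geographicCoverage")
    bounds = ET.SubElement(geo, "boundingCoordinates")
    for side, value in zip(("west", "east", "north", "south"), bbox):
        ET.SubElement(bounds, f"{side}BoundingCoordinate").text = str(value)
    taxa = ET.SubElement(coverage, "taxonomicCoverage")
    cls = ET.SubElement(taxa, "taxonomicClassification")
    ET.SubElement(cls, "taxonRankName").text = taxon[0]
    ET.SubElement(cls, "taxonRankValue").text = taxon[1]
    return dataset
```

`ET.tostring(build_eml_sketch(...), encoding="unicode")` serializes the result for inspection.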
DiGIR Overview
• DiGIR = Distributed Generic Information Retrieval
• A DiGIR client may communicate with any number of data providers
• A DiGIR data provider may expose any number of resources (databases)
• A DiGIR resource is a collection of objects described by a single federation schema
[Figure: DiGIR Client, DiGIR Provider, and Data Resource, linked with 1..n multiplicities]
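The client/provider/resource relationship above can be modeled with a few classes. A minimal in-memory sketch: real DiGIR exchanges XML messages over HTTP, which this omits, and the class names, the `search` method, and the dict-based records are hypothetical. It shows the two key rules: a client fans out to many providers, and only resources sharing the requested federation schema can answer.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A collection of records described by ONE federation schema."""
    name: str
    schema: str       # federation schema URI, e.g. the Darwin Core namespace
    records: list     # each record: dict keyed by schema concept


@dataclass
class Provider:
    """A provider may expose any number of resources."""
    url: str
    resources: list = field(default_factory=list)

    def search(self, schema, concept, value):
        hits = []
        for res in self.resources:
            # Only resources sharing the requested federation schema can answer
            if res.schema != schema:
                continue
            hits += [(res.name, r) for r in res.records if r.get(concept) == value]
        return hits


class Client:
    """A client may communicate with any number of providers."""

    def __init__(self, providers):
        self.providers = list(providers)

    def search(self, schema, concept, value):
        # Send the same query to every provider and merge the results,
        # tagging each hit with the provider it came from
        results = []
        for p in self.providers:
            results += [(p.url, name, rec)
                        for name, rec in p.search(schema, concept, value)]
        return results
```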
EcoGrid Interfaces
• Registry: resolves references to objects (interface definitions, data structures, service instances)
• Session: authentication; details on session information; coarse granularity of resource restriction
• Query: search and retrieve metadata and data; different levels of “conformance”; low bar for participation in SEEK
• Taxon: system to reduce ambiguity in scientific names, commonly used to address synonymy
• SMS (Semantic Mediation System): mechanism for relating and resolving data and metadata concepts
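The Taxon interface's job of reducing name ambiguity can be illustrated with normalization plus a synonym lookup. A sketch only, assuming a simple in-memory table: the `SYNONYMS` entries are invented placeholder names, not real taxonomic opinions, and a real taxon service would draw on curated checklists and handle authorship, rank, and conflicting concepts. Uppercase variants such as `PEROMYSCUS LEUCOPUS NOVEBORACENSIS` in the result set example later in this deck are the kind of input normalization addresses.

```python
import re

# Hypothetical synonym table: normalized name -> accepted name.
# The entries are placeholders, not real taxonomic opinions.
SYNONYMS = {
    "examplea vetus": "Examplea nova",
    "examplea antiqua": "Examplea nova",
}


def normalize(name):
    """Trim, collapse internal whitespace, and lowercase a scientific name."""
    return re.sub(r"\s+", " ", name.strip()).lower()


def resolve(name):
    """Map a name to its accepted form; unknown names come back in
    conventional capitalization (genus capitalized, epithets lowercase)."""
    key = normalize(name)
    return SYNONYMS.get(key, key.capitalize())
```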
EcoGrid Query Interfaces
• Provides a mechanism for search and retrieval of metadata and federated data
• Supports third-party interaction with search results: result set identifiers can be forwarded to another service instance for retrieval
• Different levels of compliance: a low barrier for participation, with the bulk of data accessible through Type I
Query Interfaces Implemented
• Initial requirement to support query and retrieval from SRB, Metacat, DiGIR, and Xanthoria
• Federated data sets that subscribe to a small set of federation schemas
EcoGrid Query Level I
Basic, entry level exposure of data and metadata for EcoGrid and SEEK
Response contains data, intended for direct communication rather than third-party indirection
ResultsetType query(SessionID,QueryType)
byte[] get(SessionID,objectID)
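The two Level I operations can be mimicked with an in-memory service to show the "response contains data" style: results come back in the reply itself rather than as handles. This is a sketch, not the EcoGrid API: the Python method names, the `open_session` helper, and the use of a Python predicate in place of an XML `QueryType` are all assumptions.

```python
import uuid


class LevelOneService:
    """Toy in-memory stand-in for EcoGrid Query Level I.

    query() returns matching data directly in the response (no handles);
    get() returns one object's raw bytes.
    """

    def __init__(self, objects):
        self.objects = dict(objects)   # object_id -> bytes
        self.sessions = set()

    def open_session(self):
        # Hypothetical helper standing in for the separate Session interface
        session_id = str(uuid.uuid4())
        self.sessions.add(session_id)
        return session_id

    def _check(self, session_id):
        if session_id not in self.sessions:
            raise PermissionError("unknown session")

    def query(self, session_id, predicate):
        """predicate(object_id, data) -> bool replaces the XML QueryType here."""
        self._check(session_id)
        return [(oid, data) for oid, data in sorted(self.objects.items())
                if predicate(oid, data)]

    def get(self, session_id, object_id):
        self._check(session_id)
        return self.objects[object_id]
```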
Query Example
<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
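Because the query is plain XML, a client can assemble it with a standard library. A sketch using Python's `xml.etree.ElementTree` that rebuilds the shape of the query example with a single condition; the helper name `build_query` is hypothetical, and the `xsi:schemaLocation` attribute is omitted for brevity.

```python
import xml.etree.ElementTree as ET

EGQ = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
DARWIN = "http://digir.net/schema/conceptual/darwin/2003/1.0"
# Serialize the query namespace with the "egq" prefix used in the example
ET.register_namespace("egq", EGQ)


def build_query(query_id, system, title, return_fields, concept, operator, value):
    """Build an <egq:query> element with one condition.

    Child elements are left unqualified, matching the example.
    """
    q = ET.Element(f"{{{EGQ}}}query", {"queryId": query_id, "system": system})
    ET.SubElement(q, "namespace", {"prefix": "darwin"}).text = DARWIN
    for path in return_fields:
        ET.SubElement(q, "returnfield").text = path
    ET.SubElement(q, "title").text = title
    cond = ET.SubElement(q, "condition", {"operator": operator, "concept": concept})
    cond.text = value
    return q
```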
Query Structure
Language-independent representation of a query structure
Transformed into the appropriate native language of the data store
Example:
<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus man%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
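The slide says the language-independent tree is transformed into the store's native query language. A sketch of one such transformation, to a parameterized SQL WHERE clause, assuming a relational store: the column names in `CONCEPT_TO_COLUMN`, the `EQUALS` operator, and `OR` support are assumptions (the slides show only `AND`, `LIKE`, and `NOT EQUALS`). The special case for `NULL` matters because SQL tests for missing values with `IS [NOT] NULL`, not with `=` or `<>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from federation-schema concepts to one local table's columns
CONCEPT_TO_COLUMN = {"ScientificName": "sci_name", "DecimalLatitude": "lat"}

# LIKE and NOT EQUALS appear in the slides; EQUALS is assumed by symmetry
OPERATORS = {"EQUALS": "=", "NOT EQUALS": "<>", "LIKE": "LIKE"}


def to_sql(node, params):
    """Translate a condition tree into a parameterized SQL WHERE fragment.

    Literal values are appended to params (for '?' placeholders), so user
    text never gets spliced into the SQL string.
    """
    if node.tag in ("AND", "OR"):
        parts = [to_sql(child, params) for child in node]
        return "(" + f" {node.tag} ".join(parts) + ")"
    if node.tag == "condition":
        column = CONCEPT_TO_COLUMN[node.get("concept")]
        op = node.get("operator")
        value = (node.text or "").strip()
        # SQL compares against NULL with IS [NOT] NULL, not = or <>
        if value == "NULL" and op in ("EQUALS", "NOT EQUALS"):
            return f"{column} IS NULL" if op == "EQUALS" else f"{column} IS NOT NULL"
        params.append(value)
        return f"{column} {OPERATORS[op]} ?"
    raise ValueError(f"unsupported element: {node.tag}")
```

The resulting fragment and `params` list can be passed together to any DB-API driver that uses `?` placeholders, such as `sqlite3`.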
Specifying the Resultset
Specify the list of concepts (fields) to be returned in the resultset
Simple paths used to identify elements or document subtrees
Effectively flattens the structure of the records, but allows generic representation
Example:
<returnfield>/ScientificName</returnfield>
<returnfield>/Longitude</returnfield>
<returnfield>/Latitude</returnfield>
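One reading of "simple paths … effectively flattens the structure" is sketched below: each return field selects an element by a slash-separated path under the record, matching local names so the `darwin:` prefix does not matter, and a selected subtree is flattened to its concatenated text. These path semantics and the function names are assumptions, not the EcoGrid specification.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def select(record, path):
    """Follow a simple '/A/B' path from the record root, matching local names only."""
    node = record
    for step in path.strip("/").split("/"):
        node = next((c for c in node if local(c.tag) == step), None)
        if node is None:
            return None
    return node


def project(record, return_fields):
    """Flatten one record to {path: text}; a subtree's text is concatenated."""
    row = {}
    for path in return_fields:
        node = select(record, path)
        row[path] = None if node is None else "".join(node.itertext()).strip()
    return row
```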
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
  </resultsetMetadata>
  <record number="1"
      system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
      identifier="mvz1"
      namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
      lastModifiedDate="2003-03-03T10:42:13"
      creationDate="2003-03-03T10:42:13">
    <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
    <darwin:Longitude>121</darwin:Longitude>
    <darwin:Latitude>33</darwin:Latitude>
  </record>
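As shown on the slide, the result set sample does not declare the `darwin` prefix and stops before the closing `</rs:resultset>`, so a standard XML parser would reject it verbatim. The sketch below parses a trimmed, well-formed variant (one record, `xmlns:darwin` added, `recordCount` adjusted) with `xml.etree.ElementTree`; the function name and the returned dictionary shape are assumptions.

```python
import xml.etree.ElementTree as ET

RS = "ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"


def parse_resultset(xml_text):
    """Return (metadata dict, list of record dicts) from an EcoGrid-style result set."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{RS}}}resultset":
        raise ValueError(f"not an EcoGrid result set: {root.tag}")
    # In the example, child elements are unqualified (no namespace)
    meta = {child.tag: child.text for child in root.find("resultsetMetadata")}
    records = []
    for rec in root.findall("record"):
        # Strip '{namespace}' so darwin:ScientificName becomes ScientificName
        fields = {child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
                  for child in rec}
        records.append({"identifier": rec.get("identifier"),
                        "system": rec.get("system"),
                        "fields": fields})
    return meta, records


SAMPLE = """<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:darwin="http://digir.net/schema/conceptual/darwin/2003/1.0">
  <resultsetMetadata>
    <startRecord>1</startRecord><endRecord>1</endRecord><recordCount>1</recordCount>
  </resultsetMetadata>
  <record number="1" identifier="mvz1"
      system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2">
    <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
    <darwin:Longitude>121</darwin:Longitude>
  </record>
</rs:resultset>"""
```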
EcoGrid Query Level II
More detailed handling of results
Uses RSIDs to identify result sets: handles that can be passed to a third party
Resultset retrieve(SessionID,RSID,start,numrecs)
RSID search(SessionID,query)
query decodeResultsetIdentifier(SessionID,RSID)
statusinfo getResultStatus(SessionID)
int transfer(SessionID,sourceURL,destURL,ObjectID)
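The Level II idea of a handle that a third party can page through can be sketched in memory. Assumptions: the query is a `{concept: value}` equality dict, RSIDs are random strings, `start` is 1-based (mirroring `startRecord` in the result set metadata), and session IDs are accepted but not validated. Only `search`, `retrieve`, and `decodeResultsetIdentifier` are modeled, under hypothetical Python names.

```python
import uuid


class LevelTwoService:
    """In-memory sketch of Level II: search() returns a result set handle
    (RSID) that any party holding it can page through with retrieve()."""

    def __init__(self, records):
        self.records = list(records)   # each record: dict keyed by concept
        self.result_sets = {}          # rsid -> (query, matching records)

    def search(self, session_id, query):
        # query: hypothetical {concept: value} dict of equality tests
        hits = [r for r in self.records
                if all(r.get(k) == v for k, v in query.items())]
        rsid = f"rsid-{uuid.uuid4().hex}"
        self.result_sets[rsid] = (query, hits)
        return rsid

    def retrieve(self, session_id, rsid, start, numrecs):
        """Return up to numrecs records starting at 1-based position start."""
        _, hits = self.result_sets[rsid]
        return hits[start - 1:start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id, rsid):
        """Recover the query that produced a result set."""
        return self.result_sets[rsid][0]
```

A client can call `search` under its own session and hand the returned RSID to another service, which then calls `retrieve` under a different session.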
EcoGrid Write
Used to push data back to sources (e.g. publishing EML documents)
Depends on the availability of an authentication system
put(sessionID, objectID, object, type)
delete(sessionID,objectID)
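Since writes depend on authentication, one simple reading is that `put` and `delete` refuse any session the authentication system has not approved. A minimal sketch; the `WritableStore` class, the pre-approved session set, and the boolean returned by `delete` are assumptions.

```python
class WritableStore:
    """Sketch of EcoGrid write operations gated by an authenticated session."""

    def __init__(self, authenticated_sessions):
        # Hypothetical: session IDs an external authentication system approved
        self.authenticated = set(authenticated_sessions)
        self.objects = {}              # object_id -> (type, payload)

    def _require_auth(self, session_id):
        if session_id not in self.authenticated:
            raise PermissionError("write requires an authenticated session")

    def put(self, session_id, object_id, obj, obj_type):
        self._require_auth(session_id)
        self.objects[object_id] = (obj_type, obj)

    def delete(self, session_id, object_id):
        """Return True if an object was removed, False if it did not exist."""
        self._require_auth(session_id)
        return self.objects.pop(object_id, None) is not None
```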
Data Instance Query?
New requirement to support direct query and retrieval of arbitrary data sets
Generally no common schemas between different instances
Could either push the data instance to a service that can query the object (e.g. the SRB), or implement the interface at the data instance location
Simple JDBC / SQL interface?
dbSchema getDataSchema(sessionID,objectID)
dbResultset search(sessionID,objectID,SQL)
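The proposed JDBC/SQL pair can be tried out with Python's built-in `sqlite3` module standing in for a relational data instance. A sketch under stated assumptions: each objectID is loaded as one table of the same name, `load` is a hypothetical helper, identifiers are quoted but not otherwise validated, and running client SQL unsandboxed is acceptable only in a toy like this.

```python
import sqlite3


class DataInstanceService:
    """Sketch of the proposed getDataSchema/search pair backed by SQLite.

    Assumption: each data instance (objectID) is loaded as one table of the
    same name. Identifiers are quoted but not otherwise validated.
    """

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

    def load(self, object_id, columns, rows):
        """Hypothetical loader: columns is [(name, sql_type), ...]."""
        cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
        self.conn.execute(f'CREATE TABLE "{object_id}" ({cols})')
        marks = ", ".join("?" for _ in columns)
        self.conn.executemany(f'INSERT INTO "{object_id}" VALUES ({marks})', rows)

    def get_data_schema(self, session_id, object_id):
        """Return [(column name, declared type), ...] for one data instance."""
        info = self.conn.execute(f'PRAGMA table_info("{object_id}")').fetchall()
        return [(row[1], row[2]) for row in info]

    def search(self, session_id, object_id, sql, params=()):
        """Run client-supplied SQL; a real service would have to sandbox it."""
        if not self.get_data_schema(session_id, object_id):
            raise KeyError(f"no such data instance: {object_id}")
        return self.conn.execute(sql, params).fetchall()
```

The schema call comes first because, with no common schema across instances, a client cannot write meaningful SQL until it knows the column names.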
Convergence with Globus?
EcoGrid originally intended to use Globus since it provided much of the infrastructure
Globus is not a viable infrastructure layer due to installation and reliability concerns
Should SEEK implement Globus infrastructure to support project requirements?
Likely to duplicate minimal service definitions and re-implement
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)