Cyberinfrastructure for the Geosciences: GEON IT Advances (Data Integration, GEON Workbench, Scientific Workflows)

Upload: catherine-fisher

Post on 16-Jan-2016


Page 1: CYBERINFRASTRUCTURE FOR THE GEOSCIENCES  GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows Bertram Ludäscher

www.geongrid.org
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

GEON IT Advances:
⁃ Data Integration
⁃ GEON Workbench
⁃ Scientific Workflows

Bertram Ludäscher
Kai Lin
Ilkay Altintas
Efrat Jaeger

San Diego Supercomputer Center
University of California, San Diego

Page 2:

The Problem: Scientific Data Integration
or: … from Questions to Queries …

Page 3:

Information Integration Challenges: S4 Heterogeneities

• Systems Integration
– platforms, devices, data & service distribution, APIs, protocols, …
Grid middleware technologies
+ e.g. single sign-on, platform independence, transparent use of remote resources, …

• Syntax & Structure
– heterogeneous data formats (one for each tool ...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
– heterogeneous schemas (one for each DB ...)
Database mediation technologies
+ XML-based data exchange, integrated views, transparent query rewriting, …

• Semantics
– fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …
Knowledge representation & semantic mediation technologies
+ “smart” data discovery & integration
+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways!

Page 4:

Information Integration Challenges: S5 Heterogeneities

• Synthesis of analysis pipelines, integrated apps & data products, …
– How to make use of these wonderful things & put them together to solve a scientist’s problem?

Scientific Problem Solving Environments

GEON Portal and Workbench (“scientist’s view”)
+ ontology-enhanced data registration, discovery, manipulation
+ creation and registration of new data products from existing ones, …

GEON Scientific Workflow System (“engineer’s view”)
+ for designing, re-engineering, deploying analysis pipelines and scientific workflows; a tool to make new tools …
+ e.g., creation of new datasets from existing ones, dataset registration, …

Page 5:

Ontology-Enabled Application Example: Geologic Map Integration

[Figure: maps of Nevada for the query "Show formations where AGE = ‘Paleozoic’", once without the age ontology and once with the age ontology (+/- a few hundred million years). Labels: domain knowledge; knowledge representation; AGE ONTOLOGY.]

Page 6:

Querying by Geologic Age …

Page 7:

Querying by Geologic Age: Result

Page 8:

Querying by Chemical Composition … (GSC)

Page 9:

Querying by Chemical Composition: Results

Note the fine differences in shades of gray:
– DO know: It’s NOT there!
– DON’T know! (not registered)

OK – we got to work on the color coding ;-)
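The distinction this slide draws between "known to be absent" and "not registered" can be sketched as a three-valued answer. This is only an illustration in Python, not GEON code: the `Answer` enum, the `classify` function, and the region names are hypothetical; only the concept name 'diorite' is taken from an earlier slide.

```python
from enum import Enum


class Answer(Enum):
    PRESENT = "present"
    ABSENT = "known absent"        # registered, and the concept is not there
    UNKNOWN = "not registered"     # nothing can be said about this region


def classify(region, concept, registrations):
    """Answer a concept query for one region.

    `registrations` maps a region name to the set of concepts registered
    for it; a region that was never registered is simply missing.
    """
    if region not in registrations:
        return Answer.UNKNOWN
    return Answer.PRESENT if concept in registrations[region] else Answer.ABSENT


# Hypothetical registrations: region_c was never registered.
registrations = {
    "region_a": {"diorite", "granite"},
    "region_b": {"basalt"},
}
answers = {r: classify(r, "diorite", registrations)
           for r in ("region_a", "region_b", "region_c")}
```

A map renderer could then assign one color per `Answer` value, which keeps the two "no result" cases visually distinct.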

Page 10:

Querying w/ British Rock Classification (BRC)

Uses a GSC-BRC inter-ontology articulation mapping

Page 11:

British Rock Classification Query: Results

Uses a GSC-BRC inter-ontology articulation mapping

Page 12:

The Query: Show sedimentary rocks
The Puzzle: Find the 17 differences in the results…

but first: what states are we looking at?

Page 13:

Sedimentary Rocks: BGS Ontology

Page 14:

Sedimentary Rocks: GSC Ontology

Page 15:

Need for Knowledge-enabled Integration

• A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables.
– What possible scenarios can he examine to understand this heterogeneity?

• Measured ages also show a scatter
– What is the significance of the observed spread in measured time?

[Figure labels: GeolAgeDB; GeoChemDB; Data Tables]

Knowledge Representation Research:
• concept maps & ontologies
• process maps & ontologies
• semantic types
• … to facilitate (even) “smarter” tools

Page 16:

A Prerequisite: Resource Registration

(1a) Register ontologies
– geologic age; rock classifications (GSC, BGS), seismology; …

(1b) optionally: register inter-ontology articulations
– e.g. between the GSC ontology and the BGS ontology

(2a) Item-level dataset registration
– ADN metadata; other controlled vocabularies & ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), …)

(2b) Item-detail registration
– e.g. associate values in a column with a concept

(3) Use ontology-based query UI / application
– e.g. query by geologic age and chemical composition
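The registration steps (1a), (2a), (2b) and (3) can be sketched as a small in-memory registry. This is a minimal illustration in Python under stated assumptions, not the GEON implementation: the `Registry` class and all of its method names are hypothetical, and the concept list and column values are made up. Only the file names geologicAge.owl and myShapeFiles.zip come from the demonstration slides.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    ontologies: dict = field(default_factory=dict)    # ontology name -> set of concepts
    datasets: dict = field(default_factory=dict)      # dataset id -> metadata
    item_details: dict = field(default_factory=dict)  # (dataset, column, value) -> (ontology, concept)

    def register_ontology(self, name, concepts):
        """Step (1a): make an ontology's concepts known."""
        self.ontologies[name] = set(concepts)

    def register_dataset(self, dataset_id, metadata):
        """Step (2a): item-level registration with descriptive metadata."""
        self.datasets[dataset_id] = dict(metadata)

    def register_item_detail(self, dataset_id, column, value, ontology, concept):
        """Step (2b): associate a value in a column with a concept."""
        if dataset_id not in self.datasets:
            raise KeyError(f"dataset not registered: {dataset_id}")
        if concept not in self.ontologies.get(ontology, set()):
            raise ValueError(f"unknown concept {concept!r} in {ontology!r}")
        self.item_details[(dataset_id, column, value)] = (ontology, concept)

    def datasets_for_concept(self, ontology, concept):
        """Step (3): concept-based lookup over the registered associations."""
        return sorted({ds for (ds, _col, _val), target in self.item_details.items()
                       if target == (ontology, concept)})


registry = Registry()
registry.register_ontology("geologicAge.owl", ["Cenozoic", "Tertiary", "Quaternary"])
registry.register_dataset("myShapeFiles.zip", {"title": "example shapefiles"})
# Hypothetical column value "T" is declared to mean the concept Tertiary.
registry.register_item_detail("myShapeFiles.zip", "PERIOD", "T",
                              "geologicAge.owl", "Tertiary")
hits = registry.datasets_for_concept("geologicAge.owl", "Tertiary")
```

Validating the concept at registration time, as `register_item_detail` does here, is one possible design choice; it keeps later concept-based queries from silently matching nothing.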

Page 17:

Demonstration Preview

NOTE: A technology demonstration, not a content demonstration (vocabulary, ontology, maps, …)

1. Ontology Registration (geologicAge.owl)
2. Dataset Registration (myShapeFiles.zip)
3. Item-Level Association (1 and 2)
4. GEONsearch
• metadata, spatial, temporal, concept-based
5. GEONworkbench
• use of workspace e.g. composing new maps from existing ones

… resume with GEON workflow overview

Page 18:

Demonstration Preview

[Figure: GEON middleware architecture. Labels:
– Resource Registration: myOntology.owl, myDataset.foo, metadata
– GEON Catalog; SRB
– User Access (via Portal): GEONsearch (search conditions: spatial, temporal, concept; log) and GEONworkbench (GEON Workspace (user); user actions: add, delete, manipulate)
– external services: Gazetteer, DLESE, …; Geologic Age, Chronos, …
– Client Access (via web services): other distributed apps (Kepler, DLESE, …)]

Page 19:

Dataset to Ontology Registration (Item-level)

[Figure labels: Domain Knowledge Ontologies; Arizona]

Page 20:

GEON Search: Concept-based Querying
Portal Demonstration

Page 21:

Scientific Problem Solving Environments

• GEON Portal and Workbench (“scientist’s view”): previous demonstration
– a workbench for using existing/integrated tools

• Kepler Workflow System (“engineer’s view”)
– for (semi-)automating “scientific workflows” and “analysis pipelines”
– a tool for making and deploying new tools
– some features:
  • … low-level plumbing to high-level conceptual flows …
  • connect reusable components (“actors”, “boxes”) to form apps
  • abstraction via nesting of subworkflows into composite actors
  • deploy automated workflows on the Grid and/or with custom UIs
– demonstrations available (“Kepler2Go-1.” CD for Summer Institute)
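The idea of connecting reusable actors and nesting a subworkflow into a composite actor can be sketched in a few lines. This is a toy model in Python for illustration only; it is not Kepler's API, and the `Actor` and `CompositeActor` classes and the actor names are hypothetical.

```python
class Actor:
    """A reusable component: consumes one token and produces one token."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def fire(self, token):
        return self.fn(token)


class CompositeActor(Actor):
    """A subworkflow that looks like a single actor from the outside."""

    def __init__(self, name, actors):
        self.name = name
        self.actors = list(actors)

    def fire(self, token):
        # Pass the token through the nested actors in order.
        for actor in self.actors:
            token = actor.fire(token)
        return token


# Low-level plumbing, hidden inside one composite actor.
parse_and_scale = CompositeActor("parse_and_scale", [
    Actor("parse", float),
    Actor("scale", lambda value: value * 2),
])

# The high-level flow treats the composite like any other actor.
workflow = CompositeActor("workflow", [
    parse_and_scale,
    Actor("format", lambda value: f"{value:.1f}"),
])
result = workflow.fire("21")
```

Because a composite exposes the same `fire` interface as a simple actor, subworkflows can be nested to any depth without the outer workflow needing to know.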

Page 22:

A Kepler Scientific Workflow

[Screenshot labels: component (actor) libraries; canvas for design and execution monitoring; inline documentation]

Page 23:

GEON Dataset Extraction & Processing

[Screenshot labels: Translating query XML response to web service XML input format; worldImage; XML SOAP response; Look Inside; Sample]

Page 24:

GEON Dataset Registration

[Screenshot label: Annotation form]

Page 25:

GEON Dataset Registration

[Screenshot labels: validation; Registering; ADN metadata; Metadata display]

Page 26:

Putting it all together …

Page 27:

GEON Workflows & KEPLER

[Screenshot label: HPC workflow]

http://kepler-project.org

Page 28:

Using Kepler for Geological Data Integration Workflows

Ilkay Altintas

presenting joint GEON work of:
Efrat Jaeger, Bertram Ludäscher, Kai Lin, Ashraf Memon

San Diego Supercomputer Center
University of California, San Diego

Page 29:

Some Requirements for a Scientific Workflow System (1/2)

• …it should work… (No kidding!)

USER REQUIREMENTS:
• Design tools -- especially for non-expert users
• Ease of use -- fairly simple user interface having more complex features hidden in the background
• Reusable generic features
– Generic enough to serve different communities but specific enough to serve one domain (e.g. geosciences)
• Extensibility for the expert user -- almost a visual programming interface
• Registration and publication of data products and “process products” (=workflows); provenance

Page 30:

Some Requirements for a Scientific Workflow System (2/2)

TECHNICAL REQUIREMENTS:
• Error detection and recovery from failure
– Logging information for each workflow
• Allow data-intensive and compute-intensive tasks (maybe at the same time)
– HPC+X (From Dr. Berman’s last GSM talk)
• Allow status checks and on-the-fly updates
• Visualization…
• Semantics and metadata…
• Certification, trust, security…

Ask the experts in this room

Page 31:

Kepler is…

• … a scientific workflow system
• … a cross-project collaboration
New contributing partners:
  • Cheminformatics: Resurgence (Kim Baldridge et al.)
  • Life Sciences: EOL (Mark Miller et al.)
  • Data Mining: SKIDL (Tony Fountain et al.)
  • Neuroinformatics: BIRN (coming…)
• … an emerging open source tool for “scientific discovery workflows”

Kepler 1.0 alpha release Summer Institute

Page 32:

Some Recent Actor Additions

Generic WS Invocation

CommandLine Execution

File Transfer

Globus Job Execution

SRB Access

SQL Queries

Queries & Transformations

Browser-based user interface

Real-time data streaming

SMTP-based messaging

Page 33:

Web Services Actors (WS Harvester)

[Screenshot with numbered steps 1 to 4]

“Minute-made” (MM) WS-based application integration

• Similarly: MM workflow design & sharing w/o implemented components

Page 34:

GEON Contributions to Kepler

• System demonstration
- Using Kepler Features

• GEON workflows in detail
- Dataset Registration Model
- Processing Datasets on the Fly and Registering with the GEONworkbench

Page 35:

Conclusions

• Evolving system – GEON is a significant contributor
– Plans for new generic and project-specific extensions
• Second alpha release available as CD
– Installers for Windows, Linux, MacOSX
– Daily version tests and JWS installer generation
• User manuals and developer documentation are coming soon!
• More: next week during the Summer Institute …

Kepler project website: http://kepler-project.org

Thanks!

Page 36:

GEON IT Advances:
⁃ Data Integration
⁃ GEON Workbench
⁃ Scientific Workflows

Bertram Ludäscher
Kai Lin
Ilkay Altintas
Efrat Jaeger

San Diego Supercomputer Center
UC San Diego

E N D

Page 37:

Related Publications

• Semantic Data Registration and Integration
– On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
– Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.

• Query Planning and Rewriting
– Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004.
– Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher, 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS 2992.
– Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004.

Page 38:

Related Publications

• Scientific Workflows
– Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
– An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004, Leipzig, Germany, LNCS 2994.
– A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon, In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.

Page 39:

Additional Material (for questions etc)

Page 40:

Multi-Hierarchical Rock Classification System (GSC)
… a target ontology (after conversion to OWL) for geologic map registration …

[Figure labels: Composition; Genesis; Fabric; Texture]

Page 41:

Inside Ontology-Enabled Map Integration

User: “Show formations from Cenozoic!”

Query Rewriting, using the Age Ontology (Cenozoic: Quaternary, Tertiary):

select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”

[Figure: the rewritten query is applied to source tables for Arizona and Montana West, followed by Color Definition and Map Rendering. Column labels: PERIOD, ABBREV, FORMATION, LITHOLOGY. Example rows: Tertiary, Tkgm; Quaternary, Q; Qg, Quaternary; Twp, Tertiary; Twl, Tertiary. Selected codes: Tkgm, Q, Qg, Twp, Twl, …]
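The rewriting step on this slide, from the concept "Cenozoic" to a query over "Tertiary" and "Quaternary", can be sketched as follows. This is a minimal illustration in Python, not the GEON mediator: the dictionary representation of the age ontology and the function names `expand` and `rewrite` are assumptions, and only the concept names and the query text come from the slide.

```python
# Assumed representation: each concept maps to its direct sub-concepts.
AGE_ONTOLOGY = {
    "Cenozoic": ["Tertiary", "Quaternary"],
}


def expand(concept, ontology):
    """Return the leaf concepts below `concept` (or the concept itself)."""
    children = ontology.get(concept, [])
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(expand(child, ontology))
    return leaves


def rewrite(concept, ontology):
    """Turn a concept-level question into a query over source-level terms."""
    condition = " or ".join(f'AGE="{term}"' for term in expand(concept, ontology))
    return f"select FORMATION where {condition}"


query = rewrite("Cenozoic", AGE_ONTOLOGY)
```

This sketch expands only to leaf concepts, as the slide's query does; a source that labels formations directly as "Cenozoic" would need the parent term included as well.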

Page 42:

Data Source Wrapping and Integration

[Figure: geologic map sources for Arizona, Colorado, Utah, Nevada, Wyoming, New Mexico, Montana East, Idaho and Montana West are wrapped and integrated. Source column names include ABBREV, PERIOD, NAME, TYPE, TIME_UNIT, FMATN, FORMATION, LITHOLOGY and AGE. Each wrapped source exposes Formation and Age; the integrated views list Formation, Age, Composition, Fabric and Texture. Example values: andesitic sandstone; Livingston formation; Tertiary-Cretaceous.]
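Wrapping sources whose columns are named differently into one common schema can be sketched with a per-source column mapping. This is an illustration in Python only: the source names `source_a` and `source_b`, the specific column-to-attribute assignments, and the second example row are hypothetical, since the slide does not say which state uses which column names. The column names themselves and the first row's values appear on the slide.

```python
# Hypothetical mappings from source column names to the common schema.
COLUMN_MAPPINGS = {
    "source_a": {"FORMATION": "Formation", "AGE": "Age", "LITHOLOGY": "Composition"},
    "source_b": {"NAME": "Formation", "PERIOD": "Age"},
}


def wrap(source, row):
    """Rename one source row's columns to the common schema.

    Columns without a mapping are dropped; attributes that the source
    does not provide are simply absent from the result.
    """
    mapping = COLUMN_MAPPINGS[source]
    return {target: row[column] for column, target in mapping.items() if column in row}


rows = [
    ("source_a", {"FORMATION": "Livingston formation",
                  "AGE": "Tertiary-Cretaceous",
                  "LITHOLOGY": "andesitic sandstone"}),
    ("source_b", {"NAME": "Qg", "PERIOD": "Quaternary", "TYPE": "unused"}),
]
integrated = [wrap(source, row) for source, row in rows]
```

Once every source is exposed under the same attribute names, a single query such as "select Formation where Age = …" can be sent to all wrappers unchanged.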

Page 43:

Gravity Modeling Design Workflow

• Idea: Comparing observed & synthetic gravity models

• Steps:
– Extracting and merging gravity depths from heterogeneous data sources for a Lat/Lon bounding box (databases, web services).
– Projecting and interpolating data sources into the same coordinate systems.
– Differencing observed and synthetic models.
– Displaying differential raster image.
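The differencing step can be sketched once both models are on the same grid. This is a minimal Python illustration with made-up numbers; the function name `difference` and the list-of-rows grid representation are assumptions, and the earlier extraction, projection and interpolation steps are taken as already done.

```python
def difference(observed, synthetic):
    """Cell-by-cell difference of two grids given as lists of rows."""
    same_shape = len(observed) == len(synthetic) and all(
        len(obs_row) == len(syn_row) for obs_row, syn_row in zip(observed, synthetic)
    )
    if not same_shape:
        raise ValueError("grids must share one shape; project and interpolate first")
    return [[obs - syn for obs, syn in zip(obs_row, syn_row)]
            for obs_row, syn_row in zip(observed, synthetic)]


# Made-up values on a 2 x 2 grid.
observed = [[10.0, 12.0], [11.0, 9.0]]
synthetic = [[9.5, 12.0], [10.0, 9.5]]
residual = difference(observed, synthetic)
```

The shape check makes the dependency on the projection and interpolation step explicit: differencing is only meaningful when both grids cover the same cells.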

Page 44:

Grid Interpolation

• Interpolating queried gravity data on the grid and displaying it using a color schema.
• Currently IDW interpolation algorithm supported. Future plans: Minimum Curvature, TIN, Kriging and Spline.
• Output: either ascii x,y,z,p or ESRI ascii grid format.
• Display: using global mapper service.
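Inverse distance weighting (IDW) estimates a value at a grid cell as a weighted mean of the sample values, with weights 1/d^p for distance d and power p. The sketch below is a plain Python illustration, not the GEON service: the function names, the default power of 2, the sampling at cell centers, and the sample points are assumptions. The header keywords follow the ESRI ASCII grid format.

```python
import math


def idw(points, x, y, power=2.0):
    """IDW estimate at (x, y) from (px, py, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a sample: return it unchanged
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_grid(points, xll, yll, cellsize, ncols, nrows, power=2.0):
    """Interpolate at cell centers; rows run north to south."""
    grid = []
    for row in range(nrows):
        y = yll + (nrows - 1 - row + 0.5) * cellsize
        grid.append([idw(points, xll + (col + 0.5) * cellsize, y, power)
                     for col in range(ncols)])
    return grid


def to_esri_ascii(grid, xll, yll, cellsize, nodata=-9999):
    """Serialize a grid as ESRI ASCII grid text."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(f"{value:.3f}" for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"


# Made-up samples: (x, y, gravity value).
samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
midpoint = idw(samples, 1.0, 0.0)
grid = idw_grid(samples, xll=0.0, yll=0.0, cellsize=1.0, ncols=2, nrows=2)
text = to_esri_ascii(grid, 0.0, 0.0, 1.0)
```

A larger power makes nearby samples dominate more strongly; the other methods named on the slide (minimum curvature, TIN, kriging, spline) would replace only the `idw` function in this layout.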

Page 45:

Gravity Modeling Design Workflow