An Ontology-Driven Framework for Data Transformation in Scientific Workflows Shawn Bowers Bertram Ludäscher San Diego Supercomputer Center University of California, San Diego


Page 1

An Ontology-Driven Framework for Data Transformation in Scientific Workflows

Shawn Bowers, Bertram Ludäscher

San Diego Supercomputer Center, University of California, San Diego

Page 2

Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig

Outline

• Background (SEEK Project)
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Page 3

Outline

• Background (SEEK Project)
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Page 4

Science Environment for Ecological Knowledge (SEEK)

• Domain Science Driver
  – Ecology (LTER), biodiversity, …
• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – {application,upper}-ware
• Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware
• EcoGrid
  – Access to ecology data and tools
  – {middle,under}-ware

Architecture (cf. US cyberinfrastructure, UK e-Science)

[Figure callout: this paper]

Page 5

Outline

• The SEEK Project
• Scientific Workflows
  – Focus: analysis & component integration on top of data integration
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Page 6

Promoter Identification in Kepler [SSDBM’03]

• Problems
  – Many components (web services) are NOT designed to fit!
    “The problem P that X solves is simple, and X doesn’t solve it well”
  – Semantically meaningful connections are structurally incompatible
• Approach
  – Distinguish structural type and semantic type
  – Structural type: e.g. XML Schema
  – Semantic type: e.g. OWL expressions
  – Exploit the (optional!) semantic type as much as possible

Page 7

A Very Simple Scientific Workflow

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]

Page 8

A Very Simple Scientific Workflow

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]

observations

Phase        Observed
Eggs         44,000
Instar I     3,513
Instar II    2,529
Instar III   1,922
Instar IV    1,461
Adults       1,300

Population samples for life stages of the common field grasshopper [Begon et al, 1996]

Page 9

A Very Simple Scientific Workflow

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]

observations

Phase        Observed
Eggs         44,000
Instar I     3,513
Instar II    2,529
Instar III   1,922
Instar IV    1,461
Adults       1,300

Population samples for life stages of the common field grasshopper [Begon et al, 1996]

life stage periods

Period       Phases
Nymphal      {Instar I, Instar II, Instar III, Instar IV}

Periods of development in terms of phases

Page 10

A Very Simple Scientific Workflow

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]

observations

Phase        Observed
Eggs         44,000
Instar I     3,513
Instar II    2,529
Instar III   1,922
Instar IV    1,461
Adults       1,300

Population samples for life stages of the common field grasshopper [Begon et al, 1996]

life stage periods

Period       Phases
Nymphal      {Instar I, Instar II, Instar III, Instar IV}

Periods of development in terms of phases

k-value for each period of observation

[(nymphal, 0.44)]

Page 11

Scientific Workflows

A scientific workflow consists of a network of connected services …

A service can be any software component (including a web service or even a data source) …

Each service (optionally) takes input and (optionally) produces output

Page 12

Scientific Workflows

SEEK adopts a Ptolemy II “workflow” model:
– A service is called an actor
– Each actor has zero or more input and output ports (and possibly parameters)
– Data flows through a workflow based on connections made from output to input ports
– (ignored here: different models of computation, directors, …)

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]
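The actor and port model described on this slide can be sketched in a few lines of Python. This is only an illustration, not Ptolemy II or Kepler code: the `Port`, `Actor`, and `connect` names are hypothetical, and attaching P2 to S1 as an output and P3 to S2 as an input is an assumption based on how the later slides use those ports.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """A named port of an actor (hypothetical class, for illustration only)."""
    name: str
    direction: str  # "in" or "out"
    targets: list = field(default_factory=list)  # input ports fed by this output port


@dataclass
class Actor:
    """A service in the workflow, with zero or more ports."""
    name: str
    ports: dict = field(default_factory=dict)

    def add_port(self, name, direction):
        port = Port(name, direction)
        self.ports[name] = port
        return port


def connect(source, target):
    """Connect an output port to an input port; data flows from source to target."""
    if source.direction != "out" or target.direction != "in":
        raise ValueError("connections run from an output port to an input port")
    source.targets.append(target)


# Assumed wiring: P2 is an output of S1 and P3 is an input of S2.
s1 = Actor("S1 (life stage property)")
s2 = Actor("S2 (mortality rate for period)")
p2 = s1.add_port("P2", "out")
p3 = s2.add_port("P3", "in")
connect(p2, p3)
```

Models of computation and directors, which the slide also sets aside, are not represented here.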

Page 13

Outline

• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Page 14

Service Reusability

A scientist wishes to connect two (independent) services

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection]

Page 15

Service Reusability

In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema)

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; each port is labeled with its structural type, StructuralType Ps and StructuralType Pt]

Page 16

Service Reusability

Unless “designed to fit,” independent services are structurally incompatible

Generally, the source output type will not be a subtype of the target input type

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; StructuralType Ps and StructuralType Pt are marked Incompatible (⋠)]

Page 17

Service Reusability

A transformation mapping is required to connect the services … artificially creating subtype compatibility

If such a mapping exists, the services are “structurally feasible”

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; StructuralType Ps and StructuralType Pt are marked Incompatible (⋠), while the mapping applied to Ps is marked as a subtype (≺) of StructuralType Pt]

Page 18

Service Reusability

SEEK annotates services with semantic types for discovery and interoperability of services

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; SemanticType Ps and SemanticType Pt are drawn from Ontologies (OWL) and marked Compatible (⊑)]

Page 19

Service Reusability

Services can be semantically compatible, but structurally incompatible

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; SemanticType Ps and SemanticType Pt, drawn from Ontologies (OWL), are marked Compatible (⊑); StructuralType Ps and StructuralType Pt are marked Incompatible (⋠); the mapping applied to Ps is marked as a subtype (≺) of StructuralType Pt]

Page 20

Example Structural Types (XML)

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5]

structType(P2)

    root population = (sample)*
    elem sample = (meas, lsp)
    elem meas = (cnt, acc)
    elem cnt = xsd:integer
    elem acc = xsd:double
    elem lsp = xsd:string

    <population>
      <sample>
        <meas>
          <cnt>44,000</cnt>
          <acc>0.95</acc>
        </meas>
        <lsp>Eggs</lsp>
      </sample>
      …
    </population>

structType(P3)

    root cohortTable = (measurement)*
    elem measurement = (phase, obs)
    elem phase = xsd:string
    elem obs = xsd:integer

    <cohortTable>
      <measurement>
        <phase>Eggs</phase>
        <obs>44,000</obs>
      </measurement>
      …
    </cohortTable>

Page 21

Example Semantic Types

Portion of SEEK measurement ontology

[Figure: ontology graph over the concepts Observation, MeasContext, MeasProperty, Entity, AccuracyQualifier, EcologicalProperty, AbundanceCount, LifeStageProperty, NumericValue and SpatialLocation, linked by the properties hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount and hasValue, with cardinality labels]

Page 22

Example Semantic Types

Portion of SEEK measurement ontology

[Figure: ontology graph over the concepts Observation, MeasContext, MeasProperty, Entity, AccuracyQualifier, EcologicalProperty, AbundanceCount, LifeStageProperty, NumericValue and SpatialLocation, linked by the properties hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount and hasValue, with cardinality labels]

Same in OWL, a description logic standard (here, Sparrow syntax):

    Observation subClassOf
        forall hasContext/MeasContext and
        forall hasProperty/MeasProperty and
        exists itemMeasured/Entity.

    MeasContext subClassOf
        exists appliesTo/Entity and atMost 1/appliesTo.

    EcologicalProperty subClassOf Entity.

    LifeStageProperty subClassOf EcologicalProperty.

    AbundanceCount subClassOf EcologicalProperty and
        exists hasLocation/SpatialLocation and atMost 1/hasLocation and
        exists hasCount/NumericValue and atMost 1/hasCount.

Page 23

Example Semantic Types

Semantic types for P2 and P3

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5; semType(P3) and semType(P2) are drawn as graphs over Observation, MeasContext, LifeStageProperty, AbundanceCount, NumberValue and AccuracyQualifier, linked by hasContext, appliesTo, itemMeasured, hasCount, hasProperty and hasValue, each with cardinality 1:1]

Page 24

Example Semantic Types

Semantic types for P2 and P3

[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1–P5; semType(P3) and semType(P2) are drawn as graphs over Observation, MeasContext, LifeStageProperty, AbundanceCount, NumberValue and AccuracyQualifier, linked by hasContext, appliesTo, itemMeasured, hasCount, hasProperty and hasValue, each with cardinality 1:1]

    semType(P3) subClassOf Observation and
        exists hasContext/(MeasurementContext and
            exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and
        exists itemMeasured/AbundanceCount and atMost 1/itemMeasured.

    semType(P2) subClassOf Observation and
        exists hasContext/(MeasurementContext and
            exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and
        exists itemMeasured/AbundanceCount and atMost 1/itemMeasured and
        exists hasProperty/AccuracyQualifier and atMost 1/hasProperty.
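The two definitions differ only in that semType(P2) carries two extra conjuncts (the accuracy qualifier), which is why a P2 value can be used where a P3 value is expected. A minimal Python sketch of that observation follows. It treats each definition as a set of conjunct strings copied from the slide and checks containment; this is a purely syntactic simplification and an assumption of this example, not how an OWL reasoner decides subsumption. The names `SEM_TYPE_P2`, `SEM_TYPE_P3` and `is_subsumed_by` are hypothetical.

```python
# Conjuncts of each semantic type, copied as plain strings from the definitions above.
SEM_TYPE_P3 = frozenset({
    "Observation",
    "exists hasContext/(MeasurementContext and exists appliesTo/LifeStageProperty"
    " and atMost 1/appliesTo)",
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

# semType(P2) repeats every conjunct of semType(P3) and adds the accuracy qualifier.
SEM_TYPE_P2 = SEM_TYPE_P3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def is_subsumed_by(sub, sup):
    """True if `sub` carries every conjunct of `sup`, i.e. it is at least as specific."""
    return sup <= sub


print(is_subsumed_by(SEM_TYPE_P2, SEM_TYPE_P3))
print(is_subsumed_by(SEM_TYPE_P3, SEM_TYPE_P2))
```

The check only recognises conjuncts that are written identically, so it would miss subsumptions that follow from the ontology's class hierarchy.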

Page 25

Outline

• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Page 26

The Ontology-Driven Framework

Define semantic registration mappings (“semantic views”) to connect structural and semantic types

Use registration mappings to (semi-)automate transformation, based on derived structural correspondences

Depending on the ontologies and registration mappings, it may not be possible to find an appropriate …

(since the correspondence is often under-specified)

Page 27

The Ontology-Driven Framework

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; SemanticType Ps and SemanticType Pt, drawn from Ontologies (OWL), are marked Compatible (⊑); a RegistrationMapping (Output) links StructuralType Ps to SemanticType Ps, and a RegistrationMapping (Input) links StructuralType Pt to SemanticType Pt]

Page 28

Registration Example (simple XPaths)

    /population/sample                 == semType(P2)
    /population/sample/meas/cnt        == semType(P2).itemMeasured
    /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
    /population/sample/meas/acc        == semType(P2).hasProperty
    /population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
    /population/sample/lsp/text()      == semType(P2).hasContext.appliesTo

structType(P2)

    root population = (sample)*
    elem sample = (meas, lsp)
    elem meas = (cnt, acc)
    elem cnt = xsd:integer
    elem acc = xsd:double
    elem lsp = xsd:string

    <population>
      <sample>
        <meas>
          <cnt>44,000</cnt>
          <acc>0.95</acc>
        </meas>
        <lsp>Eggs</lsp>
      </sample>
      …
    </population>

Page 29

Registration Example (simple XPaths)

    /population/sample == semType(P2)

Each sample is an instance of the semantic type

Page 30

Registration Example (simple XPaths)

    /population/sample/meas/cnt == semType(P2).itemMeasured

Each sample’s cnt represents the itemMeasured object

Page 31

Registration Example (simple XPaths)

    /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount

Each sample’s cnt’s value represents the hasCount value of the corresponding itemMeasured object

Page 32

Registration Example (simple XPaths)

    /cohortTable/measurement              == semType(P3)
    /cohortTable/measurement/obs          == semType(P3).itemMeasured
    /cohortTable/measurement/obs/text()   == semType(P3).itemMeasured.hasCount
    /cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo

structType(P3)

    root cohortTable = (measurement)*
    elem measurement = (phase, obs)
    elem phase = xsd:string
    elem obs = xsd:integer

    <cohortTable>
      <measurement>
        <phase>Eggs</phase>
        <obs>44,000</obs>
      </measurement>
      …
    </cohortTable>

… similarly for P3 …
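To make the registration idea concrete, the sketch below stores the P2 rules as (simple XPath, contextual path) pairs and evaluates each XPath against the sample document from the slides, so that every contextual path ends up paired with the fragments it annotates. It is an illustration under assumptions: the `select` and `annotate` helpers are hypothetical, and `select` handles only absolute child-step paths with an optional trailing `text()`, which is narrower than the registration language defined in the paper.

```python
import xml.etree.ElementTree as ET

# Source-side registration rules from the slide: (simple XPath, contextual path).
REGISTRATION_P2 = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]

SAMPLE_DOC = """
<population>
  <sample>
    <meas><cnt>44,000</cnt><acc>0.95</acc></meas>
    <lsp>Eggs</lsp>
  </sample>
</population>
"""


def select(root, xpath):
    """Evaluate an absolute path of child steps, with an optional trailing text()."""
    steps = xpath.strip("/").split("/")
    want_text = steps[-1] == "text()"
    if want_text:
        steps = steps[:-1]
    if steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        nodes = [child for node in nodes for child in node.findall(step)]
    return [node.text for node in nodes] if want_text else nodes


def annotate(doc, registration):
    """Pair each contextual path with the document fragments its XPath selects."""
    root = ET.fromstring(doc)
    return {path: select(root, xpath) for xpath, path in registration}


annotations = annotate(SAMPLE_DOC, REGISTRATION_P2)
for path, fragments in annotations.items():
    print(path, fragments)
```

Rules ending in `text()` yield string values, while the others yield element nodes, mirroring the distinction the slides draw between an object (such as itemMeasured) and its value (such as hasCount).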

Page 33

The Ontology-Driven Framework

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; SemanticType Ps and SemanticType Pt, drawn from Ontologies (OWL), are marked Compatible (⊑); a RegistrationMapping (Output) links StructuralType Ps to SemanticType Ps, and a RegistrationMapping (Input) links StructuralType Pt to SemanticType Pt; a Correspondence links StructuralType Ps and StructuralType Pt]

Page 34

Correspondence Example

Source-side semantic registration mapping

    /population/sample                 == semType(P2)
    /population/sample/meas/cnt        == semType(P2).itemMeasured
    /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
    /population/sample/meas/acc        == semType(P2).hasProperty
    /population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
    /population/sample/lsp/text()      == semType(P2).hasContext.appliesTo

Target-side semantic registration mapping

    /cohortTable/measurement              == semType(P3)
    /cohortTable/measurement/obs          == semType(P3).itemMeasured
    /cohortTable/measurement/obs/text()   == semType(P3).itemMeasured.hasCount
    /cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo

Source structure

    population
      sample *
        meas
          cnt    xsd:integer
          acc    xsd:double
        lsp      xsd:string

Target structure

    cohortTable
      measurement *
        obs      xsd:integer
        phase    xsd:string

Page 35

Correspondence Example

We want to “compose” the registrations to obtain structural correspondences

Page 36

Correspondence Example

    Source: /population/sample       == semType(P2)
    Target: /cohortTable/measurement == semType(P3)

These fragments correspond

Page 37

Correspondence Example

    Source: /population/sample/meas/cnt  == semType(P2).itemMeasured
    Target: /cohortTable/measurement/obs == semType(P3).itemMeasured

These fragments correspond

Page 38

Correspondence Example

    Source: /population/sample/meas/cnt/text()  == semType(P2).itemMeasured.hasCount
    Target: /cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount

These fragments correspond

Page 39

Correspondence Example

    Source: /population/sample/lsp/text()         == semType(P2).hasContext.appliesTo
    Target: /cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo

These fragments correspond
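The pairing of fragments walked through on the preceding slides can be sketched as a join of the two registration mappings on their contextual paths. The version below is a deliberate simplification: it matches rules whose property chains are identical once the leading semType(…) is dropped, and it assumes semType(P2) is compatible with semType(P3). The semantic composition defined in the paper relies on a semantic subpath (subconcept) test instead, and the `relative_path` and `compose` names are hypothetical.

```python
# Registration rules from the slides: (simple XPath, contextual path).
REGISTRATION_P2 = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]
REGISTRATION_P3 = [
    ("/cohortTable/measurement", "semType(P3)"),
    ("/cohortTable/measurement/obs", "semType(P3).itemMeasured"),
    ("/cohortTable/measurement/obs/text()", "semType(P3).itemMeasured.hasCount"),
    ("/cohortTable/measurement/phase/text()", "semType(P3).hasContext.appliesTo"),
]


def relative_path(contextual_path):
    """Drop the leading semType(...) root and keep the property chain (may be empty)."""
    _root, _sep, rest = contextual_path.partition(".")
    return rest


def compose(source_rules, target_rules):
    """Return (source XPath, target XPath) pairs whose property chains are identical."""
    target_by_path = {relative_path(path): xpath for xpath, path in target_rules}
    return [
        (xpath, target_by_path[relative_path(path)])
        for xpath, path in source_rules
        if relative_path(path) in target_by_path
    ]


CORRESPONDENCES = compose(REGISTRATION_P2, REGISTRATION_P3)
for source_xpath, target_xpath in CORRESPONDENCES:
    print(source_xpath, "->", target_xpath)
```

The two `acc` rules produce no correspondence because the target registration has no `hasProperty` path, which matches the slides: the accuracy qualifier is simply not carried over to the cohort table.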

Page 40

The Ontology-Driven Framework

[Figure: SourceService (port Ps) and TargetService (port Pt), joined by the desired connection; SemanticType Ps and SemanticType Pt, drawn from Ontologies (OWL), are marked Compatible (⊑); a RegistrationMapping (Output) links StructuralType Ps to SemanticType Ps, and a RegistrationMapping (Input) links StructuralType Pt to SemanticType Pt; a Correspondence links StructuralType Ps and StructuralType Pt, and from it the Transformation applied to Ps is generated]

Page 41

Example Result (XQuery)

Based on the structural correspondences and certain assumptions, we derive the transformation XQuery:

    <cohortTable> {
      for $s in /population/sample
      return
        <measurement> {
          for $c in $s/meas/cnt
          return <obs>{$c/text()}</obs>
        } {
          for $l in $s/lsp
          return <phase>{$l/text()}</phase>
        } </measurement>
    } </cohortTable>
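For readers without an XQuery processor at hand, the same restructuring can be sketched in Python with the standard library's ElementTree. This is a hand-written equivalent offered for illustration, not output of the framework; the function name `population_to_cohort_table` is hypothetical, and each loop is annotated with the XQuery clause it mirrors.

```python
import xml.etree.ElementTree as ET


def population_to_cohort_table(population_xml):
    """Rebuild a population document as a cohortTable, following the XQuery above."""
    source = ET.fromstring(population_xml)
    target = ET.Element("cohortTable")
    # Only a <population> root matches the absolute path /population/sample.
    samples = source.findall("sample") if source.tag == "population" else []
    for sample in samples:                          # for $s in /population/sample
        measurement = ET.SubElement(target, "measurement")
        for cnt in sample.findall("meas/cnt"):      # for $c in $s/meas/cnt
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("lsp"):           # for $l in $s/lsp
            ET.SubElement(measurement, "phase").text = lsp.text
    return ET.tostring(target, encoding="unicode")


SAMPLE_DOC = (
    "<population><sample>"
    "<meas><cnt>44,000</cnt><acc>0.95</acc></meas>"
    "<lsp>Eggs</lsp>"
    "</sample></population>"
)
print(population_to_cohort_table(SAMPLE_DOC))
```

As in the XQuery, the `acc` element is dropped because no structural correspondence covers it, and `obs` is emitted before `phase`, following the order of the query rather than the order in the target content model.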

Page 42

Assumptions Made (or why this may not work for you…)

• Common XPath prefixes refer to the same element
• Elements in correspondences have compatible cardinalities
  – source is equivalent or stricter than target (e.g., + is stricter than *)
• Primitive data types are compatible
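The cardinality assumption can be read as an interval check: the source cardinality is acceptable when its range of allowed occurrences lies within the target's range. The sketch below encodes that reading. It is an assumption of this example rather than the paper's definition: the `?` and `1` indicators are added from common schema notation (only `+` and `*` appear on the slide), and `is_compatible` is a hypothetical name.

```python
import math

# Occurrence indicators as (min, max) ranges; "1" stands for exactly one occurrence.
CARDINALITY = {
    "1": (1, 1),
    "?": (0, 1),
    "+": (1, math.inf),
    "*": (0, math.inf),
}


def is_compatible(source, target):
    """True if the source cardinality is equivalent to or stricter than the target's."""
    s_min, s_max = CARDINALITY[source]
    t_min, t_max = CARDINALITY[target]
    return t_min <= s_min and s_max <= t_max


print(is_compatible("+", "*"))  # + is stricter than *
print(is_compatible("*", "+"))  # a possibly empty source cannot feed a non-empty target
```

In the running example both `sample` and `measurement` are declared with `*`, so the check passes trivially for that pair.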

Page 43

Framework Operations and Properties

In the paper, we define:

– A semantic registration mapping R as a set of rules q↔p, where q is a substructure selection (query) and p is a contextual path (a path in an ontology)

– A structural correspondence as a rule pairing qs with qt, where qs and qt are substructure selections over the source and target, respectively

– The semantic composition of registration mappings Rs and Rt, which returns a set of structural correspondence rules

– The semantic subpath operation (subconcept), which is used by the semantic composition to find matching substructure selection rules

Page 44

Framework Operations and Properties

In the paper, we define:

– Registration mapping properties (cardinality consistency and partial complete registrations) and discuss the impact on determining structural transformations

– The simple XPath and Semantic Path languages for defining registration mappings, and the corresponding semantic join operator to find correspondences

Page 45

Outline

• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• A Simple Framework Implementation
• Future Work

Page 46

Future Work

• Extend the registration mapping language
  – XPath is too limited … try a more general query language (e.g., XPath + variables)
  – relational/Datalog based substructure selection (query)
• Formalize the properties of registration mappings and their effect on automated transformation
• Introduce conversion routines (e.g., for units) at the ontology level; apply them in transformations
• Extend transformations to different computation models and workflow scheduling algorithms
• Add to the Kepler Scientific Workflow System

Page 47

Acknowledgements

• NSF/ITR Science Environment for Ecological Knowledge

• NSF/ITR Geosciences Network

• NIH Biomedical Informatics Research Network

• DOE Scientific Data Management Center

Page 48

Questions …