An Ontology-Driven Framework for Data Transformation in
Scientific Workflows
Shawn Bowers, Bertram Ludäscher
San Diego Supercomputer Center, University of California, San Diego
Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• Background (SEEK Project)
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
Science Environment for Ecological Knowledge (SEEK)
• Domain Science Driver
  – Ecology (LTER), biodiversity, …
• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – {application,upper}-ware
• Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic types and ontologies
  – upper middleware
• EcoGrid
  – Access to ecology data and tools
  – {middle,under}-ware
Architecture (cf. US cyberinfrastructure, UK e-Science)
this paper
Outline
• The SEEK Project
• Scientific Workflows
  – Focus: analysis & component integration on top of data integration
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
Promoter Identification in Kepler [SSDBM’03]
• Problems
  – Many components (web services) are NOT designed to fit!
    “The problem P that X solves is simple, and X doesn’t solve it well”
  – Semantically meaningful connections are structurally incompatible
• Approach
  – Distinguish structural type and semantic type
  – Structural type: e.g. XML Schema
  – Semantic type: e.g. OWL expressions
  – Exploit the (optional!) semantic type as much as possible
A Very Simple Scientific Workflow
[Figure: workflow connecting service S1 (life stage property) to service S2 (mortality rate for period), with ports P1–P5]

observations: population samples for life stages of the common field grasshopper [Begon et al, 1996]

  Phase        Observed
  Eggs         44,000
  Instar I      3,513
  Instar II     2,529
  Instar III    1,922
  Instar IV     1,461
  Adults        1,300

life stage periods: periods of development in terms of phases

  Period     Phases
  Nymphal    {Instar I, Instar II, Instar III, Instar IV}

k-value for each period of observation: [(nymphal, 0.44)]
Scientific Workflows
A scientific workflow consists of a network of connected services …
A service can be any software component (including a web service or even a data source) …
Each service (optionally) takes input and (optionally) produces output
Scientific Workflows
SEEK adopts a Ptolemy II “workflow” model:
– A service is called an actor
– Each actor has zero or more input and output ports (and possibly parameters)
– Data flows through a workflow based on connections made from output to input ports
– (ignored here: different models of computation, directors, …)

[Figure: workflow connecting service S1 (life stage property) to service S2 (mortality rate for period), with ports P1–P5]
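The actor and port model described on this slide can be sketched in a few lines of Python. This is only an illustration, not Ptolemy II or Kepler code: the class names `Actor` and `Workflow` are hypothetical, and the assignment of ports P1 to P5 to S1 and S2 is an assumption read off the figure labels (only the P2 to P3 connection is used later in the talk).

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    """A service in the workflow, with named input and output ports."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    actors: dict = field(default_factory=dict)
    # Each connection is ((source actor, output port), (target actor, input port)).
    connections: list = field(default_factory=list)

    def add(self, actor):
        self.actors[actor.name] = actor

    def connect(self, src, out_port, dst, in_port):
        # Data may only flow from an output port to an input port.
        if out_port not in self.actors[src].outputs:
            raise ValueError(f"{out_port} is not an output port of {src}")
        if in_port not in self.actors[dst].inputs:
            raise ValueError(f"{in_port} is not an input port of {dst}")
        self.connections.append(((src, out_port), (dst, in_port)))


# Port assignment below is assumed for illustration.
wf = Workflow()
wf.add(Actor("S1", inputs=["P1"], outputs=["P2"]))
wf.add(Actor("S2", inputs=["P3", "P4"], outputs=["P5"]))
wf.connect("S1", "P2", "S2", "P3")
```

Models of computation and directors are left out here, just as they are on the slide.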
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
Service Reusability
A scientist wishes to connect two (independent) services
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService]
Service Reusability
In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema)
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; the ports carry StructuralType Ps and StructuralType Pt]
Service Reusability
Unless “designed to fit,” independent services are structurally incompatible
Generally, the source output type will not be a subtype of the target input type
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; StructuralType Ps and StructuralType Pt are incompatible (⋠)]
Service Reusability
A transformation mapping (δ) is required to connect the services … artificially creating subtype compatibility
If such a δ exists, the services are “structurally feasible”
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; StructuralType Ps and StructuralType Pt are incompatible (⋠), while δ(Ps) is compatible (≺) with StructuralType Pt]
Service Reusability
SEEK annotates services with semantic types for discovery and interoperability of services
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; SemanticType Ps and SemanticType Pt, drawn from ontologies (OWL), are compatible (⊑)]
Service Reusability
Services can be semantically compatible, but structurally incompatible
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; SemanticType Ps and SemanticType Pt, drawn from ontologies (OWL), are compatible (⊑); StructuralType Ps and StructuralType Pt are incompatible (⋠), while δ(Ps) is compatible (≺) with StructuralType Pt]
Example Structural Types (XML)
[Figure: workflow connecting service S1 (life stage property) to service S2 (mortality rate for period), with ports P1–P5]

structType(P2):
  root population = (sample)*
  elem sample = (meas, lsp)
  elem meas = (cnt, acc)
  elem cnt = xsd:integer
  elem acc = xsd:double
  elem lsp = xsd:string

  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
    …
  </population>

structType(P3):
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer

  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
    …
  </cohortTable>
Example Semantic Types
Portion of SEEK measurement ontology
[Figure: class diagram with classes Observation, MeasContext, MeasProperty, Entity, AccuracyQualifier, EcologicalProperty, AbundanceCount, LifeStageProperty, NumericValue, SpatialLocation and properties hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount, hasValue]

Same in OWL, a description logic standard (here, Sparrow syntax):
  Observation subClassOf forall hasContext/MeasContext and forall hasProperty/MeasProperty and exists itemMeasured/Entity.
  MeasContext subClassOf exists appliesTo/Entity and atMost 1/appliesTo.
  EcologicalProperty subClassOf Entity.
  LifeStageProperty subClassOf EcologicalProperty.
  AbundanceCount subClassOf EcologicalProperty and exists hasLocation/SpatialLocation and atMost 1/hasLocation and exists hasCount/NumericValue and atMost 1/hasCount.
Example Semantic Types
Semantic types for P2 and P3
[Figure: workflow connecting S1 and S2, with graph views of semType(P2) and semType(P3) over the classes Observation, MeasContext, LifeStageProperty, AbundanceCount, NumericValue and AccuracyQualifier; semType(P2) ⊑ semType(P3)]

semType(P3) subClassOf Observation and exists hasContext/(MeasContext and exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and exists itemMeasured/AbundanceCount and atMost 1/itemMeasured.

semType(P2) subClassOf Observation and exists hasContext/(MeasContext and exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and exists itemMeasured/AbundanceCount and atMost 1/itemMeasured and exists hasProperty/AccuracyQualifier and atMost 1/hasProperty.
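Why semType(P2) ⊑ semType(P3) holds can be seen by comparing the two definitions conjunct by conjunct: P2 repeats every restriction of P3 and adds two more. The sketch below checks exactly that. It is a purely syntactic comparison written for this example, not a description logic reasoner, and the names `SEM_TYPE_P2`, `SEM_TYPE_P3` and `is_subsumed_by` are hypothetical.

```python
# Each semantic type is modelled as a set of conjuncts, copied from the slide.
SEM_TYPE_P3 = frozenset({
    "Observation",
    "exists hasContext/(MeasContext and exists appliesTo/LifeStageProperty"
    " and atMost 1/appliesTo)",
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

# semType(P2) has all conjuncts of semType(P3) plus the accuracy qualifier.
SEM_TYPE_P2 = SEM_TYPE_P3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def is_subsumed_by(sub, sup):
    """True if every conjunct required by `sup` also appears in `sub`.

    Assumption: both types are plain conjunctions of identical strings.
    This is sufficient but not necessary for subsumption in general.
    """
    return sup <= sub


p2_fits_p3 = is_subsumed_by(SEM_TYPE_P2, SEM_TYPE_P3)
p3_fits_p2 = is_subsumed_by(SEM_TYPE_P3, SEM_TYPE_P2)
```

Here `p2_fits_p3` is true and `p3_fits_p2` is false, which matches the direction of the connection from P2 to P3.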
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
The Ontology-Driven Framework
Define semantic registration mappings (“semantic views”) to connect structural and semantic types
Use registration mappings to (semi-) automate transformation, based on derived structural correspondences
Depending on the ontologies and registration mappings, it may not be possible to find an appropriate …
(since the correspondence is often under-specified)
The Ontology-Driven Framework
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; SemanticType Ps and SemanticType Pt, drawn from ontologies (OWL), are compatible (⊑); a registration mapping (output) relates StructuralType Ps to SemanticType Ps and a registration mapping (input) relates StructuralType Pt to SemanticType Pt]
Registration Example (simple XPaths)
/population/sample == semType(P2)
/population/sample/meas/cnt == semType(P2).itemMeasured
/population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
/population/sample/meas/acc == semType(P2).hasProperty
/population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
/population/sample/lsp/text() == semType(P2).hasContext.appliesTo

structType(P2):
  root population = (sample)*
  elem sample = (meas, lsp)
  elem meas = (cnt, acc)
  elem cnt = xsd:integer
  elem acc = xsd:double
  elem lsp = xsd:string

  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
    …
  </population>

• Each sample is an instance of the semantic type
• Each sample’s cnt represents the itemMeasured object
• Each sample’s cnt’s value represents the hasCount value of the corresponding itemMeasured object
Registration Example (simple XPaths)
/cohortTable/measurement == semType(P3)
/cohortTable/measurement/obs == semType(P3).itemMeasured
/cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount
/cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo

structType(P3):
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer

  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
    …
  </cohortTable>

… similarly for P3 …
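To make the registration rules concrete, the following sketch evaluates three of the P2 rules against the sample document and labels each selected value with its contextual path. It is an illustration only: `select_text` and `annotate` are hypothetical helpers, only simple absolute paths ending in `text()` are handled, and Python's `xml.etree.ElementTree` stands in for whatever XPath engine the framework actually uses.

```python
import xml.etree.ElementTree as ET

# (simple XPath, contextual path below semType(P2)), taken from the slide.
REGISTRATION_P2 = [
    ("/population/sample/meas/cnt/text()", "itemMeasured.hasCount"),
    ("/population/sample/meas/acc/text()", "hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "hasContext.appliesTo"),
]

DOC = (
    "<population><sample>"
    "<meas><cnt>44,000</cnt><acc>0.95</acc></meas><lsp>Eggs</lsp>"
    "</sample></population>"
)


def select_text(root, xpath):
    """Evaluate a simple absolute XPath that ends in /text()."""
    steps = xpath.strip("/").split("/")
    if steps[-1] != "text()":
        raise ValueError("this sketch only handles paths ending in text()")
    if steps[0] != root.tag:
        return []
    # ElementTree paths are relative to the root element and have no text() step.
    relative = "/".join(steps[1:-1])
    nodes = root.findall(relative) if relative else [root]
    return [node.text for node in nodes]


def annotate(doc, rules):
    """Map each contextual path to the values its XPath selects."""
    root = ET.fromstring(doc)
    return {path: select_text(root, xpath) for xpath, path in rules}


annotated = annotate(DOC, REGISTRATION_P2)
```

With the single sample shown on the slide, `annotated["hasContext.appliesTo"]` is `["Eggs"]`.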
The Ontology-Driven Framework
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; SemanticType Ps and SemanticType Pt, drawn from ontologies (OWL), are compatible (⊑); registration mappings (output and input) relate each structural type to its semantic type, and a correspondence links StructuralType Ps to StructuralType Pt]
Correspondence Example
Source-side semantic registration mapping:
  /population/sample == semType(P2)
  /population/sample/meas/cnt == semType(P2).itemMeasured
  /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc == semType(P2).hasProperty
  /population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
  /population/sample/lsp/text() == semType(P2).hasContext.appliesTo

Target-side semantic registration mapping:
  /cohortTable/measurement == semType(P3)
  /cohortTable/measurement/obs == semType(P3).itemMeasured
  /cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo

[Figure: source structure population, sample *, meas, cnt (xsd:integer), acc (xsd:double), lsp (xsd:string); target structure cohortTable, measurement *, obs (xsd:integer), phase (xsd:string)]

We want to “compose” the registrations to obtain structural correspondences

These fragments correspond:
  /population/sample == semType(P2)
  /cohortTable/measurement == semType(P3)

These fragments correspond:
  /population/sample/meas/cnt == semType(P2).itemMeasured
  /cohortTable/measurement/obs == semType(P3).itemMeasured

These fragments correspond:
  /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
  /cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount

These fragments correspond:
  /population/sample/lsp/text() == semType(P2).hasContext.appliesTo
  /cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo
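The four pairings above can be reproduced by joining the two registration mappings on their contextual paths. The sketch below is a simplification under a stated assumption: it matches paths by exact equality, whereas the paper describes a semantic subpath (subconcept) test. The function name `compose` is hypothetical, and an empty path stands for the semantic type itself.

```python
# (substructure selection, contextual path relative to the port's semantic type)
SOURCE_RULES = [
    ("/population/sample", ""),
    ("/population/sample/meas/cnt", "itemMeasured"),
    ("/population/sample/meas/cnt/text()", "itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "hasProperty"),
    ("/population/sample/meas/acc/text()", "hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "hasContext.appliesTo"),
]

TARGET_RULES = [
    ("/cohortTable/measurement", ""),
    ("/cohortTable/measurement/obs", "itemMeasured"),
    ("/cohortTable/measurement/obs/text()", "itemMeasured.hasCount"),
    ("/cohortTable/measurement/phase/text()", "hasContext.appliesTo"),
]


def compose(source_rules, target_rules):
    """Pair source and target selections registered to the same path.

    Assumption: exact path equality replaces the subpath/subconcept check.
    """
    by_path = {}
    for q_t, path in target_rules:
        by_path.setdefault(path, []).append(q_t)
    return [
        (q_s, q_t)
        for q_s, path in source_rules
        for q_t in by_path.get(path, [])
    ]


correspondences = compose(SOURCE_RULES, TARGET_RULES)
```

The two `acc` rules find no partner on the target side and are dropped, so the result holds the four pairs listed on the slides.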
The Ontology-Driven Framework
[Figure: desired connection from port Ps of SourceService to port Pt of TargetService; SemanticType Ps and SemanticType Pt, drawn from ontologies (OWL), are compatible (⊑); registration mappings (output and input) relate each structural type to its semantic type; the correspondence between StructuralType Ps and StructuralType Pt is used to generate the transformation δ(Ps)]
Example Result (XQuery)
Based on the structural correspondences and certain assumptions, we derive the transformation XQuery:
  <cohortTable> {
    for $s in /population/sample
    return
      <measurement> {
        for $c in $s/meas/cnt
        return <obs>{$c/text()}</obs>
      } {
        for $l in $s/lsp
        return <phase>{$l/text()}</phase>
      } </measurement>
  } </cohortTable>
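For readers without an XQuery processor at hand, the same restructuring can be written with Python's standard `xml.etree.ElementTree`. This is a hand-written equivalent for illustration, not output of the framework; the function name `transform` is hypothetical.

```python
import xml.etree.ElementTree as ET


def transform(population):
    """Rebuild a <population> element as a <cohortTable> element."""
    cohort = ET.Element("cohortTable")
    # for $s in /population/sample
    for sample in population.findall("sample"):
        measurement = ET.SubElement(cohort, "measurement")
        # for $c in $s/meas/cnt return <obs>{$c/text()}</obs>
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        # for $l in $s/lsp return <phase>{$l/text()}</phase>
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return cohort


source = ET.fromstring(
    "<population><sample>"
    "<meas><cnt>44,000</cnt><acc>0.95</acc></meas><lsp>Eggs</lsp>"
    "</sample></population>"
)
result = ET.tostring(transform(source), encoding="unicode")
```

As in the query, the `acc` element has no counterpart in the target and is not copied.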
Assumptions Made (or why this may not work for you…)
• Common XPath prefixes refer to the same element
• Elements in correspondences have compatible cardinalities
  – source is equivalent or stricter than target (e.g., + is stricter than *)
• Primitive data types are compatible
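The cardinality assumption can be read as interval containment: a source marker is equivalent to or stricter than a target marker when its occurrence range lies inside the target's range. The sketch below encodes that reading. The `RANGES` table and the `is_compatible` function are hypothetical names, and the interval interpretation is an assumption about what "stricter" means on this slide.

```python
import math

# Occurrence ranges (min, max) for the usual schema markers.
RANGES = {
    "": (1, 1),          # exactly one
    "?": (0, 1),         # optional
    "+": (1, math.inf),  # one or more
    "*": (0, math.inf),  # zero or more
}


def is_compatible(source, target):
    """True if the source range is contained in the target range."""
    s_min, s_max = RANGES[source]
    t_min, t_max = RANGES[target]
    return t_min <= s_min and s_max <= t_max


plus_into_star = is_compatible("+", "*")
star_into_plus = is_compatible("*", "+")
```

Under this reading `plus_into_star` is true and `star_into_plus` is false, in line with the slide's example that + is stricter than *.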
Framework Operations and Properties
In the paper, we define:
– A semantic registration mapping R as a set of rules q↔p, where q is a substructure selection (query) and p is a contextual path (a path in an ontology)
– A structural correspondence as a rule relating qs and qt, where qs and qt are substructure selections over the source and target, resp.
– The semantic composition of registration mappings Rs and Rt, which returns a set of structural correspondence rules
– The semantic subpath operation (subconcept), which is used by the semantic composition to find matching substructure selection rules
Framework Operations and Properties
In the paper, we define:
– Registration mapping properties (cardinality consistency and partial complete registrations) and discuss the impact on determining structural transformations
– The simple XPath and Semantic Path languages for defining registration mappings, and the corresponding semantic join operator to find correspondences
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• A Simple Framework Implementation
• Future Work
Future Work
• Extend the registration mapping language
  – XPath is too limited … try a more general query language (e.g., XPath + variables)
  – relational/Datalog based substructure selection (query)
• Formalize the properties of registration mappings and their effect on automated transformation
• Introduce conversion routines (e.g., for units) at the ontology level; apply them in transformations
• Extend transformations to different computation models and workflow scheduling algorithms
• Add to the Kepler Scientific Workflow System
Acknowledgements
• NSF/ITR Science Environment for Ecological Knowledge
• NSF/ITR Geosciences Network
• NIH Biomedical Informatics Research Network
• DOE Scientific Data Management Center
Questions …