Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data
Mark Schildhauer, NCEAS/UCSB
TDWG meeting, Woods Hole
Observations Activity Group
Sep. 29, 2010
Nature of scientific data sets
• Scientific data are often stored in tables
• Tables consist of rows (records) and columns (attributes)
• The association of specific columns together (a tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning or use for the researcher
• Individual cells contain values that are measurements of a characteristic of some thing
SONet/Semtools Semantic Approach
• Data -> metadata -> annotations -> ontologies
• Ontology: formal knowledge representation in OWL-DL
  – Hierarchical structure of concepts
  – Relationships can link concepts
• Annotations link EML metadata elements to concepts in an ontology through the Observation Ontology
• EML metadata describe data and their structures
Linking data values to concepts
• Extensible Observation Ontology (OBOE)
• OBOE provides a high-level abstraction of scientific observations and measurements
• Enables data (or metadata) structures to be linked to domain-specific ontology concepts
• Can inter-relate values in a tuple
• Clarifies the semantics of the data set as a whole, not just of "independent" values
Concepts of Semantic Search
• Annotations give metadata attributes semantic meaning w.r.t. an ontology
• Enable structured search against annotations to increase precision
• Enable ontological term expansion to increase recall
• Precisely define a measured characteristic and the standard used to measure it via OBOE
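As a rough illustration of the search points above, the sketch below treats each annotation as an (entity, characteristic) pair attached to a metadata document. It is a minimal sketch under stated assumptions, not the SONet search code: the toy is-a hierarchy, the document IDs and the annotation tuples are all hypothetical, and a Python dict stands in for an OWL-DL ontology and reasoner. Requiring the entity and characteristic to match on the same annotation is what keeps precision up; expanding each query term to its subclasses is what raises recall.

```python
from collections import defaultdict

# Hypothetical toy "is-a" hierarchy (child -> parent) standing in for a domain ontology.
SUBCLASS_OF = {
    "Tree": "Plant",
    "Shrub": "Plant",
    "Conifer": "Tree",
    "Height": "Length",
    "DBH": "Length",
}

# Hypothetical annotations: (metadata document, entity, characteristic) per observation.
ANNOTATIONS = [
    ("doc1", "Conifer", "DBH"),
    ("doc2", "Shrub", "Height"),
    ("doc3", "Tree", "Height"),
    ("doc4", "Conifer", "Biomass"),
    ("doc4", "Shrub", "Height"),
]

CHILDREN = defaultdict(set)
for child, parent in SUBCLASS_OF.items():
    CHILDREN[parent].add(child)

def expand(term):
    """Return the term plus all of its transitive subclasses (term expansion)."""
    result, stack = {term}, [term]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

def search(entity, characteristic, expand_terms=True):
    """Structured search: entity and characteristic must match the same annotation."""
    ents = expand(entity) if expand_terms else {entity}
    chars = expand(characteristic) if expand_terms else {characteristic}
    return {doc for doc, e, c in ANNOTATIONS if e in ents and c in chars}
```

With these toy data, a query for trees with a Length characteristic also finds `doc1` (Conifer, DBH) through expansion, while `doc4` is not returned for (Tree, Height) because its Conifer and Height terms sit on different annotations.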
Logical Architecture
Annotations
• XML schema defines annotation properties
• Namespaces identify the sources of terms
• Search is performed against the annotations, not the metadata itself
• Returns the metadata documents that are linked to the annotation
• Reasoning (term expansion, consistency, etc.) through the domain ontology
XML Links
KNB metadata catalog
• Stores EML (XML) and raw data objects
• Extended to store ontologies, both domain ontologies and OBOE (OWL-DL serialized in XML)
• Extended to store annotations (XML)
• Jena to facilitate querying ontologies
• Pellet to reason (consistency of ontologies; class subsumption)
Metacat Implementation
OBOE Conceptual Model (OWL-DL)
[Figure: classes Observation, Entity, Measurement, Characteristic, Value, Standard, Context and Relationship, linked by the properties ofEntity, hasMeasurement, ofCharacteristic, usesStandard, hasValue, hasContext, hasContextObservation and hasContextRelationship, annotated with multiplicities such as 0..* and 1..1]
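One way to make the conceptual model concrete is to encode its classes as plain Python dataclasses. This is a hypothetical sketch for illustration, not an OBOE API: characteristics, standards and relationships are reduced to strings, and the single-valued fields (one entity per observation, one characteristic and one standard per measurement) reflect our reading of the 1..1 multiplicities rather than a checked transcription of the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

Value = Union[str, int, float]

@dataclass
class Entity:
    concept: str                     # domain-ontology class, e.g. "Tree"

@dataclass
class Measurement:
    characteristic: str              # ofCharacteristic, e.g. "DBH"
    standard: str                    # usesStandard, e.g. "Centimeter"
    value: Optional[Value] = None    # hasValue

@dataclass
class Context:
    observation: "Observation"       # hasContextObservation
    relationship: str                # hasContextRelationship, e.g. "Within"

@dataclass
class Observation:
    entity: Entity                   # ofEntity (assumed exactly one)
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement
    contexts: List[Context] = field(default_factory=list)          # hasContext

# First row of the yr/spec/spp/dbh example: a tree measured within the year 2007.
year = Observation(Entity("TemporalRange"), [Measurement("Year", "DateTime", 2007)])
tree = Observation(
    Entity("Tree"),
    [Measurement("DBH", "Centimeter", 35.8),
     Measurement("TaxonomicTypeName", "ITIS", "Picea rubens"),
     Measurement("EntityName", "LocalTreeNames", "1")],
    [Context(year, "Within")],
)
```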
Annotation Examples (12/18/2009)
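[Figure: an Annotation defines a view (view def.) over a Dataset and uses terms from OBOE Concepts; materializing the annotation over the dataset yields an OBOE Model (individuals/triples), an observation-based representation of the dataset that instantiates the OBOE Concepts and can be queried*]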
* Conceptually, we want to query datasets via annotations
Annotation Examples
<observation label="o1">
  <entity id="TemporalRange"/>
  <measurement label="m1">
    <characteristic id="Year"/>
    <standard id="DateTime"/>
  </measurement>
</observation>
<observation label="o2">
  <entity id="Tree"/>
  <measurement label="m2" precision="0.1">
    <characteristic id="DBH"/>
    <standard id="Centimeter"/>
  </measurement>
  <measurement label="m3">
    <characteristic id="TaxonomicTypeName"/>
    <standard id="ITIS"/>
  </measurement>
  <measurement label="m4">
    <characteristic id="EntityName"/>
    <standard id="LocalTreeNames"/>
  </measurement>
  <context observation="o1">
    <relationship id="Within"/>
  </context>
</observation>
<map attribute="yr" measurement="m1"/>
<map attribute="diam" measurement="m2" if="diam ge 0"/>
<map attribute="spec" measurement="m4"/>
<map attribute="spp" measurement="m3" value="Picea rubens" if="spp eq 'piru'"/>
<map attribute="spp" measurement="m3" value="Abies balsamea" if="spp eq 'abba'"/>
Annotation Syntax
observation "o1" entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "diam" to "m2" if diam > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
* Code exists to read/write annotations using this XML format
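The XML form above can be read with Python's standard `xml.etree.ElementTree`. This sketch is not the SONet reader mentioned in the footnote; because the slide's fragment has several top-level elements, it is wrapped here in a hypothetical `<annotation>` root element so that it parses as one XML document. The `summarize` function (a name invented for this example) collects each measurement's observation, entity, characteristic and standard, and lists the `map` rules that tie data-table attributes to measurements.

```python
import xml.etree.ElementTree as ET

# The slide's annotation, wrapped in a hypothetical <annotation> root element.
ANNOTATION_XML = """<annotation>
<observation label="o1">
  <entity id="TemporalRange"/>
  <measurement label="m1"><characteristic id="Year"/><standard id="DateTime"/></measurement>
</observation>
<observation label="o2">
  <entity id="Tree"/>
  <measurement label="m2" precision="0.1"><characteristic id="DBH"/><standard id="Centimeter"/></measurement>
  <measurement label="m3"><characteristic id="TaxonomicTypeName"/><standard id="ITIS"/></measurement>
  <measurement label="m4"><characteristic id="EntityName"/><standard id="LocalTreeNames"/></measurement>
  <context observation="o1"><relationship id="Within"/></context>
</observation>
<map attribute="yr" measurement="m1"/>
<map attribute="diam" measurement="m2" if="diam ge 0"/>
<map attribute="spec" measurement="m4"/>
<map attribute="spp" measurement="m3" value="Picea rubens" if="spp eq 'piru'"/>
<map attribute="spp" measurement="m3" value="Abies balsamea" if="spp eq 'abba'"/>
</annotation>"""

def summarize(xml_text):
    """Return ({measurement label: (obs, entity, characteristic, standard)}, map rules)."""
    root = ET.fromstring(xml_text)
    measurements = {}
    for obs in root.findall("observation"):
        entity = obs.find("entity").get("id")
        for m in obs.findall("measurement"):
            measurements[m.get("label")] = (
                obs.get("label"),
                entity,
                m.find("characteristic").get("id"),
                m.find("standard").get("id"),
            )
    # Each map element ties a data-table attribute to one measurement,
    # optionally with a condition ("if") and a replacement value.
    maps = [(e.get("attribute"), e.get("measurement"), e.get("if"), e.get("value"))
            for e in root.findall("map")]
    return measurements, maps
```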
Annotation Examples
yr spec spp dbh
2007 1 piru 35.8
2007 1 piru 36.2
2008 2 abba 33.2
observation "o1" entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
• Basic idea: go row by row through the dataset, generating individuals/triples
• “External” terms should have a namespace prefix URI
[Figure: materialized individuals, one set per row. Each row yields a :TemporalRange observation with a Year/DateTime measurement (2007, 2007, 2008) and a :Tree observation with EntityName/LocalTreeNames (1, 1, 2), TaxonomicTypeName/ITIS (Picea rubens, Picea rubens, Abies balsamea) and DBH/Centimeter (35.8, 36.2, 33.2) measurements; each Tree observation hasContext the TemporalRange observation from its row.]
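The row-by-row idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the SONet materialization algorithm: the annotation is held in a Python dict rather than the XML format, map conditions are written as lambdas, `rdf:type` is used for class membership, and the `ex:` and `oboe:` prefixes are placeholders for real namespace URIs. Keys, `distinct` and context qualifiers are deliberately ignored, so every row gets fresh individuals.

```python
from itertools import count

# Hypothetical in-memory version of the yr/spec/spp/dbh annotation.
ANNOTATION = {
    "o1": {"entity": "TemporalRange",
           "measurements": {"m1": ("Year", "DateTime")}},
    "o2": {"entity": "Tree",
           "measurements": {"m2": ("DBH", "Centimeter"),
                            "m3": ("TaxonomicTypeName", "ITIS"),
                            "m4": ("EntityName", "LocalTreeNames")},
           "context": ("o1", "Within")},
}

# Map rules: (column, measurement, condition, replacement value or None).
MAPS = [
    ("yr", "m1", lambda v: True, None),
    ("dbh", "m2", lambda v: v > 0, None),
    ("spec", "m4", lambda v: True, None),
    ("spp", "m3", lambda v: v == "piru", "Picea rubens"),
    ("spp", "m3", lambda v: v == "abba", "Abies balsamea"),
]

ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]

def materialize(rows):
    """Naively generate fresh individuals and triples for every row."""
    ids = count(1)
    triples = []
    for row in rows:
        # Resolve this row's measurement values from the map rules.
        values = {}
        for col, meas, cond, replacement in MAPS:
            if cond(row[col]):
                values[meas] = row[col] if replacement is None else replacement
        obs_ids = {}
        for label, obs_spec in ANNOTATION.items():
            obs, ent = f"_:obs{next(ids)}", f"_:ent{next(ids)}"
            obs_ids[label] = obs
            triples += [(obs, "rdf:type", "oboe:Observation"),
                        (obs, "oboe:ofEntity", ent),
                        (ent, "rdf:type", "ex:" + obs_spec["entity"])]
            for m_label, (char, std) in obs_spec["measurements"].items():
                if m_label not in values:
                    continue  # no map rule fired for this measurement
                m = f"_:meas{next(ids)}"
                triples += [(obs, "oboe:hasMeasurement", m),
                            (m, "oboe:ofCharacteristic", "ex:" + char),
                            (m, "oboe:usesStandard", "ex:" + std),
                            (m, "oboe:hasValue", values[m_label])]
        # Link each observation to its context observation from the same row;
        # the relationship ("Within") is dropped in this simplified form.
        for label, obs_spec in ANNOTATION.items():
            if "context" in obs_spec:
                ctx_label, _rel = obs_spec["context"]
                triples.append((obs_ids[label], "oboe:hasContext", obs_ids[ctx_label]))
    return triples
```

Running `materialize(ROWS)` on the three example rows creates three separate Tree individuals even though rows 1 and 2 describe the same tree, which is the problem the key and distinct qualifiers address.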
Annotation Examples
yr spec spp dbh
2007 1 piru 35.8
2007 1 piru 36.2
2008 2 abba 33.2
observation "o1" entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
• Same trees!! (rows 1 and 2 both have name = 1)
• Same year, and so the same year observation!!
[Figure: naive row-by-row materialization. Each of the three rows yields its own :TemporalRange observation (Year 2007, 2007, 2008) and its own :Tree observation (name 1, 1, 2; DBH 35.8, 36.2, 33.2), linked by hasContext, giving three Tree and three TemporalRange individuals.]
Annotation Examples
yr spec spp dbh
2007 1 piru 35.8
2007 1 piru 36.2
2008 2 abba 33.2
observation "o1" distinct yes entity "TemporalRange"
  measurement "m1" key yes characteristic "Year" standard "DateTime"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" key yes characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
[Figure: with the keys declared, rows 1 and 2 share a single :TemporalRange observation (Year 2007) and refer to the same :Tree individual (name 1), though each row keeps its own Tree observation (DBH 35.8 and 36.2); row 3 yields a second TemporalRange observation (2008) and a second Tree (name 2, Abies balsamea, DBH 33.2).]
Every observation has an implicit “distinct” attribute (set to “no”)
… and every measurement has an implicit “key” attribute (set to “no”)
• Observation measurement keys
  – Like a primary key constraint
  – States that observation instances with the same key measurement values are of the same entity instance
  – Does not imply the same observation instance, unless the observation is declared distinct
  – All key measurements of an observation together form the primary key
• Distinct observations
  – Only apply if at least one key measurement is defined
  – States that observation instances with the same entity instance are the same observation instance
Annotation Examples
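The key and distinct rules can be read as a lookup on key values. The sketch below is a hypothetical, simplified reading of those rules, not the SONet implementation: it keys on raw column values rather than mapped measurement values, and uses the yr/spec/spp/dbh example with `yr` as the key of the distinct TemporalRange observation and `spec` as the key of the Tree observation. The `SPEC` layout and `resolve` function are invented for this example.

```python
from itertools import count

# Example rows (yr, spec = local tree name, spp, dbh).
ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]

# Hypothetical compact spec: key columns and the distinct flag per observation.
SPEC = {
    "o1": {"keys": ["yr"], "distinct": True},     # TemporalRange, key on Year
    "o2": {"keys": ["spec"], "distinct": False},  # Tree, key on EntityName
}

def resolve(rows, spec):
    """Return, per row, {obs label: (observation id, entity id)}."""
    fresh = count(1)
    entities = {}      # (label, key values) -> entity id
    observations = {}  # (label, entity id) -> observation id, distinct observations only
    result = []
    for row in rows:
        assigned = {}
        for label, s in spec.items():
            if s["keys"]:
                # Same key values => same entity instance; all keys form one primary key.
                key = (label, tuple(row[a] for a in s["keys"]))
                if key not in entities:
                    entities[key] = f"ent{next(fresh)}"
                ent = entities[key]
            else:
                ent = f"ent{next(fresh)}"  # no key: every row is a new entity
            if s["distinct"] and s["keys"]:
                # Distinct: same entity instance => same observation instance.
                if (label, ent) not in observations:
                    observations[(label, ent)] = f"obs{next(fresh)}"
                obs = observations[(label, ent)]
            else:
                obs = f"obs{next(fresh)}"  # otherwise each row is its own observation
            assigned[label] = (obs, ent)
        result.append(assigned)
    return result
```

On these rows, rows 1 and 2 share one TemporalRange observation and one Tree entity but keep separate Tree observations, since the Tree observation is not declared distinct.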
Annotation Examples
plt spp dbh
A piru 35.8
A piru 36.2
B piru 33.2
observation "o1" distinct yes entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
[Figure: plots A and B each yield one :Plot observation (EntityName/Nominal); the three tree observations (TaxonomicTypeName/ITIS Picea rubens; DBH 35.8, 36.2, 33.2) each have their plot observation as context, but all resolve to a single :Tree individual.]
Here we don’t have unique ids for trees
But, assume each spp name within a plot uniquely identifies a tree …
i.e., at most one tree of a particular type was measured (possibly multiple times) in each plot
Annotation Examples
plt spp dbh
A piru 35.8
A piru 36.2
B piru 33.2
observation "o1" distinct yes entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
[Figure: plots A and B each yield one :Plot observation; the three Picea rubens tree observations (DBH 35.8, 36.2, 33.2) each have their plot observation as context, yet all resolve to a single :Tree individual, regardless of plot.]
• The Tree entity instance should depend on the plot it is in!!! (context)
Annotation Examples
plt spp dbh
A piru 35.8
A piru 36.2
B piru 33.2
observation "o1" distinct yes entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2" entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context identifying yes observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Annotation Dataset
[Figure: with the identifying context, the two Picea rubens tree observations in plot A resolve to one :Tree individual and the one in plot B to a different :Tree individual, each within its own :Plot.]
Every context relationship has an implicit “identifying” qualifier (set to “no”)
Identifying: key uniqueness holds only within the context observation
Similar to a weak-entity constraint (ER)
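An identifying context can be modelled by adding the context observation's entity to the key, much like the owner's key in an ER weak entity. The sketch below is a hypothetical illustration on the plt/spp/dbh example, not the SONet code; `make_spec` and `assign_entities` are names invented here, and the distinct flag is omitted to keep the focus on keys.

```python
from itertools import count

ROWS = [
    {"plt": "A", "spp": "piru", "dbh": 35.8},
    {"plt": "A", "spp": "piru", "dbh": 36.2},
    {"plt": "B", "spp": "piru", "dbh": 33.2},
]

def make_spec(identifying):
    """Hypothetical spec; a context observation is listed before the one that uses it."""
    return {
        "o1": {"entity": "Plot", "keys": ["plt"], "context": None},
        "o2": {"entity": "Tree", "keys": ["spp"], "context": ("o1", identifying)},
    }

def assign_entities(rows, spec):
    """Return, per row, the entity id assigned to each observation label."""
    fresh = count(1)
    entities = {}  # full key tuple -> entity id
    result = []
    for row in rows:
        row_ents = {}
        for label, s in spec.items():
            key = [label] + [row[a] for a in s["keys"]]
            ctx = s["context"]
            if ctx is not None and ctx[1]:
                # Identifying context: the key is only unique within the context entity.
                key.append(row_ents[ctx[0]])
            key = tuple(key)
            if key not in entities:
                entities[key] = f"{s['entity']}#{next(fresh)}"
            row_ents[label] = entities[key]
        result.append(row_ents)
    return result
```

Without the identifying qualifier, all three `piru` rows resolve to one Tree; with it, plots A and B each get their own tree.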
Representing instances …
• Annotation(AnnotId, Resource)
• Observation(ObsId, AnnotId, EntId)
• Measurement(MeasId, ObsId, MeasType, Value)
• Context(ObsId1, ObsId2, Rel)
• Relationship(RelId, RelType)
• Entity(EntId, EntType)
This could be queried itself and/or mapped to triples
Note that ObsIds are unique across annotations; the two ObsIds in a Context row must belong to the same annotation
Annotation Examples
* Simple relational schema for OBOE models (individuals/triples)
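The relational schema above maps directly onto SQL tables. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the column types, the `REFERENCES` clauses, the resource name `example-dataset`, and the assumption that `MeasType` holds the characteristic name are all illustrative choices, not part of the original schema. It loads the first row of the yr/spec/spp/dbh example, queries the instances directly, and checks that both ObsIds in each Context row come from the same annotation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Annotation   (AnnotId INTEGER PRIMARY KEY, Resource TEXT);
CREATE TABLE Entity       (EntId INTEGER PRIMARY KEY, EntType TEXT);
CREATE TABLE Observation  (ObsId INTEGER PRIMARY KEY,
                           AnnotId INTEGER REFERENCES Annotation(AnnotId),
                           EntId INTEGER REFERENCES Entity(EntId));
CREATE TABLE Measurement  (MeasId INTEGER PRIMARY KEY,
                           ObsId INTEGER REFERENCES Observation(ObsId),
                           MeasType TEXT, Value TEXT);
CREATE TABLE Relationship (RelId INTEGER PRIMARY KEY, RelType TEXT);
CREATE TABLE Context      (ObsId1 INTEGER REFERENCES Observation(ObsId),
                           ObsId2 INTEGER REFERENCES Observation(ObsId),
                           Rel INTEGER REFERENCES Relationship(RelId));
""")

# First row of the example dataset: 2007, tree 1, piru, 35.8.
conn.execute("INSERT INTO Annotation VALUES (1, 'example-dataset')")
conn.executemany("INSERT INTO Entity VALUES (?, ?)", [(1, "TemporalRange"), (2, "Tree")])
conn.executemany("INSERT INTO Observation VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])
conn.executemany("INSERT INTO Measurement VALUES (?, ?, ?, ?)", [
    (1, 1, "Year", "2007"),
    (2, 2, "DBH", "35.8"),
    (3, 2, "TaxonomicTypeName", "Picea rubens"),
    (4, 2, "EntityName", "1"),
])
conn.execute("INSERT INTO Relationship VALUES (1, 'Within')")
conn.execute("INSERT INTO Context VALUES (2, 1, 1)")  # tree observation 2 is Within year observation 1

# Query the instances: DBH of each Tree together with the Year of its context observation.
rows = conn.execute("""
SELECT y.Value, d.Value
FROM Observation t
JOIN Entity e      ON e.EntId = t.EntId AND e.EntType = 'Tree'
JOIN Measurement d ON d.ObsId = t.ObsId AND d.MeasType = 'DBH'
JOIN Context c     ON c.ObsId1 = t.ObsId
JOIN Measurement y ON y.ObsId = c.ObsId2 AND y.MeasType = 'Year'
""").fetchall()

# Count Context rows whose two observations come from different annotations (should be 0).
mismatched = conn.execute("""
SELECT COUNT(*) FROM Context c
JOIN Observation a ON a.ObsId = c.ObsId1
JOIN Observation b ON b.ObsId = c.ObsId2
WHERE a.AnnotId <> b.AnnotId
""").fetchone()[0]
```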
• Developing compatible domain ontologies (design patterns for use with observation ontology)
• Scalability of materialization algorithm from annotations (data result sets)
• Testing and developing capabilities motivated by Use Cases (coastal ecosystems and plant traits)
• SONet and JWG-ODMS continue to meet and discuss
Ongoing Activities
Acknowledgements: Shawn Bowers, Huiping Cao, SEEK KR/SMS working group, and all members of SONet and Semtools projects
Thanks also to Chad Berkeley and Ben Leinfelder, project software engineers
Work supported by National Science Foundation awards 0225674, 0225676, 0743429, 0733849, 0753144, 0630033