Distributed Data, Distributed Governance, Distributed Vocabularies: The NERC DataGrid
BADC, BODC, CCLRC, PML and SOC
Bryan Lawrence
(on behalf of a big team, and note also a substantial piece of work with specific authorship included herein)
Semantic Workshop, Edinburgh, June 2006
Outline
• Motivation
• Standards
  – Feature Types
• Taxonomy
• Overall Architecture
• NDG Products
  – Discovery Portal
  – Data Extractor
  – MOLES (NumSim relationship with NMM)
  – CSML
• CSML
  – Description
  – Prototyping in MarineXML
  – Round-Tripping
• Vocabulary Issues in NDG (Hughes, Kondapalli, Lowry)
• NDG Timeline
http://ndg.nerc.ac.uk
British Atmospheric Data Centre
British Oceanographic Data Centre
Complexity + Volume + Remote Access = Grid Challenge
NCAR
Integration – semantics
• Want interdisciplinary semantic access to information, not abstract data
  – getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
  – not: getData(“era40.nc”, ‘PTMP’, 20:50, 300:340, 190:200)
  – or even worse:
      for j=1990:2000
        getData(“era40_”+j+“.nc”, ‘PTMP’, 20:50, 300:340)
• Lossy is OK!
  – Care less about completeness of representation than semantic unification
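A minimal sketch of the gap between the two styles of call, assuming hypothetical lookup tables (`VARIABLE_CODES`, `REGION_INDICES`) and a hypothetical `resolve` helper; none of these names come from NDG. The file pattern, variable code and index ranges are the ones on the slide, and what each index range means is an assumption.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: a real service would draw these from
# vocabularies and dataset metadata rather than hard-coded dictionaries.
VARIABLE_CODES = {("ERA-40", "potential temperature"): "PTMP"}
REGION_INDICES = {"North Atlantic": (slice(20, 50), slice(300, 340))}


@dataclass(frozen=True)
class SemanticRequest:
    """What the user wants, in terms they would actually use."""
    phenomenon: str
    dataset: str
    region: str
    first_year: int
    last_year: int


def resolve(request):
    """Translate a semantic request into one low-level call per yearly file."""
    code = VARIABLE_CODES[(request.dataset, request.phenomenon)]
    first_range, second_range = REGION_INDICES[request.region]
    years = range(request.first_year, request.last_year + 1)
    return [(f"era40_{year}.nc", code, first_range, second_range) for year in years]


request = SemanticRequest("potential temperature", "ERA-40", "North Atlantic", 1990, 2000)
calls = resolve(request)
print(len(calls), calls[0][0], calls[-1][0])
```

The user-facing request carries no file names or array indices; those appear only inside `resolve`, which is where the semantic layer would sit.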
Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.
Standards
• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
  – “Something you can name”
• Application schema
  – Defines semantic content and logical structure
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
Architecture: NDG Metadata Taxonomy
… not one schema, not one solution!
CSML
NCML+CF
MOLES
THREDDS
DIF -> ISO19115
CLADDIER
Architecture: Deployment
Data Providers
NDG Core Services
Users
NDG GUI Interface(s)
Vocab Services
Current Status
Discovery Service
http://ndg.nerc.ac.uk/discovery
NDG Products: Discovery Portal
NB: Web Service interface (you can run the search from your own site, then format and present the results there)!
Ugly as sin! A hint of things to come:
NDG Products: MOLES
MOLES: implementation
Core linking concept is the deployment:
• a Deployment
  – on behalf of an Activity
  – of a Data Production Tool at an Observation Station
  – that produces a Data Entity
[Diagram boxes: Deployment, Activity, Data Production Tool, Observation Station, Data Entity]
Each of the main metadata objects has security data attached to it, so access control can be applied to queries on the metadata.
The deployment links the metadata records into a structure that can be navigated.
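A rough sketch of the deployment as the linking record, with security data on each metadata object. The class and field names (`allowed_roles`, `linked_records`) and the example records are hypothetical, and treating an empty role set as "public" is an assumption, not the MOLES schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataObject:
    name: str
    # Roles allowed to see this record; empty means public (an assumption).
    allowed_roles: set = field(default_factory=set)

    def visible_to(self, roles):
        return not self.allowed_roles or bool(self.allowed_roles & set(roles))


class Activity(MetadataObject):
    """Project or campaign on whose behalf data are produced."""


class DataProductionTool(MetadataObject):
    """Instrument or model that produces data."""


class ObservationStation(MetadataObject):
    """Place or platform where the tool is deployed."""


class DataEntity(MetadataObject):
    """Dataset produced by the deployment."""


@dataclass
class Deployment:
    """Links the four main metadata objects into one navigable record."""
    activity: Activity
    tool: DataProductionTool
    station: ObservationStation
    entity: DataEntity

    def linked_records(self, roles):
        """Names of the linked records that the given roles may see."""
        records = [self.activity, self.tool, self.station, self.entity]
        return [record.name for record in records if record.visible_to(roles)]


deployment = Deployment(
    Activity("example campaign"),
    DataProductionTool("example radiosonde"),
    ObservationStation("example station"),
    DataEntity("example soundings", allowed_roles={"project-member"}),
)
print(deployment.linked_records(roles=[]))
print(deployment.linked_records(roles=["project-member"]))
```

Because the security data sits on each object, the same navigation query returns a different view depending on who asks.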
Simulators as data production tools: NumSim
NDG Products: NumSim
NumSim Example
NDG Products: DataExtractor
Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)
Download either plot or the data that went into the plot.
NDG Products: GEOSPLAT
ERA40:
• All driven from one CDML file: 9 TB of online spherical harmonics, looking like 40 TB of “virtual” gridded data!
NDG-A: Climate Science Modelling Language
• Aims:
  – provide semantic integration mechanism for NDG data
  – explore new standards-based interoperability framework
  – emphasise content, not container
• Design principles:
  – offload semantics onto parameter type (‘phenomenon’, observable, measurand)
    • e.g. wind-profiler, balloon temperature sounding
  – offload semantics onto CRS
    • e.g. scanning radar, sounding radar
  – ‘sensible plotting’ as discriminant
    • ‘in-principle’ unsupervised portrayal
  – explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
Climate Science Modelling Language
• CSML feature types
  – defined on basis of geometric and topologic structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
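One way to read the table is as a lookup on two properties: the geometry of a single sample and whether samples form a series. The sketch below assumes that reading; the `Geometry` enum and `feature_type` function are illustrative names, not part of CSML.

```python
from enum import Enum


class Geometry(Enum):
    POINT = "point"
    PROFILE = "profile"
    GRID = "grid"
    TRAJECTORY = "trajectory"


# (geometry of one sample, is it a series?) -> CSML feature type name
FEATURE_TYPES = {
    (Geometry.POINT, False): "PointFeature",
    (Geometry.POINT, True): "PointSeriesFeature",
    (Geometry.PROFILE, False): "ProfileFeature",
    (Geometry.PROFILE, True): "ProfileSeriesFeature",
    (Geometry.GRID, False): "GridFeature",
    (Geometry.GRID, True): "GridSeriesFeature",
    (Geometry.TRAJECTORY, False): "TrajectoryFeature",
}


def feature_type(geometry, is_series=False):
    """Pick the feature type for a dataset's geometric structure."""
    try:
        return FEATURE_TYPES[(geometry, is_series)]
    except KeyError:
        raise ValueError(f"no feature type for {geometry.value}, series={is_series}") from None


print(feature_type(Geometry.PROFILE, is_series=True))  # e.g. shipborne ADCP
print(feature_type(Geometry.GRID))                     # e.g. gridded analysis field
```

Note that nothing in the lookup mentions the measured phenomenon: a radiosonde and a CTD cast both come out as `ProfileFeature`, which is the "small number of weakly-typed features" idea.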
Climate Science Modelling Language
• CSML feature types– examples...
ProfileSeriesFeature
ProfileFeature
GridFeature
Climate Science Modelling Language
• Numerical array descriptors
  – provides ‘wrapper’ architecture for legacy data files
  – ‘Connected’ to data model numerical content through ‘xlink:href’
• Three subtypes:
  – InlineArray
  – ArrayGenerator
  – FileExtract (NASAAmes, NetCDF, GRIB)
• Composite design pattern for aggregation
[UML class diagram: array descriptor types]
«Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
«Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
«Type» AggregatedArray: +aggType[1], +aggIndex[1]; association 1 to * +component
«Type» InlineArray: +values[*]
«Type» ArrayGenerator: +expression[1]
«Type» AbstractFileExtract: +fileName[1]
«Type» NASAAmesExtract: +variableName[1], +index[0..1]
«Type» NetCDFExtract: +variableName[1]
«Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]
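The composite pattern named on the slide can be sketched as follows: an aggregate is itself an array descriptor, so aggregates can nest. Class names follow the diagram, but the `values()` method and the choice of simple concatenation are assumptions; the real `aggType` and `aggIndex` semantics are not modelled here.

```python
from abc import ABC, abstractmethod


class ArrayDescriptor(ABC):
    """Common interface for leaf arrays and aggregates alike."""

    @abstractmethod
    def values(self):
        """Return the described numbers as a flat list."""


class InlineArray(ArrayDescriptor):
    """Leaf: the numbers are carried directly in the descriptor."""

    def __init__(self, values):
        self._values = list(values)

    def values(self):
        return list(self._values)


class AggregatedArray(ArrayDescriptor):
    """Composite: built from other descriptors, including other aggregates."""

    def __init__(self, components):
        self.components = list(components)

    def values(self):
        # Assumption: aggregation is plain concatenation in component order.
        flat = []
        for component in self.components:
            flat.extend(component.values())
        return flat


nested = AggregatedArray([
    InlineArray([1, 2]),
    AggregatedArray([InlineArray([3]), InlineArray([4, 5])]),
])
print(nested.values())
```

A client only ever calls `values()`, so it does not need to know whether it holds a single legacy file extract or a whole tree of them.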
Climate Science Modelling Language
• Inline array
<NDGInlineArray>
  <arraySize>5 2</arraySize>
  <uom>udunits.xml#degreeC</uom>
  <numericType>float</numericType>
  <regExpTransform>s/10/9/ge</regExpTransform>
  <numericTransform>+5</numericTransform>
  <values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
• Array generator
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
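A short sketch of how a client might expand the array generator above. It assumes the expression reads as start:step:stop with an inclusive stop and a positive step, an interpretation that happens to agree with the declared arraySize of 10001; the `expand` function is a hypothetical helper, not the NDG parser.

```python
import xml.etree.ElementTree as ET

GENERATOR_XML = """
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
"""


def expand(xml_text):
    """Expand a start:step:stop expression and check it against arraySize."""
    node = ET.fromstring(xml_text.strip())
    start, step, stop = (int(part) for part in node.findtext("expression").split(":"))
    # Assumption: the stop value is included in the generated sequence.
    values = [float(v) for v in range(start, stop + step, step)]
    declared = int(node.findtext("arraySize"))
    if len(values) != declared:
        raise ValueError(f"expression yields {len(values)} values, arraySize says {declared}")
    return values


minutes = expand(GENERATOR_XML)
print(len(minutes), minutes[0], minutes[1], minutes[-1])
```

The descriptor stays a few lines long however many values it stands for, which is the point of generating rather than inlining a regular axis.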
Climate Science Modelling Language
• File extract
<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
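As an illustration of the ‘wrapper’ idea, the sketch below reads the GRIB extract descriptor into a plain dictionary that a file reader could act on. The `read_grib_extract` function and the dictionary keys are hypothetical; optional elements follow the [0..1] multiplicities in the class diagram. No GRIB file is opened here.

```python
import xml.etree.ElementTree as ET

GRIB_XML = """
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
"""


def optional_int(node, tag):
    """Return an integer element value, or None when the element is absent."""
    text = node.findtext(tag)
    return int(text) if text is not None else None


def read_grib_extract(xml_text):
    """Turn a GRIB extract descriptor into the facts a reader would need."""
    node = ET.fromstring(xml_text.strip())
    return {
        "shape": tuple(int(n) for n in node.findtext("arraySize").split()),
        "numeric_type": node.findtext("numericType"),
        "file_name": node.findtext("fileName"),
        "parameter_code": int(node.findtext("parameterCode")),
        "record_number": optional_int(node, "recordNumber"),
        "file_offset": optional_int(node, "fileOffset"),
    }


extract = read_grib_extract(GRIB_XML)
print(extract["shape"], extract["parameter_code"], extract["file_offset"])
```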
MarineXML Testbed
• Data from different parts of the marine community conforming to a variety of schema (XSD): Biological Species, Chl-a from Satellite, Modelled Hydrodynamics, Measured Hydrodynamics
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’.
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features
• The FTs can then be translated to equivalent FTs for display in the ECDIS system
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal
• Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in CSML
• ECDIS acts as an example client for the data.
[Diagram components: XML Parser, SeeMyDENC, Data Dictionary, S52 Portrayal Library, SENC, MarineGML (NDG) Feature Types, S-57v3 GML; XSD, XML and XSLT at each translation step]
Slide adapted from Kieran Millard (AUKEGGS, 2005)
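The registry idea can be sketched without any XSLT machinery. In the illustration below, plain Python functions stand in for the per-schema XSLT, and the schema names, record layouts and dictionary entries are all hypothetical; only the two rules (one translator per source schema, phenomena must be in the data dictionary) come from the slide.

```python
# Hypothetical data dictionary of known phenomena.
DATA_DICTIONARY = {"chlorophyll-a", "wave height"}


def translate_chl(record):
    """Stand-in for the XSLT attached to a satellite Chl-a schema."""
    return {"featureType": "GridFeature", "phenomenon": "chlorophyll-a",
            "values": record["grid"]}


def translate_waves(record):
    """Stand-in for the XSLT attached to a measured-hydrodynamics schema."""
    return {"featureType": "PointSeriesFeature", "phenomenon": "wave height",
            "values": record["series"]}


# The 'registry': one translator per source schema (names are hypothetical).
REGISTRY = {
    "chl-a-satellite.xsd": translate_chl,
    "measured-hydrodynamics.xsd": translate_waves,
}


def to_feature(schema, record):
    """Translate source data into a weakly typed feature, enforcing the dictionary rule."""
    feature = REGISTRY[schema](record)
    if feature["phenomenon"] not in DATA_DICTIONARY:
        raise ValueError(f"{feature['phenomenon']!r} is not in the data dictionary")
    return feature


feature = to_feature("measured-hydrodynamics.xsd", {"series": [1.2, 1.5, 1.1]})
print(feature["featureType"], feature["phenomenon"])
```

A client such as the ECDIS viewer then only needs to understand the handful of generic feature types, not each community's schema.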
MarineXML Testbed
• Biological sampling station with attributes for the species sampled at each
• Grid of Chl-a from the MERIS instrument on ENVISAT
• Predicted and measured wave climate timeseries (height, direction and period)
• Vectors of currents from instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)
The Concept of re-using Features
Here structured XML is converted to plain ascii text in the form required for a numerical model
HTML warning service pages are generated ‘on the fly’
XML can also be converted to SVG to display data graphically
Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)
CSML Round Tripping - 1
Managing semantics
[Diagram labels: conceptual model (Class1, Class2, «datatype» DataType1), UGAS, GML app schema (XML), GML dataset (instance), Conforms to, Application, produces, New Dataset (101010), parser (V1.0 will be in NDG Alpha)]
GML dataset instance shown on the slide (truncated there):
<gml:featureMember>
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain>
      <domainReference>
        <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
          <location>55.25 6.5</location>
        </NDGPosition>
      </domainReference>
    </NDGPointDomain>
    <gml:rangeSet>
      <gml:DataBlock>
        <gml:rangeParameters>
          <gml:CompositeValue>
            <gml:valueComponents>
              <gml:measure uom="#tn"/>
              <gml:measure uom="#amount"/>
              <gml:measure uom="#gsm"/>
            </gml:valueComponents>
          </gml:CompositeValue>
        </gml:rangeParameters>
        <gml:tupleList>
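A toy version of one small leg of the round trip: writing a position out as XML and parsing it back to the same values. Only the NDGPosition element and its attribute values are taken from the slide; the two functions are hypothetical helpers and say nothing about how the NDG parser itself is written.

```python
import xml.etree.ElementTree as ET


def position_to_xml(lat, lon):
    """Serialise a position using the element layout shown on the slide."""
    position = ET.Element("NDGPosition", {
        "srsName": "urn:EPSG:geographicCRS:4979",
        "axisLabels": "Lat Long",
        "uomLabels": "degree degree",
    })
    ET.SubElement(position, "location").text = f"{lat} {lon}"
    return ET.tostring(position, encoding="unicode")


def xml_to_position(xml_text):
    """Parse the XML back into a (lat, lon) pair of floats."""
    position = ET.fromstring(xml_text)
    lat, lon = (float(value) for value in position.findtext("location").split())
    return lat, lon


xml_text = position_to_xml(55.25, 6.5)
round_tripped = xml_to_position(xml_text)
print(xml_text)
print(round_tripped)
```

A round trip is only trustworthy if the values that come back equal the values that went in, which is the property a test of the full chain would need to assert.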
CSML Round Tripping - 2
Managing data - 1
[Diagram labels: Application, produces, CF Dataset (101010), scanner (V1.0 in NDG Alpha), GML dataset (XML instance), GML app schema, parser (V1.0 in NDG Alpha)]
Managing Data 2
[Diagram labels: CF Dataset (101010), scanner, GML dataset (XML), XSLT, ISO19115 XML, PUBLISH, DECISION PROCESSES, Define Dataset, Add Information]
Vocabulary Management for NERC DataGrid
Michael Hughes, V.Siva Kondapalli and Roy Lowry
Vocabulary Presentation Outline
• Problem and Solution
• NERC DataGrid Vocabulary Model
• Vocabulary Technical Governance
• Vocabulary Content Governance
• Mappings and Thesaurus Server
• Potential Role of Local Mappings
The Problem
• NERC DataGrid cannot function operationally without metadata and data semantic interoperability
• This will never be achieved without:
  – Readily accessible standard terms whose meaning is clearly understood
  – Readily accessible semantic maps both within and between lists of standard terms
  – Semantic maps between local terms and standard terms
The Solution?
• Implementation of a Vocabulary Server
• Building OWL ontologies mapping between domain-relevant de-facto standard vocabularies
• Deploying the ontologies through a Web Service thesaurus server
• Making tools available for users to build and deploy local ontologies
NDG Vocabulary Model
[Diagram: Entries (timestamped; Key, Term, Definition) grouped into Lists (versioned, timestamped and labelled with key, name and definition), and Lists grouped into a Constraint (aggregation of lists across which entry keys are unique)]
NDG Vocabulary Model
• The vocabulary resource is built from Entries
  – The representation of a single object in the real world comprising:
    • Key – A bit pattern that represents an entity. It must be unique, permanent and free from semantics.
    • Term – Text used to label the entity to facilitate human recognition.
    • Abbreviation – A shortened version of the term for use where space is tight. Target size is 20-30 bytes.
    • Definition – Text that unambiguously specifies the entity.
  – Entries are aggregated into Lists (entity class or subclass e.g. UK post towns)
  – Lists are aggregated into Constraints (entity class e.g. post towns of the world)
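The Entry, List and Constraint model might be sketched like this. Field names follow the slide where it gives them; the class names `VocabList` and `Constraint.add`, the example keys and the example lists are hypothetical, and the only rule enforced is the one stated: keys are unique across all lists in a constraint.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    key: str            # opaque and permanent; carries no meaning itself
    term: str           # human-readable label
    definition: str
    abbreviation: str = ""
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class VocabList:
    key: str
    name: str
    definition: str
    version: int = 1
    entries: list = field(default_factory=list)


@dataclass
class Constraint:
    """Aggregation of lists across which entry keys are unique."""
    lists: list = field(default_factory=list)

    def add(self, entry, target):
        if not any(target is candidate for candidate in self.lists):
            raise ValueError("target list is not part of this constraint")
        if any(e.key == entry.key for lst in self.lists for e in lst.entries):
            raise ValueError(f"key {entry.key!r} already used within this constraint")
        target.entries.append(entry)


uk_towns = VocabList("L01", "UK post towns", "Post towns in the UK")
other_towns = VocabList("L02", "Other post towns", "Post towns outside the UK")
world_towns = Constraint([uk_towns, other_towns])
world_towns.add(Entry("PT0001", "Edinburgh", "The post town of Edinburgh"), uk_towns)
print(len(uk_towns.entries), uk_towns.entries[0].term)
```

Keeping the key free of semantics means the term and definition can be corrected later without breaking anything that stored the key.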
Vocabulary Technical Governance
• The story so far
  – Lists are available as flat ASCII files or XML documents as URLs e.g.
    • http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.xml
    • ftp://ftp.pol.ac.uk/pub/bodc/jgofs/datadict/new/parameter_group.csv
    • http://www.sea-search.net/cdi_documentation/cdi_sampling_codes.csv
    • http://gcmd.nasa.gov/Resources/valids//gcmd_parameters.html
  – Some (BODC, SEA-SEARCH) include keys
  – Some (CF, BODC) include definitions
  – None are properly versioned
Vocabulary Technical Governance
• Versioning should:
  – Provide a unique label for each instantiation of the list
  – Enable any previous instantiation of the list to be recreated
  – Provide timestamp information for creation and modification of every object in the vocabulary system
• Delivery should:
  – Be from the master, not a copy
  – Be accessible to software agents to allow automated synchronisation of local copies
  – Have a ‘hotline’ to content governance
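One possible way to meet the three versioning requirements is an append-only change log, sketched below. This is an illustration under that assumption, not a description of how the NDG Vocabulary Server back end works; `VersionedList`, `put` and `snapshot` are hypothetical names.

```python
from datetime import datetime, timezone


class VersionedList:
    """Vocabulary list stored as an append-only log of timestamped changes."""

    def __init__(self):
        self.version = 0
        self._log = []  # (version, timestamp, key, term)

    def put(self, key, term):
        """Add a new entry or modify an existing one; returns the new version label."""
        self.version += 1
        self._log.append((self.version, datetime.now(timezone.utc), key, term))
        return self.version

    def snapshot(self, version=None):
        """Recreate the list exactly as it stood at the given version."""
        if version is None:
            version = self.version
        state = {}
        for logged_version, _timestamp, key, term in self._log:
            if logged_version > version:
                break
            state[key] = term
        return state

    def history(self, key):
        """Timestamps of the creation and every modification of one entry."""
        return [timestamp for _v, timestamp, k, _t in self._log if k == key]


terms = VersionedList()
v1 = terms.put("K001", "chlorophyll")
v2 = terms.put("K001", "chlorophyll-a")
v3 = terms.put("K002", "salinity")
print(terms.snapshot(v1), terms.snapshot())
```

Because nothing is ever overwritten, a software agent holding version `v1` can ask for exactly that state again, or replay the log from `v1` onwards to synchronise its local copy.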
Vocabulary Technical Governance
• NERC DataGrid Vocabulary Server
  – Back End
    • Fully automated record archive, timestamps and version numbering. Live April 2006.
    • 47 (of 115) lists publicly accessible.
  – Front End
    • Web Service API. Live June 2006.
    • XML list downloads from website (July 2006?).
    • Web-form tools (August 2006?).
Vocabulary Content Governance
• Standard lists need to respond to ever expanding user requirements
• Change needs to be rapid or users lose interest
• Standard lists need to maintain information quality and internal consistency
• Content governance has to resolve these conflicting requirements
Vocabulary Content Governance
• Content governance in oceanographic and atmospheric domains is based on:
  – Moderated e-mail discussion lists
  – ‘Benign Dictator’ and well-meaning volunteers
• Variable success depending on right people having ‘spare’ time at the right moments
• More formalism underpinned by more resources required
• But need to be careful about going too far or levels of service become unacceptable
Mappings and Thesaurus Server
• There will never be a single list for a given topic
• Term mapping therefore an essential part of semantic interoperability
• Marine Metadata Interoperability (http://marinemetdata.org) have developed tooling and trialled mappings in the measurement phenomena arena
Mappings and Thesaurus Server
• MMI approach
  – Harmonise lists to be mapped in OWL (Voc2OWL tool)
  – Map on basis of ‘same as’, ‘broader than’ and ‘narrower than’ relationships (VINE tool)
  – Place a Web Service API over the map to implement a term or thesaurus server
Mappings and Thesaurus Server
• NERC DataGrid Plans
  – Use MMI technology plus domain expertise available in BODC, BADC and their user communities to build a complete map between
    • BODC Parameter Discovery Vocabulary (300 terms)
    • CF Standard Names (5-600 terms)
    • GCMD Parameter Valids (2-300 relevant terms)
  – Incorporate this map into the NDG Discovery Service to facilitate smart searching (e.g. ‘pigments’ finds dataset labelled ‘chlorophyll’) through MMI Web Service
  – Integrate ontology maintenance into source list maintenance
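The ‘pigments’ finds ‘chlorophyll’ behaviour can be sketched with a small in-memory thesaurus built on the three relationships. The `Thesaurus` class, the ‘chl-a’ synonym and the dataset labels are hypothetical; in the plan above the map would sit behind the MMI Web Service rather than in local code.

```python
from collections import defaultdict


class Thesaurus:
    """Term map using 'same as', 'broader than' and 'narrower than'."""

    def __init__(self):
        self._same = defaultdict(set)
        self._narrower = defaultdict(set)  # term -> terms it is broader than

    def add(self, subject, relation, obj):
        if relation == "same as":
            self._same[subject].add(obj)
            self._same[obj].add(subject)
        elif relation == "broader than":
            self._narrower[subject].add(obj)
        elif relation == "narrower than":
            self._narrower[obj].add(subject)
        else:
            raise ValueError(f"unknown relation {relation!r}")

    def expand(self, term):
        """The term plus everything equivalent to or narrower than it."""
        found, todo = set(), [term]
        while todo:
            current = todo.pop()
            if current in found:
                continue
            found.add(current)
            todo.extend(self._same.get(current, set()))
            todo.extend(self._narrower.get(current, set()))
        return found


thesaurus = Thesaurus()
thesaurus.add("pigments", "broader than", "chlorophyll")
thesaurus.add("chl-a", "same as", "chlorophyll")  # hypothetical local synonym

datasets = {"dataset A": "chlorophyll", "dataset B": "salinity", "dataset C": "chl-a"}
wanted = thesaurus.expand("pigments")
hits = sorted(name for name, label in datasets.items() if label in wanted)
print(hits)
```

The search never widens upwards: a query for ‘chlorophyll’ would not return datasets labelled only ‘pigments’, since a broader label does not guarantee the narrower content.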
Role of Local Mappings
• There will always be local terms and understanding
• ‘Pigment data sets’ could mean:
  – Chlorophyll OR carotenoids OR phaeopigments
  – Chlorophyll AND carotenoids AND phaeopigments
• Depends on point of view
Role of Local Mappings
• Possible solution to this:
  – User builds an ontology reflecting local perception of the mapping between local terms and standard terms
  – Discovery or data integration tools use ontology as a ‘plug-in’ allowing user to operate with local terminology
• Tools (e.g. VINE) could be made available to facilitate this
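A sketch of the ‘plug-in’ idea, using the pigment example: two users supply different local mappings for the same local term, and the same matching code gives each the answer they expect. The dictionary layout, the "any"/"all" modes and the `matches` function are assumptions made for illustration.

```python
PIGMENTS = frozenset({"chlorophyll", "carotenoids", "phaeopigments"})

# Two local views of the same local term (both hypothetical):
any_view = {"pigment data sets": (PIGMENTS, "any")}  # chlorophyll OR carotenoids OR ...
all_view = {"pigment data sets": (PIGMENTS, "all")}  # chlorophyll AND carotenoids AND ...


def matches(dataset_terms, local_term, local_ontology):
    """Does a dataset satisfy a local term under the user's own mapping?"""
    standard_terms, mode = local_ontology[local_term]
    present = standard_terms & set(dataset_terms)
    if mode == "all":
        return present == standard_terms
    if mode == "any":
        return bool(present)
    raise ValueError(f"unknown mode {mode!r}")


chlorophyll_only = {"chlorophyll", "temperature"}
print(matches(chlorophyll_only, "pigment data sets", any_view))
print(matches(chlorophyll_only, "pigment data sets", all_view))
```

The discovery tool itself stays unchanged; swapping the mapping passed in is all it takes to switch point of view.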
NDG Timeline
NDG2 runs until September 2007:
• NDG-Alpha (June 2006)
  – Not all components in place (particularly delivery broker)
  – Not many (maybe only DX) products will be deployable by non-NDG participants (too much hard work installing things that haven’t been optimised for installation)
  – Discovery portal will be (is now) usable, linking to NCAR data etc, but isn’t very user friendly (options not obvious etc).
• NDG-Beta (Feb 2007)
  – Most components should work, but deployment of software may still be difficult by non-participants
• NDG-Prod (Jun 2007)
  – Should be deployable and far more user friendly (spending from Feb-June working on deployment and friendliness, no new functionality)
• Last few months working on sustainability etc
http://proj.badc.rl.ac.uk/trac/roadmap