DFT Basic Digital & Data Concepts - data is inherently collective data

Semantic Interoperability for Data in Context IG RDA Plenary 3: Friday 28th March 2014 (Day32 Gary Berg-Cross (SOCoP, RDA DFT WG co-chair), [email protected]


Page 1: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

Semantic Interoperability for Data in Context IG

RDA Plenary 3: Friday 28th March 2014 (Day32

Gary Berg-Cross (SOCoP, RDA DFT WG co-chair), [email protected]

Page 2: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

DFT Basic Digital & Data Concepts - data is inherently collective data

• Digital Data refers to a structured sequence of bits/bytes that represents information content. In many contexts digital data and data are used interchangeably implying both the bits and the content.

• Real-Time Data is data (or a data collection) that is produced on its own schedule and has a tight time relation to the processes that create it and that require immediate action. Timeliness, such as being real time, is an attribute of data.

• Dynamic Data is a type of data which is changing frequently and asynchronously.

• Note: Dynamic data has also been used in the context of workflows: a workflow that is executed can be called a "dynamic data object", or the results from executing the workflow can be called a "dynamic data object".

• Referable Data is a type of data (digital or not) that is persistently stored and which is referred to by a persistent identifier. Digital data may be accessed by the identifier. Some data object references may access a service on the object (OAI-ORE).

• Citable Data is a type of referable data that has undergone quality assessment and can be cited in publications.
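The way these terms build on one another can be sketched in a few lines of Python. This is only an illustration restricted to the digital case: the class names, the `pid` field, the `make_citable` helper, and the identifier string are assumptions made for this example and are not part of the DFT vocabulary.

```python
from dataclasses import dataclass


@dataclass
class DigitalData:
    """A structured sequence of bytes that represents information content."""
    content: bytes


@dataclass
class ReferableData(DigitalData):
    """Persistently stored data that is referred to by a persistent identifier."""
    pid: str  # hypothetical field name for the persistent identifier


@dataclass
class CitableData(ReferableData):
    """Referable data that has undergone quality assessment."""


def make_citable(item: ReferableData, passed_quality_assessment: bool) -> CitableData:
    """Promote referable data to citable data only after quality assessment."""
    if not passed_quality_assessment:
        raise ValueError("only quality-assessed data becomes citable")
    return CitableData(content=item.content, pid=item.pid)


# Usage: the identifier below is a made-up placeholder, not a real PID.
record = ReferableData(content=b"\x01\x02", pid="hdl:example/123")
citable = make_citable(record, passed_quality_assessment=True)
print(isinstance(citable, ReferableData))  # citable data is still referable data
```

Modelling citable data as a subclass of referable data mirrors the wording of the definitions: every citable item keeps its persistent identifier, and the quality assessment is the only added condition.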

Page 3: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

Background to this Semantic Interoperability

• Long-standing work on “data integration and sharing”.

• Semantics is FEATURED in the Application layer of OSI

• Intensive work in the AI & knowledge engineering areas.

• But to many the goal of semantic interoperability remains elusive.

• More recently the Semantic Web thrust pursued the goal of robust semantic interoperability & robust exchange of data.

• Fulfilling the Semantic Web (SW) vision needs deep knowledge and support for reasoning.

Page 4: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

SI has a Socio-Tech Aspect

Who is doing what?

How to Understand the Problem.

What are the critical Issues?

Is it a knowledge representation problem?

What is the role of Ontology?

What is the role of tools?

What are the best methods?

Re-use and integration of data from heterogeneous sources within and across discipline boundaries has not been routinely achieved.

Special technologies that infer, relate, interpret, and classify the implicit meanings of digital content are not easily adapted to the topical research interest or enfolded in traditional architectures.

Use an agile approach, based on sets of competency questions?

Don’t try too hard to train a Domain Expert in Gold Standard formal semantics?

Since meaning is a cognitive agent phenomenon, semantic interoperability is the technical analogue to human communication and cooperation. That makes it intrinsically HARD.

Use metadata semantic annotation?

Page 5: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

Graphic Overview of Semantic/Ontology Manifesto (EarthCube)

Knowledge Infrastructure Vision

Community Understanding of Semantic role and value

Guiding principles

1. Use Cases

2. Lightweight - opportunistic methods

3. Semantic interoperability with semantic heterogeneity

4. Bottom-up & top-down approaches

5. Domain - ontology engineer teams

6. Formalized bodies of knowledge across science domains

7. Broader “Reasoning” services

“Insertion”

Architecture & Workflow Between

Based on the work of (alphabetically) Gary Berg-Cross, Isabel Cruz, Mike Dean, Tim Finin, Mark Gahegan, Pascal Hitzler, Hook Hua, Krzysztof Janowicz, Naicong Li, Philip Murphy, Bryce Nordgren, Leo Obrst, Mark Schildhauer, Amit Sheth, Krishna Sinha, Anne Thessen, Nancy Wiegand, and Ilya Zaslavsky

Paper at http://stko.geog.ucsb.edu/gibda2012/gibda2012_submission_6.pdf

Page 6: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data


Lightweight Methods & Products

• Choose lightweight approaches to support application needs and reduced entry barrier

• Low hanging fruit leverages initial vocabularies & existing conceptual models to ensure that a semantics-driven infrastructure is available for early use.

Simple parts/patterns & direct relations to data Triple like parts

More relation types here Bottom Up.

A useful set of ideas that supports a useful subset of (approximate) reasoning

Page 7: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data


GeoSpatial Data & Web Feature Service Standardizes Terms but Lacks Semantics

Terminologies are created independently, based on different conceptual models, and differ not only in terms/vocabulary but also in meanings.

Page 8: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data


Better Conceptualization of Properties - for Interoperability (CUAHSI)

Organize properties like size as a physical quality, since it inheres in a physical object. Treat qualities such as physical, bulk, & measured properties (stream flow, level, pollutants, evapotranspiration, etc.) as usable concepts rather than level concepts.

• Currently CUAHSI has them at many levels

• E.g. 2291 Major, bulk properties 4

[Figure labels: Water Body; Water Density; Unit: grams/cm3]

For connecting to Chem/BioChem ontologies there might be sub-categories of Physical for elements – optical, hardness, color

See Dumontier Lab ontologies to represent bio-scientific concepts and relations. http://dumontierlab.com/?page=ontologies

[Figure: example relation graph for ChesapeakeBay. Node labels: ChesapeakeBay, Area, AreaQuantity, Real Number, Sq Miles. Relation labels: IsA, HasFeature, hasQuantity, hasValue, hasUnit, hasLayer, hasConstituent, usesStandard.]
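The node and relation labels on this slide can be read as subject-predicate-object triples. The Python below is a minimal sketch of that reading. How the nodes are wired together is an assumption (the original diagram layout did not survive), the `area_triples` and `objects` helpers are hypothetical names, and the numeric value is a placeholder rather than the real area of the bay.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]


def area_triples(feature: str, value: float, unit: str) -> List[Triple]:
    """Build triples for one measured area (assumed wiring of the slide's labels)."""
    quantity = f"{feature}_AreaQuantity"  # hypothetical node name
    return [
        (feature, "hasFeature", "Area"),
        (feature, "hasQuantity", quantity),
        (quantity, "isA", "AreaQuantity"),
        (quantity, "hasValue", value),
        (quantity, "hasUnit", unit),
    ]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[object]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Usage: 1.0 is a placeholder value, not a measurement.
graph = area_triples("ChesapeakeBay", 1.0, "SqMiles")
print(objects(graph, "ChesapeakeBay_AreaQuantity", "hasUnit"))
```

Keeping the value and the unit on a separate quantity node, rather than directly on the feature, is what lets the same property be reused with other units or standards.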

Page 9: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data


Incrementally Adding Better Semantic Relations/Properties

Data models & SKOS offer some relations, but they are limited. SKOS is more useful for terms than concepts.

Consider irreflexive, anti-symmetric & transitive constructs that capture common understanding.

Observation – Streams and lakes flow into rivers.

• Property “flows-into” is irreflexive

    • any one river cannot flow into itself as a loop

• “flows-into” is also anti-symmetric

    • if one river flows into the second, the second one can’t flow into the first.

• Transitive property for Regions, to say that the subRegionOf property between regions is transitive

    <owl:TransitiveProperty rdf:ID="subRegionOf">
      <rdfs:domain rdf:resource="#Region"/>
      <rdfs:range rdf:resource="#Region"/>
    </owl:TransitiveProperty>

If Logan, Cache County and Utah are regions, Logan is a subRegion of Cache County, and Cache County is a subRegion of Utah, then Logan is also a subRegion of Utah.
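The three property characteristics can be checked mechanically. The following Python is a minimal sketch, assuming a relation is stored as a set of (subject, object) pairs. It is not an OWL reasoner, and the function names and the stream, lake, and river names are hypothetical.

```python
from typing import Set, Tuple

Relation = Set[Tuple[str, str]]


def is_irreflexive(rel: Relation) -> bool:
    """True if nothing is related to itself."""
    return all(a != b for a, b in rel)


def is_antisymmetric(rel: Relation) -> bool:
    """True if no two distinct things are related in both directions."""
    return all(a == b or (b, a) not in rel for a, b in rel)


def transitive_closure(rel: Relation) -> Relation:
    """Add every pair implied by chaining existing pairs until nothing new appears."""
    closure = set(rel)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Usage with the slide's region example.
sub_region_of = {("Logan", "CacheCounty"), ("CacheCounty", "Utah")}
inferred = transitive_closure(sub_region_of)
print(("Logan", "Utah") in inferred)  # prints True

# Usage with made-up water bodies for "flows-into".
flows_into = {("StreamA", "RiverB"), ("LakeC", "RiverB")}
print(is_irreflexive(flows_into), is_antisymmetric(flows_into))  # prints True True
```

A relation that is both irreflexive and anti-symmetric can never hold in both directions, which is the "second one can't flow into the first" condition stated on the slide.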

Page 10: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

Grafton Street Dublin in Context

Grafton Street (Irish: Sráid Grafton) is one of the two principal shopping streets in Dublin city centre. Do we refer to it as a pedestrian mall or a shopping street? Is it a road object but with motor traffic restrictions? Or a public place? Or a non-identifiable part of the city surface?

OpenStreetMap -

All such references are usually outside a computer

Page 11: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

What Grafton Street Is Depends on Its Setting – What Time We Are Talking About, AND What Features

Grafton Street

1814 AD or 2014?

Transport or commerce features?

Page 12: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data


Philosophy Psychology Perspectives

Semantics in Context: Connecting 3 Views for Geography/GIScience Knowledge

GeoReality

Task - Regiment Language

Wetland… geo-entity… what boundary?

Flows Into isa Type of connected-to

Boundary segments = straight lines, so overall boundary is a polyline…

Knowledge/GeoConcepts

This is different than regular land and water

Name Reality

Understand Reality: Data evidence

Model to express what you understand

Models representing Geo-Knowledge

Maybe there is more than 1 type of boundary

Page 13: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

Example of (Powerful) Challenges – Semantic Mismatches, Inclusions & Alignments

Language level for expressing semantics

• Syntax and logical representation differences of the past should be handled by standardization & rule translations.

• Different expressivity (OWL vs. Common Logic) might be harder.

Ontology level (Grafton example)

• Different conceptualizations such as different class scope, hierarchy level differences, coverage or granularity.

• Scientists use different concepts & categories;

• What does it mean to say that concept P includes concept S?

• What does it mean to say that concepts P and S are semantically close?

• Scientific understanding often requires existing concepts to be revised or supplanted in the field

• Perspective – 4D vs. 3D, roads as straight lines or curves, time as interval or ratio…

• Tacit assumptions (when messaging, an agent has in mind a number of “unspoken,” implicit consequences of that message.) – “You can’t drive on Grafton”…
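One possible reading of the inclusion and closeness questions is extensional: compare the sets of things each concept applies to. The Python below is a sketch of that single reading only. The slides leave the questions open, the function names are hypothetical, the Jaccard index is a measure chosen for this illustration, and the street sets are invented.

```python
from typing import AbstractSet


def includes(p: AbstractSet[str], s: AbstractSet[str]) -> bool:
    """Concept P includes concept S if every instance of S is also an instance of P."""
    return s <= p


def closeness(p: AbstractSet[str], s: AbstractSet[str]) -> float:
    """Jaccard index of the two instance sets: shared instances over all instances."""
    union = p | s
    if not union:
        return 1.0  # two empty concepts are treated as identical
    return len(p & s) / len(union)


# Usage: invented instance sets; only GraftonStreet comes from the slides.
shopping_street = {"GraftonStreet", "StreetB"}
pedestrian_mall = {"GraftonStreet", "StreetC"}
print(includes(shopping_street, {"GraftonStreet"}))  # prints True
print(closeness(shopping_street, pedestrian_mall))
```

An extensional comparison ignores perspective and tacit assumptions, so two communities can share every instance and still mean different things by the concept.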

Pragmatics of Intentions & goals (also Grafton example)

We have different goals so application & use are targeted. We need to adjust conceptualization to accommodate these.


Page 14: DFT Basic  Digital & Data  Concepts  -  data  is inherently collective  data

One View of Semantic Representation & Heterogeneity

A challenge of deep semantic interoperability is that:

• A global, one-size-fits-all (Gold Standard) representation for each distinct situation represented by data, such as Grafton St., is not realistic,

• and its procrustean nature may not be desirable if it ignores real heterogeneity

• The judgment of some (cf. John Sowa) is that different representations might be optimal for different use cases

• Different levels of detail or granularity, along with different kinds of data entry options seem in practice suitable for different domains and settings.

• Since scientific research is diverse and evolving, what approach to granular standards can be developed for use?

• Perhaps it is to use formal semantics to narrow the range of ambiguity for particular purposes.