ontology for data integration
SOME THOUGHTS
JUAN ESTEVA, PH. D.
751 Malena Dr., Ann Arbor, MI 48103
Tel: 734-786-0233 Cell: 734-277-4962 Fax: 734-821-0235 Skype: DRESTEVA
juan.esteva@ajatella.com
Ontology Data Integration for Competitive Decision Making
04/11/2023 Juan Esteva, Ph. D.
Not Just The Facts
“Good decisions are based on information that is analyzed and transformed into usable knowledge” Eileen Feretic
Information at the point of impact
“Information needs to be at the point of impact, at the front lines where people are making decisions. The right analysis needs to be done at the right place. It’s important for organizations to treat information as a strategic asset in order to optimize every decision, every process, everything they do.” Ambuj Goyal
Data in Silos
“One of the biggest challenges organizations face is the amount of data sitting in silos, too often, valuable data simply isn’t accessible or available.” Boris Evelson
Business Decisions for Competitive Advantage
“In today’s troubled economy and competitive business environment, making good decisions is a matter of survival. But good decisions aren’t based on gut feeling alone. They should be based on information gathered from multiple sources, which is then synthesized and analyzed to generate a road map of options and possible outcomes that transform data into usable knowledge” Eileen Feretic
Business Intelligence
Business Intelligence, and now Business Analytics, systems come into play.
“[However,] it is hard to assemble [heterogeneous data and] disparate pieces of information in a way that provides the intelligence and insight needed to make good business decisions.” Eileen Feretic
Enter Ontology Data Integration.
Data Integration
Data integration provides the ability to manipulate data transparently across multiple data sources.
Based on the architecture, there are two kinds of systems:
Central Data Integration. A central data integration system usually has a global schema, which provides the user with a uniform interface to access information stored in the data sources.
Peer-to-Peer. In contrast, in a peer-to-peer data integration system, there are no global points of control on the data sources (or peers). Instead, any peer can accept user queries for the information distributed in the whole system.
Common Approaches for Data Integration
Global-as-View (GaV). In the GaV approach, every entity in the global schema is associated with a view over the local source schemas. Querying strategies are therefore simple, but the evolution of the local source schemas is not easily supported.
Local-as-View (LaV). In contrast, the LaV approach permits changes to source schemas without affecting the global schema, since the local schemas are defined as views over the global schema. The price is that query processing can be complex.
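The contrast between the two approaches can be illustrated with a small sketch. The source names (`crm`, `billing`), the global entity `Customer`, and every field name below are hypothetical and chosen only for illustration. The LaV part is a heavily simplified stand-in for real view-based query rewriting, not a full mediator.

```python
# Hypothetical local sources, each with its own schema.
crm = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
billing = [{"id": 2, "name": "Alan Turing"}]


# GaV: the global entity "Customer" is defined as a view (here, a function)
# over the local source schemas.
def customer_view():
    for row in crm:
        yield {"id": row["cust_id"], "name": row["full_name"]}
    for row in billing:
        yield {"id": row["id"], "name": row["name"]}


def query_customers(predicate):
    """Answer a global query by unfolding the view and filtering."""
    return [c for c in customer_view() if predicate(c)]


# LaV: each source is instead described as a view over the global schema.
# Adding a source only adds an entry here and leaves the global schema
# untouched, but answering a query now means reasoning over descriptions.
lav_descriptions = {
    "crm": {"global_entity": "Customer", "covers": {"id", "name"}},
    "billing": {"global_entity": "Customer", "covers": {"id", "name"}},
}


def sources_for(entity, attributes):
    """Pick the sources whose description covers the requested attributes."""
    return sorted(
        name
        for name, desc in lav_descriptions.items()
        if desc["global_entity"] == entity and set(attributes) <= desc["covers"]
    )


gav_result = query_customers(lambda c: c["name"].startswith("A"))
lav_result = sources_for("Customer", ["name"])
```

In the GaV half, renaming a field in `crm` forces a change to `customer_view`. In the LaV half, only the description of `crm` would change.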
Data Heterogeneity
Data sources can be heterogeneous in:
Syntax. Syntactic heterogeneity is caused by the use of different models or languages.
Schema. Schematic heterogeneity results from structural differences.
Semantics. Semantic heterogeneity is caused by different meanings or interpretations of data in various contexts.
To achieve data interoperability, the issues posed by data heterogeneity need to be eliminated.
Possible Solutions
The advent of XML has created a syntactic platform for Web data standardization and exchange. However, schematic data heterogeneity may persist, depending on the XML schemas used (e.g., nesting hierarchies). Likewise, semantic heterogeneity may persist even if both syntactic and schematic heterogeneities do not occur (e.g., naming concepts differently).
We should be concerned with solving all three kinds of heterogeneities by bridging syntactic, schematic, and semantic heterogeneities across different sources.
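As a rough illustration of the three kinds of heterogeneity, the sketch below takes the same fact from two made-up sources and bridges each difference in turn. The record layouts and the names `client` and `customer_name` are assumptions invented for this example.

```python
import json
import xml.etree.ElementTree as ET

# Source A: XML, nested name structure, uses the term "client".
source_a = "<client><name><first>Ada</first><last>Lovelace</last></name></client>"
# Source B: JSON, flat structure, uses the term "customer".
source_b = '{"customer_name": "Ada Lovelace"}'


def from_a(xml_text):
    root = ET.fromstring(xml_text)        # syntactic bridge: parse XML
    first = root.findtext("name/first")   # schematic bridge: flatten the nesting
    last = root.findtext("name/last")
    # Semantic bridge: a "client" in source A means a "customer" in source B.
    return {"customer_name": f"{first} {last}"}


def from_b(json_text):
    return json.loads(json_text)          # syntactic bridge: parse JSON


record_a = from_a(source_a)
record_b = from_b(source_b)
```

Only after all three bridges are applied do the two records compare as equal.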
Semantic Data Integration Using Ontologies
We call semantic data integration the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities.
At the heart of semantic data integration is the concept of an ontology, which is an explicit specification of a shared conceptualization.
Ontology & Data Integration
Metadata Representation. Metadata (i.e., source schemas) in each data source can be explicitly represented by a local ontology, using a single language.
Global Conceptualization. The global ontology provides a conceptual view over the schematically-heterogeneous source schemas.
Support for High-level Queries. Given a high-level view of the sources, as provided by a global ontology, the user can formulate a query without specific knowledge of the different data sources. The query is then rewritten into queries over the sources, based on the semantic mappings between the global and local ontologies.
Declarative Mediation. Query processing in a hybrid peer-to-peer system uses the global ontology as a declarative mediator for query rewriting between peers.
Mapping Support. A thesaurus, formalized in terms of an ontology, can be used for the mapping process to facilitate its automation.
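The mapping-support and high-level-query roles listed above can be sketched together. This is a minimal sketch under stated assumptions: the thesaurus entries, source names, local terms, and the SQL-like query strings are all hypothetical, and a real system would use far richer mappings than synonym lookup.

```python
# Hypothetical thesaurus: each global concept lists local terms treated as synonyms.
thesaurus = {
    "Customer": {"customer", "client", "buyer"},
    "Product": {"product", "item", "article"},
}

# Hypothetical local ontologies: source name -> terms used in that source.
local_ontologies = {
    "crm": ["client", "item"],
    "shop": ["buyer", "article", "warehouse"],
}


def build_mappings(thesaurus, local_ontologies):
    """Propose global concept -> {source: local term} mappings via thesaurus lookup."""
    mappings = {}
    for source, terms in local_ontologies.items():
        for term in terms:
            for concept, synonyms in thesaurus.items():
                if term.lower() in synonyms:
                    mappings.setdefault(concept, {})[source] = term
    return mappings


def rewrite(global_concept, mappings):
    """Rewrite a query over a global concept into one query per mapped source."""
    per_source = mappings.get(global_concept, {})
    return {src: f"SELECT * FROM {term}" for src, term in sorted(per_source.items())}


mappings = build_mappings(thesaurus, local_ontologies)
rewritten = rewrite("Customer", mappings)
```

The term `warehouse` has no thesaurus entry, so it stays unmapped. Cases like this are where automated mapping would still need manual review.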
What do we need?
Increase search capabilities: from discovery to reasoning.
Increase metadata so as to provide strong semantics: from glossaries to ontologies.
Consequently, move from syntactic interoperability to structural interoperability and finally to semantic interoperability.
Graphically, the model progression is illustrated in [2].
The point of this graph is that Increasing Metadata (from glossaries to ontologies) is highly correlated with Increasing Search Capability (from discovery to reasoning).
References
1. Applying 4D Ontologies to Enterprise Architecture, Matthew West, Shell Corp.
2. SICoP DRM 2.0 Pilot, FHA Data Architecture Working Group, 2005.
3. The Role of Ontologies in Data Integration, Isabel F. Cruz and Huiyong Xiao.
Topic Maps
Topic Maps is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The ISO standard is formally known as ISO/IEC 13250:2003.
A topic map represents information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (representing the relationships between topics), and occurrences (representing information resources relevant to a particular topic).
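The three building blocks can be modeled with a few plain data classes. This is only an in-memory sketch of the concepts, not the ISO interchange syntax. The class layout, the `located-in` association type, and the example.org locator are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    resource: str  # locator of an information resource relevant to the topic


@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    kind: str
    members: tuple  # names of the topics that the association relates


topics = {
    "Ann Arbor": Topic("Ann Arbor", [Occurrence("https://example.org/ann-arbor")]),
    "Michigan": Topic("Michigan"),
}
associations = [Association("located-in", ("Ann Arbor", "Michigan"))]


def related(topic_name):
    """Return the names of topics associated with the given topic."""
    return [
        member
        for assoc in associations
        if topic_name in assoc.members
        for member in assoc.members
        if member != topic_name
    ]


related_to_michigan = related("Michigan")
```

Findability comes from navigation: starting at one topic, a reader can follow associations to related topics and then occurrences to the underlying resources.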
SKOS
Simple Knowledge Organization System (SKOS). SKOS is a common data model for sharing and linking knowledge organization systems via the Web.
RDF
Resource Description Framework (RDF). RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
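RDF statements are subject-predicate-object triples, which is why merging is straightforward: combining two sources amounts to taking the union of their triples. The sketch below models triples as plain Python tuples rather than using an RDF library, and the `ex:` identifiers are made up for the example.

```python
# Triples from two hypothetical sources that describe the same resource
# using partly different properties.
source_a = {
    ("ex:ada", "ex:name", "Ada Lovelace"),
    ("ex:ada", "ex:bornIn", "ex:London"),
}
source_b = {
    ("ex:ada", "ex:name", "Ada Lovelace"),     # duplicate statement
    ("ex:ada", "ex:field", "ex:Mathematics"),  # property unknown to source A
}

# Merging is a set union: duplicates collapse, new properties simply appear.
merged = source_a | source_b


def describe(subject, graph):
    """Collect every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


ada = describe("ex:ada", merged)
```

A consumer that only reads `ex:name` keeps working after `ex:field` appears, which is the sense in which schemas can evolve without every consumer changing.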
OWL
Web Ontology Language (OWL). OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language, so knowledge expressed in OWL can be reasoned with by computer programs, either to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide Web and may refer to, or be referred to from, other OWL ontologies. OWL is part of the W3C’s Semantic Web technology stack, which includes RDF, RDFS, SPARQL, etc.
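The two uses of reasoning mentioned above, making implicit knowledge explicit and verifying consistency, can be shown with a toy sketch. It covers only subclass and disjointness rules, implemented by hand in plain Python. It is not an OWL reasoner, and the class and individual names are hypothetical.

```python
# Hypothetical asserted knowledge.
subclass_of = {("Manager", "Employee"), ("Employee", "Person")}
instance_of = {("alice", "Manager")}
disjoint = {("Person", "Organization")}


def superclasses(cls):
    """All classes reachable from cls by following subclass links."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {sup for sub, sup in subclass_of if sub in frontier} - found
        found |= frontier
    return found


def inferred_types(individual):
    """Asserted types plus every type implied by the class hierarchy."""
    asserted = {c for i, c in instance_of if i == individual}
    return asserted.union(*(superclasses(c) for c in asserted))


def is_consistent(individual):
    """False if the individual belongs to two classes declared disjoint."""
    types = inferred_types(individual)
    return not any(a in types and b in types for a, b in disjoint)


alice_types = inferred_types("alice")
alice_ok = is_consistent("alice")
```

Nothing states directly that `alice` is a `Person`. That fact is derived from the hierarchy, and asserting that she is also an `Organization` would make the knowledge base inconsistent.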