Ontology for Data Integration


Upload: juanesteva

Post on 10-May-2015

Category: Technology

DESCRIPTION

Semantic data integration is the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities.

TRANSCRIPT

Page 1: Ontology For Data Integration

SOME THOUGHTS

JUAN ESTEVA, PH. D.

751 MALENA DR., ANN ARBOR, MI 48103

TEL: 734-786-0233 CELL 734-277-4962 FAX 734-821-0235 SKYPE DRESTEVA

JUAN.ESTEVA@AJATELLA.COM

Ontology Data Integration For Competitive Decision Making

Page 2: Ontology For Data Integration

04/11/2023 Juan Esteva, Ph. D.

Not Just The Facts

“Good decisions are based on information that is analyzed and transformed into usable knowledge” Eileen Feretic

Page 3: Ontology For Data Integration

Information at the point of impact

“Information needs to be at the point of impact—at the front lines where people are making decisions. The right analysis needs to be done at the right place. It’s important for organizations to treat information as a strategic asset in order to optimize every decision, every process, everything they do.” Ambuj Goyal,

Page 4: Ontology For Data Integration

Data in Silos

“One of the biggest challenges organizations face is the amount of data sitting in silos, too often, valuable data simply isn’t accessible or available.” Boris Evelson

Page 5: Ontology For Data Integration

Business Decisions for Competitive Advantage

“In today’s troubled economy and competitive business environment, making good decisions is a matter of survival. But good decisions aren’t based on gut feeling alone. They should be based on information gathered from multiple sources, which is then synthesized and analyzed to generate a road map of options and possible outcomes that transform data into usable knowledge” Eileen Feretic

Page 6: Ontology For Data Integration

Business Intelligence

Business Intelligence, and now Business Analytics, systems come into play.

“[However,] it is hard to assemble [heterogeneous data and] disparate pieces of information in a way that provides the intelligence and insight needed to make good business decisions.” Eileen Feretic

Enter Ontology Data Integration.

Page 7: Ontology For Data Integration

Data Integration

Data integration provides the ability to manipulate data transparently across multiple data sources.

Based on the architecture, there are two kinds of systems:

Central Data Integration
A central data integration system usually has a global schema, which provides the user with a uniform interface to access information stored in the data sources.

Peer-to-Peer
In contrast, in a peer-to-peer data integration system, there are no global points of control on the data sources (or peers). Instead, any peer can accept user queries for the information distributed in the whole system.

Page 8: Ontology For Data Integration

Common Approaches for Data Integration

Global-as-View (GaV)
In the GaV approach, every entity in the global schema is associated with a view over the source local schemas. Therefore querying strategies are simple, but the evolution of the local source schemas is not easily supported.

Local-as-View (LaV)
On the contrary, the LaV approach permits changes to source schemas without affecting the global schema, since the local schemas are defined as views over the global schema, but query processing can be complex.
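
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names (crm_customers, billing_accounts), their fields, and the global entity Customer(name, city) are hypothetical and do not come from the slides, and the LaV part shows source descriptions only, not a full query-rewriting algorithm.

```python
# Hypothetical source data; names and fields are illustrative assumptions.
crm_customers = [{"cust_name": "Acme", "cust_city": "Ann Arbor"}]
billing_accounts = [{"account": "Globex", "town": "Detroit"}]


def global_customer_view():
    """GaV: the global entity Customer(name, city) is defined as a view
    (here, a union of projections) over the local source schemas."""
    rows = [{"name": r["cust_name"], "city": r["cust_city"]} for r in crm_customers]
    rows += [{"name": r["account"], "city": r["town"]} for r in billing_accounts]
    return rows


def customers_in(city):
    """A query over the global schema is answered by unfolding the view."""
    return [row["name"] for row in global_customer_view() if row["city"] == city]


# LaV: each source is instead described as a view over the global schema.
# Adding a source only adds an entry here; the global schema is untouched.
lav_source_descriptions = {
    "crm_customers": {"global_entity": "Customer",
                      "columns": {"cust_name": "name", "cust_city": "city"}},
    "billing_accounts": {"global_entity": "Customer",
                         "columns": {"account": "name", "town": "city"}},
}


def sources_answering(global_entity, attribute):
    """Very simplified LaV step: list the sources whose description covers
    the requested global attribute. Real LaV rewriting is more involved."""
    return sorted(
        name for name, desc in lav_source_descriptions.items()
        if desc["global_entity"] == global_entity
        and attribute in desc["columns"].values()
    )
```

In the GaV half, a change to a source field means editing global_customer_view itself, which is why source evolution is awkward. In the LaV half, the same change touches only that source's description, at the cost of a harder query-answering step.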

Page 9: Ontology For Data Integration

Data Heterogeneity

Data sources can be heterogeneous in:

Syntax
Syntactic heterogeneity is caused by the use of different models or languages.

Schema
Schematic heterogeneity results from structural differences.

Semantics
Semantic heterogeneity is caused by different meanings or interpretations of data in various contexts.

To achieve data interoperability, the issues posed by data heterogeneity need to be eliminated.

Page 10: Ontology For Data Integration

Possible Solutions

The advent of XML has created a syntactic platform for Web data standardization and exchange. However, schematic data heterogeneity may persist, depending on the XML schemas used (e.g., nesting hierarchies). Likewise, semantic heterogeneity may persist even if both syntactic and schematic heterogeneities do not occur (e.g., naming concepts differently).
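
As a small illustration of heterogeneity that survives a shared XML syntax, the sketch below reads the same fact from two documents. Both documents and all element names are hypothetical: the second one nests the book under its author (a schematic difference) and uses "name" and "writer" where the first uses "title" and "author" (a naming difference).

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources describing the same fact in well-formed XML.
SOURCE_A = "<book><title>Semantic Web Primer</title><author>Antoniou</author></book>"
# Different nesting hierarchy and different element names for the same concepts.
SOURCE_B = "<writer id='Antoniou'><book><name>Semantic Web Primer</name></book></writer>"


def read_source_a(xml_text):
    """Source A keeps title and author as sibling child elements."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("title"), "author": root.findtext("author")}


def read_source_b(xml_text):
    """Source B needs its own reader: the title sits one level deeper under
    a different name, and the author is an attribute of the root element."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("book/name"), "author": root.get("id")}
```

Both readers return the same dictionary, but only because each one hard-codes knowledge of its source. That per-source knowledge is what an explicit mapping to a shared conceptual model is meant to capture.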

We should be concerned with solving all three kinds of heterogeneities by bridging syntactic, schematic, and semantic heterogeneities across different sources.

Page 11: Ontology For Data Integration

Semantic Data Integration Using Ontologies

We call semantic data integration the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities.

At the heart of semantic data integration is the concept of an ontology, which is an explicit specification of a shared conceptualization.

Page 12: Ontology For Data Integration

Ontology & Data Integration

Metadata Representation. Metadata (i.e., source schemas) in each data source can be explicitly represented by a local ontology, using a single language.

Global Conceptualization. The global ontology provides a conceptual view over the schematically-heterogeneous source schemas.

Support for High-level Queries. Given a high-level view of the sources, as provided by a global ontology, the user can formulate a query without specific knowledge of the different data sources. The query is then rewritten into queries over the sources, based on the semantic mappings between the global and local ontologies.

Declarative Mediation. Query processing in a hybrid peer-to-peer system uses the global ontology as a declarative mediator for query rewriting between peers.

Mapping Support. A thesaurus, formalized in terms of an ontology, can be used for the mapping process to facilitate its automation.
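
The high-level query support described above can be sketched as a lookup over semantic mappings. Everything named here is an assumption made for illustration: the sources (hr_db, payroll_csv), the global terms (Employee, name, salary), and the dictionary-based query format are hypothetical, and real systems use far richer mapping languages.

```python
# Hypothetical mappings from global ontology terms to local source terms.
MAPPINGS = {
    "hr_db": {"Employee": "staff", "name": "full_name", "salary": "pay"},
    "payroll_csv": {"Employee": "workers", "name": "worker", "salary": "gross_salary"},
}


def rewrite(global_query, mappings=MAPPINGS):
    """Rewrite a query over the global ontology into one query per source.

    global_query is a dict of the form {"concept": ..., "properties": [...]}.
    Sources that cannot answer every requested term are skipped.
    """
    rewritten = {}
    needed = [global_query["concept"], *global_query["properties"]]
    for source, mapping in mappings.items():
        if all(term in mapping for term in needed):
            rewritten[source] = {
                "relation": mapping[global_query["concept"]],
                "fields": [mapping[p] for p in global_query["properties"]],
            }
    return rewritten
```

Calling rewrite({"concept": "Employee", "properties": ["name", "salary"]}) yields one local query per source, so the user never has to know that one source says "pay" and the other "gross_salary".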

Page 13: Ontology For Data Integration

What do we need?

Increase search capabilities: from discovery to reasoning.

Increase metadata so as to provide strong semantics: from glossaries to ontologies.

Consequently, move from syntactic interoperability to structural interoperability and finally to semantic interoperability.

Page 14: Ontology For Data Integration

Graphically the model progression will be [2]

The point of this graph is that Increasing Metadata (from glossaries to ontologies) is highly correlated with Increasing Search Capability (from discovery to reasoning).

Page 15: Ontology For Data Integration

References

Page 16: Ontology For Data Integration

References

1. Applying 4D ontologies to Enterprise Architecture, Matthew West, Shell Corp.

2. FHA Data Architecture Working Group: SICoP DRM 2.0 Pilot, 2005

3. The Role of Ontologies in Data Integration, Isabel F. Cruz and Huiyong Xiao

Page 17: Ontology For Data Integration

Topic Maps

Topic Maps is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The ISO standard is formally known as ISO/IEC 13250:2003.

A topic map represents information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (representing the relationships between topics), and occurrences (representing information resources relevant to a particular topic).
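
The three building blocks named above can be modeled with a few plain data classes. This is a sketch for intuition only, not the ISO data model or its interchange syntax; the class and method names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Occurrences: references to information resources about this topic.
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    kind: str
    members: tuple  # names of the topics taking part in the relationship


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, name, *occurrences):
        self.topics[name] = Topic(name, list(occurrences))

    def associate(self, kind, *members):
        self.associations.append(Association(kind, members))

    def related_to(self, name):
        """Return the names of topics that share an association with `name`."""
        return sorted({m for a in self.associations if name in a.members
                       for m in a.members if m != name})
```

For example, after adding the topics "Ann Arbor" and "Michigan" and calling associate("located-in", "Ann Arbor", "Michigan"), related_to("Ann Arbor") returns ["Michigan"]. Finding related topics by following associations is the "findability" emphasis in miniature.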

Page 18: Ontology For Data Integration

SKOS

Simple Knowledge Organization System (SKOS)
SKOS is a common data model for sharing and linking knowledge organization systems via the Web.
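
To give a feel for what such a knowledge organization system holds, the sketch below stores two concepts as plain Python dictionaries keyed by the SKOS property names prefLabel, altLabel, and broader. The concepts themselves (ex:customer, ex:party) and the lookup function are hypothetical; a real deployment would publish the vocabulary as RDF rather than as a dictionary.

```python
# A hypothetical two-concept vocabulary using SKOS property names as keys.
CONCEPTS = {
    "ex:customer": {"prefLabel": "Customer",
                    "altLabel": ["Client", "Account holder"],
                    "broader": "ex:party"},
    "ex:party": {"prefLabel": "Party", "altLabel": [], "broader": None},
}


def concept_for(label, concepts=CONCEPTS):
    """Find the concept whose preferred or alternative label matches `label`
    (case-insensitive). Returns None when no concept matches."""
    wanted = label.strip().lower()
    for concept_id, props in concepts.items():
        labels = [props["prefLabel"], *props["altLabel"]]
        if wanted in (known.lower() for known in labels):
            return concept_id
    return None
```

Resolving both "Client" and "Customer" to the same concept identifier is the kind of thesaurus lookup that can help when source schemas name the same thing differently.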

Page 19: Ontology For Data Integration

RDF

Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
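
The merging property can be illustrated without any RDF library by treating a graph as a set of (subject, predicate, object) triples. This is a simplified sketch: the ex: identifiers and the data are hypothetical, and it ignores details such as blank nodes that a real RDF merge has to handle.

```python
# Each graph is a set of (subject, predicate, object) triples.
# The two hypothetical sources use different predicates and know
# different things, yet they share no table layout that must agree.
graph_a = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
graph_b = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:acme", "ex:locatedIn", "ex:AnnArbor"),
}

# In this simplified model, merging is just a set union.
merged = graph_a | graph_b


def describe(subject, graph):
    """Collect every predicate/object pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

A consumer that only asks describe("ex:alice", merged) for the predicates it understands keeps working when a source later adds new predicates, which is the schema-evolution point made above.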

Page 20: Ontology For Data Integration

OWL

Web Ontology Language (OWL)
OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be reasoned with by computer programs either to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide Web and may refer to or be referred from other OWL ontologies. OWL is part of the W3C’s Semantic Web technology stack, which includes RDF, RDFS, SPARQL, etc.
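
The two reasoning tasks mentioned above, making implicit knowledge explicit and verifying consistency, can be imitated on a toy scale. The sketch below is not an OWL reasoner and covers only subclass and disjointness axioms; the class names, individuals, and function names are hypothetical.

```python
# Hypothetical axioms: (subclass, superclass) pairs and disjoint class pairs.
SUBCLASS_OF = {("Manager", "Employee"), ("Employee", "Person")}
DISJOINT = {frozenset({"Person", "Organization"})}
ASSERTED_TYPES = {"alice": {"Manager"}, "acme": {"Organization"}}


def superclasses(cls):
    """All classes reachable from `cls` by following subclass axioms."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def inferred_types(individual):
    """Make implicit knowledge explicit: add every inherited class."""
    types = set(ASSERTED_TYPES[individual])
    for cls in list(types):
        types |= superclasses(cls)
    return types


def is_consistent(individual):
    """An individual may not belong to two classes declared disjoint."""
    types = inferred_types(individual)
    return not any(pair <= types for pair in DISJOINT)
```

Here "alice" is only asserted to be a Manager, yet inferred_types("alice") also contains Employee and Person. If "acme" were additionally asserted to be a Manager, is_consistent("acme") would return False, because it would then be both a Person and an Organization.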