
INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION

CHRISTOPH PINKEL (MAIN AUTHOR), CARSTEN BINNIG, ERNESTO JIMENEZ-RUIZ, EVGENY KHARLAMOV, ET AL.

EXPLORING DATABASES CAN BE TEDIOUS…

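Figure: one question, "Author of paper with title ‘IncMap’?", has to be answered separately on three databases (DBLP, CMT, EASYCHAIR). Each database has its own schema (Schema 1, Schema 2, Schema 3) and needs its own query (SQL 1, SQL 2, SQL 3).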

PROBLEM 1: TOO MANY TABLES

Figure: the question "Author of paper with title ‘IncMap’?" faces a large number of tables, each with columns Id, Name, …

A typical SAP schema has more than 10,000 tables

PROBLEM 2: LIMITED EXPRESSIVENESS

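Ontology:

•  Class Person with sub-classes Author and Reviewer

•  Properties name, e-mail and area, each linked to a class by a domain edge

The slide shows three alternative relational schemas (Relational Schema, Options 1 to 3) for this ontology:

•  Tables Author and Reviewer: Author(aid, name, e-mail) with row (1, Lennon, a@b); Reviewer(rid, name, area) with row (1, Harrison, Onto)

•  Tables Person, Author and Reviewer: Person(pid, name) with rows (1, Lennon) and (2, Harrison); Author(pid, e-mail) with row (1, a@b); Reviewer(pid, area) with row (2, Onto)

•  Single table Person: Person(pid, name, e-mail, area, type) with rows (1, Lennon, a@b, -, author) and (2, Harrison, -, Onto, reviewer)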

Modeling generalization is “messy”

PROBLEM 3: TECHNICAL DESIGN

Example table names: BDC_IXN_FACT_MA, BDC_ACCOUNT_DIM, BDC_DEMOGRAPHICS_DIM, BDC_IXN_FACT_WA

Other issues:

•  De-normalization (i.e., merged tables)

•  No foreign keys!

•  Performance optimizations (horizontal, vertical fragmentation, …)

ONTOLOGY-BASED DATA ACCESS

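Figure: a single HIGH-LEVEL QUERY ("Author of paper with title ‘IncMap’?") is posed once. The ontology-based data access layer sits between this query and the three databases (DBLP, CMT, EASYCHAIR), which are reached through SQL 1, SQL 2 and SQL 3.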

Figure: the Person / Author / Reviewer ontology and the three relational schema options (Options 1 to 3) from the PROBLEM 2 slide.

Minimal Ontology (in OWL QL)

ONTOLOGY-BASED DATA ACCESS

Figure: the relational schema options (Options 1 to 3) and the Person / Author / Reviewer ontology from the PROBLEM 2 slide, side by side. Open question between Relational Schema and Ontology: Mapping?

IncMap: A Mapping Tool for Relational-To-Ontology Data Integration

THE JOURNEY OF INCMAP

First version of IncMap

•  Incremental mapping

•  Leverage lexicographical and structural similarity

Christoph Pinkel, et al.: Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap. International Semantic Web Conference 2013

Second version of IncMap

•  Consider typical design patterns

•  Leverage reasoning (open vs. closed-world)

•  Bootstrap mappings (fully automatic)

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Andriy Nikolov, Andreas Schwarte, Christian Heupel, Tim Kraska: IncMap: A Journey towards Ontology-based Data Integration. BTW 2017

STEP 1: MAPPING TO INCGRAPHS

Figure: a relational schema R and an ontology O, each translated into an IncGraph.

•  Relational Schema R: table Person (ID, ...) and table Paper (title, PersID (FK), ...)

•  IncGraph(R): nodes Person, Paper, PersID, ID, title and varchar, connected by edges labeled ref, val and type

•  Ontology O: Author, Paper and Person are of type Class (Author subClassOf Person); writes is of type Object Property (domain Author, range Paper); hasTitle is of type Datatype Property (with a domain edge)

•  IncGraph(O): nodes Author, writes, Paper, hasTitle, Person and string, connected by edges labeled ref, val, type and subClassOf

Main Reason: Mitigate structural differences
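The translation of a relational schema into a graph can be sketched roughly as follows. This is only a toy encoding: the edge labels `ref`, `val` and `type` come from the slide, while the function name `incgraph_from_schema`, the input format, the node naming (`Table.column`) and the edge directions are assumptions made for illustration and are not taken from IncMap itself.

```python
def incgraph_from_schema(tables):
    """Build a toy IncGraph-like edge set from a relational schema.

    tables: dict mapping table name -> list of (column, sql_type, fk_target),
            where fk_target is the referenced table name or None.
    Returns a set of (from_node, edge_label, to_node) triples.
    """
    edges = set()
    for table, columns in tables.items():
        for column, sql_type, fk_target in columns:
            node = f"{table}.{column}"
            if fk_target is None:
                # Plain column: table --val--> column --type--> SQL type
                edges.add((table, "val", node))
                edges.add((node, "type", sql_type))
            else:
                # Foreign key: table --ref--> column --ref--> referenced table
                edges.add((table, "ref", node))
                edges.add((node, "ref", fk_target))
    return edges


# The Person / Paper schema from the slide (column types are assumed).
schema = {
    "Person": [("ID", "int", None)],
    "Paper": [("title", "varchar", None), ("PersID", "int", "Person")],
}
graph = incgraph_from_schema(schema)
for edge in sorted(graph):
    print(edge)
```

In this encoding a foreign key becomes a two-hop `ref` path between the two tables, which has the same shape as the class / object property / class path on the ontology side. That is one way to read the "mitigate structural differences" remark.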

STEP 2: REASONING AND PATTERNS

Figure: IncGraph(R) and IncGraph(O) from Step 1 are extended to IncGraph+(R) and IncGraph+(O).

•  Pattern: Inheritance. Illustrated with the Person / Author / Reviewer ontology and the three relational schema options from the PROBLEM 2 slide; the extended IncGraph(R) contains an additional "multi-type" node next to Person, PersID, Paper, title, ID and varchar

•  Reasoning. Applied to IncGraph(O) with nodes Author, writes, Paper, hasTitle, Person and string (edges ref, val, type, subClassOf)

REASONING: TWO OPTIONS

Option 1: Full reasoning

1.  Reasoning on the base ontology using OWL QL

2.  Add all derivable elements to IncGraph(O)

Option 2: Custom reasoning (to close “modeling gaps”)

1.  Reasoning on the IncGraph(O)

•  Generalization hierarchies

•  Additional domain and range information

•  …

2.  Add selected elements to IncGraph(O) and set weights (see next slides)
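Option 2 can be illustrated with a small sketch that walks the generalization hierarchy and adds weighted domain edges for sub-classes. Everything here is an assumption for illustration: the function names, the dictionary-based input and the weight 0.5 for derived edges are hypothetical. The derived edges are also not OWL entailments; they are a heuristic for closing a modeling gap (a property declared on Person may show up as a column of an Author table).

```python
def superclasses(cls, sub_class_of):
    """Return all transitive superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def derive_domain_edges(sub_class_of, domain, derived_weight=0.5):
    """Return {(property, class): weight}.

    Asserted domain edges get weight 1.0. For every sub-class of a
    property's domain, an extra edge is added with derived_weight
    (a hypothetical value, not taken from the talk).
    """
    edges = {(prop, cls): 1.0 for prop, cls in domain.items()}
    classes = set(sub_class_of)
    for parents in sub_class_of.values():
        classes.update(parents)
    for cls in classes:
        for parent in superclasses(cls, sub_class_of):
            for prop, dom in domain.items():
                if dom == parent:
                    edges.setdefault((prop, cls), derived_weight)
    return edges


# The Person / Author / Reviewer ontology from the earlier slides.
sub_class_of = {"Author": ["Person"], "Reviewer": ["Person"]}
domain = {"name": "Person", "e-mail": "Author", "area": "Reviewer"}
domain_edges = derive_domain_edges(sub_class_of, domain)
for edge, weight in sorted(domain_edges.items()):
    print(edge, weight)
```

Keeping a lower weight on derived edges lets the later matching steps prefer asserted information while still considering the inherited candidates.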

STEP 3: PAIRWISE MATCHING

Figure: the Target graph (Author, writes, Paper, … connected by ref and val edges) and the Source graph (Person, PersID, Paper, … connected by ref and val edges) are combined into Possible Matches, i.e., chains of node pairs:

•  Author/Person, writes/PersID, Paper/Paper

•  Author/Paper, writes/PersID, Paper/Person

•  Paper/Person, writes/PersID, Author/Paper

Values shown on the pairs and edges: 1.0, 0.1, 0.2, 0.1, 0.1, 0.5, 0.2, 0.5, 0.2

Pairwise Connectivity Graph
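A pairwise connectivity graph in the sense of similarity flooding can be built as sketched below: two node pairs are connected whenever both graphs contain an edge with the same label between the corresponding nodes. The function name, the triple format and the edge directions are assumptions for illustration; the extensions that IncMap applies on top of this construction are not shown.

```python
from itertools import product


def pairwise_connectivity_graph(source_edges, target_edges):
    """Combine two labeled graphs into a pairwise connectivity graph.

    Both inputs are iterables of (from_node, label, to_node) triples.
    The result contains ((s1, t1), label, (s2, t2)) whenever
    (s1, label, s2) is a source edge and (t1, label, t2) is a target edge.
    """
    pcg = set()
    for (s1, s_label, s2), (t1, t_label, t2) in product(source_edges, target_edges):
        if s_label == t_label:
            pcg.add(((s1, t1), s_label, (s2, t2)))
    return pcg


# Source: relational side; target: ontology side (directions are assumed).
source = [("Person", "ref", "PersID"), ("PersID", "ref", "Paper")]
target = [("Author", "ref", "writes"), ("writes", "ref", "Paper")]

pcg = pairwise_connectivity_graph(source, target)
for edge in sorted(pcg):
    print(edge)
```

With directed edges, only chains that run in the same direction in both graphs are produced. Reversed candidates such as Author/Paper on the slide would additionally appear if the `ref` edges were treated as undirected.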

STEP 4: FIXPOINT COMPUTATION

•  Human Input (Acceptance and Rejection of Mappings)

•  Weights for Patterns (Probability of Pattern)

•  Deactivation of Edges (based on Patterns)

Figure: the Pairwise Connectivity Graph from Step 3 (node-pair chains Author/Person, writes/PersID, Paper/Paper and their alternatives, with values 1.0, 0.1, 0.2, 0.1, 0.1, 0.5, 0.2, 0.5, 0.2) is the input to the Fixpoint Computation (Ext. Similarity Flooding). Further labels on the slide: 0.7, 0.5, 0.9; 0.3, 0.3, 0.3; Sub-class; 0.9, 1.0, 1.0, 1.0. The IncGraph(O) nodes Author, writes, Paper, hasTitle, Person and string are shown again.
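The three inputs listed on this slide can be illustrated with a simplified fixpoint loop in the style of similarity flooding. This is a sketch under several assumptions, not IncMap's actual algorithm: the function name `flood`, the update rule (initial score plus weighted neighbor scores, normalized by the maximum), the way feedback is pinned to 1.0 or 0.0, and all example values are hypothetical.

```python
def flood(initial, edges, accepted=(), rejected=(), deactivated=(),
          max_iter=100, eps=1e-6):
    """Propagate similarity over a pairwise connectivity graph.

    initial:     dict node_pair -> initial similarity (must contain every pair)
    edges:       list of (pair_u, pair_v, weight); weights could encode
                 pattern probabilities
    accepted:    pairs confirmed by the user, pinned to 1.0
    rejected:    pairs rejected by the user, pinned to 0.0
    deactivated: (pair_u, pair_v) edges that must not propagate
    """
    def pin(scores):
        for pair in accepted:
            scores[pair] = 1.0
        for pair in rejected:
            scores[pair] = 0.0
        return scores

    active = [(u, v, w) for u, v, w in edges if (u, v) not in deactivated]
    sigma = pin(dict(initial))
    for _ in range(max_iter):
        nxt = dict(initial)  # every round starts from the initial scores
        for u, v, w in active:
            nxt[v] += w * sigma[u]  # propagate in both directions
            nxt[u] += w * sigma[v]
        top = max(nxt.values()) or 1.0
        nxt = pin({pair: score / top for pair, score in nxt.items()})
        delta = max(abs(nxt[pair] - sigma[pair]) for pair in nxt)
        sigma = nxt
        if delta < eps:
            break
    return sigma


# Illustrative values only; they are not the numbers from the slide.
initial = {
    ("Author", "Person"): 0.5,
    ("writes", "PersID"): 0.1,
    ("Paper", "Paper"): 1.0,
    ("Author", "Paper"): 0.2,
}
edges = [
    (("Author", "Person"), ("writes", "PersID"), 0.5),
    (("writes", "PersID"), ("Paper", "Paper"), 0.5),
    (("Author", "Paper"), ("writes", "PersID"), 0.5),
]
scores = flood(initial, edges,
               accepted={("Paper", "Paper")},
               rejected={("Author", "Paper")})
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this sketch the accepted pair Paper/Paper raises the score of its neighbor writes/PersID, while the rejected pair Author/Paper contributes nothing. This mirrors the incremental idea: each piece of user feedback changes the ranking of the remaining candidates.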

EVALUATION: RODI BENCHMARK

Figure: RODI benchmark scenarios.

•  Target Ontologies (Schema): Conference ontology 1, Conference ontology 2, Geodata ontology, Oil & gas ontology

•  Source Databases (Schema + Data): CMT Variant, CMT Canon., …; Conf. Variant, Conf. Canon., …; Mond. Variant, Mond. Rel., …; single, large real-world schema

•  Between each group of source databases and its target ontology: Mapping Rules?

Variants:

1. Adjusted Naming

2. Structural Adjustments (e.g., hierarchies)

3. Removed foreign keys

4. Merging / Splitting of tables

5. Combined cases

Conference domain: SIGKDD, Conference, CMT

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Dominique Ritze, Martin G. Skjæveland, Alessandro Solimando, Evgeny Kharlamov: RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. ESWC 2015

Real-World

https://github.com/chrpin/rodi

EVALUATION: RODI BENCHMARK

Evaluation queries:

•  Queries simulate information need

•  Can be additional input for mapping

•  56 queries from simple to complex

Metric: per-query F-measure
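A per-query F-measure can be computed as in the following sketch, which compares the result tuples a generated mapping returns with the expected result for each query. The function names and the choice to average over queries are assumptions; how RODI scores individual result tuples in detail may differ from this plain set comparison.

```python
def f_measure(expected, actual):
    """Harmonic mean of precision and recall over two result sets."""
    expected, actual = set(expected), set(actual)
    if not expected and not actual:
        return 1.0  # both empty: treated as a perfect match (assumption)
    true_positives = len(expected & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(actual)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(results):
    """results: list of (expected, actual) pairs, one per query."""
    return sum(f_measure(e, a) for e, a in results) / len(results)


# Hypothetical query results: one exact match, one partial match.
results = [
    ({"Lennon"}, {"Lennon"}),
    ({"a", "b", "c", "d"}, {"c", "d", "e"}),
]
print(mean_f_measure(results))
```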

EVALUATION: COMPETITORS

Relational-to-Ontology Mapping Systems

•  Ontop: http://ontop.inf.unibz.it (Free University of Bozen-Bolzano)

•  BootOX: https://www.cs.ox.ac.uk/isg/tools/BootOX/ (University of Oxford)

General Mapping Systems (Baseline)

•  COMA++: http://dbs.uni-leipzig.de/de/Research/coma.html (University of Leipzig)

EVALUATION: RESULTS


CONCLUSIONS

•  Incremental Mapping Generation for Relational-to-Ontology Mappings

•  Most benefits from domain knowledge (patterns, reasoning)

•  Integrated into real-world platform at fluidOps

•  Possible future directions: Patterns, other graph similarity metrics, …