Data Integration at the Ontology Engineering Group

Data integration at our group: ingredients and some prospects
CredIBLE workshop, Sophia-Antipolis, October 15th, 2012
Oscar Corcho [email protected]
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain
With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)

Upload: oscar-corcho

Post on 19-Jan-2015


DESCRIPTION

Presentation on the data integration work at OEG-UPM (http://www.oeg-upm.net/), given at the CredIBLE workshop in Sophia-Antipolis (October 15th, 2012).

TRANSCRIPT

Page 1: Data Integration at the Ontology Engineering Group

Data integration at our group: ingredients and some prospects

CredIBLE workshop, Sophia-Antipolis, October 15th, 2012

Oscar Corcho [email protected]

Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain

With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)

Page 2: Data Integration at the Ontology Engineering Group

Our data integration needs, problems (and challenges)

Need to access heterogeneous relational data sources (mainly in the area of Geography)

Need to submit SPARQL queries to distributed SPARQL endpoints

• Some of the databases are available in different DBMSs

• And some of the data sources are available as spreadsheets

• Furthermore, many of these datasets are already published as Linked Data

And data may be available from data streams (e.g., sensors)

Page 3: Data Integration at the Ontology Engineering Group

Ingredients

[Diagram labels: Linked Open Data, Spreadsheets]

1. RDB2RDF
2. Optimisations
3. Sensor-based query rewriting
4. Federated Query Processing
5. Reasoning

From the SemSorGrid4Env architecture (http://www.semsorgrid4env.eu/)

Page 4: Data Integration at the Ontology Engineering Group

Disclaimer

When I talk about ontology-based querying, I will normally be talking about SPARQL querying.

Page 5: Data Integration at the Ontology Engineering Group

1. RDB2RDF

In other words, how to make relational data available as RDF (and connected to ontologies)

Page 6: Data Integration at the Ontology Engineering Group

RDB2RDF. Motivation

• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.

[Diagram labels: transformation description, transformation engine]

Page 7: Data Integration at the Ontology Engineering Group

RDB2RDF. Query rewriting for OBDA with mappings

[Diagram: a query Q is rewritten into a query Q’; labels: Rewriting, Mappings]

There may be some mappings to translate between the ontology and the DB. The rewriting should take those mappings into account.
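A minimal sketch of this idea, assuming a toy mapping table: the `MAPPINGS` dictionary, the `ex:` property names and the `rewrite_triple_pattern` helper are hypothetical and do not come from any OEG tool. Each ontology property is mapped to a table and two columns, and a triple pattern over that property is rewritten into a SQL query over the database.

```python
# Hypothetical mappings: ontology property -> (table, key column, value column).
MAPPINGS = {
    "ex:title": ("REVIEW", "REVIEWID", "TITLE"),
    "ex:name": ("PERSON", "PERSONID", "NAME"),
}


def rewrite_triple_pattern(subject_var, predicate, object_var):
    """Rewrite the triple pattern (?subject predicate ?object) into SQL."""
    if predicate not in MAPPINGS:
        raise KeyError(f"no mapping for {predicate}")
    table, key_col, val_col = MAPPINGS[predicate]
    # NULL values produce no triple, hence the IS NOT NULL condition.
    return (
        f"SELECT {key_col} AS {subject_var}, {val_col} AS {object_var} "
        f"FROM {table} WHERE {val_col} IS NOT NULL"
    )


print(rewrite_triple_pattern("review", "ex:title", "title"))
```

A real rewriter also has to handle joins between patterns, constants in subject or object position, and IRI templates; this sketch covers only a single pattern with two variables.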

Page 8: Data Integration at the Ontology Engineering Group

RDB2RDF. Existing approaches

1. To build a new ontology from a database schema and content (direct mappings)

2. To map the ontology created in approach (1) to a legacy ontology

3. To map an existing DB to a legacy ontology

[Diagram labels: new ontology (1), existing ontology (2, 3)]
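Approach 1 can be sketched as follows, loosely following the shape of the W3C Direct Mapping (row IRI `Table/pk=value`, predicate IRI `Table#column`). This is a simplified illustration under stated assumptions: a single-column primary key, no foreign keys, literals left as plain Python values. The base IRI, the `People` table and the function name are made up for the example.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk, row):
    """Return (subject, predicate, object) triples for one row of `table`."""
    t = quote(table, safe="")
    subject = f"{base}{t}/{quote(pk, safe='')}={quote(str(row[pk]), safe='')}"
    # Every row is typed with a class derived from the table name.
    triples = [(subject, RDF_TYPE, f"{base}{t}")]
    for column, value in row.items():
        if value is None:  # NULL values produce no triple
            continue
        triples.append((subject, f"{base}{t}#{quote(column, safe='')}", value))
    return triples


for triple in direct_map_row(
    "http://example.org/db/", "People", "ID",
    {"ID": 7, "fname": "Bob", "addr": None},
):
    print(triple)
```

The resulting vocabulary mirrors the database schema, which is why approaches 2 and 3 then map it (or the database itself) to an existing ontology.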

Page 9: Data Integration at the Ontology Engineering Group

OEG’s background knowledge in RDB2RDF

• R2O and ODEMapster
• GaV wrapper generation (no mediators)
• Syntactic sugar for the generation of SQL queries.
• Simple use of this language and processor in the domains of fund finding, cultural information, and fisheries.
• NeOn Toolkit plugin for common mappings

Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.

Page 10: Data Integration at the Ontology Engineering Group

R2O (Relational-to-Ontology) Language

For concepts...

• A view maps exactly one concept in the ontology.
• A subset of the columns in the view maps a concept in the ontology.
• A subset (selection) of the records of a database view maps a concept in the ontology.
• A subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL.
• One or more concepts can be extracted from a single data field (not in 1NF).

For attributes...

• A column in a database view maps directly to an attribute or a relation.
• A column in a database view maps to an attribute or a relation after some transformation.
• A set of columns in a database view maps to an attribute or a relation.

Page 11: Data Integration at the Ontology Engineering Group

The W3C RDB2RDF Working Group

• Created in 2007
• W3C Recommendations in September 2012
• R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
• Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
• R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
• RDB2RDF Implementation Report - http://www.w3.org/2001/sw/rdb2rdf/implementation-report/

Page 12: Data Integration at the Ontology Engineering Group

R2RML example

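The example shown on this slide is not preserved in the transcript. As a stand-in, here is a small sketch of one R2RML building block, the expansion of an `rr:template` string against a row. It assumes the row is a Python dict, ignores escaped braces, and uses an invented template and column name; `expand_template` is a hypothetical helper, not part of any R2RML processor.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, row):
    """Expand {COLUMN} placeholders with percent-encoded column values.

    Returns None when any referenced column is NULL, since no term is
    generated in that case.
    """
    columns = PLACEHOLDER.findall(template)
    if any(row.get(column) is None for column in columns):
        return None
    return PLACEHOLDER.sub(
        lambda m: quote(str(row[m.group(1)]), safe="-._~"), template
    )


print(expand_template("http://example.org/review/{REVIEWID}", {"REVIEWID": 42}))
```

Percent-encoding the values keeps the generated string usable as an IRI even when a column contains spaces or slashes.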

Page 13: Data Integration at the Ontology Engineering Group

Existing implementations

• OEG implementations
• http://code.google.com/p/oeg-obdi/
• https://github.com/jpcik/morph
• https://github.com/boricles/morph

RDB2RDF Implementation Report. Boris Villazón-Terrazas, Michael Hausenblas. http://www.w3.org/2001/sw/rdb2rdf/implementation-report/

Page 14: Data Integration at the Ontology Engineering Group

Ongoing work

• Provide a list of common patterns in R2RML transformations, so that they can be reused (increasing productivity)
• Sequeda J, Priyatna F, Villazón-Terrazas B. Relational Database to RDF Mapping Patterns. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).
• Villazón-Terrazas B, Priyatna F. Building Ontologies by using Re-engineering Patterns and R2RML Mappings. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).
• http://mappingpedia.linkeddata.es/

• Improve our support in Morph for all test cases
• Adapt existing GUIs for the generation of mappings (such as the one in the NeOn Toolkit).

Page 15: Data Integration at the Ontology Engineering Group

2. R2RML query rewriting optimisations

In other words, how to optimise this query rewriting so that query evaluation does not suffer from poor efficiency

Page 16: Data Integration at the Ontology Engineering Group

R2RML is now a W3C Recommendation

• That’s very good to ensure wide uptake, but…
• Implementations still suffer from a lack of efficiency
• UltraWrap has shown that performance similar to native SQL can be obtained with direct mappings on high-end databases (Oracle, SQL Server)
• What happens with low-end databases (MySQL)?

Page 17: Data Integration at the Ontology Engineering Group

Several works on SPARQL to SQL translation

• Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.

• R. Cyganiak. A relational algebra for SPARQL. Digital Media Systems Laboratory. HP Laboratories Bristol. HPL-2005-170, 2005.

• B. Elliott, E. Cheng, C. Thomas-Ogbuji, and Z.M. Ozsoyoglu. A complete translation from SPARQL into efficient SQL. In Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 31-42. ACM, 2009.

• A. Chebotko, S. Lu, and F. Fotouhi. Semantics preserving SPARQL-to-SQL translation. Data & Knowledge Engineering, 68(10):973-1000, 2009.

Page 18: Data Integration at the Ontology Engineering Group

Chebotko’s query rewriting


Page 19: Data Integration at the Ontology Engineering Group

Our proposal

(Under embargo) Paper in preparation

Page 20: Data Integration at the Ontology Engineering Group

An example. BSBM08

NATIVE
SELECT r.title, r.text, r.reviewDate, p.personID, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.productID=55547 AND r.personID=p.personID AND r.language='en'
ORDER BY r.reviewDate desc

CHEBOTKOSELECT var_rating2 AS rating2, var_reviewerName AS reviewerName, var_title AS title, var_rating1 AS rating1, var_reviewDate AS reviewDate, var_reviewer AS reviewer, var_rating3 AS rating3, var_rating4 AS rating4, var_text AS textFROM (SELECT * FROM (SELECT uri_rating41477446315 AS uri_rating41477446315, var_rating2 AS var_rating2, var_reviewer AS var_reviewer, uri_reviewDate750573656 AS uri_reviewDate750573656, var_rating4 AS var_rating4, var_rating1 AS var_rating1, var_text AS var_text, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_rating21477446253 AS uri_rating21477446253, uri_text1457367120 AS uri_text1457367120, uri_rating31477446284 AS uri_rating31477446284, uri_rating11477446222 AS uri_rating11477446222, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_reviewDate AS var_reviewDate, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_Product555472014519903 AS uri_Product555472014519903, v_7634.var_review AS var_review, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, var_lang AS var_langFROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, v_6537.var_review AS var_review, uri_rating11477446222 AS uri_rating11477446222, uri_rating31477446284 AS uri_rating31477446284, uri_Product555472014519903 AS uri_Product555472014519903, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating2 AS var_rating2, uri_rating21477446253 AS uri_rating21477446253, var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066, var_text AS var_text, uri_language269987354 AS uri_language269987354, var_rating1 AS var_rating1, uri_reviewDate750573656 AS uri_reviewDate750573656, var_title AS var_title, var_reviewerName AS var_reviewerName, var_rating3 AS var_rating3, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDateFROM 
(SELECT var_reviewDate AS var_reviewDate, uri_language269987354 AS uri_language269987354, var_title AS var_title, uri_Product555472014519903 AS uri_Product555472014519903, uri_rating21477446253 AS uri_rating21477446253, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, uri_title1963229325 AS uri_title1963229325, var_rating2 AS var_rating2, v_8909.var_review AS var_review, var_lang AS var_lang, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating1 AS var_rating1, var_text AS var_text, var_reviewer AS var_reviewerFROM (SELECT uri_Product555472014519903 AS uri_Product555472014519903, v_7100.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewer AS var_reviewer, var_reviewerName AS var_reviewerName, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_lang AS var_lang, uri_title1963229325 AS uri_title1963229325, var_title AS var_title, uri_language269987354 AS uri_language269987354FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, uri_Product555472014519903 AS uri_Product555472014519903, var_reviewerName AS var_reviewerName, var_title AS var_title, uri_reviewDate750573656 AS uri_reviewDate750573656, var_text AS var_text, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate, v_4166.var_review AS 
var_review, var_reviewer AS var_reviewerFROM (SELECT v_2076.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_2076.PRODUCTID AS uri_Product555472014519903FROM REVIEW v_2076WHERE (v_2076.PRODUCTID = 55547) ) v_4166INNER JOIN (SELECT var_lang AS var_lang, var_reviewerName AS var_reviewerName, uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewer AS var_reviewer, uri_title1963229325 AS uri_title1963229325, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_reviewer2088452952 AS uri_reviewer2088452952, v_7134.var_review AS var_review, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDateFROM (SELECT v_3759.REVIEWID AS var_review, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325, v_3759.TITLE AS var_titleFROM REVIEW v_3759WHERE (v_3759.TITLE IS NOT NULL) ) v_7134INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_text AS var_text, var_reviewer AS var_reviewer, uri_text1457367120 AS uri_text1457367120, uri_reviewDate750573656 AS uri_reviewDate750573656, v_3150.var_review AS var_review, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, var_reviewDate AS var_reviewDate, var_reviewerName AS var_reviewerName, uri_language269987354 AS uri_language269987354FROM (SELECT v_7417.REVIEWID AS var_review, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, v_7417.TEXT AS var_textFROM REVIEW v_7417WHERE (v_7417.TEXT IS NOT NULL) ) v_3150INNER JOIN (SELECT uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, uri_reviewer2088452952 AS uri_reviewer2088452952, v_208.var_review AS var_review, var_reviewerName AS var_reviewerName, var_lang AS var_lang, var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656FROM (SELECT v_3119.REVIEWID AS var_review, 
'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, v_3119.LANGUAGE AS var_langFROM REVIEW v_3119WHERE (v_3119.LANGUAGE IS NOT NULL) ) v_208INNER JOIN (SELECT uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewDate AS var_reviewDate, uri_name1396749066 AS uri_name1396749066, var_reviewerName AS var_reviewerName, var_reviewer AS var_reviewer, v_750.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952FROM (SELECT v_3971.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, v_3971.REVIEWDATE AS var_reviewDateFROM REVIEW v_3971WHERE (v_3971.REVIEWDATE IS NOT NULL) ) v_750INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewerName AS var_reviewerName, var_review AS var_review, v_3578.var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066FROM (SELECT v_1393.REVIEWID AS var_review, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_1393.PERSONID AS var_reviewerFROM REVIEW v_1393WHERE (v_1393.PERSONID IS NOT NULL) ) v_3578INNER JOIN (SELECT v_1353.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_1353.NAME AS var_reviewerNameFROM PERSON v_1353WHERE (v_1353.NAME IS NOT NULL) ) v_6677 ON ((v_3578.var_reviewer = v_6677.var_reviewer) OR (v_3578.var_reviewer IS NULL) OR (v_6677.var_reviewer IS NULL)) ) v_6633 ON ((v_750.var_review = v_6633.var_review) OR (v_750.var_review IS NULL) OR (v_6633.var_review IS NULL)) ) v_6163 ON ((v_208.var_review = v_6163.var_review) OR (v_208.var_review IS NULL) OR (v_6163.var_review IS NULL)) ) v_4265 ON ((v_3150.var_review = v_4265.var_review) OR (v_3150.var_review IS NULL) OR (v_4265.var_review IS NULL)) ) v_3393 ON ((v_7134.var_review = v_3393.var_review) OR (v_7134.var_review IS NULL) OR (v_3393.var_review IS NULL)) ) v_2323 ON ((v_4166.var_review = v_2323.var_review) OR (v_4166.var_review IS NULL) OR 
(v_2323.var_review IS NULL)) ) v_7100LEFT JOIN (SELECT v_1061.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_1061.RATING1 AS var_rating1FROM REVIEW v_1061WHERE (v_1061.RATING1 IS NOT NULL) ) v_6802 ON ((v_7100.var_review = v_6802.var_review) OR (v_7100.var_review IS NULL) OR (v_6802.var_review IS NULL)) ) v_8909LEFT JOIN (SELECT v_4863.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_4863.RATING2 AS var_rating2FROM REVIEW v_4863WHERE (v_4863.RATING2 IS NOT NULL) ) v_5037 ON ((v_8909.var_review = v_5037.var_review) OR (v_8909.var_review IS NULL) OR (v_5037.var_review IS NULL)) ) v_6537LEFT JOIN (SELECT v_9539.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, v_9539.RATING3 AS var_rating3FROM REVIEW v_9539WHERE (v_9539.RATING3 IS NOT NULL) ) v_2592 ON ((v_6537.var_review = v_2592.var_review) OR (v_6537.var_review IS NULL) OR (v_2592.var_review IS NULL)) ) v_7634LEFT JOIN (SELECT v_1309.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, v_1309.RATING4 AS var_rating4FROM REVIEW v_1309WHERE (v_1309.RATING4 IS NOT NULL) ) v_219 ON ((v_7634.var_review = v_219.var_review) OR (v_7634.var_review IS NULL) OR (v_219.var_review IS NULL)) ) v_9704WHERE (var_lang = 'en')ORDER BY var_reviewDate DESC ) v_3839

Page 21: Data Integration at the Ontology Engineering Group

An example. BSBM08


OUR APPROACHSELECT var_rating2 AS rating2, var_reviewDate AS reviewDate, var_rating4 AS rating4, var_rating1 AS rating1, var_reviewer AS reviewer, var_rating3 AS rating3, var_reviewerName AS reviewerName, var_text AS text, var_title AS titleFROM (SELECT * FROM (SELECT v_2660.var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, var_review AS var_review, uri_rating31477446284 AS uri_rating31477446284, uri_rating21477446253 AS uri_rating21477446253, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, var_reviewerName AS var_reviewerName, var_lang AS var_lang, uri_Product555472014519903 AS uri_Product555472014519903, var_rating2 AS var_rating2, uri_rating41477446315 AS uri_rating41477446315, var_title AS var_title, var_rating4 AS var_rating4, var_text AS var_text, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, uri_reviewer2088452952 AS uri_reviewer2088452952FROM (SELECT v_8722.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_8722.NAME AS var_reviewerNameFROM PERSON v_8722WHERE (v_8722.NAME IS NOT NULL) ) v_2660INNER JOIN (SELECT v_3353.REVIEWDATE AS var_reviewDate, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_3353.REVIEWID AS var_review, v_3353.TEXT AS var_text, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_3353.RATING1 AS var_rating1, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_3353.TITLE AS var_title, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, 
'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_3353.PERSONID AS var_reviewer, v_3353.RATING3 AS var_rating3, v_3353.PRODUCTID AS uri_Product555472014519903, v_3353.RATING4 AS var_rating4, v_3353.LANGUAGE AS var_lang, v_3353.RATING2 AS var_rating2, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325FROM REVIEW v_3353WHERE ((v_3353.PRODUCTID = 55547) AND (v_3353.TEXT IS NOT NULL) AND (v_3353.TITLE IS NOT NULL) AND (v_3353.PERSONID IS NOT NULL) AND (v_3353.LANGUAGE IS NOT NULL) AND (v_3353.REVIEWDATE IS NOT NULL)) ) v_3049 ON ((v_2660.var_reviewer = v_3049.var_reviewer) OR (v_2660.var_reviewer IS NULL) OR (v_3049.var_reviewer IS NULL)) ) v_3795WHERE (var_lang = 'en')ORDER BY var_reviewDate DESC ) v_5787
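Comparing the two rewritten queries, the first produces one subquery per triple pattern and joins them on `var_review`, while the second answers the REVIEW patterns with a single scan of the table. The sketch below illustrates that general idea (eliminating self-joins over the same table) on a toy SQLite database. The table contents and column subset are invented for the example, and this is not the authors' algorithm, which the slides do not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review (reviewid INTEGER PRIMARY KEY, productid INTEGER,
                     title TEXT, body TEXT);
INSERT INTO review VALUES (1, 55547, 'Great', 'Works well');
INSERT INTO review VALUES (2, 55547, NULL, 'No title');
INSERT INTO review VALUES (3, 10000, 'Other', 'Other product');
""")

# One subquery per triple pattern, joined on the shared subject variable.
PER_PATTERN = """
SELECT t.var_review, t.var_title, b.var_body
FROM (SELECT reviewid AS var_review FROM review WHERE productid = 55547) p
JOIN (SELECT reviewid AS var_review, title AS var_title
      FROM review WHERE title IS NOT NULL) t ON p.var_review = t.var_review
JOIN (SELECT reviewid AS var_review, body AS var_body
      FROM review WHERE body IS NOT NULL) b ON p.var_review = b.var_review
ORDER BY t.var_review
"""

# The same patterns collapsed into a single scan of the table.
SINGLE_SCAN = """
SELECT reviewid AS var_review, title AS var_title, body AS var_body
FROM review
WHERE productid = 55547 AND title IS NOT NULL AND body IS NOT NULL
ORDER BY reviewid
"""

rows_joined = conn.execute(PER_PATTERN).fetchall()
rows_single = conn.execute(SINGLE_SCAN).fetchall()
print(rows_joined == rows_single)
```

Both queries return the same rows here because every subquery ranges over the same table and is joined on its key; whether a database optimiser performs this collapse on its own depends on the engine.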

Page 22: Data Integration at the Ontology Engineering Group

Analysis with BSBM

[Charts not preserved; panels: SQL Server, MySQL]

Page 23: Data Integration at the Ontology Engineering Group

Ongoing work

• Writing the paper describing our optimisations

• Proposing a comprehensive benchmarking platform to test R2RML-compliant query rewriting systems
• Extending our current work on the R2RML implementation test cases

Page 24: Data Integration at the Ontology Engineering Group

3. Ontology-based sensor query rewriting

In other words, what happens if our data sources are not static, but data streams? Can we still use similar techniques?

Page 25: Data Integration at the Ontology Engineering Group

An example: SmartCities

SmartSantander Project

Environmental sensors

Parking sensors

Page 26: Data Integration at the Ontology Engineering Group

Emergency planner: "I have to make sense out of all this data"

Flood risk alert: South East England

[Diagram labels: data from the Web, forecasts, wave data, environmental defenses]

Heterogeneity

Continuous querying

Streaming data

Page 27: Data Integration at the Ontology Engineering Group

Ingredients for Linked Sensor Data

Core ontological model

Additional domain ontologies

Guidelines for generation of identifiers

Sensor Web programming interfaces

Query processing engines

http://www.flickr.com/photos/santos/2252824606/

Page 28: Data Integration at the Ontology Engineering Group

Overview of the SSN ontology

[Diagram of the SSN ontology modules (Skeleton, Device, Deployment, PlatformSite, System, OperatingRestriction, Process, Data, MeasuringCapability, ConstraintBlock), their main classes (System, Platform, Deployment, DeploymentRelatedProcess, Device, Sensor, SensingDevice, Sensing, SensorInput, SensorOutput, ObservationValue, Process, Input, Output, Observation, Property, FeatureOfInterest, Condition, MeasurementCapability, SurvivalRange, OperatingRange) and the property restrictions linking them]

Compton M, Barnaghi P, Bermúdez L, García-Castro R, Corcho O, Cox S, Graybeal J, Hauswirth M, Henson C, Herzog A, Huang V, Janowicz K, Kelsey WD, Le Phuoc D, Lefort L, Leggieri M, Neuhaus H, Nikolov A, Page K, Passant A, Sheth A, Taylor K. The SSN Ontology of the W3C Semantic Sensor Network Incubator Group. Journal of Web Semantics. In press

Page 29: Data Integration at the Ontology Engineering Group

SSN Ontology with other Ontologies


García-Castro R, Corcho O, Hill C. A Core Ontology Model for Semantic Sensor Web Infrastructures. International Journal of Semantic Web and Information Systems 8(1):22-42

Page 30: Data Integration at the Ontology Engineering Group

Queries to Sensor Data

SNEEql (Data Stream Management System)
RSTREAM SELECT id, speed, direction FROM wind [NOW];

Esper QL (Complex Event Processor)
SELECT wind_speed FROM wind_sensor.win:time(10 min)

GSN RESTful service (Sensor Data Middleware)
http://montblanc.slf.ch:22001/multidata?vs[0]=wind_sensor&field[0]=wind_speed&from=15/09/2011+05:00:00&to=15/09/2011+15:00:00

Pachube RESTful service (Sensor Data Middleware)
http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z

Querying through ontologies?

Page 31: Data Integration at the Ontology Engineering Group

SPARQL-Stream

SELECT ?windspeed ?tidespeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTES TO NOW-0 MINUTES]
WHERE {
  ?WaveObs a ssn:Observation;
           ssn:observationResult ?windspeed;
           ssn:observedProperty sweetSpeed:WindSpeed.
  ?TideObs a ssn:Observation;
           ssn:observationResult ?tidespeed;
           ssn:observedProperty sweetSpeed:TideSpeed.
  FILTER (?tidespeed<?windspeed)
}

Query processing closer to data

Use ontologies as conceptual model

Query virtual stream graphs
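The window `[NOW-10 MINUTES TO NOW-0 MINUTES]` restricts the query to recent stream elements. A minimal sketch of that selection over timestamped values follows. It assumes inclusive window bounds and an in-memory list of `(timestamp, value)` pairs, both of which are simplifications chosen for illustration; `in_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def in_window(observations, now, from_minutes, to_minutes):
    """Keep (timestamp, value) pairs with now-from <= timestamp <= now-to."""
    start = now - timedelta(minutes=from_minutes)
    end = now - timedelta(minutes=to_minutes)
    return [(ts, value) for ts, value in observations if start <= ts <= end]


now = datetime(2012, 10, 15, 12, 0)
observations = [
    (now - timedelta(minutes=15), 3.0),  # too old for a 10-minute window
    (now - timedelta(minutes=5), 4.2),
    (now, 5.1),
]
recent = in_window(observations, now, from_minutes=10, to_minutes=0)
print(recent)
```

In a continuous query the window is re-evaluated as time advances, so the same function would be applied again with a later `now`.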

Page 32: Data Integration at the Ontology Engineering Group

SPARQL-Stream

SELECT ?name ( AVG(?temperature) AS ?avgTemperature )
             ( AVG(?humidity) AS ?avgHumidity )
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS SLIDE 1 HOURS]
FROM <http://www.cwi.nl/SRBench/sensors>
FROM <http://www.cwi.nl/SRBench/geonames>
WHERE {
  ?sensor om-owl:generatedObservation ?temperatureObservation;
          om-owl:generatedObservation ?humidityObservation;
          om-owl:hasLocatedNearRel [ om-owl:hasLocation ?nearbyLocation ] .
  ?temperatureObservation om-owl:observedProperty weather:_AirTemperature ;
                          om-owl:result [ om-owl:floatValue ?temperature ] .
  ?humidityObservation om-owl:observedProperty weather:_RelativeHumidity ;
                       om-owl:result [ om-owl:floatValue ?humidity ] .
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:featureClass ?featureClass ;
                      gn:name | gn:officialName ?name ;
                      gn:population ?population .
      FILTER ( ?population > 15000 && REGEX(?featureClass, "P", "i") )
    }
  }
  UNION
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:parentFeature+ ?parentFeature .
      ?parentFeature gn:featureClass ?parentClass ;
                     gn:name | gn:officialName ?name ;
                     gn:population ?parentPopulation .
      FILTER ( ?parentPopulation > 15000 && REGEX(?parentClass, "P", "i") )
    }
  }
} GROUP BY ?name

Aggregates

Static & Streaming

Windows

Filters, Functions

Disclaimer: some features are not yet implemented

Page 33: Data Integration at the Ontology Engineering Group

Querying the Observations

SELECT ?waveheight
FROM STREAM <www.ssg4env.eu/SensorReadings.srdf> [NOW -10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?WaveObs a sea:WaveHeightObservation;
           sea:hasValue ?waveheight;
}

[Architecture diagram labels: Client, SPARQLStream, Query Rewriting, Mappings, Query Processing, GSN API, Sensor Network, [tuples], Data translation, [triples]]

R2RML Mappings

:Wan4WindSpeed a rr:TriplesMapClass;
  rr:tableName "wan7";
  rr:subjectMap [ rr:template "http://swissex.ch/ns#WindSpeed/Wan7/{timed}";
                  rr:class ssn:ObservationValue;
                  rr:graph ssg:swissexsnow.srdf ];
  rr:predicateObjectMap [ rr:predicateMap [ rr:predicate ssn:hasQuantityValue ];
                          rr:objectMap [ rr:column "sp_wind" ] ];

http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=sp_wind

Query processing engines
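The mapping above names a source table (`rr:tableName "wan7"`) and a column (`rr:column "sp_wind"`), and the GSN request carries the same two values as `vs[0]` and `field[0]`. A small sketch of how such a URL could be assembled is shown below. The function name and its parameters are assumptions made for illustration; only the URL layout is taken from the slides, and nothing else about the GSN API is implied.

```python
from urllib.parse import urlencode


def gsn_multidata_url(base, virtual_sensor, field, start=None, end=None):
    """Build a GSN 'multidata' request URL in the layout shown on the slides."""
    params = [("vs[0]", virtual_sensor), ("field[0]", field)]
    if start and end:
        params += [("from", start), ("to", end)]
    # Keep brackets, slashes and colons readable; spaces become '+'.
    return f"{base}/multidata?" + urlencode(params, safe="[]/:")


print(gsn_multidata_url("http://montblanc.slf.ch:22001", "wan7", "sp_wind"))
print(gsn_multidata_url(
    "http://montblanc.slf.ch:22001", "wan7", "sp_wind",
    start="15/09/2011 05:00:00", end="15/09/2011 15:00:00",
))
```

The optional `start` and `end` arguments play the role of the time window of the SPARQL-Stream query.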

Page 34: Data Integration at the Ontology Engineering Group

Rewriting to different technologies

SELECT ?windspeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTE TO NOW-0 MINUTE]
WHERE {
  ?WaveObs a ssn:Observation;
           ssn:observationResult ?windspeed;
           ssn:observedProperty sweetSpeed:WindSpeed.
}

Query Rewriting produces an algebra representation, which is translated to each target technology:

SNEE (DSMS)
SELECT wan7.wind_speed_scalar_av AS windspeed, wan7.timed AS windts FROM wan7[FROM NOW-10 MINUTES TO NOW]

Esper (CEP)
SELECT wind_speed_scalar_av, timed FROM wan7.win:time(10 min)

GSN (Middleware)
http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&from=15/05/2011+05:00:00&to=15/05/2011+15:00:00

Pachube (Middleware)
http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z

Calbimonte JP, Corcho O, Jeung H, Aberer K. Enabling Query Technologies for the Semantic Sensor Web. International Journal of Semantic Web and Information Systems 8(1):43-63
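The step from one algebra representation to several target languages can be sketched as a small intermediate structure with one renderer per technology. The `WindowedProjection` class and both renderer functions are hypothetical and cover only a projection over a time window; they imitate the shape of the Esper and SNEEql queries on this slide, not the actual rewriting implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowedProjection:
    """Toy algebra node: project `fields` from `stream` over a time window."""
    stream: str
    fields: tuple
    window_minutes: int


def to_esper(q):
    """Render the node in the Esper style used on the slide."""
    cols = ", ".join(q.fields)
    return f"SELECT {cols} FROM {q.stream}.win:time({q.window_minutes} min)"


def to_sneeql(q):
    """Render the node in the SNEEql style used on the slide (no aliases)."""
    cols = ", ".join(f"{q.stream}.{f}" for f in q.fields)
    return f"SELECT {cols} FROM {q.stream}[FROM NOW-{q.window_minutes} MINUTES TO NOW]"


query = WindowedProjection("wan7", ("wind_speed_scalar_av", "timed"), 10)
print(to_esper(query))
print(to_sneeql(query))
```

Keeping the algebra independent of the target means a new back end only needs a new renderer, while the SPARQL-Stream front end and the mappings stay unchanged.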

Page 35: Data Integration at the Ontology Engineering Group

Ongoing work

• Benchmarking of ontology-based streaming data engines
• Zhang Y, Pham MD, Corcho O, Calbimonte JP. SRBench: A Streaming RDF/SPARQL Benchmark. Proceedings of the 11th International Semantic Web Conference (ISWC2012)

• Improve optimisations when joining static and streaming data

• Automatic characterisation of sensor data streams
• Useful in citizen science approaches (e.g., AirQualityEgg)
• Calbimonte JP, Yan Z, Jeung H, Corcho O, Aberer K. Deriving Semantic Sensor Metadata from Raw Measurements. 5th International Workshop on Semantic Sensor Networks (SSN2012), at ISWC2012. CEUR Workshop Proceedings, Vol-904, http://ceur-ws.org/Vol-904/

Page 36: Data Integration at the Ontology Engineering Group

4. Federated query processing

In other words, how can we access data from federated data sources?

Page 37: Data Integration at the Ontology Engineering Group

Example

• We query the life science domain
1. Using the Pubmed references obtained from the GeneID gene dataset, retrieve information about genes and their references in the Pubmed dataset.
2. From Pubmed we access the information in the National Library of Medicine’s controlled vocabulary thesaurus, stored at the MeSH endpoint, so we have more complete information about such genes.
3. Finally, we also access the HHPID endpoint, which is the knowledge base for the HIV-1 protein.

Page 38: Data Integration at the Ontology Engineering Group

Introduction

• Question:
• How can we access such an amount of RDF data in an integrated manner?

• Current approaches
• Replicate data in local stores, access it using existing RDF databases.
• Execute individual queries and manually join data.
• Use existing distributed query systems (starting to appear).

Page 39: Data Integration at the Ontology Engineering Group

Problem

• Existing tools for distributed SPARQL query processing differ in the way of handling distribution
• The SPARQL Working Group published the Federated Query document as a Last Call Working Draft
• It homogenises the access to distributed RDF data repositories
• SERVICE <http://dbpedia.org/sparql> {...}
• Problems in semantics: SERVICE ?X not well defined

• Current access to SPARQL endpoints is not optimal
• Work on SPARQL distributed query optimization is beginning

Page 40: Data Integration at the Ontology Engineering Group

State of the Art

• ANAPSID, RDF::Query, OpenAnzo, ARQ, Rasqal RDF Query Library
• ANAPSID provides SPARQL optimization based on adaptive query processing operators
• RDF::Query provides basic pattern reordering
• Implement the federation using query predicates
• List of SPARQL endpoints needed
• Helps user to direct queries to remote datasets

• FedX, SPLENDID, SemWIQ, NetworkedGraphs
• All provide basic optimisations: pattern grouping (FedX), cost-based optimizations (SemWIQ, SPLENDID and recently FedX, NetworkedGraphs)

• SPARQL 1.1 is mostly syntactic sugar

Page 41: Data Integration at the Ontology Engineering Group

Assumptions & Restrictions

• Assumptions
1. Users know how to create a query to the endpoints
2. No statistics of any kind are available for the query processing system.
3. Data are distributed

• Restrictions
1. We only consider the Federation Extension of SPARQL 1.1
2. We are not aware of the capabilities or implementation of the remote SPARQL server
3. No registry of endpoints

Page 42: Data Integration at the Ontology Engineering Group

SERVICE Semantics

• We extend [PAG09] with the semantics for SERVICE:

Example:
SELECT ?name ?email
WHERE {
  SERVICE <http://example1.org/sparql> { ?y :name ?name } .
  SERVICE <http://example2.org/sparql> { ?y :email ?email }
}

SELECT ?name ?email
WHERE {
  ?y :name ?name .
  ?y :email ?email
}
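
The two SERVICE blocks in the example are each evaluated at a different endpoint, and the resulting solution mappings are joined on the shared variable ?y. The following is a minimal sketch of that idea in Python, not the SPARQL-DQP implementation: the endpoints are replaced by hypothetical in-memory triple lists, and the data and the helper names (`match_triple`, `service`, `join`) are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

# Hypothetical in-memory stand-ins for the two remote endpoints (assumed data).
ENDPOINTS: Dict[str, List[Triple]] = {
    "http://example1.org/sparql": [("p1", ":name", "Ada"), ("p2", ":name", "Bob")],
    "http://example2.org/sparql": [("p1", ":email", "ada@example.org")],
}


def match_triple(pattern: Triple, triple: Triple) -> Optional[Mapping]:
    """Return the solution mapping that makes `pattern` equal to `triple`, or None."""
    mu: Mapping = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.get(p, t) != t:  # same variable bound to two different values
                return None
            mu[p] = t
        elif p != t:
            return None
    return mu


def service(endpoint: str, pattern: Triple) -> List[Mapping]:
    """Evaluate one triple pattern against the data held by one endpoint."""
    results = []
    for triple in ENDPOINTS[endpoint]:
        mu = match_triple(pattern, triple)
        if mu is not None:
            results.append(mu)
    return results


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2.get(var, val) == val for var, val in m1.items())


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """Join two sets of solution mappings, keeping only compatible pairs."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


def federated_query() -> List[Tuple[str, str]]:
    names = service("http://example1.org/sparql", ("?y", ":name", "?name"))
    emails = service("http://example2.org/sparql", ("?y", ":email", "?email"))
    return [(m["?name"], m["?email"]) for m in join(names, emails)]
```

With the assumed data, only the resource known to both endpoints survives the join, so the query yields a single (name, email) pair.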

Page 43: Data Integration at the Ontology Engineering Group

SERVICE Semantics

Example:
SELECT ?name
WHERE { SERVICE ?X { ?y :name ?name } }
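
When the endpoint is a variable, the engine has no single place to send the pattern: unless ?X is already bound, every candidate endpoint has to be tried. The sketch below illustrates this under a strong assumption, namely that a finite set of candidate endpoints is known (`KNOWN_ENDPOINTS` is hypothetical; in general no such registry exists). The data and function names are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

# Hypothetical, finite set of candidate endpoints (an assumption of this sketch).
KNOWN_ENDPOINTS: Dict[str, List[Triple]] = {
    "http://example1.org/sparql": [("p1", ":name", "Ada")],
    "http://example2.org/sparql": [("p1", ":email", "ada@example.org")],
    "http://example3.org/sparql": [("p7", ":name", "Eve")],
}


def match_triple(pattern: Triple, triple: Triple) -> Optional[Mapping]:
    """Return the solution mapping that makes `pattern` equal to `triple`, or None."""
    mu: Mapping = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.get(p, t) != t:
                return None
            mu[p] = t
        elif p != t:
            return None
    return mu


def service_var(var: str, pattern: Triple,
                bindings: Optional[Mapping] = None) -> List[Mapping]:
    """Evaluate SERVICE ?X { pattern }.

    If `var` is already bound, only that endpoint is queried; otherwise
    every known endpoint is queried and `var` is bound to its address.
    """
    if bindings and var in bindings:
        candidates = [bindings[var]]
    else:
        candidates = list(KNOWN_ENDPOINTS)
    results = []
    for endpoint in candidates:
        for triple in KNOWN_ENDPOINTS.get(endpoint, []):
            mu = match_triple(pattern, triple)
            if mu is not None:
                results.append({**mu, var: endpoint})
    return results
```

Calling `service_var("?X", ("?y", ":name", "?name"))` contacts all three assumed endpoints, whereas passing a binding for ?X restricts the evaluation to one of them.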

Page 44: Data Integration at the Ontology Engineering Group

SPARQL Optimisation - OPTIONAL

• We assume that we have no statistics of endpoints
• This means that we cannot use cost-based optimisations
• We will only focus on static optimisations

• Besides the usual static optimisations (e.g. pushing down filters), SPARQL queries can be optimised if they contain OPTIONAL operators
• The OPTIONAL operator is responsible for PSPACE-completeness in SPARQL [PAG09]

• OPTIONAL is a key operator in SPARQL

Page 45: Data Integration at the Ontology Engineering Group

Well-designed patterns

• Well-designed SPARQL patterns [PAG09]
• Class of SPARQL patterns which adds a restriction
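
As commonly stated for the AND/OPTIONAL fragment in [PAG09], the restriction is that for every sub-pattern (P1 OPT P2) of a pattern P, any variable of P2 that also occurs in P outside that sub-pattern must occur in P1 as well. The following Python sketch checks this condition on a small, hypothetical pattern representation (`TriplePattern`, `And`, `Opt` are names invented here); it ignores FILTER, UNION and SERVICE.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class TriplePattern:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"   # mandatory part (P1)
    right: "Pattern"  # optional part (P2)


Pattern = Union[TriplePattern, And, Opt]


def variables(p: Pattern) -> FrozenSet[str]:
    """All variables (terms starting with '?') occurring in a pattern."""
    if isinstance(p, TriplePattern):
        return frozenset(t for t in (p.s, p.p, p.o) if t.startswith("?"))
    return variables(p.left) | variables(p.right)


def well_designed(p: Pattern, outside: FrozenSet[str] = frozenset()) -> bool:
    """Check the well-designedness condition.

    `outside` holds the variables of the whole pattern that occur
    outside the sub-pattern `p` currently being inspected.
    """
    if isinstance(p, TriplePattern):
        return True
    left_vars, right_vars = variables(p.left), variables(p.right)
    if isinstance(p, Opt) and (right_vars & outside) - left_vars:
        # A variable of the optional part is used elsewhere
        # but is missing from the mandatory part.
        return False
    return (well_designed(p.left, outside | right_vars)
            and well_designed(p.right, outside | left_vars))
```

For example, `Opt(TriplePattern("?a", ":name", "?n"), TriplePattern("?a", ":email", "?e"))` passes the check, while placing `?a` only in the optional part of a nested OPT and again in a sibling triple pattern fails it.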

Page 46: Data Integration at the Ontology Engineering Group

Well-designed Patterns

• We extended the notion of well-designed patterns for the SPARQL 1.1 Federation Extension
• The previous rules also hold for SERVICE

Page 47: Data Integration at the Ontology Engineering Group

Implementation: SPARQL-DQP

• SPARQL-DQP is implemented on top of OGSA-DAI and OGSA-DQP
• OGSA-DAI is a Web service-based framework for accessing distributed data resources
• OGSA-DQP adds distributed query processing infrastructure

• We reuse some OGSA-DQP operators
• We added RDF and SPARQL endpoint data access
• RDB2RDF data resource
• RDF data resource
• SPARQL endpoint resources

• Good behaviour for large datasets

Buil C, Arenas M, Corcho O. Semantics and optimization of the SPARQL 1.1 federation extension. Proceedings of the 8th Extended Semantic Web Conference (ESWC2011). Springer-Verlag LNCS 6644, pages 1-15

Page 48: Data Integration at the Ontology Engineering Group

Ongoing Work

• An extensive benchmark has been produced
• Montoya G, Vidal ME, Corcho O, Ruckhaus E, Buil-Aranda C. Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough? In: Proceedings of the 11th International Semantic Web Conference (ISWC2012)

• Focusing now on Adaptive Query Processing
• Query processing should be adapted to the user's specific needs and specific network requirements

Page 49: Data Integration at the Ontology Engineering Group

5. Entailment in query rewriting

In other words, how can we take into account the existence of ontologies in the query rewriting process, so as to provide simple entailment?

Page 50: Data Integration at the Ontology Engineering Group

Main approaches in the state of the art

Expressiveness | Author | System | Output
ELHIO¬ | Pérez-Urbina et al. | REQUIEM | [R] Datalog, UCQ
Sticky-join [linear] datalog± | Gottlob et al. | Nyaya | UCQ
DL-Lite_R, DL-Lite_F | Calvanese et al. | QuOnto | UCQ
DL-Lite_R | Chortaras et al. | Rapid | UCQ
DL-Lite_R [+EBox] | Rosati et al. | Presto & Prexto | NR-Datalog & UCQ

Page 51: Data Integration at the Ontology Engineering Group

Optimizations in the rewriting

• The rewriting can be optimized in several ways
• Ontology preprocessing
• Subsumption checks
• Prioritize inferences
• Constrain the searches
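
To make the subsumption check concrete: in a union of conjunctive queries (UCQ), a query that is contained in another member of the union contributes no extra answers and can be dropped. The Python sketch below is a generic textbook-style homomorphism test, not the method used by any of the systems listed above. The query representation (a head tuple plus a list of `(predicate, args)` atoms) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]       # (predicate, arguments)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head terms, body atoms)


def extend(sub: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend substitution `sub` so that the terms `src` map onto `dst`."""
    if len(src) != len(dst):
        return None
    sub = dict(sub)
    for s, d in zip(src, dst):
        if s.startswith("?"):            # variable of the more general query
            if sub.setdefault(s, d) != d:
                return None
        elif s != d:                     # constants must match exactly
            return None
    return sub


def subsumes(general: Query, specific: Query) -> bool:
    """True if a homomorphism maps `general` into `specific`, head to head."""
    g_head, g_body = general
    s_head, s_body = specific
    start = extend({}, g_head, s_head)
    if start is None:
        return False

    def search(i: int, sub: Dict[str, str]) -> bool:
        if i == len(g_body):
            return True
        pred, args = g_body[i]
        for s_pred, s_args in s_body:
            if s_pred == pred:
                nxt = extend(sub, args, s_args)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


def prune(ucq: List[Query]) -> List[Query]:
    """Drop every query subsumed by another one (keep the first of equivalents)."""
    kept = []
    for i, q in enumerate(ucq):
        redundant = any(
            subsumes(other, q) and (not subsumes(q, other) or j < i)
            for j, other in enumerate(ucq) if j != i
        )
        if not redundant:
            kept.append(q)
    return kept
```

For instance, a query asking for all persons subsumes one asking for persons who work somewhere, so `prune` keeps only the former.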

Page 52: Data Integration at the Ontology Engineering Group

Our proposal

José Mora

(Under embargo) Paper in preparation

Page 53: Data Integration at the Ontology Engineering Group

Conclusion and Future Work

• We have proposed some small incremental improvements over the current state of the art in entailment-aware query rewriting
• Need to integrate it with the rest of our work
• This will happen during Fall 2012

Page 54: Data Integration at the Ontology Engineering Group

Final conclusions and future work

Page 55: Data Integration at the Ontology Engineering Group

Ingredients

Linked Open Data, Spreadsheets

1 RDB2RDF

2 Sensor-based query rewriting

3 Optimisations

4 Federated Query Processing

5 Reasoning

Page 56: Data Integration at the Ontology Engineering Group

Data integration at our group: ingredients and some prospects

Credible workshop
Sophia-Antipolis, October 15th 2012

Oscar Corcho
[email protected]

Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n. 28660 Boadilla del Monte, Madrid, Spain

With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)