Data integration at the Ontology Engineering Group

Post on 19-Jan-2015


DESCRIPTION

Presentation on the data integration work at OEG-UPM (http://www.oeg-upm.net/), given at the CredIBLE workshop in Sophia-Antipolis (October 15th, 2012).

TRANSCRIPT

Data integration at our group: ingredients and some prospects

CredIBLE workshop, Sophia-Antipolis, October 15th 2012

Oscar Corcho (ocorcho@fi.upm.es)

Facultad de Informática, Universidad Politécnica de Madrid. Campus de Montegancedo s/n. 28660 Boadilla del Monte, Madrid, Spain

With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)

Our data integration needs, problems (and challenges)

Need to access heterogeneous relational data sources (mainly in the area of Geography)

Need to submit SPARQL queries to distributed SPARQL endpoints

• Some of the databases are available in different DBMSs

• And some of the data sources are available as spreadsheets

• Furthermore, many of these datasets are already published as Linked Data

And data may be available from data streams (e.g., sensors)

Ingredients

[Architecture diagram. Data source labels: Linked Open Data, Spreadsheets]

1 RDB2RDF

2 Optimisations

3 Sensor-based query rewriting

4 Federated Query Processing

5 Reasoning

From SemsorGrid4Env architecture (http://www.semsorgrid4env.eu/)

Disclaimer

When I talk about ontology-based querying, I will be normally talking about SPARQL querying

1. RDB2RDF

In other words, how to make relational data available as RDF (and connected to ontologies)

RDB2RDF. Motivation

• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.

[Diagram labels: transformation description, transformation engine, Q, Q’]

RDB2RDF. Query rewriting for OBDA with mappings

[Diagram labels: Rewriting, Mappings]

There may be some mappings to translate between the ontology and the DB. The rewriting should consider those mappings.

RDB2RDF. Existing approaches

1. To build a new ontology from a database schema and content (direct mappings)

2. To map the ontology created in approach (1) to a legacy ontology

3. To map an existing DB to a legacy ontology

[Diagram labels: new ontology, existing ontology; arrows numbered 1, 2, 3]

OEG’s background knowledge in RDB2RDF

• R2O and ODEMapster

• GaV wrapper generation (no mediators)

• Syntactic sugar for the generation of SQL queries.

• Simple use of this language and processor in the domains of fund finding, cultural information, and fisheries.

• NeOn Toolkit plugin for common mappings

Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.

R2O (Relational-to-Ontology) Language

For concepts...

• A view maps exactly one concept in the ontology.

• A subset of the columns in the view maps a concept in the ontology.

• A subset (selection) of the records of a database view maps a concept in the ontology.

• A subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL.

• One or more concepts can be extracted from a single data field (not in 1NF).

For attributes...

• A column in a database view maps directly to an attribute or a relation.

• A column in a database view maps to an attribute or a relation after some transformation.

• A set of columns in a database view maps to an attribute or a relation.

The W3C RDB2RDF Working Group

• Created in 2007

• W3C Recommendations in September 2012

• R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/

• Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/

• R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/

• RDB2RDF Implementation Report - http://www.w3.org/2001/sw/rdb2rdf/implementation-report/

R2RML example
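The example on this slide did not survive in the transcript. As a rough stand-in, the sketch below shows the core idea behind an R2RML triples map: a subject IRI built from a template over the row's columns, plus one triple per mapped column. The dictionary layout, the function names and the `http://example.org/review/{REVIEWID}` template are assumptions made for illustration; this is not R2RML syntax and not the Morph implementation. IRI-safe encoding of template values is left out.

```python
import re

# Hypothetical, simplified stand-in for an R2RML triples map (not R2RML syntax).
REVIEW_MAP = {
    "table": "REVIEW",
    "subject_template": "http://example.org/review/{REVIEWID}",  # assumed IRI pattern
    "predicate_object": [
        ("http://purl.org/dc/elements/1.1/title", "TITLE"),
        ("http://purl.org/stuff/rev#text", "TEXT"),
    ],
}


def expand_template(template, row):
    """Replace every {COLUMN} placeholder with the value of that column in the row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def triples_for_row(mapping, row):
    """Generate (subject, predicate, object) triples for one relational row."""
    subject = expand_template(mapping["subject_template"], row)
    triples = []
    for predicate, column in mapping["predicate_object"]:
        value = row.get(column)
        if value is not None:  # a NULL column value produces no triple
            triples.append((subject, predicate, value))
    return triples


row = {"REVIEWID": 7, "TITLE": "Great", "TEXT": None}
triples = triples_for_row(REVIEW_MAP, row)
```

Here `triples` holds a single title triple for the review, because the `TEXT` column is NULL in this row.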


Existing implementations

• OEG implementations

• http://code.google.com/p/oeg-obdi/

• https://github.com/jpcik/morph

• https://github.com/boricles/morph

RDB2RDF Implementation Report. Boris Villazón-Terrazas, Michael Hausenblas. http://www.w3.org/2001/sw/rdb2rdf/implementation-report/

Ongoing work

• Provide a list of common patterns in R2RML transformations, so that they can be reused (increasing productivity)

• Sequeda J, Priyatna F, Villazón-Terrazas B. Relational Database to RDF Mapping Patterns. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).

• Villazón-Terrazas B, Priyatna F. Building Ontologies by using Re-engineering Patterns and R2RML Mappings. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).

• http://mappingpedia.linkeddata.es/

• Improve our support at Morph for all test cases

• Adapt existing GUIs for the generation of mappings (such as the NeOn Toolkit’s).

2. R2RML query rewriting optimisations

In other words, how to optimise this query rewriting, so that the rewritten queries do not run inefficiently

R2RML is now a W3C Recommendation

• That’s very good to ensure wide uptake, but…

• Implementations still suffer from a lack of efficiency

• UltraWrap has shown that a similar performance can be obtained with direct mappings on high-end databases (Oracle, SQL Server)

• What happens with low-end databases (MySQL)?

Several works on SPARQL to SQL translation

• Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.

• R. Cyganiak. A relational algebra for SPARQL. Digital Media Systems Laboratory. HP Laboratories Bristol. HPL-2005-170, 2005.

• B. Elliott, E. Cheng, C. Thomas-Ogbuji, and Z.M. Ozsoyoglu. A complete translation from SPARQL into efficient SQL. In Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 31-42. ACM, 2009.

• A. Chebotko, S. Lu, and F. Fotouhi. Semantics preserving SPARQL-to-SQL translation. Data & Knowledge Engineering, 68(10):973-1000, 2009.

Chebotko’s query rewriting

Our proposal

(Under embargo) Paper in preparation

An example. BSBM08

NATIVE

SELECT r.title, r.text, r.reviewDate, p.personID, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.productID=55547 AND r.personID=p.personID AND r.language='en'
ORDER BY r.reviewDate desc

CHEBOTKO

SELECT var_rating2 AS rating2, var_reviewerName AS reviewerName, var_title AS title, var_rating1 AS rating1, var_reviewDate AS reviewDate, var_reviewer AS reviewer, var_rating3 AS rating3, var_rating4 AS rating4, var_text AS text
FROM (SELECT * FROM (SELECT uri_rating41477446315 AS uri_rating41477446315, var_rating2 AS var_rating2, var_reviewer AS var_reviewer, uri_reviewDate750573656 AS uri_reviewDate750573656, var_rating4 AS var_rating4, var_rating1 AS var_rating1, var_text AS var_text, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_rating21477446253 AS uri_rating21477446253, uri_text1457367120 AS uri_text1457367120, uri_rating31477446284 AS uri_rating31477446284, uri_rating11477446222 AS uri_rating11477446222, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_reviewDate AS var_reviewDate, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_Product555472014519903 AS uri_Product555472014519903, v_7634.var_review AS var_review, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, var_lang AS var_lang
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, v_6537.var_review AS var_review, uri_rating11477446222 AS uri_rating11477446222, uri_rating31477446284 AS uri_rating31477446284, uri_Product555472014519903 AS uri_Product555472014519903, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating2 AS var_rating2, uri_rating21477446253 AS uri_rating21477446253, var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066, var_text AS var_text, uri_language269987354 AS uri_language269987354, var_rating1 AS var_rating1, uri_reviewDate750573656 AS uri_reviewDate750573656, var_title AS var_title, var_reviewerName AS var_reviewerName, var_rating3 AS var_rating3, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate
FROM (SELECT var_reviewDate AS var_reviewDate, uri_language269987354 AS uri_language269987354, var_title AS var_title, uri_Product555472014519903 AS uri_Product555472014519903, uri_rating21477446253 AS uri_rating21477446253, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, uri_title1963229325 AS uri_title1963229325, var_rating2 AS var_rating2, v_8909.var_review AS var_review, var_lang AS var_lang, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating1 AS var_rating1, var_text AS var_text, var_reviewer AS var_reviewer
FROM (SELECT uri_Product555472014519903 AS uri_Product555472014519903, v_7100.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewer AS var_reviewer, var_reviewerName AS var_reviewerName, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_lang AS var_lang, uri_title1963229325 AS uri_title1963229325, var_title AS var_title, uri_language269987354 AS uri_language269987354
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, uri_Product555472014519903 AS uri_Product555472014519903, var_reviewerName AS var_reviewerName, var_title AS var_title, uri_reviewDate750573656 AS uri_reviewDate750573656, var_text AS var_text, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate, v_4166.var_review AS var_review, var_reviewer AS var_reviewer
FROM (SELECT v_2076.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_2076.PRODUCTID AS uri_Product555472014519903
FROM REVIEW v_2076
WHERE (v_2076.PRODUCTID = 55547) ) v_4166
INNER JOIN (SELECT var_lang AS var_lang, var_reviewerName AS var_reviewerName, uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewer AS var_reviewer, uri_title1963229325 AS uri_title1963229325, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_reviewer2088452952 AS uri_reviewer2088452952, v_7134.var_review AS var_review, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate
FROM (SELECT v_3759.REVIEWID AS var_review, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325, v_3759.TITLE AS var_title
FROM REVIEW v_3759
WHERE (v_3759.TITLE IS NOT NULL) ) v_7134
INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_text AS var_text, var_reviewer AS var_reviewer, uri_text1457367120 AS uri_text1457367120, uri_reviewDate750573656 AS uri_reviewDate750573656, v_3150.var_review AS var_review, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, var_reviewDate AS var_reviewDate, var_reviewerName AS var_reviewerName, uri_language269987354 AS uri_language269987354
FROM (SELECT v_7417.REVIEWID AS var_review, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, v_7417.TEXT AS var_text
FROM REVIEW v_7417
WHERE (v_7417.TEXT IS NOT NULL) ) v_3150
INNER JOIN (SELECT uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, uri_reviewer2088452952 AS uri_reviewer2088452952, v_208.var_review AS var_review, var_reviewerName AS var_reviewerName, var_lang AS var_lang, var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656
FROM (SELECT v_3119.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, v_3119.LANGUAGE AS var_lang
FROM REVIEW v_3119
WHERE (v_3119.LANGUAGE IS NOT NULL) ) v_208
INNER JOIN (SELECT uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewDate AS var_reviewDate, uri_name1396749066 AS uri_name1396749066, var_reviewerName AS var_reviewerName, var_reviewer AS var_reviewer, v_750.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_3971.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, v_3971.REVIEWDATE AS var_reviewDate
FROM REVIEW v_3971
WHERE (v_3971.REVIEWDATE IS NOT NULL) ) v_750
INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewerName AS var_reviewerName, var_review AS var_review, v_3578.var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066
FROM (SELECT v_1393.REVIEWID AS var_review, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_1393.PERSONID AS var_reviewer
FROM REVIEW v_1393
WHERE (v_1393.PERSONID IS NOT NULL) ) v_3578
INNER JOIN (SELECT v_1353.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_1353.NAME AS var_reviewerName
FROM PERSON v_1353
WHERE (v_1353.NAME IS NOT NULL) ) v_6677 ON ((v_3578.var_reviewer = v_6677.var_reviewer) OR (v_3578.var_reviewer IS NULL) OR (v_6677.var_reviewer IS NULL)) ) v_6633 ON ((v_750.var_review = v_6633.var_review) OR (v_750.var_review IS NULL) OR (v_6633.var_review IS NULL)) ) v_6163 ON ((v_208.var_review = v_6163.var_review) OR (v_208.var_review IS NULL) OR (v_6163.var_review IS NULL)) ) v_4265 ON ((v_3150.var_review = v_4265.var_review) OR (v_3150.var_review IS NULL) OR (v_4265.var_review IS NULL)) ) v_3393 ON ((v_7134.var_review = v_3393.var_review) OR (v_7134.var_review IS NULL) OR (v_3393.var_review IS NULL)) ) v_2323 ON ((v_4166.var_review = v_2323.var_review) OR (v_4166.var_review IS NULL) OR (v_2323.var_review IS NULL)) ) v_7100
LEFT JOIN (SELECT v_1061.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_1061.RATING1 AS var_rating1
FROM REVIEW v_1061
WHERE (v_1061.RATING1 IS NOT NULL) ) v_6802 ON ((v_7100.var_review = v_6802.var_review) OR (v_7100.var_review IS NULL) OR (v_6802.var_review IS NULL)) ) v_8909
LEFT JOIN (SELECT v_4863.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_4863.RATING2 AS var_rating2
FROM REVIEW v_4863
WHERE (v_4863.RATING2 IS NOT NULL) ) v_5037 ON ((v_8909.var_review = v_5037.var_review) OR (v_8909.var_review IS NULL) OR (v_5037.var_review IS NULL)) ) v_6537
LEFT JOIN (SELECT v_9539.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, v_9539.RATING3 AS var_rating3
FROM REVIEW v_9539
WHERE (v_9539.RATING3 IS NOT NULL) ) v_2592 ON ((v_6537.var_review = v_2592.var_review) OR (v_6537.var_review IS NULL) OR (v_2592.var_review IS NULL)) ) v_7634
LEFT JOIN (SELECT v_1309.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, v_1309.RATING4 AS var_rating4
FROM REVIEW v_1309
WHERE (v_1309.RATING4 IS NOT NULL) ) v_219 ON ((v_7634.var_review = v_219.var_review) OR (v_7634.var_review IS NULL) OR (v_219.var_review IS NULL)) ) v_9704
WHERE (var_lang = 'en')
ORDER BY var_reviewDate DESC ) v_3839

An example. BSBM08

OUR APPROACH

SELECT var_rating2 AS rating2, var_reviewDate AS reviewDate, var_rating4 AS rating4, var_rating1 AS rating1, var_reviewer AS reviewer, var_rating3 AS rating3, var_reviewerName AS reviewerName, var_text AS text, var_title AS title
FROM (SELECT * FROM (SELECT v_2660.var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, var_review AS var_review, uri_rating31477446284 AS uri_rating31477446284, uri_rating21477446253 AS uri_rating21477446253, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, var_reviewerName AS var_reviewerName, var_lang AS var_lang, uri_Product555472014519903 AS uri_Product555472014519903, var_rating2 AS var_rating2, uri_rating41477446315 AS uri_rating41477446315, var_title AS var_title, var_rating4 AS var_rating4, var_text AS var_text, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_8722.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_8722.NAME AS var_reviewerName
FROM PERSON v_8722
WHERE (v_8722.NAME IS NOT NULL) ) v_2660
INNER JOIN (SELECT v_3353.REVIEWDATE AS var_reviewDate, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_3353.REVIEWID AS var_review, v_3353.TEXT AS var_text, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_3353.RATING1 AS var_rating1, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_3353.TITLE AS var_title, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_3353.PERSONID AS var_reviewer, v_3353.RATING3 AS var_rating3, v_3353.PRODUCTID AS uri_Product555472014519903, v_3353.RATING4 AS var_rating4, v_3353.LANGUAGE AS var_lang, v_3353.RATING2 AS var_rating2, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325
FROM REVIEW v_3353
WHERE ((v_3353.PRODUCTID = 55547) AND (v_3353.TEXT IS NOT NULL) AND (v_3353.TITLE IS NOT NULL) AND (v_3353.PERSONID IS NOT NULL) AND (v_3353.LANGUAGE IS NOT NULL) AND (v_3353.REVIEWDATE IS NOT NULL)) ) v_3049 ON ((v_2660.var_reviewer = v_3049.var_reviewer) OR (v_2660.var_reviewer IS NULL) OR (v_3049.var_reviewer IS NULL)) ) v_3795
WHERE (var_lang = 'en')
ORDER BY var_reviewDate DESC ) v_5787
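The optimisation itself is under embargo, so what follows is only a reading of the two generated queries above, not the authors' algorithm. The visible difference is that triple patterns over the same table no longer get one subquery each, joined on the row identifier, but share a single subquery. The sketch below checks on a toy `review` table (SQLite, invented rows, reduced columns) that both shapes return the same answers.

```python
import sqlite3

# Toy data, invented for illustration; only the table and column names echo BSBM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review (reviewID INTEGER PRIMARY KEY, productID INTEGER,
                     title TEXT, language TEXT);
INSERT INTO review VALUES (1, 55547, 'Good', 'en');
INSERT INTO review VALUES (2, 55547, 'Bueno', 'es');
INSERT INTO review VALUES (3, 1, 'Other', 'en');
INSERT INTO review VALUES (4, 55547, NULL, 'en');
""")

# One subquery per triple pattern, joined on the review identifier (self-joins).
per_pattern_sql = """
SELECT t.var_title
FROM (SELECT reviewID AS var_review FROM review WHERE productID = 55547) p
INNER JOIN (SELECT reviewID AS var_review, title AS var_title
            FROM review WHERE title IS NOT NULL) t
        ON p.var_review = t.var_review
INNER JOIN (SELECT reviewID AS var_review, language AS var_lang
            FROM review WHERE language IS NOT NULL) l
        ON p.var_review = l.var_review
WHERE l.var_lang = 'en'
ORDER BY t.var_title
"""

# The same patterns collapsed into a single scan of the table.
collapsed_sql = """
SELECT title AS var_title
FROM review
WHERE productID = 55547 AND title IS NOT NULL
  AND language IS NOT NULL AND language = 'en'
ORDER BY var_title
"""

rows_per_pattern = conn.execute(per_pattern_sql).fetchall()
rows_collapsed = conn.execute(collapsed_sql).fetchall()
```

Both queries return the same single row here; how much faster the collapsed form runs depends on the DBMS and its optimiser, which is the point of the BSBM analysis on the next slide.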

Analysis with BSBM

[Charts: BSBM results on SQL Server and MySQL]

Ongoing work

• Writing the paper describing our optimisations

• Proposing a comprehensive benchmarking platform to test R2RML-compliant query rewriting systems

• Extending our current work on the R2RML implementation test cases

3. Ontology-based sensor query rewriting

In other words, what happens if our data sources are not static but data streams? Can we still use similar techniques?

An example: SmartCities

SmartSantander Project

Environmental sensors

Parking sensors

Data from the Web

Emergency planner

Flood risk alert: South East England

[Diagram labels: forecasts, wave data, environmental defenses]

“I have to make sense out of all this data”

Heterogeneity

Continuous querying

Streaming data

Ingredients for Linked Sensor Data

Core ontological model

Additional domain ontologies

Guidelines for generation of identifiers

Sensor Web programming interfaces

Query processing engines

http://www.flickr.com/photos/santos/2252824606/

Overview of the SSN ontology

[Diagram: modules of the SSN ontology with their classes and relations. Module labels: Skeleton, Device, Deployment, PlatformSite, System, OperatingRestriction, Process, Data, MeasuringCapability, ConstraintBlock. Classes shown: System, Device, Platform, Deployment, DeploymentRelatedProcess, SurvivalRange, OperatingRange, Sensor, SensingDevice, Sensing, SensorInput, SensorOutput, ObservationValue, Process, Input, Output, Observation, Property, FeatureOfInterest, Condition, MeasurementCapability. Relations shown: onPlatform, hasSubsystem, hasSurvivalRange, hasOperatingRange, hasDeployment, deploymentProcessPart, deployedSystem, deployedOnPlatform, attachedSystem, implements, observes, hasMeasurementCapability, inDeployment, detects, isProxyFor, hasValue, isProducedBy, hasInput, hasOutput, observedBy, featureOfInterest, observationResult, observedProperty, hasProperty, isPropertyOf, sensingMethodUsed, includesEvent, inCondition, forProperty.]

Compton M, Barnaghi P, Bermúdez L, García-Castro R, Corcho O, Cox S, Graybeal J, Hauswirth M, Henson C, Herzog A, Huang V, Janowicz K, Kelsey WD, Le Phuoc D, Lefort L, Leggieri M, Neuhaus H, Nikolov A, Page K, Passant A, Sheth A, Taylor K. The SSN Ontology of the W3C Semantic Sensor Network Incubator Group. Journal of Web Semantics. In press

SSN Ontology with other Ontologies


García-Castro R, Corcho O, Hill C. A Core Ontology Model for Semantic Sensor Web Infrastructures. International Journal of Semantic Web and Information Systems 8(1):22-42

Queries to Sensor Data


SNEEql (Data Stream Mgmt System)

RSTREAM SELECT id, speed, direction FROM wind [NOW];

Esper QL (Complex Event Processors)

SELECT wind_speed FROM wind_sensor.win:time(10 min)

GSN RESTful service (Sensor Data Middleware)

http://montblanc.slf.ch:22001/multidata?vs[0]=wind_sensor&field[0]=wind_speed&from=15/09/2011+05:00:00&to=15/09/2011+15:00:00

Pachube RESTful service (Sensor Data Middleware)

http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z

Querying through ontologies?

SPARQL-Stream

SELECT ?windspeed ?tidespeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTES TO NOW-0 MINUTES]
WHERE {
  ?WaveObs a ssn:Observation;
    ssn:observationResult ?windspeed;
    ssn:observedProperty sweetSpeed:WindSpeed.
  ?TideObs a ssn:Observation;
    ssn:observationResult ?tidespeed;
    ssn:observedProperty sweetSpeed:TideSpeed.
  FILTER (?tidespeed<?windspeed)
}

Query processing closer to data

Use ontologies as conceptual model

Query virtual stream graphs
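To make the window clause `[NOW-10 MINUTES TO NOW-0 MINUTES]` concrete, here is a minimal sketch of a time window over timestamped observations. The function name, the `(timestamp, value)` tuple layout and the inclusive bounds are assumptions for illustration; they are not taken from the SPARQL-Stream specification or from any of the engines mentioned here.

```python
from datetime import datetime, timedelta


def time_window(observations, now, start_offset, end_offset=timedelta(0)):
    """Return the observations whose timestamp lies in [now - start_offset, now - end_offset].

    observations: iterable of (timestamp, value) tuples.
    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    lower = now - start_offset
    upper = now - end_offset
    return [obs for obs in observations if lower <= obs[0] <= upper]


now = datetime(2011, 9, 15, 15, 0, 0)
wind = [
    (datetime(2011, 9, 15, 14, 45), 3.2),  # older than 10 minutes: dropped
    (datetime(2011, 9, 15, 14, 55), 4.1),
    (datetime(2011, 9, 15, 15, 0), 4.8),
]

# Equivalent of [NOW-10 MINUTES TO NOW-0 MINUTES]
last_ten_minutes = time_window(wind, now, timedelta(minutes=10))
```

A continuous query would re-evaluate this selection as `now` advances, which is what the SLIDE and STEP parameters in the later examples control.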

SPARQL-Stream

SELECT ?name ( AVG(?temperature) AS ?avgTemperature ) ( AVG(?humidity) AS ?avgHumidity )
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS SLIDE 1 HOURS]
FROM <http://www.cwi.nl/SRBench/sensors>
FROM <http://www.cwi.nl/SRBench/geonames>
WHERE {
  ?sensor om-owl:generatedObservation ?temperatureObservation;
    om-owl:generatedObservation ?humidityObservation;
    om-owl:hasLocatedNearRel [ om-owl:hasLocation ?nearbyLocation ] .
  ?temperatureObservation om-owl:observedProperty weather:_AirTemperature ;
    om-owl:result [ om-owl:floatValue ?temperature ] .
  ?humidityObservation om-owl:observedProperty weather:_RelativeHumidity ;
    om-owl:result [ om-owl:floatValue ?humidity ] .
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:featureClass ?featureClass ;
        gn:name | gn:officialName ?name ;
        gn:population ?population .
      FILTER ( ?population > 15000 && REGEX(?featureClass, "P" , "i") )
    }
  }
  UNION
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:parentFeature+ ?parentFeature .
      ?parentFeature gn:featureClass ?parentClass ;
        gn:name | gn:officialName ?name ;
        gn:population ?parentPopulation .
      FILTER ( ?parentPopulation > 15000 && REGEX(?parentClass, "P" , "i") )
    }
  }
} GROUP BY ?name

Aggregates

Static & Streaming

Windows

Filters, Functions

Disclaimer: some features NYI

Querying the Observations

SELECT ?waveheight
FROM STREAM <www.ssg4env.eu/SensorReadings.srdf> [NOW -10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?WaveObs a sea:WaveHeightObservation;
    sea:hasValue ?waveheight;
}

[Diagram labels: Client, SPARQLStream, Query Rewriting, Mappings, Query Processing, GSN API, Sensor Network, [tuples], Data translation, [triples]]

R2RML Mappings

:Wan4WindSpeed a rr:TriplesMapClass;
  rr:tableName "wan7";
  rr:subjectMap [ rr:template "http://swissex.ch/ns#WindSpeed/Wan7/{timed}"; rr:class ssn:ObservationValue; rr:graph ssg:swissexsnow.srdf ];
  rr:predicateObjectMap [ rr:predicateMap [ rr:predicate ssn:hasQuantityValue ]; rr:objectMap [ rr:column "sp_wind" ] ];

http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=sp_wind

Query processing engines

Rewriting to different technologies

SELECT ?windspeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTE TO NOW-0 MINUTE]
WHERE {
  ?WaveObs a ssn:Observation;
    ssn:observationResult ?windspeed;
    ssn:observedProperty sweetSpeed:WindSpeed.
}

[Diagram: Query Rewriting produces an algebra representation, which is translated for each target technology]

SNEE (DSMS)

SELECT wan7.wind_speed_scalar_av AS windspeed, wan7.timed AS windts FROM wan7[FROM NOW-10 MINUTES TO NOW]

Esper (CEP)

SELECT wind_speed_scalar_av, timed FROM wan7.win:time(10 min)

GSN (Middleware)

http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&from=15/05/2011+05:00:00&to=15/05/2011+15:00:00

Pachube (Middleware)

http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z

Calbimonte JP, Corcho O, Jeung H, Aberer K. Enabling Query Technologies for the Semantic Sensor Web. International Journal of Semantic Web and Information Systems 8(1):43-63
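The last step of this rewriting, going from a mapped stream and a window to the concrete syntax of one target, can be sketched as plain string generation. The stream name, the column name and the output formats are copied from the examples on this slide; the `MAPPING` dictionary and the two function names are hypothetical, and the real system works on an algebra representation rather than on a dictionary like this.

```python
from datetime import datetime

# Hypothetical result of applying the mappings: stream and column behind ?windspeed.
MAPPING = {"stream": "wan7", "column": "wind_speed_scalar_av"}


def to_esper(mapping, minutes):
    """Render the window query in the Esper syntax shown on the slide."""
    return (f"SELECT {mapping['column']}, timed "
            f"FROM {mapping['stream']}.win:time({minutes} min)")


def to_gsn_url(mapping, base_url, start, end):
    """Render the same request as a GSN multidata URL with an explicit time range."""
    fmt = "%d/%m/%Y+%H:%M:%S"  # date format used in the GSN example URL
    return (f"{base_url}/multidata?vs[0]={mapping['stream']}"
            f"&field[0]={mapping['column']}"
            f"&from={start.strftime(fmt)}&to={end.strftime(fmt)}")


esper_query = to_esper(MAPPING, 10)
gsn_url = to_gsn_url(
    MAPPING,
    "http://montblanc.slf.ch:22001",
    datetime(2011, 5, 15, 5, 0, 0),
    datetime(2011, 5, 15, 15, 0, 0),
)
```

Keeping one generator per target is what lets the same SPARQL-Stream query be served by a DSMS, a CEP engine or a REST middleware without changing the query itself.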

Ongoing work

• Benchmarking of ontology-based streaming data engines

• Zhang Y, Pham MD, Corcho O, Calbimonte JP. SRBench: A Streaming RDF/SPARQL Benchmark. Proceedings of the 11th International Semantic Web Conference (ISWC2012)

• Improve optimisations when joining static and streaming data

• Automatic characterisation of sensor data streams

• Useful in citizen science approaches (e.g., AirQualityEgg)

• Calbimonte JP, Yan Z, Jeung H, Corcho O, Aberer K. Deriving Semantic Sensor Metadata from Raw Measurements. ISWC2012 5th International Workshop on Semantic Sensor Networks (SSN2012). CEUR Workshop Proceedings, Vol-904, http://ceur-ws.org/Vol-904/

4. Federated query processing

In other words, how can we access data from federated data sources?

Example

• We query the life science domain

1. Using the Pubmed references obtained from the GeneID gene dataset, retrieve information about genes and their references in the Pubmed dataset.

2. From Pubmed we access the information in the National Library of Medicine’s controlled vocabulary thesaurus, stored at the MeSH endpoint, so we have more complete information about such genes.

3. Finally, we also access the HHPID endpoint, which is the knowledge base for the HIV-1 protein.

Introduction

• Question:

• How can we access such an amount of RDF data in an integrated manner?

• Current approaches

• Replicate data in local stores, access it using existing RDF databases.

• Execute individual queries and manually join data.

• Use existing distributed query systems (starting to appear).

Problem

• Existing tools for distributed SPARQL query processing differ in the way they handle distribution

• The SPARQL Working Group published the Federated Query document as a Last Call Working Draft

• It homogenises the access to distributed RDF data repositories

• SERVICE <http://dbpedia.org/sparql> {...}

• Problems in semantics: SERVICE ?X not well defined

• Current access to SPARQL endpoints is not optimal

• Work on SPARQL distributed query optimization is beginning

State of the Art

• ANAPSID, RDF::Query, OpenAnzo, ARQ, Rasqal RDF Query Library

• ANAPSID provides SPARQL optimization based on adaptive query processing operators

• RDF::Query provides basic pattern reordering

• Implement the federation using query predicates

• List of SPARQL endpoints needed

• Helps user to direct queries to remote datasets

• FedX, SPLENDID, SemWIQ, NetworkedGraphs

• All provide basic optimisations: pattern grouping (FedX), cost based optimizations (SemWIQ, SPLENDID and recently FedX, NetworkedGraphs)

• SPARQL 1.1 is mostly syntactic sugar

Assumptions & Restrictions

• Assumptions

1. Users know how to create a query to the endpoints

2. No statistics of any kind are available for the query processing system.

3. Data are distributed

• Restrictions

1. We only consider the Federation Extension of SPARQL 1.1

2. We are not aware of the capabilities or implementation of the remote SPARQL server

3. No registry of endpoints

SERVICE Semantics

• We extend [PAG09] with the semantics for SERVICE:

Example:

SELECT ?name ?email
WHERE {
  SERVICE <http://example1.org/sparql> {?y :name ?name} .
  SERVICE <http://example2.org/sparql> {?y :email ?email}
}

SELECT ?name ?email
WHERE {
  ?y :name ?name .
  ?y :email ?email
}
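The formal definition on this slide was an image and is not in the transcript. As an informal illustration only, the sketch below evaluates the first query in the style of the [PAG09] algebra: each SERVICE pattern yields a set of solution mappings from its endpoint, and the two sets are joined on compatible mappings (mappings that agree on every shared variable). The endpoint contents and the `service` helper are invented stand-ins for real HTTP calls.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    """Join of two sets of solution mappings: merge every compatible pair."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


# Invented answers standing in for what each remote endpoint would return.
ENDPOINTS = {
    "http://example1.org/sparql": [
        {"?y": ":alice", "?name": "Alice"},
        {"?y": ":bob", "?name": "Bob"},
    ],
    "http://example2.org/sparql": [
        {"?y": ":alice", "?email": "alice@example.org"},
    ],
}


def service(endpoint):
    """Stand-in for evaluating the SERVICE pattern at a fixed endpoint IRI."""
    return ENDPOINTS[endpoint]


result = join(service("http://example1.org/sparql"),
              service("http://example2.org/sparql"))
```

With a variable in place of the endpoint (SERVICE ?X), there is no fixed key to look up, which is the semantic gap the next slide points at.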


SERVICE Semantics

Example:

SELECT ?name
WHERE { SERVICE ?X {?y :name ?name} }

SPARQL Optimisation - OPTIONAL

• We assume that we have no statistics of endpoints

• This means that we cannot use cost-based optimisations

• We will only focus on static optimisations

• Besides the usual static optimisations (e.g. pushing down filters), SPARQL queries can be optimised if they contain OPTIONAL operators

• The OPTIONAL operator is responsible for PSPACE-completeness in SPARQL [PAG09]

• OPTIONAL is a key operator in SPARQL

Well-designed patterns

• Well-designed SPARQL patterns [PAG09]

• Class of SPARQL patterns which adds a restriction
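The restriction itself was shown as an image. In [PAG09], a UNION-free pattern P is well-designed if, for every sub-pattern (P1 OPT P2) of P, each variable that occurs both in P2 and somewhere in P outside that sub-pattern also occurs in P1. The sketch below checks this condition on a toy pattern tree; the tuple encoding and the function names are assumptions for illustration, and the companion condition on FILTER safety is left out.

```python
# Patterns are nested tuples: ("AND", p1, p2), ("OPT", p1, p2),
# or a triple pattern such as ("?x", ":name", "?n").
OPERATORS = ("AND", "OPT")


def variables(pattern):
    """Set of variables (terms starting with '?') occurring in a pattern."""
    if pattern[0] in OPERATORS:
        return variables(pattern[1]) | variables(pattern[2])
    return {term for term in pattern if term.startswith("?")}


def well_designed(pattern, outside=frozenset()):
    """Check the OPT condition; 'outside' holds the variables used elsewhere in the query."""
    if pattern[0] not in OPERATORS:
        return True
    op, left, right = pattern
    if op == "OPT":
        # Variables of the optional part that also occur outside must occur in the left part.
        if any(v not in variables(left) for v in variables(right) & outside):
            return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))


good = ("OPT", ("?x", ":name", "?n"), ("?x", ":email", "?e"))
bad = ("AND", ("?a", ":p", "?b"),
       ("OPT", ("?c", ":q", "?d"), ("?b", ":r", "?c")))
```

In `bad`, the variable `?b` appears in the optional part and in the outer triple pattern but not in the mandatory part of the OPT, so the check fails.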


Well-designed Patterns

• We extended the notion of well-designed patterns for the SPARQL 1.1 Federation Extension

• The previous rules also hold for SERVICE

Implementation: SPARQL-DQP

• SPARQL-DQP is implemented on top of OGSA-DAI and OGSA-DQP

• OGSA-DAI is a Web service-based framework for accessing distributed data resources

• OGSA-DQP adds distributed query processing infrastructure

• We reuse some OGSA-DQP operators

• We added RDF and SPARQL endpoint data access

• RDB2RDF data resource

• RDF data resource

• SPARQL endpoint resources

• Good behaviour for large datasets

Buil C, Arenas M, Corcho O. Semantics and optimization of the SPARQL 1.1 federation extension. Proceedings of the 8th Extended Semantic Web Conference (ESWC2011). Springer-Verlag LNCS 6644, pages 1-15

Ongoing Work

• An extensive benchmark has been produced

• Montoya G, Vidal ME, Corcho O, Ruckhaus E, Buil-Aranda C. Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough? In: Proceedings of the 11th International Semantic Web Conference (ISWC2012)

• Focusing now on Adaptive Query Processing

• Query processing should be adapted to the user's specific needs and specific network requirements

5. Entailment in query rewriting

In other words, how can we take into account the existence of ontologies in the query rewriting process, so as to provide simple entailment?

Main approaches in the state of the art

Expressiveness | Author | System | Output
ELHIO¬ | Pérez-Urbina et al. | REQUIEM [R] | Datalog, UCQ
Sticky-join [linear] datalog± | Gottlob et al. | Nyaya | UCQ
DL-LiteR, DL-LiteF | Calvanese et al. | QuOnto | UCQ
DL-LiteR | Chortaras et al. | Rapid | UCQ
DL-LiteR [+EBox] | Rosati et al. | Presto & Prexto | NR-Datalog & UCQ

Optimizations in the rewriting

• The rewriting can be optimized in several ways

• Ontology preprocessing

• Subsumption checks

• Prioritize inferences

• Constrain the searches
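For readers new to this kind of rewriting, the sketch below shows the simplest possible case: an atomic class query is expanded into a union of atomic queries (a UCQ) using subclass axioms only. It is a generic illustration, not the embargoed proposal and not how REQUIEM, Rapid or the other systems in the table work. The axioms are invented for the example, although the class names echo the SSN slide.

```python
# Invented subclass axioms, written as (subclass, superclass) pairs.
AXIOMS = [
    ("SensingDevice", "Sensor"),
    ("Thermometer", "SensingDevice"),
]


def rewrite(concept, axioms):
    """Expand the atomic query concept(x) into the list of classes whose
    instances entail it, following subclass axioms transitively."""
    ucq = [concept]
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            # Skipping classes already in the UCQ avoids redundant disjuncts and loops.
            if sup == current and sub not in ucq:
                ucq.append(sub)
                frontier.append(sub)
    return ucq


sensor_ucq = rewrite("Sensor", AXIOMS)
```

The membership test inside the loop is a very small instance of the redundancy elimination that the optimisations listed above aim at; real rewriters also have to handle roles, existential axioms and conjunctive queries.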

Our proposal

José Mora

(Under embargo) Paper in preparation

Conclusion and Future Work

• We have proposed some small incremental improvements over the current state of the art in entailment-aware query rewriting

• Need to integrate it with the rest of our work

• This will happen during Fall 2012

Final conclusions and future work

Ingredients

[Architecture diagram. Data source labels: Linked Open Data, Spreadsheets]

1 RDB2RDF

2 Optimisations

3 Sensor-based query rewriting

4 Federated Query Processing

5 Reasoning
