Data integration at the Ontology Engineering Group
DESCRIPTION
Presentation on the data integration work at OEG-UPM (http://www.oeg-upm.net/), given at the CredIBLE workshop in Sophia-Antipolis (October 15th, 2012).
TRANSCRIPT
Data integration at our group: ingredients and some prospects
CredIBLE workshop
Sophia-Antipolis, October 15th 2012
Oscar [email protected]
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n. 28660 Boadilla del Monte, Madrid, Spain
With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)
Our data integration needs, problems (and challenges)
Need to access heterogeneous relational data sources (mainly in the area of Geography)
• Some of the databases are available in different DBMSs
• And some of the data sources are available as spreadsheets
• Furthermore, many of these datasets are already published as Linked Data
Need to submit SPARQL queries to distributed SPARQL endpoints
And data may be available from data streams (e.g., sensors)
Ingredients
Linked Open Data
Spreadsheets
1 RDB2RDF
2 Optimisations
3 Sensor-based query rewriting
4 Federated Query Processing
5 Reasoning
From the SemSorGrid4Env architecture (http://www.semsorgrid4env.eu/)
Disclaimer
When I talk about ontology-based querying, I will normally be talking about SPARQL querying
1. RDB2RDF
In other words, how to make relational data available as RDF (and connected to ontologies)
RDB2RDF. Motivation
• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
[Figure labels: transformation description, transformation engine, Q, Q’]
RDB2RDF. Query rewriting for OBDA with mappings
Rewriting, Mappings
There may be some mappings to translate between the ontology and the DB. The rewriting should consider those mappings.
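A minimal sketch of what "the rewriting should consider those mappings" can mean in practice, assuming a toy mapping table from ontology properties to relational tables and columns. The names MAPPINGS and rewrite_triple_pattern, and the person/review schema, are hypothetical and only for illustration; this is not the API of any OEG tool.

```python
# Hypothetical mapping: ontology property -> table and columns (illustration only).
MAPPINGS = {
    "foaf:name": {"table": "person", "subject": "personID", "object": "name"},
    "rev:text": {"table": "review", "subject": "reviewID", "object": "text"},
}


def rewrite_triple_pattern(subject_var, predicate, object_var, mappings=MAPPINGS):
    """Rewrite the pattern (?subject_var predicate ?object_var) into SQL."""
    m = mappings.get(predicate)
    if m is None:
        raise KeyError(f"no mapping for predicate {predicate}")
    # A NULL column yields no triple, hence the IS NOT NULL condition.
    return (
        f"SELECT {m['subject']} AS {subject_var}, {m['object']} AS {object_var} "
        f"FROM {m['table']} WHERE {m['object']} IS NOT NULL"
    )


if __name__ == "__main__":
    # Q (ontology side): ?reviewer foaf:name ?reviewerName
    print(rewrite_triple_pattern("reviewer", "foaf:name", "reviewerName"))
```

Here Q is the triple pattern and Q’ is the SQL string; a real rewriter also has to handle joins between patterns, URI templates and filters.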
RDB2RDF. Existing approaches
1. To build a new ontology from a database schema and content (direct mappings)
2. To map the ontology created in approach (1) to a legacy ontology
3. To map an existing DB to a legacy ontology
[Figure: the three approaches, shown between a new ontology and an existing ontology]
OEG’s background knowledge in RDB2RDF
• R2O and ODEMapster
• GaV wrapper generation (no mediators)
• Syntactic sugar for the generation of SQL queries.
• Simple use of this language and processor in the domains of fund finding, cultural information, and fisheries.
• NeOn Toolkit plugin for common mappings
Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.
R2O (Relational-to-Ontology) Language
For concepts...
• A view maps exactly one concept in the ontology.
• A subset of the columns in the view maps a concept in the ontology.
• A subset (selection) of the records of a database view maps a concept in the ontology.
• A subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL.
• One or more concepts can be extracted from a single data field (not in 1NF).
For attributes...
• A column in a database view maps directly to an attribute or a relation.
• A column in a database view maps to an attribute or a relation after some transformation.
• A set of columns in a database view maps to an attribute or a relation.
The W3C RDB2RDF Working Group
• Created in 2007
• W3C Recommendations in September 2012
• R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
• Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
• R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
• RDB2RDF Implementation Report - http://www.w3.org/2001/sw/rdb2rdf/implementation-report/
R2RML example
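The example on this slide did not survive in the transcript. As a stand-in, here is a small sketch of the core idea of an R2RML triples map: a subject template filled from each row, a class, and column-to-predicate pairs. The functions apply_template and row_to_triples, and the person row and example.org URIs, are assumptions made for illustration; this is not an R2RML processor and it skips details such as IRI-safe encoding of template values.

```python
import re


def apply_template(template, row):
    """Fill an R2RML-style template such as 'http://example.org/person/{personID}'."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def row_to_triples(row, subject_template, rdf_class, column_predicates):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = apply_template(subject_template, row)
    triples = [(subject, "rdf:type", rdf_class)]
    for column, predicate in column_predicates.items():
        value = row.get(column)
        if value is not None:  # a NULL value generates no triple
            triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    row = {"personID": 7, "name": "Ana", "mbox": None}
    for triple in row_to_triples(
        row,
        "http://example.org/person/{personID}",
        "foaf:Person",
        {"name": "foaf:name", "mbox": "foaf:mbox"},
    ):
        print(triple)
```

In R2RML itself the same information is written in Turtle, using a subject map with rr:template and rr:class, and predicate-object maps with rr:column.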
Existing implementations
• OEG implementations
• http://code.google.com/p/oeg-obdi/
• https://github.com/jpcik/morph
• https://github.com/boricles/morph
RDB2RDF Implementation Report. Boris Villazón-Terrazas, Michael Hausenblas. http://www.w3.org/2001/sw/rdb2rdf/implementation-report/
Ongoing work
• Provide a list of common patterns in R2RML transformations, so that they can be reused (increasing productivity)
• Sequeda J, Priyatna F, Villazón-Terrazas B. Relational Database to RDF Mapping Patterns. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).
• Villazón-Terrazas B, Priyatna F. Building Ontologies by using Re-engineering Patterns and R2RML Mappings. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012).
• http://mappingpedia.linkeddata.es/
• Improve our support at Morph for all test cases
• Adapt existing GUIs for the generation of mappings (such as the one in the NeOn Toolkit).
2. R2RML query rewriting optimisations
In other words, how to optimise this query rewriting, so that the resulting queries do not perform poorly
R2RML is now a W3C Recommendation
• That’s very good to ensure wide uptake, but…
• Implementations still suffer from their lack of efficiency
• UltraWrap has shown that a similar performance can be obtained with direct mappings on high-end databases (Oracle, SQL Server)
• What happens with low-end databases (MySQL)?
Several works on SPARQL to SQL translation
• Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.
• R. Cyganiak. A relational algebra for sparql. Digital Media Systems Laboratory. HP Laboratories Bristol. HPL-2005-170, 2005.
• B. Elliott, E. Cheng, C. Thomas-Ogbuji, and Z.M. Ozsoyoglu. A complete translation from sparql into efficient sql. In Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 31-42. ACM, 2009.
• A. Chebotko, S. Lu, and F. Fotouhi. Semantics preserving sparql-to-sql translation. Data & Knowledge Engineering, 68(10):973-1000, 2009.
Chebotko’s query rewriting
Our proposal
(Under embargo) Paper in preparation
An example. BSBM08
NATIVE
SELECT r.title, r.text, r.reviewDate, p.personID, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.productID=55547 AND r.personID=p.personID AND r.language='en'
ORDER BY r.reviewDate desc
CHEBOTKO
SELECT var_rating2 AS rating2, var_reviewerName AS reviewerName, var_title AS title, var_rating1 AS rating1, var_reviewDate AS reviewDate, var_reviewer AS reviewer, var_rating3 AS rating3, var_rating4 AS rating4, var_text AS text
FROM (SELECT * FROM (SELECT uri_rating41477446315 AS uri_rating41477446315, var_rating2 AS var_rating2, var_reviewer AS var_reviewer, uri_reviewDate750573656 AS uri_reviewDate750573656, var_rating4 AS var_rating4, var_rating1 AS var_rating1, var_text AS var_text, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_rating21477446253 AS uri_rating21477446253, uri_text1457367120 AS uri_text1457367120, uri_rating31477446284 AS uri_rating31477446284, uri_rating11477446222 AS uri_rating11477446222, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_reviewDate AS var_reviewDate, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_Product555472014519903 AS uri_Product555472014519903, v_7634.var_review AS var_review, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, var_lang AS var_lang
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, v_6537.var_review AS var_review, uri_rating11477446222 AS uri_rating11477446222, uri_rating31477446284 AS uri_rating31477446284, uri_Product555472014519903 AS uri_Product555472014519903, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating2 AS var_rating2, uri_rating21477446253 AS uri_rating21477446253, var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066, var_text AS var_text, uri_language269987354 AS uri_language269987354, var_rating1 AS var_rating1, uri_reviewDate750573656 AS uri_reviewDate750573656, var_title AS var_title, var_reviewerName AS var_reviewerName, var_rating3 AS var_rating3, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate
FROM (SELECT var_reviewDate AS var_reviewDate, uri_language269987354 AS uri_language269987354, var_title AS var_title, uri_Product555472014519903 AS uri_Product555472014519903, uri_rating21477446253 AS uri_rating21477446253, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, uri_title1963229325 AS uri_title1963229325, var_rating2 AS var_rating2, v_8909.var_review AS var_review, var_lang AS var_lang, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating1 AS var_rating1, var_text AS var_text, var_reviewer AS var_reviewer
FROM (SELECT uri_Product555472014519903 AS uri_Product555472014519903, v_7100.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewer AS var_reviewer, var_reviewerName AS var_reviewerName, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, var_lang AS var_lang, uri_title1963229325 AS uri_title1963229325, var_title AS var_title, uri_language269987354 AS uri_language269987354
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, uri_title1963229325 AS uri_title1963229325, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, uri_Product555472014519903 AS uri_Product555472014519903, var_reviewerName AS var_reviewerName, var_title AS var_title, uri_reviewDate750573656 AS uri_reviewDate750573656, var_text AS var_text, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate, v_4166.var_review AS var_review, var_reviewer AS var_reviewer
FROM (SELECT v_2076.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_2076.PRODUCTID AS uri_Product555472014519903
FROM REVIEW v_2076
WHERE (v_2076.PRODUCTID = 55547) ) v_4166
INNER JOIN (SELECT var_lang AS var_lang, var_reviewerName AS var_reviewerName, uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewer AS var_reviewer, uri_title1963229325 AS uri_title1963229325, var_text AS var_text, uri_name1396749066 AS uri_name1396749066, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_reviewer2088452952 AS uri_reviewer2088452952, v_7134.var_review AS var_review, uri_text1457367120 AS uri_text1457367120, var_reviewDate AS var_reviewDate
FROM (SELECT v_3759.REVIEWID AS var_review, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325, v_3759.TITLE AS var_title
FROM REVIEW v_3759
WHERE (v_3759.TITLE IS NOT NULL) ) v_7134
INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_text AS var_text, var_reviewer AS var_reviewer, uri_text1457367120 AS uri_text1457367120, uri_reviewDate750573656 AS uri_reviewDate750573656, v_3150.var_review AS var_review, var_lang AS var_lang, uri_name1396749066 AS uri_name1396749066, var_reviewDate AS var_reviewDate, var_reviewerName AS var_reviewerName, uri_language269987354 AS uri_language269987354
FROM (SELECT v_7417.REVIEWID AS var_review, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, v_7417.TEXT AS var_text
FROM REVIEW v_7417
WHERE (v_7417.TEXT IS NOT NULL) ) v_3150
INNER JOIN (SELECT uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, uri_reviewer2088452952 AS uri_reviewer2088452952, v_208.var_review AS var_review, var_reviewerName AS var_reviewerName, var_lang AS var_lang, var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, uri_reviewDate750573656 AS uri_reviewDate750573656
FROM (SELECT v_3119.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, v_3119.LANGUAGE AS var_lang
FROM REVIEW v_3119
WHERE (v_3119.LANGUAGE IS NOT NULL) ) v_208
INNER JOIN (SELECT uri_reviewDate750573656 AS uri_reviewDate750573656, var_reviewDate AS var_reviewDate, uri_name1396749066 AS uri_name1396749066, var_reviewerName AS var_reviewerName, var_reviewer AS var_reviewer, v_750.var_review AS var_review, uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_3971.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, v_3971.REVIEWDATE AS var_reviewDate
FROM REVIEW v_3971
WHERE (v_3971.REVIEWDATE IS NOT NULL) ) v_750
INNER JOIN (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, var_reviewerName AS var_reviewerName, var_review AS var_review, v_3578.var_reviewer AS var_reviewer, uri_name1396749066 AS uri_name1396749066
FROM (SELECT v_1393.REVIEWID AS var_review, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_1393.PERSONID AS var_reviewer
FROM REVIEW v_1393
WHERE (v_1393.PERSONID IS NOT NULL) ) v_3578
INNER JOIN (SELECT v_1353.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_1353.NAME AS var_reviewerName
FROM PERSON v_1353
WHERE (v_1353.NAME IS NOT NULL) ) v_6677
ON ((v_3578.var_reviewer = v_6677.var_reviewer) OR (v_3578.var_reviewer IS NULL) OR (v_6677.var_reviewer IS NULL)) ) v_6633
ON ((v_750.var_review = v_6633.var_review) OR (v_750.var_review IS NULL) OR (v_6633.var_review IS NULL)) ) v_6163
ON ((v_208.var_review = v_6163.var_review) OR (v_208.var_review IS NULL) OR (v_6163.var_review IS NULL)) ) v_4265
ON ((v_3150.var_review = v_4265.var_review) OR (v_3150.var_review IS NULL) OR (v_4265.var_review IS NULL)) ) v_3393
ON ((v_7134.var_review = v_3393.var_review) OR (v_7134.var_review IS NULL) OR (v_3393.var_review IS NULL)) ) v_2323
ON ((v_4166.var_review = v_2323.var_review) OR (v_4166.var_review IS NULL) OR (v_2323.var_review IS NULL)) ) v_7100
LEFT JOIN (SELECT v_1061.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_1061.RATING1 AS var_rating1
FROM REVIEW v_1061
WHERE (v_1061.RATING1 IS NOT NULL) ) v_6802
ON ((v_7100.var_review = v_6802.var_review) OR (v_7100.var_review IS NULL) OR (v_6802.var_review IS NULL)) ) v_8909
LEFT JOIN (SELECT v_4863.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_4863.RATING2 AS var_rating2
FROM REVIEW v_4863
WHERE (v_4863.RATING2 IS NOT NULL) ) v_5037
ON ((v_8909.var_review = v_5037.var_review) OR (v_8909.var_review IS NULL) OR (v_5037.var_review IS NULL)) ) v_6537
LEFT JOIN (SELECT v_9539.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, v_9539.RATING3 AS var_rating3
FROM REVIEW v_9539
WHERE (v_9539.RATING3 IS NOT NULL) ) v_2592
ON ((v_6537.var_review = v_2592.var_review) OR (v_6537.var_review IS NULL) OR (v_2592.var_review IS NULL)) ) v_7634
LEFT JOIN (SELECT v_1309.REVIEWID AS var_review, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, v_1309.RATING4 AS var_rating4
FROM REVIEW v_1309
WHERE (v_1309.RATING4 IS NOT NULL) ) v_219
ON ((v_7634.var_review = v_219.var_review) OR (v_7634.var_review IS NULL) OR (v_219.var_review IS NULL)) ) v_9704
WHERE (var_lang = 'en')
ORDER BY var_reviewDate DESC ) v_3839
An example. BSBM08
OUR APPROACH
SELECT var_rating2 AS rating2, var_reviewDate AS reviewDate, var_rating4 AS rating4, var_rating1 AS rating1, var_reviewer AS reviewer, var_rating3 AS rating3, var_reviewerName AS reviewerName, var_text AS text, var_title AS title
FROM (SELECT * FROM (SELECT v_2660.var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, var_review AS var_review, uri_rating31477446284 AS uri_rating31477446284, uri_rating21477446253 AS uri_rating21477446253, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, var_reviewerName AS var_reviewerName, var_lang AS var_lang, uri_Product555472014519903 AS uri_Product555472014519903, var_rating2 AS var_rating2, uri_rating41477446315 AS uri_rating41477446315, var_title AS var_title, var_rating4 AS var_rating4, var_text AS var_text, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_8722.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS uri_name1396749066, v_8722.NAME AS var_reviewerName
FROM PERSON v_8722
WHERE (v_8722.NAME IS NOT NULL) ) v_2660
INNER JOIN (SELECT v_3353.REVIEWDATE AS var_reviewDate, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_3353.REVIEWID AS var_review, v_3353.TEXT AS var_text, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952, v_3353.RATING1 AS var_rating1, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2' AS uri_rating21477446253, v_3353.TITLE AS var_title, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor' AS uri_reviewFor1499735727, v_3353.PERSONID AS var_reviewer, v_3353.RATING3 AS var_rating3, v_3353.PRODUCTID AS uri_Product555472014519903, v_3353.RATING4 AS var_rating4, v_3353.LANGUAGE AS var_lang, v_3353.RATING2 AS var_rating2, 'http://purl.org/stuff/rev#text' AS uri_text1457367120, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4' AS uri_rating41477446315, 'http://purl.org/dc/elements/1.1/title' AS uri_title1963229325
FROM REVIEW v_3353
WHERE ((v_3353.PRODUCTID = 55547) AND (v_3353.TEXT IS NOT NULL) AND (v_3353.TITLE IS NOT NULL) AND (v_3353.PERSONID IS NOT NULL) AND (v_3353.LANGUAGE IS NOT NULL) AND (v_3353.REVIEWDATE IS NOT NULL)) ) v_3049
ON ((v_2660.var_reviewer = v_3049.var_reviewer) OR (v_2660.var_reviewer IS NULL) OR (v_3049.var_reviewer IS NULL)) ) v_3795
WHERE (var_lang = 'en')
ORDER BY var_reviewDate DESC ) v_5787
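The proposal itself is under embargo, so the following is not the authors' algorithm. It is only a toy sketch of the difference that is visible between the two generated queries above: one subquery per triple pattern (each joined on the review key) versus a single subquery per table. The mapping, the pattern list and both function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping: predicate -> (table, column). All patterns share ?review.
MAPPING = {
    "dc:title": ("REVIEW", "TITLE"),
    "rev:text": ("REVIEW", "TEXT"),
    "bsbm:reviewDate": ("REVIEW", "REVIEWDATE"),
}
PATTERNS = [("dc:title", "title"), ("rev:text", "text"), ("bsbm:reviewDate", "reviewDate")]


def per_pattern_subqueries(patterns, mapping, key="REVIEWID"):
    """One subquery per triple pattern; the caller must then join them on var_review."""
    return [
        f"SELECT {key} AS var_review, {mapping[p][1]} AS var_{v} "
        f"FROM {mapping[p][0]} WHERE {mapping[p][1]} IS NOT NULL"
        for p, v in patterns
    ]


def per_table_subqueries(patterns, mapping, key="REVIEWID"):
    """Group patterns by table so that each table is scanned once, without self-joins."""
    by_table = defaultdict(list)
    for predicate, var in patterns:
        table, column = mapping[predicate]
        by_table[table].append((column, var))
    queries = []
    for table, cols in by_table.items():
        select = ", ".join(f"{c} AS var_{v}" for c, v in cols)
        where = " AND ".join(f"{c} IS NOT NULL" for c, _ in cols)
        queries.append(f"SELECT {key} AS var_review, {select} FROM {table} WHERE {where}")
    return queries


if __name__ == "__main__":
    print(len(per_pattern_subqueries(PATTERNS, MAPPING)), "subqueries to join")
    print(per_table_subqueries(PATTERNS, MAPPING)[0])
```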
Analysis with BSBM
[Charts: BSBM results on SQL Server and MySQL]
Ongoing work
• Writing the paper describing our optimisations
• Proposing a comprehensive benchmarking platform to test R2RML-compliant query rewriting systems
• Extending our current work on the R2RML implementation test cases
3. Ontology-based sensor query rewriting
In other words, what happens if our data sources are not static, but data streams? Can we still use similar techniques?
An example: SmartCities
SmartSantander Project
Environmental sensors
Parking sensors
Data from the Web
Emergency planner
Flood risk alert: South East England
Forecasts, wave data, environmental defenses
"I have to make sense out of all this data"
Heterogeneity
Continuous querying
Streaming data
Ingredients for Linked Sensor Data
Core ontological model
Additional domain ontologies
Guidelines for generation of identifiers
Sensor Web programming interfaces
Query processing engines
http://www.flickr.com/photos/santos/2252824606/
Overview of the SSN ontology
[Figure: SSN ontology modules (Skeleton, Device, Deployment, PlatformSite, System, OperatingRestriction, Process, Data, MeasuringCapability, ConstraintBlock) with their main classes (Sensor, SensingDevice, Sensing, SensorInput, SensorOutput, ObservationValue, Observation, Property, FeatureOfInterest, Condition, MeasurementCapability, SurvivalRange, OperatingRange, DeploymentRelatedProcess, Platform, Input, Output) and the properties that link them]
Compton M, Barnaghi P, Bermúdez L, García-Castro R, Corcho O, Cox S, Graybeal J, Hauswirth M, Henson C, Herzog A, Huang V, Janowicz K, Kelsey WD, Le Phuoc D, Lefort L, Leggieri M, Neuhaus H, Nikolov A, Page K, Passant A, Sheth A, Taylor K. The SSN Ontology of the W3C Semantic Sensor Network Incubator Group. Journal of Web Semantics. In press
SSN Ontology with other Ontologies
García-Castro R, Corcho O, Hill C. A Core Ontology Model for Semantic Sensor Web Infrastructures. International Journal of Semantic Web and Information Systems 8(1):22-42
Queries to Sensor Data
SNEEql (Data Stream Mgmt System)
RSTREAM SELECT id, speed, direction FROM wind [NOW];
Esper QL (Complex Event Processors)
SELECT wind_speed FROM wind_sensor.win:time(10 min)
GSN RESTful service (Sensor Data Middleware)
http://montblanc.slf.ch:22001/multidata?vs[0]=wind_sensor&field[0]=wind_speed&from=15/09/2011+05:00:00&to=15/09/2011+15:00:00
Pachube RESTful service (Sensor Data Middleware)
http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z
Querying through ontologies?
SPARQL-Stream
SELECT ?windspeed ?tidespeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTES TO NOW-0 MINUTES]
WHERE {
  ?WaveObs a ssn:Observation;
    ssn:observationResult ?windspeed;
    ssn:observedProperty sweetSpeed:WindSpeed.
  ?TideObs a ssn:Observation;
    ssn:observationResult ?tidespeed;
    ssn:observedProperty sweetSpeed:TideSpeed.
  FILTER (?tidespeed<?windspeed)
}
Query processing closer to data
Use ontologies as conceptual model
Query virtual stream graphs
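To make the window clause concrete, here is a minimal sketch of how a time window such as [NOW-10 MINUTES TO NOW-0 MINUTES] selects items from a timestamped stream. The function names and the inclusive treatment of both window bounds are assumptions for illustration; the sketch does not reproduce the formal SPARQL-Stream semantics.

```python
from datetime import datetime, timedelta


def in_window(timestamp, now, start_offset, end_offset=timedelta(0)):
    """True if timestamp lies in [now - start_offset, now - end_offset] (inclusive)."""
    return now - start_offset <= timestamp <= now - end_offset


def window(stream, now, start_offset, end_offset=timedelta(0)):
    """Keep the (timestamp, value) items of the stream that fall inside the window."""
    return [item for item in stream if in_window(item[0], now, start_offset, end_offset)]


if __name__ == "__main__":
    now = datetime(2011, 9, 15, 12, 0, 0)
    wind = [
        (now - timedelta(minutes=15), 3.1),  # too old, outside the window
        (now - timedelta(minutes=5), 4.2),
        (now, 5.0),
    ]
    # Equivalent of [NOW-10 MINUTES TO NOW-0 MINUTES]
    print(window(wind, now, timedelta(minutes=10)))
```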
SPARQL-Stream
SELECT ?name ( AVG(?temperature) AS ?avgTemperature ) ( AVG(?humidity) AS ?avgHumidity )
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS SLIDE 1 HOURS]
FROM <http://www.cwi.nl/SRBench/sensors>
FROM <http://www.cwi.nl/SRBench/geonames>
WHERE {
  ?sensor om-owl:generatedObservation ?temperatureObservation ;
          om-owl:generatedObservation ?humidityObservation ;
          om-owl:hasLocatedNearRel [ om-owl:hasLocation ?nearbyLocation ] .
  ?temperatureObservation om-owl:observedProperty weather:_AirTemperature ;
          om-owl:result [ om-owl:floatValue ?temperature ] .
  ?humidityObservation om-owl:observedProperty weather:_RelativeHumidity ;
          om-owl:result [ om-owl:floatValue ?humidity ] .
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:featureClass ?featureClass ;
                      gn:name | gn:officialName ?name ;
                      gn:population ?population .
      FILTER ( ?population > 15000 && REGEX(?featureClass, "P", "i") )
    }
  }
  UNION
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:parentFeature+ ?parentFeature .
      ?parentFeature gn:featureClass ?parentClass ;
                     gn:name | gn:officialName ?name ;
                     gn:population ?parentPopulation .
      FILTER ( ?parentPopulation > 15000 && REGEX(?parentClass, "P", "i") )
    }
  }
}
GROUP BY ?name
Aggregates
Static & Streaming
Windows
Filters, Functions
Disclaimer: some features NYI
Querying the Observations
SELECT ?waveheight
FROM STREAM <www.ssg4env.eu/SensorReadings.srdf> [NOW -10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?WaveObs a sea:WaveHeightObservation;
    sea:hasValue ?waveheight;
}
[Figure labels: Client, SPARQLStream, Query Rewriting, Mappings, Query Processing, GSN API, Sensor Network, [tuples], Data translation, [triples]]
R2RML Mappings
:Wan4WindSpeed a rr:TriplesMapClass;
  rr:tableName "wan7";
  rr:subjectMap [ rr:template "http://swissex.ch/ns#WindSpeed/Wan7/{timed}";
                  rr:class ssn:ObservationValue;
                  rr:graph ssg:swissexsnow.srdf ];
  rr:predicateObjectMap [ rr:predicateMap [ rr:predicate ssn:hasQuantityValue ];
                          rr:objectMap [ rr:column "sp_wind" ] ];
http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=sp_wind
Query processing engines
Rewriting to different technologies
SELECT ?windspeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf> [NOW-10 MINUTE TO NOW-0 MINUTE]
WHERE {
  ?WaveObs a ssn:Observation;
    ssn:observationResult ?windspeed;
    ssn:observedProperty sweetSpeed:WindSpeed.
}
Query Rewriting
Algebra representation
SNEE (DSMS)
SELECT wan7.wind_speed_scalar_av AS windspeed, wan7.timed AS windts FROM wan7[FROM NOW-10 MINUTES TO NOW]
Esper (CEP)
SELECT wind_speed_scalar_av, timed FROM wan7.win:time(10 min)
GSN (Middleware)
http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&from=15/05/2011+05:00:00&to=15/05/2011+15:00:00
Pachube (Middleware)
http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z
Calbimonte JP, Corcho O, Jeung H, Aberer K. Enabling Query Technologies for the Semantic Sensor Web. International Journal of Semantic Web and Information Systems 8(1):43-63
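A rough sketch of the "one algebra representation, several target technologies" idea, assuming a very small intermediate form: a stream name, the projected columns with their output aliases, and a window length in minutes. The class WindowedSelect and both serialisers are hypothetical; the two output strings simply mirror the Esper and SNEEql queries shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class WindowedSelect:
    """Hypothetical intermediate (algebra-like) form of a windowed projection."""
    stream: str
    columns: dict  # column name -> output alias
    minutes: int


def to_esper(q):
    """Serialise to the Esper-style query shape shown on the slide."""
    return f"SELECT {', '.join(q.columns)} FROM {q.stream}.win:time({q.minutes} min)"


def to_sneeql(q):
    """Serialise to the SNEEql-style query shape shown on the slide."""
    cols = ", ".join(f"{q.stream}.{c} AS {a}" for c, a in q.columns.items())
    return f"SELECT {cols} FROM {q.stream}[FROM NOW-{q.minutes} MINUTES TO NOW]"


if __name__ == "__main__":
    q = WindowedSelect("wan7", {"wind_speed_scalar_av": "windspeed", "timed": "windts"}, 10)
    print(to_esper(q))
    print(to_sneeql(q))
```

Keeping the intermediate form independent of any engine is what lets the same SPARQL-Stream query be sent to a DSMS, a CEP engine or a REST middleware.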
Ongoing work
• Benchmarking of ontology-based streaming data engines
• Zhang Y, Pham MD, Corcho O, Calbimonte JP. SRBench: A Streaming RDF/SPARQL Benchmark. Proceedings of the 11th International Semantic Web Conference (ISWC2012)
• Improve optimisations when joining static and streaming data
• Automatic characterisation of sensor data streams
• Useful in citizen science approaches (e.g., AirQualityEgg)
• Calbimonte JP, Yan Z, Jeung H, Corcho O, Aberer K. Deriving Semantic Sensor Metadata from Raw Measurements. ISWC2012 5th International Workshop on Semantic Sensor Networks (SSN2012). CEUR Workshop Proceedings, Vol-904, http://ceur-ws.org/Vol-904/
4. Federated query processing
In other words, how can we access data from federated data sources?
Example
• We query the life science domain
1. Using the Pubmed references obtained from the GeneID gene dataset, retrieve information about genes and their references in the Pubmed dataset.
2. From Pubmed we access the information in the National Library of Medicine’s controlled vocabulary thesaurus, stored at the MeSH endpoint, so we have more complete information about such genes.
3. Finally, we also access the HHPID endpoint, which is the knowledge base for the HIV-1 protein.
Introduction
• Question:
• How can we access such an amount of RDF data in an integrated manner?
• Current approaches
• Replicate data in local stores, access it using existing RDF databases.
• Execute individual queries and manually join data.
• Use existing distributed query systems (starting to appear).
Problem
• Existing tools for distributed SPARQL query processing differ in the way they handle distribution
• The SPARQL Working Group published the Federated Query document as a Last Call Working Draft
• It homogenises the access to distributed RDF data repositories
• SERVICE <http://dbpedia.org/sparql> {...}
• Problems in semantics: SERVICE ?X not well defined
• Current access to SPARQL endpoints is not optimal
• Work on SPARQL distributed query optimization is beginning
State of the Art
• ANAPSID, RDF::Query, OpenAnzo, ARQ, Rasqal RDF Query Library
• ANAPSID provides SPARQL optimization based on adaptive query processing operators
• RDF::Query provides basic pattern reordering
• Implement the federation using query predicates
• List of SPARQL endpoints needed
• Helps user to direct queries to remote datasets
• FedX, SPLENDID, SemWIQ, NetworkedGraphs
• All provide basic optimisations: pattern grouping (FedX), cost-based optimizations (SemWIQ, SPLENDID and recently FedX, NetworkedGraphs)
• SPARQL 1.1 is mostly syntactic sugar
Assumptions & Restrictions
• Assumptions
1. Users know how to create a query to the endpoints
2. No statistics of any kind are available for the query processing system.
3. Data are distributed
• Restrictions
1. We only consider the Federation Extension of SPARQL 1.1
2. We are not aware of the capabilities or implementation of the remote SPARQL server
3. No registry of endpoints
SERVICE Semantics
• We extend [PAG09] with the semantics for SERVICE:
Example:
SELECT ?name ?email
WHERE {
  SERVICE <http://example1.org/sparql> { ?y :name ?name } .
  SERVICE <http://example2.org/sparql> { ?y :email ?email }
}
SELECT ?name ?email
WHERE {
  ?y :name ?name .
  ?y :email ?email
}
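A small sketch of how the first example can be evaluated, assuming the intuition that each SERVICE pattern is evaluated at its own endpoint and the resulting solution mappings are then joined on their shared variables. The in-memory ENDPOINTS dictionary stands in for two remote SPARQL endpoints, and its data is made up; the formal semantics are those of the cited paper, not this code.

```python
# Hypothetical stand-in for two remote endpoints: each returns solution mappings.
ENDPOINTS = {
    "http://example1.org/sparql": [
        {"?y": ":alice", "?name": "Alice"},
        {"?y": ":bob", "?name": "Bob"},
    ],
    "http://example2.org/sparql": [
        {"?y": ":alice", "?email": "mailto:alice@example.org"},
    ],
}


def eval_service(endpoint):
    """Evaluate the pattern inside SERVICE at the given endpoint (here: a lookup)."""
    return ENDPOINTS[endpoint]


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2.get(var, value) == value for var, value in m1.items())


def join(omega1, omega2):
    """Join two sets of solution mappings, merging the compatible pairs."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


if __name__ == "__main__":
    result = join(
        eval_service("http://example1.org/sparql"),
        eval_service("http://example2.org/sparql"),
    )
    print([(m["?name"], m["?email"]) for m in result])
```

With SERVICE ?X the endpoint is itself a variable, so there is no fixed place at which to evaluate the inner pattern; this is why that case is flagged as not well defined.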
SERVICE Semantics
Example:
SELECT ?name
WHERE { SERVICE ?X { ?y :name ?name } }
SPARQL Optimisation - OPTIONAL
• We assume that we have no statistics of endpoints
• This means that we cannot use cost-based optimisations
• We will only focus on static optimisations
• Besides the usual static optimisations (e.g., pushing down filters), SPARQL queries can be optimised if they contain OPTIONAL operators
• The OPTIONAL operator is responsible for PSPACE-completeness in SPARQL [PAG09]
• OPTIONAL is a key operator in SPARQL
Well-designed patterns
• Well-designed SPARQL patterns [PAG09]
• Class of SPARQL patterns which adds a restriction
Well-designed Patterns
• We extended the notion of well-designed patterns for the SPARQL 1.1 Federation Extension
• The previous rules also hold for SERVICE
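The restriction is not spelled out on the slides. A sketch of the usual condition from [PAG09] for patterns built with AND and OPT: for every subpattern (P1 OPT P2), any variable of P2 that also occurs outside that subpattern must occur in P1. The tuple encoding ("TP", s, p, o), ("AND", P1, P2), ("OPT", P1, P2) and the function names are assumptions for illustration; FILTER, UNION and SERVICE are left out.

```python
def variables(p):
    """All variables (terms starting with '?') occurring in pattern p."""
    if p[0] == "TP":
        return {t for t in p[1:] if t.startswith("?")}
    return variables(p[1]) | variables(p[2])


def subpatterns(p):
    """Yield p and every pattern nested inside it."""
    yield p
    if p[0] in ("AND", "OPT"):
        yield from subpatterns(p[1])
        yield from subpatterns(p[2])


def vars_outside(root, target):
    """Variables of root that occur outside the subpattern `target` (compared by identity)."""
    if root is target:
        return set()
    if root[0] == "TP":
        return variables(root)
    return vars_outside(root[1], target) | vars_outside(root[2], target)


def is_well_designed(root):
    for sub in subpatterns(root):
        if sub[0] == "OPT":
            left, right = variables(sub[1]), variables(sub[2])
            # Variables of the optional side that leak outside without being bound on the left.
            if (right & vars_outside(root, sub)) - left:
                return False
    return True


if __name__ == "__main__":
    ok = ("OPT", ("TP", "?x", ":name", "?n"), ("TP", "?x", ":email", "?e"))
    bad = ("AND", ("TP", "?x", ":name", "?n"),
           ("OPT", ("TP", "?y", ":p", "?z"), ("TP", "?y", ":q", "?x")))
    print(is_well_designed(ok), is_well_designed(bad))
```

In the second pattern ?x appears in the optional part and outside the OPT, but not in the mandatory part of the OPT, which is exactly what the restriction rules out.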
Implementation: SPARQL-DQP
• SPARQL-DQP is implemented on top of OGSA-DAI and OGSA-DQP
• OGSA-DAI is a Web service-based framework for accessing distributed data resources
• OGSA-DQP adds distributed query processing infrastructure
• We reuse some OGSA-DQP operators
• We added RDF and SPARQL endpoint data access
• RDB2RDF data resource
• RDF data resource
• SPARQL endpoint resources
• Good behaviour for large datasets
Buil C, Arenas M, Corcho O. Semantics and optimization of the SPARQL 1.1 federation extension. Proceedings of the 8th Extended Semantic Web Conference (ESWC2011). Springer-Verlag LNCS 6644, pages 1-15
Ongoing Work
• An extensive benchmark has been produced
• Montoya G, Vidal ME, Corcho O, Ruckhaus E, Buil-Aranda C. Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough? In: Proceedings of the 11th International Semantic Web Conference (ISWC2012)
• Focusing now on Adaptive Query Processing
• Query Processing should be adapted to the user's specific needs and specific network requirements
5. Entailment in query rewriting
In other words, how can we take into account the existence of ontologies in the query rewriting process, so as to provide simple entailment?
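A toy sketch of what taking the ontology into account can mean for the simplest kind of entailment: a query atom over a class is expanded into a union of atoms over that class and all of its subclasses. The SUBCLASS_OF table, the ex: class names and the function names are made up for illustration, and the hierarchy is assumed to be acyclic with a single parent per class. The systems listed on the next slide handle far more expressive languages than this.

```python
# Hypothetical class hierarchy: child -> parent (acyclic, one parent per class).
SUBCLASS_OF = {
    "ex:WindSensor": "ex:Sensor",
    "ex:TideSensor": "ex:Sensor",
    "ex:Sensor": "ex:Device",
}


def subclasses(cls, subclass_of=SUBCLASS_OF):
    """Return cls followed by all of its direct and indirect subclasses."""
    result = [cls]
    for child, parent in subclass_of.items():
        if parent == cls:
            result.extend(subclasses(child, subclass_of))
    return result


def rewrite_class_atom(var, cls, subclass_of=SUBCLASS_OF):
    """Expand '?var a cls' into the list of alternatives whose union is entailment-aware."""
    return [f"{var} a {c}" for c in subclasses(cls, subclass_of)]


if __name__ == "__main__":
    # Each alternative becomes one branch of a UNION (one CQ of a UCQ).
    for branch in rewrite_class_atom("?s", "ex:Sensor"):
        print(branch)
```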
Main approaches in the state of the art
Expressiveness | Author | System | Output
ELHIO¬ | Pérez-Urbina et al. | REQUIEM | [R] Datalog, UCQ
Sticky-join [linear] datalog± | Gottlob et al. | Nyaya | UCQ
DL-LiteR, DL-LiteF | Calvanese et al. | QuOnto | UCQ
DL-LiteR | Chortaras et al. | Rapid | UCQ
DL-LiteR [+EBox] | Rosati et al. | Presto & Prexto | NR-Datalog & UCQ
Optimizations in the rewriting
• The rewriting can be optimized in several ways
• Ontology preprocessing
• Subsumption checks
• Prioritize inferences
• Constrain the searches
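As an illustration of the "subsumption checks" item, here is a sketch of the textbook way to prune a union of conjunctive queries (UCQ): a query is redundant if another query in the union maps into it by a homomorphism that fixes the answer variables. This is a generic technique, not the (embargoed) proposal of the group. Atoms are encoded as tuples such as ("takes", "?x", "?c"), and all names are assumptions.

```python
def is_var(term):
    return term.startswith("?")


def homomorphism_exists(src, dst, head):
    """True if the atoms of src can be mapped into dst, keeping head variables and constants fixed."""
    def extend(i, h):
        if i == len(src):
            return True
        atom = src[i]
        for cand in dst:
            if cand[0] != atom[0] or len(cand) != len(atom):
                continue
            h2, ok = dict(h), True
            for a, b in zip(atom[1:], cand[1:]):
                if is_var(a) and a not in head:
                    if h2.setdefault(a, b) != b:  # variable already mapped elsewhere
                        ok = False
                        break
                elif a != b:  # head variables and constants map to themselves
                    ok = False
                    break
            if ok and extend(i + 1, h2):
                return True
        return False
    return extend(0, {})


def prune(ucq, head):
    """Drop every query that is contained in another query of the union."""
    kept = []
    for q in ucq:
        if any(homomorphism_exists(k, q, head) for k in kept):
            continue  # q is contained in a query we already keep
        kept = [k for k in kept if not homomorphism_exists(q, k, head)]
        kept.append(q)
    return kept


if __name__ == "__main__":
    general = [("Student", "?x")]
    specific = [("Student", "?x"), ("takes", "?x", "?c")]
    print(prune([specific, general], {"?x"}))
```

The more specific query can only return answers that the general one already returns, so dropping it shrinks the rewriting without changing its answers.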
Our proposal
José Mora
(Under embargo) Paper in preparation
Conclusion and Future Work
• We have proposed some small incremental improvements over the current state of the art in entailment-aware query rewriting
• Need to integrate it with the rest of our work
• This will happen during Fall 2012
Final conclusions and future work
Ingredients
Linked Open Data
Spreadsheets
1 RDB2RDF
2 Optimisations
3 Sensor-based query rewriting
4 Federated Query Processing
5 Reasoning