Data Integration at the Ontology Engineering Group

Post on 19-Jan-2015






Presentation on the data integration work being done at OEG-UPM, given at the CredIBLE workshop in Sophia-Antipolis (October 15th, 2012).


  • 1. Data integration at our group: ingredients and some prospects. CredIBLE workshop, Sophia-Antipolis, October 15th 2012. Oscar, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain. With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda (OEG-UPM, now at PUC Chile)

  • 2. Our data integration needs, problems (and challenges): need to access heterogeneous relational data sources (mainly in the area of Geography); some of the databases are available in different DBMSs; some of the data sources are available as spreadsheets; data may be available from data streams (e.g., sensors); need to submit SPARQL queries to distributed SPARQL endpoints; furthermore, many of these datasets are already published as Linked Data.

  • 3. Ingredients. [Figure: architecture diagram, from the SemSorGrid4Env architecture, with thin applications (mashups), middleware, semantic data integration and querying, legacy data sources, registries, sensor networks, Linked Open Data and spreadsheets. Numbered ingredients: 1. RDB2RDF; 2. Query rewriting optimisations; 3. Sensor-based query rewriting; 4. Federated query processing; 5. Reasoning.]

  • 4. Disclaimer: when I talk about ontology-based querying, I will normally be talking about SPARQL querying.

  • 5. 1. RDB2RDF. In other words, how to make relational data available as RDF (and connected to ontologies).

  • 6. RDB2RDF. Motivation: a majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems. [Figure labels: transformation description, transformation engine.]

  • 7. RDB2RDF. Query rewriting for OBDA with mappings. There may be some mappings to translate between ontology and DB. The rewriting should consider those mappings. [Figure labels: Q, Rewriting, Mappings, Q.]

  • 8. RDB2RDF. Existing approaches: (1) to build a new ontology from a database schema and content (direct mappings); (2) to map the ontology created in approach (1) to a legacy ontology; (3) to map an existing DB to a legacy ontology. [Figure labels: new ontology, existing ontology.]

  • 9. OEG's background knowledge in RDB2RDF: R2O and ODEMapster; GaV wrapper generation (no mediators); syntactic sugar for the generation of SQL queries; simple use of this language and processor in the domains of fund finding, cultural information, and fisheries; NeOn Toolkit plugin for common mappings. Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004.

  • 10. R2O (Relational-to-Ontology) Language. For concepts: a view maps exactly one concept in the ontology; a subset of the columns in the view map a concept in the ontology; a subset (selection) of the records of a database view map a concept in the ontology; a subset of the records of a database view map a concept in the ontology, but the selection cannot be made using …; one or more concepts can be extracted from a single data field (not in 1NF). For attributes: a column in a database view maps directly an attribute or a relation; a column in a database view maps an attribute or a relation after some transformation; a set of columns in a database view map an attribute or a relation.

  • 11. The W3C RDB2RDF Working Group. Created in 2007. W3C Recommendations in September 2012: R2RML: RDB to RDF Mapping Language; Direct Mapping - direct-mapping/; R2RML and Direct Mapping Test Cases - 2rdf/test-cases/; RDB2RDF Implementation Report - 2rdf/implementation-report/

  • 12. R2RML example

  • 13. Existing implementations. OEG implementations. Implementation Report. Boris Villazón-Terrazas, Michael Hausenblas.

  • 14. Ongoing work: provide a list of common patterns in R2RML transformations, so that they can be reused (increasing productivity). Sequeda J, Priyatna F, Villazón-Terrazas B. Relational Database to RDF Mapping Patterns. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012). Villazón-Terrazas B, Priyatna F. Building Ontologies by using Re-engineering Patterns and R2RML Mappings. In: Proceedings of the 3rd Workshop on Ontology Patterns (WOP2012). Improve our support at Morph for all test cases. Adapt existing GUIs for the generation of mappings (such as the NeOn Toolkit's one).

  • 15. 2. R2RML query rewriting optimisations. In other words, how to make this query rewriting optimised, so that we don't suffer from poor efficiency in our results.
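The efficiency question raised for mapping-based query rewriting can be illustrated with a small sketch. It is not the algorithm used by Morph, Chebotko et al. or any other system named in these slides: it assumes a hypothetical, heavily simplified mapping (one table, one key column and one value column per predicate, with made-up `ex:` predicate names) and shows one well-known optimisation, self-join elimination, under the assumption that the subject column is a key of the table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateMap:
    """Hypothetical stand-in for an R2RML mapping of one predicate."""
    table: str
    subject_col: str  # assumed to be a key (unique) column of the table
    object_col: str


# Illustrative mappings only; the 'ex:' predicate names are invented.
MAPPINGS = {
    "ex:title": PredicateMap("review", "reviewID", "title"),
    "ex:text": PredicateMap("review", "reviewID", "text"),
    "ex:rating1": PredicateMap("review", "reviewID", "rating1"),
}


def naive_sql(subject_var, patterns):
    """One subquery per triple pattern, joined on the shared subject."""
    parts = []
    for i, (pred, obj_var) in enumerate(patterns):
        m = MAPPINGS[pred]
        parts.append(
            f"(SELECT {m.subject_col} AS {subject_var}, "
            f"{m.object_col} AS {obj_var} FROM {m.table} "
            f"WHERE {m.object_col} IS NOT NULL) t{i}"
        )
    sql = parts[0]
    for i in range(1, len(parts)):
        sql += f" INNER JOIN {parts[i]} ON t0.{subject_var} = t{i}.{subject_var}"
    cols = ", ".join(f"t{i}.{obj}" for i, (_, obj) in enumerate(patterns))
    return f"SELECT {cols} FROM {sql}"


def merged_sql(subject_var, patterns):
    """Collapse patterns on the same table and key into a single SELECT."""
    maps = [MAPPINGS[pred] for pred, _ in patterns]
    if len({(m.table, m.subject_col) for m in maps}) != 1:
        return naive_sql(subject_var, patterns)  # nothing to merge safely
    cols = ", ".join(
        f"{m.object_col} AS {obj}" for m, (_, obj) in zip(maps, patterns)
    )
    conds = " AND ".join(f"{m.object_col} IS NOT NULL" for m in maps)
    return f"SELECT {cols} FROM {maps[0].table} WHERE {conds}"


def run_both(rows, subject_var, patterns):
    """Run both translations on an in-memory SQLite table and return the results."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE review (reviewID INTEGER PRIMARY KEY, "
        "title TEXT, text TEXT, rating1 INTEGER)"
    )
    con.executemany("INSERT INTO review VALUES (?, ?, ?, ?)", rows)
    naive = sorted(con.execute(naive_sql(subject_var, patterns)).fetchall())
    merged = sorted(con.execute(merged_sql(subject_var, patterns)).fetchall())
    con.close()
    return naive, merged
```

Calling `run_both` with a few rows and the patterns `[("ex:title", "var_title"), ("ex:text", "var_text")]` should return the same answers for both translations, while the merged query avoids the self-join on `review`. The merge is only sound because `reviewID` is assumed to be a key; without that guarantee the two queries can differ.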
  • 16. R2RML is now a W3C Recommendation. That's very good to ensure wide uptake, but implementations still suffer from their lack of efficiency. UltraWrap has shown that a similar performance can be obtained with direct mappings on high-end databases (Oracle, SQL Server). What happens with low-end databases (MySQL)?

  • 17. Several works on SPARQL to SQL translation. Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: Proceedings of the Second Workshop on Semantic Web and Databases, SWDB 2004. R. Cyganiak. A relational algebra for SPARQL. Digital Media Systems Laboratory, HP Laboratories Bristol. HPL-2005-170, 2005. B. Elliott, E. Cheng, C. Thomas-Ogbuji, and Z.M. Ozsoyoglu. A complete translation from SPARQL into efficient SQL. In Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 31-42. ACM, 2009. A. Chebotko, S. Lu, and F. Fotouhi. Semantics preserving SPARQL-to-SQL translation. Data & Knowledge Engineering, 68(10):973-1000, 2009.

  • 18. Chebotko's query rewriting

  • 19. Our proposal

  • 20. An example.
BSBM08

NATIVE
SELECT r.title, r.text, r.reviewDate, p.personID,, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.productID=55547 AND r.personID=p.personID AND r.language='en'
ORDER BY r.reviewDate desc

CHEBOTKO
SELECT var_rating2 AS rating2, var_reviewerName AS reviewerName, var_title AS title, var_rating1 AS rating1, var_reviewDate AS reviewDate, var_reviewer AS reviewer, var_rating3 AS rating3, var_rating4 AS rating4, var_text AS text
FROM (SELECT *
FROM (SELECT uri_rating41477446315 AS uri_rating41477446315, var_rating2 AS var_rating2, var_reviewer AS var_reviewer, uri_reviewDate750573656 AS uri_reviewDate750573656, var_rating4 AS var_rating4, var_rating1 AS var_rating1, var_text AS var_text, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewer2088452952 AS uri_reviewer2088452952, uri_rating21477446253 AS uri_rating21477446253, uri_text1457367120 AS uri_text1457367120, uri_rating31477446284 AS uri_rating31477446284, uri_rating11477446222 AS uri_rating11477446222, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_reviewDate AS var_reviewDate, var_title AS var_title, uri_language269987354 AS uri_language269987354, uri_Product555472014519903 AS uri_Product555472014519903, v_7634.var_review AS var_review, var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, var_lang AS var_lang
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, v_6537.var_review AS var_review, uri_rating11477446222 AS uri_rating11477446222, uri_rating31477446284 AS uri_rating31477446284, uri_Product555472014519903 AS uri_Product555472014519903, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating2 AS var_rating2,

  • 21. An example.
BSBM08

OUR APPROACH
SELECT var_rating2 AS rating2, var_reviewDate AS reviewDate, var_rating4 AS rating4, var_rating1 AS rating1, var_reviewer AS reviewer, var_rating3 AS rating3, var_reviewerName AS reviewerName, var_text AS text, var_title AS title
FROM (SELECT *
FROM (SELECT v_2660.var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate, var_review AS var_review, uri_rating31477446284 AS uri_rating31477446284, uri_rating21477446253 AS uri_rating21477446253, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3, uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354, uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, var_reviewerName AS var_reviewerName, var_lang AS var_lang, uri_Product555472014519903 AS uri_Product555472014519903, var_rating2 AS var_rating2, uri_rating41477446315 AS uri_rating41477446315, var_title AS var_title, var_rating4 AS var_rating4, var_text AS var_text, uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120, uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_8722.PERSONID AS var_reviewer, AS uri_name1396749066, v_8722.NAME AS var_reviewerName
FROM PERSON v_8722
WHERE (v_8722.NAME IS NOT NULL) ) v_2660
INNER JOIN (SELECT v_3353.REVIEWDATE AS var_reviewDate, AS uri_rating11477446222, v_3353.REVIEWID AS var_review, v_3353.TEXT AS var_text, AS uri_reviewer2088452952, v_3353.RATING1 AS var_rating1, uri_rating21477446253, v_3353.TITLE AS var_title, AS uri_language269987354, AS uri_reviewDate750573656, AS uri_rating31477446284, http://www4.wiwiss.fu-

  • 22. Analysis with BSBM. [Figure panels: SQL Server, MySQL.]

  • 23. Ongoing work: writing the paper describing our optimisations; proposing a comprehensive benchmarking platform to test R2RML-compliant query rewriting systems; extending our current work on the R2