Page 1: Relational Databases to RDF (a.k.a RDB2RDF) Juan F. Sequeda Dept of Computer Science University of Texas at Austin

Relational Databases to RDF (a.k.a. RDB2RDF)

Juan F. Sequeda

Dept of Computer Science

University of Texas at Austin

Page 2:

I want RDF… but my data is in RDB!

Page 3: Why RDB2RDF?

• Semantic Web
– Deep Web is 500 times bigger than Static Web (2008)
– Where do you think that the majority of the data is stored?
– If we want a Semantic Web, we need data to be on the web as RDF and interlinked!
  • Where do you think this data is going to come from?

Page 4:

[Figure: multiple relational databases, each labeled RDB]

Page 5:

[Figure: relational databases (RDB), each with an RDB2RDF label]

Page 6: Why RDB2RDF?

• Data Integration
– Do you know why RDF is cool?
  • because it’s a graph!
– How do you link/integrate two different graphs?
  • add edges between nodes or merge nodes!

Page 7: Real world scenario

• Boss: Find me the clients that are based in cities with a population of less than 1 million.
• You: ???

Clients
id  Name      c_id
10  ACME Inc  20
11  Foo Bars  21

Locations
c_id  city    state
20    Austin  TX
21    Dallas  TX

Page 8: Real world scenario

• You: I found the population information… but it’s in a different database. Can you add a column to the Locations table so I can insert the new data?
• DBA: NO!

Location
id  city    state  pop
1   Austin  TX     790390
2   Dallas  TX     1197816

Clients
id  Name      c_id
10  ACME Inc  20
11  Foo Bars  21

Locations
c_id  city    state
20    Austin  TX
21    Dallas  TX

Page 9:

[Figure: the Location, Clients, and Locations tables drawn as RDF graphs. Nodes: http://db1/client10, http://db1/client11, http://db1/loc20, http://db1/loc21, http://db2/loc1, http://db2/loc2, ex:Client, and the literals ACME Inc, Foo Bars, Austin, Dallas, TX, 790390, 1197816. Edge labels: rdf:type, ex:name, ex:basedIn, ex:city, ex:state, ex:pop.]

Page 10:

[Figure: the same tables and RDF graph, now without the http://db1/loc20 and http://db1/loc21 nodes. Remaining nodes: http://db1/client10, http://db1/client11, http://db2/loc1, http://db2/loc2, ex:Client, and the literals ACME Inc, Foo Bars, Austin, Dallas, TX, 790390, 1197816. Edge labels: rdf:type, ex:name, ex:basedIn, ex:city, ex:state, ex:pop.]
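The scenario on the preceding slides can be sketched in a few lines of Python. This is only an illustration, not the method used by any RDB2RDF tool: triples are plain tuples, the node URIs and ex: predicates are taken from the figure labels, and the rule for merging nodes (same city and same state) is an assumption made for this example.

```python
# Rows copied from the slide tables; node URIs follow the figure labels.
clients = [(10, "ACME Inc", 20), (11, "Foo Bars", 21)]        # db1 Clients
locations = [(20, "Austin", "TX"), (21, "Dallas", "TX")]      # db1 Locations
location_pop = [(1, "Austin", "TX", 790390),                  # db2 Location
                (2, "Dallas", "TX", 1197816)]


def build_graph():
    """Turn the three tables into one set of (subject, predicate, object) triples."""
    triples = set()
    for cid, name, loc_id in clients:
        s = f"http://db1/client{cid}"
        triples |= {(s, "rdf:type", "ex:Client"),
                    (s, "ex:name", name),
                    (s, "ex:basedIn", f"http://db1/loc{loc_id}")}
    for loc_id, city, state in locations:
        s = f"http://db1/loc{loc_id}"
        triples |= {(s, "ex:city", city), (s, "ex:state", state)}
    for loc_id, city, state, pop in location_pop:
        s = f"http://db2/loc{loc_id}"
        triples |= {(s, "ex:city", city), (s, "ex:state", state), (s, "ex:pop", pop)}
    return triples


def merge_locations(triples):
    """Merge each db1 location node into the db2 node with the same city and state.

    Matching on (city, state) is an assumption made for this sketch.
    """
    def value(node, predicate):
        return next(o for s, p, o in triples if s == node and p == predicate)

    def key(node):
        return (value(node, "ex:city"), value(node, "ex:state"))

    nodes = {s for s, p, _ in triples if p == "ex:city"}
    db2_by_key = {key(n): n for n in nodes if n.startswith("http://db2/")}
    rename = {n: db2_by_key[key(n)] for n in nodes
              if n.startswith("http://db1/") and key(n) in db2_by_key}
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


def clients_in_small_cities(triples, limit=1_000_000):
    """Names of clients based in a location whose ex:pop is below the limit."""
    pops = {s: o for s, p, o in triples if p == "ex:pop"}
    names = {s: o for s, p, o in triples if p == "ex:name"}
    return sorted(names[s] for s, p, o in triples
                  if p == "ex:basedIn" and pops.get(o, limit) < limit)


separate = build_graph()
merged = merge_locations(separate)
print(clients_in_small_cities(separate))  # no client is linked to a population yet
print(clients_in_small_cities(merged))    # the boss's question can now be answered
```

Before the merge the clients point at db1 location nodes that carry no population, so the query finds nothing. After the merge the ex:basedIn edges reach the db2 nodes, and no table had to be altered.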

Page 11: A bit of history

• Relational Databases on the Web. TimBL, 1998
• W3C Workshop on RDF Access to Relational Databases, October 2007
– Report: http://www.w3.org/2007/03/RdfRDB/report
• W3C RDB2RDF Incubator Group, 2008-2009
– Survey: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
• W3C RDB2RDF Working Group, 2009 – today
– R2RML: RDB to RDF Mapping Language
– A Direct Mapping of Relational Data to RDF

Page 12: RDB and the Semantic Web

RDF
RDFS
OWL
RIF

Page 13: RDB and the Semantic Web

RELATIONAL MODEL
TABLE DEFINITION
CONSTRAINTS
TRIGGERS

Page 14: RDB and the Semantic Web

RELATIONAL MODEL
TABLE DEFINITION
CONSTRAINTS
TRIGGERS

RDF
RDFS
OWL
RIF

Page 15: Overview

Page 16: R2RML: RDB to RDF Mapping Language

• Language for expressing customized mappings from relational databases to RDF datasets
• Give precise control to the developer
– You create the structure you want
– You choose the target vocabulary
• No RDFS/OWL is created from the schema
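To make "customized mapping" concrete, here is a rough Python sketch of the idea. It is not R2RML syntax and not how any R2RML processor is implemented: the mapping dictionary, the subject template, the http://ex.com/vocab#Client class, and the choice of foaf:name as target vocabulary are all assumptions for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping written by the developer: it fixes the subject URI
# template, the class, and the target predicate for each mapped column.
mapping = {
    "subject_template": "http://ex.com/client/{id}",
    "class": "http://ex.com/vocab#Client",
    "predicates": {"Name": "http://xmlns.com/foaf/0.1/name"},
}


def apply_mapping(rows, mapping):
    """Produce triples for each row; columns absent from the mapping are skipped."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for column, predicate in mapping["predicates"].items():
            if row[column] is not None:
                triples.append((subject, predicate, row[column]))
    return triples


rows = [
    {"id": 10, "Name": "ACME Inc", "c_id": 20},
    {"id": 11, "Name": "Foo Bars", "c_id": 21},
]
triples = apply_mapping(rows, mapping)
for t in triples:
    print(t)
```

Nothing is derived from the schema here: the c_id column produces no triple because the mapping does not mention it, which is the kind of control the slide describes.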

Page 17: R2RML Mapping

[Figure: RDB mapped to RDF through a manually written R2RML mapping]

Page 18: Direct Mapping

• Automatic transformation from Relational Database to RDF
– Click a button… Voila!
• Generate RDFS/OWL of the database schema
• If this doesn’t get you where you want… use existing languages for mapping
– RDF to RDF with RIF or SPARQL Construct
  • Semantic Web community
– Create SQL Views and directly map those
  • Database community
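The "click a button" idea can be sketched with Python's built-in sqlite3 module. This is a simplified illustration and not the W3C Direct Mapping algorithm: the URI patterns (base/Table/key=value for rows, base/Table#column for predicates) are assumptions, foreign keys are ignored, and every table is assumed to have a primary key.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn, base="http://db1/"):
    """Derive triples from every table using only the schema, with no hand-written mapping."""
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = [c[1] for c in info]
        pk = [c[1] for c in info if c[5] > 0]
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            values = dict(zip(columns, row))
            key = ";".join(f"{c}={values[c]}" for c in pk)
            subject = f"{base}{table}/{key}"
            triples.append((subject, RDF_TYPE, f"{base}{table}"))
            for c in columns:
                if values[c] is not None:
                    triples.append((subject, f"{base}{table}#{c}", values[c]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Locations (c_id INTEGER PRIMARY KEY, city TEXT, state TEXT)")
conn.executemany("INSERT INTO Locations VALUES (?, ?, ?)",
                 [(20, "Austin", "TX"), (21, "Dallas", "TX")])
triples = direct_map(conn)
for t in triples:
    print(t)
```

The vocabulary comes straight from table and column names, so the output is rarely the structure you ultimately want. That is why the slide points to a second step with RIF, SPARQL Construct, or SQL views.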

Page 19: Direct Mapping

[Figure: RDB is transformed to RDF automatically by the Direct Mapping. Further labels: SQL Views, RIF/SPARQL Construct, RDF.]

Page 20: Hybrid

• Instead of starting from a blank R2RML file…
• 1) Direct Mapping
• 2) Manual Editing

Page 21: Hybrid Mapping

[Figure: RDB, Direct Mapping, "Direct Mapping in R2RML", Modify, R2RML, RDF]

Page 22: Materialize Triples

• Data is not dynamic
• Dump RDB into RDF and then insert into triplestore
• RDF dump may not be consistent with RDB
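A small Python sketch of why a dump can drift from the database. It is illustrative only: the Person table name and the http://ex.com/ URIs are assumptions, and the "triplestore" is just a list of N-Triples lines.

```python
import sqlite3

XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def dump_ages(conn):
    """Materialize one N-Triples line per row: a snapshot taken at call time."""
    return [
        f'<http://ex.com/person{sid}> <http://ex.com/age> "{age}"^^<{XSD_INT}> .'
        for sid, age in conn.execute("SELECT SID, AGE FROM Person ORDER BY SID")
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (SID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                 [(1, "Alice", 25), (2, "Bob", 26)])

dump = dump_ages(conn)                       # what gets loaded into the triplestore
conn.execute("UPDATE Person SET AGE = 27 WHERE SID = 2")
stale = dump != dump_ages(conn)              # the earlier dump no longer matches the RDB
print("\n".join(dump))
print("dump is stale:", stale)
```

Once the row changes, the loaded triples still say Bob is 26 until the dump is regenerated and reloaded.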

Page 23: Materialized Triples

[Figure: RDB is dumped to RDF; SPARQL queries run against the RDF]

Page 24: Virtual Triples

• Data is dynamic
• Need to query RDB with SPARQL
• Translate SPARQL to SQL
– Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms [Bizer and Schultz 2009]
– Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structure data [Gray et al. 2009]
– Ultrawrap solves this
• RDF data is consistent with RDB data
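As a toy illustration of query translation, the sketch below turns a single triple pattern (?s, predicate, ?o or a constant) into SQL and runs it on SQLite. It is not Ultrawrap's approach or a real SPARQL engine: the MAPPING table, the Person table, and the http://ex.com/ URIs are assumptions, and only one pattern shape is handled.

```python
import sqlite3

# Hypothetical mapping: predicate URI -> (table, key column, value column)
MAPPING = {
    "http://ex.com/name": ("Person", "SID", "NAME"),
    "http://ex.com/age": ("Person", "SID", "AGE"),
}
SUBJECT_PREFIX = "http://ex.com/person"


def pattern_to_sql(predicate, obj=None):
    """Translate the pattern (?s, predicate, ?o) or (?s, predicate, obj) to SQL."""
    table, key, column = MAPPING[predicate]
    sql = (f"SELECT '{SUBJECT_PREFIX}' || {key} AS s, {column} AS o "
           f"FROM {table} WHERE {column} IS NOT NULL")
    params = []
    if obj is not None:
        sql += f" AND {column} = ?"   # a constant object becomes a filter
        params.append(obj)
    return sql + f" ORDER BY {key}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (SID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                 [(1, "Alice", 25), (2, "Bob", 26)])

sql, params = pattern_to_sql("http://ex.com/age")
before = conn.execute(sql, params).fetchall()
conn.execute("UPDATE Person SET AGE = 27 WHERE SID = 2")
after = conn.execute(sql, params).fetchall()   # same query, fresh answer
print(sql)
print(before, after)
```

No triples are stored anywhere, so the answers always reflect the current rows. The cost is that every query pays for translation, which is the overhead the quoted benchmarks measure.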

Page 25: Virtual Triples

[Figure: SPARQL queries reach the RDB through a Mapping; the RDF is virtual]

Page 26: RDB2RDF Space

[Figure: the RDB2RDF space, labeled Materialized Triples, Virtual Triples, Direct Mapping, Hybrid, and Custom Mapping]

Page 27: Tuples to Triples

SID  NAME   AGE
1    Alice  25
2    Bob    26

SUBJECT                PREDICATE          OBJECT
http://ex.com/person1  http://ex.com/age  25
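The tuple-to-triple step on this slide can be written as a short Python function. This is a minimal sketch: the subject pattern http://ex.com/person plus the SID and the http://ex.com/age predicate come from the slide, while http://ex.com/name and the lower-casing of column names are assumptions.

```python
def tuple_to_triples(row, key="SID", base="http://ex.com/", subject_prefix="person"):
    """One triple per non-key, non-NULL column: (row URI, column URI, cell value)."""
    subject = f"{base}{subject_prefix}{row[key]}"
    return [
        (subject, f"{base}{column.lower()}", value)
        for column, value in row.items()
        if column != key and value is not None
    ]


rows = [
    {"SID": 1, "NAME": "Alice", "AGE": 25},
    {"SID": 2, "NAME": "Bob", "AGE": 26},
]
triples = [t for row in rows for t in tuple_to_triples(row)]
for t in triples:
    print(t)
```

The key identifies the subject, each remaining column names a predicate, and each cell becomes an object, so a row with n non-key values yields n triples.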

Page 28: Current Status of W3C RDB2RDF WG

• R2RML: RDB to RDF Mapping Language
– Working Draft: http://www.w3.org/TR/r2rml/
• A Direct Mapping of Relational Data to RDF
– Working Draft: http://www.w3.org/TR/rdb-direct-mapping/
• Last Call: Sept 1 (hopefully)

Page 29: Implementations

• Ultrawrap
– SPARQL and semantically equivalent SQL have equal execution time
– Commercial databases
– http://ribs.csres.utexas.edu/ultrawrap
• Spyder
– Oracle and HSQLDB
– http://www.revelytix.com/content/spyder
• Other non-standard RDB2RDF
– D2R Server, Virtuoso, Triplify, …

Page 30: Publicity

• International Semantic Web Conference
– Oct 23 – 27 in Bonn, Germany
• Posters and Demos
– August 15
• Consuming Linked Data Workshop
– August 15
• Outrageous Ideas Track
– Sept 5
• Semantic Web Challenge
– Sept 30
• 2nd Linked Data-a-thon
– Oct 1

http://iswc2011.semanticweb.org/

Join the Facebook group

SSSW2011

Page 31: Thank You

@juansequeda

Acknowledgments:
- RiBS @ UT Austin
- W3C RDB2RDF WG members
- David McNeil - Revelytix