Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Abadi, Marcus, Madden, Hollenbach, VLDB 2007. Presented by: {Gui}llermo Cabrera, The University of Texas at Austin

Upload: guillermo-cabrera

Post on 09-Jul-2015


DESCRIPTION

Part of the Semantic Web, Ontologies and the Cloud class in the Computer Science department at The University of Texas at Austin, Spring 2010 term.

TRANSCRIPT

Page 1: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Abadi, Marcus, Madden, Hollenbach, VLDB 2007

Presented by: {Gui}llermo Cabrera, The University of Texas at Austin

Page 2: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Problem

Storage Goal

RDBMS use

RDF Physical Organization

Column store vs. Row Store

Materialized Path Expressions

Experiment & Results

Discussion

Page 3: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Performance: Self-joins

Many triples

Page 4: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Achieve scalability & performance in triple storage

Survey approaches in RDBMS

Benefits of vertical partition and column store

Page 5: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

1 table with 3 indexed columns?

Multi-layer architecture

◦ Translate -> Optimize -> Execute

Mapping tables for long URI and literals

Jena, Oracle, Sesame, 3store (Hyunjun), Hexastore (Donghyuk)
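To make the self-join problem from the earlier slide concrete, here is a minimal sketch of the single three-column triples table using Python's built-in sqlite3 module. The table name `triples`, the column names, and the sample rows are hypothetical and only for illustration; they are not taken from the paper or from any of the systems listed above.

```python
import sqlite3

# Hypothetical single-table triple store: one row per (subject, property, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "type", "Text"),
        ("book1", "title", "XYZ"),
        ("book1", "author", "Fox, Joe"),
        ("book2", "type", "Text"),
        ("book2", "title", "ABC"),
    ],
)

# Asking for two properties of the same subject forces the table
# to be joined with itself (one extra self-join per extra property).
rows = conn.execute(
    """
    SELECT a.subj, b.obj
    FROM triples AS a
    JOIN triples AS b ON a.subj = b.subj
    WHERE a.prop = 'author' AND a.obj = 'Fox, Joe'
      AND b.prop = 'title'
    """
).fetchall()
print(rows)
```

Every additional property in the query adds another self-join over the same large table, which is the performance concern the slides raise.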

Page 6: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Property tables

◦ Clustered property table

Denormalize RDF (wider tables)

Clustering algorithm

NULL values

Page 7: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 8: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Property tables

◦ Property-Class Tables

Exploit the type property

Properties may exist in multiple tables

Page 9: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 10: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Advantage:

◦ Fewer joins

Disadvantage:

◦ NULL values

◦ Multivalued attributes are complicated
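The two disadvantages can be seen in a small sketch of a property table built from triples. The function name `build_property_table`, the choice of columns, and the sample data are assumptions made for illustration; real systems pick the columns with a clustering algorithm or from the type property, which is not modeled here.

```python
def build_property_table(triples, columns):
    """Denormalize triples into one wide row per subject.

    Returns (rows, leftover): rows maps subject -> {column: value or None};
    leftover holds triples that do not fit (unknown property, or a second
    value for a column that is already filled).
    """
    rows = {}
    leftover = []
    for subj, prop, obj in triples:
        if prop not in columns:
            leftover.append((subj, prop, obj))
            continue
        row = rows.setdefault(subj, {c: None for c in columns})
        if row[prop] is None:
            row[prop] = obj
        else:
            # Multivalued attribute: a single column cannot hold it.
            leftover.append((subj, prop, obj))
    return rows, leftover


triples = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "author", "Green, Ann"),  # second author for the same book
    ("book2", "title", "ABC"),          # book2 has no author -> NULL
]
rows, leftover = build_property_table(triples, ("title", "author"))
print(rows)
print(leftover)
```

Here `book2` gets a `None` (NULL) author, and the second author of `book1` has to be stored somewhere else, which is why multivalued attributes complicate this layout.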

Page 11: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Vertical Partition

◦ n two-column tables, n = # of unique properties

◦ Table sorted by subject

Merge join
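A minimal sketch of the idea on this slide, assuming in-memory Python lists stand in for tables: triples are split into one (subject, object) table per property, each table is sorted by subject, and two tables are combined with a merge join. The function names and sample data are hypothetical and are not code from the paper.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Split triples into one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    # Sorting by subject is what allows subject-subject joins to be merge joins.
    return {prop: sorted(rows) for prop, rows in tables.items()}


def merge_join(left, right):
    """Join two subject-sorted tables on subject in one pass over each."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows sharing this subject on both sides.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, l_obj in left[i:i_end]:
                for _, r_obj in right[j:j_end]:
                    out.append((ls, l_obj, r_obj))
            i, j = i_end, j_end
    return out


triples = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book2", "title", "ABC"),
    ("book1", "author", "Green, Ann"),
    ("book3", "author", "Lee, Kim"),
]
tables = vertical_partition(triples)
joined = merge_join(tables["title"], tables["author"])
print(joined)
```

Note that the multivalued `author` property needs no special handling: it is simply two rows in the `author` table.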

Page 12: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 13: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

• Advantage

Multivalued attributes supported

No clustering algorithm (Property tables)

Only accessed properties are read

• Disadvantage

Use of multiple properties (table joins)

Inserts expensive

Page 14: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Triple Store

Property Table

Vertical Partition (Row Store)

Vertical Partition Store (Column Store)

Page 15: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Why?

Projection is free

Tuple headers (metadata on row)

◦ 35 bytes in Postgres vs. 8 bytes in C-Store

Column-oriented compression

◦ Run-length encoding (ex. 1,1,1,2,2 -> 1x3, 2x2)

Optimized merge join

◦ Prefetching
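The run-length encoding example on this slide can be sketched in a few lines. This is a generic illustration using `itertools.groupby`, not C-Store's actual implementation; the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


column = [1, 1, 1, 2, 2]
encoded = rle_encode(column)
print(encoded)
```

The encoding pays off on sorted columns such as the subject column of a vertically partitioned table, where equal values sit next to each other.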

Page 16: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

<BookID1, Author, http://preamble/FoxJoe>

<http://preamble/FoxJoe, wasBorn, “1860”>

Find all books whose authors were born in 1860
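A rough sketch of this query over two vertically partitioned tables, first as a subject-object join and then against a materialized path table. `BookID2`, `DoeJane`, and the year 1901 are made-up rows added so the filter has something to exclude; the function names are hypothetical, and the sketch assumes each person has a single `wasBorn` value.

```python
# Vertically partitioned tables: (subject, object) pairs per property.
author = [
    ("BookID1", "http://preamble/FoxJoe"),
    ("BookID2", "http://preamble/DoeJane"),  # hypothetical extra row
]
was_born = [
    ("http://preamble/FoxJoe", "1860"),
    ("http://preamble/DoeJane", "1901"),     # hypothetical extra row
]


def books_by_birth_year(author, was_born, year):
    """Subject-object join: author.object must match wasBorn.subject."""
    born_in_year = {person for person, born in was_born if born == year}
    return [book for book, person in author if person in born_in_year]


def materialize_path(author, was_born):
    """Precompute the author:wasBorn path as its own (book, year) table."""
    birth = dict(was_born)  # assumes one wasBorn value per person
    return [(book, birth[person]) for book, person in author if person in birth]


joined_result = books_by_birth_year(author, was_born, "1860")

# With the path materialized, the same question is a plain filter on one table.
author_was_born = materialize_path(author, was_born)
materialized_result = [book for book, year in author_was_born if year == "1860"]
print(joined_result, materialized_result)
```

The materialized table is keyed by the book (the subject), so combining it with other properties of the book uses subject-subject joins instead of the subject-object join above.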

Page 17: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 18: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Barton Libraries Dataset

Longwell Queries

◦ Calculating counts

◦ Filtering

◦ Inference

Page 19: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

8.3 GB – Triple Store (Postgres)

14 GB – Property Table (Postgres)

5.2 GB – Vertically Partitioned (Postgres)

2.7 GB – Vertically Partitioned (C-store)

Including indices and mapping table

Page 20: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 21: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 22: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 23: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Replace

◦ subject-object joins -> subject-subject joins

Page 24: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Add 60 integer valued columns

7 GB increase in size

Page 25: Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Great for reads, writes not considered

What about load times?

Using another benchmark (ex. LUBM)?

Native XML databases for RDF/XML?

Test triple store in Sesame

Page 26: Review: Scalable Semantic Web Data Management Using Vertical Partitioning
Page 27: Review: Scalable Semantic Web Data Management Using Vertical Partitioning