gStore: Answering SPARQL Queries Via Subgraph Matching
Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1
1 Peking University, 2 Hong Kong University of Science and Technology, 3 University of Waterloo
Outline
• Background & Related Work
• Overview of gStore
• Encoding Technique
• VS*-tree & Query Algorithm
• Experiments
• Conclusions
Semantic Web
“Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.
RDF Data Model
[Figure: RDF data example; labels: URI, URI, Literals]
RDF Graph
[Figure: RDF graph; legend: Entity Vertex, Literal Vertex]
SPARQL Queries
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15".
}
Query Graph
Subgraph Match vs. SPARQL Queries
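The correspondence between a SPARQL query and subgraph matching can be sketched in a few lines. This is a minimal illustration, not gStore's algorithm: triples form a labeled graph, terms starting with "?" are query vertices, and a brute-force backtracking search binds them. The subject identifiers and the helper names (is_var, unify, match) are assumptions made for the example.

```python
# Toy RDF graph as (subject, predicate, object) triples.
# The subject IDs are made-up identifiers used only for illustration.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Other_Person", "hasName", "Other Person"),
    ("y:Other_Person", "BornOnDate", "1809-02-12"),
]

# The query graph: one triple pattern per edge.
QUERY = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]


def is_var(term):
    return term.startswith("?")


def unify(term, value, binding):
    """Return an extended binding if term can take value, else None."""
    if is_var(term):
        if term in binding:
            return binding if binding[term] == value else None
        return {**binding, term: value}
    return binding if term == value else None


def match(patterns, triples, binding=None):
    """Yield every variable binding that maps all patterns onto triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for ts, tp, to in triples:
        if tp != p:
            continue
        b = unify(s, ts, binding)
        if b is not None:
            b = unify(o, to, b)
        if b is not None:
            yield from match(rest, triples, b)


names = [b["?name"] for b in match(QUERY, TRIPLES)]
print(names)
```

Every pattern is tried against every triple here, which is exactly the cost that an index is meant to avoid on a large graph.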
Naïve Triple Store
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15".
}
SQL:
Select T3.Object
From T as T1, T as T2, T as T3
Where T1.Predict = "BornOnDate" and T1.Object = "1809-02-12"
  and T2.Predict = "DiedOnDate" and T2.Object = "1865-04-15"
  and T3.Predict = "hasName"
  and T1.Subject = T2.Subject and T2.Subject = T3.Subject
Too many Self-Joins
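The self-join cost is easy to reproduce with Python's built-in sqlite3 module. The sketch below assumes a single triple table T with the column names used in the slide's SQL; the subject identifiers and the extra rows are made up. It selects T3.Object, the value bound to ?name, and needs one copy of T per triple pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predict TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        # Subject IDs are hypothetical; the literals come from the slide.
        ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
        ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
        ("y:Other_Person", "hasName", "Other Person"),
        ("y:Other_Person", "BornOnDate", "1809-02-12"),
    ],
)

# Three triple patterns -> three aliases of the same table, joined on Subject.
SELF_JOIN_SQL = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predict = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predict = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predict = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""

rows = conn.execute(SELF_JOIN_SQL).fetchall()
print(rows)
```

A query with k triple patterns needs k aliases of T in this layout, so the number of joins grows with the size of the query graph.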
Existing Solutions
Three categories of solutions are proposed to speed up query processing:
1. Property Table: Jena [K. Wilkinson et al. SWDB 03], …
2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], …
3. Exhaustive Indexing: RDF-3X [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …
Existing Solutions-Property Table
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15".
}
SQL:
Select People.hasName
From People
Where People.BornOnDate = "1809-02-12" and People.DiedOnDate = "1865-04-15"
Reducing # of join steps
Existing Solutions-Vertically Partitioned Solution
Fast Merge Join
Existing Solutions- Exhaustive-Indexing
Each triple pattern in a SPARQL query can be translated into one “range query” over a sorted index.
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15".
}
Range query & Merge Join
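The "range query plus merge join" idea can be sketched with sorted Python lists standing in for the on-disk indexes. This is an illustration under assumptions, not the code of RDF-3X or Hexastore: only two of the possible orderings are built, the helper names scan and merge_join are hypothetical, and the subject identifiers are made up.

```python
from bisect import bisect_left, bisect_right

# (subject, predicate, object); subject IDs are hypothetical.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Other_Person", "hasName", "Other Person"),
    ("y:Other_Person", "BornOnDate", "1809-02-12"),
    ("y:Other_Person", "DiedOnDate", "1882-04-19"),
]

# Two sorted orderings of the same triples.
POS = sorted((p, o, s) for s, p, o in TRIPLES)  # predicate, object, subject
PSO = sorted((p, s, o) for s, p, o in TRIPLES)  # predicate, subject, object


def scan(index, prefix):
    """Range query: the contiguous run of entries that start with prefix.
    The key list is rebuilt per call only to keep the sketch short."""
    keys = [entry[:len(prefix)] for entry in index]
    return index[bisect_left(keys, prefix):bisect_right(keys, prefix)]


def merge_join(a, b):
    """Intersect two sorted subject lists in a single linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# One range query per triple pattern; each result is sorted by subject.
born = [s for _, _, s in scan(POS, ("BornOnDate", "1809-02-12"))]
died = [s for _, _, s in scan(POS, ("DiedOnDate", "1865-04-15"))]
name_rows = scan(PSO, ("hasName",))

subjects = merge_join(merge_join(born, died), [s for _, s, _ in name_rows])
name_of = {s: o for _, s, o in name_rows}
names = [name_of[s] for s in subjects]
print(names)
```

Because each range comes back already sorted on the join variable, no sorting step is needed before the merge.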
Some Limitations
1. Difficult to handle “wildcard queries”.
2. Difficult to handle updates.
Intuition of gStore
Finding Matches over a Large Graph is not a trivial task.
Preliminaries
[Figure: RDF graph; legend: Entity Vertex, Literal Vertex]
Storage Schema in gStore
Encode all neighbors of each vertex into a “bit-string”, called a signature.
Encoding Technique (1)
Encoding one adjacent edge, ( hasName, “Abraham Lincoln”):
Edge label hasName: 0010 0000 0000
n-grams of the literal: “Abr”, “bra”, “rah”, “aha”, …
  0000 0010 0000 0000
  1000 0000 0000 0000
  0000 0000 0100 0000
  0000 0000 0000 0001
  OR: 1000 0010 0100 0001
Edge signatures:
  ( BornOnDate, “1809-02-12”): 0100 0000 0000 0100 0010 0100 1000
  ( DiedOnDate, “1865-04-15”): 0000 1000 0000 0000 0010 0100 0000
  ( DiedIn, “y:Washington_D.c”): 0000 0010 0000 1000 0010 0100 0001
  OR: 0000 0010 0000 1100 0010 0100 1001
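The n-gram step of this encoding can be sketched as follows. It is a minimal illustration under assumptions: zlib.crc32 stands in for whatever hash function gStore uses, so the bit positions it produces will not match the slide, and the helper names are hypothetical. The last part ORs the four 16-bit strings listed for the n-grams of “Abraham Lincoln” and reproduces the 16-bit result given for them.

```python
import zlib

N_BITS = 16  # width of the literal part, matching the 16-bit strings above


def ngrams(text, n=3):
    """All overlapping n-character substrings of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_bit(gram, n_bits=N_BITS):
    """Map one n-gram to a bit string with a single bit set.
    crc32 is a stand-in hash chosen only because it is deterministic."""
    return 1 << (zlib.crc32(gram.encode("utf-8")) % n_bits)


def literal_signature(text, n_bits=N_BITS):
    """OR together the bit strings of all n-grams of text."""
    sig = 0
    for gram in ngrams(text):
        sig |= ngram_bit(gram, n_bits)
    return sig


def fmt(sig, n_bits=N_BITS):
    """Render a signature in groups of four bits."""
    bits = format(sig, "0{}b".format(n_bits))
    return " ".join(bits[i:i + 4] for i in range(0, n_bits, 4))


# The four per-n-gram bit strings from the slide, ORed together.
slide_parts = [
    0b0000_0010_0000_0000,
    0b1000_0000_0000_0000,
    0b0000_0000_0100_0000,
    0b0000_0000_0000_0001,
]
slide_or = 0
for part in slide_parts:
    slide_or |= part

print(ngrams("Abraham Lincoln")[:4])
print(fmt(slide_or))
print(fmt(literal_signature("Abraham Lincoln")))
```

Since a signature is a bitwise OR, every n-gram of the encoded string is guaranteed to be covered by it, while unrelated strings may collide. The signature can therefore rule candidates out but cannot confirm a match by itself.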
Encoding Technique (2)
Encoding Technique (3)
A Straightforward Solution (1)
[Figure: query vertices u1 and u2 with candidate lists L1 = {001, 004, 006} and L2 = {002, 003, 006}]
A Straightforward Solution (2)
[Figure: join of candidate lists L1 = {001, 004, 006} and L2 = {002, 003, 006}]
Large Join Space!
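The straightforward solution can be sketched as a filter followed by a join. All concrete values below are hypothetical: the 8-bit signatures and the edge set are invented so that the candidate lists come out as the L1 and L2 shown in the figure, and the containment test (v & u) == u is the usual signature check, assumed here rather than taken from gStore's code.

```python
# Hypothetical 8-bit vertex signatures for the data graph.
SIGNATURES = {
    "001": 0b1000_1000,
    "002": 0b0010_0100,
    "003": 0b0110_0000,
    "004": 0b0000_1001,
    "005": 0b0100_0010,
    "006": 0b0010_1000,
}

# Hypothetical directed edges of the data graph.
EDGES = {("001", "002"), ("004", "005"), ("006", "003")}

# Hypothetical signatures of the two query vertices; the query has edge u1 -> u2.
QUERY_SIGS = {"u1": 0b0000_1000, "u2": 0b0010_0000}


def candidates(q_sig):
    """Data vertices whose signature has every bit of q_sig set."""
    return sorted(v for v, sig in SIGNATURES.items() if (sig & q_sig) == q_sig)


L1 = candidates(QUERY_SIGS["u1"])
L2 = candidates(QUERY_SIGS["u2"])

# Join step: every pair from L1 x L2 is checked against the edge set.
pairs_checked = len(L1) * len(L2)
matches = [(v1, v2) for v1 in L1 for v2 in L2 if (v1, v2) in EDGES]

print(L1, L2)
print(pairs_checked, matches)
```

The join inspects |L1| x |L2| pairs even though only a few are connected, which is the large join space the slide points out.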
VS-tree
Pruning Technique
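[Figure: pruning example for query vertices u1 and u2 over VS*-tree level G^3 (nodes d_1^3, d_2^3, d_4^3; edge signature 10010) and G* (vertices 001, 004, 006 and 002, 003, 006)]
Reduced Join Space!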
An Example for Pruning Effect
Query:
?x1 y:hasGivenName ?x5
?x1 y:hasFamilyName ?x6
?x1 rdf:type <wordnet_scientist_110560637>
?x1 y:bornIn ?x2
?x1 y:hasAcademicAdvisor ?x4
?x2 y:locatedIn <Switzerland>
?x3 y:locatedIn <Germany>
?x4 y:bornIn ?x3

Vertex   Before Pruning   After Pruning
x1       810              810
x2       424              197
x3       66               66
x4       36187            6686
Query Algorithm-Top-Down
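A top-down search over a signature tree can be sketched as below. This is a simplified illustration and not the VS*-tree itself: it keeps only the property that a node's signature is the OR of its children's signatures, and it leaves out the edges between tree nodes that the pruning slide relies on. The Node class, the tree shape, and all signatures are assumptions made for the example.

```python
class Node:
    """Tree node; an inner node's signature is the OR of its children's."""

    def __init__(self, children=None, vertex=None, sig=0):
        self.children = children or []
        self.vertex = vertex  # set only on leaves
        self.sig = sig
        for child in self.children:
            self.sig |= child.sig


def search(node, q_sig, visited):
    """Return data vertices below node whose signature covers q_sig.
    A subtree is skipped as soon as its signature lacks a query bit."""
    visited.append(node)
    if (node.sig & q_sig) != q_sig:
        return []
    if not node.children:
        return [node.vertex]
    found = []
    for child in node.children:
        found.extend(search(child, q_sig, visited))
    return found


def leaf(vertex, sig):
    return Node(vertex=vertex, sig=sig)


# Hypothetical signatures, grouped into two inner nodes under one root.
group_a = Node([leaf("001", 0b1000_1000),
                leaf("004", 0b0000_1001),
                leaf("006", 0b0010_1000)])
group_b = Node([leaf("002", 0b0010_0100),
                leaf("003", 0b0110_0000),
                leaf("005", 0b0100_0010)])
root = Node([group_a, group_b])

visited = []
result = search(root, 0b0000_1000, visited)
print(result, len(visited))
```

In this toy tree the second group is rejected at its inner node, so its three leaves are never compared against the query signature.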
Datasets
Dataset   # Triples    Size
Yago      20 million   3.1 GB
DBLP      8 million    0.8 GB
Exact Queries
Wildcard Queries
Conclusions
• Vertex Encoding Technique;
• An Efficient index Structure: VS-tree;
• A Novel Filtering Technique.
Updates- Insertion in G*
Updates- Insertion in VS*-tree
Updates- Deletion in VS*-tree
[Figure: VS*-tree deletion example; one element is marked “To be deleted”]
Framework in gStore
A Straightforward Solution (1)
[Figure: query vertex u with signature 0000 1000; data vertex 001 is a candidate for u if u & 001 = u]