Answering pattern queries using views
Yinghui Wu, UC Santa Barbara
Wenfei Fan, University of Edinburgh
Xin Wang, Southwest Jiaotong University
Real-life graph querying is expensive
social scale: 100B (10^11)
Web scale: 1T (10^12)
brain scale: 100T (10^14)
Real-life scope: 100M (10^8)
An NSA Big Graph Experiment, P. Burkhardt et al., U.S. National Security Agency, May 2013
Querying collaborative network
[Figure: query 1 (customer, developer, project manager) and query 2 (customer, developer), shown with nodes PM 1, PM 2, customer 2, customer 3, developer 2 and developer 3]
“Detecting Coordination Problems in Collaborative Software Development Environments”, Amrit Chintan et al., Information Systems Management, 2010
[Figure: a collaborative pattern over customer, developer and project manager, and a collaborative (chat) network with nodes PM 1, PM 2, customer 1, customer 2, customer 3, …, customer n, developer 1, developer 2, developer 3, …, developer k and tester]
expensive!
Answering query using views
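[Diagram: query Q over database D gives the query result Q(D); query A over the database views V(D) gives the query result A(V)]
[Timeline of answering queries using views, years marked 1995, 1998, 2000, 2002, 2006, 2007 and 2011: relational algebra, XPath, XML, tree pattern query, regular path queries, RDF/SPARQL, and graph pattern query with (bounded) simulation (our work)]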
When?
What to choose?
How to evaluate?
Outline
Graph pattern matching using views
◦ When, what and how?
When can a query be evaluated using views?
◦ Pattern containment: an iff condition
How to evaluate?
◦ Query answering using views
What to choose?
◦ Minimum containment & minimal containment
Extension: bounded simulation
Experimental study
Conclusion
Graphs, patterns and views
[Figure: a pattern query over customer and developer, and a data graph with nodes customer 2, customer 3, developer 2 and developer 3]
pattern query (view definition)
query result (view extension)
edges                          matches
(customer, developer)          {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)          {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
• binary relation
• node match: satisfies predicates
• edge match: connects two node matches
view 1
view definition 1: [Figure: pattern over customer, developer and project manager]
view extension 1
edges                          matches
(project manager, developer)   {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)    {(PM 1, customer 2), (PM 2, customer 2)}
view 2
view definition 2: [Figure: pattern over customer and developer]
view extension 2
edges                          matches
(customer, developer)          {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)          {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Graph pattern matching using views
Given a pattern query Q and a set V of view definitions, find another query A s.t.
◦ A is equivalent to Q (A(G) = Q(G)) for all data graphs G
◦ A only refers to V and extensions V(G)
[Diagram: query Q over data graph G gives the matches Q(G); query A over the views V gives A(G)]
When can a pattern query be answered using views?
Pattern containment
[Figure: query over customer, developer and project manager; View 1 over customer, developer and project manager; View 2 over customer and developer]
View 1 extension
(project manager, developer)   {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)    {(PM 1, customer 2), (PM 2, customer 2)}
View 2 extension
(customer, developer)          {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)          {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Query result
(project manager, developer)   (PM 1, developer 2)
(project manager, customer)    (PM 1, customer 2)
(developer, customer)          (developer 2, customer 2)
(customer, developer)          (customer 2, developer 2)
Determining Pattern containment
Pattern containment: example
[Figure: View 1 (customer, developer, project manager) and View 2 (customer, developer) are matched against the query (customer, developer, project manager), which is treated as a “data graph”; λ relates the query edges to the view matches]
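The example above can be sketched in Python. This is a rough illustration, not the talk's algorithm: it assumes that Q is contained in V exactly when the view matches, computed by running each view over Q as if Q were a data graph, cover every edge of Q. The function names `simulate` and `contained`, the node ids (`pm`, `cust`, `dev`) and the dictionary layout are hypothetical; the edge directions are taken from the view extensions listed earlier.

```python
def simulate(p_nodes, p_edges, g_nodes, g_edges):
    """Graph simulation of a pattern over a graph; returns edge matches.

    p_nodes / g_nodes map node id -> label; p_edges / g_edges are sets of
    (source, target) pairs. Returns {} if some pattern node has no match.
    """
    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_nodes.items() if lab == p_lab}
           for u, p_lab in p_nodes.items()}
    changed = True
    while changed:  # refine until a fixpoint is reached
        changed = False
        for u, u2 in p_edges:
            # v stays a match of u only if it has an edge to some match of u2
            keep = {v for v in sim[u]
                    if any((v, w) in g_edges for w in sim[u2])}
            if keep != sim[u]:
                sim[u], changed = keep, True
    if any(not s for s in sim.values()):
        return {}
    return {(u, u2): {(v, w) for v, w in g_edges
                      if v in sim[u] and w in sim[u2]}
            for u, u2 in p_edges}


def contained(query, views):
    """Return (is_contained, lam); lam maps each query edge to the
    (view name, view edge) pairs whose view match covers it."""
    q_nodes, q_edges = query
    lam = {e: [] for e in q_edges}
    for name, (v_nodes, v_edges) in views.items():
        # Treat the query itself as the data graph.
        for view_edge, covered in simulate(v_nodes, v_edges,
                                           q_nodes, q_edges).items():
            for e in covered:
                lam[e].append((name, view_edge))
    return all(lam[e] for e in q_edges), lam


query = ({"pm": "project manager", "cust": "customer", "dev": "developer"},
         {("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")})
view1 = ({"pm": "project manager", "cust": "customer", "dev": "developer"},
         {("pm", "dev"), ("pm", "cust")})
view2 = ({"cust": "customer", "dev": "developer"},
         {("cust", "dev"), ("dev", "cust")})

ok, lam = contained(query, {"View 1": view1, "View 2": view2})
```

With both views, every query edge is covered, so `ok` is true; with View 1 alone, the two customer/developer edges stay uncovered and the check fails.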
How to answer pattern queries using views?
Query evaluation using views
Given Q, a set of views V and their extensions, and a mapping λ, find the query result Q(G)
Algorithm
◦ Collect edge matches for each query edge e and λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G)
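The three steps can be sketched as follows. This is a minimal sketch under assumptions, not the authors' implementation: it assumes simulation semantics in which a pair (v, v') stays a match of query edge (u, u') only while v and v' each have a surviving match for every outgoing query edge of u and u'. The function name `evaluate_with_views`, the short node ids and the key layout of `extensions` are hypothetical; the pairs are the view extensions from the running example.

```python
def evaluate_with_views(query_edges, lam, extensions):
    """Compute edge matches of the query from view extensions only."""
    # Outgoing query edges per query node (sinks get an empty list).
    out_edges = {}
    for u, u2 in query_edges:
        out_edges.setdefault(u, []).append((u, u2))
        out_edges.setdefault(u2, [])

    # Step 1: collect candidate matches for each query edge e from lam(e).
    result = {e: set().union(*(extensions[ve] for ve in lam[e]))
              for e in query_edges}

    def valid(u, v):
        # v can match u only if it is the source of a surviving match
        # for every query edge leaving u.
        return all(any(src == v for src, _ in result[e])
                   for e in out_edges[u])

    # Step 2: remove non-matches until nothing changes.
    changed = True
    while changed:
        changed = False
        for u, u2 in query_edges:
            keep = {(v, v2) for v, v2 in result[(u, u2)]
                    if valid(u, v) and valid(u2, v2)}
            if keep != result[(u, u2)]:
                result[(u, u2)] = keep
                changed = True

    # Step 3: the query has no match if any edge lost all candidates.
    if any(not m for m in result.values()):
        return {e: set() for e in query_edges}
    return result


query_edges = [("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")]
extensions = {
    ("View 1", "pm", "dev"): {("PM 1", "developer 2"), ("PM 2", "developer 3")},
    ("View 1", "pm", "cust"): {("PM 1", "customer 2"), ("PM 2", "customer 2")},
    ("View 2", "dev", "cust"): {("developer 2", "customer 2"),
                                ("developer 2", "customer 3"),
                                ("developer 3", "customer 2")},
    ("View 2", "cust", "dev"): {("customer 2", "developer 2"),
                                ("customer 3", "developer 3")},
}
lam = {
    ("pm", "dev"): [("View 1", "pm", "dev")],
    ("pm", "cust"): [("View 1", "pm", "cust")],
    ("dev", "cust"): [("View 2", "dev", "cust")],
    ("cust", "dev"): [("View 2", "cust", "dev")],
}
result = evaluate_with_views(query_edges, lam, extensions)
```

The data graph G is never touched: only `extensions` is read, which is the point of answering the query from cached views.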
Query evaluation using views
[Figure: query over customer, developer and project manager; View 1 over customer, developer and project manager; View 2 over customer and developer]
View 1 extension
(project manager, developer)   {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)    {(PM 1, customer 2), (PM 2, customer 2)}
View 2 extension
(customer, developer)          {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)          {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Query result
(project manager, developer)   {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)    {(PM 1, customer 2), (PM 2, customer 2)}
(developer, customer)          {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(customer, developer)          {(customer 2, developer 2), (customer 3, developer 3)}
“bottom-up” strategy
What should be selected?
What to choose?
[Figure: a query over customer, developer, project manager, software and tester, with candidate views view 1 to view 6, each over a subset of these nodes]
choose all?
Minimum containment
A log|Ep|-approximation
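A logarithmic bound of this kind is what the standard greedy set-cover heuristic gives, so a minimal sketch of that heuristic is shown here. It is an assumption that the talk's approximation follows the same greedy idea; the function name `greedy_minimum`, the edge names `e1` to `e5` and the cover sets are hypothetical. Each view is represented only by the set of query edges its view matches cover.

```python
def greedy_minimum(query_edges, covers):
    """Greedy set cover: repeatedly pick the view that covers the most
    still-uncovered query edges. Returns None if the views cannot
    cover the query (the query is not contained in the views)."""
    uncovered = set(query_edges)
    chosen = []
    while uncovered:
        best = max(covers, key=lambda name: len(covers[name] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            return None  # no view makes progress
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Hypothetical cover sets: which query edges each view's matches cover.
query_edges = {"e1", "e2", "e3", "e4", "e5"}
covers = {
    "view 1": {"e1", "e2"},
    "view 2": {"e2", "e3"},
    "view 3": {"e1", "e2", "e3", "e4"},
    "view 4": {"e5"},
    "view 5": {"e4"},
}
selected = greedy_minimum(query_edges, covers)
```

On this toy input the heuristic picks view 3 first (four edges) and then view 4 for the remaining edge, so two views suffice instead of all five.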
Minimum containment
[Figure: the query over customer, developer, project manager, software and tester, with views view 1 to view 6; label: Ec]
Minimal containment
[Figure: the query over customer, developer, project manager, software and tester, with views view 1 to view 6]
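For contrast with minimum containment, here is a hedged sketch of one way to obtain a minimal set of views, meaning a set that covers the query and from which no view can be dropped. It assumes a simple two-pass scheme (add views that contribute new edges, then remove redundant ones), which may differ from the talk's procedure. The name `minimal_views`, the edge names and the cover sets are hypothetical.

```python
def minimal_views(query_edges, covers):
    """Return a minimal (not necessarily minimum) list of views whose
    covered edges include every query edge, or None if none exists."""
    needed = set(query_edges)
    chosen, covered = [], set()
    # Pass 1: keep a view only if it contributes a not-yet-covered edge.
    for name, edges in covers.items():
        if (edges & needed) - covered:
            chosen.append(name)
            covered |= edges & needed
        if covered == needed:
            break
    if covered != needed:
        return None
    # Pass 2: drop any view whose edges the remaining views already cover.
    for name in list(chosen):
        rest = set().union(*(covers[n] for n in chosen if n != name))
        if needed <= rest:
            chosen.remove(name)
    return chosen


# Hypothetical cover sets: which query edges each view's matches cover.
query_edges = {"e1", "e2", "e3", "e4"}
covers = {
    "view 1": {"e1", "e2"},
    "view 2": {"e3", "e4"},
    "view 3": {"e1", "e2", "e3", "e4"},
}
selected = minimal_views(query_edges, covers)
```

Here the result is view 1 plus view 2: neither can be removed, so the set is minimal, yet view 3 alone would be a smaller (minimum) answer. Each pass is a single scan over the views, which fits the PTIME bound stated for minimal containment.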
Bounded pattern matching using views
Bounded pattern queries
Answering bounded pattern queries
◦ Idea: “reduce” bounded pattern queries to weighted pattern queries
◦ View matches: weighted edge to weighted paths
◦ Complexity and algorithms carry over to bounded queries
[Figure: a collaborative pattern over customer, developer and project manager (edge bounds 2 and 2); a collaborative (chat) network with PM, customer 1, customer 2, developer 1, developer 2 and tester; View 1 over customer, developer and project manager and View 2 over customer and developer (edge bounds 2, 3 and 2)]
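To make the bounded case concrete, the sketch below computes candidate matches for a single bounded pattern edge: pairs of nodes joined by a nonempty path of at most k edges, each tagged with its path length. It assumes this reading of a bound k on a pattern edge; the function names, the toy chat network and its edges are hypothetical and are not read off the figure.

```python
from collections import deque


def within(adj, source, bound):
    """Map each node reachable from `source` by a nonempty path of at
    most `bound` edges to its shortest such path length (BFS)."""
    dist = {}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        if d == bound:
            continue  # do not expand beyond the bound
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = d + 1
                queue.append((nxt, d + 1))
    return dist


def bounded_edge_matches(adj, labels, src_label, dst_label, bound):
    """Weighted matches (v, w, length) for a pattern edge
    src_label -> dst_label carrying the given bound."""
    return {(v, w, d)
            for v, lab in labels.items() if lab == src_label
            for w, d in within(adj, v, bound).items()
            if labels[w] == dst_label}


# Hypothetical chat network: customer 1 reaches developer 1 via a tester.
adj = {"customer 1": ["tester", "developer 2"], "tester": ["developer 1"]}
labels = {"customer 1": "customer", "tester": "tester",
          "developer 1": "developer", "developer 2": "developer"}

two_hops = bounded_edge_matches(adj, labels, "customer", "developer", 2)
one_hop = bounded_edge_matches(adj, labels, "customer", "developer", 1)
```

Keeping the path length with each pair is one way to treat a view match as a weighted edge, so that matches from views with different bounds can still be compared against the bound of a query edge.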
Putting everything together
Problem                 Complexity       Algorithm
Simulation
  containment           PTIME            O(card(V)|Q|^2 + |V|^2 + |Q||V|)
  minimum containment   NP-c/APX-hard    log|Ep|-approximable, O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2))
  minimal containment   PTIME            O(card(V)|Q|^2 + |V|^2 + |Q||V|)
  evaluation            PTIME            O(|Q||V(G)| + |V(G)|^2)
Bounded simulation
  containment           PTIME            O(|Q|^2|V|)
  minimum containment   NP-c/APX-hard    log|Ep|-approximable, O(|Q|^2|V| + |Q|card(V)^(3/2))
  minimal containment   PTIME            O(|Q|^2|V|)
  evaluation            PTIME            O(|Q||V(G)| + |V(G)|^2)
Classes        Relational                              XML                       graph/RDF
language       Conjunctive query, Relational algebra   XPath (XQuery)            RPQs, ECRPQs, (P)SPARQL, (bounded) pattern query
containment    NP-c, undecidable                       coNP-c - undecidable      undecidable, undecidable, EXPTIME, PTIME
Experimental study
Efficiency: pattern queries
YouTube views: “Music”, < 7 days; “Comedy”, views > 10k; “Sports”, rate > 4
2.2 times and 1.75 times faster
greater improvement over denser graphs
|E| = |V|^a
Efficiency: bounded pattern queries
Amazon views: “Books”, rating > 4; “Music CD”, sales rank > 5000; “DVD”, reviews > 1000
10 times and 7.1 times faster
greater improvement over larger graphs
Minimum vs. Minimal
Minimum takes slightly more time to find substantially smaller sets of views
Conclusion
Pattern containment is tractable for (bounded) pattern queries
Query evaluation using views is much more efficient for large graphs than “batch” counterparts
The journey has just started…
◦ More features to select good views to cache?
◦ What to do when a query is not contained in existing views?
◦ View-based subgraph queries?
Thank you!