Answering pattern queries using views Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang


Page 1: Answering pattern queries using views

Yinghui Wu, UC Santa Barbara
Wenfei Fan, University of Edinburgh
Xin Wang, Southwest Jiaotong University

Page 2: Real-life graph querying is expensive

social scale: 100B (10^11)
Web scale: 1T (10^12)
brain scale: 100T (10^14)

Real-life scope
100M (10^8)

“An NSA Big Graph Experiment”, P. Burkhardt et al., U.S. National Security Agency, May 2013

Page 3: Querying collaborative network

[Figure: a collaborative pattern over customer, developer and project manager nodes; a collaborative (chat) network with nodes PM 1, PM 2, customer 1 to customer n, developer 1 to developer k, and tester; query 1 and query 2 over customer and developer nodes, shown with customer 2, customer 3, developer 2 and developer 3. Annotation: expensive!]

“Detecting Coordination Problems in Collaborative Software Development Environments”, Amrit Chintan et al., Information Systems Management, 2010

Page 4: Answering query using views

[Diagram: query Q over database D gives query result Q(D); query A over database views V(D) gives query result A(V).]

[Timeline with year marks 1995, 1998, 2000, 2002, 2006, 2007 and 2011, covering: relational algebra, regular path queries, XPath, tree pattern query, XML, RDF/SPARQL, and graph pattern query with (bounded) simulation (our work).]

When?

What to choose?

How to evaluate?

Page 5: Outline

Graph pattern matching using views
◦ When, what and how?

When can a query be evaluated using views?
◦ Pattern containment: an iff condition

How to evaluate?
◦ query answering using views

What to choose?
◦ minimum containment & minimal containment

Extension: bounded simulation

Experimental Study

Conclusion

Page 6: Graphs, patterns and views

Pattern query (view definition): customer, developer
Data graph nodes shown: customer 2, customer 3, developer 2, developer 3

Query result (view extension)
edges                    matches
(customer, developer)    {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)    {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}

• binary relation
• node match: satisfies predicates
• edge match: connects two node matches

View 1 (view definition 1): customer, developer, project manager
View extension 1
edges                           matches
(project manager, developer)    {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)     {(PM 1, customer 2), (PM 2, customer 2)}

View 2 (view definition 2): customer, developer
View extension 2
edges                    matches
(customer, developer)    {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)    {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
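The node-match and edge-match bullets above can be made concrete with a small sketch of graph simulation, which is the matching semantics the talk names for its pattern queries. Everything in the example data is an assumption: the edge list is reconstructed from the view extension shown on this slide, "customer 1" is a hypothetical extra node with no outgoing edge, and the function name `simulate` is illustrative rather than taken from the authors' implementation.

```python
from collections import defaultdict

def simulate(pattern_nodes, pattern_edges, graph_labels, graph_edges):
    """Return {pattern edge: set of matching data edges} under graph simulation.

    pattern_nodes: {pattern node: required label} (the node predicate)
    pattern_edges: list of (u, u2) pattern edges
    graph_labels:  {data node: label}
    graph_edges:   set of (v, v2) data edges
    """
    succ = defaultdict(set)
    for v, v2 in graph_edges:
        succ[v].add(v2)

    # Node match candidates: data nodes that satisfy the label predicate.
    sim = {u: {v for v, lab in graph_labels.items() if lab == label}
           for u, label in pattern_nodes.items()}

    # Refine until stable: v stays a match of u only if every pattern edge
    # (u, u2) is mirrored by a data edge (v, v2) with v2 a match of u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in pattern_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return {}  # some pattern node has no match, so the result is empty

    # Edge match: a data edge that connects two node matches.
    return {(u, u2): {(v, v2) for v in sim[u] for v2 in succ[v] & sim[u2]}
            for u, u2 in pattern_edges}

# Assumed data: edges reconstructed from the view extension on this slide.
pattern_nodes = {"customer": "customer", "developer": "developer"}
pattern_edges = [("customer", "developer"), ("developer", "customer")]
graph_labels = {"customer 1": "customer", "customer 2": "customer",
                "customer 3": "customer", "developer 2": "developer",
                "developer 3": "developer"}
graph_edges = {("customer 2", "developer 2"), ("customer 3", "developer 3"),
               ("developer 2", "customer 2"), ("developer 2", "customer 3"),
               ("developer 3", "customer 2")}
result = simulate(pattern_nodes, pattern_edges, graph_labels, graph_edges)
```

With this assumed graph, "customer 1" is pruned because it has no edge to a developer, and the remaining edge matches coincide with the view extension listed on the slide.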

Page 7: Graph pattern matching using views

Given a pattern query Q and a set V of view definitions, find another query A s.t.
◦ A is equivalent to Q (A(G) = Q(G)) for all data graphs G
◦ A only refers to V and extensions V(G)

[Diagram: query Q over data graph G gives matches Q(G); query A over views V gives A(G).]

Page 8: When can a pattern query be answered using views?

Page 9: Pattern containment

Query: customer, developer, project manager
View 1: customer, developer, project manager
View 2: customer, developer

View extensions
edges                           matches
(customer, developer)           {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)           {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(project manager, developer)    {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)     {(PM 1, customer 2), (PM 2, customer 2)}

Query result
edges                           matches
(project manager, developer)    (PM 1, developer 2)
(project manager, customer)     (PM 1, customer 2)
(developer, customer)           (developer 2, customer 2)
(customer, developer)           (customer 2, developer 2)

Page 10: Determining pattern containment

Page 11: Pattern containment: example

[Figure: View 1 (customer, developer, project manager) and View 2 (customer, developer) are matched against the query (customer, developer, project manager), which is treated as a “data graph”. The figure marks the view matches on the query nodes customer, project manager and developer, and the mapping λ.]
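The example suggests a containment test that matches each view against the query itself and checks whether the view matches cover every query edge. The sketch below follows that reading under stated assumptions: matching is plain graph simulation, the edge directions are taken from the edge lists in the view extensions, and the names `view_matches` and `containment_mapping` are hypothetical.

```python
def view_matches(view_nodes, view_edges, q_labels, q_edges):
    """Query edges matched by one view, with the query used as the data graph."""
    sim = {x: {u for u, lab in q_labels.items() if lab == label}
           for x, label in view_nodes.items()}
    changed = True
    while changed:
        changed = False
        for x, x2 in view_edges:
            # u stays a match of x only if some query edge (u, u2) mirrors (x, x2).
            keep = {u for u in sim[x]
                    if any((u, u2) in q_edges for u2 in sim[x2])}
            if keep != sim[x]:
                sim[x], changed = keep, True
    if any(not s for s in sim.values()):
        return {}
    return {(x, x2): {(u, u2) for (u, u2) in q_edges
                      if u in sim[x] and u2 in sim[x2]}
            for x, x2 in view_edges}

def containment_mapping(q_labels, q_edges, views):
    """Return a mapping {query edge: [(view name, view edge), ...]}, or None
    when some query edge is not covered by any view match."""
    lam = {e: [] for e in q_edges}
    for name, (v_nodes, v_edges) in views.items():
        matched = view_matches(v_nodes, v_edges, q_labels, q_edges)
        for v_edge, q_hits in matched.items():
            for e in q_hits:
                lam[e].append((name, v_edge))
    return lam if all(lam.values()) else None

# Query and views of the running example (node ids are illustrative).
q_labels = {"pm": "project manager", "cust": "customer", "dev": "developer"}
q_edges = {("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")}
views = {
    "View 1": ({"pm": "project manager", "cust": "customer", "dev": "developer"},
               [("pm", "dev"), ("pm", "cust")]),
    "View 2": ({"cust": "customer", "dev": "developer"},
               [("cust", "dev"), ("dev", "cust")]),
}
lam = containment_mapping(q_labels, q_edges, views)
only_view_1 = containment_mapping(q_labels, q_edges, {"View 1": views["View 1"]})
```

Here the two views together cover all four query edges, so a mapping is returned; with View 1 alone the customer and developer edges stay uncovered and the function returns `None`.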

Page 12: How to answer pattern queries using views?

Page 13: Query evaluation using views

Given Q, a set of views V and their extensions, and a mapping λ, find the query result Q(G)

Algorithm
◦ Collect edge matches for each query edge e and λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G)
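The three steps above can be sketched as a fixpoint over edge-match sets. This is a minimal illustration, not the authors' algorithm: it assumes simulation semantics, assumes node predicates are already enforced by the views, and uses a small hypothetical extension (PM 1, PM 2, developer 1, developer 2, customer 1) chosen so that one candidate match is pruned.

```python
from collections import defaultdict

def evaluate_using_views(query_edges, lam, view_ext):
    """query_edges: list of (u, u2) pattern edges
    lam:      {query edge: list of view-edge keys it is mapped to}
    view_ext: {view-edge key: set of (v, v2) data-edge matches}
    """
    # Step 1: collect candidate matches of each query edge e from lam[e].
    S = {e: set().union(*(view_ext[k] for k in lam[e])) for e in query_edges}

    out_edges = defaultdict(list)
    for e in query_edges:
        out_edges[e[0]].append(e)

    def is_node_match(u, v):
        # v can match pattern node u only if it still has a match
        # for every pattern edge leaving u.
        return all(any(src == v for src, _ in S[e]) for e in out_edges[u])

    # Step 2: remove non-matches until nothing changes.
    changed = True
    while changed:
        changed = False
        for (u, u2) in query_edges:
            for (v, v2) in list(S[(u, u2)]):
                if not (is_node_match(u, v) and is_node_match(u2, v2)):
                    S[(u, u2)].discard((v, v2))
                    changed = True

    # Step 3: an empty match set for any query edge means an empty result.
    return S if all(S.values()) else {}

# Hypothetical query, mapping and view extensions.
query_edges = [("pm", "dev"), ("dev", "cust"), ("cust", "dev")]
lam = {("pm", "dev"): [("View 1", ("pm", "dev"))],
       ("dev", "cust"): [("View 2", ("dev", "cust"))],
       ("cust", "dev"): [("View 2", ("cust", "dev"))]}
view_ext = {("View 1", ("pm", "dev")): {("PM 1", "developer 1"),
                                        ("PM 2", "developer 2")},
            ("View 2", ("dev", "cust")): {("developer 1", "customer 1")},
            ("View 2", ("cust", "dev")): {("customer 1", "developer 1")}}
answer = evaluate_using_views(query_edges, lam, view_ext)
```

In this made-up input, (PM 2, developer 2) is dropped because developer 2 has no match for the developer-to-customer edge. The data graph itself is never accessed, only the view extensions.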

Page 14: Query evaluation using views

Query: customer, developer, project manager
View 1: customer, developer, project manager
View 2: customer, developer

View extensions
edges                           matches
(customer, developer)           {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer)           {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(project manager, developer)    {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)     {(PM 1, customer 2), (PM 2, customer 2)}

Query result
edges                           matches
(project manager, developer)    {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer)     {(PM 1, customer 2), (PM 2, customer 2)}
(developer, customer)           {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(customer, developer)           {(customer 2, developer 2), (customer 3, developer 3)}

“bottom-up” strategy

Page 15: What should be selected?

Page 16: What to choose?

[Figure: a query and six candidate views (view 1 to view 6), each a pattern over some of the node types customer, developer, project manager, software and tester.]

Choose all?

Page 17: Minimum containment

Page 18: A log|Ep|-approximation
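One way to picture a logarithmic approximation for minimum containment is the classic greedy set-cover heuristic: repeatedly pick the view that covers the most still-uncovered query edges. The slide does not spell out the authors' procedure, so the sketch below is only that textbook heuristic, assuming the query edges covered by each view have been precomputed. The edge names e1 to e5 and views A to D are hypothetical.

```python
def greedy_minimum_views(query_edges, covers):
    """covers: {view name: set of query edges the view matches}.
    Returns a list of chosen views, or None if the views cannot cover the query."""
    uncovered = set(query_edges)
    chosen = []
    while uncovered:
        # Pick the view that covers the most edges not yet covered.
        best = max(covers, key=lambda name: len(covers[name] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            return None  # no view helps, so the query is not contained
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

# Hypothetical coverage information.
covers = {"A": {"e1", "e2", "e3"}, "B": {"e3", "e4"},
          "C": {"e4", "e5"}, "D": {"e1"}}
picked = greedy_minimum_views({"e1", "e2", "e3", "e4", "e5"}, covers)
```

On this input the heuristic first takes A (three new edges) and then C (two new edges), so two of the four views suffice.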

Page 19: Minimum containment

[Figure: the query and views 1 to 6 over the node types customer, developer, project manager, software and tester; the figure carries the label Ec.]

Page 20: Minimal containment

Page 21: Minimal containment

[Figure: the query and views 1 to 6 over the node types customer, developer, project manager, software and tester.]
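A minimal set of views still contains the query, but stops doing so as soon as any one view is dropped. It does not have to be the smallest such set. The sketch below is one simple way to reach a minimal set, assuming the query edges covered by each view are precomputed. It is not claimed to be the authors' procedure, and views V1 to V4 and edges e1 to e4 are hypothetical.

```python
def minimal_views(query_edges, covers):
    """covers: {view name: set of query edges the view matches}.
    Returns a list of views from which no view can be dropped, or None."""
    needed = set(query_edges)
    chosen, covered = [], set()

    # Pass 1: keep every view that contributes at least one new query edge.
    for name, edges in covers.items():
        if (edges & needed) - covered:
            chosen.append(name)
            covered |= edges & needed
        if covered >= needed:
            break
    if not covered >= needed:
        return None  # the views do not contain the query

    # Pass 2: drop any view whose edges are all covered by the others.
    for name in list(chosen):
        rest = set().union(*(covers[n] for n in chosen if n != name))
        if rest >= needed:
            chosen.remove(name)
    return chosen

# Hypothetical coverage information.
covers = {"V1": {"e1", "e2"}, "V2": {"e2", "e3"},
          "V3": {"e3", "e4"}, "V4": {"e1", "e2", "e3", "e4"}}
kept = minimal_views({"e1", "e2", "e3", "e4"}, covers)
```

On this input the result is V1 and V3: neither can be removed, yet V4 alone would be a smaller (minimum) choice, which illustrates the difference between the two notions.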

Page 22: Bounded pattern matching using views

Bounded pattern queries

Answering bounded pattern queries
◦ Idea: “reduce” bounded pattern queries to weighted pattern queries
◦ View matches: weighted edge to weighted paths
◦ Complexity and algorithms carry over to bounded queries

[Figure: a collaborative pattern over customer, developer and project manager, annotated with the numbers 2 and 2; a collaborative (chat) network with PM, customer 1, customer 2, developer 1, developer 2 and tester; View 1 (customer, developer, project manager) and View 2 (customer, developer), annotated with the numbers 2, 3 and 2.]
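In bounded simulation, as usually defined, a pattern edge with bound k is matched by a nonempty path of at most k hops rather than by a single edge. The sketch below only illustrates that path-length test and records the shortest hop count as a weight. It does not reproduce the reduction to weighted pattern queries mentioned above, and the small graph and function names are hypothetical.

```python
def within_hops(succ, source, bound):
    """Return {node: fewest hops} for nodes reachable from source
    by a path of 1..bound edges. succ maps a node to its successors."""
    dist = {}
    frontier = {source}
    for d in range(1, bound + 1):
        # Nodes reachable by a walk of exactly d edges.
        frontier = {w for v in frontier for w in succ.get(v, ())}
        for w in frontier:
            dist.setdefault(w, d)  # keep the smallest hop count seen
    return dist

def bounded_edge_matches(succ, sources, targets, bound):
    """Candidate matches of a bounded pattern edge: pairs (v, w) with a
    path of at most `bound` hops from v to w, weighted by its length."""
    return {(v, w): d
            for v in sources
            for w, d in within_hops(succ, v, bound).items()
            if w in targets}

# Hypothetical chat network: PM -> tester -> developer 1, PM -> customer 1.
succ = {"PM": {"tester", "customer 1"}, "tester": {"developer 1"}}
two_hops = within_hops(succ, "PM", 2)
pm_to_dev = bounded_edge_matches(succ, {"PM"}, {"developer 1"}, 2)
```

With a bound of 2, PM reaches developer 1 through the tester, so the pair is a candidate match with weight 2. With a bound of 1 it would not be.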

Page 23: Putting everything together

Problem                  Complexity                              Algorithm

Simulation
  containment            PTIME                                   O(card(V)|Q|^2 + |V|^2 + |Q||V|)
  minimum containment    NP-c/APX-hard, log|Ep|-approximable     O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2))
  minimal containment    PTIME                                   O(card(V)|Q|^2 + |V|^2 + |Q||V|)
  evaluation             PTIME                                   O(|Q||V(G)| + |V(G)|^2)

Bounded simulation
  containment            PTIME                                   O(|Q|^2|V|)
  minimum containment    NP-c/APX-hard, log|Ep|-approximable     O(|Q|^2|V| + |Q|card(V)^(3/2))
  minimal containment    PTIME                                   O(|Q|^2|V|)
  evaluation             PTIME                                   O(|Q||V(G)| + |V(G)|^2)

Classes: Relational, XML, graph/RDF

language: Conjunctive query, Relational algebra, XPath (XQuery), RPQs, ECRPQs, (P)SPARQL, (bounded) pattern query

containment: NP-c, undecidable, coNP-c, -, undecidable, undecidable, undecidable, EXPTIME, PTIME

Page 24: Experimental study

Page 25: Efficiency: pattern queries

YouTube views
◦ “Music”; < 7 days
◦ “Comedy”; views > 10k
◦ “Sports”; rate > 4

2.2 times and 1.75 times faster

greater improvement over denser graphs (|E| = |V|^α)

Page 26: Efficiency: bounded pattern queries

Amazon views
◦ “Books”; rating > 4
◦ “Music CD”; sales rank > 5000
◦ “DVD”; reviews > 1000

10 times and 7.1 times faster

greater improvement over larger graphs

Page 27: Minimum vs. Minimal

Minimum takes slightly more time to find substantially smaller sets of views

Page 28: Conclusion

Pattern containment is tractable for (bounded) pattern queries

Query evaluation using views is much more efficient for large graphs than “batch” counterparts

The journey has just started…
◦ More features to select good views to cache?
◦ What if a query is not contained in the existing views?
◦ View-based subgraph queries?

Page 29: Thank you!