answering queries across mappings


UNIVERSITY of PENNSYLVANIA Grigoris Karvounarakis June 05

Answering queries across mappings

Grigoris Karvounarakis

University of Pennsylvania

WPE-II Presentation

Data integration

[Figure: heterogeneous data sources S1, S2, …, Sn, with instances I1, I2, …, In, are related through mappings M1, M2, …, Mn to a virtual global mediated schema T, over which the query Q is posed.]

Data exchange

[Figure: a source instance I over schema S is related through mapping M to a target instance J over schema T.]

J is a data exchange solution if ⟨I,J⟩ ⊨ M and J ⊨ T.

Query answering (basic problem setting)

[Figure: source schema S with instance I, mapping M, target schema T; the query Q is posed over the target.]

Given source and target schemas (S, T), a mapping M, source instance(s) I and a query Q_T over the target, evaluate Q_T using data from I.

Query reformulation: compute a reformulation Q' of Q_T that refers only to source relations.

Data exchange: compute a data exchange solution J such that Q_T can be evaluated directly on J.

Outline

Preliminaries
  Mapping languages
  Semantics of query answering
Query reformulation
Query answering using data exchange
Comparison

Mapping languages

Two approaches:
  Containment between conjunctive queries
  Dependencies (logical assertions)

Query containment

Definition: A query Q1 is contained in a query Q2, denoted by Q1 ⊑ Q2, if for all database instances I: Q1(I) ⊆ Q2(I).

Two queries Q1 and Q2 are equivalent if Q1 ⊑ Q2 and Q2 ⊑ Q1.

In the case where Q1 and Q2 are over different schemas, related through a mapping M:

M ⊨ Q1 ⊑ Q2 if, for all I, J such that ⟨I,J⟩ ⊨ M: Q1(I) ⊆ Q2(J)
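For conjunctive queries over a single schema, containment can be decided with the classical canonical-database test (Chandra and Merlin), which the slide does not spell out: freeze the variables of Q1 into constants, evaluate Q2 over the frozen body, and check that the frozen head of Q1 is returned. The sketch below is only an illustration; the tuple encoding of queries, the `?` prefix for variables and the names `evaluate` and `contained_in` are assumptions, and it does not cover containment across a mapping M.

```python
def evaluate(head, body, facts):
    """Evaluate a conjunctive query over a set of facts.

    Atoms and facts are (relation, args) pairs; terms starting with '?'
    are variables, everything else is a constant.
    """
    answers = set()

    def extend(atoms, binding):
        if not atoms:
            answers.add(tuple(binding.get(t, t) for t in head))
            return
        (rel, args), rest = atoms[0], atoms[1:]
        for fact_rel, fact_args in facts:
            if fact_rel != rel or len(fact_args) != len(args):
                continue
            new = dict(binding)
            # Variables must bind consistently; constants must match exactly.
            if all(new.setdefault(a, v) == v if a.startswith("?") else a == v
                   for a, v in zip(args, fact_args)):
                extend(rest, new)

    extend(list(body), {})
    return answers


def contained_in(q1, q2):
    """True iff q1 is contained in q2 (canonical-database test)."""
    (head1, body1), (head2, body2) = q1, q2
    freeze = lambda t: "frozen_" + t[1:] if t.startswith("?") else t
    canonical_db = {(rel, tuple(freeze(a) for a in args)) for rel, args in body1}
    return tuple(freeze(t) for t in head1) in evaluate(head2, body2, canonical_db)


# Q1(x) :- T(x,y), T(y,x)      Q2(x) :- T(x,y)
q1 = (("?x",), [("T", ("?x", "?y")), ("T", ("?y", "?x"))])
q2 = (("?x",), [("T", ("?x", "?y"))])
print(contained_in(q1, q2), contained_in(q2, q1))
```

Here Q1 is contained in Q2 but not the other way around, since Q1 additionally requires the reverse edge.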

Containment mappings

General form (GLAV):
  Q_S(x,y) ⊑ Q_T(x,z) (sound, Open World Assumption)
  Q_S(x,y) ≡ Q_T(x,z) (exact, Closed World Assumption)
  Q_S, Q_T are conjunctions of relational atoms over S, T respectively.

Special cases:

GAV (global-as-view): the target is specified as a view of the source(s)
  Q_S(x,y) ⊑ T(x) (sound, OWA)
  Q_S(x,y) ≡ T(x) (exact, CWA)

LAV (local-as-view): sources are specified as views of the virtual mediated schema
  S(x) ⊑ Q_T(x,y) (sound, OWA)
  S(x) ≡ Q_T(x,y) (exact, CWA)

Dependencies

Tuple-generating dependencies (tgds): ∀x,z φ(x,z) → ∃y ψ(x,y)
(where φ, ψ are conjunctions of relational atoms and x, y, z are vectors of variables)

Equality-generating dependencies (egds): ∀x φ(x) → x_i = x_j

Data exchange schema mappings

Source-to-target tgds: ∀x,z φ(x,z) → ∃y ψ(x,y)
  φ is a conjunction of atoms over S and ψ is a conjunction of atoms over T

Target tgds
  Both φ, ψ are conjunctions of atoms over T

Target egds: ∀x φ(x) → x_i = x_j
  φ is a conjunction of atoms over T

Containment mappings vs. source-to-target tgds

A source-to-target tgd of the form ∀x,z Q_S(x,z) → ∃y Q_T(x,y) is equivalent to the sound GLAV mapping Q_S(x,z) ⊑ Q_T(x,y).

Sound GAV and LAV mappings can also be expressed by source-to-target tgds.

But exact mappings also include a target-to-source direction. E.g., S(x,z) ≡ T1(x,y), T2(y,z) is equivalent to:
  ∀x,z S(x,z) → ∃y T1(x,y) ∧ T2(y,z) (source-to-target) and
  ∀x,y,z T1(x,y) ∧ T2(y,z) → S(x,z) (target-to-source)

Incompleteness

Mappings do not specify the target instance completely. E.g., ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z) does not specify the values of y.

[Figure: a source instance I over S is related through mapping M to many possible target instances J1, J2, J3, … over T.]

E.g., if I = {S(a,b)}:
  J1 = {T(a,a), T(a,b)}
  J2 = {T(a,b), T(b,b)}
  J3 = {T(a,X), T(X,b)}
  J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)}
  . . .

Semantics of query answering

What do we expect as answers to queries over the target schema?

“Possible worlds” semantics: for every instance I of S, consider all possible instances J of the target schema T such that ⟨I,J⟩ ⊨ M.

Convention: certain answers
  certain_{M,I}(Q_T) = ⋂_{J: ⟨I,J⟩ ⊨ M} Q_T(J)
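The intersection over all possible worlds cannot be enumerated in general, because there are infinitely many target instances J. As a rough illustration only, the sketch below intersects the answers over a finite sample of solutions, which yields an upper bound on the certain answers. The sample reuses J1, J2 and J3 from the incompleteness example (I = {S(a,b)}); the query Q(x,z) :- T(x,y), T(y,z) and the names `q` and `certain_upper_bound` are assumptions chosen for the example.

```python
def q(J):
    """Q(x,z) :- T(x,y), T(y,z), evaluated over a set of T-tuples."""
    return {(x, z) for (x, y1) in J for (y2, z) in J if y1 == y2}


def certain_upper_bound(query, solutions):
    """Intersect the answers over a finite sample of possible worlds.

    The true certain answers intersect over *all* solutions, so this
    result can only be a superset of them.
    """
    return set.intersection(*(query(J) for J in solutions))


solutions = [
    {("a", "a"), ("a", "b")},   # J1
    {("a", "b"), ("b", "b")},   # J2
    {("a", "X"), ("X", "b")},   # J3, with X a labeled null
]
print(certain_upper_bound(q, solutions))
```

Tuples such as (a,a) or (b,b) appear in the answer over a single solution but drop out of the intersection; only (a,b) survives in this sample.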


Equivalent reformulation

Definition: Q'_S is an equivalent reformulation of Q_T across M (denoted M ⊨ Q_T ≡ Q'_S) if, for every pair of instances I, J of S, T such that ⟨I,J⟩ ⊨ M: Q'_S(I) = Q_T(J)

Equivalent reformulations may not exist

… even if the mapping is exact

Example: schemas S(c) and T(a,b), mapping ∀x S(x) ↔ T(x,x), query Q(x) :- T(x,y)

Any reformulation over S can only return values v such that T(v,v).

But there are instances J such that T contains tuples in which a ≠ b.
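The counterexample can be checked mechanically. The sketch below, with made-up values u, p, q, builds two target instances that both satisfy the exact mapping (S(x) holds iff T(x,x) holds) for the same source instance, yet give different answers to Q(x) :- T(x,y). Any query over S sees the same I in both cases, so it cannot equal Q on both. The function names are hypothetical.

```python
def satisfies_mapping(I, J):
    """<I,J> satisfies the exact mapping: S(x) holds iff T(x,x) holds."""
    return set(I) == {x for (x, y) in J if x == y}


def q(J):
    """Q(x) :- T(x,y)"""
    return {x for (x, _y) in J}


I = {"u"}                            # source instance: S = {u}
J_a = {("u", "u")}                   # T holds only the diagonal tuple
J_b = {("u", "u"), ("p", "q")}       # T also holds (p,q) with p != q

print(satisfies_mapping(I, J_a), satisfies_mapping(I, J_b))
print(sorted(q(J_a)), sorted(q(J_b)))
```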

Contained reformulation

Definition: Q'_S is a contained reformulation of Q_T across M (denoted M ⊨ Q'_S ⊑ Q_T) if, for every pair of instances I, J of S, T such that ⟨I,J⟩ ⊨ M: Q'_S(I) ⊆ Q_T(J)

Maximally-contained reformulation

Definition: Q_S^max is a maximally-contained reformulation of Q_T across M if M ⊨ Q_S^max ⊑ Q_T and Q'_S ⊑ Q_S^max for every Q'_S such that M ⊨ Q'_S ⊑ Q_T.

The union of all contained reformulations is a maximally-contained reformulation:
  Q_S^max ≡ reform_M(Q_T) ≡ ⋃_{Q'_S: M ⊨ Q'_S ⊑ Q_T} Q'_S

Maximally-contained reformulations compute certain answers

Proposition ([AD98],[FKMP03],[T05]): Let certain_M(Q) = λI. certain_{M,I}(Q). Then:
  certain_M(Q) ≡ reform_M(Q)
  (i.e., ∀I: reform_M(Q)(I) = certain_{M,I}(Q))

Note that the above holds for any mapping (i.e., not necessarily conjunctive).

Reformulation algorithms (GAV)

Sound/exact GAV mappings: e.g. Q_S(x,y) ⊑ T(x)

Reformulation:
  For every relation T_i(x) of the target schema, let r_i be the set of rules with T_i in their head (possibly more than one).
  Let Q_{T_i}(x) be the union of the conjunctive queries in the bodies of the rules in r_i.
  Substitute the T_i(x) atoms in Q by Q_{T_i}(x).
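The substitution step is usually called view unfolding. A minimal sketch, under the assumptions that every term is a variable, that rule heads use distinct variables, and that the fresh-name scheme (`z_0`, `z_1`, …) does not clash with query variables; the encoding and the example relations S1, S2, S3 are hypothetical:

```python
from itertools import product


def unfold(query_body, gav_rules):
    """Unfold target atoms into source atoms; returns a union of CQ bodies.

    query_body: list of (relation, args) atoms over the target schema.
    gav_rules:  list of ((target_relation, head_vars), source_body) pairs.
    """
    # For each query atom, the rules whose head defines its relation.
    choices = [[rule for rule in gav_rules if rule[0][0] == rel]
               for rel, _ in query_body]
    union = []
    for combo in product(*choices):            # pick one rule per query atom
        body = []
        for i, ((_, args), (head, rule_body)) in enumerate(zip(query_body, combo)):
            subst = dict(zip(head[1], args))   # head variables -> query terms
            # Variables that occur only in the rule body get a fresh name per atom.
            body += [(rel, tuple(subst.get(v, f"{v}_{i}") for v in vs))
                     for rel, vs in rule_body]
        union.append(body)
    return union


# Hypothetical rules: T(x,y) is defined by S1(x,y) and by S2(x,z), S3(z,y).
rules = [
    (("T", ("x", "y")), [("S1", ("x", "y"))]),
    (("T", ("x", "y")), [("S2", ("x", "z")), ("S3", ("z", "y"))]),
]
# Body of Q(u,w) :- T(u,v), T(v,w)
for cq in unfold([("T", ("u", "v")), ("T", ("v", "w"))], rules):
    print(cq)
```

With two rules for T and two T atoms in the query, the unfolding is a union of four conjunctive queries over the sources.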

Reformulation algorithms (LAV/GLAV)

Sound LAV/GLAV mappings:
  r: S1(x,y),…,Sn(x,y) ⊑ T1(x,z),…,Tm(x,z)
  (note: the Ti's are not necessarily distinct relational atoms)
  (equivalent tgd: ∀x,y S1(x,y),…,Sn(x,y) → ∃z T1(x,z),…,Tm(x,z))

Inverse rules ([DG97]): for every rule r and every i ∈ [1..m] define a rule:
  Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)) :- S1(x,y),…,Sn(x,y)
  (tgd: ∀x,y S1(x,y),…,Sn(x,y) → Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)); skolemization of existential variables)

Inverse rules: Example

r: S1(x,y),S2(y,w) ⊑ T1(x,z),T1(z,w)

Inverse rules:
  T1(x, f_{r,z}(x,y,w)) :- S1(x,y),S2(y,w)
  T1(f_{r,z}(x,y,w), w) :- S1(x,y),S2(y,w)

Observe that the same skolem term (f_{r,z}(x,y,w)) represents the common existential variable (z) of the two atoms.
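The construction can be sketched in a few lines. The encoding below is an assumption made for illustration: atoms are (relation, args) pairs, every term is a variable, and a skolem term is represented as a tuple whose first element is the function name followed by the source variables in order of first appearance.

```python
def inverse_rules(name, source_atoms, target_atoms):
    """Build one inverse rule per target atom of the mapping `name`.

    Returns a list of (head_atom, body) pairs; existential variables of
    the target side are replaced by skolem terms over the source variables.
    """
    source_vars = []
    for _, args in source_atoms:
        for v in args:
            if v not in source_vars:
                source_vars.append(v)

    def skolemize(v):
        if v in source_vars:
            return v
        # One function symbol per (rule, existential variable) pair.
        return (f"f_{name},{v}",) + tuple(source_vars)

    return [((rel, tuple(skolemize(v) for v in args)), list(source_atoms))
            for rel, args in target_atoms]


# r: S1(x,y), S2(y,w) contained in T1(x,z), T1(z,w)
for head, body in inverse_rules(
        "r",
        [("S1", ("x", "y")), ("S2", ("y", "w"))],
        [("T1", ("x", "z")), ("T1", ("z", "w"))]):
    print(head, ":-", body)
```

Both generated heads share the same skolem tuple for z, which is what keeps the join between the two T1 atoms intact.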

Query reformulation using inverse rules

Create a logic program PQ composed of:
  the query Q
  the inverse rules of all mappings M

Let P(I) be the result of evaluating the composition of a logic program P with a set of facts I.

Theorem ([DG97,AD98]): Let PQ+ be a logic program such that, for every set of facts I, PQ+(I) is the result of discarding all tuples that contain skolem terms from PQ(I). Then:
  PQ+ is a maximally-contained reformulation
  PQ+(I) = certain_{M,I}(Q)
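A minimal sketch of this recipe, assuming the mapping from the incompleteness example (every S(x,z) implies some y with T(x,y) and T(y,z)), the source instance I = {S(a,b)}, and bodies that contain only variables. Inverse-rule bodies mention only source relations, so a single pass over the rules is enough here. Skolem terms are encoded as tuples, and all names are hypothetical.

```python
def matches(body, facts, binding=None):
    """Yield every variable binding that maps all body atoms into facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (rel, args), rest = body[0], body[1:]
    for fact_rel, fact_args in facts:
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(a, v) == v for a, v in zip(args, fact_args)):
            yield from matches(rest, facts, new)


def instantiate(term, binding):
    """Replace variables by values; a tuple is a skolem term (name, vars...)."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(binding[v] for v in term[1:])
    return binding[term]


def answer(query, inv_rules, source_facts):
    head, body = query
    # Step 1: apply the inverse rules to derive target facts with skolem terms.
    derived = {(rel, tuple(instantiate(t, b) for t in args))
               for (rel, args), rule_body in inv_rules
               for b in matches(rule_body, source_facts)}
    # Step 2: evaluate the query, then discard tuples containing skolem terms.
    answers = {tuple(b[v] for v in head) for b in matches(body, derived)}
    return {t for t in answers if not any(isinstance(v, tuple) for v in t)}


f = ("f_y", "x", "z")                           # skolem term f_y(x,z)
inv_rules = [(("T", ("x", f)), [("S", ("x", "z"))]),
             (("T", (f, "z")), [("S", ("x", "z"))])]
I = {("S", ("a", "b"))}
q_path = (("x", "z"), [("T", ("x", "y")), ("T", ("y", "z"))])
q_edge = (("x", "y"), [("T", ("x", "y"))])
print(answer(q_path, inv_rules, I), answer(q_edge, inv_rules, I))
```

The two-step path query returns (a,b), while the single-edge query returns nothing, because each derived T fact contains a skolem term.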

Peer Data Management Systems

[Figure: peers P1, P2, P3, …, Pn connected by mappings M12, M23, M31, Mn3, with sources S1, S2, S3, …, Sn holding instances I1, I2, I3, …, In.]

LAV source-to-peer mappings

P2P mappings: inclusion (sound) or equality (exact) GLAV + definitional (GAV)

Queries can be issued at any peer

Every peer can be both source and target w.r.t. different mappings

Pairs of peers may be indirectly connected (by paths of mappings)

Simple PDMS example

[Figure: peers P1 and P2 with peer relations ProjMem, Area, SameProj and Author; sources S1 (instance I1) and S2 (instance I2).]

r0: SameProj(n1,n2,p) = ProjMem(n1,p), ProjMem(n2,p)
r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)

Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,p), Author(n2,p)

Mapping Graph

[Figure: mapping graph over the relations ProjMem, Area, SameProj and Author and the sources S1 (I1) and S2 (I2) of peers P1 and P2, with edges labeled r0a, r0b, r1 and r2.]

r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)
r0a: SameProj(n1,n2,p) ⊇ ProjMem(n1,p), ProjMem(n2,p)
r0b: SameProj(n1,n2,p) ⊆ ProjMem(n1,p), ProjMem(n2,p)

Query answering in PDMS

Theorem ([HIST05]): In general, query answering in PDMS is undecidable.
  Reason: cycles in the mapping graph
  For an acyclic mapping graph, query answering is in PTIME
  Still in PTIME for a limited form of cycles (i.e., exact mappings with some restrictions)
    Allows chains of sound (“LAV”) mappings and exact (“GAV”) mappings without projections

Piazza reformulation algorithm
  Sound and complete for acyclic mapping graphs and the limited form of cycles
  Sound in general (computes a subset of the certain answers)

Piazza reformulation algorithm (1)

q: Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,w), Author(n2,w)

[Figure: rule-goal tree rooted at q. The goal SameProj(n1,n2,p) is expanded by r0 into ProjMem(n1,p) and ProjMem(n2,p), which ir1a expands into S1(n1,p,_) and S1(n2,p,_). The goals Author(n1,w) and Author(n2,w) are each expanded by ir2a and ir2b into S2(n1,n2) and S2(n2,n1).]

r0: SameProj(n1,n2,p) :- ProjMem(n1,p), ProjMem(n2,p)
r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
ir1a: ProjMem(n,p) :- S1(n,p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)
ir2a: Author(n1,f(n1,n2)) :- S2(n1,n2)
ir2b: Author(n2,f(n1,n2)) :- S2(n1,n2)

Piazza reformulation algorithm (2)

[Figure: the same rule-goal tree, rooted at Q(n1,n2), with the S1 and S2 leaves combined bottom-up into a reformulation.]

Q(n1,n2) :- (S1(n1,p,_) ∧ S1(n2,p,_)) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2))

Simplifying the reformulation:

Q(n1,n2) :- (S1(n1,p,_) ∧ S1(n2,p,_)) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2))
  ≡ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n1,n2)) ∪ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n2,n1))


Universal solutions

Data exchange setting S, T, M; instance I of S

An instance J of T is a universal solution of the data exchange setting above if it has homomorphisms to all other solutions.

Solutions contain constants (i.e., values that appear in I) and variables (labeled nulls).

Homomorphism h: J1 → J2 between target instances:
  h(c) = c, for every constant c
  If R(a1,…,am) is in J1, then R(h(a1),…,h(am)) is in J2

Universal solutions

[Figure: source instance I over S and mapping M on the source side; on the target side (schema T), a universal solution J with homomorphisms h1, h2, h3 into the solutions J1, J2, J3, …]

Universal solutions example

M: ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z)
I = {S(a,b)}

Solutions:
  J1 = {T(a,a), T(a,b)} is not universal
  J2 = {T(a,b), T(b,b)} is not universal
  J3 = {T(a,X), T(X,b)} is universal
  J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)} is universal
  J5 = {T(a,X), T(X,b), T(Y,Y)} is not universal
  . . .
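Some of these claims can be probed with a small homomorphism search. The sketch below assumes that labeled nulls are written as uppercase strings and constants as lowercase strings, as in the example; the function name and the fact encoding are hypothetical. Finding a solution that an instance cannot be mapped into refutes universality, whereas proving universality would require a homomorphism into every solution, which a finite check cannot establish.

```python
def homomorphism(J1, J2, is_null=str.isupper):
    """Return a mapping h on the nulls of J1 with h(J1) contained in J2, or None.

    Constants are mapped to themselves; facts are (relation, args) pairs.
    """
    facts = sorted(J1)

    def search(i, h):
        if i == len(facts):
            return h
        rel, args = facts[i]
        for rel2, args2 in J2:
            if rel2 != rel or len(args2) != len(args):
                continue
            new = dict(h)
            # Nulls must be mapped consistently; constants must stay fixed.
            if all(new.setdefault(a, v) == v if is_null(a) else a == v
                   for a, v in zip(args, args2)):
                found = search(i + 1, new)
                if found is not None:
                    return found
        return None

    return search(0, {})


J1 = {("T", ("a", "a")), ("T", ("a", "b"))}
J3 = {("T", ("a", "X")), ("T", ("X", "b"))}
J5 = J3 | {("T", ("Y", "Y"))}
print(homomorphism(J3, J1))   # J3 maps into J1
print(homomorphism(J1, J3))   # J1 does not map into J3
print(homomorphism(J5, J3))   # J5 does not map into J3
```

J1 and J5 have no homomorphism into the solution J3, so neither can be universal, which agrees with the slide.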

Computing universal solutions

Apply the chase procedure on the joint instance ⟨I,∅⟩

Source-to-target dependencies only: terminates in PTIME and produces a joint instance ⟨I,J⟩, where J is a universal solution (chase(I))

Target dependencies: not guaranteed to terminate
  If it does, it computes a universal solution
  If it fails, no universal solution exists

Example chase sequence

d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w)

Start: ⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, ∅⟩

h1: x → a, y → b, z → c; extend to h1': w → X1
  ⇒ ⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, {T(a,c,X1)}⟩

h2: x → a, y → b, z → d; extend to h2': w → X2
  ⇒ ⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, {T(a,c,X1),T(a,d,X2)}⟩

h3: x → a, y → e, z → c can already be extended to h3': w → X1, since T(a,c,X1) is present
  not applicable!
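The sequence above can be reproduced with a small chase for source-to-target tgds. This is a sketch under stated assumptions: all terms in the tgd are variables, fresh nulls are named X1, X2, …, and triggers are processed in sorted order of the source facts (an arbitrary choice that happens to match the order on the slide). Because the tgd bodies mention only source relations, one pass over the triggers suffices.

```python
from itertools import count


def matches(body, facts, binding):
    """Yield every extension of `binding` that maps all body atoms into facts."""
    if not body:
        yield binding
        return
    (rel, args), rest = body[0], body[1:]
    for fact_rel, fact_args in facts:
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(a, v) == v for a, v in zip(args, fact_args)):
            yield from matches(rest, facts, new)


def chase(source_facts, st_tgds):
    """Chase source-to-target tgds (body, head); returns the target instance."""
    fresh = count(1)
    source, target = sorted(source_facts), []
    for body, head in st_tgds:
        for h in list(matches(body, source, {})):
            # Not applicable if h already extends to the head over the target.
            if next(matches(head, target, h), None) is not None:
                continue
            ext = dict(h)
            for _, args in head:
                for v in args:
                    if v not in ext:
                        ext[v] = f"X{next(fresh)}"   # fresh labeled null
            target += [(rel, tuple(ext[v] for v in args)) for rel, args in head]
    return set(target)


I = {("S", ("a", "b")), ("S", ("b", "c")), ("S", ("b", "d")),
     ("S", ("a", "e")), ("S", ("e", "c"))}
d1 = ([("S", ("x", "y")), ("S", ("y", "z"))], [("T", ("x", "z", "w"))])
print(sorted(chase(I, [d1])))
```

The trigger x → a, y → e, z → c adds nothing, because T(a,c,X1) already satisfies the head.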

Universal solutions and query answering

Theorem ([FKMP]): If Q is a conjunctive query, I is a source instance and J is a universal solution: Q(J)+ = certain_{M,I}(Q)
  (Q(J)+ denotes Q(J) after discarding the tuples that contain labeled nulls)

Any solution J for which the above holds for every conjunctive query is universal
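As a small illustration, the sketch below evaluates two conjunctive queries on the universal solutions J3 and J4 from the earlier example and then drops every tuple that contains a labeled null. It assumes nulls are uppercase strings; the queries and function names are chosen for the example only.

```python
def q_path(J):
    """Q(x,z) :- T(x,y), T(y,z), over a set of T-tuples."""
    return {(x, z) for (x, y1) in J for (y2, z) in J if y1 == y2}


def q_edge(J):
    """Q'(x,y) :- T(x,y)"""
    return set(J)


def null_free(answers, is_null=str.isupper):
    """Keep only the answer tuples that contain no labeled null."""
    return {t for t in answers if not any(is_null(v) for v in t)}


J3 = {("a", "X"), ("X", "b")}
J4 = J3 | {("a", "Y"), ("Y", "b")}

print(null_free(q_path(J3)), null_free(q_path(J4)))
print(null_free(q_edge(J3)))
```

Both universal solutions give the same null-free answer (a,b) for the path query, and the edge query has no null-free answers at all.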


Using inverse rules to compute universal solutions

For every relation T_i of T, let P_{M,T_i} be the reformulation of the query Q(x) :- T_i(x), using the inverse rules algorithm.

Proposition: ⋃_i P_{M,T_i}(I) ≅ chase(I)

Crux: every step of a chase sequence corresponds to a step in the evaluation of the logic program using SLD resolution

Corollary: ⋃_i P_{M,T_i}(I) is a universal solution

Applying data exchange in GAV/LAV settings

[Figure: sources S1, S2, …, Sn with instances I1, I2, …, In (together: source schema S with instance I), mappings M1, M2, …, Mn, target schema T with query Q, and target instances J1, J2, …, Jn combined into J.]

Performance tradeoffs

Data exchange:
- Requires the computation of a solution (polynomial in the size of the instance I)
- Needs to propagate updates in the source, which may require recomputing the whole universal solution
+ But then query evaluation is easy and efficient
+ If the query load is large, the cost of computing the solution may be amortized

Performance tradeoffs

Reformulation:
+ No “startup” cost
+ No need to propagate updates
- Adds overhead to query processing (although reformulations for “common” queries can be precomputed/cached)
- Requires a distributed query evaluation engine (but there is room for optimization, e.g., adaptive query processing)
- Generated reformulations are generally not minimal

Conclusions

Two approaches for answering queries across mappings:
  Reformulation (data integration)
  Universal solutions (data exchange)

Different problems
  Data exchange is concerned with other aspects, e.g., identifying the appropriate solution to materialize

Same answers (certain answers)

Performance tradeoffs

Tight relationship between chase and inverse rules techniques
