searching and integrating information on the web

55
1 Searching and Integrating I nformation on the Web Seminar 2: Data Integration Professor Chen Li UC Irvine

Upload: alban

Post on 16-Jan-2016

34 views

Category:

Documents


0 download

DESCRIPTION

Searching and Integrating Information on the Web. Seminar 2: Data Integration Professor Chen Li UC Irvine. Motivation. Biblio sever. Legacy database. Plain text files. Support seamless access to autonomous and heterogeneous information sources. Comparison Shopping. Applications. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Searching and Integrating Information on the Web

1

Searching and Integrating Information on the Web

Seminar 2: Data Integration

Professor Chen Li

UC Irvine

Page 2: Searching and Integrating Information on the Web

Seminar 2 2

Motivation

Legacy database Plain text files

Biblio sever

Support seamless access to autonomous and heterogeneous information sources.

Page 3: Searching and Integrating Information on the Web

Seminar 2 3

Comparison Shopping

Lowest price of the DVD: “The Matrix”?

Applications Comparison shopping

Supply-chain management

Supplier 2

… Integrator

Supplier M

Supplier 1

Buyer 2

Buyer M

Buyer 1

Page 4: Searching and Integrating Information on the Web

Seminar 2 4

Mediation architecture

Mediator

Wrapper

Source 1

Wrapper

Source 2

Wrapper

Source n

TSIMMIS (Stanford), Garlic (IBM), Infomaster (Stanford), Disco (INRIA), Information Manifold (AT&T), Hermes(UMD), Tukwila (UW), InfoSleuth (MCC), …

Page 5: Searching and Integrating Information on the Web

Seminar 2 5

• Sources are heterogeneous:– Different data models: relational, object-oriented, XML, …

– Different schemas and representations:

“Keanu Reeves” or “Reeves, Keanu” or “Reeves, K.” etc.

• Describe source contents• Use source data to answer queries• Sources have limited query capabilities• Data quality• …

Challenges

Page 6: Searching and Integrating Information on the Web

Seminar 2 6

Outline

• Basics: theories of conjunctive queries

• Global-as-view (GAV) approach to data integration

• Local-as-view (LAV) approach to data integration

Page 7: Searching and Integrating Information on the Web

Seminar 2 7

Basics: conjunctive queries

• Reading: Ashok K. Chandra and Philip M. Merlin, “Optimal implementation of conjunctive queries in relational data bases,” STOC, 77-90, 1977.

• Fundamental for data integration– Source content description– Query description– Plan formulation

Page 8: Searching and Integrating Information on the Web

Seminar 2 8

Conjunctive Queries (CQ’s)• Most common form of query; equivalent to select-

project-join (SPJ) queries• Useful for data integration• Form: q(X) :- p1(X1),p2(X2),…,pn(Xn)• Head q(X) represents the query answers• Body p1(X1),p2(X2),…,pn(Xn) represents the query

conditions– Each pi(Xi) is called a subgoal– Shared variables represent join conditions– Constants represent “Attribute=const” selection conditions– A relation can appear in multiple predicates (subgoals)

Page 9: Searching and Integrating Information on the Web

Seminar 2 9

Conjunctive Queries: example• student(name,courseNum), course(number,instructor) SELECT name

FROM student, course WHERE student.courseNum=course.number AND instructor=‘Li’;

Equal to:ans(SN) :- student(SN, CN), course(CN,’Li’)

– Predicates student and course correspond to relations names– Two subgoals: student(SN, CN) and course(CN,’Li’)– Variables: SN, CN. Constant: ‘Li’– Shared variable, CN, corresponds to

“student.courseNum=course.number”– Variable SN in the head: the answer to the query

Page 10: Searching and Integrating Information on the Web

Seminar 2 10

Answer to a CQ• For a CQ Q on database D, the answer Q(D) is set of heads of Q if we:

– Substitute constants for variables in the body of Q in all possible ways– Require all subgoals to be true

• Example: ans(SN) :- student(SN, CN), course(CN,’Li’)– Tuples are also called “EDB” (external database) facts: student(Jack, 184),

student(Tom,215), …, course(184,Li), course(215,Li), …– Answer “Jack”: SNJack,CN184– Answer “Tom”: SNTom,CN215– Answer “Jack”: SNJack,CN215 (duplicate eliminated)

Name CourseNum Jack 184 Tom 215 Jack 215 Mary 214

Number Instructor 184 Li 215 Li 214 Mehrotra 252 Gupta

Student Course

Page 11: Searching and Integrating Information on the Web

Seminar 2 11

Query containment• For two queries Q1 and Q2, we say Q1 is contained in Q2, denoted Q1Q2,

if any database D, we have Q1(D) Q2(D).

• We say Q1 and Q2 are equivalent, denoted Q1Q2, if Q1(D) Q2(D) and Q1(D) Q2(D).

• Example:

Q1: ans(SN) :- student(SN, CN), course(CN,’Li’)

Q2: ans(SN) :- student(SN, CN), course(CN,INS)

We have: Q1(D) Q2(D).

Page 12: Searching and Integrating Information on the Web

Seminar 2 12

Another exampleQ1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y)

• We have: Q2 Q1

• Proof:– For any DB D, suppose p(x,y) is in Q2(D). Then there is a w such that r(x,w),

b(w,w), and r(w,y) are in D.

– For Q1, consider the substitution: X x, W w, Z w, Y y.

– Thus the head of Q1 becomes p(x,y), meaning that p(x,y) is also in Q1(D).

• In general, how to test containment of CQ’s?– Containment mappings

– Canonical databases

Page 13: Searching and Integrating Information on the Web

Seminar 2 13

Containment mappings• Mapping from variables of CQ Q2 to variables of CQ Q1, such that:

– Head of Q2 becomes head of Q1

– Each subgoal of Q2 becomes some subgoal of Q2 It is not necessary that every subgoal of Q1 is the target of some subgoal of Q2.

• Example:

Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y)

– Containment mapping from Q1 to Q2: X X, Y Y, W W, Z W

– No containment mapping from Q2 to Q1: For b(W,W) in Q2, its only possible target in Q1 is b(W,Z) However, we cannot have a mapping WW and WZ, since each variable

cannot be mapped to two different variables

Page 14: Searching and Integrating Information on the Web

Seminar 2 14

Example of containment mappings• Example: C1: p(X) :- a(X,Y), a(Y,Z), a(Z,W)

C2: p(X) :- a(X,Y), a(Y,X)

• Containment mapping from C1 to C2: X X, Y Y, Z X, W Y

• No containment mapping from C2 to C1. Proof:– For the two heads, the mapping must have X X

– For a(X,Y) in C2, its target in C1 can only be a(X,Y) (since XX). Thus YY.

– However, for a(Y,X) in C2, its target, which must be a(Y,X), does not exist in C1.

Page 15: Searching and Integrating Information on the Web

Seminar 2 15

Theorem of Containment Mappings• Theorem: Q1 Q2 iff there is a containment mapping from Q2 to Q1.

• Notice: the direction is the “opposite”• Proof (“If”):

– Suppose is a containment mapping from Q2 to Q1

– For any DB D, let tuple t is in Q1(D)– t is produced by a substitution on the variables of Q1 that makes all Q1’s

subgoals facts in D.– Therefore, is a substitution for variables of Q2 that produces t– Thus each t in Q1(D) must be in Q2(D)

Q1: p(X) :- G1, G2, … Gk

Q2: p(X) :- H1, H2, … Hj

Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y)

Page 16: Searching and Integrating Information on the Web

Seminar 2 16

Proof (only if)• Key idea: frozen CQ

• Use a unique constant to replace a variable

• Frozen Q is a DB consisting of all the subgoals of Q, with the chosen constants substituted for variables

• This DB is called a “canonical database” of the query.

• Example: – Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)

– Frozen Q1: X replaced by constant x0, W by constant w0, Z by z0, Y by y0

– Result: DB with {r(x0, w0), b(w0, z0), r(z0, y0)}

Page 17: Searching and Integrating Information on the Web

Seminar 2 17

Proof (only if) -- cont• Let Q1 Q2. Let D be the frozen Q1. Let be the substitution from those

constants to the variables in Q1.– Since we chose a unique constant for each variable, this substitution exists.

• Since Q1 Q2 the “frozen” head of Q1 must be in Q2(D). Thus there is a substitution from Q2 to D.

• We can show that is a containment mapping from Q2 to Q1

– The head of Q2 is mapped to the head of Q1.

– Each subgoal in Q2 is mapped to a subgoal in Q2.

Q1: p(X) :- G1, G2, … Gk

Q2: p(X) :- H1, H2, … Hj

Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y)

Page 18: Searching and Integrating Information on the Web

Seminar 2 18

Testing query containment• To test Q1 Q2.:

– Get a canonical DB D of Q1.

– Compute Q2(D)

– If Q2(D) contains the frozen head of Q1, then Q1 Q2. otherwise not.

• Testing containment between CQ’s is NP-complete.

• Some polynomial-time algorithms exist in special cases.

Page 19: Searching and Integrating Information on the Web

Seminar 2 19

Extending CQ’s• CQ’s with built-in predicates:

– We can add more conditions to variables in a CQ.– Example:

student(name, GPA, courseNum), course(number,instructor,year)

ans(SN) :- student(SN, G, CN), course(CN,’Li’), G>=3.5

ans(SN) :- student(SN, G, CN), course(CN,’Li’, Y), G>=3.5, Y < 2002– More results on CQ’s with built-in predicates

• Datalog queries: – a (possibly infinite) set of CQ’s with (possibly) recursion– Example: r(Parent, Child)– Query: finding all ancestors of Tom

ancestor(P,C) :- r(P, C)

ancestor(P,C) :- ancestor(P,X), r(X, C)

result(P) :- ancestor(P, ‘tom’)

Page 20: Searching and Integrating Information on the Web

Seminar 2 20

Further Reading

• Jeff Ullman, “Principles of Database and Knowledge Systems,” Computer Science Press, 1988, Volume 2.

Page 21: Searching and Integrating Information on the Web

Seminar 2 21

Outline

• Basics: theories of conjunctive queries

• Global-as-view (GAV) approach to data integration

• Local-as-view (LAV) approach to data integration

Page 22: Searching and Integrating Information on the Web

Seminar 2 22

GAV approach to data integration

• Readings:– Jeffrey Ullman,

Information Integration Using Logical Views, ICDT 1997.

– Ramana Yerneni, Chen Li, Hector Garcia-Molina, and Jeffrey Ullman, Computing Capabilities of Mediators, SIGMOD 1999.

Page 23: Searching and Integrating Information on the Web

Seminar 2 23

Global-as-view Approach

Mediator

• Mediator exports views defined on source relationsmed(Dealer,City,Make,Year) = R1 R2

• A query is posted on mediator views:SELECT * FROM medWHERE Year = ‘2001’; ans(D,C,M) :- med(D,C,M,‘2001’)

• Mediator expands query to source queries:SELECT * FROM R1, R2 WHERE Year = ‘2001’; ans(D,C,M,Y) :- R1(D,C), R2(D,M,2001)

R1(Dealer,City) R2(Dealer, Make, Year)

med(Dealer,City,Make,Year) = R S

Page 24: Searching and Integrating Information on the Web

Seminar 2 24

• Project: TSIMMIS at Stanford• Advantages:

– User queries easy to define– Plan generation is straightforward

• Disadvantages:– Not all source information is exported:

What if users want to get dealers that may not the city information? Those dealers are not “visible.”

– Not easily scalable: every time a new source is added, mediator views need to be changed

• Research issues– Efficient query execution?– Deal with limited source capabilities?

GAV Approach (cont)

Page 25: Searching and Integrating Information on the Web

Seminar 2 25

Limited source capabilities Complete scans of relations not possible

Reasons:

– Legacy databases or structured files: limited interfaces

– Security/Privacy

– Performance concerns

Example 1: legacy databases with restrictive interfaces

Ullman DBMS

Knuth TeX

… …

author titleGiven an author,

return the books.

Page 26: Searching and Integrating Information on the Web

Seminar 2 26

Another example: Web search forms

www.imdb.com

Page 27: Searching and Integrating Information on the Web

Seminar 2 27

Problems

• How to describe source restrictions?

• How to compute mediator restrictions from sources?

• How to answer queries efficiently given these restrictions?

• How to compute as many answers as possible to a query?

• …

Page 28: Searching and Integrating Information on the Web

Seminar 2 28

Describe source capabilities: using attribute adornments.

f: free

b: bound

u: unspecified

c[S]: chosen from a list S of constants, e.g., “state”

o[S]: optional; if chosen, must be from a list S of constants

A search form is represented as multiple templates:

(Title, Author, ISBN, Format, Subject)

b f u u u 1

f b u u u 1

u u u o[] o[] 2

u u b u u 3

1

2

3

Page 29: Searching and Integrating Information on the Web

Seminar 2 29

Computing mediator restrictions• Motivation: do not want users to be frustrated by submitting a query that

cannot be answerable by the mediator• Example:

– Source 1: book(author, title, price) Capability: “bff” I.e., we must provide a title, and can get author and price info

– Source 2: review(title, reviewer, rate) Capability: “bff” I.e., we must provide a book title, and can get other info

– Mediator view: MedView(A,T,P,RV,RT) :- book(A,T,P),review(T,RV,RT)

– Query on the mediator view: Ans(RT) :- MedView(A, ‘db’, P, RV, RT). I.e., “find the review rates of DB books”

– But the mediator cannot answer this query, since we do not know the authors.

• We want to tell the user beforehand what queries can be answered

Page 30: Searching and Integrating Information on the Web

Seminar 2 30

Solutions: Compute mediator capabilities

Need algorithms that do the following:

• Given– Source relations with restrictions.

– Mediator views defined on source relations: Union Join Selection Projection

• Main idea of the algorithms– compute restrictions on mediator views

– minimize number of view templates

Page 31: Searching and Integrating Information on the Web

Seminar 2 31

“Union” views• Assumption:

– MedView :- V1V2– We want to get all tuples from two sources that satisfy a query condition– No mediator post-processing power

• Table to compute view adornments– E.g., “f, o[s3] o[s3]” – “c[s2], o[s3] c[s2s3]”– Invalid combination: “b,u -”

f o[s3] b c[s4] u f f o[s3] b c[s4] u

o[s1] o[s1] o[s1s3] c[s1] c[s1s4] u b b c[s3] b c[s4] -

c[s2] c[s2] c[s2s3] c[s2] c[s2s4] - u u u - - u

V1

V2

Page 32: Searching and Integrating Information on the Web

Seminar 2 32

“Union” views with postprocessing• Mediator can postprocess results from a source, and check if the results

satisfy certain conditions• Thus some entries are more “relaxing”

– Essentially: “o” can be treated as “f”, and “u” can be treated as “f”– E.g., “f, o[s3] f” instead of “o[s3]” – “c[s2], o[s3] c[s2]” instead of “c[s2s3]”– “b,u b” instead of “invalid combination”

f o[s3] b c[s4] u f f f b c[s4] f

o[s1] f f b c[s4] f b b b b c[s4] b

c[s2] c[s2] c[s2] c[s2] c[s2s4] c[s2] u f f b c[s4] f

V1

V2

Page 33: Searching and Integrating Information on the Web

Seminar 2 33

“Join” views with passing bindings

• Assumption:– MedView :- V1 JOIN V2

– The mediator can pass bindings from V1 to V2

– So the join order matters

f o[s3] b c[s4] u f f f f c[s4] f

o[s1] f f f c[s4] f b b b b c[s4] b

c[s2] c[s2] c[s2] c[s2] c[s2s4] c[s2] u f f f c[s4] f

V2

V1

Page 34: Searching and Integrating Information on the Web

Seminar 2 34

Other views

• Union

• Join

• Selection

• Projection

• Multiple views

Page 35: Searching and Integrating Information on the Web

Seminar 2 35

Concise template description• Some adornments subsume other adornments

• E.g.: “f” subsumes “b”, since every query supported by “b” is also supported by “f”

• Adornment graph: “subsumption” relationships

• Use the graph to “compress” templates: experiments shrank 26 8 templates

f

b

c

o

u

Adornment graph

n1 n2

Adornment n1 is at least as restrictive as adornment n2

n1 n2

Adornment n1 is at least as restrictive as adornment n2,if the constant set of n1 is a subset of that of n2

Page 36: Searching and Integrating Information on the Web

Seminar 2 36

Outline

• Basics: theories of conjunctive queries

• Global-as-view (GAV) approach to data integration

• Local-as-view (LAV) approach to data integration

Page 37: Searching and Integrating Information on the Web

Seminar 2 37

Local-as-view (LAV) approach

Mediator

• There are global predicates, e.g., “car,” “person,” “book,” etc.• They can been seen as mediator views• The content of each source is described using these global predicates• A query to the mediator is also defined on the global predicates• The mediator finds a way to answer the query using the source contents

sources

Page 38: Searching and Integrating Information on the Web

Seminar 2 38

ExampleMediator

• Global predicates: Loc(Dealer,City),Sell(Dealer,Make,Year)• Source content defined on global predicates:

S1(Dealer,City) :- Loc(Dealer,City);S2(Dear,Make,Year) :- Sell(Dear,Make,Year)

In general, each definition could be more complicated, rather than direct copies.

• Queries defined on global predicates. Q: ans(D,M,Y) :- Loc(D,’irvine’), Sell(D,M,Y)

– Users do not know source views.

• The mediator decides how to use source views to answer queries.– “Answering queries using views”:

ans(D,M,Y) :- S1(D,’irvine’), S2(D,M,Y)

S1(Dealer,City)

S2(Dealer,Make,Year)

Page 39: Searching and Integrating Information on the Web

Seminar 2 39

Another LAV Example

• Mediator predicates: car(C), sell(Car, Dealer), loc(dealer, city)• Views:

– v1(x) :- car(x)– v2(x) :- car(x), sell(x, d)– v3(x,d) :- sell(x, d), loc(d, ’la’)– v4(x) :- sell(x, d), loc(d, ’la’)

• Query: q(x) :- car(x), sell(x, d), loc(d, ’la’)

Page 40: Searching and Integrating Information on the Web

Seminar 2 40

Open-world assumption (OWA) and Close-world assumption (CWA)

W1(Make, Dealer) :- car(Make, Dealer)

W2(Make, Dealer) :- car(Make, Dealer)

All car tuplesW1 = W2 =

CWA

– W1 and W2 have all car tuples.

– E.g.: W1 and W2 are computed from the same car table in a database.

– W1 and W2 have some car tuples.

– E.g.: W1 and W2 are from two different web sites.

OWA

W1 W2

Page 41: Searching and Integrating Information on the Web

Seminar 2 41

• Projects: Information Manifold, Infomaster, Tukwila, …

• Advantages:– Scalable: new sources easy to add without modifying the

mediator views

– All we need to do is to define the new source using the existing mediator views (predicates)

• Disadvantages:– Hard to decide how to answer a query using views

Projects using the LAV approach

Page 42: Searching and Integrating Information on the Web

Seminar 2 42

Reading

• Alon Halevy, Answering Queries Using Views: A Survey.

Page 43: Searching and Integrating Information on the Web

Seminar 2 43

Answering queries using views

Mediator

• Source views can be complicated: SPJs, arithmetic comparisons,…• Not easy to decide how to answer a query using source views

Query: ans(D,M) :- Loc(D,'irvine'), Sell(D,M,Y).Rewriting: ans(D,M) :- V(D,‘irvine’, M,Y)– “Equivalent rewriting”: compute the “same” answer as the query– A rewriting can join multiple source views

• This problem exists in many other applications: – data warehousing– web caching– query optimizations

V(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y)

Query

Page 44: Searching and Integrating Information on the Web

Seminar 2 44

Arithmetic comparisons

Mediator

• Comparisons can make the problem even trickier• Query: ans(D,M) :- Loc(D,'irvine'), Sell(D,M,Y).

Rewriting: ans(D,M) :- V(D,‘irvine’, M,Y)

Contained rewriting: only retrieve cars before 1970. • Query: ans(D,M) :- Loc(D, 'irvine'), Sell(D,M,Y), Y <

1960Rewriting: ans(D,M) :- V(D,‘irvine’,M,Y), Y < 1960

V(D,C,M,Y):- Loc(D,C),Sell(D,M,Y),Y<1970

Page 45: Searching and Integrating Information on the Web

Seminar 2 45

Dropping attributes in views

Mediator

Drop “Year” in the view:V(D,C,M):- Loc(D,C),Sell(D,M,Y),Y<1970

• A variable in a CQ is called:– “distinguished”: if it appears in the query’s head– “nondistinguished”: otherwise

• The problem becomes even harder when we have nondistinguished variables.• Query: ans(D,M) :- Loc(D,'irvine'), Sell(D,M,Y), Y<1960

No rewriting! Since we do not have “Year” information.• Query: ans(D,M) :- Loc(D,'irvine'), Sell(D,M,Y), Y<1980

Contained rewriting: ans(D,M) :- V(D, ‘irvine’, M)

Page 46: Searching and Integrating Information on the Web

Seminar 2 46

Problems

• How to answer a query using views?• We will focus on the case where both the query and views are simply

conjunctive.

Query

Source views

Page 47: Searching and Integrating Information on the Web

Seminar 2 47

Query Expansion

• For each query P on views, we can expand P using the view definitions, and get a new query, denoted as Pexp, on the base tables.

• Pexp can be considered to be the “real” meaning of the query.• Example:

– View: V(D,C,M) :- Loc(D,C), Sell(D,M,Y)– A query P using V: ans(D,M) :- V(D,’la’,M)– Expansion: ans(D,M) :- Loc(D,’la’), Sell(D,M,Y)

Query P: ans() :- v1(), v2(), …, vk()

Expansion Pexp: ans():- p1,1(),…,p1,i1(),…, pk,1(),…,pk,ik()

Page 48: Searching and Integrating Information on the Web

Seminar 2 48

Rewritings

• Given a query Q and a set of views V:– A conjunctive query P is called a “rewriting” of Q using V if P only

uses views in V, and P computes a partial answer of Q. That is: Pexp Q. A rewriting is also called a “contained rewriting” (CR).

– A conjunctive query P is called an “equivalent rewriting” (ER) of Q using V if P only uses views in V, and P computes the exact answer of Q. That is: Pexp Q.

– A query P is called a “maximally-contained rewriting” of Q using V if P is a union of CRs of Q using V, and for any CR P1of Q, the answer to P contains the answer to query P1, that is, P1exp Pexp.

• See earlier slides for examples• Notice that all these definitions depend on the language of the

rewriting considered. Here we consider “conjunctive queries.”

Page 49: Searching and Integrating Information on the Web

Seminar 2 49

Focus: MiniCon algorithm

• MiniCon Algorithm: Rachel Pottinger and Alon Levy, “A scalable algorithm for answering queries using views,” VLDB 2000.

• See also: The Shared-variable-bucket algorithm by Prasenjit Mitra: "An Algorithm for Answering Queries Efficiently Using Views"; in Proceedings of the Australasian Database Conference, Jan 2001.

• Formulation:– Input: a conjunctive query Q and a set V of conjunctive views– Output: an maximally-contained rewriting (MCR) of Q using V

• Main idea:– For each query subgoal and for each view

Check if the view can be used to “answer” the query subgoal, and if so, in what “form”

Some “shared” variables are treated carefully

– Combine views to answer all query subgoals Reduced to a set-cover problem

Page 50: Searching and Integrating Information on the Web

Seminar 2 50

Example

• Query: q(x) :- car(x), sell(x, d), loc(d, ’la’)• Views:

– v1(x) :- car(x)– v2(x) :- car(x), sell(x, d)– v3(x,d) :- sell(x, d), loc(d, ’la’)– v4(x) :- sell(x, d), loc(d, ’la’)

Page 51: Searching and Integrating Information on the Web

Seminar 2 51

MCDs (“enhanced Buckets”)

• For query subgoal car(x), its MCD includes all views that can answer this subgoal:– v1(x), v2(x)

• MCD of query subgoal sell(x,d):– v3(x,d) only– but not v2(x)! Because:

Variable d is nondistinguished, i.e., it is not exported. Variable d is shared by another query subgoal, loc(d,’la’). If we

were to use v2(x) to answer query subgoal sell(x,d), we cannot get the dealer info to join with the other view to answer loc(d,’la’).

• MCD of query subgoal loc(d,’la’)– v3(x,d)

Page 52: Searching and Integrating Information on the Web

Seminar 2 52

Multi-subgoal MCD• MCD of query subgoals: sell(x,d),loc(d,’la’)

– v4(x)– If v4(x) is used to answer query subgoal sell(x,d), then the query subgoal

loc(d,’la’) must be answered using v4(x) as well.– The reason is that d is shared by two query subgoals, and the corresponding

variable in v4(x) is not exported.

Page 53: Searching and Integrating Information on the Web

Seminar 2 53

General rules• For a query subgoal G and a view subgoal H in view W, the MiniCon

algorithm considers a mapping from G to H• In this mapping, a query variable X is mapped to a view variable A• Four possible cases:

– Case 1: X is dist., A is dist.. OK. A is exported, so can join with other views.

– Case 2: X is nondist., A is dist.. OK. Same as above

– Case 3: X is dist., A is nondist.. NOT OK. X needs to be in the answer, but A is not exported.

– Case 4: X is nondist., A is nondist.. Then all the query subgoals using X must be able to be mapped to other subgoals in

view W. Reason: since A is not exported in W, it’s impossible for W to join with other views to

answer conditions involving X. I.e., “either NONE or ALL.”

Page 54: Searching and Integrating Information on the Web

Seminar 2 54

Combine MCDs to cover query subgoals

• Problem:– q(x) :- car(x), sell(x,d), loc (d,“la")– v1(x) :- car(x)– v2(x) :- car(x), sell(x,d)– v3(x,d) :- sell(x,d), loc(d,“la")– v4(x) :- sell(x,d), loc (d,“la")

• MCDs:– car(x) : v1(x), v2(x)– sell(x,d) : v3(x,d)– loc(d,"ca") : v3(x,d)– sell(x,d),loc(d,“la") : v4(x)

• Contained rewritings - using MCDs to cover all query subgoals, without overlap– P1: q(x) :- v1(x), v3(x,d), v3(x,d)– P2: q(x) :- v2(x), v3(x,d), v3(x,d)– P3: q(x) :- v1(x), v4(x)– P4: q(x) :- v2(x), v4(x)

• MCR: union of these four contained rewritings.

Page 55: Searching and Integrating Information on the Web

Seminar 2 55

Related references

• Other algorithms on AQUV:– Bucket, Inverse-rule

• Generating efficient equivalent rewritings of queries using views:– CoreCover algorithm: [Afrati, Li, Ullman, SIGMOD’01]

• Handling arithmetic comparisons and dropped attributes:– [Afrati, Li, Mitra, PODS’02]– [Afrati, Li, Mitra, EDBT’04]

Query

Source views