making views self-main - stanford universityilpubs.stanford.edu/157/1/1996-30.pdfmaking views...

Making Views Self-Maintainable for Data

Warehousing

Dallan Quass�

Stanford University

[email protected]

Ashish Gupta

IBM Almaden

[email protected]

Inderpal Singh Mumick

AT&T Research

[email protected]

Jennifer Widom�

Stanford University

[email protected]

Abstract

A data warehouse stores materialized views over data from one or more sources in order

to provide fast access to the integrated data, regardless of the availability of the data sources.

Warehouse views need to be maintained in response to changes to the base data in the sources.

Except for very simple views, maintaining a warehouse view requires access to data that is

not available in the view itself. Hence, to maintain the view, one either has to query the data

sources or store auxiliary data in the warehouse. We show that by using key and referential

integrity constraints, we often can maintain a select-project-join view when there are insertions,

deletions, and updates to the base relations without going to the data sources or replicating the

base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the

warehouse view and the auxiliary views together are self-maintainable|they can be maintained

without going to the data sources or replicating all base data. In addition, our technique can be

applied to simplify traditional materialized view maintenance by exploiting key and referential

integrity constraints.

1 Introduction

The problem of materialized view maintenance has received increasing attention recently [GM95,

GM96,Mum95], particularly due to its application to data warehousing [DEB95,ZGHW95]. A view

is a derived relation de�ned in terms of base relations. A view is said to be materialized when it is

stored in the database, rather than computed from the base relations in response to queries. The

materialized view maintenance problem is the problem of keeping the contents of the stored view

consistent with the contents of the base relations as the base relations are modi�ed.

Data warehouses store materialized views in order to provide fast access to information that is

integrated from several distributed data sources [DEB95]. The data sources may be heterogeneous

and/or remote from the warehouse. Consequently, the problem of maintaining a materialized view

in a data warehouse di�ers from the traditional view maintenance problem where the view and

base data are stored in the same database. In particular, when changes are reported by one data

�This work was supported by Rome Laboratories under Air Force Contract F30602-94-C-023 and by equipment

grants from Digital and IBM Corporations.

1

source it may be necessary to access base data from other data sources in order to maintain the

view [HGW+95].

For any view involving a join, maintaining the view when base relations change may require

accessing base data, even when incremental view maintenance techniques are used [GL95,GMS93].

For example, for a view R 1 S, when an insertion to relation R is reported it is usually necessary

to query S in order to discover which tuples in S join with the insertion to R. In the warehousing

scenario, accessing base data means either querying the data sources or replicating the base relations

in the warehouse. The problems associated with querying the data sources are that the sources may

periodically be unavailable, may be expensive or time-consuming to query, and inconsistencies can

result at the warehouse unless care is taken to avoid them through the use of special maintenance

algorithms [ZGHW95]. The problems associated with replicating base relations at the warehouse

are the additional storage and maintenance costs incurred. In this paper we show that for many

views, including views with joins, if key and referential integrity constraints are present then it is

not necessary to replicate the base relations in their entirety at the warehouse in order to maintain

a view. We give an algorithm for determining what extra information, called auxiliary views, can

be stored at a warehouse in order to maintain a select-project-join view without accessing base data

at the sources. The algorithm takes key and referential integrity constraints into account, which

are often available in practice, to reduce the sizes of the auxiliary views. When a view together

with a set of auxiliary views can be maintained at the warehouse without accessing base data, we

say the views are self-maintainable.

Maintaining materialized views in this way is especially important for data marts|miniature

data warehouses that contain a subset of data relevant to a particular domain of analysis or geo-

graphic region. As more and more data is collected into a centralized data warehouse it becomes

increasingly important to distribute the data into localized data marts in order to reduce query

bottlenecks at the central warehouse. When many data marts exist, the cost of replicating entire

base relations (and their changes) at each data mart becomes especially prohibitive.

1.1 Motivating Example

We start with an example showing how the amount of extra information needed to maintain a view

can be signi�cantly reduced from replicating the base relations in their entirety. Here we present

our results without explanation of how they are obtained. We will revisit the example throughout

the paper.

Consider a database of sales data for a chain of department stores. The database has the

following relations.

store(store id, city, state, manager)

sale(sale id, store id, day, month, year)

line(line id, sale id, item id, sales price)

item(item id, item name, category, supplier name)

The �rst (underlined) attribute of each relation is a key for the relation. The store relation contains

the location and manager of each store. The sale relation has one record for each sale transaction,

2

with the store and date of the sale. A sale may involve several items, one per line on a sales receipt,

and these are stored in the line relation, with one tuple for every item sold in the transaction.

The item relation contains information about each item that is stocked. We assume that the

following referential integrity constraints hold: (1) from sale.store id to store.store id, (2)

from line.sale id to sale.sale id, and (3) from line.item id to item.item id. A referential

integrity constraint from S:B to R:A implies that for every tuple s 2 S there must be a tuple r 2 R

such that s:B = r:A.

Suppose the manager responsible for toy sales in the state of California is interested in main-

taining a view of this year's sales: \all toy items sold in California in 1996 along with the sales

price, the month in which the sale was made, and the name of the manager of the store where the

sale was made. Include the item id, the sale id, and the line id."

CREATE VIEW cal toy sales AS

SELECT store.manager, sale.sale id, sale.month, item.item id, item.item name, line.line id, line.sales price

FROM store, sale, line, item

WHERE store.store id = sale.store id and

sale.sale id = line.sale id and

line.line id = item.item id and

store.state = \CA" and

sale.year = 1996 and

item.category = \toy"

The question addressed in this paper is: Given a view such as the one above, what auxiliary

views can be materialized at the warehouse so that the view and auxiliary views together are

self-maintainable?

Figure 1 shows SQL expressions for a set of three auxiliary views that are su�cient to maintain

view cal toy sales for insertions and deletions to each of the base relations, and are themselves

self-maintainable. In this paper we give an algorithm for deriving such auxiliary views in the

general case, along with incremental maintenance expressions for maintaining the original view and

auxiliary views. Materializing the auxiliary views in Figure 1 represents a signi�cant savings over

materializing the base relations in their entirety, as illustrated in Table 1.

Suppose that each of the four base relations contain the number of tuples listed in the �rst

column of Table 1. Assuming that the selectivity of store.state="CA" is .02, the selectivity of

sale.year=1996 is .25, the selectivity of item.category="toy" is .05, and that distributions are

uniform, the number of tuples passing local selection conditions (selection conditions involving

attributes from a single relation) are given in the second column of Table 1. A related proposal

by Hull and Zhou [HZ96] achieves self-maintainability for base relation insertions by pushing down

projections and local selection conditions on the base relations and storing at the warehouse only

those tuples and attributes of the base relations that pass the selections and projections. Thus,

their approach would require that the number of tuples appearing in the second column of Table 1

is stored at the warehouse to handle insertions.

We improve upon the approach in [HZ96] by also taking key and referential integrity constraints

into account. For example, we don't need to materialize any tuples from line, because the key and

3

CREATE VIEW aux store AS

SELECT store id, manager

FROM store

WHERE state = \CA"

CREATE VIEW aux sale AS

SELECT sale id, store id, month

FROM sale

WHERE year = 1996 and

store id IN (SELECT store id FROM aux store)

CREATE VIEW aux item AS

SELECT item id, item name

FROM item

WHERE category = \toy"

Figure 1: Auxiliary Views for Maintaining the cal toy sales View

referential integrity constraints guarantee that existing tuples in line cannot join with insertions

into the other relations. Likewise we can exclude tuples in sale that do not join with existing

tuples in store whose state is California, because we are guaranteed that existing tuples in sale

will never join with insertions to store. Using our approach can dramatically reduce the number

of tuples in the auxiliary views over pushing down selections only. The number of tuples required

by our approach to handle base relation insertions in our example appears in the third column of

Table 1.

Tuples in Tuples Passing Tuples in Auxiliary

Base Relation Base Relation Local Selection Conditions Views of Figure 1

store 2,000 40 40

sale 80,000,000 20,000,000 400,000

line 800,000,000 800,000,000 0

item 1,000 50 50

Total 880,003,000 820,000,090 400,090

Table 1: Number of Tuples in Base Relations and Auxiliary Views

We can similarly use key constraints to handle deletions to the base relations without all the

base relations being available. We can determine the e�ects of deletions from sale, line, and item

without referencing any base relations because cal toy sales includes keys for these relations.

We simply join the deleted tuples with cal toy sales on the appropriate key. Even though the

view does not include a key for store, store is joined to sale on the key of sale, so the e�ect

of deletions from store can be determined by joining the deleted tuples with sale and joining the

result with cal toy sales on the key of sale.

4

Now consider updates. If all updates were treated as deletions followed by insertions, as is

common in view maintenance, then the properties of key and referential integrity constraints thatwe use to reduce the size of auxiliary views would no longer be guaranteed to hold. Thus, updatesare treated separately in our approach. Note that in data warehousing environments it is commonfor certain base relations not to be updated (e.g., relations sale and line may be append only).

Even when base relations are updateable, it may be that not all attributes are updated (e.g., wedon't expect to update the state of a store). If updates to the base relations in our example cannotchange the values of attributes involved in selection conditions in the view, then the auxiliary views

of Figure 1 are su�cient (even if attributes appearing in the view may be updated). If, on the otherhand, updates to sale may change the year (for example), then an additional auxiliary view:

CREATE VIEW aux line AS

SELECT line id, sale id, item id, sales price

FROM line

WHERE item id IN (SELECT item id FROM aux item)

would need to be materialized, which would have 40,000,000 tuples. That is, we would need to

store all purchases of items whose category is \toy," in case the year of the corresponding sales

record is changed later to 1996. Further, if updates may change the category of an item to \toy",

we would need to keep all of the line relation in order to maintain the view.

In practice we have found that the attributes appearing in selection conditions in views tend to

be attributes that are not updated, as in our example. As illustrated above and formalized later on,

when such updates do not occur, much less auxiliary information is required for self-maintenance.

Thus, exploiting knowledge of permitted updates is an important feature of our approach.

1.2 Self-Maintenance

Self maintenance is formally de�ned as follows. Consider a view V de�ned over a set of base

relations R. Changes, �R, are made to the relations in R in response to which view V needs to

be maintained. We want to compute �V , the changes to V , using as little extra information as

possible. If �V can be computed using only the materialized view V and the set of changes �R,

then view V alone is self-maintainable. If view V is not self-maintainable, we are interested in

�nding a set of auxiliary views A de�ned on the same relations as V such that the set of views

fV g[A is self-maintainable. Note that the set of base relations R forms one such set of auxiliary

views. However, we want to �nd more \economical" auxiliary views that are much smaller than

the base relations. The notion of a minimal set of auxiliary views su�cient to maintain view V is

formalized in Section 3.

A more general problem is to make a set V = V1; : : : ; Vn of views self-maintainable, i.e., �nd

auxiliary views A such that A[V is self-maintainable. Simply applying our algorithm to each view

in V is not satisfactory, since opportunities to \share" information across original and auxiliary

views will not be recognized. That is, the �nal set A[V may not be minimal. We intend to

investigate sets of views as future work.

5

1.3 Paper Outline

The paper proceeds as follows. Section 2 presents notation, terminology, and some assumptions.

Section 3 presents an algorithm for choosing a set of auxiliary views to materialize that are su�cient

for maintaining a view and are self-maintainable. Section 4 shows how the view is maintained using

the auxiliary views. Section 5 explains how the auxiliary views are self-maintained. Related work

and conclusions are presented in Section 6. Additional details and proofs are provided in the

appendices.

2 Preliminaries

We consider select-project-join (SPJ) views; that is, views consisting of a single projection followed

by a single selection followed by a single cross-product over a set of base relations. As usual, any

combination of selections, projections, and joins can be represented in this form. We assume that

all base relations have keys but that a view might contain duplicates due to the projection at

the view. In this paper we assume single-attribute keys and conjunctions of selection conditions

(no disjunctions) for simplicity, but our results carry over to multi-attribute keys and selection

conditions with disjunctions. In Section 3 we will impose certain additional restrictions on the view

but we explain how those restrictions can be lifted in Appendix B. We say that selection conditions

involving attributes from a single relation are local conditions; otherwise they are join conditions.

We say that attributes appearing in the �nal projection are preserved in the view.

In order to keep a materialized view up to date, changes to base relations must be propagated

to the view. A view maintenance expression calculates the e�ects on the view of a certain type

of change: insertions, deletions, or updates to a base relation. We use a di�erential algorithm as

given in [GL95] to derive view maintenance expressions. For example, if view V = R1S, then the

maintenance expression calculating the e�ect of insertions to R (4R) is 4VR = 4R 1 S, where

4VR represents the tuples to insert into V as a result of 4R.

Since in data warehousing environments updates to certain base relations may not occur, or

may not change the values of certain attributes, we de�ne each base relation R as having one of

three types of updates, depending on how the updateable attributes are used in the view de�nition:

� If updates to R may change the values of attributes involved in selection conditions (local or

join) in the view, then we say R has exposed updates.

� Otherwise, if updates to R will not change the values of attributes involved in selection

conditions but may change the values of preserved attributes (attributes included in the �nal

projection), then we say R has protected updates.

� Otherwise, if updates to R will not change the values of attributes involved in selection

conditions or the values of preserved attributes, then we say R has ignorable updates.

Ignorable updates cannot have any a�ect on the view, so they do not need to be propagated. From

now on we consider only exposed and protected updates. Exposed updates could cause new tuples

to be inserted into the view or tuples to be deleted from the view, so we propagate them as deletions

6

of tuples with the old values followed by insertions of tuples with the new values. For example,

given a view V = �R:A=10R1S, if the value of R:A for a tuple in R is changed from 9 to 10 then

new tuples could be inserted into V as a result. Protected updates can only change the attribute

values of existing tuples in the view; they cannot result in tuples being inserted into or deleted

from the view. We therefore propagate protected updates separately. An alternate treatment of

updates is considered in Section 4.2.2.

In addition to the usual select, project, and join symbols, we use >< to represent semijoin, ]

to represent union with bag semantics, and �� to represent minus with bag semantics. We further

assume that project (�) has bag semantics. The notation 1X represents an equijoin on attributeX ,

while 1key(R) represents an equijoin on the key attribute of R, assuming this attribute is in both of

the joined relations. Insertions to a relation R are represented as 4R, deletions are represented as

5R, and protected updates are represented as �R. Tuples in �R have two attributes corresponding

to each of the attributes of R: one containing the value before update and another containing the

value after update. We use �old to project the old attribute values and �new to project the new

attribute values.

3 Algorithm for Determining Auxiliary Views

We present an algorithm (Algorithm 3.1 below) that, given a view de�nition V , derives a set of

auxiliary views A such that view V and the views in A taken together are self-maintainable; i.e.,

can be maintained upon changes to the base relations without requiring access to any other data.

Each auxiliary view ARi2 A is an expression of the form:

ARi= (��Ri) ><ARj1

><ARj2><: : :><ARjn

That is, each auxiliary view is a selection and a projection on relation Ri followed by zero or

more semijoins with other auxiliary views. It can be seen that the number of tuples in each ARi

is never larger than the number of tuples in Ri and, as we have illustrated in Section 1.1, may

be much smaller. Auxiliary views of this form can easily be expressed in SQL, and they can be

maintained e�ciently as will be shown in Section 5. As an additional optimization, change relevancy

tests [BCL89,LS93] could be used to determine which changes to the base relations are relevant

to the auxiliary views, thereby reducing the number of changes sent to the warehouse. Change

�ltering is not discussed further in this paper.

Intuitively, the �rst part of the auxiliary view expression, (��Ri), results from pushing down

projections and local selection conditions onto Ri. Tuples in Ri that do not pass local selection

conditions cannot possibly contribute to tuples in the view; hence they are not needed for view

maintenance and therefore need not be stored in ARiat the warehouse. The semijoins in the second

part of the auxiliary view expression further reduce the number of tuples in ARiby restricting it

to contain only those tuples joinable with certain other auxiliary views. In addition, we will show

that in some cases the need for ARican be eliminated altogether.

We �rst need to present a few de�nitions that are used in the algorithm.

Given a view V , let the join graph G(V ) of a view be a directed graph hR; Ei. R is

7

the set of relations referenced in V , which form the vertices of the graph. There is a

directed edge e(Ri; Rj) 2 E from Ri to Rj if V contains a join condition Ri:B = Rj :A

and A is a key of Rj . The edge is annotated with RI if there is a referential integrity

constraint from Ri:B to Rj:A.

We assume for now that the graph is a forest (a set of trees). That is, each vertex has at most

one edge leading into it and there are no cycles. This assumption still allows us to handle a

broad class of views that occur in practice. For example, views involving chain joins (a sequence

of relations R1; : : : ; Rn where the join conditions are between a foreign key of Ri and a key of

Ri+1; 1 <= i < n) and star joins (one relation R1, usually large, joined to a set of relations

R2; : : : ;Rn, usually small, where the join conditions are between foreign keys in R1 and the keys

of R2; : : : ; Rn) have tree graphs. In addition, we assume that there are no self-joins. We explain

how each of these assumptions can be removed in Appendix B.

The following de�nition is used to determine the set of relations upon which a relation Ri

depends|that is, the set of relations Rj in which (1) a foreign key in Ri is joined to a key of Rj,

(2) there is a referential integrity constraint from Ri to Rj , and (3) Rj has protected updates.

Dep(Ri; G) = fRj j 9e(Ri; Rj) in G(V ) annotated with RI and Rj does not have exposed updatesg

Dep(Ri; G) determines the set of auxiliary views to which Ri is semijoined in the de�nition of

the auxiliary view ARifor Ri, given above. The reason for the semijoins is as follows. Let Rj be a

member of Dep(Ri; G). Due to the referential integrity constraint from Ri to Rj and the fact that

the join between Ri and Rj is on a key of Rj , each tuple ti 2 Ri must join with one and only one

tuple tj 2 Rj. Suppose tj does not pass the local selection conditions on Rj . Then tj , and hence ti,

cannot contribute to tuples in the view. Because updates to Rj are protected (by the de�nition of

Dep(Ri; G)), tj , and hence ti, will never contribute to tuples in the view, so it is not necessary to

include ti in ARiat the warehouse. It is su�cient to store only those tuples of Ri that pass the local

selection conditions on Ri and join with a tuple in Rj that passes the local selection conditions on

Rj (i.e., (�Ri) >< (�Rj), where the semijoin condition is the same as the join condition between

Ri and Rj in the view). That Ri can be semijoined with ARj, rather than �Rj , in the de�nition of

ARifollows from a similar argument applied inductively.

The following de�nition is used to determine the set of relations upon which relation Ri tran-

sitively depends.

Dep+(Ri; G) is the transitive closure of Dep(Ri; G)

Dep+(Ri; G) is used to help determine whether it is necessary to store ARiat the warehouse in

order to maintain the view or whether ARican be eliminated altogether. Intuitively, if Dep+(Ri; G)

includes all relations referenced in view V except Ri, then ARiis not needed for propagating inser-

tions to any base relation onto V . The reason is that the key and referential integrity constraints

guarantee that new insertions into the other base relations can join only with new insertions into

Ri, and not with existing tuples in Ri. This behavior is explained further in Section 4.

The following de�nition is used to determine the set of relations with which relation Ri needs to

join so that the key of one of the joining relations is preserved in the view (where all joins must be

8

store

RI

sale

RI

line

RI

item

Figure 2: Join Graph G(cal toy sales)

from keys to foreign keys). If no such relation exists then Need(Ri; G) includes all other relations

in the view.

Need(Ri; G) =

8>>>><>>>>:

� if the key of Ri is preserved in V ,

fRjg [ Need(Rj; G) if the key of Ri is not preserved in V but there is an

Rj such that e(Rj ; Ri) in G(V ),

R � fRig otherwise

Note that because we restrict the graph to be a forest, there can be at most one Rj such that

e(Rj ; Ri) is in G(V ). We rede�ne Need when this restriction is removed in Appendix B.

Need(Ri; G) is also used to help determine whether it is necessary to store auxiliary views. In

particular, an auxiliary view ARjis necessary if Rj appears in the Need set of some Ri. Intuitively,

if the key of Ri is preserved in view V , then deletions and protected updates toRi can be propagated

to V by joining them directly with V on the key of Ri. Otherwise, if the key of Ri is not preserved

in V but Ri is joined with another relation Rj on the key of Ri and V preserves the key of Rj, then

deletions and protected updates to Ri can be propagated onto V by joining them �rst with Rj, then

joining the result with V . In this case Rj is in the Need set of Ri, and hence ARjis necessary. More

generally, if the key of Ri is not present in V but Ri joins with Rj on the key of Ri, then auxiliary

views for Rj and each of the relations in Need(Rj; G) are necessary for propagating deletions and

protected updates to Ri. Finally, if none of the above conditions hold then auxiliary views for all

relations referenced in V other than Ri are necessary.

To illustrate the above de�nitions we consider again the cal toy sales view of Section 1.1.

Figure 2 shows the graph G(cal toy sales). The Dep, Dep+, and Need functions for each of the

base relations are given in Table 2. Assume for now that each base relation has protected updates.

Algorithm 3.1 appears in Figure 3. We will explain how the algorithm works on our running

cal toy sales example. The auxiliary views generated by the algorithm are exactly those given

in Figure 1 of Section 1.1. They are shown in relational algebra form in Table 3.

For each relation Ri referenced in the view V , the algorithm checks whether Dep+(Ri; G)

includes every other relation referenced in V and Ri is not in Need(Rj; G) for any relation Rj

referenced in V . If so, it is not necessary to store any part of Ri in order to maintain V . Relation

line is an example where an auxiliary view for the relation is not needed.

9

Dep(store; G) = �

Dep(sale; G) = fstoreg

Dep(item; G) = �

Dep(line; G) = fsale; itemg

Dep+(store; G) = �

Dep+(sale; G) = fstoreg

Dep+(item; G) = �

Dep+(line; G) = fsale; item; storeg

Need(store; G) = fsaleg

Need(sale; G) = �

Need(item; G) = �

Need(line; G) = �

Table 2: Dep and Need Functions for Base Relations

Astore = �store id; manager �state=CA store

Asale = (�sale id; store id; month �year=1996 sale) ><store id Astore

Aitem = �item id; item name �category=toy item

Table 3: Auxiliary Views for Maintaining the cal toy sales View

Otherwise, two steps are taken to reduce the amount of data stored in the auxiliary view ARi

for Ri. First, it is possible to push down on Ri local selection conditions (explicit or inferred) in

the view so that tuples that don't pass the selection conditions don't need to be stored; it is also

possible to project away all attributes from Ri except those that are involved in join conditions,

preserved in V , or are a key of Ri. Second, if Dep(Ri; G) is not empty, it is possible to further

reduce the tuples stored in ARito only those tuples of Ri that join with tuples in other auxiliary

views ARkwhere Rk is in Dep(Ri; G). The auxiliary view for sale is an example where both steps

have been applied. Asale is restricted by the semijoin with Astore to include only tuples that join

with tuples passing the local selection conditions on store. The auxiliary views for store and

item are examples where only selection and projection can be applied.

Although view de�nitions are small and running time is not crucial, we observe that the running

time of Algorithm 3.1 is polynomial in the number of relations, and therefore is clearly acceptable.

We now state a theorem about the correctness and minimality of the auxiliary views derived by

Algorithm 3.1.

Theorem 3.1 Let V be a view with a tree-structured join graph. The set of auxiliary views A

10

Algorithm 3.1

Input

View V .

Output

Set of auxiliary view de�nitions A.

Method

Let R be the set of relations referenced in V

Construct graph G(V )

for every relation Ri 2 R

Construct Dep(Ri; G), Dep+(Ri; G), and Need(Ri; G)

for every relation Ri 2 R

if Dep+(Ri; G) = R� fRig and

6 9Rj 2 R such that Ri 2 Need(Rj; G),

then ARiis not needed

else ARi= (�P�SRi) ><C1ARk1

><C2ARk2><C3 : : :><CmARkm

, where

P is the set of attributes in Ri that are preserved in V , appear in join conditions, or are

a key of Ri,

S is the strictest set of local selection conditions possible on Ri,

Cl is the join condition Ri:B = Rkl :A with A a key of Rkl , and

Dep(Ri; G) = fRk1 ; Rk2; : : : ; Rkmg

3

Figure 3: Algorithm to Derive Auxiliary Views

produced by Algorithm 3.1 is the unique minimal set of views that can be added to V such that

fV g[A is self-maintainable. 2

The proof of Theorem 3.1 is given in Appendix A. By minimal we mean that no auxiliary

view can be removed from A, and it is not possible to add an additional selection condition or

semijoin to further reduce the number of tuples in any auxiliary view and still have fV g[A be self-

maintainable. We show in Section 4 how V can be maintained using A, and we show in Section 5

how to maintain A without referencing base relations.

3.1 E�ect of Exposed Updates

Recall that so far in our example we have considered protected updates only. Suppose sale had

exposed updates (i.e., updates could change the values of year, sale id, or store id). We note

that the de�nition of Dep does not include any relation that has exposed updates. Thus, the Dep

function for line will not include sale, and we get:

Dep(line; G) = fitemg Dep+(line; G) = fitemg

11

in which case an auxiliary view for line would be created as:

Aline = line><item idAitem

No selection or projection can be applied on line in Aline because there are no local selections

on line in the view and all attributes of line are either preserved in the view or appear in join

conditions. Section 4.2 explains why exposed updates have a di�erent e�ect on the set of auxiliary

views needed than protected updates.

4 Maintaining the View Using the Auxiliary Views

Recall that a view maintenance expression calculates the e�ects on the view of a certain type of

change: insertions, deletions, or updates to a base relation. View maintenance expressions are

usually written in terms of the changes and the base relations [GL95,CGL+96]. In this section we

show that the set of auxiliary views chosen by Algorithm 3.1 is su�cient to maintain the view by

showing how to transform the view maintenance expressions written in terms of the changes and

the base relations to equivalent view maintenance expressions written in terms of the changes, the

view, and the auxiliary views.

We give view maintenance expressions for each type of change (insertions, deletions, and up-

dates) separately. In addition, for each type of change we apply the changes to each base relation

separately by propagating the changes to the base relation onto the view and updating the base

relation. The reason we give maintenance expressions of this form, rather than maintenance expres-

sions propagating several types of changes at once, is that maintenance expressions of this form are

easier to understand and they are su�cient for our purpose: showing that it is possible to maintain

a view using the auxiliary views generated by Algorithm 3.1. View maintenance expressions for

insertions are handled in Section 4.2, for deletions are handled in Section 4.3, and for protected

updates are handled in Section 4.4. Since exposed updates are handled as deletions followed by

insertions, they are treated within Sections 4.2 and 4.3.

4.1 Propagating Multiple Types of Changes

Before giving the view maintenance expressions, we digress brie y to explain the mechanism

whereby insertions, deletions, and updates are propagated separately. A subtle point is involved

that we feel is not widely understood.

In general, if a sequence S of intermixed insert, delete, and update operations to a set of base

relations is to be propagated to a view V , it is not correct to simply propagate all the deletions,

then all the insertions, then all the updates in S to V . For example, if a tuple t were inserted into

and subsequently deleted from a base relation, by propagating insertions after deletions the e�ects

of tuple t would be seen in the view.

In order to correctly propagate all operations of each type of change separately, it is necessary

to �rst transform S into an equivalent sequence S0 that represents the \net e�ect" [SP89] of the

operations in S. That is, if one operation in S is followed by another operation on the same

tuple, the two operations are replaced by a single operation that represents the net e�ect of both

12

operations on that tuple. Table 4 gives the transformation rules. (In the table, \update" refers

to protected updates; recall that we model exposed updates as \delete t" followed by \insert t0,"

which is not transformed by the rules in the table.) The rules of the table may need to be applied

multiple times. For example, if S contained three operations: update t ! t0, update t0 ! t00, and

update t00 ! t000, then two applications of the last rule in the table would be necessary.

Earlier operation Later operation Net operation

delete t insert t no operation

insert t delete t no operation

insert t update t! t0 insert t0

update t! t0 delete t0 delete t

update t! t0 update t0 ! t00 update t! t00

Table 4: Capturing Net E�ect

The maintenance expressions used in this paper for propagating changes use the contents of the

base relations and possibly the view before the changes have been applied. The changes are then

applied to the view and the base relations before the next maintenance expression is evaluated.

Furthermore, since we propagate exposed updates as deletions followed by insertions, deletions to

a base relation must be propagated before insertions to that base relation so that key constraints

are not violated during propagation.

4.2 Insertions

In this section we show how the e�ect on a view of insertions to base relations can be calculated

using the auxiliary views chosen by Algorithm 3.1. The view maintenance expression for calculating

the e�ects on an SPJ view V of insertions to a base relation R is obtained by substituting 4R

(insertions to R) for base relation R in the relational algebra expression for V . For example, the

view maintenance expressions calculating the e�ects on our cal toy sales view (Section 1.1) of

insertions to store, sale, line, and item appear in Table 5.

A few words of explanation about the table are in order.

� For convenience, in the table and hereafter we abbreviate store, sale, line, and item as

St, Sa, L, and I , respectively.

� We abbreviate view cal toy sales as V .

� We have applied the general rule of \pushing selections and projections down" to the main-

tenance expressions.

� We use the notation 4VR to represent the insertions into view V due to insertions into base

relation R. For example, 4VSt represents insertions into V due to insertions into St.

13

4VSt = �Schema(V ) (�store id;manager �state=CA 4St

1store id �sale id;store id;month �year=1996 Sa

1sale id L

1item id �item id;item name �category=toy I)

4VSa = �Schema(V ) (�store id;manager �state=CA St

1store id �sale id;store id;month �year=1996 4Sa

1sale id L


4VL = �Schema(V ) (�store id;manager �state=CA St


1sale id 4L


4VI = �Schema(V ) (�store id;manager �state=CA St


1sale id L

1item id �item id;item name �category=toy 4I)

Table 5: Maintenance Expressions for Insertions

� Each of the maintenance expressions of Table 5 calculates the e�ect on view V of insertions

to one of the base relations. We show in Section 4.2.3 that even if insertions to multiple base

relations are propagated at once, the auxiliary views generated by Algorithm 3.1 are still

su�cient.

From the expressions of Table 5 it would appear that beyond pushing down selections and projec-

tions, nothing can be done to reduce the base relation data required for evaluating the maintenance

expressions. If there are no referential integrity constraints, that is indeed the case. However, refer-

ential integrity constraints allow certain of the maintenance expressions to be eliminated, requiring

less base relation data in the auxiliary views. Maintenance expressions are eliminated due to the

following property and corresponding rule.

Property 4.1 (Insertion Property for Foreign Keys) If there is a referential integrity con-

straint from Rj :B to Ri:A (Rj :B is the \foreign key"), A is a key of Ri, and Ri does not have

exposed updates, then Ri 14Ri:A=Rj :B Rj = �. In general, if the above conditions hold then the

following is true.

��R11 : : :1��4Ri14Ri:A=Rj :B��Rj1 : : :��Rn = �

�

14

Property 4.1 holds because the referential integrity constraint requires that each tuple in Rj

join with an existing tuple in Ri, and because it joins on a key of Ri it cannot join with any of the

tuples in 4Ri, so the join of 4Ri with Rj must be empty.

Rule 4.1 (Insertion Rule for Foreign Keys) Let G(V ) be the join graph for view V . The

maintenance expression calculating the e�ect on a view V of insertions to a base relation Ri

is guaranteed to be empty and thus can be eliminated if there is some relation Rj such that

Ri 2 Dep(Rj ; G). �

Rule 4.1 is used to eliminate the maintenance expression that calculates the e�ect on a view V

of insertions to a base relation Ri if there is another relation Rj in V such that Ri 2 Dep(Rj; G)

where G(V ) is the join graph for V . The rule holds because, by the de�nition of Dep, Ri is in

Dep(Rj; G) when the view equates a foreign key of Rj to the key of Ri, there is a referential integrity

constraint from the foreign key in Rj to the key of Ri, and Ri has protected updates (the e�ect of

exposed updates is discussed in Section 4.2.2). Since the maintenance expression that calculates

the e�ect of insertions to Ri includes a join between 4Ri and Rj , it must be empty by Property 4.1

and therefore can be eliminated. Joins between keys and foreign keys are common in practice; in

SQL, referential integrity constraints are always declared between keys and foreign keys, so the

conditions of Rule 4.1 are often met.

Being able to eliminate certain maintenance expressions when calculating the e�ect on a view

V of insertions to base relations can signi�cantly reduce the cost of maintaining V . Although view

maintenance expressions themselves are not the main theme of this paper, nevertheless this is an

important stand-alone result.

4.2.1 Rewriting the Maintenance Expressions to Use Auxiliary Relations

Eliminating certain maintenance expressions using Rule 4.1 allows us to use the auxiliary views

instead of base relations when propagating insertions. After applying Rule 4.1, the remaining

maintenance expressions are rewritten using the auxiliary views generated by Algorithm 3.1 by

replacing each ��Ri subexpression with the corresponding auxiliary view ARifor Ri.

For example, assuming for now that the base relations have protected updates, the mainte-

nance expressions for 4VSt, 4VSa, and 4VI , in Table 5 can be eliminated by Rule 4.1 due to the

referential integrity constraints between sale.store id and store.store id, line.sale id and

sale.sale id, and line.item id and item.item id, respectively. Only 4VL, the expression cal-

culating the e�ect of insertions to L, is not guaranteed to be empty. The maintenance expression

4VL is rewritten using auxiliary views as follows. Recall that the auxiliary views are shown in

Table 3.

4VL = �Schema(V ) (ASt 1sale id ASa 1store id 4L 1item id AI)

Notice that the base relation L is never referenced in the above maintenance expression, so an

auxiliary view for L is not needed. In addition, Sa is joined with St in the maintenance expression,

which is why it is acceptable to store only the tuples in Sa that join with existing tuples in St|

tuples in Sa that don't join with existing tuples in St won't contribute to the result. We show how

15

to handle insertions to Sa and St simultaneously in Section 4.2.3. A proof that the auxiliary views

are su�cient in general to evaluate the (reduced) maintenance expressions for insertions appears

in Appendix A.

4.2.2 E�ect of Exposed Updates

Suppose the view contains a join condition Rj:B = Ri:A, A is a key of Ri, there is a referen-

tial integrity constraint from Rj :B to Ri:A, but Ri has exposed, rather than protected, updates.

Dep(Rj; G) thus does not contain Ri. Recall that exposed updates can change the values of at-

tributes involved in selection conditions (local or join). We handle exposed updates as deletions of

tuples with the old attribute values followed by insertions of tuples with the new attribute values,

since exposed updates may result in deletions or insertions in the view. Thus, if Ri has exposed

updates then 4Ri may include tuples representing the new values of exposed updates. Because

these tuples can join with existing tuples in Rj (without violating the referential integrity or key

constraints), Property 4.1 does not hold and Rule 4.1 cannot be used to eliminate the maintenance

expression propagating insertions to Ri.

For example, suppose updates may occur to the year attribute of Sa. Then an auxiliary view

for L would be created as AL = L><item idAI as shown in Section 3.1. We cannot semijoin L with

ASa in the auxiliary view for L because new values of updated tuples in Sa could join with existing

tuples in L, where the old values of the updated tuples didn't pass the local selection conditions on

Sa and hence weren't in ASa. That is, suppose the year of some sale tuple t was changed from 1995

to 1996. Although the old value of t doesn't pass the selection criteria year=1996 and therefore

wouldn't appear in ASa, the new value of t would, and since it could join with existing tuples in L

we cannot restrict AL to include only those tuples that join with existing tuples in ASa.

In this paper we assume that it is known in advance whether each relation of a view V has

exposed or protected updates. If a relation has exposed updates, we may need to store more

information in the auxiliary views in order to maintain V than if the relation had protected updates.

For example, we had to create an auxiliary view for L when Sa had exposed updates, where the

auxiliary view for L wasn't needed when Sa had protected updates.

An alternate way to consider updates, which doesn't require advance knowledge of protected

versus exposed, is to assume that every base relation has protected updates. Then, before propa-

gating updates, the updates to each base relation are divided into two classes: updates that do not

modify attributes involved in selection conditions, and those that do. The �rst class of updates can

be propagated as protected updates using the expressions of Section 4.4. Assuming the second class

of updates is relatively small, updates in the second class could be propagated by issuing queries

back to the data sources.

4.2.3 Propagating Insertions to Multiple Relations at Once

When using maintenance expressions of the form in Table 5, if we wish to obtain the e�ect on

V of propagating insertions to all four base relations at once, it is incorrect to evaluate the four

maintenance expressions and take the union: new view tuples resulting from joining tuples in more

than one 4 relation would not be created. It is instead necessary when using Table 5 to assume

16

that insertions to multiple base relations are propagated to the view one base relation at a time, so

that when insertions to base relation Ri are propagated, the insertions to base relations Rj(j < i),

have already been applied to the base relations.

Table 6 shows for our example a single maintenance expression propagating insertions to all

base relations at once, obtained using the method in [GMS93]. For simplicity we have omitted the

selections and projections.

4V = (4St 1 Sa 1 L 1 I)

] ((4St]St) 1 4Sa 1 L 1 I)

] ((4St]St) 1 (4Sa]Sa) 1 4L 1 I)

] ((4St]St) 1 (4Sa]Sa) 1 4L 1 I)

] ((4St]St) 1 (4Sa]Sa) 1 (4L]L) 1 4I)

Table 6: Propagating Insertions to Multiple Base Relations at Once

It is possible to reduce the maintenance expression in Table 6 to the following maintenance

expression by eliminating the joins that are guaranteed to be empty by Property 4.1.

4V = (((4St]St) 1 4Sa)](St 1 Sa)) 14L 1 (4I]I)

Just as for the case of propagating insertions one relation at a time, this maintenance expression

can be rewritten to use auxiliary views. An auxiliary view for L is not needed, and Sa can be

semijoined with ASt in the auxiliary view for Sa.

4V = (((4St]ASt) 1 4Sa)](ASt 1 ASa)) 14L 1 (4I]AI)

The amount of data needed in the auxiliary views is the same whether insertions are propagated

one relation at a time or all at once. The same holds true for propagating deletions and updates

to multiple base relations at once, although due to space constraints details are not given in this

paper.

4.3 Deletions

In this section we show how the e�ect on a view of deletions to base relations can be calculated

using the auxiliary views. The view maintenance expression for calculating the e�ects on an SPJ

view V of deletions to a base relation R is obtained similarly to the expression for calculating the

e�ects of insertions: we substitute 5R (deletions to R) for base relation R in the relational algebra

expression for V . For example, the view maintenance expressions for calculating the e�ects on

our cal toy sales view of deletions to store, sale, line, and item appear respectively as 5VSt,

5VSa, 5VL, and 5VI in Table 7. We use the notation 5VR to represent the deletions from view

V due to deletions from base relation R.

Often we can simplify maintenance expressions for deletions to use the contents of the view itself

if keys are preserved in the view. We do this using the following properties and rule for deletions

in the presence of keys.

17

5VSt = �Schema(V ) (�store id;manager �state=CA 5St


1sale id L


5VSa = �Schema(V ) (�store id;manager �state=CA St

1store id �sale id;store id;month �year=1996 5Sa

1sale id L


5VL = �Schema(V ) (�store id;manager �state=CA St


1sale id 5L


5VI = �Schema(V ) (�store id;manager �state=CA St


1sale id L

1item id �item id;item name �category=toy 5I)

Table 7: Maintenance Expressions for Deletions

Property 4.2 (Deletion Property for Keys) Given view V = �Schema(V )(��R11 : : :1��Rn),

if the key of a relation Ri is preserved in V then the following equivalence holds:

�Schema(V )(��R11 : : :1��Ri�11��5Ri1��Ri+11 : : :1��Rn) � �Schema(V )(V 1key(Ri) 5Ri)

�

Consider the join graph G(V ) of view V . Property 4.2 says that if V preserves the key of some

relation Ri (i.e., Need(Ri; G) = �), then we can calculate the e�ect on V of deletions to Ri by

joining V with 5Ri on the key of Ri. The property holds because each tuple in V with the same

value for the key of Ri as a tuple t in 5Ri must have been derived from t. Conversely, all tuples in

V that were derived from tuple t in 5Ri must have the same value as t for the key of Ri. Therefore,

the set of tuples in V that join with t on the key of Ri is exactly the set of tuples in V that should

be deleted when t is deleted from Ri. A similar property holds if the key of Ri is not preserved in

V , but is equated by a selection condition in V to an attribute C that is preserved in V . In this

case the e�ect of deletions from Ri can be obtained by joining V with 5Ri using the join condition

V:C = key(Ri).

Property 4.2 is used in [GJM96] to determine when a view is self-maintainable with respect to

deletions from a base relation. We extend their result with Property 4.3.

18

Property 4.3 (Deletion Property for Key Joins) Given a view

V = �Schema(V )(��R11 : : :1��Rn) satisfying the following conditions:

1. V contains join conditions Ri:A = Ri+1:B, Ri+1:A = Ri+2:B, : : :, Ri+k�1:A = Ri+k:B

2. attribute A is a key for Ri+j(0 <= j <= k), and

3. Ri+k:A is preserved in V ,

then the following equivalence holds (even without referential integrity constraints):

�Schema(V )(��R11 : : :1��Ri�11��5Ri1��Ri+11 : : :1��Rn) �

�Schema(V )(V 1Ri+k:A ��Ri+k 1Ri+k:B=Ri+k�1:A ��Ri+k�1 1 : : :1Ri+1:B=Ri:A 5Ri)�

Let G(V ) be the join graph for view V . Property 4.3 generalizes Property 4.2 to say that if V

preserves the key of some relation Ri+k and Ri joins to Ri+k along keys (that is, Need(Ri; G) =

fRi+1; : : : ; Ri+kg and does not include all the base relations of V ), then we can calculate the e�ect

on V of deletions to Ri by joining 5Ri with the sequence of relations up to Ri+k and then joining

Ri+k with V . The property holds because tuples in V with the same value for the key of Ri+k as a

tuple t in Ri+k must have been derived from t as explained in Property 4.2. Furthermore, since the

joins between Ri+k and Ri are all along keys, each tuple in Ri+k can join with at most one tuple

t0 in Ri, which means that tuples in V that are derived from tuple t in Ri+k must also be derived

from tuple t0 in Ri. Conversely, if a tuple in V is derived from t0 in Ri, then it must have the same

value for the key of Ri+k as some tuple t in Ri+k that t0 joins with. Therefore, the set of tuples in

V that join on the key of Ri+k with some tuple t in Ri+k that joins along keys with a tuple t0 in

Ri is exactly the set of tuples in V that should be deleted when t0 is deleted from Ri. As before,

a similar property also holds if a key of Ri+k is not preserved in V but is equated by a selection

condition in V to an attribute C that is preserved in V . In this case the e�ect of deletions from Ri

can be obtained by joining V with Ri+k using the join condition V:C = key(Ri+k).

Rule 4.2 (Deletion Rule) Let V be a view with a tree structured join graph G(V ), and let

Need(Ri; G) = fRi+1; : : : ; Ri+kg, where k � 0. The maintenance expression calculating the ef-

fect on a view V of deletions to a base relation Ri may be simpli�ed according to Property 4.3 to

reference V unless Need(Ri; G) includes all the base relations of V except Ri. �

Rule 4.2 is used to simplify maintenance expressions for deletions to use the contents of the

view and fewer base relations. This allows us to rewrite the maintenance expressions for deletions

to use the auxiliary views instead of base relations.


After simplifying the maintenance expressions according to Rule 4.2, the simpli�ed expressions are

rewritten to use the auxiliary views generated by Algorithm 3.1 by replacing each ��Ri subex-

pression in the maintenance expression with the corresponding auxiliary view ARifor Ri. This

rewriting is similar to the rewriting for insertions.

19

Consider again our running example. The maintenance expressions of Table 7 are simpli�ed

using Rule 4.2 as follows.

5VSt = �Schema(V ) (V 1sale id �sale id;store id�year=1996Sa 1store id 5St)

5VSa = �Schema(V ) (V 1sale id 5Sa)

5VL = �Schema(V ) (V 1line id 5L)

5VI = �Schema(V ) (V 1item id 5I)

The �rst maintenance expression is rewritten to use the auxiliary view ASa in place of base relation

Sa. The other maintenance expressions do not reference any base relations so they are not rewritten

to use auxiliary views.

A proof that the auxiliary views are su�cient in general to evaluate the (simpli�ed) maintenance

expressions for deletions appears in Appendix A. For this example note that ASa is the only

auxiliary view referenced in the maintenance expressions for deletions. Because all tuples in Sa

that could contribute to tuples in the view are also in ASa, when calculating the e�ect on the view

of deletions to St it is su�cient to join 5St with tuples in ASa.

4.4 Protected Updates

In this section we show how the e�ect on a view of protected updates to base relations can be

calculated using the auxiliary views. (Recall that exposed updates are treated separately as dele-

tions followed by insertions.) We give two maintenance expressions for calculating the e�ect on a

view V of protected updates to a base relation R: one returning the tuples to delete from the view

(denoted as 5�VR) and another returning the tuples to insert into the view (denoted as 4�VR). In

practice, these pairs of maintenance expressions usually can be combined into a single SQL update

statement.

The view maintenance expression for calculating the tuples to delete from an SPJ view V due to

protected updates to a base relation R is obtained by substituting �old�R (the old attribute values

of the updated tuples in R) for base relation R in the relational algebra expression for V . (Recall

that �R, �old, and�new were de�ned in Section 2.) The view maintenance expression for calculating

the tuples to insert is obtained similarly by substituting �new�R (the new attribute values of the

updated tuples in R) for base relation R in the relational algebra expression for V . For example,

the view maintenance expressions calculating the tuples to delete from our cal toy sales view due

to protected updates to each of the base relations are given in Table 8. Expressions calculating the

tuples to insert into the view cal toy sales are not shown but can be obtained by substituting

�new for �old in the expressions of Table 8. Note that the Table 8 expressions are similar to the

deletion expressions of Table 7.

We simplify the maintenance expressions for protected updates similarly to the way we simplify

the maintenance expressions for deletions, by using the contents of the view itself if keys are

preserved in the view. We give the following properties and rule for updates in the presence of

preserved keys. In the following let P (�Ri) = (Schema(�Ri)\Schema(V ))[Schema(V ). We use

�oldP (�Ri)

and �newP (�Ri)

to project the old and new attribute values respectively of preserved attributes

20

5�VSt = �Schema(V ) (�oldstore id;manager �state=CA �St


1sale id L


5�VSa = �Schema(V ) (�store id;manager �state=CA St

1store id �oldsale id;store id;month �year=1996 �Sa

1sale id L


5�VL = �Schema(V ) (�store id;manager �state=CA St


1sale id �oldline id;sale id;item id;sales price �L


5�VI = �Schema(V ) (�store id;manager �state=CA St


1sale id L

1item id �olditem id;item name �category=toy �I)

Table 8: Maintenance Expressions for Removing Old Updates

in �Ri and the (regular) attribute values for preserved attributes of other relations in V . We use

1oldkey(Ri) to denote joining on the attribute in which the key value before the update is held.

Property 4.4 (Protected Update Property for Keys) Given a view

V = �Schema(V )(��R11 : : :1��Rn) where the key of a relation Ri is preserved in V , then the

following equivalences hold:

�Schema(V )(��R11 : : :1��Ri�11�old��Ri1��Ri+11 : : :1��Rn) � �old

P (�Ri)(V 1oldkey(Ri) �Ri)

�Schema(V )(��R11 : : :1��Ri�11�new

��Ri1��Ri+11 : : :1��Rn) � �newP (�Ri)

(V 1oldkey(Ri) �Ri)�

Property 4.5 (Protected Update Property for Key Joins) Given a view

V = �Schema(V )(��R11 : : :1��Rn) satisfying the following conditions:

1. view V contains join conditions Ri:A = Ri+1:B; Ri+1:A = Ri+2:B; : : : ; Ri+k�1:A = Ri+k:B

2. attribute A is a key for Ri+j(0 <= j <= k), and

3. Ri+k:A is preserved in V ,

21

then the following equivalences hold (even without referential integrity constraints):

�Schema(V )(��R11 : : :1��Ri�11�old��Ri1��Ri+11 : : :1��Rn) �

�oldP (�Ri)

(V 1Ri+k:A Ri+k 1Ri+k :B=Ri+k�1:A Ri+k�1 1 : : :1Ri+1 :B=Ri:A �Ri)

�Schema(V )(��R11 : : :1��Ri�11�new

��Ri1��Ri+11 : : :1��Rn) �

�newP (�Ri)

(V 1Ri+k:A Ri+k 1Ri+k :B=Ri+k�1:A Ri+k�1 1 : : :1Ri+1 :B=Ri:A �Ri)�

Properties 4.4 and 4.5 are similar to the corresponding properties for deletions. Attributes of

Ri that are involved in selection conditions are guaranteed not to be updated, so it does not matter

whether we test the old or new value in selection conditions. Property 4.4 is used in [GJM96] to

determine when a view is self-maintainable for base relation updates.

Consider the join graph G(V ) of view V . Property 4.5 generalizes Property 4.4; Property 4.5

says that if V preserves the key of some relation Ri+k and Ri joins to Ri+k along keys (that is,

Need(Ri; G) = fRi+1; : : : ; Ri+kg and does not include all the base relations of V ), then we can

calculate the e�ect on V of protected updates to Ri by joining �Ri with the sequence of relations

up to Ri+k and then joining Ri+k with V . As for deletions, a similar property also holds if a key

of Ri+k is not preserved in V but is equated by a selection condition in V to an attribute C that

is preserved in V . In this case the e�ect of updates to Ri can be obtained by joining V with Ri+k

using the join condition V:C = key(Ri+k).

Rule 4.3 (Protected Update Rule) Let V be a view with a tree structured join graph G(V ),

and let Need(Ri; G) = fRi+1; : : : ; Ri+kg, where k � 0. The maintenance expressions calculating

the e�ect on a view V of protected updates to a base relation Ri can be simpli�ed according to

Property 4.5 to reference V unless Need(Ri; G) includes all the base relations of V except Ri. �

Similar to the rule for deletions, Rule 4.3 is used to simplify the maintenance expressions for

5�VR and 4�VR to use the contents of the view and fewer base relations so that the maintenance

expressions can be rewritten in terms of the auxiliary views.


After simplifying the maintenance expressions according to Rule 4.3, the simpli�ed expressions are

rewritten to use the auxiliary views generated by Algorithm 3.1 by replacing each ��Ri subexpres-

sion in the maintenance expression with the corresponding auxiliary view ARifor Ri. The rewriting

is similar to the rewriting for insertions and deletions.

Consider again our running example. The maintenance expressions of Table 8 are simpli�ed

using Rule 4.3 as follows.

5�VSt = �oldP (�St) (V 1sale id Sa 1store id �St)

5�VSa = �oldP (�Sa) (V 1oldsale id �Sa)

5�VL = �oldP (�L) (V 1oldline id �L)

5�VI = �oldP (�I) (V 1olditem id �I)

22

The maintenance expressions 4�VSt, 4�VSa, 4�VL, and 4�VI are obtained by substituting �new for

�old in the above expressions.1 The maintenance expressions 5�VSt and 4�VSt are rewritten to use

the auxiliary view ASa in place of base relation Sa. No other maintenance expressions reference

base relations. The rewritten expressions using the auxiliary views are equivalent to the expressions

using the base relations for the same reason that rewriting maintenance expressions for deletions

to use auxiliary views preserves equivalence. Again, a proof that the auxiliary views are su�cient

in general to evaluate the maintenance expressions is given in Appendix A.

5 Maintaining Auxiliary Views

This section shows that the set of auxiliary views is itself self-maintainable. We begin with an

intuitive argument for this claim based on join graphs. Recall that the auxiliary views derived by

Algorithm 3.1 are of the form:

AR = (�Schema(AR)�SR) ><C1AR1><C2AR2

><C3 : : :><CmARm

where Ci equijoins a foreign key of R with the corresponding key for relation Ri. Because the joins

are along foreign key referential integrity constraints, each semijoin could be replaced by a join.

Thus, each auxiliary view is an SPJ view, and its join graph can be constructed as discussed in

Section 3. Further, note that the join graph for each auxiliary view is a subgraph of the graph for

the original view, because each join in an auxiliary view is also a join in the original view. Thus, the

information needed to maintain the original view is also su�cient to maintain each of its auxiliary

views.

Below we give the maintenance expressions for the auxiliary views. The auxiliary views are a

special class of SPJ views where all joins are across foreign key constraints. Thus, their maintenance

expressions are simpler than the expressions for general views described in Section 4. Because of the

simple structure of auxiliary views, they are extremely e�cient to maintain. In fact, maintenance

expressions for propagating deletions and protected updates to auxiliary views involve only a single

join, independent of the number of joins in the auxiliary view. For propagating insertions to

auxiliary view AR, only one maintenance expression is needed, because only the insertions to R

a�ect the auxiliary view.

5.1 Insertions

For the auxiliary view AR as given above, the maintenance expression for insertions is is:

4AR = �Schema(AR)((�S4R) 1C1AR11C2AR2

1C3 : : :1CmARm)

Insertions into any of the auxiliary views used to de�ne AR do not a�ect 4AR due to Rule 4.1: a

tuple inserted into any auxiliary view ARiwill not join with any tuple in relation R because of the

referential integrity constraint from R to Ri. Thus, insertions to ARineed not be propagated to

AR.

1Note that in the 4� maintenance expressions we still join on the old attribute values of the keys. For example, if

the key of L were updated we would still join V with �L on the old value of the key of L in 4�VL.

23

5.2 Deletions

For the auxiliary view AR as given above, the maintenance expression for deletions is:

5AR = �Schema(AR)( AR 1key(R) (�S5R)

[ AR 1C1 5AR1

[ : : :

[ AR 1Cm 5ARm)

Deletions from auxiliary view ARican be joined directly with AR because AR contains the keys of

each ARi.

5.3 Protected Updates

We calculate the e�ect on AR of protected updates to base relation R by giving two maintenance

expressions: one returning the tuples to delete from AR and another returning the tuples to insert

into AR. Protected updates to the auxiliary views AR1; : : : ; ARm do not a�ect AR because AR

includes only attributes from relation R. The maintenance expression for updates is:

5�AR = �oldSchema(AR)

(AR 1old key(R) �R)

4�AR = �newSchema(AR)

(AR 1old key(R) �R)

In practice this pair of expressions can be combined into a single SQL update statement.

5.4 Evaluating the Maintenance Expressions

The maintenance expressions for an auxiliary view AR rely upon auxiliary views AR1; : : : ; ARm.

Thus, to compute changes to AR, we �rst need to compute changes to AR1; : : : ; ARm. We can order

the computation of auxiliary view maintenance expressions because the auxiliary view de�nitions

are non-recursive. That is, an auxiliary view will not (directly or indirectly) depend on itself. More

formally, consider a graph that has a node for each auxiliary view and an edge from the node for

view A to the node for view B if A occurs in the de�nition of B; this graph is guaranteed to be

acyclic: AR uses ARionly if relation Ri is in the Dep set of R, and a relation cannot be in the

closure of its own Dep set for tree-structured join graphs. Thus, auxiliary views can be maintained

in a bottom-up fashion, starting with views that depend on no other auxiliary views, and working

up to the �nal original view.

6 Related Work, Summary, and Future Directions

The problem of view self-maintainability was considered initially in [BCL89,GJM96]. For each

modi�cation type (insertions, deletions, and updates), they identify subclasses of SPJ views that

can be maintained using only the view and the modi�cation. [BCL89] states necessary and su�cient

conditions on the view de�nition for the view to be self-maintainable for updates speci�ed using

a particular SQL modi�cation statement (e.g., delete all tuples where R:A > 3). [GJM96] uses

24

information about key attributes to determine self-maintainability of a view with respect to all

modi�cations of a certain type.

In this paper we consider the problem of making a view self-maintainable by materializing a

set of auxiliary views such that the original view and the auxiliary views taken together are self-

maintainable. Although the set of base relations over which a view is de�ned forms one such set

of auxiliary views, our approach is to derive auxiliary views that are much smaller than storing

the base relations in their entirety. Identifying a set of small auxiliary views to make another view

self-maintainable is an important problem in data warehousing, where the base relations may not

be readily available.

In [HZ96], views are made self-maintainable by pushing down selections and projections to the

base relations and storing the results at the warehouse. Thus, using our terminology, they consider

auxiliary views based only on select and project operators. We improve upon their approach

by considering auxiliary views based on select, project, and semijoin operators, along with using

knowledge about key and referential integrity constraints. We have shown in Section 1.1 that our

approach can signi�cantly reduce the sizes of the auxiliary views. We also have shown (Section 5)

that auxiliary views of the form our algorithm produces can be (self-)maintained e�ciently.

In [TSI94], inclusion dependencies (similar to referential integrity constraints) are used to de-

termine when it is possible to answer from a view joining several relations, a query over a subset

of the relations; e.g., given V is a view joining R and other relations, when �Schema(R)V � R.

We on the other hand, use similar referential integrity constraints to simplify view maintenance

expressions.

As part of our work, we have shown that by using key and referential integrity constraints,

maintenance expressions calculating the e�ect on a view of insertions to certain base relations are

guaranteed to be empty. Thus, these maintenance expressions do not have to be evaluated, reducing

the overall cost of maintaining the view. To the best of our knowledge, no previous work considers

using key and referential integrity constraints to reduce the sizes of auxiliary relations required to

maintain a view, or even to reduce the cost of view maintenance in a traditional setting.

In the future we plan to extend our approach along three directions:

� Extend the class of views considered to include aggregation.

� Extend the class of auxiliary views considered to use join and anti-semijoin operators.

� Rather than restricting ourselves to a single view, consider the problem of making a set of

views self-maintainable. Considering a set of views together can produce opportunities for

\sharing" among views and auxiliary views.

References

[BCL89] J. Blakeley, N. Coburn, and P. Larson. Updating derived relations: Detecting ir-

relevant and autonomously computable updates. ACM Transactions on Database

Systems, 14(3):369{400, September 1989.

25

[CGL+96] L. Colby, T. Gri�n, L. Libkin, I. Mumick, and H. Trickey. Algorithms for deferred

view maintenance. In SIGMOD 1996.

[DEB95] IEEE Data Engineering Bulletin, Special Issue on Materialized Views and Data

Warehousing, 18(2), June 1995.

[GJM96] A. Gupta, H. Jagadish, and I. Mumick. Data integration using self-maintainable

views. In EDBT, 1996.

[GL95] T. Gri�n and L. Libkin. Incremental maintenance of views with duplicates. In

SIGMOD 1995.

[GM95] A. Gupta and I. Mumick. Maintenance of Materialized Views: Problems, Techniques,

and Applications. In IEEE Data Engineering Bulletin, Special Issue on Materialized

Views and Data Warehousing [DEB95], pages 3{19.

[GM96] A. Gupta and I. Mumick, editors. Materialized Views. MIT Press, Cambridge, MA,

1996. To be published.

[GMS93] A. Gupta, I. Mumick, and V. Subrahmanian. Maintaining views incrementally. In

SIGMOD 1993.

[HGW+95] J. Hammer, H. Garcia-Molina, J. Widom, W. Labio, and Y. Zhuge. The Stanford

Data Warehousing Project. In IEEE Data Engineering Bulletin, Special Issue on

Materialized Views and Data Warehousing [DEB95], pages 41{48.

[HZ96] R. Hull and G. Zhou. A framework for supporting data integration using the mate-

rialized and virtual approaches. In SIGMOD 1996.

[LS93] A. Levy and Y. Sagiv. Queries independent of updates. In VLDB 1993.

[Mum95] I. Mumick. The Rejuvenation of Materialized Views. In Proceedings of the Sixth

International Conference on Information Systems and Management of Data (CIS-

MOD), Bombay, India, November 1995.

[SP89] A. Segev and J. Park. Updating distributed materialized views. IEEE Transactions

on Knowledge and Data Engineering, 1(2):173{184, June 1989.

[TSI94] Odysseas G. Tsatalos, Marvin H. Solomon, and Yannis E. Ioannidis. The GMAP:

A versatile tool for physical data independence. In Jorge Bocca, Matthias Jarke,

and Carlo Zaniolo, editors, Proceedings of the 20th International Conference on Very

Large Databases, pages 367{378, Santiago, Chile, September 12-15 1994.

[ZGHW95] Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom. View maintenance in a

warehousing environment. In SIGMOD 1995.

26

A Informal Proof of Theorem 3.1

Proof: We show that for view V with join graph G(V ) and the set of corresponding auxiliary views A

derived by Algorithm 3.1,

1. A is su�cient to maintain V ,

2. A is self-maintainable, and

3. A is the unique minimal set of views such that the above two conditions hold.

By minimal we mean that no element AR 2 A can be removed, and no additional selection conditions or

semijoins can be applied to AR to further reduce the number of tuples in AR. The above claims hold when

the auxiliary views considered are of the form:

ARi = (��Ri) ><ARj1><ARj2>< : : :><ARjn

First we show that A is su�cient to maintain V .

1. We show that the maintenance expressions propagating insertions to the base relations onto V rewrit-

ten by substituting for each base relation R its corresponding auxiliary view AR 2 A are equivalent

to the maintenance expressions using the base relations.

� IfAR is not created then R is the root ofG(V ), andDep+(R;G) contains every relation referenced

in V other than R. Therefore, every maintenance expression that calculates the e�ect on V of

insertions to a base relation other than R will be eliminated by Rule 4.1. The reason is that for

every other relation S, there is a relation S0 such that S 2 Dep(S0 ; G). Thus, the maintenance

expression that calculates the e�ect on V of insertions to S will be eliminated by Rule 4.1.

Hence the only maintenance expression remaining will be the one that calculates the e�ect on V

of insertions to R, and R is not referenced in that maintenance expression (4R is referenced).

� If AR is of the form AR = (��R) ><AR1><AR2

><: : :><ARk , then we need to show that the

only tuples of R that could contribute to the maintenance expressions are those that are joinable

with AR1; AR2

; : : : ; ARk. We show this inductively. Let the level of a relation S be de�ned as

the maximum path length from S to a leaf node in G(V ). For level 0, AR is of the form ��R,

then AR contains all tuples of R that could contribute to any maintenance expression. For level

m+1, We illustrate the claim using only AR1out of AR1

; AR2; : : : ; ARk, where each Ri is in level

at most m. R1 2 Dep(R;G), so the maintenance expression that calculates the e�ect on V of

insertions to R1 will be eliminated by Rule 4.1. Thus, whenever R appears in a maintenance

expression M it is joined with R1. Since by induction AR1contains all tuples of R1 that could

contribute to any maintenance expression, only tuples in R that are joinable with tuples in AR1

could contribute to M .

2. We now show that the maintenance expressions propagating deletions to the base relations onto

V rewritten by substituting for each base relation R its corresponding auxiliary view AR 2 A are

equivalent to the maintenance expressions using the base relations.

� If AR is not created then 6 9S such that R 2 Need(S;G). Therefore, after rewriting the mainte-

nance expressions according Rule 4.2, R will not appear in any maintenance expression.

� Consider AR of the form AR = (��R) ><AR1><AR2

><: : :><ARk , and let AR appear in the

maintenance expression M calculating the e�ect onto V of deletions to some base relation S.

The only tuples in R that join with some tuple in S and could contribute to the view are those

in AR. Since 5S � S, all the tuples in R that join with some tuple in 5S and could contribute

to M are in AR.

27

3. The rules for simplifying maintenance expressions for updates are similar to the rules for simplifying

maintenance expressions for deletes. Therefore, the argument showing that the maintenance expres-

sions propagating updates to the base relations onto V rewritten by substituting for each base relation

R its corresponding auxiliary view AR 2 A is similar to the argument for deletions.

Next we show that A is self-maintainable.

AR of the form AR = (��R) ><AR1><AR2

>< : : :><ARk ,

Consider again AR = (�Schema(AR)�SR) ><C1AR1

><C2AR2

><C3: : :><CmARm . Because the joins are

along foreign key referential integrity constraints, each semijoin could be replaced by a join. Thus, each

auxiliary view is an SPJ view, and its join graph can be constructed as discussed in Section 3. Further, note

that the join graph for each auxiliary view is a subtree of the graph for the original view, because each join

in an auxiliary view is also a join in the original view. Thus, every auxiliary view generated by Algorithm 3

for maintaining AR is also in the set A for maintaining the original view.

Finally we show that A is a unique minimal set.

Every auxiliary view AR 2 A is necessary. Suppose Dep+(R;G) 6= R�fRg (R is not the root of G(V )).

Then AR appears in the maintenance expression calculating the e�ect onto V of insertions to some other

relation S. If Dep+(R;G) = R � fRg (R is the root of G(V )), then AR is only created if 9S such that

R 2 Need(S), in which case AR appears in the maintenance expression calculating the e�ect onto V of

deletions to S.

For each auxiliary view AR 2 A, we cannot further reduce the number of tuples in AR by adding

additional selections or semijoins. We cannot add additional selections because by de�nition all possible

selection conditions are applied to R in AR. There are two cases to consider for why we cannot add a

semijoin with an auxiliary expression AS .

1. If S 2 Dep(R;G), AR already includes a semijoin with AS .

2. If S 62 Dep(R;G), then tuples in 4S could join with tuples in R that are not in R><S, because

Property 4.1 does not hold. Therefore, the maintenance expression calculating the e�ect onto V of

insertions to S could potentially miss insertions into V .

B General Join Graphs

In Section 3 we restricted our views to have join graphs that are forests of trees, as a simpli�cation

for deriving the auxiliary views. Here we discuss the impact of allowing general join graphs. We

also consider self-joins.

First consider an arbitrary join graph for a view without self-joins. We de�ne the RI projection

of a join graph G(V ) to be the graph obtained by removing from G(V ) all edges that are not

labeled RI . In other words, we remove edges due to joins that do not imply a referential integrity

constraint.

B.1 Acyclic RI Projections

We �rst consider the case where the RI projection is acyclic. We compute the Dep+(Ri; G) sets

as discussed in Section 3. Need(Ri; G) sets are computed in three phases: First, we initialize the

28

need sets using the following de�nition:

Need(Ri; G) =

8>>>>>>><>>>>>>>:

� if the key of Ri is preserved in V ,

ffRjg[Need(Rj; G)g if the key of Ri is not preserved in V and there is an Rj

such that e(Rj; Ri) in G(V ), and e(Rj; Ri) is not part

of a cycle

undefined otherwise

We note that multiple need sets are possible since a node Ri can now have multiple parents. Second,

we consider the nodes on each cycle. If all the nodes on the cycle have unde�ned need sets, we set

Need(Ri) = fRg, that is the set of all relations except relation Ri. However, if some node Ri on

the cycle has a de�ned need set, we add to the need sets of other nodes Rk on the cycle the sets:

ffRjg[Need(Rj; G)g

provided the key of Rk is not preserved in V and there is an Rj such that e(Rj ; Ri) is part of the

cycle under consideration. If a cycle is contained wholly within another cycle, the containing cycle

should be considered �rst.

Thirdly, if a relation Ri has multiple need sets, we will remove any need sets containing the

unique root of the graph (if there is a unique root).

After these steps, we will use Algorithm 3 to compute the set of auxiliary views. We now reason

why there exists a minimal set of auxiliary views, and why Algorithm 3 adapted as above computes

this set.

Consider a connected component of the full join graph. The graph can be seen as a generalization

of a tree where (1) there can be more than one root, and (2) an internal node can have more than

one parent. Having more than one root does not impact either the Algorithm 3 to compute the

auxiliary views (except that the condition (Dep+(Ri; G) = R � fRig) cannot be true for any

relation Ri), or the expressions that maintain the view using the auxiliary views. When a node

R has multiple parents, it can have multiple need sets, one through each parent. The need sets

determine which relations can be joined with the deleted or updated tuples of R to determine

deleted or updated tuples in the view. When there are multiple need sets, and if more than one

is available in an auxiliary expression, there will be multiple maintenance expressions for deletions

and updates to relation R, and one then has to choose between them. However, it is interesting

that even in presence of multiple need sets, there is still a unique minimal set of auxiliary views.

To see why this holds, note that the only application of the need sets on choosing auxiliary views is

that if the graph has a single root then an auxiliary view may not be needed for the root unless it is

in some relation's Need set. Therefore, for graphs with a single root, if a relation has multiple need

sets one of which contains the root, we should remove the need set containing the root. Algorithm 3

will then choose the minimal set of auxiliary views.

Need sets are used to simplify the maintenance expressions for deletions and updates. Since

we have multiple need sets, we can chose any one of these to simplify the expressions by Rules 4.2

and 4.3.

29

B.2 Cyclic RI Projections

We now consider nonrecursive views whose RI projection is cyclic. We construct a reduced join

graph by coalescing all strongly connected components of the RI projection into single nodes, and

de�ne an intermediate view representing the join of all relations in the cycle, with a projection that

preserves the keys of the base relations.

The new view de�nition, in terms of the intermediate views, has an acyclic join graph, and is

solved as discussed in Sections B.1. The solution tell us whether an auxiliary view corresponding

to the intermediate view needs to be materialized. If one needs to be materialized, then we have

two choices:

� Materialize a select-project expression of each relation in the cycle, restricted by semi-joins

de�ned on the intermediate view.

� Materialize the intermediate view. (This is one case where we deviate from our standard form

of auxiliary expressions.)

Note that for the second option, we do not need to materialize any other auxiliary view. The cycle

of RI joins implies that new insertions into any relation in the cyclic join will not cause any change

in the result of the join.

B.3 Self-joins

Let a view be de�ned by joining multiple (for simplicity, say two) occurrences of relation T . Label

each occurrence with a unique name Ta and Tb, and run Algorithm 3 on the resulting join graph

G to determine the auxiliary views ATa and ATb , if any.

We have the following scenarios:

� One or both of the auxiliary views ATa and ATb are not needed, or the two are identical: In

this case materialize the auxiliary view that is needed.

� The auxiliary views ATa and ATb are both needed and they are not equal: In this case we

have two options:

{ Materialize a select-project expression of relation T . The selection condition is the

disjunction of the selection conditions in Ta and Tb, and the projection list is the union

of the projection lists of Ta and Tb.

{ Materialize both the auxiliary views ATa and ATb.

In other words we have two minimal sets of auxiliary views.

30

making views self-main - stanford universityilpubs.stanford.edu/157/1/1996-30.pdfmaking views...

Documents