object-oriented databases: design and implementation

Object-Oriented Databases: Design and Implementation John V. Joseph Satish M. Thatte Craig W. Thompson David L. Wells Reprinted from PROCEEDINGS OF THE IEEE Vol. 79, No. 1, January 1991 PROC/79/1/ /41186



Object-Oriented Databases: Design and Implementation

JOHN V. JOSEPH, MEMBER, IEEE, SATISH M. THATTE, SENIOR MEMBER, IEEE, CRAIG W. THOMPSON, SENIOR MEMBER, IEEE, AND DAVID L. WELLS, MEMBER, IEEE

Invited Paper

Object-oriented database systems aim at meeting the data modeling, performance, cooperative design, and version management requirements of next-generation applications, such as CAD, CAM, CASE, hypermedia, and expert systems. These needs cannot be met with conventional database systems, which have been developed primarily for business and financial applications. Object-oriented database (OODB) systems represent the confluence of ideas from object-oriented programming languages and database management. The paper presents key features of OODB's, provides a taxonomy of approaches to OODB's, and discusses key OODB architectural and implementation issues, design alternatives, and trade-offs. It provides a brief summary of a variety of OODB systems, both research prototypes and commercial systems. Finally, it discusses industry efforts to accelerate a consensus that can lead to standards in the OODB area.

I. INTRODUCTION

Many "next-generation" applications, such as computer-aided design and manufacturing systems, computer-aided software engineering, multimedia and hypermedia information systems, and artificial intelligence expert systems, require databases that can support objects of a wide variety of types with the ability to express complex relationships among objects. Conventional database systems (network, hierarchical, or relational models) are not adequate to support the requirements of these applications. Some of the key requirements are: modeling power, performance, long-term and cooperative "transactions" necessary for design activities, and version and configuration management.

Object-oriented database (OODB) systems represent the confluence of ideas from object-oriented programming languages and database management. Object-oriented programming languages, such as Smalltalk, Common Lisp Object System, and C++, provide rich data abstraction capabilities, including powerful modeling capabilities based on the ability to define abstract data types and construct type hierarchies that permit property inheritance. Database systems provide long-term reliable data storage, multiuser access, concurrency control, query, recovery, and security capabilities. OODB's combine the advantages of object-oriented programming languages with those of database systems.

Manuscript received August 17, 1990; revised October 12, 1990. The authors are with the Information Technologies Laboratory, Texas Instruments Inc., Dallas, TX 75265. IEEE Log Number 9041186.

For instance, an OODB is not just a repository of "passive" data, such as numbers and character strings, but can contain a rich variety of complex objects such as arrays, vectors, graphics bit maps, and pictures, along with procedures that operate on objects in response to the messages sent to them.

In the past few years, two schools within the database research community have emerged to respond to the needs of these next-generation applications. The first school has advocated the approach of extending relational database technology with object extensions. The Third Generation Database Manifesto [1] articulates the position of this school. The second school has advocated a ground-up approach of developing object-oriented database technology, as articulated in the Object-Oriented Database Manifesto [2]. Several research prototypes and a few commercial products representing both schools have been developed in the past five years. Both schools have vigorously debated their positions [3]. It is not the primary purpose of this paper to compare or evaluate the two distinct approaches taken by the schools. A brief comparison of the approaches can be found in [4]. This paper is concerned with characterizing OODB's. Since OODB's are a rapidly evolving field, some of our statements may be controversial and reflect our biases; we believe this is unavoidable.

We do not expect OODB's to replace conventional databases in commercial data processing applications for several years. The strength of OODB's will be their ability to support next-generation applications. Conventional databases and their extensions and OODB's will coexist to support different activities of an enterprise.

Although OODB technology has great potential, there are still many questions to be answered and challenges to be met before the technology can be considered mature. The variety of commercial products and research prototypes indicates that consensus and standards may be several years away. Performance benchmarks are just beginning to emerge. Feedback from application developers and resulting improvements are just beginning.

This paper will provide the reader with the current status of the rapidly emerging field of OODB's. Section II provides background information on object-oriented concepts and may be skipped by readers with this background. Section III describes the requirements of next-generation applications and how conventional databases fail to meet these needs. The section introduces the terminology used to describe OODB's and presents an introduction to OODB's in terms of the features they are expected to have; a taxonomy of various OODB approaches is also included. The first three sections of the paper are of a tutorial nature. Section IV also has a tutorial style but is geared more toward a reader interested in the state of the art in OODB research. This section discusses key OODB implementation issues, design alternatives, and trade-offs. Section V provides short sketches of a variety of OODB systems, both research prototypes and commercial systems. Section VI discusses some industry efforts to accelerate consensus that can lead to standards in the OODB area. Section VII concludes with a summary of the paper along with a discussion of open issues.

0018-9219/91/0100-0042$01.00 © 1991 IEEE

II. OBJECT-ORIENTATION

Object-orientation is becoming increasingly popular in the design and implementation of computer-based systems. This is because the object metaphor provides a natural way to map real-world objects and their relationships directly to computer representations. In this section, we will define and illustrate the object-oriented concepts and examine how these concepts provide a natural way to model real-world entities and their relationships. For further elaboration and related ideas, see [5]-[8].

The fundamental concepts of object-orientation are objects, object identity, classes, and inheritance. These concepts first appeared in programming languages. Many of the ideas of object-oriented programming date back to Simula-67 [9], [10]. Research in object-oriented programming has produced new programming languages, notably Smalltalk [11], Eiffel [5], and Trellis/Owl [12]. Object-oriented programming has also been supported by extensions to existing languages; some notable examples are C++ [13] and Objective-C [14] as extensions of C, and Flavors [15], LOOPS [16], and Common Lisp Object System (CLOS) [17] as extensions of Lisp. The field is now sufficiently mature that there are ANSI standardization efforts for the CLOS [17] and C++ [18] languages. Excellent introductions to object-oriented programming can be found in [19]-[21].

Frames [22] is an object-oriented knowledge representation scheme adopted in languages such as FRL [23] and KRL [24]. Commercial products like KEE from IntelliCorp and ART from Inference use frame-based knowledge representation. In the area of databases, there is an overlap between concepts in the object-oriented models and semantic data models [25], [26]. Most semantic data models do not take advantage of the power of the class concept in that the models do not support methods or inheritance. Object-oriented database research attempts to bring to databases the full power of the object-oriented concepts of objects and object identity, classes, and inheritance.

Productivity gains from the use of object-oriented concepts in programming, system design, and databases are widely claimed. Experiments by customers of Productivity Products International, Inc. indicate that for 30 graphics, CAD, and spreadsheet applications the amount of code using the Objective-C language was four times smaller than the code using the C language [27]. This 4X reduction in program code size also implies an attendant reduction in development and maintenance effort and cost. The developers of a program to balance airline over-booking and no-shows estimate that using the object-oriented approach allowed two programmers to complete in three years a task that normally would have required 20 people and a multimillion dollar budget [28]. In addition, the use of object-oriented concepts in program design and implementation appears to provide significant improvements in programmer productivity by facilitating software reuse [5], [6], [28].

A. Concepts and Definitions

Object-oriented systems share a common set of concepts. These concepts are described in this section. As yet, there is no consensus on a qualifying set of core requirements for a system to be called object-oriented.

1) Objects and Object Identity:

a) Objects: The one thing that is common to all object-oriented systems is the notion of "object." Objects are entities (data structures in a computer) that are used to represent abstract or concrete real-world things in the application domain being modeled. Examples of real-world entities include: a boiler in a computer-aided manufacturing (CAM) application, an adder in a computer-aided design (CAD) application, and a battleship in a military application. Objects are also used to model entities that are purely artifacts of a computer-based system, such as a window in a window system or an input/output buffer.

An object is an entity that has a local state and an ability to manipulate its local state in response to external requests. The local state of an object is the set of values of its attributes (variously called instance variables, properties, data members, or slots). The external requests are called messages, and the program code that operates on the state to change it in response to messages is called a method. The collection of all messages defined for an object constitutes its abstract interface or type. The collection of all methods of an object defines its behavior. The principle of data abstraction or encapsulation states that the local state and the methods of an object are not visible to users of the object; they may only interact with the object by making requests to the object through messages. This principle promotes modularity and maintainability. Since the user of an object cannot make assumptions about the implementation and internal representations of the object, the underlying implementations can be changed without affecting users.

b) Object identity: Each object is associated with a unique identifier, regardless of its current state [29]. The idea is that an object has an existence which is independent of its value. Identity is a stronger concept than simply a value describing an object; it cannot be changed in the same way other values describing the object can be changed. Although not supported by most object-oriented systems, it might even be possible for an object to change its type and retain its identity.

Since identity is a stronger concept than value, it is possible to distinguish between two objects that have the same value. To see the distinction, consider a relation SHIP (NAME, CAPTAIN) in a relational database consisting of "name-captain" tuples. We may ask what happens if two ships named "Sea Breeze" also happen to have captains named "Long John Silver." Considered as two ship objects, we can distinguish between them based on their object ID's regardless of their values. We cannot distinguish them as tuples in a relation, since a relation is a set and a tuple cannot be a member of a set more than once. The object identifiers in object-oriented databases and the record pointers in hierarchical databases are similar; the major difference is that object identifiers are logical pointers, whereas record pointers are physical pointers. A consequence of this is that object identifiers, but not record pointers, can be used for referential integrity [30], [31]. It is possible to simulate object identity in a value-based system (in a relational database, for example) by introducing object identifiers explicitly. This, however, places the responsibility of ensuring identifier uniqueness and referential integrity on the user.
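The distinction between value and identity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Ship class below is a hypothetical stand-in, the replacement captain name is invented for the example, and Python's built-in id() is an in-memory handle rather than the logical, persistent object identifier an OODB would provide.

```python
class Ship:
    """Hypothetical ship object; its identity is independent of its values."""

    def __init__(self, name, captain):
        self.name = name
        self.captain = captain


# Value-based view: a relation is a set of tuples, so equal tuples collapse.
relation = {
    ("Sea Breeze", "Long John Silver"),
    ("Sea Breeze", "Long John Silver"),
}

# Identity-based view: two objects with equal values remain distinct.
ship_a = Ship("Sea Breeze", "Long John Silver")
ship_b = Ship("Sea Breeze", "Long John Silver")
same_value = (ship_a.name, ship_a.captain) == (ship_b.name, ship_b.captain)
same_identity = ship_a is ship_b

# Changing a value does not change the identity of the object.
oid_before = id(ship_a)
ship_a.captain = "Jim Hawkins"  # invented value, for illustration only
oid_after = id(ship_a)
```

Here the set holds a single tuple, while the two Ship objects stay distinguishable even though their values are equal.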


2) Class: An object has attributes and behavior. Class is a means of grouping together objects that share the same attributes and behavior. A class is implemented by choosing a collection of attributes, or instance variables, in which to store the internal state of the instances, and by writing a method for each message defining the abstract interface of the instances of the class. A class is, therefore, the implementation of the abstract interface or type of an object; and an object's structure and behavior are defined by its type and its class. Members of the class are called instance objects or instances. In object-oriented systems, each object is an instance of some class. Class implements the data modeling concept instance-of. The distinction between "type" and "class" is important for the discussion of architectural choices and implementation issues in Section IV and is further elaborated there.

Class composition hierarchy: A class consists of a set of attributes. The domain of an attribute may be a class that, in turn, may have attributes with domains as classes. This nested structure of a class can give rise to a directed, possibly cyclic, graph representing the composition relationship between a class and its attributes. In object-oriented systems, the composition relationship is the equivalent of the data modeling concept of aggregation. The class composition hierarchy is orthogonal to the class inheritance hierarchy discussed below. The composition hierarchy provides a way to model rich, complex data structures without first flattening out the structure. Examples of aggregation relationships are is_part_of and is_owned_by.

3) Inheritance: Object-oriented systems allow the user to derive a new class from an existing class. The new class inherits all attributes and methods of the original class and may define additional attributes and methods and redefine inherited methods. The new class, a subclass, specializes the original class. The original class is called a superclass of the derived class and is its generalization. Inheritance realizes the data modeling concept is_a. It reduces the need to specify redundant information and hence simplifies updating and modification. It is also used to create objects that are almost like other objects with a few incremental changes. Mechanisms like this are important because they make it possible to declare that certain specifications are shared by multiple parts of a program. Inheritance helps to keep programs shorter and more tightly organized. The power of inheritance is in the economy of expression that results when a class shares description with its superclasses. The common use of class libraries is a good example of software reuse through the inheritance mechanism. Programmers build upon basic objects provided by class libraries by specializing the classes in the libraries.

When a class can have only one immediate superclass, it is called single inheritance; when a class can inherit from multiple classes, we have multiple inheritance. Multiple inheritance increases sharing by making it possible to combine descriptions from several classes. The graph resulting from the subclass-superclass relationship among classes is referred to as the inheritance hierarchy.

B. Illustration of the Concepts

In this subsection, the concepts discussed above are illustrated with an example. Figure 1 shows a Ship class. Instances of this class may be used to model real-world ships. Ship has attributes Next_Port, Position, Speed, and Heading. Next_Port points to an object of class Port, defined elsewhere, using an object identifier; Position is a pair of real numbers representing longitude and latitude; Speed and Heading are real numbers representing knots and compass heading, respectively. Ship's abstract interface consists of messages Get_Next_Port, Set_Next_Port, Get_Speed, Set_Speed, Get_Heading, Set_Heading, Get_Position, Set_Position, and Estimate_Time_Of_Arrival. The Get_ and Set_ messages get or set the values of the corresponding attributes. The method implementing the Estimate_Time_Of_Arrival message computes its result by sending a message to the Next_Port object asking where it is, finding the current time, and then using Position, Heading, and Speed to determine when the port will be reached (an error is signaled if the ship is not going in the right direction to reach the port). Note that since the method is hidden from the user, it is possible to change the way Estimate_Time_Of_Arrival is computed without affecting the user's code. It is also possible to reimplement the class with a different set of attributes without affecting the user; for example, the attribute Position may be replaced by two attributes Latitude and Longitude. Ship's methods cannot directly read or change the state of Next_Port; only Port's methods can do this. Because all interactions with Next_Port are through Port's abstract interface, changes to the implementation of Port do not affect Ship.

Fig. 1. Ship class. [Figure: the messages of the abstract interface, the attributes Next_Port, Position, Speed, and Heading, and the methods to handle messages.]
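The encapsulation argument for the Ship class can be made concrete with a small Python sketch. It rests on several assumptions that are not in the text: positions are treated as (x, y) coordinates in nautical miles on a flat plane, the estimate is returned as hours of travel rather than a clock time, the one-degree course tolerance is arbitrary, and the Port class is a hypothetical stub. Only a subset of the Get_ and Set_ messages is shown.

```python
import math


class Port:
    """Hypothetical Port class; Ship sees it only through get_position()."""

    def __init__(self, position):
        self._position = position  # (x, y) in nautical miles, an assumption

    def get_position(self):
        return self._position


class Ship:
    """Ship whose state is reachable only through Get_/Set_ style messages."""

    def __init__(self, next_port, position, speed, heading):
        self._next_port = next_port
        # Position is stored as two attributes; callers still see one pair.
        self._x, self._y = position
        self._speed = speed      # knots
        self._heading = heading  # compass degrees, 0 = north, 90 = east

    def get_position(self):
        return (self._x, self._y)

    def set_position(self, position):
        self._x, self._y = position

    def get_heading(self):
        return self._heading

    def set_heading(self, heading):
        self._heading = heading

    def estimate_time_of_arrival(self):
        """Return the hours needed to reach the next port (flat-plane model)."""
        port_x, port_y = self._next_port.get_position()  # message to Port
        dx, dy = port_x - self._x, port_y - self._y
        bearing = math.degrees(math.atan2(dx, dy)) % 360
        off_course = abs((bearing - self._heading + 180) % 360 - 180)
        if off_course > 1.0:  # tolerance in degrees, chosen arbitrarily
            raise ValueError("ship is not heading toward its next port")
        return math.hypot(dx, dy) / self._speed


port = Port((0.0, 30.0))
ship = Ship(port, position=(0.0, 0.0), speed=10.0, heading=0.0)
eta_hours = ship.estimate_time_of_arrival()
```

Because callers use only get_position and set_position, the internal split of Position into two attributes is invisible to them, which is the point the text makes about reimplementing the class.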

The attribute Next_Port is an instance of class Port; suppose that Port has an attribute called Located_In which is an instance of class City. The same Port object could be pointed to by other objects; e.g., a State object could point to the Port object as My_Port. Also, a City object could point to a State object by an attribute In_State. This, as illustrated in Fig. 2, is how composition hierarchies are formed. It is natural to model application entities as objects and construct a graph of these objects, with relationships among them realized as a composition hierarchy.

Fig. 2. Composition hierarchy. [Figure: objects linked by the attributes Next_Port, Located_In, and My_Port.]
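A composition hierarchy of this kind is simply a graph of object references. The following Python sketch builds the Ship, Port, City, and State graph described above; the class bodies and the instance names ("Houston", "Texas", and so on) are placeholders chosen for the example, not taken from the paper.

```python
class State:
    def __init__(self, name):
        self.name = name
        self.my_port = None  # My_Port: reference to a Port, set later


class City:
    def __init__(self, name, in_state):
        self.name = name
        self.in_state = in_state  # In_State: reference to a State


class Port:
    def __init__(self, name, located_in):
        self.name = name
        self.located_in = located_in  # Located_In: reference to a City


class Ship:
    def __init__(self, name, next_port):
        self.name = name
        self.next_port = next_port  # Next_Port: reference to a Port


# Build the graph; the State -> Port reference closes a cycle.
texas = State("Texas")
houston = City("Houston", in_state=texas)
harbor = Port("Houston Harbor", located_in=houston)
texas.my_port = harbor
ship = Ship("Sea Breeze", next_port=harbor)

# Navigate the composition hierarchy by following references.
state_of_next_port = ship.next_port.located_in.in_state
back_at_port = state_of_next_port.my_port
```

The same Port object is shared by the Ship and the State, and following the references from the ship leads back to that port, which illustrates why the graph may be cyclic.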

To illustrate inheritance, consider a new class Cargo_Ship as a subclass of Ship (see Fig. 3). A Cargo_Ship has all the properties of Ship, but can also carry cargo. To support this, Cargo_Ship adds the attribute Items and the following messages to the abstract interface of Ship: Load_Item and Unload_Item, which take as an argument an Item object to be loaded or unloaded, and List_Items, which returns a list of all Items currently on board. At the Cargo_Ship level, it might be necessary to reimplement the Set_Speed method by restricting the maximum Speed of the Ship when carrying certain kinds of Items. For example, it might be desirable to require a maximum speed of 10 knots for Cargo_Ships carrying oil when within 20 nautical miles of a coast.

Fig. 3. Inheritance hierarchy. [Figure: the Ship class with its messages, attributes, and methods to handle messages, and the Cargo_Ship class with the additional messages Load_Item, Unload_Item, and List_Items, the additional attribute Items, and methods to handle additional messages and redefined methods.]
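The Cargo_Ship specialization can be sketched in Python as follows. The sketch makes assumptions beyond the text: items are represented as plain strings, the distance to the coast is a hypothetical attribute set directly by the caller, and the restriction is applied by clamping the requested speed rather than by signaling an error.

```python
class Ship:
    """Minimal Ship with only the Speed attribute, enough for the example."""

    def __init__(self, speed=0.0):
        self._speed = speed  # knots

    def get_speed(self):
        return self._speed

    def set_speed(self, knots):
        self._speed = knots


class CargoShip(Ship):
    """Subclass: inherits Ship's interface, adds Items, redefines set_speed."""

    OIL_SPEED_LIMIT_KNOTS = 10.0
    COASTAL_ZONE_NM = 20.0

    def __init__(self, speed=0.0):
        super().__init__(speed)
        self._items = []
        self.distance_to_coast_nm = float("inf")  # hypothetical attribute

    def load_item(self, item):
        self._items.append(item)

    def unload_item(self, item):
        self._items.remove(item)

    def list_items(self):
        return list(self._items)  # copy, so callers cannot alter the state

    def set_speed(self, knots):
        """Redefined method: cap the speed when carrying oil near a coast."""
        near_coast = self.distance_to_coast_nm <= self.COASTAL_ZONE_NM
        if near_coast and "oil" in self._items:
            knots = min(knots, self.OIL_SPEED_LIMIT_KNOTS)
        super().set_speed(knots)


tanker = CargoShip()
tanker.load_item("oil")
tanker.distance_to_coast_nm = 5.0
tanker.set_speed(18.0)
speed_near_coast = tanker.get_speed()

tanker.distance_to_coast_nm = 50.0
tanker.set_speed(18.0)
speed_open_sea = tanker.get_speed()
```

Code written against Ship keeps working with a CargoShip, since get_speed is inherited unchanged and only set_speed is redefined.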

III. OBJECT-ORIENTATION AND DATABASES

Computer-based solutions are being sought for tasks that were previously thought to be too complex to be fully automated. Tasks like factory automation, command and control applications, multimedia information systems, and very large scale integration (VLSI) design involve processes dealing with a large number of objects, with different structures and complex behavior, interconnected in intricate networks. We refer to computer-based solutions to such tasks as next-generation applications. As object-oriented technology becomes more accepted, it is being applied to next-generation applications. In these applications, object bases are much larger than a machine's virtual memory, represent information that must be shared by multiple users at many sites, and must be preserved after the processes that created the object bases terminate. The field of "object-oriented database systems" (also called object data bases, object data management, and persistent object bases) has emerged primarily as a result of research into supporting these object bases. A variety of related technologies like "object-oriented analysis and design," "object-oriented communications," "object-oriented application integration frameworks," and "object-oriented interfaces" are also important for the success of next-generation applications; however, discussion of these topics is outside the scope of this paper. Some of these are discussed in [5] and [6]; also, many excellent textbooks are available on conventional database systems [32], [33].

In Section III-A, we describe the database requirements of next-generation applications; in Section III-B, we examine why conventional database systems fail to satisfy some of these requirements. In Section III-C, we present the features and characteristics of OODB's. These features and characteristics define a database model that addresses the deficiencies of conventional database systems. Relational database researchers try to remedy the deficiencies by extending or enhancing the relational model [1]. Their approach has a certain appeal because it is an evolutionary migration path and there is a large installed base of relational databases. However, we believe that the ability to make programming language structures directly persistent, without flattening them into tables, has a bigger appeal to next-generation applications. There are lively debates on the relative technical merits of the two approaches, for example [3]. When the debates move from academia to the users, the marketplace will determine the relative merits of the two approaches.

A. Database Requirements of Next-Generation Applications

Data management requirements of next-generation applications will be increasingly typified by the following requirements [1], [2], [34], [35]:

Rich Data Modeling: As described in Section II, object-oriented concepts can be used to model application data and relationships in a natural manner. This modeling is realized in the transient memory of the computer. Programming languages support object-oriented concepts by means of a rich system of user-defined classes. Many applications need to deal with persistent data (data that outlives the processes that created it) and share the data among multiple users. These applications will benefit from extending the powerful modeling capabilities of the transient environment to persistent, shared data.

Navigational as well as Query Access to Objects: Large applications typically create complex graphs of objects; it must be possible to efficiently navigate through these graphs even after they have been stored in a database. This is particularly important for applications such as hypermedia, which require efficient interactive performance. However, as applications become very large, it is often inconvenient to retrieve every object by name. It is also quite natural to retrieve objects by a query, e.g., in a software engineering application, retrieve definitions of all functions that call a given function. Thus, associative retrieval based on predicates needs to be supported, though not necessarily for all persistent objects.
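The two access styles can be contrasted with a small in-memory Python sketch based on the software engineering example. The Function class, the helper names reachable_from and select, and the sample call graph are all hypothetical; a real OODB would evaluate the predicate inside the database, possibly with index support, rather than scanning a list.

```python
class Function:
    """Hypothetical object describing a function definition in a program."""

    def __init__(self, name):
        self.name = name
        self.calls = []  # direct references to the Function objects it calls

    def add_call(self, callee):
        self.calls.append(callee)


def reachable_from(start):
    """Navigational access: follow object references from a known object."""
    seen, stack = [], [start]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.append(fn)
            stack.extend(fn.calls)
    return [fn.name for fn in seen]


def select(objects, predicate):
    """Associative access: retrieve the objects that satisfy a predicate."""
    return [obj for obj in objects if predicate(obj)]


main, parse, lex, emit = (Function(n) for n in ("main", "parse", "lex", "emit"))
main.add_call(parse)
main.add_call(emit)
parse.add_call(lex)
emit.add_call(lex)
all_functions = [main, parse, lex, emit]

# Query: "retrieve definitions of all functions that call a given function."
callers_of_lex = [f.name for f in select(all_functions, lambda f: lex in f.calls)]

# Navigation: start at one object and traverse the stored graph.
reachable_names = reachable_from(main)
```

Navigation needs a starting object and follows stored references, whereas the query finds the callers of lex without knowing any of them in advance.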

Sharing of Objects Among Application Systems: Next-generation applications consist of different subsystems that may be implemented in different programming languages. The subsystems need to communicate with each other by exchanging data. It is, therefore, imperative that the database be able to handle data generated by different programming languages. Since object implementation is hidden in the object-oriented style, the language in which a particular object is implemented should be irrelevant to users of that object; this is not of much use if the database cannot support multiple language objects. Since adaptability to change is crucial to databases, it must be possible to use old data in new environments. By supporting multiple languages, it becomes easier to adapt to new languages or variants of old ones.

Seamlessness: We refer to the integration of a database with the rest of the programming environment in a nonobtrusive manner as seamlessness. The differences in the data models supported by databases and programming environments create a seam. Programming environments support a rich data model. The objects created in application programs are, typically, richly interconnected to reflect real-world relationships such as is_part_of, is_a, and is_owned_by. They have rich types: numbers, characters, strings, lists, arrays, vectors, bit maps, and procedures. Much of this information could be in multiple media, such as numbers, text, graphics, images, video, and audio. If this large collection of types and interconnectivity must be mapped into a different data model (simple types, poor support for relationships like is_part_of, is_a, and is_owned_by) for storage, much of the data abstraction provided by the object-oriented approach is lost. For this reason, many next-generation database system developers believe that it is imperative that the object model of the application program and the database be as similar as possible. Seamlessness requires that the database support as rich a data model as those found in programming languages.

Transactions Appropriate for Cooperative Design Work: Database systems for next-generation applications must clearly support at least conventional concurrency and transaction mechanisms. In addition, design applications, like CAD, operate on data for much longer durations (days to months) than conventional databases (seconds to minutes). Long-duration transactions need to deal with the fact that locking data for the duration of the entire transaction is undesirable for several types of applications. The idea that a transaction loses all its work if it cannot commit atomically is also inappropriate for transactions that run for a long time. There should be additional mechanisms to support cooperative work, "partial commit," and visibility of transaction data outside the transaction. Researchers in CAD databases have studied several techniques (workspaces, nested transactions, versioning) [36], [37] to deal with long-duration transactions in cooperative design work.

Support for Evolution of Object Instances and Classes: Objects in a database are created using "templates"; these templates are relations in a relational system and classes in an object system. Application programs are not static entities; they undergo change, especially in large applications. It is not possible to know the correct structure of all templates when the application is first launched. The templates undergo change based on better information, change in the environment, or changes in requirements. Such changes, discussed in Section IV-E-2), are referred to as schema evolution. Persistent objects also undergo version changes. Objects evolve through their entire life cycle. It is necessary to provide support for managing this life-cycle evolution. Change management aspects of next-generation applications are elaborated in Section IV-E.

Distributed, Platform Independent, Object Storage: Next-generation applications deal with large numbers of objects of varying sizes. For example, a VLSI CAD application, modeling a million-transistor VLSI chip, deals with gigabytes of information. A single 1,024 by 1,024 color bit map with 4 bits/pixel represents 4 megabits of information; there may be tens of thousands of such images, along with text, graphics, video, and audio in a large multimedia application. The volume of such information and the need to share it effectively among widely separated users require that storage be distributed. Because such an environment is likely to be heterogeneous, storage must be platform independent. Information maintained in database systems is vital to the organization that created it and is of value beyond the lifetime of the platform on which it was created. As a result, data must be able to migrate gracefully from one generation of hardware and operating system platforms to the next.

Other Requirements: As database systems continue to evolve, better understanding will emerge regarding the requirements in areas like triggers (alerters, demons), rule systems, and constraint languages [38], [39]. No matter what functionality is added, some baseline requirements like adequate performance, reliability, robustness, and easy-to-use interfaces will remain applicable.

B. Conventional Databases are Inadequate for Next-Generation Applications

Database management systems provide efficient access to large amounts of persistent data. They also provide 1) transaction management for correct, efficient, and concurrent access by multiple users, 2) access control for limiting data access to authorized users only, 3) long-term reliable storage of data and recovery from media and system failures, and 4) support for one or more query languages for data definition and data manipulation. Database management systems for next-generation applications should, of course, have all these provisions that conventional databases, such as relational databases, support.

Conventional databases, however, are often inadequate to serve the needs of next-generation applications for five principal reasons: 1) lack of expressive data modeling power, 2) the so-called "impedance mismatch" between programming languages and database systems, 3) inadequate interactive performance to support next-generation applications, 4) lack of appropriate mechanisms for supporting long transactions, and 5) lack of appropriate mechanisms for supporting schema evolution and version management.

Lack of Expressive Data Modeling Power: Conventional databases have largely met the demands of business applications such as payroll, accounting, inventory control, airline reservations, and electronic fund transfers. These applications are typified by very large amounts of well-structured information, limited types and structures, and transactions that last for short lengths of time (usually a few seconds). The success of relational database systems in meeting the demands of business applications is primarily because of the mathematical simplicity of the relational data model, founded on set theory [40], and on simple, powerful declarative query languages, such as SQL. However, this simplicity is a hindrance when it comes to supporting next-generation applications, since 1) conventional relational systems cannot support complex data types (arrays, objects, class definitions, functions) and interobject references (as between the Ship and the Port in the example of Section II) and 2) their data manipulation primitives do not include programming language control structures (conditional clauses, procedure calls, selection, iteration, and recursion). Next-generation applications require databases to support the same level of expressive power provided by programming languages.

Impedance Mismatch: Conventional databases and program­ming languages support different data models and different para­digms for manipulating objects. These differences are usually referred to as an impedance mismatch. Impedance mismatch decreases application programmer productivity in two ways. First, programmers with complex problems (represented in the rich data model of the programming language) cannot easily map these problems to the simpler data model of the conventional databases. Second, even if the database uses some rich data model, if that model does not correspond closely to the data model in a host programming language, programmers would have to use different languages and modeling paradigms in the two environments. For many applications, this amounts to 30% or more of application code just to do translation between application language data structures and database structures [41], [42].
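The translation burden described above can be made concrete with a minimal sketch. It assumes a hypothetical two-class model (Ship and Port, borrowed from the running example) and uses Python's built-in sqlite3 module as a stand-in for a conventional relational store; the table and column names are illustrative assumptions, not taken from any system discussed in this paper.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Port:
    name: str

@dataclass
class Ship:
    name: str
    next_port: Port  # direct interobject reference in the program

def save_ship(conn, ship):
    # Translation step 1: flatten the object graph into rows and key values.
    conn.execute("INSERT OR REPLACE INTO port(port_name) VALUES (?)",
                 (ship.next_port.name,))
    conn.execute("INSERT OR REPLACE INTO ship(ship_name, next_port) VALUES (?, ?)",
                 (ship.name, ship.next_port.name))

def load_ship(conn, ship_name):
    # Translation step 2: rebuild the object graph from a join on key values.
    row = conn.execute(
        "SELECT s.ship_name, p.port_name "
        "FROM ship s JOIN port p ON s.next_port = p.port_name "
        "WHERE s.ship_name = ?", (ship_name,)).fetchone()
    return Ship(row[0], Port(row[1]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE port(port_name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE ship(ship_name TEXT PRIMARY KEY, "
             "next_port TEXT REFERENCES port(port_name))")

save_ship(conn, Ship("USS Rendezvous", Port("Los Angeles")))
restored = load_ship(conn, "USS Rendezvous")
```

Neither save_ship nor load_ship contributes application logic; both exist only to move data between the two models, which is the kind of code the 30% figure refers to.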

Performance Problems: One important reason why conventional databases fail to meet the needs of data-intensive object-oriented applications is lack of performance. Commercial database systems are not fast enough to support simulators and interactive design tools. As a consequence, most CAD systems, for example, perform their own data management on top of the file system. If a relational system is used at all, it is only as an index package to support associative access. A typical CAD task is unlike most data processing transactions, which involve getting tuples from a relation and updating them, or selecting large groups of tuples from one or more relations and performing similar operations on them (e.g., taking a join to generate a report or updating a salary field in each employee tuple to post a raise). The CAD task also starts with a selection to pull out the pieces of a design, but then continues with many dissimilar fetch and store operations, as it navigates through a web of CAD objects. The access paths on the selected data follow connectivity of the real-world entities, not the logical structures of the database.

The cost of a relational query to fetch single, already identified objects is necessarily excessive for the following reasons [43]:

• Each fetch or store incurs the cost of a procedure call from the application program to the database. That overhead is insignificant on a data processing transaction that accesses a field in all tuples in a relation, but is a burden when accessing a single tuple. A procedure call cannot compete with simple offset addressing for accessing a field.

• Connections between entities in a relational system are through keys. At least one address translation is required to get from a key to the location of a tuple.

• Normalization and other encodings of complex design structures impose additional levels of indirection between an entity and its components.

• Invoking a query processor to optimize a join of just a few tuples is very expensive.

• The common strategies for transactions and recovery that work well in data processing systems are locking and logging. Both put a lot of overhead on transactions that do individual updates to tuples. Neither has been validated as the optimal approach in an environment with long transactions and data fields that may change many times before commit.

• Each tuple in a relational database is in some set. Insertions, deletions, and access of set elements require maintaining indices associated with the set. Persistent objects do not need to share the burden of set maintenance if they are never going to be accessed as elements in a set.

The activities in a conventional data processing system and a next-generation application system are of a different nature. As a consequence, the demands made by these systems on a database are very different. There is a distinction between performance measures that are appropriate for conventional data processing applications and next-generation applications. Several benchmarks have been developed for measuring performance for conventional databases, such as the Wisconsin benchmark [44] and the TP1 benchmark [45]. These benchmarks are closely tied to relational database usages, which emphasize operations specific to the relational model and high-volume transaction processing, respectively. Recently, benchmarks for next-generation database applications have been developed [46]-[49]. These include measures for navigation ("pointer chasing" performance), traversal across multiple aggregation and generalization relationships, clustering, "blobs" (binary large objects, that is, multikilobyte values), versions, and interactive response time.

Lack of Mechanisms for Supporting Long Transactions: As we stated in Section III-A, next-generation applications need support for long-duration transactions and cooperative transactions. Conventional database management systems assume that transactions are of short duration and lock very little data. Under this assumption, conflicting lock requests are infrequent and the cost of redoing an aborted transaction is minimal. In next-generation applications, the cost of redoing an aborted transaction and the cost of an operation being blocked because of conflicting lock requests may be prohibitive. Also, conventional transaction mechanisms assume that all transactions started in a computing session terminate (commit or abort) in the same session; this assumption is not always valid in applications supporting design activities. A new model of transaction to support next-generation applications needs additional implementation mechanisms. Some of these mechanisms are discussed in Section IV-C-3).

Lack of Support for Schema Evolution and Versioning: Traditional databases have provided no support for version management. A database is thought of as having a single state, namely the current state. Even if historical or evolutionary data were present, such data were for the use of the database system's recovery purpose. The database management system did not provide any way for applications to access the historical information. Versioning tools to manage the life-cycle evolution of entities do not get support from the conventional database management system.

Schema evolution, in traditional databases, is considered the venue of the systems administrator. This is inappropriate in next­generation applications where the evolution of the schema is as much a part of the application semantics as the creation and evo­lution of objects modeled by the schema. The important role of schema evolution in object-oriented programming and OODB's is further clarified in Section IV-E-2).

C. Object-Oriented Database (OODB) Data Models

Object-oriented databases (OODB's) offer solutions to meet the requirements of next-generation applications. As characterized in Atkinson et al. [2], OODB's have object-oriented data models similar to those found in programming languages. OODB's offer support for sharing of large object bases by multiple users. They provide support for long transactions and for versioning and configuration management of instances and types. They provide support for queries. OODB's have the potential for providing performance adequate for next-generation applications.

At the current time, there are a number of commercial and research prototype OODB's. Some of these systems are surveyed in Section V. These systems take different approaches to meeting next generation application requirements. They have different features and strengths. The field of OODB's is too new for a general agreement on a "definitive list" of features and charac­teristics; but OODB's are beginning to exhibit similar function­ality.

1) Features and Characteristics: Ullman [50] defines the term "object-oriented database management system" as a class of programming systems with the capability of a DBMS (management of large amounts of persistent data, transaction-based concurrent access, data model, and query language), along with the object-oriented features (object identity, encapsulation, inheritance, object composition, and complex objects) discussed in Section II-A. Several more detailed characterizations of OODB's exist in the literature [2], [35], [51]. Table 1 lists features and characteristics of an OODB; most of these features are discussed in various sections of this paper. Database features like distribution and security, which are orthogonal to object-orientedness, are not listed in the table. Not all current OODB's support all these features.

The remainder of this section provides a taxonomy of OODB's based on their data model.

2) A Taxonomy of OODB's: The response of the research community to meeting the database needs of next-generation applications can be categorized into two schools, based on the data model of choice. The first school has advocated value-oriented database systems; the second has advocated object-oriented database systems. In this section we briefly compare value-oriented and object-oriented data models. This is followed by a description of different approaches to object-oriented data modeling. A taxonomy based on the different approaches is illustrated in Fig. 4; some of the systems listed in the illustration are discussed briefly in Section V. A detailed exposition of value-oriented databases is outside the scope of this paper.

Table 1. OODB Features and Characteristics.

| Feature | Description |
|---|---|
| Complex Objects | The ability to define data types with a nested structure and to manage composition hierarchies; Sections II-A-2), III-C, II-A-2), IV-A-1) |
| Object Identity | The ability to distinguish two objects independent of the values of attributes; Section II-A-1) |
| Types and Classes | The ability to organize similar objects and their implementation into an abstraction; Section II-A-2) |
| Encapsulation | The clear separation between the visible semantics and the implementation of objects; Section III-C |
| Inheritance | The ability to derive new classes from existing classes; Sections II-A-3), II-B |
| Dynamic Binding and Polymorphism | The ability to bind messages to different methods depending on type; methods may be bound at compile time or run time |
| Seamlessness | Integration of the database with the rest of the programming environment in a nonobtrusive manner; Sections III-B, III-C-2) |
| Persistence | Existence of objects beyond the lifetime of the processes that created them; Section IV-B |
| Secondary Storage Management | Efficient data access by supporting clustering, indexes, buffering, and query optimizations; Sections IV-B-1), IV-B-3) |
| Transactions and Concurrency Control | Concurrent access to data by means of atomicity, controlled sharing via locks, serializability; Section IV-C-1) |
| Recovery | Ability to recover from software and media failures |
| Query Facility | Efficient high-level declarative access to objects in addition to navigational programmatic access; Sections III-C-2), IV-A-3), IV-D |
| Design Transactions | Long-running and nested transactions; Section IV-C-3) |
| Change Management | Database support for managing the evolutionary life cycle of objects and classes; Section IV-E |

a) Value-oriented versus OODB's: In value-oriented data models like the relational model, relationships between different objects are stored implicitly, by comparison of values of attributes. For instance, Ship(USS Rendezvous, ..., Los Angeles, ...) and Port(Los Angeles, ...) "match" by value in the Next_Port attribute of Ship and the Port_Name attribute of Port. There are many value-oriented data models, ranging from extensions to the relational model to logic as a data model. The nested relational model relaxes the constraint that relations be in first normal form [32]. A relation could then contain another relation as the value of a field of a tuple. However, such an extended relational model does not allow a relation to share, or "point to," another relation, i.e., there is no support for the notion of object identity. Other extensions allow more complex data types than the primitive types of numbers, characters, and strings [52]. A main advantage of the extensions approach is its relative simplicity, based on incremental evolution of the well-known relational model.
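The difference between matching by value and referring by identity can be sketched as follows. This is an illustration only: the attribute `berths` and the helper `next_port_by_value` are hypothetical, and plain Python dictionaries and objects stand in for relations and database objects.

```python
from dataclasses import dataclass

# Value-oriented: the Ship-Port relationship is implicit in equal attribute values.
ship_rows = [{"ship_name": "USS Rendezvous", "next_port": "Los Angeles"}]
port_rows = [{"port_name": "Los Angeles", "berths": 12}]

def next_port_by_value(ship_row):
    # "Join" on Next_Port = Port_Name.
    return next(p for p in port_rows if p["port_name"] == ship_row["next_port"])

# Object-oriented: the relationship is a reference to one particular object.
@dataclass(eq=False)  # eq=False keeps identity-based comparison
class Port:
    name: str
    berths: int

@dataclass(eq=False)
class Ship:
    name: str
    next_port: Port

la = Port("Los Angeles", 12)
twin = Port("Los Angeles", 12)  # same attribute values, different object
ship = Ship("USS Rendezvous", la)

# Renaming the port does not disturb the reference held by the ship.
la.name = "Port of Los Angeles"
```

In the value-oriented half, the same rename would silently break the match unless every Ship row carrying the old name were updated as well.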

Over the last decade, many new data models have been pro­posed. They include the entity relationship model [53], many semantic data models [25], [54], [55], and many object-oriented data models [43] , [56] . In fact some researchers [50] have argued that the earliest database systems (hierarchical and network model databases) were object-oriented in the limited sense of supporting object identity, even though they had no user-defined types and no notion of state or behavior encapsulation. The object-oriented models do not have the simplicity of the relational model , and many challenging research issues related to data modeling, query languages, and query optimization are the subject of active research.

Among OODB's, we distinguish two further categories: per­sistent language-based OODB's and query-based OODB's.

[Figure 4: tree diagram dividing the universe of data models into value-oriented and object-oriented models. Labels recoverable from the figure include Relational, Entity-relationship, Semantic Data Models, Extended-relational, Nested Relational, Logic, Persistent Language Based (new object-oriented languages; existing languages), and Query-based, with example systems including DB2, ORACLE, UNIFY, DAPLEX, SDM, Berkeley Postgres, DEC Trellis/Owl, HP Iris, MCC Orion, Datalog (Stanford), LDL (MCC), TI Zeitgeist, and Ontologic Ontos.]

Fig. 4. Taxonomy of object-oriented databases.


b) Persistent language based OODB's: This category includes object-oriented data models that are the same or very close to the abstract data typing capabilities of an object-oriented programming language. The objective is that a database should be integrated with the rest of the programming environment as seamlessly as possible. A truly seamless integration of computational storage (based on main memory or virtual memory) and data storage environments (based on secondary storage) requires that both use the same language and data model. This means that the same data types should exist in both transient and persistent environments and that the instances of these data types should be manipulated by the same operators. The lifetime of objects should not matter to programs manipulating them. It is also important that the same model of sharing and object identity be supported in both environments. This goal is consistent with the notion of "orthogonal persistence," as articulated in [57].

Even within this category there are different approaches. Some researchers have developed a new object-oriented programming language to support persistent objects [12], [58]; others have extended existing object-oriented programming languages to deal with persistence [56], [59], [60]; many have stayed close to an existing object-oriented programming language and accommodated persistence. In this last category there are a number of commercial OODB's and research prototypes that support a single mainstream programming language well; examples are Smalltalk in Servio Corporation's Gemstone [59], Common Lisp in TI Zeitgeist [61] and Symbolics Statice [60], and C/C++ in Ontologic Ontos [62].

Although these efforts have met the goal of seamless persis­tence to a large extent, no system has achieved the goal with respect to multiple languages. It may be argued that the goal of seamless persistence of multiple languages is unachievable. The goal of seamlessness may be relaxed to mean that persistence is achievable with minimal effort; for example, the language data model remains the same, but additional information may have to be supplied by a programmer for making certain classes of objects persistent. Achieving seamless persistence (in the relaxed sense) with respect to multiple host languages that share persistent objects is still a research issue.

To support seamlessness, programs must be able to interact with the database either by sending messages to objects held by the database (using the same syntax as they would for nonpersis­tent objects) or by explicitly retrieving objects and acting on them directly. Applications written in object-oriented programming languages such as CLOS and C++ typically use both approaches. These two ways of accessing the database are essentially navi­gational , since they tend to make use of embedded interobject references.

The goal of a seamless integration of database and programming environments is ambitious. There are concepts critical to the database domain, such as concurrency control and transaction atomicity, that are supported poorly, if at all, in programming languages. Programming languages deal with the current state of objects and operations; there is no natural way to deal with past states. In contrast, databases are primarily for keeping of historical records. We are learning that computational environments can benefit from many of the database amenities like transactions, concurrency, and history and that databases can benefit from programming language amenities like user-defined types and a rich data model. The way to realize such benefits will probably require augmenting the programming language or data model in cases where a relevant construct does not exist, and will require the programmer or user to perform operations to use the database that would not be required if all data were transient. These concerns and issues are the domain of persistent programming language research [61], [63], [64].

c) Query-language-based OODB's: Persistent programming languages support all programming language entities as first-class objects. Query-based OODB's follow the relational database heritage and support sets as the only kind of first-class (persistent) objects. Examples of this approach are HP Iris [65] and Berkeley Postgres [52], [66].

In this approach, applications are developed using a variety of host programming languages, such as Lisp or C. Applications interact with the OODB using a set-oriented query language to retrieve or update objects. The type system and control structures of the host programming language(s) are significantly different from those of the object-oriented query language. There is no claim of seamlessness; there is an "impedance mismatch" between the host language(s) and the query language. However, the query language makes allowances for the object-orientedness of the environment; methods are allowed in selection predicates and path expressions are permitted, as in navigational persistent language OODB's. Path expressions are implemented via queries instead of "pointers.'' The extent of a class (the collection of all instances of the class) is implicitly a set that can be queried; users may also define other explicit sets.
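A minimal sketch of the ideas in the preceding paragraph: the extent of a class treated as an implicit queryable set, a method used inside a selection predicate, and a path expression. All names here (`Persistent`, `extent`, `select`, `can_enter`, and the draft and depth attributes) are hypothetical. For brevity the path `s.next_port.name` is evaluated by following in-memory references, whereas a query-based OODB would evaluate it through queries.

```python
class Persistent:
    """Hypothetical base class that records every instance in its class extent."""
    _extents = {}

    def __init__(self):
        Persistent._extents.setdefault(type(self), []).append(self)

    @classmethod
    def extent(cls):
        # The extent: the collection of all instances of exactly this class.
        return list(Persistent._extents.get(cls, []))

class Port(Persistent):
    def __init__(self, name, depth_m):
        super().__init__()
        self.name, self.depth_m = name, depth_m

class Ship(Persistent):
    def __init__(self, name, draft_m, next_port):
        super().__init__()
        self.name, self.draft_m, self.next_port = name, draft_m, next_port

    def can_enter(self, port):
        # A method that may appear inside a selection predicate.
        return self.draft_m < port.depth_m

def select(cls, predicate):
    """Set-oriented query over the extent of cls."""
    return [obj for obj in cls.extent() if predicate(obj)]

la = Port("Los Angeles", 16.0)
sd = Port("San Diego", 10.0)
Ship("USS Rendezvous", 9.0, la)
Ship("Deep Hauler", 12.0, sd)

# Path expression in the predicate: ship -> next_port -> name.
bound_for_la = select(Ship, lambda s: s.next_port.name == "Los Angeles")
# Method call in the predicate.
blocked = select(Ship, lambda s: not s.can_enter(s.next_port))
```

Because `can_enter` is opaque code, an optimizer cannot reason about it the way it can about a comparison of attribute values, which is one way to see the loss of declarativeness discussed in the text.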

One of the main attractions of a query language like SQL is its declarative nature; declarativeness is important in relational sys­tems since it allows them to have simple, yet powerful query lan­guages whose queries can be optimized to yield acceptable per­formance. Some of this declarative nature is sacrificed when methods are allowed in queries. On the other hand, there is a concern that implementation of path expressions as queries is counter-intuitive and slow.

A synthesis between persistent language-based and query-based approaches to OODB's is likely. Such a synthesis allows first-class persistent objects but also optionally supports sets and a declarative query language. Section IV-D provides more detail on object queries.

This section of the paper examined the database requirements of next-generation applications and how these requirements are not adequately met by conventional databases. One approach to addressing the deficiencies of the conventional databases leads us to the area of OODB's. We listed the features and characteristics of OODB's and presented a taxonomy of OODB's based on data models. The next section goes into the architectural and imple­mentation considerations in building an OODB.

IV. OODB ARCHITECTURE AND IMPLEMENTATION

This section provides a look at the key design decisions in OODB's. OODB's built or proposed to date differ in their 1) object models, 2) mechanisms for storing and retrieving persistent objects, 3) concurrency control and transaction management, 4) query models and methods of processing queries, and 5) management of evolving objects and class definitions. These aspects of OODB's are discussed in detail. Particular emphasis is placed on the key design choices and their implications on the remainder of the system components and the user/application interface functions. Representative interfaces and internal organizations of a number of OODB's are discussed. Performance improvement techniques are covered throughout the section. Other aspects of database management systems (DBMS's) such as access control, user-friendly interfaces, report generation, access to data held in other databases, and the ability to share information across heterogeneous platforms have so far received very little treatment in the OODB community. We discuss these aspects only briefly.

A. Object Model

Applications interact with an OODB through an object model specifying the OODB's functionality. Since different object models provide different capabilities or have different performance objectives, the object model has major implications on the OODB's internal organization and implementation. Although all OODB's manipulate objects, their object models vary widely. Our discussion of OODB architecture begins by clarifying 1) treatment of types and classes, 2) how objects are identified by applications, and 3) how applications interact with objects stored in the OODB.

1) Treatment of Types and Classes: Recalling our discussion from Section II, objects are defined by interface (type) and implementation (class), with each object instance having a unique object identity (OID). A class definition augments a type definition with information about attribute names, attribute types, and internal methods. Thus in a sense, a class definition comprehends its corresponding type definition. However, since many class definitions can be compatible with a single type definition, this relationship need not be one-to-one. The result is that, in principle, a type can be implemented by more than one class. However, most popular object-oriented languages do not make a clear distinction between type and class, with the result that the relationship between types and classes becomes one-to-one. As we shall see, this has profound impact on the ability of the OODB to manage type and class evolution (Section IV-E-2). In our subsequent discussions, we shall use the term type to mean the abstract interface and the term class to mean the additional information added as part of the specification of the implementation. This seems in accordance with the actual usage in the field. The reader is referred to Wegner and Cardelli [67] and Moss and Wolf [68] for further discussions of type theory and the distinction between type and class. We now examine how OODB's treat type and class, and the implications of these differences.
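The distinction between a type and the classes that implement it can be sketched briefly. The `Stack` interface and its two implementations below are hypothetical examples chosen for brevity; an abstract base class plays the role of the type, and each concrete class adds attributes and internal representation.

```python
from abc import ABC, abstractmethod

class Stack(ABC):
    """Type: the abstract interface only, with no attributes or representation."""
    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...

class ListStack(Stack):
    """Class 1: implements the Stack type with a contiguous list."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

class LinkedStack(Stack):
    """Class 2: implements the same type with linked (item, rest) cells."""
    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        item, self._head = self._head
        return item

def drain(stack: Stack, items):
    # Written against the type; works with any class that implements it.
    for x in items:
        stack.push(x)
    return [stack.pop() for _ in items]
```

Code such as `drain` depends only on the type, so either class can be substituted without change; this is the one-to-many relationship between type and class that the text describes.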

Class describes the physical structure of object instances. By comprehending the physical definition alone, OODB's can materialize an object's state (recreate it or restore it from secondary storage) and store that state when requested. However, this does not give the OODB the ability to interact with the object's interface by invoking its methods (which are part of the type definition).

Even though all OODB's comprehend class, there is much variation as to what classes can look like. A primary question is whether the attributes of a class instance can be complex non-objects (arrays, lists, structures, etc.), or are constrained to be only primitive types (numbers, characters, etc.) and references to other objects. If complex constructs are not supported, much flexibility is lost to the class developer, with a probable loss of efficiency in the implementation of the methods of the class. OODB's that attempt to extend existing programming languages typically support the wider interpretation. In some cases (notably Exodus/E [64] and ObjectStore [69]), non-objects are supported as first-class entities by the OODB to ensure compatibility with a programming language that combines object-oriented and conventional constructs.

2) Identifying Objects: Each separately persistent object must have a unique OID within the OODB. The OID space must be large enough to provide unique OID's for all persistent objects over the OODB's lifetime. To preserve information hiding, the OID (as presented to the application) should be independent of the location where the object is stored. Location independence does not preclude lower levels of the system from having local, location-specific OID's or from assigning OID's based on expected scope to minimize OID size. There must be one or more name managers in the system to map OID's to locations. The name space of OID's may be partitioned among the various name managers. If this is done, a multipart OID (similar to a phone number consisting of an area code, exchange, and number) is generally used to allow piecewise determination of which name manager to use. This scheme (used in Ford et al. [61] and Moss and Sinofski [70]) takes advantage of the locality of interobject references while still supporting nonlocal references and location independence.
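A multipart OID with partitioned name managers might be sketched as below. The three-part layout (`region`, `site`, `serial`), the class names, and the string form of a storage location are assumptions made for illustration; they are not taken from the systems cited above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OID:
    region: int  # plays the role of the area code
    site: int    # plays the role of the exchange
    serial: int  # plays the role of the number

class NameManager:
    """Maps the OIDs of one (region, site) partition to storage locations."""
    def __init__(self):
        self._locations = {}

    def register(self, oid, location):
        self._locations[oid] = location

    def locate(self, oid):
        return self._locations[oid]

class NameService:
    """Picks a name manager piecewise from the leading parts of the OID."""
    def __init__(self):
        self._managers = {}

    def manager_for(self, oid):
        return self._managers.setdefault((oid.region, oid.site), NameManager())

    def register(self, oid, location):
        self.manager_for(oid).register(oid, location)

    def locate(self, oid):
        return self.manager_for(oid).locate(oid)

names = NameService()
ship_oid = OID(region=214, site=7, serial=1001)
names.register(ship_oid, "server-a:/vol1/page42")

# Moving the object changes only the mapping; the OID seen by applications is unchanged.
names.register(ship_oid, "server-b:/vol3/page7")
```

Applications hold only `ship_oid`; the mapping to a location stays inside the name managers, which is what keeps the OID location independent.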

Modularity in object-oriented applications is achieved by par­titioning an application's world into logically cohesive objects and allowing these objects to exchange information with other objects via messages. To do this, objects must have some way to refer to each other through inter-object references . There are sev­eral ways in which interobject references may be specified. These ways are stated as follows.

• The simplest approach is to embed the references within the referencing objects. This is typically implemented in object-oriented programming by memory pointers and in OODB's by replacing the memory pointers with OID's. This closely models the OOP paradigm, and is generally used by OODB's that provide a persistent programming model. Drawbacks of this approach are that embedding the references makes object-oriented queries much more difficult (Section IV-D), complicates object translation (Section IV-B), and makes it difficult to dynamically add unanticipated references in statically typed languages.

• A second approach is borrowed from the entity-relationship [53] model, where all interobject references are outside the scope of the objects themselves. In this case, relationships based on OID's can be added at will, and queries over relationships can be resolved without recourse to the objects themselves. The disadvantage is that resolving references is slower, since they are not directly accessible from the objects.

• A third approach is to force all interobject references to take the form of queries embedded within the referencing object [66]. This provides good support for optimized object-oriented queries over relationships, and since queries can specify single objects, subsumes the embedded OID approach. This approach requires a query processor and appears to be somewhat slower because of the query processor's overhead.
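The three ways of specifying interobject references listed above can be contrasted in a small sketch. The in-memory `store` dictionary, the string OID's, and the helper names are all hypothetical; a Python callable stands in for an embedded query.

```python
# Hypothetical object store keyed by OID.
store = {
    "port-1": {"kind": "Port", "name": "Los Angeles"},
    # 1) Embedded reference: the OID is held inside the referencing object.
    "ship-a": {"kind": "Ship", "name": "A", "next_port": "port-1"},
    # 2) External relationship: the object itself holds no reference.
    "ship-b": {"kind": "Ship", "name": "B"},
    # 3) Embedded query: the reference is a predicate evaluated on demand.
    "ship-c": {"kind": "Ship", "name": "C",
               "next_port": lambda o: o["kind"] == "Port" and o["name"] == "Los Angeles"},
}

# Relationships kept outside the objects, keyed by (relationship name, source OID).
relationships = {("next_port", "ship-b"): "port-1"}

def follow_embedded(oid, attr):
    return store[store[oid][attr]]

def follow_external(oid, rel):
    return store[relationships[(rel, oid)]]

def follow_query(oid, attr):
    predicate = store[oid][attr]
    return [o for o in store.values() if predicate(o)]

def referrers(rel, target):
    # Answered from the relationship table alone, without touching the objects.
    return sorted(src for (r, src), dst in relationships.items()
                  if r == rel and dst == target)
```

`follow_embedded` is a single lookup, `follow_external` needs an extra lookup in the relationship table, and `follow_query` scans candidates, which loosely mirrors the relative costs noted in the list. Only the external form can answer `referrers` without reading the objects themselves.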

3) Interacting with Persistent Objects: There are two fundamental issues with respect to interaction models; namely, who controls actions (the application or the database) and whether the interactions are via navigation or queries.

a) Active and passive object models: In the passive object model, object state is materialized in the application's workspace by the OODB and operated on in the workspace by the application. Once in the application's workspace, the application may send messages to the objects, or in the case of non-objects (unencapsulated data), operate on them directly. By contrast, in the active object model, activity takes place under control of the OODB and takes the form only of messages sent to objects.


The active object model has advantages in that it allows the OODB to restrict the operations from the abstract interface that may be executed by a particular application (presenting an object view similar to a schema view in a relational database), and allows the OODB and the object to decide jointly where a given method will execute. Remote execution under OODB control is particularly valuable if there is a disparity between the sizes of the objects involved in the message operation and the size of the result. Small objects producing large results can choose to execute their methods on the application's machine, while large objects producing small results can choose to execute wherever they currently reside. Some objects, such as windows or printer queues, may have logical or environmental reasons for executing at a particular place. The active object model also allows the use of specialized hardware, as long as the object can be materialized in that environment, and facilitates scheduling based on network load.
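The placement decision described above can be reduced to a very small rule of thumb. The function below is a sketch under a strong simplifying assumption: only the bytes that must cross the network are considered, and factors the text also mentions (specialized hardware, network load, objects tied to a particular place) are ignored. The function name and the byte-count parameters are hypothetical.

```python
def choose_execution_site(object_bytes, result_bytes):
    """Return where a method should run so that fewer bytes cross the network."""
    # Running at the application means shipping the object state to it;
    # running where the object resides means shipping only the result back.
    return "application" if object_bytes < result_bytes else "object_site"

# A small object that expands into a large result runs at the application.
small_object_large_result = choose_execution_site(object_bytes=100, result_bytes=10_000)

# A large object that yields a small result runs where the object resides.
large_object_small_result = choose_execution_site(object_bytes=10_000_000, result_bytes=8)
```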

In the passive object model, a restriction to only class instances is still possible, and is often done since it eases the task of materializing/storing objects, and provides a purely object-oriented interface to the OODB. However, restricting only class instances to be first-class objects introduces a seam with respect to popular languages such as C++ and CLOS, which allow mixed use of encapsulated (class instances) and unencapsulated data. There is an active debate as to whether this is a restriction that not only simplifies the OODB developer's task, but also forces better program design on the application programmers. ObjectStore and E allow unencapsulated data; Zeitgeist, Orion, and O2 do not.

b) Navigation versus queries: The Persistent Programming Language approach [63], [64] allows the user to make program­ming language objects (encapsulated, and possibly unencapsu­lated, data structures) seamlessly persistent without changing the data model that the programmer sees. Normal programming lan­guage operations then "navigate" through persistent data follow­ing interobject references. This approach does not necessarily support sets , unless they are already supported in the language being made persistent. A criticism of this approach is that it does not preserve data independence any better than older network or hierarchical data models. Many applications (such as CAD) nat­urally access their data one object at a time. For these applica­tions, navigational access may be more important than the sup­port of sets. Seamless persistence provides several advantages, including ease of application development, type checking support at the database interface, and first-class persistence for all data types rather than just sets.

Another approach [52], [56], [65] extends the relational model to provide better support for user-defined types, sub-types, and functions. In this approach, database objects are always members of sets. Sets are the only first-class persistent type. An odd consequence is that a persistent array must be wrapped in a one-row one-column table, and accessed with a query. This approach does not preclude database types mirroring program types; a data model independent of any programming language is defined. It adds object-oriented properties to the database language but does not solve the impedance mismatch problem (Section III-B) between programming languages and the database. Since the new type system does not span both languages, type checking across the programming language/OODB interface is not supported. A major advantage of this approach is that it may lead to upward compatible, object-oriented extensions to industry standard SQL.

Between these two extremes is a third approach, which orthogonally extends object-oriented programming languages with both sets and persistence [58], [59], [61]. In these systems, all queries are against sets (collections), but navigation is also supported and not all instances of a type need be stored in a set. This is permitted since it is recognized that many applications do not need set-oriented queries and it is inefficient to force all instances to reside in a set. A consequence of this approach is that not only persistent sets but also transient sets can be supported [71].

B. Storing and Retrieving Persistent Objects

The most fundamental function of a database is to provide a way for objects to persist beyond the scope of an individual program execution. Persistence may be provided using a single-level storage (persistent memory approach) or a two-level storage (storage server approach). In this section, we discuss the two-level storage architecture in some detail; a detailed discussion of the single-level storage architecture is beyond the scope of this paper.

To achieve persistence, an OODB developer has two choices.

o Extend the conventional programming environment into a persistent programming environment in which the results of a program remain in the program's memory after the creating program terminates. This approach, referred to as the persistent memory approach, is based on a large, shared persistent virtual memory in which all programs execute. Objects are persistent if they can be addressed within this (potentially garbage collectable) space [72], [73].

o Create a separate storage server to which objects are written from the application program's workspace before the program terminates and from which objects can be retrieved by subsequent program executions. This approach, referred to as the storage server approach, requires a transfer of objects to the storage server; however, depending on the interface, this transfer may be invisible to the application.

The persistent memory approach provides the ultimate in seamlessness between programming environment and database since they are, in fact, one and the same. However, no existing programming language provides all the capabilities required for a truly unified persistent programming environment. PS-ALGOL [63] is an attempt to design and implement the "ideal" persistent programming language.

Storage servers can be implemented in several ways. The choice of server style is determined by the object model to be supported and by the access characteristics of target applications. In the fol­lowing, we discuss several classes of storage servers and how they move objects to/from an application's computational mem­ory and how these servers can be used to support the OODB's object model. Some performance issues related to storage servers are also discussed.

1) Storage Server Models: Storage servers can be classified by their unit of transfer and control, and by the level of semantics associated with the objects managed by the server. Using these criteria, storage servers can be classified into domains of increas­ing complexity and power:

o typeless page servers (Exodus [64] and ObjectStore [69]);

o typeless object servers (Zeitgeist [61], Mneme [70], and ObServer [74]);

o class-based object servers (Postgres [52], Iris [65], and the proposal by Wiederhold [75]);

o type-based object servers (Orion [56] and O2 [76]).

Of course, it is possible to augment any of these server types by adding a layer of software, or to increase system modularity by suppressing some capabilities; however, for our purposes, we shall consider only the basic levels of functionality of each model.


a) Typeless page servers: Page servers do not directly manipulate objects; instead, they manipulate virtual memory pages on which objects (or portions of objects) are known to reside. This is accomplished by creating a persistent virtual memory parallel to the target machine's (transient) virtual memory. Pages in the persistent virtual memory have the same format as pages in the transient virtual memory (thus page servers tend to be hardware architecture specific because of page format differences between platforms). When programs execute, they reference their virtual memory in the normal fashion. However, some of their virtual memory pages are identified as overlapping a portion of the persistent virtual memory. Accesses to these pages cause a page to be copied from the persistent virtual memory of the page server into the transient virtual memory. The contents of the page are then operated on by the program in the normal way. When the program is finished with one of the persistent virtual memory pages, it is released and, if modified, copied back to the page server.

Since the page formats are the same in both memories, transfer costs are lower in page servers than in other storage server models that must do substantial work to materialize objects. However, this apparent simplicity and speed is offset by the need to have a way to trigger the page transfers and to resolve addresses properly after a page has been loaded. In Exodus/E, which supports objects that are not class instances, pointer following is the critical operation. Under the page server scheme, pointers cannot be virtual memory addresses. Instead, they are represented as offsets within the page, or as references into another page. Therefore, any pointer following operations in the application language must be redefined to compute a real virtual memory pointer from the offset and buffer address. This involves a preprocessor to replace all pointer references with the appropriate computation and to ensure that the buffer is resident and its start address is known. As a result of all this overhead, pointer following is somewhat more expensive in page server schemes. More serious issues are the optimization of calls to the page server (it is inefficient to repeatedly pin an already resident page when an object is referenced many times in a tight program loop), and buffer management policies to avoid thrashing. On the other hand, if the OODB supports a pure object model (all interobject references are by object ID for transient as well as persistent objects), pointer dereferencing is not an issue because all such references must go through a level of indirection to map OID to virtual memory address anyway. This is the case in SmallTalk-like languages used by Servio Corporation's Gemstone OODB.
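The pointer translation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Exodus/E mechanism: the names `PersistentPointer` and `BufferPool`, the 16-byte toy pages, and the `fetch_page` callback standing in for a call to the page server are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PersistentPointer:
    """A reference stored on a persistent page: (page id, offset), not an address."""
    page_id: int
    offset: int


class BufferPool:
    """Hypothetical client-side buffer holding pages copied from the page server."""

    def __init__(self, fetch_page: Callable[[int], bytes]):
        self._fetch_page = fetch_page          # stands in for a call to the server
        self._resident: Dict[int, bytearray] = {}
        self.server_calls = 0

    def pin(self, page_id: int) -> bytearray:
        # Residence check: only go to the server when the page is absent.
        if page_id not in self._resident:
            self._resident[page_id] = bytearray(self._fetch_page(page_id))
            self.server_calls += 1
        return self._resident[page_id]

    def deref_byte(self, ptr: PersistentPointer) -> int:
        # "Real" location = start of the resident buffer + offset within the page.
        return self.pin(ptr.page_id)[ptr.offset]


# Toy "server": page n is 16 bytes, each equal to n.
pool = BufferPool(lambda page_id: bytes([page_id]) * 16)
p = PersistentPointer(page_id=7, offset=3)
values = [pool.deref_byte(p) for _ in range(1000)]   # tight loop over one page
```

Every dereference pays for the residence check and the offset computation, which is the extra pointer-following cost mentioned in the text; the single server call for a thousand dereferences shows why avoiding repeated requests for an already resident page matters.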

To improve performance, pages may be buffered at the server and/or the client. Another performance consideration is clustering. When a page containing a requested object is brought in, if other objects on the page are also required, the total number of pages to be moved/buffered is reduced. Since the unit of concurrency control is the page, it is possible for incidentally coresident objects to cause unnecessary concurrency conflicts; proper clustering reduces this effect.

b) Typeless object servers: Object servers move and control access to individual objects or groups of objects. Typeless object servers understand only some notions of "objectness": namely, identity, the fact that an object has a type (without knowing what the type is), and that objects may be related to other objects by embedded OID, externally specified relationship, or query. The type or class of individual objects is uninterpreted by typeless object servers; these servers cannot execute methods or access the states of objects they manage.

For storage into typeless object servers, object state is translated into a string of bits or bit-buckets; the process is reversed during object materialization. Translation preserves the structure of the object graph and decomposes the graph according to a set of persistent object boundary rules (discussed below). Translating an object graph requires translation routines for each primitive and constructor data type in the supported language. To preserve sharing semantics, visited cells must be so marked to prevent processing on successive visits. Object boundary rules define the extent of a persistent object. The ability to change these rules and to select different translation primitives for objects provides a degree of independence between logical and physical representations.
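The role of the visited marks can be illustrated with a small sketch. It assumes a toy `Node` class and uses JSON text as the "bit-bucket"; neither is taken from any of the systems cited, and object boundary rules are ignored (the whole reachable graph goes into one bucket).

```python
import json
from typing import Any, Dict


class Node:
    """Toy application object: a value plus references to other Nodes."""

    def __init__(self, value: Any):
        self.value = value
        self.refs = []


def translate(root: Node) -> str:
    """Flatten the graph reachable from root into a string (the bit-bucket)."""
    ids: Dict[int, int] = {}      # visited marks: id(node) -> local cell number
    cells = []

    def visit(node: Node) -> int:
        if id(node) in ids:       # already translated: emit a back-reference only
            return ids[id(node)]
        n = ids[id(node)] = len(cells)
        cells.append(None)        # reserve the slot before recursing (handles cycles)
        cells[n] = {"value": node.value, "refs": [visit(r) for r in node.refs]}
        return n

    visit(root)
    return json.dumps(cells)


def materialize(bucket: str) -> Node:
    """Reverse of translate: rebuild the graph, restoring shared references."""
    cells = json.loads(bucket)
    nodes = [Node(c["value"]) for c in cells]
    for node, c in zip(nodes, cells):
        node.refs = [nodes[i] for i in c["refs"]]
    return nodes[0]


# Two ships that reference the same port (hypothetical data).
port = Node("Rotterdam")
ship_a, ship_b = Node("A"), Node("B")
ship_a.refs.append(port)
ship_b.refs.append(port)
fleet = Node("fleet")
fleet.refs = [ship_a, ship_b]
copy = materialize(translate(fleet))
```

After the round trip, both ships in `copy` still reference one and the same port object; without the visited marks the port would have been written twice and materialized as two unrelated copies.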

When translating an object graph for storage, the persistent object boundary rules determine what goes into a bit-bucket. One rule could be to make every object instance into a separate persistent object. Since objects tend to be small, this is inefficient in both space (management overhead per object) and time (many separate disk accesses). It is more efficient to partition the object graph into somewhat larger subgraphs, and store each of these as a separate persistent object. On the other hand, to increase incrementalism and improve concurrency granularity, it is desirable to increase the number of objects. For this reason, it is probably desirable to allow the application programmer some control over where object boundaries lie. Object boundary rules also have implications on concurrency and access control processes, since in an object server, both must know the scope of the controlled object.

What should constitute the state of an independently persistent object? In the example of Section II-B, assume that a ship references a port. Is the port part of the ship? If the port is directly reachable from more than one ship, it must be a separate object, since otherwise the port's state would be copied into the state of more than one ship. Then if some information about the port is changed from one ship, there is no effective way to ensure that the changes propagate to the other ship. However, consider an oil bunker object that is never referenced except as part of the ship. Is it a separate object, or is it part of the state of the ship object? The root cause of this confusion is that in nonpersistent languages such as C++ and CLOS, interobject references are generally implemented by memory pointers. Thus no clear distinction is made between being "part of" an object and being "referenced by" an object. This distinction must be made in the OODB world (see Section IV-A), and appears to be the cause of an unavoidable seam. However, it can be argued that this forces application programmers to think clearly about object identity.

c) Class-based object servers: Like typeless object servers, class-based object servers manipulate individual objects or groups of objects. However, they are also able to interpret and use the objects' state to provide additional service, particularly queries based on object state (see Section IV-D). Since the ability to manipulate object state directly in the server is critical for class-based object servers, they are typically built on top of relational databases.

Class-based object servers map object state directly into relational tuples. Each class definition defines a relation, and each instance of the class becomes a tuple in the relation. If an object gets some of its attributes by inheritance, the implementor can flatten out the inheritance graph and define a single relation for all the attributes of the class. This is called horizontal partitioning [56]. In vertical partitioning [77], a relation is defined per inherited definition, with the object state spread out through all these relations. In the Ships example (Section II-B, Fig. 3), with horizontal partitioning there would be a single relation Cargo_Ship whose attributes correspond to all attributes from both the Cargo_Ship class and its parent Ship class; with vertical partitioning there would be two relations Cargo_Ship and Ship, with the object identifier as a key and with an individual object having its slot values stored as a tuple in each relation. In the latter case, reconstructing a cargo ship requires a join of the Ship and Cargo_Ship relations. In either case, constructing the computational state of an object is straightforward using relational queries. Vertical partitioning makes it costly to construct an object from several tuples in different relations; horizontal partitioning makes class evolution more complicated.
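The two mappings can be compared side by side with a small sketch using Python's built-in sqlite3 module. The attribute names (`name`, `tonnage`, `cargo_capacity`), the sample row, and the table name `Cargo_Ship_flat` are assumptions made for illustration; they are not taken from Fig. 3. The terms horizontal and vertical are used as defined in the text above.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Horizontal partitioning (as defined above): one flattened relation per class,
# holding inherited and locally defined attributes together.
db.execute("CREATE TABLE Cargo_Ship_flat (oid INTEGER PRIMARY KEY, "
           "name TEXT, tonnage INTEGER, cargo_capacity INTEGER)")
db.execute("INSERT INTO Cargo_Ship_flat VALUES (1, 'Aurora', 9000, 400)")

# Vertical partitioning: one relation per class definition, keyed by the OID.
db.execute("CREATE TABLE Ship (oid INTEGER PRIMARY KEY, name TEXT, tonnage INTEGER)")
db.execute("CREATE TABLE Cargo_Ship (oid INTEGER PRIMARY KEY, cargo_capacity INTEGER)")
db.execute("INSERT INTO Ship VALUES (1, 'Aurora', 9000)")
db.execute("INSERT INTO Cargo_Ship VALUES (1, 400)")

# Flattened layout: a single-relation lookup yields the whole object state.
flat_row = db.execute(
    "SELECT oid, name, tonnage, cargo_capacity FROM Cargo_Ship_flat WHERE oid = 1"
).fetchone()

# Per-class layout: reconstructing the same object needs a join on the OID.
joined_row = db.execute(
    "SELECT s.oid, s.name, s.tonnage, c.cargo_capacity "
    "FROM Ship AS s JOIN Cargo_Ship AS c ON c.oid = s.oid WHERE s.oid = 1"
).fetchone()
```

Both queries return the same state for the object; the difference is the join in the second case. Conversely, adding an attribute to Ship touches one relation in the per-class layout but every flattened subclass relation in the other, which is one way class evolution becomes more complicated.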

The approach is limited by the data types that the relational system can store in tuple attributes. For example, separate handling must be provided for types such as arrays and methods that are not supported by the relational database. Also, if the object contains other objects (as opposed to referencing them by name), these contained objects must be stored in separate relations and, again, the costly operation of constructing object state from multiple tuples arises. There are advantages to this approach for systems that stress a query interface, and most systems using this approach, in fact, emphasize queries over navigation. Also, because a mature database technology is used as an underpinning, existing database services such as SQL queries, backup and recovery, access control, and concurrency control can be used.

d) Type-based object servers: Type-based object servers not only operate on individual objects, but have the ability to execute the object's methods. The ability to execute methods allows computation to be moved from the application to the storage server, allows the server to execute type-based object-oriented queries, and allows these queries to be further optimized.

Method execution requires the storage server to first material­ize a computational representation of the objects; this can in fact be supported by page servers or either of the other two classes of object servers. Thus the extension of one of the other servers to a type-based object server can be accomplished by means of a software layer similar to that which supports the materialization of objects in the application's computational memory. The choice of whether to tightly bundle the added capabilities of type-based object servers with an underlying server or to make them separate modules in the OODB manager is a trade-off between perfor­mance and modularity.

2) Supporting the OODB's Object Model: Systems that strive for seamlessness must have a transparent way to get objects from the OODB into computational memory. For example, if a Get_Port message is sent to a ship object, the application ought not to be concerned whether the port is actually a separately persistent object; to do so would break the abstraction provided by the object-oriented model. In fact, it is possible that when the application sending the message was written, the port was really part of the ship object, but was later separated out because of the needs of other applications. Ideally, the Get_Port message ought to behave the same in both cases. Thus one could argue that, from a programmer's perspective, an object consists of everything reachable from the object. The object's root must be explicitly fetched using its OID; most systems provide a name server to map user-friendly names to OID's.

Once the root is identified, objects that it references should appear without further application intervention. This is known as object faulting, and can be supported in a number of ways. In page servers, transparent retrieval (and saving) is implemented by a compiler preprocessor or compiler modifications to generated code to pin pages before they are required and unpin/flush them after use [64]. In OODB's employing object servers, faulting is achieved by either augmenting the message dispatch procedure [59] to perform the residence check and retrieval if required, or by forcing references to unfetched objects to cause a trap to the OODB (typically by an illegal memory address or memory contents), which can then materialize the object [61].
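A minimal sketch of the first technique (a residence check folded into message dispatch) is shown below. It relies on Python's `__getattr__` hook, which is called only when ordinary attribute lookup fails; the `Fault` placeholder, the `ToyStore` class, the OID 42, and the port name are hypothetical and do not correspond to any system cited here.

```python
class Fault:
    """Placeholder for a not-yet-resident object, identified only by its OID."""

    def __init__(self, oid, store):
        # Write through __dict__ so that setup never triggers __getattr__.
        self.__dict__["_oid"] = oid
        self.__dict__["_store"] = store
        self.__dict__["_target"] = None

    def __getattr__(self, name):
        # Reached only for names the placeholder lacks, i.e., the target's own.
        if self._target is None:                       # residence check
            self.__dict__["_target"] = self._store.materialize(self._oid)
        return getattr(self._target, name)


class Port:
    def __init__(self, name):
        self.name = name


class Ship:
    def __init__(self, port):
        self._port = port

    def get_port(self):
        return self._port


class ToyStore:
    """Hypothetical stand-in for the storage server."""

    def __init__(self):
        self.fetches = 0

    def materialize(self, oid):
        self.fetches += 1
        return Port("Rotterdam")


store = ToyStore()
ship = Ship(Fault(oid=42, store=store))   # the port is a separately persistent object
port = ship.get_port()                    # no fetch yet
first = port.name                         # fault: the object is materialized here
second = port.name                        # already resident, no second fetch
```

The application code that calls `get_port()` and reads `name` is the same whether the port is embedded in the ship or fetched on demand, which is the seamlessness argued for in the text.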

3) Storage Server Performance Issues:

a) Clustering and prefetching: One school of the OODB community believes that following interobject references is the dominant way of accessing objects from OODB's. If application experience with the first generation of OODB's shows this to be true, these interobject references may be used to improve performance. Since the cost to retrieve a group of objects that are physically colocated on the disk is essentially the same as the cost to retrieve one of these objects, disk access time can be reduced by clustering together objects likely to be accessed in the same session. Clustering schemes have long been used in databases; the major difference in OODB's is that the stored objects themselves provide a rich source of information that can be used to drive the clustering mechanisms. Some of the open issues are the following: 1) to what extent interobject references actually drive access patterns, 2) how to cluster with respect to multiple applications with dissimilar access patterns, and 3) whether clustering based on data type or clustering based on interobject references is better. Performance improvements of over 60% because of clustering have been reported by CACTIS [54]. ObServer [74] is a storage server that dynamically clusters according to reference patterns.

Prefetching refers to physically moving certain objects off of disk into a buffer or cache before they are requested, expecting that they will be requested soon. Prefetching has the advantage of reducing communication bottlenecks and has a positive impact on application performance. It is based on much the same strategy as clustering. A potential problem with prefetching is illustrated by the claim that the disparity in processor and disk speeds will make it impossible to determine far enough in advance which objects are likely to be needed [78].

b) Using the storage server's object materialization capabilities to support parallelism: The encapsulation of objects reduces the degree to which objects are dependent on their environment. This facilitates moving objects around a network to evaluate their methods where it can be done most efficiently. Encapsulation appears to simplify the parallel execution of object-oriented code because of the minimal dependence on environment. The object translation capability of OODB's can be used to implement the actual movement of objects between machines. This is an area that presents opportunities to improve performance, particularly with respect to object-oriented queries, since queries tend to have a great deal of parallelism.

c) Measuring storage server performance: DeWitt et al. [79] simulated the performance characteristics of page and object servers. The results indicate that the object server approach will perform poorly with read-only applications that tend to scan large data sets, but will perform generally better than a page or file server for applications performing many updates. The simulation did not investigate possible similar benefits in the object server model from prefetching related objects.

C. Concurrency Control and Transaction Management

One of the primary purposes of a database is to allow sharing of information. As such, an OODB must regulate access to information by multiple concurrent users, each of whom is potentially unaware of the existence of the other users. This section 1) summarizes concurrency control and transaction management as understood and applied in conventional databases, 2) presents OODB-specific techniques that promise a higher degree of concurrent access than achievable using conventional techniques, and 3) discusses how OODB's can support cooperative design environments.

1) Transactions: Any database system must support the notion of atomic, recoverable, and serializable transactions [80]. Atomicity means that a series of operations has an all or nothing effect on the database; either all operations succeed or all fail. This is necessary so that applications see a consistent state. Recoverability can be provided at many levels, and OODB's do not require or impose anything special in this area. Serializability means that if the operations of two atomic transactions are interleaved, the result is as if one ran to completion before the other started. Serializability is considered to be sufficient (but not necessary) for ensuring the proper behavior of independent transactions in a multiuser system.

A transaction tree in which a transaction may contain subtransactions is called a nested transaction [81]. Nested transactions are applicable to database systems other than OODB's. The results of committed subtransactions are visible only to their parents. The results of subtransactions may or may not be recoverable to any given resiliency; this is purely an efficiency issue.

In an OODB, it makes sense for a method invocation to be treated as a transaction, since the actions that implement the method appear atomic to the message sender. Since methods often send messages to other objects, a natural nesting of transactions occurs. Also, because one of the goals of object-oriented programming is to facilitate code reuse, it must be possible to use existing objects that use transactions as part of larger transactions. If object abstraction is not to be broken, this must be possible without modifying the methods of existing object classes. Thus nested transactions are extremely important in OODB's. To date, nested transactions have been primarily a research topic with no commercially available implementations, to our knowledge. A simpler approach often used in practice involves using one global, single-level transaction to wrap a collection of OODB operations.
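The visibility rule for nested transactions (results of a committed subtransaction are visible only to its parent until the parent itself commits) can be sketched as follows. This is a toy model over a Python dict with no locking and no recovery; the `Transaction` class and the `cargo` key are hypothetical.

```python
class Transaction:
    """Toy nested transaction over a dict store (no locking, no recovery)."""

    def __init__(self, parent=None, store=None):
        self.parent = parent
        self.store = store if parent is None else parent.store
        self.writes = {}            # uncommitted changes private to this transaction

    def read(self, key):
        # Look through this transaction, then its ancestors, then the store.
        t = self
        while t is not None:
            if key in t.writes:
                return t.writes[key]
            t = t.parent
        return self.store.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def begin_sub(self):
        return Transaction(parent=self)

    def commit(self):
        if self.parent is not None:
            self.parent.writes.update(self.writes)   # visible to the parent only
        else:
            self.store.update(self.writes)           # top level: visible to all
        self.writes = {}

    def abort(self):
        self.writes = {}            # all-or-nothing: discard every change


store = {"cargo": 0}
top = Transaction(store=store)      # e.g., the outer method invocation
sub = top.begin_sub()               # e.g., a message sent from inside that method
sub.write("cargo", 5)
sub.commit()
seen_by_parent = top.read("cargo")
seen_outside = store["cargo"]
top.commit()
```

Because `begin_sub` needs nothing from the caller except the enclosing transaction, a method written as a transaction can be reused inside a larger one without modification, which is the reuse argument made above.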

If each method invocation is a transaction, it is essential that the cost of a subtransaction be minimal, as they will be so fre­quent. This implies that sub-transactions should not be recover­able. However, it is often desirable for an object to have its meth­ods create recoverable results. If these methods are then invoked from inside another method (itself a transaction), there is now a need for subtransactions to be recoverable, else the behavior of the object changes. For these competing reasons, it is desirable to separate the notion of recoverability from that of atomicity. It should be possible to specify how recoverable a transaction will be. There must also be rules to ensure that results read by a recoverable transaction are recoverable. This is still a research area and, to our knowledge, no implementation of a "variable weight" transaction mechanism exists.

2) Type-Specific Transaction Mechanisms: An object-oriented approach to concurrency control is the notion of type-specific concurrency control [82], [83]. Typically, concurrency control is obtained by examining the read/write behavior of transactions. However, since the behavior of a type is completely defined by its interface, it is possible to construct atomic objects that can be used concurrently in ways not possible in schemes based on read/write behavior only. For example, it is legal to both enqueue and dequeue from a queue object simultaneously, but this cannot be constructed using conventional protocols that depend on read/write behavior.
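The queue example can be made concrete with a short sketch. The conflict rule in `type_specific_conflict` is an assumption chosen for illustration (enqueue and dequeue commute on a non-empty queue; everything else is conservatively treated as a conflict) and is not taken from [82] or [83].

```python
from collections import deque

# Read/write view: both operations modify the queue, so each needs a write lock.
RW_MODE = {"enqueue": "write", "dequeue": "write"}


def rw_conflict(op1, op2):
    """Conventional rule: two operations conflict if either one writes."""
    return "write" in (RW_MODE[op1], RW_MODE[op2])


def type_specific_conflict(op1, op2, queue_len):
    """Hypothetical type-specific rule for a FIFO queue: enqueue works on the
    tail and dequeue on the head, so on a non-empty queue the two commute.
    Any other pairing is conservatively treated as a conflict."""
    if {op1, op2} == {"enqueue", "dequeue"}:
        return queue_len == 0
    return True


def run(ops, initial):
    """Apply operations in order; return (final queue contents, dequeued items)."""
    q, out = deque(initial), []
    for op, arg in ops:
        if op == "enqueue":
            q.append(arg)
        else:
            out.append(q.popleft())
    return list(q), out


# Both orders of the two operations give the same queue and the same result.
a = run([("enqueue", "x"), ("dequeue", None)], initial=["h"])
b = run([("dequeue", None), ("enqueue", "x")], initial=["h"])
```

Since `a` and `b` are equal, either interleaving is equivalent to a serial order, yet `rw_conflict("enqueue", "dequeue")` reports a conflict because both operations write the queue. Only knowledge of the type's interface exposes the extra concurrency.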

Griffeth, Moss, and Graham [84] extend the notion of atomic objects with the concepts of abstract and concrete atomicity. A transaction is thought of as a movement from one abstract state to another, by means of a sequence of abstract actions. A very flexible concurrency scheme can be built by considering layers of abstractions, in which an abstract action at one level is implemented as a sequence of concrete actions at the next lower level. As long as the actions at each individual level produce a serializable schedule at that level, the total schedule will also be serializable. Different criteria for serializability (two-phase locking, type-specific, etc.) can be employed at the various levels. This allows a much larger class of legal schedules than would be possible without the levels of abstraction. For example, if simple two-phase locking is used at each level, it would be impossible to release locks anywhere in the schedule if additional locks were required later. This restriction does not hold in this model, since two phasedness is required to hold only within a single abstract operation. As a result, concurrency is increased. This model is particularly well suited for object-oriented databases, since composition of objects provides natural levels of abstraction and encourages the use of type-specific concurrency control for individual objects. Without the notion of abstraction levels, concurrency would either have to be enforced at the object level based on read/write behavior or encapsulation of the subobjects would have to be broken to allow a higher level type-specific concurrency control scheme to be implemented.

3) Long and Cooperative Transactions: Since one of the initial uses of OODB's is expected to be support of next-generation applications (see Section III), there has been considerable attention paid to the concurrency control requirements of such applications, in particular, in computer-aided design (CAD). Design tasks generally involve a team of designers cooperating for a period of days to months. The long duration of these tasks (transactions) means that the concurrency control strategies used in conventional databases are not appropriate [36]. Traditional databases enforce serializable schedules of transactions, with the major differences being in the size of the information grain whose access is individually controlled. Since the traditional transactions are of short duration (seconds to minutes), traditional schemes rely on the blocking transactions terminating quickly, thus enabling other transactions to continue execution. When the transactions execute for long durations, this scheme does not work; a long-running transaction may block out other transactions for days or months.

Support for long transactions is crucial in an OODB. The first issue is to ensure that long transactions can save their intermediate state. This can be accomplished by checkpointing (traditional), nested transactions (see above), or piggy-back transactions (which allow a long transaction to be split into a series of shorter transactions that run sequentially, passing their locks directly to their successor to prevent the intervention of another transaction from outside the sequence). These techniques do not allow increased concurrency.

Sagas [85] relaxes the restriction that the subtransactions of the saga execute without external interference by requiring that each subtransaction be supplied with a compensating transaction that can undo the transaction's effects should the saga abort, even when another unrelated transaction has already executed. Additional concurrency can also be gained by type-specific concurrency control (see above). This allows schedules whose individual operations are not read/write serializable. However, the behavior of the resulting schedule is still serializable in the sense that transactions only see a globally consistent database state.
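The compensation idea can be sketched as a short driver loop. This is an illustration only, not the mechanism of [85]: `run_saga`, the `reserve`/`unreserve` pair, and the dict used as the database state are hypothetical, and each action is assumed to commit on its own as soon as it returns.

```python
def run_saga(steps, state):
    """Run (action, compensation) pairs in order against state.

    Each action stands for a subtransaction that commits independently.
    If one fails, the compensations of the steps already committed are
    run in reverse order and False is returned.
    """
    done = []
    try:
        for action, compensation in steps:
            action(state)
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation(state)      # semantic undo, not a state restore
        return False
    return True


def reserve(s):
    s["reserved"] += 1


def unreserve(s):
    s["reserved"] -= 1


def outsider(s):
    # Stands for an unrelated transaction that runs between saga steps.
    s["reserved"] += 10


def fail(s):
    raise RuntimeError("step failed")


state = {"reserved": 0}
ok = run_saga(
    [(reserve, unreserve), (outsider, lambda s: None), (fail, lambda s: None)],
    state,
)
```

The compensation subtracts what its own step added instead of restoring a saved snapshot, so the update made in between by the unrelated transaction survives the abort of the saga.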

In a design team, designers often look at each others' incomplete or inconsistent results to guide their own work. The concept of cooperative transactions relaxes the database consistency requirements to allow transactions to view each others' partial results under certain conditions.

One mechanism to support cooperative work is a check-in/check-out system [86]. Designers wishing to cooperate "check out" design objects from the global database into a private workspace. In the private workspace, designers operate on the objects outside the database's concurrency control. When the designers are done, the objects are "checked in" to the database. To the global database, which enforces minimal concurrency control, the entire collection of operations in the private workspace appears as a single transaction. The check-in and check-out operations are normal (short) transactions against the global database. The check-in/check-out scheme supports flexible concurrency control while not requiring changes to the database's concurrency manager; but it delegates much of the responsibility for data integrity and concurrency control to the designers.

OODB's can also support cooperative work more directly by providing and enforcing a wider variety of lock types than the customary read and write locks [87]. An example is the notify lock, which causes the holder of the lock to be notified in the event that another transaction modifies the locked object. This notification can then be used by the lock holder to either trigger a reread of the object, or a negotiation with the modifying transaction to resolve any inconsistencies. This scheme provides services by which consistency can be maintained, but does not actually enforce consistency.
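A notify lock can be pictured as a registration that never blocks the writer. The sketch below is a toy model under stated assumptions: the `LockManager` class, its method names, and the designer and object names are hypothetical, and read and write locks are omitted entirely.

```python
class LockManager:
    """Toy lock table with a non-blocking notify lock (read/write locks omitted)."""

    def __init__(self):
        self._watchers = {}     # oid -> list of (holder, callback)

    def notify_lock(self, oid, holder, callback):
        # Register interest in the object; nothing is blocked by this lock.
        self._watchers.setdefault(oid, []).append((holder, callback))

    def modified(self, oid, writer):
        # Called when a transaction updates the object: inform, do not block.
        for holder, callback in self._watchers.get(oid, []):
            if holder != writer:
                callback(oid, writer)


events = []
locks = LockManager()
locks.notify_lock("wing-design", "alice", lambda oid, who: events.append((oid, who)))
locks.modified("wing-design", writer="bob")      # alice is told that bob wrote
locks.modified("wing-design", writer="alice")    # own update: no notification
```

What the holder does with the notification (reread the object, or negotiate with the writer) is left to the callback, which mirrors the point that the scheme offers a service for maintaining consistency without enforcing it.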

Read-only or multiversion databases also can provide additional concurrency [88]. Each write creates a new version of the object; thus it is possible for several transactions to be creating new versions simultaneously. This has the drawback that there is no uniform way to reconcile or merge competing versions. The database could provide some tools to identify conflicts in need of resolution when merging versions.

When the set of possible transactions is known in advance, it is possible to define a transaction group [89] whose members are known not to conflict and thus can be interleaved in any order. The transaction groups are determined by examining the semantics of the individual transactions. A similar approach was used in the System for Distributed Databases (SDD-1) [90]. Korth and Speegle [91] define local consistency constraints based on pre and post conditions for individual transactions. In this case, if a transaction has a less rigid requirement for concurrency control, it can run under more relaxed conditions, perhaps allowing the use of uncommitted or only partially consistent objects. Skarra [92] combines the concepts of local consistency and transaction groups into a scheme allowing dynamic transaction groups.

D. Object-Oriented Queries

Database queries retrieve or manipulate information that satisfies some predicate. In other words, information is accessed based on its value, rather than its identity. In relational databases, this corresponds to the retrieval of a relation of tuples via a query language such as SQL. In relational database systems, the targets and results of queries are relations; in object-oriented systems, the targets and results of queries are sets of objects. Object-oriented queries differ from relational queries in three main respects: 1) allowable predicates and response sets, 2) semantics of relationships and inheritance, and 3) query optimization techniques.

1) Allowable Predicates and Response Sets: Objects selected in an object-oriented query can be determined by a predicate involving either the object's abstract interface (type) or the object's implementation (class). Queries over type are purely object-oriented; however, to support them, the OODB must be able to execute the object's methods. Thus queries over type are restricted to OODB's supporting an active object model. Queries over class can be implemented by OODB's supporting only a passive object model. When supporting queries over types, the OODB may choose to allow only non-side-effecting methods to be used in a query predicate. This avoids the necessity of deciding what to do about an unintended side effect caused by the execution of a query. Similarly, restricting queries over class to apply only to certain attributes is reasonable when the OODB uses a relational database to implement its storage server. In such systems, some attribute values (primitive values) are stored in a way understandable by the relational database, while others (bit-buckets, see Section IV-B) are not. The SQL engine of a relational database can support queries over primitive attributes, while additional work would have to be done to support queries over packed attributes. By restricting queries to primitive attributes only, a simple query capability can be added easily.
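The distinction between the two kinds of query can be sketched as follows. The Ship class, its attributes, and the sample data are hypothetical and serve only to contrast a predicate that calls a method with one that reads stored attributes.

```python
class Ship:
    """Hypothetical class: `speed` is a public method, the rest is stored state."""

    def __init__(self, name, distance_nm, hours):
        self.name = name
        self.distance_nm = distance_nm
        self.hours = hours

    def speed(self):
        # Side-effect free, so it is safe to call from a query predicate.
        return self.distance_nm / self.hours


fleet = [Ship("Argo", 600, 10), Ship("Beagle", 300, 10), Ship("Calypso", 900, 30)]

# Query over type: the predicate calls a method, so the database itself must
# be able to execute object code (active object model).
fast = [s.name for s in fleet if s.speed() > 50]

# Query over class: the predicate reads stored attributes only, so it can be
# evaluated against passive records without running any methods.
records = [vars(s) for s in fleet]
long_haul = [r["name"] for r in records if r["distance_nm"] > 500]
```

In the second query the records could just as well be rows held by a relational storage server, which is why restricting class-based queries to primitive attributes makes them easy to support.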

In a relational database, heterogeneous responses (response sets containing more than one type of tuple) are not possible. However, in OODB's heterogeneous responses are possible (though not always supported) for object-oriented queries. For example, Postgres [52] supports heterogeneous sets, and thus it would be possible to send a List_Items message to a set of both Ships and Cargo_Ships in the example of Section II-B; other systems would consider this to be an error since Ship does not have List_Items in its interface.

2) Semantics of Relationships and Inheritance: Objects have a much higher level of semantics than relational tuples. The additional semantics arise in two ways, namely, 1) rich interobject reference semantics expressible in multiple ways and 2) the use of inheritance to express relationships between classes. Queries over objects must comprehend these additional semantics.

Interobject reference may mean relationship, containment, connectivity, or ownership. For example, assume that we have a part object containing a list of references to other part objects. If a query asks for the list of all sub-parts of the part, it is necessary for the query processor to know if this list of part objects is a list of sub-parts or a list of other parts to which this part is attached. For this reason, OODB's that define a new type system often define explicit types of known interobject references. In persistent language based systems, where references are embedded within the objects as OID's, there must be a way to communicate similar information to the query processor.

Inheritance is used for many purposes [68], [93]; two examples are subtyping and code sharing. It is important for the query processor to know the meaning of the inheritance in a particular use. This is illustrated by the following example. If there is a class Android that inherits from class Person to share code, it is not meaningful for a query asking for persons with some characteristic to return any androids. However, in the example of Section II-B, a retrieval of Ships with speed greater than 50 could return Cargo_Ships. Again, OODB's that define a new data model are free to restrict the meaning of inheritance, while persistent language systems must retrofit to the meaning of inheritance in the language.
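One way to convey the meaning of inheritance to a query processor is to annotate the classes that inherit for code sharing only. The sketch below assumes such an annotation (the set CODE_SHARING_ONLY and the function extent are invented for this example) and uses it to decide which objects a query over a class should consider.

```python
class Person:
    pass


class Employee(Person):  # a true subtype of Person
    pass


class Android(Person):  # inherits from Person only to reuse its code
    pass


# Hypothetical annotation: classes whose inheritance is for code sharing only.
CODE_SHARING_ONLY = {Android}


def extent(cls, objects):
    """Return the objects that a query over `cls` should consider."""
    selected = []
    for obj in objects:
        if not isinstance(obj, cls):
            continue
        mro = type(obj).__mro__
        # Classes strictly more specific than `cls` in this object's lineage.
        below = mro[: mro.index(cls)]
        if not any(c in CODE_SHARING_ONLY for c in below):
            selected.append(obj)
    return selected


population = [Person(), Employee(), Android()]
persons = [type(o).__name__ for o in extent(Person, population)]
androids = [type(o).__name__ for o in extent(Android, population)]
```

A query over Person then returns persons and employees but no androids, while a query over Android still finds the android. The lineage test is deliberately simple and ignores the subtleties of multiple inheritance.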

3) Optimization of Object-Oriented Queries: Query optimization in relational databases is accomplished by mapping a query into a graph of algebraic operators like join, semi-join, project, and select and then transforming the graph to a more efficient execution graph. To know "how" the graph can be rearranged, the optimizer must know which operations commute, distribute, and associate. To know which ordering is "better," the optimizer must have knowledge of such things as estimated cost to perform an operation, expected result size, and existence of indexes. These are known for relational algebra on which relational query languages are based. Since object-oriented queries allow arbitrary methods to be used as part of the predicate of a query, neither their algebra nor performance characteristics can be known at the time the optimizer is written.
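The dependence of the optimizer on cost and result-size estimates can be seen in a small sketch. It assumes three selection predicates that commute and are statistically independent; the costs and selectivities are invented numbers, and the one attached to the method call is exactly the kind of estimate an optimizer cannot know when it is written.

```python
from itertools import permutations


def expected_cost(order):
    """Expected per-object cost of applying the predicates in this order."""
    cost, surviving = 0.0, 1.0
    for _name, per_object_cost, selectivity in order:
        cost += surviving * per_object_cost  # only surviving objects are tested
        surviving *= selectivity             # fraction that passes this predicate
    return cost


# (predicate, cost per object, fraction of objects that pass); all hypothetical.
predicates = [
    ("speed > 50", 1.0, 0.5),
    ("name = 'Argo'", 1.0, 0.01),
    ("passes_inspection()", 200.0, 0.9),  # arbitrary method: numbers are guesses
]

# Selections commute, so every ordering gives the same answer; pick the cheapest.
best = min(permutations(predicates), key=expected_cost)
best_names = [name for name, _cost, _sel in best]
```

With these numbers the cheapest plan applies the highly selective name test first and the expensive method last. Exhaustive enumeration is used only to keep the sketch short; it is not how a production optimizer searches.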

Optimizing queries in the presence of arbitrary methods is an open issue. Several systems allow database methods to be implemented with arbitrary code. This promotes seamlessness, but allowing arbitrary methods blocks the usual query optimization strategies. Also, side-effecting methods may cause iterating over a collection in different orders to give different results (Andrews [58] presents a scheme for using blocks with identifiable end-markers to avoid this problem). Graefe and Maier [94] explore a mechanism to make the implementation of methods visible to the query optimizer. This is a violation of encapsulation if the optimizer is seen as an application; it is a reasonable extension if the optimizer is seen as a system module that may break encapsulations in a disciplined manner. Systems like Postgres restrict methods to contain only data manipulation language (DML) commands that can be optimized. Other systems do not permit methods in queries but do allow methods to be executed against objects retrieved by class-based queries.

Relational extensions [95], [96] showed how abstract data types (ADT's) from a programming language, restricted to not containing pointers, could be imported/exported into relational database fields and how operations on ADT's could be used in queries. This work also showed how to extend standard indexes to provide fast access paths for ADT's and how to define entirely new index types like KWIC and R-trees. These extensions were made by registering ADT's, operators, and abstract indices with the query optimizer. We see a trend toward providing open query optimizers to support not only abstract indexes but also semantic query optimization, cooperative response, incremental view update, incremental query reformulation, and/or parallelism and distributed queries.

A generally useful optimization technique is caching, i.e., saving the result of a computation so that it can be reused rather than recalculated. This becomes very useful if method results are allowed in query predicates. A generic cache management system implemented in Lisp is described in [97]. Postgres uses caching to compute derived representations of complex objects, like forms, so they are immediately available when needed. If the data from which they were derived are modified, a trigger recomputes the cached representation. Similar data caching mechanisms can be used to support view materialization [98], [99] and could be used to maintain consistency of computed indexes.
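The trigger-maintained cache described above can be sketched in a few lines. The Form class is hypothetical and is not the Postgres mechanism itself; it only shows a derived representation that is recomputed when the underlying data change and is otherwise served from the cache.

```python
class Form:
    """Object with a cached derived representation kept fresh by a trigger."""

    def __init__(self, fields):
        self._fields = dict(fields)
        self.render_count = 0          # how often the derivation actually ran
        self._cached = self._render()  # computed ahead of time

    def _render(self):
        self.render_count += 1
        return ", ".join(f"{k}={v}" for k, v in sorted(self._fields.items()))

    def set_field(self, key, value):
        self._fields[key] = value
        self._cached = self._render()  # plays the role of the trigger

    def rendered(self):
        return self._cached            # immediately available, no recomputation


form = Form({"a": 1})
first = form.rendered()
again = form.rendered()
count_before_update = form.render_count
form.set_field("b", 2)
after_update = form.rendered()
```

Repeated reads cost nothing, and each update pays for one recomputation. A lazy variant that merely invalidates the cache on update would trade update cost for a slower first read.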

E. Change Management

There is general agreement in the research and industrial community [71], [100] that change management is an important service to be provided in an object-oriented environment. Change management (CM) may be provided as an integral part of the data model or as a distinct layer decoupled from the data model. Some OODB architectures have embedded CM as an intrinsic part of the model [65]; others implement it as a distinct layer [61]. Most existing software change management systems [101]-[103] interact closely with file systems and only support the management of change at the file level. An object-oriented environment needs a change management system that provides a unified environment supporting the evolution and configuration of arbitrary objects. Systems like PIE [104] and Common Lisp Framework [105] address this need in specific domains by providing an object-oriented framework to manage change at the level of granularity indicated by the semantics of the application. A reference model for a generic CM system is described in Joseph et al. [106].

1) Change Management Definitions: CM may be defined as a consistent set of techniques that aid in evolving the design and implementation of an abstraction. These techniques may be applied at many levels to record history and explore alternatives (versions), manage a layered design (configurations), and maintain consistency during evolution and across multiple representations (transformations). The operational model of change management is tightly related to the transaction model in the system. A model supporting nesting of transactions, exceptions, and notifications is needed for a full implementation of a CM system. Since inheritance is an important part of object-orientedness, a CM system also needs to understand the additional constraints that the inheritance model imposes on the environment. It is important that the system support objects of arbitrary types at different levels of granularity. Users are also concerned about interfaces to the change management system; for example, graphical presentations and query facilities. CM should document the evolution of objects for purposes of validation [107], traceability, and reuse. Change management systems exist to assist the management of data; this support role implies that a CM system that is obtrusive, low performance, or unfriendly may not be used.

a) Versions: The life cycle of a system is a set of discrete activities occurring during its development and use. The objects in the system evolve during this life cycle. A snapshot of an object during this evolution is called a version of the object. This snapshot may be distinguished from others by its creation time or by some other quantitative or qualitative attributes. An object is represented by its many versions during its life cycle. The version derivation sequence of an object reflects how the object evolves. The simplest way an object could evolve is linear: changes to the object always occur on the current version. In many applications, however, changes often occur in a nonlinear fashion. Therefore, a change management system needs to support alternatives or branching versions. Also, the desired object may be selected from a combination of several possibilities, implying the system needs to support merging of versions. Since objects are usually structured hierarchically, the creation of versions of an object can trigger versions of other objects in the hierarchy. It is often desirable to allow the application to control the triggered creation of versions in a hierarchy [108].

A version of an object may be represented as a delta from a "previous" version. Such differential representations save space when the object is large or when the changes are small and frequent. Differential representation trades off computation time for storage space.
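A minimal sketch of a differential representation follows, under the assumption that an object version is a flat dictionary of attributes. The functions make_delta, apply_delta, and materialize are names chosen for this example.

```python
_MISSING = object()  # sentinel for "attribute absent in the older version"


def make_delta(old, new):
    """Record only what changed between two attribute dictionaries."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    new = {k: v for k, v in old.items() if k not in delta["removed"]}
    new.update(delta["changed"])
    return new


def materialize(base, deltas, n):
    """Reconstruct version n by replaying the first n deltas onto the base."""
    state = base
    for delta in deltas[:n]:
        state = apply_delta(state, delta)
    return state


v0 = {"length": 10, "material": "steel"}
v1 = {"length": 12, "material": "steel"}
v2 = {"length": 12, "coating": "zinc"}
deltas = [make_delta(v0, v1), make_delta(v1, v2)]
```

Only v0 and the two small deltas need to be stored. Retrieving version n costs n delta applications, which is the computation time being traded for storage space.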

b) Configurations: It is frequently necessary to design an object by composing it from other objects (see Section II). A configuration of an object is its specification as a composition of other objects. The specification may be bound either statically or dynamically. In static binding, the exact versions of the components that make up the configuration are specified; in dynamic binding one may use various schemes to delay the binding of versions of components [104], [109]. It is sometimes appropriate to bind the component objects only to their interfaces; this way a change management system can select the appropriate implementation of the object dynamically. A configuration structure is, in general, a directed acyclic graph. The links in this graph may be adorned with properties to associate with an object and its components. One such property is selective inheritance of some properties of the composite object. Properties common to many components of an object may be attached to the object and inherited by these components. This kind of inheritance promotes performance and consistency. Another link property is that of ownership [110]; this property asserts that a component exists only by virtue of its being a part of another object. Physical hierarchies can be simulated in object-oriented design by means of ownership links.
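Static and dynamic binding of component versions can be contrasted with a small sketch. The LATEST marker, the component names, and the resolve function are assumptions of this example; "bind to the newest version" is only one of the delayed-binding schemes the cited work considers.

```python
LATEST = object()  # hypothetical marker: bind to the newest version at use time

# Available versions of each component, oldest first.
available = {"engine": ["1.0", "1.1", "2.0"], "chassis": ["A", "B"]}

# The engine is bound statically; the chassis is bound dynamically.
configuration = {"engine": "1.1", "chassis": LATEST}


def resolve(config, versions):
    """Turn a configuration into concrete component versions."""
    return {
        component: (versions[component][-1] if wanted is LATEST else wanted)
        for component, wanted in config.items()
    }


before = resolve(configuration, available)
available["chassis"].append("C")  # a new chassis version is released
available["engine"].append("3.0")  # a new engine version is released
after = resolve(configuration, available)
```

The statically bound engine stays at version 1.1, whereas the dynamically bound chassis follows the newest release without any edit to the configuration.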

c) Transformations: Operations applied to objects during their life cycle are transformations. There are transformations such as editing, simulation, and analysis applied to particular views of an object. There are also transformations, variously called translations, compilations, expansions, or synthesis, to bring an object from one view to another. In heterogeneous or multiuser environments, there are transformations to transport objects across machines or development environments. As a result of transformations, new objects are created. These new objects are different from the original objects in versions, configurations, or representations. The transformation aspect of change management addresses issues such as change notification, change propagation, dependency tracking, and constraint maintenance. What is required, in general, is a constraint specification and management component. A body of work exists in the area of constraint specification and enforcement [111]-[115]. A transformational design paradigm for software development and application of the paradigm to VLSI design are studied in Mostow and Balzer [116].

2) Schema Evolution: The objects in an OODB are defined by a set of type and class definitions. The inheritance structure of an object-oriented system defines a relation is-a-subtype/subclass on this set of types/classes. Under this relation, the set of types/classes is a directed acyclic graph called the schema. The schema evolves by changes to the set of behaviors associated with a type, the structure of the type/class hierarchy, the physical organization of the class instances, or the methods implementing the behavior [59], [117].

Schema evolution can be seen as an application of the change management system where type/class definitions are the objects to be versioned. Inheritance imposes a configuration structure on the types/classes. The user is concerned with the effect that a versioning of a type/class has on existing instances; this is the domain of the transformation aspect of change management. Conceptually, therefore, a change management system covers all aspects; but there are difficult and interesting practical issues because of the extensional semantics (the associated instance and methods) of a class [87]. When a type/class evolves, the semantics may require that existing instances of the type/class change to conform to the new definition. The operational details and policies are provided by the change management system. The policies about when (immediate or at access time) and how (versioning or overwrite) the instances are updated for conformity have a major influence on performance and functionality. To assure structural consistency of old objects with new programs and to assure that existing programs continue to work, it is necessary to keep versions of the types, classes, instances, and programs. The programs need to know the particular versions of the types/classes that are correct for them. A fully functional change management system is required at all stages of software development and use to guarantee the correct schema evolution semantics. Schema evolution in object-oriented systems is inherently more complex than schema evolution in relational databases; this is because of the additional semantics associated with objects, namely, inheritance and behavior. Most existing object-oriented systems (programming and database) have very limited schema evolution capabilities. CLOS supports a limited form of schema evolution [17]. The reader is referred to Joseph et al. [71] for a discussion of schema evolution support in their systems by the designers of some OODB's.
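The "at access time" policy for bringing instances into conformity can be sketched as follows. The stored-record layout, the CONVERTERS table, and the added speed_knots attribute are all hypothetical; the sketch updates by versioning, in that it returns a converted copy and leaves the stored instance untouched.

```python
CURRENT_VERSION = 2

# Hypothetical conversion steps: CONVERTERS[n] upgrades data from schema
# version n to version n + 1. Here version 2 adds an attribute with a default.
CONVERTERS = {
    1: lambda data: {**data, "speed_knots": 0.0},
}


def fetch(stored):
    """Return an instance conforming to the current schema, converting lazily."""
    version = stored["schema_version"]
    data = dict(stored["data"])  # work on a copy; the stored form is kept
    while version < CURRENT_VERSION:
        data = CONVERTERS[version](data)
        version += 1
    return {"schema_version": version, "data": data}


old_instance = {"schema_version": 1, "data": {"name": "Argo"}}
upgraded = fetch(old_instance)
```

An immediate policy would instead run the same converters over every stored instance when the schema changes, paying the whole cost at once rather than on first access.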

Schema evolution presents one of the most dramatic rationales for the separation of type and class. The promotion of existing instances from one class definition to another, and the recompilation of programs to correspond to new definitions is a major expense. By separating the concepts of type and class, many of these operations can be avoided for certain kinds of schema evolution. For example, allowing several classes to implement the same type and then requiring programs to interact with objects by their type allows instances of old and new classes implementing the same type to coexist in memory, thus eliminating the need for instance promotion or program recompilation.
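The coexistence of old and new classes behind one type can be illustrated with a short sketch. The Point type and the two implementing classes are invented for this example; Python's typing.Protocol stands in for the abstract interface.

```python
import math
from typing import Protocol


class Point(Protocol):
    """The type: an abstract interface with no representation."""

    def x(self) -> float: ...
    def y(self) -> float: ...


class CartesianPoint:
    """The old class: stores coordinates directly."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    def x(self):
        return self._x

    def y(self):
        return self._y


class PolarPoint:
    """The new class: a different representation of the same type."""

    def __init__(self, r, theta):
        self._r, self._theta = r, theta

    def x(self):
        return self._r * math.cos(self._theta)

    def y(self):
        return self._r * math.sin(self._theta)


def distance_from_origin(p: Point) -> float:
    # Written against the type only, so it needs no change when classes evolve.
    return math.hypot(p.x(), p.y())


old = CartesianPoint(3.0, 4.0)
new = PolarPoint(5.0, 0.3)
distances = [distance_from_origin(old), distance_from_origin(new)]
```

Instances of both classes are used side by side by the same program, so neither instance promotion nor recompilation of distance_from_origin is needed.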

F. Other Database Issues

Given the immaturity of the OODB field, issues such as access control, remote database access, interlanguage sharing of objects, and user interfaces have not received adequate attention. However, these are areas that are likely to receive more attention in the next few years.

1) Access Control: Because OODB's store active objects, it is possible for the objects to perform their own access control. For example, an object could demand authentication before it performed some service. An interesting issue is how access control will interact with querying, since queries are generally performed under control of the database rather than the application. The question then becomes: who needs authentication, the application or the OODB? Also, would this preclude some optimization strategies, since some reorganizations of the query graph might be precluded by the need to maintain the access checks in the same relative positions in the graph?

2) Remote Database Access: Because OODB's present an abstract view of information that is representation independent, OODB's may form an excellent way to access heterogeneous databases, presenting the user the impression that there is a single OODB being accessed. A problem with this approach is to coordinate the transaction/commit mechanisms of the "foreign" databases to ensure atomicity.

3) Interlanguage Sharing: It is desirable to be able to share objects across hardware/software platforms, and also between different programming languages. When sharing objects across platforms/compilers, the data types' representations may be different based on different hardware word length, byte ordering, formats of structures like arrays and records, etc. This implies that the OODB must translate objects; systems that copy uninterpreted byte strings will not be able to support such sharing. There has been substantial effort [118]-[121] in other contexts to provide translation of primitive types (such as integers and reals) between machine classes. For OODB work, this must be extended to complex data structures. Some applicable work is the MIT Mercury system [122], but Mercury does not preserve sharing within a graph structure.
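The need for translation, even for a single integer, can be demonstrated with Python's standard struct module. The value and the choice of a 4-byte signed integer are arbitrary; the sketch assumes one machine that stores integers little-endian and another that reads them big-endian.

```python
import struct

value = 1991

# Bytes as written by a little-endian machine (4-byte signed integer).
little_endian_bytes = struct.pack("<i", value)

# A big-endian machine that copies the bytes uninterpreted reads a wrong value.
misread = struct.unpack(">i", little_endian_bytes)[0]

# Translation: decode with the writer's byte order, re-encode with the reader's.
decoded = struct.unpack("<i", little_endian_bytes)[0]
big_endian_bytes = struct.pack(">i", decoded)
reread = struct.unpack(">i", big_endian_bytes)[0]
```

Primitive values are the easy case. Records, arrays, and graphs with shared substructure need the same treatment applied recursively, which is where the text locates the open problem.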

Sharing across language boundaries raises the additional complication that not all data types are supported in all languages. Even for data types that appear to be universally supported, the semantics vary from language to language. For example, some languages require arrays to be homogeneous (all data elements of the same type) while others do not. Further, some languages allow an array to grow in size as needed, while others require its size to be declared at compile time. A system wishing to share representations between languages must decide whether it wishes to take a "least common denominator" approach that supports the least powerful features of the supported languages, or perform some runtime checking that will prevent a data structure from being moved to an inappropriate environment. Each approach has the disadvantage of requiring a programmer to know the potential usages of objects. Additionally, since methods may be arbitrary programs, they cannot be translated between languages. This would either require coding presumably identical methods in the various languages supported (with the resulting uncertainty of equality), or destroy the object-orientedness of the system entirely. While limited sharing is likely in a few years, a total and transparent ability to share across language boundaries remains a distant hope.

In OODB's such as Trellis/Owl [12] that support a separate data model, access from "conventional" programming languages is by means of messages sent from the conventional language to objects in the data model. This requires a dual type system, but does allow objects in the data model to be used from multiple languages in much the same way relational databases can be used from multiple languages today.

4) User Interfaces: OODB's are likely to use the same user interface technology as other applications, since user interface technology is becoming increasingly independent of backend applications. General-purpose object-oriented user interface toolkits like Motif [123] or InterViews [124] will provide application program interfaces including libraries for tables, forms, and other presentation styles. Higher level User Interface Management Systems (UIMS's) and User Interface Builders (UIB's) will hide even these programming interface details from users.

Data structure inspectors and schema browsers/editors, which OODB's will use, will be needed in ordinary programming environments independent of whether OODB's supply the persistence. In fact, it will be most desirable if the same editors operate on persistent and transient objects.

On the other hand, OODB's will make it easier to store and manipulate multimedia data (image, video, audio), as well as graphics and text data, making it easier to construct much larger multiuser, distributed hypermedia systems and spatial database applications. OODB's make it possible to use object queries to query mail, document structure, and other semistructured data to build rich views of information.

V. A SURVEY OF OODB SYSTEMS

In this section, we present a brief survey of a few typical OODB systems. This is not a comprehensive listing. There are many interesting systems that we are not going to cover, among them DEC Trellis/OWL [12], Symbolics Statice [60], Object Design ObjectStore [69], Versant OBJECT-Base [125], and Objectivity Objectivity/DB [126]. The information below is based on published material and may not be the most up to date as these systems are undergoing rapid evolution.

Of the seven systems surveyed here, the first two (Ontos, Gemstone) are commercial products and the third (MCC Orion) has a commercial variant; the others are research prototypes.

Figure 4 shows how these systems fit into the taxonomy of databases discussed in Section III-C-2).

Trellis is a trademark of Digital Equipment Corporation.

A. Ontologic Ontos

Ontos [62] is a commercial object-oriented database system developed by Ontologic. It provides persistence to C++ programs. Instances of classes that inherit from an Object class can be made persistent. Such classes have some restrictions to guarantee that the size of instances can be determined by Ontos. Additional methods to support persistence may also need to be defined for such classes. Ontos interfaces directly to programs written in C++ via a client library of C++ functions and classes. The library, which is linked directly into application programs, provides applications access to the database by means of persistent classes, schema classes, Aggregates, and exception classes. Schema classes are used to manage the class information (data dictionary) and provide run-time type information; Aggregate classes form the basis of arrays, lists, and sets and deal efficiently with groups of objects via associated Iterators; exception classes are used for consistent detection and handling of run-time error conditions.

Ontos supports concurrent users through a transaction mechanism based on locks. Transactions have a number of options to support short and long transactions. Ontos provides both pessimistic and optimistic concurrency control and checkpointing. A programmatic SQL (embedded in C++) is provided for associative access to data. An associative query may include methods or functions in any of its clauses; any side effects are visible in the database. The query may iterate over instances of classes or of Aggregates.

Ontos can be distributed over a network of nodes, each of the same hardware family and operating system. The currently supported families are Sun/Unix, Apollo/Unix, HP/Unix, VAX/VMS, and PC/OS2. Ontologic customers are mostly in the engineering and CAD/CAM markets. A previous OODB product from Ontologic was called Vbase [58].

B. Servio Corporation Gemstone

Gemstone [59] is a commercial object-oriented database system developed by Servio Corporation. A language, OPAL, based on Smalltalk [11] is provided for data definition, data manipulation, server access, and general computation. Secondary storage management, concurrency control, transactions, and work spaces are managed by the STONE subsystem built on top of a file system. The GEM subsystem supports the OPAL language and libraries of OPAL classes and methods. Gemstone also provides a module callable from multiple languages to link with applications running on a PC. Associative access to objects is provided through a calculus limited so that queries are viewed as OPAL procedures. A detailed discussion of indexing in object-oriented databases and the particular implementation choices in Gemstone appears in Maier et al. [43]. Gemstone is designed as a multiuser system and supports transactions, replication of data, and multilevel authorization control. It also provides the capability to extract data from SQL relational database systems. The OPAL programming environment that runs on the local workstation includes a class browser, workspace manager, inspector, and debugger. There is no support for version control or configuration management.

Gemstone is written in C and runs on DEC (VAX and DECstation), Sun (Sun-3 and Sun-4), and IBM (RS6000) computers. Client applications may be written in several languages and run on various IBM PC's and Apple Macintoshes in addition to the machines listed above. Servio Corporation's customers are in CAD/CAM, CASE, and text-oriented applications including configuration management and documentation systems.

Gemstone is a registered trademark of Servio Corporation.

C. MCC Orion

Orion [56], [127] is a database system developed in the Advanced Computer Architecture Program at MCC. The major objective of Orion is the integration of a programming language with a database system. Orion does this by adding persistence and sharing to objects created and manipulated in object-oriented applications. The Orion data model supports multiple inheritance, schema evolution, versioning of objects, composite objects, associative queries, and transaction management. Queries are done on members of a class. Only objects that are instances of Orion classes can be made persistent. The application interface to Orion is an object-oriented extension to Common Lisp. The query language allows user-defined functions in the selection predicates; there are some restrictions on the objects the queries can return. Orion provides programming-level control over physical clustering of objects, maintenance of secondary indexes, and transaction management. Details of transaction management in Orion appear in Garza and Kim [37]; implementation details of the buffer management scheme are given in Kim et al. [128].

Orion-1SX is a multiuser version of Orion in which a single server provides persistent object management to multiple workstations. Orion-2 is a fully distributed version of Orion-1SX. Orion was implemented in Common Lisp on Symbolics 3600 workstations and then ported to SUN workstations running UNIX. A commercial product called ITASCA, based on Orion technology, is marketed by Itasca Systems Incorporated.

D. HP Iris

Iris [65], [129] is a research prototype of an object-oriented database system developed by Hewlett-Packard. The Iris system consists of 1) a query processor, or object manager, that implements the object model, 2) a storage manager subsystem providing access paths, concurrency control, backup, and recovery, and 3) a collection of programmatic and interactive interfaces. The data model supports structural and behavioral abstractions. In Iris, information about objects is modeled using relationships. Attributes are modeled via functions whose values are derived from the relationships. The query processor translates Iris queries and operations into an internal relational algebra format. The Iris storage manager is similar to the RSS in System R [130]. The storage manager provides for the dynamic creation and deletion of relations, transactions with check-pointing, and indexing. One of the interfaces is an object-oriented extension to SQL. The two main extensions are that direct references to objects rather than keys are used and that functions defined by the user or Iris can appear in SELECT and WHERE clauses. A second interactive interface to Iris is a structure browser to view the Iris schema data. Version control is coupled with the data model; schema versioning is implemented by a schema version identifier that is associated with each object.

The Iris prototype is implemented in C on HP-9000/320 UNIX workstations. The storage is on HP's ALLBASE Relational DBMS. There is a version of Object SQL embedded in Lisp to access Iris from Lisp applications.

E. TI Zeitgeist

The Zeitgeist [61] OODB system under development at Texas Instruments is designed to support design applications and large-scale object-oriented applications by providing a seamless interface to persistent objects from programming languages. The Zeitgeist architecture is modular and is composed of the persistent object store, the object management system, the set-oriented query interface, the change management system, and the user interface system. The persistent object store provides storage for objects, concurrency and control primitives, and atomic transactions. The object management system oversees the translation of objects between computational and stored formats [131] and provides a transparent, on-demand retrieval mechanism called object-faulting. The programmatic interface is via messages to objects. Zeitgeist does not introduce a new data model; instead, it supports the data model of the programming environment. A change management system implemented as an abstract machine on top of Zeitgeist provides versioning and configuration support. Schema evolution is supported using the change management system. A hypermedia system provides database browsing.
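The idea of on-demand retrieval can be conveyed by a small sketch of a stand-in object that loads its target on first use. This is not Zeitgeist's mechanism; the ObjectFault and Store classes and the load method are hypothetical, and the store is an in-memory dictionary.

```python
from types import SimpleNamespace


class Store:
    """Stand-in for a persistent object store; counts how often it is read."""

    def __init__(self, records):
        self._records = records
        self.loads = 0

    def load(self, oid):
        self.loads += 1
        return SimpleNamespace(**self._records[oid])


class ObjectFault:
    """Placeholder that fetches the real object the first time it is touched."""

    def __init__(self, oid, store):
        self._oid = oid
        self._store = store
        self._target = None

    def __getattr__(self, name):
        # Invoked only for attributes not found on the placeholder itself,
        # i.e., for the attributes of the persistent object.
        if self._target is None:
            self._target = self._store.load(self._oid)
        return getattr(self._target, name)


store = Store({"p1": {"name": "bolt", "mass_g": 12}})
part = ObjectFault("p1", store)
loads_before_use = store.loads
name = part.name      # first access faults the object in
mass = part.mass_g    # later accesses use the resident copy
```

The application code reads part.name as if the object were already in memory, which is the sense in which the retrieval is transparent.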

There are two implementations of Zeitgeist: one running on Unix workstations and supporting applications written in C++; another running on Unix workstations and TI Explorer Lisp machines and supporting CLOS applications. Both implementations currently support single server, multiclient configurations. Zeitgeist is being used by computer-aided design and manufacturing applications within TI.

F. Altair O2

O2 [76] is an object-oriented database system being developed by the Altair consortium in France. It has the functionality of a DBMS (persistence, disk management, sharing, and query language) and that of an object-oriented system (complex objects, object identity, encapsulation, typing, inheritance, overriding, extensibility, and completeness). O2 supports a set of database programming languages, CO2 (a combination of the O2 data model and the C programming language) and BasicO2 (a combination of the O2 data model and the Basic programming language); a set of user interface generation tools (LOOKS); and a programming environment (OOPE). O2 consists of eight functional modules: OOPE, the alphanumeric interface, LOOKS, the language processor, the query interpreter, the schema manager, the object manager, and the disk manager.

The disk manager takes care of I/O, data placement, indexing, and buffering. The object manager maps the abstract object data model onto the disk representation. The schema manager deals with schema information such as types and programs. The language processor manages the data definition language commands and the compilation of programs. It also populates the schema by sending orders to the schema manager. The query interpreter is responsible for interpreting queries using the object manager and the schema manager. LOOKS manages the screen, displays objects and values, and handles their interaction with the object manager. OOPE is the programming environment; it uses LOOKS to display and manage data on the screen. The alphanumeric interface provides direct access to the various languages of the system, without using graphical facilities.

G. Berkeley Postgres

Postgres [52], [66] is a prototype database system developed at the University of California at Berkeley. It is, from different points of view, an "extended relational" DBMS, a "nested relational" DBMS, and an OODB. It extends relational databases in several ways. Where most relational systems provide only a few built-in types and operations on them, Postgres provides three kinds of user-defined types and three kinds of user-defined operations on them. The system supports a limited number of base types. Any user can define new abstract data types (ADT's) by specifying functions to convert instances of the type to and from the character string base data type. These ADT's cannot contain pointers. Finally, Postgres supports constructed types and structural inheritance. The user can register functions, written in C or Lisp, that operate on ADT's and pose queries using these ADT's and ADT-specific operations. To permit optimization in queries, the user can define one- and two-operand operators that define equality and ordering so that B-trees can be used as indices. The user can even define new index types to Postgres using a registration protocol. Finally, the user can define POSTQUEL functions, which allow a field in a relation to be defined as a query (nested relation). As in HP Iris, stand-alone data structures cannot be made separately persistent; the only first-class persistent objects are sets.
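A minimal sketch may help show why user-defined conversion functions and ordering operators are enough to make a new type indexable. It does not use the Postgres registration protocol: the `Box` type, its `from_text` and `to_text` functions, and the `OrderedIndex` class are all hypothetical, and a sorted list searched by bisection stands in for a B-tree. The index touches keys only through the `<` operator that the type supplies.

```python
import bisect


class Box:
    """Hypothetical user-defined ADT: a rectangle ordered by its area."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @classmethod
    def from_text(cls, text):
        # Input function: character string -> ADT instance, e.g. "3x4".
        width, height = text.split("x")
        return cls(int(width), int(height))

    def to_text(self):
        # Output function: ADT instance -> character string.
        return f"{self.width}x{self.height}"

    def area(self):
        return self.width * self.height

    # User-defined equality and ordering operators.
    def __eq__(self, other):
        return self.area() == other.area()

    def __lt__(self, other):
        return self.area() < other.area()


class OrderedIndex:
    """Sorted list standing in for a B-tree; it relies only on the key's < operator."""

    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def range(self, low, high):
        """Rows whose keys satisfy low <= key <= high under the ADT's ordering."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._rows[start:stop]


index = OrderedIndex()
for row_id, text in enumerate(["3x4", "1x1", "2x5", "6x6"]):
    index.insert(Box.from_text(text), row_id)

# Boxes with area between 10 and 16, found without scanning every row.
hits = index.range(Box.from_text("1x10"), Box.from_text("4x4"))
print(hits)  # [2, 0]
```

Because the index never inspects the internals of `Box`, a different ADT with its own ordering could be indexed by the same structure, which is the property the operator definitions are meant to provide.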

Postgres also implements a rules system. Triggers can be defined as "always" or "never" running Postgres data manipulation commands that maintain consistency relationships. A complex marking scheme is used to implement Postgres rules efficiently, supporting eager and lazy evaluation. Postgres implements a "no-overwrite" storage manager instead of a traditional write-ahead log. In this scheme, modified or deleted records remain in the database, making transaction aborts instantaneous. Since old records as well as current ones are available in the database, queries in past states of the database are supported. However, queries that span past database states are hard to specify.
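The effect of a no-overwrite scheme can be sketched in a few lines. This is an illustrative, single-user, in-memory model, not the Postgres storage manager: the class `NoOverwriteStore` and its methods are hypothetical. Every write appends a new version tagged with its transaction, a commit only records a commit time, and an abort has nothing to undo. Reading as of an earlier commit time gives access to a past state.

```python
_DELETED = object()  # tombstone marking a deleted record


class NoOverwriteStore:
    """Append-only version store (hypothetical sketch)."""

    def __init__(self):
        self._versions = []      # (key, value, transaction id); never modified
        self._commit_time = {}   # transaction id -> commit time
        self._clock = 0
        self._last_xid = 0

    def begin(self):
        self._last_xid += 1
        return self._last_xid

    def put(self, xid, key, value):
        # The old version stays where it is; a new one is appended.
        self._versions.append((key, value, xid))

    def delete(self, xid, key):
        self._versions.append((key, _DELETED, xid))

    def commit(self, xid):
        self._clock += 1
        self._commit_time[xid] = self._clock
        return self._clock

    def abort(self, xid):
        # Nothing to undo: versions written by xid never receive a commit
        # time, so no reader will ever see them.
        pass

    def get(self, key, as_of=None):
        """Value of key as of a commit time (default: the current state)."""
        if as_of is None:
            as_of = self._clock
        best_time, best = 0, _DELETED
        for k, value, xid in self._versions:
            t = self._commit_time.get(xid)
            if k == key and t is not None and best_time <= t <= as_of:
                best_time, best = t, value
        return None if best is _DELETED else best


db = NoOverwriteStore()

t1 = db.begin()
db.put(t1, "gate", "v1")
time1 = db.commit(t1)

t2 = db.begin()
db.put(t2, "gate", "v2")
db.abort(t2)                         # instantaneous; "v2" is never visible

t3 = db.begin()
db.put(t3, "gate", "v3")
time3 = db.commit(t3)

print(db.get("gate"))                # v3
print(db.get("gate", as_of=time1))   # v1
```

The sketch answers a query against one past state at a time. Expressing a single query that ranges over several states would need more machinery, which is consistent with the difficulty noted above.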

Postgres was implemented originally in C and Lisp and later reimplemented in C. A Postgres prototype is available from Berkeley.

VI. OODB STANDARDIZATION EFFORTS

As Sections III-V have indicated, there is a large diversity in approaches for OODB's. The need for experimentation in approaches continues. On the other hand, there are now several OODB products and substantial research prototypes. Several researchers have stated requirements for OODB's and have offered surprisingly similar definitions and initial specifications for consideration [1], [2], [35].

There is a real need, driven by industry and government [132], for reaching consensus on OODB functionality. The value of a standard OODB would be interoperability and interchangeability. A standard would insulate applications using OODB technology from incidental differences between different OODB systems. A number of industry groups are working to accelerate the convergence process toward OODB standardization. A "standard" OODB may be far in the future; however, it appears that in several areas, such as Persistent (X), where X is an object-oriented language, there are good prospects for consensus leading to standards.

A. X3/SPARC/DBSSG/OODBTG

In January 1989, the Database Systems Study Group (DBSSG), one of the advisory groups to the Accredited Standards Committee X3 (ASC/X3), Standards Planning and Requirements Committee (SPARC), operating under the procedures of the American National Standards Institute (ANSI), established a task group on object-oriented databases (OODBTG).

OODBTG seeks to facilitate further development and use of OODB technology by defining a common reference model for an OODB, based on object-oriented programming and database management systems models. OODBTG is assessing whether and where standardization on OODB's is possible and useful. Some areas of possible standardization include glossary, reference model, operational model, interfaces, and data exchange. In mid-1991, OODBTG will issue a final report recommending how ASC/X3 should pursue standards in the OODB area, and how OODB standards would be related to existing standards like X3H2 (SQL), X3J16 (C++), X3J13 (Common Lisp), and others.

B. Object Management Group and Other Application Integration Frameworks

An industrial consortium called Object Management Group (OMG) was formed in April 1989 to build an object-oriented application integration framework. The objective of OMG is to accelerate the formation of complementary technologies that can provide the basis for improved application portability. OMG now has over 70 members. It is one of several industrial consortia that aim at integration frameworks. Others are Portable Common Tools Environment (PCTE), Engineering Information System, CASE Integrated Services (CIS), and CAD Framework Initiative.

A unifying theme of all these frameworks is to view future computer systems as collections of applications and services. Examples of common services include a common user interface, a common help system, and a common database system. In object-oriented frameworks, applications invoke services by sending messages. When common services are available, application designers do not have to reinvent the service for every application, which enhances reuse, and end users get the benefit of consistent semantics.
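The message-based invocation of a shared service can be sketched as follows. This is a generic illustration, not the design of OMG or any other framework named here: `Framework`, `HelpService`, and the `lookup` message are hypothetical names. Applications hold no direct reference to the service; they send a named message and the framework delivers it.

```python
class HelpService:
    """Hypothetical common service shared by every application."""

    def handle(self, message, **args):
        if message == "lookup":
            return f"Help for {args['topic']}"
        raise ValueError(f"unknown message: {message}")


class Framework:
    """Routes messages from applications to registered services."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def send(self, name, message, **args):
        # Applications never call a service directly; they send it a message.
        return self._services[name].handle(message, **args)


framework = Framework()
framework.register("help", HelpService())

# Two different applications reuse the same help service.
cad_reply = framework.send("help", "lookup", topic="routing")
case_reply = framework.send("help", "lookup", topic="linking")
print(cad_reply)   # Help for routing
print(case_reply)  # Help for linking
```

Replacing the registered service changes behavior for every application at once, which is where the reuse and the consistent semantics come from.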

Many framework groups view an object-oriented database as a backbone of their system and are working in different ways towards a consensus-based solution. Some, like OMG, are planning to issue proposals to industry for a common OODB system. Others, like PCTE and CIS, have fairly detailed specifications in progress for enterprise-wide OODB's.

C. Benchmarks and Conformance Testing

As mentioned in Section III-B, there is a growing interest in the OODB community in developing OODB performance benchmarks that will help to quantify and tune different OODB architectures [132]. As yet there are no efforts to build benchmarks for standards conformance testing to certify OODB interoperability compliance, since formal standards do not exist.

VII. SUMMARY AND CONCLUSION

Object-oriented database systems aim at meeting the data modeling, performance, cooperative design support, and version management requirements of next-generation applications, such as CAD, CAM, CASE, hypermedia, and expert systems. We began this paper with background information on object-oriented concepts, such as objects and object identity, object classes, inheritance, and message passing. We then described the requirements of next-generation applications. These requirements are: rich data modeling, distributed and platform-independent object storage, navigational as well as query access to objects, transactions appropriate for cooperative design work, sharing of objects among application systems, seamlessness, support for evolution of object instances and object schema, and adequate performance. Conventional databases were designed to support commercial data processing applications that are characterized by simple data types, short-duration operations, and set-oriented associative access to data. These databases fail to meet the data modeling and performance requirements of next-generation applications.

We described the characteristics and features of an OODB; the primary novel features are the support for complex objects, the notion of object identity, inheritance, and encapsulation of object behavior. The characteristics and features are chosen so that OODB's can support the needs of next-generation applications with acceptable performance. There are different approaches to designing an OODB; we have presented a taxonomy of these approaches.

We presented key OODB architectural and implementation issues, design alternatives, and trade-offs in the areas of object data models, persistent object storage and retrieval, concurrency control and transaction management, query processing, version management, and schema evolution. We presented a brief survey of seven OODB systems: two commercial products (Ontologic Ontos, Servio Gemstone), and five research prototypes (MCC Orion, HP Iris, TI Zeitgeist, Altair O2, Berkeley Postgres). It is clear that there is a substantial divergence on many key issues within the OODB community, indicating that consensus and standards on some of the key issues are several years away. However, since there are several areas where technical consensus is possible and there is strong pull from industry for standards, there are good prospects that standardization activity (already begun) can succeed. We also expect to see further work and consensus on performance benchmarks and continued performance improvements.

Interestingly, both the OODB and conventional database schools seem to be heading in the same direction, i.e., toward the use of an object-oriented data model to permit richer data modeling. The two schools differ in how this will be achieved. Not surprisingly, today's established database vendor community prefers to retain the basic relational architecture and accommodate object extensions to it. Today's OODB vendors feel that the basic database architecture itself should change, while supporting queries as a necessary capability. The OODB school and the extended-relational database school are vigorously debating their approaches. We believe that the approaches taken by the OODB school have an advantage over an extended relational database when "seamless" or "low-impedance" integration of object-oriented programming languages (such as CLOS, C++, Smalltalk) with database amenities becomes an important ingredient for software productivity and reliability. On the other hand, extended relational database systems have an advantage over OODB's when an evolutionary migration path from a well-established relational database technology base is an important consideration. Eventual synthesis between these schools is likely; however, both schools face the same challenging research questions, such as how to permit query optimization in the presence of encapsulation, how to support cooperative design work, and finally how to meet demanding performance needs. The first act played out over the last five years has been exciting, but the drama will continue to unfold for several more years.

ACKNOWLEDGMENT

The authors greatly appreciate the critique and comments made by anonymous referees on an earlier manuscript of this paper, as well as comments made by the members of the Zeitgeist OODB project under way at Texas Instruments. Many of the ideas presented in this paper came out of discussions with the Zeitgeist OODB project members, as well as feedback received from Zeitgeist users within Texas Instruments.

REFERENCES

[1] The Committee for Advanced DBMS Function, "Third generation database system manifesto," UC Berkeley Tech. Rep. UCB/ERL M90/28, Apr. 1990.

[2] M. Atkinson et al., "The object-oriented system manifesto," in Proc. DOOD '89, Dec. 1989.

[3] M. Stonebraker et al., "Panel: Database systems debate," presented at 1990 ACM SIGMOD Conf., Atlantic City, NJ, May 1990.

[4] C. Stone and D. Hentchel, "Database wars revisited," BYTE, vol. 10, pp. 233-242, Oct. 1990.

[5] B. Meyer, Object-Oriented Software Construction. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[6] G. Booch, Object-Oriented Design with Applications. Redwood City, CA: Benjamin-Cummings, 1990.

[7] G. E. Peterson, Object-Oriented Computing. Washington, DC: Computer Soc. IEEE, 1987.

[8] W. Kim, "Object-oriented databases: Definitions and research directions," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 327-341, Sept. 1990.

[9] O. J. Dahl and K. Nygaard, "SIMULA-an Algol-based simulation language," Commun. ACM, vol. 9, pp. 671-678, Sept. 1966.

[10] G. Birtwistle et al., Simula Begin. Berlin, Germany: Auerbach, 1973.

[11] A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation. Reading, MA: Addison-Wesley, 1983.

[12] C. Schaffert, T. Cooper, B. Bullis, M. Killian, and C. Wilpolt, "An introduction to Trellis/OWL," in OOPSLA '86 Conf. Proc., 1986.

[13] B. Stroustrup, The C++ Programming Language. Reading, MA: Addison-Wesley, 1987.

[14] B. Cox, Object-Oriented Programming: An Evolutionary Approach. Reading, MA: Addison-Wesley, 1986.

[15] S. Keene and D. Moon, Common Lisp Classes: A Draft Object-Oriented Standard. Cambridge, MA: Symbolics, Inc., 1986.

[16] D. Bobrow and M. Stefik, The LOOPS Manual. Xerox PARC, Palo Alto, CA, 1983.

[17] D. Bobrow, L. DeMichiel, R. Gabriel, S. Keene, G. Kiczales, and D. Moon, "Common Lisp object system specification," X3J13 Tech. Rep. 88-002R, June 1988.

[18] M. Ellis and B. Stroustrup, The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley, 1990.

[19] M. Stefik and D. Bobrow, "Object oriented programming: Themes and variations," The AI Mag., vol. 6, 1986.

[20] S. Danforth and C. Tomlinson, "Type theories and object-oriented programming," ACM Computing Surveys, vol. 20, pp. 29-72, Mar. 1988.

[21] P. Wegner, "Conceptual evolution of object-oriented programming," Brown Univ., Providence, RI, Tech. Rep. CS-89, Dec. 1989.

[22] M. Minsky, "A framework for representing knowledge," in The Psychology of Computer Vision, P. Winston, Ed. New York, NY: McGraw-Hill, 1975.

[23] B. R. Roberts and I. P. Goldstein, "The FRL primer," Mass. Inst. Technol., Cambridge, MA, Tech. Rep. AIM 408, Nov. 1977.

[24] D. Bobrow and T. Winograd, "An overview of KRL, a knowledge representation language," Cognitive Science, vol. 1, 1977.

[25] M. Hammer and D. McLeod, "Database description with SDM: A semantic database model," ACM Trans. Database Syst., vol. 6, pp. 351-386, Sept. 1981.

[26] D. Shipman, "The functional data model and the data language DAPLEX," ACM Trans. Database Syst., vol. 6, Mar. 1981.

[27] E. van Orden, OOPSLA '86 Tutorial Notebook. New York, NY: Association for Computing Machinery, Sept. 1986.

[28] D. Stamps, "Taking an objective look," Datamation, vol. 5, pp. 45-48, May 1989.

[29] S. Khoshafian and G. Copeland, "Object identity," in OOPSLA '86 Conf. Proc., pp. 406-416.

[30] E. F. Codd, "Extending the relational database model to capture more meaning," ACM Trans. Database Syst., vol. 4, pp. 377-387, Dec. 1979.

[31] C. J. Date, "Referential integrity," in Proc. 7th Int. Conf. on Very Large Databases, Sept. 1981.

[32] J. D. Ullman, Principles of Database Systems, 2nd ed. Rockville, MD: Computer Science Press, 1982.

[33] C. J. Date, An Introduction to Database Systems, 4th ed. Reading, MA: Addison-Wesley, 1986.

[34] W. Kim and F. H. Lochovsky, Object-Oriented Concepts, Databases and Applications. New York, NY: ACM Press, 1989.

[35] J. Zdonik and D. Maier, "Fundamentals of object-oriented databases," in Readings in Object-Oriented Databases, J. Zdonik and D. Maier, Eds. Morgan-Kaufman, 1990, ch. 1, pp. 1-32.

[36] F. Bancilhon, W. Kim, and H. Korth, "A model of CAD transactions," in Proc. Int. Conf. on Very Large Data Bases, 1985, pp. 25-33.

[37] J. F. Garza and W. Kim, "Transaction management in an object-oriented database system," in Proc. ACM SIGMOD Int. Conf. on Management of Data, 1988, pp. 37-45.

[38] The Laguna Beach Participants, "Future directions in DBMS research," SIGMOD Record, vol. 18, pp. 17-26, Mar. 1989.

[39] D. McLeod, "1988 VLDB panel on future directions in DBMS research," SIGMOD Record, vol. 18, pp. 27-30, Mar. 1989.

[40] E. F. Codd, "A relational model for large shared databanks," Commun. ACM, vol. 13, pp. 377-387, June 1970.

[41] M. P. Atkinson, P. J. Bailey, K. J. Chisholm, P. W. Cockshott, and R. Morrison, "An approach to persistent programming," Comput. J., vol. 26, pp. 360-365, Dec. 1983.

[42] S. B. Zdonik and K. Smith, "Intermedia: A case study of the differences between relational and object-oriented database systems," in OOPSLA '87 Conf. Proc., 1987, pp. 452-465.

[43] D. Maier, J. Stein, A. Otis, and A. Purdy, "Development of an object-oriented DBMS," in OOPSLA '86 Conf. Proc., 1986, pp. 472-482.

[44] D. Bitton, D. J. DeWitt, and C. Turbyfil, "Benchmarking database systems: A systematic approach," in Proc. of the Ninth Int. Conf. on Very Large Data Bases, 1983, pp. 8-19.

[45] Anon et al., "A measure of transaction processing power," CMU Tech. Rep., Apr. 1985.

[46] R. Cattel, W. Rubenstein, and M. Kubicar, "Benchmarking simple database operations," in ACM SIGMOD Int. Conf. on Management of Data, May 1987, pp. 387-394.

[47] L. Anderson, A. Berre, M. Mallison, H. Porter, and B. Schneider, "The Tektronix Hypermodel benchmark," Tektronix Tech. Rep., Aug. 1989.

[48] D. DeWitt, P. Futtersack, D. Maier, and F. Velez, "A study of three alternative workstation-server architectures for object oriented database systems," Altair Tech. Rep. 42-90, Jan. 1990.

[49] R. Cattel and J. Skeen, "Engineering database benchmark," Sun Microsystems Database Eng. Group Tech. Rep., Apr. 1990.

[50] J. D. Ullman, "Database theory: Past and future," Keynote speech presented at Principles of Database Syst. Conf., Mar. 1987.

[51] A. Otis, "Reference Model for Object Data Management," National Institute of Standards and Technology, May 1990. Available from E. Fong, NIST, Tech. Bldg. A266, Gaithersburg, MD 20899.

[52] M. Stonebraker and L. Rowe, "The design of Postgres," in Proc. 1986 ACM SIGMOD Int. Conf. on Management of Data, 1986, pp. 340-355.

[53] P. P. Chen, "The entity-relationship model: Toward a unified view of data," ACM Trans. on Database Syst., vol. 1, pp. 9-36, Mar. 1976.

[54] S. E. Hudson and R. King, "Cactis: A self-adaptive, concurrent implementation of an object-oriented database management system," ACM Trans. Database Syst., to be published.

[55] R. Hull and R. King, "Semantic database modeling: Survey, applications and research issues," ACM Computing Surveys, pp. 201-260, Sept. 1987.

[56] J. Banerjee, H. T. Chou, J. F. Garza, W. Kim, D. Woelk, N. Ballou, and H. J. Kim, "Data model issues for object-oriented applications," ACM Trans. Office Information Syst., vol. 5, pp. 3-26, Jan. 1987.

[57] M. Atkinson and P. Buneman, "Types and persistence in database programming languages," ACM Computing Surveys, pp. 105-190, June 1987.

[58] T. Andrews and C. Harris, "Combining language and database advances in an object-oriented development environment," in OOPSLA '87 Conf. Proc., 1987, pp. 430-440.

[59] A. Purdy, B. Schuchardt, and D. Maier, "Integrating an object server with other worlds," ACM Trans. Office Info. Syst., vol. 5, pp. 27-47, Jan. 1987.

[60] D. Weinreb, N. Feinberg, D. Gerson, and C. Lamb, "An object-oriented database system to support an integrated programming environment," Cambridge, MA, Symbolics Tech. Rep., 1988.

[61] S. Ford, J. Joseph, D. Langworthy, D. Lively, G. Pathak, E. Perez, R. Peterson, D. Sparacin, S. Thatte, D. Wells, and S. Agarwal, "Zeitgeist: Database support for object-oriented programming," in Proc. Second Int. Workshop on Object-Oriented Database Syst., 1988, pp. 23-42.

[62] Ontologic Incorporated, Ontos System Documentation. Billerica, MA: Ontologic, Inc., Mar. 1990.

[63] M. P. Atkinson, K. J. Chisholm, and P. W. Cockshott, "PS-Algol: An Algol with a persistent heap," SIGPLAN Notices, vol. 17, pp. 24-31, July 1982.

[64] D. DeWitt and M. Carey, "Object and file management in the EXODUS extensible database system," in Proc. Int. Conf. Very Large Data Bases, 1986, pp. 91-100.

[65] D. Fishman, D. Beech, H. Cate, E. Chow, T. Connors, J. Davis, N. Derrett, C. Hoch, W. Kent, P. Lyngbaek, B. Mahbod, M. Neimat, T. Ryan, and M. Shan, "Iris: An object-oriented database management system," ACM Trans. Office Information Syst., vol. 5, pp. 48-69, Jan. 1987.

[66] M. Stonebraker, L. Rowe, and M. Hirohama, "The implementation of POSTGRES," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 125-141, Mar. 1990.

[67] P. Wegner and L. Cardelli, "On understanding types, data abstraction, and polymorphism," Computing Surveys, vol. 17, pp. 472-522, Dec. 1985.

[68] J. E. B. Moss and A. L. Wolf, "Towards principles of inheritance and subtyping in programming languages," Univ. of Massachusetts, Amherst, MA, COINS Tech. Rep. 88-95, 1988.

[69] Object Design, An Introduction to Object-Store, Release 1.0. Burlington, MA: Object Design Inc., Mar. 1990.

[70] J. E. Moss and S. Sinofsky, "Managing persistent data with Mneme: Designing a reliable, shared object interface," in Proc. Second Int. Workshop on Object-Oriented Database Syst., 1988, pp. 298-316.

[71] J. Joseph, S. Thatte, C. Thompson, and D. Wells, "Report on the Object-Oriented Databases Workshop," SIGMOD Record, Sept. 1989.

[72] S. Thatte, "Persistent memory: Storage architecture for object-oriented databases," in Proc. Int. Workshop on Object-Oriented Database Systems, Pacific Grove, CA, Sept. 1986.

[73] R. Greenblatt, "MOBY address space," Seminar report on research in progress, Aug. 1985.

[74] S. Reiss, A. Skarra, and S. Zdonik, "An object server for an object-oriented database system," in 1986 Int. Workshop on Object-Oriented Database Syst., 1986, pp. 196-205.

[75] G. Wiederhold, "Views, objects, and databases," Comput., vol. 19, pp. 37-43, Dec. 1986.

[76] O. Deux et al., "The story of O2," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 91-108, Mar. 1990.

[77] J. Smith and D. Smith, "Database abstractions: Aggregation and generalization," ACM Trans. Database Syst., vol. 2, June 1977.

[78] J. Duhl and C. Damon, "A performance comparison of object and relational databases using the Sun benchmark," in OOPSLA '88 Conf. Proc., 1988, pp. 153-163.

[79] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. I. Hsiao, and R. Rasmussen, "The gamma database machine project," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 44-62, Mar. 1990.

[80] K. Eswaran, J. Gray, R. Lorie, and I. Traiger, "The notions of consistency and predicate locks in a database system," Commun. ACM, vol. 19, pp. 624-633, Nov. 1976.

[81] J. E. Moss, "Nested transactions: An approach to reliable distributed computing," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 1981.

[82] W. Weihl and B. Liskov, "Implementation of resilient, atomic data types," ACM Trans. Programming Languages and Syst., vol. 7, pp. 244-269, Apr. 1985.

[83] S. Schwartz, "Synchronizing shared abstract types," CMU Tech. Rep., 1983.

[84] N. Griffeth, J. E. Moss, and M. Graham, "Abstraction in concurrency control and recovery management," Univ. of Massachusetts, Amherst, MA, COINS Tech. Rep. 86-20, 1986.

[85] H. Garcia-Molina and K. Salem, "SAGAS," Princeton Univ., Princeton, NJ, Tech. Rep. CS-TR-070-87, Jan. 1987.

[86] R. Katz and S. Weiss, "Design transaction management," in Proc. 19th ACM/IEEE Design Automation Conf., June 1984.

[87] S. Zdonik and A. Skarra, "The management of changing types in an object-oriented database," in OOPSLA '86 Conf. Proc., pp. 483-495.

[88] R. H. Katz, "Towards a unified framework for version modeling," Univ. of California, Berkeley, Tech. Rep. UCB/CSD 88/484, Dec. 1988.

[89] M. Fernandez and S. Zdonik, "Transaction groups: A model for controlling cooperative transactions," in Proc. Workshop on Persistent Object Systems: Their Design, Implementation, and Use, The Univ. of Newcastle, N.S.W., Australia, 1989.

[90] J. Rothnie et al., "An introduction to a system for distributed database (SDD-1)," ACM Trans. Database Syst., vol. 5, pp. 1-17, Mar. 1980.

[91] H. Korth and G. Speegle, "Formal model of correctness without serializability," in Proc. ACM SIGMOD Int. Conf. on Management of Data, June 1988.

[92] A. Skarra, "Localized correctness specifications for cooperating transactions in an object-oriented database," Office Knowledge Engineering, vol. 4, to be published.

[93] A. Snyder, "Encapsulation and inheritance in object-oriented programming languages," in Proc. Conf. on Object-Oriented Programming Systems, Languages, and Applications, 1986, pp. 38-45.

[94] G. Graefe and D. Maier, "Query optimization in object-oriented database systems: A prospectus," in Proc. Second Int. Workshop on Object-Oriented Database Syst., 1988, pp. 359-363.

[95] M. Stonebraker and A. Guttman, "Using a relational database management system for computer aided design data-an update," IEEE Database Eng., vol. 7, pp. 56-60, June 1984.

[96] Texas Instruments Incorporated, RTMS: Relational Table Management System Reference Manual. Austin, TX: Texas Instruments Data Systems Group, 1984.

[97] J. Eisen, "A software cache management system," Texas Instruments CRL-Comp. Sci. Lab., Austin, TX, Tech. Rep., 1985.

[98] J. Blakeley, P.-A. Larson, and F. Tompa, "Efficiently updating materialized views," in Proc. ACM SIGMOD Int. Conf. on Management of Data, 1986, pp. 61-71.

[99] R. Hanson, "Toward hypertext publishing: Issues and choices in database design," presented at Hypertext87 Conf., Chapel Hill, NC, 1987.

[100] W. Kent, "Panel: An overview of the versioning problem," in Proc. ACM SIGMOD Int. Conf. on Management of Data, May 1989.

[101] S. Feldman, "Make-A program for maintaining computer programs," Software-Practice and Experience, vol. 9, pp. 255-265, Apr. 1979.

[102] D. Moon, R. Stallman, and D. Weinreb, Lisp Machine Manual, 6th ed. Cambridge, MA: M.I.T. Press, 1984.

[103] M. J. Rochkind, "The source code control system," IEEE Trans. Software Eng., vol. SE-1, pp. 364-370, Dec. 1975.

[104] I. Goldstein and D. G. Bobrow, "A layered approach to software design," in Interactive Programming Environments, D. R. Barstow, H. E. Shrobe, and E. Sandewall, Eds. New York, NY: McGraw-Hill, ch. 19, 1984, p. 387.

[105] CLF Project, Introduction to the CLF Environment. Marina del Rey, CA: USC Information Sciences Institute, 1986.

[106] J. Joseph, M. Shadowens, J. Chen, and C. Thompson, "Strawman reference model for change management," in Proc. OODBTG Workshop on Object-Oriented Database (NIST Tech. Rep. available from E. Fong, Bldg. A266, Gaithersburg, MD 20899), May 1990.

[107] R. Bhateja and R. H. Katz, "A validation subsystem of a version server for computer-aided design data," in Proc. 24th ACM/IEEE Design Automation Conf., 1987.

[108] G. S. Landis, "Design evolution and history in an object-oriented CAD/CAM database," in Proc. 31st IEEE Computer Society Int. Conf. on Applications of Computers, 1986, pp. 297-303.

[109] H. T. Chou and W. Kim, "A unifying framework for versions in a CAD environment," in Proc. Int. Conf. on Very Large Data Bases, 1986, pp. 336-344.

[110] W. Kim, J. Banerjee, H. T. Chou, J. F. Garza, and D. Woelk, "Composite object support in an object-oriented database system," in Proc. Object-Oriented Programming Systems and Languages Conf., 1987, pp. 118-125.

[111] G. L. Steele, "The definition and implementation of a computer programming language based on constraints," Mass. Inst. Technol., Cambridge, MA, Tech. Rep. AI-TR-595, 1980.

[112] Wm. Leler, Constraint Programming Languages: Their Specification and Generation. Reading, MA: Addison-Wesley, 1988.

[113] A. Borning, "ThingLab-A constraint-oriented simulation laboratory," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1979.

[114] A. Borning and R. Duisberg, "Constraint-based tools for building user interfaces," ACM Trans. Graphics, vol. 5, Oct. 1986.

[115] A. Borning, R. Duisberg, B. Freeman-Benson, A. Kramer, and M. Woolf, "Constraint hierarchies," in Proc. Object-Oriented Programming Systems and Languages Conf., 1987, pp. 48-60.

[116] J. Mostow and R. Balzer, "Application of a transformational software development methodology for VLSI design," J. Syst. Software, vol. 4, pp. 51-61, 1984.

[117] J. Banerjee, W. Kim, H. Kim, and H. Korth, "Semantics and implementation of schema evolution in object-oriented databases," in Proc. 1987 ACM-SIGMOD Int. Conf. on Management of Data, 1987.

[118] P. H. Stanford, Electronic Design Interchange Format Version 2 0 0, Electronic Industries Association, 1986.

[119] Sun Microsystems Incorporated, External Data Representation (XDR). Mountain View, CA: Sun Microsystems, Inc., Jan. 1985.

[120] A. Birrel and B. Nelson, "Implementing remote procedure calls," ACM Trans. Computer Syst., vol. 2, pp. 39-59, Feb. 1983.

[121] R. Jones, "Mach and Matchmaker: Kernel and language support for object-oriented distributed systems," CMU Tech. Rep., Sept. 1986.

[122] B. Liskov, T. Bloom, D. Gifford, R. Scheifler, and W. Weihl, "Communications in the Mercury System," in Proc. Twenty-First Annual Hawaii Int. Conf. on System Science, 1988.

[123] Open Software Foundation, OSF/Motif Series. Englewood Cliffs, NJ: Prentice Hall, vols. 1-5, 1990.

[124] M. Linton, J. Vlissides, and P. Calder, "Composing user interfaces with Interviews," IEEE Comput., vol. 22, pp. 65-84, Feb. 1989.

[125] Versant Object Technology Corporation, Object Today! (quarterly newsletter). Menlo Park, CA: Versant Object Technology Corp., June 1990.

[126] Objectivity Incorporated, Objectivity/DB System Overview. Menlo Park, CA: Objectivity Inc., Mar. 1990.

[127] W. Kim, J. F. Garza, N. Ballou, and D. Woelk, "Architecture of the ORION next-generation database system," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 109-124, Mar. 1990.

[128] W. Kim, N. Ballou, H. T. Chou, J. F. Garza, and D. Woelk, "Integrating an object-oriented programming system with a database system," in Proc. Object-Oriented Programming Systems and Languages Conf., 1988, pp. 142-152.

[129] K. Wilkinson, P. Lyngbaek, and W. Hasan, "The Iris architecture and implementation," IEEE Trans. Knowledge Data Eng., vol. 2, pp. 63-75, Mar. 1990.

[130] M. M. Astrahan, "System R: A relational database management system," ACM Trans. Database Syst., vol. 1, pp. 97-137, June 1976.

[131] G. Pathak, J. Joseph, and S. Ford, "Object Exchange Service for an object-oriented database system," in Proc. Fifth Int. Conf. on Data Eng., 1989, pp. 27-34.

[132] F. Bancilhon et al., "Final report on DARPA-NSF-ESPRIT workshop on US/EC collaboration in information technology: Session on OODBs," National Science Foundation, Tech. Rep., Aug. 1990. (Available from E. Fong, NIST, Tech. Bldg. A266, Gaithersburg, MD 20899.)

John V. Joseph (Member, IEEE) received the Ph.D. degree in mathematics from Purdue University and the M.S. degree in computer science from the University of North Carolina, Chapel Hill, in 1977 and 1983, respectively.

From 1977 to 1983 he taught at the University of North Carolina, Greensboro. He joined Texas Instruments, Dallas, TX, in 1983 as the project manager for VHSIC software tools. Since then his research has centered on design tools, object-oriented databases, change management systems, and software engineering. He is a developer of the Zeitgeist Object-Oriented Database. He has published in the areas of change management and object-oriented databases and has two patents pending in these areas.

Dr. Joseph is a member of the Association for Computing Machinery.

Satish M. Thatte (Senior Member, IEEE) received the B.E. (Hons.) degree in electronics engineering, with the Gold Medal for highest scholastic achievement, from the Birla Institute of Technology and Science, Pilani, India, in 1975. He received the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana-Champaign, IL, in 1977 and 1979, respectively.

He joined Texas Instruments, Dallas, TX, in 1979 and played a leading role in formulating and initiating TI's VLSI Design for Testability effort in the VLSI Design Laboratory. He was the Principal Technical Investigator of the "Design Test Technology for VHSIC," VHSIC Phase III contract from 1980 to 1983. At Texas Instruments, he was elected a Senior Member of Technical Staff in 1983. From 1983 to 1985 he worked on advanced computer architectures for symbolic computing and artificial intelligence, involving research on memory management (virtual memories, cache management, garbage collection techniques) and database systems architecture. From 1986 to 1988 he was manager of the Database Systems branch in TI's Artificial Intelligence Laboratory. He is Director of the Information Technologies Laboratory in the Computer Science Center, where he leads research on object-oriented database systems, hypermedia systems, and advanced information delivery technologies. He has published twenty-seven technical papers, and holds eight U.S. and one European patents.

Dr. Thatte is a member of the Association for Computing Machinery.

Craig W. Thompson (Senior Member, IEEE) received the B.A. degree in mathematics from Stanford University in 1971 and the M.A. and Ph.D. degrees in computer science from The University of Texas at Austin in 1977 and 1984, respectively.

From 1977 to 1981 he taught at the University of Tennessee, Knoxville. He joined Texas Instruments in 1981. He is currently manager of the Zeitgeist Open OODB project in the Information Technologies Laboratory, Computer Science Center, Texas Instruments, Dallas, TX. His research has centered on engineering databases, object-oriented databases, hypermedia systems, and user interfaces. He has published twenty-five technical papers and holds two U.S. patents with three patents pending.

Dr. Thompson is an active member of X3/SPARC/DBSSG/OODB Task Group, Object Management Group, and the Association for Computing Machinery.

David L. Wells (Member, IEEE) received the B.S. degree in applied mathematics and physics, the M.S. degree in computer science, and the Doctor of Engineering degree in computer science in 1975, 1976, and 1980, respectively, from the University of Wisconsin, Milwaukee.

From 1980 to 1986 he was an Assistant Professor of Computer Science at Southern Methodist University in Dallas, TX, performing research in computer security, computer graphics, and database systems. Since 1986, he has been a Member of Technical Staff in the Information Technologies Laboratory, Computer Science Center, at Texas Instruments, Dallas, TX, where he is a developer of the Zeitgeist object-oriented database.