
Journal of Intelligent Information Systems 12, 139–164 (1999)
© 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.

The Functional Data Model as the Basis for an Enriched Database Query Language

ROBERT AYRES [email protected]
Department of Informatics and Simulation, Cranfield University, Royal Military College of Science, Shrivenham, Swindon, Wiltshire SN6 8LA, UK

Editors: Peter M.D. Gray, Peter J.H. King, Larry Kerschberg

Abstract. Conventional database languages rely on the user specifying what relations are to be used when evaluating a query. Consequently they preclude queries which involve searching for unspecified connections or associations in the database. In this paper we present Hydra, a functional language with all the facilities to define, update and query a database, which also enables users to carry out “associational” queries. Hydra uses a graph-based data model in which nodes represent values or entities and arcs the relationships between them. Associational facilities are made possible by the provision of built-in functions which find paths through the database graph. The mappings between sets of nodes in the database graph are represented as functions at the Hydra language level and it is as lists of such functions that associational results are returned. The use of a functional language is important since such languages allow functions to be returned as results; such an approach could not be adopted in a logic-based language which would not permit predicates to be returned as answers. Hydra also allows users to define general computational functions which are not considered to form part of the database. This use of two sets of functions achieves a computationally complete system which extends the query power of previous database systems without compromising their expressive or query power.

Keywords: functional data model, functional programming, graph databases, semantic networks

1. Introduction

In current database query languages it is not possible to express a query that corresponds to a question such as:

Is there a connection between John and Mary?

One of the reasons for this is that the record-oriented data model used by most systems limits the semantic expressiveness of the database. A separate problem is that standard query languages rely on the user specifying the relations to use in evaluating the query. It is the nature of the query above that the user does not know which relations are involved.

There are a number of domains where such open-ended queries need to be supported. Biologists model eco-systems in terms of food-webs and are often interested in the way that species can be linked by the food chain or shared habitats. Likewise, in sociology, social network analysis is concerned with studying groups of actors (such as people, companies, clubs, etc.) in terms of networks of relationships and associations (Wasserman and Faust, 1994). Such applications have normally relied on special-purpose programs with graph-searching facilities. The drawback of this approach is that the data collected cannot be managed as an ordinary database with all the advantages of integrity constraints, general query facilities and so on. It would of course be possible to build special-purpose applications over a relational (or other) system but this would be to produce specific solutions to a general problem: data in many application domains is usefully viewed as a network.

Figure 1. Portion of an instance-level Hydra database.

In this paper we present Hydra, a computationally complete functional language extended with the facilities to define and query a database. In this respect it is similar to previous functional database languages, such as FQL (Buneman and Frankel, 1979) and FDL (Poulovassilis and King, 1990). The novelty of Hydra lies in the facilities it provides to query the way in which database values are associated with each other. These “associational” facilities are provided without losing any of the retrieval power of previous query languages. Hydra uses a restricted functional data model in which the database becomes a graph with the nodes representing entities or scalar values and labelled arcs between nodes capturing attributes and relationships. A portion of a Hydra-style database graph is shown in figure 1. The advantage of such a representation scheme for data is that associational queries can be processed by searching for a path through this database graph. For example, finding an association between John and Mary can be seen as equivalent to looking for a direct or indirect connection between the database nodes which correspond to those entities.

In Hydra the database graph is built up by giving defining equations for a set of functions which correspond to the different kinds of association recorded in the database. For example, the database fragment shown in figure 1 can be constructed by the Hydra script shown in figure 2 in which a class of person entities is created and populated before two database functions are declared.

entity person;
create person John, Mary, Sue;
age :: person -> int;
child :: person -> [person];
child John =+ Mary, Sue;
age John = 55;
age Sue = 30;

Figure 2. Portion of Hydra script to create a database.

The defining equations for these functions can be seen as creating the instance-level arcs of the database graph. Conventional queries can be carried out by applying database functions to parameters, for example, age John; which evaluates to 55. More importantly, though, the Hydra language contains built-in primitives to search for database associations. Hence the question What is the connection between John and Mary? can be answered using the built-in Hydra function trail as follows:

trail 1 John Mary;

which returns the list

[[John, child, Mary]]

containing all the direct paths connecting John and Mary. The user can specify longer search paths as in

trail 3 55 30;

which will return the result

[[55, ~age, John, child, Sue, age, 30]]

where the ~age function corresponds to an arc traversed “backwards”.

The benefit of integrating associational features in the way they have been in Hydra is that they are general purpose and can be used on any database. This offers advantages even in applications where the data might not normally be modelled in terms of a network. For example, the user can carry out queries corresponding to questions such as Find out everything known about John or Find an entity which is directly or indirectly associated with 55, Mary, and Sue. In Hydra these questions can be answered with the user-defined functions known and centre (introduced later) as follows:

known John;
centre [55, Mary, Sue];

The former returns the result

[(age, [55]), (child, [Mary, Sue]), (~child, [])]

representing everything that is known about John. The latter query returns the list [John] containing all the values in the database graph with a direct link to each of 55, Mary, and Sue.

These queries are conceptually simple but cannot be carried out in standard database systems. Such systems generally use records as the fundamental data structure; this means that the semantics of connections are not preserved in the data. Moreover, standard query languages are first order and do not allow a second order query (one quantified over the database schema) to be formulated. A further problem, and one which motivates the use of a functional query language, is that imperative or logic-based languages treat values and functions (or predicates) in different ways and so do not provide a framework where functions can be returned as query results.

The use of a graph-based data model combined with a functional language has other advantages. Schema design is simple (largely a question of identifying the relevant entity types to be modelled) and the schema can be extended without needing to change the existing design or data organisation. The use of a graph-based data model means that the database is naturally viewed as a network and this has been exploited in the development of a graphical interface, VisualQ (see figure 3). VisualQ allows the naive user to use a small set of queries based on the associational primitives to explore a Hydra database and place the result on a free-form canvas. The use of a canvas allows multimedia data to be naturally integrated into a database view.

Apart from the ability to return functions as results, the use of a functional language to define and query the database has further advantages. It provides a declarative query language which, by allowing the user to define other functions (which are not considered to form part of the database), results in a computationally complete language. Finally, the use of lazy implementation techniques (Jones, 1987), where the evaluation of expressions is delayed until their result is required to evaluate some outer expression, is appropriate for a database query language since it minimises retrievals from secondary storage.

Figure 3. The VisualQ graphical interface to Hydra.


Hydra is not the first system to provide specialised graph-searching facilities. Other systems, such as GraphLog (Consens and Mendelzon, 1990, 1993), have not, however, integrated these with a full database system. GraphLog, for example, uses a directed graph as a way of representing a logic program graphically and does not provide the unrestricted searching capabilities available in Hydra. Where such facilities have been integrated with a database they have been restricted to particular data types, as with the RM/T model (Codd, 1979) or in certain geographical information systems where there are special features for representing and querying transportation systems, as in GraphDB (Guting, 1994).

In the rest of the paper we first give an overview of the data modelling issues which must be addressed. Section 3 gives an overview of the Hydra language concentrating particularly on the novel features of the language; examples of how these features may be used are given in Section 4. Section 5 gives an overview of a graphical interface, VisualQ, which was developed as a front-end to the language. Finally, in Section 6 we outline our conclusions and discuss further work.

2. Data modelling issues

The motivation behind the design of Hydra was to produce a language which could express and process queries concerned with retrieving direct or indirect associations between database elements. For this to be possible the language must use a data model in which all such associations are explicitly represented.

This requirement effectively rules out the use of a record-oriented data model such as the relational model. As has been pointed out by Kent (1979), records have a number of weaknesses for data modelling. One of these is that there is not necessarily a one-to-one correspondence between application entities and database records. In a relational database an entity may be represented by several records appearing in different tables. For example, an individual may be both an employee and a customer of a company so records corresponding to the individual may appear in both employee and customer tables in the company’s database. Similarly, records only distinguish between entities to the extent that they record distinguishing attributes of those entities; a record only really implies existence of at least one occurrence of an entity or relationship. These issues are addressed in object-oriented models by providing object-identifiers (or surrogates) which can be used to directly model application entities. However, there is a further problem with record-based systems: the semantics of the application links captured by attributes in records (or foreign key relationships) are not preserved in the database and so cannot be directly retrieved.

These data modelling problems are avoided in graph databases in which application data is modelled in terms of a set of binary relationships between sets of entities or scalar values (Kent, 1979). Using this approach the database becomes a labelled digraph. The retrieval of associations between database entities can consequently be treated as a search for a path through the database graph. Other queries are also possible: finding all the neighbours of a given node, for instance, is equivalent to retrieving everything known about the entity or value represented by the node.
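The view of association retrieval as path search can be sketched in Python. This is an illustration only, under stated assumptions: the arc list mirrors the fragment of figure 1, and the helpers `build_graph`, `paths` and `neighbours` are hypothetical names introduced here, not part of Hydra or its implementation. Arcs followed against their direction are labelled with a leading tilde.

```python
from collections import defaultdict

# Hypothetical arc list (source, label, target) mirroring the figure 1 fragment.
ARCS = [
    ("John", "age", 55),
    ("Sue", "age", 30),
    ("John", "child", "Mary"),
    ("John", "child", "Sue"),
]

def build_graph(arcs):
    """Index every arc in both directions; a backwards arc gets a '~' label."""
    graph = defaultdict(list)
    for src, label, dst in arcs:
        graph[src].append((label, dst))
        graph[dst].append(("~" + label, src))
    return graph

def paths(graph, max_arcs, start, goal):
    """Return all simple paths from start to goal using at most max_arcs arcs."""
    found = []

    def walk(node, path, visited):
        if node == goal and len(path) > 1:
            found.append(path)
            return
        # A path of k arcs is stored as 2k + 1 items (nodes and labels alternate).
        if (len(path) - 1) // 2 >= max_arcs:
            return
        for label, nxt in graph[node]:
            if nxt not in visited:
                walk(nxt, path + [label, nxt], visited | {nxt})

    walk(start, [start], {start})
    return found

def neighbours(graph, node):
    """Everything directly recorded about a node, as (label, value) pairs."""
    return list(graph[node])
```

With this sketch, `paths(build_graph(ARCS), 1, "John", "Mary")` yields the single path `["John", "child", "Mary"]`, and a bound of three arcs is enough to link the nodes 55 and 30 through John and Sue.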


2.1. The data model of Hydra

The underlying data model of Hydra is a labelled, directed graph whose nodes correspond to atomic values such as strings or integers, or surrogates representing application entities. A Hydra database can thus be thought of as a set of binary relations between sets of atomic values. For instance, the two relations

age(Person,Integer)
child(Person,Person)

constitute a database schema where Person is a set of entities corresponding to people in the application domain. The instance level database would be built up by introducing surrogates corresponding to Person entities and defining the extents of the two relations.

In order to accommodate a binary relational database within the syntactic framework of a functional language these relations are represented as functions at the level of the Hydra language. Two kinds of functions may be defined: a relation such as age would be defined as a “single-valued” function of type

person -> integer

and a relation such as child as a “multi-valued” function of type

person -> [person]

where [person] denotes list of person. A function such as age will be partial so a null-value is also introduced at the language level to cater for situations where the application of a single-valued function to a value is undefined. List-valued functions are used to accommodate relations such as child since sets cannot be easily supported in functional languages. This means that an arbitrary (but consistent) order is imposed on the result of an application of a multi-valued function.

A Hydra database is declared by giving type declarations for a set of single-valued and multi-valued functions representing application data. The instance-level data is entered by giving definitions to the functions introduced. Such database functions are termed primary in the Hydra language to distinguish them from other, general computational functions which the user may also define.

Application of a function, such as age, to an entity of type person corresponds to following the relation “forwards” (from person to integer); in order to follow a relation “backwards” (from an age to people with that age) Hydra provides a set of converse functions. For each database function, such as age or child, the system automatically maintains the converse function, denoted by prefixing the function with a tilde (e.g., ~age, ~child). Hence the function ~age has type

integer -> [person]


and when applied to an integer returns a list of people with that age. All converse functions are list-valued, the lists they return corresponding to sets of results on which a consistent, but arbitrary, order is imposed.
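One way to picture the automatic maintenance of a converse function is as a second index that is updated whenever a defining equation is added or removed. The Python sketch below is an assumption-laden illustration rather than Hydra's actual mechanism: the class `PrimaryFunction` and its method names are hypothetical, `None` stands in for a null value, and insertion order plays the role of the consistent but arbitrary order.

```python
from collections import defaultdict

class PrimaryFunction:
    """A single-valued database function whose converse is kept in step."""

    def __init__(self, name):
        self.name = name
        self.forward = {}                   # argument -> result
        self.backward = defaultdict(list)   # result -> arguments, in insertion order

    def define(self, arg, value):
        """Record `f arg = value`, replacing any earlier value for arg."""
        self.undefine(arg)
        self.forward[arg] = value
        self.backward[value].append(arg)

    def undefine(self, arg):
        """Remove the defining equation for arg, if there is one."""
        if arg in self.forward:
            old = self.forward.pop(arg)
            self.backward[old].remove(arg)

    def apply(self, arg):
        """Follow the relation forwards; None stands in for a null value."""
        return self.forward.get(arg)

    def converse(self, value):
        """Follow the relation backwards; the result is always a list."""
        return list(self.backward.get(value, []))

# Hypothetical usage with the entities of figure 1.
age = PrimaryFunction("age")
age.define("John", 55)
age.define("Sue", 30)
```

Here `age.converse(55)` plays the role of `~age 55;`, and redefining an age moves the entity from one converse list to another without any action by the user.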

An important point about this data model is that it entails no loss of expressive power compared to the relational (or any other record-oriented) data model. Just as any n-ary relation can be re-expressed using n + 1 binary relations, so can it be re-expressed using n + 1 functions.

This data model is clearly very close to that of Daplex (Shipman, 1981; Kulkarni and Atkinson, 1986) but with some differences. The insistence, in the data model, on atomic types and functions of at most one parameter ensures that associations between values can always be seen as simple paths through the database graph. Were constructed types (or functions of more than one parameter) to appear in the database then the process of finding associations between entities or values (the motivation behind the language) would be complicated and such paths might not have a simple representation in terms of user-defined functions.

A further difference is that Hydra, like FDL, treats what Daplex calls base and derived functions in a uniform manner; indeed functions can combine both extensional and intensional defining equations. As a consequence the distinction made in relational systems between base and derived tables has no analogue in Hydra.

3. Overview of Hydra

The use of a graph-oriented data model ensures that, in principle, a query system can search for arbitrary associations between database nodes and return the associations found as results. However, standard query languages are first-order, that is they do not allow the user to carry out queries which are quantified over the relations in a database schema. Moreover, they do not provide a framework in which functions or relationships can be returned as results. In contrast functional languages, in which functions are first-class citizens, allow functions to be returned as results and so provide an ideal framework for the inclusion of “associational” facilities. In Hydra, such facilities are provided by augmenting the set of built-in functions with a small set of second-order primitives which use schema-information to find actual or potential associations between entities and return the results in the form of lists of functions.

Hydra is a polymorphic functional language with a syntax similar to that of Miranda (Turner, 1985). It incorporates many of the features of modern functional languages such as: user-defined types, polymorphism, higher-order functions, lazy evaluation, and list comprehensions. It allows the user to build up a database by declaring and defining a special class of functions, called primary functions. The user may also define general purpose (computational) functions, termed secondary, which are not considered to form part of the database.

Below we give an overview of Hydra presenting, in turn, its type system, its database definition and query facilities, its general computational facilities, and finally the associational query facilities which are provided by built-in functions.


3.1. Type system

One of the characteristics of queries which correspond to questions such as:

What is the connection between John and Bill?

or

Find everything known about John.

is that the types of the results cannot be known in advance. In particular, the way that results of such queries are represented in Hydra, as lists of database values (nodes) or database values and functions (arcs), gives rise to heteromorphic lists which are not well-typed according to the conventional polymorphic type system generally used by functional languages.

However, extending the query power of the database language clearly requires such heteromorphic lists to be supported. In Hydra this has been done by augmenting the type system with a universal type which is taken to be the union of all other types and by preserving type information at run time. We review the conventional features of the type system first before introducing its novel features.

Atomic types. In Hydra atomic values are those which are considered to have no internal structure. It is precisely atomic values which may appear as nodes in the database graph. Three kinds of built-in atomic type are supported:

• integers, represented by a non-empty string of digits such as 0 or 190,
• strings, which are enclosed in double quotes as in "" (the empty string), "abc1" or "\"\"" (using the backslash convention of C), and
• booleans, represented by True and False.

Standard operations are provided on all these types and future implementations will also include support for a “real” type.

The only atomic types which the user may introduce are classes of database surrogates. These are introduced using the keyword entity, hence the declarations

entity person;
entity location;

introduce the entity classes person and location. The entity classes of Hydra are like enumerated types except that they are dynamic: entities may be added or removed at any point. Hydra uses visible surrogates which may either be introduced by the user as in

create person RobertoBonni;
create person ColinNewmarch, JohnSmith;

or generated by the system as in

create person;


which will automatically generate a surrogate such as Person001. The system ensures that all surrogates are unique. Surrogates (entities) can be removed as follows

delete Person001;

and the system will automatically update function definitions to remove references to Person001. The use of visible surrogates means that, from a functional programming perspective, surrogates can be treated as nullary constructor functions. The surrogate strategy of Hydra (along with features for changing surrogates) is discussed in greater detail elsewhere (Ayres and King, 1995).

Constructed types. Hydra provides two built-in constructed types: lists and tuples. Lists are enclosed in square brackets and their elements separated by commas. Hence [1,2,3] is a list of integers whose type is designated as [int]. Lists are constructed using the normal list constructor operator (represented by :) so [1,2,3] is really just a syntactic variant for the expression 1:2:3:[] where [] is the empty list. Tuples of two or more items are enclosed in ordinary brackets and the items separated by commas. Thus (1,True) is a pair formed from an integer and a boolean whose type is designated as (int,bool).

The user may define new “sum-of-product” types and their constructors. For example, the types

day ::= Mon | Tue | Wed | Thur | Fri | Sat | Sun;
date ::= JUL int int
       | GREG int int int
       | DAYCOUNT int date;

represent days of the week and dates measured in different calendars. User-defined data-types can be recursive, as with the third alternative for date which represents a date as a day count from a base date. Examples of valid dates are thus:

JUL 321 1997;
GREG 12 7 1996;
DAYCOUNT 10319 (JUL 365 1979);

Polymorphism. Hydra, in keeping with other functional languages, supports polymorphic or generic types. For instance, the type definition

tree ’a ::= TREE (tree ’a) (tree ’a)
          | LEAF ’a

defines a generic binary tree type which can be instantiated to particular tree types by replacing the polymorphic type variable ’a with a specific type. Hence:

TREE (LEAF 1) (TREE (LEAF 2) (LEAF 1));


is a value of type tree int. Such polymorphic types encapsulate the essential structure of a set of related types and allow generic manipulation functions to be encoded.

The universal type. In addition to the entity classes and standard type features outlined above, Hydra provides a special universal type denoted by ?. The universal type represents the union of all types (functional and non-functional) and is needed to accommodate some of the results which may be produced by the associational primitives (introduced below). For example, using the associational features it is quite simple to construct a heteromorphic list such as [John, age, 29]. Such a value would give rise to a type error in conventional functional languages but in Hydra it can be assigned the type [?].

One implication of the universal type is that it is no longer possible to use the type inference techniques of standard functional languages (Milner, 1978). In Hydra the type of a function must be declared before its definition is given. This is not a disadvantage in a database context since type declarations of functions also serve as integrity constraints.

A separate implication of the support for heteromorphic structures (such as the list above) is that type information must be preserved at run time. This is primarily so that the system can determine how to display values of differing types but type information is also exploited by some of the features of the language to be introduced below.

3.2. Database definition, update, and query facilities

A database schema is declared in Hydra by introducing entity classes and giving the type declarations of one or more primary functions. Primary functions are specifically intended to model application data and must be consistent with the underlying data model outlined above. Hence they must be declared with an atomic domain and a range which is either of atomic type or of list of atomic type. For example, a single-valued function to represent people’s ages may be declared as

primary age :: person -> int;

and a multi-valued function to record the locations frequented by individuals as

primary frequents :: person -> [location];

The instance-level database is built up by defining the primary functions; these definitions can be incrementally updated. For a single-valued function, such as age, we can give definitions as follows

age RobertoBonni = 32;
age ColinNewmarch = 56;

Where multi-valued functions are concerned the definitions are set-oriented. The definition

frequents RobertoBonni =+ KingGeorge, RonsGym;


means that RobertoBonni frequents the two locations KingGeorge and RonsGym in addition to whatever locations he is already recorded as frequenting.

The definitions of primary functions can combine both extensional and intensional equations. For example, a possible default definition for the function age is

age x = 21;

Information is retrieved by applying primary functions to parameters as in the expression

age RobertoBonni;

which is evaluated using best-fit pattern matching. The evaluator first looks for a defining equation of the form age RobertoBonni = ... and finds

age RobertoBonni = 32;

so returning the result 32. Had the query age JohnSmith; been entered the evaluator would have failed to find a definition of the form age JohnSmith = ... and then looked for one with a variable on the left hand side, found the default equation given above, and so returned the result 21.

As mentioned above, converse functions are automatically maintained by the system: to find all the people who are 32 years old the user can enter the query

~age 32;

which returns the list [RobertoBonni] given the database so far built up.

Defining equations for functions can be removed from the database by simply giving their defining pattern without a right hand side. Thus

age RobertoBonni = ;

removes the definition of age for RobertoBonni and

age x = ;

removes the default definition for age. The entire function (declaration and definition) can be removed by the command

delete age;

For multi-valued functions there is the facility to carry out set-oriented deletions. Thus the statement

frequents RobertoBonni =- KingGeorge, RonsGym;


removes KingGeorge and RonsGym from the locations frequented by RobertoBonni and of course the declaration

frequents RobertoBonni = ;

removes all the locations which RobertoBonni has been defined as frequenting.
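The set-oriented update forms for multi-valued functions can be mimicked in a few lines of Python. The sketch below is hypothetical throughout: the class `MultiValued` and its methods are invented for illustration, and the choices to ignore repeated values and to keep insertion order are assumptions about what “set-oriented” and “consistent but arbitrary order” might mean in practice.

```python
class MultiValued:
    """A list-valued database function with set-oriented updates."""

    def __init__(self):
        self.table = {}   # argument -> list of results, without duplicates

    def add(self, arg, *values):
        """Counterpart of `f arg =+ v1, v2;`: extend what is already recorded."""
        current = self.table.setdefault(arg, [])
        for v in values:
            if v not in current:
                current.append(v)

    def remove(self, arg, *values):
        """Counterpart of `f arg =- v1, v2;`: drop only the named results."""
        current = self.table.get(arg, [])
        self.table[arg] = [v for v in current if v not in values]

    def clear(self, arg):
        """Counterpart of `f arg = ;`: drop every result for arg."""
        self.table.pop(arg, None)

    def apply(self, arg):
        """Return a copy of the recorded results (empty list if none)."""
        return list(self.table.get(arg, []))

frequents = MultiValued()
frequents.add("RobertoBonni", "KingGeorge", "RonsGym")
```

After the last line, `frequents.remove("RobertoBonni", "KingGeorge")` would leave only RonsGym recorded, while `frequents.clear("RobertoBonni")` would leave nothing.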

Null values. Support for single-valued primary functions means that a null value has to be introduced into the language so that a result can be returned when a primary function is undefined. Hydra supports typed null values which are designated by prefixing the name of an atomic type with a question mark. Hence ?person is a null value of type person and, where no defining equation (specific or default) applies, age JohnSmith will evaluate to ?int.

Null values are associated with types in this way so that they can be used as parameters for some of the type-sensitive query facilities presented below.

More complex queries. Hydra provides a function like which retrieves all database entities of the same type as its parameter. Hence

like JohnSmith;

returns a list of all the person entities in the database

[RobertoBonni, ColinNewmarch, JohnSmith]

The same result would have been returned if the null person had been used instead (like ?person;). The like primitive can be used with parameters of any type to return all the values of the same type that have so far been defined or used. Hence

like Mon;

returns the list [Mon, Tue, Wed, Thur, Fri, Sat, Sun] and

like age;

returns the singleton list [age]. The behaviour of like is slightly different for integers and strings since these values are predefined. Hence the expression like ?int; returns the list [21,32,56] of all the integers which are explicitly used in function definitions.

When used in conjunction with list abstractions the like primitive makes it possible to express SQL-like queries. Hence the expression

[age x | x <- like ?person];

returns the list [32,56,21] of the ages of all the person entities defined in the database.

Further generators (that is <- expressions) and predicates can be added to the right hand side of the list abstraction. Hence

[x | x <- like ?person | age x > 35];

gives [ColinNewmarch], the list of all persons over 35 years of age.

Just as we can convert any relational data model into an equally expressive data model in Hydra so any query which might be made on the original relational database can be re-expressed in Hydra using like combined with list abstractions and possibly some user-defined computational functions (discussed below).
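Readers more familiar with mainstream languages may find it helpful to compare the two list abstractions above with Python list comprehensions. The sketch below rests on assumptions: the dictionaries and the helpers `like_person` and `age` are hypothetical in-memory stand-ins for the example database, with a constant playing the role of the default equation `age x = 21;`.

```python
# Hypothetical in-memory stand-ins for the example database of this section.
PERSONS = ["RobertoBonni", "ColinNewmarch", "JohnSmith"]
AGE = {"RobertoBonni": 32, "ColinNewmarch": 56}
DEFAULT_AGE = 21   # plays the role of the default equation `age x = 21;`

def like_person():
    """Stand-in for `like ?person`: every person surrogate, in a fixed order."""
    return list(PERSONS)

def age(x):
    """Use the specific equation if one exists, otherwise the default."""
    return AGE.get(x, DEFAULT_AGE)

# Counterpart of [age x | x <- like ?person];
all_ages = [age(x) for x in like_person()]

# Counterpart of [x | x <- like ?person | age x > 35];
over_35 = [x for x in like_person() if age(x) > 35]
```

The generator `x <- like ?person` corresponds to the `for` clause and the predicate `age x > 35` to the `if` clause; the main difference is that Hydra evaluates such expressions lazily, which this eager sketch does not attempt to reproduce.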

Function composition. Hydra provides a number of built-in function composition operators. For instance, if the user has defined a function partner as follows

primary partner :: person -> person;

then using the built-in function composition operator (denoted by a full stop) the user can form the function

age.partner

corresponding to the relationship “age of partner of” or even

~age.age.partner

corresponding to “people of the same age as the partner of”. It is not possible to combine list-valued functions in the same way since the expression partner.~age (intended to correspond to the relation “partners of people of age”) is not well-formed: the range of ~age is [person] so we cannot directly compose partner with the function. To overcome this problem Hydra provides two further, specialised composition operators. The first (denoted by the infix operator ..) allows a single-valued function to be composed with a multi-valued function to produce a multi-valued function. Hence

partner..~age

is well-formed and correctly encodes the “partners of people of age” relation. Two multi-valued functions can be composed using a further operator (denoted by ...) to produce a single multi-valued function. For example, if the user has defined a multi-valued function child to associate an individual with his or her children then

child...child

represents the relationship “grandchildren”.
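The difference between the three composition operators can be sketched with ordinary higher-order functions. Everything in the following Python fragment is an assumption made for illustration: the helper names, the small data tables (including the people Tom and Amy), and in particular the decision to drop nulls and repeated results, which the text above does not specify.

```python
def _unique(items):
    """Keep first occurrences only and skip nulls (an assumption, see lead-in)."""
    out = []
    for item in items:
        if item is not None and item not in out:
            out.append(item)
    return out

def compose(f, g):
    """f . g : g returns a single value, which is passed to f."""
    return lambda x: f(g(x))

def compose_single_multi(f, g):
    """f .. g : g is list-valued and f single-valued; f is mapped over g's results."""
    return lambda x: _unique(f(y) for y in g(x))

def compose_multi_multi(f, g):
    """f ... g : both are list-valued; f's results are flattened into one list."""
    return lambda x: _unique(z for y in g(x) for z in f(y))

# Hypothetical data; Tom and Amy do not appear in the paper.
AGE = {"John": 55, "Sue": 30, "Tom": 30}
PARTNER = {"Sue": "Tom", "Tom": "Sue"}
CHILD = {"John": ["Mary", "Sue"], "Sue": ["Amy"]}

age = AGE.get
partner = PARTNER.get
child = lambda x: CHILD.get(x, [])
age_converse = lambda n: [p for p, a in AGE.items() if a == n]

age_of_partner = compose(age, partner)                        # age.partner
same_age_as_partner = compose(age_converse, age_of_partner)   # ~age.age.partner
partners_of_aged = compose_single_multi(partner, age_converse)  # partner..~age
grandchildren = compose_multi_multi(child, child)             # child...child
```

The ordinary operator suffices whenever the inner function is single-valued, even if the outer one is list-valued, which is why `~age.age.partner` needs no special treatment.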


Type-sensitive features. In Hydra it is possible to define database functions which take or return any kind of atomic value. For instance the user can define a function icon which will, for any atomic value, return the name of a bitmap to be used when displaying the value (the icon function is used in this way by the graphical interface discussed later). The function is declared as

primary icon :: ? -> string;

Note that the use of ? in this context really means atomic rather than universal since primary functions are only defined on atomic types. The user can give definitions for icon such as

icon KingGeorge = "pub.xbm";
icon JohnSmith = "johnsmith.xbm";

to associate bitmaps with particular values. Atomic types can be used in pattern specifications to associate icons with classes of value in the absence of an exact match. The equations

icon (x::person) = "person.xbm";
icon (x::location) = "building.xbm";

associate default bitmaps with entities of type person or location. Of course, an overall default can also be specified by omitting the type specification as in

icon x = "point.xbm";

The best-fit pattern matching means that for any parameter to the icon function the evaluator will first look for a precise match, then for a match on the basis of the parameter’s type, and finally for a general default equation.
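The three-stage lookup just described can be sketched as follows. This Python fragment is a hypothetical illustration, not the evaluator's actual algorithm: the class `BestFit`, the `ENTITY_CLASS` table and the `type_of` helper are invented here, and run-time type information is simulated by a dictionary from surrogates to class names.

```python
class BestFit:
    """Defining equations tried from most specific to least specific."""

    def __init__(self):
        self.exact = {}       # value -> result, e.g. icon KingGeorge = ...
        self.by_type = {}     # type name -> result, e.g. icon (x::person) = ...
        self.default = None   # result of a catch-all equation such as icon x = ...

    def lookup(self, value, type_of):
        """type_of maps a value to the name of its atomic type."""
        if value in self.exact:
            return self.exact[value]
        t = type_of(value)
        if t in self.by_type:
            return self.by_type[t]
        return self.default

# Hypothetical run-time type information for the surrogates used in this section.
ENTITY_CLASS = {
    "KingGeorge": "location", "RonsGym": "location",
    "JohnSmith": "person", "RobertoBonni": "person",
}

def type_of(value):
    if value in ENTITY_CLASS:
        return ENTITY_CLASS[value]
    return "int" if isinstance(value, int) else "string"

icon = BestFit()
icon.exact["KingGeorge"] = "pub.xbm"
icon.exact["JohnSmith"] = "johnsmith.xbm"
icon.by_type["person"] = "person.xbm"
icon.by_type["location"] = "building.xbm"
icon.default = "point.xbm"
```

With these equations, RobertoBonni has no exact equation and so falls through to the person default, while an integer such as 32 reaches the overall default.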

The equality test of Hydra is heteromorphic and makes use of type information to compare values. Hence the tests

True == 1;
JohnSmith == [JohnSmith];

are both well-formed and evaluate to False. As well as comparing atomic and constructed values it is also possible to compare functions on the basis of their syntactic identity. Hence the test

age == parent;

is well-formed and returns False. This facility does compromise referential transparency (the property of declarative languages where two identical subexpressions always evaluate to the same result) to the extent that it is possible for the user to give two functions, f and g say, identical definitions and yet for the test f == g to evaluate to False. In practice, though, this is unlikely to pose a problem and, from a database perspective, given that such functions will certainly have different application semantics, it is questionable whether a test purely on the basis of the functions’ denotations would be preferable. These issues are discussed in greater detail elsewhere (Ayres and King, 1995).
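The behaviour of the equality test can be approximated in a few lines of Python. This is a sketch under stated assumptions: hydra_eq is a hypothetical helper, Python object identity stands in for Hydra’s syntactic identity of functions, and Python’s own types stand in for Hydra’s.

```python
# Sketch of a type-aware equality test; hydra_eq is a hypothetical helper.
# Python object identity is used as a stand-in for syntactic identity.

def hydra_eq(a, b):
    """Values of different types are unequal; functions compare by identity."""
    if callable(a) or callable(b):
        return a is b
    if type(a) is not type(b):
        return False
    return a == b


def f(x):
    return x + 1


def g(x):  # same definition as f, but a different function
    return x + 1


print(hydra_eq(True, 1))                       # differently typed values
print(hydra_eq("JohnSmith", ["JohnSmith"]))    # atomic value against a list
print(hydra_eq(f, g))                          # identical definitions, distinct functions
```

The last comparison shows the loss of referential transparency noted above: f and g denote the same mapping, yet the test distinguishes them.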

3.3. General computational facilities

The computational power of primary functions is limited by the type-restrictions imposed on their definitions. To extend the language to computational completeness a further class of functions, termed secondary, is supported. These are user-defined functions that are not considered to form part of the database, so their types and definitions are unconstrained by data-modelling considerations.

As with primary functions, secondary functions must be declared before their definition is given. A polymorphic function to return the length of a list can be declared and defined as

secondary length :: [’a] -> int;
length (x:xs) = 1 + length xs |
length [] = 0 ;

and used in expressions such as length (like ?person) to return the number of person surrogates defined in the database.

Secondary functions are evaluated using top-to-bottom pattern matching; different defining equations are separated by vertical bars. This approach has been taken since secondary functions are likely to have a relatively small number of defining equations and top-to-bottom pattern matching can be implemented more efficiently than the best-fit approach used with primary functions. A further difference is that the definition of a secondary function must be given in one go and cannot be updated (though it can be removed altogether, as in the command delete length; and then redefined). Secondary functions are treated in this way because their definitions, which do not track an application domain, are stable and unlikely to be updated. Their implementation can also be optimised using all the standard functional program compilation techniques (Jones, 1987); the current implementation of Hydra uses a supercombinator implementation strategy for secondary functions.

A function to test whether a value is null can be defined using a secondary function as follows

secondary isnull :: ’a -> bool;
isnull x = not (x == x);

This definition makes use of the property of Hydra nulls that a test such as ?int == ?int returns False. The isnull function can be used in a function ml defined as

secondary ml :: ’a -> [’a];
ml x = if (isnull x) [] [x];


Thus (ml ?int) returns [] and (ml 2) returns [2]. The purpose of ml is to coerce single-valued functions to behave in the same way as multi-valued ones. Hence, the expression

ml.age JohnSmith;

returns a singleton list if the age of JohnSmith has been defined and an empty list otherwise. The ml function is used by the associational primitives discussed below.
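The interplay between nulls, isnull and ml can be mimicked in Python. The sketch below is illustrative only: the Null class, the AGE table and the stored age are assumptions made for the example, and ml_age merely stands in for the composition ml.age.

```python
# Sketch of Hydra-style nulls; Null, isnull, ml, AGE, age and ml_age are
# illustrative names, and the stored age is invented for the example.

class Null:
    """A typed null such as ?int; it never compares equal, even to itself."""

    def __init__(self, type_name):
        self.type_name = type_name

    def __eq__(self, other):
        return False


def isnull(x):
    # Mirrors `isnull x = not (x == x)`.
    return not (x == x)


def ml(x):
    # Mirrors `ml x = if (isnull x) [] [x]`.
    return [] if isnull(x) else [x]


AGE = {"JohnSmith": 34}  # hypothetical stored value


def age(person):
    # A single-valued function yields a typed null where it is undefined.
    return AGE.get(person, Null("int"))


def ml_age(person):
    # Stands in for the composition `ml.age`.
    return ml(age(person))


print(ml(Null("int")))       # empty list
print(ml(2))                 # singleton list
print(ml_age("JohnSmith"))   # singleton list, the age is defined
print(ml_age("MarcoBonni"))  # empty list, no age is stored
```

Because every result is now a list, callers need not distinguish single-valued functions from multi-valued ones.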

3.4. Associational facilities

The novel feature of Hydra lies in the associational facilities it provides. These enable the user to determine the way in which database entities or values are related to each other and to program generalised searches of the database. These associational facilities are provided through the mechanism of built-in primitives which make use of schema information to determine what functions can be applied to atomic values or the ways in which atomic values are associated with each other. There are four primitives (from, to, trail, and link), which are introduced below.

The primitive from returns all the primary functions which can be applied to a value. Given the schema shown in figure 4 (where multi-valued functions are represented with double-headed arrows), the query

from JohnSmith;

returns the answer

[works_at, ml.sex, pers_desc]

The from function is evaluated by first determining the type of its parameter and then inspecting the schema to return a list of primary functions with the same domain. The order of the functions in this list (and in the lists returned by other associational functions) is implementation dependent but consistent between updates to the database.

Figure 4. Portion of schema of a criminal intelligence database.


Figure 5. Instance level fragment of criminal intelligence database.

The from primitive may be used to find all the information held on JohnSmith with the query

[(f, f JohnSmith) | f <- from JohnSmith];

which (given the instance-level database shown in figure 5) will give the answer

[(works_at,[RomaRest]), (ml.sex,["male"]),
 (pers_desc,["limps", "waiter"])]

Coercing single-valued functions to be list-valued through the use of the secondary function ml makes the answer more uniform since only list-valued functions are returned. This uniformity makes it easier to use primitives like from in the definitions of other functions.

It is to cater for the results of from and the other associational primitives that the universal type ? is provided. This is because a list such as

[works_at, ml.sex, pers_desc]

is not well-typed according to a strict polymorphic type system since it contains values both of type person -> [location] and of type person -> [string]. With the universal type, however, the list can be assigned the type [person -> [?]]. Consequently, the type of from is taken to be

’a -> [’a -> [?]]

Note that if from is applied to a value which is not atomic (and thus could not appear in the database) the empty list is returned.

The primitive to is similar to from except that instead of returning primary functions it returns converse functions. Hence (given the schema in figure 4) the query

to KingGeorge;


returns the result

[~place, ~works_at]
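Both from and to amount to lookups on the schema keyed by the type of the argument. The Python sketch below illustrates this; it is not the Hydra implementation. The SCHEMA and TYPE_OF tables are assumptions reconstructed from the examples in the text rather than a copy of figure 4, function names are returned as strings instead of function values, and from_ is spelt with a trailing underscore because from is a reserved word in Python.

```python
# Toy schema lookup behind `from` and `to`. SCHEMA and TYPE_OF are assumed
# data reconstructed from the examples in the text, not figure 4 itself.

# name -> (domain type, range type, multi-valued?)
SCHEMA = {
    "place": ("incident", "location", False),
    "works_at": ("person", "location", True),
    "sex": ("person", "string", False),
    "pers_desc": ("person", "string", True),
    "involved": ("incident", "person", True),
}

TYPE_OF = {"JohnSmith": "person", "KingGeorge": "location", "Inc045": "incident"}


def type_of(value):
    """Type of an atomic value; None for anything that is not atomic."""
    if not isinstance(value, str):
        return None
    # Named entities are looked up; any other string is a string scalar.
    return TYPE_OF.get(value, "string")


def from_(value):
    """Primary functions whose domain is the type of `value`.

    Single-valued functions are reported as ml.f so that every result
    is list-valued.
    """
    t = type_of(value)
    return [name if multi else "ml." + name
            for name, (dom, _rng, multi) in SCHEMA.items() if dom == t]


def to(value):
    """Converse functions (~f) of the functions whose range is the type of `value`."""
    t = type_of(value)
    return ["~" + name for name, (_dom, rng, _multi) in SCHEMA.items() if rng == t]


print(from_("JohnSmith"))
print(to("KingGeorge"))
print(from_(["JohnSmith"]))  # not atomic, so nothing applies
```

Neither function consults the instance data, which is why the same lookups can also answer questions about the schema alone.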

The primitive trail is concerned with finding paths through the database graph which connect entities or values. The query

trail 1 JohnSmith RomaRest;

searches the database for direct connections between JohnSmith and RomaRest. The result is returned in the form of a list of lists, each one of which represents a separate path connecting the two values. A path is represented as an alternating sequence of nodes and arcs; thus an answer to the above query might be

[[JohnSmith, works_at, RomaRest]]

If the original query had been expressed instead as

trail 1 RomaRest JohnSmith;

the same connection would have been found but expressed instead as

[[RomaRest, ~works_at, JohnSmith]]

The first parameter to trail limits the length of path which will be searched for. Thus

trail 3 JohnSmith Inc045;

will return the result

[[JohnSmith, works_at, RomaRest, ~place, Inc045],
 [JohnSmith, works_at, RomaRest, ~works_at, MarcoBonni, ~involved, Inc045]]

corresponding to two connections: JohnSmith works at the place where the incident occurred, and JohnSmith works at the same place as MarcoBonni, who was involved in the same incident. The paths which traverse the string constants "waiter" and "male" are not returned in the result. This is because a constraint that scalar values may only appear at the ends is imposed on the paths retrieved: intermediate nodes on a path must be entities. This condition is imposed to eliminate paths which probably have little interest or significance, such as the path [JohnSmith, sex, "male", ~sex, MarcoBonni].

The evaluation of trail is carried out by first using the schema information to determine what connections could exist between the two values (given their types) and the database is then queried to confirm the existence of actual paths. The evaluator automatically eliminates any paths which contain a loop, that is, in which the same node appears more than once.
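The constraints on the paths returned by trail (a bound on length, no repeated nodes, and scalars only at the ends) can be made concrete with a small search routine. The Python sketch below is a simplification: it searches an in-memory edge list directly rather than consulting the schema first, as the evaluator does, and the EDGES and ENTITIES data are assumptions reconstructed from the examples in the text, not a copy of figure 5.

```python
# Depth-bounded path search in the spirit of `trail`. EDGES and ENTITIES are
# assumed toy data; arcs are stored as (source, function, target) triples.

EDGES = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("JohnSmith", "sex", "male"),
    ("JohnSmith", "pers_desc", "limps"),
    ("JohnSmith", "pers_desc", "waiter"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("MarcoBonni", "sex", "male"),
    ("MarcoBonni", "pers_desc", "waiter"),
    ("MarcoBonni", "pers_desc", "tall"),
    ("Inc045", "place", "RomaRest"),
    ("Inc045", "involved", "MarcoBonni"),
]
ENTITIES = {"JohnSmith", "MarcoBonni", "RomaRest", "Inc045"}


def steps(node):
    """Arcs leaving `node`: forward arcs as f, backward arcs as ~f."""
    for src, fn, dst in EDGES:
        if src == node:
            yield fn, dst
        if dst == node:
            yield "~" + fn, src


def trail(max_len, start, goal):
    """All loop-free paths of at most max_len arcs from start to goal."""
    found = []

    def extend(path, visited):
        node = path[-1]
        if node == goal:
            found.append(path)
            return
        if (len(path) - 1) // 2 == max_len:
            return  # the length bound has been reached
        if node != start and node not in ENTITIES:
            return  # scalars may only appear at the ends of a path
        for label, nxt in steps(node):
            if nxt not in visited:  # never revisit a node, so no loops
                extend(path + [label, nxt], visited | {nxt})

    extend([start], {start})
    return found


for path in trail(3, "JohnSmith", "Inc045"):
    print(path)
```

On this toy data the search finds the two connections discussed above and rejects the route through "male", even though that route is short enough, because the scalar would be an intermediate node.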


The last primitive, link, carries out the same search as trail above but returns the answer in the form of simple or composed functions corresponding to the paths found. Thus

link 1 JohnSmith RomaRest;

might give a result such as [works_at], signifying that there is only one direct connection between the two entities. Searching for longer connections results in composed functions being returned. For example

link 3 JohnSmith Inc045;

will return

[~place...works_at, ~involved...~works_at...works_at]

Applying any of the functions in the result list to JohnSmith will return a list of entities which will include Inc045 as one of the elements.
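The relationship between a path and the composed function returned by link can be sketched in Python. The example below is an illustration under assumptions: it does not repeat the search, but takes a path in the alternating node and arc form described for trail and turns its arcs into one list-valued function. EDGES is assumed toy data reconstructed from the examples in the text, and apply_arc, compose and path_to_function are hypothetical helpers. Note that compose takes the arcs in path order, whereas the Hydra notation writes the function applied first on the right.

```python
# Turning a path into a composed, list-valued function, in the spirit of `link`.
# EDGES is assumed toy data; the helper names are illustrative.

EDGES = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("JohnSmith", "sex", "male"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("MarcoBonni", "sex", "male"),
    ("Inc045", "place", "RomaRest"),
    ("Inc045", "involved", "MarcoBonni"),
]


def apply_arc(label, node):
    """Apply a primary function f, or its converse ~f, to a single node."""
    if label.startswith("~"):
        return [src for src, fn, dst in EDGES if fn == label[1:] and dst == node]
    return [dst for src, fn, dst in EDGES if fn == label and src == node]


def compose(labels):
    """Chain list-valued arcs in path order, flattening at every step."""
    def composed(node):
        frontier = [node]
        for label in labels:
            reached = []
            for n in frontier:
                for m in apply_arc(label, n):
                    if m not in reached:
                        reached.append(m)
            frontier = reached
        return frontier
    return composed


def path_to_function(path):
    """Keep the arcs of a path (the odd positions) and compose them."""
    return compose(path[1::2])


via_place = path_to_function(
    ["JohnSmith", "works_at", "RomaRest", "~place", "Inc045"])
via_colleague = path_to_function(
    ["JohnSmith", "works_at", "RomaRest", "~works_at", "MarcoBonni",
     "~involved", "Inc045"])

print(via_place("JohnSmith"))
print(via_colleague("JohnSmith"))
```

Since the result is an ordinary function, it can be applied to entities other than the one that produced it, for instance to ask which incidents are associated in the same way with MarcoBonni.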

The associational primitives outlined above can be used with typed nulls to directly query schema information. Hence from ?person returns a list of primary functions with the type person as domain; to ?person a list of converse functions with person as domain; and link 2 ?person ?person shows all the ways in which person entities could be directly or indirectly associated.

4. Examples

The advantage of integrating associational primitives into Hydra is that it becomes possible to define functions which carry out complex searches on the database. For example, the primitives from and to may be used to find everything that is known about a database value. Such a facility is most simply encoded as a secondary function, which we call known, declared and defined as

secondary known :: ’a -> [(’a -> [?], [?])];
known x = [(f, f x) | f <- append (from x) (to x) | f x != []];

where functions which do not associate any values with x are filtered out using the test f x != []. This definition uses the standard function append to combine two lists, defined as follows

secondary append :: [’a] -> [’a] -> [’a];
append (x:xs) ys = x : append xs ys |
append [] ys = ys ;

Given the instance-level database fragment shown in figure 5, the query

known MarcoBonni;


returns

[(works_at,[RomaRest]), (ml.sex,["male"]),
 (pers_desc,["waiter","tall"]), (~involved,[Inc045])]

where Inc045 is an incident entity. It is a relatively simple matter to extend the definition of known so that it will expand the database around a given node to a specified depth.
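The definition of known translates almost directly into Python. The sketch below is illustrative only: SCHEMA, TYPE_OF and EDGES are assumed toy data reconstructed from the examples in the text, functions are represented by their names, and every function is treated as list-valued so that the ml coercion is not needed.

```python
# Sketch of `known`: apply every applicable function and converse to a value
# and keep the non-empty results. All tables below are assumed toy data.

SCHEMA = {  # name -> (domain type, range type)
    "works_at": ("person", "location"),
    "sex": ("person", "string"),
    "pers_desc": ("person", "string"),
    "place": ("incident", "location"),
    "involved": ("incident", "person"),
}
TYPE_OF = {"JohnSmith": "person", "MarcoBonni": "person",
           "RomaRest": "location", "Inc045": "incident"}
EDGES = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("JohnSmith", "sex", "male"),
    ("JohnSmith", "pers_desc", "limps"),
    ("JohnSmith", "pers_desc", "waiter"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("MarcoBonni", "sex", "male"),
    ("MarcoBonni", "pers_desc", "waiter"),
    ("MarcoBonni", "pers_desc", "tall"),
    ("Inc045", "place", "RomaRest"),
    ("Inc045", "involved", "MarcoBonni"),
]


def type_of(value):
    # Named entities are looked up; any other value is a string scalar here.
    return TYPE_OF.get(value, "string")


def from_(x):
    return [f for f, (dom, _rng) in SCHEMA.items() if dom == type_of(x)]


def to(x):
    return ["~" + f for f, (_dom, rng) in SCHEMA.items() if rng == type_of(x)]


def apply_arc(label, node):
    """Apply a function f, or its converse ~f, to a single node."""
    if label.startswith("~"):
        return [src for src, fn, dst in EDGES if fn == label[1:] and dst == node]
    return [dst for src, fn, dst in EDGES if fn == label and src == node]


def known(x):
    # Mirrors [(f, f x) | f <- append (from x) (to x) | f x != []].
    return [(f, apply_arc(f, x)) for f in from_(x) + to(x) if apply_arc(f, x)]


for pair in known("MarcoBonni"):
    print(pair)
```

The filter matters for scalars: both ~sex and ~pers_desc can be applied to "male" according to the schema, but only ~sex associates any entities with it, so only that pair is reported.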

A similar function to known, which finds all the neighbours of a given node in the database, can be defined as

secondary nbrs :: ’a -> [?];
nbrs x = edups [n | f <- append (from x) (to x) | n <- f x];

where edups is a function to eliminate duplicates from a list and is declared and defined as

secondary edups :: [’a] -> [’a];
edups (x:xs) = if (member x xs) (edups xs) (x:edups xs) |
edups [] = [] ;

where member, a function to find whether an element appears in a list, is defined as

secondary member :: ’a -> [’a] -> bool;
member x (y:ys) = (x == y) or (member x ys) |
member x [] = False ;

Consequently a query such as nbrs JohnSmith will give the result

["waiter", "limps", "male", RomaRest]

The nbrs function has been defined to be used as a building block in a function centre to be discussed below.

Suppose someone wishes to search the criminal intelligence database to find the identity of a limping man who is believed to be connected with a particular restaurant (RomaRest in our example). If the way in which this man is associated with the restaurant is known (as a customer or the proprietor, for instance) the search is relatively simple. However, the situation can easily arise where the manner in which the known entities or values are associated to the unknown person may not be known. In conventional databases this situation is problematical and can only be resolved by trying out large numbers of queries or writing programs to scan the database. Given that, in Hydra, we can find all the neighbouring nodes of any item in the database, it is simple to find entities or values which are directly connected with any number of known values or entities.

A function to do this, centre, which takes a list of entities or values and returns a list of all those database nodes which have a direct association with each of the items in the parameter list, can be declared as

secondary centre :: [?] -> [?];

and defined using the nbrs function as follows:

centre [] = [] |
centre [x] = nbrs x |
centre (x:xs) = inter (nbrs x) (centre xs);

where inter takes two lists and returns a list of items which appear in both the lists. It is declared and defined as

secondary inter :: [’a] -> [’a] -> [’a];
inter (x:xs) ys = if (member x ys)
                     (x : inter xs ys)
                     (inter xs ys) |
inter [] ys = [] ;

Hence a query such as

centre [RomaRest, "limps", "male"];

will return the singleton list [JohnSmith].

It is relatively simple to generalise the definition of centre so that it can cope with indirect as well as direct associations or so that it can find the entities or values which are associated with the highest number of elements in the parameter list. The important point about functions such as known and centre is that they are easily encoded and provide useful facilities which are not available in standard database systems.
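The way centre is built from nbrs and inter can be followed in a short Python sketch. It is an illustration rather than the Hydra code: neighbours are read straight from an in-memory edge list instead of being obtained through from and to, a loop replaces the recursion, and EDGES is assumed toy data reconstructed from the examples in the text.

```python
# Sketch of `nbrs`, `inter` and `centre` over an assumed toy edge list.
# Arcs are stored as (source, function, target) triples.

EDGES = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("JohnSmith", "sex", "male"),
    ("JohnSmith", "pers_desc", "limps"),
    ("JohnSmith", "pers_desc", "waiter"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("MarcoBonni", "sex", "male"),
    ("MarcoBonni", "pers_desc", "waiter"),
    ("MarcoBonni", "pers_desc", "tall"),
    ("Inc045", "place", "RomaRest"),
    ("Inc045", "involved", "MarcoBonni"),
]


def nbrs(x):
    """Every node one arc away from x, in either direction, without duplicates."""
    out = []
    for src, _fn, dst in EDGES:
        for a, b in ((src, dst), (dst, src)):
            if a == x and b not in out:
                out.append(b)
    return out


def inter(xs, ys):
    """Items of xs which also appear in ys, in the order of xs."""
    return [x for x in xs if x in ys]


def centre(items):
    """Nodes directly associated with every item in the list."""
    if not items:
        return []
    result = nbrs(items[0])
    for item in items[1:]:
        result = inter(result, nbrs(item))
    return result


print(nbrs("JohnSmith"))
print(centre(["RomaRest", "limps", "male"]))
```

Dropping "limps" from the query widens the answer to both people who work at the restaurant, which shows how each additional known value narrows the intersection.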

5. A graphical query interface

One problem which arises in Hydra is that the semantics of the results returned may often be obscured. For example an answer such as

[[John, ~involved, Inc003, involved, Bill],
 [John, works_at, KingGeorge, ~place, Inc003, involved, Bill]]

to the query trail 4 John Bill; obscures the fact that the two paths overlap. Clearly a graphical representation of the result would be preferable. A further problem with a textual interface is that it excludes multimedia datatypes such as pictures or text.

These weaknesses motivated the development of VisualQ, a graphical interface to Hydra (Abreu, 1995; Ayres and Abreu, 1997). VisualQ exploits the restricted functional data model of Hydra by allowing the user to draw out the database in the form of a graph. Starting from a blank canvas the user can place nodes corresponding to values in the database and browse through the database. Entities are represented by icons and connections between entities or values (corresponding to primary function definitions) are shown as labelled arcs. An advantage of using a free-form canvas is that multimedia data (such as pictures or text) can be easily incorporated.

The VisualQ interface is shown in figure 6. This shows part of a criminal intelligence database in which a particular incident (Inc100, represented by a running-man icon) and some associated information is displayed. The text block on the left gives a description of the incident, a bank robbery. Entities connected to the incident, a car and a suspect along with his description, are also shown. The user has also managed to retrieve from the database a known criminal, MarcoBonni, who fits the suspect’s description, along with his photo. This association was retrieved by using a VisualQ option to invoke the centre function using the three known attributes of Sus100a.

VisualQ is at the same time a drafting tool and a database query interface. Thus the components of the diagram can be dragged on the canvas to make it more readable. Also a limited set of queries can be invoked by clicking on values shown on the canvas. Those entities or values which are displayed with a question mark underneath them have connections to other values which are not shown on the canvas. Clicking on the car entity F234GHJ automatically invokes the query known F234GHJ; on the underlying database and the results of this query are shown in the sub-window in the middle of the canvas. This sub-window shows all the functions (underlined) which can be applied to the entity and, indented, the results of applying them. Thus we can see that F234GHJ is a red Vauxhall Cavalier. Values in the sub-window which are prefixed by *** already appear on the canvas. Entities or values in the sub-window can be transferred to the canvas by clicking on them and their connection with F234GHJ will automatically be drawn in. When VisualQ draws an entity (such as F234GHJ) on the canvas it carries out the query icon F234GHJ; on the underlying database to determine what icon is to be used to represent the entity.

In figure 6 the user has selected the two entities F234GHJ and MarcoBonni by highlighting them. The underlying trail function of Hydra can now be invoked (through the option on the left of the window’s menu bar) to determine if there is any connection between MarcoBonni and the car used in the incident. The result of the trail query is automatically placed on the canvas giving the view shown in figure 7, where the user has retrieved an additional incident description. It appears from this result that the car used in the bank robbery was dumped at a location near the place of work (an Italian restaurant) of MarcoBonni. Note that the association labelled near between the place where the car was dumped and the restaurant is a purely intensionally defined primary function which uses the map coordinates of locations to determine nearby locations held in the database.

This partial overview of VisualQ demonstrates that the results of Hydra queries can be presented in a form appropriate for the naive user. Currently VisualQ only uses a small number of underlying Hydra functions but is being extended to present the results of other Hydra-defined queries. This will produce a flexible system in which new Hydra functions can be quickly incorporated into a user-friendly interface.


Figure 6. The VisualQ graphical interface with the ends of a trail query highlighted.


Figure 7. Result of trail query formatted by VisualQ.


6. Conclusion

The use of a restricted functional data model in the development of Hydra has yielded several benefits:

• It allows the language to incorporate features to query the way in which database entities or values are associated with each other, a query capability which is not available in standard database systems and which is provided without compromising any standard query features.
• The restricted model ensures that the database has a simple network representation which has been exploited in the VisualQ interface.
• The functional view makes possible a relatively clean integration of the database with a computationally complete programming language. The way in which functional languages treat functions as “first-class citizens” makes it possible to return functions themselves as query results in certain circumstances. This integration would not be so easy in a logic-oriented language, such as Prolog, due to the syntactic distinction made between predicates and atoms.

The only expense of the data model, from a user perspective, is the distinction which must be made between primary and secondary functions. However, we maintain that the benefits of this separation (making the data model explicit, allowing associational queries, permitting different update and evaluation behaviours) far outweigh the inconvenience it imposes.

Further work. Currently the Hydra system is still under development and there is more than one avenue of research which needs to be investigated. The main ones are:

• To investigate the properties of the type system to formally demonstrate its soundness and possibly to modify it to provide support for class hierarchies.
• To investigate changing the data model. There are several kinds of change which need to be investigated. These include altering it to incorporate temporal or certainty information. The simplicity of the data model makes it a good candidate for experimentation with incorporating further semantics. The other avenue is to investigate whether to incorporate complex objects but limit the search space for associational queries so that any internal structure of such objects would be effectively ignored when looking for associations. This would widen the applicability of the data model to domains where there is no advantage in decomposing information down to its most basic values and connections, for example in holding co-ordinates in geographic or scientific data.
• To enhance the interface to ensure that the full query power of Hydra is not obscured. An obvious first step is to incorporate facilities to provide the standard query facilities of relational databases and then extend these possibly with further Hydra-specific facilities. The interface also needs to be enhanced to permit schema and data definition and update.

References

Abreu, R. (1995). A Visual Query Interface to the Associational Functional Database Language Hydra. Master’s thesis, Birkbeck College, University of London.

Ayres, R. and Abreu, R. (1997). VisualQ: A Graphical Interface to Facilitate Database Exploration. Submitted for Publication.

Ayres, R. and King, P.J.H. (1995). Entities, Functions, and Surrogates in Functional Database Languages. In B. Werner (Ed.), Proceedings of Basque International Conference on Information Technology, BIWIT 95. San Sebastian, Spain: IEEE Computer Society Press.

Ayres, R. and King, P.J.H. (1996). Querying Graph Databases Using a Functional Language Extended with Second Order Facilities. In R. Morrison and J. Kennedy (Eds.), Advances in Databases, 14th British National Conference on Databases, BNCOD14. Edinburgh, UK: Springer-Verlag.

Buneman, P. and Frankel, R.E. (1979). FQL - A Functional Query Language. In P.A. Bernstein (Ed.), Proceedings ACM SIGMOD 79 Conference (pp. 52–58). ACM.

Codd, E.F. (1979). Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems, 4(4), 397–434.

Consens, M.P. and Mendelzon, A.O. (1990). GraphLog: A visual formalism for real life recursion. Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 404–416).

Consens, M.P. and Mendelzon, A.O. (1993). Hy+: A hygraph-based query and visualisation system. Proceedings of the 1993 ACM SIGMOD International Conference on the Management of Data (pp. 511–516). ACM Press.

Guting, R.H. (1994). GraphDB: Modelling and querying graphs in databases. Proceedings of the 20th International Conference on Very Large Data Bases. Santiago, Chile.

Jones, P.S.L. (1987). The Implementation of Functional Programming Languages. Prentice-Hall International.

Kent, W. (1979). Limitations of Record-Based Information Models. ACM Transactions on Database Systems, 4(1), 107–131.

Kulkarni, K.G. and Atkinson, M.P. (1986). EFDM: Extended Functional Data Model. The Computer Journal, 29, 38–46.

Milner, R. (1978). A Theory of Type Polymorphism in Programming. Journal of Computer and System Science, 17(3), 348–375.

Poulovassilis, A. and King, P.J.H. (1990). Extending the functional data model to computational completeness. Proceedings of EDBT’90 (vol. LNCS 416, pp. 75–91). Venice, Italy: Springer-Verlag.

Shipman, D. (1981). The Functional Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 6(1), 140–173.

Turner, D.A. (1985). Miranda: A Non-Strict Functional Language with Polymorphic Types. In J.P. Jouannaud (Ed.), Functional Programming Languages and Computer Architectures. Springer-Verlag. Lecture Notes in Computer Science No. 201.

Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press.
