
ELSEVIER Fuzzy Sets and Systems 75 (1995) 273-289


Towards the implementation of a generalized fuzzy relational database model

J.M. Medina*, M.A. Vila, J.C. Cubero, O. Pons

Department of Computer Sciences and Artificial Intelligence, University of Granada, 18071 Granada, Spain

Received April 1993; revised June 1994

Abstract

This paper shows the necessary elements for the effective implementation of the generalized fuzzy relational database model.

From the model described in Medina et al. (1994) some criteria for representation and handling of imprecise information are introduced, the most important aspect being the simplicity of the implementation. The paper shows a series of mechanisms to implement imprecise information in a classical RDBMS. Having the information represented in a classical RDBMS data structure and having the implementation of procedural knowledge about such information, we will be able to build a FRDBMS on a host RDBMS.

Keywords: Fuzzy relational database; Fuzzy sets.

1. Introduction

In the last few years, some authors have dealt with the problem of relaxing the relational model in order to admit some imprecision; this leads to database systems that lie within the scope of Artificial Intelligence, as they allow us to manage information with a terminology that is very similar to natural language.

Imprecision can be included in the system at two levels. The first level considers the possibility of making imprecise queries to classical databases. The second one is related to the problem of adding imprecise information to the system.

* Corresponding author. E-mail: [email protected].

In both cases, fuzzy set theory [35] provides a powerful tool to represent imprecision. At the first level, we consider the works [3, 12]. Handling the problems at the second level gives rise to the fuzzy relational database models. The existing approaches at this level can be grouped into two classes: models based on similarity relations and relational models based on possibility distributions.

In the first group the main works are [4-6, 1], and additional contributions are [24-26]. In the second group there are some approaches: Umano [27], Prade-Testemale [20] and Zemankova [37].

In the paper [15] we introduce a new extension of the Relational Model called GEFRED. This model incorporates a new definition for the data structure and the corresponding data handling, which allows us to integrate, in the same framework, the previous relational models.

0165-0114/95/$09.50 © 1995 - Elsevier Science B.V. All rights reserved SSDI 0165-0114(94)00380-7

In the papers [30, 32], it is proved that GEFRED can be represented from a logical point of view and that previous models can be considered as particular cases.

Once our model, briefly described in Section 2, has been formulated, we are concerned with the problem of viability, that is, how, if possible, we can implement it. The problem of the implementation of FRDBMS has been treated in the literature following two basic lines.

• Starting from a RDBMS with precise information, to develop a syntax that allows imprecise queries to be formulated. In [3] there is a study of an SQL extension for making this kind of query.

• To build a FRDBMS prototype which implements a concrete fuzzy relational database model. In this sense, some proposals have been made: in [28] Umano shows an implementation of his model, and the work [13] is about the implementation of a FRDBMS based on Umano's model using Fuzzy-Prolog. For the Prade-Testemale model, some aspects of its implementation can be found in [11], and in [37] some ideas about the implementation of the Zemankova-Kandel model are shown.

Our particular proposal lies within the first line, but includes the capability of representing and handling fuzzy information in a classical RDBMS. The first thing to be considered is the adoption of a particular criterion for imprecise information representation, i.e., to find the most suitable representation for imprecise data and operators among those possible in GEFRED. Such a representation (described in Section 3) will facilitate, as far as possible, the corresponding implementation.

In Section 4 we will analyze some aspects related to the implementation of fuzzy information in the system to be developed. In this section, it is shown that the mechanisms adopted to implement all elements related to imprecise information are very general, and their validity is extended to implementations based on other criteria. This section ends with an example that illustrates the way in which imprecise information is represented and how it is implemented in the database and in the metaknowledge base.

2. Theoretical model used

In this section we introduce the basic elements of a fuzzy extension of the relational model, called GEFRED, described in [15]. Such an extension includes some elements shown in the previously reviewed fuzzy relational models, together with some new characteristics.

The main contributions are the following.

• Information handling whose imprecise origin is wider.

• A different information organization. The same relation structure is used to represent the initial information, the information resulting from algebraic operations and the final results.

• A certain control can be exercised over the precision with which any simple condition involved in a query is satisfied.

2.1. Data structure

The information the model handles is organized as follows:

• The domain $D_G$ underlying every attribute of the relation contains some of the data of Table 1.

• We structure the data through a relation model, $R_{FG}$, given by

$$R_{FG} \in (D_{G1}, C_1) \times \cdots \times (D_{Gn}, C_n),$$

where every $D_{Gj}$ is a domain of the type previously described and $C_j$ is a "compatibility attribute" that takes its values in [0, 1]. Every attribute is associated with a "compatibility attribute". In base relations, the "compatibility attribute" does not appear. This relation represents the initial information as well as that resulting from the fuzzy algebra operations made on it. Handling of these relations through fuzzy relational algebra could modify, for every tuple, the compatibility attribute values.

2.2. Data handling

The fuzzy algebra used in this model is an extension of the classical one; in this extension specific comparison operators are used in order to handle fuzzy information. Fuzzy querying receives special handling, based on the following points.


Table 1
Data types

1. A single scalar (Behavior = good, represented by the possibility distribution 1/good)
2. A single number (Age = 28, represented by the possibility distribution 1/28)
3. A set of possible scalar assignments (Behavior = {good, bad}, represented by {1/good, 1/bad})
4. A set of possible numeric assignments (Age = {20, 21}, represented by {1/20, 1/21})
5. A possibility distribution in a scalar domain (Behavior = {0.6/bad, 0.7/normal})
6. A possibility distribution in a numeric domain (Age = {0.3/23, 1.0/24, 0.8/25}, fuzzy numbers or linguistic labels)
7. A real number belonging to [0, 1], referring to degree of matching (Quality = 0.9)
8. An Unknown value with possibility distribution Unknown = $\{1/u : u \in U\}$
9. An Undefined value with possibility distribution Undefined = $\{0/u : u \in U\}$
10. A NULL value given by NULL = {1/Unknown, 1/Undefined}

• We call "atomic selection" a query, on a relation of type $R_{FG}$, in which we look for the satisfaction of a simple condition.

• When an attribute, an operator and a fuzzy constant are involved in an "atomic selection", such a condition will be satisfied to a certain degree for every attribute value. Such a degree takes a value in [0, 1].

• In an "atomic selection" we can establish a threshold for the degree of satisfaction of a condition. Thanks to that threshold, we can eliminate those tuples that do not satisfy the condition to a degree greater than or equal to the threshold.

• The result of an "atomic selection" with a threshold for the degree is, once again, a relation of the type introduced in Section 2.1. In that relation, the degree of satisfaction of the condition for every value of the attribute involved appears in the compatibility attribute.

Compound conditions are those obtained by combining simple conditions through logical connectives (negation, conjunction and disjunction). Compound conditions are solved as follows.

- From every simple condition we obtain the resulting relation by applying the "atomic selection" with a given threshold.

- For simple conditions connected with the conjunctive operator, we make the intersection of the relations obtained from every condition. Afterwards, the values of the "compatibility attribute" associated with every attribute involved in the simple conditions are computed: the compatibility attribute of every tuple of the intersection is given a value equal to the minimum of those obtained for the respective simple conditions.

- For simple conditions connected with the disjunctive operator, we make the union of the relations obtained for every condition and update the compatibility attribute with the maximum value.

- For a negated simple condition, we update the compatibility attribute value with the complement to 1 of the present value in every tuple.
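The resolution of atomic and compound conditions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, for brevity, that a relation is a dictionary from a tuple identifier to a single compatibility degree (the model keeps one compatibility attribute per attribute), and the names `atomic_selection`, `conjunction`, `disjunction` and `negation` are hypothetical.

```python
from typing import Callable, Dict, Hashable

# Assumption for illustration: a relation maps a tuple key to one
# compatibility degree in [0, 1].
Relation = Dict[Hashable, float]


def atomic_selection(rows: Dict[Hashable, object],
                     degree: Callable[[object], float],
                     threshold: float) -> Relation:
    """Keep the tuples whose satisfaction degree reaches the threshold."""
    result = {}
    for key, value in rows.items():
        d = degree(value)
        if d >= threshold:
            result[key] = d
    return result


def conjunction(r1: Relation, r2: Relation) -> Relation:
    """Intersection of both results; compatibility is the minimum."""
    return {k: min(r1[k], r2[k]) for k in r1.keys() & r2.keys()}


def disjunction(r1: Relation, r2: Relation) -> Relation:
    """Union of both results; compatibility is the maximum."""
    return {k: max(r1.get(k, 0.0), r2.get(k, 0.0))
            for k in r1.keys() | r2.keys()}


def negation(r: Relation) -> Relation:
    """Complement to 1 of the compatibility degree of every tuple."""
    return {k: 1.0 - d for k, d in r.items()}


# Hypothetical precomputed degrees for two simple conditions.
young = atomic_selection({"t1": 0.9, "t2": 0.4, "t3": 0.7}, lambda v: v, 0.5)
tall = atomic_selection({"t1": 0.6, "t2": 0.8, "t3": 0.2}, lambda v: v, 0.5)
both = conjunction(young, tall)      # {'t1': 0.6}
either = disjunction(young, tall)    # t1 -> 0.9, t2 -> 0.8, t3 -> 0.7
```

In this sketch the threshold is applied before the connectives, as in the first step of the procedure: a tuple that fails one atomic selection cannot appear in a conjunction.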

3. Fuzzy information representation

The elements related to fuzzy data handling can have different representations. A normalized possibility distribution, for example, can be represented by different types of functions, but we will use a trapezoidal representation for it. The same can be said about the way we are going to model fuzzy relational operators, as well as the rest of the fuzzy items that appear in the system. In this section, we show the representation criteria adopted in our implementation. These criteria are not exclusive to a concrete representation, but represent the base on which the system is built according to the scheme designed for a FRDBMS. Therefore, we could say that these criteria constitute a step between the formulation of a FRDB model and the effective implementation of a system based on it.

Precise data

We will use the representation provided by the host RDBMS.


Imprecise data

The model considers two different groups, with different representations, for imprecise data.

- "Imprecise data over ordered underlying domain." This group of data contains possibility distributions defined on continuous or discrete but ordered domains. Type 6 of Table 1 belongs to this group. Each datum of this type is associated with a membership function. For the sake of simplicity in the representation and computing efficiency, we will adopt the representation shown in Table 2.

- "Data with analogy over discrete domain." This group of data is built over discrete domains on which there are "proximity relations" defined between its values. In this case, we will have to store the data representation as well as the representation of the proximity relations defined on the domain values. The different data which we can represent in this group are the following.

  • Simple scalars. These data are represented using the representation scheme of the host RDBMS. We only have to provide the system with the information for it to handle the "proximity relation" defined on the underlying domain.

  • Possibility distribution over discrete domain. An imprecise datum of this type is associated with a representation in which the domain values that constitute it are described together with the respective possibility values for each of them: $((p_1, d_1), \ldots, (p_n, d_n))$.

- "Unknown" data type. Data of this type express ignorance about the value an attribute takes, although we know that it can take one of the domain values. This means that it is possible for the attribute to take any of them. Therefore, we represent the UNKNOWN type through the possibility distribution $\{1/u : u \in U\}$, where U is the underlying domain. Fig. 1 shows this possibility distribution.

- "Undefined" data type. When an attribute takes the value UNDEFINED, it reflects the fact that none of the values of its domain are allowed. This means that none of the values are possible. Therefore, the associated possibility distribution is $\{0/u : u \in U\}$, where U is the underlying domain. The possibility distribution is shown in Fig. 1.

- "Null" data type. When an attribute takes the value NULL, it means that we have no information, either because we do not know it (UNKNOWN) or because no domain value is possible (UNDEFINED). The possibility distribution for this case is {1/UNKNOWN, 1/UNDEFINED}.

Table 2
Representation of data types for imprecise data over ordered underlying domain. Data types shown: trapezoidal possibility distribution; interval distribution (α = β, γ = δ); linguistic label (example: "Tall", on a domain in cm); approximate value (n - margin, n, n + margin).

Fig. 1. Types UNKNOWN and UNDEFINED.

Fig. 2. "equal" and "approximately equal" operators.

Proximity relations

We use proximity relations to model the imprecision derived from the likeness between two values of the discourse domain.

In this case we will only use proximity relations defined over finite universes of discourse, so we can model such relations in matrix form.

Fuzzy relational operators

The different comparison operators used to relate database relations are the relational ones. To operate on imprecise information these operators are extended. The representation adopted in our model for the different relational operators is as follows.

- "Equal to". This operator models the equality concept for imprecise data. Formally, it can be expressed through the membership function given by

$$\mu_{equal\_to}(\tilde{p}, \tilde{p}') = \sup_{(d,d') \in D \times D} \min\bigl(p(d,d'), \pi_{\tilde{p}}(d), \pi_{\tilde{p}'}(d')\bigr),$$

where $p(d,d')$ is a "proximity relation" and $\pi_{\tilde{p}}(d)$, $\pi_{\tilde{p}'}(d')$ are the respective possibility distributions defined over the discourse domain D.

  • For imprecise data defined on an ordered domain, $p(d,d') = \delta(d,d')$, where $\delta$ is Dirac's delta. Taking into account the representation we give to this data type, the result of the "equal to" operation can be obtained geometrically (Fig. 2).

  • For data with analogy on a discrete domain, $p(d,d')$ is the matrix representation of the "proximity relation" that is defined on the discourse domain D.

- "Approximately equal". This operator provides the degree to which two "crisp" numeric values are approximately equal. It is calculated according to the following expression:

$$\mu_{approx\_equal}(x, y) = \begin{cases} 0 & \text{if } |x - y| > margin, \\ 1 - |x - y| / margin & \text{if } |x - y| \le margin. \end{cases}$$

Fig. 2 shows the way it is calculated. The parameter margin fits the operator to the domain it is defined over.

- "Greater or equal". It is defined on ordered domains. The membership function of this operator is given by the fuzzy relation

$$\mu_{\ge}(A, B) = \sup_{(x,y) \in X \times Y} \min\bigl(\ge(x,y), \pi_A(x), \pi_B(y)\bigr),$$

where A and B are imprecise data over an ordered domain or "crisp" numeric data, $\pi_A(x)$, $\pi_B(y)$ are their respective possibility representations and $\ge$ is the classic "greater or equal" operator given by

$$\ge(x, y) = \begin{cases} 0 & \text{if } x < y, \\ 1 & \text{if } x \ge y. \end{cases}$$

This operator can solve the following comparisons.

  • Degree to which a "crisp" number is "greater or equal" than a possibility distribution.

  • Degree to which a possibility distribution is "greater or equal" than a "crisp" number.

  • Degree to which a possibility distribution is "greater or equal" than another possibility distribution.

- "Less or equal". It is defined on ordered domains. The membership function of this operator is given by the fuzzy relation

$$\mu_{\le}(A, B) = \sup_{(x,y) \in X \times Y} \min\bigl(\le(x,y), \pi_A(x), \pi_B(y)\bigr),$$

where A and B are imprecise data over an ordered domain or "crisp" numeric data, $\pi_A(x)$, $\pi_B(y)$ are their respective possibility representations and $\le$ is the classic "less or equal" operator given by

$$\le(x, y) = \begin{cases} 0 & \text{if } x > y, \\ 1 & \text{if } x \le y. \end{cases}$$

This operator can solve the same comparisons as the operator "greater or equal".

- "Greater than". We define this operator from the "less or equal" operator. To do that, we calculate the complement of such an operator:

$$\mu_{>}(A, B) = 1 - \mu_{\le}(A, B).$$

- "Less than". It is defined as the complement of the operator "greater or equal":

$$\mu_{<}(A, B) = 1 - \mu_{\ge}(A, B).$$
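The sup-min definitions above can be evaluated directly when both operands are discrete possibility distributions. The following Python sketch makes that assumption (the paper solves the trapezoidal case geometrically, which is not reproduced here); the function names and the dictionary representation are hypothetical.

```python
from itertools import product
from typing import Callable, Dict

# Assumption: a discrete possibility distribution maps each domain value
# to its possibility degree; a crisp value d is the distribution {d: 1.0}.
Distribution = Dict[float, float]


def crisp(value: float) -> Distribution:
    return {value: 1.0}


def sup_min(a: Distribution, b: Distribution,
            relation: Callable[[float, float], float]) -> float:
    """sup over (x, y) of min(relation(x, y), pi_A(x), pi_B(y))."""
    return max(
        (min(relation(x, y), pa, pb)
         for (x, pa), (y, pb) in product(a.items(), b.items())),
        default=0.0,
    )


def equal_to(a: Distribution, b: Distribution) -> float:
    # Ordered domain: the proximity relation reduces to exact equality.
    return sup_min(a, b, lambda x, y: 1.0 if x == y else 0.0)


def greater_or_equal(a: Distribution, b: Distribution) -> float:
    return sup_min(a, b, lambda x, y: 1.0 if x >= y else 0.0)


def less_or_equal(a: Distribution, b: Distribution) -> float:
    return sup_min(a, b, lambda x, y: 1.0 if x <= y else 0.0)


def greater_than(a: Distribution, b: Distribution) -> float:
    return 1.0 - less_or_equal(a, b)


def less_than(a: Distribution, b: Distribution) -> float:
    return 1.0 - greater_or_equal(a, b)


def approximately_equal(x: float, y: float, margin: float) -> float:
    """Triangular closeness of two crisp numbers; margin must be positive."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    diff = abs(x - y)
    return 0.0 if diff > margin else 1.0 - diff / margin


# Age = {0.3/23, 1.0/24, 0.8/25}, the example of Table 1.
age = {23: 0.3, 24: 1.0, 25: 0.8}
ge_25 = greater_or_equal(age, crisp(25))   # 0.8
eq_23 = equal_to(age, crisp(23))           # 0.3
```

Because "greater than" is defined as the complement of "less or equal", `greater_than(age, crisp(25))` is 0 here even though 25 itself has possibility 0.8; this follows from the definitions in the text rather than from a choice made in the sketch.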

We will call "condition qualification" the action of establishing a matching threshold for an atomic (simple) condition involved in a query.

Such a threshold will be named "qualifier". This "qualifier", a value between 0 and 1, can be represented through a linguistic value; for example, if we state that the degree to which a condition is matched is "high", it means that we will accept all tuples whose degree of matching is greater than or equal to 0.8. That is to say, we can associate linguistic values with qualifiers. The threshold value we associate with each linguistic label must be stored in the system and has, like linguistic labels, a subjective meaning.

Fuzzy quantifiers of a query

There are two quantifiers in the classic relational model: the existential quantifier EXISTS ($\exists$) and the universal quantifier FORALL ($\forall$). With the first one we obtain a true answer (TRUE) when some of the tuples satisfy the condition in the query, and with the second one the true answer is obtained when all the tuples in the database satisfy such a condition. But in the fuzzy case there is a wide range of quantifiers between ($\exists$) and ($\forall$), which can be given in a linguistic way as: "almost none", "some", "a lot", "almost all". These linguistic values have a representation in terms of possibility distributions on a domain $D_Q$ defined as

$$D_Q = \left\{ d : d = \frac{\text{number of tuples satisfying the condition}}{\text{number of consulted tuples}} \right\}. \qquad (1)$$
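As an illustration of Eq. (1), the sketch below computes the ratio d for a set of matching degrees and evaluates a trapezoidal quantifier label on it. Two points are assumptions and not taken from the paper: a tuple is counted as "satisfying the condition" when its degree reaches a threshold, and the parameters chosen for "almost all" are hypothetical.

```python
from typing import Sequence, Tuple


def trapezoid(x: float, alpha: float, beta: float,
              gamma: float, delta: float) -> float:
    """Membership of x for support [alpha, delta] and kernel [beta, gamma]."""
    if beta <= x <= gamma:
        return 1.0
    if alpha < x < beta:
        return (x - alpha) / (beta - alpha)
    if gamma < x < delta:
        return (delta - x) / (delta - gamma)
    return 0.0


def quantifier_degree(degrees: Sequence[float], threshold: float,
                      quantifier: Tuple[float, float, float, float]) -> float:
    """Evaluate a fuzzy quantifier on d = satisfied tuples / consulted tuples."""
    if not degrees:
        raise ValueError("no tuples were consulted")
    satisfied = sum(1 for g in degrees if g >= threshold)
    d = satisfied / len(degrees)
    return trapezoid(d, *quantifier)


# Hypothetical definition of "almost all" over [0, 1].
ALMOST_ALL = (0.7, 0.9, 1.0, 1.0)

# Four of the five consulted tuples reach the threshold, so d = 0.8.
result = quantifier_degree([0.9, 0.85, 0.4, 0.95, 0.8], 0.8, ALMOST_ALL)
```

With these illustrative parameters d = 0.8 falls on the rising edge of the trapezoid, so the query is satisfied by "almost all" tuples only to an intermediate degree.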

Fig. 3 shows graphically the behavior of the fuzzy relational operators described.

Query threshold qualifiers

When we query an imprecise database we are imposing some conditions that the resulting tuples must satisfy.

Given the imprecise nature of the data and operators we work on, there is a degree of matching for any condition involved in a query. This degree of matching is in the interval [0, 1]. Using a minimal threshold for this degree of matching we can control the precision with which the conditions of the query are satisfied.

Fig. 3. Fuzzy relational operators: (a) the three possibility distributions we operate on; (b) the membership function of the $\ge$ operator applied to B; (c) the degrees to which A and C are "greater or equal" than B; (d) the membership function of $\le$ applied to B; (e) the degrees to which A and C are "less or equal" than B; (f) and (h) the membership functions of the operators > and <, respectively, built on B; (g) and (i) the degrees to which A and C are "greater" and "less" than B, respectively.

4. Implementation of the imprecise information

It is carried out on three levels.

• At the level of the FRDBMS. The system possesses knowledge about the treatment of the available fuzzy operations. This is not the issue of this article, and it will be dealt with in forthcoming papers.

• At the level of the Database. The database consists of all the permanent elements describing a piece of the universe. As we are concerned with the representation of imprecise data, we must determine how we can store it. So, the data representation must be extended in order to deal with this kind of information.

• At the level of the metaknowledge base. There is a part of the classic RDBMS where all the information the system must know about the represented data is stored, "data about the data" (sometimes called "metadata"). We will refer to this part as the metaknowledge base. Usually, this information is represented through tables or relations organized in the so-called system dictionary. The FRDBMS must contain information about which elements in the Database hold imprecise data, as well as their nature and representation. We will call fuzzy metaknowledge base that extension of the metaknowledge base which captures all the necessary information about the imprecise data in the database.

We have shown the criteria used in order to represent several aspects of the imprecise information. Now, we proceed to describe the implementation of imprecise items in the database and the metaknowledge base.

4.1. Implementation of the imprecise information in the database

There are three types of attributes which can be treated in an imprecise manner. The classification is based on the underlying domain.

1. Attributes with "crisp data" having linguistic labels defined on them. They are called Type 1 attributes. The representation of these data attributes is similar to that of precise data. In addition, information about the linguistic labels as well as information about the attribute type is stored in the Metaknowledge Base.

2. Attributes with "imprecise data on ordered domain". They are called Type 2 attributes. The data types allowed are those described in Table 3. Incomplete information such as UNKNOWN, UNDEFINED and NULL, and precise information can also be represented. We must store the following information in the database:

• The data type. This can be: UNKNOWN, UNDEFINED, NULL, "CRISP", LABEL, INTERVAL, APROX., TRAPEZOIDAL.

• The parameter description defining each datum, depending on the corresponding data type. For instance, there are four parameters describing a TRAPEZOIDAL type: α, β, γ, δ.

In addition, for the metaknowledge base we need to know which are these attributes and their characteristics.

The codification of this information depends on the implementation and on the system characteristics we want to enhance.

• Computing speed versus storage capabilities. A more compact representation could be used for some of these types, but this would decrease the speed of most of the operations involved in a query.

• Uniformity in the representation. We use five classic attributes to represent each fuzzy attribute of this type.

• The RDBMS elements must be used to represent the information following the relational scheme. In conjunction with other criteria, this will allow us to translate any imprecise operation in terms of the classical relational model.

Table 3
Type 2 attributes representation

Data type        F_TYPE  F_1         F_2     F_3      F_4
UNKNOWN          0       NULL        NULL    NULL     NULL
UNDEFINED        1       NULL        NULL    NULL     NULL
NULL             2       NULL        NULL    NULL     NULL
CRISP            3       d           NULL    NULL     NULL
LABEL            4       FUZZY_ID    NULL    NULL     NULL
INTERVAL[A, B]   5       A           0       0        B
APROX(d)         6       d - margin  margin  -margin  d + margin
FUZZY            7       α           β - α   γ - δ    δ

Table 3 shows the implementation adopted. For each Type 2 attribute F, an attribute F_TYPE holding the type code is created, together with attributes F_1, F_2, F_3 and F_4 representing the parameters of each datum. NULL values appearing in these attributes have the meaning assigned by the host RDBMS. For the LABEL type, the FUZZY_ID code is an identifier for the linguistic label defined in the metaknowledge base. "Margin" is another parameter stored in the metaknowledge base.
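The encoding of Table 3 can be sketched as a small Python function that maps a Type 2 value to the five classic attributes (F_TYPE, F_1, F_2, F_3, F_4). This is only an illustration: the column assignment follows our reading of Table 3, `None` stands for the NULL of the host RDBMS, and the function name and calling convention are hypothetical.

```python
from typing import Optional, Tuple

Number = Optional[float]
Row = Tuple[int, Number, Number, Number, Number]

# Type codes as listed in Table 3.
TYPE_CODES = {"UNKNOWN": 0, "UNDEFINED": 1, "NULL": 2, "CRISP": 3,
              "LABEL": 4, "INTERVAL": 5, "APROX": 6, "FUZZY": 7}


def encode_type2(kind: str, *params: float,
                 margin: Optional[float] = None) -> Row:
    """Return (F_TYPE, F_1, F_2, F_3, F_4) for one Type 2 value."""
    code = TYPE_CODES[kind]
    if kind in ("UNKNOWN", "UNDEFINED", "NULL"):
        return (code, None, None, None, None)
    if kind in ("CRISP", "LABEL"):
        # CRISP stores the value d; LABEL stores the FUZZY_ID of the label.
        (value,) = params
        return (code, value, None, None, None)
    if kind == "INTERVAL":
        a, b = params
        return (code, a, 0, 0, b)
    if kind == "APROX":
        (d,) = params
        if margin is None:
            raise ValueError("APROX needs the margin kept in the metaknowledge base")
        return (code, d - margin, margin, -margin, d + margin)
    # FUZZY: trapezoid (alpha, beta, gamma, delta).
    alpha, beta, gamma, delta = params
    return (code, alpha, beta - alpha, gamma - delta, delta)


row = encode_type2("FUZZY", 150, 170, 190, 200)   # (7, 150, 20, -10, 200)
```

Under this reading, INTERVAL and APROX are stored with the same layout as a trapezoid (α, β - α, γ - δ, δ), which is consistent with the uniformity criterion stated above.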

3. Attributes with "discrete domains with analogies": Type 3 attributes. These attributes are used for scalar data or possibility distributions on scalar domains; the representation is given in Section 3. They also accept UNKNOWN, UNDEFINED and NULL.

For this kind of attribute, the type and the representation associated with each datum need to be stored in the database. The metaknowledge base stores which attributes are of Type 3, as well as the definition of the proximity relations associated with the underlying domain.

In Table 4 the implementation adopted for this kind of attribute is shown.

4.2. The fuzzy metaknowledge base

As we have seen in the previous section, there is information about the described attributes which must be stored in a manner accessible to the system. The metaknowledge base organizes all the information concerning the imprecise nature of these attributes. We consider the metaknowledge base as an extension of the system catalog; so, we will organize the information using tables or relations. The elements stored in the metaknowledge base are the following.

• Which are the attributes in the database with imprecise treatment.

• The type of these attributes: type 1, 2 or 3.

• Elements defined in the database scope, i.e., query fuzzy quantifiers.

• The fuzzy objects defined on each attribute:
  - Linguistic labels
  - Approximate values
  - Proximity relations
  - Query threshold qualifiers

4.2.1. Tables of the fuzzy metaknowledge base

In this part we detail an example of implementation for the FMB. The organization of the tables of the FMB is shown in Fig. 4. In the following we describe the data structure of each table of the FMB.

Table 4
Type 3 attributes representation

Data type    F_TYPE  F_P1  F_1   F_P2  F_2   F_P3  F_3   ...
UNKNOWN      0       NULL  NULL  NULL  NULL  NULL  NULL
UNDEFINED    1       NULL  NULL  NULL  NULL  NULL  NULL
NULL         2       NULL  NULL  NULL  NULL  NULL  NULL
SIMPLE       3       1     d     NULL  NULL  NULL  NULL
POS. DISTR.  4       p1    d1    p2    d2    p3    d3    ...

Fig. 4. Metaknowledge base scheme.

FUZZY_COL

This table contains a description of the system items that permit a fuzzy handling (in analogous terms to those used in the conventional database catalogs) and establishes a classification either for the fuzzy data types that can appear in an item or for the type of handling that the FRDBMS will use with them.

The table is initially formed by four items in which the previous characteristics are reflected. The items are the following.

- TABLE_NAME: The type of this item is "character"; its length is compatible with the length (for table identifiers) allowed by the system in which it is implemented. In this way, the rules used to name the tables will be the same as those used by the system. In our case, these rules are described in Section 3.3 of the SQL Language Reference Manual (ver. 6.0) [18]. This item contains the name of the table which the fuzzy item (see COLUMN_NAME) belongs to.

- COLUMN_NAME: It is also a character type item, and the same considerations about identifiers apply. The identifiers contained in this item refer to those columns which will receive fuzzy handling, because they contain fuzzy information or because they may be involved in a fuzzy querying process. The type of item is described in COLUMN_TYPE below.

- COLUMN_ID: It is a positive number whose range depends on the implemented system (6 or 7 digits are enough). It is, in addition, the primary key of this table and associates a numeric identifier with the item designated in COLUMN_NAME; its task is to be the reference (in any definition table) of the item associated with it, i.e., we refer to a fuzzy column by its COLUMN_ID.

- COLUMN_TYPE: This single-digit item is very important because it contains information about the data types and the handling that the columns designated by COLUMN_ID will receive. Below, we analyze the data types (both fuzzy and crisp) to be managed along with their types of management.

Column Types. Apart from the conventional data of a database (called "crisp"), we consider those added under the new fuzzy conception of the database. The classification of the different columns is the same as that adopted for the attributes in Section 4.1.

• Type 1. Those columns involving data that, though crisp, are susceptible to fuzzy handling (in a query) and thus can appear in the description in the FUZZY_COL table. This condition will be translated into the COLUMN_TYPE item (of the FUZZY_COL table) as "1". The domains of these data types will also be the same as in the initial database. The treatment of this type of data, as well as the conventional functions, is determined by the linguistic labels defined over them and appropriately coupled in the FUZZY_OBJECT_LIST table (described below).

• Type 2. This classification permits the system to recognize the fuzzy data in the item, such as linguistic labels and possibility distributions, as well as the most suitable handling, according to the information in the FUZZY_OBJECT_LIST table. This type of column is identified with a "2" in the COLUMN_TYPE item of the FUZZY_COL table.

• Type 3. These columns require that proximity relations over the elements of each item are defined. The information related to these definitions, as well as the domain values allowed, is given in the tables FUZZY_OBJECT_LIST and FUZZY_NEARNESS_DEF. Items of this type will be identified with a "3" in the COLUMN_TYPE item of the FUZZY_COL table.

FUZZY_OBJECT_LIST

This table contains a list of fuzzy objects defined in the database columns. Furthermore, it contains a classification of these objects through the OBJECT_TYPE item. The information is structured as detailed in the following.

- COLUMN_ID: NUMBER(6). Contains the number that identifies the column over which the object named in the OBJECT_NAME item of this table is defined. It constitutes a foreign key to the FUZZY_COL table.

- OBJECT_NAME: CHARACTER(30). Contains the name of an object among the types involved in the OBJECT_TYPE item.

- OBJECT_ID: NUMBER(6). Assigns a number to each object; this number is used to identify the object in the rest of the tables. This item, together with the COLUMN_ID item, constitutes the primary key of this table.

- OBJECT_TYPE: NUMBER(1). Specifies the object type identified by the OBJECT_ID item; it therefore tells the system in which table to look for the suitable definition of the object involved. See Section 3 for more information about these objects.

The values allowed are:
- 0 for linguistic label (trapezoidal type)
- 1 for scalar related to proximity relations handling
- 2 for qualifier labels defined over the matching degree of the query
- 3 for quantifier labels over query.

FUZZY_LABEL_DEF

This table contains the points that determine the membership function corresponding to the linguistic labels of trapezoidal type. The items in this table are:

- LABEL_ID: NUMBER(6). Contains the number that identifies the label through the OBJECT_ID item in the FUZZY_OBJECT_LIST table. This item constitutes a foreign key to that table. The LABEL_ID item is the primary key of the FUZZY_LABEL_DEF table.

- ALPHA: number. $\alpha = \inf\{x : x \in \mathrm{support}(label)\}$

- BETA: number. $\beta = \inf\{x : x \in \mathrm{kernel}(label)\}$

- GAMMA: number. $\gamma = \sup\{x : x \in \mathrm{kernel}(label)\}$

- DELTA: number. $\delta = \sup\{x : x \in \mathrm{support}(label)\}$ (see Fig. 2 in Section 3).

FUZZY_APPROX_DEF

This table contains all the information about the membership function of the "approximately" type labels. These functions are triangular, with membership value 1 for the point over which the approximation is considered, and can be characterized by this point, which will be given in the query, and by the width of the triangle base. Therefore, the table is composed of a label identifier, LABEL_ID (the primary key), and of the item that contains the MARGIN (see Fig. 2 in Section 3).

FUZZY_NEARNESS_DEF

This table represents the proximity or similarity measures between the different domain values allowed for type "3" items of the FUZZY_COL table. The information is structured as follows: two items establish the relation between two domain values, and a third one contains their degree of proximity. We use the concept of proximity because it is less restrictive, in the sense that it only demands the reflexive and symmetric properties. Thus, it is not necessary to store those pairs whose values can be deduced from these properties, $R(x, x) = 1$ and $R(x, y) = R(y, x)$. The items are structured as follows:

- OBJECT_ID1: number. Contains the first value of the related couple.

- OBJECT_ID2: number. Contains the second value.

- DEGREE: number ([0, 1]). Contains the degree of proximity of the related concepts of OBJECT_ID1 and OBJECT_ID2.

Note that the primary key of this table is formed by OBJECT_ID1 and OBJECT_ID2.
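The saving obtained from reflexivity and symmetry can be sketched as a lookup over the stored pairs only. The snippet below is a minimal illustration using the degrees of Table 6 and the object identifiers of Table 9; the function name `proximity` and the default of 0 for pairs that are not stored are assumptions made here.

```python
# Stored rows of FUZZY_NEARNESS_DEF for PRODUCTIVITY
# (6 = Bad, 7 = Fair, 8 = Good, 9 = Excellent).
NEARNESS = {
    (6, 7): 0.8, (6, 8): 0.5, (6, 9): 0.1,
    (7, 8): 0.7, (7, 9): 0.5, (8, 9): 0.8,
}


def proximity(a: int, b: int, stored=NEARNESS) -> float:
    """Degree of proximity between two domain values."""
    if a == b:
        return 1.0                    # reflexivity: R(x, x) = 1
    if (a, b) in stored:
        return stored[(a, b)]
    # Symmetry: R(x, y) = R(y, x). Assumption: unknown pairs have degree 0.
    return stored.get((b, a), 0.0)


print(proximity(9, 6), proximity(7, 7))
```

For a domain of n values only n(n - 1)/2 rows are needed instead of n squared; here 6 instead of 16.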

FUZZY_QUALIFIERS_DEF

This table contains the qualifiers associated with the linguistic values given in the FUZZY_OBJECT_LIST table. The concept of qualifier is introduced in Section 3. Fig. 5(a) shows a representation example for this item. The items in this table are:

- LABEL_ID: Number(6). Is the primary key and identifies the label over which the definition is established.

- QUALIFIER: number[0, 1]. Qualifier value for the label referenced in the LABEL_ID item.

FUZZY_QUANTIFIERS_DEF

This table contains the label definitions of the quantifiers used in the query. These labels identify trapezoidal possibility distributions over the [0, 1] domain. What this domain represents is defined in Eq. (1). Thus, we have:

- LABEL_ID: number(6). Is the primary key and identifies the label over which the definition is established.

- ALPHA: number[0, 1]. $\alpha = \inf\{x : x \in \mathrm{support}(\mathit{label})\}$

- BETA: number[0, 1]. $\beta = \inf\{x : x \in \mathrm{kernel}(\mathit{label})\}$

- GAMMA: number[0, 1]. $\gamma = \sup\{x : x \in \mathrm{kernel}(\mathit{label})\}$

- DELTA: number[0, 1]. $\delta = \sup\{x : x \in \mathrm{support}(\mathit{label})\}$

(see Fig. 5(b)).
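To make the table descriptions above concrete, the following Python sketch declares them in an in-memory SQLite database. SQLite merely stands in for the host RDBMS; the mapping of NUMBER and CHARACTER to SQLite types and all CHECK clauses are assumptions added here, and the foreign-key links described in the text are left undeclared.

```python
import sqlite3

# Assumption: SQLite as a stand-in host; CHECK clauses are additions.
FMB_SCHEMA = """
CREATE TABLE FUZZY_OBJECT_LIST (
    COLUMN_ID   INTEGER NOT NULL,
    OBJECT_NAME TEXT    NOT NULL,
    OBJECT_ID   INTEGER NOT NULL,
    OBJECT_TYPE INTEGER NOT NULL CHECK (OBJECT_TYPE IN (0, 1, 2, 3)),
    PRIMARY KEY (OBJECT_ID, COLUMN_ID)
);
CREATE TABLE FUZZY_LABEL_DEF (
    LABEL_ID INTEGER PRIMARY KEY,
    ALPHA REAL NOT NULL,
    BETA  REAL NOT NULL,
    GAMMA REAL NOT NULL,
    DELTA REAL NOT NULL
);
CREATE TABLE FUZZY_APPROX_DEF (
    LABEL_ID INTEGER PRIMARY KEY,
    MARGIN   REAL NOT NULL
);
CREATE TABLE FUZZY_NEARNESS_DEF (
    OBJECT_ID1 INTEGER NOT NULL,
    OBJECT_ID2 INTEGER NOT NULL,
    DEGREE     REAL NOT NULL CHECK (DEGREE BETWEEN 0 AND 1),
    PRIMARY KEY (OBJECT_ID1, OBJECT_ID2)
);
CREATE TABLE FUZZY_QUALIFIERS_DEF (
    LABEL_ID  INTEGER PRIMARY KEY,
    QUALIFIER REAL NOT NULL CHECK (QUALIFIER BETWEEN 0 AND 1)
);
CREATE TABLE FUZZY_QUANTIFIERS_DEF (
    LABEL_ID INTEGER PRIMARY KEY,
    ALPHA REAL NOT NULL CHECK (ALPHA BETWEEN 0 AND 1),
    BETA  REAL NOT NULL CHECK (BETA  BETWEEN 0 AND 1),
    GAMMA REAL NOT NULL CHECK (GAMMA BETWEEN 0 AND 1),
    DELTA REAL NOT NULL CHECK (DELTA BETWEEN 0 AND 1)
);
"""


def create_fmb(conn: sqlite3.Connection) -> None:
    """Create the tables described in this section (illustrative only)."""
    conn.executescript(FMB_SCHEMA)


conn = sqlite3.connect(":memory:")
create_fmb(conn)
```

All of these are ordinary relations, so the definitions can be queried and maintained with the data manipulation language of the host system.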

4.3. Example of implementation of imprecise information

To illustrate the complete mechanism of the imprecise information implementation in a conventional RDBMS, we give an example of how the information related to a table involving both crisp and fuzzy data is expressed. Our example is based on a group of employees shown in Table 5.

Fig. 5. Definitions in the FUZZY_QUALIF_DEF and FUZZY_QUANT_DEF tables: (a) QUALIFIER value for a LABEL_ID; (b) trapezoidal quantifier for a LABEL_ID over [0, 1].

Table 5
EMPLOYEES

NAME          ADDRESS           AGE        PRODUCTIVITY   SALARY
Luis          Recogidas         31         Good           High
Antonio       Reyes Católicos   Middle     Fair           100 000
Juan Carlos   Camino Ronda      Young      Bad            90 000
Francisco     P.A. Alarcón      Old        Excellent      Low
Julia         Puerta Real       Young      Good           Medium
Inés          Manuel de Falla   # 28       Good           125 000
Javier        Gran Vía          * 30, 35   Fair           105 000

The items NAME and ADDRESS contain crisp information, the former being the primary key of the table; the information they contain is expressed by the data types of the initial RDBMS (in this case Character).

Fig. 6. Labels definitions on AGE and SALARY attributes: (a) YOUNG, MIDDLE and OLD over AGE; (b) LOW, MEDIUM and HIGH over SALARY (x 1000 $); (c) "approximately n" over AGE, with base from n - 5 to n + 5.

The attributes AGE and SALARY permit fuzzy information (type 2 attribute); so some labels over the AGE and SALARY domains must be defined.

Table 6
Proximity relation over PRODUCTIVITY

se(d, d)    Bad   Fair   Good   Excellent
Bad         1     0.8    0.5    0.1
Fair        0.8   1      0.7    0.5
Good        0.5   0.7    1      0.8
Excellent   0.1   0.5    0.8    1

In Table 5, the symbol # means "approximately" (defined in Fig. 6(c)) and the symbol * denotes an interval value.

The item PRODUCTIVITY also admits fuzzy information, but only over a discrete domain (type 3 attribute); so that when we make a query involving this attribute as evaluation element, we need to have previously defined the proximity relation over the elements of its domain.

Figs. 6(a), (b) and (c) show the label definitions used for AGE and SALARY, and the membership function for the "approximately" expression, respectively. The proximity relation for PRODUCTIVITY is shown in Table 6.

Once these definitions have been made, the EMPLOYEES table is implemented in the "host" RDBMS as Table 7 shows, where we have adopted for AGE, SALARY and PRODUCTIVITY the representation shown in Section 4.1 (Tables 3 and 4).

Table 7
EMPLOYEES

...   AGE_TYPE   AGE_1   AGE_2   AGE_3   AGE_4   PROD_TYPE   ...
      3          31      NULL    NULL    NULL    3
      4          4       NULL    NULL    NULL    3
      4          3       NULL    NULL    NULL    3
      4          5       NULL    NULL    NULL    3
      4          3       NULL    NULL    NULL    3
      6          23      5       -5      33      3
      5          30      0       0       35      3

...   PROD_P1   PROD_1      SAL_TYPE   SAL_1    SAL_2   SAL_3   SAL_4
      1         Good        4          2        NULL    NULL    NULL
      1         Fair        3          100000   NULL    NULL    NULL
      1         Bad         3          90000    NULL    NULL    NULL
      1         Excellent   4          0        NULL    NULL    NULL
      1         Good        4          1        NULL    NULL    NULL
      1         Good        3          125000   NULL    NULL    NULL
      1         Fair        3          105000   NULL    NULL    NULL

Table 8
FUZZY_COL

TABLE_NAME   COLUMN_NAME    COLUMN_ID   COLUMN_TYPE
Employee     Salary         0           2
Employee     Age            1           2
Employee     Productivity   2           3

Table 9
FUZZY_OBJECT_LIST

COLUMN_ID   OBJECT_NAME   OBJECT_ID   OBJECT_TYPE
0           Low           0           0
0           Medium        1           0
0           High          2           0
1           Young         3           0
1           Middle        4           0
1           Old           5           0
2           Bad           6           1
2           Fair          7           1
2           Good          8           1
2           Excellent     9           1

Table 10
FUZZY_LABEL_DEF

LABEL_ID   ALPHA    BETA     GAMMA    DELTA
0          5000     65000    85000    95000
1          85000    95000    110000   130000
2          110000   130000   180000   400000
3          0        16       30       40
4          25       35       45       55
5          40       50       65       80

Table 11
FUZZY_NEARNESS_DEF

OBJECT_ID1   OBJECT_ID2   DEGREE
6            7            0.8
6            8            0.5
6            9            0.1
7            8            0.7
7            9            0.5
8            9            0.8
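The way a value of Table 5 becomes a group of five columns in Table 7 can be sketched as follows. The type codes (3, 4, 5, 6) and the layout of the four value columns are inferred here only from the rows of Table 7, not taken from Tables 3 and 4, so they should be read as assumptions; all function names are hypothetical.

```python
from typing import Optional, Tuple

Row = Tuple[int, Optional[float], Optional[float], Optional[float], Optional[float]]

# Assumption: codes as they appear in the AGE_TYPE and SAL_TYPE columns of Table 7.
CRISP, LABEL, INTERVAL, APPROX = 3, 4, 5, 6


def encode_crisp(d: float) -> Row:
    """A crisp value occupies the first value column only."""
    return (CRISP, d, None, None, None)


def encode_label(object_id: int) -> Row:
    """A linguistic label is stored through its OBJECT_ID."""
    return (LABEL, object_id, None, None, None)


def encode_interval(n: float, m: float) -> Row:
    """Interval [n, m]: end points in columns 1 and 4, zero offsets between."""
    return (INTERVAL, n, 0, 0, m)


def encode_approx(d: float, margin: float) -> Row:
    """Approximately d: support end points plus offsets to the central point."""
    return (APPROX, d - margin, margin, -margin, d + margin)


print(encode_approx(28, 5))     # AGE of Inés in Table 5
print(encode_interval(30, 35))  # AGE of Javier in Table 5
```

Under this reading, columns 2 and 3 hold the differences beta - alpha and gamma - delta, which would explain the values 5 and -5 stored for "# 28".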

See Table 9 (FUZZY_OBJECT_LIST) for the linguistic labels corresponding to the IDs shown in the AGE_1 and SAL_1 attributes.
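The lookup from a stored identifier to its label can be sketched with two dictionaries that mirror Tables 9 and 10. The helper name `resolve_label` is hypothetical, and the dictionaries only illustrate what a join between FUZZY_OBJECT_LIST and FUZZY_LABEL_DEF would return.

```python
# OBJECT_ID -> (COLUMN_ID, OBJECT_NAME, OBJECT_TYPE), copied from Table 9.
OBJECT_LIST = {
    0: (0, "Low", 0), 1: (0, "Medium", 0), 2: (0, "High", 0),
    3: (1, "Young", 0), 4: (1, "Middle", 0), 5: (1, "Old", 0),
    6: (2, "Bad", 1), 7: (2, "Fair", 1), 8: (2, "Good", 1), 9: (2, "Excellent", 1),
}

# LABEL_ID -> (ALPHA, BETA, GAMMA, DELTA), copied from Table 10.
LABEL_DEF = {
    0: (5000, 65000, 85000, 95000),
    1: (85000, 95000, 110000, 130000),
    2: (110000, 130000, 180000, 400000),
    3: (0, 16, 30, 40),
    4: (25, 35, 45, 55),
    5: (40, 50, 65, 80),
}


def resolve_label(object_id: int):
    """Return (name, trapezoid) for an identifier stored in AGE_1 or SAL_1."""
    _column_id, name, object_type = OBJECT_LIST[object_id]
    if object_type != 0:
        # OBJECT_TYPE 0 marks trapezoidal labels; other types live elsewhere.
        raise ValueError(f"object {object_id} is not a trapezoidal label")
    return name, LABEL_DEF[object_id]


print(resolve_label(2))  # SAL_1 = 2 in the first row of Table 7
```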

Tables 8-12 show how the information in the FMB tables is organized.

Table 12
FUZZY_APPROX_DEF

COLUMN_ID   MARGIN
1           5

5. Conclusions

5.1. Summary

This paper is part of a line of work that aims to provide the elements needed for the development of a FRDBMS. To this end, we introduced in [16] a theoretical model of FRDB, whose basic aspects are described in Section 2. The second part of this line of work centres on constructing a FRDBMS that puts the ideas of the theoretical model into practice. Since this model can be seen as an extension of the classical relational model capable of representing and handling imprecise information, we ask whether it is also possible to start from a conventional RDBMS and build on it extensions that operate on this kind of information. Specifically, this paper proposes a series of approaches for:

• Representing fuzzy information in the framework of databases.

• Implementing this representation through the mechanisms available in a conventional RDBMS.


These approaches pursue the following general objectives.

1. To start from the theoretical scheme proposed in GEFRED [16]. Nevertheless, some of the adopted criteria can be used with other theoretical approaches to Fuzzy Relational Databases.

2. To provide representation for an extensive range of imprecise information.

3. To base our approach on the mechanisms provided by conventional RDBMS when deciding how to represent this information.

4. To select those approaches which favour efficiency in the treatment of imprecise information in the framework of these systems.

Accordingly, Section 4 adopts a representation of fuzzy information (data and operators) that seeks simplicity without losing representation capacity. This meets the previous objectives and gives particular weight to the last three. In fact, as shown in Section 4, the adopted representation facilitates implementation in a conventional RDBMS. Besides, the form adopted for the operators leads to elementary implementations that are markedly more efficient. Specifically, Fig. 3 shows how the main fuzzy comparison operations can be computed geometrically. A more complex representation would have led to more complicated implementations which, besides adding nothing significant to the treatment of fuzzy information, would have substantially reduced the efficiency of this treatment.
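To suggest why a trapezoidal representation leads to elementary computations, the sketch below evaluates the standard sup-min possibility of two trapezoidal distributions in closed form. Fig. 3 is not reproduced in this section, so this is not claimed to be the exact construction of the paper; the function name `possibility` is hypothetical and each trapezoid is assumed to satisfy alpha <= beta <= gamma <= delta.

```python
from typing import Tuple

Trap = Tuple[float, float, float, float]  # (alpha, beta, gamma, delta)


def possibility(a: Trap, b: Trap) -> float:
    """sup over x of min(mu_a(x), mu_b(x)) for two trapezoids."""
    # Order the operands so that `left` has the smaller kernel start.
    left, right = (a, b) if a[1] <= b[1] else (b, a)
    _, _, l_gamma, l_delta = left
    r_alpha, r_beta, _, _ = right
    if r_beta <= l_gamma:   # kernels overlap: some x has degree 1 in both
        return 1.0
    if l_delta <= r_alpha:  # supports do not overlap
        return 0.0
    # Height at which the falling edge of `left` crosses the rising edge of `right`.
    return (l_delta - r_alpha) / ((l_delta - l_gamma) + (r_beta - r_alpha))


young, middle, old = (0, 16, 30, 40), (25, 35, 45, 55), (40, 50, 65, 80)
print(possibility(young, middle), possibility(young, old))
```

Only a few comparisons and one division are needed, and a crisp value d can be compared in the same way by writing it as (d, d, d, d).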

On the other hand, Section 4 presents a proposal for implementing imprecise information in a RDBMS that satisfies these objectives and is based on the following characteristics.

• To implement the imprecise information, the data structures available in any RDBMS (domains, attributes, relations and system catalogue) are used. The mechanism used to extend the representation capacity of a conventional RDBMS is thus fully integrated in the system.

• It establishes an implementation scheme under which the fuzzy manipulation operations can obtain satisfactory results with a high degree of efficiency. Besides, it facilitates the development of the routines that will resolve these operations from the stored information.

In relation to the last point, it is essential to note that this work does not address the implementation of the fuzzy algebra introduced in GEFRED. Although the representation adopted for the comparison operators and for the imprecise information has been shown, the organization needed to develop this algebra is left to later works.

5.2. Research course in the future

In relation to what has been stated above, we are working on the development of an implementation of the algebra introduced in GEFRED, based on the implementation scheme introduced in this paper. This study also includes the formulation of a fuzzy extension of SQL capable of expressing fuzzy manipulation operations. With the adopted representation, it is possible to build a processor for such a syntax, capable of translating fuzzy sentences into classical ones that can be processed in a conventional RDBMS.
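As a purely hypothetical illustration of such a translation, the sketch below turns the condition "the crisp value matches a trapezoidal label with degree at least a threshold" into a classical range predicate. It covers only values stored as crisp; the type code 3 is read from Table 7, the column naming follows Table 7, SQLite stands in for the host RDBMS, and the function name is an assumption, not the syntax of the planned SQL extension.

```python
import sqlite3
from typing import Tuple

CRISP_TYPE = 3  # assumption: type code of crisp values, as read from Table 7


def label_cut_to_sql(prefix: str, trapezoid: Tuple[float, float, float, float],
                     threshold: float) -> Tuple[str, Tuple[float, float]]:
    """Classical WHERE fragment for 'degree of match >= threshold'."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    alpha, beta, gamma, delta = trapezoid
    low = alpha + threshold * (beta - alpha)    # rising edge reaches threshold
    high = delta - threshold * (delta - gamma)  # falling edge reaches threshold
    sql = f"{prefix}_TYPE = {CRISP_TYPE} AND {prefix}_1 BETWEEN ? AND ?"
    return sql, (low, high)


# SAL_TYPE and SAL_1 columns of Table 7, with names taken from Table 5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT, SAL_TYPE INTEGER, SAL_1 REAL)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?)", [
    ("Luis", 4, 2), ("Antonio", 3, 100000), ("Juan Carlos", 3, 90000),
    ("Francisco", 4, 0), ("Julia", 4, 1), ("Inés", 3, 125000),
    ("Javier", 3, 105000),
])

high_label = (110000, 130000, 180000, 400000)  # LABEL_ID 2 in Table 10
where, params = label_cut_to_sql("SAL", high_label, 0.5)
rows = conn.execute(
    f"SELECT NAME FROM EMPLOYEES WHERE {where} ORDER BY NAME", params).fetchall()
print(where, params, rows)
```

Rows whose salary is itself a label or another fuzzy value are excluded by the type test; comparing them would need a fuzzy comparison between distributions rather than a range predicate.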

With the theoretical model for fuzzy relational databases, the adopted representation for the imprecise information and the extended syntax for SQL, we will be able to develop a prototype that implements a FRDBMS based on a conventional RDBMS. With such a prototype we will evaluate characteristics such as the efficiency of the model and representation, the capabilities for handling imprecise information and the execution speed of operations involving fuzzy elements.

References

[1] M. Anvari and G.F. Rose, Fuzzy relational databases, in: Bezdek, Ed., Analysis of Fuzzy Information, Vol. II (CRC Press, Boca Raton, FL, 1987).

[2] J.F. Baldwin, FRIL-A fuzzy relational inference language, Fuzzy Sets and Systems 14 (1984) 155-174.

[3] P. Bosc, M. Galibourg and G. Hamon, Fuzzy querying with SQL: extensions and implementation aspects, Fuzzy Sets and Systems 28 (1988) 333-349.

[4] B.P. Buckles and F.E. Petry, A fuzzy representation of data for relational databases, Fuzzy Sets and Systems 7 (1982) 213-226.

[5] B.P. Buckles and F.E. Petry, Extending the fuzzy database with fuzzy numbers, Inform. Sci. 34 (1984) 145-155.

[6] B.P. Buckles, F.E. Petry and H.S. Sachar, A domain calculus for fuzzy relational databases, Fuzzy Sets and Systems 29 (1989) 327-340.

[7] E.F. Codd, A Relational model of data for large shared data banks, Commun. ACM 13 (1970) 377-387.

[8] J.C. Cubero, J.M. Medina and M.A. Vila, Influences of granularity level in fuzzy functional dependencies, in: Symbolic and Quantitative Approaches to Reasoning and Uncertainty, Lecture Notes in Computer Science, Vol. 747 (Springer, Berlin, 1993) 73-78.

[9] J.C. Cubero, O. Pons and M.A. Vila, Weak and strong resemblances and fuzzy functional dependencies, IEEE '94 Internat. Conf., FL, to appear.

[10] J.C. Cubero and M.A. Vila, A new definition of fuzzy functional dependencies in fuzzy relational databases, Internat. J. Intelligent Systems 9 (1994) 441-448.

[11] D. Dubois and H. Prade, Possibility Theory. An Approach to Computerized Processing of Uncertainty (Plenum Press, New York, 1988).

[12] J. Kacprzyk and A. Ziolkowski, Database queries with fuzzy linguistic quantifiers, IEEE Trans. Systems Man Cybernet. SMC-16 (1986).

[13] Li D. Liu, A Fuzzy Prolog Database System (Wiley, New York, 1990).

[14] J.M. Medina, Bases de Datos Relacionales Difusas: Modelo Teórico y Aspectos de su Implementación, Ph.D. Thesis, University of Granada (1994).

[15] J.M. Medina, O. Pons and M.A. Vila, GEFRED. A generalized model of fuzzy relational databases, Inform. Sci. 76 (1994) 87-109.

[16] J.M. Medina and M.A. Vila, Un Modelo de Bases de Datos Aplicado a Información Médica, 1er Congreso Español sobre Tecnologías y Lógica Fuzzy, Granada (1991).

[17] J.M. Medina and M.A. Vila, Un Modelo de Representación del Conocimiento para Bases de Datos Imprecisas, AEPIA '91, Madrid (1991).

[18] Oracle RDBMS, SQL Language Reference Manual ver. 6.0 (1990).

[19] O. Pons, J.M. Medina and M.A. Vila, Incomplete information in the framework of logic fuzzy databases, Internat. J. Approximate Reasoning, submitted.

[20] H. Prade and C. Testemale, Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries, Inform. Sci. 34 (1984) 115-143.

[21] H. Prade and C. Testemale, Representation of soft constraints and fuzzy attribute values by means of possibility distributions in databases, in: Bezdek, Ed., Analysis of Fuzzy Information, Vol. II (CRC Press, Boca Raton, FL, 1987).

[22] K. Raju and A. Majumdar, The study of joins in fuzzy relational databases, Fuzzy Sets and Systems 21 (1987) 19-34.

[23] K. Raju and A. Majumdar, Fuzzy functional dependencies and lossless join decomposition of fuzzy relational databases system, ACM TODS, Vol. 13 (1988) 129-166.

[24] E.A. Rundensteiner, L.W. Hawkes and W. Bandler, On nearness measures in fuzzy relational data models, Internat. J. Approximate Reasoning 3 (1989) 267-298.

[25] S. Shenoi and A. Melton, Proximity relations in the fuzzy relational database model, Fuzzy Sets and Systems 31 (1989) 285-296.

[26] S. Shenoi and A. Melton, An extended version of the fuzzy relational database model, Inform. Sci. 52 (1990) 35-52.

[27] M. Umano, Freedom-0: A fuzzy database system, in: Gupta-Sanchez, Ed., Fuzzy Information and Decision Processes (North-Holland, Amsterdam, 1982).

[28] M. Umano and S. Fukami, Perspectives of fuzzy databases, Japanese J. Fuzzy Theory and Systems 3 (1991) 75-91.

[29] M. Umano, M. Mizumoto and Tanaka, FSTDS system: a fuzzy-set manipulation system, Inform. Sci. 14 (1978) 115-159.

[30] M.A. Vila, et al., A logic approach to fuzzy relational databases, Internat. J. Intelligent Systems 9 (1994) 449-461.

[31] M.A. Vila, J.C. Cubero, J.M. Medina and O. Pons, Logic and fuzzy relational databases: a new language and a new definition, in: P. Bosc and Kacprzyk, Eds., Fuzzy Sets and Possibility Theory in Database Management Systems (Physica Verlag, 1993).

[32] M.A. Vila, J.C. Cubero, J.M. Medina and O. Pons, On the use of a logical definition of fuzzy relational databases, 2nd IEEE Internat. Conf. on Fuzzy Systems, San Francisco (1993) 489-499.

[33] M.A. Vila, J.C. Cubero, J.M. Medina and O. Pons, The generalized selection: an alternative way for the quotient operations in fuzzy relational databases, in: International Conf. on Information Systems, IPMU '94, to appear.

[34] L.A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sci. 3 (1971) 177-200.

[35] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3-28.

[36] L.A. Zadeh, PRUF-A meaning representation language for natural languages, Internat. J. Man-Machine Stud. 10 (1978) 395-460.

[37] M. Zemankova and A. Kandel, Fuzzy Relational Data Bases: A Key to Expert Systems (Verlag TUV Rheinland, 1984).
