
Data & Knowledge Engineering 4 (1989) 267-285, North-Holland

Design and simulation of a parallel inference machine architecture for rule based systems*

Anupam BASU, Tapas K. NAYAK and Sarit MUKHERJEE, Dept. of Computer Science & Engineering, Indian Institute of Technology, Kharagpur-721302, India

Abstract. This paper presents an architecture of the inference machine for a rule based expert system. The paper, structured around the concept of "inference flow graphs", is aimed at incorporating parallelism in antecedent matching to find the firable rules, as well as firing more than one rule simultaneously whenever required. Through this architecture, the number of comparisons required during the antecedent matching phase is significantly reduced. The flow of inferencing can also proceed in a pipelined manner, resulting in faster inferences.

Keywords. Rulebased system, Inference machine, Antecedent matching, Consequent evaluation.

1. Introduction

Expert systems are being increasingly used in problem domains where the solutions are not direct enough and require sufficient expert knowledge and heuristics. The key issues involved in the development of an expert system include the knowledge representation scheme and the inference machine architecture. Of the different knowledge representation methods suggested [3, 4, 7, 11, 15], the production rules have been popular and are being widely used [13, 14].

An inference machine, applied over a rule based system, cycles through the following states: (a) finding the rules whose antecedents match the current state of the database; (b) resolving the conflict and finding the rules to be fired, in case there is more than one enabled rule, and (c) firing the rule. The major bottleneck in production systems stems from the state of finding the rules which match the state of the database at any instant. Several pattern matching algorithms have been suggested to alleviate the problem [16]. A number of architectures have been proposed for production systems, to make the inference process faster. The Rete algorithm [8] represents the set of rules in the form of a binary tree and exploits parallelism at the node level as well as at the production level. Studies have been carried out on the architectures utilising AND-OR parallelism [5, 10]. Data flow and data driven architectures have also been proposed for the purpose of faster inferencing [12].

This paper proposes a forward reasoning inference machine architecture (INFLOW), for rule based systems and aims at faster pattern matching and parallel firing of more than one rule whenever possible.

The architecture has been based on the concept of "inference flow graph" and its corresponding mapping onto a structure of processing elements. The schematic diagram of the different modules of an expert system built around INFLOW is shown in Fig. 1.

The Long Term Database (LDB) contains the various procedures and information which are not expected to be changed frequently. The Short Term Database (SDB) stores the facts

0169-023X/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)


Fig. 1. [Figure: schematic of the expert system modules built around INFLOW — the user interface, the SDB, the inference machine with its Active Rule Block (AM and CM), the rulebase and the LDB.]

relevant for the current problem that the expert system handles. This information is stored through the instantiation of the different slots in various frames. While inferencing, the slot instantiations are modified as rules are fired by the inference machine.

The Rulebase contains the rules, based on which inferences are taken by INFLOW. Depending on the functional characteristics of the rules [2], the entire rulebase is partitioned into a number of ruleblocks. A rulegraph generator translates the set of rules in a ruleblock into an inference flow graph. Hence the rulebase essentially contains a set of inference flow graphs.

The Active Rule Block (ARB) stores the inference flow graph corresponding to the ruleblock, which is being currently used. In ARB, the graph is stored in two distinct partitions. The Antecedent Memory (AM) holds the antecedent nodes and the Consequent Memory (CM) holds the consequent nodes of the rulegraph.

The user interface module is initiated by the inference machine and performs the necessary interactions with the users.

In the following sections, we have concentrated only on the architecture of the inference machine. It is felt that this architecture will be particularly useful for expert systems applied to synthesis problems, where there are many parallel tasks to be accomplished to achieve the end goal. For this reason, INFLOW has been applied to CONEX, an expert system for control system design [2], and the improvement in performance has been shown.

Although the concept of INFLOW has some similarities to the idea of dataflow computers [6], the former includes the necessary interfacing with the "short term database" for storing the history of the deduction path traversed. This feature is essential for expert systems in order to enable them to perform backtracking and explanation generation.

The salient features of INFLOW include its capabilities of avoiding unnecessary evaluation of antecedents, of firing more than one rule if required, and of incorporating pipelined computation. Thus INFLOW offers computational parallelism while reducing the pattern matching effort needed to find the triggered rules.


2. Formulation of the architecture

2.1. Definitions and notations

In this subsection a few definitions and notations, related to production rules, are presented. These notations have been used frequently in subsequent discussions.

Let a rulebase be denoted by Rb. A rulebase consists of production rules, each being of the form

IF (antecedent field) THEN (consequent).

Hence, a rule rk ∈ Rb can be denoted by an ordered tuple (Ak, ck), where Ak denotes the antecedent field of the rule rk and ck denotes the consequent of rk.

The firing of a rule generates new facts or modifies the existing ones. The facts are generated by instantiation of different object-attribute pairs to different values. An object-attribute pair has been denoted by Oi and has been represented in the SDB (Fig. 1) as a doublet (f, s), where f is the identification of a frame representing an object and s is the identification of a slot denoting an attribute of f.

Example. Let MARY be an object represented by the frame MARY. Let AGE be a slot in the frame MARY. The instantiation of the object-attribute pair (MARY, AGE) to the value 20 represents the fact that "The AGE of MARY is 20".
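As a concrete illustration, this frame-and-slot view of facts can be sketched in Python (a hypothetical encoding; the dictionary layout and the names `instantiate` and `lookup` are ours, not the paper's):

```python
# Hypothetical sketch of the Short Term Database (SDB): frames represent
# objects, slots represent attributes, and a fact is the instantiation of
# an object-attribute pair (frame, slot) to a value.
sdb = {}

def instantiate(frame, slot, value):
    """Instantiate the object-attribute pair (frame, slot) to a value."""
    sdb.setdefault(frame, {})[slot] = value

def lookup(frame, slot):
    """Return the current value of the object-attribute pair, if any."""
    return sdb.get(frame, {}).get(slot)

instantiate("MARY", "AGE", 20)   # the fact "The AGE of MARY is 20"
print(lookup("MARY", "AGE"))     # 20
```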

Definition 2.1.1. An antecedent field Ak of a rule rk is a conjunction of individual antecedents ai, where each antecedent is a triplet consisting of a predicate symbol Pi, an object-attribute pair Oi and a value vi.

Thus Ak = {ai : i ∈ 1 .. imax}, where imax is the maximum number of antecedents in the antecedent field Ak, and ai = (Pi, Oi, vi).

For some predicate symbols, the value may be omitted in the antecedent construct.

Example. Let us consider a rule having the following antecedent field: IS (MARY(AGE), 20) AND KNOWN (MARY(ADDRESS)). In this case, the antecedent field A1 = {a1, a2}, where a1 = (IS, (MARY, AGE), 20) and a2 = (KNOWN, (MARY, ADDRESS)).

It may be noted that the first antecedent a1 consists of the predicate symbol IS, the object-attribute pair (MARY, AGE) and the value 20, but the second antecedent a2 consists of only the predicate symbol KNOWN and the object-attribute pair (MARY, ADDRESS).
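Definition 2.1.1 and the example above can be rendered as a small sketch (an assumed encoding; the `Antecedent` tuple and the `evaluate` helper are illustrative only, with just the two predicates shown here):

```python
from collections import namedtuple

# Hypothetical encoding of Definition 2.1.1: an antecedent is a triplet
# (predicate symbol, object-attribute pair, value); the value slot may be
# None for predicates such as KNOWN, which take no value.
Antecedent = namedtuple("Antecedent", ["pred", "oa", "value"])

a1 = Antecedent("IS", ("MARY", "AGE"), 20)
a2 = Antecedent("KNOWN", ("MARY", "ADDRESS"), None)
A1 = [a1, a2]  # an antecedent field is a conjunction of antecedents

def evaluate(antecedent, sdb):
    """Evaluate one antecedent against the SDB (illustrative predicates)."""
    frame, slot = antecedent.oa
    current = sdb.get(frame, {}).get(slot)
    if antecedent.pred == "IS":
        return current == antecedent.value
    if antecedent.pred == "KNOWN":
        return current is not None
    raise ValueError("unknown predicate")

sdb = {"MARY": {"AGE": 20, "ADDRESS": "Kharagpur"}}
print(all(evaluate(a, sdb) for a in A1))  # True: every antecedent holds
```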

The consequent ck of a rule rk specifies the action to be taken when the rule is fired. The action may be indicated by a consequent operator which operates on a number of arguments. For example, a consequent operator CALL may specify some external program to be executed; in that case the identification of the program is the argument of the operator CALL. As the action specified by the consequent operator is executed, some object-attribute pairs are instantiated. From the viewpoint of the SDB, the values resulting from the action propagate to some object-attribute pairs to instantiate them to new values. Thus new facts are generated.

Definition 2.1.2. A consequent, ck, is a tuple consisting of a consequent operator, opk, an argument list, argk, and a destination list, dk, which is a set of destination object-attribute pairs Ok. The consequent operator indicates the action to be taken over the arguments, and the value generated as a result of the action updates the destination object-attribute pairs.


Thus ck = (opk, argk, dk) where dk = {Ok} = {(fk, sk)}.

For some consequent operators the destination field may be empty if the operator does not produce any value.

Example. Let us consider the following two consequents:

(1) SET (ADULT, MARY (CLASS))
(2) DISPLAY (MARY (AGE))

In the first example c1 = (SET, ADULT, {(MARY, CLASS)})

where op1 = SET, arg1 = ADULT and d1 = {(MARY, CLASS)}.

In the second example c2 = (DISPLAY, (MARY, AGE))

where op2 = DISPLAY and arg2 = (MARY, AGE).

In c2 the destination field is empty; it is also evident that the argument list may contain values as well as object-attribute pairs.

Definition 2.1.3. An antecedent cluster, AC, is a set consisting of all the antecedents belonging to all the rules in the rulebase.

Thus AC = {ai : ∃k, rk ∈ Rb and ai ∈ Ak}.

Definition 2.1.4. A consequent cluster, CC, is a set of all ordered tuples (rk, ck), where ck is the consequent of the rule rk.

Thus CC = {(rk, ck) : ck is the consequent of the rule rk and rk ∈ Rb}.

Definition 2.1.5. An antecedent ai is said to be activated at some instant if the object-attribute pair Oi associated with ai is modified at that instant.

Example. Let us consider the following three rules.

rm: IF GT(SOMEBODY(AGE), 20) THEN SET(ADULT, SOMEBODY(CLASS))

rn: IF IS(SOMEBODY(CLASS), ADULT) THEN SET(MATURED, SOMEBODY(MIND))

rk: IF IS(SOMEBODY(CLASS), CHILD) THEN SET(IMMATURED, SOMEBODY(MIND)).

Here rm = (Am, cm), rn = (An, cn) and rk = (Ak, ck)


where

Am = {a1} = {(GT, (SOMEBODY, AGE), 20)}
cm = (SET, ADULT, {(SOMEBODY, CLASS)})
An = {a2} = {(IS, (SOMEBODY, CLASS), ADULT)}
cn = (SET, MATURED, {(SOMEBODY, MIND)})
Ak = {a3} = {(IS, (SOMEBODY, CLASS), CHILD)}
ck = (SET, IMMATURED, {(SOMEBODY, MIND)}).

In this case, if rm is fired and cm is executed, then the object-attribute pair (SOMEBODY, CLASS) is updated and hence An and Ak are activated.

It may be noted from the above example that, as the rule rm fires, both a2 and a3 are activated. But, since the object-attribute pair (SOMEBODY, CLASS) is instantiated to the value ADULT, only a2 will be evaluated to be true. Thus, only a subset of the activated antecedents may be evaluated to be true, while others are evaluated to be false.

Definition 2.1.6. A rule rk is said to be enabled when all the antecedents ai ∈ Ak are evaluated to be true.

In the previous example, only rule rn is enabled after the rule rm is fired.

Definition 2.1.7. A conflict set, CS, of rules can be defined as a set of enabled rules, such that the antecedent field of one subsumes the antecedent fields of the rest.

As a special case, when only one rule is enabled at any point of time, the conflict set reduces to a singleton with the enabled rule as the only member.

Example. Let us consider two rules r1 and r2 where

r1: IF X AND Y AND Z THEN C1
r2: IF X AND Y THEN C2.

Here (X AND Y AND Z) is A1 and (X AND Y) is A2. In this case CS = {r1, r2} forms a conflict set. Again, let us consider another rule r3 along with r1 and r2, where

r3: IF X AND Y THEN C3.

Here {r1, r2, r3} forms a conflict set.
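Under the assumption that each antecedent field is encoded as a set of antecedent identifiers, Definition 2.1.7 can be sketched as follows (the representation and the function name are ours, not the paper's):

```python
# Sketch of conflict set formation: enabled rules fall in one conflict set
# when the antecedent field of one rule contains (subsumes) those of the
# rest. Antecedent fields are modeled as frozensets of antecedent ids.
def conflict_set(enabled):
    """Group enabled rules {name: antecedent set} into the conflict set
    built around the rule with the largest (subsuming) antecedent field."""
    widest = max(enabled.values(), key=len)
    return {r for r, ants in enabled.items() if ants <= widest}

enabled = {
    "r1": frozenset({"X", "Y", "Z"}),
    "r2": frozenset({"X", "Y"}),
    "r3": frozenset({"X", "Y"}),
}
print(sorted(conflict_set(enabled)))  # ['r1', 'r2', 'r3']
```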

2.2. A conceptual framework

The set of rules in the rulebase can be represented as a directed bipartite graph, where the antecedent cluster and the consequent cluster are the two partitions.

The architecture of the proposed inference machine is structured around the concept of an inference flow graph (Fig. 2), which is now being described.


Definition 2.2.1. An antecedent node, nai, represents an antecedent ai ∈ AC.

As mentioned in the previous subsection, only a subset of the activated antecedents may be evaluated to be true. Hence, a truth value node may be associated with each antecedent ai. Such truth value nodes will receive a "true" token if the corresponding antecedent is evaluated to be true. Otherwise they will receive a "false" token.

Definition 2.2.2. A truth value node, nti, represents a node for storing the truth value associated with an antecedent ai ∈ AC.

Definition 2.2.3. A consequent node, ncj, represents a tuple (rj, cj) ∈ CC.

Definition 2.2.4. An inference flow graph, I, is defined as a tuple (N, E) where N represents the set of nodes and E represents the set of edges, defined as follows:

(i) N = Na ∪ Nt ∪ Nc, where Na = {nai : ai ∈ AC}, Nt = {nti} and Nc = {ncj};

(ii) E = E1 ∪ E2, where E1 = {(nai, nti) : nai ∈ Na and nti ∈ Nt} and E2 is a subset of {Nt × Nc}.

Definition 2.2.5. The enabling count, ec(ncj), of a consequent node is the in-degree of the node ncj.
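Definitions 2.2.4 and 2.2.5 can be illustrated with a minimal sketch that stores the edge sets E1 and E2 as lists of pairs; the enabling count of a consequent node is then simply its in-degree under E2 (the variable and function names are ours):

```python
# Minimal sketch of an inference flow graph: E1 connects antecedent nodes
# to their truth value nodes, E2 connects truth value nodes to consequent
# nodes. A consequent node's enabling count is its in-degree under E2.
E1 = [("na1", "nt1"), ("na2", "nt2")]
E2 = [("nt1", "nc1"), ("nt2", "nc1")]

def enabling_count(ncj, e2):
    """In-degree of consequent node ncj in the edge set e2."""
    return sum(1 for (_, c) in e2 if c == ncj)

print(enabling_count("nc1", E2))  # 2: nc1 needs two true tokens to fire
```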

2.3. Operations on the graph

In Fig. 2, the antecedent nodes have been shown by solid circles, the truth value nodes have been shown by void circles and the consequent nodes have been shown by oblongs.

Conceptual processors (shown by dotted rectangles) have been associated with each node and distributed in three levels to perform dedicated operations over the nodes.

Let the sets of processors at level 1, level 2 and level 3 be designated by PE1, PE2 and PE3, respectively, such that PE1i denotes the processor attached to the ith node at level 1. Following these notations, the functions of each processor are described below.

{Initially, for all i, j, "ncj has not seen a true token from nti". Any node "sees" a true token when a truth value "true" arrives at it.}

PE1i: repeat
  (i) if nai is activated
    then begin
      (ii) evaluate nai;
      {the predicate Pi of ai is operated on its arguments Oi and vi to obtain the truth value, which is either a "true" or a "false" token}
      (iii) send the truth token to nti;
    end
  forever.


Fig. 2. [Figure: an inference flow graph — antecedent nodes (level 1, PE1), truth value nodes (level 2, PE2) and consequent nodes (level 3, PE3).]

PE2i: repeat
  (i) if nti contains a "true" token
    then begin
      (ii) form the set Ci = {ncj : ncj is adjacent to nti};
      (iii) for each ncj ∈ Ci, if "ncj has not seen a true token from nti" then
        begin
          (iv) decrement ec(ncj);
          (v) assert "ncj has seen a true token from nti";
        end
      (vi) consume the true token of nti;
    end
    else begin
      (vii) for each ncj ∈ Ci do
        begin
          if "ncj has seen a true token from nti" then
            begin
              (viii) increment ec(ncj);
              assert "ncj has not seen a true token from nti";
            end
        end
      (ix) consume the token of nti;
    end
  forever.


PE3j: repeat
  (i) if ec(ncj) = 0
    then begin
      (ii) fire the consequent cj;
      (iii) set ec(ncj) to the original value of ec(ncj);
      (iv) for each nti such that (nti, ncj) ∈ E2 do
        assert "ncj has not seen a true token from nti";
    end
  forever.

Thus each processor in level 3 (PE3) fires a single consequent, generating one or more results. These results activate zero or more antecedent nodes. Since more than one processor in the PE3 array can work simultaneously, a number of antecedent nodes can be activated simultaneously, and since the activated antecedents are handled by the PE1 processor array, they can also be evaluated simultaneously. Also, while PE1 and PE2 are busy with the evaluation of antecedents, PE3 can evaluate the consequents of already enabled rules. Thus the inference flow graph allows the processing of a stream of facts in a pipelined manner, and at each level the individual packets can be treated in parallel on their arrival at the nodes.
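The three-level flow described above can be mimicked by a single-threaded sketch (purely illustrative, not the authors' implementation; for brevity the truth value nodes are folded into the antecedent nodes, and all names are ours):

```python
# Illustrative single-cycle simulation of the three conceptual levels:
# level 1 evaluates activated antecedents, level 2 decrements enabling
# counts on true tokens, level 3 reports consequent nodes whose enabling
# count has reached zero (i.e. enabled rules ready to fire).
def run_cycle(activated, antecedents, e2, ec):
    fired = []
    for na in activated:              # level 1: evaluate each antecedent
        t = antecedents[na]()         # predicate returns True or False
        for (src, nc) in e2:          # level 2: route the truth token
            if src == na and t:
                ec[nc] -= 1
    for nc, count in ec.items():      # level 3: collect enabled consequents
        if count == 0:
            fired.append(nc)
    return fired

antecedents = {"na1": lambda: True, "na2": lambda: True}
e2 = [("na1", "nc1"), ("na2", "nc1")]
print(run_cycle(["na1", "na2"], antecedents, e2, {"nc1": 2}))  # ['nc1']
```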

3. The architecture

Although the concept of the inference flow graph resembles that of data flow graphs, a major difference exists between the architectural requirements of the two. Unlike a data flow architecture, updating the database becomes a necessary feature of an inference machine for expert systems. This is necessary for storing the history of inferencing, required for the purpose of explanation generation as well as for enabling the inference machine to backtrack whenever necessary.

3.1. Representation of the database

The Short Term Database, SDB, is represented by a set F consisting of frames. Each frame f ∈ F represents an object and consists of triplets (s, v, gr) where s ∈ S (the set of attributes or slots), v is a value of the attribute s and gr is the set of antecedent identifications such that gr = {id(ai) : ai = (Pi, Oi, vi) and Oi = (f, s)}.

Here id(ai) is a unique identification attached to ai.

Example. Referring to the three rules cited in the previous example, the frame SOMEBODY of SDB can be {(AGE, 35, g1), (CLASS, ADULT, g2)} where

g1 = {id(a1)} = {id((GT, (SOMEBODY, AGE), 20))}

and

g2 = {id(a2), id(a3)} = {id((IS, (SOMEBODY, CLASS), ADULT)), id((IS, (SOMEBODY, CLASS), CHILD))}.

It may however be noted that such an instantiation of the SOMEBODY frame enables the second rule rn, but does not enable the other two rules.
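A minimal sketch of this representation, assuming Python dictionaries (the slot layout and the `update_slot` helper are hypothetical):

```python
# Hypothetical sketch of Section 3.1: each slot in a frame carries, besides
# its value, the set of antecedent identifications (gr) that become
# activated whenever that slot is modified.
sdb = {
    "SOMEBODY": {
        "AGE":   (35, {"a1"}),            # g1 = {id(a1)}
        "CLASS": ("ADULT", {"a2", "a3"})  # g2 = {id(a2), id(a3)}
    }
}

def update_slot(frame, slot, value):
    """Update a slot and return the ids of the activated antecedents."""
    _, gr = sdb[frame][slot]
    sdb[frame][slot] = (value, gr)
    return gr

print(sorted(update_slot("SOMEBODY", "CLASS", "ADULT")))  # ['a2', 'a3']
```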

3.2. The inference machine architecture

A schematic diagram of the inference machine architecture is shown in Fig. 3.


Fig. 3. [Figure: schematic of the inference machine — the memory elements AAB, ATB, TVB, CPB, RPB and the SDB (dotted rectangles) connected through the processors AFU, S1, PESET1, ASU, S2, PESET2 and DBU (solid rectangles).]

In this diagram, the different processors have been shown by rectangles with continuous line and the different memory elements are shown by dotted rectangles. PESET1 and PESET2 represent the two sets of processing element arrays. The three levels of processing elements shown in the inference flow graph correspond to PESET1, ASU and PESET2 respectively in Fig. 3.

3.2.1. The memory elements

AM: The Antecedent Memory, AM, stores all the antecedent nodes in the inference flow graph corresponding to the rulebase. Each entry in AM consists of the tuple (nai, g'(nai)) where g'(nai) represents the list of ncj's adjacent to nti.

CM: The Consequent Memory, CM, stores all the consequent nodes in the inference flow graph. Each entry in CM consists of a tuple (opj, argj, dj, ec(ncj), m), where opj is a consequent operator, argj is the associated argument list, dj is the set of destinations of the result, ec(ncj) is the enabling count of the consequent node ncj and m is a duplicate of the original value of ec(ncj).

The rulebase is transformed into the corresponding inference flow graph by means of a 'rule to graph' translator. The antecedent nodes and the consequent nodes of the generated graph are separately stored into the antecedent memory and the consequent memory respectively.

AAB: The Active Antecedent Buffer, AAB, stores the tokens corresponding to the active antecedents; each token AAi being a tuple (vi, gi) where vi is the value of the updated slot(s) resulting in the activation of the antecedent and gi is the list of antecedents activated by the updating of the slot in the Short Term Database (SDB).


ATB: The Antecedent Token Buffer, ATB, stores the packets ATi, such that ATi is a tuple (Pi, val-attri, vi, nti). Here Pi is the predicate associated with the antecedent ai, val-attri is the current value (in the SDB) of the object-attribute pair Oi associated with ai ∈ AC, vi is the value associated with ai and nti ∈ Nt such that (nai, nti) ∈ E1, where nai is the antecedent node corresponding to ai.

TVB: The Truth Value Buffer, TVB, stores the packets TVi such that TVi is a tuple (t, destni, nti) where t ∈ {TRUE, FALSE} and destni = {ncj : (nti, ncj) ∈ E2}.

CPB: The Consequent Packet Buffer, CPB, is a temporary store of the consequent packets associated with the rules which have been fired. Each consequent packet, CPi, is a set of tuples (opi, argi, di), corresponding to the conflict sets.

RPB: The Result Packet Buffer, RPB, is a temporary store of the result packets, RPi, generated by the execution of the consequent operators. These packets flow to the short term database and update the latter. Each RPi is a tuple (vi, Oi) where vi is a value and Oi is an object-attribute pair to be updated in the SDB.

3.2.2. The processors

The operations of the different processors on the memory elements described above are enumerated below.

S1: The processor S1 is a scheduler which extracts the packets from ATB and schedules them to the free processors available in the PESET1 array. The functions of S1 can be described as:

repeat
  if ATB is not empty
    then begin
      select a packet ATi from ATB;
      find a free processor PE1j;
      if a free processor is found
        then schedule the packet to the free processor;
    end
  forever.
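The S1 scheduler can be sketched with a queue standing in for ATB (an assumed, simplified model; the buffer and processor names follow the text but the code itself is ours):

```python
import queue

# Sketch of the S1 scheduler: packets are drawn from the ATB buffer and
# handed to any free processing element in the PESET1 array.
atb = queue.Queue()
free_pes = ["PE11", "PE12"]   # processors currently free
assignments = []              # (processor, packet) pairs scheduled so far

atb.put(("IS", "CLOUDY", "CLOUDY", "nt1"))
atb.put(("IS", "JULY", "JULY", "nt2"))

while not atb.empty() and free_pes:
    packet = atb.get()
    pe = free_pes.pop(0)      # find a free processor
    assignments.append((pe, packet))

print([pe for pe, _ in assignments])  # ['PE11', 'PE12']
```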

PE1j: The processing element array PESET1 consists of several processing elements PE1j. When S1 schedules a packet to a free PE1j, the processing element cycles through the following stages of action:

repeat
  take a packet (Pi, val-attri, vi, nti) scheduled to it (if any) by S1;
  evaluate the predicate Pi with the arguments val-attri and vi to find the truth value t;
  form the packet (t, destni, nti);
  send the packet to TVB;
  forever.

ASU: The Activity Store Update unit, ASU, is a processor dedicated to the tasks of fetching packets from TVB and updating the enabling counts of the consequents in CM. Along with the complete enabling of some of the consequent nodes, a number of conflict sets of the enabled rules are generated. ASU forms the largest possible sets of the consequent nodes corresponding to the generated conflict sets. Consequent packets CPi are then formed by ASU from these sets of consequent nodes and are despatched to CPB.


The ASU functions are shown in the following procedure:

repeat
  if TVB is not empty
    then begin
      take a packet (TVi) from TVB;
      for each ncj ∈ destni do
        begin
          if t = "TRUE"
            then begin
              if "ncj has not seen a true token from nti"
                then begin
                  ec(ncj) = ec(ncj) - 1;
                  assert "ncj has seen a true token from nti";
                  if ec(ncj) = 0 then
                    begin
                      reset ec(ncj) to initial value m;
                      update the corresponding conflict sets;
                      for each nti such that (nti, ncj) ∈ E2 do
                        assert "ncj has not seen a true token from nti";
                    end
                end
            end
            else if "ncj has seen a true token from nti" then
              begin
                assert "ncj has not seen a true token from nti";
                ec(ncj) = ec(ncj) + 1;
              end;
        end;
      form the packets (CPi) from the conflict sets;
      send the consequent packets to CPB;
    end
  forever.

S2: The processor S2 fetches packets from CPB and schedules them to the free processing elements PE2j in the processing element array PESET2. Apart from this, S2 also performs the task of rule selection.

While carrying out inferencing in a rule based expert system, different search strategies, such as depth first, breadth first, heuristic strategies etc., are adopted for selecting the rules to be fired. The processor S2 applies a specified strategy to select the tuples from the packets CPi corresponding to the rules to be fired, and accordingly schedules the tuples to the processing elements.


The function of $2 can be represented as:

repeat
  if CPB is not empty then
    begin
      take a consequent packet (CPi) from CPB;
      select tuples from CPi according to the selection strategy specified;
      for each of the selected tuples do
        begin
          find a free processor (PE2j);
          if a free processor is found
            then schedule the selected tuple to PE2j;
        end;
    end
  forever.

PE2j: The processing element PE2j executes the action specified by the operator in the scheduled CPi to generate a result. The functions of PE2j can be represented as follows:

repeat
  if a packet CPi is scheduled to PE2j then
    begin
      apply the operator opi on the arguments argi and obtain the value vi;
      form the packets RPi = (vi, Oi) where Oi ∈ di;
      send the packets RPi to RPB;
    end
  forever.

DBU: The processor DBU is the database updater and it works as follows:

repeat
  if RPB is not empty then
    begin
      take a packet RPi = (vi, Oi) from RPB;
      if Oi = (f, s)
        then begin
          replace the tuple (s, v, gi) ∈ f by (s, vi, gi);
          for each ai such that id(ai) ∈ gi do
            begin
              form the packet AAi = (vi, id(ai));
              send the packet to AAB;
            end;
        end;
    end;
  forever.


AFU: This processor forms the antecedent token packets ATi and despatches them to ATB. The procedure of action is:

repeat
  if AAB is not empty
    then begin
      take a packet AAi from AAB;
      form the packet ATi = (Pi, val-attri, vi, nti);
      send the packet ATi to ATB;
    end
  forever.

4. An illustration of the inference flow

In order to demonstrate the inference flow in the proposed architecture, let us consider a rulebase consisting of the following three rules.

(1) IF IS (SKY (CONDITION), CLOUDY) AND IS (TIME (MONTH), JULY)
    THEN SET (HIGH, RAINFALL (POSSIBILITY))

(2) IF IS (DRAINAGE (CONDITION), POOR)
    THEN SET (HIGH, WATERLOGGING (POSSIBILITY))

(3) IF IS (RAINFALL (POSSIBILITY), HIGH) AND IS (WATERLOGGING (POSSIBILITY), HIGH)
    THEN SET (ALERTSIGNAL, DIVISION (PUMPING))

Following the notations presented earlier, the rulebase is represented as Rb = {r1, r2, r3} where

r1 = (A1, c1), r2 = (A2, c2), r3 = (A3, c3).

Also,

A1 = {a1, a2} = {(IS, (SKY, CONDITION), CLOUDY), (IS, (TIME, MONTH), JULY)}
A2 = {a3} = {(IS, (DRAINAGE, CONDITION), POOR)}
A3 = {a4, a5} = {(IS, (RAINFALL, POSSIBILITY), HIGH), (IS, (WATERLOGGING, POSSIBILITY), HIGH)}

and

c1 = (SET, HIGH, {(RAINFALL, POSSIBILITY)})
c2 = (SET, HIGH, {(WATERLOGGING, POSSIBILITY)})
c3 = (SET, ALERTSIGNAL, {(DIVISION, PUMPING)}).

The above set of rules translates into the inference flow graph shown in Fig. 4.


Fig. 4. [Figure: the inference flow graph for the example rulebase — antecedent nodes na1 to na5, truth value nodes nt1 to nt5 and consequent nodes nc1, nc2 and nc3.]

In the inference flow graph na1 = a1, na2 = a2, na3 = a3, na4 = a4, na5 = a5 and nc1 = c1, nc2 = c2, nc3 = c3.

Moreover, E1 = {(na1, nt1), (na2, nt2), (na3, nt3), (na4, nt4), (na5, nt5)}

and E2 = {(nt1, nc1), (nt2, nc1), (nt3, nc2), (nt4, nc3), (nt5, nc3)}.

Hence the Antecedent Memory, AM, consists of

{(na1, g'(na1)), (na2, g'(na2)), (na3, g'(na3)), (na4, g'(na4)), (na5, g'(na5))},

where g'(na1) and g'(na2) both contain the identification of nc1, g'(na3) contains the identification of nc2, and g'(na4) and g'(na5) both contain the identification of nc3.

The Consequent Memory, CM, initially contains {(no1 , 2, 2), (no2, 1, 1), (ncj, 2, 2)} where the integer pair in each tuple designates the current enabling count of the consequent node and the original value of the enabling count respectively.
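As a concrete stand-in, AM and CM can be pictured as dictionaries; g' is modelled here simply as a list of consequent-node identifications, and the [current, original] pair mirrors the enabling counts above.

```python
# Antecedent Memory: each antecedent node maps, via g', to the
# consequent node(s) it enables.
AM = {
    "na1": ["nc1"], "na2": ["nc1"],
    "na3": ["nc2"],
    "na4": ["nc3"], "na5": ["nc3"],
}

# Consequent Memory: node -> [current enabling count, original count].
CM = {"nc1": [2, 2], "nc2": [1, 1], "nc3": [2, 2]}

print(AM["na4"], CM["nc3"])   # ['nc3'] [2, 2]
```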

At the start of inferencing, let us assume that the facts IS (SKY (CONDITION), CLOUDY), IS (TIME (MONTH), JULY) and IS (DRAINAGE (CONDITION), POOR) hold in the short term database. Hence at this stage, AAB contains the packets (CLOUDY, id(a1)), (JULY, id(a2)) and (POOR, id(a3)). The processor AFU takes these three packets from the AAB, forms the packets (IS, CLOUDY, CLOUDY, na1), (IS, JULY, JULY, na2), (IS, POOR, POOR, na3) and despatches these packets to the ATB. The processor S1 receives the packets from ATB and schedules them to PE11, PE12 and PE13 respectively, if they are free. These processors evaluate the truth values of the arriving packets and generate, in turn, the packets (TRUE, nc1, na1), (TRUE, nc1, na2), (TRUE, nc2, na3). These packets are then sent to TVB. The processor ASU receives the first packet from TVB and decrements the enabling count, ec(nc1), to the value 1. Next ASU takes the second packet and decrements ec(nc1) to zero. As soon as ec(nc1) decrements to zero, the packet (SET, HIGH, {(RAINFALL, POSSIBILITY)}) is formed and despatched to the buffer CPB, and ec(nc1) is reset to the value 2. Similarly, the third packet is taken from TVB and, when the enabling count ec(nc2) reduces to zero, a consequent packet is sent to CPB. The processor S2 receives the consequent packets and schedules them to PE21 and PE22 respectively. PE21 fires the consequent packet (SET, HIGH, {(RAINFALL, POSSIBILITY)}) and sends the result packet (HIGH, {(RAINFALL, POSSIBILITY)}) to RPB. Similarly, the result packet (HIGH, {(WATERLOGGING, POSSIBILITY)}) is sent to RPB by PE22. The Database Updater, DBU, takes the first packet from RPB and updates the frame-slot pair (RAINFALL, POSSIBILITY), in the SDB, to the value HIGH. Similarly, the destination (WATERLOGGING, POSSIBILITY) is updated to the value HIGH by DBU. Moreover, DBU sends the packets (HIGH, id(a4)) and (HIGH, id(a5)) to AAB.
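The count-down behaviour of ASU described above can be sketched in a few lines. The `CONSEQUENT_PACKETS` lookup is an assumed structure for illustration; CM holds [current, original] enabling counts as before.

```python
CM = {"nc1": [2, 2], "nc2": [1, 1]}
CONSEQUENT_PACKETS = {
    "nc1": ("SET", "HIGH", (("RAINFALL", "POSSIBILITY"),)),
    "nc2": ("SET", "HIGH", (("WATERLOGGING", "POSSIBILITY"),)),
}
CPB = []   # stand-in for the consequent packet buffer

def asu_receive(tv_packet):
    """Handle one truth-value packet (truth, consequent-node, antecedent-node)."""
    truth, nc, _na = tv_packet
    if not truth:
        return
    CM[nc][0] -= 1                 # decrement ec(nc)
    if CM[nc][0] == 0:             # all antecedents satisfied:
        CPB.append(CONSEQUENT_PACKETS[nc])   # despatch to CPB
        CM[nc][0] = CM[nc][1]      # reset ec(nc) to the original value

for pkt in [(True, "nc1", "na1"), (True, "nc1", "na2"), (True, "nc2", "na3")]:
    asu_receive(pkt)
print(len(CPB))   # 2
```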

A similar cycle follows for the next level of inferencing to fire the third rule and the frame-slot pair (DIVISION, PUMPING) is updated to ALERTSIGNAL.

Thus it is seen that the rules at the first level of inferencing (that is, the first two rules) are selected simultaneously and that they can also be fired simultaneously, provided the processing elements are available. It may also be noted that more than one antecedent can be compared simultaneously, improving the speed of inferencing. Moreover, let us suppose that the consequent of the first rule is evaluated earlier than that of the second. Then the inference machine can overlap the evaluation of the activated antecedent (IS, (RAINFALL, POSSIBILITY), HIGH) with the evaluation of the consequent of the second rule.

This demonstrates the inbuilt facilities of pipelining and parallel consequent evaluation of the proposed architecture. It may also be noted that instead of comparing all the antecedents of a rule, the antecedents are evaluated only when the corresponding frame-slot pair is updated in the SDB. Such "instantiation directed" antecedent evaluation reduces the number of antecedent comparisons to a large extent.
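The instantiation-directed strategy can be contrasted with exhaustive matching in a few lines. The slot-to-antecedent index below is an assumed stand-in for the links the SDB maintains; only antecedents that mention the updated frame-slot pair are re-evaluated, rather than rescanning every rule.

```python
# Hypothetical index from a frame-slot pair to the antecedents that
# mention it (in INFLOW this role is played by the SDB's links to AAB).
SLOT_INDEX = {
    ("RAINFALL", "POSSIBILITY"): ["a4"],
    ("WATERLOGGING", "POSSIBILITY"): ["a5"],
    ("DIVISION", "PUMPING"): [],
}

def activated_antecedents(update):
    """Return activated-antecedent packets (value, antecedent-id) for
    one SDB update; untouched antecedents are never looked at."""
    slot, value = update
    return [(value, aid) for aid in SLOT_INDEX[slot]]

print(activated_antecedents((("RAINFALL", "POSSIBILITY"), "HIGH")))
# [('HIGH', 'a4')]
```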

5. Synchronisation

In order that the proposed architecture can carry out functional computations and inferencing, it is necessary to achieve synchronization both at the processor level and at the inference level.

Processor level synchronization. It is apparent from Fig. 3 that each of the buffers AAB, ATB, TVB, CPB and RPB is shared by two processor modules. Moreover, one of the processors has write access to the buffer while the other has read access.

In order to synchronize the operations over these buffers, a dedicated processor is associated with each of these buffers to accomplish 'Producer-Consumer' type of synchronization [9]. The dedicated processors are explicitly shown for TVB, CPB and RPB, by hashed rectangles in Fig. 3.

Inference level synchronization. In order to elucidate inference level synchronization the following definitions are presented.

Page 16: Design and simulation of a parallel inference machine architecture for rule based systems

282 A. Basu et al. I Parallel inference machine architecture

Definition 5.1. The ith wavefront, WFi, of an inference flow graph is defined as:

WF0 = the set of data approaching the antecedent partition of the graph as a result of the initiation process.

WFi = the set of data approaching the antecedent partition of the graph as a result of the passage of WFi-1 through the graph once completely, thereby updating the short term database.

Definition 5.2. The inference flow is said to be at inference level i if WFi is in the process of flowing through the rulegraph.

In order to appreciate the necessity of inference level synchronization, the following example is cited.

Example. Let us consider the following three rules.
(1) IF A AND B AND C THEN X
(2) IF X AND Y THEN Z
(3) IF A AND X THEN NOT(Y)

Let us suppose that initially A, B, C and Y are true at the (i-1)th inference level. Now, if the ith level is allowed to proceed only after the (i-1)th level is completed, then the following result is obtained.

Inference level    Inference              Status
i                  A & B & C => X         A, B, C, X, Y
i + 1              X & Y => Z             A, B, C, X, Z,
                   A & X => NOT(Y)        NOT(Y)

[Here '&' implies 'logical AND']

But if inference level synchronization is not used, then the possible inference sequences depend on the relative speeds of the different processors. At level (i + 1) the result obtained may be only NOT(Y), from rule 3 firing first and preventing Z from being true, or the correct result may be obtained if both the rules 2 and 3 fire.
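The ordering hazard can be made concrete. Below, firing a level against a single snapshot of the facts models synchronized operation; firing rule 3 before rule 2 within the same wavefront models the unsynchronized race, and Z is lost.

```python
def fire_level(facts, rules):
    """Fire every rule matched against the *same* snapshot of facts,
    i.e. with inference level synchronization."""
    snapshot = set(facts)
    out = set(facts)
    for antecedents, add, remove in rules:
        if antecedents <= snapshot:
            out |= add
            out -= remove
    return out

rule2 = ({"X", "Y"}, {"Z"}, set())          # IF X AND Y THEN Z
rule3 = ({"A", "X"}, {"NOT(Y)"}, {"Y"})     # IF A AND X THEN NOT(Y)

facts = {"A", "B", "C", "X", "Y"}           # status after level i

# Synchronized: both rules see Y, so Z is derived.
print("Z" in fire_level(facts, [rule2, rule3]))   # True

# Unsynchronized: rule 3 runs ahead and retracts Y before rule 2 fires.
facts_after_r3 = fire_level(facts, [rule3])
print("Z" in fire_level(facts_after_r3, [rule2]))  # False
```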

Inference level synchronization requires synchronization of the different processors at the end of each level. Thus each processor must know when an inference level is over.

When the inference level i starts, WFi is approaching the antecedent partition. Let A be the set of antecedent nodes activated due to level (i - 1) and let C be the set of consequent nodes enabled due to the evaluation of the set of antecedent nodes A. In order to ensure that level i is over, the following criteria must be satisfied.

(i) The number of antecedent tokens prepared by AFU and sent to ATB is |A|.
(ii) The number of antecedent tokens processed by the processing elements PESET1 is |A|.
(iii) The number of consequent tokens formed by ASU and sent to CPB is |C|.
(iv) The number of consequent tokens processed by PESET2, together with the consequent tokens enabled but not yet processed, is equal to |C|.
(v) All the SDB updates necessary at level i due to the firing of the consequents are complete.

At the implementation level, the first four criteria may be satisfied by using counters. But for the fifth criterion it is not useful to employ counters, because it is not guaranteed that all consequent activations will produce results that update the SDB. In that case synchronization may be accomplished by sending end markers to the buffers. Thus counters coupled with end markers provide a satisfactory solution to the inference level synchronization problem.
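The first four criteria reduce to counter equalities, while criterion (v) is signalled by end markers. A level-completion check might look like the following sketch; the counter names are assumptions, not taken from the paper.

```python
def level_complete(counters, end_markers_seen):
    """Criteria (i)-(iv) as counter comparisons; criterion (v) via end
    markers, since not every fired consequent yields a countable
    SDB update."""
    return (counters["afu_sent"] == counters["A"] and          # (i)
            counters["peset1_done"] == counters["A"] and       # (ii)
            counters["asu_sent"] == counters["C"] and          # (iii)
            counters["peset2_done"] + counters["enabled_pending"]
                == counters["C"] and                           # (iv)
            end_markers_seen)                                  # (v)

c = {"A": 3, "afu_sent": 3, "peset1_done": 3,
     "C": 2, "asu_sent": 2, "peset2_done": 2, "enabled_pending": 0}
print(level_complete(c, end_markers_seen=True))   # True
```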


Fig. 5. Number of antecedents compared, plotted against time (%), for INFLOW and the sequential machine.

6. Simulation results

The proposed architecture has been simulated on an HP-9000 computer under the UNIX operating system. The simulation has been carried out in a quasiparallel environment under the supervision of a global scheduler. The individual processors have been simulated by independent procedures and the buffers by queues. The details of the simulation strategies adopted will be reported subsequently. In this paper, we present some simulation results to validate the admissibility of the architecture.
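The quasi-parallel set-up can be mimicked with a round-robin global scheduler that steps processor procedures over shared queues; the names below are illustrative, not the paper's actual simulator.

```python
from collections import deque

# Shared buffers modelled as queues, as in the simulation.
in_q, out_q = deque(["t1", "t2", "t3"]), deque()

def producer():
    """One simulation step of a 'processor': move a token if available."""
    if in_q:
        out_q.append(in_q.popleft().upper())

def scheduler(processes, steps):
    """Global scheduler: give each simulated processor one step in turn."""
    for _ in range(steps):
        for p in processes:
            p()

scheduler([producer], steps=3)
print(list(out_q))   # ['T1', 'T2', 'T3']
```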

A comparative study of performances of INFLOW and a sequential machine proposed in [2] has been made. The simulated sequential machine is a uniprocessor one and executes the tasks of pattern matching and consequent firing in a sequential manner.

The performance study presented is based on a set of computation intensive rules dealing with various matrix operations applied to control system design problems. The rules are extracted from the CONEX rulebase [1, 2]. The performance study has yielded the following results.

6.1. Antecedent comparisons

For the same problem, the number of antecedents evaluated was counted for both the machines at regular time intervals. Fig. 5 shows the graphs for both the machines with the number of antecedents plotted against elapsed time, normalized with respect to the time taken by the sequential machine.

From Fig. 5, it is apparent that the number of antecedents needed to be compared by INFLOW is much less than that required by the sequential machine. Moreover, the higher slope of the INFLOW curve suggests the higher speed of pattern matching. Thus, for the particular problem chosen, INFLOW had to compare only 68 antecedents while the sequential model had to compare 156. Moreover, the pattern matching process was complete for INFLOW within 18.8% of the time required by the sequential machine.


This improvement in performance of INFLOW is due to its "instantiation directed" nature of pattern matching. In the sequential machine, on the other hand, all the antecedents of the rules containing the active antecedents are compared giving rise to the extra comparisons. Moreover, more than one active antecedent can be compared in parallel in INFLOW resulting in faster pattern matching.

6.2. Consequent evaluation

Fig. 6 depicts a plot of the number of rules fired against time for both the sequential model and INFLOW. Time has been normalized with respect to the total time elapsed in the sequential model to reach the end inference for a specific problem.

The following features of Fig. 6 may be noted.

(i) The number of rules fired by both the machines is the same, although the steep nature of the INFLOW curve shows that the rate of rule firing is much higher for INFLOW. The higher rate of rule firing is partly due to the higher speed of pattern matching and partly because more processors are able to cater to the enabled consequents. Also, the ultimate inference has been drawn by INFLOW in a much shorter span of time.

(ii) The graphs for both the machines show a few flat regions. This is due to the pattern matching activity over the antecedents.

Since, for synchronization, INFLOW requires all the active antecedents at the same inferencing level to be evaluated first, the initial set-up time for INFLOW is greater than that of a sequential machine. This is evident from the initial flat regions of Fig. 6. However, once INFLOW sets up its flow, this extra latency is compensated for by the faster firing of the rules.

(iii) The relatively large flat region of the sequential machine can be explained as follows. During this period, the inference machine is busy with the pattern matching task to find the rules to be fired. Since the sequential machine is less efficient in finding out the "true" antecedents, the time spent to find them out is greater for the sequential machine than for INFLOW.

Fig. 6. Number of rules fired, plotted against time (%), for INFLOW and the sequential machine.

However, the actual length of the flat regions and the slopes of the curves will vary with the problem and with the inherent parallelism in the rules. Hence these curves merely serve to demonstrate the increased efficiency of INFLOW compared to the sequential model.

7. Conclusion

This paper essentially presented the architecture of INFLOW and the schemes of operation of the different modules. For the hardware organization and implementation, the following measures are suggested.

The Short Term Database (SDB) can be implemented by Content Addressable Memory (CAM), accessed by the identification of the frame-slot pair. The antecedent memory and the consequent memory should be of direct access type to allow faster access. Each of the buffers should be large enough to cope with the speeds of the processors communicating with it. However, a more detailed simulation analysis is required to determine the optimum sizes of the buffers. Along with each processing element (PE) there should be local memory, sufficient in size, to store one token. In order to reduce the idling time of AFU, there should be a local buffer associated with it.

Although this paper refrains from presenting the hardware details, the simulation results presented demonstrate the utility of the architecture in reducing the pattern matching overhead and achieving faster inferencing.

The proposed architecture of INFLOW is restricted to production systems without any variables or quantification over variables. Facilities for variable unification need to be incorporated to allow wider application of INFLOW.

References

[1] A. Basu and A.K. Majumdar, An organization of the rulebase in expert systems for control system design and analysis, Proc. of the Second International Conference on Advances In Pattern Recognition and Digital Techniques, Calcutta (1986).

[2] A. Basu, A.K. Majumdar and S. Sinha, An expert system approach to control system design and analysis, IEEE Transactions on Systems, Man and Cybernetics, 18 (5) (1988) 685-694.

[3] R. Davis and J. King, An overview of production systems, in: Machine Intelligence 8 (1977) 300-332.
[4] R. Davis and J. King, Knowledge Based Systems in Artificial Intelligence (McGraw Hill, New York, 1982).
[5] D. DeGroot, Restricted AND parallelism, Proc. of the International Conference on Fifth Generation Computer Systems (1984) 471-478.
[6] J.B. Dennis, Dataflow supercomputers, IEEE Computer 13 (11) (Nov. 1980) 48-56.
[7] R. Fikes and T. Kehler, The role of frame-based representation in reasoning, CACM 28 (Sept. 1985) 904-920.
[8] A. Gupta, Parallelism in production systems: The sources and expected speed up, Tech. Report, Dept. of Computer Science, CMU (1984).
[9] P.B. Hansen, Operating System Principles (Prentice-Hall, 1973).
[10] S. Haridi and A. Ciepielewski, Execution of Bagof on the OR-parallel token machine, Proc. of the International Conference on Fifth Generation Computer Systems (1984) 551-560.
[11] F. Hayes-Roth, Rule based systems, CACM 28 (Sept. 1985) 921-932.
[12] N. Ito et al., Dataflow based execution mechanisms of parallel and concurrent Prolog, New Generation Computing 3 (1985) 15-41.
[13] J. McDermott, R1: A rule based configurer of computer systems, Artificial Intelligence 19 (1) (1982) 39-88.
[14] E.H. Shortliffe, Computer Based Medical Consultations: MYCIN (Elsevier, New York, 1976).
[15] Special Issue on Knowledge Representation, IEEE Computer (Oct. 1983).
[16] D.A. Waterman and F. Hayes-Roth (Eds.), Pattern Directed Inference Systems (Academic Press, 1979).