

Some knowledge-acquisition methods for Prospector-like systems

Vladimír Mařík and Zdeněk Kouba

The paper deals with the knowledge-acquisition methods designed and tested under the FEL-EXPERT Project, which is aimed at the development of rule-based diagnostic shells. Three different approaches have been used for knowledge acquisition: pattern-recognition, decision-tree, and intensional, pure probabilistic approaches. The designed methods may be applied to a wide subclass of rule-based diagnostic systems that exploit the pseudo-Bayesian model for uncertainty handling. The experimental results are discussed.

Keywords: knowledge acquisition, expert systems, methods, uncertainty handling, rule-based systems

Knowledge acquisition is generally recognised as a bottleneck of current expert systems applications. This is also the authors' experience gathered from developing shells and expert systems applications under the FEL-EXPERT Project1 in the last decade. The FEL-EXPERT shell has been successfully applied in different areas, such as medical diagnosis, genetic counselling, technical diagnosis, and chemistry.

The FEL-EXPERT systems are diagnostic rule-based shells that process both uncertain knowledge and uncertain data. The basic philosophy of the uncertainty evaluation is similar to that of the Prospector expert system17. That is why the knowledge-acquisition methods described in this paper may be used by a wide subclass of Prospector-like rule-based systems.

It is assumed that the knowledge representation in such systems is based on the production rules of the E ---> H type:

Faculty of Electrical Engineering, K 335, Czech Technical University, Technická 2, 166 27 Prague 6, Czechoslovakia. Paper received 28 January 1991. Revised paper received 11 July 1991

IF <evidence E> THEN <hypothesis H> WITH <probability P(H/E)> ELSE <hypothesis H> WITH <probability P(H/'E)>

where evidence E and hypothesis H are propositions and P(H/E) and P(H/'E) are subjective conditional probabilities.
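The inference engine of a Prospector-like shell propagates uncertain evidence through such rules by interpolation. As an illustration only (not code from the FEL-EXPERT system), the following Python sketch implements the standard piecewise-linear Prospector interpolation between P(H/'E), the prior P(H), and P(H/E), driven by the user-supplied weight P(E/E'); all names are the authors' of this sketch, and 0 < P(E) < 1 is assumed.

```python
def interpolate(p_e_obs, p_e, p_h, p_h_given_e, p_h_given_not_e):
    """Piecewise-linear Prospector-style interpolation (sketch).

    Returns P(H/E') as a function of the user-supplied P(E/E'):
    at P(E/E') = 0 it equals P(H/'E), at P(E/E') = P(E) it equals
    the prior P(H), and at P(E/E') = 1 it equals P(H/E).
    Assumes 0 < p_e < 1.
    """
    if p_e_obs <= p_e:
        # interpolate between P(H/'E) and P(H) on the interval <0, P(E)>
        return p_h_given_not_e + (p_h - p_h_given_not_e) * p_e_obs / p_e
    # interpolate between P(H) and P(H/E) on the interval <P(E), 1>
    return p_h + (p_h_given_e - p_h) * (p_e_obs - p_e) / (1.0 - p_e)
```

With P(E) = 0.3, P(H) = 0.5, P(H/E) = 0.9, P(H/'E) = 0.2, the function returns the prior 0.5 when the user confirms exactly P(E/E') = P(E), and moves towards 0.9 or 0.2 as the observation becomes more certain in either direction.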

While preparing knowledge bases in different areas, the authors had to face the problems of eliciting, acquiring, and formulating quite different types of expert knowledge and of formalizing it in the corresponding knowledge-base syntax. Often no expert knowledge was available, but rather a set of examples described by binary (two-valued) data. In such cases it is necessary to induce the knowledge in the form of rules from this data set.

Three approaches have been applied within the FEL-EXPERT Project, based on:

• methods used in pattern recognition for feature ordering purposes

• a decision-tree construction (Quinlan's ID3 algorithm) and conversion of the tree into the proper knowledge-base notation

• the intensional graph-probabilistic approach to uncertainty information processing

The purpose of this paper is to describe briefly each of these approaches and to present particular results obtained as well as experience gathered. The goal is not to develop new, efficient methods in the area of pattern recognition, machine learning, or statistical inference, but rather to demonstrate possible applications and modifications of the methods developed in these specific research areas for knowledge-engineering practice. A proper integration of the specific methods studied in the areas of pattern recognition or machine learning with the knowledge representation/exploration methods has always been the focus of the authors' attention.

Vol 4 No 4 December 1991 0950-7051/91/040225-06 © 1991 Butterworth-Heinemann Ltd 225


PATTERN-RECOGNITION APPROACH

Two algorithms have been developed, namely:

• algorithm A, based on the results of the training process of the linear classifier

• algorithm B, based on the Fukunaga-Koontz procedure3 for feature ordering

The 'input' of these algorithms is a training set T containing binary (algorithm A) or numerical (algorithm B) data; as a result of both algorithms the E ---> H rules with corresponding weights (probabilities) are obtained.

The brief description of both algorithms below takes only the dichotomy case into account. (Only two classes in the training set or two final hypotheses in the knowledge base to be constructed are considered, respectively.) It is possible to generalize these algorithms for more classes/final hypotheses by a simple decomposition of the task.

Algorithm A

(1) The binary training set T is converted into the training set T* by replacing the zeros with the values '-1'. The training process of the 'classical' linear classifier2 is performed. Let q0, q1, . . ., qn be the classifier weights set up as a result of this training; denote q = [q0, . . ., qn]. The goal of the training process is to set up the classifier weights q0, . . ., qn in such a way that q.x > 0 for class 1 (hypothesis H1) and q.x < 0 for class 2 (hypothesis H2) is achieved in the maximum number of cases belonging to training set T*.

(2) The vector q is normalized:

qi* = qi / (k . max |qj|)

where the maximum is taken over j = 1, . . ., n and for the constant k it usually holds that 1.1 < k < 1.4.

(3) The a priori probabilities of the propositions Ei are considered to be proportional to the 'frequency' of the presence of the corresponding features in the training set. Analogously, the a priori probabilities of the final hypotheses H1 and H2 may be computed from their frequency in the training set.

(4) As the vectors x are binary, the weights qi (or qi*) are proportional to the influence of the ith proposition Ei (ith feature) on the final hypotheses H1 and H2. If |qi*| > Q, where Q is a given threshold, two rules Ei ---> H1 and Ei ---> H2 are inserted into the knowledge base. The probability measures P(H/E) and P(H/'E) are estimated in the following way:

For qi* > 0:
P(H1/Ei) = P(H1) + qi*[1 - P(H1)]
P(H1/'Ei) = (1 - qi*) P(H1)
P(H2/Ei) = P(H2) (1 - qi*)
P(H2/'Ei) = P(H2) + qi*[1 - P(H2)]

Similar formulas may be derived for the case of qi* < 0.
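Steps (2)-(4) can be summarized in code. The following Python sketch is illustrative only (function and parameter names are hypothetical): it normalizes the classifier weights, applies the threshold Q, and computes the four rule probabilities for the case qi* > 0 together with the symmetric case qi* < 0.

```python
def rules_from_weights(q, p_h1, p_h2, k=1.2, threshold=0.1):
    """Sketch of steps (2)-(4) of algorithm A.

    q: raw linear-classifier weights q_1..q_n (bias excluded);
    p_h1, p_h2: a priori probabilities of the two final hypotheses.
    Returns (feature index, P(H1/Ei), P(H1/'Ei), P(H2/Ei), P(H2/'Ei))
    tuples for the features whose |qi*| exceeds the threshold Q.
    """
    m = k * max(abs(w) for w in q)          # step (2): normalization constant
    rules = []
    for i, w in enumerate(q):
        qs = w / m                           # normalized weight qi*
        if abs(qs) <= threshold:             # step (4): threshold Q
            continue
        if qs > 0:
            rules.append((i,
                          p_h1 + qs * (1 - p_h1),   # P(H1/Ei)
                          (1 - qs) * p_h1,          # P(H1/'Ei)
                          p_h2 * (1 - qs),          # P(H2/Ei)
                          p_h2 + qs * (1 - p_h2)))  # P(H2/'Ei)
        else:
            a = -qs                          # symmetric case for qi* < 0
            rules.append((i,
                          (1 - a) * p_h1,
                          p_h1 + a * (1 - p_h1),
                          p_h2 + a * (1 - p_h2),
                          p_h2 * (1 - a)))
    return rules
```

A strongly positive weight pushes P(H1/Ei) towards 1 and P(H2/Ei) towards 0; a weight below the threshold produces no rule at all.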

Algorithm B

(1) The Fukunaga-Koontz feature ordering procedure3 is applied to the subsets T1 (T1 contains all the examples from the training set belonging to class 1) and T2 (class 2). As a result, two sequences of eigenvalues of normalized covariance matrices are obtained:

0 ≤ 1λn ≤ . . . ≤ 1λ1 ≤ 1,   0 ≤ 2λn ≤ . . . ≤ 2λ1 ≤ 1

(2) For 1λi > Q, where Q is a given threshold, Q > 0.5, put qi = 1λi. For 2λi > Q, put qi = -2λi.

(3)-(5) These steps are analogous to steps (2)-(4) of algorithm A.

(6) The Foley-Sammon method4 may be applied to derive efficient synthetic features (as a linear combination of the original ones) and to add new rules to the knowledge base.
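The eigenvalue computation underlying algorithm B can be sketched as follows. This is an illustrative reading of the Fukunaga-Koontz procedure, not the authors' implementation: the sum of the two class scatter matrices is whitened, after which the eigenvalues obtained for the two classes are complementary (they sum to one), which is what makes the ordering in step (2) possible.

```python
import numpy as np

def fukunaga_koontz_eigenvalues(x1, x2):
    """Sketch of the Fukunaga-Koontz eigenvalue ordering.

    x1, x2: (examples x features) arrays for the two classes.
    The sum S = S1 + S2 of the class scatter matrices is whitened;
    in the whitened space the eigenvalues of S1 and S2 satisfy
    1lambda_i + 2lambda_i = 1, so a direction that is dominant for
    one class is automatically weak for the other.
    """
    s1 = x1.T @ x1 / len(x1)
    s2 = x2.T @ x2 / len(x2)
    # whitening transform W of S = S1 + S2, i.e. W.T @ S @ W = I
    vals, vecs = np.linalg.eigh(s1 + s2)
    w = vecs / np.sqrt(vals)
    lam1 = np.linalg.eigvalsh(w.T @ s1 @ w)   # class-1 eigenvalues
    lam2 = 1.0 - lam1                          # complementary class-2 values
    return np.sort(lam1)[::-1], np.sort(lam2)[::-1]
```

Both returned sequences are ordered descendingly, mirroring the two chains 1λ1 ≥ . . . ≥ 1λn and 2λ1 ≥ . . . ≥ 2λn above.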

Experiments and results

Both algorithms A and B have been used to support the development of several knowledge bases in the area of genetic counselling.

Algorithm A was used in constructing the knowledge base POLYD1 for differential diagnosis of genetic syndromes accompanied by polydactyly. Eight syndromes (final hypotheses) were considered. The training set consisted of 57 cases, each represented by a 113-dimensional binary vector. By application of the knowledge base POLYD1, a coincidence ratio of 80.6% with human medical experts was reached. The knowledge base POLYD1 is robust and also works properly with uncertain data (although it was induced from binary data).

Algorithm B was applied in solving the problem of evaluating phenylalanine tests with the goal of detecting in-born metabolic diseases. Only a small amount of expert experience was available, but a set of 255 real cases was. Each case has been represented by 12 values of biochemical entities obtained during the four-hour loading tests (12 numerical features). As a result of the application of algorithm B, some of the values (features) have been discovered to be irrelevant (they were not included in the knowledge base at all). On the other hand, some new, synthetic features with a strong discriminative power have been discovered. The knowledge base METABOL-A (24 nodes of the inference net, 28 rules, six goals), which was constructed by algorithm B, was tested on 315 cases1. The knowledge base, called METABOL-AC (26 nodes, 28 rules, four goals), having been slightly improved by physician experts, reached a decision correctness of 99.7% and has been in routine use at the Diagnostic Centre for Metabolic Diseases, Prague, Czechoslovakia since August 1987.

DECISION-TREE APPROACH

The tree-construction algorithm makes use of ideas presented by Quinlan5,6. The efficiency of the decision tree built by Quinlan's algorithm depends mainly on the splitting criterion used at each step of the tree-construction process7 as well as on the character of the data at hand. Not only Quinlan's original criterion, but also four other criteria (two of them based on entropy computations) have been tested, evaluated, and compared in the authors' experiments8.

Figure 1. Example of decision tree

Figure 2. Inference net corresponding to decision tree

Three methods of converting the decision tree into a knowledge base have been developed. One of these methods, which gives the best results, is now described. An example of the decision tree is shown in Figure 1. The edges are labelled in accordance with the absence/presence of attributes E1, E2, E3, the intermediate nodes are labelled N1, N2, N3, and the leaves correspond to the goal hypotheses H1, H2, H3 (the hypothesis H1 labels two leaves).

The rules of the knowledge base are constructed in such a way that each rule corresponds exactly to one possible path leading from the top to any leaf of the decision tree. In the example, the following four rules can be extracted from the decision tree:

E1 ---> H1
'E1 & E2 & 'E3 ---> H1
'E1 & E2 & E3 ---> H2
'E1 & 'E2 ---> H3

The corresponding inference net is drawn in Figure 2. The weights of the rules can be estimated statistically from the sizes of the sets corresponding to the decision-tree nodes. This simple approach gives good results if a sufficient number of examples with a suitable distribution is available.
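The path-to-rule conversion can be sketched in a few lines. The tree encoding below is hypothetical (an internal node is a triple of tested attribute, 'absent' subtree, and 'present' subtree; a leaf is a hypothesis name); the function emits one rule per root-to-leaf path, as described above.

```python
def tree_to_rules(node, path=()):
    """Extract one rule per root-to-leaf path (illustrative sketch).

    Returns a list of (condition, hypothesis) pairs, where a condition
    is a tuple of literals such as ("'E1", 'E2') - a primed literal
    denotes absence of the attribute on the path.
    """
    if isinstance(node, str):                     # leaf: a goal hypothesis
        return [(path, node)]
    attr, absent, present = node
    rules = tree_to_rules(absent, path + ("'" + attr,))
    rules += tree_to_rules(present, path + (attr,))
    return rules
```

Applied to a tree shaped like Figure 1, `('E1', ('E2', 'H3', ('E3', 'H1', 'H2')), 'H1')`, the sketch yields exactly the four rules listed above.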

If there are more classes or features, the induced trees could be complicated and could lead to complex knowledge-base structures. In such cases, the use of simple pruning algorithms such as 'Reduced-Error Pruning'7,9 has given good results.

Experiments and results

The knowledge base POLYD2 (solving the same problem as that mentioned in the second section and making use of the same training set) has been developed by the decision-tree approach described in this section. The decision correctness of the decision tree was 75%; the first 'draft' of the knowledge base (after converting the tree into the knowledge-base structure) provided a somewhat lower decision correctness (68%). The knowledge base was later 'tuned' by experts (the efficiency reached was 88%). After adding some rules created by algorithm A, a final decision efficiency of 93% has been achieved.

The decision-tree approach has been used in constructing a set of 13 knowledge bases covering the problem of diagnosis of ship diesel engines. A large portion of the knowledge was given by the experts in the form of decision trees; the algorithms of automatic conversion of these trees into the knowledge-base notation have been used. The decision correctness exceeded 93%.

PROBABILISTIC APPROACH

One of the authors' attempts to construct inductively the knowledge base from a given set of examples (training set) is based on a purely statistical approach.

First, a short introduction is given into the uncertainty-processing mechanism used by the FEL-EXPERT system.

Let E1, . . ., EM be a set of evidences that affects the classification of an object into one of the goal hypotheses G1, . . ., GN. Denote the observation of the world by the symbol E'. The user provides the expert system with the probability P(Ej/E') of the proposition Ej under the given state of the world (observation) E' for each evidence Ej. The task to be solved by the inference engine is to compute the a posteriori weights P(Gi/E') of the goal hypotheses for i = 1, . . ., N.

Each rule involved in the knowledge base has the form E ---> H (P(H/E), P(H/'E)). Both parameters define the conditional probability distribution and can be estimated by making use of a given training set. Unfortunately, they do not provide full information on data dependencies encoded in the training set, e.g., the dependence of any hypothesis on couples, triplets, etc. of pieces of evidence.

Prospector-like expert systems (including the FEL-EXPERT system) cannot process this information. They compose simple dependencies to approximate more complex ones. For example, the probability P(H/E1 = true, E2 = false) will be evaluated by composing the marginal conditional probabilities P(H/E1 = true) and P(H/E2 = false) under the assumption of conditional independence between E1 and E2 given H.
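Under conditional independence of E1 and E2 given both H and 'H, this composition reduces to multiplying likelihood ratios in odds space. A minimal sketch, with hypothetical function names:

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

def compose(p_h, p_h_given_e1, p_h_given_e2):
    """Pseudo-Bayesian composition of two marginal conditionals (sketch),
    assuming E1 and E2 are conditionally independent given H and given 'H:

        O(H/E1,E2) = O(H) * [O(H/E1)/O(H)] * [O(H/E2)/O(H)]
    """
    o = odds(p_h) * (odds(p_h_given_e1) / odds(p_h)) \
                  * (odds(p_h_given_e2) / odds(p_h))
    return prob(o)
```

If one of the pieces of evidence is uninformative (P(H/E1) = P(H)), the composition correctly returns the other marginal unchanged.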

To process efficiently the information provided by the training set, rules of the type:

E1 & . . . & EMAX ---> H

accompanied by parameters describing the corresponding conditional distribution P(H/E1, . . ., EMAX) have been introduced into the shell. It seems to be reasonable to choose an upper limit for MAX because of the reliability of parameter estimates and the combinatorial explosion of parameters.

The more complex dependencies can be evaluated by composing the simpler ones using the maximum entropy principle18. This composition can be solved generally by the iterative proportional fitting procedure10,20. If the structure forms a so-called decomposable model, however, analytical formulas solving this problem are known.

There are two basic tasks to be solved:

• Extending the existing inference mechanism of the FEL-EXPERT system to process the rules with the conjunction on the left-hand side.

• Developing a proper method of search for the set of rules extracting maximum information from the given training set and finding the structure of an acceptable decomposable model.

The solution of the first problem is easy in the case that every evidence Ej is evaluated by a categorical probability P(Ej/E') = 0 or P(Ej/E') = 1. In such a case the set of parameters accompanying the rule contains a value expressing the a posteriori weight of the hypothesis H for the given observation E'. Otherwise, it is necessary to interpolate the parameters by some proper function f, such that:

P(H/E') = f(P(E1/E'), . . ., P(EK/E')),   1 ≤ K ≤ MAX

What requirements have to be met by the function f?

• If the weights of K - 1 propositions P(Ej/E') = P(Ej), j = 1, . . ., l - 1, l + 1, . . ., K are fixed, then the function f(P(E1), . . ., P(El-1), P(El/E'), P(El+1), . . ., P(EK)) depends on one variable only and has to correspond to the standard Prospector interpolation function.

• If the weights of all evidences Ej, j = 1, . . ., l - 1, l + 1, . . ., K, are arbitrary but constant and function f depends on the weight of one evidence P(El/E') only, then the function has to be monotonic on any interval <0, P(El)>, <P(El), 1> of its definition range.

Briefly, the solution for a simple case MAX = 2 is sketched. The definition range of the function f(P(E1/E'), P(E2/E')) has been split into four parts called S/S, S/N, N/S, and N/N domains (N = necessity, S = sufficiency). The following representation of the function f(*) has been chosen:

P(H/E') = a.P(E1/E') + b.P(E2/E') + c.P(E1/E').P(E2/E') + d

Parameters a,b,c,d can be computed for every domain S/S, . . ., N/N using known values of this function in four fixed points of the particular domain21. These points are given by the expert. For example, in the S/S domain the values in the following points are known:

P(E1/E')   P(E2/E')   P(H/E')
P(E1)      P(E2)      P(H)
1          P(E2)      P(H/E1)
P(E1)      1          P(H/E2)
1          1          P(H/E1,E2)

The function f(*) defined above satisfies the above requirements and, moreover, has an interesting property. If E1 and E2 are conditionally independent given hypothesis H and, moreover, the hypothesis H and the observation E' are conditionally independent, then the vector of parameters a,b,c,d is identical in any of the domains S/S, . . ., N/N.
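In each domain, the parameters a,b,c,d follow from a 4 x 4 linear system built from the four known points of that domain. A sketch using numpy (the function name is the authors' of this sketch, not part of the shell):

```python
import numpy as np

def fit_domain(points):
    """Solve for the parameters (a, b, c, d) of

        P(H/E') = a*p1 + b*p2 + c*p1*p2 + d

    from four known (p1, p2, value) triples of one domain (sketch).
    """
    a_mat = np.array([[p1, p2, p1 * p2, 1.0] for p1, p2, _ in points])
    rhs = np.array([v for _, _, v in points])
    return np.linalg.solve(a_mat, rhs)
```

For the S/S domain the four triples would be the rows of the table above: (P(E1), P(E2), P(H)), (1, P(E2), P(H/E1)), (P(E1), 1, P(H/E2)), and (1, 1, P(H/E1,E2)).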

Also the problem of finding a proper dependency structure that could be processed by the extended inference engine had to be solved. Some data analysis techniques - namely, the concept of decomposable models - have been used for this purpose in the authors' work. The decomposable model seems to be a promising formalism, which has also been widely explored in the famous work of Lauritzen and Spiegelhalter12. The authors' research has been inspired and influenced by that of Hájek19, Havránek13, and Havránek and Edwards14 before Lauritzen and Spiegelhalter12 had published their work.

There is a significant difference between the authors' approach and that of Lauritzen and Spiegelhalter. While the latter tended to refine the a priori given dependency structure to satisfy the requirements of the inference engine, the authors' approach is oriented towards searching for the best decomposable model, without taking any a priori dependency structure into account.

Here some basic terms used in the field of data analysis are introduced to explain the method for inductive knowledge-base construction.

The structure of the dependence among factors (propositions in this case) may be described by so-called log-linear models10. For example, for a three-factor case, the log-linear model looks like this:

log p(A,B,C) = u0 + uA + uB + uC + uAB + uBC + uAC + uABC   (**)

Parameter u0 is always nonzero; the other parameters uS (S denotes the set of factors) express how the probability p(A,B,C) is affected by the mutual interaction of factors.

The presence and/or absence of some u parameters in the expression (**) defines the particular dependency structure. The term 'model' means the dependency structure in the following text. It can be tested by statistical means whether or not this particular model fits the given data sufficiently.

The aim of the data analysis is to find the simplest model that fits the given data well. The set of all possible log-linear models is usually reduced to simplify the task.

Class of hierarchical log-linear models10

A log-linear model belongs to the class of hierarchical models iff for any nonzero parameter uS it holds that any parameter uV (where V is a subset of S) is nonzero too. It means that the hierarchical model is uniquely defined by the set of all maximal sets of factors for which the corresponding u parameter is nonzero. This set is called a generating sentence and its elements are called generators.

Class of graph models15

For every hierarchical model, a so-called first-order interaction graph can be constructed. The set of vertices of the graph corresponds to the set of all factors. There is an edge between any two vertices, iff the u parameter indexed by the corresponding pair of factors is nonzero.

A hierarchical model is called a graph model iff the set of all vertices forming any maximal clique (i.e., complete subgraph) of the corresponding first-order interaction graph is equal to some generator of the model. It means that any graph model is uniquely represented by the set of maximal cliques of its first-order interaction graph.

Decomposable models11

A graph model is called a decomposable model iff its first-order interaction graph is chordal (i.e., triangulated).

Consider a model M defined by its generating sentence M = {Sj; j = 1, . . ., h}. If the model M is decomposable, the estimate of the parameters of the corresponding joint probability distribution that satisfies the maximum entropy principle can be made without making use of the iterative procedure. Knowledge of the marginal distributions P(Sj) for all generators Sj is the necessary and sufficient input information to do it. For example, if M = {{A,B}, {B,C}} then:

P(A,B,C) = P(A,B) . P(B,C) / P(B)
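The factorization for this example can be sketched directly. The dictionary-based encoding below is hypothetical; the function reconstructs the maximum-entropy joint distribution of M = {{A,B}, {B,C}} from the two generator marginals:

```python
def joint_from_marginals(p_ab, p_bc):
    """Maximum-entropy extension for M = {{A,B}, {B,C}} (sketch).

    p_ab[(a, b)] and p_bc[(b, c)] are the two generator marginals
    (assumed consistent on B); returns the joint
    P(a, b, c) = P(a, b) * P(b, c) / P(b).
    """
    # marginal P(b) obtained by summing P(a, b) over a
    p_b = {}
    for (a, b), p in p_ab.items():
        p_b[b] = p_b.get(b, 0.0) + p
    return {(a, b, c): p_ab[(a, b)] * p_bc[(b2, c)] / p_b[b]
            for (a, b) in p_ab for (b2, c) in p_bc if b2 == b}
```

Because the two marginals agree on B, the reconstructed joint sums to one and reproduces both generator marginals exactly.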

This feature of decomposable models is important because it enables the construction of particular rules. Each rule will correspond to a generator of a model. All parameters accompanying the rule can be estimated using the training set. Owing to the limited number of conjuncts forming the left-hand side of the rule, it is necessary to restrict attention to a subclass of decomposable models consisting of production models only. A production model is defined as a decomposable model that has no generator of a cardinality higher than MAX + 1.

The crucial question is how to find a proper production model that fits the given training set well. It is not possible to test all models to find the best fitting one. The results presented by Havránek and Edwards13,14 have been used to construct an algorithm searching for the proper production models. The main principle of this algorithm is that of coherency.

A partial ordering is introduced on the class of all graph models of a given dimension by the definition:

M1 < M2 iff E1 ⊆ E2

Table 1. Results of classification using the FEL-EXPERT system against original decisions by expert

     T1    T2    T3    T4    T5    T6      %
T1   144    0     0     0     0     0    100
T2     0   47    15     0     0     0   75.8
T3     0    1   112    15     0     0   87.5
T4     0    0    17   110    15     0   77.5
T5     0    0     0    19    65     0   77.4
T6     0    0     0     0     0    16    100

where E1, E2 are the sets of edges of the corresponding first-order interaction graphs.

The coherency principle says that if M1 < M2 and M1 fits the given data well, then M2 also fits them well and it is not necessary to test it. Analogously, if M2 does not fit the given data well, then model M1 can be refused without testing it. The number of statistical tests may be reduced efficiently by use of the coherency principle.
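The effect of the coherency principle on the number of tests can be sketched as follows. The encoding is hypothetical: a model is represented by the edge set of its first-order interaction graph, and fits() stands for the expensive statistical test; models implied by an already-classified model are never tested.

```python
def coherent_search(models, fits):
    """Coherency-driven model testing (illustrative sketch).

    models: iterable of edge sets (each a frozenset); fits(m) is the
    expensive statistical goodness-of-fit test.  A model larger than
    an accepted one is accepted without testing; a model smaller than
    a rejected one is rejected without testing.
    """
    accepted, rejected, tests = set(), set(), 0
    for m in sorted(models, key=len):            # examine small models first
        if any(a <= m for a in accepted):        # some accepted M1 < m
            accepted.add(m)
            continue
        if any(m <= r for r in rejected):        # m < some rejected M2
            rejected.add(m)
            continue
        tests += 1
        (accepted if fits(m) else rejected).add(m)
    return accepted, rejected, tests
```

With a monotone fits() over all eight subsets of a three-edge graph, only five of the eight models actually need to be tested; the other three are classified by coherency alone.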

The following experiment in the banking environment has been done. Information on about 576 objects (customers) was available. Each of the objects was described by 10 binary features and classified into one of six classes (differing in the permissible size of a credit). The knowledge base HCREDIT was constructed by making use of the method described above. The maximal length of the left-hand side of the rules was limited by MAX = 2. Each object from the training set was classified using the FEL-EXPERT system with the improved inference mechanism. The results of the classification were compared with the original decisions made by the expert. The results are presented in Table 1. Each entry of Table 1 shows how many objects belonging to the class corresponding to the row number were classified by the expert system into the class defined by the column number.

CONCLUSIONS

Many deep theories and sophisticated algorithms have been developed in the areas of pattern recognition, machine learning, and statistical data analysis in the last two decades. Machine-learning algorithms, especially, are often presented as programs that could easily induce efficient real knowledge bases. But experimental testing of, for example, machine-learning algorithms is not the same as their practical use in real knowledge-acquisition procedures connected with a particular expert system shell. This paper has presented some ways and results of the application of different methods in real tasks of knowledge acquisition oriented towards Prospector-like diagnostic shells.

The knowledge-acquisition methods described in this paper are not aimed at developing perfect, user-ready knowledge bases, nor at offering a complete solution. They merely help in constructing the first 'draft' of a knowledge-base structure (or a part of it) when no expert knowledge but a large set of examples is available.

All the methods are 'blind', mechanical ones in the sense that they are unable to understand the data, to discover essential concepts and causal links among them, or to formulate deep knowledge. They discover only shallow knowledge in the data. That is why the knowledge bases are usually not structured enough: the resulting pieces of inference nets have the depth 1 (or 2 if the decision-tree approach is used). But the experiments have demonstrated that a good final efficiency may be achieved by slight improving/tuning of the knowledge bases or by adding a small piece of knowledge from another source (perhaps obtained by another method).

The applicability of the algorithms based on both the pattern-recognition methods and the intensional approach is restricted to problems with a limited number of features - the computation time demands grow rapidly with the growing number of features.

The decision-tree approach is sensitive to the proper choice of the training set. A single wrong example may result in a complicated but inefficient tree structure. If there are more classes or features, a mechanical conversion of the tree into the knowledge-base notation may produce a large set of rules with 'long' logical combinations on the left-hand sides. In such a case the final solution may be sensitive to noise. Hence methods of decision-tree pruning7 may be used to advantage.

It is rather complicated to use the pattern-recognition approach when considering a greater number of classes (the methods are applicable to dichotomic subtasks only). Algorithm A is restricted to linearly separable tasks, but it may be considered as an attempt to use simple neural nets for knowledge-acquisition purposes. More complicated nets may be used in the future. Algorithm B may result in discovering 'synthetic' features with a high discriminative power. The applicability of these features may be limited because they are not quite understandable to the field users.

The probabilistic approach is the only one that also respects the dependencies among the features. This approach requires consideration of new types of nodes in the inference net (representing combinations of two, three, and more features). These make the knowledge-base syntax much more complicated.

The pure probabilistic approach reflects the modern trend in machine learning to improve the 'naive' use16 of Bayesian formulas (assuming the independence of attributes) by a 'semi-naive' application in which certain dependencies among attributes are considered and expressed in the knowledge base.

Despite these criticisms, it has been shown that the knowledge-acquisition methods designed may bring good and applicable results. Their limitations must be taken into account by the knowledge engineer.

REFERENCES

1 Mařík, V et al. 'Application of the FEL-EXPERT System in the diagnosis of genetic diseases' in Carson, R E, Kneppo, P and Krekule, I (eds) Advances in Biomedical Measurement Plenum Press, USA (1988) pp 465-475

2 Duda, R O and Hart, P E Pattern Classification and Scene Analysis John Wiley, USA (1973)

3 Fukunaga, K and Koontz, W L G 'Application of the Karhunen-Loeve expansion to feature selection and ordering' IEEE Trans. Comput. Vol 19 No 4 (1970) pp 311-318

4 Foley, D H and Sammon, J W 'An optimal set of discriminant vectors' IEEE Trans. Comput. Vol 24 No 3 (1975) pp 281-289

5 Quinlan, J R 'Induction over large data bases' HPP-79-14 Computer Science Department, Stanford University, Stanford, CA, USA (1979)

6 Quinlan, J R 'Learning efficient classification procedures and their application to chess end games' in Michalski, R S, Carbonell, J G and Mitchell, T M (eds) Machine Learning: An Artificial Intelligence Approach I Morgan Kaufmann, USA (1983) pp 463-482

7 Mingers, J 'An empirical comparison of pruning methods for decision tree induction' J. Mach. Learning Vol 4 (1989) pp 227-243

8 Mařík, V and Zdráhal, Z 'Decision tree construction by induction' Report No K335-90-15 Czech Technical University, Prague, Czechoslovakia (1990) (in Czech)

9 Quinlan, J R 'Simplifying decision trees' Int. J. Man-Mach. Stud. Vol 27 (1987) pp 221-234

10 Bishop, Y M M, Fienberg, S E and Holland, P W Discrete Multivariate Analysis: Theory and Practice MIT Press, USA (1975)

11 Darroch, J N, Lauritzen, S L and Speed, T P 'Markov fields and log-linear interaction models for contingency tables' Ann. Stat. Vol 8 (1980) pp 522-539

12 Lauritzen, S L and Spiegelhalter, D J 'Fast manipulation of probabilities and local representations - with applications to expert systems' in Proc. AI-Workshop on Inductive Reasoning Roskilde, Denmark (April 1987)

13 Havránek, T 'A procedure for model search in multidimensional contingency tables' Biometrics Vol 40 (1984) pp 95-100

14 Havránek, T and Edwards, D 'A fast procedure for model search in multidimensional contingency tables' Biometrika Vol 72 (1985) pp 332-351

15 Edwards, D and Kreiner, S 'The analysis of contingency tables by graphical models' Biometrika Vol 70 (1983) pp 553-566

16 Kononenko, I 'Semi-naive Bayesian classifier' in Kodratoff, Y (ed) Proc. EWSL-91 (Lecture Notes in Artificial Intelligence Vol 482) Springer-Verlag, Germany (1991) pp 206-219

17 Duda, R O, Hart, P E and Nilsson, N J 'Subjective Bayesian methods for rule-based inference systems' TN 124 SRI International, Stanford, CA, USA (1976)

18 Guiasu, S and Shenitzer, A 'The principle of maximum entropy' The Mathematical Intelligencer Vol 7 (1985) pp 42-48

19 Hájek, P 'Combining functions in consulting systems and dependence of premises (a remark)' in Proc. Artificial Intelligence and Information - Control of Robots - 84 North-Holland, The Netherlands (1984) pp 163-166

20 Jiroušek, R 'A survey of methods used in probabilistic expert systems for knowledge integration' Knowl.-Based Syst. Vol 3 No 1 (1990) pp 7-12

21 Mařík, V, Kouba, Z and Zdráhal, Z 'Knowledge acquisition experiments under the FEL-EXPERT Project' in Plander, I (ed) Artificial Intelligence and Information-Control Systems of Robots North-Holland, The Netherlands (1987) pp 327-331
