
I SBAI - UNESP - Rio Claro/SP - Brasil

The Role of Description Languages in Inductive Concept Learning

Maria do Carmo Nicoletti Universidade Federal de São Carlos/ILTC/IFQSC-USP

Departamento de Computação Via Washington Luiz, km 235

Caixa Postal 676, 13565-905 - São Carlos, SP

e-mail: [email protected]

Maria Carolina Monard¹

Universidade de São Paulo/ILTC Instituto de Ciências Matemáticas de São Carlos

Departamento de Ciências de Computação e Estatística Caixa Postal 668, 13560-970 - São Carlos, SP

e-mail: [email protected]

Abstract

In Machine Learning systems the languages employed for describing instances and concepts can vary in representational power - the representational power of these languages defines the boundaries of the learning process. A great deal of work on inductive learning has employed propositional calculus (or some variation), because of its well-defined syntax and semantics. Despite their success, propositional learning systems are limited by the language used to describe examples and concepts, as well as by the very limited and inexpressive form in which background knowledge can be used in the learning process. Attempts to overcome some of the representational limitations include:

• allowing the use of background knowledge (domain theory) and

• providing a more powerful concept description language, namely the language of logic programs.

The learning of logic programs from examples is a new paradigm in the field of Machine Learning, known as Inductive Logic Programming - ILP, which consists basically in inducing first-order logic programs from examples and background knowledge. This work is concerned with the discussion of the main limitations related to propositional representation in machine learning and with the presentation of some principles of ILP.

1 Introduction

In order to alleviate the knowledge-acquisition bottleneck for the construction of Expert Systems, it became apparent that new techniques capable of constructing rules from examples could considerably ease this problem.

The last decade has seen a great expansion of such techniques, which are gathered under the name of Machine Learning - ML - and considered a sub-branch of Artificial Intelligence - AI.

The field of Machine Learning aims to deveIop methods and techniques to automate the acquisition of new information, new skills and new ways of organizing existing information. Several tasks have

¹Work partially supported by the National and State Research Councils - CNPq and FAPESP.


been addressed in the ML literature, such as learning from examples, learning heuristics, learning by observation, etc. ML is also centered around different paradigms, among them the symbolic concept-oriented learning paradigm.

The most common paradigm for symbolic learning is inductive learning from examples, using propositional calculus (or some variation) to represent both training examples and induced hypotheses (or concepts).

Despite their success, propositional learning systems are limited by the language used to describe examples and concepts, as well as by the very limited and inexpressive form in which background knowledge can be used in the learning process.

Although many difficulties need to be overcome, there is an increasing interest in systems which induce first-order logic programs from examples. This new trend in the field of Machine Learning is known as Inductive Logic Programming and is defined as [6]

Inductive Logic Programming = Logic Programming ∩ Induction

In ILP systems the languages used to represent examples, background knowledge and concept descriptions are typically subsets of first-order logic - namely, logic programs. In fact, learning in ILP can be considered logic program synthesis.

The main objective of this work is to discuss the main drawbacks inherent to propositional representation in ML and to introduce some principles of a more powerful framework for learning, namely ILP.

The work is organized as follows: in Section 2 the basic concepts related to inductive learning are introduced; in Section 3 the advantages and limitations of an attribute-based language are discussed; Section 4 presents ILP as an inductive inference paradigm and points out its representational power, as well as some constraints necessary to improve its operationality; Section 5 presents our conclusions and some guidelines for future research work.

2 Inductive Learning

The most widely adopted and studied paradigm for symbolic learning is known as inductive learning from examples and consists of inducing a general concept description from a given set of examples - positive examples - and (usually) counterexamples - negative examples - of the concept.

The learning task consists in building a concept description - hypothesis - from which all the positive examples can be rederived by universal instantiation, but none of the previous negative examples can be rederived by the same process [2]. In order to define inductive learning from examples formally, it is necessary first to introduce some basic concepts [1] [6] [7].

Definition 2.1 If U is a universal set of objects, a concept C is a subset of objects in U.

Learning a single concept C means acquiring the ability to recognize whether x ∈ C, ∀x ∈ U, and can be stated as:

Definition 2.2 Given the sets E+ and E- of positive and negative examples, respectively, of a concept C, find a hypothesis H, expressed in a given concept description language L, such that:


• every positive example e ∈ E+ is covered² by H

• no negative example e ∈ E- is covered by H

To test the coverage, a function

covers(H, e)

can be defined, such that

covers(H, e) = true if e is covered by H
covers(H, e) = false otherwise

In order to deal with sets of examples, the function covers can be redefined as:

covers(H, E) = {e ∈ E | covers(H, e) = true}.

It should be noticed that the definition of the function covers depends on the example and concept description languages, discussed in the next sections.

However, for any given set of positive and negative examples, a potentially infinite number of hypotheses can generally be generated that cover all the positive examples and no negative examples; it is therefore necessary to constrain the space of possible inductive hypotheses using additional information, such as prior domain knowledge, known as background knowledge (or domain theory). In the presence of background knowledge, the learning task can be formally defined as:

Definition 2.3 Given the set E = E+ ∪ E- of positive and negative examples, respectively, of a concept C and background knowledge B, find a hypothesis H, expressed in a given concept description language L, such that H is complete and consistent with respect to the background knowledge B and the examples E.

The function covers, redefined in order to take the background knowledge B into account, becomes

covers(B, H, e) = covers(B ∪ H, e)

covers(B, H, E) = covers(B ∪ H, E)

The induced hypothesis H should satisfy two requirements: it should be complete and consistent with respect to the set of examples E. In the presence of background knowledge, the concepts of completeness and consistency of a hypothesis H can be stated as:

Definition 2.4 A hypothesis H is complete with respect to background knowledge B and examples E if all the positive examples are covered, i.e., if covers(B, H, E+) = E+.

Definition 2.5 A hypothesis H is consistent with respect to background knowledge B and examples E if no negative example is covered, i.e., if covers(B, H, E-) = ∅.
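As an illustration only (not part of the original formulation), these tests can be sketched in Prolog, assuming the clauses of B and H are loaded in the database and the examples are stored as pos/1 and neg/1 facts - hypothetical predicate names chosen for this sketch:

    % covers(E): with B and H loaded as clauses, an example E is
    % covered exactly when it can be derived; call/1 attempts that.
    covers(E) :- call(E).

    % complete: every positive example is covered (Definition 2.4).
    complete :- \+ (pos(E), \+ covers(E)).

    % consistent: no negative example is covered (Definition 2.5).
    consistent :- \+ (neg(E), covers(E)).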

In a computer system the learning task cannot go beyond the limits defined by the languages it employs. The languages in which examples, hypotheses and background knowledge are expressed characterize the scope and define the limits of the learning task. The next sections discuss the role of the description languages in the learning process.

²A concept description covers an object description if the description of the object satisfies the description of the concept.


3 Attribute-based Language

Machine learning systems generally employ formal languages for describing instances and concepts, referred to as the instance description language and the concept description language, respectively.

In order to represent instances - training examples - many of the existing inductive learning algorithms use an attribute-based language. The concept description languages used for expressing the induced hypotheses, which are typically production rules or decision trees, can be treated as variants of attribute-based languages.

An attribute is a possibly relevant characteristic of the concept to be learned, and training examples are described as vectors of attribute³-value pairs together with a class label. In an attribute-based formalism, the collection of attributes used to describe the examples is fixed. Each example belongs to one of a set of mutually exclusive classes. The attributes are often discrete (nominal, ordinal or interval) and sometimes information about the attribute names and their value ranges is also provided. In an attribute-based formalism, the learning task can be stated as follows:

given a training set of examples expressed as vectors of attribute-value pairs, whose classes are known, find a rule for predicting the class of a new example as a function of its attribute values.

Among the many advantages of using an attribute-value language, one may find:

• relatively successful systems available

• the inductive process is very well understood

• relatively simple approach to learning

• generally easy to understand by final users

• existence of procedures for dealing with noisy and incomplete data

Among the many systems which employ attribute-based languages, two families of systems, namely TDIDT (Top Down Induction of Decision Trees) and AQ, based on the ID3 [16] and AQ [9] algorithms respectively, have been particularly successful.

An illustration of the power of inductive inference is given by the following example, related to the WILLARD Expert System [10], which predicts the likelihood of severe thunderstorms occurring in the central USA.

Table 1, contained in a widely used meteorological manual, shows the three-attribute training examples used to determine the expected change of lapse rate. From these examples, the rule generated by inductive inference used only one of the three given attributes. The rule is:

If the thickness (distance between the two constant pressure surfaces)

1. shrinks, the lapse rate becomes more stable

2. stretches, the lapse rate becomes less stable

3. does not change, the lapse rate does not change

This simple rule, which is correct and consistent with a physical model of the atmosphere, was never discovered by meteorologists, although the table had appeared for years in standard texts.
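As a purely illustrative rendering (the predicate names example/4 and lapse_rate/2 are invented here, not taken from WILLARD), the training examples of Table 1 and the induced one-attribute rule could be written as Prolog facts:

    % example(Divergence, Motion, Thickness, ChangeInLapseRate):
    % three attribute values followed by the class label, one fact per row.
    example(divergent,  descending, shrinking,  more_stable).
    example(divergent,  ascending,  stretching, less_stable).
    example(convergent, descending, no_change,  no_change).
    % ... remaining rows of Table 1 ...

    % The induced rule depends only on the Thickness attribute:
    lapse_rate(shrinking,  more_stable).
    lapse_rate(stretching, less_stable).
    lapse_rate(no_change,  no_change).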

Apart from the valuable predictive contribution of attribute-based hypotheses, it should be noticed that:

³The term feature is often employed as a synonym for attribute.


Vertical Divergence   Vertical Motion   Vertical Thickness   Change in Lapse Rate
divergent             descending        shrinking            more stable
divergent             ascending         shrinking            more stable
divergent             ascending         stretching           less stable
convergent            descending       shrinking            more stable
convergent            descending       stretching           less stable
convergent            ascending        stretching           less stable
divergent             none             shrinking            more stable
convergent            none             stretching           less stable
divergent             ascending        no change            no change
convergent            descending       no change            no change
none                  ascending        stretching           less stable
none                  descending       shrinking            more stable

Table 1: Table found in NTIS Air Weather Service Manual used in lapse rate determination

• if the training examples are provided by an unreliable source (e.g. unreliable experts), there may be no hypotheses consistent with the training examples.

• the quality⁴ of the obtained hypotheses is a reflection of the representativeness of the training set - the hypotheses will only accurately classify new examples if the attributes and training examples contain information that actually discriminates among different classes.

• the selection of the initial set of attributes which describes the examples is essential to the construction of a meaningful concept - sometimes during this selection it is better to include too many attributes, because those that are not relevant will be discarded by the inductive process.

• an incorrect choice of attributes becomes evident when the addition of new examples to the training set produces contradictions. A contradiction is a set of examples that have the same attribute-value pairs but express different concepts.

• if the selected attributes are irrelevant to the learning task, no learning system will be able to construct a meaningful hypothesis.

It is possible that no rule or set of rules, within the space of rules defined by the description language, can completely and correctly classify all possible situations. This is quite common in medical domains, for example, where there may be several possible disease categories for a given set of observations and where further observations (i.e. a more complete description) are not possible for reasons of time or other costs [3].

One attempt to overcome the limited expressiveness of an attribute-based language is through the use of constructive induction, a process that changes the description space - sometimes the initial attributes used to describe the examples are not directly relevant to the concept but, when conveniently combined to generate new attributes, can become highly representative for the expression of the concept.

Constructive induction can be defined as any form of induction that generates new attributes not present in the initial training set. Through constructive induction the vocabulary of the description languages is enlarged, which provokes a change of the instance and concept representational spaces.

The process of constructing new attributes can be performed automatically by the system or directed interactively by the user. In either case, depending on the knowledge domain, such construction can be a very difficult task, since it involves a search through the space of all possible constructive operators (used for combining old attributes into new ones).
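As a minimal sketch of a constructive operator (the predicates below are assumptions of this sketch, not constructs from the cited systems), consider deriving a new boolean attribute that tests whether two original attributes carry the same value:

    % construct_eq(+A, +B, -New): a toy constructive operator building a
    % new boolean attribute that tests whether A and B have the same value.
    construct_eq(A, B, true)  :- A == B.
    construct_eq(A, B, false) :- A \== B.

    % Extending an example vector [A, B | Rest] with the new attribute:
    extend_example([A, B | Rest], [A, B, New | Rest]) :-
        construct_eq(A, B, New).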

⁴Measured by its predictive power on new examples.


Constructive induction has been used with success in the problem of learning boolean functions with a short DNF⁵ representation, using decision trees as a concept description language [15] [8] [12] [13]. However, due to the fact that it is highly dependent on the knowledge domain, constructive induction still remains an open problem with very few practical results.

Despite their success, attribute-value learning methods (with or without constructive induction) are limited by the language used to describe examples and concepts, as well as by the very limited and inexpressive form in which background knowledge can be used in the learning process. An attribute-based language has the same expressive power as a language based on propositional logic. Consequently, only those concepts expressible in propositional logic are candidates to be learned by a system which uses an attribute-based language. This strong constraint prevents representing structured objects as well as relations among objects or among their components - relevant aspects of the training instances, which could somehow characterize the concept being learned, cannot be represented. Furthermore, some relational descriptions stated in an attribute-based language can be lengthy, inexact and sometimes completely incomprehensible to humans. This aspect makes such languages inappropriate for use in certain domains.

For these reasons, in many practical applications attribute-based representations are insufficient. This includes domains where relational descriptions need to be expressed, as in medical diagnosis.

4 First-order Logic Based Language

In order to overcome the representational limitations imposed by an attribute-based language, learning in representations that are more powerful, such as some variants of first-order logic, has recently received more attention. An increasing amount of research can be noticed on:

• extending the representational power of machine learning systems by using restricted first-order logics as concept/instance description languages and

• effectively incorporating background knowledge to induce more compact hypotheses.

The difference in representational power between an attribute-based language and a first-order-based language can be evidenced through the following simple examples.

Example 4.1 Let A and B be two boolean attributes. Expressing that A and B have the same value:

• in an attribute-based language

    (A = false) ∧ (B = false) ∨ (A = true) ∧ (B = true)

• in a first-order language

    A = B

Example 4.2 Learning the concept of a pair in poker

⁵Disjunctive Normal Form.


Suit1   Rank1   Suit2   Rank2   Class
♦       7       ♣       7       pair
♠       Q       ♥       Q       pair
♣       A       ♦       J       no_pair

• in an attribute-based language

    pair if (Rank1 = 7 ∧ Rank2 = 7) ∨
            (Rank1 = Q ∧ Rank2 = Q) ∨ ...

• in a first-order logic based language

    pair if Rank1 = Rank2
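In Prolog the first-order version becomes a single clause in which variable sharing expresses the equality of ranks - card/4 and pair/1 are hypothetical names used only in this sketch:

    % A hand is described as card(Suit1, Rank1, Suit2, Rank2);
    % the repeated variable R states Rank1 = Rank2 by unification.
    pair(card(_, R, _, R)).

    % ?- pair(card(diamonds, 7, clubs, 7)).   succeeds
    % ?- pair(card(clubs, a, diamonds, j)).   fails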

First-order logic expressions can be considered a natural way of expressing relations in a concise form, as well as a convenient way of describing information related to structured objects.

It is well known that logic programming is a traditional and sound area of research, based on the use of first-order logic to define computer programs. Logic programming can be defined broadly as the use of symbolic logic for the explicit representation of problems and their associated knowledge bases, together with the use of controlled logical inference for the effective solution of those problems. At present, logic programming is generally understood in more specific terms: the problem-representation language is a particular subset (Horn-clause form) of classical first-order predicate logic, and the problem-solving mechanism is a particular form (resolution) of classical first-order inference [5]. One of the practical realizations of logic programming is a programming language called Prolog, which can be considered a LUSH resolution theorem prover with a depth-first search strategy.

Inductive Logic Programming is an attempt to integrate the techniques already available and established for logic programming into a framework of learning, aiming to induce first-order logic programs from examples, using background knowledge. In ILP the system's knowledge consists of examples and background knowledge expressed as a logic program - the expressiveness of logic programs and the use of background knowledge have promoted ILP as a powerful inductive inference paradigm. Consider next a simple example of an ILP problem [6].

Example 4.3 The learning task is to express the target relation daughter(X, Y), which states that person X is a daughter of person Y, in terms of the background knowledge relations female/1 and parent/2. For the learning of relation daughter/2, four ground instances (2 positive and 2 negative) of this relation are given. The training examples and background knowledge are shown in Table 2.

Using Horn clauses for the expression of the hypothesis, it is possible to construct the following relation

daughter(X, Y) ← female(X), parent(Y, X),

which is consistent and complete with respect to the background knowledge and the training examples.


Training examples

daughter(mary, ann).  ⊕
daughter(eve, tom).   ⊕
daughter(tom, ann).   ⊖
daughter(eve, ann).   ⊖

Background knowledge

parent(ann, mary).   female(ann).
parent(ann, tom).    female(mary).
parent(tom, eve).    female(eve).
parent(tom, ian).

Table 2: A simple ILP problem: learning the daughter relation
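Since the clauses of Table 2 are already Horn clauses, the example can be loaded into a Prolog system essentially as-is (writing ':-' for '←'); the queries below, a sketch, confirm completeness and consistency on the four training examples:

    % Background knowledge (Table 2)
    parent(ann, mary).    female(ann).
    parent(ann, tom).     female(mary).
    parent(tom, eve).     female(eve).
    parent(tom, ian).

    % Induced hypothesis
    daughter(X, Y) :- female(X), parent(Y, X).

    % ?- daughter(mary, ann).   succeeds (positive example covered)
    % ?- daughter(eve, tom).    succeeds (positive example covered)
    % ?- daughter(tom, ann).    fails    (negative example rejected)
    % ?- daughter(eve, ann).    fails    (negative example rejected)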

The ILP framework goes beyond the more established empirical learning framework because of the use of a quantified relational logic together with background knowledge. It goes beyond the explanation-based learning framework due to the lack of insistence on complete and correct background knowledge [11]. It is important to notice that most researchers have used standard first-order logic as a representation language; in particular, most have used restricted subsets of the Prolog programming language to represent concepts. One advantage of basing learning systems on Prolog is that its semantics and complexity are mathematically well understood [4].

ILP techniques face a challenging problem - how to reduce the hypothesis space so as to make the learning problem feasible, avoiding combinatorial explosion during the search process.

Most systems apply strong restrictions on description languages and background knowledge, aiming to limit the hypothesis space. These restrictions end up provoking, as an undesirable side effect, a restriction of the space of concepts learnable by the system.

Among the many possible limitations imposed on the description languages employed by some ILP systems, one may find restrictions which would only allow the learning of:

• special kinds of clauses

• clauses without function symbols

• non-recursive clauses

• clauses with a limited number of variables, etc.
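As a toy illustration of how such a syntactic bias can be enforced (an assumption of this sketch, not a mechanism of any particular system), a filter accepting only clauses with at most N distinct variables could be written as:

    % within_variable_limit(+Clause, +N): succeeds when Clause mentions
    % at most N distinct variables (term_variables/2 is standard Prolog).
    within_variable_limit(Clause, N) :-
        term_variables(Clause, Vars),
        length(Vars, K),
        K =< N.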

The restrictions on background knowledge are basically those related to constraining its scope, allowing only the use of ground models [14]. Even restricted, the background knowledge should be controlled so that its use does not provoke combinatorial explosion when searching the hypothesis space.

ILP, as a young area of research, has a long way to go in order to establish itself as a useful paradigm for practical applications. As a promising and evolving learning technique, ILP has to invest in reducing the number of restrictions imposed on description languages and the use of background knowledge, in order to become an effective and well-established learning paradigm applicable to a large set of problems.

5 Conclusions

In this work we discussed the limitations of the propositional approach used in Machine Learning. These limitations can be overcome using systems which induce first-order logic programs from examples, namely Inductive Logic Programming. However, the adoption of a more powerful description language gives rise to many difficulties that need to be overcome, since learning logical definitions requires the exploration of a very large space of theory descriptions and, consequently, restrictions should be imposed in order to make learning a feasible task.

Research work on ILP should be concerned with the development of techniques which would relieve ILP from the burden of so many restrictions, without compromising its efficiency or limiting its applicability.

Acknowledgements: We would like to thank Dr. Nada Lavrač, from the Jožef Stefan Institute, Slovenia, for sending us a preprint version of [6] and the Proceedings of the Third International Workshop on Inductive Logic Programming - ILP'93, as well as Dr. Igor Mozetič, for the Proceedings of the European Conference on Machine Learning, ECML-93.

References

[1] Bratko, I. Machine Learning. In Gilhooly, K. (ed.), Human and Machine Problem Solving. Plenum Press, New York, 1989.

[2] Carbonell, J.G. Paradigms for Machine Learning. Artificial Intelligence 40, pp 1-9, 1989.

[3] Clark, P.; Niblett, T. Induction in Noisy Domains. In I. Bratko & N. Lavrač (eds.), Progress in Machine Learning. Sigma Press, pp 11-30, 1987.

[4] Cohen, W.W. Learnability of Restricted Logic Programs. Proceedings of the Third International Workshop on Inductive Logic Programming - ILP'93, Bled, Slovenia, April, pp 1-3, 1993.

[5] Kowalski, R.A.; Hogger, C.J. Logic Programming. In Encyclopedia of Artificial Intelligence, S.C. Shapiro; D. Eckroth; G.A. Vallasi (eds.), John Wiley & Sons, New York, pp 544-558, 1987.

[6] Lavrač, N.; Džeroski, S. Inductive Logic Programming: Techniques and Applications. To appear.

[7] Lavrač, N.; Džeroski, S. Tutorial on Inductive Logic Programming (ILP). SCAI-93, Stockholm, Sweden, May 1993.

[8] Matheus, C.J. Feature Construction: an Analytic Framework and an Application to Decision Trees. Report UIUCDCS-R-89-1559, UILU-ENG-89-1470, December 1989.

[9] Michalski, R.S.; Larson, J. Selection of Most Representative Training Examples and Incremental Generation of VL1 Hypotheses: the Underlying Methodology and the Description of Programs ESEL and AQ11. Report ISG 83-5, Dept. of Computer Science, University of Illinois, Urbana, USA, 1983.

[10] Muggleton, S.H. Inductive Acquisition of Expert Knowledge. Turing Institute Press, Glasgow, 1990.

[11] Muggleton, S.H. Inductive Logic Programming: Derivations, Successes and Shortcomings. In Brazdil, P. (ed.), Lecture Notes in Artificial Intelligence 667, 1993.

[12] Nicoletti, M.C.; Bezerra, P.C.; Monard, M.C. Learning Boolean Functions Using Prime-implicants as Features. Proceedings of the International AMSE Conference on "Signals, Data, Systems", Chicago, USA, September 2-4, AMSE Press, Vol. 1, pp 177-183, 1992.

[13] Nicoletti, M.C.; Monard, M.C. Empirical Evaluation of Two Pruning Methods Applied to Constructive Learning of Boolean Functions. Anales del Primer Congreso Internacional de Informática, Computación y Teleinformática, INFORMATICA'93, Mendoza, Argentina, pp 33-42, June 1993.


[14] Nicoletti, M.C.; Monard, M.C. Herbrand Interpretation, Model and Least Model within the Framework of Logic Programming. Notas do ICMSC-USP, Nº 2, 30 pp, June 1993.

[15] Pagallo, G.; Haussler, D. Boolean Feature Discovery in Empirical Learning. Machine Learning 5, pp 71-99, 1990.

[16] Quinlan, J.R. Discovering Rules by Induction from Large Collections of Examples. In D. Michie (ed.), Expert Systems in the Micro-electronic Age, Edinburgh University Press, 1979.
