Ontology Handbook




1. Ontology Learning

Alexander Maedche1 and Steffen Staab2

1 FZI Research Center for Information Technologies, University of Karlsruhe, Germany, email: [email protected]

2 Institute AIFB, University of Karlsruhe, Germany, email: [email protected]

Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The vision of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured and semi-structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ontology import, extraction, pruning, and refinement, giving the ontology engineer a wealth of coordinated tools for ontology modelling. Besides the general architecture, we show in this paper some exemplary techniques in the ontology learning cycle that we have implemented in our ontology learning environment, Text-To-Onto.

1.1 Introduction

Ontologies are a formal conceptualization of a particular domain that is shared by a group of people. The Semantic Web relies heavily on the formal ontologies that structure underlying data for the purpose of comprehensive and transportable machine understanding. In the context of the Semantic Web, ontologies describe domain theories for the explicit representation of the semantics of Web data. Therefore, the success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy engineering of ontologies and avoidance of a knowledge acquisition bottleneck. Though ontology engineering tools have matured over the last decade, the manual building of ontologies still remains a tedious, cumbersome task which can easily result in a knowledge acquisition bottleneck.

When using ontologies as a basis for Semantic Web applications, one has to face exactly this issue, and in particular questions about development time, difficulty, confidence and the maintenance of ontologies. Thus, what one ends up with is similar to what knowledge engineers have dealt with over the last two decades when elaborating methodologies for knowledge acquisition or workbenches for defining knowledge bases. A method which has proven to be extremely beneficial for the knowledge acquisition task is the integration of knowledge acquisition with machine learning techniques.

The notion of Ontology Learning [1.1] aims at the integration of a multitude of disciplines in order to facilitate the construction of ontologies, in particular machine learning. Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. Because the fully automatic acquisition of knowledge by machines remains in the distant future, the overall process is considered to be semi-automatic with human intervention. It relies on the “balanced cooperative modeling” paradigm, describing a coordinated interaction between human modeler and learning algorithm for the construction of ontologies for the Semantic Web. With this objective in mind, an approach that combines ontology engineering with machine learning is described here, feeding on the resources that we nowadays find on the Web.

Organization. This article is organized as follows. Section 1.2 introduces the architecture for ontology learning and its relevant components. In Section 1.3 we introduce different ontology learning algorithms that serve as a basis for ontology learning for the Semantic Web. Section 1.4 describes how we have implemented our vision of ontology learning in the form of a concrete system called Text-To-Onto.

1.2 An Architecture for Ontology Learning

The purpose of this section is to introduce the ontology learning architecture and its relevant components. Four core components have been identified and embedded into a coherent generic architecture for ontology learning:

– Ontology Management Component: As a basis for ontology learning, this component provides comprehensive ontology management capabilities (e.g. means for including existing ontologies, evolution, etc.) and a graphical user interface to support the ontology engineering steps which are manually performed by the ontology engineer.

– Resource Processing Component: This component contains a wide range of techniques for discovering, importing, analyzing and transforming relevant input data. An important sub-component is a natural language processing system. The general task of the resource processing component is to generate a set of pre-processed data as input for the algorithm library component.

– Algorithm Library Component: This component acts as the algorithmic backbone of the framework. A number of algorithms are provided for the extraction and maintenance of the entities contained in the ontology model. In order to be able to combine the extraction results of different learning algorithms, it is necessary to standardize the output in a common way. Therefore a common result structure for all learning methods is provided. If several extraction algorithms obtain the same results, they are combined in a multi-strategy way and presented to the user only once.

– Coordination Component: The ontology engineer uses this component to interact with the ontology learning components for resource processing and for the algorithm library. Comprehensive user interfaces are provided to the ontology engineer to help select relevant data, apply processing and transformation techniques or start a specific extraction mechanism. Data processing can also be triggered by the selection of an ontology learning algorithm that requires a specific representation. Results are merged using the result set structure and presented to the ontology engineer with different views of the ontology structures.

Fig. 1.1. Ontology Learning Conceptual Architecture

In the following we provide a detailed overview of the four components.

1.2.1 Ontology Management

As core to our approach we build on an ontology management and application infrastructure called KArlsruhe ONtology and Semantic Web Infrastructure (KAON), allowing easy ontology management and application. KAON is based on an ontology model [1.3], derived as an extension of RDF(S) with some proprietary extensions, such as inverse, symmetric and transitive relations, cardinalities, modularization, meta-modeling and representation of lexical information.


KAON is based on an ontology model as introduced in [1.3]. Briefly, the ontology language is based on RDF(S), but with clean separation of modeling primitives from the ontology itself (thus avoiding the pitfalls of self-describing RDFS primitives such as subClassOf), providing means for modelling meta-classes and incorporating several commonly used modelling primitives, such as transitive, symmetric and inverse properties, or cardinalities. All information is organized in so-called OI-models (ontology-instance models), containing both ontology entities (concepts and properties) as well as their instances. This allows grouping concepts with their well-known instances into self-contained units. E.g. a geographical information OI-model might contain the concept Continent along with its seven well-known instances. An OI-model may include another OI-model, thus making all definitions from the included OI-model automatically available.

Fig. 1.2. OI-model example

An OI-model which is of particular interest for ontology learning is the lexical OI-model. It consists of so-called lexical entries that reflect various lexical properties of ontology entities, such as a label, synonym, stem or textual documentation. There is an n : m relationship between lexical entries and instances, established by the property references. Thus, the same lexical entry may be associated with several elements (e.g. the jaguar label may be associated with an instance representing a Jaguar car or a jaguar cat). The value of the lexical entry is given by the property value, whereas the language of the value is specified through the inLanguage property that refers to the set of all languages, whose instances are defined by the ISO standard 639.
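To make the lexical OI-model concrete, here is a minimal sketch in Python. The class and attribute names mirror the properties described above (value, inLanguage, references) but are illustrative only, not the actual KAON API.

```python
class LexicalEntry:
    """A lexical entry in a lexical OI-model (illustrative sketch).

    Mirrors the properties described in the text: `value` holds the
    string, `in_language` an ISO 639 code, and `references` the n:m
    links to ontology elements.
    """

    def __init__(self, value, in_language):
        self.value = value
        self.in_language = in_language
        self.references = set()

    def refer_to(self, element):
        self.references.add(element)


# The same lexical entry may reference several elements: the label
# "jaguar" can point at a car instance and at an animal instance.
jaguar = LexicalEntry("jaguar", "en")
jaguar.refer_to("JaguarCar")
jaguar.refer_to("JaguarCat")
```

The n : m character of references is what allows one label to serve several ontology elements at once, as in the jaguar example above.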


1.2.2 Coordination Component


The ontology engineer uses the coordination component to select input data, i.e. relevant resources such as HTML documents from the Web that are exploited in the further discovery process. Secondly, using the coordination component, the ontology engineer also chooses among a set of resource processing methods available at the resource processing component and among a set of algorithms available in the algorithm library.

1.2.3 Resource Processing

Resource processing strategies differ depending on the type of input data made available:

– Semi-structured documents, like dictionaries, may be transformed into a predefined relational structure as described in [1.1]. HTML documents may be indexed and reduced to free text.

– For processing free natural text our system accesses language-dependent natural language processing systems. E.g. for the German language we have used SMES (Saarbrücken Message Extraction System), a shallow text processor [1.9]. SMES comprises a tokenizer based on regular expressions, a lexical analysis component including various word lexicons, a morphological analysis module, a named entity recognizer, a part-of-speech tagger and a chunk parser. For English, one may build on many available language processing resources, e.g. the GATE system as described in [1.8]. In the actual implementation that is described in Section 1.4 we just used the well-known Porter stemming algorithm as a very simple means for normalizing natural language.

After first preprocessing according to one of these or similar strategies, the resource processing module transforms the data into an algorithm-specific relational representation (see [1.1] for a detailed description).
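As a rough illustration of this normalization step, the following self-contained sketch tokenizes free text, removes stopwords, and applies a naive suffix-stripping rule in place of a full Porter stemmer. The stopword list and stemming rules are toy assumptions of ours, not those of SMES, GATE or the Porter algorithm.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "for"}  # tiny illustrative list


def normalize(text):
    """Tokenize, drop stopwords, and crudely strip plural suffixes.

    A stand-in for the Porter stemming step mentioned in the text;
    real systems would use a full stemmer or a shallow NLP pipeline.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = []
    for t in tokens:
        if t in STOPWORDS:
            continue
        for suffix in ("ies", "es", "s"):  # naive suffix stripping
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)] + ("y" if suffix == "ies" else "")
                break
        stems.append(t)
    return stems
```

The output of such a step is what the subsequent extraction algorithms consume as their relational input.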

1.2.4 Algorithm Library

An ontology may be described by a number of sets of concepts, relations, lexical entries, and links between these entities. An existing ontology definition may be acquired using various algorithms working on this definition and the preprocessed input data. While specific algorithms may greatly vary from one type of input to the next, there is also considerable overlap concerning underlying learning approaches like association rules, pattern-based extraction or clustering. Hence, we may reuse algorithms from the library for acquiring different parts of the ontology definition.

Subsequently, we introduce some of these algorithms available in our implementation. In general, we use a multi-strategy learning and result combination approach, i.e. each algorithm that is plugged into the library generates normalized results that adhere to the ontology structures sketched above and that may be combined into a coherent ontology definition. Several algorithms are described in more detail in the following section.
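The result-combination idea can be sketched as follows. The normalized proposal format (plain tuples with a confidence value) is a simplification assumed for illustration, not the actual result structure of the library.

```python
def combine(results_per_algorithm):
    """Merge normalized proposals from several learning algorithms.

    Identical proposals are merged so they reach the engineer only
    once, remembering which algorithms produced them and the highest
    confidence any of them assigned.
    """
    merged = {}
    for algorithm, results in results_per_algorithm.items():
        for proposal, confidence in results:
            entry = merged.setdefault(proposal, {"by": set(), "max_conf": 0.0})
            entry["by"].add(algorithm)
            entry["max_conf"] = max(entry["max_conf"], confidence)
    return merged


proposals = combine({
    "patterns":   [(("subclass", "Hotel", "Accommodation"), 0.8)],
    "clustering": [(("subclass", "Hotel", "Accommodation"), 0.6),
                   (("subclass", "Room", "Furnishing"), 0.4)],
})
# The Hotel/Accommodation proposal appears once, backed by both algorithms.
```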

1.3 Ontology Learning Algorithms

1.3.1 Lexical Entry & Concept Extraction

A straightforward technique for extracting relevant lexical entries that may indicate concepts is just counting frequencies of lexical entries in a given set of (linguistically processed) documents, the corpus D. In general this approach is based on the assumption that a frequent lexical entry in a set of domain-specific texts indicates a surface of a concept. Research in information retrieval has shown that there are more effective methods of term weighting than simple counting of frequencies. A standard information retrieval approach is pursued for term weighting, based on the following quantities:

– The lexical entry frequency lef_{l,d} is the frequency of occurrence of lexical entry l ∈ L in a document d ∈ D.

– The document frequency df_l is the number of documents in the corpus D that l occurs in.

– The corpus frequency cf_l is the total number of occurrences of l in the overall corpus D.

The reader may note that in general df_l ≤ cf_l and ∑_{d∈D} lef_{l,d} = cf_l. The extraction of lexical entries is based on the information retrieval measure tfidf (term frequency inverted document frequency). Tfidf combines the introduced quantities as follows:

Definition 1.3.1. Let lef_{l,d} be the lexical entry frequency of the lexical entry l in a document d. Let df_l be the overall document frequency of lexical entry l. Then tfidf_{l,d} of the lexical entry l for the document d is given by:

    tfidf_{l,d} = lef_{l,d} ∗ log(|D| / df_l).    (1.1)

Tfidf weighs the frequency of a lexical entry in a document with a factor that discounts its importance when it appears in almost all documents. Therefore terms that appear too rarely or too frequently are ranked lower than terms that hold the balance. A final step that has to be carried out on the computed tfidf_{l,d} is the following: a list of all lexical entries contained in one of the documents from the corpus D, without lexical entries that appear in a standard list of stopwords¹, is produced. The tfidf values for lexical entries l are computed as follows:

Definition 1.3.2.

    tfidf_l := ∑_{d∈D} tfidf_{l,d},   tfidf_l ∈ IR.    (1.2)

The user may define a threshold k ∈ IR⁺ that tfidf_l has to exceed. The lexical entry approach has been evaluated in detail (e.g. for varying k and different selection strategies).
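The two definitions translate directly into code. The sketch below assumes documents have already been reduced to lists of normalized lexical entries with stopwords removed; the function names are ours.

```python
import math


def tfidf_scores(corpus):
    """Aggregate tfidf_l = sum over d of lef_{l,d} * log(|D| / df_l),
    following Definitions 1.3.1 and 1.3.2.

    `corpus` is a list of documents, each a list of lexical entries.
    """
    n_docs = len(corpus)
    df = {}                        # document frequency df_l
    for doc in corpus:
        for l in set(doc):
            df[l] = df.get(l, 0) + 1
    scores = {}                    # adding log(|D|/df_l) once per
    for doc in corpus:             # occurrence equals lef_{l,d} * log(...)
        for l in doc:
            scores[l] = scores.get(l, 0.0) + math.log(n_docs / df[l])
    return scores


def candidate_lexical_entries(corpus, k):
    """Lexical entries whose aggregated tfidf_l exceeds the threshold k."""
    return {l for l, s in tfidf_scores(corpus).items() if s > k}
```

Note that a lexical entry occurring in every document gets log(|D|/df_l) = log(1) = 0 and is discounted entirely, matching the discussion of tfidf above.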

1.3.2 Taxonomic Relation Extraction

Taxonomic relation extraction can be done in many different ways. In our framework we mainly include the following two kinds of approaches:

– Statistics-based extraction using hierarchical clustering
– Lexico-syntactic pattern extraction

In the following we will briefly elaborate on the two approaches.

Hierarchical Clustering. Distributional data about words may be used to build concepts and their embedding into a hierarchy “from scratch”. In this case a certain clustering technique is applied to distributional representations of concepts. Clustering can be defined as the process of organizing objects into groups whose members are similar in some way (see [1.5]). In general there are two major styles of clustering: non-hierarchical clustering, in which every object is assigned to exactly one group, and hierarchical clustering, in which each group of size greater than one is in turn composed of smaller groups. Hierarchical clustering algorithms are preferable for detailed data analysis. They produce hierarchies of clusters, and therefore contain more information than non-hierarchical algorithms. However, they are less efficient with respect to time and space than non-hierarchical clustering². [1.7] identify two main uses for clustering in natural language processing³: the first is the use of clustering for exploratory data analysis, the second is for generalization. Seminal work in this area of so-called distributional clustering of English words has been described in [1.6]. Their work focuses on constructing class-based word co-occurrence models with substantial predictive power. In the following, the existing and seminal work of applying statistical hierarchical clustering in NLP (see [1.6]) is adopted and embedded into the framework.

¹ Available for German at http://www.loria.fr/~bonhomme/sw/stopword.de
² Hierarchical clustering has on average quadratic time and space complexity.
³ A comprehensive survey on applying clustering in NLP is also available in the EAGLES report, see http://www.ilc.pi.cnr.it/EAGLES96/rep2/node37.htm
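The bottom-up clustering step described above can be sketched in a few lines. The toy vectors, the Euclidean distance and the single-linkage choice below are illustrative assumptions; a real system would work with large co-occurrence counts and a library implementation.

```python
def distance(u, v):
    """Euclidean distance between two distributional vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def cluster_distance(a, b, vectors):
    """Single linkage: distance between the closest members."""
    return min(distance(vectors[x], vectors[y]) for x in a for y in b)


def agglomerate(vectors):
    """Repeatedly merge the two closest clusters; the merge history
    is the resulting cluster hierarchy."""
    clusters = [frozenset([w]) for w in vectors]
    history = []
    while len(clusters) > 1:
        pairs = [(cluster_distance(a, b, vectors), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(merged)
    return history


# Toy co-occurrence vectors: "hotel" and "pension" are distributionally
# similar, so they are merged first; "beach" joins only at the top.
vecs = {"hotel": (5, 1), "pension": (4, 1), "beach": (0, 6)}
hierarchy = agglomerate(vecs)
```

The merge history is exactly the extra information that, per the discussion above, hierarchical clustering offers over non-hierarchical clustering.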


Lexico-Syntactic Patterns. The idea of using lexico-syntactic patterns in the form of regular expressions for the extraction of semantic relations, in particular taxonomic relations, has been introduced by [1.11]. Pattern-based approaches in general are heuristic methods using regular expressions that originally have been successfully applied in the area of information extraction. In this lexico-syntactic ontology learning approach the text is scanned for instances of distinguished lexico-syntactic patterns that indicate a relation of interest, e.g. the taxonomic relation. Thus, the underlying idea is very simple: define a regular expression that captures re-occurring expressions and map the results of the matching expression to a semantic structure, such as taxonomic relations between concepts.

Example. This example provides a sample pattern-based ontology extraction scenario. For example, in [1.11] the following lexico-syntactic pattern is considered:

. . . NP {, NP}* {,} or other NP . . .

When we apply this pattern to a sentence, it can be inferred that the NPs referring to concepts on the left of “or other” are subconcepts of the NP referring to a concept on the right. For example, from the sentence

Bruises, wounds, broken bones or other injuries are common.

we extract the taxonomic relations (Bruise, Injury), (Wound, Injury) and (Broken-Bone, Injury).
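The pattern can be approximated with an ordinary regular expression over plain text. Real implementations match over chunk-parsed noun phrases rather than raw strings, so the expression below is a deliberately simplified rendition, not the implementation of [1.11].

```python
import re

# Simplified rendition of the "NP {, NP}* {,} or other NP" pattern:
# everything before "or other" is split on commas into hyponym phrases.
PATTERN = re.compile(r"(.+?),? or other (\w+)")


def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs for the first pattern match."""
    m = PATTERN.search(sentence)
    if not m:
        return []
    hypernym = m.group(2)
    return [(np.strip(), hypernym) for np in m.group(1).split(",")]


pairs = hyponym_pairs("Bruises, wounds, broken bones or other injuries are common.")
# pairs: [('Bruises', 'injuries'), ('wounds', 'injuries'),
#         ('broken bones', 'injuries')]
```

Mapping each extracted pair onto concepts then yields exactly the taxonomic relations of the example above.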

Within our ontology learning system we have applied different techniques to dictionary definitions in the context of the insurance and telecommunication domains, as described in [1.12]. An important aspect in this system and approach is that existing concepts are included in the overall process. In contrast to [1.11], the extraction operations have been performed on the concept level, i.e. patterns have been directly matched onto concepts. The system is thus able not only to extract taxonomic relations from scratch, but also to refine existing relations and refer to existing concepts.

1.3.3 Non-Taxonomic Relation Extraction

Association rules have been established in the area of data mining for finding interesting association relationships among a large set of data items. Many industries have become interested in mining association rules from their databases (e.g. for supporting business decisions such as customer relationship management, cross-marketing and loss-leader analysis). A typical example of association rule mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their shopping baskets. The information discovered by association rules may help to develop marketing strategies, e.g. layout optimization in supermarkets (placing milk and bread within close proximity may further encourage the sale of these items together within single visits to the store). In [1.10] concrete examples for extracted associations between items are given. The examples are based on supermarket products that are included in a set of transactions collected from customers’ purchases. One of the classical association rules that has been extracted from these databases is that “diapers are purchased together with beer”.
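A naive support/confidence computation over such transactions can be sketched as follows. This enumerates item pairs directly and omits the Apriori-style candidate pruning that realistic database sizes require; the basket data is invented for illustration.

```python
from itertools import combinations


def association_rules(transactions, min_support, min_confidence):
    """Return rules (x, y, support, confidence) over item pairs.

    support({x, y}) is the fraction of transactions containing both
    items; confidence(x -> y) = support({x, y}) / support({x}).
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, s, confidence))
    return rules


baskets = [{"diapers", "beer", "milk"}, {"diapers", "beer"},
           {"diapers", "bread"}, {"beer"}]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.6)
# Recovers the classical example: diapers -> beer (support 0.5).
```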

For the purpose of illustration, an example is provided to the reader. The example is based on actual ontology learning experiments as described in [1.2]. A text corpus given by a WWW provider for tourist information has been processed. The corpus describes actual objects referring to locations, accommodations, furnishings of accommodations, administrative information, or cultural events, such as given in the following example sentences.

(1) a. “Mecklenburgs” schönstes “Hotel” liegt in Rostock. (“Mecklenburg’s” most beautiful “hotel” is located in Rostock.)
    b. Ein besonderer Service für unsere Gäste ist der “Frisörsalon” in unserem “Hotel”. (A “hairdresser” in our “hotel” is a special service for our guests.)
    c. Das Hotel Mercure hat “Balkone” mit direktem “Strandzugang”. (The hotel Mercure offers “balconies” with direct “access” to the beach.)
    d. Alle “Zimmer” sind mit “TV”, Telefon, Modem und Minibar ausgestattet. (All “rooms” have “TV”, telephone, modem and minibar.)

Processing the example sentences (1a) and (1b), the dependency relations between the lexical entries are extracted (and some more). In sentences (1c) and (1d) the heuristic for prepositional phrase attachment and the sentence heuristic relate pairs of lexical entries, respectively. Thus, four concept pairs – among many others – are derived with knowledge from the lexicon.

Table 1.1. Examples for Linguistically Related Pairs of Concepts

    L1               a_{i,1}        L2        a_{i,2}
    “Mecklenburgs”   area           hotel     hotel
    “hairdresser”    hairdresser    hotel     hotel
    “balconies”      balcony        access    access
    “room”           room           TV        television

The algorithm for learning generalized association rules uses the concept hierarchy, an excerpt of which is depicted in Figure 1.3, and the concept pairs from above (among many other concept pairs). In our actual experiments, it discovered a large number of interesting and important non-taxonomic conceptual relations. A few of them are listed in Table 1.2. Note that in this table we also list two conceptual pairs, viz. (area, hotel) and (room, television), that are not presented to the user, but that are pruned. The reason is that there are ancestral association rules, viz. (area, accomodation) and (room, furnishing), respectively, with higher confidence and support measures.

Fig. 1.3. An Example Concept Taxonomy as Background Knowledge for Non-Taxonomic Relation Extraction

Table 1.2. Examples of Discovered Non-Taxonomic Relations

    Discovered relation          Confidence   Support
    (area, accomodation)         0.38         0.04
    (area, hotel)                0.10         0.03
    (room, furnishing)           0.39         0.03
    (room, television)           0.29         0.02
    (accomodation, address)      0.34         0.05
    (restaurant, accomodation)   0.33         0.02

1.3.4 Ontology Pruning

Pruning is needed if one adapts a (generic) ontology to a given domain. Again, we make the assumption that the occurrence of specific concepts and conceptual relations in Web documents is vital for the decision whether or not a given concept or relation should remain in the ontology. We take a frequency-based approach, determining concept frequencies in a corpus. Entities that are frequent in a given corpus are considered a constituent of the given domain. But, in contrast to ontology extraction, the mere frequency of ontological entities is not sufficient.

To determine domain relevance, ontological entities retrieved from a domain corpus are compared to frequencies obtained from a generic corpus. The user can select several relevance measures for frequency computation. The ontology pruning algorithm uses the computed frequencies to determine the relative relevance of each concept contained in the ontology. All existing concepts and relations which are more frequent in the domain-specific corpus remain in the ontology. The user can also control the pruning of concepts which are contained neither in the domain-specific nor in the generic corpus. We refer the interested reader to [1.1], where the pruning algorithms are described in further detail.
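The relative-relevance test can be sketched as a comparison of relative frequencies. The simple "more frequent in the domain corpus" criterion below is only one of the several relevance measures mentioned above, and all names and counts are invented for illustration.

```python
def prune(concepts, domain_counts, generic_counts, domain_size, generic_size):
    """Keep a concept if its relative frequency in the domain corpus
    exceeds its relative frequency in the generic corpus."""
    kept = set()
    for concept in concepts:
        domain_rel = domain_counts.get(concept, 0) / domain_size
        generic_rel = generic_counts.get(concept, 0) / generic_size
        if domain_rel > generic_rel:
            kept.add(concept)
    return kept


kept = prune(
    {"hotel", "balcony", "thing"},
    domain_counts={"hotel": 40, "balcony": 5, "thing": 2},
    generic_counts={"hotel": 10, "balcony": 1, "thing": 500},
    domain_size=1_000, generic_size=100_000,
)
# "thing" is no more typical of the tourism domain than of general
# language, so it is pruned; "hotel" and "balcony" remain.
```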

1.4 Implementation

This section describes the implemented ontology learning system Text-To-Onto that is embedded in KAON, the Karlsruhe Ontology and Semantic Web infrastructure. KAON⁴ is an open-source ontology management and application infrastructure targeted at semantics-driven business applications. It includes a comprehensive tool suite allowing easy ontology management and application.

Fig. 1.4. OI-Modeler

Within KAON we have developed OI-modeler, an end-user application that realizes a graph-based user interface for OI-model creation and evolution [1.4]. Figure 1.4 shows the graph-based view onto an OI-model.

A specific aspect of OI-modeler is that it supports ontology evolution at the user level (see right side of the screenshot). The figure shows a modelling session where the user attempted to remove the concept Person. Before applying this change to the ontology, the system computed the set of additional changes that must be applied. The tree of dependent changes is presented to the user, thus allowing the user to comprehend the effects of the change before it is actually applied. Only when the user agrees are the changes applied to the ontology.

⁴ http://kaon.semanticweb.org

Text-To-Onto⁵ has been developed on top of KAON, and specifically on top of OI-modeler. Text-To-Onto provides means for defining a corpus as a basis for ontology learning (see right lower part of Figure 1.5). On top of the corpus various algorithms may be applied. In the example depicted in Figure 1.5 the user has selected different parameters and executed a pattern-based extraction and an association rule extraction algorithm in parallel.

Finally, the results are presented to the user (see Figure 1.5) and graphical means for adding lexical entries, concepts and conceptual relations are provided.

Fig. 1.5. Text-To-Onto

⁵ Text-To-Onto is open-source and available for download at the KAON Web page.


1.5 Conclusion

Until recently, ontology learning per se, i.e. for the comprehensive construction of ontologies, has not existed. However, much work in a number of disciplines (computational linguistics, information retrieval, machine learning, databases, software engineering) has actually researched and practiced techniques for solving part of the overall problem.

We have introduced ontology learning as an approach that may greatly facilitate the construction of ontologies by the ontology engineer. The notion of Ontology Learning introduced in this article aims at the integration of a multitude of disciplines in order to facilitate the construction of ontologies. The overall process is considered to be semi-automatic with human intervention. It relies on the “balanced cooperative modeling” paradigm, describing a coordinated interaction between human modeler and learning algorithm for the construction of ontologies for the Semantic Web.

References

1.1 A. Maedche: Ontology Learning for the Semantic Web. Kluwer Academic Publishers, 2002.

1.2 A. Maedche and S. Staab: Mining ontologies from text. In Proceedings of EKAW-2000, Springer Lecture Notes in Artificial Intelligence (LNAI-1937), Juan-les-Pins, France, 2000.

1.3 B. Motik, A. Maedche and R. Volz: A Conceptual Modeling Approach for Building Semantics-Driven Enterprise Applications. 1st International Conference on Ontologies, Databases and Application of Semantics (ODBASE-2002), California, USA, 2002.

1.4 A. Maedche, B. Motik, L. Stojanovic, R. Studer and R. Volz: Ontologies for Enterprise Knowledge Management. IEEE Intelligent Systems, 2002.

1.5 L. Kaufman and P. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, 1990.

1.6 F. Pereira, N. Tishby and L. Lee: Distributional Clustering of English Words. In Proceedings of ACL-93, 1993.

1.7 C. Manning and H. Schuetze: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Massachusetts, 1999.

1.8 H. Cunningham, R. Gaizauskas, K. Humphreys and Y. Wilks: Three Years of GATE. In Proceedings of the AISB'99 Workshop on Reference Architectures and Data Standards for NLP, Edinburgh, U.K., April 1999.

1.9 G. Neumann, R. Backofen, J. Baur, M. Becker and C. Braun: An Information Extraction Core System for Real World German Text Processing. In Proceedings of ANLP-97, Washington, USA, 1997.

1.10 R. Agrawal, T. Imielinski and A. Swami: Mining Associations between Sets of Items in Massive Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993.

1.11 M. Hearst: Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, Nantes, France, 1992.

1.12 J.-U. Kietz, R. Volz and A. Maedche: Semi-automatic ontology acquisition from a corporate intranet. In International Conference on Grammar Inference (ICGI-2000), Lecture Notes in Artificial Intelligence, LNAI, 2000.