Ontology research and development. Part 1 – a review of ontology generation

Journal of Information Science, 28 (2) 2002, pp. 123–136. DOI: 10.1177/016555150202800204. Online version: http://jis.sagepub.com/content/28/2/123. Published by SAGE Publications (http://www.sagepublications.com) on behalf of the Chartered Institute of Library and Information Professionals. Version of Record: 1 April 2002.


Ying Ding and Schubert Foo

Vrije Universiteit, Amsterdam and Nanyang Technological University, Singapore

Received 25 June 2001; revised 8 October 2001

Abstract.

Ontology is an important emerging discipline that has huge potential to improve information organization, management and understanding. It has a crucial role to play in enabling content-based access, interoperability and communications, and in providing qualitatively new levels of service on the next wave of web transformation in the form of the Semantic Web. The issues pertaining to ontology generation, mapping and maintenance are critical key areas that need to be understood and addressed. This survey is presented in two parts. The first part reviews the state-of-the-art techniques and work done on semi-automatic and automatic ontology generation, as well as the problems facing such research. The second, complementary survey is dedicated to ontology mapping and ontology ‘evolving’. Through this survey, we have identified that shallow information extraction and natural language processing techniques are deployed to extract concepts or classes from free text or semi-structured data. However, relation extraction is a very complex and difficult issue to resolve, and it has turned out to be the main impediment to ontology learning and applicability. Further research is encouraged to find appropriate and efficient ways to detect or identify relations through semi-automatic and automatic means.

1. Introduction

Ontology is the term referring to the shared understanding of some domain of interest, which is often conceived as a set of classes (concepts), relations, functions, axioms and instances [1]. In the knowledge representation community, the commonly used and highly cited ontology definition is adopted from Gruber [1], where an ‘ontology is a formal, explicit specification of a shared conceptualization. “Conceptualization” refers to an abstract model of phenomena in the world by having identified the relevant concepts of those phenomena. “Explicit” means that the type of concepts used, and the constraints on their use, are explicitly defined. “Formal” refers to the fact that the ontology should be machine readable. “Shared” reflects that ontology should capture consensual knowledge accepted by the communities’.

Ontology is a complex multi-disciplinary field that draws upon the knowledge of information organization, natural language processing, information extraction, artificial intelligence, and knowledge representation and acquisition. Ontology is gaining popularity and is touted as an emerging technology that has huge potential to improve information organization, management and understanding. In particular, when ontology provides a shared framework of the common understanding of specific domains that can be communicated between people and application systems, it can have a significant impact on areas dealing with vast amounts of distributed and heterogeneous computer-based information, such as those residing on the world wide web and intranet information systems, complex industrial software applications, knowledge management, electronic commerce and e-business. For instance, ontology has played a strategic role in agent communication [2]; ontology mapping has the capability to break the bottleneck of the business-to-business marketplace [3]; and ontology is the enabler to improve intranet-based knowledge management systems [4]. Ontology itself is an explicitly defined reference model of application domains with the purpose of improving information consistency and reusability, systems interoperability and knowledge sharing. It describes the semantics of a domain in both a human-understandable and computer-processable way.

Correspondence to: Ying Ding, Division of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands. E-mail: [email protected]

Developments at the World Wide Web Consortium (W3C) indicate that the first generation of the world wide web will in future make the transition into the second generation, the Semantic Web [5, 6]. The term ‘Semantic Web’ was coined by Tim Berners-Lee, the inventor of the world wide web, to describe his vision of the next generation web that provides services that are much more automated, based on machine-processable semantics of data and heuristics [7]. Ontologies that provide shared and common domain theories will become a key asset for this to happen. They can be seen as metadata that explicitly represent the semantics of data in a machine-processable way. Ontology-based reasoning services can operationalize such semantics and be used to provide various forms of services (for instance, consistency checking, subsumption reasoning, query answering, etc.). Ontologies help people and computers to access the information they need, and effectively communicate with each other. They therefore have a crucial role to play in enabling content-based access, interoperability and communication across the web, providing it with a qualitatively new level of service such as proposed in the Semantic Web [8].

Ontology learning is starting to emerge as a sub-area of ontology engineering, due to the rapid increase of web documents and the advanced techniques shared by the information retrieval, machine learning, natural language processing and artificial intelligence communities. The majority of existing ontologies have been generated manually. Generating ontologies in this manner has been the normal approach undertaken by most ontology engineers. However, this process is very time-intensive and error-prone, and poses problems in maintaining and updating ontologies. For this reason, researchers are looking for other alternatives to generate ontologies in more efficient and effective ways. This survey aims to provide an insight into this important emerging field of ontology, and highlights the main contributions of ontology generation, mapping and evolving,* whose inter-relationships are shown in Fig. 1.

The survey is carried out over two parts, namely a state-of-the-art survey on ontology generation and a state-of-the-art survey on ontology mapping and evolving. In this first part of the survey on ontology generation, the areas of semi-automatic or automatic ontology generation will be covered. A subsequent paper will report on ontology mapping and evolving.


*Ontology evolving, although an incorrect use of the English language, has become an accepted buzzword in the ontology field, as in ontology mapping and ontology versioning.

Fig. 1. General overview of ontology generation, mapping and evolving.


2. Ontology generation in general

Although large-scale ontologies already exist, ontology engineers are still needed to construct the ontology and knowledge base for a particular task or domain, and to maintain and update the ontology to keep it relevant and up-to-date. Manually constructed ontologies are time-consuming, labour-intensive and error-prone. Moreover, a significant delay in updating ontologies, causing currency problems, actually hinders the development and application of the ontologies.

The starting point for creating an ontology could arise from different situations. An ontology can be created from scratch; from existing ontologies (whether global or local) only; from a corpus of information sources only; or from a combination of the latter two approaches [9]. Various degrees of automation could be used to build ontologies, ranging from fully manual through semi-automated to fully automated. At present, the fully automated method only functions well for very lightweight ontologies under very limited circumstances.

Normally, methods to generate an ontology can be summarized as: bottom-up – from specification to generalization; top-down – from generalization to specification (e.g. the KACTUS ontology); and middle-out – from the most important concepts to generalization and specialization (e.g. the Enterprise ontology and Methodology ontology [10]). Most often, lifting algorithms are used to lift and derive different levels of ontologies from a basic ontology [11].

There are also a number of general ontology design principles that have been proposed by different ontology engineers over a period of time:

• Guarino [12] was inspired by philosophical research and proposed a methodology for ontology design known as ‘Formal Ontology’ [13]. This design principle contains a theory of parts, a theory of wholes, a theory of identity, a theory of dependence and a theory of universals. He summarized the basic design principles that include the need to: (1) be clear about the domain; (2) take identity seriously; (3) isolate a basic taxonomic structure; and (4) identify roles explicitly.

• Uschold and Gruninger [14] proposed a skeletal methodology for building ontologies via a purely manual process: (1) identify purpose and scope; (2) build the ontology via a three-step process – ontology capture (identification of the key concepts and relationships, and provision of the definitions of such concepts and relationships), ontology coding (committing to the basic terms for the ontology (class, entity, relation), choosing a representation language, and writing the code), and integrating existing ontologies; (3) evaluation [15]; (4) documentation; and (5) guidelines for each of the previous phases. The final resulting ontology should be clear (definitions should be maximally clear and unambiguous), consistent and coherent (an ontology should be internally and externally consistent), and extensible and reusable (an ontology should be designed in such a way as to maximize subsequent reuse and extensibility).

• Ontological design patterns (ODPs) [16] were used to abstract and identify ontological design structures, terms, larger expressions and semantic contexts. These techniques can separate the construction and definition of complex expressions from their representation, so as to change them independently. This method was successfully applied in the integration of molecular biological information [16].

Hwang [2] proposed a number of desirable criteria for the final generated ontology: it should be (1) open and dynamic (both algorithmically and structurally, for easy construction and modification); (2) scalable and interoperable; (3) easily maintained (an ontology should have a simple, clean structure as well as being modular); and (4) context independent.

The remaining sections highlight the major contributions and projects that have been reported with respect to ontology generation. For each project, an introduction and background is first provided. This is followed by a description of the methods employed, and concludes with a summary of problems that have surfaced during the process of ontology generation.

2.1. InfoSleuth (MCC)

InfoSleuth is a research project at MCC (Microelectronics and Computer Technology Corporation) to develop and deploy new technologies for finding information available in both corporate networks and external networks. It focuses on the problems of locating, evaluating, retrieving and merging information in an environment in which new information sources are constantly being added. It is a project aiming to build up an ontology-based agent architecture. It has been successfully implemented in different application areas that include Knowledge Management, Business Intelligence, Logistics, Crisis Management, Genome Mapping, environmental data exchange networks, and so on.


Methods. The procedure for automatic generation of the ontology adopted in InfoSleuth is as follows [2]:

• Human experts provide the system with a small number of seedwords that represent high-level concepts. Relevant documents are collected from the web (with POS-tagged or otherwise unmarked text) automatically.

• The system processes the incoming documents, extracts phrases containing seedwords, generates corresponding concept terms and places them in the ‘right’ place in the ontology. At the same time, it also collects candidate seedwords for the next round of processing. The iteration continues until a set of results is reached.

• Several kinds of relations are extracted – ‘is-a’, ‘part-of’, ‘manufactured-by’, ‘owned-by’, etc. The ‘assoc-with’ relation is used to cover all relations that are not an ‘is-a’ relation. The distinction between ‘is-a’ and ‘assoc-with’ relations is based on a linguistic property of noun compounds.

• In each iteration, a human expert is consulted to ascertain the correctness of the concepts. If necessary, the expert has the right to make corrections and reconstruct the ontology. Figure 2 shows an example of the structure of a generated ontology.
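The iterative procedure above can be sketched in a few lines. The document-collection, phrase-extraction and expert-review steps are hypothetical stand-ins passed in as functions, since the paper does not specify how InfoSleuth implements them:

```python
# Sketch of an InfoSleuth-style iterative concept harvest. The fetch_docs,
# extract_phrases and expert_review callables are assumptions, not the
# project's actual components.

def harvest_concepts(seedwords, fetch_docs, extract_phrases, expert_review,
                     max_rounds=5):
    """Grow a concept set from seedwords until no new candidates survive review."""
    concepts = set(seedwords)
    for _ in range(max_rounds):
        docs = fetch_docs(concepts)                 # collect relevant documents
        candidates = set()
        for doc in docs:
            for phrase in extract_phrases(doc):
                # keep phrases that contain a current seed/concept word
                if any(c in phrase.split() for c in concepts):
                    candidates.add(phrase)
        new = expert_review(candidates - concepts)  # human vets each round
        if not new:                                 # fixed point reached
            break
        concepts |= new
    return concepts
```

The loop terminates either when the expert accepts no new concepts or after a fixed number of rounds, mirroring the "iteration continues until a set of results is reached" description.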

In Fig. 2, the indentation shows the hierarchy (class and subclass relationships). Thus, ‘field emission display’, ‘flat panel display’ and ‘display panel’ are subclasses of ‘display’. Here, one obvious rule to generate this hierarchy is that if a phrase has ‘display’ as its last word, it becomes a subclass of ‘display’. Likewise, the same rule is applied for ‘image’. Another rule is that if a phrase has ‘display’ as its first word, the phrase also becomes a subclass of ‘display’, with the indication of ‘*’, such as ‘display panel’, ‘video image retrieval’ and so on.
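The two placement rules just described can be expressed directly. The function below is an illustrative sketch of those rules, not InfoSleuth's actual code:

```python
def place_in_hierarchy(phrase, parent):
    """Apply the two placement rules for a noun phrase under a parent term.

    A phrase ending in the parent term is a plain subclass
    ('flat panel display' -> subclass of 'display'); a phrase *starting*
    with the parent term is also attached, but marked with '*'
    ('display panel' -> 'display panel *').
    Returns (is_subclass, label).
    """
    words = phrase.split()
    if words[-1] == parent:          # head word at the end: is-a subclass
        return True, phrase
    if words[0] == parent:           # parent as modifier: starred subclass
        return True, phrase + " *"
    return False, None
```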

This system has a number of special features and characteristics:

• Discover-and-alert – the system expands the ontology with new concepts it learns from new documents and alerts the human experts to the changes.

• Attribute-relation-discovery – this approach can discover some of the attributes associated with certain concepts. For instance, the method can discover the attributes of physical dimensions or number of pixels, and can even learn the range of their possible values. Based on the linguistic characteristics, the ‘assoc-with’ relations can be identified automatically. Since the ontology is organized as hierarchies, attributes are automatically inherited following ‘is-a’ links.

• Indexing-documents – while constructing the ontology, this method also indexes documents for future retrieval, optionally saving the results in a relational database. It collects ‘context lines’ for each concept generated, showing how the concept was used in the text, as well as frequency and co-occurrence statistics for word association discovery and data mining.

• This system allows the users to decide between precision and completeness through browsing the ontology and inferences based on the ‘is-a’ and ‘assoc-with’ relations.

The system uses simple part-of-speech (POS) tagging to conduct superficial syntactic analysis (shallow information extraction techniques). The relationships between concepts are detected from linguistic features. As in any other corpus-based approach, the richer and the more complete the data set, the higher the reliability of the results achieved, as a direct result of the applicability of machine learning techniques.


Fig. 2. Example of an automatically generated ontology from InfoSleuth.


Problems encountered. Generating ontologies automatically in this manner involves several challenges and problems:

• Syntactic structural ambiguity – a correct structural analysis of a phrase is important because the decision whether to regard a certain subsequence of a phrase as a concept often depends on the syntactic structure. For example, the concept ‘image processing software’, when extracted sequentially, takes the varying forms of ‘image’ (wrong concept), ‘image processing’ (wrong concept), and ‘image processing software’ (right concept). Likewise, ‘low-temperature polysilicon TFT panel’ becomes ‘low-temperature’ (wrong concept), ‘low-temperature polysilicon’ (wrong concept), and ‘low-temperature polysilicon TFT panel’ (right concept).

• Recognition of different phrases that refer to the same concept. For example, ‘quartz crystal oscillator’ is actually the same as ‘crystal quartz oscillator’. One possible solution to this problem is to differentiate them via the co-occurrence frequency of the two phrases.

• Proper attachment of adjectival modifiers is a possible way to avoid creating non-concepts.

• Word sense problem – a possible way to solve this problem is to make reference to some general linguistic ontologies (such as SENSUS or WordNet [17]) so as to disambiguate different word senses (using human involvement to select the word sense from SENSUS or WordNet, or machine learning techniques to learn the patterns).

• Heterogeneous resources as the data source for generating the ontology – terminological inconsistencies are common when document sources are diverse, which is also a common information retrieval problem. A possible way to solve this is to find synonyms or similar terms from general linguistic ontologies (such as SENSUS or WordNet), or to use co-occurrence techniques to cluster the concepts based on high similarities to detect the inconsistency.

• The automatically constructed ontology can be too prolific and deficient at the same time. Excessively prolific ontologies could hinder a domain expert’s browsing and correction (a reasonable choice of seedwords and initial cleaning and training data should limit this risk). On the other hand, automatically generated ontologies could be deficient since they rely on seedwords only. One promising technique to address this could be synonym learning [18].
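The variant-recognition problem above (permuted phrases naming one concept) admits a simple word-bag sketch: phrases with the same word set collapse to one concept, and the corpus frequency picks the canonical surface form. This is an illustration of the suggested co-occurrence-frequency idea, not InfoSleuth's method:

```python
from collections import Counter

def merge_variants(phrase_counts):
    """Group phrases that are permutations of the same words and keep the
    most frequent surface form as the canonical concept label.

    phrase_counts: Counter mapping phrase -> corpus frequency.
    """
    canonical = {}
    for phrase, freq in phrase_counts.items():
        key = frozenset(phrase.split())          # word-bag signature
        best = canonical.get(key)
        if best is None or freq > best[1]:
            canonical[key] = (phrase, freq)
    return {p for p, _ in canonical.values()}
```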

2.2. SKC (Stanford University)

The Scalable Knowledge Composition (SKC) project aims to develop a novel approach to resolving semantic heterogeneity in information systems. The project attempts to derive general methods for ontology integration that can be used in any application area, so that it is basically application neutral. An ontology algebra has therefore been developed to represent the terminologies from distinct, typically autonomous domains. This research effort is funded by the United States Air Force Office of Scientific Research (AFOSR), with the cooperation of the United States Defense Advanced Research Projects Agency (DARPA) High-Performance Knowledge Base (HPKB) program.

In this project, Jannink and Wiederhold [19] and Jannink [20] converted Webster’s dictionary data to a graph structure to support the generation of a domain or task ontology. The resulting text is tagged to mark the parts of the definitions, similar to XML (eXtensible Markup Language) structure. For their research purpose, only head words (<hw> . . . </hw>) and definitions (<def> . . . </def>), which have many-to-many relationships, were considered. This resulted in a directed graph with two properties: each head word and definition grouping was a node; and each word in a definition node was an arc to the node having that head word.
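The node/arc construction can be sketched as follows. The flat `<hw>`/`<def>` entry format is an assumption based on the tagging described above; the real Webster's data is richer:

```python
import re

def build_dictionary_graph(entries):
    """Build the directed word graph described above from tagged entries.

    entries: strings like '<hw>dog</hw><def>a domestic animal</def>'
    (an assumed simplification of the tagged dictionary text).
    Each head word becomes a node; each definition word that is itself a
    head word becomes an arc from the entry's node to that word's node.
    """
    defs = {}
    for entry in entries:
        hw = re.search(r"<hw>(.*?)</hw>", entry)
        df = re.search(r"<def>(.*?)</def>", entry)
        if hw and df:
            defs[hw.group(1)] = df.group(1).split()
    graph = {hw: [w for w in words if w in defs]   # arcs only to defined words
             for hw, words in defs.items()}
    return graph
```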

Methods. Jannink and Wiederhold [19] did not describe the adopted technique in detail in their publication. However, Jannink [20] mentioned that they used a novel algebraic extraction technique to generate the graph structure and create the thesaurus entries for all words defined in the structure, including some stop words (e.g. a, the, and). Ideas from the PageRank algorithm [21] were also adopted. This is a flow algorithm over the graph structure of the WWW that models the links followed during a random browsing session through the web. The ArcRank from the PageRank model was chosen to extract relationships between the dictionary words and the strength of the relationship. The attraction of using the dictionary as a structuring tool is that headwords are terms distinguished from the definition text, which provides the extra information allowing for types of analysis that are not currently performed in traditional data mining and information retrieval. This method could also be applied to document classification and the relevance ranking of mining queries. The basic hypothesis for this work is that structural relationships between terms are relevant to their meaning. The methodology to extract the relations (the important component of ontology) is achieved through a new iterative algorithm, based on the Pattern/Relation extraction algorithm, as follows [22]:

• compute a set of nodes that contain arcs comparable to the seed arc set;
• threshold them according to ArcRank value;
• extend the seed arc set when nodes contain further commonality;
• if the node set increased in size, repeat from the first step.

The algorithm outputs a set of terms that are related by the strength of the associations in the arcs that they contain. These associations were computed according to local hierarchies of subsuming and specializing relationships, and sets of terms are related by the kinship relation. The algorithm is naturally self-limiting via the thresholds. This approach can also be used to distinguish senses. For instance, the senses of a word such as ‘hard’ are distinguished by the choice of association with ‘tough’ and ‘severe’. Also, ranking the different senses of a term by the strength of its associations with other terms could uncover the principal sense of a term.
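The ArcRank computation itself is not spelled out in the sources cited here. As a rough illustration of the flow idea it borrows from PageRank [21], a plain PageRank over a word graph of the kind built above can be sketched as follows (the damping factor and iteration count are conventional choices, not values from the project):

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank over a dict graph {node: [out-neighbours]}.

    Illustrative only: ArcRank, as used in SKC, ranks arcs rather than
    nodes, but the underlying rank-flow computation is analogous.
    """
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            if outs:                      # distribute rank along out-arcs
                share = damping * rank[n] / len(outs)
                for m in outs:
                    if m in new:
                        new[m] += share
            else:                         # dangling node: spread uniformly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank
```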

Problems encountered. A number of problems have been highlighted during the process of ontology generation in this project:

• syllable and accent markers in head words;
• mis-spelled head words;
• mis-tagged fields;
• stemming and irregular verbs (e.g. hopelessness);
• common abbreviations in definitions (e.g. etc.);
• undefined words with common prefixes (e.g. un-);
• multi-word head words (e.g. water buffalo);
• undefined hyphenated and compound words (e.g. sea-dog).

The interested reader can refer to Jannink [20] for a more detailed account of the methodology and problems encountered.

2.3. Ontology learning (AIFB, University of Karlsruhe)

The Ontology Learning Group at AIFB (Institute of Applied Informatics and Formal Description Methods, University of Karlsruhe, Germany) is active in the ontology engineering area. They have developed various tools to support ontology generation, including OntoEdit (an ontology editor) and Text-To-Onto (an integrated environment for the task of learning ontologies from text [23, 24]).

Extracting an ontology from domain data, especially domain-specific natural-language free texts, turns out to be very important. Common approaches usually extract relevant domain concepts based on shallow information retrieval techniques and cluster them into a hierarchy based on statistical and machine learning analysis. However, most of these approaches have only managed to learn the taxonomic relations in ontologies. Detecting the non-taxonomic conceptual relationships, for example the ‘has Part’ relations between concepts, is becoming critical for building good-quality ontologies [23, 24].

Methods. AIFB’s approach to ontology generation contains two parts: shallow text processing and learning algorithms. For shallow text processing, techniques have been implemented on top of SMES (Saarbrücken Message Extraction System). SMES is a shallow text processor for the German language developed by DFKI (German Research Centre for Artificial Intelligence [25]). It comprises a tokenizer, a lexicon, lexical analysis (morphological analysis, recognition of named entities, retrieval of domain-specific information, part-of-speech tagging) and a chunk parser. SMES uses weighted finite state transducers to efficiently process phrasal and sentential patterns.

The outputs of SMES are dependency relations found through lexical analysis. These relations are treated as the input of the learning algorithms. Some of the dependency relations do not hold meaningful relations between two concepts, which can nevertheless be linked together (co-occurrence) by some mediator (e.g. a preposition). SMES also returns some phrases without any relations. Some heuristic rules have therefore been defined to increase the recall of the linguistic dependency relations, for instance the NP-PP-heuristic (attaching all prepositional phrases to adjacent noun phrases), the sentence-heuristic (relating all concepts contained in one sentence), and the title-heuristic (linking the concepts appearing in the title with all the concepts contained in the overall document [23, 24]).
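The sentence-heuristic, for example, amounts to pairing every two known concepts that co-occur in a sentence. A minimal sketch follows; matching concepts by whole-word membership is an assumption, not the SMES implementation:

```python
from itertools import combinations

def sentence_heuristic(sentences, concepts):
    """Link every pair of known concepts that co-occur in one sentence.

    sentences: iterable of sentence strings; concepts: set of single-word
    concept labels (a simplification - real concepts may be phrases).
    Returns a set of sorted concept pairs.
    """
    pairs = set()
    for sent in sentences:
        found = [c for c in concepts if c in sent.split()]
        for a, b in combinations(sorted(found), 2):
            pairs.add((a, b))
    return pairs
```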

The learning mechanism is based on the algorithm for discovering generalized association rules proposed by Srikant and Agrawal [26]. The learning module contains four steps: (1) selecting the set of documents; (2) defining the association rules; (3) determining the confidence for these rules; and (4) outputting the association rules exceeding the user-defined confidence.
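Steps (3) and (4) can be sketched by treating each document's concept set as a transaction and computing pairwise rule confidence; this is a toy instance of the support/confidence framework, not the generalized (taxonomy-aware) algorithm of Srikant and Agrawal, and the threshold value is illustrative:

```python
from itertools import permutations

def association_rules(transactions, min_conf=0.5):
    """Mine pairwise rules a -> b with
    confidence(a -> b) = support({a, b}) / support({a}),
    keeping only rules whose confidence reaches min_conf."""
    concepts = set().union(*transactions)
    def support(*items):
        return sum(1 for t in transactions if all(i in t for i in items))
    rules = {}
    for a, b in permutations(concepts, 2):
        if support(a):
            conf = support(a, b) / support(a)
            if conf >= min_conf:
                rules[(a, b)] = conf
    return rules
```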

AIFB also built a system to facilitate the semi-automatic generation of ontologies, known as Text-To-Onto [24], as shown in Fig. 3. The system includes a number of components: (1) a text and processing management component (for selecting domain texts exploited for the further discovery process); (2) a text processing server (containing a shallow text processor based on the core system SMES; the result of text processing is stored in annotations using XML-tagged text); (3) a lexical DB and domain lexicon (facilitating syntactic processing based on lexical knowledge); (4) a learning and discovering component (using various extraction methods, e.g. association rules, for concept acquisition); and (5) an ontology engineering environment (OntoEdit, supporting the ontology engineers in semi-automatically adding newly discovered conceptual structures to the ontology).

Kietz et al. [4] adopted the above method to build an insurance ontology from a corporate intranet. First, GermaNet was chosen as the top-level ontology for the domain-specific goal ontology. Then, domain-specific concepts were acquired using a dictionary that contains important corporate terms described in natural language. Next, a domain-specific and a general corpus of texts were used to remove concepts that were not domain-specific, through some heuristic rules. Relations between concepts were learned by analysing the corporate intranet documents based on a multi-strategy learning algorithm, either with a statistical approach (frequent coupling of concepts in sentences can be regarded as relevant relations between concepts) or with a pattern-based approach.
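The corpus-based pruning step described above can be sketched as a relative-frequency test between the domain and general corpora; the ratio threshold is an illustrative assumption, not a value from Kietz et al.:

```python
from collections import Counter

def prune_generic_concepts(domain_corpus, general_corpus, candidates,
                           ratio=2.0):
    """Keep candidate concepts that are markedly more frequent in the
    domain corpus than in a general corpus (relative-frequency heuristic).

    domain_corpus / general_corpus: lists of tokens; candidates: concept
    terms to test; ratio: how much more frequent (relatively) a term must
    be in the domain corpus to count as domain-specific.
    """
    dom = Counter(domain_corpus)
    gen = Counter(general_corpus)
    d_total, g_total = sum(dom.values()) or 1, sum(gen.values()) or 1
    kept = set()
    for c in candidates:
        d_rel = dom[c] / d_total
        g_rel = gen[c] / g_total
        if g_rel == 0 or d_rel / g_rel >= ratio:
            kept.add(c)
    return kept
```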

Problems encountered. A number of problems have been highlighted in the process of ontology generation in this project:

• The lightweight ontology contains much noisy data. For instance, not every noun phrase or single term can be considered as a concept or class. The word sense problem generates a lot of ambiguity.

• The refinement of these lightweight ontologies is a tricky issue that needs to be resolved. For instance, domain experts should be involved, or semi-automatic techniques based on recursive machine learning algorithms should be used.

• Learning relationships is not trivial. For instance, general relations can be detected from the hierarchical structure of the extracted concepts (or terms), but how to identify and regroup the specific or concrete relationships becomes the main hurdle for ontology generation.


Fig. 3. General overview of ontology learning system Text-to-Onto (http://ontoserver.aifb.uni-karlsruhe.de/texttoonto/).


2.4. ECAI 2000

Some of the automatic ontology learning research reported in the Ontology Learning Workshop of ECAI 2000 (European Conference on Artificial Intelligence) is important and appropriate to this survey. A number of shallow natural language processing techniques, such as part-of-speech tagging, word sense disambiguation and tokenization, are directly relevant. These are used to extract important (high-frequency) words or phrases that could be used to define concepts. In the concept formation step, it is usual for some form of general top-level ontology (WordNet, SENSUS) to be used to assist correct extraction of the final terms and disambiguation of the word senses.
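The high-frequency term extraction step common to these systems can be sketched as follows (the stoplist and sample text are illustrative; real systems add POS filtering and word sense disambiguation on top of the raw counts):

```python
import re
from collections import Counter

# A toy stoplist; a real system would use a full stopword list
# plus part-of-speech tags to keep only content words.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}

def candidate_concepts(text, top_n=5):
    # Tokenize, drop stopwords, and keep the most frequent remaining
    # words as candidate concept labels.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

text = ("An ontology defines concepts and relations; "
        "the ontology links concepts to relations explicitly.")
print(candidate_concepts(text, 3))
# → ['ontology', 'concepts', 'relations']
```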

Methods. Wagner [27] addressed the automatic acquisition of selectional preferences of verbs by means of statistical corpus analysis for automatic ontology generation. Such preferences are essential for inducing thematic relations, which link verbal concepts to nominal concepts that are selectionally preferred as their complements. Wagner [27] introduced a modification of Abe and Li's [28] method (based on the well-founded minimum description length principle) and evaluated it against a gold standard. It aimed to find the appropriate level of generalization of concepts that can be linked to specific relations. The EuroWordNet database provided information that could be combined to obtain a gold standard for selectional preferences, against which lexicographic appropriateness could be evaluated automatically. However, one drawback of this method was that the learning algorithms were fed with word forms rather than word senses.
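A much-simplified, frequency-based version of selectional preference acquisition is sketched below. The observations and class labels are invented; Wagner's actual method selects a tree cut in a WordNet-style hierarchy by minimum description length, which this sketch omits:

```python
from collections import Counter, defaultdict

def selectional_preferences(observations):
    # observations: (verb, object_noun, noun_class) triples, where the
    # noun class would come from a resource such as EuroWordNet.
    # Returns P(class | verb) estimated as relative frequency.
    counts = defaultdict(Counter)
    for verb, _noun, noun_class in observations:
        counts[verb][noun_class] += 1
    return {
        verb: {cls: n / sum(classes.values()) for cls, n in classes.items()}
        for verb, classes in counts.items()
    }

obs = [
    ("drink", "water", "liquid"),
    ("drink", "tea", "liquid"),
    ("drink", "idea", "abstraction"),
    ("eat", "bread", "food"),
]
prefs = selectional_preferences(obs)
print(prefs["drink"])
```

Here "drink" prefers the class "liquid" with probability 2/3, which is the kind of thematic-relation evidence the method exploits.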

Chalendar and Grau [29] conceived SVETLAN, a system able to learn categories of nouns from domain-free texts. In order to avoid overly general classes, they also considered the contextual use of words. The input data of this system were semantic domains with their thematic units (TUs). Domains were sets of weighted words relevant to representing the same specific topic. The first step of SVETLAN was to retrieve the text segments of the original texts associated with the different TUs, and then to extract from the parser results all the triplets constituted by a verb, the head noun of a phrase and its syntactic role, in order to produce syntactic thematic units (STUs). The STUs belonging to the same semantic domain were aggregated together to learn a structured domain.
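The aggregation step can be approximated by grouping verb-noun-role triplets per semantic domain, so that each verb slot accumulates the nouns observed in it (the domain names, triplets and encoding are assumptions for illustration, not SVETLAN's actual data structures):

```python
from collections import defaultdict

def aggregate_stus(triplets):
    # triplets: (domain, verb, syntactic_role, head_noun) observations.
    # Each (verb, role) slot of a domain accumulates the nouns seen
    # in that slot, yielding a structured domain.
    structured = defaultdict(lambda: defaultdict(set))
    for domain, verb, role, noun in triplets:
        structured[domain][(verb, role)].add(noun)
    return {d: dict(slots) for d, slots in structured.items()}

triplets = [
    ("cooking", "chop", "object", "onion"),
    ("cooking", "chop", "object", "carrot"),
    ("cooking", "heat", "object", "pan"),
]
print(aggregate_stus(triplets)["cooking"][("chop", "object")])
```

The nouns gathered under one slot ("onion", "carrot") form a contextual noun category of the kind SVETLAN learns.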

Bisson et al. [30] described Mo'K, a configurable workbench to support the development of conceptual clustering methods for ontology building. Mo'K was intended to assist ontology developers in defining the most suitable learning methods for a given task. It provides facilities for the evaluation, comparison, characterization and elaboration of conceptual clustering methods.

Faure and Poibeau [31] discussed how semantic knowledge learned from a specific domain can help in the creation of a powerful information extraction system. They combined two systems, SYLEX [32] and ASIUM, terming the result the 'double regularity model', to eliminate the negative impacts of the individual systems and yield good results. For instance, this combination could easily avoid a very time-consuming manual disambiguation step. The special part of their work is that the conceptual clustering process not only identifies lists of nouns but also augments these lists by induction.

Todirascu et al. [33] used shallow natural language parsing techniques to semi-automatically build up a domain ontology (conceptual hierarchy) and represent it in description logics (DL), which provide a powerful inference mechanism and are capable of dealing with incomplete, erroneous data. Several small French corpora were tested in the prototype. The system was capable of identifying relevant semantic issues (semantic chunks) using minimal syntactic knowledge, and complex concepts were inferred by the DL mechanism. Several tools were employed in the model:
• A POS tagger identifies the content words (nouns, adjectives, verbs) and functional words (prepositions, conjunctions, etc.). The tagger uses a set of contextual and lexical rules (based on prefix and suffix identification) learned from annotated texts.
• The sense tagger contains a pattern matcher and a set of patterns (words, lexical categories and syntagms) with senses assigned by a human expert. Each sense is represented by DL concepts. The set of conceptual descriptions was established by a human expert from a list of the most frequently repeated segments and words extracted from a set of representative texts. The pattern matcher annotates each sequence of words matching a pattern with its semantic description.
• The chunk border identification module identifies the words and syntactic constructions delimiting the semantic chunks. This module uses the output of the POS tagger, as well as a set of manually built cue phrases (syntactic phrases containing auxiliaries, compound prepositions, etc.). The borders of noun and prepositional phrases (determiners, prepositions) are the best candidates for chunk borders.
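The sense-tagging step above can be sketched as a longest-match pattern matcher (the patterns and concept names are invented for illustration; the real system assigns full DL concept descriptions authored by the expert):

```python
def sense_tag(tokens, patterns):
    # Scan left to right; at each position try the longest pattern
    # first and annotate a match with its expert-assigned concept.
    ordered = sorted(patterns.items(), key=lambda item: -len(item[0]))
    annotations = []
    i = 0
    while i < len(tokens):
        for pattern, concept in ordered:
            n = len(pattern)
            if tuple(tokens[i:i + n]) == pattern:
                annotations.append((" ".join(pattern), concept))
                i += n
                break
        else:
            i += 1  # no pattern starts here; move on
    return annotations

patterns = {("insurance", "policy"): "Policy", ("claim",): "Claim"}
tokens = "the insurance policy mentions a claim".split()
print(sense_tag(tokens, patterns))
# → [('insurance policy', 'Policy'), ('claim', 'Claim')]
```

Trying longer patterns first ensures that "insurance policy" is tagged as one unit rather than leaving "policy" unannotated.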


Ontology research and development. Part 1


This research basically automated the process of creating a domain hierarchy based on a small set of primitive concepts defined by the human expert, who also has to define the relations between these concepts. As part of future research, emphasis is needed on the use of document summaries as indexes and on integration of the system with XML documents.

Problems encountered. The main problem arising from these studies pertains to relation extraction. Such relations were defined manually or induced from the hierarchical structure of the concept classes. A potential solution proposed is the provision of very general relations, such as 'assoc-with', 'is-a' and so on. How to efficiently extract concrete relations for the concept classes remains an important and interesting topic for ontology learning and research.

2.5. Inductive logic programming (University of Texas at Austin)

The Machine Learning Group of the University of Texas (UT) applied inductive logic programming (ILP) to learn relational knowledge from different examples. Most machine learning algorithms are restricted to feature-based examples or concepts, and therefore limit themselves in learning complex relational and recursive knowledge. The applications of ILP by this group extend to various problems in natural language and theory refinement.

Methods. Thompson and Mooney [34] described WOLFIE (Word Learning From Interpreted Examples), a system that learns a semantic lexicon from a corpus of sentences. The lexicon learned consists of words paired with representations of their meanings, and allows for both synonymy and polysemy. WOLFIE is part of an integrated system that learns to parse novel sentences into their meaning representations. In overview, the WOLFIE algorithm is to 'derive possible phrase-meaning pairs by sampling the input sentence–representation pairs that have phrases in common, and deriving the common substructure in their representations, until all input representations can be composed from their phrase meanings; then add the best phrase-meaning pair to the lexicon, constrain the remaining possible phrase-meaning pairs to reflect the pair just learned, and return the lexicon of learned phrase-meaning pairs'.
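A greatly simplified, greedy rendering of this phrase-meaning pairing idea is sketched below. The sentences and meaning tokens are invented; WOLFIE's actual algorithm derives common substructure in structured meaning representations rather than counting flat token pairings:

```python
from collections import Counter

def learn_lexicon(examples):
    # examples: (sentence, meaning_tokens) pairs. Propose every
    # (word, meaning-token) pairing seen together, then greedily keep
    # the most frequent pairing whose word and token are still free,
    # constraining later choices by the pairs already learned.
    candidates = Counter()
    for sentence, meaning in examples:
        for word in sentence.split():
            for token in meaning:
                candidates[(word, token)] += 1
    lexicon = {}
    for (word, token), _count in candidates.most_common():
        if word not in lexicon and token not in lexicon.values():
            lexicon[word] = token
    return lexicon

examples = [
    ("john walks", ["JOHN", "WALK"]),
    ("mary walks", ["MARY", "WALK"]),
]
print(learn_lexicon(examples))
# → {'walks': 'WALK', 'john': 'JOHN', 'mary': 'MARY'}
```

Because "walks" co-occurs with WALK in both examples, that pairing is learned first, which then forces the correct assignments for the remaining words.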

ILP is a growing topic in machine learning that studies the induction of logic programs from examples in the presence of background knowledge [35].

Problems encountered. UT's ILP systems have the potential to become a workbench for ontological concept extraction and relation detection. They combine techniques from information retrieval, machine learning and artificial intelligence for concept and rule learning. However, how to deploy UT's methods for ontology concept and rule learning is still an open question that needs to be resolved to make this workbench a feasible possibility.

2.6. Library science and ontology

Traditional techniques deployed in the library and information science area have been significantly challenged by the huge amount of digital resources. Ontologies, explicit specifications of semantics and relations in a machine-processable way, have the potential to offer a solution to such issues. The digital library and Semantic Web communities are at present working hand-in-hand to address these special needs of the digital age. The recent European thematic networks on digital libraries (DELOS, www.ercim.org/delos/) and the Semantic Web (ONTOWEB, www.ontoweb.org) proposed joint sponsorship of a working group on 'Content standardization for cultural repositories' within the OntoWeb SIG on Content Management (www.ontoweb.org/sig.htm), which attests to the cooperation and future collaboration of these communities.

Methods. In digital library projects, ontologies are specified or simplified to take the form of various vocabularies, including cataloguing codes such as machine readable cataloguing (MARC), thesauri or subject heading lists, and classification schemes. Thesauri and classifications, on the one hand, are used to represent the subject content of a book, journal article, file, or any form of recorded knowledge. Semantic relationships among different concepts are reflected through broader, narrower and related terms in thesauri, and through a hierarchical structure in classification schemes. On the other hand, a thesaurus does not handle descriptive data (title, author, publisher, etc.). In this instance, separate representational vocabularies for descriptive data, such as the Anglo-American Cataloguing Rules (AACR2) or the Dublin Core Metadata (www.dublincore.org), have been used to meet the need for descriptive representation.
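The mapping from shallow thesaurus semantics to ontology-style relations can be sketched as follows (the BT/RT dictionary encoding and the sample entry are assumptions for illustration):

```python
def thesaurus_to_triples(entries):
    # Map thesaurus relationships to ontology-style triples:
    # BT (broader term) becomes 'is-a', RT (related term) becomes a
    # generic 'assoc-with', mirroring the shallow semantics available.
    triples = []
    for term, rels in entries.items():
        for broader in rels.get("BT", []):
            triples.append((term, "is-a", broader))
        for related in rels.get("RT", []):
            triples.append((term, "assoc-with", related))
    return triples

entries = {"sonnet": {"BT": ["poem"], "RT": ["rhyme"]}}
print(thesaurus_to_triples(entries))
# → [('sonnet', 'is-a', 'poem'), ('sonnet', 'assoc-with', 'rhyme')]
```

The result makes the limitation discussed in the text concrete: every RT collapses into the same undifferentiated 'assoc-with', whereas an ontology would name the specific relationship.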

The fundamental difference between an ontology and a conventional representational vocabulary lies in the level of abstraction, the relationships among concepts, machine-understandability and, most important, the expressiveness that an ontology can provide. For instance, an ontology can be layered according to


different requirements (similar to the design model of an object-oriented programming language such as the unified modelling language, UML). Thus, we may have an upper-level ontology to define the general and generic concepts, or the schema, of the lower-level ontology. Moreover, an ontology can function as a database schema to define various tasks or applications. In a nutshell, an ontology aims to achieve communication between humans and computers, while conventional vocabulary in the library and information science field fulfils the requirement for communication among human beings. Ontology promotes standardization and reusability of information representation through identifying common and shared knowledge. Ontology adds value to traditional thesauri through deeper semantics in digital objects, conceptually, relationally and through machine understandability [36]. Deeper semantics can imply deeper levels of hierarchy, enriched relationships between classes and concepts, conjunction and disjunction of various classes and concepts, formulation of inference rules, etc.

The University of Michigan Digital Library (UMDL) [37] maps the UMDL ontology to MARC with 'has', 'of', 'kind-of' or 'extended' relationships. In another study, Kwasnik [36] converted a controlled vocabulary scheme into an ontology, citing the following added values of an ontology over the knowledge representation vocabularies used in libraries and the information industry:
• higher levels of conception of the descriptive vocabulary;
• deeper semantics for class/subclass and cross-class relationships;
• the ability to express such concepts and relationships in a description language;
• reusability and 'share-ability' of the ontological constructs in a heterogeneous system; and
• strong inference and reasoning functions.

Qin and Paling [38] used the controlled vocabulary of the Gateway to Educational Materials (GEM) as an example and converted it into an ontology. The major difference between the two models is the value added through deeper semantics, both conceptually and relationally. The demand to convert controlled vocabularies into ontologies arises from the limited expressive power of controlled vocabularies, the emerging popularity of agent communication (ontology-driven communication), semantic searching across intranets and the internet, and the content and context exchanges existing in various e-commerce marketplaces. Such conversions not only reduce the duplication of effort involved in building an ontology from scratch, by reusing existing vocabularies, but also establish a mechanism for allowing differing vocabularies to be mapped onto the ontology.

Problems encountered. Three main problems remain in this area of ontology development:
• different ways of modelling knowledge (library science versus ontology) due to the 'shallow' and 'deeper' semantics that are inherent in these two disciplines;
• different ways of representing knowledge: for instance, the librarian uses a hierarchical tree (more lexically flavoured) to represent the thesaurus or catalogue, while the ontology engineer uses mathematical and formal logics to enrich and represent ontologies (more mathematically and logically flavoured);
• there is a long way to go to achieve the consensus needed to merge the two into a standardized means of organizing and describing information.

2.7. Others

Borgo et al. [39] used lexical semantic graphs to create an ontology (or annotate knowledge) based on WordNet. They pointed out some special nouns which are always used to represent relations, for instance, part (has-part) and function (function-of). They called these special nouns 'relational nouns', which facilitate the identification of relations between two concepts.

Yamaguchi [40] focused on how to construct domain ontologies based on a machine-readable dictionary (MRD). He proposed a domain ontology rapid development environment (called DODDLE) to manage concept drift (word sense changes across different domains). However, no detailed information about the underlying theory was made available. Nonetheless, it implies some form of concept graph mapping, plus user-involved word sense disambiguation based on WordNet, to trim and handle the concept drift so as to obtain a very specific, small domain ontology from user input containing several seed words for the domain.

Kashyap [41] proposed an approach for designing an ontology for information retrieval based on database schemas and a collection of queries of interest to the users. Ontology construction from database schemas involves many issues, such as determining primary keys, foreign keys and inclusion dependencies, abstracting details, identifying relationships, and grouping information from multiple


Table 1. Summary of state-of-the-art ontology generation studies and projects

InfoSleuth (MCC)
- Source data: domain thesaurus (tagged); seed words from experts; free-text documents from the internet.
- Concept extraction: superficial syntactic analysis; pattern matching plus local context (noun phrases); word sense disambiguation is needed.
- Relation extraction: relations acquired automatically based on the linguistic property of noun components and the inheritance hierarchy; two kinds: 'is-a' and 'assoc-with'.
- Ontology reuse: yes (unit of measure, geographic metadata, etc.).
- Ontology representation: hierarchical structure.
- Tool or system: none.

SKC (Stanford University)
- Source data: Webster's online dictionary (in XML or SGML format), semi-structured.
- Concept extraction: tag extraction; pattern matching (wrapper or script); PageRank algorithm.
- Relation extraction: ArcRank, an iterative algorithm based on the pattern/relation extraction algorithm.
- Ontology reuse: yes (online dictionary).
- Ontology representation: conceptual graph.

Ontology learning (AIFB)
- Source data: free-text natural language documents from the web; texts POS-tagged automatically.
- Concept extraction: tokenizer; morphological analysis; name recognition; part-of-speech tagging; chunk parsing.
- Relation extraction: co-occurrence clustering of concepts; mediator (proposition, verb); heuristic rules based on the linguistic dependency relations; general association rules by machine learning.
- Ontology reuse: yes (lexicon).
- Ontology representation: XML.
- Tool or system: SMES, Text-To-Onto, OntoEdit.
- Other: corpus-based learning; non-taxonomic relation learning.

ECAI 2000
- Source data: domain-free text; semantic domains with thematic units (TUs); annotated source data; primitive concepts from human experts.
- Concept extraction: categories of nouns; conceptual clustering and induction; shallow natural language processing; POS tagging (contextual and lexical rules).
- Relation extraction: selectional preferences of verbs (minimum description length with a gold standard); relations learned and refined based on local graphical hierarchies of subsuming and specializing.
- Ontology reuse: yes (EuroWordNet).
- Ontology representation: conceptual hierarchy; description logic.
- Tool or system: SVETLAN, Mo'K, SYLEX, ASIUM.

Inductive logic programming (UT)
- Source data: annotated documents; a corpus of sentences.
- Concept extraction: slot fillers (rule learning with C4.5 and Rapier); pattern matching; POS; tokens.
- Relation extraction: inductive logic programming (machine learning).
- Tool or system: WOLFIE.

Library science and ontology
- Source data: controlled vocabulary.
- Concept extraction: subject headings from controlled vocabulary; manually refined concepts.
- Relation extraction: relations from controlled vocabulary (broader term, narrower term, etc.); manually refined relations.
- Ontology reuse: yes (controlled vocabulary).

Others
- Source data: machine-readable dictionary; database schemas; user queries.
- Relation extraction: relational nouns to represent relations.
- Ontology reuse: yes (WordNet, data dictionary, controlled vocabulary).
- Ontology representation: conceptual graph.
- Tool or system: DODDLE.


tables, and incorporating concepts suggested by new database schemas. A set of user queries expressing information needs can be used to further refine the ontology created, which could result in the design of new entities, attributes and class–subclass relationships that are not directly present in existing database schemas. The ontology generated can be further enhanced by the use of a data dictionary and a controlled vocabulary. The approach to ontology construction in this instance is therefore to make full use of existing sources, such as database schemas, user queries, data dictionaries and standardized vocabularies, to propose an initial set of entities, attributes and relationships for an ontology.
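Kashyap's use of schema information can be sketched minimally: tables suggest classes, columns suggest attributes, and foreign keys suggest relationships between classes (the schema encoding and sample tables here are assumptions for illustration):

```python
def schema_to_ontology(tables):
    # Each table becomes a candidate class with its columns as
    # attributes; each foreign key becomes a relationship (an edge)
    # from the referring class to the referenced class.
    classes = {name: spec["columns"] for name, spec in tables.items()}
    relations = []
    for name, spec in tables.items():
        for column, target in spec.get("foreign_keys", {}).items():
            relations.append((name, column, target))
    return classes, relations

tables = {
    "book": {"columns": ["id", "title", "author_id"],
             "foreign_keys": {"author_id": "author"}},
    "author": {"columns": ["id", "name"]},
}
classes, relations = schema_to_ontology(tables)
print(relations)
# → [('book', 'author_id', 'author')]
```

The proposed edge would then be renamed by a human (e.g. book written-by author) and refined against user queries, as the text describes.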

3. Summary and conclusions

In this first part of the survey of ontology generation, research related to semi-automatic or automatic ontology generation has been examined. Table 1 summarizes the general pattern and characteristics of the various methods adopted by different research groups along the dimensions of source data, methods for concept extraction and relation extraction, ontology reuse, ontology representation, associated tools and systems, and other special features.

The following salient points and features in ontology generation to date can be observed:
• Source data are more or less semi-structured, and some seed words are provided by the domain experts, not only for searching for the source data but also as the backbone for ontology generation. Learning ontologies from free-text or heterogeneous data sources is still confined to the research laboratory and far from real applications.
• For concept extraction, some relatively mature techniques already exist (POS tagging, word sense disambiguation, tokenization, pattern matching, etc.) that have been employed in the fields of information extraction, machine learning, text mining and natural language processing. The results of these individual techniques are promising as basic entities and should prove most useful in the formation of concepts in ontology building.
• Relation extraction is a very complex and difficult problem to resolve. It has turned out to be the main impediment to ontology learning and applicability. Further research is encouraged to find appropriate and efficient ways to detect or identify relations either semi-automatically or automatically.
• Ontologies are highly reused and reusable. Based on a basic ontology, other forms of ontologies may be lifted off to cater for specific application domains. This is important because of the cost of generation, abstraction and reusability.
• Ontologies can be represented as graphs (conceptual graphs), logic (description logics), web standards (XML) or a simple hierarchy (conceptual hierarchy). There is now a standard ontology representation language, DAML+OIL (www.daml.org/2001/03/daml+oil-index), which combines the merits of description logic, formal logic and web standards.
• A number of tools have been created to facilitate ontology generation in a semi-automatic or manual way. For instance, the University of Karlsruhe (Germany) has developed and commercialized a semi-automatic ontology editor called OntoEdit (now owned by the spin-off company Ontoprise). Stanford University has developed and provides the widely used ontology-editing environment Protégé. The University of Manchester owns OilEd, an ontology editor supporting DAML+OIL (for details of state-of-the-art ontology editors, see www.ontoweb.org/workshop/amsterdamfeb13/index.html).

It is evident that much needs to be done in the area of ontology research before any viable large-scale system can emerge to demonstrate ontology's promise of superior information organization, management and understanding. Beyond ontology generation, evolving and mapping existing ontologies will form another challenging area of work in the ontology field.

References

[1] T.R. Gruber, A translation approach to portable ontology specifications, Knowledge Acquisition 5 (1993) 199–220.

[2] C.H. Hwang, Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information. In: Proceedings of the 6th International Workshop on Knowledge Representation meets Databases (KRDB'99), Linköping, Sweden, 29–30 July 1999.

[3] D. Fensel, J. Hendler, H. Lieberman and W. Wahlster (eds), Semantic Web Technology (MIT Press, Boston, MA, 2002).

[4] J.-U. Kietz, A. Maedche and R. Volz, Extracting a domain-specific ontology from a corporate intranet. In: Second 'Learning Language in Logic' (LLL) Workshop, co-located with the International Conference on Grammatical Inference (ICGI'2000) and the Conference on Natural Language Learning (CoNLL'2000), Lisbon, Portugal, 13–14 September 2000.

11123456789101234567892012345678930123456789401234567895012

Ontology research and development. Part 1

134 Journal of Information Science, 28 (2) 2002, pp. 123–136

at UNIV OF UTAH SALT LAKE CITY on November 24, 2014jis.sagepub.comDownloaded from

Page 14: Ontology research and development. Part 1 - a review of ontology generation

[5] T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American (May 2001) 216–218.

[6] D. Fensel, B. Omelayenko, Y. Ding, E. Schulten, G. Botquin, M. Brown and A. Flett, Product data integration in B2B e-commerce, IEEE Intelligent Systems 16(4) (2001) 54–59.

[7] T. Berners-Lee and M. Fischetti, Weaving the Web (Harper, San Francisco, CA, 1999).

[8] D. Fensel, Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce (Springer, Berlin, 2001).

[9] M. Uschold, Creating, integrating and maintaining local and global ontologies. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.

[10] M. Fernandez-Lopez, Overview of methodologies for building ontologies. In: Proceedings of the IJCAI-99 Workshop on Ontologies and Problem-Solving Methods: Lessons Learned and Future Trends, in conjunction with the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, August 1999.

[11] J. McCarthy, Notes on formalizing context. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (AAAI, 1993).

[12] N. Guarino, Some ontological principles for designing upper level lexical resources. In: Proceedings of the First International Conference on Lexical Resources and Evaluation, Granada, 28–30 May 1998.

[13] N.B. Cocchiarella, Formal ontology. In: H. Burkhardt and B. Smith (eds), Handbook of Metaphysics and Ontology (Philosophia, Munich, 1991), pp. 640–647.

[14] M. Uschold and M. Gruninger, Ontologies: principles, methods, and applications, Knowledge Engineering Review 11(2) (1996) 93–155.

[15] A. Gomez-Perez, N. Juristo and J. Pazos, Evaluation and assessment of knowledge sharing technology. In: N.J. Mars (ed.), Towards Very Large Knowledge Bases – Knowledge Building and Knowledge Sharing (IOS Press, Amsterdam, 1995), pp. 289–296.

[16] J.R. Reich, Ontological design patterns for the integration of molecular biological information. In: Proceedings of the German Conference on Bioinformatics GCB'99, Hannover, 4–6 October 1999, pp. 156–166.

[17] G.A. Miller, WordNet: a lexical database for English, Communications of the ACM 38(11) (1995) 39–41.

[18] E. Riloff and J. Shepherd, A corpus-based approach for building semantic lexicons. In: Proceedings of the International Symposium on Cooperative Database Systems for Advanced Applications, 1999.

[19] J. Jannink and G. Wiederhold, Ontology maintenance with an algebraic methodology: a case study. In: Proceedings of the AAAI Workshop on Ontology Management, July 1999.

[20] J. Jannink, Thesaurus entry extraction from an on-line dictionary. In: Proceedings of Fusion '99, Sunnyvale, CA, July 1999.

[21] L. Page and S. Brin, The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the 7th Annual World Wide Web Conference, 1998.

[22] S. Brin, Extracting patterns and relations from the world wide web. In: WebDB Workshop at EDBT '98, 1998.

[23] A. Maedche and S. Staab, Discovering conceptual relations from text. In: ECAI 2000. Proceedings of the 14th European Conference on Artificial Intelligence (IOS Press, Amsterdam, 2000).

[24] A. Maedche and S. Staab, Mining ontologies from text. In: R. Dieng and O. Corby (eds), EKAW-2000 – 12th International Conference on Knowledge Engineering and Knowledge Management, Juan-les-Pins, France, 2–6 October (LNAI, Springer, Berlin, 2000).

[25] G. Neumann, R. Backofen, J. Baur, M. Becker and C. Braun, An information extraction core system for real world German text processing. In: ANLP'97 – Proceedings of the Conference on Applied Natural Language Processing, Washington, DC, pp. 208–215.

[26] R. Srikant and R. Agrawal, Mining generalized association rules. In: Proceedings of VLDB '95, 1995, pp. 407–419.

[27] A. Wagner, Enriching a lexical semantic net with selectional preferences by means of statistical corpus analysis. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.

[28] N. Abe and H. Li, Learning word association norms using tree cut pair models. In: Proceedings of the 13th International Conference on Machine Learning, 1996.

[29] G. Chalendar and B. Grau, SVETLAN: a system to classify nouns in context. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.

[30] G. Bisson, C. Nedellec and D. Canamero, Designing clustering methods for ontology building: the Mo'K workbench. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.

[31] D. Faure and T. Poibeau, First experiments of using semantic knowledge learned by ASIUM for information extraction task using INTEX. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.

[32] P. Constant, Reducing the Complexity of Encoding Rule-based Grammars. December 1996.

[33] A. Todirascu, F. Beuvron, D. Galea and F. Rousselot, Using description logics for ontology extraction. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, August 2000.


[34] C.A. Thompson and R.J. Mooney, Semantic Lexicon Acquisition for Learning Parsers, unpublished technical note, January 1997.

[35] N. Lavrac and S. Dzeroski (eds), Inductive Logic Programming: Techniques and Applications (Ellis Horwood, Chichester, 1994).

[36] B. Kwasnik, The role of classification in knowledge representation and discovery, Library Trends 48 (1999) 22–47.

[37] P. Weinstein, Ontology-based metadata: transforming the MARC legacy. In: F. Akscyn and F.M. Shipman (eds), Digital Libraries 98: Third ACM Conference on Digital Libraries (ACM Press, New York, 1998), pp. 254–263.

[38] J. Qin and S. Paling, Converting a controlled vocabulary into an ontology: the case of GEM, Information Research 6(2) (2000). Available at: http://InformationR.net/ir/6-2/infres62.html

[39] S. Borgo, N. Guarino, C. Masolo and G. Vetere, Using a large linguistic ontology for internet-based retrieval of object-oriented components. In: Proceedings of the 1997 Conference on Software Engineering and Knowledge Engineering, Madrid (Knowledge Systems Institute, Skokie, IL, 1997).

[40] T. Yamaguchi, Constructing domain ontologies based on concept drift analysis. In: Proceedings of the IJCAI-99 Workshop on Ontologies and Problem-Solving Methods: Lessons Learned and Future Trends, in conjunction with the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, August 1999.

[41] V. Kashyap, Design and creation of ontologies for environmental information retrieval. In: Proceedings of the Twelfth Workshop on Knowledge Acquisition, Modeling and Management, Alberta, October 1999.
