Approach to ontology construction based on text mining

New Zealand Journal of Agricultural Research
Publisher: Taylor & Francis
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tnza20

Wang Qian (a, b), Tao Lan (c) & Zhu Lijun (d)
a Computer and Information Management Center, Tsinghua University, Beijing, 100084, China
b College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
c College of Information Engineering, Shenzhen University, Shenzhen, 518060, China
d Institute of Science and Technology Information of China, Beijing, 100038, China
Published online: 22 Feb 2010.

To cite this article: Wang Qian, Tao Lan & Zhu Lijun (2007) Approach to ontology construction based on text mining, New Zealand Journal of Agricultural Research, 50:5, 1383-1391, DOI: 10.1080/00288230709510426
To link to this article: http://dx.doi.org/10.1080/00288230709510426


New Zealand Journal of Agricultural Research, 2007, Vol. 50: 1383-1391
0028-8233/07/5005-1383 © The Royal Society of New Zealand 2007

Approach to ontology construction based on text mining

WANG QIAN*
Computer and Information Management Center
Tsinghua University
Beijing 100084, China
and
College of Information and Electrical Engineering
China Agricultural University
Beijing 100083, China

TAO LAN
College of Information Engineering
Shenzhen University
Shenzhen 518060, China

ZHU LIJUN
Institute of Science and Technology Information of China
Beijing 100038, China

*Author for correspondence: [email protected]

Abstract The use of ontologies as representations of knowledge is widespread, but until recently their construction has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. By comparing thesauri with ontologies, we propose a method of constructing a dynamic agricultural ontology based on text mining. First, AGROVOC is transformed into a concept schema in OWL (Web Ontology Language). Then, further relations among concepts are mined from the literature that is subject-indexed with AGROVOC. The method addresses the ontology construction problem, makes it easier to build a more accurate domain ontology, and depends less heavily on input from domain experts.

A07167; Online publication date 21 May 2008
Received and accepted 10 August 2007

Keywords AGROVOC; data mining; ontology construction; relation mining

INTRODUCTION

The Semantic Web was introduced as a common framework that allows data to be shared and reused across application, enterprise and community boundaries (Berners-Lee et al. 2001). Ontologies are used to represent knowledge on the Semantic Web. An ontology is a conceptualisation of a domain in a human-understandable but machine-readable format consisting of entities, attributes, relationships and axioms (Guarino & Giaretta 1995). In the last 10 years, many tools for the construction of ontologies have been presented, such as Ontolingua (Farquhar et al. 1997), WebOnto (Duineveld et al. 2000), Protégé (Noy et al. 2000), OilEd (Bechhofer et al. 2001), OntoEdit (Sure et al. 2002) and KAON (Bozsak et al. 2002). However, even these tools rest on the assumption that useful knowledge is obtained only through manual input. Recent advances in text processing (Velardi et al. 2001), ontology learning (Navigli et al. 2003) and information extraction (Avigdor et al. 2004) provide dramatic improvements in automatic ontology construction. However, ontology-based text document processing, although promising, has not yet been exploited fully. One of the bottlenecks in developing ontology-based text processing systems is that the conceptual formalism supported by a typical ontology may not be sufficient to represent the uncertain information commonly found in many application domains (Quan et al. 2004).

A thesaurus is an indexing tool, developed in the last century and organised by domain experts. It contains almost all the terms, together with simple relations among them, needed to represent knowledge in a specific domain. There is also a large body of literature whose subjects are indexed with a thesaurus. AGROVOC (FAO 2005) is a thesaurus for agriculture. To construct an agricultural ontology automatically or semi-automatically, we propose a hybrid approach to the dynamic acquisition of a domain ontology. The method integrates data mining (DM), natural language processing (NLP), and the existing thesaurus and literature in agriculture.

AGROVOC

AGROVOC is a multilingual, structured and controlled vocabulary designed to cover the terminology of all subject fields in agriculture, forestry, fisheries, food and related domains. AGROVOC adopts USE, UF, BT, NT and RT, the five standard relationships defined by ISO 2788 and ISO 5964. At the semantic level these relations fall into three kinds: synonymy, affiliation and correlativity. USE and UF express synonymy, which indicates that two terms have the same meaning. BT and NT express affiliation, which indicates that one term includes elements of the other. RT expresses correlativity, which indicates a relationship other than synonymy and affiliation. The relationships of AGROVOC are shown in Table 1.
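The grouping of the five relation codes into three semantic kinds can be written down as a small lookup table. The sketch below is only illustrative: the function names and the record layout (term, code, other term) are assumptions, not part of AGROVOC or of the method itself.

```python
# Hypothetical lookup table; the codes and categories follow the text above.
RELATION_CATEGORY = {
    "USE": "synonymy",
    "UF": "synonymy",
    "BT": "affiliation",
    "NT": "affiliation",
    "RT": "correlativity",
}


def category_of(code: str) -> str:
    """Return the semantic category of a thesaurus relation code."""
    return RELATION_CATEGORY[code.strip().upper()]


def group_by_category(relations):
    """Group (term, code, other_term) records by semantic category."""
    grouped = {}
    for term, code, other in relations:
        grouped.setdefault(category_of(code), []).append((term, code, other))
    return grouped
```

Grouping the relations this way is convenient later, because each category is mapped to a different kind of OWL construct.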

A thesaurus is primarily used to index the subjects of literature for convenient information searching. Its structure is quite simple, and descriptors can be combined and expanded easily. It requires relatively little manual input and improves computer-based information retrieval. For these reasons, almost all subject-indexing vocabularies are thesauri, e.g., AGROVOC, EI, INSPEC.

DIFFERENCE BETWEEN AGROVOC AND ONTOLOGY

In 1993, the most widely accepted definition of ontology was proposed by Gruber (1993). There is some overlap between a thesaurus and an ontology. Both are information organisation tools that emphasise the representation, ranking and organisation of information, and both aim to make information easy to retrieve. But the two also have clear differences:
(1) Elements A thesaurus is made up of descriptors and simple relations, while an ontology is usually made up of classes, concepts, relations, functions, axioms and instances.
(2) Emphasis on information description The emphasis of a thesaurus is on indexing a subject with descriptors. An ontology is mainly used to define a concept clearly. It reflects the set of concepts recognised in the relevant domain.
(3) Structure and representation A thesaurus is mostly edited by hand with symbol-indexed relations; it has no concrete description language, and its organisational structure is tree-like and linear. An ontology is a specification of a conceptualisation. Its concepts are linked by multidimensional, network-like relations that are more complicated than those of a thesaurus.
(4) Semantic description Both a thesaurus and an ontology possess relations among terms or concepts. The relations of a thesaurus are more general and ambiguous, while the concept relations of an ontology are more complex and comprehensive.

Table 1 The relations of AGROVOC.

Relationship     Abbreviation   Description
Synonymy         USE            Use
Synonymy         UF             Used For
Affiliation      BT             Broader Term
Affiliation      NT             Narrower Term
Correlativity    RT             Related Term

CONSTRUCTION OF AGRICULTURAL ONTOLOGY

Construction process
At present, the familiar methods of building ontologies include TOVE (Gruninger & Fox 1995), Skeletal Methodology (Uschold & Gruninger 1996), METHONTOLOGY (Gomez-Perez et al. 1996) and the Cyclic Acquisition Process (Jorg-Uwe et al. 2000). Almost all of these methods were proposed within specific projects, so each ontology is constructed for a specific application and the construction processes differ.

Considering the wide use of thesauri and the large body of indexed literature, we propose a method of constructing a dynamic domain ontology based on a thesaurus.

As shown in Fig. 1, the process is divided into three phases: analysis, construction, and evaluation.

The detailed descriptions are as follows:
(1) Analysis phase In this phase, the scope is determined, and the thesaurus is then chosen according to that scope. A minimal scope is best in this phase.


Fig. 1 Process of dynamic ontology construction.

Fig. 2 Examples of terms and relations (the AGROVOC entry for "Oryza", showing its term code and its BT, NT and RT terms).

(2) Construction phase This phase has two parts. First, the structure of the selected thesaurus is converted into a concept schema: descriptors become concept classes and relations become concept properties. Then, with a suitable data mining algorithm, further relations among instances are found; the instances are obtained from the literature indexed with the thesaurus. At the end of this phase, the basic domain ontology has been formed.

(3) Evaluation and improvement phase Domain experts are involved in this phase and suggest suitable standards for evaluating the basic domain ontology formed in the second phase. If the basic domain ontology is not accurate, we can adjust the scope of the thesaurus or the algorithm to improve it.

Ontology construction is an iterative, cyclic process. In our approach, the second phase is especially important. In this paper, we give an example of the construction with AGROVOC.

Term selection
After analysing the target clearly, we can select the scope of the thesaurus according to the taxonomy and the scientific literature database. The smaller the scope, the better; in later cycles the scope can be extended as needed. For example, if the scope is agricultural information management, we can select the relevant descriptors and their relation sets in AGROVOC. First of all, agricultural information is generally managed by computer, and "Oryza" can be regarded as a root which can be extended. Part of the entry for "Oryza" in AGROVOC is shown in Fig. 2. "Term code" is the standard descriptor code in AGROVOC. The required terms are selected in turn by following the relations of the terms already selected. This step continues until the minimum scope is covered.
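Selecting terms "in turn according to the relations of selected terms" amounts to a bounded traversal of the thesaurus graph starting at the root term. The following is a minimal sketch under stated assumptions: the function name, the `max_depth` cut-off used to stand in for "minimum scope", and the toy relation data are all hypothetical and are not taken from AGROVOC.

```python
from collections import deque


def select_terms(relations, root, max_depth):
    """Collect the terms reachable from `root` within `max_depth` relation steps.

    `relations` maps a term to a list of (relation_code, related_term) pairs.
    """
    selected = {root}
    queue = deque([(root, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # scope limit reached; do not expand this term further
        for _code, other in relations.get(term, []):
            if other not in selected:
                selected.add(other)
                queue.append((other, depth + 1))
    return selected


# Hypothetical fragment of a thesaurus; not the actual AGROVOC entries.
toy_relations = {
    "Oryza": [("NT", "Oryza sativa"), ("RT", "Rice")],
    "Oryza sativa": [("RT", "Spikes")],
    "Spikes": [("RT", "Yields")],
}
print(sorted(select_terms(toy_relations, "Oryza", max_depth=1)))
```

Raising `max_depth` in a later cycle corresponds to extending the scope as needed.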

Concept schema of OWL
There are many ontology description languages, such as CycL (Cycope 2002), RDF/RDF Schema (W3C 2004) and OWL (W3C 2004). Among these languages, OWL offers more ways of representing semantic information than the others. In February 2004, W3C recommended it as the official standard ontology language.

In this paper, we use OWL to transform the thesaurus into a concept schema. Affiliation corresponds to the subClassOf or partOf property of the ontology (van Assem et al. 2004). The synonymy and correlativity relations can be described as object properties (owl:ObjectProperty). The concepts and properties of the concept schema should be named according to the descriptor codes, which identify items uniquely and conform to the uniform resource identifier (URI) standard. For example, "Oryza" can be defined as Class "C_5435" and RT can be defined as Property "R_90". By analysing the relationships between concepts in the thesaurus, we can directly draw the following conclusions:
• (x <use> y) -> (y <used_for> x)
  The property <use> can be tagged as the owl:inverseOf <used_for>.
• (x <broader_term> y) -> (y <narrow_term> x)
  The property <broader_term> can be tagged as the owl:inverseOf <narrow_term>.
• (x <broader_term> y) ∧ (y <broader_term> z) -> (x <broader_term> z)
  The property <broader_term> is an owl:TransitiveProperty.
• (x <narrow_term> y) ∧ (y <narrow_term> z) -> (x <narrow_term> z)
  The property <narrow_term> is an owl:TransitiveProperty.
• (x <related_term> y) -> (y <related_term> x)
  The property <related_term> is an owl:SymmetricProperty.

Fig. 3 System architecture of relation mining.
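The conversion of descriptors into classes and of thesaurus relations into properties with OWL characteristics can be sketched as the generation of plain (subject, predicate, object) triples. This is an illustrative sketch only: the function name, the use of `rdfs:label`, and the readable property names (taken from the rules in the text rather than codes such as "R_90") are assumptions, and no RDF library is involved.

```python
# Property characteristics following the rules stated in the text.
PROPERTY_AXIOMS = {
    "use": [("owl:inverseOf", "used_for")],
    "used_for": [],
    "broader_term": [("owl:inverseOf", "narrow_term"),
                     ("rdf:type", "owl:TransitiveProperty")],
    "narrow_term": [("rdf:type", "owl:TransitiveProperty")],
    "related_term": [("rdf:type", "owl:SymmetricProperty")],
}


def schema_triples(descriptors):
    """Build (subject, predicate, object) triples for a concept schema.

    `descriptors` maps a descriptor code (e.g. "5435") to its label.
    """
    triples = []
    for code, label in sorted(descriptors.items()):
        cls = f"C_{code}"  # class named after the descriptor code
        triples.append((cls, "rdf:type", "owl:Class"))
        triples.append((cls, "rdfs:label", label))
    for prop, axioms in PROPERTY_AXIOMS.items():
        triples.append((prop, "rdf:type", "owl:ObjectProperty"))
        for predicate, obj in axioms:
            triples.append((prop, predicate, obj))
    return triples


for triple in schema_triples({"5435": "Oryza"}):
    print(triple)
```

Keeping the characteristics in one table makes it easy to add those that are later discovered by mining.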

Relation mining
Transforming the terms and relations of a thesaurus into a concept schema only changes the organisation and representation of the thesaurus. To improve its practicability and reasoning ability, we should find further thesaurus relations and other non-taxonomic relations.

Description logic is used to organise the ontology knowledge system. A description logic knowledge base can be divided into two parts: the TBox, which contains a finite set of terminology, and the ABox, which contains a finite set of assertions. In this paper, the concept schema transformed from the thesaurus can be considered as the TBox, and the ABox can be obtained from a literature database indexed with the thesaurus. In other words, the literature indexed with the thesaurus is the direct source of concept instances. Generally, a document indexed with one concept also contains many related concepts. With a suitable NLP processor and data mining algorithm, we obtain further relations and properties from the literature database. The system architecture of relation mining is shown in Fig. 3.

The proposed relation mining system aims to extract relational concepts from agricultural literature. As shown in Fig. 3, the system has three major components.
• SRD (seed relation in domain) extraction: this module accepts ontology-based tagged text documents as input and extracts SRD from them using NLP techniques.
• RDF transform: this module changes the representation of SRD and produces RDF statements. According to AGROVOC and other characteristics of the RDF statements, such as mutual information and predicate information, all weak RDF statements are filtered out.
• Relation rules mining: this module implements deep relation mining based on association rule mining.

SRD extraction
Each document can be converted into an instance of the tree by distributing the tags (Muhammad 2005). Empirical studies suggest that ontology engineers may not always easily label a relation between two general concepts, since various relations among instances of the same general concepts are possible (Maedche 2002). To obtain the base relationships, we define SRD, a set containing relation verbs and the agricultural concepts that surround those verbs. An SRD defined on the Cartesian product of the concept sets C1, C2, ..., Cn is

$$\mathrm{SRD} = \{\, ((C_1, C_2, \ldots, C_n),\ v,\ (C_1, C_2, \ldots, C_n)) \mid (C_1, C_2, \ldots, C_n) \in C_1 \times C_2 \times \cdots \times C_n \,\} \qquad (1)$$

where v is a relation verb.

The ESRD (extraction of SRD) algorithm extracts relation variants, which capture some of the key behavioural features of concepts. It uses the following functions and sets:
• Trees is the set of instance segments obtained from the literature database;
• Tree is a tree instance segment which contains Root, Right_root (Rr) and Left_root (Lr);
• Root is a node that contains the rightmost verb tag;
• Right_root (Rr) is a node that contains all tags that are to the right of the tag considered at Root;
• Left_root (Lr) is a node that contains all tags that are to the left of the tag considered at Root.

Algorithm: ESRD (Extraction of SRD)

Input: List of highTF verbs; collection of Language_Schema; literature database indexed by thesaurus

Output: SRD aggregates.

1.  Convert literature into sentence binary tree structures (Trees).
2.  for each tree ∈ Trees {
3.    root = node_text  // start from node_text of the tree
4.    if (root ≠ Null) {
5.      Search_highTF_Verb (Stem_Verb_List, root);
        // search the stem verb list for a partial match of the root word
6.      if (the root word has a match with a highTF verb) {
7.        Check the first element of the text segment pointed to by Left_root (Lr)
8.        if (Lr has a match with a Language_Schema) {
9.          Store the root word along with the Concepts as a relation variant of the verb that was matched in step 5;
10.         goto step 15
11.       } else {
12.         Store the root word only as a relation variant (text) of the verb that was matched in step 5;
13.         goto step 15
          }
14.     } else {
15.       root = root.Right_root (Rr);
16.       goto step 4
        }
      }
    }
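The core idea of ESRD, matching a verb against a list of frequent verb stems and collecting the known concepts on either side of it, can be sketched in a few lines. This is a simplification under stated assumptions, not the algorithm above: a flat scan over a tagged token list stands in for the binary sentence tree, the tag "V" for verbs is assumed, and the function name and input format are hypothetical.

```python
def extract_srd(tagged_sentence, verb_stems, concepts):
    """Return (left_concepts, verb_stem, right_concepts) triples for one sentence.

    `tagged_sentence` is a list of (word, tag) pairs in which verbs carry the
    tag "V"; `verb_stems` is a list of frequent verb stems; `concepts` is a
    set of known concept labels in lower case.
    """
    words = [w.lower() for w, _ in tagged_sentence]
    results = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag != "V":
            continue
        # Partial match of the word against the stem list (compare step 5).
        stem = next((s for s in verb_stems if word.lower().startswith(s)), None)
        if stem is None:
            continue
        left = tuple(w for w in words[:i] if w in concepts)
        right = tuple(w for w in words[i + 1:] if w in concepts)
        if left and right:
            results.append((left, stem, right))
    return results


# Hypothetical tagged sentence for illustration.
sentence = [("Light", "N"), ("affects", "V"), ("rice", "N"), ("yields", "N")]
print(extract_srd(sentence, ["affect"], {"light", "rice", "yields"}))
```

Each returned triple has the shape used in the definition of SRD: the concepts to the left of the verb, the verb, and the concepts to its right.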


RDF transform

Resources can be transformed into RDF statements with another simple algorithm that splits the Cartesian products in SRD. The mutual information (MI) of the association between a concept pair Ci and Cj is computed as

[Equation (2) is illegible in the source; only its summations over documents d = 1, ..., n survive.]

where ⊗ denotes a conjunction operator, which is taken as a minimum operator in our case, and n is the number of documents. A threshold value may be applied to filter out all weakly associated concept pairs. A top-10 list of extracted concept associations, along with their MI, is shown in Table 2.

The predicate information (PI) of verbs between a concept pair Ci and Cj is computed as

[Equation (3) is illegible in the source.]
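Splitting the Cartesian products of an SRD entry into individual statements can be sketched as follows. The function name and the sample entry are hypothetical; the sketch covers only the splitting step and does not implement the mutual information or predicate information filters.

```python
from itertools import product


def srd_to_statements(srd):
    """Split one SRD triple into (subject, predicate, object) statements."""
    left_concepts, verb, right_concepts = srd
    return [(s, verb, o) for s, o in product(left_concepts, right_concepts)]


# Hypothetical SRD entry for illustration.
srd = (("light", "water"), "affect", ("rice",))
print(srd_to_statements(srd))
# prints [('light', 'affect', 'rice'), ('water', 'affect', 'rice')]
```

Every concept on the left of the verb is paired with every concept on the right, so an entry with m left concepts and k right concepts yields m × k candidate statements before filtering.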

Relation rules mining

The system can thus obtain the relation variants of relational verbs and can then compute the frequently occurring triplets of the form (s, p, o) very efficiently. However, few of the property rules defined in OWL are identified in this way. We therefore give an algorithm for further relation rules mining (RRM). The algorithm uses the following functions and sets:

• Φ is the set of RDF statements obtained from a literature database. Every statement is represented as (s, p, o), where s is the subject, p is the predicate and o is the object;
• P = {(s, p, o) | (s, p, o) ∈ Φ} is the set of all statements in Φ which contain property p;
• R is the set of property rules defined in OWL (W3C 2004);
• Support (c) is the support degree of concept c. It is computed as

[Equation (4) is illegible in the source.]

• Support (p) is the support degree of property p. It is computed as

[Equation (5) is illegible in the source.]

• Confidence (TranP) is the confidence degree with which p satisfies TransitiveProperty. It is computed as

$$\mathrm{Confidence}(\mathit{TranP}) = \frac{\lVert \{(x, z) \mid (x, p, z) \in P\} \rVert}{\lVert \mathrm{TC\_Closure}(\{(x, z) \mid (x, p, z) \in P\}) \rVert} \qquad (6)$$

• Confidence (SymmP) is the confidence degree with which p satisfies SymmetricProperty. It is computed as

$$\mathrm{Confidence}(\mathit{SymmP}) = \frac{\lVert \{(x, y) \mid (x, p, y) \in P\} \rVert}{\lVert \mathrm{SC\_Closure}(\{(x, y) \mid (x, p, y) \in P\}) \rVert} \qquad (7)$$

• Confidence (FuncP) is the confidence degree with which p satisfies FunctionalProperty. It is computed as

$$\mathrm{Confidence}(\mathit{FuncP}) = \frac{\lVert \{x \mid (x, p, y) \in P \wedge (x, p, z) \in P \Rightarrow y = z\} \rVert}{\lVert \{x \mid (x, p, y) \in P\} \rVert} \qquad (8)$$

Table 2 Concept pair associations. MI, Mutual information.

Concept 1                          Concept 2        MI
Mothers                            Fathers          0.86
Yields                             Spikes           0.58
Oryza sativa                       Hybridisation    0.50
Tillering                          Spikes           0.50
Planting stock                     Spikes           0.46
Oryza sativa                       Genes            0.46
Resistance to injurious factors    Genes            0.46
Heredity                           Genes            0.42
Roots                              Planting stock   0.42
Light                              Oryza sativa     0.39

Table 3 RDF statements. RDF, Resource description framework; PI, predicate information.

Subject            Predicate    Object          PI
Yield increases    increase     Rice            0.88
Oryza sativa       part of      Spikes          0.81
Light              affects      Oryza sativa    0.77
Rice               resist       Phosphorus      0.70
Cultivation        bring        Oryza sativa    0.68
Hardiness          inhibit      Genes           0.67
Rice               expressed    Genes           0.67
Water              irrigate     Genes           0.66
Hybridisation      induce       Hardiness       0.61
Spikes             increase     Rice            0.58


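• Confidence (P1, P2) is the confidence degree with which p1 and p2 satisfy the inverseOf property. It is computed from the statement pairs that mirror each other [the normalising factor of Equation (9) is illegible in the source]:

$$\mathrm{Confidence}(P_1, P_2) \propto \lVert \{(x, y) \mid (x, p_1, y) \in P_1 \wedge (y, p_2, x) \in P_2\} \rVert \qquad (9)$$

• Confidence (IFuncP) is the confidence degree with which p satisfies InverseFunctionalProperty. It is computed as

$$\mathrm{Confidence}(\mathit{IFuncP}) = \frac{\text{[numerator illegible in the source]}}{\lVert \{x \mid (x, p, y) \in P\} \rVert} \qquad (10)$$

• Confidence (From) is the confidence degree with which p satisfies an allValuesFrom or someValuesFrom restriction. It is computed as

$$\mathrm{Confidence}(\mathit{From}) = \frac{\lVert \{(x, p, y) \mid \exists (x, p, y) \in P \wedge y \in C\} \rVert}{\lVert \{(x, p, y) \mid \exists (x, p, y) \in P\} \rVert} \qquad (11)$$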
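The transitive, symmetric and functional confidence degrees can be sketched directly on a list of (s, p, o) statements. The sketch assumes that each confidence is a ratio: observed pairs over the pairs in the corresponding closure for the transitive and symmetric cases, and subjects with a single object over all subjects for the functional case. The function names are hypothetical, and the closure routines are simple illustrative implementations rather than the paper's own.

```python
def pairs_of(statements, p):
    """Return the set of (subject, object) pairs linked by property p."""
    return {(s, o) for s, q, o in statements if q == p}


def transitive_closure(pairs):
    """Return the transitive closure of a set of pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def confidence_transitive(statements, p):
    pairs = pairs_of(statements, p)
    return len(pairs) / len(transitive_closure(pairs)) if pairs else 0.0


def confidence_symmetric(statements, p):
    pairs = pairs_of(statements, p)
    closure = pairs | {(o, s) for s, o in pairs}  # symmetric closure
    return len(pairs) / len(closure) if pairs else 0.0


def confidence_functional(statements, p):
    objects = {}
    for s, o in pairs_of(statements, p):
        objects.setdefault(s, set()).add(o)
    if not objects:
        return 0.0
    # Subjects that have exactly one object behave functionally.
    single = sum(1 for objs in objects.values() if len(objs) == 1)
    return single / len(objects)


# Hypothetical statements for illustration.
demo = [
    ("a", "include", "b"), ("b", "include", "c"), ("a", "include", "c"),
    ("x", "affect", "y"), ("y", "affect", "x"),
]
print(confidence_transitive(demo, "include"), confidence_symmetric(demo, "affect"))
```

A confidence of 1.0 means the observed statements already contain everything the characteristic would entail; lower values indicate how much would have to be inferred.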

Algorithm: Relation Rules Mining

Input: RDF aggregates as Φ

Output: Properties that conform to the rules

for each (c ∈ C) {
  Compute Support (c);
  if (Support (c) < the threshold) Delete the record from Φ;
} // filter Φ by concept support

for each (p ∈ P) {
  Compute Support (p);
  if (Support (p) < the threshold) Delete the record from Φ;
} // filter Φ by property support

// Property characteristics mining
for each (p ∈ P) {
  Compute Confidence(TranP), Confidence(SymmP), Confidence(FuncP) and Confidence(IFuncP);
  if (Confidence > the threshold) output the Property;
  for each (p2 ∈ P) {
    Compute Confidence(p, p2);
    if (Confidence > the threshold) output the Property;
  }
}

// Property restrictions mining
for each (c ∈ C) {
  for each (p ∈ P) {
    Compute Confidence(From);
    output the Property according to Confidence(From);
  }
}

The output data of RRM is shown in Table 4.
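The filter-then-threshold structure of RRM can be sketched as a small driver. This is a partial sketch under loud assumptions: support is ASSUMED to be the share of statements that use a property (the paper's own support formulas are not reproduced here), only property support is filtered, only one characteristic is wired in as an example, and all function names are hypothetical.

```python
from collections import Counter


def filter_by_support(statements, min_support):
    """Drop statements whose predicate is rarer than `min_support`.

    ASSUMPTION: support is the share of statements using the predicate.
    """
    counts = Counter(p for _, p, _ in statements)
    total = len(statements)
    return [st for st in statements if counts[st[1]] / total >= min_support]


def symmetric_confidence(statements, p):
    """Ratio of observed pairs to the pairs in their symmetric closure."""
    pairs = {(s, o) for s, q, o in statements if q == p}
    closure = pairs | {(o, s) for s, o in pairs}
    return len(pairs) / len(closure) if pairs else 0.0


def mine_characteristics(statements, min_support, min_confidence, measures):
    """Return (property, characteristic) pairs whose confidence exceeds the threshold."""
    kept = filter_by_support(statements, min_support)
    found = []
    for p in sorted({q for _, q, _ in kept}):
        for name, measure in measures.items():
            if measure(kept, p) > min_confidence:
                found.append((p, name))
    return found


# Hypothetical statements for illustration.
stmts = [
    ("x", "affect", "y"), ("y", "affect", "x"),
    ("a", "increase", "b"), ("c", "increase", "d"),
    ("e", "bring", "f"),
]
measures = {"owl:SymmetricProperty": symmetric_confidence}
print(mine_characteristics(stmts, 0.3, 0.8, measures))
```

Passing the measures as a dictionary keeps the driver independent of which characteristics are mined; further confidence functions can be added without changing the loop.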


Table 4 Output data of relation rules mining (RRM).

Property characteristics      Properties
TransitiveProperty            include, increase, induce etc.
SymmetricProperty             affect, relate with etc.
FunctionalProperty            transform
inverseOf                     (increase, reduce)
InverseFunctionalProperty     null

Evaluation and improvement
From thesaurus selection to the completed ontology model, the whole process can be carried out automatically or semi-automatically by computer. The basic domain ontology model must then be evaluated. It should meet the criteria (Gruber 1993) of clarity, completeness, coherence, extensibility and minimal restriction. In addition, domain experts should participate in the evaluation. If the basic model is incomplete or inaccurate, we should extend the literature, adjust the threshold S of Support (pr) or the threshold C of Confidence (pr), repeat the modelling procedure and update the definition of the model.

Improving the ontology is an iterative, cyclic process which requires the participation of experts. The evaluation criteria for a domain ontology differ according to the requirements. To improve the reasoning ability of the ontology, further research is needed on mining the literature database with machine learning and data mining, so that reasoning can rest on precise logic.

CONCLUSION

Van Assem et al. (2004) suggested a method for converting thesauri to RDF/OWL, which targeted the organisation and representation of the thesaurus. Abulaish & Dey (2005) developed a text-mining framework to enhance a biological ontology with fuzzy relations. In their work, few of the property rules defined in OWL are identified.

In this paper we have considered some of the challenges faced in the construction and improvement of agricultural ontology, and we have argued for the use of the text literature as the basic resource for building it. Moreover, we have proposed a relation mining technique that can find further relations from a literature database indexed with a thesaurus. The proposed framework is used to construct a domain ontology in agriculture, and its quality can also be assured. The process can be completed automatically or semi-automatically by computer, so it depends less upon input from domain experts.

However, the approach is still incomplete in multiple-level relation mining and quantitative evaluation. In our research, too few valid RDF statements remained after filtering in the RDF transform step, so the subsequent association rule mining could not show its advantages. In future research, we intend to enlarge the literature resource and adjust the mining algorithm to support multiple-level relations. Moreover, we will use more specific criteria to evaluate the ontology, to make the generated domain ontology more accurate and more useful.

REFERENCES

Avigdor G, Giovanni M, Hasan J 2004. OntoBuilder: fully automatic extraction and consolidation of ontologies from web sources. In: Proceedings of the ICDE. Boston, IEEE Computer Society. Pp. 853-853.

Bechhofer S, Horrocks I, Goble C, Stevens R 2001. OilEd: a reason-able ontology editor for the semantic Web. In: Baader F, Brewka G, Eiter T ed. Proceedings of the KI Joint German/Austrian Conference on Artificial Intelligence. Heidelberg, Springer-Verlag. Pp. 396-408.

Berners-Lee T, Hendler J, Lassila O 2001. The semantic web. Scientific American Magazine, May 2001. Pp. 30-37.

Bozsak E, Ehrig M, Handschuh S, Hotho A, Maedche A, Motik B, Oberle D, Schmitz C, Staab S, Stojanovic L, Stojanovic N, Studer R, Stumme G, Sure Y, Tane J, Volz R, Zacharias V 2002. KAON-towards a large scale semantic web. In: Bauknecht K, Mintjoa A, Quirchmayr G ed. Proceedings of the 3rd International Conference on E-Commerce and Web Technologies. Heidelberg, Springer-Verlag. Pp. 304-313.

Cycope Inc. 2002. The syntax of CycL. http://www.cyc.com/cycdoc/ref/cycl-syntax.html

Duineveld AJ, Stoter R, Weiden MR, Kenepa B, Benjamins VR 2000. Wonder tools? A comparative study of ontological engineering tools. International Journal of Human-Computer Studies 52(6): 1111-1133.

FAO 2005. AGROVOC thesaurus. http://www.fao.org/aims/ag_intro.htm


Farquhar A, Fikes R, Rice J 1997. The Ontolingua server: a tool for collaborative ontology construction. International Journal of Human-Computer Studies 46(6): 707-727.

Gomez-Perez A, Fernandez-Lopez M, de Vicente A 1996. Towards a method to conceptualise domain ontologies. In: ECAI96 Workshop on Ontological Engineering, Budapest. Pp. 41-51.

Gruber TR 1993. A translation approach to portable ontology specifications. Knowledge Acquisition 5: 199-220.

Guarino N, Giaretta P 1995. Ontologies and knowledge bases: towards a terminological clarification. In: Mars N ed. Toward very large knowledge bases: knowledge building and knowledge sharing. Amsterdam, IOS Press. Pp. 25-32.

Gruninger M, Fox MS 1995. Methodology for the design and evaluation of ontologies. In: Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal.

Jorg-Uwe Kietz, Raphael Volz, Alexander Maedche 2000. Extracting a domain-specific ontology from a corporate intranet. Proceedings of the Fourth Conference on Computational Natural Language Learning and of the Second Learning Language in Logic Workshop.

Maedche A, Staab S 2001. Ontology learning for the semantic web. IEEE Intelligent Systems 16(2): 72-79.

Navigli R, Velardi P, Gangemi A 2003. Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems 18(1): 22-31.

Noy NF, Fergerson RW, Musen MA 2000. The knowledge model of Protégé-2000: combining interoperability and flexibility. In: Dieng R, Corby O ed. Proceedings of the EKAW 2000. Heidelberg, Springer-Verlag. Pp. 17-32.

Quan TT, Hui SC, Cao TH 2004. FOGA: a fuzzy ontology generation framework for scholarly semantic web. In: Proceedings of the 2004 Knowledge Discovery and Ontologies Workshop (KDO'04), Pisa, Italy, 24 September.

Sure Y, Angele J, Erdmann M, Staab S, Studer R, Wenke D 2002. OntoEdit: collaborative ontology engineering for the semantic web. In: Horrocks I, Hendler JA ed. Proceedings of the ISWC 2002. Heidelberg, Springer-Verlag. Pp. 221-235.

Uschold M, Gruninger M 1996. Ontologies: principles, methods and applications. Knowledge Engineering Review 11(2): 93-136.

van Assem M, Menken MR, Schreiber G, Wielemaker J, Wielinga BJ 2004. A method for converting thesauri to RDF/OWL. International Semantic Web Conference. Pp. 17-31.

Velardi P, Fabriani P, Missikoff M 2001. Using text processing techniques to automatically enrich a domain ontology. In: Proceedings of the International Conference on Formal Ontology in Information Systems. New York, ACM Press. Pp. 270-284.

W3C RDF Primer 2004. W3C Recommendation 10 February 2004. http://www.w3.org/TR/rdf-primer

W3C 2004. OWL web ontology language guide. W3C Recommendation 10 February 2004. http://www.w3.org/TR/2004/REC-owl-guide-20040210
