Theory and Application of Database Systems

A Hybrid Approach for Extending Ontology from Text

He Wei

Outline

Introduction
Related Work
Our Proposed Method
Experiment and Evaluation
Conclusion

Introduction

Since the notion of ontology was introduced, it has attracted the attention of many researchers at home and abroad and has been applied in various fields of computer science.

Once an ontology has been constructed, manually adding a new concept to it is time-consuming and laborious, and extending an existing ontology automatically remains a great challenge.

To solve this problem, this paper proposes a hybrid approach for semi-automatically extending an ontology from text using semantic relatedness between words.

Related Work

Automatic and semi-automatic ontology extension has been studied for two decades. There are three kinds of approaches to ontology extension: natural language processing (NLP) based approaches, network based approaches, and user interaction approaches.

We propose a semi-automatic method for extending an ontology from text, which uses semantic relatedness between terms to discover new concepts and positions them in the seed ontology through several kinds of rules.

Our Proposed Method

Co-occurrence analysis and a word filter are used to acquire, from the documents, the candidate concepts for each concept of the seed ontology.

To speed up ontology extension, we use semantic relatedness between words to compress the extended concept space.

Extension rules and subsumption analysis are used to add the extended concepts to the seed ontology, generating the extended ontology.

Identifying the Candidate Concepts Using Co-occurrence Analysis and Word Filter

For each concept C of the seed ontology:

Use a search engine to obtain a domain document set related to C, named D, and look for the words that co-occur with C, CoWord(C), in D to generate the co-occurrence word set, denoted CoWordSet(C), where CoWordSet(C) = {wi | wi ∈ CoWord(C)}.

Count CoFreq(wi) and AFreq(wi) for each wi in CoWordSet(C) over the document set D, and discard the words for which AFreq(wi) >> CoFreq(wi) and CoFreq(wi) < 5.

Rank the remaining co-occurrence words in CoWordSet(C) by their CoFreq(wi) in descending order.

Calculate the relative importance of each wi in CoWordSet(C), RI(wi), and compute the entropy of each wi in CoWordSet(C), Entropy(wi).

Select the overlap words, that is, the words in CoWordSet(C) that have both higher RI and higher Entropy scores, to generate the candidate concept set of concept C, denoted CandCpt(C).
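The counting and filtering steps above can be sketched roughly as follows. This is only an illustration under several assumptions that the slides do not state: co-occurrence is taken at document level, CoFreq(wi) is read as the number of documents containing both wi and C, AFreq(wi) as the number of documents containing wi at all, the two discard conditions are treated as independent filters, and the parameter `max_ratio` is a hypothetical stand-in for "AFreq(wi) >> CoFreq(wi)". The RI and Entropy steps are not covered.

```python
from collections import Counter


def candidate_cooccurrence_words(docs, concept, min_cofreq=5, max_ratio=10.0):
    """Rank words co-occurring with `concept` after frequency filtering.

    docs: list of token lists (one list per document).
    min_cofreq mirrors the CoFreq < 5 cut-off from the slides.
    max_ratio is a hypothetical threshold for "AFreq >> CoFreq".
    """
    cofreq = Counter()  # documents containing both the word and the concept
    afreq = Counter()   # documents containing the word at all
    for tokens in docs:
        words = set(tokens)
        has_concept = concept in words
        for w in words:
            if w == concept:
                continue
            afreq[w] += 1
            if has_concept:
                cofreq[w] += 1

    # Keep words that co-occur often enough and are not far more frequent
    # overall than they are alongside the concept.
    kept = [
        w for w in cofreq
        if cofreq[w] >= min_cofreq and afreq[w] <= max_ratio * cofreq[w]
    ]
    # Descending CoFreq; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda w: (-cofreq[w], w))
```

With a real corpus, `docs` would be the tokenized document set D and the defaults would apply; smaller thresholds are only useful for toy inputs.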

Obtaining the Extended Concepts Using Semantic Relatedness between Words

We use semantic relatedness between words to compress the extended concept space. We select only a portion of the concepts in CandCpt(C) as the extended concepts.

The selection process is as follows: for each concept Ci in CandCpt(C), we measure the semantic relatedness between Ci and C, and select the concepts with the highest semantic relatedness scores as the extended concepts. In this paper, we use only the top 3 concepts.
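A minimal sketch of this top-3 selection is shown below. The slides do not say which semantic relatedness measure is used, so `relatedness` is a hypothetical callable supplied by the caller; the function name and the tie-breaking rule are likewise illustrative.

```python
def select_extended_concepts(concept, candidates, relatedness, k=3):
    """Return the k candidates most related to `concept`, with their scores.

    relatedness: hypothetical callable (candidate, concept) -> float score;
    the actual measure is not specified in the slides.
    """
    scored = [(relatedness(ci, concept), ci) for ci in candidates]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(ci, score) for score, ci in scored[:k]]
```

For example, with a toy lookup table standing in for the relatedness measure, `select_extended_concepts("school", ["teacher", "bus", "campus", "pupil"], lambda ci, c: table[ci])` keeps the three highest-scoring candidates.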

Extending Ontology Using Extension Rules and Subsumption Analysis

Rule 1: if the semantic relatedness score between the concepts Ci and C is equal to 1 or approximately 1, the two concepts are semantically consistent and hold a synonym relationship. We add the concept Ci to the synonym attribute of concept C.

Rule 2: if the semantic relatedness score between the concepts Ci and C is the maximum but does not satisfy extension rule 1, we use subsumption analysis to identify the semantic relationship between Ci and C.

Subsumption analysis: given two concepts Ci and C, the concept C is said to be more general than the concept Ci if the following condition holds:

Rule 3: if the semantic relatedness score between the concepts Ci and C is the maximum but satisfies neither extension rule 1 nor rule 2, we consider that a related relationship holds between Ci and C. We add the concept Ci to the related attribute of concept C.
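The three extension rules can be read as a simple cascade, sketched below. Several details are assumptions rather than part of the source: the cut-off `synonym_threshold` is a hypothetical reading of "approximately 1", the outcome of the subsumption analysis is passed in as a boolean because its condition is not reproduced in this transcript, and labelling a successful subsumption as an is-a relationship is inferred from the relationship kinds listed for the seed ontology.

```python
def classify_relation(score, c_subsumes_ci, synonym_threshold=0.95):
    """Decide which relationship to record between Ci and C.

    score: semantic relatedness between Ci and C (assumed to lie in [0, 1]).
    c_subsumes_ci: result of the subsumption analysis, computed elsewhere.
    synonym_threshold: hypothetical cut-off for "approximately 1".
    """
    if score >= synonym_threshold:
        return "synonym"   # Rule 1: add Ci to the synonym attribute of C
    if c_subsumes_ci:
        return "is-a"      # Rule 2 (assumed outcome): C is more general than Ci
    return "related"       # Rule 3: add Ci to the related attribute of C
```

Rule 1 takes priority in this sketch, so a near-1 score is recorded as a synonym even when the subsumption check would also succeed.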

Experiment and Evaluation

Experiment

We select some terms related to the education field, a sub-field of E-government, and construct a seed ontology for our experiment. The seed ontology consists of 10 concepts and includes three kinds of relationships between these concepts: synonym, hyponym/hypernym (is-a), and related.

We download about 4,000 pages related to education from the website of the Ministry of Education of China, and then use htmlparser to extract the content of these pages and generate the domain document set D.
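The content extraction step can be illustrated with a short sketch. It uses Python's standard `html.parser` module as a stand-in for the htmlparser tool named above, so it does not reflect that tool's API; the class and function names are illustrative, and skipping script and style content is an assumption about what counts as page content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def page_text(html):
    """Return the text content of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Applying `page_text` to each downloaded page and tokenizing the results would give one possible form of the document set D.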

Evaluation

We choose the education-related part of a gold standard E-government domain ontology, constructed from the E-government thesaurus [11], as our reference ontology. It has about 4,500 terms and three kinds of relationships between terms: synonym, hyponym/hypernym (is-a), and related.

The improved recall, precision, and F1-Measure are used to evaluate our proposed method.

Because an ontology consists of concepts and the relationships between concepts, we define the improved recall, precision, and F1-Measure by the following formulas.
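The improved definitions themselves are not reproduced in this transcript. For orientation only, the sketch below computes the standard set-based precision, recall, and F1 over a combined set of concepts and relationship triples; it is not the improved measure used in the evaluation, and the function name and item representation are illustrative.

```python
def precision_recall_f1(extracted, reference):
    """Standard precision, recall, and F1 between two collections of items.

    Items can be concept names or (term, relation, term) triples; this is
    the textbook definition, not the improved variant from the slides.
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Here F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.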

The F1-Measure rises as the number of ontology extension iterations increases. It reaches 0.6827 after the fifth iteration, which is a promising value and indicates that the proposed method is valuable. Precision remains at a high level throughout, ranging from 0.7278 to 0.9899.

Conclusion

With the massive amount of new web information, existing ontologies seriously lag behind the emergence of new concepts and are not suitable for organizing and managing the new information.

To solve this problem, this paper proposes a hybrid approach for extending an ontology from text using semantic relatedness between words, which adds the new concepts discovered in documents to the existing ontology.

Evaluation results on the improved recall, precision, and F1-Measure demonstrate that the proposed method is promising and reasonable.

There is a small drawback caused by the relationship definitions used during ontology extension.
