Knowledge Discovery in Ontology Learning A survey

Upload: susanna-burns

Post on 11-Jan-2016


Page 1: Knowledge Discovery in Ontology Learning A survey

Knowledge Discovery in Ontology Learning

A survey

Page 2: Knowledge Discovery in Ontology Learning A survey

Outline

• Introduction

• OL Data Input

• OL Application Fields

• OL Methods

• OL Tools (practical session)

Page 3: Knowledge Discovery in Ontology Learning A survey

Introduction

• Ontology Engineering is a time-consuming task

• Ontology Learning (OL) is the semi-automatic process supporting ontology engineering

• OL is a bottom-up, data-driven process

• OL is an interdisciplinary field

Page 4: Knowledge Discovery in Ontology Learning A survey

OL Data Input

• Pure NL text

• Ontologies

• KB (DB) instances

• Schemata

– DB schemata

– Web schemata

• Log files

Page 5: Knowledge Discovery in Ontology Learning A survey

OL Application Fields

• OL can support Ontology Engineering (and management) in different phases.

– Ontology extraction: based on some input, the ontology engineer gets an ontology proposal.

– Ontology reuse: pruning existing domain ontologies for a specific application.

– Ontology interoperability (multiple ontology management): mapping discovery.

Page 6: Knowledge Discovery in Ontology Learning A survey

OL Methods (outline)

• Ontology Extraction (from text)

– Weak ontology notion

    • Document Ontology extraction

– Strong ontology notion

    • Association rules

    • Conceptual clustering

• Ontology Reuse

– Ontology Pruning

• Ontology Learning for interoperability

Page 7: Knowledge Discovery in Ontology Learning A survey

Document Ontology extraction (1)

• Extraction of concepts from a set of documents and identification of relationships between these concepts with different individual terms [3]

• No semantic relations extraction

• Only concepts extraction (aggregation of terms identified with the same concept)

• Use of statistical analysis over a set of documents

• Good for domain specific applications

Page 8: Knowledge Discovery in Ontology Learning A survey

Document Ontology extraction (2)

• Input (text documents)

• Pre-processing

• Normalization

• LSI (using SVD)

• Document Ontology Construction

Page 9: Knowledge Discovery in Ontology Learning A survey

Document Ontology extraction (3)

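Singular Value Decomposition of the term-document matrix:

$$A_{m \times n} = U_{m \times r} \, S_{r \times r} \, V^{T}_{r \times n}$$

The rows of A are terms and its columns are documents; the rows of U are terms and its columns are concepts.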
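The extraction steps (input, pre-processing, normalization, LSI by SVD) can be sketched as follows. This is a minimal illustration using NumPy, not the implementation from [3]: the toy corpus, the unit-length column normalization and the choice k = 2 are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (hypothetical documents, assumed already tokenized and pre-processed).
docs = [
    ["ontology", "concept", "relation"],
    ["ontology", "concept", "taxonomy"],
    ["car", "train", "travel"],
    ["car", "drive", "travel"],
]

# Term-document matrix A: m terms (rows) x n documents (columns), raw counts.
terms = sorted({t for d in docs for t in d})
A = np.array([[d.count(t) for d in docs] for t in terms], dtype=float)

# Normalization (one possible choice): scale each document column to unit length.
A /= np.linalg.norm(A, axis=0, keepdims=True)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k strongest latent concepts (k is a free parameter of the sketch).
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each column of U_k weights the terms that contribute to one latent concept.
for c in range(k):
    top = np.argsort(-np.abs(U_k[:, c]))[:3]
    print(f"concept {c}:", [terms[i] for i in top])
```

Terms with large weights in the same column of `U_k` are candidates for aggregation into one concept; how the document ontology is then built from them is left to the method in [3].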

Page 10: Knowledge Discovery in Ontology Learning A survey

Association Rules (1)

• Make use of shallow text processing techniques [6]

• No taxonomic relation

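• Assumption: syntactic relations ⇒ semantic relations (a syntactic relation between two terms is taken as evidence of a semantic relation between the corresponding concepts)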

Page 11: Knowledge Discovery in Ontology Learning A survey

Association Rules (2)

• Preprocess the text documents

– Morphological analysis

– Recognition of named entities

– Retrieval of domain specific concepts (if available)

– Disambiguation using context information

• Determine the Concept Pairs set (CP) using several heuristics (either general or domain-dependent)

– NP-PP heuristic

– Sentence heuristic

– Title heuristic

Page 12: Knowledge Discovery in Ontology Learning A survey

Association Rules (3)

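• Determine $T = \{\{a_{i,1}, \dots, a_{i,n}\} \mid (a_{i,1}, a_{i,2}) \in CP \wedge \forall m > 2\, ((a_{i,1}, a_{i,m}) \in H \vee (a_{i,2}, a_{i,m}) \in H)\}$

• Determine support and confidence for all association rules $X_k \Rightarrow Y_k$, where $|X_k| = |Y_k| = 1$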

• Propose to the user only the rules that exceed user-defined thresholds

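$$\mathrm{support}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{n}$$

$$\mathrm{confidence}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{|\{t_i \mid X_k \subseteq t_i\}|}$$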
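The support and confidence measures, together with the threshold filter, can be sketched in a few lines. This is an illustrative sketch only: the transactions are hypothetical, the function names are invented for the example, and each rule has a single concept on each side as stated above.

```python
from itertools import permutations

# Hypothetical transactions: each is a set of concepts that occur together.
transactions = [
    {"hotel", "area"},
    {"hotel", "area", "room"},
    {"hotel", "room"},
    {"area", "city"},
]

def support(x, y, ts):
    """Fraction of all transactions that contain both X and Y."""
    both = x | y
    return sum(both <= t for t in ts) / len(ts)

def confidence(x, y, ts):
    """Among the transactions containing X, the fraction that also contain Y."""
    with_x = [t for t in ts if x <= t]
    if not with_x:
        return 0.0
    both = x | y
    return sum(both <= t for t in with_x) / len(with_x)

def propose_rules(ts, min_support, min_confidence):
    """Return the single-concept rules X => Y that exceed both thresholds."""
    concepts = sorted(set().union(*ts))
    rules = []
    for a, b in permutations(concepts, 2):
        s = support({a}, {b}, ts)
        c = confidence({a}, {b}, ts)
        if s > min_support and c > min_confidence:
            rules.append((a, b, s, c))
    return rules

for a, b, s, c in propose_rules(transactions, 0.4, 0.6):
    print(f"{a} => {b}: support={s:.2f}, confidence={c:.2f}")
```

Support is symmetric in X and Y, while confidence is not, so a pair of concepts can yield a rule in one direction only.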

Page 13: Knowledge Discovery in Ontology Learning A survey

Conceptual Clustering (1)

• Use of conceptual clustering approach [2,5] to extract a hierarchy of concepts and to learn subcategorization frames

• In our case, the examples to cluster are sets of words, each associated with the frequency of the corresponding instantiated frame in the corpus

• A syntactic parser provides parsed sentences, including attachments of noun phrases to verbs and clauses:

<to travel> <subject: father> <by: car>
<to travel> <subject: neighbor> <by: train>
<to drive> <subject: friend> <by: car>
<to drive> <subject: colleague> <by: motor-bike>
<to drive> <subject: friend> <by: motor-bike>

• Unambiguously parsed sentences are not required; noise is taken into account

• The meaning of the concepts of the ontology is characterized by the subcategorization frames they appear in

Page 14: Knowledge Discovery in Ontology Learning A survey

Conceptual Clustering (2)

E.g.:

<to travel> <subject: father> <by: car>
<to travel> <subject: neighbor> <by: train>
<to drive> <subject: friend> <by: car>
<to drive> <subject: colleague> <by: motor-bike>
<to drive> <subject: friend> <by: motor-bike>

<to travel> <subject: [father(1), neighbor(1)]> <by: [car(1), train(1)]>
<to drive> <subject: [friend(2), colleague(1)]> <by: [car(1), motor-bike(2)]>

<to travel> <subject: human> <by: motorized vehicle>
<to drive> <subject: human> <by: motorized vehicle>
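The first step of the example, going from instantiated frames to word sets with frequencies, can be sketched as below. The data structure (a verb paired with a role-to-word mapping) and the function name are assumptions made for illustration. The generalization to labels such as "human" is not covered, since it depends on the clustering itself.

```python
from collections import Counter, defaultdict

# Instantiated frames from the example above, as (verb, {role: head word}).
frames = [
    ("to travel", {"subject": "father", "by": "car"}),
    ("to travel", {"subject": "neighbor", "by": "train"}),
    ("to drive", {"subject": "friend", "by": "car"}),
    ("to drive", {"subject": "colleague", "by": "motor-bike"}),
    ("to drive", {"subject": "friend", "by": "motor-bike"}),
]

def aggregate(frames):
    """Collect, per verb and role, the head words with their frequencies."""
    clusters = defaultdict(lambda: defaultdict(Counter))
    for verb, roles in frames:
        for role, word in roles.items():
            clusters[verb][role][word] += 1
    return clusters

clusters = aggregate(frames)
for verb, roles in clusters.items():
    for role, words in roles.items():
        print(verb, role, dict(words))
```

Each `(verb, role)` pair then carries one basic cluster of words, which is the input to the merging step.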

Page 15: Knowledge Discovery in Ontology Learning A survey

Conceptual Clustering (3)

C1: to cook in      C2: to put in
oven (4)            oven (5)
stew pan (12)       stew pan (3)
frying pan (2)      wok (6)
                    pan (2)

Clusters with maximum overlap (that is, clusters that contain the same words with the same frequencies) have to be merged.
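One way to quantify the overlap of two such clusters is sketched below. It uses a frequency-weighted Jaccard ratio as a simple stand-in. This is an assumption for illustration and not necessarily the distance used in [2]. The assignment of words to C1 and C2 also assumes that the first three entries above belong to C1 and the remaining four to C2.

```python
def overlap(c1, c2):
    """Frequency-weighted Jaccard overlap of two word-frequency clusters, in [0, 1]."""
    words = set(c1) | set(c2)
    shared = sum(min(c1.get(w, 0), c2.get(w, 0)) for w in words)
    total = sum(max(c1.get(w, 0), c2.get(w, 0)) for w in words)
    return shared / total if total else 0.0

def merge(c1, c2):
    """Merge two clusters by summing the frequencies of their words."""
    merged = dict(c1)
    for word, count in c2.items():
        merged[word] = merged.get(word, 0) + count
    return merged

cook_in = {"oven": 4, "stew pan": 12, "frying pan": 2}
put_in = {"oven": 5, "stew pan": 3, "wok": 6, "pan": 2}

print(round(overlap(cook_in, put_in), 3))
print(merge(cook_in, put_in))
```

The ratio equals 1 only when both clusters contain the same words with the same frequencies, which matches the merge criterion stated above. A clustering loop would repeatedly merge the pair with the highest value.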

Page 16: Knowledge Discovery in Ontology Learning A survey

Ontology Pruning

• Ontology pruning is a data-driven means to reuse existing (general) ontologies in order to tailor them to a certain domain [4]

• The approach uses data-oriented techniques that are based on word/concept frequencies

• The idea is to compare the frequencies of words/concepts in two different corpora, one domain-specific and one generic

• Words/concepts whose frequency in the domain-specific corpus exceeds their frequency in the generic corpus by a given percentage are accepted; the others are rejected
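The frequency comparison can be sketched as a simple filter. The use of relative frequencies, the `margin` parameter and the toy corpora are assumptions made for this example. The actual criterion in [4] may differ in detail.

```python
from collections import Counter

def relative_freq(tokens):
    """Relative frequency of each word in a tokenized corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def prune(concepts, domain_tokens, generic_tokens, margin=0.5):
    """Keep concepts whose domain frequency exceeds the generic one by `margin`.

    margin=0.5 means the domain frequency must be more than 50% higher.
    """
    dom = relative_freq(domain_tokens)
    gen = relative_freq(generic_tokens)
    return [c for c in concepts
            if dom.get(c, 0.0) > (1 + margin) * gen.get(c, 0.0)]

# Hypothetical corpora, already tokenized and mapped to concept labels.
domain = ["hotel", "hotel", "room", "the", "the"]
generic = ["the", "the", "the", "hotel", "car", "car"]

print(prune(["hotel", "room", "car", "the"], domain, generic))
```

A concept that never occurs in the domain corpus is always rejected, and one that occurs only in the domain corpus is always accepted.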

Page 17: Knowledge Discovery in Ontology Learning A survey

OL for Interoperability (1)

• The key challenge here is to find semantic mappings between similar elements from two ontologies [1]

• First problem: how can we define a meaningful similarity measure?

• Second problem: how can we compute such measure using the available data?

• The approach assumes that instances are available from which the concepts can be learned

Page 18: Knowledge Discovery in Ontology Learning A survey

OL for Interoperability (2)

• Similarity Measure

– Many definitions are possible (it is task dependent)

– Many similarity measures are based on the joint probability distribution: P(A, B), P(¬A, B), P(A, ¬B), P(¬A, ¬B)

– Jaccard coefficient:

$$JC(A, B) = \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A, B)}{P(A, B) + P(\neg A, B) + P(A, \neg B)}$$
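As a small worked sketch, the Jaccard coefficient can be computed directly from three of the four joint probabilities. The function name and the example values are assumptions for illustration.

```python
def jaccard(p_ab, p_nota_b, p_a_notb):
    """JC(A, B) = P(A, B) / (P(A, B) + P(not A, B) + P(A, not B))."""
    denom = p_ab + p_nota_b + p_a_notb
    return p_ab / denom if denom else 0.0

# Hypothetical joint distribution: P(A,B)=0.2, P(not A,B)=0.1, P(A,not B)=0.1.
print(jaccard(0.2, 0.1, 0.1))
```

The value is 0 when A and B share no instances and 1 when they cover exactly the same instances. P(¬A, ¬B) does not enter the formula.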

Page 19: Knowledge Discovery in Ontology Learning A survey

OL for Interoperability (3)

• Distribution estimator

– We assume a set of instances that is representative of the universe covered by the ontology

– $N(U_i^{A,B})$ is the number of instances of the i-th ontology that belong to both A and B

– $P(A, B) = \frac{N(U_1^{A,B}) + N(U_2^{A,B})}{N(U_1) + N(U_2)}$

– Problem: what if A and B do not belong to the same ontology? (This is our case!)
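The estimator can be sketched as a count over both sets of instances. In this sketch each instance is represented by a pair of booleans `(in_A, in_B)`. How the missing membership is obtained when A and B come from different ontologies is outside the sketch, and the instance data are hypothetical.

```python
from collections import Counter

CELLS = [(True, True), (True, False), (False, True), (False, False)]

def joint_distribution(u1, u2):
    """Estimate P(A,B), P(A,not B), P(not A,B), P(not A,not B).

    u1 and u2 are lists of (in_A, in_B) boolean pairs, one pair per instance
    of the first and second ontology respectively.
    """
    counts = Counter(u1) + Counter(u2)
    total = len(u1) + len(u2)
    if total == 0:
        raise ValueError("at least one instance is required")
    return {cell: counts[cell] / total for cell in CELLS}

# Hypothetical memberships: 7 instances in U1 and 6 in U2.
u1 = [(True, True)] * 2 + [(True, False)] * 2 + [(False, True)] + [(False, False)] * 2
u2 = [(True, True)] * 2 + [(False, True)] * 2 + [(True, False)] + [(False, False)]

p = joint_distribution(u1, u2)
print(p[(True, True)])  # estimate of P(A, B)
```

The entry for `(True, True)` is exactly [N(U1^{A,B}) + N(U2^{A,B})] / [N(U1) + N(U2)]. The other three cells are estimated in the same way.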

Page 20: Knowledge Discovery in Ontology Learning A survey

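OL for Interoperability (4)

[Figure: estimating the joint distribution across two ontologies. A learner L is trained on the instances of the first taxonomy (t1 to t7), divided into U1^A and U1^¬A. The trained learner is then applied to the instances of the second taxonomy (s1 to s6), divided into U2^B and U2^¬B, which partitions them into U2^{A,B}, U2^{A,¬B}, U2^{¬A,B} and U2^{¬A,¬B}.]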

Page 21: Knowledge Discovery in Ontology Learning A survey

OL Tools (KAON)

• http://kaon.semanticweb.org

• Open Source

• Java based

• Implements a modular framework

• Text2Onto, module for OL from text (association rules, see Association Rules (1))

• Ontology Pruning implemented (simple filter on TF)

Page 22: Knowledge Discovery in Ontology Learning A survey

References

[1] A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to map between ontologies on the Semantic Web. In Proc. of the 11th International World Wide Web Conference (WWW 2002), Hawaii, USA, May 2002.

[2] D. Faure, C. Nedellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In 1st International Conference on Language Resources and Evaluation, Workshop on Adapting lexical and corpus resources to sublanguages and applications, Granada, Spain, pages 1-8, 1998.

[3] G. R. Maddi, C. S. Velvadapu, S. Srivastava, J. Gil de Lamadrid. Ontology Extraction from text documents by Singular Value Decomposition.

[4] A. Maedche, R. Volz, R. Studer, B. Lauser. Pruning-based identification of a domain in ontologies. In Proc. of I-KNOW'03, Graz, Austria, July 2003.

[5] A. Maedche, V. Zacharias. Ontology-based Instance Clustering. In Proc. of ECML/PKDD. Springer, 2002.

[6] A. Maedche, S. Staab. Discovering Conceptual Relations from Text. In Proc. of ECAI-2000.