prontolearn: unsupervised lexico-semantic ontology generation using probabilistic methods

37
Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions PrOntoLearn: Unsupervised lexico-semantic ontology generation using probabilistic methods Saminda Abeyruwan 1 Ubbo Visser 1 Vance Lemmon 2 Stephan Sch¨ urer 3 Department of Computer Science, University of Miami The Miami Project to Cure Paralysis, University of Miami Miller School of Medicine Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine URSW 2010 7 th November, 2010

Upload: rommel-carvalho

Post on 07-May-2015

1.264 views

Category:

Documents


0 download

DESCRIPTION

Presentation given by Saminda Abeyruwan at the 6th Uncertainty Reasoning for the Semantic Web Workshop at the 9th International Semantic Web Conference in November 7, 2010. Paper: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic MethodsAbstract: Formalizing an ontology for a domain manually is well-known as a tedious and cumbersome process. It is constrained by the knowledge acquisition bottleneck. Therefore, researchers developed algorithms and systems that can help to automatize the process. Among them are systems that include text corpora for the acquisition. Our idea is also based on vast amount of text corpora. Here, we provide a novel unsupervised bottom-up ontology generation method. It is based on lexico-semantic structures and Bayesian reasoning to expedite the ontology generation process. We provide a quantitative and two qualitative results illustrating our approach using a high throughput screening assay corpus and two custom text corpora. This process could also provide evidence for domain experts to build ontologies based on top-down approaches.

TRANSCRIPT

Page 1: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

PrOntoLearn: Unsupervised lexico-semanticontology generation using probabilistic methods

Saminda Abeyruwan1 Ubbo Visser1 Vance Lemmon2

Stephan Schurer3

Department of Computer Science, University of MiamiThe Miami Project to Cure Paralysis, University of Miami Miller School of Medicine

Department of Molecular and Cellular Pharmacology, University of Miami Miller School ofMedicine

URSW 2010 7th November, 2010

Page 2: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Outline

1 Motivation

2 Related work

3 Deficiencies

4 Research approach

5 Results

6 Discussion

7 Summary & Future work

8 Questions

Page 3: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Motivation

Why?

1 An ontology is a formal, explicit specification of a sharedconceptualisation [TRG93, RS98]

2 Knowledge-bases are represented by ontologies [UMLS09]

3 Formalizing an ontology for a domain is a tedious and cumbersomeprocess (Knowledge acquisition bottleneck (KAB))

4 Substantially large text corpora available to be classified into anontology [BAO09]

5 Text corpora of the domain of discourse contains

RedundancyStructured and unstructured textNoisy data (Uncertainty via Degree of belief)Lexical disambiguitiesSemantic heterogeneity problems

6 Research on KAB is highly investigated by the Semantic (Web)Community

Page 4: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

General idea

General idea

1 Reverse engineering an ontology (bottom-up) (Lexicon ⇒ Anontology)

2 Bayesian reasoning to deal with degree of belief

3 Conceptualization is learned through probabilistic reasoning

4 Lexicon-semantic structues extracted from Wordnet 3.0 [WN3009]

5 Use top-down approach to check the consistency of the generatedontology

6 Constrained by conditions and hypotheses

7 Serialize the learned ontology into OWL DL and query usingSPARQL

“A little semantics goes a long way” - Hendler hypothesis [JH03]

Page 5: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Probabilistic reasoning & Heterogeneity

Probabilistic reasoning

P-CLASSIC [DK97]

P-OWL extension [ZD04]

P-SHIF(D), P-SHOIN(D) & P-Pellet [TL07, PP08]

Heterogeneity

Read the web project [TM09, TM10]

SEAL, iSEAL & ASIA [RW07, RW08, RW09]

Taxonomy induction [RS06]

LOD [JB09, LD06]

Page 6: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Knowledge acquisition & ontology learning

Knowledge acquisition

Approaches [PC09, DSK09, LS09, HC09, JB05]

Large scale knowledge extractionKnowledge integrationExtracting commonsensical knowledgeTextual entailment with first-order-logic

Tools [TTO00, SS09, OLSW02, TTO01, HT09]

Text-To-Onto, Text2Onto, OntoWare.org LExO & HermiT

Ontology learning

Learning [PC09, PH05, CC08, JL09, LBM08]

Dealing with uncertainty and inconsistencySemantic concepts with unsupervised statistical learningSemantic Web Services & floksonomy

Formal concept analysis [PC05]

Page 7: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Deficiencies

Related work

Pros

Learning terms, synonyms, concepts, taxonomies, rules, relations andaxioms for ontology O

NLP, dictionary passing, statistical methods & machine learningtechniques and co-occurrence among terms

Cons

Top-down approach. Classification or an ontology is givenUncertainty is dealt with a domain expertMost of the conceptualisation is learned by predefined rules

Our approach

1 Substantially large text corpora

2 Uncertainty is represented with probabilistic approach

3 Unsupervised learning

4 Hypothesis: an ontology generation is much faster

5 Goal: to achieve maximum confidence

Page 8: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Goals

Goals

1 To generate consistent lexico-semantic ontology O with a T − Boxand a A− Box that can be serialized into OWL DL

2 Querying via SPARQL [SPARQL08] [JENA09]

How do we start ?

1 Corpus C contains a lot of documents di (di ∈ C ) for i = 1, 2, 3, . . .

2 Learned lexicon set L contains a finite list of words wj

(L = w1,w2, . . . ,wn) and group set G contains a finite set of groupsgk (G = g1, g2, . . . , gm)

Page 9: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Overall process

Page 10: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Definition

The lexicon L is the set that contains words belonging to the universe ofEnglish vocabulary, which is part-of-speech type tagged with the PennTreebank English POS tag set [PT10] and the type of the word IS,

Term DescriptionNN Noun, singular or massNNP Proper Noun, singularNNS Noun, pluralNNPS Proper Noun, pluralJJ AdjectiveJJR Adjective, comparativeJJS Adjective, superlativeVB Verb, base formVBD Verb, past tenseVBG Verb, gerund or present participleVBN Verb, past participleVBP Verb, non-3rd person singular presentVBZ Verb, 3rd person singular present

Page 11: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Phases

Phases

1 Pre-processing

Stanford tagger (the Pen Treebank POS tagger)Filter elements for lexicon

2 Syntactic analysis

Boostrap algorithm to count frequencies of words, groupsNormalizing, stemming and lemmatization of words

3 Semantic analysis

Bayesian reasoning to produce concepts and relationsSubsumption hierarchy inductionHyponym and meronym analysis

4 Representation

Serialize to OWL DL

Page 12: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Pre-processing

Filter

Regex ([a-zA-Z]+[- ]?w*) , Length of a word (2)

Example

1 The mevalonate pathway is comprised of three consecutive reactionsthat are catalyzed by the enzymes mevalonate kinase (MK; E.C.2.7.1.36), phosphomevalonate kinase (PMK; E.C. 2.7.4.2), anddiphosphomevalonate decarboxylase (PDM-DC; E.C. 4.1.1.33).

2 The DT mevalonate JJ pathway NN is VBZ comprised VBN of INthree CD consecutive JJ reactions NNS that WDT are VBPcatalyzed VBN by IN the DT enzymes NNS mevalonate VBPkinase NN -LRB- -LRB- MK NNP ; : E.C. NNP 2.7.1.36 CD-RRB- -RRB- , , phosphomevalonate JJ kinase NN -LRB- -LRB-PMK NNP ; : E.C. NNP 2.7.4.2 CD -RRB- -RRB- , , and CCdiphosphomevalonate JJ decarboxylase NN -LRB- -LRB-PDM-DC NN ; : E.C. NNP 4.1.1.33 CD -RRB- -RRB- . .

Page 13: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Pre-processing

Filter

Regex ([a-zA-Z]+[- ]?w*) , Length of a word (2)

Example

1 The mevalonate pathway is comprised of three consecutive reactionsthat are catalyzed by the enzymes mevalonate kinase (MK; E.C.2.7.1.36), phosphomevalonate kinase (PMK; E.C. 2.7.4.2), anddiphosphomevalonate decarboxylase (PDM-DC; E.C. 4.1.1.33).

2 The DT mevalonate JJ pathway NN is VBZ comprised VBN of INthree CD consecutive JJ reactions NNS that WDT are VBPcatalyzed VBN by IN the DT enzymes NNS mevalonate VBPkinase NN -LRB- -LRB- MK NNP ; : E.C. NNP 2.7.1.36 CD-RRB- -RRB- , , phosphomevalonate JJ kinase NN -LRB- -LRB-PMK NNP ; : E.C. NNP 2.7.4.2 CD -RRB- -RRB- , , and CCdiphosphomevalonate JJ decarboxylase NN -LRB- -LRB-PDM-DC NN ; : E.C. NNP 4.1.1.33 CD -RRB- -RRB- . .

Page 14: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Syntactic analysis

Bootstrap

1 di (di ∈ C ) for i = 1, 2, 3, . . .

2 From di read each sentence sj using OpenNLP(sj ∈ di for j = 1, 2, 3, . . .)

3 Generate lexicon L according to the definition of lexicon

4 Each lexis wk ∈ L is normalized: find lemma or stemmed usingWordnet 3.0

5 Candidate semantic groups gl using N − Gram model for lexis wk

[SJB09]

6 Candidate binary relationships vi (gj , gk) vi , gk ∈ L using pattern(N∗

W O∗

W VW N∗

W O∗

W )∗

Page 15: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

N-Gram model

3-Gram model

4-Gram model

Probability

P(wi |gj) where i > 0, j > 0

Page 16: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

N-Gram model

3-Gram model

4-Gram model

Probability

P(wi |gj) where i > 0, j > 0

Page 17: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

T-Box subsumption model

Subsumption model

w1

g1

w2

g2

w3

g3

w4

g4

w5

g2

BN1 BN2

BN3

BN4

BN5

Page 18: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

T-Box relations model

Relations model

Semantic mapping

p(C1,C2|V ) = p(C1|V )p(C2|V ) → V (C1,C2)

Page 19: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Semantic analysis & representation

Semantics

1 Calculate probabilities

2 T-Box subsumption model. Pruning parameter KF

3 T-Box relations model. Pruning parameter RF

4 Antonomy pruning

5 Subsumption hierachy induction

6 Hyponomy and meronym analisys using Wordnet recognizable words

7 Serialize models to OWL DL

Page 20: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Example: T-Box Subsumption

Example

Page 21: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Example: T-Box Relations

Example

Page 22: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Example - Subsumption hierachy induction

Example

Page 23: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Datasets

Datasets

1 PubChem assays, large public hight throughput screening dataset[BAO09] (primary, qualitative evaluation). (Semantic Web Challenge2010, http://bioassayontology.org)

2 Sample collection of 218 web pages extracted from the University ofMiami, Dept. of Computer Science (www .cs.miami .edu) domain(quantitative evaluation)

3 Sample collection of 38 pdf files from ISWC 2009 proceedings(secondary)

Page 24: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: www .cs.miami .edu domain

Detaset

Title Statistics Description

DocumentsAll documents are xhtml

218 formated with a give template

Unique ConceptWordsNorm. candidate concept words

5,384 from NN, NNP, NNS, JJ, JJR& JJS using [a-zA-Z]+[- ]?w*

Unique VerbsNorm. verbs from

835 VB, VBD, VBG, VBN, VBP& VBZ using [a-zA-Z]+[- ]?w*

Total ConceptWords 39,455Total Verbs 4,797Total Lexicon 44,252 L = ConceptWords

⋂Verbs

Total Groups 39,455

Page 25: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: www .miami .edu domain, quantitative

Measures: ref. ontology 1

KF Prec. Rec. F10.1 0.209 1 0.3090.2 0.194 1 0.3250.3 0.257 1 0.4100.4 0.257 1 0.4100.5 0.257 1 0.4100.6 0.248 1 0.3970.7 0.244 1 0.3930.8 0.236 1 0.3830.9 0.237 1 0.3831.0 0.13 1 0.232

Measures: ref. ontology 2

KF Pre. Rec. F10.1 0.424 1 0.5960.2 0.388 1 0.5590.3 0.445 1 0.6160.4 0.438 1 0.6090.5 0.438 1 0.6090.6 0.424 1 0.5950.7 0.415 1 0.5870.8 0.412 1 0.5830.9 0.405 1 0.5761.0 0.309 1 0.472

Page 26: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: www .miami .edu domain, quantitative

Measures: ref. ontology 1

KF Prec. Rec. F10.1 0.209 1 0.3090.2 0.194 1 0.3250.3 0.257 1 0.4100.4 0.257 1 0.4100.5 0.257 1 0.4100.6 0.248 1 0.3970.7 0.244 1 0.3930.8 0.236 1 0.3830.9 0.237 1 0.3831.0 0.13 1 0.232

Measures: ref. ontology 2

KF Pre. Rec. F10.1 0.424 1 0.5960.2 0.388 1 0.5590.3 0.445 1 0.6160.4 0.438 1 0.6090.5 0.438 1 0.6090.6 0.424 1 0.5950.7 0.415 1 0.5870.8 0.412 1 0.5830.9 0.405 1 0.5761.0 0.309 1 0.472

Page 27: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: PubChem dataset (primary)

Dataset

Title Statistics Description

DocumentsAll documents are xhtml

1,759 formated with a given template

Unique ConceptWordsNorm. candidate concept words

13,017 from NN, NNP, NNS, JJ, JJR& JJS using [a-zA-Z]+[- ]?w*

Unique VerbsNorm. verbs from

1,337 VB, VBD, VBG, VBN, VBP& VBZ using [a-zA-Z]+[- ]?w*

Total ConceptWords 631,623Total Verbs 109,421Total Lexicon 741,044 L = ConceptWords

⋂Verbs

Total Groups 631,623

Page 28: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: BioAssay ontology dataset (primary)

Evaluation: qualitative

Availability of ground truth

Domain expert evaluation (Prof. Stephan Schuerer)

Results for 3-gram

Rich vocabularyGood structure

Suitable as a seeding ontology to influence domain experts decisions

Page 29: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Dataset: BioAssay ontology dataset (primary)

Thing assay

acetylcholine_plate_step

0_acetylcholine

acetylcholine_calcium_receptor

acetylcholine_receptor_turnover

acetylcholine_rat_receptor

acetylcholine_nanomolar_plate

screen {some} assay_compound_l inescreen {some} cel l_compound_l ine

add {some} acetylchol ine_assay_bufferadd {some} assay_buffer_second

add {some} acetylchol ine_assay_bufferadd {some} assay_buffer_secondadd {some} buffer_second_st imulat ionadd {some} second_step_st imulat ion

screen {some} assay_compound_l inescreen {some} cel l_compound_l ine

Page 30: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Discussion

Discussion

NLP expressions and our expression. Semantic attachment

Substantial amount of data

Distinction between concepts and individuals of the concepts

WordNet unrecognizable words. Porter stemming algorithm.

Complexity

Syntactic layer: O(M ×max(sj)×max(wk))Semantic layer: O(|L| × |SuperConcepts|)Representation layer: complexity of Jena object model serializer

Pellet and Fact++ reasoner output

Page 31: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Summary & Future work

Summary

Goal: The construction of an ontology for a random corpus

Achievement: Seed ontology construction for a random text corpus

Probabilistic reasoning to classify lexico-semantic structures

Future work

Inclusion of a set of English grammar rules to the N-gram models toget variable window sizes

Extract information from other sources to provide a human readableconcepts and roles

Computational lexical semantics

Expand the scope with adding more Pen Treebank tags

Page 32: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Summary & Future work

Summary

Goal: The construction of an ontology for a random corpus

Achievement: Seed ontology construction for a random text corpus

Probabilistic reasoning to classify lexico-semantic structures

Future work

Inclusion of a set of English grammar rules to the N-gram models toget variable window sizes

Extract information from other sources to provide a human readableconcepts and roles

Computational lexical semantics

Expand the scope with adding more Pen Treebank tags

Page 33: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

Questions

Page 34: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

UMLS. Unified Medical Language System http://www.nlm.nih.gov/research/umls/ , 2009

Alfresco Share Team. Alfresco BioAssayOntology University of Miami, http://share.ccs.miami.edu/share/page/site-index , 2009.

T. Berners-Lee. Linked Data W3C Design Issues, 2006.

J. Volker, P. Haase and P. Hitzler Learning Expressive Ontologies Volume 2 Studies on the Semantic Web, 2009

T. Mitchell. Populating the Semantic Web by Macro-Reading Internet Text ISWC Keynote, 2009.

A. Maedche and S. Staab The TEXT-TO-ONTO Ontology Learning Environment, ICCS, 2000.

A. Maedche. Ontology Learning for the Semantic Web, Kluwer Academic Publishers,2002.

S. Staab and R. Studer. Handbook on Ontologies International Handbooks on Information Systems, 2009

A. Maedche and R. Volz The Ontology Extraction and Maintenance Framework Text-To-Onto, In proceeding of the ICDM’01

Workshop on Integrating Data Mining and Knowledge Management, 2001

P. Cimiano and J. Volker Text2Onto A framework for Ontology Learning and Data-driven Change Discovery, Proceedings of

the 10th Internatioanl Conference on Applications of Natural Language to Information System (NLDB), volume 3513 of LNCS,pages 227-238, Alicante, Spain, Springer, 2005

P. Haase and J. Volker Ontology Learning and Reasoning - Dealing with Uncertainty and Inconsistency, Proceedings of the

workshop on Uncertainty Reasoning for the semantic web, (URSW pages 45-55), 2005

P. Clark and P. Harrison. arge-Scae Extraction and Use of Knowledge from Text, K-Cap, 2009.

D.S. Kim, K. Barker, and B. Porter. Knowledge Integration Across Multiple Texts, K-Cap, 2009.

L. Schubert. Can be Derive General World Knowledge from Texts?. K-Cap, 2009.

H.C. Cankaya and D. Moldovan. Method for Extracting Commonsense Knowledge K-Cap, 2009.

Page 35: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

J. Bos and K. Markert. Recognising textual entailment with logical inference, Proceedings of the conference on Human

Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada Pages: 628- 635, 2005.

C. Chemudugunta, A. Holloway, P. Smyth and M. Steyvers Modeling Documents by Combining Semantic Concepts with

Unsupervised Statistical Learning, LNCS 5318, 2008.

L.B. Marinho, K. Buza and L. Schmidt-Thieme Floksonomy-Based Collabulary Learning, LNCS 5318, 2008.

J.L. Ambite, S. Darbha, A. Goel, C.A. Knoblock,K. Lerman, R. Parundekar, T. Russ Automatically Constructing Semantic Web

Services from Online Sources, LNCS 5823, 2009.

S. Russel and P. Norving Artificial Intelligence, A Modern Approach, 2nd ed. Prentice Hall Series in Artificial Intelligence, 2001.

S. Banerjee and T. Pedersen The Design, Implementation and Use of the Ngram Statistic Package, LNCS 2588, 2009.

T. Pedersen, M. Kayaalp and R. Bruce Significant Lexical Relationships, 13th National Conference on Artificial Intelligence,

1996.

Open NLP, http://opennlp.sourceforge.net/ ,2009.

WordNet 3.0. http://wordnet.princeton.edu/, 2009.

A. Ghazvinian, N.F. Noy, C. Jonquet, N. Shah and M.A. Musen. What Four Million Mappings Can Tell You about Two Hundred

Ontologies, LNCS 5823, 2009.

J. Dolby, A. Fokoue, A. Kalyanpur, E. Schonberg and K. Srinivas Extracting Enterprise Vocabularies Using Linked Open Data,

LNCS 5823, 2009..

R. Snow, D. Jurafsky, A.Y. Ng Semantic Taxonomy Induction from Heterogeneous Evidence, Proceedings of the 21st

International Conference on Computational Linguistics and the 44th annual meeting of the Association for ComputationalLinguistics, Sydney, Australia, Pages: 801 - 808, 2006.

R. Wang and W.W. Cohen Language-Independent Set Expansion of Named Entities using the Web, Proceedings of the 2007

Seventh IEEE International Conference on Data Mining, 2007.

Page 36: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

R. Want and W.W. Cohen Iterative Set Expansion of Named Entitles using the Web, , 2008 8th International Conference on

Data Mining, 2008.

R. Wang and W.W. Cohen Automatic Set Instance Extraction using the Web, In Proceedings of the 47th Annual Meeting of

the ACL and the 4th IJCNLP of the AFNLP, 2009

SPARQL. SPARQL Query Language for RDF, W3C Recommendation 15 January 2008,

http://www.w3.org/TR/rdf-sparql-query/, 2008.

Jena. A Semantic Web Framework for Java, http://jena.sourceforge.net/ , 2009.

P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications, Springler, 2006

P. Mika. Social Networks and the Semantic Web, Springler, 2007.

T.R. Gruber. Knowledge Acquisition, A Translation Approach to Portable Ontologies. 5(2):199-220, 1993.

R. Studer, R. Benjamins and D. Fensel. Data & Knowledge Engineering, Knowledge Engineering: Principles and methods.

25(1-2):161-198, 1998.

J. Hendler. On beyond ontology, Keynote talk, Second Internatioanl Semantic Web Conference, 2003.

P. Cimiano, A. Madche, S. Staab and J. Volker. Ontology Learning, Handbook On Ontologies, 254-267, 2009

D. Koller and A. Levy and A. Pfeffer P-CLASSIC: A tractable probabilistic description logic, In Proceedings of AAAI-97, Pages

390–397, 1997.

Z. Ding and Yun Peng. A Probabilistic Extension to Ontology Language OWL, Proceedings of the 37th Hawaii International

Conference on System Sciences, 2004.

T. Lukasiewicz. Probabilistic description logics for the semantic web, Technical Report Nr. 1843-06-05, Institut fur

Informationssysteme, Technische Universitat Wien, 2007.

Pellet Pronto. Pellet Pronto, http://pellet.owldl.com/pronto/, 2008.

Page 37: PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions

A. Carlson,J. Betteridge, R. C. Wang,E. R. Hruschka Jr. and T. M. Mitchell. Coupled Semi-Supervised Learning for Information

Extraction, Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010.

P. Cimiano and A. Hotho and S. Staab. Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis, Journal

of Artificial Intelligence research, Pages 305–339, 2005.

The Penn Treebank Project. The Penn Treebank Project, http://www.cis. upenn.edu/ treebank/, 2010.

HermiT. Reasoning with Large Ontologies, http://www.comlab.ox.ac.uk/projects/HermiT/, 2010.