<is web> Information Systems & Semantic Web
University of Koblenz ▪ Landau, Germany
Ontology Learning
Steffen Staab
With contributions by Rabeeh Abbasi (U Koblenz), Philipp Cimiano (U Karlsruhe), Daniel Oberle (SAP Research)
Inaugural workshop of the language, interaction and computation lab
Rovereto, Italy
May 29, 2007
<is web>
Rabeeh [email protected]
Bridging the Gap…2 of 54
ISWeb - Information Systems & Semantic Web
What is an Ontology?
Gruber 93 (slightly extended):
An ontology is a formal specification of a shared conceptualization of a domain of interest
⇒ formal specification: executable, discussable
⇒ shared: group of persons
⇒ conceptualization: about concepts
⇒ domain of interest: between application and "unique truth"
Taxonomy
Taxonomy := segmentation, classification and ordering of elements into a classification system according to their relationships to each other
[Figure: example taxonomy rooted at Object, with the nodes Person, Topic, Document, Researcher, Student, Doctoral Student, PhD Student, Semantics, Ontology and F-Logic]
Thesaurus
• Terminology for a specific domain
• Taxonomy plus fixed relationships (similar, synonym, related to)
• Originates from bibliography
[Figure: the example hierarchy (Object; Person, Topic, Document; Researcher, Student, PhD Student, Doctoral Student; Semantics, Ontology, F-Logic) with added "similar" and "synonym" links]
Topic Map
• Topics (nodes), relationships and occurrences (to documents)
• ISO standard
• Typically used for navigation and visualisation
[Figure: the example hierarchy with "similar" and "synonym" links, extended by the relationships knows, writes and described_in and the attributes Affiliation and Tel]
Ontology (in our sense)
• Representation languages: RDF(S); OWL; Predicate Logic; F-Logic
• Rules:
T described_in D ⇒ D is_about T
P writes D ∧ D is_about T ⇒ P knows T
[Figure: the example extended to an ontology with the concepts Object, Person, Topic, Document, Researcher, Student, PhD Student, Doctoral Student, Semantics, Ontology and F-Logic; the relations is_a⁻¹, instance_of⁻¹, knows, writes, described_in, is_about, subTopicOf and similar; the attributes Affiliation and Tel; and the instance York Sure (Affiliation: AIFB, Tel: +49 721 608 6592)]
Ontologies and Their Relatives
[Figure: spectrum from lightweight to heavyweight, showing Text Corpora; Folksonomies; Glossaries; Data Dictionaries (EDI); Ad-hoc Hierarchies (Yahoo!); Thesauri; XML DTDs; Principled Informal Hierarchies; DB Schema; XML Schema; Data Models (UML, STEP); Formal Taxonomies; Frames (OKBC); Description Logics; F-Logic; First-order, Higher-order, Modal Logic]
Groups along the spectrum: Glossaries & Data Dictionaries; Thesauri, Taxonomies; MetaData, XML Schemas, Data Models; Formal Ontologies & Inference
Ontologies and Their Relatives
[Figure: the lightweight-to-heavyweight spectrum, annotated with examples]
Top-level ontologies: DOLCE, IEEE SUO
Domain/application ontologies: GALEN, Foundational Model of Anatomy
Enabling ontologies: Core ontology of services, WSMO, COMM - Multimedia
Thesauri: UMLS, FAO Agrovoc, Getty Art and Architecture, UNSPSC
WordNet, EuroWordNet
Metadata schemata: Dublin Core
Ontology Learning & Re-use
[Figure: the lightweight-to-heavyweight spectrum from "Ontologies and Their Relatives"]
Ontology Learning & Re-use is Adding Knowledge
Ontology Learning Layer Cake
Layers (from bottom to top) with examples:
Terms: disease, illness, hospital
(Multilingual) Synonyms: {disease, illness, Krankheit}
Concepts: DISEASE := <Int, Ext, Lex>
Taxonomy: is_a(DOCTOR, PERSON)
Relations: cure(dom: DOCTOR, range: DISEASE)
Axioms & Rules: $\forall x, y\,(\mathit{sufferFrom}(x, y) \rightarrow \mathit{ill}(x))$
T-Know & T-Org
[Figure: the lightweight-to-heavyweight spectrum from "Ontologies and Their Relatives"]
Adding Knowledge to Folksonomies
Social Tagging Systems / Folksonomies
In a social tagging system, people add keywords (called tags) to their resources and share these resources with others.
Advantages: low-cost classification, improved search, reputation systems, personal organization, no fixed vocabulary, collaboration…
Social Tagging Systems – Problem!
I want to "browse" vehicle images! How can I do it?
• Can I do it using a tag cloud?
Perhaps I need to structure the tags and resources! How can I do it?
• Put them into categories (like Vehicles, People, etc.)!
  – Do it manually or with training?
    » Might not be possible on a large scale!
  – Automatically and without any training!
    » Using T-ORG!
T-ORG
Tag Organization using T-ORG
1. Select ontologies related to the categories (e.g. Vehicle, People, etc.)
2. Prune and refine these ontologies according to the desired categories (add missing concepts, filter existing concepts)
3. Apply the classification algorithm T-KNOW to classify the tags
4. Browse the categories to explore the tags and resources
T-ORG – Classification
Organize resources by putting their tags into categories depending upon their context.
Users can browse categories to retrieve required resources.
[Figure: example images belonging to User A, User B, Group 1 and Group 2, carrying tags such as President, Gerald, Ford, Nixon, Pardon; Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures; Singen, Cars, Motors, Ford, 1955; the tags are sorted into the categories Person, Location and Vehicle]
Classifying the tags using T-KNOW
1. Use linguistic patterns (Hearst …) to generate queries
2. Search these patterns on Google and download search results
3. Compare each Google search result with the context of the tag and extract the concept
4. Select the concept which has the highest similarity with the context of the tag
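The query-generation step can be sketched roughly as follows. The slide only names "Hearst" patterns, so the concrete templates below are commonly cited Hearst-style patterns and are an assumption, as is the function name `hearst_queries`. Sending the queries to a search engine is deliberately left out.

```python
# Hypothetical Hearst-style templates; the exact patterns used by T-KNOW
# are not given on the slide.
HEARST_TEMPLATES = [
    '"such as {tag}"',
    '"{tag} and other"',
    '"{tag} or other"',
    '"including {tag}"',
    '"especially {tag}"',
]


def hearst_queries(tag: str) -> list[str]:
    """Return one quoted phrase query per template for the given tag."""
    tag = tag.strip().lower()
    return [template.format(tag=tag) for template in HEARST_TEMPLATES]


queries = hearst_queries("Ford")
for query in queries:
    print(query)
```

Each query would then be submitted to the search engine, and the returned snippets compared with the tag's context.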
T-KNOW – Computing Similarity
Compute similarity using the cosine measure between the Bag of Words (BOW) representations of "Tag Context" and "Search Result"
Tag Context: singen cars motors ford 1955
BOW:
term            Tag Context   Search Result
1955                 1              0
as                   0              1
cars                 1              0
ford                 1              1
foundation           0              2
international        0              1
motors               1              0
organizations        0              1
singen               1              0
such                 0              1
cos(ĉ,â) = ĉ ⋅ â / (|ĉ| |â|) = 0.15, where ĉ and â are the two BOW vectors
Only consider the results having similarity above a certain threshold
The result having the highest similarity is considered final
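As a check of the number on this slide, here is a minimal sketch of the cosine computation over sparse bag-of-words counts. The two vectors are copied from the slide; the function name `cosine` and the use of `Counter` are illustrative choices, not the original implementation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Counts as listed on the slide (zero entries omitted).
tag_context = Counter({"1955": 1, "cars": 1, "ford": 1, "motors": 1, "singen": 1})
search_result = Counter(
    {"as": 1, "ford": 1, "foundation": 2, "international": 1,
     "organizations": 1, "such": 1}
)

similarity = cosine(tag_context, search_result)
print(round(similarity, 2))
```

The only shared term is "ford", so the dot product is 1, the norms are √5 and 3, and the similarity is about 0.15.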
T-KNOW – Computing Similarity – Resource Context
Getting the context of the tag "Ford" from the middle image using Resource Context
• Select all tags of the current resource
[Figure: the example images with their tags (President, Gerald, Ford, Nixon, Pardon; Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures; Singen, Cars, Motors, Ford, 1955)]
T-KNOW – Computing Similarity – Tag Context
Getting the context of the tag "Ford" from the middle image using Tag Context
• Select all tags of all the resources having this tag "Ford"
[Figure: the example images with their tags (President, Gerald, Ford, Nixon, Pardon; Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures; Singen, Cars, Motors, Ford, 1955)]
T-KNOW – Computing Similarity – User Context
Getting the context of the tag "Ford" from the middle image using User Context
• Select all tags of all the resources of the user who uses this resource
[Figure: the example images with their tags, grouped by User A and User B]
T-KNOW – Computing Similarity – Group Context
Getting the context of the tag "Ford" from the middle image using Group Context
• Select all tags of all the resources present in the group to which this resource belongs
[Figure: the example images with their tags, grouped by Group 1 and Group 2]
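The four context definitions (resource, tag, user, group) can be sketched side by side as set operations. The `Resource` record, the function names, and the assignment of images to users and groups below are hypothetical; only the tag sets come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical record for one tagged resource (e.g. an image)."""
    name: str
    user: str
    groups: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)


def resource_context(resource: Resource) -> set[str]:
    """All tags of the current resource."""
    return set(resource.tags)


def tag_context(tag: str, resources: list[Resource]) -> set[str]:
    """All tags of all resources that carry the given tag."""
    return set().union(*(r.tags for r in resources if tag in r.tags))


def user_context(resource: Resource, resources: list[Resource]) -> set[str]:
    """All tags of all resources of the same user."""
    return set().union(*(r.tags for r in resources if r.user == resource.user))


def group_context(resource: Resource, resources: list[Resource]) -> set[str]:
    """All tags of all resources sharing at least one group with the resource."""
    return set().union(*(r.tags for r in resources if r.groups & resource.groups))


# User and group assignments are invented for illustration.
images = [
    Resource("president", "user_a", {"group_1"},
             {"president", "gerald", "ford", "nixon", "pardon"}),
    Resource("eiffel", "user_a", {"group_2"},
             {"eiffel", "eiffel tower", "bigeyeful", "paris", "france", "miniatures"}),
    Resource("singen", "user_b", {"group_2"},
             {"singen", "cars", "motors", "ford", "1955"}),
]

print(sorted(tag_context("ford", images)))
```

The wider the context, the more words enter the bag of words that is compared with each search result.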
Experimental Setup
4+1 categories, 932 concepts:
• Person: Author, Singer, Human, …
• Location: Country, District, City, Village, …
• Vehicle: Vehicle, Car, Truck, Motorbike, Train, …
• Organization: Company, Organization, Firm, Foundation, …
• Other
189 random images from 9 Flickr groups, 1754 tags
Experimental Setup – Classifiers
Two human classifiers: K (gold standard) and S
Automatic classifier: T-KNOW
Experimental Setup – Evaluation
F-Measure
A = set of correct classifications by test (user S or T-KNOW)
B = set of all classifications by Gold Standard (user K)
C = set of all classifications by test
Precision = |A| / |C|
Recall = |A| / |B|
F-Measure = 2 * Precision * Recall / (Precision + Recall)
Cohen's Kappa
Considers classification done by chance
Used to measure classifier reliability
• P0 = observed agreement between classifiers
• Pc = agreement expected due to chance
$K = \frac{P_0 - P_c}{1 - P_c}$
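Both evaluation measures on this slide can be sketched in a few lines. The function names and the toy labels are assumptions made for illustration. The chance agreement Pc is estimated here from the two classifiers' label frequencies, which is the usual definition of Cohen's kappa but is not spelled out on the slide.

```python
from collections import Counter


def precision_recall_f(test: dict, gold: dict) -> tuple[float, float, float]:
    """test and gold map each classified tag to a category."""
    correct = sum(1 for tag, cat in test.items() if gold.get(tag) == cat)  # |A|
    precision = correct / len(test) if test else 0.0  # |A| / |C|
    recall = correct / len(gold) if gold else 0.0     # |A| / |B|
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def cohens_kappa(labels_1: list, labels_2: list) -> float:
    """Kappa for two classifiers that labelled the same items in the same order.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    n = len(labels_1)
    p0 = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    freq_1, freq_2 = Counter(labels_1), Counter(labels_2)
    pc = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    return (p0 - pc) / (1 - pc)


# Toy data, invented for illustration.
gold = {"ford": "Vehicle", "paris": "Location", "nixon": "Person", "singen": "Location"}
test = {"ford": "Person", "paris": "Location", "nixon": "Person"}
p, r, f = precision_recall_f(test, gold)

kappa = cohens_kappa(
    ["Vehicle", "Location", "Person", "Location"],
    ["Person", "Location", "Person", "Other"],
)
print(p, r, f, kappa)
```

In the toy data two of three test classifications are correct (P = 2/3) and two of four gold classifications are recovered (R = 1/2); the classifiers agree on half of the items while chance alone would give 0.25, so kappa is 1/3.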
Results – F-Measure
[Figure: F-Measure (axis from 0.51 to 0.76) versus threshold (0.00 to 0.30) for Tag Context, Resource Context, User Context, Group Context and User S]
- Results comparable to human classification
Results – Cohen’s Kappa
[Figure: Kappa value (axis from 0.00 to 0.50) versus threshold (0.00 to 0.30) for Tag Context, Resource Context, User Context, Group Context and User S]
- Kappa might be a good measure when classifications may agree by chance
Targeted Usage: Faceted Browsing
[Figure: faceted browsing interface with the facets +Cities, +Countries, +Lakes, +Markets, +Universities; the entries -Austria, -Germany, -Pakistan, -USA; and the facets +Animals, +Cameras, +Colours, +Events, +Languages, +People, +Places, +Programming, +Resources]
Ontology Learning & Re-use
[Figure: the lightweight-to-heavyweight spectrum from "Ontologies and Their Relatives"]
Formal Concept Analysis for Learning Concept Hierarchies from Texts
OL from Text as Reverse Engineering
[Figure: diagram with the labels "Write", "Shared World Model" and "Reverse Engineering"]
Distributional Hypothesis & Vector Space Model
Harris, 1986: "Words are (semantically) similar to the extent to which they share similar words"
Firth, 1957: "You shall know a word by the company it keeps"
Idea: collect context information and represent it as a vector; compute similarity among vectors with respect to a measure
              book_obj  rent_obj  drive_obj  ride_obj  join_obj
apartment        2         3
car              3         2         4
motor-bike       1         1         1          3
trip             4                                        1
excursion        1                                        2
Such techniques for bridging the syntactic/semantic divide are very well-known, but not very sophisticated
Full benefit requires more effort
Overall Process
Or other clustering mechanism
Using Syntactic Surface Dependencies
Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger’s delta.
city: biggest(1)
ambience: traditional(1)
center: of_tourist_industry(1)
junction town: nearby(1)
market: bustling(1)
port: vibrant(1)
overload: suffer_from(1)
tourist industry: center_of(1), local(1)
town: seem_subj(1)
view: nice(1), offer_obj(1)
Context Extraction Process
Extract syntactic dependencies from text
⇒ verb/object, verb/subject, verb/PP relations
⇒ car: drive_obj, crash_subj, sit_in, …
[Figure: pipeline: parsing with LoPar (parse tree with the nodes s, vp, dp, v), extraction with tgrep yielding sat_in(car), crashed_subj(cars), drove_obj(car), then lemmatization yielding sit_in(car), crash_subj(car), drive_obj(car)]
Weighting
Observation:
• the output of the parser can be erroneous
• not all attribute/object pairs are significant
Conditional probability: $P(n \mid v_{arg})$
Consider only attribute/object pairs with weight over threshold t
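A minimal sketch of this weighting step, assuming the conditional probability is estimated by relative frequency, i.e. P(n | v_arg) ≈ f(n, v_arg) / f(v_arg). That estimate, the function names and the toy observations are assumptions, not taken from the slide.

```python
from collections import Counter


def conditional_weights(pairs):
    """pairs: iterable of (noun, attribute) observations, e.g. ("car", "drive_obj").

    Returns {(noun, attribute): P(noun | attribute)} by relative frequency.
    """
    pairs = list(pairs)
    pair_freq = Counter(pairs)
    attr_freq = Counter(attr for _, attr in pairs)
    return {(n, a): f / attr_freq[a] for (n, a), f in pair_freq.items()}


def prune(weights, t):
    """Keep only the pairs whose weight is strictly above the threshold t."""
    return {pair for pair, w in weights.items() if w > t}


# Toy observations, invented for illustration.
observations = (
    [("car", "drive_obj")] * 3
    + [("motor-bike", "drive_obj")]
    + [("car", "rent_obj"), ("apartment", "rent_obj")]
)
weights = conditional_weights(observations)
kept = prune(weights, t=0.3)
print(sorted(kept))
```

With t = 0.3 the rare pair (motor-bike, drive_obj), weight 0.25, is dropped, which is how the threshold filters out parser noise.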
Similarity-based Clustering
Similarity measures:
• Binary (Jaccard, Dice)
• Geometric (Cosine, Euclidean/Manhattan distance)
• Information-theoretic (Relative Entropy, Mutual Information)
• (…)
Methods:
• Hierarchical agglomerative clustering (complete, average, single linkage)
• Hierarchical top-down clustering, e.g. Bi-Section-KMeans
• (…)
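To make one combination concrete, the sketch below pairs a binary similarity (Jaccard, with Dice as a drop-in alternative) with single-linkage agglomerative clustering. It is a naive illustration, not the implementation evaluated in the talk. The attribute sets follow the tourism example used elsewhere in this talk.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def single_linkage(items: dict, sim=jaccard) -> list:
    """Repeatedly merge the two most similar clusters; return the merges in order.

    Cluster similarity is the maximum pairwise similarity (single linkage).
    """
    clusters = [frozenset([name]) for name in sorted(items)]
    merges = []
    while len(clusters) > 1:
        best = max(
            combinations(clusters, 2),
            key=lambda pair: max(
                sim(items[x], items[y]) for x in pair[0] for y in pair[1]
            ),
        )
        merged = best[0] | best[1]
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append(merged)
    return merges


tourism = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
merges = single_linkage(tourism)
for cluster in merges:
    print(sorted(cluster))
```

Passing `sim=dice` swaps the similarity measure; complete or average linkage would replace the inner `max` with `min` or a mean.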
Agglomerative/Bottom-Up Clustering
[Figure: bottom-up clustering of car, bus, trip, excursion and apartment]
Bi-Section-KMeans
[Figure: top-down splitting of {apartment, car, bus, trip, excursion} into successively smaller clusters such as {excursion, trip}, {apartment, bus, car} and {bus, car}]
Set Theoretical Clustering
Set theoretical: Formal Concept Analysis [Ganter and Wille 1999]
              bookable  rentable  drivable  ridable  joinable
apartment        X         X
car              X         X         X
motor-bike       X         X         X         X
trip             X                                       X
excursion        X                                       X
Tourism Formal Context
              bookable  rentable  driveable  rideable  joinable
apartment        X         X
car              X         X         X
motor-bike       X         X         X          X
trip             X                                        X
excursion        X                                        X
Genus: here, classes
Species: here, subclasses
Differentiae: characteristics that allow objects to be grouped or distinguished from each other
Formal Context
Definition 1 (Formal Context)
A triple (G, M, I) is called a formal context if G and M are sets and I ⊆ G × M is a binary relation between G and M.
The elements of G are called objects, those of M attributes, and I is the incidence of the context.
For A ⊆ G, we define: A′ := { m ∈ M | ∀ g ∈ A: (g, m) ∈ I }
And dually, for B ⊆ M, we define: B′ := { g ∈ G | ∀ m ∈ B: (g, m) ∈ I }
Intuitively speaking, A′ is the set of all attributes common to the objects of A, while B′ is the set of all objects that have all attributes in B in common.
Example:
{apartment, car}′ = {bookable, rentable}
{apartment, car}″ = {bookable, rentable}′ = {apartment, car, motor-bike}
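The two derivation operators translate directly into set operations. The sketch below encodes the tourism context as a dictionary from objects to attribute sets and reproduces the example above; the names `CONTEXT`, `common_attributes` and `common_objects` are illustrative.

```python
# Tourism formal context: object -> set of attributes (the incidence relation I).
CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ATTRIBUTES = set().union(*CONTEXT.values())  # the attribute set M


def common_attributes(objects) -> set:
    """A': attributes shared by every object in A (all of M if A is empty)."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes) -> set:
    """B': objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


a_prime = common_attributes({"apartment", "car"})
a_double_prime = common_objects(a_prime)
print(sorted(a_prime), sorted(a_double_prime))
```

Applying both operators in sequence is a closure: it adds motor-bike, the only other object that is both bookable and rentable.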
Formal Concept
Definition 2 (Formal Concept)
A pair (A, B) is a formal concept of (G, M, I) if and only if A ⊆ G, B ⊆ M, A′ = B and A = B′.
In other words, (A, B) is a formal concept if the set of all attributes shared by the objects of A is identical with B and, vice versa, A is also the set of all objects that have all attributes in B.
A is then called the extent and B the intent of the formal concept (A, B).
The formal concepts of a given context are naturally ordered by the subconcept-superconcept relation as defined by:
(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
Example:
({apartment, car}, {bookable, rentable}) is not a formal concept
({apartment, car, motor-bike}, {bookable, rentable}) is a formal concept
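For a small context, all formal concepts can be enumerated by brute force: every extent is B′ for some attribute set B, so closing each subset of M yields every concept. The sketch below does this for the tourism context. It is exponential in |M| and meant only as an illustration, not as the algorithm used in the talk.

```python
from itertools import combinations

CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
M = sorted(set().union(*CONTEXT.values()))


def extent(attributes) -> frozenset:
    """B': all objects having every attribute in B."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def intent(objects) -> frozenset:
    """A': all attributes shared by every object in A."""
    result = set(M)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)


def formal_concepts() -> set:
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    concepts = set()
    for k in range(len(M) + 1):
        for attrs in combinations(M, k):
            a = extent(attrs)
            concepts.add((a, intent(a)))
    return concepts


def is_subconcept(c1, c2) -> bool:
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


concepts = formal_concepts()
for a, b in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(a), sorted(b))
```

The tourism context yields six concepts, from the top concept (all objects, {bookable}) down to the bottom concept (no objects, all attributes).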
Tourism Lattice
Concept Hierarchy
bookable
  rentable
    driveable
      rideable
        motor-bike
      car
    apartment
  joinable
    trip
    excursion
Evaluation - Data Sets
Tourism (118 million tokens):
• http://www.all-in-all.de/english
• http://www.lonelyplanet.com
• British National Corpus (BNC)
• handcrafted tourism ontology (289 concepts)
Finance (185 million tokens):
• Reuters news from 1987
• GETESS finance ontology (1178 concepts)
Evaluation - Example 1
[Figure: learned hierarchy (bookable; rentable, joinable; driveable, apartment; rideable; car, bike; trip, excursion) compared with the gold standard (root; thing, activity; vehicle, apartment; TWV; car, bike; trip, excursion)]
P=100%, R=100%, F=100%
Evaluation – Example 2
[Figure: learned hierarchy (bookable; rentable, joinable; driveable, apartment; car, bike; trip, excursion) compared with the gold standard (root; thing, activity; vehicle, apartment; TWV; car, bike; trip, excursion)]
P=100%, R=87.5%, F=93.33%
Evaluation – Example 3
[Figure: learned hierarchy (bookable; rentable, joinable; driveable, apartment; rideable; car, bike; planable; trip, excursion) compared with the gold standard (root; thing, activity; vehicle, apartment; TWV; car, bike; trip, excursion)]
P=90%, R=100%, F=94.74%
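The F values of Examples 2 and 3 follow from their P and R values via the harmonic mean, the same F-measure formula used earlier in the talk. As a check of the arithmetic:

```latex
\[
F = \frac{2PR}{P + R}, \qquad
F_{\text{Ex.\,2}} = \frac{2 \cdot 1 \cdot 0.875}{1 + 0.875} = \frac{1.75}{1.875} \approx 0.9333, \qquad
F_{\text{Ex.\,3}} = \frac{2 \cdot 0.9 \cdot 1}{0.9 + 1} = \frac{1.8}{1.9} \approx 0.9474
\]
```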
Precision/Recall/F-Measure
[Figure: FCA (Tourism): Precision, Recall and F versus threshold t (0 to 1)]
F-Measure for recovering structure
Lexical Recall, F‘
[Figure: FCA (Tourism): F, LR (lexical recall) and F′ versus threshold t (0 to 1)]
F′-Measure for recovering terms and structure
Comparison (Tourism, F‘)
[Figure: Comparison (Tourism): F′ (axis from 0 to 0.5) versus threshold t (0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]
Comparison (Finance, F‘)
[Figure: Comparison (Finance): F′ (axis from 0 to 0.45) versus threshold t (0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]
Clustering – Comparison
Method                     F-Measure      Worst Case Time Complexity   Understandability
FCA                        43.81/41.02%   O(2^n) (pract. better!)      Good
Agglomerative Clustering   36.78/33.35%   O(n^2 log(n))                Fair
                           36.55/32.92%   O(n^2)
                           38.57/32.15%   O(n^2)
Divisive Clustering        36.42/32.77%   O(n^2)                       Weak-Fair
Ontology Learning & Re-use
[Figure: the lightweight-to-heavyweight spectrum from "Ontologies and Their Relatives"]
Many more ways to explore along each dimension:
• Source
• Method
• Usage
Outlook
Formal concept analysis is sensitive to
• noise
• low probabilities
→ Smooth FCA
Folksonomies rely on social context
• resources
• tags
• time
→ Exploration of interaction and dynamics
Thank you!
http://isweb.uni-koblenz.de
Literature
Overviews
Ontologies
• S. Staab, R. Studer. Handbook on Ontologies. Springer, 2004.
Ontology Population (by Annotation)
• S. Handschuh, S. Staab (eds.). Annotation for the Semantic Web. IOS Press, 2003.
Ontology Learning
• P. Buitelaar, P. Cimiano, B. Magnini (eds.). Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
• P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, November 2006.
• A. Mädche. Ontology Learning for the Semantic Web. Kluwer, 2002.
Specifically used here
• P. Cimiano, S. Staab. Learning by Googling. SIGKDD Explorations, 6(2), pp. 24-33, 2004.
• P. Cimiano, A. Hotho, S. Staab. Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. JAIR - Journal of AI Research, 24: 305-339, 2005.
• K. Dellschaft, S. Staab. On How to Perform a Gold Standard based Evaluation of Ontology Learning. In: Proc. of ISWC-2006, International Semantic Web Conference, Athens, GA, USA, Springer, LNCS, November 2006.
• R. Abbasi, S. Staab, P. Cimiano. Organizing Resources on Tagging Systems using T-ORG. In: ESWC-2007 Workshop on Bridging the Gap between Semantic Web and Web 2.0, Innsbruck, 2007.
A Note on the Evaluation of Ontology Learning
The a posteriori approach:
• Ask a domain expert for a per-concept evaluation of the learned ontology
• Count three categories of concepts:
  – Correct: both in the learned and the gold ontology
  – New: only in the learned ontology, but relevant and should be in the gold standard as well
  – Spurious: useless
• Compute precision = (correct + new) / (correct + new + spurious)
Result: the a priori evaluations are awful, BUT a posteriori evaluations by domain experts still show very good results, very helpful for the domain expert!
Sabou M., Wroe C., Goble C. and Mishne G. Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics. In: Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, 10-14 May, 2005.