Ontology Learning

Steffen Staab
Information Systems & Semantic Web (ISWeb), University of Koblenz-Landau, Germany

With contributions by Rabeeh Abbasi (U Koblenz), Philipp Cimiano (U Karlsruhe), Daniel Oberle (SAP Research)

Inaugural workshop of the language, interaction and computation lab
Rovereto, Italy, May 29, 2007


Rabeeh Abbasi, abbasi@uni-koblenz.de

What is an Ontology?

Gruber 93 (slightly extended):

An ontology is a
• formal specification ⇒ executable, discussable
• of a shared ⇒ group of persons
• conceptualization ⇒ about concepts
• of a domain of interest ⇒ between application and "unique truth"

Taxonomy

Taxonomy := Segmentation, classification and ordering of elements into a classification system according to their relationships to each other

[Figure: taxonomy tree with the nodes Object; Person, Topic, Document; Researcher, Student, Semantics; Doctoral Student, PhD Student, Ontology, F-Logic; Menu]

Thesaurus

• Terminology for a specific domain
• Taxonomy plus fixed relationships (similar, synonym, related to)
• Originates from bibliography

[Figure: the taxonomy tree (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student, Ontology, F-Logic; Menu) with additional links labeled "similar" and "synonym"]

Topic Map

• Topics (nodes), relationships and occurrences (to documents)
• ISO standard
• Typically used for navigation and visualisation

[Figure: the thesaurus graph (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student, Ontology, F-Logic; Menu; links "similar" and "synonym") extended with the relations knows, writes and described_in and the properties Affiliation and Tel]

Ontology (in our sense)

• Representation Languages: RDF(S); OWL; Predicate Logic; F-Logic

[Figure: ontology graph. Concepts: Object, Person, Topic, Document, Researcher, Student, PhD Student, Doctoral Student, Semantics, Ontology, F-Logic. Relations: is_a⁻¹, instance_of⁻¹, knows, writes, described_in, is_about, subTopicOf, similar. Properties: Affiliation, Tel. Instance: York Sure (Affiliation: AIFB, Tel: +49 721 608 6592). A "Rules" box relates writes, is_about, knows and described_in over the variables P, D and T.]

Ontologies and Their Relatives

[Figure: spectrum from lightweight to heavyweight. Items: Text Corpora; Data Dictionaries (EDI); Ad-hoc Hierarchies (Yahoo!); Folksonomies; Glossaries; Thesauri; XML DTDs; Principled Informal Hierarchies; DB Schema; XML Schema; Data Models (UML, STEP); Formal Taxonomies; Frames (OKBC); Description Logics; First-order, Higher-order, Modal Logic; F-Logic. Groupings: Glossaries & Data Dictionaries; Thesauri, Taxonomies; MetaData, XML Schemas, Data Models; Formal Ontologies & Inference]

Ontologies and Their Relatives

[Figure: the lightweight-to-heavyweight spectrum of ontologies and their relatives, from text corpora to F-Logic, annotated with the examples below]

• Top-level ontologies: DOLCE, IEEE SUO
• Domain/application ontologies: GALEN, Foundational Model of Anatomy
• Enabling ontologies: Core Ontology of Services, WSMO, COMM (multimedia)
• Thesauri: UMLS, FAO Agrovoc, Getty Art and Architecture, UNSPSC
• WordNet, EuroWordNet
• Metadata schemata: Dublin Core

Ontology Learning & Re-use

Ontology Learning & Re-use is Adding Knowledge

[Figure: the lightweight-to-heavyweight spectrum of ontologies and their relatives, from text corpora to F-Logic]

Ontology Learning Layer Cake

Layers, from bottom to top, each with an example:
• Terms: disease, illness, hospital
• (Multilingual) Synonyms: {disease, illness, Krankheit}
• Concepts: DISEASE := <Int, Ext, Lex>
• Taxonomy: is_a(DOCTOR, PERSON)
• Relations: cure(dom: DOCTOR, range: DISEASE)
• Axioms & Rules: ∀x,y (sufferFrom(x,y) → ill(x))

T-Know & T-Org

Adding Knowledge to Folksonomies

[Figure: the lightweight-to-heavyweight spectrum of ontologies and their relatives, from text corpora to F-Logic]

Social Tagging Systems / Folksonomies

In a social tagging system, people add keywords (called tags) to their resources and share these resources with others.

Advantages: low-cost classification, improved search, reputation systems, personal organization, no fixed vocabulary, collaboration, …

Social Tagging Systems – Problem!

I want to "browse" vehicle images! How can I do it?

• Can I do it using a tag cloud?

Perhaps I need to structure the tags and resources! How can I do it?

• Put them into categories (like Vehicles, People, etc.)
– Manually or with training? This might not be possible on a large scale.
– Automatically and without any training? Using T-ORG!

T-ORG

Tag organization using T-ORG, in order:

• Select ontologies related to the categories (e.g. Vehicle, People, etc.)
• Prune and refine these ontologies according to the desired categories (add missing concepts, filter existing concepts)
• Apply the classification algorithm T-KNOW to classify the tags
• Browse the categories to explore the tags and resources

T-ORG – Classification

• Organize resources by putting their tags into categories depending upon their context
• Users can browse categories to retrieve required resources

[Figure: three example images from User A and User B in Group 1 and Group 2, with their tags sorted into the categories Person, Location and Vehicle. Image tags: (President, Gerald, Ford, Nixon, Pardon); (Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures); (Singen, Cars, Motors, Ford, 1955)]

Classifying the tags using T-KNOW

• Use linguistic patterns (Hearst …) to generate queries
• Search these patterns on Google and download search results
• Compare each Google search result with the context of the tag and extract the concept
• Select the concept which has the highest similarity with the context of the tag
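
The query-generation step can be sketched as follows. This is only an illustration: the templates are common Hearst-style lexico-syntactic patterns chosen for the example, and both `HEARST_TEMPLATES` and `build_queries` are hypothetical names. The slides do not list the patterns T-KNOW actually uses.

```python
# Hypothetical Hearst-style templates; the exact patterns used by T-KNOW
# are not given on the slide.
HEARST_TEMPLATES = [
    "such as {tag}",
    "{tag} and other",
    "{tag} or other",
    "including {tag}",
    "especially {tag}",
]


def build_queries(tag: str) -> list[str]:
    """Return one quoted phrase query per template for the given tag."""
    return ['"' + template.format(tag=tag) + '"' for template in HEARST_TEMPLATES]


if __name__ == "__main__":
    for query in build_queries("Ford"):
        print(query)
```

A search result for the first query might read "international organizations such as Ford Foundation", in which case the noun phrase in front of the pattern is the candidate concept for the tag.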

T-KNOW – Computing Similarity

Compute similarity using the cosine measure between the Bag of Words (BOW) representations of "Tag Context" and "Search Result".

Tag Context: singen, cars, motors, ford, 1955

Term            Tag Context (ĉ)   Search Result (â)
1955            1                 0
as              0                 1
cars            1                 0
ford            1                 1
foundation      0                 2
international   0                 1
motors          1                 0
organizations   0                 1
singen          1                 0
such            0                 1

cos(ĉ, â) = ĉ ⋅ â / (|ĉ| |â|) = 0.15

• Only consider the results having a similarity above a certain threshold
• The result having the highest similarity is considered final
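
The cosine computation on this slide can be reproduced with a few lines of code. The sketch below uses the two bag-of-words vectors from the example; the function name `cosine` and the variable names are illustrative choices, not part of T-KNOW.

```python
import math
from collections import Counter


def cosine(c: Counter, a: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(c[term] * a[term] for term in c.keys() & a.keys())
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    if norm_c == 0 or norm_a == 0:
        return 0.0
    return dot / (norm_c * norm_a)


# Vectors taken from the slide's example.
tag_context = Counter(["singen", "cars", "motors", "ford", "1955"])
search_result = Counter(
    {"as": 1, "ford": 1, "foundation": 2, "international": 1,
     "organizations": 1, "such": 1}
)

similarity = cosine(tag_context, search_result)

if __name__ == "__main__":
    print(round(similarity, 2))  # 0.15
```

Only "ford" occurs in both vectors, so the dot product is 1, and the norms are √5 and 3, which gives 1 / (3√5) ≈ 0.15.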

T-KNOW – Computing Similarity – Resource Context

Getting the context of the tag "Ford" from the middle image using Resource Context:

• Select all tags of the current resource

[Figure: the three example images with their tags: (President, Gerald, Ford, Nixon, Pardon); (Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures); (Singen, Cars, Motors, Ford, 1955)]

T-KNOW – Computing Similarity – Tag Context

Getting the context of the tag "Ford" from the middle image using Tag Context:

• Select all tags of all the resources having this tag "Ford"

[Figure: the three example images with their tags: (President, Gerald, Ford, Nixon, Pardon); (Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures); (Singen, Cars, Motors, Ford, 1955)]

T-KNOW – Computing Similarity – User Context

Getting the context of the tag "Ford" from the middle image using User Context:

• Select all tags of all the resources of the user who uses this resource

[Figure: the three example images with their tags, assigned to User A and User B: (President, Gerald, Ford, Nixon, Pardon); (Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures); (Singen, Cars, Motors, Ford, 1955)]

T-KNOW – Computing Similarity – Group Context

Getting the context of the tag "Ford" from the middle image using Group Context:

• Select all tags of all the resources present in the group to which this resource belongs

[Figure: the three example images with their tags, assigned to Group 1 and Group 2: (President, Gerald, Ford, Nixon, Pardon); (Eiffel, Eiffel tower, BigEyeful, Paris, France, Miniatures); (Singen, Cars, Motors, Ford, 1955)]
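
The four context definitions (resource, tag, user, group) can be compared side by side in a small sketch. The data model is an assumption made for illustration: the `Resource` class and the assignment of the three example images to users and groups are hypothetical, since the slides do not state which image belongs to whom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    owner: str             # hypothetical owner assignment
    group: str             # hypothetical group assignment
    tags: tuple[str, ...]


RESOURCES = [
    Resource("user_a", "group_1", ("president", "gerald", "ford", "nixon", "pardon")),
    Resource("user_a", "group_1",
             ("eiffel", "eiffel tower", "bigeyeful", "paris", "france", "miniatures")),
    Resource("user_b", "group_2", ("singen", "cars", "motors", "ford", "1955")),
]


def resource_context(resource: Resource) -> list[str]:
    """All tags of the current resource."""
    return list(resource.tags)


def tag_context(tag: str, resources: list[Resource]) -> list[str]:
    """All tags of all resources that carry the given tag."""
    return [t for r in resources if tag in r.tags for t in r.tags]


def user_context(resource: Resource, resources: list[Resource]) -> list[str]:
    """All tags of all resources of the same user."""
    return [t for r in resources if r.owner == resource.owner for t in r.tags]


def group_context(resource: Resource, resources: list[Resource]) -> list[str]:
    """All tags of all resources in the same group."""
    return [t for r in resources if r.group == resource.group for t in r.tags]


if __name__ == "__main__":
    car_image = RESOURCES[2]
    print(resource_context(car_image))
    print(tag_context("ford", RESOURCES))
```

Each function returns a list of tags (with repetitions), which can be turned into a bag-of-words vector and compared with a search result using the cosine measure.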

Experimental Setup

4+1 categories, 932 concepts:

Category        Example concepts
Person          Author, Singer, Human, …
Location        Country, District, City, Village, …
Vehicle         Vehicle, Car, Truck, Motorbike, Train, …
Organization    Company, Organization, Firm, Foundation, …
Other

189 random images from 9 Flickr groups, 1754 tags

Experimental Setup – Classifiers

• Two human classifiers: K (gold standard) and S
• T-KNOW

Experimental Setup – Evaluation

F-Measure
• A = set of correct classifications by the test (user S or T-KNOW)
• B = set of all classifications by the gold standard (user K)
• C = set of all classifications by the test
• Precision = |A| / |C|
• Recall = |A| / |B|
• F-Measure = 2 * Precision * Recall / (Precision + Recall)

Cohen's Kappa
• Considers classification done by chance
• Used to measure classifier reliability
• P0 = observed agreement between classifiers
• Pc = agreement that occurs due to chance
• K = (P0 − Pc) / (1 − Pc)
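
Both measures are short enough to sketch in code. The example below assumes classifications are stored as tag-to-category mappings (for F-Measure) or as parallel label lists (for kappa); the function names and the toy data are hypothetical. Chance agreement Pc is estimated from the two classifiers' label frequencies, which is the standard definition of Cohen's kappa.

```python
from collections import Counter


def precision_recall_f(test: dict, gold: dict) -> tuple[float, float, float]:
    """test and gold map tag -> category; the test may skip some tags."""
    correct = sum(1 for tag, cat in test.items() if gold.get(tag) == cat)  # |A|
    precision = correct / len(test) if test else 0.0                       # |A| / |C|
    recall = correct / len(gold) if gold else 0.0                          # |A| / |B|
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def cohens_kappa(labels_1: list, labels_2: list) -> float:
    """K = (P0 - Pc) / (1 - Pc) for two classifiers over the same items."""
    n = len(labels_1)
    p0 = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    freq_1, freq_2 = Counter(labels_1), Counter(labels_2)
    pc = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    if pc == 1.0:
        return 1.0  # both classifiers always use the same single label
    return (p0 - pc) / (1 - pc)


if __name__ == "__main__":
    gold = {"ford": "Vehicle", "paris": "Location", "nixon": "Person", "singen": "Location"}
    test = {"ford": "Person", "paris": "Location", "nixon": "Person"}
    print(precision_recall_f(test, gold))
    print(cohens_kappa(["P", "P", "V", "V"], ["P", "P", "V", "P"]))
```

In the toy example the test classifier labels three of four tags and gets two right, so precision is 2/3 and recall is 2/4.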

Results – F-Measure

[Figure: F-Measure (y-axis, ticks from 0.51 to 0.76) versus threshold (x-axis, 0.00 to 0.30) for Tag Context, Resource Context, User Context, Group Context and User S]

• Results comparable to human classification

Results – Cohen's Kappa

[Figure: Kappa value (y-axis, 0.00 to 0.50) versus threshold (x-axis, 0.00 to 0.30) for Tag Context, Resource Context, User Context, Group Context and User S]

• Kappa might be a good measure when some classifications may be correct by chance

Targeted Usage: Faceted Browsing

[Figure: faceted browsing interface with collapsible facet entries: +Cities, +Countries, +Lakes, +Markets, +Universities; -Austria, -Germany, -Pakistan, -USA; +Animals, +Cameras, +Colours, +Events, +Languages, +People, +Places, +Programming, +Resources]

Ontology Learning & Re-use

Formal Concept Analysis for Learning Concept Hierarchies from Texts

[Figure: the lightweight-to-heavyweight spectrum of ontologies and their relatives, from text corpora to F-Logic]

OL from Text as Reverse Engineering

[Figure: diagram with the labels Shared World Model, Write and Reverse Engineering]

Distributional Hypothesis & Vector Space Model

Harris, 1986: "Words are (semantically) similar to the extent to which they share similar words"

Firth, 1957: "You shall know a word by the company it keeps"

Idea: collect context information and represent it as a vector; compute similarity among vectors with respect to a measure.

[Table: co-occurrence counts of the nouns apartment, car, motor-bike, trip and excursion with the verb contexts book_obj, rent_obj, drive_obj, ride_obj and join_obj. The column alignment was lost in extraction; the surviving cell values per row are: excursion 2, 1; trip 1, 4; motor-bike 3, 1, 1, 1; car 4, 2, 3; apartment 3, 2]

Such techniques for bridging the syntactic/semantic divide are very well known, but not very sophisticated.

Full benefit requires more effort.

Overall Process

[Figure: overall process pipeline; one step is annotated "Or other clustering mechanism"]

Using Syntactic Surface Dependencies

Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger's delta.

Extracted dependencies (with counts):
• city: biggest(1)
• ambience: traditional(1)
• center: of_tourist_industry(1)
• junction town: nearby(1)
• market: bustling(1)
• port: vibrant(1)
• overload: suffer_from(1)
• tourist industry: center_of(1), local(1)
• town: seem_subj(1)
• view: nice(1), offer_obj(1)

Context Extraction Process

Extract syntactic dependencies from text
⇒ verb/object, verb/subject, verb/PP relations
⇒ car: drive_obj, crash_subj, sit_in, …

[Figure: pipeline. LoPar (parse tree with the nodes s, vp, dp, v, dp) → tgrep → sat_in(car), crashed_subj(cars), drove_obj(car) → lemmatization → sit_in(car), crash_subj(car), drive_obj(car)]

Weighting

Observation:
• The output of the parser can be erroneous
• Not all attribute/object pairs are significant

Conditional Probability: P(n | v_arg)

Consider attribute/object pairs with weight over threshold t
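
A minimal sketch of this weighting step, assuming P(n | v_arg) is estimated as the relative frequency f(n, v_arg) / f(v_arg) over the extracted pairs. The slide only names the measure, so the estimator, the function names and the toy pairs below are assumptions.

```python
from collections import Counter


def conditional_weights(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Estimate P(n | v_arg) = f(n, v_arg) / f(v_arg) from (noun, context) pairs."""
    pair_freq = Counter(pairs)
    context_freq = Counter(context for _, context in pairs)
    return {(n, v): f / context_freq[v] for (n, v), f in pair_freq.items()}


def filter_pairs(pairs: list[tuple[str, str]], t: float) -> dict[tuple[str, str], float]:
    """Keep only the pairs whose weight is above the threshold t."""
    return {pair: w for pair, w in conditional_weights(pairs).items() if w > t}


# Hypothetical extracted pairs (noun, verb context).
PAIRS = (
    [("car", "drive_obj")] * 3
    + [("bus", "drive_obj")]
    + [("car", "rent_obj")] * 2
    + [("apartment", "rent_obj")] * 2
)

if __name__ == "__main__":
    print(conditional_weights(PAIRS))
    print(filter_pairs(PAIRS, t=0.3))
```

With t = 0.3 the pair (bus, drive_obj), whose weight is 0.25, is dropped, while the other three pairs survive.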

Similarity-based Clustering

Similarity Measures:
• Binary (Jaccard, Dice)
• Geometric (Cosine, Euclidean/Manhattan distance)
• Information-theoretic (Relative Entropy, Mutual Information)
• (…)

Methods:
• Hierarchical agglomerative clustering (complete, average, single linkage)
• Hierarchical top-down clustering, e.g. Bi-Section KMeans
• (…)

Agglomerative/Bottom-Up Clustering

[Figure: bottom-up clustering of car, bus, trip, excursion and apartment]

Bi-Section-KMeans

[Figure: top-down splitting of {apartment, car, bus, trip, excursion}; the clusters shown include {excursion, trip}, {apartment, bus, car} and {bus, car}]

Set Theoretical Clustering

Set theoretical: Formal Concept Analysis [Ganter and Wille 1999]

             bookable   rentable   drivable   ridable   joinable
apartment    X          X
car          X          X          X
motor-bike   X          X          X          X
trip         X                                          X
excursion    X                                          X

Tourism Formal Context

             bookable   rentable   driveable   rideable   joinable
apartment    X          X
car          X          X          X
motor-bike   X          X          X           X
trip         X                                            X
excursion    X                                            X

• Genus: here, classes
• Species: here, subclasses
• Differentiae: characteristics which allow objects to be grouped or distinguished from each other

Formal Context

Definition 1 (Formal Context)
A triple (G, M, I) is called a formal context if G and M are sets and I ⊆ G × M is a binary relation between G and M.

The elements of G are called objects, those of M attributes, and I is the incidence of the context.

For A ⊆ G, we define: A′ := { m ∈ M | ∀ g ∈ A: (g,m) ∈ I }
And dually, for B ⊆ M, we define: B′ := { g ∈ G | ∀ m ∈ B: (g,m) ∈ I }

Intuitively speaking, A′ is the set of all attributes common to the objects of A, while B′ is the set of all objects that have all attributes in B in common.

Example:
{apartment, car}′ = {bookable, rentable}
{apartment, car}″ = {bookable, rentable}′ = {apartment, car, motor-bike}
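
The two derivation operators can be checked against the example with a short sketch. The context below is the tourism formal context from the slides, stored as a mapping from objects to attribute sets; the names `CONTEXT`, `common_attributes` and `common_objects` are illustrative.

```python
# Tourism formal context: object -> set of attributes (the relation I).
CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "trip": {"bookable", "joinable"},
    "excursion": {"bookable", "joinable"},
}
ATTRIBUTES = set().union(*CONTEXT.values())  # the attribute set M


def common_attributes(objects) -> set:
    """A' : all attributes shared by every object in A."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes) -> set:
    """B' : all objects that have every attribute in B."""
    wanted = set(attributes)
    return {g for g, ms in CONTEXT.items() if wanted <= ms}


if __name__ == "__main__":
    a_prime = common_attributes({"apartment", "car"})
    print(sorted(a_prime))                  # ['bookable', 'rentable']
    print(sorted(common_objects(a_prime)))  # ['apartment', 'car', 'motor-bike']
```

Applying both operators in sequence yields the closure A″, which is why motor-bike joins apartment and car in the second line of the example.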

Formal Concept

Definition 2 (Formal Concept)
A pair (A, B) is a formal concept of (G, M, I) if and only if A ⊆ G, B ⊆ M, A′ = B and A = B′.

In other words, (A, B) is a formal concept if the set of all attributes shared by the objects of A is identical with B and, vice versa, A is also the set of all objects that have all attributes in B.

A is then called the extent and B the intent of the formal concept (A, B).

The formal concepts of a given context are naturally ordered by the subconcept-superconcept relation as defined by:

(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)

Example:
({apartment, car}, {bookable, rentable}) is not a formal concept
({apartment, car, motor-bike}, {bookable, rentable}) is a formal concept
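
For a context this small, all formal concepts can be enumerated by brute force: close every subset of objects and collect the resulting (extent, intent) pairs. The sketch below is an illustration on the tourism context, not the algorithm used in the talk; the function names are hypothetical, and the approach is exponential in the number of objects.

```python
from itertools import combinations

# Tourism formal context: object -> set of attributes.
CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "trip": {"bookable", "joinable"},
    "excursion": {"bookable", "joinable"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects) -> frozenset:
    """A' : attributes common to all given objects."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)


def extent(attributes) -> frozenset:
    """B' : objects that have all given attributes."""
    return frozenset(g for g, ms in CONTEXT.items() if set(attributes) <= ms)


def formal_concepts() -> set:
    """Brute force: every pair (A'', A') for A a subset of G is a formal concept."""
    concepts = set()
    objects = sorted(CONTEXT)
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


def is_subconcept(c1, c2) -> bool:
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

The enumeration yields six concepts for the tourism context. Ordering them with `is_subconcept` gives the lattice from which the concept hierarchy is read off.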


Tourism Lattice

Concept Hierarchy

bookable
  rentable
    apartment
    driveable
      car
      rideable
        motor-bike
  joinable
    trip
    excursion

Evaluation - Data Sets

Tourism (118 million tokens):
• http://www.all-in-all.de/english
• http://www.lonelyplanet.com
• British National Corpus (BNC)
• Handcrafted tourism ontology (289 concepts)

Finance (185 million tokens):
• Reuters news from 1987
• GETESS finance ontology (1178 concepts)

Evaluation - Example 1

[Figure: learned hierarchy (nodes: bookable, rentable, joinable, driveable, rideable, apartment, car, bike, trip, excursion) compared with the gold standard (nodes: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion)]

P = 100%, R = 100%, F = 100%

Evaluation – Example 2

[Figure: learned hierarchy (nodes: bookable, rentable, joinable, driveable, apartment, car, bike, trip, excursion) compared with the gold standard (nodes: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion)]

P = 100%, R = 87.5%, F = 93.33%

Evaluation – Example 3

[Figure: learned hierarchy (nodes: bookable, rentable, joinable, driveable, rideable, planable, apartment, car, bike, trip, excursion) compared with the gold standard (nodes: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion)]

P = 90%, R = 100%, F = 94.74%

Precision/Recall/F-Measure

[Figure: FCA (Tourism). Precision, Recall and F (y-axis, 0 to 1.2) versus threshold t (x-axis, 0 to 1)]

F-Measure for recovering structure

Lexical Recall, F′

[Figure: FCA (Tourism). F, LR and F′ (y-axis, 0 to 0.5) versus threshold t (x-axis, 0 to 1)]

F′-Measure for recovering terms and structure

Comparison (Tourism, F′)

[Figure: Comparison (Tourism). F′ (y-axis, 0 to 0.5) versus threshold t (x-axis, 0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Comparison (Finance, F′)

[Figure: Comparison (Finance). F′ (y-axis, 0 to 0.45) versus threshold t (x-axis, 0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Clustering – Comparison

Method                     F-Measure       Worst Case Time Complexity     Understandability
FCA                        43.81/41.02%    O(2^n) (in practice better!)   Good
Agglomerative Clustering   36.78/33.35%    O(n² log(n))                   Fair
                           36.55/32.92%    O(n²)
                           38.57/32.15%    O(n²)
Divisive Clustering        36.42/32.77%    O(n²)                          Weak-Fair

Ontology Learning & Re-use

[Figure: the lightweight-to-heavyweight spectrum of ontologies and their relatives, from text corpora to F-Logic]

Many more ways to explore along each dimension:

• Source

• Method

• Usage

Outlook

Formal concept analysis is sensitive to
• Noise
• Low probabilities
→ Smooth FCA

Folksonomies rely on social context
• Resources
• Tags
• Time
→ Exploration of interaction and dynamics

Thank you!
http://isweb.uni-koblenz.de


Literature

Overviews

Ontologies
• S. Staab, R. Studer. Handbook on Ontologies. Springer, 2004.

Ontology Population (by Annotation)
• S. Handschuh, S. Staab (eds.). Annotation for the Semantic Web. IOS Press, 2003.

Ontology Learning
• P. Buitelaar, P. Cimiano, B. Magnini (eds.). Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
• P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, November 2006.
• A. Mädche. Ontology Learning for the Semantic Web. Kluwer, 2002.

Specifically used here
• P. Cimiano, S. Staab. Learning by Googling. SIGKDD Explorations, 6(2), pp. 24-33, 2004.
• P. Cimiano, A. Hotho, S. Staab. Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. JAIR - Journal of AI Research, 24: 305-339, 2005.
• K. Dellschaft, S. Staab. On How to Perform a Gold Standard based Evaluation of Ontology Learning. In: Proc. of ISWC-2006, International Semantic Web Conference, Athens, GA, USA, Springer, LNCS, November 2006.
• R. Abbasi, S. Staab, P. Cimiano. Organizing Resources on Tagging Systems using T-ORG. In: ESWC-2007 Workshop on Bridging the Gap between Semantic Web and Web 2.0, Innsbruck, 2007.

A Note on the Evaluation of Ontology Learning

The a posteriori approach:
• Ask a domain expert for a per-concept evaluation of the learned ontology
• Count three categories of concepts:
– Correct: both in the learned and the gold ontology
– New: only in the learned ontology, but relevant and should be in the gold standard as well
– Spurious: useless
• Compute precision = (correct + new) / (correct + new + spurious)

As a result: the a priori evaluations are awful, BUT a posteriori evaluations by domain experts still show very good results, which is very helpful for the domain expert!
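
The a posteriori precision is a one-line computation; the sketch below only makes the counting explicit. The function name and the example counts are hypothetical and are not results from the cited study.

```python
def a_posteriori_precision(correct: int, new: int, spurious: int) -> float:
    """precision = (correct + new) / (correct + new + spurious)."""
    total = correct + new + spurious
    if total == 0:
        raise ValueError("no concepts were evaluated")
    return (correct + new) / total


if __name__ == "__main__":
    # Hypothetical counts from an expert review of ten learned concepts.
    print(a_posteriori_precision(correct=6, new=3, spurious=1))
```

Counting "new" concepts as hits is what separates this measure from a strict comparison against the gold standard, where they would count as errors.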

M. Sabou, C. Wroe, C. Goble, G. Mishne. Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics. In: Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, 10-14 May, 2005.