Ontology Alignment

DESCRIPTION

Ontology Alignment. Problem Statement: given N ontologies (O1, …, On) in a particular domain with different levels of coverage, the goal is to evaluate the commonality of entities and rank them. Challenges and solutions include ontology alignment via the Largest Common Subgraph (LCS) and a Vector Space Model (TF/IDF).

TRANSCRIPT

Page 1: Ontology Alignment

Ontology Alignment

Page 2: Ontology Alignment

Problem Statement

Given N ontologies (O1, …, On)
◦ In a particular domain
◦ With different levels of coverage

Goal
◦ Evaluate the commonality of entities
◦ Rank entities

Page 3: Ontology Alignment

Challenges & Solutions

Ontology Alignments
◦ Largest Common Subgraph (LCS)
◦ Vector Space Model (TF/IDF)

Accuracy of Entities in Aligned Concepts
◦ Ranking Entities

Page 4: Ontology Alignment

LCS Algorithm for Multiple Ontologies

◦ Find the LCS for two ontologies
◦ Align the LCS with the other ontologies (a sketch follows below)
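The slide only gives this strategy at a high level, so here is a minimal sketch of the iterative idea, assuming a hypothetical pairwise routine lcs_between(o1, o2); it is not the authors' implementation, and the edge-set representation is an illustrative simplification.

```python
from typing import Set, Tuple

# An ontology is represented here as a set of directed, labeled edges:
# (source_concept, relation, target_concept).
Ontology = Set[Tuple[str, str, str]]

def lcs_between(o1: Ontology, o2: Ontology) -> Ontology:
    """Hypothetical pairwise LCS, approximated here by shared edges.
    A real implementation would first match concepts by node and
    structural similarity before intersecting the graphs."""
    return o1 & o2

def lcs_multiple(ontologies: list) -> Ontology:
    """Fold the pairwise LCS over all ontologies: find the LCS of the
    first two, then align the result with each remaining ontology."""
    if not ontologies:
        return set()
    common = ontologies[0]
    for other in ontologies[1:]:
        common = lcs_between(common, other)
    return common

# Toy usage with three tiny ontologies.
o1 = {("Pizza", "hasTopping", "Cheese"), ("Pizza", "isA", "Food")}
o2 = {("Pizza", "hasTopping", "Cheese"), ("Drink", "isA", "Beverage")}
o3 = {("Pizza", "hasTopping", "Cheese")}
print(lcs_multiple([o1, o2, o3]))  # {('Pizza', 'hasTopping', 'Cheese')}
```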

Page 5: Ontology Alignment

Largest Common Subgraph (LCS) Algorithm between Two Ontologies

Page 6: Ontology Alignment

Data Structure for the LCS Algorithm

[Diagram: concepts C1–C7 of ontology O1 and C'1–C'6 of ontology O2]

Similarity Measure for Corresponding Entities = Node Similarity + Structural Similarity

Each concept of O1 with its candidate matches in O2, sorted by descending similarity:

C1: (C1,C'1,.95), (C1,C'6,.77), (C1,C'3,.71), (C1,C'4,.65), (C1,C'5,.54), (C1,C'2,.34)
C2: (C2,C'3,.85), (C2,C'2,.67), (C2,C'1,.51), (C2,C'4,.45), (C2,C'5,.24), (C2,C'6,.14)
C3: (C3,C'4,.90), (C3,C'1,.67), (C3,C'3,.51), (C3,C'2,.45), (C3,C'5,.34), (C3,C'6,.24)
C4: (C4,C'2,.95), (C4,C'1,.65), (C4,C'3,.51), (C4,C'4,.45), (C4,C'5,.23), (C4,C'6,.14)
C5: (C5,C'4,.80), (C5,C'1,.67), (C5,C'3,.65), (C5,C'2,.35), (C5,C'5,.34), (C5,C'6,.24)
C6: (C6,C'1,.20), (C6,C'1,.15), (C6,C'3,.12), (C6,C'2,.12), (C6,C'5,.09), (C6,C'6,.08)
C7: (C7,C'4,.31), (C7,C'1,.25), (C7,C'3,.23), (C7,C'2,.15), (C7,C'5,.14), (C7,C'6,.12)
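A minimal sketch of this data structure in Python, assuming the similarity scores have already been computed elsewhere; each concept of O1 simply keeps its (candidate, score) pairs in descending score order.

```python
from collections import defaultdict

# candidate_matches[c] is the list of (candidate_in_O2, score) pairs for
# concept c in O1, sorted by descending similarity.
candidate_matches = defaultdict(list)

scores = [
    ("C1", "C'1", 0.95), ("C1", "C'6", 0.77), ("C1", "C'3", 0.71),
    ("C2", "C'3", 0.85), ("C2", "C'2", 0.67), ("C2", "C'1", 0.51),
    # ... remaining pairs from the table above
]

for c, c_prime, score in scores:
    candidate_matches[c].append((c_prime, score))

for c in candidate_matches:
    candidate_matches[c].sort(key=lambda pair: pair[1], reverse=True)

print(candidate_matches["C1"][0])  # best candidate for C1: ("C'1", 0.95)
```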

Page 7: Ontology Alignment

Node Similarity: Instance-based. Representing types using N-grams*

Node Similarity (Name-Match)
◦ Find common N-grams (N = 2) for corresponding columns

Table A (CA):

StrName          FENAME        Status
LOCUST-GROVE DR  LOCUST GROVE  BUILT
LOUISE LN        LOUISE        BUILT

Table B (CB):

Street          Laddress  Raddress
TRAIL RANGE DR  1600      1798
CR45/MANET CT   2500      2598

N-gram types from A.StrName = {LO, OC, CU, ST, …}
N-gram types from B.Street = {TR, RA, R4, 5/, …}
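As a rough illustration of this name-match step, the sketch below extracts character bigrams from the instances of two columns and scores their overlap with a Jaccard coefficient; the Jaccard scoring is an illustrative assumption, not necessarily the measure used in the cited paper.

```python
def char_ngrams(values, n=2):
    """Collect the set of character n-grams over all instances of a column."""
    grams = set()
    for value in values:
        for i in range(len(value) - n + 1):
            grams.add(value[i:i + n])
    return grams

a_strname = ["LOCUST-GROVE DR", "LOUISE LN"]
b_street = ["TRAIL RANGE DR", "CR45/MANET CT"]

grams_a = char_ngrams(a_strname)   # {'LO', 'OC', 'CU', 'US', 'ST', ...}
grams_b = char_ngrams(b_street)    # {'TR', 'RA', 'R4', '45', '5/', ...}

# Jaccard overlap of the two bigram sets (illustrative scoring choice).
jaccard = len(grams_a & grams_b) / len(grams_a | grams_b)
print(round(jaccard, 3))
```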

*Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham & Shashi Shekhar, “Content Based Ontology Matching for GIS Datasets“, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Page: 407-410, Irvine, California, USA, November 2008.

Page 8: Ontology Alignment

Node Similarity: Instance-based. Visualizing Entropy and Conditional Entropy

H(C) = −Σ pi log pi, over all x ∈ C1 ∪ C2

H(C | T) = H(C, T) − H(T), over all x ∈ C1 ∪ C2 and t ∈ T
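A small sketch of these two quantities, computed from instance keywords x labeled with the column they came from and the type/cluster t they were assigned to; the (column, type) pair layout is an assumption for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p_i log2 p_i over the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(pairs):
    """H(C | T) = H(C, T) - H(T) for (column_label, type) pairs."""
    joint = entropy(pairs)                  # H(C, T)
    h_t = entropy([t for _, t in pairs])    # H(T)
    return joint - h_t

# Each instance keyword carries its source column (C1 or C2)
# and the semantic type/cluster t it was assigned to.
pairs = [("C1", "road"), ("C1", "road"), ("C1", "name"),
         ("C2", "road"), ("C2", "name"), ("C2", "name")]

h_c = entropy([c for c, _ in pairs])
h_c_given_t = conditional_entropy(pairs)
print(round(h_c, 3), round(h_c_given_t, 3))
```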

Page 9: Ontology Alignment

Node Similarity: Faults of this Method

• Semantically similar columns are not guaranteed to have a high similarity score.

Table A (A ∈ O1):

City         Country
Dallas       USA
Houston      USA
Kingston     Jamaica
Halifax      Canada
Mexico City  Mexico

Table B (B ∈ O2):

ctyName       country
Shanghai      China
Beijing       China
Tokyo         Japan
New Delhi     India
Kuala Lumpur  Malaysia

2-grams extracted from A: {Da, al, la, as, Ho, ou, us, …}
2-grams extracted from B: {Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, …}

Page 10: Ontology Alignment

Node Similarity: Instance-based. K-medoid + NGD instance similarity

C1 ∈ O1 (column 1) and C2 ∈ O2 (column 2):

roadName      City
Johnson Rd.   Plano
School Dr.    Richardson
Zeppelin St.  Lakehurst

Road        County
Custer Pwy  Collin
15th St.    Collin
Parker Rd.  Collin

Step 1: Extract distinct keywords from the compared columns (C1 ∪ C2).
Keywords extracted from columns = {Johnson, Rd., School, 15th, …}

Step 2: Group the distinct keywords into semantic clusters, e.g.
{"Rd.", "Dr.", "St.", "Pwy", …} and {"Johnson", "School", "Dr.", …}

Step 3: Calculate similarity (a sketch of the three steps follows below):
Similarity = H(C|T) / H(C)
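A rough end-to-end sketch of the three steps, repeating the entropy definition from above so it runs on its own; the clustering rule here is a hypothetical stand-in for K-medoid clustering under an NGD (Normalized Google Distance) keyword distance, which is not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p log2 p over the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Step 1: keywords from both columns, tagged with their source column.
c1_instances = ["Johnson Rd.", "School Dr.", "Zeppelin St."]
c2_instances = ["Custer Pwy", "15th St.", "Parker Rd."]
keywords = [(kw, "C1") for v in c1_instances for kw in v.split()] + \
           [(kw, "C2") for v in c2_instances for kw in v.split()]

# Step 2: group keywords into semantic clusters. A hypothetical rule stands
# in for K-medoid clustering under an NGD-like keyword distance.
ROAD_TERMS = {"Rd.", "Dr.", "St.", "Pwy"}
pairs = [(col, "road-term" if kw in ROAD_TERMS else "name") for kw, col in keywords]

# Step 3: Similarity = H(C | T) / H(C), with H(C | T) = H(C, T) - H(T).
h_c = entropy([col for col, _ in pairs])
h_c_given_t = entropy(pairs) - entropy([t for _, t in pairs])
print(round(h_c_given_t / h_c, 3))
```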

Page 11: Ontology Alignment

Node Similarity: Instance-based. Problems with K-medoid + NGD*

It is possible that two different geographic entities (e.g., Dallas, TX and Dallas County) in the same location will have a very low computed NGD value and thus be mistaken for being similar:

roadName           City
Johnson Rd.        Plano
School Dr.         Richardson
Zeppelin St.       Lakehurst
Alma Dr.           Richardson
Preston Rd.        Addison
Dallas Pkwy        Dallas

Road               County
Custer Pwy         Cooke
15th St.           Collin
Parker Rd.         Collin
Alma Dr.           Collin
Campbell Rd.       Denton
Harry Hines Blvd.  Dallas

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Semantic Schema Matching Without Shared Instances,” to appear in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA - September 14-16, 2009.

Page 12: Ontology Alignment

Node Similarity: Instance-based. Using geographic type information*

We use a gazetteer to determine the geographic type of an instance (a sketch follows below):

[Diagram: instances from O1 and O2 mapped to geotypes]
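A minimal sketch of the gazetteer-lookup idea, with a hypothetical in-memory gazetteer and a simple overlap score over the resulting geotype distributions; neither detail is taken from the cited paper.

```python
from collections import Counter

# Hypothetical gazetteer: instance value -> geographic type.
GAZETTEER = {
    "Dallas": "city", "Plano": "city", "Richardson": "city",
    "Collin": "county", "Denton": "county", "Cooke": "county",
}

def geotype_distribution(values):
    """Map each column instance to its geotype and count the types."""
    return Counter(GAZETTEER.get(v, "unknown") for v in values)

def geotype_overlap(col_a, col_b):
    """Fraction of probability mass the two geotype distributions share."""
    da, db = geotype_distribution(col_a), geotype_distribution(col_b)
    ta, tb = sum(da.values()), sum(db.values())
    return sum(min(da[t] / ta, db[t] / tb) for t in set(da) | set(db))

print(geotype_overlap(["Plano", "Richardson", "Dallas"],
                      ["Collin", "Collin", "Denton"]))  # 0.0: disjoint geotypes
```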

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” to appear in ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.

Page 13: Ontology Alignment

Node Similarity: Instance-based. Results of Geographic Matching over Two Separate Road Network Data Sources

Page 14: Ontology Alignment

Structural Similarity

◦ Structural Similarity Measurement: I. Neighbor Similarity

[Diagram: concepts C1, C2, C3, C5, C6 in O1 and C'1, C'3, C'4, C'5 in O2]

Page 15: Ontology Alignment

Structural Similarity

Structural Similarity Measurement: II. Properties Similarity

[Diagram: ontology graphs over C1–C7 and C'1–C'6 with edge types isA, subClass, hasFlavor, hasColor, hasFood, hasDrink, hasTopping]

Relationship-type count vectors (a comparison sketch follows below):

RTC1 = [3 isA, 2 subClass, 1 hasFlavor, 1 hasColor, 0 hasFood, 1 hasTopping]
RTC2 = [1 isA, 1 subClass, 2 hasFlavor, 0 hasColor, 1 hasFood]
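A minimal sketch of comparing two such relationship-type count vectors with cosine similarity; the dict representation and the use of cosine here are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Relationship-type counts from the slide.
rt_c1 = {"isA": 3, "subClass": 2, "hasFlavor": 1, "hasColor": 1,
         "hasFood": 0, "hasTopping": 1}
rt_c2 = {"isA": 1, "subClass": 1, "hasFlavor": 2, "hasColor": 0, "hasFood": 1}

print(round(cosine(rt_c1, rt_c2), 3))
```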

Page 16: Ontology Alignment

Similarity: Results of Pairwise Ontology Matching (I3CON Benchmark)

Matching using Name Similarity + RTS

Matching using Name Similarity + (RTS and Neighbor)

Page 17: Ontology Alignment

Ontology Matching: Vector Space Model (VSM)

Define the VSM for each entity
• Collection of words in the label, edge types, comment, and neighbors.

[Diagram: the same ontology graphs over C1–C7 and C'1–C'6 with edge types isA, subClass, hasFlavor, hasColor, hasFood, hasDrink, hasTopping]

VSM(C1) = [1 C1, 1 C2, 1 C3, 1 C5, 1 C6, 1 isA, 2 subClass, 1 hasFlavor]
VSM(C'1) = [1 C'3, 1 C'4, 1 C'5, 1 isA, 2 hasFlavor]

Page 18: Ontology Alignment

Ontology Matching: Vector Space Model (VSM)

• Update the VSM with word scores using TF/IDF
• Calculate the cosine similarity for corresponding entities (a sketch follows below):

Cos(VSM(C1), VSM(C2))
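A minimal sketch of the TF/IDF weighting and cosine comparison using scikit-learn, where each entity's "document" is the bag of words collected from its label, edge types, comment, and neighbors; the toy strings below are illustrative, not taken from the slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One pseudo-document per entity: label + neighbor labels + edge types.
entity_docs = {
    "C1":  "C1 C2 C3 C5 C6 isA subClass subClass hasFlavor",
    "C'1": "C'1 C'3 C'4 C'5 isA hasFlavor hasFlavor",
}

vectorizer = TfidfVectorizer(token_pattern=r"[^ ]+")  # keep tokens like C'1 intact
matrix = vectorizer.fit_transform(entity_docs.values())

# Cosine similarity between the TF/IDF vectors of C1 and C'1.
score = cosine_similarity(matrix[0], matrix[1])[0, 0]
print(round(score, 3))
```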

Page 19: Ontology Alignment

Aligned Concepts

• Aggregate different ontologies
• Example

Page 20: Ontology Alignment

Aligned Concepts

• Statistical Model

[Diagram: a Global Ontology with Entity1–Entity4; the source-ontology concepts O1:Person, O2:Person, O1:hasFather, O1:hasMaleParent, O2:hasFather, O1:hasMother, O1:hasFemaleParent, O2:hasMother, O1:hasMon, and O1:Harry map to these entities, with weights α1–α4 and β1–β10 (β10 = 1)]

Page 21: Ontology Alignment

Aligned Concepts

• Calculate the probabilities of appearance of each entity in the Global Ontology (GO)
• Use Maximum Likelihood Estimation (a sketch follows below)
• Calculate … and …
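A minimal sketch of the maximum-likelihood idea as stated on the slide: the probability of each global-ontology entity is estimated as its relative frequency of appearance across the source ontologies. Both the counting scheme and the concept-to-entity grouping below (read off the diagram) are illustrative assumptions; the slide does not give the exact estimator.

```python
from collections import Counter

# Which aligned global-ontology entity each source-ontology concept maps to.
concept_to_entity = {
    "O1:Person": "Entity1", "O2:Person": "Entity1",
    "O1:hasFather": "Entity2", "O1:hasMaleParent": "Entity2", "O2:hasFather": "Entity2",
    "O1:hasMother": "Entity3", "O1:hasFemaleParent": "Entity3", "O2:hasMother": "Entity3",
}

# Maximum-likelihood estimate: relative frequency of each entity's appearances.
counts = Counter(concept_to_entity.values())
total = sum(counts.values())
probabilities = {entity: c / total for entity, c in counts.items()}
print(probabilities)
```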

Page 22: Ontology Alignment

Reification

Reification can be considered as metadata about RDF/OWL statements.

Ontology alignment approaches rely on probabilistic measures to find matches between concepts in different ontologies.

Reification data can be attached to the alignment information to show the 'match factor' between two concepts in OWL 2 (a sketch follows below).

Advanced analytic algorithms can benefit from reification in establishing the relevance of search results.
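A minimal sketch of this idea using rdflib and the OWL 2 annotated-axiom (axiom reification) pattern to attach a match factor to an equivalence between two aligned concepts; the ex:matchFactor property and the example IRIs are hypothetical.

```python
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

OWL = Namespace("http://www.w3.org/2002/07/owl#")
EX = Namespace("http://example.org/align#")

g = Graph()
g.bind("owl", OWL)
g.bind("ex", EX)

# The alignment statement itself: two concepts declared equivalent.
g.add((EX.O1_Person, OWL.equivalentClass, EX.O2_Person))

# OWL 2 axiom annotation (reification) carrying the match factor.
axiom = BNode()
g.add((axiom, RDF.type, OWL.Axiom))
g.add((axiom, OWL.annotatedSource, EX.O1_Person))
g.add((axiom, OWL.annotatedProperty, OWL.equivalentClass))
g.add((axiom, OWL.annotatedTarget, EX.O2_Person))
g.add((axiom, EX.matchFactor, Literal(0.95, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```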

Page 23: Ontology Alignment

OWL 2

OWL 2 is an extension to OWL. Some of the new features in OWL 2 are as follows:
◦ Syntactic sugar (e.g., disjoint union of classes)
◦ Property chains
◦ Richer datatypes and data ranges
◦ Qualified cardinality restrictions
◦ New constructs that increase expressivity
◦ Simple metamodeling capabilities
◦ Extended annotation capabilities

The following link lists all the new features in OWL 2:
http://www.w3.org/TR/2009/REC-owl2-new-features-20091027/

Page 24: Ontology Alignment

Ontology Extraction from Text Documents

Page 25: Ontology Alignment

Problem Statement

Our solution for ontology construction from documents:
◦ Use a hierarchical clustering algorithm to build a hierarchy for the documents (a sketch follows below):
   Hierarchical Agglomerative Clustering (HAC)
   Modified Self-Organizing Tree (MSOT)
   Hierarchical Growing Self-Organizing Tree (HGSOT)
◦ Assign a concept to each node in the hierarchy using WordNet
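A minimal sketch of the first step using off-the-shelf HAC over TF/IDF document vectors (scikit-learn and SciPy); MSOT and HGSOT are not sketched here, and the sample documents are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage

documents = [
    "gold mining and metal prices",
    "copper mining output rises",
    "city road network dataset",
    "street names in the road network",
]

# TF/IDF vectors for the documents, then average-linkage HAC over them.
vectors = TfidfVectorizer().fit_transform(documents).toarray()
tree = linkage(vectors, method="average", metric="cosine")

# Each row of `tree` merges two clusters; the full matrix encodes the
# document hierarchy whose nodes would later receive WordNet concepts.
print(tree)
```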

Page 26: Ontology Alignment

Concept Assignment

Concept assignment to documents

LVQ1: the topic vector (t) is built by training with the training documents.

Clusters in LVQ are predefined. Each topic cluster is represented by a node in the output map, and LVQ uses pre-labeled data for training. Only the best-match node's vector (the winning vector) is updated, rather than its neighbors. The vector-update rule uses the following equations (a sketch follows below):

If data x and best-match node c belong to the same class:
w_c(t+1) = w_c(t) + α(t) · [x(t) − w_c(t)]

If data x and best-match node c belong to different classes:
w_c(t+1) = w_c(t) − α(t) · [x(t) − w_c(t)]
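A minimal sketch of one LVQ1 update step with NumPy, assuming numeric document vectors and a fixed learning rate; the dimensionality, labels, and learning-rate value are illustrative assumptions.

```python
import numpy as np

def lvq1_update(weights, labels, x, x_label, alpha=0.1):
    """One LVQ1 step: move the winning prototype toward x if the labels
    match, away from x otherwise. `weights` has one row per prototype."""
    c = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-match node
    direction = 1.0 if labels[c] == x_label else -1.0
    weights[c] = weights[c] + direction * alpha * (x - weights[c])
    return weights

# Two topic prototypes and one training document vector.
prototypes = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])
topic_labels = ["mining", "roads"]
doc_vector = np.array([0.7, 0.2, 0.1])

prototypes = lvq1_update(prototypes, topic_labels, doc_vector, "mining")
print(prototypes)
```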

Page 27: Ontology Alignment

Concept Assignment

◦ Concept sense disambiguation
   One keyword may be associated with more than one concept in WordNet.
   The keyword "gold" has four senses in WordNet and the keyword "copper" has five.
   For disambiguation of concepts we apply the same technique (i.e., the cosine similarity measure) used in topic tracking.
   To construct a vector for each sense, we use the short description (gloss) that appears in WordNet (a sketch follows below).
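A minimal sketch of this disambiguation step with NLTK's WordNet interface: build a TF/IDF vector for each sense gloss of a keyword and pick the sense whose gloss is most similar (by cosine) to the surrounding document text. The context string is invented for illustration, and the WordNet corpus is assumed to be available (e.g. via nltk.download('wordnet')).

```python
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

keyword = "gold"
context = "the mine produced precious metal ore and bullion"  # document context

senses = wn.synsets(keyword)                 # all WordNet senses of the keyword
glosses = [s.definition() for s in senses]   # short description of each sense

# TF/IDF vectors over the glosses plus the context, then cosine scores.
matrix = TfidfVectorizer().fit_transform(glosses + [context])
context_vec = matrix[len(glosses)]           # last row is the context
scores = cosine_similarity(context_vec, matrix[:len(glosses)])[0]

best = senses[scores.argmax()]
print(best.name(), "->", best.definition())
```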

Page 28: Ontology Alignment

Concept Assignment

Concept assignment for a leaf node
◦ If a majority of its documents have the same concept, we assign that concept to the leaf.
◦ If there is no majority, we choose a generic concept from WordNet that covers all of the concepts and assign it to the leaf.

Concept assignment for a non-leaf node
◦ If a majority of its children have the same concept, we assign that concept to the internal node.
◦ If there is no majority, we choose a generic concept from WordNet that covers all of the concepts and assign it to the internal node.

(A sketch of this assignment rule follows below.)
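A minimal sketch of the assignment rule, using NLTK WordNet's lowest common hypernym as one possible realization of the "generic concept covering all concepts" fallback; both that choice and the sample concepts are illustrative assumptions.

```python
from collections import Counter
from functools import reduce
from nltk.corpus import wordnet as wn

def assign_concept(child_concepts):
    """Majority vote over the children's concepts; otherwise fall back to a
    generic WordNet concept covering all of them (lowest common hypernym)."""
    counts = Counter(child_concepts)
    concept, freq = counts.most_common(1)[0]
    if freq > len(child_concepts) / 2:        # strict majority
        return concept
    # Take the first WordNet sense of each concept (illustrative choice).
    synsets = [wn.synsets(c)[0] for c in child_concepts]
    common = reduce(lambda a, b: a.lowest_common_hypernyms(b)[0], synsets)
    return common.name()

print(assign_concept(["gold", "gold", "copper"]))   # majority -> 'gold'
print(assign_concept(["gold", "copper", "iron"]))   # no majority -> common hypernym
```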