Ontology Alignment/Matching
Prafulla Palwe
Agenda
► Introduction
  ► Being serious about the semantic web
  ► Living with heterogeneity
  ► Heterogeneity problem
  ► I have a plan for you
► Matching Problem
  ► Matching Operation
  ► Motivation
  ► Schema Matching Vs Ontology Matching
  ► Correspondence
  ► Alignment
► Matching Process
  ► Sequential composition
  ► Parallel composition
► Application Domains
  ► Traditional
  ► Emergent
► Classification
  ► Matching Dimensions
► Basic Techniques
  ► Element Level
  ► Structure Level
► Summary and Challenges
Introduction
► Being serious about the semantic web
  ► It is not one guy's ontology
  ► It is not several guys' common ontology
  ► It is many guys and girls' many ontologies
  ► So it is a mess, but a meaningful mess
► Living with heterogeneity
  The semantic web will be:
  ► Huge
  ► Dynamic
  ► Heterogeneous
  These are not bugs, they are features. We must learn to live with them.
► Heterogeneity problem
  Resources expressed in different ways must be reconciled before being used. Mismatch between formalized knowledge can occur when:
  ► different languages are used;
  ► different terminologies are used;
  ► different modeling is used.
► I have a plan for you – Reconciliation
Matching Problem
► Matching Operation
  Definition: the matching operation takes as input ontologies, each consisting of a set of discrete entities (e.g., tables, XML elements, classes, properties), and determines as output the relationships (e.g., equivalence, subsumption) holding between these entities.
► Motivation
  ► 2 XML Schemas
  ► 2 Ontologies
► Schema mapping Vs ontology mapping
  Differences
  ► Schemas often do not provide explicit semantics for their data
    ► Relational schemas provide no generalization
  ► Ontologies are logical systems that constrain the meaning
    ► Ontology definition as a set of logical axioms
  Commonalities
  ► Schemas and ontologies provide a vocabulary of terms that describes the domain of interest
  ► Schemas and ontologies constrain the meaning of terms used in the vocabulary
► Correspondence
  Definition: given 2 ontologies O and O', a correspondence M between O and O' is a 5-tuple <id, e, e', R, n> such that:
  ► id is a unique identifier of the correspondence;
  ► e and e' are entities of O and O' (e.g., XML elements, classes);
  ► R is a relation (e.g., equivalence (=), disjointness (⊥));
  ► n is a confidence measure in some mathematical structure (typically in the [0,1] range).
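The 5-tuple above maps naturally onto a small record type. The following is a minimal sketch, assuming entities are identified by plain strings and that the confidence n lies in [0, 1]; the class and field names are illustrative and do not come from any particular alignment library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One correspondence <id, e, e', R, n> between entities of O and O'."""
    id: str            # unique identifier of the correspondence
    e: str             # entity of O (assumed here to be a name or URI string)
    e_prime: str       # entity of O'
    relation: str      # e.g. "=" for equivalence, "⊥" for disjointness
    confidence: float  # n, assumed to lie in the [0, 1] range

    def __post_init__(self):
        # Reject confidence values outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Hypothetical example: a class Book in O is judged equivalent to Volume in O'.
c = Correspondence("c1", "O:Book", "O':Volume", "=", 0.9)
print(c)
```

Making the record immutable lets correspondences be stored in a set, which matches the definition of an alignment as a set of correspondences.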
► Alignment
  Definition: given 2 ontologies O and O', an alignment A between O and O':
  ► is a set of correspondences on O and O';
  ► has some cardinality: 1-1, 1-*, etc.;
  ► carries some additional metadata (method, date, properties, etc.).
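As an illustration of this definition, the sketch below stores an alignment as a set of correspondence tuples plus a metadata dictionary, and derives the cardinality from the data. The tuple layout, the class name and the `cardinality` helper are assumptions made for this example only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# A correspondence as a plain tuple: (id, e, e_prime, relation, confidence).
Correspondence = Tuple[str, str, str, str, float]


@dataclass
class Alignment:
    correspondences: Set[Correspondence] = field(default_factory=set)
    metadata: Dict[str, str] = field(default_factory=dict)

    def cardinality(self) -> str:
        """Return "1-1", "1-*", "*-1" or "*-*" for the stored correspondences."""
        left = Counter(c[1] for c in self.correspondences)
        right = Counter(c[2] for c in self.correspondences)
        many_right = any(n > 1 for n in left.values())   # some e maps to several e'
        many_left = any(n > 1 for n in right.values())   # some e' is reached from several e
        return ("*" if many_left else "1") + "-" + ("*" if many_right else "1")


# Hypothetical data: Book is matched to two different entities of O'.
alignment = Alignment(
    correspondences={
        ("c1", "Book", "Volume", "=", 0.9),
        ("c2", "Book", "Tome", "=", 0.6),
    },
    metadata={"method": "manual example"},
)
print(alignment.cardinality())
```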
Matching Process
► General Basic Matching Process
► Sequential Composition
► Parallel composition
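The two composition styles can be sketched in a few lines. This is an illustrative reading, not the diagrams from the slides: it assumes a matcher is a function that takes two lists of entity names and an optional prior similarity table and returns a new table. In sequential composition each matcher receives the previous matcher's output; in parallel composition all matchers run independently and their results are aggregated. Both matchers below are hypothetical.

```python
def exact_name(o1, o2, prior=None):
    """Hypothetical matcher: 1.0 for equal names (ignoring case), else the prior score."""
    prior = prior or {}
    return {(a, b): 1.0 if a.lower() == b.lower() else prior.get((a, b), 0.0)
            for a in o1 for b in o2}


def shared_prefix(o1, o2, prior=None):
    """Hypothetical matcher: 0.5 when one name starts with the other; never lowers a prior score."""
    prior = prior or {}
    result = {}
    for a in o1:
        for b in o2:
            x, y = a.lower(), b.lower()
            score = 0.5 if x.startswith(y) or y.startswith(x) else 0.0
            result[(a, b)] = max(score, prior.get((a, b), 0.0))
    return result


def sequential(matchers, o1, o2):
    """Feed the output of each matcher into the next one (expects at least one matcher)."""
    result = None
    for matcher in matchers:
        result = matcher(o1, o2, result)
    return result


def parallel(matchers, o1, o2, aggregate=max):
    """Run every matcher independently, then aggregate the scores pair by pair."""
    results = [matcher(o1, o2, None) for matcher in matchers]
    return {pair: aggregate(r[pair] for r in results) for pair in results[0]}


o1, o2 = ["Net", "Book"], ["network", "book"]
print(sequential([shared_prefix, exact_name], o1, o2))
print(parallel([shared_prefix, exact_name], o1, o2))
```

Here `max` stands in for the aggregation step; any other aggregation function over a sequence of scores could be passed instead.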
► Similarity filter, alignment extractor and alignment filter
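One plausible reading of these three components is sketched below. It assumes that a similarity filter prunes a similarity table by threshold, that an alignment extractor turns the table into correspondences (here with a simple greedy 1-1 strategy), and that an alignment filter drops low-confidence correspondences afterwards. The function names, the greedy strategy and the thresholds are assumptions made for illustration.

```python
def similarity_filter(sim, threshold):
    """Keep only the similarity entries that reach the threshold."""
    return {pair: s for pair, s in sim.items() if s >= threshold}


def extract_alignment(sim):
    """Greedy 1-1 extraction: repeatedly take the best-scoring pair of unused entities."""
    used_left, used_right, alignment = set(), set(), []
    for (a, b), s in sorted(sim.items(), key=lambda kv: (-kv[1], kv[0])):
        if a not in used_left and b not in used_right:
            alignment.append((a, b, "=", s))
            used_left.add(a)
            used_right.add(b)
    return alignment


def alignment_filter(alignment, min_confidence):
    """Drop correspondences whose confidence is below min_confidence."""
    return [c for c in alignment if c[3] >= min_confidence]


# Hypothetical similarity table between entities of two ontologies.
sim = {
    ("Book", "Volume"): 0.9,
    ("Book", "Tome"): 0.6,
    ("Author", "Writer"): 0.7,
    ("Author", "Volume"): 0.2,
}
candidates = similarity_filter(sim, 0.5)
alignment = extract_alignment(candidates)
print(alignment_filter(alignment, 0.8))
```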
► Aggregation Operations
  There are many different ways to aggregate matcher results, usually depending on confidence/similarity:
  ► Triangular norms (min, weighted products), useful for selecting only the best results
  ► Multidimensional distances (Euclidean distance, weighted sum), useful for taking into account all dimensions
  ► Fuzzy aggregation (min, weighted average), useful for aggregating competing algorithms and averaging their results
  ► Other specific measures (e.g., ordered weighted average)
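The aggregation families listed above can be illustrated with their textbook formulas. The sketch below assumes each matcher returns one score per entity pair and that the weights are chosen by hand; the Euclidean variant is applied to distances (for instance 1 minus a similarity), which is an assumption of this example.

```python
import math


def t_norm_min(scores):
    """Minimum triangular norm: the pair is only as good as its weakest score."""
    return min(scores)


def weighted_product(scores, weights):
    """Product of the scores, each raised to its weight."""
    return math.prod(s ** w for s, w in zip(scores, weights))


def euclidean_distance(distances):
    """Combine per-matcher distances as the length of the distance vector."""
    return math.sqrt(sum(d * d for d in distances))


def weighted_sum(scores, weights):
    """Linear combination of the scores."""
    return sum(w * s for s, w in zip(scores, weights))


def weighted_average(scores, weights):
    """Weighted sum normalized by the total weight."""
    return weighted_sum(scores, weights) / sum(weights)


# Hypothetical scores of three matchers for one entity pair, with hand-picked weights.
scores = [0.9, 0.6, 0.3]
weights = [2, 1, 1]
print(t_norm_min(scores))
print(weighted_product(scores, weights))
print(weighted_average(scores, weights))
print(euclidean_distance([1 - s for s in scores]))
```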
Application Domains
► Traditional
  ► Ontology evolution
  ► Schema integration
  ► Catalog integration
  ► Data integration
► Ontology Evolution
► Catalog Integration
► Emergent
  ► P2P information sharing
  ► Agent communication
  ► Web service composition
  ► Query answering on the web
► P2P information sharing
► Web Service Composition
► Agent communication
Classifications
► Matching Dimensions
  ► Input dimensions
    ► Underlying models (e.g., XML, OWL)
    ► Schema level vs instance level
  ► Process dimensions
    ► Approximate vs exact
    ► Interpretation of the input
  ► Output dimensions
    ► Cardinality
    ► Equivalence vs diverse relations
    ► Graded vs absolute confidence
► Three Layers
  ► Upper layer
    ► Granularity of match
    ► Interpretation of the input information
  ► Middle layer
    ► Represents classes of elementary (basic) matching techniques
  ► Lower layer
    ► Based on the kind of input which is used by elementary matching techniques
► Classification of schema-based techniques
Basic Techniques
► Element Level Techniques
  ► String based
    ► Prefix
      ► Takes as input 2 strings and checks whether the first string starts with the second
      ► e.g., net = network, but also hot = hotel
    ► Suffix
      ► Takes as input 2 strings and checks whether the first string ends with the second
      ► e.g., ID = PID, but also word = sword
    ► Edit distance
      ► Takes as input 2 strings and calculates the number of edit operations (insertion, deletion, substitution) of characters required to transform one string into the other, normalized by the length of the longer string
      ► editDistance(NKN, Nikon) = 0.4
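The three string-based techniques are small enough to sketch directly. The version below is a minimal illustration, assuming comparison is case-insensitive (which is what makes NKN and Nikon differ by two insertions) and using the standard Levenshtein dynamic programme for the edit operations; the function names are chosen for this example.

```python
def is_prefix(first, second):
    """True when the first string starts with the second (case-insensitive)."""
    return first.lower().startswith(second.lower())


def is_suffix(first, second):
    """True when the first string ends with the second (case-insensitive)."""
    return first.lower().endswith(second.lower())


def levenshtein(s, t):
    """Number of insertions, deletions and substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(s, t):
    """Edit distance divided by the length of the longer string."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


print(is_prefix("network", "net"), is_prefix("hotel", "hot"))
print(is_suffix("PID", "ID"), is_suffix("sword", "word"))
print(normalized_edit_distance("NKN", "Nikon"))
```

The `hotel`/`hot` and `sword`/`word` calls reproduce the false positives noted above: both tests succeed even though the terms are unrelated.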
  ► Language based
    ► Tokenization
      ► Parses names into tokens by recognizing punctuation and case changes
      ► Hands-Free_Kits → <hands, free, kits>
    ► Lemmatization
      ► Analyses tokens morphologically in order to find all their possible basic forms
      ► Kits → Kit
    ► Elimination
      ► Discards "empty" tokens that are articles, prepositions, conjunctions
      ► a, the, by, type of, their, from
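The three language-based steps can be chained into one normalization pipeline. The sketch below is only an approximation: the tokenizer splits on punctuation and case changes with a regular expression, the lemmatizer is a naive plural-stripping stand-in for a real morphological analyser and returns a single form, and the stop-word list treats "type of" as two separate tokens. All three choices are assumptions of this example.

```python
import re

# Stop words taken from the list above; "type of" is split into "type" and "of".
STOP_WORDS = {"a", "the", "by", "type", "of", "their", "from"}


def tokenize(name):
    """Split a name on punctuation and case changes, then lowercase the tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def lemmatize(token):
    """Naive stand-in for lemmatization: strip a plural ending."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def normalize(name):
    """Tokenize, lemmatize, then eliminate stop words."""
    lemmas = [lemmatize(t) for t in tokenize(name)]
    return [t for t in lemmas if t not in STOP_WORDS]


print(tokenize("Hands-Free_Kits"))
print(normalize("Hands-Free_Kits"))
print(normalize("typeOfTheBooks"))
```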
► Structure Level Techniques
  Ontologies are viewed as graph-like structures containing terms and their inter-relationships.
  ► Taxonomy based
    ► Bounded path matching
      ► These take 2 paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms.
    ► Super(sub)-concept rules
      ► If super-concepts are the same, the actual concepts are similar to each other.
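The super-concept rule can be sketched as a lookup over two taxonomies. The example assumes each taxonomy is given as a child-to-parent dictionary and that some super-concepts have already been matched; the function name and the sample concepts are hypothetical, and the output is only a list of candidate pairs, not a final alignment.

```python
def superconcept_candidates(parents1, parents2, aligned):
    """Return pairs (c1, c2) whose direct super-concepts are already aligned."""
    return sorted(
        (c1, c2)
        for c1, p1 in parents1.items()
        for c2, p2 in parents2.items()
        if (p1, p2) in aligned
    )


# Hypothetical taxonomies, written as child -> parent.
parents1 = {"Novel": "Book", "Essay": "Book", "Pen": "Stationery"}
parents2 = {"Fiction": "Volume", "Pencil": "Supplies"}
aligned = {("Book", "Volume")}  # super-concepts assumed to be matched already

print(superconcept_candidates(parents1, parents2, aligned))
```

The rule over-generates by design: both Novel and Essay are proposed for Fiction, so an element-level technique would normally be used to rank the candidates.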
  ► Tree based
    ► Children
      ► 2 non-leaf schema elements are structurally similar if their immediate children sets are highly similar
    ► Leaves
      ► 2 non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not.
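The difference between the children rule and the leaves rule shows up as soon as one schema nests its elements more deeply than the other. The sketch below assumes trees are stored as node-to-children dictionaries and measures set similarity with the Jaccard coefficient over exact labels; both the measure and the sample schemas are assumptions for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def leaves(tree, node):
    """Collect the leaf labels found below a node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves(tree, child)
    return found


def children_similarity(t1, n1, t2, n2):
    """Compare the immediate children sets of two non-leaf elements."""
    return jaccard(t1.get(n1, []), t2.get(n2, []))


def leaves_similarity(t1, n1, t2, n2):
    """Compare the leaf sets of two non-leaf elements."""
    return jaccard(leaves(t1, n1), leaves(t2, n2))


# Hypothetical schemas: the second one wraps the same leaves in an extra level.
t1 = {"Address": ["Street", "City", "Zip"]}
t2 = {"Location": ["Postal"], "Postal": ["Street", "City", "Zip"]}

print(children_similarity(t1, "Address", t2, "Location"))
print(leaves_similarity(t1, "Address", t2, "Location"))
```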
Summary and Challenges
► Summary
  ► Ontology matching and alignment is the process of identifying the common, or most similar, structures and semantic terms across 2 or more different ontologies/structures/schemas.
  ► Various efficient and complex algorithms for matching and alignment generation can be built from the basic techniques of the matching process.
► Challenges
  ► Developing generic and highly efficient matching and alignment generation algorithms.
Thank You