MatchIT 1.1: Data Integration with Semantic Mapping Technologies

Post on 12-Jan-2016

Page 1: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

4 North Park • Suite 106 • Hunt Valley, MD 21030 • 410-584-0009 • www.revelytix.com

Ontology Based Information Management

MatchIT 1.1: Data Integration with

Semantic Mapping Technologies

Michael Schidlowsky

Sr. Software Architect

Page 2: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Data Integration

Motivated by:

• Organizational Changes

Mergers and Acquisitions

Internal reorganizations (e.g., DHS)

• Data Mining

• Standards Conformance

• Migration Efforts

• Legacy Systems

• Decouple data sources from application code

Page 3: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Data Integration

Challenges for the integration specialist include:

• Domain-specific terms

• Unfamiliarity with source schemas

• Large size of schema set

• Semantics often not captured

• Captured semantics

Stored in ad-hoc formats

Cannot be reused to facilitate future data integration efforts

Page 4: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Data Integration: Example

Background:

Acme Inc. merges with CompuGlobalHyperMeganet.

Technical Challenge:

Need “Virtual Database” of all sales for all stores in real-time.

• Which fields represent customers?

CUSTOMERID

CUST_ID

SSN

• Which fields represent ‘Price’?

Sale_Amt

Total_Sale

• What if your database has 10,000 columns?

Page 5: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Data Integration: Example

Background:

HR needs to use employee information for new company portal.

Technical Challenge:

Data must be in XML and conform to standard HR schema.

• Find all fields related to Address?

RESIDENCE

PREV_RESIDENCE

• What if your database has 10,000 columns?

Page 6: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Ideal Matching Solution

• Finds lexical relationships

• Captures semantic information

• Finds semantic relationships

• Provides programmatic access to results (API)

• Fast

• Scalable

• Human Involvement

Page 7: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

MatchIT Philosophy

Best Matching tool already exists!

What is meant by “ID”?

- “PLEASE PRESENT ID”

- NY, NJ, ID

- SUPEREGO, EGO, ID

Page 11: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

MatchIT 1.1

- MatchIT is a semantic and lexical matching tool.

- Session Outline:

- Import and process schemas

- Perform lexical matching

- Create and manage a semantic vocabulary

- Perform semantic matching

- Demonstrate 3rd Party integration with Data Integration tool (MetaMatrix)

Page 12: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Import & Process Schemas

Revelytix Models are RDF/OWL

• Flexible model architecture

• Extensible

• Interoperable

Current Importers:

• JDBC

• XML Schema

• MetaMatrix XMI Models

Importer Demo
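
As a rough illustration of the first step (getting element names out of a relational source), the sketch below lists "table.column" names from a database. It is not the MatchIT importer: it uses Python's built-in sqlite3 module as a stand-in for a JDBC connection, it does not produce RDF/OWL, and the table names acme_sales and cghm_sales are made up around the column names from the earlier example slide.

```python
import sqlite3


def list_columns(conn):
    """Return "table.column" names for every table in a SQLite database."""
    names = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            names.append(f"{table}.{row[1]}")
    return names


# Hypothetical in-memory schemas standing in for the two merged companies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acme_sales (CUSTOMERID INTEGER, Sale_Amt REAL)")
conn.execute("CREATE TABLE cghm_sales (CUST_ID INTEGER, Total_Sale REAL)")

columns = list_columns(conn)
```

The resulting list of names is the raw material that the matching steps in the rest of the session operate on.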

Page 13: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Lexical Matching

Uses lexical distance measures to determine lexical similarity.

• Fastest matching technique

• Requires no work other than importing schemas

• Often yields interesting results

Lexical Matching Demo
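
The slides do not say which lexical distance measures MatchIT uses. As a minimal sketch of the idea, the code below assumes Levenshtein edit distance, normalizes it to a 0..1 score, and ranks candidate field names against a target. The function names are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1 (1.0 means identical); case is ignored."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_lexical(target, candidates):
    """Candidates ordered from most to least lexically similar to target."""
    return sorted(candidates,
                  key=lambda c: lexical_similarity(target, c),
                  reverse=True)
```

With the field names from the earlier example, rank_lexical("CUSTOMERID", ["SSN", "CUST_ID", "Sale_Amt"]) puts CUST_ID first. SSN scores low even though it may also identify a customer, which is the gap semantic matching is meant to close.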

Page 14: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Create Vocabulary from Schemas

A Vocabulary is

• A set of symbols

• Occurrences of those symbols in your schemas

• Binding of each symbol to one or more semantic concepts

• Created by MatchIT from schemas using tokenization algorithms.

• Reusable
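
The three parts of a Vocabulary listed above can be pictured as a small data structure. This is only a sketch of the concept, not MatchIT's actual model (which the slides describe as RDF/OWL); the class, method, and concept names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Vocabulary:
    # symbol -> schema elements (e.g. "table.column") in which it occurs
    occurrences: Dict[str, Set[str]] = field(default_factory=dict)
    # symbol -> semantic concepts the symbol is bound to
    bindings: Dict[str, Set[str]] = field(default_factory=dict)

    def add_occurrence(self, symbol: str, element: str) -> None:
        self.occurrences.setdefault(symbol, set()).add(element)

    def bind(self, symbol: str, concept: str) -> None:
        self.bindings.setdefault(symbol, set()).add(concept)

    def symbols(self) -> Set[str]:
        return set(self.occurrences) | set(self.bindings)

    def unbound_symbols(self) -> Set[str]:
        """Symbols seen in schemas that still need a human to pick a concept."""
        return {s for s in self.occurrences if not self.bindings.get(s)}


vocab = Vocabulary()
vocab.add_occurrence("id", "customers.CUST_ID")
vocab.add_occurrence("cust", "customers.CUST_ID")
vocab.bind("id", "concept:identifier")  # hypothetical concept name
```

Because bindings are stored per symbol rather than per column, a binding made once for "id" applies to every schema element where that symbol occurs, which is what makes the vocabulary reusable across integration efforts.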

Page 15: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Tokenization Algorithms

Different schemas require different tokenization techniques.

Tokenization algorithms determine how symbols are extracted from schemas:

• Capitalization

• Delimiters

• English Language

Vocabulary Demo
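
The three techniques can be sketched as follows. The regular expressions and the greedy word-list segmentation are assumptions about what "Capitalization", "Delimiters", and "English Language" mean in practice; they are not MatchIT's algorithms, and the word list is a made-up example.

```python
import re


def split_on_delimiters(name):
    """Split on underscores, hyphens, dots, and whitespace."""
    return [part for part in re.split(r"[_\-\s.]+", name) if part]


def split_on_capitalization(name):
    """Split at case boundaries, keeping acronyms and digit runs together."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)


def split_on_dictionary(token, words):
    """Greedy longest-prefix segmentation against a known word list.

    Returns [token] unchanged if the greedy pass cannot cover the whole token.
    """
    result, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in words:
                result.append(token[i:j])
                i = j
                break
        else:
            return [token]
    return result


def tokenize(name, words=frozenset()):
    """Apply delimiter, capitalization, then word-list splitting; lowercase."""
    tokens = []
    for part in split_on_delimiters(name):
        for piece in split_on_capitalization(part):
            tokens.extend(split_on_dictionary(piece.lower(), words))
    return tokens
```

An all-caps name such as CUSTOMERID has neither delimiters nor case boundaries, so tokenize("CUSTOMERID") returns a single token; only with a word list, for example tokenize("CUSTOMERID", {"customer", "id"}), does it split into two symbols. That is one reason different schemas need different techniques.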

Page 16: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Matching Techniques

MatchIT currently uses two types of matching techniques:

• Lexical Matching

Attempts to determine the similarity of two symbols based on the lexical distance between them.

• Semantic Matching

Attempts to determine the similarity of two symbols based on the ontological distance between their concepts within a semantic knowledge base.

Page 17: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Parts Supplier Schema (as seen by a person)

Page 18: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Parts Supplier Schema (as seen by a computer)

Page 19: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Semantic Matching

How semantically similar are two concepts?

[Diagram: "is a" hierarchy; each → reads "is a"]

car → motor vehicle → self-propelled vehicle → wheeled vehicle → vehicle

truck → motor vehicle

airplane → heavier-than-air craft → aircraft → craft → vehicle

Car and truck are very similar.

Car and airplane are less similar.
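
One simple way to turn this hierarchy into a number is to count the "is a" edges between two concepts through their nearest common ancestor. The sketch below assumes that measure and assumes the edges as read from the diagram; the slides do not state which knowledge-base distance MatchIT actually uses.

```python
# child -> parent ("is a") edges, as read from the diagram on this slide
IS_A = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "airplane": "heavier-than-air craft",
    "heavier-than-air craft": "aircraft",
    "aircraft": "craft",
    "craft": "vehicle",
}


def ancestors(concept):
    """Chain [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_distance(a, b):
    """Number of is-a edges between a and b via their nearest common ancestor."""
    steps_from_a = {c: i for i, c in enumerate(ancestors(a))}
    for steps_from_b, c in enumerate(ancestors(b)):
        if c in steps_from_a:
            return steps_from_a[c] + steps_from_b
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def semantic_similarity(a, b):
    """Map distance to a 0..1 score; identical concepts score 1.0."""
    return 1.0 / (1.0 + path_distance(a, b))
```

Under this measure car and truck are 2 edges apart (they share the parent "motor vehicle"), while car and airplane are 8 edges apart (they meet only at "vehicle"), which matches the slide's conclusion.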

Page 20: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

Semantic Matching

Uses knowledge base distance measures to determine semantic similarity.

• Presents ranked candidate matches

• Based on semantics captured in Vocabularies

• The only way to effectively find relationships between lexically dissimilar symbols:

GenderCode / SexCode

Provider / Supplier

Amount / Quantity

Semantic Matching Demo
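
To show how vocabulary bindings can produce ranked candidates for lexically dissimilar names, the sketch below scores two fields by the overlap (Jaccard index) of the concepts their symbols are bound to. The bindings, concept names, and scoring rule are all hypothetical; MatchIT's ranking is not described in the slides.

```python
# Hypothetical symbol -> concept bindings, as a person might set them up
BINDINGS = {
    "gender": {"concept:sex"},
    "sex": {"concept:sex"},
    "provider": {"concept:supplier"},
    "supplier": {"concept:supplier"},
    "amount": {"concept:quantity"},
    "quantity": {"concept:quantity"},
    "code": {"concept:code"},
}


def concepts_for(tokens):
    """Union of all concepts bound to the given symbols."""
    found = set()
    for token in tokens:
        found |= BINDINGS.get(token, set())
    return found


def concept_overlap(tokens_a, tokens_b):
    """Jaccard overlap of the two concept sets (0.0 if either is empty)."""
    a, b = concepts_for(tokens_a), concepts_for(tokens_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(source_tokens, candidates):
    """Return (name, score) pairs, best first, dropping zero scores.

    candidates maps a field name to its already-tokenized symbols.
    """
    scored = [(name, concept_overlap(source_tokens, tokens))
              for name, tokens in candidates.items()]
    scored = [pair for pair in scored if pair[1] > 0.0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = rank_candidates(
    ["gender", "code"],  # tokens of "GenderCode"
    {
        "SexCode": ["sex", "code"],
        "ZipCode": ["zip", "code"],
        "Supplier": ["supplier"],
    },
)
```

Here SexCode ranks above ZipCode for GenderCode even though "Gender" and "Sex" share no characters, and Supplier is dropped because it shares no concept at all.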

Page 21: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

3rd Party Integration

MatchIT Integration

• MatchIT Java API

• Stand-alone application

• Embeddable application (as Eclipse plug-ins).

• Hides unapproved matches

• Useful for various 3rd Party applications:

- Data Integration

- Data Discovery

- Ontology Mediation

- Search

- Metadata Management

- Data Cleansing

MetaMatrix Demo

Page 22: MatchIT 1.1:  Data Integration with Semantic Mapping Technologies

4 North Park • Suite 106 • Hunt Valley, MD 21030 • 410-584-0009 • www.revelytix.com

Ontology Based Information Management

Questions?

MatchIT 30-day trial available at http://www.revelytix.com

Michael Schidlowsky

[email protected]