Copyright 2009, Information Builders. Slide 1 iWay Enterprise Information Management (EIM) Data Quality and Master Data Management Vincent Deeney Solutions Architect Information Builders New York User Forum November 18, 2009


Copyright 2009, Information Builders. Slide 1

iWay Enterprise Information Management (EIM) Data Quality and Master Data Management

Vincent Deeney Solutions Architect

Information Builders

New York User Forum, November 18, 2009

Copyright 2009, Information Builders. Slide 2

Data Quality and Master Data Management

Agenda
  Business Drivers Behind Data Management
  Usage – Where To Use Data Management
  Impact Of Data Quality
  What Is Data Management?
    Data Profiling
    Data Cleansing
    Data Enrichment
    Match & Merge (De-duplication)
    Master Data Management
  Examples and Demonstration

Copyright 2009, Information Builders. Slide 3

Drivers


Copyright 2009, Information Builders. Slide 4

Business Drivers

Customer Service

Marketing Campaigns

Process Improvement

Regulatory Compliance

Fraud Detection

Copyright 2009, Information Builders. Slide 5

Data Drivers

Accuracy: Correct Information

Completeness: Thorough Information

Consistency: Uniform Information

Validity: Valid Information

Copyright 2009, Information Builders. Slide 6

Usage


Copyright 2009, Information Builders. Slide 7

EIM Usages

Analytical EIM (Batch or Real-time)

Analytical EIM focuses on improving the data quality and accuracy of BI reports.

Operational EIM (Real-time)

The goal of operational EIM is to synchronize operational systems' data with the golden record, so that quality and consistency hold across enterprise processes.

Copyright 2009, Information Builders. Slide 8

EIM Dimensions

DW/DM

System

Copyright 2009, Information Builders. Slide 9

EIM and WebFOCUS Solutions

[Architecture diagram; labels recovered from the slide]
Processes, Transactions, Documents
Supplier, Partners, Customer, Exchange
Data Warehouse, Data Mart, ODS
Applications
Portals
Enterprise Search
BI and Real-Time Dashboards
Universal Adapter Suite
Core Integration Services
Core Reporting Services
Reporting, Application, Data Management
Mainframe Data, Applications and Transactions
Applications, CRM, ERP, etc.
Databases, Data Warehouse, Data Marts
Documents, Files, Content Management
Messages, Transactions, E-Mails
SWIFT, HIPAA, EDI Formats

Copyright 2009, Information Builders. Slide 10

Impact


Copyright 2009, Information Builders. Slide 11

Impact of Data Quality: Address Data

| Category | Records | Share |
| Verified OK | 3658 | 36% |
| To be checked manually | 2727 | 27% |
| To be corrected manually | 3799 | 37% |

36% Naturally Correct
64% Manual Attention

Copyright 2009, Information Builders. Slide 12

Impact of Data Quality: Address Data

| Category | Records | Share |
| Verified OK | 3658 | 36% |
| Standardized & Verified OK | 2727 | 27% |
| Corrected automatically | 3492 | 34% |
| To be checked manually | 307 | 3% |

36% Naturally Correct + 61% Automated Cleansing
3% Manual Attention
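The percentages on the two address-data charts can be re-derived from the record counts. The sketch below does that arithmetic; the pairing of each count with its legend label follows the order in which they appear on the slides, which is an assumption.

```python
# Record counts copied from the two charts; label pairing follows slide order (assumed).
before = {
    "Verified OK": 3658,
    "To be checked manually": 2727,
    "To be corrected manually": 3799,
}
after = {
    "Verified OK": 3658,
    "Standardized & Verified OK": 2727,
    "Corrected automatically": 3492,
    "To be checked manually": 307,
}

def shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

before_pct = shares(before)
after_pct = shares(after)

# Manual attention before cleansing: everything that was not verified as-is.
manual_before = before_pct["To be checked manually"] + before_pct["To be corrected manually"]
# Automated cleansing: standardized records plus automatically corrected records.
automated_after = after_pct["Standardized & Verified OK"] + after_pct["Corrected automatically"]

print(manual_before, automated_after, after_pct["To be checked manually"])
```

Both charts describe the same 10,184 records; cleansing moves most of the manual workload into the automated categories.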

Copyright 2009, Information Builders. Slide 13

What Is Data Management?

Data Quality and Master Data Management


Copyright 2009, Information Builders. Slide 14

Data Profiling

Profiling
  Basic Analysis: Minimums, Maximums, Averages, Counts, Etc.
  Patterns
  Extremes
  Quantities
  Frequency Analysis
  Foreign Key Analysis
  Masking
  Drilldown
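A minimal sketch of what basic analysis, masking, and frequency analysis might look like for a single column. It is not the iWay profiler; the function names `mask` and `profile` are made up for illustration, and the sample values are the SIN column from the source data shown later in this deck.

```python
import re
from collections import Counter

def mask(value):
    """Replace letters with 'A' and digits with '9' to expose the value's pattern."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

def profile(values):
    """Basic profile of one column; empty strings count as missing."""
    present = [v for v in values if v]
    lengths = [len(v) for v in present]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "avg_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "patterns": Counter(mask(v) for v in present),  # frequency of each mask
    }

sins = ["000000000", "095-242-434", "SIN095242434", "095242433", "095252433", "",
        "095252433", "420347213", "420-347-213", "SIN420347213", "420-347-213"]
report = profile(sins)
for pattern, n in report["patterns"].most_common():
    print(pattern, n)
```

The pattern frequencies make the three SIN spellings in the sample (plain digits, dashed, prefixed with letters) visible before any cleansing rule is written.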


Copyright 2009, Information Builders. Slide 15

Data Cleansing

Parsing
  data parsed into components (pattern based)
Standardization
  transformation into standard format (Jim Smith -> James Smith)
  standard and nonstandard abbreviations (Str. -> Street)
  language-specific replacements
Data quality improvement
  validation against rules
  validation against reference tables
Large number of domain oriented algorithms - examples:
  Address
  Party
  Vehicle
  Name
  Identification number
  Credit Card number
  Bank account number
Extension by custom validation steps
  using complex function and rules including
    Levenshtein distance
    SoundEx
    internal (java-based) functions
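To make the standardization and Levenshtein bullets concrete, here is a small sketch. The lookup tables are hypothetical stand-ins for reference data and contain only the replacements shown on the slides; `levenshtein` is the textbook dynamic-programming formulation, not the product's internal function.

```python
# Hypothetical lookup tables; a real deployment would load these from reference data.
NICKNAMES = {"Jim": "James"}
ABBREVIATIONS = {"Str.": "Street", "Ave": "Avenue"}

def standardize(text, table):
    """Replace whole whitespace-separated tokens that appear in the lookup table."""
    return " ".join(table.get(token, token) for token in text.split())

def levenshtein(a, b):
    """Number of single-character inserts, deletes, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]

print(standardize("Jim Smith", NICKNAMES))
print(standardize("25 Linden Str.", ABBREVIATIONS))
print(levenshtein("Smiht", "Smith"))
```

A transposition such as "Smiht" counts as two edits under plain Levenshtein distance, so a validation rule built on it would need a threshold of at least 2 to treat that spelling as a near match for "Smith".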

Copyright 2009, Information Builders. Slide 16

Data Enrichment

External company register
  standard company name
  registration ID
  official address
  national bank account
  classification
Geocodes
  adding geo-codes for identified address
  allows showing map locations
  used for geomarketing or insurance risks
External address register
  adding missing zip-codes, street names, city, etc.
  validating existence against register of addresses
List of names, surnames, academic and social titles
  validating existence
  standardization (PHD -> Ph.D.)
  adding missing components
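Enrichment against external registers can be pictured as lookups that fill or verify fields. In the sketch below, `TITLES` and `ADDRESS_REGISTER` are hypothetical in-memory stand-ins for the external lists named on the slide, and the field names are assumptions made for illustration.

```python
# Hypothetical reference data standing in for external registers.
TITLES = {"PHD": "Ph.D."}
ADDRESS_REGISTER = {
    # (street, city) -> postal code; values taken from the sample data in this deck
    ("8500 Leslie Street", "Markham"): "L3T 7M8",
    ("25 Linden Street", "Toronto"): "M4X 1V5",
}

def enrich(record):
    """Return a copy with a standardized title and a postal code filled from the register."""
    enriched = dict(record)
    title = enriched.get("title")
    if title:
        # Compare titles without dots and case so "PHD", "phd", "Ph.D." all match.
        enriched["title"] = TITLES.get(title.upper().replace(".", ""), title)
    key = (enriched.get("street"), enriched.get("city"))
    enriched["address_verified"] = key in ADDRESS_REGISTER
    if not enriched.get("postal_code") and key in ADDRESS_REGISTER:
        enriched["postal_code"] = ADDRESS_REGISTER[key]
    return enriched

result = enrich({"title": "PHD", "street": "8500 Leslie Street",
                 "city": "Markham", "postal_code": ""})
print(result)
```

The original record is left unchanged and a verification flag is added, so downstream steps can tell an enriched value from one that arrived with the source data.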

Copyright 2009, Information Builders. Slide 17

Match & Merge

Unification
  identification of the set of records connected to one
    person
    address
    vehicle
    contact
    …etc.
Deduplication
  golden record creation (the best representation of the identified subject)
Identification
  new data entries: identify the subject (person, address, etc.) to which the new record is connected (matched)
Complex business rules
  using sophisticated algorithms and functions including
    Levenshtein distance
    Hamming distance
    Edit distance
    Data quality score values
    Date stamps of last modification
    Source system originating data
    etc.
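As an illustration of the identification step, the sketch below uses the Hamming distance to tolerate a single mistyped digit when linking a new entry to an existing master record. The matching rule (same birth date, SIN within one position) and the names `identify` and `masters` are assumptions made for this example, not the product's business rules.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))

def identify(new_record, masters, max_sin_distance=1):
    """Return the id of the first master within the SIN tolerance, else None."""
    for master_id, master in masters.items():
        if (new_record["birth_date"] == master["birth_date"]
                and hamming(new_record["sin"], master["sin"]) <= max_sin_distance):
            return master_id
    return None

# Hypothetical master store holding one golden record.
masters = {"M1": {"sin": "095242434", "birth_date": "1978-12-16"}}
match = identify({"sin": "095242433", "birth_date": "1978-12-16"}, masters)
no_match = identify({"sin": "095242433", "birth_date": "1978-11-16"}, masters)
print(match, no_match)
```

A production rule set would typically weigh several such signals together with data quality scores, modification dates, and source systems, as the slide lists.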

Copyright 2009, Information Builders. Slide 18

Master Data Management (MDM) Defined

MDM for customer data systems are software products that:
  Support the global identification, linking and synchronization of customer information across heterogeneous data sources
  Create and manage a central, database-based system of record
  Enable the delivery of a single view for all stakeholders

MDM architectural styles vary in:
  Instantiation of the customer master data: varying from the maintenance of a physical customer profile to a more-virtual, metadata-based indexing structure
  The latency of customer master data maintenance: varying from real-time, synchronous, reading and writing of the master data in a transactional context to batch, asynchronous harmonization of the master data across systems

An MDM program potentially encompasses the management of customer, product, asset, person or party, supplier and financial masters.

Copyright 2009, Information Builders. Slide 19

MDM Architectures

[Diagram: each style is drawn as one Master connected to four Sources]

Consolidated
  Master is Single Version of Truth
  Data Quality at Master
  Updates occur at Sources
  Updates propagated to Master

Registry
  Multiple Versions of Truth
  Data Quality is Ongoing
  Updates occur at Sources
  Keys and Metadata in Registry
  Updates propagated to other Sources (Optional)

Coexistence
  Master is Single Version of Truth
  Data Quality is Ongoing
  Updates occur at Sources or Master
  Updates propagated to other Sources

Centralized
  Master is Single Version of Truth
  Data Quality at Master
  Updates occur at Master
  Updates propagated to Sources

Copyright 2009, Information Builders. Slide 20

Examples And Demonstration


Copyright 2009, Information Builders. Slide 21

Data Quality Examples


Copyright 2009, Information Builders. Slide 22

Original data – before cleansing

Source data

| Name | G | SIN | Birth Date | Address |
| Dr. John Smith | F | 000000000 | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9 |
| Smith W. John | M | 095-242-434 | 16.12.1978 | Surrey 14618 110 Ave |
| John William Smith | | SIN095242434 | 781612 | 25 Linden Str Toronto M4X 1V5 |
| Dr. J.W. Smith | M | 095242433 | 11/16/78 | |
| John Smith | | 095252433 | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto |
| Smith John | | | 16.11.1978 | 8500 Leslie street Marham |
| John Smiht | | 095252433 | 16.11.1978 | |
| Jane Watson | | 420347213 | 1982 | 600-8500 Leslie str. Toronto L3T 7M8 |
| Watson Jane | F | 420-347-213 | 5.1.1982 | 8500 Leslei street Toronto L3T 7M8 |
| Jane Smith | F | SIN420347213 | 1982-01-05 | |
| J. Smith | | 420-347-213 | | |

Copyright 2009, Information Builders. Slide 23

Titles Parsing

| Name | G | SIN | Birth Date | Titles | Clearing Codes |
| Dr. John Smith | F | 000000000 | 12/16/1978 | Dr. | Academic_Title |
| Smith W. John | M | 095-242-434 | 16.12.1978 | | |
| John William Smith | | SIN095242434 | 781612 | | |
| Dr. J.W. Smith | M | 095242433 | 11/16/78 | Dr. | Academic_Title |
| John Smith | | 095252433 | 16.11.1978 | | |
| Smith John | | | 16.11.1978 | | |
| John Smiht | | 095252433 | 16.11.1978 | | |
| Jane Watson | | 420347213 | 1982 | | |
| Watson Jane | F | 420-347-213 | 5.1.1982 | | |
| Jane Smith | F | SIN420347213 | 1982-01-05 | | |
| J. Smith | | 420-347-213 | | | |
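The title-parsing step above could be sketched as a dictionary lookup over name tokens. The `TITLES` table is hypothetical; the slide only shows "Dr." being recognized and tagged `Academic_Title`.

```python
# Hypothetical title dictionary; the slide only shows "Dr." being recognized.
TITLES = {"Dr.": "Academic_Title"}

def parse_titles(name):
    """Split known titles off a name; return (remaining name, titles, clearing codes)."""
    titles, codes, rest = [], [], []
    for token in name.split():
        if token in TITLES:
            titles.append(token)
            codes.append(TITLES[token])
        else:
            rest.append(token)
    return " ".join(rest), titles, codes

print(parse_titles("Dr. John Smith"))
print(parse_titles("Smith W. John"))
```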

Copyright 2009, Information Builders. Slide 24

Name Parsing

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | F | 000000000 | 12/16/1978 | Academic_Title |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | |
| John | William | Smith | | SIN095242434 | 781612 | |
| J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title |
| John | | Smith | | 095252433 | 16.11.1978 | |
| John | | Smith | | | 16.11.1978 | |
| John | | Smiht | | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | | 420347213 | 1982 | |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | |
| J. | | Smith | | 420-347-213 | | |
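One way to reproduce the name-parsing results above is to look each token up in a surname list and treat initials as middle names. The heuristics and the `LAST_NAMES` list below are assumptions chosen to fit the sample rows; the sketch expects a non-empty name with titles already removed.

```python
import re

# Hypothetical surname list; a real deployment would use a reference register.
LAST_NAMES = {"Smith", "Watson"}

def parse_name(name):
    """Split a non-empty, title-free name into (first, middle, last, clearing codes)."""
    # Separate fused initials such as "J.W." into "J." and "W."
    tokens = re.sub(r"\.(?=\S)", ". ", name).split()
    codes = []
    known = [t for t in tokens if t in LAST_NAMES]
    if known:
        last = known[0]
    else:
        last = tokens[-1]  # fall back to the final token and flag the record
        codes.append("Last_name_not_found")
    tokens.remove(last)
    # Prefer a spelled-out token as the first name; initials become the middle part.
    first = next((t for t in tokens if not t.endswith(".")), tokens[0] if tokens else "")
    if first:
        tokens.remove(first)
    return first, " ".join(tokens), last, codes

for raw in ["Smith W. John", "J.W. Smith", "John Smiht"]:
    print(raw, "->", parse_name(raw))
```

Looking the surname up rather than relying on position is what lets "Smith W. John" and "Watson Jane" come out in the same order as "John Smith".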

Copyright 2009, Information Builders. Slide 25

Update gender (based on first name)

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | 000000000 | 12/16/1978 | ...le, Gender_changed |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | |
| John | William | Smith | M | SIN095242434 | 781612 | Gender_updated |
| J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | Gender_updated |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | |
| J. | | Smith | | 420-347-213 | | |
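The gender update can be read as a lookup on the first name with two outcomes: a missing value is filled (`Gender_updated`) and a conflicting value is overwritten (`Gender_changed`). The sketch below assumes a tiny hypothetical first-name dictionary covering only the sample names.

```python
# Hypothetical first-name dictionary covering only the names in the sample.
GENDER_BY_FIRST_NAME = {"John": "M", "Jane": "F"}

def update_gender(first, gender):
    """Return (gender, clearing code or None) after checking gender against the first name."""
    expected = GENDER_BY_FIRST_NAME.get(first)
    if expected is None or expected == gender:
        return gender, None                 # unknown name, or already consistent
    if not gender:
        return expected, "Gender_updated"   # fill a missing value
    return expected, "Gender_changed"       # overwrite a conflicting value

print(update_gender("John", "F"))
print(update_gender("Jane", ""))
print(update_gender("J.", ""))
```

An initial such as "J." is not in the dictionary, so the gender of the last sample row stays empty, as on the slide.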

Copyright 2009, Information Builders. Slide 26

Validate Social Security Number

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | 000000000 | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | SIN095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | 095242433 | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420-347-213 | | SIN_removed_dashes |

Copyright 2009, Information Builders. Slide 27

Validate Social Security Number (after)

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |
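The SIN results above are consistent with a pipeline that strips dashes and stray characters, rejects blacklisted placeholders, and then applies a Luhn checksum. The sketch below follows that reading; the blacklist contents, the order of the checks, and the use of the Luhn check are assumptions inferred from the sample rows, not a description of the product's rule.

```python
import re

BLACKLIST = {"000000000"}  # placeholder values treated as fake (assumed list)

def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right, sum digits, test mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        n = int(char)
        if position % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0

def clean_sin(raw):
    """Return (cleaned SIN or "", list of clearing codes)."""
    if not raw:
        return "", ["SIN_missing"]
    codes = []
    value = raw
    if "-" in value:
        value = value.replace("-", "")
        codes.append("SIN_removed_dashes")
    if re.search(r"\D", value):
        value = re.sub(r"\D", "", value)  # drop prefixes such as "SIN"
        codes.append("SIN_extra_characters")
    if value in BLACKLIST:
        return "", codes + ["SIN_blacklist"]
    if len(value) != 9 or not luhn_ok(value):
        return "", codes + ["SIN_invalid"]
    return value, codes

for raw in ["000000000", "095-242-434", "SIN095242434", "095242433", ""]:
    print(repr(raw), "->", clean_sin(raw))
```

Rejected values are blanked rather than kept, which matches the empty SIN cells in the "after" table.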

Copyright 2009, Information Builders. Slide 28

Validate Birth Date

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | .._updated, BD_invalid |
| Jane | | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |

Copyright 2009, Information Builders. Slide 29

Validate Birth Date (after)

| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 1978-12-16 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 1978-12-16 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 1978-12-16 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 1978-11-16 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 1978-11-16 | Gender_updated |
| John | | Smith | M | | 1978-11-16 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 1978-11-16 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | | .._updated, BD_invalid |
| Jane | | Watson | F | 420347213 | 1982-01-05 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |
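Birth-date validation in the tables above amounts to recognizing each input format and rewriting it as an ISO date, with unparseable values blanked and flagged `BD_invalid`. A sketch using Python's `datetime.strptime`; the list of formats and their priority are assumptions read off the sample values, including the compact `YYDDMM` form of "781612".

```python
import re
from datetime import datetime

# Formats seen in the sample data, tried in order (an assumed priority).
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%d.%m.%Y"]

def clean_birth_date(raw):
    """Return (ISO date or "", clearing code or None)."""
    if not raw:
        return "", None  # a missing date is left empty without a code
    if re.fullmatch(r"\d{6}", raw):
        # Compact YYDDMM, as in "781612" for 16 December 1978
        candidates = ["%y%d%m"]
    else:
        candidates = FORMATS
    for fmt in candidates:
        try:
            return datetime.strptime(raw, fmt).date().isoformat(), None
        except ValueError:
            continue
    return "", "BD_invalid"

for raw in ["12/16/1978", "16.12.1978", "781612", "11/16/78", "5.1.1982", "1982"]:
    print(raw, "->", clean_birth_date(raw))
```

The six-digit form is matched explicitly before parsing because a lenient parser could otherwise read a bare year such as "1982" as a short date instead of rejecting it.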

Copyright 2009, Information Builders. Slide 30

Prepared data (after cleansing)

Cleansed data

| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | M | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |

Copyright 2009, Information Builders. Slide 31

Master Data Management Examples


Copyright 2009, Information Builders. Slide 32

Prepared data (after cleansing)

Cleansed data

| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |

Copyright 2009, Information Builders. Slide 33

Match

Cleansed data

| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |
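Matching groups the cleansed records that refer to the same subject. The sketch below links records that share a SIN, or share first name, last name, and birth date, and then collects the connected groups with a union-find structure. These two match rules are assumptions chosen for illustration; the slide does not state the rules used in the demonstration.

```python
from collections import defaultdict

def cluster(records):
    """Group record indexes that share a SIN, or first name + last name + birth date."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    seen = {}  # match key -> index of the first record that carried it
    for i, rec in enumerate(records):
        keys = []
        if rec["sin"]:
            keys.append(("sin", rec["sin"]))
        if rec["first"] and rec["last"] and rec["birth"]:
            keys.append(("name+birth", rec["first"], rec["last"], rec["birth"]))
        for key in keys:
            if key in seen:
                parent[find(i)] = find(seen[key])  # link to the earlier record
            else:
                seen[key] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())

# First, last, SIN, and birth date of the eleven cleansed records, in table order.
rows = [
    ("John", "Smith", "", "1978-12-16"),
    ("John", "Smith", "095242434", "1978-12-16"),
    ("John", "Smith", "095242434", ""),
    ("", "Smith", "", "1978-11-16"),
    ("John", "Smith", "095252433", "1978-11-16"),
    ("John", "Smith", "", "1978-11-16"),
    ("John", "Smiht", "095252433", "1978-11-16"),
    ("Jane", "Watson", "420347213", ""),
    ("Jane", "Watson", "420347213", "1982-01-01"),
    ("Jane", "Smith", "420347213", "1982-01-05"),
    ("J.", "Smith", "420347213", ""),
]
records = [dict(zip(("first", "last", "sin", "birth"), r)) for r in rows]
clusters = cluster(records)
print(clusters)
```

Under these assumed rules the first three records form one group, which is the group the Merge slide turns into a golden record. The record with no first name and no SIN matches nothing and stays on its own.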

Copyright 2009, Information Builders. Slide 34

Merge

Cleansed data

| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |

Golden record

| First | Last | G | SIN | Birth Date |
| John | Smith | M | 095242434 | 1978-12-16 |

Address candidates for the golden record:
  The newest permanent address: M4X 1V5;ON;Toronto;25 Linden Street
  The most frequent address: V3R 2A9;BC;Surrey;14618 110 Avenue
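Merging applies a survivorship rule to each field of a matched group. The sketch below implements only the "most frequent value" rule, with ties going to the value seen first; the "newest permanent address" rule would need modification dates, which the sample does not include. The function names are made up for illustration.

```python
from collections import Counter

def most_frequent(values):
    """Most common non-empty value; ties go to the value seen first."""
    counts = Counter(v for v in values if v)
    return counts.most_common(1)[0][0] if counts else ""

def golden_record(cluster):
    """Build one record by taking the most frequent non-empty value of every field."""
    fields = cluster[0].keys()
    return {f: most_frequent(rec[f] for rec in cluster) for f in fields}

fields = ("first", "last", "gender", "sin", "birth", "address")
cluster = [dict(zip(fields, row)) for row in [
    ("John", "Smith", "M", "", "1978-12-16", "V3R 2A9;BC;Surrey;14618 110 Avenue"),
    ("John", "Smith", "M", "095242434", "1978-12-16", "V3R 2A9;BC;Surrey;14618 110 Avenue"),
    ("John", "Smith", "M", "095242434", "", "M4X 1V5;ON;Toronto;25 Linden Street"),
]]
golden = golden_record(cluster)
print(golden)
```

With this rule the Surrey address survives because it appears twice; a recency-based rule would pick the Toronto address instead, which is the trade-off the slide points out.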

Copyright 2009, Information Builders. Slide 35

Demonstration


Copyright 2009, Information Builders. Slide 36

Thank-You
