TRANSCRIPT
Taming a Complex Data
Landscape
Carl Gerber
Principal Architect &
Independent Consultant
Moving Towards MDM & SOA
Southwest Ohio Data Management
Association
February 27, 2009
[email protected] 614.783.1442 2
• MDM as Pretext to SOA
• Business Context & Motivation
• Data Landscape
• Approach & Technology
• Data Quality & Stewardship
MDM as a Pretext to SOA
• SOA without maturity is a Silver Bullet
• Services publish & consume Data
• Must Get Data House In Order to publish data services
• Therefore: Comprehensive Master Data Management Builds a SOA Foundation
Business Context
• Support Revenue initiatives
• Enhance business decision making
• Deliver operational efficiencies and enhancements
Bad times are the best times to prepare for good times
Business Context
Build Transparency & Trust Across Core Data Objects in the Landscape
[Diagram: Value Chain: Purchasing, Manufacturing, E-Commerce, Marketing, Sales, Financials, Analytics, Product Development. Enterprise Systems: ERP, Custom Applications, 3rd Party Hosted Apps, POS Application, Data Warehouse, Spreadsheets.]
Motivation
Motivation – Key Questions
• Where is the business critical information?
• How can I relate the business critical information together?
• Where are the inconsistencies that lead us to lower efficiencies?
Motivation – Architecture
Copyright MIT Sloan Center for Information Systems Research. Used with permission.
Motivation – Benefits
Cost Reduction
• Rationalization of Redundant data
• Reduced use of Storage
• Reduction in Effort for Data Maintenance
• Reduce Duplicate purchase of 3rd party data
Improved Efficiency
• Reduction of Data Inconsistencies (in Silos)
• Multiple checks for Data Errors
• Single “trusted” source of
– Customer Data
– Product Data
Improved Customer Service
• Reduce Exposure of incorrect data
• Reduce Customer complaints / penalties
• Systematic improvement of Data Quality
Motivation – Benefits
Improved Reporting
• Reduce Data Inconsistencies in Reports
• Provide Enterprise Level Reporting
• Better understanding of Risks and Exposure
Productivity
• Gain access to “Cross Silo” View of Data
• Promote Knowledge Sharing
• Establish mechanism for Data Ownership
Compliance
• Improve Regulatory Reporting
• Ensure Traceability and Auditability of Data
• Assist in SOX Compliance
Data Landscape
Data Landscape – Typical Challenges
Complex Data Landscape
• Data is Understood Locally
• Metadata is Not Documented
Point-to-Point Data Flows
• Custom Interfaces
• Transformations Hard-coded
Redundant Data
• Difficult to Synchronize
• Data Ownership is Unknown
• System of Record is Uncertain
Organic Data Environment
• Un-planned Growth
• Constantly Changing
Data Landscape – As Is
“In large established organizations, the IT infrastructure can sometimes appear to have been constructed in a series of weekend handyman jobs. It does the job it was designed to do but is apt to create problems whenever it is applied to another purpose.”
Competing on Analytics, Thomas Davenport, Jeanne Harris.
Approach
• Inventory Important Data Objects
• Document Current Business Data Owner Roles
• Apply Consistent Data Quality Checks
• Implement an Enterprise Data Model
• De-couple Systems via Hubs
• Leverage Existing Tools
Approach - Technique
• Engage Subject Matter Experts
– Inventory & Data Quality Requirements
– Source-to-Target Data Mappings
– Document Semantic Meaning
• Engage Data Architects
– Model Data Hubs
– Publish Semantic Meaning
– Establish ETL Patterns
  • Error Handling
  • Data Quality Checks
• Create Enterprise Data Assets
– Re-used Across Systems and Projects
Approach - Technologies
• Data Modeling CASE Tool
• Data Profiling & Cleansing Tools
• Universal Metadata Tool
– MS Excel
• Metadata Repository
– Intranet
• Extract Processes
– ETL Tool Direct Connectors
– Custom Code & Flat Files
• Transform & Load
– RDBMS-based Data Hubs
– ETL Tool
Used with permission
Data Landscape – To Be
[Diagram: Data Hubs (Item & Merchandise, Inventory, Sales, Location, Promotions, Others) shown with the Merchandise, Inventory, Point of Sale, Stores, Financials and other systems, plus Business Intelligence and the Data Warehouse.]
Data Integration Hubs
• Built Using Existing Tools
• New Mind Set
• Leverages Your Data Assets
Data Hub Anatomy
Master Transaction
Enterprise Data Model ● ●
Error Handling
Insert & Notify ● ●
Write to Error Table & Notify ●
Data Cleanse & Notify ●
Threshold Reached - Stop ● ●
History
Week of Daily Snapshots ● ●
Cyclic (Deltas) ●
Full (Snapshot) ● ●
Update/Destructive Overwrites (Type 1) ●
Insert Only – No Update ●
Data Hub Anatomy
Master Transaction
Process Control Dimensions
Run Dates, Batch & Audit IDs ● ●
Error Identifiers ● ●
Source System ● ●
Subject Area & Data Stewards ● ●
Transaction, Processed, Post Dates ●
Transfer Architecture
Publish & Subscribe ●
Push ● ●
Structure
Primary Key Surrogate Key + Date
Surrogate Key
Alternate Key Business Keys Biz Key+Trans_ID+ Date
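The anatomy above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the in-memory hub and every name in it (`HubRow`, `load_batch`, `error_threshold`, the `MISSING_KEY` error identifier) are hypothetical. It shows a surrogate primary key alongside the business key, process-control columns (source system, batch ID, run date), rejects written to an error table, and a load that stops when an error threshold is reached.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count


@dataclass
class HubRow:
    surrogate_key: int   # hub-assigned primary key
    business_key: str    # alternate key carried over from the source system
    source_system: str   # process control: where the row came from
    batch_id: int        # process control: which load produced the row
    run_date: date       # process control: when the load ran
    attributes: dict


class ThresholdReached(Exception):
    """Raised when too many rows in one batch fail validation."""


def load_batch(records, source_system, batch_id, run_date, error_threshold=3):
    """Load records into a hub list; rejected records go to an error table.

    Hypothetical rule: a record is rejected when its business key is missing.
    The load stops once the number of rejects reaches error_threshold.
    """
    next_key = count(1)
    hub, error_table = [], []
    for rec in records:
        key = rec.get("business_key")
        if not key:
            error_table.append(
                {"batch_id": batch_id, "error_id": "MISSING_KEY", "record": rec}
            )
            if len(error_table) >= error_threshold:
                raise ThresholdReached(
                    f"batch {batch_id}: {len(error_table)} errors"
                )
            continue
        attrs = {k: v for k, v in rec.items() if k != "business_key"}
        hub.append(
            HubRow(next(next_key), key, source_system, batch_id, run_date, attrs)
        )
    return hub, error_table
```

In a real hub the error table and the notification step would live in the RDBMS and the ETL tool; the sketch only shows the control flow.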
Methodology
Methodology
1. Understand what exists
• Create an inventory
2. Organize the information
• Classify the data into relevant categories
3. Understand what it means
• Get the semantic metadata
4. Determine the distribution of the shared data
• Understand the relationships between the data categories
Methodology
5. Map the information
• Attribute level mapping
• Understand the data flows
• Understand the hierarchies
6. Publish the understanding
• Provide enterprise level knowledge-base
7. Create a Data Quality Framework
• Leverage data as an asset
• Resolve inconsistency & inaccuracy in the data environment
1. Understand what exists
• Inventory Databases & Data Stores
• Catalog Core Business Data Objects
Criteria for Determining Core Business Objects
QUESTION: What data objects are important to the business?
Technical Criteria
1. Object must have an entity associated with it (unique ID).
2. Object must be present in multiple databases.
3. Object must be associated with multiple attributes.
Business Criteria
1. Knowledgeable individuals in multiple areas must acknowledge importance.
2. Multiple individuals must agree that objects are “core” to business.
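The three technical criteria can be applied mechanically once an inventory exists. The sketch below is a hedged illustration: the inventory layout (one tuple per object per database) and the function name `core_object_candidates` are assumptions, not something the deck prescribes. It only produces candidates; the business criteria still need people to confirm them.

```python
from collections import defaultdict


def core_object_candidates(inventory, min_databases=2, min_attributes=2):
    """Return object names that pass the three technical criteria.

    inventory: iterable of (database, object_name, id_column, attributes),
    a hypothetical layout for the data-store inventory.
    """
    databases = defaultdict(set)
    attributes = defaultdict(set)
    has_id = defaultdict(bool)
    for database, obj, id_column, attrs in inventory:
        databases[obj].add(database)
        attributes[obj].update(attrs)
        has_id[obj] = has_id[obj] or bool(id_column)
    return sorted(
        obj
        for obj in databases
        if has_id[obj]                                # 1. has a unique ID
        and len(databases[obj]) >= min_databases      # 2. in multiple databases
        and len(attributes[obj]) >= min_attributes    # 3. multiple attributes
    )
```

The output is a shortlist to take to the subject matter experts, who decide which candidates are truly "core".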
2. Organize the information
Shared Data Object Classification
• Product: Product (Item), Catalog, Price, Product Grp, Dimensions
• Customer: Customer, Account, Type
• Location: City, State, Province, Country, District, Region
• Market: Market, Segment, Promotion, Channel
• Financial: Chart Accnts, Accnts Payable, Currency
• General: Address, Language, Time, Calendar Yr, Fiscal Year, Quarter, Month, Week
• Vendor: Vendor, Type, Distributor
• Assets: Fixed, Digital, Infrastructure
• Transaction Objects: Sales, Invoices, Shipments, Purchase Orders
• Organization: Company, Division, Group, Employee, HR Org Unit
3. Understand what it means
Source: Global IDs, Used with Permission
4. Determine the distribution of the shared data
Criteria for Understanding Commonalities Across Databases
QUESTION: How are data objects distributed across databases?
Technical Criteria
1. Object ID must be present in multiple databases.
2. Metadata should show common attributes.
3. Attribute data should show strong overlaps.
Business Criteria
1. Current Data Stewards should confirm the object distribution.
2. Current Data Stewards should confirm differences in data structures.
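Technical criteria 2 and 3 can be approximated with simple set arithmetic. The sketch below is one possible measure, not the one used in the presentation: it assumes column names and column values have already been extracted from each database, and it uses Jaccard similarity (shared distinct values divided by all distinct values) as a stand-in for "strong overlaps". The function names are hypothetical.

```python
def common_attributes(columns_a, columns_b):
    """Attribute names that appear in both databases' metadata."""
    return sorted(set(columns_a) & set(columns_b))


def value_overlap(values_a, values_b):
    """Jaccard overlap of two columns' distinct values, from 0.0 to 1.0."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A high overlap between, say, customer IDs in two systems suggests the same object is stored in both; the Data Stewards then confirm or reject that reading.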
Shared Data Object Distribution
5. Map the information
Source: Global IDs, Used with Permission
6. Publish the understanding
• Remember When You Couldn’t Find the Metadata, Glossary, Data Dictionary?
– Create a Table in Core Databases for Definitions
• A Page on the Intranet
• Data Portal
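A definitions table of the kind suggested above can be very small. The following is a sketch using Python's built-in `sqlite3` module; the table name `data_definition`, its columns, and the helper functions are assumptions made for illustration, and a real deployment would use the organization's own RDBMS.

```python
import sqlite3


def create_glossary(conn):
    """Create the (hypothetical) definitions table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS data_definition (
            object_name    TEXT NOT NULL,
            attribute_name TEXT NOT NULL,
            definition     TEXT NOT NULL,
            data_steward   TEXT,
            PRIMARY KEY (object_name, attribute_name)
        )
        """
    )


def define(conn, object_name, attribute_name, definition, data_steward=None):
    """Insert a definition, replacing any earlier one for the same attribute."""
    conn.execute(
        "INSERT OR REPLACE INTO data_definition VALUES (?, ?, ?, ?)",
        (object_name, attribute_name, definition, data_steward),
    )


def lookup(conn, object_name, attribute_name):
    """Return the published definition, or None if nothing is published."""
    row = conn.execute(
        "SELECT definition FROM data_definition "
        "WHERE object_name = ? AND attribute_name = ?",
        (object_name, attribute_name),
    ).fetchone()
    return row[0] if row else None
```

The same table can feed an intranet page or data portal, so the definition is written once and read everywhere.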
7. Create a Data Quality Framework
• Ownership
• System of Record
• Data Validation Rules
– Completeness
– Timeliness
– Accuracy
• Stewardship
– Roles & Responsibilities
– Process
Determining System of Record, Ownership, Stewardship
How should System of Record be determined?
1. Data Origination
2. Data Linkages (Inbound and Outbound)
3. Trust Level
Who should be the Data Owner?
1. Stakeholder from Business side
2. Stakeholder with most interest in data accuracy
3. Stakeholder with most to lose, with incorrect data
ACCOUNTABILITY: Data Accuracy from Business Perspective
Who should be the Data Steward?
1. Stakeholder from Technology side
2. Stakeholder with most knowledge of data quality
3. Stakeholder with ability to measure, monitor and fix inconsistencies.
ACCOUNTABILITY: Day to Day Monitoring and Error Correction
Criteria for Understanding Data Inconsistencies
QUESTION: How are data inconsistencies identified?
Technical Criteria
1. Data Profiling Analysis
• Valid Values
• Valid Formats
• Valid Patterns
• Valid Lengths
2. Data Quality Analysis. Create Data Quality Metrics for
• Data Accuracy
• Data Completeness
• Data Timeliness
• Data Conformity
Business Criteria
1. Current Data Stewards should confirm inconsistencies found in technical analysis.
2. Data Stewards should provide mechanisms for fixing inconsistency.
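The profiling checks listed above (valid values, patterns, lengths) and a completeness metric can be sketched in a few lines. This is an illustrative example only: the function names, the rule parameters, and the choice to treat `None` and the empty string as missing are assumptions, and a commercial profiling tool would do far more.

```python
import re


def profile_column(values, valid_values=None, pattern=None, max_length=None):
    """Count rule violations in one column.

    None and "" are treated as missing and skipped by the other checks.
    """
    compiled = re.compile(pattern) if pattern else None
    report = {
        "total": 0,
        "missing": 0,
        "invalid_value": 0,
        "invalid_pattern": 0,
        "invalid_length": 0,
    }
    for value in values:
        report["total"] += 1
        if value is None or value == "":
            report["missing"] += 1
            continue
        text = str(value)
        if valid_values is not None and text not in valid_values:
            report["invalid_value"] += 1
        if compiled is not None and not compiled.fullmatch(text):
            report["invalid_pattern"] += 1
        if max_length is not None and len(text) > max_length:
            report["invalid_length"] += 1
    return report


def completeness(report):
    """Share of rows that are populated, from 0.0 to 1.0."""
    if report["total"] == 0:
        return 0.0
    return 1 - report["missing"] / report["total"]
```

Counts like these give the Data Stewards something concrete to confirm, and tracking them over time turns them into the data quality metrics named on the slide.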
Data Governance – Goal
• Business teams are doing marketing, manufacturing, sales…
• Data governance is implicitly practiced across the business teams today
Goal: Formalize Data Governance
Data Governance – Style
• Take cue from corporate culture
• Top-down with a burning imperative
• Bottom-up rally the troops, lead by example
Data Stewardship Process
[Diagram: Plan, Do, Check, Act cycle. Roles shown: Data Steward, Information Quality Leader, Business Data Owner.]
• Fix Inconsistencies.
• Implements Business Data Rules.
• Makes process & system changes, e.g. screen edits.
• Performs data modeling and facilitates data definition.
• Establishes Data Quality Culture.
• Creates Business Data Definitions.
• Empowers Subject Matter Experts & Data Stewards to correct data defects.
• Establishes Data Quality standards, tolerances & metrics.
• Performs Root Cause Analysis.
• Gains approval for improvement plans.
• Raises awareness.
• Facilitates cross-functional teams.
• Tracks & Reports Quality Metrics.
• Facilitates Root Cause Analysis.
• Measures effectiveness of improvements.
Adapted from the PDCA Cycle, originally conceived by Walter Shewhart in the 1930s and later adopted by W. Edwards Deming.
The model provides a framework for the improvement of a process or system.
Closing Thoughts
Closing Thoughts – Change Isn’t Easy
Closing Thoughts
• Obtain a License to Practice
– Top-Down Program Charter
• Gain Buy-In from Your Peers
• Stand on the Shoulders of Others
• Deliver Value Every 90 Days
• Good Places to Start
– Customer-Facing Data
– Link to an Initiative
– Service Oriented Architecture Pre-work