data-ed webinar: data quality success stories
TRANSCRIPT
-
Dr. Peter Aiken, Founder, [email protected]
Karen Akens, Data Consultant, [email protected]
Data Quality Success Stories
Dataversity Webinar 7-12-2016
-
Copyright 2016 by Data Blueprint
Peter Intro Slide
-
Karen Akens, CDMP
Data management and solution development experience for numerous government and commercial clients
Connector between Business & IT based on practical experience in both arenas
Focus on Data Quality, Data Governance & Stewardship, and Business Intelligence
Speaker at EDW, DGIQ, various DAMA chapters
Board member of DAMA-Central Virginia.
-
High Quality Data is Critical:
Information transparency
Analytics
Business Intelligence
Increasing efficiencies
Decreasing costs
Driving holistic decision-making across the organization
-
Getting Started with Data Quality
Our approach begins with discovering:
The data that is most impactful to your business needs
Your organizational capabilities to manage data as an asset (foundational practices)
The state of your technical environment (technical practices)
and then laying out the path forward in a roadmap:
That is achievable and matches your organization's abilities to deliver
That builds momentum with specific, short-term win projects
That outlines a long-term vision and implementation milestones
-
Client's Data Landscape
Challenge:
Growth through acquisition
No data accountability
Fractured Technology Landscape
Need to Align with Global Education Strategy
Business Impact:
No Comprehensive BI Capability
Lack of Unified Product and Portfolio Management
Poor Data Quality & Unreliable Reporting
Increasing Costs Due to Poor Data Mgmt.
Opportunity:
Centralized Data Governance Program
Formalize Data Stewardship
Become Proactive vs. Reactive
Increase Transparency and Decrease Cost
-
Case Study - Supplier Master
Business Value Achievements:
1) Consolidated the number of suppliers, getting better terms and conditions
2) Reduced suppliers with immediate payment terms, increasing cash flow
3) Removed duplicate suppliers, increasing ability to track spending and reducing risk of duplicate payments
4) Increased email addresses on file (order email/remittance email), giving faster communication with vendors and reducing the cost of remittances via post
5) Improved quality of contact information, reducing risk of missed payments and risk to supplier relationships
[Chart: values on a 0 to 1,600 scale, 20-Oct-14 compared with 2-Dec-14]
Data Governance Board 12/14/2014
-
Challenges from a Lack of Data Quality
It's a ticking time bomb waiting to explode
No Account Creation Controls
No Standard Product Hierarchy
No Universal Product Model
Visible Inactive Records
Labor Intensive Manual Data Clean-up
Inconsistent Use of Business Terms
Duplicate Accounts
Missing Remittance Info
Inaccurate Reports
Missing Data
Who owns the data?
Who fixes?
-
Selling the Message:
Share 60-second Elevator Speech
Use Current Inconsistencies that Impact Reporting
Obtain Senior Level Sponsorship
Quantify Value of Data
Perform Data Quality Pilot
Demonstrate Stewardship Success Story
And ask for help before you think you need it.
-
If you want to avoid situations like this
One US system had 11,500 active cost centers, increasing the risk of mis-posting and the mapping effort
One S. African system was missing electronic remittance info in 88% of cases; payments were sent by post, increasing cost & lag time
No Standard Product Hierarchy: can't determine product profitability
90% of suppliers in one US system were on immediate payment terms, impacting cash flow
Lost revenue of $2 million annually from not utilizing rights previously granted
No processes for deactivating vendors: 222,000 obsolete vendors were removed from one of two US systems, and many systems still contain ROT
Supply Chain spends 320 hours every year end tracking missing 1099 data to avoid tax penalties
Data issues are not fixed at the source, so it is a never-ending battle: financial resources spend 35% of their time reconciling data
Data that Matters
-
you need to have this
Enterprise Data Strategy
Data Governance & Stewardship Framework which articulates roles of data owners and data stewards
Senior level sponsorship & organizational culture that treats data as a strategic asset
Data Governance Board with a mandate to drive data quality enterprise wide
Master Data Management solution
Data quality principles that are embedded in process & system design across the enterprise
Standard Business Glossary with an authoring and publishing process
but not all at once!
-
Where to Start When Developing a Data Quality Framework
No Accountability or Responsibility for Data:
Many resources create, review or manage data
No formal data stewardship roles and responsibilities
Difficult to determine who is accountable & responsible for data
Establish Data Ownership & Increase Data Accountability:
Define clear data ownership & stewardship roles, accountability & responsibility of data
Define a vetting & onboarding process ensuring resource capacity
Establish decision rights
Maintain a master list of all Data Stewards and their related data domains
Inconsistent Master Data:
Fire drill to fix data issues in isolation
Little standardization across Lines of Business and Geographies
Difficult to report on a global level at needed level of detail
No formal master data change control process
Consistent Master Data Management:
Develop master data standards
Establish change control
Define consistent data models
Ongoing governance and stewardship of master data
Inconsistent Data Definitions, Poor Data Quality:
Business Terms definitions differ by group
Data monitored in silos
Fragmented use of a variety of tools
Focus on find and fix instead of root cause analysis
No standard reporting/tracking metrics
Term Authoring & Publishing, Increase Data Quality:
Establish and implement process to define business accredited terms & publish for consumption enterprise wide
Stewards define business rules used to structure & profile data
Develop and implement DQ standards
Ongoing scorecarding & DQ metric reporting
-
Every work stream has a part to play if the organization is to move from a reactive to a proactive approach to improving data quality
Principle / Implications
1. Capture data right, first time:
Wherever possible all data is captured once, at source, and validated on input
2. Engineer-in positive impacts on data quality:
Wherever possible data quality improvement is automated, proactive and on-going
Systems, processes and products are inherently designed to improve data quality, e.g.
The possibility of errors when data is entered or changed is engineered out
Processes are designed to enter and maintain accurate data
Data entry is quick and intuitive for users
3. Integrate data quality into business processes:
Data quality standards and rules are defined and integrated into day-to-day operations, e.g. instances of non-compliance are fixed at root cause
There is clear accountability throughout the organization for promoting & sustaining good quality data
-
Repeatable Process
Discovery - Identify potential data quality issues.
Profile Data - Review sample data and existing data creation and usage process to provide context for business rule discussion with Data Owners and Business Data Stewards.
Develop Business Rules - Work with Data Owners and Business Data Stewards to review documented business rules and capture undocumented rules.
Define Metrics - Define metrics and acceptable thresholds against which to measure levels of quality.
Evaluate Data with Metrics - Execute business rules against production data and evaluate results. Utilize acceptable thresholds set by the Data Governance Board to evaluate the data.
Findings Review - Review the Findings with the Data Owners and Business Data Stewards.
Remediate Anomalies - Implement and execute remediation process to fix problems with production data.
Monitor Health - Define and implement a continuous monitoring/remediation plan to prevent and/or fix data quality problems in the future.
-
[Process diagram: Discovery, Profile Data, Develop Business Rules, Define Metrics, Evaluate Data with Metrics, Findings Review, Remediate Anomalies, Monitor Health]
Discovery
-
Identifying Business Need & Resources
The discovery process is not solely the responsibility of business, IT, or Data Governance/Data Quality organizations. It requires collaboration.
Business need or problem definitions can be influenced by a variety of sources such as:
Migrating to One ERP and One CRM
Master Data Management Processes
Suspected data quality deficiencies impacting BV & regulatory requirements
Data Governance Board initiatives
Needs of data-centric business strategies and opportunities
Directives from executive sponsorship team
-
Identifying Business Need & Resources
Identify Key Resources:
Business
Data Quality Center of Excellence
IT
Data Quality Analyst
IT Data Steward
Business Data Steward
Data Owner
-
Identifying Business Need & Resources
Refine Problem & Develop Initial Business Case
Refinement of Problem Statement:
Data quality team refines original problem statement to ensure that the defined project objectives are achievable and in alignment with enterprise strategy.
Initial Development of Business Case:
Begin a list of potential business impacts related to degraded quality of data within the project scope.
Human capital expense for manual correction
Revenue lost due to inaccurate information
Regulatory fines from compliance violations
Damage to corporate reputation
-
Identifying and Requesting Data
What to Include?
The Data Quality team should work to define the specific data elements, and the source systems that contain them, which will be included in the analysis.
Focus on Data that Answers Questions
Confirm that the data available in the defined data sources is capable of answering the questions posed by the project problem statement.
-
Identifying and Requesting Data
Two Options
Build a Direct Database Connection:
Allows for a query against live data that can be reused in a repeatable process.
Preferred for access to current data.
Provides greater flexibility of data import options.
Requires effort from IT team members and may have an associated cost.
Extract Data into Flat Files:
Useful when a direct connection is not available.
Requires a knowledgeable analyst to identify the correct format and upload.
Each data load requires a new data extraction effort.
Consider - Staging Area for data preparation
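A minimal sketch of the two access options, assuming a SQLite database stands in for the source system and a CSV extract stands in for the flat file. The table name, column names, and helper function names are hypothetical and chosen only for illustration; a real source would use its own driver and schema.

```python
import csv
import io
import sqlite3


def rows_from_database(conn, query):
    """Option 1: run a reusable query against a live database connection."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_from_flat_file(text_stream):
    """Option 2: read an extract delivered as a delimited flat file."""
    return list(csv.DictReader(text_stream))


# Hypothetical vendor data, used only to exercise both paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (vendor_id TEXT, vendor_name TEXT)")
conn.executemany(
    "INSERT INTO vendor VALUES (?, ?)",
    [("V1", "Acme"), ("V2", "Globex")],
)

db_rows = rows_from_database(
    conn, "SELECT vendor_id, vendor_name FROM vendor ORDER BY vendor_id"
)
file_rows = rows_from_flat_file(
    io.StringIO("vendor_id,vendor_name\nV1,Acme\nV2,Globex\n")
)
```

Both helpers return the same list-of-dictionaries shape, so later profiling steps do not need to know which option supplied the data. The query string can be saved and rerun, while the flat file has to be extracted again for each load.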
-
Initial Data Profiling and Discovery
An initial profile should be run against the data without any business rules to confirm a successful data import.
This profile serves two purposes:
It is a sense check, allowing the analyst an overview of the data to ensure the data was loaded properly.
It provides an overview against which initial observations can be made.
-
Initial Data Profile Output
Data Review at a Glance
Uniqueness: Percentages, Counts, Key Fields
Nulls: Percentages, Counts, Key Fields
Min/Max
Unexpected Values: Values outside domain
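The profile outputs listed above (uniqueness, nulls, min/max, unexpected values) could be computed along these lines. This is a sketch under assumptions, not the tool used in the webinar: the function name, the sample records, and the allowed payment-term codes are all hypothetical, and empty strings are treated as nulls.

```python
from collections import Counter


def profile_column(rows, column, allowed_values=None):
    """Profile one column with no business rules applied."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Assumption: both None and the empty string count as null.
    present = [v for v in values if v not in (None, "")]
    null_count = total - len(present)
    distinct_count = len(set(present))
    profile = {
        "count": total,
        "null_count": null_count,
        "null_pct": round(100.0 * null_count / total, 2) if total else 0.0,
        "distinct_count": distinct_count,
        "unique_pct": round(100.0 * distinct_count / len(present), 2) if present else 0.0,
        # min/max assume the present values share one comparable type.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    if allowed_values is not None:
        # Values outside the expected domain, with how often each occurs.
        profile["unexpected_values"] = dict(
            Counter(v for v in present if v not in allowed_values)
        )
    return profile


# Hypothetical sample records.
sample = [
    {"vendor_id": "V1", "payment_terms": "NET30"},
    {"vendor_id": "V2", "payment_terms": "IMMEDIATE"},
    {"vendor_id": "V2", "payment_terms": ""},
    {"vendor_id": "V3", "payment_terms": "NET-30"},
]
terms_profile = profile_column(
    sample, "payment_terms", allowed_values={"NET30", "NET60", "IMMEDIATE"}
)
id_profile = profile_column(sample, "vendor_id")
```

In this sample the key field `vendor_id` is only 75% unique and one payment term falls outside the allowed set, which are the kinds of observations to bring to the data owners.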
-
Initial Data Profiling and Discovery
Report Findings to Data Owners and Stewards
Conduct a data profiling debrief session:
Data owners
Business data stewards
IT data stewards
Communicate to the data owners:
Purpose of the data profiling exercise
Scope of the data included in the profile
Expectations of them to assist in the development and application of business rules to future profiles
-
Initial Data Profiling and Discovery
Collect and Report Information from Profile
It may be advisable to extract information from reporting tool results into another format which can be shared with all members of the data quality team, such as Excel or PDF.
Peculiarities of the data profile should be highlighted for review with the data owners.
Any inferences about potential business rules, as well as questions about patterns in the data, should be noted.
-
Next Steps
Initial profiling is just the beginning of the Data Quality Process
The real benefit is in developing business rules that can be applied to data in order to continue the repeatable process and develop actionable insights.
-
Defining Business Rules and Metrics
Sourcing Business Rules
Possible sources of business rules:
Master Data Standards documents
Subject matter expert interviews: Data Stewards, Owners, and Consumers
Desktop procedures documents
Process and system documentation
What to look for:
Allowable values
Required fields
Links between fields
Fields that link between data domains
Potential duplicate records
Insights into patterns that might be found in the data profile
-
Defining Business Rules and Metrics
Example Business Rules

Business Rule: Tax Identifier is required for all non-employee vendors.
Related Business Action: A W-9 is required before entering a new vendor into the vendor management system.
Data Quality Check: Rule is violated if Vendor Type is not Employee and Tax ID is Null.

Business Rule: Tax Identifiers should be entered in the valid format for the type of identification number.
Related Business Action: Consistent formatting of tax identification numbers allows for higher confidence in searching and validation.
Data Quality Check: Rule is violated if Tax ID is not in a valid format for the type, i.e. SSNs should be 999-99-9999; FEINs should be 99-9999999.

Business Rule: Entities (companies, employees, products, etc.) should be unique and duplicates should not be entered.
Related Business Action: Entity names entered into the system should be entered in a consistent format to assist with presentation and elimination of duplicates.
Data Quality Check: Rule is violated if entity names are duplicated.

Business Rule: E-mail addresses must be entered in valid formats.
Related Business Action: Complete e-mail addresses should be entered into the system in order to ensure valid contact information.
Data Quality Check: Rule is violated if the email address field is not in a valid format (e.g. [email protected]).
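The four example rules can be expressed as executable data quality checks. The sketch below is illustrative only: the field names (`vendor_type`, `tax_id_type`, `tax_id`, `email`, `name`), the violation codes, and the sample vendors are hypothetical, the first rule is read as applying to vendors whose type is not Employee, and the e-mail pattern is deliberately loose rather than a full address validator.

```python
import re
from collections import Counter

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")   # 999-99-9999
FEIN_PATTERN = re.compile(r"^\d{2}-\d{7}$")        # 99-9999999
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose format check


def check_vendor(vendor):
    """Return the codes of the rules this vendor record violates."""
    violations = []
    tax_id = vendor.get("tax_id")
    # Rule 1: Tax ID is required for all non-employee vendors.
    if vendor.get("vendor_type") != "Employee" and not tax_id:
        violations.append("tax_id_required")
    # Rule 2: when present, Tax ID must match the format for its type.
    if tax_id:
        pattern = SSN_PATTERN if vendor.get("tax_id_type") == "SSN" else FEIN_PATTERN
        if not pattern.match(tax_id):
            violations.append("tax_id_format")
    # Rule 4: when present, the e-mail address must look like an address.
    email = vendor.get("email")
    if email and not EMAIL_PATTERN.match(email):
        violations.append("email_format")
    return violations


def duplicate_names(vendors):
    """Rule 3: names that repeat after case and whitespace normalization."""
    counts = Counter(" ".join(v["name"].lower().split()) for v in vendors)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample vendors.
vendors = [
    {"name": "Acme Supply", "vendor_type": "Company", "tax_id_type": "FEIN",
     "tax_id": "12-3456789", "email": "ap@acme.example"},
    {"name": "ACME  Supply", "vendor_type": "Company", "tax_id_type": "FEIN",
     "tax_id": None, "email": "not-an-email"},
    {"name": "Pat Lee", "vendor_type": "Employee", "tax_id_type": "SSN",
     "tax_id": "123456789", "email": None},
]
results = [check_vendor(v) for v in vendors]
dups = duplicate_names(vendors)
```

Exact-match duplicate detection after normalization is the simplest possible choice here; real entity matching usually needs fuzzier comparison, which is outside what the slide describes.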
-
Defining Business Rules and Metrics
What Makes Good Metrics?
Meaningful to the Business - the score should relate to improved business performance
Measurable - must be able to be quantified within a discrete range
Controllable - some action can be taken to change the data and improve the score
Reportable - should provide enough information to the data steward to take action
Traceable - must be able to be tracked over time to show improvement efforts
-
Defining Business Rules and Metrics
Examples of Metrics for Various Dimensions
Accuracy - Does each value fall within an allowed set of values? Does each value conform to the defined level of precision?
Completeness - Is data present in required fields?
Consistency - Is the data used the same way across the enterprise?
Currency - Is the data up to date?
Integrity - Are identifying data elements unique?
Conformity - Are data elements stored as assigned data types, e.g. is text stored in a telephone number field?
Duplication - Do duplicate records exist?
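As a rough illustration of turning dimensions into measurable scores, the sketch below computes completeness, conformity, and integrity as percentages and compares them with thresholds. Everything specific here is an assumption: the field names, the digits-only test for a telephone number, and the threshold values, which in practice would be set by the Data Governance Board.

```python
def pct(passed, total):
    """Percentage of records that pass, rounded to two decimals."""
    return round(100.0 * passed / total, 2) if total else 100.0


def score_records(records, required_field, key_field, numeric_field):
    """Score three dimensions for a list of record dictionaries."""
    total = len(records)
    # Completeness: is data present in the required field?
    complete = sum(1 for r in records if r.get(required_field) not in (None, ""))
    # Conformity: is the numeric field free of text (digits only)?
    conforming = sum(1 for r in records if str(r.get(numeric_field, "")).isdigit())
    # Integrity: are identifying values unique?
    distinct_keys = len({r.get(key_field) for r in records})
    return {
        "completeness": pct(complete, total),
        "conformity": pct(conforming, total),
        "integrity": pct(distinct_keys, total),
    }


def evaluate(scores, thresholds):
    """Mark each metric pass or fail against its minimum acceptable score."""
    return {
        name: ("pass" if scores[name] >= minimum else "fail")
        for name, minimum in thresholds.items()
    }


# Hypothetical records and thresholds.
records = [
    {"customer_id": "C1", "postal_code": "23060", "phone": "8045550100"},
    {"customer_id": "C2", "postal_code": "", "phone": "8045550101"},
    {"customer_id": "C2", "postal_code": "23219", "phone": "call front desk"},
    {"customer_id": "C4", "postal_code": "23220", "phone": "8045550103"},
]
scores = score_records(records, "postal_code", "customer_id", "phone")
status = evaluate(scores, {"completeness": 95.0, "conformity": 70.0, "integrity": 100.0})
```

Each score is bounded between 0 and 100 and can be recomputed on every data load, so it can be tracked over time.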
-
Evaluating Data & Reporting Findings
Re-profile Data with Business Rules and Report Findings
Definition, refinement, and application of business rules should be repeated iteratively and reviewed until the data owners are satisfied with the accuracy and completeness of the business rule implementation.
Present all findings to the data owners and stewards for review.
The goal of this step is to finalize the data quality assessment definition such that an ongoing monitoring process can be modeled from the activity.
-
Remediating Anomalies
Corrective Actions
Two Routes:
Find-and-Fix
Process Change
Best Practice:
Leverage the continuous monitoring of data quality reports to confirm that the data cleansing procedures are effective
-
Business Value from Data Quality
The costs of poor data quality include:
Human capital expense for manual correction
Revenue lost due to inaccurate information
Regulatory fines from compliance violations
Damage to corporate reputation
-
Business Value Calculations
Business Rule: Customer Address Invalid
# Errors Identified: 84,367
Potential Cost Avoidance: $92,952.42
Calculation Description: Manual effort to research and correct an invalid Customer Address
Average Salary for worker engaged in correcting address: $25,000.00
Average Salary including benefits: $34,375.00
Salary per hour: $16.53
Salary per minute: $0.28
# minutes to correct an invalid address: 4
Cost of manual effort to research and correct one address: $1.10
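The arithmetic behind this table can be reproduced with a short worked example. Two inputs are not stated on the slide and are assumptions here: a 2,080-hour work year (52 weeks of 40 hours) and a benefits multiplier of 1.375, which is the ratio implied by $34,375 / $25,000. The function and variable names are hypothetical. Intermediate values are kept unrounded; the slide appears to round only for display.

```python
HOURS_PER_YEAR = 2080          # assumption: 52 weeks x 40 hours
BENEFITS_MULTIPLIER = 1.375    # assumption: implied by 34,375 / 25,000


def cost_avoidance(errors, base_salary, minutes_per_fix):
    """Estimate the manual-correction cost avoided for a count of errors."""
    loaded_salary = base_salary * BENEFITS_MULTIPLIER
    per_hour = loaded_salary / HOURS_PER_YEAR
    per_minute = per_hour / 60
    per_fix = per_minute * minutes_per_fix
    return {
        "loaded_salary": loaded_salary,
        "per_hour": per_hour,
        "per_minute": per_minute,
        "per_fix": per_fix,
        "total": errors * per_fix,
    }


# Inputs taken from the slide: 84,367 errors, $25,000 salary, 4 minutes each.
result = cost_avoidance(errors=84367, base_salary=25000.00, minutes_per_fix=4)
display = {name: round(value, 2) for name, value in result.items()}
```

Under these assumptions the rounded values match the slide: $16.53 per hour, $0.28 per minute, $1.10 per correction, and $92,952.42 in total. Multiplying the rounded $1.10 by 84,367 would give a slightly different total, which is why the unrounded per-fix cost is carried through.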
-
Remediating Anomalies
Five Whys for Root Cause (Danette McGilvray)
State the issue (e.g. duplicate vendor records are causing issues with payments)
Ask "Why?" five times
Why are there duplicate records?
New master records are created instead of using existing ones.
Why do they create new duplicate records?
The reps don't want to search for existing records.
Why don't they want to search for existing records?
Search takes too long.
Why is the search time too long?
Reps have not been trained in proper search techniques, system performance is poor.
Why is long search time a problem?
Reps are measured by how quickly they can create a new master record and they don't see the implications of duplicate data downstream.
-
Monitoring
At the Enterprise Level
Data Quality Issues by Domain as of 1-31-2015
Domain     Total Issues   Open   Critical Issues   Critical % of Open
Customer   156            97     26                26.80%
Product    225            66     31                46.97%
Supplier   145            43     5                 11.63%
Remaining counts, split between Deferred and Remediated: Customer 48 and 11; Product 140 and 19; Supplier 90 and 12
[Bar chart: # Open Issues by domain, axis 0 to 115]
Total Data Quality Issues open more than 30 days: 92
Total Data Quality Issues open more than 60 days: 31
Total Data Quality Issues open more than 90 days: 17
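The "% of Open" figures on this dashboard follow from dividing critical issues by open issues in each domain. The sketch below takes the open counts as 97, 66, and 43, which are the values from the slide that reproduce the stated percentages; the function and variable names are hypothetical.

```python
# Open and critical issue counts per domain, read from the slide.
open_issues = {"Customer": 97, "Product": 66, "Supplier": 43}
critical_issues = {"Customer": 26, "Product": 31, "Supplier": 5}


def critical_share(open_count, critical_count):
    """Critical issues as a percentage of open issues."""
    return round(100.0 * critical_count / open_count, 2) if open_count else 0.0


shares = {
    domain: critical_share(open_issues[domain], critical_issues[domain])
    for domain in open_issues
}

# Domain with the highest share of critical issues among its open issues.
worst_domain = max(shares, key=shares.get)
```

Ranking by share rather than by raw count highlights Product, where nearly half of the open issues are critical even though Customer has more open issues overall.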
-
Monitoring
Monitoring by Data Stewards
Establish Process to Consume Artifacts from Data Profiling
Take corrective measures to improve the data quality
Verify through monitoring that improvements were implemented by either data cleansing, controls at the root cause, or a combination of both.
The data stewards should understand how to interpret the metrics, including what is being measured and why.
Monitoring can be costly so it should focus primarily on those processes that are essential to the business.
-
Data Governance & Stewardship Maturity Model
Define - Business Glossary & Roles:
Identify & catalog data assets, map to owners & stewards
Stewards are identifying, defining critical data, publishing business accredited terms for consumption
Control - Data Standards:
Define authorities, control changes
Data Standards enforced by Stewards & Owners
Harmonize definitions across functions, Lines of Business, Geographies
Measure - DQ Dashboards:
Measuring data quality (DQ)
Monitor ongoing stewardship operations & data use
Data Standards implemented for new system
Expand - Data Sprints:
Repeatable data management processes in place
Expand scope & breadth of stewardship program
Increase volume & efficiency of data it supports
Optimize - Continuous Improvement:
Iteratively enhance data quality & stewardship performance
Continuously prioritize & act upon enhancement opportunities from monitoring & expansion activities
-
Risk: Awareness
Parts of the organization are unaware of DG/Stewardship and do their own thing, inconsistent with the DG standard
Business units may be unaware of benefits and added value
Risk: Adoption
Business units refuse to adopt standards put forth
System constraints make it difficult to implement new standards
Business units do not engage the Global Data Services team on projects
Risk: Funding
Funding model that aligns with governance and organizational structure (i.e. building data connections to sources with DQ tool)
Cost of building and establishing Global Data Services
Risk: Training
Stewardship skills are hard to maintain
Build and sustain capability across a large world-wide organization
Risk: Time to Build
Data Governance and Stewardship is a long-term program, not a one-time project
Mitigation: Awareness
Strong communication plan that is meshed into overall corporate communications
Corporate governance and strong sponsorship of DG/Stewardship
Mitigation: Adoption
Accountability and approval process by Data Owners and DG Enterprise Steering Committee
Document exceptions and work-arounds
Corporate governance and Architecture Review Board to align projects with DG/Stewardship
Mitigation: Funding
DG & Stewardship funding established
Cost allocation aligned with DG & Stewardship model
Project specific costs
Mitigation: Training
Partner with Data Architecture, Global Change & Process Excellence unit to provide a training curriculum
Define staffing models and career paths that outline training and align with DG/Stewardship
Mitigation: Time to Build
Leverage parallel opportunities to accelerate build and implementation (Master Data, Global KPI reporting, One ERP road map, One CRM)
Pilot projects to quickly show tangible benefits
-
QUESTIONS??