Post on 11-Mar-2018
Essential Elements and Metrics for a Data Warehouse TCOE
Amita Awasthi
Infosys Limited (NASDAQ: INFY)
Abstract
We know from experience that a data warehouse is a must for large organizations, as it
provides insight into huge volumes of data and enables them to make business decisions.
Testing the data warehouse is therefore critical: any issue with the quality of data in the
warehouse can lead to major problems.
Data warehouse testing is not only a functional testing area but also a topic of research, with
rapid evolution of tools and technology. The latest trend is Big Data, which addresses the three
Vs of data (Volume, Velocity, Variety) that are big challenges in a data warehouse. All the top
data warehouse appliance, ETL and BI tool vendors are racing to extend their offerings to support
Hadoop and other Big Data platforms. Big Data may also become a source of information for the
regular Enterprise Data Warehouse (EDW), feeding in unstructured data and helping with
advanced analytics. Researchers are working to see how maximum benefit can be achieved by
combining EDW and Big Data.
There are other advancements as well, such as data warehouse on cloud and Mobile Business
Intelligence. For any organization to keep track of these advancements and extract maximum
benefit from its data, a dedicated Data Warehouse Testing Center of Excellence is required.
This paper elaborates on and discusses the essential elements and metrics for a data
warehouse testing center of excellence.
The information in this paper results from work done over the last eight months in defining
the data warehouse testing center of excellence (DWT COE) roadmap for two clients.
The paper explains the essential components and benefits of moving to a DWT COE.
FSI Client
[Figure: DWT COE organization at the FSI client. Labels: DWT Manager; Process, People, Infra/Tools; Test Environment Management, Tools Management, Templates, Estimates, Staffing, KM, Methodologies, Quality KPIs, Automation ROI, Project Mgmt., Test Strategy; Project Manager, Tools SME, Domain SME, Solution Architect, Technology SME; DWT Projects 1 to 5; Groups 1 to 4]
Key Challenges
• Lack of DWT-skilled resources; no competency development plan in place
• Too much time spent collecting data and metrics; DWT-specific metrics not defined
• No capacity to focus on the latest market trends in the DWT space
• No centralized repository for processes, training, tools, SMEs, best practices, templates etc.
• No clear direction on career opportunities for DWH testers
• Lessons learnt and best practices from similar projects are not documented and shared across all DWT projects
[Figure: DWT COE and its three elements: People, Technology, Process]
Key Takeaways
• Characteristics of a Data Warehouse TCOE
• Metrics specific to Data Warehouse Testing
• Data Warehouse Competency enablement framework
• Latest technology trends in Data Warehouse and how QA is
prepared for them
• How your existing testing landscape can be transformed to Data
Warehouse TCOE
Target Audience
Audience prerequisite: basic knowledge of databases and data warehouse testing
Intended audience: managers, technical leads and testers of a data warehouse testing project
Speaker's Profile
Amita Awasthi is a PMP-certified Project Manager with Infosys. She earned her B.Tech from
HBTI, Kanpur. During her 13 years at Infosys she has gained experience in handling
large virtual teams and different types of clients, projects, people and technologies. She
has been recognized at the organization level as a winner of the “Infosys Excellence Award”,
“KM Trailblazer Champion” and “People’s Manager”.
She is an SME for data warehouse testing and for Perfaware, the Infosys DWH testing solution.
Thought leadership, knowledge management, project and program management, data warehouse
testing and Big Data are her key areas of interest, and she has presented papers in internal
and external forums
(http://www.stepinforum.org/stepin-summit-2012/plenaries/amita_awasti_track.html).
Currently she is managing multiple projects for a major US-based banking customer and
actively contributes to unit-level activities.
The author can be reached at amita_awasthi@infosys.com
Context & Background
Today we all rely on data to make informed decisions. For any large organization, the
data warehouse is the holy grail of information: it helps analyze the past and make
decisions for today and the future. With continuous technology innovation in this space,
it is no longer limited to “after the fact” analysis.
Managing and executing data warehouse testing projects has become more challenging
and interesting, as the service offering itself is being refined with the latest technology
trends. We have seen that many of our clients struggle with a decentralized way of
managing DWH testing projects and have either moved, or defined roadmaps to move,
to a DWH testing center of excellence model.
According to Gartner Hype Cycle for Information Infrastructure, 2012, “the Logical Data Warehouse
(LDW) is a new data management architecture for analytics which combines the strengths of
traditional repository warehouses with alternative data management and access strategy. The LDW
will form a new best practices by the end of 2015.”
Transforming to a Data Warehouse TCOE requires certain essential elements, metrics and a
roadmap definition.
Reference: http://www.compositesw.com/solutions/logical-data-warehouse/
The objective of this paper is to elaborate on the three essential elements of a
Data Warehouse TCOE:
1. People
2. Process
3. Technology
The solution presented here covers the challenges clients face in a traditional data
warehouse testing setup, the market perspective, and the trends we are currently
seeing. It can be used as a skeleton framework to assess the current state of DWH
testing and to outline the roadmap for moving to a mature DWT COE end state.
Metrics specific to data warehousing are also defined; they are crucial for
data quality and data load.
DWH Testing Challenges in a typical implementation
[Figure: typical DWH implementation. Stages: Source, Staging, Data Warehouse, Data Publishing, Reporting and Analytics. Components: ETL, Metadata, Raw data, Summary data, Outbound Extracts, Data Marts, In-memory databases, Reports & Dashboards, Ad hoc analysis, Mobile Apps]
• Data quality (DQ) checks are not performed on source system data. A few of the DQ checks are: duplicate check, null value check, metadata check, pattern check
• Heterogeneous data sources
QA Challenges
• Static testing is not performed prior to test execution
• Schema validations are not done
• A sampling strategy is used, causing incomplete test coverage
• Exhaustive testing is not done due to lack of automation
• Huge volume of information coming into the DWH
• How much history to store in the data warehouse: storage infrastructure vs. cost and analytical requirements
• Consistency of data, to ensure data correctness between reporting, ad hoc query and analytics
• Defects are caught very late in the life cycle, during the review of extracts and reports
• No performance testing is done for ad hoc reports and queries
• E2E data reconciliation from reports to source data is not done
• Lack of skilled resources, lack of a DWH competency enablement framework, lack of a dedicated DWH research track, lack of differentiators and accelerators
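The duplicate, null value and pattern checks named on this slide could be sketched roughly as follows. This is a minimal illustration, not the tooling used in the engagements: the function names, the sample records and the five-digit zip pattern are all assumptions made for the example, and the metadata check is left out because it depends on how the schema is documented.

```python
import re
from collections import Counter

def duplicate_keys(rows, key):
    """Return the key values that occur more than once (duplicate check)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def null_violations(rows, required_fields):
    """Return (row index, field) pairs where a required field is None or blank."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field in required_fields
        if row.get(field) is None or str(row.get(field)).strip() == ""
    ]

def pattern_violations(rows, field, pattern):
    """Return indexes of rows whose non-null value does not fully match the regex."""
    rx = re.compile(pattern)
    return [
        i for i, row in enumerate(rows)
        if row.get(field) is not None and not rx.fullmatch(str(row[field]))
    ]

# Hypothetical source-system extract, invented for illustration only.
source_rows = [
    {"cust_id": 1, "email": "a@example.com", "zip": "12345"},
    {"cust_id": 2, "email": None, "zip": "1234"},
    {"cust_id": 2, "email": "c@example.com", "zip": "54321"},
]

dups = duplicate_keys(source_rows, "cust_id")
nulls = null_violations(source_rows, ["cust_id", "email"])
bad_zip = pattern_violations(source_rows, "zip", r"\d{5}")
print(dups, nulls, bad_zip)
```

Running such checks on source data before the ETL load is one way to move defect detection earlier in the life cycle, which is the gap this slide points at.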
How clients are dealing with DWH Testing Challenges: Market Perspective
• Market data shows that clients who do not have a TCOE are working towards
setting one up
• By implementing a TCOE, many Infosys clients have achieved large cost savings
and quality improvements, and have been able to compress testing timelines
as well
• A DWT COE offers:
  • A cost-effective solution
  • Increased focus on reuse
  • Improved data quality and availability of systems
  • Improved time to market, to meet stringent timeline requirements
  • Effective DW&BI, which enables better management decisions and reduces risks
  • Strategic direction for the organization in terms of tools, licensing,
    processes and technology
Characteristics of a Data Warehouse TCOE
• Better quality through data test strategies (exhaustive, aggregate, sampling, risk-based testing etc.)
• Building data quality as the practice (metadata, pattern, statistical, relationship and business rules analysis etc. early in the lifecycle)
• End-to-end coverage of the DW lifecycle (a defined DWH life cycle to ensure complete coverage of functional and non-functional requirements, as well as end-to-end data reconciliation)
• Efficiency through automated data testing (ETL validation, data quality analysis and performance testing can be automated using in-house or market tools)
• Centralization and better utilization of ETL/BI/DWT tools (using the strategic tools across the organization helps with license cost savings, improved utilization and training requirements)
• DWH testing career with a defined growth path (a defined career path motivates people to learn and grow)
• DWH Test Academy to skill/re-skill people, perform assessments and improve technical capability (DWH testing skill plan for beginner, intermediate and expert levels; planned technical assessments to ensure improvement of skill level)
• Centralized repository for any DWH testing related artifacts (process documents, templates, checklists, questionnaires etc.)
• Metrics-driven QA framework (data quality, data load, response time etc.)
• Better deployment and utilization of resources (centralized control of DWH testers, deployed in projects based on project skill set requirements)
• Knowledge and best practices sharing across data testing projects (lessons learnt, defect repository, in-house tools created etc.)
• Keeping up with continuously evolving DW technology (benchmarking against industry standards of data testing in terms of tools, preparedness to adopt new technology, trends etc.)
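As an illustration of the exhaustive test strategy and end-to-end data reconciliation mentioned in this list, the sketch below compares every source row with its target row by key instead of sampling. It is a simplified example under stated assumptions: the `reconcile` function, the `id` and `amount` fields and the sample rows are hypothetical, and the source rows are assumed to be already expressed in the form expected in the target (no transformation logic is modeled).

```python
def reconcile(source_rows, target_rows, key):
    """Exhaustively compare source and target rows by key.

    Returns the keys missing in the target, keys present only in the
    target, and keys whose row content differs between the two sides.
    """
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical data, invented for illustration only.
source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},   # value changed during load
    {"id": 4, "amount": 10},    # row with no source counterpart
]

report = reconcile(source, target, "id")
print(report)
```

A sampling strategy would inspect only a subset of keys and could miss any of these three findings; comparing the full key sets is what makes the check exhaustive, and it is practical mainly when automated.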
Characteristics of a Data Warehouse TCOE, by DWH TCOE element
Tools/Technology
• Efficiency through automated data testing
• Centralization and better utilization of ETL/BI/DWT tools
• Evaluate technology trends and identify new tools for adoption, keeping up with continuously evolving DW technology
People
• DWH testing career with a defined growth path
• DWH Test Academy to skill/re-skill people, perform assessments and improve technical capability
• Better deployment and utilization of resources
Process
• Better quality through data test strategies
• Building data quality as the practice
• End-to-end coverage of the DW lifecycle
• Metrics-driven QA framework
• Centralized repository for any DWH testing related artifacts
• Knowledge and best practices sharing across data testing projects
People Competency Framework and Roles in DWH Testing
[Figure: four competency levels, Level 1 to Level 4, covering the skills listed below]
• DWH concepts
• SQL query writing
• Excel macros
• Data validations
• Basic query tools and reporting
• ETL testing
• Test data management
• Test strategy
• Defect analysis
• ETL & BI tools
• Automation
• End-to-end solution usage: estimation, planning, data modeling, ETL, data validation, reporting, technology trends, appliance testing
• Consulting: DWT, appliance testing, Big Data, Mobile BI, DW testing on cloud, analytics testing
What the framework provides:
• Continuous improvement of individual technical competency
• Clarity on the roles and career path ahead
• Awareness of which trainings and certifications to attend, and of thought leadership
Processes and Best Practices for DWH Testing
Test Automation
• Test automation tools: QuerySurge, Informatica Data Validation etc.
• Excel-based tools (macros) that automate test steps such as test case creation, query creation and data comparison
Defect Repository
• Can be created from our experience and referred to, to ensure all critical scenarios are covered in test planning and scripting
Risk Repository
• Ready-to-use DWT risk repository portal; this is invaluable for test risk planning
BVA Repository
• Repository of Business Value Articulation case studies, which can be used to implement best practices across similar projects
Templates and Checklists
• Reusable templates for test planning, test strategy, status reporting etc.
• Reusable checklists for test plan review, pre-execution checks, execution checks etc.
Competency Development
• DWT-specific training program for different competency levels: basic, intermediate and advanced
• DWT tools-specific training program to create tools SMEs
Thought Leadership
• Research initiatives and a repository of DWT publications, to keep updated on the latest trends in DWH
DWH Test Metrics
Uniqueness
  Direct metrics: # of duplicate records
  Derived metrics: # of duplicate records / total number of records
Correctness & Consistency
  Direct metrics: # of records with pattern mismatch; # of fields with inconsistent data occurrence
  Derived metrics: # of records with pattern mismatch / total number of records
Completeness
  Direct metrics: # of records with null values in not nullable fields; # of records with blank values in non blank fields
  Derived metrics: # of records with null values in not nullable fields / total number of records; # of records with blank values / total number of records
Timeliness
  Direct metrics: delay in receiving data or feed files (hours/days)
  Derived metrics: # of days delay in receiving data / test execution duration
Phase Containment
  Direct metrics: # of data quality defects caught in each phase of project
  Derived metrics: # of data quality defects caught in one phase of project / total # of data quality defects caught in project
Data Load
  Direct metrics: # of records loaded in target; # of records rejected; # of valid rejects; total # of records in source
  Derived metrics: # of records loaded in target / (total # of records in source - # of valid rejects)
Schema Validation
  Direct metrics: # of entities missing from defined schema; # of entities mismatching from defined schema; # of data type mismatches for the fields
  Note: schema validation means comparing the defined/documented database schema with the actual DB schema; PK/FK constraints are also checked here
Performance
  Direct metrics: report response time; time taken to complete end-to-end data load
  Derived metrics: % adherence can be calculated if SLAs are defined for report response and E2E data load time
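A few of the derived metrics above can be computed as in the sketch below. The data load and phase containment formulas follow the table; the SLA adherence formula (share of measurements within the SLA) is an assumption, since the slide only says that a percentage can be calculated once SLAs are defined. All function names and sample numbers are hypothetical.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

def load_success_rate(loaded, source_total, valid_rejects):
    """Records loaded in target / (records in source - valid rejects)."""
    return ratio(loaded, source_total - valid_rejects)

def phase_containment(defects_by_phase):
    """Share of all data quality defects that was caught in each phase."""
    total = sum(defects_by_phase.values())
    return {phase: ratio(count, total) for phase, count in defects_by_phase.items()}

def sla_adherence(measured_seconds, sla_seconds):
    """Assumed definition: fraction of measurements at or below the SLA."""
    within = sum(1 for t in measured_seconds if t <= sla_seconds)
    return ratio(within, len(measured_seconds))

# Hypothetical numbers, invented for illustration only.
load_rate = load_success_rate(loaded=980, source_total=1000, valid_rejects=15)
containment = phase_containment({"ETL testing": 6, "Report testing": 2})
adherence = sla_adherence([2.0, 4.5, 3.0, 6.0], sla_seconds=5.0)
print(load_rate, containment, adherence)
```

Subtracting valid rejects from the source count keeps legitimately rejected records from lowering the load rate, so a value below 1.0 points to records that were lost or wrongly rejected.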
Top 3 Technology Trends in DWH/BI
“Big Data drives Tomorrow’s BI”
• Enables huge storage of data (petabytes)
• Advantage of storing and analyzing unstructured data from social
networks and the public domain
• Helps understand and predict customer behavior, which can be used for
cross-selling of products, customer loyalty management, real-time
fraud detection, compliance checks etc.
• All top BI vendors are offering big data capabilities
“Elastic DWH in the Cloud”
• Lower cost in a pay-per-use model; over-provisioning, which leads to
high costs, can be avoided
• Expertise in building and maintaining a DWH is no longer needed within
the organization itself
• An elastic data warehousing system in the cloud would automatically
increase or decrease the number of nodes used, allowing one to save money
“Information on the Move”
• Moving from the wired world to the wireless world, taking advantage of
smartphones and tablets
• Technological advancement has created the need to have information
available on the go, for faster decision making, better customer service,
efficiency in business processes and improved employee productivity
• Most of the top banks have their banking apps available on mobile
• All top BI vendors are offering mobile BI capability
DWT Assessment and Transformation Roadmap
1 Establish a DWH Testing Center of Excellence
2 Enhance and standardize the current DWH testing process framework for the E2E test life cycle by following a standard lifecycle approach
3 Implement key DWH test metrics
4 Identify strategic test tools and integrate current tools to enable end-to-end automation. Standardize the use of automation frameworks across projects
5 Leverage the TDM function for better quality and timely provision of test data
6 Set up a centralized knowledge repository for DWH project templates, checklists, test artifacts, lessons learnt, trackers, questionnaires, training material etc.
7 Set up a centralized training academy for skilling/re-skilling of DWH resources and for technical assessment
8 Prepare the DWH QA organization for adoption of new capabilities/services
Process Evaluation
• Identification of transformation initiatives based on QA assessment recommendations
• Categorization of initiatives into short/medium/long term milestones
• Development of the deployment plan for each initiative
Benefits of establishing a Data Warehouse TCOE
• Cost saving
• Knowledge sharing
• Competency enablement
• Process adherence & improvement
• Adapting to latest market trends
• Improved control on projects
• Faster time to market
• Improved system availability
• Better resource utilization
Expected ROI of DWT COE - Key Dimensions
People
  Element: Improved resource utilization
    Metrics to track for success: resource utilization %
    Typical improvement: 10% - 15%
  Element: Reduced resource on-boarding time
    Metrics to track for success: time taken to on-board a resource, from request to deployment
    Typical improvement: 15% - 30%
  Element: Improved competency level
    Metrics to track for success: technical assessment results (# of people moved from lower levels to higher levels)
    Typical improvement: helps in better project execution
Process
  Element: Following standardized DWT processes
    Metrics to track for success: Process Compliance Index; cost of quality
    Typical improvement: 5% - 10%
  Element: Re-use of test strategy, templates, best practices, queries etc.
    Metrics to track for success: testing cycle time; % of reuse
    Typical improvement: 8% - 10%
  Element: Predictive profiling of defects and proactive strategies
    Metrics to track for success: defect removal effectiveness; defect slippage
    Typical improvement: 5% - 10%
  Element: Early validations to catch defects early in the life cycle
    Metrics to track for success: defect containment metrics
    Typical improvement: 10% - 20%
Tools/Technology
  Element: Automation of test process and execution
    Metrics to track for success: % reduction in test execution effort; testing coverage
    Typical improvement: 10% - 25%
  Element: Internal test infrastructure/tool consolidation/virtualization
    Metrics to track for success: % reduction in license/infra cost
    Typical improvement: 5% - 10%
Conclusion
Data warehouse testing is no longer limited to data and report testing. It
is one of the most rapidly changing technology areas, and organizations need
to make dedicated investments to keep up with market trends.
Gartner sees the future of the data warehouse as the “Logical Data
Warehouse”. Real-time analytics, data visualization and the domain knowledge
to test industry-specific use cases in the data warehouse have become
essential elements of data warehouse testing.
The benefits of having a DWT COE can no longer be ignored, and
moving to a DWT COE is the path ahead for large organizations to make
maximum use of their gold mine of data.