dmm9 - data migration testing
TRANSCRIPT
© 2016, Data Maven Limited
Data Migration Testing
The most important part of a migration project
Data Migration Testing: Agenda
The key to migration success… test, test, test, and test again!
Where… does it fit in the PDM v2 landscape?
When… does the testing of data start during a migration project?… and when does it end?
What… is the scope of the data to be tested?… is the grey area?… do we need?
How… do we test for completeness?… do we test for accuracy?
Questions
Test, test, test, and test again!
(and repeat if necessary)
Data Migration Testing: The key to Migration Success
Test, test, test, and test again!
Independent of the approach used in the migration (big bang or staggered, waterfall or agile), sufficient test cycles should always be part of the overall project plan
Best practice (based on experience, but flexible around budget and timelines): at least 3 full test cycles should be scheduled, with partial cycles in between (if needed)
• Full cycles are used to test the end-to-end readiness of the target system(s)
• Partial cycles are used to (iteratively) fix problems found in User Acceptance Testing
• Partial cycles can be as small as a single object, or as big as a 90% redo of the whole migration
• If more than 90% needs to be fixed after a UAT cycle, something is terribly wrong with the whole project, and it might be better to start from scratch…
Use of the actual target platform is recommended for all full test cycles
• There is no point testing a full migration on scaled-down versions of the source or target platforms, as it will completely distort any estimation of cut-over windows
A complete dry-run of the end-to-end migration should preferably be scheduled as the last full test cycle
• Used as a dress rehearsal for the final migration, and to fine-tune the cut-over window
Every iteration should result in a more complete picture
The target will be evolving, based on implementation changes, and thus the data migration will grow!
If in doubt about completeness or accuracy, beg for a project extension to fit in another test cycle, rather than ending up with a failed project!
Where does testing fit into the PDM v2 landscape?
Data Migration Testing: Where?
Where does testing fit into the PDM v2 landscape?
Traditionally, the testing of migrated data is treated as a subset of the actual migration tooling
But it should be much larger: it should be part of every activity that actually touches data!
[Diagram: the PDM v2 landscape, split into business engagement and technical streams — Migration strategy and governance (MSG), Landscape analysis (LA), Gap analysis and mapping (GAM), Data quality rules (DQR), Migration design and execution (MDE), Key data stakeholder management (KDSM), System retirement plan (SRP), Legacy decommissioning (LD); supporting tools: profiling tool, data quality tool, migration controller, DMZ]
When… does the testing of data start during a migration project?… and when does it end?
Data Migration Testing: When?
When does the testing of data start during a migration project?
Simple answer: right at the start!
Planning phase:
Starting with the planning phase of the project, allow for a parallel stream of testing
Discovery phase:
During discovery, get indicative counts from the business
• How many customers do you have?
• How many products do you manufacture?
Analysis phase:
During the initial data analysis, compare the indicative counts with what is found in the source system(s), and report back
• Found 1,200 customer records, but you said you have only 1,000 active customers? What do we do with the rest? Archive/discard/reactivate?
• Found 5,000 products, but you stated that you manufacture around 2,000? Does the product master we have analysed contain sub-assemblies, and if so, how would we identify those?
Refine the counts based on the findings
Extract phase:
During the development of the ETL, test the volumes of extracted records against the refined counts
• Are we missing records? Is this due to incorrect selection conditions, or missing data? (Caused by inner joins instead of outer joins used in combined extraction queries?)
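The inner-vs-outer-join pitfall mentioned above is easy to demonstrate. The sketch below uses hypothetical customer and address tables in an in-memory SQLite database (table and column names are invented for illustration): a customer with no address row silently disappears from an inner-join extraction, which is exactly the kind of volume discrepancy the refined counts should catch.

```python
import sqlite3

# Hypothetical source tables: one customer has no address record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (customer_id INTEGER, city TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO address  VALUES (1, 'London'), (2, 'Leeds');
""")

# Inner join: customer 3 is silently dropped from the extract.
inner = conn.execute(
    "SELECT COUNT(*) FROM customer c JOIN address a ON a.customer_id = c.id"
).fetchone()[0]

# Left outer join: all customers survive, missing address fields are NULL.
outer = conn.execute(
    "SELECT COUNT(*) FROM customer c LEFT JOIN address a ON a.customer_id = c.id"
).fetchone()[0]

print(inner, outer)  # 2 3 — the inner join lost a record
```

Comparing both counts against the refined business count (3 customers here) immediately flags the inner-join extract as incomplete.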
When does testing end?
Traditional answer: right at the end!
Testing should be the final step before the target system(s) is(are) handed over for User Acceptance Testing
During the test cycles, all tests should be completed before handing over for (partial) UAT
During the final migration:
• Detailed comparison of all business-critical data, like product catalogues, financials and manufacturing parameters needed for day-to-day operations, must be completed
• Detailed comparison of historical data, if it was migrated, can linger, as long as it is completed before the first reporting runs that rely on this data
• Best practice suggests a 90% to 95% faultless data comparison is acceptable for sign-off
  o Depends on the number of test cycles used during the project, and the success achieved during those cycles
  o Test cycles are primarily used to build confidence in the migration approach, and to (partially) test the target system(s)
Correct answer: testing should never stop!
In all migration projects, the test tools and the completeness of the test suite are a perfect starting point for ongoing data management, especially in a multi-system environment
It would be bad business practice not to exploit this fact, and to allow the test suite to be dismantled with the rest of the migration project
What… is the scope of the data to be tested?… is the grey area?… do we need?
Data Migration Testing: What?
What is the scope of the data to be tested?
Simple answer: every bit of data is in scope!
Normally, the business will only mention the source system(s) that are to be replaced and decommissioned
This is the obvious data that has to be tested, but…
Integrated systems: will the interfaces still work as expected?
• When enriching/transforming data during migration, does the interface still give the target the exact data it needs to understand and handle the incoming messages?
• Are the data formats extracted from the new systems the same as what is expected? Maybe longer or shorter string values?
• One format very often overlooked: dates! Are they in the same format? Are they expected to be in UTC? Or local time zone?
Online reporting tools (BI): will the data from the new system(s) flow into the cubes without problems?
• A new conversion layer might be necessary, or a transformation of the current BI platform, to allow seamless continuation of the existing cubes and reports
Offline reporting tools: will these still be fed in a correct manner?
• Very often Excel reports are linked to the underlying database using ODBC queries, e.g. PowerPivot or simply Excel tables
• Data sources need to be changed, and very often the queries adapted to produce the same output
• In an ideal world, these spreadsheets will be replaced, but we don’t live in an ideal world!
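The date pitfall above is worth a concrete illustration. This minimal sketch (times and offsets are invented for the example) shows how the same instant renders differently in a local zone and in UTC, so a naive string comparison between source and target fails unless both sides are normalised first:

```python
from datetime import datetime, timezone, timedelta

# Hypothetical timestamp stored in a local zone (UTC+2) on the source side.
local = datetime(2016, 6, 1, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# The same instant, normalised to UTC, as the target might store it.
utc = local.astimezone(timezone.utc)

print(local.isoformat())  # 2016-06-01T14:30:00+02:00
print(utc.isoformat())    # 2016-06-01T12:30:00+00:00

# The string representations differ, yet the instants are identical —
# comparisons must normalise to one zone (e.g. UTC) before matching.
assert local.isoformat() != utc.isoformat()
assert local == utc
```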
What is the grey area?
Historical data
If historical data is going to be archived as part of the migration, very often full-blown data testing is considered out of scope
However, the following tests are still required:
• Rudimentary tests like record and object counts
• Simple verification that archived historical data is accessible and displays in a valid manner
• Accessibility of historical data by the user groups who need access
Offline reporting tools
Many migrations include the aim of getting rid of the myriad of spreadsheets used in the business to manipulate and generate data reporting
• A typical scenario, found at roughly 80% of businesses, is that Excel is the final reporting tool for execs and the board
However, the following is still required:
• 100% surety that these reports have something to replace them
• 100% surety that none of these spreadsheets contain any data not present in the target platform
• 100% surety that the business does not rely on any of these reports
If 100% surety cannot be achieved, these offline reports immediately become part of the overall migration scope, and need to be addressed, fixed and tested!
What do we need?
Assumptions of requirements:
Migration Aim 1: non-disruptive to normal business
• Normal business users should ideally not be (physically) aware that a migration is taking place, i.e. no dips in performance, no interruptions, no extra work
• To achieve this, source system(s) should be cloned to (temporary) systems where migration activities can be executed
Migration Aim 2: target(s) should be flawless after the migration
• All possible scenarios should be tested, including destructive testing and fail-over
• This is more a task for the systems team, so, to not interfere with migration/implementation/user acceptance testing tasks, the target system(s) should be cloned
Migration Aim 3: target(s) must be functionally operable without workarounds
• User acceptance testing should include any and all possible operations, reports and interfacing between systems
• Data integrity and quality should be of such a standard that there are no hiccups when going live
Migration Aim 4: data testing should ideally be independent from the migration builders
• Independent verification of data transformations, enrichments and manipulations
But… the available budget will be the main decision factor!
So, what should the ideal landscape look like?
Building the ideal migration landscape
[Diagram: source system(s) and target system(s), each with their own clone(s), connected through a staging area, with a separate testing environment alongside]
How… do we test data for completeness?… do we test data for accuracy?
Data Migration Testing: How?
How do we test data for completeness?
Technical testing
Do we know where each record from the source has gone?
• The end game is decommissioning, so each bit of data must be accounted for
• Whether migrating, archiving or truncating old data, each record must have a target, even if the target is the bin!
Of the data that ends up in the target(s), do we have the same or an equivalent number of records as those earmarked for migration in the source(s)?
• DQ rules could have combined records from the source(s) on the target(s)
• Enrichment could have caused extra data to be added to the target(s)
• Summarisation of e.g. historical sales orders could have resulted in one record in the target, and multiple in the archiving system
• All these transformations have to be built into the reconciliation engine, and in the end, the count of the source(s) must match the count of the target(s)
Functional testing (during UAT)
Can the users do their normal day-to-day work?
• No manual intervention or “workarounds” needed
Do reports from source(s) and target(s) have the same values?
Are all interfaces working as expected?
• Do all integrated systems respond in the expected manner?
• Are the results from these systems fit for purpose on the target system(s)?
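The "each record must have a target, even if the target is the bin" rule can be sketched as a small reconciliation check. The record IDs and the three dispositions below are hypothetical, purely to illustrate the shape of the check: every source record must be accounted for exactly once.

```python
# Hypothetical source record IDs and their agreed dispositions.
source_ids = {101, 102, 103, 104, 105}
migrated   = {101, 102, 103}
archived   = {104}
discarded  = {105}   # the "bin" is still a valid, recorded target

# Completeness: every source record has exactly one disposition.
accounted = migrated | archived | discarded
assert accounted == source_ids, f"unaccounted records: {source_ids - accounted}"

# No record may be claimed by two dispositions at once.
assert not (migrated & archived)
assert not (migrated & discarded)
assert not (archived & discarded)

print(f"all {len(source_ids)} source records accounted for")
```

In a real reconciliation engine the sets would be driven from the extraction, archiving and load logs, and the combining/summarising transformations mentioned above would be folded into the counts before comparison.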
How do we test data for accuracy?
Read data from the source
Read data from the target
Add transformations and/or enrichments:
• Apply the same actions to the source data, or
• Reverse-apply the actions to the target data
Compare data on a byte-for-byte level (e.g. a simplified data set with 3 resultant columns):

select c1, c2, c3, sum(chk)
from (
    select s.c1, s.c2, s.c3, -1 as chk from source s
    union all
    select t.c1, t.c2, t.c3, 1 as chk from target t
)
group by c1, c2, c3
having sum(chk) <> 0

Any results? Error in migration!
A lot of work to build, extremely complex, and a lot of time is needed for the comparison, but… accuracy is 100% guaranteed!
Any questions?
Data Migration Testing: Questions?
Thank you for your attention!
Data Migration Testing: The End