dmm9 - data migration testing
TRANSCRIPT
© 2016, Data Maven Limited
Data Migration Testing
The most important part of a migration project
Data Migration Testing: Agenda
The key to migration success… test, test, test, and test again!
Where… does it fit in the PDM v2 landscape?
When… does the testing of data start during a migration project?… and when does it end?
What… is the scope of the data to be tested?… is the grey area?… do we need?
How… do we test for completeness?… do we test for accuracy?
Questions
Test, test, test, and test again!
(and repeat if necessary)
Data Migration Testing: The key to Migration Success
Test, test, test, and test again!
Independent of the approach used in the migration (big bang or staggered, waterfall or agile), sufficient test cycles should always be part of the overall project plan
Best practice (based on experience, but flexible around budget and timelines): at least 3 full test cycles should be scheduled, with partial cycles in between (if needed)
• Full cycles are used to test the end-to-end readiness of the target system(s)
• Partial cycles are used to (iteratively) fix problems found in User Acceptance Testing
• Partial cycles can be as small as a single object, or as big as a 90% redo of the whole migration
• If more than 90% needs to be fixed after a UAT cycle, something is terribly wrong with the whole project, and it might be better to start from scratch…
Use of the actual target platform is recommended for all full test cycles
• There is no point testing a full migration on scaled-down versions of the source or target platforms, as it will completely distort any estimation of cut-over windows
A complete dry-run of the end-to-end migration should preferably be scheduled as the last full test cycle
• Used as a dress rehearsal for the final migration, and to fine-tune the cut-over window
Every iteration should result in a more complete picture
The target will be evolving, based on implementation changes, and thus the data migration will grow!
If in doubt about completeness or accuracy, beg for a project extension to fit in another test cycle, rather than ending up with a failed project!
Where does testing fit into the PDM v2 landscape?
Data Migration Testing: Where?
Where does testing fit into the PDM v2 landscape?
Traditionally, the testing of migrated data is treated as a subset of the actual migration tooling
But it should be much larger: it should be part of every activity that actually touches data!
[Diagram: the PDM v2 landscape, split into business engagement and technical streams — Migration strategy and governance (MSG), Landscape analysis (LA), Gap analysis and mapping (GAM), Data quality rules (DQR), Migration design and execution (MDE), Key data stakeholder management (KDSM), System retirement plan (SRP), Legacy decommissioning (LD); supporting tools: profiling tool, data quality tool, migration controller, DMZ]
When… does the testing of data start during a migration project?… and when does it end?
Data Migration Testing: When?
When does the testing of data start during a migration project?
Simple answer: right at the start!
Planning phase:
Starting with the planning phase of the project, allow for a parallel stream of testing
Discovery phase:
During discovery, get indicative counts from the business
• How many customers do you have?
• How many products do you manufacture?
Analysis phase:
During the initial data analysis, compare the indicative counts with what is found in the source system(s), and report back
• Found 1,200 customer records, but you said you have only 1,000 active customers? What do we do with the rest? Archive/discard/reactivate?
• Found 5,000 products, but you stated that you manufacture around 2,000? Does the product master we have analysed contain sub-assemblies, and if so, how would we identify those?
Refine the counts based on the findings
Extract phase:
During the development of the ETL, test the volumes of extracted records against the refined counts
• Are we missing records? Is this due to incorrect selection conditions, or missing data? (Caused by inner joins instead of outer joins used in combined extraction queries?)
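The inner-vs-outer-join pitfall mentioned above is easy to demonstrate. The sketch below uses hypothetical customer and address tables in an in-memory SQLite database (table and column names are invented for illustration): a customer with no address row silently disappears from an inner-join extraction, which is exactly the kind of volume discrepancy the refined counts should catch.

```python
import sqlite3

# Hypothetical source tables: one customer has no address record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (customer_id INTEGER, city TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO address  VALUES (1, 'London'), (2, 'Leeds');
""")

# Inner join: customer 3 is silently dropped from the extract.
inner = conn.execute(
    "SELECT COUNT(*) FROM customer c JOIN address a ON a.customer_id = c.id"
).fetchone()[0]

# Left outer join: all customers survive, missing address fields are NULL.
outer = conn.execute(
    "SELECT COUNT(*) FROM customer c LEFT JOIN address a ON a.customer_id = c.id"
).fetchone()[0]

print(inner, outer)  # 2 3 — the inner join lost a record
```

Comparing both counts against the refined business count (3 customers here) immediately flags the inner-join extract as incomplete.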
When does testing end?
Traditional answer: right at the end!
Testing should be the final step before the target system(s) is(are) handed over for User Acceptance Testing
During the test cycles, all tests should be completed before handing over for (partial) UAT
During the final migration:
• Detailed comparison of all business-critical data, like product catalogues, financials and manufacturing parameters needed for day-to-day operations, must be completed
• Detailed comparison of historical data, if it was migrated, can linger, as long as it is completed before the first reporting runs that rely on this data
• Best practice suggests a 90% to 95% faultless data comparison is acceptable for sign-off
  o Depends on the number of test cycles used during the project, and the success achieved during those cycles
  o Test cycles are primarily used to build confidence in the migration approach, and to (partially) test the target system(s)
Correct answer: testing should never stop!
In all migration projects, the test tools and the completeness of the test suite are a perfect starting point for ongoing data management, especially in a multi-system environment
It would be bad business practice not to exploit this fact, and to allow the test suite to be dismantled with the rest of the migration project
What… is the scope of the data to be tested?… is the grey area?… do we need?
Data Migration Testing: What?
What is the scope of the data to be tested?
Simple answer: every bit of data is in scope!
Normally, the business will only mention the source system(s) that are to be replaced and decommissioned
This is the obvious data that has to be tested, but…
Integrated systems: will the interfaces still work as expected?
• When enriching/transforming data during migration, does the interface still give the target the exact data it needs to understand and handle the incoming messages?
• Are the data formats extracted from the new systems the same as what is expected? Maybe longer or shorter string values?
• One format very often overlooked: dates! Are they in the same format? Are they expected to be in UTC? Or local time zone?
Online reporting tools (BI): will the data from the new system(s) flow into the cubes without problems?
• A new conversion layer might be necessary, or a transformation of the current BI platform, to allow seamless continuation of the existing cubes and reports
Offline reporting tools: will these still be fed in a correct manner?
• Very often Excel reports are linked to the underlying database using ODBC queries, e.g. PowerPivot or simply Excel tables
• Data sources need to be changed, and very often the queries adapted to produce the same output
• In an ideal world, these spreadsheets will be replaced, but we don’t live in an ideal world!
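The date pitfall above is worth a concrete illustration. This minimal sketch (times and offsets are invented for the example) shows how the same instant renders differently in a local zone and in UTC, so a naive string comparison between source and target fails unless both sides are normalised first:

```python
from datetime import datetime, timezone, timedelta

# Hypothetical timestamp stored in a local zone (UTC+2) on the source side.
local = datetime(2016, 6, 1, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# The same instant, normalised to UTC, as the target might store it.
utc = local.astimezone(timezone.utc)

print(local.isoformat())  # 2016-06-01T14:30:00+02:00
print(utc.isoformat())    # 2016-06-01T12:30:00+00:00

# The string representations differ, yet the instants are identical —
# comparisons must normalise to one zone (e.g. UTC) before matching.
assert local.isoformat() != utc.isoformat()
assert local == utc
```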
What is the grey area?
Historical data
If historical data is going to be archived as part of the migration, very often full-blown data testing is considered out of scope
However, the following tests are still required:
• Rudimentary tests like record and object counts
• Simple verification that archived historical data is accessible and displays in a valid manner
• Accessibility of historical data by the user groups who need access
Offline reporting tools
Many migrations include the aim of getting rid of the myriad of spreadsheets used in the business to manipulate and generate data reporting
• A typical scenario, found at roughly 80% of businesses, is that Excel is the final reporting tool for execs and the board
However, the following is still required:
• 100% surety that these reports have something to replace them
• 100% surety that none of these spreadsheets contain any data not present in the target platform
• 100% surety that the business does not rely on any of these reports
If 100% surety cannot be achieved, these offline reports immediately become part of the overall migration scope, and need to be addressed, fixed and tested!
What do we need?
Assumptions of requirements:
Migration Aim 1: non-disruptive to normal business
• Normal business users should ideally not be (physically) aware that a migration is taking place, i.e. no dips in performance, no interruptions, no extra work
• To achieve this, source system(s) should be cloned to (temporary) systems where migration activities can be executed
Migration Aim 2: target(s) should be flawless after the migration
• All possible scenarios should be tested, including destructive testing and fail-over
• This is more a task for the systems team, so, to not interfere with migration/implementation/user acceptance testing tasks, the target system(s) should be cloned
Migration Aim 3: target(s) must be functionally operable without workarounds
• User acceptance testing should include any and all possible operations, reports and interfacing between systems
• Data integrity and quality should be of such a standard that there are no hiccups when going live
Migration Aim 4: data testing should ideally be independent from the migration builders
• Independent verification of data transformations, enrichments and manipulations
But… the available budget will be the main decision factor!
So, what should the ideal landscape look like?
Building the ideal migration landscape
[Diagram: source system(s) and target system(s), each with their own clone(s), connected through a staging area, with a separate testing environment alongside]
How… do we test data for completeness?… do we test data for accuracy?
Data Migration Testing: How?
How do we test data for completeness?
Technical testing
Do we know where each record from the source has gone?
• The end game is decommissioning, so each bit of data must be accounted for
• Whether migrating, archiving or truncating old data, each record must have a target, even if the target is the bin!
Of the data that ends up in the target(s), do we have the same or an equivalent number of records as those earmarked for migration in the source(s)?
• DQ rules could have combined records from the source(s) on the target(s)
• Enrichment could have caused extra data to be added to the target(s)
• Summarisation of e.g. historical sales orders could have resulted in one record in the target, and multiple in the archiving system
• All these transformations have to be built into the reconciliation engine, and in the end, the count of the source(s) must match the count of the target(s)
Functional testing (during UAT)
Can the users do their normal day-to-day work?
• No manual intervention or “workarounds” needed
Do reports from source(s) and target(s) have the same values?
Are all interfaces working as expected?
• Do all integrated systems respond in the expected manner?
• Are the results from these systems fit for purpose on the target system(s)?
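The "each record must have a target, even if the target is the bin" rule can be sketched as a small reconciliation check. The record IDs and the three dispositions below are hypothetical, purely to illustrate the shape of the check: every source record must be accounted for exactly once.

```python
# Hypothetical source record IDs and their agreed dispositions.
source_ids = {101, 102, 103, 104, 105}
migrated   = {101, 102, 103}
archived   = {104}
discarded  = {105}   # the "bin" is still a valid, recorded target

# Completeness: every source record has exactly one disposition.
accounted = migrated | archived | discarded
assert accounted == source_ids, f"unaccounted records: {source_ids - accounted}"

# No record may be claimed by two dispositions at once.
assert not (migrated & archived)
assert not (migrated & discarded)
assert not (archived & discarded)

print(f"all {len(source_ids)} source records accounted for")
```

In a real reconciliation engine the sets would be driven from the extraction, archiving and load logs, and the combining/summarising transformations mentioned above would be folded into the counts before comparison.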
How do we test data for accuracy?
Read data from the source
Read data from the target
Add transformations and/or enrichments:
• Apply the same actions to the source data, or
• Reverse-apply the actions to the target data
Compare data on a byte-for-byte level (e.g. a simplified data set with 3 resultant columns):

select c1, c2, c3, sum(chk)
from (
    select s.c1, s.c2, s.c3, -1 as chk from source s
    union all
    select t.c1, t.c2, t.c3, 1 as chk from target t
)
group by c1, c2, c3
having sum(chk) <> 0

Any results? Error in migration!
A lot of work to build, extremely complex, and a lot of time is needed for the comparison, but… accuracy is 100% guaranteed!
Any questions?
Data Migration Testing: Questions?
Thank you for your attention!
Data Migration Testing: The End