
Data Warehouse Testing Strategy

Ver 1.0
Kirti Bhushan


Table of Contents

Introduction
    About Data Warehousing
    Need for Data Warehouse testing
    Challenges to Data Warehouse testing
Functional Testing Model
Data Warehouse Testing Model
    Project Definition Phase
    Test Design Phase
    Test Development Phase
    Test Execution Phase
    Acceptance
Data Warehouse Testing Architecture
Goals of Data Warehouse Testing
    Data Completeness Testing
    Data Transformation Testing
    Data Quality Testing
    Non-Functional Testing
        Error log and Audit log
        Backup and Recovery Testing
    Security Testing
        System Security
        Application Security
    Performance and Scalability
    Integration Testing
    Reports Testing
    User Acceptance Testing
    Regression Testing
Scope of Testing
Roles and Responsibilities
Artifacts / Deliverables
    Software Project Plan (SPP)
    System Test Plan
    System Test Cases/Test Plan and Scripts (TPS)
    Tools and Automation in Data Warehousing
References


Introduction

About Data Warehousing

A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. A data warehouse can be formally defined in terms of four characteristics:

- Subject-oriented: the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together.
- Time-variant: changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
- Non-volatile: data in the database is never over-written or deleted; once committed, the data is static, read-only, and retained for future reporting.
- Integrated: the database contains data from most or all of an organization's operational applications, and this data is made consistent.

Need for Data Warehouse testing

Businesses are increasingly focusing on the collection and organization of data for strategic decision-making. The ability to review historical trends and monitor near real-time operational data has become a key competitive advantage. The cost of finding software defects grows exponentially the later they are found in the development lifecycle. In data warehousing this is compounded by the additional business cost of using incorrect data to make critical business decisions. Given the importance of early detection of software defects, we need a strategy to effectively test an ETL application.

Challenges to Data Warehouse testing

Challenges to data warehouse testing can be summarized as below:

- Data selection from multiple source systems, and the analysis that follows, pose a great challenge.
- Volume and complexity of the data.
- Inconsistent and redundant data in a data warehouse.
- Loss of data during the ETL process.
- Non-availability of a comprehensive test bed.
- Data critical to the business.
- Data quality not assured at source.
- Very high cost of quality, because any defect slippage translates into significantly high costs for the organization.
- 100% data verification is not always feasible, so more stress is placed on the ETL components to ensure data behaves as expected within these modules.

Data Warehouse Testing is Different from Traditional Testing

Data warehouse testing differs from traditional testing in many ways.

Data warehouse population is carried out mostly through batch runs, so the testing differs from what is done in transaction systems. Unlike testing for a typical transaction system, data warehouse testing differs on the following counts:

User-triggered vs. system-triggered
Most production/source system testing covers the processing of individual transactions, which are driven by input from users (application forms, servicing requests). Very few test cycles cover system-triggered scenarios (such as billing or valuation). In a data warehouse, most of the testing is system-triggered, driven by the ETL (Extraction, Transformation and Loading) scripts, the view refresh scripts, and so on. Data warehouse testing is therefore typically divided into two parts: 'back-end' testing, where the source system data is compared to the end-result data in the loaded area, and 'front-end' testing, where users check the data by comparing their MIS with the data displayed by end-user tools such as OLAP.

Batch vs. online gratification
This is something which makes it a challenge to retain users' interest. A transaction system provides instant, or at least overnight, gratification: users enter a transaction which is either processed online or at most via an overnight batch. In a data warehouse, most of the action happens at the back-end, and users have to trace individual transactions through to the MIS and the views produced by the OLAP tools. It is the same challenge as asking users to test the mammoth month-end reports and financial statements churned out by the transaction systems.

Volume of test data
The test data in a transaction system is a very small sample of the overall production data. Typically, to keep matters simple, we include just as many test cases as are needed to comprehensively cover all possible test scenarios in a limited set of test data. A data warehouse typically needs large test data volumes, as one tries to fill up the maximum possible combinations and permutations of dimensions and facts. For example, if you are testing the location dimension, you would like the location-wise sales revenue report to show revenue figures for most of the 100 cities and the 44 states. This means you need thousands of sales transactions at the sales office level (assuming that sales office is the lowest level of granularity for the location dimension).
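As a quick illustration, a coverage query can confirm how much of the location dimension the test data actually exercises. This is only a sketch; fact_sales and dim_location are hypothetical table names standing in for the warehouse's actual star schema.

-- Coverage check for the location dimension example: how many cities
-- actually have sales revenue in the test data.
SELECT COUNT(DISTINCT l.city) AS cities_with_revenue
FROM fact_sales f
JOIN dim_location l ON l.location_key = f.location_key
WHERE f.sales_amount > 0;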


Possible scenarios / test cases
If a transaction system has, say, a hundred different scenarios, the valid and possible combinations of those scenarios are limited. In a data warehouse, however, the permutations and combinations one can possibly test are virtually unlimited, because the core objective of a data warehouse is to allow all possible views of the data. In other words, 'you can never fully test a data warehouse', so one has to be creative in designing test scenarios to gain a high level of confidence.

Test data preparation
This is linked to the points on possible test scenarios and volume of data. Given that a data warehouse needs a lot of both, the effort required to prepare test data is much greater.

Programming for testing
In transaction systems, users and business analysts typically test the output of the system. In a data warehouse, as most of the action happens at the back-end, most of the data quality and ETL testing is done by running separate stand-alone scripts. These scripts compare, say, pre-transformation and post-transformation aggregates and throw out the pilferages. Users' roles come into play when their help is needed to analyze the discrepancies (if the designers or business analysts are not able to figure them out).
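A minimal sketch of such a stand-alone comparison script, assuming hypothetical stg_transactions and dw_transactions tables; mismatched counts or totals point to pilferage in the ETL.

-- Stand-alone pre- vs post-transformation aggregate comparison.
SELECT 'source' AS side, COUNT(*) AS row_cnt, SUM(amount) AS total_amount
FROM stg_transactions
UNION ALL
SELECT 'target', COUNT(*), SUM(amount)
FROM dw_transactions;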


Functional Testing Model

The diagram below shows the phases of a conventional functional testing model.

The phases in this model are as follows:

1. Project Definition Phase
2. Functional Test Design Phase
3. Functional Test Case Preparation Phase
4. Functional Test Execution Phase
5. Functional Test Acceptance Phase


Data Warehouse Testing Model

The conventional functional testing model is tailored below to suit any Data Warehouse Testing project.

The data warehouse testing model has similar phases, but the activities within each phase are different and specific to data warehouse projects.

Project Definition Phase

Purpose
- Identify the project scope and understand the customer requirements
- Request infrastructure and human resources for the project
- Define the Software Project Plan

Entry Criteria
- Statement of Work, High Level Business Requirement Document


Tasks

Project Initiation
- Allocate Project Id
- Understand project requirements
- Prepare Project Kick-off request
- Submit Project Kick-off request for approval
- Prepare Work Order
- Submit Work Order for approval

Software Project Plan
- List the assumptions made
- Identify the deliverables of the project
- Define the development process and verification activities, and specify deviations from the standard process, if any
- Identify the project organization depending on the size of the project
- Identify the risks and the risk management plans for the project
- Review the estimated effort, schedule and milestones for the project provided in the estimation worksheet
- Identify the project management process
- Prepare the training plan for the project
- Identify the hardware, software, and other project-specific requirements for the project
- Identify the quality goals for the project
- List the verification activities and mention the deviations, if any
- List the invoicing schedule for the project
- Identify the project metrics to be collected
- Define the organization for Configuration Management activities
- Identify the configurable items, the libraries to store them and the version numbering scheme
- Define configuration control mechanisms
- Define configuration status accounting mechanisms
- Plan for configuration audits
- Prepare a Software Project Plan document
- Submit the Software Project Plan for review

Exit Criteria
- Reviewed and Approved Software Project Plan (SPP)


Test Design Phase

Purpose
- Define the test approach and develop a test design specification
- Prepare the associated test data

Entry Criteria
- Approved Software Project Plan (SPP), approved SOW, and approved Functional Specification or detailed Business Requirement Document (BRD)

Tasks

Prepare Test Design Specification and Master Test Plan
- Derive the detailed Master Test Plan
- Prepare the Master Test Plan based on the estimation
- Review the Functional Specification
- Identify at a high level the modules and their integration
- Identify major business (functional) processes
- Create positive, negative and destructive test scenarios
- Create test data requirements
- Define the timeline and resource schedule; plan resource loading for the activities defined in the project plan
- Assign a unique identifier to each test scenario and trace it to requirements (this may be a part of the TPS)
- Peer review of the Master Test Plan
- Review of the Master Test Plan by team members
- QA management review of the Master Test Plan

Exit Criteria
- Reviewed and Approved Test Design Specification and Software Test Plan (STP)

Test Development Phase

Purpose
- Identify and prepare test cases

Entry Criteria
- Functional specification, ETL specification, ER diagram, Report Design Document, approved Test Design Document and Software Test Plan (STP)


Tasks

Prepare Test Cases
- Create detailed test cases for all identified types of testing
- Create test cases for the functionality to be tested, covering all scenarios captured during the Test Design Phase
- Assign a unique identifier to each test case and trace it to requirements
- Map test cases to the corresponding high-level scenarios
- Perform peer review of the test cases; test cases need to be reviewed by the team members

Rework of Test Cases
- Modify test cases in case of any defects found during peer review
- Update the Traceability Matrix

Exit Criteria
- Reviewed and Approved System Test Cases/Test Plan and Scripts (TPS) document

Test Execution Phase

Purpose
- Test the product
- Prepare the test report
- Package and release the test deliverables

Entry Criteria
- Approved test cases, approved test data and unit-tested source code

Tasks

Execute Tests
- Execute tests for all identified testing types
- Execute the test cases as per the test case specification
- Record defects
- Prepare a test report for the test failures and defects found

Prepare Test Report
- Prepare a test report for each testing phase
- Prepare a test summary report based on the test incident reports

Package
- Package the test data and test results
- Prepare the test deliverable package release note
- Prepare the delivery note
- Prepare the release notes

Release
- Verify the test results package and release it

Exit Criteria
- Approved Test Results and Approved Test Summary

Acceptance

Purpose
- Verify test deliverables
- Obtain customer approval and sign-off

Entry Criteria
- Approved Test Deliverable Package

Tasks
- Support acceptance test
- Package and deliver

Exit Criteria
- Customer sign-off


Data Warehouse Testing Architecture

The architecture below depicts the various types of testing that can be performed for any data warehouse testing project.


Goals of Data Warehouse Testing

Listed below are the different types of testing needed to ensure the quality of the data warehouse.

- Data Completeness Testing: to ensure that all expected data is loaded.
- Data Transformation Testing: to ensure that all data is transformed correctly according to business rules and/or design specifications.
- Data Quality Testing: to ensure that the ETL application correctly rejects, substitutes default values, corrects or ignores, and reports invalid data.
- Non-Functional Testing: to ensure that an application or entire system can successfully recover from a variety of hardware, software or network malfunctions without loss of data or data integrity. It also involves verifying log files such as the audit log and error log.
- Security Testing: to ensure that only those users granted access to the system can access the applications, and only through the appropriate gateways.
- Performance and Scalability Testing: to ensure that data loads and queries perform within expected time frames and that the technical architecture is scalable.
- Integration Testing: to ensure that the ETL process functions well with other upstream and downstream processes.
- Reports Testing: to ensure consistency and accuracy of the data reported.
- User Acceptance Testing: to ensure the solution meets users' current expectations and anticipates their future expectations.
- Regression Testing: to ensure existing functionality remains intact each time a new release of code is completed.

Data Completeness Testing

One of the most basic tests of data completeness is to verify that all expected data loads into the data warehouse. This includes validating that all records, all fields and the full contents of each field are loaded.

Test Strategies to consider for Data completeness testing include:

- Comparing record counts between source data, data loaded to the warehouse, and rejected records.
- Comparing unique values of key fields between source data and data loaded to the warehouse. This is a valuable technique that points out a variety of possible data errors without doing a full validation on all fields.
- Utilizing a data profiling tool that shows the range and value distributions of fields in a data set. This can be used during testing and in production to compare source and target data sets and point out any data anomalies from source systems that may be missed even when the data movement is correct.
- Populating the full contents of each field to validate that no truncation occurs at any step in the process. For example, if the source data field is a string(30), make sure to test it with 30 characters.
- Testing the boundaries of each field to find any database limitations. For example, for a decimal(3) field include values of -99 and 999, and for date fields include the entire range of dates expected. Depending on the type of database and how it is indexed, it is possible that the range of values the database accepts is too small.
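A minimal SQL sketch of the first two strategies; src_orders, dw_orders and rejected_orders are hypothetical names standing in for the actual source, target and reject tables, with order_id as the key field.

-- Record-count reconciliation: source rows should equal loaded plus rejected.
SELECT
    (SELECT COUNT(*) FROM src_orders) AS source_count,
    (SELECT COUNT(*) FROM dw_orders) AS loaded_count,
    (SELECT COUNT(*) FROM rejected_orders) AS rejected_count;

-- Key comparison: key values present in the source but missing from the
-- warehouse indicate dropped records.
SELECT s.order_id
FROM src_orders s
LEFT JOIN dw_orders d ON d.order_id = s.order_id
WHERE d.order_id IS NULL;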

Data Transformation Testing

Validating that data is transformed correctly based on business rules can be the most complex part of testing an ETL application with significant transformation logic. One typical method is to pick some sample records and “stare and compare” to validate data transformations manually. This can be useful but requires manual testing steps and testers who understand the ETL logic. A combination of automated data profiling and automated data movement validations is a better long-term strategy. Here are some simple automated data movement techniques:

- Create a spreadsheet of scenarios of input data and expected results and validate these with the business customer. This is a good requirements elicitation exercise during design and can also be used during testing.
- Create test data that includes all scenarios. Enlist the help of an ETL developer to automate the process of populating data sets with the scenario spreadsheet, to allow for flexibility because scenarios will change.
- Utilize data profiling results to compare the range and distribution of values in each field between source and target data.
- Validate correct processing of ETL-generated fields such as surrogate keys.
- Validate that data types in the warehouse are as specified in the design and/or the data model.
- Set up data scenarios that test referential integrity between tables. For example, what happens when the data contains foreign key values not in the parent table?
- Validate parent-to-child relationships in the data. Set up data scenarios that test how orphaned child records are handled.
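The referential-integrity scenario above can be checked with an anti-join. This is a sketch assuming hypothetical fact_sales and dim_customer tables with a customer_key surrogate key.

-- Orphaned fact rows: foreign key values with no matching parent row
-- in the dimension table.
SELECT f.customer_key, COUNT(*) AS orphan_rows
FROM fact_sales f
LEFT JOIN dim_customer c ON c.customer_key = f.customer_key
WHERE c.customer_key IS NULL
GROUP BY f.customer_key;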


Data Quality Testing

Data quality is defined as ‘how the ETL system handles data rejection, substitution, correction and notification without modifying data.’ To ensure success in testing data quality, include as many data scenarios as possible.

Typically, data quality rules are defined during design, for example:

- Reject the record if a certain decimal field has nonnumeric data.
- Substitute null if a certain decimal field has nonnumeric data.
- Validate and correct the state field if necessary based on the ZIP code.
- Compare the product code to values in a lookup table, and if there is no match, load it anyway but report it to users.

Depending on the data quality rules of the application being tested, scenarios to test might include null key values, duplicate records in source data and invalid data types in fields (e.g., alphabetic characters in a decimal field). Review the detailed test scenarios with business users and technical designers to ensure that all are on the same page. Data quality rules applied to the data will usually be invisible to the users once the application is in production; users will only see what’s loaded to the database. For this reason, it is important to ensure that what is done with invalid data is reported to the users. These data quality reports present valuable data that sometimes reveals systematic issues with source data. In some cases, it may be beneficial to populate the “before” data in the database for users to view.
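A sketch of test queries for scenarios like these, against hypothetical stg_customer and stg_sales staging tables; TRY_CAST is SQL Server syntax, and other engines have their own equivalents.

-- Null key values that should have been rejected or defaulted.
SELECT * FROM stg_customer WHERE customer_no IS NULL;

-- Duplicate records on the natural key.
SELECT customer_no, COUNT(*) AS dup_count
FROM stg_customer
GROUP BY customer_no
HAVING COUNT(*) > 1;

-- Nonnumeric data in a field that should be decimal.
SELECT * FROM stg_customer
WHERE credit_limit IS NOT NULL
  AND TRY_CAST(credit_limit AS DECIMAL(12, 2)) IS NULL;

-- Product codes with no match in the lookup table
-- (rule: load anyway, but report to users).
SELECT s.product_code
FROM stg_sales s
LEFT JOIN lkp_product p ON p.product_code = s.product_code
WHERE p.product_code IS NULL;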

Non-Functional Testing

Error log and Audit log

Error log and audit log verification ensures the following:

- Error messages are in the proper format, as given in the ETL specification.
- The log file name is the same as the session name, with a timestamp and a .log extension.
- A test case exists to check whether mail is sent to the customer if there is a failure in loading the data.

Backup and Recovery Testing

Backup and recovery testing ensures that an application or entire system can successfully recover from a variety of hardware, software or network malfunctions without loss of data or data integrity.


To ensure maximum system availability and uptime, a proper backup plan must be prepared. The plan must include backup frequency, media and storage. All backup systems must be able to be restored easily and properly “take over” for the failed system without loss of data, transactions or valuable downtime. BI tools, which are mostly repository based (i.e., metadata stored in a relational database management system), pose a different challenge when it comes to recovery for configuration management purposes.

Recovery testing is an antagonistic test process in which the application or system is exposed to extreme conditions (or simulated conditions) such as device I/O failures or invalid database pointers/keys. Recovery processes are invoked, and the application/system is monitored and/or inspected to verify proper application, system and data recovery has been achieved. Both the database administrator and the system administrator plan and execute any such testing.

Security Testing

Security of a system has three aspects: authentication, access control and privileges. Security testing focuses on the following two key areas of security:

- System security: looks after authentication and access control, including login and remote access to the system.
- Application security: covers access to the data or business functions and the privileges thereof.

System Security

System security ensures that only those users granted access to the system can access the applications, and only through the appropriate gateways. Obviously, this is governed by the overall enterprise-wide security policy. Each individual user is assigned a unique user ID and password, subject to a governing password-changing policy. Generally such security is implemented through technologies such as Windows NT authentication and/or Lightweight Directory Access Protocol (LDAP) at the OS and network level, together with database security. The network administrator(s) and the system administrator(s) are responsible for setting up and governing such system security. Customized homegrown security implementations are also not uncommon, although they are giving way to the industry choices of OS/LDAP authentication.

A typical requirement encountered in DW/BI applications is what is known as single sign-on (SSO) capability. While the requirement is very simple, “users once logged on to their desktop using the local area network (LAN) user ID and password do not want to enter yet another user ID and password (which may be the same as or different from the LAN ID)”, implementing this feature calls for tight integration between the various DW/BI tools and the security application. The tools come with out-of-the-box integration features for NT/LDAP authentication. However, integration with homegrown security is always a challenge and needs to be tested thoroughly.

Database Security

The databases used in a DW/BI project have basically three different types of users:

- Database administrators: responsible for creating other users as well as creating and maintaining the database objects for the application.
- Developers: responsible for developing the data warehouse application, using ETL tools and/or BI tools.
- Dedicated application users: responsible for the database connection in the N-tier architecture for both ETL and BI tools within the production environment.

Database security testing will concentrate on testing the authentication and/or privileges of the above-referenced types of users.
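A minimal sketch of how such privileges might be granted (hypothetical role and object names; exact syntax varies by RDBMS), which the security tests would then verify both positively and negatively.

-- Developer role: full DML on staging objects in the development environment.
CREATE ROLE dw_developer;
GRANT SELECT, INSERT, UPDATE, DELETE ON stg_customer TO dw_developer;

-- Dedicated application account used by the ETL/BI tools: read-only
-- against the production warehouse tables.
CREATE ROLE dw_app_user;
GRANT SELECT ON dw_orders TO dw_app_user;

-- Negative test: connecting as dw_app_user, an INSERT into dw_orders
-- should fail with a permissions error.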

Application Security 

This type of security ensures that, based upon desired restrictions, users are limited to specific functions and data only after they have been authenticated successfully. In a DW/BI project, application security will include that found in the ETL tool and in the BI application as described below. 

ETL Tool Security

When it comes to an ETL tool, the following repository privileges will be tested: Server administrator, Repository administrator, Session operator, Using designer, Browsing repository, and Creating sessions and batches.

ETL security testing will concentrate on testing the ability to perform expected tasks while having particular repository privileges granted to an ETL tool user.

BI Tool Security

When it comes to a BI tool, application security involves the privileges of a given user. The privileges can be based on the user's expertise profile (e.g., power user versus basic user) and/or on functional subject area (access to sales reports versus marketing reports, or access to dashboard reports versus operational reports). Data-level access restrictions are also not uncommon (e.g., the East Region Manager should see East region data whereas the West Region Manager should see West region data only, while running the same report).


Generally, two separate user profiles are created in the BI tool and implemented by using security role objects:

- Basic users: those who will access the application through the Web environment. They will mostly run canned reports and are not allowed to create their own reports. The security testing will validate this.
- Power users: those who will be able to access the application both from the Web and from a client-server desktop environment. They can not only execute the canned reports but also create their own reports based on pre-existing reusable objects like metrics, attributes and filters. Some expert data stewards are given privileges to create new reusable objects.

Functional subject area-based access, or access restrictions, are implemented in a BI tool by using what is known as a security group. The access control list (ACL) of various reusable objects in a BI tool will have entries for the specific security groups that should be granted access with read/write/execute privileges. Individual end users are in turn assigned to these groups, thereby allowing or restricting access to specific objects.

The data-level access restrictions are implemented in a BI tool by what are called “security filters.” Security filters are associated with either an individual user or a security group. When these individuals and/or groups execute a report, the BI tool will automatically append the security filter condition to all SQL generated by the tool against the database, if applicable.
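As an illustration of a data-level security filter (hypothetical fact_sales table; the exact SQL a BI tool generates will differ):

-- Base report SQL as generated by the BI tool:
SELECT region, SUM(sales_amount) AS revenue
FROM fact_sales
GROUP BY region;

-- The same report run by the East Region Manager, with the user's
-- security filter appended automatically by the tool:
SELECT region, SUM(sales_amount) AS revenue
FROM fact_sales
WHERE region = 'East'
GROUP BY region;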

BI tool security testing will concentrate on testing the privileges of the basic user and the power user. It will also check subject area-specific access control based on the various group-level accesses, and it will check the password change policy. It is equally important to test that end users are not able to access reports to which they have not been granted access.

Apart from the two user profiles listed above, there are also BI tool administrator and developer roles. Developers will have privileges for using the BI tool architect and agent software via the desktop. The test cases used for testing the power user's security access can also be used to test the developer's security access. The BI tool administrator has all-inclusive privileges, so no separate testing is required.

Performance and Scalability

As the volume of data in a data warehouse grows, ETL load times can be expected to increase and performance of queries can be expected to degrade. This can be mitigated by having a solid technical architecture and good ETL design. The aim of the performance testing is to point out any potential weaknesses in the ETL design, such as reading a file multiple times or creating unnecessary intermediate files.

The following strategies will help discover performance issues:


- Load the database with peak expected production volumes to ensure that this volume of data can be loaded by the ETL process within the agreed-upon window.
- Compare these ETL loading times to loads performed with a smaller amount of data to anticipate scalability issues.
- Compare the ETL processing times component by component to point out any areas of weakness.
- Monitor the timing of the reject process and consider how large volumes of rejected data will be handled.
- Perform simple and multiple-join queries to validate query performance on large database volumes. Work with business users to develop sample queries and acceptable performance criteria for each query.
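A sample multiple-join query of the kind that might be timed at different data volumes (hypothetical star-schema names; capture elapsed times with the database's own tooling, e.g., SET STATISTICS TIME ON in SQL Server):

-- Multiple-join aggregate query to run against peak-volume data and
-- against smaller loads, comparing elapsed times to spot scalability issues.
SELECT d.calendar_year, l.state, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_location l ON l.location_key = f.location_key
GROUP BY d.calendar_year, l.state;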

Integration Testing

Typically, system testing only includes testing within the ETL application. The endpoints for system testing are the input and output of the ETL code being tested. Integration testing shows how the application fits into the overall flow of all upstream and downstream applications. When creating integration test scenarios, consider how the overall process can break, and focus on touch points between applications rather than within one application. Consider how process failures at each step would be handled and how data would be recovered or deleted if necessary.

Most issues found during integration testing are either data related or result from false assumptions about the design of another application. It is therefore important to run integration tests with production-like data. Real production data is ideal, but depending on the contents of the data, there could be privacy or security concerns that require certain fields to be randomized before using it in a test environment. As always, don't forget the importance of good communication between the testing and design teams of all systems involved. To help bridge this communication gap, gather team members from all systems together to formulate test scenarios and discuss what could go wrong in production. Run the overall process from end to end in the same order and with the same dependencies as in production. Integration testing should be a combined effort and not solely the responsibility of the team testing the ETL application.

Reports Testing

End user reporting is the final component of a DW/BI project. Reports provide the visual output to the end consumer. Typically, the BI reports are developed using an Online Analytical Processing (OLAP) tool like Cognos, Business Objects, Hyperion, MicroStrategy, etc. The reports run aggregate SQL queries against the data stored in the data mart and/or the DW and display them in a suitable format either on a Web browser or on a client application interface. Once the initial view is rendered, the reporting tool interface provides various ways of manipulating the information, such as sorting, pivoting, adding subtotals, and adding view filters to slice-and-dice the information further.

Keep in mind some special considerations while testing the reports:

The ETL process should be complete and the data mart must be populated.

BI tools generally have a SQL engine which will generate the SQL based on how the dimension and fact tables are mapped in the tool. Additionally, there may be some global or report-specific parameters set to handle very large database (VLDB) optimization requirements. As such, testing of the BI tool will concentrate on validating the SQL generated; this in turn validates the dimensional model and the report specification vis-à-vis the design.

Unit testing of the BI reports is recommended to test the layout format per the design mockup, style sheets, prompts and filters, and the attributes and metrics on the report. Unit testing will be executed in both the desktop and Web environments.

System testing of the BI reports should concentrate on various report manipulation techniques like the drilling, sorting and export functions of the reports in the Web environment.

Testing of the reports will require an initial load of data followed by two incremental loads of data.

Dashboard reports and/or documents need special consideration for testing, because they are high-visibility reports used by top management and because they have various charts, gauges and data points to provide visual insight into the performance of the organization in question.

There may be some trending reports, more specifically called comp reports, that compare the performance of an organizational unit over two time periods. Testing these reports needs special consideration, especially if a fiscal calendar is used instead of an English calendar for the time period comparison.

For reports containing derived metrics (for example, “cost per click,” which is defined as the sum of cost divided by sum of clicks) special focus should be paid to any subtotals. The subtotal row should use a “smart-total,” i.e., do the aggregation first and then do the division instead of adding up the individual cost per click of each row in the report.
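A sketch of the smart-total rule, with a hypothetical fact_clicks(campaign, cost, clicks) table: the subtotal must be computed as SUM(cost)/SUM(clicks), not as a sum of per-row ratios.

-- Correct "smart total": aggregate first, then divide.
SELECT campaign,
       SUM(cost) AS total_cost,
       SUM(clicks) AS total_clicks,
       SUM(cost) / NULLIF(SUM(clicks), 0) AS cost_per_click
FROM fact_clicks
GROUP BY campaign;

-- The defect the test should catch: a subtotal built as
-- SUM(cost / NULLIF(clicks, 0)), which misstates cost per click.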

Reports with “nonaggregateable” metrics (e.g., inventory at hand) also need special attention to the subtotal row. It should not, for example, add up the inventory for each week and show the inventory of the month.

During unit testing, all data formats should be verified against the standard. For example, metrics with monetary value should show the proper currency symbol, decimal point precision (at least two places) and the appropriate positive or negative sign. For example, negative numbers should be shown in red and enclosed in braces.


During system testing, while testing the drill-down capability of reports, care must be taken to verify that the subtotal in the drill-down report matches the corresponding row of the summary report. At times it is desirable to carry the parent attribute to the drill-down report; verify the requirements for this.

When testing a report containing conditional metrics, care should be taken to check the “outer join condition,” i.e., that the nonexistence of one condition is reflected appropriately with the existence of the other condition.

Reports with multilevel sorting need special attention in testing, especially if the multilevel sorting includes both attributes and metrics to be sorted.

Reports containing metrics at different dimensionality, and with percent-to-total metrics and/or cumulative metrics, need special attention to check that the subtotals are hierarchy-aware (i.e., they “break” or re-initialize at the appropriate levels).

User Acceptance Testing

The main reason for building a data warehouse application is to make data available to business users. Users know the data best, and their participation in the testing effort is a key component to the success of a data warehouse implementation. User-acceptance testing (UAT) typically focuses on data loaded to the data warehouse and any views that have been created on top of the tables, not the mechanics of how the ETL application works. Consider the following strategies:

Use data that is either from production or as near to production data as possible. Users typically find issues once they see the “real” data, sometimes leading to design changes.

Test database views comparing view contents to what is expected. It is important that users sign off and clearly understand how the views are created.

Plan for the system test team to support users during UAT. The users will likely have questions about how the data is populated and need to understand details of how the ETL works.

Consider how the users would require the data loaded during UAT and negotiate how often the data will be refreshed.


Regression Testing

Regression testing is revalidation of existing functionality with each new release of code. When building test cases, remember that they will likely be executed multiple times as new releases are created due to defect fixes, enhancements or upstream systems changes. Building automation during system testing will make the process of regression testing much smoother. Test cases should be prioritized by risk in order to help determine which need to be rerun for each new release. A simple but effective and efficient strategy to retest basic functionality is to store source data sets and results from successful runs of the code and compare new test results with previous runs. When doing a regression test, it is much quicker to compare results to a previous execution than to do an entire data validation again.
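A sketch of the compare-to-previous-run technique, assuming the results of the last certified run were stored in a hypothetical dw_orders_baseline table:

-- Rows that differ between the current load and the stored baseline;
-- running EXCEPT in both directions catches missing, extra and changed
-- rows. (EXCEPT is called MINUS in Oracle.)
SELECT * FROM dw_orders
EXCEPT
SELECT * FROM dw_orders_baseline;

SELECT * FROM dw_orders_baseline
EXCEPT
SELECT * FROM dw_orders;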


Scope of Testing

The section “Goals of Data Warehouse Testing” above details all the areas that can possibly be tested in a data warehouse project. For any specific project, the testing approach details the selected areas and types of testing to be performed, which will be a subset of the “Goals of Data Warehouse Testing”. After the testing approach is defined, the assumptions for the project are defined, followed by the in-scope and out-of-scope testing activities.


Roles and Responsibilities

The roles and responsibilities of the testers and of the Subject Matter Experts (SMEs) from the BI team are clearly defined as per the test approach. The usage of the SMEs at various phases of the testing project is also defined, and their roles and responsibilities need to be charted out clearly.


Artifacts / Deliverables

Software Project Plan (SPP)

The Software Project Plan (SPP) will be as per the applicable RBS IDC template. The high-level areas covered in the SPP are:

- Scope
- Project Planning
- Quality Planning
- Software Configuration Management Planning

System Test Plan

The System Test Plan will be as per the applicable RBS IDC template. The high-level areas covered in the Test Plan are:

- Purpose
- Project Synopsis
- Testing Environment
- Schedule for System Testing
- Scope of System Testing
- Test Data
- Test Cycles
- Out of Scope of System Testing
- Constraints of System Testing
- Entry Criteria of System Testing
- Suspension Criteria of System Testing
- Exit Criteria of System Testing
- Tools Used

System Test Cases/Test Plan and Scripts (TPS)

The System Test Cases/Scripts will be as per the applicable RBS IDC template. The high-level areas covered in the TPS are:

- High Level Scenarios
- Detailed Test Cases
- Predictions
- Test Execution Scripts (SQL)
- Updated Requirement Traceability Matrix


Tools and Automation in Data Warehousing

There are no standard guidelines on the tools that can be used for data warehouse testing. The majority of testing teams go with the tool that was used for the data warehouse implementation. A drawback of this approach is redundancy: the same transformation logic has to be developed for the DWH implementation and again for its testing. One may try selecting an independent tool for testing the DWH. For example, transformation logic implemented using a tool 'X' can be tested by reproducing the same logic in another tool, say 'Y'.

Tool selection also depends on the test strategy, viz., exhaustive verification, sampling, aggregation, etc. Reusability and scalability of the test suite being developed is a very important factor to consider. Tools with built-in test strategies help in deskilling. One should also focus on and explore areas of automation in data warehousing, and use other tools to automate such areas. An example of such an area is the automation of web applications used for test data generation.

ETL tools and automation

ETL technology, which helps in automating the process of data loading, comes in two types: one produces code, and the other produces run-time modules which can be parameterized. To get the real benefit of a data warehouse, you need to go through the pain of automating data loading into it from the various sources (depicted in the diagram below). ETL software can help you automate this process of loading data from the operational environment to the data warehouse environment.

Automation Plan for Data Warehouse Testing


Spend as much time as you can in the analysis phase itself, and understand as much as you can about the source system.

Your database and table structures will differ between source and destination, so create source-side queries that produce the result the user expects from the target data warehouse. Get the result set from the source certified by the customer for their needs.

Based on the data loading and data transformations done for the target data warehouse, create target-side queries that yield the same result set obtained in the step above. This seems very easy to do but often is not: when creating the schema, it is easy to miss fields required to join tables in queries. If you are able to fetch the same result set from the target data warehouse, you can get your design and data certified by the customer.
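A minimal sketch of such a certified query pair, with hypothetical operational (oltp_*) and warehouse (fact_/dim_) table names; the two queries must return identical result sets.

-- Source-side query against the operational system, certified by the customer.
SELECT c.region, SUM(o.amount) AS total_amount
FROM oltp_orders o
JOIN oltp_customers c ON c.customer_id = o.customer_id
GROUP BY c.region;

-- Target-side query against the warehouse; it must return the same result set.
SELECT d.region, SUM(f.amount) AS total_amount
FROM fact_orders f
JOIN dim_customer d ON d.customer_key = f.customer_key
GROUP BY d.region;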

You can use any tool, depending on the language your ETLs are written in and the RDBMS hosting your data warehouse. I would recommend C# and SQL Server SSIS if your ETLs are designed for SQL Server and the data warehouse is hosted on SQL Server. To conclude on the tools and automation scope in data warehouse testing: in data warehouse systems, ETLs are the tools that pull data from operational systems into the data warehouse, which is needed for various regulatory compliance and audit needs. Test automation of these ETLs can save a lot of time in data analysis and in the recurring monthly, quarterly, half-yearly and yearly efforts, depending on your data loading frequency.

Data warehouse testing: Best practices

Focus on Data Quality: if the data quality is ascertained, then the testing of the ETL logic is fairly straightforward. One need not even test 100% of the data; the ETL logic alone can be tested by pitching it against all possible data sets. However, signing off the data quality is no easy task, simply because of its sheer volume.

Identify Critical Business Scenarios: the sampling technique will have to be adopted many times in data warehouse testing. However, what constitutes a sample: 10%, 20% or 60%? Identifying critical business scenarios and including more test data for them is a route to success.

Automation: automate as much as possible! The data warehouse test suite will be used time and again as the database is periodically updated. Hence a regression suite should be built and be available for use at any time. This will save a great deal of time.


References

1. www.wikipedia.org
2. Bill Inmon, an early and influential practitioner [Wikipedia.org]
3. http://www.dmreview.com/
