etl testing

18
ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and Challenges Posted In | Database Testing | Last Updated: "February 12, 2015" This is a guest post by Vishal Chhaperia. If you want to publish your article please read our article guidelines . Today let me take a moment and explain my testing fraternity about one of the much in demand and upcoming skills for my tester friends i.e. ETL testing (Extract, Transform, and Load). This article will present you with a complete idea about ETL testing and what we do to test ETL process. It has been observed that Independent Verification and Validation is gaining huge market potential and many companies are now seeing this as prospective business gain. Customers have been offered different range of products in terms of service offerings, distributed in many areas based on technology, process and solutions. ETL or data warehouse is one of the offerings which are developing rapidly and successfully. Why do organizations need Data Warehouse? Organizations with organized IT practices are looking forward to

Upload: sandip-patil

Post on 08-Aug-2015

93 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Etl testing

ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and ChallengesPosted In | Database Testing | Last Updated: "February 12, 2015"

This is a guest post by Vishal Chhaperia. If you want to publish your article please read our article guidelines.

Today let me take a moment and explain my testing fraternity about one of the much in demand and upcoming skills for my tester friends i.e. ETL testing (Extract, Transform, and Load). This article will present you with a complete idea about ETL testing and what we do to test ETL process.

It has been observed that Independent Verification and Validation is gaining huge market potential and many companies are now seeing this as prospective business gain. Customers have been offered different range of products in terms of service offerings, distributed in many areas based on technology, process and solutions. ETL or data warehouse is one of the offerings which are developing rapidly and successfully.

Why do organizations need Data Warehouse?Organizations with organized IT practices are looking forward to create a next level of technology transformation. They are now trying to make themselves much more operational with easy-to-interoperate data. Having said that data is most important part of any organization, it may be everyday data or historical data. Data is backbone of any report and reports are the baseline on which all the vital management decisions are taken.

Most of the companies are taking a step forward for constructing their data warehouse to store and monitor real time data as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on distributed technology. ETL tool is employed in order to make a flawless integration between

Page 2: Etl testing

different data sources from different departments. ETL tool will work as an integrator, extracting data from different sources; transforming it in preferred format based on the business transformation rules and loading it in cohesive DB known are Data Warehouse.

Well planned, well defined and effective testing scope guarantees smooth conversion of the project to the production. A business gains the real buoyancy once the ETL processes are verified and validated by independent group of experts to make sure that data warehouse is concrete and robust.

ETL or Data warehouse testing is categorized into four different engagements irrespective of technology or ETL tools used:

New Data Warehouse Testing – New DW is built and verified from scratch. Data input is taken from customer requirements and different data sources and new data warehouse is build and verified with the help of ETL tools.

Migration Testing – In this type of project customer will have an existing DW and ETL performing the job but they are looking to bag new tool in order to improve efficiency.

Change Request – In this type of project new data is added from different sources to an existing DW. Also, there might be a condition where customer needs to change their existing business rule or they might integrate the new rule.

Report Testing – Report are the end result of any Data Warehouse and the basic propose for which DW is build. Report must be tested by validating layout, data in the report and calculation.

ETL Testing Techniques:

1) Verify that data is transformed correctly according to various business requirements and rules.2) Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.3) Make sure that ETL application appropriately rejects, replaces with default values and reports invalid data.4) Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability.

Apart from these 4 main ETL testing methods other testing methods like integration testing and user acceptance testing is also carried out to make sure everything is smooth and reliable.

ETL Testing Process:

Similar to any other testing that lies under Independent Verification and Validation, ETL also go through the same phase.

Business and requirement understanding Validating

Page 3: Etl testing

Test Estimation

Test planning based on the inputs from test estimation and business requirement

Designing test cases and test scenarios from all the available inputs

Once all the test cases are ready and are approved, testing team proceed to perform pre-execution check and test data preparation for testing

Lastly execution is performed till exit criteria are met

Upon successful completion summary report is prepared and closure process is done.

It is necessary to define test strategy which should be mutually accepted by stakeholders before starting actual testing. A well defined test strategy will make sure that correct approach has been followed meeting the testing aspiration. ETL testing might require writing SQL statements extensively by testing team or may be tailoring the SQL provided by development team. In any case testing team must be aware of the results they are trying to get using those SQL statements.

Difference between Database and Data Warehouse TestingThere is a popular misunderstanding that database testing and data warehouse is similar while the fact is that both hold different direction in testing.

 Database testing is done using smaller scale of data normally with OLTP (Online transaction processing) type of databases while data warehouse testing is done with large volume with data involving OLAP (online analytical processing) databases.

 In database testing normally data is consistently injected from uniform sources while in data warehouse testing most of the data comes from different kind of data sources which are sequentially inconsistent.

We generally perform only CRUD (Create, read, update and delete) operation in database testing while in data warehouse testing we use read-only (Select) operation.

Normalized databases are used in DB testing while demoralized DB is used in data warehouse testing.

There are number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of objects that are treated as essential for validation in ETL testing:– Verify that data transformation from source to destination works as expected– Verify that expected data is added in target system– Verify that all DB fields and field data is loaded without any truncation– Verify data checksum for record count match– Verify that for rejected data proper error logs are generated with all details– Verify NULL value fields– Verify that duplicate data is not loaded– Verify data integrity

Page 4: Etl testing

ETL Testing Challenges:

ETL testing is quite different from conventional testing. There are many challenges we faced while performing data warehouse testing. Here is the list of few ETL testing challenges I experienced on my project:– Incompatible and duplicate data.– Loss of data during ETL process.– Unavailability of inclusive test bed.– Testers have no privileges to execute ETL jobs by their own.– Volume and complexity of data is very huge.– Fault in business process and procedures.– Trouble acquiring and building test data.– Missing business flow information.

Data is important for businesses to make the critical business decisions. ETL testing plays a significant role validating and ensuring that the business information is exact, consistent and reliable. Also, it minimizes hazard of data loss in production.

Hope these tips will help ensure your ETL process is accurate and the data warehouse build by this is a competitive advantage for your business.

Database Testing – Practical Tips and Insight on How to Test DatabaseDatabase is one of the inevitable parts of a software application these days. It does not matter at all whether it is web or desktop, client server or peer to peer, enterprise or individual business, database is working at backend. Similarly, whether it is healthcare of finance, leasing or retail, mailing application or controlling spaceship, behind the scene a database is always in action.

Moreover, as the complexity of application increases the need of stronger and secure database emerges. In the same way, for the applications with high frequency of transactions (e.g. banking or finance application), necessity of fully featured DB Tool is coupled.

Currently, several database tools are available in the market e.g. MS-Access2010, MS SQL Server 2008 r2, Oracle 10g, Oracle Financial, MySQL, PostgreSQL, DB2 etc.  All of these vary in cost, robustness, features and security. Each of these DBs possesses its own benefits and drawbacks. One thing is certain; a business application must be built using one of these or other DB Tools.

Page 5: Etl testing

Before I start digging into the topic, let me comprehend the foreword. When the application is under execution, the end user mainly utilizes the ‘CRUD’ operations facilitated by the DB Tool.

C: Create – When user ‘Save’ any new transaction, ‘Create’ operation is performed.R: Retrieve – When user ‘Search’ or ‘View’ any saved transaction, ‘Retrieve’ operation is performed.U: Update – when user ‘Edit’ or ‘Modify’ an existing record, the ‘Update’ operation of DB is performed.D: Delete – when user ‘Remove’ any record from the system, ‘Delete’ operation of DB is performed.

It does not matter at all, which DB is used and how the operation is preformed. End user has no concern if any join or sub-query, trigger or stored-procedure, query or function was used to do what he wanted. But, the interesting thing is that all DB operations performed by user, from UI of any application, is one of the above four, acronym as CRUD.

As a database tester one should be focusing on following DB testing activities:

What to test in database testing:

1) Ensure data mapping:

Make sure that the mapping between different forms or screens of AUT and the Relations of its DB is not only accurate but is also according to design documents. For all CRUD operations, verify that respective tables and records are updated when user clicks ‘Save’, ‘Update’, ‘Search’ or ‘Delete’ from GUI of the application.

2) Ensure ACID Properties of Transactions:

Page 6: Etl testing

ACID properties of DB Transactions refer to the ‘Atomicity’, ‘Consistency’, ‘Isolation’ and ‘Durability’. Proper testing of these four properties must be done during the DB testing activity. This area demands more rigorous, thorough and keen testing when the database is distributed.

3) Ensure Data Integrity:

Consider that different modules (i.e. screens or forms) of application use the same data in different ways and perform all the CRUD operations on the data. In that case, make it sure that the latest state of data is reflected everywhere. System must show the updated and most recent values or the status of such shared data on all the forms and screens. This is called the Data Integrity.

4) Ensure Accuracy of implemented Business Rules:

Today, databases are not meant only to store the records. In fact, DBs have been evolved into extremely powerful tools that provide ample support to the developers in order to implement the business logic at DB level. Some simple examples of powerful features of DBs are ‘Referential Integrity’, relational constrains, triggers and stored procedures. So, using these and many other features offered by DBs, developers implement the business logic on DB level. Tester must ensure that the implemented business logic is correct and works accurately.

Above points describe the four most important ‘What Tos’ of database testing. Now, I will put some light on ‘How Tos’ of DB Testing. But, first of all I feel it better to explicitly mention an important point. DB Testing is a business critical task, and it should never be assigned to a fresh or inexperienced resource without proper training.

How To Test Database:

1. Create your own Queries

In order to test the DB properly and accurately, first of all a tester should have very good knowledge of SQL and specially DML (Data Manipulation Language) statements. Secondly, the tester should acquire good understanding of internal DB structure of AUT. If these two pre-requisites are fulfilled, then the tester is ready to test DB with complete confidence. (S)He will perform any CRUD operation from the UI of application, and will verify the result using SQL query.

------------

This is the best and robust way of DB testing especially for applications with small to medium level of complexity. Yet, the two pre-requisites described are necessary. Otherwise, this way of DB testing cannot be adopted by the tester.

Moreover, if the application is very complex then it may be hard or impossible for the tester to write all of the needed SQL queries himself or herself. However, for some complex queries,

Page 7: Etl testing

tester may get help from the developer too. I always recommend this method for the testers because it does not only give them the confidence on the testing they have performed but, also enhance their SQL skill.

2. Observe data table by table

If the tester is not good in SQL, then he or she may verify the result of CRUD operation, performed using GUI of the application, by viewing the tables (relations) of DB. Yet, this way may be a bit tedious and cumbersome especially when the DB and tables have large amount of data.

Similarly, this way of DB testing may be extremely difficult for tester if the data to be verified belongs to multiple tables. This way of DB testing also requires at least good knowledge of Table structure of AUT.

3. Get query from developer

This is the simplest way for the tester to test the DB. Perform any CRUD operation from GUI and verify its impacts by executing the respective SQL query obtained from the developer. It requires neither good knowledge of SQL nor good knowledge of application’s DB structure.

So, this method seems easy and good choice for testing DB. But, its drawback is havoc. What if the query given by the developer is semantically wrong or does not fulfill the user’s requirement correctly? In this situation, the client will report the issue and will demand its fix as the best case. While, the worst case is that client may refuse to accept the application.

 

Conclusion:

Database is the core and critical part of almost every software application. So DB testing of an application demands keen attention, good SQL skills, proper knowledge of DB structure of AUT and proper training.

In order to have the confident test report of this activity, this task should be assigned to a resource with all the four qualities stated above. Otherwise, shipment time surprises, bugs identification by the client, improper or unintended application’s behavior or even wrong outputs of business critical tasks are more likely to be observed. Get this task done by most suitable resources and pay it the well-deserved attention

Page 8: Etl testing

Tips to design test data before executing your test casesPosted In | Testing best practices, Testing Skill Improvement, Testing Tips and resources | Last Updated: "February 12, 2015"

I have mentioned importance of proper test data in many of my previous articles. Tester should check and update the test data before execution of any test case. In this article I will provide tips on how to prepare test environment so that any important test case will not be missed by improper test data and incomplete test environment setup.

What do I mean by test data?

If you are writing test case then you need input data for any kind of test. Tester may provide this input data at the time of executing the test cases or application may pick the required input data from the predefined data locations. The test data may be any kind of input to application, any kind of file that is loaded by the application or entries read from the database tables. It may be in any format like xml test data, system test data, SQL test data or stress test data.

Preparing proper test data is part of the test setup. Generally testers call it as testbed preparation. In testbed all software and hardware requirements are set using the predefined data values.

If you don’t have the systematic approach for building test data while writing and executing test cases then there are chances of missing some important test cases. Tester can’t justify any bug saying that test data was not available or was incomplete. It’s every testers responsibility to create his/her own test data according to testing needs. Don’t even rely on the test data created by other tester or standard production test data, which might not have been updated for months! Always create fresh set of your own test data according to your test needs.

Sometime it’s not possible to create complete new set of test data for each and every build. In such cases you can use standard production data. But remember to add/insert your own data sets in this available database. One good way to design test data is use the existing sample test data or testbed and append your new test case data each time you get same module for testing. This way you can build comprehensive data set.

Page 9: Etl testing

How to keep your data intact for any test environment?

Many times more than one tester is responsible for testing some builds. In this case more than one tester will be having access to common test data and each tester will try to manipulate that common data according to his/her own needs. Best way to keep your valuable input data collection intact is to keep personal copies of the same data. It may be of any format like inputs to be provided to the application, input files such as word file, excel file or other photo files.

Check if your data is not corrupted:Filing a bug without proper troubleshooting is bad a practice. Before executing any test case on existing data make sure that data is not corrupted and application can read the data source.

How to prepare data considering performance test cases?

Performance tests require very large data set. Particularly if application fetching or updating data from DB tables then large data volume play important role while testing such application for performance. Sometimes creating data manually will not detect some subtle bugs that may only be caught by actual data created by application under test. If you want real time data, which is impossible to create manually, then ask your manager to make it available from live environment.

I generally ask to my manager if he can make live environment data available for testing. This data will be useful to ensure smooth functioning of application for all valid inputs.

Take example of my search engine project ‘statistics testing’. To check history of user searches and clicks on advertiser campaigns large data was processed for several years which was practically impossible to manipulate manually for several dates spread over many years. So there is no other option than using live server data backup for testing. (But first make sure your client is allowing you to use this data)

What is the ideal test data?

Test data can be said to be ideal if for the minimum size of data set all the application errors get identified. Try to prepare test data that will incorporate all application functionality, but not exceeding cost and time constraint for preparing test data and running tests.

------------

How to prepare test data that will ensure complete test coverage?

Design your test data considering following categories:Test data set examples:1) No data: Run your test cases on blank or default data. See if proper error messages are generated.

Page 10: Etl testing

2) Valid data set: Create it to check if application is functioning as per requirements and valid input data is properly saved in database or files.

3) Invalid data set: Prepare invalid data set to check application behavior for negative values, alphanumeric string inputs.

4) Illegal data format: Make one data set of illegal data format. System should not accept data in invalid or illegal format. Also check proper error messages are generated.

5) Boundary Condition data set: Data set containing out of range data. Identify application boundary cases and prepare data set that will cover lower as well as upper boundary conditions.

6) Data set for performance, load and stress testing: This data set should be large in volume.

This way creating separate data sets for each test condition will ensure complete test coverage.

Conclusion:

Preparing proper test data is a core part of “project test environment setup”. Tester cannot pass the bug responsibility saying that complete data was not available for testing. Tester should create his/her own test data additional to the existing standard production data. Your test data set should be ideal in terms of cost and time. Use the tips provided in this article to categorize test data to ensure complete functional test cases coverage.

Be creative, use your own skill and judgments to create different data sets instead of relying on standard production data while testing.

Database Testing – Properties of a Good Test Data and Test Data Preparation TechniquesPosted In | Database Testing | Last Updated: "February 12, 2015"

A couple of months ago, I wrote about database testing strategies. It covered the aspect that is entirely related to the execution of test cases. It was all about black-box testing of a database. There is another important aspect of DB testing activity which we will cover in this article.

As a tester, you have to test the ‘Examination Results’ module of the website of a university. Consider the whole application has been integrated and it is in ‘Ready for Testing’ state. ‘Examination Module’ is linked with ‘Registration’, ‘Courses’ and ‘Finance’ modules. Assume that you have adequate information of the application and you created a comprehensive list of

Page 11: Etl testing

test scenarios. Now you have to design, document and execute these test cases. In ‘Actions/Steps’ section of the test cases, you must mention the acceptable data as input for the test. The data mentioned in test cases must be selected properly. The accuracy of ‘Actual Results’ column of TC Document is primarily dependent upon the test data. So, step to prepare the input test data is significantly important. Thus, here is my rundown on “DB Testing – Test Data Preparation Strategies”.

Properties of Test Data:

The test data should be selected precisely and it must possess the following four qualities:

1. Realistic: By realistic, it means the data should be accurate in the context of real life e.g. in order to test ‘Age’ field, all the values should be positive and 18 or above. It is quite obvious that the candidates for an admission in the university are usually 18 years old (this might be defined in requirements).

2. Practically valid: This is similar to realistic but not the same. This property is more related to the business logic of AUT e.g. value 60 is realistic in age field but practically invalid for a candidate of Graduation or even Masters Programs. In this case, a valid range would be 18-25 years (this might be defined in requirements).

3. Versatile to cover scenarios: There may be several subsequent conditions in a single scenario, so choose the data shrewdly to cover maximum aspects of a single scenario with minimum set of data, e.g. while creating test data for result module, do not only consider the case of regular students who are smoothly completing their program. Give attention to the students who are repeating the same course and belong to different semesters or even different programs. The data set may look like this:

Sr# Student_ID Program_ID Course_ID Grade1 BCS-Fall2011-Morning-01 BCS-F11 CS-401 A2 BCS-Spring2011-Evening-14 BCS-S11 CS-401 B+3 MIT-Fall2010-Afternoon-09 MIT-F10 CS-401 A-… … … … …

Page 12: Etl testing

There might be several other interesting and tricky sub-conditions. E.g. the limitation of years to complete a degree program, passing a prerequisite course for registering a course, maximum no. of courses a student may enroll in a single semester etc. etc. Make sure to cover all these scenarios wisely with finite set of data.

4. Exceptional data (if applicable/required): There may be certain exceptional scenarios that are less frequent but demand high importance when occur, e.g. disabled students related issues.

------------

Test data preparation techniques:

We have briefly discussed the important properties of test data and it also elaborates how test data selection is important while database testing. Now let’s discuss the ‘techniques to prepare test data’.

There are only two ways to prepare test data:

Method 1. Insert New Data:

Get a clean DB and insert all the data as specified in your test cases. Once, all your required and desired data has been entered, start executing your test cases and fill ‘Pass/Fail’ columns by comparing the ‘Actual Output’ with ‘Expected Output’.  Sounds simple, right? But wait, it’s not that simple.

Few essential and critical concerns are as follows:

1. Empty instance of database may not be available2. Inserted test data may be insufficient for testing some cases like performance and load

testing.

3. Inserting the required test data into blank DB is not an easy job due to the database table dependencies. Because of this inevitable restriction, data insertion can become difficult task for tester.

4. Insertion of limited test data (just according to the test cases needs) may hide some issues that could be found only with the large data set.

5. For data insertion, complex queries and/or procedures may be required, and for this sufficient assistance or help from the DB developer(s) would be necessary.

Above mentioned five issues are the most important and the most obvious drawbacks of this technique for test data preparation. But if there are some advantages as well:

1. Execution of TCs becomes more efficient as the DB has the required data only.2. Bugs isolation requires no time as only the data specified in test cases present in the DB.

Page 13: Etl testing

3. Less time required for testing and results comparison.

4. Clutter-free test process

Method 2. Choose sample data subset from actual DB data:

This is the feasible and more practical technique for test data preparation. However it requires sound technical skills and demands detailed knowledge of DB Schema and SQL. In this method you need to copy and use production data by replacing some field values by dummy values. This is the best data subset for your testing as it represents the production data.  But this may not be feasible all the time due to data security and privacy issues.

This strategy deserves one separate post which we’ll discuss in next article ‘Database gray-box testing’ and precautions to take while testing database.

.