© Luxoft Training 2012
TEST Labs 2016
Data Warehouse (DWH) Testing
Юрий Слива
Luxoft
Course contents
1. Introduction.
2. Basic concepts and principles of DWH operation.
3. DWH testing. Where to start?
4. SQL (DDL, DML, DCL) and their use in testing.
5. Tips and tricks. Q&A.
DWH Testing
Introduction
Relational database
A relational database is a collection of data
items organized as a set of formally-described
tables from which data can be accessed or
reassembled in many different ways without
having to reorganize the database tables.
The standard user and application
program interface to a relational
database is the structured query
language (SQL).
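As a minimal sketch of the definition above, the same table can be read in different ways through SQL without reorganizing it. SQLite (via Python's standard `sqlite3` module) is used here only as a convenient stand-in for a real RDBMS; the `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 5.0)],
)

# View 1: individual rows for one customer.
rows_for_alice = conn.execute(
    "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id", ("alice",)
).fetchall()

# View 2: the same data reassembled as per-customer totals.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(rows_for_alice)
print(totals)
```

Both queries read the same stored rows; only the SQL changes, not the table definition.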
DWH Testing
Basic concepts and principles of DWH operation.
Why Is a Data Warehouse Separated from
Operational Databases?
• An operational database is constructed for well-
known tasks and workloads such as searching
particular records, indexing, etc.
• In contrast, data warehouse queries are often
complex and they present a general form of
data.
• Operational databases support concurrent
processing of multiple transactions.
• Concurrency control and recovery mechanisms
are required for operational databases to ensure
robustness and consistency of the database.
• An operational database query allows both read
and modify operations, while an OLAP query
needs only read-only access to stored data.
• An operational database maintains current data.
On the other hand, a data warehouse maintains
historical data.
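The contrast above can be sketched roughly as follows. This is an illustration only, using SQLite through Python's `sqlite3` module; the `account` and `account_history` tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per account, current state only.
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    INSERT INTO account VALUES (1, 100.0), (2, 50.0);

    -- Warehouse-style table: one row per account per day (history).
    CREATE TABLE account_history (account_id INTEGER, day TEXT, balance REAL);
    INSERT INTO account_history VALUES
        (1, '2016-01-01', 80.0), (1, '2016-01-02', 100.0),
        (2, '2016-01-01', 70.0), (2, '2016-01-02', 50.0);
""")

# Operational (OLTP-style): locate one record and modify it in a transaction.
with conn:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# Analytical (OLAP-style): read-only aggregation over historical rows.
daily_totals = conn.execute(
    "SELECT day, SUM(balance) FROM account_history GROUP BY day ORDER BY day"
).fetchall()

print(current_balance)
print(daily_totals)
```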
What is a Data Warehouse?
• A data warehouse is a database, which is kept
separate from the organization's operational
database.
• There is no frequent updating done in a data
warehouse.
• It possesses consolidated historical data, which
helps the organization to analyse its business.
• A data warehouse helps executives to organize,
understand, and use their data to take strategic
decisions.
• Data warehouse systems help to integrate
a diversity of application systems.
• A data warehouse system helps in consolidated
historical data analysis.
DWH Testing
DWH testing. Where to start?
DWH - high level
[Figure: high-level DWH data flow. Labels recoverable from the diagram: Feeds (Feed 1, Feed 2, Feed 3; pipe-delimited data; real-time feeds; JMS; Web Services); Static Data (Oracle DB, XLS and CSV files); ETL; Source data; Staging Area; Local storage area; Dimensions; Transformation area; Schema 1; Transformed data from Schema 1; SFTP; Shared Folder; Application areas 1-3 with Applications 1-3, business application specific data, and Reporting.]
DWH Testing Process
Test Preparation
The following tasks should be done in the test
preparation phase:
- Analyse requirements
- Create test plan
- Clarify open points
- Create test pack (test cases)
- Mitigate risks
Test Execution
• Executing test scripts and test cases is
the responsibility of the testers; test
results are recorded by the tester in
the bug tracking system.
• The tester records any defects
identified during test execution in the
Defect Management system.
• Defects are logged in the Defect
Management system according to the
Defect Management process definition.
DWH – feeds testing
Legend:
Parameter 1 - system parameter, i.e. the parameter is generated by the system
<Attribute> - parametrized XML parameter, i.e. the value of the tag is derived from one of the system fields
Line # Xpath (open tag) Input Parameter Xpath (close tag) R/O/C
1 <?xml version="1.0" encoding="UTF-8"?>
2 <publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency"
3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fpmlVersion="5-3"
4 xsi:schemaLocation="http://www.fpml.org/FpML-5/transparency
../../xmls/SDR/transparency/fpml-main-5-3.xsd">
5 <header>
6 <messageId messageIdScheme=" Data prefix "> Required
7 Internal TWH Message SID </messageId>
8 <sentBy> Data value </sentBy> Required
9 <sendTo>DTCCGTR</sendTo>
10 <creationTimestamp> Message Creation Date/Time </creationTimestamp>
11 </header>
12
-- Extract executionDateTime from the FpML message stored in MESSAGE_C
SELECT
  to_char(t2.EXECUTIONDATETIME2)
FROM SCHEMA_OWNER.TABLE t1,
  XMLTABLE
  (
    XMLNAMESPACES
    (
      'http://www.fpml.org/FpML-5/transparency' AS "ns"
    ),
    '//ns:publicExecutionReport'
    PASSING XMLType(t1.MESSAGE_C) -- must use the table alias t1 declared above
    COLUMNS -- columns for parsed values
      EXECUTIONDATETIME2 VARCHAR2(200) PATH
        '//ns:termination/ns:executionDateTime/text()'
  ) t2
WHERE
  t1.id = 1;
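When the message is available outside the database (for example, as a file taken from the feed), a similar extraction can be sketched with Python's standard `xml.etree.ElementTree`. The message below is a hypothetical, heavily shortened example; only the namespace URI, the `sendTo` value, and the `termination/executionDateTime` path are taken from the slides, and the timestamp is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Same namespace prefix mapping as XMLNAMESPACES in the SQL query.
NS = {"ns": "http://www.fpml.org/FpML-5/transparency"}

# Hypothetical, shortened message; real FpML messages contain many more elements.
message = """<?xml version="1.0" encoding="UTF-8"?>
<publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency" fpmlVersion="5-3">
  <header>
    <sendTo>DTCCGTR</sendTo>
  </header>
  <termination>
    <executionDateTime>2016-01-15T10:30:00Z</executionDateTime>
  </termination>
</publicExecutionReport>"""

# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(message.encode("utf-8"))

# Namespace-aware lookups; root is the publicExecutionReport element.
execution_dt = root.findtext(".//ns:termination/ns:executionDateTime", namespaces=NS)
send_to = root.findtext("./ns:header/ns:sendTo", namespaces=NS)

print(execution_dt, send_to)
```

The extracted values can then be compared with the corresponding source-system fields described in the feed specification.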
DWH – Staging Area
A staging area, or landing zone, is an intermediate storage area used for data
processing during the extract, transform and load (ETL) process. The data
staging area sits between the data source(s) and the data target(s), which are
often data warehouses, data marts, or other data repositories.[1]
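A minimal sketch of the idea, assuming a pipe-delimited feed: the rows are first landed in a staging table unchanged, and only then transformed into the target table. SQLite stands in for the real database; the feed content and the `stg_feed` and `dwh_payment` tables are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited feed (in practice this would be a file).
feed = io.StringIO("id|name|amount\n1|alice|10.50\n2|bob|7.25\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_feed (id TEXT, name TEXT, amount TEXT)")
conn.execute("CREATE TABLE dwh_payment (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")

# Extract: land the feed rows in staging as-is, all columns as text.
reader = csv.DictReader(feed, delimiter="|")
conn.executemany("INSERT INTO stg_feed VALUES (:id, :name, :amount)", list(reader))

# Transform and load: cast types and normalise names on the way to the target.
conn.execute(
    "INSERT INTO dwh_payment "
    "SELECT CAST(id AS INTEGER), UPPER(name), CAST(amount AS REAL) FROM stg_feed"
)

staged = conn.execute("SELECT COUNT(*) FROM stg_feed").fetchone()[0]
loaded = conn.execute("SELECT id, name, amount FROM dwh_payment ORDER BY id").fetchall()
print(staged, loaded)
```

Keeping the raw rows in staging lets a tester compare the feed, the staged data, and the target independently.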
List of the most popular ETL tools
• Informatica - Power Center
• IBM - WebSphere DataStage (formerly known
as Ascential DataStage)
• SAP - BusinessObjects Data Integrator
• IBM - Cognos Data Manager (Formerly known
as Cognos DecisionStream)
• Microsoft - SQL Server Integration Services
• Oracle - Data Integrator (Formerly known as
Sunopsis Data Conductor)
• SAS - Data Integration Studio
• Oracle - Warehouse Builder
• Ab Initio
• Information Builders - Data Migrator
• Pentaho - Pentaho Data Integration
• Embarcadero Technologies - DT/Studio
• IKAN - ETL4ALL
• IBM - DB2 Warehouse Edition
• Pervasive - Data Integrator
• ETL Solutions Ltd. - Transformation Manager
• Group 1 Software (Sagent) – DataFlow
• Sybase - Data Integration Suite ETL
• Talend - Talend Open Studio
• Expressor Software - Expressor Semantic Data
Integration System
• Elixir - Elixir Repertoire
• OpenSys - CloverETL
ETL Testing
Key points:
• Ensure that data is transformed correctly
• Ensure that the projected data is loaded into the
data warehouse without any data loss or truncation
• Ensure that the ETL application appropriately
rejects invalid data, replaces it with default
values, and reports it
• Make sure that the data is loaded into the data
warehouse within the prescribed and expected time
frames, to confirm scalability and performance
• All methods should have appropriate unit tests
regardless of visibility
• To measure their effectiveness, all unit tests
should use appropriate coverage techniques
• Strive for one assertion per test case
• Create unit tests that target exceptions
Testers' key responsibilities:
• Stage table testing
• Testing of the applied business transformation logic
• Testing of target table loading from a stage file or
table after applying a transformation.
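Two of the checks above (no data loss, correct transformation) can be sketched as a reconciliation between a stage table and a target table. This is an illustration under stated assumptions: SQLite stands in for the real database, the `stg_customer` and `dwh_customer` tables are hypothetical, and the assumed transformation rule is "trim and upper-case the name". SQLite's `EXCEPT` plays the role of Oracle's `MINUS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id INTEGER, name TEXT);
    CREATE TABLE dwh_customer (id INTEGER, name TEXT);
    INSERT INTO stg_customer VALUES (1, ' alice '), (2, 'bob'), (3, 'carol');
    -- Simulated ETL result with two defects: id 3 is missing, id 2 is truncated.
    INSERT INTO dwh_customer VALUES (1, 'ALICE'), (2, 'BO');
""")

# Check 1: row counts should match (a difference indicates data loss).
src_count = conn.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM dwh_customer").fetchone()[0]

# Check 2: re-apply the expected transformation to the source and subtract
# the target; every remaining row was lost or loaded incorrectly.
mismatches = conn.execute("""
    SELECT id, UPPER(TRIM(name)) FROM stg_customer
    EXCEPT
    SELECT id, name FROM dwh_customer
    ORDER BY id
""").fetchall()

print(src_count, tgt_count, mismatches)
```

An empty `mismatches` list together with equal counts would indicate that the load passed both checks for this rule.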
Mapping
Source -> Staging
Staging to CSV-file
DWH Testing
SQL (DDL, DML, DCL) and their use in testing
SQL(DDL, DML, DCL)
Data Definition Language (DDL) statements are used to define the database structure or schema. Examples:
CREATE - creates objects in the database
ALTER - alters the structure of the database
DROP - deletes objects from the database
TRUNCATE - removes all records from a table, including all space allocated for the records
COMMENT - adds comments to the data dictionary
RENAME - renames an object
Data Manipulation Language (DML) statements are used for managing data within schema objects. Examples:
SELECT - retrieves data from a database
INSERT - inserts data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all records if no WHERE clause is given); the space for the records remains
MERGE - UPSERT operation (insert or update)
CALL - calls a PL/SQL or Java subprogram
EXPLAIN PLAN - explains the access path to data
LOCK TABLE - controls concurrency
Data Control Language (DCL) statements are used to manage privileges. Examples:
GRANT - gives users access privileges to the database
REVOKE - withdraws access privileges given with the GRANT command
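A short sketch of the DDL/DML split, run against SQLite through Python's `sqlite3` module. The `trade` table is hypothetical. SQLite has no TRUNCATE, GRANT, or REVOKE, so those statements are not shown here; on Oracle they would be issued in the same way as the other statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and alter the structure.
conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("ALTER TABLE trade ADD COLUMN currency TEXT")
conn.execute("ALTER TABLE trade RENAME TO trade_stg")

# DML: manage the data inside the table.
conn.executemany(
    "INSERT INTO trade_stg (id, amount, currency) VALUES (?, ?, ?)",
    [(1, 10.0, "USD"), (2, 20.0, "EUR"), (3, 30.0, "USD")],
)
conn.execute("UPDATE trade_stg SET amount = amount * 2 WHERE id = 1")
conn.execute("DELETE FROM trade_stg WHERE currency = 'EUR'")
remaining = conn.execute("SELECT id, amount FROM trade_stg ORDER BY id").fetchall()

# DDL again: remove the object itself, not just its rows.
conn.execute("DROP TABLE trade_stg")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(remaining, tables_left)
```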
Tips and tricks
SEQUENCE
CREATE SEQUENCE name [START WITH first_value]
[INCREMENT BY increment_value];
PARTITION BY
SELECT col1, col2, SUM(col3) sum_col3
FROM Table
GROUP BY col1, col2;
SELECT id, col1, col2, SUM(col3)
OVER (PARTITION BY col1, col2) sum_col3
FROM Table;
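The difference between the two queries can be seen on a small data set: GROUP BY collapses each group into one row, while SUM(...) OVER (PARTITION BY ...) keeps every row and attaches the group total to it. The sketch below uses SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or newer); the table `t` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "a", "x", 10), (2, "a", "x", 5), (3, "b", "y", 7)],
)

# GROUP BY: one output row per (col1, col2) group.
grouped = conn.execute(
    "SELECT col1, col2, SUM(col3) AS sum_col3 "
    "FROM t GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()

# PARTITION BY: every input row is kept, with its group total alongside.
windowed = conn.execute(
    "SELECT id, col1, col2, SUM(col3) OVER (PARTITION BY col1, col2) AS sum_col3 "
    "FROM t ORDER BY id"
).fetchall()

print(grouped)
print(windowed)
```

In testing, the windowed form is handy when the detail columns (such as `id`) are needed next to the aggregate for comparison.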
ROW_NUMBER
RANK
SELECT *, ROW_NUMBER() OVER(ORDER BY type) num,
RANK() OVER(ORDER BY type) rnk
FROM WORK_PRN;
code  model  color  type    price  num  rnk
1     1276   n      Laser   259    3    3
2     1433   y      Jet     302    1    1
3     1434   y      Jet     243    2    1
4     1401   n      Matrix  139    5    5
5     1408   n      Matrix  280    6    5
6     1288   n      Laser   402    4    3
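The result table above can be reproduced with the sketch below, using the same six rows in SQLite (3.25 or newer) through Python's `sqlite3` module. One deliberate deviation, labeled as an assumption: `model` is added as a tie-breaker to the ROW_NUMBER ordering, because with ORDER BY type alone the numbering of rows with equal `type` is not guaranteed. RANK keeps the original ordering, so tied rows share a rank and the next rank is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_prn (code INTEGER, model INTEGER, color TEXT, type TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO work_prn VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1276, "n", "Laser", 259),
        (2, 1433, "y", "Jet", 302),
        (3, 1434, "y", "Jet", 243),
        (4, 1401, "n", "Matrix", 139),
        (5, 1408, "n", "Matrix", 280),
        (6, 1288, "n", "Laser", 402),
    ],
)

result = conn.execute("""
    SELECT code, type,
           ROW_NUMBER() OVER (ORDER BY type, model) AS num,  -- unique per row
           RANK()       OVER (ORDER BY type)        AS rnk   -- ties share a rank
    FROM work_prn
    ORDER BY code
""").fetchall()

for row in result:
    print(row)
```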
DWH Testing
Questions