Building the Data Warehouse: Transforming Data

6 Copyright © Oracle Corporation, 2002. All rights reserved. Building the Data Warehouse: Transforming Data

Upload: jescie-soto

Post on 04-Jan-2016

Page 1: Building the Data Warehouse: Transforming Data

Copyright © Oracle Corporation, 2002. All rights reserved.

Building the Data Warehouse: Transforming Data

Page 2: Building the Data Warehouse: Transforming Data

Objectives

After completing this lesson, you should be able to do the following:

• Define transformation

• Identify possible staging models

• Identify data anomalies and eliminate them

• Explain the importance of quality data

• Describe techniques for transforming data

• Design a transformation process

• List Oracle’s enhanced features and tools that can be used to transform data

Page 3: Building the Data Warehouse: Transforming Data

Transformation

Transformation eliminates anomalies from operational data:

• Cleans and standardizes

• Presents subject-oriented data

[Diagram: Operational systems → Extract → Data staging area (Transform: clean up, consolidate, restructure) → Load → Warehouse]

Page 4: Building the Data Warehouse: Transforming Data

Possible Staging Models

• Remote staging model

• On-site staging model

Page 5: Building the Data Warehouse: Transforming Data

Remote Staging Model

Data staging area within the warehouse environment:

[Diagram: Operational system → Extract → Staging area (Transform) → Load → Warehouse]

Data staging area in its own environment:

[Diagram: Operational system → Extract → Staging area (Transform) → Load → Warehouse]

Page 6: Building the Data Warehouse: Transforming Data

On-site Staging Model

Data staging area within the operational environment, possibly affecting the operational system

[Diagram: Operational system with its staging area (Transform) → Extract → Load → Warehouse]

Page 7: Building the Data Warehouse: Transforming Data

Data Anomalies

• No unique key

• Data naming and coding anomalies

• Data meaning anomalies between groups

• Spelling and text inconsistencies

CUSNUM    NAME                ADDRESS
90233479  Oracle Limited      100 N.E. 1st St.
90233489  Oracle Computing    15 Main Road, Ft. Lauderdale
90234889  Oracle Corp. UK     15 Main Road, Ft. Lauderdale, FLA
90345672  Oracle Corp UK Ltd  181 North Street, Key West, FLA

Page 8: Building the Data Warehouse: Transforming Data

Transformation Routines

• Cleaning data

• Eliminating inconsistencies

• Adding elements

• Merging data

• Integrating data

• Transforming data before load

Page 9: Building the Data Warehouse: Transforming Data

Transforming Data: Problems and Solutions

• Multipart keys

• Multiple local standards

• Multiple files

• Missing values

• Duplicate values

• Element names

• Element meanings

• Input formats

• Referential Integrity constraints

• Name and address

Page 10: Building the Data Warehouse: Transforming Data

Multipart Keys Problem

Multipart keys

Product code = 12 M 654313 45

Components of the key: country code, sales territory, product number, salesperson code
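A multipart key such as the one above can be split into separate columns during transformation. The following is a minimal sketch, assuming the segments are space-separated and appear in the order the slide lists them (country code, sales territory, product number, salesperson code). The names `ProductKey` and `split_product_code` are hypothetical.

```python
from typing import NamedTuple


class ProductKey(NamedTuple):
    # Assumed segment order, taken from the order of the labels on the slide.
    country_code: str
    sales_territory: str
    product_number: str
    salesperson_code: str


def split_product_code(code: str) -> ProductKey:
    """Split a space-separated multipart key into its four segments."""
    parts = code.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {code!r}")
    return ProductKey(*parts)


key = split_product_code("12 M 654313 45")
print(key.product_number)
```

Storing each segment in its own column lets the warehouse join and filter on, say, the sales territory without string parsing at query time.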

Page 11: Building the Data Warehouse: Transforming Data

Multiple Local Standards Problem

• Multiple local standards

• Tools or filters to preprocess

Examples of differing local standards:

• Units of measure: cm, inches

• Currencies: USD 600; 1,000 GBP; FF 9,990

• Date formats: DD/MM/YY, MM/DD/YY, DD-Mon-YY
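A preprocessing filter for local date standards might look like the sketch below. It assumes each source system is known to use exactly one of the three date layouts shown on the slide. The source names in `SOURCE_DATE_FORMATS` are hypothetical, and the ISO 8601 target format is an arbitrary choice for illustration.

```python
from datetime import datetime

# Hypothetical source systems and the date layout each one is assumed to use.
SOURCE_DATE_FORMATS = {
    "uk_orders": "%d/%m/%y",      # DD/MM/YY
    "us_orders": "%m/%d/%y",      # MM/DD/YY
    "oracle_export": "%d-%b-%y",  # DD-Mon-YY (month names follow the current locale)
}


def standardize_date(value: str, source: str) -> str:
    """Convert a source-specific date string to ISO 8601 (YYYY-MM-DD)."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(value, fmt).date().isoformat()


print(standardize_date("01/02/02", "uk_orders"))
print(standardize_date("01/02/02", "us_orders"))
```

The same input string yields different dates depending on the source, which is why the format has to be tied to the source rather than guessed from the value.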

Page 12: Building the Data Warehouse: Transforming Data

Multiple Files Problem

• Added complexity of multiple source files

• Start simple

[Diagram: Multiple source files → Logic to detect correct source → Transformed data]

Page 13: Building the Data Warehouse: Transforming Data

Missing Values Problem

Solution:

• Ignore

• Wait

• Mark rows

• Extract when time-stamped

[Diagram: missing values are replaced with a marker, for example: If NULL then field = ‘A’]
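The "mark rows" option can be sketched as follows. This is an illustration only: the function name `mark_missing` and the dictionary row representation are assumptions, and the marker value 'A' simply mirrors the slide.

```python
def mark_missing(rows, field, marker="A"):
    """Return copies of the rows with a missing `field` set to `marker`.

    Also returns how many rows were marked, so the load can report it.
    """
    marked = 0
    result = []
    for row in rows:
        row = dict(row)  # copy so the extracted data is left untouched
        if row.get(field) is None:
            row[field] = marker
            marked += 1
        result.append(row)
    return result, marked


rows = [{"id": 1, "region": "N"}, {"id": 2, "region": None}, {"id": 3}]
cleaned, count = mark_missing(rows, "region")
print(count, [r["region"] for r in cleaned])
```

Marking rather than dropping keeps the row available for analysis while making the gap visible to users.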

Page 14: Building the Data Warehouse: Transforming Data

Duplicate Values Problem

Solution:

• SQL self-join techniques

• RDBMS constraint utilities

[Diagram: the same customer, ACME Inc, appears three times]

SQL> SELECT ...
     FROM table_a, table_b
     WHERE table_a.key (+) = table_b.key
     UNION
     SELECT ...
     FROM table_a, table_b
     WHERE table_a.key = table_b.key (+);
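Outside the database, duplicate candidates can also be detected by grouping rows on a normalized key. The sketch below is not the SQL technique from the slide. It assumes that collapsing whitespace and ignoring case is a sufficient notion of "same value", which real customer data often violates, and `find_duplicates` is a hypothetical name.

```python
from collections import defaultdict


def find_duplicates(rows, key_field):
    """Group rows whose key matches after whitespace and case normalization."""
    groups = defaultdict(list)
    for row in rows:
        normalized = " ".join(row[key_field].split()).casefold()
        groups[normalized].append(row)
    # Keep only the groups that contain more than one row.
    return {key: group for key, group in groups.items() if len(group) > 1}


customers = [
    {"id": 1, "name": "ACME Inc"},
    {"id": 2, "name": "acme  inc"},
    {"id": 3, "name": "ACME Inc "},
    {"id": 4, "name": "Widget Co"},
]
for name, group in find_duplicates(customers, "name").items():
    print(name, [row["id"] for row in group])
```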

Page 15: Building the Data Warehouse: Transforming Data

Element Names Problem

Solution:

Common naming conventions

[Diagram: element names used in different sources (Customer, Client, Contact, Name) are brought under one common name, Customer]

Page 16: Building the Data Warehouse: Transforming Data

Element Meaning Problem

• Avoid misinterpretation

• Complex solution

• Document meaning in metadata

[Diagram: the element Customer_detail is interpreted in different ways: customer’s name, all customer details, or all details except name]

Page 17: Building the Data Warehouse: Transforming Data

Input Format Problem

ASCII / EBCDIC

12373 / “123-73”

ACME Co.

áøåëéí äáàéí Beer (Pack of 8)

Page 18: Building the Data Warehouse: Transforming Data

Referential Integrity Problem

Solution:

• SQL anti-join

• Server constraints

• Dedicated tools

Department table:

Department
10
20
30
40

Employee table:

Emp   Name    Department
1099  Smith   10
1289  Jones   20
1234  Doe     50
6786  Harris  60
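The anti-join idea is to find child rows whose foreign key has no matching parent. Below is a minimal Python sketch of that logic using the department and employee values from the slide. The function name `find_orphans` and the dictionary layout are assumptions for illustration.

```python
def find_orphans(child_rows, parent_keys, fk_field):
    """Return child rows whose foreign key is absent from the parent keys."""
    known = set(parent_keys)
    return [row for row in child_rows if row[fk_field] not in known]


departments = [10, 20, 30, 40]
employees = [
    {"emp": 1099, "name": "Smith", "department": 10},
    {"emp": 1289, "name": "Jones", "department": 20},
    {"emp": 1234, "name": "Doe", "department": 50},
    {"emp": 6786, "name": "Harris", "department": 60},
]

orphans = find_orphans(employees, departments, "department")
print([row["name"] for row in orphans])  # ['Doe', 'Harris']
```

Rows returned here would either be rejected, routed to an exception table, or repaired before loading, depending on the rules the warehouse team has agreed on.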

Page 19: Building the Data Warehouse: Transforming Data

Name and Address Problem

• Single-field format

• Multiple-field format

Single-field format:

Mr. J. Smith, 100 Main St., Bigtown, County Luth, 23565

Multiple-field format:

Name     Mr. J. Smith
Street   100 Main St.
Town     Bigtown
Country  County Luth
Code     23565

Database 1

NAME              LOCATION
DIANNE ZIEFELD    N100
HARRY H. ENFIELD  M300

Database 2

NAME              LOCATION
ZIEFELD, DIANNE   100
ENFIELD, HARRY H  300
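Converting the single-field record into the multiple-field layout can be sketched as a simple split. This assumes the components are comma-separated, always five in number, and never contain commas themselves. Real name and address data rarely behaves this well, which is why dedicated tools exist. `FIELDS` and `split_address` are hypothetical names, and the field labels follow the slide.

```python
# Field labels as they appear on the slide, in record order.
FIELDS = ("name", "street", "town", "country", "code")


def split_address(record: str) -> dict:
    """Split a comma-separated name-and-address string into labeled fields."""
    parts = [part.strip() for part in record.split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}")
    return dict(zip(FIELDS, parts))


parsed = split_address("Mr. J. Smith,100 Main St., Bigtown, County Luth, 23565")
print(parsed["street"])
```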

Page 20: Building the Data Warehouse: Transforming Data

Name and Address Processing in Oracle9i Warehouse Builder

Name and address mapping operator supports:

• Parsing

• Standardization

• Postal matching and geocoding

Page 21: Building the Data Warehouse: Transforming Data

Quality Data: Importance and Benefits

• Quality data:

– Key to a successful warehouse implementation

• Quality data helps you in:

– Targeting the right customers

– Determining buying patterns

– Identifying householders: private and commercial

– Matching customers

– Identifying historical data

Page 22: Building the Data Warehouse: Transforming Data

Quality: Standards and Improvements

• Setting standards:

– Define a quality strategy

– Decide on optimal data-quality level

• Improving operational data quality:

– Consider modifying rules for operational data

– Document the sources

– Create a data stewardship program

– Design the cleanup process carefully

– Initial cleanup and refresh routines may differ

Page 23: Building the Data Warehouse: Transforming Data

Data Quality Guidelines

Operational data:

• Should not be used directly in the warehouse

• Must be cleaned for each increment

• Is not simply fixed by modifying applications

Page 24: Building the Data Warehouse: Transforming Data

Data Quality: Solutions and Management

Solutions:

• COBOL, Java, 4GL

• Specialized tools

• Customized data conversion process

– Investigation

– Conditioning and standardization

– Integration

Management:

• Take responsibility

• Resolve problems

• Data quality manager

Page 25: Building the Data Warehouse: Transforming Data

Transformation Techniques

• Merging data

• Adding a Date Stamp

• Adding Keys to Data

Page 26: Building the Data Warehouse: Transforming Data

Merging Data

• Operational transactions do not usually map one-to-one with warehouse data.

• Data for the warehouse is merged to provide information for analysis.

Pizza sales/returns by day, hour, seconds

Sale 1/2/02 12:00:01 Ham Pizza $10.00

Sale 1/2/02 12:00:02 Cheese Pizza $15.00

Sale 1/2/02 12:00:02 Anchovy Pizza $12.00

Return 1/2/02 12:00:03 Anchovy Pizza - $12.00

Sale 1/2/02 12:00:04 Sausage Pizza $11.00

Page 27: Building the Data Warehouse: Transforming Data

Merging Data

Pizza sales

Sale 1/2/02 12:00:01 Ham Pizza $10.00

Sale 1/2/02 12:00:02 Cheese Pizza $15.00

Sale 1/2/02 12:00:04 Sausage Pizza $11.00

Pizza sales/returns by day, hour, seconds

Sale 1/2/02 12:00:01 Ham Pizza $10.00

Sale 1/2/02 12:00:02 Cheese Pizza $15.00

Sale 1/2/02 12:00:02 Anchovy Pizza $12.00

Return 1/2/02 12:00:03 Anchovy Pizza - $12.00

Sale 1/2/02 12:00:04 Sausage Pizza $11.00
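The pizza example suggests that a return cancels the sale it reverses, so neither row reaches the merged "Pizza sales" set. The sketch below is one possible reading of that rule, not the course's stated method. It assumes a return matches the most recent earlier sale of the same product and amount. Amounts are held in cents, and `merge_sales` is a hypothetical name.

```python
def merge_sales(transactions):
    """Drop each return together with the most recent sale it reverses."""
    sales = []
    for kind, timestamp, product, cents in transactions:
        if kind == "Sale":
            sales.append((timestamp, product, cents))
        elif kind == "Return":
            # Search backwards for a sale of the same product and amount.
            for i in range(len(sales) - 1, -1, -1):
                if sales[i][1] == product and sales[i][2] == -cents:
                    del sales[i]
                    break
    return sales


transactions = [
    ("Sale", "1/2/02 12:00:01", "Ham Pizza", 1000),
    ("Sale", "1/2/02 12:00:02", "Cheese Pizza", 1500),
    ("Sale", "1/2/02 12:00:02", "Anchovy Pizza", 1200),
    ("Return", "1/2/02 12:00:03", "Anchovy Pizza", -1200),
    ("Sale", "1/2/02 12:00:04", "Sausage Pizza", 1100),
]
for sale in merge_sales(transactions):
    print(sale)
```

With the slide's five transactions, the result contains the Ham, Cheese, and Sausage sales only.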

Page 28: Building the Data Warehouse: Transforming Data

Adding a Date Stamp

• Time element can be represented as a:

– Single point in time

– Time span

• Add time element to:

– Fact tables

– Dimension data

Page 29: Building the Data Warehouse: Transforming Data

Adding a Date Stamp: Fact Tables and Dimensions

Item Table: Item_id, Dept_id, Time_key

Store Table: Store_id, District_id, Time_key

Sales Fact Table: Item_id, Store_id, Time_key, Sales_dollars, Sales_units

Time Table: Week_id, Period_id, Year_id, Time_key

Product Table: Product_id, Time_key, Product_desc

Page 30: Building the Data Warehouse: Transforming Data

Adding Keys to Data

#1 Sale 1/2/98 12:00:01 Ham Pizza $10.00

#2 Sale 1/2/98 12:00:02 Cheese Pizza $15.00

#3 Sale 1/2/98 12:00:02 Anchovy Pizza $12.00

#5 Sale 1/2/98 12:00:04 Sausage Pizza $11.00

#4 Return 1/2/98 12:00:03 Anchovy Pizza - $12.00

#dw1 Sale 1/2/98 12:00:01 Ham Pizza $10.00

#dw2 Sale 1/2/98 12:00:02 Cheese Pizza $15.00

#dw3 Sale 1/2/98 12:00:04 Sausage Pizza $11.00

Data values or artificial keys
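Assigning artificial (surrogate) keys, as in the #dw1 to #dw3 rows above, can be sketched with a simple counter. The `dw` prefix follows the slide. The function name, the `dw_key` column, and the choice to keep the operational key alongside the new one are assumptions for illustration.

```python
from itertools import count


def add_warehouse_keys(rows, prefix="dw", start=1):
    """Return copies of the rows with a sequential artificial key added."""
    counter = count(start)
    return [{"dw_key": f"{prefix}{next(counter)}", **row} for row in rows]


merged_sales = [
    {"source_key": 1, "product": "Ham Pizza"},
    {"source_key": 2, "product": "Cheese Pizza"},
    {"source_key": 5, "product": "Sausage Pizza"},
]
for row in add_warehouse_keys(merged_sales):
    print(row["dw_key"], row["source_key"], row["product"])
```

Keeping the source key next to the warehouse key makes it possible to trace a warehouse row back to the operational transaction it came from.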

Page 31: Building the Data Warehouse: Transforming Data

Summarizing Data

1. During extraction on staging area

2. After loading to the warehouse server

[Diagram: Operational databases → Staging area → Warehouse database]

Page 32: Building the Data Warehouse: Transforming Data

Maintaining Transformation Metadata

Transformation metadata contains:

• Transformation rules

• Algorithms and routines

[Diagram: stages Sources, Stage, Rules, Publish; processes Extract, Transform, Load, Query]

Page 33: Building the Data Warehouse: Transforming Data

Maintaining Transformation Metadata

• Restructure keys

• Identify and resolve coding differences

• Validate data from multiple sources

• Handle exception rules

• Identify and resolve format differences

• Fix referential integrity inconsistencies

• Identify summary data

Page 34: Building the Data Warehouse: Transforming Data

Data Ownership and Responsibilities

• Data ownership and responsibilities should be shared by the:

– Operational team

– Data warehouse team

• Business benefit gained with “work together” approach

Page 35: Building the Data Warehouse: Transforming Data

Transformation Timing and Location

• Transformation is performed:

– Before load

– In parallel

• Can be initiated at different points:

– On the operational platform

– In a separate staging area

Page 37: Building the Data Warehouse: Transforming Data

Choosing a Transformation Point

• Workload

• Impact on environment

• CPU usage

• Disk space

• Network bandwidth

• Parallel execution

• Load window time

• User information needs

Page 38: Building the Data Warehouse: Transforming Data

Monitoring and Tracking

Transformations should:

• Be self-documenting

• Provide summary statistics

• Handle process exceptions
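The last two points, summary statistics and exception handling, can be illustrated with a small wrapper around a row-level transformation. This is a sketch under assumptions: the counter names, the reject list, and the decision to treat only `ValueError` and `KeyError` as recoverable are all illustrative choices.

```python
def run_transformation(rows, transform):
    """Apply `transform` to each row, collecting rejects and summary counts."""
    stats = {"read": 0, "written": 0, "rejected": 0}
    output, rejects = [], []
    for row in rows:
        stats["read"] += 1
        try:
            output.append(transform(row))
            stats["written"] += 1
        except (ValueError, KeyError) as exc:
            # Keep the bad row and the reason so it can be reviewed later.
            rejects.append((row, repr(exc)))
            stats["rejected"] += 1
    return output, rejects, stats


def to_amount(row):
    return {"amount": int(row["amount"])}


rows = [{"amount": "10"}, {"amount": "x"}, {}]
output, rejects, stats = run_transformation(rows, to_amount)
print(stats)
```

A single bad row no longer aborts the run, and the counts give the operator a quick check that rows read equal rows written plus rows rejected.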

Page 39: Building the Data Warehouse: Transforming Data

Designing Transformation Processes

• Analysis:

– Sources and target mappings, business rules

– Key users, metadata, grain

• Design options:

– Third-party tools

– Custom 3GL programs

– 4GLs like SQL or PL/SQL

– Replication

• Design issues:

– Performance

– Size of the staging area

– Exception handling, integrity maintenance

Page 40: Building the Data Warehouse: Transforming Data

Transformation Tools

• Third-party tools

• SQL*Loader

• In-house developed programs

Page 41: Building the Data Warehouse: Transforming Data

Oracle’s Enhanced Features for Transformation

Transformation methods: multistage transformation

[Diagram: Flat files → loading into staging tables → Staging table 1 → transform data → Staging table 2 → validate data → Staging table 2 → merge into warehouse tables → Data warehouse]

Page 42: Building the Data Warehouse: Transforming Data

Oracle’s Enhanced Features for Transformation

Transformation methods: pipelined transformation

[Diagram: Flat files → External table → Table functions (transform data, validate data) → merge into warehouse tables → Warehouse tables]

Page 43: Building the Data Warehouse: Transforming Data

Oracle’s Enhanced Features for Transformation

Transformation mechanisms

Using SQL:

• CREATE TABLE AS SELECT (CTAS)

• UPDATE

• MERGE

• Multitable INSERT

[Diagram: Merge between the Cust and Customer tables, showing key values 50, 130 and 50, 60, 80, 130; an existing row is updated and a new row is inserted]
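The update-or-insert behavior of a merge can be sketched in Python as follows. This is an analogy to the SQL MERGE statement, not Oracle syntax. The function name, the `cust_id` key, and the sample rows are hypothetical.

```python
def merge_rows(target, source, key):
    """Update target rows that match on `key`; insert the rest.

    Modifies `target` in place and returns (updated, inserted) counts.
    """
    index = {row[key]: i for i, row in enumerate(target)}
    updated = inserted = 0
    for row in source:
        if row[key] in index:
            target[index[row[key]]].update(row)  # existing row updated
            updated += 1
        else:
            index[row[key]] = len(target)
            target.append(dict(row))  # new row inserted
            inserted += 1
    return updated, inserted


customer = [{"cust_id": 50, "name": "Old"}, {"cust_id": 130, "name": "Kept"}]
changes = [{"cust_id": 50, "name": "New"}, {"cust_id": 60, "name": "Added"}]
print(merge_rows(customer, changes, "cust_id"))
```

Doing both actions in one pass over the source is what makes a merge attractive for refresh loads, compared with a separate UPDATE followed by an INSERT.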

Page 44: Building the Data Warehouse: Transforming Data

Oracle’s Enhanced Features for Transformation

Transformation mechanisms

Multitable INSERT

[Diagram: Source table → Condition → Target table 1, Target table 2, Target table 3]
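The routing logic behind a conditional multitable insert can be sketched as below: each source row is tested against a condition per target and appended wherever it qualifies. This is a Python analogy, not Oracle syntax. The `first_match_only` flag loosely mirrors the difference between inserting into all matching targets and only the first one, and the target names and thresholds are made up.

```python
def route_rows(rows, targets, first_match_only=False):
    """Distribute rows to named targets according to per-target conditions.

    `targets` is an ordered list of (name, predicate) pairs.
    """
    routed = {name: [] for name, _ in targets}
    for row in rows:
        for name, predicate in targets:
            if predicate(row):
                routed[name].append(row)
                if first_match_only:
                    break
    return routed


orders = [{"amount": 50}, {"amount": 500}, {"amount": 5000}]
targets = [
    ("small_orders", lambda r: r["amount"] < 100),
    ("large_orders", lambda r: r["amount"] >= 100),
    ("audit_orders", lambda r: r["amount"] >= 1000),
]
routed = route_rows(orders, targets)
print({name: len(rows) for name, rows in routed.items()})
```

The source is scanned once no matter how many targets there are, which is the main benefit over running one INSERT per target table.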

Page 45: Building the Data Warehouse: Transforming Data

Oracle’s Enhanced Features for Transformation

Transformation mechanisms (continued)

• Using PL/SQL:

– Used for complex transformations

• Using table functions. Table functions can:

– Return multiple rows from a function

– Accept results of multiple-row SQL subqueries as input

– Take cursors as input

– Be parallelized

– Support incremental pipelining
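Incremental pipelining means each stage hands rows to the next as soon as they are produced, instead of materializing a full intermediate table. Python generators give a rough analogy, shown below. This is not PL/SQL, and the stage names and the `cust_id, amount` line format are assumptions for illustration.

```python
def read_records(lines):
    """Parse 'cust_id, amount' lines one at a time."""
    for line in lines:
        cust_id, amount = line.split(",")
        yield {"cust_id": cust_id.strip(), "amount": amount.strip()}


def validate(records):
    """Pass through only the records whose amount is an integer."""
    for record in records:
        if record["amount"].lstrip("-").isdigit():
            yield record


def transform(records):
    """Convert the amount to an int as each record arrives."""
    for record in records:
        yield {"cust_id": record["cust_id"], "amount": int(record["amount"])}


lines = ["50, 100", "60, n/a", "80, -25"]
for row in transform(validate(read_records(lines))):
    print(row)
```

No stage waits for the previous one to finish, so memory use stays flat regardless of input size. That is the property pipelined table functions aim to provide inside the database.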

Page 47: Building the Data Warehouse: Transforming Data

Summary

In this lesson, you should have learned how to:

• Define transformation

• Identify possible staging models

• Identify data anomalies and eliminate them

• Explain the importance of quality data

• Describe techniques for transforming data

• Design a transformation process

• List Oracle’s enhanced features and tools that can be used to transform data

Page 48: Building the Data Warehouse: Transforming Data

Practice 6-1 Overview

This practice covers the following topics:

• Answering a series of questions based on the business scenario for Frontier Airways

• Answering a series of short questions