data warehouse components


Upload: ailis

Post on 06-Feb-2016


DESCRIPTION

Data Warehouse Components. Overview of the components: source data component (production data, internal data, archive data, external data); data staging component (extraction, transformation, cleaning, standardization, loading); data storage component; information delivery component.

TRANSCRIPT

Page 1: Data Warehouse  Components
Page 2: Data Warehouse  Components

Overview of the Components

• Source data component
    • Production data
    • Internal data
    • Archive data
    • External data
• Data staging component
    • Extraction
    • Transformation
    • Cleaning
    • Standardization
    • Loading
• Data storage component
• Information delivery component
• Metadata component
• Management and control component

Page 3: Data Warehouse  Components

Architectural Framework

Page 4: Data Warehouse  Components

Data Acquisition

You are the data analyst on the project team building a DW for an insurance company. List the possible data sources from which you will bring data into the DW.

• Production data: data from the various operational systems
• External data: data for finding trends and making comparisons against other organizations
• Internal data: private, confidential data important to the organization
• Archived data: data for retrieving historical information

Page 5: Data Warehouse  Components

Architectural Framework

Page 6: Data Warehouse  Components

Data Staging

Performs ETL (extraction, transformation, loading).

Extraction
• Select data sources and determine filters
• Replicate automatically
• Create intermediary files

Transformation
• Clean, merge, and de-duplicate data
• Convert data types
• Calculate derived data
• Resolve synonyms and homonyms

Loading
• Initial loading
• Incremental loading
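The three staging steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the policy rows, the `policy` table, and the function names `extract`, `transform`, and `load` are all invented here, and an in-memory SQLite database stands in for the warehouse storage.

```python
import sqlite3

# Hypothetical rows as they might arrive from operational systems.
SOURCE_ROWS = [
    {"policy_id": "P-001", "holder": " Alice Smith ", "premium": "1200.50", "status": "active"},
    {"policy_id": "P-002", "holder": "Bob Jones", "premium": "830", "status": "lapsed"},
    {"policy_id": "P-001", "holder": "Alice Smith", "premium": "1200.50", "status": "active"},
    {"policy_id": "P-003", "holder": "Carol Wu", "premium": "455.25", "status": "ACTIVE"},
]


def extract(rows, wanted_statuses):
    """Extraction: keep only the source rows that pass the filter."""
    return [r for r in rows if r["status"].lower() in wanted_statuses]


def transform(rows):
    """Transformation: clean, de-duplicate, convert types, derive a value."""
    seen = set()
    cleaned = []
    for r in rows:
        if r["policy_id"] in seen:  # de-duplicate on the business key
            continue
        seen.add(r["policy_id"])
        annual = float(r["premium"])  # convert text to a number
        monthly = round(annual / 12, 2)  # derived data
        cleaned.append((r["policy_id"], r["holder"].strip(), annual, monthly))
    return cleaned


def load(conn, rows):
    """Loading: insert rows not yet present and return how many were added.

    The first call behaves as an initial load; later calls are incremental.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS policy ("
        "policy_id TEXT PRIMARY KEY, holder TEXT, "
        "annual_premium REAL, monthly_premium REAL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
    conn.executemany("INSERT OR IGNORE INTO policy VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
    return after - before
```

Calling `load(conn, transform(extract(SOURCE_ROWS, {"active"})))` twice on the same connection adds rows only the first time, which is the difference between the initial and the incremental load in miniature.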

Page 7: Data Warehouse  Components

Why is a separate data staging area required?

• Data is spread across various operational databases.
• Data in the warehouse should be subject-oriented.
• Data staging is mandatory.

Page 8: Data Warehouse  Components

Architectural Framework

Page 9: Data Warehouse  Components

Characteristics of the data storage area

• Separate repository
• Data content
    • Read only
    • Integrated
    • High volumes
    • Grouped by business subjects
• Metadata driven
• Data from the DW is aggregated in MDDBs (multidimensional databases)
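The last point, aggregating warehouse data along dimensions, can be illustrated with a small sketch. The fact rows, the dimension names, and the `aggregate` helper below are hypothetical; a real MDDB precomputes and stores such totals rather than recomputing them on demand.

```python
from collections import defaultdict

# Hypothetical detail rows held in the warehouse: (region, product, quarter, sales).
FACTS = [
    ("East", "Auto", "Q1", 100.0),
    ("East", "Auto", "Q2", 150.0),
    ("East", "Home", "Q1", 80.0),
    ("West", "Auto", "Q1", 120.0),
    ("West", "Home", "Q2", 60.0),
]
DIMENSIONS = ("region", "product", "quarter")


def aggregate(facts, by):
    """Sum the sales measure over every combination of the dimensions in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the measure is the last field
    return dict(totals)
```

For example, `aggregate(FACTS, ("region",))` gives one total per region, and `aggregate(FACTS, ("product", "quarter"))` gives one total per product and quarter.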

Page 10: Data Warehouse  Components

Architectural Framework

Page 11: Data Warehouse  Components

Information delivery component

Depends on the user:

• Novice user: prefabricated reports, preset queries
• Casual user: once-in-a-while information
• Business analyst: complex analysis
• Power user: picks up interesting data

Page 12: Data Warehouse  Components

Information delivery component

Page 13: Data Warehouse  Components

Architectural Framework

Page 14: Data Warehouse  Components

Metadata component

Data about the data in the data warehouse. Metadata can be of 3 types:

• Operational metadata: contains information about operational data sources
• Extraction and transformation metadata: details pertaining to extraction frequencies, extraction methods, and business rules for data extraction
• End-user metadata: navigational map of the DW
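One way to picture the three metadata types is as three kinds of records. The sketch below is an assumption for illustration only: the class names, field names, and sample values (such as `claims_system` and `fact_claims`) are invented and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalMetadata:
    """Describes a field in an operational source system."""
    source_system: str
    source_field: str
    data_type: str


@dataclass(frozen=True)
class ExtractionMetadata:
    """Describes how and when a warehouse column is populated."""
    target_column: str
    source: OperationalMetadata
    frequency: str
    method: str
    business_rule: str


@dataclass(frozen=True)
class EndUserMetadata:
    """Maps a business term to where it lives in the warehouse."""
    business_term: str
    warehouse_table: str
    warehouse_column: str


# All names and values below are invented for illustration.
claims_amount_src = OperationalMetadata("claims_system", "CLM_AMT", "DECIMAL(10,2)")
claims_amount_etl = ExtractionMetadata(
    target_column="claim_amount",
    source=claims_amount_src,
    frequency="daily",
    method="incremental",
    business_rule="exclude claims with status 'VOID'",
)
END_USER_MAP = [
    EndUserMetadata("Claim amount", "fact_claims", "claim_amount"),
    EndUserMetadata("Policy holder", "dim_customer", "customer_name"),
]


def locate(term):
    """Return (table, column) for a business term, or None if it is unknown."""
    for entry in END_USER_MAP:
        if entry.business_term.lower() == term.lower():
            return entry.warehouse_table, entry.warehouse_column
    return None
```

Here `locate("claim amount")` plays the role of the navigational map: the end-user asks in business terms and is pointed to a table and column.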

Page 15: Data Warehouse  Components

Why is metadata especially important in a data warehouse?

• It acts as the glue that connects all parts of the data warehouse.
• It provides information about the contents and structures to the developers.
• It opens the door to the end-users and makes the contents recognizable in their own terms.

Page 16: Data Warehouse  Components
Page 17: Data Warehouse  Components

Management and Control

• Sits on top of all components
• Coordinates the services and activities within the DW
• Controls the data transformation and the transfer of data into DW storage

Page 18: Data Warehouse  Components

Summing up

• Data warehouse building blocks or components are: source data, data staging, data storage, information delivery, metadata, and management and control.
• In a data warehouse, metadata is especially significant because it acts as the glue holding all the components together and serves as a roadmap for the end-users.

Page 19: Data Warehouse  Components

Doubts?

Page 20: Data Warehouse  Components
Page 21: Data Warehouse  Components

Case study 1

As a senior analyst on the DW project of a large retail chain, you are responsible for improving data visualization of the output results. Make a list of recommendations.

Page 22: Data Warehouse  Components
Page 23: Data Warehouse  Components

Parallel processing

The performance of a DW may be improved using parallel processing with appropriate hardware and software options.

Parallel processing options:

• Symmetric multiprocessing
• Massively parallel processing
• Clusters

Page 24: Data Warehouse  Components

DW with ERP packages

Page 25: Data Warehouse  Components