Data Warehousing and OLAP


Upload: tucker-wiley

Post on 03-Jan-2016



DESCRIPTION

Data Warehousing and OLAP. Definition. Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes. Subject-oriented: the data warehouse is organized around the key subjects of the enterprise.

TRANSCRIPT

Page 1: Data Warehousing and OLAP
Page 2: Data Warehousing and OLAP
Page 3: Data Warehousing and OLAP

Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
› Subject-oriented: the data warehouse is organized around the key subjects of the enterprise, e.g. customers, patients, students, products
› Integrated: consistent naming conventions, formats, and encoding structures; data come from multiple sources
› Time-variant: data contain a time dimension and may be used to study trends and changes
› Non-updatable: read-only, periodically refreshed

Data Mart
› A data warehouse that is limited in scope

Page 4: Data Warehousing and OLAP

Data warehousing is the process whereby organizations create and maintain data warehouses, and use them to extract meaning from their informational assets and to inform decision making.

Page 5: Data Warehousing and OLAP

Introduction

Applications that a data warehouse supports:
› OLAP (Online Analytical Processing): a term used to describe the analysis of complex data from the data warehouse.
› DSS (Decision Support Systems), also known as EIS (Executive Information Systems): support an organization's leading decision makers in making complex and important decisions.
› Data mining: used for knowledge discovery, the process of searching data for unanticipated new knowledge.

Page 6: Data Warehousing and OLAP

1. A business requires an integrated, company-wide view of high-quality information (drawn from different databases)

2. The IS department must separate informational from operational systems to improve performance in managing company data.

Page 7: Data Warehousing and OLAP

For decision making, it is necessary to provide a single, corporate view of the information

Example of the difficulty of deriving a single corporate view

Page 8: Data Warehousing and OLAP

Examples of heterogeneous data

From Class Registration System

From Personnel System

From Health Centre System

Page 9: Data Warehousing and OLAP

Issues that need to be resolved:
› Inconsistent key structures
› Synonyms
› Free-form fields versus structured fields
› Inconsistent data values
› Missing data
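A minimal Python sketch of how some of these issues might be handled when records for the same person are reconciled. The three source records, their field names, the key cross-reference and the synonym table are all hypothetical, invented for illustration; they are not taken from the figure on the previous slide.

```python
# Hypothetical records for one person, as three systems of record might hold them.
registration = {"StudentNo": "123-45-6789", "Telephone": "555-0100"}
personnel = {"EmployeeID": "E-0042", "Phone": None}            # missing data
health = {"PatientNo": "P9", "Insurance": "blue cross"}        # inconsistent value format

# Assumed cross-reference that resolves inconsistent key structures to one warehouse key.
KEY_MAP = {("reg", "123-45-6789"): 1001, ("hr", "E-0042"): 1001, ("health", "P9"): 1001}

# Synonyms: different source names for the same attribute.
SYNONYMS = {"Telephone": "phone", "Phone": "phone", "Insurance": "insurer"}

def reconcile(source, key_field, record):
    """Map one source record onto the warehouse key and attribute names."""
    out = {"person_key": KEY_MAP[(source, record[key_field])]}
    for field, value in record.items():
        if field in SYNONYMS and value is not None:            # skip missing data
            if field == "Insurance":
                value = value.strip().title()                  # standardize the value
            out[SYNONYMS[field]] = value
    return out

merged = {}
for source, key_field, record in [("reg", "StudentNo", registration),
                                  ("hr", "EmployeeID", personnel),
                                  ("health", "PatientNo", health)]:
    merged.update(reconcile(source, key_field, record))

print(merged)   # {'person_key': 1001, 'phone': '555-0100', 'insurer': 'Blue Cross'}
```

In practice the key cross-reference is usually the hardest part to build; here it is simply assumed to exist.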

Page 10: Data Warehousing and OLAP

Why do organizations need to bring data together from various systems of record?
› To be more profitable
› To be more competitive
› To grow by adding value for customers

Accomplished by:
› Increasing speed and flexibility of decision making
› Improving business processes
› Gaining a clear understanding of customer behavior

Page 11: Data Warehousing and OLAP

Operational system:
› A system that is used to run a business in real time, based on current data; also called a system of record
› Must process large volumes of relatively simple read/write transactions, while providing fast response
› Example: sales order processing, reservation systems

Informational system:
› A system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications
› Example: sales trend analysis, customer segmentation

Page 12: Data Warehousing and OLAP
Page 13: Data Warehousing and OLAP

Data Warehouse Architectures
› Independent Data Mart
› Dependent Data Mart and Operational Data Store
› Logical Data Mart and Real-Time Data Warehouse
› Three-Layer architecture

All involve some form of extraction, transformation and loading (ETL)

Page 14: Data Warehousing and OLAP

Data Mart

A data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform and load processes from data source systems.

Page 15: Data Warehousing and OLAP

Independent Data Mart

A data mart filled with data extracted from the operational environment without benefit of a data warehouse

Four basic steps:

1. Data are extracted from various internal and external source system files and databases

2. Data are transformed and integrated before being loaded into the data marts
› Transactions may be sent to the source systems to correct errors discovered in data staging
› Data warehouse: collection of data marts

Page 16: Data Warehousing and OLAP

Independent Data Mart

Four basic steps (continued):

3. Data warehouse is a set of physically distinct databases organized for decision support. Contains both detailed and summary data

4. Users access the data warehouse by means of a variety of query languages and analytical tools.
› Results may be fed back to data warehouse and operational databases.
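The four steps can be sketched in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions: the two source systems, their row layouts and the table names `sales_detail` and `sales_summary` are hypothetical, and an in-memory SQLite database stands in for the decision-support store.

```python
import sqlite3

# Step 1: extract rows from two hypothetical source systems with different formats.
store_orders = [("o1", "2016-01-03", " Widget ", "19.50"),
                ("o2", "2016-01-03", "GADGET", "5.00")]
web_orders = [("w1", "03/01/2016", "widget", "19.50")]    # day/month/year dates

def transform(store_rows, web_rows):
    """Step 2: integrate both sources into one consistent format before loading."""
    clean = []
    for oid, date, product, amount in store_rows:
        clean.append((oid, date, product.strip().lower(), float(amount)))
    for oid, date, product, amount in web_rows:
        day, month, year = date.split("/")
        clean.append((oid, f"{year}-{month}-{day}", product.strip().lower(), float(amount)))
    return clean

# Step 3: load detailed rows and derive summary data alongside them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail "
             "(order_id TEXT, sale_date TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
                 transform(store_orders, web_orders))
conn.execute("CREATE TABLE sales_summary AS "
             "SELECT product, SUM(amount) AS total FROM sales_detail GROUP BY product")

# Step 4: users query the loaded data.
summary = conn.execute(
    "SELECT product, total FROM sales_summary ORDER BY product").fetchall()
print(summary)   # [('gadget', 5.0), ('widget', 39.0)]
```

With independent data marts, a pipeline like this would be repeated separately for every mart.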

Page 17: Data Warehousing and OLAP

Figure: Independent data mart data warehousing architecture
› Data marts: mini-warehouses, limited in scope
› Separate ETL for each independent data mart
› Data access complexity due to multiple data marts

Page 18: Data Warehousing and OLAP

Independent Data Mart

Several limitations:

1. A separate ETL process is developed for each data mart

2. Data marts may not be consistent with one another

3. No capability to drill down into greater detail or into related facts in other data marts

4. Scaling costs are excessive because every new application, which creates a separate data mart, repeats all the extract and load steps.

5. Costs to make the separate data marts consistent are quite high.

Page 19: Data Warehousing and OLAP

Dependent Data Mart and Operational Data Store

Operational Data Store (ODS):
› An integrated, subject-oriented, continuously updatable, current-valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision making

Enterprise Data Warehouse (EDW):
› A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Dependent Data Mart (from EDW):
› A data mart filled exclusively from the enterprise data warehouse and its reconciled data

Page 20: Data Warehousing and OLAP

Figure: Dependent data mart with operational data store: a three-level architecture
› Single ETL for the enterprise data warehouse (EDW)
› Simpler data access
› ODS provides option for obtaining current data
› Dependent data marts loaded from EDW

Page 21: Data Warehousing and OLAP

Logical Data Mart and Real-Time Data Warehouse

Logical data mart:
› A data mart created by a relational view of a data warehouse.

Real-Time Data Warehouse:
› An enterprise data warehouse that accepts near-real-time feeds of transactional data from the systems of record, analyzes warehouse data, and in near-real-time relays business rules to the data warehouse and systems of record so that immediate action can be taken in response to business events.

Page 22: Data Warehousing and OLAP

Figure: Logical data mart and real time warehouse architecture
› Near real-time ETL for the data warehouse
› ODS and data warehouse are one and the same
› Data marts are NOT separate databases, but logical views of the data warehouse
› Easier to create new data marts
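The idea of a data mart as a relational view can be sketched with Python's standard `sqlite3` module. The table `warehouse_sales`, the view `east_sales_mart` and the sample rows are hypothetical; a real warehouse would use its own DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales "
             "(region TEXT, product TEXT, sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", [
    ("East", "widget", "2016-01-03", 10.0),
    ("West", "widget", "2016-01-03", 7.0),
    ("East", "gadget", "2016-01-04", 4.0),
])

# The "data mart" is only a view: no data are copied and no separate ETL runs.
conn.execute("""
    CREATE VIEW east_sales_mart AS
    SELECT product, SUM(amount) AS total
    FROM warehouse_sales
    WHERE region = 'East'
    GROUP BY product
""")

# A row added to the warehouse is visible through the view immediately.
conn.execute("INSERT INTO warehouse_sales VALUES ('East', 'widget', '2016-01-05', 5.0)")
mart = conn.execute(
    "SELECT product, total FROM east_sales_mart ORDER BY product").fetchall()
print(mart)   # [('gadget', 4.0), ('widget', 15.0)]
```

Creating another mart is then a matter of defining another view, which is why new data marts are easier to create in this architecture.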

Page 23: Data Warehousing and OLAP

Data Warehouse Versus Data Mart

Page 24: Data Warehousing and OLAP

Three-Layer architecture

Operational data are stored in the various operational systems of record throughout the organization

Reconciled data are the type of data stored in the enterprise data warehouse and an operational data store
› Reconciled data: detailed, current data intended to be the single source for all decision support applications

Derived data are the type of data stored in each of the data marts
› Derived data: data that have been selected, formatted and aggregated for end-user decision support applications.

Page 25: Data Warehousing and OLAP

Figure: Three-layer data architecture for a data warehouse

Page 26: Data Warehousing and OLAP

Three-Layer architecture: Role of the Enterprise Data Model

Enterprise Data Model: Presents a total picture explaining the data required by an organization.

Reconciled Data: must conform to the design specified in the EDM

EDM: controls the phased evolution of the DW

Page 27: Data Warehousing and OLAP

Three-Layer architecture: Role of Metadata

Metadata: technical and business data that describe the properties or characteristics of other data
› Operational metadata: describe the data in the various operational systems (including the external data) that feed the EDW
› EDW metadata: derived from the EDM; describe the reconciled data layer as well as the rules for extracting, transforming and loading operational data into reconciled data
› Data mart metadata: describe the derived data layer and the rules for transforming reconciled data to derived data

Page 28: Data Warehousing and OLAP

Data Characteristics: Status vs. Event Data

Event = a database action (create/update/delete) that results from a transaction

Figure: Example of DBMS log entry (a Status record before and after the event)

Page 29: Data Warehousing and OLAP

Data Characteristics: Transient vs. Periodic Data

With transient data, changes to existing records are written over previous records, thus destroying the previous data content

Transient operational data

Page 30: Data Warehousing and OLAP

Data Characteristics: Transient vs. Periodic Data

Periodic data are never physically altered or deleted once they have been added to the store

Periodic warehouse data
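The difference between transient and periodic data can be shown with a small Python sketch. The customer key `C100`, the balances, the dates and the helper names are hypothetical and serve only to contrast overwriting with appending.

```python
# Transient store: an update overwrites the previous value.
transient = {"C100": {"balance": 750}}
transient["C100"]["balance"] = 1000          # the earlier value 750 is lost

# Periodic store: every change is appended with a date; nothing is overwritten.
periodic = []

def record(key, balance, date, action):
    """Append one dated version of the record to the periodic store."""
    periodic.append({"key": key, "balance": balance, "date": date, "action": action})

record("C100", 750, "2016-01-03", "create")
record("C100", 1000, "2016-01-05", "update")

def balance_as_of(key, date):
    """Return the latest balance recorded on or before `date` (ISO dates sort as text)."""
    rows = [r for r in periodic if r["key"] == key and r["date"] <= date]
    return max(rows, key=lambda r: r["date"])["balance"] if rows else None

print(transient["C100"]["balance"])          # 1000
print(balance_as_of("C100", "2016-01-04"))   # 750
print(balance_as_of("C100", "2016-01-06"))   # 1000
```

Only the periodic store can answer "what was the balance on 4 January", which is what makes trend analysis over time possible.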

Page 31: Data Warehousing and OLAP

Derived Data

Objectives
› Ease of use for decision support applications
› Fast response to predefined user queries
› Customized data for particular target audiences
› Ad-hoc query support
› Data mining capabilities

Characteristics
› Detailed (mostly periodic) data
› Aggregate (for summary)
› Distributed (to departmental servers)

Most common data model = dimensional model (usually implemented as a star schema)

Page 32: Data Warehousing and OLAP

Star schema: a simple database design in which dimensional data are separated from fact or event data.

Dimensional model: another name for a star schema

Suited to ad hoc queries

Not suited to online transaction processing: not used in operational systems, operational data stores or an EDW.

Page 33: Data Warehousing and OLAP

Components of a star schema

Fact tables contain factual or quantitative data

Dimension tables contain descriptions about the subjects of the business

1:N relationship between dimension tables and fact tables

Excellent for ad-hoc queries, but bad for online transaction processing

Dimension tables are denormalized to maximize performance

Page 34: Data Warehousing and OLAP

Star schema example

Fact table provides statistics for sales broken down by product, period and store dimensions
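A star schema of this shape can be sketched with Python's standard `sqlite3` module. The column names and sample rows are assumptions made for illustration; they are not the contents of Figure A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptions of the subjects of the business.
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: one row per product, period and store (the grain).
    CREATE TABLE sales (
        product_key  INTEGER REFERENCES product(product_key),
        period_key   INTEGER REFERENCES period(period_key),
        store_key    INTEGER REFERENCES store(store_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        dollars_cost REAL,
        PRIMARY KEY (product_key, period_key, store_key)
    );

    INSERT INTO product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO period  VALUES (1, 2016, 1), (2, 2016, 2);
    INSERT INTO store   VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO sales VALUES (1, 1, 1, 10, 100.0, 60.0), (1, 2, 1, 5, 50.0, 30.0),
                             (2, 1, 2, 3, 60.0, 45.0),  (1, 1, 2, 2, 20.0, 12.0);
""")

# Ad hoc query: January 2016 sales by store, joining the fact table to two dimensions.
rows = conn.execute("""
    SELECT st.city, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales s
    JOIN store  st ON st.store_key  = s.store_key
    JOIN period pe ON pe.period_key = s.period_key
    WHERE pe.year = 2016 AND pe.month = 1
    GROUP BY st.city
    ORDER BY st.city
""").fetchall()
print(rows)   # [('Austin', 10, 100.0), ('Boston', 5, 80.0)]
```

Each dimension row relates to many fact rows (1:N), and the fact table's primary key is the combination of its dimension keys.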

Page 35: Data Warehousing and OLAP

Figure A: Star schema with sample data

Page 36: Data Warehousing and OLAP
Page 37: Data Warehousing and OLAP
Page 38: Data Warehousing and OLAP
Page 39: Data Warehousing and OLAP

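The size of the fact table depends on the number of dimensions and the grain of the fact table

Number of rows = product of the number of possible values for each dimension associated with the fact table

Example: assume the following for Figure A: 1000 stores, 10,000 products, and 24 monthly periods

Total rows calculated as follows (assuming only half the products record sales for a given month):

Total rows = 1000 stores x 5000 active products x 24 months = 120,000,000 rows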

Page 40: Data Warehousing and OLAP

Estimate the size (in bytes) of the fact table Sales:
› 6 fields, each four bytes
› Total rows (monthly grain) = 120,000,000 rows
› Total size = 120,000,000 rows x 6 fields x 4 bytes/field = 2,880,000,000 bytes ≈ 2.88 GB

Total rows (daily grain):
› Total rows = 1000 stores x 5000 active products x 720 days = 3,600,000,000 rows
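The arithmetic can be checked with a few lines of Python. The store, product, field and day counts come from the slides; the figure of 24 months is inferred from 120,000,000 rows / (1000 stores x 5000 products), and the daily byte total is not stated in the slides but follows from the same formula.

```python
import math

def fact_table_rows(*dimension_counts):
    """Rows = product of the number of values in each dimension."""
    return math.prod(dimension_counts)

stores, active_products = 1000, 5000
months, days = 24, 720             # 24 months is inferred, see the note above
fields, bytes_per_field = 6, 4

monthly_rows = fact_table_rows(stores, active_products, months)
monthly_bytes = monthly_rows * fields * bytes_per_field

daily_rows = fact_table_rows(stores, active_products, days)
daily_bytes = daily_rows * fields * bytes_per_field

print(f"monthly grain: {monthly_rows:,} rows, {monthly_bytes:,} bytes")
# monthly grain: 120,000,000 rows, 2,880,000,000 bytes
print(f"daily grain:   {daily_rows:,} rows, {daily_bytes:,} bytes")
# daily grain:   3,600,000,000 rows, 86,400,000,000 bytes
```

Moving from a monthly to a daily grain multiplies the table size by 30, which shows how strongly the choice of grain drives storage.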

Page 41: Data Warehousing and OLAP

Multiple Fact Tables
› Can improve performance
› Often used to store facts for different combinations of dimensions
› Conformed dimensions: one or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table.

Page 42: Data Warehousing and OLAP

Factless Fact Tables
› No nonkey data, but foreign keys for associated dimensions
› Used for:
  Tracking events
  Inventory coverage
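A factless fact table used for event tracking might look like the following Python `sqlite3` sketch. The class-attendance scenario, table names and rows are hypothetical; `class_date` stands in for the key of a date dimension to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_key  INTEGER PRIMARY KEY, title TEXT);

    -- Factless fact table: only keys of the associated dimensions, no measures.
    CREATE TABLE attendance (
        student_key INTEGER REFERENCES student(student_key),
        course_key  INTEGER REFERENCES course(course_key),
        class_date  TEXT,
        PRIMARY KEY (student_key, course_key, class_date)
    );

    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO attendance VALUES (1, 10, '2016-01-04'),
                                  (2, 10, '2016-01-04'),
                                  (1, 10, '2016-01-11');
""")

# With no measures to sum, analysis counts the rows (events) themselves.
counts = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM attendance a
    JOIN student s ON s.student_key = a.student_key
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(counts)   # [('Ana', 2), ('Ben', 1)]
```

The existence of a row is itself the fact: it records that the event happened for that combination of dimension values.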

Page 43: Data Warehousing and OLAP
Page 44: Data Warehousing and OLAP

Tools to query and analyze data stored in data warehouses and data marts:
› Traditional query and reporting tools
› Online Analytical Processing (OLAP), MOLAP, ROLAP
› Data visualization tools
  Data visualization: representing data in graphical/multimedia formats for analysis
› Data mining tools
  Data mining: knowledge discovery using a blend of statistical, AI, and computer graphics techniques

Page 45: Data Warehousing and OLAP
Page 46: Data Warehousing and OLAP

› Identify subjects of the data mart
› Identify dimensions and facts
› Indicate how data is derived from enterprise data warehouses, including derivation rules
› Indicate how data is derived from operational data store, including derivation rules
› Identify available reports and predefined queries
› Identify data analysis techniques (e.g. drill-down)
› Identify responsible people

Page 47: Data Warehousing and OLAP

The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques

General term for several categories of data warehouse and data mart access tools.

Relational OLAP (ROLAP)
› Traditional relational representation
› Use variations of SQL and view the database as a traditional relational database

Multidimensional OLAP (MOLAP)
› Cube structure
› Load data into an intermediate structure, usually a three or higher dimensional array (hypercube)

Page 48: Data Warehousing and OLAP

OLAP Operations
› Cube slicing: come up with a 2-D view of data
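Slicing can be illustrated with a small Python sketch. The three dimensions (product, region, month), the sample rows and the dictionary-based cube are assumptions for illustration; real OLAP tools use their own storage structures.

```python
from collections import defaultdict

# Hypothetical relational rows: (product, region, month, units).
rows = [
    ("widget", "East", "Jan", 10), ("widget", "West", "Jan", 7),
    ("gadget", "East", "Jan", 4),  ("widget", "East", "Feb", 5),
]

# Load the rows into a cube keyed by the three dimensions.
cube = defaultdict(int)
for product, region, month, units in rows:
    cube[(product, region, month)] += units

def slice_cube(cube, month):
    """Fix the month dimension, leaving a 2-D product-by-region view."""
    return {(p, r): v for (p, r, m), v in cube.items() if m == month}

jan = slice_cube(cube, "Jan")
print(jan)
# {('widget', 'East'): 10, ('widget', 'West'): 7, ('gadget', 'East'): 4}
```

Fixing one dimension of a three-dimensional cube leaves a two-dimensional table, here product by region for January.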

Page 49: Data Warehousing and OLAP

OLAP Operations
› Drill-down: going from summary to more detailed views
  Starting with summary data, users can obtain details for particular cells
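A drill-down can be sketched in Python as moving from a per-product total to the detail rows behind one cell. The products, package sizes and unit counts are hypothetical sample data.

```python
# Hypothetical detailed data: (product, package_size, units).
detail = [
    ("paper towels", "2-pack", 25), ("paper towels", "3-pack", 15),
    ("paper towels", "6-pack", 10), ("soap", "1-bar", 30),
]

def summary(rows):
    """Summary view: total units per product."""
    totals = {}
    for product, _size, units in rows:
        totals[product] = totals.get(product, 0) + units
    return totals

def drill_down(rows, product):
    """Detailed view: units per package size for one summary cell."""
    return {size: units for p, size, units in rows if p == product}

print(summary(detail))                      # {'paper towels': 50, 'soap': 30}
print(drill_down(detail, "paper towels"))   # {'2-pack': 25, '3-pack': 15, '6-pack': 10}
```

The detailed figures add up to the summary cell they were drilled from, which is a useful sanity check on any drill-down path.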