dw - rolap molap holap

Data Warehouse Architecture

Uploaded by: veenahbhat

Posted on 04-Mar-2015


TRANSCRIPT

Page 1: Dw - Rolap Molap Holap

Data Warehouse Architecture

Page 2: Dw - Rolap Molap Holap

Decision Support

• Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions
– “What were the sales volumes by region and product category for the last year?”
– “How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?”
– “Which orders should we fill to maximize revenues?”

• On-line analytical processing (OLAP) is an element of decision support systems (DSS)

Page 3: Dw - Rolap Molap Holap

OLAP Conceptual Data Model

• The goal of OLAP is to support ad hoc querying for the business analyst
• Business analysts are familiar with spreadsheets
• Extend the spreadsheet analysis model to work with warehouse data
• A multidimensional view of data is the foundation of OLAP

Page 4: Dw - Rolap Molap Holap

Three-Tier Decision Support Systems

• Warehouse database server
– Almost always a relational DBMS, rarely flat files
• OLAP servers
– Relational OLAP (ROLAP): an extended relational DBMS that maps operations on multidimensional data to standard relational operators
– Multidimensional OLAP (MOLAP): a special-purpose server that directly implements multidimensional data and operations
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
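To make the ROLAP idea concrete, here is a minimal Python sketch, assuming an in-memory SQLite database and the small `sale` fact table used later in this deck. The helper `rollup_sql` is hypothetical (not part of any OLAP product); it only illustrates how a multidimensional request such as "roll up sales to the store level" can be mapped onto a standard relational `GROUP BY`.

```python
import sqlite3


def rollup_sql(fact_table, keep_dims, measure):
    """Map a 'roll up to keep_dims' request onto a plain SQL aggregate query.

    Hypothetical helper: names are pasted into the SQL text, so it is only
    safe for trusted, validated identifiers.
    """
    if not keep_dims:
        # Rolling up every dimension leaves a single grand total.
        return f"SELECT SUM({measure}) FROM {fact_table}"
    cols = ", ".join(keep_dims)
    return f"SELECT {cols}, SUM({measure}) FROM {fact_table} GROUP BY {cols} ORDER BY {cols}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

by_store = conn.execute(rollup_sql("sale", ["storeId"], "amt")).fetchall()
grand_total = conn.execute(rollup_sql("sale", [], "amt")).fetchone()[0]
print(by_store)     # [('s1', 67), ('s2', 12), ('s3', 50)]
print(grand_total)  # 129
```

The middleware's job in this picture is only translation; the relational engine still does the scanning, grouping and summing.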

Page 5: Dw - Rolap Molap Holap

Approaches to OLAP Servers

Three possibilities for OLAP servers:

(1) Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to store and manage warehouse data
– OLAP middleware to support missing pieces

(2) Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures

(3) Hybrid OLAP (HOLAP)
– Storing detailed data in an RDBMS
– Storing aggregated data in an MDBMS
– User access via MOLAP tools

Page 6: Dw - Rolap Molap Holap

OLTP vs. OLAP

On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational or transactional systems (e.g., point-of-sale systems)

On-Line Analytical Processing (OLAP):
– technology used to perform complex analysis of the data in a data warehouse

OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user. [source: OLAP Council: www.olapcouncil.org]

Page 7: Dw - Rolap Molap Holap

OLTP vs. OLAP

                     OLTP                              OLAP
User                 Clerk, IT professional            Knowledge worker
Function             Day-to-day operations             Decision support
DB design            Application-oriented (E-R based)  Subject-oriented (star, snowflake)
Data                 Current, isolated                 Historical, consolidated
View                 Detailed, flat relational         Summarized, multidimensional
Usage                Structured, repetitive            Ad hoc
Unit of work         Short, simple transaction         Complex query
Access               Read/write                        Read mostly
Operations           Index/hash on primary key         Lots of scans
# Records accessed   Tens                              Millions
# Users              Thousands                         Hundreds
DB size              100 MB-GB                         100 GB-TB
Metric               Transaction throughput            Query throughput, response time

Source: Datta, GT

Page 8: Dw - Rolap Molap Holap

Approaches to OLAP Servers

• Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures
– Example: Essbase (Arbor)

• Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to store and manage warehouse data
– OLAP middleware to support missing pieces
  • Optimize for each DBMS backend
  • Aggregation navigation logic
  • Additional tools and services
– Examples: MicroStrategy, MetaCube (Informix)
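"Aggregation navigation logic" is usually understood as middleware that routes each query to the smallest precomputed aggregate table able to answer it. The sketch below assumes a hypothetical catalog of aggregate tables (names and row counts are illustrative, sized to the small `sale` table used later in this deck); it is valid only for measures that can be re-aggregated, such as SUM, COUNT, MIN and MAX.

```python
# Hypothetical aggregate catalog: table name -> (grouping dimensions, row count)
AGGREGATES = {
    "sale": ({"product", "store", "date"}, 6),  # base fact table
    "sale_by_prod_store": ({"product", "store"}, 5),
    "sale_by_store": ({"store"}, 3),
    "sale_by_prod": ({"product"}, 2),
}


def choose_table(query_dims):
    """Return the smallest table whose grouping dimensions cover query_dims."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in AGGREGATES.items()
        if set(query_dims) <= dims  # the table keeps every dimension the query needs
    ]
    if not candidates:
        raise ValueError(f"no table can answer a query grouped by {sorted(query_dims)}")
    return min(candidates)[1]  # fewest rows wins


print(choose_table({"store"}))            # sale_by_store
print(choose_table({"product", "date"}))  # sale
```

A real navigator would also weigh indexes and partitioning; row count is used here only as a simple cost proxy.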

Page 9: Dw - Rolap Molap Holap

ROLAP

• Special schema design: star, snowflake

• Special indexes: bitmap, multi-table join

• Special tuning: maximize query throughput

• Proven technology (relational model, DBMS); tends to outperform specialized MDDBs, especially on large data sets

• Products
– IBM DB2, Oracle, Sybase IQ, Red Brick, Informix
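As a rough illustration of the bitmap indexes listed above, the following Python sketch keeps one bitmap per distinct column value, stored as a Python integer where bit i is set if row i holds that value. The column data come from the `sale` table used later in this deck; the helper names are hypothetical, and real products use compressed on-disk bitmaps rather than plain integers.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set iff row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """List the row numbers whose bits are set in bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two columns of the sale fact table, in row order
store_col = ["s1", "s1", "s3", "s2", "s1", "s2"]
date_col = [1, 1, 1, 1, 2, 2]

store_idx = build_bitmap_index(store_col)
date_idx = build_bitmap_index(date_col)

# WHERE storeId = 's1' AND date = 1  ->  one bitwise AND of two bitmaps
print(rows_of(store_idx["s1"] & date_idx[1]))       # [0, 1]
# WHERE storeId IN ('s2', 's3')  ->  one bitwise OR
print(rows_of(store_idx["s2"] | store_idx["s3"]))   # [2, 3, 5]
```

Bitmaps pay off for low-cardinality dimension columns, where conjunctive and disjunctive predicates become cheap bit operations.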

Page 10: Dw - Rolap Molap Holap

Points to be noticed about ROLAP

• Defines complex, multidimensional data with a simple model
• Reduces the number of joins a query has to process
• Allows the data warehouse to evolve with relatively low maintenance
• Can contain both detailed and summarized data
• ROLAP is based on familiar, proven, and already selected technologies

BUT:
• SQL is poorly suited to multidimensional manipulation and calculations.

Page 11: Dw - Rolap Molap Holap

MOLAP

• MDDB: a special-purpose data model

• Facts stored in multi-dimensional arrays

• Dimensions used to index array

• Sometimes on top of relational DB

• Products
– Pilot, Arbor Essbase, Gentia
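A minimal sketch of "facts stored in multidimensional arrays, dimensions used to index the array", assuming a dense array laid out in row-major order in a flat Python list. The dimension members and facts come from the `sale` table used later in this deck; the lookup tables and `offset` helper are hypothetical. Real MDDB engines add compression for sparse regions, which this sketch omits.

```python
# Dimension members; each dimension indexes one axis of the array
products = ["p1", "p2"]
stores = ["s1", "s2", "s3"]
days = [1, 2]

# Member -> coordinate lookups
p_pos = {m: i for i, m in enumerate(products)}
s_pos = {m: i for i, m in enumerate(stores)}
d_pos = {m: i for i, m in enumerate(days)}

# Dense array stored flat in row-major order; 0 marks an empty cell
cells = [0] * (len(products) * len(stores) * len(days))


def offset(p, s, d):
    """Row-major position of cell (p, s, d) in the flat array."""
    return (p_pos[p] * len(stores) + s_pos[s]) * len(days) + d_pos[d]


# Load the facts (prodId, storeId, date, amt)
for p, s, d, amt in [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
                     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]:
    cells[offset(p, s, d)] += amt

# Direct access: the cell address is computed, not searched for or joined
print(cells[offset("p1", "s3", 1)])  # 50
```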

Page 12: Dw - Rolap Molap Holap

Multidimensional Data

[Figure: sales volume as a function of time, city and product, drawn as a cube. Axes: Product (Juice, Cola, Milk, Cream), City (NY, LA, SF), Date (3/1 to 3/4). Sample cell values: 10, 47, 30, 12.]

Page 13: Dw - Rolap Molap Holap

Operations in Multidimensional Data Model

• Aggregation (roll-up)
– dimension reduction: e.g., total sales by city
– summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year
• Selection (slice) defines a subcube
– e.g., sales where city = Palo Alto and date = 1/15/96
• Navigation to detailed data (drill-down)
• Calculation and ranking
– e.g., (sales - expense) by city, top 3% of cities by average income
• Visualization operations (e.g., pivot)
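The calculation-and-ranking examples above can be sketched in a few lines of Python. The four cities and all their figures are hypothetical, invented only for illustration, and rounding "top 3%" up to at least one city with `math.ceil` is an assumed policy, not something the slides specify.

```python
import math

# Hypothetical per-city figures (illustrative values, not from the slides)
cities = {
    "c1": {"sales": 120, "expense": 70, "avg_income": 95},
    "c2": {"sales": 80, "expense": 60, "avg_income": 48},
    "c3": {"sales": 150, "expense": 90, "avg_income": 67},
    "c4": {"sales": 40, "expense": 35, "avg_income": 52},
}

# Calculation: a derived measure, (sales - expense) by city
profit = {c: v["sales"] - v["expense"] for c, v in cities.items()}


def top_fraction(scores, fraction):
    """Return the top ceil(fraction * n) keys by score, highest first."""
    k = math.ceil(fraction * len(scores))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Ranking: top 3% of cities by average income (ceil(0.03 * 4) = 1 city)
income = {c: v["avg_income"] for c, v in cities.items()}
print(top_fraction(income, 0.03))  # ['c1']
print(top_fraction(profit, 0.5))   # ['c3', 'c1']
```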

Page 14: Dw - Rolap Molap Holap

A Visual Operation: Pivot (Rotate)

[Figure: the same sales cube (Product: Juice, Cola, Milk, Cream; City: NY, LA, SF; Date: 3/1 to 3/4) rotated so that other dimensions, such as Month, Region and Product, lie along the visible axes.]

Page 15: Dw - Rolap Molap Holap

Advantages of ROLAP Dimensional Modeling

• Defines complex, multidimensional data with a simple model

• Reduces the number of joins a query has to process

• Allows the data warehouse to evolve with relatively low maintenance

• HOWEVER, a star schema on a relational DBMS is not a magic solution
– Query optimization is still problematic
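A small star schema makes the "fewer joins" point concrete: each dimension table is exactly one join away from the fact table. This sketch assumes SQLite and the `sale` data used in this deck; the store-to-region mapping follows the later "Aggregation Using Hierarchies" slide, while the `brand` attribute and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables (brand values are hypothetical)
    CREATE TABLE store   (storeId TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE product (prodId  TEXT PRIMARY KEY, brand  TEXT);
    -- Fact table: one foreign key per dimension plus the measure
    CREATE TABLE sale (
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        date    INTEGER,
        amt     INTEGER
    );
    INSERT INTO store   VALUES ('s1', 'A'), ('s2', 'B'), ('s3', 'B');
    INSERT INTO product VALUES ('p1', 'b1'), ('p2', 'b2');
    INSERT INTO sale VALUES ('p1', 's1', 1, 12), ('p2', 's1', 1, 11), ('p1', 's3', 1, 50),
                            ('p2', 's2', 1, 8),  ('p1', 's1', 2, 44), ('p1', 's2', 2, 4);
    """
)

# Star join: fact table joined once to each dimension it groups by
rows = conn.execute(
    """
    SELECT st.region, pr.brand, SUM(f.amt)
    FROM sale AS f
    JOIN store   AS st ON f.storeId = st.storeId
    JOIN product AS pr ON f.prodId  = pr.prodId
    GROUP BY st.region, pr.brand
    ORDER BY st.region, pr.brand
    """
).fetchall()
print(rows)  # [('A', 'b1', 56), ('A', 'b2', 11), ('B', 'b1', 54), ('B', 'b2', 8)]
```

The query optimizer still has to pick a join order and access paths, which is the "still problematic" part of the slide.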

Page 16: Dw - Rolap Molap Holap

Aggregates

sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81

Page 17: Dw - Rolap Molap Holap

Aggregates

Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

ans
date  sum
1     81
2     48

sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Page 18: Dw - Rolap Molap Holap

Another Example

Add up amounts by day and product
In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

ans
prodId  date  sum
p1      1     62
p2      1     19
p1      2     48

Going from sale to ans is a rollup; going from ans back to sale is a drill-down.

sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
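The rollup and drill-down between these two result levels can be checked with a short Python sketch, assuming SQLite as the relational engine. It runs the corrected by-day-and-product query, rolls that finer result up to the day level in Python, and confirms it matches the coarser `GROUP BY date` run directly against the base table.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

# Finer level (drill-down): totals by date and product
fine = conn.execute(
    "SELECT prodId, date, SUM(amt) FROM sale GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()
print(fine)  # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]

# Roll the finer result up to the date level without touching the base table
by_date = defaultdict(int)
for _prod, date, total in fine:
    by_date[date] += total

# Compare with the coarser GROUP BY run directly on the base table
coarse = dict(conn.execute("SELECT date, SUM(amt) FROM sale GROUP BY date").fetchall())
print(dict(by_date) == coarse)  # True
```

This reuse works because SUM is distributive; an average could not be rolled up this way without also keeping counts.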

Page 19: Dw - Rolap Molap Holap

Aggregates

• Operators: sum, count, max, min, median, avg

• “Having” clause

• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
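A sketch of these operators and the HAVING clause, assuming SQLite and the `sale` table from the previous slides. SQLite has no built-in MEDIAN aggregate, so the median is computed client-side with Python's `statistics.median`; the threshold of 20 is an arbitrary illustrative value.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

rows = conn.execute(
    """
    SELECT storeId, COUNT(*), SUM(amt), MAX(amt), MIN(amt), AVG(amt)
    FROM sale
    GROUP BY storeId
    HAVING SUM(amt) > 20   -- filter groups after aggregation (drops s2, total 12)
    ORDER BY storeId
    """
).fetchall()

# No built-in MEDIAN in SQLite: fetch the group's values and compute it in Python
amounts_s1 = [amt for (amt,) in conn.execute("SELECT amt FROM sale WHERE storeId = 's1'")]
median_s1 = statistics.median(amounts_s1)
print(median_s1)  # 12
```

Unlike WHERE, which filters individual rows before grouping, HAVING filters on the aggregated value of each group.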

Page 20: Dw - Rolap Molap Holap

ROLAP vs. MOLAP

• ROLAP: Relational On-Line Analytical Processing

• MOLAP: Multidimensional On-Line Analytical Processing

Page 21: Dw - Rolap Molap Holap

The MOLAP Cube

Fact table view:
sale
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8

Multidimensional cube (dimensions = 2):
      s1   s2   s3
p1    12        50
p2    11   8

Page 22: Dw - Rolap Molap Holap

3-D Cube

Fact table view:
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Multidimensional cube (dimensions = 3):
day 1
      s1   s2   s3
p1    12        50
p2    11   8

day 2
      s1   s2   s3
p1    44   4
p2
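The step from the fact table view to the cube view can be sketched in Python as a nested dictionary indexed `cube[day][product][store]`, with `None` for cells that have no fact row. The helper names are hypothetical; the sketch also counts how many of the cube's cells actually hold data, which is why fact tables store only non-empty cells.

```python
# Rows of the sale fact table: (prodId, storeId, date, amt)
facts = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]
products, stores, days = ["p1", "p2"], ["s1", "s2", "s3"], [1, 2]

# cube[day][product][store]; None marks a cell with no fact row
cube = {d: {p: {s: None for s in stores} for p in products} for d in days}
for p, s, d, amt in facts:
    cube[d][p][s] = amt


def fmt(value):
    """Blank for empty cells, the number otherwise."""
    return "" if value is None else str(value)


def show_layer(day):
    """Render one day of the cube as a product x store grid."""
    lines = [" " * 4 + "".join(f"{s:>5}" for s in stores)]
    for p in products:
        lines.append(f"{p:<4}" + "".join(f"{fmt(cube[day][p][s]):>5}" for s in stores))
    return "\n".join(lines)


for d in days:
    print(f"day {d}")
    print(show_layer(d))

filled = sum(v is not None for layer in cube.values() for row in layer.values() for v in row.values())
print(filled, "of", len(days) * len(products) * len(stores), "cells hold data")  # 6 of 12 cells hold data
```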

Page 23: Dw - Rolap Molap Holap

Example

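[Figure: a sales cube with dimensions Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread) and Time (M, T, W, Th, F, S, S); sample cell values 10, 34, 56, 32, 12, 56. Highlighted cell: 56 units of bread sold in LA on M.]

Dimensions: Time, Product, Store

Attributes: Product (upc, price, …), Store …

Hierarchies:
• Product -> Brand -> …
• Day -> Week -> Quarter
• Store -> Region -> Country

Roll-up examples: roll up Time to week, Product to brand, Store to region.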

Page 24: Dw - Rolap Molap Holap

Cube Aggregation: Roll-up

day 1
      s1   s2   s3
p1    12        50
p2    11   8

day 2
      s1   s2   s3
p1    44   4
p2

rollup (sum over days); drill-down goes the other way:
      s1   s2   s3
p1    56   4    50
p2    11   8

rollup further:
sum by store:    s1 67, s2 12, s3 50
sum by product:  p1 110, p2 19
grand total:     129
. . .

Example: computing sums
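The chain of sums on this slide can be reproduced with a short Python sketch using `collections.Counter`, assuming the `sale` rows from the earlier slides. Each step rolls up one more dimension; reading the steps in reverse order is the drill-down.

```python
from collections import Counter

# sale fact rows: (prodId, storeId, date, amt)
facts = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]

# Step 1: roll up the time dimension -> (product, store) totals
by_prod_store = Counter()
for p, s, _d, amt in facts:
    by_prod_store[(p, s)] += amt

# Step 2: roll up one more dimension, keeping either store or product
by_store, by_prod = Counter(), Counter()
for (p, s), amt in by_prod_store.items():
    by_store[s] += amt
    by_prod[p] += amt

# Step 3: roll up everything to the grand total
total = sum(by_prod_store.values())

# Totals: s1 67, s2 12, s3 50; p1 110, p2 19; grand total 129
print(dict(by_store), dict(by_prod), total)
```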

Page 25: Dw - Rolap Molap Holap

Cube Operators for Roll-up

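day 1
      s1   s2   s3
p1    12        50
p2    11   8

day 2
      s1   s2   s3
p1    44   4
p2

summed over days:
      s1   s2   s3
p1    56   4    50
p2    11   8

sum by store:    s1 67, s2 12, s3 50
sum by product:  p1 110, p2 19
grand total:     129
. . .

Cube operators (arguments in the order store, product, date; * = all values):
sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129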

Page 26: Dw - Rolap Molap Holap

Extended Cube

day 1
      s1   s2   s3   *
p1    12        50   62
p2    11   8         19
*     23   8    50   81

day 2
      s1   s2   s3   *
p1    44   4         48
p2
*     44   4         48

* (all days)
      s1   s2   s3   *
p1    56   4    50   110
p2    11   8         19
*     67   12   50   129

sale(*,p2,*) = 19
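The extended cube can be computed by adding every fact to all 2³ cells obtained by replacing any subset of its coordinates with `*`. This Python sketch assumes the slide's argument order (store, product, date) and the `sale` data above; `extended_cube` and `sale` are hypothetical helper names. Some SQL databases (for example PostgreSQL) offer the same computation as `GROUP BY CUBE`, which SQLite does not support.

```python
from itertools import product

ALL = "*"

# Facts in the slide's argument order: (store, product, date, amt)
facts = [("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
         ("s2", "p2", 1, 8), ("s1", "p1", 2, 44), ("s2", "p1", 2, 4)]


def extended_cube(facts):
    """Add every fact to all cells obtained by replacing coordinates with '*'."""
    cube = {}
    for store, prod, date, amt in facts:
        coords = (store, prod, date)
        for mask in product((False, True), repeat=len(coords)):
            key = tuple(ALL if use_all else c for use_all, c in zip(mask, coords))
            cube[key] = cube.get(key, 0) + amt
    return cube


cube = extended_cube(facts)


def sale(store, prod, date):
    """Look up one cell of the extended cube; empty cells read as 0."""
    return cube.get((store, prod, date), 0)


print(sale("s1", ALL, ALL), sale("s2", "p2", ALL), sale(ALL, ALL, ALL))  # 67 8 129
```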

Page 27: Dw - Rolap Molap Holap

Aggregation Using Hierarchies

Hierarchy: store -> region -> country
(store s1 in Region A; stores s2, s3 in Region B)

day 1
      s1   s2   s3
p1    12        50
p2    11   8

day 2
      s1   s2   s3
p1    44   4
p2

Summed over days and rolled up from store to region:
      region A   region B
p1    56         54
p2    11         8
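Rolling up along the store hierarchy amounts to mapping each store to its parent region before summing. This Python sketch uses the region assignment stated on this slide and the `sale` rows from earlier; `rollup_to_region` is a hypothetical helper name.

```python
from collections import defaultdict

# sale fact rows: (prodId, storeId, date, amt)
facts = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]

# One level up the store -> region -> country hierarchy, as given on this slide
region_of = {"s1": "A", "s2": "B", "s3": "B"}


def rollup_to_region(facts, region_of):
    """Sum amounts over all dates, grouping stores into their regions."""
    totals = defaultdict(int)
    for prod, store, _date, amt in facts:
        totals[(prod, region_of[store])] += amt
    return dict(totals)


by_region = rollup_to_region(facts, region_of)
print(by_region[("p1", "A")], by_region[("p1", "B")])  # 56 54
```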

Page 28: Dw - Rolap Molap Holap

Slicing

day 1
      s1   s2   s3
p1    12        50
p2    11   8

day 2
      s1   s2   s3
p1    44   4
p2

Slice TIME = day 1:
      s1   s2   s3
p1    12        50
p2    11   8
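Slicing fixes one dimension to a single value and keeps the sub-cube over the remaining dimensions. In this Python sketch, dimensions are identified by their position in the `sale` fact rows (the constants `PRODUCT`, `STORE`, `TIME` and the `slice_cube` helper are hypothetical names).

```python
# sale fact rows: (prodId, storeId, date, amt); positions name the dimensions
PRODUCT, STORE, TIME = 0, 1, 2
facts = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]


def slice_cube(facts, dim, value):
    """Fix dimension `dim` to `value`; return the sub-cube over the other two dimensions."""
    sub = {}
    for fact in facts:
        if fact[dim] == value:
            key = tuple(member for i, member in enumerate(fact[:3]) if i != dim)
            sub[key] = sub.get(key, 0) + fact[3]
    return sub


day1 = slice_cube(facts, TIME, 1)
print(day1)  # {('p1', 's1'): 12, ('p2', 's1'): 11, ('p1', 's3'): 50, ('p2', 's2'): 8}
```

The same function slices along any dimension, e.g. `slice_cube(facts, STORE, "s2")` gives the (product, date) sub-cube for store s2.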

Page 29: Dw - Rolap Molap Holap

Slicing & Pivoting

Sales ($ millions), slice Time = d1 (of days d1, d2):

Store  Product      Sales
s1     Electronics  5.2
s1     Toys         1.9
s1     Clothing     2.3
s1     Cosmetics    1.1
s2     Electronics  8.9
s2     Toys         0.75
s2     Clothing     4.6
s2     Cosmetics    1.5

Pivoted (products as rows, stores as columns), Sales ($ millions), d1:

Product      Store s1  Store s2
Electronics  5.2       8.9
Toys         1.9       0.75
Clothing     2.3       4.6
Cosmetics    1.1       1.5
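Pivoting only changes which dimension runs along the rows and which along the columns; no values change. A minimal Python sketch on the d1 slice from this slide, with `pivot` as a hypothetical helper that transposes a nested dictionary:

```python
# Slice Time = d1 of the sales cube ($ millions), as store -> product -> sales
d1 = {
    "s1": {"Electronics": 5.2, "Toys": 1.9, "Clothing": 2.3, "Cosmetics": 1.1},
    "s2": {"Electronics": 8.9, "Toys": 0.75, "Clothing": 4.6, "Cosmetics": 1.5},
}


def pivot(table):
    """Swap the row and column axes of a nested dict."""
    out = {}
    for row, cols in table.items():
        for col, value in cols.items():
            out.setdefault(col, {})[row] = value
    return out


by_product = pivot(d1)
print(by_product["Toys"])  # {'s1': 1.9, 's2': 0.75}
```

Pivoting twice returns the original layout, which is a quick sanity check that the operation is purely presentational.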

Page 30: Dw - Rolap Molap Holap

Summary of Operations

• Aggregation (roll-up)
– aggregate (summarize) data to the next higher dimension element
– e.g., total sales by city and year -> total sales by region and year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
– e.g., sales where city = ‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
– e.g., top 3% of cities by average income
• Visualization operations (e.g., pivot)
• Time functions
– e.g., time average

Page 31: Dw - Rolap Molap Holap

Points to be noticed about MOLAP

• Pre-calculating or pre-consolidating transactional data improves speed.

BUT: fully pre-consolidating incoming data requires an enormous amount of overhead in MDDs, both in processing time and in storage. An input file of 200 MB can easily expand to 5 GB.

MDDs are great candidates for departmental data marts under 50 GB.

• Rolling up and drilling down through aggregate data.

• With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.
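One way to see why full pre-consolidation is so costly: if dimension i has a single linear hierarchy with Lᵢ levels, a fully consolidated cube contains ∏(Lᵢ + 1) group-by combinations (cuboids), the +1 being the top "all" level, and every cuboid holds its own set of cells. The Python sketch below evaluates this count; the level counts are assumptions for illustration (Product -> Brand is taken as two levels, although the earlier example slide elides further levels).

```python
from math import prod


def num_cuboids(levels_per_dim):
    """Count the group-by combinations (cuboids) of a fully pre-consolidated cube.

    levels_per_dim[i] is the number of hierarchy levels of dimension i, not
    counting the top 'all' level, which contributes the extra +1.
    """
    return prod(levels + 1 for levels in levels_per_dim)


# Store, Product, Time with no hierarchies: 2**3 group-bys
print(num_cuboids([1, 1, 1]))  # 8
# Store -> Region -> Country, Product -> Brand, Day -> Week -> Quarter
print(num_cuboids([3, 2, 3]))  # 48
# Ten dimensions with four levels each
print(num_cuboids([4] * 10))   # 9765625
```

This counts cuboids, not cells, so actual storage grows faster still; it is a rough guide to the expansion the slide describes, not a reproduction of its 200 MB to 5 GB figure.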

Page 32: Dw - Rolap Molap Holap

ROLAP

Page 33: Dw - Rolap Molap Holap

Relational DBMS as Warehouse Server

• Schema design
• Specialized scan, indexing and join techniques
• Handling of aggregate views (querying and materialization)
• Supporting query language extensions beyond SQL
• Complex query processing and optimization
• Data partitioning and parallelism

Page 34: Dw - Rolap Molap Holap

MOLAP vs. ROLAP

• Commercial offerings of both types are available

• In general, MOLAP is good for smaller warehouses and is optimized for canned queries

• In general, ROLAP is more flexible: it leverages relational technology on the data server and uses a ROLAP server as an intermediary. It may pay a performance penalty to realize this flexibility


Page 36: Dw - Rolap Molap Holap

Hybrid OLAP (HOLAP)

• HOLAP = Hybrid OLAP:

– Best of both worlds

– Storing detailed data in RDBMS

– Storing aggregated data in MDBMS

– User access via MOLAP tools

Page 37: Dw - Rolap Molap Holap

Data Flow in HOLAP

[Figure: a client with a multidimensional viewer and a relational viewer; an MDBMS server holding multidimensional (derived) data; an RDBMS server holding user data and metadata. Labeled flows: multidimensional access, SQL-Read, and SQL reach-through.]

Page 38: Dw - Rolap Molap Holap

When deciding which technology to go for, consider:

1) Performance:

• How fast will the system appear to the end-user?

• MDD server vendors believe this is a key point in their favor.

2) Data volume and scalability:

• While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

Page 39: Dw - Rolap Molap Holap

An experiment with the relational and multidimensional models on a data set

The analysis of the author’s example illustrates the following differences between the best relational alternative and the multidimensional approach (relational vs. multidimensional, with the improvement factor in parentheses):

• Disk space requirement (gigabytes): 17 vs. 10 (1.7)
• Retrieve the corporate measures Actual vs. Budget, by month (I/Os): 240 vs. 1 (240)
• Calculation of Variance Budget/Actual for the whole database (I/O time in hours): 237 vs. 2* (110*)

* This may include the calculation of many other derived data without any additional I/O.

Reference: http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf

Page 40: Dw - Rolap Molap Holap

What-if analysis

IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN consider an MDD/MOLAP solution for your data mart.

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN consider an RDBMS/ROLAP solution for your data mart.

IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN consider HOLAP for your data mart.

Page 41: Dw - Rolap Molap Holap

Examples

• ROLAP
– Telecommunication startup: call data records (CDRs)
– E-commerce site
– Credit card company

• MOLAP
– Analysis and budgeting in a financial department
– Sales analysis

• HOLAP
– Sales department of a multinational company
– Banks and financial service providers

Page 42: Dw - Rolap Molap Holap

Tools: Warehouse Servers

The RDBMS dominates:
• Oracle 8i/9i
• IBM DB2
• Microsoft SQL Server
• Informix (IBM)
• Red Brick Warehouse (Informix/IBM)
• NCR Teradata
• Sybase
• …

Page 43: Dw - Rolap Molap Holap

Tools: OLAP Servers

• Support multidimensional OLAP queries
• Often characterized by how the underlying data are stored
• Relational OLAP (ROLAP) servers
– Data stored in relational tables
– Examples: MicroStrategy Intelligence Server, MetaCube (Informix/IBM)
• Multidimensional OLAP (MOLAP) servers
– Data stored in array-based structures
– Examples: Hyperion Essbase, Fusion (Information Builders)
• Hybrid OLAP (HOLAP)
– Examples: PowerPlay (Cognos), Brio, Microsoft Analysis Services, Oracle Advanced Analytic Services

Page 44: Dw - Rolap Molap Holap

• DOLAP (desktop OLAP)
– Brio.Enterprise
– BusinessObjects
– Cognos PowerPlay

• MOLAP
– SAS CFO Vision
– Comshare Decision
– Hyperion Essbase
– PowerPlay Enterprise Server

• ROLAP
– Cartesis Carat
– MicroStrategy

• HOLAP
– Oracle Express
– Seagate Holos
– Speedware Media/M
– Microsoft OLAP Services

This list is not exhaustive, and product and vendor classifications may vary.

Source: OLAP architectures, http://www.olapreport.com/Architectures.htm

Page 45: Dw - Rolap Molap Holap

Tools: Extraction, Transformation, & Load (ETL)

• Cognos Accelerator
• Copy Manager, Data Migrator for SAP, PeopleSoft (Information Builders)
• DataPropagator (IBM)
• ETI Extract (Evolutionary Technologies)
• Sagent Solution (Sagent Technology)
• PowerMart (Informatica)
• …

Page 46: Dw - Rolap Molap Holap

Tools: Report & Query

• Actuate e.Reporting Suite (Actuate)
• Brio One (Brio Technologies)
• Business Objects
• Crystal Reports (Crystal Decisions)
• Impromptu (Cognos)
• Oracle Discoverer, Oracle Reports
• QMF (IBM)
• SAS Enterprise Reporter
• …

Page 47: Dw - Rolap Molap Holap

Tools: Data Mining

• BusinessMiner (Business Objects)
• Decision Series (Accrue)
• Enterprise Miner (SAS)
• Intelligent Miner (IBM)
• Oracle Data Mining Suite
• Scenario (Cognos)
• …

Page 48: Dw - Rolap Molap Holap

– www.microstrategy.com

– www.businessobjects.com

– www.cognos.com

– www.brio.com

– www.hyperion.com

– www.oracle.com/ip/analyze/warehouse/bus_intell

– www.microsoft.com/sql/techinfo/datawarehousing.htm