Decision Support Systems 1 From Transaction Processing to Support for Decision Making CIS 671

Upload: silvia-dorsey

Post on 23-Dec-2015

Decision Support Systems

From Transaction Processing to Support for Decision Making

CIS 671

Computerized Information Systems

• Used to “run the business”.
• OSU examples
  – Personnel & Payroll (ARMS)
  – Course offerings
  – Students, including course enrollments and grades
    • (estimated $30M to replace)
  – Inventory
• Transaction processing

1st Generation DBMS

• Designed for transaction processing
  – Hierarchical – IBM – IMS
  – Network
• Management Information Systems
  – Added later
  – Mostly standard summary reports
    • Produced on a regular basis

Relational DBMS

• Codd – particularly designed for “ad hoc” queries
• First uses for transaction processing
• Transaction data now available on-line
  – Use it to help decision making
  – Ad hoc

Decision Support Systems (DSS)

• Use a comprehensive view of all aspects of the business.
  – Different business units
  – Historical data
  – Summary information
• Classes of analysis tools:
  – Complex “traditional” SQL queries
  – Many “group-by” and “aggregation” queries (On Line Analytical Processing)
  – Exploratory data analysis - Data Mining

Data Warehousing

• Properties
  – Consolidated data from many sources
  – Spanning long time periods
  – Augmented with summary information
• Size: several gigabytes to terabytes

Data Warehouse Creation

• Integrate schemas from different groups
  – Semantic mismatches
    • Different currencies
    • Different names for same attributes
    • Different structures for similar tables

Data Warehouse Creation, cont.

• Extract data from different operational databases and other external sources
  – Clean data - correct errors, fill in missing data
  – Transform data to match integrated schema
  – Load data into warehouse
  – Refresh data in a timely fashion
  – Purge very old data
  – Create metadata repository
• May be so large that it is in a separate database

Data Warehouse - Provide Variety of Analytical Tools

  – Complex “traditional” SQL queries
  – OLAP query engine
  – Data mining algorithms
  – Information visualization tools
  – Statistical packages
  – Report generators

Data Mart

• Departmental subset of a data warehouse
• Top-down approach
  – Derive from the organization’s data warehouse
  – May be too hard to do all at once
• Bottom-up approach
  – Initially create departmental data marts
  – Integrate data marts into organizational data warehouse
  – If not done carefully, may be hard to integrate

OLTP vs. Data Warehouse DBs
(from Toby J. Teorey, Database Modeling & Design, Morgan Kaufmann, 1999, p. 212)

OLTP
• Transaction oriented
• Thousands of users
• Small (MB to several GB)
• Current data
• Normalized data (many tables, few columns per table)
• Continuous update
• Simple to complex queries

Data Warehouse
• Subject oriented
• Few users ( 100)
• Large (hundreds of GB to several TB)
• Historical data
• Denormalized data (few tables, many columns per table)
• Batch updates
• Usually very complex queries

Complex “traditional” SQL queries

• Relational DBMS optimized for decision support
  – in contrast to a DBMS optimized for transaction processing
• Example:
  – Teradata machine from NCR

On Line Analytical Processing (OLAP)

Multidimensional Databases (MDD)

Example from Finkelstein [Fink95]:

SALES_INFO
Branch  ProdID  Date    Sales      Returns
BOS     1       1/2/98  $1,000.00  4
NY      1       1/2/98  $1,222.00  2
CMH     2       1/3/98  $555.00    1
SF      2       1/3/98  $1,777.00  9

PROD_INFO
ProdID  Description   Category
1       Widget        I
2       Super Widget  II

BRANCH_INFO
Branch  Region
BOS     A
NY      A
CMH     B
SF      C

REGION_INFO
Region  Territory
A       East
B       East
C       West

• Note that Branch, ProdID, Date → Sales, Returns
• Note the multidimensionality of the SALES_INFO table.

Dimension Hierarchies

LOCATION: Territory > Region > Branch
TIME: Year > Quarter > Month > Date (Week is shown alongside Month, above Date)
PRODUCT: Category > ProdID

Possible queries:

1. How did product Widget sell in the last month, and how does this figure compare with sales over the last five years? How about by branch, region and territory?
2. Did this product sell better in different regions, and are there any regional trends?
3. Were there more returns of Widgets over the last year? Were these returns caused by defects? Were they manufactured in any particular plants?

Additional possible query:

4. Do commissions and pricing affect how salespersons sell the product? Do particular salespersons do a better job of selling the product?

Note that a "multidimensional" spreadsheet would be useful.

Codd called this type of problem On Line Analytical Processing (OLAP), in contrast to On Line Transaction Processing (OLTP).

Codd's rules for OLAP [Codd93]

1. Multi-Dimensional Concept View
   The user should be able to see the data as being multidimensional insofar as it should be easy to 'pivot' or 'slice and dice'. (See later.)
2. Transparency
   The OLAP functionality should be provided behind the user's existing software without adversely affecting the functionality of the 'host'.
3. Accessibility
   OLAP should allow the user to access diverse data stores but see the data within a common 'schema' provided by the OLAP tool.

OLAP Rules, cont.

4. Consistent Reporting Performance
   There should not be significant degradation in performance with large numbers of dimensions or large quantities of data.
5. Client-Server Architecture
   Since much of the data is on mainframes, and the users work on PCs, the OLAP tool must be able to bring the two together!
6. Generic Dimensionality
   Data dimensions must all be treated equally. Functions available for one dimension must be available for others.

OLAP Rules, cont.

7. Dynamic Sparse Matrix Handling
   The OLAP tool should be able to work out for itself the most efficient way to store sparse matrix data.
8. Multi User Support
   This is self-evident.
9. Unrestricted Cross-Dimensional Operations
   E.g., individual office overheads are allocated according to total corporate overheads divided in proportion to individual office sales.

OLAP Rules, cont.

10. Intuitive Data Manipulation
    Navigation should be done by operations on individual cells rather than menus.
11. Flexible Reporting
    Row and column headings must be capable of more than one dimension each, and of displaying subsets of any dimension.
12. Unlimited Dimensions and Aggregation Levels
    At least 15 dimensions may be required, and within each there may be many hierarchical levels.

“Pivoting”: Cross Tabulation

Sales by Date and Region

                 Region
Date     A        B       C        Total
1/2/98   $2,222   $0      $0       $2,222
1/3/98   $0       $555    $1,777   $2,332
Total    $2,222   $555    $1,777   $4,554

“Drill Down” (narrower category)

Replace Region by Branch.

                 Region
Date     A        B       C        Total
1/2/98   $2,222   $0      $0       $2,222
1/3/98   $0       $555    $1,777   $2,332
Total    $2,222   $555    $1,777   $4,554

“Rollup” (more general category)

Replace Region by Territory.
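As a minimal sketch of how the cross tabulation, drill down, and rollup relate, the following Python uses the SALES_INFO, BRANCH_INFO, and REGION_INFO rows from the Finkelstein example. The function name `crosstab` and the dictionary layout are assumptions made for illustration, not part of the original slides.

```python
from collections import defaultdict

# Rows of SALES_INFO from the example: (branch, prod_id, date, sales, returns)
SALES_INFO = [
    ("BOS", 1, "1/2/98", 1000.00, 4),
    ("NY", 1, "1/2/98", 1222.00, 2),
    ("CMH", 2, "1/3/98", 555.00, 1),
    ("SF", 2, "1/3/98", 1777.00, 9),
]
BRANCH_INFO = {"BOS": "A", "NY": "A", "CMH": "B", "SF": "C"}  # branch -> region
REGION_INFO = {"A": "East", "B": "East", "C": "West"}         # region -> territory


def crosstab(level):
    """Total sales keyed by (date, location), with location at the given level."""
    to_level = {
        "branch": lambda b: b,
        "region": lambda b: BRANCH_INFO[b],
        "territory": lambda b: REGION_INFO[BRANCH_INFO[b]],
    }[level]
    table = defaultdict(float)
    for branch, _prod, date, sales, _returns in SALES_INFO:
        table[(date, to_level(branch))] += sales
    return dict(table)


by_region = crosstab("region")        # the Sales by Date and Region table
by_branch = crosstab("branch")        # drill down: Region replaced by Branch
by_territory = crosstab("territory")  # rollup: Region replaced by Territory
```

Only non-empty cells are stored, so the $0 entries of the slide's table simply do not appear as keys.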

OLAP Questions

1. Query language - how to say what's wanted.

2. Processing language - how to specify calculations: ratios, variances, . . . .

3. Data visualization - how to see the data.

4. Performance - time to process the query (5 second rule).

OLAP References

[Codd93] E. F. Codd, S. B. Codd, and C.T. Salley, "Providing OLAP to User Analysts: An IT Mandate," Codd & Date Inc., 1993.

[Fink95] Richard Finkelstein, "MDD: Database Reaches the Next Dimension," DATABASE Programming and Design, 8(4), April 1995.

Exploratory Data Analysis: Data Mining

• Find interesting trends or patterns in large data sets.

• Statistics - Exploratory Data Analysis

• Artificial Intelligence - Knowledge Discovery and Machine Learning

• Much larger data sets

Mining for Association Rules

• Classic example
• Market basket analysis
  – Record each customer transaction at a grocery store.
  – Try to identify sets of items purchased together.

TransID  Item
111      coke
111      chips
111      dip
112      coke
112      chips
112      veggies
113      coke
113      beef
113      chicken
114      chips
114      beef
115      chips
115      chicken

Association Rule: {coke} → {chips}

People who buy coke usually buy chips.

Measures for Association Rule {LHS} → {RHS}

• Support: % of transactions containing this set of items. (2/5 = 40%)
• Confidence: given all transactions containing LHS items, the % that also contain the RHS. (2/3 = 67%)

Want both to be “reasonably” large.
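The two measures can be checked against the market-basket rows with a short Python sketch. The function names `baskets`, `support`, and `confidence` are illustrative assumptions; the data is the TransID/Item table from the slide.

```python
# Market-basket rows from the example: (TransID, Item)
ROWS = [
    (111, "coke"), (111, "chips"), (111, "dip"),
    (112, "coke"), (112, "chips"), (112, "veggies"),
    (113, "coke"), (113, "beef"), (113, "chicken"),
    (114, "chips"), (114, "beef"),
    (115, "chips"), (115, "chicken"),
]


def baskets(rows):
    """Group the rows into one set of items per transaction."""
    out = {}
    for trans_id, item in rows:
        out.setdefault(trans_id, set()).add(item)
    return out


def support(rows, items):
    """Fraction of all transactions that contain every item in `items`."""
    b = baskets(rows)
    return sum(1 for s in b.values() if items <= s) / len(b)


def confidence(rows, lhs, rhs):
    """Among transactions containing `lhs`, the fraction also containing `rhs`."""
    with_lhs = [s for s in baskets(rows).values() if lhs <= s]
    return sum(1 for s in with_lhs if rhs <= s) / len(with_lhs)


rule_support = support(ROWS, {"coke", "chips"})          # 2 of 5 transactions
rule_confidence = confidence(ROWS, {"coke"}, {"chips"})  # 2 of 3 coke transactions
```

This sketch assumes the LHS occurs in at least one transaction; otherwise `confidence` would divide by zero.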

On-Line Analytical Processing (OLAP), Part II

CIS 671

Elmasri & Navathe §26.1

Multi-dimensional View of Data

• Fact table (also called cubes)
  – Dimension attributes
  – Dependent attributes (functions of the dimension attributes)
• Dimension tables, potentially one for each dimension

OLAP Operations

• Roll-up – increase the level of aggregation
• Drill-down – decrease the level of aggregation
• Slice-and-dice – selection and projection, i.e., reduce dimensionality of the data
• Pivot – re-orient the dimensional view

Implementation Approaches

• Relational OLAP (ROLAP) servers
  – Data stored in a relational system
  – SQL extended
    • To allow easy OLAP query expression
    • To provide efficient OLAP query execution
• Multidimensional OLAP (MOLAP)
  – Systems directly store multidimensional data in special data structures.
  – OLAP operations implemented directly on these data structures.
• Hybrid OLAP (HOLAP)
  – Combines ROLAP and MOLAP.
  – Detail records (largest volume) in relational database.
  – Aggregations in separate, but “connected”, MOLAP store.

Example of a Star Schema

Sales (Fact) table:
  Sales(OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice)

Dimension tables:
  Order(OrderNo, OrderDate)
  Customer(CustomerNo, CustomerName, CustomerAddress, City)
  Salesperson(SalespersonID, SalespersonName, City, Quota)
  City(CityName, State, Region)
  Date(DateKey, Date, Month, Year)
  Product(ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH)

Snowflake Schema

Sales (Fact) table:
  Sales(OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice)

Dimension tables:
  Order(OrderNo, OrderDate)
  Customer(CustomerNo, CustomerName, CustomerAddress, City)
  Salesperson(SalespersonID, SalespersonName, City, Quota)
  City(CityName, State, Region)
  Date(DateKey, Date, Month, Year)
  Product(ProdNo, ProdName, ProdDescr, Category, UnitPrice, QOH)

Further normalized dimension tables:
  Month(Month, Year)
  Category(CategoryName, CategoryDescr)
  State(State, Region)
  Year
  Region

Data Cubes

• Precompute all possible aggregations.

• Required extra storage is tolerable.

• Little penalty to keep aggregate up-to-date if data does not change.

• Normally some aggregation of raw data is done before it is entered into the data cube.

Data Cube with Orders Accumulated

Sales table:
  Sales(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue)

Dimension tables:
  Customer(CustomerNo, CustomerName, CustomerAddress, City)
  Salesperson(SalespersonID, SalespersonName, City, Quota)
  City(CityName, State)
  Date(DateKey, Date, Month)
  Product(ProdNo, ProdName, ProdDescr, Category, UnitPrice, QOH)

Further normalized dimension tables:
  Month(Month, Year)
  Category(CategoryName, CategoryDescr)
  State(State, Region)
  Year
  Region

Note that the average for any aggregate can be calculated from TotalValue and Quantity.

Sample of Aggregates in the CUBE

Sales(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue)

  22   11   100   2   ‘Columbus’     3      300

CUBE(Sales)(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue)

  22   11   100   2   ‘Columbus’     3      300
  22   *    100   2   ‘Columbus’     6     2222
  22   *    *     2   ‘Columbus’    25    33000
  *    *    *     2   ‘Columbus’    75    90000
  *    *    *     *   ‘Columbus’   200   503444

How to answer query given the relation CUBE(Sales)

Choose tuples in CUBE(Sales) with the following properties:

1. Query specifies value v for attribute a ⇒ tuple t has v in its component for a.
2. Query groups by attribute a ⇒ tuple t has any non-* value in its component for a.
3. Query neither groups by attribute a nor specifies a value for a ⇒ tuple t has * in its component for a.

How to answer query given the relation CUBE(Sales)

Cube(Sales)(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue)

  22   11   100   2   ‘Columbus’     3      300
  22   *    100   2   ‘Columbus’     6     2222
  22   *    *     2   ‘Columbus’    25    33000
  *    *    *     2   ‘Columbus’    75    90000
  *    *    *     *   ‘Columbus’   200   503444

Query:

  select CustomerNo, avg(Price)
  from Sales
  where SalespersonID = 22
  group by CustomerNo

Matching tuples in Cube(Sales)(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue):

  22   c   *   *   *   n   v

Result(c, v/n)
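The selection of cube tuples can be sketched in Python as follows. This is only an illustration: the base tuples in `SALES` are made up, the names `cube` and `answer` are hypothetical, and a real system would not build the full cube one base tuple at a time in memory.

```python
from itertools import product

DIMS = ("SalespersonID", "CustomerNo", "ProdNo", "DateKey", "CityName")

# Hypothetical base tuples: five dimension values, then Quantity, TotalValue.
SALES = [
    (22, 11, 100, 2, "Columbus", 3, 300),
    (22, 11, 101, 3, "Columbus", 1, 50),
    (22, 12, 100, 2, "Columbus", 3, 330),
    (23, 11, 100, 2, "Dayton", 2, 200),
]


def cube(rows):
    """Map each dimension tuple (with '*' wildcards) to [Quantity, TotalValue]."""
    cells = {}
    for *dims, qty, value in rows:
        # Each base tuple contributes to 2^5 cells: keep or star each dimension.
        for mask in product((False, True), repeat=len(dims)):
            key = tuple("*" if starred else d for d, starred in zip(dims, mask))
            cell = cells.setdefault(key, [0, 0])
            cell[0] += qty
            cell[1] += value
    return cells


def answer(cells, where, group_by):
    """Pick cube tuples by the three rules and return {group: TotalValue/Quantity}."""
    result = {}
    for key, (qty, value) in cells.items():
        ok = True
        for name, component in zip(DIMS, key):
            if name in where:
                ok = component == where[name]  # attribute fixed to a value
            elif name in group_by:
                ok = component != "*"          # grouped attribute: any non-* value
            else:
                ok = component == "*"          # attribute not mentioned: must be *
            if not ok:
                break
        if ok:
            group = tuple(c for n, c in zip(DIMS, key) if n in group_by)
            result[group] = value / qty
    return result


CELLS = cube(SALES)
avg_by_customer = answer(CELLS, {"SalespersonID": 22}, {"CustomerNo"})
```

Here `avg_by_customer` corresponds to Result(c, v/n) for the pattern (22, c, *, *, *, n, v): no base tuple is rescanned at query time, only precomputed cells are read.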

Cube Implementation by Materialized Views

• Dimensions may have hierarchies.
  – Product, Category
  – City, State, Region

Example: Materialized Views

Cube(Sales)(SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue)
City(CityName, State, Region)

insert into SalesV1
select SalespersonID, CustomerNo, Month, State,
       sum(Quantity) as Quantity, sum(TotalValue) as TotalValue
from Sales join City on Sales.CityName = City.CityName
           join Date on Sales.DateKey = Date.DateKey
group by SalespersonID, CustomerNo, Month, State;

insert into SalesV2
select SalespersonID, CustomerNo, Month, Region,
       sum(Quantity) as Quantity, sum(TotalValue) as TotalValue
from Sales join City on Sales.CityName = City.CityName
           join Date on Sales.DateKey = Date.DateKey
group by SalespersonID, CustomerNo, Month, Region;

Example: Query 1

select SalespersonID, sum(TotalValue)
from Sales
group by SalespersonID;

Answer by

select SalespersonID, sum(TotalValue)
from SalesV1
group by SalespersonID;

or by

select SalespersonID, sum(TotalValue)
from SalesV2
group by SalespersonID;

Example: Query 2

select SalespersonID, State, sum(TotalValue)
from Sales
group by SalespersonID, State;

Answer only by

select SalespersonID, State, sum(TotalValue)
from SalesV1
group by SalespersonID, State;

Example: Query 3

select SalespersonID, State, Date, sum(TotalValue)
from Sales
group by SalespersonID, State, Date;

Cannot be answered by either SalesV1 or SalesV2. Thus must use Sales itself.
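The claims about which view can answer which query can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the rows are invented, the views are built with `create table ... as select` rather than `insert into`, and the date dimension is called `Dates` (with columns DateKey, Day, Month) and joined in to obtain Month.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table City(CityName, State, Region);
create table Dates(DateKey, Day, Month);
create table Sales(SalespersonID, CustomerNo, ProdNo, DateKey, CityName,
                   Quantity, TotalValue);
insert into City values ('Columbus', 'OH', 'Midwest'), ('Dayton', 'OH', 'Midwest'),
                        ('Boston', 'MA', 'Northeast');
insert into Dates values (1, '2000-01-05', 1), (2, '2000-01-20', 1),
                         (3, '2000-02-03', 2);
insert into Sales values
  (22, 11, 100, 1, 'Columbus', 3, 300),
  (22, 11, 100, 2, 'Dayton',   1, 120),
  (22, 12, 101, 3, 'Boston',   2, 500),
  (23, 11, 100, 2, 'Columbus', 4, 400);

create table SalesV1 as
  select SalespersonID, CustomerNo, Month, State,
         sum(Quantity) as Quantity, sum(TotalValue) as TotalValue
  from Sales join City on Sales.CityName = City.CityName
             join Dates on Sales.DateKey = Dates.DateKey
  group by SalespersonID, CustomerNo, Month, State;

create table SalesV2 as
  select SalespersonID, CustomerNo, Month, Region,
         sum(Quantity) as Quantity, sum(TotalValue) as TotalValue
  from Sales join City on Sales.CityName = City.CityName
             join Dates on Sales.DateKey = Dates.DateKey
  group by SalespersonID, CustomerNo, Month, Region;
""")


def rows(sql):
    return con.execute(sql).fetchall()


# Query 1: answerable from Sales, SalesV1, or SalesV2.
q1 = "select SalespersonID, sum(TotalValue) from {} group by SalespersonID order by 1"
q1_sales, q1_v1, q1_v2 = (rows(q1.format(t)) for t in ("Sales", "SalesV1", "SalesV2"))

# Query 2: needs State, which SalesV1 keeps and SalesV2 has already rolled up.
q2_sales = rows("""select SalespersonID, State, sum(TotalValue)
                   from Sales join City on Sales.CityName = City.CityName
                   group by SalespersonID, State order by 1, 2""")
q2_v1 = rows("""select SalespersonID, State, sum(TotalValue)
                from SalesV1 group by SalespersonID, State order by 1, 2""")
```

Query 3 is not attempted against the views because neither keeps an attribute finer than Month.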

Lattice of Views

[Figure: two lattices. Time: All, Years, Quarters, Months and Weeks (side by side), Days. Location: All, Region, State, City.]

Lattice of Materialized Views and Queries

[Figure: lattice relating Sales, the materialized views SalesV1 and SalesV2, and the queries Q1, Q2, and Q3.]

OLAP Example
(Garcia-Molina, Ullman & Widom, Database System Implementation, Prentice Hall, 2000)

Automobile sales company: analyze sales of cars

Fact table:
  Sales(serialNo, date, dealer, price)

Dimension tables:
  Autos(serialNo, model, color)
  Dealers(name, city, state)

Time dimension table, probably not stored:
  Days(day, week, month, year), e.g. (5, 27, 7, 2000)

Assume a particular car model, say ‘Gobi’, is not selling as well as anticipated.

How to analyze?

Maybe it’s the color. Slice for ‘Gobi’. Dice for color.

select color, sum(price)
from Sales natural join Autos
where model = 'Gobi'
group by color;

Doesn’t show anything interesting.

Gobi analysis, continuing

What about time? Drill down for month.

select color, month, sum(price)
from Sales natural join Autos
     join Days on date = day
where model = 'Gobi'
group by color, month;

Suppose we discover red Gobis have not sold well recently.

Gobi analysis, continuing

Are red Gobis selling poorly for all dealers or just some?

Drill down for dealer.

select dealer, month, sum(price)
from Sales natural join Autos
     join Days on date = day
where model = 'Gobi' and color = 'red'
group by dealer, month;

Discover there are too few sales to show anything interesting.

Gobi analysis, continuing

Rollup time from month to year and slice for last two years.

select dealer, year, sum(price)
from Sales natural join Autos
     join Days on date = day
where model = 'Gobi' and color = 'red'
      and (year = '1999' or year = '2000')
group by dealer, year;

Does show variation. Now understand the problem better.
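The first and last steps of the Gobi analysis can be run with Python's sqlite3 module as a rough sketch. All rows are invented for illustration, Days is reduced to (day, month, year) with `day` holding the same string as Sales.date, and years are stored and compared as integers, which differs from the quoted year values in the slide's query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Sales(serialNo, date, dealer, price);
create table Autos(serialNo, model, color);
create table Days(day, month, year);
insert into Autos values (1, 'Gobi', 'red'), (2, 'Gobi', 'red'), (3, 'Gobi', 'blue'),
                         (4, 'Gobi', 'red'), (5, 'Sahara', 'red');
insert into Days values ('1999-03-01', 3, 1999), ('1999-07-15', 7, 1999),
                        ('2000-07-05', 7, 2000);
insert into Sales values (1, '1999-03-01', 'Smith', 20000),
                         (2, '1999-07-15', 'Smith', 21000),
                         (3, '2000-07-05', 'Jones', 22000),
                         (4, '2000-07-05', 'Jones', 19000),
                         (5, '2000-07-05', 'Smith', 30000);
""")

# Slice for 'Gobi', dice for color.
by_color = con.execute("""
    select color, sum(price)
    from Sales natural join Autos
    where model = 'Gobi'
    group by color
    order by color
""").fetchall()

# Drill down to dealer, roll time up to year, slice for the last two years.
by_dealer_year = con.execute("""
    select dealer, year, sum(price)
    from Sales natural join Autos
         join Days on date = day
    where model = 'Gobi' and color = 'red'
          and (year = 1999 or year = 2000)
    group by dealer, year
    order by dealer, year
""").fetchall()
```

Each step changes only the `where` and `group by` clauses, which is why slicing, dicing, drilling down, and rolling up map so directly onto SQL.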

Administration

• Lab assignments and HWs posted on the web.
• Clarifications/Questions?
• Please use appropriate online submit command.
• Teams of 2 allowed, but make the contribution of each team member explicit, especially in the lab assignment.
• Extra credit assignment in lab.
• Bring questions to class on Thursday.

A Sample Data Cube

February 22, 2003 Data Mining: Concepts and Techniques 27

[Figure: data cube with axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); one cell is labeled "Total annual sales of TV in U.S.A."]

• Color codes and their tuple representation: (time in quarters, product, country, Tsales)
• time, product, country are dimension attributes; Tsales is total sales.
• White squares (basic fact table): (q, p, c, sales)
• Green squares: total annual sales grouped by product and country. (*, p, c, Tsales)
• Dark green squares: total annual sales grouped by product. (*, p, *, Tsales)
• Orange squares: total annual sales grouped by quarter and country. (q, *, c, Tsales)
• Dark orange squares: total annual sales grouped by quarter. (q, *, *, Tsales)
• Grey: total annual sales grouped by country. (*, *, c, Tsales)
• Other pair (quarter and product) not shown (need to pivot). (q, p, *, Tsales)
• Dark blue (all sales): (*, *, *, sales)

Size of white cube = Q × P × C; size of colored cube = (Q+1) × (P+1) × (C+1).
Why? Think of * as another category along each dimension.
Size of colored cube with hierarchy: even larger!

Cube Operation

February 22, 2003 Data Mining: Concepts and Techniques 41

define cube sales[item, city, year]: sum(sales_in_dollars)

compute cube sales

Transform it into a SQL-like language (with a new operator cube by, introduced by Gray et al. ’96):

SELECT item, city, year, SUM(amount)
FROM SALES
CUBE BY item, city, year

Need to compute the following group-bys:

(date, product, customer),
(date, product), (date, customer), (product, customer),
(date), (product), (customer),
()

[Figure: lattice of group-bys connecting (), (item), (city), (year), (city, item), (city, year), (item, year), and (city, item, year).]
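A small Python sketch shows where the eight group-bys come from and how the cell count of a dense cube grows once each dimension gains a `*` value. The function names `group_bys` and `cube_cells` are assumptions for illustration.

```python
from itertools import combinations


def group_bys(dims):
    """All 2^n subsets of the dimensions, from the full group-by down to ()."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]


def cube_cells(sizes):
    """Cells in a dense cube when every dimension gains one extra '*' value."""
    total = 1
    for n in sizes:
        total *= n + 1
    return total


lattice = group_bys(("item", "city", "year"))

# Four quarters, three products, three countries, as in the sample data cube.
base_cells = 4 * 3 * 3
all_cells = cube_cells((4, 3, 3))
```

With three dimensions there are 2^3 = 8 group-bys; for the 4 × 3 × 3 sample cube the fully aggregated cube has 5 × 4 × 4 cells.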

Aggregation causes Database Explosion in Large Multi-dimensional Applications as the Number of Dimensions Increases

Based on Nigel Pendse, “Database Explosion”, www.olapreport.com/DatabaseExplosion.htm

Factors not causing data explosion

• Poor handling of data sparsity.
  – No more than factor of 4 vs. factors of 10s or 100s
• Type of database technology.
  – Although optimized storage technology will be significantly better.
• Lack of data compression.
  – Compression is helpful, but explosion still occurs.
• Software errors.
  – Again, a different problem.

Multi-dimensional Database (MDB) can save significant space

• Keys, indexes & dimensional structures.
  – Not required or take far less space.
• Sparsity better suppressed.
• Data compressed.
• Example:
  – 6-dimensional (including measures) banking cube
  – 13 million row fact table
  – Relational fact table incl. indexes, but not aggregates: 5188 MB
  – MOLAP cube including aggregations: 336 MB
  – Well under 10% the space.
  – Much faster query processing.

Why is there a data explosion even without sparsity?

• Take two dimensional example
• n: data from original source.
• m: data aggregations precalculated.
• p: on-the-fly results, not stored.

[Figure: nested squares with sides n, n+m, and n+m+p, and areas n², (n+m)², and (n+m+p)².]

Simplifying to n = m = p: 1n², 4n², 9n²

In 3 dimensions this becomes 1n³, 8n³, 27n³
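The multipliers above follow from raising the three side lengths to the number of dimensions. A minimal Python check, with `volumes` as a hypothetical helper name:

```python
def volumes(n, m, p, dims):
    """Cell counts for source data, source plus stored aggregates, and everything."""
    side_lengths = (n, n + m, n + m + p)
    return tuple(side ** dims for side in side_lengths)


two_d = volumes(1, 1, 1, 2)    # multiples of n^2 when n = m = p
three_d = volumes(1, 1, 1, 3)  # multiples of n^3 when n = m = p
```

Setting n = m = p = 1 yields the multipliers directly, so the same function also shows how quickly they grow for four or more dimensions.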

When Data is Sparse it’s much worse.

• One-dimensional data.
• Simple hierarchy. Black - actual data, red - nulls.
• Detailed level: 8 of 25 or 32%.
• Aggregated levels: 5 of 6 or 83%.
• Growth factor: 1.625 (13 cells based on 8 input cells)

Two dimensions: The problem gets worse

• Potential input cells (detail data): 25*25 = 625
• Potential aggregated cells (aggregated data): 6*6 + 6*25 + 6*25 = 336
• More than 1 derived cell for every 2 possible input cells.
• In 6 dimensions, could have 2 or 3 derived cells per 1 input cell.
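The cell counts can be reproduced with a few lines of Python, assuming every dimension has the same hierarchy as the one-dimensional example (25 detail members and 6 aggregated members). The helper name `potential_cells` is illustrative.

```python
def potential_cells(detail, aggregated, dims):
    """Return (input cells, derived cells) for identical dimensions."""
    inputs = detail ** dims
    # Every cell with at least one aggregated coordinate is a derived cell.
    derived = (detail + aggregated) ** dims - inputs
    return inputs, derived


inputs_2d, derived_2d = potential_cells(25, 6, 2)
inputs_6d, derived_6d = potential_cells(25, 6, 6)
ratio_6d = derived_6d / inputs_6d
```

For two dimensions this gives 31² − 25² = 336 derived cells; for six dimensions the ratio of derived to input cells falls between 2 and 3, in line with the last bullet.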

What about higher dimensions?

• One percent density, 6 of 625 input cells.
• Yields 29 computed cells.
• I.e., 35 total cells, only 6 input.
• Growth factor: 5.83.
• Growth factor per dimension: sqrt(5.83) = 2.4.
  – Called compound growth factor (CGF).
• CGF is typically in the range 1.5 to 2.5.
• CGF increases as sparsity increases.
• With large dimensions, will often be more consolidation.
  – (Many thousands of products → more levels of groupings.)
• With CGF of 2.0, an extra dimension, with no increase in input data, will double the size of the fully computed database.
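The growth-factor arithmetic can be checked with a short Python sketch. The function names are assumptions, and the 1,000 input cells in the second calculation are an arbitrary illustrative figure.

```python
def compound_growth_factor(total_cells, input_cells, dims):
    """Per-dimension growth: the dims-th root of the overall growth factor."""
    return (total_cells / input_cells) ** (1 / dims)


def fully_computed_size(input_cells, cgf, dims):
    """Estimated size of the fully computed database for a given CGF."""
    return input_cells * cgf ** dims


growth = 35 / 6                            # 35 total cells from 6 input cells
cgf = compound_growth_factor(35, 6, 2)     # two-dimensional example

# With a CGF of 2.0, one extra dimension doubles the fully computed size.
size_6 = fully_computed_size(1000, 2.0, 6)
size_7 = fully_computed_size(1000, 2.0, 7)
```

Because the CGF is applied once per dimension, the overall growth is exponential in the number of dimensions even when the input data stays the same.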

So what is the problem?

• Disk space increases.
• Can software handle this much data?
• Time to load and update database increases.
  – Could take days to load the database.

What to do?

• Avoid fully pre-calculating any multi-dimensional object with more than 5 sparse dimensions.
• Reduce sparsity of individual data objects:
  – Use good application design.

What to pre-calculate?

• Data that is slow to calculate at run-time because it depends on many other cells or complex formulae.
• Data that is frequently viewed.
• Data that is the basis of many other calculations.
• Note: If too much is precalculated, performance may decrease because cache will not include as much useful data.