Transcript
Page 1: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Physical Data Warehouse Design

Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

Page 2: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

It’s all about trading storage for speed!

• Fundamentals
• Aggregates (Ch. 16, pp. 356 - 357)
• Indexes (Ch. 16, p. 357)

Page 3: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals: the Storage Hierarchy

Level    Storage capacity   Access speed
CPU      -                  500-1000 MIPS
Cache    512 KB             10^-8 second
Memory   512 MB             10^-7 second
Disk     512 GB             10^-2 second

Storage capacity runs from small (cache) to large (disk); access speed runs from fast (cache) to slow (disk).

Page 4: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals: the Storage Hierarchy

[Diagram: CPU, Cache, Memory, Bus, Disk Drive (I/O Channel), Disk]

How long does it take to query sales by city?
How large is the Sales Fact table?
How long does it take to access the Sales Fact table?

Page 5: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals

How large is the fact table?
e.g., 1 million records/day, 0.2 KB/record -> 0.2 GB/day

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references TIME, PRODUCT, and CUSTOMER; each of them is referenced by SALES.

Page 6: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals

How long does it take to access all the fact records?

E.g., a small fact table is 1 terabyte in size!

– Reading one byte per disk access: 0.01 s × 10^12 = 10^10 s ≈ 317 years. LONG!

Page 7: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals: the Storage Hierarchy

[Diagram: CPU, Cache, Memory, Bus, Disk Drive (I/O Channel), Disk]

The logical unit of data transferred between disk and memory is the block (e.g., 4 KB).

Page 8: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Fundamentals

How long does it take to access all the fact records?

E.g., a small fact table is 1 terabyte in size!

– Number of blocks: 10^12 bytes / 4 KB = 250 million
– Access time = 0.01 s × 250,000,000 = 2,500,000 s ≈ 29 days
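The byte-at-a-time and block-at-a-time estimates can be checked with a short calculation. The sketch below is in Python, which is not part of the original slides. It assumes decimal units (1 terabyte = 10^12 bytes, a 4k block = 4,000 bytes) and the slides' 0.01 s per disk access. All constant and function names are illustrative.

```python
# Back-of-the-envelope scan times, using the slides' round numbers.
TABLE_BYTES = 10**12      # 1 terabyte (decimal units, an assumption)
BLOCK_BYTES = 4_000       # "4k" block, taken as 4,000 bytes (an assumption)
ACCESS_SECONDS = 0.01     # time for one disk access, from the slides

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def scan_seconds(total_bytes, unit_bytes, access_seconds=ACCESS_SECONDS):
    """Time to read total_bytes when each disk access transfers unit_bytes."""
    accesses = -(-total_bytes // unit_bytes)  # ceiling division
    return accesses * access_seconds


byte_at_a_time = scan_seconds(TABLE_BYTES, 1)
block_at_a_time = scan_seconds(TABLE_BYTES, BLOCK_BYTES)

print(f"byte at a time : {byte_at_a_time / SECONDS_PER_YEAR:.0f} years")
print(f"block at a time: {block_at_a_time / SECONDS_PER_DAY:.0f} days")
```

Under these assumptions, 10^12 bytes in 4,000-byte blocks is about 250 million blocks, so block transfer cuts the number of disk accesses by a factor of 4,000. A full scan is still far too slow for interactive queries, which is what motivates aggregates and indexes.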

Page 9: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

• In data warehouse design, we choose the grain of the fact table to be the lowest possible level.

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references TIME, PRODUCT, and CUSTOMER; each of them is referenced by SALES.

Grain: order line

Page 10: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

• The reason to choose the lowest level of detail for facts:
– (X) Analysts want to query single records.
– (O) Analysts want to flexibly cut and group records.

Page 11: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

• However, keeping the most detailed fact records could result in
– a huge fact table: terabytes?! (1 million records/day, 256 bytes/record -> about 0.25 GB/day)
– slow queries

Page 12: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

• To keep a data warehouse flexible, fact tables need to store facts at their lowest level of detail.

• To improve query performance, another type of fact table, which stores pre-computed summaries of detailed facts, helps.

• Reduced to a logical design solution

Page 13: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

• An aggregate fact table is a fact table that summarizes base-level fact table records along one or several dimensions.

• An aggregate dimension table is a dimension table that summarizes base-level dimension table records.

• E.g., marketing managers check daily product sales by city: aggregate by city in the customer dimension.

Page 14: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

SALES_BY_CITY (aggregate fact table): # TIME_KEY, # PRODUCT_KEY, # CITY_KEY, * AVERAGE_PRICE_BY_CITY, * TOTAL_QUANTITY_BY_CITY, * TOTAL_SALES_BY_CITY
CUST_CITY (aggregate dimension table): # CITY_KEY, * CITY, * STATE
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_NUMBER_IN_MONTH

SALES_BY_CITY references TIME, PRODUCT, and CUST_CITY.
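The slides do not show how the aggregate tables are populated. One possible way, sketched here with Python's built-in sqlite3 module, is to derive CUST_CITY from the distinct cities in CUSTOMER and then roll SALES up to the city level with GROUP BY. Table and column names follow the slides; the sample rows, the choice of SQLite, and the load procedure itself are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY,
                       cname TEXT, city TEXT, state TEXT);
CREATE TABLE sales (time_key INTEGER, product_key INTEGER,
                    customer_key INTEGER,
                    price REAL, quantity INTEGER, sales REAL);

-- Hypothetical sample rows, for illustration only.
INSERT INTO customer VALUES (1, 'A', 'Salt Lake City', 'UT'),
                            (2, 'B', 'Salt Lake City', 'UT'),
                            (3, 'C', 'Provo', 'UT');
INSERT INTO sales VALUES (1, 10, 1, 5.0, 2, 10.0),
                         (1, 10, 2, 7.0, 1, 7.0),
                         (1, 10, 3, 6.0, 3, 18.0);

-- Aggregate dimension: one row per distinct (city, state).
CREATE TABLE cust_city (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
INSERT INTO cust_city (city, state)
    SELECT DISTINCT city, state FROM customer ORDER BY state, city;

-- Aggregate fact: base-level facts rolled up from customer to city.
CREATE TABLE sales_by_city AS
    SELECT s.time_key, s.product_key, cc.city_key,
           AVG(s.price)    AS average_price_by_city,
           SUM(s.quantity) AS total_quantity_by_city,
           SUM(s.sales)    AS total_sales_by_city
    FROM sales s
    JOIN customer c   ON c.customer_key = s.customer_key
    JOIN cust_city cc ON cc.city = c.city AND cc.state = c.state
    GROUP BY s.time_key, s.product_key, cc.city_key;
""")

# A "sales by city" query now reads the small aggregate, not the base table.
rows = conn.execute(
    "SELECT cc.city, f.total_quantity_by_city, f.total_sales_by_city "
    "FROM sales_by_city f JOIN cust_city cc ON cc.city_key = f.city_key "
    "ORDER BY cc.city"
).fetchall()
print(rows)
```

AVG(price) here is a plain average over order lines. Whether AVERAGE_PRICE_BY_CITY should instead be weighted by quantity is a design decision the slides leave open.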

Page 15: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Aggregate

SALES_BY_CITY: # TIME_KEY, # PRODUCT_KEY, # CITY_KEY, * AVERAGE_PRICE_BY_CITY, * TOTAL_QUANTITY_BY_CITY, * TOTAL_SALES_BY_CITY
CUST_CITY: # CITY_KEY, * CITY, * STATE
SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * CITY, * STATE
TIME: # TIME_KEY, * ORDERDATE, * DAY_NUMBER_IN_MONTH

SALES_BY_CITY references TIME, PRODUCT, and CUST_CITY. SALES references TIME, PRODUCT, and CUSTOMER.

Page 16: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Indexes

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references TIME, PRODUCT, and CUSTOMER; each of them is referenced by SALES.

How long does it take to find out the total purchase amount by Tom Jones?

Page 17: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Indexes

• Customer table
– 1M records, each record 0.2 KB long
– Block size is 4 KB; block access time is 0.01 s
– Number of records/block: 4/0.2 = 20
– Number of blocks: 1M/20 = 50K

• Sequential search
– Time: on average half of the blocks are read, so 25K × 0.01 s = 250 s ≈ 4 min.

Page 18: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Indexes

• Binary search
– Time: log2(50K) × 0.01 s ≈ 16 × 0.01 s = 0.16 s

• B+ tree index
– Create index pn on customer(cname);
– If each node (block) in the B+ tree has 117 keys, then
  • # of accesses to the index: log117(1M) ≈ 3 (i.e., the height of the tree)
  • # of accesses to the Customer dimension: 1
  • Total time = 4 × 0.01 s = 0.04 s
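The three lookup costs for the Customer table can be reproduced with a few lines of arithmetic. The following Python sketch uses only the figures given on the slides (1M records, 0.2 KB per record, 4 KB blocks, 0.01 s per block access, 117 keys per B+ tree node); the variable names are illustrative.

```python
import math

RECORDS = 1_000_000      # customer records
RECORD_KB = 0.2          # size of one record
BLOCK_KB = 4             # size of one disk block
ACCESS_SECONDS = 0.01    # time for one block access
KEYS_PER_NODE = 117      # keys per B+ tree node (one node per block)

records_per_block = round(BLOCK_KB / RECORD_KB)
blocks = RECORDS // records_per_block

# Sequential search: on average, half of the blocks are read.
sequential_seconds = (blocks / 2) * ACCESS_SECONDS

# Binary search over the blocks (requires the table to be sorted on the key).
binary_seconds = math.ceil(math.log2(blocks)) * ACCESS_SECONDS

# B+ tree: one access per tree level, plus one access to fetch the record.
tree_height = math.ceil(math.log(RECORDS, KEYS_PER_NODE))
btree_seconds = (tree_height + 1) * ACCESS_SECONDS

print(f"blocks={blocks}, tree height={tree_height}")
print(f"sequential: {sequential_seconds:.2f} s")
print(f"binary    : {binary_seconds:.2f} s")
print(f"B+ tree   : {btree_seconds:.2f} s")
```

The large fan-out is what keeps the tree shallow: with 117 keys per node, three levels are enough to cover a million keys, whereas binary search needs about 16 block accesses.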

Page 19: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

B+-trees - P=12

Each node holds 11 key values and 12 pointers.

[Figure: the leaf level holds indexes to customer records; the upper levels hold indexes to indexes.]

Page 20: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Indexes

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references TIME, PRODUCT, and CUSTOMER; each of them is referenced by SALES.

How long does it take to find out the total sales of Desktop computers?

Page 21: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Performance Improvement

• Suppose there are only 4 product categories for 1M products.

• Create a B+ tree index???
– Suppose the size of a product category value plus a block ID is 10 bytes
– Size of index = 1M × 10 bytes = 10M bytes

Page 22: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Performance Improvement

• A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.

Page 23: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Bitmaps

Product     record 1   record 2   record 3
Desktop     1          0          1
Notebook    0          1          0
Server      0          0          0
Accessory   0          0          0

A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.
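The definition can be made concrete with a small in-memory sketch. The Python code below is an illustration of the idea, not how any particular database implements bitmap indexes. Each bit vector is stored as a Python integer whose bit i corresponds to record i, counting from 0. The function names are hypothetical, and the three sample records match the table on this slide.

```python
def build_bitmap_index(values, domain):
    """Return {value: bit vector}; bit i is 1 if record i has that value."""
    index = {v: 0 for v in domain}   # one vector per possible value
    for i, v in enumerate(values):
        index[v] |= 1 << i
    return index


def matching_records(bitmap):
    """0-based positions of the records whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


categories = ["Desktop", "Notebook", "Server", "Accessory"]
pcname = ["Desktop", "Notebook", "Desktop"]   # records 1, 2, 3 on the slide

index = build_bitmap_index(pcname, categories)

desktops = matching_records(index["Desktop"])
# Boolean conditions become bitwise operations on the vectors.
desktop_or_notebook = matching_records(index["Desktop"] | index["Notebook"])

print(desktops)             # positions of Desktop records
print(desktop_or_notebook)  # positions matching either category
```

A query such as "total sales of Desktop computers" can then use the Desktop vector to pick out the matching product records directly, without scanning the whole PRODUCT table.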

Page 24: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Performance Improvement

• A bitmap index is suitable for low-cardinality attributes.
– Cardinality(A) = (# of possible values for A) / (# of records)

• Compared with a B+ tree index, a bitmap index has the following advantages for low-cardinality attributes:
– Storage space saving (1M × 4 / 8 = 500K bytes)
– Efficient for Boolean operations

• CREATE BITMAP INDEX bitpc ON PRODUCT (PCNAME);
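The storage comparison between the two index types is simple arithmetic, sketched below in Python with the slides' figures (1M product records, 4 categories, 10 bytes per B+ tree entry). It assumes uncompressed bit vectors. The names and the break-even calculation are illustrative additions.

```python
RECORDS = 1_000_000
DISTINCT_VALUES = 4        # product categories
BTREE_ENTRY_BYTES = 10     # category value plus block ID, per the slides
BITS_PER_BYTE = 8

# B+ tree: one (value, block ID) entry per record.
btree_bytes = RECORDS * BTREE_ENTRY_BYTES

# Bitmap: one bit per record for each distinct value (uncompressed).
bitmap_bytes = RECORDS * DISTINCT_VALUES // BITS_PER_BYTE

# Number of distinct values at which the two sizes are equal.
break_even_values = BTREE_ENTRY_BYTES * BITS_PER_BYTE

print(f"B+ tree index: {btree_bytes:,} bytes")
print(f"bitmap index : {bitmap_bytes:,} bytes")
print(f"break-even at {break_even_values} distinct values")
```

Because an uncompressed bitmap index grows linearly with the number of distinct values, its space advantage shrinks as cardinality rises, which is why it is recommended only for low-cardinality attributes.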

