ch8-2 business intelligence data warehoue &...

63
Business Intelligence Data Warehoue & OLAP 2015.06 충북대학교 조 완섭 [email protected] Ch8-2

Upload: others

Post on 29-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Business Intelligence Data Warehoue & OLAP

2015.06

충북대학교 조 완섭[email protected]

Ch8-2

Page 2: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Contents

1. DW와 OLAP

2. Models과 operations

3. Data warehouse 구축

4. 실무사례

- AOL

2015-07-26 CBU / MIS 2

Page 3: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DATA WAREHOUSE AND OLAP

Section 1

2015-07-26 CBU / MIS 3

Page 4: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Contents

• DW and OLAP

– Background

– What is a data warehouse?

– Why a warehouse?

– DW, OLAP, OLTP

2015-07-26 CBU / MIS 4

Page 5: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

What is a data warehouse?

• Collection of diverse data– aimed at executive, decision maker

– subject oriented

– often a copy of operational data

– with value-added data (e.g., summaries, history)

– Integrated, historical, and non-volatile data

• Collection of tools– gathering data

– cleansing, integrating, ...

– querying, reporting, analysis

– data mining

– monitoring, administering warehouse

2015-07-26 CBU / MIS 5

Page 6: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Why a warehouse?

• Huge databases, but few analysis

• Analysis requires integration of diverse databases

– Construct a data warehouse

2015-07-26 CBU / MIS 6

Page 7: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP - DW : 2 approaches

• Two approaches in the analysis

– Query-Driven (Lazy) ; Limitation !

– Warehouse (Eager)

2015-07-26 CBU / MIS 7

Source Source….SourceSource

SourceSource

Query-driven ?

or

Data warehouse ?

Page 8: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP - DW : 2 approaches

• Query-Driven

2015-07-26 CBU / MIS 8

Limitation

- Historical data ?

- Overhead in the OLTP systems

Client Client

Wrapper Wrapper Wrapper

Mediator

Source Source Source

Analysis

Query &

Results

OLTP

OLAP

OLTP

Page 9: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP - DW : 2 approaches

• Adv. Of Query-Driven

– No need to copy data

– Less storage

– No need to purchase data

– More up-to-date data

– Query needs can be unknown (ad-hoc queries)

– Only query interface needed at sources

2015-07-26 CBU / MIS 9

Page 10: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP - DW : 2 approaches

• Data Warehouse

2015-07-26 CBU / MIS 10

Client Client

Warehouse

Source Source Source

Query & Analysis

Integration

MetadataExtraction

Transformation

Cleansing

Loading

OLAP & Data Mining

Page 11: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP - DW : 2 approaches

• Adv. Of DW Approach

– High query performance (specialized indexes)

– Queries not visible outside warehouse

– Local processing at sources (OLTP) unaffected

– Can operate when sources unavailable

– Can query data not stored in a DBMS (queries on the integrated database)

– Extra information at warehouse

– Summarize (store aggregates)

– Add historical information

– Sufficient meta data

2015-07-26 CBU / MIS 11

Page 12: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP

• OLTP: On Line Transaction Processing

– Describes processing at operational sites

• OLAP: On Line Analytical Processing

– Describes processing at warehouse

2015-07-26 CBU / MIS 12

Mostly reads

Queries long, complex

Gb-Tb of data

Summarized, consolidated data

Decision-makers, analysts as users

OLTP OLAP Mostly updates

Many small transactions

Mb-Tb of data

Raw data

Clerical users

Up-to-date data

Consistency, recoverability critical

Page 13: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW, OLAP, OLTP

• Data Mart

– Smaller warehouses

– Support part of organization• e.g., marketing (customers, products, sales)

– Do not require enterprise-wide consensus• but long term integration problems?

2015-07-26 CBU / MIS 13

Page 14: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

WAREHOUSE MODELS & OPERATORS

Section 2

2015-07-26 CBU / MIS 14

Page 15: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW Data Model and Operators

• Data Models

– relations

– Star schema & snowflakes

– cubes

• Operators

– aggregations

– grouping

– slice & dice

– roll-up, drill down

– pivoting

– other

2015-07-26 CBU / MIS 15

Page 16: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Star Schema

• Fact Table & Dimension Tables & Measures

2015-07-26 CBU / MIS 16

Measures

Fact Table

Dimension TableDimension Table

Dimension Table

Page 17: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Star Schema

• Contents

2015-07-26 CBU / MIS 17

customer custId name address city

53 joe 10 main sfo

81 fred 12 main sfo

111 sally 80 willow la

product prodId name price

p1 bolt 10

p2 nut 5

store storeId city

c1 nyc

c2 sfo

c3 la

sale oderId date custId prodId storeId qty amt

100 1/7/97 53 p1 c1 1 12

102 2/7/97 53 p2 c1 2 11

105 3/8/97 111 p1 c3 5 50

Page 18: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Star Schema

• Snow-flake schema

2015-07-26 CBU / MIS 18

sType

cidspopregID

City

cidssizelocation

sType

regIDregName

Region

Page 19: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Star Schema

• Contents

• Dimension Hierarchy

2015-07-26 CBU / MIS 19

store storeId cityId stId mgr

s5 sfo t1 joe

s7 sfo t2 fred

s9 la t1 nancycity cityId pop regId

sfo 1M north

la 5M south

region regId name

north cold region

south warm region

sType stId size location

t1 small downtown

t2 large suburbs

snowflake schema

store

sType

city region

week

year

quarter month day

Page 20: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Cube Data Model

• Star Schema vs. Cube data model

2015-07-26 CBU / MIS 20

sale prodId storeId amt

p1 c1 12

p2 c1 11

p1 c3 50

p2 c2 8

c1 c2 c3

p1 12 50

p2 11 8

Fact table view:Multi-dimensional cube:

dimensions = 2

Page 21: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Cube Data Model

• Star Schema vs. Cube data model

2015-07-26 CBU / MIS 21

sale oderId date custId prodId storeId qty amt

100 1/7/97 53 p1 c1 1 12

102 2/7/97 53 p2 c100 2 11

105 3/8/97 111 p1 c3 5 50

Customer

Sto

re

P1

p50

P2…

1111 … 53 …

C100

….C3

C2

C1

60

50

40

150

Store

<1,12,…>

<5,50,…>

<2,11,…>

<qty, amt, date>

Page 22: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Cube Data Model

• Another example

2015-07-26 CBU / MIS 22

sale prodId storeId date amt

p1 c1 1 12

p2 c1 1 11

p1 c3 1 50

p2 c2 1 8

p1 c1 2 44

p1 c2 2 4

day 2c1 c2 c3

p1 44 4

p2 c1 c2 c3

p1 12 50

p2 11 8

day 1

Multi-dimensional cube:Fact table view:

3D Cube

Page 23: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators• Pivot

– 행과 열의 위치를 바꾸는 연산자

• Roll-up – 차원 계층을 따라서 상위 레벨에 대한 집계치를 계산하는 연산– 예를들어, 날짜별 판매량에서 주별, 혹은 월별 판매량을 구할 때 사용됨

• Drill-down – 차원 계층을 따라서 하위 레벨에 대한 집계치를 계산하는 연산– 예를들어, 월별 판매량에서 주별, 혹은 날짜별 판매량을 구하는데 사용됨

• Slice– 한 차원에 대하여 주어진 값으로 selection한 결과 (subcube)를 산출하는 연산– 예를들어 slice for time = "Q1"은 time 차원에서 하나의 값 (Q1)으로 셀렉션하여 생성됨

• Dice – 두 차원 이상에 대하여 주어진 값들로 selection한 결과 (subcube)– location, time, item 세 차원에서 selection 된 결과 subcube를 만들 때 사용됨

• Drill-across– 두 개 이상의 사실 테이블을 포함한 질의를 수행할 때 사용되는 연산자

• Drill-through– 큐브에서 해당 소스 데이터베이스 (관계형 데이터베이스 테이블)까지 접근하여

raw data를 검색하는 연산

2015-07-26 CBU / MIS 23

Page 24: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators

• Ranking (Sorting) the top/bottom N items

– 최고치부터 차례로 N개를 선택하거나 혹은 최저치부터 차례로 N개를 선택하는 연산

– 매출량이 가장 높은 10개 상점은 ?

• 이 밖에도 큐브에 대한 다양한 연산자들이 제안되고 있음

– Moving average

– Growth rate

– Interests

– Internal rate of return

– Depreciation

– Currency conversion

– Statistical functions

2015-07-26 CBU / MIS 24

Page 25: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators

• Pivoting

2015-07-26 CBU / MIS 25

sale prodId storeId date amt

p1 c1 1 12

p2 c1 1 11

p1 c3 1 50

p2 c2 1 8

p1 c1 2 44

p1 c2 2 4

day 2c1 c2 c3

p1 44 4

p2 c1 c2 c3

p1 12 50

p2 11 8

day 1

Multi-dimensional cube:Fact table view:

c1 c2 c3

p1 56 4 50

p2 11 8

Pivoting

- change the orientation of axis

p1 p2

C1 56 11

C2 4 8

C3 50

cube

roll-up

Page 26: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators

• Cube Aggregation : roll-up, drill down

2015-07-26 CBU / MIS 26

day 2c1 c2 c3

p1 44 4

p2 c1 c2 c3

p1 12 50

p2 11 8

day 1

c1 c2 c3

p1 56 4 50

p2 11 8

c1 c2 c3

sum 67 12 50

sum

p1 110

p2 19

129

. . .

sale(c1,*,*)

sale(*,*,*)sale(c2,p2,*)

day1, day2

Drill-dwonRoll-up

Page 27: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators

• Slice, dice

2015-07-26 CBU / MIS 27

day 2c1 c2 c3

p1 44 4

p2 c1 c2 c3

p1 12 50

p2 11 8

day 1

c1 c2 c3

p1 12 50

p2 11 8

day 1

slice

dice

subcube

cube

Page 28: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

MOLAP - CUBE Operators

• Han’s book 그림

2015-07-26 CBU / MIS 28

OLAP 연산자

Page 29: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP - SQL

• Star Schema

2015-07-26 CBU / MIS 29

Page 30: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

의사결정지원용 질의 예• 날짜별 avg_sales ?• 날짜별, branch_name별 avg_sales ?• 날짜별, branch_name별, item_name별 avg_sales ?• 월별, branch_name별, item_name별 avg_sales ?• 분기별, branch_name별, item_name별 avg_sales ?• 년도별, branch_name별, item_name별 avg_sales ?• 2009년, branch_name별, item_name별 avg_sales ?• 2009년, 1분기, branch_name별, item_name별 avg_sales ?• 2009년, 1분기, branch_name별, item_name별, city별 avg_sales ?• item_name별, month별 avg_sales ?• item_name별, quarter별 avg_sales ?• item_name별, quarter별, item_name별, city별 avg_sales ?• brand별, quarter별, item_name별, city별 avg_sales• type별, quarter별, item_name별, city별 avg_sales• supplier_type별, quarter별, item_name별, city별 avg_sales• (조건에 따라서 무수히 많은 질문이 가능하며, 이러한 질의를 효과적으로 작

성하고, 빠르게 분석할 수 있어야 함)=> SQL vs. OLAP ?

2015-07-26 CBU / MIS 30

Page 31: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP - SQL

• Grouping

Q1) year, month별로 SUM(Amount)를 집계한 SQL SELECT year, month, SUM(dollars_sold)FROM Sales S, Time TWHERE S.time_key = T.time_keyGROUP BY year, month;

2015-07-26 CBU / MIS 31

year month SUM(dollars_sold)

2005 1 200

2005 2 250

2005 3 530

... ... ...

2006 1 330

2006 2 420

... ... ...

Page 32: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP - SQL

• Q2) 질의 Q1에 대한 roll-up / drill-down 연산

SELECT year, SUM(dollars_sold) => year, month, SUM(dollars_sold)

FROM Sales S, time T

WHERE S.time_key = T.time_key

GROUP BY year; => year, month

2015-07-26 CBU / MIS 32

year SUM(dollars_sold)

2005 2200

2006 3250

2007 6530

... ...

year month SUM(dollars_sold)

2005 1 200

2005 2 250

2005 3 530

... ... ...

2006 1 330

2006 2 420

... ... ...

Rollup

Drilldown

Page 33: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP - SQL

• Q3) Slice for time = "Q1"SELECT S.*

FROM sales S, time T

WHERE S.time_key = T.time_key AND T.quarter = "Q1";

• Q4) Dice for (location.city = "Toronto" or "Vancouver") AND

(time.quarter = "Q1" or "Q2") AND

(item.item_name = "entertainment" or "PC")

SELECT S.*

FROM sales S, time T, item I, lication L

WHERE S.time_key = T.time_key AND

S.lication_key = L.location_key AND

S.item_key = I.item_key AND

(T.quarter = "Q1" or T.quarter = "Q2") AND

(L.city = "Toronto" or L.city = "Vancouver") AND

(I.item_name = "entertainment" or I.item_name = "PC");

2015-07-26 CBU / MIS 33

Page 34: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

• 확장된 집계 연산자– rank (e.g., top ten), percentile (top 10%), window queries

• GROUP BY 절의 확장– GROUPING SETS, ROLLUP, CUBE 등의 옵션 추가

• Assumed Table

2015-07-26 CBU / MIS 34

shipment (fact table)

S# P# Qty Date

S1 P1 20 2004-01-01

S1 P2 10 2004-01-01

S1 P3 50 2003-03-07

S2 P1 20 2004-05-02

S2 P2 5 2004-05-02

S2 P3 50 2004-05-02

S1 P1 45 2004-01-02

… … … …

Page 35: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 35

S# TotalS1 125S2 75

S# P# Total

S1 P1 65

S1 P2 10

S1 P3 50

S2 P1 20

S2 P2 5

S2 P3 50

Total shipment quantities by supplier

SELECT S#, SUM(QTY) AS TotalFROM SHIPMENTGROUP BY (S#);

Total shipment quantities by supplier and product

SELECT S#, P#, SUM(QTY) AS TotalFROM SHIPMENTGROUP BY (S#, P#);

Page 36: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 36

GROUPING SET으로 확장한 SQL

S# P# Total

S1 NULL 125

S2 NULL 75

NULL P1 85

NULL P2 15

NULL P3 100

SELECT S#, P#, SUM(QTY) AS TotalFROM SHIPMENTGROUP BY

GROUPING SETS (S#, P#);

S#

P#

Page 37: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 37

S#, P#

S#

()

1) Aggregate with the finest granularity (Group By S#, P#)2) Then with the next level of granularity (Group By S#)3) Then the grand total (with no Group By clause)

Page 38: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 38

Aggregate with the finest granularity (GROUP BY S#, P#)Then with all subsets (GROUP BY S#, GROUP BY P#Then the grand total (with no GROUP BY clause)

Page 39: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 39

Null Null

Null

Null

Page 40: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

ROLAP – SQL 확장 [Gra97]

2015-07-26 CBU / MIS 40

NullNullNullNull

Null

Null

Null Null

Page 41: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

IMPLEMENTING A DW

Section 3

2015-07-26 CBU / MIS 41

Page 42: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• OLAP

– Analysis tool on the data warehouse

• ROLAP:

– Relational On-Line Analytical Processing

– Relational DBMS is used for storing & accessing data warehouses

• MOLAP:

– Multi-Dimensional On-Line Analytical Processing

– Specialized data management systems

2015-07-26 CBU / MIS 42

Page 43: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• Logical Architecture

2015-07-26 CBU / MIS 43

Data model transformation between conceptual multidimensional schema and logical schema

Data model transformation between external sources and logical schema

Page 44: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• The physical architecture

– Data warehouse architecture for multidimensional OLAP systems

• Multidimensional DB systems serves as both the Data Store Layer and the Data Management Layer

• Data from sources are conformed to the multidimensional model, summaries along all dimensions are precomputed for performance

• Data accesses are directly submitted

2015-07-26 CBU / MIS 44

Page 45: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• Multidimensional OLAP

2015-07-26 CBU / MIS 45

Page 46: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• The physical architecture (cont’)

– Data warehouse architecture for relational OLAP • Both the Data Store Layer and Data Management Layer are rela

tional

• Required to support the multidimensional analysis requests from the Application Interface Layer => relational OLAP engine

• Relational DBMSs cannot be directly transplanted into a DW system

– Data marts• Small data volumes (< 10 GB)

• Data with less than 10 dimensions

– Virtual data warehouse• Collection of views

2015-07-26 CBU / MIS 46

Page 47: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

• Relational OLAP

2015-07-26 CBU / MIS 47

Page 48: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

2015-07-26 CBU / MIS 48

Data Warehouse 구축과 OLAP 분석과정

Page 49: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

DW 구축 절차

• Data extracted from operational databases (data sources)

• Cleaned to minimize errors, fill in missing information

• Transformed (e.g., relational views) to reconcile semantic mismatches

• Loading data: Materializing views, storing them locally into warehouse

• Data is periodically refreshed to reflect updates in data sources

• Periodically purge old data

• System catalogs: Very large, often stored in separate database called metadata

repository

2015-07-26 CBU / MIS 49

Page 50: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

2015-07-26 CBU / MIS 50

multi-

dimensional

server

M.D. tools

utilities

could also

sit on

relational

DBMS

Pro

du

ct

Date1 2 3 4

milk

soda

eggs

soap

AB

Sales

• 2가지 구현방식- MOLAP- ROLAP

Page 51: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

OLAP & DW Architecture

2015-07-26 CBU / MIS 51

relational

DBMS

ROLAP

server

tools

utilitiessale prodId date sum

p1 1 62

p2 1 19

p1 2 48

Special indices, tuning;

Schema is “denormalized”

Page 52: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Index

• Inverted Index

2015-07-26 CBU / MIS 52

20

23

18

19

20

21

22

23

25

26

r4

r18

r34

r35

r5

r19

r37

r40

rId name age

r4 joe 20

r18 fred 20

r19 sally 21

r34 nancy 20

r35 tom 20

r36 pat 25

r5 dave 21

r41 jeff 26

. . .

age

index

inverted

lists data

recordsQuery:

Get people with age = 20 and name = “fred”

List for age = 20: r4, r18, r34, r35

List for name = “fred”: r18, r52

Answer is intersection: r18

Page 53: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Index

• Bit-map index

2015-07-26 CBU / MIS 53

id name age

1 joe 20

2 fred 20

3 sally 21

4 nancy 20

5 tom 20

6 pat 25

7 dave 21

8 jeff 26

. . .

data

records

20

23

18

19

20

21

22

23

25

26

age

indexbit

maps

1

1

0

1

1

0

0

0

0

0

0

1

0

0

0

1

0

1

1

Query:

Get people with age = 20 AND

name = “fred”

List for age = 20 : 1101100000

List for name = “fred” : 0100000001

Answer is intersection : 0100000000

Page 54: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Index

• Bit-map index

– Good if domain cardinality small

– Bit vectors can be compressed

2015-07-26 CBU / MIS 54

Page 55: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Join index

• SELECT * FROM sales, product WHERE prodId=id ;

2015-07-26 CBU / MIS 55

sale prodId storeId date amt

p1 c1 1 12

p2 c1 1 11

p1 c3 1 50

p2 c2 1 8

p1 c1 2 44

p1 c2 2 4

product id name price

p1 bolt 10

p2 nut 5

joinTb prodId name price storeId date amt

p1 bolt 10 c1 1 12

p2 nut 5 c1 1 11

p1 bolt 10 c3 1 50

p2 nut 5 c2 1 8

p1 bolt 10 c1 2 44

p1 bolt 10 c2 2 4

HUGE TABLE

Page 56: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Join index

• Join index on product.id = sales.prodId

2015-07-26 CBU / MIS 56

product id name price jIndex

p1 bolt 10 r1,r3,r5,r6

p2 nut 5 r2,r4

sale rId prodId storeId date amt

r1 p1 c1 1 12

r2 p2 c1 1 11

r3 p1 c3 1 50

r4 p2 c2 1 8

r5 p1 c1 2 44

r6 p1 c2 2 4

join index

Page 57: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Materialization

• 자주 요구하는 요약 정보를 미리 계산하여 DW에저장함으로써 분석의 성능을 높임

2015-07-26 CBU / MIS 57

day 2 c1 c2 c3

p1 44 4

p2 c1 c2 c3

p1 12 50

p2 11 8

day 1

c1 c2 c3

p1 56 4 50

p2 11 8

c1 c2 c3

p* 67 12 50

c*

p1 110

p2 19

129

. . .

CUBE Sales (Product, City, Day)

Sales(Product, City, *)

Sales(*, City, *)

Sales(Product, *, *)

Sales(*, * , *)

Page 58: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Materialization

• Cube Aggregation Lattice

2015-07-26 CBU / MIS 58

city, product, date

city, product city, date product, date

city product date

all

use greedy

algorithm to

decide what

to materializecity, product, date

city, product city, date product, date

city product date

all129

Page 59: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Materialized View

• Query Processing Using MVs

2015-07-26 CBU / MIS 59

Page 60: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Materialized View

• MV 생성

2015-07-26 CBU / MIS 60

Page 61: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Materialized View

• Query Rewriting

2015-07-26 CBU / MIS 61

Page 62: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

Metadata

• Administrative– definition of sources, tools, ...

– schemas, dimension hierarchies, …

– rules for extraction, cleaning, …

– refresh, purging policies

– user profiles, access control, ...

• Business– business terms & definition

– data ownership, charging

• Operational– data lineage (data sources and path way)

– data currency (e.g., active, archived, purged)

– useage stats, error reports, audit trails

2015-07-26 CBU / MIS 62

Page 63: Ch8-2 Business Intelligence Data Warehoue & OLAPelearning.kocw.net/KOCW/document/2015/chungbuk/... · • OLAP: On Line Analytical Processing –Describes processing at warehouse

2015-07-26 CBU / MIS 63

구 분 정 보 내 용 탐 색 유 형

데이터관리용도

- 데이터 모델 정보- 엔티티 정보- 속성정보- 도메인 정보- KEY 정보- Alias 정보- Entity별 비즈니스룰- 약어정보- 논리·물리모델 매핑- 비즈니스용어·물리적이름의 매핑

- 관리자 정보

- 회사내의 비즈니스엔티티 유형은 어떤 것이고 어떻게 정의되었는가?- 비즈니스 엔티티의 속성은?- 속성들이 가지는 코드값의 의미는?- 비즈니스 엔티티에 관련된 다른 정보는 어떤 것들이 있는가?- 비즈니스 변화가 어떤 엔티티에 변화를 주는가?- 데이터 품질에 대한 책음은 누가 지고있는가?- 데이터와 관련된 비즈니스 정의에 대해 누구에게 질문할 것인가?

데이터베이스정보

- 데이터베이스명- 테이블/뷰명- COPYBOOK명- Column명- Index 명- Dimensions / Facts- Connectivity 정보- Network, Server 정보- 권한 정보- 관리자 정보

- 데이터베이스를 구성하는 테이블 목록은?- 데이터의 물리적 속성은 어떻게 되는가?- 데이터는 어떻게 인덱싱되어있는가?- 데이터가 최종적으로 갱신된 시기는?- 이 데이터는 누가 사용하는가?- 이 데이터는 언제 자주 사용하는가?- 데이터베이스에 문제가 생기면 누구와 연락을 취해야 하는가?

데이터 이동정보

-소스 및 타겟정보(컬럼, 테이블, 데이터베이스)

- 소스/타겟 매핑- 변환로직- 변환 버전관리- 매핑 Value- 연산, 파생규칙- 이동시간, 주기- 검증정보- 관리자정보

- 이 데이터의 원천은 어디인가?- 소스시스템은 어떤 종류가 있는가?- 데이터 추출방법은 어떠한가/어떤 조건에 의해 추출되었는가?- 소스에서 타겟으로 오면서 어떤 변환과정을 거쳤는가?- 데이터 추출작업이 제대로 수행되었는가?- 예외데이터에 대해 어떤 처리가 이루어졌는가?- 데이터 이동은 누가 책임지는가?

BusinessIntelligence

- Connectivity 정보- 보안 절차 정보- 리포트, 질의유형 정보- 리포트, 질의검증 정보- 리포트 분배정보- 리포트 원천데이터정보- Help Desk 정보- 관리자정보

- 어떤 리포트, 질의를 사용하여 원하는 결과를 얻을수 있을까?- 내 리포트와 질의를 사용하는 사람은 누구인가?- 이 리포트의 데이터는 어디서 오는가?- 리포트, 질의의 결과로 얻을 수 있는 결론은 무었인가?- 이 리포트와 질의는 품질이 검증된 것인가?- 누가 이 질의를 만들었고 누가 사용하는가?- 특정데이터에 권한이 없을 때 누구와 연락을 취해야 하는가?- 데이터베이스에 문제가 생기면 누구와 연락을 취해야 하는가?