data warehousing. make the right decisions for your organization –rapid access to all kinds of...

Data Warehousing

Upload: randolph-cross

Post on 30-Dec-2015


TRANSCRIPT

Page 1: Data Warehousing. Make the right decisions for your organization –Rapid access to all kinds of information –Research and analyze the past data –Identify

Data Warehousing


Data Warehousing

• Make the right decisions for your organization
  – Rapid access to all kinds of information
  – Research and analyze past data
  – Identify and predict future trends
• The construction of a data warehouse
  – Involves data cleaning and data integration
  – Provides on-line analytical processing (OLAP) tools for the interactive analysis of data
• W.H. Inmon
  – A data warehouse is a subject-oriented, integrated, time-dependent and non-volatile collection of data in support of management’s decision-making process


Characteristics of Data Warehouse

• Subject-oriented
  – A data warehouse is designed for decision support and organized around major subjects, such as customers and sales
  – Not all information in the operational database is useful
• Integrated
  – Integrates multiple heterogeneous sources and makes them consistent
  – Data from different sources may use different names for the same entities
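The integration point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names (`crm`, `billing`), the field names, and the sample rows are hypothetical and do not come from the slides.

```python
# Hypothetical source records; field names and values are illustrative only.
crm_rows = [{"cust_id": 1, "cust_name": "Ann"}]
billing_rows = [{"customer_no": 2, "name": "Bob"}]

# Map each source's field names onto one shared warehouse vocabulary.
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "cust_name": "customer_name"},
    "billing": {"customer_no": "customer_id", "name": "customer_name"},
}


def integrate(source, rows):
    """Rename source-specific fields to the shared warehouse names."""
    mapping = FIELD_MAPS[source]
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


# After integration both sources describe a customer with the same field names.
warehouse_customers = integrate("crm", crm_rows) + integrate("billing", billing_rows)
```

Real integration also has to reconcile units, encodings, and duplicate entities; the sketch covers only the naming problem mentioned on the slide.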


Characteristics of Data Warehouse

• Time-dependent
  – Record the information together with the time when it was entered
  – Data mining can be performed on the data from a chosen period of time
• Non-volatile
  – Data in a data warehouse is not updated in place once loaded; it is only loaded and read
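One way to picture the two properties together is an append-only store in which every row carries its load date. The sketch below is a minimal illustration, not a description of any particular product; the class names `PriceFact` and `AppendOnlyStore` and the sample prices are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceFact:
    """One immutable warehouse row, stamped with the date it was loaded."""
    item: str
    price: float
    loaded_on: date


class AppendOnlyStore:
    """Rows can be loaded and read, but never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        self._rows.append(row)

    def as_of(self, item, day):
        """Return the latest row for `item` loaded on or before `day`, else None."""
        candidates = [r for r in self._rows if r.item == item and r.loaded_on <= day]
        return max(candidates, key=lambda r: r.loaded_on, default=None)


# Illustrative values: a price change is recorded as a new row, not an update.
store = AppendOnlyStore()
store.load(PriceFact("TV", 500.0, date(2015, 1, 1)))
store.load(PriceFact("TV", 450.0, date(2015, 6, 1)))
```

Because the older row is kept, a query for any past date still sees the price that was valid then, which is what makes analysis over a period of time possible.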


Data Warehousing

• Data warehousing
  – The process of constructing and using a data warehouse
• Two types of databases
  – Operational database
    • Large database in operation
    • Built for high speed and a large number of users
  – Data warehouse
    • Designed for decision support
    • Contains vast amounts of historical data
• Data mart
  – A departmental subset of the data warehouse that focuses on selected subjects; its scope is department-wide


OLTP & OLAP System

• OLTP (On-Line Transaction Processing) System
  – The major task of an operational database is to perform on-line transaction and query processing
• OLAP (On-Line Analytical Processing) System
  – A data warehouse system serves users in data analysis and decision making


Differences ~ OLTP & OLAP

• Characteristic
  – OLTP: operational processing
  – OLAP: informational processing
• Orientation
  – OLTP: transaction-oriented
  – OLAP: analysis-oriented
• User
  – OLTP: customer, DBA
  – OLAP: manager, analyst
• Function
  – OLTP: day-to-day operations
  – OLAP: information requirements, decision support


Differences ~ OLTP & OLAP

• DB design
  – OLTP: ER-based, application-oriented
  – OLAP: star/snowflake, subject-oriented
• Data
  – OLTP: current; guaranteed up to date
  – OLAP: historical
• Unit of work
  – OLTP: short, simple query
  – OLAP: complex query
• Access
  – OLTP: read/write
  – OLAP: mostly read
• DB size
  – OLTP: 100 MB to GB
  – OLAP: 100 GB to TB
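The contrast in unit of work and access pattern can be seen on a single table. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the amounts are hypothetical and chosen only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, year INTEGER, amount REAL)"
)

# OLTP style: short read/write transactions that touch a few rows by key.
with conn:
    conn.executemany(
        "INSERT INTO orders (year, amount) VALUES (?, ?)",
        [(2014, 100.0), (2014, 50.0), (2015, 70.0)],
    )
    conn.execute("UPDATE orders SET amount = 60.0 WHERE order_id = 2")

one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 2"
).fetchone()

# OLAP style: a read-only query that scans and aggregates the history.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

In practice the two workloads usually run on separate systems, so that long analytical scans do not slow down day-to-day transactions.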


Differences ~ OLTP & OLAP


Data Warehousing

Multidimensional Data Model: Star Schema or Snowflake Schema

Relational Data Model: Relational Schema


Model & Schema for Relational Database

Relational Schema

Relational Data Model


Multidimensional Data Model

• Example: AllElectronics creates a sales data warehouse in order to keep records of the store’s sales
  – Fact Table
    • Sales amount in dollars and number of units sold (measures)
  – Dimension Tables
    • time, item, branch, and location
• The multidimensional data model views data in the form of a data cube
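The AllElectronics example can be sketched as one fact table surrounded by four dimension tables. The layout below uses Python's built-in `sqlite3` module; the table names, key columns, and all sample rows are assumptions for illustration, since the slides give only the dimension names and the two measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

-- Illustrative rows only.
INSERT INTO time_dim     VALUES (1, 'Q1', 2015), (2, 'Q2', 2015);
INSERT INTO item_dim     VALUES (1, 'TV'), (2, 'Phone');
INSERT INTO branch_dim   VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Vancouver'), (2, 'Toronto');
INSERT INTO sales_fact   VALUES (1, 1, 1, 1, 100.0, 2),
                                (1, 2, 1, 1,  50.0, 1),
                                (2, 1, 1, 2,  80.0, 1);
""")

# Aggregate the measures along two of the four dimensions.
sales_by_city_quarter = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t     ON t.time_key = f.time_key
    JOIN location_dim l ON l.location_key = f.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
```

Each dimension is one join away from the fact table, which is what gives the schema its star shape.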


Two Dimensions

• 2-D view of sales data for items sold per quarter in the city of Vancouver. The measure is dollars_sold (in thousands).

Measures

Dimensions


Three Dimensions

• 3-D view of sales data according to the dimensions time, item and location. The measure is dollars_sold (in thousands).

Dimensions

Measures


Three Dimensions

• 3-D data cube representation according to the dimensions time, item and location. The measure is dollars_sold (in thousands).


Four Dimensions

• 4-D data cube of sales data according to the dimensions supplier, time, item and location. The measure is dollars_sold (in thousands).
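The 2-D, 3-D and 4-D views are the same data seen at different levels of aggregation. A minimal sketch, assuming the cube is stored as a Python dictionary keyed by one member per dimension; the cell values and the helper names `aggregate` and `slice_cube` are made up for the example.

```python
from collections import defaultdict

DIMENSIONS = ("supplier", "time", "item", "location")

# Illustrative cells: (supplier, time, item, location) -> dollars_sold in thousands.
cube = {
    ("S1", "Q1", "TV", "Vancouver"): 10,
    ("S1", "Q2", "TV", "Vancouver"): 20,
    ("S2", "Q1", "TV", "Vancouver"): 5,
    ("S1", "Q1", "TV", "Toronto"): 7,
}


def aggregate(cells, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells whose `dimension` equals `member`."""
    position = DIMENSIONS.index(dimension)
    return {key: value for key, value in cells.items() if key[position] == member}


# 3-D view: sum out the supplier dimension of the 4-D cube.
three_d = aggregate(cube, ("time", "item", "location"))

# 2-D view: fix location to Vancouver, then keep only time and item.
two_d = aggregate(slice_cube(cube, "location", "Vancouver"), ("time", "item"))
```

Dropping a dimension sums its members away, while fixing a dimension to one member selects a slice; the slide's Vancouver table is a slice followed by an aggregation.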


Schemas for Multidimensional Data Model

• Star Schema

• Snowflake Schema

• Fact Constellation Schema


Star Schema


Snowflake Schema


Snowflake Schema

• Some dimension tables are normalized to reduce redundancy and save storage space
• Normalization reduces the effectiveness of browsing, since more joins are needed to execute a query
• The saving in space is negligible compared with the size of the fact table
• The snowflake schema is not as popular as the star schema in data warehouse design
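The extra join can be made concrete by modelling the same location dimension both ways. The sketch uses Python's built-in `sqlite3` module; the table names (`location_star`, `location_snow`, `country`) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized dimension table (country is repeated per city).
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Snowflake: the same dimension normalized into two tables.
CREATE TABLE country (country_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE location_snow (
    location_key INTEGER PRIMARY KEY,
    city TEXT,
    country_key INTEGER REFERENCES country(country_key)
);

CREATE TABLE sales_fact (location_key INTEGER, dollars_sold REAL);

-- Illustrative rows only.
INSERT INTO location_star VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada'),
                                 (3, 'Chicago', 'USA');
INSERT INTO country       VALUES (1, 'Canada'), (2, 'USA');
INSERT INTO location_snow VALUES (1, 'Vancouver', 1), (2, 'Toronto', 1), (3, 'Chicago', 2);
INSERT INTO sales_fact    VALUES (1, 10.0), (2, 20.0), (3, 5.0);
""")

# Star schema: one join reaches the country attribute.
star_rows = conn.execute("""
    SELECT d.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location_star d ON d.location_key = f.location_key
    GROUP BY d.country ORDER BY d.country
""").fetchall()

# Snowflake schema: the same question needs a second join.
snow_rows = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location_snow d ON d.location_key = f.location_key
    JOIN country c       ON c.country_key = d.country_key
    GROUP BY c.country ORDER BY c.country
""").fetchall()
```

Both queries return the same totals; the snowflake version stores each country name once but pays for it with the additional join.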


Fact Constellation Schema

• Multiple fact tables share dimension tables
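A small sketch of the idea, using Python's built-in `sqlite3` module: two fact tables reference the same item dimension. The choice of a sales fact and a shipping fact, the column names, and the rows are assumptions for illustration; the slide does not say which fact tables are involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);

CREATE TABLE sales_fact    (item_key INTEGER REFERENCES item_dim(item_key),
                            units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key),
                            units_shipped INTEGER);

-- Illustrative rows only.
INSERT INTO item_dim      VALUES (1, 'TV'), (2, 'Phone');
INSERT INTO sales_fact    VALUES (1, 3), (1, 2), (2, 4);
INSERT INTO shipping_fact VALUES (1, 4), (2, 4);
""")

# Aggregate each fact table on its own, then line the totals up through the
# shared dimension. Joining the raw fact tables directly would double count.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name, s.sold, h.shipped
    FROM item_dim i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped
          FROM shipping_fact GROUP BY item_key) h ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
```

Sharing the dimension is what lets measures from different business processes be compared item by item.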


OLAP Technologies


Concept Hierarchies


Three-Tier DW Architecture


Case Study in Data Warehousing



Company Profile



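Background

• Company A built its relational database system using the traditional E-R model
• Company A found that this database system could not give senior executives timely access to, and analysis of, useful information for decision making
  – The traditional E-R data model is most efficient at maintaining data consistency and avoiding data duplication
  – Multidimensional queries with multiple constraints and multiple joins not only take longer to run but also compete for system resources, overloading the system and creating bottlenecks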


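Background

• Company A decided to solve these problems with a database system designed on the multidimensional data model
  – Build a data warehouse
  – All constraints are satisfied in a single pass without a large number of join operations, and the user interface is friendlier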


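Steps for Building a Multidimensional Database

• Understand the business processes and requirements as the foundation for the design; this can be learned by interviewing the customer, reading the transaction system documentation, and analyzing the existing processes
• Define what the fact table should contain, checking that it meets the requirements identified in the first step
• Identify the users' analytical perspectives and the hierarchy within each perspective; these become the dimension tables
• Define the measures of the fact table; these measures are the values that may be retrieved along each dimension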


Cause-and-Effect Relationships


Cause-and-Effect Diagram


Building the Multidimensional Database


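• The remaining tables are built in the same way.
• In total, 20 fact tables and several dozen dimension tables were produced.
• These tables serve as the input to OLAP systems or data mining systems.
• Only with these systems can we obtain further statistics and knowledge as output.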


Design of Data Warehouse

• How can I design a data warehouse?
  – Top-down approach
  – Bottom-up approach
  – Combination of both
• In general, the warehouse design process consists of the following steps
  – Choose a business process to model
  – Choose the grain of the business process
  – Choose the dimensions
  – Choose the measures
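The four design steps can be recorded as a small data structure, here filled in with the AllElectronics sales example from earlier in the deck. This is a sketch only: the `WarehouseDesign` class, the `_key` naming convention, and the grain shown are assumptions, since the slides do not state a grain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseDesign:
    """Captures the four design decisions in the order the steps are taken."""
    business_process: str
    grain: str
    dimensions: tuple
    measures: tuple

    def fact_table_columns(self):
        """One foreign key per dimension, followed by the measures."""
        return tuple(f"{name}_key" for name in self.dimensions) + self.measures


sales_design = WarehouseDesign(
    business_process="sales",
    grain="one row per item per branch per day",  # assumed grain, not from the slides
    dimensions=("time", "item", "branch", "location"),
    measures=("dollars_sold", "units_sold"),
)
```

The grain is chosen before the dimensions and measures because it fixes what a single fact row means, and therefore which dimensions and measures make sense at that level of detail.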


Other Application Examples
