Data Warehouse (DW) & On-line Analytic Processing (OLAP)
Rev: Feb, 2012
Euiho (David) Suh, Ph.D.
POSTECH Strategic Management of Information and Technology Laboratory (POSMIT: http://posmit.postech.ac.kr)
Dept. of Industrial & Management Engineering, POSTECH
Contents
※ Discussion Questions
1 Data Warehouse
1) Introduction to Data Warehouse
2) Concepts for Data Warehouse
3) Difficulties and Trends
2 On-line Analytic Processing (OLAP)
1) Introduction to OLAP
2) Concepts for OLAP
3 Case Study
Discussion Questions
■ What are the differences among a database, a data warehouse, and a data mart?
■ What is the core functional difference between a DBMS (database management system) and an MBMS (model base management system)?
■ What are the benefits and limitations of the relational database model for business applications today?
■ What do you think is the major reason firms use OLAP?
Definition of Data Warehouse
■ Data Warehouse
– Stores static data that has been extracted from other databases in an organization
– Central source of data that has been cleaned, transformed, and cataloged
– Data is used for data mining, analytical processing, analysis, research, and decision support
■ A data warehouse is an integrated, non-volatile, time-variant collection of data in support of management's decisions
[Figure: scattered information (Sales, HR, Cost, Finance, Bond, Customer) is cleaned into the data warehouse, which is then queried and distributed to end users]
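The properties listed for a data warehouse (integrated, non-volatile, time-variant) can be illustrated with a toy extract-transform-load step. This is a minimal sketch under stated assumptions: the two source systems, their field names, and the `transform` function and `Warehouse` class are hypothetical, and a real warehouse would sit on a DBMS rather than an in-memory list.

```python
from datetime import date

# Hypothetical operational sources with inconsistent key and name formats.
SALES_SYSTEM = [
    {"cust": "c001", "amount": "120.50"},
    {"cust": "C002", "amount": "80"},
]
CUSTOMER_SYSTEM = [
    {"customer_id": "C001", "name": "  alice KIM "},
    {"customer_id": "C002", "name": "Bob Lee"},
]


def transform(load_date: date) -> list[dict]:
    """Integrate both sources into one schema with consistent keys, names, and types."""
    names = {c["customer_id"].upper(): " ".join(c["name"].split()).title()
             for c in CUSTOMER_SYSTEM}
    rows = []
    for s in SALES_SYSTEM:
        key = s["cust"].upper()                  # unify key format across systems
        rows.append({
            "customer_id": key,
            "customer_name": names.get(key, "UNKNOWN"),
            "amount": float(s["amount"]),        # text -> numeric
            "load_date": load_date,              # time-variant: each row carries its snapshot date
        })
    return rows


class Warehouse:
    """Non-volatile store: load() only appends; there is no update or delete."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def load(self, rows: list[dict]) -> None:
        self._rows.extend(dict(r) for r in rows)

    def query(self, **filters) -> list[dict]:
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in filters.items())]


dw = Warehouse()
dw.load(transform(date(2012, 1, 31)))
dw.load(transform(date(2012, 2, 29)))     # a later snapshot is added, not overwritten
print(len(dw.query(customer_id="C001")))  # 2 (one row per load date)
```

Keeping every snapshot instead of overwriting is what lets later analysis compare periods; the cost is that storage grows with each load.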
Data Warehouse Architecture
■ Data Warehouse architecture
[Figure: Building the data warehouse (infrastructure, data integration and administration) and use of the data warehouse (application development, data access and use). Source data (OLTP system, external file, backup file) feeds the data warehouse (enterprise server) and data marts (workgroup server; RDB or MDB), which are accessed via SQL by query/reporting tools, OLAP tools (slice/dice), data mining applications, EIS/DSS applications, and web browsers]
Data Warehouse Architecture
■ Technical architecture for a data warehousing system
[Figure: components: Data Acquisition, Design, Data Manager, Information Directory, Data Delivery, Middleware, Data Access, and Management; data stores: source data, external data, external metadata, warehouse data, and warehouse metadata]
Introduction to Database
■ Definition of database
– Integrated collection of logically related data elements
■ Common Database Structures (Types)
– Hierarchical
• Early DBMS structure
• Records arranged in a tree-like structure
• Relationships are one-to-many
– Network
• Used in some mainframe DBMS packages
• Many-to-many relationships
– Relational
• Most widely used structure
• Data elements are stored in tables
• A row represents a record; a column is a field
• Can relate data in one file with data in another, if both files share a common data element
– Multidimensional
• Variation of the relational model
• Uses multidimensional structures to organize data
• Data elements are viewed as being in cubes
• Popular for analytical databases that support Online Analytical Processing (OLAP)
– Object-Oriented
• Stores data together with the appropriate methods for accessing it, i.e. encapsulation
• Information is represented in the form of objects, as used in object-oriented programming
[Figure: relational structure; object-oriented structure]
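To make the relational point concrete (data in one file can be related to data in another when both share a common data element), here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and values are invented for illustration; the nested dictionary at the end holds the same data in a hierarchical, one-parent-many-children shape for contrast.

```python
import sqlite3

# Each table plays the role of a "file": a row is a record, a column is a field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL);
    INSERT INTO customer VALUES ('C001', 'Alice Kim'), ('C002', 'Bob Lee');
    INSERT INTO orders VALUES (1, 'C001', 120.5), (2, 'C002', 80.0), (3, 'C001', 40.0);
""")

# customer_id is the common data element that relates the two tables.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice Kim', 160.5), ('Bob Lee', 80.0)]

# For contrast, the same data in a hierarchical (tree) shape:
# each order sits under exactly one parent customer (one-to-many).
tree = {
    "C001": {"name": "Alice Kim", "orders": [120.5, 40.0]},
    "C002": {"name": "Bob Lee", "orders": [80.0]},
}
tree_totals = sorted((node["name"], sum(node["orders"])) for node in tree.values())
print(tree_totals == rows)  # True
```

In the tree, reaching orders is only possible through their parent customer; in the relational form, either table can be the starting point of a query.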
Metadata and Data Marts
■ Metadata
– Data about data (similar to a catalog card in a library)
– Defines the data in the data warehouse
– Makes it easier and faster to find data in the data warehouse
■ Data Marts
– A collection of data (a database)
– Compared with a data warehouse, data marts are usually smaller and focus on a particular subject or department
– Data marts are subsets of a larger data warehouse
■ Data Warehouse vs. Data Mart
– Data in a data warehouse
• The data needs to be gathered from all the relevant transactional systems that produce it, cleansed and validated, and made available from a system of record that ensures the referential integrity of the data
– Data in a data mart
• The data needs to be presented in a structure that is intuitive to users and makes it easy for them to query the data relevant to their needs
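The relationship between a warehouse, a data mart, and metadata can be sketched with `sqlite3`. This is an illustration under assumptions: the schema, the department names, and the `metadata` table layout are hypothetical, and the mart is modeled as a view for brevity, whereas real data marts are often separate, physically loaded databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Enterprise-wide warehouse table (hypothetical schema and data).
    CREATE TABLE dw_sales (sale_date TEXT, department TEXT, product TEXT, amount REAL);
    INSERT INTO dw_sales VALUES
        ('2012-01-15', 'Finance',   'Bond', 500.0),
        ('2012-01-20', 'Marketing', 'Fund', 300.0),
        ('2012-02-03', 'Marketing', 'Bond', 200.0);

    -- Data mart for one department, defined as a subset of the warehouse.
    CREATE VIEW mart_marketing AS
        SELECT sale_date, product, amount
        FROM dw_sales
        WHERE department = 'Marketing';

    -- Metadata: data about the data, playing the role of a library catalog card.
    CREATE TABLE metadata (object_name TEXT, column_name TEXT, description TEXT, source TEXT);
    INSERT INTO metadata VALUES
        ('dw_sales', 'amount',     'Sale amount',       'Sales OLTP system'),
        ('dw_sales', 'department', 'Owning department', 'HR system');
""")

# Look up where a field comes from in the catalog instead of inspecting the data itself.
source = conn.execute(
    "SELECT source FROM metadata WHERE object_name = ? AND column_name = ?",
    ("dw_sales", "amount"),
).fetchone()[0]

# Marketing users query their mart without seeing other departments' rows.
mart_total = conn.execute("SELECT SUM(amount) FROM mart_marketing").fetchone()[0]
print(source, mart_total)  # Sales OLTP system 500.0
```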
Information Flow
■ Data Warehouse built on top of DB
[Figure: internal/external databases feed the data warehouse (with a metadata repository), which feeds data marts for Finance, Accounting, Sales/Marketing, and Management Reporting]
Data Warehouse Components
■ Data Warehouse Components
Applications and Data Marts
■ Applications and Data Marts
Difficulties in Implementing a DW
■ Complete Alignment
– Make sure you have full involvement and buy-in from those who represent your users: the consumers of your data warehouse.
■ Iterative & Frequent Update
– Consider every aspect of the process: researching your data sources, capturing that data and transmitting it to the data warehouse, transforming and loading it into the data warehouse, and accounting for its lineage.
■ Risk
– Make sure you develop a proper risk management plan.
Future Trends
■ Enterprise Data Warehouse
– The enterprise data warehouse, whether a single store or integrated data marts across a variety of platforms, yields a view of the operation previously unattainable (Don Hatcher, SAS)
■ Real-time
– Organizations are moving to more real-time data transformation and seeking to better leverage common metadata across applications (Allan Houpt, CA)
■ Capacity
– The future of data warehousing is all about ever larger data warehouses; in fact, I just read about a U.S. Government effort to create petabyte repositories (Roman Bukary, SAP Director of Market Strategy)
Definition of OLAP
■ OLAP (On-Line Analytical Processing)
– The dynamic enterprise analysis required to create, manipulate, animate, and synthesize information from Enterprise Data Models (E. F. Codd, 1993, Providing OLAP: An IT Mandate)
– FASMI (Fast Analysis of Shared Multidimensional Information) (Pendse & Creeth, 1995)
• This definition was first used in early 1995 and has not needed revision since
OLAP Architecture
■ OLAP Architecture
From OLTP to OLAP
■ Data used in OLAP
– Sales data for June? (OLTP)
– Multidimensional data, described by many features (OLAP)
■ Direct access: EUC (end-user computing) environment
■ From What to Why
– OLTP: stores primitive data and supports routine business operations (What)
– OLAP: stores cumulative data and supports business goals (Why)
[Figure: information source, information broker, information consumer]
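The "What" versus "Why" contrast can be sketched with two queries over the same hypothetical sales table, using Python's built-in `sqlite3`. The data, column names, and the choice of SQL for both workloads are assumptions for illustration; a dedicated OLAP server would typically serve the second kind of question from precomputed or cached aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("2012-06", "Seoul", "TV", 100.0), ("2012-06", "Pohang", "TV", 50.0),
     ("2012-06", "Seoul", "Phone", 70.0), ("2012-07", "Seoul", "TV", 90.0),
     ("2012-07", "Pohang", "Phone", 60.0)],
)

# OLTP-style question ("What?"): retrieve the primitive sales records for June.
june_rows = conn.execute(
    "SELECT region, product, amount FROM sales WHERE month = '2012-06'"
).fetchall()

# OLAP-style question ("Why?"): consolidate the same facts along two dimensions
# (region x month) to see where the June-to-July change came from.
by_region_month = conn.execute("""
    SELECT region, month, SUM(amount)
    FROM sales
    GROUP BY region, month
    ORDER BY region, month
""").fetchall()
print(by_region_month)
# [('Pohang', '2012-06', 50.0), ('Pohang', '2012-07', 60.0),
#  ('Seoul', '2012-06', 170.0), ('Seoul', '2012-07', 90.0)]
```

The second result shows the drop in total sales comes from Seoul, which is the kind of explanation the first, record-level query cannot give by itself.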
OLTP vs. OLAP
■ OLTP vs. OLAP
Aspect | OLTP | OLAP
Definition | On-Line Transaction Processing | On-Line Analytical Processing
Objective | Operational | Analytical
Focus | Daily repetitive work | Decision support in the organization
Developer | Computer expert | End user
User | Simple operator | Specialized analyst
Stored data | Current values | Summarized and consolidated values
Use | Repetitive | Unstructured
Response | Immediate | Delayed
Data | Updated | Summarized
Update | Field-level update | Recomputation
Amount of data | Small | Large
Data structure | Complex | Simple
Database | RDB | MDB
Data period | Past, current | Past, current, future
Query type | Regular | Irregular, analytical
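The "Update" row (field update versus recomputation) can be illustrated in plain Python. This sketch assumes hypothetical account and sales records and uses full recomputation as the simplest reading of "recomputation"; real OLAP servers often refresh aggregates incrementally instead.

```python
from collections import defaultdict

# OLTP: a transaction changes the current value of one field in one record.
accounts = {"A-1": {"owner": "Alice", "balance": 100.0},
            "A-2": {"owner": "Bob", "balance": 50.0}}
accounts["A-1"]["balance"] -= 30.0

# OLAP: summarized values are derived from detail data, so when new detail
# arrives the summary is recomputed rather than edited field by field.
detail = [("2012-06", "Seoul", 100.0), ("2012-06", "Pohang", 50.0),
          ("2012-07", "Seoul", 90.0)]


def summarize(rows):
    """Recompute total amount per month from the detail rows."""
    totals = defaultdict(float)
    for month, _region, amount in rows:
        totals[month] += amount
    return dict(totals)


before = summarize(detail)
detail.append(("2012-07", "Pohang", 60.0))  # new detail arrives in a batch load
after = summarize(detail)                   # full recomputation of the summary
print(accounts["A-1"]["balance"], before, after)
# 70.0 {'2012-06': 150.0, '2012-07': 90.0} {'2012-06': 150.0, '2012-07': 150.0}
```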
Enterprise IT Architecture
■ OLTP/OLAP Enterprise IT Architecture
Data Warehouse vs. OLAP Server
■ Data Warehouse vs. OLAP Server
Aspect | Data Warehouse | OLAP Server
Objective | Ready for all kinds of retrieval | Specialized retrieval
Characteristics | Data storage | Computation engine
Query type | Read only | Read/write
Response | Flexible | Consistent, rapid
Content | Historical, present | Historical, present, future
Data structure | Plain (flat) | Multidimensional
Amount of data | Huge, highly detailed | Large, detailed
Development period | A few months to years | A few weeks to months
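Two Types of OLAP
■ MOLAP (Multidimensional OLAP)
■ ROLAP (Relational OLAP)
[Figure: MOLAP: clients send queries to an MDBMS, which performs the multidimensional (MD) processing and responds. ROLAP: clients send queries to an MD processing layer, which issues SQL to an RDBMS and returns the response]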
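The difference between the two OLAP types is mainly where multidimensional processing happens. The sketch below answers the same question both ways, under assumptions: the fact rows and the `rolap_total`/`molap_total` helpers are hypothetical, SQLite stands in for the RDBMS, and a Python dict keyed by (month, region, item) coordinates stands in for an MDBMS's multidimensional array.

```python
import sqlite3
from itertools import product

# Hypothetical fact rows: (month, region, item, amount).
FACTS = [("2012-06", "Seoul", "TV", 100.0), ("2012-06", "Pohang", "TV", 50.0),
         ("2012-06", "Seoul", "Phone", 70.0), ("2012-07", "Seoul", "TV", 90.0),
         ("2012-07", "Pohang", "Phone", 60.0)]
DIMS = ("month", "region", "item")
ALL = "*"  # member meaning "aggregated over this dimension"

# ROLAP: facts stay in relational tables; the MD processing layer emits SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, item TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", FACTS)


def rolap_total(**fixed):
    """Translate a multidimensional request into a SQL aggregate at query time."""
    for dim in fixed:
        if dim not in DIMS:  # only whitelisted column names reach the SQL text
            raise ValueError(f"unknown dimension: {dim}")
    where = " AND ".join(f"{dim} = ?" for dim in fixed) or "1 = 1"
    sql = f"SELECT COALESCE(SUM(amount), 0) FROM sales WHERE {where}"
    return conn.execute(sql, tuple(fixed.values())).fetchone()[0]


# MOLAP: every cell, including aggregate cells, is computed once at load time
# and stored in a multidimensional structure (here a dict keyed by coordinates).
cube = {}
for month, region, item, amount in FACTS:
    for cell in product((month, ALL), (region, ALL), (item, ALL)):
        cube[cell] = cube.get(cell, 0.0) + amount


def molap_total(month=ALL, region=ALL, item=ALL):
    """Answer the same request with a single lookup; no computation at query time."""
    return cube.get((month, region, item), 0.0)


print(rolap_total(region="Seoul"), molap_total(region="Seoul"))  # 260.0 260.0
```

The trade-off shows up directly: the ROLAP path stores only the detail rows but aggregates on every query, while the MOLAP path stores eight cells per fact so that each query is a lookup.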
From RDB to MDB
■ Basic Data Structure of MDB & RDB
– RDB (used for OLTP and the data warehouse): table, row (record), column (field)
– MDB (used for OLAP): cube, dimension, hierarchy
■ RDB as OLAP Server
– Cannot handle and represent multidimensional relationships well
– Cannot summarize data well
■ MDB as OLAP Server
– Gives many managerial viewpoints
– Supports EUC (end-user computing)
– Supports analysis functionality
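The MDB vocabulary (cube, dimension, hierarchy) and the common slice/dice operations can be sketched in a few lines of Python. Everything here is an assumption for illustration: the facts, the day-to-month and city-to-country hierarchies, and the helper names `roll_up`, `slice_`, and `dice` are hypothetical and not taken from any OLAP product.

```python
from collections import defaultdict

# Facts at the finest grain: (day, city, item) -> amount. Hypothetical data.
facts = {
    ("2012-06-03", "Seoul", "TV"): 100.0,
    ("2012-06-17", "Pohang", "TV"): 50.0,
    ("2012-06-21", "Seoul", "Phone"): 70.0,
    ("2012-06-25", "Seoul", "TV"): 30.0,
    ("2012-07-02", "Seoul", "TV"): 90.0,
    ("2012-07-09", "Pohang", "Phone"): 60.0,
}

# Assumed dimension hierarchies: day -> month (time) and city -> country (geography).
CITY_TO_COUNTRY = {"Seoul": "Korea", "Pohang": "Korea"}


def roll_up(cube, time_level="day", geo_level="city"):
    """Aggregate cells to a coarser level of the time and/or geography hierarchy."""
    out = defaultdict(float)
    for (day, city, item), amount in cube.items():
        t = day[:7] if time_level == "month" else day  # 'YYYY-MM-DD' -> 'YYYY-MM'
        g = CITY_TO_COUNTRY[city] if geo_level == "country" else city
        out[(t, g, item)] += amount
    return dict(out)


def slice_(cube, axis, member):
    """Slice: fix one dimension (by position) to a single member."""
    return {k: v for k, v in cube.items() if k[axis] == member}


def dice(cube, allowed):
    """Dice: keep the sub-cube whose members fall in the given set for every dimension."""
    return {k: v for k, v in cube.items()
            if all(k[i] in members for i, members in enumerate(allowed))}


monthly = roll_up(facts, time_level="month")
tv_only = slice_(monthly, axis=2, member="TV")
seoul_june = dice(monthly, [{"2012-06"}, {"Seoul"}, {"TV", "Phone"}])
print(seoul_june)  # {('2012-06', 'Seoul', 'TV'): 130.0, ('2012-06', 'Seoul', 'Phone'): 70.0}
```

Rolling up along a hierarchy merges cells (the two June TV sales in Seoul become one 130.0 cell), which is the kind of summarization an MDB is organized to support.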