Post on 14-Aug-2020


Page 1: Database and Data warehouse 2. DATA WAREHOUSE AND OLAPcontents.kocw.net/KOCW/document/2014/koreasejong/... · 2016-09-09 · Data Warehouse Data mart A subset of data warehouse relevant

2. DATA WAREHOUSE AND OLAP

Database and Data warehouse


From Database to Data Warehouse

Data Warehouse: combines and reorganizes current and historical data from multiple data sources into a central storage for decision-making purposes.

Much larger than a DB. Why?


Data Warehouse

Definition

“A subject-oriented, integrated, time-variant and nonvolatile (non-updatable) collection of data in support of decision-making process.”

◦ Subject? Customer, product, sales, etc.

◦ Integrated? From many sources

◦ Time-variant? Historical

◦ Nonvolatile? Accumulated


Data Warehouse

Data mart

◦ A subset of data warehouse relevant to specific purposes

Two major applications of Data warehouse

◦ OLAP: Multidimensional analysis of DW data

◦ Data Mining: Knowledge discovery from DW data

ETL = Extraction, transformation, and loading

◦ A process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse


ETL

• Standardization

• Removing redundancies

• Missing data? Fill in or discard

• Incorrect data? Discard or correct

• Outliers? Smooth or discard

• Attribute reduction

• Data compression (e.g., histogram, regression)

• Generalization/specialization

• Summation

• Normalization (e.g., to values between 0 and 1)

• Attribute construction
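A few of the ETL steps listed above can be sketched in Python. This is only an illustration, not a real ETL tool: the records, the field names (`customer`, `amount`, `amount_norm`), and the choice of filling a missing value with the mean are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records extracted from several sources (all values are made up).
raw = [
    {"customer": "Fred Smith", "amount": 120.0},
    {"customer": "fred smith ", "amount": 120.0},  # redundant record, non-standard spelling
    {"customer": "Ann Lee", "amount": None},       # missing value
    {"customer": "Bob Kim", "amount": 300.0},
]

def standardize(rec):
    """Standardization: trim whitespace and use one capitalization style."""
    return {"customer": rec["customer"].strip().title(), "amount": rec["amount"]}

def remove_redundancies(recs):
    """Keep only the first occurrence of each (customer, amount) pair."""
    seen, out = set(), []
    for r in recs:
        key = (r["customer"], r["amount"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def fill_missing(recs):
    """Missing data: fill with the mean of the known amounts (discarding is the other option)."""
    avg = mean(r["amount"] for r in recs if r["amount"] is not None)
    return [dict(r, amount=avg if r["amount"] is None else r["amount"]) for r in recs]

def normalize(recs):
    """Min-max normalization of amount into the range 0 to 1."""
    lo = min(r["amount"] for r in recs)
    hi = max(r["amount"] for r in recs)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [dict(r, amount_norm=(r["amount"] - lo) / span) for r in recs]

clean = normalize(fill_missing(remove_redundancies([standardize(r) for r in raw])))
```

The order matters here: standardizing first lets the two spellings of the same customer be recognized as one redundant record.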


DW is Multidimensional

DB is relational: a database contains information in a series of tables (relations)

In a data warehouse or data mart, information is multidimensional: it contains layers of columns and rows

[Figure: layered tables whose labels are regions (Northeast, Southeast, Central, West), quarters (Quarter1 to Quarter4), and products (Gizmo, Widget)]
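The idea of "layers of columns and rows" can be sketched as a small three-dimensional structure in Python. The sales figures are invented, and treating each layer as one product is an assumption made only for illustration.

```python
regions = ["Northeast", "Southeast", "Central", "West"]
quarters = ["Quarter1", "Quarter2", "Quarter3", "Quarter4"]
products = ["Gizmo", "Widget"]

# Made-up figures: one cell per (product, region, quarter) combination.
cube = {
    (p, r, q): 10 * (pi + 1) + ri + qi
    for pi, p in enumerate(products)
    for ri, r in enumerate(regions)
    for qi, q in enumerate(quarters)
}

def layer(product):
    """One layer of the cube: a region-by-quarter table for a single product."""
    return [[cube[(product, r, q)] for q in quarters] for r in regions]

gizmo_table = layer("Gizmo")    # rows = regions, columns = quarters
widget_table = layer("Widget")
```

A relational table would store the same 32 cells as 32 rows; the cube view simply makes each dimension an axis.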


Multidimensional Data


OLTP and OLAP

OLTP

◦ Real-time processing of transactions such as sales, flight reservations, cancellations, etc.

◦ OLTP works on a database

◦ Online processing ≈ real-time processing, as opposed to batch processing

OLAP

◦ OLAP provides online multidimensional analysis functionality working on a data warehouse or data mart.

◦ A data warehouse and an OLAP or mining system are needed to analyze patterns, trends, or outliers.

◦ What is required for the following analyses?

Effects of an oil price increase on car manufacturers?

Cell phone usage patterns of college students in urban areas?


OLTP and OLAP

                OLTP (DB)                             OLAP (DW)
users           clerk, IT professional                knowledge worker
#users          thousands                             hundreds
function        operations/transactions processing    decision support
DB design       application-oriented                  subject-oriented
data            current (up-to-date), detailed,       historical, summarized,
                dispersed over applications           multidimensional, integrated
usage           repetitive                            on occasion
access          updating (read/write),                loading and accessing
                indexing on primary key
unit of work    short, simple transaction             complex analytic query
data size       100MB-GB (each DB)                    100GB-TB (many sources and historical data)
quality metric  transaction throughput                query throughput


OLAP using DW Example

Example: Sales volume is a function of month, product, and customer.

1. What is the sales amount of Digital Camera in Feb. 2005 for Fred Smith?

2. What if you want to know the temporal sales trends of Fred Smith?
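The two questions above can be sketched in Python by treating sales volume as a mapping from (month, product, customer) to an amount. All figures and the function names `point_query` and `trend_by_month` are hypothetical.

```python
# Hypothetical sales facts: (month, product, customer) -> sales amount.
sales = {
    ("2005-01", "Digital Camera", "Fred Smith"): 300,
    ("2005-02", "Digital Camera", "Fred Smith"): 450,
    ("2005-02", "Printer", "Fred Smith"): 120,
    ("2005-03", "Printer", "Fred Smith"): 80,
    ("2005-02", "Digital Camera", "Ann Lee"): 500,
}

def point_query(month, product, customer):
    """Question 1: read a single cell of the cube (0 if nothing was sold)."""
    return sales.get((month, product, customer), 0)

def trend_by_month(customer):
    """Question 2: fix the customer, sum over products, and keep the month dimension."""
    totals = {}
    for (month, _product, cust), amount in sales.items():
        if cust == customer:
            totals[month] = totals.get(month, 0) + amount
    return dict(sorted(totals.items()))

feb_camera = point_query("2005-02", "Digital Camera", "Fred Smith")
fred_trend = trend_by_month("Fred Smith")
```

Question 1 touches one cell, while question 2 aggregates many cells along one dimension, which is the kind of query a data warehouse is organized to answer.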


From Tables to Data Cube

A DW stores data in a data cube (not in relations)

A data cube maintains both dimension tables and fact tables

◦ Dimension tables: attribute structures such as item (item_name, brand, type), time (day, week, month, quarter, year), etc.

◦ Fact tables: values such as dollars_sold, unit_sold, etc.

Data Cube and Cuboid

◦ Data cube: composed of cuboids

◦ Apex cuboid: the topmost 0-D cuboid, the highest level of summarization

◦ Base cuboid: the n-D cuboid, where n is the number of dimensions
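As a rough sketch of how dimension tables, a fact table, and the base and apex cuboids relate, the Python below uses two dimensions only. The keys (`item_key`, `time_key`) and all rows are invented for illustration; only the attribute and measure names come from the slide.

```python
from collections import defaultdict

# Dimension tables (hypothetical rows): surrogate key -> descriptive attributes.
item_dim = {
    1: {"item_name": "Camera X", "brand": "Acme", "type": "camera"},
    2: {"item_name": "Printer Y", "brand": "Acme", "type": "printer"},
}
time_dim = {
    10: {"day": 3, "month": 2, "quarter": 1, "year": 2005},
    11: {"day": 9, "month": 5, "quarter": 2, "year": 2005},
}

# Fact table: keys into the dimension tables plus the measured values.
facts = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 400.0, "unit_sold": 2},
    {"item_key": 1, "time_key": 10, "dollars_sold": 200.0, "unit_sold": 1},
    {"item_key": 2, "time_key": 11, "dollars_sold": 150.0, "unit_sold": 1},
]

# Base cuboid (2-D here, since n = 2): dollars_sold per (item, time) combination.
base = defaultdict(float)
for f in facts:
    base[(f["item_key"], f["time_key"])] += f["dollars_sold"]

# Apex cuboid (0-D): a single grand total over all dimensions.
apex = sum(base.values())

# Dimension attributes let the same facts be summarized at a coarser level, e.g. by quarter.
by_quarter = defaultdict(float)
for (item_key, time_key), dollars in base.items():
    by_quarter[time_dim[time_key]["quarter"]] += dollars
```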


Data Cube = A Collection of Cuboids

0-D (apex) cuboid: all

1-D cuboids: quarter; product; country; supplier

2-D cuboids: quarter,prod; quarter,country; quarter,supplier; prod,country; prod,supplier; country,supplier

3-D cuboids: quarter,prod,country; quarter,prod,supplier; quarter,country,supplier; prod,country,supplier

4-D (base) cuboid: quarter, prod, country, supplier
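The lattice above can be generated mechanically: each cuboid is one subset of the dimensions, so n dimensions give 2^n cuboids. The short Python sketch below enumerates them; it assumes no concept hierarchies within a dimension, and the function name `cuboids` is made up for the example.

```python
from itertools import combinations

dimensions = ["quarter", "prod", "country", "supplier"]

def cuboids(dims):
    """Return {k: list of k-D cuboids}; each cuboid is a tuple of dimension names."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboids(dimensions)
counts = {k: len(v) for k, v in lattice.items()}  # cuboids per level
total = sum(counts.values())                      # 2 ** 4 cuboids in all
```

Here `lattice[0]` holds the empty tuple, which plays the role of the apex cuboid "all", and `lattice[4]` holds the single base cuboid.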


Typical OLAP Operations

Roll up

◦ Drill up (aggregation): summarize data

◦ Climbing up the cube: dimension reduction

Drill down

◦ Roll down (decomposition): reverse of roll-up

◦ Going down the cube: introducing new dimensions

Slicing

◦ Fixing a dimension to a single value (to look at only that part of the cube)

Pivoting

◦ View data from a different perspective by reorienting the cube

Computation

◦ Aggregates at each cuboid are computed in advance over vast amounts of data
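A minimal Python sketch of roll-up, slicing, and pivoting on a tiny cube follows. The cube contents and the helper names `roll_up`, `slice_`, and `pivot` are assumptions for illustration, not the API of any OLAP product.

```python
from collections import defaultdict

# Hypothetical base cuboid: (quarter, product, country) -> sales.
cube = {
    ("Q1", "Gizmo", "Korea"): 10,
    ("Q1", "Widget", "Korea"): 20,
    ("Q2", "Gizmo", "Korea"): 30,
    ("Q1", "Gizmo", "Japan"): 40,
}
DIMS = ("quarter", "product", "country")

def roll_up(cells, dims, keep):
    """Roll up by dimension reduction: sum out every dimension not listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dims, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def pivot(table):
    """Pivot a 2-D table: swap the two axes of every key."""
    return {(b, a): v for (a, b), v in table.items()}

by_quarter_product = roll_up(cube, DIMS, ("quarter", "product"))
grand_total = roll_up(cube, DIMS, ())            # apex cuboid
korea = slice_(cube, DIMS, "country", "Korea")
product_by_quarter = pivot(by_quarter_product)
```

Drill down is the reverse of `roll_up`: going from `by_quarter_product` back to `cube` needs the finer-grained data, which is why the lower cuboids have to be kept or recomputed.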


Typical OLAP Operations

Sales volume as a function of product, month, and measures

all

month; product; measure

month, product; month, measure; product, measure

month, product, measure

Roll up: moving toward "all"

Drill down: moving toward "month, product, measure"


OLAP Requirements

Types of Analysis

◦ What-if analysis, sensitivity analysis, goal-seeking analysis, rank analysis, exception analysis, prediction, etc.

OLAP Requirements

◦ FASMI: Fast Analysis of Shared Multidimensional Information

◦ Fast: responses within 1, 2, 5, or 30 seconds, depending on the type of task

◦ Analysis: various analysis tools

◦ Shared: supports many users and purposes

◦ Multidimensional Information: many attributes, hierarchical attributes


OLAP Types

OLAP Data storage

◦ Stored in cubes or relations

Types of OLAP (according to data storage)

◦ MOLAP (Multidimensional OLAP)

Stores data in cubes and performs operations such as drill-down, roll-up, slicing, etc., for analysis

Very fast, but pre-computation is required

◦ ROLAP (Relational OLAP)

Uses data directly from a relational DB (current and historical data)

Very scalable

◦ Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)

Flexible, e.g., low level: relational, high level: array
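The difference between the ROLAP and MOLAP storage styles can be sketched with Python's standard `sqlite3` module. This is a toy contrast under stated assumptions, not how any particular OLAP server works: the `sales` table, its rows, and the function names are invented.

```python
import sqlite3
from itertools import combinations

rows = [("Q1", "Gizmo", 10), ("Q1", "Widget", 20), ("Q2", "Gizmo", 30)]

# ROLAP style: keep the data in a relational table and aggregate at query time.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (quarter TEXT, product TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def rolap_total_by_quarter():
    cur = con.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
    )
    return dict(cur.fetchall())

# MOLAP style: precompute every cuboid once, then answer queries by lookup.
DIMS = ("quarter", "product")
precomputed = {}
for k in range(len(DIMS) + 1):
    for group in combinations(range(len(DIMS)), k):
        cuboid = {}
        for row in rows:
            key = tuple(row[i] for i in group)
            cuboid[key] = cuboid.get(key, 0) + row[2]
        precomputed[tuple(DIMS[i] for i in group)] = cuboid

def molap_total_by_quarter():
    return {key[0]: total for key, total in precomputed[("quarter",)].items()}
```

Both functions return the same totals. The MOLAP version pays its cost up front (all 2^n cuboids are built before any query), while the ROLAP version pays at query time but needs no extra storage beyond the table.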


Part II Summary

Database

◦ Data, information, knowledge, wisdom

◦ File and database

◦ Relational DBMS

◦ SQL

Data warehouse

◦ Data warehouse

◦ Data cubes

◦ OLTP and OLAP
