Chapter 15 Data Warehousing, OLAP, and Data Mining

Post on 21-Dec-2015


Page 1: Chapter 15 Data Warehousing, OLAP, and Data Mining

Chapter 15: Data Warehousing, OLAP, and Data Mining

Page 2: Chapter 15 Data Warehousing, OLAP, and Data Mining

Introduction

• Data, data, data… everywhere!

• Information… that’s another story!

• Especially, the right information @ the right time!

• Data warehousing’s goal is to make the right information available @ the right time

• Data warehousing is a data store (e.g., a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes

Page 3: Chapter 15 Data Warehousing, OLAP, and Data Mining


Introduction

• Data warehouses are natural allies for data mining (work together well)

• Data mining can help fulfill part of the goal of data warehouses: the right information @ the right time

• Relational database management systems (RDBMS), such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing

Page 4: Chapter 15 Data Warehousing, OLAP, and Data Mining

Definitions of a Data Warehouse

1. W.H. Inmon: “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”

2. Ralph Kimball: “A copy of transaction data, specifically structured for query and analysis”

Page 5: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Warehouse

• For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW)

• DW allows an organization (enterprise) to remember what it has noticed about its data

• Data Mining techniques make use of the data in a Data Warehouse

Page 6: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Warehouse

[Figure: data in the enterprise “database” (Customers, Vendors, Orders, Transactions, etc.) is copied, organized, and summarized into the Data Warehouse, which feeds Data Mining]

Data Miners:

• “Farmers” – they know

• “Explorers” – unpredictable

Page 7: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Warehouse

A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining

Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system

Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse

Page 8: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Mart

• A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.

• A Data Mart typically reflects the business rules of a specific business unit within an enterprise.

Page 9: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Warehouse to Data Mart

[Figure: the Data Warehouse feeds three Data Marts, each of which provides Decision Support Information]

Page 10: Chapter 15 Data Warehousing, OLAP, and Data Mining


Generic Architecture of Data

(synonym) Transaction data

Page 11: Chapter 15 Data Warehousing, OLAP, and Data Mining

Transaction (Operational) Data

• Operational (production) systems create massive numbers of transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc.

• Transactions are the base level of data – the raw material for understanding customer behavior

• Unfortunately, operational systems change due to changing business needs

• Fortunately, operational systems can usually be changed to support changing business needs

• Data warehousing strategies need to be aware of operational system changes

Page 12: Chapter 15 Data Warehousing, OLAP, and Data Mining


Operational Summary Data

Summaries are for a specific time period and utilize the transaction data for that time period

Other Examples???
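As a rough illustration of an operational summary, the sketch below rolls transactions up into one total per month. The transaction records, their fields, and the function name `monthly_summary` are invented for this example and are not from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (date, kind, amount). Invented for illustration.
transactions = [
    (date(2015, 12, 1), "sale", 120.00),
    (date(2015, 12, 1), "refund", -20.00),
    (date(2015, 12, 2), "sale", 75.50),
    (date(2016, 1, 3), "sale", 40.00),
]


def monthly_summary(rows):
    """Return the total amount and transaction count per (year, month)."""
    summary = defaultdict(lambda: {"total": 0.0, "count": 0})
    for day, _kind, amount in rows:
        bucket = summary[(day.year, day.month)]
        bucket["total"] += amount
        bucket["count"] += 1
    return dict(summary)


summary = monthly_summary(transactions)
for period, figures in sorted(summary.items()):
    print(period, figures)
```

Each summary row covers one time period and is computed only from the transactions that fall inside that period.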

Page 13: Chapter 15 Data Warehousing, OLAP, and Data Mining

Decision Support Summary Data

• The data that are used to help make decisions about the business

  – Financial Data, such as:
    • Income Statements (Profit & Loss)
    • Balance Sheets (Assets – Liabilities = Net Worth)
  – Sales summaries
  – Other examples???

• Data warehouses maintain this type of data; however, financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing???)

• Generally, it is a bad idea to use the same system for analytic and operational purposes

Page 14: Chapter 15 Data Warehousing, OLAP, and Data Mining

Database Schema

• Database schema defines the structure of data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data)

• In RDBMS:

  – Columns = fields = attributes (A, B, C)
  – Rows = records = tuples (1-7)
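The difference between structure and values can be seen in a minimal sketch using Python's built-in `sqlite3` module. The table name `person` is an assumption made for this example; the column names and the sample row follow the slide's own example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: structure only (column names and types), no values yet.
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT)")

# The values: one row (record, tuple) stored under that structure.
conn.execute("INSERT INTO person VALUES (?, ?)", ("Ron", "Norman"))

# PRAGMA table_info describes the structure: one result row per column,
# with the column name in position 1.
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]

# SELECT returns the values.
rows = conn.execute("SELECT first_name, last_name FROM person").fetchall()

print(columns)
print(rows)
conn.close()
```

Adding more rows changes what the `SELECT` returns but leaves the schema reported by `PRAGMA table_info` untouched.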

Page 15: Chapter 15 Data Warehousing, OLAP, and Data Mining

Logical & Physical Database Schema

• Logical schema: describes data in a way that is familiar to business users

• Physical schema: describes the data the way it will be stored in an RDBMS, which might be different from the way the logical schema shows it

Page 16: Chapter 15 Data Warehousing, OLAP, and Data Mining

Metadata

• General definition: Data about data!!!

  – Examples:
    • A library’s card catalog (metadata) describes publications (data)
    • A file system maintains permissions (metadata) about files (data)

• A form of system documentation including:

  – Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)
  – Description of the contents of each field (e.g., start date)
  – Date when data were loaded
  – Indication of currency of the data (last updated)
  – Mappings between systems (e.g., A.this = B.that)

• Invaluable: without it, you have to research all of this yourself
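One way to picture the documentation items listed above is as a small record kept for each field. This is only a sketch: the class `FieldMetadata`, its attribute names, the description text, and the dates are all hypothetical, while the state codes and the `A.this` mapping are taken from the slide's examples.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FieldMetadata:
    """Metadata for one field; every attribute name here is illustrative."""
    name: str
    description: str
    allowed_values: frozenset = frozenset()  # empty means "no restriction"
    loaded_on: Optional[date] = None         # date when data were loaded
    last_updated: Optional[date] = None      # currency of the data
    mapped_from: Optional[str] = None        # mapping between systems

    def accepts(self, value):
        """True if the value is legally allowed in this field."""
        return not self.allowed_values or value in self.allowed_values


state = FieldMetadata(
    name="state",
    description="Two-letter state code",       # invented description
    allowed_values=frozenset({"AZ", "CA", "OR", "UT", "WA"}),
    loaded_on=date(2015, 12, 21),              # placeholder date
    last_updated=date(2015, 12, 21),           # placeholder date
    mapped_from="A.this",
)

print(state.accepts("CA"), state.accepts("NY"))
```

Keeping the allowed values next to the field description lets a load process reject bad values without anyone having to look the rules up elsewhere.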

Page 17: Chapter 15 Data Warehousing, OLAP, and Data Mining


Business Rules

• Highest level of abstraction from operational (transaction) data

• Describes why relationships exist and how they are applied

• Examples:

– Need to have 3 forms of ID for credit

– Only allow a maximum daily withdrawal of $200

– After the 3rd log-in attempt, lock the log-in screen

– Accept no bills larger than $20

– Others???
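Three of the example rules above can be written as simple checks, which shows how a business rule sits above the data it constrains. This is a sketch under assumptions: the function names are invented, and "log-in attempt" is read here as a failed attempt.

```python
MAX_DAILY_WITHDRAWAL = 200   # dollars, from the slide
MAX_LOGIN_ATTEMPTS = 3       # lock after the 3rd attempt (assumed: failed attempts)
LARGEST_BILL_ACCEPTED = 20   # dollars, from the slide


def can_withdraw(withdrawn_today, requested):
    """True if the request keeps the day's total within the daily limit."""
    return withdrawn_today + requested <= MAX_DAILY_WITHDRAWAL


def is_locked(failed_attempts):
    """True once the 3rd failed log-in attempt has been made."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS


def accepts_bill(denomination):
    """True if the bill is no larger than the largest accepted denomination."""
    return denomination <= LARGEST_BILL_ACCEPTED


print(can_withdraw(150, 50), is_locked(3), accepts_bill(50))
```

The limits live in named constants so that a change in policy means changing one value, not the logic.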

Page 18: Chapter 15 Data Warehousing, OLAP, and Data Mining

General Architecture for Data Warehousing

• Source systems

• Extraction, (Clean), Transformation, & Load (ETL)

• Central repository

• Metadata repository

• Data marts

• Operational feedback

• End users (business)
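The extraction, clean, transformation, and load steps in this architecture can be sketched end to end in a few lines. Everything specific here is an assumption for illustration: the source rows, the `sales` table, and the cleaning rule (drop rows with no customer name). An in-memory SQLite database stands in for the central repository.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system.
source_rows = [
    {"customer": " alice ", "state": "ca", "amount": "19.99"},
    {"customer": "Bob", "state": "WA", "amount": "5.00"},
    {"customer": "", "state": "OR", "amount": "7.25"},  # missing customer
]


def extract():
    """Pull raw rows from the source system (here, a list in memory)."""
    return list(source_rows)


def clean(rows):
    """Drop rows that have no customer name."""
    return [r for r in rows if r["customer"].strip()]


def transform(rows):
    """Normalize names and state codes, and convert amounts to numbers."""
    return [
        (r["customer"].strip().title(), r["state"].upper(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, state TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(extract())))
loaded = warehouse.execute(
    "SELECT customer, state, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step as its own function makes it easier to adjust one stage when a source system changes.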

Page 19: Chapter 15 Data Warehousing, OLAP, and Data Mining


Where does OLAP fit in?

Page 20: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Overview

• Interactive, exploratory analysis of multidimensional data to discover patterns

[Figure: data cube with axes age, gender, and accidents]

Page 21: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Architecture

Page 22: Chapter 15 Data Warehousing, OLAP, and Data Mining

Server Options

• Single processor

• Symmetric multiprocessor (SMP)

• Massively parallel processor (MPP)

Page 23: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Server Options

• ROLAP (Relational)

• MOLAP (Multidimensional)

• HOLAP (Hybrid)

Page 24: Chapter 15 Data Warehousing, OLAP, and Data Mining

OLAP – Online Analytical Processing

• A definition:

• Data representation is in the form of a CUBE

• OLAP goes beyond SQL with its analysis capabilities

• Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography
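To make the idea of multi-dimensional views concrete, here is a small sketch that treats a list of facts as a cube with three dimensions (product, year, region) and produces different views by summing over the dimensions that are left out. The unit figures and the helper `view` are invented for illustration; a real OLAP server does far more than this.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, year, region, units). Values are invented.
facts = [
    ("Redblob", 1996, "North", 10),
    ("Redblob", 1997, "North", 12),
    ("Blueblob", 1996, "North", 7),
    ("Blueblob", 1997, "South", 9),
    ("Redblob", 1996, "South", 4),
]

DIMENSIONS = ("product", "year", "region")


def view(rows, *dims):
    """Total units grouped by the chosen dimensions, summing over the rest."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


by_product = view(facts, "product")
by_year_region = view(facts, "year", "region")
print(by_product)
print(by_year_region)
```

The same facts answer both questions; only the choice of dimensions changes.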

Page 25: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Cube - 1

Page 26: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Cube - 2

Page 27: Chapter 15 Data Warehousing, OLAP, and Data Mining

OLAP Cube - 3

• Star Structure (quite common)

[Figure: star structure. A central Facts table (Revenue, Expenses, Units) is linked to the dimensions Time (Week, Year), Product (Model, Type, Color), Channel, and Region (Nation, District, Dealer)]
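A star structure of this kind can be sketched as a fact table joined to its dimension tables. The example below uses Python's built-in `sqlite3` module and the labels that appear on this slide (Revenue, Expenses, Units, Model, Type, Color, Nation, District, Dealer, Year, Week). The table layout, key columns, and all sample values are assumptions, and the Channel dimension is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE product (product_id INTEGER PRIMARY KEY,
                      model TEXT, product_type TEXT, color TEXT);
CREATE TABLE region  (region_id INTEGER PRIMARY KEY,
                      nation TEXT, district TEXT, dealer TEXT);

-- Fact table: keys into the dimensions plus numeric measures.
CREATE TABLE facts (
    product_id INTEGER REFERENCES product(product_id),
    region_id  INTEGER REFERENCES region(region_id),
    year INTEGER, week INTEGER,
    revenue REAL, expenses REAL, units INTEGER
);

-- Invented sample data.
INSERT INTO product VALUES (1, 'M1', 'T1', 'red'), (2, 'M2', 'T2', 'blue');
INSERT INTO region  VALUES (1, 'N1', 'D1', 'Dealer A'), (2, 'N1', 'D2', 'Dealer B');
INSERT INTO facts VALUES
    (1, 1, 1996, 1, 100.0, 60.0, 2),
    (1, 2, 1996, 1, 150.0, 90.0, 3),
    (2, 1, 1996, 2, 300.0, 200.0, 1);
""")

# Revenue by product color: join the facts to one dimension and aggregate.
revenue_by_color = conn.execute("""
    SELECT p.color, SUM(f.revenue)
    FROM facts AS f
    JOIN product AS p ON p.product_id = f.product_id
    GROUP BY p.color
    ORDER BY p.color
""").fetchall()
print(revenue_by_color)
conn.close()
```

Every analytical question follows the same shape: join the fact table to the dimensions of interest, group by their attributes, and aggregate a measure.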

Page 28: Chapter 15 Data Warehousing, OLAP, and Data Mining

OLAP Cube - 4

[Figure: “The Cube” of Sales for the products Redblob and Blueblob in 1996 and 1997]

Page 29: Chapter 15 Data Warehousing, OLAP, and Data Mining

OLAP Cube - 5

Three-Dimensional Cube Display

Page: Region: North

Columns: Sales (Redblob, Blueblob, Total)

Rows: Year (1996, 1997, Total)

Page 30: Chapter 15 Data Warehousing, OLAP, and Data Mining

OLAP Cube - 6

Six-Dimensional Cube

Dimension          Example
Brand              Mt. Airy
Store              Atlanta
Customer segment   Business
Product group      Desks
Period             January
Variable           Units sold

Page 31: Chapter 15 Data Warehousing, OLAP, and Data Mining


Rotation (Pivot Table)

Page 32: Chapter 15 Data Warehousing, OLAP, and Data Mining


Drill Down

Page 33: Chapter 15 Data Warehousing, OLAP, and Data Mining


OLAP Examples

• http://perso.wanadoo.fr/bernard.lupin/english/example.htm

• Excel Pivot Table example (similar to OLAP cube)

Page 34: Chapter 15 Data Warehousing, OLAP, and Data Mining


Sample of OLAP products

Just a snippet from http://www.olapreport.com/ProductsIndex.htm ; not an endorsement

Page 35: Chapter 15 Data Warehousing, OLAP, and Data Mining


Data Mining versus OLAP

Page 36: Chapter 15 Data Warehousing, OLAP, and Data Mining

Data Mining versus OLAP

• OLAP – Online Analytical Processing

  – Provides you with a very good view of what is happening, but cannot predict what will happen in the future or explain why it is happening

Page 37: Chapter 15 Data Warehousing, OLAP, and Data Mining

Results of Data Mining Include:

• Forecasting what may happen in the future

• Classifying people or things into groups by recognizing patterns

• Clustering people or things into groups based on their attributes

• Associating what events are likely to occur together

• Sequencing what events are likely to lead to later events
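As one small illustration of the "associating" result, the sketch below counts how often pairs of items occur together and reports each pair's support (the fraction of baskets containing both items). The baskets and the function name `pair_support` are invented; real association mining also considers larger item sets and measures such as confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(baskets)
for pair, value in sorted(support.items()):
    print(pair, value)
```

Pairs with high support are candidates for "events that are likely to occur together".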

Page 38: Chapter 15 Data Warehousing, OLAP, and Data Mining


End of Chapter 15