chapter 15 data warehousing, olap, and data mining

Post on 21-Dec-2015

254 Views

Category:

Documents

8 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Chapter 15Data Warehousing, OLAP, and

Data Mining

2

Introduction

• Data, data, data…everywhere!• Information…that’s another story!• Especially, the right information @ the right time!• Data warehousing’s goal is to make the right

information available @ the right time• Data warehousing is a data store (eg., a

database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes

3

Introduction

• Data warehouses are natural allies for data mining (work together well)

• Data mining can help fulfill some of the goal of data warehouses – right information @ the right time

• Relational database management systems (RDBMS), such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing

4

Definitions of a Data Warehouse

- W.H. Inmon

“A subject-oriented, integrated, time-variant and

non-volatile collection of data in support of

management's decision making process”

- Ralph Kimball

“A copy of transaction data, specifically structured for query and analysis”

1.

2.

5

Data Warehouse

• For organizational learning to take place, data

from many sources must be gathered together

and organized in a consistent and useful way –

hence, Data Warehousing (DW)

• DW allows an organization (enterprise) to

remember what it has noticed about its data

• Data Mining techniques make use of the data in

a Data Warehouse

6

Data Warehouse

Customers

Etc…

Vendors Etc…

Orders

DataWarehouse

Enterprise“Database”

Transactions

Copied, organizedsummarized

Data Mining

Data Miners:• “Farmers” – they know• “Explorers” - unpredictable

7

Data Warehouse

A data warehouse is a copy of transaction data

specifically structured for querying, analysis, reporting,

and more rigorous data mining

Note that the data warehouse contains a copy of the

transactions which are not updated or changed later by

the transaction system

Also note that this data is specially structured, and may

have been transformed when it was copied into the data

warehouse

8

Data Mart

• A Data Mart is a smaller, more focused

Data Warehouse – a mini-warehouse.

• A Data Mart typically reflects the business

rules of a specific business unit within an

enterprise.

9

Data Warehouse to Data Mart

DataWarehouse

Data Mart

Data Mart

Data Mart

Decision Support

Information

Decision Support

Information

Decision Support

Information

10

Generic Architecture of Data

(synonym) Transaction data

11

Transaction (Operational) Data

• Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc…

• Transactions are the base level of data – the raw material for understanding customer behavior

• Unfortunately, operational systems change due to changing business needs

• Fortunately, operational systems can usually be changed to support changing business needs

• Data warehousing strategies need to be aware of operational system changes

12

Operational Summary Data

Summaries are for a specific time period and utilize the transaction data for that time period

Other Examples???

13

Decision Support Summary Data

• The data that are used to help make decisions about the business– Financial Data, such as:

• Income Statements (Profit & Loss)• Balance Sheets (Assets – Liabilities = Net Worth)

– Sales summaries– Other examples???

• Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing???)

• Generally, it is a bad idea to use the same system for analytic and operational purposes

14

Database Schema

• Database schema defines the structure of data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data)

• In RDBMS:– Columns = fields = attributes (A,B,C)– Rows = records = tuples (1-7)

15

Logical & Physical Database Schema

• Describes data in a way that is familiar to business users

• Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it

16

Metadata

• General definition: Data about data !!!– Examples:

• A library’s card catalog (metadata) describes publications (data)

• A file system maintains permissions (metadata) about files (data)

• A form of system documentation including:– Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)– Description of the contents of each field (e.g., start date)– Date when data were loaded– Indication of currency of the data (last updated)– Mappings between systems (e.g., A.this = B.that)

• Invaluable, otherwise have to research to find it

17

Business Rules

• Highest level of abstraction from operational (transaction) data

• Describes why relationships exist and how they are applied

• Examples:– Need to have 3 forms of ID for credit

– Only allow a maximum daily withdrawal of $200

– After the 3rd log-in attempt, lock the log-in screen

– Accept no bills larger than $20

– Others???

18

General Architecture for Data Warehousing

• Source systems

• Extraction, (Clean),

Transformation, &

Load (ETL)

• Central repository

• Metadata repository

• Data marts

• Operational feedback

• End users (business)

19

Where does OLAP fit in?

20

OLAP Overview

• Interactive, exploratory analysis of multidimensional data to discover patterns

age accid

ents

gen

de

r

21

OLAP Architecture

22

Server Options

• Single processor

• Symmetric

multiprocessor (SMP)

• Massively parallel

processor (MPP)

23

OLAP Server Options

• ROLAP (Relational)

• MOLAP (Multidimensional)

• HOLAP (Hybrid)

24

OLAP – Online Analytical Processing

• A definition:

• Data representation is in the form of a CUBE• OLAP goes beyond SQL with its analysis

capabilities• Key feature of OLAP: Relevant multi-dimensional

views such as products, time, geography

25

OLAP Cube - 1

26

OLAP Cube - 2

27

OLAP Cube - 3

• Star Structure (quite common)

Facts

Week

Product

Product

Year

Region

Time

Channel

Revenue

Expenses

Units

Model

Type

Color

Channel

Region

Nation

District

Dealer

Time

28

OLAP Cube - 4

Sales 1996

Redblob

Blueblob

1997

TheCube

29

OLAP Cube - 5

Three-Dimensional

CubeDisplay

Page ColumnsRegion:North

Sales

Redblob

Blueblob

Total

1996Rows 1997Year Total

30

OLAP Cube - 6

Six-Dimensional

Cube

Dimension ExampleBrand Mt. AiryStore AtlantaCustomer segment BusinessProduct group DesksPeriod JanuaryVariable Units sold

31

Rotation (Pivot Table)

32

Drill Down

33

OLAP Examples

• http://perso.wanadoo.fr/bernard.lupin/english/example.htm

• Excel Pivot Table example (similar to OLAP cube)

34

Sample of OLAP products

Just a snippet from http://www.olapreport.com/ProductsIndex.htm ; not an endorsement

35

Data Mining versus OLAP

36

Data Mining versus OLAP

• OLAP - Online

Analytical Processing

– Provides you with a very

good view of what is

happening, but can not

predict what will happen

in the future or why it is

happening

37

Results of Data Mining Include:

• Forecasting what may happen in the future• Classifying people or things into groups by

recognizing patterns• Clustering people or things into groups

based on their attributes• Associating what events are likely to occur

together• Sequencing what events are likely to lead

to later events

38

End of Chapter 15

top related