concepts and components chapters 1, 2, 7, 14, 15 1

33
DATA WAREHOUSING Concepts and Components Chapters 1, 2, 7, 14, 15 1

Upload: stewart-daniels

Post on 22-Dec-2015

239 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Concepts and Components Chapters 1, 2, 7, 14, 15 1

1

DATA WAREHOUSINGConcepts and Components

Chapters 1, 2, 7, 14, 15

Page 2: Concepts and Components Chapters 1, 2, 7, 14, 15 1

2

Agenda

Background Data Warehouse vs Operational Data

Store Characteristics of a Data Warehouse Improvements in Data Warehousing Relationship to Business Intelligence

Page 3: Concepts and Components Chapters 1, 2, 7, 14, 15 1

3

Evolution of Decision Support Technologies Business people need information to make plans,

decisions, and assess results 60's

Batch reports 70’s

DSSs 80’s

Info Centers 90’s

Early DWs 2000's

Business Intelligence Issues:

Dependency on IT resources Based on OLTP or extracts Functionality often pre-programmed "Big Data" Analytics

Page 4: Concepts and Components Chapters 1, 2, 7, 14, 15 1

4

DW vs. Business Intelligence Short:

DW = populating structures with data BI = using DW data

Long: DW = body of historical data, separate from the

operations of the organization, used to create BI BI = the delivery of timely, accurate, and

useful information to decision-makers Broad:

BI = a broad category of applications, technologies, and organizational processes for gathering, storing, accessing, and analyzing data to help business users make better decisions

Page 5: Concepts and Components Chapters 1, 2, 7, 14, 15 1

5

Need for Decision-Optimized Data Storage Business people need information to

make plans, decisions, and assess results What were sales volumes by region and

product category for the last 3 years? Which of two new medications will result in

the best outcomes (higher recovery rate and shorter hospital stay)?

Data captured by complex operational systems (OLTPs) optimized to support well-defined transaction requirements

Difficult to get needed information from data grounded in OLTPs

Page 6: Concepts and Components Chapters 1, 2, 7, 14, 15 1

6

Operational vs. Informational Data Operational (ie,

OLTP)Informational (ie, OLAP)

Data Content Current Values Historical, derived, summarized

Data Structure

Optimized for transactions

Optimized for complex queries

Data Volume MB/GB of data GB/TB/PB… of data

Access Frequency

High Medium to low

Access Type Read, update, delete Read-only

Usage Predictable, repetitive Ad hoc, random, heuristic

Response Time

Sub-seconds Several seconds to minutes

Users Large number; operational & data workers

Relatively smaller number; data & knowledge workers

Page 7: Concepts and Components Chapters 1, 2, 7, 14, 15 1

7

Data Warehouse“… a subject-oriented, integrated, nonvolatile,

and time variant collection of data in support of management decisions.” Managing the Data Warehouse, W. H. Inmon, John

Wiley & Sons, December, 1996.

“… a copy of transaction data specifically structured for query and analysis.” The Data Warehouse Toolkit, R. Kimball, John Wiley

& Sons, February, 1996.

Enterprise data, transformed, integrated, accumulated over time, optimized for decision-making, and accessible via analytical tools

Page 8: Concepts and Components Chapters 1, 2, 7, 14, 15 1

8

Characteristics of a DW (ala Inmon) Subject-Oriented

As opposed to business-process oriented Integrated

Multiple sources, internal and external Critical part of DW implementation

Time-Variant History, time periods important

Non-Volatile DW data not changed once stored

Page 9: Concepts and Components Chapters 1, 2, 7, 14, 15 1

9

Subject Orientation Data organized based on:

How users refer to it Subject areas of interest to users Areas important for tracking success, performance

Often based on transactions

Graphic Source: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=22&ved=0CHUQFjALOAo&url=http%3A%2F%2Fwww.csun.edu%2F~hcmgt004%2FDWPresSp05.ppt&ei=U1S9VMuOJsTCgwSes4HADg&usg=AFQjCNGvjoMayU2-79jot8f2zb4J_IbZGQ&bvm=bv.83829542,d.eXY&cad=rja

Page 10: Concepts and Components Chapters 1, 2, 7, 14, 15 1

10

Characteristics of a DW, cont… Subject-Oriented

Needs are business subject-focused Integrated

Multiple sources, internal and external Time-Variant

History, time periods important Non-Volatile

DW data not changed once stored Data Granularity

Page 11: Concepts and Components Chapters 1, 2, 7, 14, 15 1

11

Data Granularity*

Level of detail stored in database Operational focus Analytical focus

Examples: Life Insurance Policy vs. Life Insurance

Coverage Product Category vs. Product Sales

High granularity (eg, transactional grain) is most flexible

* See http://www.kimballgroup.com/2007/07/keep-to-the-grain-in-dimensional-modeling/ for excellent (and brief) description of granularity

Page 12: Concepts and Components Chapters 1, 2, 7, 14, 15 1

12

Challenges in Early DW Implementation1. Improper or infeasible architectures,

approaches2. Insufficient attention to

organizational strategy and culture3. Early information delivery tools too

complex for business users4. Storage technology made it difficult

to store much detail or history, and slow to process

Page 13: Concepts and Components Chapters 1, 2, 7, 14, 15 1

13

Improved Technology User-friendly tools for analysis,

visualization Excel Tableau Reporting Services, …

Improved technology for accessing, aggregating, partitioning data

Advances in processing technology Parallel processing In-Memory data warehouses

Advances in storage technology RAID Solid State

Page 14: Concepts and Components Chapters 1, 2, 7, 14, 15 1

14

Improved Architectures

Based on Data Marts Conformed dimensions BI-emphasis

Page 15: Concepts and Components Chapters 1, 2, 7, 14, 15 1

15

Data Warehouse vs. Data Marts

Enterprise Data Warehouse Information about ALL subjects

important to the organization

Pros Cons

Requires a corporate effort Requires a corporate effort

Single, central storage Longer (costlier) to implement

Centralized control, architecture

Higher risk of failure

Page 16: Concepts and Components Chapters 1, 2, 7, 14, 15 1

16

Data Warehouse vs. Data Marts, cont…

Pros Cons

Faster implementation Can introduce redundant data

Earlier return on investment

Can make data mart integration more complex

Less risk of failure

Gives project team time to learn, grow

Data Marts Subsets of data warehouse that focus on

a selected subject area; typically departmental in nature

Page 17: Concepts and Components Chapters 1, 2, 7, 14, 15 1

17

Data Warehouse Architecture: Basic

Page 18: Concepts and Components Chapters 1, 2, 7, 14, 15 1

18

Data Warehouse Architecture: Types

Page 19: Concepts and Components Chapters 1, 2, 7, 14, 15 1

19

BI Architecture

Source: Chaudhuri et. al. , An Overview of Business Intelligence Technology, Communications of the ACM, 54(8), August 2011, pp. 88-98.

Page 20: Concepts and Components Chapters 1, 2, 7, 14, 15 1

20

BI Architecture, cont…

Source: Oracle Corporation. Information Management and Big Data: A Reference Architecture, Oracle White Paper, February 2013, p. 12.

Page 22: Concepts and Components Chapters 1, 2, 7, 14, 15 1

22

Architecture Components

1. Data Sources2. Data Staging (Movement)3. Data Storage (Warehouse) 4. Data Analysis/Discovery (Mid-tier)5. Information Delivery (Front-end

Presentation)

Page 23: Concepts and Components Chapters 1, 2, 7, 14, 15 1

23

1. Data Sources

Identifying required business data from Production Internal, Personal Archived External

Page 24: Concepts and Components Chapters 1, 2, 7, 14, 15 1

24

2. Data Staging Extract

From source systems Transform

Cleanse Supplement Convert Combine…

Load Populate data warehouse/mart tables

Page 25: Concepts and Components Chapters 1, 2, 7, 14, 15 1

25

3. Data Storage

Data Warehouse / Data Mart Relational database for structured data Non-relational (e.g., Hadoop) data store

for "loosely-structured" data Metadata

Relational database Catalog Extended properties Custom tables

External products/tools Spreadsheets…

Page 26: Concepts and Components Chapters 1, 2, 7, 14, 15 1

26

4. Data Analysis: Supporting Knowledge Discovery

Forms of Discovery

Know What Info to Look For

Know How to Explore Info

Layout-Led Discovery

X X

Data-Led Discovery

X

Model-Led Discovery

Layout-Led Discovery Pre-Designed Reports

Data-Led Discovery OLAP Analysis

Model-Led Discovery Data Mining

Page 27: Concepts and Components Chapters 1, 2, 7, 14, 15 1

27

Pre-Defined Reports

Information pushed to user Content and layout pre-determined Can be parameter-driven Can support some drill-down May also include basic report

development

Page 28: Concepts and Components Chapters 1, 2, 7, 14, 15 1

28

OLAP Online Analytical Processing

Providing On-Line Analytical Processing to User Analysts, E. F. Codd, Codd & Date, Inc 1993.

Short Definition: Class of applications or tools that support

ad-hoc analysis of multidimensional data Longer Definition:

“…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…” OLAP Council (www.olapcouncil.org)

Page 29: Concepts and Components Chapters 1, 2, 7, 14, 15 1

29

Data Mining

Search for patterns in large amounts of data Making connections/associations with

data Predicting future outcomes

OLAP vs. Data Mining “Report on the past” vs. “Predict the

future”

Part of Knowledge Discovery…

Page 30: Concepts and Components Chapters 1, 2, 7, 14, 15 1

30

5. Information Delivery

Page 31: Concepts and Components Chapters 1, 2, 7, 14, 15 1

31

Examples of Uses of BI/DW

Industry Use

Retail Customer Loyalty

Customer Service Effectiveness

Financial Fraud Detection

Profitability by product/LOB

Airlines Route Profitability

Customer Profitability

Manufacturing Cost Reduction Opportunities

Product Shipments

Non Profits Giving Campaign Effectiveness

Salvation Army Bell Ringer Effectiveness

Government Manpower Planning

School Academic Performance

Medical Patient Risk for Disease

Page 32: Concepts and Components Chapters 1, 2, 7, 14, 15 1

32

More Examples…

Industry Use

Marketing Churn Reduction

Brand/Product/Company Perception

Sales Cross-Selling Opportunities

Technology/Consulting

Estimate future engagements

Page 33: Concepts and Components Chapters 1, 2, 7, 14, 15 1

33

Next Time…

Data Warehouse Design (Dimensional Modeling)