8659164-datawarehousing
Post on 23-Oct-2015
DATA WAREHOUSING
Agenda
• Introduction
• Process
• DSS
• Information processing
• Dimensions
• OLAP
• Architecture Types
• Best Practice
• Case
Introduction
Shilpa Surve
04/17/2023 4
Definition
A Data Warehouse is a
• Subject-Oriented
• Integrated
• Time-Variant
• Non-volatile
collection of data in support of management’s decisions.
What are Data Warehouses?
Data warehouses store large volumes of data which are frequently used by DSS
They are maintained separately from the organization’s operational databases
Data warehouses are relatively static with only infrequent updates
A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous operational databases
Steps in Building a Warehouse
Identify key business drivers, sponsorship, risks, ROI
Survey information needs and identify desired functionality and define functional requirements for initial subject area.
Architect long-term, data warehousing architecture
Evaluate and Finalize DW tool & technology
Conduct Proof-of-Concept
Steps in building Data Warehouse
Design target database schema
Build data mapping, extract, transformation, cleansing and aggregation/summarization rules
Build initial data mart, using exact subset of enterprise data warehousing architecture and expand to enterprise architecture over subsequent phases
Maintain and administer data warehouse
The Three Views of Data Warehousing
Strategic or Business view
• Define key business drivers of data warehouse
• How can business-driven approach achieve high ROI?
Architectural or Technology view
• Alternative data warehousing architectures
• How can the right architecture achieve a high ROI?
Methodology or Implementation view
• Development and implementation methodology
• How can the right methodology achieve a rapid ROI?
Process
Swathi Velisetty
DW Components
[Diagram: Legacy System and feed systems FS1, FS2, ... FSn; Extraction; Transmission (network); Staging Area; Cleansing; Transformation; ODS; Aggregation / Summarization; DW; Data Mart Population; data marts DM1, DM2, ... DMn; OLAP Analysis; Knowledge Discovery; Metadata Layer]
Cleansing process
[Diagram: Raw data (Staging Area) -> Cleansing Process -> Clean data marked Good / Bad, plus Cleansing Reports; inputs to the process: Process Metadata (Cleansing Rules) and Control Metadata]
• Clean the Raw Data
• Mark it Good/Bad
• Generate the cleansing Reports and mail to the DWA and Feed System representatives
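The cleansing step above can be sketched roughly as follows. This is only an illustration: the rule names, the record fields (`customer_id`, `amount`) and the report format are assumptions, not something the slides specify.

```python
from typing import Callable

# Hypothetical cleansing rules (the "Process Metadata"); each returns True when a record passes.
CLEANSING_RULES: dict[str, Callable[[dict], bool]] = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "amount is non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def cleanse(raw_records: list[dict]) -> tuple[list[dict], list[dict], list[str]]:
    """Mark each staged record Good or Bad and build a cleansing report."""
    good, bad, report = [], [], []
    for position, record in enumerate(raw_records):
        failed = [name for name, rule in CLEANSING_RULES.items() if not rule(record)]
        if failed:
            bad.append({**record, "status": "Bad"})
            report.append(f"record {position}: failed {', '.join(failed)}")
        else:
            good.append({**record, "status": "Good"})
    return good, bad, report


# Hypothetical raw data sitting in the staging area.
staging_area = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "", "amount": 50.0},
    {"customer_id": "C3", "amount": -5},
]
good, bad, report = cleanse(staging_area)
```

Here `report` stands in for the cleansing report that would be mailed to the DWA and feed system representatives; the mailing itself is left out.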
Transformation Process
[Diagram: Clean Operational Data -> Transformation Process -> Operational Data Store; inputs to the process: Control Metadata and Process Metadata (Mapping Detail, Transformation Rule)]
• Transform the cleaned Operational Data into DSS Data
• Load the DSS data into ODS
• ODS contains the current DSS data at the lowest level of granularity
Summarization Process
[Diagram: ODS -> Summarization Process (Weekly, Monthly, Yearly) -> DW; input to the process: Control Metadata]
• Summarize and aggregate ODS data and populate the Warehouse
• Periodicity of Summarization Process depends upon the level of summarization at Warehouse (weekly, monthly, daily)
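A minimal sketch of the summarization step, assuming the ODS holds one row per sale with a date, a product and an amount. The row layout and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lowest-granularity ODS rows: (sale date, product, amount).
ods_rows = [
    (date(1994, 1, 3), "A", 100),
    (date(1994, 1, 5), "A", 50),
    (date(1994, 1, 12), "A", 70),
    (date(1994, 2, 1), "B", 30),
]


def summarize(rows, period):
    """Aggregate ODS rows to the requested level: 'weekly', 'monthly' or 'yearly'."""
    totals = defaultdict(int)
    for day, product, amount in rows:
        if period == "weekly":
            iso = day.isocalendar()
            key = (iso[0], iso[1], product)  # ISO year, ISO week number
        elif period == "monthly":
            key = (day.year, day.month, product)
        elif period == "yearly":
            key = (day.year, product)
        else:
            raise ValueError(f"unknown period: {period}")
        totals[key] += amount
    return dict(totals)


weekly = summarize(ods_rows, "weekly")
monthly = summarize(ods_rows, "monthly")
yearly = summarize(ods_rows, "yearly")
```

Each call produces one summary level for the warehouse from the same detailed ODS rows.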
Enterprise Data Warehouse
[Diagram: Operational Systems (Legacy, Client/Server, OLTP, External) -> Data Preparation (Select, Extract, Transform, Integrate, Maintain) -> Enterprise wide Data (DATA WAREHOUSE, Metadata Repository) -> API -> USERS]
Distributed Data Marts
[Diagram: Operational Systems (Legacy, Client/Server, OLTP, External) -> Data Preparation (Select, Extract, Transform, Integrate, Maintain) -> Data (three separate Data Marts) -> API -> USERS]
Multi-tiered Data Warehouse
[Diagram: Operational Systems (Legacy, Client/Server, OLTP, External) -> Select, Extract, Transform, Integrate, Maintain -> Enterprise wide Data (DATA WAREHOUSE, Metadata Repository) -> three Data Marts -> API -> USERS]
Example
Monthly Sales by Product for 1991-94
Weekly sales by product/sub-product for 1991-94
Sales Detail for 1991-94
Sales Detail for 1985-90
Metadata
Weekly sales by region for 1991-94
Monthly sales by region for 1991-94
Decision Support System
Atul Zade
What is DSS?
Enable users to get a “Business View” of the data
Facilitate Data based Decision Making that would drive and improve the Business
Discover “Hidden Trends”
Decision Support Systems (DSS) are interactive computer-based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions. Data Warehouse is the foundation of DSS process. It is a Strategy and a Process for Staging Corporate Data.
Driving Forces for DSS
Changes in the Business Environment
• Customers
• Reform
• Technology
• Business Speed
RESULT: COMPETITION
How to answer these Business Queries?
What is the sales distribution region wise?
What is Defaulter’s Profile?
What are the slow movers in my product line?
How did my revenue improve in the past 5 years?
Which of my Sales Agents are doing better?
Who are my profitable customers?
Currency Risk, Interest Rate Risk, Liquidity Risk
Strategic Planning / Budgeting
Which channel costs me more and pays less?
OLTP v/s DSS Environment
OLTP Environment
• get data IN
• large volumes of simple transaction queries
• continuous data changes
• low processing time
• mode of processing
• transaction details
• data inconsistency
• mostly current data
• high concurrent usage
• highly normalized data structure
• static applications
• automates routines
DSS Environment
• get information OUT
• small number of diverse queries
• periodic updates only
• high processing time
• mode of discovery
• subject oriented - summaries
• data consistency
• historical data is relevant
• low concurrent usage
• fewer tables, but more columns per table
• dynamic applications
• facilitates creativity
Benefits for Business User
• Flexible Information Access
• High Availability
• Ease of Use
• Quality & Completeness of Data
• Focus on Information Processing
• Information Base for Knowledge Discovery
Classification of Business Users
• Executives/Managers
– Multi-dimensional analysis, reporting tools
• Knowledge Worker
– Ad hoc queries, detail & summary data, application focus
• Power-Analyst
– Ad hoc queries, Data Analysis & Data Mining
• Customer Contacts
– Detail Data at specific levels
Information processing
Prem Sequera
Data Processing to Information Processing
[Diagram: Management Decision: Value Chain, running from Data Processing through Information Processing to Knowledge Processing.
BUSINESS ELEMENTS: Business Objectives & Goals; Application Domains and Business Functions.
DATA ELEMENTS: Heterogeneous Data Sources; Feed Systems and External Sources.
Components shown between them: Operational Data Store (ODS); Enterprise Data Warehouse; Data Mart A, Data Mart B, ... Data Mart N, each with Appl. Spec. Analysis; OLAP/Query Tools; OLAP Appl.; Query Processing; Report Generation; KNOWLEDGE DISCOVERY (Data Mining Applications); KNOWLEDGE MANAGEMENT.
TRACEABILITY is marked along both sides of the diagram.]
Subject Oriented Analysis
Transactional Storage (Process Oriented): Entry, holding Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address
Data Warehouse Storage (Subject Oriented): Sales, Customers, Products
Integration of Data
Transactional Storage -> Integration -> Data Warehouse Storage
• Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y -> M, F
• Unit of Attributes: Appl. A - pipeline cm.; Appl. B - pipeline inches; Appl. C - pipeline mcf -> pipeline cm
• Physical Attributes: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float -> balance dec(13, 2)
• Naming Conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance -> balance
• Data Consistency: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute) -> date (Julian)
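A rough sketch of a few of these integration rules. Which source code stands for which gender in Appl. B and Appl. C is an assumption (the slide only lists the code pairs), and no conversion is attempted for Appl. C's pipeline value because mcf is not a length unit; function names are hypothetical.

```python
from decimal import Decimal

# Per-application encodings mapped to the warehouse convention M/F.
# Assumption: the first code of each pair on the slide means "M".
GENDER_CODES = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}
# Factor that converts each application's pipeline unit to centimetres.
PIPELINE_TO_CM = {"A": 1.0, "B": 2.54}
# Name of the balance field in each application, as listed on the slide.
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def decode_gender(app, code):
    """Encoding: translate a source gender code to M/F."""
    return GENDER_CODES[app][str(code)]


def pipeline_in_cm(app, value):
    """Unit of attributes: express the pipeline measure in cm."""
    if app not in PIPELINE_TO_CM:
        raise ValueError(f"no length conversion defined for application {app}")
    return round(value * PIPELINE_TO_CM[app], 2)


def warehouse_balance(app, record):
    """Naming and physical attributes: one field name, two decimal places."""
    raw = record[BALANCE_FIELD[app]]
    return Decimal(str(raw)).quantize(Decimal("0.01"))
```

For example, `warehouse_balance("B", {"current_balance": 1234.5})` yields the value in the single `balance dec(13, 2)` style regardless of how the source application named or stored it.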
Volatility of Data
Transactional Storage (Volatile): Record-by-Record Data Manipulation (Insert, Change, Delete, Access)
Data Warehouse Storage (Non-Volatile): Mass Load / Access of Data (Load, Access)
Time Variant Data Analysis
Transactional Storage: Current Data
Data Warehouse Storage: Historical Data
[Chart: Sales (Region, Year - Year 97 - 1st Qtr); Sales (in lakhs) for January, February and March of Year 97, by region (East, West, North)]
Dimension
Kairav Parikh
What is a Dimension?
Data Warehouse is a
• Subject-Oriented
• Integrated
• Time-Variant
• Non-volatile
collection of data in support of management’s decision.
Subject -> Dimension (e.g. Customer, Geography, Time)
Dimensional Hierarchy
Geography Dimension
• World Level: World
• Continent Level: America, Asia, Europe
• Country Level: USA, Canada, Argentina
• State Level: FL, GA, VA, CA, WA
• City Level: Tampa, Miami, Orlando, Naples
Each node is a Dimension Member / Business Entity, linked to the level above by a Parent Relation.
Attributes: Population, Tourist’s Place
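The parent relation in this hierarchy can be illustrated with a small lookup table. The child-to-parent links below are read off the Geography diagram; the function names are hypothetical.

```python
# Child -> parent links for the Geography dimension shown on the slide.
PARENT = {
    "America": "World", "Asia": "World", "Europe": "World",
    "USA": "America", "Canada": "America", "Argentina": "America",
    "FL": "USA", "GA": "USA", "VA": "USA", "CA": "USA", "WA": "USA",
    "Tampa": "FL", "Miami": "FL", "Orlando": "FL", "Naples": "FL",
}
LEVELS = ["World", "Continent", "Country", "State", "City"]


def path_to_root(member):
    """Follow parent relations from a member up to the top of the hierarchy."""
    path = [member]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def level_of(member):
    """A member's level is given by its depth below the root."""
    return LEVELS[len(path_to_root(member)) - 1]


def children_of(member):
    """Members one level down, i.e. the targets of a drill-down."""
    return sorted(child for child, parent in PARENT.items() if parent == member)
```

`path_to_root("Miami")` walks City, State, Country, Continent and World in turn, which is the path a roll-up would follow.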
Types of Dimensions
• Simple Dimensions (e.g. Time)
• Related Dimensions (e.g. Gender of a Customer)
• Spool Dimensions (e.g. Account as an interaction between Customer and Product)
• Bucket Dimensions (e.g. Income Ranges of a Customer)
• Slowly Changing Dimensions (e.g. changes in Organization)
• Fast Varying Dimensions (e.g. changes in Retail Customers’ attributes)
• Unused Dimensions (e.g. Order No., Invoice No.)
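As an illustration of a bucket dimension, a continuous value such as income can be mapped to a small set of range members. The boundaries and labels below are hypothetical; the slide only names "Income Ranges of a Customer".

```python
from bisect import bisect_right

# Hypothetical income boundaries and the bucket label for each resulting range.
BOUNDARIES = [20_000, 50_000, 100_000]
LABELS = ["< 20k", "20k - 50k", "50k - 100k", ">= 100k"]


def income_bucket(income):
    """Return the bucket-dimension member for a customer's income."""
    return LABELS[bisect_right(BOUNDARIES, income)]
```

Facts are then analysed by bucket member rather than by the raw income value, which keeps the dimension small.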
Dimensional Modeling: STEP 1
• Identify Subjects (Dimensions)
• Identify Hierarchies of a Dimension
• Identify Attributes of levels in Hierarchies
• Define Grain
[Diagram: Customer dimension with hierarchies and levels Industry Segment, Industry Type, Fin. Class, City, State, Country]
Dimensional Modeling: STEP 2
• Use KPIs to identify the Facts
• Group the Facts in a logical set
Financial Transactions: Trans. Amount, No. of Bonds, No. of Transactions, Service Cost, ...
Non-Financial Transactions: No. of Cheques Cleared, No. of Visits to a Branch, No. of DEMAT Transactions, ...
Dimensional Modeling: STEP 3
• Link the Group of Facts to the Dimensions that participate in the Facts
[Diagram: Financial Transactions linked to the Customer, Organization, Time, Product and Channel dimensions]
Dimensional Modeling: STEP 4
• Define Granularity for each Group of Facts
Customer (Customer)
Organization (Branch)
Product (Scheme)
Channel (Channel)
Time (Day-Hour)
Financial Transactions
Data Warehouse Schemas
Star Schema
• A Group of Facts connected to Multiple Dimensions
[Diagram: Financial Transactions fact group at the centre, connected to the Customer, Organization, Time, Product and Channel dimensions]
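A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names and the sample rows are assumptions for illustration, and the Organization and Time dimensions are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical columns)
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, scheme TEXT);
CREATE TABLE channel  (channel_id  INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one foreign key per participating dimension, plus the facts
CREATE TABLE financial_transactions (
    customer_id        INTEGER REFERENCES customer(customer_id),
    product_id         INTEGER REFERENCES product(product_id),
    channel_id         INTEGER REFERENCES channel(channel_id),
    trans_amount       REAL,
    no_of_transactions INTEGER
);

INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO product  VALUES (1, 'Savings'), (2, 'Bonds');
INSERT INTO channel  VALUES (1, 'Branch'), (2, 'ATM');
INSERT INTO financial_transactions VALUES
    (1, 1, 1, 100.0, 1),
    (1, 2, 2, 250.0, 2),
    (2, 1, 2,  40.0, 1);
""")

# Aggregate a fact along one dimension: total transaction amount per channel.
amount_by_channel = conn.execute("""
    SELECT ch.name, SUM(f.trans_amount)
    FROM financial_transactions AS f
    JOIN channel AS ch ON ch.channel_id = f.channel_id
    GROUP BY ch.name
    ORDER BY ch.name
""").fetchall()
```

Every analysis query has the same form: join the fact table to the dimensions of interest and aggregate the facts.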
Data Warehouse Schemas
Snow-flake Schema (= Extended Star Schema)
• A Group of Facts connected to Dimensions, which are split across multiple hierarchies and attributes
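[Diagram: Financial Transactions fact group connected to the Customer, Organization, Time, Product and Channel dimensions, with Segment and Geography shown as further tables split out from the dimensions]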
Data Warehouse Schemas
Galaxy Schema
• Multiple Groups of Facts linked by a few common dimensions
[Diagram: Fact1, Fact2 and Fact3 connected to Dimension1 through Dimension7, some of which are shared between the fact groups]
OLAP
Akshay Shiveshwarkar
On-Line Analytical Processing
OLAP can be defined as a technology which allows the users to view the aggregate data across measurements (like Maturity Amount, Interest Rate etc.) along with a set of related parameters called dimensions (like Product, Organization, Customer, etc.)
What is MDDB?
A multidimensional database is a computer software system designed to allow for efficient and convenient storage and retrieval of data that is
• intimately related and
• stored, viewed and analyzed from different perspectives (Dimensions).
RDBMS v/s MDDB
RDBMS: 9 x 3 = 27 cells

MODEL          COLOR   SALES VOL.
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2

MDDB: 3 x 3 = 9 cells (Sales Volumes by MODEL and COLOR)

MODEL      Blue   Red   White
Mini Van   6      5     4
Coupe      3      5     5
Sedan      4      3     2
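The same comparison can be reproduced in a few lines, using the nine rows from the slide. The list-of-lists layout is just one simple way to stand in for a multidimensional array.

```python
# The nine relational rows from the slide: (MODEL, COLOR, SALES VOL.).
rows = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Dimension members become axis positions instead of repeated column values.
models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]

# Multidimensional form: one cell per (model, color) combination.
cube = [[0] * len(colors) for _ in models]
for model, color, volume in rows:
    cube[models.index(model)][colors.index(color)] = volume

relational_cells = len(rows) * len(rows[0])  # 9 rows x 3 columns
cube_cells = len(models) * len(colors)       # 3 models x 3 colors


def sales(model, color):
    """Look up a cell directly by its dimension members."""
    return cube[models.index(model)][colors.index(color)]
```

The member names are stored once per axis, which is where the reduction from 27 cells to 9 comes from.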
Benefits of MDDB over RDBMS
• Ease of Data Presentation & Navigation
– Intuitive, Spreadsheet / Crosstab like data views
• Storage Space
– Very low Space Consumption compared to Relational DB
• Performance
– Gives much better performance. Relational DB may give comparable results only through database tuning (indexing, keys etc), which may not be possible for ad-hoc queries.
• Ease of Maintenance
– No overhead as data is stored in the same way it is viewed. In Relational DB, indexes, sophisticated joins etc. are used which require considerable storage and maintenance
Issues with MDDB
• Sparsity
– Controlled Sparsity
– Random Sparsity
• Data Explosion
– Due to Sparsity
– Due to Summarization
• Performance
– Doesn’t perform better than RDBMS at high data volumes (>20-30 GB)
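Sparsity can be quantified as the fraction of cube cells that hold no data. The sketch below assumes that definition; the dimension sizes and cell counts are made-up numbers.

```python
def total_cells(dimension_sizes):
    """Number of cells a cube allocates: the product of its dimension sizes."""
    cells = 1
    for size in dimension_sizes:
        cells *= size
    return cells


def sparsity(populated_cells, dimension_sizes):
    """Fraction of cells that are empty (0.0 = fully dense)."""
    return 1 - populated_cells / total_cells(dimension_sizes)


# Hypothetical cube: 100 products x 50 branches x 12 months, 6,000 cells populated.
sizes = [100, 50, 12]
empty_fraction = sparsity(6_000, sizes)
```

Adding one more dimension multiplies the cell count by that dimension's size while the number of populated cells usually grows far more slowly, which is how sparsity leads to data explosion.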
OLAP Features
• Subject oriented approach to Decision Support
• Calculations applied across dimensions, through hierarchies and/or across members
• Trend analysis over sequential time periods
• What-if scenarios
• Slicing / Dicing subsets for on-screen viewing
• Rotation to new dimensional comparisons in the viewing area
• Drill-down/up along the hierarchy
• Reach-through / Drill-through to underlying detail data
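Slicing, dicing and rotation from the list above can be illustrated on a tiny cube held in a dictionary. The dimensions, members and values are hypothetical, and real OLAP servers implement these operations very differently; this only shows what each operation selects.

```python
# Hypothetical cube: (product, region, month) -> sales.
DIMS = ("product", "region", "month")
cube = {
    ("Bonds", "East", "Jan"): 10, ("Bonds", "East", "Feb"): 12,
    ("Bonds", "West", "Jan"): 7,  ("Bonds", "West", "Feb"): 9,
    ("Loans", "East", "Jan"): 4,  ("Loans", "East", "Feb"): 6,
    ("Loans", "West", "Jan"): 8,  ("Loans", "West", "Feb"): 5,
}


def slice_cube(cells, dim, member):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {key: value for key, value in cells.items() if key[i] == member}


def dice_cube(cells, **allowed):
    """Dice: keep cells whose members fall in the allowed set of each named dimension."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(dim)] in members for dim, members in allowed.items())
    }


def rotate(cells, order):
    """Rotate: present the same cells with the dimensions in a new order."""
    positions = [DIMS.index(dim) for dim in order]
    return {tuple(key[p] for p in positions): value for key, value in cells.items()}


january = slice_cube(cube, "month", "Jan")
bonds_east = dice_cube(cube, product={"Bonds"}, region={"East"})
by_month_first = rotate(cube, ("month", "region", "product"))
```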
Features of OLAP - Drill Down / Up
[Diagram: ORGANIZATION DIMENSION with levels REGION, DISTRICT and DEALERSHIP; members shown: Midwest; Chicago, St. Louis, Gary; Clyde, Gleason, Carr, Levi, Lucas, Bolton. Sales at Region / District / Dealership level]
• Moving Up and moving down in a hierarchy is referred to as “drill-up” / “roll-up” and “drill-down”
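Roll-up and drill-down amount to aggregating the same facts at a higher or lower level of the hierarchy. In the sketch below the sales figures are invented, and the assignment of dealerships to districts is an assumption made only for illustration.

```python
# (region, district, dealership) -> sales. Values and groupings are hypothetical.
SALES = {
    ("Midwest", "Chicago", "Clyde"): 5,
    ("Midwest", "Chicago", "Gleason"): 3,
    ("Midwest", "St. Louis", "Carr"): 4,
    ("Midwest", "St. Louis", "Levi"): 2,
    ("Midwest", "Gary", "Lucas"): 6,
    ("Midwest", "Gary", "Bolton"): 1,
}
LEVELS = ("REGION", "DISTRICT", "DEALERSHIP")


def sales_at(level):
    """Aggregate sales at the given level of the organization hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = {}
    for path, amount in SALES.items():
        key = path[:depth]  # keep only the levels down to the requested one
        totals[key] = totals.get(key, 0) + amount
    return totals


by_region = sales_at("REGION")          # roll-up to the top level
by_district = sales_at("DISTRICT")      # drill-down one level
by_dealership = sales_at("DEALERSHIP")  # drill-down to the lowest level
```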
Architecture Types
Ritesh Raushan
Implementation Techniques - OLAP Architectures
• MOLAP - Multidimensional OLAP
– Multidimensional Databases for database and application logic layer
• ROLAP - Relational OLAP
– Access Data stored in relational Data Warehouse for OLAP Analysis.
– Database and Application logic provided as separate layers
• HOLAP - Hybrid OLAP
– OLAP Server routes queries first to MDDB, then to RDBMS and result processed on-the-fly in Server
• DOLAP - Desktop OLAP
– Personal MDDB Server and application on the desktop
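The HOLAP routing described above (MDDB first, then RDBMS) might look like the following in outline. Both stores are plain Python structures standing in for real servers, and all names and figures are hypothetical.

```python
# Hypothetical stores: the MDDB holds pre-computed summaries,
# the relational warehouse holds transaction-level rows.
mddb_summaries = {("Bonds", "1997"): 120}
rdbms_rows = [
    {"product": "Loans", "year": "1997", "amount": 30},
    {"product": "Loans", "year": "1997", "amount": 45},
    {"product": "Bonds", "year": "1997", "amount": 120},
]


def holap_query(product, year):
    """Return (total, source): try the MDDB first, then fall back to the RDBMS."""
    key = (product, year)
    if key in mddb_summaries:
        return mddb_summaries[key], "MDDB"
    # Not pre-summarized: compute the result on the fly from relational detail.
    total = sum(
        row["amount"]
        for row in rdbms_rows
        if row["product"] == product and row["year"] == year
    )
    return total, "RDBMS"
```

A query answered from the summaries behaves like MOLAP, while one that falls through to the detail rows behaves like ROLAP.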
MOLAP - MDDB storage
[Diagram: OLAP Cube -> OLAP Calculation Engine -> OLAP Tools, OLAP Applications, Web Browser]
ROLAP - Standard SQL storage
[Diagram: Relational DW -> SQL -> MDDB - Relational Mapping -> OLAP Calculation Engine -> OLAP Tools, OLAP Applications, Web Browser]
HOLAP - Combination of RDBMS and MDDB
[Diagram: Relational DW (via SQL) and OLAP Cube -> OLAP Calculation Engine -> OLAP Tools, OLAP Applications, Web Browser, Any Client]
Architecture Comparison
Definition
• MOLAP: MDDB OLAP = Transaction level data + summary in MDDB
• ROLAP: Relational OLAP = Transaction level data + summary in RDBMS
• HOLAP: Hybrid OLAP = ROLAP + summary in MDDB
Data explosion due to Sparsity
• MOLAP: Good Design 3 – 10 times
• ROLAP: No Sparsity
• HOLAP: Sparsity exists only in MDDB part
Data explosion due to Summarization
• MOLAP: High (May go beyond control. Estimation is very important)
• ROLAP: To the necessary extent
• HOLAP: To the necessary extent
Query Execution Speed
• MOLAP: Fast (Depends upon the size of the MDDB)
• ROLAP: Slow
• HOLAP: Optimum - If the data is fetched from RDBMS then it’s like ROLAP, otherwise like MOLAP.
Cost
• MOLAP: Medium: MDDB Server + large disk space cost
• ROLAP: Low: Only RDBMS + disk space cost
• HOLAP: High: RDBMS + disk space + MDDB Server cost
Where to apply?
• MOLAP: Small transactional data + complex model + frequent summary analysis
• ROLAP: Very large transactional data & it needs to be viewed / sorted
• HOLAP: Large transactional data + frequent summary analysis
Case
Kiran Naik
Thank you