Data Warehouse
DESCRIPTION
A presentation by Prof. Jyoti introducing the basic concepts of data warehousing: the process of transforming data into information and making it available to users in a timely enough manner to make a difference. It covers its use in the finance and insurance industries and notes that historical data holds the key to understanding data over time.
TRANSCRIPT
![Page 1: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/1.jpg)
Data Warehousing
Presentation by Prof. Jyotindra Zaveri
1 Data Warehousing J. Zaveri
![Page 2: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/2.jpg)
Data, Data everywhere, yet ...
- I can’t find the data I need
  - data is scattered over the network
  - many versions, subtle differences
- I can’t get the data I need
  - need an expert to get the data
- I can’t understand the data I found
  - available data is poorly documented
- I can’t use the data I found
  - results are unexpected
  - data needs to be transformed from one form to another
![Page 3: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/3.jpg)
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
(Diagram: Data → Information)
![Page 4: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/4.jpg)
Why Data Warehouse? [DW]
- For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way; hence, Data Warehousing (DW).
- DW allows an organization (enterprise) to remember what it has noticed about its data.
- Data Mining techniques make use of the data in a DW.
![Page 5: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/5.jpg)
Data Warehouse
- A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting.
- The data warehouse contains a copy of the transactions, which are not updated or changed later by the transaction system.
- This data is specially structured, and may have been transformed when it was copied into the data warehouse.
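The warehouse holds a transformed copy that the transaction system does not later change. This can be sketched with Python's built-in `sqlite3` module. The sketch rests on assumed names: the `orders` and `dw_orders` tables, their columns, and the transformations (paise to rupees, upper-cased names) are all hypothetical.

```python
import sqlite3

# Hypothetical schema: an operational table and its warehouse copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer TEXT,
    amount_paise INTEGER, status TEXT);
CREATE TABLE dw_orders (
    order_id INTEGER, customer TEXT,
    amount_rupees REAL, status TEXT, load_date TEXT);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "asha", 125000, "open"), (2, "ravi", 49900, "open")],
)

def load_snapshot(conn, load_date):
    """Copy the current transactions into the warehouse, transforming them on the way in."""
    conn.execute(
        """INSERT INTO dw_orders
           SELECT order_id, UPPER(customer), amount_paise / 100.0, status, ?
           FROM orders""",
        (load_date,),
    )

load_snapshot(conn, "2024-01-31")

# The transaction system later changes a row...
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")

# ...but the warehouse copy still holds what was true at load time.
snapshot = conn.execute(
    "SELECT order_id, customer, amount_rupees, status FROM dw_orders ORDER BY order_id"
).fetchall()
print(snapshot)  # [(1, 'ASHA', 1250.0, 'open'), (2, 'RAVI', 499.0, 'open')]
```

Each warehouse row is a separate copy stamped with `load_date`, so later edits to `orders` do not reach it. Calling `load_snapshot` again appends a new snapshot rather than overwriting the old one.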
![Page 6: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/6.jpg)
Data Warehouse
(Diagram: the enterprise "database" (customers, orders, vendors, transactions, etc.) is copied, organized and summarized into the data warehouse, which data miners then explore with data mining.)
![Page 7: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/7.jpg)
Data Warehouse Architecture
(Diagram: data from relational databases, ERP systems, legacy data and purchased data passes through extraction and cleansing, then an optimized loader, into the data warehouse engine. A metadata repository accompanies the engine, and users analyze and query the warehouse.)
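The architecture above can be sketched as a toy, in-memory pipeline of extraction, cleansing and loading, with a list standing in for the metadata repository. Everything here is hypothetical: the source names, the field names, and the Bombay-to-Mumbai cleansing rule. A real optimized loader would use the DBMS's bulk-load facilities rather than appending to a Python list.

```python
from datetime import date

# Hypothetical raw extracts from two of the source types on the slide.
SOURCES = {
    "erp": [{"cust": " Asha ", "city": "Bombay", "sales": "1200"}],
    "legacy": [
        {"cust": "RAVI", "city": "Pune", "sales": "n/a"},
        {"cust": "meena", "city": "Mumbai", "sales": "300"},
    ],
}
CITY_SYNONYMS = {"Bombay": "Mumbai"}  # assumed cleansing rule

def extract(sources):
    """Extraction: pull raw rows from every source, tagged with their origin."""
    for name, rows in sources.items():
        for row in rows:
            yield name, row

def cleanse(tagged_rows, rejects):
    """Cleansing: standardize values and divert rows that fail validation."""
    for source, row in tagged_rows:
        try:
            sales = int(row["sales"])
        except ValueError:
            rejects.append((source, row))
            continue
        yield {
            "customer": row["cust"].strip().title(),
            "city": CITY_SYNONYMS.get(row["city"], row["city"]),
            "sales": sales,
            "source": source,
        }

def load(rows, warehouse, metadata, load_date):
    """Loading: append clean rows and record the load in a metadata repository."""
    count = 0
    for row in rows:
        warehouse.append(row)
        count += 1
    metadata.append({"load_date": load_date, "rows_loaded": count})

warehouse, metadata, rejects = [], [], []
load(cleanse(extract(SOURCES), rejects), warehouse, metadata, date(2024, 1, 31))
print(warehouse)
print(metadata, rejects)
```

The stages are chained as generators, so each row flows through extraction and cleansing only when the loader asks for it. Rejected rows are kept aside for review instead of being silently dropped.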
![Page 8: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/8.jpg)
(Diagram: operational data, i.e. detailed transactional data from the Mumbai, Delhi and Pune branches, together with census data and GIS data, is merged, cleaned and summarized into a data warehouse held in a relational DBMS+ (e.g. Oracle). Decision support tools work against the warehouse: direct query, reporting tools (e.g. Crystal Reports), mining tools (e.g. Intelligent Miner) and OLAP (Online Analytic Processing).)
![Page 9: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/9.jpg)
Application Areas
| Industry | Application |
|---|---|
| Finance | Credit card analysis |
| Insurance | Claims, fraud analysis |
| Telecommunication | Call record analysis |
| Transport | Logistics management |
| Consumer goods | Promotion analysis |
| Data service providers | Value-added data |
| Utilities | Power usage analysis |
![Page 10: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/10.jpg)
Data Mart
- A Data Mart is a smaller, more focused Data Warehouse: a mini-warehouse.
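One lightweight way to carve a focused data mart out of a warehouse is a filtered, summarized view. This sketch uses `sqlite3` with a hypothetical `sales` table and a hypothetical `west_sales_mart` view. In practice a data mart is often a separately loaded database rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
  ('West', 'tea', 100), ('West', 'coffee', 250),
  ('North', 'tea', 80), ('West', 'tea', 50);
-- Hypothetical mart for one department: only the West region, pre-summarized by product.
CREATE VIEW west_sales_mart AS
  SELECT product, SUM(amount) AS total
  FROM sales WHERE region = 'West'
  GROUP BY product;
""")
rows = conn.execute(
    "SELECT product, total FROM west_sales_mart ORDER BY product"
).fetchall()
print(rows)  # [('coffee', 250.0), ('tea', 150.0)]
```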
![Page 11: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/11.jpg)
Data Warehouse to Data Mart
(Diagram: one data warehouse feeds three data marts, each of which delivers decision support information.)
![Page 12: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/12.jpg)
Data Warehouse definition
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
![Page 13: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/13.jpg)
Why Data Warehousing?
- Who are my customers and what products are they buying?
- Which customers are most likely to go to the competition?
- What impact will new products/services have on revenue and margins?
- What product promotions have the biggest impact on revenue?
- What is the most effective distribution channel?
- Which are our lowest/highest margin customers?
![Page 14: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/14.jpg)
Decision Support
- Used to manage and control the business
- Data is historical or point-in-time
- Optimized for inquiry rather than update
- Use of the system is loosely defined and can be ad hoc
- Used by managers and end-users to understand the business and make judgments
![Page 15: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/15.jpg)
What are the users saying...
- Data should be integrated across the enterprise
- Summary data has real value to the organization
- Historical data holds the key to understanding data over time
- What-if capabilities are required
![Page 16: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/16.jpg)
Data Warehousing: it is a process
- A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible
- A decision support database maintained separately from the organization’s operational database
![Page 17: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/17.jpg)
Traditional RDBMS used for OLTP
- Database systems have traditionally been used for OLTP
  - clerical data processing tasks
  - detailed, up-to-date data
  - structured, repetitive tasks
  - read/update a few records
  - isolation, recovery and integrity are critical
- We will call these operational systems
![Page 18: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/18.jpg)
OLTP vs Data Warehouse

| OLTP | Warehouse (DSS) |
|---|---|
| Application oriented | Subject oriented |
| Used to run business | Used to analyze business |
| Clerical user | Manager/analyst |
| Detailed data | Summarized and refined |
| Current, up to date | Snapshot data |
| Isolated data | Integrated data |
| Repetitive access by small transactions | Ad hoc access using large queries |
| Read/update access | Mostly read access (batch update) |
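The contrast between the two columns shows up in the shape of the queries. This sketch keeps a hypothetical `accounts` table in one `sqlite3` database for brevity, whereas the slide's model keeps the operational system and the warehouse separate. The OLTP step touches a single row by key inside a short transaction. The DSS step scans every row to build a summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
branches = ["Mumbai", "Delhi", "Pune"]
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(i, branches[i % 3], 1000.0) for i in range(1, 901)],
)
conn.commit()

# OLTP style: a short transaction that updates one record by its key.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = ?", (42,))

# DSS style: an ad hoc, read-only query that scans and summarizes many rows.
report = conn.execute(
    """SELECT branch, COUNT(*) AS accounts, SUM(balance) AS total
       FROM accounts GROUP BY branch ORDER BY branch"""
).fetchall()
print(report)  # [('Delhi', 300, 300000.0), ('Mumbai', 300, 299800.0), ('Pune', 300, 300000.0)]
```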
![Page 19: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/19.jpg)
From the Data Warehouse to Data Marts
(Diagram: three levels, from the organizationally structured data warehouse, through departmentally structured data marts, to individually structured data. Axis labels: less/more; history, normalized, detailed; information vs. data.)
![Page 20: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/20.jpg)
Users have different views of Data
(Diagram labels: OLAP; organizationally structured.)
- Tourists: browse information harvested by farmers
- Farmers: harvest information from known access paths
- Explorers: seek out the unknown and previously unsuspected rewards hiding in the detailed data
![Page 21: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/21.jpg)
Walmart Case Study
- Founded by Sam Walton
- One of the largest supermarket chains in the US
- Walmart: 3000+ retail stores
- SAM's Clubs: 200+ wholesale stores
![Page 22: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/22.jpg)
Old Retail Paradigm
- Walmart
  - Inventory management
  - Merchandise accounts payable
  - Purchasing
  - Supplier promotions: national, region, store level
- Suppliers
  - Accept orders
  - Promote products
  - Provide special incentives
  - Monitor and track the incentives
  - Bill and collect receivables
  - Estimate retailer demands
![Page 23: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/23.jpg)
New (Just-In-Time) Retail Paradigm
- No more deals
- One unit price
- Walmart manager
  - Daily inventory restock
  - Suppliers (sometimes same day) ship to Walmart
- Shelf-pass-through (POS application)
  - Suppliers paid once a week on ACTUAL items sold
- Warehouse-pass-through
  - Stock some large items
  - Delivery may come from supplier
- Distribution center
  - Supplier’s merchandise unloaded directly onto Walmart trucks
![Page 24: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/24.jpg)
Information as a Strategic Weapon
- Daily summary of all sales information
- Regional analysis of all stores in a logical area
- Specific product sales
- Specific supplies sales
- Trend analysis, etc.
- Walmart uses information when negotiating with:
  - Suppliers
  - Advertisers, etc.
![Page 25: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/25.jpg)
Data Granularity in Warehouse
- Summarized data stored
  - reduces storage costs
  - reduces CPU usage
  - increases performance, since a smaller number of records has to be processed
  - design around traditional high-level reporting needs
  - tradeoff between the volume of data to be stored and detailed usage of the data
![Page 26: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/26.jpg)
Granularity in Warehouse
- The solution is to have a dual level of granularity
  - Store summary data on disks: 95% of DSS processing is done against this data
  - Store detail on tapes: 5% of DSS processing is done against this data
![Page 27: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/27.jpg)
Levels of Granularity
Banking Example
- Operational: 60 days of activity (account, activity date, amount, teller, location, account bal)
- Monthly account register, up to 10 years (account, activity date, amount, account bal)
- Monthly account summary (account, month, # trans, withdrawals, deposits, average bal)
- Not all fields need be archived
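The roll-up from operational detail to a monthly account summary can be sketched in plain Python. The row layout, account numbers and teller/location codes are hypothetical. `average_bal` is simplified to the mean of post-transaction balances rather than a true average daily balance.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational rows: (account, activity_date, amount, teller, location, balance_after).
# Positive amounts are deposits, negative amounts are withdrawals.
activity = [
    ("A-1", date(2024, 1, 3), 500.0, "T07", "BR-01", 1500.0),
    ("A-1", date(2024, 1, 17), -200.0, "T02", "BR-02", 1300.0),
    ("A-1", date(2024, 2, 2), -300.0, "T07", "BR-01", 1000.0),
    ("B-7", date(2024, 1, 9), 1000.0, "T11", "BR-03", 1000.0),
]

def monthly_summary(rows):
    """Roll detailed activity up to one summary row per (account, month).

    Teller and location are deliberately dropped: not all fields need be archived.
    """
    groups = defaultdict(list)
    for account, day, amount, _teller, _location, balance in rows:
        groups[(account, (day.year, day.month))].append((amount, balance))

    summary = {}
    for key, items in groups.items():
        amounts = [amount for amount, _ in items]
        balances = [balance for _, balance in items]
        summary[key] = {
            "num_trans": len(items),
            "withdrawals": -sum(a for a in amounts if a < 0),
            "deposits": sum(a for a in amounts if a > 0),
            # Simplification: mean of post-transaction balances, not a true daily average.
            "average_bal": sum(balances) / len(balances),
        }
    return summary

summary = monthly_summary(activity)
print(summary[("A-1", (2024, 1))])
```

Four detail rows become three summary rows here. Over years of activity, the same roll-up is what lets the summary level stay small enough for everyday DSS processing.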
![Page 28: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/28.jpg)
Data Integration Across Sources
(Diagram: source systems Trust, Credit card, Savings and Loans, annotated with the integration problems below.)
- Same data, different name
- Different data, same name
- Data found here, nowhere else
- Different keys, same data
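Two of these problems, "different keys, same data" and "same data, different name", are commonly handled by mapping each source's fields onto one vocabulary. A cross-reference then links source keys to a single warehouse key. This sketch assumes hypothetical savings and credit card extracts and a naive matching rule (exact name plus birth date). Real customer matching usually needs fuzzier logic.

```python
# Hypothetical extracts describing the same customer under different keys and field names.
savings = [{"acct_no": "SV-1001", "cust_name": "Asha Rao", "dob": "1985-04-12"}]
credit_card = [{"card_id": "CC-77", "name": "ASHA RAO", "birth_date": "1985-04-12"}]

# Same data, different name: map each source's fields onto one warehouse vocabulary.
FIELD_MAP = {
    "savings": {"acct_no": "source_key", "cust_name": "name", "dob": "birth_date"},
    "credit_card": {"card_id": "source_key", "name": "name", "birth_date": "birth_date"},
}

def conform(source, row):
    """Rename fields to the common vocabulary and standardize name casing."""
    rec = {FIELD_MAP[source][field]: value for field, value in row.items()}
    rec["name"] = rec["name"].title()
    rec["source"] = source
    return rec

def integrate(extracts):
    """Different keys, same data: give each matched customer one warehouse key."""
    customers = {}  # (name, birth_date) -> warehouse customer id
    xref = []       # (source, source key, warehouse customer id)
    for source, rows in extracts.items():
        for row in rows:
            rec = conform(source, row)
            match_key = (rec["name"], rec["birth_date"])
            cust_id = customers.setdefault(match_key, len(customers) + 1)
            xref.append((rec["source"], rec["source_key"], cust_id))
    return xref

xref = integrate({"savings": savings, "credit_card": credit_card})
print(xref)  # [('savings', 'SV-1001', 1), ('credit_card', 'CC-77', 1)]
```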
![Page 29: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/29.jpg)
Data Transformation
(Diagram: operational/source data (sequential, legacy, relational, external) passes through data transformation: accessing, capturing, extracting, householding, filtering, reconciling, conditioning, loading, validating, scoring.)
- Data transformation is the foundation for achieving a single version of the truth
- Major concern for IT
  - A data warehouse can fail if an appropriate data transformation strategy is not developed
![Page 30: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/30.jpg)
Data Integrity Problems
- Same person, different spellings: Agarwal, Agrawal, Aggarwal, etc.
- Multiple ways to denote a company name: Persistent Systems, PSPL, Persistent Pvt. LTD.
- Use of different names: Mumbai, Bombay
- Different account numbers generated by different applications for the same customer
- Required fields left blank
- Invalid product codes collected at point of sale
  - manual entry leads to mistakes
  - “in case of a problem use 9999999”
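Several of these problems can be caught during cleansing with reference tables and simple checks. The alias tables, field names and valid product codes below are hypothetical. Spelling variants of personal names such as Agarwal/Agrawal need fuzzy matching, which this sketch does not attempt.

```python
# Hypothetical reference tables; in practice these are maintained by data owners.
CITY_ALIASES = {"bombay": "Mumbai", "mumbai": "Mumbai"}
COMPANY_ALIASES = {
    "persistent systems": "Persistent Systems",
    "pspl": "Persistent Systems",
    "persistent pvt. ltd.": "Persistent Systems",
}
PLACEHOLDER_CODES = {"9999999"}  # the "in case of a problem" code
REQUIRED = ("customer", "city", "product_code")

def check_record(rec, valid_codes):
    """Return a standardized copy of rec plus a list of detected problems."""
    clean = dict(rec)
    problems = [f"missing {field}" for field in REQUIRED
                if not str(rec.get(field, "")).strip()]

    city = str(rec.get("city", "")).strip().lower()
    if city in CITY_ALIASES:
        clean["city"] = CITY_ALIASES[city]
    company = str(rec.get("company", "")).strip().lower()
    if company in COMPANY_ALIASES:
        clean["company"] = COMPANY_ALIASES[company]

    code = str(rec.get("product_code", "")).strip()
    if code in PLACEHOLDER_CODES:
        problems.append("placeholder product code")
    elif code and code not in valid_codes:
        problems.append("invalid product code")
    return clean, problems

valid_codes = {"1000001"}  # hypothetical product master
cleaned, issues = check_record(
    {"customer": "Agrawal", "city": "Bombay", "company": "PSPL", "product_code": "9999999"},
    valid_codes,
)
_, blank_issues = check_record({"customer": "", "city": "Pune", "product_code": "123"}, valid_codes)
print(cleaned, issues)
print(blank_issues)
```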
![Page 31: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/31.jpg)
Data Transformation Terms
- Extracting
- Conditioning
- Scrubbing
- Merging
- Householding
- Enrichment
- Scoring
- Loading
- Validating
- Delta updating
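The slide only names delta updating. Assuming it means loading just the changes since the previous load, one simple approach compares two keyed snapshots of a source table. The data is hypothetical. Many systems instead read change timestamps or database logs rather than diffing full snapshots.

```python
def delta(previous, current):
    """Compare two snapshots keyed by primary key and classify the changes."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return inserts, updates, deletes

# Hypothetical snapshots of a customer table: key -> (name, city).
yesterday = {101: ("Asha", "Mumbai"), 102: ("Ravi", "Pune"), 103: ("Meena", "Delhi")}
today = {101: ("Asha", "Mumbai"), 102: ("Ravi", "Mumbai"), 104: ("John", "Delhi")}

inserts, updates, deletes = delta(yesterday, today)
print(inserts, updates, deletes)
```

Only rows 102, 103 and 104 need to be applied to the warehouse. The unchanged row 101 is skipped, which is the point of a delta load.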
![Page 32: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/32.jpg)
Data Transformation Terms: Householding
- Identifying all members of a household (living at the same address)
- Ensures only one mailing is sent to a household
- Can result in substantial savings: 1 million catalogues at Rs. 50 each cost Rs. 50 million, so a 2% saving would save Rs. 1 million
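Householding can be sketched by grouping customers on a normalized address and counting one mailing per group; the same code checks the slide's arithmetic. The customer records are hypothetical. The address normalization (lowercase, strip punctuation, collapse spaces) is a crude assumption; real householding relies on address standardization and fuzzy matching.

```python
import re
from collections import defaultdict

def address_key(address):
    """Crude normalization: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(text.split())

def households(customers):
    """Group customer names by normalized address."""
    groups = defaultdict(list)
    for name, address in customers:
        groups[address_key(address)].append(name)
    return dict(groups)

customers = [  # hypothetical records
    ("Asha Rao", "12, MG Road, Pune"),
    ("Vikram Rao", "12 MG Road Pune"),
    ("Ravi Iyer", "4 Park St., Mumbai"),
]
h = households(customers)
mailings = len(h)  # one catalogue per household instead of one per person

# The slide's arithmetic: 1 million catalogues at Rs. 50 each, 2% fewer mailings.
catalogues, cost_each = 1_000_000, 50
total_cost = catalogues * cost_each  # Rs. 50,000,000
saving = total_cost * 2 // 100       # Rs. 1,000,000
print(h, mailings, total_cost, saving)
```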
![Page 33: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/33.jpg)
Deploying Data Warehouses
- What business information keeps you in business today? What business information can put you out of business tomorrow?
- What business information should be a mouse click away?
- What business conditions are driving the need for business information?
![Page 34: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/34.jpg)
Cultural Considerations
- Not just a technology project
- A new way of using information to support daily activities and decision making
- Care must be taken to prepare the organization for change
- Must have organizational backing and support
![Page 35: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/35.jpg)
Data Mining works with Warehouse Data
- Data Warehousing provides the Enterprise with a memory
- Data Mining provides the Enterprise with intelligence
![Page 36: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/36.jpg)
Join Prof. Zaveri on Facebook and [email protected]
eMail [email protected]
eLearning site http://www.dnserp.com
Follow on Twitter @followERP
Connect on LinkedIn http://in.linkedin.com/in/jyotindrazaveri
Connect on Facebook http://www.facebook.com/jyotindra
Subscribe YouTube http://www.youtube.com/dnserp
![Page 37: Data Warehouse](https://reader034.vdocuments.net/reader034/viewer/2022042713/547d1c9db4af9f34098b4731/html5/thumbnails/37.jpg)
Thank You
- Question / Answer session
- Please clarify your doubts