introduction to data warehousingcsci253/presentations s12/data warehousin… · a data warehouse is...
Introduction to Data Warehousing
What is a Data Warehouse?
A data warehouse (DW) is a database used mainly to support reporting and analysis in businesses.
The definition of a data warehouse focuses on data storage. Data extracted from the source systems is cleaned, transformed, cataloged and made available to managers and other business professionals for data mining, online analytical processing, market research and decision support.
In other words, many large organizations are data rich but information poor. Data warehousing reverses this situation by making the organization information rich.
A data warehouse combines data from multiple, varied sources into one comprehensive and easily manipulated database.
The data stored in the warehouse are uploaded from several operational database systems (such as marketplace, sales, etc.) and integrated into one large database. The warehouse can then be subdivided into smaller units of data called data marts, which let the user choose the source and type of data depending on current needs.
Why a Data Warehouse?
Scenario: ABC Pvt Ltd.
ABC Pvt Ltd is a company with several branches in different cities. The Sales Manager wants a quarterly sales report for the whole company. Each branch has its own separate operational system.
[Diagram: separate branch systems in City A, City B, City C and City D; the Sales Manager needs a report on sales per item type for the first quarter.]
Solution: Data Warehousing
- Extract sales information from each database.
- Store the information in a common database at a single warehouse.
- Refresh and update the warehouse at regular intervals so that it contains up-to-date information for analysis.
- Retrieve and analyze information from the warehouse, which holds all the data with a historical perspective.
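The extract, store and retrieve steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the branch data, table layout and function names are hypothetical, in-memory SQLite databases stand in for the branch systems, and the periodic refresh is not modelled.

```python
import sqlite3

# Hypothetical branch data: (item_type, sale_date, amount) rows per city.
BRANCH_SALES = {
    "City A": [("laptop", "2012-01-15", 1200.0), ("phone", "2012-02-03", 300.0)],
    "City B": [("laptop", "2012-03-20", 1100.0), ("phone", "2012-04-02", 250.0)],
}


def make_branch_db(rows):
    """Simulate one branch's separate operational database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def load_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common database."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales (city TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for city, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(city, *row) for row in rows],
        )
    wh.commit()
    return wh


def quarterly_report(wh, start, end):
    """Total sales per item type for ISO dates in the range [start, end)."""
    query = (
        "SELECT item_type, SUM(amount) FROM sales "
        "WHERE sale_date >= ? AND sale_date < ? "
        "GROUP BY item_type ORDER BY item_type"
    )
    return wh.execute(query, (start, end)).fetchall()


branches = {city: make_branch_db(rows) for city, rows in BRANCH_SALES.items()}
warehouse = load_warehouse(branches)
report = quarterly_report(warehouse, "2012-01-01", "2012-04-01")
print(report)
```

The report is computed from the single warehouse table, so the Sales Manager's query never touches the branch systems directly.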
[Diagram: City A, City B, City C and City D feed the Data Warehouse; Query & Analysis tools produce the Report for the Sales Manager.]
Why is Data Warehousing Special?
Operational vs. Warehouse System
A data warehouse is a
- Subject-oriented,
- Integrated,
- Time-variant,
- Nonvolatile
collection of data in support of management's decision-making process.
Subject-oriented
A data warehouse can be organized around subjects such as sales, products and customers.
For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" or "Which employee made the most sales last month?"
It excludes data that is not useful in the decision support process.
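As a rough illustration of the first question, the sketch below answers "Who was our best customer for this item last year?" over a sales-only data set. The records, field order and the `best_customer` helper are hypothetical, chosen only to show what a subject-oriented query looks like.

```python
from collections import defaultdict

# Hypothetical sales facts: (customer, item, year, amount).
SALES = [
    ("Alice", "laptop", 2011, 1200.0),
    ("Bob", "laptop", 2011, 900.0),
    ("Alice", "laptop", 2011, 400.0),
    ("Bob", "phone", 2011, 2000.0),
    ("Carol", "laptop", 2010, 5000.0),
]


def best_customer(sales, item, year):
    """Return the customer with the highest total spend on `item` in `year`."""
    totals = defaultdict(float)
    for customer, sold_item, sold_year, amount in sales:
        if sold_item == item and sold_year == year:
            totals[customer] += amount
    if not totals:
        return None  # no sales of this item in this year
    return max(totals, key=totals.get)


print(best_customer(SALES, "laptop", 2011))
```

Only sales facts are kept; anything that does not help answer this kind of question is left out of the data set.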
Integration
A data warehouse is constructed by integrating multiple heterogeneous sources with different data types, such as relational databases, OLTP systems and flat files, and refining them into one common coding scheme.
[Diagram: RDBMS, OLTP and Flat File sources pass through Data Processing and Data Transformation into the Data Warehouse.]
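A common coding scheme can be pictured as a per-source translation table. The sketch below is an assumption-laden example: the attribute (gender), the three source names and their encodings are all hypothetical, and real integration would cover many attributes and handle unknown codes.

```python
# Hypothetical per-source encodings of the same attribute.
GENDER_MAPS = {
    "rdbms": {"m": "M", "f": "F"},
    "oltp": {1: "M", 0: "F"},
    "flat_file": {"male": "M", "female": "F"},
}


def to_common_scheme(source, record):
    """Translate one record's gender code into the warehouse's common code."""
    mapping = GENDER_MAPS[source]
    raw = record["gender"]
    # Text codes are compared case-insensitively and without padding.
    key = raw.strip().lower() if isinstance(raw, str) else raw
    return {**record, "gender": mapping[key]}


def integrate(batches):
    """Merge records from all sources after recoding them."""
    unified = []
    for source, records in batches.items():
        unified.extend(to_common_scheme(source, r) for r in records)
    return unified


batches = {
    "rdbms": [{"id": 1, "gender": "m"}],
    "oltp": [{"id": 2, "gender": 0}],
    "flat_file": [{"id": 3, "gender": "Female "}],
}
unified = integrate(batches)
print(unified)
```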
Time-variant
Provides information from a historical perspective.
Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
A data warehouse generally stores data covering the past 5-10 years, to be used for comparisons, trends, and forecasting.
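One way to picture a key that carries an element of time is to make the date part of the key itself. The sketch below assumes hypothetical price snapshots keyed by `(product_id, snapshot_date)`; the names and values are illustrative only.

```python
from datetime import date

# Hypothetical price snapshots keyed by (product_id, snapshot_date).
snapshots = {}


def record_snapshot(product_id, snapshot_date, price):
    """Add a new dated row instead of overwriting the previous value."""
    snapshots[(product_id, snapshot_date)] = price


def price_as_of(product_id, as_of):
    """Return the most recent price recorded on or before `as_of`."""
    dated = [
        (snap_date, price)
        for (pid, snap_date), price in snapshots.items()
        if pid == product_id and snap_date <= as_of
    ]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


record_snapshot("P1", date(2010, 1, 1), 10.0)
record_snapshot("P1", date(2011, 1, 1), 12.0)
print(price_as_of("P1", date(2010, 6, 1)), price_as_of("P1", date(2012, 1, 1)))
```

Because older rows are kept, the same question can be asked for any point in the stored history, which is what makes comparisons and trend analysis possible.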
Nonvolatile
A data warehouse is always a physically separate store of data.
Because of this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. Once data enter the warehouse they are not updated or changed in any way; they are only loaded, refreshed and accessed for queries.
Operational vs. Warehouse System

| Features | Operational | Warehouse |
|---|---|---|
| Characteristics | Operational processing | Informational processing |
| Orientation | Transaction | Analysis |
| User | Clerk, DBA, database professional | Knowledge workers |
| Function | Day-to-day operation | Decision support |
| Data | Current | Historical |
| View | Detailed, flat relational | Summarized, multidimensional |
| DB design | Application oriented | Subject oriented |
| Unit of work | Short, simple transaction | Complex query |
| Access | Read/write | Mostly read |
Operational vs. Warehouse System (continued)

| Features | Operational | Warehouse |
|---|---|---|
| Focus | Data in | Information out |
| Number of records accessed | Tens | Millions |
| Number of users | Thousands | Hundreds |
| DB size | 100 MB to GB | 100 GB to TB |
| Priority | High performance, high availability | High flexibility, end-user autonomy |
The Components of the Data Warehouse
Two-Tier Architecture
For dimensional analysis, the client software may require the data to be structured as a star schema, which simplifies both the software's operations and the user's view of the data.
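A star schema places one fact table at the centre, joined to a dimension table for each way of looking at the data. The sketch below builds a minimal one in an in-memory SQLite database; the table names (`fact_sales`, `dim_item`, `dim_city`, `dim_date`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_type TEXT);
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- The fact table holds the measure plus one key per dimension.
CREATE TABLE fact_sales (
    item_id INTEGER REFERENCES dim_item(item_id),
    city_id INTEGER REFERENCES dim_city(city_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_item VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO dim_city VALUES (1, 'City A'), (2, 'City B');
INSERT INTO dim_date VALUES (1, 2012, 1), (2, 2012, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 1200.0), (2, 1, 1, 300.0), (1, 2, 1, 1100.0), (2, 2, 2, 250.0);
""")


def sales_by_item_type(connection, year, quarter):
    """Total sales per item type for one quarter, via fact-to-dimension joins."""
    query = """
        SELECT i.item_type, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_item AS i ON i.item_id = f.item_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.year = ? AND d.quarter = ?
        GROUP BY i.item_type
        ORDER BY i.item_type
    """
    return connection.execute(query, (year, quarter)).fetchall()


print(sales_by_item_type(conn, 2012, 1))
```

Every dimensional question follows the same pattern: join the fact table to the dimensions of interest, filter, and aggregate.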
Three-Tier Architecture
An application server is added between the client and the warehouse to improve performance and reduce network traffic.
It manages the interaction with the warehouse, performs the calculations and sends the results to the client.
Three-Tier (MDD) Architecture
The multidimensional database (MDD) stores the data in a special structure designed to facilitate dimensional analysis.
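As a toy picture of such a structure, a cube can be modelled as cells addressed by one coordinate per dimension. The sketch below is not how any particular MDD product stores data; the dimensions, cell values and the `roll_up` and `slice_cube` helpers are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("item_type", "city", "quarter")

# Hypothetical cube cells: one measure (sales amount) per coordinate.
cube = {
    ("laptop", "City A", "Q1"): 1200.0,
    ("laptop", "City B", "Q1"): 1100.0,
    ("phone", "City A", "Q1"): 300.0,
    ("phone", "City B", "Q2"): 250.0,
}


def roll_up(cells, keep):
    """Aggregate the cube over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for coords, amount in cells.items():
        totals[tuple(coords[p] for p in positions)] += amount
    return dict(totals)


def slice_cube(cells, dimension, value):
    """Keep only the cells where `dimension` equals `value`."""
    pos = DIMENSIONS.index(dimension)
    return {c: a for c, a in cells.items() if c[pos] == value}


by_item = roll_up(cube, ["item_type"])
q1_by_city = roll_up(slice_cube(cube, "quarter", "Q1"), ["city"])
print(by_item, q1_by_city)
```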
Applications for Data Warehousing
Data warehousing tools / Extract, Transform, Load
Summary of the processes the applications perform
Data Extraction and Transformation
[Diagram labels: Changed into; Normalized Relational Data &]
Data Cleaning
AT&T, ATT and AT and T are all different spellings of the same name. Cleaning maps them to a single form: ATT.
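A minimal sketch of this kind of cleaning, assuming a small hand-written table of known variants (the `CANONICAL_NAMES` table and both function names are hypothetical): each name is normalized, then looked up to find its single canonical form.

```python
import re

# Hypothetical table of known variants, keyed by their normalized form.
CANONICAL_NAMES = {"att": "ATT", "atandt": "ATT"}


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def clean_company_name(name):
    """Map a known variant to its canonical form; otherwise just trim it."""
    return CANONICAL_NAMES.get(normalize(name), name.strip())


cleaned = [clean_company_name(n) for n in ["AT&T", "ATT", "AT and T", "IBM "]]
print(cleaned)
```

Names that are not in the table pass through unchanged apart from trimming, so the rule set can grow one variant at a time.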
[Diagram: data stored in the data warehouse; the end user accesses it with SQL through middleware.]
http://www.youtube.com/watch?v=ripMMhMOL0s
Thank You