Data Warehousing
![Page 1: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/1.jpg)
Data Warehousing
![Page 2: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/2.jpg)
Introduction
![Page 3: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/3.jpg)
Definition:
Simple perception: a data warehouse (DWH) is no more than a collection of the key pieces of information used to manage & direct the business for the most profitable outcome.
Precise definition: it concentrates on the data. The data should be subject oriented, consistent across sources & so on.
Pearson’s definition: it is more than a vast amount of data; it is also the process involved in getting that data from source to table & from table to analysts.
** In other words **
“A DWH is the data (meta/fact/dimension/aggregate) and the process managers (load/warehouse/query) that make information available, enabling people to make informed decisions.”
![Page 4: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/4.jpg)
Data Warehousing Architecture:
A DWH must be architected to support three major driving factors:
1) Populating the DWH.
2) Day-to-day management of the DWH.
3) The ability to cope with requirement evolution.
![Page 5: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/5.jpg)
Typical process flow within a DWH (diagram):
Source → Extract & load → Warehouse (data transformation and movement) → Query → User
Warehouse → Archive data
![Page 6: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/6.jpg)
Processes:
1. Extract & load the data.
2. Clean & transform the data into a form that can cope with large data volumes & provide good query performance.
3. Back up & archive the data.
4. Manage queries & direct them to the appropriate data sources.
![Page 7: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/7.jpg)
Extract & load process:
Operational data: suitable for the operational system; may have been modified & extended over the years to support performance.
DWH: the data is reconstructed.
![Page 8: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/8.jpg)
1) Extract & load process:
a. Controlling the process: determine when to start extracting the data, when to run the transformations, consistency checks & so on. E.g. retail sales analysis.
b. When to initiate the extract: the data should be in a consistent state, with every source taken as of the same instant in time. E.g. telecom.
c. Loading the data: load into a temporary data store, then clean up & check consistency. E.g. the current subscriber & current event DBs.
d. Copy management tools & data clean-up: coding.
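A minimal sketch of point b, assuming each source system can report the time up to which its data is complete. The names `ready_to_extract`, `watermarks` and the two source names are hypothetical, chosen only for illustration.

```python
from datetime import datetime

def ready_to_extract(source_watermarks, cutoff):
    """Return True only when every source is complete up to `cutoff`."""
    return all(mark >= cutoff for mark in source_watermarks.values())

# Hypothetical telecom sources: the extract should wait until both
# have posted everything up to the same instant.
cutoff = datetime(2024, 1, 31, 23, 59, 59)
watermarks = {
    "subscribers": datetime(2024, 2, 1, 0, 5),
    "call_events": datetime(2024, 1, 31, 22, 0),  # still behind the cutoff
}
can_start = ready_to_extract(watermarks, cutoff)
```

Here `can_start` is false because one source lags the cutoff, so the controlling process would hold the extract back.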
![Page 9: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/9.jpg)
2) Clean & transform
a. Clean & transform the data into a structure that speeds up queries.
b. Partition the data in order to speed up queries, optimize hardware performance & simplify the management of the DWH.
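As a rough illustration of point b, the sketch below partitions sales rows by calendar month so that a query for one month touches only that partition. The row layout and the function name `partition_by_month` are assumptions, not taken from the slides.

```python
from collections import defaultdict

def partition_by_month(rows):
    """Group rows by the 'YYYY-MM' prefix of their ISO sale date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["sale_date"][:7]].append(row)
    return dict(partitions)

sales = [
    {"sale_date": "2024-01-03", "amount": 10.0},
    {"sale_date": "2024-01-28", "amount": 5.0},
    {"sale_date": "2024-02-10", "amount": 3.0},
]
partitions = partition_by_month(sales)
# A January query scans only the January partition.
january_total = sum(row["amount"] for row in partitions["2024-01"])
```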
![Page 10: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/10.jpg)
Clean & transform
a. Clean & transform the data into a structure that speeds up queries.
• Make sure the data is consistent within itself, e.g. within a row.
• Make sure the data is consistent with other data within the same source.
• Make sure the data is consistent with data in the other source systems.
• Make sure the data is consistent with the information already in the warehouse.
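The four levels of consistency check can be sketched as follows. This is only an illustration under assumed data: the sales rows, the product key sets and both function names are hypothetical.

```python
def check_row(row):
    """Within-row check: the total must equal quantity * unit price."""
    return row["quantity"] * row["unit_price"] == row["total"]

def missing_keys(rows, key, known_keys):
    """Return the rows whose `key` value is absent from `known_keys`."""
    return [row for row in rows if row[key] not in known_keys]

sales = [
    {"sale_id": 1, "product_id": "P1", "quantity": 2, "unit_price": 5, "total": 10},
    {"sale_id": 2, "product_id": "P9", "quantity": 1, "unit_price": 7, "total": 8},
]
same_source_products = {"P1", "P9"}   # product file from the same source
other_source_products = {"P1"}        # product master in another source system
warehouse_products = {"P1"}           # product dimension already in the warehouse

inconsistent_rows = [row["sale_id"] for row in sales if not check_row(row)]
missing_in_source = missing_keys(sales, "product_id", same_source_products)
missing_in_other = missing_keys(sales, "product_id", other_source_products)
missing_in_warehouse = missing_keys(sales, "product_id", warehouse_products)
```

Sale 2 fails the within-row check, and its product is unknown to the other source system and to the warehouse, so it would be held back for clean-up.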
![Page 11: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/11.jpg)
3) Backup & archive process:
Back up regularly, so that the DWH can recover from data loss or failure.
In archiving, older data is removed from the system.
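A minimal archiving sketch using Python's built-in `sqlite3` module. The tables `sales` and `sales_archive` are hypothetical, and the archive table merely stands in for whatever offline storage a real system would use.

```python
import sqlite3

def archive_older_than(conn, cutoff):
    """Copy rows dated before `cutoff` to the archive, then delete them."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM sales WHERE sale_date < ?", (cutoff,)
        ).rowcount
    return deleted

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, sale_date TEXT, amount REAL);
    CREATE TABLE sales_archive (sale_id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        (1, '2021-06-30', 10.0), (2, '2022-11-02', 20.0), (3, '2024-02-14', 30.0);
""")
moved = archive_older_than(conn, "2023-01-01")
```

ISO-formatted date strings compare correctly as text, which is why a plain `<` works here.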
![Page 12: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/12.jpg)
4) Query management process:
Directing each query to the most effective data source.
![Page 13: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/13.jpg)
Process Architecture
![Page 14: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/14.jpg)
| Process | Function | System manager |
|---|---|---|
| Extract & load | Extracts & loads the data, performing simple transformations before & during load | Load manager |
| Clean & transform data | Transforms & manages the data | Warehouse manager |
| Backup & archive | Backs up & archives the data warehouse | Warehouse manager |
| Query management | Directs & manages queries | Query manager |
![Page 15: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/15.jpg)
Architecture of a data warehouse (diagram):
Operational data → Load manager → Warehouse manager (detailed information, summary information, metadata) → Query manager → Data dipper / OLAP tools
Data → Information → Decision
![Page 16: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/16.jpg)
Load Manager
The system component that performs all the operations necessary to support the extract & load process.
It is built from off-the-shelf tools, bespoke coding, C programs & shell scripts.
Its size & complexity will vary from DWH to DWH: the larger the degree of overlap between source systems, the larger the load manager will be.
Third-party tools provide at most 20 to 25% of the total system functionality.
![Page 17: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/17.jpg)
Load Manager Architecture
1) Extract the data from the source systems.
2) Fast load the extracted data into a temporary data store.
3) Perform simple transformations into a structure similar to the one in the data warehouse.
Each of these functions has to operate automatically & recover from any errors it encounters, to a very large extent with no human intervention.
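One way to picture "operate automatically and recover from errors" is a retry wrapper around each of the three steps. This is a sketch under stated assumptions: `run_with_retry` and the three step functions are hypothetical stand-ins, and the extract is made to fail once on purpose.

```python
import time

def run_with_retry(step, *args, attempts=3, delay_seconds=0.0):
    """Run `step`, retrying on any exception up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return step(*args)
        except Exception as exc:  # a real system would log and alert here
            last_error = exc
            time.sleep(delay_seconds)
    raise RuntimeError(f"{step.__name__} failed after {attempts} attempts") from last_error

calls = {"extract": 0}

def extract():
    """Stand-in extract that fails on its first call."""
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise ConnectionError("source system unavailable")
    return [("1", "widget", " 9.50 ")]

def fast_load(rows):
    """Stand-in for loading rows into the temporary data store."""
    return list(rows)

def simple_transform(rows):
    """Convert text fields to the types used in the warehouse."""
    return [(int(i), name, float(price)) for i, name, price in rows]

extracted = run_with_retry(extract)
staged = run_with_retry(fast_load, extracted)
transformed = run_with_retry(simple_transform, staged)
```

The first extract attempt fails and is retried without intervention; only after all attempts are exhausted does the wrapper raise.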
![Page 18: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/18.jpg)
Extract data from source system
To get hold of the source data, it has to be transferred from the source systems & made available to the DWH.
ASCII files are transferred by FTP across the LAN.
Current gateway technology operates too slowly to compete with FTP.
![Page 19: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/19.jpg)
Fast Load
Data should be loaded into the warehouse in the fastest possible time, in order to minimize the total load window.
This becomes critical as the number of data sources increases & the time window shrinks.
In practice it is more effective to load the data into a relational DB prior to applying transformations & checks. (ASCII)
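The load-first, check-afterwards idea can be sketched with the standard `csv` and `sqlite3` modules. The staging table, its columns and the sample extract are assumptions made for illustration; a production fast loader would be a database bulk-load utility rather than `executemany`.

```python
import csv
import io
import sqlite3

# Hypothetical ASCII extract as it might arrive from a source system.
ascii_extract = "sale_id,store,amount\n1,S01,19.99\n2,S02,not-a-number\n"

conn = sqlite3.connect(":memory:")
# Every column is TEXT so the load itself never rejects a row.
conn.execute("CREATE TABLE staging_sales (sale_id TEXT, store TEXT, amount TEXT)")

reader = csv.reader(io.StringIO(ascii_extract))
next(reader)  # skip the header line
with conn:
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", reader)

loaded = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]

# Checks run afterwards as set-based SQL over the staged rows:
# flag any amount containing a character other than a digit or a dot.
bad_ids = [
    row[0]
    for row in conn.execute(
        "SELECT sale_id FROM staging_sales WHERE amount GLOB '*[^0-9.]*'"
    )
]
```

Both rows load regardless of quality, and the malformed amount is found later by a single query instead of row-by-row checks during the load.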
![Page 20: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/20.jpg)
Simple Transformation
Before or during the load there will be an opportunity to perform simple transformations on the data.
Here we perform those transformations that do not require complex logic or the use of relational set operators.
E.g. retail management system:
1) Strip out all the columns that are not required in the DWH.
2) Convert all the values to the required data types.
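Both steps fit in a few lines. In this sketch the column names, the `REQUIRED` mapping and the sample record are hypothetical; the point is only that each value is handled on its own, with no joins or set operations.

```python
from datetime import date

# Columns the warehouse needs, with the converter for each target type.
REQUIRED = {
    "sale_id": int,
    "sale_date": date.fromisoformat,
    "amount": float,
}

def simple_transform(record):
    """Keep only the required columns and convert each value's type."""
    return {column: convert(record[column]) for column, convert in REQUIRED.items()}

raw = {
    "sale_id": "42",
    "sale_date": "2024-03-15",
    "amount": "19.99",
    "till_operator": "jsmith",   # operational detail, not needed in the DWH
    "receipt_printer": "LPT1",   # operational detail, not needed in the DWH
}
clean = simple_transform(raw)
```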
![Page 21: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/21.jpg)
Load Manager Architecture (diagram):
File structure → Temporary data store → Warehouse structure
Load manager components: controlling process, stored procedures, copy management tools, fast loader
![Page 22: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/22.jpg)
Warehouse Manager
The system component that performs all the operations necessary to support the warehouse management process.
It is built from third-party system management tools, bespoke coding, C programs & shell scripts.
As with the load manager, the size & complexity of the warehouse manager will vary between specific solutions. Unlike the load manager, the complexity of the warehouse manager is driven by the extent to which the operational management of the DWH has been automated.
Third-party tools provide at most 40% of the total system functionality.
![Page 23: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/23.jpg)
Warehouse Manager Architecture
1) Analyze the data to perform consistency & referential integrity checks.
2) Transform & merge the source data in the temporary data store into the published DWH.
3) Create indexes, business views, partition views & so on.
4) Generate denormalizations if appropriate.
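The referential integrity check in step 1 can be expressed as an anti-join: find rows in the temporary store whose key has no match in the dimension table. The table names and sample rows below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE temp_sales (sale_id INTEGER, product_id TEXT, amount REAL);
    INSERT INTO dim_product VALUES ('P1', 'Widget'), ('P2', 'Gadget');
    INSERT INTO temp_sales VALUES (1, 'P1', 10.0), (2, 'P7', 5.0), (3, 'P2', 7.5);
""")

# Rows that reference a product the warehouse does not know about.
orphans = conn.execute("""
    SELECT t.sale_id, t.product_id
    FROM temp_sales AS t
    LEFT JOIN dim_product AS d ON d.product_id = t.product_id
    WHERE d.product_id IS NULL
    ORDER BY t.sale_id
""").fetchall()
```

Any rows returned would be held back or repaired before the merge in step 2.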
![Page 24: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/24.jpg)
Warehouse Manager Architecture (diagram):
Temporary data store → Starflake schema → Summary tables
Warehouse manager components: controlling process, stored procedures, backup/recovery tool, SQL scripts
![Page 25: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/25.jpg)
Using temporary destination tables:
Once the data is in the temporary store, the next step is to create a set of tables identical to the destination tables in the DWH.
Ex: if the data in DWH is highly partitioned….
As we are about to execute substantial consistency checks, data should not be loaded into the DWH until it has been cleaned up.
If a consistency check fails: although relational databases provide some form of rollback, in practice it is easier to load the data into a temporary area, clean it up & then publish it to the DWH.
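A minimal sketch of load, check, then publish. The tables `temp_sales` and `sales`, the single check (no missing or negative amounts) and the function name `publish` are all hypothetical.

```python
import sqlite3

def publish(conn):
    """Move staged rows into the warehouse table only if every check passes."""
    failures = conn.execute(
        "SELECT COUNT(*) FROM temp_sales WHERE amount IS NULL OR amount < 0"
    ).fetchone()[0]
    if failures:
        return False  # the warehouse table is left untouched
    with conn:  # publish and clear the staging table in one transaction
        conn.execute("INSERT INTO sales SELECT * FROM temp_sales")
        conn.execute("DELETE FROM temp_sales")
    return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, amount REAL);
    CREATE TABLE temp_sales (sale_id INTEGER, amount REAL);  -- same layout as sales
    INSERT INTO temp_sales VALUES (1, 25.0), (2, -4.0);
""")

first_attempt = publish(conn)   # blocked by the negative amount
conn.execute("UPDATE temp_sales SET amount = 4.0 WHERE sale_id = 2")  # clean-up
second_attempt = publish(conn)  # now succeeds
published = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Because the bad row never reaches `sales`, there is nothing to roll back in the warehouse when a check fails.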
![Page 26: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/26.jpg)
Complex Transformation
Reconcile data
![Page 27: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/27.jpg)
Transform into a starflake schema:
Transform the data into a form suitable for decision support queries.
Transform it into a form in which the bulk of the factual data lies in the center.
Star schema, snowflake schema, starflake schema.
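To make "factual data in the center" concrete, here is a small star schema sketch: one fact table surrounded by two dimension tables, and a typical decision support query across them. All table names, columns and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables surround the fact table.
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- The fact table in the center holds the bulk of the data.
    CREATE TABLE fact_sales (
        store_id INTEGER REFERENCES dim_store (store_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        sale_date TEXT,
        amount REAL
    );
    INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_product VALUES (1, 'Toys'), (2, 'Food');
    INSERT INTO fact_sales VALUES
        (1, 1, '2024-01-01', 10.0), (1, 1, '2024-01-02', 5.0),
        (2, 2, '2024-01-02', 3.0), (1, 2, '2024-01-03', 4.0);
""")

# Total sales by region and category, joining facts to both dimensions.
totals = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
```

A snowflake schema would further normalize the dimension tables (for example, region into its own table); a starflake schema mixes both styles.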
![Page 28: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/28.jpg)
Create indexes & views:
One would expect the index creation time to be significant, even if we only need to create indexes against a fact table partition.
Because of this, most relational technologies have facilities to create indexes in parallel, distributing the load across the hardware & significantly reducing the elapsed time.
Indexes also add overhead to inserting a row into a table.
![Page 29: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/29.jpg)
Generate the summaries:
The warehouse manager has to create a set of aggregations to speed up query performance.
These are generated automatically.
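A sketch of an automatically regenerated summary, assuming a hypothetical `fact_sales` table and a monthly aggregate named `sales_by_month`. A real warehouse manager would run something like `refresh_summary` after each load.

```python
import sqlite3

def refresh_summary(conn):
    """Rebuild the monthly summary table from the fact table."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS sales_by_month")
        conn.execute("""
            CREATE TABLE sales_by_month AS
            SELECT substr(sale_date, 1, 7) AS month,
                   SUM(amount) AS total_amount,
                   COUNT(*) AS sale_count
            FROM fact_sales
            GROUP BY substr(sale_date, 1, 7)
        """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2024-01-03', 10.0), ('2024-01-28', 5.0), ('2024-02-10', 3.0);
""")
refresh_summary(conn)
summary = conn.execute(
    "SELECT month, total_amount, sale_count FROM sales_by_month ORDER BY month"
).fetchall()
```

Queries at monthly grain can then read the small summary table instead of scanning every fact row.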
![Page 30: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/30.jpg)
Query Manager:
The system component that performs all the operations necessary to support the query management process.
It is built from user access tools, specialist data warehousing monitoring tools, native database facilities, bespoke coding, C programs & shell scripts.
Its size & complexity will vary between specific solutions.
Unlike the load manager, the complexity of the query manager is driven by the extent to which the facilities are provided by user access tools or native DB facilities.
![Page 31: Data-ware Housing](https://reader033.vdocuments.net/reader033/viewer/2022061201/5478f434b37959892b8b460d/html5/thumbnails/31.jpg)
Query Manager Architecture
1. Direct queries to the appropriate tables.
2. Schedule the execution of the user queries.
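The two responsibilities can be sketched as a routing function and a priority queue. This is an illustration only: the table names, the notion of query "grain" and the `QueryScheduler` class are assumptions, not part of the slides.

```python
import heapq

# Hypothetical mapping from query grain to the smallest table that can answer it.
SUMMARY_TABLES = {"month": "sales_by_month", "day": "sales_by_day"}

def choose_table(grain):
    """Direct a query to a summary table when one exists, else to the facts."""
    return SUMMARY_TABLES.get(grain, "fact_sales")

class QueryScheduler:
    """Run lower priority numbers first; ties run in submission order."""

    def __init__(self):
        self._queue = []
        self._sequence = 0

    def submit(self, priority, sql):
        heapq.heappush(self._queue, (priority, self._sequence, sql))
        self._sequence += 1

    def next_query(self):
        return heapq.heappop(self._queue)[2]

scheduler = QueryScheduler()
scheduler.submit(5, f"SELECT * FROM {choose_table('transaction')}")  # heavy scan
scheduler.submit(1, f"SELECT * FROM {choose_table('month')}")        # quick lookup
first = scheduler.next_query()
second = scheduler.next_query()
```

The cheap monthly query is routed to the summary table and runs ahead of the full fact scan.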