The Data Warehouse and Technology - Building the Data Warehouse
TRANSCRIPT
-
7/29/2019 The Data Warehouse and Technology - Building the Data Warehouse
Building the Data Warehouse by Inmon
Chapter 5: The Data Warehouse and Technology
http://it-slideshares.blogspot.com/
-
5.0 Overview
The data warehouse requires a simpler set of technological features than its operational predecessors:
Online updating: not needed.
Locking, integrity: needs are minimal.
Teleprocessing interface: only a very basic one is required.
This chapter outlines some of the technological requirements for the data warehouse.
-
MANAGING LARGE AMOUNTS OF DATA
1. Manage volumes
2. Manage multiple media technologies
3. Index and monitor data
4. Interface to retrieve and pass data
-
Managing Multiple Media
Following is a hierarchy of storage of data in terms of speed of access and cost of storage:
Medium            Speed of access   Cost of storage
Main memory       Very fast         Very expensive
Expanded memory   Very fast         Expensive
Cache             Very fast         Expensive
DASD              Fast              Moderate
Magnetic tape     Not fast          Not expensive
Near line         Not fast*         Not expensive
Optical disk      Not slow          Not expensive
Fiche             Slow              Cheap
*Not fast to find first record sought; very fast to find all other records in the block.
-
Indexing and Monitoring Data
Monitoring data warehouse data determines such factors as the following:
If a reorganization needs to be done
If an index is poorly structured
If too much or not enough data is in overflow
The statistical composition of the access of the data
Available remaining space
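As a rough illustration of the monitoring checks listed above, the following Python sketch flags when a reorganization may be worth considering. The TableStats fields and the 20 percent overflow threshold are hypothetical; a real DBMS exposes its own statistics and tuning rules.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    """Hypothetical per-table statistics gathered by a monitor."""
    rows_in_prime: int      # rows stored in the primary area
    rows_in_overflow: int   # rows that spilled into overflow
    allocated_bytes: int    # space allocated to the table
    used_bytes: int         # space currently occupied

def overflow_ratio(stats: TableStats) -> float:
    """Fraction of all rows that currently sit in overflow."""
    total = stats.rows_in_prime + stats.rows_in_overflow
    return stats.rows_in_overflow / total if total else 0.0

def remaining_space(stats: TableStats) -> int:
    """Bytes still available in the allocated space."""
    return stats.allocated_bytes - stats.used_bytes

def needs_reorganization(stats: TableStats, max_overflow: float = 0.2) -> bool:
    # Assumed rule of thumb: reorganize once too much data sits in overflow.
    return overflow_ratio(stats) > max_overflow
```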
-
Interfaces to Many Technologies
The interface to different technologies requires several considerations:
Does the data pass from one DBMS to another easily?
Does it pass from one operating system to another easily?
Does it change its basic format in passage (EBCDIC, ASCII, and so forth)?
Can passage into multidimensional processing be done easily?
Can selected increments of data, such as changed data capture (CDC), be passed rather than entire tables?
Is the context of data lost in translation as data is moved to other environments?
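To make the CDC question above concrete, here is a minimal Python sketch that passes only the changed increment between two snapshots instead of the entire table. Snapshot comparison is just one possible CDC technique and is assumed here for simplicity; the function name and sample rows are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns (upserts, deletes): rows that are new or changed, and keys
    that disappeared since the previous snapshot.
    """
    upserts = {key: row for key, row in current.items()
               if key not in previous or previous[key] != row}
    deletes = sorted(key for key in previous if key not in current)
    return upserts, deletes

# Hypothetical snapshots: primary key -> (name, balance)
previous = {1: ("Ann", 100), 2: ("Bob", 250), 3: ("Cy", 75)}
current = {1: ("Ann", 100), 2: ("Bob", 300), 4: ("Di", 40)}
upserts, deletes = capture_changes(previous, current)
```

Only row 2 (changed), row 4 (new), and key 3 (deleted) need to travel to the warehouse; row 1 is untouched and is not passed.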
-
PROGRAMMER OR DESIGNER CONTROL OF DATA PLACEMENT
Place data at block/page level
Manage data in parallel
Solid metadata control
Rich language interface
-
Parallel Storage and Management of Data
Metadata Management
Data warehouse table structures
Data warehouse table attribution
Data warehouse source data (the system of record)
Mapping from the system of record to the data warehouse
Data model specification
Extract logging
Common routines for access of data
Definitions and/or descriptions of data
Relationships of one unit of data to another
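One way to picture the mapping from the system of record to the data warehouse is as a small catalog of entries. The sketch below is only an illustration in Python; the ColumnMapping fields, the catalog contents, and the system and field names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMapping:
    """One hypothetical metadata entry: where a warehouse column comes from."""
    source_system: str
    source_field: str
    warehouse_table: str
    warehouse_column: str
    description: str

CATALOG = [
    ColumnMapping("billing", "CUST_NO", "customer", "customer_id",
                  "Unique customer number"),
    ColumnMapping("billing", "BAL_AMT", "account_snapshot", "balance",
                  "Month-end balance"),
]

def system_of_record(table, column, catalog=CATALOG):
    """Return the (source_system, source_field) pairs feeding a warehouse column."""
    return [(m.source_system, m.source_field) for m in catalog
            if m.warehouse_table == table and m.warehouse_column == column]
```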
-
Language Interface
Typically, the language interface to the data warehouse should do the following:
Be able to access data a set at a time
Be able to access data a record at a time
Specifically ensure that one or more indexes will be used in the satisfaction of a query
Have an SQL interface
Be able to insert, delete, or update data
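The difference between set-at-a-time and record-at-a-time access can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it is convenient to run; the sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("west", 20), ("east", 5)])

# Set at a time: one SQL statement produces the whole result set.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Record at a time: the program fetches and handles one row per call.
cursor = conn.execute("SELECT region, amount FROM sales ORDER BY rowid")
first_row = cursor.fetchone()
remaining = 0
while cursor.fetchone() is not None:
    remaining += 1
```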
-
EFFICIENT LOADING OF DATA
Load efficiently
Use indexes efficiently
Store data in a compact way
Support compound keys
-
Efficient Index Utilization
Technology can support efficient index access in several ways:
Using bit maps
Having multileveled indexes
Storing all or parts of an index in main memory
Compacting the index entries when the order of the data being indexed allows such compaction
Creating selective indexes and range indexes
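The bit map idea from the list above can be sketched in a few lines of Python. Each distinct value gets an integer whose set bits mark the rows that hold it, so combining conditions becomes a bitwise AND. The column values are hypothetical, and production bit map indexes add compression that is not shown here.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose set bits mark matching row ids."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """List the row ids whose bit is set in the bitmap."""
    return [row_id for row_id in range(bitmap.bit_length())
            if (bitmap >> row_id) & 1]

# Hypothetical columns of a five-row table
regions = ["east", "west", "east", "north", "east"]
statuses = ["open", "open", "closed", "open", "open"]
region_index = build_bitmap_index(regions)
status_index = build_bitmap_index(statuses)

# Rows where region = 'east' AND status = 'open', found without touching the rows
east_and_open = region_index["east"] & status_index["open"]
```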
-
Compaction of Data
Compaction helps manage large amounts of data.
The programmer gets the most out of a given I/O when data is stored compactly.
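A back-of-the-envelope calculation shows why compaction pays off per I/O. The block size and row widths below are assumed values chosen only for illustration.

```python
def rows_per_block(block_bytes: int, row_bytes: int) -> int:
    """Whole rows that fit in one block, and so arrive in one I/O."""
    return block_bytes // row_bytes

def blocks_needed(total_rows: int, rows_in_block: int) -> int:
    """Blocks (I/Os) required to read total_rows, rounding up."""
    return -(-total_rows // rows_in_block)

BLOCK_BYTES = 4096   # assumed block size
LOOSE_ROW = 200      # assumed row width before compaction
COMPACT_ROW = 120    # assumed row width after compaction

loose = rows_per_block(BLOCK_BYTES, LOOSE_ROW)
compact = rows_per_block(BLOCK_BYTES, COMPACT_ROW)
loose_ios = blocks_needed(1_000_000, loose)
compact_ios = blocks_needed(1_000_000, compact)
```

Under these assumptions a block carries 34 compact rows instead of 20, so a scan of one million rows needs 29,412 block reads instead of 50,000.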
-
Compound Keys
Compound keys occur throughout the data warehouse for two reasons:
The time variancy of data warehouse data.
Key-foreign key relationships are quite common in the atomic data.
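A compound key that combines a business key with a date is a common way to express time variancy. The sketch below uses Python's sqlite3 module; the table names, columns, and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE account_snapshot (
        customer_id   INTEGER NOT NULL REFERENCES customer (customer_id),
        snapshot_date TEXT    NOT NULL,
        balance       INTEGER NOT NULL,
        PRIMARY KEY (customer_id, snapshot_date)  -- compound key
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.executemany("INSERT INTO account_snapshot VALUES (?, ?, ?)",
                 [(1, "2019-06-30", 100), (1, "2019-07-31", 140)])

def insert_snapshot(customer_id, snapshot_date, balance):
    """Return True if the row is stored, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO account_snapshot VALUES (?, ?, ?)",
                     (customer_id, snapshot_date, balance))
        return True
    except sqlite3.IntegrityError:
        return False
```

The same customer can appear once per snapshot date, but a second row for the same customer and date is rejected.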
-
VARIABLE-LENGTH DATA
Manage variable-length data efficiently
Lock manager: explicit control at the programmer level
Able to do index-only processing
Restore data in bulk efficiently
-
Lock Management
The lock manager ensures that two or more people are not updating the same record at the same time.
The ability to turn the lock manager off and on is necessary.
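The idea of switching the lock manager off and on can be pictured with a toy in-memory table. This is only a sketch: the TableStore class is hypothetical, it assumes a bulk load runs as the single writer, and a real DBMS lock manager is far more involved.

```python
import threading
from contextlib import nullcontext

class TableStore:
    """Toy in-memory table whose lock manager can be switched off and on."""

    def __init__(self):
        self.rows = {}
        self.locking_enabled = True
        self._lock = threading.Lock()

    def _guard(self):
        # With locking off, hand back a no-op context manager instead of the lock.
        return self._lock if self.locking_enabled else nullcontext()

    def update(self, key, value):
        with self._guard():
            self.rows[key] = value

    def bulk_load(self, items):
        """Load many rows with the lock manager off, then restore its setting."""
        previous = self.locking_enabled
        self.locking_enabled = False
        try:
            for key, value in items:
                self.update(key, value)
        finally:
            self.locking_enabled = previous

store = TableStore()
store.update("a", 1)
store.bulk_load([("b", 2), ("c", 3)])
```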
-
Index-Only Processing
Looking in an index (or indexes) without going to the primary source of data
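SQLite offers one way to observe index-only processing: when every column a query needs is in an index, its query plan reports a covering index and the table rows are not read. The sketch below uses Python's sqlite3 module; the sales table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        region  TEXT,
        amount  INTEGER,
        notes   TEXT
    )
""")
conn.execute("CREATE INDEX idx_region_amount ON sales (region, amount)")
conn.executemany("INSERT INTO sales (region, amount, notes) VALUES (?, ?, ?)",
                 [("east", 10, "a"), ("west", 20, "b"), ("east", 5, "c")])

query = "SELECT region, amount FROM sales WHERE region = ?"

# The last column of each EXPLAIN QUERY PLAN row is a readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("east",)).fetchall()
plan_text = " ".join(row[3] for row in plan)

rows = conn.execute(query, ("east",)).fetchall()
```

Here both selected columns live in idx_region_amount, so the query can be answered from the index alone.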
-
Fast Restore
The capability to quickly restore a data warehouse table from non-DASD storage
-
Other Technological Features
Other features of conventional DBMS technology play only a small role in the data warehouse. Some of those features include the following:
Transaction integrity
High-speed buffering
Row- or page-level locking
Referential integrity
VIEWs of data
Partial block loading
-
DBMS Types and the Data Warehouse
Data warehouses manage massive amounts of data because they contain:
Granular, atomic detail
Historical information
Summary as well as detailed data
Because record-level, transaction-based updates are a regular feature of the general-purpose DBMS, it must offer facilities for:
Locking
COMMITs
Checkpoints
Log tape processing
Deadlock
Backout
-
Changing DBMS Technology
Such a change may be in order for several reasons:
Newer DBMS technologies may be available.
The size of the warehouse has grown.
Use of the warehouse has escalated and changed.
The basic DBMS decision must be revisited from time to time.
Should the decision be made to go to a new DBMS technology, what are the considerations?
Will the new DBMS technology meet the foreseeable requirements?
How will the conversion from the older DBMS technology to the newer DBMS technology be done?
-
Multidimensional DBMS and the
Data Warehouse
-
Multidimensional DBMS and the Data Warehouse cont
The multidimensional DBMS:
1. holds at least an order of magnitude less data
2. is geared for very heavy and unpredictable access and analysis of data
3. holds a much shorter time horizon of data
4. allows unfettered access
The data warehouse:
1. holds massive amounts of data
2. is geared for a limited amount of flexible access
3. contains data with a very lengthy time horizon (from 5 to 10 years)
4. allows analysts to access its data in a constrained fashion
Instead of the data warehouse being housed in a multidimensional DBMS, the two enjoy a complementary relationship.
-
Multidimensional DBMS and the Data Warehouse cont
Following is the relational foundation for multidimensional DBMS data marts:
Strengths:
  Can support a lot of data.
  Can support dynamic joining of data.
  Has proven technology.
  Is capable of supporting general-purpose update processing.
  If there is no known pattern of usage of data, then the relational structure is as good as any other.
Weaknesses:
  Has performance that is less than optimal.
  Cannot be purely optimized for access
-
Multidimensional DBMS and the Data Warehouse cont
Following is the cube foundation for multidimensional DBMS data marts:
Strengths:
  Performance that is optimal for DSS processing.
  Can be optimized for very fast access of data.
  If pattern of access of data is known, then the structure of data can be optimized.
  Can easily be sliced and diced.
  Can be examined in many ways.
Weaknesses:
  Cannot handle nearly as much data as a standard relational format.
  Does not support general-purpose update processing.
  May take a long time to load.
  If access is desired on a path not supported by the
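The "sliced and diced" strength of the cube foundation can be sketched as a pre-aggregated dictionary keyed by dimension values. The Python below is a toy illustration: the three dimensions, the function names, and the facts are hypothetical, and real multidimensional DBMSs use far more elaborate storage.

```python
from collections import defaultdict

def build_cube(facts):
    """Pre-aggregate (product, region, month, amount) facts into cube cells."""
    cube = defaultdict(int)
    for product, region, month, amount in facts:
        cube[(product, region, month)] += amount
    return dict(cube)

def slice_cube(cube, product=None, region=None, month=None):
    """Sum the cells matching the fixed dimensions (None means 'all')."""
    wanted = (product, region, month)
    return sum(value for key, value in cube.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))

# Hypothetical sales facts
facts = [
    ("tea", "east", "Jan", 10),
    ("tea", "west", "Jan", 7),
    ("coffee", "east", "Feb", 4),
    ("tea", "east", "Feb", 3),
]
cube = build_cube(facts)
tea_total = slice_cube(cube, product="tea")
tea_east = slice_cube(cube, product="tea", region="east")
```

Any question along the three built-in dimensions is a quick lookup, while a question about a dimension that was never loaded (say, salesperson) cannot be answered from this structure at all.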
-
Multidimensional DBMS and the
Data Warehouse cont
-
Multidimensional DBMS and the
Data Warehouse cont
-
MULTIDIMENSIONAL DBMS AND THE DATA WAREHOUSE CONT
-
Data Warehousing across Multiple Storage Media
A large amount of data is spread across more than one storage medium.
One processing environment is the DASD environment where online, interactive processing is done.
The other processing environment is often a tape or mass store environment
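One simple way to decide which medium holds a record is by its age. The slide does not say how data is split between the two environments, so the age-based policy, the function name, and the two-year cutoff in this Python sketch are all assumptions made for illustration.

```python
from datetime import date

def choose_medium(record_date: date, today: date, online_days: int = 730) -> str:
    """Assumed policy: recent data stays on DASD, older data goes to tape."""
    age_days = (today - record_date).days
    return "DASD" if age_days <= online_days else "tape"

recent = choose_medium(date(2019, 1, 1), date(2019, 7, 29))
old = choose_medium(date(2015, 1, 1), date(2019, 7, 29))
```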
-
The Role of Metadata in the Data
Warehouse Environment
-
The Role of Metadata in the Data
Warehouse Environment
-
The Role of Metadata in the Data
Warehouse Environment
-
Context and Content
The context of the reports explains their contents.
-
Three Types of Contextual Information
Three levels of contextual information must be managed:
Simple contextual information
Complex contextual information
External contextual information
Simple contextual information relates to the basic structure of data itself, and includes such things as these:
The structure of data
The encoding of data
The naming conventions used for data
The metrics describing the data, such as:
  How much data there is
  How fast the data is growing
  What sectors of the data are growing
-
Three Types of Contextual Information cont
Complex contextual information addresses such aspects of data as these:
Product definitions
Marketing territories
Pricing
Packaging
Organization structure
Distribution
-
Three Types of Contextual Information cont
Some examples of external contextual information include the following:
Economic forecasts:
  Inflation
  Financial trends
  Taxation
  Economic growth
Political information
Competitive information
Technological advancements
Consumer demographic movements
-
Capturing and Managing Contextual Information
Complex and external contextual types of information are hard to capture and quantify because they are so unstructured.
-
Looking at the Past
Past attempts at managing contextual information had shortcomings, some of which are as follows:
The information management attempts were aimed at the information systems developer, not the end user.
Attempts at contextual management were passive.
Attempts at contextual information management were in many cases
-
Refreshing the Data Warehouse
Reading a log tape is no small matter, however. Many obstacles are in the way, including the following:
The log tape contains much extraneous data.
The log tape format is often arcane.
The log tape contains spanned records.
The log tape often contains addresses instead of data values.
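The first obstacle above, extraneous data, can be pictured with a small filter. Real log tape formats are vendor-specific and far less tidy, so the record layout in this Python sketch (dictionaries with type, table, key, and value fields) is purely hypothetical.

```python
DATA_CHANGES = {"insert", "update", "delete"}

def extract_changes(log_records, tables_of_interest):
    """Keep only data-change records for the tables the warehouse loads."""
    changes = []
    for record in log_records:
        if record.get("type") not in DATA_CHANGES:
            continue  # skip checkpoints and other bookkeeping records
        if record.get("table") not in tables_of_interest:
            continue  # skip changes to tables the warehouse does not load
        changes.append((record["type"], record["table"], record["key"]))
    return changes

# Hypothetical, already-decoded log records
log = [
    {"type": "checkpoint"},
    {"type": "update", "table": "account", "key": 7, "value": 120},
    {"type": "insert", "table": "audit_trail", "key": 1, "value": "x"},
    {"type": "delete", "table": "account", "key": 9},
]
changes = extract_changes(log, {"account"})
```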
-
Testing
It is very unusual to find a separate test environment, like the one common in operational systems, in the world of the data warehouse, for the following reasons:
Data warehouses are so large that a corporation has a hard time justifying one of them, much less two of them.
The nature of the development life cycle for the data warehouse is iterative.
For the most part, programs are run in
-
Summary
Some technological features are required:
Robust language interface
Compound keys
Variable-length data
The abilities to do the following:
  Manage large amounts of data
  Manage data on diverse media
  Easily index and monitor data
  Interface with a wide number of technologies
  Allow the programmer to place the data directly on the physical device
  Store and access data in parallel
  Have metadata control of the warehouse
  Efficiently load the warehouse
  Efficiently use indexes
  Store data in a compact way
  Support compound keys
  Selectively turn off the lock manager
  Do index-only processing
  Quickly restore from bulk storage
-
Summary cont
The data architect must recognize the differences between a transaction-based DBMS and a data warehouse-based DBMS.
-
Summary cont
Multidimensional OLAP technology is suited for data mart processing and not data warehouse processing.
When the data mart approach is used, many problems become evident:
The number of extract programs grows large.
Each new multidimensional database must return to the legacy operational environment for its own data.
There is no basis for reconciliation of differences in analysis.
A tremendous amount of redundant data among different multidimensional DBMS environments exists.
-
Summary cont
Metadata in the data warehouse environment plays a very different role than metadata in the operational legacy environment.