data modelling 242
TRANSCRIPT
-
7/25/2019 Data Modelling 242
1/247
Data Modeling
Data Warehouse Defined
A data warehouse is a collection of corporate information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations.

Characteristics of a DW
• Subject-oriented data
  - collects all data for a subject, from different sources
• Read-only requests
  - loaded during off-hours, read-only during day hours
• Interactive features, ad-hoc query
  - flexible design to handle spontaneous user queries
• Pre-aggregated data
  - to improve runtime performance
• Highly denormalized data structures
  - fat tables with redundant columns

Components of a Data Warehouse
[Diagram labels: Source Systems; Data Staging Area (Storage: flat files, RDBMS; Processing; No User Query Services); DWH Servers (Data Mart 1, Data Mart 2; Dimensional; Conforms to DW Bus); End User Data Access (Query Tools, Report Writers, Mining Tools)]

STAGING AREA - SOME CLARITY
• Staging Area
  - optional
  - to cleanse the source data
  - accepts data from different sources
  - a data model is required at the staging area
  - multiple data models may be required for parking different sources and for transformed data to be pushed out to the warehouse

ODS - SOME CLARITY
• Operational Data Store
  - optional
  - granular, detailed level data
  - may feed the warehouse (e.g. when the warehouse is aggregated)
  - usually a relational model
  - may keep data for a smaller time period than the warehouse

Data Modeling
WHAT IS A DATA MODEL?
A data model is an abstraction of some aspect of the real world (system).
WHY A DATA MODEL?
  - Helps to visualise the business
  - A model is a means of communication.
  - Models help elicit and document requirements.
  - Models reduce the cost of change.
  - The model is the essence of the DW architecture, based on which the DW will be implemented

Impact of Data Analysis Techniques on DM
• Query and reporting
  - Normalized data model
  - Select associated data elements
  - summarize and group by category
  - present results
  - direct table scan
  - ER with normalized / denormalized appropriate

Query and reporting

Requirements of a Decision Support Query Environment
• To provide a method for testing hypotheses (e.g. what if ...)
• To allow ad-hoc queries
• To allow human input (DSS makes decisions with users)
• Expects user knowledge of the problem
• To simulate the behaviour of a real-world problem

Impact of Data Analysis Techniques on DM
• Multidimensional analysis
  - Fast and easy access to data
  - Any number of analysis dimensions in any combination
  - ER will mean many joins
  - Dimensional model appropriate

Multidimensional Analysis

Data Mining
• Data mining
  - discovers unusual patterns
  - requires low level of detail data

A look at different warehouse architectures
[Diagram labels: Operational Data; External Data; Load Manager; Warehouse Manager; Query Manager; Detailed Information; Summary Information; Meta Data; OLAP]

Data Warehouse Architecture - 2

Data Warehouse Architecture - 3

Data Warehouse Architecture - 4

DW Architectures
• Architecture

DW Architectures
• Architecture

Global Architecture

DW Architectures
• Independent Architecture
  - stand-alone
  - controlled by a department
  - minimal integration
  - no global view
  - very fast to implement

DW Architectures
• Interconnected Architecture
  - distributed
  - integrated and interconnected
  - gives a global view of the enterprise
  - more complexity
    • who manages / controls data
    • another tier in the architecture to share common data between multiple data marts
    • have a data sharing schema across data marts

Independent and Interconnected Architecture

Types of Data Warehouse
• Enterprise Data Warehouse
• Data Mart
[Diagram labels: Enterprise Data Warehouse; Datamart, Datamart, Datamart]

Enterprise Data Warehouse

Data Mart
• Logical subset of the enterprise data warehouse
• Organized around a single business process
• Based on granular data
• May or may not contain aggregates
• Object of analytical processing by the end user.
• Less expensive and much smaller than a full-blown corporate data warehouse.

Distributed and Centralized Data Warehouses
• DW sitting on a monolithic machine - unrealistic
• Separate machines, different OS, different DB systems - reality
Solution
• Share a uniform architecture to allow them to be fused coherently

Physical data warehouse: Data warehouse --> data marts
[Diagram labels: Source Data (External Data, Operational Data); Staging Area; Data Warehouse; Data Marts]

Physical data warehouse: Data marts --> data warehouse
[Diagram labels: Source Data (External Data, Operational Data); Staging Area; Data Marts; Data Warehouse]

Physical data warehouse: Parallel data warehouse and data marts
[Diagram labels: Source Data (External Data, Operational Data); Staging Area; Data Warehouse; Data Marts]

DW Implementation Approaches
• Top Down
• Bottom-up

Top Down Implementation

Bottom Up Implementation

DW Implementation Approaches
Top Down
• More planning and design initially
• Involve people from different work-groups, departments
• Data marts may be built later from the Global DW
• Overall data model to be decided up-front
Bottom Up

DW Implementation Approaches
Combined Approach
• Determine the degree of planning and design for a global approach to integrate data marts being built by the bottom-up approach
• Develop a base level infrastructure definition for the global DW at business level
• Develop a plan to handle data elements needed by multiple data marts
• Build a common data store to be used by data marts and the global DW

Levels of modeling

Sample conceptual model
[Diagram: Sample Conceptual Model. Entities: Products, Customer Invoices, Customer Addresses, Customers, Geographic Boundaries, Sales Reps]

Logical Model
• Replaces many-to-many relationships with associative entities.
• Defines a full population of entity attributes.
• May use non-physical entities for domains and sub-types.
• Establishes entity identifiers.
• Has no specifics for any RDBMS or configuration.

Sample logical model
[Diagram: Sample Logical Model]
Entities and attributes:
  PRODUCT: PRODUCT CODE, PRODUCT DESCRIPTION
  CUSTOMER: CUSTOMER ID, SNAPSHOT DATE, CUSTOMER NAME
  CUSTOMER INVOICE: INVOICE ID, LINE ITEM SEQ, INVOICE DATE
  SALES REP: SALES REP ID
  CUSTOMER ADDRESS: CUSTOMER ID, ADDRESS ID
  GEOGRAPHIC BOUNDARY: GEO CODE
Relationship labels: the bill for; purchased by; the bill sent to; purchased at; the general location of; located within; for the customer; sold to by; the salesman for; the sales manager for; managed by; sold by

Physical Model
• A physical data model may include
  - Referential integrity
  - Indexes
  - Views
  - Alternate keys and other constraints
  - Tablespaces and physical storage objects.

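The physical-model elements listed above can be sketched with SQLite from the Python standard library. This is only an illustration under stated assumptions: the two tables are a cut-down, hypothetical version of the sample model, the index and view names are invented, and SQLite has no tablespaces, so that element is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE products (
    product_code        TEXT PRIMARY KEY,
    product_description TEXT NOT NULL UNIQUE          -- alternate key
);
CREATE TABLE customer_invoices (
    invoice_id    INTEGER,
    line_item_seq INTEGER,
    product_code  TEXT NOT NULL
                  REFERENCES products(product_code),  -- referential integrity
    quantity      INTEGER NOT NULL CHECK (quantity > 0),
    unit_price    REAL NOT NULL,
    PRIMARY KEY (invoice_id, line_item_seq)
);
CREATE INDEX idx_invoice_product ON customer_invoices(product_code);
CREATE VIEW invoice_amounts AS
    SELECT invoice_id, SUM(quantity * unit_price) AS amount
    FROM customer_invoices
    GROUP BY invoice_id;
""")

conn.execute("INSERT INTO products VALUES ('P1', 'Widget')")
conn.executemany(
    "INSERT INTO customer_invoices VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "P1", 2, 5.0), (1, 2, "P1", 1, 5.0)],
)
invoice_total = conn.execute(
    "SELECT amount FROM invoice_amounts WHERE invoice_id = 1"
).fetchone()[0]

# A row that points at a product that does not exist is rejected.
try:
    conn.execute("INSERT INTO customer_invoices VALUES (2, 1, 'NOPE', 1, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

None of these objects exist at the logical level; they are decisions made for one particular database engine.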
[Diagram: Sample Physical Model]
PRODUCTS: PRODUCT_CODE, PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
CUSTOMER_INVOICES: INVOICE_ID, LINE_ITEM_SEQ, INVOICE_DATE, CUSTOMER_ID, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, o PRODUCT_COST, LOAD_DATE
CUSTOMERS: CUSTOMER_ID, SNAPSHOT_DATE, CUSTOMER_NAME, o AGE, o MARITAL_STATUS, CREDIT_RATING
CUSTOMER_ADDRESSES: CUSTOMER_ID, ADDRESS_ID, ADDRESS_LINE1, o ADDRESS_LINE2, o POSTAL_CODE, SALES_REP_ID, GEO_CODE, LOAD_DATE
GEOGRAPHIC_BOUNDARIES: GEO_CODE, CITY_NAME, STATE_NAME, COUNTRY_NAME, o CITY_ABBRV, o STATE_ABBRV, o COUNTRY_ABBRV
SALES_REPS: SALES_REP_ID, LAST_NAME, FIRST_NAME, o MANAGER_FIRST_NAME, o MANAGER_LAST_NAME

Data Architecting - Real time data
• Represents the current status of the business
• Used by operational systems to run the business

Data Architecting - Real time data
To use real time data in DW:
Must be

Data Architecting - Derived data
• Data created by summarizing, aggregating, averaging real-time data through some process
• Represents a view of business data at a specific time
• Historical record of the business over a period
• Precalculate derived data elements and summarize detailed data to improve query processing

Data Architecting - Reconciled data
• Real-time data cleansed, adjusted, enhanced to provide an integrated source of data for analysis

Enterprise Data Model (EDM)

EDM - The Phased Enterprise Data Model

Enterprise Data Model (EDM)
Phases
Increasing order of information required
• Information Planning
• Business Analyzing
• Logical Data Modeling
• Physical Data Design

Enterprise Data Model (EDM)
Information Planning
• called subject areas / super entity / business entity, in which the organization is interested, e.g. customer, product
Purpose
  - To set up scope and architecture of DW
  - To provide a single comprehensive point of view

Enterprise Data Model (EDM)
Business Analyzing
• Define contents of primary business concepts.
• Gather and arrange business requirements
• Defines business terms
Purpose
  - To set up scope and architecture of DW
  - To provide a single comprehensive point of view

Enterprise Data Model (EDM)
Logical Data Modeling
• Enterprise-wide in scope
• consists of several entities, relationships, attributes
• complete model in 3rd Normal Form.

Enterprise Data Model (EDM)
Physical Data Design
• space
• performance
• physical distribution of data
Purpose:
To design for the physical implementation

Enterprise Data Model (EDM)
Is it possible to draw an EDM?
Not always
Phased approach OR a simple EDM
• list of subject areas (around 25)
• define business relationships between subject areas
• define contents of each subject area

Granularity
Level of summarization of data elements
Level of detail available in the data
More the detail, lower the granularity
Why is it important in DW?
Opportunity for TRADE-OFF
  - performance vs. volume of data stored
  - ability to access detailed data vs. cost of storage

Granularity

Granularity
To overcome trade-offs between data volume and query capability:
Divide the data in the DW

Data Partitioning Model
WHY?
To understand, maintain and navigate a DW
TYPES of Partitioning
• Logical and Physical

Data Partitioning Model
Logical Partitioning - WHY?
Goals:

Data Partitioning Model - Logical Partitioning
• Partition large volumes of data by splitting
• Helps to make data easier to:
  • Restructure
  • Index
  • Sequentially scan
  • Reorganize
  • Recover
  • Monitor

Data Partitioning Model - Logical Partitioning
Logical Partition - HOW?
Criteria
• Time period (date, month, or quarter)
  - almost always chosen
• Geography (location)
• Product (more generically, by line of business)
• Organizational unit
• A combination of the above

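As a rough illustration of the time-period criterion, the sketch below assigns rows to partitions by month or quarter. The function names, the `sale_date` field and the sample rows are all hypothetical; a real warehouse would normally rely on the partitioning features of its database rather than application code.

```python
from collections import defaultdict
from datetime import date


def partition_key(d: date, grain: str = "month") -> str:
    """Return the name of the time partition a date falls into."""
    if grain == "month":
        return f"{d.year}-{d.month:02d}"
    if grain == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    raise ValueError(f"unsupported grain: {grain}")


def partition_rows(rows, grain="month"):
    """Split rows into one list per time partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_key(row["sale_date"], grain)].append(row)
    return dict(parts)


sales = [
    {"sale_date": date(2019, 1, 15), "amount": 100},
    {"sale_date": date(2019, 2, 3), "amount": 40},
    {"sale_date": date(2019, 4, 20), "amount": 75},
]
by_quarter = partition_rows(sales, "quarter")
```

Each partition can then be indexed, reorganized or recovered on its own, which is the benefit the slides list.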
Data Partitioning Model - Logical Partitioning

Data Partitioning Model - Subject Areas
Subject areas are classified by the topics of interest to the business.
• 5W1H rule
  - when, where, who, what, why, and how
  e.g. "who" could be customer, employee, manager, supplier, business partner, competitor.
• Get a candidate list of subject areas
• Decompose, rearrange, select, redefine in more detail

Data Partitioning Model - Subject Areas
• Define the business relationships among subject areas
• This will determine the dimensions used
• Subject areas help define criteria like:
  • Unit of the data model
  • Unit of an implementation project
  • Unit of management of the data
  • Basis for the integration of multiple implementations
Unit for analysis should be the business process

Data Modeling - Techniques
What needs to be modeled during a data warehouse project
• STAGING AREA
  - YES (maybe multiple data models are required)
• ODS
  - YES
• DATA WAREHOUSE / DATA MART
  - YES

Data Modeling - Techniques
• Modeling techniques
  - E-R Modeling
  - Dimensional Modeling

Implementation and modeling styles
• Modeling versus implementation
  - Modeling: describe what should be built to non-technical folks
  - Implementation: describe what is actually built to technical folks

• Relational modeling
  - Use for implementation
  - Difficult to understand by non-technical folks
• Dimensional modeling
  - Use for modeling during analysis and design phases

E-R Modeling
• Produces a data model, using two basic concepts: entities and the relationships between those entities.
• Detailed ER models also contain attributes, which can be properties of either the entities or the relationships.

Conventions used in E-R modeling
• Entities
• Attributes
• Relationships or Associations
[Diagram labels: entity EMPLOYEE; attributes EmpName, Address; relationship "Belongs To"]

Entities
• Principal data objects about which information is to be collected.
• Usually recognizable concepts such as person, things, or events.
• Examples: EMPLOYEES, PROJECTS

Attributes / Relationships
• Attributes describe the entity with which they are associated.
• A relationship represents an association between two or more entities. Examples:
  - Employees are assigned to projects.
  - Departments manage one or more projects.

Types of Data Relationships - Cardinality
• One - One     1 : 1
• One - Many    1 : m
• Many - Many   m : n
• Recursive data relationship

Normalization
• Remove data redundancy
• 0 NF - contains repeating values
• 1 NF - No repeating values
• 2 NF - Every non-key attribute is dependent on the key, and on the whole key (no partial dependencies)
• 3 NF - No non-key attribute is functionally dependent on another non-key attribute
• Denormalization - carefully introduced redundancy to improve query performance

Normalization - 1NF
• Eliminate repeating groups
  Person   Skills
  A        Oracle, DB2
  B        MS Access, Oracle
  C        Oracle, ...

• Eliminate redundant data
  Skill ID   Skill Description
  S1         DB2
  S2         Oracle
  S3         MS Access
  S4         ...

• Eliminate columns not dependent on key
  Memb ID   Skill ID   Comp ID   Comp Name   Location
  A         S1         D1        ...

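The first two normalization steps can be sketched in plain Python. The rows below are sample data in the spirit of the person/skills tables, and the generated skill IDs are illustrative; they are not meant to match the IDs on the slides.

```python
# Unnormalized rows: one person, with a repeating group of skills in one column.
unnormalized = [
    ("A", "Oracle, DB2"),
    ("B", "MS Access, Oracle"),
]

# Step 1 (eliminate repeating groups): one row per (person, skill).
person_skill_names = [
    (person, skill.strip())
    for person, skills in unnormalized
    for skill in skills.split(",")
]

# Step 2 (eliminate redundant data): store each skill description once
# and refer to it through a generated skill ID.
skill_ids = {}
for _, name in person_skill_names:
    skill_ids.setdefault(name, f"S{len(skill_ids) + 1}")

skills = {sid: name for name, sid in skill_ids.items()}
person_skills = [(person, skill_ids[name]) for person, name in person_skill_names]
```

After step 2 the text "Oracle" is stored once in `skills`, and `person_skills` holds only keys.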
Relational modeling
• Represents business entities, data items associated with each entity, and the relationships of business interest among the entities
• Entities are usually broken down into the smallest possible units and combined using relationships
• Diagram looks like a spiderweb

Entity Completeness Checklist
• Name
  - to describe the data contained
  - to meet naming conventions/standards
• Description
  - to describe precisely what the entity represents
  - required for sharing and reuse of data model components
• Category
  - classifies entities sharing common characteristics

Entity Completeness Checklist (cont.)
• Abbreviations
  - document the abbreviation and full definition
• Acronyms
  - avoid (not understood by all, not unique)
  - if used, document them
• Current number of occurrences
  - to estimate entity statistics for all entity categories

Entity Completeness Checklist (cont.)
• Authority
  - Metadata authority (to approve change of entities, attributes etc.)
  - Data authority (to change occurrences of entity)
• Primary key / foreign key / non-key attribute names
• Relationships to other entities
  - no entity stands by itself

Homonyms
Same or similar in sound or spelling as another, BUT DIFFERENT IN MEANING

Synonyms
Same meaning ...
Same logical concept ...
Assigned different names
Introduce redundancy in the model
IDENTIFY AND RESOLVE them - for entities and attributes

Synonyms (cont.)

Attribute Completeness Checklist
• Name
  - to uniquely identify the attribute
  - to meet naming conventions/standards
• Description
  - to describe precisely what the attribute represents
• Type
  - refers to how the attribute is used in the data model

Attribute Completeness Checklist (cont.)
• Key attributes
  - primary keys in the entity in which they are defined
  - primary / foreign keys in other entities in which they occur
  - implemented with a unique index
• Non-key attributes
  - contain the bulk of the information
  - need not be unique
  - candidate keys not selected as primary keys
  - secondary keys may be selected as access paths
  - implemented using a non-unique index

Attribute Completeness Checklist (cont.)
• Domain
  set of permitted values for the attribute
  Domain elements
  - General domain
    • describes the manner in which data is represented (data type)
    • alphanumeric, real, integer, boolean, sound, digital video etc.
  - Specific domain
    • Enumerated domain
    • specific set of values that are valid and allowed
    • static values (e.g. flat type: 2 bed, 3 bed, duplex etc.)

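The difference between a general domain and a specific (enumerated) domain can be shown with a short sketch. The `FlatType` enum and `validate_flat_type` function are hypothetical names that reuse the flat-type example from the slide.

```python
from enum import Enum


class FlatType(Enum):
    """Specific domain: the enumerated set of permitted values."""
    TWO_BED = "2 bed"
    THREE_BED = "3 bed"
    DUPLEX = "duplex"


def validate_flat_type(value: str) -> FlatType:
    # General domain: the value must have the right data type.
    if not isinstance(value, str):
        raise TypeError("flat type must be a string")
    # Specific domain: the value must be one of the enumerated members.
    try:
        return FlatType(value)
    except ValueError:
        raise ValueError(f"{value!r} is not a permitted flat type") from None
```

In a database the same rule would typically become a column type plus a check constraint or a lookup table.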
Attribute Completeness Checklist (cont.)
• Abbreviations
  - document the abbreviation and full definition
• Acronyms
  - avoid (not understood by all, not unique)
  - if used, document them
• Key use
  - applies only to primary keys
  - will serve as primary or foreign key in child entity
• Source
  - whether the attribute is primitive or derived
  - if derived, establish the formula
  - document the formula
  - the formula should identify any other attributes required to generate the value of the derived attribute
• Traceability
  - why is the attribute there
  - refer to source (paragraph, citation of statement, physical data structure element ...)
  - mapped to a metadata object that is maintained as part of the system lifecycle (e.g. ...

Library Branch

Attribute Metadata

Should the data model contain derived attributes?
YES
  - represent information that management actually wants
  - users have an opportunity to specify business rules
  - provide an opportunity to validate that all necessary base data is captured
  - design is made easier as requirements are already mapped
In DSS environment - ESSENTIAL
NEVER use derived attributes as PRIMARY keys

Derived attributes - An example
[Diagram labels: ORDER PRODU...; Order # (PK); order date]

Attribute Names

• Unique name representing its business meaning
• clear, concise, self-explanatory
• minimize use of special characters
• length > 50 gives flexibility
  - limitations of 32, 33 exist in some ...

• SHOULD NOT ...

• Builds on and is consistent with the attribute name
• unambiguous, clear, economically worded
• stand alone (not dependent on another attribute definition to convey meaning; BEWARE of circular attribute definitions)
• Never MISS giving a description
AVOID:
  - restating the name of the attribute and/or its characteristics (e.g. length, data type, domain values)
  - using technical jargon
  - limiting the description to a direct extract from a dictionary

Some attribute descriptions
Need improvement
• Location name - the name of a location
• order line total quantity - a six-digit integer total
• directional indicator - E, W, S, N, NE
Pretty Good
• Safety level quantity - The calculated minimum quantity of a product SKU that must be on hand to reduce risk of out-of-stock conditions
• operating quantity - The calculated, demand-driven quantity of a material item that must be maintained and replenished for use in day-to-day operations

Primary Key Attributes
• Stable (not to change in value, cannot be null)
• Minimal (in number of attributes; large composite keys not advisable)
• Factless (should not contain intelligent groupings of data)
• Definitive (value always exists for every occurrence)

Surrogate Keys
Use an artificial key / surrogate key / pseudo-key / system-generated key to ensure uniqueness when:
  - no attribute possesses all PK characteristics
  - candidate keys are large and complex
• ALWAYS USE IN DW Data Model

Relationships - Checklist
• Name & Description - Optional
• Type (identifying / non-identifying)

Limitations of E-R Modeling
• Poor performance
• Tend to be very complex and difficult to navigate.

Dimensional Modeling
• Dimensional modeling uses three basic concepts: measures, facts, dimensions.
• Is powerful in representing the requirements of the business user in the context of database tables.
• Focuses on numeric data, such as values, counts, weights, balances and occurrences.

Dimensional modeling
• Must identify
  - Business process to be supported
  - Grain (level of detail)
  - Dimensions
  - Facts

Dimensional modeling
• Facts
• Measures (Variables)
• Dimensions
  - Dimension members
  - Dimension hierarchies

Facts
• A fact is a collection of related data items, consisting of measures and context data.
• Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process.
• Facts are measured, continuously valued, rapidly changing information.

Fact Table
• A table that is used to store business information (measures) that can be used in mathematical equations.
  - Quantities
  - Percentages
  - Prices

Dimensions
• A dimension is a collection of members or units of the same type of views.
• Dimensions determine the contextual background for the facts.
• Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how

Dimension Table
• Table used to store qualitative data about fact records
  - Who
  - What
  - When
  - Where
  - Why

Dimension data should be
• verbose, descriptive
• complete
• no misspellings, impossible values
• indexed
• equally available
• documented (metadata to explain origin, interpretation of each attribute)

Dimensional model
• visualise a dimensional model as a cube (with any number of dimensions)
• Operations for OLAP
  Drill Down: higher level of detail
  Roll Up: summarized level of data
  (The navigation path is determined by hierarchies within dimensions.)
  Slice: cuts through the cube. Users can focus on specific perspectives
  Dice: rotates the cube to another perspective (change the dimension)

Drill down and Roll up

Slice and Dice

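A toy version of these operations, using a list of tuples as the "cube", may make the definitions concrete. The fact rows, the `DIMS` mapping and the helper names are invented for illustration, and "dice" follows the definition used on the slide (viewing the data by a different dimension).

```python
from collections import defaultdict

# Each fact: (year, month, region, product, sales)
facts = [
    (2019, 1, "East", "Widget", 10),
    (2019, 1, "West", "Widget", 20),
    (2019, 2, "East", "Gadget", 5),
    (2019, 2, "West", "Widget", 15),
]
DIMS = {"year": 0, "month": 1, "region": 2, "product": 3}


def aggregate(rows, dims):
    """Sum sales grouped by the named dimensions."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[DIMS[d]] for d in dims)] += row[4]
    return dict(out)


def slice_(rows, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [r for r in rows if r[DIMS[dim]] == value]


roll_up = aggregate(facts, ["year"])              # summarized level
drill_down = aggregate(facts, ["year", "month"])  # more detail along the time hierarchy
east_only = aggregate(slice_(facts, "region", "East"), ["product"])
diced = aggregate(facts, ["region"])              # same data, another perspective
```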
Dimensions

Hierarchies
• Allow for the "rollup" of data to more summarized levels.
  - Time
    • day
    • month
    • quarter
    • year

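The time hierarchy can be sketched as a function that maps a day to each of its ancestor levels, plus a roll-up that sums along the chosen level. The names `time_levels` and `roll_up` and the sample figures are hypothetical.

```python
from collections import Counter
from datetime import date


def time_levels(d: date) -> dict:
    """Return the member of each hierarchy level that a day belongs to."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


def roll_up(daily_sales: dict, level: str) -> dict:
    """Summarize daily values at a higher level of the time hierarchy."""
    totals = Counter()
    for day, amount in daily_sales.items():
        totals[time_levels(day)[level]] += amount
    return dict(totals)


daily = {date(2019, 1, 30): 10, date(2019, 1, 31): 5, date(2019, 4, 1): 7}
monthly = roll_up(daily, "month")
quarterly = roll_up(daily, "quarter")
yearly = roll_up(daily, "year")
```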
Aggregates
• Aggregate tables are pre-stored summarized tables, created at a higher level of granularity across any or all of the dimensions.
• If the existing granularity is day-wise sales, then creating a separate month-wise sales table is an example of an aggregate table.

Aggregates
• The use of such aggregates is the single most effective tool the data warehouse designer has to improve query performance.
• Usage of aggregates can increase the performance of queries by several times.

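The day-wise to month-wise example can be sketched with SQLite. The table and column names (`daily_sales`, `monthly_sales`, `sale_month`) and the rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, product_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2019-07-01", "P1", 10.0),
        ("2019-07-02", "P1", 5.0),
        ("2019-07-02", "P2", 2.5),
        ("2019-08-01", "P1", 4.0),
    ],
)

# Build the aggregate once, at month grain, so that monthly queries
# read the small pre-summarized table instead of rescanning daily rows.
conn.execute("""
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_code,
           SUM(amount) AS amount
    FROM daily_sales
    GROUP BY sale_month, product_code
""")

monthly = conn.execute(
    "SELECT sale_month, product_code, amount "
    "FROM monthly_sales ORDER BY sale_month, product_code"
).fetchall()
```

The aggregate has to be rebuilt or incrementally maintained whenever new daily rows are loaded, which is the cost paid for the faster queries.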
Measures
• A measure is a numeric attribute of a fact, representing the performance or behaviour of the business relative to dimensions.
• The actual numbers are called variables, e.g. sales in money, sales volume, quantity supplied, supply cost, transaction amount
• A measure is determined by combinations of the members of the dimensions and is located on facts.

Types of Facts
• Additive
  - Able to add the facts along all the dimensions
  - Discrete numerical measures, e.g. retail sales in $
• Semi-additive
  - Snapshot, taken at a point in time
  - Measures of intensity
  - Not additive along the time dimension, e.g. account balance, inventory balance
  - Added and divided by the number of time periods to get a time-average

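A small worked example of a semi-additive measure, using made-up month-end account balances: summing across accounts is meaningful, while across time the snapshots are averaged instead of summed.

```python
# Month-end balances (a snapshot measure) for two hypothetical accounts.
balances = {
    "acct-1": {"2019-01": 100.0, "2019-02": 200.0, "2019-03": 300.0},
    "acct-2": {"2019-01": 50.0, "2019-02": 50.0, "2019-03": 50.0},
}


def total_across_accounts(period: str) -> float:
    """Adding across the account dimension is meaningful."""
    return sum(by_period[period] for by_period in balances.values())


def time_average(account: str) -> float:
    """Across time, add the snapshots and divide by the number of periods."""
    by_period = balances[account]
    return sum(by_period.values()) / len(by_period)


march_total = total_across_accounts("2019-03")
acct1_average = time_average("acct-1")
```

Summing the three balances of acct-1 would give 600, a figure that the account never held; the time-average of 200 is the usable number.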
Types of Facts
• Non-additive
  - Numeric measures that cannot be added across any dimensions
  - Intensity measure averaged across all dimensions, e.g. room temperature
  - Textual facts - AVOID THEM
Advantages of Dimensional Modeling
• Allows a complex multi-dimensional data structure to be defined with a very simple data model.
• Reduces the number of physical joins the query has to process
• Simplifies the view of the data model.
• Allows the DWH to expand and evolve with relatively low maintenance.

Sample business process versus dimension table
[Matrix: rows Product Sales, Product Manufacturing, Employee Compensation; columns Products, Customers, Location, Sales Rep, Date; cell marks not preserved]

Sample measure versus dimension table
[Matrix: rows Product Sales ($), Product Manufacturing (units), Sales Commission ($), Payroll (gross) ($); columns Products, Customers, Location, Sales Rep, Date; cell marks not preserved]

Sample Logical Model for Dimensional Data Mart
  PRODUCT: Product description, Category code, Category description
  TIME PERIOD: Invoice date, Fiscal year, Quarter, Month, Week
  SALES REP: Last name, First name
  ADDRESS: Address line 1, Address line 2, City name, State abbreviation, Postal code, Country name
  CUSTOMER REP SALES: Customer snapshot date, Invoice date, Gross sales, Quantity, Product cost
  CUSTOMERS: Customer name
  CUSTOMER DEMOGRAPHICS: Snapshot date, Credit rating, Marital status, Age

Sample Physical Model for Data Warehouse
  PRODUCTS: # PRODUCT_CODE, * PRODUCT_DESCRIPTION, * CATEGORY_CODE, * CATEGORY_DESCRIPTION
  PRODUCT_SNAPSHOTS: # PRODUCT_CODE, # SNAPSHOT_DATE, * MSRP, * UOM, * PRIMARY_SUPPLIER_NAME, * SUPPLIER_CITY_NAME, * SUPPLIER_STATE_ABBRV, * SUPPLIER_COUNTRY_NAME
  SALES_REPS: # SALES_REP_ID, * LAST_NAME, * FIRST_NAME, o MANAGER_FIRST_NAME, o MANAGER_LAST_NAME
  CUSTOMER_INVOICES: # INVOICE_ID, # LINE_ITEM_SEQ, * INVOICE_DATE, * CUSTOMER_DATE, * BILL_TO_ADDRESS_ID, * SALES_REP_ID, * MANAGER_REP_ID, * ORGANIZATION_ID, * ORG_ADDRESS_ID, * PRODUCT_CODE, * QUANTITY, * UNIT_PRICE, * AMOUNT, o PRODUCT_COST, * LOAD_DATE
  PURCHASE_INVOICES: # INVOICE_ID, # LINE_ITEM_SEQ, * INVOICE_DATE, * SUPPLIER_ID, * ADDRESS_ID, * BUDGET_ID, * REVISION_SEQ, * BUDGET_LINE_ITEM_SEQ, * PRODUCT_CODE, * QUANTITY, * UNIT_PRICE, * AMOUNT, * LOAD_DATE
  BUDGET_DETAILS: # BUDGET_ID, # REVISION_SEQ, # LINE_ITEM_SEQ, * BL_TYPE_CODE, * BL_TYPE_DESCRIPTION, * ORGANIZATION_ID, * ADDRESS_ID, * BUDGET_PERIOD, * LOAD_DATE, * BUDGET_AMOUNT, * EXPENDITURES, o PRODUCT_CODE
  CUSTOMER_ADDRESSES: # CUSTOMER_ID, # ADDRESS_ID, * ADDRESS_LINE1, o ADDRESS_LINE2, o POSTAL_CODE, * SALES_REP_ID, * GEO_CODE, * LOAD_DATE
  CUSTOMERS: # CUSTOMER_ID, # SNAPSHOT_DATE, * CUSTOMER_NAME, o AGE, o MARITAL_STATUS, * CREDIT_RATING
  SUPPLIER_ADDRESSES: # SUPPLIER_ID, # ADDRESS_ID, * SUPPLIER_NAME, o POSTAL_CODE, * GEO_CODE, * LOAD_DATE
  INTERNAL_ORG_ADDRESSES: # ORGANIZATION_ID, # ADDRESS_ID, * ORG_TYPE, * ORGANIZATION_NAME, * ADDRESS_LINE1, o ADDRESS_LINE2, o POSTAL_CODE, * GEO_CODE, o PARENT_ORG_ID, * LOAD_DATE
  GEOGRAPHIC_BOUNDARIES: # GEO_CODE, * CITY_NAME, * STATE_NAME, * COUNTRY_NAME, o CITY_ABBRV, o STATE_ABBRV, o COUNTRY_ABBRV

• Star
  - Single fact table surrounded by denormalised dimension tables
  - The fact table primary key is the composite of the foreign keys (primary keys of dimension tables)
  - Fact table contains transaction type information.
  - Many star schemas in a data mart
  - Easily understood by end users, more disk storage required
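
A minimal star schema can be sketched with SQLite: one fact table whose primary key is the composite of the dimension foreign keys, joined to two denormalised dimension tables. All table names, column names and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (product_key, date_key)   -- composite of the dimension foreign keys
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (20190701, '2019-07-01', '2019-07', 2019),
                            (20190702, '2019-07-02', '2019-07', 2019);
INSERT INTO fact_sales VALUES (1, 20190701, 2, 20.0),
                              (1, 20190702, 1, 10.0),
                              (2, 20190701, 3, 45.0);
""")

# A typical star query: one join per dimension, then group by dimension attributes.
sales_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category
""").fetchall()
```

Because `category` is stored directly on `dim_product`, no further join is needed to group by it; a snowflake design would move it to its own table.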
xample of "tar$ schema
-
7/25/2019 Data Modelling 242
167/247
-
7/25/2019 Data Modelling 242
168/247
• Snowflake
- Single fact table surrounded by normalised dimension tables
- Normalizes dimension table to save data storage space.
- When dimensions become very very large
- Less intuitive, slower performance due to joins
• May want to use both approaches, especially if supporting multiple end-user tools.
Example of Snowflake schema
Snowflake - Disadvantages
• Normalization of dimension makes it difficult for user to understand
• Decreases the query performance because it involves more joins
• Dimension tables are normally smaller than fact tables - space may not be a major issue to warrant snowflaking
Keys
• Primary Keys
- uniquely identify a record
• Foreign Keys
- primary key of another table referred here
• Surrogate Keys
- system-generated key for dimensions
- key on its own has no meaning
- integer key, less space
More Keys
• Smart Keys
- primary key out of various attributes of dimension
- AVOID THEM
- Join to Fact table should be on single surrogate key
• Production Keys
- DO NOT USE Production defined attributes
- Business may reuse/change them - DW cannot
Basic Dimensional Modeling Techniques
• Slowly changing Dimensions
• Rapidly changing Small Dimensions
• Large Dimensions
• Rapidly changing Large Dimensions
• Degenerate Dimensions
• Junk Dimensions
Slowly Changing Dimensions
A dimension is considered a Slowly Changing Dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.
Slowly changing Dimension - Options
Eg. Key does not change but description changes (product description)
TYPE 1
• Overwrite dimension record with new values
- used when old value of attribute has no significance
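A minimal sketch of the overwrite option, assuming a hypothetical product_dim table in SQLite. The table, its columns and the sample values are illustrative only.

```python
import sqlite3

# Hypothetical product dimension with one row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_id TEXT, description TEXT)"
)
conn.execute("INSERT INTO product_dim VALUES (1, 'P100', 'Blue pen')")


def overwrite_description(connection, product_id, new_description):
    """Overwrite in place: the key is unchanged and the old description is lost."""
    connection.execute(
        "UPDATE product_dim SET description = ? WHERE product_id = ?",
        (new_description, product_id),
    )


overwrite_description(conn, "P100", "Blue ballpoint pen")
```

No history is kept, which is acceptable only when the old value has no significance.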
Slowly changing Dimension - Options
TYPE 2
TYPE 3
An Example
• Slowly
Eg. Rapid changes to product dimension
• Type 2 (use surrogate key and create a new record)
• use effective dates
• use only while the dimension table remains small
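The Type 2 approach (new record, new surrogate key, effective dates) might look like the following sketch. The table layout, the column names effective_from and effective_to, and the convention that NULL marks the current row are assumptions made for illustration.

```python
import sqlite3

# Hypothetical product dimension that keeps history.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_dim (
        product_key    INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id     TEXT,
        description    TEXT,
        effective_from TEXT,
        effective_to   TEXT
    )
""")


def add_version(connection, product_id, description, change_date):
    """Close the current row (if any) and insert a new row with a new surrogate key."""
    connection.execute(
        "UPDATE product_dim SET effective_to = ? "
        "WHERE product_id = ? AND effective_to IS NULL",
        (change_date, product_id),
    )
    cursor = connection.execute(
        "INSERT INTO product_dim (product_id, description, effective_from, effective_to) "
        "VALUES (?, ?, ?, NULL)",
        (product_id, description, change_date),
    )
    return cursor.lastrowid  # the system-generated surrogate key


first_key = add_version(conn, "P100", "Blue pen", "2019-01-01")
second_key = add_version(conn, "P100", "Blue ballpoint pen", "2019-07-25")
```

Every change adds a row, which is why the slide limits this option to dimensions that stay small.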
Large Dimensions
Dimensions containing several million records
Dimensions containing > 100 million records
• Build the data in this dimension with all possible combinations of values for each attribute
• Identify each combination uniquely
• Every time an event occurs and is recorded in fact table, attach it with the unique combination ID.
[Figure: any fact table containing customer_key as a foreign key …; any fact table containing customer_key and …; relatively constant attributes …; Demographics dimension with demog_key and demographic attributes …]
• Advantages
- No increase in data storage every time an event occurs
• Drawbacks
- Forced to use ranges of discrete values for dimensional attributes
- New dimension cannot be too big (not >1M)
- Data in new dimension can be accessed along with static data only through the fact table - slower
- Static and changing portions of the dimension are linked only if an event occurs - keep a dummy event in fact
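A minimal sketch of such a demographics dimension, with every combination built up front and continuous values forced into ranges. The attributes, the band boundaries and all names other than demog_key and customer_key are assumptions for illustration.

```python
from itertools import product

# Hypothetical banded attributes; each combination gets one demog_key.
AGE_BANDS = ["<25", "25-44", "45+"]
INCOME_BANDS = ["low", "medium", "high"]
MARITAL_STATUSES = ["single", "married"]

# All possible combinations, numbered from 1 in generation order.
demographics_dim = {
    combo: key
    for key, combo in enumerate(product(AGE_BANDS, INCOME_BANDS, MARITAL_STATUSES), start=1)
}


def age_band(age):
    """Map a continuous age onto one of the discrete ranges."""
    if age < 25:
        return "<25"
    if age < 45:
        return "25-44"
    return "45+"


def demog_key(age, income_band, marital_status):
    """Look up the pre-built combination ID to attach to a fact row."""
    return demographics_dim[(age_band(age), income_band, marital_status)]


# A fact row carries both the static customer key and the demographics key.
fact_row = {"customer_key": 42, "demog_key": demog_key(30, "high", "married"), "amount": 99.0}
```

The dimension size is the product of the band counts (18 here), so it does not grow when events occur.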
Degenerate Dimensions
• Occur in line item oriented fact tables
• occur when dimension table is left only with a single key and no other fields
• all other attributes have been moved into other dimension tables
• Moved to fact table - not joined to anything
Junk Dimensions
• Number of miscellaneous flags and text attributes left over after design
WHAT TO DO WITH THEM????
DO NOT
- Leave them behind in the fact table
- Make each flag and attribute into its own dimension
- Strip off all such flags and attributes
Junk Dimensions (cont.)
• DO
- Grouping of random flags and attributes
- take away from fact and group them into junk dimension
eg. Open ended comments fields
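One way to picture the grouping is to move the flag columns out of each fact row and replace them with a single key into a junk dimension. The function and column names below (split_junk, gift_wrap, rush, junk_key) are hypothetical.

```python
def split_junk(fact_rows, junk_columns):
    """Move the given flag columns out of the fact rows into one junk dimension.

    Returns (junk_dim, slim_facts): junk_dim maps each distinct flag
    combination to a key, and every slim fact row carries that key instead.
    """
    junk_dim = {}
    slim_facts = []
    for row in fact_rows:
        combo = tuple(row[column] for column in junk_columns)
        # Reuse the key if this combination was seen before, else assign the next one.
        key = junk_dim.setdefault(combo, len(junk_dim) + 1)
        slim = {name: value for name, value in row.items() if name not in junk_columns}
        slim["junk_key"] = key
        slim_facts.append(slim)
    return junk_dim, slim_facts


facts = [
    {"order_id": 1, "amount": 10.0, "gift_wrap": "Y", "rush": "N"},
    {"order_id": 2, "amount": 5.0, "gift_wrap": "N", "rush": "N"},
    {"order_id": 3, "amount": 7.5, "gift_wrap": "Y", "rush": "N"},
]
junk_dim, slim_facts = split_junk(facts, ["gift_wrap", "rush"])
```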
Conformed Dimensions
• Dimension that means the same thing with every possible fact table that it is joined.
• Dimension is identically the same dimension in each data mart
• Major responsibility of the central DW design team is to establish, publish, maintain and enforce them
• DW cannot function as an integrated whole without strict adherence to conformed dimensions
Conformed Dimensions (Cont.)
• When you don't need
Time_key
day_of_week
day_number_in_month
day_number_overall
week_number_in_year
month
quarter
fiscal_period
holiday_flag
weekday_flag
last_day_in_month_flag
season
event
Time Dimension
• An exclusive Time dimension is required because the SQL date semantics and functions cannot generate several important attributes required for analytical purposes.
• Attributes like weekdays, weekends, fiscal period, holidays, season cannot be generated by SQL statements.
Time Dimension
• Moreover SQL date stamps occupy more space, largely increasing the size of the fact table.
• Joins on such SQL generated date-stamps are costly, decreasing the query speed significantly.
Time Dimension
• The Day of week (Monday, ...) is useful to create reports comparing for ex. Monday sales to Friday sales.
• The Day number in month is useful for comparing measures for the same day in each month.
• The last day in month flag is useful for performing payday analysis.
Time Dimension
• The holiday flag and season attributes are useful for holiday vs non-holiday analysis and season business analysis.
• Event attribute is needed to record special days like strike days, etc.
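The calendar-derived attributes can be generated once by a small loader, as in the sketch below. The function name build_time_dim is hypothetical and the holiday set is supplied by the caller; fiscal_period, season and event are business-defined and are deliberately left out.

```python
import calendar
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_time_dim(start, end, holidays=frozenset()):
    """Return one row per calendar day from start to end (inclusive)."""
    rows = []
    day, key = start, 1
    while day <= end:
        last_day_of_month = calendar.monthrange(day.year, day.month)[1]
        rows.append({
            "time_key": key,                        # surrogate key, not a date stamp
            "date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_number_in_month": day.day,
            "day_number_overall": key,
            "week_number_in_year": day.isocalendar()[1],  # ISO week number
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "holiday_flag": day in holidays,
            "weekday_flag": day.weekday() < 5,
            "last_day_in_month_flag": day.day == last_day_of_month,
        })
        day += timedelta(days=1)
        key += 1
    return rows


time_rows = build_time_dim(date(2019, 7, 25), date(2019, 7, 31), holidays={date(2019, 7, 26)})
```

Fact rows then carry only the small integer time_key, and reports filter or group on the precomputed attributes.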
[Figure: Product (Product Key, Product Id, Product category, … Brand Name, SKU, …); Date (Month, Year, …); Promotion (Promotion Key, Promotion Id, Promotion …)]
• The first sales fact table measures the sales figures at a granularity of SKU, Day, Individual Store and Promotion name.
• Only the SKUs that actually sell on the day make it into the sales fact table, irrespective of whether they are on promotion or not.
Retail Chains Sample Dimensional model
• The second promotion fact table is a factless fact table. It has a granularity of SKU, Day, Store and Promotion Name.
• This promotion fact table records which items are on promotion in which stores and at what times.
Retail Chains Sample Dimensional model
• Time, Product and Store are common dimensions in both the fact tables.
• Product and Promotion are Type 2 Slowly changing dimensions.
Retail Chains Sample Dimensional model
• The sales fact enables the sales monitoring and analysis across Product, Stores, Time and Promotion dimensions.
• The second promotion fact table is needed to answer the critical question … Which are the products that were on promotion but did not sell on a particular day?
Retail Chains Sample Dimensional model
• The second fact table can be avoided if we keep zero sales figures in the sales fact table … but that would make our sales fact table very large … because less than 5% of products which were on promotion on a particular day actually sell.
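The "on promotion but did not sell" question can be answered by an anti-join from the factless promotion table to the sales fact. The sketch below uses SQLite with hypothetical table and column names and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion_fact (product_key INTEGER, store_key INTEGER,
                             time_key INTEGER, promotion_key INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, store_key INTEGER, time_key INTEGER,
                         promotion_key INTEGER, quantity INTEGER);
INSERT INTO promotion_fact VALUES (1, 1, 1, 7), (2, 1, 1, 7), (3, 1, 1, 7);
INSERT INTO sales_fact VALUES (1, 1, 1, 7, 4), (4, 1, 1, 0, 2);
""")


def promoted_but_unsold(connection, time_key):
    """Products on promotion on the given day with no matching sales row."""
    rows = connection.execute("""
        SELECT p.product_key
        FROM promotion_fact p
        LEFT JOIN sales_fact s
               ON s.product_key = p.product_key
              AND s.store_key   = p.store_key
              AND s.time_key    = p.time_key
        WHERE p.time_key = ? AND s.product_key IS NULL
        ORDER BY p.product_key
    """, (time_key,)).fetchall()
    return [row[0] for row in rows]
```

Because unsold items never appear in the sales fact, the promotion table is the only place this coverage is recorded.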
Retail Chains Sample Dimensional model
• Bitmap Indexes on the foreign key columns in the fact tables.
• Bitmap Indexes on low cardinality columns in dimensional tables like Month, Product
• The sales fact is partitioned across the Month column.
• Aggregates can be created in future based on understanding of frequently needed & time taking queries looking for summarized information.
Aggregates
Products : 30
Brands : 150
i.e. 150 rows in the Product Dimension
• Time Dimension
Year : 5
Month : 60
Days : 365*5 = 1825
i.e. 1825 rows in the Time Dimension
Aggregates
• Assuming a transaction for each of the Brands every day, we have 1825*150 rows in our sales Fact table.
• A Query like: Show
• Product
• Time
Year : 5
Month : 60
There would be 60*3 = 180 rows in this aggregated fact table.
The query on this table needs to access only 180 rows to get the same set of results.
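The row counts quoted on these slides can be checked with a few lines of arithmetic. The variable names are illustrative, and the factor 3 is taken as given from the slide, which does not state which product level it refers to.

```python
# Base grain: one row per brand per day.
brands = 150
days = 365 * 5                 # 5 years
base_rows = days * brands      # rows in the detailed sales fact table

# Aggregate grain: months times the factor 3 quoted on the slide.
months = 60
product_groups = 3             # level not stated on the slide
aggregate_rows = months * product_groups

rows_saved_factor = base_rows // aggregate_rows
```

The same query therefore touches 180 rows instead of 273,750, roughly a 1,500-fold reduction.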
Aggregates
[Figure: aggregate schema showing Time Key, Category Key, Month, Fiscal_Period, Season, Category_Key]
data model.
• Aggregates increase the maintenance load on the Data warehouse. They must be updated as the base table data gets updated.
• Aggregates occupy storage space. Hence aggregates should be created only for frequent and time taking queries.
Aggregate Navigation
• Aggregate Navigation features enable end-users to query the data mart without bothering about the presence of aggregates.
• Without Aggregate navigation, the end user needs to be aware of the presence of aggregates so that he can query the aggregated table instead of detailed table … thus increasing the complexity of the user interface.
Aggregate Navigation
• An aggregate navigator intercepts the client's SQL and if possible transforms base-level SQL into aggregate aware SQL.
• "Aggregate Aware" function in BusinessObjects 4.1 is an example of Aggregate navigator.
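The idea of an aggregate navigator can be illustrated with a toy table chooser: given the columns a query needs, pick the smallest table that can answer it. This is a simplified sketch, not how BusinessObjects or any particular product implements the feature; the catalogue, table names and row counts are hypothetical.

```python
# Hypothetical catalogue: the base fact table plus one aggregate.
TABLES = [
    {"name": "sales_fact",
     "columns": {"day", "month", "year", "brand", "category"},
     "rows": 273750},
    {"name": "sales_month_category_agg",
     "columns": {"month", "year", "category"},
     "rows": 180},
]


def navigate(requested_columns, tables=TABLES):
    """Return the name of the smallest table that covers all requested columns."""
    candidates = [t for t in tables if set(requested_columns) <= t["columns"]]
    if not candidates:
        raise ValueError("no table can answer this query")
    return min(candidates, key=lambda t: t["rows"])["name"]


def rewrite(sql_template, requested_columns):
    """Fill the {table} placeholder so the user never names the aggregate."""
    return sql_template.format(table=navigate(requested_columns))


query = rewrite("SELECT category, SUM(amount) FROM {table} GROUP BY category", {"category"})
```

The user writes the query against one logical table and the navigator substitutes the aggregate whenever the requested grain allows it.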
Aggregate Navigation
• New features in Oracle 8i like Materialized views, Query rewrite
- enable aggregate navigation to be built within the data mart DBMS instead of front end access tools.
- enable all front end access tools to utilize the aggregate navigation feature.
Factless Fact table
• Factless fact tables are fact tables that do not have any measures.
• These kinds of fact tables arise when there are no obvious measures for the business area.
• Daily attendance tracking is one such example of a business area having no concrete measures.
Factless fact tables
[Figure: TIME; Product (Product Key, Product Id, Product category, … Brand Name, SKU, …); Month, Year, …; Promotion (Promotion Key, Promotion Id, Promotion …)]
Who (people, groups, organizations) is of interest to the user?
What (functions) is the user trying to analyze?
Why does the user need the data?
When (for what point in time) does the data need to be recorded?
Where (geographically, organizationally) do relevant processes occur?
How do we measure the performance or state of the functions being analyzed?
Approaches to Data Gathering
1) Source Driven
- define requirements by using the source data in production operational systems.
- by analyzing an ER model of source data OR
- by analyzing the actual physical record layouts and selecting data elements deemed to be of interest.
Advantages
• Know data that you can supply
• Minimize user involvement in early stages of project
Disadvantages
• Increased risk of producing wrong set of requirements
Approaches to Data Gathering
2) User Driven
- define requirements by investigating the functions the users perform
- done through a series of meetings and/or interviews with users.
Advantages
• Focus on what is needed rather than what is available
Disadvantages
• Expectations to be closely managed.
Data Modeling for Data Warehouse - Steps
12 Steps to Data modeling for Data Warehouse
1. Study ER
2. Evaluate and Analyse
3. Review Dimension
4. Add Time Dimension
5. Identify Facts
6. Granularity
7. Merge Facts
8. Review Facts
9. Name Facts
Step 1: Remove all entities that act as associative entities and all subtype entities.
(eg. Product Component, Inventory, Order Line, Order, Retail Store, and Corporate Sales Office)
Note: Be careful to create all the many-to-many relationships that replace these entities
entities.
For each new entity, consider which attributes in the original entities would be useful constraints on the new dimension.
Note: Remember to consider attributes of any subtype entities removed in the first step.
Logical Model is a logical representation: remove individual keys and replace with generic key for each dimension.
• Rolling the salesperson up into the sales dimension implies (correctly) that the relationships among outlet, salesperson and customer roll up into the sales to customer relationship.
• The many-to-many relationship between customer and sales prevents the erroneous rollup of customer into sales person and ultimately into sales.
organization
Requirements that are collected must represent these:
• what is being analyzed (Dimensions)
• evaluation criteria for what is being analyzed (Measures)
IDENTIFY the measures and dimensions
Analyze the questions, define measures and dimensions to meet requirements.
Used all information available
Do we have all data to answer all the questions?
a. Sales and Manufacturing??? Yes
b. Product
Q2, Q3 can they be answered? NO
What's MISSING??
Unit cost of model at any point in time is required.
History of unit cost required. Add begin and end date in product dimension.
Unit cost Derivation rule??
Lowest level of Time - DAY
Reporting requirements ???
By day, week and month
Final Dimension List
One set of dimensions and its associated measures make up what is called a fact.
Organizing the dimensions and measures into facts.
- The process of grouping dimensions and measures together in a manner that can address the specified requirements. HOW?
• First create an initial fact for each of the queries in the case study.
Note: For any measures that describe exactly the same set of dimensions, create only one fact
& &&F do not have any measures
If we did not:
merge Q6 with Q5, Q7 in Fact 4
merge Q8 and Q9 with Q2 in Fact 2
we would be left with factless facts (facts with no measures):
the sale of a product at a point in time (facts 2 and 3) at a specific location (fact 2 only) has occurred. No other measurement is required.
Level of detail at which fact is recorded
Try to keep at most detailed level (summarize if required)
Additivity: ability of measure to be summarized
• fully additive (additive across all dimensions - advised)
• non-additive (adding % of 2 facts - not possible)
• semi-additive (adding balances of same account at 2 different points in time. Additive only across some dimensions)
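The semi-additive case can be made concrete with account balances: they can be summed across accounts on one day, but across time they should be averaged rather than added. The data and function names below are made up for illustration.

```python
# Hypothetical daily balance snapshots.
balances = [
    {"account": "A", "day": 1, "balance": 100.0},
    {"account": "A", "day": 2, "balance": 120.0},
    {"account": "B", "day": 1, "balance": 50.0},
    {"account": "B", "day": 2, "balance": 70.0},
]


def total_balance_on(day):
    """Additive across the account dimension: sum all accounts for one day."""
    return sum(row["balance"] for row in balances if row["day"] == day)


def average_balance(account):
    """Not additive across time: average the snapshots instead of summing them."""
    values = [row["balance"] for row in balances if row["account"] == account]
    return sum(values) / len(values)
```

Summing account A over both days would give 220.0, a figure that never existed as a balance, which is why the measure is only semi-additive.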
Total cost and total revenue (daily)
Solution: a. Split into 2 facts
b. Make the time dimension consistent
Make time to lowest level - DAY
Average quantity on hand - non-additive
Solution: store actual quantity on hand and let the query calculate average.
Two different levels of granularity
Q2 (daily)
Q8, Q9 (month)
Solution: Since measures are fully additive, set the grain of time to a day. A query can handle any summarization to the monthly level.
Two different grains of time. Neither can roll up to the other.
Options:
a.
Replace % (Fact 3) with quantity of models sold through:
- retail outlet,
- corporate sales office
- salesperson.
Total quantity sold is already present. % can be calculated
Replace % (Fact 4) with:
- number of models eligible for discount,
- quantity of models eligible for discount actually sold
- quantity of models sold at a discount.
Consolidate facts where possible - WHY?
• Easier for a user to find the data needed to satisfy a query if there are fewer places to look.
• Expand the analysis potential because you can relate more measures to more dimensions at a higher level of granularity.
• Fewer facts - lesser administration
HOW??
Determine for each measure which additional dimensions can be added to increase its granularity
Fact 2: Already has all the dimensions in Fact 3 and 4
Fact 3: Add Sales dimension to break up Total into Fact 2
directly from the product dimension. Not needed in consolidated Fact
• Product dimension tells whether an individual model is eligible for discount
• Use the total quantity sold (consolidated from fact 3) to represent the quantity of models eligible for discount actually sold.
Quantity of models sold at a discount - Retain
OR
• record the discount amount and generate the quantity sold at a discount by adding up the quantity sold where the discount amount is not zero.
Solution: Merge Fact 2, 3, 4
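The second option, deriving the quantity sold at a discount from the recorded discount amount, amounts to a filtered sum. The row layout and names below are assumptions for illustration.

```python
# Hypothetical consolidated sales fact rows.
sales_rows = [
    {"model": "M1", "quantity_sold": 3, "discount_amount": 0.0},
    {"model": "M2", "quantity_sold": 2, "discount_amount": 15.0},
    {"model": "M3", "quantity_sold": 5, "discount_amount": 5.0},
]


def quantity_sold_at_discount(rows):
    """Add up quantity sold only where the discount amount is not zero."""
    return sum(row["quantity_sold"] for row in rows if row["discount_amount"] != 0)
```

Storing the discount amount keeps the fact additive and lets the query derive the discounted quantity on demand.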
Fact 1:
Fact 1 - Inventory Fact
Fact 2 - Sales Fact
10. Size the model
• calculate the size of the data in a table
- number of rows * length of each row
To calculate row length:
• 4 bytes for each numeric or date attribute
• number of characters for character attribute
• number of digits in a decimal attribute / 2 and rounded up.
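The sizing rules above translate directly into a small calculator. The column list is a made-up example, and the row count reuses the 300 + 2080 product rows from the case study only as sample input.

```python
import math


def row_length(columns):
    """Estimate row length in bytes from (kind, size) pairs using the rules above."""
    total = 0
    for kind, size in columns:
        if kind in ("numeric", "date"):
            total += 4                       # 4 bytes for each numeric or date attribute
        elif kind == "char":
            total += size                    # one byte per character
        elif kind == "decimal":
            total += math.ceil(size / 2)     # digits / 2, rounded up
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total


def table_size(number_of_rows, columns):
    """Size of the data in a table = number of rows * length of each row."""
    return number_of_rows * row_length(columns)


# Hypothetical product dimension: a key, a date, a 30-character name, a 7-digit decimal.
product_columns = [("numeric", None), ("date", None), ("char", 30), ("decimal", 7)]
product_rows = 300 + 2080
product_bytes = table_size(product_rows, product_columns)
```

The estimate covers data only; indexes and storage overhead are outside these rules.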
Seller: (3 corp + 15 retail + 30 salesmen)
No. of models experiencing changes: 10 per week
10 * 52 * 4 = 2080
No. of product rows: 300 + 2080 = 2380
Size of Sales fact
112>
Case Study (contd..)
11. Record Metadata
Model (Name, Definition, Purpose,
• Confirms that model meets user requirements
• Confirms that user understands the model.
Validated portion goes through design
Remaining goes back in iterative development of model