data modelling 242

Upload: bhaskar-reddy

Post on 28-Feb-2018

Category:

Documents


TRANSCRIPT

  • 7/25/2019 Data Modelling 242

    1/247

    Data Modeling

    Data Warehouse Defined

    A data warehouse is a collection of corporate information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations.

    Characteristics of a DW

    • Subject-oriented data
      - collects all data for a subject, from different sources
    • Read-only requests
      - loaded during off-hours, read-only during day hours
    • Interactive features, ad-hoc query
      - flexible design to handle spontaneous user queries
    • Pre-aggregated data
      - to improve runtime performance
    • Highly denormalized data structures
      - fat tables with redundant columns

    Components of a Data Warehouse

    [Figure: components of a data warehouse. Labels: Source Systems; Data Staging Area (Storage: flat files, RDBMS; Processing; No User Query Services); DWH Servers (Data Mart 1: dimensional, conforms to DW Bus; Data Mart 2); End User Data Access (Query Tools, Report Writers, Mining Tools).]

    STAGING AREA - SOME CLARITY

    • Staging Area
      - optional
      - to cleanse the source data
      - accepts data from different sources
      - a data model is required at the staging area
      - multiple data models may be required for parking different sources and for transformed data to be pushed out to the warehouse

    ODS - SOME CLARITY

    • Operational Data Store
      - optional
      - granular, detailed level data
      - may feed the warehouse (e.g. when the warehouse is aggregated)
      - usually a relational model
      - may keep data for a smaller time period than the warehouse

    Data Modeling

    WHAT IS A DATA MODEL?
    A data model is an abstraction of some aspect of the real world (system).

    WHY A DATA MODEL?
      - Helps to visualise the business
      - A model is a means of communication.
      - Models help elicit and document requirements.
      - Models reduce the cost of change.
      - The model is the essence of the DW architecture on which the DW will be implemented

    Impact of Data Analysis Techniques on DM

    • Query and reporting
      - Normalized data model
      - select associated data elements
      - summarize and group by category
      - present results
      - direct table scan
      - ER with normalized / denormalized appropriate

    Query and reporting

    Requirements of a Decision Support Query Environment

    • To provide a method for testing hypotheses (e.g. what if ...)
    • To allow ad-hoc queries
    • To allow human input (DSS makes decisions with users)
    • Expects user knowledge of the problem
    • To simulate the behaviour of a real-world problem

    Impact of Data Analysis Techniques on DM

    • Multidimensional analysis
      - Fast and easy access to data
      - Any number of analysis dimensions in any combination
      - ER will mean many joins
      - Dimensional model appropriate

    Multidimensional Analysis

    Data Mining

    • Data Mining
      - discovers unusual patterns
      - requires low level of detail data

    A look at different warehouse architectures

    [Figure: warehouse architecture diagram. Labels: Operational Data, External Data, Load Manager, Warehouse Manager, Query Manager, Detailed Information, Summary Information, Meta Data, OLAP.]

    Data Warehouse Architecture - 2

    Data Warehouse Architecture - 3

    Data Warehouse Architecture - 4

    DW Architectures

    • Architecture

    Global Architecture

    DW Architectures

    • Independent Architecture
      - stand-alone
      - controlled by a department
      - minimal integration
      - no global view
      - very fast to implement

    DW Architectures

    • Interconnected Architecture
      - distributed
      - integrated and interconnected
      - gives a global view of the enterprise
      - more complexity
        - who manages / controls data
        - another tier in the architecture to share common data between multiple data marts
        - have a data sharing schema across data marts

    Independent and Interconnected Architecture

    Types of Data Warehouse

    • Enterprise Data Warehouse
    • Data Mart

    [Figure: an Enterprise Data Warehouse shown above three Datamarts.]

    Enterprise Data Warehouse

    Data Mart

    • Logical subset of the enterprise data warehouse
    • Organized around a single business process
    • Based on granular data
    • May or may not contain aggregates
    • Object of analytical processing by the end user.
    • Less expensive and much smaller than a full-blown corporate data warehouse.

    Distributed and Centralized Data Warehouses

    • DW sitting on a monolithic machine - unrealistic
    • Separate machines, different OS, different DB systems - reality

    Solution
    • Share a uniform architecture to allow them to be fused coherently

    Physical Data Warehouse: Data Warehouse --> Data Marts

    [Figure: source data (external data, operational data), staging area, data warehouse, and data marts; the data warehouse feeds the data marts.]

    Physical Data Warehouse: Data Marts --> Data Warehouse

    [Figure: source data (external data, operational data), staging area, data marts, and data warehouse; the data marts feed the data warehouse.]

    Physical Data Warehouse: Parallel Data Warehouse & Data Marts

    [Figure: source data (external data, operational data), staging area, data warehouse, and data marts; the data warehouse and the data marts are loaded in parallel.]

    DW Implementation Approaches

    • Top Down
    • Bottom-up

    Top Down Implementation

    Bottom Up Implementation

    DW Implementation Approaches

    Top Down
    • More planning and design initially
    • Involve people from different work-groups, departments
    • Data marts may be built later from the Global DW
    • Overall data model to be decided up-front

    Bottom Up

    DW Implementation Approaches

    Combined Approach
    • Determine the degree of planning and design for a global approach to integrate data marts being built by the bottom-up approach
    • Develop a base level infrastructure definition for the global DW at business level
    • Develop a plan to handle data elements needed by multiple data marts
    • Build a common data store to be used by data marts and the global DW

    Levels of modeling

    Sample Conceptual Model

    [Figure: conceptual model with the entities Products, Customer Invoices, Customer Addresses, Customers, Geographic Boundaries, and Sales Reps.]

    Logical Model

    • Replaces many-to-many relationships with associative entities.
    • Defines a full population of entity attributes.
    • May use non-physical entities for domains and sub-types.
    • Establishes entity identifiers.
    • Has no specifics for any RDBMS or configuration.
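
    The first point, replacing a many-to-many relationship with an associative entity, can be sketched as follows. This is a minimal illustration, not part of the original deck: the employee, project, and assignment tables and their rows are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical entities: EMPLOYEE and PROJECT are related many-to-many.
# The associative entity ASSIGNMENT holds one row per (employee, project) pair,
# turning the many-to-many into two one-to-many relationships.
logical = sqlite3.connect(":memory:")
logical.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, proj_name TEXT NOT NULL);
    CREATE TABLE assignment (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Billing'), (20, 'Warehouse');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 20);
""")

def projects_for(emp_name):
    """Return the project names an employee is assigned to, sorted by name."""
    rows = logical.execute(
        "SELECT p.proj_name "
        "FROM employee e "
        "JOIN assignment a ON a.emp_id = e.emp_id "
        "JOIN project p ON p.proj_id = a.proj_id "
        "WHERE e.emp_name = ? "
        "ORDER BY p.proj_name",
        (emp_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

    The composite primary key on the associative entity is one possible choice of entity identifier; a surrogate key would work as well.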

    Sample Logical Model

    [Figure: logical model. Entities and their identifying attributes:]
    PRODUCT: PRODUCT CODE, PRODUCT DESCRIPTION
    CUSTOMER: CUSTOMER ID, SNAPSHOT DATE, CUSTOMER NAME
    CUSTOMER INVOICE: INVOICE ID, LINE ITEM SEQ, INVOICE DATE
    SALES REP: SALES REP ID
    CUSTOMER ADDRESS: CUSTOMER ID, ADDRESS ID
    GEOGRAPHIC BOUNDARY: GEO CODE
    [Relationship labels in the figure: "the bill for", "purchased by", "the bill sent to", "purchased at", "the bill purchased by", "the general location of", "located within", "for the customer", "sold to by", "the salesman for", "the sales manager for", "managed by", "sold by".]

    Physical Model

    • A physical data model may include
      - Referential integrity
      - Indexes
      - Views
      - Alternate keys and other constraints
      - Tablespaces and physical storage objects.

    Sample Physical Model

    [Figure: physical model. Tables and columns; "o" marks columns flagged with an "o" in the figure:]
    PRODUCTS: PRODUCT_CODE, PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
    CUSTOMER_INVOICES: INVOICE_ID, LINE_ITEM_SEQ, INVOICE_DATE, CUSTOMER_ID, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, o PRODUCT_COST, LOAD_DATE
    CUSTOMERS: CUSTOMER_ID, SNAPSHOT_DATE, CUSTOMER_NAME, o AGE, o MARITAL_STATUS, CREDIT_RATING
    CUSTOMER_ADDRESSES: CUSTOMER_ID, ADDRESS_ID, ADDRESS_LINE1, o ADDRESS_LINE2, o POSTAL_CODE, SALES_REP_ID, GEO_CODE, LOAD_DATE
    GEOGRAPHIC_BOUNDARIES: GEO_CODE, CITY_NAME, STATE_NAME, COUNTRY_NAME, o CITY_ABBRV, o STATE_ABBRV, o COUNTRY_ABBRV
    SALES_REPS: SALES_REP_ID, LAST_NAME, FIRST_NAME, o MANAGER_FIRST_NAME, o MANAGER_LAST_NAME

    Data Architecting - Real time data

    • Represents current status of business
    • Used by operational systems to run business

    Data Architecting - Real time data

    To use real time data in DW:
    Must be

    Data Architecting - Derived data

    • Data created by summarizing, aggregating, averaging real-time data through some process
    • Represents a view of business data at a specific time
    • Historical record of business over a period
    • Precalculate derived data elements and summarize detailed data to improve query processing

    Data Architecting - Reconciled data

    • Real-time data cleansed, adjusted, enhanced to provide an integrated source of data for analysis

    Enterprise Data Model (EDM)

    EDM - The Phased Enterprise Data Model

    Enterprise Data Model (EDM)

    Phases
    Increasing order of information required
    • Information Planning
    • Business Analysing
    • Logical Data Modeling
    • Physical Data Design

    Enterprise Data Model (EDM)

    Information Planning
    • Called subject areas / super entity / business entity in which the organization is interested, e.g. customer, product

    Purpose
      - To set up scope and architecture of DW
      - To provide a single comprehensive point of view

    Enterprise Data Model (EDM)

    Business Analyzing
    • Define contents of primary business concepts.
    • Gather and arrange business requirements
    • Define business terms

    Purpose
      - To set up scope and architecture of DW
      - To provide a single comprehensive point of view

    Enterprise Data Model (EDM)

    Logical Data Modeling
    • Enterprise-wide in scope
    • Consists of several entities, relationships, attributes
    • Complete model in 3rd Normal Form.

    Enterprise Data Model (EDM)

    Physical Data Design
    • space
    • performance
    • physical distribution of data

    Purpose:
    To design for the physical implementation

    Enterprise Data Model (EDM)

    Is it possible to draw an EDM?
    Not always

    Phased approach OR a simple EDM
    • list of subject areas
    • define business relationships between subject areas
    • define contents of each subject area

    Granularity

    Level of summarization of data elements
    Level of detail available in the data
    The more the detail, the lower the granularity

    Why is it important in DW?
    Opportunity for TRADE-OFF
      - performance vs. volume of data stored
      - ability to access detailed data vs. cost of storage

    Granularity

    Granularity

    To overcome trade-offs between data volume and query capability:
    Divide the data in the DW

    Data Partitioning Model

    WHY?
    To understand, maintain and navigate a DW

    TYPES of Partitioning
    • Logical and Physical

    Data Partitioning Model

    Logical Partitioning - WHY?
    Goals:

    Data Partitioning Model - Logical Partitioning

    • Partition large volumes of data by splitting
    • Helps to make data easier to:
      - Restructure
      - Index
      - Sequentially scan
      - Reorganize
      - Recover
      - Monitor

    Data Partitioning Model - Logical Partitioning

    Logical Partition - HOW?
    Criteria
    • Time period (date, month, or quarter)
      - almost always chosen
    • Geography (location)
    • Product (more generically, by line of business)
    • Organizational unit
    • A combination of the above
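
    Partitioning by time period, the criterion noted above as almost always chosen, might look like the sketch below. It is an illustration under assumptions: the row layout, the sale_date field, and the choice of quarter as the period are hypothetical.

```python
from collections import defaultdict
from datetime import date

def partition_key(d):
    """Return a quarter label such as '2018-Q1' for a date."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"

def partition_by_quarter(rows):
    """Split rows into one list per quarter, keyed by the quarter label."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["sale_date"])].append(row)
    return dict(partitions)

# Hypothetical sales rows used only to show the split.
sales = [
    {"sale_date": date(2018, 1, 15), "amount": 100},
    {"sale_date": date(2018, 3, 31), "amount": 50},
    {"sale_date": date(2018, 4, 1), "amount": 75},
]
parts = partition_by_quarter(sales)
```

    A combined criterion (for example quarter plus geography) would only change the key returned by partition_key to a tuple.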

    Data Partitioning Model - Subject Areas

    Subject areas are classified by the topics of interest to the business.
    • 5W1H rule
      - when, where, who, what, why, and how
      - e.g. "who" could be customer, employee, manager, supplier, business partner, competitor.
    • Get a candidate list of subject areas
    • Decompose, rearrange, select, redefine in more detail

    Data Partitioning Model - Subject Areas

    • Define the business relationships among subject areas
    • This will determine the dimensions used
    • Subject areas help define criteria like:
      - Unit of the data model
      - Unit of an implementation project
      - Unit of management of the data
      - Basis for the integration of multiple implementations

    The unit for analysis should be the business process

    Data Modeling - Techniques

    What needs to be modeled during a data warehouse project
    • STAGING AREA
      - YES (maybe multiple data models are required)
    • ODS
      - YES
    • DATA WAREHOUSE / DATA MART
      - YES

    Data Modeling - Techniques

    • Modeling techniques
      - E-R Modeling
      - Dimensional Modeling

    Implementation and modeling styles

    • Modeling versus implementation
      - Modeling: describe what should be built to non-technical folks
      - Implementation: describe what is actually built to technical folks

    • Relational modeling
      - Use for implementation
      - Difficult to understand by non-technical folks
    • Dimensional modeling
      - Use for modeling during analysis and design phases

    E-R Modeling

    • Produces a data model using two basic concepts: entities and the relationships between those entities.
    • Detailed ER models also contain attributes, which can be properties of either the entities or the relationships.

    Conventions used in E-R modeling

    • Entities
    • Attributes
    • Relationships or Associations

    [Figure: E-R notation example with the entity EMPLOYEE, the attributes EmpName and Address, and the relationship "Belongs To".]

    Entities

    • Principal data objects about which information is to be collected.
    • Usually recognizable concepts such as person, things, or events.
    • Examples: EMPLOYEES, PROJ

    Attributes / Relationships

    • Attributes describe the entity with which they are associated.
    • A relationship represents an association between two or more entities. Examples:
      - Employees are assigned to projects
      - Departments manage one or more projects.

    Types of Data Relationships - Cardinality

    • One - One        1 : 1
    • One - Many       1 : m
    • Many - Many      m : n
    • Recursive data relationship

    Normalization

    • Remove data redundancy
    • 0 NF - contains repeating values
    • 1 NF - No repeating values
    • 2 NF - Every non-key attribute is dependent on the whole key
    • 3 NF - No non-key attribute is functionally dependent on another non-key attribute
    • Denormalization - carefully introduced redundancy to improve query performance

    Normalization - 1NF

    • Eliminate repeating groups

    [Table: columns Person and Skills; the recoverable cells are A, Oracle, DB2, MS Access, Oracle.]

    Normalization - 2NF

    • Eliminate redundant data

    Skill ID    Skill Description
    S1          DB2
    S2          Oracle
    S3          MS Access
    S4

    Normalization - 3NF

    • Eliminate columns not dependent on key

    [Table: columns Memb ID, Skill ID, Comp ID, Comp Name, Location; the first row begins A, S1, D1.]
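
    The first two normalization steps can be traced on a small person/skill data set. This is a sketch under assumptions: person B and the skill-ID numbering scheme are hypothetical, and the rows are illustrative rather than the deck's original table.

```python
# Unnormalized: one row per person with a repeating group of skills.
unnormalized = [
    {"person": "A", "skills": ["Oracle", "DB2"]},
    {"person": "B", "skills": ["MS Access", "Oracle"]},
]

# Step 1 (eliminate repeating groups): one row per (person, skill).
first_nf = [
    {"person": row["person"], "skill": skill}
    for row in unnormalized
    for skill in row["skills"]
]

# Step 2 (eliminate redundant data): store each skill description once,
# keyed by a skill ID, and reference it from the person/skill rows.
skill_ids = {}
for row in first_nf:
    skill_ids.setdefault(row["skill"], f"S{len(skill_ids) + 1}")

skill_table = {skill_id: name for name, skill_id in skill_ids.items()}
person_skill = [(row["person"], skill_ids[row["skill"]]) for row in first_nf]
```

    After step 2, renaming a skill touches one row in skill_table instead of every person/skill row that mentions it.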

    Relational modeling

    • Represents business entities, data items associated with each entity, and the relationships of business interest among the entities
    • Entities are usually broken down into the smallest possible units and combined using relationships
    • Diagram looks like a spiderweb

    Entity Completeness Checklist

    • Name
      - to describe the data contained
      - to meet naming conventions/standards
    • Description
      - to describe precisely what the entity represents
      - required for sharing and reuse of data model components
    • Category
      - classifies entities sharing common characteristics

    Entity Completeness Checklist (contd.)

    • Abbreviations
      - document the abbreviation and full definition
    • Acronyms
      - avoid (not understood by all, not unique)
      - if used, document them
    • Current number of occurrences
      - to estimate entity statistics for all entity categories

    Entity Completeness Checklist (contd.)

    • Authority
      - Metadata authority (to approve change of entities, attributes etc.)
      - Data authority (to change occurrences of entity)
    • Primary key / Foreign key / Non-key attribute names
    • Relationships to other entities
      - no entity stands by itself

    Homonyms

    Same or similar in sound or spelling as another
    BUT DIFFERENT IN MEANING

    Synonyms

    Same meaning ...
    Same logical concept ...
    Assigned different names
    Introduce redundancy in model
    IDENTIFY AND RESOLVE them - for entities and attributes

    Synonyms (contd.)

    Attribute Completeness Checklist

    • Name
      - to uniquely identify the attribute
      - to meet naming conventions/standards
    • Description
      - to describe precisely what the attribute represents
    • Type
      - refers to how the attribute is used in the data model

    Attribute Completeness Checklist (contd.)

    • Key attributes
      - primary keys in the entity in which they are defined
      - primary / foreign keys in other entities in which they occur
      - implemented with a unique index
    • Non-key attributes
      - contain the bulk of the information
      - need not be unique
      - candidate keys not selected as primary keys
      - secondary keys may be selected as access paths
      - implemented using a non-unique index

    • Domain
      set of permitted values for the attribute

      Domain elements
      - General Domain
        - describes the manner in which data is represented (data type)
        - alphanumeric, real, integer, boolean, sound, digital video etc.
      - Specific Domain
        - enumerated domain
        - specific set of values that are valid and allowed
        - static values (e.g. Flat type: 2 bed, 3 bed, duplex etc.)
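
    The difference between a general and a specific domain can be shown with the flat-type example above. This is only a sketch: the FlatType class and the two helper functions are hypothetical names introduced for illustration.

```python
from enum import Enum

class FlatType(Enum):
    """Specific (enumerated) domain: the only values allowed for flat type."""
    TWO_BED = "2 bed"
    THREE_BED = "3 bed"
    DUPLEX = "duplex"

def validate_flat_type(value):
    """Return the FlatType for value; raises ValueError if outside the domain."""
    return FlatType(value)

def is_valid_floor_area(value):
    """General domain: only the data type (a real number) is checked."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

    In a physical model the same idea usually appears as a data type for the general domain and a check constraint or lookup table for the specific domain.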

    Attribute Completeness Checklist (contd.)

    • Abbreviations
      - document the abbreviation and full definition
    • Acronyms
      - avoid (not understood by all, not unique)
      - if used, document them
    • Key use
      - applies only to primary keys
      - will serve as primary or foreign key in child entity
    • Source
      - whether attribute is primitive or derived

      - if derived, establish the formula
      - document the formula
      - the formula should identify any other attributes required to generate the value for the derived attribute
    • Traceability
      - why is the attribute there
      - refer to source (paragraph, citation of statement, physical data structure element ...)
      - mapped to a metadata object that is maintained as part of the system lifecycle

    Library Branch

    Attribute Metadata

    Should the data model contain derived attributes?

    YES
      - represent information that management actually wants
      - users have an opportunity to specify business rules
      - provide an opportunity to validate that all necessary base data is captured
      - design is made easier as requirements are already mapped

    In DSS environment - ESSENTIAL

    NEVER use derived attributes as PRIMARY keys
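
    A derived attribute with its documented formula might be modelled as below. This is a hedged sketch: the OrderLine class, its fields, and the two formulas are hypothetical and are not taken from the deck's own order example.

```python
from dataclasses import dataclass

@dataclass
class OrderLine:
    # Base (primitive) attributes.
    order_no: int
    product_code: str
    quantity: int
    unit_price: float

    @property
    def line_amount(self):
        # Derived attribute. Documented formula: quantity * unit_price.
        return self.quantity * self.unit_price

def order_total(lines):
    # Derived attribute. Documented formula: sum of line_amount over the lines.
    return sum(line.line_amount for line in lines)

# Hypothetical lines of one order.
lines = [OrderLine(1, "P100", 2, 10.0), OrderLine(1, "P200", 1, 5.5)]
```

    Writing the formula next to each derived attribute also confirms that every base attribute it needs (here quantity and unit_price) is actually captured.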

    Derived attributes - An example

    [Figure: example with an Order # (PK) attribute, an order date attribute, and an entity label beginning "ORDER PRODU"; the rest is not recoverable.]

    Attribute Names

    • Unique name representing its business meaning
    • clear, concise, self-explanatory
    • minimize use of special characters
    • length > 50 gives flexibility
      - limitations of 32, 33 exist in some

    • SHOULD NOT

    • Builds on and is consistent with the attribute name
    • unambiguous, clear, economically worded
    • stand alone (not dependent on another attribute definition to convey meaning. BEWARE of circular attribute definitions)
    • Never MISS giving a description

    AVOID:
      - restating the name of the attribute and/or its characteristics (e.g. length, data type, domain values)
      - using technical jargon
      - limiting the description to a direct extract from a dictionary

    Some attribute descriptions

    Need improvement
    • Location name - the name of a location
    • order line total quantity - a six-digit integer total
    • directional indicator - E, W, S, N, NE

    Pretty good
    • Safety level quantity - The calculated minimum quantity of a product SKU that must be on hand to reduce risk of out-of-stock conditions
    • operating quantity - The calculated, demand-driven quantity of a material item that must be maintained and replenished for use in day-to-day operations

    Primary Key Attributes

    • Stable (not to change in value, cannot be null)
    • Minimal (in number of attributes. Large composite keys not advisable)
    • Factless (should not contain intelligent groupings of data)
    • Definitive (value always exists for every occurrence)

    Surrogate Keys

    Use an artificial key / surrogate key / pseudo-key / system-generated key to ensure uniqueness when:
      - no attribute possesses all PK characteristics
      - candidate keys are large and complex

    • ALWAYS USE IN DW Data Model
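
    One way to assign surrogate keys during a load is sketched below. It is an assumption-laden illustration: the SurrogateKeyGenerator class and the (source system, natural key) pairs are hypothetical, and a real warehouse would typically use a database sequence or identity column instead.

```python
import itertools

class SurrogateKeyGenerator:
    """Assigns a meaningless integer key to each distinct natural key."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]

customer_keys = SurrogateKeyGenerator()
k1 = customer_keys.key_for(("CRM", "C-001"))      # large composite natural key
k2 = customer_keys.key_for(("BILLING", "0001"))
k1_again = customer_keys.key_for(("CRM", "C-001"))
```

    The surrogate key stays stable, minimal, and factless even when the source systems change or reuse their own identifiers.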

    Relationships - Checklist

    • Name & Description - Optional
    • Type (identifying / non-identifying)

    Limitations of E-R Modeling

    • Poor performance
    • Tend to be very complex and difficult to navigate.

    Dimensional Modeling

    • Dimensional modeling uses three basic concepts: measures, facts, dimensions.
    • Is powerful in representing the requirements of the business user in the context of database tables.
    • Focuses on numeric data, such as values, counts, weights, balances and occurrences.

    Dimensional modeling

    • Must identify
      - Business process to be supported
      - Grain (level of detail)
      - Dimensions
      - Facts

    Dimensional modeling

    • Facts
    • Measures (variables)
    • Dimensions
      - Dimension members
      - Dimension hierarchies

    Facts

    • A fact is a collection of related data items, consisting of measures and context data.
    • Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process.
    • Facts are measured, continuously valued, rapidly changing information.

    Fact Table

    • A table that is used to store business information (measures) that can be used in mathematical equations.
      - Quantities
      - Percentages
      - Prices

    Dimensions

    • A dimension is a collection of members or units of the same type of views.
    • Dimensions determine the contextual background for the facts.
    • Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how

    Dimension Table

    • Table used to store qualitative data about fact records
      - Who
      - What
      - When
      - Where
      - Why
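
    How a fact table and its dimension tables work together can be sketched with a tiny star schema. Everything here is a hypothetical illustration: the table names, columns, and rows are invented, and SQLite is used only because it ships with Python.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    -- Dimension tables: qualitative context (what, when).
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    -- Fact table: numeric measures plus one key per dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL);
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Lamp', 'Furniture');
    INSERT INTO dim_date VALUES
        (20180101, '2018-01-01', '2018-01'), (20180201, '2018-02-01', '2018-02');
    INSERT INTO fact_sales VALUES
        (1, 20180101, 10, 15.0), (2, 20180101, 2, 8.0),
        (3, 20180201, 1, 40.0), (1, 20180201, 4, 6.0);
""")

# Whitelist of dimension attributes that may be used for grouping.
GROUPABLE = {"category": "p.category", "month": "d.month"}

def sales_by(dimension_attribute):
    """Total sales amount grouped by one dimension attribute."""
    column = GROUPABLE[dimension_attribute]
    rows = star.execute(
        f"SELECT {column}, SUM(f.amount) "
        "FROM fact_sales f "
        "JOIN dim_product p ON p.product_key = f.product_key "
        "JOIN dim_date d ON d.date_key = f.date_key "
        f"GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)
```

    The measures live only in fact_sales; every question of the form "sales by ..." is answered by joining to a dimension and grouping on one of its descriptive columns.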

    Dimension data should be

    • verbose, descriptive
    • complete
    • no misspellings, impossible values
    • indexed
    • equally available
    • documented (metadata to explain origin, interpretation of each attribute)

    Dimensional model

    • Visualise a dimensional model as a cube
    • Operations for OLAP
      - Drill Down: higher level of detail
      - Roll Up: summarized level of data
        (The navigation path is determined by hierarchies within dimensions.)
      - Slice: cuts through the cube. Users can focus on specific perspectives
      - Dice: rotates the cube to another perspective (change the dimension)
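
    The operations listed above can be tried on a very small in-memory cube. This is a sketch under assumptions: the cell layout (month, region, product), the sample amounts, and the function names are hypothetical, and view_by follows this deck's description of dice as changing the viewing dimension.

```python
from collections import defaultdict

# Hypothetical cube cells: (month, region, product) -> sales amount.
DIMENSIONS = ("month", "region", "product")
cells = {
    ("2018-01", "North", "Pen"): 10.0,
    ("2018-01", "South", "Pen"): 5.0,
    ("2018-02", "North", "Lamp"): 40.0,
    ("2018-03", "North", "Pen"): 3.0,
    ("2018-04", "North", "Pen"): 7.0,
}

def quarter_of(month):
    """Map 'YYYY-MM' to 'YYYY-Qn' (one step up the time hierarchy)."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def roll_up_time(cube):
    """Roll up: summarize months into quarters."""
    out = defaultdict(float)
    for (month, region, product), amount in cube.items():
        out[(quarter_of(month), region, product)] += amount
    return dict(out)

def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cube.items() if key[i] == value}

def view_by(cube, dimension):
    """Dice, as described above: total the cube from another dimension's view."""
    i = DIMENSIONS.index(dimension)
    out = defaultdict(float)
    for key, amount in cube.items():
        out[key[i]] += amount
    return dict(out)
```

    Drill down is the reverse of roll_up_time: going back from the quarter totals to the month-level cells they were built from.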

    Drill down ... Roll up

    Slice and Dice

    Hierarchies

    • Allow for the "rollup" of data to more summarized levels.
      - Time
        - day
        - month
        - quarter
        - year

    Aggregates

    ! Aggregate 0ables are pre$storedsummari+ed tables9 created at a higher

    level of granularity across any or all of the

    dimensions.

    ! If the existing granularity is Day wise sales,

    then creating a separate month wise salestable is an example of Aggregate 0able.

    $ggregates

  • 7/25/2019 Data Modelling 242

    156/247

    ! 0he use of such aggregates is the singlemost effective tool the data warehouse

    designer has to improve 'uery performance.

    ! -sage of Aggregates can increase the

    performance of /ueries by several times.
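The day-wise to month-wise example can be sketched with Python's built-in `sqlite3` module. The table and column names (`daily_sales`, `monthly_sales`, `sale_date`, `amount`) and the sample rows are assumptions made for illustration only; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2019-01-05", "tea", 10.0),
        ("2019-01-20", "tea", 15.0),
        ("2019-02-03", "tea", 7.0),
        ("2019-02-03", "coffee", 4.0),
    ],
)

# Pre-store the month-level summary so that monthly queries
# do not have to scan the day-level rows.
conn.execute(
    """
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product,
           SUM(amount) AS amount
    FROM daily_sales
    GROUP BY substr(sale_date, 1, 7), product
    """
)

monthly = conn.execute(
    "SELECT sale_month, product, amount FROM monthly_sales "
    "ORDER BY sale_month, product"
).fetchall()
print(monthly)
```

In a real warehouse the aggregate would have to be refreshed whenever the day-level table is loaded; this sketch builds it only once.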

    Measures

    • A measure is a numeric attribute of a fact, representing the performance or behaviour of the business relative to dimensions.
    • The actual numbers are called variables, e.g. sales in money, sales volume, quantity supplied, supply cost, transaction amount.
    • A measure is determined by combinations of the members of the dimensions and is located on facts.

    Types of Facts

    • Additive
    - Able to add the facts along all the dimensions
    - Discrete numerical measures, e.g. retail sales in $
    • Semi-additive
    - Snapshot, taken at a point in time
    - Measures of intensity
    - Not additive along the time dimension, e.g. account balance, inventory balance
    - Added and divided by the number of time periods to get a time-average
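A minimal sketch of the semi-additive case, using hypothetical account balance snapshots. Balances can be summed across accounts for a single day, but across time they are added and divided by the number of periods, as the slide describes. The data and function names are illustrative assumptions.

```python
# (day, account, balance) snapshots; hypothetical sample data
snapshots = [
    ("2019-01-01", "A", 100.0),
    ("2019-01-02", "A", 120.0),
    ("2019-01-03", "A", 80.0),
    ("2019-01-01", "B", 50.0),
    ("2019-01-02", "B", 50.0),
    ("2019-01-03", "B", 50.0),
]


def total_balance_on(day):
    """Balances are additive across accounts at one point in time."""
    return sum(balance for d, _, balance in snapshots if d == day)


def time_average(account):
    """Across time: add the balances, then divide by the number of periods."""
    values = [balance for _, acct, balance in snapshots if acct == account]
    return sum(values) / len(values)


print(total_balance_on("2019-01-02"))
print(time_average("A"))
```

Summing account A over the three days would give 300, which is not a meaningful balance; the time-average is the figure that makes sense.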

    Types of Facts

    • Non-additive
    - Numeric measures that cannot be added across any dimensions
    - Intensity measure averaged across all dimensions, e.g. room temperature
    - Textual facts: AVOID THEM

    Advantages of Dimensional Modeling

    • Allows complex multi-dimensional data structure to be defined with a very simple data model.
    • Reduces the number of physical joins the query has to process.
    • Simplifies the view of the data model.
    • Allows the DWH to expand and evolve with relatively low maintenance.

    Sample business process versus dimension table

    Dimension tables: Products, Customers, Location, SalesRep, Date
    Business processes: Product Sales, Product Manufacturing, Employee Compensation

    Sample measure versus dimension table

    Dimension tables: Products, Customers, Location, SalesRep, Date
    Measures: Product Sales ($), Product Manufacturing (units), Sales Commission ($), Payroll (gross) ($)

    Sample Logical Model for Dimensional Data Mart

    PRODUCT: Product description, Category code, Category description
    TIME PERIOD: Invoice date, Fiscal year, Quarter, Month, Week
    SALES REP: Last name, First name
    ADDRESS: Address line 1, Address line 2, City name, State abbreviation, Postal code, Country name
    CUSTOMER REP SALES: Customer snapshot date, Invoice date, Gross sales, Quantity, Product cost
    CUSTOMERS: Customer name
    CUSTOMER DEMOGRAPHICS: Snapshot date, Credit rating, Marital status, Age

    Sample Physical Model for Data Warehouse

    PRODUCTS: PRODUCT_CODE (PK), PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
    PRODUCT_SNAPSHOTS: PRODUCT_CODE (PK), SNAPSHOT_DATE (PK), MSRP, UOM, PRIMARY_SUPPLIER_NAME, SUPPLIER_CITY_NAME, SUPPLIER_STATE_ABBRV, SUPPLIER_COUNTRY_NAME
    SALES_REPS: SALES_REP_ID (PK), LAST_NAME, FIRST_NAME, MANAGER_FIRST_NAME (optional), MANAGER_LAST_NAME (optional)
    CUSTOMER_INVOICES: INVOICE_ID (PK), LINE_ITEM_SEQ (PK), INVOICE_DATE, CUSTOMER_DATE, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, PRODUCT_COST (optional), LOAD_DATE
    PURCHASE_INVOICES: INVOICE_ID (PK), LINE_ITEM_SEQ (PK), INVOICE_DATE, SUPPLIER_ID, ADDRESS_ID, BUDGET_ID, REVISION_SEQ, BUDGET_LINE_ITEM_SEQ, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, LOAD_DATE
    BUDGET_DETAILS: BUDGET_ID (PK), REVISION_SEQ (PK), LINE_ITEM_SEQ (PK), BL_TYPE_CODE, BL_TYPE_DESCRIPTION, ORGANIZATION_ID, ADDRESS_ID, BUDGET_PERIOD, LOAD_DATE, BUDGET_AMOUNT, EXPENDITURES, PRODUCT_CODE (optional)
    CUSTOMER_ADDRESSES: CUSTOMER_ID (PK), ADDRESS_ID (PK), ADDRESS_LINE1, ADDRESS_LINE2 (optional), POSTAL_CODE (optional), SALES_REP_ID, GEO_CODE, LOAD_DATE
    CUSTOMERS: CUSTOMER_ID (PK), SNAPSHOT_DATE (PK), CUSTOMER_NAME, AGE (optional), MARITAL_STATUS (optional), CREDIT_RATING
    SUPPLIER_ADDRESSES: SUPPLIER_ID (PK), ADDRESS_ID (PK), SUPPLIER_NAME, POSTAL_CODE (optional), GEO_CODE, LOAD_DATE
    INTERNAL_ORG_ADDRESSES: ORGANIZATION_ID (PK), ADDRESS_ID (PK), ORG_TYPE, ORGANIZATION_NAME, ADDRESS_LINE1, ADDRESS_LINE2 (optional), POSTAL_CODE (optional), GEO_CODE, PARENT_ORG_ID (optional), LOAD_DATE
    GEOGRAPHIC_BOUNDARIES: GEO_CODE (PK), CITY_NAME, STATE_NAME, COUNTRY_NAME, CITY_ABBRV (optional), STATE_ABBRV (optional), COUNTRY_ABBRV (optional)

    • Star
    - Single fact table surrounded by denormalised dimension tables
    - The fact table primary key is the composite of the foreign keys (primary keys of dimension tables)
    - Fact table contains transaction type information.
    - Many star schemas in a data mart
    - Easily understood by end users, more disk storage required
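A star schema of this shape can be sketched with Python's built-in `sqlite3` module. All table names, column names and sample rows below (`dim_product`, `dim_store`, `dim_date`, `fact_sales`) are hypothetical; the point is only to show one fact table whose primary key is the composite of the dimension foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

    -- Fact table: its primary key is the composite of the dimension keys.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL,
        PRIMARY KEY (product_key, store_key, date_key)
    );

    INSERT INTO dim_product VALUES (1, 'Green tea', 'Beverages'), (2, 'Soap', 'Toiletries');
    INSERT INTO dim_store VALUES (1, 'Main St', 'Pune');
    INSERT INTO dim_date VALUES (1, '2019-01-05', '2019-01'), (2, '2019-02-03', '2019-02');
    INSERT INTO fact_sales VALUES (1, 1, 1, 3, 30.0), (2, 1, 1, 1, 5.0), (1, 1, 2, 2, 20.0);
    """
)

# Each dimension is one join away from the fact table.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(rows)
```

Because `category` is stored directly in `dim_product` (denormalised), the query needs no further join to reach it; a snowflake design would add one.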

    Example of Star schema

    • Snowflake
    - Single fact table surrounded by normalised dimension tables
    - Normalizes dimension tables to save data storage space.
    - When dimensions become very, very large
    - Less intuitive, slower performance due to joins
    • May want to use both approaches, especially if supporting multiple end-user tools.

    Example of Snowflake schema

    Snowflake - Disadvantages

    • Normalization of dimensions makes it difficult for the user to understand
    • Decreases query performance because it involves more joins
    • Dimension tables are normally smaller than fact tables, so space may not be a major enough issue to warrant snowflaking

    Keys

    • Primary Keys
    - uniquely identify a record
    • Foreign Keys
    - primary key of another table referred here
    • Surrogate Keys
    - system-generated key for dimensions
    - key on its own has no meaning
    - integer key, less space

    More Keys

    • Smart Keys
    - primary key built out of various attributes of the dimension
    - AVOID THEM
    - Join to the fact table should be on a single surrogate key
    • Production Keys
    - DO NOT USE production-defined attributes
    - The business may reuse/change them; the DW cannot

    Basic Dimensional Modeling Techniques

    • Slowly changing Dimensions
    • Rapidly changing Small Dimensions
    • Large Dimensions
    • Rapidly changing Large Dimensions
    • Degenerate Dimensions
    • Junk Dimensions

    Slowly Changing Dimensions

    A dimension is considered a Slowly Changing Dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.

    Slowly changing Dimension - Options

    Eg. Key does not change but description changes (product description)

    TYPE 1
    • Overwrite the dimension record with new values
    - used when the old value of the attribute has no significance

    TYPE 2

    TYPE 3

    An Example

    Eg. Rapid changes to product dimension
    • Type 2 (use surrogate key and create a new record)
    • use effective dates
    • use only while the dimension table remains small
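A minimal sketch of a Type 2 change, assuming a dimension held as a list of dictionaries. The column names (`surrogate_key`, `natural_key`, `effective_from`, `effective_to`), the far-future end date and the helper `apply_type2_change` are illustrative assumptions, not something the slides define.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"


def apply_type2_change(rows, natural_key, new_attrs, change_date):
    """Expire the current row for natural_key and append a new version."""
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    for r in rows:
        if r["natural_key"] == natural_key and r["effective_to"] == HIGH_DATE:
            r["effective_to"] = change_date  # close the old version
    rows.append({
        "surrogate_key": next_key,  # new system-generated key
        "natural_key": natural_key,
        **new_attrs,
        "effective_from": change_date,
        "effective_to": HIGH_DATE,
    })
    return next_key


product_dim = [{
    "surrogate_key": 1, "natural_key": "P-100", "description": "Tea 100g",
    "effective_from": date(2019, 1, 1), "effective_to": HIGH_DATE,
}]
new_key = apply_type2_change(
    product_dim, "P-100", {"description": "Tea 100g (new pack)"}, date(2019, 6, 1)
)
print(new_key, len(product_dim))
```

Fact rows loaded after the change would carry the new surrogate key, so history stays attached to the description that was valid at the time. Every change adds a row, which is why the slide limits this approach to dimensions that stay small.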

    Large Dimensions

    Dimensions containing several million records

    Dimensions containing > 100 million records

    • Build the data in this dimension with all possible combinations of values for each attribute
    • Identify each combination uniquely
    • Every time an event occurs and is recorded in the fact table, attach it with the unique combination ID.
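The three steps above can be sketched with `itertools.product`. The banded attributes and their values are hypothetical, chosen only to show how every combination gets its own key and how a fact row picks that key up.

```python
from itertools import product

# Hypothetical banded demographic attributes
age_bands = ["<25", "25-40", ">40"]
income_bands = ["low", "medium", "high"]
marital_statuses = ["single", "married"]

# One row per possible combination, each identified by its own integer key.
demographics_dim = {
    combo: key
    for key, combo in enumerate(
        product(age_bands, income_bands, marital_statuses), start=1
    )
}


def demog_key(age_band, income_band, marital_status):
    """Look up the key of the combination that applies at event time."""
    return demographics_dim[(age_band, income_band, marital_status)]


# When an event is recorded, the fact row carries the combination key.
fact_row = {
    "customer_key": 42,
    "demog_key": demog_key("25-40", "high", "married"),
    "amount": 99.0,
}
print(len(demographics_dim), fact_row["demog_key"])
```

The dimension size is the product of the attribute cardinalities (3 x 3 x 2 = 18 here), which is why the attributes have to be banded into a few discrete values.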

    Any fact table containing customer_key as a foreign key …

    Any fact table containing customer_key and demog_key …

    Relatively constant attributes …

    demog_key
    demographic attributes …

    Demographics dimension

    • Advantages
    - No increase in data storage every time an event occurs
    • Drawbacks
    - Forced to use ranges of discrete values for dimensional attributes
    - New dimension cannot be too big (not > 1M)
    - Data in the new dimension can be accessed along with static data only through the fact table: slower
    - Only if an event occurs are the static and changing portions of the dimension linked: keep a dummy event in the fact

    Degenerate Dimensions

    • Occur in line-item-oriented fact tables
    • Occur when a dimension table is left with only a single key and no other fields
    • All other attributes have been moved into other dimension tables
    • Moved to the fact table, not joined to anything

    Junk Dimensions

    • Number of miscellaneous flags and text attributes left over after design

    WHAT TO DO WITH THEM?

    DO NOT
    - Leave them behind in the fact table
    - Make each flag and attribute into its own dimension
    - Strip off all such flags and attributes

    Junk Dimensions (cont.)

    • DO
    - Grouping of random flags and attributes
    - take them away from the fact and group them into a junk dimension
    eg. open-ended comments fields

    Conformed Dimensions

    • Dimension that means the same thing with every possible fact table that it is joined to.
    • Dimension is identically the same dimension in each data mart
    • Major responsibility of the central DW design team is to establish, publish, maintain and enforce them
    • DW cannot function as an integrated whole without strict adherence to conformed dimensions

    Conformed Dimensions (cont.)

    • When you don't need …

    Time Dimension

    time_key
    day_of_week
    day_number_in_month
    day_number_overall
    week_number_in_year
    month
    quarter
    fiscal_period
    holiday_flag
    weekday_flag
    last_day_in_month_flag
    season
    event

    • An exclusive Time dimension is required because the SQL date semantics and functions cannot generate several important attributes required for analytical purposes.
    • Attributes like weekdays, weekends, fiscal period, holidays, season cannot be generated by SQL statements.
    • Moreover, SQL date stamps occupy more space, largely increasing the size of the fact table.
    • Joins on such SQL-generated date stamps are costly, decreasing the query speed significantly.
    • The Day of week (Monday, ...) is useful to create reports comparing, for example, Monday sales to Friday sales.
    • The Day number in month is useful for comparing measures for the same day in each month.
    • The last day in month flag is useful for performing payday analysis.
    • The holiday flag and season attributes are useful for holiday vs. non-holiday analysis and seasonal business analysis.
    • Event attribute is needed to record special days like strike days, etc.
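A time dimension is usually populated once by a small script rather than derived at query time. The sketch below covers only the calendar-derived attributes; `fiscal_period`, `season` and `event` depend on business rules the slides do not give, so they are left out. The function name and the holiday set passed in are assumptions.

```python
import calendar
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_time_dimension(start, end, holidays=frozenset()):
    """Return one row per calendar day from start to end (inclusive)."""
    rows = []
    day = start
    while day <= end:
        last_day_of_month = calendar.monthrange(day.year, day.month)[1]
        rows.append({
            "time_key": len(rows) + 1,          # surrogate key
            "full_date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_number_in_month": day.day,
            "day_number_overall": len(rows) + 1,
            "week_number_in_year": day.isocalendar()[1],
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "holiday_flag": day in holidays,    # supplied by the business
            "weekday_flag": day.weekday() < 5,
            "last_day_in_month_flag": day.day == last_day_of_month,
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(
    date(2019, 1, 1), date(2019, 12, 31), holidays={date(2019, 1, 1)}
)
print(len(time_dim), time_dim[0]["day_of_week"])
```

Fact rows then store the small integer `time_key` instead of a date stamp, and reports filter on attributes such as `holiday_flag` with a plain join.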

    Retail Chains Sample Dimensional model

    Product: Product key, Product Id, Product category, …, Brand Name, SKU, …
    Date: Month, Year, …
    Promotion: Promotion key, Promotion Id, Promotion …

    • The first sales fact table measures the sales figures at a granularity of SKU, Day, Individual Store and Promotion name.
    • Only the SKUs that actually sell on the day make it into the sales fact table, irrespective of whether they are on promotion or not.
    • The second promotion fact table is a factless fact table. It has a granularity of SKU, Day, Store and Promotion Name.
    • This promotion fact table records which items are on promotion in which stores and at what times.
    • Time, Product and Store are common dimensions in both the fact tables.
    • Product and Promotion are Type 2 slowly changing dimensions.
    • The sales fact enables sales monitoring and analysis across the Product, Store, Time and Promotion dimensions.
    • The second promotion fact table is needed to answer the critical question: which products were on promotion but did not sell on a particular day?

    • The second fact table can be avoided if we keep zero sales figures in the sales fact table, but that would make our sales fact table very large, because less than 5% of the products that were on promotion on a particular day actually sell.
    • Bitmap indexes on the foreign key columns in the fact tables.
    • Bitmap indexes on low cardinality columns in dimension tables like Month, Product …
    • The sales fact is partitioned across the Month column.
    • Aggregates can be created in future based on an understanding of frequently needed and time-taking queries looking for summarized information.

    Aggregates

    Products : 30
    Brands : 150
    i.e. 150 rows in the Product dimension

    • Time Dimension
    Year : 5
    Month : 60
    Days : 365*5 = 1825
    i.e. 1825 rows in the Time dimension

    • Assuming a transaction for each of the brands every day, we have 1825*150 rows in our sales fact table.
    • A query like: Show …

    • Product
    • Time
    Year : 5
    Month : 60
    There would be 60*3 = 180 rows in this aggregated fact table.
    The query on this table needs to access only 180 rows to get the same set of results.
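The row counts on these slides can be checked with a few lines of arithmetic. The figures are the ones quoted in the example (5 years of days, 150 brands, 60 months); reading the factor 3 in the aggregate as three product groupings is an assumption, since the slide text that defines it is missing.

```python
days = 365 * 5       # rows in the time dimension at day level
brands = 150         # rows in the product dimension at brand level
months = 12 * 5      # month-level rows for the same 5 years
product_groups = 3   # assumption: the "3" in the aggregate row count

base_rows = days * brands                  # one sales row per brand per day
aggregate_rows = months * product_groups   # month-level aggregate

print(base_rows)
print(aggregate_rows)
print(base_rows // aggregate_rows)  # rough reduction factor
```

Under these assumptions the month-level aggregate is more than a thousand times smaller than the base fact table, which is where the query speed-up comes from.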

    Aggregates

    Figure: aggregate star schema with MONTH (Time key, Month, Fiscal_Period, Season) and CATEGORY (Category key) dimensions around the AGG. SALES fact table.

    … data model.
    • Aggregates increase the maintenance load on the data warehouse. They must be updated as the base table data gets updated.
    • Aggregates occupy storage space. Hence aggregates should be created only for frequent and time-taking queries.

    Aggregate Navigation

    • Aggregate navigation features enable end users to query the data mart without bothering about the presence of aggregates.
    • Without aggregate navigation, the end user needs to be aware of the presence of aggregates so that he can query the aggregated table instead of the detailed table, thus increasing the complexity of the user interface.
    • An aggregate navigator intercepts the client's SQL and, if possible, transforms base-level SQL into aggregate-aware SQL.
    • The "Aggregate Aware" function in BusinessObjects 4.1 is an example of an aggregate navigator.
    • New features in Oracle 8i, like materialized views and query rewrite:
    - enable aggregate navigation to be built within the data mart DBMS instead of front-end access tools.
    - enable all front-end access tools to utilize the aggregate navigation feature.
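The idea of an aggregate navigator can be illustrated with a toy table chooser: given the columns a query groups by, pick the smallest table that can still answer it. This is only a sketch of the concept; it is not how BusinessObjects or Oracle query rewrite work internally, and the catalogue, table names and row counts are hypothetical.

```python
# Hypothetical catalogue: table name -> (columns it can group by, approximate row count)
TABLES = {
    "sales_fact": ({"day", "month", "year", "brand", "category"}, 273_750),
    "agg_sales_month_category": ({"month", "year", "category"}, 180),
}


def choose_table(requested_columns):
    """Pick the smallest table that contains every requested grouping column."""
    candidates = [
        (row_count, name)
        for name, (columns, row_count) in TABLES.items()
        if set(requested_columns) <= columns
    ]
    if not candidates:
        raise ValueError("no table can answer this query")
    return min(candidates)[1]


def rewrite(requested_columns, measure="amount"):
    """Build the SQL against whichever table the navigator selected."""
    cols = ", ".join(requested_columns)
    table = choose_table(requested_columns)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"


print(rewrite(["month", "category"]))
print(rewrite(["day", "brand"]))
```

The end user writes the same kind of request in both cases; only the navigator knows that the first one can be served from the aggregate.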

    Factless Fact table

    • Factless fact tables are fact tables that do not have any measures.
    • These kinds of fact tables arise when there are no obvious measures for the business area.
    • Daily attendance tracking is one such example of a business area having no concrete measures.
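A factless fact table earns its keep in "what did not happen" questions, such as the retail question of which promoted products did not sell on a given day. The sketch below uses plain Python sets with hypothetical rows; a factless row is just a combination of keys with no measure attached.

```python
# Factless promotion coverage rows: (date, store, sku, promotion). No measures.
promotion_fact = {
    ("2019-03-01", "S1", "SKU-1", "Spring sale"),
    ("2019-03-01", "S1", "SKU-2", "Spring sale"),
    ("2019-03-01", "S1", "SKU-3", "Spring sale"),
}

# Sales fact rows: (date, store, sku, quantity_sold). Only SKUs that sold appear.
sales_fact = [
    ("2019-03-01", "S1", "SKU-2", 4),
    ("2019-03-01", "S1", "SKU-9", 1),
]


def promoted_but_unsold(day, store):
    """SKUs on promotion in the store that day with no matching sales row."""
    promoted = {sku for d, s, sku, _ in promotion_fact if (d, s) == (day, store)}
    sold = {sku for d, s, sku, _ in sales_fact if (d, s) == (day, store)}
    return sorted(promoted - sold)


print(promoted_but_unsold("2019-03-01", "S1"))
```

The sales fact alone cannot answer this, because unsold SKUs have no rows in it; the coverage rows supply the missing side of the comparison.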

    Factless fact tables

    Figure: factless fact table with TIME (Month, Year, …), Product (Product key, Product Id, Product category, …, Brand Name, SKU, …) and Promotion (Promotion key, Promotion Id, Promotion …) dimensions.

    • Who (people, groups, organizations) is of interest to the user?
    • What (functions) is the user trying to analyze?
    • Why does the user need the data?
    • When (for what point in time) does the data need to be recorded?
    • Where (geographically, organizationally) do relevant processes occur?
    • How do we measure the performance or state of the functions being analyzed?

    Approaches to Data Gathering

    1) Source Driven
    - define requirements by using the source data in production operational systems
    - by analyzing an ER model of the source data, OR
    - by analyzing the actual physical record layouts and selecting data elements deemed to be of interest
    Advantages
    • Know the data that you can supply
    • Minimize user involvement in early stages of the project
    Disadvantages
    • Increased risk of producing the wrong set of requirements

    2) User Driven
    - define requirements by investigating the functions the users perform
    - done through a series of meetings and/or interviews with users
    Advantages
    • Focus on what is needed rather than what is available
    Disadvantages
    • Expectations have to be closely managed.

    Data Modeling for Data Warehouse - Steps

    12 Steps to Data modeling for Data Warehouse

    1. Study ER
    2. Evaluate and Analyse
    3. Review Dimension
    4. Add Time Dimension
    5. Identify Facts
    6. Granularity
    7. Merge Facts
    8. Review Facts
    9. Name Facts
    10. …

    Step 1: Remove all entities that act as associative entities and all subtype entities
    (eg. Product Component, Inventory, Order Line, Order, Retail Store, and Corporate Sales Office).
    Note: Be careful to create all the many-to-many relationships that replace these entities.

    … entities.
    For each new entity, consider which attributes in the original entities would be useful constraints on the new dimension.
    Note: Remember to consider attributes of any subtype entities removed in the first step.
    Logical Model is a logical representation: remove individual keys and replace with a generic key for each dimension.

    • Roll the salesperson up into the sales dimension: implies (correctly) that the relationships among outlet, salesperson and customer roll up into the sales to customer relationship.
    • The many-to-many relationship between customer and sales prevents the erroneous rollup of customer into sales person and ultimately into sales.

    … organization
    Requirements that are collected must represent these:
    • what is being analyzed (Dimensions)
    • evaluation criteria for what is being analyzed (Measures)

    IDENTIFY the measures and dimensions
    Analyze the questions, define measures and dimensions to meet requirements.

    Used all information available

    Do we have all data to answer all the questions?
    a. Sales and Manufacturing? Yes
    b. Product
    Q2, Q3: can they be answered? NO
    What's MISSING?
    Unit cost of model at any point in time is required.
    History of unit cost required. Add begin and end date in product dimension.
    Unit cost derivation rule?

    Lowest level of Time: DAY
    Reporting requirements?
    By day, week and month

    Final Dimension List

    One set of dimensions and its associated measures make up what is called a fact.
    Organizing the dimensions and measures into facts:
    - The process of grouping dimensions and measures together in a manner that can address the specified requirements. HOW?
    • First create an initial fact for each of the queries in the case study.
    Note: For any measures that describe exactly the same set of dimensions, create only one fact.

    … do not have any measures
    If we did not:
    merge Q6 with Q5, Q7 in Fact 4
    merge Q8 and Q9 with Q2 in Fact 2
    left with factless facts (fact with no measures)
    the sale of a product at a point in time (facts 2 and 3) at a specific location (fact 2 only), has occurred. No other measurement is required.

    Level of detail at which fact is recorded
    Try to keep at most detailed level (summarize if required)
    Additivity: ability of a measure to be summarized
    • fully additive (additive across all dimensions: advised)
    • non-additive (adding % of 2 facts: not possible)
    • semi-additive (adding balances of the same account at 2 different points in time. Additive only across some dimensions)

    Total cost and total revenue (daily)
    Solution: a. Split into 2 facts
    b. Make the time dimension consistent
    Make time to lowest level: DAY
    Average quantity on hand: non-additive
    Solution: store actual quantity on hand and let the query calculate the average.

    Two different levels of granularity
    Q2 (daily)
    Q8, Q9 (month)
    Solution: Since measures are fully additive, set the grain of time to a day. A query can handle any summarization to the monthly level.

    Two different grains of time. Neither can roll up to the other.
    Options:
    a.

    Replace % (Fact 3) with quantity of models sold through:
    - retail outlet,
    - corporate sales office,
    - salesperson.
    Total quantity sold is already present. % can be calculated.
    Replace % (Fact 4) with:
    - number of models eligible for discount,
    - quantity of models eligible for discount actually sold,
    - quantity of models sold at a discount.

    Consolidate Facts where possible - WHY?
    • Easier for a user to find the data needed to satisfy a query if there are fewer places to look.
    • Expand the analysis potential because you can relate more measures to more dimensions at a higher level of granularity.
    • Fewer facts: less administration
    HOW?
    Determine for each measure which additional dimensions can be added to increase its granularity.

    Fact 2: Already has all the dimensions in Fact 3 and 4
    Fact 3: Add Sales dimension to break up Total into Fact 2

    … directly from the product dimension. Not needed in consolidated Fact
    • Product dimension tells whether an individual model is eligible for discount
    • Use the total quantity sold (consolidated from fact 3) to represent the quantity of models eligible for discount actually sold.

    Quantity of models sold at a discount: Retain, OR
    • record the discount amount and generate the quantity sold at a discount by adding up the quantity sold where the discount amount is not zero.
    Solution: Merge Fact 2, 3, 4

    Fact 1:

    Fact 1 - Inventory Fact
    Fact 2 - Sales Fact

    10. Size the model

    • calculate the size of the data in a table
    - number of rows * length of each row
    To calculate row length:
    • 4 bytes for each numeric or date attribute
    • number of characters for character attribute
    • number of digits in a decimal attribute / 2 and rounded up.
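The sizing rules above translate directly into a small estimator. The column list and the row count below are hypothetical; only the three byte rules come from the slide.

```python
import math


def row_length(columns):
    """Estimate row length in bytes from (kind, size) column descriptions."""
    total = 0
    for kind, size in columns:
        if kind in ("numeric", "date"):
            total += 4                     # 4 bytes for numeric or date
        elif kind == "char":
            total += size                  # one byte per character
        elif kind == "decimal":
            total += math.ceil(size / 2)   # digits / 2, rounded up
        else:
            raise ValueError(f"unknown column kind: {kind}")
    return total


def table_size_bytes(row_count, columns):
    """Size of the data in a table: number of rows * length of each row."""
    return row_count * row_length(columns)


# Hypothetical product dimension: key, 30-char name, two dates, 7-digit cost
product_columns = [
    ("numeric", None),
    ("char", 30),
    ("date", None),
    ("date", None),
    ("decimal", 7),
]

print(row_length(product_columns))
print(table_size_bytes(1000, product_columns))
```

This gives the raw data size only; indexes, block overhead and free space would add to it in a real database.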

    Seller: (3 corp + 15 retail + 30 salesmen)
    No. of models experiencing changes: 10 per week
    10 * 52 * 4 = 2080
    No. of product rows: 300 + 2080 = 2380

    Size of Sales fact

    Case Study (contd.)

    11. Record Metadata

    Model (Name, Definition, Purpose, …

    • Confirms that the model meets user requirements
    • Confirms that the user understands the model.
    Validated portion goes through design.
    Remaining goes back into iterative development of the model.