dim modeling paper-revised

Upload: jwash3

Post on 03-Apr-2018


  • 7/28/2019 Dim Modeling Paper-Revised

    1/34

    Successful Dimensional Modeling of Very Large Data Warehouses

    By Bert Scalzo, Ph.D.

    [email protected]

    About the Author

    Oracle DBA from version 4 through 8i

    Worked for Oracle Education

    Worked for Oracle Consulting

    Holds several Oracle Masters

    BS, MS and PhD in Computer Science

    MBA and insurance industry designations

    Articles in Oracle Magazine, Oracle Informant, and PC Week (now E-Magazine)

    About Quest Software

    Know Your Application

    What type of application are you building:

    On Line Transaction Processing (OLTP)

    Operational Data Store (ODS)

    On Line Analytical Processing (OLAP)

    Data Mart / Data Warehouse (DM/DW)

                    OLTP                ODS                    OLAP           DM/DW
    Business Focus  Operational         Operational, Tactical  Tactical       Tactical, Strategic
    End User Tools  Client Server, Web  Client Server, Web     Client Server  Client Server, Web
    DB Technology   Relational          Relational             Cubic          Relational
    Trans Count     Large               Medium                 Small          Small
    Trans Size      Small               Medium                 Medium         Large
    Trans Time      Short               Medium                 Long           Long
    Size in Gigs    10 - 200            50 - 400               50 - 400       400 - 4000
    Normalization   3NF                 3NF                    N/A            0NF
    Data Modeling   Traditional ER      Traditional ER         N/A            Dimensional

    Embrace New Concepts

    Teach Old Dog New Tricks

    Throw out any OLTP baggage

    Forget OLTP Golden Rules

    Star Schema Design

    Star schema approach to dimensional data modeling was pioneered by Ralph Kimball

    Dimensions: smaller, de-normalized tables containing business descriptive columns that end-users query on

    Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and possessing numerically additive, non-key columns used for calculations during end-user queries
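The two definitions above can be sketched as a toy star. This is only an illustration: SQLite (through Python's standard `sqlite3` module) stands in for Oracle, the names `dw_period` and `dw_product` are borrowed from later slides, and `dw_location`, `dw_sales`, every column name, and all data values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: small, de-normalized, descriptive columns end-users query on.
    CREATE TABLE dw_period   (period_id   INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
    CREATE TABLE dw_product  (product_id  INTEGER PRIMARY KEY, item TEXT, category TEXT);
    CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);

    -- Fact: the primary key is the concatenation of the dimension foreign keys;
    -- the non-key columns are numerically additive measures.
    CREATE TABLE dw_sales (
        period_id   INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        location_id INTEGER NOT NULL,
        units_sold  INTEGER NOT NULL,
        sales_amt   REAL    NOT NULL,
        PRIMARY KEY (period_id, product_id, location_id)
    );
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dw_period VALUES (?, ?, ?, ?)",
                 [(1, "1998-11-01", "1998-11", 1998), (2, "1998-11-02", "1998-11", 1998)])
conn.executemany("INSERT INTO dw_product VALUES (?, ?, ?)",
                 [(1, "Lager", "Beer"), (2, "Espresso", "Coffee")])
conn.execute("INSERT INTO dw_location VALUES (1, 'Dallas', 'TX')")
conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 50.0), (2, 1, 1, 5, 25.0), (1, 2, 1, 3, 9.0)])

# Additive measures can be summed along any dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(f.units_sold), SUM(f.sales_amt)
    FROM dw_sales f
    JOIN dw_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The fact table carries no descriptive text at all; everything a user filters or groups on lives in a dimension.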

    Dimensions

    Facts

    10^8 - 10^10

    10^3 - 10^5

    Transform OLTP Model

    Fold OLTP model into itself to form a Star:

    De-Normalize parent/child relationships

    De-Normalize lookup relationships

    Use surrogate or meaningless keys

    Create and populate a time dimension

    Create hierarchies of data in dimensions
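As a rough sketch of the surrogate-key, time-dimension, and hierarchy steps, the script below populates a `dw_period` table with one row per calendar day. SQLite stands in for Oracle, and the column names, the label formats, and the choice of 1998 as the calendar year are all assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

def period_rows(start, days):
    """Yield one de-normalized row per calendar day."""
    for offset in range(days):
        d = start + timedelta(days=offset)
        iso_year, iso_week, _ = d.isocalendar()
        quarter = (d.month - 1) // 3 + 1
        yield (
            offset + 1,                       # surrogate key: a meaningless sequence number
            d.isoformat(),                    # day
            f"{iso_year}-W{iso_week:02d}",    # week (ISO numbering)
            f"{d.year}-{d.month:02d}",        # month
            f"{d.year}-Q{quarter}",           # quarter
            d.year,                           # year
        )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dw_period (
        period_id INTEGER PRIMARY KEY,
        day       TEXT    NOT NULL UNIQUE,
        week      TEXT    NOT NULL,
        month     TEXT    NOT NULL,
        quarter   TEXT    NOT NULL,
        year      INTEGER NOT NULL
    )
""")
conn.executemany("INSERT INTO dw_period VALUES (?, ?, ?, ?, ?, ?)",
                 period_rows(date(1998, 1, 1), 365))

# Every level of the hierarchy is stored on the same row, so no joins are
# needed to roll a day up to its month, quarter, or year.
day_count, month_count, quarter_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT month), COUNT(DISTINCT quarter) FROM dw_period"
).fetchone()
nov_15 = conn.execute(
    "SELECT month, quarter, year FROM dw_period WHERE day = '1998-11-15'"
).fetchone()
print(day_count, month_count, quarter_count, nov_15)
```

Because the key is a plain sequence number, fact rows never depend on how the source system happens to encode dates.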

    OLTP Model

    Dimensional Model

    Dimension Hierarchies

    SQL> select distinct levelx from dw_period;

    LEVELX
    --------------------
    DAY
    MONTH
    QUARTER
    WEEK
    YEAR

    SQL> select distinct levelx from dw_product;

    LEVELX
    --------------------
    ALL PRODUCTS
    CATEGORY
    ITEM
    PSA
    SUB_CATEGORY

    Avoid Snowflakes

    Avoid natural desire to normalize model:

    Complicates end-user query construction

    Adds additional level of JOIN complexity

    Database optimizers do not handle snowflakes very well

    Saves some space at the cost of longer queries

    Snowflake Model

    Common Aggregations

    Build end-user driven aggregate tables:

    By time (e.g. week, month, quarter, year)

    By geographic regions (e.g. time zones)

    By end-user reporting interests (e.g. beer)

    By dimension hierarchy (e.g. product category)

    Aggregates should be 5 to 10 times smaller
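A small sketch of a time-based aggregate and its size ratio follows. SQLite stands in for Oracle; the table names `dw_sales_day` and `dw_sales_week`, the columns, and the synthetic data (28 days, 3 products, one unit sold per day) are assumptions chosen so the arithmetic is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_period (period_id INTEGER PRIMARY KEY, week INTEGER NOT NULL);
    CREATE TABLE dw_sales_day (
        period_id  INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        units_sold INTEGER NOT NULL
    );
""")

# 28 days grouped into 4 weeks; 3 products each selling 1 unit per day.
conn.executemany("INSERT INTO dw_period VALUES (?, ?)",
                 [(day, (day - 1) // 7 + 1) for day in range(1, 29)])
conn.executemany("INSERT INTO dw_sales_day VALUES (?, ?, ?)",
                 [(day, product, 1) for day in range(1, 29) for product in (1, 2, 3)])

# Weekly aggregate built from the daily detail table.
conn.execute("""
    CREATE TABLE dw_sales_week AS
    SELECT p.week, f.product_id, SUM(f.units_sold) AS units_sold
    FROM dw_sales_day f
    JOIN dw_period p ON p.period_id = f.period_id
    GROUP BY p.week, f.product_id
""")

detail_rows = conn.execute("SELECT COUNT(*) FROM dw_sales_day").fetchone()[0]
aggregate_rows = conn.execute("SELECT COUNT(*) FROM dw_sales_week").fetchone()[0]
ratio = detail_rows / aggregate_rows

# The aggregate must preserve the additive totals of the detail data.
detail_units = conn.execute("SELECT SUM(units_sold) FROM dw_sales_day").fetchone()[0]
aggregate_units = conn.execute("SELECT SUM(units_sold) FROM dw_sales_week").fetchone()[0]
print(detail_rows, aggregate_rows, ratio)
```

Checking the row-count ratio before keeping an aggregate is a cheap way to apply the 5 to 10 times guideline: an aggregate that barely shrinks the data adds load time without speeding up queries much.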

    Time Aggregates

    Non-Time Aggregates

    Index Design

    All fact table foreign key columns must have individual bitmap indexes on them

    All dimension table non-key columns should have individual bitmap indexes
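To show why individual single-column bitmap indexes suit ad-hoc queries, here is a toy model in Python. It is a conceptual sketch only, not Oracle's storage format or API: each distinct value maps to an integer used as a bit set, and the sample columns and helper names are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def matching_rows(bitmap):
    """Expand a bitmap back into the list of row ids it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact-table foreign key columns, one entry per fact row.
product_fk = ["beer", "coffee", "beer", "tea", "coffee", "beer"]
location_fk = ["dallas", "dallas", "austin", "dallas", "austin", "dallas"]

# One independent index per column.
product_idx = build_bitmap_index(product_fk)
location_idx = build_bitmap_index(location_fk)

# (product = beer OR product = coffee) AND location = dallas
hits = (product_idx["beer"] | product_idx["coffee"]) & location_idx["dallas"]
result = matching_rows(hits)
print(result)
```

Any combination of predicates reduces to cheap bitwise AND and OR operations over the separate indexes, so no single multi-column index has to anticipate which columns a user will filter on.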

    10 B-Tree Indexes

    48 Bitmap Indexes!!!

    Key Fact Table Issues

    Fact tables should:

    NOT create or enable foreign key constraints

    NOT create or enable table check constraints

    NOT create or enable primary/unique constraints (use unique indexes, which offer parallel creation)

    NOT create or enable column check constraints (other than simple NOT NULL check constraints)

    NOT create or enable row level triggers

    NOT enable logging on tables or their indexes

    No PK/UK/FK Constraints

    Key Oracle Issues

    Trust me, there is no way to build a large DW in Oracle 7.X

    Very brief overview in next few slides of:

    Partitioning options

    Indexing options

    Comparative timings

    Tuning ad-hoc Star queries

    Serial versus Parallel queries

    Materialized Views

    Oracle Partitioning

    Way beyond the scope of dimensional modeling, but:

    Use Range or List Partitioning using your time dimension

    Fact unique index = local, prefixed b-tree index

    Fact time index = local, prefixed bitmap index

    Fact non-time index = local, non-prefixed bitmap index

    If any non-time dimension provides a good locality of reference for typical user queries, then sub-partition on that dimension (i.e., use 8i's new composite partitioning)

    Indexing Options!!!

    [Diagram: tree of 12 numbered indexing options for a relational table, branching by table storage (in cluster or in tablespace), table organization (heap or index), table partitioning (partitioned or non-partitioned), index partitioning (partitioned or non-partitioned), index scope (global or local), and index type (b-tree or bitmap)]

    Oracle 8i Table Option Timings

    Fact Implementation         Timing
    Regular Heap Table           9,293
    Single Column Partition      4,747
    Multi Column Partition       4,987
    Composite Partition          6,319
    Index Organized Table       12,508
    Partition Index Organized   14,902

    NOTE: specific to my data and user queries

    Tuning Star Queries

    Way beyond the scope of dimensional modeling, but:

    Use Oracle 8.X's Range Partitioning based upon your time dimension (do not try to use hash or composite partitioning)

    Fact unique index uses local, prefixed b-tree index

    Fact time index uses local, prefixed bitmap index

    Fact non-time index uses local, non-prefixed bitmap index

    Typical User Query

    Query: beer and coffee sales for November of '98 in Dallas
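The query described above could be written in two equivalent shapes, sketched here against SQLite as a stand-in for Oracle. The first is the plain join an end-user tool might generate; the second filters the fact table with one IN subquery per dimension, which is the general shape Oracle's star transformation rewrites a join into so that fact-table bitmap indexes can be combined. All table names, column names, and data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_period   (period_id   INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dw_product  (product_id  INTEGER PRIMARY KEY, item TEXT, category TEXT);
    CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dw_sales (
        period_id INTEGER, product_id INTEGER, location_id INTEGER, sales_amt REAL
    );

    INSERT INTO dw_period   VALUES (1, '1998-11-15', '1998-11'), (2, '1998-12-01', '1998-12');
    INSERT INTO dw_product  VALUES (1, 'Lager', 'Beer'), (2, 'Espresso', 'Coffee'), (3, 'Chips', 'Snacks');
    INSERT INTO dw_location VALUES (1, 'Dallas'), (2, 'Austin');
    INSERT INTO dw_sales VALUES
        (1, 1, 1, 100.0),  -- beer,   November, Dallas  (matches)
        (1, 2, 1,  40.0),  -- coffee, November, Dallas  (matches)
        (1, 3, 1,  15.0),  -- snacks: wrong product
        (1, 1, 2,  70.0),  -- Austin: wrong city
        (2, 1, 1,  90.0),  -- December: wrong month
        (2, 2, 1,  30.0);  -- December: wrong month
""")

# Shape 1: ordinary star join.
join_total = conn.execute("""
    SELECT SUM(f.sales_amt)
    FROM dw_sales f
    JOIN dw_period   t ON t.period_id   = f.period_id
    JOIN dw_product  p ON p.product_id  = f.product_id
    JOIN dw_location l ON l.location_id = f.location_id
    WHERE t.month = '1998-11'
      AND p.category IN ('Beer', 'Coffee')
      AND l.city = 'Dallas'
""").fetchone()[0]

# Shape 2: each dimension filter becomes a key list applied to the fact table.
subquery_total = conn.execute("""
    SELECT SUM(sales_amt)
    FROM dw_sales
    WHERE period_id   IN (SELECT period_id   FROM dw_period   WHERE month = '1998-11')
      AND product_id  IN (SELECT product_id  FROM dw_product  WHERE category IN ('Beer', 'Coffee'))
      AND location_id IN (SELECT location_id FROM dw_location WHERE city = 'Dallas')
""").fetchone()[0]
print(join_total, subquery_total)
```

SQLite does not perform a star transformation itself; the second query only illustrates the logical form, and both return the same total.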

    Typical User Query

    Best Explain Plan

    Star Transformation

    Best Explain Plan

    Oracle 8i Query Options

    Explain Plan               UNIX      NT
    Serial, No Partition       9,688     22,344
    Serial, with Partition     5,578     11,625
    Parallel, No Partition     ORA-600   ORA-600
    Parallel, with Partition   11,140    25,454

    NOTE: specific to my data and user queries

    Oracle 8i Materialized Views

    Way beyond the scope of dimensional modeling, but:

    Special form of snapshots (i.e. replication)

    End-users direct all queries against detail table

    Optimizer rewrites queries to use best aggregate

    Optimizer suggests new aggregates based on load

    Eliminates need for numerous aggregation programs
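The idea of rewriting a detail-table query to use the best aggregate can be modeled very roughly as follows. This is a toy selection rule, not Oracle's actual query rewrite algorithm: it assumes each candidate table advertises the columns it can answer and its row count, and it picks the smallest table that covers the query. All table names, column sets, and row counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str
    columns: frozenset   # grouping columns this table can answer
    row_count: int

def pick_source(needed, detail, aggregates):
    """Return the smallest table whose columns cover the query's needs.

    The detail table is assumed to cover every query, so it is always
    a candidate of last resort.
    """
    candidates = [a for a in aggregates if needed <= a.columns]
    return min(candidates + [detail], key=lambda t: t.row_count)

detail = Table(
    "dw_sales",
    frozenset({"day", "month", "year", "product", "category", "city"}),
    500_000_000,
)
aggregates = [
    Table("sales_by_month_product",
          frozenset({"month", "year", "product", "category"}), 60_000_000),
    Table("sales_by_year_category",
          frozenset({"year", "category"}), 1_000),
]

# The end-user always writes the query against the detail table;
# the chooser decides where it actually runs.
by_year = pick_source({"year", "category"}, detail, aggregates)
by_month = pick_source({"month", "category"}, detail, aggregates)
by_city = pick_source({"city", "month"}, detail, aggregates)
print(by_year.name, by_month.name, by_city.name)
```

The point of the sketch is the division of labor: users keep one mental model (the detail table), while the choice of aggregate stays with the optimizer and can change as aggregates are added or dropped.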

    Other DW Presentations

    Optimizing Data Warehouse Ad-Hoc Queries against "Star Schemas"

    Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures -- with some database design. This presentation provides the ground level, detailed recipe for successfully querying tables whose sizes exceed 500 million rows. Issues covered will include table and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database designs, have at least one year's experience with Oracle 8.0, and some exposure to Oracle 8i.

    Optimizing Data Warehouse Loading via Parallelized Pro-C and SQL

    Attendees will learn optimal techniques for coding, monitoring and tuning parallel loading of Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures -- with some database design. This presentation provides the ground level, detailed recipe for high speed loading of tables whose sizes exceed 500 million rows. Issues covered will include database instance options, table and index designs, partitioning options, optimizer choices, plus Oracle initialization parameters. Attendees should be DBAs or senior developers familiar with Oracle 8.X, Pro-C and SMP or MPP UNIX environments.

    THANK YOU FOR LISTENING