7/28/2019 Dim Modeling Paper-Revised
Successful Dimensional Modeling of Very Large Data Warehouses
By Bert Scalzo, Ph.D.
-
About the Author
Oracle DBA from version 4 through 8i
Worked for Oracle Education
Worked for Oracle Consulting
Holds several Oracle Masters
BS, MS and PhD in Computer Science
MBA and insurance industry designations
Articles in Oracle Magazine, Oracle Informant and PC Week (now E-Magazine)
-
About Quest Software
-
Know Your Application
What type of application are you building:
On Line Transaction Processing (OLTP)
Operational Data Store (ODS)
On Line Analytical Processing (OLAP)
Data Mart / Data Warehouse (DM/DW)
-
                OLTP                ODS                    OLAP           DM/DW
Business Focus  Operational         Operational, Tactical  Tactical       Tactical, Strategic
End User Tools  Client Server, Web  Client Server, Web     Client Server  Client Server, Web
DB Technology   Relational          Relational             Cubic          Relational
Trans Count     Large               Medium                 Small          Small
Trans Size      Small               Medium                 Medium         Large
Trans Time      Short               Medium                 Long           Long
Size in Gigs    10 - 200            50 - 400               50 - 400       400 - 4000
Normalization   3NF                 3NF                    N/A            0NF
Data Modeling   Traditional ER      Traditional ER         N/A            Dimensional
-
Embrace New Concepts
Teach Old Dog New Tricks
Throw out any OLTP baggage
Forget OLTP Golden Rules
-
Star Schema Design
The star schema approach to dimensional data modeling was pioneered by Ralph Kimball
Dimensions: smaller, de-normalized tables containing business descriptive columns that end-users query on
Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and possessing numerically additive, non-key columns used for calculations during end-user queries
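A minimal sketch of these two definitions, using Python's built-in sqlite3 as a stand-in for Oracle. The dw_sales table and all column names are assumptions made for illustration; only the dw_period and dw_product table names appear elsewhere in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_period (               -- dimension (hypothetical columns)
    period_id   INTEGER PRIMARY KEY,
    day_date    TEXT,
    month_name  TEXT,
    year_num    INTEGER
);
CREATE TABLE dw_product (              -- dimension (hypothetical columns)
    product_id  INTEGER PRIMARY KEY,
    item_name   TEXT,
    category    TEXT
);
CREATE TABLE dw_sales (                -- fact (hypothetical table)
    period_id   INTEGER NOT NULL,      -- key into dw_period
    product_id  INTEGER NOT NULL,      -- key into dw_product
    units_sold  INTEGER NOT NULL,      -- additive, non-key measure
    sales_amt   REAL    NOT NULL,      -- additive, non-key measure
    PRIMARY KEY (period_id, product_id)  -- concatenated dimension keys
);
""")
conn.executemany("INSERT INTO dw_period VALUES (?, ?, ?, ?)",
                 [(1, "1998-11-01", "NOV", 1998), (2, "1998-11-02", "NOV", 1998)])
conn.executemany("INSERT INTO dw_product VALUES (?, ?, ?)",
                 [(10, "Lager", "BEER"), (20, "Espresso", "COFFEE")])
conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 5, 25.0), (1, 20, 3, 9.0), (2, 10, 7, 35.0)])

# A second row for the same (period, product) pair violates the fact key.
try:
    conn.execute("INSERT INTO dw_sales VALUES (1, 10, 1, 5.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Additive measures can be summed across any combination of dimensions.
total_units = conn.execute("SELECT SUM(units_sold) FROM dw_sales").fetchone()[0]
print(duplicate_rejected, total_units)
```

The composite key is declared as a constraint here only to make the definition visible; it is not meant as physical design advice for a large fact table.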
-
Dimensions
Facts
-
[Figure: star schema diagram with size labels 10^8 - 10^10 and 10^3 - 10^5]
-
Transform OLTP Model
Fold OLTP model into itself to form a Star:
De-Normalize parent/child relationships
De-Normalize lookup relationships
Use surrogate or meaningless keys
Create and populate a time dimension
Create hierarchies of data in dimensions
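One way the time dimension step might look, sketched with Python's sqlite3 rather than Oracle. The column names and the choice of one row per calendar day are assumptions; the surrogate key is a plain counter with no business meaning, and the week, month, quarter and year columns carry the hierarchy in de-normalized form.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dw_period (
    period_id  INTEGER PRIMARY KEY,  -- surrogate (meaningless) key
    day_date   TEXT    NOT NULL,
    week_num   INTEGER NOT NULL,     -- ISO week number
    month_num  INTEGER NOT NULL,
    quarter    INTEGER NOT NULL,
    year_num   INTEGER NOT NULL
)""")

def period_rows(start, end):
    """Yield one dimension row per calendar day from start to end inclusive."""
    day, key = start, 1
    while day <= end:
        yield (key, day.isoformat(), day.isocalendar()[1],
               day.month, (day.month - 1) // 3 + 1, day.year)
        day += timedelta(days=1)
        key += 1

conn.executemany("INSERT INTO dw_period VALUES (?, ?, ?, ?, ?, ?)",
                 period_rows(date(1998, 1, 1), date(1998, 12, 31)))

day_count = conn.execute("SELECT COUNT(*) FROM dw_period").fetchone()[0]
q4_days = conn.execute(
    "SELECT COUNT(*) FROM dw_period WHERE quarter = 4").fetchone()[0]
print(day_count, q4_days)
```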
-
OLTP Model
-
Dimensional Model
-
Dimension Hierarchies
SQL> select distinct levelx from dw_period;

LEVELX
--------------------
DAY
MONTH
QUARTER
WEEK
YEAR

SQL> select distinct levelx from dw_product;

LEVELX
--------------------
ALL PRODUCTS
CATEGORY
ITEM
PSA
SUB_CATEGORY
-
Avoid Snowflakes
Avoid natural desire to normalize model:
Complicates end-user query construction
Adds a level of JOIN complexity
Database optimizers do not handle snowflakes very well
Saves some space at the cost of longer queries
-
Snowflake Model
-
Common Aggregations
Build end-user driven aggregate tables:
By time (e.g. week, month, quarter, year)
By geographic regions (e.g. time zones)
By end-user reporting interests (e.g. beer)
By dimension hierarchy (e.g. product category)
Aggregates should be 5 to 10 times smaller
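A rough sketch of a time aggregate and its size ratio, again using Python's sqlite3 in place of Oracle. The tables, columns and synthetic data are all assumptions: 56 consecutive days, two products, one sale per product per day, rolled up from day to week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_period (period_id INTEGER PRIMARY KEY, week_num INTEGER NOT NULL);
CREATE TABLE dw_sales  (period_id  INTEGER NOT NULL,
                        product_id INTEGER NOT NULL,
                        units_sold INTEGER NOT NULL);
""")
# Synthetic data: period_id 1..56 are consecutive days, 7 days per week.
conn.executemany("INSERT INTO dw_period VALUES (?, ?)",
                 [(d, (d - 1) // 7 + 1) for d in range(1, 57)])
conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)",
                 [(d, p, 1) for d in range(1, 57) for p in (10, 20)])

# Weekly aggregate: same additive measure at a coarser time grain.
conn.execute("""
CREATE TABLE dw_sales_week AS
SELECT t.week_num, f.product_id, SUM(f.units_sold) AS units_sold
FROM dw_sales f
JOIN dw_period t ON t.period_id = f.period_id
GROUP BY t.week_num, f.product_id
""")

detail_rows = conn.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM dw_sales_week").fetchone()[0]
agg_units = conn.execute("SELECT SUM(units_sold) FROM dw_sales_week").fetchone()[0]
ratio = detail_rows / agg_rows
print(detail_rows, agg_rows, ratio, agg_units)
```

With this synthetic data the weekly table is 7 times smaller than the detail table, inside the 5 to 10 times range; real ratios depend on how sparse the detail data is.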
-
Time Aggregates
-
Non-Time Aggregates
-
Index Design
All fact table foreign key columns must have individual bitmap indexes on them
All dimension table non-key columns should have individual bitmap indexes
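Why one bitmap index per column helps can be shown with a toy model in plain Python. This is a conceptual sketch only, not how Oracle stores or compresses bitmaps; the column names and data are made up. Each distinct value gets a bitmap with one bit per row, and predicates on different columns combine with cheap bitwise AND and OR.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact rows: two foreign key columns, one entry per row.
product_fk = [10, 20, 10, 30, 20, 10]
store_fk = [1, 1, 2, 2, 1, 1]

product_idx = build_bitmap_index(product_fk)
store_idx = build_bitmap_index(store_fk)

# Equivalent of: WHERE product_fk IN (10, 20) AND store_fk = 1
hits = (product_idx[10] | product_idx[20]) & store_idx[1]
matching_rows = rows_of(hits)
print(matching_rows)
```

Because each column has its own index, any ad-hoc combination of dimension predicates can be answered by merging bitmaps before touching the fact rows.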
-
10 B-Tree Indexes
-
48 Bitmap Indexes!!!
-
Key Fact Table Issues
Fact tables should:
NOT create or enable foreign key constraints
NOT create or enable table check constraints
NOT create or enable primary/unique constraints (use unique indexes, which offer parallel creation)
NOT create or enable column check constraints (other than simple NOT NULL check constraints)
NOT create or enable row level triggers
NOT enable logging on tables or their indexes
-
No PK/UK/FK Constraints
-
Key Oracle Issues
Trust me: there is no way to build a large DW in Oracle 7.X
Very brief overview in next few slides of:
Partitioning options
Indexing options
Comparative timings
Tuning ad-hoc Star queries
Serial versus Parallel queries
Materialized Views
-
Oracle Partitioning
Way beyond the scope of dimensional modeling, but:
Use Range or List Partitioning using your time dimension
Fact unique index = local, prefixed b-tree index
Fact time index = local, prefixed bitmap index
Fact non-time index = local, non-prefixed bitmap index
If any non-time dimension provides good locality of reference for typical user queries, then sub-partition on that dimension (i.e., use 8i's new composite partitioning)
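The payoff of partitioning the fact table on the time dimension can be illustrated with a toy model in plain Python. This only simulates the idea of partition pruning and is not Oracle syntax or behavior; the rows and the month-based partition key are assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows: (day_date, product_id, sales_amt)
fact_rows = [
    ("1998-10-30", 10, 40.0),
    ("1998-11-02", 10, 50.0),
    ("1998-11-15", 20, 20.0),
    ("1998-12-01", 10, 30.0),
]

def partition_key(day_date):
    """Range-partition by calendar month: 'YYYY-MM-DD' -> 'YYYY-MM'."""
    return day_date[:7]

# Each month's rows live in their own partition.
partitions = defaultdict(list)
for row in fact_rows:
    partitions[partition_key(row[0])].append(row)

def month_sales(month):
    """Sum sales for one month, touching only that month's partition."""
    scanned = partitions.get(month, [])
    return sum(amt for _, _, amt in scanned), len(scanned)

total, rows_scanned = month_sales("1998-11")
print(total, rows_scanned, len(fact_rows))
```

A query restricted to November reads two rows instead of four here; on a fact table with years of history the same effect skips most of the data.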
-
Indexing Options!!!
[Figure: decision tree of Oracle table and index options (table in cluster vs. table in tablespace, heap vs. index organized, partitioned vs. non-partitioned tables and indexes, global vs. local indexes), ending in 12 numbered index choices: B-tree (1, 2, 4, 6, 7, 9, 10, 12) and bitmap (3, 5, 8, 11)]
-
Oracle 8i Table Option Timings
Fact Implementation        Timing
Regular Heap Table         9,293
Single Column Partition    4,747
Multi Column Partition     4,987
Composite Partition        6,319
Index Organized Table      12,508
Partition Index Organized  14,902
NOTE: specific to my data and user queries
-
Tuning Star Queries
Way beyond the scope of dimensional modeling, but:
Use Oracle 8.X's Range Partitioning based upon your time dimension (do not try to use hash or composite partitioning)
Fact unique index uses local, prefixed b-tree index
Fact time index uses local, prefixed bitmap index
Fact non-time index uses local, non-prefixed bitmap index
Typical User Query
-
Query: beer and coffee sales for November of '98 in Dallas
Typical User Query
Best Explain Plan
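The beer and coffee query might be written against a star schema roughly as follows. This sketch runs in Python's sqlite3, not Oracle, so it shows the shape of the query and not the explain plan; the dw_location and dw_sales tables and all column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_period   (period_id INTEGER PRIMARY KEY, month_num INTEGER, year_num INTEGER);
CREATE TABLE dw_product  (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dw_sales    (period_id INTEGER, product_id INTEGER,
                          location_id INTEGER, sales_amt REAL);
INSERT INTO dw_period   VALUES (1, 10, 1998), (2, 11, 1998);
INSERT INTO dw_product  VALUES (10, 'BEER'), (20, 'COFFEE'), (30, 'TEA');
INSERT INTO dw_location VALUES (100, 'DALLAS'), (200, 'AUSTIN');
INSERT INTO dw_sales VALUES
    (2, 10, 100, 50.0),   -- beer, Nov 98, Dallas
    (2, 20, 100, 20.0),   -- coffee, Nov 98, Dallas
    (2, 30, 100,  5.0),   -- tea: filtered out by product
    (1, 10, 100, 40.0),   -- October: filtered out by period
    (2, 10, 200, 30.0);   -- Austin: filtered out by location
""")

# Filters apply only to the small dimension tables; the fact table is
# reached through its foreign key columns.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.sales_amt)
    FROM dw_sales f
    JOIN dw_period   t ON t.period_id   = f.period_id
    JOIN dw_product  p ON p.product_id  = f.product_id
    JOIN dw_location l ON l.location_id = f.location_id
    WHERE t.year_num = 1998 AND t.month_num = 11
      AND p.category IN ('BEER', 'COFFEE')
      AND l.city = 'DALLAS'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(star_rows)
```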
-
Star Transformation
Best Explain Plan
-
Oracle 8i Query Options
Explain Plan              UNIX     NT
Serial, No Partition      9,688    22,344
Serial, with Partition    5,578    11,625
Parallel, No Partition    ORA-600  ORA-600
Parallel, with Partition  11,140   25,454
NOTE: specific to my data and user queries
-
Oracle 8i Materialized Views
Way beyond the scope of dimensional modeling, but:
Special form of snapshots (i.e. replication)
End-users direct all queries against detail table
Optimizer rewrites queries to use best aggregate
Optimizer suggests new aggregates based on load
Eliminates need for numerous aggregation programs
-
Other DW Presentations
Optimizing Data Warehouse Ad-Hoc Queries against "Star Schemas"
Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data
Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing
with Oracle, they generally provide a 50,000-foot overview focusing on hardware and software
architectures, with some database design. This presentation provides the ground-level, detailed recipe
for successfully querying tables whose sizes exceed 500 million rows. Issues covered will include table
and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and
star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database
designs, have at least one year's experience with Oracle 8.0, and some exposure to Oracle 8i.
Optimizing Data Warehouse Loading via Parallelized Pro-C and SQL
Attendees will learn optimal techniques for coding, monitoring and tuning parallel loading of Data
Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing
with Oracle, they generally provide a 50,000-foot overview focusing on hardware and software
architectures, with some database design. This presentation provides the ground-level, detailed recipe
for high speed loading of tables whose sizes exceed 500 million rows. Issues covered will include
database instance options, table and index designs, partitioning options, optimizer choices, plus Oracle
initialization parameters. Attendees should be DBAs or senior developers familiar with Oracle 8.X,
Pro-C and SMP or MPP UNIX environments.
-
THANK YOU FOR LISTENING