dimensional modelling session 2
- 1. Dimensional Modeling (2): Gregory Ng, Data Warehouse / Business Intelligence Designer, 17th March 2008
- 2. Dimension Model vs. ER Model
- ER Model:
- Normalization (up to 6NF) to remove redundancy and update anomalies and to improve integrity
- Three major types of relationship: one-to-one, one-to-many and many-to-many
- Optimized for INSERT, UPDATE and DELETE operations
- i.e. ideal for OLTP applications (high volumes of small transactions)
- Things to consider:
- ER does not really model a business; rather, it models the micro-relationships among data elements
- Query optimization (analytical queries must navigate many joins across normalized tables)
- 3. Dimension Model vs. ER Model (cont)
- Dimension Model:
- Denormalized to 2NF (reduces the number of tables and join paths), which creates redundancy
- One major type of relationship: one-to-many
- Ideal for SELECT operations
- Top-down approach: focus on the business process
- Designed to support analytical queries and user access
- Anomalies handled within ETL
- Predictable SQL
- Ideal for OLAP applications
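A minimal sketch of the "one-to-many, predictable SQL" shape of a star schema, run in an in-memory SQLite database. The table names (FactSales, DimCustomer, DimDate), columns and rows are hypothetical, chosen only for illustration: every dimension joins to the fact on a single surrogate key, and every report query follows the same join, group and aggregate pattern.

```python
import sqlite3

# Hypothetical star schema: one fact table, each dimension related to it
# by a single one-to-many surrogate-key relationship.
DDL = """
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT, Region TEXT);
CREATE TABLE DimDate     (DateKey INTEGER PRIMARY KEY, CalendarMonth TEXT);
CREATE TABLE FactSales   (CustomerKey INTEGER REFERENCES DimCustomer,
                          DateKey INTEGER REFERENCES DimDate,
                          SalesAmount REAL);
"""

def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)",
                     [(1, "Alice", "North"), (2, "Bob", "South")])
    conn.executemany("INSERT INTO DimDate VALUES (?, ?)",
                     [(20080301, "2008-03"), (20080401, "2008-04")])
    conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                     [(1, 20080301, 100.0), (1, 20080401, 50.0), (2, 20080301, 70.0)])
    return conn

# The predictable query shape: join the fact to the dimensions it needs,
# group on dimension attributes, aggregate the fact measures.
STAR_QUERY = """
SELECT d.CalendarMonth, c.Region, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey
JOIN DimDate d     ON f.DateKey = d.DateKey
GROUP BY d.CalendarMonth, c.Region
ORDER BY d.CalendarMonth, c.Region
"""

if __name__ == "__main__":
    for row in build_demo().execute(STAR_QUERY):
        print(row)
```

Adding another report usually means changing only the dimension attributes in the SELECT and GROUP BY. The join skeleton stays the same, which is what makes star-schema SQL predictable.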
- 4. Case Study 1: Project Writeaway (2009)
- Database: SQL Server 2000
- Reporting: Hyperion IR
- Star schemas: 4
- No. of records: ~1 million
- Load: complete refresh
- Typical report generation time: ~3 seconds
- Project build time: 4 months
- Highlights: Drill Across, Factless Fact Table, Dimension Outrigger, Dimension Bridging, Junk Dimension
- 5. Case Study 2: Project Absenteeism (2006)
- Database: SQL Server 2000
- Reporting: Cognos
- Star schemas: 1
- No. of records: ~1 million
- Load: incremental
- Typical report generation time: ~25 seconds
- Project build time: 4 weeks
- Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse
- 6. Case Study 3: Project Mortgage Wealth DNA (2009)
- Database: Teradata
- Reporting: Hyperion IR
- Star schemas: 3
- No. of records: ~150 million
- Load: incremental
- Typical report generation time: ~20-30 seconds
- Project build time: 3 months
- Highlights: Drill Across, Aggregate Join Index, Partitioning/Multi-Partitioning; 99% of aggregation done on Teradata on the fly to minimise data retrieval
- 7. Case Study 4: Project Commway (2005)
- Database: SQL Server 2000
- Reporting: Cognos
- Star schemas: 3
- No. of records: ~10 million
- Load: incremental
- Typical report generation time: ~30 seconds
- Project build time: 18 months
- Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse, .NET front end for data entry (4000+ users)
- 8. Skills we have now
- Dimensional modeling techniques/templates for different processes and subject areas
- Practiced appropriate dimensional modeling techniques in different scenarios: Conformed Dimension, Junk Dimension, Outrigger Dimension, Rapidly Changing Dimension, Dimension Bridging, Degenerate Dimension, Accumulating Snapshot Fact Table, Late-Arriving Fact, Factless Fact Table
- Refined ETL coding techniques: fact-to-dimension foreign key lookup via natural key; source staging/staging/helper/interim table methodology
- Data Warehouse architecture for Dimensional Modeling
- Dimensional Modeling Workshop procedures
- ETL mapping documentation
- Reporting with the dimensional model: multi-pass SQL
- Practiced star-schema-friendly Teradata features: Aggregate Join Index (AJI), partitioning, multi-partitioning
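A minimal sketch of the fact-to-dimension foreign key lookup via natural key listed above. It assumes the dimension is already loaded as (surrogate key, natural key) pairs. The `StagedSale` record, the function names and the `-1` "Unknown" member are hypothetical conventions chosen for illustration, not necessarily the team's ETL code. Each staged fact's business key is swapped for the dimension's surrogate key, and unmatched keys are routed to the Unknown member instead of being dropped.

```python
from dataclasses import dataclass

# Assumed convention: a pre-loaded "Unknown" dimension member with key -1.
UNKNOWN_KEY = -1

@dataclass
class StagedSale:
    customer_code: str   # natural (business) key from the source system
    amount: float

def build_lookup(dim_rows):
    """Map natural key -> surrogate key from (surrogate_key, natural_key) pairs."""
    return {natural: surrogate for surrogate, natural in dim_rows}

def resolve_fact_keys(staged, lookup):
    """Replace each fact's natural key with the dimension surrogate key.

    Unmatched natural keys are assigned UNKNOWN_KEY and reported, so fact
    totals still reconcile with the source while the misses are investigated.
    """
    facts, rejects = [], []
    for row in staged:
        key = lookup.get(row.customer_code)
        if key is None:
            rejects.append(row.customer_code)
            key = UNKNOWN_KEY
        facts.append((key, row.amount))
    return facts, rejects
```

For example, with a dimension of `[(1, "C001"), (2, "C002")]`, a staged row for `"C999"` is loaded against key `-1` and reported in `rejects`. Loading it this way keeps the fact total correct while the missing dimension member is followed up, instead of silently losing the row.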
- 9. Technologies we have now
- State-of-the-art Teradata hardware
- GDW in third normal form (3NF)
- Essbase Studio (EIS)
- DataStage
- Oracle Grid coming online?
- OBIEE
- 10. Shared Dimension (Conformed) and Drill Across
- Drilling across facts from different business processes is enabled by conformed dimensions
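- 11. Shared Dimension (Conformed) and Drill Across (cont)
- To produce the drill-across report below, aggregate each fact to the conformed Customer grain in its own subquery, then join the two result sets (table, column and key names such as CustomerKey are illustrative):

    SELECT Act.Customer, Act.ActualAmount, Fcst.ForecastAmount
    FROM
      -- Subquery Act returns actuals per customer
      (SELECT c.Customer, SUM(s.SalesAmount) AS ActualAmount
       FROM SalesFact s
       JOIN Customer c ON s.CustomerKey = c.CustomerKey
       GROUP BY c.Customer) Act
    INNER JOIN
      -- Subquery Fcst returns forecasts per customer
      (SELECT c.Customer, SUM(f.ForecastAmount) AS ForecastAmount
       FROM ForecastFact f
       JOIN Customer c ON f.CustomerKey = c.CustomerKey
       GROUP BY c.Customer) Fcst
      -- Join the two result sets on the conformed Customer attribute
      ON Act.Customer = Fcst.Customer;

- Resulting report:

    Customer      Actual Amount   Forecast Amount
    Bill Owen     $76859          $75768
    James Brown   $63548          $85676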
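The drill-across query above can be run end to end against an in-memory SQLite database. This is a minimal sketch: the table names, keys and amounts are hypothetical sample data, not the figures from the slide.

```python
import sqlite3

DDL = """
CREATE TABLE Customer     (CustomerKey INTEGER PRIMARY KEY, Customer TEXT);
CREATE TABLE SalesFact    (CustomerKey INTEGER, SalesAmount REAL);
CREATE TABLE ForecastFact (CustomerKey INTEGER, ForecastAmount REAL);
"""

# Each fact is aggregated to the conformed Customer grain on its own,
# and only then are the two result sets joined (multi-pass SQL).
DRILL_ACROSS = """
SELECT Act.Customer, Act.ActualAmount, Fcst.ForecastAmount
FROM (SELECT c.Customer, SUM(s.SalesAmount) AS ActualAmount
      FROM SalesFact s JOIN Customer c ON s.CustomerKey = c.CustomerKey
      GROUP BY c.Customer) AS Act
INNER JOIN
     (SELECT c.Customer, SUM(f.ForecastAmount) AS ForecastAmount
      FROM ForecastFact f JOIN Customer c ON f.CustomerKey = c.CustomerKey
      GROUP BY c.Customer) AS Fcst
  ON Act.Customer = Fcst.Customer
ORDER BY Act.Customer
"""

def drill_across():
    """Load hypothetical sample rows and return the drill-across report."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO SalesFact VALUES (?, ?)", [(1, 40.0), (1, 60.0), (2, 30.0)])
    conn.executemany("INSERT INTO ForecastFact VALUES (?, ?)", [(1, 90.0), (2, 50.0)])
    return conn.execute(DRILL_ACROSS).fetchall()

if __name__ == "__main__":
    for row in drill_across():
        print(row)
```

Aggregating before joining is the essential design choice. Joining SalesFact and ForecastFact row by row through Customer would multiply rows whenever a customer has several rows in both facts, and that would inflate both sums. The INNER JOIN also keeps only customers present in both facts; an outer join would be needed to show customers that have actuals but no forecast.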
- 12. Junk Dimension
- Groups miscellaneous flags and indicators into a single dimension
- Cleans up a cluttered design that already has too many dimensions
- In the slide's example, four indicators are collapsed into a single integer surrogate key in the fact table
- Provides a smaller, quicker point of entry for queries (probably less relevant for databases with bitmap indexes, e.g. Oracle)
- See also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions), http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf
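A minimal sketch of how a junk dimension can be pre-built for four yes/no indicators, assuming every combination is enumerated up front. The four flag names are hypothetical, since the slide's own example is not reproduced here. Each fact row then carries one junk key instead of four separate flag columns or four tiny dimensions.

```python
from itertools import product

# Hypothetical indicators; the slide's example also used four.
FLAGS = ("is_online", "is_promo", "is_gift", "is_rush")

def build_junk_dimension():
    """Enumerate all 2**len(FLAGS) Y/N combinations and assign surrogate keys 1..N."""
    return {combo: key
            for key, combo in enumerate(product("NY", repeat=len(FLAGS)), start=1)}

def junk_key(junk_dim, **flag_values):
    """Return the single junk-dimension key for one fact row's flag values."""
    combo = tuple("Y" if flag_values[name] else "N" for name in FLAGS)
    return junk_dim[combo]

if __name__ == "__main__":
    dim = build_junk_dimension()
    print(len(dim), junk_key(dim, is_online=True, is_promo=False,
                             is_gift=False, is_rush=True))
```

Four binary flags yield only 16 rows, so the junk dimension stays tiny however large the fact table grows. With many high-cardinality flags, a common alternative is to insert only the combinations actually seen in the data.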
- 13. Accumulating Snapshot Schema
- Useful for tracking a multi-step business process: the process history is captured in a single row
- Designed to ease query design and improve query performance
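A minimal sketch of an accumulating snapshot row, assuming a hypothetical order-fulfilment process with three milestones (ordered, shipped, delivered). The class and field names are illustrative. Unlike a transaction fact, the same row is updated in place as each milestone occurs, so lag measures come from a single row with no self-joins.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderSnapshot:
    """One row per order; milestone dates are filled in as the process advances."""
    order_id: str
    ordered: date
    shipped: Optional[date] = None
    delivered: Optional[date] = None

    def record(self, milestone: str, when: date) -> None:
        # Accumulating snapshots revisit and update the same row.
        if milestone not in ("shipped", "delivered"):
            raise ValueError(f"unknown milestone: {milestone}")
        setattr(self, milestone, when)

    def days_to_ship(self) -> Optional[int]:
        return (self.shipped - self.ordered).days if self.shipped is not None else None

    def days_order_to_delivery(self) -> Optional[int]:
        if self.delivered is None:
            return None
        return (self.delivered - self.ordered).days
```

A query such as "average days from order to delivery" then reads one column pair per row. The price is that the ETL must find and update existing fact rows rather than only appending new ones.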
- 14. Roadmap
- Conformed Dimensions (Product, Department, Date) with full Slowly Changing Dimension (SCD) capability
- Best practice ETL (error handling, batch controls, slowly changing dimension ETL, foreign key lookup, surrogate key assignment, entity start/end date generation, naming standards)
- Star Schema design review process (we build it and we kill it until it can't be killed!)
- Dimensional Modeling training
- Code generator: DataStage, Oracle Warehouse Builder?
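A minimal sketch of Type 2 slowly changing dimension handling, covering three of the ETL items on the roadmap: surrogate key assignment, start/end date generation, and versioning on change. It assumes a hypothetical Product dimension with `category` as the tracked attribute, an open-ended `9999-12-31` end date for the current row, and half-open validity intervals in which the old row's end date equals the new row's start date. Some teams instead end the old row one day earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

OPEN_END = date(9999, 12, 31)  # assumed convention marking the current version

@dataclass
class ProductRow:
    surrogate_key: int
    natural_key: str
    category: str        # the tracked (Type 2) attribute in this sketch
    start_date: date
    end_date: date = OPEN_END

def apply_scd2(dim: List[ProductRow], natural_key: str, category: str, as_of: date) -> None:
    """Add a new version when the tracked attribute changes; expire the old one."""
    current: Optional[ProductRow] = next(
        (r for r in dim if r.natural_key == natural_key and r.end_date == OPEN_END), None)
    if current is not None and current.category == category:
        return  # unchanged: keep the current version
    if current is not None:
        current.end_date = as_of  # validity is [start_date, end_date)
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(ProductRow(next_key, natural_key, category, as_of))
```

Facts loaded after the change look up the new surrogate key, while older facts keep pointing at the version that was valid when they occurred. That is what preserves history in a Type 2 dimension.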
- 15. Myth busted
- Myth: Teradata does not support Star Schema
- Myth: Star Schema cannot support large volumes of data
- Column-Store vs. Row-Store: "...column-store is able to process column-oriented data so effectively... finding that late materialization improves performance by a factor of three... compression provides about a factor of two on average" [1]
- [1] D. J. Abadi, S. R. Madden, N. Hachem, "Column-Stores vs. Row-Stores: How Different Are They Really?", in Proc. SIGMOD '08, 2008.
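A toy illustration of why column orientation helps compression. The rows are hypothetical, and this does not reproduce how the systems studied in [1] are implemented. Storing each column contiguously puts repeated values side by side, so a simple run-length encoding can shrink a low-cardinality column, and a query that sums one measure reads only that column.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical fact rows in row-store order: (region, product, amount)
ROWS = [("North", "A", 10), ("North", "B", 20), ("North", "A", 5), ("South", "A", 7)]

def to_columns(rows):
    """Row store -> column store: one list per column instead of one tuple per row."""
    return [list(col) for col in zip(*rows)]

def rle(column) -> List[Tuple[object, int]]:
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

if __name__ == "__main__":
    region, product, amount = to_columns(ROWS)
    print(rle(region))   # runs of repeated region values
    print(sum(amount))   # only the amount column is read
```

Runs are longest when the column is sorted or clustered on that value. This is one reason column stores tend to compress dimension-like attributes far better than row stores, where each row interleaves values from every column.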
- 16. The road is long but we won't get lost!
- Books are on the way to our library!
- The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball)
- Building the Data Warehouse (William H. Inmon)
- Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance (Christopher Adamson)
- The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Ralph Kimball)
- Online materials (Kimball Group, http://www.kimballgroup.com)
- Bus Matrix Diagram
- Some more interesting academic papers/research on my desk!
- 17. Bus Matrix
- 18. Previous presentations