© 2013 Genesee Academy, LLC
25568 Genesee Trail Rd Golden, Colorado 80401
(303) 526-0340
Data Vault Modeling and Approach
DW2.0 and Unstructured Data
Master Data Management and Metadata
Data Vault & Ensemble Modeling
BI Podium Next Generation DWH Modeling 2013
Hans Hultgren
gohansgo
Data Vault & Ensemble Modeling
• Welcome
• Quick audience poll:
  – Data Warehousing Business Intelligence
  – Data Vault Modeling
  – Certification Course
• Session will cover:
  – Data Vault
  – Ensemble
  – Unified Decomposition
  – Data Warehousing
  – Agility
• More information
Data Vault and Ensemble Modeling
• About Data Warehousing - Characteristics
Each layer of the architecture has its own requirements, constraints & variables
Data Vault and Ensemble Modeling
• Why do we need it?
Each layer of the architecture has its own requirements, constraints & variables
About Data Vault, Ensemble & the EDW
• Enterprise Data Warehousing
• Integrated, Non-Volatile, Time-Variant, Subject/Concept Oriented, Central data store.
• Core Features: Enterprise-Wide, Historized, Auditable, Central Data, Integrated across all forms of sources internal and external.
Why do we use Data Vault
• Integration
• Traceability
• History
• Incremental Build
• Agility
• Gracefully Adapts to New Sources
• Full Auditability - Source to Mart
• Enterprise View of Central Data
• Data Vault is optimized for modeling the EDW
Data Vault & Ensemble Modeling
• Data Vault is the leading data modeling approach among new options for the flexible/agile data warehouse.
Data Modeling Approaches:
  – Operational: 3rd Normal Form
  – Data Warehouse: Data Vault
  – Data Mart: Dimensional
• For data warehouse agility there are other techniques as well. The techniques in this broader family are all flavors of Ensemble Modeling.
• In effect Ensemble modeling = EDW modeling.
• Ensemble is based on the premise: The flexibility required by the data warehouse needs a model that de-couples changing context, relationships and business keys from one another (Unified Decomposition).
Agenda
• Background Topics:
  – Core Business Concepts
  – Agility
• Unified Decomposition
• Ensemble Modeling
• Data Vault Agility
• The Data Vault Ensemble
• Data Vault Core Constructs
• Applying Data Vault
• Core Concepts and the Backbone
• DV Pattern applied
• Bottom Line and Summary
INTEGRATION & THE CORE BUSINESS CONCEPT
The Core Business Concept
• The Core Business Concept is the basis for our Data Vault Data Warehouse. It is similar to the Entity in 3NF or a Dimension in a Star Schema. Common examples include Customer, Product and Employee.
• Important to note: 1) Business Driven, and 2) Enterprise Wide.
ABOUT AGILITY
Agile Data Warehousing BI
• Agility = Measure of ability to Adapt to Change
• The EDW constantly needs to adapt to change:
  – New Sources
  – New Attributes
  – Changing Sources
  – New and Changing Requirements
  – New and Changing Business Rules
  – New and Changing Deliveries
  – Expanding Subject Areas
Adapting to Change = Data Warehousing
UNIFIED DECOMPOSITION™
Unified Decomposition™
Separate things that change from things that are not changing.
• Break things out into component parts for flexibility and to facilitate the capture of things that are either interpreted in different ways or changing independently of each other. Decomposition.
• These parts however need to be integrated to define the core business concept (the Entity, the Dimension, etc.). So they must be kept together. Unified.
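The split described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Ensemble` class, the `decompose` function and the sample field names (`customer_no`, `customer_class`, and so on) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ensemble:
    """All parts of one Core Business Concept instance, unified by its key."""
    business_key: str
    context: dict = field(default_factory=dict)        # descriptive data that changes
    relationships: dict = field(default_factory=dict)  # references to other concepts


def decompose(record, key_field, relationship_fields):
    """Split a flat source record into key, relationships and context."""
    relationships = {f: record[f] for f in relationship_fields if f in record}
    context = {
        f: v for f, v in record.items()
        if f != key_field and f not in relationship_fields
    }
    return Ensemble(record[key_field], context, relationships)


# Hypothetical flat customer record from a source system.
flat = {
    "customer_no": "C-1001",
    "name": "Acme",
    "city": "Golden",
    "customer_class": "GOLD",
}
customer = decompose(flat, "customer_no", ["customer_class"])
print(customer)
```

The parts are stored separately (decomposition) but all hang off the same business key (unified), so context can change without touching the key or the relationships.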
Ensemble Modeling™
All the parts of a thing taken together, so that each part is considered only in relation to the whole.
• The constellation of component parts acts as a whole – an Ensemble.
• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts.
• An Ensemble is based on all things defining a Core Business Concept that can be uniquely and specifically said for one instance of that Concept.
Data Vault Agility
• The Data Vault Ensemble conforms to a single key embodied in the Hub construct.
• The component parts for the Data Vault Ensemble include:
  – Hub: The Natural Business Key
  – Link: The Natural Business Relationships
  – Satellite: All Context, Descriptive Data and History
The Data Vault Ensemble
• Data Vault constructs have been broken out by type of data…
Hubs
– A Hub Construct in Data Vault
  • contains Business Key
  • only the Business Key
  • contains No Context
  • is always 1:1 with the EWBK (Enterprise-Wide Business Key)
– A Hub Table contains only
  • Business Key
  • Surrogate Key (Data Warehouse)
  • Load Date / Time Stamp
  • Record Source
Figure: hub table H_Customer with columns H_Customer_SID, Business Key, Date/Time Stamp, Record source
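A minimal sketch of the hub table, using Python's built-in sqlite3 module. The column names Business_Key, Load_DTS and Record_Source, the `load_hub` helper and the sample key and source names are assumptions made for illustration; the slide only names the table and its four kinds of column.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE H_Customer (
        H_Customer_SID INTEGER PRIMARY KEY,   -- data warehouse surrogate key
        Business_Key   TEXT NOT NULL UNIQUE,  -- the business key, no context
        Load_DTS       TEXT NOT NULL,         -- load date / time stamp
        Record_Source  TEXT NOT NULL
    )
""")


def load_hub(conn, business_key, record_source):
    """Insert the key only if it is new, then return its surrogate key."""
    conn.execute(
        "INSERT OR IGNORE INTO H_Customer (Business_Key, Load_DTS, Record_Source) "
        "VALUES (?, ?, ?)",
        (business_key, datetime.now(timezone.utc).isoformat(), record_source),
    )
    row = conn.execute(
        "SELECT H_Customer_SID FROM H_Customer WHERE Business_Key = ?",
        (business_key,),
    ).fetchone()
    return row[0]


# The same customer arriving from two hypothetical sources maps to one hub row.
sid_crm = load_hub(conn, "C-1001", "CRM")
sid_erp = load_hub(conn, "C-1001", "ERP")
hub_rows = conn.execute("SELECT COUNT(*) FROM H_Customer").fetchone()[0]
```

Because the hub holds only the key, loading the same customer from a second source adds nothing; the first record source seen is the one kept.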
Links
– A Link Construct in Data Vault
  • contains Relationship
  • only a Relationship
  • contains No Context
  • is always 1:1 with Relationship
– A Link Table contains only
  • 2-n FKs for the Relationship
  • Surrogate Key (Data Warehouse)
  • Load Date / Time Stamp
  • Record Source
Figure: link table L_Cust_Class with columns L_Cust_Class_SID, H_Sequence1_SID, H_Sequence2_SID, Date/Time Stamp, Record source
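A link table along these lines might look as follows in sqlite3. This is a sketch under assumptions: the two hub tables (H_Customer, H_Class), their rows, the Load_DTS and Record_Source column names and the `load_link` helper are hypothetical; only L_Cust_Class and its SID column names come from the slide.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Two minimal hubs so the link has something to reference (illustrative only).
    CREATE TABLE H_Customer (H_Customer_SID INTEGER PRIMARY KEY, Business_Key TEXT UNIQUE);
    CREATE TABLE H_Class    (H_Class_SID    INTEGER PRIMARY KEY, Business_Key TEXT UNIQUE);

    CREATE TABLE L_Cust_Class (
        L_Cust_Class_SID INTEGER PRIMARY KEY,                                   -- surrogate key
        H_Sequence1_SID  INTEGER NOT NULL REFERENCES H_Customer (H_Customer_SID),
        H_Sequence2_SID  INTEGER NOT NULL REFERENCES H_Class (H_Class_SID),
        Load_DTS         TEXT NOT NULL,
        Record_Source    TEXT NOT NULL,
        UNIQUE (H_Sequence1_SID, H_Sequence2_SID)                               -- one row per relationship
    );

    INSERT INTO H_Customer VALUES (1, 'C-1001');
    INSERT INTO H_Class    VALUES (1, 'GOLD');
""")


def load_link(conn, sid1, sid2, record_source):
    """Record the relationship once and return the link's surrogate key."""
    conn.execute(
        "INSERT OR IGNORE INTO L_Cust_Class "
        "(H_Sequence1_SID, H_Sequence2_SID, Load_DTS, Record_Source) "
        "VALUES (?, ?, ?, ?)",
        (sid1, sid2, datetime.now(timezone.utc).isoformat(), record_source),
    )
    return conn.execute(
        "SELECT L_Cust_Class_SID FROM L_Cust_Class "
        "WHERE H_Sequence1_SID = ? AND H_Sequence2_SID = ?",
        (sid1, sid2),
    ).fetchone()[0]


link_a = load_link(conn, 1, 1, "CRM")
link_b = load_link(conn, 1, 1, "CRM")  # same relationship again: no new row
link_rows = conn.execute("SELECT COUNT(*) FROM L_Cust_Class").fetchone()[0]
```

The link carries no descriptive columns; anything describing the relationship would belong in a satellite.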
Satellites
– A Satellite Construct in Data Vault
  • contains Context only
  • has no FKs (no relationships)
  • Designed by
    * Rate of Change
    * Type of Data
    * System…
– A Satellite Table contains only
  • Business Key FK +
  • Load Date / Time Stamp
  • Context Data…
  • Record Source
Figure: satellite table S_Customer with columns H_Customer, Date/Time Stamp, Context A, Context B, Context C, Context D, Record source
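One way a satellite can hold context and its history is sketched below with sqlite3. The context columns (Name, City), the change-detection rule in `load_satellite` and the sample dates are assumptions for illustration; the slide itself only lists the kinds of column a satellite table contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S_Customer (
        H_Customer_SID INTEGER NOT NULL,  -- FK to the hub (hub table omitted in this sketch)
        Load_DTS       TEXT    NOT NULL,  -- load date / time stamp
        Name           TEXT,              -- context data (hypothetical columns)
        City           TEXT,
        Record_Source  TEXT    NOT NULL,
        PRIMARY KEY (H_Customer_SID, Load_DTS)
    )
""")


def load_satellite(conn, sid, load_dts, name, city, record_source):
    """Append a new row only when the context differs from the latest row."""
    latest = conn.execute(
        "SELECT Name, City FROM S_Customer "
        "WHERE H_Customer_SID = ? ORDER BY Load_DTS DESC LIMIT 1",
        (sid,),
    ).fetchone()
    if latest == (name, city):
        return False  # nothing changed, keep history as it is
    conn.execute(
        "INSERT INTO S_Customer VALUES (?, ?, ?, ?, ?)",
        (sid, load_dts, name, city, record_source),
    )
    return True


load_satellite(conn, 1, "2013-01-01T00:00:00", "Acme", "Golden", "CRM")
load_satellite(conn, 1, "2013-02-01T00:00:00", "Acme", "Golden", "CRM")  # unchanged
load_satellite(conn, 1, "2013-03-01T00:00:00", "Acme", "Denver", "CRM")  # city changed

history = conn.execute(
    "SELECT Load_DTS, City FROM S_Customer "
    "WHERE H_Customer_SID = 1 ORDER BY Load_DTS"
).fetchall()
```

Existing rows are never updated; each change in context adds a row keyed by the hub reference plus the load time stamp, which is what keeps the history.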
Applying the data vault modeling pattern
Data Vault Model – How it Looks
Data Vault Model for Customer Sales with Employee and Product.
Core Concepts
Six (6) Concept Keys
Data Vault Backbone
Six (6) Concept Keys
The model as viewed:
  – without the things that describe the key
  – without the things that change over time
The core foundation, the skeletal structure of the data vault model
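Read against the core constructs, the backbone view amounts to keeping the hubs and links and leaving out the satellites. The sketch below illustrates that reading; the `MODEL` list, its table names and the `backbone` function are hypothetical and are not the six-concept model shown on the slide.

```python
# Hypothetical model: (table name, construct type).
MODEL = [
    ("H_Customer", "hub"),
    ("H_Product", "hub"),
    ("L_Customer_Product", "link"),
    ("S_Customer", "satellite"),
    ("S_Product", "satellite"),
]


def backbone(model):
    """Keep keys (hubs) and relationships (links); drop context and history."""
    return [name for name, kind in model if kind in ("hub", "link")]


print(backbone(MODEL))
```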
The Complete Data Vault Model
Complete model with all context and history. Easily adapting to changes.
Tracking History: Time Slice Data
Impact of Change: New Attribute
The Bottom Line
• The Data Warehouse needs to adapt to change easily, be based on central business concepts, integrate data from several sources, track history of changing context, contain trusted and auditable information, and it needs to perform.
• Answering this call means a data warehouse program that is designed to meet these requirements with the people, processes, and the modeling techniques that support them.
• Data Warehouse modeling => Ensemble modeling: techniques that are based on Unified Decomposition. There are several forms of Ensemble methods in play today.
• Data Vault modeling is the leading form of Ensemble modeling today.
• The Best Practice is Modeling Awareness
Data Vault Around the World
An estimated 750 Data Vault-based data warehouses around the world
Data Vault Certification Course
The Genesee Academy CDVDM – Data Vault Modeling Course. The CDVDM is the data vault certification course covering all main topics of data vault modeling. The course is delivered as blended learning, using online video lessons (2 weeks), classroom lectures, exercises, labs and small group modeling cases. Public courses are offered on a regular schedule (see www.GeneseeAcademy.com), and there are in-company options as well.
Data Vault Class
June 10-11
Amsterdam NL
Register Today!
About Hans Hultgren
• Hans Hultgren is an author, speaker, educator and advisor in the data warehousing and business intelligence space. He is an expert on data vault modeling and the author of Modeling the Agile Data Warehouse with Data Vault, where he introduced Ensemble Modeling and Unified Decomposition.
• Hans is the President of Genesee Academy, LLC (which also includes www.DataVaultAcademy.com), which provides the CDVDM data vault certification around the globe.
• For 20 years Hans was a professor at DU, where he was the founder and director of the Master of Science degree in business intelligence and data warehousing (MSBI).
Links and Information
CDVDM Training & Certification
www.GeneseeAcademy.com [email protected]
gohansgo
Book DataVaultBook.blogspot.com
HansHultgren.WordPress.com
HansHultgren
Online video-lesson training
DataVaultAcademy.com
DataVaultAcademy