experiences from a data vault pilot exploiting the internet of things

24
Business Analytics For All BA4All – Insight Session April 29 th 2014 Guy Van der Sande – Vincent Greslebin Unlocking the Value within the Data Vault

Upload: guyvandersande

Post on 23-Jun-2015

185 views

Category:

Technology


4 download

TRANSCRIPT

Page 1: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Business Analytics For All

BA4All – Insight Session April 29th 2014 Guy Van der Sande – Vincent Greslebin

Unlocking the Value within the Data Vault

Page 2: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fifthplay : Architecture

Smart Homes Platform Data Warehouse

Data Vault

Data mart

Gebruikers

Marketing & SSC

Utility Portal

ETL Dag - 1

- Controle data kwaliteit - Toepassing business

rules - Aggregatie - Filtering

Facility Portal

Page 3: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fifthplay : Why Data Vault ?

• Pattern based design which allows agility to take place

• Easy to add new data sources making it future proof. This allows

Fifthplay to stay innovative

• Large volume of data

• Build up history that is not available in the operational system

• Possibility of performing analysis on raw data (cfr quality checks)

• Development speed (Pilot : 37 working days)

Page 4: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault ?

Page 5: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault ?

Page 6: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault ?

The Data Vault is a detail oriented, historical tracking and

uniquely linked set of normalized tables that support one or

more functional areas of business. It is a hybrid approach

encompassing the best of breed between 3rd normal form

(3NF) and star schema.

The design is flexible, scalable, consistent and adaptable to

the needs of the enterprise.

Page 7: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Standard architecture

The centerpiece of the Enterprise Data Warehouse

History is build-up

Granularity as ‘detailed’ as possible

No use of business rules

Use of business keys that are horizontal in nature and

provide visibility across lines of business

A new layer which has the benefits of the RAW Data

Vault, but with the business data embedded

In the Business Data Vault the data has been altered,

cleansed and changed to meet the business rules

Downstream of the raw data vault

Starting point for Master Data Management

Metadata is absolutely vital

Page 8: Experiences from a Data Vault Pilot Exploiting the Internet of Things

The Data Vault Model exists of 3 basic entity types

• Hubs : contains a unique list of business keys

• Links : associations across or between business keys

• Satellites : holds descriptive data (about the business key) over time

Component parts of the Data Vault model

Page 9: Experiences from a Data Vault Pilot Exploiting the Internet of Things

• Represents a Core Business Concept

• Is formed around the Business Key of this concept

• Is established the first time a new instance of that

business key is introduced

• Must be 1:1 with a single instance

• Consists of the business key, a sequence id, a load

date/time stamp and a record source.

Component parts - Hub

Page 10: Experiences from a Data Vault Pilot Exploiting the Internet of Things

• Represents a natural business relationship between business keys

• Is established the first time this new unique association is presented

• Can represent an association between several Hubs and sometimes other

Links.

• maintains a 1:1 relationship with the unique and specific business defined

association between that set of keys.

• Consists of the sequence ids from the Hubs and Links

• Contains sequence id, a load date/time stamp and a

• record source.

Component parts - Link

Page 11: Experiences from a Data Vault Pilot Exploiting the Internet of Things

• The Satellite contains the descriptive information

(context) for a business key.

• A Satellite can only describe one key (Hub or a Link).

• The Satellite is the only construct that manages time

slice data (data warehouse historical tracking of

values over time).

Component parts - Satellite

Page 12: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fact

Dimension 1

Dimension 3

Dimension 2

Dimension 4

Data Vault – Why ?

Page 13: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fact

Dimension 1

Dimension 3

Dimension 2

Dimension 4

Data Vault – Why ?

Page 14: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fact

Dimension 1

Dimension 3

Dimension 2

Dimension 4

Fact

Dimension 5

Data Vault – Why ?

Page 15: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault – Why ? DV

DM

Page 16: Experiences from a Data Vault Pilot Exploiting the Internet of Things

DV

DM

S

S

S S

H

S

L

H

H

H

Data Vault – Why ?

Page 17: Experiences from a Data Vault Pilot Exploiting the Internet of Things

DV

DM

S

S

S S

H

S

L

H

H

H

Dimension Fact

Data Vault – Why ?

Page 18: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault – How did we do it with Fifthplay ?

Page 19: Experiences from a Data Vault Pilot Exploiting the Internet of Things

HubServicePartner HubCustomer HubHomeAreaManager

HubSmartPlug

HubDeviceGroup

HubEnergyLogType

LinkServicePartnerCustomer LinkCustomerHomeAreaManager

LinkHomeAreaManagerSmartPlug

LinkCustomerDeviceGroup

LinkDeviceGroupSmartPlug

LinkDeviceSubGroupSmartPlug

LinkSmartPlugApplianceEnergyLogType

HubCityLinkHomeAreaManagerCity

HubCountry

LinkCountryCity

HubSatServicePartnerHubSatCustomer

HubSatHomeAreaManager

LinkSatHomeAreaManagerCity

LinkSatCountryCity

HubSatCountry

HubSatDeviceGroup

HubSatSmartPlug

HubAppliance

HubSatAppliance

LinkSatSmartPlugApplianceEnergyLogType

HubSatHomeAreaManagerAddress

SeqServicePartnerPK

ServicePartnerID

LoadDateTime

RecordSource

SeqCustomerPK

CustomerID

LoadDateTime

RecordSource

SeqHomeAreaManagerPK

HomeAreaManagerNumber

LoadDateTime

RecordSource

SeqSmartPlugPK

SmartPlugID

LoadDateTime

RecordSource

SeqDeviceGroupPK

DeviceGroupID

LoadDateTime

RecordSource

SeqEnergyLogTypePK

EnergyLogName

LoadDateTime

RecordSource

SeqServicePartnerCustomerPK

SeqCustomer

LoadDateTime

RecordSource

SeqServicePartner

SeqCustomerHomeAreaManager

PK

SeqCustomer

LoadDateTime

RecordSource

SeqHomeAreaManager

SeqHomeAreaManagerSmartPlug

PK

SeqHomeAreaManager

LoadDateTime

RecordSource

SeqSmartPlug

SeqCustomerDeviceGroupPK

SeqCustomer

LoadDateTime

RecordSource

SeqDeviceGroup

LoadDateTime

SeqDeviceGroupSmartPlugPK

LoadDateTime

RecordSource

SeqDeviceGroup

SeqDeviceSubGroupSmartPlug

PK

LoadDateTime

RecordSource

SeqDeviceGroup

SeqSmartPlug

SeqSmartPlug

SeqSmartPlugApplianceEnergyLogType

PK

SeqEnergyLogType

LoadDateTime

RecordSource

SeqSmartPlug

SeqCityPK

CityPostalCode

LoadDateTime

RecordSource

CityName

SeqHomeAreaManagerCityPK

SeqCity

LoadDateTime

RecordSource

SeqHomeAreaManager

SeqCountryPK

CountryIsoCode

LoadDateTime

RecordSource

SeqCountryCityPK

SeqCity

LoadDateTime

RecordSource

SeqCountry

SeqSatServicePartnerPK

SeqServicePartner

LoadDateTime

RecordSource

LoadEndDateTime

ServicePartnerCode

ServiucePartnerEmail

ServicePartnerCustomerContact

SeqSatCustomerPK

SeqCustomer

LoadDateTime

RecordSource

LoadEndDateTime

CustomerEmail

CustomerFirstName

CustomerLastName

CustomerLanguage

SeqSatHomeAreaManagerPK

SeqHomeAreaManager

LoadDateTime

RecordSource

LoadEndDateTime

HomeAreaManagerMode

HomeAreaManagerArchitecture

SeqSatHomeAreaManagerCity

PK

SeqHomeAreaManagerCity

LoadDateTime

RecordSource

LoadEndDateTime

HAMCityAddressLine1

HAMCityPhoneNumber

HAMCityAddressLine2

SeqSatCountryCityPK

SeqCountryCity

LoadDateTime

RecordSource

LoadEndDateTime

CountryCityRegion

CountryCityState

SeqSatCountryPK

SeqCountry

LoadDateTime

RecordSource

LoadEndDateTime

CountryName

SeqSatDeviceGroupPK

SeqDeviceGroup

LoadDateTime

RecordSource

LoadEndDateTime

DeviceGroupName

DeviceGroupDescription

SeqSatSmartPlugPK

SeqSmartPlug

LoadDateTime

RecordSource

LoadEndDateTime

SmartPlugDisplayName

SmartPlugManufacturer

SmartPlugModel

SmartPlugIsGenerator

SmartPlugHasChildren

SmartPlugHasSchedule

SeqAppliancePK

ApplianceID

LoadDateTime

RecordSource

SeqSatAppliancePK

SeqAppliance

LoadDateTime

RecordSource

LoadEndDateTime

ApplianceCategory

SeqSatSmartPlugApplianceEnergyLogType

PK

SeqSmartPlugApplianceEnergyLogType

LoadDateTime

RecordSource

LoadEndDateTime

EnergyLogDateTime

EnergyLogValue

SeqAppliance

EnergyLogValueUnit

Legend

Hub

Link

Satellite

ServicePartnerWebPage

SeqSatHomeAreaManagerAddress

PK

SeqHomeAreaManager

LoadDateTime

RecordSource

LoadEndDateTime

HomeAreaManagerAddressLine1

HomeAreaManagerPostalCode

HomeAreaManagerAddressLine2

HomeAreaManagerCityName

HomeAreaManagerProvince

HomeAreaManagerState

HomeAreaManagerCountry

Fifthplay Raw Data Vault Architecture

Page 20: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fifthplay Raw Data Vault Architecture

HubSmartPlug

HubEnergyLogType

LinkSmartPlugApplianceEnergyLogType

HubAppliance

HubSatAppliance

LinkSatSmartPlugApplianceEnergyLogType

SeqSmartPlugPK

SmartPlugID

LoadDateTime

RecordSource

SeqEnergyLogTypePK

EnergyLogName

LoadDateTime

RecordSource

SeqSmartPlugApplianceEnergyLogType

PK

SeqEnergyLogType

LoadDateTime

RecordSource

SeqSmartPlug

SeqAppliancePK

ApplianceID

LoadDateTime

RecordSource

SeqSatAppliancePK

SeqAppliance

LoadDateTime

RecordSource

LoadEndDateTime

ApplianceCategory

SeqSatSmartPlugApplianceEnergyLogType

PK

SeqSmartPlugApplianceEnergyLogType

LoadDateTime

RecordSource

LoadEndDateTime

EnergyLogDateTime

EnergyLogValue

SeqAppliance

EnergyLogValueUnit

Legend

Hub

Link

Satellite

Page 21: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Fifthplay : Data Vault – lessons learned

• Don’t stop with data vault; A combination with classic

dimensional Kimball-methodology is advised

• Be creative; get out of your comfort zone, dare to walk

the thine line

• While setting up the data vault, operational issues

where discovered early in the process

• ETL-development goes very quickly because of the

typical pattern design of the data vault;

Page 22: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Data Vault – What’s next ?

Page 23: Experiences from a Data Vault Pilot Exploiting the Internet of Things

2013 : Dan Linstedt

releases Data Vault

2.0 specs

History and what’s next ?

Relational modeling

(E.F.Codd)

Bill Inmon began

discussing Data

Warehousing

• Barry Devlin and

Dr Kimball

release

“Business Data

Warehouse”

• Bill Inmon

popularizes Data

Warehousing

• Dr Kimball

popularizes Star

Schema

Dan Linstedt begins

R&D on Data Vault

Modeling

Dan Linstedt

releases first 5

articles on Data

Vault Modeling

2012 : Dan Linstedt

announces Data

Vault 2.0

1960 1970 1980 1990 2000 2010

Page 24: Experiences from a Data Vault Pilot Exploiting the Internet of Things

Thank You

[email protected]

http://www.linkedin.com/company/usgprofessionalsbe

+32 3 231 94 84 www.usgict.be

https://www.facebook.com/usgictbe

“In the Data Warehousing/BI world, we

should store the data as it stands on the

source system and interpret it on the

way out to the data marts. This is

absolutely critical to remember.” Dan Linstedt

@BICC_at_USG