part 1 - data warehousing lecture at bw cooperative state university (dhbw)

72
Andreas Buckenhofer Data Warehouse (Datenbanken II)

Upload: andreas-buckenhofer

Post on 07-Jan-2017

37 views

Category:

Data & Analytics


3 download

TRANSCRIPT

Page 1: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Andreas Buckenhofer

Data Warehouse (Datenbanken II)

Page 2: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Zur Person

Data Warehouse / DHBW / Fall 2016 / Seite 2

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer

Senior DB Professional

[email protected]

Seit 2009 bei Daimler TSS

im Fachgebiet Big Data

Business Unit Analytics

Page 3: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Interner IT-Partner für Daimler

Wir sind Spezialist und strategischer Business-Partner für

innovative IT-Gesamtlösungen im Daimler-Konzern.

Als 100%ige Daimler-Tochter leben wir die Kultur der Exzellenz

mit dem Anspruch der Innovations- und Technologieführerschaft.

Mit herausragender Technologie- und Methodenkompetenz

verstehen wir uns als Anbieter von wettbewerbs-

differenzierenden Dienstleistungen und sind

Impulsgeber in anspruchsvollen IT-Fragestellungen,

speziell in den Kernthemen Car IT und Mobility,

Information Security, Analytics und

Shared Services.

Data Warehouse / DHBW / Fall 2016 / Seite 3

Page 4: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Standorte

Data Warehouse / DHBW / Fall 2016 / Seite 4

Daimler TSS Deutschland

6 Standorte

774 Mitarbeiter

Ulm (Hauptsitz)

Raum Stuttgart

Böblingen, Echterdingen,

Leinfelden, Möhringen

Berlin

Daimler TSS India

Hub Bangalore

19 Mitarbeiter

Daimler TSS China

Hub Beijing

6 Mitarbeiter

Daimler TSS Malaysia

Hub Kuala Lumpur

42 Mitarbeiter

Page 5: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH Data Warehouse / DHBW / Fall 2016 / Page 5

Ganzheitliche Betreuung (Erhebung, Auswertung,

Visualisierung und Interpretation), unabhängige

Beratung und Optimierung der Geschäftsabläufe.

Von klassischer BI bis hin zu predictive und prescriptive

Analytics bieten wir Leistungen unter Berücksichtigung

der Datensicherheit.

Dabei verknüpfen wir fachliche Erfahrung und IT-Know-

how im Daimler-Kontext mit dem Blick fürs große

Ganze.

Analytics. Das große Ganze verstehen, um Daten nutzbar machen.

Page 6: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

DWH, Big Data, Data Mining

Data Warehouse / DHBW / Fall 2016 / Page 6

• This lecture is about the classical DWH

• 4 sessions

• Mr. Bollinger’s lecture is about Big Data and Data Mining

Page 7: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Overview of the lecture

Data Warehouse / DHBW / Fall 2016 / Page 7

1. Introduction to DWH, DWH Architectures - 20.10.2016

2. Data Modeling, OLAP 1 - 27.10.2016

3. OLAP 2, ETL - 03.11.2016

4. Metadata, DWH Projects, Advanced Topics - 10.11.2016

Page 8: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

About this Lecture

Data Warehouse / DHBW / Fall 2016 / Page 8

• Structure of the lecture

• Review of the preceding lecture

• Presentation of content

• Group tasks, exercises

• 14:45 – 18:15

• 2x15min breaks

Page 9: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

What you will learn today

Data Warehouse / DHBW / Fall 2016 / Page 9

• Data Warehousing is a major topic of computer science.

• After the end of this lecture you will be able to

• Understand the basic business and technology drivers for data warehousing

• Describe the characteristics of a data warehouse

• Describe the differences between production and data warehouse systems

• Understand DWH architecture(s)

Page 10: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Motivation

Data Warehouse / DHBW / Fall 2016 / Page 10

• Data as the new oil

• Data as Asset, getting more and more important also in production industries

• Not only classical data-intensive companies like Google or Facebook

• Increasing interest e.g. in insurance, health care, automotive, …

• Connected cars, Smart Home, Tailor-made insurances, etc.

• Hype technologies

• Databases (RDBMS, NoSQL)

• Big Data

• DWH still growing (additional stimuli coming also from Big Data, Analytics, NoSQL, IOT,

RealTime, TimeToMarket, etc)

Page 11: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Many employment opportunities

Data Warehouse / DHBW / Fall 2016 / Page 11

• DWH department in every (bigger) end user company, also in many medium-sized or

small-sized companies

• DWH department in every (bigger) consulting company

• DWH-only specialized consulting companies

• DWH tool vendors

Page 12: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Many employment opportunities with comprehensive

requirements

Data Warehouse / DHBW / Fall 2016 / Page 12

• Comprehensive requirements

• DWHs are complex

• Specialists (e.g. Data Integration, Data Visualization, Tool XYZ)

• Generalists (e.g. Data Architects)

Page 13: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Job description examples

Data Warehouse / DHBW / Fall 2016 / Page 13

Page 14: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Job description examples

Data Warehouse / DHBW / Fall 2016 / Page 14

Page 15: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Job description examples

Data Warehouse / DHBW / Fall 2016 / Page 15

Page 16: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• Often used as synonym

• DWH more technical focus

• BI more business / process focus

• “Business intelligence is a set of methodologies, processes, architectures, and

technologies that transform raw data into meaningful and useful information used

to enable more effective strategic, tactical, and operational insights and decision

making.” (Boris Evelson, Forrester Research, 2008)

Data Warehouse (DWH) or Business Intelligence (BI)?

Data Warehouse / DHBW / Fall 2016 / Page 16

Page 17: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information Technology (1960‘s – 80‘s)

Data Warehouse / DHBW / Fall 2016 / Page 17

Support of few business processes in separated applications

• applications separated

• target: replace manual and time consuming activities

• support daily transactions

• data embedded in process / process-specific application

• process-orientation

Page 18: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information Technology (1960‘s – 80‘s)

Data Warehouse / DHBW / Fall 2016 / Page 18

• Many systems throughout the enterprises for dedicated purposes

• Customer data, order data, etc spread over many systems in many variations and with

contradictions

Page 19: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information Technology (1960‘s – 80‘s)

Data Warehouse / DHBW / Fall 2016 / Page 19

• Storage of data

• mostly on tape

• very little space, therefore data encoded and compressed

• one separate tape for each process / application

• data spread

• later introduction of databases – however, no consolidation of data throughout the

company

Page 20: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information Technology (1960‘s – 80‘s)

Data Warehouse / DHBW / Fall 2016 / Page 20

• Databases accessed through common unique keys, numbers, primary keys, ...

• More and more applications access the different databases

• Complex structure of systems and databases, but no overall view of company

possible (silos)

Page 21: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information Technology (1960‘s – 80‘s)

Data Warehouse / DHBW / Fall 2016 / Page 21

• „Management Information Systems“ (MIS)

• since late 60es

• did not really work

• Can be characterized as “Unplanned decision support”

• Management needs reports / combined data from different systems to make

decisions for company

• Reports all manually written by IT people

• extract, combine, cache, accumulate data

• can take several days

• Error prone

• relevant information may be forgotten or combined in a wrong way

• No ad-hoc questions possible

Page 22: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Major problems for effective decision support

Data Warehouse / DHBW / Fall 2016 / Page 22

• Distributed data

• Different data structures

• Historic data

• System workload

• Inadequate technology

Page 23: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Distributed data

Data Warehouse / DHBW / Fall 2016 / Page 23

• Data resides on

• different systems / storages

• different applications

• different technologies

• Has to be accumulated on one system for further analysis

• Data is inhomogeneous, e.g. each system has their own customer number or order

number, etc.

• How to combine the data?

Page 24: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Different data structures

Data Warehouse / DHBW / Fall 2016 / Page 24

• Systems developed independently from each other

• Different data types

• E.g.; zip-code as integer or character string

• Different encodings

• E.g.; m-f, m-w for gender

• Different field lengths

• E.g.; address field

• Different data modeling

• E.g. last name / first name in different fields vs last name / first name (badly

modelled) in one single field

Page 25: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Issues with historic data

Data Warehouse / DHBW / Fall 2016 / Page 25

• Usually: Data archived after max. 3 months

• daily transactions produce lots of data

• limited size of storage →  high amounts of data fill up systems

• For reports, already archived data may have to be accessed

Page 26: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Issues with system workload

Data Warehouse / DHBW / Fall 2016 / Page 26

• Systems constructed for daily transaction business

• handle a constant (large) amount of transactions

• handle transactions concurrently

• each transaction accesses small amount of data

• very simple

• Reports:

• few reports at one time

• access lots of data

• not on a regular basis

• complex arithmetic operations

Page 27: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Issues with system workload

Data Warehouse / DHBW / Fall 2016 / Page 27

• Consequences:

• systems stressed by additional load (due to reports)

• not optimized for this kind of workload

• performance of daily transaction business jeopardized

• may possibly lead to system failure!

Page 28: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Inadequate technology

Data Warehouse / DHBW / Fall 2016 / Page 28

• Capacity and cost of hard disk drives

• No SAN, inflexible scaling of storage

• E.g. 1979: 8″-Winchester hard disk drive of IBM

• Capacity 5 MB ! Cost 10 000 DM.

• Performance and cost of processors

• No graphical front-ends

• Network technology in its beginnings

Page 29: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Textual and visual Reports

Data Warehouse / DHBW / Fall 2016 / Seite 29

Page 30: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Visual Data Integration Tool

Data Warehouse / DHBW / Fall 2016 / Seite 30

Page 31: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Conclusion

Data Warehouse / DHBW / Fall 2016 / Page 31

•  "operative systems" not suitable for analytical evaluations

• Need for a new, separated system

• fast answers, ad-hoc questions possible

• no interference with daily transaction business

� Data Warehouse

Page 32: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Group Task

Data Warehouse / DHBW / Fall 2016 / Page 32

• Assume that you are the CIO of a bank, a retail company, a telco provider or an online

store.

• From what you have learned so far list possible (functional and non-functional)

requirements for a data warehouse. Think of the deficiencies of transactional

systems like

• Distributed data

• Different data structures

• Problem with historic data

• Problem with system workload

• Inadequate technology ! 

• What are requirements from an end-user perspective.

Page 33: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse User

Data Warehouse / DHBW / Fall 2016 / Page 33

• Wants to access and analyze all important data from desktop

• Uses data analysis as facility for daily work

• Wants data to be fast and dynamically available

• Identifies data correlations by browsing through the data

• does possibly not know what (s)he is looking for

Page 34: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 34

• Answer business-critical questions

• Decision support for management

• Overcomes difficulties when using existing transaction systems for those tasks

• Not a product, but a overall concept

• Applications come, applications go. The data, however, lives forever. It is not about

building applications; it really is about the data underneath these application (Tom

Kyte)

Page 35: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 35

• Contains data from different systems

• Imports data from different systems on a regular basis

• summarize data

• provide historic data

• generate metadata

• Completely new system

Page 36: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

High-Level Data Warehouse Architecture

Data Warehouse / DHBW / Fall 2016 / Page 36

Staging

OLTP

OLTP

OLTP

Core Warehouse Mart

Data Warehouse

Page 37: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Definitions DWH

Data Warehouse / DHBW / Fall 2016 / Page 37

Ralph Kimball Bill Inmon

„A data warehouse is a copy of transaction

data specifically structured for querying

and reporting“

“A data warehouse is a subject-oriented,

integrated, time-variant, nonvolatile

collection of data in support of

management’s decision-making process”

Page 38: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• A data warehouse is organized around the major subjects of the enterprise like

• Customer

• Vendor

• Car

• In contrast to the process/functional orientation of applications such as

• Offer

• Booking

• Fulfillment

• Delivery

Subject-oriented

Data Warehouse / DHBW / Fall 2016 / Page 38

Page 39: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• Data contained in the warehouse are integrated.

• Aspects of integration

• consistent naming conventions,

• consistent measurement of variables,

• consistent encoding structures,

• consistent physical attributes of data

• „One version of truth“

Integrated

Data Warehouse / DHBW / Fall 2016 / Page 39

Page 40: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• All data in the data warehouse is accurate as of some moment in time.

• Has to be associated with a time stamp.

• In the operational environment data is accurate as of the moment of access.

• Once data is correctly recorded in the data warehouse, it cannot be updated or

deleted.

• Data warehouse data is, for all practical purposes, a long series of snapshots.

• If data is deleted in the OLTP system, the data is the DWH is marked with e.g. a

deletion flag but never actually deleted.

• Operational data, being accurate as of the moment of access, can be updated as the

need arises.

Time-variant

Data Warehouse / DHBW / Fall 2016 / Page 40

Page 41: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• Operations in operational environment

• insert

• delete

• update

• select 

• Operations in a data warehouse

• the initial and additional loading of data by (batch) processes

• the access of data

• (almost) no updates and deletes (technical updates / deletes only)

Nonvolatile

Data Warehouse / DHBW / Fall 2016 / Page 41

Page 42: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Online Transaction Processing Online Analytical Processing

Transaction-oriented system Query-oriented system

Optimized for insert and update Optimized for complex queries with short

response times; ad-hoc queries

Many users change data Only ETL process writes data

Selective queries on the data Evaluations of all data (complex queries)

Avoid redundancy Redundant data storage

Normalized data management De-normalized data management

Relational Data Modeling Several layers with different data models,

one model usually Dimensional Data

Modeling

OLTP vs OLAP

Data Warehouse / DHBW / Fall 2016 / Page 42

Page 43: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Operative data Dispositive data

Handling structured, parallel processes

with short and isolated ("atomic")

transactions

information for management (decision

support)

Modeling process- and function oriented,

individual for each application

Different data models in one DWH;

historic, stable and summarized, data

# users many few(er) but increasing user base

System

return time

Milliseconds Seconds to minutes (even hours)

Operative vs dispositive data

Data Warehouse / DHBW / Fall 2016 / Page 43

Page 44: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Operative DBs Dispositive DBs

Purpose processing of daily business

transactions

information for management (decision

support)

Content detailed, complete, most recent

data

historic, stable and summarized, data

Data

amount

small amount of data per

transaction

large amount of data for load, and often

per query

Data

structure

suitable for operational

transactions

Several models; suitable for long term

storage and business analyses

Trans

actions

ACID; very short read/write

transactions

long load operations, longer read

transactions

Operative vs dispositive databases

Data Warehouse / DHBW / Fall 2016 / Page 44

Page 45: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

• Information oriented

• Analysis oriented

• Campaign oriented

• Planning oriented

Data Warehousing Usage Types

Data Warehouse / DHBW / Fall 2016 / Page 45

Page 46: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Information oriented

Data Warehouse / DHBW / Fall 2016 / Page 46

•  Computation of various key figures

• Revenue, profit, ..

• Used for „business performance management“

• User groups

• Higher management

• Controlling

• Mostly fixed reports „standard reporting“

• No ad hoc queries

• 70 – 80 % of all users of a data warehouse

• Web-based front-end

• Enterprise Portals

Page 47: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Analysis oriented applications

Data Warehouse / DHBW / Fall 2016 / Page 47

•  Purpose analysis of complex problems

• Example: analysis of quality problems

• Using the information of warranty cases to find underlying reasons for product

failure

• Identification of outliers or quality issues

Page 48: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Campaign oriented applications

Data Warehouse / DHBW / Fall 2016 / Page 48

•   Executed as specific projects

• Campaign planning

• Campaign execution

• Campaign evaluation of results

• Requires

• a broad spectrum of data

• Not all data may be in a data warehouse

• Sophisticated analysis techniques

• Data mining

• Examples

• Marketing campaigns

• Churn prevention

Page 49: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Planning Oriented

Data Warehouse / DHBW / Fall 2016 / Page 49

• Use data from past to predict trends

• Using trends in the historical data for setting up plan figures

• Futures product sales

• For production planning

• Stock replenishment

• ...

Page 50: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Exercise

Data Warehouse / DHBW / Fall 2016 / Page 50

• For one of the following companies

• Bank, Telecommunication company, Online store (like Amazon.com), Car

manufacturer, Supermarket

1.  Outline the operational systems

• characterize which operations are performed by them

• mostly triggered by actions of a customer

• which information is stored by these systems

• which questions can be answered by these systems

2.  For a data warehouse

• describe what information can be stored in it

• which questions can/should be answered with this information

Page 51: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

FrontendBackend

External data

sources

Internal data

sources

Standard Data Warehouse Architecture

Data Warehouse / DHBW / Fall 2016 / Page 51

Staging Layer

(Input Layer)

Core Warehouse

Layer

(Storage Layer)

Reporting Layer

(Output Layer)

(Mart Layer)

Page 52: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Sources

Data Warehouse / DHBW / Fall 2016 / Page 52

� Providing internal and external data out of the source systems.

� Enabling data through Push (source is generating extracts) or Pull (BI Data Backend is

requesting or directly accessing data).

� Example for Push practice (deliver csv or text data through file interface; Change

Data Capture (CDC)).

� Example for Pull practice (direct access to the source system via ODBC, JDBC, API

and so on).

Page 53: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Staging Layer

Data Warehouse / DHBW / Fall 2016 / Page 53

� Major target is to decouple source and target system (repeating extraction run,

additional delivery).

� Granular data (no pre-aggregation or filtering in the Data Source Layer, i.e. the source

system).

� Usually not persistent, therefore regular housekeeping is necessary (for instance after a

few days or weeks or at the latest once a correct upload to Core Warehouse Layer is

ensured).

Page 54: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Core Warehouse Layer

Data Warehouse / DHBW / Fall 2016 / Page 54

� Data storage in an integrated, consolidated, consistent and often non-redundant

(normalized) data model.

� Application / Reporting neutral data storage on the most detailed level of granularity

(incl. historic data).

� Size of database can be several TB and can grow rapidly due to data historization.

Page 55: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Mart Layer

Data Warehouse / DHBW / Fall 2016 / Page 55

� For performance reasons the Data Mart Layer is providing aggregated data in a

denormalized data model.

� Created through filtering or aggregating the Core Warehouse Layer content on specific

dimensions (e.g. time, region), granularity or business content specific aspects.

� Technically the Data Mart Layer can also be a part of an Analytical Frontend product

(such as QlickTech QlikView or IBM Cognos TM1) and need not to be stored in a

relational database.

� The Data Mart Layer also contains security permissions as end users are granted

access to this layer only.

Page 56: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Other architectures

Data Warehouse / DHBW / Fall 2016 / Page 56

• Independent Data Marts

• Central data warehouse based on data marts (Kimball approach)

• Virtual Data Warehouse

• Data Vault 2.0 Architecture (Dan Linstedt)

• DW 2.0: The Architecture for the Next Generation of Data Warehousing (Bill Inmon)

Page 57: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

Frontend

Frontend

Backend

Backend

FrontendBackend

External data

sources

Internal data

sources

Independent Data Marts

Data Warehouse / DHBW / Fall 2016 / Page 57

Data Mart 1

Data Mart 2

Data Mart 3Staging Layer

(Input Layer)

Core Warehouse

Layer

(Storage Layer)

Page 58: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Independent Data Marts

Data Warehouse / DHBW / Fall 2016 / Page 58

• Situation is often encountered in enterprises

• Data marts created to solve specific analytical requirements

• Fast implementation and lower cost

• Often Staging Layer and/or Core Warehouse Layer omitted

• Not recommended

• Possible problems

• No integration with other data marts

• No common data model for all data marts

• Cross data mart queries difficult

Page 59: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

FrontendBackend

External data

sources

Internal data

sources

Central data warehouse based on data marts (Kimball

approach)

Data Warehouse / DHBW / Fall 2016 / Page 59

Core Warehouse Layer = Mart Layer

Data Mart 1

Data Mart 2Data Mart 3

Staging Layer

(Input Layer)

Page 60: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Central data warehouse based on data marts (Kimball

approach)

Data Warehouse / DHBW / Fall 2016 / Page 60

• Bottom-up approach

• Leverages existing data marts and integration processes

• Data marts may need to be redesigned if incompatibilities between the data models

exist

• Data mart changes can have an impact on the central data warehouse

• Enterprise service bus / conformed dimensions for integration purposes

Page 61: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

FrontendBackend

External data

sources

Internal data

sources

Virtual Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 61

Query Management

Page 62: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Virtual Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 62

• Data not extracted from operational systems and stored separately

• Standardized interface for all operational data sources

• One "GUI" for all existing data

• Generates combined queries

• Query Processor joins query result data from different sources

• Can also access data in Hadoop (Polybase, Big SQL, BigData SQL, etc)

Page 63: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Virtual Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 63

• Query Management manages metadata about all operational systems

• (physical) location of data

• Algorithms for extracting data from operational system

• Implementation easier

• Low cost: can use existing hardware infrastructure

• Optimized access to existing infrastructure

Page 64: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Virtual Data Warehouse

Data Warehouse / DHBW / Fall 2016 / Page 64

• Queries cause significant performance problems in operational systems

• Known problems when analyzing operational data directly

• Same query is processed multiple times (if queried multiple times)

• Same query delivers different results when processed at different times

Page 65: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Warehouse

FrontendBackend

External data

sources

Internal data

sources

Data Vault 2.0 Architecture (Dan Linstedt)

Data Warehouse / DHBW / Fall 2016 / Page 65

Staging Layer

(Input Layer)

Core Warehouse

Layer

(Storage Layer)

Reporting Layer

(Output Layer)

(Mart Layer)

Page 66: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Vault 2.0 Architecture

Data Warehouse / DHBW / Fall 2016 / Page 66

• Core Warehouse Layer is modeled with Data Vault

• Business rules are applied from Core Warehouse Layer to Mart Layer and not earlier

• Or alternatively within Core Warehouse Layer (Raw Data Vault containing data

without transformations and Business Data Vault containing transformed data)

• Data is fully auditable

• Real-time capable architecture

Page 67: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Data Vault 2.0 Architecture

Data Warehouse / DHBW / Fall 2016 / Page 67

• Data Vault is optimized for the following requirements:

• Flexibility

• Agility

• Data historization

• Data integration

• Auditability

• Bill Inmon wrote in 2008: “Data Vault is the optimal approach for modeling the EDW in

the DW2.0 framework.” (DW2.0)

Page 68: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

DW 2.0: The Architecture for the Next Generation of Data

Warehousing (Bill Inmon)

Data Warehouse / DHBW / Fall 2016 / Page 68

Page 69: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

DW 2.0: The Architecture for the Next Generation of Data

Warehousing (Bill Inmon)

Data Warehouse / DHBW / Fall 2016 / Page 69

• Structured and unstructured data

• Different storage areas

• Hot data: High speed, expensive storage (RAM, SSDs) for most recent data

• …

• Cold data: Low speed, inexpensive storage (e.g. hard disks) for old data

Page 70: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Architecture from an actual Project

Data Warehouse / DHBW / Fall 2016 / Seite 70

ETL Engine

Fro

nte

nd

Standard

Reports

AdHoc

ReportsLogs

TSM

IIDR

ReplEngine

Source

Datastore

Source

Spiegel

DB

OLTP

DB

IIDR ReplEngine

Spiegel

Datastore

Spiegel

IIDR ReplEngine

DWH

Datastore

DWH

DWH DB

Staging Layer

Core Layer / Data Vault

Mart Layer

Page 71: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Group Task

Data Warehouse / DHBW / Fall 2016 / Page 71

• Review the presented data warehouse architectures.

• Which architecture would you recommend for

• A multinational telco provider

• A holding of 3 car manufacturers

• An online store represented in almost all countries

• A machine tools manufacturer with plants in Europe, America and Asia

• You may recommend a combination of the architecures or you may recommend as well

different architectures for different subject areas

• List advantages and drawbacks of your proposal.

Page 72: Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Thank you!

Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm, Germany / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet portal code: @TSS

Domicile and Court of Registry: Ulm / Commercial Register No.: 3844 / Management: Christoph Röger (Vorsitzender), Steffen Bäuerle